├── .gitignore ├── CODE_OF_CONDUCT.md ├── CONTRIBUTING.md ├── LICENSE ├── README.md ├── __init__.py ├── agent.py ├── avnav_agent.py ├── avwan_agent.py ├── configs ├── avnav_agent.yaml ├── avwan_agent.yaml ├── challenge_audionav.local.rgbd.yaml ├── challenge_avwan.local.rgbd.yaml └── challenge_random.local.yaml ├── eval.py ├── example_ckpt.pth ├── res └── img │ ├── sep_metrics.png │ ├── soundspaces-logo.png │ └── spl.png ├── test_locally_audionav_rgbd.sh ├── test_locally_avwan_rgbd.sh └── test_locally_random.sh /.gitignore: -------------------------------------------------------------------------------- 1 | .DS_Store 2 | .vscode 3 | data 4 | # *.pth 5 | runs 6 | __pycache__ 7 | 8 | -------------------------------------------------------------------------------- /CODE_OF_CONDUCT.md: -------------------------------------------------------------------------------- 1 | # Code of Conduct 2 | 3 | Facebook has adopted a Code of Conduct that we expect project participants to adhere to. 4 | Please read the [full text](https://code.fb.com/codeofconduct/) 5 | so that you can understand what actions will and will not be tolerated. 6 | 7 | -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # Contributing to soundspaces-challenge 2 | We want to make contributing to this project as easy and transparent as 3 | possible. 4 | 5 | ## Pull Requests 6 | We actively welcome your pull requests. 7 | 8 | 1. Fork the repo and create your branch from `master`. 9 | 2. If you haven't already, complete the Contributor License Agreement ("CLA"). 10 | 11 | ## Contributor License Agreement ("CLA") 12 | In order to accept your pull request, we need you to submit a CLA. You only need 13 | to do this once to work on any of Facebook's open source projects. 14 | 15 | Complete your CLA here: 16 | 17 | ## Issues 18 | We use GitHub issues to track public bugs. 
Please ensure your description is 19 | clear and has sufficient instructions to be able to reproduce the issue. 20 | 21 | ## License 22 | By contributing to soundspaces-challenge, you agree that your contributions will be licensed 23 | under the LICENSE file in the root directory of this source tree. 24 | 25 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) Facebook, Inc. and its affiliates. 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 |

2 | 3 |

4 | 5 | -------------------------------------------------------------------------------- 6 | 7 | # SoundSpaces Challenge 2023 8 | 9 | This repository contains starter code for the 2023 challenge, details of the tasks, and training and evaluation setups. For an overview of SoundSpaces Challenge visit [soundspaces.org/challenge](https://soundspaces.org/challenge/). 10 | 11 | This year, we are hosting two challenges: the first one is on the audio-visual navigation task [1], where an agent is tasked to find a sound-making object in unmapped 3D environments with visual and auditory perception, and the second one is on the active audio-visual source separation task [3], where an agent is tasked to separate a target sound-making object emitting time-varying sounds from an audio mixture comprising spatial time-varying sounds from multiple sound sources. 12 | 13 | 14 | ## Task 15 | In AudioGoal navigation (AudioNav), an agent is spawned at a random starting position and orientation in an unseen environment. A sound-emitting object is also randomly spawned at a location in the same environment. The agent receives a one-second audio input in the form of a waveform at each time step and needs to navigate to the target location. No ground-truth map is available and the agent must only use its sensory input (audio and RGB-D) to navigate. 16 | 17 | In Active Audio-Visual Separation (active AV separation), an agent is spawned at a random starting position and orientation in an unseen environment. Multiple sound-emitting objects, each of which emits a time-varying sound, are also randomly spawned at a location in the same environment. 18 | The agent receives a one-second audio input in the form of a waveform, which is a mixture of the spatial sounds from all sources, at each time step and needs to navigate to separate the audio from a target source, denoted by a target class label, at every step of its motion. 
19 | No ground-truth map is available and the agent must use only its sensory input (audio and RGB) to navigate. The current version of the challenge considers separation scenarios like speech vs. speech and speech vs. music. 20 | 21 | ### Dataset 22 | The challenge will be conducted on the SoundSpaces Dataset, which is based on AI Habitat, Matterport3D, and Replica. For this challenge, we use the Matterport3D dataset due to its diversity and scale of environments. This challenge focuses on evaluating agents' ability to generalize to unheard sounds and unseen environments. For AudioNav, the training and validation splits are the same as those used in the Unheard Sound experiments reported in the SoundSpaces paper. They can be downloaded from the SoundSpaces dataset page (including minival). For active AV separation, the training and validation splits are the same as those used in the Unheard Sound experiments reported in the Active AV Dynamic Separation paper. 23 | 24 | ### Evaluation 25 | For AudioNav, after calling the STOP action, the agent is evaluated using the 'Success weighted by Path Length' (SPL) metric [2]. An episode is deemed successful if, on calling the STOP action, the agent is within 0.36m (2x agent-radius) of the goal position. 26 | 27 |
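As a rough illustration of the SPL metric [2] mentioned above: over N episodes, SPL is the mean of S_i * l_i / max(p_i, l_i), where S_i is the binary success indicator, l_i the shortest-path length from start to goal, and p_i the length of the path the agent actually took. The sketch below is only an illustration; the function name `spl` and its arguments are hypothetical and are not part of `eval.py` or Habitat, and the challenge's official values come from the environment's own `SPL` measurement. The handling of a zero-length episode is an assumption made here for safety.

```python
from typing import Sequence


def spl(
    successes: Sequence[int],
    shortest_path_lengths: Sequence[float],
    agent_path_lengths: Sequence[float],
) -> float:
    """Success weighted by Path Length, averaged over episodes (hypothetical helper)."""
    n = len(successes)
    if n == 0:
        raise ValueError("at least one episode is required")
    if not (n == len(shortest_path_lengths) == len(agent_path_lengths)):
        raise ValueError("all inputs must have the same length")

    total = 0.0
    for s, l, p in zip(successes, shortest_path_lengths, agent_path_lengths):
        denom = max(p, l)
        # Assumption: if both lengths are zero (agent spawned on the goal and
        # did not move), count the path as perfectly efficient.
        efficiency = l / denom if denom > 0 else 1.0
        total += float(s) * efficiency
    return total / n


if __name__ == "__main__":
    # Episode 1 succeeds with a path twice the optimal length, episode 2 fails,
    # episode 3 succeeds along the shortest path.
    print(spl([1, 0, 1], [10.0, 5.0, 4.0], [20.0, 5.0, 4.0]))  # 0.5
```

A failed episode contributes zero regardless of path length, so SPL can never exceed the plain success rate.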

28 | 29 |

30 | 31 | 32 | For active AV separation, the agent is evaluated using the 'Scale-invariant source-to-distortion ratio' (SI-SDR) metric, averaged over the whole agent trajectory. 33 | 34 |
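To make the SI-SDR metric above more concrete, here is a minimal sketch using the commonly used definition: the estimate is projected onto the target, and the ratio of the projected energy to the residual energy is reported in dB. The helper names `si_sdr` and `mean_si_sdr`, the `eps` stabilizer, and the choice to operate on raw 1-D waveforms are all assumptions for illustration; the challenge's actual evaluation code lives in the active-AV-dynamic-separation repository and may differ in details.

```python
import numpy as np


def si_sdr(estimate, target, eps: float = 1e-8) -> float:
    """Scale-invariant source-to-distortion ratio in dB (hypothetical helper)."""
    estimate = np.asarray(estimate, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    # Optimal scaling of the target that best explains the estimate.
    scale = np.dot(estimate, target) / (np.dot(target, target) + eps)
    projection = scale * target
    residual = estimate - projection
    ratio = (np.dot(projection, projection) + eps) / (np.dot(residual, residual) + eps)
    return float(10.0 * np.log10(ratio))


def mean_si_sdr(estimates, targets) -> float:
    """Average SI-SDR over the steps of one agent trajectory."""
    return float(np.mean([si_sdr(e, t) for e, t in zip(estimates, targets)]))


if __name__ == "__main__":
    target = np.array([1.0, 0.0, 1.0, 0.0])
    # Distortion orthogonal to the target with 1/100 of its energy.
    estimate = target + 0.1 * np.array([0.0, 1.0, 0.0, 1.0])
    print(si_sdr(estimate, target))
    # Rescaling the estimate leaves the score essentially unchanged.
    print(si_sdr(3.0 * estimate, target))
```

Because the target is rescaled before the residual is measured, an agent is not rewarded or penalized for predicting the right waveform at the wrong volume.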

35 | 36 |

37 | 38 | ## Participation Guidelines 39 | 40 | Participate in the contest by registering on the [EvalAI challenge page](https://eval.ai/web/challenges/challenge-page/1971/overview) and creating a team. Participants will upload JSON files containing the evaluation metric values for both challenges. For AudioNav, participants will also upload the trajectories executed by their model, which will be used to validate the submitted performance values. For active AV separation, the winning teams will later be asked to turn in their code and checkpoints for inspection. Suspicious submissions will be reviewed and, if necessary, the participating team will be disqualified. Instructions for evaluation and online submission are provided below. 41 | 42 | ### Evaluation 43 | For AudioNav, 44 | 1. Clone the challenge repository: 45 | 46 | ```bash 47 | git clone https://github.com/facebookresearch/soundspaces-challenge.git 48 | cd soundspaces-challenge 49 | ``` 50 | 51 | 1. Implement your own agent or try one of ours. We provide an agent in `agent.py` that takes random actions: 52 | ```python 53 | import habitat 54 | import soundspaces 55 | 56 | class RandomAgent(habitat.Agent): 57 | def reset(self): 58 | pass 59 | 60 | def act(self, observations): 61 | return numpy.random.choice(len(self._POSSIBLE_ACTIONS)) 62 | 63 | def main(): 64 | agent = RandomAgent(task_config=config) 65 | challenge = soundspaces.Challenge() 66 | challenge.submit(agent) 67 | ``` 68 | 69 | 1. Follow the instructions for downloading the SoundSpaces [dataset](https://github.com/facebookresearch/sound-spaces/tree/master/soundspaces) and place all data under the `data/` folder. 70 | 71 | 72 | 1. 
Evaluate the random agent locally: 73 | ```bash 74 | env CHALLENGE_CONFIG_FILE="configs/challenge_random.local.yaml" python agent.py 75 | ``` 76 | This calls `eval.py`, which dumps a JSON file that contains a Python dictionary with the following structure: 77 | ```python 78 | eval_dict = {"ACTIONS": {f"{scene_id_1}_{episode_id_1}": [action_1_1, ..., 0], f"{scene_id_2}_{episode_id_2}": [action_2_1, ..., 0]}, "SPL": average_spl, "SOFT_SPL": average_softspl, "DISTANCE_TO_GOAL": average_distance_to_goal, "SUCCESS": average_success} 79 | ``` 80 | **Make sure that the JSON file dumped when evaluating your agent has exactly the same structure**. The easiest way to ensure that is by not modifying `eval.py`. 81 | 82 | 83 | For active AV separation, follow the instructions in the `challenge` branch of the [active-AV-dynamic-separation](https://github.com/SAGNIKMJR/active-AV-dynamic-separation) repository. 84 | 85 | ### Online submission 86 | 87 | Follow the instructions in the `submit` tab of the [EvalAI challenge page](https://eval.ai/web/challenges/challenge-page/1971/overview) (will open soon!) to **upload** your evaluation JSON file. 88 | 89 | Valid challenge phases are `AudioNav {Minival, Test-Standard} Phase` and `AudioSep Test-Standard Phase`. 90 | 91 | The challenge consists of the following phases: 92 | 93 | 1. **AudioNav Minival Phase**: This split is the same as the one used in `./test_locally_audionav_rgbd.sh`. The purpose of this phase/split is sanity checking -- to confirm that your online submission to EvalAI doesn't run into any issues during evaluation. Each team is allowed a maximum of 30 submissions per day for this phase. 94 | 2. **AudioNav Test-Standard Phase**: The purpose of this phase is to serve as the public leaderboard establishing the state of the art for AudioNav; this is what should be used to report results in papers. The relevant split for this phase is `test_multiple_unheard`. Each team is allowed a maximum of 10 submissions per day for this phase. 
As a reminder, the submitted trajectories will be used to validate the submitted performance values. Suspicious submissions will be reviewed and, if necessary, the participating team will be disqualified. 95 | 3. **AudioSep Test-Standard Phase**: The purpose of this phase is to serve as the public leaderboard establishing the state of the art for active AV separation; this is what should be used to report results in papers. The relevant split for this phase is `testUnheard_1000episodes`. Each team is allowed a maximum of 30 submissions per day for this phase. As a reminder, the winning teams of the active AV separation challenge will later be asked to turn in their code and checkpoints for inspection. Suspicious submissions will be reviewed and, if necessary, the participating team will be disqualified. 96 | 97 | Note: If you face any issues or have questions, you can email the organizers or open an issue on this repository. 98 | 99 | ### Baselines and Starter Code 100 | 1. **AudioNav**: We have included both the configs and Python scripts for [av-nav](https://github.com/facebookresearch/sound-spaces/tree/master/ss_baselines/av_nav) and [av-wan](https://github.com/facebookresearch/sound-spaces/tree/master/ss_baselines/av_wan). Note that the [MapNav environment](https://github.com/facebookresearch/sound-spaces/blob/soundspaces-challenge/ss_baselines/av_wan/mapnav_env.py) used by av-wan is baked into the environment container and can't be changed. If you want to modify mapping or planning, we suggest rewriting that planning for-loop in the agent code. 101 | 102 | 2. **Active AV Separation**: We have included configs and Python scripts in the `challenge` branch of the [active-AV-dynamic-separation](https://github.com/SAGNIKMJR/active-AV-dynamic-separation) repository. 103 | 104 | 105 | ## Acknowledgments 106 | 107 | We thank the Habitat team for the challenge template. 
108 | 109 | 110 | ## References 111 | 112 | [1] [SoundSpaces: Audio-Visual Navigation in 3D Environments](https://arxiv.org/pdf/1912.11474.pdf). Changan Chen\*, Unnat Jain\*, Carl Schissler, Sebastia Vicenc Amengual Gari, Ziad Al-Halah, Vamsi Krishna Ithapu, Philip Robinson, Kristen Grauman. ECCV, 2020. 113 | 114 | [2] [On evaluation of embodied navigation agents](https://arxiv.org/abs/1807.06757). Peter Anderson, Angel Chang, Devendra Singh Chaplot, Alexey Dosovitskiy, Saurabh Gupta, Vladlen Koltun, Jana Kosecka, Jitendra Malik, Roozbeh Mottaghi, Manolis Savva, Amir R. Zamir. arXiv:1807.06757, 2018. 115 | 116 | [3] [Active Audio-Visual Separation of Dynamic Sound Sources](https://arxiv.org/pdf/2202.00850.pdf). Sagnik Majumder, Kristen Grauman. ECCV, 2022. 117 | 118 | 119 | ## License 120 | This repo is MIT licensed, as found in the LICENSE file. 121 | -------------------------------------------------------------------------------- /__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/facebookresearch/soundspaces-challenge/0eaaa76c992bba49ad5cf0151493259bee771bcb/__init__.py -------------------------------------------------------------------------------- /agent.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | # Copyright (c) Facebook, Inc. and its affiliates. 4 | # This source code is licensed under the MIT license found in the 5 | # LICENSE file in the root directory of this source tree. 
6 | 7 | import argparse 8 | import os 9 | import random 10 | 11 | import numpy 12 | 13 | import habitat 14 | import soundspaces 15 | # from av_nav.config.default import get_task_config 16 | from ss_baselines.av_nav.config.default import get_task_config 17 | from eval import Challenge2022 18 | 19 | 20 | class RandomAgent(habitat.Agent): 21 | def __init__(self, task_config: habitat.Config): 22 | self._POSSIBLE_ACTIONS = task_config.TASK.POSSIBLE_ACTIONS 23 | 24 | def reset(self): 25 | pass 26 | 27 | def act(self, observations): 28 | return numpy.random.choice(len(self._POSSIBLE_ACTIONS)) 29 | 30 | 31 | def main(): 32 | parser = argparse.ArgumentParser() 33 | parser.add_argument( 34 | "--run-dir", type=str, default="runs/", 35 | ) 36 | args = parser.parse_args() 37 | 38 | config_paths = os.environ["CHALLENGE_CONFIG_FILE"] 39 | config = get_task_config(config_paths) 40 | agent = RandomAgent(task_config=config) 41 | 42 | challenge = Challenge2022() 43 | 44 | challenge.submit(agent, run_dir=args.run_dir, json_filename=f"random_{config.DATASET.SPLIT}.json") 45 | 46 | 47 | if __name__ == "__main__": 48 | main() 49 | -------------------------------------------------------------------------------- /avnav_agent.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | # Copyright (c) Facebook, Inc. and its affiliates. 4 | # This source code is licensed under the MIT license found in the 5 | # LICENSE file in the root directory of this source tree. 
6 | 7 | import argparse 8 | import os 9 | import random 10 | from collections import OrderedDict 11 | import logging 12 | import sys 13 | import time 14 | 15 | import numba 16 | import numpy as np 17 | import torch 18 | from gym.spaces import Box, Dict, Discrete 19 | 20 | import habitat 21 | from habitat import Config 22 | from habitat.core.agent import Agent 23 | import soundspaces 24 | from ss_baselines.av_nav.config import get_config 25 | from ss_baselines.av_nav.ppo.policy import AudioNavBaselinePolicy 26 | from ss_baselines.common.utils import batch_obs 27 | from eval import Challenge2022 28 | 29 | 30 | @numba.njit 31 | def _seed_numba(seed: int): 32 | random.seed(seed) 33 | np.random.seed(seed) 34 | 35 | 36 | class PPOAgent(Agent): 37 | def __init__(self, config: Config): 38 | spaces = { 39 | "spectrogram": Box( 40 | low=np.finfo(np.float32).min, 41 | high=np.finfo(np.float32).max, 42 | shape=(65, 26, 2), 43 | dtype=np.float32, 44 | ) 45 | } 46 | 47 | if config.INPUT_TYPE in ["depth", "rgbd"]: 48 | spaces["depth"] = Box( 49 | low=0, 50 | high=1, 51 | shape=( 52 | config.TASK_CONFIG.SIMULATOR.DEPTH_SENSOR.HEIGHT, 53 | config.TASK_CONFIG.SIMULATOR.DEPTH_SENSOR.WIDTH, 54 | 1, 55 | ), 56 | dtype=np.float32, 57 | ) 58 | 59 | if config.INPUT_TYPE in ["rgb", "rgbd"]: 60 | spaces["rgb"] = Box( 61 | low=0, 62 | high=255, 63 | shape=( 64 | config.TASK_CONFIG.SIMULATOR.RGB_SENSOR.HEIGHT, 65 | config.TASK_CONFIG.SIMULATOR.RGB_SENSOR.WIDTH, 66 | 3, 67 | ), 68 | dtype=np.uint8, 69 | ) 70 | observation_spaces = Dict(spaces) 71 | 72 | action_space = Discrete(len(config.TASK_CONFIG.TASK.POSSIBLE_ACTIONS)) 73 | 74 | self.device = torch.device("cuda:{}".format(config.TORCH_GPU_ID)) 75 | self.hidden_size = config.RL.PPO.hidden_size 76 | 77 | random.seed(config.RANDOM_SEED) 78 | np.random.seed(config.RANDOM_SEED) 79 | _seed_numba(config.RANDOM_SEED) 80 | torch.random.manual_seed(config.RANDOM_SEED) 81 | torch.backends.cudnn.deterministic = True 82 | policy_arguments = 
OrderedDict( 83 | observation_space=observation_spaces, 84 | action_space=action_space, 85 | hidden_size=self.hidden_size, 86 | extra_rgb=False, 87 | goal_sensor_uuid=config.TASK_CONFIG.TASK.GOAL_SENSOR_UUID 88 | ) 89 | 90 | self.actor_critic = AudioNavBaselinePolicy(**policy_arguments) 91 | self.actor_critic.to(self.device) 92 | 93 | if config.MODEL_PATH: 94 | ckpt = torch.load(config.MODEL_PATH, map_location=self.device) 95 | print(f"Checkpoint loaded: {config.MODEL_PATH}") 96 | # Filter only actor_critic weights 97 | self.actor_critic.load_state_dict( 98 | { 99 | k.replace("actor_critic.", ""): v 100 | for k, v in ckpt["state_dict"].items() 101 | if "actor_critic" in k 102 | } 103 | ) 104 | 105 | else: 106 | habitat.logger.error( 107 | "Model checkpoint wasn't loaded, evaluating " "a random model." 108 | ) 109 | 110 | self.test_recurrent_hidden_states = None 111 | self.not_done_masks = None 112 | self.prev_actions = None 113 | 114 | def reset(self): 115 | self.test_recurrent_hidden_states = torch.zeros( 116 | self.actor_critic.net.num_recurrent_layers, 117 | 1, 118 | self.hidden_size, 119 | device=self.device, 120 | ) 121 | self.not_done_masks = torch.zeros(1, 1, device=self.device) 122 | self.prev_actions = torch.zeros(1, 1, dtype=torch.long, device=self.device) 123 | 124 | def act(self, observations): 125 | batch = batch_obs([observations], device=self.device) 126 | 127 | with torch.no_grad(): 128 | _, action, _, self.test_recurrent_hidden_states = self.actor_critic.act( 129 | batch, 130 | self.test_recurrent_hidden_states, 131 | self.prev_actions, 132 | self.not_done_masks, 133 | deterministic=False, 134 | ) 135 | # Make masks not done till reset (end of episode) will be called 136 | self.not_done_masks.fill_(1.0) 137 | self.prev_actions.copy_(action) 138 | 139 | return action.item() 140 | 141 | 142 | def main(): 143 | parser = argparse.ArgumentParser() 144 | parser.add_argument( 145 | "--input-type", default="blind", choices=["blind", "rgb", "depth", "rgbd"] 
146 | ) 147 | config_paths = os.environ["CHALLENGE_CONFIG_FILE"] 148 | parser.add_argument("--model-path", default="", type=str) 149 | parser.add_argument( 150 | "--run-dir", type=str, default="runs/", 151 | ) 152 | args = parser.parse_args() 153 | 154 | config = get_config( 155 | "configs/avnav_agent.yaml", ["BASE_TASK_CONFIG_PATH", config_paths] 156 | ).clone() 157 | 158 | config.defrost() 159 | config.TORCH_GPU_ID = 0 160 | config.INPUT_TYPE = args.input_type 161 | config.MODEL_PATH = args.model_path 162 | 163 | config.RANDOM_SEED = 7 164 | config.freeze() 165 | 166 | agent = PPOAgent(config) 167 | 168 | challenge = Challenge2022() 169 | challenge._env.seed(config.RANDOM_SEED) 170 | 171 | print("Start evaluating ...") 172 | challenge.submit(agent, run_dir=args.run_dir, json_filename=f"avnav_{config.TASK_CONFIG.DATASET.SPLIT}.json") 173 | 174 | 175 | if __name__ == "__main__": 176 | main() -------------------------------------------------------------------------------- /avwan_agent.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | # Copyright (c) Facebook, Inc. and its affiliates. 4 | # This source code is licensed under the MIT license found in the 5 | # LICENSE file in the root directory of this source tree. 
6 | 7 | import argparse 8 | import os 9 | import random 10 | from collections import OrderedDict 11 | import logging 12 | import sys 13 | import time 14 | 15 | import numba 16 | import numpy as np 17 | import torch 18 | from gym.spaces import Box, Dict, Discrete 19 | 20 | import habitat 21 | from habitat import Config 22 | from habitat.core.agent import Agent 23 | import soundspaces 24 | from ss_baselines.av_wan.config.default import get_config 25 | from ss_baselines.av_wan.ppo.policy import AudioNavBaselinePolicy 26 | from ss_baselines.common.utils import batch_obs 27 | from eval import Challenge2022 28 | 29 | 30 | @numba.njit 31 | def _seed_numba(seed: int): 32 | random.seed(seed) 33 | np.random.seed(seed) 34 | 35 | 36 | class PPOAgent(Agent): 37 | def __init__(self, config: Config): 38 | spaces = { 39 | "spectrogram": Box( 40 | low=np.finfo(np.float32).min, 41 | high=np.finfo(np.float32).max, 42 | shape=(65, 26, 2), 43 | dtype=np.float32, 44 | ), 45 | "gm": Box( 46 | low=np.finfo(np.float32).min, 47 | high=np.finfo(np.float32).max, 48 | shape=(400, 400, 2), 49 | dtype=np.float32, 50 | ), 51 | "am": Box( 52 | low=np.finfo(np.float32).min, 53 | high=np.finfo(np.float32).max, 54 | shape=(20, 20, 1), 55 | dtype=np.float32, 56 | ), 57 | "action_map": Box( 58 | low=np.finfo(np.float32).min, 59 | high=np.finfo(np.float32).max, 60 | shape=(9, 9, 1), 61 | dtype=np.float32, 62 | ), 63 | } 64 | 65 | if config.INPUT_TYPE in ["depth", "rgbd"]: 66 | spaces["depth"] = Box( 67 | low=0, 68 | high=1, 69 | shape=( 70 | config.TASK_CONFIG.SIMULATOR.DEPTH_SENSOR.HEIGHT, 71 | config.TASK_CONFIG.SIMULATOR.DEPTH_SENSOR.WIDTH, 72 | 1, 73 | ), 74 | dtype=np.float32, 75 | ) 76 | 77 | if config.INPUT_TYPE in ["rgb", "rgbd"]: 78 | spaces["rgb"] = Box( 79 | low=0, 80 | high=255, 81 | shape=( 82 | config.TASK_CONFIG.SIMULATOR.RGB_SENSOR.HEIGHT, 83 | config.TASK_CONFIG.SIMULATOR.RGB_SENSOR.WIDTH, 84 | 3, 85 | ), 86 | dtype=np.uint8, 87 | ) 88 | observation_spaces = Dict(spaces) 89 | 90 | 
action_space = Discrete(len(config.TASK_CONFIG.TASK.POSSIBLE_ACTIONS)) 91 | 92 | self.device = torch.device("cuda:{}".format(config.TORCH_GPU_ID)) 93 | self.hidden_size = config.RL.PPO.hidden_size 94 | 95 | random.seed(config.RANDOM_SEED) 96 | np.random.seed(config.RANDOM_SEED) 97 | _seed_numba(config.RANDOM_SEED) 98 | torch.random.manual_seed(config.RANDOM_SEED) 99 | torch.backends.cudnn.deterministic = True 100 | policy_arguments = OrderedDict( 101 | observation_space=observation_spaces, 102 | hidden_size=self.hidden_size, 103 | goal_sensor_uuid=config.TASK_CONFIG.TASK.GOAL_SENSOR_UUID, 104 | masking=config.MASKING, 105 | action_map_size=9 106 | ) 107 | 108 | self.actor_critic = AudioNavBaselinePolicy(**policy_arguments) 109 | self.actor_critic.to(self.device) 110 | 111 | if config.MODEL_PATH: 112 | ckpt = torch.load(config.MODEL_PATH, map_location=self.device) 113 | print(f"Checkpoint loaded: {config.MODEL_PATH}") 114 | # Filter only actor_critic weights 115 | self.actor_critic.load_state_dict( 116 | { 117 | k.replace("actor_critic.", ""): v 118 | for k, v in ckpt["state_dict"].items() 119 | if "actor_critic" in k 120 | } 121 | ) 122 | 123 | else: 124 | habitat.logger.error( 125 | "Model checkpoint wasn't loaded, evaluating " "a random model." 
126 | ) 127 | 128 | self.test_recurrent_hidden_states = None 129 | self.not_done_masks = None 130 | self.prev_actions = None 131 | 132 | def reset(self): 133 | self.test_recurrent_hidden_states = torch.zeros( 134 | self.actor_critic.net.num_recurrent_layers, 135 | 1, 136 | self.hidden_size, 137 | device=self.device, 138 | ) 139 | self.not_done_masks = torch.zeros(1, 1, device=self.device) 140 | self.prev_actions = torch.zeros(1, 1, dtype=torch.long, device=self.device) 141 | 142 | def act(self, observations): 143 | batch = batch_obs([observations], device=self.device) 144 | 145 | with torch.no_grad(): 146 | _, action, _, self.test_recurrent_hidden_states, _ = self.actor_critic.act( 147 | batch, 148 | self.test_recurrent_hidden_states, 149 | self.prev_actions, 150 | self.not_done_masks, 151 | deterministic=True, 152 | ) 153 | # Make masks not done till reset (end of episode) will be called 154 | self.not_done_masks.fill_(1.0) 155 | self.prev_actions.copy_(action) 156 | 157 | return action.item() 158 | 159 | 160 | def main(): 161 | parser = argparse.ArgumentParser() 162 | parser.add_argument( 163 | "--input-type", default="blind", choices=["blind", "rgb", "depth", "rgbd"] 164 | ) 165 | config_paths = os.environ["CHALLENGE_CONFIG_FILE"] 166 | parser.add_argument("--model-path", default="", type=str) 167 | parser.add_argument( 168 | "--run-dir", type=str, default="runs/", 169 | ) 170 | args = parser.parse_args() 171 | 172 | config = get_config( 173 | "configs/avwan_agent.yaml", ["BASE_TASK_CONFIG_PATH", config_paths] 174 | ).clone() 175 | config.defrost() 176 | config.TORCH_GPU_ID = 0 177 | config.INPUT_TYPE = args.input_type 178 | config.MODEL_PATH = args.model_path 179 | 180 | config.RANDOM_SEED = 7 181 | config.freeze() 182 | logging.basicConfig(level=logging.INFO, format='%(asctime)s, %(levelname)s: %(message)s', 183 | datefmt="%Y-%m-%d %H:%M:%S") 184 | 185 | agent = PPOAgent(config) 186 | 187 | challenge = Challenge2022() 188 | 
challenge._env.seed(config.RANDOM_SEED) 189 | 190 | print("Start evaluating ...") 191 | challenge.submit(agent, run_dir=args.run_dir, json_filename=f"avwan_{config.TASK_CONFIG.DATASET.SPLIT}.json") 192 | 193 | 194 | if __name__ == "__main__": 195 | main() -------------------------------------------------------------------------------- /configs/avnav_agent.yaml: -------------------------------------------------------------------------------- 1 | BASE_TASK_CONFIG_PATH: "configs/challenge_audionav.local.rgbd.yaml" 2 | TRAINER_NAME: "ppo" 3 | ENV_NAME: "NavRLEnv" 4 | SIMULATOR_GPU_ID: 0 5 | TORCH_GPU_ID: 0 6 | VIDEO_OPTION: [] 7 | TENSORBOARD_DIR: "tb" 8 | VIDEO_DIR: "video_dir" 9 | TEST_EPISODE_COUNT: 10 10 | EVAL_CKPT_PATH_DIR: "data/new_checkpoints" 11 | NUM_PROCESSES: 4 12 | SENSORS: ["DEPTH_SENSOR"] 13 | CHECKPOINT_FOLDER: "data/new_checkpoints" 14 | NUM_UPDATES: 10000 15 | LOG_INTERVAL: 10 16 | CHECKPOINT_INTERVAL: 50 17 | 18 | RL: 19 | PPO: 20 | # ppo params 21 | clip_param: 0.1 22 | ppo_epoch: 4 23 | num_mini_batch: 1 24 | value_loss_coef: 0.5 25 | entropy_coef: 0.20 26 | lr: 2.5e-4 27 | eps: 1e-5 28 | max_grad_norm: 0.5 29 | # decide the length of history that ppo encodes 30 | num_steps: 150 31 | hidden_size: 512 32 | use_gae: True 33 | gamma: 0.99 34 | tau: 0.95 35 | use_linear_clip_decay: True 36 | use_linear_lr_decay: True 37 | # window size for calculating the past rewards 38 | reward_window_size: 50 -------------------------------------------------------------------------------- /configs/avwan_agent.yaml: -------------------------------------------------------------------------------- 1 | BASE_TASK_CONFIG_PATH: "configs/challenge_avwan.local.rgbd.yaml" 2 | TRAINER_NAME: "AVWanTrainer" 3 | ENV_NAME: "MapNavEnv" 4 | SIMULATOR_GPU_ID: 0 5 | TORCH_GPU_ID: 0 6 | VIDEO_OPTION: [] 7 | TENSORBOARD_DIR: "tb" 8 | VIDEO_DIR: "video_dir" 9 | TEST_EPISODE_COUNT: 10 10 | EVAL_CKPT_PATH_DIR: "data/new_checkpoints" 11 | NUM_PROCESSES: 4 12 | SENSORS: ["DEPTH_SENSOR"] 13 
| CHECKPOINT_FOLDER: "data/new_checkpoints" 14 | NUM_UPDATES: 10000 15 | LOG_INTERVAL: 10 16 | CHECKPOINT_INTERVAL: 50 17 | 18 | RL: 19 | PPO: 20 | # ppo params 21 | clip_param: 0.1 22 | ppo_epoch: 4 23 | num_mini_batch: 1 24 | value_loss_coef: 0.5 25 | entropy_coef: 0.20 26 | lr: 2.5e-4 27 | eps: 1e-5 28 | max_grad_norm: 0.5 29 | # decide the length of history that ppo encodes 30 | num_steps: 150 31 | hidden_size: 512 32 | use_gae: True 33 | gamma: 0.99 34 | tau: 0.95 35 | use_linear_clip_decay: True 36 | use_linear_lr_decay: True 37 | # window size for calculating the past rewards 38 | reward_window_size: 50 -------------------------------------------------------------------------------- /configs/challenge_audionav.local.rgbd.yaml: -------------------------------------------------------------------------------- 1 | ENVIRONMENT: 2 | MAX_EPISODE_STEPS: 500 3 | SIMULATOR: 4 | HABITAT_SIM_V0: 5 | GPU_DEVICE_ID: 0 6 | RGB_SENSOR: 7 | WIDTH: 128 8 | HEIGHT: 128 9 | DEPTH_SENSOR: 10 | WIDTH: 128 11 | HEIGHT: 128 12 | 13 | TYPE: "SoundSpacesSim" 14 | ACTION_SPACE_CONFIG: "v0" 15 | SCENE_DATASET: "mp3d" 16 | GRID_SIZE: 1.0 17 | AUDIO: 18 | RIR_SAMPLING_RATE: 16000 19 | AGENT_0: 20 | SENSORS: ['DEPTH_SENSOR'] 21 | 22 | TASK: 23 | TYPE: AudioNav 24 | SUCCESS_DISTANCE: 0.2 25 | 26 | # SENSORS: ['SPECTROGRAM_SENSOR', 'EGOMAP_SENSOR', "GEOMETRIC_MAP", "ACTION_MAP", 'ACOUSTIC_MAP', 'INTENSITY', 'COLLISION'] 27 | SENSORS: ['SPECTROGRAM_SENSOR'] 28 | GOAL_SENSOR_UUID: spectrogram 29 | 30 | MEASUREMENTS: ['DISTANCE_TO_GOAL', 'NORMALIZED_DISTANCE_TO_GOAL', 'NUM_ACTION', 'SUCCESS_WEIGHTED_BY_NUM_ACTION', 'SUCCESS', 'SPL', 'SOFT_SPL'] 31 | SPL: 32 | TYPE: SPL 33 | GEOMETRIC_MAP: 34 | MAP_SIZE: 400 35 | INTERNAL_MAP_SIZE: 1200 36 | MAP_RESOLUTION: 0.1 37 | ACOUSTIC_MAP: 38 | MAP_SIZE: 20 39 | MAP_RESOLUTION: 1.0 40 | ACTION_MAP: 41 | MAP_SIZE: 9 42 | MAP_RESOLUTION: 1.0 43 | 44 | DATASET: 45 | TYPE: "AudioNav" 46 | SPLIT: "val_mini" 47 | CONTENT_SCENES: ["*"] 48 | VERSION: 'v1' 49 | 
SCENES_DIR: "data/scene_datasets/mp3d"
50 |   DATA_PATH: "data/datasets/audionav/mp3d/{version}/{split}/{split}.json.gz"
51 |
--------------------------------------------------------------------------------
/configs/challenge_avwan.local.rgbd.yaml:
--------------------------------------------------------------------------------
1 | ENVIRONMENT:
2 |   MAX_EPISODE_STEPS: 500
3 | SIMULATOR:
4 |   HABITAT_SIM_V0:
5 |     GPU_DEVICE_ID: 0
6 |   RGB_SENSOR:
7 |     WIDTH: 128
8 |     HEIGHT: 128
9 |   DEPTH_SENSOR:
10 |     WIDTH: 128
11 |     HEIGHT: 128
12 |
13 |   TYPE: "SoundSpacesSim"
14 |   ACTION_SPACE_CONFIG: "v0"
15 |   SCENE_DATASET: "mp3d"
16 |   GRID_SIZE: 1.0
17 |   AUDIO:
18 |     RIR_SAMPLING_RATE: 16000
19 |   AGENT_0:
20 |     SENSORS: ['DEPTH_SENSOR']
21 |
22 | TASK:
23 |   TYPE: AudioNav
24 |   SUCCESS_DISTANCE: 0.2
25 |
26 |   SENSORS: ['SPECTROGRAM_SENSOR', 'EGOMAP_SENSOR', "GEOMETRIC_MAP", "ACTION_MAP", 'ACOUSTIC_MAP', 'INTENSITY', 'COLLISION']
27 |   GOAL_SENSOR_UUID: spectrogram
28 |
29 |   MEASUREMENTS: ['DISTANCE_TO_GOAL', 'NORMALIZED_DISTANCE_TO_GOAL', 'NUM_ACTION', 'SUCCESS_WEIGHTED_BY_NUM_ACTION', 'SUCCESS', 'SPL', 'SOFT_SPL']
30 |   SPL:
31 |     TYPE: SPL
32 |   GEOMETRIC_MAP:
33 |     MAP_SIZE: 400
34 |     INTERNAL_MAP_SIZE: 1200
35 |     MAP_RESOLUTION: 0.1
36 |   ACOUSTIC_MAP:
37 |     MAP_SIZE: 20
38 |     MAP_RESOLUTION: 1.0
39 |   ACTION_MAP:
40 |     MAP_SIZE: 9
41 |     MAP_RESOLUTION: 1.0
42 |
43 | DATASET:
44 |   TYPE: "AudioNav"
45 |   SPLIT: "val_mini"
46 |   CONTENT_SCENES: ["*"]
47 |   VERSION: 'v1'
48 |   SCENES_DIR: "data/scene_datasets/mp3d"
49 |   DATA_PATH: "data/datasets/audionav/mp3d/{version}/{split}/{split}.json.gz"
50 |
--------------------------------------------------------------------------------
/configs/challenge_random.local.yaml:
--------------------------------------------------------------------------------
1 | ENVIRONMENT:
2 |   MAX_EPISODE_STEPS: 500
3 | SIMULATOR:
4 |   HABITAT_SIM_V0:
5 |     GPU_DEVICE_ID: 0
6 |   RGB_SENSOR:
7 |     WIDTH: 128
8 |     HEIGHT: 128
9 |   DEPTH_SENSOR:
10 |     WIDTH: 128
11 |     HEIGHT: 128
12 |
13 |   TYPE: "SoundSpacesSim"
14 |   ACTION_SPACE_CONFIG: "v0"
15 |   SCENE_DATASET: "mp3d"
16 |   GRID_SIZE: 1.0
17 |   AUDIO:
18 |     RIR_SAMPLING_RATE: 16000
19 |   AGENT_0:
20 |     SENSORS: ['DEPTH_SENSOR']
21 |
22 | TASK:
23 |   TYPE: AudioNav
24 |   SUCCESS_DISTANCE: 0.2
25 |
26 |   SENSORS: ['SPECTROGRAM_SENSOR']
27 |   GOAL_SENSOR_UUID: spectrogram
28 |
29 |   MEASUREMENTS: ['DISTANCE_TO_GOAL', 'NORMALIZED_DISTANCE_TO_GOAL', 'NUM_ACTION', 'SUCCESS_WEIGHTED_BY_NUM_ACTION', 'SUCCESS', 'SPL', 'SOFT_SPL']
30 |   SPL:
31 |     TYPE: SPL
32 |   GEOMETRIC_MAP:
33 |     MAP_SIZE: 400
34 |     INTERNAL_MAP_SIZE: 1200
35 |     MAP_RESOLUTION: 0.1
36 |   ACOUSTIC_MAP:
37 |     MAP_SIZE: 20
38 |     MAP_RESOLUTION: 1.0
39 |   ACTION_MAP:
40 |     MAP_SIZE: 9
41 |     MAP_RESOLUTION: 1.0
42 |
43 | DATASET:
44 |   TYPE: "AudioNav"
45 |   SPLIT: "val_mini"
46 |   CONTENT_SCENES: ["*"]
47 |   VERSION: 'v1'
48 |   SCENES_DIR: "data/scene_datasets/mp3d"
49 |   DATA_PATH: "data/datasets/audionav/mp3d/{version}/{split}/{split}.json.gz"
50 |
--------------------------------------------------------------------------------
/eval.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python3
2 |
3 | # Copyright (c) Facebook, Inc. and its affiliates.
4 | # All rights reserved.
5 |
6 | # This source code is licensed under the license found in the
7 | # LICENSE file in the root directory of this source tree.
8 |
9 | import os
10 | import json
11 | from tqdm import tqdm
12 | from collections import defaultdict
13 | from typing import Dict, Optional, Tuple
14 |
15 | from habitat.core.logging import logger
16 | from soundspaces.challenge import Challenge
17 |
18 |
19 | EVAL_DCT_KEY_TO_METRIC_NAME = {"SPL": "spl", "SOFT_SPL": "softspl", "DISTANCE_TO_GOAL": "distance_to_goal", "SUCCESS": "success"}
20 |
21 |
22 | class Challenge2022(Challenge):
23 |     def __init__(self, eval_remote=False):
24 |         config_paths = os.environ["CHALLENGE_CONFIG_FILE"]  # raises KeyError if the variable is unset
25 |         super().__init__(eval_remote=eval_remote)
26 |
27 |     def evaluate_custom(
28 |         self, agent: "Agent", num_episodes: Optional[int] = None
29 |     ) -> Tuple[Dict[str, float], Dict]:
30 |         if num_episodes is None:
31 |             num_episodes = len(self._env.episodes)
32 |         else:
33 |             assert num_episodes <= len(self._env.episodes), (
34 |                 "num_episodes({}) is larger than number of episodes "
35 |                 "in environment ({})".format(
36 |                     num_episodes, len(self._env.episodes)
37 |                 )
38 |             )
39 |
40 |         assert num_episodes > 0, "num_episodes should be greater than 0"
41 |
42 |         agg_metrics: Dict = defaultdict(float)
43 |         trajs: Dict = defaultdict(list)
44 |
45 |         for count_episodes in tqdm(range(num_episodes)):
46 |             agent.reset()
47 |             observations = self._env.reset()
48 |
49 |             scene_id = self._env.current_episode.scene_id.split("/")[-1].split(".")[0]
50 |             episode_id = self._env.current_episode.episode_id
51 |             scene_episode_id = f"{scene_id}_{episode_id}"
52 |
53 |             while not self._env.episode_over:
54 |                 action = agent.act(observations)
55 |                 observations = self._env.step(action)
56 |
57 |                 if scene_episode_id not in trajs:
58 |                     trajs[scene_episode_id] = [action]
59 |                 else:
60 |                     trajs[scene_episode_id].append(action)
61 |
62 |             metrics = self._env.get_metrics()
63 |             for m, v in metrics.items():
64 |                 agg_metrics[m] += v
65 |
66 |         avg_metrics = {k: v / (count_episodes + 1) for k, v in agg_metrics.items()}
67 |
68 |         eval_dct = {"ACTIONS": trajs}
69 |         for eval_dct_key in EVAL_DCT_KEY_TO_METRIC_NAME:
70 |             assert EVAL_DCT_KEY_TO_METRIC_NAME[eval_dct_key] in avg_metrics
71 |             eval_dct[eval_dct_key] = float(f"{avg_metrics[EVAL_DCT_KEY_TO_METRIC_NAME[eval_dct_key]]:.2f}")
72 |
73 |         return avg_metrics, eval_dct
74 |
75 |     def submit(self, agent, run_dir, json_filename):
76 |         metrics, eval_dct = self.evaluate_custom(agent)
77 |
78 |         for k, v in metrics.items():
79 |             logger.info("{}: {}".format(k, v))
80 |
81 |         if not os.path.isdir(run_dir):
82 |             os.makedirs(run_dir)
83 |         with open(os.path.join(run_dir, json_filename), "w") as fo:
84 |             json.dump(eval_dct, fo)
85 |
--------------------------------------------------------------------------------
/example_ckpt.pth:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/facebookresearch/soundspaces-challenge/0eaaa76c992bba49ad5cf0151493259bee771bcb/example_ckpt.pth
--------------------------------------------------------------------------------
/res/img/sep_metrics.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/facebookresearch/soundspaces-challenge/0eaaa76c992bba49ad5cf0151493259bee771bcb/res/img/sep_metrics.png
--------------------------------------------------------------------------------
/res/img/soundspaces-logo.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/facebookresearch/soundspaces-challenge/0eaaa76c992bba49ad5cf0151493259bee771bcb/res/img/soundspaces-logo.png
--------------------------------------------------------------------------------
/res/img/spl.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/facebookresearch/soundspaces-challenge/0eaaa76c992bba49ad5cf0151493259bee771bcb/res/img/spl.png
--------------------------------------------------------------------------------
/test_locally_audionav_rgbd.sh:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env bash
2 |
3 | env CHALLENGE_CONFIG_FILE="configs/challenge_audionav.local.rgbd.yaml" python avnav_agent.py --input-type depth --model-path example_ckpt.pth
4 |
--------------------------------------------------------------------------------
/test_locally_avwan_rgbd.sh:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env bash
2 |
3 | env CHALLENGE_CONFIG_FILE="configs/challenge_avwan.local.rgbd.yaml" python avwan_agent.py --input-type $INPUT_TYPE --model-path CKPT_NAME.pth "$@"
4 |
--------------------------------------------------------------------------------
/test_locally_random.sh:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env bash
2 |
3 | env CHALLENGE_CONFIG_FILE="configs/challenge_random.local.yaml" python agent.py
4 |
--------------------------------------------------------------------------------