├── .gitignore
├── CODE_OF_CONDUCT.md
├── CONTRIBUTING.md
├── LICENSE
├── README.md
├── __init__.py
├── agent.py
├── avnav_agent.py
├── avwan_agent.py
├── configs
│   ├── avnav_agent.yaml
│   ├── avwan_agent.yaml
│   ├── challenge_audionav.local.rgbd.yaml
│   ├── challenge_avwan.local.rgbd.yaml
│   └── challenge_random.local.yaml
├── eval.py
├── example_ckpt.pth
├── res
│   └── img
│       ├── sep_metrics.png
│       ├── soundspaces-logo.png
│       └── spl.png
├── test_locally_audionav_rgbd.sh
├── test_locally_avwan_rgbd.sh
└── test_locally_random.sh


/.gitignore:
--------------------------------------------------------------------------------
1 | .DS_Store
2 | .vscode
3 | data
4 | # *.pth
5 | runs
6 | __pycache__
7 | 
8 | 


--------------------------------------------------------------------------------
/CODE_OF_CONDUCT.md:
--------------------------------------------------------------------------------
1 | # Code of Conduct
2 | 
3 | Facebook has adopted a Code of Conduct that we expect project participants to adhere to.
4 | Please read the [full text](https://code.fb.com/codeofconduct/)
5 | so that you can understand what actions will and will not be tolerated.
6 | 
7 | 


--------------------------------------------------------------------------------
/CONTRIBUTING.md:
--------------------------------------------------------------------------------
 1 | # Contributing to soundspaces-challenge
 2 | We want to make contributing to this project as easy and transparent as
 3 | possible.
 4 | 
 5 | ## Pull Requests
 6 | We actively welcome your pull requests.
 7 | 
 8 | 1. Fork the repo and create your branch from `master`.
 9 | 2. If you haven't already, complete the Contributor License Agreement ("CLA").
10 | 
11 | ## Contributor License Agreement ("CLA")
12 | In order to accept your pull request, we need you to submit a CLA. You only need
13 | to do this once to work on any of Facebook's open source projects.
14 | 
15 | Complete your CLA here: <https://code.facebook.com/cla>
16 | 
17 | ## Issues
18 | We use GitHub issues to track public bugs. Please ensure your description is
19 | clear and has sufficient instructions to be able to reproduce the issue.
20 | 
21 | ## License
22 | By contributing to soundspaces-challenge, you agree that your contributions will be licensed
23 | under the LICENSE file in the root directory of this source tree.
24 | 
25 | 


--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
 1 | MIT License
 2 | 
 3 | Copyright (c) Facebook, Inc. and its affiliates.
 4 | 
 5 | Permission is hereby granted, free of charge, to any person obtaining a copy
 6 | of this software and associated documentation files (the "Software"), to deal
 7 | in the Software without restriction, including without limitation the rights
 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
  1 | <p align="center">
  2 |   <img width = "50%" src='res/img/soundspaces-logo.png' />
  3 |   </p>
  4 | 
  5 | --------------------------------------------------------------------------------
  6 | 
  7 | # SoundSpaces Challenge 2023
  8 | 
  9 | This repository contains starter code for the 2023 challenge, details of the tasks, and training and evaluation setups. For an overview of the SoundSpaces Challenge, visit [soundspaces.org/challenge](https://soundspaces.org/challenge/).
 10 | 
 11 | This year, we are hosting two challenges. The first is on the audio-visual navigation task [1], where an agent must find a sound-emitting object in unmapped 3D environments using visual and auditory perception. The second is on the active audio-visual source separation task [3], where an agent must separate the time-varying sound of a target object from an audio mixture of spatial, time-varying sounds emitted by multiple sources.
 12 | 
 13 | 
 14 | ## Task
 15 | In AudioGoal navigation (AudioNav), an agent is spawned at a random starting position and orientation in an unseen environment. A sound-emitting object is also spawned at a random location in the same environment. At each time step, the agent receives a one-second audio input in the form of a waveform and must navigate to the target location. No ground-truth map is available; the agent must rely only on its sensory input (audio and RGB-D) to navigate.
 16 | 
 17 | In Active Audio-Visual Separation (active AV separation), an agent is spawned at a random starting position and orientation in an unseen environment. Multiple sound-emitting objects, each emitting a time-varying sound, are also spawned at random locations in the same environment.
 18 | At each time step, the agent receives a one-second audio input in the form of a waveform, which is a mixture of the spatial sounds from all sources. It must move so as to separate the audio of a target source, specified by a target class label, at every step of its motion.
 19 | No ground-truth map is available; the agent must rely only on its sensory input (audio and RGB) to navigate. The current version of the challenge considers separation scenarios such as speech vs. speech and speech vs. music.
 20 | 
 21 | ### Dataset
 22 | The challenge will be conducted on the <a href="https://github.com/facebookresearch/sound-spaces/blob/master/soundspaces/README.md">SoundSpaces Dataset</a>, which is based on <a href="https://aihabitat.org/">AI Habitat</a>, <a href="https://niessner.github.io/Matterport//">Matterport3D</a>, and <a href="https://github.com/facebookresearch/Replica-Dataset">Replica</a>. For this challenge, we use the Matterport3D dataset due to its diversity and scale of environments. This challenge focuses on evaluating agents' ability to generalize to unheard sounds and unseen environments. For AudioNav, the training and validation splits are the same as used in <i>Unheard Sound</i> experiments reported in the <a href="http://vision.cs.utexas.edu/projects/audio_visual_navigation/">SoundSpaces paper</a>. They can be downloaded from the <a href="https://github.com/facebookresearch/sound-spaces/tree/master/soundspaces">SoundSpaces dataset page</a> (including minival). For active AV separation, the training and validation splits are the same as used in <i>Unheard Sound</i> experiments reported in the <a href="https://vision.cs.utexas.edu/projects/active-av-dynamic-separation//">Active AV Dynamic Separation paper</a>. 
 23 | 
 24 | ### Evaluation
 25 | For AudioNav, the agent is evaluated after it calls the STOP action, using the 'Success weighted by Path Length' (SPL) metric [2]. An episode is deemed successful if, on calling STOP, the agent is within 0.36 m (twice the agent radius) of the goal position.
 26 | 
 27 | <p align="center">
 28 |   <img src='res/img/spl.png' />
 29 | </p>
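The SPL definition from [2] averages, over N episodes, S_i · ℓ_i / max(p_i, ℓ_i). Here S_i is the binary success indicator, ℓ_i is the shortest-path distance from start to goal, and p_i is the length of the path the agent actually took. The following is a minimal sketch of that formula and of the 0.36 m success rule stated above. It is not the challenge's evaluation code, and the names `is_success`, `spl`, and `SUCCESS_RADIUS_M` are hypothetical.

```python
from typing import Sequence

SUCCESS_RADIUS_M = 0.36  # success radius stated in this README (2x agent radius)


def is_success(called_stop: bool, distance_to_goal: float) -> bool:
    """An episode succeeds only if STOP was called within the success radius."""
    return called_stop and distance_to_goal <= SUCCESS_RADIUS_M


def spl(successes: Sequence[bool], shortest: Sequence[float], taken: Sequence[float]) -> float:
    """Success weighted by Path Length, averaged over all episodes."""
    if not (len(successes) == len(shortest) == len(taken)):
        raise ValueError("successes, shortest and taken must have the same length")
    if len(successes) == 0:
        raise ValueError("need at least one episode")
    total = 0.0
    for success, l_i, p_i in zip(successes, shortest, taken):
        if success:
            # max(p_i, l_i) caps each episode's contribution at 1
            total += l_i / max(p_i, l_i)
    return total / len(successes)
```

For example, `spl([True, False, True], [5.0, 3.0, 4.0], [10.0, 2.0, 4.0])` gives (0.5 + 0 + 1) / 3 = 0.5. Failed episodes contribute zero, no matter how short their paths are.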
 30 | 
 31 | 
 32 | For active AV separation, the agent is evaluated using the 'Scale-invariant source-to-distortion ratio' (SI-SDR) metric, averaged over the whole agent trajectory.
 33 | 
 34 | <p align="center">
 35 |   <img width="500" height="250" src='res/img/sep_metrics.png' />
 36 | </p>
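SI-SDR is commonly computed by projecting the estimate ŝ onto the reference s. With α = ⟨ŝ, s⟩ / ‖s‖², SI-SDR = 10 log10(‖α s‖² / ‖α s − ŝ‖²) in dB. The sketch below implements that common zero-mean definition with NumPy and averages per-step scores over a trajectory. Whether the challenge evaluator zero-means the signals, uses a stabilizing epsilon, or averages in exactly this way is an assumption, so treat the figure above as authoritative. The function names are hypothetical.

```python
from typing import Sequence

import numpy as np


def si_sdr(reference: np.ndarray, estimate: np.ndarray, eps: float = 1e-8) -> float:
    """Scale-invariant SDR (dB) for one pair of 1-D signals."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    if reference.shape != estimate.shape:
        raise ValueError("reference and estimate must have the same shape")
    # Zero-mean both signals (a common convention, assumed here)
    reference = reference - reference.mean()
    estimate = estimate - estimate.mean()
    # Optimal scaling of the reference; this is what makes the metric scale-invariant
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference
    residual = estimate - target
    return float(10.0 * np.log10((np.dot(target, target) + eps) / (np.dot(residual, residual) + eps)))


def trajectory_si_sdr(references: Sequence[np.ndarray], estimates: Sequence[np.ndarray]) -> float:
    """Mean of the per-step SI-SDR values along one agent trajectory."""
    if len(references) != len(estimates) or len(references) == 0:
        raise ValueError("need the same, non-zero number of references and estimates")
    return float(np.mean([si_sdr(r, e) for r, e in zip(references, estimates)]))
```

Because of the projection, multiplying an estimate by any non-zero constant leaves its score unchanged. A separator therefore gains nothing from simply scaling its output.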
 37 | 
 38 | ## Participation Guidelines
 39 | 
 40 | Participate in the contest by registering on the [EvalAI challenge page](https://eval.ai/web/challenges/challenge-page/1971/overview)<!--EvalAI challenge page (coming soon)--> and creating a team. Participants will upload JSON files containing the evaluation metric values for both challenges. For AudioNav, participants will also upload the trajectories executed by their model, which will be used to validate the submitted performance values. For active AV separation, the winning teams will later be asked to turn in their code and checkpoints for inspection. Suspicious submissions will be reviewed and, if necessary, the participating team will be disqualified. Instructions for evaluation and online submission are provided below.
 41 | 
 42 | ### Evaluation
 43 | For AudioNav:
 44 | 1. Clone the challenge repository:  
 45 | 
 46 |     ```bash
 47 |     git clone https://github.com/facebookresearch/soundspaces-challenge.git
 48 |     cd soundspaces-challenge
 49 |     ```
 50 | 
 51 | 1. Implement your own agent or try one of ours. We provide an agent in `agent.py` that takes random actions:
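 52 |     ```python
 53 |     import numpy
 54 |     import habitat
 55 |     import soundspaces  # noqa: F401  (side-effect import, as in agent.py)
 56 |     from eval import Challenge2022
 57 | 
 58 |     class RandomAgent(habitat.Agent):
 59 |         def __init__(self, task_config: habitat.Config):
 60 |             self._POSSIBLE_ACTIONS = task_config.TASK.POSSIBLE_ACTIONS
 61 |         def reset(self):
 62 |             pass
 63 |         def act(self, observations):
 64 |             return numpy.random.choice(len(self._POSSIBLE_ACTIONS))
 65 | 
 66 |     def main(config: habitat.Config):  # config: loaded with get_task_config(), as in agent.py
 67 |         Challenge2022().submit(RandomAgent(task_config=config), run_dir="runs/", json_filename=f"random_{config.DATASET.SPLIT}.json")
 68 |     ```
 69 | 
 70 | 1. Follow the instructions for downloading the SoundSpaces [dataset](https://github.com/facebookresearch/sound-spaces/tree/master/soundspaces) and place all data under the `data/` folder.
 71 | 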
 72 | 1. Evaluate the random agent locally:
 73 |     ```bash
 74 |     env CHALLENGE_CONFIG_FILE="configs/challenge_random.local.yaml" python agent.py 
 75 |     ```
 76 |     This calls `eval.py`, which writes a JSON file containing a Python dictionary of the following form:
 77 |     ```python
 78 |     eval_dict = {"ACTIONS": {f"{scene_id_1}_{episode_id_1}": [action_1_1, ..., 0], f"{scene_id_2}_{episode_id_2}": [action_2_1, ..., 0]}, "SPL": average_spl, "SOFT_SPL": average_softspl, "DISTANCE_TO_GOAL": average_distance_to_goal, "SUCCESS": average_success}
 79 |     ```
 80 |     **Make sure that the JSON file written when evaluating your agent has exactly the same structure.** The easiest way to ensure this is to leave `eval.py` unmodified.
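A small self-check before uploading can catch format drift. The sketch below rests on three assumptions: the dumped dictionary contains exactly the keys shown above, each `ACTIONS` entry maps a `"<scene_id>_<episode_id>"` string to a list of integer actions, and the four metrics are plain numbers. `check_eval_dict` and `check_eval_file` are hypothetical helpers and are not part of `eval.py`.

```python
import json
from numbers import Real

# Metric keys taken from the eval_dict layout documented above
REQUIRED_METRICS = ("SPL", "SOFT_SPL", "DISTANCE_TO_GOAL", "SUCCESS")


def check_eval_dict(eval_dict: dict) -> None:
    """Raise ValueError if eval_dict does not match the documented layout."""
    expected_keys = {"ACTIONS", *REQUIRED_METRICS}
    if set(eval_dict) != expected_keys:
        raise ValueError(f"expected keys {sorted(expected_keys)}, got {sorted(eval_dict)}")
    actions = eval_dict["ACTIONS"]
    if not isinstance(actions, dict):
        raise ValueError("ACTIONS must map '<scene_id>_<episode_id>' to a list of actions")
    for episode_key, trajectory in actions.items():
        if not isinstance(episode_key, str) or not isinstance(trajectory, list):
            raise ValueError(f"bad ACTIONS entry for {episode_key!r}")
        if not all(isinstance(a, int) and not isinstance(a, bool) for a in trajectory):
            raise ValueError(f"non-integer action in {episode_key!r}")
    for metric in REQUIRED_METRICS:
        value = eval_dict[metric]
        if isinstance(value, bool) or not isinstance(value, Real):
            raise ValueError(f"{metric} must be a number, got {type(value).__name__}")


def check_eval_file(path: str) -> None:
    """Load a dumped JSON file and validate it."""
    with open(path) as f:
        check_eval_dict(json.load(f))
```

For example, run `check_eval_file(path)` on the file written by `python agent.py` before uploading it to EvalAI.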
 81 | 
 82 | 
 83 | For active AV separation, follow the instructions in the `challenge` branch of the [active-AV-dynamic-separation](https://github.com/SAGNIKMJR/active-AV-dynamic-separation) repository.
 84 | 
 85 | ### Online submission
 86 | 
 87 | Follow the instructions in the `submit` tab of the [EvalAI challenge page](https://eval.ai/web/challenges/challenge-page/1971/overview) (will open soon!)<!-- EvalAI challenge page (coming soon) --> to **upload** your evaluation JSON file.
 88 | 
 89 | Valid challenge phases are `AudioNav {Minival, Test-Standard} Phase` and `AudioSep Test-Standard Phase`.
 90 | 
 91 | The challenge consists of the following phases:
 92 | 
 93 | 1. **AudioNav Minival Phase**: This split is the same as the one used in `./test_locally_audionav_rgbd.sh`. The purpose of this phase/split is sanity checking: it confirms that your online submission to EvalAI does not run into any issues during evaluation. Each team is allowed a maximum of 30 submissions per day for this phase.
 94 | 2. **AudioNav Test-Standard Phase**: The purpose of this phase is to serve as the public leaderboard establishing the state of the art for AudioNav; this is what should be used to report results in papers. The relevant split for this phase is `test_multiple_unheard`. Each team is allowed a maximum of 10 submissions per day for this phase. As a reminder, the submitted trajectories will be used to validate the submitted performance values. Suspicious submissions will be reviewed and, if necessary, the participating team will be disqualified.
 95 | 3. **AudioSep Test-Standard Phase**: The purpose of this phase is to serve as the public leaderboard establishing the state of the art for active AV separation; this is what should be used to report results in papers. The relevant split for this phase is `testUnheard_1000episodes`. Each team is allowed a maximum of 30 submissions per day for this phase. As a reminder, the winning teams of the active AV separation challenge will later be asked to turn in their code and checkpoints for inspection. Suspicious submissions will be reviewed and, if necessary, the participating team will be disqualified.
 96 | 
 97 | Note: If you run into any issues or have questions, email the organizers or open an issue on this repository.
 98 | 
 99 | ### Baselines and Starter Code
100 | 1. **AudioNav**: We include both the configs and the Python scripts for [av-nav](https://github.com/facebookresearch/sound-spaces/tree/master/ss_baselines/av_nav) and [av-wan](https://github.com/facebookresearch/sound-spaces/tree/master/ss_baselines/av_wan). Note that the [MapNav environment](https://github.com/facebookresearch/sound-spaces/blob/soundspaces-challenge/ss_baselines/av_wan/mapnav_env.py) used by av-wan is baked into the environment container and cannot be changed. If you want to modify mapping or planning, we suggest rewriting that planning loop in the agent code.
101 | 
102 | 2. **Active AV Separation**: We include configs and Python scripts in the `challenge` branch of the [active-AV-dynamic-separation](https://github.com/SAGNIKMJR/active-AV-dynamic-separation) repository.
103 | 
104 | 
105 | ## Acknowledgments
106 | 
107 | We thank the Habitat team for the challenge template.
108 | 
109 | 
110 | ## References
111 | 
112 | [1] [SoundSpaces: Audio-Visual Navigation in 3D Environments](https://arxiv.org/pdf/1912.11474.pdf). Changan Chen\*, Unnat Jain\*, Carl Schissler, Sebastia Vicenc Amengual Gari, Ziad Al-Halah, Vamsi Krishna Ithapu, Philip Robinson, Kristen Grauman. ECCV, 2020.
113 | 
114 | [2] [On evaluation of embodied navigation agents](https://arxiv.org/abs/1807.06757). Peter Anderson, Angel Chang, Devendra Singh Chaplot, Alexey Dosovitskiy, Saurabh Gupta, Vladlen Koltun, Jana Kosecka, Jitendra Malik, Roozbeh Mottaghi, Manolis Savva, Amir R. Zamir. arXiv:1807.06757, 2018.
115 | 
116 | [3] [Active Audio-Visual Separation of Dynamic Sound Sources](https://arxiv.org/pdf/2202.00850.pdf). Sagnik Majumder, Kristen Grauman. ECCV, 2022.
117 | 
118 | 
119 | ## License
120 | This repo is MIT licensed, as found in the LICENSE file.
121 | 


--------------------------------------------------------------------------------
/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/facebookresearch/soundspaces-challenge/0eaaa76c992bba49ad5cf0151493259bee771bcb/__init__.py


--------------------------------------------------------------------------------
/agent.py:
--------------------------------------------------------------------------------
 1 | #!/usr/bin/env python3
 2 | 
 3 | # Copyright (c) Facebook, Inc. and its affiliates.
 4 | # This source code is licensed under the MIT license found in the
 5 | # LICENSE file in the root directory of this source tree.
 6 | 
 7 | import argparse
 8 | import os
 9 | import random
10 | 
11 | import numpy
12 | 
13 | import habitat
14 | import soundspaces  # noqa: F401  (side-effect import: registers SoundSpaces components with habitat)
15 | 
16 | from ss_baselines.av_nav.config.default import get_task_config
17 | from eval import Challenge2022
18 | 
19 | 
20 | class RandomAgent(habitat.Agent):
21 |     def __init__(self, task_config: habitat.Config):
22 |         self._POSSIBLE_ACTIONS = task_config.TASK.POSSIBLE_ACTIONS
23 | 
24 |     def reset(self):
25 |         pass
26 | 
27 |     def act(self, observations):
28 |         return numpy.random.choice(len(self._POSSIBLE_ACTIONS))
29 | 
30 | 
31 | def main():
32 |     parser = argparse.ArgumentParser()
33 |     parser.add_argument(
34 |         "--run-dir",  type=str, default="runs/",
35 |     )
36 |     args = parser.parse_args()
37 | 
38 |     config_paths = os.environ["CHALLENGE_CONFIG_FILE"]
39 |     config = get_task_config(config_paths)
40 |     agent = RandomAgent(task_config=config)
41 | 
42 |     challenge = Challenge2022()
43 | 
44 |     challenge.submit(agent, run_dir=args.run_dir, json_filename=f"random_{config.DATASET.SPLIT}.json")
45 | 
46 | 
47 | if __name__ == "__main__":
48 |     main()
49 | 
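`RandomAgent` only implements `reset()` and `act(observations)`, while `Challenge2022` (in `eval.py`, not shown here) drives the episodes. The sketch below assumes an episode loop that resembles habitat's benchmark loop: reset the environment and the agent, then call `act` until the episode is over. It replaces the simulator with a hypothetical `FakeEpisodeEnv` so that an agent's contract can be smoke-tested without habitat. It also assumes that action index 0 is STOP, matching the trailing `0` in the README's `ACTIONS` example. `SeededRandomAgent` and `run_episode` are hypothetical as well.

```python
import random
from typing import Dict, List

STOP = 0  # assumption: index 0 is STOP, as suggested by the README's ACTIONS lists


class FakeEpisodeEnv:
    """Hypothetical stand-in for the challenge environment (not a habitat API)."""

    def __init__(self, max_steps: int = 500):
        self.max_steps = max_steps  # mirrors MAX_EPISODE_STEPS in the local config
        self.steps = 0
        self.episode_over = False

    def reset(self) -> Dict[str, int]:
        self.steps = 0
        self.episode_over = False
        return {"step": 0}

    def step(self, action: int) -> Dict[str, int]:
        self.steps += 1
        # The episode ends on STOP or when the step budget is exhausted
        self.episode_over = action == STOP or self.steps >= self.max_steps
        return {"step": self.steps}


class SeededRandomAgent:
    """Same reset/act contract as RandomAgent above, with its own seeded RNG."""

    def __init__(self, num_actions: int, seed: int = 7):
        self.num_actions = num_actions
        self.rng = random.Random(seed)

    def reset(self) -> None:
        pass

    def act(self, observations) -> int:
        return self.rng.randrange(self.num_actions)


def run_episode(env, agent) -> List[int]:
    """Run one episode and return the executed actions (one ACTIONS entry)."""
    observations = env.reset()
    agent.reset()
    actions = []
    while not env.episode_over:
        action = agent.act(observations)
        actions.append(action)
        observations = env.step(action)
    return actions
```

A random agent usually calls STOP within a few steps, so its trajectories are short. This is one reason the random baseline scores poorly on SPL.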


--------------------------------------------------------------------------------
/avnav_agent.py:
--------------------------------------------------------------------------------
  1 | #!/usr/bin/env python3
  2 | 
  3 | # Copyright (c) Facebook, Inc. and its affiliates.
  4 | # This source code is licensed under the MIT license found in the
  5 | # LICENSE file in the root directory of this source tree.
  6 | 
  7 | import argparse
  8 | import os
  9 | import random
 10 | from collections import OrderedDict
 11 | import logging
 12 | import sys
 13 | import time
 14 | 
 15 | import numba
 16 | import numpy as np
 17 | import torch
 18 | from gym.spaces import Box, Dict, Discrete
 19 | 
 20 | import habitat
 21 | from habitat import Config
 22 | from habitat.core.agent import Agent
 23 | import soundspaces
 24 | from ss_baselines.av_nav.config import get_config
 25 | from ss_baselines.av_nav.ppo.policy import AudioNavBaselinePolicy
 26 | from ss_baselines.common.utils import batch_obs
 27 | from eval import Challenge2022
 28 | 
 29 | 
 30 | @numba.njit
 31 | def _seed_numba(seed: int):
 32 |     random.seed(seed)
 33 |     np.random.seed(seed)
 34 | 
 35 | 
 36 | class PPOAgent(Agent):
 37 |     def __init__(self, config: Config):
 38 |         spaces = {
 39 |             "spectrogram": Box(
 40 |                 low=np.finfo(np.float32).min,
 41 |                 high=np.finfo(np.float32).max,
 42 |                 shape=(65, 26, 2),
 43 |                 dtype=np.float32,
 44 |             )
 45 |         }
 46 | 
 47 |         if config.INPUT_TYPE in ["depth", "rgbd"]:
 48 |             spaces["depth"] = Box(
 49 |                 low=0,
 50 |                 high=1,
 51 |                 shape=(
 52 |                     config.TASK_CONFIG.SIMULATOR.DEPTH_SENSOR.HEIGHT,
 53 |                     config.TASK_CONFIG.SIMULATOR.DEPTH_SENSOR.WIDTH,
 54 |                     1,
 55 |                 ),
 56 |                 dtype=np.float32,
 57 |             )
 58 | 
 59 |         if config.INPUT_TYPE in ["rgb", "rgbd"]:
 60 |             spaces["rgb"] = Box(
 61 |                 low=0,
 62 |                 high=255,
 63 |                 shape=(
 64 |                     config.TASK_CONFIG.SIMULATOR.RGB_SENSOR.HEIGHT,
 65 |                     config.TASK_CONFIG.SIMULATOR.RGB_SENSOR.WIDTH,
 66 |                     3,
 67 |                 ),
 68 |                 dtype=np.uint8,
 69 |             )
 70 |         observation_spaces = Dict(spaces)
 71 | 
 72 |         action_space = Discrete(len(config.TASK_CONFIG.TASK.POSSIBLE_ACTIONS))
 73 | 
 74 |         self.device = torch.device("cuda:{}".format(config.TORCH_GPU_ID))
 75 |         self.hidden_size = config.RL.PPO.hidden_size
 76 | 
 77 |         random.seed(config.RANDOM_SEED)
 78 |         np.random.seed(config.RANDOM_SEED)
 79 |         _seed_numba(config.RANDOM_SEED)
 80 |         torch.random.manual_seed(config.RANDOM_SEED)
 81 |         torch.backends.cudnn.deterministic = True
 82 |         policy_arguments = OrderedDict(
 83 |             observation_space=observation_spaces,
 84 |             action_space=action_space,
 85 |             hidden_size=self.hidden_size,
 86 |             extra_rgb=False,
 87 |             goal_sensor_uuid=config.TASK_CONFIG.TASK.GOAL_SENSOR_UUID
 88 |         )
 89 | 
 90 |         self.actor_critic = AudioNavBaselinePolicy(**policy_arguments)
 91 |         self.actor_critic.to(self.device)
 92 | 
 93 |         if config.MODEL_PATH:
 94 |             ckpt = torch.load(config.MODEL_PATH, map_location=self.device)
 95 |             print(f"Checkpoint loaded: {config.MODEL_PATH}")
 96 |             #  Filter only actor_critic weights
 97 |             self.actor_critic.load_state_dict(
 98 |                 {
 99 |                     k.replace("actor_critic.", ""): v
100 |                     for k, v in ckpt["state_dict"].items()
101 |                     if "actor_critic" in k
102 |                 }
103 |             )
104 | 
105 |         else:
106 |             habitat.logger.error(
107 |                 "Model checkpoint wasn't loaded, evaluating " "a random model."
108 |             )
109 | 
110 |         self.test_recurrent_hidden_states = None
111 |         self.not_done_masks = None
112 |         self.prev_actions = None
113 | 
114 |     def reset(self):
115 |         self.test_recurrent_hidden_states = torch.zeros(
116 |             self.actor_critic.net.num_recurrent_layers,
117 |             1,
118 |             self.hidden_size,
119 |             device=self.device,
120 |         )
121 |         self.not_done_masks = torch.zeros(1, 1, device=self.device)
122 |         self.prev_actions = torch.zeros(1, 1, dtype=torch.long, device=self.device)
123 | 
124 |     def act(self, observations):
125 |         batch = batch_obs([observations], device=self.device)
126 | 
127 |         with torch.no_grad():
128 |             _, action, _, self.test_recurrent_hidden_states = self.actor_critic.act(
129 |                 batch,
130 |                 self.test_recurrent_hidden_states,
131 |                 self.prev_actions,
132 |                 self.not_done_masks,
133 |                 deterministic=False,
134 |             )
135 |             #  Make masks not done till reset (end of episode) will be called
136 |             self.not_done_masks.fill_(1.0)
137 |             self.prev_actions.copy_(action)
138 | 
139 |         return action.item()
140 | 
141 | 
142 | def main():
143 |     parser = argparse.ArgumentParser()
144 |     parser.add_argument(
145 |         "--input-type", default="blind", choices=["blind", "rgb", "depth", "rgbd"]
146 |     )
147 |     config_paths = os.environ["CHALLENGE_CONFIG_FILE"]
148 |     parser.add_argument("--model-path", default="", type=str)
149 |     parser.add_argument(
150 |         "--run-dir",  type=str, default="runs/",
151 |     )
152 |     args = parser.parse_args()
153 | 
154 |     config = get_config(
155 |         "configs/avnav_agent.yaml", ["BASE_TASK_CONFIG_PATH", config_paths]
156 |     ).clone()
157 | 
158 |     config.defrost()
159 |     config.TORCH_GPU_ID = 0
160 |     config.INPUT_TYPE = args.input_type
161 |     config.MODEL_PATH = args.model_path
162 | 
163 |     config.RANDOM_SEED = 7
164 |     config.freeze()
165 | 
166 |     agent = PPOAgent(config)
167 | 
168 |     challenge = Challenge2022()
169 |     challenge._env.seed(config.RANDOM_SEED)
170 | 
171 |     print("Start evaluating ...")
172 |     challenge.submit(agent, run_dir=args.run_dir, json_filename=f"avnav_{config.TASK_CONFIG.DATASET.SPLIT}.json")
173 | 
174 | 
175 | if __name__ == "__main__":
176 |     main()
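Both PPO agents load the checkpoint given by `--model-path`, keep only the `actor_critic` entries of `ckpt["state_dict"]`, and strip the `actor_critic.` prefix before calling `load_state_dict`. The following is a dependency-free sketch of that filtering step. `extract_prefixed_weights` is a hypothetical helper. It uses `str.startswith` rather than the agents' `"actor_critic" in k` and `k.replace(...)`, so it is slightly stricter: keys that contain the substring anywhere other than at the start are ignored.

```python
from collections import OrderedDict
from typing import Any, Mapping


def extract_prefixed_weights(
    state_dict: Mapping[str, Any], prefix: str = "actor_critic."
) -> "OrderedDict[str, Any]":
    """Keep entries whose key starts with `prefix` and drop that prefix."""
    return OrderedDict(
        (key[len(prefix):], value)
        for key, value in state_dict.items()
        if key.startswith(prefix)
    )
```

With PyTorch, this might be used as `agent.actor_critic.load_state_dict(extract_prefixed_weights(torch.load(path, map_location="cpu")["state_dict"]))`. The `"state_dict"` key follows the agents above.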


--------------------------------------------------------------------------------
/avwan_agent.py:
--------------------------------------------------------------------------------
  1 | #!/usr/bin/env python3
  2 | 
  3 | # Copyright (c) Facebook, Inc. and its affiliates.
  4 | # This source code is licensed under the MIT license found in the
  5 | # LICENSE file in the root directory of this source tree.
  6 | 
  7 | import argparse
  8 | import os
  9 | import random
 10 | from collections import OrderedDict
 11 | import logging
 12 | import sys
 13 | import time
 14 | 
 15 | import numba
 16 | import numpy as np
 17 | import torch
 18 | from gym.spaces import Box, Dict, Discrete
 19 | 
 20 | import habitat
 21 | from habitat import Config
 22 | from habitat.core.agent import Agent
 23 | import soundspaces
 24 | from ss_baselines.av_wan.config.default import get_config
 25 | from ss_baselines.av_wan.ppo.policy import AudioNavBaselinePolicy
 26 | from ss_baselines.common.utils import batch_obs
 27 | from eval import Challenge2022
 28 | 
 29 | 
 30 | @numba.njit
 31 | def _seed_numba(seed: int):
 32 |     random.seed(seed)
 33 |     np.random.seed(seed)
 34 | 
 35 | 
 36 | class PPOAgent(Agent):
 37 |     def __init__(self, config: Config):
 38 |         spaces = {
 39 |             "spectrogram": Box(
 40 |                 low=np.finfo(np.float32).min,
 41 |                 high=np.finfo(np.float32).max,
 42 |                 shape=(65, 26, 2),
 43 |                 dtype=np.float32,
 44 |             ),
 45 |             "gm": Box(
 46 |                 low=np.finfo(np.float32).min,
 47 |                 high=np.finfo(np.float32).max,
 48 |                 shape=(400, 400, 2),
 49 |                 dtype=np.float32,
 50 |             ),
 51 |             "am": Box(
 52 |                 low=np.finfo(np.float32).min,
 53 |                 high=np.finfo(np.float32).max,
 54 |                 shape=(20, 20, 1),
 55 |                 dtype=np.float32,
 56 |             ),
 57 |             "action_map": Box(
 58 |                 low=np.finfo(np.float32).min,
 59 |                 high=np.finfo(np.float32).max,
 60 |                 shape=(9, 9, 1),
 61 |                 dtype=np.float32,
 62 |             ),
 63 |         }
 64 | 
 65 |         if config.INPUT_TYPE in ["depth", "rgbd"]:
 66 |             spaces["depth"] = Box(
 67 |                 low=0,
 68 |                 high=1,
 69 |                 shape=(
 70 |                     config.TASK_CONFIG.SIMULATOR.DEPTH_SENSOR.HEIGHT,
 71 |                     config.TASK_CONFIG.SIMULATOR.DEPTH_SENSOR.WIDTH,
 72 |                     1,
 73 |                 ),
 74 |                 dtype=np.float32,
 75 |             )
 76 | 
 77 |         if config.INPUT_TYPE in ["rgb", "rgbd"]:
 78 |             spaces["rgb"] = Box(
 79 |                 low=0,
 80 |                 high=255,
 81 |                 shape=(
 82 |                     config.TASK_CONFIG.SIMULATOR.RGB_SENSOR.HEIGHT,
 83 |                     config.TASK_CONFIG.SIMULATOR.RGB_SENSOR.WIDTH,
 84 |                     3,
 85 |                 ),
 86 |                 dtype=np.uint8,
 87 |             )
 88 |         observation_spaces = Dict(spaces)
 89 | 
 90 |         action_space = Discrete(len(config.TASK_CONFIG.TASK.POSSIBLE_ACTIONS))
 91 | 
 92 |         self.device = torch.device("cuda:{}".format(config.TORCH_GPU_ID))
 93 |         self.hidden_size = config.RL.PPO.hidden_size
 94 | 
 95 |         random.seed(config.RANDOM_SEED)
 96 |         np.random.seed(config.RANDOM_SEED)
 97 |         _seed_numba(config.RANDOM_SEED)
 98 |         torch.random.manual_seed(config.RANDOM_SEED)
 99 |         torch.backends.cudnn.deterministic = True
100 |         policy_arguments = OrderedDict(
101 |             observation_space=observation_spaces,
102 |             hidden_size=self.hidden_size,
103 |             goal_sensor_uuid=config.TASK_CONFIG.TASK.GOAL_SENSOR_UUID,
104 |             masking=config.MASKING,
105 |             action_map_size=9
106 |         )
107 | 
108 |         self.actor_critic = AudioNavBaselinePolicy(**policy_arguments)
109 |         self.actor_critic.to(self.device)
110 | 
111 |         if config.MODEL_PATH:
112 |             ckpt = torch.load(config.MODEL_PATH, map_location=self.device)
113 |             print(f"Checkpoint loaded: {config.MODEL_PATH}")
114 |             #  Filter only actor_critic weights
115 |             self.actor_critic.load_state_dict(
116 |                 {
117 |                     k.replace("actor_critic.", ""): v
118 |                     for k, v in ckpt["state_dict"].items()
119 |                     if "actor_critic" in k
120 |                 }
121 |             )
122 | 
123 |         else:
124 |             habitat.logger.error(
125 |                 "Model checkpoint wasn't loaded, evaluating " "a random model."
126 |             )
127 | 
128 |         self.test_recurrent_hidden_states = None
129 |         self.not_done_masks = None
130 |         self.prev_actions = None
131 | 
132 |     def reset(self):
133 |         self.test_recurrent_hidden_states = torch.zeros(
134 |             self.actor_critic.net.num_recurrent_layers,
135 |             1,
136 |             self.hidden_size,
137 |             device=self.device,
138 |         )
139 |         self.not_done_masks = torch.zeros(1, 1, device=self.device)
140 |         self.prev_actions = torch.zeros(1, 1, dtype=torch.long, device=self.device)
141 | 
142 |     def act(self, observations):
143 |         batch = batch_obs([observations], device=self.device)
144 | 
145 |         with torch.no_grad():
146 |             _, action, _, self.test_recurrent_hidden_states, _  = self.actor_critic.act(
147 |                 batch,
148 |                 self.test_recurrent_hidden_states,
149 |                 self.prev_actions,
150 |                 self.not_done_masks,
151 |                 deterministic=True,
152 |             )
153 |             #  Make masks not done till reset (end of episode) will be called
154 |             self.not_done_masks.fill_(1.0)
155 |             self.prev_actions.copy_(action)
156 | 
157 |         return action.item()
158 | 
159 | 
160 | def main():
161 |     parser = argparse.ArgumentParser()
162 |     parser.add_argument(
163 |         "--input-type", default="blind", choices=["blind", "rgb", "depth", "rgbd"]
164 |     )
165 |     config_paths = os.environ["CHALLENGE_CONFIG_FILE"]
166 |     parser.add_argument("--model-path", default="", type=str)
167 |     parser.add_argument(
168 |         "--run-dir",  type=str, default="runs/",
169 |     )
170 |     args = parser.parse_args()
171 | 
172 |     config = get_config(
173 |         "configs/avwan_agent.yaml", ["BASE_TASK_CONFIG_PATH", config_paths]
174 |     ).clone()
175 |     config.defrost()
176 |     config.TORCH_GPU_ID = 0
177 |     config.INPUT_TYPE = args.input_type
178 |     config.MODEL_PATH = args.model_path
179 | 
180 |     config.RANDOM_SEED = 7
181 |     config.freeze()
182 |     logging.basicConfig(level=logging.INFO, format='%(asctime)s, %(levelname)s: %(message)s',
183 |                         datefmt="%Y-%m-%d %H:%M:%S")
184 | 
185 |     agent = PPOAgent(config)
186 | 
187 |     challenge = Challenge2022()
188 |     challenge._env.seed(config.RANDOM_SEED)
189 | 
190 |     print("Start evaluating ...")
191 |     challenge.submit(agent, run_dir=args.run_dir, json_filename=f"avwan_{config.TASK_CONFIG.DATASET.SPLIT}.json")
192 | 
193 | 
194 | if __name__ == "__main__":
195 |     main()
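The two agents declare their observation spaces by hand. Both use a `(65, 26, 2)` spectrogram. av-wan adds `gm` `(400, 400, 2)`, `am` `(20, 20, 1)`, and `action_map` `(9, 9, 1)`. Optional `depth` `(H, W, 1)` float32 and `rgb` `(H, W, 3)` uint8 inputs are added depending on `--input-type`. The sketch below builds zero-filled observations with those shapes, which can be useful for checking that a modified policy accepts its inputs. `make_dummy_observations` is hypothetical. The 128×128 default comes from `challenge_audionav.local.rgbd.yaml` and is assumed, not verified, for the av-wan config. Zeros are only suitable for shape checks, not for judging behaviour.

```python
import numpy as np

INPUT_TYPES = ("blind", "rgb", "depth", "rgbd")


def make_dummy_observations(
    input_type: str = "rgbd", height: int = 128, width: int = 128, include_maps: bool = False
) -> dict:
    """Zero observations matching the shapes declared in avnav_agent.py / avwan_agent.py."""
    if input_type not in INPUT_TYPES:
        raise ValueError(f"input_type must be one of {INPUT_TYPES}")
    obs = {"spectrogram": np.zeros((65, 26, 2), dtype=np.float32)}
    if include_maps:  # extra inputs used only by the av-wan agent
        obs["gm"] = np.zeros((400, 400, 2), dtype=np.float32)
        obs["am"] = np.zeros((20, 20, 1), dtype=np.float32)
        obs["action_map"] = np.zeros((9, 9, 1), dtype=np.float32)
    if input_type in ("depth", "rgbd"):
        obs["depth"] = np.zeros((height, width, 1), dtype=np.float32)
    if input_type in ("rgb", "rgbd"):
        obs["rgb"] = np.zeros((height, width, 3), dtype=np.uint8)
    return obs
```

In the agents above, such a dictionary would be passed through `batch_obs([obs], device=...)` in the same way that real observations are in `act`.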


--------------------------------------------------------------------------------
/configs/avnav_agent.yaml:
--------------------------------------------------------------------------------
 1 | BASE_TASK_CONFIG_PATH: "configs/challenge_audionav.local.rgbd.yaml"
 2 | TRAINER_NAME: "ppo"
 3 | ENV_NAME: "NavRLEnv"
 4 | SIMULATOR_GPU_ID: 0
 5 | TORCH_GPU_ID: 0
 6 | VIDEO_OPTION: []
 7 | TENSORBOARD_DIR: "tb"
 8 | VIDEO_DIR: "video_dir"
 9 | TEST_EPISODE_COUNT: 10
10 | EVAL_CKPT_PATH_DIR: "data/new_checkpoints"
11 | NUM_PROCESSES: 4
12 | SENSORS: ["DEPTH_SENSOR"]
13 | CHECKPOINT_FOLDER: "data/new_checkpoints"
14 | NUM_UPDATES: 10000
15 | LOG_INTERVAL: 10
16 | CHECKPOINT_INTERVAL: 50
17 | 
18 | RL:
19 |   PPO:
20 |     # ppo params
21 |     clip_param: 0.1
22 |     ppo_epoch: 4
23 |     num_mini_batch: 1
24 |     value_loss_coef: 0.5
25 |     entropy_coef: 0.20
26 |     lr: 2.5e-4
27 |     eps: 1e-5
28 |     max_grad_norm: 0.5
29 |     # decide the length of history that ppo encodes
30 |     num_steps: 150
31 |     hidden_size: 512
32 |     use_gae: True
33 |     gamma: 0.99
34 |     tau: 0.95
35 |     use_linear_clip_decay: True
36 |     use_linear_lr_decay: True
37 |     # window size for calculating the past rewards
38 |     reward_window_size: 50

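With `use_gae: True`, `gamma: 0.99` and `tau: 0.95`, advantages are presumably computed with generalized advantage estimation (GAE) over rollouts of `num_steps: 150` steps. Here `tau` is the coefficient often written as lambda. The following single-environment sketch shows the standard backward GAE recursion under those settings. The names are hypothetical, and this is an illustration rather than the trainer's rollout-storage code.

```python
from typing import List


def gae_returns(
    rewards: List[float],
    values: List[float],
    masks: List[float],
    next_value: float,
    gamma: float = 0.99,  # RL.PPO.gamma
    tau: float = 0.95,    # RL.PPO.tau (the GAE lambda)
) -> List[float]:
    """Return targets (advantage + value) for one rollout of one environment.

    masks[t] is 0.0 if the episode ended at step t, otherwise 1.0; it stops
    bootstrapping and advantage propagation across episode boundaries.
    next_value is the critic's estimate for the state after the last step.
    Advantages are returns[t] - values[t].
    """
    returns = [0.0] * len(rewards)
    gae = 0.0
    for t in reversed(range(len(rewards))):
        v_next = next_value if t == len(rewards) - 1 else values[t + 1]
        delta = rewards[t] + gamma * v_next * masks[t] - values[t]  # TD error
        gae = delta + gamma * tau * masks[t] * gae
        returns[t] = gae + values[t]
    return returns


print(gae_returns([0.0, 0.0, 10.0], [1.0, 2.0, 5.0], [1.0, 1.0, 0.0], next_value=0.0))
```

Setting `tau` to 1.0 recovers discounted Monte Carlo returns, and setting it to 0.0 gives one-step TD targets. Values in between, such as 0.95, trade bias against variance.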

--------------------------------------------------------------------------------
/configs/avwan_agent.yaml:
--------------------------------------------------------------------------------
 1 | BASE_TASK_CONFIG_PATH: "configs/challenge_avwan.local.rgbd.yaml"
 2 | TRAINER_NAME: "AVWanTrainer"
 3 | ENV_NAME: "MapNavEnv"
 4 | SIMULATOR_GPU_ID: 0
 5 | TORCH_GPU_ID: 0
 6 | VIDEO_OPTION: []
 7 | TENSORBOARD_DIR: "tb"
 8 | VIDEO_DIR: "video_dir"
 9 | TEST_EPISODE_COUNT: 10
10 | EVAL_CKPT_PATH_DIR: "data/new_checkpoints"
11 | NUM_PROCESSES: 4
12 | SENSORS: ["DEPTH_SENSOR"]
13 | CHECKPOINT_FOLDER: "data/new_checkpoints"
14 | NUM_UPDATES: 10000
15 | LOG_INTERVAL: 10
16 | CHECKPOINT_INTERVAL: 50
17 | 
18 | RL:
19 |   PPO:
20 |     # ppo params
21 |     clip_param: 0.1
22 |     ppo_epoch: 4
23 |     num_mini_batch: 1
24 |     value_loss_coef: 0.5
25 |     entropy_coef: 0.20
26 |     lr: 2.5e-4
27 |     eps: 1e-5
28 |     max_grad_norm: 0.5
29 |     # decide the length of history that ppo encodes
30 |     num_steps: 150
31 |     hidden_size: 512
32 |     use_gae: True
33 |     gamma: 0.99
34 |     tau: 0.95
35 |     use_linear_clip_decay: True
36 |     use_linear_lr_decay: True
37 |     # window size for calculating the past rewards
38 |     reward_window_size: 50

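This file matches `configs/avnav_agent.yaml` except for `BASE_TASK_CONFIG_PATH`, `TRAINER_NAME` and `ENV_NAME`. Both files set `use_linear_lr_decay` and `use_linear_clip_decay` with `NUM_UPDATES: 10000`. In habitat-baselines PPO trainers, these flags scale the learning rate and the PPO clip range by `1 - update / NUM_UPDATES`. The sketch below assumes the same schedule applies here, although the config alone does not confirm it. Both function names are hypothetical.

```python
from typing import Tuple


def linear_decay(update: int, total_updates: int) -> float:
    """Factor falling linearly from 1.0 at update 0 to 0.0 at total_updates."""
    return 1.0 - update / float(total_updates)


def ppo_schedule(
    update: int,
    total_updates: int = 10000,   # NUM_UPDATES
    lr: float = 2.5e-4,           # RL.PPO.lr
    clip_param: float = 0.1,      # RL.PPO.clip_param
    decay_lr: bool = True,        # RL.PPO.use_linear_lr_decay
    decay_clip: bool = True,      # RL.PPO.use_linear_clip_decay
) -> Tuple[float, float]:
    """Return the (learning rate, clip range) in effect at a given update."""
    frac = linear_decay(update, total_updates)
    return (lr * frac if decay_lr else lr,
            clip_param * frac if decay_clip else clip_param)


for update in (0, 5000, 9999):
    print(update, ppo_schedule(update))
```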

--------------------------------------------------------------------------------
/configs/challenge_audionav.local.rgbd.yaml:
--------------------------------------------------------------------------------
 1 | ENVIRONMENT:
 2 |   MAX_EPISODE_STEPS: 500
 3 | SIMULATOR:
 4 |   HABITAT_SIM_V0:
 5 |     GPU_DEVICE_ID: 0
 6 |   RGB_SENSOR:
 7 |     WIDTH: 128
 8 |     HEIGHT: 128
 9 |   DEPTH_SENSOR:
10 |     WIDTH: 128
11 |     HEIGHT: 128
12 | 
13 |   TYPE: "SoundSpacesSim"
14 |   ACTION_SPACE_CONFIG: "v0"
15 |   SCENE_DATASET: "mp3d"
16 |   GRID_SIZE: 1.0
17 |   AUDIO:
18 |     RIR_SAMPLING_RATE: 16000
19 |   AGENT_0:
20 |     SENSORS: ['DEPTH_SENSOR']
21 | 
22 | TASK:
23 |   TYPE: AudioNav
24 |   SUCCESS_DISTANCE: 0.2
25 | 
26 |   # SENSORS: ['SPECTROGRAM_SENSOR', 'EGOMAP_SENSOR', "GEOMETRIC_MAP", "ACTION_MAP", 'ACOUSTIC_MAP', 'INTENSITY', 'COLLISION']
27 |   SENSORS: ['SPECTROGRAM_SENSOR']
28 |   GOAL_SENSOR_UUID: spectrogram
29 | 
30 |   MEASUREMENTS: ['DISTANCE_TO_GOAL', 'NORMALIZED_DISTANCE_TO_GOAL', 'NUM_ACTION', 'SUCCESS_WEIGHTED_BY_NUM_ACTION', 'SUCCESS', 'SPL', 'SOFT_SPL']
31 |   SPL:
32 |     TYPE: SPL
33 |   GEOMETRIC_MAP:
34 |     MAP_SIZE: 400
35 |     INTERNAL_MAP_SIZE: 1200
36 |     MAP_RESOLUTION: 0.1
37 |   ACOUSTIC_MAP:
38 |     MAP_SIZE: 20
39 |     MAP_RESOLUTION: 1.0
40 |   ACTION_MAP:
41 |     MAP_SIZE: 9
42 |     MAP_RESOLUTION: 1.0
43 | 
44 | DATASET:
45 |   TYPE: "AudioNav"
46 |   SPLIT: "val_mini"
47 |   CONTENT_SCENES: ["*"]
48 |   VERSION: 'v1'
49 |   SCENES_DIR: "data/scene_datasets/mp3d"
50 |   DATA_PATH: "data/datasets/audionav/mp3d/{version}/{split}/{split}.json.gz"
51 | 

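`DATASET.DATA_PATH` is a template. Its `{version}` and `{split}` fields are presumably filled from `DATASET.VERSION` and `DATASET.SPLIT` with Python's `str.format`, as habitat dataset loaders do. Under that assumption, the episodes for this config are expected at the path computed below. A quick existence check like this can catch a misplaced download before the simulator starts. `episode_file` is a hypothetical helper.

```python
import os

DATA_PATH = "data/datasets/audionav/mp3d/{version}/{split}/{split}.json.gz"


def episode_file(version: str = "v1", split: str = "val_mini") -> str:
    """Expand the DATA_PATH template; both {split} fields get the same value."""
    return DATA_PATH.format(version=version, split=split)


path = episode_file()
print(path)  # data/datasets/audionav/mp3d/v1/val_mini/val_mini.json.gz
if not os.path.isfile(path):
    print("episode file not found; run from the repository root after downloading the data")
```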

--------------------------------------------------------------------------------
/configs/challenge_avwan.local.rgbd.yaml:
--------------------------------------------------------------------------------
 1 | ENVIRONMENT:
 2 |   MAX_EPISODE_STEPS: 500
 3 | SIMULATOR:
 4 |   HABITAT_SIM_V0:
 5 |     GPU_DEVICE_ID: 0
 6 |   RGB_SENSOR:
 7 |     WIDTH: 128
 8 |     HEIGHT: 128
 9 |   DEPTH_SENSOR:
10 |     WIDTH: 128
11 |     HEIGHT: 128
12 | 
13 |   TYPE: "SoundSpacesSim"
14 |   ACTION_SPACE_CONFIG: "v0"
15 |   SCENE_DATASET: "mp3d"
16 |   GRID_SIZE: 1.0
17 |   AUDIO:
18 |     RIR_SAMPLING_RATE: 16000
19 |   AGENT_0:
20 |     SENSORS: ['DEPTH_SENSOR']
21 | 
22 | TASK:
23 |   TYPE: AudioNav
24 |   SUCCESS_DISTANCE: 0.2
25 | 
26 |   SENSORS: ['SPECTROGRAM_SENSOR', 'EGOMAP_SENSOR', "GEOMETRIC_MAP", "ACTION_MAP", 'ACOUSTIC_MAP', 'INTENSITY', 'COLLISION']
27 |   GOAL_SENSOR_UUID: spectrogram
28 | 
29 |   MEASUREMENTS: ['DISTANCE_TO_GOAL', 'NORMALIZED_DISTANCE_TO_GOAL', 'NUM_ACTION', 'SUCCESS_WEIGHTED_BY_NUM_ACTION', 'SUCCESS', 'SPL', 'SOFT_SPL']
30 |   SPL:
31 |     TYPE: SPL
32 |   GEOMETRIC_MAP:
33 |     MAP_SIZE: 400
34 |     INTERNAL_MAP_SIZE: 1200
35 |     MAP_RESOLUTION: 0.1
36 |   ACOUSTIC_MAP:
37 |     MAP_SIZE: 20
38 |     MAP_RESOLUTION: 1.0
39 |   ACTION_MAP:
40 |     MAP_SIZE: 9
41 |     MAP_RESOLUTION: 1.0
42 | 
43 | DATASET:
44 |   TYPE: "AudioNav"
45 |   SPLIT: "val_mini"
46 |   CONTENT_SCENES: ["*"]
47 |   VERSION: 'v1'
48 |   SCENES_DIR: "data/scene_datasets/mp3d"
49 |   DATA_PATH: "data/datasets/audionav/mp3d/{version}/{split}/{split}.json.gz"
50 | 

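Unlike the AudioNav config, this task config enables the mapping sensors (`EGOMAP_SENSOR`, `GEOMETRIC_MAP`, `ACTION_MAP`, `ACOUSTIC_MAP`, `INTENSITY`, `COLLISION`). Each map's physical extent is `MAP_SIZE` cells times `MAP_RESOLUTION` meters per cell: 40 m for the geometric map, 20 m for the acoustic map and 9 m for the action map. The sketch below computes those extents. It also converts an agent-relative offset into a cell of a square map centered on the agent, but that centering and axis convention is an assumption, and the real sensors may use a different origin. The helper names are hypothetical.

```python
import math
from typing import Dict, Tuple

# (MAP_SIZE in cells, MAP_RESOLUTION in meters per cell), copied from the TASK section.
MAPS: Dict[str, Tuple[int, float]] = {
    "GEOMETRIC_MAP": (400, 0.1),
    "ACOUSTIC_MAP": (20, 1.0),
    "ACTION_MAP": (9, 1.0),
}


def extent_m(size: int, resolution: float) -> float:
    """Side length in meters covered by a square map of `size` cells."""
    return size * resolution


def offset_to_cell(dx: float, dz: float, size: int, resolution: float) -> Tuple[int, int]:
    """ASSUMPTION: map is square and centered on the agent, rows along z, columns along x.

    Returns the (row, col) cell containing the agent-relative offset (dx, dz) in meters.
    """
    center = size // 2
    row = center + math.floor(dz / resolution)
    col = center + math.floor(dx / resolution)
    if not (0 <= row < size and 0 <= col < size):
        raise ValueError(f"offset ({dx}, {dz}) lies outside a {size}x{size} map")
    return row, col


for name, (size, res) in MAPS.items():
    print(f"{name}: {size} x {size} cells, {extent_m(size, res):.1f} m per side")
```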

--------------------------------------------------------------------------------
/configs/challenge_random.local.yaml:
--------------------------------------------------------------------------------
 1 | ENVIRONMENT:
 2 |   MAX_EPISODE_STEPS: 500
 3 | SIMULATOR:
 4 |   HABITAT_SIM_V0:
 5 |     GPU_DEVICE_ID: 0
 6 |   RGB_SENSOR:
 7 |     WIDTH: 128
 8 |     HEIGHT: 128
 9 |   DEPTH_SENSOR:
10 |     WIDTH: 128
11 |     HEIGHT: 128
12 | 
13 |   TYPE: "SoundSpacesSim"
14 |   ACTION_SPACE_CONFIG: "v0"
15 |   SCENE_DATASET: "mp3d"
16 |   GRID_SIZE: 1.0
17 |   AUDIO:
18 |     RIR_SAMPLING_RATE: 16000
19 |   AGENT_0:
20 |     SENSORS: ['DEPTH_SENSOR']
21 | 
22 | TASK:
23 |   TYPE: AudioNav
24 |   SUCCESS_DISTANCE: 0.2
25 | 
26 |   SENSORS: ['SPECTROGRAM_SENSOR']
27 |   GOAL_SENSOR_UUID: spectrogram
28 | 
29 |   MEASUREMENTS: ['DISTANCE_TO_GOAL', 'NORMALIZED_DISTANCE_TO_GOAL', 'NUM_ACTION', 'SUCCESS_WEIGHTED_BY_NUM_ACTION', 'SUCCESS', 'SPL', 'SOFT_SPL']
30 |   SPL:
31 |     TYPE: SPL
32 |   GEOMETRIC_MAP:
33 |     MAP_SIZE: 400
34 |     INTERNAL_MAP_SIZE: 1200
35 |     MAP_RESOLUTION: 0.1
36 |   ACOUSTIC_MAP:
37 |     MAP_SIZE: 20
38 |     MAP_RESOLUTION: 1.0
39 |   ACTION_MAP:
40 |     MAP_SIZE: 9
41 |     MAP_RESOLUTION: 1.0
42 | 
43 | DATASET:
44 |   TYPE: "AudioNav"
45 |   SPLIT: "val_mini"
46 |   CONTENT_SCENES: ["*"]
47 |   VERSION: 'v1'
48 |   SCENES_DIR: "data/scene_datasets/mp3d"
49 |   DATA_PATH: "data/datasets/audionav/mp3d/{version}/{split}/{split}.json.gz"
50 | 

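All three task configs request the same `MEASUREMENTS`, and `eval.py` reports `SPL`, `SOFT_SPL`, `DISTANCE_TO_GOAL` and `SUCCESS`. SPL (success weighted by path length) is conventionally defined per episode as `S * l / max(p, l)`. Here `S` is 1 for a successful episode and 0 otherwise, `l` is the shortest-path length to the goal, and `p` is the length of the path the agent actually took. The reported score is the mean over episodes. The sketch below implements that standard definition with hypothetical helper names. It assumes the simulator has already decided success (here with `SUCCESS_DISTANCE: 0.2`) and is not the source of the `SPL` measure itself.

```python
from typing import Iterable, Tuple


def episode_spl(success: bool, shortest_path: float, agent_path: float) -> float:
    """SPL for one episode: S * l / max(p, l)."""
    if shortest_path <= 0:
        raise ValueError("shortest path length must be positive")
    if not success:
        return 0.0
    return shortest_path / max(agent_path, shortest_path)


def mean_spl(episodes: Iterable[Tuple[bool, float, float]]) -> float:
    """Average SPL over (success, shortest_path, agent_path) tuples."""
    scores = [episode_spl(*episode) for episode in episodes]
    if not scores:
        raise ValueError("need at least one episode")
    return sum(scores) / len(scores)


print(mean_spl([(True, 10.0, 20.0), (False, 5.0, 5.0), (True, 4.0, 4.0)]))  # 0.5
```

The `max(p, l)` term caps each episode's score at 1.0, so an agent cannot earn more than full credit on any episode.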

--------------------------------------------------------------------------------
/eval.py:
--------------------------------------------------------------------------------
 1 | #!/usr/bin/env python3
 2 | 
 3 | # Copyright (c) Facebook, Inc. and its affiliates.
 4 | # All rights reserved.
 5 | 
 6 | # This source code is licensed under the license found in the
 7 | # LICENSE file in the root directory of this source tree.
 8 | 
 9 | import os
10 | import json
11 | from tqdm import tqdm
12 | from collections import defaultdict
13 | from typing import Dict, Optional, Tuple
14 | 
15 | from habitat.core.logging import logger
16 | from soundspaces.challenge import Challenge
17 | 
18 | 
19 | EVAL_DCT_KEY_TO_METRIC_NAME = {"SPL": "spl", "SOFT_SPL": "softspl", "DISTANCE_TO_GOAL": "distance_to_goal", "SUCCESS": "success"}
20 | 
21 | 
22 | class Challenge2022(Challenge):
23 |     def __init__(self, eval_remote=False):
24 |         config_paths = os.environ["CHALLENGE_CONFIG_FILE"]
25 |         super().__init__(eval_remote=eval_remote)
26 | 
27 |     def evaluate_custom(
28 |         self, agent: "Agent", num_episodes: Optional[int] = None
29 |     ) -> Tuple[Dict[str, float], Dict]:
30 |         if num_episodes is None:
31 |             num_episodes = len(self._env.episodes)
32 |         else:
33 |             assert num_episodes <= len(self._env.episodes), (
34 |                 "num_episodes({}) is larger than number of episodes "
35 |                 "in environment ({})".format(
36 |                     num_episodes, len(self._env.episodes)
37 |                 )
38 |             )
39 | 
40 |         assert num_episodes > 0, "num_episodes should be greater than 0"
41 | 
42 |         agg_metrics: Dict = defaultdict(float)
43 |         trajs: Dict = defaultdict(list)
44 | 
45 |         for count_episodes in tqdm(range(num_episodes)):
46 |             agent.reset()
47 |             observations = self._env.reset()
48 | 
49 |             scene_id = self._env.current_episode.scene_id.split("/")[-1].split(".")[0]
50 |             episode_id = self._env.current_episode.episode_id
51 |             scene_episode_id = f"{scene_id}_{episode_id}"
52 | 
53 |             while not self._env.episode_over:
54 |                 action = agent.act(observations)
55 |                 observations = self._env.step(action)
56 | 
57 |                 # trajs is a defaultdict(list): a new scene_episode_id
58 |                 # key starts as an empty list, so no membership check
59 |                 # is needed before appending.
60 |                 trajs[scene_episode_id].append(action)
61 | 
62 |             metrics = self._env.get_metrics()
63 |             for m, v in metrics.items():
64 |                 agg_metrics[m] += v
65 | 
66 |         avg_metrics = {k: v / num_episodes for k, v in agg_metrics.items()}
67 | 
68 |         eval_dct = {"ACTIONS": trajs}
69 |         for eval_dct_key in EVAL_DCT_KEY_TO_METRIC_NAME:
70 |             assert EVAL_DCT_KEY_TO_METRIC_NAME[eval_dct_key] in avg_metrics
71 |             eval_dct[eval_dct_key] = float(f"{avg_metrics[EVAL_DCT_KEY_TO_METRIC_NAME[eval_dct_key]]:.2f}")
72 | 
73 |         return avg_metrics, eval_dct
74 | 
75 |     def submit(self, agent, run_dir, json_filename):
76 |         metrics, eval_dct = self.evaluate_custom(agent)
77 | 
78 |         for k, v in metrics.items():
79 |             logger.info("{}: {}".format(k, v))
80 | 
81 |         if not os.path.isdir(run_dir):
82 |             os.makedirs(run_dir)
83 |         with open(os.path.join(run_dir, json_filename), "w") as fo:
84 |             json.dump(eval_dct, fo)
85 | 

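`evaluate_custom` sums each per-episode metric and divides by the number of episodes. It then stores the four challenge metrics, rounded to two decimals, alongside the recorded action sequences under `ACTIONS`, and `submit` writes that dict as JSON. The stdlib sketch below replays only the aggregation step on illustrative per-episode numbers, which are not real results, so the shape of the written JSON is visible without running the simulator. `summarize` is a hypothetical helper.

```python
import json
from collections import defaultdict
from typing import Dict, List

# Same mapping as eval.py: key in the written JSON -> metric name reported by the env.
EVAL_DCT_KEY_TO_METRIC_NAME = {"SPL": "spl", "SOFT_SPL": "softspl",
                               "DISTANCE_TO_GOAL": "distance_to_goal", "SUCCESS": "success"}


def summarize(per_episode: List[Dict[str, float]], trajs: Dict[str, list]) -> Dict:
    """Average per-episode metrics and build the dict that submit() dumps as JSON."""
    totals: Dict[str, float] = defaultdict(float)
    for metrics in per_episode:
        for name, value in metrics.items():
            totals[name] += value
    avg = {name: total / len(per_episode) for name, total in totals.items()}
    eval_dct: Dict = {"ACTIONS": trajs}
    for key, name in EVAL_DCT_KEY_TO_METRIC_NAME.items():
        eval_dct[key] = float(f"{avg[name]:.2f}")  # rounded to 2 decimals, as in eval.py
    return eval_dct


# Illustrative inputs only.
episodes = [
    {"spl": 0.5, "softspl": 0.6, "distance_to_goal": 1.234, "success": 1.0},
    {"spl": 1.0, "softspl": 0.9, "distance_to_goal": 0.0, "success": 1.0},
]
trajs = {"sceneA_0": [1, 1, 0], "sceneA_1": [2, 1, 0]}
print(json.dumps(summarize(episodes, trajs)))
```

Because `ACTIONS` is dumped with `json.dump`, whatever an agent's `act` returns must be JSON-serializable.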

--------------------------------------------------------------------------------
/example_ckpt.pth:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/facebookresearch/soundspaces-challenge/0eaaa76c992bba49ad5cf0151493259bee771bcb/example_ckpt.pth


--------------------------------------------------------------------------------
/res/img/sep_metrics.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/facebookresearch/soundspaces-challenge/0eaaa76c992bba49ad5cf0151493259bee771bcb/res/img/sep_metrics.png


--------------------------------------------------------------------------------
/res/img/soundspaces-logo.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/facebookresearch/soundspaces-challenge/0eaaa76c992bba49ad5cf0151493259bee771bcb/res/img/soundspaces-logo.png


--------------------------------------------------------------------------------
/res/img/spl.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/facebookresearch/soundspaces-challenge/0eaaa76c992bba49ad5cf0151493259bee771bcb/res/img/spl.png


--------------------------------------------------------------------------------
/test_locally_audionav_rgbd.sh:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env bash
2 | 
3 | env CHALLENGE_CONFIG_FILE="configs/challenge_audionav.local.rgbd.yaml" python avnav_agent.py --input-type depth --model-path example_ckpt.pth
4 |   


--------------------------------------------------------------------------------
/test_locally_avwan_rgbd.sh:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env bash
2 | 
3 | env CHALLENGE_CONFIG_FILE="configs/challenge_avwan.local.rgbd.yaml" python avwan_agent.py --input-type $INPUT_TYPE --model-path CKPT_NAME.pth "$@"
4 |   


--------------------------------------------------------------------------------
/test_locally_random.sh:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env bash
2 | 
3 | env CHALLENGE_CONFIG_FILE="configs/challenge_random.local.yaml" python agent.py
4 |   

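`test_locally_random.sh` runs `agent.py`, whose source is not shown here. `evaluate_custom` in `eval.py` needs only two things from an agent. The first is a `reset()` method, called at the start of each episode. The second is an `act(observations)` method, whose return value is passed to the environment's `step` and recorded under `ACTIONS`, so it must be JSON-serializable. The sketch below shows a minimal random agent satisfying that interface. The class name and the `{"action": id}` return format are assumptions. The action ids are also assumed, following habitat's `v0` numbering (0 STOP, 1 MOVE_FORWARD, 2 TURN_LEFT, 3 TURN_RIGHT).

```python
import random
from typing import Any, Dict, Optional, Sequence

# Assumed action ids (habitat "v0" action space); verify against the simulator.
MOVE_FORWARD, TURN_LEFT, TURN_RIGHT = 1, 2, 3


class RandomAgent:
    """Hypothetical agent exposing the reset()/act() interface used by eval.py."""

    def __init__(self, actions: Sequence[int] = (MOVE_FORWARD, TURN_LEFT, TURN_RIGHT),
                 seed: Optional[int] = None) -> None:
        self._actions = list(actions)
        self._rng = random.Random(seed)  # private RNG so runs are reproducible per seed

    def reset(self) -> None:
        # A random policy keeps no per-episode state.
        pass

    def act(self, observations: Dict[str, Any]) -> Dict[str, int]:
        # Observations are ignored; pick uniformly among the allowed actions.
        return {"action": self._rng.choice(self._actions)}


agent = RandomAgent(seed=7)
agent.reset()
print([agent.act({})["action"] for _ in range(5)])
```

STOP is left out of the default action set so that episodes are not ended at random. Episodes then run until `MAX_EPISODE_STEPS: 500`, which gives a baseline that explores rather than quits.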

--------------------------------------------------------------------------------