├── .gitignore
├── DefineTask.md
├── LICENSE
├── README.md
├── datasets
│   ├── .gitignore
│   └── README.md
├── ithor_arm
│   ├── __init__.py
│   ├── arm_calculation_utils.py
│   ├── ithor_arm_constants.py
│   ├── ithor_arm_environment.py
│   ├── ithor_arm_sensors.py
│   ├── ithor_arm_task_samplers.py
│   ├── ithor_arm_tasks.py
│   └── ithor_arm_viz.py
├── manipulathor_baselines
│   ├── __init__.py
│   └── armpointnav_baselines
│       ├── __init__.py
│       ├── experiments
│       │   ├── __init__.py
│       │   ├── armpointnav_base.py
│       │   ├── armpointnav_mixin_ddppo.py
│       │   ├── armpointnav_mixin_simplegru.py
│       │   ├── armpointnav_thor_base.py
│       │   └── ithor
│       │       ├── __init__.py
│       │       ├── armpointnav_depth.py
│       │       ├── armpointnav_disjoint_depth.py
│       │       ├── armpointnav_ithor_base.py
│       │       ├── armpointnav_no_vision.py
│       │       ├── armpointnav_rgb.py
│       │       ├── armpointnav_rgbdepth.py
│       │       ├── test_NovelObj_armpointnav_depth.py
│       │       └── test_SeenScenes_NovelObj_armpointnav_depth.py
│       └── models
│           ├── __init__.py
│           ├── arm_pointnav_models.py
│           ├── base_models.py
│           └── disjoint_arm_pointnav_models.py
├── manipulathor_constants.py
├── manipulathor_utils
│   ├── __init__.py
│   ├── debugger_util.py
│   └── net_utils.py
├── media
│   ├── armpointnav_task.png
│   └── dataset.png
├── pretrained_models
│   ├── .gitignore
│   └── EvaluateModels.md
├── requirements.txt
├── scripts
│   ├── example_notebook.ipynb
│   ├── jupyter_helper.py
│   └── startx.py
├── setup.py
└── updated_requirements.txt

/.gitignore:
--------------------------------------------------------------------------------
1 | __pycache__/
2 | *.DS_Store*
3 | .idea/
4 | experiment_output/
5 | test_out
6 | *.ipynb_checkpoints*
7 | 
--------------------------------------------------------------------------------
/DefineTask.md:
--------------------------------------------------------------------------------
1 | # ✨ Defining a New Task
2 | 
3 | In order to define a new task or change any component of the training procedure, it suffices to look into/change the following files and classes.
4 | 
5 | ## ManipulaTHOR Environment
6 | 
7 | The `ManipulaTHOREnvironment` class defined in `ithor_arm/ithor_arm_environment.py` is a wrapper around the AI2-THOR environment. It discretizes the action space for the ArmPointNav task and wraps the functions needed for working with the low-level manipulation features, including getting the arm's pose, the agent's pose, etc.
8 | 
9 | #### Important Functions
10 | 
11 | 1. `step`: This function translates the actions generated by the model to their corresponding AI2-THOR API commands. For example, for the action `MoveHandXUp`, this function calculates the current arm location, increments the `x` value by `0.05` (which is our step size), and calls the function `MoveMidLevelArm` with the desired `x,y,z` values.
12 | 
13 | 2. `get_objects_moved`: Finds the objects that have moved since the previous action. We only use this function during evaluation, for calculating the Success Without Disturbance metric; however, it can be used for reward shaping as well.
14 | 
15 | 3. `get_current_arm_state`: Calculates the current arm state from the metadata of the environment.
16 | 
17 | 4. `is_object_at_low_level_hand`: Checks whether the object is currently held in the arm's gripper.
18 | 
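To make the translation concrete, below is a condensed sketch of the arm-action dispatch performed inside `step` (adapted from `ithor_arm/ithor_arm_environment.py`; `translate_arm_action` is a name invented for this illustration, not a function in the repo):

```python
# Condensed illustration of the arm-action dispatch in ManipulaTHOREnvironment.step.
# MOVE_ARM_CONSTANT (0.05m) mirrors ithor_arm/ithor_arm_constants.py;
# translate_arm_action is a hypothetical helper, not part of the code base.
MOVE_ARM_CONSTANT = 0.05

def translate_arm_action(action: str, base_position: dict) -> dict:
    """Map a discrete action such as 'MoveArmXP' onto a THOR 'MoveArm' call."""
    axis = action[-2].lower()                 # 'x', 'y' or 'z'
    sign = 1 if action.endswith("P") else -1  # 'P' = plus, 'M' = minus
    base_position[axis] += sign * MOVE_ARM_CONSTANT
    return dict(
        action="MoveArm",
        position={k: base_position[k] for k in ("x", "y", "z")},
    )
```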
19 | ## ArmPointNav Task Sampler
20 | 
21 | You can find `ArmPointNavTaskSampler` in `ithor_arm/ithor_arm_task_samplers.py`. This class is in charge of reading the dataset metadata, initializing all possible locations for the object and agent, and randomly sampling a data point from this set for each episode.
22 | 
23 | #### Important Functions
24 | 
25 | 1. `calc_possible_trajectories`: This function stores all the possible locations of the objects on the counters in a dictionary keyed by object id.
26 | 2. `get_source_target_indices`: This function returns two indices from the dictionary calculated above, for the initial and goal locations of the object.
27 | 3. `next_task`: Gets the source and target locations, resets the scene, initializes the agent, and transports the object to its initial location.
28 | 
29 | 
30 | ## ArmPointNav Task
31 | 
32 | You can find `ArmPointNavTask` in `ithor_arm/ithor_arm_tasks.py`. This class defines the possible actions and the reward, computes and records the evaluation metrics, and calls the appropriate API functions on the environment.
33 | 
34 | #### Important Functions and Attributes
35 | 
36 | 1. `_actions`: An attribute storing all the possible actions for each task. Note that this is an ordered list.
37 | 2. `judge`: The function for calculating the reward. Any reward shaping should be done in this function.
38 | 3. `metrics`: Calculates and logs the value of each evaluation metric per episode.
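As a concrete illustration of the kind of shaping `judge` can implement, here is a small standalone sketch. The config keys (`step_penalty`, `pickup_success_reward`, `goal_success_reward`) and the boolean flags are invented for this example and are not necessarily the repo's actual `reward_configs` schema:

```python
# Hypothetical reward-shaping sketch in the spirit of ArmPointNavTask.judge.
# Key names and flags below are illustrative, not the repo's actual schema.
def shaped_reward(reward_configs: dict, object_picked_up: bool,
                  first_pickup_this_step: bool, reached_goal: bool) -> float:
    reward = reward_configs["step_penalty"]          # small negative living cost
    if object_picked_up and first_pickup_this_step:  # pay the pickup bonus once
        reward += reward_configs["pickup_success_reward"]
    if reached_goal:                                 # object delivered to its goal
        reward += reward_configs["goal_success_reward"]
    return float(reward)

# e.g. shaped_reward({"step_penalty": -0.01, "pickup_success_reward": 5.0,
#                     "goal_success_reward": 10.0}, True, True, False) == 4.99
```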
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 | 
3 | Original work Copyright (c) 2017 Ilya Kostrikov
4 | 
5 | Original work Copyright (c) Facebook, Inc. and its affiliates.
6 | 
7 | Modified work Copyright (c) 2020 Allen Institute for Artificial Intelligence
8 | 
9 | Permission is hereby granted, free of charge, to any person obtaining a copy
10 | of this software and associated documentation files (the "Software"), to deal
11 | in the Software without restriction, including without limitation the rights
12 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
13 | copies of the Software, and to permit persons to whom the Software is
14 | furnished to do so, subject to the following conditions:
15 | 
16 | The above copyright notice and this permission notice shall be included in all
17 | copies or substantial portions of the Software.
18 | 
19 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
20 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
21 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
22 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
23 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
24 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
25 | SOFTWARE.
26 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # ManipulaTHOR: A Framework for Visual Object Manipulation
2 | #### Kiana Ehsani, Winson Han, Alvaro Herrasti, Eli VanderBilt, Luca Weihs, Eric Kolve, Aniruddha Kembhavi, Roozbeh Mottaghi
3 | #### (Oral Presentation at CVPR 2021)
4 | #### (Project Page)--(Framework)--(Video)--(Slides)
5 | 
6 | We present ManipulaTHOR, a framework that facilitates visual manipulation of objects using a robotic arm. Our framework is built upon a physics engine and enables realistic interactions with objects while navigating through scenes and performing tasks. Object manipulation is an established research domain within the robotics community and poses several challenges including avoiding collisions, grasping, and long-horizon planning. Our framework focuses primarily on manipulation in visually rich and complex scenes, joint manipulation and navigation planning, and generalization to unseen environments and objects: challenges that are often overlooked. The framework provides a comprehensive suite of sensory information and motor functions enabling the development of robust manipulation agents.
7 | 
8 | This code base is based on the AllenAct framework, and the majority of the core training algorithms and pipelines are borrowed from the AllenAct code base.
9 | 
10 | ### Citation
11 | 
12 | If you find this project useful in your research, please consider citing:
13 | 
14 | ```
15 | @inproceedings{ehsani2021manipulathor,
16 |   title={ManipulaTHOR: A Framework for Visual Object Manipulation},
17 |   author={Ehsani, Kiana and Han, Winson and Herrasti, Alvaro and VanderBilt, Eli and Weihs, Luca and Kolve, Eric and Kembhavi, Aniruddha and Mottaghi, Roozbeh},
18 |   booktitle={CVPR},
19 |   year={2021}
20 | }
21 | ```
22 | 
23 | ### Contents
24 | 
25 | * Installation
26 | * ArmPointNav Task Description
27 | * Dataset
28 | * Sensory Observations
29 | * Allowed Actions
30 | * Defining a New Task
31 | * Training An Agent
32 | * Evaluating A Pre-Trained Agent
33 | 
34 | 
35 | 
36 | 37 | ## 💻 Installation 38 | 39 | To begin, clone this repository locally 40 | ```bash 41 | git clone https://github.com/ehsanik/manipulathor.git 42 | ``` 43 |
44 | 
45 | 
46 | 
47 | Here's a quick summary of the most important files/directories in this repository:
48 | * `manipulathor_utils/*.py` - Helper functions and classes.
49 | * `manipulathor_baselines/armpointnav_baselines`
50 |     - `experiments/`
51 |         + `ithor/armpointnav_*.py` - Different baselines introduced in the paper. Each file in this folder corresponds to a row of a table in the paper.
52 |         + `*.py` - The base configuration files which define the experiment setup and hyperparameters for training.
53 |     - `models/*.py` - A collection of Actor-Critic baseline models.
54 | * `ithor_arm/` - A collection of environments, task samplers and task definitions.
55 |     - `ithor_arm_environment.py` - The definition of the `ManipulaTHOREnvironment` that wraps the AI2THOR-based framework introduced in this work and enables an easy-to-use API.
56 |     - `ithor_arm_constants.py` - Constants used to define the task and parameters of the environment. These include the step size
57 |       taken by the agent, the unique id of the THOR build we use, etc.
58 |     - `ithor_arm_sensors.py` - Sensors which provide observations to our agents during training. E.g. the
59 |       `RGBSensor` obtains RGB images from the environment and returns them for use by the agent.
60 |     - `ithor_arm_tasks.py` - Definition of the `ArmPointNav` task, the reward definition and the function for calculating the goal achievement.
61 |     - `ithor_arm_task_samplers.py` - Definition of the `ArmPointNavTaskSampler` samplers. Initializing the sampler, reading the json files from the dataset, and randomly choosing a task are all defined in this file.
62 |     - `ithor_arm_viz.py` - Utility functions for visualizing and logging the outputs of the models.
63 | 
64 | 

65 |
66 | 
67 | You can then install requirements by running
68 | ```bash
69 | pip install -r requirements.txt
70 | ```
71 | 
72 | 
73 | 
74 | **Python 3.6+ 🐍.** The code base requires Python 3.6+ and makes use of Python's `typing` annotations throughout.
75 | 
76 | **AI2-THOR 🧞.** To ensure reproducible results, please install this version of AI2-THOR (the exact commit is pinned as `MANIPULATHOR_COMMIT_ID` in `ithor_arm/ithor_arm_constants.py` and is checked at runtime).
77 | 
78 | After installing the requirements, you should start the xserver by running [this script](scripts/startx.py) in the background. Finally, you can start playing with the environment using [our example jupyter notebook](scripts/example_notebook.ipynb).
79 | 
80 | ## 📝 ArmPointNav Task Description
81 | 
82 | 
83 | 
84 | ArmPointNav addresses the problem of visual object manipulation: the task is to move an object between two locations in a scene. Operating in visually rich and complex environments, generalizing to unseen environments and objects, avoiding collisions with objects and structures in the scene, and visual planning to reach the destination are among the major challenges of this task. The example illustrates a sequence of actions taken by a virtual robot within the ManipulaTHOR environment for picking up a vase from the shelf and stacking it on a plate on the countertop.
85 | 
86 | ## 📊 Dataset
87 | 
88 | To study the task of ArmPointNav, we present the ArmPointNav Dataset (APND). It consists of 30 kitchen scenes in AI2-THOR that include more than 150 object categories (69 interactable object categories) with a variety of shapes, sizes and textures. We use 12 pickupable categories as our target objects. We use 20 scenes in the training set, and the remainder is evenly split into Val and Test. We train with 6 object categories and use the remaining 6 to test our model in a Novel-Obj setting. For more information on the dataset and how to download it, refer to Dataset Details.
89 | 
90 | ## 🖼️ Sensory Observations
91 | 
92 | The types of sensors provided for this paper include:
93 | 
94 | 1. **RGB images** - having shape `224x224x3` and an FOV of 90 degrees.
95 | 2. **Depth maps** - having shape `224x224` and an FOV of 90 degrees.
96 | 3. **Perfect egomotion** - Agents know precisely where the object is located relative to the agent's arm, as well as relative to its goal location.
97 | 
98 | 
99 | ## 🏃 Allowed Actions
100 | 
101 | A total of 13 actions are available to our agents. These include:
102 | 
103 | 1. **Moving the agent**
104 | 
105 |     * `MoveAhead` - Results in the agent moving ahead by 0.25m if doing so would not result in the agent colliding with something.
106 | 
107 |     * `Rotate [Right/Left]` - Results in the agent's body rotating 45 degrees in the desired direction.
108 | 
109 | 2. **Moving the arm**
110 | 
111 |     * `Moving the wrist along axis [x, y, z]` - Results in the arm moving along an axis (±x, ±y, ±z) by 0.05m.
112 | 
113 |     * `Moving the height of the arm base [Up/Down]` - Results in the base of the arm moving along the y-axis by 0.05m.
114 | 
115 | 3. **Abstract Grasp**
116 | 
117 |     * Picks up a target object. Only succeeds if the object is inside the arm's grasper.
118 | 
119 | 4. **Done Action**
120 | 
121 |     * This action finishes an episode. The agent must issue a `Done` action when it reaches the goal; otherwise the episode is considered a failure. (A minimal sketch of driving these actions appears below.)
122 | 
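To get a concrete feel for the action space, the following minimal sketch drives the environment wrapper directly. It assumes the repository root is on your `PYTHONPATH` and that the APND dataset has been extracted to `datasets/apnd-dataset` (the constants module loads `starting_pose.json` at import time); the scene name is just an example:

```python
# Minimal sketch: step the ManipulaTHOR wrapper with the discrete actions
# defined in ithor_arm/ithor_arm_constants.py.
from ithor_arm.ithor_arm_constants import ENV_ARGS, MOVE_AHEAD, MOVE_ARM_X_P
from ithor_arm.ithor_arm_environment import ManipulaTHOREnvironment

env = ManipulaTHOREnvironment(env_args=ENV_ARGS)
env.reset(scene_name="FloorPlan1")          # any AI2-THOR kitchen scene

env.step({"action": MOVE_AHEAD})            # move the agent base forward
env.step({"action": MOVE_ARM_X_P})          # move the wrist +0.05m along x
print(env.get_current_arm_state())          # arm state incl. normalized height 'h'
env.stop()
```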
123 | ## ✨ Defining a New Task
124 | 
125 | In order to define a new task, redefine the reward, try a new model, or change the environment setup, check out our tutorial on defining a new task here.
126 | 
127 | ## 🏋 Training An Agent
128 | 
129 | To run experiments, you first need to add the project directory to your `PYTHONPATH`. You can then train a model with a specific experiment setup by running one of the experiments below:
130 | 
131 | ```
132 | allenact manipulathor_baselines/armpointnav_baselines/experiments/ithor/<EXPERIMENT-NAME> -o experiment_output -s 1
133 | ```
134 | 
135 | Where `<EXPERIMENT-NAME>` can be one of the options below:
136 | 
137 | ```
138 | armpointnav_no_vision -- No Vision Baseline
139 | armpointnav_disjoint_depth -- Disjoint Model Ablation
140 | armpointnav_rgb -- Our RGB Experiment
141 | armpointnav_rgbdepth -- Our RGBD Experiment
142 | armpointnav_depth -- Our Depth Experiment
143 | ```
144 | 
145 | 
146 | ## 💪 Evaluating A Pre-Trained Agent
147 | 
148 | To evaluate a pre-trained model (for example, to reproduce the numbers in the paper), you can add
149 | `-t test -c <PATH-TO-WEIGHTS>` to the end of the command you ran for training.
150 | 
151 | In order to reproduce the numbers in the paper, you need to download the pretrained models from
152 | [here](https://drive.google.com/file/d/1wZi_IL5d7elXLkAb4jOixfY0M6-ZfkGM/view?usp=sharing) and extract them
153 | to `pretrained_models`. The full list of experiments and their corresponding trained weights can be found
154 | [here](pretrained_models/EvaluateModels.md).
155 | 
156 | ```
157 | allenact manipulathor_baselines/armpointnav_baselines/experiments/ithor/<EXPERIMENT-NAME> -o test_out -s 1 -t test -c <PATH-TO-WEIGHTS>
158 | ```
--------------------------------------------------------------------------------
/datasets/.gitignore:
--------------------------------------------------------------------------------
1 | *
2 | !.gitignore
3 | !*.md
--------------------------------------------------------------------------------
/datasets/README.md:
--------------------------------------------------------------------------------
1 | # 📊 Arm Point Nav Dataset (APND)
2 | 
3 | As a step towards generalizable object manipulation, we present the task of ArmPointNav – moving an object in the scene from a source location to a target location. This involves navigating towards the object, moving the arm gripper close to the object, picking it up, navigating towards the target location, moving the arm gripper (with the object in place) close to the target location, and finally releasing the object so that it lands carefully. In line with the agent navigation task of PointNav, the source and target locations of the object are specified via (x, y, z) coordinates in the agent's coordinate frame.
4 | 
5 | To study the task of ArmPointNav, we present the Arm Point Nav Dataset (APND).
6 | 
7 | 
8 | Our data generation process starts by finding all the possible locations in each room for each object. We then separate out the initial object locations for which there exists a valid agent pose that can reach them. This ensures that the objects in each task of our dataset are reachable by our agent, and that there is at least one solution for picking up/dropping off the object at that location. Note that the object in the aforementioned target location can be visible or hidden from the agent's point of view (e.g., on a table vs. in the closed fridge). Since there is a path between any two agent locations in the room, any pair of possible locations can be considered a task (which may or may not require navigating the agent in the scene).
9 | 
10 | 
11 | 
12 | The initial location of the object can pose a variety of challenges. In (a) the tomato is occluded by the bowl, therefore the agent needs to remove the bowl to reach the tomato.
In (b) the lettuce is on the shelf, which requires the agent to move the arm carefully such that it does not collide with the shelf or the vase. In (c) the goal object is inside another object, and in (d) the goal object is inside a receptacle, therefore it requires interacting with another entity (opening the microwave's door) before reaching for the object. The latter case is outside the scope of this code base.
13 | 
14 | ## 📲 Install The Dataset
15 | 
16 | 1. Download the compressed file of the dataset from [here](https://drive.google.com/file/d/13fOGheELJuoXv_JWRPGcpu2fBY-jKWKB/view?usp=sharing).
17 | 2. Extract the zip file to `datasets/apnd-dataset`.
18 | 
19 | ## 🗂️ Dataset Hierarchy
20 | 
21 | Below you can find the description and use case of each of the files in the dataset.
22 | 
23 | ```
24 | apnd-dataset
25 | └── valid_object_positions ----- Pool of potential object positions
26 | │   └── valid_<OBJECT>_positions_in_<SCENE>.json
27 | └── deterministic_tasks ----- Deterministic set of tasks, randomly chosen for evaluation
28 | │   └── tasks_<OBJECT>_positions_in_<SCENE>.json ----- Consists of pairs of initial and goal locations of the object
29 | └── starting_pose.json ----- Initial location in each room for which the arm initialization succeeds
30 | └── valid_agent_initial_locations.json ----- Reachable positions for the agent in each room
31 | └── deterministic_valid_agent_initial_locations.json ----- Fixed set of initial locations for the agent, used during test time (to ensure evaluation consistency).
32 | ```
33 | 
34 | 
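For orientation, the sketch below loads one of the position pools and inspects it. The key names are the ones consumed by `ithor_arm/ithor_arm_task_samplers.py`; the scene/object choice is just an example (any entry of `TRAIN_OBJECTS`/`TEST_OBJECTS` in a kitchen scene works), and it assumes the dataset has been extracted as described above:

```python
# Minimal sketch: inspect one pool of candidate object positions from APND.
import json

scene, obj = "FloorPlan1", "Apple"  # example scene and object
path = f"datasets/apnd-dataset/valid_object_positions/valid_{obj}_positions_in_{scene}.json"
with open(path) as f:
    points = json.load(f)[scene]

visible = [p for p in points if p["visibility"]]  # the sampler keeps only these
print(len(points), "candidate positions,", len(visible), "visible")
print(visible[0]["object_location"], visible[0]["countertop_id"])
```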
--------------------------------------------------------------------------------
/ithor_arm/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/allenai/manipulathor/4562eb8c2f67ff67e5b9ba3930da84b6023a58a4/ithor_arm/__init__.py
--------------------------------------------------------------------------------
/ithor_arm/arm_calculation_utils.py:
--------------------------------------------------------------------------------
1 | """Utility classes and functions for calculating the arm relative and absolute position."""
2 | from typing import Dict
3 | 
4 | import numpy as np
5 | import torch
6 | from scipy.spatial.transform import Rotation as R
7 | 
8 | from ithor_arm.ithor_arm_constants import (
9 |     ARM_START_POSITIONS,
10 |     ADITIONAL_ARM_ARGS,
11 | )
12 | 
13 | 
14 | def convert_state_to_tensor(state: Dict):
15 |     result = []
16 |     if "position" in state:
17 |         result += [
18 |             state["position"]["x"],
19 |             state["position"]["y"],
20 |             state["position"]["z"],
21 |         ]
22 |     if "rotation" in state:
23 |         result += [
24 |             state["rotation"]["x"],
25 |             state["rotation"]["y"],
26 |             state["rotation"]["z"],
27 |         ]
28 |     return torch.Tensor(result)
29 | 
30 | 
31 | def diff_position(state_goal, state_curr):
32 |     p1 = state_goal["position"]
33 |     p2 = state_curr["position"]
34 |     result = {k: abs(p1[k] - p2[k]) for k in p1.keys()}
35 |     return result
36 | 
37 | 
38 | def make_rotation_matrix(position, rotation):
39 |     result = np.zeros((4, 4))
40 |     r = R.from_euler("xyz", [rotation["x"], rotation["y"], rotation["z"]], degrees=True)
41 |     result[:3, :3] = r.as_matrix()
42 |     result[3, 3] = 1
43 |     result[:3, 3] = [position["x"], position["y"], position["z"]]
44 |     return result
45 | 
46 | 
47 | def inverse_rot_trans_mat(mat):
48 |     mat = np.linalg.inv(mat)
49 |     return mat
50 | 
51 | 
52 | def position_rotation_from_mat(matrix):
53 |     result = {"position": None, "rotation": None}
54 |     rotation = R.from_matrix(matrix[:3, :3]).as_euler("xyz", degrees=True)
55 |     rotation_dict = {"x": rotation[0], "y": rotation[1], "z": rotation[2]}
56 |     result["rotation"] = rotation_dict
57 |     position = matrix[:3, 3]
58 |     result["position"] = {"x": position[0], "y": position[1], "z": position[2]}
59 |     return result
60 | 
61 | 
62 | def find_closest_inverse(deg):
63 |     for k in saved_inverse_rotation_mats.keys():
64 |         if abs(k - deg) < 5:
65 |             return saved_inverse_rotation_mats[k]
66 |     # if it reaches here it means the inverse for this degree has not been precomputed
67 |     rotation = R.from_euler("xyz", [0, deg, 0], degrees=True)
68 |     result = rotation.as_matrix()
69 |     inverse = inverse_rot_trans_mat(result)
70 |     print("WARNING: Had to calculate the matrix for ", deg)
71 |     return inverse
72 | 
73 | 
74 | def calc_inverse(deg):
75 |     rotation = R.from_euler("xyz", [0, deg, 0], degrees=True)
76 |     result = rotation.as_matrix()
77 |     inverse = inverse_rot_trans_mat(result)
78 |     return inverse
79 | 
80 | 
81 | saved_inverse_rotation_mats = {i: calc_inverse(i) for i in range(0, 360, 45)}
82 | saved_inverse_rotation_mats[360] = saved_inverse_rotation_mats[0]
83 | 
84 | 
85 | def convert_world_to_agent_coordinate(world_obj, agent_state):
86 |     position = agent_state["position"]
87 |     rotation = agent_state["rotation"]
88 |     agent_translation = [position["x"], position["y"], position["z"]]
89 |     # inverse_agent_rotation = inverse_rot_trans_mat(agent_rotation_matrix[:3, :3])
90 |     assert abs(rotation["x"] - 0) < 0.01 and abs(rotation["z"] - 0) < 0.01
91 |     inverse_agent_rotation = find_closest_inverse(rotation["y"])
92 |     obj_matrix = make_rotation_matrix(world_obj["position"], world_obj["rotation"])
93 |     obj_translation = np.matmul(
94 |         inverse_agent_rotation, (obj_matrix[:3, 3] - agent_translation)
95 |     )
96 |     # add rotation later
97 |     obj_matrix[:3, 3] = obj_translation
98 |     result = position_rotation_from_mat(obj_matrix)
99 |     return result
100 | 
101 | 
102 | def test_translation_functions():
103 |     agent_coordinate = {
104 |         "position": {"x": 1, "y": 0, "z": 2},
105 |         "rotation": {"x": 0, "y": -45, "z": 0},
106 |     }
107 |     obj_coordinate = {
108 |         "position": {"x": 0, "y": 1, "z": 0},
109 |         "rotation": {"x": 0, "y": 0, "z": 0},
110 |     }
111 |     rotated = convert_world_to_agent_coordinate(obj_coordinate, agent_coordinate)
112 |     eps = 0.01
113 |     # check all three coordinates; the expected values are -3/sqrt(2), 1 and -1/sqrt(2)
114 |     assert (
115 |         abs(rotated["position"]["x"] - (-2.12)) < eps
116 |         and abs(rotated["position"]["y"] - (1)) < eps
117 |         and abs(rotated["position"]["z"] - (-0.71)) < eps
118 |     )
119 | 
120 | 
121 | def initialize_arm(controller):
122 |     # start the arm from high up
123 |     scene = controller.last_event.metadata["sceneName"]
124 |     initial_pose = ARM_START_POSITIONS[scene]
125 |     event1 = controller.step(
126 |         dict(
127 |             action="TeleportFull",
128 |             standing=True,
129 |             x=initial_pose["x"],
130 |             y=initial_pose["y"],
131 |             z=initial_pose["z"],
132 |             rotation=dict(x=0, y=initial_pose["rotation"], z=0),
133 |             horizon=initial_pose["horizon"],
134 |         )
135 |     )
136 |     event2 = controller.step(
137 |         dict(action="MoveArm", position=dict(x=0.0, y=0, z=0.35), **ADITIONAL_ARM_ARGS)
138 |     )
139 |     event3 = controller.step(dict(action="MoveArmBase", y=0.8, **ADITIONAL_ARM_ARGS))
140 |     return event1, event2, event3
141 | 
--------------------------------------------------------------------------------
/ithor_arm/ithor_arm_constants.py:
--------------------------------------------------------------------------------
1 | """Constant values and hyperparameters that are used by the environment."""
2 | import ai2thor
3 | import ai2thor.fifo_server
4 | from allenact_plugins.ithor_plugin.ithor_environment import
IThorEnvironment 5 | 6 | # MANIPULATHOR_COMMIT_ID = "68212159d78aab5c611b7f16338380993884a06a" 7 | # MANIPULATHOR_COMMIT_ID = 'bcc2e62970823667acb5c2a56e809419f1521e52' 8 | MANIPULATHOR_COMMIT_ID = "a84dd29471ec2201f583de00257d84fac1a03de2" 9 | 10 | MOVE_THR = 0.01 11 | ARM_MIN_HEIGHT = 0.450998873 12 | ARM_MAX_HEIGHT = 1.8009994 13 | MOVE_ARM_CONSTANT = 0.05 14 | MOVE_ARM_HEIGHT_CONSTANT = MOVE_ARM_CONSTANT 15 | 16 | ADITIONAL_ARM_ARGS = { 17 | "disableRendering": True, 18 | "returnToStart": True, 19 | "speed": 1, 20 | } 21 | 22 | MOVE_AHEAD = "MoveAheadContinuous" 23 | ROTATE_LEFT = "RotateLeftContinuous" 24 | ROTATE_RIGHT = "RotateRightContinuous" 25 | MOVE_ARM_HEIGHT_P = "MoveArmHeightP" 26 | MOVE_ARM_HEIGHT_M = "MoveArmHeightM" 27 | MOVE_ARM_X_P = "MoveArmXP" 28 | MOVE_ARM_X_M = "MoveArmXM" 29 | MOVE_ARM_Y_P = "MoveArmYP" 30 | MOVE_ARM_Y_M = "MoveArmYM" 31 | MOVE_ARM_Z_P = "MoveArmZP" 32 | MOVE_ARM_Z_M = "MoveArmZM" 33 | PICKUP = "PickUpMidLevel" 34 | DONE = "DoneMidLevel" 35 | 36 | 37 | ENV_ARGS = dict( 38 | gridSize=0.25, 39 | width=224, 40 | height=224, 41 | visibilityDistance=1.0, 42 | agentMode="arm", 43 | fieldOfView=100, 44 | agentControllerType="mid-level", 45 | server_class=ai2thor.fifo_server.FifoServer, 46 | useMassThreshold=True, 47 | massThreshold=10, 48 | autoSimulation=False, 49 | autoSyncTransforms=True, 50 | ) 51 | 52 | TRAIN_OBJECTS = ["Apple", "Bread", "Tomato", "Lettuce", "Pot", "Mug"] 53 | TEST_OBJECTS = ["Potato", "SoapBottle", "Pan", "Egg", "Spatula", "Cup"] 54 | 55 | 56 | def make_all_objects_unbreakable(controller): 57 | all_breakable_objects = [ 58 | o["objectType"] 59 | for o in controller.last_event.metadata["objects"] 60 | if o["breakable"] is True 61 | ] 62 | all_breakable_objects = set(all_breakable_objects) 63 | for obj_type in all_breakable_objects: 64 | controller.step(action="MakeObjectsOfTypeUnbreakable", objectType=obj_type) 65 | 66 | 67 | def reset_environment_and_additional_commands(controller, scene_name): 68 | controller.reset(scene_name) 69 | controller.step(action="MakeAllObjectsMoveable") 70 | controller.step(action="MakeObjectsStaticKinematicMassThreshold") 71 | make_all_objects_unbreakable(controller) 72 | return 73 | 74 | 75 | def transport_wrapper(controller, target_object, target_location): 76 | transport_detail = dict( 77 | action="PlaceObjectAtPoint", 78 | objectId=target_object, 79 | position=target_location, 80 | forceKinematic=True, 81 | ) 82 | advance_detail = dict(action="AdvancePhysicsStep", simSeconds=1.0) 83 | 84 | if issubclass(type(controller), IThorEnvironment): 85 | event = controller.step(transport_detail) 86 | controller.step(advance_detail) 87 | elif type(controller) == ai2thor.controller.Controller: 88 | event = controller.step(**transport_detail) 89 | controller.step(**advance_detail) 90 | return event 91 | 92 | 93 | VALID_OBJECT_LIST = [ 94 | "Knife", 95 | "Bread", 96 | "Fork", 97 | "Potato", 98 | "SoapBottle", 99 | "Pan", 100 | "Plate", 101 | "Tomato", 102 | "Egg", 103 | "Pot", 104 | "Spatula", 105 | "Cup", 106 | "Bowl", 107 | "SaltShaker", 108 | "PepperShaker", 109 | "Lettuce", 110 | "ButterKnife", 111 | "Apple", 112 | "DishSponge", 113 | "Spoon", 114 | "Mug", 115 | ] 116 | 117 | import json 118 | 119 | with open("datasets/apnd-dataset/starting_pose.json") as f: 120 | ARM_START_POSITIONS = json.load(f) 121 | -------------------------------------------------------------------------------- /ithor_arm/ithor_arm_environment.py: -------------------------------------------------------------------------------- 1 | """A 
wrapper for engaging with the ManipulaTHOR environment."""
2 | 
3 | import copy
4 | import math
5 | import typing
6 | import warnings
7 | from typing import Tuple, Dict, List, Set, Union, Any, Optional
8 | 
9 | import ai2thor.server
10 | import numpy as np
11 | from ai2thor.controller import Controller
12 | from allenact_plugins.ithor_plugin.ithor_constants import VISIBILITY_DISTANCE, FOV
13 | from allenact_plugins.ithor_plugin.ithor_environment import IThorEnvironment
14 | 
15 | from ithor_arm.ithor_arm_constants import (
16 |     ADITIONAL_ARM_ARGS,
17 |     ARM_MIN_HEIGHT,
18 |     ARM_MAX_HEIGHT,
19 |     MOVE_ARM_HEIGHT_CONSTANT,
20 |     MOVE_ARM_CONSTANT,
21 |     MANIPULATHOR_COMMIT_ID,
22 |     reset_environment_and_additional_commands,
23 |     MOVE_THR,
24 | )
25 | from manipulathor_utils.debugger_util import ForkedPdb
26 | 
27 | 
28 | class ManipulaTHOREnvironment(IThorEnvironment):
29 |     """Wrapper for the manipulathor controller providing arm functionality
30 |     and bookkeeping.
31 | 
32 |     See [here](https://ai2thor.allenai.org/documentation/installation) for comprehensive
33 |     documentation on AI2-THOR.
34 | 
35 |     # Attributes
36 | 
37 |     controller : The ai2thor controller.
38 |     """
39 | 
40 |     def __init__(
41 |         self,
42 |         x_display: Optional[str] = None,
43 |         docker_enabled: bool = False,
44 |         local_thor_build: Optional[str] = None,
45 |         visibility_distance: float = VISIBILITY_DISTANCE,
46 |         fov: float = FOV,
47 |         player_screen_width: int = 224,
48 |         player_screen_height: int = 224,
49 |         quality: str = "Very Low",
50 |         restrict_to_initially_reachable_points: bool = False,
51 |         make_agents_visible: bool = True,
52 |         object_open_speed: float = 1.0,
53 |         simplify_physics: bool = False,
54 |         verbose: bool = False,
55 |         env_args=None,
56 |     ) -> None:
57 |         """Initializer.
58 | 
59 |         # Parameters
60 | 
61 |         x_display : The x display into which to launch ai2thor (possibly necessary if you are running on a server
62 |             without an attached display).
63 |         docker_enabled : Whether or not to run thor in a docker container (useful on a server without an attached
64 |             display so that you don't have to start an x display).
65 |         local_thor_build : The path to a local build of ai2thor. This is probably not necessary for your use case
66 |             and can be safely ignored.
67 |         visibility_distance : The distance (in meters) at which objects, in the viewport of the agent,
68 |             are considered visible by ai2thor and will have their "visible" flag be set to `True` in the metadata.
69 |         fov : The agent's camera's field of view.
70 |         player_screen_width : The width resolution (in pixels) of the images returned by ai2thor.
71 |         player_screen_height : The height resolution (in pixels) of the images returned by ai2thor.
72 |         quality : The quality at which to render. Possible quality settings can be found in
73 |             `ai2thor._quality_settings.QUALITY_SETTINGS`.
74 |         restrict_to_initially_reachable_points : Whether or not to restrict the agent to locations in ai2thor
75 |             that were found to be (initially) reachable by the agent (i.e. reachable by the agent after resetting
76 |             the scene). This can be useful if you want to ensure there are only a fixed set of locations where the
77 |             agent can go.
78 |         make_agents_visible : Whether or not the agent should be visible. Most noticeable when there are multiple agents
79 |             or when quality settings are high so that the agent casts a shadow.
80 |         object_open_speed : How quickly objects should be opened. High speeds mean faster simulation, but also mean
81 |             that opened objects can have a lot of kinetic energy and can, possibly, knock other objects away.
82 |         simplify_physics : Whether or not to simplify physics when applicable. Currently this only simplifies object
83 |             interactions when opening drawers (when simplified, objects within a drawer do not slide around on
84 |             their own when the drawer is opened or closed; instead they are effectively glued down).
85 |         """
86 | 
87 |         self._start_player_screen_width = player_screen_width
88 |         self._start_player_screen_height = player_screen_height
89 |         self._local_thor_build = local_thor_build
90 |         self.x_display = x_display
91 |         self.controller: Optional[Controller] = None
92 |         self._started = False
93 |         self._quality = quality
94 |         self._verbose = verbose
95 |         self.env_args = env_args
96 | 
97 |         self._initially_reachable_points: Optional[List[Dict]] = None
98 |         self._initially_reachable_points_set: Optional[Set[Tuple[float, float]]] = None
99 |         self._move_mag: Optional[float] = None
100 |         self._grid_size: Optional[float] = None
101 |         self._visibility_distance = visibility_distance
102 | 
103 |         self._fov = fov
104 | 
105 |         self.restrict_to_initially_reachable_points = (
106 |             restrict_to_initially_reachable_points
107 |         )
108 |         self.make_agents_visible = make_agents_visible
109 |         self.object_open_speed = object_open_speed
110 |         self._always_return_visible_range = False
111 |         self.simplify_physics = simplify_physics
112 | 
113 |         self.start(None)
114 |         self.check_controller_version()
115 | 
116 |         # noinspection PyTypeHints
117 |         self.controller.docker_enabled = docker_enabled  # type: ignore
118 | 
119 |     def check_controller_version(self):
120 |         if MANIPULATHOR_COMMIT_ID is not None:
121 |             assert (
122 |                 MANIPULATHOR_COMMIT_ID in self.controller._build.url
123 |             ), "Build number is not right, {} vs {}, use pip3 install -e git+https://github.com/allenai/ai2thor.git@{}#egg=ai2thor".format(
124 |                 self.controller._build.url,
125 |                 MANIPULATHOR_COMMIT_ID,
126 |                 MANIPULATHOR_COMMIT_ID,
127 |             )
128 | 
129 |     def create_controller(self):
130 |         controller = Controller(**self.env_args, commit_id=MANIPULATHOR_COMMIT_ID)
131 | 
132 |         return controller
133 | 
134 |     def start(
135 |         self,
136 |         scene_name: Optional[str],
137 |         move_mag: float = 0.25,
138 |         **kwargs,
139 |     ) -> None:
140 |         """Starts the ai2thor controller if it was previously stopped.
141 | 
142 |         After starting, `reset` will be called with the scene name and move magnitude.
143 | 
144 |         # Parameters
145 | 
146 |         scene_name : The scene to load.
147 |         move_mag : The amount of distance the agent moves in a single `MoveAhead` step.
148 |         kwargs : additional kwargs, passed to reset.
149 |         """
150 |         if self._started:
151 |             raise RuntimeError(
152 |                 "Trying to start the environment but it is already started."
153 | ) 154 | 155 | self.controller = self.create_controller() 156 | 157 | if ( 158 | self._start_player_screen_height, 159 | self._start_player_screen_width, 160 | ) != self.current_frame.shape[:2]: 161 | self.controller.step( 162 | { 163 | "action": "ChangeResolution", 164 | "x": self._start_player_screen_width, 165 | "y": self._start_player_screen_height, 166 | } 167 | ) 168 | 169 | self._started = True 170 | self.reset(scene_name=scene_name, move_mag=move_mag, **kwargs) 171 | 172 | def reset( 173 | self, 174 | scene_name: Optional[str], 175 | move_mag: float = 0.25, 176 | **kwargs, 177 | ): 178 | self._move_mag = move_mag 179 | self._grid_size = self._move_mag 180 | 181 | if scene_name is None: 182 | scene_name = self.controller.last_event.metadata["sceneName"] 183 | # self.reset_init_params()#**kwargs) removing this fixes one of the crashing problem 184 | 185 | # to solve the crash issue 186 | try: 187 | reset_environment_and_additional_commands(self.controller, scene_name) 188 | except Exception: 189 | print("RESETTING THE SCENE,", scene_name) 190 | self.controller = ai2thor.controller.Controller( 191 | **self.env_args, commit_id=MANIPULATHOR_COMMIT_ID 192 | ) 193 | reset_environment_and_additional_commands(self.controller, scene_name) 194 | 195 | if self.object_open_speed != 1.0: 196 | self.controller.step( 197 | {"action": "ChangeOpenSpeed", "x": self.object_open_speed} 198 | ) 199 | 200 | self._initially_reachable_points = None 201 | self._initially_reachable_points_set = None 202 | self.controller.step({"action": "GetReachablePositions"}) 203 | if not self.controller.last_event.metadata["lastActionSuccess"]: 204 | warnings.warn( 205 | "Error when getting reachable points: {}".format( 206 | self.controller.last_event.metadata["errorMessage"] 207 | ) 208 | ) 209 | self._initially_reachable_points = self.last_action_return 210 | 211 | self.list_of_actions_so_far = [] 212 | 213 | def randomize_agent_location( 214 | self, seed: int = None, partial_position: Optional[Dict[str, float]] = None 215 | ) -> Dict: 216 | raise Exception("not used") 217 | 218 | def is_object_at_low_level_hand(self, object_id): 219 | current_objects_in_hand = self.controller.last_event.metadata["arm"]["heldObjects"] 220 | return object_id in current_objects_in_hand 221 | 222 | def object_in_hand(self): 223 | """Object metadata for the object in the agent's hand.""" 224 | inv_objs = self.last_event.metadata["inventoryObjects"] 225 | if len(inv_objs) == 0: 226 | return None 227 | elif len(inv_objs) == 1: 228 | return self.get_object_by_id( 229 | self.last_event.metadata["inventoryObjects"][0]["objectId"] 230 | ) 231 | else: 232 | raise AttributeError("Must be <= 1 inventory objects.") 233 | 234 | def correct_nan_inf(self, flawed_dict, extra_tag=""): 235 | corrected_dict = copy.deepcopy(flawed_dict) 236 | anything_changed = 0 237 | for (k, v) in corrected_dict.items(): 238 | if v != v or math.isinf(v): 239 | corrected_dict[k] = 0 240 | anything_changed += 1 241 | return corrected_dict 242 | 243 | def get_object_by_id(self, object_id: str) -> Optional[Dict[str, Any]]: 244 | for o in self.last_event.metadata["objects"]: 245 | if o["objectId"] == object_id: 246 | o["position"] = self.correct_nan_inf(o["position"], "obj id") 247 | return o 248 | return None 249 | 250 | def get_current_arm_state(self): 251 | h_min = ARM_MIN_HEIGHT 252 | h_max = ARM_MAX_HEIGHT 253 | agent_base_location = 0.9009995460510254 254 | event = self.controller.last_event 255 | offset = event.metadata["agent"]["position"]["y"] - 
agent_base_location 256 | h_max += offset 257 | h_min += offset 258 | joints = event.metadata["arm"]["joints"] 259 | arm = joints[-1] 260 | assert arm["name"] == "robot_arm_4_jnt" 261 | xyz_dict = copy.deepcopy(arm["rootRelativePosition"]) 262 | height_arm = joints[0]["position"]["y"] 263 | xyz_dict["h"] = (height_arm - h_min) / (h_max - h_min) 264 | xyz_dict = self.correct_nan_inf(xyz_dict, "realtive hand") 265 | return xyz_dict 266 | 267 | def get_absolute_hand_state(self): 268 | event = self.controller.last_event 269 | joints = event.metadata["arm"]["joints"] 270 | arm = copy.deepcopy(joints[-1]) 271 | assert arm["name"] == "robot_arm_4_jnt" 272 | xyz_dict = arm["position"] 273 | xyz_dict = self.correct_nan_inf(xyz_dict, "absolute hand") 274 | return dict(position=xyz_dict, rotation={"x": 0, "y": 0, "z": 0}) 275 | 276 | def get_pickupable_objects(self): 277 | 278 | event = self.controller.last_event 279 | object_list = event.metadata["arm"]["pickupableObjects"] 280 | 281 | return object_list 282 | 283 | def get_current_object_locations(self): 284 | obj_loc_dict = {} 285 | metadata = self.controller.last_event.metadata["objects"] 286 | for o in metadata: 287 | obj_loc_dict[o["objectId"]] = dict( 288 | position=o["position"], rotation=o["rotation"] 289 | ) 290 | return copy.deepcopy(obj_loc_dict) 291 | 292 | def close_enough(self, current_obj_pose, init_obj_pose, threshold): 293 | position_close = [ 294 | abs(current_obj_pose["position"][k] - init_obj_pose["position"][k]) 295 | <= threshold 296 | for k in ["x", "y", "z"] 297 | ] 298 | position_is_close = sum(position_close) == 3 299 | rotation_close = [ 300 | abs(current_obj_pose["rotation"][k] - init_obj_pose["rotation"][k]) 301 | <= threshold 302 | for k in ["x", "y", "z"] 303 | ] 304 | rotation_is_close = sum(rotation_close) == 3 305 | return position_is_close and rotation_is_close 306 | 307 | def get_objects_moved(self, initial_object_locations): 308 | current_object_locations = self.get_current_object_locations() 309 | moved_objects = [] 310 | for object_id in current_object_locations.keys(): 311 | if not self.close_enough( 312 | current_object_locations[object_id], 313 | initial_object_locations[object_id], 314 | threshold=MOVE_THR, 315 | ): 316 | moved_objects.append(object_id) 317 | 318 | return moved_objects 319 | 320 | def step( 321 | self, action_dict: Dict[str, Union[str, int, float]] 322 | ) -> ai2thor.server.Event: 323 | """Take a step in the ai2thor environment.""" 324 | action = typing.cast(str, action_dict["action"]) 325 | 326 | skip_render = "renderImage" in action_dict and not action_dict["renderImage"] 327 | last_frame: Optional[np.ndarray] = None 328 | if skip_render: 329 | last_frame = self.current_frame 330 | 331 | if self.simplify_physics: 332 | action_dict["simplifyOPhysics"] = True 333 | if action in ["PickUpMidLevel", "DoneMidLevel"]: 334 | if action == "PickUpMidLevel": 335 | object_id = action_dict["object_id"] 336 | if not self.is_object_at_low_level_hand(object_id): 337 | pickupable_objects = self.get_pickupable_objects() 338 | # 339 | if object_id in pickupable_objects: 340 | # This version of the task is actually harder # consider making it easier, are we penalizing failed pickup? 
yes 341 | event = self.step(dict(action="PickupObject")) 342 | # we are doing an additional pass here, label is not right and if we fail we will do it twice 343 | object_inventory = self.controller.last_event.metadata["arm"][ 344 | "heldObjects" 345 | ] 346 | if ( 347 | len(object_inventory) > 0 348 | and object_id not in object_inventory 349 | ): 350 | event = self.step(dict(action="ReleaseObject")) 351 | action_dict = { 352 | 'action': 'Pass' 353 | } 354 | 355 | elif not "MoveArm" in action: 356 | if "Continuous" in action: 357 | copy_aditions = copy.deepcopy(ADITIONAL_ARM_ARGS) 358 | 359 | action_dict = {**action_dict, **copy_aditions} 360 | if action in ["MoveAheadContinuous"]: 361 | action_dict["action"] = "MoveAgent" 362 | action_dict["ahead"] = 0.2 363 | 364 | elif action in ["RotateRightContinuous"]: 365 | action_dict["action"] = "RotateAgent" 366 | action_dict["degrees"] = 45 367 | 368 | elif action in ["RotateLeftContinuous"]: 369 | action_dict["action"] = "RotateAgent" 370 | action_dict["degrees"] = -45 371 | 372 | elif "MoveArm" in action: 373 | copy_aditions = copy.deepcopy(ADITIONAL_ARM_ARGS) 374 | action_dict = {**action_dict, **copy_aditions} 375 | base_position = self.get_current_arm_state() 376 | if "MoveArmHeight" in action: 377 | action_dict["action"] = "MoveArmBase" 378 | 379 | if action == "MoveArmHeightP": 380 | base_position["h"] += MOVE_ARM_HEIGHT_CONSTANT 381 | if action == "MoveArmHeightM": 382 | base_position[ 383 | "h" 384 | ] -= MOVE_ARM_HEIGHT_CONSTANT # height is pretty big! 385 | action_dict["y"] = base_position["h"] 386 | else: 387 | action_dict["action"] = "MoveArm" 388 | if action == "MoveArmXP": 389 | base_position["x"] += MOVE_ARM_CONSTANT 390 | elif action == "MoveArmXM": 391 | base_position["x"] -= MOVE_ARM_CONSTANT 392 | elif action == "MoveArmYP": 393 | base_position["y"] += MOVE_ARM_CONSTANT 394 | elif action == "MoveArmYM": 395 | base_position["y"] -= MOVE_ARM_CONSTANT 396 | elif action == "MoveArmZP": 397 | base_position["z"] += MOVE_ARM_CONSTANT 398 | elif action == "MoveArmZM": 399 | base_position["z"] -= MOVE_ARM_CONSTANT 400 | action_dict["position"] = { 401 | k: v for (k, v) in base_position.items() if k in ["x", "y", "z"] 402 | } 403 | 404 | sr = self.controller.step(action_dict) 405 | self.list_of_actions_so_far.append(action_dict) 406 | 407 | if self._verbose: 408 | print(self.controller.last_event) 409 | 410 | if self.restrict_to_initially_reachable_points: 411 | self._snap_agent_to_initially_reachable() 412 | 413 | if skip_render: 414 | assert last_frame is not None 415 | self.last_event.frame = last_frame 416 | 417 | return sr 418 | -------------------------------------------------------------------------------- /ithor_arm/ithor_arm_sensors.py: -------------------------------------------------------------------------------- 1 | """Utility classes and functions for sensory inputs used by the models.""" 2 | from typing import Any, Union, Optional 3 | 4 | import gym 5 | import numpy as np 6 | from allenact.base_abstractions.sensor import DepthSensor, Sensor, RGBSensor 7 | from allenact.base_abstractions.task import Task 8 | from allenact.utils.misc_utils import prepare_locals_for_super 9 | from allenact_plugins.ithor_plugin.ithor_environment import IThorEnvironment 10 | 11 | from ithor_arm.arm_calculation_utils import ( 12 | convert_world_to_agent_coordinate, 13 | convert_state_to_tensor, 14 | diff_position, 15 | ) 16 | from ithor_arm.ithor_arm_environment import ManipulaTHOREnvironment 17 | 18 | 19 | class DepthSensorThor( 20 | 
DepthSensor[
21 |         Union[IThorEnvironment],
22 |         Union[Task[IThorEnvironment]],
23 |     ]
24 | ):
25 |     """Sensor for depth images in THOR.
26 | 
27 |     Returns, from a running IThorEnvironment instance, the current depth
28 |     frame corresponding to the agent's egocentric view.
29 |     """
30 | 
31 |     def frame_from_env(self, env: IThorEnvironment, task: Optional[Task]) -> np.ndarray:
32 |         return env.controller.last_event.depth_frame.copy()
33 | 
34 | 
35 | class NoVisionSensorThor(
36 |     RGBSensor[
37 |         Union[IThorEnvironment],
38 |         Union[Task[IThorEnvironment]],
39 |     ]
40 | ):
41 |     """Sensor for the no-vision baseline in THOR.
42 | 
43 |     Returns a frame with the same shape as the current RGB frame but with all
44 |     values zeroed out, so the agent receives no visual information.
45 |     """
46 | 
47 |     def frame_from_env(self, env: IThorEnvironment, task: Optional[Task]) -> np.ndarray:
48 |         result = env.current_frame.copy()
49 |         result.fill(0)
50 |         return result
51 | 
52 | 
53 | class AgentRelativeCurrentObjectStateThorSensor(Sensor):
54 |     def __init__(self, uuid: str = "relative_current_obj_state", **kwargs: Any):
55 |         observation_space = gym.spaces.Box(
56 |             low=-100, high=100, shape=(6,), dtype=np.float32
57 |         )  # (low=-1.0, high=2.0, shape=(3, 4), dtype=np.float32)
58 |         super().__init__(**prepare_locals_for_super(locals()))
59 | 
60 |     def get_observation(
61 |         self, env: IThorEnvironment, task: Task, *args: Any, **kwargs: Any
62 |     ) -> Any:
63 |         object_id = task.task_info["objectId"]
64 |         current_object_state = env.get_object_by_id(object_id)
65 |         relative_current_obj = convert_world_to_agent_coordinate(
66 |             current_object_state, env.controller.last_event.metadata["agent"]
67 |         )
68 |         result = convert_state_to_tensor(
69 |             dict(
70 |                 position=relative_current_obj["position"],
71 |                 rotation=relative_current_obj["rotation"],
72 |             )
73 |         )
74 |         return result
75 | 
76 | 
77 | class RelativeObjectToGoalSensor(Sensor):
78 |     def __init__(self, uuid: str = "relative_obj_to_goal", **kwargs: Any):
79 |         # observation_space = gym.spaces.Discrete(len(self.detector_types))
80 |         observation_space = gym.spaces.Box(
81 |             low=-100, high=100, shape=(3,), dtype=np.float32
82 |         )  # (low=-1.0, high=2.0, shape=(3, 4), dtype=np.float32)
83 |         super().__init__(**prepare_locals_for_super(locals()))
84 | 
85 |     def get_observation(
86 |         self, env: ManipulaTHOREnvironment, task: Task, *args: Any, **kwargs: Any
87 |     ) -> Any:
88 |         goal_obj_id = task.task_info["objectId"]
89 |         object_info = env.get_object_by_id(goal_obj_id)
90 |         target_state = task.task_info["target_location"]
91 | 
92 |         agent_state = env.controller.last_event.metadata["agent"]
93 | 
94 |         relative_current_obj = convert_world_to_agent_coordinate(
95 |             object_info, agent_state
96 |         )
97 |         relative_goal_state = convert_world_to_agent_coordinate(
98 |             target_state, agent_state
99 |         )
100 |         relative_distance = diff_position(relative_current_obj, relative_goal_state)
101 |         result = convert_state_to_tensor(dict(position=relative_distance))
102 |         return result
103 | 
104 | 
105 | class RelativeAgentArmToObjectSensor(Sensor):
106 |     def __init__(self, uuid: str = "relative_agent_arm_to_obj", **kwargs: Any):
107 |         observation_space = gym.spaces.Box(
108 |             low=-100, high=100, shape=(3,), dtype=np.float32
109 |         )  # (low=-1.0, high=2.0, shape=(3, 4), dtype=np.float32)
110 |         super().__init__(**prepare_locals_for_super(locals()))
111 | 
112 |     def get_observation(
113 |         self, env: ManipulaTHOREnvironment, task: Task, *args: Any, **kwargs: Any
114 |     ) -> Any:
115 |         goal_obj_id = task.task_info["objectId"]
116 |         object_info =
env.get_object_by_id(goal_obj_id) 117 | hand_state = env.get_absolute_hand_state() 118 | 119 | relative_goal_obj = convert_world_to_agent_coordinate( 120 | object_info, env.controller.last_event.metadata["agent"] 121 | ) 122 | relative_hand_state = convert_world_to_agent_coordinate( 123 | hand_state, env.controller.last_event.metadata["agent"] 124 | ) 125 | relative_distance = diff_position(relative_goal_obj, relative_hand_state) 126 | result = convert_state_to_tensor(dict(position=relative_distance)) 127 | 128 | return result 129 | 130 | 131 | class PickedUpObjSensor(Sensor): 132 | def __init__(self, uuid: str = "pickedup_object", **kwargs: Any): 133 | observation_space = gym.spaces.Box( 134 | low=0, high=1, shape=(1,), dtype=np.float32 135 | ) # (low=-1.0, high=2.0, shape=(3, 4), dtype=np.float32) 136 | super().__init__(**prepare_locals_for_super(locals())) 137 | 138 | def get_observation( 139 | self, env: ManipulaTHOREnvironment, task: Task, *args: Any, **kwargs: Any 140 | ) -> Any: 141 | return task.object_picked_up 142 | -------------------------------------------------------------------------------- /ithor_arm/ithor_arm_task_samplers.py: -------------------------------------------------------------------------------- 1 | """Task Samplers for the task of ArmPointNav""" 2 | import json 3 | import random 4 | from typing import List, Dict, Optional, Any, Union 5 | 6 | import gym 7 | from allenact.base_abstractions.sensor import Sensor 8 | from allenact.base_abstractions.task import Task 9 | from allenact.base_abstractions.task import TaskSampler 10 | from allenact.utils.experiment_utils import set_deterministic_cudnn, set_seed 11 | 12 | from ithor_arm.arm_calculation_utils import initialize_arm 13 | from ithor_arm.ithor_arm_constants import transport_wrapper 14 | from ithor_arm.ithor_arm_environment import ManipulaTHOREnvironment 15 | from ithor_arm.ithor_arm_tasks import ( 16 | AbstractPickUpDropOffTask, 17 | ArmPointNavTask, 18 | ) 19 | from ithor_arm.ithor_arm_viz import LoggerVisualizer, ImageVisualizer 20 | 21 | 22 | class AbstractMidLevelArmTaskSampler(TaskSampler): 23 | 24 | _TASK_TYPE = Task 25 | 26 | def __init__( 27 | self, 28 | scenes: List[str], 29 | sensors: List[Sensor], 30 | max_steps: int, 31 | env_args: Dict[str, Any], 32 | action_space: gym.Space, 33 | rewards_config: Dict, 34 | objects: List[str], 35 | scene_period: Optional[Union[int, str]] = None, 36 | max_tasks: Optional[int] = None, 37 | seed: Optional[int] = None, 38 | deterministic_cudnn: bool = False, 39 | fixed_tasks: Optional[List[Dict[str, Any]]] = None, 40 | visualizers: List[LoggerVisualizer] = [], 41 | *args, 42 | **kwargs 43 | ) -> None: 44 | self.rewards_config = rewards_config 45 | self.env_args = env_args 46 | self.scenes = scenes 47 | self.grid_size = 0.25 48 | self.env: Optional[ManipulaTHOREnvironment] = None 49 | self.sensors = sensors 50 | self.max_steps = max_steps 51 | self._action_space = action_space 52 | self.objects = objects 53 | 54 | self.scene_counter: Optional[int] = None 55 | self.scene_order: Optional[List[str]] = None 56 | self.scene_id: Optional[int] = None 57 | self.scene_period: Optional[ 58 | Union[str, int] 59 | ] = scene_period # default makes a random choice 60 | self.max_tasks: Optional[int] = None 61 | self.reset_tasks = max_tasks 62 | 63 | self._last_sampled_task: Optional[Task] = None 64 | 65 | self.seed: Optional[int] = None 66 | self.set_seed(seed) 67 | 68 | if deterministic_cudnn: 69 | set_deterministic_cudnn() 70 | 71 | self.reset() 72 | self.visualizers = visualizers 
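        # Note (editorial): `sampler_mode` ("train" or "test") switches the sampler
        # between random task sampling and the deterministic evaluation lists used
        # below; both it and `cap_training` must be supplied via **kwargs by the
        # experiment configuration.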
73 | self.sampler_mode = kwargs["sampler_mode"] 74 | self.cap_training = kwargs["cap_training"] 75 | 76 | def _create_environment(self, **kwargs) -> ManipulaTHOREnvironment: 77 | env = ManipulaTHOREnvironment( 78 | make_agents_visible=False, 79 | object_open_speed=0.05, 80 | env_args=self.env_args, 81 | ) 82 | 83 | return env 84 | 85 | @property 86 | def last_sampled_task(self) -> Optional[Task]: 87 | return self._last_sampled_task 88 | 89 | def close(self) -> None: 90 | if self.env is not None: 91 | self.env.stop() 92 | 93 | @property 94 | def all_observation_spaces_equal(self) -> bool: 95 | """Check if observation spaces equal. 96 | 97 | # Returns 98 | 99 | True if all Tasks that can be sampled by this sampler have the 100 | same observation space. Otherwise False. 101 | """ 102 | return True 103 | 104 | def reset(self): 105 | self.scene_counter = 0 106 | self.scene_order = list(range(len(self.scenes))) 107 | random.shuffle(self.scene_order) 108 | self.scene_id = 0 109 | self.sampler_index = 0 110 | 111 | self.max_tasks = self.reset_tasks 112 | 113 | def set_seed(self, seed: int): 114 | self.seed = seed 115 | if seed is not None: 116 | set_seed(seed) 117 | 118 | 119 | class SimpleArmPointNavGeneralSampler(AbstractMidLevelArmTaskSampler): 120 | 121 | _TASK_TYPE = AbstractPickUpDropOffTask 122 | 123 | def __init__(self, **kwargs) -> None: 124 | 125 | super(SimpleArmPointNavGeneralSampler, self).__init__(**kwargs) 126 | self.all_possible_points = [] 127 | for scene in self.scenes: 128 | for object in self.objects: 129 | valid_position_adr = "datasets/apnd-dataset/valid_object_positions/valid_{}_positions_in_{}.json".format( 130 | object, scene 131 | ) 132 | try: 133 | with open(valid_position_adr) as f: 134 | data_points = json.load(f) 135 | except Exception: 136 | print("Failed to load", valid_position_adr) 137 | continue 138 | visible_data = [ 139 | data for data in data_points[scene] if data["visibility"] 140 | ] 141 | self.all_possible_points += visible_data 142 | 143 | self.countertop_object_to_data_id = self.calc_possible_trajectories( 144 | self.all_possible_points 145 | ) 146 | 147 | scene_names = set( 148 | [ 149 | self.all_possible_points[counter[0]]["scene_name"] 150 | for counter in self.countertop_object_to_data_id.values() 151 | if len(counter) > 1 152 | ] 153 | ) 154 | 155 | if len(set(scene_names)) < len(self.scenes): 156 | print("Not all scenes appear") 157 | 158 | print( 159 | "Len dataset", 160 | len(self.all_possible_points), 161 | "total_remained", 162 | sum([len(v) for v in self.countertop_object_to_data_id.values()]), 163 | ) 164 | 165 | if ( 166 | self.sampler_mode != "train" 167 | ): # Be aware that this totally overrides some stuff 168 | self.deterministic_data_list = [] 169 | for scene in self.scenes: 170 | for object in self.objects: 171 | valid_position_adr = "datasets/apnd-dataset/deterministic_tasks/tasks_{}_positions_in_{}.json".format( 172 | object, scene 173 | ) 174 | try: 175 | with open(valid_position_adr) as f: 176 | data_points = json.load(f) 177 | except Exception: 178 | print("Failed to load", valid_position_adr) 179 | continue 180 | visible_data = [ 181 | dict(scene=scene, index=i, datapoint=data) 182 | for (i, data) in enumerate(data_points[scene]) 183 | ] 184 | self.deterministic_data_list += visible_data 185 | 186 | if self.sampler_mode == "test": 187 | random.shuffle(self.deterministic_data_list) 188 | self.max_tasks = self.reset_tasks = len(self.deterministic_data_list) 189 | 190 | def next_task( 191 | self, force_advance_scene: bool = False 192 
| ) -> Optional[AbstractPickUpDropOffTask]: 193 | if self.max_tasks is not None and self.max_tasks <= 0: 194 | return None 195 | 196 | if self.sampler_mode != "train" and self.length <= 0: 197 | return None 198 | 199 | source_data_point, target_data_point = self.get_source_target_indices() 200 | 201 | scene = source_data_point["scene_name"] 202 | 203 | assert source_data_point["object_id"] == target_data_point["object_id"] 204 | assert source_data_point["scene_name"] == target_data_point["scene_name"] 205 | 206 | if self.env is None: 207 | self.env = self._create_environment() 208 | 209 | self.env.reset( 210 | scene_name=scene, agentMode="arm", agentControllerType="mid-level" 211 | ) 212 | 213 | event1, event2, event3 = initialize_arm(self.env.controller) 214 | 215 | source_location = source_data_point 216 | target_location = dict( 217 | position=target_data_point["object_location"], 218 | rotation={"x": 0, "y": 0, "z": 0}, 219 | ) 220 | 221 | task_info = { 222 | "objectId": source_location["object_id"], 223 | "countertop_id": source_location["countertop_id"], 224 | "source_location": source_location, 225 | "target_location": target_location, 226 | } 227 | 228 | this_controller = self.env 229 | 230 | event = transport_wrapper( 231 | this_controller, 232 | source_location["object_id"], 233 | source_location["object_location"], 234 | ) 235 | agent_state = source_location["agent_pose"] 236 | 237 | event = this_controller.step( 238 | dict( 239 | action="TeleportFull", 240 | standing=True, 241 | x=agent_state["position"]["x"], 242 | y=agent_state["position"]["y"], 243 | z=agent_state["position"]["z"], 244 | rotation=dict( 245 | x=agent_state["rotation"]["x"], 246 | y=agent_state["rotation"]["y"], 247 | z=agent_state["rotation"]["z"], 248 | ), 249 | horizon=agent_state["cameraHorizon"], 250 | ) 251 | ) 252 | 253 | should_visualize_goal_start = [ 254 | x for x in self.visualizers if issubclass(type(x), ImageVisualizer) 255 | ] 256 | if len(should_visualize_goal_start) > 0: 257 | task_info["visualization_source"] = source_data_point 258 | task_info["visualization_target"] = target_data_point 259 | 260 | self._last_sampled_task = self._TASK_TYPE( 261 | env=self.env, 262 | sensors=self.sensors, 263 | task_info=task_info, 264 | max_steps=self.max_steps, 265 | action_space=self._action_space, 266 | visualizers=self.visualizers, 267 | reward_configs=self.rewards_config, 268 | ) 269 | 270 | return self._last_sampled_task 271 | 272 | @property 273 | def total_unique(self) -> Optional[Union[int, float]]: 274 | if self.sampler_mode == "train": 275 | return None 276 | else: 277 | return min(self.max_tasks, len(self.deterministic_data_list)) 278 | 279 | @property 280 | def length(self) -> Union[int, float]: 281 | """Length. 282 | 283 | # Returns 284 | 285 | Number of total tasks remaining that can be sampled. Can be float('inf'). 
286 | """ 287 | return ( 288 | self.total_unique - self.sampler_index 289 | if self.sampler_mode != "train" 290 | else (float("inf") if self.max_tasks is None else self.max_tasks) 291 | ) 292 | 293 | def get_source_target_indices(self): 294 | if self.sampler_mode == "train": 295 | valid_countertops = [ 296 | k for (k, v) in self.countertop_object_to_data_id.items() if len(v) > 1 297 | ] 298 | countertop_id = random.choice(valid_countertops) 299 | indices = random.sample(self.countertop_object_to_data_id[countertop_id], 2) 300 | result = ( 301 | self.all_possible_points[indices[0]], 302 | self.all_possible_points[indices[1]], 303 | ) 304 | else: 305 | result = self.deterministic_data_list[self.sampler_index]["datapoint"] 306 | # ForkedPdb().set_trace() 307 | self.sampler_index += 1 308 | 309 | return result 310 | 311 | def calc_possible_trajectories(self, all_possible_points): 312 | 313 | object_to_data_id = {} 314 | 315 | for i in range(len(all_possible_points)): 316 | object_id = all_possible_points[i]["object_id"] 317 | object_to_data_id.setdefault(object_id, []) 318 | object_to_data_id[object_id].append(i) 319 | 320 | return object_to_data_id 321 | 322 | 323 | class ArmPointNavTaskSampler(SimpleArmPointNavGeneralSampler): 324 | _TASK_TYPE = ArmPointNavTask 325 | 326 | def __init__(self, **kwargs) -> None: 327 | 328 | super(ArmPointNavTaskSampler, self).__init__(**kwargs) 329 | possible_initial_locations = ( 330 | "datasets/apnd-dataset/valid_agent_initial_locations.json" 331 | ) 332 | if self.sampler_mode == "test": 333 | possible_initial_locations = ( 334 | "datasets/apnd-dataset/deterministic_valid_agent_initial_locations.json" 335 | ) 336 | with open(possible_initial_locations) as f: 337 | self.possible_agent_reachable_poses = json.load(f) 338 | 339 | def next_task( 340 | self, force_advance_scene: bool = False 341 | ) -> Optional[AbstractPickUpDropOffTask]: 342 | if self.max_tasks is not None and self.max_tasks <= 0: 343 | return None 344 | 345 | if self.sampler_mode != "train" and self.length <= 0: 346 | return None 347 | 348 | source_data_point, target_data_point = self.get_source_target_indices() 349 | 350 | scene = source_data_point["scene_name"] 351 | 352 | assert source_data_point["object_id"] == target_data_point["object_id"] 353 | assert source_data_point["scene_name"] == target_data_point["scene_name"] 354 | 355 | if self.env is None: 356 | self.env = self._create_environment() 357 | 358 | self.env.reset( 359 | scene_name=scene, agentMode="arm", agentControllerType="mid-level" 360 | ) 361 | 362 | event1, event2, event3 = initialize_arm(self.env.controller) 363 | 364 | source_location = source_data_point 365 | target_location = dict( 366 | position=target_data_point["object_location"], 367 | rotation={"x": 0, "y": 0, "z": 0}, 368 | ) 369 | 370 | task_info = { 371 | "objectId": source_location["object_id"], 372 | "countertop_id": source_location["countertop_id"], 373 | "source_location": source_location, 374 | "target_location": target_location, 375 | } 376 | 377 | this_controller = self.env 378 | 379 | event = transport_wrapper( 380 | this_controller, 381 | source_location["object_id"], 382 | source_location["object_location"], 383 | ) 384 | 385 | agent_state = source_location[ 386 | "initial_agent_pose" 387 | ] # THe only line different from father 388 | 389 | event = this_controller.step( 390 | dict( 391 | action="TeleportFull", 392 | standing=True, 393 | x=agent_state["position"]["x"], 394 | y=agent_state["position"]["y"], 395 | z=agent_state["position"]["z"], 396 | 
rotation=dict( 397 | x=agent_state["rotation"]["x"], 398 | y=agent_state["rotation"]["y"], 399 | z=agent_state["rotation"]["z"], 400 | ), 401 | horizon=agent_state["cameraHorizon"], 402 | ) 403 | ) 404 | 405 | should_visualize_goal_start = [ 406 | x for x in self.visualizers if issubclass(type(x), ImageVisualizer) 407 | ] 408 | if len(should_visualize_goal_start) > 0: 409 | task_info["visualization_source"] = source_data_point 410 | task_info["visualization_target"] = target_data_point 411 | 412 | self._last_sampled_task = self._TASK_TYPE( 413 | env=self.env, 414 | sensors=self.sensors, 415 | task_info=task_info, 416 | max_steps=self.max_steps, 417 | action_space=self._action_space, 418 | visualizers=self.visualizers, 419 | reward_configs=self.rewards_config, 420 | ) 421 | 422 | return self._last_sampled_task 423 | 424 | def get_source_target_indices(self): 425 | if self.sampler_mode == "train": 426 | valid_countertops = [ 427 | k for (k, v) in self.countertop_object_to_data_id.items() if len(v) > 1 428 | ] 429 | countertop_id = random.choice(valid_countertops) 430 | indices = random.sample(self.countertop_object_to_data_id[countertop_id], 2) 431 | result = ( 432 | self.all_possible_points[indices[0]], 433 | self.all_possible_points[indices[1]], 434 | ) 435 | scene_name = result[0]["scene_name"] 436 | selected_agent_init_loc = random.choice( 437 | self.possible_agent_reachable_poses[scene_name] 438 | ) 439 | initial_agent_pose = { 440 | "name": "agent", 441 | "position": { 442 | "x": selected_agent_init_loc["x"], 443 | "y": selected_agent_init_loc["y"], 444 | "z": selected_agent_init_loc["z"], 445 | }, 446 | "rotation": { 447 | "x": -0.0, 448 | "y": selected_agent_init_loc["rotation"], 449 | "z": 0.0, 450 | }, 451 | "cameraHorizon": selected_agent_init_loc["horizon"], 452 | "isStanding": True, 453 | } 454 | result[0]["initial_agent_pose"] = initial_agent_pose 455 | else: # we need to fix this for test set, agent init location needs to be fixed, therefore we load a fixed valid agent init that is previously randomized 456 | result = self.deterministic_data_list[self.sampler_index]["datapoint"] 457 | scene_name = self.deterministic_data_list[self.sampler_index]["scene"] 458 | datapoint_original_index = self.deterministic_data_list[self.sampler_index][ 459 | "index" 460 | ] 461 | selected_agent_init_loc = self.possible_agent_reachable_poses[scene_name][ 462 | datapoint_original_index 463 | ] 464 | initial_agent_pose = { 465 | "name": "agent", 466 | "position": { 467 | "x": selected_agent_init_loc["x"], 468 | "y": selected_agent_init_loc["y"], 469 | "z": selected_agent_init_loc["z"], 470 | }, 471 | "rotation": { 472 | "x": -0.0, 473 | "y": selected_agent_init_loc["rotation"], 474 | "z": 0.0, 475 | }, 476 | "cameraHorizon": selected_agent_init_loc["horizon"], 477 | "isStanding": True, 478 | } 479 | result[0]["initial_agent_pose"] = initial_agent_pose 480 | self.sampler_index += 1 481 | 482 | return result 483 | 484 | 485 | def get_all_tuples_from_list(list): 486 | result = [] 487 | for first_ind in range(len(list) - 1): 488 | for second_ind in range(first_ind + 1, len(list)): 489 | result.append([list[first_ind], list[second_ind]]) 490 | return result 491 | -------------------------------------------------------------------------------- /ithor_arm/ithor_arm_tasks.py: -------------------------------------------------------------------------------- 1 | """Task Definions for the task of ArmPointNav""" 2 | 3 | from typing import Dict, Tuple, List, Any, Optional 4 | 5 | import gym 6 | import numpy as 
np 7 | from allenact.base_abstractions.misc import RLStepResult 8 | from allenact.base_abstractions.sensor import Sensor 9 | from allenact.base_abstractions.task import Task 10 | 11 | from ithor_arm.ithor_arm_constants import ( 12 | MOVE_ARM_CONSTANT, 13 | MOVE_ARM_HEIGHT_P, 14 | MOVE_ARM_HEIGHT_M, 15 | MOVE_ARM_X_P, 16 | MOVE_ARM_X_M, 17 | MOVE_ARM_Y_P, 18 | MOVE_ARM_Y_M, 19 | MOVE_ARM_Z_P, 20 | MOVE_ARM_Z_M, 21 | MOVE_AHEAD, 22 | ROTATE_RIGHT, 23 | ROTATE_LEFT, 24 | PICKUP, 25 | DONE, 26 | ) 27 | from ithor_arm.ithor_arm_environment import ManipulaTHOREnvironment 28 | from ithor_arm.ithor_arm_viz import LoggerVisualizer 29 | 30 | 31 | def position_distance(s1, s2): 32 | position1 = s1["position"] 33 | position2 = s2["position"] 34 | return ( 35 | (position1["x"] - position2["x"]) ** 2 36 | + (position1["y"] - position2["y"]) ** 2 37 | + (position1["z"] - position2["z"]) ** 2 38 | ) ** 0.5 39 | 40 | 41 | class AbstractPickUpDropOffTask(Task[ManipulaTHOREnvironment]): 42 | 43 | _actions = ( 44 | MOVE_ARM_HEIGHT_P, 45 | MOVE_ARM_HEIGHT_M, 46 | MOVE_ARM_X_P, 47 | MOVE_ARM_X_M, 48 | MOVE_ARM_Y_P, 49 | MOVE_ARM_Y_M, 50 | MOVE_ARM_Z_P, 51 | MOVE_ARM_Z_M, 52 | MOVE_AHEAD, 53 | ROTATE_RIGHT, 54 | ROTATE_LEFT, 55 | ) 56 | 57 | def __init__( 58 | self, 59 | env: ManipulaTHOREnvironment, 60 | sensors: List[Sensor], 61 | task_info: Dict[str, Any], 62 | max_steps: int, 63 | visualizers: List[LoggerVisualizer] = [], 64 | **kwargs 65 | ) -> None: 66 | """Initializer. 67 | 68 | See class documentation for parameter definitions. 69 | """ 70 | super().__init__( 71 | env=env, sensors=sensors, task_info=task_info, max_steps=max_steps, **kwargs 72 | ) 73 | self._took_end_action: bool = False 74 | self._success: Optional[bool] = False 75 | self._subsampled_locations_from_which_obj_visible: Optional[ 76 | List[Tuple[float, float, int, int]] 77 | ] = None 78 | self.visualizers = visualizers 79 | self.start_visualize() 80 | self.action_sequence_and_success = [] 86 | 87 | # in allenact, the corresponding initialization uses 0.2 88 | self.last_obj_to_goal_distance = None 89 | self.last_arm_to_obj_distance = None 90 | self.object_picked_up = False 91 | self.got_reward_for_pickup = False 92 | self.reward_configs = kwargs["reward_configs"] 93 | self.initial_object_metadata = self.env.get_current_object_locations() 94 | 95 | @property 96 | def action_space(self): 97 | return gym.spaces.Discrete(len(self._actions)) 98 | 99 | def reached_terminal_state(self) -> bool: 100 | return self._took_end_action 101 | 102 | @classmethod 103 | def class_action_names(cls, **kwargs) -> Tuple[str, ...]: 104 | return cls._actions 105 | 106 | def close(self) -> None: 107 | self.env.stop() 108 | 109 | def obj_state_aproximity(self, s1, s2): 110 | # NOTE: rotation is ignored for now; only positional proximity is checked 111 | position1 = s1["position"] 112 | position2 = s2["position"] 113 | eps = MOVE_ARM_CONSTANT * 2 114 | return ( 115 | abs(position1["x"] - position2["x"]) < eps 116 | and abs(position1["y"] - position2["y"]) < eps 117 | and abs(position1["z"] - position2["z"]) < eps 118 | ) 119 | 120 | def start_visualize(self): 121 | for visualizer in self.visualizers: 122 | if not visualizer.is_empty(): 123 | print("Warning: visualizer was not empty at episode start; flushing the previous episode") 124 | visualizer.finish_episode(self.env, self, self.task_info) 125 | visualizer.finish_episode_metrics(self, self.task_info, None) 126 | 
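# log the first frame with an empty action string so each episode's visualization starts from the pre-action observation 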
visualizer.log(self.env, "") 127 | 128 | def visualize(self, action_str): 129 | 130 | for visualizer in self.visualizers: 131 | visualizer.log(self.env, action_str) 132 | 133 | def finish_visualizer(self, episode_success): 134 | 135 | for visualizer in self.visualizers: 136 | visualizer.finish_episode(self.env, self, self.task_info) 137 | 138 | def finish_visualizer_metrics(self, metric_results): 139 | 140 | for visualizer in self.visualizers: 141 | visualizer.finish_episode_metrics(self, self.task_info, metric_results) 142 | 143 | def render(self, mode: str = "rgb", *args, **kwargs) -> np.ndarray: 144 | assert mode == "rgb", "only rgb rendering is implemented" 145 | return self.env.current_frame 146 | 147 | def calc_action_stat_metrics(self) -> Dict[str, Any]: 148 | action_stat = { 149 | "metric/action_stat/" + action_str: 0.0 for action_str in self._actions 150 | } 151 | action_success_stat = { 152 | "metric/action_success/" + action_str: 0.0 for action_str in self._actions 153 | } 154 | action_success_stat["metric/action_success/total"] = 0.0 155 | 156 | seq_len = len(self.action_sequence_and_success) 157 | for (action_name, action_success) in self.action_sequence_and_success: 158 | action_stat["metric/action_stat/" + action_name] += 1.0 159 | action_success_stat[ 160 | "metric/action_success/{}".format(action_name) 161 | ] += action_success 162 | action_success_stat["metric/action_success/total"] += action_success 163 | 164 | action_success_stat["metric/action_success/total"] /= seq_len 165 | 166 | for action_name in self._actions: 167 | action_success_stat[ 168 | "metric/" + "action_success/{}".format(action_name) 169 | ] /= (action_stat["metric/action_stat/" + action_name] + 0.000001) 170 | action_stat["metric/action_stat/" + action_name] /= seq_len 171 | 174 | result = {**action_stat, **action_success_stat} 175 | 176 | return result 177 | 178 | def metrics(self) -> Dict[str, Any]: 179 | result = super(AbstractPickUpDropOffTask, self).metrics() 180 | if self.is_done(): 181 | result = {**result, **self.calc_action_stat_metrics()} 182 | final_obj_distance_from_goal = self.obj_distance_from_goal() 183 | result[ 184 | "metric/average/final_obj_distance_from_goal" 185 | ] = final_obj_distance_from_goal 186 | final_arm_distance_from_obj = self.arm_distance_from_obj() 187 | result[ 188 | "metric/average/final_arm_distance_from_obj" 189 | ] = final_arm_distance_from_obj 190 | final_obj_pickup = 1 if self.object_picked_up else 0 191 | result["metric/average/final_obj_pickup"] = final_obj_pickup 192 | 193 | original_distance = self.get_original_object_distance() 194 | result["metric/average/original_distance"] = original_distance 195 | 196 | # note: this ratio can exceed 1 if the object ends up farther from the goal than it started 
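# e.g. (illustrative numbers): original_distance = 2.0 and final_obj_distance_from_goal = 0.5 give ratio_distance_left = 0.25; lower is better 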
197 | if self.object_picked_up: 198 | ratio_distance_left = final_obj_distance_from_goal / original_distance 199 | result["metric/average/ratio_distance_left"] = ratio_distance_left 200 | result["metric/average/eplen_pickup"] = self.eplen_pickup 201 | 202 | if self._success: 203 | result["metric/average/eplen_success"] = result["ep_length"] 204 | # re-enabled: this computation was not the cause of the slowdown 205 | objects_moved = self.env.get_objects_moved(self.initial_object_metadata) 206 | # note: objects_moved.remove(self.task_info['object_id']) is unnecessary; the target object has certainly moved, and the "- 1" below accounts for it 207 | result["metric/average/number_of_unwanted_moved_objects"] = ( 208 | len(objects_moved) - 1 209 | ) 210 | result["metric/average/success_wo_disturb"] = ( 211 | len(objects_moved) == 1 212 | ) # multiply this by the success rate to obtain the Success Without Disturbance metric 213 | 214 | result["success"] = self._success 215 | 216 | self.finish_visualizer_metrics(result) 217 | self.finish_visualizer(self._success) 218 | self.action_sequence_and_success = [] 219 | 220 | return result 221 | 222 | def _step(self, action: int) -> RLStepResult: 223 | raise NotImplementedError() 224 | 225 | def arm_distance_from_obj(self): 226 | goal_obj_id = self.task_info["objectId"] 227 | object_info = self.env.get_object_by_id(goal_obj_id) 228 | hand_state = self.env.get_absolute_hand_state() 229 | return position_distance(object_info, hand_state) 230 | 231 | def obj_distance_from_goal(self): 232 | goal_obj_id = self.task_info["objectId"] 233 | object_info = self.env.get_object_by_id(goal_obj_id) 234 | goal_state = self.task_info["target_location"] 235 | return position_distance(object_info, goal_state) 236 | 237 | def get_original_object_distance(self): 238 | goal_obj_id = self.task_info["objectId"] 239 | s_init = dict(position=self.task_info["source_location"]["object_location"]) 240 | current_location = self.env.get_object_by_id(goal_obj_id) 241 | 242 | original_object_distance = position_distance(s_init, current_location) 243 | return original_object_distance 244 | 245 | def judge(self) -> float: 246 | """Compute the reward after having taken a step.""" 247 | raise NotImplementedError() 248 | 249 | 250 | class ArmPointNavTask(AbstractPickUpDropOffTask): 251 | _actions = ( 252 | MOVE_ARM_HEIGHT_P, 253 | MOVE_ARM_HEIGHT_M, 254 | MOVE_ARM_X_P, 255 | MOVE_ARM_X_M, 256 | MOVE_ARM_Y_P, 257 | MOVE_ARM_Y_M, 258 | MOVE_ARM_Z_P, 259 | MOVE_ARM_Z_M, 260 | MOVE_AHEAD, 261 | ROTATE_RIGHT, 262 | ROTATE_LEFT, 263 | PICKUP, 264 | DONE, 265 | ) 266 | 267 | def _step(self, action: int) -> RLStepResult: 268 | 269 | action_str = self.class_action_names()[action] 270 | 271 | self._last_action_str = action_str 272 | action_dict = {"action": action_str} 273 | object_id = self.task_info["objectId"] 274 | if action_str == PICKUP: 275 | action_dict = {**action_dict, "object_id": object_id} 276 | self.env.step(action_dict) 277 | self.last_action_success = self.env.last_action_success 278 | 279 | last_action_name = self._last_action_str 280 | last_action_success = float(self.last_action_success) 281 | self.action_sequence_and_success.append((last_action_name, last_action_success)) 282 | self.visualize(last_action_name) 283 | 284 | # If the object was just picked up for the first time, record when it happened so the reward can account for the pickup 285 | if not self.object_picked_up: 286 | 287 | if self.env.is_object_at_low_level_hand(object_id): 288 | self.object_picked_up = True 289 | self.eplen_pickup = ( 290 | self._num_steps_taken + 1 291 | ) # plus one because this step has not been counted yet 292 | 293 | if action_str == DONE: 294 | self._took_end_action = True 295 | object_state = self.env.get_object_by_id(object_id) 296 | goal_state = self.task_info["target_location"] 297 | goal_achieved = self.object_picked_up and self.obj_state_aproximity( 298 | object_state, goal_state 299 | ) 300 | self.last_action_success = goal_achieved 301 | self._success = goal_achieved 302 | 303 | step_result = RLStepResult( 304 | observation=self.get_observations(), 305 | reward=self.judge(), 306 | done=self.is_done(), 307 | info={"last_action_success": self.last_action_success}, 308 | ) 309 | return step_result 310 | 311 | def judge(self) -> float: 312 | """Compute the reward after having taken a step.""" 313 | reward = self.reward_configs["step_penalty"] 314 | 315 | if not self.last_action_success or ( 316 | self._last_action_str == PICKUP and not self.object_picked_up 317 | ): 318 | reward += self.reward_configs["failed_action_penalty"] 319 | 320 | if self._took_end_action: 321 | reward += ( 322 | self.reward_configs["goal_success_reward"] 323 | if self._success 324 | else self.reward_configs["failed_stop_reward"] 325 | ) 326 | 327 | # one-time bonus for the first successful pickup 328 | if not self.got_reward_for_pickup and self.object_picked_up: 329 | reward += self.reward_configs["pickup_success_reward"] 330 | self.got_reward_for_pickup = True 331 | 332 | current_obj_to_arm_distance = self.arm_distance_from_obj() 333 | if self.last_arm_to_obj_distance is None: 334 | delta_arm_to_obj_distance_reward = 0 335 | else: 336 | delta_arm_to_obj_distance_reward = ( 337 | self.last_arm_to_obj_distance - current_obj_to_arm_distance 338 | ) 339 | self.last_arm_to_obj_distance = current_obj_to_arm_distance 340 | reward += delta_arm_to_obj_distance_reward 341 | 342 | current_obj_to_goal_distance = self.obj_distance_from_goal() 343 | if self.last_obj_to_goal_distance is None: 344 | delta_obj_to_goal_distance_reward = 0 345 | else: 346 | delta_obj_to_goal_distance_reward = ( 347 | self.last_obj_to_goal_distance - current_obj_to_goal_distance 348 | ) 349 | self.last_obj_to_goal_distance = current_obj_to_goal_distance 350 | reward += delta_obj_to_goal_distance_reward 351 | 352 | # TODO: possible extensions: a collision cost, an explicit distance-to-goal objective, ... 
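# Summary of the shaped reward computed above: # reward = step_penalty # + failed_action_penalty (failed action, or a PICKUP that did not attach the object) # + goal_success_reward or failed_stop_reward (only when DONE is taken) # + pickup_success_reward (once, on the first successful pickup) # + decrease in arm-to-object distance # + decrease in object-to-goal distance 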
353 | 354 | return float(reward) 355 | -------------------------------------------------------------------------------- /ithor_arm/ithor_arm_viz.py: -------------------------------------------------------------------------------- 1 | """Utility functions and classes for visualization and logging""" 2 | import os 3 | from datetime import datetime 4 | 5 | import cv2 6 | import imageio 7 | import numpy as np 8 | 9 | from ithor_arm.arm_calculation_utils import initialize_arm 10 | from ithor_arm.ithor_arm_constants import ( 11 | reset_environment_and_additional_commands, 12 | transport_wrapper, 13 | ) 14 | from manipulathor_utils.debugger_util import ForkedPdb 15 | 16 | 17 | class LoggerVisualizer: 18 | def __init__(self, exp_name="", log_dir=""): 19 | if log_dir == "": 20 | log_dir = self.__class__.__name__ 21 | if exp_name == "": 22 | exp_name = "NoNameExp" 23 | now = datetime.now() 24 | self.exp_name = exp_name 25 | log_dir = os.path.join( 26 | "experiment_output/visualizations", 27 | exp_name, 28 | log_dir + "_" + now.strftime("%m_%d_%Y_%H_%M_%S_%f"), 29 | ) 30 | self.log_dir = log_dir 31 | os.makedirs(self.log_dir, exist_ok=True) 32 | self.log_queue = [] 33 | self.action_queue = [] 34 | self.logger_index = 0 35 | 36 | def log(self, environment, action_str): 37 | raise NotImplementedError() 38 | 39 | def is_empty(self): 40 | return len(self.log_queue) == 0 41 | 42 | def finish_episode_metrics(self, episode_info, task_info, metric_results): 43 | pass 44 | 45 | def finish_episode(self, environment, episode_info, task_info): 46 | pass 47 | 48 | 49 | class TestMetricLogger(LoggerVisualizer): 50 | def __init__(self, **kwargs): 51 | super().__init__(**kwargs) 52 | self.total_metric_dict = {} 53 | log_file_name = os.path.join( 54 | self.log_dir, "test_metric_{}.txt".format(self.exp_name) 55 | ) 56 | self.metric_log_file = open(log_file_name, "w") 57 | 58 | def average_dict(self): 59 | result = {} 60 | for (k, v) in self.total_metric_dict.items(): 61 | result[k] = sum(v) / len(v) 62 | return result 63 | 64 | def finish_episode_metrics(self, episode_info, task_info, metric_results=None): 65 | 66 | if metric_results is None: 67 | print("Episode had to reset; discarding queued logs") 68 | self.log_queue = [] 69 | self.action_queue = [] 70 | return 71 | 72 | for k in metric_results.keys(): 73 | if "metric" in k or k in ["ep_length", "reward", "success"]: 74 | self.total_metric_dict.setdefault(k, []) 75 | self.total_metric_dict[k].append(metric_results[k]) 76 | print( 77 | "total", 78 | len(self.total_metric_dict["success"]), 79 | "average test metric", 80 | self.average_dict(), 81 | ) 82 | 83 | # save the task info, the full action sequence, and the results 84 | log_dict = { 85 | "task_info_metrics": metric_results, 86 | "action_sequence": self.action_queue, 87 | "logger_number": self.logger_index, 88 | } 89 | self.logger_index += 1 90 | self.metric_log_file.write(str(log_dict)) 91 | self.metric_log_file.write("\n") 92 | print("Logging to", self.metric_log_file.name) 93 | 94 | self.log_queue = [] 95 | self.action_queue = [] 96 | 97 | def log(self, environment, action_str): 98 | # We can add the agent arm and state location if needed 99 | self.action_queue.append(action_str) 100 | self.log_queue.append(action_str) 101 | 102 | 103 | class ImageVisualizer(LoggerVisualizer): 104 | def finish_episode(self, environment, episode_info, task_info): 105 | now = datetime.now() 106 | time_to_write = now.strftime("%m_%d_%Y_%H_%M_%S_%f") 107 | time_to_write += "log_ind_{}".format(self.logger_index) 108 | self.logger_index += 1 109 | print("Logging", time_to_write, "len", len(self.log_queue)) 110 | object_id = task_info["objectId"] 111 | 112 | pickup_success = episode_info.object_picked_up 113 | episode_success = episode_info._success 114 | 115 | # Uncomment to also save each frame as an image: 116 | # for i, img in enumerate(self.log_queue): 117 | # image_dir = os.path.join(self.log_dir, time_to_write + '_seq{}.png'.format(str(i))) 118 | # cv2.imwrite(image_dir, img[:,:,[2,1,0]]) 119 | 120 | episode_success_offset = "succ" if episode_success else "fail" 121 | pickup_success_offset = "succ" if pickup_success else "fail" 122 | gif_name = ( 123 | time_to_write 124 | + "_obj_" 125 | + object_id.split("|")[0] 126 | + "_pickup_" 127 | + pickup_success_offset 128 | + "_episode_" 129 | + episode_success_offset 130 | + ".gif" 131 | ) 132 | concat_all_images = np.expand_dims(np.stack(self.log_queue, axis=0), axis=1) 133 | save_image_list_to_gif(concat_all_images, gif_name, self.log_dir) 134 | 135 | self.log_start_goal( 136 | environment, 137 | task_info["visualization_source"], 138 | tag="start", 139 | img_adr=os.path.join(self.log_dir, time_to_write), 140 | ) 141 | self.log_start_goal( 142 | environment, 143 | task_info["visualization_target"], 144 | tag="goal", 145 | img_adr=os.path.join(self.log_dir, time_to_write), 146 | ) 147 | 148 | self.log_queue = [] 149 | self.action_queue = [] 150 | 151 | def log(self, environment, action_str): 152 | image_tensor = environment.current_frame 153 | self.action_queue.append(action_str) 154 | self.log_queue.append(image_tensor) 155 | 156 | def log_start_goal(self, env, task_info, tag, img_adr): 157 | object_location = task_info["object_location"] 158 | object_id = task_info["object_id"] 159 | agent_state = task_info["agent_pose"] 160 | this_controller = env.controller 161 | scene = this_controller.last_event.metadata[ 162 | "sceneName" 163 | ] # maybe we should reset the env here instead 164 | reset_environment_and_additional_commands(this_controller, scene) 165 | # initializes the arm from high up as a shortcut; this block is very important, never remove it 166 | event1, event2, event3 = initialize_arm(this_controller) 167 | if not ( 168 | event1.metadata["lastActionSuccess"] 169 | and event2.metadata["lastActionSuccess"] 170 | and event3.metadata["lastActionSuccess"] 171 | ): 172 | print("ERROR: ARM MOVEMENT FAILED in logging! SHOULD NEVER HAPPEN") 173 | 174 | event = transport_wrapper(this_controller, object_id, object_location) 175 | if not event.metadata["lastActionSuccess"]: 176 | print("ERROR: could not transport the object while logging") 177 | 178 | event = this_controller.step( 179 | dict( 180 | action="TeleportFull", 181 | standing=True, 182 | x=agent_state["position"]["x"], 183 | y=agent_state["position"]["y"], 184 | z=agent_state["position"]["z"], 185 | rotation=dict( 186 | x=agent_state["rotation"]["x"], 187 | y=agent_state["rotation"]["y"], 188 | z=agent_state["rotation"]["z"], 189 | ), 190 | horizon=agent_state["cameraHorizon"], 191 | ) 192 | ) 193 | if not event.metadata["lastActionSuccess"]: 194 | print("ERROR: could not teleport the agent while logging") 195 | 196 | image_tensor = this_controller.last_event.frame 197 | image_dir = ( 198 | img_adr + "_obj_" + object_id.split("|")[0] + "_pickup_" + tag + ".png" 199 | ) 200 | cv2.imwrite(image_dir, image_tensor[:, :, [2, 1, 0]]) 201 | 202 | 203 | class ThirdViewVisualizer(LoggerVisualizer): 204 | def __init__(self): 205 | super(ThirdViewVisualizer, self).__init__() 206 | print("ThirdViewVisualizer does not work yet") 207 | ForkedPdb().set_trace() 208 | # self.init_camera = False 209 | 210 | def finish_episode(self, environment, episode_success, task_info): 211 | now = datetime.now() 212 | time_to_write = now.strftime("%m_%d_%Y_%H_%M_%S_%f") 213 | print("Logging", time_to_write, "len", len(self.log_queue)) 214 | 215 | for i, img in enumerate(self.log_queue): 216 | image_dir = os.path.join( 217 | self.log_dir, time_to_write + "_seq{}.png".format(str(i)) 218 | ) 219 | cv2.imwrite(image_dir, img[:, :, [2, 1, 0]]) 220 | 221 | success_offset = "succ" if episode_success else "fail" 222 | gif_name = time_to_write + "_" + success_offset + ".gif" 223 | concat_all_images = np.expand_dims(np.stack(self.log_queue, axis=0), axis=1) 224 | save_image_list_to_gif(concat_all_images, gif_name, self.log_dir) 225 | 226 | self.log_queue = [] 227 | self.action_queue = [] 228 | 229 | def log(self, environment, action_str): 230 | 231 | # if True or not self.init_camera: 232 | # # self.init_camera = True 233 | # 234 | # agent_state = environment.controller.last_event.metadata['agent'] 235 | # offset={'x':1, 'y':1, 'z':1} 236 | # rotation_offset = 0 237 | # # environment.controller.step('UpdateThirdPartyCamera', thirdPartyCameraId=0, rotation=dict(x=0, y=agent_state['rotation']['y']+rotation_offset, z=0), position=dict(x=agent_state['position']['x'] + offset['x'], y=1.0 + offset['y'], z=agent_state['position']['z'] + offset['z'])) 238 | # environment.controller.step('UpdateThirdPartyCamera', thirdPartyCameraId=0, rotation=dict(x=0, y=45, z=0), position=dict(x=-1, y=1.5, z=-1), fieldOfView=100) 239 | 240 | # the direction of this might not be ideal 241 | image_tensor = environment.controller.last_event.third_party_camera_frames[0] 242 | self.action_queue.append(action_str) 243 | self.log_queue.append(image_tensor) 244 | 245 | 246 | # def __save_thirdparty_camera(controller, address='/Users/kianae/Desktop/third_camera.png', offset={'x':0, 'y':0, 'z':0}, rotation_offset=0): 247 | # This is taking an additional step which messes up the sequence 248 | # agent_state = controller.last_event.metadata['agent'] 249 | # controller.step('UpdateThirdPartyCamera', thirdPartyCameraId=0, rotation=dict(x=0, y=agent_state['rotation']['y']+rotation_offset, z=0), position=dict(x=agent_state['position']['x'] + offset['x'], y=1.0 + offset['y'], z=agent_state['position']['z'] + offset['z'])) 250 | # frame = 
controller.last_event.third_party_camera_frames 251 | # assert len(frame) == 1 252 | # frame = frame[0] 253 | # file_adr = address 254 | # res = cv2.imwrite(file_adr, frame[:,:,[2,1,0]]) 255 | # return res 256 | 257 | 258 | def save_image_list_to_gif(image_list, gif_name, gif_dir): 259 | gif_adr = os.path.join(gif_dir, gif_name) 260 | 261 | seq_len, cols, w, h, c = image_list.shape 262 | 263 | pallet = np.zeros((seq_len, w, h * cols, c)) 264 | 265 | for col_ind in range(cols): 266 | pallet[:, :, col_ind * h : (col_ind + 1) * h, :] = image_list[:, col_ind] 267 | 268 | if not os.path.exists(gif_dir): 269 | os.makedirs(gif_dir) 270 | imageio.mimsave(gif_adr, pallet.astype(np.uint8), format="GIF", duration=1 / 5) 271 | print("Saved result in ", gif_adr) 272 | -------------------------------------------------------------------------------- /manipulathor_baselines/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/allenai/manipulathor/4562eb8c2f67ff67e5b9ba3930da84b6023a58a4/manipulathor_baselines/__init__.py -------------------------------------------------------------------------------- /manipulathor_baselines/armpointnav_baselines/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/allenai/manipulathor/4562eb8c2f67ff67e5b9ba3930da84b6023a58a4/manipulathor_baselines/armpointnav_baselines/__init__.py -------------------------------------------------------------------------------- /manipulathor_baselines/armpointnav_baselines/experiments/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/allenai/manipulathor/4562eb8c2f67ff67e5b9ba3930da84b6023a58a4/manipulathor_baselines/armpointnav_baselines/experiments/__init__.py -------------------------------------------------------------------------------- /manipulathor_baselines/armpointnav_baselines/experiments/armpointnav_base.py: -------------------------------------------------------------------------------- 1 | from abc import ABC 2 | from typing import Optional, Sequence, Union 3 | 4 | from allenact.base_abstractions.experiment_config import ExperimentConfig 5 | from allenact.base_abstractions.preprocessor import Preprocessor 6 | from allenact.base_abstractions.sensor import Sensor 7 | from allenact.utils.experiment_utils import Builder 8 | 9 | 10 | class ArmPointNavBaseConfig(ExperimentConfig, ABC): 11 | """The base object navigation configuration file.""" 12 | 13 | ADVANCE_SCENE_ROLLOUT_PERIOD: Optional[int] = None 14 | SENSORS: Optional[Sequence[Sensor]] = None 15 | 16 | STEP_SIZE = 0.25 17 | ROTATION_DEGREES = 45.0 18 | VISIBILITY_DISTANCE = 1.0 19 | STOCHASTIC = False 20 | 21 | CAMERA_WIDTH = 224 22 | CAMERA_HEIGHT = 224 23 | SCREEN_SIZE = 224 24 | MAX_STEPS = 200 25 | 26 | def __init__(self): 27 | self.REWARD_CONFIG = { 28 | "step_penalty": -0.01, 29 | "goal_success_reward": 10.0, 30 | "pickup_success_reward": 10.0, 31 | "failed_stop_reward": 0.0, 32 | "shaping_weight": 1.0, # we are not using this 33 | "failed_action_penalty": -0.03, 34 | } 35 | 36 | @classmethod 37 | def preprocessors(cls) -> Sequence[Union[Preprocessor, Builder[Preprocessor]]]: 38 | return tuple() 39 | -------------------------------------------------------------------------------- /manipulathor_baselines/armpointnav_baselines/experiments/armpointnav_mixin_ddppo.py: -------------------------------------------------------------------------------- 1 
| import torch.optim as optim 2 | from allenact.algorithms.onpolicy_sync.losses import PPO 3 | from allenact.algorithms.onpolicy_sync.losses.ppo import PPOConfig 4 | from allenact.utils.experiment_utils import ( 5 | Builder, 6 | PipelineStage, 7 | TrainingPipeline, 8 | LinearDecay, 9 | ) 10 | from torch.optim.lr_scheduler import LambdaLR 11 | 12 | from manipulathor_baselines.armpointnav_baselines.experiments.armpointnav_base import ( 13 | ArmPointNavBaseConfig, 14 | ) 15 | 16 | 17 | class ArmPointNavMixInPPOConfig(ArmPointNavBaseConfig): 18 | def training_pipeline(self, **kwargs): 19 | ppo_steps = int(300000000) 20 | lr = 3e-4 21 | num_mini_batch = 1 22 | update_repeats = 4 23 | num_steps = self.MAX_STEPS 24 | save_interval = 500000 # from 50k 25 | log_interval = 1000 26 | gamma = 0.99 27 | use_gae = True 28 | gae_lambda = 0.95 29 | max_grad_norm = 0.5 30 | return TrainingPipeline( 31 | save_interval=save_interval, 32 | metric_accumulate_interval=log_interval, 33 | optimizer_builder=Builder(optim.Adam, dict(lr=lr)), 34 | num_mini_batch=num_mini_batch, 35 | update_repeats=update_repeats, 36 | max_grad_norm=max_grad_norm, 37 | num_steps=num_steps, 38 | named_losses={"ppo_loss": PPO(**PPOConfig)}, 39 | gamma=gamma, 40 | use_gae=use_gae, 41 | gae_lambda=gae_lambda, 42 | advance_scene_rollout_period=self.ADVANCE_SCENE_ROLLOUT_PERIOD, 43 | pipeline_stages=[ 44 | PipelineStage(loss_names=["ppo_loss"], max_stage_steps=ppo_steps) 45 | ], 46 | lr_scheduler_builder=Builder( 47 | LambdaLR, {"lr_lambda": LinearDecay(steps=ppo_steps)} 48 | ), 49 | ) 50 | -------------------------------------------------------------------------------- /manipulathor_baselines/armpointnav_baselines/experiments/armpointnav_mixin_simplegru.py: -------------------------------------------------------------------------------- 1 | from typing import Sequence, Union 2 | 3 | import gym 4 | import torch.nn as nn 5 | from allenact.base_abstractions.preprocessor import Preprocessor 6 | from allenact.base_abstractions.sensor import RGBSensor, DepthSensor 7 | from allenact.utils.experiment_utils import Builder 8 | from allenact_plugins.ithor_plugin.ithor_sensors import GoalObjectTypeThorSensor 9 | 10 | from manipulathor_baselines.armpointnav_baselines.experiments.armpointnav_base import ( 11 | ArmPointNavBaseConfig, 12 | ) 13 | from manipulathor_baselines.armpointnav_baselines.models.arm_pointnav_models import ( 14 | ArmPointNavBaselineActorCritic, 15 | ) 16 | 17 | 18 | class ArmPointNavMixInSimpleGRUConfig(ArmPointNavBaseConfig): 19 | @classmethod 20 | def preprocessors(cls) -> Sequence[Union[Preprocessor, Builder[Preprocessor]]]: 21 | preprocessors = [] 22 | return preprocessors 23 | 24 | @classmethod 25 | def create_model(cls, **kwargs) -> nn.Module: 26 | has_rgb = any(isinstance(s, RGBSensor) for s in cls.SENSORS) 27 | has_depth = any(isinstance(s, DepthSensor) for s in cls.SENSORS) 28 | goal_sensor_uuid = next( 29 | (s.uuid for s in cls.SENSORS if isinstance(s, GoalObjectTypeThorSensor)), 30 | None, 31 | ) 32 | 33 | return ArmPointNavBaselineActorCritic( 34 | action_space=gym.spaces.Discrete( 35 | len(cls.TASK_SAMPLER._TASK_TYPE.class_action_names()) 36 | ), 37 | observation_space=kwargs["sensor_preprocessor_graph"].observation_spaces, 38 | hidden_size=512, 39 | ) 40 | -------------------------------------------------------------------------------- /manipulathor_baselines/armpointnav_baselines/experiments/armpointnav_thor_base.py: -------------------------------------------------------------------------------- 1 | import platform 
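# Shared base config for all iTHOR ArmPointNav experiments: it allocates worker # processes to GPUs per mode (train/valid/test) and builds the per-process # task-sampler arguments by splitting scenes across samplers. 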
2 | from abc import ABC 3 | from math import ceil 4 | from typing import Dict, Any, List, Optional, Sequence 5 | 6 | import gym 7 | import numpy as np 8 | import torch 9 | from allenact.base_abstractions.experiment_config import MachineParams 10 | from allenact.base_abstractions.preprocessor import SensorPreprocessorGraph 11 | from allenact.base_abstractions.sensor import SensorSuite, ExpertActionSensor 12 | from allenact.base_abstractions.task import TaskSampler 13 | from allenact.utils.experiment_utils import evenly_distribute_count_into_bins 14 | 15 | from ithor_arm.ithor_arm_constants import ENV_ARGS 16 | from ithor_arm.ithor_arm_task_samplers import ( 17 | SimpleArmPointNavGeneralSampler, 18 | ) 19 | from ithor_arm.ithor_arm_viz import ImageVisualizer, TestMetricLogger 20 | from manipulathor_baselines.armpointnav_baselines.experiments.armpointnav_base import ( 21 | ArmPointNavBaseConfig, 22 | ) 23 | 24 | 25 | class ArmPointNavThorBaseConfig(ArmPointNavBaseConfig, ABC): 26 | """The base config for all iTHOR PointNav experiments.""" 27 | 28 | TASK_SAMPLER = SimpleArmPointNavGeneralSampler 29 | VISUALIZE = False 30 | if platform.system() == "Darwin": 31 | VISUALIZE = True 32 | 33 | NUM_PROCESSES: Optional[int] = None 34 | TRAIN_GPU_IDS = list(range(torch.cuda.device_count())) 35 | SAMPLER_GPU_IDS = TRAIN_GPU_IDS 36 | VALID_GPU_IDS = [torch.cuda.device_count() - 1] 37 | TEST_GPU_IDS = [torch.cuda.device_count() - 1] 38 | 39 | TRAIN_DATASET_DIR: Optional[str] = None 40 | VAL_DATASET_DIR: Optional[str] = None 41 | 42 | CAP_TRAINING = None 43 | 44 | TRAIN_SCENES: str = None 45 | VAL_SCENES: str = None 46 | TEST_SCENES: str = None 47 | 48 | OBJECT_TYPES: Optional[Sequence[str]] = None 49 | VALID_SAMPLES_IN_SCENE = 1 50 | TEST_SAMPLES_IN_SCENE = 1 51 | 52 | NUMBER_OF_TEST_PROCESS = 10 53 | 54 | def __init__(self): 55 | super().__init__() 56 | 57 | assert ( 58 | self.CAMERA_WIDTH == 224 59 | and self.CAMERA_HEIGHT == 224 60 | and self.VISIBILITY_DISTANCE == 1 61 | and self.STEP_SIZE == 0.25 62 | ) 63 | self.ENV_ARGS = ENV_ARGS 64 | 65 | def machine_params(self, mode="train", **kwargs): 66 | sampler_devices: Sequence[int] = [] 67 | if mode == "train": 68 | workers_per_device = 1 69 | gpu_ids = ( 70 | [] 71 | if not torch.cuda.is_available() 72 | else self.TRAIN_GPU_IDS * workers_per_device 73 | ) 74 | nprocesses = ( 75 | 1 76 | if not torch.cuda.is_available() 77 | else evenly_distribute_count_into_bins(self.NUM_PROCESSES, len(gpu_ids)) 78 | ) 79 | sampler_devices = self.SAMPLER_GPU_IDS 80 | elif mode == "valid": 81 | nprocesses = 1 82 | gpu_ids = [] if not torch.cuda.is_available() else self.VALID_GPU_IDS 83 | elif mode == "test": 84 | nprocesses = self.NUMBER_OF_TEST_PROCESS if torch.cuda.is_available() else 1 85 | gpu_ids = [] if not torch.cuda.is_available() else self.TEST_GPU_IDS 86 | else: 87 | raise NotImplementedError("mode must be 'train', 'valid', or 'test'.") 88 | 89 | sensors = [*self.SENSORS] 90 | if mode != "train": 91 | sensors = [s for s in sensors if not isinstance(s, ExpertActionSensor)] 92 | 93 | sensor_preprocessor_graph = ( 94 | SensorPreprocessorGraph( 95 | source_observation_spaces=SensorSuite(sensors).observation_spaces, 96 | preprocessors=self.preprocessors(), 97 | ) 98 | if mode == "train" 99 | or ( 100 | (isinstance(nprocesses, int) and nprocesses > 0) 101 | or (isinstance(nprocesses, Sequence) and sum(nprocesses) > 0) 102 | ) 103 | else None 104 | ) 105 | 106 | return MachineParams( 107 | nprocesses=nprocesses, 108 | devices=gpu_ids, 109 | 
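# during training, task samplers may run on a dedicated set of GPUs; # for valid/test they share the worker devices (gpu_ids) 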
sampler_devices=sampler_devices 110 | if mode == "train" 111 | else gpu_ids, # ignored with > 1 gpu_ids 112 | sensor_preprocessor_graph=sensor_preprocessor_graph, 113 | ) 114 | 115 | @classmethod 116 | def make_sampler_fn(cls, **kwargs) -> TaskSampler: 117 | from datetime import datetime 118 | 119 | now = datetime.now() 120 | exp_name_w_time = cls.__name__ + "_" + now.strftime("%m_%d_%Y_%H_%M_%S_%f") 121 | if cls.VISUALIZE: 122 | visualizers = [ 123 | ImageVisualizer(exp_name=exp_name_w_time), 124 | TestMetricLogger(exp_name=exp_name_w_time), 125 | ] 126 | 127 | kwargs["visualizers"] = visualizers 128 | kwargs["objects"] = cls.OBJECT_TYPES 129 | kwargs["exp_name"] = exp_name_w_time 130 | return cls.TASK_SAMPLER(**kwargs) 131 | 132 | @staticmethod 133 | def _partition_inds(n: int, num_parts: int): 134 | return np.round(np.linspace(0, n, num_parts + 1, endpoint=True)).astype( 135 | np.int32 136 | ) 137 | 138 | def _get_sampler_args_for_scene_split( 139 | self, 140 | scenes: List[str], 141 | process_ind: int, 142 | total_processes: int, 143 | seeds: Optional[List[int]] = None, 144 | deterministic_cudnn: bool = False, 145 | ) -> Dict[str, Any]: 146 | if total_processes > len(scenes): # oversample some scenes -> bias 147 | if total_processes % len(scenes) != 0: 148 | print( 149 | "Warning: oversampling some of the scenes to feed all processes." 150 | " You can avoid this by setting a number of workers divisible by the number of scenes" 151 | ) 152 | scenes = scenes * int(ceil(total_processes / len(scenes))) 153 | scenes = scenes[: total_processes * (len(scenes) // total_processes)] 154 | else: 155 | if len(scenes) % total_processes != 0: 156 | print( 157 | "Warning: oversampling some of the scenes to feed all processes." 158 | " You can avoid this by setting a number of workers divisor of the number of scenes" 159 | ) 160 | inds = self._partition_inds(len(scenes), total_processes) 161 | 162 | return { 163 | "scenes": scenes[inds[process_ind] : inds[process_ind + 1]], 164 | "env_args": self.ENV_ARGS, 165 | "max_steps": self.MAX_STEPS, 166 | "sensors": self.SENSORS, 167 | "action_space": gym.spaces.Discrete( 168 | len(self.TASK_SAMPLER._TASK_TYPE.class_action_names()) 169 | ), 170 | "seed": seeds[process_ind] if seeds is not None else None, 171 | "deterministic_cudnn": deterministic_cudnn, 172 | "rewards_config": self.REWARD_CONFIG, 173 | } 174 | 175 | def train_task_sampler_args( 176 | self, 177 | process_ind: int, 178 | total_processes: int, 179 | devices: Optional[List[int]] = None, 180 | seeds: Optional[List[int]] = None, 181 | deterministic_cudnn: bool = False, 182 | ) -> Dict[str, Any]: 183 | res = self._get_sampler_args_for_scene_split( 184 | self.TRAIN_SCENES, 185 | process_ind, 186 | total_processes, 187 | seeds=seeds, 188 | deterministic_cudnn=deterministic_cudnn, 189 | ) 190 | res["scene_period"] = "manual" 191 | res["sampler_mode"] = "train" 192 | res["cap_training"] = self.CAP_TRAINING 193 | res["env_args"] = {} 194 | res["env_args"].update(self.ENV_ARGS) 195 | res["env_args"]["x_display"] = ( 196 | ("0.%d" % devices[process_ind % len(devices)]) if len(devices) > 0 else None 197 | ) 198 | return res 199 | 200 | def valid_task_sampler_args( 201 | self, 202 | process_ind: int, 203 | total_processes: int, 204 | devices: Optional[List[int]], 205 | seeds: Optional[List[int]] = None, 206 | deterministic_cudnn: bool = False, 207 | ) -> Dict[str, Any]: 208 | res = self._get_sampler_args_for_scene_split( 209 | self.VALID_SCENES, 210 | process_ind, 211 | total_processes, 212 | seeds=seeds, 213 
| deterministic_cudnn=deterministic_cudnn, 214 | ) 215 | res["scene_period"] = self.VALID_SAMPLES_IN_SCENE 216 | res["sampler_mode"] = "val" 217 | res["cap_training"] = self.CAP_TRAINING 218 | res["max_tasks"] = self.VALID_SAMPLES_IN_SCENE * len(res["scenes"]) 219 | res["env_args"] = {} 220 | res["env_args"].update(self.ENV_ARGS) 221 | res["env_args"]["x_display"] = ( 222 | ("0.%d" % devices[process_ind % len(devices)]) if len(devices) > 0 else None 223 | ) 224 | return res 225 | 226 | def test_task_sampler_args( 227 | self, 228 | process_ind: int, 229 | total_processes: int, 230 | devices: Optional[List[int]], 231 | seeds: Optional[List[int]] = None, 232 | deterministic_cudnn: bool = False, 233 | ) -> Dict[str, Any]: 234 | res = self._get_sampler_args_for_scene_split( 235 | self.TEST_SCENES, 236 | process_ind, 237 | total_processes, 238 | seeds=seeds, 239 | deterministic_cudnn=deterministic_cudnn, 240 | ) 241 | res["scene_period"] = self.TEST_SAMPLES_IN_SCENE 242 | res["sampler_mode"] = "test" 243 | res["env_args"] = {} 244 | res["cap_training"] = self.CAP_TRAINING 245 | res["env_args"].update(self.ENV_ARGS) 246 | res["env_args"]["x_display"] = ( 247 | ("0.%d" % devices[process_ind % len(devices)]) if len(devices) > 0 else None 248 | ) 249 | return res 250 | -------------------------------------------------------------------------------- /manipulathor_baselines/armpointnav_baselines/experiments/ithor/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/allenai/manipulathor/4562eb8c2f67ff67e5b9ba3930da84b6023a58a4/manipulathor_baselines/armpointnav_baselines/experiments/ithor/__init__.py -------------------------------------------------------------------------------- /manipulathor_baselines/armpointnav_baselines/experiments/ithor/armpointnav_depth.py: -------------------------------------------------------------------------------- 1 | from ithor_arm.ithor_arm_constants import ENV_ARGS 2 | from ithor_arm.ithor_arm_sensors import ( 3 | RelativeAgentArmToObjectSensor, 4 | RelativeObjectToGoalSensor, 5 | PickedUpObjSensor, 6 | DepthSensorThor, 7 | ) 8 | from ithor_arm.ithor_arm_task_samplers import ArmPointNavTaskSampler 9 | from manipulathor_baselines.armpointnav_baselines.experiments.armpointnav_mixin_ddppo import ( 10 | ArmPointNavMixInPPOConfig, 11 | ) 12 | from manipulathor_baselines.armpointnav_baselines.experiments.armpointnav_mixin_simplegru import ( 13 | ArmPointNavMixInSimpleGRUConfig, 14 | ) 15 | from manipulathor_baselines.armpointnav_baselines.experiments.ithor.armpointnav_ithor_base import ( 16 | ArmPointNaviThorBaseConfig, 17 | ) 18 | 19 | 20 | class ArmPointNavDepth( 21 | ArmPointNaviThorBaseConfig, 22 | ArmPointNavMixInPPOConfig, 23 | ArmPointNavMixInSimpleGRUConfig, 24 | ): 25 | """An Object Navigation experiment configuration in iThor with RGB 26 | input.""" 27 | 28 | SENSORS = [ 29 | DepthSensorThor( 30 | height=ArmPointNaviThorBaseConfig.SCREEN_SIZE, 31 | width=ArmPointNaviThorBaseConfig.SCREEN_SIZE, 32 | use_normalization=True, 33 | uuid="depth_lowres", 34 | ), 35 | RelativeAgentArmToObjectSensor(), 36 | RelativeObjectToGoalSensor(), 37 | PickedUpObjSensor(), 38 | ] 39 | 40 | MAX_STEPS = 200 41 | TASK_SAMPLER = ArmPointNavTaskSampler 42 | 43 | def __init__(self): 44 | super().__init__() 45 | 46 | assert ( 47 | self.CAMERA_WIDTH == 224 48 | and self.CAMERA_HEIGHT == 224 49 | and self.VISIBILITY_DISTANCE == 1 50 | and self.STEP_SIZE == 0.25 51 | ) 52 | self.ENV_ARGS = {**ENV_ARGS, "renderDepthImage": 
True} 53 | 54 | @classmethod 55 | def tag(cls): 56 | return cls.__name__ 57 | -------------------------------------------------------------------------------- /manipulathor_baselines/armpointnav_baselines/experiments/ithor/armpointnav_disjoint_depth.py: -------------------------------------------------------------------------------- 1 | import gym 2 | import torch.nn as nn 3 | 4 | from ithor_arm.ithor_arm_constants import ENV_ARGS 5 | from ithor_arm.ithor_arm_sensors import ( 6 | RelativeAgentArmToObjectSensor, 7 | RelativeObjectToGoalSensor, 8 | PickedUpObjSensor, 9 | DepthSensorThor, 10 | ) 11 | from ithor_arm.ithor_arm_task_samplers import ArmPointNavTaskSampler 12 | from manipulathor_baselines.armpointnav_baselines.experiments.armpointnav_mixin_ddppo import ( 13 | ArmPointNavMixInPPOConfig, 14 | ) 15 | from manipulathor_baselines.armpointnav_baselines.experiments.armpointnav_mixin_simplegru import ( 16 | ArmPointNavMixInSimpleGRUConfig, 17 | ) 18 | from manipulathor_baselines.armpointnav_baselines.experiments.ithor.armpointnav_ithor_base import ( 19 | ArmPointNaviThorBaseConfig, 20 | ) 21 | from manipulathor_baselines.armpointnav_baselines.models.disjoint_arm_pointnav_models import ( 22 | DisjointArmPointNavBaselineActorCritic, 23 | ) 24 | 25 | 26 | class ArmPointNavDisjointDepth( 27 | ArmPointNaviThorBaseConfig, 28 | ArmPointNavMixInPPOConfig, 29 | ArmPointNavMixInSimpleGRUConfig, 30 | ): 31 | """An Object Navigation experiment configuration in iThor with RGB 32 | input.""" 33 | 34 | SENSORS = [ 35 | DepthSensorThor( 36 | height=ArmPointNaviThorBaseConfig.SCREEN_SIZE, 37 | width=ArmPointNaviThorBaseConfig.SCREEN_SIZE, 38 | use_normalization=True, 39 | uuid="depth_lowres", 40 | ), 41 | RelativeAgentArmToObjectSensor(), 42 | RelativeObjectToGoalSensor(), 43 | PickedUpObjSensor(), 44 | ] 45 | 46 | MAX_STEPS = 200 47 | TASK_SAMPLER = ArmPointNavTaskSampler 48 | 49 | def __init__(self): 50 | super().__init__() 51 | 52 | assert ( 53 | self.CAMERA_WIDTH == 224 54 | and self.CAMERA_HEIGHT == 224 55 | and self.VISIBILITY_DISTANCE == 1 56 | and self.STEP_SIZE == 0.25 57 | ) 58 | self.ENV_ARGS = {**ENV_ARGS, "renderDepthImage": True} 59 | 60 | @classmethod 61 | def create_model(cls, **kwargs) -> nn.Module: 62 | return DisjointArmPointNavBaselineActorCritic( 63 | action_space=gym.spaces.Discrete( 64 | len(cls.TASK_SAMPLER._TASK_TYPE.class_action_names()) 65 | ), 66 | observation_space=kwargs["sensor_preprocessor_graph"].observation_spaces, 67 | hidden_size=512, 68 | ) 69 | 70 | @classmethod 71 | def tag(cls): 72 | return cls.__name__ 73 | -------------------------------------------------------------------------------- /manipulathor_baselines/armpointnav_baselines/experiments/ithor/armpointnav_ithor_base.py: -------------------------------------------------------------------------------- 1 | from abc import ABC 2 | 3 | from ithor_arm.ithor_arm_constants import TRAIN_OBJECTS, TEST_OBJECTS 4 | from manipulathor_baselines.armpointnav_baselines.experiments.armpointnav_thor_base import ( 5 | ArmPointNavThorBaseConfig, 6 | ) 7 | 8 | 9 | class ArmPointNaviThorBaseConfig(ArmPointNavThorBaseConfig, ABC): 10 | """The base config for all iTHOR ObjectNav experiments.""" 11 | 12 | NUM_PROCESSES = 40 13 | # add all the arguments here 14 | TOTAL_NUMBER_SCENES = 30 15 | 16 | TRAIN_SCENES = [ 17 | "FloorPlan{}_physics".format(str(i)) 18 | for i in range(1, TOTAL_NUMBER_SCENES + 1) 19 | if (i % 3 == 1 or i % 3 == 0) and i != 28 20 | ] # last scenes are really bad 21 | TEST_SCENES = [ 22 | 
"FloorPlan{}_physics".format(str(i)) 23 | for i in range(1, TOTAL_NUMBER_SCENES + 1) 24 | if i % 3 == 2 and i % 6 == 2 25 | ] 26 | VALID_SCENES = [ 27 | "FloorPlan{}_physics".format(str(i)) 28 | for i in range(1, TOTAL_NUMBER_SCENES + 1) 29 | if i % 3 == 2 and i % 6 == 5 30 | ] 31 | 32 | ALL_SCENES = TRAIN_SCENES + TEST_SCENES + VALID_SCENES 33 | 34 | assert ( 35 | len(ALL_SCENES) == TOTAL_NUMBER_SCENES - 1 36 | and len(set(ALL_SCENES)) == TOTAL_NUMBER_SCENES - 1 37 | ) 38 | 39 | OBJECT_TYPES = tuple(sorted(TRAIN_OBJECTS)) 40 | 41 | UNSEEN_OBJECT_TYPES = tuple(sorted(TEST_OBJECTS)) 42 | -------------------------------------------------------------------------------- /manipulathor_baselines/armpointnav_baselines/experiments/ithor/armpointnav_no_vision.py: -------------------------------------------------------------------------------- 1 | from ithor_arm.ithor_arm_constants import ENV_ARGS 2 | from ithor_arm.ithor_arm_sensors import ( 3 | RelativeAgentArmToObjectSensor, 4 | RelativeObjectToGoalSensor, 5 | PickedUpObjSensor, 6 | NoVisionSensorThor, 7 | ) 8 | from ithor_arm.ithor_arm_task_samplers import ArmPointNavTaskSampler 9 | from manipulathor_baselines.armpointnav_baselines.experiments.armpointnav_mixin_ddppo import ( 10 | ArmPointNavMixInPPOConfig, 11 | ) 12 | from manipulathor_baselines.armpointnav_baselines.experiments.armpointnav_mixin_simplegru import ( 13 | ArmPointNavMixInSimpleGRUConfig, 14 | ) 15 | from manipulathor_baselines.armpointnav_baselines.experiments.ithor.armpointnav_ithor_base import ( 16 | ArmPointNaviThorBaseConfig, 17 | ) 18 | 19 | 20 | class ArmPointNavNoVision( 21 | ArmPointNaviThorBaseConfig, 22 | ArmPointNavMixInPPOConfig, 23 | ArmPointNavMixInSimpleGRUConfig, 24 | ): 25 | """An Object Navigation experiment configuration in iThor with RGB 26 | input.""" 27 | 28 | SENSORS = [ 29 | NoVisionSensorThor( 30 | height=ArmPointNaviThorBaseConfig.SCREEN_SIZE, 31 | width=ArmPointNaviThorBaseConfig.SCREEN_SIZE, 32 | use_resnet_normalization=False, 33 | uuid="rgb_lowres", 34 | ), 35 | RelativeAgentArmToObjectSensor(), 36 | RelativeObjectToGoalSensor(), 37 | PickedUpObjSensor(), 38 | ] 39 | 40 | MAX_STEPS = 200 41 | TASK_SAMPLER = ArmPointNavTaskSampler # 42 | 43 | def __init__(self): 44 | super().__init__() 45 | 46 | assert ( 47 | self.CAMERA_WIDTH == 224 48 | and self.CAMERA_HEIGHT == 224 49 | and self.VISIBILITY_DISTANCE == 1 50 | and self.STEP_SIZE == 0.25 51 | ) 52 | self.ENV_ARGS = {**ENV_ARGS, "renderDepthImage": False} 53 | 54 | @classmethod 55 | def tag(cls): 56 | return cls.__name__ 57 | -------------------------------------------------------------------------------- /manipulathor_baselines/armpointnav_baselines/experiments/ithor/armpointnav_rgb.py: -------------------------------------------------------------------------------- 1 | from allenact_plugins.ithor_plugin.ithor_sensors import RGBSensorThor 2 | 3 | from ithor_arm.ithor_arm_constants import ENV_ARGS 4 | from ithor_arm.ithor_arm_sensors import ( 5 | RelativeAgentArmToObjectSensor, 6 | RelativeObjectToGoalSensor, 7 | PickedUpObjSensor, 8 | ) 9 | from ithor_arm.ithor_arm_task_samplers import ArmPointNavTaskSampler 10 | from manipulathor_baselines.armpointnav_baselines.experiments.armpointnav_mixin_ddppo import ( 11 | ArmPointNavMixInPPOConfig, 12 | ) 13 | from manipulathor_baselines.armpointnav_baselines.experiments.armpointnav_mixin_simplegru import ( 14 | ArmPointNavMixInSimpleGRUConfig, 15 | ) 16 | from manipulathor_baselines.armpointnav_baselines.experiments.ithor.armpointnav_ithor_base import ( 17 | 
ArmPointNaviThorBaseConfig, 18 | ) 19 | 20 | 21 | class ArmPointNavRGB( 22 | ArmPointNaviThorBaseConfig, 23 | ArmPointNavMixInPPOConfig, 24 | ArmPointNavMixInSimpleGRUConfig, 25 | ): 26 | """An Object Navigation experiment configuration in iThor with RGB 27 | input.""" 28 | 29 | SENSORS = [ 30 | RGBSensorThor( 31 | height=ArmPointNaviThorBaseConfig.SCREEN_SIZE, 32 | width=ArmPointNaviThorBaseConfig.SCREEN_SIZE, 33 | use_resnet_normalization=True, 34 | uuid="rgb_lowres", 35 | ), 36 | RelativeAgentArmToObjectSensor(), 37 | RelativeObjectToGoalSensor(), 38 | PickedUpObjSensor(), 39 | ] 40 | 41 | MAX_STEPS = 200 42 | TASK_SAMPLER = ArmPointNavTaskSampler # 43 | 44 | def __init__(self): 45 | super().__init__() 46 | 47 | assert ( 48 | self.CAMERA_WIDTH == 224 49 | and self.CAMERA_HEIGHT == 224 50 | and self.VISIBILITY_DISTANCE == 1 51 | and self.STEP_SIZE == 0.25 52 | ) 53 | self.ENV_ARGS = {**ENV_ARGS} 54 | 55 | @classmethod 56 | def tag(cls): 57 | return cls.__name__ 58 | -------------------------------------------------------------------------------- /manipulathor_baselines/armpointnav_baselines/experiments/ithor/armpointnav_rgbdepth.py: -------------------------------------------------------------------------------- 1 | from allenact_plugins.ithor_plugin.ithor_sensors import RGBSensorThor 2 | 3 | from ithor_arm.ithor_arm_constants import ENV_ARGS 4 | from ithor_arm.ithor_arm_sensors import ( 5 | RelativeAgentArmToObjectSensor, 6 | RelativeObjectToGoalSensor, 7 | PickedUpObjSensor, 8 | DepthSensorThor, 9 | ) 10 | from ithor_arm.ithor_arm_task_samplers import ArmPointNavTaskSampler 11 | from manipulathor_baselines.armpointnav_baselines.experiments.armpointnav_mixin_ddppo import ( 12 | ArmPointNavMixInPPOConfig, 13 | ) 14 | from manipulathor_baselines.armpointnav_baselines.experiments.armpointnav_mixin_simplegru import ( 15 | ArmPointNavMixInSimpleGRUConfig, 16 | ) 17 | from manipulathor_baselines.armpointnav_baselines.experiments.ithor.armpointnav_ithor_base import ( 18 | ArmPointNaviThorBaseConfig, 19 | ) 20 | 21 | 22 | class ArmPointNavRGBDepth( 23 | ArmPointNaviThorBaseConfig, 24 | ArmPointNavMixInPPOConfig, 25 | ArmPointNavMixInSimpleGRUConfig, 26 | ): 27 | """An Object Navigation experiment configuration in iThor with RGB 28 | input.""" 29 | 30 | SENSORS = [ 31 | DepthSensorThor( 32 | height=ArmPointNaviThorBaseConfig.SCREEN_SIZE, 33 | width=ArmPointNaviThorBaseConfig.SCREEN_SIZE, 34 | use_normalization=True, 35 | uuid="depth_lowres", 36 | ), 37 | RGBSensorThor( 38 | height=ArmPointNaviThorBaseConfig.SCREEN_SIZE, 39 | width=ArmPointNaviThorBaseConfig.SCREEN_SIZE, 40 | use_resnet_normalization=True, 41 | uuid="rgb_lowres", 42 | ), 43 | RelativeAgentArmToObjectSensor(), 44 | RelativeObjectToGoalSensor(), 45 | PickedUpObjSensor(), 46 | ] 47 | 48 | MAX_STEPS = 200 49 | TASK_SAMPLER = ArmPointNavTaskSampler # 50 | 51 | def __init__(self): 52 | super().__init__() 53 | 54 | assert ( 55 | self.CAMERA_WIDTH == 224 56 | and self.CAMERA_HEIGHT == 224 57 | and self.VISIBILITY_DISTANCE == 1 58 | and self.STEP_SIZE == 0.25 59 | ) 60 | self.ENV_ARGS = {**ENV_ARGS, "renderDepthImage": True} 61 | 62 | @classmethod 63 | def tag(cls): 64 | return cls.__name__ 65 | -------------------------------------------------------------------------------- /manipulathor_baselines/armpointnav_baselines/experiments/ithor/test_NovelObj_armpointnav_depth.py: -------------------------------------------------------------------------------- 1 | from ithor_arm.ithor_arm_task_samplers import ArmPointNavTaskSampler 2 | from 
manipulathor_baselines.armpointnav_baselines.experiments.ithor.armpointnav_depth import ( 3 | ArmPointNavDepth, 4 | ) 5 | from manipulathor_baselines.armpointnav_baselines.experiments.ithor.armpointnav_ithor_base import ( 6 | ArmPointNaviThorBaseConfig, 7 | ) 8 | 9 | 10 | class TestOnUObjUSceneRealDepthRandomAgentLocArmNav(ArmPointNavDepth): 11 | 12 | MAX_STEPS = 200 13 | TASK_SAMPLER = ArmPointNavTaskSampler 14 | TEST_SCENES = ArmPointNaviThorBaseConfig.TEST_SCENES 15 | OBJECT_TYPES = ArmPointNaviThorBaseConfig.UNSEEN_OBJECT_TYPES 16 | -------------------------------------------------------------------------------- /manipulathor_baselines/armpointnav_baselines/experiments/ithor/test_SeenScenes_NovelObj_armpointnav_depth.py: -------------------------------------------------------------------------------- 1 | from ithor_arm.ithor_arm_task_samplers import ArmPointNavTaskSampler 2 | from manipulathor_baselines.armpointnav_baselines.experiments.ithor.armpointnav_depth import ( 3 | ArmPointNavDepth, 4 | ) 5 | from manipulathor_baselines.armpointnav_baselines.experiments.ithor.armpointnav_ithor_base import ( 6 | ArmPointNaviThorBaseConfig, 7 | ) 8 | 9 | 10 | class TestOnUObjSSceneArmNavDepth(ArmPointNavDepth): 11 | """An ArmPointNav test configuration with depth input: novel objects 12 | in seen (train) scenes.""" 13 | 14 | MAX_STEPS = 200 15 | TASK_SAMPLER = ArmPointNavTaskSampler 16 | TEST_SCENES = ArmPointNaviThorBaseConfig.TRAIN_SCENES 17 | OBJECT_TYPES = ArmPointNaviThorBaseConfig.UNSEEN_OBJECT_TYPES 18 | -------------------------------------------------------------------------------- /manipulathor_baselines/armpointnav_baselines/models/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/allenai/manipulathor/4562eb8c2f67ff67e5b9ba3930da84b6023a58a4/manipulathor_baselines/armpointnav_baselines/models/__init__.py -------------------------------------------------------------------------------- /manipulathor_baselines/armpointnav_baselines/models/arm_pointnav_models.py: -------------------------------------------------------------------------------- 1 | """Baseline models for use in the ArmPointNav task. 2 | 3 | ArmPointNav (visual object manipulation) is the task defined in the 4 | ManipulaTHOR framework, built on AI2-THOR. 5 | """ 6 | from typing import Tuple, Optional 7 | 8 | import gym 9 | import torch 10 | from allenact.algorithms.onpolicy_sync.policy import ( 11 | ActorCriticModel, 12 | LinearCriticHead, 13 | LinearActorHead, 14 | DistributionType, 15 | Memory, 16 | ObservationType, 17 | ) 18 | from allenact.base_abstractions.distributions import CategoricalDistr 19 | from allenact.base_abstractions.misc import ActorCriticOutput 20 | from allenact.embodiedai.models.basic_models import SimpleCNN, RNNStateEncoder 21 | from gym.spaces.dict import Dict as SpaceDict 22 | 23 | from manipulathor_utils.net_utils import input_embedding_net 24 | 25 | 26 | class ArmPointNavBaselineActorCritic(ActorCriticModel[CategoricalDistr]): 27 | """Baseline recurrent actor critic model for armpointnav task. 28 | 29 | # Attributes 30 | action_space : The space of actions available to the agent. Currently only discrete 31 | actions are allowed (so this space will always be of type `gym.spaces.Discrete`). 32 | observation_space : The observation space expected by the agent. This observation space 33 | should include (optionally) 'rgb' images and 'depth' images. 34 | hidden_size : The hidden size of the GRU RNN.
35 | obj_state_embedding_size : The output dimensionality of the embedding of the relative 36 | arm-to-object and object-to-goal distance inputs. 37 | """ 38 | 39 | def __init__( 40 | self, 41 | action_space: gym.spaces.Discrete, 42 | observation_space: SpaceDict, 43 | hidden_size=512, 44 | obj_state_embedding_size=512, 45 | trainable_masked_hidden_state: bool = False, 46 | num_rnn_layers=1, 47 | rnn_type="GRU", 48 | ): 49 | """Initializer. 50 | 51 | See class documentation for parameter definitions. 52 | """ 53 | super().__init__(action_space=action_space, observation_space=observation_space) 54 | 55 | self._hidden_size = hidden_size 56 | self.object_type_embedding_size = obj_state_embedding_size 57 | 58 | sensor_names = self.observation_space.spaces.keys() 59 | self.visual_encoder = SimpleCNN( 60 | self.observation_space, 61 | self._hidden_size, 62 | rgb_uuid="rgb_lowres" if "rgb_lowres" in sensor_names else None, 63 | depth_uuid="depth_lowres" if "depth_lowres" in sensor_names else None, 64 | ) 65 | 66 | if "rgb_lowres" in sensor_names and "depth_lowres" in sensor_names: 67 | input_visual_feature_num = 2 68 | elif "rgb_lowres" in sensor_names or "depth_lowres" in sensor_names: 69 | input_visual_feature_num = 1 70 | else: # no visual sensor present; avoids an undefined variable below 71 | input_visual_feature_num = 0 72 | 73 | self.state_encoder = RNNStateEncoder( 74 | (self._hidden_size) * input_visual_feature_num + obj_state_embedding_size, 75 | self._hidden_size, 76 | trainable_masked_hidden_state=trainable_masked_hidden_state, 77 | num_layers=num_rnn_layers, 78 | rnn_type=rnn_type, 79 | ) 80 | 81 | self.actor = LinearActorHead(self._hidden_size, action_space.n) 82 | self.critic = LinearCriticHead(self._hidden_size) 83 | relative_dist_embedding_size = torch.Tensor([3, 100, obj_state_embedding_size]) 84 | self.relative_dist_embedding = input_embedding_net( 85 | relative_dist_embedding_size.long().tolist(), dropout=0 86 | ) 87 | 88 | self.train() 89 | 90 | @property 91 | def recurrent_hidden_state_size(self) -> int: 92 | """The recurrent hidden state size of the model.""" 93 | return self._hidden_size 94 | 95 | @property 96 | def num_recurrent_layers(self) -> int: 97 | """Number of recurrent hidden layers.""" 98 | return self.state_encoder.num_recurrent_layers 99 | 100 | def _recurrent_memory_specification(self): 101 | return dict( 102 | rnn=( 103 | ( 104 | ("layer", self.num_recurrent_layers), 105 | ("sampler", None), 106 | ("hidden", self.recurrent_hidden_state_size), 107 | ), 108 | torch.float32, 109 | ) 110 | ) 111 | 112 | def get_relative_distance_embedding( 113 | self, state_tensor: torch.Tensor 114 | ) -> torch.FloatTensor: 115 | 116 | return self.relative_dist_embedding(state_tensor) 117 | 118 | def forward( # type:ignore 119 | self, 120 | observations: ObservationType, 121 | memory: Memory, 122 | prev_actions: torch.Tensor, 123 | masks: torch.FloatTensor, 124 | ) -> Tuple[ActorCriticOutput[DistributionType], Optional[Memory]]: 125 | """Processes input batched observations to produce new actor and critic 126 | values. Processes input batched observations (along with prior hidden 127 | states, previous actions, and masks denoting which recurrent hidden 128 | states should be masked) and returns an `ActorCriticOutput` object 129 | containing the model's policy (distribution over actions) and 130 | evaluation of the current state (value). 131 | 132 | # Parameters 133 | observations : Batched input observations. 134 | memory : `Memory` containing the hidden states from initial timepoints. 135 | prev_actions : Tensor of previous actions taken.
136 | masks : Masks applied to hidden states. See `RNNStateEncoder`. 137 | # Returns 138 | Tuple of the `ActorCriticOutput` and recurrent hidden state. 139 | """ 140 | 141 | arm2obj_dist = self.get_relative_distance_embedding( 142 | observations["relative_agent_arm_to_obj"] 143 | ) 144 | obj2goal_dist = self.get_relative_distance_embedding( 145 | observations["relative_obj_to_goal"] 146 | ) 147 | 148 | perception_embed = self.visual_encoder(observations) 149 | 150 | pickup_bool = observations["pickedup_object"] 151 | before_pickup = pickup_bool == 0 # not used because of our initialization 152 | after_pickup = pickup_bool == 1 153 | distances = arm2obj_dist 154 | distances[after_pickup] = obj2goal_dist[after_pickup] 155 | 156 | x = [distances, perception_embed] 157 | 158 | x_cat = torch.cat(x, dim=-1) 159 | x_out, rnn_hidden_states = self.state_encoder( 160 | x_cat, memory.tensor("rnn"), masks 161 | ) 162 | 163 | actor_out = self.actor(x_out) 164 | critic_out = self.critic(x_out) 165 | actor_critic_output = ActorCriticOutput( 166 | distributions=actor_out, values=critic_out, extras={} 167 | ) 168 | 169 | updated_memory = memory.set_tensor("rnn", rnn_hidden_states) 170 | 171 | return ( 172 | actor_critic_output, 173 | updated_memory, 174 | ) 175 | -------------------------------------------------------------------------------- /manipulathor_baselines/armpointnav_baselines/models/base_models.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | 4 | 5 | class LinearActorHeadNoCategory(nn.Module): 6 | def __init__(self, num_inputs: int, num_outputs: int): 7 | super().__init__() 8 | 9 | self.linear = nn.Linear(num_inputs, num_outputs) 10 | nn.init.orthogonal_(self.linear.weight, gain=0.01) 11 | nn.init.constant_(self.linear.bias, 0) 12 | 13 | def forward(self, x: torch.FloatTensor): # type: ignore 14 | x = self.linear(x) # type:ignore 15 | assert len(x.shape) == 3 16 | return x 17 | -------------------------------------------------------------------------------- /manipulathor_baselines/armpointnav_baselines/models/disjoint_arm_pointnav_models.py: -------------------------------------------------------------------------------- 1 | """Baseline models for use in the ArmPointNav task. 2 | 3 | ArmPointNav (visual object manipulation) is the task defined in the 4 | ManipulaTHOR framework, built on AI2-THOR. 5 | """ 6 | from typing import Tuple, Optional 7 | 8 | import gym 9 | import torch 10 | from allenact.algorithms.onpolicy_sync.policy import ( 11 | ActorCriticModel, 12 | LinearCriticHead, 13 | DistributionType, 14 | Memory, 15 | ObservationType, 16 | ) 17 | from allenact.base_abstractions.distributions import CategoricalDistr 18 | from allenact.base_abstractions.misc import ActorCriticOutput 19 | from allenact.embodiedai.models.basic_models import SimpleCNN, RNNStateEncoder 20 | from gym.spaces.dict import Dict as SpaceDict 21 | 22 | from manipulathor_baselines.armpointnav_baselines.models.base_models import ( 23 | LinearActorHeadNoCategory, 24 | ) 25 | from manipulathor_utils.net_utils import input_embedding_net 26 | 27 | 28 | class DisjointArmPointNavBaselineActorCritic(ActorCriticModel[CategoricalDistr]): 29 | """Disjoint Baseline recurrent actor critic model for armpointnav. 30 | 31 | # Attributes 32 | action_space : The space of actions available to the agent. Currently only discrete 33 | actions are allowed (so this space will always be of type `gym.spaces.Discrete`).
34 | observation_space : The observation space expected by the agent. This observation space 35 | should include 'depth' images under the "depth_lowres" uuid, the only 36 | visual observation this disjoint variant consumes (it builds separate 37 | pick and drop encoders over depth). 38 | hidden_size : The hidden size of the GRU RNN. 39 | obj_state_embedding_size : The output dimensionality of the embedding of the relative 40 | arm-to-object and object-to-goal distance inputs. 41 | 42 | """ 43 | 44 | def __init__( 45 | self, 46 | action_space: gym.spaces.Discrete, 47 | observation_space: SpaceDict, 48 | hidden_size=512, 49 | obj_state_embedding_size=512, 50 | trainable_masked_hidden_state: bool = False, 51 | num_rnn_layers=1, 52 | rnn_type="GRU", 53 | ): 54 | """Initializer. 55 | 56 | See class documentation for parameter definitions. 57 | """ 58 | super().__init__(action_space=action_space, observation_space=observation_space) 59 | 60 | self._hidden_size = hidden_size 61 | self.object_type_embedding_size = obj_state_embedding_size 62 | 63 | self.visual_encoder_pick = SimpleCNN( 64 | self.observation_space, 65 | self._hidden_size, 66 | rgb_uuid=None, 67 | depth_uuid="depth_lowres", 68 | ) 69 | self.visual_encoder_drop = SimpleCNN( 70 | self.observation_space, 71 | self._hidden_size, 72 | rgb_uuid=None, 73 | depth_uuid="depth_lowres", 74 | ) 75 | 76 | self.state_encoder = RNNStateEncoder( 77 | (self._hidden_size) + obj_state_embedding_size, 78 | self._hidden_size, 79 | trainable_masked_hidden_state=trainable_masked_hidden_state, 80 | num_layers=num_rnn_layers, 81 | rnn_type=rnn_type, 82 | ) 83 | 84 | self.actor_pick = LinearActorHeadNoCategory(self._hidden_size, action_space.n) 85 | self.critic_pick = LinearCriticHead(self._hidden_size) 86 | self.actor_drop = LinearActorHeadNoCategory(self._hidden_size, action_space.n) 87 | self.critic_drop = LinearCriticHead(self._hidden_size) 88 | 89 | # self.object_state_embedding = nn.Embedding(num_embeddings=6, embedding_dim=obj_state_embedding_size) 90 | 91 | relative_dist_embedding_size = torch.Tensor([3, 100, obj_state_embedding_size]) 92 | self.relative_dist_embedding_pick = input_embedding_net( 93 | relative_dist_embedding_size.long().tolist(), dropout=0 94 | ) 95 | self.relative_dist_embedding_drop = input_embedding_net( 96 | relative_dist_embedding_size.long().tolist(), dropout=0 97 | ) 98 | 99 | self.train() 100 | 101 | @property 102 | def recurrent_hidden_state_size(self) -> int: 103 | """The recurrent hidden state size of the model.""" 104 | return self._hidden_size 105 | 106 | @property 107 | def num_recurrent_layers(self) -> int: 108 | """Number of recurrent hidden layers.""" 109 | return self.state_encoder.num_recurrent_layers 110 | 111 | def _recurrent_memory_specification(self): 112 | return dict( 113 | rnn=( 114 | ( 115 | ("layer", self.num_recurrent_layers), 116 | ("sampler", None), 117 | ("hidden", self.recurrent_hidden_state_size), 118 | ), 119 | torch.float32, 120 | ) 121 | ) 122 | 123 | def forward( # type:ignore 124 | self, 125 | observations: ObservationType, 126 | memory: Memory, 127 | prev_actions: torch.Tensor, 128 | masks: torch.FloatTensor, 129 | ) -> Tuple[ActorCriticOutput[DistributionType], Optional[Memory]]: 130 | """Processes input batched observations to produce new actor and critic 131 | values.
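Both the pick branch and the drop branch (separate depth encoders, actor heads, and critic heads) are evaluated for every sample, and their outputs are then merged elementwise according to the `pickedup_object` observation.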
Processes input batched observations (along with prior hidden 132 | states, previous actions, and masks denoting which recurrent hidden 133 | states should be masked) and returns an `ActorCriticOutput` object 134 | containing the model's policy (distribution over actions) and 135 | evaluation of the current state (value). 136 | 137 | # Parameters 138 | observations : Batched input observations. 139 | memory : `Memory` containing the hidden states from initial timepoints. 140 | prev_actions : Tensor of previous actions taken. 141 | masks : Masks applied to hidden states. See `RNNStateEncoder`. 142 | # Returns 143 | Tuple of the `ActorCriticOutput` and recurrent hidden state. 144 | """ 145 | 146 | arm2obj_dist = self.relative_dist_embedding_pick( 147 | observations["relative_agent_arm_to_obj"] 148 | ) 149 | obj2goal_dist = self.relative_dist_embedding_drop( 150 | observations["relative_obj_to_goal"] 151 | ) 152 | 153 | perception_embed_pick = self.visual_encoder_pick(observations) 154 | perception_embed_drop = self.visual_encoder_drop(observations) 155 | 156 | pickup_bool = observations["pickedup_object"] 157 | before_pickup = pickup_bool == 0 # not used because of our initialization 158 | after_pickup = pickup_bool == 1 159 | distances = arm2obj_dist 160 | distances[after_pickup] = obj2goal_dist[after_pickup] 161 | 162 | perception_embed = perception_embed_pick 163 | perception_embed[after_pickup] = perception_embed_drop[after_pickup] 164 | 165 | x = [distances, perception_embed] 166 | 167 | x_cat = torch.cat(x, dim=-1) # type: ignore 168 | x_out, rnn_hidden_states = self.state_encoder( 169 | x_cat, memory.tensor("rnn"), masks 170 | ) 171 | actor_out_pick = self.actor_pick(x_out) 172 | critic_out_pick = self.critic_pick(x_out) 173 | 174 | actor_out_drop = self.actor_drop(x_out) 175 | critic_out_drop = self.critic_drop(x_out) 176 | 177 | actor_out = actor_out_pick 178 | actor_out[after_pickup] = actor_out_drop[after_pickup] 179 | critic_out = critic_out_pick 180 | critic_out[after_pickup] = critic_out_drop[after_pickup] 181 | 182 | actor_out = CategoricalDistr(logits=actor_out) 183 | actor_critic_output = ActorCriticOutput( 184 | distributions=actor_out, values=critic_out, extras={} 185 | ) 186 | updated_memory = memory.set_tensor("rnn", rnn_hidden_states) 187 | 188 | return ( 189 | actor_critic_output, 190 | updated_memory, 191 | ) 192 | -------------------------------------------------------------------------------- /manipulathor_constants.py: -------------------------------------------------------------------------------- 1 | import os 2 | from pathlib import Path 3 | 4 | ABS_PATH_OF_MANIPULATHOR_TOP_LEVEL_DIR = os.path.abspath( 5 | os.path.dirname(Path(__file__)) 6 | ) 7 | -------------------------------------------------------------------------------- /manipulathor_utils/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/allenai/manipulathor/4562eb8c2f67ff67e5b9ba3930da84b6023a58a4/manipulathor_utils/__init__.py -------------------------------------------------------------------------------- /manipulathor_utils/debugger_util.py: -------------------------------------------------------------------------------- 1 | import pdb 2 | import sys 3 | 4 | import matplotlib as mpl 5 | import torch 6 | 7 | mpl.use("Agg") 8 | import matplotlib.pyplot as plt 9 | 10 | 11 | def visualize_tensor(x, img_name): 12 | if len(x.shape) == 3 and x.shape[0] == 3: 13 | x = x.transpose(0, 1).transpose(1, 2) 14 | x = x.cpu().detach() 15 | 
plt.imshow(x) 16 | plt.savefig(img_name) 17 | 18 | 19 | def print_gradients(x): 20 | print("New gradient") 21 | print("shape", x.shape) 22 | if (x != x).sum() > 0: 23 | print("gradient is nan") 24 | pdb.set_trace() 25 | print("mean", x.mean()) 26 | print("min", x.min()) 27 | print("max", x.max()) 28 | print("gradient", x) 29 | pdb.set_trace() 30 | 31 | 32 | class ForkedPdb(pdb.Pdb): 33 | """A Pdb subclass that may be used 34 | from a forked multiprocessing child 35 | """ 36 | 37 | def interaction(self, *args, **kwargs): 38 | _stdin = sys.stdin 39 | try: 40 | sys.stdin = open("/dev/stdin") 41 | pdb.Pdb.interaction(self, *args, **kwargs) 42 | finally: 43 | sys.stdin = _stdin 44 | 45 | 46 | def is_weight_nan(model): 47 | all_params = [k for k in model.parameters()] 48 | # [p.shape for p in all_params] 49 | has_nan = [torch.any(p != p) for p in all_params] 50 | norms = [p.norm() for p in all_params] 51 | print("norms", norms) 52 | print("total nans", sum(has_nan), "out of", len(has_nan)) 53 | print("total norm", sum(norms) / len(norms)) 54 | -------------------------------------------------------------------------------- /manipulathor_utils/net_utils.py: -------------------------------------------------------------------------------- 1 | import pdb 2 | 3 | import torch.nn as nn 4 | import torch.nn.functional as F 5 | 6 | 7 | def upshuffle( 8 | in_planes, out_planes, upscale_factor, kernel_size=3, stride=1, padding=1 9 | ): 10 | return nn.Sequential( 11 | nn.Conv2d( 12 | in_planes, 13 | out_planes * upscale_factor ** 2, 14 | kernel_size=kernel_size, 15 | stride=stride, 16 | padding=padding, 17 | ), 18 | nn.PixelShuffle(upscale_factor), 19 | nn.LeakyReLU(), 20 | ) 21 | 22 | 23 | def upshufflenorelu( 24 | in_planes, out_planes, upscale_factor, kernel_size=3, stride=1, padding=1 25 | ): 26 | return nn.Sequential( 27 | nn.Conv2d( 28 | in_planes, 29 | out_planes * upscale_factor ** 2, 30 | kernel_size=kernel_size, 31 | stride=stride, 32 | padding=padding, 33 | ), 34 | nn.PixelShuffle(upscale_factor), 35 | ) 36 | 37 | 38 | def combine_block(in_planes, out_planes): 39 | return nn.Sequential( 40 | nn.Conv2d(in_planes, out_planes, 1, 1), 41 | nn.BatchNorm2d(out_planes), 42 | nn.LeakyReLU(), 43 | ) 44 | 45 | 46 | def conv2d_block(in_planes, out_planes, kernel_size, stride=1, padding=1): 47 | return nn.Sequential( 48 | nn.Conv2d(in_planes, out_planes, kernel_size, stride=stride, padding=padding), 49 | nn.BatchNorm2d(out_planes), 50 | nn.LeakyReLU(), 51 | nn.Conv2d(out_planes, out_planes, kernel_size=3, stride=1, padding=1), 52 | nn.BatchNorm2d(out_planes), 53 | ) 54 | 55 | 56 | def combine_block_w_do(in_planes, out_planes, dropout=0.0): 57 | return nn.Sequential( 58 | nn.Conv2d(in_planes, out_planes, 1, 1), 59 | nn.LeakyReLU(), 60 | nn.Dropout(dropout), 61 | ) 62 | 63 | 64 | def combine_block_no_do(in_planes, out_planes): 65 | return nn.Sequential( 66 | nn.Conv2d(in_planes, out_planes, 1, 1), 67 | nn.LeakyReLU(), 68 | ) 69 | 70 | 71 | def linear_block(in_features, out_features, dropout=0.0): 72 | return nn.Sequential( 73 | nn.Linear(in_features, out_features), 74 | nn.LeakyReLU(), 75 | nn.Dropout(dropout), 76 | ) 77 | 78 | 79 | def linear_block_norelu(in_features, out_features): 80 | return nn.Sequential( 81 | nn.Linear(in_features, out_features), 82 | ) 83 | 84 | 85 | def input_embedding_net(list_of_feature_sizes, dropout=0.0): 86 | modules = [] 87 | for i in range(len(list_of_feature_sizes) - 1): 88 | input_size, output_size = list_of_feature_sizes[i : i + 2] 89 | if i + 2 == len(list_of_feature_sizes): 
90 | modules.append(linear_block_norelu(input_size, output_size)) 91 | else: 92 | modules.append(linear_block(input_size, output_size, dropout=dropout)) 93 | return nn.Sequential(*modules) 94 | 95 | 96 | def _upsample_add(x, y): 97 | _, _, H, W = y.size() 98 | return F.upsample(x, size=(H, W), mode="bilinear") + y 99 | 100 | 101 | def replace_all_relu_w_leakyrelu(model): 102 | pdb.set_trace() 103 | print("Not sure if using this is a good idea") 104 | modules = model._modules 105 | for m in modules.keys(): 106 | module = modules[m] 107 | if isinstance(module, nn.ReLU): 108 | model._modules[m] = nn.LeakyReLU() 109 | elif isinstance(module, nn.Module): 110 | model._modules[m] = replace_all_relu_w_leakyrelu(module) 111 | return model 112 | 113 | 114 | def replace_all_leakyrelu_w_relu(model): 115 | modules = model._modules 116 | for m in modules.keys(): 117 | module = modules[m] 118 | if isinstance(module, nn.LeakyReLU): 119 | model._modules[m] = nn.ReLU() 120 | elif isinstance(module, nn.Module): 121 | model._modules[m] = replace_all_leakyrelu_w_relu(module) 122 | return model 123 | 124 | 125 | def replace_all_bn_w_groupnorm(model): 126 | pdb.set_trace() 127 | print("Not sure if using this is a good idea") 128 | modules = model._modules 129 | for m in modules.keys(): 130 | module = modules[m] 131 | if isinstance(module, nn.BatchNorm2d) or isinstance(module, nn.BatchNorm1d): 132 | feature_number = module.num_features 133 | model._modules[m] = nn.GroupNorm(32, feature_number) 134 | elif isinstance(module, nn.BatchNorm3d): 135 | raise Exception("Not implemented") 136 | elif isinstance(module, nn.Module): 137 | model._modules[m] = replace_all_bn_w_groupnorm(module) 138 | return model 139 | 140 | 141 | def flat_temporal(tensor, batch_size, sequence_length): 142 | tensor_shape = [s for s in tensor.shape] 143 | assert tensor_shape[0] == batch_size and tensor_shape[1] == sequence_length 144 | result_shape = [batch_size * sequence_length] + tensor_shape[2:] 145 | return tensor.contiguous().view(result_shape) 146 | 147 | 148 | def unflat_temporal(tensor, batch_size, sequence_length): 149 | tensor_shape = [s for s in tensor.shape] 150 | assert tensor_shape[0] == batch_size * sequence_length 151 | result_shape = [batch_size, sequence_length] + tensor_shape[1:] 152 | return tensor.contiguous().view(result_shape) 153 | -------------------------------------------------------------------------------- /media/armpointnav_task.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/allenai/manipulathor/4562eb8c2f67ff67e5b9ba3930da84b6023a58a4/media/armpointnav_task.png -------------------------------------------------------------------------------- /media/dataset.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/allenai/manipulathor/4562eb8c2f67ff67e5b9ba3930da84b6023a58a4/media/dataset.png -------------------------------------------------------------------------------- /pretrained_models/.gitignore: -------------------------------------------------------------------------------- 1 | * 2 | !.gitignore 3 | !*.md 4 | -------------------------------------------------------------------------------- /pretrained_models/EvaluateModels.md: -------------------------------------------------------------------------------- 1 | # Evaluating the Pre-trained Models 2 | ## Download the Weights 3 | You can download the weights from 
[here](https://drive.google.com/file/d/1axZRgY3oKgATu0zoi1LeLxUtqpIXmJCg/view?usp=sharing). 4 | After downloading the weights, extract them to `pretrained_models/saved_checkpoints`. 5 | ## Evaluating the Trained Models 6 | ### Table 1 7 | #### Test-SeenObj 8 | ``` 9 | allenact manipulathor_baselines/armpointnav_baselines/experiments/ithor/armpointnav_depth.py -o test_out -s 1 -t test -c pretrained_models/saved_checkpoints/depth_armpointnav.pt 10 | ``` 11 | #### Test-NovelObj 12 | ``` 13 | allenact manipulathor_baselines/armpointnav_baselines/experiments/ithor/test_NovelObj_armpointnav_depth -o test_out -s 1 -t test -c pretrained_models/saved_checkpoints/depth_armpointnav.pt 14 | ``` 15 | #### SeenScenes-NovelObj 16 | ``` 17 | allenact manipulathor_baselines/armpointnav_baselines/experiments/ithor/test_SeenScenes_NovelObj_armpointnav_depth -o test_out -s 1 -t test -c pretrained_models/saved_checkpoints/depth_armpointnav.pt 18 | ``` 19 | 20 | ### Table 2 21 | 22 | #### No-Vision 23 | ``` 24 | allenact manipulathor_baselines/armpointnav_baselines/experiments/ithor/armpointnav_no_vision -o test_out -s 1 -t test -c pretrained_models/saved_checkpoints/no_vision_armpointnav.pt 25 | ``` 26 | 27 | #### Disjoint Model 28 | ``` 29 | allenact manipulathor_baselines/armpointnav_baselines/experiments/ithor/armpointnav_disjoint_depth -o test_out -s 1 -t test -c pretrained_models/saved_checkpoints/disjoint_model_armpointnav.pt 30 | ``` 31 | 32 | #### RGB Model 33 | ``` 34 | allenact manipulathor_baselines/armpointnav_baselines/experiments/ithor/armpointnav_rgb -o test_out -s 1 -t test -c pretrained_models/saved_checkpoints/rgb_armpointnav.pt 35 | ``` 36 | 37 | #### RGB-Depth Model 38 | ``` 39 | allenact manipulathor_baselines/armpointnav_baselines/experiments/ithor/armpointnav_rgbdepth -o test_out -s 1 -t test -c pretrained_models/saved_checkpoints/rgbdepth_armpointnav.pt 40 | ``` 41 | 42 | 43 | #### Depth Model 44 | ``` 45 | allenact manipulathor_baselines/armpointnav_baselines/experiments/ithor/armpointnav_depth -o test_out -s 1 -t test -c pretrained_models/saved_checkpoints/depth_armpointnav.pt 46 | ``` -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | absl-py==0.12.0 2 | -e git+https://github.com/allenai/ai2thor.git@bcc2e62970823667acb5c2a56e809419f1521e52#egg=ai2thor 3 | allenact==0.2.2 4 | allenact-plugins==0.2.2 5 | appdirs==1.4.4 6 | astunparse==1.6.3 7 | attrs==20.3.0 8 | aws-requests-auth==0.4.3 9 | black==20.8b1 10 | botocore==1.20.55 11 | cachetools==4.2.1 12 | certifi==2020.12.5 13 | chardet==4.0.0 14 | click==7.1.2 15 | cloudpickle==1.6.0 16 | colour==0.1.5 17 | cycler==0.10.0 18 | dataclasses==0.6 19 | filelock==3.0.12 20 | Flask==1.1.2 21 | future==0.18.2 22 | gast==0.4.0 23 | google-auth==1.29.0 24 | google-auth-oauthlib==0.4.4 25 | google-pasta==0.2.0 26 | grpcio==1.37.0 27 | gym==0.17.3 28 | h5py==2.10.0 29 | idna==2.10 30 | imageio==2.9.0 31 | imageio-ffmpeg==0.4.3 32 | importlib-metadata==4.0.1 33 | itsdangerous==1.1.0 34 | jedi==0.17.0 35 | Jinja2==2.11.3 36 | jmespath==0.10.0 37 | Keras-Preprocessing==1.1.2 38 | kiwisolver==1.3.1 39 | llvmlite==0.36.0 40 | Markdown==3.3.4 41 | MarkupSafe==1.1.1 42 | matplotlib==3.3.4 43 | moviepy==1.0.3 44 | msgpack==1.0.2 45 | mypy-extensions==0.4.3 46 | networkx==2.5.1 47 | numba==0.53.1 48 | numpy==1.19.5 49 | numpy-quaternion==2021.4.5.14.42.35 50 | oauthlib==3.1.0 51 | opencv-python==4.5.1.48 52 |
opt-einsum==3.3.0 53 | packaging==20.9 54 | pathspec==0.8.1 55 | Pillow==8.2.0 56 | proglog==0.1.9 57 | progressbar2==3.53.1 58 | protobuf==3.15.8 59 | pyasn1==0.4.8 60 | pyasn1-modules==0.2.8 61 | pyglet==1.5.0 62 | pyparsing==2.4.7 63 | pyquaternion==0.9.9 64 | python-dateutil==2.8.1 65 | python-utils==2.5.6 66 | python-xlib==0.29 67 | PyYAML==5.4.1 68 | regex==2021.4.4 69 | requests==2.25.1 70 | requests-oauthlib==1.3.0 71 | rsa==4.7.2 72 | scipy==1.5.4 73 | setproctitle==1.2.2 74 | six==1.15.0 75 | tensorboardX==2.2 76 | tensorflow-estimator==2.4.0 77 | termcolor==1.1.0 78 | toml==0.10.2 79 | torch==1.8.1 80 | torchvision==0.9.1 81 | tqdm==4.60.0 82 | typed-ast==1.4.3 83 | typing-extensions==3.7.4.3 84 | urllib3==1.26.4 85 | Werkzeug==1.0.1 86 | wrapt==1.12.1 87 | zipp==3.4.1 88 | -------------------------------------------------------------------------------- /scripts/jupyter_helper.py: -------------------------------------------------------------------------------- 1 | import copy 2 | import math 3 | import random 4 | import ai2thor 5 | import pdb 6 | import ai2thor.fifo_server 7 | 8 | 9 | ### CONSTANTS 10 | 11 | 12 | 13 | ADITIONAL_ARM_ARGS = { 14 | 'disableRendering': True, 15 | 'returnToStart': True, 16 | 'speed': 1, 17 | } 18 | 19 | ARM_MOVE_CONSTANT = 0.05 20 | 21 | SCENE_INDICES = [i + 1 for i in range(30)] +[i + 1 for i in range(200,230)] +[i + 1 for i in range(300,330)] +[i + 1 for i in range(400,430)] 22 | SCENE_NAMES = ['FloorPlan{}_physics'.format(i) for i in SCENE_INDICES] 23 | 24 | 25 | ENV_ARGS = dict(gridSize=0.25, 26 | width=224, height=224, agentMode='arm', fieldOfView=100, 27 | agentControllerType='mid-level', 28 | server_class=ai2thor.fifo_server.FifoServer, 29 | useMassThreshold = True, massThreshold = 10, 30 | autoSimulation=False, autoSyncTransforms=True, 31 | ) 32 | 33 | # Functions 34 | 35 | def is_object_at_position(controller, action_detail): 36 | objectId = action_detail['objectId'] 37 | position = action_detail['position'] 38 | current_object_position = get_object_details(controller, objectId)['position'] 39 | return two_dict_equal(dict(position=position), dict(position=current_object_position)) 40 | 41 | def is_agent_at_position(controller, action_detail): 42 | # dict(action='TeleportFull', x=initial_location['x'], y=initial_location['y'], z=initial_location['z'], rotation=dict(x=0, y=initial_rotation, z=0), horizon=horizon, standing=True) 43 | target_pose = dict( 44 | position={'x': action_detail['x'], 'y': action_detail['y'], 'z': action_detail['z'], }, 45 | rotation=action_detail['rotation'], 46 | horizon=action_detail['horizon'] 47 | ) 48 | current_agent_pose = controller.last_event.metadata['agent'] 49 | current_agent_pose = dict( 50 | position=current_agent_pose['position'], 51 | rotation=current_agent_pose['rotation'], 52 | horizon=current_agent_pose['cameraHorizon'], 53 | ) 54 | return two_dict_equal(current_agent_pose, target_pose) 55 | 56 | 57 | def get_object_details(controller, obj_id): 58 | return [o for o in controller.last_event.metadata['objects'] if o['objectId'] == obj_id][0] 59 | 60 | def initialize_arm(controller, scene_starting_cheating_locations): 61 | # Starts the arm from a raised initial pose (a shortcut); this block is very important;
never remove 62 | scene = controller.last_event.metadata['sceneName'] 63 | initial_pose = scene_starting_cheating_locations[scene] 64 | event1 = controller.step(dict(action='TeleportFull', standing=True, x=initial_pose['x'], y=initial_pose['y'], z=initial_pose['z'], rotation=dict(x=0, y=initial_pose['rotation'], z=0), horizon=initial_pose['horizon'])) 65 | event2 = controller.step(dict(action='MoveMidLevelArm', position=dict(x=0.0, y=0, z=0.35), **ADITIONAL_ARM_ARGS)) 66 | event3 = controller.step(dict(action='MoveMidLevelArmHeight', y=0.8, **ADITIONAL_ARM_ARGS)) 67 | return event1, event2, event3 68 | 69 | def make_all_objects_unbreakable(controller): 70 | all_breakable_objects = [o['objectType'] for o in controller.last_event.metadata['objects'] if o['breakable'] is True] 71 | all_breakable_objects = set(all_breakable_objects) 72 | for obj_type in all_breakable_objects: 73 | controller.step(action='MakeObjectsOfTypeUnbreakable', objectType=obj_type) 74 | 75 | 76 | def reset_the_scene_and_get_reachables(controller, scene_name=None, scene_options=None): 77 | if scene_name is None: 78 | if scene_options is None: 79 | scene_options = SCENE_NAMES 80 | scene_name = random.choice(scene_options) 81 | controller.reset(scene_name) 82 | controller.step(action='MakeAllObjectsMoveable') 83 | controller.step(action='MakeObjectsStaticKinematicMassThreshold') 84 | make_all_objects_unbreakable(controller) 85 | return get_reachable_positions(controller) 86 | 87 | def only_reset_scene(controller, scene_name): 88 | controller.reset(scene_name) 89 | controller.step(action='MakeAllObjectsMoveable') 90 | controller.step(action='MakeObjectsStaticKinematicMassThreshold') 91 | make_all_objects_unbreakable(controller) 92 | 93 | def transport_wrapper(controller, target_object, target_location): 94 | action_detail_list = [] 95 | transport_detail = dict(action = 'PlaceObjectAtPoint', objectId=target_object, position=target_location, forceKinematic=True) 96 | event = controller.step(**transport_detail) 97 | action_detail_list.append(transport_detail) 98 | # controller.step('PhysicsSyncTransforms') 99 | advance_detail = dict(action='AdvancePhysicsStep', simSeconds=1.0) 100 | controller.step(**advance_detail) 101 | action_detail_list.append(advance_detail) 102 | return event, action_detail_list 103 | 104 | def is_object_in_receptacle(event,target_obj,target_receptacle): 105 | all_containing_receptacle = set([]) 106 | parent_queue = [target_obj] 107 | while(len(parent_queue) > 0): 108 | top_queue = parent_queue[0] 109 | parent_queue = parent_queue[1:] 110 | if top_queue in all_containing_receptacle: 111 | continue 112 | current_parent_list = event.get_object(top_queue)['parentReceptacles'] 113 | if current_parent_list is None: 114 | continue 115 | else: 116 | parent_queue += current_parent_list 117 | all_containing_receptacle.update(set(current_parent_list)) 118 | return target_receptacle in all_containing_receptacle 119 | 120 | def get_reachable_positions(controller): 121 | event = controller.step('GetReachablePositions') 122 | reachable_positions = event.metadata['reachablePositions'] 123 | if reachable_positions is None or len(reachable_positions) == 0: 124 | reachable_positions = event.metadata['actionReturn'] 125 | if reachable_positions is None or len(reachable_positions) == 0: 126 | print('Scene name', controller.last_event.metadata['sceneName']) 127 | pdb.set_trace() 128 | return reachable_positions 129 | def execute_command(controller, command,action_dict_addition): 130 | 131 | base_position = 
get_current_arm_state(controller) 132 | change_height = ARM_MOVE_CONSTANT 133 | change_value = change_height 134 | action_details = {} 135 | 136 | if command == 'w': 137 | base_position['z'] += change_value 138 | elif command == 'z': 139 | base_position['z'] -= change_value 140 | elif command == 's': 141 | base_position['x'] += change_value 142 | elif command == 'a': 143 | base_position['x'] -= change_value 144 | elif command == '3': 145 | base_position['y'] += change_value 146 | elif command == '4': 147 | base_position['y'] -= change_value 148 | elif command == 'u': 149 | base_position['h'] += change_height 150 | elif command == 'j': 151 | base_position['h'] -= change_height 152 | elif command == '/': 153 | action_details = dict('') 154 | pickupable = controller.last_event.metadata['arm']['PickupableObjectsInsideHandSphere'] 155 | print(pickupable) 156 | elif command == 'd': 157 | event = controller.step(action='ReleaseObject') 158 | action_details = dict(action='ReleaseObject') 159 | elif command == 'm': 160 | action_dict_addition = copy.deepcopy(action_dict_addition) 161 | event = controller.step(action='MoveAgent', ahead=0.2,**action_dict_addition) 162 | action_details = dict(action='MoveAgent', ahead=0.2,**action_dict_addition) 163 | 164 | elif command == 'r': 165 | action_dict_addition = copy.deepcopy(action_dict_addition) 166 | event = controller.step(action='RotateAgent', degrees = 45,**action_dict_addition) 167 | action_details = dict(action='RotateAgent', degrees = 45,**action_dict_addition) 168 | elif command == 'l': 169 | action_dict_addition = copy.deepcopy(action_dict_addition) 170 | event = controller.step(action='RotateAgent', degrees = -45,**action_dict_addition) 171 | action_details = dict(action='RotateAgent', degrees = -45,**action_dict_addition) 172 | elif command == 'p': 173 | event = controller.step(action='PickupObject') 174 | action_details = dict(action='PickupObject') 175 | elif command == 'q': 176 | action_details = {} 177 | else: 178 | action_details = {} 179 | 180 | if command in ['w', 'z', 's', 'a', '3', '4']: 181 | 182 | event = controller.step(action='MoveArm', position=dict(x=base_position['x'], y=base_position['y'], z=base_position['z']),**action_dict_addition) 183 | action_details=dict(action='MoveArm', position=dict(x=base_position['x'], y=base_position['y'], z=base_position['z']),**action_dict_addition) 184 | success = event.metadata['lastActionSuccess'] 185 | 186 | 187 | elif command in ['u', 'j']: 188 | 189 | event = controller.step(action='MoveArmBase', y=base_position['h'],**action_dict_addition) 190 | action_details=dict(action='MoveArmBase', y=base_position['h'],**action_dict_addition) 191 | 192 | success = event.metadata['lastActionSuccess'] 193 | 194 | return action_details 195 | 196 | def get_current_arm_state(controller): 197 | h_min = 0.450998873 198 | h_max = 1.8009994 199 | agent_base_location = 0.9009995460510254 200 | event = controller.last_event 201 | offset = event.metadata['agent']['position']['y'] - agent_base_location 202 | h_max += offset 203 | h_min += offset 204 | joints=(event.metadata['arm']['joints']) 205 | arm=joints[-1] 206 | assert arm['name'] == 'robot_arm_4_jnt' 207 | xyz_dict = copy.deepcopy(arm['rootRelativePosition']) 208 | height_arm = joints[0]['position']['y'] 209 | xyz_dict['h'] = (height_arm - h_min) / (h_max - h_min) 210 | # print_error([x['position']['y'] for x in joints]) 211 | return xyz_dict 212 | 213 | def two_list_equal(l1, l2): 214 | dict1 = {i: v for (i,v) in enumerate(l1)} 215 | dict2 = {i: v for (i,v) 
in enumerate(l2)} 216 | return two_dict_equal(dict1, dict2) 217 | 218 | 219 | def get_current_full_state(controller): 220 | return {'agent_position':controller.last_event.metadata['agent']['position'], 'agent_rotation':controller.last_event.metadata['agent']['rotation'], 'arm_state': controller.last_event.metadata['arm']['joints'], 'held_object': controller.last_event.metadata['arm']['HeldObjects']} 221 | 222 | 223 | def two_dict_equal(dict1, dict2, threshold=0.001, ignore_keys=[]): 224 | if len(dict1) != len(dict2): 225 | print('different len', dict1, dict2) 226 | return False 227 | # assert len(dict1) == len(dict2), print('different len', dict1, dict2) 228 | equal = True 229 | for k in dict1: 230 | if k in ignore_keys: 231 | continue 232 | val1 = dict1[k] 233 | val2 = dict2[k] 234 | if not (type(val1) == type(val2) or (type(val1) in [int, float] and type(val2) in [int, float])): 235 | print('different type', dict1, dict2) 236 | return False 237 | # assert type(val1) == type(val2) or (type(val1) in [int, float] and type(val2) in [int, float]), () 238 | if type(val1) == dict: 239 | equal = two_dict_equal(val1, val2) 240 | elif type(val1) == list: 241 | equal = two_list_equal(val1, val2) 242 | # elif val1 != val1: # Either nan or -inf 243 | # equal = val2 != val2 244 | elif type(val1) == float: 245 | equal = abs(val1 - val2) < threshold 246 | else: 247 | equal = (val1 == val2) 248 | if not equal: 249 | print('not equal', 'key', k, 'values', val1, val2) 250 | return equal 251 | return equal 252 | 253 | -------------------------------------------------------------------------------- /scripts/startx.py: -------------------------------------------------------------------------------- 1 | import atexit 2 | import os 3 | import platform 4 | import re 5 | import shlex 6 | import subprocess 7 | import tempfile 8 | 9 | 10 | # Turning off automatic black formatting for this script as it breaks quotes. 
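# A typical invocation on a headless machine (hypothetical usage; it normally
# requires root, since a real Xorg server is launched):
#
#     sudo python3 scripts/startx.py
#
# startx() below finds NVIDIA GPUs via lspci, writes a temporary xorg.conf for
# them, and blocks while Xorg serves display :0.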
11 | 12 | # fmt: off 13 | 14 | def pci_records(): 15 | records = [] 16 | command = shlex.split("lspci -vmm") 17 | output = subprocess.check_output(command).decode() 18 | 19 | for devices in output.strip().split("\n\n"): 20 | record = {} 21 | records.append(record) 22 | for row in devices.split("\n"): 23 | key, value = row.split("\t") 24 | record[key.split(":")[0]] = value 25 | 26 | return records 27 | 28 | def generate_xorg_conf(devices): 29 | xorg_conf = [] 30 | 31 | device_section = """ 32 | Section "Device" 33 | Identifier "Device{device_id}" 34 | Driver "nvidia" 35 | VendorName "NVIDIA Corporation" 36 | BusID "{bus_id}" 37 | EndSection 38 | """ 39 | server_layout_section = """ 40 | Section "ServerLayout" 41 | Identifier "Layout0" 42 | {screen_records} 43 | EndSection 44 | """ 45 | screen_section = """ 46 | Section "Screen" 47 | Identifier "Screen{screen_id}" 48 | Device "Device{device_id}" 49 | DefaultDepth 24 50 | Option "AllowEmptyInitialConfiguration" "True" 51 | SubSection "Display" 52 | Depth 24 53 | Virtual 1024 768 54 | EndSubSection 55 | EndSection 56 | """ 57 | screen_records = [] 58 | for i, bus_id in enumerate(devices): 59 | xorg_conf.append(device_section.format(device_id=i, bus_id=bus_id)) 60 | xorg_conf.append(screen_section.format(device_id=i, screen_id=i)) 61 | screen_records.append('Screen {screen_id} "Screen{screen_id}" 0 0'.format(screen_id=i)) 62 | 63 | xorg_conf.append(server_layout_section.format(screen_records="\n ".join(screen_records))) 64 | 65 | output = "\n".join(xorg_conf) 66 | return output 67 | 68 | def startx(display=0): 69 | if platform.system() != "Linux": 70 | raise Exception("Can only run startx on linux") 71 | 72 | devices = [] 73 | for r in pci_records(): 74 | if r.get("Vendor", "") == "NVIDIA Corporation" \ 75 | and r["Class"] in ["VGA compatible controller", "3D controller"]: 76 | bus_id = "PCI:" + ":".join(map(lambda x: str(int(x, 16)), re.split(r"[:\.]", r["Slot"]))) 77 | devices.append(bus_id) 78 | 79 | if not devices: 80 | raise Exception("no nvidia cards found") 81 | 82 | try: 83 | fd, path = tempfile.mkstemp() 84 | with open(path, "w") as f: 85 | f.write(generate_xorg_conf(devices)) 86 | command = shlex.split("Xorg -noreset +extension GLX +extension RANDR +extension RENDER -config %s :%s" % (path, display)) 87 | proc = subprocess.Popen(command) 88 | atexit.register(lambda: proc.poll() is None and proc.kill()) 89 | proc.wait() 90 | finally: 91 | os.close(fd) 92 | os.unlink(path) 93 | 94 | # fmt: on 95 | 96 | 97 | if __name__ == "__main__": 98 | startx() -------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- 1 | from setuptools import find_packages, setup 2 | 3 | if __name__ == "__main__": 4 | setup( 5 | name="manipulathor", 6 | packages=find_packages(), 7 | version="0.0.1", 8 | install_requires=[ 9 | "allenact==0.2.2", 10 | "allenact_plugins[ithor]==0.2.2", 11 | "setuptools", 12 | ], 13 | ) 14 | -------------------------------------------------------------------------------- /updated_requirements.txt: -------------------------------------------------------------------------------- 1 | absl-py==0.12.0 2 | -e git+https://github.com/allenai/ai2thor.git@bcc2e62970823667acb5c2a56e809419f1521e52#egg=ai2thor 3 | allenact==0.2.2 4 | allenact-plugins==0.2.2 5 | astunparse==1.6.3 6 | aws-requests-auth==0.4.3 7 | botocore==1.20.67 8 | cachetools==4.2.2 9 | certifi==2020.12.5 10 | chardet==4.0.0 11 | python-xlib==0.29 12 | click==7.1.2 13 
| cloudpickle==1.6.0 14 | cmake==3.18.4.post1 15 | cycler==0.10.0 16 | dataclasses==0.8 17 | decorator==4.4.2 18 | filelock==3.0.12 19 | Flask==1.1.2 20 | flatbuffers==1.12 21 | future==0.18.2 22 | gast==0.3.3 23 | gin==0.1.6 24 | gin-config==0.4.0 25 | google-auth==1.30.0 26 | google-auth-oauthlib==0.4.4 27 | google-pasta==0.2.0 28 | grpcio==1.32.0 29 | gym==0.17.3 30 | h5py==2.10.0 31 | idna==2.10 32 | imageio==2.9.0 33 | imageio-ffmpeg==0.4.3 34 | importlib-metadata==4.0.1 35 | itsdangerous==1.1.0 36 | Jinja2==2.11.3 37 | jmespath==0.10.0 38 | Keras-Preprocessing==1.1.2 39 | kiwisolver==1.3.1 40 | Markdown==3.3.4 41 | MarkupSafe==1.1.1 42 | matplotlib==3.3.4 43 | moviepy==1.0.3 44 | msgpack==1.0.2 45 | networkx==2.5.1 46 | numpy==1.19.5 47 | oauthlib==3.1.0 48 | opencv-python==4.5.1.48 49 | opt-einsum==3.3.0 50 | Pillow==8.2.0 51 | proglog==0.1.9 52 | progressbar2==3.53.1 53 | protobuf==3.15.8 54 | pyasn1==0.4.8 55 | pyasn1-modules==0.2.8 56 | pyglet==1.5.0 57 | pyparsing==2.4.7 58 | python-dateutil==2.8.1 59 | python-utils==2.5.6 60 | PyYAML==5.4.1 61 | requests==2.25.1 62 | requests-oauthlib==1.3.0 63 | rsa==4.7.2 64 | scipy==1.5.4 65 | setproctitle==1.2.2 66 | six==1.15.0 67 | tensorboard==2.5.0 68 | tensorboard-data-server==0.6.1 69 | tensorboard-plugin-wit==1.8.0 70 | tensorboardX==2.2 71 | tensorflow==2.4.1 72 | tensorflow-estimator==2.4.0 73 | termcolor==1.1.0 74 | torch==1.8.1 75 | torchaudio==0.8.1 76 | torchvision==0.9.1 77 | tqdm==4.60.0 78 | typing-extensions==3.7.4.3 79 | urllib3==1.26.4 80 | Werkzeug==1.0.1 81 | wrapt==1.12.1 82 | zipp==3.4.1 83 | --------------------------------------------------------------------------------