├── .gitignore
├── DefineTask.md
├── LICENSE
├── README.md
├── datasets
│   ├── .gitignore
│   └── README.md
├── ithor_arm
│   ├── __init__.py
│   ├── arm_calculation_utils.py
│   ├── ithor_arm_constants.py
│   ├── ithor_arm_environment.py
│   ├── ithor_arm_sensors.py
│   ├── ithor_arm_task_samplers.py
│   ├── ithor_arm_tasks.py
│   └── ithor_arm_viz.py
├── manipulathor_baselines
│   ├── __init__.py
│   └── armpointnav_baselines
│       ├── __init__.py
│       ├── experiments
│       │   ├── __init__.py
│       │   ├── armpointnav_base.py
│       │   ├── armpointnav_mixin_ddppo.py
│       │   ├── armpointnav_mixin_simplegru.py
│       │   ├── armpointnav_thor_base.py
│       │   └── ithor
│       │       ├── __init__.py
│       │       ├── armpointnav_depth.py
│       │       ├── armpointnav_disjoint_depth.py
│       │       ├── armpointnav_ithor_base.py
│       │       ├── armpointnav_no_vision.py
│       │       ├── armpointnav_rgb.py
│       │       ├── armpointnav_rgbdepth.py
│       │       ├── test_NovelObj_armpointnav_depth.py
│       │       └── test_SeenScenes_NovelObj_armpointnav_depth.py
│       └── models
│           ├── __init__.py
│           ├── arm_pointnav_models.py
│           ├── base_models.py
│           └── disjoint_arm_pointnav_models.py
├── manipulathor_constants.py
├── manipulathor_utils
│   ├── __init__.py
│   ├── debugger_util.py
│   └── net_utils.py
├── media
│   ├── armpointnav_task.png
│   └── dataset.png
├── pretrained_models
│   ├── .gitignore
│   └── EvaluateModels.md
├── requirements.txt
├── scripts
│   ├── example_notebook.ipynb
│   ├── jupyter_helper.py
│   └── startx.py
├── setup.py
└── updated_requirements.txt

/.gitignore:
--------------------------------------------------------------------------------
1 | __pycache__/
2 | *.DS_Store*
3 | .idea/
4 | experiment_output/
5 | test_out
6 | *.ipynb_checkpoints*
7 | 
--------------------------------------------------------------------------------
/DefineTask.md:
--------------------------------------------------------------------------------
1 | # ✨ Defining a New Task
2 | 
3 | In order to define a new task or change any component of the training procedure, it suffices to look into/change the following files and classes.
4 | 
5 | ## ManipulaTHOR Environment
6 | 
7 | The `ManipulaTHOREnvironment` class defined in `ithor_arm/ithor_arm_environment.py` is a wrapper around the AI2-THOR environment. It discretizes the action space for the ArmPointNav task and wraps the functions needed for working with the low-level manipulation features, including getting the arm's pose, the agent's pose, etc.
8 | 
9 | #### Important Functions
10 | 
11 | 1. `step`: This function translates the actions generated by the model to their corresponding AI2-THOR API commands. For example, for the action `MoveHandXUp`, this function calculates the current arm location, increments the `x` value by `0.05` (which is our step size), and calls the function `MoveMidLevelArm` with the desired `x,y,z` values.
12 | 
13 | 2. `get_objects_moved`: Finds the objects that have moved since the previous action. We only use this function during evaluation, for calculating the Success Without Disturbance metric; however, it can be used for reward shaping as well.
14 | 
15 | 3. `get_current_arm_state`: Calculates the current arm state from the metadata of the environment.
16 | 
17 | 4. `is_object_at_low_level_hand`: Checks whether the object is currently held in the arm's gripper.
18 | 
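To make the translation concrete, below is a condensed sketch of the arm-action dispatch performed inside `step` (adapted from `ithor_arm/ithor_arm_environment.py`; `translate_arm_action` is a name invented for this illustration, not a function in the repo):

```python
# Condensed illustration of the arm-action dispatch in ManipulaTHOREnvironment.step.
# MOVE_ARM_CONSTANT (0.05m) mirrors ithor_arm/ithor_arm_constants.py;
# translate_arm_action is a hypothetical helper, not part of the code base.
MOVE_ARM_CONSTANT = 0.05

def translate_arm_action(action: str, base_position: dict) -> dict:
    """Map a discrete action such as 'MoveArmXP' onto a THOR 'MoveArm' call."""
    axis = action[-2].lower()                 # 'x', 'y' or 'z'
    sign = 1 if action.endswith("P") else -1  # 'P' = plus, 'M' = minus
    base_position[axis] += sign * MOVE_ARM_CONSTANT
    return dict(
        action="MoveArm",
        position={k: base_position[k] for k in ("x", "y", "z")},
    )
```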
19 | ## ArmPointNav Task Sampler
20 | 
21 | You can find `ArmPointNavTaskSampler` in `ithor_arm/ithor_arm_task_samplers.py`. This class is in charge of reading the dataset metadata, initializing all possible locations for the object and agent, and randomly sampling a data point from this set for each episode.
22 | 
23 | #### Important Functions
24 | 
25 | 1. `calc_possible_trajectories`: This function stores all the possible locations of the objects on the counters in a dictionary keyed by object id.
26 | 2. `get_source_target_indices`: This function returns two indices from the dictionary calculated above, for the initial and goal locations of the object.
27 | 3. `next_task`: Gets the source and target locations, resets the scene, initializes the agent, and transports the object to its initial location.
28 | 
29 | 
30 | ## ArmPointNav Task
31 | 
32 | You can find `ArmPointNavTask` in `ithor_arm/ithor_arm_tasks.py`. This class defines the possible actions and the reward, computes and records the evaluation metrics, and calls the appropriate API functions on the environment.
33 | 
34 | #### Important Functions and Attributes
35 | 
36 | 1. `_actions`: An attribute storing all the possible actions for each task. Note that this is an ordered list.
37 | 2. `judge`: The function for calculating the reward. Any reward shaping should be done in this function.
38 | 3. `metrics`: Calculates and logs the value of each evaluation metric per episode.
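As a concrete illustration of the kind of shaping `judge` can implement, here is a small standalone sketch. The config keys (`step_penalty`, `pickup_success_reward`, `goal_success_reward`) and the boolean flags are invented for this example and are not necessarily the repo's actual `reward_configs` schema:

```python
# Hypothetical reward-shaping sketch in the spirit of ArmPointNavTask.judge.
# Key names and flags below are illustrative, not the repo's actual schema.
def shaped_reward(reward_configs: dict, object_picked_up: bool,
                  first_pickup_this_step: bool, reached_goal: bool) -> float:
    reward = reward_configs["step_penalty"]          # small negative living cost
    if object_picked_up and first_pickup_this_step:  # pay the pickup bonus once
        reward += reward_configs["pickup_success_reward"]
    if reached_goal:                                 # object delivered to its goal
        reward += reward_configs["goal_success_reward"]
    return float(reward)

# e.g. shaped_reward({"step_penalty": -0.01, "pickup_success_reward": 5.0,
#                     "goal_success_reward": 10.0}, True, True, False) == 4.99
```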
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 | 
3 | Original work Copyright (c) 2017 Ilya Kostrikov
4 | 
5 | Original work Copyright (c) Facebook, Inc. and its affiliates.
6 | 
7 | Modified work Copyright (c) 2020 Allen Institute for Artificial Intelligence
8 | 
9 | Permission is hereby granted, free of charge, to any person obtaining a copy
10 | of this software and associated documentation files (the "Software"), to deal
11 | in the Software without restriction, including without limitation the rights
12 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
13 | copies of the Software, and to permit persons to whom the Software is
14 | furnished to do so, subject to the following conditions:
15 | 
16 | The above copyright notice and this permission notice shall be included in all
17 | copies or substantial portions of the Software.
18 | 
19 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
20 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
21 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
22 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
23 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
24 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
25 | SOFTWARE.
26 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # ManipulaTHOR: A Framework for Visual Object Manipulation
2 | #### Kiana Ehsani, Winson Han, Alvaro Herrasti, Eli VanderBilt, Luca Weihs, Eric Kolve, Aniruddha Kembhavi, Roozbeh Mottaghi
3 | #### (Oral Presentation at CVPR 2021)
4 | #### (Project Page)--(Framework)--(Video)--(Slides)
5 | 
6 | We present ManipulaTHOR, a framework that facilitates visual manipulation of objects using a robotic arm. Our framework is built upon a physics engine and enables realistic interactions with objects while navigating through scenes and performing tasks. Object manipulation is an established research domain within the robotics community and poses several challenges including avoiding collisions, grasping, and long-horizon planning. Our framework focuses primarily on manipulation in visually rich and complex scenes, joint manipulation and navigation planning, and generalization to unseen environments and objects: challenges that are often overlooked. The framework provides a comprehensive suite of sensory information and motor functions enabling the development of robust manipulation agents.
7 | 
8 | This code base is based on the AllenAct framework, and the majority of the core training algorithms and pipelines are borrowed from the AllenAct code base.
9 | 
10 | ### Citation
11 | 
12 | If you find this project useful in your research, please consider citing:
13 | 
14 | ```
15 | @inproceedings{ehsani2021manipulathor,
16 |   title={ManipulaTHOR: A Framework for Visual Object Manipulation},
17 |   author={Ehsani, Kiana and Han, Winson and Herrasti, Alvaro and VanderBilt, Eli and Weihs, Luca and Kolve, Eric and Kembhavi, Aniruddha and Mottaghi, Roozbeh},
18 |   booktitle={CVPR},
19 |   year={2021}
20 | }
21 | ```
22 | 
23 | ### Contents
24 | 
25 | * Installation
26 | * ArmPointNav Task Description
27 | * Dataset
28 | * Sensory Observations
29 | * Allowed Actions
30 | * Defining a New Task
31 | * Training An Agent
32 | * Evaluating A Pre-Trained Agent
33 | 
34 | 
35 | 
36 | 37 | ## 💻 Installation 38 | 39 | To begin, clone this repository locally 40 | ```bash 41 | git clone https://github.com/ehsanik/manipulathor.git 42 | ``` 43 |
44 | 
45 | 
46 | 
47 | Here's a quick summary of the most important files/directories in this repository:
48 | * `manipulathor_utils/*.py` - Helper functions and classes.
49 | * `manipulathor_baselines/armpointnav_baselines`
50 |     - `experiments/`
51 |         + `ithor/armpointnav_*.py` - Different baselines introduced in the paper. Each file in this folder corresponds to a row of a table in the paper.
52 |         + `*.py` - The base configuration files which define the experiment setup and hyperparameters for training.
53 |     - `models/*.py` - A collection of Actor-Critic baseline models.
54 | * `ithor_arm/` - A collection of environments, task samplers and task definitions.
55 |     - `ithor_arm_environment.py` - The definition of the `ManipulaTHOREnvironment` that wraps the AI2THOR-based framework introduced in this work and enables an easy-to-use API.
56 |     - `ithor_arm_constants.py` - Constants used to define the task and parameters of the environment. These include the step size
57 |       taken by the agent, the unique id of the THOR build we use, etc.
58 |     - `ithor_arm_sensors.py` - Sensors which provide observations to our agents during training. E.g. the
59 |       `RGBSensor` obtains RGB images from the environment and returns them for use by the agent.
60 |     - `ithor_arm_tasks.py` - Definition of the `ArmPointNav` task, the reward definition and the function for calculating the goal achievement.
61 |     - `ithor_arm_task_samplers.py` - Definition of the `ArmPointNavTaskSampler` samplers. Initializing the sampler, reading the json files from the dataset, and randomly choosing a task are all defined in this file.
62 |     - `ithor_arm_viz.py` - Utility functions for visualizing and logging the outputs of the models.
63 | 
64 | 

65 |
66 | 
67 | You can then install requirements by running
68 | ```bash
69 | pip install -r requirements.txt
70 | ```
71 | 
72 | 
73 | 
74 | **Python 3.6+ 🐍.** The code base requires Python 3.6+ and makes use of Python's `typing` annotations throughout.
75 | 
76 | **AI2-THOR 🧞.** To ensure reproducible results, please install this version of AI2-THOR (the exact commit is pinned as `MANIPULATHOR_COMMIT_ID` in `ithor_arm/ithor_arm_constants.py` and is checked at runtime).
77 | 
78 | After installing the requirements, you should start the xserver by running [this script](scripts/startx.py) in the background. Finally, you can start playing with the environment using [our example jupyter notebook](scripts/example_notebook.ipynb).
79 | 
80 | ## 📝 ArmPointNav Task Description
81 | 
82 | 
83 | 
84 | ArmPointNav addresses the problem of visual object manipulation: the task is to move an object between two locations in a scene. Operating in visually rich and complex environments, generalizing to unseen environments and objects, avoiding collisions with objects and structures in the scene, and visual planning to reach the destination are among the major challenges of this task. The example illustrates a sequence of actions taken by a virtual robot within the ManipulaTHOR environment for picking up a vase from the shelf and stacking it on a plate on the countertop.
85 | 
86 | ## 📊 Dataset
87 | 
88 | To study the task of ArmPointNav, we present the ArmPointNav Dataset (APND). It consists of 30 kitchen scenes in AI2-THOR that include more than 150 object categories (69 interactable object categories) with a variety of shapes, sizes and textures. We use 12 pickupable categories as our target objects. We use 20 scenes in the training set, and the remainder is evenly split into Val and Test. We train with 6 object categories and use the remaining 6 to test our model in a Novel-Obj setting. For more information on the dataset and how to download it, refer to Dataset Details.
89 | 
90 | ## 🖼️ Sensory Observations
91 | 
92 | The types of sensors provided for this paper include:
93 | 
94 | 1. **RGB images** - having shape `224x224x3` and an FOV of 90 degrees.
95 | 2. **Depth maps** - having shape `224x224` and an FOV of 90 degrees.
96 | 3. **Perfect egomotion** - Agents know precisely where the object is located relative to the agent's arm, as well as relative to its goal location.
97 | 
98 | 
99 | ## 🏃 Allowed Actions
100 | 
101 | A total of 13 actions are available to our agents. These include:
102 | 
103 | 1. **Moving the agent**
104 | 
105 |     * `MoveAhead` - Results in the agent moving ahead by 0.25m if doing so would not result in the agent colliding with something.
106 | 
107 |     * `Rotate [Right/Left]` - Results in the agent's body rotating 45 degrees in the desired direction.
108 | 
109 | 2. **Moving the arm**
110 | 
111 |     * `Moving the wrist along axis [x, y, z]` - Results in the arm moving along an axis (±x, ±y, ±z) by 0.05m.
112 | 
113 |     * `Moving the height of the arm base [Up/Down]` - Results in the base of the arm moving along the y-axis by 0.05m.
114 | 
115 | 3. **Abstract Grasp**
116 | 
117 |     * Picks up a target object. Only succeeds if the object is inside the arm's grasper.
118 | 
119 | 4. **Done Action**
120 | 
121 |     * This action finishes an episode. The agent must issue a `Done` action when it reaches the goal; otherwise the episode is considered a failure. (A minimal sketch of driving these actions appears below.)
122 | 
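To get a concrete feel for the action space, the following minimal sketch drives the environment wrapper directly. It assumes the repository root is on your `PYTHONPATH` and that the APND dataset has been extracted to `datasets/apnd-dataset` (the constants module loads `starting_pose.json` at import time); the scene name is just an example:

```python
# Minimal sketch: step the ManipulaTHOR wrapper with the discrete actions
# defined in ithor_arm/ithor_arm_constants.py.
from ithor_arm.ithor_arm_constants import ENV_ARGS, MOVE_AHEAD, MOVE_ARM_X_P
from ithor_arm.ithor_arm_environment import ManipulaTHOREnvironment

env = ManipulaTHOREnvironment(env_args=ENV_ARGS)
env.reset(scene_name="FloorPlan1")          # any AI2-THOR kitchen scene

env.step({"action": MOVE_AHEAD})            # move the agent base forward
env.step({"action": MOVE_ARM_X_P})          # move the wrist +0.05m along x
print(env.get_current_arm_state())          # arm state incl. normalized height 'h'
env.stop()
```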
123 | ## ✨ Defining a New Task
124 | 
125 | In order to define a new task, redefine the reward, try a new model, or change the environment setup, check out our tutorial on defining a new task here.
126 | 
127 | ## 🏋 Training An Agent
128 | 
129 | To run experiments, you first need to add the project directory to your `PYTHONPATH`. You can then train a model with a specific experiment setup by running one of the experiments below:
130 | 
131 | ```
132 | allenact manipulathor_baselines/armpointnav_baselines/experiments/ithor/<EXPERIMENT-NAME> -o experiment_output -s 1
133 | ```
134 | 
135 | Where `<EXPERIMENT-NAME>` can be one of the options below:
136 | 
137 | ```
138 | armpointnav_no_vision -- No Vision Baseline
139 | armpointnav_disjoint_depth -- Disjoint Model Ablation
140 | armpointnav_rgb -- Our RGB Experiment
141 | armpointnav_rgbdepth -- Our RGBD Experiment
142 | armpointnav_depth -- Our Depth Experiment
143 | ```
144 | 
145 | 
146 | ## 💪 Evaluating A Pre-Trained Agent
147 | 
148 | To evaluate a pre-trained model (for example, to reproduce the numbers in the paper), you can add
149 | `-t test -c <PATH-TO-WEIGHTS>` to the end of the command you ran for training.
150 | 
151 | In order to reproduce the numbers in the paper, you need to download the pretrained models from
152 | [here](https://drive.google.com/file/d/1wZi_IL5d7elXLkAb4jOixfY0M6-ZfkGM/view?usp=sharing) and extract them
153 | to `pretrained_models`. The full list of experiments and their corresponding trained weights can be found
154 | [here](pretrained_models/EvaluateModels.md).
155 | 
156 | ```
157 | allenact manipulathor_baselines/armpointnav_baselines/experiments/ithor/<EXPERIMENT-NAME> -o test_out -s 1 -t test -c <PATH-TO-WEIGHTS>
158 | ```
--------------------------------------------------------------------------------
/datasets/.gitignore:
--------------------------------------------------------------------------------
1 | *
2 | !.gitignore
3 | !*.md
--------------------------------------------------------------------------------
/datasets/README.md:
--------------------------------------------------------------------------------
1 | # 📊 Arm Point Nav Dataset (APND)
2 | 
3 | As a step towards generalizable object manipulation, we present the task of ArmPointNav – moving an object in the scene from a source location to a target location. This involves navigating towards the object, moving the arm gripper close to the object, picking it up, navigating towards the target location, moving the arm gripper (with the object in place) close to the target location, and finally releasing the object so that it lands carefully. In line with the agent navigation task of PointNav, the source and target locations of the object are specified via (x, y, z) coordinates in the agent's coordinate frame.
4 | 
5 | To study the task of ArmPointNav, we present the Arm Point Nav Dataset (APND).
6 | 
7 | 
8 | Our data generation process starts by finding all the possible locations in each room for each object. We then separate out the initial object locations for which there exists a valid agent pose that can reach them. This ensures that the objects in each task of our dataset are reachable by our agent, and that there is at least one solution for picking up/dropping off the object at that location. Note that the object in the aforementioned target location can be visible or hidden from the agent's point of view (e.g., on a table vs. in the closed fridge). Since there is a path between any two agent locations in the room, any pair of possible locations can be considered a task (which may or may not require navigating the agent in the scene).
9 | 
10 | 
11 | 
12 | The initial location of the object can pose a variety of challenges. In (a) the tomato is occluded by the bowl, therefore the agent needs to remove the bowl to reach the tomato.
In (b) the lettuce is on the shelf, which requires the agent to move the arm carefully such that it does not collide with the shelf or the vase. In (c) the goal object is inside another object, and in (d) the goal object is inside a receptacle, therefore it requires interacting with another entity (opening the microwave's door) before reaching for the object. The latter case is outside the scope of this code base.
13 | 
14 | ## 📲 Install The Dataset
15 | 
16 | 1. Download the compressed file of the dataset from [here](https://drive.google.com/file/d/13fOGheELJuoXv_JWRPGcpu2fBY-jKWKB/view?usp=sharing).
17 | 2. Extract the zip file to `datasets/apnd-dataset`.
18 | 
19 | ## 🗂️ Dataset Hierarchy
20 | 
21 | Below you can find the description and use case of each of the files in the dataset.
22 | 
23 | ```
24 | apnd-dataset
25 | └── valid_object_positions ----- Pool of potential object positions
26 | │   └── valid_<OBJECT>_positions_in_<SCENE>.json
27 | └── deterministic_tasks ----- Deterministic set of tasks, randomly chosen for evaluation
28 | │   └── tasks_<OBJECT>_positions_in_<SCENE>.json ----- Consists of pairs of initial and goal locations of the object
29 | └── starting_pose.json ----- Initial location in each room for which the arm initialization succeeds
30 | └── valid_agent_initial_locations.json ----- Reachable positions for the agent in each room
31 | └── deterministic_valid_agent_initial_locations.json ----- Fixed set of initial locations for the agent, used during test time (to ensure evaluation consistency).
32 | ```
33 | 
34 | 
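For orientation, the sketch below loads one of the position pools and inspects it. The key names are the ones consumed by `ithor_arm/ithor_arm_task_samplers.py`; the scene/object choice is just an example (any entry of `TRAIN_OBJECTS`/`TEST_OBJECTS` in a kitchen scene works), and it assumes the dataset has been extracted as described above:

```python
# Minimal sketch: inspect one pool of candidate object positions from APND.
import json

scene, obj = "FloorPlan1", "Apple"  # example scene and object
path = f"datasets/apnd-dataset/valid_object_positions/valid_{obj}_positions_in_{scene}.json"
with open(path) as f:
    points = json.load(f)[scene]

visible = [p for p in points if p["visibility"]]  # the sampler keeps only these
print(len(points), "candidate positions,", len(visible), "visible")
print(visible[0]["object_location"], visible[0]["countertop_id"])
```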
--------------------------------------------------------------------------------
/ithor_arm/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/allenai/manipulathor/4562eb8c2f67ff67e5b9ba3930da84b6023a58a4/ithor_arm/__init__.py
--------------------------------------------------------------------------------
/ithor_arm/arm_calculation_utils.py:
--------------------------------------------------------------------------------
1 | """Utility classes and functions for calculating the arm relative and absolute position."""
2 | from typing import Dict
3 | 
4 | import numpy as np
5 | import torch
6 | from scipy.spatial.transform import Rotation as R
7 | 
8 | from ithor_arm.ithor_arm_constants import (
9 |     ARM_START_POSITIONS,
10 |     ADITIONAL_ARM_ARGS,
11 | )
12 | 
13 | 
14 | def convert_state_to_tensor(state: Dict):
15 |     result = []
16 |     if "position" in state:
17 |         result += [
18 |             state["position"]["x"],
19 |             state["position"]["y"],
20 |             state["position"]["z"],
21 |         ]
22 |     if "rotation" in state:
23 |         result += [
24 |             state["rotation"]["x"],
25 |             state["rotation"]["y"],
26 |             state["rotation"]["z"],
27 |         ]
28 |     return torch.Tensor(result)
29 | 
30 | 
31 | def diff_position(state_goal, state_curr):
32 |     p1 = state_goal["position"]
33 |     p2 = state_curr["position"]
34 |     result = {k: abs(p1[k] - p2[k]) for k in p1.keys()}
35 |     return result
36 | 
37 | 
38 | def make_rotation_matrix(position, rotation):
39 |     result = np.zeros((4, 4))
40 |     r = R.from_euler("xyz", [rotation["x"], rotation["y"], rotation["z"]], degrees=True)
41 |     result[:3, :3] = r.as_matrix()
42 |     result[3, 3] = 1
43 |     result[:3, 3] = [position["x"], position["y"], position["z"]]
44 |     return result
45 | 
46 | 
47 | def inverse_rot_trans_mat(mat):
48 |     mat = np.linalg.inv(mat)
49 |     return mat
50 | 
51 | 
52 | def position_rotation_from_mat(matrix):
53 |     result = {"position": None, "rotation": None}
54 |     rotation = R.from_matrix(matrix[:3, :3]).as_euler("xyz", degrees=True)
55 |     rotation_dict = {"x": rotation[0], "y": rotation[1], "z": rotation[2]}
56 |     result["rotation"] = rotation_dict
57 |     position = matrix[:3, 3]
58 |     result["position"] = {"x": position[0], "y": position[1], "z": position[2]}
59 |     return result
60 | 
61 | 
62 | def find_closest_inverse(deg):
63 |     for k in saved_inverse_rotation_mats.keys():
64 |         if abs(k - deg) < 5:
65 |             return saved_inverse_rotation_mats[k]
66 |     # if it reaches here it means the inverse for this degree has not been precomputed
67 |     rotation = R.from_euler("xyz", [0, deg, 0], degrees=True)
68 |     result = rotation.as_matrix()
69 |     inverse = inverse_rot_trans_mat(result)
70 |     print("WARNING: Had to calculate the matrix for ", deg)
71 |     return inverse
72 | 
73 | 
74 | def calc_inverse(deg):
75 |     rotation = R.from_euler("xyz", [0, deg, 0], degrees=True)
76 |     result = rotation.as_matrix()
77 |     inverse = inverse_rot_trans_mat(result)
78 |     return inverse
79 | 
80 | 
81 | saved_inverse_rotation_mats = {i: calc_inverse(i) for i in range(0, 360, 45)}
82 | saved_inverse_rotation_mats[360] = saved_inverse_rotation_mats[0]
83 | 
84 | 
85 | def convert_world_to_agent_coordinate(world_obj, agent_state):
86 |     position = agent_state["position"]
87 |     rotation = agent_state["rotation"]
88 |     agent_translation = [position["x"], position["y"], position["z"]]
89 |     # inverse_agent_rotation = inverse_rot_trans_mat(agent_rotation_matrix[:3, :3])
90 |     assert abs(rotation["x"] - 0) < 0.01 and abs(rotation["z"] - 0) < 0.01
91 |     inverse_agent_rotation = find_closest_inverse(rotation["y"])
92 |     obj_matrix = make_rotation_matrix(world_obj["position"], world_obj["rotation"])
93 |     obj_translation = np.matmul(
94 |         inverse_agent_rotation, (obj_matrix[:3, 3] - agent_translation)
95 |     )
96 |     # add rotation later
97 |     obj_matrix[:3, 3] = obj_translation
98 |     result = position_rotation_from_mat(obj_matrix)
99 |     return result
100 | 
101 | 
102 | def test_translation_functions():
103 |     agent_coordinate = {
104 |         "position": {"x": 1, "y": 0, "z": 2},
105 |         "rotation": {"x": 0, "y": -45, "z": 0},
106 |     }
107 |     obj_coordinate = {
108 |         "position": {"x": 0, "y": 1, "z": 0},
109 |         "rotation": {"x": 0, "y": 0, "z": 0},
110 |     }
111 |     rotated = convert_world_to_agent_coordinate(obj_coordinate, agent_coordinate)
112 |     eps = 0.01
113 |     # check all three coordinates; the expected values are -3/sqrt(2), 1 and -1/sqrt(2)
114 |     assert (
115 |         abs(rotated["position"]["x"] - (-2.12)) < eps
116 |         and abs(rotated["position"]["y"] - (1)) < eps
117 |         and abs(rotated["position"]["z"] - (-0.71)) < eps
118 |     )
119 | 
120 | 
121 | def initialize_arm(controller):
122 |     # start the arm from high up
123 |     scene = controller.last_event.metadata["sceneName"]
124 |     initial_pose = ARM_START_POSITIONS[scene]
125 |     event1 = controller.step(
126 |         dict(
127 |             action="TeleportFull",
128 |             standing=True,
129 |             x=initial_pose["x"],
130 |             y=initial_pose["y"],
131 |             z=initial_pose["z"],
132 |             rotation=dict(x=0, y=initial_pose["rotation"], z=0),
133 |             horizon=initial_pose["horizon"],
134 |         )
135 |     )
136 |     event2 = controller.step(
137 |         dict(action="MoveArm", position=dict(x=0.0, y=0, z=0.35), **ADITIONAL_ARM_ARGS)
138 |     )
139 |     event3 = controller.step(dict(action="MoveArmBase", y=0.8, **ADITIONAL_ARM_ARGS))
140 |     return event1, event2, event3
141 | 
--------------------------------------------------------------------------------
/ithor_arm/ithor_arm_constants.py:
--------------------------------------------------------------------------------
1 | """Constant values and hyperparameters that are used by the environment."""
2 | import ai2thor
3 | import ai2thor.fifo_server
4 | from allenact_plugins.ithor_plugin.ithor_environment import
IThorEnvironment 5 | 6 | # MANIPULATHOR_COMMIT_ID = "68212159d78aab5c611b7f16338380993884a06a" 7 | # MANIPULATHOR_COMMIT_ID = 'bcc2e62970823667acb5c2a56e809419f1521e52' 8 | MANIPULATHOR_COMMIT_ID = "a84dd29471ec2201f583de00257d84fac1a03de2" 9 | 10 | MOVE_THR = 0.01 11 | ARM_MIN_HEIGHT = 0.450998873 12 | ARM_MAX_HEIGHT = 1.8009994 13 | MOVE_ARM_CONSTANT = 0.05 14 | MOVE_ARM_HEIGHT_CONSTANT = MOVE_ARM_CONSTANT 15 | 16 | ADITIONAL_ARM_ARGS = { 17 | "disableRendering": True, 18 | "returnToStart": True, 19 | "speed": 1, 20 | } 21 | 22 | MOVE_AHEAD = "MoveAheadContinuous" 23 | ROTATE_LEFT = "RotateLeftContinuous" 24 | ROTATE_RIGHT = "RotateRightContinuous" 25 | MOVE_ARM_HEIGHT_P = "MoveArmHeightP" 26 | MOVE_ARM_HEIGHT_M = "MoveArmHeightM" 27 | MOVE_ARM_X_P = "MoveArmXP" 28 | MOVE_ARM_X_M = "MoveArmXM" 29 | MOVE_ARM_Y_P = "MoveArmYP" 30 | MOVE_ARM_Y_M = "MoveArmYM" 31 | MOVE_ARM_Z_P = "MoveArmZP" 32 | MOVE_ARM_Z_M = "MoveArmZM" 33 | PICKUP = "PickUpMidLevel" 34 | DONE = "DoneMidLevel" 35 | 36 | 37 | ENV_ARGS = dict( 38 | gridSize=0.25, 39 | width=224, 40 | height=224, 41 | visibilityDistance=1.0, 42 | agentMode="arm", 43 | fieldOfView=100, 44 | agentControllerType="mid-level", 45 | server_class=ai2thor.fifo_server.FifoServer, 46 | useMassThreshold=True, 47 | massThreshold=10, 48 | autoSimulation=False, 49 | autoSyncTransforms=True, 50 | ) 51 | 52 | TRAIN_OBJECTS = ["Apple", "Bread", "Tomato", "Lettuce", "Pot", "Mug"] 53 | TEST_OBJECTS = ["Potato", "SoapBottle", "Pan", "Egg", "Spatula", "Cup"] 54 | 55 | 56 | def make_all_objects_unbreakable(controller): 57 | all_breakable_objects = [ 58 | o["objectType"] 59 | for o in controller.last_event.metadata["objects"] 60 | if o["breakable"] is True 61 | ] 62 | all_breakable_objects = set(all_breakable_objects) 63 | for obj_type in all_breakable_objects: 64 | controller.step(action="MakeObjectsOfTypeUnbreakable", objectType=obj_type) 65 | 66 | 67 | def reset_environment_and_additional_commands(controller, scene_name): 68 | controller.reset(scene_name) 69 | controller.step(action="MakeAllObjectsMoveable") 70 | controller.step(action="MakeObjectsStaticKinematicMassThreshold") 71 | make_all_objects_unbreakable(controller) 72 | return 73 | 74 | 75 | def transport_wrapper(controller, target_object, target_location): 76 | transport_detail = dict( 77 | action="PlaceObjectAtPoint", 78 | objectId=target_object, 79 | position=target_location, 80 | forceKinematic=True, 81 | ) 82 | advance_detail = dict(action="AdvancePhysicsStep", simSeconds=1.0) 83 | 84 | if issubclass(type(controller), IThorEnvironment): 85 | event = controller.step(transport_detail) 86 | controller.step(advance_detail) 87 | elif type(controller) == ai2thor.controller.Controller: 88 | event = controller.step(**transport_detail) 89 | controller.step(**advance_detail) 90 | return event 91 | 92 | 93 | VALID_OBJECT_LIST = [ 94 | "Knife", 95 | "Bread", 96 | "Fork", 97 | "Potato", 98 | "SoapBottle", 99 | "Pan", 100 | "Plate", 101 | "Tomato", 102 | "Egg", 103 | "Pot", 104 | "Spatula", 105 | "Cup", 106 | "Bowl", 107 | "SaltShaker", 108 | "PepperShaker", 109 | "Lettuce", 110 | "ButterKnife", 111 | "Apple", 112 | "DishSponge", 113 | "Spoon", 114 | "Mug", 115 | ] 116 | 117 | import json 118 | 119 | with open("datasets/apnd-dataset/starting_pose.json") as f: 120 | ARM_START_POSITIONS = json.load(f) 121 | -------------------------------------------------------------------------------- /ithor_arm/ithor_arm_environment.py: -------------------------------------------------------------------------------- 1 | """A 
wrapper for engaging with the ManipulaTHOR environment."""
2 | 
3 | import copy
4 | import math
5 | import typing
6 | import warnings
7 | from typing import Tuple, Dict, List, Set, Union, Any, Optional
8 | 
9 | import ai2thor.server
10 | import numpy as np
11 | from ai2thor.controller import Controller
12 | from allenact_plugins.ithor_plugin.ithor_constants import VISIBILITY_DISTANCE, FOV
13 | from allenact_plugins.ithor_plugin.ithor_environment import IThorEnvironment
14 | 
15 | from ithor_arm.ithor_arm_constants import (
16 |     ADITIONAL_ARM_ARGS,
17 |     ARM_MIN_HEIGHT,
18 |     ARM_MAX_HEIGHT,
19 |     MOVE_ARM_HEIGHT_CONSTANT,
20 |     MOVE_ARM_CONSTANT,
21 |     MANIPULATHOR_COMMIT_ID,
22 |     reset_environment_and_additional_commands,
23 |     MOVE_THR,
24 | )
25 | from manipulathor_utils.debugger_util import ForkedPdb
26 | 
27 | 
28 | class ManipulaTHOREnvironment(IThorEnvironment):
29 |     """Wrapper for the manipulathor controller providing arm functionality
30 |     and bookkeeping.
31 | 
32 |     See [here](https://ai2thor.allenai.org/documentation/installation) for comprehensive
33 |     documentation on AI2-THOR.
34 | 
35 |     # Attributes
36 | 
37 |     controller : The ai2thor controller.
38 |     """
39 | 
40 |     def __init__(
41 |         self,
42 |         x_display: Optional[str] = None,
43 |         docker_enabled: bool = False,
44 |         local_thor_build: Optional[str] = None,
45 |         visibility_distance: float = VISIBILITY_DISTANCE,
46 |         fov: float = FOV,
47 |         player_screen_width: int = 224,
48 |         player_screen_height: int = 224,
49 |         quality: str = "Very Low",
50 |         restrict_to_initially_reachable_points: bool = False,
51 |         make_agents_visible: bool = True,
52 |         object_open_speed: float = 1.0,
53 |         simplify_physics: bool = False,
54 |         verbose: bool = False,
55 |         env_args=None,
56 |     ) -> None:
57 |         """Initializer.
58 | 
59 |         # Parameters
60 | 
61 |         x_display : The x display into which to launch ai2thor (possibly necessary if you are running on a server
62 |             without an attached display).
63 |         docker_enabled : Whether or not to run thor in a docker container (useful on a server without an attached
64 |             display so that you don't have to start an x display).
65 |         local_thor_build : The path to a local build of ai2thor. This is probably not necessary for your use case
66 |             and can be safely ignored.
67 |         visibility_distance : The distance (in meters) at which objects, in the viewport of the agent,
68 |             are considered visible by ai2thor and will have their "visible" flag be set to `True` in the metadata.
69 |         fov : The agent's camera's field of view.
70 |         player_screen_width : The width resolution (in pixels) of the images returned by ai2thor.
71 |         player_screen_height : The height resolution (in pixels) of the images returned by ai2thor.
72 |         quality : The quality at which to render. Possible quality settings can be found in
73 |             `ai2thor._quality_settings.QUALITY_SETTINGS`.
74 |         restrict_to_initially_reachable_points : Whether or not to restrict the agent to locations in ai2thor
75 |             that were found to be (initially) reachable by the agent (i.e. reachable by the agent after resetting
76 |             the scene). This can be useful if you want to ensure there are only a fixed set of locations where the
77 |             agent can go.
78 |         make_agents_visible : Whether or not the agent should be visible. Most noticeable when there are multiple agents
79 |             or when quality settings are high so that the agent casts a shadow.
80 |         object_open_speed : How quickly objects should be opened. High speeds mean faster simulation, but also mean
81 |             that opened objects can have a lot of kinetic energy and can, possibly, knock other objects away.
82 |         simplify_physics : Whether or not to simplify physics when applicable. Currently this only simplifies object
83 |             interactions when opening drawers (when simplified, objects within a drawer do not slide around on
84 |             their own when the drawer is opened or closed; instead they are effectively glued down).
85 |         """
86 | 
87 |         self._start_player_screen_width = player_screen_width
88 |         self._start_player_screen_height = player_screen_height
89 |         self._local_thor_build = local_thor_build
90 |         self.x_display = x_display
91 |         self.controller: Optional[Controller] = None
92 |         self._started = False
93 |         self._quality = quality
94 |         self._verbose = verbose
95 |         self.env_args = env_args
96 | 
97 |         self._initially_reachable_points: Optional[List[Dict]] = None
98 |         self._initially_reachable_points_set: Optional[Set[Tuple[float, float]]] = None
99 |         self._move_mag: Optional[float] = None
100 |         self._grid_size: Optional[float] = None
101 |         self._visibility_distance = visibility_distance
102 | 
103 |         self._fov = fov
104 | 
105 |         self.restrict_to_initially_reachable_points = (
106 |             restrict_to_initially_reachable_points
107 |         )
108 |         self.make_agents_visible = make_agents_visible
109 |         self.object_open_speed = object_open_speed
110 |         self._always_return_visible_range = False
111 |         self.simplify_physics = simplify_physics
112 | 
113 |         self.start(None)
114 |         self.check_controller_version()
115 | 
116 |         # noinspection PyTypeHints
117 |         self.controller.docker_enabled = docker_enabled  # type: ignore
118 | 
119 |     def check_controller_version(self):
120 |         if MANIPULATHOR_COMMIT_ID is not None:
121 |             assert (
122 |                 MANIPULATHOR_COMMIT_ID in self.controller._build.url
123 |             ), "Build number is not right, {} vs {}, use pip3 install -e git+https://github.com/allenai/ai2thor.git@{}#egg=ai2thor".format(
124 |                 self.controller._build.url,
125 |                 MANIPULATHOR_COMMIT_ID,
126 |                 MANIPULATHOR_COMMIT_ID,
127 |             )
128 | 
129 |     def create_controller(self):
130 |         controller = Controller(**self.env_args, commit_id=MANIPULATHOR_COMMIT_ID)
131 | 
132 |         return controller
133 | 
134 |     def start(
135 |         self,
136 |         scene_name: Optional[str],
137 |         move_mag: float = 0.25,
138 |         **kwargs,
139 |     ) -> None:
140 |         """Starts the ai2thor controller if it was previously stopped.
141 | 
142 |         After starting, `reset` will be called with the scene name and move magnitude.
143 | 
144 |         # Parameters
145 | 
146 |         scene_name : The scene to load.
147 |         move_mag : The amount of distance the agent moves in a single `MoveAhead` step.
148 |         kwargs : additional kwargs, passed to reset.
149 |         """
150 |         if self._started:
151 |             raise RuntimeError(
152 |                 "Trying to start the environment but it is already started."
153 | ) 154 | 155 | self.controller = self.create_controller() 156 | 157 | if ( 158 | self._start_player_screen_height, 159 | self._start_player_screen_width, 160 | ) != self.current_frame.shape[:2]: 161 | self.controller.step( 162 | { 163 | "action": "ChangeResolution", 164 | "x": self._start_player_screen_width, 165 | "y": self._start_player_screen_height, 166 | } 167 | ) 168 | 169 | self._started = True 170 | self.reset(scene_name=scene_name, move_mag=move_mag, **kwargs) 171 | 172 | def reset( 173 | self, 174 | scene_name: Optional[str], 175 | move_mag: float = 0.25, 176 | **kwargs, 177 | ): 178 | self._move_mag = move_mag 179 | self._grid_size = self._move_mag 180 | 181 | if scene_name is None: 182 | scene_name = self.controller.last_event.metadata["sceneName"] 183 | # self.reset_init_params()#**kwargs) removing this fixes one of the crashing problem 184 | 185 | # to solve the crash issue 186 | try: 187 | reset_environment_and_additional_commands(self.controller, scene_name) 188 | except Exception: 189 | print("RESETTING THE SCENE,", scene_name) 190 | self.controller = ai2thor.controller.Controller( 191 | **self.env_args, commit_id=MANIPULATHOR_COMMIT_ID 192 | ) 193 | reset_environment_and_additional_commands(self.controller, scene_name) 194 | 195 | if self.object_open_speed != 1.0: 196 | self.controller.step( 197 | {"action": "ChangeOpenSpeed", "x": self.object_open_speed} 198 | ) 199 | 200 | self._initially_reachable_points = None 201 | self._initially_reachable_points_set = None 202 | self.controller.step({"action": "GetReachablePositions"}) 203 | if not self.controller.last_event.metadata["lastActionSuccess"]: 204 | warnings.warn( 205 | "Error when getting reachable points: {}".format( 206 | self.controller.last_event.metadata["errorMessage"] 207 | ) 208 | ) 209 | self._initially_reachable_points = self.last_action_return 210 | 211 | self.list_of_actions_so_far = [] 212 | 213 | def randomize_agent_location( 214 | self, seed: int = None, partial_position: Optional[Dict[str, float]] = None 215 | ) -> Dict: 216 | raise Exception("not used") 217 | 218 | def is_object_at_low_level_hand(self, object_id): 219 | current_objects_in_hand = self.controller.last_event.metadata["arm"]["heldObjects"] 220 | return object_id in current_objects_in_hand 221 | 222 | def object_in_hand(self): 223 | """Object metadata for the object in the agent's hand.""" 224 | inv_objs = self.last_event.metadata["inventoryObjects"] 225 | if len(inv_objs) == 0: 226 | return None 227 | elif len(inv_objs) == 1: 228 | return self.get_object_by_id( 229 | self.last_event.metadata["inventoryObjects"][0]["objectId"] 230 | ) 231 | else: 232 | raise AttributeError("Must be <= 1 inventory objects.") 233 | 234 | def correct_nan_inf(self, flawed_dict, extra_tag=""): 235 | corrected_dict = copy.deepcopy(flawed_dict) 236 | anything_changed = 0 237 | for (k, v) in corrected_dict.items(): 238 | if v != v or math.isinf(v): 239 | corrected_dict[k] = 0 240 | anything_changed += 1 241 | return corrected_dict 242 | 243 | def get_object_by_id(self, object_id: str) -> Optional[Dict[str, Any]]: 244 | for o in self.last_event.metadata["objects"]: 245 | if o["objectId"] == object_id: 246 | o["position"] = self.correct_nan_inf(o["position"], "obj id") 247 | return o 248 | return None 249 | 250 | def get_current_arm_state(self): 251 | h_min = ARM_MIN_HEIGHT 252 | h_max = ARM_MAX_HEIGHT 253 | agent_base_location = 0.9009995460510254 254 | event = self.controller.last_event 255 | offset = event.metadata["agent"]["position"]["y"] - 
agent_base_location 256 | h_max += offset 257 | h_min += offset 258 | joints = event.metadata["arm"]["joints"] 259 | arm = joints[-1] 260 | assert arm["name"] == "robot_arm_4_jnt" 261 | xyz_dict = copy.deepcopy(arm["rootRelativePosition"]) 262 | height_arm = joints[0]["position"]["y"] 263 | xyz_dict["h"] = (height_arm - h_min) / (h_max - h_min) 264 | xyz_dict = self.correct_nan_inf(xyz_dict, "realtive hand") 265 | return xyz_dict 266 | 267 | def get_absolute_hand_state(self): 268 | event = self.controller.last_event 269 | joints = event.metadata["arm"]["joints"] 270 | arm = copy.deepcopy(joints[-1]) 271 | assert arm["name"] == "robot_arm_4_jnt" 272 | xyz_dict = arm["position"] 273 | xyz_dict = self.correct_nan_inf(xyz_dict, "absolute hand") 274 | return dict(position=xyz_dict, rotation={"x": 0, "y": 0, "z": 0}) 275 | 276 | def get_pickupable_objects(self): 277 | 278 | event = self.controller.last_event 279 | object_list = event.metadata["arm"]["pickupableObjects"] 280 | 281 | return object_list 282 | 283 | def get_current_object_locations(self): 284 | obj_loc_dict = {} 285 | metadata = self.controller.last_event.metadata["objects"] 286 | for o in metadata: 287 | obj_loc_dict[o["objectId"]] = dict( 288 | position=o["position"], rotation=o["rotation"] 289 | ) 290 | return copy.deepcopy(obj_loc_dict) 291 | 292 | def close_enough(self, current_obj_pose, init_obj_pose, threshold): 293 | position_close = [ 294 | abs(current_obj_pose["position"][k] - init_obj_pose["position"][k]) 295 | <= threshold 296 | for k in ["x", "y", "z"] 297 | ] 298 | position_is_close = sum(position_close) == 3 299 | rotation_close = [ 300 | abs(current_obj_pose["rotation"][k] - init_obj_pose["rotation"][k]) 301 | <= threshold 302 | for k in ["x", "y", "z"] 303 | ] 304 | rotation_is_close = sum(rotation_close) == 3 305 | return position_is_close and rotation_is_close 306 | 307 | def get_objects_moved(self, initial_object_locations): 308 | current_object_locations = self.get_current_object_locations() 309 | moved_objects = [] 310 | for object_id in current_object_locations.keys(): 311 | if not self.close_enough( 312 | current_object_locations[object_id], 313 | initial_object_locations[object_id], 314 | threshold=MOVE_THR, 315 | ): 316 | moved_objects.append(object_id) 317 | 318 | return moved_objects 319 | 320 | def step( 321 | self, action_dict: Dict[str, Union[str, int, float]] 322 | ) -> ai2thor.server.Event: 323 | """Take a step in the ai2thor environment.""" 324 | action = typing.cast(str, action_dict["action"]) 325 | 326 | skip_render = "renderImage" in action_dict and not action_dict["renderImage"] 327 | last_frame: Optional[np.ndarray] = None 328 | if skip_render: 329 | last_frame = self.current_frame 330 | 331 | if self.simplify_physics: 332 | action_dict["simplifyOPhysics"] = True 333 | if action in ["PickUpMidLevel", "DoneMidLevel"]: 334 | if action == "PickUpMidLevel": 335 | object_id = action_dict["object_id"] 336 | if not self.is_object_at_low_level_hand(object_id): 337 | pickupable_objects = self.get_pickupable_objects() 338 | # 339 | if object_id in pickupable_objects: 340 | # This version of the task is actually harder # consider making it easier, are we penalizing failed pickup? 
yes 341 | event = self.step(dict(action="PickupObject")) 342 | # we are doing an additional pass here, label is not right and if we fail we will do it twice 343 | object_inventory = self.controller.last_event.metadata["arm"][ 344 | "heldObjects" 345 | ] 346 | if ( 347 | len(object_inventory) > 0 348 | and object_id not in object_inventory 349 | ): 350 | event = self.step(dict(action="ReleaseObject")) 351 | action_dict = { 352 | 'action': 'Pass' 353 | } 354 | 355 | elif not "MoveArm" in action: 356 | if "Continuous" in action: 357 | copy_aditions = copy.deepcopy(ADITIONAL_ARM_ARGS) 358 | 359 | action_dict = {**action_dict, **copy_aditions} 360 | if action in ["MoveAheadContinuous"]: 361 | action_dict["action"] = "MoveAgent" 362 | action_dict["ahead"] = 0.2 363 | 364 | elif action in ["RotateRightContinuous"]: 365 | action_dict["action"] = "RotateAgent" 366 | action_dict["degrees"] = 45 367 | 368 | elif action in ["RotateLeftContinuous"]: 369 | action_dict["action"] = "RotateAgent" 370 | action_dict["degrees"] = -45 371 | 372 | elif "MoveArm" in action: 373 | copy_aditions = copy.deepcopy(ADITIONAL_ARM_ARGS) 374 | action_dict = {**action_dict, **copy_aditions} 375 | base_position = self.get_current_arm_state() 376 | if "MoveArmHeight" in action: 377 | action_dict["action"] = "MoveArmBase" 378 | 379 | if action == "MoveArmHeightP": 380 | base_position["h"] += MOVE_ARM_HEIGHT_CONSTANT 381 | if action == "MoveArmHeightM": 382 | base_position[ 383 | "h" 384 | ] -= MOVE_ARM_HEIGHT_CONSTANT # height is pretty big! 385 | action_dict["y"] = base_position["h"] 386 | else: 387 | action_dict["action"] = "MoveArm" 388 | if action == "MoveArmXP": 389 | base_position["x"] += MOVE_ARM_CONSTANT 390 | elif action == "MoveArmXM": 391 | base_position["x"] -= MOVE_ARM_CONSTANT 392 | elif action == "MoveArmYP": 393 | base_position["y"] += MOVE_ARM_CONSTANT 394 | elif action == "MoveArmYM": 395 | base_position["y"] -= MOVE_ARM_CONSTANT 396 | elif action == "MoveArmZP": 397 | base_position["z"] += MOVE_ARM_CONSTANT 398 | elif action == "MoveArmZM": 399 | base_position["z"] -= MOVE_ARM_CONSTANT 400 | action_dict["position"] = { 401 | k: v for (k, v) in base_position.items() if k in ["x", "y", "z"] 402 | } 403 | 404 | sr = self.controller.step(action_dict) 405 | self.list_of_actions_so_far.append(action_dict) 406 | 407 | if self._verbose: 408 | print(self.controller.last_event) 409 | 410 | if self.restrict_to_initially_reachable_points: 411 | self._snap_agent_to_initially_reachable() 412 | 413 | if skip_render: 414 | assert last_frame is not None 415 | self.last_event.frame = last_frame 416 | 417 | return sr 418 | -------------------------------------------------------------------------------- /ithor_arm/ithor_arm_sensors.py: -------------------------------------------------------------------------------- 1 | """Utility classes and functions for sensory inputs used by the models.""" 2 | from typing import Any, Union, Optional 3 | 4 | import gym 5 | import numpy as np 6 | from allenact.base_abstractions.sensor import DepthSensor, Sensor, RGBSensor 7 | from allenact.base_abstractions.task import Task 8 | from allenact.utils.misc_utils import prepare_locals_for_super 9 | from allenact_plugins.ithor_plugin.ithor_environment import IThorEnvironment 10 | 11 | from ithor_arm.arm_calculation_utils import ( 12 | convert_world_to_agent_coordinate, 13 | convert_state_to_tensor, 14 | diff_position, 15 | ) 16 | from ithor_arm.ithor_arm_environment import ManipulaTHOREnvironment 17 | 18 | 19 | class DepthSensorThor( 20 | 
DepthSensor[
21 |         Union[IThorEnvironment],
22 |         Union[Task[IThorEnvironment]],
23 |     ]
24 | ):
25 |     """Sensor for depth images in THOR.
26 | 
27 |     Returns, from a running IThorEnvironment instance, the current depth
28 |     frame corresponding to the agent's egocentric view.
29 |     """
30 | 
31 |     def frame_from_env(self, env: IThorEnvironment, task: Optional[Task]) -> np.ndarray:
32 |         return env.controller.last_event.depth_frame.copy()
33 | 
34 | 
35 | class NoVisionSensorThor(
36 |     RGBSensor[
37 |         Union[IThorEnvironment],
38 |         Union[Task[IThorEnvironment]],
39 |     ]
40 | ):
41 |     """Sensor for the no-vision baseline in THOR.
42 | 
43 |     Returns a frame with the same shape as the current RGB frame but with all
44 |     values zeroed out, so the agent receives no visual information.
45 |     """
46 | 
47 |     def frame_from_env(self, env: IThorEnvironment, task: Optional[Task]) -> np.ndarray:
48 |         result = env.current_frame.copy()
49 |         result.fill(0)
50 |         return result
51 | 
52 | 
53 | class AgentRelativeCurrentObjectStateThorSensor(Sensor):
54 |     def __init__(self, uuid: str = "relative_current_obj_state", **kwargs: Any):
55 |         observation_space = gym.spaces.Box(
56 |             low=-100, high=100, shape=(6,), dtype=np.float32
57 |         )  # (low=-1.0, high=2.0, shape=(3, 4), dtype=np.float32)
58 |         super().__init__(**prepare_locals_for_super(locals()))
59 | 
60 |     def get_observation(
61 |         self, env: IThorEnvironment, task: Task, *args: Any, **kwargs: Any
62 |     ) -> Any:
63 |         object_id = task.task_info["objectId"]
64 |         current_object_state = env.get_object_by_id(object_id)
65 |         relative_current_obj = convert_world_to_agent_coordinate(
66 |             current_object_state, env.controller.last_event.metadata["agent"]
67 |         )
68 |         result = convert_state_to_tensor(
69 |             dict(
70 |                 position=relative_current_obj["position"],
71 |                 rotation=relative_current_obj["rotation"],
72 |             )
73 |         )
74 |         return result
75 | 
76 | 
77 | class RelativeObjectToGoalSensor(Sensor):
78 |     def __init__(self, uuid: str = "relative_obj_to_goal", **kwargs: Any):
79 |         # observation_space = gym.spaces.Discrete(len(self.detector_types))
80 |         observation_space = gym.spaces.Box(
81 |             low=-100, high=100, shape=(3,), dtype=np.float32
82 |         )  # (low=-1.0, high=2.0, shape=(3, 4), dtype=np.float32)
83 |         super().__init__(**prepare_locals_for_super(locals()))
84 | 
85 |     def get_observation(
86 |         self, env: ManipulaTHOREnvironment, task: Task, *args: Any, **kwargs: Any
87 |     ) -> Any:
88 |         goal_obj_id = task.task_info["objectId"]
89 |         object_info = env.get_object_by_id(goal_obj_id)
90 |         target_state = task.task_info["target_location"]
91 | 
92 |         agent_state = env.controller.last_event.metadata["agent"]
93 | 
94 |         relative_current_obj = convert_world_to_agent_coordinate(
95 |             object_info, agent_state
96 |         )
97 |         relative_goal_state = convert_world_to_agent_coordinate(
98 |             target_state, agent_state
99 |         )
100 |         relative_distance = diff_position(relative_current_obj, relative_goal_state)
101 |         result = convert_state_to_tensor(dict(position=relative_distance))
102 |         return result
103 | 
104 | 
105 | class RelativeAgentArmToObjectSensor(Sensor):
106 |     def __init__(self, uuid: str = "relative_agent_arm_to_obj", **kwargs: Any):
107 |         observation_space = gym.spaces.Box(
108 |             low=-100, high=100, shape=(3,), dtype=np.float32
109 |         )  # (low=-1.0, high=2.0, shape=(3, 4), dtype=np.float32)
110 |         super().__init__(**prepare_locals_for_super(locals()))
111 | 
112 |     def get_observation(
113 |         self, env: ManipulaTHOREnvironment, task: Task, *args: Any, **kwargs: Any
114 |     ) -> Any:
115 |         goal_obj_id = task.task_info["objectId"]
116 |         object_info =
env.get_object_by_id(goal_obj_id) 117 | hand_state = env.get_absolute_hand_state() 118 | 119 | relative_goal_obj = convert_world_to_agent_coordinate( 120 | object_info, env.controller.last_event.metadata["agent"] 121 | ) 122 | relative_hand_state = convert_world_to_agent_coordinate( 123 | hand_state, env.controller.last_event.metadata["agent"] 124 | ) 125 | relative_distance = diff_position(relative_goal_obj, relative_hand_state) 126 | result = convert_state_to_tensor(dict(position=relative_distance)) 127 | 128 | return result 129 | 130 | 131 | class PickedUpObjSensor(Sensor): 132 | def __init__(self, uuid: str = "pickedup_object", **kwargs: Any): 133 | observation_space = gym.spaces.Box( 134 | low=0, high=1, shape=(1,), dtype=np.float32 135 | ) # (low=-1.0, high=2.0, shape=(3, 4), dtype=np.float32) 136 | super().__init__(**prepare_locals_for_super(locals())) 137 | 138 | def get_observation( 139 | self, env: ManipulaTHOREnvironment, task: Task, *args: Any, **kwargs: Any 140 | ) -> Any: 141 | return task.object_picked_up 142 | -------------------------------------------------------------------------------- /ithor_arm/ithor_arm_task_samplers.py: -------------------------------------------------------------------------------- 1 | """Task Samplers for the task of ArmPointNav""" 2 | import json 3 | import random 4 | from typing import List, Dict, Optional, Any, Union 5 | 6 | import gym 7 | from allenact.base_abstractions.sensor import Sensor 8 | from allenact.base_abstractions.task import Task 9 | from allenact.base_abstractions.task import TaskSampler 10 | from allenact.utils.experiment_utils import set_deterministic_cudnn, set_seed 11 | 12 | from ithor_arm.arm_calculation_utils import initialize_arm 13 | from ithor_arm.ithor_arm_constants import transport_wrapper 14 | from ithor_arm.ithor_arm_environment import ManipulaTHOREnvironment 15 | from ithor_arm.ithor_arm_tasks import ( 16 | AbstractPickUpDropOffTask, 17 | ArmPointNavTask, 18 | ) 19 | from ithor_arm.ithor_arm_viz import LoggerVisualizer, ImageVisualizer 20 | 21 | 22 | class AbstractMidLevelArmTaskSampler(TaskSampler): 23 | 24 | _TASK_TYPE = Task 25 | 26 | def __init__( 27 | self, 28 | scenes: List[str], 29 | sensors: List[Sensor], 30 | max_steps: int, 31 | env_args: Dict[str, Any], 32 | action_space: gym.Space, 33 | rewards_config: Dict, 34 | objects: List[str], 35 | scene_period: Optional[Union[int, str]] = None, 36 | max_tasks: Optional[int] = None, 37 | seed: Optional[int] = None, 38 | deterministic_cudnn: bool = False, 39 | fixed_tasks: Optional[List[Dict[str, Any]]] = None, 40 | visualizers: List[LoggerVisualizer] = [], 41 | *args, 42 | **kwargs 43 | ) -> None: 44 | self.rewards_config = rewards_config 45 | self.env_args = env_args 46 | self.scenes = scenes 47 | self.grid_size = 0.25 48 | self.env: Optional[ManipulaTHOREnvironment] = None 49 | self.sensors = sensors 50 | self.max_steps = max_steps 51 | self._action_space = action_space 52 | self.objects = objects 53 | 54 | self.scene_counter: Optional[int] = None 55 | self.scene_order: Optional[List[str]] = None 56 | self.scene_id: Optional[int] = None 57 | self.scene_period: Optional[ 58 | Union[str, int] 59 | ] = scene_period # default makes a random choice 60 | self.max_tasks: Optional[int] = None 61 | self.reset_tasks = max_tasks 62 | 63 | self._last_sampled_task: Optional[Task] = None 64 | 65 | self.seed: Optional[int] = None 66 | self.set_seed(seed) 67 | 68 | if deterministic_cudnn: 69 | set_deterministic_cudnn() 70 | 71 | self.reset() 72 | self.visualizers = visualizers 
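        # Note (editorial): `sampler_mode` ("train" or "test") switches the sampler
        # between random task sampling and the deterministic evaluation lists used
        # below; both it and `cap_training` must be supplied via **kwargs by the
        # experiment configuration.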
73 | self.sampler_mode = kwargs["sampler_mode"] 74 | self.cap_training = kwargs["cap_training"] 75 | 76 | def _create_environment(self, **kwargs) -> ManipulaTHOREnvironment: 77 | env = ManipulaTHOREnvironment( 78 | make_agents_visible=False, 79 | object_open_speed=0.05, 80 | env_args=self.env_args, 81 | ) 82 | 83 | return env 84 | 85 | @property 86 | def last_sampled_task(self) -> Optional[Task]: 87 | return self._last_sampled_task 88 | 89 | def close(self) -> None: 90 | if self.env is not None: 91 | self.env.stop() 92 | 93 | @property 94 | def all_observation_spaces_equal(self) -> bool: 95 | """Check if observation spaces equal. 96 | 97 | # Returns 98 | 99 | True if all Tasks that can be sampled by this sampler have the 100 | same observation space. Otherwise False. 101 | """ 102 | return True 103 | 104 | def reset(self): 105 | self.scene_counter = 0 106 | self.scene_order = list(range(len(self.scenes))) 107 | random.shuffle(self.scene_order) 108 | self.scene_id = 0 109 | self.sampler_index = 0 110 | 111 | self.max_tasks = self.reset_tasks 112 | 113 | def set_seed(self, seed: int): 114 | self.seed = seed 115 | if seed is not None: 116 | set_seed(seed) 117 | 118 | 119 | class SimpleArmPointNavGeneralSampler(AbstractMidLevelArmTaskSampler): 120 | 121 | _TASK_TYPE = AbstractPickUpDropOffTask 122 | 123 | def __init__(self, **kwargs) -> None: 124 | 125 | super(SimpleArmPointNavGeneralSampler, self).__init__(**kwargs) 126 | self.all_possible_points = [] 127 | for scene in self.scenes: 128 | for object in self.objects: 129 | valid_position_adr = "datasets/apnd-dataset/valid_object_positions/valid_{}_positions_in_{}.json".format( 130 | object, scene 131 | ) 132 | try: 133 | with open(valid_position_adr) as f: 134 | data_points = json.load(f) 135 | except Exception: 136 | print("Failed to load", valid_position_adr) 137 | continue 138 | visible_data = [ 139 | data for data in data_points[scene] if data["visibility"] 140 | ] 141 | self.all_possible_points += visible_data 142 | 143 | self.countertop_object_to_data_id = self.calc_possible_trajectories( 144 | self.all_possible_points 145 | ) 146 | 147 | scene_names = set( 148 | [ 149 | self.all_possible_points[counter[0]]["scene_name"] 150 | for counter in self.countertop_object_to_data_id.values() 151 | if len(counter) > 1 152 | ] 153 | ) 154 | 155 | if len(set(scene_names)) < len(self.scenes): 156 | print("Not all scenes appear") 157 | 158 | print( 159 | "Len dataset", 160 | len(self.all_possible_points), 161 | "total_remained", 162 | sum([len(v) for v in self.countertop_object_to_data_id.values()]), 163 | ) 164 | 165 | if ( 166 | self.sampler_mode != "train" 167 | ): # Be aware that this totally overrides some stuff 168 | self.deterministic_data_list = [] 169 | for scene in self.scenes: 170 | for object in self.objects: 171 | valid_position_adr = "datasets/apnd-dataset/deterministic_tasks/tasks_{}_positions_in_{}.json".format( 172 | object, scene 173 | ) 174 | try: 175 | with open(valid_position_adr) as f: 176 | data_points = json.load(f) 177 | except Exception: 178 | print("Failed to load", valid_position_adr) 179 | continue 180 | visible_data = [ 181 | dict(scene=scene, index=i, datapoint=data) 182 | for (i, data) in enumerate(data_points[scene]) 183 | ] 184 | self.deterministic_data_list += visible_data 185 | 186 | if self.sampler_mode == "test": 187 | random.shuffle(self.deterministic_data_list) 188 | self.max_tasks = self.reset_tasks = len(self.deterministic_data_list) 189 | 190 | def next_task( 191 | self, force_advance_scene: bool = False 192 
| ) -> Optional[AbstractPickUpDropOffTask]: 193 | if self.max_tasks is not None and self.max_tasks <= 0: 194 | return None 195 | 196 | if self.sampler_mode != "train" and self.length <= 0: 197 | return None 198 | 199 | source_data_point, target_data_point = self.get_source_target_indices() 200 | 201 | scene = source_data_point["scene_name"] 202 | 203 | assert source_data_point["object_id"] == target_data_point["object_id"] 204 | assert source_data_point["scene_name"] == target_data_point["scene_name"] 205 | 206 | if self.env is None: 207 | self.env = self._create_environment() 208 | 209 | self.env.reset( 210 | scene_name=scene, agentMode="arm", agentControllerType="mid-level" 211 | ) 212 | 213 | event1, event2, event3 = initialize_arm(self.env.controller) 214 | 215 | source_location = source_data_point 216 | target_location = dict( 217 | position=target_data_point["object_location"], 218 | rotation={"x": 0, "y": 0, "z": 0}, 219 | ) 220 | 221 | task_info = { 222 | "objectId": source_location["object_id"], 223 | "countertop_id": source_location["countertop_id"], 224 | "source_location": source_location, 225 | "target_location": target_location, 226 | } 227 | 228 | this_controller = self.env 229 | 230 | event = transport_wrapper( 231 | this_controller, 232 | source_location["object_id"], 233 | source_location["object_location"], 234 | ) 235 | agent_state = source_location["agent_pose"] 236 | 237 | event = this_controller.step( 238 | dict( 239 | action="TeleportFull", 240 | standing=True, 241 | x=agent_state["position"]["x"], 242 | y=agent_state["position"]["y"], 243 | z=agent_state["position"]["z"], 244 | rotation=dict( 245 | x=agent_state["rotation"]["x"], 246 | y=agent_state["rotation"]["y"], 247 | z=agent_state["rotation"]["z"], 248 | ), 249 | horizon=agent_state["cameraHorizon"], 250 | ) 251 | ) 252 | 253 | should_visualize_goal_start = [ 254 | x for x in self.visualizers if issubclass(type(x), ImageVisualizer) 255 | ] 256 | if len(should_visualize_goal_start) > 0: 257 | task_info["visualization_source"] = source_data_point 258 | task_info["visualization_target"] = target_data_point 259 | 260 | self._last_sampled_task = self._TASK_TYPE( 261 | env=self.env, 262 | sensors=self.sensors, 263 | task_info=task_info, 264 | max_steps=self.max_steps, 265 | action_space=self._action_space, 266 | visualizers=self.visualizers, 267 | reward_configs=self.rewards_config, 268 | ) 269 | 270 | return self._last_sampled_task 271 | 272 | @property 273 | def total_unique(self) -> Optional[Union[int, float]]: 274 | if self.sampler_mode == "train": 275 | return None 276 | else: 277 | return min(self.max_tasks, len(self.deterministic_data_list)) 278 | 279 | @property 280 | def length(self) -> Union[int, float]: 281 | """Length. 282 | 283 | # Returns 284 | 285 | Number of total tasks remaining that can be sampled. Can be float('inf'). 
286 | """ 287 | return ( 288 | self.total_unique - self.sampler_index 289 | if self.sampler_mode != "train" 290 | else (float("inf") if self.max_tasks is None else self.max_tasks) 291 | ) 292 | 293 | def get_source_target_indices(self): 294 | if self.sampler_mode == "train": 295 | valid_countertops = [ 296 | k for (k, v) in self.countertop_object_to_data_id.items() if len(v) > 1 297 | ] 298 | countertop_id = random.choice(valid_countertops) 299 | indices = random.sample(self.countertop_object_to_data_id[countertop_id], 2) 300 | result = ( 301 | self.all_possible_points[indices[0]], 302 | self.all_possible_points[indices[1]], 303 | ) 304 | else: 305 | result = self.deterministic_data_list[self.sampler_index]["datapoint"] 306 | # ForkedPdb().set_trace() 307 | self.sampler_index += 1 308 | 309 | return result 310 | 311 | def calc_possible_trajectories(self, all_possible_points): 312 | 313 | object_to_data_id = {} 314 | 315 | for i in range(len(all_possible_points)): 316 | object_id = all_possible_points[i]["object_id"] 317 | object_to_data_id.setdefault(object_id, []) 318 | object_to_data_id[object_id].append(i) 319 | 320 | return object_to_data_id 321 | 322 | 323 | class ArmPointNavTaskSampler(SimpleArmPointNavGeneralSampler): 324 | _TASK_TYPE = ArmPointNavTask 325 | 326 | def __init__(self, **kwargs) -> None: 327 | 328 | super(ArmPointNavTaskSampler, self).__init__(**kwargs) 329 | possible_initial_locations = ( 330 | "datasets/apnd-dataset/valid_agent_initial_locations.json" 331 | ) 332 | if self.sampler_mode == "test": 333 | possible_initial_locations = ( 334 | "datasets/apnd-dataset/deterministic_valid_agent_initial_locations.json" 335 | ) 336 | with open(possible_initial_locations) as f: 337 | self.possible_agent_reachable_poses = json.load(f) 338 | 339 | def next_task( 340 | self, force_advance_scene: bool = False 341 | ) -> Optional[AbstractPickUpDropOffTask]: 342 | if self.max_tasks is not None and self.max_tasks <= 0: 343 | return None 344 | 345 | if self.sampler_mode != "train" and self.length <= 0: 346 | return None 347 | 348 | source_data_point, target_data_point = self.get_source_target_indices() 349 | 350 | scene = source_data_point["scene_name"] 351 | 352 | assert source_data_point["object_id"] == target_data_point["object_id"] 353 | assert source_data_point["scene_name"] == target_data_point["scene_name"] 354 | 355 | if self.env is None: 356 | self.env = self._create_environment() 357 | 358 | self.env.reset( 359 | scene_name=scene, agentMode="arm", agentControllerType="mid-level" 360 | ) 361 | 362 | event1, event2, event3 = initialize_arm(self.env.controller) 363 | 364 | source_location = source_data_point 365 | target_location = dict( 366 | position=target_data_point["object_location"], 367 | rotation={"x": 0, "y": 0, "z": 0}, 368 | ) 369 | 370 | task_info = { 371 | "objectId": source_location["object_id"], 372 | "countertop_id": source_location["countertop_id"], 373 | "source_location": source_location, 374 | "target_location": target_location, 375 | } 376 | 377 | this_controller = self.env 378 | 379 | event = transport_wrapper( 380 | this_controller, 381 | source_location["object_id"], 382 | source_location["object_location"], 383 | ) 384 | 385 | agent_state = source_location[ 386 | "initial_agent_pose" 387 | ] # THe only line different from father 388 | 389 | event = this_controller.step( 390 | dict( 391 | action="TeleportFull", 392 | standing=True, 393 | x=agent_state["position"]["x"], 394 | y=agent_state["position"]["y"], 395 | z=agent_state["position"]["z"], 396 | 
rotation=dict( 397 | x=agent_state["rotation"]["x"], 398 | y=agent_state["rotation"]["y"], 399 | z=agent_state["rotation"]["z"], 400 | ), 401 | horizon=agent_state["cameraHorizon"], 402 | ) 403 | ) 404 | 405 | should_visualize_goal_start = [ 406 | x for x in self.visualizers if issubclass(type(x), ImageVisualizer) 407 | ] 408 | if len(should_visualize_goal_start) > 0: 409 | task_info["visualization_source"] = source_data_point 410 | task_info["visualization_target"] = target_data_point 411 | 412 | self._last_sampled_task = self._TASK_TYPE( 413 | env=self.env, 414 | sensors=self.sensors, 415 | task_info=task_info, 416 | max_steps=self.max_steps, 417 | action_space=self._action_space, 418 | visualizers=self.visualizers, 419 | reward_configs=self.rewards_config, 420 | ) 421 | 422 | return self._last_sampled_task 423 | 424 | def get_source_target_indices(self): 425 | if self.sampler_mode == "train": 426 | valid_countertops = [ 427 | k for (k, v) in self.countertop_object_to_data_id.items() if len(v) > 1 428 | ] 429 | countertop_id = random.choice(valid_countertops) 430 | indices = random.sample(self.countertop_object_to_data_id[countertop_id], 2) 431 | result = ( 432 | self.all_possible_points[indices[0]], 433 | self.all_possible_points[indices[1]], 434 | ) 435 | scene_name = result[0]["scene_name"] 436 | selected_agent_init_loc = random.choice( 437 | self.possible_agent_reachable_poses[scene_name] 438 | ) 439 | initial_agent_pose = { 440 | "name": "agent", 441 | "position": { 442 | "x": selected_agent_init_loc["x"], 443 | "y": selected_agent_init_loc["y"], 444 | "z": selected_agent_init_loc["z"], 445 | }, 446 | "rotation": { 447 | "x": -0.0, 448 | "y": selected_agent_init_loc["rotation"], 449 | "z": 0.0, 450 | }, 451 | "cameraHorizon": selected_agent_init_loc["horizon"], 452 | "isStanding": True, 453 | } 454 | result[0]["initial_agent_pose"] = initial_agent_pose 455 | else: # we need to fix this for test set, agent init location needs to be fixed, therefore we load a fixed valid agent init that is previously randomized 456 | result = self.deterministic_data_list[self.sampler_index]["datapoint"] 457 | scene_name = self.deterministic_data_list[self.sampler_index]["scene"] 458 | datapoint_original_index = self.deterministic_data_list[self.sampler_index][ 459 | "index" 460 | ] 461 | selected_agent_init_loc = self.possible_agent_reachable_poses[scene_name][ 462 | datapoint_original_index 463 | ] 464 | initial_agent_pose = { 465 | "name": "agent", 466 | "position": { 467 | "x": selected_agent_init_loc["x"], 468 | "y": selected_agent_init_loc["y"], 469 | "z": selected_agent_init_loc["z"], 470 | }, 471 | "rotation": { 472 | "x": -0.0, 473 | "y": selected_agent_init_loc["rotation"], 474 | "z": 0.0, 475 | }, 476 | "cameraHorizon": selected_agent_init_loc["horizon"], 477 | "isStanding": True, 478 | } 479 | result[0]["initial_agent_pose"] = initial_agent_pose 480 | self.sampler_index += 1 481 | 482 | return result 483 | 484 | 485 | def get_all_tuples_from_list(list): 486 | result = [] 487 | for first_ind in range(len(list) - 1): 488 | for second_ind in range(first_ind + 1, len(list)): 489 | result.append([list[first_ind], list[second_ind]]) 490 | return result 491 | -------------------------------------------------------------------------------- /ithor_arm/ithor_arm_tasks.py: -------------------------------------------------------------------------------- 1 | """Task Definions for the task of ArmPointNav""" 2 | 3 | from typing import Dict, Tuple, List, Any, Optional 4 | 5 | import gym 6 | import numpy as 
np 7 | from allenact.base_abstractions.misc import RLStepResult 8 | from allenact.base_abstractions.sensor import Sensor 9 | from allenact.base_abstractions.task import Task 10 | 11 | from ithor_arm.ithor_arm_constants import ( 12 | MOVE_ARM_CONSTANT, 13 | MOVE_ARM_HEIGHT_P, 14 | MOVE_ARM_HEIGHT_M, 15 | MOVE_ARM_X_P, 16 | MOVE_ARM_X_M, 17 | MOVE_ARM_Y_P, 18 | MOVE_ARM_Y_M, 19 | MOVE_ARM_Z_P, 20 | MOVE_ARM_Z_M, 21 | MOVE_AHEAD, 22 | ROTATE_RIGHT, 23 | ROTATE_LEFT, 24 | PICKUP, 25 | DONE, 26 | ) 27 | from ithor_arm.ithor_arm_environment import ManipulaTHOREnvironment 28 | from ithor_arm.ithor_arm_viz import LoggerVisualizer 29 | 30 | 31 | def position_distance(s1, s2): 32 | position1 = s1["position"] 33 | position2 = s2["position"] 34 | return ( 35 | (position1["x"] - position2["x"]) ** 2 36 | + (position1["y"] - position2["y"]) ** 2 37 | + (position1["z"] - position2["z"]) ** 2 38 | ) ** 0.5 39 | 40 | 41 | class AbstractPickUpDropOffTask(Task[ManipulaTHOREnvironment]): 42 | 43 | _actions = ( 44 | MOVE_ARM_HEIGHT_P, 45 | MOVE_ARM_HEIGHT_M, 46 | MOVE_ARM_X_P, 47 | MOVE_ARM_X_M, 48 | MOVE_ARM_Y_P, 49 | MOVE_ARM_Y_M, 50 | MOVE_ARM_Z_P, 51 | MOVE_ARM_Z_M, 52 | MOVE_AHEAD, 53 | ROTATE_RIGHT, 54 | ROTATE_LEFT, 55 | ) 56 | 57 | def __init__( 58 | self, 59 | env: ManipulaTHOREnvironment, 60 | sensors: List[Sensor], 61 | task_info: Dict[str, Any], 62 | max_steps: int, 63 | visualizers: List[LoggerVisualizer] = [], 64 | **kwargs 65 | ) -> None: 66 | """Initializer. 67 | 68 | See class documentation for parameter definitions. 69 | """ 70 | super().__init__( 71 | env=env, sensors=sensors, task_info=task_info, max_steps=max_steps, **kwargs 72 | ) 73 | self._took_end_action: bool = False 74 | self._success: Optional[bool] = False 75 | self._subsampled_locations_from_which_obj_visible: Optional[ 76 | List[Tuple[float, float, int, int]] 77 | ] = None 78 | self.visualizers = visualizers 79 | self.start_visualize() 80 | self.action_sequence_and_success = [] 86 | 87 | # in allenact, the corresponding initialization uses 0.2 88 | self.last_obj_to_goal_distance = None 89 | self.last_arm_to_obj_distance = None 90 | self.object_picked_up = False 91 | self.got_reward_for_pickup = False 92 | self.reward_configs = kwargs["reward_configs"] 93 | self.initial_object_metadata = self.env.get_current_object_locations() 94 | 95 | @property 96 | def action_space(self): 97 | return gym.spaces.Discrete(len(self._actions)) 98 | 99 | def reached_terminal_state(self) -> bool: 100 | return self._took_end_action 101 | 102 | @classmethod 103 | def class_action_names(cls, **kwargs) -> Tuple[str, ...]: 104 | return cls._actions 105 | 106 | def close(self) -> None: 107 | self.env.stop() 108 | 109 | def obj_state_aproximity(self, s1, s2): 110 | # NOTE: rotation is ignored for now; only positional proximity is checked 111 | position1 = s1["position"] 112 | position2 = s2["position"] 113 | eps = MOVE_ARM_CONSTANT * 2 114 | return ( 115 | abs(position1["x"] - position2["x"]) < eps 116 | and abs(position1["y"] - position2["y"]) < eps 117 | and abs(position1["z"] - position2["z"]) < eps 118 | ) 119 | 120 | def start_visualize(self): 121 | for visualizer in self.visualizers: 122 | if not visualizer.is_empty(): 123 | print("Warning: visualizer was not empty at episode start; flushing the previous episode") 124 | visualizer.finish_episode(self.env, self, self.task_info) 125 | visualizer.finish_episode_metrics(self, self.task_info, None) 126 | 
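# log the first frame with an empty action string so each episode's visualization starts from the pre-action observation 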
visualizer.log(self.env, "") 127 | 128 | def visualize(self, action_str): 129 | 130 | for visualizer in self.visualizers: 131 | visualizer.log(self.env, action_str) 132 | 133 | def finish_visualizer(self, episode_success): 134 | 135 | for visualizer in self.visualizers: 136 | visualizer.finish_episode(self.env, self, self.task_info) 137 | 138 | def finish_visualizer_metrics(self, metric_results): 139 | 140 | for visualizer in self.visualizers: 141 | visualizer.finish_episode_metrics(self, self.task_info, metric_results) 142 | 143 | def render(self, mode: str = "rgb", *args, **kwargs) -> np.ndarray: 144 | assert mode == "rgb", "only rgb rendering is implemented" 145 | return self.env.current_frame 146 | 147 | def calc_action_stat_metrics(self) -> Dict[str, Any]: 148 | action_stat = { 149 | "metric/action_stat/" + action_str: 0.0 for action_str in self._actions 150 | } 151 | action_success_stat = { 152 | "metric/action_success/" + action_str: 0.0 for action_str in self._actions 153 | } 154 | action_success_stat["metric/action_success/total"] = 0.0 155 | 156 | seq_len = len(self.action_sequence_and_success) 157 | for (action_name, action_success) in self.action_sequence_and_success: 158 | action_stat["metric/action_stat/" + action_name] += 1.0 159 | action_success_stat[ 160 | "metric/action_success/{}".format(action_name) 161 | ] += action_success 162 | action_success_stat["metric/action_success/total"] += action_success 163 | 164 | action_success_stat["metric/action_success/total"] /= seq_len 165 | 166 | for action_name in self._actions: 167 | action_success_stat[ 168 | "metric/" + "action_success/{}".format(action_name) 169 | ] /= (action_stat["metric/action_stat/" + action_name] + 0.000001) 170 | action_stat["metric/action_stat/" + action_name] /= seq_len 171 | 174 | result = {**action_stat, **action_success_stat} 175 | 176 | return result 177 | 178 | def metrics(self) -> Dict[str, Any]: 179 | result = super(AbstractPickUpDropOffTask, self).metrics() 180 | if self.is_done(): 181 | result = {**result, **self.calc_action_stat_metrics()} 182 | final_obj_distance_from_goal = self.obj_distance_from_goal() 183 | result[ 184 | "metric/average/final_obj_distance_from_goal" 185 | ] = final_obj_distance_from_goal 186 | final_arm_distance_from_obj = self.arm_distance_from_obj() 187 | result[ 188 | "metric/average/final_arm_distance_from_obj" 189 | ] = final_arm_distance_from_obj 190 | final_obj_pickup = 1 if self.object_picked_up else 0 191 | result["metric/average/final_obj_pickup"] = final_obj_pickup 192 | 193 | original_distance = self.get_original_object_distance() 194 | result["metric/average/original_distance"] = original_distance 195 | 196 | # note: this ratio can exceed 1 if the object ends up farther from the goal than it started 
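# e.g. (illustrative numbers): original_distance = 2.0 and final_obj_distance_from_goal = 0.5 give ratio_distance_left = 0.25; lower is better 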
197 | if self.object_picked_up: 198 | ratio_distance_left = final_obj_distance_from_goal / original_distance 199 | result["metric/average/ratio_distance_left"] = ratio_distance_left 200 | result["metric/average/eplen_pickup"] = self.eplen_pickup 201 | 202 | if self._success: 203 | result["metric/average/eplen_success"] = result["ep_length"] 204 | # re-enabled: this computation was not the cause of the slowdown 205 | objects_moved = self.env.get_objects_moved(self.initial_object_metadata) 206 | # note: objects_moved.remove(self.task_info['object_id']) is unnecessary; the target object has certainly moved, and the "- 1" below accounts for it 207 | result["metric/average/number_of_unwanted_moved_objects"] = ( 208 | len(objects_moved) - 1 209 | ) 210 | result["metric/average/success_wo_disturb"] = ( 211 | len(objects_moved) == 1 212 | ) # multiply this by the success rate to obtain the Success Without Disturbance metric 213 | 214 | result["success"] = self._success 215 | 216 | self.finish_visualizer_metrics(result) 217 | self.finish_visualizer(self._success) 218 | self.action_sequence_and_success = [] 219 | 220 | return result 221 | 222 | def _step(self, action: int) -> RLStepResult: 223 | raise NotImplementedError() 224 | 225 | def arm_distance_from_obj(self): 226 | goal_obj_id = self.task_info["objectId"] 227 | object_info = self.env.get_object_by_id(goal_obj_id) 228 | hand_state = self.env.get_absolute_hand_state() 229 | return position_distance(object_info, hand_state) 230 | 231 | def obj_distance_from_goal(self): 232 | goal_obj_id = self.task_info["objectId"] 233 | object_info = self.env.get_object_by_id(goal_obj_id) 234 | goal_state = self.task_info["target_location"] 235 | return position_distance(object_info, goal_state) 236 | 237 | def get_original_object_distance(self): 238 | goal_obj_id = self.task_info["objectId"] 239 | s_init = dict(position=self.task_info["source_location"]["object_location"]) 240 | current_location = self.env.get_object_by_id(goal_obj_id) 241 | 242 | original_object_distance = position_distance(s_init, current_location) 243 | return original_object_distance 244 | 245 | def judge(self) -> float: 246 | """Compute the reward after having taken a step.""" 247 | raise NotImplementedError() 248 | 249 | 250 | class ArmPointNavTask(AbstractPickUpDropOffTask): 251 | _actions = ( 252 | MOVE_ARM_HEIGHT_P, 253 | MOVE_ARM_HEIGHT_M, 254 | MOVE_ARM_X_P, 255 | MOVE_ARM_X_M, 256 | MOVE_ARM_Y_P, 257 | MOVE_ARM_Y_M, 258 | MOVE_ARM_Z_P, 259 | MOVE_ARM_Z_M, 260 | MOVE_AHEAD, 261 | ROTATE_RIGHT, 262 | ROTATE_LEFT, 263 | PICKUP, 264 | DONE, 265 | ) 266 | 267 | def _step(self, action: int) -> RLStepResult: 268 | 269 | action_str = self.class_action_names()[action] 270 | 271 | self._last_action_str = action_str 272 | action_dict = {"action": action_str} 273 | object_id = self.task_info["objectId"] 274 | if action_str == PICKUP: 275 | action_dict = {**action_dict, "object_id": object_id} 276 | self.env.step(action_dict) 277 | self.last_action_success = self.env.last_action_success 278 | 279 | last_action_name = self._last_action_str 280 | last_action_success = float(self.last_action_success) 281 | self.action_sequence_and_success.append((last_action_name, last_action_success)) 282 | self.visualize(last_action_name) 283 | 284 | # If the object was just picked up for the first time, record when it happened so the reward can account for the pickup 285 | if not self.object_picked_up: 286 | 287 | if self.env.is_object_at_low_level_hand(object_id): 288 | self.object_picked_up = True 289 | self.eplen_pickup = ( 290 | self._num_steps_taken + 1 291 | ) # plus one because this step has not been counted yet 292 | 293 | if action_str == DONE: 294 | self._took_end_action = True 295 | object_state = self.env.get_object_by_id(object_id) 296 | goal_state = self.task_info["target_location"] 297 | goal_achieved = self.object_picked_up and self.obj_state_aproximity( 298 | object_state, goal_state 299 | ) 300 | self.last_action_success = goal_achieved 301 | self._success = goal_achieved 302 | 303 | step_result = RLStepResult( 304 | observation=self.get_observations(), 305 | reward=self.judge(), 306 | done=self.is_done(), 307 | info={"last_action_success": self.last_action_success}, 308 | ) 309 | return step_result 310 | 311 | def judge(self) -> float: 312 | """Compute the reward after having taken a step.""" 313 | reward = self.reward_configs["step_penalty"] 314 | 315 | if not self.last_action_success or ( 316 | self._last_action_str == PICKUP and not self.object_picked_up 317 | ): 318 | reward += self.reward_configs["failed_action_penalty"] 319 | 320 | if self._took_end_action: 321 | reward += ( 322 | self.reward_configs["goal_success_reward"] 323 | if self._success 324 | else self.reward_configs["failed_stop_reward"] 325 | ) 326 | 327 | # one-time bonus for the first successful pickup 328 | if not self.got_reward_for_pickup and self.object_picked_up: 329 | reward += self.reward_configs["pickup_success_reward"] 330 | self.got_reward_for_pickup = True 331 | 332 | current_obj_to_arm_distance = self.arm_distance_from_obj() 333 | if self.last_arm_to_obj_distance is None: 334 | delta_arm_to_obj_distance_reward = 0 335 | else: 336 | delta_arm_to_obj_distance_reward = ( 337 | self.last_arm_to_obj_distance - current_obj_to_arm_distance 338 | ) 339 | self.last_arm_to_obj_distance = current_obj_to_arm_distance 340 | reward += delta_arm_to_obj_distance_reward 341 | 342 | current_obj_to_goal_distance = self.obj_distance_from_goal() 343 | if self.last_obj_to_goal_distance is None: 344 | delta_obj_to_goal_distance_reward = 0 345 | else: 346 | delta_obj_to_goal_distance_reward = ( 347 | self.last_obj_to_goal_distance - current_obj_to_goal_distance 348 | ) 349 | self.last_obj_to_goal_distance = current_obj_to_goal_distance 350 | reward += delta_obj_to_goal_distance_reward 351 | 352 | # TODO: possible extensions: a collision cost, an explicit distance-to-goal objective, ... 
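# Summary of the shaped reward computed above: # reward = step_penalty # + failed_action_penalty (failed action, or a PICKUP that did not attach the object) # + goal_success_reward or failed_stop_reward (only when DONE is taken) # + pickup_success_reward (once, on the first successful pickup) # + decrease in arm-to-object distance # + decrease in object-to-goal distance 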
353 | 354 | return float(reward) 355 | -------------------------------------------------------------------------------- /ithor_arm/ithor_arm_viz.py: -------------------------------------------------------------------------------- 1 | """Utility functions and classes for visualization and logging""" 2 | import os 3 | from datetime import datetime 4 | 5 | import cv2 6 | import imageio 7 | import numpy as np 8 | 9 | from ithor_arm.arm_calculation_utils import initialize_arm 10 | from ithor_arm.ithor_arm_constants import ( 11 | reset_environment_and_additional_commands, 12 | transport_wrapper, 13 | ) 14 | from manipulathor_utils.debugger_util import ForkedPdb 15 | 16 | 17 | class LoggerVisualizer: 18 | def __init__(self, exp_name="", log_dir=""): 19 | if log_dir == "": 20 | log_dir = self.__class__.__name__ 21 | if exp_name == "": 22 | exp_name = "NoNameExp" 23 | now = datetime.now() 24 | self.exp_name = exp_name 25 | log_dir = os.path.join( 26 | "experiment_output/visualizations", 27 | exp_name, 28 | log_dir + "_" + now.strftime("%m_%d_%Y_%H_%M_%S_%f"), 29 | ) 30 | self.log_dir = log_dir 31 | os.makedirs(self.log_dir, exist_ok=True) 32 | self.log_queue = [] 33 | self.action_queue = [] 34 | self.logger_index = 0 35 | 36 | def log(self, environment, action_str): 37 | raise NotImplementedError() 38 | 39 | def is_empty(self): 40 | return len(self.log_queue) == 0 41 | 42 | def finish_episode_metrics(self, episode_info, task_info, metric_results): 43 | pass 44 | 45 | def finish_episode(self, environment, episode_info, task_info): 46 | pass 47 | 48 | 49 | class TestMetricLogger(LoggerVisualizer): 50 | def __init__(self, **kwargs): 51 | super().__init__(**kwargs) 52 | self.total_metric_dict = {} 53 | log_file_name = os.path.join( 54 | self.log_dir, "test_metric_{}.txt".format(self.exp_name) 55 | ) 56 | self.metric_log_file = open(log_file_name, "w") 57 | 58 | def average_dict(self): 59 | result = {} 60 | for (k, v) in self.total_metric_dict.items(): 61 | result[k] = sum(v) / len(v) 62 | return result 63 | 64 | def finish_episode_metrics(self, episode_info, task_info, metric_results=None): 65 | 66 | if metric_results is None: 67 | print("Episode had to reset; discarding queued logs") 68 | self.log_queue = [] 69 | self.action_queue = [] 70 | return 71 | 72 | for k in metric_results.keys(): 73 | if "metric" in k or k in ["ep_length", "reward", "success"]: 74 | self.total_metric_dict.setdefault(k, []) 75 | self.total_metric_dict[k].append(metric_results[k]) 76 | print( 77 | "total", 78 | len(self.total_metric_dict["success"]), 79 | "average test metric", 80 | self.average_dict(), 81 | ) 82 | 83 | # save the task info, the full action sequence, and the results 84 | log_dict = { 85 | "task_info_metrics": metric_results, 86 | "action_sequence": self.action_queue, 87 | "logger_number": self.logger_index, 88 | } 89 | self.logger_index += 1 90 | self.metric_log_file.write(str(log_dict)) 91 | self.metric_log_file.write("\n") 92 | print("Logging to", self.metric_log_file.name) 93 | 94 | self.log_queue = [] 95 | self.action_queue = [] 96 | 97 | def log(self, environment, action_str): 98 | # We can add the agent arm and state location if needed 99 | self.action_queue.append(action_str) 100 | self.log_queue.append(action_str) 101 | 102 | 103 | class ImageVisualizer(LoggerVisualizer): 104 | def finish_episode(self, environment, episode_info, task_info): 105 | now = datetime.now() 106 | time_to_write = now.strftime("%m_%d_%Y_%H_%M_%S_%f") 107 | time_to_write += "log_ind_{}".format(self.logger_index) 108 | self.logger_index += 1 109 | print("Logging", time_to_write, "len", len(self.log_queue)) 110 | object_id = task_info["objectId"] 111 | 112 | pickup_success = episode_info.object_picked_up 113 | episode_success = episode_info._success 114 | 115 | # Uncomment to also save each frame as an image: 116 | # for i, img in enumerate(self.log_queue): 117 | # image_dir = os.path.join(self.log_dir, time_to_write + '_seq{}.png'.format(str(i))) 118 | # cv2.imwrite(image_dir, img[:,:,[2,1,0]]) 119 | 120 | episode_success_offset = "succ" if episode_success else "fail" 121 | pickup_success_offset = "succ" if pickup_success else "fail" 122 | gif_name = ( 123 | time_to_write 124 | + "_obj_" 125 | + object_id.split("|")[0] 126 | + "_pickup_" 127 | + pickup_success_offset 128 | + "_episode_" 129 | + episode_success_offset 130 | + ".gif" 131 | ) 132 | concat_all_images = np.expand_dims(np.stack(self.log_queue, axis=0), axis=1) 133 | save_image_list_to_gif(concat_all_images, gif_name, self.log_dir) 134 | 135 | self.log_start_goal( 136 | environment, 137 | task_info["visualization_source"], 138 | tag="start", 139 | img_adr=os.path.join(self.log_dir, time_to_write), 140 | ) 141 | self.log_start_goal( 142 | environment, 143 | task_info["visualization_target"], 144 | tag="goal", 145 | img_adr=os.path.join(self.log_dir, time_to_write), 146 | ) 147 | 148 | self.log_queue = [] 149 | self.action_queue = [] 150 | 151 | def log(self, environment, action_str): 152 | image_tensor = environment.current_frame 153 | self.action_queue.append(action_str) 154 | self.log_queue.append(image_tensor) 155 | 156 | def log_start_goal(self, env, task_info, tag, img_adr): 157 | object_location = task_info["object_location"] 158 | object_id = task_info["object_id"] 159 | agent_state = task_info["agent_pose"] 160 | this_controller = env.controller 161 | scene = this_controller.last_event.metadata[ 162 | "sceneName" 163 | ] # maybe we should reset the env here instead 164 | reset_environment_and_additional_commands(this_controller, scene) 165 | # initializes the arm from high up as a shortcut; this block is very important, never remove it 166 | event1, event2, event3 = initialize_arm(this_controller) 167 | if not ( 168 | event1.metadata["lastActionSuccess"] 169 | and event2.metadata["lastActionSuccess"] 170 | and event3.metadata["lastActionSuccess"] 171 | ): 172 | print("ERROR: ARM MOVEMENT FAILED in logging! SHOULD NEVER HAPPEN") 173 | 174 | event = transport_wrapper(this_controller, object_id, object_location) 175 | if not event.metadata["lastActionSuccess"]: 176 | print("ERROR: could not transport the object while logging") 177 | 178 | event = this_controller.step( 179 | dict( 180 | action="TeleportFull", 181 | standing=True, 182 | x=agent_state["position"]["x"], 183 | y=agent_state["position"]["y"], 184 | z=agent_state["position"]["z"], 185 | rotation=dict( 186 | x=agent_state["rotation"]["x"], 187 | y=agent_state["rotation"]["y"], 188 | z=agent_state["rotation"]["z"], 189 | ), 190 | horizon=agent_state["cameraHorizon"], 191 | ) 192 | ) 193 | if not event.metadata["lastActionSuccess"]: 194 | print("ERROR: could not teleport the agent while logging") 195 | 196 | image_tensor = this_controller.last_event.frame 197 | image_dir = ( 198 | img_adr + "_obj_" + object_id.split("|")[0] + "_pickup_" + tag + ".png" 199 | ) 200 | cv2.imwrite(image_dir, image_tensor[:, :, [2, 1, 0]]) 201 | 202 | 203 | class ThirdViewVisualizer(LoggerVisualizer): 204 | def __init__(self): 205 | super(ThirdViewVisualizer, self).__init__() 206 | print("ThirdViewVisualizer does not work yet") 207 | ForkedPdb().set_trace() 208 | # self.init_camera = False 209 | 210 | def finish_episode(self, environment, episode_success, task_info): 211 | now = datetime.now() 212 | time_to_write = now.strftime("%m_%d_%Y_%H_%M_%S_%f") 213 | print("Logging", time_to_write, "len", len(self.log_queue)) 214 | 215 | for i, img in enumerate(self.log_queue): 216 | image_dir = os.path.join( 217 | self.log_dir, time_to_write + "_seq{}.png".format(str(i)) 218 | ) 219 | cv2.imwrite(image_dir, img[:, :, [2, 1, 0]]) 220 | 221 | success_offset = "succ" if episode_success else "fail" 222 | gif_name = time_to_write + "_" + success_offset + ".gif" 223 | concat_all_images = np.expand_dims(np.stack(self.log_queue, axis=0), axis=1) 224 | save_image_list_to_gif(concat_all_images, gif_name, self.log_dir) 225 | 226 | self.log_queue = [] 227 | self.action_queue = [] 228 | 229 | def log(self, environment, action_str): 230 | 231 | # if True or not self.init_camera: 232 | # # self.init_camera = True 233 | # 234 | # agent_state = environment.controller.last_event.metadata['agent'] 235 | # offset={'x':1, 'y':1, 'z':1} 236 | # rotation_offset = 0 237 | # # environment.controller.step('UpdateThirdPartyCamera', thirdPartyCameraId=0, rotation=dict(x=0, y=agent_state['rotation']['y']+rotation_offset, z=0), position=dict(x=agent_state['position']['x'] + offset['x'], y=1.0 + offset['y'], z=agent_state['position']['z'] + offset['z'])) 238 | # environment.controller.step('UpdateThirdPartyCamera', thirdPartyCameraId=0, rotation=dict(x=0, y=45, z=0), position=dict(x=-1, y=1.5, z=-1), fieldOfView=100) 239 | 240 | # the direction of this might not be ideal 241 | image_tensor = environment.controller.last_event.third_party_camera_frames[0] 242 | self.action_queue.append(action_str) 243 | self.log_queue.append(image_tensor) 244 | 245 | 246 | # def __save_thirdparty_camera(controller, address='/Users/kianae/Desktop/third_camera.png', offset={'x':0, 'y':0, 'z':0}, rotation_offset=0): 247 | # This is taking an additional step which messes up the sequence 248 | # agent_state = controller.last_event.metadata['agent'] 249 | # controller.step('UpdateThirdPartyCamera', thirdPartyCameraId=0, rotation=dict(x=0, y=agent_state['rotation']['y']+rotation_offset, z=0), position=dict(x=agent_state['position']['x'] + offset['x'], y=1.0 + offset['y'], z=agent_state['position']['z'] + offset['z'])) 250 | # frame = 
controller.last_event.third_party_camera_frames 251 | # assert len(frame) == 1 252 | # frame = frame[0] 253 | # file_adr = address 254 | # res = cv2.imwrite(file_adr, frame[:,:,[2,1,0]]) 255 | # return res 256 | 257 | 258 | def save_image_list_to_gif(image_list, gif_name, gif_dir): 259 | gif_adr = os.path.join(gif_dir, gif_name) 260 | 261 | seq_len, cols, w, h, c = image_list.shape 262 | 263 | pallet = np.zeros((seq_len, w, h * cols, c)) 264 | 265 | for col_ind in range(cols): 266 | pallet[:, :, col_ind * h : (col_ind + 1) * h, :] = image_list[:, col_ind] 267 | 268 | if not os.path.exists(gif_dir): 269 | os.makedirs(gif_dir) 270 | imageio.mimsave(gif_adr, pallet.astype(np.uint8), format="GIF", duration=1 / 5) 271 | print("Saved result in ", gif_adr) 272 | -------------------------------------------------------------------------------- /manipulathor_baselines/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/allenai/manipulathor/4562eb8c2f67ff67e5b9ba3930da84b6023a58a4/manipulathor_baselines/__init__.py -------------------------------------------------------------------------------- /manipulathor_baselines/armpointnav_baselines/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/allenai/manipulathor/4562eb8c2f67ff67e5b9ba3930da84b6023a58a4/manipulathor_baselines/armpointnav_baselines/__init__.py -------------------------------------------------------------------------------- /manipulathor_baselines/armpointnav_baselines/experiments/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/allenai/manipulathor/4562eb8c2f67ff67e5b9ba3930da84b6023a58a4/manipulathor_baselines/armpointnav_baselines/experiments/__init__.py -------------------------------------------------------------------------------- /manipulathor_baselines/armpointnav_baselines/experiments/armpointnav_base.py: -------------------------------------------------------------------------------- 1 | from abc import ABC 2 | from typing import Optional, Sequence, Union 3 | 4 | from allenact.base_abstractions.experiment_config import ExperimentConfig 5 | from allenact.base_abstractions.preprocessor import Preprocessor 6 | from allenact.base_abstractions.sensor import Sensor 7 | from allenact.utils.experiment_utils import Builder 8 | 9 | 10 | class ArmPointNavBaseConfig(ExperimentConfig, ABC): 11 | """The base object navigation configuration file.""" 12 | 13 | ADVANCE_SCENE_ROLLOUT_PERIOD: Optional[int] = None 14 | SENSORS: Optional[Sequence[Sensor]] = None 15 | 16 | STEP_SIZE = 0.25 17 | ROTATION_DEGREES = 45.0 18 | VISIBILITY_DISTANCE = 1.0 19 | STOCHASTIC = False 20 | 21 | CAMERA_WIDTH = 224 22 | CAMERA_HEIGHT = 224 23 | SCREEN_SIZE = 224 24 | MAX_STEPS = 200 25 | 26 | def __init__(self): 27 | self.REWARD_CONFIG = { 28 | "step_penalty": -0.01, 29 | "goal_success_reward": 10.0, 30 | "pickup_success_reward": 10.0, 31 | "failed_stop_reward": 0.0, 32 | "shaping_weight": 1.0, # we are not using this 33 | "failed_action_penalty": -0.03, 34 | } 35 | 36 | @classmethod 37 | def preprocessors(cls) -> Sequence[Union[Preprocessor, Builder[Preprocessor]]]: 38 | return tuple() 39 | -------------------------------------------------------------------------------- /manipulathor_baselines/armpointnav_baselines/experiments/armpointnav_mixin_ddppo.py: -------------------------------------------------------------------------------- 1 
| import torch.optim as optim 2 | from allenact.algorithms.onpolicy_sync.losses import PPO 3 | from allenact.algorithms.onpolicy_sync.losses.ppo import PPOConfig 4 | from allenact.utils.experiment_utils import ( 5 | Builder, 6 | PipelineStage, 7 | TrainingPipeline, 8 | LinearDecay, 9 | ) 10 | from torch.optim.lr_scheduler import LambdaLR 11 | 12 | from manipulathor_baselines.armpointnav_baselines.experiments.armpointnav_base import ( 13 | ArmPointNavBaseConfig, 14 | ) 15 | 16 | 17 | class ArmPointNavMixInPPOConfig(ArmPointNavBaseConfig): 18 | def training_pipeline(self, **kwargs): 19 | ppo_steps = int(300000000) 20 | lr = 3e-4 21 | num_mini_batch = 1 22 | update_repeats = 4 23 | num_steps = self.MAX_STEPS 24 | save_interval = 500000 # from 50k 25 | log_interval = 1000 26 | gamma = 0.99 27 | use_gae = True 28 | gae_lambda = 0.95 29 | max_grad_norm = 0.5 30 | return TrainingPipeline( 31 | save_interval=save_interval, 32 | metric_accumulate_interval=log_interval, 33 | optimizer_builder=Builder(optim.Adam, dict(lr=lr)), 34 | num_mini_batch=num_mini_batch, 35 | update_repeats=update_repeats, 36 | max_grad_norm=max_grad_norm, 37 | num_steps=num_steps, 38 | named_losses={"ppo_loss": PPO(**PPOConfig)}, 39 | gamma=gamma, 40 | use_gae=use_gae, 41 | gae_lambda=gae_lambda, 42 | advance_scene_rollout_period=self.ADVANCE_SCENE_ROLLOUT_PERIOD, 43 | pipeline_stages=[ 44 | PipelineStage(loss_names=["ppo_loss"], max_stage_steps=ppo_steps) 45 | ], 46 | lr_scheduler_builder=Builder( 47 | LambdaLR, {"lr_lambda": LinearDecay(steps=ppo_steps)} 48 | ), 49 | ) 50 | -------------------------------------------------------------------------------- /manipulathor_baselines/armpointnav_baselines/experiments/armpointnav_mixin_simplegru.py: -------------------------------------------------------------------------------- 1 | from typing import Sequence, Union 2 | 3 | import gym 4 | import torch.nn as nn 5 | from allenact.base_abstractions.preprocessor import Preprocessor 6 | from allenact.base_abstractions.sensor import RGBSensor, DepthSensor 7 | from allenact.utils.experiment_utils import Builder 8 | from allenact_plugins.ithor_plugin.ithor_sensors import GoalObjectTypeThorSensor 9 | 10 | from manipulathor_baselines.armpointnav_baselines.experiments.armpointnav_base import ( 11 | ArmPointNavBaseConfig, 12 | ) 13 | from manipulathor_baselines.armpointnav_baselines.models.arm_pointnav_models import ( 14 | ArmPointNavBaselineActorCritic, 15 | ) 16 | 17 | 18 | class ArmPointNavMixInSimpleGRUConfig(ArmPointNavBaseConfig): 19 | @classmethod 20 | def preprocessors(cls) -> Sequence[Union[Preprocessor, Builder[Preprocessor]]]: 21 | preprocessors = [] 22 | return preprocessors 23 | 24 | @classmethod 25 | def create_model(cls, **kwargs) -> nn.Module: 26 | has_rgb = any(isinstance(s, RGBSensor) for s in cls.SENSORS) 27 | has_depth = any(isinstance(s, DepthSensor) for s in cls.SENSORS) 28 | goal_sensor_uuid = next( 29 | (s.uuid for s in cls.SENSORS if isinstance(s, GoalObjectTypeThorSensor)), 30 | None, 31 | ) 32 | 33 | return ArmPointNavBaselineActorCritic( 34 | action_space=gym.spaces.Discrete( 35 | len(cls.TASK_SAMPLER._TASK_TYPE.class_action_names()) 36 | ), 37 | observation_space=kwargs["sensor_preprocessor_graph"].observation_spaces, 38 | hidden_size=512, 39 | ) 40 | -------------------------------------------------------------------------------- /manipulathor_baselines/armpointnav_baselines/experiments/armpointnav_thor_base.py: -------------------------------------------------------------------------------- 1 | import platform 
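# Shared base config for all iTHOR ArmPointNav experiments: it allocates worker # processes to GPUs per mode (train/valid/test) and builds the per-process # task-sampler arguments by splitting scenes across samplers. 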
2 | from abc import ABC 3 | from math import ceil 4 | from typing import Dict, Any, List, Optional, Sequence 5 | 6 | import gym 7 | import numpy as np 8 | import torch 9 | from allenact.base_abstractions.experiment_config import MachineParams 10 | from allenact.base_abstractions.preprocessor import SensorPreprocessorGraph 11 | from allenact.base_abstractions.sensor import SensorSuite, ExpertActionSensor 12 | from allenact.base_abstractions.task import TaskSampler 13 | from allenact.utils.experiment_utils import evenly_distribute_count_into_bins 14 | 15 | from ithor_arm.ithor_arm_constants import ENV_ARGS 16 | from ithor_arm.ithor_arm_task_samplers import ( 17 | SimpleArmPointNavGeneralSampler, 18 | ) 19 | from ithor_arm.ithor_arm_viz import ImageVisualizer, TestMetricLogger 20 | from manipulathor_baselines.armpointnav_baselines.experiments.armpointnav_base import ( 21 | ArmPointNavBaseConfig, 22 | ) 23 | 24 | 25 | class ArmPointNavThorBaseConfig(ArmPointNavBaseConfig, ABC): 26 | """The base config for all iTHOR PointNav experiments.""" 27 | 28 | TASK_SAMPLER = SimpleArmPointNavGeneralSampler 29 | VISUALIZE = False 30 | if platform.system() == "Darwin": 31 | VISUALIZE = True 32 | 33 | NUM_PROCESSES: Optional[int] = None 34 | TRAIN_GPU_IDS = list(range(torch.cuda.device_count())) 35 | SAMPLER_GPU_IDS = TRAIN_GPU_IDS 36 | VALID_GPU_IDS = [torch.cuda.device_count() - 1] 37 | TEST_GPU_IDS = [torch.cuda.device_count() - 1] 38 | 39 | TRAIN_DATASET_DIR: Optional[str] = None 40 | VAL_DATASET_DIR: Optional[str] = None 41 | 42 | CAP_TRAINING = None 43 | 44 | TRAIN_SCENES: str = None 45 | VAL_SCENES: str = None 46 | TEST_SCENES: str = None 47 | 48 | OBJECT_TYPES: Optional[Sequence[str]] = None 49 | VALID_SAMPLES_IN_SCENE = 1 50 | TEST_SAMPLES_IN_SCENE = 1 51 | 52 | NUMBER_OF_TEST_PROCESS = 10 53 | 54 | def __init__(self): 55 | super().__init__() 56 | 57 | assert ( 58 | self.CAMERA_WIDTH == 224 59 | and self.CAMERA_HEIGHT == 224 60 | and self.VISIBILITY_DISTANCE == 1 61 | and self.STEP_SIZE == 0.25 62 | ) 63 | self.ENV_ARGS = ENV_ARGS 64 | 65 | def machine_params(self, mode="train", **kwargs): 66 | sampler_devices: Sequence[int] = [] 67 | if mode == "train": 68 | workers_per_device = 1 69 | gpu_ids = ( 70 | [] 71 | if not torch.cuda.is_available() 72 | else self.TRAIN_GPU_IDS * workers_per_device 73 | ) 74 | nprocesses = ( 75 | 1 76 | if not torch.cuda.is_available() 77 | else evenly_distribute_count_into_bins(self.NUM_PROCESSES, len(gpu_ids)) 78 | ) 79 | sampler_devices = self.SAMPLER_GPU_IDS 80 | elif mode == "valid": 81 | nprocesses = 1 82 | gpu_ids = [] if not torch.cuda.is_available() else self.VALID_GPU_IDS 83 | elif mode == "test": 84 | nprocesses = self.NUMBER_OF_TEST_PROCESS if torch.cuda.is_available() else 1 85 | gpu_ids = [] if not torch.cuda.is_available() else self.TEST_GPU_IDS 86 | else: 87 | raise NotImplementedError("mode must be 'train', 'valid', or 'test'.") 88 | 89 | sensors = [*self.SENSORS] 90 | if mode != "train": 91 | sensors = [s for s in sensors if not isinstance(s, ExpertActionSensor)] 92 | 93 | sensor_preprocessor_graph = ( 94 | SensorPreprocessorGraph( 95 | source_observation_spaces=SensorSuite(sensors).observation_spaces, 96 | preprocessors=self.preprocessors(), 97 | ) 98 | if mode == "train" 99 | or ( 100 | (isinstance(nprocesses, int) and nprocesses > 0) 101 | or (isinstance(nprocesses, Sequence) and sum(nprocesses) > 0) 102 | ) 103 | else None 104 | ) 105 | 106 | return MachineParams( 107 | nprocesses=nprocesses, 108 | devices=gpu_ids, 109 | 
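# during training, task samplers may run on a dedicated set of GPUs; # for valid/test they share the worker devices (gpu_ids) 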
sampler_devices=sampler_devices 110 | if mode == "train" 111 | else gpu_ids, # ignored with > 1 gpu_ids 112 | sensor_preprocessor_graph=sensor_preprocessor_graph, 113 | ) 114 | 115 | @classmethod 116 | def make_sampler_fn(cls, **kwargs) -> TaskSampler: 117 | from datetime import datetime 118 | 119 | now = datetime.now() 120 | exp_name_w_time = cls.__name__ + "_" + now.strftime("%m_%d_%Y_%H_%M_%S_%f") 121 | if cls.VISUALIZE: 122 | visualizers = [ 123 | ImageVisualizer(exp_name=exp_name_w_time), 124 | TestMetricLogger(exp_name=exp_name_w_time), 125 | ] 126 | 127 | kwargs["visualizers"] = visualizers 128 | kwargs["objects"] = cls.OBJECT_TYPES 129 | kwargs["exp_name"] = exp_name_w_time 130 | return cls.TASK_SAMPLER(**kwargs) 131 | 132 | @staticmethod 133 | def _partition_inds(n: int, num_parts: int): 134 | return np.round(np.linspace(0, n, num_parts + 1, endpoint=True)).astype( 135 | np.int32 136 | ) 137 | 138 | def _get_sampler_args_for_scene_split( 139 | self, 140 | scenes: List[str], 141 | process_ind: int, 142 | total_processes: int, 143 | seeds: Optional[List[int]] = None, 144 | deterministic_cudnn: bool = False, 145 | ) -> Dict[str, Any]: 146 | if total_processes > len(scenes): # oversample some scenes -> bias 147 | if total_processes % len(scenes) != 0: 148 | print( 149 | "Warning: oversampling some of the scenes to feed all processes." 150 | " You can avoid this by setting a number of workers divisible by the number of scenes" 151 | ) 152 | scenes = scenes * int(ceil(total_processes / len(scenes))) 153 | scenes = scenes[: total_processes * (len(scenes) // total_processes)] 154 | else: 155 | if len(scenes) % total_processes != 0: 156 | print( 157 | "Warning: oversampling some of the scenes to feed all processes." 158 | " You can avoid this by setting a number of workers divisor of the number of scenes" 159 | ) 160 | inds = self._partition_inds(len(scenes), total_processes) 161 | 162 | return { 163 | "scenes": scenes[inds[process_ind] : inds[process_ind + 1]], 164 | "env_args": self.ENV_ARGS, 165 | "max_steps": self.MAX_STEPS, 166 | "sensors": self.SENSORS, 167 | "action_space": gym.spaces.Discrete( 168 | len(self.TASK_SAMPLER._TASK_TYPE.class_action_names()) 169 | ), 170 | "seed": seeds[process_ind] if seeds is not None else None, 171 | "deterministic_cudnn": deterministic_cudnn, 172 | "rewards_config": self.REWARD_CONFIG, 173 | } 174 | 175 | def train_task_sampler_args( 176 | self, 177 | process_ind: int, 178 | total_processes: int, 179 | devices: Optional[List[int]] = None, 180 | seeds: Optional[List[int]] = None, 181 | deterministic_cudnn: bool = False, 182 | ) -> Dict[str, Any]: 183 | res = self._get_sampler_args_for_scene_split( 184 | self.TRAIN_SCENES, 185 | process_ind, 186 | total_processes, 187 | seeds=seeds, 188 | deterministic_cudnn=deterministic_cudnn, 189 | ) 190 | res["scene_period"] = "manual" 191 | res["sampler_mode"] = "train" 192 | res["cap_training"] = self.CAP_TRAINING 193 | res["env_args"] = {} 194 | res["env_args"].update(self.ENV_ARGS) 195 | res["env_args"]["x_display"] = ( 196 | ("0.%d" % devices[process_ind % len(devices)]) if len(devices) > 0 else None 197 | ) 198 | return res 199 | 200 | def valid_task_sampler_args( 201 | self, 202 | process_ind: int, 203 | total_processes: int, 204 | devices: Optional[List[int]], 205 | seeds: Optional[List[int]] = None, 206 | deterministic_cudnn: bool = False, 207 | ) -> Dict[str, Any]: 208 | res = self._get_sampler_args_for_scene_split( 209 | self.VALID_SCENES, 210 | process_ind, 211 | total_processes, 212 | seeds=seeds, 213 
| deterministic_cudnn=deterministic_cudnn, 214 | ) 215 | res["scene_period"] = self.VALID_SAMPLES_IN_SCENE 216 | res["sampler_mode"] = "val" 217 | res["cap_training"] = self.CAP_TRAINING 218 | res["max_tasks"] = self.VALID_SAMPLES_IN_SCENE * len(res["scenes"]) 219 | res["env_args"] = {} 220 | res["env_args"].update(self.ENV_ARGS) 221 | res["env_args"]["x_display"] = ( 222 | ("0.%d" % devices[process_ind % len(devices)]) if len(devices) > 0 else None 223 | ) 224 | return res 225 | 226 | def test_task_sampler_args( 227 | self, 228 | process_ind: int, 229 | total_processes: int, 230 | devices: Optional[List[int]], 231 | seeds: Optional[List[int]] = None, 232 | deterministic_cudnn: bool = False, 233 | ) -> Dict[str, Any]: 234 | res = self._get_sampler_args_for_scene_split( 235 | self.TEST_SCENES, 236 | process_ind, 237 | total_processes, 238 | seeds=seeds, 239 | deterministic_cudnn=deterministic_cudnn, 240 | ) 241 | res["scene_period"] = self.TEST_SAMPLES_IN_SCENE 242 | res["sampler_mode"] = "test" 243 | res["env_args"] = {} 244 | res["cap_training"] = self.CAP_TRAINING 245 | res["env_args"].update(self.ENV_ARGS) 246 | res["env_args"]["x_display"] = ( 247 | ("0.%d" % devices[process_ind % len(devices)]) if len(devices) > 0 else None 248 | ) 249 | return res 250 | -------------------------------------------------------------------------------- /manipulathor_baselines/armpointnav_baselines/experiments/ithor/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/allenai/manipulathor/4562eb8c2f67ff67e5b9ba3930da84b6023a58a4/manipulathor_baselines/armpointnav_baselines/experiments/ithor/__init__.py -------------------------------------------------------------------------------- /manipulathor_baselines/armpointnav_baselines/experiments/ithor/armpointnav_depth.py: -------------------------------------------------------------------------------- 1 | from ithor_arm.ithor_arm_constants import ENV_ARGS 2 | from ithor_arm.ithor_arm_sensors import ( 3 | RelativeAgentArmToObjectSensor, 4 | RelativeObjectToGoalSensor, 5 | PickedUpObjSensor, 6 | DepthSensorThor, 7 | ) 8 | from ithor_arm.ithor_arm_task_samplers import ArmPointNavTaskSampler 9 | from manipulathor_baselines.armpointnav_baselines.experiments.armpointnav_mixin_ddppo import ( 10 | ArmPointNavMixInPPOConfig, 11 | ) 12 | from manipulathor_baselines.armpointnav_baselines.experiments.armpointnav_mixin_simplegru import ( 13 | ArmPointNavMixInSimpleGRUConfig, 14 | ) 15 | from manipulathor_baselines.armpointnav_baselines.experiments.ithor.armpointnav_ithor_base import ( 16 | ArmPointNaviThorBaseConfig, 17 | ) 18 | 19 | 20 | class ArmPointNavDepth( 21 | ArmPointNaviThorBaseConfig, 22 | ArmPointNavMixInPPOConfig, 23 | ArmPointNavMixInSimpleGRUConfig, 24 | ): 25 | """An Object Navigation experiment configuration in iThor with RGB 26 | input.""" 27 | 28 | SENSORS = [ 29 | DepthSensorThor( 30 | height=ArmPointNaviThorBaseConfig.SCREEN_SIZE, 31 | width=ArmPointNaviThorBaseConfig.SCREEN_SIZE, 32 | use_normalization=True, 33 | uuid="depth_lowres", 34 | ), 35 | RelativeAgentArmToObjectSensor(), 36 | RelativeObjectToGoalSensor(), 37 | PickedUpObjSensor(), 38 | ] 39 | 40 | MAX_STEPS = 200 41 | TASK_SAMPLER = ArmPointNavTaskSampler 42 | 43 | def __init__(self): 44 | super().__init__() 45 | 46 | assert ( 47 | self.CAMERA_WIDTH == 224 48 | and self.CAMERA_HEIGHT == 224 49 | and self.VISIBILITY_DISTANCE == 1 50 | and self.STEP_SIZE == 0.25 51 | ) 52 | self.ENV_ARGS = {**ENV_ARGS, "renderDepthImage": 
True} 53 | 54 | @classmethod 55 | def tag(cls): 56 | return cls.__name__ 57 | -------------------------------------------------------------------------------- /manipulathor_baselines/armpointnav_baselines/experiments/ithor/armpointnav_disjoint_depth.py: -------------------------------------------------------------------------------- 1 | import gym 2 | import torch.nn as nn 3 | 4 | from ithor_arm.ithor_arm_constants import ENV_ARGS 5 | from ithor_arm.ithor_arm_sensors import ( 6 | RelativeAgentArmToObjectSensor, 7 | RelativeObjectToGoalSensor, 8 | PickedUpObjSensor, 9 | DepthSensorThor, 10 | ) 11 | from ithor_arm.ithor_arm_task_samplers import ArmPointNavTaskSampler 12 | from manipulathor_baselines.armpointnav_baselines.experiments.armpointnav_mixin_ddppo import ( 13 | ArmPointNavMixInPPOConfig, 14 | ) 15 | from manipulathor_baselines.armpointnav_baselines.experiments.armpointnav_mixin_simplegru import ( 16 | ArmPointNavMixInSimpleGRUConfig, 17 | ) 18 | from manipulathor_baselines.armpointnav_baselines.experiments.ithor.armpointnav_ithor_base import ( 19 | ArmPointNaviThorBaseConfig, 20 | ) 21 | from manipulathor_baselines.armpointnav_baselines.models.disjoint_arm_pointnav_models import ( 22 | DisjointArmPointNavBaselineActorCritic, 23 | ) 24 | 25 | 26 | class ArmPointNavDisjointDepth( 27 | ArmPointNaviThorBaseConfig, 28 | ArmPointNavMixInPPOConfig, 29 | ArmPointNavMixInSimpleGRUConfig, 30 | ): 31 | """An Object Navigation experiment configuration in iThor with RGB 32 | input.""" 33 | 34 | SENSORS = [ 35 | DepthSensorThor( 36 | height=ArmPointNaviThorBaseConfig.SCREEN_SIZE, 37 | width=ArmPointNaviThorBaseConfig.SCREEN_SIZE, 38 | use_normalization=True, 39 | uuid="depth_lowres", 40 | ), 41 | RelativeAgentArmToObjectSensor(), 42 | RelativeObjectToGoalSensor(), 43 | PickedUpObjSensor(), 44 | ] 45 | 46 | MAX_STEPS = 200 47 | TASK_SAMPLER = ArmPointNavTaskSampler 48 | 49 | def __init__(self): 50 | super().__init__() 51 | 52 | assert ( 53 | self.CAMERA_WIDTH == 224 54 | and self.CAMERA_HEIGHT == 224 55 | and self.VISIBILITY_DISTANCE == 1 56 | and self.STEP_SIZE == 0.25 57 | ) 58 | self.ENV_ARGS = {**ENV_ARGS, "renderDepthImage": True} 59 | 60 | @classmethod 61 | def create_model(cls, **kwargs) -> nn.Module: 62 | return DisjointArmPointNavBaselineActorCritic( 63 | action_space=gym.spaces.Discrete( 64 | len(cls.TASK_SAMPLER._TASK_TYPE.class_action_names()) 65 | ), 66 | observation_space=kwargs["sensor_preprocessor_graph"].observation_spaces, 67 | hidden_size=512, 68 | ) 69 | 70 | @classmethod 71 | def tag(cls): 72 | return cls.__name__ 73 | -------------------------------------------------------------------------------- /manipulathor_baselines/armpointnav_baselines/experiments/ithor/armpointnav_ithor_base.py: -------------------------------------------------------------------------------- 1 | from abc import ABC 2 | 3 | from ithor_arm.ithor_arm_constants import TRAIN_OBJECTS, TEST_OBJECTS 4 | from manipulathor_baselines.armpointnav_baselines.experiments.armpointnav_thor_base import ( 5 | ArmPointNavThorBaseConfig, 6 | ) 7 | 8 | 9 | class ArmPointNaviThorBaseConfig(ArmPointNavThorBaseConfig, ABC): 10 | """The base config for all iTHOR ObjectNav experiments.""" 11 | 12 | NUM_PROCESSES = 40 13 | # add all the arguments here 14 | TOTAL_NUMBER_SCENES = 30 15 | 16 | TRAIN_SCENES = [ 17 | "FloorPlan{}_physics".format(str(i)) 18 | for i in range(1, TOTAL_NUMBER_SCENES + 1) 19 | if (i % 3 == 1 or i % 3 == 0) and i != 28 20 | ] # last scenes are really bad 21 | TEST_SCENES = [ 22 | 
"FloorPlan{}_physics".format(str(i)) 23 | for i in range(1, TOTAL_NUMBER_SCENES + 1) 24 | if i % 3 == 2 and i % 6 == 2 25 | ] 26 | VALID_SCENES = [ 27 | "FloorPlan{}_physics".format(str(i)) 28 | for i in range(1, TOTAL_NUMBER_SCENES + 1) 29 | if i % 3 == 2 and i % 6 == 5 30 | ] 31 | 32 | ALL_SCENES = TRAIN_SCENES + TEST_SCENES + VALID_SCENES 33 | 34 | assert ( 35 | len(ALL_SCENES) == TOTAL_NUMBER_SCENES - 1 36 | and len(set(ALL_SCENES)) == TOTAL_NUMBER_SCENES - 1 37 | ) 38 | 39 | OBJECT_TYPES = tuple(sorted(TRAIN_OBJECTS)) 40 | 41 | UNSEEN_OBJECT_TYPES = tuple(sorted(TEST_OBJECTS)) 42 | -------------------------------------------------------------------------------- /manipulathor_baselines/armpointnav_baselines/experiments/ithor/armpointnav_no_vision.py: -------------------------------------------------------------------------------- 1 | from ithor_arm.ithor_arm_constants import ENV_ARGS 2 | from ithor_arm.ithor_arm_sensors import ( 3 | RelativeAgentArmToObjectSensor, 4 | RelativeObjectToGoalSensor, 5 | PickedUpObjSensor, 6 | NoVisionSensorThor, 7 | ) 8 | from ithor_arm.ithor_arm_task_samplers import ArmPointNavTaskSampler 9 | from manipulathor_baselines.armpointnav_baselines.experiments.armpointnav_mixin_ddppo import ( 10 | ArmPointNavMixInPPOConfig, 11 | ) 12 | from manipulathor_baselines.armpointnav_baselines.experiments.armpointnav_mixin_simplegru import ( 13 | ArmPointNavMixInSimpleGRUConfig, 14 | ) 15 | from manipulathor_baselines.armpointnav_baselines.experiments.ithor.armpointnav_ithor_base import ( 16 | ArmPointNaviThorBaseConfig, 17 | ) 18 | 19 | 20 | class ArmPointNavNoVision( 21 | ArmPointNaviThorBaseConfig, 22 | ArmPointNavMixInPPOConfig, 23 | ArmPointNavMixInSimpleGRUConfig, 24 | ): 25 | """An Object Navigation experiment configuration in iThor with RGB 26 | input.""" 27 | 28 | SENSORS = [ 29 | NoVisionSensorThor( 30 | height=ArmPointNaviThorBaseConfig.SCREEN_SIZE, 31 | width=ArmPointNaviThorBaseConfig.SCREEN_SIZE, 32 | use_resnet_normalization=False, 33 | uuid="rgb_lowres", 34 | ), 35 | RelativeAgentArmToObjectSensor(), 36 | RelativeObjectToGoalSensor(), 37 | PickedUpObjSensor(), 38 | ] 39 | 40 | MAX_STEPS = 200 41 | TASK_SAMPLER = ArmPointNavTaskSampler # 42 | 43 | def __init__(self): 44 | super().__init__() 45 | 46 | assert ( 47 | self.CAMERA_WIDTH == 224 48 | and self.CAMERA_HEIGHT == 224 49 | and self.VISIBILITY_DISTANCE == 1 50 | and self.STEP_SIZE == 0.25 51 | ) 52 | self.ENV_ARGS = {**ENV_ARGS, "renderDepthImage": False} 53 | 54 | @classmethod 55 | def tag(cls): 56 | return cls.__name__ 57 | -------------------------------------------------------------------------------- /manipulathor_baselines/armpointnav_baselines/experiments/ithor/armpointnav_rgb.py: -------------------------------------------------------------------------------- 1 | from allenact_plugins.ithor_plugin.ithor_sensors import RGBSensorThor 2 | 3 | from ithor_arm.ithor_arm_constants import ENV_ARGS 4 | from ithor_arm.ithor_arm_sensors import ( 5 | RelativeAgentArmToObjectSensor, 6 | RelativeObjectToGoalSensor, 7 | PickedUpObjSensor, 8 | ) 9 | from ithor_arm.ithor_arm_task_samplers import ArmPointNavTaskSampler 10 | from manipulathor_baselines.armpointnav_baselines.experiments.armpointnav_mixin_ddppo import ( 11 | ArmPointNavMixInPPOConfig, 12 | ) 13 | from manipulathor_baselines.armpointnav_baselines.experiments.armpointnav_mixin_simplegru import ( 14 | ArmPointNavMixInSimpleGRUConfig, 15 | ) 16 | from manipulathor_baselines.armpointnav_baselines.experiments.ithor.armpointnav_ithor_base import ( 17 | 
ArmPointNaviThorBaseConfig, 18 | ) 19 | 20 | 21 | class ArmPointNavRGB( 22 | ArmPointNaviThorBaseConfig, 23 | ArmPointNavMixInPPOConfig, 24 | ArmPointNavMixInSimpleGRUConfig, 25 | ): 26 | """An Object Navigation experiment configuration in iThor with RGB 27 | input.""" 28 | 29 | SENSORS = [ 30 | RGBSensorThor( 31 | height=ArmPointNaviThorBaseConfig.SCREEN_SIZE, 32 | width=ArmPointNaviThorBaseConfig.SCREEN_SIZE, 33 | use_resnet_normalization=True, 34 | uuid="rgb_lowres", 35 | ), 36 | RelativeAgentArmToObjectSensor(), 37 | RelativeObjectToGoalSensor(), 38 | PickedUpObjSensor(), 39 | ] 40 | 41 | MAX_STEPS = 200 42 | TASK_SAMPLER = ArmPointNavTaskSampler # 43 | 44 | def __init__(self): 45 | super().__init__() 46 | 47 | assert ( 48 | self.CAMERA_WIDTH == 224 49 | and self.CAMERA_HEIGHT == 224 50 | and self.VISIBILITY_DISTANCE == 1 51 | and self.STEP_SIZE == 0.25 52 | ) 53 | self.ENV_ARGS = {**ENV_ARGS} 54 | 55 | @classmethod 56 | def tag(cls): 57 | return cls.__name__ 58 | -------------------------------------------------------------------------------- /manipulathor_baselines/armpointnav_baselines/experiments/ithor/armpointnav_rgbdepth.py: -------------------------------------------------------------------------------- 1 | from allenact_plugins.ithor_plugin.ithor_sensors import RGBSensorThor 2 | 3 | from ithor_arm.ithor_arm_constants import ENV_ARGS 4 | from ithor_arm.ithor_arm_sensors import ( 5 | RelativeAgentArmToObjectSensor, 6 | RelativeObjectToGoalSensor, 7 | PickedUpObjSensor, 8 | DepthSensorThor, 9 | ) 10 | from ithor_arm.ithor_arm_task_samplers import ArmPointNavTaskSampler 11 | from manipulathor_baselines.armpointnav_baselines.experiments.armpointnav_mixin_ddppo import ( 12 | ArmPointNavMixInPPOConfig, 13 | ) 14 | from manipulathor_baselines.armpointnav_baselines.experiments.armpointnav_mixin_simplegru import ( 15 | ArmPointNavMixInSimpleGRUConfig, 16 | ) 17 | from manipulathor_baselines.armpointnav_baselines.experiments.ithor.armpointnav_ithor_base import ( 18 | ArmPointNaviThorBaseConfig, 19 | ) 20 | 21 | 22 | class ArmPointNavRGBDepth( 23 | ArmPointNaviThorBaseConfig, 24 | ArmPointNavMixInPPOConfig, 25 | ArmPointNavMixInSimpleGRUConfig, 26 | ): 27 | """An Object Navigation experiment configuration in iThor with RGB 28 | input.""" 29 | 30 | SENSORS = [ 31 | DepthSensorThor( 32 | height=ArmPointNaviThorBaseConfig.SCREEN_SIZE, 33 | width=ArmPointNaviThorBaseConfig.SCREEN_SIZE, 34 | use_normalization=True, 35 | uuid="depth_lowres", 36 | ), 37 | RGBSensorThor( 38 | height=ArmPointNaviThorBaseConfig.SCREEN_SIZE, 39 | width=ArmPointNaviThorBaseConfig.SCREEN_SIZE, 40 | use_resnet_normalization=True, 41 | uuid="rgb_lowres", 42 | ), 43 | RelativeAgentArmToObjectSensor(), 44 | RelativeObjectToGoalSensor(), 45 | PickedUpObjSensor(), 46 | ] 47 | 48 | MAX_STEPS = 200 49 | TASK_SAMPLER = ArmPointNavTaskSampler # 50 | 51 | def __init__(self): 52 | super().__init__() 53 | 54 | assert ( 55 | self.CAMERA_WIDTH == 224 56 | and self.CAMERA_HEIGHT == 224 57 | and self.VISIBILITY_DISTANCE == 1 58 | and self.STEP_SIZE == 0.25 59 | ) 60 | self.ENV_ARGS = {**ENV_ARGS, "renderDepthImage": True} 61 | 62 | @classmethod 63 | def tag(cls): 64 | return cls.__name__ 65 | -------------------------------------------------------------------------------- /manipulathor_baselines/armpointnav_baselines/experiments/ithor/test_NovelObj_armpointnav_depth.py: -------------------------------------------------------------------------------- 1 | from ithor_arm.ithor_arm_task_samplers import ArmPointNavTaskSampler 2 | from 
manipulathor_baselines.armpointnav_baselines.experiments.ithor.armpointnav_depth import ( 3 | ArmPointNavDepth, 4 | ) 5 | from manipulathor_baselines.armpointnav_baselines.experiments.ithor.armpointnav_ithor_base import ( 6 | ArmPointNaviThorBaseConfig, 7 | ) 8 | 9 | 10 | class TestOnUObjUSceneRealDepthRandomAgentLocArmNav(ArmPointNavDepth): 11 | 12 | MAX_STEPS = 200 13 | TASK_SAMPLER = ArmPointNavTaskSampler 14 | TEST_SCENES = ArmPointNaviThorBaseConfig.TEST_SCENES 15 | OBJECT_TYPES = ArmPointNaviThorBaseConfig.UNSEEN_OBJECT_TYPES 16 | -------------------------------------------------------------------------------- /manipulathor_baselines/armpointnav_baselines/experiments/ithor/test_SeenScenes_NovelObj_armpointnav_depth.py: -------------------------------------------------------------------------------- 1 | from ithor_arm.ithor_arm_task_samplers import ArmPointNavTaskSampler 2 | from manipulathor_baselines.armpointnav_baselines.experiments.ithor.armpointnav_depth import ( 3 | ArmPointNavDepth, 4 | ) 5 | from manipulathor_baselines.armpointnav_baselines.experiments.ithor.armpointnav_ithor_base import ( 6 | ArmPointNaviThorBaseConfig, 7 | ) 8 | 9 | 10 | class TestOnUObjSSceneArmNavDepth(ArmPointNavDepth): 11 | """An ArmPointNav test configuration with depth input: novel objects 12 | in seen (train) scenes.""" 13 | 14 | MAX_STEPS = 200 15 | TASK_SAMPLER = ArmPointNavTaskSampler 16 | TEST_SCENES = ArmPointNaviThorBaseConfig.TRAIN_SCENES 17 | OBJECT_TYPES = ArmPointNaviThorBaseConfig.UNSEEN_OBJECT_TYPES 18 | -------------------------------------------------------------------------------- /manipulathor_baselines/armpointnav_baselines/models/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/allenai/manipulathor/4562eb8c2f67ff67e5b9ba3930da84b6023a58a4/manipulathor_baselines/armpointnav_baselines/models/__init__.py -------------------------------------------------------------------------------- /manipulathor_baselines/armpointnav_baselines/models/arm_pointnav_models.py: -------------------------------------------------------------------------------- 1 | """Baseline models for use in the ArmPointNav task. 2 | 3 | ArmPointNav (visual object manipulation) is the task defined in the 4 | ManipulaTHOR framework, built on AI2-THOR. 5 | """ 6 | from typing import Tuple, Optional 7 | 8 | import gym 9 | import torch 10 | from allenact.algorithms.onpolicy_sync.policy import ( 11 | ActorCriticModel, 12 | LinearCriticHead, 13 | LinearActorHead, 14 | DistributionType, 15 | Memory, 16 | ObservationType, 17 | ) 18 | from allenact.base_abstractions.distributions import CategoricalDistr 19 | from allenact.base_abstractions.misc import ActorCriticOutput 20 | from allenact.embodiedai.models.basic_models import SimpleCNN, RNNStateEncoder 21 | from gym.spaces.dict import Dict as SpaceDict 22 | 23 | from manipulathor_utils.net_utils import input_embedding_net 24 | 25 | 26 | class ArmPointNavBaselineActorCritic(ActorCriticModel[CategoricalDistr]): 27 | """Baseline recurrent actor critic model for armpointnav task. 28 | 29 | # Attributes 30 | action_space : The space of actions available to the agent. Currently only discrete 31 | actions are allowed (so this space will always be of type `gym.spaces.Discrete`). 32 | observation_space : The observation space expected by the agent. This observation space 33 | should include (optionally) 'rgb' images and 'depth' images. 34 | hidden_size : The hidden size of the GRU RNN.
35 | obj_state_embedding_size : The output dimensionality of the embedding of the relative 36 | arm-to-object and object-to-goal distance inputs. 37 | """ 38 | 39 | def __init__( 40 | self, 41 | action_space: gym.spaces.Discrete, 42 | observation_space: SpaceDict, 43 | hidden_size=512, 44 | obj_state_embedding_size=512, 45 | trainable_masked_hidden_state: bool = False, 46 | num_rnn_layers=1, 47 | rnn_type="GRU", 48 | ): 49 | """Initializer. 50 | 51 | See class documentation for parameter definitions. 52 | """ 53 | super().__init__(action_space=action_space, observation_space=observation_space) 54 | 55 | self._hidden_size = hidden_size 56 | self.object_type_embedding_size = obj_state_embedding_size 57 | 58 | sensor_names = self.observation_space.spaces.keys() 59 | self.visual_encoder = SimpleCNN( 60 | self.observation_space, 61 | self._hidden_size, 62 | rgb_uuid="rgb_lowres" if "rgb_lowres" in sensor_names else None, 63 | depth_uuid="depth_lowres" if "depth_lowres" in sensor_names else None, 64 | ) 65 | 66 | if "rgb_lowres" in sensor_names and "depth_lowres" in sensor_names: 67 | input_visual_feature_num = 2 68 | elif "rgb_lowres" in sensor_names or "depth_lowres" in sensor_names: 69 | input_visual_feature_num = 1 70 | else: # no visual sensor present; avoids an undefined variable below 71 | input_visual_feature_num = 0 72 | 73 | self.state_encoder = RNNStateEncoder( 74 | (self._hidden_size) * input_visual_feature_num + obj_state_embedding_size, 75 | self._hidden_size, 76 | trainable_masked_hidden_state=trainable_masked_hidden_state, 77 | num_layers=num_rnn_layers, 78 | rnn_type=rnn_type, 79 | ) 80 | 81 | self.actor = LinearActorHead(self._hidden_size, action_space.n) 82 | self.critic = LinearCriticHead(self._hidden_size) 83 | relative_dist_embedding_size = torch.Tensor([3, 100, obj_state_embedding_size]) 84 | self.relative_dist_embedding = input_embedding_net( 85 | relative_dist_embedding_size.long().tolist(), dropout=0 86 | ) 87 | 88 | self.train() 89 | 90 | @property 91 | def recurrent_hidden_state_size(self) -> int: 92 | """The recurrent hidden state size of the model.""" 93 | return self._hidden_size 94 | 95 | @property 96 | def num_recurrent_layers(self) -> int: 97 | """Number of recurrent hidden layers.""" 98 | return self.state_encoder.num_recurrent_layers 99 | 100 | def _recurrent_memory_specification(self): 101 | return dict( 102 | rnn=( 103 | ( 104 | ("layer", self.num_recurrent_layers), 105 | ("sampler", None), 106 | ("hidden", self.recurrent_hidden_state_size), 107 | ), 108 | torch.float32, 109 | ) 110 | ) 111 | 112 | def get_relative_distance_embedding( 113 | self, state_tensor: torch.Tensor 114 | ) -> torch.FloatTensor: 115 | 116 | return self.relative_dist_embedding(state_tensor) 117 | 118 | def forward( # type:ignore 119 | self, 120 | observations: ObservationType, 121 | memory: Memory, 122 | prev_actions: torch.Tensor, 123 | masks: torch.FloatTensor, 124 | ) -> Tuple[ActorCriticOutput[DistributionType], Optional[Memory]]: 125 | """Processes input batched observations to produce new actor and critic 126 | values. Processes input batched observations (along with prior hidden 127 | states, previous actions, and masks denoting which recurrent hidden 128 | states should be masked) and returns an `ActorCriticOutput` object 129 | containing the model's policy (distribution over actions) and 130 | evaluation of the current state (value). 131 | 132 | # Parameters 133 | observations : Batched input observations. 134 | memory : `Memory` containing the hidden states from initial timepoints. 135 | prev_actions : Tensor of previous actions taken.
136 | masks : Masks applied to hidden states. See `RNNStateEncoder`. 137 | # Returns 138 | Tuple of the `ActorCriticOutput` and recurrent hidden state. 139 | """ 140 | 141 | arm2obj_dist = self.get_relative_distance_embedding( 142 | observations["relative_agent_arm_to_obj"] 143 | ) 144 | obj2goal_dist = self.get_relative_distance_embedding( 145 | observations["relative_obj_to_goal"] 146 | ) 147 | 148 | perception_embed = self.visual_encoder(observations) 149 | 150 | pickup_bool = observations["pickedup_object"] 151 | before_pickup = pickup_bool == 0 # not used because of our initialization 152 | after_pickup = pickup_bool == 1 153 | distances = arm2obj_dist 154 | distances[after_pickup] = obj2goal_dist[after_pickup] 155 | 156 | x = [distances, perception_embed] 157 | 158 | x_cat = torch.cat(x, dim=-1) 159 | x_out, rnn_hidden_states = self.state_encoder( 160 | x_cat, memory.tensor("rnn"), masks 161 | ) 162 | 163 | actor_out = self.actor(x_out) 164 | critic_out = self.critic(x_out) 165 | actor_critic_output = ActorCriticOutput( 166 | distributions=actor_out, values=critic_out, extras={} 167 | ) 168 | 169 | updated_memory = memory.set_tensor("rnn", rnn_hidden_states) 170 | 171 | return ( 172 | actor_critic_output, 173 | updated_memory, 174 | ) 175 | -------------------------------------------------------------------------------- /manipulathor_baselines/armpointnav_baselines/models/base_models.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | 4 | 5 | class LinearActorHeadNoCategory(nn.Module): 6 | def __init__(self, num_inputs: int, num_outputs: int): 7 | super().__init__() 8 | 9 | self.linear = nn.Linear(num_inputs, num_outputs) 10 | nn.init.orthogonal_(self.linear.weight, gain=0.01) 11 | nn.init.constant_(self.linear.bias, 0) 12 | 13 | def forward(self, x: torch.FloatTensor): # type: ignore 14 | x = self.linear(x) # type:ignore 15 | assert len(x.shape) == 3 16 | return x 17 | -------------------------------------------------------------------------------- /manipulathor_baselines/armpointnav_baselines/models/disjoint_arm_pointnav_models.py: -------------------------------------------------------------------------------- 1 | """Baseline models for use in the ArmPointNav task. 2 | 3 | ArmPointNav (visual object manipulation) is the task defined in the 4 | ManipulaTHOR framework, built on AI2-THOR. 5 | """ 6 | from typing import Tuple, Optional 7 | 8 | import gym 9 | import torch 10 | from allenact.algorithms.onpolicy_sync.policy import ( 11 | ActorCriticModel, 12 | LinearCriticHead, 13 | DistributionType, 14 | Memory, 15 | ObservationType, 16 | ) 17 | from allenact.base_abstractions.distributions import CategoricalDistr 18 | from allenact.base_abstractions.misc import ActorCriticOutput 19 | from allenact.embodiedai.models.basic_models import SimpleCNN, RNNStateEncoder 20 | from gym.spaces.dict import Dict as SpaceDict 21 | 22 | from manipulathor_baselines.armpointnav_baselines.models.base_models import ( 23 | LinearActorHeadNoCategory, 24 | ) 25 | from manipulathor_utils.net_utils import input_embedding_net 26 | 27 | 28 | class DisjointArmPointNavBaselineActorCritic(ActorCriticModel[CategoricalDistr]): 29 | """Disjoint Baseline recurrent actor critic model for armpointnav. 30 | 31 | # Attributes 32 | action_space : The space of actions available to the agent. Currently only discrete 33 | actions are allowed (so this space will always be of type `gym.spaces.Discrete`).
34 | observation_space : The observation space expected by the agent. This observation space 35 | should include 'depth' images under the "depth_lowres" uuid, the only 36 | visual observation this disjoint variant consumes (it builds separate 37 | pick and drop encoders over depth). 38 | hidden_size : The hidden size of the GRU RNN. 39 | obj_state_embedding_size : The output dimensionality of the embedding of the relative 40 | arm-to-object and object-to-goal distance inputs. 41 | 42 | """ 43 | 44 | def __init__( 45 | self, 46 | action_space: gym.spaces.Discrete, 47 | observation_space: SpaceDict, 48 | hidden_size=512, 49 | obj_state_embedding_size=512, 50 | trainable_masked_hidden_state: bool = False, 51 | num_rnn_layers=1, 52 | rnn_type="GRU", 53 | ): 54 | """Initializer. 55 | 56 | See class documentation for parameter definitions. 57 | """ 58 | super().__init__(action_space=action_space, observation_space=observation_space) 59 | 60 | self._hidden_size = hidden_size 61 | self.object_type_embedding_size = obj_state_embedding_size 62 | 63 | self.visual_encoder_pick = SimpleCNN( 64 | self.observation_space, 65 | self._hidden_size, 66 | rgb_uuid=None, 67 | depth_uuid="depth_lowres", 68 | ) 69 | self.visual_encoder_drop = SimpleCNN( 70 | self.observation_space, 71 | self._hidden_size, 72 | rgb_uuid=None, 73 | depth_uuid="depth_lowres", 74 | ) 75 | 76 | self.state_encoder = RNNStateEncoder( 77 | (self._hidden_size) + obj_state_embedding_size, 78 | self._hidden_size, 79 | trainable_masked_hidden_state=trainable_masked_hidden_state, 80 | num_layers=num_rnn_layers, 81 | rnn_type=rnn_type, 82 | ) 83 | 84 | self.actor_pick = LinearActorHeadNoCategory(self._hidden_size, action_space.n) 85 | self.critic_pick = LinearCriticHead(self._hidden_size) 86 | self.actor_drop = LinearActorHeadNoCategory(self._hidden_size, action_space.n) 87 | self.critic_drop = LinearCriticHead(self._hidden_size) 88 | 89 | # self.object_state_embedding = nn.Embedding(num_embeddings=6, embedding_dim=obj_state_embedding_size) 90 | 91 | relative_dist_embedding_size = torch.Tensor([3, 100, obj_state_embedding_size]) 92 | self.relative_dist_embedding_pick = input_embedding_net( 93 | relative_dist_embedding_size.long().tolist(), dropout=0 94 | ) 95 | self.relative_dist_embedding_drop = input_embedding_net( 96 | relative_dist_embedding_size.long().tolist(), dropout=0 97 | ) 98 | 99 | self.train() 100 | 101 | @property 102 | def recurrent_hidden_state_size(self) -> int: 103 | """The recurrent hidden state size of the model.""" 104 | return self._hidden_size 105 | 106 | @property 107 | def num_recurrent_layers(self) -> int: 108 | """Number of recurrent hidden layers.""" 109 | return self.state_encoder.num_recurrent_layers 110 | 111 | def _recurrent_memory_specification(self): 112 | return dict( 113 | rnn=( 114 | ( 115 | ("layer", self.num_recurrent_layers), 116 | ("sampler", None), 117 | ("hidden", self.recurrent_hidden_state_size), 118 | ), 119 | torch.float32, 120 | ) 121 | ) 122 | 123 | def forward( # type:ignore 124 | self, 125 | observations: ObservationType, 126 | memory: Memory, 127 | prev_actions: torch.Tensor, 128 | masks: torch.FloatTensor, 129 | ) -> Tuple[ActorCriticOutput[DistributionType], Optional[Memory]]: 130 | """Processes input batched observations to produce new actor and critic 131 | values.
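Both the pick branch and the drop branch (separate depth encoders, actor heads, and critic heads) are evaluated for every sample, and their outputs are then merged elementwise according to the `pickedup_object` observation.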
Processes input batched observations (along with prior hidden 132 | states, previous actions, and masks denoting which recurrent hidden 133 | states should be masked) and returns an `ActorCriticOutput` object 134 | containing the model's policy (distribution over actions) and 135 | evaluation of the current state (value). 136 | 137 | # Parameters 138 | observations : Batched input observations. 139 | memory : `Memory` containing the hidden states from initial timepoints. 140 | prev_actions : Tensor of previous actions taken. 141 | masks : Masks applied to hidden states. See `RNNStateEncoder`. 142 | # Returns 143 | Tuple of the `ActorCriticOutput` and recurrent hidden state. 144 | """ 145 | 146 | arm2obj_dist = self.relative_dist_embedding_pick( 147 | observations["relative_agent_arm_to_obj"] 148 | ) 149 | obj2goal_dist = self.relative_dist_embedding_drop( 150 | observations["relative_obj_to_goal"] 151 | ) 152 | 153 | perception_embed_pick = self.visual_encoder_pick(observations) 154 | perception_embed_drop = self.visual_encoder_drop(observations) 155 | 156 | pickup_bool = observations["pickedup_object"] 157 | before_pickup = pickup_bool == 0 # not used because of our initialization 158 | after_pickup = pickup_bool == 1 159 | distances = arm2obj_dist 160 | distances[after_pickup] = obj2goal_dist[after_pickup] 161 | 162 | perception_embed = perception_embed_pick 163 | perception_embed[after_pickup] = perception_embed_drop[after_pickup] 164 | 165 | x = [distances, perception_embed] 166 | 167 | x_cat = torch.cat(x, dim=-1) # type: ignore 168 | x_out, rnn_hidden_states = self.state_encoder( 169 | x_cat, memory.tensor("rnn"), masks 170 | ) 171 | actor_out_pick = self.actor_pick(x_out) 172 | critic_out_pick = self.critic_pick(x_out) 173 | 174 | actor_out_drop = self.actor_drop(x_out) 175 | critic_out_drop = self.critic_drop(x_out) 176 | 177 | actor_out = actor_out_pick 178 | actor_out[after_pickup] = actor_out_drop[after_pickup] 179 | critic_out = critic_out_pick 180 | critic_out[after_pickup] = critic_out_drop[after_pickup] 181 | 182 | actor_out = CategoricalDistr(logits=actor_out) 183 | actor_critic_output = ActorCriticOutput( 184 | distributions=actor_out, values=critic_out, extras={} 185 | ) 186 | updated_memory = memory.set_tensor("rnn", rnn_hidden_states) 187 | 188 | return ( 189 | actor_critic_output, 190 | updated_memory, 191 | ) 192 | -------------------------------------------------------------------------------- /manipulathor_constants.py: -------------------------------------------------------------------------------- 1 | import os 2 | from pathlib import Path 3 | 4 | ABS_PATH_OF_MANIPULATHOR_TOP_LEVEL_DIR = os.path.abspath( 5 | os.path.dirname(Path(__file__)) 6 | ) 7 | -------------------------------------------------------------------------------- /manipulathor_utils/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/allenai/manipulathor/4562eb8c2f67ff67e5b9ba3930da84b6023a58a4/manipulathor_utils/__init__.py -------------------------------------------------------------------------------- /manipulathor_utils/debugger_util.py: -------------------------------------------------------------------------------- 1 | import pdb 2 | import sys 3 | 4 | import matplotlib as mpl 5 | import torch 6 | 7 | mpl.use("Agg") 8 | import matplotlib.pyplot as plt 9 | 10 | 11 | def visualize_tensor(x, img_name): 12 | if len(x.shape) == 3 and x.shape[0] == 3: 13 | x = x.transpose(0, 1).transpose(1, 2) 14 | x = x.cpu().detach() 15 | 
plt.imshow(x) 16 | plt.savefig(img_name) 17 | 18 | 19 | def print_gradients(x): 20 | print("New gradient") 21 | print("shape", x.shape) 22 | if (x != x).sum() > 0: 23 | print("gradient is nan") 24 | pdb.set_trace() 25 | print("mean", x.mean()) 26 | print("min", x.min()) 27 | print("max", x.max()) 28 | print("gradient", x) 29 | pdb.set_trace() 30 | 31 | 32 | class ForkedPdb(pdb.Pdb): 33 | """A Pdb subclass that may be used 34 | from a forked multiprocessing child 35 | """ 36 | 37 | def interaction(self, *args, **kwargs): 38 | _stdin = sys.stdin 39 | try: 40 | sys.stdin = open("/dev/stdin") 41 | pdb.Pdb.interaction(self, *args, **kwargs) 42 | finally: 43 | sys.stdin = _stdin 44 | 45 | 46 | def is_weight_nan(model): 47 | all_params = [k for k in model.parameters()] 48 | # [p.shape for p in all_params] 49 | has_nan = [torch.any(p != p) for p in all_params] 50 | norms = [p.norm() for p in all_params] 51 | print("norms", norms) 52 | print("total nans", sum(has_nan), "out of", len(has_nan)) 53 | print("total norm", sum(norms) / len(norms)) 54 | -------------------------------------------------------------------------------- /manipulathor_utils/net_utils.py: -------------------------------------------------------------------------------- 1 | import pdb 2 | 3 | import torch.nn as nn 4 | import torch.nn.functional as F 5 | 6 | 7 | def upshuffle( 8 | in_planes, out_planes, upscale_factor, kernel_size=3, stride=1, padding=1 9 | ): 10 | return nn.Sequential( 11 | nn.Conv2d( 12 | in_planes, 13 | out_planes * upscale_factor ** 2, 14 | kernel_size=kernel_size, 15 | stride=stride, 16 | padding=padding, 17 | ), 18 | nn.PixelShuffle(upscale_factor), 19 | nn.LeakyReLU(), 20 | ) 21 | 22 | 23 | def upshufflenorelu( 24 | in_planes, out_planes, upscale_factor, kernel_size=3, stride=1, padding=1 25 | ): 26 | return nn.Sequential( 27 | nn.Conv2d( 28 | in_planes, 29 | out_planes * upscale_factor ** 2, 30 | kernel_size=kernel_size, 31 | stride=stride, 32 | padding=padding, 33 | ), 34 | nn.PixelShuffle(upscale_factor), 35 | ) 36 | 37 | 38 | def combine_block(in_planes, out_planes): 39 | return nn.Sequential( 40 | nn.Conv2d(in_planes, out_planes, 1, 1), 41 | nn.BatchNorm2d(out_planes), 42 | nn.LeakyReLU(), 43 | ) 44 | 45 | 46 | def conv2d_block(in_planes, out_planes, kernel_size, stride=1, padding=1): 47 | return nn.Sequential( 48 | nn.Conv2d(in_planes, out_planes, kernel_size, stride=stride, padding=padding), 49 | nn.BatchNorm2d(out_planes), 50 | nn.LeakyReLU(), 51 | nn.Conv2d(out_planes, out_planes, kernel_size=3, stride=1, padding=1), 52 | nn.BatchNorm2d(out_planes), 53 | ) 54 | 55 | 56 | def combine_block_w_do(in_planes, out_planes, dropout=0.0): 57 | return nn.Sequential( 58 | nn.Conv2d(in_planes, out_planes, 1, 1), 59 | nn.LeakyReLU(), 60 | nn.Dropout(dropout), 61 | ) 62 | 63 | 64 | def combine_block_no_do(in_planes, out_planes): 65 | return nn.Sequential( 66 | nn.Conv2d(in_planes, out_planes, 1, 1), 67 | nn.LeakyReLU(), 68 | ) 69 | 70 | 71 | def linear_block(in_features, out_features, dropout=0.0): 72 | return nn.Sequential( 73 | nn.Linear(in_features, out_features), 74 | nn.LeakyReLU(), 75 | nn.Dropout(dropout), 76 | ) 77 | 78 | 79 | def linear_block_norelu(in_features, out_features): 80 | return nn.Sequential( 81 | nn.Linear(in_features, out_features), 82 | ) 83 | 84 | 85 | def input_embedding_net(list_of_feature_sizes, dropout=0.0): 86 | modules = [] 87 | for i in range(len(list_of_feature_sizes) - 1): 88 | input_size, output_size = list_of_feature_sizes[i : i + 2] 89 | if i + 2 == len(list_of_feature_sizes): 
90 | modules.append(linear_block_norelu(input_size, output_size)) 91 | else: 92 | modules.append(linear_block(input_size, output_size, dropout=dropout)) 93 | return nn.Sequential(*modules) 94 | 95 | 96 | def _upsample_add(x, y): 97 | _, _, H, W = y.size() 98 | return F.upsample(x, size=(H, W), mode="bilinear") + y 99 | 100 | 101 | def replace_all_relu_w_leakyrelu(model): 102 | pdb.set_trace() 103 | print("Not sure if using this is a good idea") 104 | modules = model._modules 105 | for m in modules.keys(): 106 | module = modules[m] 107 | if isinstance(module, nn.ReLU): 108 | model._modules[m] = nn.LeakyReLU() 109 | elif isinstance(module, nn.Module): 110 | model._modules[m] = replace_all_relu_w_leakyrelu(module) 111 | return model 112 | 113 | 114 | def replace_all_leakyrelu_w_relu(model): 115 | modules = model._modules 116 | for m in modules.keys(): 117 | module = modules[m] 118 | if isinstance(module, nn.LeakyReLU): 119 | model._modules[m] = nn.ReLU() 120 | elif isinstance(module, nn.Module): 121 | model._modules[m] = replace_all_leakyrelu_w_relu(module) 122 | return model 123 | 124 | 125 | def replace_all_bn_w_groupnorm(model): 126 | pdb.set_trace() 127 | print("Not sure if using this is a good idea") 128 | modules = model._modules 129 | for m in modules.keys(): 130 | module = modules[m] 131 | if isinstance(module, nn.BatchNorm2d) or isinstance(module, nn.BatchNorm1d): 132 | feature_number = module.num_features 133 | model._modules[m] = nn.GroupNorm(32, feature_number) 134 | elif isinstance(module, nn.BatchNorm3d): 135 | raise Exception("Not implemented") 136 | elif isinstance(module, nn.Module): 137 | model._modules[m] = replace_all_bn_w_groupnorm(module) 138 | return model 139 | 140 | 141 | def flat_temporal(tensor, batch_size, sequence_length): 142 | tensor_shape = [s for s in tensor.shape] 143 | assert tensor_shape[0] == batch_size and tensor_shape[1] == sequence_length 144 | result_shape = [batch_size * sequence_length] + tensor_shape[2:] 145 | return tensor.contiguous().view(result_shape) 146 | 147 | 148 | def unflat_temporal(tensor, batch_size, sequence_length): 149 | tensor_shape = [s for s in tensor.shape] 150 | assert tensor_shape[0] == batch_size * sequence_length 151 | result_shape = [batch_size, sequence_length] + tensor_shape[1:] 152 | return tensor.contiguous().view(result_shape) 153 | -------------------------------------------------------------------------------- /media/armpointnav_task.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/allenai/manipulathor/4562eb8c2f67ff67e5b9ba3930da84b6023a58a4/media/armpointnav_task.png -------------------------------------------------------------------------------- /media/dataset.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/allenai/manipulathor/4562eb8c2f67ff67e5b9ba3930da84b6023a58a4/media/dataset.png -------------------------------------------------------------------------------- /pretrained_models/.gitignore: -------------------------------------------------------------------------------- 1 | * 2 | !.gitignore 3 | !*.md 4 | -------------------------------------------------------------------------------- /pretrained_models/EvaluateModels.md: -------------------------------------------------------------------------------- 1 | # Evaluating the Pre-trained Models 2 | ## Download the Weights 3 | You can download the weights from 
[here](https://drive.google.com/file/d/1axZRgY3oKgATu0zoi1LeLxUtqpIXmJCg/view?usp=sharing). 4 | After downloading the weights, extract them to `pretrained_models/saved_checkpoints`. 5 | ## Evaluating the Trained Models 6 | ### Table 1 7 | #### Test-SeenObj 8 | ``` 9 | allenact manipulathor_baselines/armpointnav_baselines/experiments/ithor/armpointnav_depth.py -o test_out -s 1 -t test -c pretrained_models/saved_checkpoints/depth_armpointnav.pt 10 | ``` 11 | #### Test-NovelObj 12 | ``` 13 | allenact manipulathor_baselines/armpointnav_baselines/experiments/ithor/test_NovelObj_armpointnav_depth -o test_out -s 1 -t test -c pretrained_models/saved_checkpoints/depth_armpointnav.pt 14 | ``` 15 | #### SeenScenes-NovelObj 16 | ``` 17 | allenact manipulathor_baselines/armpointnav_baselines/experiments/ithor/test_SeenScenes_NovelObj_armpointnav_depth -o test_out -s 1 -t test -c pretrained_models/saved_checkpoints/depth_armpointnav.pt 18 | ``` 19 | 20 | ### Table 2 21 | 22 | #### No-Vision 23 | ``` 24 | allenact manipulathor_baselines/armpointnav_baselines/experiments/ithor/armpointnav_no_vision -o test_out -s 1 -t test -c pretrained_models/saved_checkpoints/no_vision_armpointnav.pt 25 | ``` 26 | 27 | #### Disjoint Model 28 | ``` 29 | allenact manipulathor_baselines/armpointnav_baselines/experiments/ithor/armpointnav_disjoint_depth -o test_out -s 1 -t test -c pretrained_models/saved_checkpoints/disjoint_model_armpointnav.pt 30 | ``` 31 | 32 | #### RGB Model 33 | ``` 34 | allenact manipulathor_baselines/armpointnav_baselines/experiments/ithor/armpointnav_rgb -o test_out -s 1 -t test -c pretrained_models/saved_checkpoints/rgb_armpointnav.pt 35 | ``` 36 | 37 | #### RGB-Depth Model 38 | ``` 39 | allenact manipulathor_baselines/armpointnav_baselines/experiments/ithor/armpointnav_rgbdepth -o test_out -s 1 -t test -c pretrained_models/saved_checkpoints/rgbdepth_armpointnav.pt 40 | ``` 41 | 42 | 43 | #### Depth Model 44 | ``` 45 | allenact manipulathor_baselines/armpointnav_baselines/experiments/ithor/armpointnav_depth -o test_out -s 1 -t test -c pretrained_models/saved_checkpoints/depth_armpointnav.pt 46 | ``` -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | absl-py==0.12.0 2 | -e git+https://github.com/allenai/ai2thor.git@bcc2e62970823667acb5c2a56e809419f1521e52#egg=ai2thor 3 | allenact==0.2.2 4 | allenact-plugins==0.2.2 5 | appdirs==1.4.4 6 | astunparse==1.6.3 7 | attrs==20.3.0 8 | aws-requests-auth==0.4.3 9 | black==20.8b1 10 | botocore==1.20.55 11 | cachetools==4.2.1 12 | certifi==2020.12.5 13 | chardet==4.0.0 14 | click==7.1.2 15 | cloudpickle==1.6.0 16 | colour==0.1.5 17 | cycler==0.10.0 18 | dataclasses==0.6 19 | filelock==3.0.12 20 | Flask==1.1.2 21 | future==0.18.2 22 | gast==0.4.0 23 | google-auth==1.29.0 24 | google-auth-oauthlib==0.4.4 25 | google-pasta==0.2.0 26 | grpcio==1.37.0 27 | gym==0.17.3 28 | h5py==2.10.0 29 | idna==2.10 30 | imageio==2.9.0 31 | imageio-ffmpeg==0.4.3 32 | importlib-metadata==4.0.1 33 | itsdangerous==1.1.0 34 | jedi==0.17.0 35 | Jinja2==2.11.3 36 | jmespath==0.10.0 37 | Keras-Preprocessing==1.1.2 38 | kiwisolver==1.3.1 39 | llvmlite==0.36.0 40 | Markdown==3.3.4 41 | MarkupSafe==1.1.1 42 | matplotlib==3.3.4 43 | moviepy==1.0.3 44 | msgpack==1.0.2 45 | mypy-extensions==0.4.3 46 | networkx==2.5.1 47 | numba==0.53.1 48 | numpy==1.19.5 49 | numpy-quaternion==2021.4.5.14.42.35 50 | oauthlib==3.1.0 51 | opencv-python==4.5.1.48 52 |
opt-einsum==3.3.0 53 | packaging==20.9 54 | pathspec==0.8.1 55 | Pillow==8.2.0 56 | proglog==0.1.9 57 | progressbar2==3.53.1 58 | protobuf==3.15.8 59 | pyasn1==0.4.8 60 | pyasn1-modules==0.2.8 61 | pyglet==1.5.0 62 | pyparsing==2.4.7 63 | pyquaternion==0.9.9 64 | python-dateutil==2.8.1 65 | python-utils==2.5.6 66 | python-xlib==0.29 67 | PyYAML==5.4.1 68 | regex==2021.4.4 69 | requests==2.25.1 70 | requests-oauthlib==1.3.0 71 | rsa==4.7.2 72 | scipy==1.5.4 73 | setproctitle==1.2.2 74 | six==1.15.0 75 | tensorboardX==2.2 76 | tensorflow-estimator==2.4.0 77 | termcolor==1.1.0 78 | toml==0.10.2 79 | torch==1.8.1 80 | torchvision==0.9.1 81 | tqdm==4.60.0 82 | typed-ast==1.4.3 83 | typing-extensions==3.7.4.3 84 | urllib3==1.26.4 85 | Werkzeug==1.0.1 86 | wrapt==1.12.1 87 | zipp==3.4.1 88 | -------------------------------------------------------------------------------- /scripts/jupyter_helper.py: -------------------------------------------------------------------------------- 1 | import copy 2 | import math 3 | import random 4 | import ai2thor 5 | import pdb 6 | import ai2thor.fifo_server 7 | 8 | 9 | ### CONSTANTS 10 | 11 | 12 | 13 | ADITIONAL_ARM_ARGS = { 14 | 'disableRendering': True, 15 | 'returnToStart': True, 16 | 'speed': 1, 17 | } 18 | 19 | ARM_MOVE_CONSTANT = 0.05 20 | 21 | SCENE_INDICES = [i + 1 for i in range(30)] +[i + 1 for i in range(200,230)] +[i + 1 for i in range(300,330)] +[i + 1 for i in range(400,430)] 22 | SCENE_NAMES = ['FloorPlan{}_physics'.format(i) for i in SCENE_INDICES] 23 | 24 | 25 | ENV_ARGS = dict(gridSize=0.25, 26 | width=224, height=224, agentMode='arm', fieldOfView=100, 27 | agentControllerType='mid-level', 28 | server_class=ai2thor.fifo_server.FifoServer, 29 | useMassThreshold = True, massThreshold = 10, 30 | autoSimulation=False, autoSyncTransforms=True, 31 | ) 32 | 33 | # Functions 34 | 35 | def is_object_at_position(controller, action_detail): 36 | objectId = action_detail['objectId'] 37 | position = action_detail['position'] 38 | current_object_position = get_object_details(controller, objectId)['position'] 39 | return two_dict_equal(dict(position=position), dict(position=current_object_position)) 40 | 41 | def is_agent_at_position(controller, action_detail): 42 | # dict(action='TeleportFull', x=initial_location['x'], y=initial_location['y'], z=initial_location['z'], rotation=dict(x=0, y=initial_rotation, z=0), horizon=horizon, standing=True) 43 | target_pose = dict( 44 | position={'x': action_detail['x'], 'y': action_detail['y'], 'z': action_detail['z'], }, 45 | rotation=action_detail['rotation'], 46 | horizon=action_detail['horizon'] 47 | ) 48 | current_agent_pose = controller.last_event.metadata['agent'] 49 | current_agent_pose = dict( 50 | position=current_agent_pose['position'], 51 | rotation=current_agent_pose['rotation'], 52 | horizon=current_agent_pose['cameraHorizon'], 53 | ) 54 | return two_dict_equal(current_agent_pose, target_pose) 55 | 56 | 57 | def get_object_details(controller, obj_id): 58 | return [o for o in controller.last_event.metadata['objects'] if o['objectId'] == obj_id][0] 59 | 60 | def initialize_arm(controller, scene_starting_cheating_locations): 61 | # Starts the arm from a raised initial pose (a shortcut); this block is very important;
never remove 62 | scene = controller.last_event.metadata['sceneName'] 63 | initial_pose = scene_starting_cheating_locations[scene] 64 | event1 = controller.step(dict(action='TeleportFull', standing=True, x=initial_pose['x'], y=initial_pose['y'], z=initial_pose['z'], rotation=dict(x=0, y=initial_pose['rotation'], z=0), horizon=initial_pose['horizon'])) 65 | event2 = controller.step(dict(action='MoveMidLevelArm', position=dict(x=0.0, y=0, z=0.35), **ADITIONAL_ARM_ARGS)) 66 | event3 = controller.step(dict(action='MoveMidLevelArmHeight', y=0.8, **ADITIONAL_ARM_ARGS)) 67 | return event1, event2, event3 68 | 69 | def make_all_objects_unbreakable(controller): 70 | all_breakable_objects = [o['objectType'] for o in controller.last_event.metadata['objects'] if o['breakable'] is True] 71 | all_breakable_objects = set(all_breakable_objects) 72 | for obj_type in all_breakable_objects: 73 | controller.step(action='MakeObjectsOfTypeUnbreakable', objectType=obj_type) 74 | 75 | 76 | def reset_the_scene_and_get_reachables(controller, scene_name=None, scene_options=None): 77 | if scene_name is None: 78 | if scene_options is None: 79 | scene_options = SCENE_NAMES 80 | scene_name = random.choice(scene_options) 81 | controller.reset(scene_name) 82 | controller.step(action='MakeAllObjectsMoveable') 83 | controller.step(action='MakeObjectsStaticKinematicMassThreshold') 84 | make_all_objects_unbreakable(controller) 85 | return get_reachable_positions(controller) 86 | 87 | def only_reset_scene(controller, scene_name): 88 | controller.reset(scene_name) 89 | controller.step(action='MakeAllObjectsMoveable') 90 | controller.step(action='MakeObjectsStaticKinematicMassThreshold') 91 | make_all_objects_unbreakable(controller) 92 | 93 | def transport_wrapper(controller, target_object, target_location): 94 | action_detail_list = [] 95 | transport_detail = dict(action = 'PlaceObjectAtPoint', objectId=target_object, position=target_location, forceKinematic=True) 96 | event = controller.step(**transport_detail) 97 | action_detail_list.append(transport_detail) 98 | # controller.step('PhysicsSyncTransforms') 99 | advance_detail = dict(action='AdvancePhysicsStep', simSeconds=1.0) 100 | controller.step(**advance_detail) 101 | action_detail_list.append(advance_detail) 102 | return event, action_detail_list 103 | 104 | def is_object_in_receptacle(event,target_obj,target_receptacle): 105 | all_containing_receptacle = set([]) 106 | parent_queue = [target_obj] 107 | while(len(parent_queue) > 0): 108 | top_queue = parent_queue[0] 109 | parent_queue = parent_queue[1:] 110 | if top_queue in all_containing_receptacle: 111 | continue 112 | current_parent_list = event.get_object(top_queue)['parentReceptacles'] 113 | if current_parent_list is None: 114 | continue 115 | else: 116 | parent_queue += current_parent_list 117 | all_containing_receptacle.update(set(current_parent_list)) 118 | return target_receptacle in all_containing_receptacle 119 | 120 | def get_reachable_positions(controller): 121 | event = controller.step('GetReachablePositions') 122 | reachable_positions = event.metadata['reachablePositions'] 123 | if reachable_positions is None or len(reachable_positions) == 0: 124 | reachable_positions = event.metadata['actionReturn'] 125 | if reachable_positions is None or len(reachable_positions) == 0: 126 | print('Scene name', controller.last_event.metadata['sceneName']) 127 | pdb.set_trace() 128 | return reachable_positions 129 | def execute_command(controller, command,action_dict_addition): 130 | 131 | base_position = 
get_current_arm_state(controller) 132 | change_height = ARM_MOVE_CONSTANT 133 | change_value = change_height 134 | action_details = {} 135 | 136 | if command == 'w': 137 | base_position['z'] += change_value 138 | elif command == 'z': 139 | base_position['z'] -= change_value 140 | elif command == 's': 141 | base_position['x'] += change_value 142 | elif command == 'a': 143 | base_position['x'] -= change_value 144 | elif command == '3': 145 | base_position['y'] += change_value 146 | elif command == '4': 147 | base_position['y'] -= change_value 148 | elif command == 'u': 149 | base_position['h'] += change_height 150 | elif command == 'j': 151 | base_position['h'] -= change_height 152 | elif command == '/': 153 | action_details = dict('') 154 | pickupable = controller.last_event.metadata['arm']['PickupableObjectsInsideHandSphere'] 155 | print(pickupable) 156 | elif command == 'd': 157 | event = controller.step(action='ReleaseObject') 158 | action_details = dict(action='ReleaseObject') 159 | elif command == 'm': 160 | action_dict_addition = copy.deepcopy(action_dict_addition) 161 | event = controller.step(action='MoveAgent', ahead=0.2,**action_dict_addition) 162 | action_details = dict(action='MoveAgent', ahead=0.2,**action_dict_addition) 163 | 164 | elif command == 'r': 165 | action_dict_addition = copy.deepcopy(action_dict_addition) 166 | event = controller.step(action='RotateAgent', degrees = 45,**action_dict_addition) 167 | action_details = dict(action='RotateAgent', degrees = 45,**action_dict_addition) 168 | elif command == 'l': 169 | action_dict_addition = copy.deepcopy(action_dict_addition) 170 | event = controller.step(action='RotateAgent', degrees = -45,**action_dict_addition) 171 | action_details = dict(action='RotateAgent', degrees = -45,**action_dict_addition) 172 | elif command == 'p': 173 | event = controller.step(action='PickupObject') 174 | action_details = dict(action='PickupObject') 175 | elif command == 'q': 176 | action_details = {} 177 | else: 178 | action_details = {} 179 | 180 | if command in ['w', 'z', 's', 'a', '3', '4']: 181 | 182 | event = controller.step(action='MoveArm', position=dict(x=base_position['x'], y=base_position['y'], z=base_position['z']),**action_dict_addition) 183 | action_details=dict(action='MoveArm', position=dict(x=base_position['x'], y=base_position['y'], z=base_position['z']),**action_dict_addition) 184 | success = event.metadata['lastActionSuccess'] 185 | 186 | 187 | elif command in ['u', 'j']: 188 | 189 | event = controller.step(action='MoveArmBase', y=base_position['h'],**action_dict_addition) 190 | action_details=dict(action='MoveArmBase', y=base_position['h'],**action_dict_addition) 191 | 192 | success = event.metadata['lastActionSuccess'] 193 | 194 | return action_details 195 | 196 | def get_current_arm_state(controller): 197 | h_min = 0.450998873 198 | h_max = 1.8009994 199 | agent_base_location = 0.9009995460510254 200 | event = controller.last_event 201 | offset = event.metadata['agent']['position']['y'] - agent_base_location 202 | h_max += offset 203 | h_min += offset 204 | joints=(event.metadata['arm']['joints']) 205 | arm=joints[-1] 206 | assert arm['name'] == 'robot_arm_4_jnt' 207 | xyz_dict = copy.deepcopy(arm['rootRelativePosition']) 208 | height_arm = joints[0]['position']['y'] 209 | xyz_dict['h'] = (height_arm - h_min) / (h_max - h_min) 210 | # print_error([x['position']['y'] for x in joints]) 211 | return xyz_dict 212 | 213 | def two_list_equal(l1, l2): 214 | dict1 = {i: v for (i,v) in enumerate(l1)} 215 | dict2 = {i: v for (i,v) 
in enumerate(l2)} 216 | return two_dict_equal(dict1, dict2) 217 | 218 | 219 | def get_current_full_state(controller): 220 | return {'agent_position':controller.last_event.metadata['agent']['position'], 'agent_rotation':controller.last_event.metadata['agent']['rotation'], 'arm_state': controller.last_event.metadata['arm']['joints'], 'held_object': controller.last_event.metadata['arm']['HeldObjects']} 221 | 222 | 223 | def two_dict_equal(dict1, dict2, threshold=0.001, ignore_keys=[]): 224 | if len(dict1) != len(dict2): 225 | print('different len', dict1, dict2) 226 | return False 227 | # assert len(dict1) == len(dict2), print('different len', dict1, dict2) 228 | equal = True 229 | for k in dict1: 230 | if k in ignore_keys: 231 | continue 232 | val1 = dict1[k] 233 | val2 = dict2[k] 234 | if not (type(val1) == type(val2) or (type(val1) in [int, float] and type(val2) in [int, float])): 235 | print('different type', dict1, dict2) 236 | return False 237 | # assert type(val1) == type(val2) or (type(val1) in [int, float] and type(val2) in [int, float]), () 238 | if type(val1) == dict: 239 | equal = two_dict_equal(val1, val2) 240 | elif type(val1) == list: 241 | equal = two_list_equal(val1, val2) 242 | # elif val1 != val1: # Either nan or -inf 243 | # equal = val2 != val2 244 | elif type(val1) == float: 245 | equal = abs(val1 - val2) < threshold 246 | else: 247 | equal = (val1 == val2) 248 | if not equal: 249 | print('not equal', 'key', k, 'values', val1, val2) 250 | return equal 251 | return equal 252 | 253 | -------------------------------------------------------------------------------- /scripts/startx.py: -------------------------------------------------------------------------------- 1 | import atexit 2 | import os 3 | import platform 4 | import re 5 | import shlex 6 | import subprocess 7 | import tempfile 8 | 9 | 10 | # Turning off automatic black formatting for this script as it breaks quotes. 
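# A typical invocation on a headless machine (hypothetical usage; it normally
# requires root, since a real Xorg server is launched):
#
#     sudo python3 scripts/startx.py
#
# startx() below finds NVIDIA GPUs via lspci, writes a temporary xorg.conf for
# them, and blocks while Xorg serves display :0.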
11 | 12 | # fmt: off 13 | 14 | def pci_records(): 15 | records = [] 16 | command = shlex.split("lspci -vmm") 17 | output = subprocess.check_output(command).decode() 18 | 19 | for devices in output.strip().split("\n\n"): 20 | record = {} 21 | records.append(record) 22 | for row in devices.split("\n"): 23 | key, value = row.split("\t") 24 | record[key.split(":")[0]] = value 25 | 26 | return records 27 | 28 | def generate_xorg_conf(devices): 29 | xorg_conf = [] 30 | 31 | device_section = """ 32 | Section "Device" 33 | Identifier "Device{device_id}" 34 | Driver "nvidia" 35 | VendorName "NVIDIA Corporation" 36 | BusID "{bus_id}" 37 | EndSection 38 | """ 39 | server_layout_section = """ 40 | Section "ServerLayout" 41 | Identifier "Layout0" 42 | {screen_records} 43 | EndSection 44 | """ 45 | screen_section = """ 46 | Section "Screen" 47 | Identifier "Screen{screen_id}" 48 | Device "Device{device_id}" 49 | DefaultDepth 24 50 | Option "AllowEmptyInitialConfiguration" "True" 51 | SubSection "Display" 52 | Depth 24 53 | Virtual 1024 768 54 | EndSubSection 55 | EndSection 56 | """ 57 | screen_records = [] 58 | for i, bus_id in enumerate(devices): 59 | xorg_conf.append(device_section.format(device_id=i, bus_id=bus_id)) 60 | xorg_conf.append(screen_section.format(device_id=i, screen_id=i)) 61 | screen_records.append('Screen {screen_id} "Screen{screen_id}" 0 0'.format(screen_id=i)) 62 | 63 | xorg_conf.append(server_layout_section.format(screen_records="\n ".join(screen_records))) 64 | 65 | output = "\n".join(xorg_conf) 66 | return output 67 | 68 | def startx(display=0): 69 | if platform.system() != "Linux": 70 | raise Exception("Can only run startx on linux") 71 | 72 | devices = [] 73 | for r in pci_records(): 74 | if r.get("Vendor", "") == "NVIDIA Corporation" \ 75 | and r["Class"] in ["VGA compatible controller", "3D controller"]: 76 | bus_id = "PCI:" + ":".join(map(lambda x: str(int(x, 16)), re.split(r"[:\.]", r["Slot"]))) 77 | devices.append(bus_id) 78 | 79 | if not devices: 80 | raise Exception("no nvidia cards found") 81 | 82 | try: 83 | fd, path = tempfile.mkstemp() 84 | with open(path, "w") as f: 85 | f.write(generate_xorg_conf(devices)) 86 | command = shlex.split("Xorg -noreset +extension GLX +extension RANDR +extension RENDER -config %s :%s" % (path, display)) 87 | proc = subprocess.Popen(command) 88 | atexit.register(lambda: proc.poll() is None and proc.kill()) 89 | proc.wait() 90 | finally: 91 | os.close(fd) 92 | os.unlink(path) 93 | 94 | # fmt: on 95 | 96 | 97 | if __name__ == "__main__": 98 | startx() -------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- 1 | from setuptools import find_packages, setup 2 | 3 | if __name__ == "__main__": 4 | setup( 5 | name="manipulathor", 6 | packages=find_packages(), 7 | version="0.0.1", 8 | install_requires=[ 9 | "allenact==0.2.2", 10 | "allenact_plugins[ithor]==0.2.2", 11 | "setuptools", 12 | ], 13 | ) 14 | -------------------------------------------------------------------------------- /updated_requirements.txt: -------------------------------------------------------------------------------- 1 | absl-py==0.12.0 2 | -e git+https://github.com/allenai/ai2thor.git@bcc2e62970823667acb5c2a56e809419f1521e52#egg=ai2thor 3 | allenact==0.2.2 4 | allenact-plugins==0.2.2 5 | astunparse==1.6.3 6 | aws-requests-auth==0.4.3 7 | botocore==1.20.67 8 | cachetools==4.2.2 9 | certifi==2020.12.5 10 | chardet==4.0.0 11 | python-xlib==0.29 12 | click==7.1.2 13 
| cloudpickle==1.6.0 14 | cmake==3.18.4.post1 15 | cycler==0.10.0 16 | dataclasses==0.8 17 | decorator==4.4.2 18 | filelock==3.0.12 19 | Flask==1.1.2 20 | flatbuffers==1.12 21 | future==0.18.2 22 | gast==0.3.3 23 | gin==0.1.6 24 | gin-config==0.4.0 25 | google-auth==1.30.0 26 | google-auth-oauthlib==0.4.4 27 | google-pasta==0.2.0 28 | grpcio==1.32.0 29 | gym==0.17.3 30 | h5py==2.10.0 31 | idna==2.10 32 | imageio==2.9.0 33 | imageio-ffmpeg==0.4.3 34 | importlib-metadata==4.0.1 35 | itsdangerous==1.1.0 36 | Jinja2==2.11.3 37 | jmespath==0.10.0 38 | Keras-Preprocessing==1.1.2 39 | kiwisolver==1.3.1 40 | Markdown==3.3.4 41 | MarkupSafe==1.1.1 42 | matplotlib==3.3.4 43 | moviepy==1.0.3 44 | msgpack==1.0.2 45 | networkx==2.5.1 46 | numpy==1.19.5 47 | oauthlib==3.1.0 48 | opencv-python==4.5.1.48 49 | opt-einsum==3.3.0 50 | Pillow==8.2.0 51 | proglog==0.1.9 52 | progressbar2==3.53.1 53 | protobuf==3.15.8 54 | pyasn1==0.4.8 55 | pyasn1-modules==0.2.8 56 | pyglet==1.5.0 57 | pyparsing==2.4.7 58 | python-dateutil==2.8.1 59 | python-utils==2.5.6 60 | PyYAML==5.4.1 61 | requests==2.25.1 62 | requests-oauthlib==1.3.0 63 | rsa==4.7.2 64 | scipy==1.5.4 65 | setproctitle==1.2.2 66 | six==1.15.0 67 | tensorboard==2.5.0 68 | tensorboard-data-server==0.6.1 69 | tensorboard-plugin-wit==1.8.0 70 | tensorboardX==2.2 71 | tensorflow==2.4.1 72 | tensorflow-estimator==2.4.0 73 | termcolor==1.1.0 74 | torch==1.8.1 75 | torchaudio==0.8.1 76 | torchvision==0.9.1 77 | tqdm==4.60.0 78 | typing-extensions==3.7.4.3 79 | urllib3==1.26.4 80 | Werkzeug==1.0.1 81 | wrapt==1.12.1 82 | zipp==3.4.1 83 | --------------------------------------------------------------------------------