├── .gitignore
├── LICENSE
├── README.md
├── envs
│   ├── setup_env.sh
│   └── teaser.jpg
└── src
    └── test_env.py

--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | assemblyEnv
2 | ml-agents
3 | envs/
4 | **/__pycache__
5 | **pyc
6 | **npy
7 | **DS_Store
8 | src/baselines
9 | baselines
10 | src/trained_models
11 | src/SaveData
12 | src/temp
13 | src/temp.txt
14 | src/auto_plot.sh
15 | src/pytorch-tools
16 | pytorch-tools
17 | 
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | Copyright (c) 2019 Deepak Pathak
2 | All rights reserved.
3 | 
4 | Redistribution and use in source and binary forms, with or without
5 | modification, are permitted provided that the following conditions are met:
6 | 
7 | * Redistributions of source code must retain the above copyright notice, this
8 |   list of conditions and the following disclaimer.
9 | 
10 | * Redistributions in binary form must reproduce the above copyright notice,
11 |   this list of conditions and the following disclaimer in the documentation
12 |   and/or other materials provided with the distribution.
13 | 
14 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
15 | AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
16 | IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
17 | DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
18 | FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
19 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
20 | SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
21 | CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
22 | OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
23 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
24 | 
25 | 
26 | --------------------------------------------------------------------------------
27 | Original Pytorch-PPO License:
28 | --------------------------------------------------------------------------------
29 | MIT License
30 | 
31 | Copyright (c) 2017 Ilya Kostrikov
32 | 
33 | Permission is hereby granted, free of charge, to any person obtaining a copy
34 | of this software and associated documentation files (the "Software"), to deal
35 | in the Software without restriction, including without limitation the rights
36 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
37 | copies of the Software, and to permit persons to whom the Software is
38 | furnished to do so, subject to the following conditions:
39 | 
40 | The above copyright notice and this permission notice shall be included in all
41 | copies or substantial portions of the Software.
42 | 
43 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
44 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
45 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
46 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
47 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
48 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
49 | SOFTWARE.
50 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | ## Learning to Control Self-Assembling Morphologies ##
2 | ### NeurIPS 2019 (Spotlight) <br/> Winner of [Virtual Creatures Competition at GECCO 2019, Prague](https://virtualcreatures.github.io/)
3 | #### [[Project Website]](https://pathak22.github.io/modular-assemblies/) [[Demo Video]](https://youtu.be/ngCIB-IWD8E)
4 | 
5 | [Deepak Pathak](https://people.eecs.berkeley.edu/~pathak/)*, [Chris Lu](https://chris-lu.weebly.com/)*, [Trevor Darrell](https://people.eecs.berkeley.edu/~trevor/), [Phillip Isola](https://www.eecs.mit.edu/people/faculty/phillip-isola/), [Alexei A. Efros](https://people.eecs.berkeley.edu/~efros/)<br/>
6 | University of California, Berkeley<br/>
7 | MIT<br/>
8 | (* equal contribution)
9 | 
10 | 
11 | 
12 | 
13 | 
14 | This is a PyTorch-based implementation of our [paper on learning to control self-assembling agents using deep reinforcement learning](https://pathak22.github.io/modular-assemblies/). We investigate a modular co-evolution strategy: a collection of primitive agents learns to dynamically self-assemble into composite bodies while also learning to coordinate their behavior to control these bodies. The learned compositional policies exhibit better zero-shot generalization. If you find this work useful in your research, please cite:
15 | 
16 |     @inproceedings{pathak19assemblies,
17 |       Author = {Pathak, Deepak and Lu, Chris and Darrell, Trevor and
18 |                 Isola, Phillip and Efros, Alexei A.},
19 |       Title = {Learning to Control Self-Assembling Morphologies:
20 |                A Study of Generalization via Modularity},
21 |       Booktitle = {arXiv preprint arXiv:1902.05546},
22 |       Year = {2019}
23 |     }
24 | 
25 | ### Installation and Usage
26 | 
27 | 1. Setting up the repository
28 |    ```Shell
29 |    git clone https://github.com/pathak22/modular-assemblies.git
30 |    cd modular-assemblies/
31 |    git clone https://github.com/Unity-Technologies/ml-agents.git
32 |    cd ml-agents/
33 |    git reset --hard 6c5255e
34 |    cd ..
35 |    bash envs/setup_env.sh
36 | 
37 |    python3 -m venv assemblyEnv
38 |    source $PWD/assemblyEnv/bin/activate
39 |    pip install --upgrade pip
40 |    ```
41 | 
42 | 2. Installation
43 |    - Requirements:
44 |      - CUDNN-5.1, CUDA-8.0, Python-3.5
45 |    - Detailed setup (skip to the quick setup below for exact replication):
46 |    ```Shell
47 |    # Install PyTorch from http://pytorch.org/
48 |    pip install http://download.pytorch.org/whl/cu80/torch-0.3.0.post4-cp35-cp35m-linux_x86_64.whl
49 |    pip install torchvision
50 |    pip install --upgrade visdom
51 | 
52 |    # Install baselines for Atari preprocessing
53 |    pip install gym==0.9.4  # baselines installs the latest gym automatically, but the latest gym has moved to a newer MuJoCo, so install this older gym first and then install baselines
54 |    git clone https://github.com/openai/baselines.git
55 |    cd baselines
56 |    git reset --hard b5be53d
57 |    pip install -e .
58 | 
59 |    # Additional packages
60 |    pip install numpy
61 |    pip install matplotlib
62 |    pip install pillow
63 |    pip install opencv-python
64 | 
65 |    # pytorch-tools (provides torchfold)
66 |    cd modular-assemblies/src/
67 |    git clone https://github.com/nearai/pytorch-tools.git
68 |    cd pytorch-tools/
69 |    git reset --hard 09dccb2
70 |    python setup.py install
71 |    ```
72 |    - Quick setup for exact replication:
73 |    ```Shell
74 |    pip install -r requirements.txt
75 |    ```
76 | 
77 | 3. Run the code
78 |    ```Shell
79 |    cd modular-assemblies/src/
80 |    python test_env.py
81 |    ```
82 | 
83 | ### Acknowledgement
84 | Builds upon Ilya Kostrikov's PyTorch PPO [implementation](https://github.com/ikostrikov/pytorch-a2c-ppo-acktr).
85 | 
--------------------------------------------------------------------------------
/envs/setup_env.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 | 
3 | URL=https://www.dropbox.com/s//final_N_vgrid3.zip
4 | DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )/" && pwd )"
5 | cd "$DIR"
6 | 
7 | rm -rf *.tar.gz *.zip
8 | echo "Downloading the environment executable..."
9 | wget "$URL"
10 | echo "Unzipping..."
11 | unzip final_N_vgrid3.zip
12 | mv final_N_vgrid3/* .
13 | rm -rf final_N_vgrid3 __MACOSX .DS_Store
14 | rm -rf *.tar.gz *.zip
15 | echo "Downloading done."
16 | 
--------------------------------------------------------------------------------
/envs/teaser.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/pathak22/modular-assemblies/535338cd2a33526770d5cb7cd04c516ae518b12c/envs/teaser.jpg
--------------------------------------------------------------------------------
/src/test_env.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import argparse
3 | import os
4 | import sys
5 | sys.path.insert(0, os.path.abspath("../ml-agents/python/"))
6 | from unityagents import UnityEnvironment
7 | 
8 | parser = argparse.ArgumentParser(description='Unity Env Test')
9 | parser.add_argument('-e', '--env-name', default='../envs/pyenv',
10 |                     help='environment path')
11 | args = parser.parse_args()
12 | 
13 | env = UnityEnvironment(file_name=args.env_name)
14 | default_brain = env.brain_names[0]
15 | brain = env.brains[default_brain]
16 | print('Loaded environment from: %s' % args.env_name)
17 | print(str(env))
18 | print('action space size per limb = ', brain.action_space_size)
19 | print('state space size per limb = ', brain.state_space_size)
20 | train_mode = False
21 | 
22 | # joinDist: distance threshold for the merge (join) action
23 | # forceScale: scales the torque applied in the environment
24 | # autoJoin: if nonzero, limbs join automatically when dist <= joinDist; otherwise they join via the merge action
25 | expID = 0.0
26 | saveEnv = 0.0
27 | loadEnv = 0.0
28 | env_config = {"maxLimb": 7, "joinDist": 4.0, "forceScale": 10.0,
29 |               "autoJoin": 0.0, "dynamicFriction": 0.6, "staticFriction": 0.6,
30 |               "saveEnv": saveEnv, "loadEnv": loadEnv, "expID": expID}
31 | if saveEnv != 0.0:
32 |     print('EnvData Save dir: ./SaveData/%05d/%05d.dat' % (expID, saveEnv))
33 | if loadEnv != 0.0:
34 |     print('EnvData Load dir: ./SaveData/%05d/%05d.dat' % (expID, loadEnv))
35 | if 'multiagent_N' in args.env_name:
36 |     env_config["N"] = 8
37 | print('Env config:', env_config)
38 | 
39 | for episode in range(10):
40 |     print('=' * 60)
41 |     env_info = env.reset(train_mode=train_mode, config=env_config)[default_brain]
42 |     print('Starting Episode: %d' % (episode + 1))
43 |     if "N" in env_config:
44 |         env_config["N"] = 0  # send N only on the first reset
45 |     done = False
46 |     episode_rewards = 0
47 |     for i in range(100):
48 |         if brain.action_space_type == 'continuous':
49 |             act = np.random.randn(len(env_info.agents), brain.action_space_size)
50 |         else:
51 |             act = np.random.randint(0, brain.action_space_size,
52 |                                     size=len(env_info.agents))
53 |         env_info = env.step(act)[default_brain]
54 |         episode_rewards += env_info.rewards[0]
55 |     print("Total reward this episode: {}".format(episode_rewards))
56 | 
57 | env.close()
58 | 
--------------------------------------------------------------------------------
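
The rollout pattern in src/test_env.py (reset, sample one action per limb, step the brain-keyed environment, accumulate reward) can be exercised without the Unity binary by substituting a stub environment. This is a minimal sketch: `StubEnv` and `StubBrainInfo` are hypothetical stand-ins for `UnityEnvironment` and its per-step brain info, not part of the repository, and the fixed per-step reward exists only to make the sketch's behavior predictable.

```python
import numpy as np

class StubBrainInfo:
    """Hypothetical stand-in for the per-step info object unityagents returns."""
    def __init__(self, n_agents):
        self.agents = list(range(n_agents))   # one entry per limb
        self.rewards = [0.1] * n_agents       # fixed reward, for predictability

class StubEnv:
    """Hypothetical stub mimicking the brain-keyed reset/step API used in test_env.py."""
    action_space_size = 2            # action dimensions per limb
    action_space_type = 'continuous'

    def __init__(self, n_agents=3, brain_name='LimbBrain'):
        self.n_agents = n_agents
        self.brain_name = brain_name

    def reset(self, train_mode=False, config=None):
        # Real UnityEnvironment returns {brain_name: BrainInfo}
        return {self.brain_name: StubBrainInfo(self.n_agents)}

    def step(self, act):
        # Expect one action vector per limb, as sampled in test_env.py
        assert act.shape == (self.n_agents, self.action_space_size)
        return {self.brain_name: StubBrainInfo(self.n_agents)}

def run_episode(env, steps=100):
    """Mirror of the inner loop in test_env.py: random actions, sum limb-0 reward."""
    info = env.reset(train_mode=False, config={})[env.brain_name]
    total = 0.0
    for _ in range(steps):
        act = np.random.randn(len(info.agents), env.action_space_size)
        info = env.step(act)[env.brain_name]
        total += info.rewards[0]  # reward of the first limb, as in test_env.py
    return total

print('Total reward this episode:', run_episode(StubEnv(), steps=100))
```

Swapping `StubEnv` for a real `UnityEnvironment` (and `env.brain_name` for `env.brain_names[0]`) recovers the behavior of the original script; the stub is only useful for checking the loop logic on machines without the environment executable.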