├── .gitignore ├── README.md ├── agent-zoo ├── kick-and-defend │ ├── defender │ │ ├── agent2_parameters-v1.pkl │ │ ├── agent2_parameters-v2.pkl │ │ └── agent2_parameters-v3.pkl │ └── kicker │ │ ├── agent1_parameters-v1.pkl │ │ ├── agent1_parameters-v2.pkl │ │ └── agent1_parameters-v3.pkl ├── run-to-goal │ ├── ants │ │ ├── agent1_parameters-v1.pkl │ │ └── agent2_parameters-v1.pkl │ └── humans │ │ ├── agent1_parameters-v1.pkl │ │ └── agent2_parameters-v1.pkl ├── sumo │ ├── ants │ │ ├── agent_parameters-v1.pkl │ │ ├── agent_parameters-v2.pkl │ │ ├── agent_parameters-v3.pkl │ │ └── agent_parameters-v4.pkl │ └── humans │ │ ├── agent_parameters-v1.pkl │ │ ├── agent_parameters-v2.pkl │ │ └── agent_parameters-v3.pkl └── you-shall-not-pass │ ├── agent1_parameters-v1pkl │ └── agent2_parameters-v1.pkl ├── demo_tasks.sh ├── gym-compete ├── gym_compete │ ├── __init__.py │ └── new_envs │ │ ├── __init__.py │ │ ├── agents │ │ ├── __init__.py │ │ ├── agent.py │ │ ├── ant.py │ │ ├── ant_fighter.py │ │ ├── humanoid.py │ │ ├── humanoid_blocker.py │ │ ├── humanoid_fighter.py │ │ ├── humanoid_goalkeeper.py │ │ └── humanoid_kicker.py │ │ ├── assets │ │ ├── ant_body.xml │ │ ├── humanoid_body.xml │ │ ├── world_body.ant_body.ant_body.xml │ │ ├── world_body.humanoid_body.humanoid_body.xml │ │ ├── world_body.xml │ │ ├── world_body_arena.ant_body.ant_body.xml │ │ ├── world_body_arena.humanoid_body.humanoid_body.xml │ │ ├── world_body_arena.xml │ │ ├── world_body_football.humanoid_body.humanoid_body.xml │ │ └── world_body_football.xml │ │ ├── kick_and_defend.py │ │ ├── multi_agent_env.py │ │ ├── multi_agent_scene.py │ │ ├── multi_monitoring.py │ │ ├── multi_time_limit.py │ │ ├── sumo.py │ │ ├── utils.py │ │ └── you_shall_not_pass.py └── setup.py ├── main.py ├── policy.py └── requirements.txt /.gitignore: -------------------------------------------------------------------------------- 1 | # Byte-compiled / optimized / DLL files 2 | __pycache__/ 3 | *.py[cod] 4 | *$py.class 5 | 6 | # C extensions 7 | *.so 8 | 9 | # Distribution / packaging 10 | .Python 11 | env/ 12 | build/ 13 | develop-eggs/ 14 | dist/ 15 | downloads/ 16 | eggs/ 17 | .eggs/ 18 | lib/ 19 | lib64/ 20 | parts/ 21 | sdist/ 22 | var/ 23 | wheels/ 24 | *.egg-info/ 25 | .installed.cfg 26 | *.egg 27 | 28 | # PyInstaller 29 | # Usually these files are written by a python script from a template 30 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 
31 | *.manifest
32 | *.spec
33 | 
34 | # Installer logs
35 | pip-log.txt
36 | pip-delete-this-directory.txt
37 | 
38 | # Unit test / coverage reports
39 | htmlcov/
40 | .tox/
41 | .coverage
42 | .coverage.*
43 | .cache
44 | nosetests.xml
45 | coverage.xml
46 | *.cover
47 | .hypothesis/
48 | 
49 | # Translations
50 | *.mo
51 | *.pot
52 | 
53 | # Django stuff:
54 | *.log
55 | local_settings.py
56 | 
57 | # Flask stuff:
58 | instance/
59 | .webassets-cache
60 | 
61 | # Scrapy stuff:
62 | .scrapy
63 | 
64 | # Sphinx documentation
65 | docs/_build/
66 | 
67 | # PyBuilder
68 | target/
69 | 
70 | # Jupyter Notebook
71 | .ipynb_checkpoints
72 | 
73 | # pyenv
74 | .python-version
75 | 
76 | # celery beat schedule file
77 | celerybeat-schedule
78 | 
79 | # SageMath parsed files
80 | *.sage.py
81 | 
82 | # dotenv
83 | .env
84 | 
85 | # virtualenv
86 | .venv
87 | venv/
88 | ENV/
89 | 
90 | # Spyder project settings
91 | .spyderproject
92 | .spyproject
93 | 
94 | # Rope project settings
95 | .ropeproject
96 | 
97 | # mkdocs documentation
98 | /site
99 | 
100 | # mypy
101 | .mypy_cache/
102 | 
103 | # specific
104 | /rllab-multi/extra
105 | /output
106 | .DS_STORE
107 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | **Status:** Archive (code is provided as-is, no updates expected)
2 | 
3 | # Competitive Multi-Agent Environments
4 | 
5 | This repository contains the environments for the paper [Emergent Complexity via Multi-agent Competition](https://arxiv.org/abs/1710.03748).
6 | 
7 | ## Dependencies
8 | Use `pip install -r requirements.txt` to install dependencies. If you haven't used MuJoCo before, please refer to the [installation guide](https://github.com/openai/mujoco-py).
9 | The code has been tested with the following dependencies:
10 | * Python version 3.6
11 | * [OpenAI Gym](https://github.com/openai/gym) version 0.9.1 with MuJoCo 1.31 support (use [mujoco-py version 0.5.7](https://github.com/openai/mujoco-py/tree/0.5))
12 | * [Tensorflow](https://www.tensorflow.org/versions/r1.1/install/) version 1.1.0
13 | * [Numpy](https://scipy.org/install.html) version 1.12.1
14 | 
15 | ## Installing Package
16 | After installing all dependencies, make sure Gym works with support for MuJoCo environments.
17 | Next, install the `gym-compete` package:
18 | ```bash
19 | cd gym-compete
20 | pip install -e .
21 | ```
22 | Check that the installation succeeded by leaving the directory and running `import gym_compete` in a Python console. Some users might require a `sudo pip install`.
23 | 
24 | ## Trying the environments
25 | Agent policies are provided for the various environments in folder `agent-zoo`.
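You can also create an environment directly from Python once the package is installed. The snippet below is a minimal illustration, not part of this repository: it assumes the Gym 0.9.1 API and that `MultiAgentEnv.step` accepts and returns per-agent tuples, mirroring the `Tuple` action and observation spaces set up in `multi_agent_env.py`.

```python
# Minimal sketch, not part of this repository. Assumes gym 0.9.1, mujoco-py
# 0.5.7 and that `pip install -e gym-compete` succeeded.
import gym
import gym_compete  # importing the package registers the *-v0 environments

env = gym.make('run-to-goal-ants-v0')   # any ID registered in gym_compete/__init__.py
observations = env.reset()              # tuple with one observation per agent

for _ in range(100):
    # The action space is a gym.spaces.Tuple of per-agent Boxes, so sampling
    # yields a tuple of actions, one entry per agent.
    actions = env.action_space.sample()
    # Assumed step format: per-agent tuples of observations, rewards and dones.
    observations, rewards, dones, infos = env.step(actions)
    if any(dones):
        observations = env.reset()
```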
To see a demo of all the environments do:
26 | ```bash
27 | bash demo_tasks.sh all
28 | ```
29 | To instead try a single environment use:
30 | ```bash
31 | bash demo_tasks.sh <task>
32 | ```
33 | where `<task>` is one of: `run-to-goal-humans`, `run-to-goal-ants`, `you-shall-not-pass`, `sumo-ants`, `sumo-humans` and `kick-and-defend`.
34 | 
--------------------------------------------------------------------------------
/agent-zoo/kick-and-defend/defender/agent2_parameters-v1.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/openai/multiagent-competition/b2e081a192429913b7d21a89525901d59c85e7f3/agent-zoo/kick-and-defend/defender/agent2_parameters-v1.pkl
--------------------------------------------------------------------------------
/agent-zoo/kick-and-defend/defender/agent2_parameters-v2.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/openai/multiagent-competition/b2e081a192429913b7d21a89525901d59c85e7f3/agent-zoo/kick-and-defend/defender/agent2_parameters-v2.pkl
--------------------------------------------------------------------------------
/agent-zoo/kick-and-defend/defender/agent2_parameters-v3.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/openai/multiagent-competition/b2e081a192429913b7d21a89525901d59c85e7f3/agent-zoo/kick-and-defend/defender/agent2_parameters-v3.pkl
--------------------------------------------------------------------------------
/agent-zoo/kick-and-defend/kicker/agent1_parameters-v1.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/openai/multiagent-competition/b2e081a192429913b7d21a89525901d59c85e7f3/agent-zoo/kick-and-defend/kicker/agent1_parameters-v1.pkl
--------------------------------------------------------------------------------
/agent-zoo/kick-and-defend/kicker/agent1_parameters-v2.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/openai/multiagent-competition/b2e081a192429913b7d21a89525901d59c85e7f3/agent-zoo/kick-and-defend/kicker/agent1_parameters-v2.pkl
--------------------------------------------------------------------------------
/agent-zoo/kick-and-defend/kicker/agent1_parameters-v3.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/openai/multiagent-competition/b2e081a192429913b7d21a89525901d59c85e7f3/agent-zoo/kick-and-defend/kicker/agent1_parameters-v3.pkl
--------------------------------------------------------------------------------
/agent-zoo/run-to-goal/ants/agent1_parameters-v1.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/openai/multiagent-competition/b2e081a192429913b7d21a89525901d59c85e7f3/agent-zoo/run-to-goal/ants/agent1_parameters-v1.pkl
--------------------------------------------------------------------------------
/agent-zoo/run-to-goal/ants/agent2_parameters-v1.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/openai/multiagent-competition/b2e081a192429913b7d21a89525901d59c85e7f3/agent-zoo/run-to-goal/ants/agent2_parameters-v1.pkl
--------------------------------------------------------------------------------
/agent-zoo/run-to-goal/humans/agent1_parameters-v1.pkl:
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/openai/multiagent-competition/b2e081a192429913b7d21a89525901d59c85e7f3/agent-zoo/run-to-goal/humans/agent1_parameters-v1.pkl -------------------------------------------------------------------------------- /agent-zoo/run-to-goal/humans/agent2_parameters-v1.pkl: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/openai/multiagent-competition/b2e081a192429913b7d21a89525901d59c85e7f3/agent-zoo/run-to-goal/humans/agent2_parameters-v1.pkl -------------------------------------------------------------------------------- /agent-zoo/sumo/ants/agent_parameters-v1.pkl: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/openai/multiagent-competition/b2e081a192429913b7d21a89525901d59c85e7f3/agent-zoo/sumo/ants/agent_parameters-v1.pkl -------------------------------------------------------------------------------- /agent-zoo/sumo/ants/agent_parameters-v2.pkl: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/openai/multiagent-competition/b2e081a192429913b7d21a89525901d59c85e7f3/agent-zoo/sumo/ants/agent_parameters-v2.pkl -------------------------------------------------------------------------------- /agent-zoo/sumo/ants/agent_parameters-v3.pkl: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/openai/multiagent-competition/b2e081a192429913b7d21a89525901d59c85e7f3/agent-zoo/sumo/ants/agent_parameters-v3.pkl -------------------------------------------------------------------------------- /agent-zoo/sumo/ants/agent_parameters-v4.pkl: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/openai/multiagent-competition/b2e081a192429913b7d21a89525901d59c85e7f3/agent-zoo/sumo/ants/agent_parameters-v4.pkl -------------------------------------------------------------------------------- /agent-zoo/sumo/humans/agent_parameters-v1.pkl: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/openai/multiagent-competition/b2e081a192429913b7d21a89525901d59c85e7f3/agent-zoo/sumo/humans/agent_parameters-v1.pkl -------------------------------------------------------------------------------- /agent-zoo/sumo/humans/agent_parameters-v2.pkl: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/openai/multiagent-competition/b2e081a192429913b7d21a89525901d59c85e7f3/agent-zoo/sumo/humans/agent_parameters-v2.pkl -------------------------------------------------------------------------------- /agent-zoo/sumo/humans/agent_parameters-v3.pkl: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/openai/multiagent-competition/b2e081a192429913b7d21a89525901d59c85e7f3/agent-zoo/sumo/humans/agent_parameters-v3.pkl -------------------------------------------------------------------------------- /agent-zoo/you-shall-not-pass/agent1_parameters-v1pkl: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/openai/multiagent-competition/b2e081a192429913b7d21a89525901d59c85e7f3/agent-zoo/you-shall-not-pass/agent1_parameters-v1pkl 
--------------------------------------------------------------------------------
/agent-zoo/you-shall-not-pass/agent2_parameters-v1.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/openai/multiagent-competition/b2e081a192429913b7d21a89525901d59c85e7f3/agent-zoo/you-shall-not-pass/agent2_parameters-v1.pkl
--------------------------------------------------------------------------------
/demo_tasks.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 | 
3 | task=$1
4 | max_episodes=3
5 | correct=0
6 | 
7 | if [ $task == "run-to-goal-ants" ] || [ $task == "all" ]
8 | then
9 | echo "Task 1: Run-to-Goal (Ants)"
10 | python main.py --env run-to-goal-ants --param-paths agent-zoo/run-to-goal/ants/agent1_parameters-v1.pkl agent-zoo/run-to-goal/ants/agent2_parameters-v1.pkl --max-episodes $max_episodes
11 | correct=1
12 | fi
13 | if [ $task == "run-to-goal-humans" ] || [ $task == "all" ]
14 | then
15 | echo "Task 1: Run-to-Goal (Humans)"
16 | python main.py --env run-to-goal-humans --param-paths agent-zoo/run-to-goal/humans/agent1_parameters-v1.pkl agent-zoo/run-to-goal/humans/agent2_parameters-v1.pkl --max-episodes $max_episodes
17 | correct=1
18 | fi
19 | if [ $task == "you-shall-not-pass" ] || [ $task == "all" ]
20 | then
21 | echo "Task 2: You-Shall-Not-Pass"
22 | python main.py --env you-shall-not-pass --param-paths agent-zoo/you-shall-not-pass/agent1_parameters-v1pkl agent-zoo/you-shall-not-pass/agent2_parameters-v1.pkl --max-episodes $max_episodes
23 | correct=1
24 | fi
25 | if [ $task == "sumo-ants" ] || [ $task == "all" ]
26 | then
27 | echo "Task 3: Sumo (Ants)"
28 | python main.py --env sumo-ants --param-paths agent-zoo/sumo/ants/agent_parameters-v2.pkl agent-zoo/sumo/ants/agent_parameters-v2.pkl --max-episodes $max_episodes
29 | correct=1
30 | fi
31 | if [ $task == "sumo-humans" ] || [ $task == "all" ]
32 | then
33 | echo "Task 3: Sumo (Humans)"
34 | python main.py --env sumo-humans --param-paths agent-zoo/sumo/humans/agent_parameters-v1.pkl agent-zoo/sumo/humans/agent_parameters-v1.pkl --max-episodes $max_episodes
35 | correct=1
36 | fi
37 | if [ $task == "kick-and-defend" ] || [ $task == "all" ]
38 | then
39 | echo "Task 4: Kick-and-Defend"
40 | python main.py --env kick-and-defend --param-paths agent-zoo/kick-and-defend/kicker/agent1_parameters-v1.pkl agent-zoo/kick-and-defend/defender/agent2_parameters-v1.pkl --max-episodes $max_episodes
41 | correct=1
42 | fi
43 | if [ $correct == 0 ]
44 | then
45 | echo "Usage: bash demo_tasks.sh <task>"
46 | echo "where <task> is all to demo all tasks or one of: run-to-goal-humans, run-to-goal-ants, you-shall-not-pass, sumo-humans, sumo-ants, kick-and-defend"
47 | fi
48 | 
--------------------------------------------------------------------------------
/gym-compete/gym_compete/__init__.py:
--------------------------------------------------------------------------------
1 | from gym.envs.registration import register
2 | import os
3 | 
4 | register(
5 | id='run-to-goal-ants-v0',
6 | entry_point='gym_compete.new_envs:MultiAgentEnv',
7 | kwargs={'agent_names': ['ant', 'ant'],
8 | 'scene_xml_path': os.path.join(
9 | os.path.dirname(__file__),
10 | "new_envs", "assets",
11 | "world_body.ant_body.ant_body.xml"
12 | ),
13 | 'init_pos': [(-1, 0, 0.75), (1, 0, 0.75)]
14 | },
15 | )
16 | 
17 | register(
18 | id='run-to-goal-humans-v0',
19 | entry_point='gym_compete.new_envs:MultiAgentEnv',
20 | kwargs={'agent_names': ['humanoid', 'humanoid'],
21 | 'scene_xml_path':
os.path.join( 22 | os.path.dirname(__file__), "new_envs", 23 | "assets", 24 | "world_body.humanoid_body.humanoid_body.xml" 25 | ), 26 | 'init_pos': [(-1, 0, 1.4), (1, 0, 1.4)] 27 | }, 28 | ) 29 | 30 | register( 31 | id='you-shall-not-pass-humans-v0', 32 | entry_point='gym_compete.new_envs:HumansBlockingEnv', 33 | kwargs={'agent_names': ['humanoid_blocker', 'humanoid'], 34 | 'scene_xml_path': os.path.join( 35 | os.path.dirname(__file__), "new_envs", 36 | "assets", 37 | "world_body.humanoid_body.humanoid_body.xml" 38 | ), 39 | 'init_pos': [(-1, 0, 1.4), (1, 0, 1.4)], 40 | 'max_episode_steps': 500, 41 | }, 42 | ) 43 | 44 | register( 45 | id='sumo-humans-v0', 46 | entry_point='gym_compete.new_envs:SumoEnv', 47 | kwargs={'agent_names': ['humanoid_fighter', 'humanoid_fighter'], 48 | 'scene_xml_path': os.path.join( 49 | os.path.dirname(__file__), "new_envs", 50 | "assets", 51 | "world_body_arena.humanoid_body.humanoid_body.xml" 52 | ), 53 | 'init_pos': [(-1, 0, 1.4), (1, 0, 1.4)], 54 | 'max_episode_steps': 500, 55 | 'min_radius': 1.5, 56 | 'max_radius': 3.5 57 | }, 58 | ) 59 | 60 | register( 61 | id='sumo-ants-v0', 62 | entry_point='gym_compete.new_envs:SumoEnv', 63 | kwargs={'agent_names': ['ant_fighter', 'ant_fighter'], 64 | 'scene_xml_path': os.path.join( 65 | os.path.dirname(__file__), "new_envs", 66 | "assets", 67 | "world_body_arena.ant_body.ant_body.xml" 68 | ), 69 | 'world_xml_path': os.path.join( 70 | os.path.dirname(__file__), "new_envs", 71 | "assets", 'world_body_arena.xml' 72 | ), 73 | 'init_pos': [(-1, 0, 2.5), (1, 0, 2.5)], 74 | 'max_episode_steps': 500, 75 | 'min_radius': 2.5, 76 | 'max_radius': 4.5 77 | }, 78 | ) 79 | 80 | # register( 81 | # id='HumanAntArena-v0', 82 | # entry_point='gym_compete.new_envs:HumansKnockoutEnv', 83 | # kwargs={'agent_names': ['ant_fighter', 'humanoid_fighter'], 84 | # 'scene_xml_path': os.path.join( 85 | # os.path.dirname(__file__), "new_envs", 86 | # "assets", 87 | # "world_body_arena.ant_body.human_body.xml" 88 | # ), 89 | # 'world_xml_path': os.path.join( 90 | # os.path.dirname(__file__), "new_envs", 91 | # "assets", 'world_body_arena.xml' 92 | # ), 93 | # 'init_pos': [(-1, 0, 2.5), (1, 0, 2.5)], 94 | # 'max_episode_steps': 500, 95 | # 'min_radius': 2, 96 | # 'max_radius': 3.5 97 | # }, 98 | # ) 99 | 100 | register( 101 | id='kick-and-defend-v0', 102 | entry_point='gym_compete.new_envs:KickAndDefend', 103 | kwargs={'agent_names': ['humanoid_kicker', 'humanoid_goalkeeper'], # ['humanoid_goalkeeper', 'humanoid_kicker'] 104 | 'scene_xml_path': os.path.join( 105 | os.path.dirname(__file__), "new_envs", 106 | "assets", 107 | "world_body_football.humanoid_body.humanoid_body.xml" 108 | ), 109 | 'world_xml_path': os.path.join( 110 | os.path.dirname(__file__), "new_envs", 111 | "assets", 'world_body_football.xml' 112 | ), 113 | 'init_pos': [(-1, 0, 1.5), (1, 0, 1.5)], 114 | 'max_episode_steps': 500, 115 | }, 116 | ) 117 | -------------------------------------------------------------------------------- /gym-compete/gym_compete/new_envs/__init__.py: -------------------------------------------------------------------------------- 1 | from .multi_agent_env import MultiAgentEnv 2 | # from .margin_wrapper import MultiAgentMarginEnv 3 | from .agents import Agent 4 | # from .ant import Ant 5 | # from .adverse_reward_wrapper import CompeteMoveRewardEnv, CompeteMarginMoveRewardEnv, KnockOutEnv, KnockOutFinalEnv, RandomKnockOutEnv 6 | # from .ant_team import AntTeam 7 | # from .multi_ants_team import MultiAntsTeam 8 | # from .humanoid_blocker import HumanoidBlocker 9 
| # from .humans_blocking import HumansBlockingEnv 10 | from .you_shall_not_pass import HumansBlockingEnv 11 | from .sumo import SumoEnv 12 | from .kick_and_defend import KickAndDefend 13 | # from .humanoid_kicker import HumanoidKicker 14 | # from .humanoid_goalkeeper import HumanoidGoalKeeper 15 | # from .agent_id_wrapper import * 16 | # from .attack_envs import * 17 | # from .transfer_agents import * 18 | from .multi_monitoring import MultiMonitor 19 | from .multi_time_limit import MultiTimeLimit 20 | -------------------------------------------------------------------------------- /gym-compete/gym_compete/new_envs/agents/__init__.py: -------------------------------------------------------------------------------- 1 | from .agent import Agent 2 | from .humanoid import Humanoid 3 | from .ant import Ant 4 | from .humanoid_blocker import HumanoidBlocker 5 | from .ant_fighter import AntFighter 6 | from .humanoid_fighter import HumanoidFighter 7 | from .humanoid_kicker import HumanoidKicker 8 | from .humanoid_goalkeeper import HumanoidGoalKeeper 9 | -------------------------------------------------------------------------------- /gym-compete/gym_compete/new_envs/agents/agent.py: -------------------------------------------------------------------------------- 1 | import xml.etree.ElementTree as ET 2 | from gym.spaces import Box 3 | import six 4 | from ..utils import list_filter 5 | import numpy as np 6 | 7 | class Agent(object): 8 | ''' 9 | Super class for all agents in a multi-agent mujoco environement 10 | Each subclass shoudl implement a move_reward method which are the moving 11 | rewards for that agent 12 | Each agent can also implement its own action space 13 | Over-ride set_observation_space to change observation_space 14 | (default is Box based on _get_obs() implementation) 15 | After creation, an Env reference should be given calling set_env 16 | ''' 17 | JNT_NPOS = {0: 7, 18 | 1: 4, 19 | 2: 1, 20 | 3: 1, 21 | } 22 | 23 | def __init__(self, agent_id, xml_path, nagents=2): 24 | self.id = agent_id 25 | self.scope = 'agent' + str(self.id) 26 | self._xml_path = xml_path 27 | print("Reading agent XML from:", xml_path) 28 | self.xml = ET.parse(xml_path) 29 | self.env = None 30 | self._env_init = False 31 | self.n_agents = nagents 32 | 33 | def set_env(self, env): 34 | self.env = env 35 | self._env_init = True 36 | self._set_body() 37 | self._set_joint() 38 | if self.n_agents > 1: 39 | self._set_other_joint() 40 | self.set_observation_space() 41 | self.set_action_space() 42 | 43 | def set_observation_space(self): 44 | obs = self._get_obs 45 | self.obs_dim = obs.size 46 | high = np.inf * np.ones(self.obs_dim) 47 | low = -high 48 | self.observation_space = Box(low, high) 49 | 50 | def set_action_space(self): 51 | acts = self.xml.find('actuator') 52 | self.action_dim = len(list(acts)) 53 | default = self.xml.find('default') 54 | range_set = False 55 | if default is not None: 56 | motor = default.find('motor') 57 | if motor is not None: 58 | ctrl = motor.get('ctrlrange') 59 | if ctrl: 60 | clow, chigh = list(map(float, ctrl.split())) 61 | high = chigh * np.ones(self.action_dim) 62 | low = clow * np.ones(self.action_dim) 63 | range_set = True 64 | if not range_set: 65 | high = np.inf * np.ones(self.action_dim) 66 | low = - high 67 | for i, motor in enumerate(list(acts)): 68 | ctrl = motor.get('ctrlrange') 69 | if ctrl: 70 | clow, chigh = list(map(float, ctrl.split())) 71 | high[i] = chigh 72 | low[i] = clow 73 | self._low = low 74 | self._high = high 75 | self.action_space = Box(low, high) 76 | 77 
| # @property 78 | # def observation_space(self): 79 | # return self.observation_space 80 | # 81 | # @property 82 | # def action_space(self): 83 | # return self.action_space 84 | 85 | def in_scope(self, name): 86 | return name.startswith(six.b(self.scope)) 87 | 88 | def in_agent_scope(self, name, agent_id): 89 | return name.startswith(six.b('agent' + str(agent_id))) 90 | 91 | def _set_body(self): 92 | self.body_names = list_filter( 93 | lambda x: self.in_scope(x), 94 | self.env.model.body_names 95 | ) 96 | self.body_ids = [self.env.model.body_names.index(body) 97 | for body in self.body_names] 98 | self.body_dofnum = self.env.model.body_dofnum[self.body_ids] 99 | self.nv = self.body_dofnum.sum() 100 | self.body_dofadr = self.env.model.body_dofadr[self.body_ids] 101 | dof = list_filter(lambda x: x >= 0, self.body_dofadr) 102 | self.qvel_start_idx = int(dof[0]) 103 | last_dof_body_id = self.body_dofnum.shape[0] - 1 104 | while self.body_dofnum[last_dof_body_id] == 0: 105 | last_dof_body_id -= 1 106 | self.qvel_end_idx = int(dof[-1] + self.body_dofnum[last_dof_body_id]) 107 | 108 | 109 | def _set_joint(self): 110 | self.join_names = list_filter( 111 | lambda x: self.in_scope(x), self.env.model.joint_names 112 | ) 113 | self.joint_ids = [self.env.model.joint_names.index(body) 114 | for body in self.join_names] 115 | self.jnt_qposadr = self.env.model.jnt_qposadr[self.joint_ids] 116 | self.jnt_type = self.env.model.jnt_type[self.joint_ids] 117 | self.jnt_nqpos = [self.JNT_NPOS[int(j)] for j in self.jnt_type] 118 | self.nq = sum(self.jnt_nqpos) 119 | self.qpos_start_idx = int(self.jnt_qposadr[0]) 120 | self.qpos_end_idx = int(self.jnt_qposadr[-1] + self.jnt_nqpos[-1]) 121 | 122 | # self.jnt_dofadr = self.env.model.jnt_dofadr[self.joint_ids] 123 | # dof = list_filter(lambda x: x >= 0, self.jnt_dofadr) 124 | # self.qvel_start_idx = int(dof[0]) 125 | # last_dof_body_id = self.body_dofnum.shape[0] - 1 126 | # while self.body_dofnum[last_dof_body_id] == 0: 127 | # last_dof_body_id -= 1 128 | # self.qvel_end_idx = int(dof[-1] + self.body_dofnum[last_dof_body_id]) 129 | 130 | def _set_other_joint(self): 131 | self._other_qpos_idx = {} 132 | for i in range(self.n_agents): 133 | if i == self.id: continue 134 | other_join_names = list_filter( 135 | lambda x: self.in_agent_scope(x, i), self.env.model.joint_names 136 | ) 137 | other_joint_ids = [self.env.model.joint_names.index(body) 138 | for body in other_join_names] 139 | other_jnt_qposadr = self.env.model.jnt_qposadr[other_joint_ids] 140 | jnt_type = self.env.model.jnt_type[other_joint_ids] 141 | jnt_nqpos = [self.JNT_NPOS[int(j)] for j in jnt_type] 142 | nq = sum(jnt_nqpos) 143 | qpos_start_idx = int(other_jnt_qposadr[0]) 144 | qpos_end_idx = int(other_jnt_qposadr[-1] + jnt_nqpos[-1]) 145 | assert nq == qpos_end_idx - qpos_start_idx, (i, nq, qpos_start_idx, qpos_end_idx) 146 | self._other_qpos_idx[i] = (qpos_start_idx, qpos_end_idx) 147 | 148 | def get_other_agent_qpos(self): 149 | other_qpos = {} 150 | for i in range(self.n_agents): 151 | if i == self.id: continue 152 | startid, endid = self._other_qpos_idx[i] 153 | qpos = self.env.model.data.qpos[startid: endid] 154 | other_qpos[i] = qpos 155 | return other_qpos 156 | 157 | def before_step(self): 158 | raise NotImplementedError 159 | 160 | def after_step(self): 161 | raise NotImplementedError 162 | 163 | def _get_obs(self): 164 | raise NotImplementedError 165 | 166 | def get_body_com(self, body_name): 167 | assert self._env_init, "Env reference is not set" 168 | idx = 
self.body_ids[self.body_names.index(six.b(self.scope + '/' + body_name))] 169 | return self.env.model.data.com_subtree[idx] 170 | 171 | def get_cfrc_ext(self): 172 | assert self._env_init, "Env reference is not set" 173 | return self.env.model.data.cfrc_ext[self.body_ids] 174 | 175 | def depricated_get_qpos(self): 176 | qpos = np.zeros((self.nq, 1)) 177 | cnt = 0 178 | for j, start_idx in enumerate(self.jnt_qposadr): 179 | jlen = self.jnt_nqpos[j] 180 | qpos[cnt: cnt + jlen] = self.env.model.data.qpos[start_idx: start_idx + jlen] 181 | cnt += jlen 182 | return qpos 183 | 184 | def get_qpos(self): 185 | ''' 186 | Note: this relies on the qpos for one agent being contiguously located 187 | this is generally true, use depricated_get_qpos if not 188 | ''' 189 | return self.env.model.data.qpos[self.qpos_start_idx: self.qpos_end_idx] 190 | 191 | def get_other_qpos(self): 192 | ''' 193 | Note: this relies on the qpos for one agent being contiguously located 194 | this is generally true, use depricated_get_qpos if not 195 | ''' 196 | left_part = self.env.model.data.qpos[:self.qpos_start_idx] 197 | right_part = self.env.model.data.qpos[self.qpos_end_idx:] 198 | return np.concatenate((left_part, right_part), axis=0) 199 | 200 | def get_qvel(self): 201 | ''' 202 | Note: this relies on the qvel for one agent being contiguously located 203 | this is generally true, follow depricated_get_qpos if not 204 | ''' 205 | return self.env.model.data.qvel[self.qvel_start_idx: self.qvel_end_idx] 206 | 207 | def get_qfrc_actuator(self): 208 | return self.env.model.data.qfrc_actuator[self.qvel_start_idx: self.qvel_end_idx] 209 | 210 | def get_cvel(self): 211 | return self.env.model.data.cvel[self.body_ids] 212 | 213 | def get_body_mass(self): 214 | return self.env.model.body_mass[self.body_ids] 215 | 216 | def get_xipos(self): 217 | return self.env.model.data.xipos[self.body_ids] 218 | 219 | def get_cinert(self): 220 | return self.env.model.data.cinert[self.body_ids] 221 | 222 | def get_xmat(self): 223 | return self.env.model.data.xmat[self.body_ids] 224 | 225 | def get_torso_xmat(self): 226 | return self.env.model.data.xmat[self.body_ids[self.body_names.index(six.b('agent%d/torso' % self.id))]] 227 | 228 | # def get_ctrl(self): 229 | # return self.env.model.data.ctrl[self.joint_ids] 230 | 231 | def set_xyz(self, xyz): 232 | ''' 233 | Set (x, y, z) position of the agent any element can be None 234 | ''' 235 | assert any(xyz) 236 | start = self.qpos_start_idx 237 | qpos = self.env.model.data.qpos.flatten().copy() 238 | if xyz[0]: 239 | qpos[start] = xyz[0] 240 | if xyz[1]: 241 | qpos[start+1] = xyz[1] 242 | if xyz[2]: 243 | qpos[start+2] = xyz[2] 244 | qvel = self.env.model.data.qvel.flatten() 245 | self.env.set_state(qpos, qvel) 246 | 247 | def set_margin(self, margin): 248 | agent_geom_ids = [i for i, name in enumerate(self.env.model.geom_names) 249 | if self.in_scope(name)] 250 | m = self.env.model.geom_margin.copy() 251 | print("Resetting", self.scope, "margins to", margin) 252 | m[agent_geom_ids] = margin 253 | self.env.model.__setattr__('geom_margin', m) 254 | 255 | def reached_goal(self): 256 | ''' 257 | Override this 258 | ''' 259 | raise NotImplementedError 260 | 261 | def set_goal(self): 262 | ''' 263 | Override if needed, this called when initializing the agent 264 | and also if goal needs to be changed on reset 265 | ''' 266 | pass 267 | 268 | def reset_agent(self): 269 | '''Override this''' 270 | pass 271 | -------------------------------------------------------------------------------- 
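`Agent` above is the abstract base for every competitor: `set_env` wires the agent to the shared MuJoCo scene and resolves its scoped bodies and joints, `before_step`/`after_step` compute the per-step reward and termination, `_get_obs` builds the observation, and `set_goal`/`reached_goal`/`reset_agent` handle goal logic. The skeleton below is an illustrative sketch, not a file in this repository, showing the minimal overrides a new agent type would need; `Ant` and `Humanoid` in the files that follow are the real implementations.

```python
# Illustrative skeleton only -- not a file in this repository.
import numpy as np
from .agent import Agent


class MinimalAgent(Agent):
    '''Smallest useful Agent subclass: dense forward reward, simple observation.'''

    def set_goal(self, goal):
        self.GOAL = goal

    def before_step(self):
        # Cache whatever is needed to compute the step reward afterwards.
        self._x_before = float(self.get_qpos()[0])

    def after_step(self, action):
        x_after = float(self.get_qpos()[0])
        move_reward = x_after - self._x_before
        ctrl_cost = 0.1 * np.square(action).sum()
        reward = move_reward - ctrl_cost
        done = False
        return reward, done, {'reward_move': reward, 'reward_ctrl': ctrl_cost}

    def _get_obs(self):
        # Own generalized coordinates and velocities plus the other agents' qpos.
        return np.concatenate([self.get_qpos().flat,
                               self.get_qvel().flat,
                               self.get_other_qpos().flat])

    def reached_goal(self):
        return float(self.get_qpos()[0]) > self.GOAL
```

To be usable, such an agent would also need an entry in `MultiAgentEnv.AGENT_MAP` (shown in `multi_agent_env.py` below), which maps an agent name to its body XML and class.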
/gym-compete/gym_compete/new_envs/agents/ant.py: -------------------------------------------------------------------------------- 1 | from .agent import Agent 2 | from gym.spaces import Box 3 | import numpy as np 4 | 5 | class Ant(Agent): 6 | 7 | def __init__(self, agent_id, xml_path=None): 8 | if xml_path is None: 9 | xml_path = os.path.join(os.path.dirname(__file__), "assets", "ant_body.xml") 10 | super(Ant, self).__init__(agent_id, xml_path) 11 | 12 | def set_goal(self, goal): 13 | self.GOAL = goal 14 | self.move_left = False 15 | if self.get_qpos()[0] > 0: 16 | self.move_left = True 17 | 18 | def before_step(self): 19 | self._xposbefore = self.get_body_com("torso")[0] 20 | 21 | def after_step(self, action): 22 | xposafter = self.get_body_com("torso")[0] 23 | forward_reward = (xposafter - self._xposbefore) / self.env.dt 24 | if self.move_left: 25 | forward_reward *= -1 26 | ctrl_cost = .5 * np.square(action).sum() 27 | cfrc_ext = self.get_cfrc_ext() 28 | contact_cost = 0.5 * 1e-3 * np.sum( 29 | np.square(np.clip(cfrc_ext, -1, 1)) 30 | ) 31 | qpos = self.get_qpos() 32 | agent_standing = qpos[2] >= 0.28 33 | survive = 1.0 if agent_standing else -1. 34 | reward = forward_reward - ctrl_cost - contact_cost + survive 35 | 36 | reward_info = dict() 37 | reward_info['reward_forward'] = forward_reward 38 | reward_info['reward_ctrl'] = ctrl_cost 39 | reward_info['reward_contact'] = contact_cost 40 | reward_info['reward_survive'] = survive 41 | reward_info['reward_move'] = reward 42 | 43 | done = not agent_standing 44 | 45 | return reward, done, reward_info 46 | 47 | 48 | def _get_obs(self): 49 | ''' 50 | Return agent's observations 51 | ''' 52 | my_pos = self.get_qpos() 53 | other_pos = self.get_other_qpos() 54 | my_vel = self.get_qvel() 55 | cfrc_ext = np.clip(self.get_cfrc_ext(), -1, 1) 56 | 57 | obs = np.concatenate( 58 | [my_pos.flat, my_vel.flat, cfrc_ext.flat, 59 | other_pos.flat] 60 | ) 61 | return obs 62 | 63 | def set_observation_space(self): 64 | obs = self._get_obs() 65 | self.obs_dim = obs.size 66 | high = np.inf * np.ones(self.obs_dim) 67 | low = -high 68 | self.observation_space = Box(low, high) 69 | 70 | def reached_goal(self): 71 | xpos = self.get_body_com('torso')[0] 72 | if self.GOAL > 0 and xpos > self.GOAL: 73 | return True 74 | elif self.GOAL < 0 and xpos < self.GOAL: 75 | return True 76 | return False 77 | 78 | def reset_agent(self): 79 | xpos = self.get_qpos()[0] 80 | if xpos * self.GOAL > 0 : 81 | self.set_goal(-self.GOAL) 82 | if xpos > 0: 83 | self.move_left = True 84 | else: 85 | self.move_left = False 86 | -------------------------------------------------------------------------------- /gym-compete/gym_compete/new_envs/agents/ant_fighter.py: -------------------------------------------------------------------------------- 1 | from .agent import Agent 2 | from .ant import Ant 3 | from gym.spaces import Box 4 | import numpy as np 5 | import six 6 | 7 | 8 | def mass_center(mass, xpos): 9 | return (np.sum(mass * xpos, 0) / np.sum(mass))[0] 10 | 11 | 12 | class AntFighter(Ant): 13 | 14 | def __init__(self, agent_id, xml_path=None, team='ant'): 15 | super(AntFighter, self).__init__(agent_id, xml_path) 16 | self.team = team 17 | 18 | def before_step(self): 19 | pass 20 | 21 | def set_env(self, env): 22 | super(AntFighter, self).set_env(env) 23 | self.arena_id = self.env.model.geom_names.index(six.b('arena')) 24 | self.arena_height = self.env.model.geom_size[self.arena_id][1] * 2 25 | 26 | def after_step(self, action): 27 | ctrl_cost = .1 * np.square(action).sum() 28 | cfrc_ext 
= self.get_cfrc_ext() 29 | contact_cost = .5e-6 * np.square(cfrc_ext).sum() 30 | contact_cost = min(contact_cost, 10) 31 | qpos = self.get_qpos() 32 | center_reward = - np.sqrt(np.sum((0. - qpos[:2])**2)) 33 | agent_standing = qpos[2] - self.arena_height >= 0.28 34 | survive = 5.0 if agent_standing else -5. 35 | reward = center_reward - ctrl_cost - contact_cost + survive 36 | # reward = survive 37 | 38 | reward_info = dict() 39 | # reward_info['reward_forward'] = forward_reward 40 | reward_info['reward_center'] = center_reward 41 | reward_info['reward_ctrl'] = ctrl_cost 42 | reward_info['reward_contact'] = contact_cost 43 | reward_info['reward_survive'] = survive 44 | reward_info['reward_move'] = reward 45 | 46 | done = bool(qpos[2] - self.arena_height <= 0.28) 47 | 48 | return reward, done, reward_info 49 | 50 | def _get_obs(self): 51 | ''' 52 | Return agent's observations 53 | ''' 54 | qpos = self.get_qpos() 55 | my_pos = qpos 56 | other_qpos = self.get_other_qpos() 57 | other_pos = other_qpos 58 | other_relative_xy = other_qpos[:2] - qpos[:2] 59 | 60 | vel = self.get_qvel() 61 | cfrc_ext = np.clip(self.get_cfrc_ext(), -1, 1) 62 | 63 | torso_xmat = self.get_torso_xmat() 64 | 65 | obs = np.concatenate( 66 | [my_pos.flat, vel.flat, cfrc_ext.flat, 67 | other_pos.flat, other_relative_xy.flat, 68 | torso_xmat.flat 69 | ] 70 | ) 71 | assert np.isfinite(obs).all(), "Ant observation is not finite!!" 72 | return obs 73 | 74 | def reached_goal(self): 75 | return False 76 | -------------------------------------------------------------------------------- /gym-compete/gym_compete/new_envs/agents/humanoid.py: -------------------------------------------------------------------------------- 1 | from .agent import Agent 2 | from gym.spaces import Box 3 | import numpy as np 4 | 5 | 6 | def mass_center(mass, xpos): 7 | return (np.sum(mass * xpos, 0) / np.sum(mass))[0] 8 | 9 | 10 | class Humanoid(Agent): 11 | 12 | def __init__(self, agent_id, xml_path=None, **kwargs): 13 | if xml_path is None: 14 | xml_path = os.path.join(os.path.dirname(__file__), "assets", "humanoid_body.xml") 15 | super(Humanoid, self).__init__(agent_id, xml_path, **kwargs) 16 | self.team = 'walker' 17 | 18 | def set_goal(self, goal): 19 | self.GOAL = goal 20 | self.move_left = False 21 | if self.get_qpos()[0] > 0: 22 | self.move_left = True 23 | 24 | def before_step(self): 25 | self._pos_before = mass_center(self.get_body_mass(), self.get_xipos()) 26 | 27 | def after_step(self, action): 28 | pos_after = mass_center(self.get_body_mass(), self.get_xipos()) 29 | forward_reward = 0.25 * (pos_after - self._pos_before) / self.env.model.opt.timestep 30 | if self.move_left: 31 | forward_reward *= -1 32 | ctrl_cost = .1 * np.square(action).sum() 33 | cfrc_ext = self.get_cfrc_ext() 34 | contact_cost = .5e-6 * np.square(cfrc_ext).sum() 35 | contact_cost = min(contact_cost, 10) 36 | qpos = self.get_qpos() 37 | agent_standing = qpos[2] >= 1.0 38 | survive = 5.0 if agent_standing else -5. 
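# Total walker reward: forward progress toward the goal (sign-flipped when the
# agent must move left), minus quadratic control and capped contact costs,
# plus a +/-5 alive bonus; a goal-distance shaping term is added just below.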
39 | reward = forward_reward - ctrl_cost - contact_cost + survive 40 | reward_goal = - np.abs(np.asscalar(qpos[0]) - self.GOAL) 41 | reward += reward_goal 42 | 43 | reward_info = dict() 44 | reward_info['reward_forward'] = forward_reward 45 | reward_info['reward_ctrl'] = ctrl_cost 46 | reward_info['reward_contact'] = contact_cost 47 | reward_info['reward_survive'] = survive 48 | if self.team == 'walker': 49 | reward_info['reward_goal_dist'] = reward_goal 50 | reward_info['reward_move'] = reward 51 | 52 | # done = not agent_standing 53 | done = qpos[2] < 0.8 54 | 55 | return reward, done, reward_info 56 | 57 | 58 | def _get_obs(self): 59 | ''' 60 | Return agent's observations 61 | ''' 62 | my_pos = self.get_qpos() 63 | other_pos = self.get_other_qpos() 64 | vel = self.get_qvel() 65 | cfrc_ext = np.clip(self.get_cfrc_ext(), -1, 1) 66 | cvel = self.get_cvel() 67 | cinert = self.get_cinert() 68 | qfrc_actuator = self.get_qfrc_actuator() 69 | 70 | obs = np.concatenate( 71 | [my_pos.flat, vel.flat, 72 | cinert.flat, cvel.flat, 73 | qfrc_actuator.flat, cfrc_ext.flat, 74 | other_pos.flat] 75 | ) 76 | assert np.isfinite(obs).all(), "Humanoid observation is not finite!!" 77 | return obs 78 | 79 | def _get_obs_relative(self): 80 | ''' 81 | Return agent's observations, positions are relative 82 | ''' 83 | qpos = self.get_qpos() 84 | my_pos = qpos[2:] 85 | other_agents_qpos = self.get_other_agent_qpos() 86 | all_other_qpos = [] 87 | for i in range(self.n_agents): 88 | if i == self.id: continue 89 | other_qpos = other_agents_qpos[i] 90 | other_relative_xy = other_qpos[:2] - qpos[:2] 91 | other_qpos = np.concatenate([other_relative_xy.flat, other_qpos[2:].flat], axis=0) 92 | all_other_qpos.append(other_qpos) 93 | all_other_qpos = np.concatenate(all_other_qpos) 94 | 95 | vel = self.get_qvel() 96 | cfrc_ext = np.clip(self.get_cfrc_ext(), -1, 1) 97 | cvel = self.get_cvel() 98 | cinert = self.get_cinert() 99 | qfrc_actuator = self.get_qfrc_actuator() 100 | 101 | obs = np.concatenate( 102 | [my_pos.flat, vel.flat, 103 | cinert.flat, cvel.flat, 104 | qfrc_actuator.flat, cfrc_ext.flat, 105 | all_other_qpos.flat] 106 | ) 107 | assert np.isfinite(obs).all(), "Humanoid observation is not finite!!" 
108 | return obs 109 | 110 | def set_observation_space(self): 111 | obs = self._get_obs() 112 | self.obs_dim = obs.size 113 | high = np.inf * np.ones(self.obs_dim) 114 | low = -high 115 | self.observation_space = Box(low, high) 116 | 117 | def reached_goal(self): 118 | xpos = self.get_body_com('torso')[0] 119 | if self.GOAL > 0 and xpos > self.GOAL: 120 | return True 121 | elif self.GOAL < 0 and xpos < self.GOAL: 122 | return True 123 | return False 124 | 125 | def reset_agent(self): 126 | xpos = self.get_qpos()[0] 127 | if xpos * self.GOAL > 0 : 128 | self.set_goal(-self.GOAL) 129 | if xpos > 0: 130 | self.move_left = True 131 | else: 132 | self.move_left = False 133 | -------------------------------------------------------------------------------- /gym-compete/gym_compete/new_envs/agents/humanoid_blocker.py: -------------------------------------------------------------------------------- 1 | from .agent import Agent 2 | from .humanoid import Humanoid 3 | from gym.spaces import Box 4 | import numpy as np 5 | 6 | 7 | def mass_center(mass, xpos): 8 | return (np.sum(mass * xpos, 0) / np.sum(mass))[0] 9 | 10 | 11 | class HumanoidBlocker(Humanoid): 12 | 13 | def __init__(self, agent_id, xml_path=None, team='blocker'): 14 | super(HumanoidBlocker, self).__init__(agent_id, xml_path) 15 | self.team = team 16 | 17 | def before_step(self): 18 | pass 19 | 20 | def after_step(self, action): 21 | forward_reward = 0. 22 | ctrl_cost = .1 * np.square(action).sum() 23 | cfrc_ext = self.get_cfrc_ext() 24 | contact_cost = .5e-6 * np.square(cfrc_ext).sum() 25 | contact_cost = min(contact_cost, 10) 26 | qpos = self.get_qpos() 27 | agent_standing = qpos[2] >= 1.0 28 | survive = 5.0 if agent_standing else -5. 29 | reward = forward_reward - ctrl_cost - contact_cost + survive 30 | # reward = survive 31 | 32 | reward_info = dict() 33 | reward_info['reward_forward'] = forward_reward 34 | reward_info['reward_ctrl'] = ctrl_cost 35 | reward_info['reward_contact'] = contact_cost 36 | reward_info['reward_survive'] = survive 37 | reward_info['reward_move'] = reward 38 | 39 | done = bool(qpos[2] <= 0.5) 40 | 41 | return reward, done, reward_info 42 | 43 | def reached_goal(self): 44 | return False 45 | -------------------------------------------------------------------------------- /gym-compete/gym_compete/new_envs/agents/humanoid_fighter.py: -------------------------------------------------------------------------------- 1 | from .agent import Agent 2 | from .humanoid import Humanoid 3 | from gym.spaces import Box 4 | import numpy as np 5 | import six 6 | 7 | 8 | def mass_center(mass, xpos): 9 | return (np.sum(mass * xpos, 0) / np.sum(mass))[0] 10 | 11 | 12 | class HumanoidFighter(Humanoid): 13 | 14 | def __init__(self, agent_id, xml_path=None, team='blocker', **kwargs): 15 | super(HumanoidFighter, self).__init__(agent_id, xml_path, **kwargs) 16 | self.team = team 17 | 18 | def before_step(self): 19 | pass 20 | 21 | def set_env(self, env): 22 | super(HumanoidFighter, self).set_env(env) 23 | self.arena_id = self.env.model.geom_names.index(six.b('arena')) 24 | self.arena_height = self.env.model.geom_size[self.arena_id][1] * 2 25 | 26 | def after_step(self, action): 27 | action = np.clip(action, self.action_space.low, self.action_space.high) 28 | ctrl_cost = .1 * np.square(action).sum() 29 | cfrc_ext = self.get_cfrc_ext() 30 | contact_cost = .5e-6 * np.square(cfrc_ext).sum() 31 | contact_cost = min(contact_cost, 10) 32 | qpos = self.get_qpos() 33 | center_reward = - np.sqrt(np.sum((0. 
- qpos[:2])**2)) 34 | agent_standing = qpos[2] - self.arena_height >= 1.0 35 | survive = 5.0 if agent_standing else -5. 36 | reward = center_reward - ctrl_cost - contact_cost + survive 37 | # reward = survive 38 | 39 | reward_info = dict() 40 | # reward_info['reward_forward'] = forward_reward 41 | reward_info['reward_center'] = center_reward 42 | reward_info['reward_ctrl'] = ctrl_cost 43 | reward_info['reward_contact'] = contact_cost 44 | reward_info['reward_survive'] = survive 45 | reward_info['reward_move'] = reward 46 | 47 | done = bool(qpos[2] - self.arena_height <= 0.5) 48 | 49 | return reward, done, reward_info 50 | 51 | def _get_obs(self): 52 | ''' 53 | Return agent's observations 54 | ''' 55 | qpos = self.get_qpos() 56 | my_pos = qpos 57 | other_qpos = self.get_other_qpos() 58 | other_pos = other_qpos 59 | other_relative_xy = other_qpos[:2] - qpos[:2] 60 | 61 | vel = self.get_qvel() 62 | cfrc_ext = np.clip(self.get_cfrc_ext(), -1, 1) 63 | cvel = self.get_cvel() 64 | cinert = self.get_cinert() 65 | qfrc_actuator = self.get_qfrc_actuator() 66 | torso_xmat = self.get_torso_xmat() 67 | 68 | obs = np.concatenate( 69 | [my_pos.flat, vel.flat, 70 | cinert.flat, cvel.flat, 71 | qfrc_actuator.flat, cfrc_ext.flat, 72 | other_pos.flat, other_relative_xy.flat, 73 | torso_xmat.flat] 74 | ) 75 | assert np.isfinite(obs).all(), "Humanoid observation is not finite!!" 76 | return obs 77 | 78 | def reached_goal(self): 79 | return False 80 | -------------------------------------------------------------------------------- /gym-compete/gym_compete/new_envs/agents/humanoid_goalkeeper.py: -------------------------------------------------------------------------------- 1 | from .agent import Agent 2 | from .humanoid_kicker import HumanoidKicker 3 | from gym.spaces import Box 4 | import numpy as np 5 | 6 | 7 | def mass_center(mass, xpos): 8 | return (np.sum(mass * xpos, 0) / np.sum(mass))[0] 9 | 10 | 11 | class HumanoidGoalKeeper(HumanoidKicker): 12 | 13 | def __init__(self, agent_id, xml_path=None, team='blocker', goal_x=4, goal_y=3): 14 | super(HumanoidGoalKeeper, self).__init__(agent_id, xml_path) 15 | self.team = team 16 | self.GOAL_X = goal_x 17 | self.GOAL_Y = goal_y 18 | 19 | def before_step(self): 20 | pass 21 | 22 | def after_step(self, action): 23 | action = np.clip(action, self.action_space.low, self.action_space.high) 24 | forward_reward = 0. 25 | ctrl_cost = .1 * np.square(action).sum() 26 | cfrc_ext = self.get_cfrc_ext() 27 | contact_cost = .5e-6 * np.square(cfrc_ext).sum() 28 | contact_cost = min(contact_cost, 10) 29 | qpos = self.get_qpos() 30 | agent_standing = qpos[2] >= 1.0 31 | at_goal = (qpos[0] <= self.GOAL_X + 0.05) and (abs(qpos[1]) <= self.GOAL_Y) 32 | survive = 5.0 if agent_standing and at_goal else -5. 33 | reward = forward_reward - ctrl_cost - contact_cost + survive 34 | 35 | reward_info = dict() 36 | reward_info['reward_forward'] = forward_reward 37 | reward_info['reward_ctrl'] = ctrl_cost 38 | reward_info['reward_contact'] = contact_cost 39 | reward_info['reward_survive'] = survive 40 | reward_info['reward_move'] = reward 41 | # print("keeper", reward_info) 42 | 43 | done = bool(qpos[2] <= 0.5) 44 | 45 | return reward, done, reward_info 46 | 47 | def _get_obs(self): 48 | obs = super(HumanoidGoalKeeper, self)._get_obs() 49 | assert np.isfinite(obs).all(), "Humanoid Keeper observation is not finite!!" 
50 | return obs 51 | 52 | def reached_goal(self): 53 | return False 54 | 55 | def reset_agent(self): 56 | xpos = self.get_qpos()[0] 57 | # NOTE: for keeper Target is on same side as the agent 58 | if xpos * self.TARGET < 0 : 59 | self.set_goal(-self.TARGET) 60 | -------------------------------------------------------------------------------- /gym-compete/gym_compete/new_envs/agents/humanoid_kicker.py: -------------------------------------------------------------------------------- 1 | from .humanoid import Humanoid 2 | from gym.spaces import Box 3 | import numpy as np 4 | from .agent import Agent 5 | import six 6 | 7 | 8 | def mass_center(mass, xpos): 9 | return (np.sum(mass * xpos, 0) / np.sum(mass)) 10 | 11 | 12 | class HumanoidKicker(Humanoid): 13 | 14 | def __init__(self, agent_id, xml_path=None): 15 | if xml_path is None: 16 | xml_path = os.path.join(os.path.dirname(__file__), "assets", "humanoid_body.xml") 17 | super(HumanoidKicker, self).__init__(agent_id, xml_path) 18 | self.team = 'walker' 19 | self.TARGET = 4 if agent_id == 0 else -4 20 | self.TARGET_Y = 3 21 | 22 | def set_env(self, env): 23 | self.ball_jnt_id = env.model.joint_names.index(six.b('ball')) 24 | self.ball_jnt_nqpos = Agent.JNT_NPOS[int(env.model.jnt_type[self.ball_jnt_id])] 25 | super(HumanoidKicker, self).set_env(env) 26 | 27 | def get_ball_qpos(self): 28 | start_idx = int(self.env.model.jnt_qposadr[self.ball_jnt_id]) 29 | return self.env.model.data.qpos[start_idx:start_idx+self.ball_jnt_nqpos] 30 | 31 | def get_ball_qvel(self): 32 | start_idx = int(self.env.model.jnt_dofadr[self.ball_jnt_id]) 33 | # ball has 6 components: 3d translation, 3d rotational 34 | return self.env.model.data.qvel[start_idx:start_idx+6] 35 | 36 | def set_goal(self, goal): 37 | ball_ini_xyz = self.get_ball_qpos() 38 | self.GOAL = np.asscalar(ball_ini_xyz[0]) 39 | self.TARGET = goal 40 | self.move_left = False 41 | if self.get_qpos()[0] - self.GOAL > 0: 42 | self.move_left = True 43 | 44 | def after_step(self, action): 45 | action = np.clip(action, self.action_space.low, self.action_space.high) 46 | # print(action) 47 | _, done, rinfo = super(HumanoidKicker, self).after_step(action) 48 | ball_xy = self.get_ball_qpos()[:2] 49 | my_xy = self.get_qpos()[:2] 50 | ball_dist = np.sqrt(np.sum((my_xy - ball_xy)**2)) 51 | rinfo['reward_goal_dist'] = np.asscalar(ball_dist) 52 | reward = rinfo['reward_forward'] - rinfo['reward_ctrl'] - rinfo['reward_contact'] + rinfo['reward_survive'] - rinfo['reward_goal_dist'] 53 | rinfo['reward_move'] = reward 54 | assert np.isfinite(reward), (rinfo, action) 55 | return reward, done, rinfo 56 | 57 | def _get_obs(self): 58 | state = super(HumanoidKicker, self)._get_obs_relative() 59 | ball_xyz = self.get_ball_qpos()[:3] 60 | relative_xy = ball_xyz[:2] - self.get_qpos()[:2] 61 | relative_xyz = np.concatenate([relative_xy.flat, ball_xyz[2].flat]) 62 | 63 | ball_goal_dist = self.TARGET - ball_xyz[0] 64 | ball_qvel = self.get_ball_qvel()[:3] 65 | ball_goal_y_dist1 = np.asarray(self.TARGET_Y - ball_xyz[1]) 66 | ball_goal_y_dist2 = np.asarray(-self.TARGET_Y - ball_xyz[1]) 67 | 68 | obs = np.concatenate([state.flat, relative_xyz.flat, np.asarray(ball_goal_dist).flat, ball_goal_y_dist1.flat, ball_goal_y_dist2.flat]) 69 | assert np.isfinite(obs).all(), "Humanoid Kicker observation is not finite!!" 
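# Kicker observation layout: the relative humanoid state from _get_obs_relative,
# the ball position relative to the agent (x, y relative, z absolute), the
# ball's x-distance to the target goal line, and the ball's y-distances to the
# two goal-post limits (+TARGET_Y and -TARGET_Y).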
70 | return obs 71 | 72 | def reached_goal(self): 73 | return False 74 | 75 | def reset_agent(self): 76 | pass 77 | -------------------------------------------------------------------------------- /gym-compete/gym_compete/new_envs/assets/ant_body.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 | 32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 | 40 | 41 | 42 | 43 | 44 | 45 | 46 | 47 | 48 | 49 | 50 | 51 | 52 | 53 | 54 | 55 | 56 | 57 | 58 | 59 | 60 | 61 | 62 | 63 | 64 | 65 | 66 | 67 | 68 | -------------------------------------------------------------------------------- /gym-compete/gym_compete/new_envs/assets/humanoid_body.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 | 32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 | 40 | 41 | 42 | 43 | 44 | 45 | 46 | 47 | 48 | 49 | 50 | 51 | 52 | 53 | 54 | 55 | 56 | 57 | 58 | 59 | 60 | 61 | 62 | 63 | 64 | 65 | 66 | 67 | 68 | 69 | 70 | 71 | 72 | 73 | 74 | 75 | 76 | 77 | 78 | 79 | 80 | 81 | 82 | 83 | 84 | 85 | 86 | 87 | 88 | 89 | 90 | 91 | 92 | 93 | 94 | 95 | 96 | 97 | 98 | 99 | 100 | 101 | 102 | -------------------------------------------------------------------------------- /gym-compete/gym_compete/new_envs/assets/world_body.ant_body.ant_body.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 145 | -------------------------------------------------------------------------------- /gym-compete/gym_compete/new_envs/assets/world_body.humanoid_body.humanoid_body.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 210 | -------------------------------------------------------------------------------- /gym-compete/gym_compete/new_envs/assets/world_body.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 22 | -------------------------------------------------------------------------------- /gym-compete/gym_compete/new_envs/assets/world_body_arena.ant_body.ant_body.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 146 | -------------------------------------------------------------------------------- /gym-compete/gym_compete/new_envs/assets/world_body_arena.humanoid_body.humanoid_body.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 222 | -------------------------------------------------------------------------------- /gym-compete/gym_compete/new_envs/assets/world_body_arena.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 23 | -------------------------------------------------------------------------------- /gym-compete/gym_compete/new_envs/assets/world_body_football.humanoid_body.humanoid_body.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 215 | -------------------------------------------------------------------------------- /gym-compete/gym_compete/new_envs/assets/world_body_football.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 26 | -------------------------------------------------------------------------------- 
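The `world_body*.xml` assets above are pre-merged scenes that pair a world file with two agent body files, following the `world_body*.<agent>.<agent>.xml` naming used in the registrations; when no scene file exists, `MultiAgentEnv` stitches one together from the world and agent XMLs via `create_multiagent_xml`. As a minimal sketch (not part of the repository), the snippet below constructs such an environment directly with the same keyword arguments as the 'run-to-goal-ants-v0' registration in `gym_compete/__init__.py`; it assumes the package and mujoco-py 0.5.7 are installed and uses the standard Gym reset API.

```python
# Minimal sketch, not part of the repository: builds the same environment as
# the 'run-to-goal-ants-v0' registration, directly from the package.
import os

import gym_compete.new_envs as new_envs
from gym_compete.new_envs import MultiAgentEnv

scene_xml = os.path.join(os.path.dirname(new_envs.__file__), "assets",
                         "world_body.ant_body.ant_body.xml")
env = MultiAgentEnv(agent_names=['ant', 'ant'],
                    scene_xml_path=scene_xml,
                    init_pos=[(-1, 0, 0.75), (1, 0, 0.75)])
observations = env.reset()  # one observation per agent (Tuple observation space)
```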
/gym-compete/gym_compete/new_envs/kick_and_defend.py: -------------------------------------------------------------------------------- 1 | from .multi_agent_env import MultiAgentEnv 2 | import numpy as np 3 | from .agents import Agent 4 | import six 5 | from gym import spaces 6 | 7 | class KickAndDefend(MultiAgentEnv): 8 | def __init__(self, max_episode_steps=500, randomize_ball=True, **kwargs): 9 | super(KickAndDefend, self).__init__(**kwargs) 10 | self._max_episode_steps = max_episode_steps 11 | self._elapsed_steps = 0 12 | self.GOAL_REWARD = 1000 13 | self.ball_jnt_id = self.env_scene.model.joint_names.index(six.b('ball')) 14 | self.jnt_nqpos = Agent.JNT_NPOS[int(self.env_scene.model.jnt_type[self.ball_jnt_id])] 15 | if self.agents[0].team == 'walker': 16 | self.walker_id = 0 17 | self.blocker_id = 1 18 | else: 19 | self.walker_id = 1 20 | self.blocker_id = 0 21 | self.GOAL_X = self.agents[self.walker_id].TARGET 22 | # print("GOAL_X:", self.GOAL_X) 23 | self.GOAL_Y = 3 24 | self.randomize_ball = randomize_ball 25 | self.LIM_X = [(-3.5, -0.5), (1.6, 3.5)] 26 | # self.RANGE_X = self.LIM_X = [(-2, -2), (1.6, 3.5)] 27 | self.LIM_Y = [(-2, 2), (-2, 2)] 28 | self.RANGE_X = self.LIM_X.copy() 29 | self.RANGE_Y = self.LIM_Y.copy() 30 | self.BALL_LIM_X = (-2, 1) 31 | self.BALL_LIM_Y = (-4, 4) 32 | self.BALL_RANGE_X = self.BALL_LIM_X 33 | self.BALL_RANGE_Y = self.BALL_LIM_Y 34 | self.keeper_touched_ball = False 35 | 36 | def _past_limit(self): 37 | if self._max_episode_steps <= self._elapsed_steps: 38 | return True 39 | return False 40 | 41 | def get_ball_qpos(self): 42 | start_idx = int(self.env_scene.model.jnt_qposadr[self.ball_jnt_id]) 43 | return self.env_scene.model.data.qpos[start_idx:start_idx+self.jnt_nqpos] 44 | 45 | def get_ball_qvel(self): 46 | start_idx = int(self.env_scene.model.jnt_dofadr[self.ball_jnt_id]) 47 | # ball has 6 components: 3d translation, 3d rotational 48 | return self.env_scene.model.data.qvel[start_idx:start_idx+6] 49 | 50 | def get_ball_contacts(self, agent_id): 51 | mjcontacts = self.env_scene.data._wrapped.contents.contact 52 | ncon = self.env_scene.model.data.ncon 53 | contacts = [] 54 | for i in range(ncon): 55 | ct = mjcontacts[i] 56 | g1 , g2 = ct.geom1, ct.geom2 57 | g1 = self.env_scene.model.geom_names[g1] 58 | g2 = self.env_scene.model.geom_names[g2] 59 | if g1.find(six.b('ball')) >= 0: 60 | if g2.find(six.b('agent' + str(agent_id))) >= 0: 61 | if ct.dist < 0: 62 | contacts.append((g1, g2, ct.dist)) 63 | return contacts 64 | 65 | def _set_ball_xyz(self, xyz): 66 | start = int(self.env_scene.model.jnt_qposadr[self.ball_jnt_id]) 67 | qpos = self.env_scene.model.data.qpos.flatten().copy() 68 | qpos[start:start+3] = xyz 69 | qvel = self.env_scene.model.data.qvel.flatten() 70 | self.env_scene.set_state(qpos, qvel) 71 | 72 | def is_goal(self): 73 | ball_xyz = self.get_ball_qpos()[:3] 74 | if self.GOAL_X > 0 and ball_xyz[0] > self.GOAL_X and abs(ball_xyz[1]) <= self.GOAL_Y: 75 | return True 76 | elif self.GOAL_X < 0 and ball_xyz[0] < self.GOAL_X and abs(ball_xyz[1]) <= self.GOAL_Y: 77 | return True 78 | return False 79 | 80 | def goal_rewards(self, infos=None, agent_dones=None): 81 | self._elapsed_steps += 1 82 | # print(self._elapsed_steps, self.keeper_touched_ball) 83 | goal_rews = [0. 
for _ in range(self.n_agents)] 84 | ball_xyz = self.get_ball_qpos()[:3] 85 | done = self._past_limit() or (self.GOAL_X > 0 and ball_xyz[0] > self.GOAL_X) or (self.GOAL_X < 0 and ball_xyz[0] < self.GOAL_X) 86 | ball_vel = self.get_ball_qvel()[:3] 87 | if ball_vel[0] < 0 and np.linalg.norm(ball_vel) > 1: 88 | done = True 89 | # print("Keeper stopped ball, vel:", ball_vel) 90 | # agent_fallen = [self.agents[i].get_qpos()[2] < 0.5 for i in range(self.n_agents)] 91 | # import ipdb; ipdb.set_trace() 92 | ball_contacts = self.get_ball_contacts(self.blocker_id) 93 | if len(ball_contacts) > 0: 94 | # print("detected contacts for keeper:", ball_contacts) 95 | self.keeper_touched_ball = True 96 | if self.is_goal(): 97 | for i in range(self.n_agents): 98 | if self.agents[i].team == 'walker': 99 | goal_rews[i] += self.GOAL_REWARD 100 | infos[i]['winner'] = True 101 | else: 102 | goal_rews[i] -= self.GOAL_REWARD 103 | done = True 104 | elif done or all(agent_dones): 105 | for i in range(self.n_agents): 106 | if self.agents[i].team == 'walker': 107 | goal_rews[i] -= self.GOAL_REWARD 108 | else: 109 | goal_rews[i] += self.GOAL_REWARD 110 | infos[i]['winner'] = True 111 | if self.keeper_touched_ball: 112 | # ball contact bonus 113 | goal_rews[i] += 0.5 * self.GOAL_REWARD 114 | if self.agents[i].get_qpos()[2] > 0.8: 115 | # standing bonus 116 | goal_rews[i] += 0.5 * self.GOAL_REWARD 117 | else: 118 | keeper_penalty = False 119 | for i in range(self.n_agents): 120 | if self.agents[i].team == 'blocker': 121 | if np.abs(self.GOAL_X - self.agents[i].get_qpos()[0]) > 2.5: 122 | keeper_penalty = True 123 | # print("keeper x:", self.agents[i].get_qpos()[0], "goal_x:", self.GOAL_X) 124 | print("Keeper foul!") 125 | break 126 | if keeper_penalty: 127 | done = True 128 | for i in range(self.n_agents): 129 | if self.agents[i].team == 'blocker': 130 | goal_rews[i] -= self.GOAL_REWARD 131 | else: 132 | for i in range(self.n_agents): 133 | if self.agents[i].team == 'walker': 134 | # goal_rews[i] -= np.abs(ball_xyz[0] - self.GOAL_X) 135 | infos[i]['reward_move'] -= np.asscalar(np.abs(ball_xyz[0] - self.GOAL_X)) 136 | else: 137 | infos[i]['reward_move'] += np.asscalar(np.abs(ball_xyz[0] - self.GOAL_X)) 138 | # if len(ball_contacts) > 0: 139 | # # ball contact bonus 140 | # print("detected contacts for keeper:", ball_contacts) 141 | # goal_rews[i] += 0.5 * self.GOAL_REWARD 142 | return goal_rews, done 143 | 144 | def _set_ball_vel(self, vel_xyz): 145 | start = int(self.env_scene.model.jnt_dofadr[self.ball_jnt_id]) 146 | qvel = self.env_scene.model.data.qvel.flatten().copy() 147 | qvel[start:start+len(vel_xyz)] = vel_xyz 148 | qpos = self.env_scene.model.data.qpos.flatten() 149 | self.env_scene.set_state(qpos, qvel) 150 | 151 | def _set_random_ball_pos(self): 152 | x = np.random.uniform(*self.BALL_RANGE_X) 153 | y = np.random.uniform(*self.BALL_RANGE_Y) 154 | z = 0.35 155 | # print("setting ball to {}".format((x, y, z))) 156 | self._set_ball_xyz((x,y,z)) 157 | if self.get_ball_qvel()[0] < 0: 158 | self._set_ball_vel((0.1, 0.1, 0.1)) 159 | 160 | def _reset_range(self, version): 161 | decay_func = lambda x: 0.05 * np.exp(0.001 * x) 162 | v = decay_func(version) 163 | self.BALL_RANGE_X = (max(self.BALL_LIM_X[0], -v), min(self.BALL_LIM_X[1], v)) 164 | self.BALL_RANGE_Y = (max(self.BALL_LIM_Y[0], -v), min(self.BALL_LIM_Y[1], v)) 165 | self.RANGE_X[0] = (max(self.LIM_X[0][0], -2-v), min(self.LIM_X[0][1], -2+v)) 166 | self.RANGE_Y[0] = (max(self.LIM_Y[0][0], -v), min(self.LIM_Y[0][1], v)) 167 | self.RANGE_X[1] = 
(max(self.LIM_X[1][0], 2-v), min(self.LIM_X[1][1], 2+v)) 168 | self.RANGE_Y[1] = (max(self.LIM_Y[1][0], -v), min(self.LIM_Y[1][1], v)) 169 | # print(self.RANGE_X) 170 | # print(self.RANGE_Y) 171 | # print(self.BALL_RANGE_X) 172 | # print(self.BALL_RANGE_Y) 173 | 174 | def _reset(self, version=None): 175 | self._elapsed_steps = 0 176 | self.keeper_touched_ball = False 177 | _ = self.env_scene.reset() 178 | if version is not None: 179 | self._reset_range(version) 180 | for i in range(self.n_agents): 181 | x = np.random.uniform(*self.RANGE_X[i]) 182 | y = np.random.uniform(*self.RANGE_Y[i]) 183 | # print("setting agent {} to pos {}".format(i, (x,y))) 184 | self.agents[i].set_xyz((x, y, None)) 185 | self.agents[i].reset_agent() 186 | if self.randomize_ball: 187 | self._set_random_ball_pos() 188 | return self._get_obs() 189 | 190 | def reset(self, margins=None, version=None): 191 | _ = self._reset(version=version) 192 | if self.agents[0].team == 'walker': 193 | self.walker_id = 0 194 | self.blocker_id = 1 195 | else: 196 | self.walker_id = 1 197 | self.blocker_id = 0 198 | self.GOAL_X = self.agents[self.walker_id].TARGET 199 | if margins is not None: 200 | for i in range(self.n_agents): 201 | self.agents[i].set_margin(margins[i]) 202 | # print("GOAL_X:", self.GOAL_X) 203 | ob = self._get_obs() 204 | return ob 205 | -------------------------------------------------------------------------------- /gym-compete/gym_compete/new_envs/multi_agent_env.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from gym import Env, spaces 3 | from .multi_agent_scene import MultiAgentScene 4 | from .agents import * 5 | from .utils import create_multiagent_xml 6 | import os 7 | import six 8 | 9 | class MultiAgentEnv(Env): 10 | ''' 11 | A multi-agent environment consists of some number of Agent and 12 | a MultiAgentScene 13 | The supported agents and their classes are defined in 14 | AGENT_MAP, a dictionary mapping {agent_name: (xml_path, class)} 15 | Agents with initial x coordinate < 0 have goal on the right and 16 | vice versa 17 | ''' 18 | AGENT_MAP = { 19 | 'ant': ( 20 | os.path.join(os.path.dirname(__file__), "assets", "ant_body.xml"), 21 | Ant 22 | ), 23 | 'humanoid': ( 24 | os.path.join(os.path.dirname(__file__), "assets", "humanoid_body.xml"), 25 | Humanoid 26 | ), 27 | 'humanoid_blocker': ( 28 | os.path.join(os.path.dirname(__file__), "assets", "humanoid_body.xml"), 29 | HumanoidBlocker 30 | ), 31 | 'humanoid_fighter': ( 32 | os.path.join(os.path.dirname(__file__), "assets", "humanoid_body.xml"), 33 | HumanoidFighter 34 | ), 35 | 'ant_fighter': ( 36 | os.path.join(os.path.dirname(__file__), "assets", "ant_body.xml"), 37 | AntFighter 38 | ), 39 | 'humanoid_kicker': ( 40 | os.path.join(os.path.dirname(__file__), "assets", "humanoid_body.xml"), 41 | HumanoidKicker 42 | ), 43 | 'humanoid_goalkeeper': ( 44 | os.path.join(os.path.dirname(__file__), "assets", "humanoid_body.xml"), 45 | HumanoidGoalKeeper 46 | ), 47 | } 48 | WORLD_XML = os.path.join(os.path.dirname(__file__), "assets", "world_body.xml") 49 | GOAL_REWARD = 1000 50 | 51 | def __init__( 52 | self, agent_names, 53 | world_xml_path=WORLD_XML, agent_map=AGENT_MAP, 54 | scene_xml_path=None, move_reward_weight=1.0, 55 | init_pos=None, rgb=None, agent_args=None 56 | ): 57 | ''' 58 | agent_args is a list of kwargs for each agent 59 | ''' 60 | self.n_agents = len(agent_names) 61 | self.agents = {} 62 | all_agent_xml_paths = [] 63 | if not agent_args: 64 | agent_args = [{} for _ in 
range(self.n_agents)] 65 | assert len(agent_args) == self.n_agents, "Incorrect length of agent_args" 66 | for i, name in enumerate(agent_names): 67 | print("Creating agent", name) 68 | agent_xml_path, agent_class = agent_map[name] 69 | self.agents[i] = agent_class(i, agent_xml_path, **agent_args[i]) 70 | all_agent_xml_paths.append(agent_xml_path) 71 | agent_scopes = ['agent' + str(i) for i in range(self.n_agents)] 72 | # print(scene_xml_path) 73 | if scene_xml_path is not None and os.path.exists(scene_xml_path): 74 | self._env_xml_path = scene_xml_path 75 | else: 76 | print("Creating Scene XML") 77 | print(init_pos) 78 | _, self._env_xml_path = create_multiagent_xml( 79 | world_xml_path, all_agent_xml_paths, agent_scopes, 80 | # outdir=os.path.join(os.path.dirname(__file__), "assets"), 81 | outpath=scene_xml_path, 82 | ini_pos=init_pos, rgb=rgb 83 | ) 84 | print("Scene XML path:", self._env_xml_path) 85 | self.env_scene = MultiAgentScene(self._env_xml_path, self.n_agents) 86 | print("Created Scene with agents") 87 | for i, agent in self.agents.items(): 88 | agent.set_env(self.env_scene) 89 | self._set_observation_space() 90 | self._set_action_space() 91 | self.metadata = self.env_scene.metadata 92 | self.move_reward_weight = move_reward_weight 93 | gid = self.env_scene.model.geom_names.index(six.b('rightgoal')) 94 | self.RIGHT_GOAL = self.env_scene.model.geom_pos[gid][0] 95 | gid = self.env_scene.model.geom_names.index(six.b('leftgoal')) 96 | self.LEFT_GOAL = self.env_scene.model.geom_pos[gid][0] 97 | for i in range(self.n_agents): 98 | if self.agents[i].get_qpos()[0] > 0: 99 | self.agents[i].set_goal(self.LEFT_GOAL) 100 | else: 101 | self.agents[i].set_goal(self.RIGHT_GOAL) 102 | 103 | def _set_observation_space(self): 104 | self.observation_space = spaces.Tuple( 105 | [self.agents[i].observation_space for i in range(self.n_agents)] 106 | ) 107 | 108 | def _set_action_space(self): 109 | self.action_space = spaces.Tuple( 110 | [self.agents[i].action_space for i in range(self.n_agents)] 111 | ) 112 | 113 | def goal_rewards(self, infos=None, agent_dones=None): 114 | touchdowns = [self.agents[i].reached_goal() 115 | for i in range(self.n_agents)] 116 | num_reached_goal = sum(touchdowns) 117 | goal_rews = [0. 
for _ in range(self.n_agents)] 118 | if num_reached_goal != 1: 119 | return goal_rews, num_reached_goal > 0 120 | for i in range(self.n_agents): 121 | if touchdowns[i]: 122 | goal_rews[i] = self.GOAL_REWARD 123 | if infos: 124 | infos[i]['winner'] = True 125 | else: 126 | goal_rews[i] = - self.GOAL_REWARD 127 | return goal_rews, True 128 | 129 | def _get_done(self, dones, game_done): 130 | done = np.all(dones) 131 | done = game_done or not np.isfinite(self.state_vector()).all() or done 132 | dones = tuple(done for _ in range(self.n_agents)) 133 | return dones 134 | 135 | def _step(self, actions): 136 | for i in range(self.n_agents): 137 | self.agents[i].before_step() 138 | self.env_scene.simulate(actions) 139 | move_rews = [] 140 | infos = [] 141 | dones = [] 142 | for i in range(self.n_agents): 143 | move_r, agent_done, rinfo = self.agents[i].after_step(actions[i]) 144 | move_rews.append(move_r) 145 | dones.append(agent_done) 146 | rinfo['agent_done'] = agent_done 147 | infos.append(rinfo) 148 | goal_rews, game_done = self.goal_rewards(infos=infos, agent_dones=dones) 149 | rews = [] 150 | for i, info in enumerate(infos): 151 | info['reward_remaining'] = float(goal_rews[i]) 152 | rews.append(float(goal_rews[i] + self.move_reward_weight * move_rews[i])) 153 | rews = tuple(rews) 154 | done = self._get_done(dones, game_done) 155 | infos = tuple(infos) 156 | obses = self._get_obs() 157 | return obses, rews, done, infos 158 | 159 | def _get_obs(self): 160 | return tuple([self.agents[i]._get_obs() for i in range(self.n_agents)]) 161 | 162 | ''' 163 | Following remaps all mujoco-env calls to the scene 164 | ''' 165 | def _seed(self, seed=None): 166 | return self.env_scene._seed(seed) 167 | 168 | def _reset(self): 169 | # _ = self.env_scene._reset() 170 | ob = self.reset_model() 171 | return ob 172 | 173 | def set_state(self, qpos, qvel): 174 | self.env_scene.set_state(qpos, qvel) 175 | 176 | @property 177 | def dt(self): 178 | return self.env_scene.dt 179 | 180 | def _render(self, mode='human', close=False): 181 | return self.env_scene._render(mode, close) 182 | 183 | def state_vector(self): 184 | return self.env_scene.state_vector() 185 | 186 | def reset_model(self): 187 | # self.env_scene.reset_model() 188 | _ = self.env_scene.reset() 189 | for i in range(self.n_agents): 190 | self.agents[i].reset_agent() 191 | return self._get_obs() 192 | 193 | def viewer_setup(self): 194 | self.env_scene.viewer_setup() 195 | -------------------------------------------------------------------------------- /gym-compete/gym_compete/new_envs/multi_agent_scene.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from gym import utils 3 | from gym.envs.mujoco import mujoco_env 4 | from gym import spaces 5 | import os 6 | 7 | class MultiAgentScene(mujoco_env.MujocoEnv, utils.EzPickle): 8 | 9 | def __init__(self, xml_path, n_agents): 10 | self.n_agents = n_agents 11 | self._mujoco_init = False 12 | mujoco_env.MujocoEnv.__init__(self, xml_path, 5) 13 | self._mujoco_init = True 14 | utils.EzPickle.__init__(self) 15 | 16 | def simulate(self, actions): 17 | a = np.concatenate(actions, axis=0) 18 | self.do_simulation(a, self.frame_skip) 19 | 20 | def _step(self, actions): 21 | ''' 22 | Just to satisfy mujoco_init, should not be used 23 | ''' 24 | assert not self._mujoco_init, '_step should not be called on Scene' 25 | return self._get_obs(), 0, False, None 26 | 27 | def _get_obs(self): 28 | ''' 29 | Just to satisfy mujoco_init, should not be used 30 | ''' 31 | 
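        # _mujoco_init is still False while MujocoEnv.__init__ is running, so the
        # single call made during construction passes; any later call trips the
        # assert below, because observations are produced per agent, not by the scene.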
assert not self._mujoco_init, '_get_obs should not be called on Scene' 32 | obs = np.concatenate([ 33 | self.model.data.qpos.flat, 34 | self.model.data.qvel.flat 35 | ]) 36 | return obs 37 | 38 | def reset_model(self): 39 | qpos = self.init_qpos + self.np_random.uniform(size=self.model.nq, low=-.1, high=.1) 40 | qvel = self.init_qvel + self.np_random.randn(self.model.nv) * .1 41 | self.set_state(qpos, qvel) 42 | return None 43 | 44 | def viewer_setup(self): 45 | self.viewer.cam.trackbodyid = 0 46 | self.viewer.cam.distance = self.model.stat.extent * 0.55 47 | # self.viewer.cam.distance = self.model.stat.extent * 0.65 48 | # self.viewer.cam.distance = self.model.stat.extent * 1.5 49 | self.viewer.cam.lookat[2] += .8 50 | self.viewer.cam.elevation = -10 51 | # self.viewer.cam.distance = self.model.stat.extent * 0.4 52 | # self.viewer.cam.lookat[2] += 1.0 53 | # self.viewer.cam.elevation = -25 54 | # self.viewer.cam.azimuth = 0 if np.random.random() > 0.5 else 180 55 | self.viewer.cam.azimuth = 90 56 | # self.viewer.vopt.flags[8] = True 57 | # self.viewer.vopt.flags[9] = True 58 | rand = np.random.random() 59 | if rand < 0.33: 60 | self.viewer.cam.azimuth = 0 61 | elif 0.33 <= rand < 0.66: 62 | self.viewer.cam.azimuth = 90 63 | else: 64 | self.viewer.cam.azimuth = 180 65 | -------------------------------------------------------------------------------- /gym-compete/gym_compete/new_envs/multi_monitoring.py: -------------------------------------------------------------------------------- 1 | from gym.wrappers import Monitor 2 | 3 | class MultiMonitor(Monitor): 4 | 5 | def _before_step(self, action): 6 | return 7 | 8 | def _after_step(self, observation, reward, done, info): 9 | if not self.enabled: return done 10 | 11 | if done[0] and self.env_semantics_autoreset: 12 | # For envs with BlockingReset wrapping VNCEnv, this observation will be the first one of the new episode 13 | self._reset_video_recorder() 14 | self.episode_id += 1 15 | self._flush() 16 | 17 | # Record video 18 | self.video_recorder.capture_frame() 19 | 20 | return done 21 | -------------------------------------------------------------------------------- /gym-compete/gym_compete/new_envs/multi_time_limit.py: -------------------------------------------------------------------------------- 1 | import gym 2 | from gym.wrappers import TimeLimit 3 | 4 | class MultiTimeLimit(TimeLimit): 5 | 6 | def _step(self, action): 7 | assert self._episode_started_at is not None, "Cannot call env.step() before calling reset()" 8 | observation, reward, done, info = self.env.step(action) 9 | self._elapsed_steps += 1 10 | 11 | if self._past_limit(): 12 | if self.metadata.get('semantics.autoreset'): 13 | _ = self.reset() # automatically reset the env 14 | done = tuple([True for _ in range(len(observation))]) 15 | 16 | return observation, reward, done, info 17 | -------------------------------------------------------------------------------- /gym-compete/gym_compete/new_envs/sumo.py: -------------------------------------------------------------------------------- 1 | from .multi_agent_env import MultiAgentEnv 2 | import numpy as np 3 | from gym import spaces 4 | import six 5 | 6 | class SumoEnv(MultiAgentEnv): 7 | ''' 8 | ''' 9 | 10 | def __init__(self, max_episode_steps=500, min_radius=1, max_radius=3.5, **kwargs): 11 | super(SumoEnv, self).__init__(**kwargs) 12 | self._max_episode_steps = max_episode_steps 13 | self._elapsed_steps = 0 14 | self.GOAL_REWARD = 1000 15 | self.RADIUS = self.MAX_RADIUS = self.current_max_radius = max_radius 16 | 
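        # Arena-radius curriculum: _reset_radius() draws RADIUS uniformly from
        # [MIN_RADIUS, current_max_radius], and _reset_max_radius() lets
        # current_max_radius grow towards MAX_RADIUS with the training `version`
        # passed to reset().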
self.MIN_RADIUS = min_radius 17 | self.LIM_X = [(-2, 0), (0, 2)] 18 | self.LIM_Y = [(-2, 2), (-2, 2)] 19 | self.RANGE_X = self.LIM_X.copy() 20 | self.RANGE_Y = self.LIM_Y.copy() 21 | self.arena_id = self.env_scene.model.geom_names.index(six.b('arena')) 22 | self.arena_height = self.env_scene.model.geom_size[self.arena_id][1] * 2 23 | self._set_geom_radius() 24 | self.agent_contacts = False 25 | 26 | def _past_limit(self): 27 | if self._max_episode_steps <= self._elapsed_steps: 28 | return True 29 | return False 30 | 31 | def _past_arena(self, agent_id): 32 | xy = self.agents[agent_id].get_qpos()[:2] 33 | r = np.sum(xy ** 2) ** 0.5 34 | # print("Agent", agent_id, "at", r) 35 | if r > self.RADIUS: 36 | return True 37 | return False 38 | 39 | def _is_fallen(self, agent_id, limit=0.5): 40 | if self.agents[agent_id].team == 'ant': 41 | limit = 0.3 42 | limit = limit + self.arena_height 43 | return bool(self.agents[agent_id].get_qpos()[2] <= limit) 44 | 45 | def _is_standing(self, agent_id, limit=0.9): 46 | limit = limit + self.arena_height 47 | return bool(self.agents[agent_id].get_qpos()[2] > limit) 48 | 49 | def get_agent_contacts(self): 50 | mjcontacts = self.env_scene.data._wrapped.contents.contact 51 | ncon = self.env_scene.model.data.ncon 52 | contacts = [] 53 | for i in range(ncon): 54 | ct = mjcontacts[i] 55 | g1 , g2 = ct.geom1, ct.geom2 56 | g1 = self.env_scene.model.geom_names[g1] 57 | g2 = self.env_scene.model.geom_names[g2] 58 | if g1.find(six.b('agent')) >= 0 and g2.find(six.b('agent')) >= 0: 59 | if g1.find(six.b('agent0')) >= 0: 60 | if g2.find(six.b('agent1')) >= 0 and ct.dist < 0: 61 | contacts.append((g1, g2, ct.dist)) 62 | elif g1.find(six.b('agent1')) >= 0: 63 | if g2.find(six.b('agent0')) >= 0 and ct.dist < 0: 64 | contacts.append((g1, g2, ct.dist)) 65 | return contacts 66 | 67 | def goal_rewards(self, infos=None, agent_dones=None): 68 | self._elapsed_steps += 1 69 | goal_rews = [0. 
for _ in range(self.n_agents)] 70 | fallen = [self._is_fallen(i) 71 | for i in range(self.n_agents)] 72 | 73 | timeup = self._past_limit() 74 | past_arena = [self._past_arena(i) for i in range(self.n_agents)] 75 | done = False 76 | 77 | agent_contacts = self.get_agent_contacts() 78 | if len(agent_contacts) > 0: 79 | # print('Detected contacts:', agent_contacts) 80 | self.agent_contacts = True 81 | 82 | if any(fallen): 83 | done = True 84 | for j in range(self.n_agents): 85 | if fallen[j]: 86 | print('Agent', j, 'fallen') 87 | goal_rews[j] -= self.GOAL_REWARD 88 | elif self.agent_contacts: 89 | goal_rews[j] += self.GOAL_REWARD 90 | infos[j]['winner'] = True 91 | # import ipdb; ipdb.set_trace() 92 | elif any(past_arena): 93 | done = True 94 | for j in range(self.n_agents): 95 | if past_arena[j]: 96 | print('Agent', j, 'past arena') 97 | goal_rews[j] -= self.GOAL_REWARD 98 | elif self.agent_contacts: 99 | goal_rews[j] += self.GOAL_REWARD 100 | infos[j]['winner'] = True 101 | elif timeup: 102 | for j in range(self.n_agents): 103 | goal_rews[j] -= self.GOAL_REWARD 104 | 105 | done = timeup or done 106 | 107 | return goal_rews, done 108 | 109 | def _set_observation_space(self): 110 | ob_spaces_limits = [] 111 | # nextra = 3 + self.n_agents - 1 112 | nextra = 4 113 | for i in range(self.n_agents): 114 | s = self.agents[i].observation_space.shape[0] 115 | h = np.ones(s+nextra) * np.inf 116 | l = -h 117 | ob_spaces_limits.append((l, h)) 118 | self.observation_space = spaces.Tuple( 119 | [spaces.Box(l, h) for l,h in ob_spaces_limits] 120 | ) 121 | 122 | def _get_obs(self): 123 | obs = [] 124 | dists = [] 125 | for i in range(self.n_agents): 126 | xy = self.agents[i].get_qpos()[:2] 127 | r = np.sqrt(np.sum(xy**2)) 128 | d = self.RADIUS - r 129 | # print(r, d) 130 | dists.append(d) 131 | for i in range(self.n_agents): 132 | ob = self.agents[i]._get_obs() 133 | mydist = np.asarray(dists[i]) 134 | if self.n_agents == 1: 135 | other_dist = np.asarray(self.RADIUS) 136 | elif self.n_agents == 2: 137 | other_dist = np.asarray(dists[1-i]) 138 | else: 139 | other_dist = np.asarray([dists[j] for j in range(self.n_agents) if j != i]) 140 | ob = np.concatenate( 141 | [ob.flat, np.asarray(self.RADIUS).flat, 142 | mydist.flat, other_dist.flat, 143 | np.asarray(self._max_episode_steps - self._elapsed_steps).flat 144 | ] 145 | ) 146 | obs.append(ob) 147 | return tuple(obs) 148 | 149 | def _reset_max_radius(self, version): 150 | decay_func_r = lambda x: 0.1 * np.exp(0.001 * x) 151 | vr = decay_func_r(version) 152 | self.current_max_radius = min(self.MAX_RADIUS, self.MIN_RADIUS + vr) 153 | # print(self.current_max_radius) 154 | 155 | def _reset_radius(self): 156 | self.RADIUS = np.random.uniform(self.MIN_RADIUS, self.current_max_radius) 157 | # print('setting Radus to', self.RADIUS) 158 | 159 | def _set_geom_radius(self): 160 | gs = self.env_scene.model.geom_size.copy() 161 | gs[self.arena_id][0] = self.RADIUS 162 | self.env_scene.model.__setattr__('geom_size', gs) 163 | self.env_scene.model.forward() 164 | 165 | def _reset_agents(self): 166 | # set agent 0 167 | min_gap = 0.3 + self.MIN_RADIUS / 2 168 | for i in range(self.n_agents): 169 | if i % 2 == 0: 170 | x = np.random.uniform(-self.RADIUS + min_gap, -0.3) 171 | y_lim = np.sqrt(self.RADIUS**2 - x**2) 172 | y = np.random.uniform(-y_lim + min_gap, y_lim - min_gap) 173 | else: 174 | x = np.random.uniform(0.3, self.RADIUS - min_gap) 175 | y_lim = np.sqrt(self.RADIUS**2 - x**2) 176 | y = np.random.uniform(-y_lim + min_gap, y_lim - min_gap) 177 | 
self.agents[i].set_xyz((x,y,None)) 178 | # print('setting agent', i, 'at', (x,y)) 179 | 180 | def _reset(self, version=None): 181 | self._elapsed_steps = 0 182 | self.agent_contacts = False 183 | # self.RADIUS = self.START_RADIUS 184 | if version is not None: 185 | self._reset_max_radius(version) 186 | self._reset_radius() 187 | self._set_geom_radius() 188 | # print("here") 189 | _ = self.env_scene.reset() 190 | self._reset_agents() 191 | ob = self._get_obs() 192 | return ob 193 | 194 | def reset(self, margins=None, version=None): 195 | ob = self._reset(version=version) 196 | if margins: 197 | for i in range(self.n_agents): 198 | self.agents[i].set_margin(margins[i]) 199 | return ob 200 | -------------------------------------------------------------------------------- /gym-compete/gym_compete/new_envs/utils.py: -------------------------------------------------------------------------------- 1 | import xml.etree.ElementTree as ET 2 | import colorsys 3 | import numpy as np 4 | 5 | def list_filter(lambda_fn, iterable): 6 | return list(filter(lambda_fn, iterable)) 7 | 8 | def get_distinct_colors(n=2): 9 | ''' 10 | taken from: https://stackoverflow.com/a/876872 11 | ''' 12 | HSV_tuples = [(x*1.0/n, 0.5, 0.5) for x in range(n)] 13 | RGB_tuples = map(lambda x: colorsys.hsv_to_rgb(*x), HSV_tuples) 14 | return RGB_tuples 15 | 16 | def set_class(root, prop, agent_class): 17 | if root is None: 18 | return 19 | # root_class = root.get('class') 20 | if root.tag == prop: 21 | root.set('class', agent_class) 22 | children = list(root) 23 | for child in children: 24 | set_class(child, prop, agent_class) 25 | 26 | def set_geom_class(root, name): 27 | set_class(root, 'geom', name) 28 | 29 | def set_motor_class(root, name): 30 | set_class(root, 'motor', name) 31 | 32 | def add_prefix(root, prop, prefix, force_set=False): 33 | if root is None: 34 | return 35 | root_prop_val = root.get(prop) 36 | if root_prop_val is not None: 37 | root.set(prop, prefix + '/' + root_prop_val) 38 | elif force_set: 39 | root.set(prop, prefix + '/' + 'anon' + str(np.random.randint(1, 1e10))) 40 | children = list(root) 41 | for child in children: 42 | add_prefix(child, prop, prefix, force_set) 43 | 44 | 45 | def tuple_to_str(tp): 46 | return " ".join(map(str, tp)) 47 | 48 | 49 | def create_multiagent_xml( 50 | world_xml, all_agent_xmls, agent_scopes=None, 51 | outdir='/tmp/', outpath=None, ini_pos=None, rgb=None 52 | ): 53 | world = ET.parse(world_xml) 54 | world_root = world.getroot() 55 | world_default = world_root.find('default') 56 | world_body = world_root.find('worldbody') 57 | world_actuator = None 58 | world_tendons = None 59 | n_agents = len(all_agent_xmls) 60 | if rgb is None: 61 | rgb = get_distinct_colors(n_agents) 62 | RGB_tuples = list( 63 | map(lambda x: tuple_to_str(x), rgb) 64 | ) 65 | if agent_scopes is None: 66 | agent_scopes = ['agent' + str(i) for i in range(n_agents)] 67 | 68 | if ini_pos is None: 69 | ini_pos = [(-i, 0, 0.75) for i in np.linspace(-n_agents, n_agents, n_agents)] 70 | # ini_pos = list(map(lambda x: tuple_to_str(x), ini_pos)) 71 | 72 | for i in range(n_agents): 73 | agent_default = ET.SubElement( 74 | world_default, 'default', attrib={'class': agent_scopes[i]} 75 | ) 76 | rgba = RGB_tuples[i] + " 1" 77 | agent_xml = ET.parse(all_agent_xmls[i]) 78 | default = agent_xml.find('default') 79 | color_set = False 80 | for child in list(default): 81 | if child.tag == 'geom': 82 | child.set('rgba', rgba) 83 | color_set = True 84 | agent_default.append(child) 85 | if not color_set: 86 | agent_geom = 
ET.SubElement( 87 | agent_default, 'geom', 88 | attrib={'contype': '1', 'conaffinity': '1', 'rgba': rgba} 89 | ) 90 | 91 | agent_body = agent_xml.find('body') 92 | if agent_body.get('pos'): 93 | oripos = list(map(float, agent_body.get('pos').strip().split(" "))) 94 | # keep original y and z coordinates 95 | pos = list(ini_pos[i]) 96 | # pos[1] = oripos[1] 97 | # pos[2] = oripos[2] 98 | # print(tuple_to_str(pos)) 99 | agent_body.set('pos', tuple_to_str(pos)) 100 | # add class to all geoms 101 | set_geom_class(agent_body, agent_scopes[i]) 102 | # add prefix to all names, important to map joints 103 | add_prefix(agent_body, 'name', agent_scopes[i], force_set=True) 104 | # add aggent body to xml 105 | world_body.append(agent_body) 106 | # get agent actuators 107 | agent_actuator = agent_xml.find('actuator') 108 | # add same prefix to all motor joints 109 | add_prefix(agent_actuator, 'joint', agent_scopes[i]) 110 | add_prefix(agent_actuator, 'name', agent_scopes[i]) 111 | # add actuator 112 | set_motor_class(agent_actuator, agent_scopes[i]) 113 | if world_actuator is None: 114 | world_root.append(agent_actuator) 115 | world_actuator = world_root.find('actuator') 116 | # print(world_actuator) 117 | # print(ET.tostring(world_root)) 118 | else: 119 | for motor in list(agent_actuator): 120 | world_actuator.append(motor) 121 | # get agent tendons if exists 122 | agent_tendon = agent_xml.find('tendon') 123 | if agent_tendon: 124 | # add same prefix to all motor joints 125 | add_prefix(agent_tendon, 'joint', agent_scopes[i]) 126 | add_prefix(agent_tendon, 'name', agent_scopes[i]) 127 | # add tendon 128 | if world_tendons is None: 129 | world_root.append(agent_tendon) 130 | world_tendons = world_root.find('tendon') 131 | # print(world_actuator) 132 | # print(ET.tostring(world_root)) 133 | else: 134 | for tendon in list(agent_tendon): 135 | world_tendons.append(tendon) 136 | 137 | if outpath is None: 138 | outname = world_xml.split("/")[-1].split(".xml")[0] + '.' 
+ ".".join(map(lambda x: x.split("/")[-1].split(".xml")[0], all_agent_xmls)) + ".xml" 139 | outpath = outdir + '/' + outname 140 | 141 | world.write(outpath) 142 | return ET.tostring(world_root), outpath 143 | -------------------------------------------------------------------------------- /gym-compete/gym_compete/new_envs/you_shall_not_pass.py: -------------------------------------------------------------------------------- 1 | from .multi_agent_env import MultiAgentEnv 2 | import numpy as np 3 | 4 | class HumansBlockingEnv(MultiAgentEnv): 5 | ''' 6 | Two teams: walker and blocker 7 | Walker needs to reach the other end and bloker need to block them 8 | Rewards: 9 | Some Walker reaches end: 10 | walker which did touchdown: +1000 11 | all blockers: -1000 12 | No Walker reaches end: 13 | all walkers: -1000 14 | if blocker is standing: 15 | blocker gets +1000 16 | else: 17 | blocker gets 0 18 | NOTE: walker is fallen if z < 0.3 19 | ''' 20 | 21 | def __init__(self, max_episode_steps=500, **kwargs): 22 | super(HumansBlockingEnv, self).__init__(**kwargs) 23 | self._max_episode_steps = max_episode_steps 24 | self._elapsed_steps = 0 25 | self.GOAL_REWARD = 1000 26 | 27 | def _past_limit(self): 28 | if self._max_episode_steps <= self._elapsed_steps: 29 | return True 30 | return False 31 | 32 | def _is_standing(self, agent_id, limit=0.3): 33 | return bool(self.agents[agent_id].get_qpos()[2] > limit) 34 | 35 | def _get_done(self, dones, game_done): 36 | dones = tuple(game_done for _ in range(self.n_agents)) 37 | return dones 38 | 39 | def goal_rewards(self, infos=None, agent_dones=None): 40 | self._elapsed_steps += 1 41 | goal_rews = [0. for _ in range(self.n_agents)] 42 | touchdowns = [self.agents[i].reached_goal() 43 | for i in range(self.n_agents)] 44 | 45 | walkers_fallen = [not self._is_standing(i) 46 | for i in range(self.n_agents) 47 | if self.agents[i].team == 'walker'] 48 | 49 | # print(self._elapsed_steps, touchdowns, walkers_fallen) 50 | 51 | done = self._past_limit() or all(walkers_fallen) 52 | # print(self._elapsed_steps,touchdowns, walkers_fallen) 53 | if not any(touchdowns): 54 | all_walkers_fallen = all(walkers_fallen) 55 | # game_over = all_walkers_fallen 56 | for j in range(self.n_agents): 57 | if self.agents[j].team == 'blocker': 58 | # goal_rews[j] += -infos[1-j]['reward_goal_dist'] 59 | infos[j]['reward_move'] += -infos[1-j]['reward_goal_dist'] 60 | if all_walkers_fallen and self.agents[j].team == 'blocker': 61 | if self._is_standing(j): 62 | goal_rews[j] += self.GOAL_REWARD 63 | infos[j]['winner'] = True 64 | elif done: 65 | if self.agents[j].team == 'walker': 66 | goal_rews[j] -= self.GOAL_REWARD 67 | else: 68 | infos[j]['winner'] = True 69 | else: 70 | # some walker touched-down 71 | done = True 72 | for i in range(self.n_agents): 73 | if self.agents[i].team == 'walker': 74 | if touchdowns[i]: 75 | goal_rews[i] += self.GOAL_REWARD 76 | infos[i]['winner'] = True 77 | else: 78 | goal_rews[i] -= self.GOAL_REWARD 79 | 80 | # print(done, self._elapsed_steps, self._past_limit()) 81 | return goal_rews, done 82 | 83 | def _reset(self): 84 | self._elapsed_steps = 0 85 | ob = super(HumansBlockingEnv, self)._reset() 86 | return ob 87 | 88 | def reset(self, margins=None): 89 | ob = self._reset() 90 | if margins: 91 | for i in range(self.n_agents): 92 | self.agents[i].set_margin(margins[i]) 93 | return ob 94 | -------------------------------------------------------------------------------- /gym-compete/setup.py: 
-------------------------------------------------------------------------------- 1 | from setuptools import setup 2 | 3 | setup(name='gym_compete', 4 | version='0.0.1', 5 | install_requires=['gym'] # And any other dependencies foo needs 6 | ) 7 | -------------------------------------------------------------------------------- /main.py: -------------------------------------------------------------------------------- 1 | from policy import LSTMPolicy, MlpPolicyValue 2 | import gym 3 | import gym_compete 4 | import pickle 5 | import sys 6 | import argparse 7 | import tensorflow as tf 8 | import numpy as np 9 | 10 | def load_from_file(param_pkl_path): 11 | with open(param_pkl_path, 'rb') as f: 12 | params = pickle.load(f) 13 | return params 14 | 15 | def setFromFlat(var_list, flat_params): 16 | shapes = list(map(lambda x: x.get_shape().as_list(), var_list)) 17 | total_size = np.sum([int(np.prod(shape)) for shape in shapes]) 18 | theta = tf.placeholder(tf.float32, [total_size]) 19 | start = 0 20 | assigns = [] 21 | for (shape, v) in zip(shapes, var_list): 22 | size = int(np.prod(shape)) 23 | assigns.append(tf.assign(v, tf.reshape(theta[start:start + size], shape))) 24 | start += size 25 | op = tf.group(*assigns) 26 | tf.get_default_session().run(op, {theta: flat_params}) 27 | 28 | def run(config): 29 | if config.env == "kick-and-defend": 30 | env = gym.make("kick-and-defend-v0") 31 | policy_type = "lstm" 32 | elif config.env == "run-to-goal-humans": 33 | env = gym.make("run-to-goal-humans-v0") 34 | policy_type = "mlp" 35 | elif config.env == "run-to-goal-ants": 36 | env = gym.make("run-to-goal-ants-v0") 37 | policy_type = "mlp" 38 | elif config.env == "you-shall-not-pass": 39 | env = gym.make("you-shall-not-pass-humans-v0") 40 | policy_type = "mlp" 41 | elif config.env == "sumo-humans": 42 | env = gym.make("sumo-humans-v0") 43 | policy_type = "lstm" 44 | elif config.env == "sumo-ants": 45 | env = gym.make("sumo-ants-v0") 46 | policy_type = "lstm" 47 | else: 48 | print("unsupported environment") 49 | print("choose from: run-to-goal-humans, run-to-goal-ants, you-shall-not-pass, sumo-humans, sumo-ants, kick-and-defend") 50 | sys.exit() 51 | 52 | param_paths = config.param_paths 53 | 54 | tf_config = tf.ConfigProto( 55 | inter_op_parallelism_threads=1, 56 | intra_op_parallelism_threads=1) 57 | sess = tf.Session(config=tf_config) 58 | sess.__enter__() 59 | 60 | policy = [] 61 | for i in range(2): 62 | scope = "policy" + str(i) 63 | if policy_type == "lstm": 64 | policy.append(LSTMPolicy(scope=scope, reuse=False, 65 | ob_space=env.observation_space.spaces[i], 66 | ac_space=env.action_space.spaces[i], 67 | hiddens=[128, 128], normalize=True)) 68 | else: 69 | policy.append(MlpPolicyValue(scope=scope, reuse=False, 70 | ob_space=env.observation_space.spaces[i], 71 | ac_space=env.action_space.spaces[i], 72 | hiddens=[64, 64], normalize=True)) 73 | 74 | # initialize uninitialized variables 75 | sess.run(tf.variables_initializer(tf.global_variables())) 76 | 77 | params = [load_from_file(param_pkl_path=path) for path in param_paths] 78 | for i in range(len(policy)): 79 | setFromFlat(policy[i].get_variables(), params[i]) 80 | 81 | max_episodes = config.max_episodes 82 | num_episodes = 0 83 | nstep = 0 84 | total_reward = [0.0 for _ in range(len(policy))] 85 | total_scores = [0 for _ in range(len(policy))] 86 | # total_scores = np.asarray(total_scores) 87 | observation = env.reset() 88 | print("-"*5 + " Episode %d " % (num_episodes+1) + "-"*5) 89 | while num_episodes < max_episodes: 90 | env.render() 91 | 
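            # each policy's act() returns (action, info); the trailing [0] keeps
            # only the sampled action for env.step()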
action = tuple([policy[i].act(stochastic=True, observation=observation[i])[0] 92 | for i in range(len(policy))]) 93 | observation, reward, done, infos = env.step(action) 94 | nstep += 1 95 | for i in range(len(policy)): 96 | total_reward[i] += reward[i] 97 | if done[0]: 98 | num_episodes += 1 99 | draw = True 100 | for i in range(len(policy)): 101 | if 'winner' in infos[i]: 102 | draw = False 103 | total_scores[i] += 1 104 | print("Winner: Agent {}, Scores: {}, Total Episodes: {}".format(i, total_scores, num_episodes)) 105 | if draw: 106 | print("Game Tied: Agent {}, Scores: {}, Total Episodes: {}".format(i, total_scores, num_episodes)) 107 | observation = env.reset() 108 | nstep = 0 109 | total_reward = [0.0 for _ in range(len(policy))] 110 | for i in range(len(policy)): 111 | policy[i].reset() 112 | if num_episodes < max_episodes: 113 | print("-"*5 + "Episode %d" % (num_episodes+1) + "-"*5) 114 | 115 | if __name__ == "__main__": 116 | p = argparse.ArgumentParser(description="Environments for Multi-agent competition") 117 | p.add_argument("--env", default="sumo-humans", type=str, help="competitive environment: run-to-goal-humans, run-to-goal-ants, you-shall-not-pass, sumo-humans, sumo-ants, kick-and-defend") 118 | p.add_argument("--param-paths", nargs='+', required=True, type=str) 119 | p.add_argument("--max-episodes", default=10, help="max number of matches", type=int) 120 | 121 | config = p.parse_args() 122 | run(config) 123 | -------------------------------------------------------------------------------- /policy.py: -------------------------------------------------------------------------------- 1 | import tensorflow as tf 2 | import numpy as np 3 | import gym 4 | import logging 5 | import copy 6 | 7 | from tensorflow.contrib import layers 8 | 9 | 10 | class Policy(object): 11 | def reset(self, **kwargs): 12 | pass 13 | 14 | def act(self, observation): 15 | # should return act, info 16 | raise NotImplementedError() 17 | 18 | class RunningMeanStd(object): 19 | def __init__(self, scope="running", reuse=False, epsilon=1e-2, shape=()): 20 | with tf.variable_scope(scope, reuse=reuse): 21 | self._sum = tf.get_variable( 22 | dtype=tf.float32, 23 | shape=shape, 24 | initializer=tf.constant_initializer(0.0), 25 | name="sum", trainable=False) 26 | self._sumsq = tf.get_variable( 27 | dtype=tf.float32, 28 | shape=shape, 29 | initializer=tf.constant_initializer(epsilon), 30 | name="sumsq", trainable=False) 31 | self._count = tf.get_variable( 32 | dtype=tf.float32, 33 | shape=(), 34 | initializer=tf.constant_initializer(epsilon), 35 | name="count", trainable=False) 36 | self.shape = shape 37 | 38 | self.mean = tf.to_float(self._sum / self._count) 39 | var_est = tf.to_float(self._sumsq / self._count) - tf.square(self.mean) 40 | self.std = tf.sqrt(tf.maximum(var_est, 1e-2)) 41 | 42 | 43 | def dense(x, size, name, weight_init=None, bias=True): 44 | w = tf.get_variable(name + "/w", [x.get_shape()[1], size], initializer=weight_init) 45 | ret = tf.matmul(x, w) 46 | if bias: 47 | b = tf.get_variable(name + "/b", [size], initializer=tf.zeros_initializer()) 48 | return ret + b 49 | else: 50 | return ret 51 | 52 | def switch(condition, if_exp, else_exp): 53 | x_shape = copy.copy(if_exp.get_shape()) 54 | x = tf.cond(tf.cast(condition, 'bool'), 55 | lambda: if_exp, 56 | lambda: else_exp) 57 | x.set_shape(x_shape) 58 | return x 59 | 60 | 61 | class DiagonalGaussian(object): 62 | def __init__(self, mean, logstd): 63 | self.mean = mean 64 | self.logstd = logstd 65 | self.std = tf.exp(logstd) 66 | 67 | def 
sample(self): 68 | return self.mean + self.std * tf.random_normal(tf.shape(self.mean)) 69 | 70 | def mode(self): 71 | return self.mean 72 | 73 | 74 | class MlpPolicyValue(Policy): 75 | def __init__(self, scope, *, ob_space, ac_space, hiddens, convs=[], reuse=False, normalize=False): 76 | self.recurrent = False 77 | self.normalized = normalize 78 | self.zero_state = np.zeros(1) 79 | with tf.variable_scope(scope, reuse=reuse): 80 | self.scope = tf.get_variable_scope().name 81 | 82 | assert isinstance(ob_space, gym.spaces.Box) 83 | 84 | self.observation_ph = tf.placeholder(tf.float32, [None] + list(ob_space.shape), name="observation") 85 | self.stochastic_ph = tf.placeholder(tf.bool, (), name="stochastic") 86 | self.taken_action_ph = tf.placeholder(dtype=tf.float32, shape=[None, ac_space.shape[0]], name="taken_action") 87 | 88 | if self.normalized: 89 | if self.normalized != 'ob': 90 | self.ret_rms = RunningMeanStd(scope="retfilter") 91 | self.ob_rms = RunningMeanStd(shape=ob_space.shape, scope="obsfilter") 92 | 93 | obz = self.observation_ph 94 | if self.normalized: 95 | obz = tf.clip_by_value((self.observation_ph - self.ob_rms.mean) / self.ob_rms.std, -5.0, 5.0) 96 | 97 | last_out = obz 98 | for i, hid_size in enumerate(hiddens): 99 | last_out = tf.nn.tanh(dense(last_out, hid_size, "vffc%i" % (i + 1))) 100 | self.vpredz = dense(last_out, 1, "vffinal")[:, 0] 101 | 102 | self.vpred = self.vpredz 103 | if self.normalized and self.normalized != 'ob': 104 | self.vpred = self.vpredz * self.ret_rms.std + self.ret_rms.mean # raw = not standardized 105 | 106 | last_out = obz 107 | for i, hid_size in enumerate(hiddens): 108 | last_out = tf.nn.tanh(dense(last_out, hid_size, "polfc%i" % (i + 1))) 109 | mean = dense(last_out, ac_space.shape[0], "polfinal") 110 | logstd = tf.get_variable(name="logstd", shape=[1, ac_space.shape[0]], initializer=tf.zeros_initializer()) 111 | 112 | self.pd = DiagonalGaussian(mean, logstd) 113 | self.sampled_action = switch(self.stochastic_ph, self.pd.sample(), self.pd.mode()) 114 | 115 | def make_feed_dict(self, observation, taken_action): 116 | return { 117 | self.observation_ph: observation, 118 | self.taken_action_ph: taken_action 119 | } 120 | 121 | def act(self, observation, stochastic=True): 122 | outputs = [self.sampled_action, self.vpred] 123 | a, v = tf.get_default_session().run(outputs, { 124 | self.observation_ph: observation[None], 125 | self.stochastic_ph: stochastic}) 126 | return a[0], {'vpred': v[0]} 127 | 128 | def get_variables(self): 129 | return tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, self.scope) 130 | 131 | def get_trainable_variables(self): 132 | return tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, self.scope) 133 | 134 | 135 | class LSTMPolicy(Policy): 136 | def __init__(self, scope, *, ob_space, ac_space, hiddens, reuse=False, normalize=False): 137 | self.recurrent = True 138 | self.normalized = normalize 139 | with tf.variable_scope(scope, reuse=reuse): 140 | self.scope = tf.get_variable_scope().name 141 | 142 | assert isinstance(ob_space, gym.spaces.Box) 143 | 144 | self.observation_ph = tf.placeholder(tf.float32, [None, None] + list(ob_space.shape), name="observation") 145 | self.stochastic_ph = tf.placeholder(tf.bool, (), name="stochastic") 146 | self.taken_action_ph = tf.placeholder(dtype=tf.float32, shape=[None, None, ac_space.shape[0]], name="taken_action") 147 | 148 | if self.normalized: 149 | if self.normalized != 'ob': 150 | self.ret_rms = RunningMeanStd(scope="retfilter") 151 | self.ob_rms = 
RunningMeanStd(shape=ob_space.shape, scope="obsfilter") 152 | 153 | obz = self.observation_ph 154 | if self.normalized: 155 | obz = tf.clip_by_value((self.observation_ph - self.ob_rms.mean) / self.ob_rms.std, -5.0, 5.0) 156 | 157 | last_out = obz 158 | for hidden in hiddens[:-1]: 159 | last_out = tf.contrib.layers.fully_connected(last_out, hidden) 160 | self.zero_state = [] 161 | self.state_in_ph = [] 162 | self.state_out = [] 163 | cell = tf.contrib.rnn.BasicLSTMCell(hiddens[-1], reuse=reuse) 164 | size = cell.state_size 165 | self.zero_state.append(np.zeros(size.c, dtype=np.float32)) 166 | self.zero_state.append(np.zeros(size.h, dtype=np.float32)) 167 | self.state_in_ph.append(tf.placeholder(tf.float32, [None, size.c], name="lstmv_c")) 168 | self.state_in_ph.append(tf.placeholder(tf.float32, [None, size.h], name="lstmv_h")) 169 | initial_state = tf.contrib.rnn.LSTMStateTuple(self.state_in_ph[-2], self.state_in_ph[-1]) 170 | last_out, state_out = tf.nn.dynamic_rnn(cell, last_out, initial_state=initial_state, scope="lstmv") 171 | self.state_out.append(state_out) 172 | 173 | self.vpredz = tf.contrib.layers.fully_connected(last_out, 1, activation_fn=None)[:, :, 0] 174 | self.vpred = self.vpredz 175 | if self.normalized and self.normalized != 'ob': 176 | self.vpred = self.vpredz * self.ret_rms.std + self.ret_rms.mean # raw = not standardized 177 | 178 | last_out = obz 179 | for hidden in hiddens[:-1]: 180 | last_out = tf.contrib.layers.fully_connected(last_out, hidden) 181 | cell = tf.contrib.rnn.BasicLSTMCell(hiddens[-1], reuse=reuse) 182 | size = cell.state_size 183 | self.zero_state.append(np.zeros(size.c, dtype=np.float32)) 184 | self.zero_state.append(np.zeros(size.h, dtype=np.float32)) 185 | self.state_in_ph.append(tf.placeholder(tf.float32, [None, size.c], name="lstmp_c")) 186 | self.state_in_ph.append(tf.placeholder(tf.float32, [None, size.h], name="lstmp_h")) 187 | initial_state = tf.contrib.rnn.LSTMStateTuple(self.state_in_ph[-2], self.state_in_ph[-1]) 188 | last_out, state_out = tf.nn.dynamic_rnn(cell, last_out, initial_state=initial_state, scope="lstmp") 189 | self.state_out.append(state_out) 190 | 191 | mean = tf.contrib.layers.fully_connected(last_out, ac_space.shape[0], activation_fn=None) 192 | logstd = tf.get_variable(name="logstd", shape=[1, ac_space.shape[0]], initializer=tf.zeros_initializer()) 193 | 194 | self.pd = DiagonalGaussian(mean, logstd) 195 | self.sampled_action = switch(self.stochastic_ph, self.pd.sample(), self.pd.mode()) 196 | 197 | self.zero_state = np.array(self.zero_state) 198 | self.state_in_ph = tuple(self.state_in_ph) 199 | self.state = self.zero_state 200 | 201 | for p in self.get_trainable_variables(): 202 | tf.add_to_collection(tf.GraphKeys.REGULARIZATION_LOSSES, tf.reduce_sum(tf.square(p))) 203 | 204 | def make_feed_dict(self, observation, state_in, taken_action): 205 | return { 206 | self.observation_ph: observation, 207 | self.state_in_ph: list(np.transpose(state_in, (1, 0, 2))), 208 | self.taken_action_ph: taken_action 209 | } 210 | 211 | def act(self, observation, stochastic=True): 212 | outputs = [self.sampled_action, self.vpred, self.state_out] 213 | a, v, s = tf.get_default_session().run(outputs, { 214 | self.observation_ph: observation[None, None], 215 | self.state_in_ph: list(self.state[:, None, :]), 216 | self.stochastic_ph: stochastic}) 217 | self.state = [] 218 | for x in s: 219 | self.state.append(x.c[0]) 220 | self.state.append(x.h[0]) 221 | self.state = np.array(self.state) 222 | return a[0, 0], {'vpred': v[0, 0], 'state': self.state} 
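    # self.state stacks (c, h) of the value LSTM followed by (c, h) of the policy
    # LSTM, in the same order as self.state_in_ph, so act() can feed it straight
    # back in on the next call; reset() below zeroes it at episode boundaries.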
223 | 224 | def get_variables(self): 225 | return tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, self.scope) 226 | 227 | def get_trainable_variables(self): 228 | return tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, self.scope) 229 | 230 | def reset(self): 231 | self.state = self.zero_state 232 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | gym 2 | mujoco-py==0.5.7 3 | numpy 4 | tensorflow>=1.1.0 5 | --------------------------------------------------------------------------------
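For quick orientation, the environments above can also be stepped without any trained policy; every call returns per-agent tuples, mirroring the loop in `main.py`. A minimal sketch, assuming `gym-compete` has been installed and using the `sumo-ants-v0` id from `main.py`:

```python
# Minimal sketch: roll out one episode of sumo-ants-v0 with random actions.
# Assumes `pip install -e gym-compete` has been run.
import gym
import gym_compete  # noqa: F401 -- importing registers the competitive envs

env = gym.make("sumo-ants-v0")
observation = env.reset()               # tuple with one observation per agent
done = (False,) * len(observation)
while not done[0]:
    # action_space is a spaces.Tuple; sample one action per agent
    action = tuple(space.sample() for space in env.action_space.spaces)
    observation, reward, done, infos = env.step(action)

print("episode over, final per-agent rewards:", reward)
for i, info in enumerate(infos):
    if info.get('winner'):
        print("agent", i, "won")
```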