├── .gitignore
├── LICENSE.txt
├── README.md
├── bin
│   ├── __init__.py
│   └── interactive.py
├── make_env.py
├── multiagent
│   ├── __init__.py
│   ├── core.py
│   ├── environment.py
│   ├── multi_discrete.py
│   ├── policy.py
│   ├── rendering.py
│   ├── scenario.py
│   └── scenarios
│       ├── __init__.py
│       ├── simple.py
│       ├── simple_adversary.py
│       ├── simple_crypto.py
│       ├── simple_push.py
│       ├── simple_reference.py
│       ├── simple_speaker_listener.py
│       ├── simple_spread.py
│       ├── simple_tag.py
│       └── simple_world_comm.py
└── setup.py

/.gitignore:
--------------------------------------------------------------------------------
1 | __pycache__/
2 | *.egg-info/
3 | *.pyc
--------------------------------------------------------------------------------
/LICENSE.txt:
--------------------------------------------------------------------------------
1 | MIT License
2 | 
3 | Copyright (c) 2018 OpenAI
4 | 
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | **Status:** Archive (code is provided as-is, no updates expected)
2 | 
3 | # Maintained Fork
4 | 
5 | The maintained version of these environments, which includes numerous fixes, comprehensive documentation, support for installation via pip, and support for current versions of Python, is available in PettingZoo (https://github.com/Farama-Foundation/PettingZoo , https://pettingzoo.farama.org/environments/mpe/).
6 | 
7 | # Multi-Agent Particle Environment
8 | 
9 | A simple multi-agent particle world with a continuous observation and discrete action space, along with some basic simulated physics.
10 | Used in the paper [Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments](https://arxiv.org/pdf/1706.02275.pdf).
11 | 
12 | ## Getting started:
13 | 
14 | - To install, `cd` into the root directory and type `pip install -e .`
15 | 
16 | - To interactively view the moving-to-landmark scenario (see others in ./scenarios/):
17 | `bin/interactive.py --scenario simple.py`
18 | 
19 | - Known dependencies: Python (3.5.4), OpenAI gym (0.10.5), numpy (1.14.5), pyglet (1.5.27)
20 | 
21 | - To use the environments, look at the code for importing them in `make_env.py`. A minimal usage sketch is shown below.
22 | 
23 | ## Code structure
24 | 
25 | - `make_env.py`: contains code for importing a multiagent environment as an OpenAI Gym-like object.
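  A minimal usage sketch (not part of the original repo; it assumes the package was installed with `pip install -e .` together with the old Gym version listed above, and sets `SUPPRESS_MA_PROMPT=1` to skip the interactive warning prompt). The 5-d movement action layout `[no-op, +x, -x, +y, -y]` follows `_set_action()` in `environment.py`:

  ```python
  import os
  os.environ['SUPPRESS_MA_PROMPT'] = '1'  # skip the "press Enter to continue" prompt

  import numpy as np
  from make_env import make_env

  env = make_env('simple')   # single-agent debugging scenario
  obs_n = env.reset()
  for _ in range(25):
      # one 5-d movement action per agent: [no-op, +x, -x, +y, -y]
      act_n = [np.array([0.0, 1.0, 0.0, 0.0, 0.0]) for _ in range(env.n)]
      obs_n, reward_n, done_n, info_n = env.step(act_n)
  # env.render()  # optional; requires pyglet and a display
  ```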
26 | 
27 | - `./multiagent/environment.py`: contains code for environment simulation (interaction physics, `_step()` function, etc.)
28 | 
29 | - `./multiagent/core.py`: contains classes for various objects (Entities, Landmarks, Agents, etc.) that are used throughout the code.
30 | 
31 | - `./multiagent/rendering.py`: used for displaying agent behaviors on the screen.
32 | 
33 | - `./multiagent/policy.py`: contains code for an interactive policy based on keyboard input.
34 | 
35 | - `./multiagent/scenario.py`: contains the base scenario object that is extended for all scenarios.
36 | 
37 | - `./multiagent/scenarios/`: folder where the various scenarios/environments are stored. Scenario code consists of several functions:
38 |     1) `make_world()`: creates all of the entities that inhabit the world (landmarks, agents, etc.) and assigns their capabilities (whether they can communicate, or move, or both).
39 |        Called once at the beginning of each training session.
40 |     2) `reset_world()`: resets the world by assigning properties (position, color, etc.) to all entities in the world.
41 |        Called before every episode (including after `make_world()`, before the first episode).
42 |     3) `reward()`: defines the reward function for a given agent.
43 |     4) `observation()`: defines the observation space of a given agent.
44 |     5) (optional) `benchmark_data()`: provides diagnostic data for policies trained on the environment (e.g. evaluation metrics).
45 | 
46 | ### Creating new environments
47 | 
48 | You can create new scenarios by implementing the first 4 functions above (`make_world()`, `reset_world()`, `reward()`, and `observation()`).
49 | 
50 | ## List of environments
51 | 
52 | 
53 | | Env name in code (name in paper) | Communication? | Competitive? | Notes |
54 | | --- | --- | --- | --- |
55 | | `simple.py` | N | N | Single agent sees landmark position, rewarded based on how close it gets to the landmark. Not a multiagent environment -- used for debugging policies. |
56 | | `simple_adversary.py` (Physical deception) | N | Y | 1 adversary (red), N good agents (green), N landmarks (usually N=2). All agents observe the positions of landmarks and other agents. One landmark is the ‘target landmark’ (colored green). Good agents are rewarded based on how close one of them is to the target landmark, but negatively rewarded if the adversary is close to the target landmark. The adversary is rewarded based on how close it is to the target, but it doesn’t know which landmark is the target landmark. So good agents have to learn to ‘split up’ and cover all landmarks to deceive the adversary. |
57 | | `simple_crypto.py` (Covert communication) | Y | Y | Two good agents (alice and bob), one adversary (eve). Alice must send a private message to bob over a public channel. Alice and bob are rewarded based on how well bob reconstructs the message, but negatively rewarded if eve can reconstruct the message. Alice and bob have a private key (randomly generated at the beginning of each episode), which they must learn to use to encrypt the message. |
58 | | `simple_push.py` (Keep-away) | N | Y | 1 agent, 1 adversary, 1 landmark. The agent is rewarded based on its distance to the landmark. The adversary is rewarded if it is close to the landmark, and if the agent is far from the landmark. So the adversary learns to push the agent away from the landmark. |
59 | | `simple_reference.py` | Y | N | 2 agents, 3 landmarks of different colors. Each agent wants to get to its target landmark, which is known only by the other agent. Reward is collective. So agents have to learn to communicate the goal of the other agent, and navigate to their own landmark. This is the same as the simple_speaker_listener scenario where both agents are simultaneous speakers and listeners. |
60 | | `simple_speaker_listener.py` (Cooperative communication) | Y | N | Same as simple_reference, except one agent is the ‘speaker’ (gray) that does not move (it observes the goal of the other agent), and the other agent is the listener (it cannot speak, but must navigate to the correct landmark). |
61 | | `simple_spread.py` (Cooperative navigation) | N | N | N agents, N landmarks. Agents are rewarded based on how far any agent is from each landmark. Agents are penalized if they collide with other agents. So, agents have to learn to cover all the landmarks while avoiding collisions. |
62 | | `simple_tag.py` (Predator-prey) | N | Y | Predator-prey environment. Good agents (green) are faster and want to avoid being hit by adversaries (red). Adversaries are slower and want to hit good agents. Obstacles (large black circles) block the way. |
63 | | `simple_world_comm.py` | Y | Y | Environment seen in the video accompanying the paper. Same as simple_tag, except (1) there is food (small blue balls) that the good agents are rewarded for being near; (2) there are now ‘forests’ that hide the agents inside them from being seen from outside; (3) there is a ‘leader adversary’ that can see the agents at all times and can communicate with the other adversaries to help coordinate the chase. |
64 | 
65 | ## Paper citation
66 | 
67 | If you used this environment for your experiments or found it helpful, consider citing the following papers:
68 | 
69 | Environments in this repo:
70 | 
71 | @article{lowe2017multi,
72 |   title={Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments},
73 |   author={Lowe, Ryan and Wu, Yi and Tamar, Aviv and Harb, Jean and Abbeel, Pieter and Mordatch, Igor},
74 |   journal={Neural Information Processing Systems (NIPS)},
75 |   year={2017}
76 | }
77 | 
78 | 
79 | Original particle world environment:
80 | 
81 | @article{mordatch2017emergence,
82 |   title={Emergence of Grounded Compositional Language in Multi-Agent Populations},
83 |   author={Mordatch, Igor and Abbeel, Pieter},
84 |   journal={arXiv preprint arXiv:1703.04908},
85 |   year={2017}
86 | }
87 | 
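## Example: a minimal custom scenario

As a concrete companion to the "Creating new environments" section above, the sketch below shows the shape of a new scenario file. It is only an illustration modeled on `simple.py` (the file name `my_scenario.py` and the choice of two agents sharing one landmark are made up for the example); such a file would live in `multiagent/scenarios/`:

```python
# multiagent/scenarios/my_scenario.py  (hypothetical file name)
import numpy as np
from multiagent.core import World, Agent, Landmark
from multiagent.scenario import BaseScenario


class Scenario(BaseScenario):
    def make_world(self):
        world = World()
        # two movable, non-communicating agents
        world.agents = [Agent() for _ in range(2)]
        for i, agent in enumerate(world.agents):
            agent.name = 'agent %d' % i
            agent.collide = False
            agent.silent = True
        # one static landmark
        world.landmarks = [Landmark() for _ in range(1)]
        for i, landmark in enumerate(world.landmarks):
            landmark.name = 'landmark %d' % i
            landmark.collide = False
            landmark.movable = False
        # make initial conditions
        self.reset_world(world)
        return world

    def reset_world(self, world):
        for agent in world.agents:
            agent.color = np.array([0.25, 0.25, 0.75])
            agent.state.p_pos = np.random.uniform(-1, +1, world.dim_p)
            agent.state.p_vel = np.zeros(world.dim_p)
            agent.state.c = np.zeros(world.dim_c)
        for landmark in world.landmarks:
            landmark.color = np.array([0.75, 0.25, 0.25])
            landmark.state.p_pos = np.random.uniform(-1, +1, world.dim_p)
            landmark.state.p_vel = np.zeros(world.dim_p)

    def reward(self, agent, world):
        # negative squared distance from this agent to the landmark
        return -np.sum(np.square(agent.state.p_pos - world.landmarks[0].state.p_pos))

    def observation(self, agent, world):
        # own velocity plus landmark position in the agent's reference frame
        entity_pos = [l.state.p_pos - agent.state.p_pos for l in world.landmarks]
        return np.concatenate([agent.state.p_vel] + entity_pos)
```

Once such a file is placed in `multiagent/scenarios/`, it can be loaded by name through `make_env('my_scenario')` or viewed with `bin/interactive.py --scenario my_scenario.py`.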
88 | -------------------------------------------------------------------------------- /bin/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/openai/multiagent-particle-envs/83ba4d1aeb00282f7c4acd6912435b3ca642c227/bin/__init__.py -------------------------------------------------------------------------------- /bin/interactive.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | import os,sys 3 | sys.path.insert(1, os.path.join(sys.path[0], '..')) 4 | import argparse 5 | 6 | from multiagent.environment import MultiAgentEnv 7 | from multiagent.policy import InteractivePolicy 8 | import multiagent.scenarios as scenarios 9 | 10 | if __name__ == '__main__': 11 | # parse arguments 12 | parser = argparse.ArgumentParser(description=None) 13 | parser.add_argument('-s', '--scenario', default='simple.py', help='Path of the scenario Python script.') 14 | args = parser.parse_args() 15 | 16 | # load scenario from script 17 | scenario = scenarios.load(args.scenario).Scenario() 18 | # create world 19 | world = scenario.make_world() 20 | # create multiagent environment 21 | env = MultiAgentEnv(world, scenario.reset_world, scenario.reward, scenario.observation, info_callback=None, shared_viewer = False) 22 | # render call to create viewer window (necessary only for interactive policies) 23 | env.render() 24 | # create interactive policies for each agent 25 | policies = [InteractivePolicy(env,i) for i in range(env.n)] 26 | # execution loop 27 | obs_n = env.reset() 28 | while True: 29 | # query for action from each agent's policy 30 | act_n = [] 31 | for i, policy in enumerate(policies): 32 | act_n.append(policy.action(obs_n[i])) 33 | # step environment 34 | obs_n, reward_n, done_n, _ = env.step(act_n) 35 | # render all agent views 36 | env.render() 37 | # display rewards 38 | #for agent in env.world.agents: 39 | # print(agent.name + " reward: %0.3f" % env._get_reward(agent)) 40 | -------------------------------------------------------------------------------- /make_env.py: -------------------------------------------------------------------------------- 1 | """ 2 | Code for creating a multiagent environment with one of the scenarios listed 3 | in ./scenarios/. 4 | Can be called by using, for example: 5 | env = make_env('simple_speaker_listener') 6 | After producing the env object, can be used similarly to an OpenAI gym 7 | environment. 8 | 9 | A policy using this environment must output actions in the form of a list 10 | for all agents. Each element of the list should be a numpy array, 11 | of size (env.world.dim_p + env.world.dim_c, 1). Physical actions precede 12 | communication actions in this array. See environment.py for more details. 13 | """ 14 | 15 | def make_env(scenario_name, benchmark=False): 16 | ''' 17 | Creates a MultiAgentEnv object as env. This can be used similar to a gym 18 | environment by calling env.reset() and env.step(). 19 | Use env.render() to view the environment on the screen. 
20 | 21 | Input: 22 | scenario_name : name of the scenario from ./scenarios/ to be Returns 23 | (without the .py extension) 24 | benchmark : whether you want to produce benchmarking data 25 | (usually only done during evaluation) 26 | 27 | Some useful env properties (see environment.py): 28 | .observation_space : Returns the observation space for each agent 29 | .action_space : Returns the action space for each agent 30 | .n : Returns the number of Agents 31 | ''' 32 | from multiagent.environment import MultiAgentEnv 33 | import multiagent.scenarios as scenarios 34 | 35 | # load scenario from script 36 | scenario = scenarios.load(scenario_name + ".py").Scenario() 37 | # create world 38 | world = scenario.make_world() 39 | # create multiagent environment 40 | if benchmark: 41 | env = MultiAgentEnv(world, scenario.reset_world, scenario.reward, scenario.observation, scenario.benchmark_data) 42 | else: 43 | env = MultiAgentEnv(world, scenario.reset_world, scenario.reward, scenario.observation) 44 | return env 45 | -------------------------------------------------------------------------------- /multiagent/__init__.py: -------------------------------------------------------------------------------- 1 | import os 2 | import warnings 3 | 4 | from gym.envs.registration import register 5 | 6 | # Multiagent envs 7 | # ---------------------------------------- 8 | 9 | register( 10 | id='MultiagentSimple-v0', 11 | entry_point='multiagent.envs:SimpleEnv', 12 | # FIXME(cathywu) currently has to be exactly max_path_length parameters in 13 | # rllab run script 14 | max_episode_steps=100, 15 | ) 16 | 17 | register( 18 | id='MultiagentSimpleSpeakerListener-v0', 19 | entry_point='multiagent.envs:SimpleSpeakerListenerEnv', 20 | max_episode_steps=100, 21 | ) 22 | 23 | warnings.warn("This code base is no longer maintained, and is not expected to be maintained again in the future. \n" 24 | "For the past handful of years, these environments been maintained inside of PettingZoo (see " 25 | "https://pettingzoo.farama.org/environments/mpe/). \nThis maintained version includes documentation, " 26 | "support for the PettingZoo API, support for current versions of Python, numerous bug fixes, \n" 27 | "support for installation via pip, and numerous other large quality of life improvements. \nWe " 28 | "encourage researchers to switch to this maintained version for all purposes other than comparing " 29 | "to results run on this version of the environments. \n") 30 | 31 | if os.getenv('SUPPRESS_MA_PROMPT') != '1': 32 | input("Please read the raised warning, then press Enter to continue... 
(to suppress this prompt, please set the environment variable `SUPPRESS_MA_PROMPT=1`)\n") 33 | -------------------------------------------------------------------------------- /multiagent/core.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | # physical/external base state of all entites 4 | class EntityState(object): 5 | def __init__(self): 6 | # physical position 7 | self.p_pos = None 8 | # physical velocity 9 | self.p_vel = None 10 | 11 | # state of agents (including communication and internal/mental state) 12 | class AgentState(EntityState): 13 | def __init__(self): 14 | super(AgentState, self).__init__() 15 | # communication utterance 16 | self.c = None 17 | 18 | # action of the agent 19 | class Action(object): 20 | def __init__(self): 21 | # physical action 22 | self.u = None 23 | # communication action 24 | self.c = None 25 | 26 | # properties and state of physical world entity 27 | class Entity(object): 28 | def __init__(self): 29 | # name 30 | self.name = '' 31 | # properties: 32 | self.size = 0.050 33 | # entity can move / be pushed 34 | self.movable = False 35 | # entity collides with others 36 | self.collide = True 37 | # material density (affects mass) 38 | self.density = 25.0 39 | # color 40 | self.color = None 41 | # max speed and accel 42 | self.max_speed = None 43 | self.accel = None 44 | # state 45 | self.state = EntityState() 46 | # mass 47 | self.initial_mass = 1.0 48 | 49 | @property 50 | def mass(self): 51 | return self.initial_mass 52 | 53 | # properties of landmark entities 54 | class Landmark(Entity): 55 | def __init__(self): 56 | super(Landmark, self).__init__() 57 | 58 | # properties of agent entities 59 | class Agent(Entity): 60 | def __init__(self): 61 | super(Agent, self).__init__() 62 | # agents are movable by default 63 | self.movable = True 64 | # cannot send communication signals 65 | self.silent = False 66 | # cannot observe the world 67 | self.blind = False 68 | # physical motor noise amount 69 | self.u_noise = None 70 | # communication noise amount 71 | self.c_noise = None 72 | # control range 73 | self.u_range = 1.0 74 | # state 75 | self.state = AgentState() 76 | # action 77 | self.action = Action() 78 | # script behavior to execute 79 | self.action_callback = None 80 | 81 | # multi-agent world 82 | class World(object): 83 | def __init__(self): 84 | # list of agents and entities (can change at execution-time!) 
85 | self.agents = [] 86 | self.landmarks = [] 87 | # communication channel dimensionality 88 | self.dim_c = 0 89 | # position dimensionality 90 | self.dim_p = 2 91 | # color dimensionality 92 | self.dim_color = 3 93 | # simulation timestep 94 | self.dt = 0.1 95 | # physical damping 96 | self.damping = 0.25 97 | # contact response parameters 98 | self.contact_force = 1e+2 99 | self.contact_margin = 1e-3 100 | 101 | # return all entities in the world 102 | @property 103 | def entities(self): 104 | return self.agents + self.landmarks 105 | 106 | # return all agents controllable by external policies 107 | @property 108 | def policy_agents(self): 109 | return [agent for agent in self.agents if agent.action_callback is None] 110 | 111 | # return all agents controlled by world scripts 112 | @property 113 | def scripted_agents(self): 114 | return [agent for agent in self.agents if agent.action_callback is not None] 115 | 116 | # update state of the world 117 | def step(self): 118 | # set actions for scripted agents 119 | for agent in self.scripted_agents: 120 | agent.action = agent.action_callback(agent, self) 121 | # gather forces applied to entities 122 | p_force = [None] * len(self.entities) 123 | # apply agent physical controls 124 | p_force = self.apply_action_force(p_force) 125 | # apply environment forces 126 | p_force = self.apply_environment_force(p_force) 127 | # integrate physical state 128 | self.integrate_state(p_force) 129 | # update agent state 130 | for agent in self.agents: 131 | self.update_agent_state(agent) 132 | 133 | # gather agent action forces 134 | def apply_action_force(self, p_force): 135 | # set applied forces 136 | for i,agent in enumerate(self.agents): 137 | if agent.movable: 138 | noise = np.random.randn(*agent.action.u.shape) * agent.u_noise if agent.u_noise else 0.0 139 | p_force[i] = agent.action.u + noise 140 | return p_force 141 | 142 | # gather physical forces acting on entities 143 | def apply_environment_force(self, p_force): 144 | # simple (but inefficient) collision response 145 | for a,entity_a in enumerate(self.entities): 146 | for b,entity_b in enumerate(self.entities): 147 | if(b <= a): continue 148 | [f_a, f_b] = self.get_collision_force(entity_a, entity_b) 149 | if(f_a is not None): 150 | if(p_force[a] is None): p_force[a] = 0.0 151 | p_force[a] = f_a + p_force[a] 152 | if(f_b is not None): 153 | if(p_force[b] is None): p_force[b] = 0.0 154 | p_force[b] = f_b + p_force[b] 155 | return p_force 156 | 157 | # integrate physical state 158 | def integrate_state(self, p_force): 159 | for i,entity in enumerate(self.entities): 160 | if not entity.movable: continue 161 | entity.state.p_vel = entity.state.p_vel * (1 - self.damping) 162 | if (p_force[i] is not None): 163 | entity.state.p_vel += (p_force[i] / entity.mass) * self.dt 164 | if entity.max_speed is not None: 165 | speed = np.sqrt(np.square(entity.state.p_vel[0]) + np.square(entity.state.p_vel[1])) 166 | if speed > entity.max_speed: 167 | entity.state.p_vel = entity.state.p_vel / np.sqrt(np.square(entity.state.p_vel[0]) + 168 | np.square(entity.state.p_vel[1])) * entity.max_speed 169 | entity.state.p_pos += entity.state.p_vel * self.dt 170 | 171 | def update_agent_state(self, agent): 172 | # set communication state (directly for now) 173 | if agent.silent: 174 | agent.state.c = np.zeros(self.dim_c) 175 | else: 176 | noise = np.random.randn(*agent.action.c.shape) * agent.c_noise if agent.c_noise else 0.0 177 | agent.state.c = agent.action.c + noise 178 | 179 | # get collision forces for any contact 
between two entities 180 | def get_collision_force(self, entity_a, entity_b): 181 | if (not entity_a.collide) or (not entity_b.collide): 182 | return [None, None] # not a collider 183 | if (entity_a is entity_b): 184 | return [None, None] # don't collide against itself 185 | # compute actual distance between entities 186 | delta_pos = entity_a.state.p_pos - entity_b.state.p_pos 187 | dist = np.sqrt(np.sum(np.square(delta_pos))) 188 | # minimum allowable distance 189 | dist_min = entity_a.size + entity_b.size 190 | # softmax penetration 191 | k = self.contact_margin 192 | penetration = np.logaddexp(0, -(dist - dist_min)/k)*k 193 | force = self.contact_force * delta_pos / dist * penetration 194 | force_a = +force if entity_a.movable else None 195 | force_b = -force if entity_b.movable else None 196 | return [force_a, force_b] -------------------------------------------------------------------------------- /multiagent/environment.py: -------------------------------------------------------------------------------- 1 | import gym 2 | from gym import spaces 3 | from gym.envs.registration import EnvSpec 4 | import numpy as np 5 | from multiagent.multi_discrete import MultiDiscrete 6 | 7 | # environment for all agents in the multiagent world 8 | # currently code assumes that no agents will be created/destroyed at runtime! 9 | class MultiAgentEnv(gym.Env): 10 | metadata = { 11 | 'render.modes' : ['human', 'rgb_array'] 12 | } 13 | 14 | def __init__(self, world, reset_callback=None, reward_callback=None, 15 | observation_callback=None, info_callback=None, 16 | done_callback=None, shared_viewer=True): 17 | 18 | self.world = world 19 | self.agents = self.world.policy_agents 20 | # set required vectorized gym env property 21 | self.n = len(world.policy_agents) 22 | # scenario callbacks 23 | self.reset_callback = reset_callback 24 | self.reward_callback = reward_callback 25 | self.observation_callback = observation_callback 26 | self.info_callback = info_callback 27 | self.done_callback = done_callback 28 | # environment parameters 29 | self.discrete_action_space = True 30 | # if true, action is a number 0...N, otherwise action is a one-hot N-dimensional vector 31 | self.discrete_action_input = False 32 | # if true, even the action is continuous, action will be performed discretely 33 | self.force_discrete_action = world.discrete_action if hasattr(world, 'discrete_action') else False 34 | # if true, every agent has the same reward 35 | self.shared_reward = world.collaborative if hasattr(world, 'collaborative') else False 36 | self.time = 0 37 | 38 | # configure spaces 39 | self.action_space = [] 40 | self.observation_space = [] 41 | for agent in self.agents: 42 | total_action_space = [] 43 | # physical action space 44 | if self.discrete_action_space: 45 | u_action_space = spaces.Discrete(world.dim_p * 2 + 1) 46 | else: 47 | u_action_space = spaces.Box(low=-agent.u_range, high=+agent.u_range, shape=(world.dim_p,), dtype=np.float32) 48 | if agent.movable: 49 | total_action_space.append(u_action_space) 50 | # communication action space 51 | if self.discrete_action_space: 52 | c_action_space = spaces.Discrete(world.dim_c) 53 | else: 54 | c_action_space = spaces.Box(low=0.0, high=1.0, shape=(world.dim_c,), dtype=np.float32) 55 | if not agent.silent: 56 | total_action_space.append(c_action_space) 57 | # total action space 58 | if len(total_action_space) > 1: 59 | # all action spaces are discrete, so simplify to MultiDiscrete action space 60 | if all([isinstance(act_space, spaces.Discrete) for act_space in 
total_action_space]): 61 | act_space = MultiDiscrete([[0, act_space.n - 1] for act_space in total_action_space]) 62 | else: 63 | act_space = spaces.Tuple(total_action_space) 64 | self.action_space.append(act_space) 65 | else: 66 | self.action_space.append(total_action_space[0]) 67 | # observation space 68 | obs_dim = len(observation_callback(agent, self.world)) 69 | self.observation_space.append(spaces.Box(low=-np.inf, high=+np.inf, shape=(obs_dim,), dtype=np.float32)) 70 | agent.action.c = np.zeros(self.world.dim_c) 71 | 72 | # rendering 73 | self.shared_viewer = shared_viewer 74 | if self.shared_viewer: 75 | self.viewers = [None] 76 | else: 77 | self.viewers = [None] * self.n 78 | self._reset_render() 79 | 80 | def step(self, action_n): 81 | obs_n = [] 82 | reward_n = [] 83 | done_n = [] 84 | info_n = {'n': []} 85 | self.agents = self.world.policy_agents 86 | # set action for each agent 87 | for i, agent in enumerate(self.agents): 88 | self._set_action(action_n[i], agent, self.action_space[i]) 89 | # advance world state 90 | self.world.step() 91 | # record observation for each agent 92 | for agent in self.agents: 93 | obs_n.append(self._get_obs(agent)) 94 | reward_n.append(self._get_reward(agent)) 95 | done_n.append(self._get_done(agent)) 96 | 97 | info_n['n'].append(self._get_info(agent)) 98 | 99 | # all agents get total reward in cooperative case 100 | reward = np.sum(reward_n) 101 | if self.shared_reward: 102 | reward_n = [reward] * self.n 103 | 104 | return obs_n, reward_n, done_n, info_n 105 | 106 | def reset(self): 107 | # reset world 108 | self.reset_callback(self.world) 109 | # reset renderer 110 | self._reset_render() 111 | # record observations for each agent 112 | obs_n = [] 113 | self.agents = self.world.policy_agents 114 | for agent in self.agents: 115 | obs_n.append(self._get_obs(agent)) 116 | return obs_n 117 | 118 | # get info used for benchmarking 119 | def _get_info(self, agent): 120 | if self.info_callback is None: 121 | return {} 122 | return self.info_callback(agent, self.world) 123 | 124 | # get observation for a particular agent 125 | def _get_obs(self, agent): 126 | if self.observation_callback is None: 127 | return np.zeros(0) 128 | return self.observation_callback(agent, self.world) 129 | 130 | # get dones for a particular agent 131 | # unused right now -- agents are allowed to go beyond the viewing screen 132 | def _get_done(self, agent): 133 | if self.done_callback is None: 134 | return False 135 | return self.done_callback(agent, self.world) 136 | 137 | # get reward for a particular agent 138 | def _get_reward(self, agent): 139 | if self.reward_callback is None: 140 | return 0.0 141 | return self.reward_callback(agent, self.world) 142 | 143 | # set env action for a particular agent 144 | def _set_action(self, action, agent, action_space, time=None): 145 | agent.action.u = np.zeros(self.world.dim_p) 146 | agent.action.c = np.zeros(self.world.dim_c) 147 | # process action 148 | if isinstance(action_space, MultiDiscrete): 149 | act = [] 150 | size = action_space.high - action_space.low + 1 151 | index = 0 152 | for s in size: 153 | act.append(action[index:(index+s)]) 154 | index += s 155 | action = act 156 | else: 157 | action = [action] 158 | 159 | if agent.movable: 160 | # physical action 161 | if self.discrete_action_input: 162 | agent.action.u = np.zeros(self.world.dim_p) 163 | # process discrete action 164 | if action[0] == 1: agent.action.u[0] = -1.0 165 | if action[0] == 2: agent.action.u[0] = +1.0 166 | if action[0] == 3: agent.action.u[1] = -1.0 167 | 
if action[0] == 4: agent.action.u[1] = +1.0 168 | else: 169 | if self.force_discrete_action: 170 | d = np.argmax(action[0]) 171 | action[0][:] = 0.0 172 | action[0][d] = 1.0 173 | if self.discrete_action_space: 174 | agent.action.u[0] += action[0][1] - action[0][2] 175 | agent.action.u[1] += action[0][3] - action[0][4] 176 | else: 177 | agent.action.u = action[0] 178 | sensitivity = 5.0 179 | if agent.accel is not None: 180 | sensitivity = agent.accel 181 | agent.action.u *= sensitivity 182 | action = action[1:] 183 | if not agent.silent: 184 | # communication action 185 | if self.discrete_action_input: 186 | agent.action.c = np.zeros(self.world.dim_c) 187 | agent.action.c[action[0]] = 1.0 188 | else: 189 | agent.action.c = action[0] 190 | action = action[1:] 191 | # make sure we used all elements of action 192 | assert len(action) == 0 193 | 194 | # reset rendering assets 195 | def _reset_render(self): 196 | self.render_geoms = None 197 | self.render_geoms_xform = None 198 | 199 | # render environment 200 | def render(self, mode='human'): 201 | if mode == 'human': 202 | alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' 203 | message = '' 204 | for agent in self.world.agents: 205 | comm = [] 206 | for other in self.world.agents: 207 | if other is agent: continue 208 | if np.all(other.state.c == 0): 209 | word = '_' 210 | else: 211 | word = alphabet[np.argmax(other.state.c)] 212 | message += (other.name + ' to ' + agent.name + ': ' + word + ' ') 213 | print(message) 214 | 215 | for i in range(len(self.viewers)): 216 | # create viewers (if necessary) 217 | if self.viewers[i] is None: 218 | # import rendering only if we need it (and don't import for headless machines) 219 | #from gym.envs.classic_control import rendering 220 | from multiagent import rendering 221 | self.viewers[i] = rendering.Viewer(700,700) 222 | 223 | # create rendering geometry 224 | if self.render_geoms is None: 225 | # import rendering only if we need it (and don't import for headless machines) 226 | #from gym.envs.classic_control import rendering 227 | from multiagent import rendering 228 | self.render_geoms = [] 229 | self.render_geoms_xform = [] 230 | for entity in self.world.entities: 231 | geom = rendering.make_circle(entity.size) 232 | xform = rendering.Transform() 233 | if 'agent' in entity.name: 234 | geom.set_color(*entity.color, alpha=0.5) 235 | else: 236 | geom.set_color(*entity.color) 237 | geom.add_attr(xform) 238 | self.render_geoms.append(geom) 239 | self.render_geoms_xform.append(xform) 240 | 241 | # add geoms to viewer 242 | for viewer in self.viewers: 243 | viewer.geoms = [] 244 | for geom in self.render_geoms: 245 | viewer.add_geom(geom) 246 | 247 | results = [] 248 | for i in range(len(self.viewers)): 249 | from multiagent import rendering 250 | # update bounds to center around agent 251 | cam_range = 1 252 | if self.shared_viewer: 253 | pos = np.zeros(self.world.dim_p) 254 | else: 255 | pos = self.agents[i].state.p_pos 256 | self.viewers[i].set_bounds(pos[0]-cam_range,pos[0]+cam_range,pos[1]-cam_range,pos[1]+cam_range) 257 | # update geometry positions 258 | for e, entity in enumerate(self.world.entities): 259 | self.render_geoms_xform[e].set_translation(*entity.state.p_pos) 260 | # render to display or array 261 | results.append(self.viewers[i].render(return_rgb_array = mode=='rgb_array')) 262 | 263 | return results 264 | 265 | # create receptor field locations in local coordinate frame 266 | def _make_receptor_locations(self, agent): 267 | receptor_type = 'polar' 268 | range_min = 0.05 * 2.0 269 | range_max 
= 1.00 270 | dx = [] 271 | # circular receptive field 272 | if receptor_type == 'polar': 273 | for angle in np.linspace(-np.pi, +np.pi, 8, endpoint=False): 274 | for distance in np.linspace(range_min, range_max, 3): 275 | dx.append(distance * np.array([np.cos(angle), np.sin(angle)])) 276 | # add origin 277 | dx.append(np.array([0.0, 0.0])) 278 | # grid receptive field 279 | if receptor_type == 'grid': 280 | for x in np.linspace(-range_max, +range_max, 5): 281 | for y in np.linspace(-range_max, +range_max, 5): 282 | dx.append(np.array([x,y])) 283 | return dx 284 | 285 | 286 | # vectorized wrapper for a batch of multi-agent environments 287 | # assumes all environments have the same observation and action space 288 | class BatchMultiAgentEnv(gym.Env): 289 | metadata = { 290 | 'runtime.vectorized': True, 291 | 'render.modes' : ['human', 'rgb_array'] 292 | } 293 | 294 | def __init__(self, env_batch): 295 | self.env_batch = env_batch 296 | 297 | @property 298 | def n(self): 299 | return np.sum([env.n for env in self.env_batch]) 300 | 301 | @property 302 | def action_space(self): 303 | return self.env_batch[0].action_space 304 | 305 | @property 306 | def observation_space(self): 307 | return self.env_batch[0].observation_space 308 | 309 | def step(self, action_n, time): 310 | obs_n = [] 311 | reward_n = [] 312 | done_n = [] 313 | info_n = {'n': []} 314 | i = 0 315 | for env in self.env_batch: 316 | obs, reward, done, _ = env.step(action_n[i:(i+env.n)], time) 317 | i += env.n 318 | obs_n += obs 319 | # reward = [r / len(self.env_batch) for r in reward] 320 | reward_n += reward 321 | done_n += done 322 | return obs_n, reward_n, done_n, info_n 323 | 324 | def reset(self): 325 | obs_n = [] 326 | for env in self.env_batch: 327 | obs_n += env.reset() 328 | return obs_n 329 | 330 | # render environment 331 | def render(self, mode='human', close=True): 332 | results_n = [] 333 | for env in self.env_batch: 334 | results_n += env.render(mode, close) 335 | return results_n 336 | -------------------------------------------------------------------------------- /multiagent/multi_discrete.py: -------------------------------------------------------------------------------- 1 | # An old version of OpenAI Gym's multi_discrete.py. (Was getting affected by Gym updates) 2 | # (https://github.com/openai/gym/blob/1fb81d4e3fb780ccf77fec731287ba07da35eb84/gym/spaces/multi_discrete.py) 3 | 4 | import numpy as np 5 | 6 | import gym 7 | from gym.spaces import prng 8 | 9 | class MultiDiscrete(gym.Space): 10 | """ 11 | - The multi-discrete action space consists of a series of discrete action spaces with different parameters 12 | - It can be adapted to both a Discrete action space or a continuous (Box) action space 13 | - It is useful to represent game controllers or keyboards where each key can be represented as a discrete action space 14 | - It is parametrized by passing an array of arrays containing [min, max] for each discrete action space 15 | where the discrete action space can take any integers from `min` to `max` (both inclusive) 16 | Note: A value of 0 always need to represent the NOOP action. 17 | e.g. 
Nintendo Game Controller 18 | - Can be conceptualized as 3 discrete action spaces: 19 | 1) Arrow Keys: Discrete 5 - NOOP[0], UP[1], RIGHT[2], DOWN[3], LEFT[4] - params: min: 0, max: 4 20 | 2) Button A: Discrete 2 - NOOP[0], Pressed[1] - params: min: 0, max: 1 21 | 3) Button B: Discrete 2 - NOOP[0], Pressed[1] - params: min: 0, max: 1 22 | - Can be initialized as 23 | MultiDiscrete([ [0,4], [0,1], [0,1] ]) 24 | """ 25 | def __init__(self, array_of_param_array): 26 | self.low = np.array([x[0] for x in array_of_param_array]) 27 | self.high = np.array([x[1] for x in array_of_param_array]) 28 | self.num_discrete_space = self.low.shape[0] 29 | 30 | def sample(self): 31 | """ Returns a array with one sample from each discrete action space """ 32 | # For each row: round(random .* (max - min) + min, 0) 33 | random_array = prng.np_random.rand(self.num_discrete_space) 34 | return [int(x) for x in np.floor(np.multiply((self.high - self.low + 1.), random_array) + self.low)] 35 | def contains(self, x): 36 | return len(x) == self.num_discrete_space and (np.array(x) >= self.low).all() and (np.array(x) <= self.high).all() 37 | 38 | @property 39 | def shape(self): 40 | return self.num_discrete_space 41 | def __repr__(self): 42 | return "MultiDiscrete" + str(self.num_discrete_space) 43 | def __eq__(self, other): 44 | return np.array_equal(self.low, other.low) and np.array_equal(self.high, other.high) -------------------------------------------------------------------------------- /multiagent/policy.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from pyglet.window import key 3 | 4 | # individual agent policy 5 | class Policy(object): 6 | def __init__(self): 7 | pass 8 | def action(self, obs): 9 | raise NotImplementedError() 10 | 11 | # interactive policy based on keyboard input 12 | # hard-coded to deal only with movement, not communication 13 | class InteractivePolicy(Policy): 14 | def __init__(self, env, agent_index): 15 | super(InteractivePolicy, self).__init__() 16 | self.env = env 17 | # hard-coded keyboard events 18 | self.move = [False for i in range(4)] 19 | self.comm = [False for i in range(env.world.dim_c)] 20 | # register keyboard events with this environment's window 21 | env.viewers[agent_index].window.on_key_press = self.key_press 22 | env.viewers[agent_index].window.on_key_release = self.key_release 23 | 24 | def action(self, obs): 25 | # ignore observation and just act based on keyboard events 26 | if self.env.discrete_action_input: 27 | u = 0 28 | if self.move[0]: u = 1 29 | if self.move[1]: u = 2 30 | if self.move[2]: u = 4 31 | if self.move[3]: u = 3 32 | else: 33 | u = np.zeros(5) # 5-d because of no-move action 34 | if self.move[0]: u[1] += 1.0 35 | if self.move[1]: u[2] += 1.0 36 | if self.move[3]: u[3] += 1.0 37 | if self.move[2]: u[4] += 1.0 38 | if True not in self.move: 39 | u[0] += 1.0 40 | return np.concatenate([u, np.zeros(self.env.world.dim_c)]) 41 | 42 | # keyboard event callbacks 43 | def key_press(self, k, mod): 44 | if k==key.LEFT: self.move[0] = True 45 | if k==key.RIGHT: self.move[1] = True 46 | if k==key.UP: self.move[2] = True 47 | if k==key.DOWN: self.move[3] = True 48 | def key_release(self, k, mod): 49 | if k==key.LEFT: self.move[0] = False 50 | if k==key.RIGHT: self.move[1] = False 51 | if k==key.UP: self.move[2] = False 52 | if k==key.DOWN: self.move[3] = False 53 | -------------------------------------------------------------------------------- /multiagent/rendering.py: 
-------------------------------------------------------------------------------- 1 | """ 2 | 2D rendering framework 3 | """ 4 | from __future__ import division 5 | import os 6 | import six 7 | import sys 8 | 9 | if "Apple" in sys.version: 10 | if 'DYLD_FALLBACK_LIBRARY_PATH' in os.environ: 11 | os.environ['DYLD_FALLBACK_LIBRARY_PATH'] += ':/usr/lib' 12 | # (JDS 2016/04/15): avoid bug on Anaconda 2.3.0 / Yosemite 13 | 14 | from gym.utils import reraise 15 | from gym import error 16 | 17 | try: 18 | import pyglet 19 | except ImportError as e: 20 | reraise(suffix="HINT: you can install pyglet directly via 'pip install pyglet'. But if you really just want to install all Gym dependencies and not have to think about it, 'pip install -e .[all]' or 'pip install gym[all]' will do it.") 21 | 22 | try: 23 | from pyglet.gl import * 24 | except ImportError as e: 25 | reraise(prefix="Error occured while running `from pyglet.gl import *`",suffix="HINT: make sure you have OpenGL install. On Ubuntu, you can run 'apt-get install python-opengl'. If you're running on a server, you may need a virtual frame buffer; something like this should work: 'xvfb-run -s \"-screen 0 1400x900x24\" python '") 26 | 27 | import math 28 | import numpy as np 29 | 30 | RAD2DEG = 57.29577951308232 31 | 32 | def get_display(spec): 33 | """Convert a display specification (such as :0) into an actual Display 34 | object. 35 | 36 | Pyglet only supports multiple Displays on Linux. 37 | """ 38 | if spec is None: 39 | return None 40 | elif isinstance(spec, six.string_types): 41 | return pyglet.canvas.Display(spec) 42 | else: 43 | raise error.Error('Invalid display specification: {}. (Must be a string like :0 or None.)'.format(spec)) 44 | 45 | class Viewer(object): 46 | def __init__(self, width, height, display=None): 47 | display = get_display(display) 48 | 49 | self.width = width 50 | self.height = height 51 | 52 | self.window = pyglet.window.Window(width=width, height=height, display=display) 53 | self.window.on_close = self.window_closed_by_user 54 | self.geoms = [] 55 | self.onetime_geoms = [] 56 | self.transform = Transform() 57 | 58 | glEnable(GL_BLEND) 59 | # glEnable(GL_MULTISAMPLE) 60 | glEnable(GL_LINE_SMOOTH) 61 | # glHint(GL_LINE_SMOOTH_HINT, GL_DONT_CARE) 62 | glHint(GL_LINE_SMOOTH_HINT, GL_NICEST) 63 | glLineWidth(2.0) 64 | glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA) 65 | 66 | def close(self): 67 | self.window.close() 68 | 69 | def window_closed_by_user(self): 70 | self.close() 71 | 72 | def set_bounds(self, left, right, bottom, top): 73 | assert right > left and top > bottom 74 | scalex = self.width/(right-left) 75 | scaley = self.height/(top-bottom) 76 | self.transform = Transform( 77 | translation=(-left*scalex, -bottom*scaley), 78 | scale=(scalex, scaley)) 79 | 80 | def add_geom(self, geom): 81 | self.geoms.append(geom) 82 | 83 | def add_onetime(self, geom): 84 | self.onetime_geoms.append(geom) 85 | 86 | def render(self, return_rgb_array=False): 87 | glClearColor(1,1,1,1) 88 | self.window.clear() 89 | self.window.switch_to() 90 | self.window.dispatch_events() 91 | self.transform.enable() 92 | for geom in self.geoms: 93 | geom.render() 94 | for geom in self.onetime_geoms: 95 | geom.render() 96 | self.transform.disable() 97 | arr = None 98 | if return_rgb_array: 99 | buffer = pyglet.image.get_buffer_manager().get_color_buffer() 100 | image_data = buffer.get_image_data() 101 | arr = np.fromstring(image_data.data, dtype=np.uint8, sep='') 102 | # In https://github.com/openai/gym-http-api/issues/2, we 103 | # discovered 
that someone using Xmonad on Arch was having 104 | # a window of size 598 x 398, though a 600 x 400 window 105 | # was requested. (Guess Xmonad was preserving a pixel for 106 | # the boundary.) So we use the buffer height/width rather 107 | # than the requested one. 108 | arr = arr.reshape(buffer.height, buffer.width, 4) 109 | arr = arr[::-1,:,0:3] 110 | self.window.flip() 111 | self.onetime_geoms = [] 112 | return arr 113 | 114 | # Convenience 115 | def draw_circle(self, radius=10, res=30, filled=True, **attrs): 116 | geom = make_circle(radius=radius, res=res, filled=filled) 117 | _add_attrs(geom, attrs) 118 | self.add_onetime(geom) 119 | return geom 120 | 121 | def draw_polygon(self, v, filled=True, **attrs): 122 | geom = make_polygon(v=v, filled=filled) 123 | _add_attrs(geom, attrs) 124 | self.add_onetime(geom) 125 | return geom 126 | 127 | def draw_polyline(self, v, **attrs): 128 | geom = make_polyline(v=v) 129 | _add_attrs(geom, attrs) 130 | self.add_onetime(geom) 131 | return geom 132 | 133 | def draw_line(self, start, end, **attrs): 134 | geom = Line(start, end) 135 | _add_attrs(geom, attrs) 136 | self.add_onetime(geom) 137 | return geom 138 | 139 | def get_array(self): 140 | self.window.flip() 141 | image_data = pyglet.image.get_buffer_manager().get_color_buffer().get_image_data() 142 | self.window.flip() 143 | arr = np.fromstring(image_data.data, dtype=np.uint8, sep='') 144 | arr = arr.reshape(self.height, self.width, 4) 145 | return arr[::-1,:,0:3] 146 | 147 | def _add_attrs(geom, attrs): 148 | if "color" in attrs: 149 | geom.set_color(*attrs["color"]) 150 | if "linewidth" in attrs: 151 | geom.set_linewidth(attrs["linewidth"]) 152 | 153 | class Geom(object): 154 | def __init__(self): 155 | self._color=Color((0, 0, 0, 1.0)) 156 | self.attrs = [self._color] 157 | def render(self): 158 | for attr in reversed(self.attrs): 159 | attr.enable() 160 | self.render1() 161 | for attr in self.attrs: 162 | attr.disable() 163 | def render1(self): 164 | raise NotImplementedError 165 | def add_attr(self, attr): 166 | self.attrs.append(attr) 167 | def set_color(self, r, g, b, alpha=1): 168 | self._color.vec4 = (r, g, b, alpha) 169 | 170 | class Attr(object): 171 | def enable(self): 172 | raise NotImplementedError 173 | def disable(self): 174 | pass 175 | 176 | class Transform(Attr): 177 | def __init__(self, translation=(0.0, 0.0), rotation=0.0, scale=(1,1)): 178 | self.set_translation(*translation) 179 | self.set_rotation(rotation) 180 | self.set_scale(*scale) 181 | def enable(self): 182 | glPushMatrix() 183 | glTranslatef(self.translation[0], self.translation[1], 0) # translate to GL loc ppint 184 | glRotatef(RAD2DEG * self.rotation, 0, 0, 1.0) 185 | glScalef(self.scale[0], self.scale[1], 1) 186 | def disable(self): 187 | glPopMatrix() 188 | def set_translation(self, newx, newy): 189 | self.translation = (float(newx), float(newy)) 190 | def set_rotation(self, new): 191 | self.rotation = float(new) 192 | def set_scale(self, newx, newy): 193 | self.scale = (float(newx), float(newy)) 194 | 195 | class Color(Attr): 196 | def __init__(self, vec4): 197 | self.vec4 = vec4 198 | def enable(self): 199 | glColor4f(*self.vec4) 200 | 201 | class LineStyle(Attr): 202 | def __init__(self, style): 203 | self.style = style 204 | def enable(self): 205 | glEnable(GL_LINE_STIPPLE) 206 | glLineStipple(1, self.style) 207 | def disable(self): 208 | glDisable(GL_LINE_STIPPLE) 209 | 210 | class LineWidth(Attr): 211 | def __init__(self, stroke): 212 | self.stroke = stroke 213 | def enable(self): 214 | 
glLineWidth(self.stroke) 215 | 216 | class Point(Geom): 217 | def __init__(self): 218 | Geom.__init__(self) 219 | def render1(self): 220 | glBegin(GL_POINTS) # draw point 221 | glVertex3f(0.0, 0.0, 0.0) 222 | glEnd() 223 | 224 | class FilledPolygon(Geom): 225 | def __init__(self, v): 226 | Geom.__init__(self) 227 | self.v = v 228 | def render1(self): 229 | if len(self.v) == 4 : glBegin(GL_QUADS) 230 | elif len(self.v) > 4 : glBegin(GL_POLYGON) 231 | else: glBegin(GL_TRIANGLES) 232 | for p in self.v: 233 | glVertex3f(p[0], p[1],0) # draw each vertex 234 | glEnd() 235 | 236 | color = (self._color.vec4[0] * 0.5, self._color.vec4[1] * 0.5, self._color.vec4[2] * 0.5, self._color.vec4[3] * 0.5) 237 | glColor4f(*color) 238 | glBegin(GL_LINE_LOOP) 239 | for p in self.v: 240 | glVertex3f(p[0], p[1],0) # draw each vertex 241 | glEnd() 242 | 243 | def make_circle(radius=10, res=30, filled=True): 244 | points = [] 245 | for i in range(res): 246 | ang = 2*math.pi*i / res 247 | points.append((math.cos(ang)*radius, math.sin(ang)*radius)) 248 | if filled: 249 | return FilledPolygon(points) 250 | else: 251 | return PolyLine(points, True) 252 | 253 | def make_polygon(v, filled=True): 254 | if filled: return FilledPolygon(v) 255 | else: return PolyLine(v, True) 256 | 257 | def make_polyline(v): 258 | return PolyLine(v, False) 259 | 260 | def make_capsule(length, width): 261 | l, r, t, b = 0, length, width/2, -width/2 262 | box = make_polygon([(l,b), (l,t), (r,t), (r,b)]) 263 | circ0 = make_circle(width/2) 264 | circ1 = make_circle(width/2) 265 | circ1.add_attr(Transform(translation=(length, 0))) 266 | geom = Compound([box, circ0, circ1]) 267 | return geom 268 | 269 | class Compound(Geom): 270 | def __init__(self, gs): 271 | Geom.__init__(self) 272 | self.gs = gs 273 | for g in self.gs: 274 | g.attrs = [a for a in g.attrs if not isinstance(a, Color)] 275 | def render1(self): 276 | for g in self.gs: 277 | g.render() 278 | 279 | class PolyLine(Geom): 280 | def __init__(self, v, close): 281 | Geom.__init__(self) 282 | self.v = v 283 | self.close = close 284 | self.linewidth = LineWidth(1) 285 | self.add_attr(self.linewidth) 286 | def render1(self): 287 | glBegin(GL_LINE_LOOP if self.close else GL_LINE_STRIP) 288 | for p in self.v: 289 | glVertex3f(p[0], p[1],0) # draw each vertex 290 | glEnd() 291 | def set_linewidth(self, x): 292 | self.linewidth.stroke = x 293 | 294 | class Line(Geom): 295 | def __init__(self, start=(0.0, 0.0), end=(0.0, 0.0)): 296 | Geom.__init__(self) 297 | self.start = start 298 | self.end = end 299 | self.linewidth = LineWidth(1) 300 | self.add_attr(self.linewidth) 301 | 302 | def render1(self): 303 | glBegin(GL_LINES) 304 | glVertex2f(*self.start) 305 | glVertex2f(*self.end) 306 | glEnd() 307 | 308 | class Image(Geom): 309 | def __init__(self, fname, width, height): 310 | Geom.__init__(self) 311 | self.width = width 312 | self.height = height 313 | img = pyglet.image.load(fname) 314 | self.img = img 315 | self.flip = False 316 | def render1(self): 317 | self.img.blit(-self.width/2, -self.height/2, width=self.width, height=self.height) 318 | 319 | # ================================================================ 320 | 321 | class SimpleImageViewer(object): 322 | def __init__(self, display=None): 323 | self.window = None 324 | self.isopen = False 325 | self.display = display 326 | def imshow(self, arr): 327 | if self.window is None: 328 | height, width, channels = arr.shape 329 | self.window = pyglet.window.Window(width=width, height=height, display=self.display) 330 | self.width = width 
331 | self.height = height 332 | self.isopen = True 333 | assert arr.shape == (self.height, self.width, 3), "You passed in an image with the wrong number shape" 334 | image = pyglet.image.ImageData(self.width, self.height, 'RGB', arr.tobytes(), pitch=self.width * -3) 335 | self.window.clear() 336 | self.window.switch_to() 337 | self.window.dispatch_events() 338 | image.blit(0,0) 339 | self.window.flip() 340 | def close(self): 341 | if self.isopen: 342 | self.window.close() 343 | self.isopen = False 344 | def __del__(self): 345 | self.close() -------------------------------------------------------------------------------- /multiagent/scenario.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | # defines scenario upon which the world is built 4 | class BaseScenario(object): 5 | # create elements of the world 6 | def make_world(self): 7 | raise NotImplementedError() 8 | # create initial conditions of the world 9 | def reset_world(self, world): 10 | raise NotImplementedError() 11 | -------------------------------------------------------------------------------- /multiagent/scenarios/__init__.py: -------------------------------------------------------------------------------- 1 | import imp 2 | import os.path as osp 3 | 4 | 5 | def load(name): 6 | pathname = osp.join(osp.dirname(__file__), name) 7 | return imp.load_source('', pathname) 8 | -------------------------------------------------------------------------------- /multiagent/scenarios/simple.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from multiagent.core import World, Agent, Landmark 3 | from multiagent.scenario import BaseScenario 4 | 5 | class Scenario(BaseScenario): 6 | def make_world(self): 7 | world = World() 8 | # add agents 9 | world.agents = [Agent() for i in range(1)] 10 | for i, agent in enumerate(world.agents): 11 | agent.name = 'agent %d' % i 12 | agent.collide = False 13 | agent.silent = True 14 | # add landmarks 15 | world.landmarks = [Landmark() for i in range(1)] 16 | for i, landmark in enumerate(world.landmarks): 17 | landmark.name = 'landmark %d' % i 18 | landmark.collide = False 19 | landmark.movable = False 20 | # make initial conditions 21 | self.reset_world(world) 22 | return world 23 | 24 | def reset_world(self, world): 25 | # random properties for agents 26 | for i, agent in enumerate(world.agents): 27 | agent.color = np.array([0.25,0.25,0.25]) 28 | # random properties for landmarks 29 | for i, landmark in enumerate(world.landmarks): 30 | landmark.color = np.array([0.75,0.75,0.75]) 31 | world.landmarks[0].color = np.array([0.75,0.25,0.25]) 32 | # set random initial states 33 | for agent in world.agents: 34 | agent.state.p_pos = np.random.uniform(-1,+1, world.dim_p) 35 | agent.state.p_vel = np.zeros(world.dim_p) 36 | agent.state.c = np.zeros(world.dim_c) 37 | for i, landmark in enumerate(world.landmarks): 38 | landmark.state.p_pos = np.random.uniform(-1,+1, world.dim_p) 39 | landmark.state.p_vel = np.zeros(world.dim_p) 40 | 41 | def reward(self, agent, world): 42 | dist2 = np.sum(np.square(agent.state.p_pos - world.landmarks[0].state.p_pos)) 43 | return -dist2 44 | 45 | def observation(self, agent, world): 46 | # get positions of all entities in this agent's reference frame 47 | entity_pos = [] 48 | for entity in world.landmarks: 49 | entity_pos.append(entity.state.p_pos - agent.state.p_pos) 50 | return np.concatenate([agent.state.p_vel] + entity_pos) 51 | 
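# ----------------------------------------------------------------------------
# Illustrative addition (not part of the original file): a quick smoke test of
# this scenario, mirroring how make_env.py wires the scenario callbacks into
# MultiAgentEnv. Assumes the package is installed (`pip install -e .`); run
# with SUPPRESS_MA_PROMPT=1 to skip the interactive warning prompt.
# ----------------------------------------------------------------------------
if __name__ == '__main__':
    from multiagent.environment import MultiAgentEnv
    scenario = Scenario()
    world = scenario.make_world()
    env = MultiAgentEnv(world, scenario.reset_world, scenario.reward, scenario.observation)
    obs_n = env.reset()
    # expect one agent with a 4-d observation: 2-d velocity + 2-d relative landmark position
    print('agents:', env.n, '| obs shapes:', [o.shape for o in obs_n])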
-------------------------------------------------------------------------------- /multiagent/scenarios/simple_adversary.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from multiagent.core import World, Agent, Landmark 3 | from multiagent.scenario import BaseScenario 4 | 5 | 6 | class Scenario(BaseScenario): 7 | 8 | def make_world(self): 9 | world = World() 10 | # set any world properties first 11 | world.dim_c = 2 12 | num_agents = 3 13 | world.num_agents = num_agents 14 | num_adversaries = 1 15 | num_landmarks = num_agents - 1 16 | # add agents 17 | world.agents = [Agent() for i in range(num_agents)] 18 | for i, agent in enumerate(world.agents): 19 | agent.name = 'agent %d' % i 20 | agent.collide = False 21 | agent.silent = True 22 | agent.adversary = True if i < num_adversaries else False 23 | agent.size = 0.15 24 | # add landmarks 25 | world.landmarks = [Landmark() for i in range(num_landmarks)] 26 | for i, landmark in enumerate(world.landmarks): 27 | landmark.name = 'landmark %d' % i 28 | landmark.collide = False 29 | landmark.movable = False 30 | landmark.size = 0.08 31 | # make initial conditions 32 | self.reset_world(world) 33 | return world 34 | 35 | def reset_world(self, world): 36 | # random properties for agents 37 | world.agents[0].color = np.array([0.85, 0.35, 0.35]) 38 | for i in range(1, world.num_agents): 39 | world.agents[i].color = np.array([0.35, 0.35, 0.85]) 40 | # random properties for landmarks 41 | for i, landmark in enumerate(world.landmarks): 42 | landmark.color = np.array([0.15, 0.15, 0.15]) 43 | # set goal landmark 44 | goal = np.random.choice(world.landmarks) 45 | goal.color = np.array([0.15, 0.65, 0.15]) 46 | for agent in world.agents: 47 | agent.goal_a = goal 48 | # set random initial states 49 | for agent in world.agents: 50 | agent.state.p_pos = np.random.uniform(-1, +1, world.dim_p) 51 | agent.state.p_vel = np.zeros(world.dim_p) 52 | agent.state.c = np.zeros(world.dim_c) 53 | for i, landmark in enumerate(world.landmarks): 54 | landmark.state.p_pos = np.random.uniform(-1, +1, world.dim_p) 55 | landmark.state.p_vel = np.zeros(world.dim_p) 56 | 57 | def benchmark_data(self, agent, world): 58 | # returns data for benchmarking purposes 59 | if agent.adversary: 60 | return np.sum(np.square(agent.state.p_pos - agent.goal_a.state.p_pos)) 61 | else: 62 | dists = [] 63 | for l in world.landmarks: 64 | dists.append(np.sum(np.square(agent.state.p_pos - l.state.p_pos))) 65 | dists.append(np.sum(np.square(agent.state.p_pos - agent.goal_a.state.p_pos))) 66 | return tuple(dists) 67 | 68 | # return all agents that are not adversaries 69 | def good_agents(self, world): 70 | return [agent for agent in world.agents if not agent.adversary] 71 | 72 | # return all adversarial agents 73 | def adversaries(self, world): 74 | return [agent for agent in world.agents if agent.adversary] 75 | 76 | def reward(self, agent, world): 77 | # Agents are rewarded based on minimum agent distance to each landmark 78 | return self.adversary_reward(agent, world) if agent.adversary else self.agent_reward(agent, world) 79 | 80 | def agent_reward(self, agent, world): 81 | # Rewarded based on how close any good agent is to the goal landmark, and how far the adversary is from it 82 | shaped_reward = True 83 | shaped_adv_reward = True 84 | 85 | # Calculate negative reward for adversary 86 | adversary_agents = self.adversaries(world) 87 | if shaped_adv_reward: # distance-based adversary reward 88 | adv_rew = 
sum([np.sqrt(np.sum(np.square(a.state.p_pos - a.goal_a.state.p_pos))) for a in adversary_agents]) 89 | else: # proximity-based adversary reward (binary) 90 | adv_rew = 0 91 | for a in adversary_agents: 92 | if np.sqrt(np.sum(np.square(a.state.p_pos - a.goal_a.state.p_pos))) < 2 * a.goal_a.size: 93 | adv_rew -= 5 94 | 95 | # Calculate positive reward for agents 96 | good_agents = self.good_agents(world) 97 | if shaped_reward: # distance-based agent reward 98 | pos_rew = -min( 99 | [np.sqrt(np.sum(np.square(a.state.p_pos - a.goal_a.state.p_pos))) for a in good_agents]) 100 | else: # proximity-based agent reward (binary) 101 | pos_rew = 0 102 | if min([np.sqrt(np.sum(np.square(a.state.p_pos - a.goal_a.state.p_pos))) for a in good_agents]) \ 103 | < 2 * agent.goal_a.size: 104 | pos_rew += 5 105 | pos_rew -= min( 106 | [np.sqrt(np.sum(np.square(a.state.p_pos - a.goal_a.state.p_pos))) for a in good_agents]) 107 | return pos_rew + adv_rew 108 | 109 | def adversary_reward(self, agent, world): 110 | # Rewarded based on proximity to the goal landmark 111 | shaped_reward = True 112 | if shaped_reward: # distance-based reward 113 | return -np.sum(np.square(agent.state.p_pos - agent.goal_a.state.p_pos)) 114 | else: # proximity-based reward (binary) 115 | adv_rew = 0 116 | if np.sqrt(np.sum(np.square(agent.state.p_pos - agent.goal_a.state.p_pos))) < 2 * agent.goal_a.size: 117 | adv_rew += 5 118 | return adv_rew 119 | 120 | 121 | def observation(self, agent, world): 122 | # get positions of all entities in this agent's reference frame 123 | entity_pos = [] 124 | for entity in world.landmarks: 125 | entity_pos.append(entity.state.p_pos - agent.state.p_pos) 126 | # entity colors 127 | entity_color = [] 128 | for entity in world.landmarks: 129 | entity_color.append(entity.color) 130 | # communication of all other agents 131 | other_pos = [] 132 | for other in world.agents: 133 | if other is agent: continue 134 | other_pos.append(other.state.p_pos - agent.state.p_pos) 135 | 136 | if not agent.adversary: 137 | return np.concatenate([agent.goal_a.state.p_pos - agent.state.p_pos] + entity_pos + other_pos) 138 | else: 139 | return np.concatenate(entity_pos + other_pos) 140 | -------------------------------------------------------------------------------- /multiagent/scenarios/simple_crypto.py: -------------------------------------------------------------------------------- 1 | """ 2 | Scenario: 3 | 1 speaker, 2 listeners (one of which is an adversary). Good agents rewarded for proximity to goal, and distance from 4 | adversary to goal. Adversary is rewarded for its distance to the goal. 
5 | """ 6 | 7 | 8 | import numpy as np 9 | from multiagent.core import World, Agent, Landmark 10 | from multiagent.scenario import BaseScenario 11 | import random 12 | 13 | 14 | class CryptoAgent(Agent): 15 | def __init__(self): 16 | super(CryptoAgent, self).__init__() 17 | self.key = None 18 | 19 | class Scenario(BaseScenario): 20 | 21 | def make_world(self): 22 | world = World() 23 | # set any world properties first 24 | num_agents = 3 25 | num_adversaries = 1 26 | num_landmarks = 2 27 | world.dim_c = 4 28 | # add agents 29 | world.agents = [CryptoAgent() for i in range(num_agents)] 30 | for i, agent in enumerate(world.agents): 31 | agent.name = 'agent %d' % i 32 | agent.collide = False 33 | agent.adversary = True if i < num_adversaries else False 34 | agent.speaker = True if i == 2 else False 35 | agent.movable = False 36 | # add landmarks 37 | world.landmarks = [Landmark() for i in range(num_landmarks)] 38 | for i, landmark in enumerate(world.landmarks): 39 | landmark.name = 'landmark %d' % i 40 | landmark.collide = False 41 | landmark.movable = False 42 | # make initial conditions 43 | self.reset_world(world) 44 | return world 45 | 46 | 47 | def reset_world(self, world): 48 | # random properties for agents 49 | for i, agent in enumerate(world.agents): 50 | agent.color = np.array([0.25, 0.25, 0.25]) 51 | if agent.adversary: 52 | agent.color = np.array([0.75, 0.25, 0.25]) 53 | agent.key = None 54 | # random properties for landmarks 55 | color_list = [np.zeros(world.dim_c) for i in world.landmarks] 56 | for i, color in enumerate(color_list): 57 | color[i] += 1 58 | for color, landmark in zip(color_list, world.landmarks): 59 | landmark.color = color 60 | # set goal landmark 61 | goal = np.random.choice(world.landmarks) 62 | world.agents[1].color = goal.color 63 | world.agents[2].key = np.random.choice(world.landmarks).color 64 | 65 | for agent in world.agents: 66 | agent.goal_a = goal 67 | 68 | # set random initial states 69 | for agent in world.agents: 70 | agent.state.p_pos = np.random.uniform(-1, +1, world.dim_p) 71 | agent.state.p_vel = np.zeros(world.dim_p) 72 | agent.state.c = np.zeros(world.dim_c) 73 | for i, landmark in enumerate(world.landmarks): 74 | landmark.state.p_pos = np.random.uniform(-1, +1, world.dim_p) 75 | landmark.state.p_vel = np.zeros(world.dim_p) 76 | 77 | 78 | def benchmark_data(self, agent, world): 79 | # returns data for benchmarking purposes 80 | return (agent.state.c, agent.goal_a.color) 81 | 82 | # return all agents that are not adversaries 83 | def good_listeners(self, world): 84 | return [agent for agent in world.agents if not agent.adversary and not agent.speaker] 85 | 86 | # return all agents that are not adversaries 87 | def good_agents(self, world): 88 | return [agent for agent in world.agents if not agent.adversary] 89 | 90 | # return all adversarial agents 91 | def adversaries(self, world): 92 | return [agent for agent in world.agents if agent.adversary] 93 | 94 | def reward(self, agent, world): 95 | return self.adversary_reward(agent, world) if agent.adversary else self.agent_reward(agent, world) 96 | 97 | def agent_reward(self, agent, world): 98 | # Agents rewarded if Bob can reconstruct message, but adversary (Eve) cannot 99 | good_listeners = self.good_listeners(world) 100 | adversaries = self.adversaries(world) 101 | good_rew = 0 102 | adv_rew = 0 103 | for a in good_listeners: 104 | if (a.state.c == np.zeros(world.dim_c)).all(): 105 | continue 106 | else: 107 | good_rew -= np.sum(np.square(a.state.c - agent.goal_a.color)) 108 | for a in 
adversaries: 109 | if (a.state.c == np.zeros(world.dim_c)).all(): 110 | continue 111 | else: 112 | adv_l1 = np.sum(np.square(a.state.c - agent.goal_a.color)) 113 | adv_rew += adv_l1 114 | return adv_rew + good_rew 115 | 116 | def adversary_reward(self, agent, world): 117 | # Adversary (Eve) is rewarded if it can reconstruct original goal 118 | rew = 0 119 | if not (agent.state.c == np.zeros(world.dim_c)).all(): 120 | rew -= np.sum(np.square(agent.state.c - agent.goal_a.color)) 121 | return rew 122 | 123 | 124 | def observation(self, agent, world): 125 | # goal color 126 | goal_color = np.zeros(world.dim_color) 127 | if agent.goal_a is not None: 128 | goal_color = agent.goal_a.color 129 | 130 | # get positions of all entities in this agent's reference frame 131 | entity_pos = [] 132 | for entity in world.landmarks: 133 | entity_pos.append(entity.state.p_pos - agent.state.p_pos) 134 | # communication of all other agents 135 | comm = [] 136 | for other in world.agents: 137 | if other is agent or (other.state.c is None) or not other.speaker: continue 138 | comm.append(other.state.c) 139 | 140 | confer = np.array([0]) 141 | 142 | if world.agents[2].key is None: 143 | confer = np.array([1]) 144 | key = np.zeros(world.dim_c) 145 | goal_color = np.zeros(world.dim_c) 146 | else: 147 | key = world.agents[2].key 148 | 149 | prnt = False 150 | # speaker 151 | if agent.speaker: 152 | if prnt: 153 | print('speaker') 154 | print(agent.state.c) 155 | print(np.concatenate([goal_color] + [key] + [confer] + [np.random.randn(1)])) 156 | return np.concatenate([goal_color] + [key]) 157 | # listener 158 | if not agent.speaker and not agent.adversary: 159 | if prnt: 160 | print('listener') 161 | print(agent.state.c) 162 | print(np.concatenate([key] + comm + [confer])) 163 | return np.concatenate([key] + comm) 164 | if not agent.speaker and agent.adversary: 165 | if prnt: 166 | print('adversary') 167 | print(agent.state.c) 168 | print(np.concatenate(comm + [confer])) 169 | return np.concatenate(comm) 170 | -------------------------------------------------------------------------------- /multiagent/scenarios/simple_push.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from multiagent.core import World, Agent, Landmark 3 | from multiagent.scenario import BaseScenario 4 | 5 | class Scenario(BaseScenario): 6 | def make_world(self): 7 | world = World() 8 | # set any world properties first 9 | world.dim_c = 2 10 | num_agents = 2 11 | num_adversaries = 1 12 | num_landmarks = 2 13 | # add agents 14 | world.agents = [Agent() for i in range(num_agents)] 15 | for i, agent in enumerate(world.agents): 16 | agent.name = 'agent %d' % i 17 | agent.collide = True 18 | agent.silent = True 19 | if i < num_adversaries: 20 | agent.adversary = True 21 | else: 22 | agent.adversary = False 23 | # add landmarks 24 | world.landmarks = [Landmark() for i in range(num_landmarks)] 25 | for i, landmark in enumerate(world.landmarks): 26 | landmark.name = 'landmark %d' % i 27 | landmark.collide = False 28 | landmark.movable = False 29 | # make initial conditions 30 | self.reset_world(world) 31 | return world 32 | 33 | def reset_world(self, world): 34 | # random properties for landmarks 35 | for i, landmark in enumerate(world.landmarks): 36 | landmark.color = np.array([0.1, 0.1, 0.1]) 37 | landmark.color[i + 1] += 0.8 38 | landmark.index = i 39 | # set goal landmark 40 | goal = np.random.choice(world.landmarks) 41 | for i, agent in enumerate(world.agents): 42 | agent.goal_a = goal 43 | 
agent.color = np.array([0.25, 0.25, 0.25]) 44 | if agent.adversary: 45 | agent.color = np.array([0.75, 0.25, 0.25]) 46 | else: 47 | j = goal.index 48 | agent.color[j + 1] += 0.5 49 | # set random initial states 50 | for agent in world.agents: 51 | agent.state.p_pos = np.random.uniform(-1, +1, world.dim_p) 52 | agent.state.p_vel = np.zeros(world.dim_p) 53 | agent.state.c = np.zeros(world.dim_c) 54 | for i, landmark in enumerate(world.landmarks): 55 | landmark.state.p_pos = np.random.uniform(-1, +1, world.dim_p) 56 | landmark.state.p_vel = np.zeros(world.dim_p) 57 | 58 | def reward(self, agent, world): 59 | # Agents are rewarded based on minimum agent distance to each landmark 60 | return self.adversary_reward(agent, world) if agent.adversary else self.agent_reward(agent, world) 61 | 62 | def agent_reward(self, agent, world): 63 | # the distance to the goal 64 | return -np.sqrt(np.sum(np.square(agent.state.p_pos - agent.goal_a.state.p_pos))) 65 | 66 | def adversary_reward(self, agent, world): 67 | # keep the nearest good agents away from the goal 68 | agent_dist = [np.sqrt(np.sum(np.square(a.state.p_pos - a.goal_a.state.p_pos))) for a in world.agents if not a.adversary] 69 | pos_rew = min(agent_dist) 70 | #nearest_agent = world.good_agents[np.argmin(agent_dist)] 71 | #neg_rew = np.sqrt(np.sum(np.square(nearest_agent.state.p_pos - agent.state.p_pos))) 72 | neg_rew = np.sqrt(np.sum(np.square(agent.goal_a.state.p_pos - agent.state.p_pos))) 73 | #neg_rew = sum([np.sqrt(np.sum(np.square(a.state.p_pos - agent.state.p_pos))) for a in world.good_agents]) 74 | return pos_rew - neg_rew 75 | 76 | def observation(self, agent, world): 77 | # get positions of all entities in this agent's reference frame 78 | entity_pos = [] 79 | for entity in world.landmarks: # world.entities: 80 | entity_pos.append(entity.state.p_pos - agent.state.p_pos) 81 | # entity colors 82 | entity_color = [] 83 | for entity in world.landmarks: # world.entities: 84 | entity_color.append(entity.color) 85 | # communication of all other agents 86 | comm = [] 87 | other_pos = [] 88 | for other in world.agents: 89 | if other is agent: continue 90 | comm.append(other.state.c) 91 | other_pos.append(other.state.p_pos - agent.state.p_pos) 92 | if not agent.adversary: 93 | return np.concatenate([agent.state.p_vel] + [agent.goal_a.state.p_pos - agent.state.p_pos] + [agent.color] + entity_pos + entity_color + other_pos) 94 | else: 95 | #other_pos = list(reversed(other_pos)) if random.uniform(0,1) > 0.5 else other_pos # randomize position of other agents in adversary network 96 | return np.concatenate([agent.state.p_vel] + entity_pos + other_pos) 97 | -------------------------------------------------------------------------------- /multiagent/scenarios/simple_reference.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from multiagent.core import World, Agent, Landmark 3 | from multiagent.scenario import BaseScenario 4 | 5 | class Scenario(BaseScenario): 6 | def make_world(self): 7 | world = World() 8 | # set any world properties first 9 | world.dim_c = 10 10 | world.collaborative = True # whether agents share rewards 11 | # add agents 12 | world.agents = [Agent() for i in range(2)] 13 | for i, agent in enumerate(world.agents): 14 | agent.name = 'agent %d' % i 15 | agent.collide = False 16 | # add landmarks 17 | world.landmarks = [Landmark() for i in range(3)] 18 | for i, landmark in enumerate(world.landmarks): 19 | landmark.name = 'landmark %d' % i 20 | landmark.collide = False 21 | 
landmark.movable = False 22 | # make initial conditions 23 | self.reset_world(world) 24 | return world 25 | 26 | def reset_world(self, world): 27 | # assign goals to agents 28 | for agent in world.agents: 29 | agent.goal_a = None 30 | agent.goal_b = None 31 | # want other agent to go to the goal landmark 32 | world.agents[0].goal_a = world.agents[1] 33 | world.agents[0].goal_b = np.random.choice(world.landmarks) 34 | world.agents[1].goal_a = world.agents[0] 35 | world.agents[1].goal_b = np.random.choice(world.landmarks) 36 | # random properties for agents 37 | for i, agent in enumerate(world.agents): 38 | agent.color = np.array([0.25,0.25,0.25]) 39 | # random properties for landmarks 40 | world.landmarks[0].color = np.array([0.75,0.25,0.25]) 41 | world.landmarks[1].color = np.array([0.25,0.75,0.25]) 42 | world.landmarks[2].color = np.array([0.25,0.25,0.75]) 43 | # special colors for goals 44 | world.agents[0].goal_a.color = world.agents[0].goal_b.color 45 | world.agents[1].goal_a.color = world.agents[1].goal_b.color 46 | # set random initial states 47 | for agent in world.agents: 48 | agent.state.p_pos = np.random.uniform(-1,+1, world.dim_p) 49 | agent.state.p_vel = np.zeros(world.dim_p) 50 | agent.state.c = np.zeros(world.dim_c) 51 | for i, landmark in enumerate(world.landmarks): 52 | landmark.state.p_pos = np.random.uniform(-1,+1, world.dim_p) 53 | landmark.state.p_vel = np.zeros(world.dim_p) 54 | 55 | def reward(self, agent, world): 56 | if agent.goal_a is None or agent.goal_b is None: 57 | return 0.0 58 | dist2 = np.sum(np.square(agent.goal_a.state.p_pos - agent.goal_b.state.p_pos)) 59 | return -dist2 60 | 61 | def observation(self, agent, world): 62 | # goal color 63 | goal_color = [np.zeros(world.dim_color), np.zeros(world.dim_color)] 64 | if agent.goal_b is not None: 65 | goal_color[1] = agent.goal_b.color 66 | 67 | # get positions of all entities in this agent's reference frame 68 | entity_pos = [] 69 | for entity in world.landmarks: 70 | entity_pos.append(entity.state.p_pos - agent.state.p_pos) 71 | # entity colors 72 | entity_color = [] 73 | for entity in world.landmarks: 74 | entity_color.append(entity.color) 75 | # communication of all other agents 76 | comm = [] 77 | for other in world.agents: 78 | if other is agent: continue 79 | comm.append(other.state.c) 80 | return np.concatenate([agent.state.p_vel] + entity_pos + [goal_color[1]] + comm) 81 | -------------------------------------------------------------------------------- /multiagent/scenarios/simple_speaker_listener.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from multiagent.core import World, Agent, Landmark 3 | from multiagent.scenario import BaseScenario 4 | 5 | class Scenario(BaseScenario): 6 | def make_world(self): 7 | world = World() 8 | # set any world properties first 9 | world.dim_c = 3 10 | num_landmarks = 3 11 | world.collaborative = True 12 | # add agents 13 | world.agents = [Agent() for i in range(2)] 14 | for i, agent in enumerate(world.agents): 15 | agent.name = 'agent %d' % i 16 | agent.collide = False 17 | agent.size = 0.075 18 | # speaker 19 | world.agents[0].movable = False 20 | # listener 21 | world.agents[1].silent = True 22 | # add landmarks 23 | world.landmarks = [Landmark() for i in range(num_landmarks)] 24 | for i, landmark in enumerate(world.landmarks): 25 | landmark.name = 'landmark %d' % i 26 | landmark.collide = False 27 | landmark.movable = False 28 | landmark.size = 0.04 29 | # make initial conditions 30 | 
self.reset_world(world) 31 | return world 32 | 33 | def reset_world(self, world): 34 | # assign goals to agents 35 | for agent in world.agents: 36 | agent.goal_a = None 37 | agent.goal_b = None 38 | # want listener to go to the goal landmark 39 | world.agents[0].goal_a = world.agents[1] 40 | world.agents[0].goal_b = np.random.choice(world.landmarks) 41 | # random properties for agents 42 | for i, agent in enumerate(world.agents): 43 | agent.color = np.array([0.25,0.25,0.25]) 44 | # random properties for landmarks 45 | world.landmarks[0].color = np.array([0.65,0.15,0.15]) 46 | world.landmarks[1].color = np.array([0.15,0.65,0.15]) 47 | world.landmarks[2].color = np.array([0.15,0.15,0.65]) 48 | # special colors for goals 49 | world.agents[0].goal_a.color = world.agents[0].goal_b.color + np.array([0.45, 0.45, 0.45]) 50 | # set random initial states 51 | for agent in world.agents: 52 | agent.state.p_pos = np.random.uniform(-1,+1, world.dim_p) 53 | agent.state.p_vel = np.zeros(world.dim_p) 54 | agent.state.c = np.zeros(world.dim_c) 55 | for i, landmark in enumerate(world.landmarks): 56 | landmark.state.p_pos = np.random.uniform(-1,+1, world.dim_p) 57 | landmark.state.p_vel = np.zeros(world.dim_p) 58 | 59 | def benchmark_data(self, agent, world): 60 | # returns data for benchmarking purposes 61 | return self.reward(agent, world) 62 | 63 | def reward(self, agent, world): 64 | # squared distance from listener to landmark 65 | a = world.agents[0] 66 | dist2 = np.sum(np.square(a.goal_a.state.p_pos - a.goal_b.state.p_pos)) 67 | return -dist2 68 | 69 | def observation(self, agent, world): 70 | # goal color 71 | goal_color = np.zeros(world.dim_color) 72 | if agent.goal_b is not None: 73 | goal_color = agent.goal_b.color 74 | 75 | # get positions of all entities in this agent's reference frame 76 | entity_pos = [] 77 | for entity in world.landmarks: 78 | entity_pos.append(entity.state.p_pos - agent.state.p_pos) 79 | 80 | # communication of all other agents 81 | comm = [] 82 | for other in world.agents: 83 | if other is agent or (other.state.c is None): continue 84 | comm.append(other.state.c) 85 | 86 | # speaker 87 | if not agent.movable: 88 | return np.concatenate([goal_color]) 89 | # listener 90 | if agent.silent: 91 | return np.concatenate([agent.state.p_vel] + entity_pos + comm) 92 | 93 | -------------------------------------------------------------------------------- /multiagent/scenarios/simple_spread.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from multiagent.core import World, Agent, Landmark 3 | from multiagent.scenario import BaseScenario 4 | 5 | 6 | class Scenario(BaseScenario): 7 | def make_world(self): 8 | world = World() 9 | # set any world properties first 10 | world.dim_c = 2 11 | num_agents = 3 12 | num_landmarks = 3 13 | world.collaborative = True 14 | # add agents 15 | world.agents = [Agent() for i in range(num_agents)] 16 | for i, agent in enumerate(world.agents): 17 | agent.name = 'agent %d' % i 18 | agent.collide = True 19 | agent.silent = True 20 | agent.size = 0.15 21 | # add landmarks 22 | world.landmarks = [Landmark() for i in range(num_landmarks)] 23 | for i, landmark in enumerate(world.landmarks): 24 | landmark.name = 'landmark %d' % i 25 | landmark.collide = False 26 | landmark.movable = False 27 | # make initial conditions 28 | self.reset_world(world) 29 | return world 30 | 31 | def reset_world(self, world): 32 | # random properties for agents 33 | for i, agent in enumerate(world.agents): 34 | agent.color = 
np.array([0.35, 0.35, 0.85]) 35 | # random properties for landmarks 36 | for i, landmark in enumerate(world.landmarks): 37 | landmark.color = np.array([0.25, 0.25, 0.25]) 38 | # set random initial states 39 | for agent in world.agents: 40 | agent.state.p_pos = np.random.uniform(-1, +1, world.dim_p) 41 | agent.state.p_vel = np.zeros(world.dim_p) 42 | agent.state.c = np.zeros(world.dim_c) 43 | for i, landmark in enumerate(world.landmarks): 44 | landmark.state.p_pos = np.random.uniform(-1, +1, world.dim_p) 45 | landmark.state.p_vel = np.zeros(world.dim_p) 46 | 47 | def benchmark_data(self, agent, world): 48 | rew = 0 49 | collisions = 0 50 | occupied_landmarks = 0 51 | min_dists = 0 52 | for l in world.landmarks: 53 | dists = [np.sqrt(np.sum(np.square(a.state.p_pos - l.state.p_pos))) for a in world.agents] 54 | min_dists += min(dists) 55 | rew -= min(dists) 56 | if min(dists) < 0.1: 57 | occupied_landmarks += 1 58 | if agent.collide: 59 | for a in world.agents: 60 | if self.is_collision(a, agent): 61 | rew -= 1 62 | collisions += 1 63 | return (rew, collisions, min_dists, occupied_landmarks) 64 | 65 | 66 | def is_collision(self, agent1, agent2): 67 | delta_pos = agent1.state.p_pos - agent2.state.p_pos 68 | dist = np.sqrt(np.sum(np.square(delta_pos))) 69 | dist_min = agent1.size + agent2.size 70 | return True if dist < dist_min else False 71 | 72 | def reward(self, agent, world): 73 | # Agents are rewarded based on minimum agent distance to each landmark, penalized for collisions 74 | rew = 0 75 | for l in world.landmarks: 76 | dists = [np.sqrt(np.sum(np.square(a.state.p_pos - l.state.p_pos))) for a in world.agents] 77 | rew -= min(dists) 78 | if agent.collide: 79 | for a in world.agents: 80 | if self.is_collision(a, agent): 81 | rew -= 1 82 | return rew 83 | 84 | def observation(self, agent, world): 85 | # get positions of all entities in this agent's reference frame 86 | entity_pos = [] 87 | for entity in world.landmarks: # world.entities: 88 | entity_pos.append(entity.state.p_pos - agent.state.p_pos) 89 | # entity colors 90 | entity_color = [] 91 | for entity in world.landmarks: # world.entities: 92 | entity_color.append(entity.color) 93 | # communication of all other agents 94 | comm = [] 95 | other_pos = [] 96 | for other in world.agents: 97 | if other is agent: continue 98 | comm.append(other.state.c) 99 | other_pos.append(other.state.p_pos - agent.state.p_pos) 100 | return np.concatenate([agent.state.p_vel] + [agent.state.p_pos] + entity_pos + other_pos + comm) 101 | -------------------------------------------------------------------------------- /multiagent/scenarios/simple_tag.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from multiagent.core import World, Agent, Landmark 3 | from multiagent.scenario import BaseScenario 4 | 5 | 6 | class Scenario(BaseScenario): 7 | def make_world(self): 8 | world = World() 9 | # set any world properties first 10 | world.dim_c = 2 11 | num_good_agents = 1 12 | num_adversaries = 3 13 | num_agents = num_adversaries + num_good_agents 14 | num_landmarks = 2 15 | # add agents 16 | world.agents = [Agent() for i in range(num_agents)] 17 | for i, agent in enumerate(world.agents): 18 | agent.name = 'agent %d' % i 19 | agent.collide = True 20 | agent.silent = True 21 | agent.adversary = True if i < num_adversaries else False 22 | agent.size = 0.075 if agent.adversary else 0.05 23 | agent.accel = 3.0 if agent.adversary else 4.0 24 | #agent.accel = 20.0 if agent.adversary else 25.0 25 | 
agent.max_speed = 1.0 if agent.adversary else 1.3 26 | # add landmarks 27 | world.landmarks = [Landmark() for i in range(num_landmarks)] 28 | for i, landmark in enumerate(world.landmarks): 29 | landmark.name = 'landmark %d' % i 30 | landmark.collide = True 31 | landmark.movable = False 32 | landmark.size = 0.2 33 | landmark.boundary = False 34 | # make initial conditions 35 | self.reset_world(world) 36 | return world 37 | 38 | 39 | def reset_world(self, world): 40 | # random properties for agents 41 | for i, agent in enumerate(world.agents): 42 | agent.color = np.array([0.35, 0.85, 0.35]) if not agent.adversary else np.array([0.85, 0.35, 0.35]) 43 | # random properties for landmarks 44 | for i, landmark in enumerate(world.landmarks): 45 | landmark.color = np.array([0.25, 0.25, 0.25]) 46 | # set random initial states 47 | for agent in world.agents: 48 | agent.state.p_pos = np.random.uniform(-1, +1, world.dim_p) 49 | agent.state.p_vel = np.zeros(world.dim_p) 50 | agent.state.c = np.zeros(world.dim_c) 51 | for i, landmark in enumerate(world.landmarks): 52 | if not landmark.boundary: 53 | landmark.state.p_pos = np.random.uniform(-0.9, +0.9, world.dim_p) 54 | landmark.state.p_vel = np.zeros(world.dim_p) 55 | 56 | 57 | def benchmark_data(self, agent, world): 58 | # returns data for benchmarking purposes 59 | if agent.adversary: 60 | collisions = 0 61 | for a in self.good_agents(world): 62 | if self.is_collision(a, agent): 63 | collisions += 1 64 | return collisions 65 | else: 66 | return 0 67 | 68 | 69 | def is_collision(self, agent1, agent2): 70 | delta_pos = agent1.state.p_pos - agent2.state.p_pos 71 | dist = np.sqrt(np.sum(np.square(delta_pos))) 72 | dist_min = agent1.size + agent2.size 73 | return True if dist < dist_min else False 74 | 75 | # return all agents that are not adversaries 76 | def good_agents(self, world): 77 | return [agent for agent in world.agents if not agent.adversary] 78 | 79 | # return all adversarial agents 80 | def adversaries(self, world): 81 | return [agent for agent in world.agents if agent.adversary] 82 | 83 | 84 | def reward(self, agent, world): 85 | # Agents are rewarded based on minimum agent distance to each landmark 86 | main_reward = self.adversary_reward(agent, world) if agent.adversary else self.agent_reward(agent, world) 87 | return main_reward 88 | 89 | def agent_reward(self, agent, world): 90 | # Agents are negatively rewarded if caught by adversaries 91 | rew = 0 92 | shape = False 93 | adversaries = self.adversaries(world) 94 | if shape: # reward can optionally be shaped (increased reward for increased distance from adversary) 95 | for adv in adversaries: 96 | rew += 0.1 * np.sqrt(np.sum(np.square(agent.state.p_pos - adv.state.p_pos))) 97 | if agent.collide: 98 | for a in adversaries: 99 | if self.is_collision(a, agent): 100 | rew -= 10 101 | 102 | # agents are penalized for exiting the screen, so that they can be caught by the adversaries 103 | def bound(x): 104 | if x < 0.9: 105 | return 0 106 | if x < 1.0: 107 | return (x - 0.9) * 10 108 | return min(np.exp(2 * x - 2), 10) 109 | for p in range(world.dim_p): 110 | x = abs(agent.state.p_pos[p]) 111 | rew -= bound(x) 112 | 113 | return rew 114 | 115 | def adversary_reward(self, agent, world): 116 | # Adversaries are rewarded for collisions with agents 117 | rew = 0 118 | shape = False 119 | agents = self.good_agents(world) 120 | adversaries = self.adversaries(world) 121 | if shape: # reward can optionally be shaped (decreased reward for increased distance from agents) 122 | for adv in adversaries: 123 
| rew -= 0.1 * min([np.sqrt(np.sum(np.square(a.state.p_pos - adv.state.p_pos))) for a in agents]) 124 | if agent.collide: 125 | for ag in agents: 126 | for adv in adversaries: 127 | if self.is_collision(ag, adv): 128 | rew += 10 129 | return rew 130 | 131 | def observation(self, agent, world): 132 | # get positions of all entities in this agent's reference frame 133 | entity_pos = [] 134 | for entity in world.landmarks: 135 | if not entity.boundary: 136 | entity_pos.append(entity.state.p_pos - agent.state.p_pos) 137 | # communication of all other agents 138 | comm = [] 139 | other_pos = [] 140 | other_vel = [] 141 | for other in world.agents: 142 | if other is agent: continue 143 | comm.append(other.state.c) 144 | other_pos.append(other.state.p_pos - agent.state.p_pos) 145 | if not other.adversary: 146 | other_vel.append(other.state.p_vel) 147 | return np.concatenate([agent.state.p_vel] + [agent.state.p_pos] + entity_pos + other_pos + other_vel) 148 | -------------------------------------------------------------------------------- /multiagent/scenarios/simple_world_comm.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from multiagent.core import World, Agent, Landmark 3 | from multiagent.scenario import BaseScenario 4 | 5 | 6 | class Scenario(BaseScenario): 7 | def make_world(self): 8 | world = World() 9 | # set any world properties first 10 | world.dim_c = 4 11 | #world.damping = 1 12 | num_good_agents = 2 13 | num_adversaries = 4 14 | num_agents = num_adversaries + num_good_agents 15 | num_landmarks = 1 16 | num_food = 2 17 | num_forests = 2 18 | # add agents 19 | world.agents = [Agent() for i in range(num_agents)] 20 | for i, agent in enumerate(world.agents): 21 | agent.name = 'agent %d' % i 22 | agent.collide = True 23 | agent.leader = True if i == 0 else False 24 | agent.silent = True if i > 0 else False 25 | agent.adversary = True if i < num_adversaries else False 26 | agent.size = 0.075 if agent.adversary else 0.045 27 | agent.accel = 3.0 if agent.adversary else 4.0 28 | #agent.accel = 20.0 if agent.adversary else 25.0 29 | agent.max_speed = 1.0 if agent.adversary else 1.3 30 | # add landmarks 31 | world.landmarks = [Landmark() for i in range(num_landmarks)] 32 | for i, landmark in enumerate(world.landmarks): 33 | landmark.name = 'landmark %d' % i 34 | landmark.collide = True 35 | landmark.movable = False 36 | landmark.size = 0.2 37 | landmark.boundary = False 38 | world.food = [Landmark() for i in range(num_food)] 39 | for i, landmark in enumerate(world.food): 40 | landmark.name = 'food %d' % i 41 | landmark.collide = False 42 | landmark.movable = False 43 | landmark.size = 0.03 44 | landmark.boundary = False 45 | world.forests = [Landmark() for i in range(num_forests)] 46 | for i, landmark in enumerate(world.forests): 47 | landmark.name = 'forest %d' % i 48 | landmark.collide = False 49 | landmark.movable = False 50 | landmark.size = 0.3 51 | landmark.boundary = False 52 | world.landmarks += world.food 53 | world.landmarks += world.forests 54 | #world.landmarks += self.set_boundaries(world) # world boundaries now penalized with negative reward 55 | # make initial conditions 56 | self.reset_world(world) 57 | return world 58 | 59 | def set_boundaries(self, world): 60 | boundary_list = [] 61 | landmark_size = 1 62 | edge = 1 + landmark_size 63 | num_landmarks = int(edge * 2 / landmark_size) 64 | for x_pos in [-edge, edge]: 65 | for i in range(num_landmarks): 66 | l = Landmark() 67 | l.state.p_pos = np.array([x_pos, -1 + i 
* landmark_size]) 68 | boundary_list.append(l) 69 | 70 | for y_pos in [-edge, edge]: 71 | for i in range(num_landmarks): 72 | l = Landmark() 73 | l.state.p_pos = np.array([-1 + i * landmark_size, y_pos]) 74 | boundary_list.append(l) 75 | 76 | for i, l in enumerate(boundary_list): 77 | l.name = 'boundary %d' % i 78 | l.collide = True 79 | l.movable = False 80 | l.boundary = True 81 | l.color = np.array([0.75, 0.75, 0.75]) 82 | l.size = landmark_size 83 | l.state.p_vel = np.zeros(world.dim_p) 84 | 85 | return boundary_list 86 | 87 | 88 | def reset_world(self, world): 89 | # random properties for agents 90 | for i, agent in enumerate(world.agents): 91 | agent.color = np.array([0.45, 0.95, 0.45]) if not agent.adversary else np.array([0.95, 0.45, 0.45]) 92 | agent.color -= np.array([0.3, 0.3, 0.3]) if agent.leader else np.array([0, 0, 0]) 93 | # random properties for landmarks 94 | for i, landmark in enumerate(world.landmarks): 95 | landmark.color = np.array([0.25, 0.25, 0.25]) 96 | for i, landmark in enumerate(world.food): 97 | landmark.color = np.array([0.15, 0.15, 0.65]) 98 | for i, landmark in enumerate(world.forests): 99 | landmark.color = np.array([0.6, 0.9, 0.6]) 100 | # set random initial states 101 | for agent in world.agents: 102 | agent.state.p_pos = np.random.uniform(-1, +1, world.dim_p) 103 | agent.state.p_vel = np.zeros(world.dim_p) 104 | agent.state.c = np.zeros(world.dim_c) 105 | for i, landmark in enumerate(world.landmarks): 106 | landmark.state.p_pos = np.random.uniform(-0.9, +0.9, world.dim_p) 107 | landmark.state.p_vel = np.zeros(world.dim_p) 108 | for i, landmark in enumerate(world.food): 109 | landmark.state.p_pos = np.random.uniform(-0.9, +0.9, world.dim_p) 110 | landmark.state.p_vel = np.zeros(world.dim_p) 111 | for i, landmark in enumerate(world.forests): 112 | landmark.state.p_pos = np.random.uniform(-0.9, +0.9, world.dim_p) 113 | landmark.state.p_vel = np.zeros(world.dim_p) 114 | 115 | def benchmark_data(self, agent, world): 116 | if agent.adversary: 117 | collisions = 0 118 | for a in self.good_agents(world): 119 | if self.is_collision(a, agent): 120 | collisions += 1 121 | return collisions 122 | else: 123 | return 0 124 | 125 | 126 | def is_collision(self, agent1, agent2): 127 | delta_pos = agent1.state.p_pos - agent2.state.p_pos 128 | dist = np.sqrt(np.sum(np.square(delta_pos))) 129 | dist_min = agent1.size + agent2.size 130 | return True if dist < dist_min else False 131 | 132 | 133 | # return all agents that are not adversaries 134 | def good_agents(self, world): 135 | return [agent for agent in world.agents if not agent.adversary] 136 | 137 | # return all adversarial agents 138 | def adversaries(self, world): 139 | return [agent for agent in world.agents if agent.adversary] 140 | 141 | 142 | def reward(self, agent, world): 143 | # Agents are rewarded based on minimum agent distance to each landmark 144 | #boundary_reward = -10 if self.outside_boundary(agent) else 0 145 | main_reward = self.adversary_reward(agent, world) if agent.adversary else self.agent_reward(agent, world) 146 | return main_reward 147 | 148 | def outside_boundary(self, agent): 149 | if agent.state.p_pos[0] > 1 or agent.state.p_pos[0] < -1 or agent.state.p_pos[1] > 1 or agent.state.p_pos[1] < -1: 150 | return True 151 | else: 152 | return False 153 | 154 | 155 | def agent_reward(self, agent, world): 156 | # Agents are rewarded based on minimum agent distance to each landmark 157 | rew = 0 158 | shape = False 159 | adversaries = self.adversaries(world) 160 | if shape: 161 | for adv in 
adversaries: 162 | rew += 0.1 * np.sqrt(np.sum(np.square(agent.state.p_pos - adv.state.p_pos))) 163 | if agent.collide: 164 | for a in adversaries: 165 | if self.is_collision(a, agent): 166 | rew -= 5 167 | def bound(x): 168 | if x < 0.9: 169 | return 0 170 | if x < 1.0: 171 | return (x - 0.9) * 10 172 | return min(np.exp(2 * x - 2), 10) # 1 + (x - 1) * (x - 1) 173 | 174 | for p in range(world.dim_p): 175 | x = abs(agent.state.p_pos[p]) 176 | rew -= 2 * bound(x) 177 | 178 | for food in world.food: 179 | if self.is_collision(agent, food): 180 | rew += 2 181 | rew += 0.05 * min([np.sqrt(np.sum(np.square(food.state.p_pos - agent.state.p_pos))) for food in world.food]) 182 | 183 | return rew 184 | 185 | def adversary_reward(self, agent, world): 186 | # Agents are rewarded based on minimum agent distance to each landmark 187 | rew = 0 188 | shape = True 189 | agents = self.good_agents(world) 190 | adversaries = self.adversaries(world) 191 | if shape: 192 | rew -= 0.1 * min([np.sqrt(np.sum(np.square(a.state.p_pos - agent.state.p_pos))) for a in agents]) 193 | if agent.collide: 194 | for ag in agents: 195 | for adv in adversaries: 196 | if self.is_collision(ag, adv): 197 | rew += 5 198 | return rew 199 | 200 | 201 | def observation2(self, agent, world): 202 | # get positions of all entities in this agent's reference frame 203 | entity_pos = [] 204 | for entity in world.landmarks: 205 | if not entity.boundary: 206 | entity_pos.append(entity.state.p_pos - agent.state.p_pos) 207 | 208 | food_pos = [] 209 | for entity in world.food: 210 | if not entity.boundary: 211 | food_pos.append(entity.state.p_pos - agent.state.p_pos) 212 | # communication of all other agents 213 | comm = [] 214 | other_pos = [] 215 | other_vel = [] 216 | for other in world.agents: 217 | if other is agent: continue 218 | comm.append(other.state.c) 219 | other_pos.append(other.state.p_pos - agent.state.p_pos) 220 | if not other.adversary: 221 | other_vel.append(other.state.p_vel) 222 | return np.concatenate([agent.state.p_vel] + [agent.state.p_pos] + entity_pos + other_pos + other_vel) 223 | 224 | def observation(self, agent, world): 225 | # get positions of all entities in this agent's reference frame 226 | entity_pos = [] 227 | for entity in world.landmarks: 228 | if not entity.boundary: 229 | entity_pos.append(entity.state.p_pos - agent.state.p_pos) 230 | 231 | in_forest = [np.array([-1]), np.array([-1])] 232 | inf1 = False 233 | inf2 = False 234 | if self.is_collision(agent, world.forests[0]): 235 | in_forest[0] = np.array([1]) 236 | inf1= True 237 | if self.is_collision(agent, world.forests[1]): 238 | in_forest[1] = np.array([1]) 239 | inf2 = True 240 | 241 | food_pos = [] 242 | for entity in world.food: 243 | if not entity.boundary: 244 | food_pos.append(entity.state.p_pos - agent.state.p_pos) 245 | # communication of all other agents 246 | comm = [] 247 | other_pos = [] 248 | other_vel = [] 249 | for other in world.agents: 250 | if other is agent: continue 251 | comm.append(other.state.c) 252 | oth_f1 = self.is_collision(other, world.forests[0]) 253 | oth_f2 = self.is_collision(other, world.forests[1]) 254 | if (inf1 and oth_f1) or (inf2 and oth_f2) or (not inf1 and not oth_f1 and not inf2 and not oth_f2) or agent.leader: #without forest vis 255 | other_pos.append(other.state.p_pos - agent.state.p_pos) 256 | if not other.adversary: 257 | other_vel.append(other.state.p_vel) 258 | else: 259 | other_pos.append([0, 0]) 260 | if not other.adversary: 261 | other_vel.append([0, 0]) 262 | 263 | # to tell the pred when the prey are 
in the forest 264 | prey_forest = [] 265 | ga = self.good_agents(world) 266 | for a in ga: 267 | if any([self.is_collision(a, f) for f in world.forests]): 268 | prey_forest.append(np.array([1])) 269 | else: 270 | prey_forest.append(np.array([-1])) 271 | # to tell leader when pred are in forest 272 | prey_forest_lead = [] 273 | for f in world.forests: 274 | if any([self.is_collision(a, f) for a in ga]): 275 | prey_forest_lead.append(np.array([1])) 276 | else: 277 | prey_forest_lead.append(np.array([-1])) 278 | 279 | comm = [world.agents[0].state.c] 280 | 281 | if agent.adversary and not agent.leader: 282 | return np.concatenate([agent.state.p_vel] + [agent.state.p_pos] + entity_pos + other_pos + other_vel + in_forest + comm) 283 | if agent.leader: 284 | return np.concatenate( 285 | [agent.state.p_vel] + [agent.state.p_pos] + entity_pos + other_pos + other_vel + in_forest + comm) 286 | else: 287 | return np.concatenate([agent.state.p_vel] + [agent.state.p_pos] + entity_pos + other_pos + in_forest + other_vel) 288 | 289 | 290 | -------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- 1 | from setuptools import setup, find_packages 2 | 3 | setup(name='multiagent', 4 | version='0.0.1', 5 | description='Multi-Agent Goal-Driven Communication Environment', 6 | url='https://github.com/openai/multiagent-public', 7 | author='Igor Mordatch', 8 | author_email='mordatch@openai.com', 9 | packages=find_packages(), 10 | include_package_data=True, 11 | zip_safe=False, 12 | install_requires=['gym', 'numpy-stl'] 13 | ) 14 | --------------------------------------------------------------------------------
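Usage sketch: the Scenario classes above are not environments on their own; they are wrapped by multiagent.environment.MultiAgentEnv, which invokes the scenario's reset_world/reward/observation methods as callbacks (the pattern make_env.py follows). The snippet below is a minimal sketch under a few assumptions: the callback keyword names match environment.py, and simple_spread is used, so every agent is silent and movable and each per-agent action space is a small Discrete movement space for which the environment expects a one-hot style action vector by default.

import numpy as np
from multiagent.environment import MultiAgentEnv
import multiagent.scenarios as scenarios

# build a world from one of the scenarios above
scenario = scenarios.load('simple_spread.py').Scenario()
world = scenario.make_world()

# wrap the world; the scenario's methods become the environment's callbacks
# (keyword names assumed to match environment.py)
env = MultiAgentEnv(world,
                    reset_callback=scenario.reset_world,
                    reward_callback=scenario.reward,
                    observation_callback=scenario.observation)

obs_n = env.reset()                      # one observation per agent
for _ in range(25):
    act_n = []
    for space in env.action_space:       # one action space per agent
        a = np.zeros(space.n)            # one-hot movement action (simple_spread agents are silent)
        a[np.random.randint(space.n)] = 1.0
        act_n.append(a)
    obs_n, reward_n, done_n, info_n = env.step(act_n)

In benchmark mode, make_env.py additionally passes scenario.benchmark_data as the info callback, so the diagnostics defined in the scenarios above are returned through env.step() alongside the rewards.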