├── .gitignore
├── LICENSE.txt
├── README.md
├── bin
│   ├── __init__.py
│   └── interactive.py
├── make_env.py
├── multiagent
│   ├── __init__.py
│   ├── core.py
│   ├── environment.py
│   ├── multi_discrete.py
│   ├── policy.py
│   ├── rendering.py
│   ├── scenario.py
│   └── scenarios
│       ├── __init__.py
│       ├── simple.py
│       ├── simple_adversary.py
│       ├── simple_crypto.py
│       ├── simple_push.py
│       ├── simple_reference.py
│       ├── simple_speaker_listener.py
│       ├── simple_spread.py
│       ├── simple_tag.py
│       └── simple_world_comm.py
└── setup.py
/.gitignore:
--------------------------------------------------------------------------------
1 | __pycache__/
2 | *.egg-info/
3 | *.pyc
--------------------------------------------------------------------------------
/LICENSE.txt:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2018 OpenAI
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | **Status:** Archive (code is provided as-is, no updates expected)
2 |
3 | # Maintained Fork
4 |
5 | The maintained version of these environments, which includes numerous fixes, comprehensive documentation, support for installation via pip, and support for current versions of Python, is available in PettingZoo (https://github.com/Farama-Foundation/PettingZoo, https://pettingzoo.farama.org/environments/mpe/).
6 |
7 | # Multi-Agent Particle Environment
8 |
9 | A simple multi-agent particle world with a continuous observation and discrete action space, along with some basic simulated physics.
10 | Used in the paper [Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments](https://arxiv.org/pdf/1706.02275.pdf).
11 |
12 | ## Getting started:
13 |
14 | - To install, `cd` into the root directory and type `pip install -e .`
15 |
16 | - To interactively view the 'move to landmark' scenario (see others in ./multiagent/scenarios/):
17 | `bin/interactive.py --scenario simple.py`
18 |
19 | - Known dependencies: Python (3.5.4), OpenAI gym (0.10.5), numpy (1.14.5), pyglet (1.5.27)
20 |
21 | - To use the environments, look at the code for importing them in `make_env.py`.
22 |
23 | ## Code structure
24 |
25 | - `make_env.py`: contains code for importing a multiagent environment as an OpenAI Gym-like object.
26 |
27 | - `./multiagent/environment.py`: contains code for environment simulation (interaction physics, `_step()` function, etc.)
28 |
29 | - `./multiagent/core.py`: contains classes for various objects (Entities, Landmarks, Agents, etc.) that are used throughout the code.
30 |
31 | - `./multiagent/rendering.py`: used for displaying agent behaviors on the screen.
32 |
33 | - `./multiagent/policy.py`: contains code for interactive policy based on keyboard input.
34 |
35 | - `./multiagent/scenario.py`: contains base scenario object that is extended for all scenarios.
36 |
37 | - `./multiagent/scenarios/`: folder where the various scenarios/environments are stored. Scenario code consists of several functions:
38 | 1) `make_world()`: creates all of the entities that inhabit the world (landmarks, agents, etc.), assigns their capabilities (whether they can communicate, or move, or both).
39 |     Called once at the beginning of each training session.
40 | 2) `reset_world()`: resets the world by assigning properties (position, color, etc.) to all entities in the world
41 |     Called before every episode (including right after `make_world()`, before the first episode).
42 | 3) `reward()`: defines the reward function for a given agent
43 | 4) `observation()`: defines the observation space of a given agent
44 | 5) (optional) `benchmark_data()`: provides diagnostic data for policies trained on the environment (e.g. evaluation metrics)
45 |
46 | ### Creating new environments
47 |
48 | You can create new scenarios by implementing the first 4 functions above (`make_world()`, `reset_world()`, `reward()`, and `observation()`).
49 |
50 | ## List of environments
51 |
52 |
53 | | Env name in code (name in paper) | Communication? | Competitive? | Notes |
54 | | --- | --- | --- | --- |
55 | | `simple.py` | N | N | Single agent sees landmark position, rewarded based on how close it gets to landmark. Not a multiagent environment -- used for debugging policies. |
56 | | `simple_adversary.py` (Physical deception) | N | Y | 1 adversary (red), N good agents (green), N landmarks (usually N=2). All agents observe position of landmarks and other agents. One landmark is the ‘target landmark’ (colored green). Good agents rewarded based on how close one of them is to the target landmark, but negatively rewarded if the adversary is close to target landmark. Adversary is rewarded based on how close it is to the target, but it doesn’t know which landmark is the target landmark. So good agents have to learn to ‘split up’ and cover all landmarks to deceive the adversary. |
57 | | `simple_crypto.py` (Covert communication) | Y | Y | Two good agents (alice and bob), one adversary (eve). Alice must send a private message to bob over a public channel. Alice and bob are rewarded based on how well bob reconstructs the message, but negatively rewarded if eve can reconstruct the message. Alice and bob have a private key (randomly generated at the beginning of each episode), which they must learn to use to encrypt the message. |
58 | | `simple_push.py` (Keep-away) | N | Y | 1 agent, 1 adversary, 1 landmark. Agent is rewarded based on distance to landmark. Adversary is rewarded if it is close to the landmark, and if the agent is far from the landmark. So the adversary learns to push the agent away from the landmark. |
59 | | `simple_reference.py` | Y | N | 2 agents, 3 landmarks of different colors. Each agent wants to get to their target landmark, which is known only by the other agent. Reward is collective. So agents have to learn to communicate the goal of the other agent, and navigate to their landmark. This is the same as the simple_speaker_listener scenario where both agents are simultaneous speakers and listeners. |
60 | | `simple_speaker_listener.py` (Cooperative communication) | Y | N | Same as simple_reference, except one agent is the ‘speaker’ (gray) that does not move (observes goal of other agent), and the other agent is the listener (cannot speak, but must navigate to the correct landmark). |
61 | | `simple_spread.py` (Cooperative navigation) | N | N | N agents, N landmarks. Agents are rewarded based on how far any agent is from each landmark. Agents are penalized if they collide with other agents. So, agents have to learn to cover all the landmarks while avoiding collisions. |
62 | | `simple_tag.py` (Predator-prey) | N | Y | Predator-prey environment. Good agents (green) are faster and want to avoid being hit by adversaries (red). Adversaries are slower and want to hit good agents. Obstacles (large black circles) block the way. |
63 | | `simple_world_comm.py` | Y | Y | Environment seen in the video accompanying the paper. Same as simple_tag, except (1) there is food (small blue balls) that the good agents are rewarded for being near, (2) there are now ‘forests’ that hide agents inside them from being seen from outside, and (3) there is a ‘leader adversary’ that can see the agents at all times and can communicate with the other adversaries to help coordinate the chase. |
64 |
65 | ## Paper citation
66 |
67 | If you used this environment for your experiments or found it helpful, consider citing the following papers:
68 |
69 | Environments in this repo:
70 |
71 | @article{lowe2017multi,
72 | title={Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments},
73 | author={Lowe, Ryan and Wu, Yi and Tamar, Aviv and Harb, Jean and Abbeel, Pieter and Mordatch, Igor},
74 | journal={Neural Information Processing Systems (NIPS)},
75 | year={2017}
76 | }
77 |
78 |
79 | Original particle world environment:
80 |
81 | @article{mordatch2017emergence,
82 | title={Emergence of Grounded Compositional Language in Multi-Agent Populations},
83 | author={Mordatch, Igor and Abbeel, Pieter},
84 | journal={arXiv preprint arXiv:1703.04908},
85 | year={2017}
86 | }
87 |
88 |
--------------------------------------------------------------------------------
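The README above walks through `make_env.py` and the scenario layout; the sketch below shows how those pieces fit together in practice. It is an illustrative example only (not part of the repo): it assumes the package has been installed with `pip install -e .`, that a display is available for `render()`, and that the chosen scenario's agents only move (so each action space is `Discrete(5)` and a one-hot vector is a valid action).

```python
import numpy as np
from make_env import make_env   # run from the repository root

env = make_env('simple_spread')          # any movement-only scenario from ./multiagent/scenarios/
obs_n = env.reset()                      # list with one observation per agent

for _ in range(25):
    act_n = []
    for space in env.action_space:
        # one-hot physical action: index 0 = no-op, 1 = +x, 2 = -x, 3 = +y, 4 = -y
        a = np.zeros(space.n)
        a[np.random.randint(space.n)] = 1.0
        act_n.append(a)
    obs_n, reward_n, done_n, info_n = env.step(act_n)   # per-agent lists
    env.render()
```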
/bin/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/openai/multiagent-particle-envs/83ba4d1aeb00282f7c4acd6912435b3ca642c227/bin/__init__.py
--------------------------------------------------------------------------------
/bin/interactive.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python
2 | import os,sys
3 | sys.path.insert(1, os.path.join(sys.path[0], '..'))
4 | import argparse
5 |
6 | from multiagent.environment import MultiAgentEnv
7 | from multiagent.policy import InteractivePolicy
8 | import multiagent.scenarios as scenarios
9 |
10 | if __name__ == '__main__':
11 | # parse arguments
12 | parser = argparse.ArgumentParser(description=None)
13 | parser.add_argument('-s', '--scenario', default='simple.py', help='Path of the scenario Python script.')
14 | args = parser.parse_args()
15 |
16 | # load scenario from script
17 | scenario = scenarios.load(args.scenario).Scenario()
18 | # create world
19 | world = scenario.make_world()
20 | # create multiagent environment
21 | env = MultiAgentEnv(world, scenario.reset_world, scenario.reward, scenario.observation, info_callback=None, shared_viewer = False)
22 | # render call to create viewer window (necessary only for interactive policies)
23 | env.render()
24 | # create interactive policies for each agent
25 | policies = [InteractivePolicy(env,i) for i in range(env.n)]
26 | # execution loop
27 | obs_n = env.reset()
28 | while True:
29 | # query for action from each agent's policy
30 | act_n = []
31 | for i, policy in enumerate(policies):
32 | act_n.append(policy.action(obs_n[i]))
33 | # step environment
34 | obs_n, reward_n, done_n, _ = env.step(act_n)
35 | # render all agent views
36 | env.render()
37 | # display rewards
38 | #for agent in env.world.agents:
39 | # print(agent.name + " reward: %0.3f" % env._get_reward(agent))
40 |
--------------------------------------------------------------------------------
/make_env.py:
--------------------------------------------------------------------------------
1 | """
2 | Code for creating a multiagent environment with one of the scenarios listed
3 | in ./scenarios/.
4 | Can be called by using, for example:
5 | env = make_env('simple_speaker_listener')
6 | After producing the env object, can be used similarly to an OpenAI gym
7 | environment.
8 |
9 | A policy using this environment must output actions in the form of a list
10 | for all agents. Each element of the list should be a numpy array,
11 | of size (env.world.dim_p + env.world.dim_c, 1). Physical actions precede
12 | communication actions in this array. See environment.py for more details.
13 | """
14 |
15 | def make_env(scenario_name, benchmark=False):
16 | '''
17 | Creates a MultiAgentEnv object as env. This can be used similar to a gym
18 | environment by calling env.reset() and env.step().
19 | Use env.render() to view the environment on the screen.
20 |
21 | Input:
22 |         scenario_name   :   name of the scenario from ./scenarios/ to be loaded
23 | (without the .py extension)
24 | benchmark : whether you want to produce benchmarking data
25 | (usually only done during evaluation)
26 |
27 | Some useful env properties (see environment.py):
28 | .observation_space : Returns the observation space for each agent
29 | .action_space : Returns the action space for each agent
30 | .n : Returns the number of Agents
31 | '''
32 | from multiagent.environment import MultiAgentEnv
33 | import multiagent.scenarios as scenarios
34 |
35 | # load scenario from script
36 | scenario = scenarios.load(scenario_name + ".py").Scenario()
37 | # create world
38 | world = scenario.make_world()
39 | # create multiagent environment
40 | if benchmark:
41 | env = MultiAgentEnv(world, scenario.reset_world, scenario.reward, scenario.observation, scenario.benchmark_data)
42 | else:
43 | env = MultiAgentEnv(world, scenario.reset_world, scenario.reward, scenario.observation)
44 | return env
45 |
--------------------------------------------------------------------------------
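The docstring above notes that physical actions precede communication actions in each agent's action array. A hedged sketch of that layout for the default discrete action spaces (the `make_action` helper is hypothetical, not part of the repo):

```python
import numpy as np

def make_action(move_idx, comm_idx, dim_c):
    """Hypothetical helper: build one agent's flat action vector.

    move_idx: 0 = no-op, 1 = +x, 2 = -x, 3 = +y, 4 = -y
    comm_idx: index of the communication symbol, in [0, dim_c)
    """
    u = np.zeros(5)            # physical (movement) block comes first
    u[move_idx] = 1.0
    c = np.zeros(dim_c)        # communication block follows
    c[comm_idx] = 1.0
    return np.concatenate([u, c])

# e.g. for moving, speaking agents in a world with dim_c communication symbols:
# act_n = [make_action(1, 0, env.world.dim_c) for _ in range(env.n)]
```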
/multiagent/__init__.py:
--------------------------------------------------------------------------------
1 | import os
2 | import warnings
3 |
4 | from gym.envs.registration import register
5 |
6 | # Multiagent envs
7 | # ----------------------------------------
8 |
9 | register(
10 | id='MultiagentSimple-v0',
11 | entry_point='multiagent.envs:SimpleEnv',
12 | # FIXME(cathywu) currently has to be exactly max_path_length parameters in
13 | # rllab run script
14 | max_episode_steps=100,
15 | )
16 |
17 | register(
18 | id='MultiagentSimpleSpeakerListener-v0',
19 | entry_point='multiagent.envs:SimpleSpeakerListenerEnv',
20 | max_episode_steps=100,
21 | )
22 |
23 | warnings.warn("This code base is no longer maintained, and is not expected to be maintained again in the future. \n"
24 |               "For the past handful of years, these environments have been maintained inside of PettingZoo (see "
25 | "https://pettingzoo.farama.org/environments/mpe/). \nThis maintained version includes documentation, "
26 | "support for the PettingZoo API, support for current versions of Python, numerous bug fixes, \n"
27 | "support for installation via pip, and numerous other large quality of life improvements. \nWe "
28 | "encourage researchers to switch to this maintained version for all purposes other than comparing "
29 | "to results run on this version of the environments. \n")
30 |
31 | if os.getenv('SUPPRESS_MA_PROMPT') != '1':
32 | input("Please read the raised warning, then press Enter to continue... (to suppress this prompt, please set the environment variable `SUPPRESS_MA_PROMPT=1`)\n")
33 |
--------------------------------------------------------------------------------
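Since the module above prompts for keyboard input on import, non-interactive scripts (tests, CI, cluster jobs) need the environment variable it checks. A minimal sketch:

```python
# Set the variable before the first `import multiagent`; the deprecation warning is still printed.
import os
os.environ['SUPPRESS_MA_PROMPT'] = '1'

import multiagent  # no "press Enter" prompt now
```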
/multiagent/core.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 |
3 | # physical/external base state of all entities
4 | class EntityState(object):
5 | def __init__(self):
6 | # physical position
7 | self.p_pos = None
8 | # physical velocity
9 | self.p_vel = None
10 |
11 | # state of agents (including communication and internal/mental state)
12 | class AgentState(EntityState):
13 | def __init__(self):
14 | super(AgentState, self).__init__()
15 | # communication utterance
16 | self.c = None
17 |
18 | # action of the agent
19 | class Action(object):
20 | def __init__(self):
21 | # physical action
22 | self.u = None
23 | # communication action
24 | self.c = None
25 |
26 | # properties and state of physical world entity
27 | class Entity(object):
28 | def __init__(self):
29 | # name
30 | self.name = ''
31 | # properties:
32 | self.size = 0.050
33 | # entity can move / be pushed
34 | self.movable = False
35 | # entity collides with others
36 | self.collide = True
37 | # material density (affects mass)
38 | self.density = 25.0
39 | # color
40 | self.color = None
41 | # max speed and accel
42 | self.max_speed = None
43 | self.accel = None
44 | # state
45 | self.state = EntityState()
46 | # mass
47 | self.initial_mass = 1.0
48 |
49 | @property
50 | def mass(self):
51 | return self.initial_mass
52 |
53 | # properties of landmark entities
54 | class Landmark(Entity):
55 | def __init__(self):
56 | super(Landmark, self).__init__()
57 |
58 | # properties of agent entities
59 | class Agent(Entity):
60 | def __init__(self):
61 | super(Agent, self).__init__()
62 | # agents are movable by default
63 | self.movable = True
64 | # cannot send communication signals
65 | self.silent = False
66 | # cannot observe the world
67 | self.blind = False
68 | # physical motor noise amount
69 | self.u_noise = None
70 | # communication noise amount
71 | self.c_noise = None
72 | # control range
73 | self.u_range = 1.0
74 | # state
75 | self.state = AgentState()
76 | # action
77 | self.action = Action()
78 | # script behavior to execute
79 | self.action_callback = None
80 |
81 | # multi-agent world
82 | class World(object):
83 | def __init__(self):
84 | # list of agents and entities (can change at execution-time!)
85 | self.agents = []
86 | self.landmarks = []
87 | # communication channel dimensionality
88 | self.dim_c = 0
89 | # position dimensionality
90 | self.dim_p = 2
91 | # color dimensionality
92 | self.dim_color = 3
93 | # simulation timestep
94 | self.dt = 0.1
95 | # physical damping
96 | self.damping = 0.25
97 | # contact response parameters
98 | self.contact_force = 1e+2
99 | self.contact_margin = 1e-3
100 |
101 | # return all entities in the world
102 | @property
103 | def entities(self):
104 | return self.agents + self.landmarks
105 |
106 | # return all agents controllable by external policies
107 | @property
108 | def policy_agents(self):
109 | return [agent for agent in self.agents if agent.action_callback is None]
110 |
111 | # return all agents controlled by world scripts
112 | @property
113 | def scripted_agents(self):
114 | return [agent for agent in self.agents if agent.action_callback is not None]
115 |
116 | # update state of the world
117 | def step(self):
118 | # set actions for scripted agents
119 | for agent in self.scripted_agents:
120 | agent.action = agent.action_callback(agent, self)
121 | # gather forces applied to entities
122 | p_force = [None] * len(self.entities)
123 | # apply agent physical controls
124 | p_force = self.apply_action_force(p_force)
125 | # apply environment forces
126 | p_force = self.apply_environment_force(p_force)
127 | # integrate physical state
128 | self.integrate_state(p_force)
129 | # update agent state
130 | for agent in self.agents:
131 | self.update_agent_state(agent)
132 |
133 | # gather agent action forces
134 | def apply_action_force(self, p_force):
135 | # set applied forces
136 | for i,agent in enumerate(self.agents):
137 | if agent.movable:
138 | noise = np.random.randn(*agent.action.u.shape) * agent.u_noise if agent.u_noise else 0.0
139 | p_force[i] = agent.action.u + noise
140 | return p_force
141 |
142 | # gather physical forces acting on entities
143 | def apply_environment_force(self, p_force):
144 | # simple (but inefficient) collision response
145 | for a,entity_a in enumerate(self.entities):
146 | for b,entity_b in enumerate(self.entities):
147 | if(b <= a): continue
148 | [f_a, f_b] = self.get_collision_force(entity_a, entity_b)
149 | if(f_a is not None):
150 | if(p_force[a] is None): p_force[a] = 0.0
151 | p_force[a] = f_a + p_force[a]
152 | if(f_b is not None):
153 | if(p_force[b] is None): p_force[b] = 0.0
154 | p_force[b] = f_b + p_force[b]
155 | return p_force
156 |
157 | # integrate physical state
158 | def integrate_state(self, p_force):
159 | for i,entity in enumerate(self.entities):
160 | if not entity.movable: continue
161 | entity.state.p_vel = entity.state.p_vel * (1 - self.damping)
162 | if (p_force[i] is not None):
163 | entity.state.p_vel += (p_force[i] / entity.mass) * self.dt
164 | if entity.max_speed is not None:
165 | speed = np.sqrt(np.square(entity.state.p_vel[0]) + np.square(entity.state.p_vel[1]))
166 | if speed > entity.max_speed:
167 | entity.state.p_vel = entity.state.p_vel / np.sqrt(np.square(entity.state.p_vel[0]) +
168 | np.square(entity.state.p_vel[1])) * entity.max_speed
169 | entity.state.p_pos += entity.state.p_vel * self.dt
170 |
171 | def update_agent_state(self, agent):
172 | # set communication state (directly for now)
173 | if agent.silent:
174 | agent.state.c = np.zeros(self.dim_c)
175 | else:
176 | noise = np.random.randn(*agent.action.c.shape) * agent.c_noise if agent.c_noise else 0.0
177 | agent.state.c = agent.action.c + noise
178 |
179 | # get collision forces for any contact between two entities
180 | def get_collision_force(self, entity_a, entity_b):
181 | if (not entity_a.collide) or (not entity_b.collide):
182 | return [None, None] # not a collider
183 | if (entity_a is entity_b):
184 | return [None, None] # don't collide against itself
185 | # compute actual distance between entities
186 | delta_pos = entity_a.state.p_pos - entity_b.state.p_pos
187 | dist = np.sqrt(np.sum(np.square(delta_pos)))
188 | # minimum allowable distance
189 | dist_min = entity_a.size + entity_b.size
190 | # softmax penetration
191 | k = self.contact_margin
192 | penetration = np.logaddexp(0, -(dist - dist_min)/k)*k
193 | force = self.contact_force * delta_pos / dist * penetration
194 | force_a = +force if entity_a.movable else None
195 | force_b = -force if entity_b.movable else None
196 | return [force_a, force_b]
--------------------------------------------------------------------------------
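A hedged sketch of driving the physics in `core.py` directly, without wrapping it in `MultiAgentEnv`; the force values and step count are arbitrary, and `SUPPRESS_MA_PROMPT=1` is assumed so the package import does not block:

```python
import numpy as np
from multiagent.core import World, Agent

world = World()                            # dim_p = 2, dt = 0.1, damping = 0.25 by default
agent = Agent()
agent.name = 'agent 0'
agent.silent = True                        # no communication channel in this sketch
agent.state.p_pos = np.zeros(world.dim_p)
agent.state.p_vel = np.zeros(world.dim_p)
world.agents = [agent]
world.landmarks = []

for _ in range(10):
    agent.action.u = np.array([1.0, 0.0])  # constant rightward control force
    world.step()                           # applies damping, integrates forces, updates position
    print(agent.state.p_pos, agent.state.p_vel)
```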
/multiagent/environment.py:
--------------------------------------------------------------------------------
1 | import gym
2 | from gym import spaces
3 | from gym.envs.registration import EnvSpec
4 | import numpy as np
5 | from multiagent.multi_discrete import MultiDiscrete
6 |
7 | # environment for all agents in the multiagent world
8 | # currently code assumes that no agents will be created/destroyed at runtime!
9 | class MultiAgentEnv(gym.Env):
10 | metadata = {
11 | 'render.modes' : ['human', 'rgb_array']
12 | }
13 |
14 | def __init__(self, world, reset_callback=None, reward_callback=None,
15 | observation_callback=None, info_callback=None,
16 | done_callback=None, shared_viewer=True):
17 |
18 | self.world = world
19 | self.agents = self.world.policy_agents
20 | # set required vectorized gym env property
21 | self.n = len(world.policy_agents)
22 | # scenario callbacks
23 | self.reset_callback = reset_callback
24 | self.reward_callback = reward_callback
25 | self.observation_callback = observation_callback
26 | self.info_callback = info_callback
27 | self.done_callback = done_callback
28 | # environment parameters
29 | self.discrete_action_space = True
30 | # if true, action is a number 0...N, otherwise action is a one-hot N-dimensional vector
31 | self.discrete_action_input = False
32 |         # if true, even if the action is continuous, the action will be performed discretely
33 | self.force_discrete_action = world.discrete_action if hasattr(world, 'discrete_action') else False
34 | # if true, every agent has the same reward
35 | self.shared_reward = world.collaborative if hasattr(world, 'collaborative') else False
36 | self.time = 0
37 |
38 | # configure spaces
39 | self.action_space = []
40 | self.observation_space = []
41 | for agent in self.agents:
42 | total_action_space = []
43 | # physical action space
44 | if self.discrete_action_space:
45 | u_action_space = spaces.Discrete(world.dim_p * 2 + 1)
46 | else:
47 | u_action_space = spaces.Box(low=-agent.u_range, high=+agent.u_range, shape=(world.dim_p,), dtype=np.float32)
48 | if agent.movable:
49 | total_action_space.append(u_action_space)
50 | # communication action space
51 | if self.discrete_action_space:
52 | c_action_space = spaces.Discrete(world.dim_c)
53 | else:
54 | c_action_space = spaces.Box(low=0.0, high=1.0, shape=(world.dim_c,), dtype=np.float32)
55 | if not agent.silent:
56 | total_action_space.append(c_action_space)
57 | # total action space
58 | if len(total_action_space) > 1:
59 | # all action spaces are discrete, so simplify to MultiDiscrete action space
60 | if all([isinstance(act_space, spaces.Discrete) for act_space in total_action_space]):
61 | act_space = MultiDiscrete([[0, act_space.n - 1] for act_space in total_action_space])
62 | else:
63 | act_space = spaces.Tuple(total_action_space)
64 | self.action_space.append(act_space)
65 | else:
66 | self.action_space.append(total_action_space[0])
67 | # observation space
68 | obs_dim = len(observation_callback(agent, self.world))
69 | self.observation_space.append(spaces.Box(low=-np.inf, high=+np.inf, shape=(obs_dim,), dtype=np.float32))
70 | agent.action.c = np.zeros(self.world.dim_c)
71 |
72 | # rendering
73 | self.shared_viewer = shared_viewer
74 | if self.shared_viewer:
75 | self.viewers = [None]
76 | else:
77 | self.viewers = [None] * self.n
78 | self._reset_render()
79 |
80 | def step(self, action_n):
81 | obs_n = []
82 | reward_n = []
83 | done_n = []
84 | info_n = {'n': []}
85 | self.agents = self.world.policy_agents
86 | # set action for each agent
87 | for i, agent in enumerate(self.agents):
88 | self._set_action(action_n[i], agent, self.action_space[i])
89 | # advance world state
90 | self.world.step()
91 | # record observation for each agent
92 | for agent in self.agents:
93 | obs_n.append(self._get_obs(agent))
94 | reward_n.append(self._get_reward(agent))
95 | done_n.append(self._get_done(agent))
96 |
97 | info_n['n'].append(self._get_info(agent))
98 |
99 | # all agents get total reward in cooperative case
100 | reward = np.sum(reward_n)
101 | if self.shared_reward:
102 | reward_n = [reward] * self.n
103 |
104 | return obs_n, reward_n, done_n, info_n
105 |
106 | def reset(self):
107 | # reset world
108 | self.reset_callback(self.world)
109 | # reset renderer
110 | self._reset_render()
111 | # record observations for each agent
112 | obs_n = []
113 | self.agents = self.world.policy_agents
114 | for agent in self.agents:
115 | obs_n.append(self._get_obs(agent))
116 | return obs_n
117 |
118 | # get info used for benchmarking
119 | def _get_info(self, agent):
120 | if self.info_callback is None:
121 | return {}
122 | return self.info_callback(agent, self.world)
123 |
124 | # get observation for a particular agent
125 | def _get_obs(self, agent):
126 | if self.observation_callback is None:
127 | return np.zeros(0)
128 | return self.observation_callback(agent, self.world)
129 |
130 | # get dones for a particular agent
131 | # unused right now -- agents are allowed to go beyond the viewing screen
132 | def _get_done(self, agent):
133 | if self.done_callback is None:
134 | return False
135 | return self.done_callback(agent, self.world)
136 |
137 | # get reward for a particular agent
138 | def _get_reward(self, agent):
139 | if self.reward_callback is None:
140 | return 0.0
141 | return self.reward_callback(agent, self.world)
142 |
143 | # set env action for a particular agent
144 | def _set_action(self, action, agent, action_space, time=None):
145 | agent.action.u = np.zeros(self.world.dim_p)
146 | agent.action.c = np.zeros(self.world.dim_c)
147 | # process action
148 | if isinstance(action_space, MultiDiscrete):
149 | act = []
150 | size = action_space.high - action_space.low + 1
151 | index = 0
152 | for s in size:
153 | act.append(action[index:(index+s)])
154 | index += s
155 | action = act
156 | else:
157 | action = [action]
158 |
159 | if agent.movable:
160 | # physical action
161 | if self.discrete_action_input:
162 | agent.action.u = np.zeros(self.world.dim_p)
163 | # process discrete action
164 | if action[0] == 1: agent.action.u[0] = -1.0
165 | if action[0] == 2: agent.action.u[0] = +1.0
166 | if action[0] == 3: agent.action.u[1] = -1.0
167 | if action[0] == 4: agent.action.u[1] = +1.0
168 | else:
169 | if self.force_discrete_action:
170 | d = np.argmax(action[0])
171 | action[0][:] = 0.0
172 | action[0][d] = 1.0
173 | if self.discrete_action_space:
174 | agent.action.u[0] += action[0][1] - action[0][2]
175 | agent.action.u[1] += action[0][3] - action[0][4]
176 | else:
177 | agent.action.u = action[0]
178 | sensitivity = 5.0
179 | if agent.accel is not None:
180 | sensitivity = agent.accel
181 | agent.action.u *= sensitivity
182 | action = action[1:]
183 | if not agent.silent:
184 | # communication action
185 | if self.discrete_action_input:
186 | agent.action.c = np.zeros(self.world.dim_c)
187 | agent.action.c[action[0]] = 1.0
188 | else:
189 | agent.action.c = action[0]
190 | action = action[1:]
191 | # make sure we used all elements of action
192 | assert len(action) == 0
193 |
194 | # reset rendering assets
195 | def _reset_render(self):
196 | self.render_geoms = None
197 | self.render_geoms_xform = None
198 |
199 | # render environment
200 | def render(self, mode='human'):
201 | if mode == 'human':
202 | alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
203 | message = ''
204 | for agent in self.world.agents:
205 | comm = []
206 | for other in self.world.agents:
207 | if other is agent: continue
208 | if np.all(other.state.c == 0):
209 | word = '_'
210 | else:
211 | word = alphabet[np.argmax(other.state.c)]
212 | message += (other.name + ' to ' + agent.name + ': ' + word + ' ')
213 | print(message)
214 |
215 | for i in range(len(self.viewers)):
216 | # create viewers (if necessary)
217 | if self.viewers[i] is None:
218 | # import rendering only if we need it (and don't import for headless machines)
219 | #from gym.envs.classic_control import rendering
220 | from multiagent import rendering
221 | self.viewers[i] = rendering.Viewer(700,700)
222 |
223 | # create rendering geometry
224 | if self.render_geoms is None:
225 | # import rendering only if we need it (and don't import for headless machines)
226 | #from gym.envs.classic_control import rendering
227 | from multiagent import rendering
228 | self.render_geoms = []
229 | self.render_geoms_xform = []
230 | for entity in self.world.entities:
231 | geom = rendering.make_circle(entity.size)
232 | xform = rendering.Transform()
233 | if 'agent' in entity.name:
234 | geom.set_color(*entity.color, alpha=0.5)
235 | else:
236 | geom.set_color(*entity.color)
237 | geom.add_attr(xform)
238 | self.render_geoms.append(geom)
239 | self.render_geoms_xform.append(xform)
240 |
241 | # add geoms to viewer
242 | for viewer in self.viewers:
243 | viewer.geoms = []
244 | for geom in self.render_geoms:
245 | viewer.add_geom(geom)
246 |
247 | results = []
248 | for i in range(len(self.viewers)):
249 | from multiagent import rendering
250 | # update bounds to center around agent
251 | cam_range = 1
252 | if self.shared_viewer:
253 | pos = np.zeros(self.world.dim_p)
254 | else:
255 | pos = self.agents[i].state.p_pos
256 | self.viewers[i].set_bounds(pos[0]-cam_range,pos[0]+cam_range,pos[1]-cam_range,pos[1]+cam_range)
257 | # update geometry positions
258 | for e, entity in enumerate(self.world.entities):
259 | self.render_geoms_xform[e].set_translation(*entity.state.p_pos)
260 | # render to display or array
261 | results.append(self.viewers[i].render(return_rgb_array = mode=='rgb_array'))
262 |
263 | return results
264 |
265 | # create receptor field locations in local coordinate frame
266 | def _make_receptor_locations(self, agent):
267 | receptor_type = 'polar'
268 | range_min = 0.05 * 2.0
269 | range_max = 1.00
270 | dx = []
271 | # circular receptive field
272 | if receptor_type == 'polar':
273 | for angle in np.linspace(-np.pi, +np.pi, 8, endpoint=False):
274 | for distance in np.linspace(range_min, range_max, 3):
275 | dx.append(distance * np.array([np.cos(angle), np.sin(angle)]))
276 | # add origin
277 | dx.append(np.array([0.0, 0.0]))
278 | # grid receptive field
279 | if receptor_type == 'grid':
280 | for x in np.linspace(-range_max, +range_max, 5):
281 | for y in np.linspace(-range_max, +range_max, 5):
282 | dx.append(np.array([x,y]))
283 | return dx
284 |
285 |
286 | # vectorized wrapper for a batch of multi-agent environments
287 | # assumes all environments have the same observation and action space
288 | class BatchMultiAgentEnv(gym.Env):
289 | metadata = {
290 | 'runtime.vectorized': True,
291 | 'render.modes' : ['human', 'rgb_array']
292 | }
293 |
294 | def __init__(self, env_batch):
295 | self.env_batch = env_batch
296 |
297 | @property
298 | def n(self):
299 | return np.sum([env.n for env in self.env_batch])
300 |
301 | @property
302 | def action_space(self):
303 | return self.env_batch[0].action_space
304 |
305 | @property
306 | def observation_space(self):
307 | return self.env_batch[0].observation_space
308 |
309 | def step(self, action_n, time):
310 | obs_n = []
311 | reward_n = []
312 | done_n = []
313 | info_n = {'n': []}
314 | i = 0
315 | for env in self.env_batch:
316 | obs, reward, done, _ = env.step(action_n[i:(i+env.n)], time)
317 | i += env.n
318 | obs_n += obs
319 | # reward = [r / len(self.env_batch) for r in reward]
320 | reward_n += reward
321 | done_n += done
322 | return obs_n, reward_n, done_n, info_n
323 |
324 | def reset(self):
325 | obs_n = []
326 | for env in self.env_batch:
327 | obs_n += env.reset()
328 | return obs_n
329 |
330 | # render environment
331 | def render(self, mode='human', close=True):
332 | results_n = []
333 | for env in self.env_batch:
334 | results_n += env.render(mode, close)
335 | return results_n
336 |
--------------------------------------------------------------------------------
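None of the bundled scenarios define a `done_callback`, so `_get_done()` above always returns False and episodes never signal termination on their own. A hedged sketch of constructing `MultiAgentEnv` directly with a custom termination criterion (the `done_when_out_of_bounds` function and its bound are hypothetical examples):

```python
import numpy as np
from multiagent.environment import MultiAgentEnv
import multiagent.scenarios as scenarios

scenario = scenarios.load('simple_spread.py').Scenario()
world = scenario.make_world()

def done_when_out_of_bounds(agent, world):
    # hypothetical criterion: end an agent's episode once it leaves a [-2, 2] arena
    return bool(np.any(np.abs(agent.state.p_pos) > 2.0))

env = MultiAgentEnv(world,
                    reset_callback=scenario.reset_world,
                    reward_callback=scenario.reward,
                    observation_callback=scenario.observation,
                    done_callback=done_when_out_of_bounds)
```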
/multiagent/multi_discrete.py:
--------------------------------------------------------------------------------
1 | # An old version of OpenAI Gym's multi_discrete.py. (Was getting affected by Gym updates)
2 | # (https://github.com/openai/gym/blob/1fb81d4e3fb780ccf77fec731287ba07da35eb84/gym/spaces/multi_discrete.py)
3 |
4 | import numpy as np
5 |
6 | import gym
7 | from gym.spaces import prng
8 |
9 | class MultiDiscrete(gym.Space):
10 | """
11 | - The multi-discrete action space consists of a series of discrete action spaces with different parameters
12 | - It can be adapted to both a Discrete action space or a continuous (Box) action space
13 | - It is useful to represent game controllers or keyboards where each key can be represented as a discrete action space
14 | - It is parametrized by passing an array of arrays containing [min, max] for each discrete action space
15 | where the discrete action space can take any integers from `min` to `max` (both inclusive)
16 | Note: A value of 0 always need to represent the NOOP action.
17 | e.g. Nintendo Game Controller
18 | - Can be conceptualized as 3 discrete action spaces:
19 | 1) Arrow Keys: Discrete 5 - NOOP[0], UP[1], RIGHT[2], DOWN[3], LEFT[4] - params: min: 0, max: 4
20 | 2) Button A: Discrete 2 - NOOP[0], Pressed[1] - params: min: 0, max: 1
21 | 3) Button B: Discrete 2 - NOOP[0], Pressed[1] - params: min: 0, max: 1
22 | - Can be initialized as
23 | MultiDiscrete([ [0,4], [0,1], [0,1] ])
24 | """
25 | def __init__(self, array_of_param_array):
26 | self.low = np.array([x[0] for x in array_of_param_array])
27 | self.high = np.array([x[1] for x in array_of_param_array])
28 | self.num_discrete_space = self.low.shape[0]
29 |
30 | def sample(self):
31 | """ Returns a array with one sample from each discrete action space """
32 | # For each row: round(random .* (max - min) + min, 0)
33 | random_array = prng.np_random.rand(self.num_discrete_space)
34 | return [int(x) for x in np.floor(np.multiply((self.high - self.low + 1.), random_array) + self.low)]
35 | def contains(self, x):
36 | return len(x) == self.num_discrete_space and (np.array(x) >= self.low).all() and (np.array(x) <= self.high).all()
37 |
38 | @property
39 | def shape(self):
40 | return self.num_discrete_space
41 | def __repr__(self):
42 | return "MultiDiscrete" + str(self.num_discrete_space)
43 | def __eq__(self, other):
44 | return np.array_equal(self.low, other.low) and np.array_equal(self.high, other.high)
--------------------------------------------------------------------------------
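A small usage sketch of the space above, reusing the docstring's example parameters (importing this module needs the old `gym.spaces.prng`, which exists in the gym 0.10.5 version pinned in the README):

```python
from multiagent.multi_discrete import MultiDiscrete

space = MultiDiscrete([[0, 4], [0, 1], [0, 1]])
print(space.shape)                # 3: number of discrete sub-spaces
print(space.contains([2, 0, 1]))  # True: each entry lies within its [min, max]
print(space.contains([5, 0, 0]))  # False: 5 exceeds the first sub-space's max of 4
```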
/multiagent/policy.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | from pyglet.window import key
3 |
4 | # individual agent policy
5 | class Policy(object):
6 | def __init__(self):
7 | pass
8 | def action(self, obs):
9 | raise NotImplementedError()
10 |
11 | # interactive policy based on keyboard input
12 | # hard-coded to deal only with movement, not communication
13 | class InteractivePolicy(Policy):
14 | def __init__(self, env, agent_index):
15 | super(InteractivePolicy, self).__init__()
16 | self.env = env
17 | # hard-coded keyboard events
18 | self.move = [False for i in range(4)]
19 | self.comm = [False for i in range(env.world.dim_c)]
20 | # register keyboard events with this environment's window
21 | env.viewers[agent_index].window.on_key_press = self.key_press
22 | env.viewers[agent_index].window.on_key_release = self.key_release
23 |
24 | def action(self, obs):
25 | # ignore observation and just act based on keyboard events
26 | if self.env.discrete_action_input:
27 | u = 0
28 | if self.move[0]: u = 1
29 | if self.move[1]: u = 2
30 | if self.move[2]: u = 4
31 | if self.move[3]: u = 3
32 | else:
33 | u = np.zeros(5) # 5-d because of no-move action
34 | if self.move[0]: u[1] += 1.0
35 | if self.move[1]: u[2] += 1.0
36 | if self.move[3]: u[3] += 1.0
37 | if self.move[2]: u[4] += 1.0
38 | if True not in self.move:
39 | u[0] += 1.0
40 | return np.concatenate([u, np.zeros(self.env.world.dim_c)])
41 |
42 | # keyboard event callbacks
43 | def key_press(self, k, mod):
44 | if k==key.LEFT: self.move[0] = True
45 | if k==key.RIGHT: self.move[1] = True
46 | if k==key.UP: self.move[2] = True
47 | if k==key.DOWN: self.move[3] = True
48 | def key_release(self, k, mod):
49 | if k==key.LEFT: self.move[0] = False
50 | if k==key.RIGHT: self.move[1] = False
51 | if k==key.UP: self.move[2] = False
52 | if k==key.DOWN: self.move[3] = False
53 |
--------------------------------------------------------------------------------
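`InteractivePolicy` above is keyboard-driven; scripted agents can subclass `Policy` in the same way. The `RandomPolicy` below is a hypothetical illustration (not part of the repo) that mirrors the action format returned by `InteractivePolicy.action()`:

```python
import numpy as np
from multiagent.policy import Policy

class RandomPolicy(Policy):
    """Hypothetical example: pick a random movement direction each step, never speak."""
    def __init__(self, env, agent_index):
        super(RandomPolicy, self).__init__()
        self.env = env
        self.agent_index = agent_index

    def action(self, obs):
        u = np.zeros(5)                     # [no-op, +x, -x, +y, -y]
        u[np.random.randint(5)] = 1.0
        return np.concatenate([u, np.zeros(self.env.world.dim_c)])
```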
/multiagent/rendering.py:
--------------------------------------------------------------------------------
1 | """
2 | 2D rendering framework
3 | """
4 | from __future__ import division
5 | import os
6 | import six
7 | import sys
8 |
9 | if "Apple" in sys.version:
10 | if 'DYLD_FALLBACK_LIBRARY_PATH' in os.environ:
11 | os.environ['DYLD_FALLBACK_LIBRARY_PATH'] += ':/usr/lib'
12 | # (JDS 2016/04/15): avoid bug on Anaconda 2.3.0 / Yosemite
13 |
14 | from gym.utils import reraise
15 | from gym import error
16 |
17 | try:
18 | import pyglet
19 | except ImportError as e:
20 | reraise(suffix="HINT: you can install pyglet directly via 'pip install pyglet'. But if you really just want to install all Gym dependencies and not have to think about it, 'pip install -e .[all]' or 'pip install gym[all]' will do it.")
21 |
22 | try:
23 | from pyglet.gl import *
24 | except ImportError as e:
25 |     reraise(prefix="Error occurred while running `from pyglet.gl import *`",suffix="HINT: make sure you have OpenGL installed. On Ubuntu, you can run 'apt-get install python-opengl'. If you're running on a server, you may need a virtual frame buffer; something like this should work: 'xvfb-run -s \"-screen 0 1400x900x24\" python '")
26 |
27 | import math
28 | import numpy as np
29 |
30 | RAD2DEG = 57.29577951308232
31 |
32 | def get_display(spec):
33 | """Convert a display specification (such as :0) into an actual Display
34 | object.
35 |
36 | Pyglet only supports multiple Displays on Linux.
37 | """
38 | if spec is None:
39 | return None
40 | elif isinstance(spec, six.string_types):
41 | return pyglet.canvas.Display(spec)
42 | else:
43 | raise error.Error('Invalid display specification: {}. (Must be a string like :0 or None.)'.format(spec))
44 |
45 | class Viewer(object):
46 | def __init__(self, width, height, display=None):
47 | display = get_display(display)
48 |
49 | self.width = width
50 | self.height = height
51 |
52 | self.window = pyglet.window.Window(width=width, height=height, display=display)
53 | self.window.on_close = self.window_closed_by_user
54 | self.geoms = []
55 | self.onetime_geoms = []
56 | self.transform = Transform()
57 |
58 | glEnable(GL_BLEND)
59 | # glEnable(GL_MULTISAMPLE)
60 | glEnable(GL_LINE_SMOOTH)
61 | # glHint(GL_LINE_SMOOTH_HINT, GL_DONT_CARE)
62 | glHint(GL_LINE_SMOOTH_HINT, GL_NICEST)
63 | glLineWidth(2.0)
64 | glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA)
65 |
66 | def close(self):
67 | self.window.close()
68 |
69 | def window_closed_by_user(self):
70 | self.close()
71 |
72 | def set_bounds(self, left, right, bottom, top):
73 | assert right > left and top > bottom
74 | scalex = self.width/(right-left)
75 | scaley = self.height/(top-bottom)
76 | self.transform = Transform(
77 | translation=(-left*scalex, -bottom*scaley),
78 | scale=(scalex, scaley))
79 |
80 | def add_geom(self, geom):
81 | self.geoms.append(geom)
82 |
83 | def add_onetime(self, geom):
84 | self.onetime_geoms.append(geom)
85 |
86 | def render(self, return_rgb_array=False):
87 | glClearColor(1,1,1,1)
88 | self.window.clear()
89 | self.window.switch_to()
90 | self.window.dispatch_events()
91 | self.transform.enable()
92 | for geom in self.geoms:
93 | geom.render()
94 | for geom in self.onetime_geoms:
95 | geom.render()
96 | self.transform.disable()
97 | arr = None
98 | if return_rgb_array:
99 | buffer = pyglet.image.get_buffer_manager().get_color_buffer()
100 | image_data = buffer.get_image_data()
101 | arr = np.fromstring(image_data.data, dtype=np.uint8, sep='')
102 | # In https://github.com/openai/gym-http-api/issues/2, we
103 | # discovered that someone using Xmonad on Arch was having
104 | # a window of size 598 x 398, though a 600 x 400 window
105 | # was requested. (Guess Xmonad was preserving a pixel for
106 | # the boundary.) So we use the buffer height/width rather
107 | # than the requested one.
108 | arr = arr.reshape(buffer.height, buffer.width, 4)
109 | arr = arr[::-1,:,0:3]
110 | self.window.flip()
111 | self.onetime_geoms = []
112 | return arr
113 |
114 | # Convenience
115 | def draw_circle(self, radius=10, res=30, filled=True, **attrs):
116 | geom = make_circle(radius=radius, res=res, filled=filled)
117 | _add_attrs(geom, attrs)
118 | self.add_onetime(geom)
119 | return geom
120 |
121 | def draw_polygon(self, v, filled=True, **attrs):
122 | geom = make_polygon(v=v, filled=filled)
123 | _add_attrs(geom, attrs)
124 | self.add_onetime(geom)
125 | return geom
126 |
127 | def draw_polyline(self, v, **attrs):
128 | geom = make_polyline(v=v)
129 | _add_attrs(geom, attrs)
130 | self.add_onetime(geom)
131 | return geom
132 |
133 | def draw_line(self, start, end, **attrs):
134 | geom = Line(start, end)
135 | _add_attrs(geom, attrs)
136 | self.add_onetime(geom)
137 | return geom
138 |
139 | def get_array(self):
140 | self.window.flip()
141 | image_data = pyglet.image.get_buffer_manager().get_color_buffer().get_image_data()
142 | self.window.flip()
143 | arr = np.fromstring(image_data.data, dtype=np.uint8, sep='')
144 | arr = arr.reshape(self.height, self.width, 4)
145 | return arr[::-1,:,0:3]
146 |
147 | def _add_attrs(geom, attrs):
148 | if "color" in attrs:
149 | geom.set_color(*attrs["color"])
150 | if "linewidth" in attrs:
151 | geom.set_linewidth(attrs["linewidth"])
152 |
153 | class Geom(object):
154 | def __init__(self):
155 | self._color=Color((0, 0, 0, 1.0))
156 | self.attrs = [self._color]
157 | def render(self):
158 | for attr in reversed(self.attrs):
159 | attr.enable()
160 | self.render1()
161 | for attr in self.attrs:
162 | attr.disable()
163 | def render1(self):
164 | raise NotImplementedError
165 | def add_attr(self, attr):
166 | self.attrs.append(attr)
167 | def set_color(self, r, g, b, alpha=1):
168 | self._color.vec4 = (r, g, b, alpha)
169 |
170 | class Attr(object):
171 | def enable(self):
172 | raise NotImplementedError
173 | def disable(self):
174 | pass
175 |
176 | class Transform(Attr):
177 | def __init__(self, translation=(0.0, 0.0), rotation=0.0, scale=(1,1)):
178 | self.set_translation(*translation)
179 | self.set_rotation(rotation)
180 | self.set_scale(*scale)
181 | def enable(self):
182 | glPushMatrix()
183 |         glTranslatef(self.translation[0], self.translation[1], 0) # translate to GL location
184 | glRotatef(RAD2DEG * self.rotation, 0, 0, 1.0)
185 | glScalef(self.scale[0], self.scale[1], 1)
186 | def disable(self):
187 | glPopMatrix()
188 | def set_translation(self, newx, newy):
189 | self.translation = (float(newx), float(newy))
190 | def set_rotation(self, new):
191 | self.rotation = float(new)
192 | def set_scale(self, newx, newy):
193 | self.scale = (float(newx), float(newy))
194 |
195 | class Color(Attr):
196 | def __init__(self, vec4):
197 | self.vec4 = vec4
198 | def enable(self):
199 | glColor4f(*self.vec4)
200 |
201 | class LineStyle(Attr):
202 | def __init__(self, style):
203 | self.style = style
204 | def enable(self):
205 | glEnable(GL_LINE_STIPPLE)
206 | glLineStipple(1, self.style)
207 | def disable(self):
208 | glDisable(GL_LINE_STIPPLE)
209 |
210 | class LineWidth(Attr):
211 | def __init__(self, stroke):
212 | self.stroke = stroke
213 | def enable(self):
214 | glLineWidth(self.stroke)
215 |
216 | class Point(Geom):
217 | def __init__(self):
218 | Geom.__init__(self)
219 | def render1(self):
220 | glBegin(GL_POINTS) # draw point
221 | glVertex3f(0.0, 0.0, 0.0)
222 | glEnd()
223 |
224 | class FilledPolygon(Geom):
225 | def __init__(self, v):
226 | Geom.__init__(self)
227 | self.v = v
228 | def render1(self):
229 | if len(self.v) == 4 : glBegin(GL_QUADS)
230 | elif len(self.v) > 4 : glBegin(GL_POLYGON)
231 | else: glBegin(GL_TRIANGLES)
232 | for p in self.v:
233 | glVertex3f(p[0], p[1],0) # draw each vertex
234 | glEnd()
235 |
236 | color = (self._color.vec4[0] * 0.5, self._color.vec4[1] * 0.5, self._color.vec4[2] * 0.5, self._color.vec4[3] * 0.5)
237 | glColor4f(*color)
238 | glBegin(GL_LINE_LOOP)
239 | for p in self.v:
240 | glVertex3f(p[0], p[1],0) # draw each vertex
241 | glEnd()
242 |
243 | def make_circle(radius=10, res=30, filled=True):
244 | points = []
245 | for i in range(res):
246 | ang = 2*math.pi*i / res
247 | points.append((math.cos(ang)*radius, math.sin(ang)*radius))
248 | if filled:
249 | return FilledPolygon(points)
250 | else:
251 | return PolyLine(points, True)
252 |
253 | def make_polygon(v, filled=True):
254 | if filled: return FilledPolygon(v)
255 | else: return PolyLine(v, True)
256 |
257 | def make_polyline(v):
258 | return PolyLine(v, False)
259 |
260 | def make_capsule(length, width):
261 | l, r, t, b = 0, length, width/2, -width/2
262 | box = make_polygon([(l,b), (l,t), (r,t), (r,b)])
263 | circ0 = make_circle(width/2)
264 | circ1 = make_circle(width/2)
265 | circ1.add_attr(Transform(translation=(length, 0)))
266 | geom = Compound([box, circ0, circ1])
267 | return geom
268 |
269 | class Compound(Geom):
270 | def __init__(self, gs):
271 | Geom.__init__(self)
272 | self.gs = gs
273 | for g in self.gs:
274 | g.attrs = [a for a in g.attrs if not isinstance(a, Color)]
275 | def render1(self):
276 | for g in self.gs:
277 | g.render()
278 |
279 | class PolyLine(Geom):
280 | def __init__(self, v, close):
281 | Geom.__init__(self)
282 | self.v = v
283 | self.close = close
284 | self.linewidth = LineWidth(1)
285 | self.add_attr(self.linewidth)
286 | def render1(self):
287 | glBegin(GL_LINE_LOOP if self.close else GL_LINE_STRIP)
288 | for p in self.v:
289 | glVertex3f(p[0], p[1],0) # draw each vertex
290 | glEnd()
291 | def set_linewidth(self, x):
292 | self.linewidth.stroke = x
293 |
294 | class Line(Geom):
295 | def __init__(self, start=(0.0, 0.0), end=(0.0, 0.0)):
296 | Geom.__init__(self)
297 | self.start = start
298 | self.end = end
299 | self.linewidth = LineWidth(1)
300 | self.add_attr(self.linewidth)
301 |
302 | def render1(self):
303 | glBegin(GL_LINES)
304 | glVertex2f(*self.start)
305 | glVertex2f(*self.end)
306 | glEnd()
307 |
308 | class Image(Geom):
309 | def __init__(self, fname, width, height):
310 | Geom.__init__(self)
311 | self.width = width
312 | self.height = height
313 | img = pyglet.image.load(fname)
314 | self.img = img
315 | self.flip = False
316 | def render1(self):
317 | self.img.blit(-self.width/2, -self.height/2, width=self.width, height=self.height)
318 |
319 | # ================================================================
320 |
321 | class SimpleImageViewer(object):
322 | def __init__(self, display=None):
323 | self.window = None
324 | self.isopen = False
325 | self.display = display
326 | def imshow(self, arr):
327 | if self.window is None:
328 | height, width, channels = arr.shape
329 | self.window = pyglet.window.Window(width=width, height=height, display=self.display)
330 | self.width = width
331 | self.height = height
332 | self.isopen = True
333 |         assert arr.shape == (self.height, self.width, 3), "You passed in an image with the wrong shape"
334 | image = pyglet.image.ImageData(self.width, self.height, 'RGB', arr.tobytes(), pitch=self.width * -3)
335 | self.window.clear()
336 | self.window.switch_to()
337 | self.window.dispatch_events()
338 | image.blit(0,0)
339 | self.window.flip()
340 | def close(self):
341 | if self.isopen:
342 | self.window.close()
343 | self.isopen = False
344 | def __del__(self):
345 | self.close()
--------------------------------------------------------------------------------
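The `Viewer` above can also be used on its own, which is roughly what `MultiAgentEnv.render()` does internally. A hedged sketch (requires a working display and OpenGL; details may vary with the installed pyglet version):

```python
from multiagent import rendering

viewer = rendering.Viewer(300, 300)
viewer.set_bounds(-1, +1, -1, +1)          # world coordinates shown in the window

circle = rendering.make_circle(radius=0.2)
circle.set_color(0.35, 0.35, 0.85)
circle.add_attr(rendering.Transform(translation=(0.0, 0.0)))
viewer.add_geom(circle)

viewer.render()                            # draws the frame; pass return_rgb_array=True for pixels
viewer.close()
```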
/multiagent/scenario.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 |
3 | # defines scenario upon which the world is built
4 | class BaseScenario(object):
5 | # create elements of the world
6 | def make_world(self):
7 | raise NotImplementedError()
8 | # create initial conditions of the world
9 | def reset_world(self, world):
10 | raise NotImplementedError()
11 |
--------------------------------------------------------------------------------
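A hedged sketch of a complete custom scenario built on `BaseScenario`, implementing the four functions the README lists (`make_world`, `reset_world`, `reward`, `observation`). The file name `my_scenario.py` and all parameter values are illustrative; a file like this dropped into `multiagent/scenarios/` could then be loaded with `make_env('my_scenario')`:

```python
import numpy as np
from multiagent.core import World, Agent, Landmark
from multiagent.scenario import BaseScenario

class Scenario(BaseScenario):
    def make_world(self):
        world = World()
        world.agents = [Agent() for _ in range(2)]
        for i, agent in enumerate(world.agents):
            agent.name = 'agent %d' % i
            agent.collide = False
            agent.silent = True
        world.landmarks = [Landmark()]
        world.landmarks[0].name = 'landmark 0'
        world.landmarks[0].collide = False
        world.landmarks[0].movable = False
        self.reset_world(world)
        return world

    def reset_world(self, world):
        for agent in world.agents:
            agent.color = np.array([0.35, 0.35, 0.85])
            agent.state.p_pos = np.random.uniform(-1, +1, world.dim_p)
            agent.state.p_vel = np.zeros(world.dim_p)
            agent.state.c = np.zeros(world.dim_c)
        for landmark in world.landmarks:
            landmark.color = np.array([0.25, 0.25, 0.25])
            landmark.state.p_pos = np.random.uniform(-1, +1, world.dim_p)
            landmark.state.p_vel = np.zeros(world.dim_p)

    def reward(self, agent, world):
        # negative squared distance to the single landmark
        return -np.sum(np.square(agent.state.p_pos - world.landmarks[0].state.p_pos))

    def observation(self, agent, world):
        # own velocity plus the landmark position in the agent's frame
        return np.concatenate([agent.state.p_vel,
                               world.landmarks[0].state.p_pos - agent.state.p_pos])
```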
/multiagent/scenarios/__init__.py:
--------------------------------------------------------------------------------
1 | import imp
2 | import os.path as osp
3 |
4 |
5 | def load(name):
6 | pathname = osp.join(osp.dirname(__file__), name)
7 | return imp.load_source('', pathname)
8 |
--------------------------------------------------------------------------------
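`load()` above relies on the `imp` module, which is deprecated and removed in Python 3.12. For the old gym/Python versions pinned in the README this is fine; on newer interpreters an equivalent loader could look roughly like this (a sketch, not part of the repo):

```python
import importlib.util
import os.path as osp

def load(name):
    pathname = osp.join(osp.dirname(__file__), name)
    spec = importlib.util.spec_from_file_location(osp.splitext(name)[0], pathname)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module
```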
/multiagent/scenarios/simple.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | from multiagent.core import World, Agent, Landmark
3 | from multiagent.scenario import BaseScenario
4 |
5 | class Scenario(BaseScenario):
6 | def make_world(self):
7 | world = World()
8 | # add agents
9 | world.agents = [Agent() for i in range(1)]
10 | for i, agent in enumerate(world.agents):
11 | agent.name = 'agent %d' % i
12 | agent.collide = False
13 | agent.silent = True
14 | # add landmarks
15 | world.landmarks = [Landmark() for i in range(1)]
16 | for i, landmark in enumerate(world.landmarks):
17 | landmark.name = 'landmark %d' % i
18 | landmark.collide = False
19 | landmark.movable = False
20 | # make initial conditions
21 | self.reset_world(world)
22 | return world
23 |
24 | def reset_world(self, world):
25 | # random properties for agents
26 | for i, agent in enumerate(world.agents):
27 | agent.color = np.array([0.25,0.25,0.25])
28 | # random properties for landmarks
29 | for i, landmark in enumerate(world.landmarks):
30 | landmark.color = np.array([0.75,0.75,0.75])
31 | world.landmarks[0].color = np.array([0.75,0.25,0.25])
32 | # set random initial states
33 | for agent in world.agents:
34 | agent.state.p_pos = np.random.uniform(-1,+1, world.dim_p)
35 | agent.state.p_vel = np.zeros(world.dim_p)
36 | agent.state.c = np.zeros(world.dim_c)
37 | for i, landmark in enumerate(world.landmarks):
38 | landmark.state.p_pos = np.random.uniform(-1,+1, world.dim_p)
39 | landmark.state.p_vel = np.zeros(world.dim_p)
40 |
41 | def reward(self, agent, world):
42 | dist2 = np.sum(np.square(agent.state.p_pos - world.landmarks[0].state.p_pos))
43 | return -dist2
44 |
45 | def observation(self, agent, world):
46 | # get positions of all entities in this agent's reference frame
47 | entity_pos = []
48 | for entity in world.landmarks:
49 | entity_pos.append(entity.state.p_pos - agent.state.p_pos)
50 | return np.concatenate([agent.state.p_vel] + entity_pos)
51 |
--------------------------------------------------------------------------------
/multiagent/scenarios/simple_adversary.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | from multiagent.core import World, Agent, Landmark
3 | from multiagent.scenario import BaseScenario
4 |
5 |
6 | class Scenario(BaseScenario):
7 |
8 | def make_world(self):
9 | world = World()
10 | # set any world properties first
11 | world.dim_c = 2
12 | num_agents = 3
13 | world.num_agents = num_agents
14 | num_adversaries = 1
15 | num_landmarks = num_agents - 1
16 | # add agents
17 | world.agents = [Agent() for i in range(num_agents)]
18 | for i, agent in enumerate(world.agents):
19 | agent.name = 'agent %d' % i
20 | agent.collide = False
21 | agent.silent = True
22 | agent.adversary = True if i < num_adversaries else False
23 | agent.size = 0.15
24 | # add landmarks
25 | world.landmarks = [Landmark() for i in range(num_landmarks)]
26 | for i, landmark in enumerate(world.landmarks):
27 | landmark.name = 'landmark %d' % i
28 | landmark.collide = False
29 | landmark.movable = False
30 | landmark.size = 0.08
31 | # make initial conditions
32 | self.reset_world(world)
33 | return world
34 |
35 | def reset_world(self, world):
36 | # random properties for agents
37 | world.agents[0].color = np.array([0.85, 0.35, 0.35])
38 | for i in range(1, world.num_agents):
39 | world.agents[i].color = np.array([0.35, 0.35, 0.85])
40 | # random properties for landmarks
41 | for i, landmark in enumerate(world.landmarks):
42 | landmark.color = np.array([0.15, 0.15, 0.15])
43 | # set goal landmark
44 | goal = np.random.choice(world.landmarks)
45 | goal.color = np.array([0.15, 0.65, 0.15])
46 | for agent in world.agents:
47 | agent.goal_a = goal
48 | # set random initial states
49 | for agent in world.agents:
50 | agent.state.p_pos = np.random.uniform(-1, +1, world.dim_p)
51 | agent.state.p_vel = np.zeros(world.dim_p)
52 | agent.state.c = np.zeros(world.dim_c)
53 | for i, landmark in enumerate(world.landmarks):
54 | landmark.state.p_pos = np.random.uniform(-1, +1, world.dim_p)
55 | landmark.state.p_vel = np.zeros(world.dim_p)
56 |
57 | def benchmark_data(self, agent, world):
58 | # returns data for benchmarking purposes
59 | if agent.adversary:
60 | return np.sum(np.square(agent.state.p_pos - agent.goal_a.state.p_pos))
61 | else:
62 | dists = []
63 | for l in world.landmarks:
64 | dists.append(np.sum(np.square(agent.state.p_pos - l.state.p_pos)))
65 | dists.append(np.sum(np.square(agent.state.p_pos - agent.goal_a.state.p_pos)))
66 | return tuple(dists)
67 |
68 | # return all agents that are not adversaries
69 | def good_agents(self, world):
70 | return [agent for agent in world.agents if not agent.adversary]
71 |
72 | # return all adversarial agents
73 | def adversaries(self, world):
74 | return [agent for agent in world.agents if agent.adversary]
75 |
76 | def reward(self, agent, world):
77 |         # adversaries are rewarded for proximity to the goal landmark; good agents are rewarded for reaching it while keeping the adversary away
78 | return self.adversary_reward(agent, world) if agent.adversary else self.agent_reward(agent, world)
79 |
80 | def agent_reward(self, agent, world):
81 | # Rewarded based on how close any good agent is to the goal landmark, and how far the adversary is from it
82 | shaped_reward = True
83 | shaped_adv_reward = True
84 |
85 | # Calculate negative reward for adversary
86 | adversary_agents = self.adversaries(world)
87 | if shaped_adv_reward: # distance-based adversary reward
88 | adv_rew = sum([np.sqrt(np.sum(np.square(a.state.p_pos - a.goal_a.state.p_pos))) for a in adversary_agents])
89 | else: # proximity-based adversary reward (binary)
90 | adv_rew = 0
91 | for a in adversary_agents:
92 | if np.sqrt(np.sum(np.square(a.state.p_pos - a.goal_a.state.p_pos))) < 2 * a.goal_a.size:
93 | adv_rew -= 5
94 |
95 | # Calculate positive reward for agents
96 | good_agents = self.good_agents(world)
97 | if shaped_reward: # distance-based agent reward
98 | pos_rew = -min(
99 | [np.sqrt(np.sum(np.square(a.state.p_pos - a.goal_a.state.p_pos))) for a in good_agents])
100 | else: # proximity-based agent reward (binary)
101 | pos_rew = 0
102 | if min([np.sqrt(np.sum(np.square(a.state.p_pos - a.goal_a.state.p_pos))) for a in good_agents]) \
103 | < 2 * agent.goal_a.size:
104 | pos_rew += 5
105 | pos_rew -= min(
106 | [np.sqrt(np.sum(np.square(a.state.p_pos - a.goal_a.state.p_pos))) for a in good_agents])
107 | return pos_rew + adv_rew
108 |
109 | def adversary_reward(self, agent, world):
110 | # Rewarded based on proximity to the goal landmark
111 | shaped_reward = True
112 | if shaped_reward: # distance-based reward
113 | return -np.sum(np.square(agent.state.p_pos - agent.goal_a.state.p_pos))
114 | else: # proximity-based reward (binary)
115 | adv_rew = 0
116 | if np.sqrt(np.sum(np.square(agent.state.p_pos - agent.goal_a.state.p_pos))) < 2 * agent.goal_a.size:
117 | adv_rew += 5
118 | return adv_rew
119 |
120 |
121 | def observation(self, agent, world):
122 | # get positions of all entities in this agent's reference frame
123 | entity_pos = []
124 | for entity in world.landmarks:
125 | entity_pos.append(entity.state.p_pos - agent.state.p_pos)
126 | # entity colors
127 | entity_color = []
128 | for entity in world.landmarks:
129 | entity_color.append(entity.color)
130 | # positions of all other agents
131 | other_pos = []
132 | for other in world.agents:
133 | if other is agent: continue
134 | other_pos.append(other.state.p_pos - agent.state.p_pos)
135 |
136 | if not agent.adversary:
137 | return np.concatenate([agent.goal_a.state.p_pos - agent.state.p_pos] + entity_pos + other_pos)
138 | else:
139 | return np.concatenate(entity_pos + other_pos)
140 |
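A scenario class like the one above is not an environment by itself; it supplies the callbacks that the environment wires together. Below is a minimal usage sketch modeled loosely on make_env.py; the scenarios.load() helper and the MultiAgentEnv argument order are assumptions here, so treat make_env.py as the canonical reference.

    import multiagent.scenarios as scenarios
    from multiagent.environment import MultiAgentEnv

    # load the scenario module, build its world, and wrap it in a Gym-like env
    scenario = scenarios.load('simple_adversary.py').Scenario()
    world = scenario.make_world()
    env = MultiAgentEnv(world, scenario.reset_world, scenario.reward, scenario.observation)
    obs_n = env.reset()  # one observation per agent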
--------------------------------------------------------------------------------
/multiagent/scenarios/simple_crypto.py:
--------------------------------------------------------------------------------
1 | """
2 | Scenario:
3 | 1 speaker (Alice), 1 listener (Bob), and 1 eavesdropping adversary (Eve). Alice must communicate a goal color to Bob
4 | using a private key shared only with Bob. Alice and Bob are rewarded when Bob reconstructs the goal and Eve does not; Eve is rewarded for reconstructing it herself.
5 | """
6 |
7 |
8 | import numpy as np
9 | from multiagent.core import World, Agent, Landmark
10 | from multiagent.scenario import BaseScenario
11 | import random
12 |
13 |
14 | class CryptoAgent(Agent):
15 | def __init__(self):
16 | super(CryptoAgent, self).__init__()
17 | self.key = None
18 |
19 | class Scenario(BaseScenario):
20 |
21 | def make_world(self):
22 | world = World()
23 | # set any world properties first
24 | num_agents = 3
25 | num_adversaries = 1
26 | num_landmarks = 2
27 | world.dim_c = 4
28 | # add agents
29 | world.agents = [CryptoAgent() for i in range(num_agents)]
30 | for i, agent in enumerate(world.agents):
31 | agent.name = 'agent %d' % i
32 | agent.collide = False
33 | agent.adversary = True if i < num_adversaries else False
34 | agent.speaker = True if i == 2 else False
35 | agent.movable = False
36 | # add landmarks
37 | world.landmarks = [Landmark() for i in range(num_landmarks)]
38 | for i, landmark in enumerate(world.landmarks):
39 | landmark.name = 'landmark %d' % i
40 | landmark.collide = False
41 | landmark.movable = False
42 | # make initial conditions
43 | self.reset_world(world)
44 | return world
45 |
46 |
47 | def reset_world(self, world):
48 | # random properties for agents
49 | for i, agent in enumerate(world.agents):
50 | agent.color = np.array([0.25, 0.25, 0.25])
51 | if agent.adversary:
52 | agent.color = np.array([0.75, 0.25, 0.25])
53 | agent.key = None
54 | # random properties for landmarks
55 | color_list = [np.zeros(world.dim_c) for i in world.landmarks]
56 | for i, color in enumerate(color_list):
57 | color[i] += 1
58 | for color, landmark in zip(color_list, world.landmarks):
59 | landmark.color = color
60 | # set goal landmark
61 | goal = np.random.choice(world.landmarks)
62 | world.agents[1].color = goal.color
63 | world.agents[2].key = np.random.choice(world.landmarks).color
64 |
65 | for agent in world.agents:
66 | agent.goal_a = goal
67 |
68 | # set random initial states
69 | for agent in world.agents:
70 | agent.state.p_pos = np.random.uniform(-1, +1, world.dim_p)
71 | agent.state.p_vel = np.zeros(world.dim_p)
72 | agent.state.c = np.zeros(world.dim_c)
73 | for i, landmark in enumerate(world.landmarks):
74 | landmark.state.p_pos = np.random.uniform(-1, +1, world.dim_p)
75 | landmark.state.p_vel = np.zeros(world.dim_p)
76 |
77 |
78 | def benchmark_data(self, agent, world):
79 | # returns data for benchmarking purposes
80 | return (agent.state.c, agent.goal_a.color)
81 |
82 | # return all agents that are not adversaries
83 | def good_listeners(self, world):
84 | return [agent for agent in world.agents if not agent.adversary and not agent.speaker]
85 |
86 | # return all agents that are not adversaries
87 | def good_agents(self, world):
88 | return [agent for agent in world.agents if not agent.adversary]
89 |
90 | # return all adversarial agents
91 | def adversaries(self, world):
92 | return [agent for agent in world.agents if agent.adversary]
93 |
94 | def reward(self, agent, world):
95 | return self.adversary_reward(agent, world) if agent.adversary else self.agent_reward(agent, world)
96 |
97 | def agent_reward(self, agent, world):
98 | # Agents rewarded if Bob can reconstruct message, but adversary (Eve) cannot
99 | good_listeners = self.good_listeners(world)
100 | adversaries = self.adversaries(world)
101 | good_rew = 0
102 | adv_rew = 0
103 | for a in good_listeners:
104 | if (a.state.c == np.zeros(world.dim_c)).all():
105 | continue
106 | else:
107 | good_rew -= np.sum(np.square(a.state.c - agent.goal_a.color))
108 | for a in adversaries:
109 | if (a.state.c == np.zeros(world.dim_c)).all():
110 | continue
111 | else:
112 | adv_l1 = np.sum(np.square(a.state.c - agent.goal_a.color))
113 | adv_rew += adv_l1
114 | return adv_rew + good_rew
115 |
116 | def adversary_reward(self, agent, world):
117 | # Adversary (Eve) is rewarded if it can reconstruct original goal
118 | rew = 0
119 | if not (agent.state.c == np.zeros(world.dim_c)).all():
120 | rew -= np.sum(np.square(agent.state.c - agent.goal_a.color))
121 | return rew
122 |
123 |
124 | def observation(self, agent, world):
125 | # goal color
126 | goal_color = np.zeros(world.dim_color)
127 | if agent.goal_a is not None:
128 | goal_color = agent.goal_a.color
129 |
130 | # get positions of all entities in this agent's reference frame
131 | entity_pos = []
132 | for entity in world.landmarks:
133 | entity_pos.append(entity.state.p_pos - agent.state.p_pos)
134 | # communication of all other agents
135 | comm = []
136 | for other in world.agents:
137 | if other is agent or (other.state.c is None) or not other.speaker: continue
138 | comm.append(other.state.c)
139 |
140 | confer = np.array([0])
141 |
142 | if world.agents[2].key is None:
143 | confer = np.array([1])
144 | key = np.zeros(world.dim_c)
145 | goal_color = np.zeros(world.dim_c)
146 | else:
147 | key = world.agents[2].key
148 |
149 | prnt = False
150 | # speaker
151 | if agent.speaker:
152 | if prnt:
153 | print('speaker')
154 | print(agent.state.c)
155 | print(np.concatenate([goal_color] + [key] + [confer] + [np.random.randn(1)]))
156 | return np.concatenate([goal_color] + [key])
157 | # listener
158 | if not agent.speaker and not agent.adversary:
159 | if prnt:
160 | print('listener')
161 | print(agent.state.c)
162 | print(np.concatenate([key] + comm + [confer]))
163 | return np.concatenate([key] + comm)
164 | if not agent.speaker and agent.adversary:
165 | if prnt:
166 | print('adversary')
167 | print(agent.state.c)
168 | print(np.concatenate(comm + [confer]))
169 | return np.concatenate(comm)
170 |
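Note that observation() above only forwards the speaker's utterance (every other agent's c is skipped), so with the defaults in make_world (2 landmarks, dim_c = 4) the three roles see differently sized vectors: Eve sees only Alice's 4-dimensional message, Bob also sees the 4-dimensional key, and Alice sees the key plus the goal color. A rough check, again assuming a scenarios.load() helper:

    import multiagent.scenarios as scenarios

    scenario = scenarios.load('simple_crypto.py').Scenario()
    world = scenario.make_world()
    for agent in world.agents:
        print(agent.name, scenario.observation(agent, world).shape)
    # expected: (4,) for the adversary (Eve), (8,) for the listener (Bob) and the speaker (Alice)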
--------------------------------------------------------------------------------
/multiagent/scenarios/simple_push.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | from multiagent.core import World, Agent, Landmark
3 | from multiagent.scenario import BaseScenario
4 |
5 | class Scenario(BaseScenario):
6 | def make_world(self):
7 | world = World()
8 | # set any world properties first
9 | world.dim_c = 2
10 | num_agents = 2
11 | num_adversaries = 1
12 | num_landmarks = 2
13 | # add agents
14 | world.agents = [Agent() for i in range(num_agents)]
15 | for i, agent in enumerate(world.agents):
16 | agent.name = 'agent %d' % i
17 | agent.collide = True
18 | agent.silent = True
19 | if i < num_adversaries:
20 | agent.adversary = True
21 | else:
22 | agent.adversary = False
23 | # add landmarks
24 | world.landmarks = [Landmark() for i in range(num_landmarks)]
25 | for i, landmark in enumerate(world.landmarks):
26 | landmark.name = 'landmark %d' % i
27 | landmark.collide = False
28 | landmark.movable = False
29 | # make initial conditions
30 | self.reset_world(world)
31 | return world
32 |
33 | def reset_world(self, world):
34 | # random properties for landmarks
35 | for i, landmark in enumerate(world.landmarks):
36 | landmark.color = np.array([0.1, 0.1, 0.1])
37 | landmark.color[i + 1] += 0.8
38 | landmark.index = i
39 | # set goal landmark
40 | goal = np.random.choice(world.landmarks)
41 | for i, agent in enumerate(world.agents):
42 | agent.goal_a = goal
43 | agent.color = np.array([0.25, 0.25, 0.25])
44 | if agent.adversary:
45 | agent.color = np.array([0.75, 0.25, 0.25])
46 | else:
47 | j = goal.index
48 | agent.color[j + 1] += 0.5
49 | # set random initial states
50 | for agent in world.agents:
51 | agent.state.p_pos = np.random.uniform(-1, +1, world.dim_p)
52 | agent.state.p_vel = np.zeros(world.dim_p)
53 | agent.state.c = np.zeros(world.dim_c)
54 | for i, landmark in enumerate(world.landmarks):
55 | landmark.state.p_pos = np.random.uniform(-1, +1, world.dim_p)
56 | landmark.state.p_vel = np.zeros(world.dim_p)
57 |
58 | def reward(self, agent, world):
59 | # dispatch to the adversary or good-agent reward depending on the agent's role
60 | return self.adversary_reward(agent, world) if agent.adversary else self.agent_reward(agent, world)
61 |
62 | def agent_reward(self, agent, world):
63 | # the distance to the goal
64 | return -np.sqrt(np.sum(np.square(agent.state.p_pos - agent.goal_a.state.p_pos)))
65 |
66 | def adversary_reward(self, agent, world):
67 | # keep the nearest good agents away from the goal
68 | agent_dist = [np.sqrt(np.sum(np.square(a.state.p_pos - a.goal_a.state.p_pos))) for a in world.agents if not a.adversary]
69 | pos_rew = min(agent_dist)
70 | #nearest_agent = world.good_agents[np.argmin(agent_dist)]
71 | #neg_rew = np.sqrt(np.sum(np.square(nearest_agent.state.p_pos - agent.state.p_pos)))
72 | neg_rew = np.sqrt(np.sum(np.square(agent.goal_a.state.p_pos - agent.state.p_pos)))
73 | #neg_rew = sum([np.sqrt(np.sum(np.square(a.state.p_pos - agent.state.p_pos))) for a in world.good_agents])
74 | return pos_rew - neg_rew
75 |
76 | def observation(self, agent, world):
77 | # get positions of all entities in this agent's reference frame
78 | entity_pos = []
79 | for entity in world.landmarks: # world.entities:
80 | entity_pos.append(entity.state.p_pos - agent.state.p_pos)
81 | # entity colors
82 | entity_color = []
83 | for entity in world.landmarks: # world.entities:
84 | entity_color.append(entity.color)
85 | # communication of all other agents
86 | comm = []
87 | other_pos = []
88 | for other in world.agents:
89 | if other is agent: continue
90 | comm.append(other.state.c)
91 | other_pos.append(other.state.p_pos - agent.state.p_pos)
92 | if not agent.adversary:
93 | return np.concatenate([agent.state.p_vel] + [agent.goal_a.state.p_pos - agent.state.p_pos] + [agent.color] + entity_pos + entity_color + other_pos)
94 | else:
95 | #other_pos = list(reversed(other_pos)) if random.uniform(0,1) > 0.5 else other_pos # randomize position of other agents in adversary network
96 | return np.concatenate([agent.state.p_vel] + entity_pos + other_pos)
97 |
--------------------------------------------------------------------------------
/multiagent/scenarios/simple_reference.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | from multiagent.core import World, Agent, Landmark
3 | from multiagent.scenario import BaseScenario
4 |
5 | class Scenario(BaseScenario):
6 | def make_world(self):
7 | world = World()
8 | # set any world properties first
9 | world.dim_c = 10
10 | world.collaborative = True # whether agents share rewards
11 | # add agents
12 | world.agents = [Agent() for i in range(2)]
13 | for i, agent in enumerate(world.agents):
14 | agent.name = 'agent %d' % i
15 | agent.collide = False
16 | # add landmarks
17 | world.landmarks = [Landmark() for i in range(3)]
18 | for i, landmark in enumerate(world.landmarks):
19 | landmark.name = 'landmark %d' % i
20 | landmark.collide = False
21 | landmark.movable = False
22 | # make initial conditions
23 | self.reset_world(world)
24 | return world
25 |
26 | def reset_world(self, world):
27 | # assign goals to agents
28 | for agent in world.agents:
29 | agent.goal_a = None
30 | agent.goal_b = None
31 | # want other agent to go to the goal landmark
32 | world.agents[0].goal_a = world.agents[1]
33 | world.agents[0].goal_b = np.random.choice(world.landmarks)
34 | world.agents[1].goal_a = world.agents[0]
35 | world.agents[1].goal_b = np.random.choice(world.landmarks)
36 | # random properties for agents
37 | for i, agent in enumerate(world.agents):
38 | agent.color = np.array([0.25,0.25,0.25])
39 | # random properties for landmarks
40 | world.landmarks[0].color = np.array([0.75,0.25,0.25])
41 | world.landmarks[1].color = np.array([0.25,0.75,0.25])
42 | world.landmarks[2].color = np.array([0.25,0.25,0.75])
43 | # special colors for goals
44 | world.agents[0].goal_a.color = world.agents[0].goal_b.color
45 | world.agents[1].goal_a.color = world.agents[1].goal_b.color
46 | # set random initial states
47 | for agent in world.agents:
48 | agent.state.p_pos = np.random.uniform(-1,+1, world.dim_p)
49 | agent.state.p_vel = np.zeros(world.dim_p)
50 | agent.state.c = np.zeros(world.dim_c)
51 | for i, landmark in enumerate(world.landmarks):
52 | landmark.state.p_pos = np.random.uniform(-1,+1, world.dim_p)
53 | landmark.state.p_vel = np.zeros(world.dim_p)
54 |
55 | def reward(self, agent, world):
56 | if agent.goal_a is None or agent.goal_b is None:
57 | return 0.0
58 | dist2 = np.sum(np.square(agent.goal_a.state.p_pos - agent.goal_b.state.p_pos))
59 | return -dist2
60 |
61 | def observation(self, agent, world):
62 | # goal color
63 | goal_color = [np.zeros(world.dim_color), np.zeros(world.dim_color)]
64 | if agent.goal_b is not None:
65 | goal_color[1] = agent.goal_b.color
66 |
67 | # get positions of all entities in this agent's reference frame
68 | entity_pos = []
69 | for entity in world.landmarks:
70 | entity_pos.append(entity.state.p_pos - agent.state.p_pos)
71 | # entity colors
72 | entity_color = []
73 | for entity in world.landmarks:
74 | entity_color.append(entity.color)
75 | # communication of all other agents
76 | comm = []
77 | for other in world.agents:
78 | if other is agent: continue
79 | comm.append(other.state.c)
80 | return np.concatenate([agent.state.p_vel] + entity_pos + [goal_color[1]] + comm)
81 |
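The coupling that forces communication in this scenario is that each agent is rewarded only for how close the other agent is to a landmark that only it knows (its goal_b), and world.collaborative means those rewards are shared. A rough sketch of that coupling, with the same assumed scenarios.load() helper as above:

    import numpy as np
    import multiagent.scenarios as scenarios

    scenario = scenarios.load('simple_reference.py').Scenario()
    world = scenario.make_world()
    a0, a1 = world.agents
    # agent 0's reward measures agent 1's distance to the landmark only agent 0 observes
    r0 = scenario.reward(a0, world)
    print(np.isclose(r0, -np.sum(np.square(a1.state.p_pos - a0.goal_b.state.p_pos))))  # True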
--------------------------------------------------------------------------------
/multiagent/scenarios/simple_speaker_listener.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | from multiagent.core import World, Agent, Landmark
3 | from multiagent.scenario import BaseScenario
4 |
5 | class Scenario(BaseScenario):
6 | def make_world(self):
7 | world = World()
8 | # set any world properties first
9 | world.dim_c = 3
10 | num_landmarks = 3
11 | world.collaborative = True
12 | # add agents
13 | world.agents = [Agent() for i in range(2)]
14 | for i, agent in enumerate(world.agents):
15 | agent.name = 'agent %d' % i
16 | agent.collide = False
17 | agent.size = 0.075
18 | # speaker
19 | world.agents[0].movable = False
20 | # listener
21 | world.agents[1].silent = True
22 | # add landmarks
23 | world.landmarks = [Landmark() for i in range(num_landmarks)]
24 | for i, landmark in enumerate(world.landmarks):
25 | landmark.name = 'landmark %d' % i
26 | landmark.collide = False
27 | landmark.movable = False
28 | landmark.size = 0.04
29 | # make initial conditions
30 | self.reset_world(world)
31 | return world
32 |
33 | def reset_world(self, world):
34 | # assign goals to agents
35 | for agent in world.agents:
36 | agent.goal_a = None
37 | agent.goal_b = None
38 | # want listener to go to the goal landmark
39 | world.agents[0].goal_a = world.agents[1]
40 | world.agents[0].goal_b = np.random.choice(world.landmarks)
41 | # random properties for agents
42 | for i, agent in enumerate(world.agents):
43 | agent.color = np.array([0.25,0.25,0.25])
44 | # random properties for landmarks
45 | world.landmarks[0].color = np.array([0.65,0.15,0.15])
46 | world.landmarks[1].color = np.array([0.15,0.65,0.15])
47 | world.landmarks[2].color = np.array([0.15,0.15,0.65])
48 | # special colors for goals
49 | world.agents[0].goal_a.color = world.agents[0].goal_b.color + np.array([0.45, 0.45, 0.45])
50 | # set random initial states
51 | for agent in world.agents:
52 | agent.state.p_pos = np.random.uniform(-1,+1, world.dim_p)
53 | agent.state.p_vel = np.zeros(world.dim_p)
54 | agent.state.c = np.zeros(world.dim_c)
55 | for i, landmark in enumerate(world.landmarks):
56 | landmark.state.p_pos = np.random.uniform(-1,+1, world.dim_p)
57 | landmark.state.p_vel = np.zeros(world.dim_p)
58 |
59 | def benchmark_data(self, agent, world):
60 | # returns data for benchmarking purposes
61 | return self.reward(agent, world)
62 |
63 | def reward(self, agent, world):
64 | # squared distance from listener to landmark
65 | a = world.agents[0]
66 | dist2 = np.sum(np.square(a.goal_a.state.p_pos - a.goal_b.state.p_pos))
67 | return -dist2
68 |
69 | def observation(self, agent, world):
70 | # goal color
71 | goal_color = np.zeros(world.dim_color)
72 | if agent.goal_b is not None:
73 | goal_color = agent.goal_b.color
74 |
75 | # get positions of all entities in this agent's reference frame
76 | entity_pos = []
77 | for entity in world.landmarks:
78 | entity_pos.append(entity.state.p_pos - agent.state.p_pos)
79 |
80 | # communication of all other agents
81 | comm = []
82 | for other in world.agents:
83 | if other is agent or (other.state.c is None): continue
84 | comm.append(other.state.c)
85 |
86 | # speaker
87 | if not agent.movable:
88 | return np.concatenate([goal_color])
89 | # listener
90 | if agent.silent:
91 | return np.concatenate([agent.state.p_vel] + entity_pos + comm)
92 |
93 |
--------------------------------------------------------------------------------
/multiagent/scenarios/simple_spread.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | from multiagent.core import World, Agent, Landmark
3 | from multiagent.scenario import BaseScenario
4 |
5 |
6 | class Scenario(BaseScenario):
7 | def make_world(self):
8 | world = World()
9 | # set any world properties first
10 | world.dim_c = 2
11 | num_agents = 3
12 | num_landmarks = 3
13 | world.collaborative = True
14 | # add agents
15 | world.agents = [Agent() for i in range(num_agents)]
16 | for i, agent in enumerate(world.agents):
17 | agent.name = 'agent %d' % i
18 | agent.collide = True
19 | agent.silent = True
20 | agent.size = 0.15
21 | # add landmarks
22 | world.landmarks = [Landmark() for i in range(num_landmarks)]
23 | for i, landmark in enumerate(world.landmarks):
24 | landmark.name = 'landmark %d' % i
25 | landmark.collide = False
26 | landmark.movable = False
27 | # make initial conditions
28 | self.reset_world(world)
29 | return world
30 |
31 | def reset_world(self, world):
32 | # random properties for agents
33 | for i, agent in enumerate(world.agents):
34 | agent.color = np.array([0.35, 0.35, 0.85])
35 | # random properties for landmarks
36 | for i, landmark in enumerate(world.landmarks):
37 | landmark.color = np.array([0.25, 0.25, 0.25])
38 | # set random initial states
39 | for agent in world.agents:
40 | agent.state.p_pos = np.random.uniform(-1, +1, world.dim_p)
41 | agent.state.p_vel = np.zeros(world.dim_p)
42 | agent.state.c = np.zeros(world.dim_c)
43 | for i, landmark in enumerate(world.landmarks):
44 | landmark.state.p_pos = np.random.uniform(-1, +1, world.dim_p)
45 | landmark.state.p_vel = np.zeros(world.dim_p)
46 |
47 | def benchmark_data(self, agent, world):
48 | rew = 0
49 | collisions = 0
50 | occupied_landmarks = 0
51 | min_dists = 0
52 | for l in world.landmarks:
53 | dists = [np.sqrt(np.sum(np.square(a.state.p_pos - l.state.p_pos))) for a in world.agents]
54 | min_dists += min(dists)
55 | rew -= min(dists)
56 | if min(dists) < 0.1:
57 | occupied_landmarks += 1
58 | if agent.collide:
59 | for a in world.agents:
60 | if self.is_collision(a, agent):
61 | rew -= 1
62 | collisions += 1
63 | return (rew, collisions, min_dists, occupied_landmarks)
64 |
65 |
66 | def is_collision(self, agent1, agent2):
67 | delta_pos = agent1.state.p_pos - agent2.state.p_pos
68 | dist = np.sqrt(np.sum(np.square(delta_pos)))
69 | dist_min = agent1.size + agent2.size
70 | return True if dist < dist_min else False
71 |
72 | def reward(self, agent, world):
73 | # Agents are rewarded based on minimum agent distance to each landmark, penalized for collisions
74 | rew = 0
75 | for l in world.landmarks:
76 | dists = [np.sqrt(np.sum(np.square(a.state.p_pos - l.state.p_pos))) for a in world.agents]
77 | rew -= min(dists)
78 | if agent.collide:
79 | for a in world.agents:
80 | if self.is_collision(a, agent):
81 | rew -= 1
82 | return rew
83 |
84 | def observation(self, agent, world):
85 | # get positions of all entities in this agent's reference frame
86 | entity_pos = []
87 | for entity in world.landmarks: # world.entities:
88 | entity_pos.append(entity.state.p_pos - agent.state.p_pos)
89 | # entity colors
90 | entity_color = []
91 | for entity in world.landmarks: # world.entities:
92 | entity_color.append(entity.color)
93 | # communication of all other agents
94 | comm = []
95 | other_pos = []
96 | for other in world.agents:
97 | if other is agent: continue
98 | comm.append(other.state.c)
99 | other_pos.append(other.state.p_pos - agent.state.p_pos)
100 | return np.concatenate([agent.state.p_vel] + [agent.state.p_pos] + entity_pos + other_pos + comm)
101 |
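benchmark_data() above returns per-step diagnostics (summed reward term, collision count, summed minimum distances, number of occupied landmarks) rather than the training reward, and is presumably only wired in when the environment is built for benchmarking. A rough sketch of reading it directly, with the same assumed scenarios.load() helper:

    import multiagent.scenarios as scenarios

    scenario = scenarios.load('simple_spread.py').Scenario()
    world = scenario.make_world()
    rew, collisions, min_dists, occupied = scenario.benchmark_data(world.agents[0], world)
    print(rew, collisions, min_dists, occupied)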
--------------------------------------------------------------------------------
/multiagent/scenarios/simple_tag.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | from multiagent.core import World, Agent, Landmark
3 | from multiagent.scenario import BaseScenario
4 |
5 |
6 | class Scenario(BaseScenario):
7 | def make_world(self):
8 | world = World()
9 | # set any world properties first
10 | world.dim_c = 2
11 | num_good_agents = 1
12 | num_adversaries = 3
13 | num_agents = num_adversaries + num_good_agents
14 | num_landmarks = 2
15 | # add agents
16 | world.agents = [Agent() for i in range(num_agents)]
17 | for i, agent in enumerate(world.agents):
18 | agent.name = 'agent %d' % i
19 | agent.collide = True
20 | agent.silent = True
21 | agent.adversary = True if i < num_adversaries else False
22 | agent.size = 0.075 if agent.adversary else 0.05
23 | agent.accel = 3.0 if agent.adversary else 4.0
24 | #agent.accel = 20.0 if agent.adversary else 25.0
25 | agent.max_speed = 1.0 if agent.adversary else 1.3
26 | # add landmarks
27 | world.landmarks = [Landmark() for i in range(num_landmarks)]
28 | for i, landmark in enumerate(world.landmarks):
29 | landmark.name = 'landmark %d' % i
30 | landmark.collide = True
31 | landmark.movable = False
32 | landmark.size = 0.2
33 | landmark.boundary = False
34 | # make initial conditions
35 | self.reset_world(world)
36 | return world
37 |
38 |
39 | def reset_world(self, world):
40 | # random properties for agents
41 | for i, agent in enumerate(world.agents):
42 | agent.color = np.array([0.35, 0.85, 0.35]) if not agent.adversary else np.array([0.85, 0.35, 0.35])
43 | # random properties for landmarks
44 | for i, landmark in enumerate(world.landmarks):
45 | landmark.color = np.array([0.25, 0.25, 0.25])
46 | # set random initial states
47 | for agent in world.agents:
48 | agent.state.p_pos = np.random.uniform(-1, +1, world.dim_p)
49 | agent.state.p_vel = np.zeros(world.dim_p)
50 | agent.state.c = np.zeros(world.dim_c)
51 | for i, landmark in enumerate(world.landmarks):
52 | if not landmark.boundary:
53 | landmark.state.p_pos = np.random.uniform(-0.9, +0.9, world.dim_p)
54 | landmark.state.p_vel = np.zeros(world.dim_p)
55 |
56 |
57 | def benchmark_data(self, agent, world):
58 | # returns data for benchmarking purposes
59 | if agent.adversary:
60 | collisions = 0
61 | for a in self.good_agents(world):
62 | if self.is_collision(a, agent):
63 | collisions += 1
64 | return collisions
65 | else:
66 | return 0
67 |
68 |
69 | def is_collision(self, agent1, agent2):
70 | delta_pos = agent1.state.p_pos - agent2.state.p_pos
71 | dist = np.sqrt(np.sum(np.square(delta_pos)))
72 | dist_min = agent1.size + agent2.size
73 | return True if dist < dist_min else False
74 |
75 | # return all agents that are not adversaries
76 | def good_agents(self, world):
77 | return [agent for agent in world.agents if not agent.adversary]
78 |
79 | # return all adversarial agents
80 | def adversaries(self, world):
81 | return [agent for agent in world.agents if agent.adversary]
82 |
83 |
84 | def reward(self, agent, world):
85 | # dispatch to the adversary (predator) or good-agent (prey) reward
86 | main_reward = self.adversary_reward(agent, world) if agent.adversary else self.agent_reward(agent, world)
87 | return main_reward
88 |
89 | def agent_reward(self, agent, world):
90 | # Agents are negatively rewarded if caught by adversaries
91 | rew = 0
92 | shape = False
93 | adversaries = self.adversaries(world)
94 | if shape: # reward can optionally be shaped (increased reward for increased distance from adversary)
95 | for adv in adversaries:
96 | rew += 0.1 * np.sqrt(np.sum(np.square(agent.state.p_pos - adv.state.p_pos)))
97 | if agent.collide:
98 | for a in adversaries:
99 | if self.is_collision(a, agent):
100 | rew -= 10
101 |
102 | # agents are penalized for exiting the screen, so that they can be caught by the adversaries
103 | def bound(x):
104 | if x < 0.9:
105 | return 0
106 | if x < 1.0:
107 | return (x - 0.9) * 10
108 | return min(np.exp(2 * x - 2), 10)
109 | for p in range(world.dim_p):
110 | x = abs(agent.state.p_pos[p])
111 | rew -= bound(x)
112 |
113 | return rew
114 |
115 | def adversary_reward(self, agent, world):
116 | # Adversaries are rewarded for collisions with agents
117 | rew = 0
118 | shape = False
119 | agents = self.good_agents(world)
120 | adversaries = self.adversaries(world)
121 | if shape: # reward can optionally be shaped (decreased reward for increased distance from agents)
122 | for adv in adversaries:
123 | rew -= 0.1 * min([np.sqrt(np.sum(np.square(a.state.p_pos - adv.state.p_pos))) for a in agents])
124 | if agent.collide:
125 | for ag in agents:
126 | for adv in adversaries:
127 | if self.is_collision(ag, adv):
128 | rew += 10
129 | return rew
130 |
131 | def observation(self, agent, world):
132 | # get positions of all entities in this agent's reference frame
133 | entity_pos = []
134 | for entity in world.landmarks:
135 | if not entity.boundary:
136 | entity_pos.append(entity.state.p_pos - agent.state.p_pos)
137 | # communication of all other agents
138 | comm = []
139 | other_pos = []
140 | other_vel = []
141 | for other in world.agents:
142 | if other is agent: continue
143 | comm.append(other.state.c)
144 | other_pos.append(other.state.p_pos - agent.state.p_pos)
145 | if not other.adversary:
146 | other_vel.append(other.state.p_vel)
147 | return np.concatenate([agent.state.p_vel] + [agent.state.p_pos] + entity_pos + other_pos + other_vel)
148 |
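The bound() helper inside agent_reward implements a soft arena boundary: no penalty while |x| < 0.9, a linear ramp from 0 to 1 between 0.9 and the screen edge at 1.0, then an exponential penalty capped at 10. A standalone copy with a few sample values:

    import numpy as np

    def bound(x):  # same soft boundary penalty as in agent_reward above
        if x < 0.9:
            return 0
        if x < 1.0:
            return (x - 0.9) * 10
        return min(np.exp(2 * x - 2), 10)

    print(bound(0.5))             # 0    (well inside the arena)
    print(round(bound(0.95), 2))  # 0.5  (linear ramp near the edge)
    print(round(bound(1.5), 2))   # 2.72 (exponential region)
    print(bound(3.0))             # 10   (capped)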
--------------------------------------------------------------------------------
/multiagent/scenarios/simple_world_comm.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | from multiagent.core import World, Agent, Landmark
3 | from multiagent.scenario import BaseScenario
4 |
5 |
6 | class Scenario(BaseScenario):
7 | def make_world(self):
8 | world = World()
9 | # set any world properties first
10 | world.dim_c = 4
11 | #world.damping = 1
12 | num_good_agents = 2
13 | num_adversaries = 4
14 | num_agents = num_adversaries + num_good_agents
15 | num_landmarks = 1
16 | num_food = 2
17 | num_forests = 2
18 | # add agents
19 | world.agents = [Agent() for i in range(num_agents)]
20 | for i, agent in enumerate(world.agents):
21 | agent.name = 'agent %d' % i
22 | agent.collide = True
23 | agent.leader = True if i == 0 else False
24 | agent.silent = True if i > 0 else False
25 | agent.adversary = True if i < num_adversaries else False
26 | agent.size = 0.075 if agent.adversary else 0.045
27 | agent.accel = 3.0 if agent.adversary else 4.0
28 | #agent.accel = 20.0 if agent.adversary else 25.0
29 | agent.max_speed = 1.0 if agent.adversary else 1.3
30 | # add landmarks
31 | world.landmarks = [Landmark() for i in range(num_landmarks)]
32 | for i, landmark in enumerate(world.landmarks):
33 | landmark.name = 'landmark %d' % i
34 | landmark.collide = True
35 | landmark.movable = False
36 | landmark.size = 0.2
37 | landmark.boundary = False
38 | world.food = [Landmark() for i in range(num_food)]
39 | for i, landmark in enumerate(world.food):
40 | landmark.name = 'food %d' % i
41 | landmark.collide = False
42 | landmark.movable = False
43 | landmark.size = 0.03
44 | landmark.boundary = False
45 | world.forests = [Landmark() for i in range(num_forests)]
46 | for i, landmark in enumerate(world.forests):
47 | landmark.name = 'forest %d' % i
48 | landmark.collide = False
49 | landmark.movable = False
50 | landmark.size = 0.3
51 | landmark.boundary = False
52 | world.landmarks += world.food
53 | world.landmarks += world.forests
54 | #world.landmarks += self.set_boundaries(world) # world boundaries now penalized with negative reward
55 | # make initial conditions
56 | self.reset_world(world)
57 | return world
58 |
59 | def set_boundaries(self, world):
60 | boundary_list = []
61 | landmark_size = 1
62 | edge = 1 + landmark_size
63 | num_landmarks = int(edge * 2 / landmark_size)
64 | for x_pos in [-edge, edge]:
65 | for i in range(num_landmarks):
66 | l = Landmark()
67 | l.state.p_pos = np.array([x_pos, -1 + i * landmark_size])
68 | boundary_list.append(l)
69 |
70 | for y_pos in [-edge, edge]:
71 | for i in range(num_landmarks):
72 | l = Landmark()
73 | l.state.p_pos = np.array([-1 + i * landmark_size, y_pos])
74 | boundary_list.append(l)
75 |
76 | for i, l in enumerate(boundary_list):
77 | l.name = 'boundary %d' % i
78 | l.collide = True
79 | l.movable = False
80 | l.boundary = True
81 | l.color = np.array([0.75, 0.75, 0.75])
82 | l.size = landmark_size
83 | l.state.p_vel = np.zeros(world.dim_p)
84 |
85 | return boundary_list
86 |
87 |
88 | def reset_world(self, world):
89 | # random properties for agents
90 | for i, agent in enumerate(world.agents):
91 | agent.color = np.array([0.45, 0.95, 0.45]) if not agent.adversary else np.array([0.95, 0.45, 0.45])
92 | agent.color -= np.array([0.3, 0.3, 0.3]) if agent.leader else np.array([0, 0, 0])
93 | # random properties for landmarks
94 | for i, landmark in enumerate(world.landmarks):
95 | landmark.color = np.array([0.25, 0.25, 0.25])
96 | for i, landmark in enumerate(world.food):
97 | landmark.color = np.array([0.15, 0.15, 0.65])
98 | for i, landmark in enumerate(world.forests):
99 | landmark.color = np.array([0.6, 0.9, 0.6])
100 | # set random initial states
101 | for agent in world.agents:
102 | agent.state.p_pos = np.random.uniform(-1, +1, world.dim_p)
103 | agent.state.p_vel = np.zeros(world.dim_p)
104 | agent.state.c = np.zeros(world.dim_c)
105 | for i, landmark in enumerate(world.landmarks):
106 | landmark.state.p_pos = np.random.uniform(-0.9, +0.9, world.dim_p)
107 | landmark.state.p_vel = np.zeros(world.dim_p)
108 | for i, landmark in enumerate(world.food):
109 | landmark.state.p_pos = np.random.uniform(-0.9, +0.9, world.dim_p)
110 | landmark.state.p_vel = np.zeros(world.dim_p)
111 | for i, landmark in enumerate(world.forests):
112 | landmark.state.p_pos = np.random.uniform(-0.9, +0.9, world.dim_p)
113 | landmark.state.p_vel = np.zeros(world.dim_p)
114 |
115 | def benchmark_data(self, agent, world):
116 | if agent.adversary:
117 | collisions = 0
118 | for a in self.good_agents(world):
119 | if self.is_collision(a, agent):
120 | collisions += 1
121 | return collisions
122 | else:
123 | return 0
124 |
125 |
126 | def is_collision(self, agent1, agent2):
127 | delta_pos = agent1.state.p_pos - agent2.state.p_pos
128 | dist = np.sqrt(np.sum(np.square(delta_pos)))
129 | dist_min = agent1.size + agent2.size
130 | return True if dist < dist_min else False
131 |
132 |
133 | # return all agents that are not adversaries
134 | def good_agents(self, world):
135 | return [agent for agent in world.agents if not agent.adversary]
136 |
137 | # return all adversarial agents
138 | def adversaries(self, world):
139 | return [agent for agent in world.agents if agent.adversary]
140 |
141 |
142 | def reward(self, agent, world):
143 | # dispatch to the adversary or good-agent reward depending on the agent's role
144 | #boundary_reward = -10 if self.outside_boundary(agent) else 0
145 | main_reward = self.adversary_reward(agent, world) if agent.adversary else self.agent_reward(agent, world)
146 | return main_reward
147 |
148 | def outside_boundary(self, agent):
149 | if agent.state.p_pos[0] > 1 or agent.state.p_pos[0] < -1 or agent.state.p_pos[1] > 1 or agent.state.p_pos[1] < -1:
150 | return True
151 | else:
152 | return False
153 |
154 |
155 | def agent_reward(self, agent, world):
156 | # Good agents are rewarded for reaching food and penalized for being caught and for leaving the arena
157 | rew = 0
158 | shape = False
159 | adversaries = self.adversaries(world)
160 | if shape:
161 | for adv in adversaries:
162 | rew += 0.1 * np.sqrt(np.sum(np.square(agent.state.p_pos - adv.state.p_pos)))
163 | if agent.collide:
164 | for a in adversaries:
165 | if self.is_collision(a, agent):
166 | rew -= 5
167 | def bound(x):
168 | if x < 0.9:
169 | return 0
170 | if x < 1.0:
171 | return (x - 0.9) * 10
172 | return min(np.exp(2 * x - 2), 10) # 1 + (x - 1) * (x - 1)
173 |
174 | for p in range(world.dim_p):
175 | x = abs(agent.state.p_pos[p])
176 | rew -= 2 * bound(x)
177 |
178 | for food in world.food:
179 | if self.is_collision(agent, food):
180 | rew += 2
181 | rew += 0.05 * min([np.sqrt(np.sum(np.square(food.state.p_pos - agent.state.p_pos))) for food in world.food])
182 |
183 | return rew
184 |
185 | def adversary_reward(self, agent, world):
186 | # Adversaries are rewarded for catching (colliding with) good agents
187 | rew = 0
188 | shape = True
189 | agents = self.good_agents(world)
190 | adversaries = self.adversaries(world)
191 | if shape:
192 | rew -= 0.1 * min([np.sqrt(np.sum(np.square(a.state.p_pos - agent.state.p_pos))) for a in agents])
193 | if agent.collide:
194 | for ag in agents:
195 | for adv in adversaries:
196 | if self.is_collision(ag, adv):
197 | rew += 5
198 | return rew
199 |
200 |
201 | def observation2(self, agent, world):
202 | # get positions of all entities in this agent's reference frame
203 | entity_pos = []
204 | for entity in world.landmarks:
205 | if not entity.boundary:
206 | entity_pos.append(entity.state.p_pos - agent.state.p_pos)
207 |
208 | food_pos = []
209 | for entity in world.food:
210 | if not entity.boundary:
211 | food_pos.append(entity.state.p_pos - agent.state.p_pos)
212 | # communication of all other agents
213 | comm = []
214 | other_pos = []
215 | other_vel = []
216 | for other in world.agents:
217 | if other is agent: continue
218 | comm.append(other.state.c)
219 | other_pos.append(other.state.p_pos - agent.state.p_pos)
220 | if not other.adversary:
221 | other_vel.append(other.state.p_vel)
222 | return np.concatenate([agent.state.p_vel] + [agent.state.p_pos] + entity_pos + other_pos + other_vel)
223 |
224 | def observation(self, agent, world):
225 | # get positions of all entities in this agent's reference frame
226 | entity_pos = []
227 | for entity in world.landmarks:
228 | if not entity.boundary:
229 | entity_pos.append(entity.state.p_pos - agent.state.p_pos)
230 |
231 | in_forest = [np.array([-1]), np.array([-1])]
232 | inf1 = False
233 | inf2 = False
234 | if self.is_collision(agent, world.forests[0]):
235 | in_forest[0] = np.array([1])
236 | inf1 = True
237 | if self.is_collision(agent, world.forests[1]):
238 | in_forest[1] = np.array([1])
239 | inf2 = True
240 |
241 | food_pos = []
242 | for entity in world.food:
243 | if not entity.boundary:
244 | food_pos.append(entity.state.p_pos - agent.state.p_pos)
245 | # communication of all other agents
246 | comm = []
247 | other_pos = []
248 | other_vel = []
249 | for other in world.agents:
250 | if other is agent: continue
251 | comm.append(other.state.c)
252 | oth_f1 = self.is_collision(other, world.forests[0])
253 | oth_f2 = self.is_collision(other, world.forests[1])
254 | if (inf1 and oth_f1) or (inf2 and oth_f2) or (not inf1 and not oth_f1 and not inf2 and not oth_f2) or agent.leader: #without forest vis
255 | other_pos.append(other.state.p_pos - agent.state.p_pos)
256 | if not other.adversary:
257 | other_vel.append(other.state.p_vel)
258 | else:
259 | other_pos.append([0, 0])
260 | if not other.adversary:
261 | other_vel.append([0, 0])
262 |
263 | # to tell the pred when the prey are in the forest
264 | prey_forest = []
265 | ga = self.good_agents(world)
266 | for a in ga:
267 | if any([self.is_collision(a, f) for f in world.forests]):
268 | prey_forest.append(np.array([1]))
269 | else:
270 | prey_forest.append(np.array([-1]))
271 | # to tell the leader when the prey are in a forest
272 | prey_forest_lead = []
273 | for f in world.forests:
274 | if any([self.is_collision(a, f) for a in ga]):
275 | prey_forest_lead.append(np.array([1]))
276 | else:
277 | prey_forest_lead.append(np.array([-1]))
278 |
279 | comm = [world.agents[0].state.c]
280 |
281 | if agent.adversary and not agent.leader:
282 | return np.concatenate([agent.state.p_vel] + [agent.state.p_pos] + entity_pos + other_pos + other_vel + in_forest + comm)
283 | if agent.leader:
284 | return np.concatenate(
285 | [agent.state.p_vel] + [agent.state.p_pos] + entity_pos + other_pos + other_vel + in_forest + comm)
286 | else:
287 | return np.concatenate([agent.state.p_vel] + [agent.state.p_pos] + entity_pos + other_pos + in_forest + other_vel)
288 |
289 |
290 |
--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
1 | from setuptools import setup, find_packages
2 |
3 | setup(name='multiagent',
4 | version='0.0.1',
5 | description='Multi-Agent Goal-Driven Communication Environment',
6 | url='https://github.com/openai/multiagent-public',
7 | author='Igor Mordatch',
8 | author_email='mordatch@openai.com',
9 | packages=find_packages(),
10 | include_package_data=True,
11 | zip_safe=False,
12 | install_requires=['gym', 'numpy-stl']
13 | )
14 |
--------------------------------------------------------------------------------