├── README.md
└── src
    ├── .gitignore
    ├── Collective Grouping Behaviors
    │   ├── .DS_Store
    │   ├── Model.py
    │   ├── env.py
    │   ├── main.py
    │   └── train.sh
    ├── Model.py
    ├── Population Dynamics
    │   ├── .DS_Store
    │   ├── .gitignore
    │   ├── Model.py
    │   ├── env.py
    │   ├── main.py
    │   └── train.sh
    ├── env.py
    ├── killprocess.sh
    ├── main.py
    ├── plot.py
    ├── plot_circle.py
    ├── plot_largest_group.py
    ├── plot_number.py
    ├── readme.md
    ├── train_10000_pig_rabbit_add.sh
    └── train_1M.sh

/README.md:
--------------------------------------------------------------------------------
1 | # A preliminary platform for up to 1 million reinforcement learning agents
2 | 
3 | Our goal is to provide a **Multi-Agent Reinforcement Learning** platform for up to 1 million agents.
4 | 
5 | This platform is still a work in progress, but we have provided two specific settings as demos.
6 | 
7 | You can see two directories, `Population Dynamics` and `Collective Grouping Behaviors`, which correspond to the two settings in the paper (when the arXiv version is available), respectively.
8 | 
9 | ### Dependencies
10 | 
11 | - [TensorFlow](tensorflow.org)
12 | - [opencv2/3](opencv.org)
13 | 
14 | ### **Population Dynamics** setting
15 | 
16 | #### Usage
17 |     cd Population Dynamics
18 |     ./train.sh
19 | 
20 | The log is saved at `Population_dynmacis.log`.
21 | 
22 | ### **Collective Grouping Behaviors** setting
23 | 
24 | #### Usage
25 |     cd Collective Grouping Behaviors
26 |     ./train.sh
27 | 
28 | The log is saved at `Collective_Grouping_Behaviors.log`.
29 | 
30 | You can also open the bash file to change the parameters; the list below explains each parameter.
31 | 
32 | - `random_seed default=10` The random seed used to generate the obstacles (walls) on the map.
33 | - `width default=1000` The width of the map.
34 | - `height default=1000` The height of the map.
35 | - `batch_size default=32` The batch size used during training.
36 | - `view_args default=2500-5-5-0,2500-5-5-1,2500-5-5-2,2500-5-5-3` Defines the view size and face direction of the agents. Each comma-separated item contains four numbers: the number of agents with this property, the left view size, the front view size and the face direction, where 0 means north on the map, 1 means east, 2 means south and 3 means west.
37 | - `agent_number default=10000` The initial number of agents.
38 | - `pig_max_number default=5000` The initial number of prey-pigs.
39 | - `rabbit_max_number default=3000` The initial number of prey-rabbits.
40 | - `agent_increase_rate default=0.001` The birth rate of the agents.
41 | - `pig_increase_rate default=0.001` The birth rate of the prey-pigs.
42 | - `rabbit_increase_rate default=0.001` The birth rate of the prey-rabbits.
43 | - `reward_radius_pig default=7` The reward radius of the prey-pig: agents within this distance of a pig count toward catching it.
44 | - `reward_radius_rabbit default=2` The reward radius of the prey-rabbit.
45 | - `reward_threshold_pig default=3` The reward threshold of the prey-pig: the minimum number of in-range group members required to catch a pig.
46 | - `agent_emb_dim default=5` The dimension of the agent embedding.
47 | - `damage_per_step default=0.01` The health an agent loses per step.
48 | - `model_name default=DNN` The category of the model.
49 | - `model_hidden_size default=32,32` The number of units in each hidden layer.
50 | - `activations default=sigmoid,sigmoid` The activation function of each layer.
51 | - `view_flat_size default=335` The input dimension of the neural network; please see the paper for details.
52 | - `num_actions default=9` The number of actions.
53 | - `reward_decay default=0.9` The reward decay (discount factor) in the reinforcement learning.
54 | - `save_dir default=models` The model save path.
55 | - `load_dir default=None` The model load path.
56 | - `round default=100` The number of training rounds.
57 | - `time_step default=500` The number of training steps in each round.
58 | - `learning_rate default=0.001` The learning rate of the reinforcement learning.
59 | - `log_file default=log.txt` The log file path.
60 | 
61 | For more details of the parameters, please refer to the paper.
62 | 
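As a quick illustration (a sketch only, not part of the original scripts): instead of editing `train.sh`, the same options can be passed to `main.py` directly. The flags below are the argparse options defined in `main.py`, and the values are the ones set in the provided `Collective Grouping Behaviors/train.sh`:

    cd "Collective Grouping Behaviors"
    python main.py \
        --agent_number 10000 \
        --view_args 2500-5-5-0,2500-5-5-1,2500-5-5-2,2500-5-5-3 \
        --pig_max_number 5000 --rabbit_max_number 30000 \
        --agent_emb_dim 5 --view_flat_size 335 \
        --round 100 --time_step 500 --learning_rate 0.001 \
        --log_file Collective_Grouping_Behaviors.log

Note that `view_flat_size` is not a free parameter: the environment asserts that it equals `(2 * leftView + 1) * (frontView + 1) * 5 + agent_emb_dim`, so it must be kept consistent with `view_args` and `agent_emb_dim` (with the default `5-5` view and a 5-dimensional embedding this is 11 * 6 * 5 + 5 = 335).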
--------------------------------------------------------------------------------
/src/.gitignore:
--------------------------------------------------------------------------------
1 | good_model
2 | largest_group
--------------------------------------------------------------------------------
/src/Collective Grouping Behaviors/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/geek-ai/1m-agents/93ae67027f24aea6c523ced83bab9ba76394ab23/src/Collective Grouping Behaviors/.DS_Store
--------------------------------------------------------------------------------
/src/Collective Grouping Behaviors/Model.py:
--------------------------------------------------------------------------------
1 | import tensorflow as tf 2 | import numpy as np 3 | import random 4 | 5 | 6 | class Model_DNN(): 7 | def __init__(self, args): 8 | self.args = args 9 | assert self.args.model_name == 'DNN' 10 | 11 | # Input placeholders 12 | self.input_view = tf.placeholder(tf.float32) # 2-D, [batch_size, view_size(width x length x depth)] 13 | self.actions = tf.placeholder(tf.int32) # 1-D, [batch_size] 14 | self.reward = tf.placeholder(tf.float32) # 1-D, [batch_size] 15 | self.maxQ = tf.placeholder(tf.float32) # 1-D, [batch_size], the max Q-value of next state 16 | self.learning_rate = tf.placeholder(tf.float32) 17 | 18 | self.agent_embeddings = {} 19 | 20 | # Build Graph 21 | self.x = self.input_view 22 | last_hidden_size = self.args.view_flat_size 23 | for i, hidden_size in enumerate(self.args.model_hidden_size): 24 | with tf.variable_scope('layer_%d' % i): 25 | self.W = tf.get_variable(name='weights', 26 | initializer=tf.truncated_normal([last_hidden_size, hidden_size], stddev=0.1)) 27 | self.b = tf.get_variable(name='bias', initializer=tf.zeros([hidden_size])) 28 | last_hidden_size = hidden_size 29 | self.x = tf.matmul(self.x, self.W) + self.b 30 | # activation function 31 | if self.args.activations[i] == 'sigmoid': 32 | self.x = tf.sigmoid(self.x) 33 | elif self.args.activations[i] == 'tanh': 34 | self.x = tf.nn.tanh(self.x) 35 | elif self.args.activations[i] == 'relu': 36 | self.x = tf.nn.relu(self.x) 37 | 38 | with tf.variable_scope('layer_output'): 39 | self.W = tf.get_variable(name='weights', 40 | initializer=tf.truncated_normal([last_hidden_size, self.args.num_actions], 41 | stddev=0.1)) 42 | self.b = tf.get_variable(name='bias', initializer=tf.zeros([self.args.num_actions])) 43 | self.output = tf.matmul(self.x, self.W) + self.b # batch_size x output_size 44 | 45 | # Train operation 46 | self.reward_decay = self.args.reward_decay 47 | self.actions_onehot = tf.one_hot(self.actions, self.args.num_actions) 48 | self.loss = tf.reduce_mean( 49 | tf.square( 50 | (self.reward + self.reward_decay * self.maxQ) - tf.reduce_sum( 51 | tf.multiply(self.actions_onehot, self.output), axis=1) 52 | ) 53 | ) 54 | self.train_op = tf.train.GradientDescentOptimizer(learning_rate=self.learning_rate).minimize(self.loss) 55 | 56 | def _inference(self, sess, input_view, if_sample, policy='e_greedy', epsilon=0.1): 57 | """ 58 | Perform 
inference for one batch 59 | :param if_sample: bool; If true, return Q(s,a) for all the actions; If false, return the sampled action. 60 | :param policy: valid when if_sample=True, sample policy of the actions taken. 61 | Available: e_greedy, greedy 62 | :param epsilon: for e_greedy policy 63 | :return: numpy array; if_sample=True, [batch_size]; if_sample=False, [batch_size, num_actions] 64 | """ 65 | assert policy in ['greedy', 'e_greedy'] 66 | value_s_a = sess.run(self.output, {self.input_view: input_view}) 67 | if if_sample: 68 | if policy == 'greedy': 69 | actions = np.argmax(value_s_a, axis=1) 70 | return actions 71 | if policy == 'e_greedy': 72 | all_actions = range(self.args.num_actions) 73 | actions = [] 74 | for i in xrange(len(value_s_a)): 75 | if random.random() < epsilon: 76 | actions.append(np.random.choice(all_actions)) 77 | else: 78 | actions.append(np.argmax(value_s_a[i])) 79 | return np.array(actions) 80 | else: 81 | return value_s_a 82 | 83 | def infer_actions(self, sess, view_batches, policy='e_greedy', epsilon=0.1): 84 | ret_actions = [] 85 | ret_actions_batch = [] 86 | for input_view in view_batches: 87 | batch_id, batch_view = self.process_view_with_emb_batch(input_view) 88 | actions_batch = self._inference(sess, batch_view, if_sample=True, policy=policy, epsilon=epsilon) 89 | ret_actions_batch.append(zip(batch_id, actions_batch)) 90 | ret_actions.extend(zip(batch_id, actions_batch)) 91 | 92 | return ret_actions, ret_actions_batch 93 | 94 | def infer_max_action_values(self, sess, view_batches): 95 | ret = [] 96 | for input_view in view_batches: 97 | batch_id, batch_view = self.process_view_with_emb_batch(input_view) 98 | value_batches = self._inference(sess, batch_view, if_sample=False) 99 | ret.append(zip(batch_id, np.max(value_batches, axis=1))) 100 | return ret 101 | 102 | def process_view_with_emb_batch(self, input_view): 103 | # parse input into id, view as and concatenate view with embedding 104 | batch_view = [] 105 | batch_id = [] 106 | for id, view in input_view: 107 | batch_id.append(id) 108 | if id in self.agent_embeddings: 109 | new_view = np.concatenate((self.agent_embeddings[id], view), 0) 110 | batch_view.append(new_view) 111 | else: 112 | new_embedding = np.random.normal(size=[self.args.agent_emb_dim]) 113 | self.agent_embeddings[id] = new_embedding 114 | new_view = np.concatenate((new_embedding, view), 0) 115 | batch_view.append(new_view) 116 | return batch_id, np.array(batch_view) 117 | 118 | def _train(self, sess, input_view, actions, reward, maxQ, learning_rate=0.001): 119 | feed_dict = { 120 | self.input_view: input_view, 121 | self.actions: actions, 122 | self.reward: reward, 123 | self.maxQ: maxQ, 124 | self.learning_rate: learning_rate 125 | } 126 | _ = sess.run(self.train_op, feed_dict) 127 | 128 | def train(self, sess, view_batches, actions_batches, rewards, maxQ_batches, learning_rate=0.001): 129 | def split_id_value(input_): 130 | ret_id = [] 131 | ret_value = [] 132 | for item in input_: 133 | ret_id.append(item[0]) 134 | ret_value.append(item[1]) 135 | return ret_id, ret_value 136 | 137 | for i in xrange(len(view_batches)): 138 | view_id, view_value = self.process_view_with_emb_batch(view_batches[i]) 139 | action_id, action_value = split_id_value(actions_batches[i]) 140 | maxQ_id, maxQ_value = split_id_value(maxQ_batches[i]) 141 | assert view_id == action_id == maxQ_id 142 | reward_value = [] 143 | for id in view_id: 144 | if id in rewards: 145 | reward_value.append(rewards[id]) 146 | else: 147 | reward_value.append(0.) 
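            # One-step Q-learning update for this mini-batch: the TD target
            # reward + reward_decay * maxQ (maxQ_value holds the max Q-value of the
            # next state, produced by infer_max_action_values) is fit against the
            # predicted Q(s, a) through the squared loss defined in __init__;
            # agents that received no reward at this step use a reward of 0.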
148 | 149 | self._train(sess, view_value, action_value, reward_value, maxQ_value, learning_rate) 150 | 151 | def save(self, sess, filename): 152 | saver = tf.train.Saver() 153 | saver.save(sess, filename) 154 | 155 | def load(self, sess, filename): 156 | saver = tf.train.Saver() 157 | saver.restore(sess, filename) 158 | 159 | def remove_dead_agent_emb(self, dead_list): 160 | for id in dead_list: 161 | del self.agent_embeddings[id] 162 | -------------------------------------------------------------------------------- /src/Collective Grouping Behaviors/env.py: -------------------------------------------------------------------------------- 1 | import os 2 | import random 3 | import multiprocessing 4 | from PIL import Image 5 | import numpy as np 6 | from cv2 import VideoWriter, imread, resize 7 | from copy import deepcopy 8 | import cv2 9 | 10 | 11 | # from model import inference, train 12 | 13 | class Env(object): 14 | def __init__(self, args): 15 | self.args = args 16 | self.h = args.height 17 | self.w = args.width 18 | self.batch_size = args.batch_size 19 | self.view_args = args.view_args 20 | self.agent_num = args.agent_number 21 | self.pig_num = 0 22 | self.rabbit_num = 0 23 | self.action_num = args.num_actions 24 | 25 | # Initialization 26 | self.view = [] 27 | self.map = np.zeros((self.h, self.w), dtype=np.int32) 28 | self.id_pos = {} 29 | self.pig_pos = set() 30 | self.property = {} 31 | self.rabbit_pos = set() 32 | 33 | # For the view size modify 34 | self.property_copy = {} 35 | self.max_group = 0 36 | self.id_group = {} 37 | self.group_ids = {} 38 | self.batch_views = {} 39 | self.ally = {} 40 | 41 | # For reason of degroup 42 | self.id_ally_number = {} 43 | self.actions = None 44 | 45 | # For health 46 | self.health = {} 47 | self.max_id = 0 48 | 49 | # For mortal 50 | self.dead_id = [] 51 | 52 | # For track largest group 53 | self.largest_group = 0 54 | 55 | self.rewards = None 56 | self.reward_radius_pig = args.reward_radius_pig 57 | self.reward_threshold_pig = args.reward_threshold_pig 58 | self.reward_radius_rabbit = args.reward_radius_rabbit 59 | 60 | self.groups_view_size = {} 61 | self.max_view_size = None 62 | self.min_view_size = None 63 | 64 | self._init_property() 65 | self._init_group() 66 | 67 | def _init_property(self): 68 | self.property[-3] = [1, [0, 1, 0]] 69 | self.property[-2] = [1, [1, 0, 0]] 70 | self.property[-1] = [1, [0.411, 0.411, 0.411]] 71 | self.property[0] = [1, [0, 0, 0]] 72 | 73 | def _init_group(self): 74 | for i in xrange(self.agent_num): 75 | self.id_group[i + 1] = 0 76 | 77 | def _gen_power(self, cnt): 78 | 79 | def max_view_size(view_size1, view_size2): 80 | view_size_area1 = (2 * view_size1[0] + 1) * (view_size1[1] + 1) 81 | view_size_area2 = (2 * view_size2[0] + 1) * (view_size2[1] + 1) 82 | 83 | return view_size1 if view_size_area1 > view_size_area2 else view_size2 84 | 85 | def min_view_size(view_size1, view_size2): 86 | view_size_area1 = (2 * view_size1[0] + 1) * (view_size1[1] + 1) 87 | view_size_area2 = (2 * view_size2[0] + 1) * (view_size2[1] + 1) 88 | 89 | return view_size1 if view_size_area1 < view_size_area2 else view_size2 90 | 91 | cur = 0 92 | for k in self.view_args: 93 | k = [int(x) for x in k.split('-')] 94 | assert len(k) == 4 95 | 96 | num, power_list = k[0], k[1:] 97 | # Maintain the max_view_size 98 | if self.max_view_size is None: 99 | self.max_view_size = power_list 100 | else: 101 | self.max_view_size = max_view_size(self.max_view_size, power_list) 102 | 103 | if self.min_view_size is None: 104 | self.min_view_size = 
power_list 105 | else: 106 | self.min_view_size = min_view_size(self.min_view_size, power_list) 107 | 108 | cur += num 109 | 110 | if cnt <= cur: 111 | return power_list 112 | 113 | def gen_wall(self, prob=0, seed=10): 114 | if prob == 0: 115 | return 116 | np.random.seed(seed) 117 | # Generate wall according to the prob 118 | for i in xrange(self.h): 119 | for j in xrange(self.w): 120 | if i == 0 or i == self.h - 1 or j == 0 or j == self.w - 1: 121 | self.map[i][j] = -1 122 | continue 123 | wall_prob = np.random.rand() 124 | if wall_prob < prob: 125 | self.map[i][j] = -1 126 | 127 | def gen_agent(self, agent_num=None): 128 | if agent_num == None: 129 | agent_num = self.args.agent_number 130 | 131 | for i in xrange(agent_num): 132 | while True: 133 | x = np.random.randint(0, self.h) 134 | y = np.random.randint(0, self.w) 135 | if self.map[x][y] == 0: 136 | self.map[x][y] = i + 1 137 | self.id_pos[i + 1] = (x, y) 138 | self.property[i + 1] = [self._gen_power(i + 1), [0, 0, 1]] 139 | self.health[i + 1] = 1.0 140 | break 141 | assert (2 * self.max_view_size[0] + 1) * (self.max_view_size[1] + 1) * 5 + self.args.agent_emb_dim == \ 142 | self.args.view_flat_size 143 | 144 | self.agent_num = self.args.agent_number 145 | self.max_id = self.args.agent_number 146 | # self.property_copy = self.property[:] 147 | for k in self.property: 148 | self.property_copy[k] = self.property[k][:] 149 | # self.property_copy = deepcopy(self.property) 150 | 151 | def _grow_power(self): 152 | 153 | candidate_view = [] 154 | for k in self.view_args: 155 | k = [int(x) for x in k.split('-')] 156 | assert len(k) == 4 157 | candidate_view.append(k) 158 | 159 | num = len(candidate_view) 160 | random_power = np.random.randint(0, num) 161 | 162 | return candidate_view[random_power][1:] 163 | 164 | def grow_agent(self, agent_num=0): 165 | if agent_num == 0: 166 | return 167 | 168 | for i in xrange(agent_num): 169 | while True: 170 | x = np.random.randint(0, self.h) 171 | y = np.random.randint(0, self.w) 172 | if self.map[x][y] == 0: 173 | self.max_id += 1 174 | self.map[x][y] = self.max_id 175 | self.id_pos[self.max_id] = (x, y) 176 | self.property[self.max_id] = [self._grow_power(), [0, 0, 1]] 177 | self.property_copy[self.max_id] = self.property[self.max_id][:] 178 | self.health[self.max_id] = 1.0 179 | self.id_group[self.max_id] = 0 180 | 181 | break 182 | 183 | self.agent_num += agent_num 184 | 185 | def gen_pig(self, pig_nums=None): 186 | if pig_nums == None: 187 | pig_nums = self.args.pig_max_number 188 | 189 | for i in xrange(pig_nums): 190 | while True: 191 | x = np.random.randint(0, self.h) 192 | y = np.random.randint(0, self.w) 193 | if self.map[x][y] == 0: 194 | self.map[x][y] = -2 195 | self.pig_pos.add((x, y)) 196 | break 197 | 198 | self.pig_num = self.pig_num + pig_nums 199 | 200 | def gen_rabbit(self, rabbit_num=None): 201 | if rabbit_num is None: 202 | rabbit_num = self.args.rabbit_max_number 203 | 204 | for i in xrange(rabbit_num): 205 | while True: 206 | x = np.random.randint(0, self.h) 207 | y = np.random.randint(0, self.w) 208 | if self.map[x][y] == 0: 209 | self.map[x][y] = -3 210 | self.rabbit_pos.add((x, y)) 211 | break 212 | 213 | self.rabbit_num = self.rabbit_num + rabbit_num 214 | 215 | def get_pig_num(self): 216 | return self.pig_num 217 | 218 | def get_rabbit_num(self): 219 | return self.rabbit_num 220 | 221 | def get_agent_num(self): 222 | return self.agent_num 223 | 224 | def _agent_act(self, x, y, face, action, id): 225 | 226 | def move_forward(x, y, face): 227 | if face == 0: 228 | return x - 
1, y 229 | elif face == 1: 230 | return x, y + 1 231 | elif face == 2: 232 | return x + 1, y 233 | elif face == 3: 234 | return x, y - 1 235 | 236 | def move_backward(x, y, face): 237 | if face == 0: 238 | return x + 1, y 239 | elif face == 1: 240 | return x, y - 1 241 | elif face == 2: 242 | return x - 1, y 243 | elif face == 3: 244 | return x, y + 1 245 | 246 | def move_left(x, y, face): 247 | if face == 0: 248 | return x, y - 1 249 | elif face == 1: 250 | return x - 1, y 251 | elif face == 2: 252 | return x, y + 1 253 | elif face == 3: 254 | return x + 1, y 255 | 256 | def move_right(x, y, face): 257 | if face == 0: 258 | return x, y + 1 259 | elif face == 1: 260 | return x + 1, y 261 | elif face == 2: 262 | return x, y - 1 263 | elif face == 3: 264 | return x - 1, y 265 | 266 | def in_board(x, y): 267 | return self.map[x][y] == 0 268 | 269 | # return the max view size(the area of the view) of the two view sizes 270 | def max_view_size(view_size1, view_size2): 271 | view_size_area1 = (2 * view_size1[0] + 1) * (view_size1[1] + 1) 272 | view_size_area2 = (2 * view_size2[0] + 1) * (view_size2[1] + 1) 273 | 274 | return view_size1 if view_size_area1 > view_size_area2 else view_size2 275 | 276 | if action == 0: 277 | pass 278 | elif action == 1: 279 | new_x, new_y = move_forward(x, y, face) 280 | if in_board(new_x, new_y): 281 | self.map[x][y] = 0 282 | self.map[new_x][new_y] = id 283 | self.id_pos[id] = (new_x, new_y) 284 | elif action == 2: 285 | new_x, new_y = move_backward(x, y, face) 286 | if in_board(new_x, new_y): 287 | self.map[x][y] = 0 288 | self.map[new_x][new_y] = id 289 | self.id_pos[id] = (new_x, new_y) 290 | elif action == 3: 291 | new_x, new_y = move_left(x, y, face) 292 | if in_board(new_x, new_y): 293 | self.map[x][y] = 0 294 | self.map[new_x][new_y] = id 295 | self.id_pos[id] = (new_x, new_y) 296 | elif action == 4: 297 | new_x, new_y = move_right(x, y, face) 298 | if in_board(new_x, new_y): 299 | self.map[x][y] = 0 300 | self.map[new_x][new_y] = id 301 | self.id_pos[id] = (new_x, new_y) 302 | elif action == 5: 303 | self.property[id][0][2] = (face + 4 - 1) % 4 304 | elif action == 6: 305 | self.property[id][0][2] = (face + 1) % 4 306 | elif action == 7: 307 | if self.id_group[id] == 0: 308 | if id in self.ally: 309 | ally_id = self.ally[id] 310 | if self.id_group[ally_id] == 0: 311 | self.max_group += 1 312 | self.id_group[id] = self.max_group 313 | self.id_group[ally_id] = self.max_group 314 | 315 | self.group_ids[self.max_group] = [] 316 | self.group_ids[self.max_group].append(id) 317 | self.group_ids[self.max_group].append(ally_id) 318 | 319 | # For view size 320 | assert self.property[id][0] == self.property_copy[id][0] 321 | assert self.property[ally_id][0] == self.property_copy[ally_id][0] 322 | self.groups_view_size[self.max_group] = max_view_size(self.property[id][0], 323 | self.property[ally_id][0]) 324 | self.property[id][0] = self.groups_view_size[self.max_group] 325 | self.property[ally_id][0] = self.groups_view_size[self.max_group] 326 | else: 327 | assert self.property[id][0] == self.property_copy[id][0] 328 | self.id_group[id] = self.id_group[ally_id] 329 | self.group_ids[self.id_group[ally_id]].append(id) 330 | 331 | group_id = self.id_group[ally_id] 332 | 333 | cur_max_view_size = max_view_size(self.property[id][0], self.groups_view_size[group_id]) 334 | if cur_max_view_size == self.property[id][0] and self.property[id][0] != self.groups_view_size[ 335 | group_id]: 336 | # A powerful man join in a group, need to change all the members' view size in that 
group 337 | for people in self.group_ids[group_id]: 338 | self.property[people][0] = cur_max_view_size 339 | self.groups_view_size[group_id] = cur_max_view_size 340 | else: 341 | self.property[id][0] = cur_max_view_size 342 | 343 | elif action == 8: 344 | group_id = self.id_group[id] 345 | 346 | if group_id != 0: 347 | another_id = None 348 | if len(self.group_ids[group_id]) == 2: 349 | for item in self.group_ids[group_id]: 350 | if item != id: 351 | another_id = item 352 | self.id_group[id], self.id_group[another_id] = 0, 0 353 | self.group_ids[group_id] = None 354 | 355 | # Restore the origin view size 356 | self.property[id] = self.property_copy[id][:] 357 | self.property[another_id] = self.property_copy[another_id][:] 358 | self.groups_view_size[group_id] = None 359 | else: 360 | self.id_group[id] = 0 361 | self.group_ids[group_id].remove(id) 362 | 363 | # Restore the origin view size 364 | self.property[id] = self.property_copy[id][:] 365 | cur_max_view_size = None 366 | 367 | for people in self.group_ids[group_id]: 368 | if cur_max_view_size is None: 369 | cur_max_view_size = self.property_copy[people][0][:] 370 | else: 371 | cur_max_view_size = max_view_size(cur_max_view_size, self.property_copy[people][0][:]) 372 | 373 | for people in self.group_ids[group_id]: 374 | self.property[people][0] = cur_max_view_size 375 | 376 | self.groups_view_size[group_id] = cur_max_view_size 377 | 378 | else: 379 | pass 380 | 381 | else: 382 | print action 383 | print "Wrong Action ID!!!!" 384 | 385 | def take_action(self, actions): 386 | 387 | # Move Agent 388 | self.actions = actions 389 | # for i in xrange(self.agent_num): 390 | for id, action in actions: 391 | x, y = self.id_pos[id] 392 | face = self.property[id][0][2] 393 | self._agent_act(x, y, face, action, id) 394 | 395 | def increase_health(self, rewards): 396 | for id in rewards: 397 | self.health[id] += 12. * rewards[id] 398 | 399 | # if rewards[id] > 0.2: 400 | # self.health[id] = 1. 401 | # elif rewards > 0: 402 | # self.health[id] += rewards[id] 403 | 404 | # self.health[id] += rewards[id] 405 | # if self.health[id] > 1.0: 406 | # self.health[id] = 1.0 407 | 408 | def group_monitor(self): 409 | """ 410 | :return: group_num, mean_size, variance_size, max_size 411 | """ 412 | group_sizes = [] 413 | group_view_num = {} 414 | group_view_avg_size = {} 415 | for k in self.group_ids: 416 | ids = self.group_ids[k] 417 | if ids: 418 | group_size = len(ids) 419 | assert group_size >= 2 420 | group_sizes.append(group_size) 421 | 422 | # count group view size and group number 423 | group_view = self.groups_view_size[k] 424 | group_view = group_view[:2] 425 | if str(group_view) not in group_view_num: 426 | group_view_num[str(group_view)] = 1 427 | else: 428 | group_view_num[str(group_view)] += 1 429 | if str(group_view) not in group_view_avg_size: 430 | group_view_avg_size[str(group_view)] = group_size 431 | else: 432 | group_view_avg_size[str(group_view)] += group_size 433 | 434 | group_sizes = np.array(group_sizes) 435 | for k in group_view_avg_size: 436 | group_view_avg_size[k] = 1. * group_view_avg_size[k] / group_view_num[k] 437 | 438 | # For reason of degroup 439 | # cnt = 0 440 | # cnt_degroup = 0 441 | # 442 | # for i, action in enumerate(self.actions): 443 | # id = i + 1 444 | # if action == 8 and self.id_group[id] > 0: 445 | # cnt += 1 446 | # if id in self.id_ally_number: 447 | # cnt_degroup += self.id_ally_number[id] 448 | # 449 | # avg_degroup = 0 if cnt == 0.0 else 1. * cnt_degroup / (1. 
* cnt) 450 | 451 | if len(group_sizes) > 0: 452 | return len(group_sizes), group_sizes.mean(), group_sizes.var(), np.max( 453 | group_sizes), group_view_num 454 | else: 455 | return 0, 0, 0, 0, None 456 | 457 | def track_largest_group(self, time_step, update_largest_every): 458 | if time_step % update_largest_every == 0 or (self.group_ids[self.largest_group] is None): 459 | self.largest_group_size = 0 460 | self.largest_group = 0 461 | for k in self.group_ids: 462 | ids = self.group_ids[k] 463 | if ids: 464 | if len(ids) > self.largest_group_size: 465 | self.largest_group_size = len(ids) 466 | self.largest_group = k 467 | return [self.id_pos[i] for i in self.group_ids[self.largest_group]] 468 | 469 | def update_pig_pos(self): 470 | 471 | def in_board(x, y): 472 | return not (x < 0 or x >= self.h or y < 0 or y >= self.w) 473 | 474 | # Move Pigs 475 | for i, item in enumerate(self.pig_pos): 476 | x, y = item 477 | direction = [(-1, 0), (1, 0), (0, 1), (0, -1), (0, 0)] 478 | np.random.shuffle(direction) 479 | for pos_x, pos_y in direction: 480 | if (pos_x, pos_y) == (0, 0): 481 | break 482 | new_x = x + pos_x 483 | new_y = y + pos_y 484 | 485 | if in_board(new_x, new_y) and self.map[new_x][new_y] == 0: 486 | self.pig_pos.remove((x, y)) 487 | self.pig_pos.add((new_x, new_y)) 488 | self.map[new_x][new_y] = -2 489 | self.map[x][y] = 0 490 | break 491 | 492 | def update_rabbit_pos(self): 493 | 494 | def in_board(x, y): 495 | return not (x < 0 or x >= self.h or y < 0 or y >= self.w) 496 | 497 | # Move rabbits 498 | for i, item in enumerate(self.rabbit_pos): 499 | x, y = item 500 | direction = [(-1, 0), (1, 0), (0, 1), (0, -1), (0, 0)] 501 | np.random.shuffle(direction) 502 | for pos_x, pos_y in direction: 503 | if (pos_x, pos_y) == (0, 0): 504 | break 505 | new_x = x + pos_x 506 | new_y = y + pos_y 507 | 508 | if in_board(new_x, new_y) and self.map[new_x][new_y] == 0: 509 | self.rabbit_pos.remove((x, y)) 510 | self.rabbit_pos.add((new_x, new_y)) 511 | self.map[new_x][new_y] = -3 512 | self.map[x][y] = 0 513 | break 514 | 515 | def decrease_health(self): 516 | for id, _ in self.id_pos.iteritems(): 517 | self.health[id] -= self.args.damage_per_step 518 | 519 | def remove_dead_people(self): 520 | 521 | def max_view_size(view_size1, view_size2): 522 | view_size_area1 = (2 * view_size1[0] + 1) * (view_size1[1] + 1) 523 | view_size_area2 = (2 * view_size2[0] + 1) * (view_size2[1] + 1) 524 | 525 | return view_size1 if view_size_area1 > view_size_area2 else view_size2 526 | 527 | self.dead_id = [] 528 | for id, pos in self.id_pos.iteritems(): 529 | assert id > 0 530 | if self.health[id] <= 0.: 531 | x, y = pos 532 | self.map[x][y] = 0 533 | 534 | self.dead_id.append(id) 535 | self.agent_num -= 1 536 | 537 | group_id = self.id_group[id] 538 | if group_id > 0: 539 | group_num = len(self.group_ids[group_id]) 540 | 541 | assert group_num >= 2 542 | 543 | if group_num > 2: 544 | del self.id_group[id] 545 | self.group_ids[group_id].remove(id) 546 | 547 | cur_max_view_size = None 548 | for people in self.group_ids[group_id]: 549 | if cur_max_view_size is None: 550 | cur_max_view_size = self.property_copy[people][0][:] 551 | else: 552 | cur_max_view_size = max_view_size(cur_max_view_size, self.property_copy[people][0][:]) 553 | for people in self.group_ids[group_id]: 554 | self.property[people][0] = cur_max_view_size 555 | 556 | self.groups_view_size[group_id] = cur_max_view_size 557 | else: 558 | another_id = None 559 | for item in self.group_ids[group_id]: 560 | if item != id: 561 | another_id = item 562 | 
self.id_group[another_id] = 0 563 | del self.id_group[id] 564 | self.group_ids[group_id] = None 565 | 566 | self.property[another_id] = self.property_copy[another_id][:] 567 | self.groups_view_size[group_id] = None 568 | 569 | for id in self.dead_id: 570 | del self.id_pos[id] 571 | del self.property[id] 572 | del self.property_copy[id] 573 | 574 | return self.dead_id 575 | 576 | def make_video(self, images, outvid=None, fps=5, size=None, is_color=True, format="XVID"): 577 | """ 578 | Create a video from a list of images. 579 | @param outvid output video 580 | @param images list of images to use in the video 581 | @param fps frame per second 582 | @param size size of each frame 583 | @param is_color color 584 | @param format see http://www.fourcc.org/codecs.php 585 | """ 586 | # fourcc = VideoWriter_fourcc(*format) 587 | # For opencv2 and opencv3: 588 | if int(cv2.__version__[0]) > 2: 589 | fourcc = cv2.VideoWriter_fourcc(*format) 590 | else: 591 | fourcc = cv2.cv.CV_FOURCC(*format) 592 | vid = None 593 | for image in images: 594 | assert os.path.exists(image) 595 | img = imread(image) 596 | if vid is None: 597 | if size is None: 598 | size = img.shape[1], img.shape[0] 599 | vid = VideoWriter(outvid, fourcc, float(fps), size, is_color) 600 | if size[0] != img.shape[1] and size[1] != img.shape[0]: 601 | img = resize(img, size) 602 | vid.write(img) 603 | vid.release() 604 | 605 | def dump_image(self, img_name): 606 | new_w, new_h = self.w * 5, self.h * 5 607 | img = np.zeros((new_w, new_h, 3), dtype=np.uint8) 608 | length = self.args.img_length 609 | for i in xrange(self.w): 610 | for j in xrange(self.h): 611 | id = self.map[i][j] 612 | if id != 0: 613 | for m in xrange(length): 614 | for n in xrange(length): 615 | img[i * length + m][j * length + n] = 255 * np.array(self.property[id][1]) 616 | output_img = Image.fromarray(img, 'RGB') 617 | output_img.save(img_name) 618 | 619 | 620 | def _get_reward_pig(pos): 621 | def in_bound(x, y): 622 | return not (x < 0 or x >= env_h or y < 0 or y >= env_w) 623 | 624 | x, y = pos 625 | groups_num = {} 626 | for i in xrange(-env_reward_radius_pig, env_reward_radius_pig + 1): 627 | for j in xrange(-env_reward_radius_pig, env_reward_radius_pig + 1): 628 | new_x, new_y = x + i, y + j 629 | if in_bound(new_x, new_y): 630 | id = env_map[new_x][new_y] 631 | if id > 0 and env_id_group[id] > 0: 632 | if env_id_group[id] in groups_num: 633 | groups_num[env_id_group[id]] += 1 634 | else: 635 | groups_num[env_id_group[id]] = 1 636 | if len(groups_num): 637 | groups_num = [(k, groups_num[k]) for k in groups_num if groups_num[k] >= env_reward_threshold_pig] 638 | if len(groups_num) > 0: 639 | groups_num = sorted(groups_num, key=lambda x: x[1]) 640 | return env_group_ids[groups_num[-1][0]], pos 641 | else: 642 | return [], pos 643 | else: 644 | return [], pos 645 | 646 | 647 | def _get_reward_rabbit_both(pos): 648 | # both groups and individuals can catch rabbits 649 | def in_bound(x, y): 650 | return not (x < 0 or x >= env_h or y < 0 or y >= env_w) 651 | 652 | x, y = pos 653 | candidates = [] 654 | for i in xrange(-env_reward_radius_rabbit, env_reward_radius_rabbit + 1): 655 | for j in xrange(-env_reward_radius_rabbit, env_reward_radius_rabbit + 1): 656 | new_x, new_y = x + i, y + j 657 | if in_bound(new_x, new_y): 658 | id = env_map[new_x][new_y] 659 | if id > 0: 660 | candidates.append(id) 661 | if len(candidates) > 0: 662 | winner = np.random.choice(candidates) 663 | if env_id_group[winner] == 0: 664 | return [winner], pos 665 | else: 666 | return 
env_group_ids[env_id_group[winner]], pos 667 | else: 668 | return [], pos 669 | 670 | 671 | def _get_reward_rabbit_individual(pos): 672 | # only individuals can catch rabbits 673 | def in_bound(x, y): 674 | return not (x < 0 or x >= env_h or y < 0 or y >= env_w) 675 | 676 | x, y = pos 677 | candidates = [] 678 | for i in xrange(-env_reward_radius_rabbit, env_reward_radius_rabbit + 1): 679 | for j in xrange(-env_reward_radius_rabbit, env_reward_radius_rabbit + 1): 680 | new_x, new_y = x + i, y + j 681 | if in_bound(new_x, new_y): 682 | id = env_map[new_x][new_y] 683 | if id > 0 and env_id_group[id] == 0: 684 | candidates.append(id) 685 | if len(candidates) > 0: 686 | return [np.random.choice(candidates)], pos 687 | else: 688 | return [], pos 689 | 690 | 691 | def get_reward(env): 692 | global env_pig_pos 693 | global env_agent_num 694 | global env_batch_size 695 | global env_map 696 | global env_reward_radius_pig 697 | global env_reward_radius_rabbit 698 | global env_reward_threshold_pig 699 | global env_w 700 | global env_h 701 | global env_id_group 702 | global env_group_ids 703 | 704 | env_pig_pos = env.pig_pos 705 | env_rabbit_pos = env.rabbit_pos 706 | env_agent_num = env.agent_num 707 | env_map = env.map 708 | env_batch_size = env.batch_size 709 | env_reward_radius_pig = env.reward_radius_pig 710 | env_reward_threshold_pig = env.reward_threshold_pig 711 | env_reward_radius_rabbit = env.reward_radius_rabbit 712 | env_w = env.w 713 | env_h = env.h 714 | env_id_group = env.id_group 715 | env_group_ids = env.group_ids 716 | 717 | cores = multiprocessing.cpu_count() 718 | pool = multiprocessing.Pool(processes=cores) 719 | 720 | reward_ids_pig = pool.map(_get_reward_pig, env_pig_pos) 721 | reward_ids_rabbit = pool.map(_get_reward_rabbit_individual, env_rabbit_pos) 722 | pool.close() 723 | 724 | killed_pigs = set() 725 | killed_rabbits = set() 726 | rewards = {} 727 | 728 | for item in reward_ids_pig: 729 | if len(item[0]) > 0: 730 | reward_per_agent = 1. 
/ len(item[0]) 731 | for id in item[0]: 732 | if id not in rewards: 733 | rewards[id] = reward_per_agent 734 | else: 735 | rewards[id] += reward_per_agent 736 | killed_pigs.add(item[1]) 737 | 738 | for item in reward_ids_rabbit: 739 | if len(item[0]) > 0: 740 | reward_per_agent = 0.05 / len(item[0]) 741 | for id in item[0]: 742 | if id not in rewards: 743 | rewards[id] = reward_per_agent 744 | else: 745 | rewards[id] += reward_per_agent 746 | killed_rabbits.add(item[1]) 747 | 748 | env_pig_pos = env_pig_pos - killed_pigs 749 | env.pig_pos = env_pig_pos 750 | env.pig_num -= len(killed_pigs) 751 | 752 | env_rabbit_pos = env_rabbit_pos - killed_rabbits 753 | env.rabbit_pos = env_rabbit_pos 754 | env.rabbit_num -= len(killed_rabbits) 755 | 756 | for item in killed_pigs: 757 | x, y = item 758 | env.map[x][y] = 0 759 | for item in killed_rabbits: 760 | x, y = item 761 | env.map[x][y] = 0 762 | 763 | return rewards 764 | 765 | 766 | def get_view(env): 767 | global env_property 768 | global env_map 769 | global env_h 770 | global env_w 771 | global env_id_group 772 | global env_id_pos 773 | global batch_size 774 | global env_agent_num 775 | global env_max_view_size 776 | global env_min_view_size 777 | global env_id_ally_number 778 | global env_health 779 | 780 | env_property = env.property 781 | env_map = env.map 782 | env_h = env.h 783 | env_w = env.w 784 | env_id_group = env.id_group 785 | env_id_pos = env.id_pos 786 | env_batch_size = env.batch_size 787 | env_agent_num = env.agent_num 788 | env_max_view_size = env.max_view_size 789 | env_min_view_size = env.min_view_size 790 | env_id_ally_number = {} 791 | env_health = env.health 792 | 793 | allies = [] 794 | 795 | cores = multiprocessing.cpu_count() 796 | pool = multiprocessing.Pool(processes=cores) 797 | 798 | env_id_pos_keys = env_id_pos.keys() 799 | env_id_pos_keys.sort() 800 | pos = [env_id_pos[k] for k in env_id_pos_keys] 801 | view = pool.map(_get_view, pos) 802 | pool.close() 803 | 804 | env.id_ally_number = env_id_ally_number 805 | 806 | views = [] 807 | ids = [] 808 | for item in view: 809 | views.append(item[0]) 810 | ids.append(item[2]) 811 | if item[1]: 812 | allies.append(item[1]) 813 | 814 | env.ally.clear() 815 | # Candidate ally 816 | for item in allies: 817 | env.ally[item[0]] = item[1] 818 | 819 | view = np.array(views) 820 | 821 | batch_views = [] 822 | 823 | for i in xrange(int(np.ceil(1. 
* env_agent_num / env_batch_size))): 824 | st = env_batch_size * i 825 | ed = st + env_batch_size 826 | if ed > env_agent_num: 827 | ed = env_agent_num 828 | 829 | # batch_view_tmp = view[st:ed] 830 | # batch_ids = ids[st:ed] 831 | batch_view = [] 832 | for j in xrange(st, ed): 833 | batch_view.append((ids[j], view[j])) 834 | 835 | batch_views.append(batch_view) 836 | 837 | return batch_views 838 | 839 | 840 | def _get_view(pos): 841 | x, y = pos 842 | range_l, range_f, face = env_property[env_map[x][y]][0] 843 | max_range_l, max_range_f, _ = env_max_view_size 844 | min_range_l, min_range_f, _ = env_min_view_size 845 | # single_view = np.zeros(((2 * max_range_l + 1) * (max_range_f + 1), 4), dtype=np.float32) 846 | single_view = np.zeros(((2 * max_range_l + 1) * (max_range_f + 1), 5), dtype=np.float32) 847 | env_ally = None 848 | 849 | def in_bound(x, y): 850 | return not (x < 0 or x >= env_h or y < 0 or y >= env_w) 851 | 852 | def in_group(id_1, id_2): 853 | if env_id_group[id_1] == env_id_group[id_2]: 854 | return True 855 | else: 856 | return False 857 | 858 | cur_pos = 0 859 | allies = set() 860 | face = 0 861 | if face == 0: 862 | # for i in xrange(-range_f, 1): 863 | # for j in xrange(-range_l, range_l + 1): 864 | for i in xrange(-max_range_f, 1): 865 | for j in xrange(-max_range_l, max_range_l + 1): 866 | new_x, new_y = x + i, y + j 867 | 868 | if not in_bound(new_x, new_y) or i < -range_f or j < -range_l or j > range_l: 869 | single_view[cur_pos] = [1, 1, 0, 0, 0] 870 | else: 871 | if env_id_group[env_map[x][y]] == 0 and env_map[new_x][new_y] > 0 and i in xrange(-min_range_f, 872 | 1) and j in xrange( 873 | -min_range_l, min_range_l + 1): 874 | allies.add(env_map[new_x][new_y]) 875 | single_view[cur_pos][0], single_view[cur_pos][1], single_view[cur_pos][2] = \ 876 | env_property[env_map[x][y]][1] 877 | if env_map[new_x][new_y] > 0 and in_group(env_map[x][y], env_map[new_x][new_y]): 878 | single_view[cur_pos][3] = 1 879 | # For exploring the reason of degroup 880 | if env_map[x][y] in env_id_ally_number: 881 | env_id_ally_number[env_map[x][y]] += 1 882 | else: 883 | env_id_ally_number[env_map[x][y]] = 1 884 | else: 885 | single_view[cur_pos][3] = 0 886 | 887 | # For health 888 | single_view[cur_pos][4] = env_health[env_map[x][y]] 889 | 890 | cur_pos = cur_pos + 1 891 | 892 | # TODO: the logic of join a group 893 | if len(allies) > 0: 894 | ally_id = random.sample(allies, 1)[0] 895 | id = env_map[x][y] 896 | if id != ally_id: 897 | env_ally = (id, ally_id) 898 | 899 | elif face == 1: 900 | # for i in xrange(-range_l, range_l + 1): 901 | # for j in xrange(0, range_f + 1): 902 | for i in xrange(-max_range_l, max_range_l + 1): 903 | for j in xrange(0, max_range_f + 1): 904 | new_x, new_y = x + i, y + j 905 | if not in_bound(new_x, new_y) or i < -range_l or i > range_l or j > range_f: 906 | single_view[cur_pos] = [1, 1, 0, 0, 0] 907 | else: 908 | if env_id_group[env_map[x][y]] == 0 and env_map[new_x][new_y] > 0 and i in xrange(-min_range_l, 909 | min_range_l + 1) and j in xrange( 910 | 0, min_range_f + 1): 911 | allies.add(env_map[new_x][new_y]) 912 | single_view[cur_pos][0], single_view[cur_pos][1], single_view[cur_pos][2] = \ 913 | env_property[env_map[x][y]][1] 914 | if env_map[new_x][new_y] > 0 and in_group(env_map[x][y], env_map[new_x][new_y]): 915 | single_view[cur_pos][3] = 1 916 | if env_map[x][y] in env_id_ally_number: 917 | env_id_ally_number[env_map[x][y]] += 1 918 | else: 919 | env_id_ally_number[env_map[x][y]] = 1 920 | else: 921 | single_view[cur_pos][3] = 0 922 | 923 | # 
For health 924 | single_view[cur_pos][4] = env_health[env_map[x][y]] 925 | 926 | cur_pos = cur_pos + 1 927 | if len(allies) > 0: 928 | ally_id = random.sample(allies, 1)[0] 929 | id = env_map[x][y] 930 | if id != ally_id: 931 | env_ally = (id, ally_id) 932 | 933 | elif face == 2: 934 | # range_i_st, range_i_ed = -range_f, 0 935 | # range_j_st, range_j_ed = -range_l, range_l 936 | # for i in xrange(range_f, -1): 937 | # for j in xrange(range_l, -range_l - 1): 938 | for i in xrange(max_range_f, -1, -1): 939 | for j in xrange(max_range_l, -max_range_l - 1, -1): 940 | new_x, new_y = x + i, y + j 941 | if not in_bound(new_x, new_y) or i > range_f or j > range_l or j < -range_l: 942 | single_view[cur_pos] = [1, 1, 0, 0, 0] 943 | else: 944 | if env_id_group[env_map[x][y]] == 0 and env_map[new_x][new_y] > 0 and i in xrange(min_range_f, -1, 945 | -1) and j in xrange( 946 | min_range_l, -min_range_l - 1, -1): 947 | allies.add(env_map[new_x][new_y]) 948 | single_view[cur_pos][0], single_view[cur_pos][1], single_view[cur_pos][2] = \ 949 | env_property[env_map[x][y]][1] 950 | if env_map[new_x][new_y] > 0 and in_group(env_map[x][y], env_map[new_x][new_y]): 951 | single_view[cur_pos][3] = 1 952 | if env_map[x][y] in env_id_ally_number: 953 | env_id_ally_number[env_map[x][y]] += 1 954 | else: 955 | env_id_ally_number[env_map[x][y]] = 1 956 | else: 957 | single_view[cur_pos][3] = 0 958 | # For health 959 | single_view[cur_pos][4] = env_health[env_map[x][y]] 960 | 961 | cur_pos = cur_pos + 1 962 | if len(allies) > 0: 963 | ally_id = random.sample(allies, 1)[0] 964 | id = env_map[x][y] 965 | if id != ally_id: 966 | env_ally = (id, ally_id) 967 | 968 | 969 | elif face == 3: 970 | # for i in xrange(range_l, -range_l - 1): 971 | # for j in xrange(-range_f, 1): 972 | for i in xrange(max_range_l, -max_range_l - 1, -1): 973 | for j in xrange(-max_range_f, 1): 974 | print "miaomiaomiao" 975 | new_x, new_y = x + i, y + j 976 | if not in_bound(new_x, new_y) or i > range_l or i < -range_l or j < -range_f: 977 | single_view[cur_pos] = [1, 1, 0, 0, 0] 978 | else: 979 | if env_id_group[env_map[x][y]] == 0 and env_map[new_x][new_y] > 0 and i in xrange(min_range_l, 980 | -min_range_l - 1, 981 | -1) and j in xrange( 982 | -min_range_f, 1): 983 | allies.add(env_map[new_x][new_y]) 984 | single_view[cur_pos][0], single_view[cur_pos][1], single_view[cur_pos][2] = \ 985 | env_property[env_map[x][y]][1] 986 | if env_map[new_x][new_y] > 0 and in_group(env_map[x][y], env_map[new_x][new_y]): 987 | single_view[cur_pos][3] = 1 988 | if env_map[x][y] in env_id_ally_number: 989 | env_id_ally_number[env_map[x][y]] += 1 990 | else: 991 | env_id_ally_number[env_map[x][y]] = 1 992 | else: 993 | single_view[cur_pos][3] = 0 994 | 995 | # For health 996 | single_view[cur_pos][4] = env_health[env_map[x][y]] 997 | 998 | cur_pos = cur_pos + 1 999 | if len(allies) > 0: 1000 | ally_id = random.sample(allies, 1)[0] 1001 | id = env_map[x][y] 1002 | if id != ally_id: 1003 | env_ally = (id, ally_id) 1004 | 1005 | else: 1006 | print "Error Face!!!" 
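    # Each of the (2 * max_range_l + 1) * (max_range_f + 1) cells of the padded view window
    # contributes 5 numbers, and cells that are off the map or outside this agent's own view
    # range are filled with [1, 1, 0, 0, 0], so the flattened view has length
    # (2 * max_range_l + 1) * (max_range_f + 1) * 5; together with the agent embedding this
    # matches args.view_flat_size. The function returns (flat view, candidate ally or None, agent id).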
1007 | assert cur_pos == (2 * max_range_l + 1) * (max_range_f + 1) 1008 | return single_view.reshape(-1), env_ally, env_map[x][y] 1009 | -------------------------------------------------------------------------------- /src/Collective Grouping Behaviors/main.py: -------------------------------------------------------------------------------- 1 | import sys 2 | from env import Env, get_view, get_reward 3 | from Model import Model_DNN 4 | import argparse 5 | import tensorflow as tf 6 | import os 7 | import shutil 8 | import time 9 | 10 | if __name__ == '__main__': 11 | argparser = argparse.ArgumentParser(sys.argv[0]) 12 | # Environment 13 | argparser.add_argument('--add_pig_number', type=int, default=500) 14 | argparser.add_argument('--add_rabbit_number', type=int, default=500) 15 | argparser.add_argument('--add_every', type=int, default=500) 16 | 17 | argparser.add_argument('--random_seed', type=int, default=10, 18 | help='the random seed to generate the wall in the map') 19 | argparser.add_argument('--width', type=int, default=1000) 20 | argparser.add_argument('--height', type=int, default=1000) 21 | argparser.add_argument('--batch_size', type=int, default=32) 22 | argparser.add_argument('--view_args', type=str, default='2500-5-5-0,2500-5-5-1,2500-5-5-2,2500-5-5-3', 23 | help="num-leftView-frontView-orientation, separated by space") 24 | argparser.add_argument('--pig_max_number', type=int, default=3000) 25 | argparser.add_argument('--pig_min_number', type=int, default=1500) 26 | argparser.add_argument('--pig_increase_every', type=int, default=5) 27 | argparser.add_argument('--pig_increase_rate', type=float, default=0.001) 28 | argparser.add_argument('--rabbit_increase_rate', type=float, default=0.001) 29 | argparser.add_argument('--rabbit_max_number', type=int, default=3000) 30 | argparser.add_argument('--agent_increase_rate', type=float, default=0.001) 31 | argparser.add_argument('--pig_increase_policy', type=int, default=1, 32 | help='0: max_min; 1: increase every n timestep') 33 | argparser.add_argument('--reward_radius_rabbit', type=int, default=7) 34 | argparser.add_argument('--reward_radius_pig', type=int, default=7) 35 | argparser.add_argument('--reward_threshold_pig', type=int, default=5) 36 | argparser.add_argument('--img_length', type=int, default=5) 37 | argparser.add_argument('--images_dir', type=str, default='images') 38 | argparser.add_argument('--agent_mortal', type=int, default=0, 39 | help='0: immortal, 1: mortal') 40 | argparser.add_argument('--agent_emb_dim', type=int, default=5) 41 | argparser.add_argument('--agent_id', type=int, default=0, 42 | help='0: no id, 1: has id') 43 | argparser.add_argument('--damage_per_step', type=float, default=0.) 44 | # model 45 | argparser.add_argument('--model_name', type=str, default='DNN') 46 | argparser.add_argument('--model_hidden_size', type=str, default='32,32') 47 | argparser.add_argument('--activations', type=str, default='sigmoid,sigmoid') 48 | argparser.add_argument('--view_flat_size', type=int, default=66 * 5 + 5) 49 | argparser.add_argument('--num_actions', type=int, default=9) 50 | argparser.add_argument('--reward_decay', type=float, default=0.9) 51 | argparser.add_argument('--save_every_round', type=int, default=10) 52 | argparser.add_argument('--save_dir', type=str, default='models') 53 | argparser.add_argument('--load_dir', type=str, default=None, 54 | help='e.g. 
models/round_0/model.ckpt') 55 | # Train 56 | argparser.add_argument('--video_dir', type=str, default='videos') 57 | argparser.add_argument('--video_per_round', type=int, default=None) 58 | argparser.add_argument('--round', type=int, default=100) 59 | argparser.add_argument('--time_step', type=int, default=10) 60 | argparser.add_argument('--policy', type=str, default='e_greedy') 61 | argparser.add_argument('--epsilon', type=float, default=0.1) 62 | argparser.add_argument('--agent_number', type=int, default=100) 63 | argparser.add_argument('--learning_rate', type=float, default=0.001) 64 | argparser.add_argument('--log_file', type=str, default='log.txt') 65 | argv = argparser.parse_args() 66 | 67 | argv.model_hidden_size = [int(x) for x in argv.model_hidden_size.split(',')] 68 | argv.view_args = [x for x in argv.view_args.split(',')] 69 | argv.activations = [x for x in argv.activations.split(',')] 70 | if argv.load_dir == 'None': 71 | argv.load_dir = None 72 | 73 | env = Env(argv) 74 | model = Model_DNN(argv) 75 | 76 | # Environment Initialization 77 | env.gen_wall(0.02, seed=argv.random_seed) 78 | env.gen_agent(argv.agent_number) 79 | 80 | # env.gen_pig(argv.pig_max_number) 81 | # env.gen_rabbit(argv.rabbit_max_number) 82 | 83 | config = tf.ConfigProto() 84 | config.gpu_options.allow_growth = True 85 | sess = tf.Session(config=config) 86 | sess.run(tf.global_variables_initializer()) 87 | 88 | if argv.load_dir: 89 | model.load(sess, argv.load_dir) 90 | print 'Load model from ' + argv.load_dir 91 | 92 | if not os.path.exists(argv.images_dir): 93 | os.mkdir(argv.images_dir) 94 | if not os.path.exists(argv.video_dir): 95 | os.mkdir(argv.video_dir) 96 | if not os.path.exists(argv.save_dir): 97 | os.mkdir(argv.save_dir) 98 | 99 | flip = 0 100 | 101 | log = open(argv.log_file, 'w') 102 | log_largest_group = open('log_largest_group.txt', 'w') 103 | for r in xrange(argv.round): 104 | video_flag = False 105 | if argv.video_per_round > 0 and r % argv.video_per_round == 0: 106 | video_flag = True 107 | img_dir = os.path.join(argv.images_dir, str(r)) 108 | try: 109 | os.makedirs(img_dir) 110 | except: 111 | shutil.rmtree(img_dir) 112 | os.makedirs(img_dir) 113 | for t in xrange(argv.time_step): 114 | if t == 0 and video_flag: 115 | env.dump_image(os.path.join(img_dir, '%d.png' % t)) 116 | 117 | view_batches = get_view(env) # s 118 | actions, actions_batches = model.infer_actions(sess, view_batches, policy=argv.policy, 119 | epsilon=argv.epsilon) # a 120 | 121 | env.take_action(actions) 122 | env.decrease_health() 123 | env.update_pig_pos() 124 | env.update_rabbit_pos() 125 | 126 | if video_flag: 127 | env.dump_image(os.path.join(img_dir, '%d.png' % (t + 1))) 128 | 129 | rewards = get_reward(env) # r, a dictionary 130 | env.increase_health(rewards) 131 | 132 | new_view_batches = get_view(env) # s' 133 | maxQ_batches = model.infer_max_action_values(sess, new_view_batches) 134 | 135 | model.train(sess=sess, 136 | view_batches=view_batches, 137 | actions_batches=actions_batches, 138 | rewards=rewards, 139 | maxQ_batches=maxQ_batches, 140 | learning_rate=argv.learning_rate) 141 | 142 | # dead_list = env.remove_dead_people() 143 | # model.remove_dead_agent_emb(dead_list) # remove agent embeddings 144 | 145 | cur_pig_num = env.get_pig_num() 146 | cur_rabbit_num = env.get_rabbit_num() 147 | group_num, mean_size, variance_size, max_size, group_view_num = env.group_monitor() 148 | info = 'Round\t%d\ttimestep\t%d\tPigNum\t%d\tgroup_num\t%d\tmean_size\t%f\tvariance_size\t%f\tmax_group_size\t%d\trabbitNum\t%d' % \ 
149 | (r, t, cur_pig_num, group_num, mean_size, variance_size, max_size, cur_rabbit_num) 150 | if group_view_num is not None: 151 | for k in group_view_num: 152 | x = map(int, k[1:-1].split(',')) 153 | group_view_info = '\tView\t%d\tnumber\t%d' % ( 154 | (2 * x[0] + 1) * (x[1] + 1), group_view_num[k]) 155 | info += group_view_info 156 | print group_view_info 157 | info += '\tagent_num\t%d' % env.get_agent_num() 158 | 159 | join_num = 0 160 | leave_num = 0 161 | for item in actions: 162 | if item[1] == 7: 163 | join_num += 1 164 | elif item[1] == 8: 165 | leave_num += 1 166 | join_num = 1.0 * join_num / len(actions) 167 | leave_num = 1.0 * leave_num / len(actions) 168 | info += '\tjoin_ratio\t%f\tleave_ratio\t%f' % (join_num, leave_num) 169 | 170 | print info 171 | 172 | # print 'average degroup number:\t', avg_degroup 173 | log.write(info + '\n') 174 | log.flush() 175 | 176 | # largest_group_pos = env.track_largest_group(time_step=r * argv.round + t, update_largest_every=200) 177 | # pos_info = [] 178 | # for item in largest_group_pos: 179 | # pos_info.append(str(item[0]) + ',' + str(item[1])) 180 | # log_largest_group.write('\t'.join(pos_info) + '\n') 181 | # log_largest_group.flush() 182 | 183 | # if argv.pig_increase_policy == 0: 184 | # if cur_pig_num < argv.pig_min_number: 185 | # env.gen_pig(argv.pig_max_number - cur_pig_num) 186 | # elif argv.pig_increase_policy == 1: 187 | # if t % argv.pig_increase_every == 0: 188 | # env.gen_pig(max(1, int(env.get_pig_num() * argv.pig_increase_rate))) 189 | # elif argv.pig_increase_policy == 2: 190 | # env.gen_pig(10) 191 | # 192 | # env.gen_rabbit(max(10, int(env.get_rabbit_num() * argv.rabbit_increase_rate))) 193 | # env.grow_agent(max(1, int(env.get_agent_num() * argv.agent_increase_rate))) 194 | 195 | # if (r * argv.time_step + t) % argv.add_every == 0: 196 | # if flip: 197 | # env.gen_pig(argv.add_pig_number) 198 | # else: 199 | # env.gen_rabbit(argv.add_rabbit_number) 200 | # flip ^= 1 201 | 202 | if flip: 203 | if env.get_rabbit_num() < 1000: 204 | env.gen_pig(argv.pig_max_number - env.get_pig_num()) 205 | flip ^= 1 206 | else: 207 | if env.get_pig_num() < 2000: 208 | env.gen_rabbit(argv.rabbit_max_number - env.get_rabbit_num()) 209 | flip ^= 1 210 | 211 | if argv.save_every_round and r % argv.save_every_round == 0: 212 | if not os.path.exists(os.path.join(argv.save_dir, "round_%d" % r)): 213 | os.mkdir(os.path.join(argv.save_dir, "round_%d" % r)) 214 | model_path = os.path.join(argv.save_dir, "round_%d" % r, "model.ckpt") 215 | model.save(sess, model_path) 216 | print 'model saved into ' + model_path 217 | if video_flag: 218 | images = [os.path.join(img_dir, ("%d.png" % i)) for i in range(argv.time_step + 1)] 219 | env.make_video(images=images, outvid=os.path.join(argv.video_dir, "%d.avi" % r)) 220 | log.close() 221 | -------------------------------------------------------------------------------- /src/Collective Grouping Behaviors/train.sh: -------------------------------------------------------------------------------- 1 | add_pig_number=500 2 | add_rabbit_number=500 3 | add_every=500 4 | 5 | random_seed=10 6 | width=1000 7 | height=1000 8 | batch_size=32 9 | view_args=2500-5-5-0,2500-5-5-1,2500-5-5-2,2500-5-5-3 10 | pig_max_number=5000 11 | pig_min_number=2000 12 | pig_increase_every=1 13 | pig_increase_number=10 14 | pig_increase_policy=1 15 | agent_increase_rate=0.003 16 | pig_increase_rate=0.006 17 | rabbit_increase_rate=0.008 18 | rabbit_max_number=30000 19 | reward_radius_pig=7 20 | reward_threshold_pig=3 21 | 
reward_radius_rabbit=2 22 | img_length=5 23 | images_dir=images 24 | agent_mortal=1 25 | agent_emb_dim=5 26 | agent_id=1 27 | damage_per_step=0.01 28 | 29 | model_name=DNN 30 | model_hidden_size=32,32 31 | activations=sigmoid,sigmoid 32 | view_flat_size=335 33 | num_actions=9 34 | reward_decay=0.9 35 | save_every_round=10 36 | save_dir=models 37 | load_dir=None 38 | 39 | video_dir=videos 40 | video_per_round=0 41 | round=100 42 | time_step=500 43 | policy=e_greedy 44 | epsilon=0.1 45 | agent_number=10000 46 | learning_rate=0.001 47 | log_file=Collective_Grouping_Behaviors.log 48 | 49 | python main.py --add_pig_number $add_pig_number --add_rabbit_number $add_rabbit_number --add_every $add_every --random_seed $random_seed --width $width --height $height --batch_size $batch_size --view_args $view_args --pig_max_number $pig_max_number --pig_min_number $pig_min_number --pig_increase_every $pig_increase_every --pig_increase_policy $pig_increase_policy --agent_increase_rate $agent_increase_rate --pig_increase_rate $pig_increase_rate --rabbit_increase_rate $rabbit_increase_rate --rabbit_max_number $rabbit_max_number --reward_radius_pig $reward_radius_pig --reward_threshold_pig $reward_threshold_pig --reward_radius_rabbit $reward_radius_rabbit --img_length $img_length --images_dir $images_dir --agent_mortal $agent_mortal --agent_emb_dim $agent_emb_dim --agent_id $agent_id --damage_per_step $damage_per_step --model_name $model_name --model_hidden_size $model_hidden_size --activations $activations --view_flat_size $view_flat_size --num_actions $num_actions --reward_decay $reward_decay --save_every_round $save_every_round --save_dir $save_dir --load_dir $load_dir --video_dir $video_dir --video_per_round $video_per_round --round $round --time_step $time_step --policy $policy --epsilon $epsilon --agent_number $agent_number --learning_rate $learning_rate --log_file $log_file 50 | -------------------------------------------------------------------------------- /src/Model.py: -------------------------------------------------------------------------------- 1 | import tensorflow as tf 2 | import numpy as np 3 | import random 4 | 5 | 6 | class Model_DNN(): 7 | def __init__(self, args): 8 | self.args = args 9 | assert self.args.model_name == 'DNN' 10 | 11 | # Input placeholders 12 | self.input_view = tf.placeholder(tf.float32) # 2-D, [batch_size, view_size(width x length x depth)] 13 | self.actions = tf.placeholder(tf.int32) # 1-D, [batch_size] 14 | self.reward = tf.placeholder(tf.float32) # 1-D, [batch_size] 15 | self.maxQ = tf.placeholder(tf.float32) # 1-D, [batch_size], the max Q-value of next state 16 | self.learning_rate = tf.placeholder(tf.float32) 17 | 18 | self.agent_embeddings = {} 19 | 20 | # Build Graph 21 | self.x = self.input_view 22 | last_hidden_size = self.args.view_flat_size 23 | for i, hidden_size in enumerate(self.args.model_hidden_size): 24 | with tf.variable_scope('layer_%d' % i): 25 | self.W = tf.get_variable(name='weights', 26 | initializer=tf.truncated_normal([last_hidden_size, hidden_size], stddev=0.1)) 27 | self.b = tf.get_variable(name='bias', initializer=tf.zeros([hidden_size])) 28 | last_hidden_size = hidden_size 29 | self.x = tf.matmul(self.x, self.W) + self.b 30 | # activation function 31 | if self.args.activations[i] == 'sigmoid': 32 | self.x = tf.sigmoid(self.x) 33 | elif self.args.activations[i] == 'tanh': 34 | self.x = tf.nn.tanh(self.x) 35 | elif self.args.activations[i] == 'relu': 36 | self.x = tf.nn.relu(self.x) 37 | 38 | with tf.variable_scope('layer_output'): 39 | self.W 
= tf.get_variable(name='weights', 40 | initializer=tf.truncated_normal([last_hidden_size, self.args.num_actions], 41 | stddev=0.1)) 42 | self.b = tf.get_variable(name='bias', initializer=tf.zeros([self.args.num_actions])) 43 | self.output = tf.matmul(self.x, self.W) + self.b # batch_size x output_size 44 | 45 | # Train operation 46 | self.reward_decay = self.args.reward_decay 47 | self.actions_onehot = tf.one_hot(self.actions, self.args.num_actions) 48 | self.loss = tf.reduce_mean( 49 | tf.square( 50 | (self.reward + self.reward_decay * self.maxQ) - tf.reduce_sum( 51 | tf.multiply(self.actions_onehot, self.output), axis=1) 52 | ) 53 | ) 54 | self.train_op = tf.train.GradientDescentOptimizer(learning_rate=self.learning_rate).minimize(self.loss) 55 | 56 | def _inference(self, sess, input_view, if_sample, policy='e_greedy', epsilon=0.1): 57 | """ 58 | Perform inference for one batch 59 | :param if_sample: bool; If true, return Q(s,a) for all the actions; If false, return the sampled action. 60 | :param policy: valid when if_sample=True, sample policy of the actions taken. 61 | Available: e_greedy, greedy 62 | :param epsilon: for e_greedy policy 63 | :return: numpy array; if_sample=True, [batch_size]; if_sample=False, [batch_size, num_actions] 64 | """ 65 | assert policy in ['greedy', 'e_greedy'] 66 | value_s_a = sess.run(self.output, {self.input_view: input_view}) 67 | if if_sample: 68 | if policy == 'greedy': 69 | actions = np.argmax(value_s_a, axis=1) 70 | return actions 71 | if policy == 'e_greedy': 72 | all_actions = range(self.args.num_actions) 73 | actions = [] 74 | for i in xrange(len(value_s_a)): 75 | if random.random() < epsilon: 76 | actions.append(np.random.choice(all_actions)) 77 | else: 78 | actions.append(np.argmax(value_s_a[i])) 79 | return np.array(actions) 80 | else: 81 | return value_s_a 82 | 83 | def infer_actions(self, sess, view_batches, policy='e_greedy', epsilon=0.1): 84 | ret_actions = [] 85 | ret_actions_batch = [] 86 | for input_view in view_batches: 87 | batch_id, batch_view = self.process_view_with_emb_batch(input_view) 88 | actions_batch = self._inference(sess, batch_view, if_sample=True, policy=policy, epsilon=epsilon) 89 | ret_actions_batch.append(zip(batch_id, actions_batch)) 90 | ret_actions.extend(zip(batch_id, actions_batch)) 91 | 92 | return ret_actions, ret_actions_batch 93 | 94 | def infer_max_action_values(self, sess, view_batches): 95 | ret = [] 96 | for input_view in view_batches: 97 | batch_id, batch_view = self.process_view_with_emb_batch(input_view) 98 | value_batches = self._inference(sess, batch_view, if_sample=False) 99 | ret.append(zip(batch_id, np.max(value_batches, axis=1))) 100 | return ret 101 | 102 | def process_view_with_emb_batch(self, input_view): 103 | # parse input into id, view as and concatenate view with embedding 104 | batch_view = [] 105 | batch_id = [] 106 | for id, view in input_view: 107 | batch_id.append(id) 108 | if id in self.agent_embeddings: 109 | new_view = np.concatenate((self.agent_embeddings[id], view), 0) 110 | batch_view.append(new_view) 111 | else: 112 | new_embedding = np.random.normal(size=[self.args.agent_emb_dim]) 113 | self.agent_embeddings[id] = new_embedding 114 | new_view = np.concatenate((new_embedding, view), 0) 115 | batch_view.append(new_view) 116 | return batch_id, np.array(batch_view) 117 | 118 | def _train(self, sess, input_view, actions, reward, maxQ, learning_rate=0.001): 119 | feed_dict = { 120 | self.input_view: input_view, 121 | self.actions: actions, 122 | self.reward: reward, 123 | self.maxQ: 
maxQ, 124 | self.learning_rate: learning_rate 125 | } 126 | _ = sess.run(self.train_op, feed_dict) 127 | 128 | def train(self, sess, view_batches, actions_batches, rewards, maxQ_batches, learning_rate=0.001): 129 | def split_id_value(input_): 130 | ret_id = [] 131 | ret_value = [] 132 | for item in input_: 133 | ret_id.append(item[0]) 134 | ret_value.append(item[1]) 135 | return ret_id, ret_value 136 | 137 | for i in xrange(len(view_batches)): 138 | view_id, view_value = self.process_view_with_emb_batch(view_batches[i]) 139 | action_id, action_value = split_id_value(actions_batches[i]) 140 | maxQ_id, maxQ_value = split_id_value(maxQ_batches[i]) 141 | assert view_id == action_id == maxQ_id 142 | reward_value = [] 143 | for id in view_id: 144 | if id in rewards: 145 | reward_value.append(rewards[id]) 146 | else: 147 | reward_value.append(0.) 148 | 149 | self._train(sess, view_value, action_value, reward_value, maxQ_value, learning_rate) 150 | 151 | def save(self, sess, filename): 152 | saver = tf.train.Saver() 153 | saver.save(sess, filename) 154 | 155 | def load(self, sess, filename): 156 | saver = tf.train.Saver() 157 | saver.restore(sess, filename) 158 | 159 | def remove_dead_agent_emb(self, dead_list): 160 | for id in dead_list: 161 | del self.agent_embeddings[id] 162 | -------------------------------------------------------------------------------- /src/Population Dynamics/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/geek-ai/1m-agents/93ae67027f24aea6c523ced83bab9ba76394ab23/src/Population Dynamics/.DS_Store -------------------------------------------------------------------------------- /src/Population Dynamics/.gitignore: -------------------------------------------------------------------------------- 1 | good_model 2 | largest_group 3 | -------------------------------------------------------------------------------- /src/Population Dynamics/Model.py: -------------------------------------------------------------------------------- 1 | import tensorflow as tf 2 | import numpy as np 3 | import random 4 | 5 | 6 | class Model_DNN(): 7 | def __init__(self, args): 8 | self.args = args 9 | assert self.args.model_name == 'DNN' 10 | 11 | # Input placeholders 12 | self.input_view = tf.placeholder(tf.float32) # 2-D, [batch_size, view_size(width x length x depth)] 13 | self.actions = tf.placeholder(tf.int32) # 1-D, [batch_size] 14 | self.reward = tf.placeholder(tf.float32) # 1-D, [batch_size] 15 | self.maxQ = tf.placeholder(tf.float32) # 1-D, [batch_size], the max Q-value of next state 16 | self.learning_rate = tf.placeholder(tf.float32) 17 | self.ret_loss = None 18 | 19 | self.agent_embeddings = {} 20 | 21 | # Build Graph 22 | self.x = self.input_view 23 | last_hidden_size = self.args.view_flat_size 24 | for i, hidden_size in enumerate(self.args.model_hidden_size): 25 | with tf.variable_scope('layer_%d' % i): 26 | self.W = tf.get_variable(name='weights', 27 | initializer=tf.truncated_normal([last_hidden_size, hidden_size], stddev=0.1)) 28 | self.b = tf.get_variable(name='bias', initializer=tf.zeros([hidden_size])) 29 | last_hidden_size = hidden_size 30 | self.x = tf.matmul(self.x, self.W) + self.b 31 | # activation function 32 | if self.args.activations[i] == 'sigmoid': 33 | self.x = tf.sigmoid(self.x) 34 | elif self.args.activations[i] == 'tanh': 35 | self.x = tf.nn.tanh(self.x) 36 | elif self.args.activations[i] == 'relu': 37 | self.x = tf.nn.relu(self.x) 38 | 39 | with tf.variable_scope('layer_output'): 40 | 
self.W = tf.get_variable(name='weights', 41 | initializer=tf.truncated_normal([last_hidden_size, self.args.num_actions], 42 | stddev=0.1)) 43 | self.b = tf.get_variable(name='bias', initializer=tf.zeros([self.args.num_actions])) 44 | self.output = tf.matmul(self.x, self.W) + self.b # batch_size x output_size 45 | 46 | # Train operation 47 | self.reward_decay = self.args.reward_decay 48 | self.actions_onehot = tf.one_hot(self.actions, self.args.num_actions) 49 | self.loss = tf.reduce_mean( 50 | tf.square( 51 | (self.reward + self.reward_decay * self.maxQ) - tf.reduce_sum( 52 | tf.multiply(self.actions_onehot, self.output), axis=1) 53 | ) 54 | ) 55 | self.train_op = tf.train.GradientDescentOptimizer(learning_rate=self.learning_rate).minimize(self.loss) 56 | 57 | def _inference(self, sess, input_view, if_sample, policy='e_greedy', epsilon=0.1): 58 | """ 59 | Perform inference for one batch 60 | :param if_sample: bool; If true, return Q(s,a) for all the actions; If false, return the sampled action. 61 | :param policy: valid when if_sample=True, sample policy of the actions taken. 62 | Available: e_greedy, greedy 63 | :param epsilon: for e_greedy policy 64 | :return: numpy array; if_sample=True, [batch_size]; if_sample=False, [batch_size, num_actions] 65 | """ 66 | assert policy in ['greedy', 'e_greedy'] 67 | value_s_a = sess.run(self.output, {self.input_view: input_view}) 68 | if if_sample: 69 | if policy == 'greedy': 70 | actions = np.argmax(value_s_a, axis=1) 71 | return actions 72 | if policy == 'e_greedy': 73 | all_actions = range(self.args.num_actions) 74 | actions = [] 75 | for i in xrange(len(value_s_a)): 76 | if random.random() < epsilon: 77 | actions.append(np.random.choice(all_actions)) 78 | else: 79 | actions.append(np.argmax(value_s_a[i])) 80 | return np.array(actions) 81 | else: 82 | return value_s_a 83 | 84 | def infer_actions(self, sess, view_batches, policy='e_greedy', epsilon=0.1): 85 | ret_actions = [] 86 | ret_actions_batch = [] 87 | for input_view in view_batches: 88 | batch_id, batch_view = self.process_view_with_emb_batch(input_view) 89 | actions_batch = self._inference(sess, batch_view, if_sample=True, policy=policy, epsilon=epsilon) 90 | ret_actions_batch.append(zip(batch_id, actions_batch)) 91 | ret_actions.extend(zip(batch_id, actions_batch)) 92 | 93 | return ret_actions, ret_actions_batch 94 | 95 | def infer_max_action_values(self, sess, view_batches): 96 | ret = [] 97 | for input_view in view_batches: 98 | batch_id, batch_view = self.process_view_with_emb_batch(input_view) 99 | value_batches = self._inference(sess, batch_view, if_sample=False) 100 | ret.append(zip(batch_id, np.max(value_batches, axis=1))) 101 | return ret 102 | 103 | def process_view_with_emb_batch(self, input_view): 104 | # parse input into id, view as and concatenate view with embedding 105 | batch_view = [] 106 | batch_id = [] 107 | for id, view in input_view: 108 | batch_id.append(id) 109 | if id in self.agent_embeddings: 110 | new_view = np.concatenate((self.agent_embeddings[id], view), 0) 111 | batch_view.append(new_view) 112 | else: 113 | new_embedding = np.random.normal(size=[self.args.agent_emb_dim]) 114 | self.agent_embeddings[id] = new_embedding 115 | new_view = np.concatenate((new_embedding, view), 0) 116 | batch_view.append(new_view) 117 | return batch_id, np.array(batch_view) 118 | 119 | def _train(self, sess, input_view, actions, reward, maxQ, learning_rate=0.001): 120 | feed_dict = { 121 | self.input_view: input_view, 122 | self.actions: actions, 123 | self.reward: reward, 124 | 
self.maxQ: maxQ, 125 | self.learning_rate: learning_rate 126 | } 127 | _, self.ret_loss = sess.run([self.train_op, self.loss], feed_dict) 128 | 129 | def train(self, sess, view_batches, actions_batches, rewards, maxQ_batches, learning_rate=0.001): 130 | def split_id_value(input_): 131 | ret_id = [] 132 | ret_value = [] 133 | for item in input_: 134 | ret_id.append(item[0]) 135 | ret_value.append(item[1]) 136 | return ret_id, ret_value 137 | 138 | for i in xrange(len(view_batches)): 139 | view_id, view_value = self.process_view_with_emb_batch(view_batches[i]) 140 | action_id, action_value = split_id_value(actions_batches[i]) 141 | maxQ_id, maxQ_value = split_id_value(maxQ_batches[i]) 142 | assert view_id == action_id == maxQ_id 143 | reward_value = [] 144 | for id in view_id: 145 | if id in rewards: 146 | reward_value.append(rewards[id]) 147 | else: 148 | reward_value.append(0.) 149 | 150 | self._train(sess, view_value, action_value, reward_value, maxQ_value, learning_rate) 151 | 152 | def get_loss(self): 153 | return self.ret_loss 154 | 155 | def save(self, sess, filename): 156 | saver = tf.train.Saver() 157 | saver.save(sess, filename) 158 | 159 | def load(self, sess, filename): 160 | saver = tf.train.Saver() 161 | saver.restore(sess, filename) 162 | 163 | def remove_dead_agent_emb(self, dead_list): 164 | for id in dead_list: 165 | del self.agent_embeddings[id] 166 | -------------------------------------------------------------------------------- /src/Population Dynamics/env.py: -------------------------------------------------------------------------------- 1 | import os 2 | import random 3 | import multiprocessing 4 | from PIL import Image 5 | import numpy as np 6 | from cv2 import VideoWriter, imread, resize 7 | from copy import deepcopy 8 | import cv2 9 | 10 | 11 | # from model import inference, train 12 | 13 | class Env(object): 14 | def __init__(self, args): 15 | self.args = args 16 | self.h = args.height 17 | self.w = args.width 18 | self.batch_size = args.batch_size 19 | self.view_args = args.view_args 20 | self.agent_num = args.agent_number 21 | self.pig_num = 0 22 | self.rabbit_num = 0 23 | self.action_num = args.num_actions 24 | 25 | # Initialization 26 | self.view = [] 27 | self.map = np.zeros((self.h, self.w), dtype=np.int32) 28 | self.id_pos = {} 29 | self.pig_pos = set() 30 | self.property = {} 31 | self.birth_year = {} 32 | self.rabbit_pos = set() 33 | 34 | # For the view size modify 35 | self.property_copy = {} 36 | self.max_group = 0 37 | self.id_group = {} 38 | self.group_ids = {} 39 | self.batch_views = {} 40 | self.ally = {} 41 | 42 | # For reason of degroup 43 | self.id_ally_number = {} 44 | self.actions = None 45 | 46 | # For health 47 | self.health = {} 48 | self.max_id = 0 49 | 50 | # For mortal 51 | self.dead_id = [] 52 | # Record the avg_life of the dead people in current time step 53 | self.avg_life = None 54 | self.dead_people = None 55 | # A map: year -> (avg_life, live_year) 56 | self.avg_life = {} 57 | 58 | # For track largest group 59 | self.largest_group = 0 60 | 61 | self.rewards = None 62 | self.reward_radius_pig = args.reward_radius_pig 63 | self.reward_threshold_pig = args.reward_threshold_pig 64 | self.reward_radius_rabbit = args.reward_radius_rabbit 65 | 66 | self.groups_view_size = {} 67 | self.max_view_size = None 68 | self.min_view_size = None 69 | 70 | self._init_property() 71 | self._init_group() 72 | 73 | def _init_property(self): 74 | self.property[-3] = [1, [0, 1, 0]] 75 | self.property[-2] = [1, [1, 0, 0]] 76 | self.property[-1] = [1, 
[0.411, 0.411, 0.411]] 77 | self.property[0] = [1, [0, 0, 0]] 78 | 79 | def _init_group(self): 80 | for i in xrange(self.agent_num): 81 | self.id_group[i + 1] = 0 82 | 83 | def _gen_power(self, cnt): 84 | 85 | def max_view_size(view_size1, view_size2): 86 | view_size_area1 = (2 * view_size1[0] + 1) * (view_size1[1] + 1) 87 | view_size_area2 = (2 * view_size2[0] + 1) * (view_size2[1] + 1) 88 | 89 | return view_size1 if view_size_area1 > view_size_area2 else view_size2 90 | 91 | def min_view_size(view_size1, view_size2): 92 | view_size_area1 = (2 * view_size1[0] + 1) * (view_size1[1] + 1) 93 | view_size_area2 = (2 * view_size2[0] + 1) * (view_size2[1] + 1) 94 | 95 | return view_size1 if view_size_area1 < view_size_area2 else view_size2 96 | 97 | cur = 0 98 | for k in self.view_args: 99 | k = [int(x) for x in k.split('-')] 100 | assert len(k) == 4 101 | 102 | num, power_list = k[0], k[1:] 103 | # Maintain the max_view_size 104 | if self.max_view_size is None: 105 | self.max_view_size = power_list 106 | else: 107 | self.max_view_size = max_view_size(self.max_view_size, power_list) 108 | 109 | if self.min_view_size is None: 110 | self.min_view_size = power_list 111 | else: 112 | self.min_view_size = min_view_size(self.min_view_size, power_list) 113 | 114 | cur += num 115 | 116 | if cnt <= cur: 117 | return power_list 118 | 119 | def gen_wall(self, prob=0, seed=10): 120 | if prob == 0: 121 | return 122 | np.random.seed(seed) 123 | # Generate wall according to the prob 124 | for i in xrange(self.h): 125 | for j in xrange(self.w): 126 | if i == 0 or i == self.h - 1 or j == 0 or j == self.w - 1: 127 | self.map[i][j] = -1 128 | continue 129 | wall_prob = np.random.rand() 130 | if wall_prob < prob: 131 | self.map[i][j] = -1 132 | 133 | def gen_agent(self, agent_num=None): 134 | if agent_num == None: 135 | agent_num = self.args.agent_number 136 | 137 | for i in xrange(agent_num): 138 | while True: 139 | x = np.random.randint(0, self.h) 140 | y = np.random.randint(0, self.w) 141 | if self.map[x][y] == 0: 142 | self.map[x][y] = i + 1 143 | self.id_pos[i + 1] = (x, y) 144 | self.property[i + 1] = [self._gen_power(i + 1), [0, 0, 1]] 145 | self.health[i + 1] = 1.0 146 | # Record the birthday of any agent 147 | self.birth_year[i + 1] = 0 148 | break 149 | assert (2 * self.max_view_size[0] + 1) * (self.max_view_size[1] + 1) * 5 + self.args.agent_emb_dim == \ 150 | self.args.view_flat_size 151 | 152 | self.agent_num = self.args.agent_number 153 | self.max_id = self.args.agent_number 154 | # self.property_copy = self.property[:] 155 | for k in self.property: 156 | self.property_copy[k] = self.property[k][:] 157 | # self.property_copy = deepcopy(self.property) 158 | 159 | def _grow_power(self): 160 | 161 | candidate_view = [] 162 | for k in self.view_args: 163 | k = [int(x) for x in k.split('-')] 164 | assert len(k) == 4 165 | candidate_view.append(k) 166 | 167 | num = len(candidate_view) 168 | random_power = np.random.randint(0, num) 169 | 170 | return candidate_view[random_power][1:] 171 | 172 | def grow_agent(self, agent_num=0, cur_step=-1): 173 | if agent_num == 0: 174 | return 175 | 176 | for i in xrange(agent_num): 177 | while True: 178 | x = np.random.randint(0, self.h) 179 | y = np.random.randint(0, self.w) 180 | if self.map[x][y] == 0: 181 | self.max_id += 1 182 | self.map[x][y] = self.max_id 183 | self.id_pos[self.max_id] = (x, y) 184 | self.property[self.max_id] = [self._grow_power(), [0, 0, 1]] 185 | self.property_copy[self.max_id] = self.property[self.max_id][:] 186 | self.health[self.max_id] = 1.0 
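                    # Note: a newly grown agent starts ungrouped and at full health; its
                    # birth step is recorded below so remove_dead_people() can compute the
                    # average lifespan of agents that die in a given time step.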
187 | self.id_group[self.max_id] = 0 188 | # Record the birthday of the new agent 189 | self.birth_year[self.max_id] = cur_step 190 | break 191 | 192 | self.agent_num += agent_num 193 | 194 | def gen_pig(self, pig_nums=None): 195 | if pig_nums == None: 196 | pig_nums = self.args.pig_max_number 197 | 198 | for i in xrange(pig_nums): 199 | while True: 200 | x = np.random.randint(0, self.h) 201 | y = np.random.randint(0, self.w) 202 | if self.map[x][y] == 0: 203 | self.map[x][y] = -2 204 | self.pig_pos.add((x, y)) 205 | break 206 | 207 | self.pig_num = self.pig_num + pig_nums 208 | 209 | def gen_rabbit(self, rabbit_num=None): 210 | if rabbit_num is None: 211 | rabbit_num = self.args.rabbit_max_number 212 | 213 | for i in xrange(rabbit_num): 214 | while True: 215 | x = np.random.randint(0, self.h) 216 | y = np.random.randint(0, self.w) 217 | if self.map[x][y] == 0: 218 | self.map[x][y] = -3 219 | self.rabbit_pos.add((x, y)) 220 | break 221 | 222 | self.rabbit_num = self.rabbit_num + rabbit_num 223 | 224 | def get_pig_num(self): 225 | return self.pig_num 226 | 227 | def get_rabbit_num(self): 228 | return self.rabbit_num 229 | 230 | def get_agent_num(self): 231 | return self.agent_num 232 | 233 | def _agent_act(self, x, y, face, action, id): 234 | 235 | def move_forward(x, y, face): 236 | if face == 0: 237 | return x - 1, y 238 | elif face == 1: 239 | return x, y + 1 240 | elif face == 2: 241 | return x + 1, y 242 | elif face == 3: 243 | return x, y - 1 244 | 245 | def move_backward(x, y, face): 246 | if face == 0: 247 | return x + 1, y 248 | elif face == 1: 249 | return x, y - 1 250 | elif face == 2: 251 | return x - 1, y 252 | elif face == 3: 253 | return x, y + 1 254 | 255 | def move_left(x, y, face): 256 | if face == 0: 257 | return x, y - 1 258 | elif face == 1: 259 | return x - 1, y 260 | elif face == 2: 261 | return x, y + 1 262 | elif face == 3: 263 | return x + 1, y 264 | 265 | def move_right(x, y, face): 266 | if face == 0: 267 | return x, y + 1 268 | elif face == 1: 269 | return x + 1, y 270 | elif face == 2: 271 | return x, y - 1 272 | elif face == 3: 273 | return x - 1, y 274 | 275 | def in_board(x, y): 276 | return self.map[x][y] == 0 277 | 278 | # return the max view size(the area of the view) of the two view sizes 279 | def max_view_size(view_size1, view_size2): 280 | view_size_area1 = (2 * view_size1[0] + 1) * (view_size1[1] + 1) 281 | view_size_area2 = (2 * view_size2[0] + 1) * (view_size2[1] + 1) 282 | 283 | return view_size1 if view_size_area1 > view_size_area2 else view_size2 284 | 285 | if action == 0: 286 | pass 287 | elif action == 1: 288 | new_x, new_y = move_forward(x, y, face) 289 | if in_board(new_x, new_y): 290 | self.map[x][y] = 0 291 | self.map[new_x][new_y] = id 292 | self.id_pos[id] = (new_x, new_y) 293 | elif action == 2: 294 | new_x, new_y = move_backward(x, y, face) 295 | if in_board(new_x, new_y): 296 | self.map[x][y] = 0 297 | self.map[new_x][new_y] = id 298 | self.id_pos[id] = (new_x, new_y) 299 | elif action == 3: 300 | new_x, new_y = move_left(x, y, face) 301 | if in_board(new_x, new_y): 302 | self.map[x][y] = 0 303 | self.map[new_x][new_y] = id 304 | self.id_pos[id] = (new_x, new_y) 305 | elif action == 4: 306 | new_x, new_y = move_right(x, y, face) 307 | if in_board(new_x, new_y): 308 | self.map[x][y] = 0 309 | self.map[new_x][new_y] = id 310 | self.id_pos[id] = (new_x, new_y) 311 | elif action == 5: 312 | self.property[id][0][2] = (face + 4 - 1) % 4 313 | elif action == 6: 314 | self.property[id][0][2] = (face + 1) % 4 315 | elif action == 7: 316 | if 
self.id_group[id] == 0: 317 | if id in self.ally: 318 | ally_id = self.ally[id] 319 | if self.id_group[ally_id] == 0: 320 | self.max_group += 1 321 | self.id_group[id] = self.max_group 322 | self.id_group[ally_id] = self.max_group 323 | 324 | self.group_ids[self.max_group] = [] 325 | self.group_ids[self.max_group].append(id) 326 | self.group_ids[self.max_group].append(ally_id) 327 | 328 | # For view size 329 | assert self.property[id][0] == self.property_copy[id][0] 330 | assert self.property[ally_id][0] == self.property_copy[ally_id][0] 331 | self.groups_view_size[self.max_group] = max_view_size(self.property[id][0], 332 | self.property[ally_id][0]) 333 | self.property[id][0] = self.groups_view_size[self.max_group] 334 | self.property[ally_id][0] = self.groups_view_size[self.max_group] 335 | else: 336 | assert self.property[id][0] == self.property_copy[id][0] 337 | self.id_group[id] = self.id_group[ally_id] 338 | self.group_ids[self.id_group[ally_id]].append(id) 339 | 340 | group_id = self.id_group[ally_id] 341 | 342 | cur_max_view_size = max_view_size(self.property[id][0], self.groups_view_size[group_id]) 343 | if cur_max_view_size == self.property[id][0] and self.property[id][0] != self.groups_view_size[ 344 | group_id]: 345 | # A powerful man join in a group, need to change all the members' view size in that group 346 | for people in self.group_ids[group_id]: 347 | self.property[people][0] = cur_max_view_size 348 | self.groups_view_size[group_id] = cur_max_view_size 349 | else: 350 | self.property[id][0] = cur_max_view_size 351 | 352 | elif action == 8: 353 | group_id = self.id_group[id] 354 | 355 | if group_id != 0: 356 | another_id = None 357 | if len(self.group_ids[group_id]) == 2: 358 | for item in self.group_ids[group_id]: 359 | if item != id: 360 | another_id = item 361 | self.id_group[id], self.id_group[another_id] = 0, 0 362 | self.group_ids[group_id] = None 363 | 364 | # Restore the origin view size 365 | self.property[id] = self.property_copy[id][:] 366 | self.property[another_id] = self.property_copy[another_id][:] 367 | self.groups_view_size[group_id] = None 368 | else: 369 | self.id_group[id] = 0 370 | self.group_ids[group_id].remove(id) 371 | 372 | # Restore the origin view size 373 | self.property[id] = self.property_copy[id][:] 374 | cur_max_view_size = None 375 | 376 | for people in self.group_ids[group_id]: 377 | if cur_max_view_size is None: 378 | cur_max_view_size = self.property_copy[people][0][:] 379 | else: 380 | cur_max_view_size = max_view_size(cur_max_view_size, self.property_copy[people][0][:]) 381 | 382 | for people in self.group_ids[group_id]: 383 | self.property[people][0] = cur_max_view_size 384 | 385 | self.groups_view_size[group_id] = cur_max_view_size 386 | 387 | else: 388 | pass 389 | 390 | else: 391 | print action 392 | print "Wrong Action ID!!!!" 393 | 394 | def take_action(self, actions): 395 | 396 | # Move Agent 397 | self.actions = actions 398 | # for i in xrange(self.agent_num): 399 | for id, action in actions: 400 | x, y = self.id_pos[id] 401 | face = self.property[id][0][2] 402 | self._agent_act(x, y, face, action, id) 403 | 404 | def increase_health(self, rewards): 405 | for id in rewards: 406 | self.health[id] += 12. * rewards[id] 407 | 408 | # if rewards[id] > 0.2: 409 | # self.health[id] = 1. 
410 | # elif rewards > 0: 411 | # self.health[id] += rewards[id] 412 | 413 | # self.health[id] += rewards[id] 414 | # if self.health[id] > 1.0: 415 | # self.health[id] = 1.0 416 | 417 | def group_monitor(self): 418 | """ 419 | :return: group_num, mean_size, variance_size, max_size 420 | """ 421 | group_sizes = [] 422 | group_view_num = {} 423 | group_view_avg_size = {} 424 | for k in self.group_ids: 425 | ids = self.group_ids[k] 426 | if ids: 427 | group_size = len(ids) 428 | assert group_size >= 2 429 | group_sizes.append(group_size) 430 | 431 | # count group view size and group number 432 | group_view = self.groups_view_size[k] 433 | group_view = group_view[:2] 434 | if str(group_view) not in group_view_num: 435 | group_view_num[str(group_view)] = 1 436 | else: 437 | group_view_num[str(group_view)] += 1 438 | if str(group_view) not in group_view_avg_size: 439 | group_view_avg_size[str(group_view)] = group_size 440 | else: 441 | group_view_avg_size[str(group_view)] += group_size 442 | 443 | group_sizes = np.array(group_sizes) 444 | for k in group_view_avg_size: 445 | group_view_avg_size[k] = 1. * group_view_avg_size[k] / group_view_num[k] 446 | 447 | # For reason of degroup 448 | # cnt = 0 449 | # cnt_degroup = 0 450 | # 451 | # for i, action in enumerate(self.actions): 452 | # id = i + 1 453 | # if action == 8 and self.id_group[id] > 0: 454 | # cnt += 1 455 | # if id in self.id_ally_number: 456 | # cnt_degroup += self.id_ally_number[id] 457 | # 458 | # avg_degroup = 0 if cnt == 0.0 else 1. * cnt_degroup / (1. * cnt) 459 | 460 | if len(group_sizes) > 0: 461 | return len(group_sizes), group_sizes.mean(), group_sizes.var(), np.max( 462 | group_sizes), group_view_num 463 | else: 464 | return 0, 0, 0, 0, None 465 | 466 | def track_largest_group(self, time_step, update_largest_every): 467 | if time_step % update_largest_every == 0 or (self.group_ids[self.largest_group] is None): 468 | self.largest_group_size = 0 469 | self.largest_group = 0 470 | for k in self.group_ids: 471 | ids = self.group_ids[k] 472 | if ids: 473 | if len(ids) > self.largest_group_size: 474 | self.largest_group_size = len(ids) 475 | self.largest_group = k 476 | return [self.id_pos[i] for i in self.group_ids[self.largest_group]] 477 | 478 | def update_pig_pos(self): 479 | 480 | def in_board(x, y): 481 | return not (x < 0 or x >= self.h or y < 0 or y >= self.w) 482 | 483 | # Move Pigs 484 | for i, item in enumerate(self.pig_pos): 485 | x, y = item 486 | direction = [(-1, 0), (1, 0), (0, 1), (0, -1), (0, 0)] 487 | np.random.shuffle(direction) 488 | for pos_x, pos_y in direction: 489 | if (pos_x, pos_y) == (0, 0): 490 | break 491 | new_x = x + pos_x 492 | new_y = y + pos_y 493 | 494 | if in_board(new_x, new_y) and self.map[new_x][new_y] == 0: 495 | self.pig_pos.remove((x, y)) 496 | self.pig_pos.add((new_x, new_y)) 497 | self.map[new_x][new_y] = -2 498 | self.map[x][y] = 0 499 | break 500 | 501 | def update_rabbit_pos(self): 502 | 503 | def in_board(x, y): 504 | return not (x < 0 or x >= self.h or y < 0 or y >= self.w) 505 | 506 | # Move rabbits 507 | for i, item in enumerate(self.rabbit_pos): 508 | x, y = item 509 | direction = [(-1, 0), (1, 0), (0, 1), (0, -1), (0, 0)] 510 | np.random.shuffle(direction) 511 | for pos_x, pos_y in direction: 512 | if (pos_x, pos_y) == (0, 0): 513 | break 514 | new_x = x + pos_x 515 | new_y = y + pos_y 516 | 517 | if in_board(new_x, new_y) and self.map[new_x][new_y] == 0: 518 | self.rabbit_pos.remove((x, y)) 519 | self.rabbit_pos.add((new_x, new_y)) 520 | self.map[new_x][new_y] = -3 521 | 
self.map[x][y] = 0 522 | break 523 | 524 | def decrease_health(self): 525 | for id, _ in self.id_pos.iteritems(): 526 | self.health[id] -= self.args.damage_per_step 527 | 528 | def get_avg_life(self): 529 | assert self.avg_life != None 530 | assert self.dead_people != None 531 | return self.avg_life, self.dead_people 532 | 533 | def remove_dead_people(self, cur_step): 534 | 535 | def max_view_size(view_size1, view_size2): 536 | view_size_area1 = (2 * view_size1[0] + 1) * (view_size1[1] + 1) 537 | view_size_area2 = (2 * view_size2[0] + 1) * (view_size2[1] + 1) 538 | 539 | return view_size1 if view_size_area1 > view_size_area2 else view_size2 540 | 541 | self.dead_id = [] 542 | for id, pos in self.id_pos.iteritems(): 543 | assert id > 0 544 | if self.health[id] <= 0.: 545 | x, y = pos 546 | self.map[x][y] = 0 547 | 548 | self.dead_id.append(id) 549 | self.agent_num -= 1 550 | 551 | group_id = self.id_group[id] 552 | if group_id > 0: 553 | group_num = len(self.group_ids[group_id]) 554 | 555 | assert group_num >= 2 556 | 557 | if group_num > 2: 558 | del self.id_group[id] 559 | self.group_ids[group_id].remove(id) 560 | 561 | cur_max_view_size = None 562 | for people in self.group_ids[group_id]: 563 | if cur_max_view_size is None: 564 | cur_max_view_size = self.property_copy[people][0][:] 565 | else: 566 | cur_max_view_size = max_view_size(cur_max_view_size, self.property_copy[people][0][:]) 567 | for people in self.group_ids[group_id]: 568 | self.property[people][0] = cur_max_view_size 569 | 570 | self.groups_view_size[group_id] = cur_max_view_size 571 | else: 572 | another_id = None 573 | for item in self.group_ids[group_id]: 574 | if item != id: 575 | another_id = item 576 | self.id_group[another_id] = 0 577 | del self.id_group[id] 578 | self.group_ids[group_id] = None 579 | 580 | self.property[another_id] = self.property_copy[another_id][:] 581 | self.groups_view_size[group_id] = None 582 | 583 | total_life = 0 584 | 585 | for id in self.dead_id: 586 | total_life += cur_step - self.birth_year[id] 587 | del self.id_pos[id] 588 | del self.property[id] 589 | del self.property_copy[id] 590 | del self.birth_year[id] 591 | 592 | if len(self.dead_id) == 0: 593 | self.avg_life = 0 594 | else: 595 | self.avg_life = 1. * total_life / (1. * len(self.dead_id)) 596 | self.dead_people = len(self.dead_id) 597 | 598 | return self.dead_id 599 | 600 | def make_video(self, images, outvid=None, fps=5, size=None, is_color=True, format="XVID"): 601 | """ 602 | Create a video from a list of images. 
603 | @param outvid output video 604 | @param images list of images to use in the video 605 | @param fps frame per second 606 | @param size size of each frame 607 | @param is_color color 608 | @param format see http://www.fourcc.org/codecs.php 609 | """ 610 | # fourcc = VideoWriter_fourcc(*format) 611 | # For opencv2 and opencv3: 612 | if int(cv2.__version__[0]) > 2: 613 | fourcc = cv2.VideoWriter_fourcc(*format) 614 | else: 615 | fourcc = cv2.cv.CV_FOURCC(*format) 616 | vid = None 617 | for image in images: 618 | assert os.path.exists(image) 619 | img = imread(image) 620 | if vid is None: 621 | if size is None: 622 | size = img.shape[1], img.shape[0] 623 | vid = VideoWriter(outvid, fourcc, float(fps), size, is_color) 624 | if size[0] != img.shape[1] and size[1] != img.shape[0]: 625 | img = resize(img, size) 626 | vid.write(img) 627 | vid.release() 628 | 629 | def dump_image(self, img_name): 630 | new_w, new_h = self.w * 5, self.h * 5 631 | img = np.zeros((new_w, new_h, 3), dtype=np.uint8) 632 | length = self.args.img_length 633 | for i in xrange(self.w): 634 | for j in xrange(self.h): 635 | id = self.map[i][j] 636 | if id != 0: 637 | for m in xrange(length): 638 | for n in xrange(length): 639 | img[i * length + m][j * length + n] = 255 * np.array(self.property[id][1]) 640 | output_img = Image.fromarray(img, 'RGB') 641 | output_img.save(img_name) 642 | 643 | 644 | def _get_reward_pig(pos): 645 | def in_bound(x, y): 646 | return not (x < 0 or x >= env_h or y < 0 or y >= env_w) 647 | 648 | x, y = pos 649 | groups_num = {} 650 | for i in xrange(-env_reward_radius_pig, env_reward_radius_pig + 1): 651 | for j in xrange(-env_reward_radius_pig, env_reward_radius_pig + 1): 652 | new_x, new_y = x + i, y + j 653 | if in_bound(new_x, new_y): 654 | id = env_map[new_x][new_y] 655 | if id > 0 and env_id_group[id] > 0: 656 | if env_id_group[id] in groups_num: 657 | groups_num[env_id_group[id]] += 1 658 | else: 659 | groups_num[env_id_group[id]] = 1 660 | if len(groups_num): 661 | groups_num = [(k, groups_num[k]) for k in groups_num if groups_num[k] >= env_reward_threshold_pig] 662 | if len(groups_num) > 0: 663 | groups_num = sorted(groups_num, key=lambda x: x[1]) 664 | return env_group_ids[groups_num[-1][0]], pos 665 | else: 666 | return [], pos 667 | else: 668 | return [], pos 669 | 670 | 671 | def _get_reward_rabbit_both(pos): 672 | # both groups and individuals can catch rabbits 673 | def in_bound(x, y): 674 | return not (x < 0 or x >= env_h or y < 0 or y >= env_w) 675 | 676 | x, y = pos 677 | candidates = [] 678 | for i in xrange(-env_reward_radius_rabbit, env_reward_radius_rabbit + 1): 679 | for j in xrange(-env_reward_radius_rabbit, env_reward_radius_rabbit + 1): 680 | new_x, new_y = x + i, y + j 681 | if in_bound(new_x, new_y): 682 | id = env_map[new_x][new_y] 683 | if id > 0: 684 | candidates.append(id) 685 | if len(candidates) > 0: 686 | winner = np.random.choice(candidates) 687 | if env_id_group[winner] == 0: 688 | return [winner], pos 689 | else: 690 | return env_group_ids[env_id_group[winner]], pos 691 | else: 692 | return [], pos 693 | 694 | 695 | def _get_reward_rabbit_individual(pos): 696 | # only individuals can catch rabbits 697 | def in_bound(x, y): 698 | return not (x < 0 or x >= env_h or y < 0 or y >= env_w) 699 | 700 | x, y = pos 701 | candidates = [] 702 | for i in xrange(-env_reward_radius_rabbit, env_reward_radius_rabbit + 1): 703 | for j in xrange(-env_reward_radius_rabbit, env_reward_radius_rabbit + 1): 704 | new_x, new_y = x + i, y + j 705 | if in_bound(new_x, new_y): 706 | id 
= env_map[new_x][new_y] 707 | if id > 0 and env_id_group[id] == 0: 708 | candidates.append(id) 709 | if len(candidates) > 0: 710 | return [np.random.choice(candidates)], pos 711 | else: 712 | return [], pos 713 | 714 | 715 | def get_reward(env): 716 | global env_pig_pos 717 | global env_agent_num 718 | global env_batch_size 719 | global env_map 720 | global env_reward_radius_pig 721 | global env_reward_radius_rabbit 722 | global env_reward_threshold_pig 723 | global env_w 724 | global env_h 725 | global env_id_group 726 | global env_group_ids 727 | 728 | env_pig_pos = env.pig_pos 729 | env_rabbit_pos = env.rabbit_pos 730 | env_agent_num = env.agent_num 731 | env_map = env.map 732 | env_batch_size = env.batch_size 733 | env_reward_radius_pig = env.reward_radius_pig 734 | env_reward_threshold_pig = env.reward_threshold_pig 735 | env_reward_radius_rabbit = env.reward_radius_rabbit 736 | env_w = env.w 737 | env_h = env.h 738 | env_id_group = env.id_group 739 | env_group_ids = env.group_ids 740 | 741 | cores = multiprocessing.cpu_count() 742 | pool = multiprocessing.Pool(processes=cores) 743 | 744 | reward_ids_pig = pool.map(_get_reward_pig, env_pig_pos) 745 | reward_ids_rabbit = pool.map(_get_reward_rabbit_individual, env_rabbit_pos) 746 | pool.close() 747 | 748 | killed_pigs = set() 749 | killed_rabbits = set() 750 | rewards = {} 751 | 752 | for item in reward_ids_pig: 753 | if len(item[0]) > 0: 754 | reward_per_agent = 1. / len(item[0]) 755 | for id in item[0]: 756 | if id not in rewards: 757 | rewards[id] = reward_per_agent 758 | else: 759 | rewards[id] += reward_per_agent 760 | killed_pigs.add(item[1]) 761 | 762 | for item in reward_ids_rabbit: 763 | if len(item[0]) > 0: 764 | reward_per_agent = 0.05 / len(item[0]) 765 | for id in item[0]: 766 | if id not in rewards: 767 | rewards[id] = reward_per_agent 768 | else: 769 | rewards[id] += reward_per_agent 770 | killed_rabbits.add(item[1]) 771 | 772 | env_pig_pos = env_pig_pos - killed_pigs 773 | env.pig_pos = env_pig_pos 774 | env.pig_num -= len(killed_pigs) 775 | 776 | env_rabbit_pos = env_rabbit_pos - killed_rabbits 777 | env.rabbit_pos = env_rabbit_pos 778 | env.rabbit_num -= len(killed_rabbits) 779 | 780 | for item in killed_pigs: 781 | x, y = item 782 | env.map[x][y] = 0 783 | for item in killed_rabbits: 784 | x, y = item 785 | env.map[x][y] = 0 786 | 787 | return rewards 788 | 789 | 790 | def get_view(env): 791 | global env_property 792 | global env_map 793 | global env_h 794 | global env_w 795 | global env_id_group 796 | global env_id_pos 797 | global batch_size 798 | global env_agent_num 799 | global env_max_view_size 800 | global env_min_view_size 801 | global env_id_ally_number 802 | global env_health 803 | 804 | env_property = env.property 805 | env_map = env.map 806 | env_h = env.h 807 | env_w = env.w 808 | env_id_group = env.id_group 809 | env_id_pos = env.id_pos 810 | env_batch_size = env.batch_size 811 | env_agent_num = env.agent_num 812 | env_max_view_size = env.max_view_size 813 | env_min_view_size = env.min_view_size 814 | env_id_ally_number = {} 815 | env_health = env.health 816 | 817 | allies = [] 818 | 819 | cores = multiprocessing.cpu_count() 820 | pool = multiprocessing.Pool(processes=cores) 821 | 822 | env_id_pos_keys = env_id_pos.keys() 823 | env_id_pos_keys.sort() 824 | pos = [env_id_pos[k] for k in env_id_pos_keys] 825 | view = pool.map(_get_view, pos) 826 | pool.close() 827 | 828 | env.id_ally_number = env_id_ally_number 829 | 830 | views = [] 831 | ids = [] 832 | for item in view: 833 | views.append(item[0]) 834 | 
ids.append(item[2]) 835 | if item[1]: 836 | allies.append(item[1]) 837 | 838 | env.ally.clear() 839 | # Candidate ally 840 | for item in allies: 841 | env.ally[item[0]] = item[1] 842 | 843 | view = np.array(views) 844 | 845 | batch_views = [] 846 | 847 | for i in xrange(int(np.ceil(1. * env_agent_num / env_batch_size))): 848 | st = env_batch_size * i 849 | ed = st + env_batch_size 850 | if ed > env_agent_num: 851 | ed = env_agent_num 852 | 853 | # batch_view_tmp = view[st:ed] 854 | # batch_ids = ids[st:ed] 855 | batch_view = [] 856 | for j in xrange(st, ed): 857 | batch_view.append((ids[j], view[j])) 858 | 859 | batch_views.append(batch_view) 860 | 861 | return batch_views 862 | 863 | 864 | def _get_view(pos): 865 | x, y = pos 866 | range_l, range_f, face = env_property[env_map[x][y]][0] 867 | max_range_l, max_range_f, _ = env_max_view_size 868 | min_range_l, min_range_f, _ = env_min_view_size 869 | # single_view = np.zeros(((2 * max_range_l + 1) * (max_range_f + 1), 4), dtype=np.float32) 870 | single_view = np.zeros(((2 * max_range_l + 1) * (max_range_f + 1), 5), dtype=np.float32) 871 | env_ally = None 872 | 873 | def in_bound(x, y): 874 | return not (x < 0 or x >= env_h or y < 0 or y >= env_w) 875 | 876 | def in_group(id_1, id_2): 877 | if env_id_group[id_1] == env_id_group[id_2]: 878 | return True 879 | else: 880 | return False 881 | 882 | cur_pos = 0 883 | allies = set() 884 | face = 0 885 | if face == 0: 886 | # for i in xrange(-range_f, 1): 887 | # for j in xrange(-range_l, range_l + 1): 888 | for i in xrange(-max_range_f, 1): 889 | for j in xrange(-max_range_l, max_range_l + 1): 890 | new_x, new_y = x + i, y + j 891 | 892 | if not in_bound(new_x, new_y) or i < -range_f or j < -range_l or j > range_l: 893 | single_view[cur_pos] = [1, 1, 0, 0, 0] 894 | else: 895 | if env_id_group[env_map[x][y]] == 0 and env_map[new_x][new_y] > 0 and i in xrange(-min_range_f, 896 | 1) and j in xrange( 897 | -min_range_l, min_range_l + 1): 898 | allies.add(env_map[new_x][new_y]) 899 | single_view[cur_pos][0], single_view[cur_pos][1], single_view[cur_pos][2] = \ 900 | env_property[env_map[x][y]][1] 901 | if env_map[new_x][new_y] > 0 and in_group(env_map[x][y], env_map[new_x][new_y]): 902 | single_view[cur_pos][3] = 1 903 | # For exploring the reason of degroup 904 | if env_map[x][y] in env_id_ally_number: 905 | env_id_ally_number[env_map[x][y]] += 1 906 | else: 907 | env_id_ally_number[env_map[x][y]] = 1 908 | else: 909 | single_view[cur_pos][3] = 0 910 | 911 | # For health 912 | single_view[cur_pos][4] = env_health[env_map[x][y]] 913 | 914 | cur_pos = cur_pos + 1 915 | 916 | # TODO: the logic of join a group 917 | if len(allies) > 0: 918 | ally_id = random.sample(allies, 1)[0] 919 | id = env_map[x][y] 920 | if id != ally_id: 921 | env_ally = (id, ally_id) 922 | 923 | elif face == 1: 924 | # for i in xrange(-range_l, range_l + 1): 925 | # for j in xrange(0, range_f + 1): 926 | for i in xrange(-max_range_l, max_range_l + 1): 927 | for j in xrange(0, max_range_f + 1): 928 | new_x, new_y = x + i, y + j 929 | if not in_bound(new_x, new_y) or i < -range_l or i > range_l or j > range_f: 930 | single_view[cur_pos] = [1, 1, 0, 0, 0] 931 | else: 932 | if env_id_group[env_map[x][y]] == 0 and env_map[new_x][new_y] > 0 and i in xrange(-min_range_l, 933 | min_range_l + 1) and j in xrange( 934 | 0, min_range_f + 1): 935 | allies.add(env_map[new_x][new_y]) 936 | single_view[cur_pos][0], single_view[cur_pos][1], single_view[cur_pos][2] = \ 937 | env_property[env_map[x][y]][1] 938 | if env_map[new_x][new_y] > 0 and 
in_group(env_map[x][y], env_map[new_x][new_y]): 939 | single_view[cur_pos][3] = 1 940 | if env_map[x][y] in env_id_ally_number: 941 | env_id_ally_number[env_map[x][y]] += 1 942 | else: 943 | env_id_ally_number[env_map[x][y]] = 1 944 | else: 945 | single_view[cur_pos][3] = 0 946 | 947 | # For health 948 | single_view[cur_pos][4] = env_health[env_map[x][y]] 949 | 950 | cur_pos = cur_pos + 1 951 | if len(allies) > 0: 952 | ally_id = random.sample(allies, 1)[0] 953 | id = env_map[x][y] 954 | if id != ally_id: 955 | env_ally = (id, ally_id) 956 | 957 | elif face == 2: 958 | # range_i_st, range_i_ed = -range_f, 0 959 | # range_j_st, range_j_ed = -range_l, range_l 960 | # for i in xrange(range_f, -1): 961 | # for j in xrange(range_l, -range_l - 1): 962 | for i in xrange(max_range_f, -1, -1): 963 | for j in xrange(max_range_l, -max_range_l - 1, -1): 964 | new_x, new_y = x + i, y + j 965 | if not in_bound(new_x, new_y) or i > range_f or j > range_l or j < -range_l: 966 | single_view[cur_pos] = [1, 1, 0, 0, 0] 967 | else: 968 | if env_id_group[env_map[x][y]] == 0 and env_map[new_x][new_y] > 0 and i in xrange(min_range_f, -1, 969 | -1) and j in xrange( 970 | min_range_l, -min_range_l - 1, -1): 971 | allies.add(env_map[new_x][new_y]) 972 | single_view[cur_pos][0], single_view[cur_pos][1], single_view[cur_pos][2] = \ 973 | env_property[env_map[x][y]][1] 974 | if env_map[new_x][new_y] > 0 and in_group(env_map[x][y], env_map[new_x][new_y]): 975 | single_view[cur_pos][3] = 1 976 | if env_map[x][y] in env_id_ally_number: 977 | env_id_ally_number[env_map[x][y]] += 1 978 | else: 979 | env_id_ally_number[env_map[x][y]] = 1 980 | else: 981 | single_view[cur_pos][3] = 0 982 | # For health 983 | single_view[cur_pos][4] = env_health[env_map[x][y]] 984 | 985 | cur_pos = cur_pos + 1 986 | if len(allies) > 0: 987 | ally_id = random.sample(allies, 1)[0] 988 | id = env_map[x][y] 989 | if id != ally_id: 990 | env_ally = (id, ally_id) 991 | 992 | 993 | elif face == 3: 994 | # for i in xrange(range_l, -range_l - 1): 995 | # for j in xrange(-range_f, 1): 996 | for i in xrange(max_range_l, -max_range_l - 1, -1): 997 | for j in xrange(-max_range_f, 1): 998 | print "miaomiaomiao" 999 | new_x, new_y = x + i, y + j 1000 | if not in_bound(new_x, new_y) or i > range_l or i < -range_l or j < -range_f: 1001 | single_view[cur_pos] = [1, 1, 0, 0, 0] 1002 | else: 1003 | if env_id_group[env_map[x][y]] == 0 and env_map[new_x][new_y] > 0 and i in xrange(min_range_l, 1004 | -min_range_l - 1, 1005 | -1) and j in xrange( 1006 | -min_range_f, 1): 1007 | allies.add(env_map[new_x][new_y]) 1008 | single_view[cur_pos][0], single_view[cur_pos][1], single_view[cur_pos][2] = \ 1009 | env_property[env_map[x][y]][1] 1010 | if env_map[new_x][new_y] > 0 and in_group(env_map[x][y], env_map[new_x][new_y]): 1011 | single_view[cur_pos][3] = 1 1012 | if env_map[x][y] in env_id_ally_number: 1013 | env_id_ally_number[env_map[x][y]] += 1 1014 | else: 1015 | env_id_ally_number[env_map[x][y]] = 1 1016 | else: 1017 | single_view[cur_pos][3] = 0 1018 | 1019 | # For health 1020 | single_view[cur_pos][4] = env_health[env_map[x][y]] 1021 | 1022 | cur_pos = cur_pos + 1 1023 | if len(allies) > 0: 1024 | ally_id = random.sample(allies, 1)[0] 1025 | id = env_map[x][y] 1026 | if id != ally_id: 1027 | env_ally = (id, ally_id) 1028 | 1029 | else: 1030 | print "Error Face!!!" 
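    # At this point every cell of the padded (2*max_range_l+1) x (max_range_f+1) window has
    # been filled with 5 channels [R, G, B, same_group_flag, health]; the assert below
    # checks the cell count via cur_pos. With the default view_args '2500-5-5-*' this is
    # (2*5+1)*(5+1) = 66 cells, 66*5 = 330 values, plus agent_emb_dim = 5, matching the
    # view_flat_size of 335 used in train.sh (see the corresponding assert in Env.gen_agent()).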
1031 | assert cur_pos == (2 * max_range_l + 1) * (max_range_f + 1) 1032 | return single_view.reshape(-1), env_ally, env_map[x][y] 1033 | -------------------------------------------------------------------------------- /src/Population Dynamics/main.py: -------------------------------------------------------------------------------- 1 | import sys 2 | from env import Env, get_view, get_reward 3 | from Model import Model_DNN 4 | import argparse 5 | import tensorflow as tf 6 | import os 7 | import shutil 8 | import time 9 | 10 | if __name__ == '__main__': 11 | argparser = argparse.ArgumentParser(sys.argv[0]) 12 | # Environment 13 | argparser.add_argument('--add_pig_number', type=int, default=500) 14 | argparser.add_argument('--add_rabbit_number', type=int, default=500) 15 | argparser.add_argument('--add_every', type=int, default=500) 16 | 17 | argparser.add_argument('--random_seed', type=int, default=10, 18 | help='the random seed to generate the wall in the map') 19 | argparser.add_argument('--width', type=int, default=1000) 20 | argparser.add_argument('--height', type=int, default=1000) 21 | argparser.add_argument('--batch_size', type=int, default=32) 22 | argparser.add_argument('--view_args', type=str, default='2500-5-5-0,2500-5-5-1,2500-5-5-2,2500-5-5-3', 23 | help="num-leftView-frontView-orientation, separated by space") 24 | argparser.add_argument('--pig_max_number', type=int, default=3000) 25 | argparser.add_argument('--pig_min_number', type=int, default=1500) 26 | argparser.add_argument('--pig_increase_every', type=int, default=5) 27 | argparser.add_argument('--pig_increase_rate', type=float, default=0.001) 28 | argparser.add_argument('--rabbit_increase_rate', type=float, default=0.001) 29 | argparser.add_argument('--rabbit_max_number', type=int, default=3000) 30 | argparser.add_argument('--agent_increase_rate', type=float, default=0.001) 31 | argparser.add_argument('--pig_increase_policy', type=int, default=1, 32 | help='0: max_min; 1: increase every n timestep') 33 | argparser.add_argument('--reward_radius_rabbit', type=int, default=7) 34 | argparser.add_argument('--reward_radius_pig', type=int, default=7) 35 | argparser.add_argument('--reward_threshold_pig', type=int, default=5) 36 | argparser.add_argument('--img_length', type=int, default=5) 37 | argparser.add_argument('--images_dir', type=str, default='images') 38 | argparser.add_argument('--agent_mortal', type=int, default=0, 39 | help='0: immortal, 1: mortal') 40 | argparser.add_argument('--agent_emb_dim', type=int, default=5) 41 | argparser.add_argument('--agent_id', type=int, default=0, 42 | help='0: no id, 1: has id') 43 | argparser.add_argument('--damage_per_step', type=float, default=0.) 44 | # model 45 | argparser.add_argument('--model_name', type=str, default='DNN') 46 | argparser.add_argument('--model_hidden_size', type=str, default='32,32') 47 | argparser.add_argument('--activations', type=str, default='sigmoid,sigmoid') 48 | argparser.add_argument('--view_flat_size', type=int, default=66 * 5 + 5) 49 | argparser.add_argument('--num_actions', type=int, default=9) 50 | argparser.add_argument('--reward_decay', type=float, default=0.9) 51 | argparser.add_argument('--save_every_round', type=int, default=10) 52 | argparser.add_argument('--save_dir', type=str, default='models') 53 | argparser.add_argument('--load_dir', type=str, default=None, 54 | help='e.g. 
models/round_0/model.ckpt') 55 | # Train 56 | argparser.add_argument('--video_dir', type=str, default='videos') 57 | argparser.add_argument('--video_per_round', type=int, default=None) 58 | argparser.add_argument('--round', type=int, default=100) 59 | argparser.add_argument('--time_step', type=int, default=10) 60 | argparser.add_argument('--policy', type=str, default='e_greedy') 61 | argparser.add_argument('--epsilon', type=float, default=0.1) 62 | argparser.add_argument('--agent_number', type=int, default=100) 63 | argparser.add_argument('--learning_rate', type=float, default=0.001) 64 | argparser.add_argument('--log_file', type=str, default='log.txt') 65 | argv = argparser.parse_args() 66 | 67 | argv.model_hidden_size = [int(x) for x in argv.model_hidden_size.split(',')] 68 | argv.view_args = [x for x in argv.view_args.split(',')] 69 | argv.activations = [x for x in argv.activations.split(',')] 70 | if argv.load_dir == 'None': 71 | argv.load_dir = None 72 | 73 | env = Env(argv) 74 | model = Model_DNN(argv) 75 | 76 | # Environment Initialization 77 | env.gen_wall(0.02, seed=argv.random_seed) 78 | env.gen_agent(argv.agent_number) 79 | 80 | env.gen_pig(argv.pig_max_number) 81 | # env.gen_rabbit(argv.rabbit_max_number) 82 | 83 | config = tf.ConfigProto() 84 | config.gpu_options.allow_growth = True 85 | sess = tf.Session(config=config) 86 | sess.run(tf.global_variables_initializer()) 87 | 88 | if argv.load_dir: 89 | model.load(sess, argv.load_dir) 90 | print 'Load model from ' + argv.load_dir 91 | 92 | if not os.path.exists(argv.images_dir): 93 | os.mkdir(argv.images_dir) 94 | if not os.path.exists(argv.video_dir): 95 | os.mkdir(argv.video_dir) 96 | if not os.path.exists(argv.save_dir): 97 | os.mkdir(argv.save_dir) 98 | 99 | flip = 0 100 | 101 | log = open(argv.log_file, 'w') 102 | log_largest_group = open('log_largest_group.txt', 'w') 103 | for r in xrange(argv.round): 104 | video_flag = False 105 | if argv.video_per_round > 0 and r % argv.video_per_round == 0: 106 | video_flag = True 107 | img_dir = os.path.join(argv.images_dir, str(r)) 108 | try: 109 | os.makedirs(img_dir) 110 | except: 111 | shutil.rmtree(img_dir) 112 | os.makedirs(img_dir) 113 | for t in xrange(argv.time_step): 114 | if t == 0 and video_flag: 115 | env.dump_image(os.path.join(img_dir, '%d.png' % t)) 116 | 117 | view_batches = get_view(env) # s 118 | actions, actions_batches = model.infer_actions(sess, view_batches, policy=argv.policy, 119 | epsilon=argv.epsilon) # a 120 | 121 | env.take_action(actions) 122 | env.decrease_health() 123 | env.update_pig_pos() 124 | # env.update_rabbit_pos() 125 | 126 | if video_flag: 127 | env.dump_image(os.path.join(img_dir, '%d.png' % (t + 1))) 128 | 129 | rewards = get_reward(env) # r, a dictionary 130 | env.increase_health(rewards) 131 | total_reward = 0 132 | for k, v in rewards.iteritems(): 133 | total_reward += v 134 | 135 | new_view_batches = get_view(env) # s' 136 | maxQ_batches = model.infer_max_action_values(sess, new_view_batches) 137 | 138 | model.train(sess=sess, 139 | view_batches=view_batches, 140 | actions_batches=actions_batches, 141 | rewards=rewards, 142 | maxQ_batches=maxQ_batches, 143 | learning_rate=argv.learning_rate) 144 | 145 | dead_list = env.remove_dead_people(r * argv.time_step + t) 146 | model.remove_dead_agent_emb(dead_list) # remove agent embeddings 147 | 148 | cur_pig_num = env.get_pig_num() 149 | cur_rabbit_num = env.get_rabbit_num() 150 | group_num, mean_size, variance_size, max_size, group_view_num = env.group_monitor() 151 | info = 
'Round\t%d\ttimestep\t%d\tPigNum\t%d\tgroup_num\t%d\tmean_size\t%f\tvariance_size\t%f\tmax_group_size\t%d\trabbitNum\t%d' % \ 152 | (r, t, cur_pig_num, group_num, mean_size, variance_size, max_size, cur_rabbit_num) 153 | if group_view_num is not None: 154 | for k in group_view_num: 155 | x = map(int, k[1:-1].split(',')) 156 | group_view_info = '\tView\t%d\tnumber\t%d' % ( 157 | (2 * x[0] + 1) * (x[1] + 1), group_view_num[k]) 158 | info += group_view_info 159 | print group_view_info 160 | info += '\tagent_num\t%d' % env.get_agent_num() 161 | 162 | join_num = 0 163 | leave_num = 0 164 | for item in actions: 165 | if item[1] == 7: 166 | join_num += 1 167 | elif item[1] == 8: 168 | leave_num += 1 169 | join_num = 1.0 * join_num / len(actions) 170 | leave_num = 1.0 * leave_num / len(actions) 171 | info += '\tjoin_ratio\t%f\tleave_ratio\t%f' % (join_num, leave_num) 172 | ret_loss = model.get_loss() 173 | info += '\tloss\t%f' % (ret_loss) 174 | info += '\treward\t%f' % (total_reward) 175 | # divided by the number of pig number to normalize 176 | normalize_total_reward = 1. * total_reward / env.get_pig_num() 177 | info += '\tnorm_reward\t%f' % (normalize_total_reward) 178 | avg_life, dead_people = env.get_avg_life() 179 | info += '\tavg_life\t%f' % (avg_life) 180 | info += '\ttotal_dead_people\t%f' % (dead_people) 181 | 182 | print info 183 | 184 | # print 'average degroup number:\t', avg_degroup 185 | log.write(info + '\n') 186 | log.flush() 187 | 188 | # largest_group_pos = env.track_largest_group(time_step=r * argv.round + t, update_largest_every=200) 189 | # pos_info = [] 190 | # for item in largest_group_pos: 191 | # pos_info.append(str(item[0]) + ',' + str(item[1])) 192 | # log_largest_group.write('\t'.join(pos_info) + '\n') 193 | # log_largest_group.flush() 194 | 195 | if argv.pig_increase_policy == 0: 196 | if cur_pig_num < argv.pig_min_number: 197 | env.gen_pig(argv.pig_max_number - cur_pig_num) 198 | elif argv.pig_increase_policy == 1: 199 | if t % argv.pig_increase_every == 0: 200 | env.gen_pig(max(1, int(env.get_pig_num() * argv.pig_increase_rate))) 201 | elif argv.pig_increase_policy == 2: 202 | env.gen_pig(10) 203 | 204 | # env.gen_rabbit(max(10, int(env.get_rabbit_num() * argv.rabbit_increase_rate))) 205 | env.grow_agent(max(1, int(env.get_agent_num() * argv.agent_increase_rate)), r * argv.time_step + t) 206 | 207 | # if (r * argv.time_step + t) % argv.add_every == 0: 208 | # if flip: 209 | # env.gen_pig(argv.add_pig_number) 210 | # else: 211 | # env.gen_rabbit(argv.add_rabbit_number) 212 | # flip ^= 1 213 | 214 | #if flip: 215 | # if env.get_rabbit_num() < 1000: 216 | # env.gen_pig(argv.pig_max_number - env.get_pig_num()) 217 | # flip ^= 1 218 | #else: 219 | # if env.get_pig_num() < 2000: 220 | # env.gen_rabbit(argv.rabbit_max_number - env.get_rabbit_num()) 221 | # flip ^= 1 222 | 223 | if argv.save_every_round and r % argv.save_every_round == 0: 224 | if not os.path.exists(os.path.join(argv.save_dir, "round_%d" % r)): 225 | os.mkdir(os.path.join(argv.save_dir, "round_%d" % r)) 226 | model_path = os.path.join(argv.save_dir, "round_%d" % r, "model.ckpt") 227 | model.save(sess, model_path) 228 | print 'model saved into ' + model_path 229 | if video_flag: 230 | images = [os.path.join(img_dir, ("%d.png" % i)) for i in range(argv.time_step + 1)] 231 | env.make_video(images=images, outvid=os.path.join(argv.video_dir, "%d.avi" % r)) 232 | log.close() 233 | -------------------------------------------------------------------------------- /src/Population Dynamics/train.sh: 
-------------------------------------------------------------------------------- 1 | add_pig_number=500 2 | add_rabbit_number=500 3 | add_every=500 4 | 5 | random_seed=10 6 | width=1000 7 | height=1000 8 | batch_size=32 9 | view_args=2500-5-5-0,2500-5-5-1,2500-5-5-2,2500-5-5-3 10 | pig_max_number=5000 11 | pig_min_number=2000 12 | pig_increase_every=1 13 | pig_increase_number=10 14 | pig_increase_policy=1 15 | agent_increase_rate=0.003 16 | pig_increase_rate=0.006 17 | rabbit_increase_rate=0.008 18 | rabbit_max_number=30000 19 | reward_radius_pig=7 20 | reward_threshold_pig=3 21 | reward_radius_rabbit=2 22 | img_length=5 23 | images_dir=images 24 | agent_mortal=1 25 | agent_emb_dim=5 26 | agent_id=1 27 | damage_per_step=0.01 28 | 29 | model_name=DNN 30 | model_hidden_size=32,32 31 | activations=sigmoid,sigmoid 32 | view_flat_size=335 33 | num_actions=9 34 | reward_decay=0.9 35 | save_every_round=10 36 | save_dir=models 37 | load_dir=None 38 | 39 | video_dir=videos 40 | video_per_round=0 41 | round=100 42 | time_step=500 43 | policy=e_greedy 44 | epsilon=0.1 45 | agent_number=10000 46 | learning_rate=0.001 47 | log_file=log1.txt 48 | 49 | python main.py --add_pig_number $add_pig_number --add_every $add_every --random_seed $random_seed --width $width --height $height --batch_size $batch_size --view_args $view_args --pig_max_number $pig_max_number --pig_min_number $pig_min_number --pig_increase_every $pig_increase_every --pig_increase_policy $pig_increase_policy --agent_increase_rate $agent_increase_rate --pig_increase_rate $pig_increase_rate --rabbit_max_number $rabbit_max_number --reward_radius_pig $reward_radius_pig --reward_threshold_pig $reward_threshold_pig --reward_radius_rabbit $reward_radius_rabbit --img_length $img_length --images_dir $images_dir --agent_mortal $agent_mortal --agent_emb_dim $agent_emb_dim --agent_id $agent_id --damage_per_step $damage_per_step --model_name $model_name --model_hidden_size $model_hidden_size --activations $activations --view_flat_size $view_flat_size --num_actions $num_actions --reward_decay $reward_decay --save_every_round $save_every_round --save_dir $save_dir --load_dir $load_dir --video_dir $video_dir --video_per_round $video_per_round --round $round --time_step $time_step --policy $policy --epsilon $epsilon --agent_number $agent_number --learning_rate $learning_rate --log_file $log_file 50 | -------------------------------------------------------------------------------- /src/env.py: -------------------------------------------------------------------------------- 1 | import os 2 | import random 3 | import multiprocessing 4 | from PIL import Image 5 | import numpy as np 6 | from cv2 import VideoWriter, imread, resize 7 | from copy import deepcopy 8 | import cv2 9 | 10 | 11 | # from model import inference, train 12 | 13 | class Env(object): 14 | def __init__(self, args): 15 | self.args = args 16 | self.h = args.height 17 | self.w = args.width 18 | self.batch_size = args.batch_size 19 | self.view_args = args.view_args 20 | self.agent_num = args.agent_number 21 | self.pig_num = 0 22 | self.rabbit_num = 0 23 | self.action_num = args.num_actions 24 | 25 | # Initialization 26 | self.view = [] 27 | self.map = np.zeros((self.h, self.w), dtype=np.int32) 28 | self.id_pos = {} 29 | self.pig_pos = set() 30 | self.property = {} 31 | self.rabbit_pos = set() 32 | 33 | # For the view size modify 34 | self.property_copy = {} 35 | self.max_group = 0 36 | self.id_group = {} 37 | self.group_ids = {} 38 | self.batch_views = {} 39 | self.ally = {} 40 | 41 | # For 
reason of degroup 42 | self.id_ally_number = {} 43 | self.actions = None 44 | 45 | # For health 46 | self.health = {} 47 | self.max_id = 0 48 | 49 | # For mortal 50 | self.dead_id = [] 51 | 52 | # For track largest group 53 | self.largest_group = 0 54 | 55 | self.rewards = None 56 | self.reward_radius_pig = args.reward_radius_pig 57 | self.reward_threshold_pig = args.reward_threshold_pig 58 | self.reward_radius_rabbit = args.reward_radius_rabbit 59 | 60 | self.groups_view_size = {} 61 | self.max_view_size = None 62 | self.min_view_size = None 63 | 64 | self._init_property() 65 | self._init_group() 66 | 67 | def _init_property(self): 68 | self.property[-3] = [1, [0, 1, 0]] 69 | self.property[-2] = [1, [1, 0, 0]] 70 | self.property[-1] = [1, [0.411, 0.411, 0.411]] 71 | self.property[0] = [1, [0, 0, 0]] 72 | 73 | def _init_group(self): 74 | for i in xrange(self.agent_num): 75 | self.id_group[i + 1] = 0 76 | 77 | def _gen_power(self, cnt): 78 | 79 | def max_view_size(view_size1, view_size2): 80 | view_size_area1 = (2 * view_size1[0] + 1) * (view_size1[1] + 1) 81 | view_size_area2 = (2 * view_size2[0] + 1) * (view_size2[1] + 1) 82 | 83 | return view_size1 if view_size_area1 > view_size_area2 else view_size2 84 | 85 | def min_view_size(view_size1, view_size2): 86 | view_size_area1 = (2 * view_size1[0] + 1) * (view_size1[1] + 1) 87 | view_size_area2 = (2 * view_size2[0] + 1) * (view_size2[1] + 1) 88 | 89 | return view_size1 if view_size_area1 < view_size_area2 else view_size2 90 | 91 | cur = 0 92 | for k in self.view_args: 93 | k = [int(x) for x in k.split('-')] 94 | assert len(k) == 4 95 | 96 | num, power_list = k[0], k[1:] 97 | # Maintain the max_view_size 98 | if self.max_view_size is None: 99 | self.max_view_size = power_list 100 | else: 101 | self.max_view_size = max_view_size(self.max_view_size, power_list) 102 | 103 | if self.min_view_size is None: 104 | self.min_view_size = power_list 105 | else: 106 | self.min_view_size = min_view_size(self.min_view_size, power_list) 107 | 108 | cur += num 109 | 110 | if cnt <= cur: 111 | return power_list 112 | 113 | def gen_wall(self, prob=0, seed=10): 114 | if prob == 0: 115 | return 116 | np.random.seed(seed) 117 | # Generate wall according to the prob 118 | for i in xrange(self.h): 119 | for j in xrange(self.w): 120 | if i == 0 or i == self.h - 1 or j == 0 or j == self.w - 1: 121 | self.map[i][j] = -1 122 | continue 123 | wall_prob = np.random.rand() 124 | if wall_prob < prob: 125 | self.map[i][j] = -1 126 | 127 | def gen_agent(self, agent_num=None): 128 | if agent_num == None: 129 | agent_num = self.args.agent_number 130 | 131 | for i in xrange(agent_num): 132 | while True: 133 | x = np.random.randint(0, self.h) 134 | y = np.random.randint(0, self.w) 135 | if self.map[x][y] == 0: 136 | self.map[x][y] = i + 1 137 | self.id_pos[i + 1] = (x, y) 138 | self.property[i + 1] = [self._gen_power(i + 1), [0, 0, 1]] 139 | self.health[i + 1] = 1.0 140 | break 141 | assert (2 * self.max_view_size[0] + 1) * (self.max_view_size[1] + 1) * 5 + self.args.agent_emb_dim == \ 142 | self.args.view_flat_size 143 | 144 | self.agent_num = self.args.agent_number 145 | self.max_id = self.args.agent_number 146 | # self.property_copy = self.property[:] 147 | for k in self.property: 148 | self.property_copy[k] = self.property[k][:] 149 | # self.property_copy = deepcopy(self.property) 150 | 151 | def _grow_power(self): 152 | 153 | candidate_view = [] 154 | for k in self.view_args: 155 | k = [int(x) for x in k.split('-')] 156 | assert len(k) == 4 157 | candidate_view.append(k) 
158 | 159 | num = len(candidate_view) 160 | random_power = np.random.randint(0, num) 161 | 162 | return candidate_view[random_power][1:] 163 | 164 | def grow_agent(self, agent_num=0): 165 | if agent_num == 0: 166 | return 167 | 168 | for i in xrange(agent_num): 169 | while True: 170 | x = np.random.randint(0, self.h) 171 | y = np.random.randint(0, self.w) 172 | if self.map[x][y] == 0: 173 | self.max_id += 1 174 | self.map[x][y] = self.max_id 175 | self.id_pos[self.max_id] = (x, y) 176 | self.property[self.max_id] = [self._grow_power(), [0, 0, 1]] 177 | self.property_copy[self.max_id] = self.property[self.max_id][:] 178 | self.health[self.max_id] = 1.0 179 | self.id_group[self.max_id] = 0 180 | 181 | break 182 | 183 | self.agent_num += agent_num 184 | 185 | def gen_pig(self, pig_nums=None): 186 | if pig_nums == None: 187 | pig_nums = self.args.pig_max_number 188 | 189 | for i in xrange(pig_nums): 190 | while True: 191 | x = np.random.randint(0, self.h) 192 | y = np.random.randint(0, self.w) 193 | if self.map[x][y] == 0: 194 | self.map[x][y] = -2 195 | self.pig_pos.add((x, y)) 196 | break 197 | 198 | self.pig_num = self.pig_num + pig_nums 199 | 200 | def gen_rabbit(self, rabbit_num=None): 201 | if rabbit_num is None: 202 | rabbit_num = self.args.rabbit_max_number 203 | 204 | for i in xrange(rabbit_num): 205 | while True: 206 | x = np.random.randint(0, self.h) 207 | y = np.random.randint(0, self.w) 208 | if self.map[x][y] == 0: 209 | self.map[x][y] = -3 210 | self.rabbit_pos.add((x, y)) 211 | break 212 | 213 | self.rabbit_num = self.rabbit_num + rabbit_num 214 | 215 | def get_pig_num(self): 216 | return self.pig_num 217 | 218 | def get_rabbit_num(self): 219 | return self.rabbit_num 220 | 221 | def get_agent_num(self): 222 | return self.agent_num 223 | 224 | def _agent_act(self, x, y, face, action, id): 225 | 226 | def move_forward(x, y, face): 227 | if face == 0: 228 | return x - 1, y 229 | elif face == 1: 230 | return x, y + 1 231 | elif face == 2: 232 | return x + 1, y 233 | elif face == 3: 234 | return x, y - 1 235 | 236 | def move_backward(x, y, face): 237 | if face == 0: 238 | return x + 1, y 239 | elif face == 1: 240 | return x, y - 1 241 | elif face == 2: 242 | return x - 1, y 243 | elif face == 3: 244 | return x, y + 1 245 | 246 | def move_left(x, y, face): 247 | if face == 0: 248 | return x, y - 1 249 | elif face == 1: 250 | return x - 1, y 251 | elif face == 2: 252 | return x, y + 1 253 | elif face == 3: 254 | return x + 1, y 255 | 256 | def move_right(x, y, face): 257 | if face == 0: 258 | return x, y + 1 259 | elif face == 1: 260 | return x + 1, y 261 | elif face == 2: 262 | return x, y - 1 263 | elif face == 3: 264 | return x - 1, y 265 | 266 | def in_board(x, y): 267 | return self.map[x][y] == 0 268 | 269 | # return the max view size(the area of the view) of the two view sizes 270 | def max_view_size(view_size1, view_size2): 271 | view_size_area1 = (2 * view_size1[0] + 1) * (view_size1[1] + 1) 272 | view_size_area2 = (2 * view_size2[0] + 1) * (view_size2[1] + 1) 273 | 274 | return view_size1 if view_size_area1 > view_size_area2 else view_size2 275 | 276 | if action == 0: 277 | pass 278 | elif action == 1: 279 | new_x, new_y = move_forward(x, y, face) 280 | if in_board(new_x, new_y): 281 | self.map[x][y] = 0 282 | self.map[new_x][new_y] = id 283 | self.id_pos[id] = (new_x, new_y) 284 | elif action == 2: 285 | new_x, new_y = move_backward(x, y, face) 286 | if in_board(new_x, new_y): 287 | self.map[x][y] = 0 288 | self.map[new_x][new_y] = id 289 | self.id_pos[id] = (new_x, new_y) 
290 | elif action == 3: 291 | new_x, new_y = move_left(x, y, face) 292 | if in_board(new_x, new_y): 293 | self.map[x][y] = 0 294 | self.map[new_x][new_y] = id 295 | self.id_pos[id] = (new_x, new_y) 296 | elif action == 4: 297 | new_x, new_y = move_right(x, y, face) 298 | if in_board(new_x, new_y): 299 | self.map[x][y] = 0 300 | self.map[new_x][new_y] = id 301 | self.id_pos[id] = (new_x, new_y) 302 | elif action == 5: 303 | self.property[id][0][2] = (face + 4 - 1) % 4 304 | elif action == 6: 305 | self.property[id][0][2] = (face + 1) % 4 306 | elif action == 7: 307 | if self.id_group[id] == 0: 308 | if id in self.ally: 309 | ally_id = self.ally[id] 310 | if self.id_group[ally_id] == 0: 311 | self.max_group += 1 312 | self.id_group[id] = self.max_group 313 | self.id_group[ally_id] = self.max_group 314 | 315 | self.group_ids[self.max_group] = [] 316 | self.group_ids[self.max_group].append(id) 317 | self.group_ids[self.max_group].append(ally_id) 318 | 319 | # For view size 320 | assert self.property[id][0] == self.property_copy[id][0] 321 | assert self.property[ally_id][0] == self.property_copy[ally_id][0] 322 | self.groups_view_size[self.max_group] = max_view_size(self.property[id][0], 323 | self.property[ally_id][0]) 324 | self.property[id][0] = self.groups_view_size[self.max_group] 325 | self.property[ally_id][0] = self.groups_view_size[self.max_group] 326 | else: 327 | assert self.property[id][0] == self.property_copy[id][0] 328 | self.id_group[id] = self.id_group[ally_id] 329 | self.group_ids[self.id_group[ally_id]].append(id) 330 | 331 | group_id = self.id_group[ally_id] 332 | 333 | cur_max_view_size = max_view_size(self.property[id][0], self.groups_view_size[group_id]) 334 | if cur_max_view_size == self.property[id][0] and self.property[id][0] != self.groups_view_size[ 335 | group_id]: 336 | # A powerful man join in a group, need to change all the members' view size in that group 337 | for people in self.group_ids[group_id]: 338 | self.property[people][0] = cur_max_view_size 339 | self.groups_view_size[group_id] = cur_max_view_size 340 | else: 341 | self.property[id][0] = cur_max_view_size 342 | 343 | elif action == 8: 344 | group_id = self.id_group[id] 345 | 346 | if group_id != 0: 347 | another_id = None 348 | if len(self.group_ids[group_id]) == 2: 349 | for item in self.group_ids[group_id]: 350 | if item != id: 351 | another_id = item 352 | self.id_group[id], self.id_group[another_id] = 0, 0 353 | self.group_ids[group_id] = None 354 | 355 | # Restore the origin view size 356 | self.property[id] = self.property_copy[id][:] 357 | self.property[another_id] = self.property_copy[another_id][:] 358 | self.groups_view_size[group_id] = None 359 | else: 360 | self.id_group[id] = 0 361 | self.group_ids[group_id].remove(id) 362 | 363 | # Restore the origin view size 364 | self.property[id] = self.property_copy[id][:] 365 | cur_max_view_size = None 366 | 367 | for people in self.group_ids[group_id]: 368 | if cur_max_view_size is None: 369 | cur_max_view_size = self.property_copy[people][0][:] 370 | else: 371 | cur_max_view_size = max_view_size(cur_max_view_size, self.property_copy[people][0][:]) 372 | 373 | for people in self.group_ids[group_id]: 374 | self.property[people][0] = cur_max_view_size 375 | 376 | self.groups_view_size[group_id] = cur_max_view_size 377 | 378 | else: 379 | pass 380 | 381 | else: 382 | print action 383 | print "Wrong Action ID!!!!" 
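    # Summary of the action encoding handled by _agent_act above:
    #   0     : stay in place
    #   1-4   : move forward / backward / left / right relative to the current facing
    #           (0 north, 1 east, 2 south, 3 west); the move only happens if the target cell is empty
    #   5 / 6 : turn left / turn right (update the facing stored in property[id][0][2])
    #   7     : join a group -- an ungrouped agent with a candidate ally (found during get_view)
    #           either founds a new group with it or joins the ally's group; the group's view size
    #           is kept at the maximum view size over its members
    #   8     : leave the current group -- the agent's original view size is restored; a two-member
    #           group is dissolved, otherwise the group's view size is recomputed from the
    #           remaining members
    #   any other value only prints a warning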
384 | 385 | def take_action(self, actions): 386 | 387 | # Move Agent 388 | self.actions = actions 389 | # for i in xrange(self.agent_num): 390 | for id, action in actions: 391 | x, y = self.id_pos[id] 392 | face = self.property[id][0][2] 393 | self._agent_act(x, y, face, action, id) 394 | 395 | def increase_health(self, rewards): 396 | for id in rewards: 397 | self.health[id] += 12. * rewards[id] 398 | 399 | # if rewards[id] > 0.2: 400 | # self.health[id] = 1. 401 | # elif rewards > 0: 402 | # self.health[id] += rewards[id] 403 | 404 | # self.health[id] += rewards[id] 405 | # if self.health[id] > 1.0: 406 | # self.health[id] = 1.0 407 | 408 | def group_monitor(self): 409 | """ 410 | :return: group_num, mean_size, variance_size, max_size 411 | """ 412 | group_sizes = [] 413 | group_view_num = {} 414 | group_view_avg_size = {} 415 | for k in self.group_ids: 416 | ids = self.group_ids[k] 417 | if ids: 418 | group_size = len(ids) 419 | assert group_size >= 2 420 | group_sizes.append(group_size) 421 | 422 | # count group view size and group number 423 | group_view = self.groups_view_size[k] 424 | group_view = group_view[:2] 425 | if str(group_view) not in group_view_num: 426 | group_view_num[str(group_view)] = 1 427 | else: 428 | group_view_num[str(group_view)] += 1 429 | if str(group_view) not in group_view_avg_size: 430 | group_view_avg_size[str(group_view)] = group_size 431 | else: 432 | group_view_avg_size[str(group_view)] += group_size 433 | 434 | group_sizes = np.array(group_sizes) 435 | for k in group_view_avg_size: 436 | group_view_avg_size[k] = 1. * group_view_avg_size[k] / group_view_num[k] 437 | 438 | # For reason of degroup 439 | # cnt = 0 440 | # cnt_degroup = 0 441 | # 442 | # for i, action in enumerate(self.actions): 443 | # id = i + 1 444 | # if action == 8 and self.id_group[id] > 0: 445 | # cnt += 1 446 | # if id in self.id_ally_number: 447 | # cnt_degroup += self.id_ally_number[id] 448 | # 449 | # avg_degroup = 0 if cnt == 0.0 else 1. * cnt_degroup / (1. 
* cnt) 450 | 451 | if len(group_sizes) > 0: 452 | return len(group_sizes), group_sizes.mean(), group_sizes.var(), np.max( 453 | group_sizes), group_view_num 454 | else: 455 | return 0, 0, 0, 0, None 456 | 457 | def track_largest_group(self, time_step, update_largest_every): 458 | if time_step % update_largest_every == 0 or (self.group_ids[self.largest_group] is None): 459 | self.largest_group_size = 0 460 | self.largest_group = 0 461 | for k in self.group_ids: 462 | ids = self.group_ids[k] 463 | if ids: 464 | if len(ids) > self.largest_group_size: 465 | self.largest_group_size = len(ids) 466 | self.largest_group = k 467 | return [self.id_pos[i] for i in self.group_ids[self.largest_group]] 468 | 469 | def update_pig_pos(self): 470 | 471 | def in_board(x, y): 472 | return not (x < 0 or x >= self.h or y < 0 or y >= self.w) 473 | 474 | # Move Pigs 475 | for i, item in enumerate(self.pig_pos): 476 | x, y = item 477 | direction = [(-1, 0), (1, 0), (0, 1), (0, -1), (0, 0)] 478 | np.random.shuffle(direction) 479 | for pos_x, pos_y in direction: 480 | if (pos_x, pos_y) == (0, 0): 481 | break 482 | new_x = x + pos_x 483 | new_y = y + pos_y 484 | 485 | if in_board(new_x, new_y) and self.map[new_x][new_y] == 0: 486 | self.pig_pos.remove((x, y)) 487 | self.pig_pos.add((new_x, new_y)) 488 | self.map[new_x][new_y] = -2 489 | self.map[x][y] = 0 490 | break 491 | 492 | def update_rabbit_pos(self): 493 | 494 | def in_board(x, y): 495 | return not (x < 0 or x >= self.h or y < 0 or y >= self.w) 496 | 497 | # Move rabbits 498 | for i, item in enumerate(self.rabbit_pos): 499 | x, y = item 500 | direction = [(-1, 0), (1, 0), (0, 1), (0, -1), (0, 0)] 501 | np.random.shuffle(direction) 502 | for pos_x, pos_y in direction: 503 | if (pos_x, pos_y) == (0, 0): 504 | break 505 | new_x = x + pos_x 506 | new_y = y + pos_y 507 | 508 | if in_board(new_x, new_y) and self.map[new_x][new_y] == 0: 509 | self.rabbit_pos.remove((x, y)) 510 | self.rabbit_pos.add((new_x, new_y)) 511 | self.map[new_x][new_y] = -3 512 | self.map[x][y] = 0 513 | break 514 | 515 | def decrease_health(self): 516 | for id, _ in self.id_pos.iteritems(): 517 | self.health[id] -= self.args.damage_per_step 518 | 519 | def remove_dead_people(self): 520 | 521 | def max_view_size(view_size1, view_size2): 522 | view_size_area1 = (2 * view_size1[0] + 1) * (view_size1[1] + 1) 523 | view_size_area2 = (2 * view_size2[0] + 1) * (view_size2[1] + 1) 524 | 525 | return view_size1 if view_size_area1 > view_size_area2 else view_size2 526 | 527 | self.dead_id = [] 528 | for id, pos in self.id_pos.iteritems(): 529 | assert id > 0 530 | if self.health[id] <= 0.: 531 | x, y = pos 532 | self.map[x][y] = 0 533 | 534 | self.dead_id.append(id) 535 | self.agent_num -= 1 536 | 537 | group_id = self.id_group[id] 538 | if group_id > 0: 539 | group_num = len(self.group_ids[group_id]) 540 | 541 | assert group_num >= 2 542 | 543 | if group_num > 2: 544 | del self.id_group[id] 545 | self.group_ids[group_id].remove(id) 546 | 547 | cur_max_view_size = None 548 | for people in self.group_ids[group_id]: 549 | if cur_max_view_size is None: 550 | cur_max_view_size = self.property_copy[people][0][:] 551 | else: 552 | cur_max_view_size = max_view_size(cur_max_view_size, self.property_copy[people][0][:]) 553 | for people in self.group_ids[group_id]: 554 | self.property[people][0] = cur_max_view_size 555 | 556 | self.groups_view_size[group_id] = cur_max_view_size 557 | else: 558 | another_id = None 559 | for item in self.group_ids[group_id]: 560 | if item != id: 561 | another_id = item 562 | 
self.id_group[another_id] = 0 563 | del self.id_group[id] 564 | self.group_ids[group_id] = None 565 | 566 | self.property[another_id] = self.property_copy[another_id][:] 567 | self.groups_view_size[group_id] = None 568 | 569 | for id in self.dead_id: 570 | del self.id_pos[id] 571 | del self.property[id] 572 | del self.property_copy[id] 573 | 574 | return self.dead_id 575 | 576 | def make_video(self, images, outvid=None, fps=5, size=None, is_color=True, format="XVID"): 577 | """ 578 | Create a video from a list of images. 579 | @param outvid output video 580 | @param images list of images to use in the video 581 | @param fps frame per second 582 | @param size size of each frame 583 | @param is_color color 584 | @param format see http://www.fourcc.org/codecs.php 585 | """ 586 | # fourcc = VideoWriter_fourcc(*format) 587 | # For opencv2 and opencv3: 588 | if int(cv2.__version__[0]) > 2: 589 | fourcc = cv2.VideoWriter_fourcc(*format) 590 | else: 591 | fourcc = cv2.cv.CV_FOURCC(*format) 592 | vid = None 593 | for image in images: 594 | assert os.path.exists(image) 595 | img = imread(image) 596 | if vid is None: 597 | if size is None: 598 | size = img.shape[1], img.shape[0] 599 | vid = VideoWriter(outvid, fourcc, float(fps), size, is_color) 600 | if size[0] != img.shape[1] and size[1] != img.shape[0]: 601 | img = resize(img, size) 602 | vid.write(img) 603 | vid.release() 604 | 605 | def dump_image(self, img_name): 606 | new_w, new_h = self.w * 5, self.h * 5 607 | img = np.zeros((new_w, new_h, 3), dtype=np.uint8) 608 | length = self.args.img_length 609 | for i in xrange(self.w): 610 | for j in xrange(self.h): 611 | id = self.map[i][j] 612 | if id != 0: 613 | for m in xrange(length): 614 | for n in xrange(length): 615 | img[i * length + m][j * length + n] = 255 * np.array(self.property[id][1]) 616 | output_img = Image.fromarray(img, 'RGB') 617 | output_img.save(img_name) 618 | 619 | 620 | def _get_reward_pig(pos): 621 | def in_bound(x, y): 622 | return not (x < 0 or x >= env_h or y < 0 or y >= env_w) 623 | 624 | x, y = pos 625 | groups_num = {} 626 | for i in xrange(-env_reward_radius_pig, env_reward_radius_pig + 1): 627 | for j in xrange(-env_reward_radius_pig, env_reward_radius_pig + 1): 628 | new_x, new_y = x + i, y + j 629 | if in_bound(new_x, new_y): 630 | id = env_map[new_x][new_y] 631 | if id > 0 and env_id_group[id] > 0: 632 | if env_id_group[id] in groups_num: 633 | groups_num[env_id_group[id]] += 1 634 | else: 635 | groups_num[env_id_group[id]] = 1 636 | if len(groups_num): 637 | groups_num = [(k, groups_num[k]) for k in groups_num if groups_num[k] >= env_reward_threshold_pig] 638 | if len(groups_num) > 0: 639 | groups_num = sorted(groups_num, key=lambda x: x[1]) 640 | return env_group_ids[groups_num[-1][0]], pos 641 | else: 642 | return [], pos 643 | else: 644 | return [], pos 645 | 646 | 647 | def _get_reward_rabbit_both(pos): 648 | # both groups and individuals can catch rabbits 649 | def in_bound(x, y): 650 | return not (x < 0 or x >= env_h or y < 0 or y >= env_w) 651 | 652 | x, y = pos 653 | candidates = [] 654 | for i in xrange(-env_reward_radius_rabbit, env_reward_radius_rabbit + 1): 655 | for j in xrange(-env_reward_radius_rabbit, env_reward_radius_rabbit + 1): 656 | new_x, new_y = x + i, y + j 657 | if in_bound(new_x, new_y): 658 | id = env_map[new_x][new_y] 659 | if id > 0: 660 | candidates.append(id) 661 | if len(candidates) > 0: 662 | winner = np.random.choice(candidates) 663 | if env_id_group[winner] == 0: 664 | return [winner], pos 665 | else: 666 | return 
env_group_ids[env_id_group[winner]], pos 667 | else: 668 | return [], pos 669 | 670 | 671 | def _get_reward_rabbit_individual(pos): 672 | # only individuals can catch rabbits 673 | def in_bound(x, y): 674 | return not (x < 0 or x >= env_h or y < 0 or y >= env_w) 675 | 676 | x, y = pos 677 | candidates = [] 678 | for i in xrange(-env_reward_radius_rabbit, env_reward_radius_rabbit + 1): 679 | for j in xrange(-env_reward_radius_rabbit, env_reward_radius_rabbit + 1): 680 | new_x, new_y = x + i, y + j 681 | if in_bound(new_x, new_y): 682 | id = env_map[new_x][new_y] 683 | if id > 0 and env_id_group[id] == 0: 684 | candidates.append(id) 685 | if len(candidates) > 0: 686 | return [np.random.choice(candidates)], pos 687 | else: 688 | return [], pos 689 | 690 | 691 | def get_reward(env): 692 | global env_pig_pos 693 | global env_agent_num 694 | global env_batch_size 695 | global env_map 696 | global env_reward_radius_pig 697 | global env_reward_radius_rabbit 698 | global env_reward_threshold_pig 699 | global env_w 700 | global env_h 701 | global env_id_group 702 | global env_group_ids 703 | 704 | env_pig_pos = env.pig_pos 705 | env_rabbit_pos = env.rabbit_pos 706 | env_agent_num = env.agent_num 707 | env_map = env.map 708 | env_batch_size = env.batch_size 709 | env_reward_radius_pig = env.reward_radius_pig 710 | env_reward_threshold_pig = env.reward_threshold_pig 711 | env_reward_radius_rabbit = env.reward_radius_rabbit 712 | env_w = env.w 713 | env_h = env.h 714 | env_id_group = env.id_group 715 | env_group_ids = env.group_ids 716 | 717 | cores = multiprocessing.cpu_count() 718 | pool = multiprocessing.Pool(processes=cores) 719 | 720 | reward_ids_pig = pool.map(_get_reward_pig, env_pig_pos) 721 | reward_ids_rabbit = pool.map(_get_reward_rabbit_individual, env_rabbit_pos) 722 | pool.close() 723 | 724 | killed_pigs = set() 725 | killed_rabbits = set() 726 | rewards = {} 727 | 728 | for item in reward_ids_pig: 729 | if len(item[0]) > 0: 730 | reward_per_agent = 1. 
/ len(item[0]) 731 | for id in item[0]: 732 | if id not in rewards: 733 | rewards[id] = reward_per_agent 734 | else: 735 | rewards[id] += reward_per_agent 736 | killed_pigs.add(item[1]) 737 | 738 | for item in reward_ids_rabbit: 739 | if len(item[0]) > 0: 740 | reward_per_agent = 0.05 / len(item[0]) 741 | for id in item[0]: 742 | if id not in rewards: 743 | rewards[id] = reward_per_agent 744 | else: 745 | rewards[id] += reward_per_agent 746 | killed_rabbits.add(item[1]) 747 | 748 | env_pig_pos = env_pig_pos - killed_pigs 749 | env.pig_pos = env_pig_pos 750 | env.pig_num -= len(killed_pigs) 751 | 752 | env_rabbit_pos = env_rabbit_pos - killed_rabbits 753 | env.rabbit_pos = env_rabbit_pos 754 | env.rabbit_num -= len(killed_rabbits) 755 | 756 | for item in killed_pigs: 757 | x, y = item 758 | env.map[x][y] = 0 759 | for item in killed_rabbits: 760 | x, y = item 761 | env.map[x][y] = 0 762 | 763 | return rewards 764 | 765 | 766 | def get_view(env): 767 | global env_property 768 | global env_map 769 | global env_h 770 | global env_w 771 | global env_id_group 772 | global env_id_pos 773 | global batch_size 774 | global env_agent_num 775 | global env_max_view_size 776 | global env_min_view_size 777 | global env_id_ally_number 778 | global env_health 779 | 780 | env_property = env.property 781 | env_map = env.map 782 | env_h = env.h 783 | env_w = env.w 784 | env_id_group = env.id_group 785 | env_id_pos = env.id_pos 786 | env_batch_size = env.batch_size 787 | env_agent_num = env.agent_num 788 | env_max_view_size = env.max_view_size 789 | env_min_view_size = env.min_view_size 790 | env_id_ally_number = {} 791 | env_health = env.health 792 | 793 | allies = [] 794 | 795 | cores = multiprocessing.cpu_count() 796 | pool = multiprocessing.Pool(processes=cores) 797 | 798 | env_id_pos_keys = env_id_pos.keys() 799 | env_id_pos_keys.sort() 800 | pos = [env_id_pos[k] for k in env_id_pos_keys] 801 | view = pool.map(_get_view, pos) 802 | pool.close() 803 | 804 | env.id_ally_number = env_id_ally_number 805 | 806 | views = [] 807 | ids = [] 808 | for item in view: 809 | views.append(item[0]) 810 | ids.append(item[2]) 811 | if item[1]: 812 | allies.append(item[1]) 813 | 814 | env.ally.clear() 815 | # Candidate ally 816 | for item in allies: 817 | env.ally[item[0]] = item[1] 818 | 819 | view = np.array(views) 820 | 821 | batch_views = [] 822 | 823 | for i in xrange(int(np.ceil(1. 
* env_agent_num / env_batch_size))): 824 | st = env_batch_size * i 825 | ed = st + env_batch_size 826 | if ed > env_agent_num: 827 | ed = env_agent_num 828 | 829 | # batch_view_tmp = view[st:ed] 830 | # batch_ids = ids[st:ed] 831 | batch_view = [] 832 | for j in xrange(st, ed): 833 | batch_view.append((ids[j], view[j])) 834 | 835 | batch_views.append(batch_view) 836 | 837 | return batch_views 838 | 839 | 840 | def _get_view(pos): 841 | x, y = pos 842 | range_l, range_f, face = env_property[env_map[x][y]][0] 843 | max_range_l, max_range_f, _ = env_max_view_size 844 | min_range_l, min_range_f, _ = env_min_view_size 845 | # single_view = np.zeros(((2 * max_range_l + 1) * (max_range_f + 1), 4), dtype=np.float32) 846 | single_view = np.zeros(((2 * max_range_l + 1) * (max_range_f + 1), 5), dtype=np.float32) 847 | env_ally = None 848 | 849 | def in_bound(x, y): 850 | return not (x < 0 or x >= env_h or y < 0 or y >= env_w) 851 | 852 | def in_group(id_1, id_2): 853 | if env_id_group[id_1] == env_id_group[id_2]: 854 | return True 855 | else: 856 | return False 857 | 858 | cur_pos = 0 859 | allies = set() 860 | face = 0 861 | if face == 0: 862 | # for i in xrange(-range_f, 1): 863 | # for j in xrange(-range_l, range_l + 1): 864 | for i in xrange(-max_range_f, 1): 865 | for j in xrange(-max_range_l, max_range_l + 1): 866 | new_x, new_y = x + i, y + j 867 | 868 | if not in_bound(new_x, new_y) or i < -range_f or j < -range_l or j > range_l: 869 | single_view[cur_pos] = [1, 1, 0, 0, 0] 870 | else: 871 | if env_id_group[env_map[x][y]] == 0 and env_map[new_x][new_y] > 0 and i in xrange(-min_range_f, 872 | 1) and j in xrange( 873 | -min_range_l, min_range_l + 1): 874 | allies.add(env_map[new_x][new_y]) 875 | single_view[cur_pos][0], single_view[cur_pos][1], single_view[cur_pos][2] = \ 876 | env_property[env_map[x][y]][1] 877 | if env_map[new_x][new_y] > 0 and in_group(env_map[x][y], env_map[new_x][new_y]): 878 | single_view[cur_pos][3] = 1 879 | # For exploring the reason of degroup 880 | if env_map[x][y] in env_id_ally_number: 881 | env_id_ally_number[env_map[x][y]] += 1 882 | else: 883 | env_id_ally_number[env_map[x][y]] = 1 884 | else: 885 | single_view[cur_pos][3] = 0 886 | 887 | # For health 888 | single_view[cur_pos][4] = env_health[env_map[x][y]] 889 | 890 | cur_pos = cur_pos + 1 891 | 892 | # TODO: the logic of join a group 893 | if len(allies) > 0: 894 | ally_id = random.sample(allies, 1)[0] 895 | id = env_map[x][y] 896 | if id != ally_id: 897 | env_ally = (id, ally_id) 898 | 899 | elif face == 1: 900 | # for i in xrange(-range_l, range_l + 1): 901 | # for j in xrange(0, range_f + 1): 902 | for i in xrange(-max_range_l, max_range_l + 1): 903 | for j in xrange(0, max_range_f + 1): 904 | new_x, new_y = x + i, y + j 905 | if not in_bound(new_x, new_y) or i < -range_l or i > range_l or j > range_f: 906 | single_view[cur_pos] = [1, 1, 0, 0, 0] 907 | else: 908 | if env_id_group[env_map[x][y]] == 0 and env_map[new_x][new_y] > 0 and i in xrange(-min_range_l, 909 | min_range_l + 1) and j in xrange( 910 | 0, min_range_f + 1): 911 | allies.add(env_map[new_x][new_y]) 912 | single_view[cur_pos][0], single_view[cur_pos][1], single_view[cur_pos][2] = \ 913 | env_property[env_map[x][y]][1] 914 | if env_map[new_x][new_y] > 0 and in_group(env_map[x][y], env_map[new_x][new_y]): 915 | single_view[cur_pos][3] = 1 916 | if env_map[x][y] in env_id_ally_number: 917 | env_id_ally_number[env_map[x][y]] += 1 918 | else: 919 | env_id_ally_number[env_map[x][y]] = 1 920 | else: 921 | single_view[cur_pos][3] = 0 922 | 923 | # 
For health 924 | single_view[cur_pos][4] = env_health[env_map[x][y]] 925 | 926 | cur_pos = cur_pos + 1 927 | if len(allies) > 0: 928 | ally_id = random.sample(allies, 1)[0] 929 | id = env_map[x][y] 930 | if id != ally_id: 931 | env_ally = (id, ally_id) 932 | 933 | elif face == 2: 934 | # range_i_st, range_i_ed = -range_f, 0 935 | # range_j_st, range_j_ed = -range_l, range_l 936 | # for i in xrange(range_f, -1): 937 | # for j in xrange(range_l, -range_l - 1): 938 | for i in xrange(max_range_f, -1, -1): 939 | for j in xrange(max_range_l, -max_range_l - 1, -1): 940 | new_x, new_y = x + i, y + j 941 | if not in_bound(new_x, new_y) or i > range_f or j > range_l or j < -range_l: 942 | single_view[cur_pos] = [1, 1, 0, 0, 0] 943 | else: 944 | if env_id_group[env_map[x][y]] == 0 and env_map[new_x][new_y] > 0 and i in xrange(min_range_f, -1, 945 | -1) and j in xrange( 946 | min_range_l, -min_range_l - 1, -1): 947 | allies.add(env_map[new_x][new_y]) 948 | single_view[cur_pos][0], single_view[cur_pos][1], single_view[cur_pos][2] = \ 949 | env_property[env_map[x][y]][1] 950 | if env_map[new_x][new_y] > 0 and in_group(env_map[x][y], env_map[new_x][new_y]): 951 | single_view[cur_pos][3] = 1 952 | if env_map[x][y] in env_id_ally_number: 953 | env_id_ally_number[env_map[x][y]] += 1 954 | else: 955 | env_id_ally_number[env_map[x][y]] = 1 956 | else: 957 | single_view[cur_pos][3] = 0 958 | # For health 959 | single_view[cur_pos][4] = env_health[env_map[x][y]] 960 | 961 | cur_pos = cur_pos + 1 962 | if len(allies) > 0: 963 | ally_id = random.sample(allies, 1)[0] 964 | id = env_map[x][y] 965 | if id != ally_id: 966 | env_ally = (id, ally_id) 967 | 968 | 969 | elif face == 3: 970 | # for i in xrange(range_l, -range_l - 1): 971 | # for j in xrange(-range_f, 1): 972 | for i in xrange(max_range_l, -max_range_l - 1, -1): 973 | for j in xrange(-max_range_f, 1): 974 | print "miaomiaomiao" 975 | new_x, new_y = x + i, y + j 976 | if not in_bound(new_x, new_y) or i > range_l or i < -range_l or j < -range_f: 977 | single_view[cur_pos] = [1, 1, 0, 0, 0] 978 | else: 979 | if env_id_group[env_map[x][y]] == 0 and env_map[new_x][new_y] > 0 and i in xrange(min_range_l, 980 | -min_range_l - 1, 981 | -1) and j in xrange( 982 | -min_range_f, 1): 983 | allies.add(env_map[new_x][new_y]) 984 | single_view[cur_pos][0], single_view[cur_pos][1], single_view[cur_pos][2] = \ 985 | env_property[env_map[x][y]][1] 986 | if env_map[new_x][new_y] > 0 and in_group(env_map[x][y], env_map[new_x][new_y]): 987 | single_view[cur_pos][3] = 1 988 | if env_map[x][y] in env_id_ally_number: 989 | env_id_ally_number[env_map[x][y]] += 1 990 | else: 991 | env_id_ally_number[env_map[x][y]] = 1 992 | else: 993 | single_view[cur_pos][3] = 0 994 | 995 | # For health 996 | single_view[cur_pos][4] = env_health[env_map[x][y]] 997 | 998 | cur_pos = cur_pos + 1 999 | if len(allies) > 0: 1000 | ally_id = random.sample(allies, 1)[0] 1001 | id = env_map[x][y] 1002 | if id != ally_id: 1003 | env_ally = (id, ally_id) 1004 | 1005 | else: 1006 | print "Error Face!!!" 
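    # Layout of the observation built above: every cell of the (maximum) view window contributes
    # 5 floats -- three color channels, a same-group flag and a health value -- and cells outside
    # the map or outside this agent's own view range are padded with [1, 1, 0, 0, 0]. The buffer
    # is allocated for the global maximum view size, so the flattened length is
    # (2 * max_range_l + 1) * (max_range_f + 1) * 5; together with the agent_emb_dim-dimensional
    # agent embedding this gives args.view_flat_size (see the assert in gen_agent,
    # e.g. 11 * 6 * 5 + 5 = 335 for the default 5-5 view).
    # While scanning, an ungrouped agent also collects the agents inside the minimum view range
    # as candidate allies; one of them is sampled and returned so that get_view can record it as
    # the join candidate used by action 7.
    # Note that face is reset to 0 a few lines above, so as written only the face == 0 branch
    # is exercised.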
1007 | assert cur_pos == (2 * max_range_l + 1) * (max_range_f + 1) 1008 | return single_view.reshape(-1), env_ally, env_map[x][y] 1009 | -------------------------------------------------------------------------------- /src/killprocess.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | for pid in `ps aux | grep main.py | awk '{print $2}'`; do 4 | kill -9 $pid 5 | done 6 | -------------------------------------------------------------------------------- /src/main.py: -------------------------------------------------------------------------------- 1 | import sys 2 | from env import Env, get_view, get_reward 3 | from Model import Model_DNN 4 | import argparse 5 | import tensorflow as tf 6 | import os 7 | import shutil 8 | import time 9 | 10 | if __name__ == '__main__': 11 | argparser = argparse.ArgumentParser(sys.argv[0]) 12 | # Environment 13 | argparser.add_argument('--add_pig_number', type=int, default=500) 14 | argparser.add_argument('--add_rabbit_number', type=int, default=500) 15 | argparser.add_argument('--add_every', type=int, default=500) 16 | 17 | argparser.add_argument('--random_seed', type=int, default=10, 18 | help='the random seed to generate the wall in the map') 19 | argparser.add_argument('--width', type=int, default=1000) 20 | argparser.add_argument('--height', type=int, default=1000) 21 | argparser.add_argument('--batch_size', type=int, default=32) 22 | argparser.add_argument('--view_args', type=str, default='2500-5-5-0,2500-5-5-1,2500-5-5-2,2500-5-5-3', 23 | help="num-leftView-frontView-orientation, separated by space") 24 | argparser.add_argument('--pig_max_number', type=int, default=3000) 25 | argparser.add_argument('--pig_min_number', type=int, default=1500) 26 | argparser.add_argument('--pig_increase_every', type=int, default=5) 27 | argparser.add_argument('--pig_increase_rate', type=float, default=0.001) 28 | argparser.add_argument('--rabbit_increase_rate', type=float, default=0.001) 29 | argparser.add_argument('--rabbit_max_number', type=int, default=3000) 30 | argparser.add_argument('--agent_increase_rate', type=float, default=0.001) 31 | argparser.add_argument('--pig_increase_policy', type=int, default=1, 32 | help='0: max_min; 1: increase every n timestep') 33 | argparser.add_argument('--reward_radius_rabbit', type=int, default=7) 34 | argparser.add_argument('--reward_radius_pig', type=int, default=7) 35 | argparser.add_argument('--reward_threshold_pig', type=int, default=5) 36 | argparser.add_argument('--img_length', type=int, default=5) 37 | argparser.add_argument('--images_dir', type=str, default='images') 38 | argparser.add_argument('--agent_mortal', type=int, default=0, 39 | help='0: immortal, 1: mortal') 40 | argparser.add_argument('--agent_emb_dim', type=int, default=5) 41 | argparser.add_argument('--agent_id', type=int, default=0, 42 | help='0: no id, 1: has id') 43 | argparser.add_argument('--damage_per_step', type=float, default=0.) 
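# view_flat_size (under the model arguments below) must match the observation built in env.py:
#   (2 * view_left + 1) * (view_front + 1) * 5 + agent_emb_dim
# e.g. the default 5-5 view gives (2*5+1) * (5+1) * 5 + 5 = 335, i.e. 66 * 5 + 5.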
44 | # model 45 | argparser.add_argument('--model_name', type=str, default='DNN') 46 | argparser.add_argument('--model_hidden_size', type=str, default='32,32') 47 | argparser.add_argument('--activations', type=str, default='sigmoid,sigmoid') 48 | argparser.add_argument('--view_flat_size', type=int, default=66 * 5 + 5) 49 | argparser.add_argument('--num_actions', type=int, default=9) 50 | argparser.add_argument('--reward_decay', type=float, default=0.9) 51 | argparser.add_argument('--save_every_round', type=int, default=10) 52 | argparser.add_argument('--save_dir', type=str, default='models') 53 | argparser.add_argument('--load_dir', type=str, default=None, 54 | help='e.g. models/round_0/model.ckpt') 55 | # Train 56 | argparser.add_argument('--video_dir', type=str, default='videos') 57 | argparser.add_argument('--video_per_round', type=int, default=None) 58 | argparser.add_argument('--round', type=int, default=100) 59 | argparser.add_argument('--time_step', type=int, default=10) 60 | argparser.add_argument('--policy', type=str, default='e_greedy') 61 | argparser.add_argument('--epsilon', type=float, default=0.1) 62 | argparser.add_argument('--agent_number', type=int, default=100) 63 | argparser.add_argument('--learning_rate', type=float, default=0.001) 64 | argparser.add_argument('--log_file', type=str, default='log.txt') 65 | argv = argparser.parse_args() 66 | 67 | argv.model_hidden_size = [int(x) for x in argv.model_hidden_size.split(',')] 68 | argv.view_args = [x for x in argv.view_args.split(',')] 69 | argv.activations = [x for x in argv.activations.split(',')] 70 | if argv.load_dir == 'None': 71 | argv.load_dir = None 72 | 73 | env = Env(argv) 74 | model = Model_DNN(argv) 75 | 76 | # Environment Initialization 77 | env.gen_wall(0.02, seed=argv.random_seed) 78 | env.gen_agent(argv.agent_number) 79 | 80 | # env.gen_pig(argv.pig_max_number) 81 | # env.gen_rabbit(argv.rabbit_max_number) 82 | 83 | config = tf.ConfigProto() 84 | config.gpu_options.allow_growth = True 85 | sess = tf.Session(config=config) 86 | sess.run(tf.global_variables_initializer()) 87 | 88 | if argv.load_dir: 89 | model.load(sess, argv.load_dir) 90 | print 'Load model from ' + argv.load_dir 91 | 92 | if not os.path.exists(argv.images_dir): 93 | os.mkdir(argv.images_dir) 94 | if not os.path.exists(argv.video_dir): 95 | os.mkdir(argv.video_dir) 96 | if not os.path.exists(argv.save_dir): 97 | os.mkdir(argv.save_dir) 98 | 99 | flip = 0 100 | 101 | log = open(argv.log_file, 'w') 102 | log_largest_group = open('log_largest_group.txt', 'w') 103 | for r in xrange(argv.round): 104 | video_flag = False 105 | if argv.video_per_round > 0 and r % argv.video_per_round == 0: 106 | video_flag = True 107 | img_dir = os.path.join(argv.images_dir, str(r)) 108 | try: 109 | os.makedirs(img_dir) 110 | except: 111 | shutil.rmtree(img_dir) 112 | os.makedirs(img_dir) 113 | for t in xrange(argv.time_step): 114 | if t == 0 and video_flag: 115 | env.dump_image(os.path.join(img_dir, '%d.png' % t)) 116 | 117 | view_batches = get_view(env) # s 118 | actions, actions_batches = model.infer_actions(sess, view_batches, policy=argv.policy, 119 | epsilon=argv.epsilon) # a 120 | 121 | env.take_action(actions) 122 | env.decrease_health() 123 | env.update_pig_pos() 124 | env.update_rabbit_pos() 125 | 126 | if video_flag: 127 | env.dump_image(os.path.join(img_dir, '%d.png' % (t + 1))) 128 | 129 | rewards = get_reward(env) # r, a dictionary 130 | env.increase_health(rewards) 131 | 132 | new_view_batches = get_view(env) # s' 133 | maxQ_batches = 
model.infer_max_action_values(sess, new_view_batches) 134 | 135 | model.train(sess=sess, 136 | view_batches=view_batches, 137 | actions_batches=actions_batches, 138 | rewards=rewards, 139 | maxQ_batches=maxQ_batches, 140 | learning_rate=argv.learning_rate) 141 | 142 | # dead_list = env.remove_dead_people() 143 | # model.remove_dead_agent_emb(dead_list) # remove agent embeddings 144 | 145 | cur_pig_num = env.get_pig_num() 146 | cur_rabbit_num = env.get_rabbit_num() 147 | group_num, mean_size, variance_size, max_size, group_view_num = env.group_monitor() 148 | info = 'Round\t%d\ttimestep\t%d\tPigNum\t%d\tgroup_num\t%d\tmean_size\t%f\tvariance_size\t%f\tmax_group_size\t%d\trabbitNum\t%d' % \ 149 | (r, t, cur_pig_num, group_num, mean_size, variance_size, max_size, cur_rabbit_num) 150 | if group_view_num is not None: 151 | for k in group_view_num: 152 | x = map(int, k[1:-1].split(',')) 153 | group_view_info = '\tView\t%d\tnumber\t%d' % ( 154 | (2 * x[0] + 1) * (x[1] + 1), group_view_num[k]) 155 | info += group_view_info 156 | print group_view_info 157 | info += '\tagent_num\t%d' % env.get_agent_num() 158 | 159 | join_num = 0 160 | leave_num = 0 161 | for item in actions: 162 | if item[1] == 7: 163 | join_num += 1 164 | elif item[1] == 8: 165 | leave_num += 1 166 | join_num = 1.0 * join_num / len(actions) 167 | leave_num = 1.0 * leave_num / len(actions) 168 | info += '\tjoin_ratio\t%f\tleave_ratio\t%f' % (join_num, leave_num) 169 | 170 | print info 171 | 172 | # print 'average degroup number:\t', avg_degroup 173 | log.write(info + '\n') 174 | log.flush() 175 | 176 | # largest_group_pos = env.track_largest_group(time_step=r * argv.round + t, update_largest_every=200) 177 | # pos_info = [] 178 | # for item in largest_group_pos: 179 | # pos_info.append(str(item[0]) + ',' + str(item[1])) 180 | # log_largest_group.write('\t'.join(pos_info) + '\n') 181 | # log_largest_group.flush() 182 | 183 | # if argv.pig_increase_policy == 0: 184 | # if cur_pig_num < argv.pig_min_number: 185 | # env.gen_pig(argv.pig_max_number - cur_pig_num) 186 | # elif argv.pig_increase_policy == 1: 187 | # if t % argv.pig_increase_every == 0: 188 | # env.gen_pig(max(1, int(env.get_pig_num() * argv.pig_increase_rate))) 189 | # elif argv.pig_increase_policy == 2: 190 | # env.gen_pig(10) 191 | # 192 | # env.gen_rabbit(max(10, int(env.get_rabbit_num() * argv.rabbit_increase_rate))) 193 | # env.grow_agent(max(1, int(env.get_agent_num() * argv.agent_increase_rate))) 194 | 195 | # if (r * argv.time_step + t) % argv.add_every == 0: 196 | # if flip: 197 | # env.gen_pig(argv.add_pig_number) 198 | # else: 199 | # env.gen_rabbit(argv.add_rabbit_number) 200 | # flip ^= 1 201 | 202 | if flip: 203 | if env.get_rabbit_num() < 1000: 204 | env.gen_pig(argv.pig_max_number - env.get_pig_num()) 205 | flip ^= 1 206 | else: 207 | if env.get_pig_num() < 2000: 208 | env.gen_rabbit(argv.rabbit_max_number - env.get_rabbit_num()) 209 | flip ^= 1 210 | 211 | if argv.save_every_round and r % argv.save_every_round == 0: 212 | if not os.path.exists(os.path.join(argv.save_dir, "round_%d" % r)): 213 | os.mkdir(os.path.join(argv.save_dir, "round_%d" % r)) 214 | model_path = os.path.join(argv.save_dir, "round_%d" % r, "model.ckpt") 215 | model.save(sess, model_path) 216 | print 'model saved into ' + model_path 217 | if video_flag: 218 | images = [os.path.join(img_dir, ("%d.png" % i)) for i in range(argv.time_step + 1)] 219 | env.make_video(images=images, outvid=os.path.join(argv.video_dir, "%d.avi" % r)) 220 | log.close() 221 | 
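Note on the loop above: main.py runs one-step Q-learning. get_view builds the state batches (s), infer_actions picks epsilon-greedy actions (a), get_reward returns the per-agent rewards (r), and infer_max_action_values evaluates the next-state action values that model.train combines with the rewards (discounted by reward_decay) into the regression target. Below is a minimal NumPy sketch of that target and the squared TD error; the names are illustrative and not part of this repository.

import numpy as np

def q_learning_loss(q_values, actions, rewards, max_q_next, reward_decay=0.9):
    # Mean squared TD error: (r + gamma * max_a' Q(s', a') - Q(s, a)) ** 2
    idx = np.arange(len(actions))
    q_sa = q_values[idx, actions]                  # Q(s, a) of the taken actions
    targets = rewards + reward_decay * max_q_next  # one-step bootstrapped target
    return np.mean((targets - q_sa) ** 2)

# Toy batch: 3 agents, num_actions = 9 as in the training scripts.
q = np.random.rand(3, 9)
a = np.array([1, 7, 8])          # move forward, join a group, leave a group
r = np.array([0., 1., 0.])
max_q_next = np.random.rand(3)
print(q_learning_loss(q, a, r, max_q_next))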
-------------------------------------------------------------------------------- /src/plot.py: -------------------------------------------------------------------------------- 1 | import matplotlib.pyplot as plt 2 | 3 | log_file = 'log.txt' 4 | 5 | # 'Round %d timestep %d group_num %d average_group_size %f max_group_size %d' 6 | 7 | pig_num = [] 8 | group_num = [] 9 | mean_size = [] 10 | variance_size = [] 11 | max_size = [] 12 | 13 | with open(log_file)as fin: 14 | for line in fin: 15 | line = line.split() 16 | pig_num.append(line[5]) 17 | group_num.append(line[7]) 18 | mean_size.append(line[9]) 19 | variance_size.append(line[11]) 20 | max_size.append(line[13]) 21 | 22 | x = range(len(pig_num)) 23 | 24 | plt.figure(figsize=(8, 6)) 25 | plt.plot(x, pig_num, label='pig number') 26 | plt.xlabel('timestep') 27 | plt.ylabel('pig number') 28 | plt.grid() 29 | plt.legend(['pig number'], loc='upper left') 30 | plt.savefig('pig number.pdf') 31 | 32 | plt.figure(figsize=(8, 6)) 33 | plt.plot(x, group_num, label='group number') 34 | plt.xlabel('timestep') 35 | plt.ylabel('group number') 36 | plt.grid() 37 | plt.legend(['group number'], loc='upper left') 38 | plt.savefig('group number.pdf') 39 | 40 | plt.figure(figsize=(8, 6)) 41 | plt.plot(x, mean_size, label='mean size') 42 | plt.xlabel('timestep') 43 | plt.ylabel('mean size') 44 | plt.grid() 45 | plt.legend(['mean size'], loc='upper left') 46 | plt.savefig('mean size.pdf') 47 | 48 | plt.figure(figsize=(8, 6)) 49 | plt.plot(x, variance_size, label='group number') 50 | plt.xlabel('timestep') 51 | plt.ylabel('variance size') 52 | plt.grid() 53 | plt.legend(['variance size'], loc='upper left') 54 | plt.savefig('variance size.pdf') 55 | 56 | plt.figure(figsize=(8, 6)) 57 | plt.plot(x, max_size, label='max size') 58 | plt.xlabel('timestep') 59 | plt.ylabel('max size') 60 | plt.grid() 61 | plt.legend(['max size'], loc='upper left') 62 | plt.savefig('max size.pdf') 63 | -------------------------------------------------------------------------------- /src/plot_circle.py: -------------------------------------------------------------------------------- 1 | import matplotlib.pyplot as plt 2 | 3 | log_file = 'log.txt' 4 | 5 | # 'Round %d timestep %d group_num %d average_group_size %f max_group_size %d' 6 | 7 | group_num = [] 8 | average_group_size = [] 9 | max_group_size = [] 10 | agent_num = [] 11 | pig_num = [] 12 | 13 | with open(log_file)as fin: 14 | for line in fin: 15 | line = line.split() 16 | pig_num.append(int(line[5])) 17 | agent_num.append(int(line[18])) 18 | 19 | x = range(len(agent_num)) 20 | x = x[8000:] 21 | agent_num = agent_num[8000:] 22 | pig_num = pig_num[8000:] 23 | 24 | length = len(x) 25 | 26 | agent_num_avg = [] 27 | pig_num_avg = [] 28 | 29 | for i in xrange(0, length, 10): 30 | agent_tot = 0 31 | pig_tot = 0 32 | for j in xrange(i, min(i + 10, len(x))): 33 | agent_tot += agent_num[j] 34 | pig_tot += pig_num[j] 35 | 36 | agent_tot = 1. * agent_tot / 10. 37 | pig_tot = 1. * pig_tot / 10. 
38 | agent_num_avg.append(agent_tot) 39 | pig_num_avg.append(pig_tot) 40 | 41 | 42 | 43 | 44 | print agent_num 45 | print pig_num 46 | 47 | plt.figure(figsize=(8, 6)) 48 | #plt.plot(x, agent_num, label='agent number') 49 | #plt.plot(x, pig_num, label='pig number') 50 | plt.plot(agent_num_avg, pig_num_avg, label='number') 51 | #plt.plot(agent_num_avg, pig_num_avg, label='number') 52 | plt.xlabel('agent number') 53 | plt.ylabel('pig number') 54 | plt.legend(['agent number', 'pig number'], loc='upper left') 55 | plt.grid() 56 | plt.savefig('two species.pdf') 57 | 58 | #plt.figure(figsize=(8, 6)) 59 | #plt.plot(x, group_num, label='Number of groups') 60 | #plt.xlabel('timestep') 61 | #plt.ylabel('group number') 62 | #plt.grid() 63 | #plt.savefig('group_num.pdf') 64 | # 65 | #plt.figure(figsize=(8, 6)) 66 | #plt.plot(x, average_group_size, label='Average group size') 67 | #plt.xlabel('timestep') 68 | #plt.ylabel('average size') 69 | #plt.grid() 70 | #plt.savefig('avg_group_size.pdf') 71 | # 72 | #plt.figure(figsize=(8, 6)) 73 | #plt.plot(x, max_group_size, label='Max group size') 74 | #plt.xlabel('timestep') 75 | #plt.ylabel('Max size') 76 | #plt.grid() 77 | #plt.savefig('max_group_size.pdf') 78 | -------------------------------------------------------------------------------- /src/plot_largest_group.py: -------------------------------------------------------------------------------- 1 | import matplotlib.pyplot as plt 2 | import numpy as np 3 | from cv2 import VideoWriter, imread, resize 4 | import cv2 5 | import os 6 | import numpy as np 7 | 8 | 9 | def make_video(images, outvid=None, fps=5, size=None, is_color=True, format="XVID"): 10 | """ 11 | Create a video from a list of images. 12 | @param outvid output video 13 | @param images list of images to use in the video 14 | @param fps frame per second 15 | @param size size of each frame 16 | @param is_color color 17 | @param format see http://www.fourcc.org/codecs.php 18 | """ 19 | # fourcc = VideoWriter_fourcc(*format) 20 | # For opencv2 and opencv3: 21 | if int(cv2.__version__[0]) > 2: 22 | fourcc = cv2.VideoWriter_fourcc(*format) 23 | else: 24 | fourcc = cv2.cv.CV_FOURCC(*format) 25 | vid = None 26 | for image in images: 27 | assert os.path.exists(image) 28 | img = imread(image) 29 | if vid is None: 30 | if size is None: 31 | size = img.shape[1], img.shape[0] 32 | vid = VideoWriter(outvid, fourcc, float(fps), size, is_color) 33 | if size[0] != img.shape[1] and size[1] != img.shape[0]: 34 | img = resize(img, size) 35 | vid.write(img) 36 | vid.release() 37 | 38 | 39 | t = 0 40 | start_step = 20000 41 | target_step = 26000 42 | width = 1000 43 | height = 1000 44 | 45 | # with open('log_largest_group.txt') as fin: 46 | # for line in fin: 47 | # t += 1 48 | # if t > target_step: 49 | # exit(0) 50 | # if t >= start_step: 51 | # x = [] 52 | # y = [] 53 | # line = line.split() 54 | # for item in line: 55 | # item = item.split(',') 56 | # x.append(int(item[0])) 57 | # y.append(int(item[1])) 58 | # plt.figure(figsize=(8, 6)) 59 | # plt.scatter(x, y, marker='o', color='r') 60 | # 61 | # x_min_bound = (np.min(x) / 100) * 100 62 | # x_max_bound = (np.max(x) / 100 + 1) * 100 if np.max(x) % 100 != 0 else (np.max(x) / 100) * 100 63 | # plt.xlim(x_min_bound, x_max_bound) 64 | # 65 | # y_min_bound = (np.min(y) / 100) * 100 66 | # y_max_bound = (np.max(y) / 100 + 1) * 100 if np.max(y) % 100 != 0 else (np.max(y) / 100) * 100 67 | # plt.ylim(y_min_bound, y_max_bound) 68 | # 69 | # plt.title('largest group size: %d' % len(x)) 70 | # plt.grid() 71 | # 
plt.savefig('largest_group/%d.png' % t) 72 | 73 | images = ['largest_group/%d.png' % i for i in xrange(start_step, target_step + 1)] 74 | make_video(images=images, outvid='largest_group_%d_%d.avi' % (start_step, target_step), fps=10) 75 | -------------------------------------------------------------------------------- /src/plot_number.py: -------------------------------------------------------------------------------- 1 | import matplotlib.pyplot as plt 2 | 3 | log_file = 'log.txt' 4 | 5 | # 'Round %d timestep %d group_num %d average_group_size %f max_group_size %d' 6 | 7 | agent_num = [] 8 | pig_num = [] 9 | rabbit_num = [] 10 | join_ratio = [] 11 | leave_ratio = [] 12 | group_proportion = [] 13 | 14 | with open(log_file)as fin: 15 | for line in fin: 16 | line = line.split() 17 | pig_num.append(int(line[5])) 18 | rabbit_num.append(int(line[15])) 19 | agent_num.append(int(line[21])) 20 | group_num = int(line[7]) 21 | mean_size = float(line[9]) 22 | grouped_agents_num = group_num * mean_size 23 | group_proportion.append(1.0 * grouped_agents_num / int(line[21])) 24 | 25 | x = range(len(agent_num)) 26 | 27 | st = 4000 28 | ed = 5000 29 | x = x[st:ed] 30 | agent_num = agent_num[st:ed] 31 | pig_num = pig_num[st:ed] 32 | rabbit_num = rabbit_num[st:ed] 33 | group_proportion = group_proportion[st:ed] 34 | 35 | plt.figure(figsize=(8, 6)) 36 | ax1 = plt.gca() 37 | ax2 = ax1.twinx() 38 | 39 | # ax1.plot(x, agent_num, label='agent number') 40 | ax1.plot(x, pig_num, color='r',label='pig number') 41 | ax1.plot(x, rabbit_num, color='b', label='rabbit number') 42 | ax1.set_xlabel('time step') 43 | ax1.set_ylabel('number') 44 | ax1.legend(['pig number', 'rabbit number'], loc='upper left') 45 | 46 | ax2.plot(x, group_proportion, color='y', label='group proportion') 47 | ax2.set_ylabel('group proportion') 48 | plt.grid() 49 | 50 | plt.savefig('three species and group proportion from %d to %d.pdf' % (st, ed)) 51 | plt.savefig('three species and group proportion all.pdf') 52 | 53 | 54 | # plt.figure(figsize=(8, 6)) 55 | # plt.plot(x, agent_num, label='agent number') 56 | # plt.plot(x, pig_num, label='pig number') 57 | # plt.plot(x, rabbit_num, label='rabbit number') 58 | # plt.xlabel('time step') 59 | # plt.ylabel('number') 60 | # plt.legend(['agent number', 'pig number', 'rabbit number'], loc='upper left') 61 | # plt.grid() 62 | # plt.savefig('three species.pdf') 63 | -------------------------------------------------------------------------------- /src/readme.md: -------------------------------------------------------------------------------- 1 | ## Train script Introduction 2 | * train_1M.sh: train 1 million agents 3 | * train_10000_multiview.sh: train agents with different views; each view has 10000 agents 4 | * train_10000_pigincrease.sh: only one kind of view; change only the pig_increase_number parameter to study the effect of different pig increase speeds 5 | * train_10000_minor_multiview.sh: train agents with only two different views (264, 66): 100 agents for the large view (264) and 9900 agents for the small view 6 | -------------------------------------------------------------------------------- /src/train_10000_pig_rabbit_add.sh: -------------------------------------------------------------------------------- 1 | add_pig_number=500 2 | add_rabbit_number=500 3 | add_every=500 4 | 5 | random_seed=10 6 | width=1000 7 | height=1000 8 | batch_size=32 9 | view_args=2500-5-5-0,2500-5-5-1,2500-5-5-2,2500-5-5-3 10 | pig_max_number=5000 11 | pig_min_number=2000 12 | pig_increase_every=1 13 | 
pig_increase_number=10 14 | pig_increase_policy=1 15 | agent_increase_rate=0.003 16 | pig_increase_rate=0.006 17 | rabbit_increase_rate=0.008 18 | rabbit_max_number=30000 19 | reward_radius_pig=7 20 | reward_threshold_pig=3 21 | reward_radius_rabbit=2 22 | img_length=5 23 | images_dir=images 24 | agent_mortal=1 25 | agent_emb_dim=5 26 | agent_id=1 27 | damage_per_step=0.01 28 | 29 | model_name=DNN 30 | model_hidden_size=32,32 31 | activations=sigmoid,sigmoid 32 | view_flat_size=335 33 | num_actions=9 34 | reward_decay=0.9 35 | save_every_round=10 36 | save_dir=models 37 | load_dir=None 38 | 39 | video_dir=videos 40 | video_per_round=0 41 | round=100 42 | time_step=500 43 | policy=e_greedy 44 | epsilon=0.1 45 | agent_number=10000 46 | learning_rate=0.001 47 | log_file=log.txt 48 | 49 | python main.py --add_pig_number $add_pig_number --add_rabbit_number $add_rabbit_number --add_every $add_every --random_seed $random_seed --width $width --height $height --batch_size $batch_size --view_args $view_args --pig_max_number $pig_max_number --pig_min_number $pig_min_number --pig_increase_every $pig_increase_every --pig_increase_policy $pig_increase_policy --agent_increase_rate $agent_increase_rate --pig_increase_rate $pig_increase_rate --rabbit_increase_rate $rabbit_increase_rate --rabbit_max_number $rabbit_max_number --reward_radius_pig $reward_radius_pig --reward_threshold_pig $reward_threshold_pig --reward_radius_rabbit $reward_radius_rabbit --img_length $img_length --images_dir $images_dir --agent_mortal $agent_mortal --agent_emb_dim $agent_emb_dim --agent_id $agent_id --damage_per_step $damage_per_step --model_name $model_name --model_hidden_size $model_hidden_size --activations $activations --view_flat_size $view_flat_size --num_actions $num_actions --reward_decay $reward_decay --save_every_round $save_every_round --save_dir $save_dir --load_dir $load_dir --video_dir $video_dir --video_per_round $video_per_round --round $round --time_step $time_step --policy $policy --epsilon $epsilon --agent_number $agent_number --learning_rate $learning_rate --log_file $log_file 50 | -------------------------------------------------------------------------------- /src/train_1M.sh: -------------------------------------------------------------------------------- 1 | random_seed=10 2 | width=10000 3 | height=10000 4 | batch_size=256 5 | view_args=250000-5-5-0,250000-5-5-1,250000-5-5-2,250000-5-5-3 6 | pig_max_number=500000 7 | pig_min_number=200000 8 | pig_increase_every=1 9 | pig_increase_number=10 10 | pig_increase_policy=1 11 | agent_increase_rate=0.004 12 | pig_increase_rate=0.006 13 | reward_radius=5 14 | reward_threshold=3 15 | img_length=5 16 | images_dir=images 17 | agent_mortal=1 18 | agent_emb_dim=5 19 | agent_id=1 20 | damage_per_step=0.01 21 | 22 | model_name=DNN 23 | model_hidden_size=32,32 24 | activations=sigmoid,sigmoid 25 | view_flat_size=335 26 | num_actions=9 27 | reward_decay=0.9 28 | save_every_round=1 29 | save_dir=models 30 | load_dir=models/round_90/model.ckpt 31 | 32 | video_dir=videos 33 | video_per_round=0 34 | round=100 35 | time_step=100 36 | policy=e_greedy 37 | epsilon=0.1 38 | agent_number=1000000 39 | learning_rate=0.001 40 | log_file=log.txt 41 | 42 | python main.py --random_seed $random_seed --width $width --height $height --batch_size $batch_size --view_args $view_args --pig_max_number $pig_max_number --pig_min_number $pig_min_number --pig_increase_every $pig_increase_every --pig_increase_policy $pig_increase_policy --agent_increase_rate $agent_increase_rate 
--pig_increase_rate $pig_increase_rate --reward_radius_pig $reward_radius --reward_threshold_pig $reward_threshold --img_length $img_length --images_dir $images_dir --agent_mortal $agent_mortal --agent_emb_dim $agent_emb_dim --agent_id $agent_id --damage_per_step $damage_per_step --model_name $model_name --model_hidden_size $model_hidden_size --activations $activations --view_flat_size $view_flat_size --num_actions $num_actions --reward_decay $reward_decay --save_every_round $save_every_round --save_dir $save_dir --load_dir $load_dir --video_dir $video_dir --video_per_round $video_per_round --round $round --time_step $time_step --policy $policy --epsilon $epsilon --agent_number $agent_number --learning_rate $learning_rate --log_file $log_file 43 | --------------------------------------------------------------------------------