├── CITATION.cff ├── DQN ├── Images │ ├── AvgLosses_sample_DQN_agent.png │ ├── AvgTestScore_sample_DQN_agent.png │ ├── Epsilon_sample_DQN_agent.png │ └── ModelEvalNSFNET.pdf ├── Logs │ └── expsample_DQN_agentLogs.txt ├── README.md ├── evaluate_DQN.py ├── gym-environments │ ├── gym_environments │ │ ├── __init__.py │ │ └── envs │ │ │ ├── __init__.py │ │ │ └── environment1.py │ └── setup.py ├── modelssample_DQN_agent │ ├── checkpoint │ ├── ckpt-349.data-00000-of-00001 │ └── ckpt-349.index ├── mpnn.py ├── parse.py ├── requirements.txt └── train_DQN.py ├── LICENSE └── README.md /CITATION.cff: -------------------------------------------------------------------------------- 1 | cff-version: 1.2.0 2 | message: "If you use this software, please cite it as below." 3 | authors: 4 | - family-names: "Almasan" 5 | given-names: "Paul" 6 | orcid: "https://orcid.org/0000-0003-3903-6759" 7 | title: "Code of DRL+GNN architecture in OTN" 8 | version: 1.0 9 | date-released: 2021-11-22 10 | url: "https://github.com/knowledgedefinednetworking/DRL-GNN" 11 | -------------------------------------------------------------------------------- /DQN/Images/AvgLosses_sample_DQN_agent.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/knowledgedefinednetworking/DRL-GNN/e3bc32bc6b65c1b6df570aee23bfe304fc4ebe0a/DQN/Images/AvgLosses_sample_DQN_agent.png -------------------------------------------------------------------------------- /DQN/Images/AvgTestScore_sample_DQN_agent.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/knowledgedefinednetworking/DRL-GNN/e3bc32bc6b65c1b6df570aee23bfe304fc4ebe0a/DQN/Images/AvgTestScore_sample_DQN_agent.png -------------------------------------------------------------------------------- /DQN/Images/Epsilon_sample_DQN_agent.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/knowledgedefinednetworking/DRL-GNN/e3bc32bc6b65c1b6df570aee23bfe304fc4ebe0a/DQN/Images/Epsilon_sample_DQN_agent.png -------------------------------------------------------------------------------- /DQN/Images/ModelEvalNSFNET.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/knowledgedefinednetworking/DRL-GNN/e3bc32bc6b65c1b6df570aee23bfe304fc4ebe0a/DQN/Images/ModelEvalNSFNET.pdf -------------------------------------------------------------------------------- /DQN/README.md: -------------------------------------------------------------------------------- 1 | # Instructions to execute 2 | 3 | 1. First, create the virtual environment and activate the environment. 4 | ```ruby 5 | virtualenv -p python3 myenv 6 | source myenv/bin/activate 7 | ``` 8 | 9 | 2. Then, we install all the required packages. 10 | ```ruby 11 | pip install -r requirements.txt 12 | ``` 13 | 14 | 3. Register custom gym environment. 15 | ```ruby 16 | pip install -e gym-environments/ 17 | ``` 18 | 19 | 4. Now we are ready to train a DQN agent. To do this, we must execute the following command. Notice that inside the *train_DQN.py* there are different hyperparameters that you can configure to set the training for different topologies, to define the size of the GNN model, etc. 20 | ```ruby 21 | python train_DQN.py 22 | ``` 23 | 24 | 5. Now that the training process is executing, we can see the DQN agent performance evolution by parsing the log files. 
25 | ```ruby 26 | python parse.py -d ./Logs/expsample_DQN_agentLogs.txt 27 | ``` 28 | 29 | 6. Finally, we can evaluate our trained model on different topologies executing the command below. Notice that in the *evaluate_DQN.py* script you must modify the hyperparameters of the model to match the ones from the trained model. 30 | ```ruby 31 | python evaluate_DQN.py -d ./Logs/expsample_DQN_agentLogs.txt 32 | ``` -------------------------------------------------------------------------------- /DQN/evaluate_DQN.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import gym 3 | import os 4 | import gym_environments 5 | import networkx as nx 6 | import random 7 | import matplotlib.pyplot as plt 8 | import argparse 9 | import mpnn as gnn 10 | from collections import deque 11 | import tensorflow as tf 12 | 13 | os.environ['CUDA_VISIBLE_DEVICES'] = '-1' 14 | 15 | ENV_NAME_AGENT = 'GraphEnv-v1' 16 | ENV_NAME = 'GraphEnv-v1' 17 | 18 | SEED = 9 19 | os.environ['PYTHONHASHSEED']=str(SEED) 20 | np.random.seed(SEED) 21 | tf.random.set_seed(1) 22 | 23 | # Force TensorFlow to use single thread. 24 | # Multiple threads are a potential source of non-reproducible results. 25 | # For further details, see: https://stackoverflow.com/questions/42022950/ 26 | # tf.config.threading.set_inter_op_parallelism_threads(1) 27 | # tf.config.threading.set_intra_op_parallelism_threads(1) 28 | 29 | NUMBER_EPISODES = 50 30 | # We assume that the number of samples is always larger than the number of demands any agent can ever allocate 31 | NUM_SAMPLES_EPSD = 100 32 | 33 | # Set evaluation topology 34 | graph_topology = 0 # 0==NSFNET, 1==GEANT2, 2==Small Topology, 3==GBN 35 | listofDemands = [8, 32, 64] 36 | 37 | hparams = { 38 | 'l2': 0.1, 39 | 'dropout_rate': 0.01, 40 | 'link_state_dim': 20, 41 | 'readout_units': 35, 42 | 'learning_rate': 0.0001, 43 | 'batch_size': 32, 44 | 'T': 4, 45 | 'num_demands': len(listofDemands) 46 | } 47 | 48 | class SAPAgent: 49 | # Shortest Available Path 50 | # Select the shortest available path among the K paths 51 | def __init__(self): 52 | self.K = 4 53 | 54 | def act(self, env, state, demand, n1, n2): 55 | pathList = env.allPaths[str(n1) +':'+ str(n2)] 56 | path = 0 57 | allocated = 0 # Indicates 1 if we allocated the demand, 0 otherwise 58 | new_state = np.copy(state) 59 | while allocated==0 and path < len(pathList) and path take always the action that the agent is saying has higher q-value 195 | # Otherwise, we are training with normal epsilon-greedy strategy 196 | if flagEvaluation: 197 | # If evaluation, compute K=4 q-values and take the maxium value 198 | takeMax_epsilon = True 199 | else: 200 | # If training, compute epsilon-greedy 201 | z = np.random.random() 202 | if z > self.epsilon: 203 | # Compute K=4 q-values and pick the one with highest value 204 | # In case of multiple same max values, return the first one 205 | takeMax_epsilon = True 206 | else: 207 | # Pick a random path and compute only one q-value 208 | path = np.random.randint(0, len(pathList)) 209 | action = path 210 | 211 | # 2. Allocate (S,D, linkDemand) demand using the K shortest paths 212 | while path < len(pathList): 213 | state_copy = np.copy(state) 214 | currentPath = pathList[path] 215 | i = 0 216 | j = 1 217 | 218 | # 3. 
Iterate over paths' pairs of nodes and allocate demand to bw_allocated 219 | while (j < len(currentPath)): 220 | state_copy[env.edgesDict[str(currentPath[i]) + ':' + str(currentPath[j])]][1] = demand 221 | i = i + 1 222 | j = j + 1 223 | 224 | # 4. Add allocated graphs' features to the list. Later we will compute their q-values using cummax 225 | listGraphs.append(state_copy) 226 | features = self.get_graph_features(env, state_copy) 227 | list_k_features.append(features) 228 | 229 | if not takeMax_epsilon: 230 | # If we don't need to compute the K=4 q-values we exit 231 | break 232 | 233 | path = path + 1 234 | 235 | vs = [v for v in list_k_features] 236 | 237 | # We compute the graphs_ids to later perform the unsorted_segment_sum for each graph and obtain the 238 | # link hidden states for each graph. 239 | graph_ids = [tf.fill([tf.shape(vs[it]['link_state'])[0]], it) for it in range(len(list_k_features))] 240 | first_offset = cummax(vs, lambda v: v['first']) 241 | second_offset = cummax(vs, lambda v: v['second']) 242 | 243 | tensors = ({ 244 | 'graph_id': tf.concat([v for v in graph_ids], axis=0), 245 | 'link_state': tf.concat([v['link_state'] for v in vs], axis=0), 246 | 'first': tf.concat([v['first'] + m for v, m in zip(vs, first_offset)], axis=0), 247 | 'second': tf.concat([v['second'] + m for v, m in zip(vs, second_offset)], axis=0), 248 | 'num_edges': tf.math.add_n([v['num_edges'] for v in vs]), 249 | } 250 | ) 251 | 252 | # Predict qvalues for all graphs within tensors 253 | self.listQValues = self.primary_network(tensors['link_state'], tensors['graph_id'], tensors['first'], 254 | tensors['second'], tensors['num_edges'], training=False).numpy() 255 | 256 | if takeMax_epsilon: 257 | # We take the path with highest q-value 258 | action = np.argmax(self.listQValues) 259 | else: 260 | return path, list_k_features[0] 261 | 262 | return action, list_k_features[action] 263 | 264 | def get_graph_features(self, env, copyGraph): 265 | """ 266 | We iterate over the converted graph nodes and take the features. The capacity and bw allocated features 267 | are normalized on the fly. 
268 | """ 269 | self.bw_demand_feature.fill(0.0) 270 | self.capacity_feature = (copyGraph[:,0] - 100.00000001) / 200.0 271 | 272 | itera = 0 273 | for i in copyGraph[:, 1]: 274 | if i == 8: 275 | self.bw_demand_feature[itera][0] = 1 276 | elif i == 32: 277 | self.bw_demand_feature[itera][1] = 1 278 | elif i == 64: 279 | self.bw_demand_feature[itera][2] = 1 280 | itera = itera + 1 281 | 282 | sample = { 283 | 'num_edges': env.numEdges, 284 | 'length': env.firstTrueSize, 285 | 'betweenness': tf.convert_to_tensor(value=env.between_feature, dtype=tf.float32), 286 | 'bw_allocated': tf.convert_to_tensor(value=self.bw_demand_feature, dtype=tf.float32), 287 | 'capacities': tf.convert_to_tensor(value=self.capacity_feature, dtype=tf.float32), 288 | 'first': tf.convert_to_tensor(env.first, dtype=tf.int32), 289 | 'second': tf.convert_to_tensor(env.second, dtype=tf.int32) 290 | } 291 | 292 | sample['capacities'] = tf.reshape(sample['capacities'][0:sample['num_edges']], [sample['num_edges'], 1]) 293 | sample['betweenness'] = tf.reshape(sample['betweenness'][0:sample['num_edges']], [sample['num_edges'], 1]) 294 | 295 | hiddenStates = tf.concat([sample['capacities'], sample['betweenness'], sample['bw_allocated']], axis=1) 296 | 297 | paddings = tf.constant([[0, 0], [0, hparams['link_state_dim'] - 2 - hparams['num_demands']]]) 298 | link_state = tf.pad(tensor=hiddenStates, paddings=paddings, mode="CONSTANT") 299 | 300 | inputs = {'link_state': link_state, 'first': sample['first'][0:sample['length']], 301 | 'second': sample['second'][0:sample['length']], 'num_edges': sample['num_edges']} 302 | 303 | return inputs 304 | 305 | def exec_lb_model_episodes(experience_memory, graph_topology): 306 | env_lb = gym.make(ENV_NAME) 307 | env_lb.seed(SEED) 308 | env_lb.generate_environment(graph_topology, listofDemands) 309 | 310 | agent = LBAgent() 311 | rewards_lb = np.zeros(NUMBER_EPISODES) 312 | 313 | rewardAdd = 0 314 | reward_it = 0 315 | iter_episode = 0 # Iterates over samples within the same episode 316 | new_episode = True 317 | wait_for_new_episode = False 318 | new_episode_it = 0 # Iterates over EPISODES 319 | while iter_episode < len(experience_memory): 320 | if new_episode: 321 | new_episode = False 322 | demand = experience_memory[iter_episode][1] 323 | source = experience_memory[iter_episode][2] 324 | destination = experience_memory[iter_episode][3] 325 | state = env_lb.eval_sap_reset(demand, source, destination) 326 | 327 | action = agent.act(env_lb, state, demand, source, destination) 328 | new_state, reward, done, _, _, _ = env_lb.make_step(state, action, demand, source, destination) 329 | env_lb.demand = demand 330 | env_lb.source = source 331 | env_lb.destination = destination 332 | rewardAdd = rewardAdd + reward 333 | state = new_state 334 | 335 | if done: 336 | rewards_lb[reward_it] = rewardAdd 337 | reward_it = reward_it + 1 338 | wait_for_new_episode = True 339 | 340 | iter_episode = iter_episode + 1 341 | else: 342 | if experience_memory[iter_episode][0] != new_episode_it: 343 | print("LB ERROR! 
The experience replay buffer needs more samples/episode") 344 | os.kill(os.getpid(), 9) 345 | 346 | demand = experience_memory[iter_episode][1] 347 | source = experience_memory[iter_episode][2] 348 | destination = experience_memory[iter_episode][3] 349 | action = agent.act(env_lb, state, demand, source, destination) 350 | new_state, reward, done, _, _, _ = env_lb.make_step(state, action, demand, source, destination) 351 | env_lb.demand = demand 352 | env_lb.source = source 353 | env_lb.destination = destination 354 | rewardAdd = rewardAdd + reward 355 | state = new_state 356 | 357 | if done: 358 | rewards_lb[reward_it] = rewardAdd 359 | reward_it = reward_it + 1 360 | wait_for_new_episode = True 361 | 362 | iter_episode = iter_episode + 1 363 | if wait_for_new_episode: 364 | rewardAdd = 0 365 | wait_for_new_episode = False 366 | new_episode = True 367 | new_episode_it = new_episode_it + 1 368 | iter_episode = new_episode_it*NUM_SAMPLES_EPSD 369 | return rewards_lb 370 | 371 | def exec_sap_model_episodes(experience_memory, graph_topology): 372 | env_sap = gym.make(ENV_NAME) 373 | env_sap.seed(SEED) 374 | env_sap.generate_environment(graph_topology, listofDemands) 375 | 376 | agent = SAPAgent() 377 | rewards_sap = np.zeros(NUMBER_EPISODES) 378 | 379 | rewardAdd = 0 380 | reward_it = 0 381 | iter_episode = 0 # Iterates over samples within the same episode 382 | new_episode = True 383 | wait_for_new_episode = False 384 | new_episode_it = 0 # Iterates over EPISODES 385 | while iter_episode < len(experience_memory): 386 | if new_episode: 387 | new_episode = False 388 | demand = experience_memory[iter_episode][1] 389 | source = experience_memory[iter_episode][2] 390 | destination = experience_memory[iter_episode][3] 391 | state = env_sap.eval_sap_reset(demand, source, destination) 392 | 393 | action = agent.act(env_sap, state, demand, source, destination) 394 | new_state, reward, done, _, _, _ = env_sap.make_step(state, action, demand, source, destination) 395 | env_sap.demand = demand 396 | env_sap.source = source 397 | env_sap.destination = destination 398 | rewardAdd = rewardAdd + reward 399 | state = new_state 400 | 401 | if done: 402 | rewards_sap[reward_it] = rewardAdd 403 | reward_it = reward_it + 1 404 | wait_for_new_episode = True 405 | 406 | iter_episode = iter_episode + 1 407 | else: 408 | if experience_memory[iter_episode][0]!=new_episode_it: 409 | print("SAP ERROR! 
The experience replay buffer needs more samples/episode") 410 | os.kill(os.getpid(), 9) 411 | 412 | demand = experience_memory[iter_episode][1] 413 | source = experience_memory[iter_episode][2] 414 | destination = experience_memory[iter_episode][3] 415 | action = agent.act(env_sap, state, demand, source, destination) 416 | new_state, reward, done, _, _, _ = env_sap.make_step(state, action, demand, source, destination) 417 | env_sap.demand = demand 418 | env_sap.source = source 419 | env_sap.destination = destination 420 | rewardAdd = rewardAdd + reward 421 | state = new_state 422 | 423 | if done: 424 | rewards_sap[reward_it] = rewardAdd 425 | reward_it = reward_it + 1 426 | wait_for_new_episode = True 427 | 428 | iter_episode = iter_episode + 1 429 | if wait_for_new_episode: 430 | rewardAdd = 0 431 | wait_for_new_episode = False 432 | new_episode = True 433 | new_episode_it = new_episode_it + 1 434 | iter_episode = new_episode_it * NUM_SAMPLES_EPSD 435 | return rewards_sap 436 | 437 | def exec_dqn_model_episodes(experience_memory, env_dqn, agent): 438 | rewards_dqn = np.zeros(NUMBER_EPISODES) 439 | 440 | rewardAdd = 0 441 | reward_it = 0 442 | iter_episode = 0 # Iterates over samples within the same episode 443 | new_episode = True 444 | wait_for_new_episode = False 445 | new_episode_it = 0 # Iterates over EPISODES 446 | while iter_episode < len(experience_memory): 447 | if new_episode: 448 | new_episode = False 449 | demand = experience_memory[iter_episode][1] 450 | source = experience_memory[iter_episode][2] 451 | destination = experience_memory[iter_episode][3] 452 | state = env_dqn.eval_sap_reset(demand, source, destination) 453 | 454 | action, state_action = agent.act(env_dqn, state, demand, source, destination, True) 455 | new_state, reward, done, new_demand, new_source, new_destination = env_dqn.make_step(state, action, demand, source, destination) 456 | rewardAdd = rewardAdd + reward 457 | state = new_state 458 | if done: 459 | rewards_dqn[reward_it] = rewardAdd 460 | reward_it = reward_it + 1 461 | wait_for_new_episode = True 462 | iter_episode = iter_episode + 1 463 | else: 464 | if experience_memory[iter_episode][0] != new_episode_it: 465 | print("DQNAgent ERROR! 
The experience replay buffer needs more samples/episode") 466 | os.kill(os.getpid(), 9) 467 | 468 | demand = experience_memory[iter_episode][1] 469 | source = experience_memory[iter_episode][2] 470 | destination = experience_memory[iter_episode][3] 471 | 472 | action, state_action = agent.act(env_dqn, state, demand, source, destination, True) 473 | new_state, reward, done, new_demand, new_source, new_destination = env_dqn.make_step(state, action, demand, source, destination) 474 | rewardAdd = rewardAdd + reward 475 | state = new_state 476 | if done: 477 | rewards_dqn[reward_it] = rewardAdd 478 | reward_it = reward_it + 1 479 | wait_for_new_episode = True 480 | iter_episode = iter_episode + 1 481 | if wait_for_new_episode: 482 | rewardAdd = 0 483 | wait_for_new_episode = False 484 | new_episode = True 485 | new_episode_it = new_episode_it + 1 486 | if new_episode_it%5==0: 487 | print("DQN Episode >>> ", new_episode_it) 488 | iter_episode = new_episode_it * NUM_SAMPLES_EPSD 489 | return rewards_dqn 490 | 491 | if __name__ == "__main__": 492 | # python evaluate_DQN.py -d ./Logs/expsample_DQN_agentLogs.txt 493 | 494 | # Parse logs and get best model 495 | parser = argparse.ArgumentParser(description='Parse file and create plots') 496 | 497 | parser.add_argument('-d', help='data file', type=str, required=True, nargs='+') 498 | args = parser.parse_args() 499 | 500 | aux = args.d[0].split(".") 501 | aux = aux[1].split("exp") 502 | differentiation_str = str(aux[1].split("Logs")[0]) 503 | 504 | if not os.path.exists("./Images"): 505 | os.makedirs("./Images") 506 | 507 | topo = "" 508 | if graph_topology==0: 509 | topo = "NSFNET" 510 | elif graph_topology==1: 511 | topo = "GEANT2" 512 | elif graph_topology==2: 513 | topo = "Small_Top" 514 | else: 515 | topo = "GBN" 516 | 517 | # Uncomment the following if you want to store the demands in a file 518 | # store_experiences = open("Traffic_demands_"+topo+"_1K.txt", "w") 519 | model_id = 0 520 | with open(args.d[0]) as fp: 521 | for line in reversed(list(fp)): 522 | arrayLine = line.split(":") 523 | if arrayLine[0]=='MAX REWD': 524 | model_id = int(arrayLine[2].split(",")[0]) 525 | break 526 | 527 | env_dqn = gym.make(ENV_NAME_AGENT) 528 | env_dqn.seed(SEED) 529 | env_dqn.generate_environment(graph_topology, listofDemands) 530 | 531 | dqn_agent = DQNAgent(env_dqn) 532 | checkpoint_dir = "./models" + differentiation_str 533 | checkpoint = tf.train.Checkpoint(model=dqn_agent.primary_network, optimizer=dqn_agent.optimizer) 534 | # Restore variables on creation if a checkpoint exists. 535 | checkpoint.restore(checkpoint_dir + "/ckpt-" + str(model_id)) 536 | print("Load model " + checkpoint_dir + "/ckpt-" + str(model_id)) 537 | 538 | means_sap = np.zeros(NUMBER_EPISODES) 539 | means_dqn = np.zeros(NUMBER_EPISODES) 540 | means_lb = np.zeros(NUMBER_EPISODES) 541 | iters = np.zeros(NUMBER_EPISODES) 542 | 543 | experience_memory = deque(maxlen=NUMBER_EPISODES*NUM_SAMPLES_EPSD) 544 | 545 | # Generate lists of determined size of demands. 
The different agents will iterate over the same list 546 | for ep_num in range(NUMBER_EPISODES): 547 | for sample in range(NUM_SAMPLES_EPSD): 548 | demand = np.random.choice(listofDemands) 549 | source = np.random.choice(env_dqn.nodes) 550 | 551 | # We pick a pair of SOURCE,DESTINATION different nodes 552 | while True: 553 | destination = np.random.choice(env_dqn.nodes) 554 | if destination != source: 555 | # We generate unique demands that don't overlap with existing topology edges 556 | experience_memory.append((ep_num, demand, source, destination)) 557 | #cstore_experiences.write(str(ep_num)+","+str(source)+","+str(destination)+","+str(demand)+"\n") 558 | break 559 | 560 | # store_experiences.close() 561 | 562 | rewards_lb = exec_lb_model_episodes(experience_memory, graph_topology) 563 | rewards_sap = exec_sap_model_episodes(experience_memory, graph_topology) 564 | rewards_dqn = exec_dqn_model_episodes(experience_memory, env_dqn, dqn_agent) 565 | 566 | #rewards_lb.tofile('rewards_lb'+topo+'1K.dat') 567 | #rewards_dqn.tofile('rewards_dqn'+topo+'1K.dat') 568 | 569 | plt.rcParams.update({'font.size': 12}) 570 | plt.plot(rewards_dqn, 'r', label="DQN") 571 | plt.plot(rewards_sap, 'b', label="SAP") 572 | plt.plot(rewards_lb, 'g', label="LB") 573 | 574 | #DQN 575 | mean = np.mean(rewards_dqn) 576 | means_dqn.fill(mean) 577 | plt.plot(means_dqn, 'r', linestyle="-.") 578 | 579 | #SAP 580 | mean = np.mean(rewards_sap) 581 | means_sap.fill(mean) 582 | plt.plot(means_sap, 'b', linestyle=":") 583 | 584 | #LB 585 | mean = np.mean(rewards_lb) 586 | means_lb.fill(mean) 587 | plt.plot(means_lb, 'g', linestyle="--") 588 | 589 | plt.xlabel("Episodes", fontsize=14, fontweight='bold') 590 | plt.ylabel("Score", fontsize=14, fontweight='bold') 591 | lgd = plt.legend(loc="lower left", bbox_to_anchor=(0.1, -0.24), 592 | ncol=4, fancybox=True, shadow=True) 593 | 594 | plt.savefig("./Images/ModelEval"+topo+".pdf", bbox_extra_artists=(lgd,), bbox_inches='tight') 595 | #plt.show() 596 | 597 | -------------------------------------------------------------------------------- /DQN/gym-environments/gym_environments/__init__.py: -------------------------------------------------------------------------------- 1 | from gym.envs.registration import register 2 | 3 | register( 4 | id='GraphEnv-v1', 5 | entry_point='gym_environments.envs:Env1', 6 | ) -------------------------------------------------------------------------------- /DQN/gym-environments/gym_environments/envs/__init__.py: -------------------------------------------------------------------------------- 1 | from gym_environments.envs.environment1 import Env1 -------------------------------------------------------------------------------- /DQN/gym-environments/gym_environments/envs/environment1.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) 2021, Paul Almasan [^1] 2 | # 3 | # [^1]: Universitat Politècnica de Catalunya, Computer Architecture 4 | # department, Barcelona, Spain. 
Email: felician.paul.almasan@upc.edu 5 | 6 | import gym 7 | import numpy as np 8 | import networkx as nx 9 | import random 10 | from gym import error, spaces, utils 11 | from random import choice 12 | import pylab 13 | import json 14 | import gc 15 | import matplotlib.pyplot as plt 16 | 17 | def create_geant2_graph(): 18 | Gbase = nx.Graph() 19 | Gbase.add_nodes_from([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23]) 20 | Gbase.add_edges_from( 21 | [(0, 1), (0, 2), (1, 3), (1, 6), (1, 9), (2, 3), (2, 4), (3, 6), (4, 7), (5, 3), 22 | (5, 8), (6, 9), (6, 8), (7, 11), (7, 8), (8, 11), (8, 20), (8, 17), (8, 18), (8, 12), 23 | (9, 10), (9, 13), (9, 12), (10, 13), (11, 20), (11, 14), (12, 13), (12,19), (12,21), 24 | (14, 15), (15, 16), (16, 17), (17,18), (18,21), (19, 23), (21,22), (22, 23)]) 25 | 26 | return Gbase 27 | 28 | def create_nsfnet_graph(): 29 | Gbase = nx.Graph() 30 | Gbase.add_nodes_from([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]) 31 | Gbase.add_edges_from( 32 | [(0, 1), (0, 2), (0, 3), (1, 2), (1, 7), (2, 5), (3, 8), (3, 4), (4, 5), (4, 6), (5, 12), (5, 13), 33 | (6, 7), (7, 10), (8, 9), (8, 11), (9, 10), (9, 12), (10, 11), (10, 13), (11, 12)]) 34 | 35 | return Gbase 36 | 37 | def create_small_top(): 38 | Gbase = nx.Graph() 39 | Gbase.add_nodes_from([0, 1, 2, 3, 4, 5, 6, 7, 8]) 40 | Gbase.add_edges_from( 41 | [(0, 1), (0, 2), (0, 3), (1, 2), (1, 7), (2, 5), (3, 8), (3, 4), (4, 5), (4, 6), (5, 0), 42 | (6, 7), (6, 8), (7, 8), (8, 0), (8, 6), (3, 2), (5, 3)]) 43 | 44 | return Gbase 45 | 46 | def create_gbn_graph(): 47 | Gbase = nx.Graph() 48 | Gbase.add_nodes_from([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]) 49 | Gbase.add_edges_from( 50 | [(0, 2), (0, 8), (1, 2), (1, 3), (1, 4), (2, 4), (3, 4), (3, 9), (4, 8), (4, 10), (4, 9), 51 | (5, 6), (5, 8), (6, 7), (7, 8), (7, 10), (9, 10), (9, 12), (10, 11), (10, 12), (11, 13), 52 | (12, 14), (12, 16), (13, 14), (14, 15), (15, 16)]) 53 | 54 | return Gbase 55 | 56 | def generate_nx_graph(topology): 57 | """ 58 | Generate graphs for training with the same topology. 59 | """ 60 | if topology == 0: 61 | G = create_nsfnet_graph() 62 | elif topology == 1: 63 | G = create_geant2_graph() 64 | elif topology == 2: 65 | G = create_small_top() 66 | else: 67 | G = create_gbn_graph() 68 | 69 | # nx.draw(G, with_labels=True) 70 | # plt.show() 71 | # plt.clf() 72 | 73 | # Node id counter 74 | incId = 1 75 | # Put all distance weights into edge attributes. 
76 | for i, j in G.edges(): 77 | G.get_edge_data(i, j)['edgeId'] = incId 78 | G.get_edge_data(i, j)['betweenness'] = 0 79 | G.get_edge_data(i, j)['numsp'] = 0 # Indicates the number of shortest paths going through the link 80 | # We set the edges capacities to 200 81 | G.get_edge_data(i, j)["capacity"] = float(200) 82 | G.get_edge_data(i, j)['bw_allocated'] = 0 83 | incId = incId + 1 84 | 85 | return G 86 | 87 | 88 | def compute_link_betweenness(g, k): 89 | n = len(g.nodes()) 90 | betw = [] 91 | for i, j in g.edges(): 92 | # we add a very small number to avoid division by zero 93 | b_link = g.get_edge_data(i, j)['numsp'] / ((2.0 * n * (n - 1) * k) + 0.00000001) 94 | g.get_edge_data(i, j)['betweenness'] = b_link 95 | betw.append(b_link) 96 | 97 | mu_bet = np.mean(betw) 98 | std_bet = np.std(betw) 99 | return mu_bet, std_bet 100 | 101 | class Env1(gym.Env): 102 | """ 103 | Description: 104 | The self.graph_state stores the relevant features for the GNN model 105 | 106 | self.graph_state[:][0] = CAPACITY 107 | self.graph_state[:][1] = BW_ALLOCATED 108 | """ 109 | def __init__(self): 110 | self.graph = None 111 | self.initial_state = None 112 | self.source = None 113 | self.destination = None 114 | self.demand = None 115 | self.graph_state = None 116 | self.diameter = None 117 | 118 | # Nx Graph where the nodes have features. Betweenness is allways normalized. 119 | # The other features are "raw" and are being normalized before prediction 120 | self.first = None 121 | self.firstTrueSize = None 122 | self.second = None 123 | self.between_feature = None 124 | 125 | # Mean and standard deviation of link betweenness 126 | self.mu_bet = None 127 | self.std_bet = None 128 | 129 | self.max_demand = 0 130 | self.K = 4 131 | self.listofDemands = None 132 | self.nodes = None 133 | self.ordered_edges = None 134 | self.edgesDict = None 135 | self.numNodes = None 136 | self.numEdges = None 137 | 138 | self.state = None 139 | self.episode_over = True 140 | self.reward = 0 141 | self.allPaths = dict() 142 | 143 | def seed(self, seed): 144 | random.seed(seed) 145 | np.random.seed(seed) 146 | 147 | def num_shortest_path(self, topology): 148 | self.diameter = nx.diameter(self.graph) 149 | 150 | # Iterate over all node1,node2 pairs from the graph 151 | for n1 in self.graph: 152 | for n2 in self.graph: 153 | if (n1 != n2): 154 | # Check if we added the element of the matrix 155 | if str(n1)+':'+str(n2) not in self.allPaths: 156 | self.allPaths[str(n1)+':'+str(n2)] = [] 157 | 158 | # First we compute the shortest paths taking into account the diameter 159 | # This is because large topologies might take too long to compute all shortest paths 160 | [self.allPaths[str(n1)+':'+str(n2)].append(p) for p in nx.all_simple_paths(self.graph, source=n1, target=n2, cutoff=self.diameter*2)] 161 | 162 | # We take all the paths from n1 to n2 and we order them according to the path length 163 | self.allPaths[str(n1)+':'+str(n2)] = sorted(self.allPaths[str(n1)+':'+str(n2)], key=lambda item: (len(item), item)) 164 | 165 | path = 0 166 | while path < self.K and path < len(self.allPaths[str(n1)+':'+str(n2)]): 167 | currentPath = self.allPaths[str(n1)+':'+str(n2)][path] 168 | i = 0 169 | j = 1 170 | 171 | # Iterate over pairs of nodes increase the number of sp 172 | while (j < len(currentPath)): 173 | self.graph.get_edge_data(currentPath[i], currentPath[j])['numsp'] = \ 174 | self.graph.get_edge_data(currentPath[i], currentPath[j])['numsp'] + 1 175 | i = i + 1 176 | j = j + 1 177 | 178 | path = path + 1 179 | 180 | # Remove paths not 
needed 181 | del self.allPaths[str(n1)+':'+str(n2)][path:len(self.allPaths[str(n1)+':'+str(n2)])] 182 | gc.collect() 183 | 184 | 185 | def _first_second_between(self): 186 | self.first = list() 187 | self.second = list() 188 | 189 | # For each edge we iterate over all neighbour edges 190 | for i, j in self.ordered_edges: 191 | neighbour_edges = self.graph.edges(i) 192 | 193 | for m, n in neighbour_edges: 194 | if ((i != m or j != n) and (i != n or j != m)): 195 | self.first.append(self.edgesDict[str(i) +':'+ str(j)]) 196 | self.second.append(self.edgesDict[str(m) +':'+ str(n)]) 197 | 198 | neighbour_edges = self.graph.edges(j) 199 | for m, n in neighbour_edges: 200 | if ((i != m or j != n) and (i != n or j != m)): 201 | self.first.append(self.edgesDict[str(i) +':'+ str(j)]) 202 | self.second.append(self.edgesDict[str(m) +':'+ str(n)]) 203 | 204 | 205 | def generate_environment(self, topology, listofdemands): 206 | # The nx graph will only be used to convert graph from edges to nodes 207 | self.graph = generate_nx_graph(topology) 208 | 209 | self.listofDemands = listofdemands 210 | 211 | self.max_demand = np.amax(self.listofDemands) 212 | 213 | # Compute number of shortest paths per link. This will be used for the betweenness 214 | self.num_shortest_path(topology) 215 | 216 | # Compute the betweenness value for each link 217 | self.mu_bet, self.std_bet = compute_link_betweenness(self.graph, self.K) 218 | 219 | self.edgesDict = dict() 220 | 221 | some_edges_1 = [tuple(sorted(edge)) for edge in self.graph.edges()] 222 | self.ordered_edges = sorted(some_edges_1) 223 | 224 | self.numNodes = len(self.graph.nodes()) 225 | self.numEdges = len(self.graph.edges()) 226 | 227 | self.graph_state = np.zeros((self.numEdges, 2)) 228 | self.between_feature = np.zeros(self.numEdges) 229 | 230 | position = 0 231 | for edge in self.ordered_edges: 232 | i = edge[0] 233 | j = edge[1] 234 | self.edgesDict[str(i)+':'+str(j)] = position 235 | self.edgesDict[str(j)+':'+str(i)] = position 236 | betweenness = (self.graph.get_edge_data(i, j)['betweenness'] - self.mu_bet) / self.std_bet 237 | self.graph.get_edge_data(i, j)['betweenness'] = betweenness 238 | self.graph_state[position][0] = self.graph.get_edge_data(i, j)["capacity"] 239 | self.between_feature[position] = self.graph.get_edge_data(i, j)['betweenness'] 240 | position = position + 1 241 | 242 | self.initial_state = np.copy(self.graph_state) 243 | 244 | self._first_second_between() 245 | 246 | self.firstTrueSize = len(self.first) 247 | 248 | # We create the list of nodes ids to pick randomly from them 249 | self.nodes = list(range(0,self.numNodes)) 250 | 251 | def make_step(self, state, action, demand, source, destination): 252 | self.graph_state = np.copy(state) 253 | self.episode_over = True 254 | self.reward = 0 255 | 256 | i = 0 257 | j = 1 258 | currentPath = self.allPaths[str(source) +':'+ str(destination)][action] 259 | 260 | # Once we pick the action, we decrease the total edge capacity from the edges 261 | # from the allocated path (action path) 262 | while (j < len(currentPath)): 263 | self.graph_state[self.edgesDict[str(currentPath[i]) + ':' + str(currentPath[j])]][0] -= demand 264 | if self.graph_state[self.edgesDict[str(currentPath[i]) + ':' + str(currentPath[j])]][0] < 0: 265 | # FINISH IF LINKS CAPACITY <0 266 | return self.graph_state, self.reward, self.episode_over, self.demand, self.source, self.destination 267 | i = i + 1 268 | j = j + 1 269 | 270 | # Leave the bw_allocated back to 0 271 | self.graph_state[:,1] = 0 272 | 273 | # Reward is 
the allocated demand or 0 otherwise (end of episode) 274 | # We normalize the demand to don't have extremely large values 275 | self.reward = demand/self.max_demand 276 | self.episode_over = False 277 | 278 | self.demand = random.choice(self.listofDemands) 279 | self.source = random.choice(self.nodes) 280 | 281 | # We pick a pair of SOURCE,DESTINATION different nodes 282 | while True: 283 | self.destination = random.choice(self.nodes) 284 | if self.destination != self.source: 285 | break 286 | 287 | return self.graph_state, self.reward, self.episode_over, self.demand, self.source, self.destination 288 | 289 | def reset(self): 290 | """ 291 | Reset environment and setup for new episode. Generate new demand and pair source, destination. 292 | 293 | Returns: 294 | initial state of reset environment, a new demand and a source and destination node 295 | """ 296 | self.graph_state = np.copy(self.initial_state) 297 | self.demand = random.choice(self.listofDemands) 298 | self.source = random.choice(self.nodes) 299 | 300 | # We pick a pair of SOURCE,DESTINATION different nodes 301 | while True: 302 | self.destination = random.choice(self.nodes) 303 | if self.destination != self.source: 304 | break 305 | 306 | return self.graph_state, self.demand, self.source, self.destination 307 | 308 | def eval_sap_reset(self, demand, source, destination): 309 | """ 310 | Reset environment and setup for new episode. This function is used in the "evaluate_DQN.py" script. 311 | """ 312 | self.graph_state = np.copy(self.initial_state) 313 | self.demand = demand 314 | self.source = source 315 | self.destination = destination 316 | 317 | return self.graph_state -------------------------------------------------------------------------------- /DQN/gym-environments/setup.py: -------------------------------------------------------------------------------- 1 | from setuptools import setup 2 | 3 | setup(name='gym_environments', 4 | version='0.0.1', 5 | install_requires=['gym'] # And any other dependencies foo needs 6 | ) -------------------------------------------------------------------------------- /DQN/modelssample_DQN_agent/checkpoint: -------------------------------------------------------------------------------- 1 | model_checkpoint_path: "ckpt-367" 2 | all_model_checkpoint_paths: "ckpt-367" 3 | -------------------------------------------------------------------------------- /DQN/modelssample_DQN_agent/ckpt-349.data-00000-of-00001: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/knowledgedefinednetworking/DRL-GNN/e3bc32bc6b65c1b6df570aee23bfe304fc4ebe0a/DQN/modelssample_DQN_agent/ckpt-349.data-00000-of-00001 -------------------------------------------------------------------------------- /DQN/modelssample_DQN_agent/ckpt-349.index: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/knowledgedefinednetworking/DRL-GNN/e3bc32bc6b65c1b6df570aee23bfe304fc4ebe0a/DQN/modelssample_DQN_agent/ckpt-349.index -------------------------------------------------------------------------------- /DQN/mpnn.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) 2021, Paul Almasan [^1] 2 | # 3 | # [^1]: Universitat Politècnica de Catalunya, Computer Architecture 4 | # department, Barcelona, Spain. 
Email: felician.paul.almasan@upc.edu 5 | 6 | import tensorflow as tf 7 | from tensorflow import keras 8 | from keras import regularizers 9 | 10 | class myModel(tf.keras.Model): 11 | def __init__(self, hparams): 12 | super(myModel, self).__init__() 13 | self.hparams = hparams 14 | 15 | # Define layers here 16 | self.Message = tf.keras.models.Sequential() 17 | self.Message.add(keras.layers.Dense(self.hparams['link_state_dim'], 18 | activation=tf.nn.selu, name="FirstLayer")) 19 | 20 | self.Update = tf.keras.layers.GRUCell(self.hparams['link_state_dim'], dtype=tf.float32) 21 | 22 | self.Readout = tf.keras.models.Sequential() 23 | self.Readout.add(keras.layers.Dense(self.hparams['readout_units'], 24 | activation=tf.nn.selu, 25 | kernel_regularizer=regularizers.l2(hparams['l2']), 26 | name="Readout1")) 27 | self.Readout.add(keras.layers.Dropout(rate=hparams['dropout_rate'])) 28 | self.Readout.add(keras.layers.Dense(self.hparams['readout_units'], 29 | activation=tf.nn.selu, 30 | kernel_regularizer=regularizers.l2(hparams['l2']), 31 | name="Readout2")) 32 | self.Readout.add(keras.layers.Dropout(rate=hparams['dropout_rate'])) 33 | self.Readout.add(keras.layers.Dense(1, kernel_regularizer=regularizers.l2(hparams['l2']), 34 | name="Readout3")) 35 | 36 | def build(self, input_shape=None): 37 | self.Message.build(input_shape=tf.TensorShape([None, self.hparams['link_state_dim']*2])) 38 | self.Update.build(input_shape=tf.TensorShape([None,self.hparams['link_state_dim']])) 39 | self.Readout.build(input_shape=[None, self.hparams['link_state_dim']]) 40 | self.built = True 41 | 42 | @tf.function 43 | def call(self, states_action, states_graph_ids, states_first, states_second, sates_num_edges, training=False): 44 | # Define the forward pass 45 | link_state = states_action 46 | 47 | # Execute T times 48 | for _ in range(self.hparams['T']): 49 | # We have the combination of the hidden states of the main edges with the neighbours 50 | mainEdges = tf.gather(link_state, states_first) 51 | neighEdges = tf.gather(link_state, states_second) 52 | 53 | edgesConcat = tf.concat([mainEdges, neighEdges], axis=1) 54 | 55 | ### 1.a Message passing for link with all it's neighbours 56 | outputs = self.Message(edgesConcat) 57 | 58 | ### 1.b Sum of output values according to link id index 59 | edges_inputs = tf.math.unsorted_segment_sum(data=outputs, segment_ids=states_second, 60 | num_segments=sates_num_edges) 61 | 62 | ### 2. 
Update for each link 63 | # GRUcell needs a 3D tensor as state because there is a matmul: Wrap the link state 64 | outputs, links_state_list = self.Update(edges_inputs, [link_state]) 65 | 66 | link_state = links_state_list[0] 67 | 68 | # Perform sum of all hidden states 69 | edges_combi_outputs = tf.math.segment_sum(link_state, states_graph_ids, name=None) 70 | 71 | r = self.Readout(edges_combi_outputs,training=training) 72 | return r 73 | -------------------------------------------------------------------------------- /DQN/parse.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import os 3 | import matplotlib.pyplot as plt 4 | from scipy.signal import savgol_filter 5 | 6 | if __name__ == "__main__": 7 | # python parse.py -d ./Logs/expsample_DQN_agentLogs.txt 8 | parser = argparse.ArgumentParser(description='Parse file and create plots') 9 | 10 | parser.add_argument('-d', help='data file', type=str, required=True, nargs='+') 11 | args = parser.parse_args() 12 | 13 | aux = args.d[0].split(".") 14 | aux = aux[1].split("exp") 15 | differentiation_str = str(aux[1].split("Logs")[0]) 16 | 17 | list_score_test = [] 18 | epsilon_decay = [] 19 | list_losses = [] 20 | 21 | if not os.path.exists("./Images"): 22 | os.makedirs("./Images") 23 | 24 | with open(args.d[0]) as fp: 25 | for line in fp: 26 | arrayLine = line.split(",") 27 | if arrayLine[0]==">": 28 | list_score_test.append(float(arrayLine[1])) 29 | elif arrayLine[0]=="-": 30 | epsilon_decay.append(float(arrayLine[1])) 31 | elif arrayLine[0]==".": 32 | list_losses.append(float(arrayLine[1])) 33 | 34 | model_id = -1 35 | reward = 0 36 | with open(args.d[0]) as fp: 37 | for line in reversed(list(fp)): 38 | arrayLine = line.split(":") 39 | if arrayLine[0]=='MAX REWD': 40 | model_id = arrayLine[2].split(",")[0] 41 | reward = arrayLine[1].split(" ")[1] 42 | break 43 | 44 | print("Best model_id: "+model_id+" with Average Score Test of "+reward) 45 | 46 | plt.plot(list_score_test, label="Score") 47 | plt.xlabel("Episodes") 48 | plt.title("GNN+DQN Testing score") 49 | plt.ylabel("Average Score Test") 50 | plt.legend(loc="lower right") 51 | plt.savefig("./Images/AvgTestScore_" + differentiation_str) 52 | plt.close() 53 | 54 | # Plot epsilon evolution 55 | plt.plot(epsilon_decay) 56 | plt.xlabel("Episodes") 57 | plt.ylabel("Epsilon value") 58 | plt.savefig("./Images/Epsilon_" + differentiation_str) 59 | plt.close() 60 | 61 | # Plot Loss evolution 62 | ysmoothed = savgol_filter(list_losses, 51, 3) 63 | plt.plot(list_losses, color='lightblue') 64 | plt.plot(ysmoothed) 65 | plt.xlabel("Batch") 66 | plt.title("Average loss per batch") 67 | plt.ylabel("Loss") 68 | plt.yscale("log") 69 | plt.savefig("./Images/AvgLosses_" + differentiation_str) 70 | plt.close() -------------------------------------------------------------------------------- /DQN/requirements.txt: -------------------------------------------------------------------------------- 1 | absl-py==1.0.0 2 | astunparse==1.6.3 3 | cachetools==4.2.4 4 | certifi==2021.10.8 5 | charset-normalizer==2.0.7 6 | cloudpickle==2.0.0 7 | cycler==0.11.0 8 | flatbuffers==2.0 9 | fonttools==4.28.1 10 | gast==0.4.0 11 | google-auth==2.3.3 12 | google-auth-oauthlib==0.4.6 13 | google-pasta==0.2.0 14 | grpcio==1.42.0 15 | gym==0.21.0 16 | h5py==3.6.0 17 | idna==3.3 18 | importlib-metadata==4.8.2 19 | keras==2.7.0 20 | Keras-Preprocessing==1.1.2 21 | kiwisolver==1.3.2 22 | libclang==12.0.0 23 | Markdown==3.3.6 24 | matplotlib==3.5.0 25 | networkx==2.6.3 26 | 
numpy==1.21.4 27 | oauthlib==3.1.1 28 | opt-einsum==3.3.0 29 | packaging==21.3 30 | Pillow==8.4.0 31 | protobuf==3.19.1 32 | pyasn1==0.4.8 33 | pyasn1-modules==0.2.8 34 | pyparsing==3.0.6 35 | python-dateutil==2.8.2 36 | requests==2.26.0 37 | requests-oauthlib==1.3.0 38 | rsa==4.7.2 39 | scipy==1.7.2 40 | setuptools-scm==6.3.2 41 | six==1.16.0 42 | tensorboard==2.7.0 43 | tensorboard-data-server==0.6.1 44 | tensorboard-plugin-wit==1.8.0 45 | tensorflow==2.7.0 46 | tensorflow-estimator==2.7.0 47 | tensorflow-io-gcs-filesystem==0.22.0 48 | termcolor==1.1.0 49 | tomli==1.2.2 50 | typing-extensions==4.0.0 51 | urllib3==1.26.7 52 | Werkzeug==2.0.2 53 | wrapt==1.13.3 54 | zipp==3.6.0 55 | -------------------------------------------------------------------------------- /DQN/train_DQN.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) 2021, Paul Almasan [^1] 2 | # 3 | # [^1]: Universitat Politècnica de Catalunya, Computer Architecture 4 | # department, Barcelona, Spain. Email: felician.paul.almasan@upc.edu 5 | 6 | import numpy as np 7 | import gym 8 | import gc 9 | import os 10 | import sys 11 | import gym_environments 12 | import random 13 | import mpnn as gnn 14 | import tensorflow as tf 15 | from collections import deque 16 | import multiprocessing 17 | import time as tt 18 | import glob 19 | 20 | os.environ['CUDA_VISIBLE_DEVICES'] = '-1' 21 | 22 | ENV_NAME = 'GraphEnv-v1' 23 | graph_topology = 0 # 0==NSFNET, 1==GEANT2, 2==Small Topology, 3==GBN 24 | SEED = 37 25 | ITERATIONS = 10000 26 | TRAINING_EPISODES = 20 27 | EVALUATION_EPISODES = 40 28 | FIRST_WORK_TRAIN_EPISODE = 60 29 | 30 | MULTI_FACTOR_BATCH = 6 # Number of batches used in training 31 | TAU = 0.08 # Only used in soft weights copy 32 | 33 | differentiation_str = "sample_DQN_agent" 34 | checkpoint_dir = "./models"+differentiation_str 35 | store_loss = 3 # Store the loss every store_loss batches 36 | 37 | os.environ['PYTHONHASHSEED']=str(SEED) 38 | np.random.seed(SEED) 39 | random.seed(SEED) 40 | 41 | # Force TensorFlow to use single thread. 42 | # Multiple threads are a potential source of non-reproducible results. 
43 | # For further details, see: https://stackoverflow.com/questions/42022950/ 44 | # tf.config.threading.set_inter_op_parallelism_threads(1) 45 | # tf.config.threading.set_intra_op_parallelism_threads(1) 46 | 47 | tf.random.set_seed(1) 48 | 49 | train_dir = "./TensorBoard/"+differentiation_str 50 | # summary_writer = tf.summary.create_file_writer(train_dir) 51 | listofDemands = [8, 32, 64] 52 | copy_weights_interval = 50 53 | evaluation_interval = 20 54 | epsilon_start_decay = 70 55 | 56 | 57 | hparams = { 58 | 'l2': 0.1, 59 | 'dropout_rate': 0.01, 60 | 'link_state_dim': 20, 61 | 'readout_units': 35, 62 | 'learning_rate': 0.0001, 63 | 'batch_size': 32, 64 | 'T': 4, 65 | 'num_demands': len(listofDemands) 66 | } 67 | 68 | MAX_QUEUE_SIZE = 4000 69 | 70 | def cummax(alist, extractor): 71 | with tf.name_scope('cummax'): 72 | maxes = [tf.reduce_max(extractor(v)) + 1 for v in alist] 73 | cummaxes = [tf.zeros_like(maxes[0])] 74 | for i in range(len(maxes) - 1): 75 | cummaxes.append(tf.math.add_n(maxes[0:i + 1])) 76 | return cummaxes 77 | 78 | class DQNAgent: 79 | def __init__(self, batch_size): 80 | self.memory = deque(maxlen=MAX_QUEUE_SIZE) 81 | self.gamma = 0.95 # discount rate 82 | self.epsilon = 1.0 # exploration rate 83 | self.epsilon_min = 0.01 84 | self.epsilon_decay = 0.995 85 | self.writer = None 86 | self.K = 4 # K-paths 87 | self.listQValues = None 88 | self.numbersamples = batch_size 89 | self.action = None 90 | self.capacity_feature = None 91 | self.bw_allocated_feature = np.zeros((env_training.numEdges,len(env_training.listofDemands))) 92 | 93 | self.global_step = 0 94 | self.primary_network = gnn.myModel(hparams) 95 | self.primary_network.build() 96 | self.target_network = gnn.myModel(hparams) 97 | self.target_network.build() 98 | self.optimizer = tf.keras.optimizers.SGD(learning_rate=hparams['learning_rate'],momentum=0.9,nesterov=True) 99 | 100 | def act(self, env, state, demand, source, destination, flagEvaluation): 101 | """ 102 | Given a demand stored in the environment it allocates the K=4 shortest paths on the current 'state' 103 | and predicts the q_values of the K=4 different new graph states by using the GNN model. 104 | Picks the state according to epsilon-greedy approach. The flag=TRUE indicates that we are testing 105 | the model and thus, it won't activate the drop layers. 106 | """ 107 | # Set to True if we need to compute K=4 q-values and take the maxium 108 | takeMax_epsilon = False 109 | # List of graphs 110 | listGraphs = [] 111 | # List of graph features that are used in the cummax() call 112 | list_k_features = list() 113 | # Initialize action 114 | action = 0 115 | 116 | # We get the K-paths between source-destination 117 | pathList = env.allPaths[str(source) +':'+ str(destination)] 118 | path = 0 119 | 120 | # 1. 
Implement epsilon-greedy to pick allocation 121 | # If flagEvaluation==TRUE we are EVALUATING => take always the action that the agent is saying has higher q-value 122 | # Otherwise, we are training with normal epsilon-greedy strategy 123 | if flagEvaluation: 124 | # If evaluation, compute K=4 q-values and take the maxium value 125 | takeMax_epsilon = True 126 | else: 127 | # If training, compute epsilon-greedy 128 | z = np.random.random() 129 | if z > self.epsilon: 130 | # Compute K=4 q-values and pick the one with highest value 131 | # In case of multiple same max values, return the first one 132 | takeMax_epsilon = True 133 | else: 134 | # Pick a random path and compute only one q-value 135 | path = np.random.randint(0, len(pathList)) 136 | action = path 137 | 138 | # 2. Allocate (S,D, linkDemand) demand using the K shortest paths 139 | while path < len(pathList): 140 | state_copy = np.copy(state) 141 | currentPath = pathList[path] 142 | i = 0 143 | j = 1 144 | 145 | # 3. Iterate over paths' pairs of nodes and allocate demand to bw_allocated 146 | while (j < len(currentPath)): 147 | state_copy[env.edgesDict[str(currentPath[i]) + ':' + str(currentPath[j])]][1] = demand 148 | i = i + 1 149 | j = j + 1 150 | 151 | # 4. Add allocated graphs' features to the list. Later we will compute their q-values using cummax 152 | listGraphs.append(state_copy) 153 | features = self.get_graph_features(env, state_copy) 154 | list_k_features.append(features) 155 | 156 | if not takeMax_epsilon: 157 | # If we don't need to compute the K=4 q-values we exit 158 | break 159 | 160 | path = path + 1 161 | 162 | vs = [v for v in list_k_features] 163 | 164 | # We compute the graphs_ids to later perform the unsorted_segment_sum for each graph and obtain the 165 | # link hidden states for each graph. 166 | graph_ids = [tf.fill([tf.shape(vs[it]['link_state'])[0]], it) for it in range(len(list_k_features))] 167 | first_offset = cummax(vs, lambda v: v['first']) 168 | second_offset = cummax(vs, lambda v: v['second']) 169 | 170 | tensors = ({ 171 | 'graph_id': tf.concat([v for v in graph_ids], axis=0), 172 | 'link_state': tf.concat([v['link_state'] for v in vs], axis=0), 173 | 'first': tf.concat([v['first'] + m for v, m in zip(vs, first_offset)], axis=0), 174 | 'second': tf.concat([v['second'] + m for v, m in zip(vs, second_offset)], axis=0), 175 | 'num_edges': tf.math.add_n([v['num_edges'] for v in vs]), 176 | } 177 | ) 178 | 179 | # Predict qvalues for all graphs within tensors 180 | self.listQValues = self.primary_network(tensors['link_state'], tensors['graph_id'], tensors['first'], 181 | tensors['second'], tensors['num_edges'], training=False).numpy() 182 | 183 | if takeMax_epsilon: 184 | # We take the path with highest q-value 185 | action = np.argmax(self.listQValues) 186 | else: 187 | return path, list_k_features[0] 188 | 189 | return action, list_k_features[action] 190 | 191 | def get_graph_features(self, env, copyGraph): 192 | """ 193 | We iterate over the converted graph nodes and take the features. The capacity and bw allocated features 194 | are normalized on the fly. 
195 | """ 196 | self.bw_allocated_feature.fill(0.0) 197 | # Normalize capacity feature 198 | self.capacity_feature = (copyGraph[:,0] - 100.00000001) / 200.0 199 | 200 | iter = 0 201 | for i in copyGraph[:, 1]: 202 | if i == 8: 203 | self.bw_allocated_feature[iter][0] = 1 204 | elif i == 32: 205 | self.bw_allocated_feature[iter][1] = 1 206 | elif i == 64: 207 | self.bw_allocated_feature[iter][2] = 1 208 | iter = iter + 1 209 | 210 | sample = { 211 | 'num_edges': env.numEdges, 212 | 'length': env.firstTrueSize, 213 | 'betweenness': tf.convert_to_tensor(value=env.between_feature, dtype=tf.float32), 214 | 'bw_allocated': tf.convert_to_tensor(value=self.bw_allocated_feature, dtype=tf.float32), 215 | 'capacities': tf.convert_to_tensor(value=self.capacity_feature, dtype=tf.float32), 216 | 'first': tf.convert_to_tensor(env.first, dtype=tf.int32), 217 | 'second': tf.convert_to_tensor(env.second, dtype=tf.int32) 218 | } 219 | 220 | sample['capacities'] = tf.reshape(sample['capacities'][0:sample['num_edges']], [sample['num_edges'], 1]) 221 | sample['betweenness'] = tf.reshape(sample['betweenness'][0:sample['num_edges']], [sample['num_edges'], 1]) 222 | 223 | hiddenStates = tf.concat([sample['capacities'], sample['betweenness'], sample['bw_allocated']], axis=1) 224 | 225 | paddings = tf.constant([[0, 0], [0, hparams['link_state_dim'] - 2 - hparams['num_demands']]]) 226 | link_state = tf.pad(tensor=hiddenStates, paddings=paddings, mode="CONSTANT") 227 | 228 | inputs = {'link_state': link_state, 'first': sample['first'][0:sample['length']], 229 | 'second': sample['second'][0:sample['length']], 'num_edges': sample['num_edges']} 230 | 231 | return inputs 232 | 233 | def _write_tf_summary(self, gradients, loss): 234 | with summary_writer.as_default(): 235 | tf.summary.scalar(name="loss", data=loss[0], step=self.global_step) 236 | tf.summary.histogram(name='gradients_5', data=gradients[5], step=self.global_step) 237 | tf.summary.histogram(name='gradients_7', data=gradients[7], step=self.global_step) 238 | tf.summary.histogram(name='gradients_9', data=gradients[9], step=self.global_step) 239 | tf.summary.histogram(name='FirstLayer/kernel:0', data=self.primary_network.variables[0], step=self.global_step) 240 | tf.summary.histogram(name='FirstLayer/bias:0', data=self.primary_network.variables[1], step=self.global_step) 241 | tf.summary.histogram(name='kernel:0', data=self.primary_network.variables[2], step=self.global_step) 242 | tf.summary.histogram(name='recurrent_kernel:0', data=self.primary_network.variables[3], step=self.global_step) 243 | tf.summary.histogram(name='bias:0', data=self.primary_network.variables[4], step=self.global_step) 244 | tf.summary.histogram(name='Readout1/kernel:0', data=self.primary_network.variables[5], step=self.global_step) 245 | tf.summary.histogram(name='Readout1/bias:0', data=self.primary_network.variables[6], step=self.global_step) 246 | tf.summary.histogram(name='Readout2/kernel:0', data=self.primary_network.variables[7], step=self.global_step) 247 | tf.summary.histogram(name='Readout2/bias:0', data=self.primary_network.variables[8], step=self.global_step) 248 | tf.summary.histogram(name='Readout3/kernel:0', data=self.primary_network.variables[9], step=self.global_step) 249 | tf.summary.histogram(name='Readout3/bias:0', data=self.primary_network.variables[10], step=self.global_step) 250 | summary_writer.flush() 251 | self.global_step = self.global_step + 1 252 | 253 | @tf.function 254 | def _forward_pass(self, x): 255 | prediction_state = self.primary_network(x[0], x[1], 
x[2], x[3], x[4], training=True)
256 |         preds_next_target = tf.stop_gradient(self.target_network(x[6], x[7], x[9], x[10], x[11], training=True))
257 |         return prediction_state, preds_next_target
258 | 
259 |     def _train_step(self, batch):
260 |         # Record operations for automatic differentiation
261 |         with tf.GradientTape() as tape:
262 |             preds_state = []
263 |             target = []
264 |             for x in batch:
265 |                 prediction_state, preds_next_target = self._forward_pass(x)
266 |                 # Take q-value of the action performed
267 |                 preds_state.append(prediction_state[0])
268 |                 # We multiply by 0 if done==True to cancel the second term
269 |                 target.append(tf.stop_gradient([x[5] + self.gamma*tf.math.reduce_max(preds_next_target)*(1-x[8])]))
270 | 
271 |             loss = tf.keras.losses.MSE(tf.stack(target, axis=1), tf.stack(preds_state, axis=1))
272 |             # Loss function using L2 Regularization
273 |             regularization_loss = sum(self.primary_network.losses)
274 |             loss = loss + regularization_loss
275 | 
276 |         # Computes the gradient using operations recorded in context of this tape
277 |         grad = tape.gradient(loss, self.primary_network.variables)
278 |         #gradients, _ = tf.clip_by_global_norm(grad, 5.0)
279 |         gradients = [tf.clip_by_value(gradient, -1., 1.) for gradient in grad]
280 |         self.optimizer.apply_gradients(zip(gradients, self.primary_network.variables))
281 |         del tape
282 |         return grad, loss
283 | 
284 |     def replay(self, episode):
285 |         for i in range(MULTI_FACTOR_BATCH):
286 |             batch = random.sample(self.memory, self.numbersamples)
287 | 
288 |             grad, loss = self._train_step(batch)
289 |             if i%store_loss==0:
290 |                 fileLogs.write(".," + '%.9f' % loss.numpy() + ",\n")
291 | 
292 |         # Soft weights update
293 |         # for t, e in zip(self.target_network.trainable_variables, self.primary_network.trainable_variables):
294 |         #     t.assign(t * (1 - TAU) + e * TAU)
295 | 
296 |         # Hard weights update
297 |         if episode % copy_weights_interval == 0:
298 |             self.target_network.set_weights(self.primary_network.get_weights())
299 |         # if episode % evaluation_interval == 0:
300 |         #     self._write_tf_summary(grad, loss)
301 |         gc.collect()
302 | 
303 |     def add_sample(self, env_training, state_action, action, reward, done, new_state, new_demand, new_source, new_destination):
304 |         self.bw_allocated_feature.fill(0.0)
305 |         new_state_copy = np.copy(new_state)
306 | 
307 |         state_action['graph_id'] = tf.fill([tf.shape(state_action['link_state'])[0]], 0)
308 | 
309 |         # We get the K paths between new_source and new_destination
310 |         pathList = env_training.allPaths[str(new_source) +':'+ str(new_destination)]
311 |         path = 0
312 |         list_k_features = list()
313 | 
314 |         # 2. Allocate (S,D, linkDemand) demand using the K shortest paths
315 |         while path < len(pathList):
316 |             currentPath = pathList[path]
317 |             i = 0
318 |             j = 1
319 | 
320 |             # 3. Iterate over the path's pairs of nodes and allocate new_demand to bw_allocated
321 |             while (j < len(currentPath)):
322 |                 new_state_copy[env_training.edgesDict[str(currentPath[i]) + ':' + str(currentPath[j])]][1] = new_demand
323 |                 i = i + 1
324 |                 j = j + 1
325 | 
326 |             # 4. Add the allocated graph's features to the list. Later we will compute their q-values using cummax
327 |             features = agent.get_graph_features(env_training, new_state_copy)
328 | 
329 |             list_k_features.append(features)
330 |             path = path + 1
331 |             new_state_copy[:,1] = 0
332 | 
333 |         vs = [v for v in list_k_features]
334 | 
335 |         # We compute the graph_ids to later perform the unsorted_segment_sum for each graph and obtain the
336 |         # link hidden states for each graph.
337 |         graph_ids = [tf.fill([tf.shape(vs[it]['link_state'])[0]], it) for it in range(len(list_k_features))]
338 |         first_offset = cummax(vs, lambda v: v['first'])
339 |         second_offset = cummax(vs, lambda v: v['second'])
340 | 
341 |         tensors = ({
342 |             'graph_id': tf.concat([v for v in graph_ids], axis=0),
343 |             'link_state': tf.concat([v['link_state'] for v in vs], axis=0),
344 |             'first': tf.concat([v['first'] + m for v, m in zip(vs, first_offset)], axis=0),
345 |             'second': tf.concat([v['second'] + m for v, m in zip(vs, second_offset)], axis=0),
346 |             'num_edges': tf.math.add_n([v['num_edges'] for v in vs]),
347 |         }
348 |         )
349 | 
350 |         # We store the state with the action marked, the graph ids, first, second, num_edges, the reward,
351 |         # new_state (-1 because we don't need it in this case), the graph ids, done, first, second, number of edges
352 |         self.memory.append((state_action['link_state'], state_action['graph_id'], state_action['first'],  # 2
353 |                             state_action['second'], tf.convert_to_tensor(state_action['num_edges']),  # 4
354 |                             tf.convert_to_tensor(reward, dtype=tf.float32), tensors['link_state'], tensors['graph_id'],  # 7
355 |                             tf.convert_to_tensor(int(done==True), dtype=tf.float32), tensors['first'], tensors['second'],  # 10
356 |                             tf.convert_to_tensor(tensors['num_edges'])))  # 12
357 | 
358 | if __name__ == "__main__":
359 |     # python train_DQN.py
360 |     # Get the environment and extract the number of actions.
361 |     env_training = gym.make(ENV_NAME)
362 |     np.random.seed(SEED)
363 |     env_training.seed(SEED)
364 |     env_training.generate_environment(graph_topology, listofDemands)
365 | 
366 |     env_eval = gym.make(ENV_NAME)
367 |     np.random.seed(SEED)
368 |     env_eval.seed(SEED)
369 |     env_eval.generate_environment(graph_topology, listofDemands)
370 | 
371 |     batch_size = hparams['batch_size']
372 |     agent = DQNAgent(batch_size)
373 | 
374 |     eval_ep = 0
375 |     train_ep = 0
376 |     max_reward = 0
377 |     reward_id = 0
378 | 
379 |     if not os.path.exists("./Logs"):
380 |         os.makedirs("./Logs")
381 | 
382 |     # We store all the information in a log file and later we parse this file
383 |     # to extract all the relevant information
384 |     fileLogs = open("./Logs/exp" + differentiation_str + "Logs.txt", "a")
385 | 
386 |     if not os.path.exists(checkpoint_dir):
387 |         os.makedirs(checkpoint_dir)
388 |     checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt")
389 | 
390 |     checkpoint = tf.train.Checkpoint(model=agent.primary_network, optimizer=agent.optimizer)
391 | 
392 |     rewards_test = np.zeros(EVALUATION_EPISODES)
393 | 
394 |     for eps in range(EVALUATION_EPISODES):
395 |         state, demand, source, destination = env_eval.reset()
396 |         rewardAddTest = 0
397 |         while 1:
398 |             # We execute evaluation over the current state
399 |             # demand, src, dst
400 |             action, _ = agent.act(env_eval, state, demand, source, destination, True)
401 | 
402 |             new_state, reward, done, demand, source, destination = env_eval.make_step(state, action, demand, source, destination)
403 |             rewardAddTest = rewardAddTest + reward
404 |             state = new_state
405 |             if done:
406 |                 break
407 |         rewards_test[eps] = rewardAddTest
408 | 
409 |     evalMeanReward = np.mean(rewards_test)
410 |     fileLogs.write(">," + str(evalMeanReward) + ",\n")
411 |     fileLogs.write("-," + str(agent.epsilon) + ",\n")
412 |     fileLogs.flush()
413 | 
414 |     counter_store_model = 1
415 | 
416 |     for ep_it in range(ITERATIONS):
417 |         if ep_it%5==0:
418 |             print("Training iteration: ", ep_it)
419 | 
420 |         if ep_it==0:
421 |             # At the beginning we don't have any experiences in the buffer. Thus, we force the agent to
422 |             # perform more training episodes than usual
423 |             train_episodes = FIRST_WORK_TRAIN_EPISODE
424 |         else:
425 |             train_episodes = TRAINING_EPISODES
426 |         for _ in range(train_episodes):
427 |             # Used to clean the TF cache
428 |             tf.random.set_seed(1)
429 | 
430 |             state, demand, source, destination = env_training.reset()
431 | 
432 |             while 1:
433 |                 # We execute the agent over the current state
434 |                 action, state_action = agent.act(env_training, state, demand, source, destination, False)
435 |                 new_state, reward, done, new_demand, new_source, new_destination = env_training.make_step(state, action, demand, source, destination)
436 | 
437 |                 agent.add_sample(env_training, state_action, action, reward, done, new_state, new_demand, new_source, new_destination)
438 |                 state = new_state
439 |                 demand = new_demand
440 |                 source = new_source
441 |                 destination = new_destination
442 |                 if done:
443 |                     break
444 | 
445 |         agent.replay(ep_it)
446 | 
447 |         # Decrease epsilon (from the epsilon-greedy exploration strategy)
448 |         if ep_it > epsilon_start_decay and agent.epsilon > agent.epsilon_min:
449 |             agent.epsilon *= agent.epsilon_decay
450 |             agent.epsilon *= agent.epsilon_decay
451 | 
452 |         # We only evaluate the model every evaluation_interval steps
453 |         if ep_it % evaluation_interval == 0:
454 |             for eps in range(EVALUATION_EPISODES):
455 |                 state, demand, source, destination = env_eval.reset()
456 |                 rewardAddTest = 0
457 |                 while 1:
458 |                     # We execute evaluation over the current state
459 |                     action, _ = agent.act(env_eval, state, demand, source, destination, True)
460 | 
461 |                     new_state, reward, done, demand, source, destination = env_eval.make_step(state, action, demand, source, destination)
462 |                     rewardAddTest = rewardAddTest + reward
463 |                     state = new_state
464 |                     if done:
465 |                         break
466 |                 rewards_test[eps] = rewardAddTest
467 |             evalMeanReward = np.mean(rewards_test)
468 | 
469 |             if evalMeanReward>max_reward:
470 |                 max_reward = evalMeanReward
471 |                 reward_id = counter_store_model
472 | 
473 |             fileLogs.write(">," + str(evalMeanReward) + ",\n")
474 |             fileLogs.write("-," + str(agent.epsilon) + ",\n")
475 | 
476 |             # Store trained model
477 |             checkpoint.save(checkpoint_prefix)
478 |             fileLogs.write("MAX REWD: " + str(max_reward) + " MODEL_ID: " + str(reward_id) +",\n")
479 |             counter_store_model = counter_store_model + 1
480 | 
481 |         fileLogs.flush()
482 | 
483 |         # Invoke garbage collection
484 |         # tf.keras.backend.clear_session()
485 |         gc.collect()
486 | 
487 |     for eps in range(EVALUATION_EPISODES):
488 |         state, demand, source, destination = env_eval.reset()
489 |         rewardAddTest = 0
490 |         while 1:
491 |             # We execute evaluation over the current state
492 |             # demand, src, dst
493 |             action, _ = agent.act(env_eval, state, demand, source, destination, True)
494 | 
495 |             new_state, reward, done, demand, source, destination = env_eval.make_step(state, action, demand, source, destination)
496 |             rewardAddTest = rewardAddTest + reward
497 |             state = new_state
498 |             if done:
499 |                 break
500 |         rewards_test[eps] = rewardAddTest
501 |     evalMeanReward = np.mean(rewards_test)
502 | 
503 |     if evalMeanReward>max_reward:
504 |         max_reward = evalMeanReward
505 |         reward_id = counter_store_model
506 | 
507 |     fileLogs.write(">," + str(evalMeanReward) + ",\n")
508 |     fileLogs.write("-," + str(agent.epsilon) + ",\n")
509 | 
510 |     # Store trained model
511 |     checkpoint.save(checkpoint_prefix)
512 |     fileLogs.write("MAX REWD: " + str(max_reward) + " MODEL_ID: " + str(reward_id) +",\n")
513 | 
514 |     fileLogs.flush()
515 |     fileLogs.close()
516 | 
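Editor's note on the log format above: each record written by `train_DQN.py` starts with a single-character tag — `.,<loss>,` for average training losses, `>,<mean eval reward>,` for evaluation scores, `-,<epsilon>,` for the current exploration rate, plus `MAX REWD: ... MODEL_ID: ...` lines that track the best checkpoint. The repo's own `parse.py` is the intended tool for plotting these logs; the snippet below is only a minimal reading sketch, assuming the exact format produced by the `fileLogs.write` calls above (the helper name `read_drl_log` is hypothetical and not part of the repository).

```python
# Minimal log-reading sketch (hypothetical helper, not the repo's parse.py).
# Assumes each line looks like "<tag>,<value>,\n" as written by train_DQN.py.
def read_drl_log(path):
    losses, eval_rewards, epsilons = [], [], []
    with open(path) as f:
        for line in f:
            parts = line.strip().split(",")
            if parts[0] == ".":       # average training loss
                losses.append(float(parts[1]))
            elif parts[0] == ">":     # mean evaluation reward
                eval_rewards.append(float(parts[1]))
            elif parts[0] == "-":     # epsilon at evaluation time
                epsilons.append(float(parts[1]))
            # Lines starting with "MAX REWD:" only record the best model id.
    return losses, eval_rewards, epsilons

# Example usage:
# losses, rewards, eps = read_drl_log("./Logs/expsample_DQN_agentLogs.txt")
```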
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | BSD 3-Clause License
2 | 
3 | Copyright (c) 2019, Knowledge-Defined Networking
4 | All rights reserved.
5 | 
6 | Redistribution and use in source and binary forms, with or without
7 | modification, are permitted provided that the following conditions are met:
8 | 
9 | 1. Redistributions of source code must retain the above copyright notice, this
10 |    list of conditions and the following disclaimer.
11 | 
12 | 2. Redistributions in binary form must reproduce the above copyright notice,
13 |    this list of conditions and the following disclaimer in the documentation
14 |    and/or other materials provided with the distribution.
15 | 
16 | 3. Neither the name of the copyright holder nor the names of its
17 |    contributors may be used to endorse or promote products derived from
18 |    this software without specific prior written permission.
19 | 
20 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
21 | AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
22 | IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
23 | DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
24 | FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
25 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
26 | SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
27 | CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
28 | OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
29 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
30 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Deep Reinforcement Learning meets Graph Neural Networks: exploring a routing optimization use case
2 | #### Link to paper: [[here](https://arxiv.org/abs/1910.07421)]
3 | #### P. Almasan, J. Suárez-Varela, A. Badia-Sampera, K. Rusek, P. Barlet-Ros, A. Cabellos-Aparicio.
4 | 
5 | Contact:
6 | 
7 | [![Twitter Follow](https://img.shields.io/twitter/follow/PaulAlmasan?style=social)](https://twitter.com/PaulAlmasan)
8 | [![GitHub watchers](https://img.shields.io/github/watchers/knowledgedefinednetworking/DRL-GNN?style=social&label=Watch)](https://github.com/knowledgedefinednetworking/DRL-GNN)
9 | [![GitHub forks](https://img.shields.io/github/forks/knowledgedefinednetworking/DRL-GNN?style=social&label=Fork)](https://github.com/knowledgedefinednetworking/DRL-GNN)
10 | [![GitHub stars](https://img.shields.io/github/stars/knowledgedefinednetworking/DRL-GNN?style=social&label=Star)](https://github.com/knowledgedefinednetworking/DRL-GNN)
11 | 
12 | ## Abstract
13 | Recent advances in Deep Reinforcement Learning (DRL) have shown a significant improvement in decision-making problems. The networking community has started to investigate how DRL can provide a new breed of solutions to relevant optimization problems, such as routing. However, most of the state-of-the-art DRL-based networking techniques fail to generalize; this means that they can only operate over network topologies seen during training, but not over new topologies. The reason behind this important limitation is that existing DRL networking solutions use standard neural networks (e.g., fully connected), which are unable to learn graph-structured information. In this paper we propose to use Graph Neural Networks (GNN) in combination with DRL. GNNs have recently been proposed to model graphs, and our novel DRL+GNN architecture is able to learn, operate and generalize over arbitrary network topologies. To showcase its generalization capabilities, we evaluate it on an Optical Transport Network (OTN) scenario, where the agent needs to allocate traffic demands efficiently. Our results show that our DRL+GNN agent is able to achieve outstanding performance in topologies unseen during training.
14 | 
15 | # Instructions to execute
16 | 
17 | [See the execution instructions](https://github.com/knowledgedefinednetworking/DRL-GNN/blob/master/DQN/README.md)
18 | 
19 | ## Description
20 | 
21 | For more details about the implementation used in the experiments, contact: [felician.paul.almasan@upc.edu](mailto:felician.paul.almasan@upc.edu)
22 | 
23 | Please cite the corresponding article if you use the code from this repository:
24 | 
25 | ```
26 | @article{almasan2019deep,
27 |   title={Deep reinforcement learning meets graph neural networks: Exploring a routing optimization use case},
28 |   author={Almasan, Paul and Su{\'a}rez-Varela, Jos{\'e} and Badia-Sampera, Arnau and Rusek, Krzysztof and Barlet-Ros, Pere and Cabellos-Aparicio, Albert},
29 |   journal={arXiv preprint arXiv:1910.07421},
30 |   year={2019}
31 | }
32 | ```
33 | 
--------------------------------------------------------------------------------
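Editor's note on evaluation: the checkpoints saved by `train_DQN.py` via `tf.train.Checkpoint` can later be restored for evaluation, which is the role of `evaluate_DQN.py` in this repo. The sketch below is only an illustration under stated assumptions, not the repository's evaluation script: it presumes that `DQNAgent` and `hparams` from the training/evaluation scripts are in scope with the same hyperparameters used at training time, and it reuses the sample checkpoint directory shipped with the repo (`modelssample_DQN_agent`).

```python
import os
import tensorflow as tf

# Hypothetical restore sketch: DQNAgent and hparams are assumed to be imported from
# the training/evaluation scripts; the directory mirrors the sample model in the repo.
checkpoint_dir = "./modelssample_DQN_agent"

agent = DQNAgent(hparams['batch_size'])  # must be built with the same hyperparameters as training
checkpoint = tf.train.Checkpoint(model=agent.primary_network, optimizer=agent.optimizer)

# Restore the latest checkpoint (or a specific one, e.g. the MODEL_ID logged next to "MAX REWD").
status = checkpoint.restore(tf.train.latest_checkpoint(checkpoint_dir))
status.expect_partial()  # optimizer slots may be absent when only evaluating
```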