├── CITATION.cff ├── DQN ├── Images │ ├── AvgLosses_sample_DQN_agent.png │ ├── AvgTestScore_sample_DQN_agent.png │ ├── Epsilon_sample_DQN_agent.png │ └── ModelEvalNSFNET.pdf ├── Logs │ └── expsample_DQN_agentLogs.txt ├── README.md ├── evaluate_DQN.py ├── gym-environments │ ├── gym_environments │ │ ├── __init__.py │ │ └── envs │ │ │ ├── __init__.py │ │ │ └── environment1.py │ └── setup.py ├── modelssample_DQN_agent │ ├── checkpoint │ ├── ckpt-349.data-00000-of-00001 │ └── ckpt-349.index ├── mpnn.py ├── parse.py ├── requirements.txt └── train_DQN.py ├── LICENSE └── README.md /CITATION.cff: -------------------------------------------------------------------------------- 1 | cff-version: 1.2.0 2 | message: "If you use this software, please cite it as below." 3 | authors: 4 | - family-names: "Almasan" 5 | given-names: "Paul" 6 | orcid: "https://orcid.org/0000-0003-3903-6759" 7 | title: "Code of DRL+GNN architecture in OTN" 8 | version: 1.0 9 | date-released: 2021-11-22 10 | url: "https://github.com/knowledgedefinednetworking/DRL-GNN" 11 | -------------------------------------------------------------------------------- /DQN/Images/AvgLosses_sample_DQN_agent.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/knowledgedefinednetworking/DRL-GNN/e3bc32bc6b65c1b6df570aee23bfe304fc4ebe0a/DQN/Images/AvgLosses_sample_DQN_agent.png -------------------------------------------------------------------------------- /DQN/Images/AvgTestScore_sample_DQN_agent.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/knowledgedefinednetworking/DRL-GNN/e3bc32bc6b65c1b6df570aee23bfe304fc4ebe0a/DQN/Images/AvgTestScore_sample_DQN_agent.png -------------------------------------------------------------------------------- /DQN/Images/Epsilon_sample_DQN_agent.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/knowledgedefinednetworking/DRL-GNN/e3bc32bc6b65c1b6df570aee23bfe304fc4ebe0a/DQN/Images/Epsilon_sample_DQN_agent.png -------------------------------------------------------------------------------- /DQN/Images/ModelEvalNSFNET.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/knowledgedefinednetworking/DRL-GNN/e3bc32bc6b65c1b6df570aee23bfe304fc4ebe0a/DQN/Images/ModelEvalNSFNET.pdf -------------------------------------------------------------------------------- /DQN/README.md: -------------------------------------------------------------------------------- 1 | # Instructions to execute 2 | 3 | 1. First, create the virtual environment and activate the environment. 4 | ```ruby 5 | virtualenv -p python3 myenv 6 | source myenv/bin/activate 7 | ``` 8 | 9 | 2. Then, we install all the required packages. 10 | ```ruby 11 | pip install -r requirements.txt 12 | ``` 13 | 14 | 3. Register custom gym environment. 15 | ```ruby 16 | pip install -e gym-environments/ 17 | ``` 18 | 19 | 4. Now we are ready to train a DQN agent. To do this, we must execute the following command. Notice that inside the *train_DQN.py* there are different hyperparameters that you can configure to set the training for different topologies, to define the size of the GNN model, etc. 20 | ```ruby 21 | python train_DQN.py 22 | ``` 23 | 24 | 5. Now that the training process is executing, we can see the DQN agent performance evolution by parsing the log files. 
25 | ```ruby 26 | python parse.py -d ./Logs/expsample_DQN_agentLogs.txt 27 | ``` 28 | 29 | 6. Finally, we can evaluate our trained model on different topologies executing the command below. Notice that in the *evaluate_DQN.py* script you must modify the hyperparameters of the model to match the ones from the trained model. 30 | ```ruby 31 | python evaluate_DQN.py -d ./Logs/expsample_DQN_agentLogs.txt 32 | ``` -------------------------------------------------------------------------------- /DQN/evaluate_DQN.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import gym 3 | import os 4 | import gym_environments 5 | import networkx as nx 6 | import random 7 | import matplotlib.pyplot as plt 8 | import argparse 9 | import mpnn as gnn 10 | from collections import deque 11 | import tensorflow as tf 12 | 13 | os.environ['CUDA_VISIBLE_DEVICES'] = '-1' 14 | 15 | ENV_NAME_AGENT = 'GraphEnv-v1' 16 | ENV_NAME = 'GraphEnv-v1' 17 | 18 | SEED = 9 19 | os.environ['PYTHONHASHSEED']=str(SEED) 20 | np.random.seed(SEED) 21 | tf.random.set_seed(1) 22 | 23 | # Force TensorFlow to use single thread. 24 | # Multiple threads are a potential source of non-reproducible results. 25 | # For further details, see: https://stackoverflow.com/questions/42022950/ 26 | # tf.config.threading.set_inter_op_parallelism_threads(1) 27 | # tf.config.threading.set_intra_op_parallelism_threads(1) 28 | 29 | NUMBER_EPISODES = 50 30 | # We assume that the number of samples is always larger than the number of demands any agent can ever allocate 31 | NUM_SAMPLES_EPSD = 100 32 | 33 | # Set evaluation topology 34 | graph_topology = 0 # 0==NSFNET, 1==GEANT2, 2==Small Topology, 3==GBN 35 | listofDemands = [8, 32, 64] 36 | 37 | hparams = { 38 | 'l2': 0.1, 39 | 'dropout_rate': 0.01, 40 | 'link_state_dim': 20, 41 | 'readout_units': 35, 42 | 'learning_rate': 0.0001, 43 | 'batch_size': 32, 44 | 'T': 4, 45 | 'num_demands': len(listofDemands) 46 | } 47 | 48 | class SAPAgent: 49 | # Shortest Available Path 50 | # Select the shortest available path among the K paths 51 | def __init__(self): 52 | self.K = 4 53 | 54 | def act(self, env, state, demand, n1, n2): 55 | pathList = env.allPaths[str(n1) +':'+ str(n2)] 56 | path = 0 57 | allocated = 0 # Indicates 1 if we allocated the demand, 0 otherwise 58 | new_state = np.copy(state) 59 | while allocated==0 and path < len(pathList) and path take always the action that the agent is saying has higher q-value 195 | # Otherwise, we are training with normal epsilon-greedy strategy 196 | if flagEvaluation: 197 | # If evaluation, compute K=4 q-values and take the maxium value 198 | takeMax_epsilon = True 199 | else: 200 | # If training, compute epsilon-greedy 201 | z = np.random.random() 202 | if z > self.epsilon: 203 | # Compute K=4 q-values and pick the one with highest value 204 | # In case of multiple same max values, return the first one 205 | takeMax_epsilon = True 206 | else: 207 | # Pick a random path and compute only one q-value 208 | path = np.random.randint(0, len(pathList)) 209 | action = path 210 | 211 | # 2. Allocate (S,D, linkDemand) demand using the K shortest paths 212 | while path < len(pathList): 213 | state_copy = np.copy(state) 214 | currentPath = pathList[path] 215 | i = 0 216 | j = 1 217 | 218 | # 3. 
Iterate over paths' pairs of nodes and allocate demand to bw_allocated 219 | while (j < len(currentPath)): 220 | state_copy[env.edgesDict[str(currentPath[i]) + ':' + str(currentPath[j])]][1] = demand 221 | i = i + 1 222 | j = j + 1 223 | 224 | # 4. Add allocated graphs' features to the list. Later we will compute their q-values using cummax 225 | listGraphs.append(state_copy) 226 | features = self.get_graph_features(env, state_copy) 227 | list_k_features.append(features) 228 | 229 | if not takeMax_epsilon: 230 | # If we don't need to compute the K=4 q-values we exit 231 | break 232 | 233 | path = path + 1 234 | 235 | vs = [v for v in list_k_features] 236 | 237 | # We compute the graphs_ids to later perform the unsorted_segment_sum for each graph and obtain the 238 | # link hidden states for each graph. 239 | graph_ids = [tf.fill([tf.shape(vs[it]['link_state'])[0]], it) for it in range(len(list_k_features))] 240 | first_offset = cummax(vs, lambda v: v['first']) 241 | second_offset = cummax(vs, lambda v: v['second']) 242 | 243 | tensors = ({ 244 | 'graph_id': tf.concat([v for v in graph_ids], axis=0), 245 | 'link_state': tf.concat([v['link_state'] for v in vs], axis=0), 246 | 'first': tf.concat([v['first'] + m for v, m in zip(vs, first_offset)], axis=0), 247 | 'second': tf.concat([v['second'] + m for v, m in zip(vs, second_offset)], axis=0), 248 | 'num_edges': tf.math.add_n([v['num_edges'] for v in vs]), 249 | } 250 | ) 251 | 252 | # Predict qvalues for all graphs within tensors 253 | self.listQValues = self.primary_network(tensors['link_state'], tensors['graph_id'], tensors['first'], 254 | tensors['second'], tensors['num_edges'], training=False).numpy() 255 | 256 | if takeMax_epsilon: 257 | # We take the path with highest q-value 258 | action = np.argmax(self.listQValues) 259 | else: 260 | return path, list_k_features[0] 261 | 262 | return action, list_k_features[action] 263 | 264 | def get_graph_features(self, env, copyGraph): 265 | """ 266 | We iterate over the converted graph nodes and take the features. The capacity and bw allocated features 267 | are normalized on the fly. 
268 | """ 269 | self.bw_demand_feature.fill(0.0) 270 | self.capacity_feature = (copyGraph[:,0] - 100.00000001) / 200.0 271 | 272 | itera = 0 273 | for i in copyGraph[:, 1]: 274 | if i == 8: 275 | self.bw_demand_feature[itera][0] = 1 276 | elif i == 32: 277 | self.bw_demand_feature[itera][1] = 1 278 | elif i == 64: 279 | self.bw_demand_feature[itera][2] = 1 280 | itera = itera + 1 281 | 282 | sample = { 283 | 'num_edges': env.numEdges, 284 | 'length': env.firstTrueSize, 285 | 'betweenness': tf.convert_to_tensor(value=env.between_feature, dtype=tf.float32), 286 | 'bw_allocated': tf.convert_to_tensor(value=self.bw_demand_feature, dtype=tf.float32), 287 | 'capacities': tf.convert_to_tensor(value=self.capacity_feature, dtype=tf.float32), 288 | 'first': tf.convert_to_tensor(env.first, dtype=tf.int32), 289 | 'second': tf.convert_to_tensor(env.second, dtype=tf.int32) 290 | } 291 | 292 | sample['capacities'] = tf.reshape(sample['capacities'][0:sample['num_edges']], [sample['num_edges'], 1]) 293 | sample['betweenness'] = tf.reshape(sample['betweenness'][0:sample['num_edges']], [sample['num_edges'], 1]) 294 | 295 | hiddenStates = tf.concat([sample['capacities'], sample['betweenness'], sample['bw_allocated']], axis=1) 296 | 297 | paddings = tf.constant([[0, 0], [0, hparams['link_state_dim'] - 2 - hparams['num_demands']]]) 298 | link_state = tf.pad(tensor=hiddenStates, paddings=paddings, mode="CONSTANT") 299 | 300 | inputs = {'link_state': link_state, 'first': sample['first'][0:sample['length']], 301 | 'second': sample['second'][0:sample['length']], 'num_edges': sample['num_edges']} 302 | 303 | return inputs 304 | 305 | def exec_lb_model_episodes(experience_memory, graph_topology): 306 | env_lb = gym.make(ENV_NAME) 307 | env_lb.seed(SEED) 308 | env_lb.generate_environment(graph_topology, listofDemands) 309 | 310 | agent = LBAgent() 311 | rewards_lb = np.zeros(NUMBER_EPISODES) 312 | 313 | rewardAdd = 0 314 | reward_it = 0 315 | iter_episode = 0 # Iterates over samples within the same episode 316 | new_episode = True 317 | wait_for_new_episode = False 318 | new_episode_it = 0 # Iterates over EPISODES 319 | while iter_episode < len(experience_memory): 320 | if new_episode: 321 | new_episode = False 322 | demand = experience_memory[iter_episode][1] 323 | source = experience_memory[iter_episode][2] 324 | destination = experience_memory[iter_episode][3] 325 | state = env_lb.eval_sap_reset(demand, source, destination) 326 | 327 | action = agent.act(env_lb, state, demand, source, destination) 328 | new_state, reward, done, _, _, _ = env_lb.make_step(state, action, demand, source, destination) 329 | env_lb.demand = demand 330 | env_lb.source = source 331 | env_lb.destination = destination 332 | rewardAdd = rewardAdd + reward 333 | state = new_state 334 | 335 | if done: 336 | rewards_lb[reward_it] = rewardAdd 337 | reward_it = reward_it + 1 338 | wait_for_new_episode = True 339 | 340 | iter_episode = iter_episode + 1 341 | else: 342 | if experience_memory[iter_episode][0] != new_episode_it: 343 | print("LB ERROR! 
The experience replay buffer needs more samples/episode") 344 | os.kill(os.getpid(), 9) 345 | 346 | demand = experience_memory[iter_episode][1] 347 | source = experience_memory[iter_episode][2] 348 | destination = experience_memory[iter_episode][3] 349 | action = agent.act(env_lb, state, demand, source, destination) 350 | new_state, reward, done, _, _, _ = env_lb.make_step(state, action, demand, source, destination) 351 | env_lb.demand = demand 352 | env_lb.source = source 353 | env_lb.destination = destination 354 | rewardAdd = rewardAdd + reward 355 | state = new_state 356 | 357 | if done: 358 | rewards_lb[reward_it] = rewardAdd 359 | reward_it = reward_it + 1 360 | wait_for_new_episode = True 361 | 362 | iter_episode = iter_episode + 1 363 | if wait_for_new_episode: 364 | rewardAdd = 0 365 | wait_for_new_episode = False 366 | new_episode = True 367 | new_episode_it = new_episode_it + 1 368 | iter_episode = new_episode_it*NUM_SAMPLES_EPSD 369 | return rewards_lb 370 | 371 | def exec_sap_model_episodes(experience_memory, graph_topology): 372 | env_sap = gym.make(ENV_NAME) 373 | env_sap.seed(SEED) 374 | env_sap.generate_environment(graph_topology, listofDemands) 375 | 376 | agent = SAPAgent() 377 | rewards_sap = np.zeros(NUMBER_EPISODES) 378 | 379 | rewardAdd = 0 380 | reward_it = 0 381 | iter_episode = 0 # Iterates over samples within the same episode 382 | new_episode = True 383 | wait_for_new_episode = False 384 | new_episode_it = 0 # Iterates over EPISODES 385 | while iter_episode < len(experience_memory): 386 | if new_episode: 387 | new_episode = False 388 | demand = experience_memory[iter_episode][1] 389 | source = experience_memory[iter_episode][2] 390 | destination = experience_memory[iter_episode][3] 391 | state = env_sap.eval_sap_reset(demand, source, destination) 392 | 393 | action = agent.act(env_sap, state, demand, source, destination) 394 | new_state, reward, done, _, _, _ = env_sap.make_step(state, action, demand, source, destination) 395 | env_sap.demand = demand 396 | env_sap.source = source 397 | env_sap.destination = destination 398 | rewardAdd = rewardAdd + reward 399 | state = new_state 400 | 401 | if done: 402 | rewards_sap[reward_it] = rewardAdd 403 | reward_it = reward_it + 1 404 | wait_for_new_episode = True 405 | 406 | iter_episode = iter_episode + 1 407 | else: 408 | if experience_memory[iter_episode][0]!=new_episode_it: 409 | print("SAP ERROR! 
The experience replay buffer needs more samples/episode") 410 | os.kill(os.getpid(), 9) 411 | 412 | demand = experience_memory[iter_episode][1] 413 | source = experience_memory[iter_episode][2] 414 | destination = experience_memory[iter_episode][3] 415 | action = agent.act(env_sap, state, demand, source, destination) 416 | new_state, reward, done, _, _, _ = env_sap.make_step(state, action, demand, source, destination) 417 | env_sap.demand = demand 418 | env_sap.source = source 419 | env_sap.destination = destination 420 | rewardAdd = rewardAdd + reward 421 | state = new_state 422 | 423 | if done: 424 | rewards_sap[reward_it] = rewardAdd 425 | reward_it = reward_it + 1 426 | wait_for_new_episode = True 427 | 428 | iter_episode = iter_episode + 1 429 | if wait_for_new_episode: 430 | rewardAdd = 0 431 | wait_for_new_episode = False 432 | new_episode = True 433 | new_episode_it = new_episode_it + 1 434 | iter_episode = new_episode_it * NUM_SAMPLES_EPSD 435 | return rewards_sap 436 | 437 | def exec_dqn_model_episodes(experience_memory, env_dqn, agent): 438 | rewards_dqn = np.zeros(NUMBER_EPISODES) 439 | 440 | rewardAdd = 0 441 | reward_it = 0 442 | iter_episode = 0 # Iterates over samples within the same episode 443 | new_episode = True 444 | wait_for_new_episode = False 445 | new_episode_it = 0 # Iterates over EPISODES 446 | while iter_episode < len(experience_memory): 447 | if new_episode: 448 | new_episode = False 449 | demand = experience_memory[iter_episode][1] 450 | source = experience_memory[iter_episode][2] 451 | destination = experience_memory[iter_episode][3] 452 | state = env_dqn.eval_sap_reset(demand, source, destination) 453 | 454 | action, state_action = agent.act(env_dqn, state, demand, source, destination, True) 455 | new_state, reward, done, new_demand, new_source, new_destination = env_dqn.make_step(state, action, demand, source, destination) 456 | rewardAdd = rewardAdd + reward 457 | state = new_state 458 | if done: 459 | rewards_dqn[reward_it] = rewardAdd 460 | reward_it = reward_it + 1 461 | wait_for_new_episode = True 462 | iter_episode = iter_episode + 1 463 | else: 464 | if experience_memory[iter_episode][0] != new_episode_it: 465 | print("DQNAgent ERROR! 
The experience replay buffer needs more samples/episode") 466 | os.kill(os.getpid(), 9) 467 | 468 | demand = experience_memory[iter_episode][1] 469 | source = experience_memory[iter_episode][2] 470 | destination = experience_memory[iter_episode][3] 471 | 472 | action, state_action = agent.act(env_dqn, state, demand, source, destination, True) 473 | new_state, reward, done, new_demand, new_source, new_destination = env_dqn.make_step(state, action, demand, source, destination) 474 | rewardAdd = rewardAdd + reward 475 | state = new_state 476 | if done: 477 | rewards_dqn[reward_it] = rewardAdd 478 | reward_it = reward_it + 1 479 | wait_for_new_episode = True 480 | iter_episode = iter_episode + 1 481 | if wait_for_new_episode: 482 | rewardAdd = 0 483 | wait_for_new_episode = False 484 | new_episode = True 485 | new_episode_it = new_episode_it + 1 486 | if new_episode_it%5==0: 487 | print("DQN Episode >>> ", new_episode_it) 488 | iter_episode = new_episode_it * NUM_SAMPLES_EPSD 489 | return rewards_dqn 490 | 491 | if __name__ == "__main__": 492 | # python evaluate_DQN.py -d ./Logs/expsample_DQN_agentLogs.txt 493 | 494 | # Parse logs and get best model 495 | parser = argparse.ArgumentParser(description='Parse file and create plots') 496 | 497 | parser.add_argument('-d', help='data file', type=str, required=True, nargs='+') 498 | args = parser.parse_args() 499 | 500 | aux = args.d[0].split(".") 501 | aux = aux[1].split("exp") 502 | differentiation_str = str(aux[1].split("Logs")[0]) 503 | 504 | if not os.path.exists("./Images"): 505 | os.makedirs("./Images") 506 | 507 | topo = "" 508 | if graph_topology==0: 509 | topo = "NSFNET" 510 | elif graph_topology==1: 511 | topo = "GEANT2" 512 | elif graph_topology==2: 513 | topo = "Small_Top" 514 | else: 515 | topo = "GBN" 516 | 517 | # Uncomment the following if you want to store the demands in a file 518 | # store_experiences = open("Traffic_demands_"+topo+"_1K.txt", "w") 519 | model_id = 0 520 | with open(args.d[0]) as fp: 521 | for line in reversed(list(fp)): 522 | arrayLine = line.split(":") 523 | if arrayLine[0]=='MAX REWD': 524 | model_id = int(arrayLine[2].split(",")[0]) 525 | break 526 | 527 | env_dqn = gym.make(ENV_NAME_AGENT) 528 | env_dqn.seed(SEED) 529 | env_dqn.generate_environment(graph_topology, listofDemands) 530 | 531 | dqn_agent = DQNAgent(env_dqn) 532 | checkpoint_dir = "./models" + differentiation_str 533 | checkpoint = tf.train.Checkpoint(model=dqn_agent.primary_network, optimizer=dqn_agent.optimizer) 534 | # Restore variables on creation if a checkpoint exists. 535 | checkpoint.restore(checkpoint_dir + "/ckpt-" + str(model_id)) 536 | print("Load model " + checkpoint_dir + "/ckpt-" + str(model_id)) 537 | 538 | means_sap = np.zeros(NUMBER_EPISODES) 539 | means_dqn = np.zeros(NUMBER_EPISODES) 540 | means_lb = np.zeros(NUMBER_EPISODES) 541 | iters = np.zeros(NUMBER_EPISODES) 542 | 543 | experience_memory = deque(maxlen=NUMBER_EPISODES*NUM_SAMPLES_EPSD) 544 | 545 | # Generate lists of determined size of demands. 
The different agents will iterate over the same list 546 | for ep_num in range(NUMBER_EPISODES): 547 | for sample in range(NUM_SAMPLES_EPSD): 548 | demand = np.random.choice(listofDemands) 549 | source = np.random.choice(env_dqn.nodes) 550 | 551 | # We pick a pair of SOURCE,DESTINATION different nodes 552 | while True: 553 | destination = np.random.choice(env_dqn.nodes) 554 | if destination != source: 555 | # We generate unique demands that don't overlap with existing topology edges 556 | experience_memory.append((ep_num, demand, source, destination)) 557 | #cstore_experiences.write(str(ep_num)+","+str(source)+","+str(destination)+","+str(demand)+"\n") 558 | break 559 | 560 | # store_experiences.close() 561 | 562 | rewards_lb = exec_lb_model_episodes(experience_memory, graph_topology) 563 | rewards_sap = exec_sap_model_episodes(experience_memory, graph_topology) 564 | rewards_dqn = exec_dqn_model_episodes(experience_memory, env_dqn, dqn_agent) 565 | 566 | #rewards_lb.tofile('rewards_lb'+topo+'1K.dat') 567 | #rewards_dqn.tofile('rewards_dqn'+topo+'1K.dat') 568 | 569 | plt.rcParams.update({'font.size': 12}) 570 | plt.plot(rewards_dqn, 'r', label="DQN") 571 | plt.plot(rewards_sap, 'b', label="SAP") 572 | plt.plot(rewards_lb, 'g', label="LB") 573 | 574 | #DQN 575 | mean = np.mean(rewards_dqn) 576 | means_dqn.fill(mean) 577 | plt.plot(means_dqn, 'r', linestyle="-.") 578 | 579 | #SAP 580 | mean = np.mean(rewards_sap) 581 | means_sap.fill(mean) 582 | plt.plot(means_sap, 'b', linestyle=":") 583 | 584 | #LB 585 | mean = np.mean(rewards_lb) 586 | means_lb.fill(mean) 587 | plt.plot(means_lb, 'g', linestyle="--") 588 | 589 | plt.xlabel("Episodes", fontsize=14, fontweight='bold') 590 | plt.ylabel("Score", fontsize=14, fontweight='bold') 591 | lgd = plt.legend(loc="lower left", bbox_to_anchor=(0.1, -0.24), 592 | ncol=4, fancybox=True, shadow=True) 593 | 594 | plt.savefig("./Images/ModelEval"+topo+".pdf", bbox_extra_artists=(lgd,), bbox_inches='tight') 595 | #plt.show() 596 | 597 | -------------------------------------------------------------------------------- /DQN/gym-environments/gym_environments/__init__.py: -------------------------------------------------------------------------------- 1 | from gym.envs.registration import register 2 | 3 | register( 4 | id='GraphEnv-v1', 5 | entry_point='gym_environments.envs:Env1', 6 | ) -------------------------------------------------------------------------------- /DQN/gym-environments/gym_environments/envs/__init__.py: -------------------------------------------------------------------------------- 1 | from gym_environments.envs.environment1 import Env1 -------------------------------------------------------------------------------- /DQN/gym-environments/gym_environments/envs/environment1.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) 2021, Paul Almasan [^1] 2 | # 3 | # [^1]: Universitat Politècnica de Catalunya, Computer Architecture 4 | # department, Barcelona, Spain. 
Email: felician.paul.almasan@upc.edu 5 | 6 | import gym 7 | import numpy as np 8 | import networkx as nx 9 | import random 10 | from gym import error, spaces, utils 11 | from random import choice 12 | import pylab 13 | import json 14 | import gc 15 | import matplotlib.pyplot as plt 16 | 17 | def create_geant2_graph(): 18 | Gbase = nx.Graph() 19 | Gbase.add_nodes_from([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23]) 20 | Gbase.add_edges_from( 21 | [(0, 1), (0, 2), (1, 3), (1, 6), (1, 9), (2, 3), (2, 4), (3, 6), (4, 7), (5, 3), 22 | (5, 8), (6, 9), (6, 8), (7, 11), (7, 8), (8, 11), (8, 20), (8, 17), (8, 18), (8, 12), 23 | (9, 10), (9, 13), (9, 12), (10, 13), (11, 20), (11, 14), (12, 13), (12,19), (12,21), 24 | (14, 15), (15, 16), (16, 17), (17,18), (18,21), (19, 23), (21,22), (22, 23)]) 25 | 26 | return Gbase 27 | 28 | def create_nsfnet_graph(): 29 | Gbase = nx.Graph() 30 | Gbase.add_nodes_from([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]) 31 | Gbase.add_edges_from( 32 | [(0, 1), (0, 2), (0, 3), (1, 2), (1, 7), (2, 5), (3, 8), (3, 4), (4, 5), (4, 6), (5, 12), (5, 13), 33 | (6, 7), (7, 10), (8, 9), (8, 11), (9, 10), (9, 12), (10, 11), (10, 13), (11, 12)]) 34 | 35 | return Gbase 36 | 37 | def create_small_top(): 38 | Gbase = nx.Graph() 39 | Gbase.add_nodes_from([0, 1, 2, 3, 4, 5, 6, 7, 8]) 40 | Gbase.add_edges_from( 41 | [(0, 1), (0, 2), (0, 3), (1, 2), (1, 7), (2, 5), (3, 8), (3, 4), (4, 5), (4, 6), (5, 0), 42 | (6, 7), (6, 8), (7, 8), (8, 0), (8, 6), (3, 2), (5, 3)]) 43 | 44 | return Gbase 45 | 46 | def create_gbn_graph(): 47 | Gbase = nx.Graph() 48 | Gbase.add_nodes_from([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]) 49 | Gbase.add_edges_from( 50 | [(0, 2), (0, 8), (1, 2), (1, 3), (1, 4), (2, 4), (3, 4), (3, 9), (4, 8), (4, 10), (4, 9), 51 | (5, 6), (5, 8), (6, 7), (7, 8), (7, 10), (9, 10), (9, 12), (10, 11), (10, 12), (11, 13), 52 | (12, 14), (12, 16), (13, 14), (14, 15), (15, 16)]) 53 | 54 | return Gbase 55 | 56 | def generate_nx_graph(topology): 57 | """ 58 | Generate graphs for training with the same topology. 59 | """ 60 | if topology == 0: 61 | G = create_nsfnet_graph() 62 | elif topology == 1: 63 | G = create_geant2_graph() 64 | elif topology == 2: 65 | G = create_small_top() 66 | else: 67 | G = create_gbn_graph() 68 | 69 | # nx.draw(G, with_labels=True) 70 | # plt.show() 71 | # plt.clf() 72 | 73 | # Node id counter 74 | incId = 1 75 | # Put all distance weights into edge attributes. 
76 | for i, j in G.edges(): 77 | G.get_edge_data(i, j)['edgeId'] = incId 78 | G.get_edge_data(i, j)['betweenness'] = 0 79 | G.get_edge_data(i, j)['numsp'] = 0 # Indicates the number of shortest paths going through the link 80 | # We set the edges capacities to 200 81 | G.get_edge_data(i, j)["capacity"] = float(200) 82 | G.get_edge_data(i, j)['bw_allocated'] = 0 83 | incId = incId + 1 84 | 85 | return G 86 | 87 | 88 | def compute_link_betweenness(g, k): 89 | n = len(g.nodes()) 90 | betw = [] 91 | for i, j in g.edges(): 92 | # we add a very small number to avoid division by zero 93 | b_link = g.get_edge_data(i, j)['numsp'] / ((2.0 * n * (n - 1) * k) + 0.00000001) 94 | g.get_edge_data(i, j)['betweenness'] = b_link 95 | betw.append(b_link) 96 | 97 | mu_bet = np.mean(betw) 98 | std_bet = np.std(betw) 99 | return mu_bet, std_bet 100 | 101 | class Env1(gym.Env): 102 | """ 103 | Description: 104 | The self.graph_state stores the relevant features for the GNN model 105 | 106 | self.graph_state[:][0] = CAPACITY 107 | self.graph_state[:][1] = BW_ALLOCATED 108 | """ 109 | def __init__(self): 110 | self.graph = None 111 | self.initial_state = None 112 | self.source = None 113 | self.destination = None 114 | self.demand = None 115 | self.graph_state = None 116 | self.diameter = None 117 | 118 | # Nx Graph where the nodes have features. Betweenness is allways normalized. 119 | # The other features are "raw" and are being normalized before prediction 120 | self.first = None 121 | self.firstTrueSize = None 122 | self.second = None 123 | self.between_feature = None 124 | 125 | # Mean and standard deviation of link betweenness 126 | self.mu_bet = None 127 | self.std_bet = None 128 | 129 | self.max_demand = 0 130 | self.K = 4 131 | self.listofDemands = None 132 | self.nodes = None 133 | self.ordered_edges = None 134 | self.edgesDict = None 135 | self.numNodes = None 136 | self.numEdges = None 137 | 138 | self.state = None 139 | self.episode_over = True 140 | self.reward = 0 141 | self.allPaths = dict() 142 | 143 | def seed(self, seed): 144 | random.seed(seed) 145 | np.random.seed(seed) 146 | 147 | def num_shortest_path(self, topology): 148 | self.diameter = nx.diameter(self.graph) 149 | 150 | # Iterate over all node1,node2 pairs from the graph 151 | for n1 in self.graph: 152 | for n2 in self.graph: 153 | if (n1 != n2): 154 | # Check if we added the element of the matrix 155 | if str(n1)+':'+str(n2) not in self.allPaths: 156 | self.allPaths[str(n1)+':'+str(n2)] = [] 157 | 158 | # First we compute the shortest paths taking into account the diameter 159 | # This is because large topologies might take too long to compute all shortest paths 160 | [self.allPaths[str(n1)+':'+str(n2)].append(p) for p in nx.all_simple_paths(self.graph, source=n1, target=n2, cutoff=self.diameter*2)] 161 | 162 | # We take all the paths from n1 to n2 and we order them according to the path length 163 | self.allPaths[str(n1)+':'+str(n2)] = sorted(self.allPaths[str(n1)+':'+str(n2)], key=lambda item: (len(item), item)) 164 | 165 | path = 0 166 | while path < self.K and path < len(self.allPaths[str(n1)+':'+str(n2)]): 167 | currentPath = self.allPaths[str(n1)+':'+str(n2)][path] 168 | i = 0 169 | j = 1 170 | 171 | # Iterate over pairs of nodes increase the number of sp 172 | while (j < len(currentPath)): 173 | self.graph.get_edge_data(currentPath[i], currentPath[j])['numsp'] = \ 174 | self.graph.get_edge_data(currentPath[i], currentPath[j])['numsp'] + 1 175 | i = i + 1 176 | j = j + 1 177 | 178 | path = path + 1 179 | 180 | # Remove paths not 
needed 181 | del self.allPaths[str(n1)+':'+str(n2)][path:len(self.allPaths[str(n1)+':'+str(n2)])] 182 | gc.collect() 183 | 184 | 185 | def _first_second_between(self): 186 | self.first = list() 187 | self.second = list() 188 | 189 | # For each edge we iterate over all neighbour edges 190 | for i, j in self.ordered_edges: 191 | neighbour_edges = self.graph.edges(i) 192 | 193 | for m, n in neighbour_edges: 194 | if ((i != m or j != n) and (i != n or j != m)): 195 | self.first.append(self.edgesDict[str(i) +':'+ str(j)]) 196 | self.second.append(self.edgesDict[str(m) +':'+ str(n)]) 197 | 198 | neighbour_edges = self.graph.edges(j) 199 | for m, n in neighbour_edges: 200 | if ((i != m or j != n) and (i != n or j != m)): 201 | self.first.append(self.edgesDict[str(i) +':'+ str(j)]) 202 | self.second.append(self.edgesDict[str(m) +':'+ str(n)]) 203 | 204 | 205 | def generate_environment(self, topology, listofdemands): 206 | # The nx graph will only be used to convert graph from edges to nodes 207 | self.graph = generate_nx_graph(topology) 208 | 209 | self.listofDemands = listofdemands 210 | 211 | self.max_demand = np.amax(self.listofDemands) 212 | 213 | # Compute number of shortest paths per link. This will be used for the betweenness 214 | self.num_shortest_path(topology) 215 | 216 | # Compute the betweenness value for each link 217 | self.mu_bet, self.std_bet = compute_link_betweenness(self.graph, self.K) 218 | 219 | self.edgesDict = dict() 220 | 221 | some_edges_1 = [tuple(sorted(edge)) for edge in self.graph.edges()] 222 | self.ordered_edges = sorted(some_edges_1) 223 | 224 | self.numNodes = len(self.graph.nodes()) 225 | self.numEdges = len(self.graph.edges()) 226 | 227 | self.graph_state = np.zeros((self.numEdges, 2)) 228 | self.between_feature = np.zeros(self.numEdges) 229 | 230 | position = 0 231 | for edge in self.ordered_edges: 232 | i = edge[0] 233 | j = edge[1] 234 | self.edgesDict[str(i)+':'+str(j)] = position 235 | self.edgesDict[str(j)+':'+str(i)] = position 236 | betweenness = (self.graph.get_edge_data(i, j)['betweenness'] - self.mu_bet) / self.std_bet 237 | self.graph.get_edge_data(i, j)['betweenness'] = betweenness 238 | self.graph_state[position][0] = self.graph.get_edge_data(i, j)["capacity"] 239 | self.between_feature[position] = self.graph.get_edge_data(i, j)['betweenness'] 240 | position = position + 1 241 | 242 | self.initial_state = np.copy(self.graph_state) 243 | 244 | self._first_second_between() 245 | 246 | self.firstTrueSize = len(self.first) 247 | 248 | # We create the list of nodes ids to pick randomly from them 249 | self.nodes = list(range(0,self.numNodes)) 250 | 251 | def make_step(self, state, action, demand, source, destination): 252 | self.graph_state = np.copy(state) 253 | self.episode_over = True 254 | self.reward = 0 255 | 256 | i = 0 257 | j = 1 258 | currentPath = self.allPaths[str(source) +':'+ str(destination)][action] 259 | 260 | # Once we pick the action, we decrease the total edge capacity from the edges 261 | # from the allocated path (action path) 262 | while (j < len(currentPath)): 263 | self.graph_state[self.edgesDict[str(currentPath[i]) + ':' + str(currentPath[j])]][0] -= demand 264 | if self.graph_state[self.edgesDict[str(currentPath[i]) + ':' + str(currentPath[j])]][0] < 0: 265 | # FINISH IF LINKS CAPACITY <0 266 | return self.graph_state, self.reward, self.episode_over, self.demand, self.source, self.destination 267 | i = i + 1 268 | j = j + 1 269 | 270 | # Leave the bw_allocated back to 0 271 | self.graph_state[:,1] = 0 272 | 273 | # Reward is 
the allocated demand or 0 otherwise (end of episode) 274 | # We normalize the demand to don't have extremely large values 275 | self.reward = demand/self.max_demand 276 | self.episode_over = False 277 | 278 | self.demand = random.choice(self.listofDemands) 279 | self.source = random.choice(self.nodes) 280 | 281 | # We pick a pair of SOURCE,DESTINATION different nodes 282 | while True: 283 | self.destination = random.choice(self.nodes) 284 | if self.destination != self.source: 285 | break 286 | 287 | return self.graph_state, self.reward, self.episode_over, self.demand, self.source, self.destination 288 | 289 | def reset(self): 290 | """ 291 | Reset environment and setup for new episode. Generate new demand and pair source, destination. 292 | 293 | Returns: 294 | initial state of reset environment, a new demand and a source and destination node 295 | """ 296 | self.graph_state = np.copy(self.initial_state) 297 | self.demand = random.choice(self.listofDemands) 298 | self.source = random.choice(self.nodes) 299 | 300 | # We pick a pair of SOURCE,DESTINATION different nodes 301 | while True: 302 | self.destination = random.choice(self.nodes) 303 | if self.destination != self.source: 304 | break 305 | 306 | return self.graph_state, self.demand, self.source, self.destination 307 | 308 | def eval_sap_reset(self, demand, source, destination): 309 | """ 310 | Reset environment and setup for new episode. This function is used in the "evaluate_DQN.py" script. 311 | """ 312 | self.graph_state = np.copy(self.initial_state) 313 | self.demand = demand 314 | self.source = source 315 | self.destination = destination 316 | 317 | return self.graph_state -------------------------------------------------------------------------------- /DQN/gym-environments/setup.py: -------------------------------------------------------------------------------- 1 | from setuptools import setup 2 | 3 | setup(name='gym_environments', 4 | version='0.0.1', 5 | install_requires=['gym'] # And any other dependencies foo needs 6 | ) -------------------------------------------------------------------------------- /DQN/modelssample_DQN_agent/checkpoint: -------------------------------------------------------------------------------- 1 | model_checkpoint_path: "ckpt-367" 2 | all_model_checkpoint_paths: "ckpt-367" 3 | -------------------------------------------------------------------------------- /DQN/modelssample_DQN_agent/ckpt-349.data-00000-of-00001: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/knowledgedefinednetworking/DRL-GNN/e3bc32bc6b65c1b6df570aee23bfe304fc4ebe0a/DQN/modelssample_DQN_agent/ckpt-349.data-00000-of-00001 -------------------------------------------------------------------------------- /DQN/modelssample_DQN_agent/ckpt-349.index: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/knowledgedefinednetworking/DRL-GNN/e3bc32bc6b65c1b6df570aee23bfe304fc4ebe0a/DQN/modelssample_DQN_agent/ckpt-349.index -------------------------------------------------------------------------------- /DQN/mpnn.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) 2021, Paul Almasan [^1] 2 | # 3 | # [^1]: Universitat Politècnica de Catalunya, Computer Architecture 4 | # department, Barcelona, Spain. 
Email: felician.paul.almasan@upc.edu 5 | 6 | import tensorflow as tf 7 | from tensorflow import keras 8 | from keras import regularizers 9 | 10 | class myModel(tf.keras.Model): 11 | def __init__(self, hparams): 12 | super(myModel, self).__init__() 13 | self.hparams = hparams 14 | 15 | # Define layers here 16 | self.Message = tf.keras.models.Sequential() 17 | self.Message.add(keras.layers.Dense(self.hparams['link_state_dim'], 18 | activation=tf.nn.selu, name="FirstLayer")) 19 | 20 | self.Update = tf.keras.layers.GRUCell(self.hparams['link_state_dim'], dtype=tf.float32) 21 | 22 | self.Readout = tf.keras.models.Sequential() 23 | self.Readout.add(keras.layers.Dense(self.hparams['readout_units'], 24 | activation=tf.nn.selu, 25 | kernel_regularizer=regularizers.l2(hparams['l2']), 26 | name="Readout1")) 27 | self.Readout.add(keras.layers.Dropout(rate=hparams['dropout_rate'])) 28 | self.Readout.add(keras.layers.Dense(self.hparams['readout_units'], 29 | activation=tf.nn.selu, 30 | kernel_regularizer=regularizers.l2(hparams['l2']), 31 | name="Readout2")) 32 | self.Readout.add(keras.layers.Dropout(rate=hparams['dropout_rate'])) 33 | self.Readout.add(keras.layers.Dense(1, kernel_regularizer=regularizers.l2(hparams['l2']), 34 | name="Readout3")) 35 | 36 | def build(self, input_shape=None): 37 | self.Message.build(input_shape=tf.TensorShape([None, self.hparams['link_state_dim']*2])) 38 | self.Update.build(input_shape=tf.TensorShape([None,self.hparams['link_state_dim']])) 39 | self.Readout.build(input_shape=[None, self.hparams['link_state_dim']]) 40 | self.built = True 41 | 42 | @tf.function 43 | def call(self, states_action, states_graph_ids, states_first, states_second, sates_num_edges, training=False): 44 | # Define the forward pass 45 | link_state = states_action 46 | 47 | # Execute T times 48 | for _ in range(self.hparams['T']): 49 | # We have the combination of the hidden states of the main edges with the neighbours 50 | mainEdges = tf.gather(link_state, states_first) 51 | neighEdges = tf.gather(link_state, states_second) 52 | 53 | edgesConcat = tf.concat([mainEdges, neighEdges], axis=1) 54 | 55 | ### 1.a Message passing for link with all it's neighbours 56 | outputs = self.Message(edgesConcat) 57 | 58 | ### 1.b Sum of output values according to link id index 59 | edges_inputs = tf.math.unsorted_segment_sum(data=outputs, segment_ids=states_second, 60 | num_segments=sates_num_edges) 61 | 62 | ### 2. 
Update for each link 63 | # GRUcell needs a 3D tensor as state because there is a matmul: Wrap the link state 64 | outputs, links_state_list = self.Update(edges_inputs, [link_state]) 65 | 66 | link_state = links_state_list[0] 67 | 68 | # Perform sum of all hidden states 69 | edges_combi_outputs = tf.math.segment_sum(link_state, states_graph_ids, name=None) 70 | 71 | r = self.Readout(edges_combi_outputs,training=training) 72 | return r 73 | -------------------------------------------------------------------------------- /DQN/parse.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import os 3 | import matplotlib.pyplot as plt 4 | from scipy.signal import savgol_filter 5 | 6 | if __name__ == "__main__": 7 | # python parse.py -d ./Logs/expsample_DQN_agentLogs.txt 8 | parser = argparse.ArgumentParser(description='Parse file and create plots') 9 | 10 | parser.add_argument('-d', help='data file', type=str, required=True, nargs='+') 11 | args = parser.parse_args() 12 | 13 | aux = args.d[0].split(".") 14 | aux = aux[1].split("exp") 15 | differentiation_str = str(aux[1].split("Logs")[0]) 16 | 17 | list_score_test = [] 18 | epsilon_decay = [] 19 | list_losses = [] 20 | 21 | if not os.path.exists("./Images"): 22 | os.makedirs("./Images") 23 | 24 | with open(args.d[0]) as fp: 25 | for line in fp: 26 | arrayLine = line.split(",") 27 | if arrayLine[0]==">": 28 | list_score_test.append(float(arrayLine[1])) 29 | elif arrayLine[0]=="-": 30 | epsilon_decay.append(float(arrayLine[1])) 31 | elif arrayLine[0]==".": 32 | list_losses.append(float(arrayLine[1])) 33 | 34 | model_id = -1 35 | reward = 0 36 | with open(args.d[0]) as fp: 37 | for line in reversed(list(fp)): 38 | arrayLine = line.split(":") 39 | if arrayLine[0]=='MAX REWD': 40 | model_id = arrayLine[2].split(",")[0] 41 | reward = arrayLine[1].split(" ")[1] 42 | break 43 | 44 | print("Best model_id: "+model_id+" with Average Score Test of "+reward) 45 | 46 | plt.plot(list_score_test, label="Score") 47 | plt.xlabel("Episodes") 48 | plt.title("GNN+DQN Testing score") 49 | plt.ylabel("Average Score Test") 50 | plt.legend(loc="lower right") 51 | plt.savefig("./Images/AvgTestScore_" + differentiation_str) 52 | plt.close() 53 | 54 | # Plot epsilon evolution 55 | plt.plot(epsilon_decay) 56 | plt.xlabel("Episodes") 57 | plt.ylabel("Epsilon value") 58 | plt.savefig("./Images/Epsilon_" + differentiation_str) 59 | plt.close() 60 | 61 | # Plot Loss evolution 62 | ysmoothed = savgol_filter(list_losses, 51, 3) 63 | plt.plot(list_losses, color='lightblue') 64 | plt.plot(ysmoothed) 65 | plt.xlabel("Batch") 66 | plt.title("Average loss per batch") 67 | plt.ylabel("Loss") 68 | plt.yscale("log") 69 | plt.savefig("./Images/AvgLosses_" + differentiation_str) 70 | plt.close() -------------------------------------------------------------------------------- /DQN/requirements.txt: -------------------------------------------------------------------------------- 1 | absl-py==1.0.0 2 | astunparse==1.6.3 3 | cachetools==4.2.4 4 | certifi==2021.10.8 5 | charset-normalizer==2.0.7 6 | cloudpickle==2.0.0 7 | cycler==0.11.0 8 | flatbuffers==2.0 9 | fonttools==4.28.1 10 | gast==0.4.0 11 | google-auth==2.3.3 12 | google-auth-oauthlib==0.4.6 13 | google-pasta==0.2.0 14 | grpcio==1.42.0 15 | gym==0.21.0 16 | h5py==3.6.0 17 | idna==3.3 18 | importlib-metadata==4.8.2 19 | keras==2.7.0 20 | Keras-Preprocessing==1.1.2 21 | kiwisolver==1.3.2 22 | libclang==12.0.0 23 | Markdown==3.3.6 24 | matplotlib==3.5.0 25 | networkx==2.6.3 26 | 
numpy==1.21.4 27 | oauthlib==3.1.1 28 | opt-einsum==3.3.0 29 | packaging==21.3 30 | Pillow==8.4.0 31 | protobuf==3.19.1 32 | pyasn1==0.4.8 33 | pyasn1-modules==0.2.8 34 | pyparsing==3.0.6 35 | python-dateutil==2.8.2 36 | requests==2.26.0 37 | requests-oauthlib==1.3.0 38 | rsa==4.7.2 39 | scipy==1.7.2 40 | setuptools-scm==6.3.2 41 | six==1.16.0 42 | tensorboard==2.7.0 43 | tensorboard-data-server==0.6.1 44 | tensorboard-plugin-wit==1.8.0 45 | tensorflow==2.7.0 46 | tensorflow-estimator==2.7.0 47 | tensorflow-io-gcs-filesystem==0.22.0 48 | termcolor==1.1.0 49 | tomli==1.2.2 50 | typing-extensions==4.0.0 51 | urllib3==1.26.7 52 | Werkzeug==2.0.2 53 | wrapt==1.13.3 54 | zipp==3.6.0 55 | -------------------------------------------------------------------------------- /DQN/train_DQN.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) 2021, Paul Almasan [^1] 2 | # 3 | # [^1]: Universitat Politècnica de Catalunya, Computer Architecture 4 | # department, Barcelona, Spain. Email: felician.paul.almasan@upc.edu 5 | 6 | import numpy as np 7 | import gym 8 | import gc 9 | import os 10 | import sys 11 | import gym_environments 12 | import random 13 | import mpnn as gnn 14 | import tensorflow as tf 15 | from collections import deque 16 | import multiprocessing 17 | import time as tt 18 | import glob 19 | 20 | os.environ['CUDA_VISIBLE_DEVICES'] = '-1' 21 | 22 | ENV_NAME = 'GraphEnv-v1' 23 | graph_topology = 0 # 0==NSFNET, 1==GEANT2, 2==Small Topology, 3==GBN 24 | SEED = 37 25 | ITERATIONS = 10000 26 | TRAINING_EPISODES = 20 27 | EVALUATION_EPISODES = 40 28 | FIRST_WORK_TRAIN_EPISODE = 60 29 | 30 | MULTI_FACTOR_BATCH = 6 # Number of batches used in training 31 | TAU = 0.08 # Only used in soft weights copy 32 | 33 | differentiation_str = "sample_DQN_agent" 34 | checkpoint_dir = "./models"+differentiation_str 35 | store_loss = 3 # Store the loss every store_loss batches 36 | 37 | os.environ['PYTHONHASHSEED']=str(SEED) 38 | np.random.seed(SEED) 39 | random.seed(SEED) 40 | 41 | # Force TensorFlow to use single thread. 42 | # Multiple threads are a potential source of non-reproducible results. 
43 | # For further details, see: https://stackoverflow.com/questions/42022950/ 44 | # tf.config.threading.set_inter_op_parallelism_threads(1) 45 | # tf.config.threading.set_intra_op_parallelism_threads(1) 46 | 47 | tf.random.set_seed(1) 48 | 49 | train_dir = "./TensorBoard/"+differentiation_str 50 | # summary_writer = tf.summary.create_file_writer(train_dir) 51 | listofDemands = [8, 32, 64] 52 | copy_weights_interval = 50 53 | evaluation_interval = 20 54 | epsilon_start_decay = 70 55 | 56 | 57 | hparams = { 58 | 'l2': 0.1, 59 | 'dropout_rate': 0.01, 60 | 'link_state_dim': 20, 61 | 'readout_units': 35, 62 | 'learning_rate': 0.0001, 63 | 'batch_size': 32, 64 | 'T': 4, 65 | 'num_demands': len(listofDemands) 66 | } 67 | 68 | MAX_QUEUE_SIZE = 4000 69 | 70 | def cummax(alist, extractor): 71 | with tf.name_scope('cummax'): 72 | maxes = [tf.reduce_max(extractor(v)) + 1 for v in alist] 73 | cummaxes = [tf.zeros_like(maxes[0])] 74 | for i in range(len(maxes) - 1): 75 | cummaxes.append(tf.math.add_n(maxes[0:i + 1])) 76 | return cummaxes 77 | 78 | class DQNAgent: 79 | def __init__(self, batch_size): 80 | self.memory = deque(maxlen=MAX_QUEUE_SIZE) 81 | self.gamma = 0.95 # discount rate 82 | self.epsilon = 1.0 # exploration rate 83 | self.epsilon_min = 0.01 84 | self.epsilon_decay = 0.995 85 | self.writer = None 86 | self.K = 4 # K-paths 87 | self.listQValues = None 88 | self.numbersamples = batch_size 89 | self.action = None 90 | self.capacity_feature = None 91 | self.bw_allocated_feature = np.zeros((env_training.numEdges,len(env_training.listofDemands))) 92 | 93 | self.global_step = 0 94 | self.primary_network = gnn.myModel(hparams) 95 | self.primary_network.build() 96 | self.target_network = gnn.myModel(hparams) 97 | self.target_network.build() 98 | self.optimizer = tf.keras.optimizers.SGD(learning_rate=hparams['learning_rate'],momentum=0.9,nesterov=True) 99 | 100 | def act(self, env, state, demand, source, destination, flagEvaluation): 101 | """ 102 | Given a demand stored in the environment it allocates the K=4 shortest paths on the current 'state' 103 | and predicts the q_values of the K=4 different new graph states by using the GNN model. 104 | Picks the state according to epsilon-greedy approach. The flag=TRUE indicates that we are testing 105 | the model and thus, it won't activate the drop layers. 106 | """ 107 | # Set to True if we need to compute K=4 q-values and take the maxium 108 | takeMax_epsilon = False 109 | # List of graphs 110 | listGraphs = [] 111 | # List of graph features that are used in the cummax() call 112 | list_k_features = list() 113 | # Initialize action 114 | action = 0 115 | 116 | # We get the K-paths between source-destination 117 | pathList = env.allPaths[str(source) +':'+ str(destination)] 118 | path = 0 119 | 120 | # 1. 
Implement epsilon-greedy to pick allocation 121 | # If flagEvaluation==TRUE we are EVALUATING => take always the action that the agent is saying has higher q-value 122 | # Otherwise, we are training with normal epsilon-greedy strategy 123 | if flagEvaluation: 124 | # If evaluation, compute K=4 q-values and take the maxium value 125 | takeMax_epsilon = True 126 | else: 127 | # If training, compute epsilon-greedy 128 | z = np.random.random() 129 | if z > self.epsilon: 130 | # Compute K=4 q-values and pick the one with highest value 131 | # In case of multiple same max values, return the first one 132 | takeMax_epsilon = True 133 | else: 134 | # Pick a random path and compute only one q-value 135 | path = np.random.randint(0, len(pathList)) 136 | action = path 137 | 138 | # 2. Allocate (S,D, linkDemand) demand using the K shortest paths 139 | while path < len(pathList): 140 | state_copy = np.copy(state) 141 | currentPath = pathList[path] 142 | i = 0 143 | j = 1 144 | 145 | # 3. Iterate over paths' pairs of nodes and allocate demand to bw_allocated 146 | while (j < len(currentPath)): 147 | state_copy[env.edgesDict[str(currentPath[i]) + ':' + str(currentPath[j])]][1] = demand 148 | i = i + 1 149 | j = j + 1 150 | 151 | # 4. Add allocated graphs' features to the list. Later we will compute their q-values using cummax 152 | listGraphs.append(state_copy) 153 | features = self.get_graph_features(env, state_copy) 154 | list_k_features.append(features) 155 | 156 | if not takeMax_epsilon: 157 | # If we don't need to compute the K=4 q-values we exit 158 | break 159 | 160 | path = path + 1 161 | 162 | vs = [v for v in list_k_features] 163 | 164 | # We compute the graphs_ids to later perform the unsorted_segment_sum for each graph and obtain the 165 | # link hidden states for each graph. 166 | graph_ids = [tf.fill([tf.shape(vs[it]['link_state'])[0]], it) for it in range(len(list_k_features))] 167 | first_offset = cummax(vs, lambda v: v['first']) 168 | second_offset = cummax(vs, lambda v: v['second']) 169 | 170 | tensors = ({ 171 | 'graph_id': tf.concat([v for v in graph_ids], axis=0), 172 | 'link_state': tf.concat([v['link_state'] for v in vs], axis=0), 173 | 'first': tf.concat([v['first'] + m for v, m in zip(vs, first_offset)], axis=0), 174 | 'second': tf.concat([v['second'] + m for v, m in zip(vs, second_offset)], axis=0), 175 | 'num_edges': tf.math.add_n([v['num_edges'] for v in vs]), 176 | } 177 | ) 178 | 179 | # Predict qvalues for all graphs within tensors 180 | self.listQValues = self.primary_network(tensors['link_state'], tensors['graph_id'], tensors['first'], 181 | tensors['second'], tensors['num_edges'], training=False).numpy() 182 | 183 | if takeMax_epsilon: 184 | # We take the path with highest q-value 185 | action = np.argmax(self.listQValues) 186 | else: 187 | return path, list_k_features[0] 188 | 189 | return action, list_k_features[action] 190 | 191 | def get_graph_features(self, env, copyGraph): 192 | """ 193 | We iterate over the converted graph nodes and take the features. The capacity and bw allocated features 194 | are normalized on the fly. 
195 | """ 196 | self.bw_allocated_feature.fill(0.0) 197 | # Normalize capacity feature 198 | self.capacity_feature = (copyGraph[:,0] - 100.00000001) / 200.0 199 | 200 | iter = 0 201 | for i in copyGraph[:, 1]: 202 | if i == 8: 203 | self.bw_allocated_feature[iter][0] = 1 204 | elif i == 32: 205 | self.bw_allocated_feature[iter][1] = 1 206 | elif i == 64: 207 | self.bw_allocated_feature[iter][2] = 1 208 | iter = iter + 1 209 | 210 | sample = { 211 | 'num_edges': env.numEdges, 212 | 'length': env.firstTrueSize, 213 | 'betweenness': tf.convert_to_tensor(value=env.between_feature, dtype=tf.float32), 214 | 'bw_allocated': tf.convert_to_tensor(value=self.bw_allocated_feature, dtype=tf.float32), 215 | 'capacities': tf.convert_to_tensor(value=self.capacity_feature, dtype=tf.float32), 216 | 'first': tf.convert_to_tensor(env.first, dtype=tf.int32), 217 | 'second': tf.convert_to_tensor(env.second, dtype=tf.int32) 218 | } 219 | 220 | sample['capacities'] = tf.reshape(sample['capacities'][0:sample['num_edges']], [sample['num_edges'], 1]) 221 | sample['betweenness'] = tf.reshape(sample['betweenness'][0:sample['num_edges']], [sample['num_edges'], 1]) 222 | 223 | hiddenStates = tf.concat([sample['capacities'], sample['betweenness'], sample['bw_allocated']], axis=1) 224 | 225 | paddings = tf.constant([[0, 0], [0, hparams['link_state_dim'] - 2 - hparams['num_demands']]]) 226 | link_state = tf.pad(tensor=hiddenStates, paddings=paddings, mode="CONSTANT") 227 | 228 | inputs = {'link_state': link_state, 'first': sample['first'][0:sample['length']], 229 | 'second': sample['second'][0:sample['length']], 'num_edges': sample['num_edges']} 230 | 231 | return inputs 232 | 233 | def _write_tf_summary(self, gradients, loss): 234 | with summary_writer.as_default(): 235 | tf.summary.scalar(name="loss", data=loss[0], step=self.global_step) 236 | tf.summary.histogram(name='gradients_5', data=gradients[5], step=self.global_step) 237 | tf.summary.histogram(name='gradients_7', data=gradients[7], step=self.global_step) 238 | tf.summary.histogram(name='gradients_9', data=gradients[9], step=self.global_step) 239 | tf.summary.histogram(name='FirstLayer/kernel:0', data=self.primary_network.variables[0], step=self.global_step) 240 | tf.summary.histogram(name='FirstLayer/bias:0', data=self.primary_network.variables[1], step=self.global_step) 241 | tf.summary.histogram(name='kernel:0', data=self.primary_network.variables[2], step=self.global_step) 242 | tf.summary.histogram(name='recurrent_kernel:0', data=self.primary_network.variables[3], step=self.global_step) 243 | tf.summary.histogram(name='bias:0', data=self.primary_network.variables[4], step=self.global_step) 244 | tf.summary.histogram(name='Readout1/kernel:0', data=self.primary_network.variables[5], step=self.global_step) 245 | tf.summary.histogram(name='Readout1/bias:0', data=self.primary_network.variables[6], step=self.global_step) 246 | tf.summary.histogram(name='Readout2/kernel:0', data=self.primary_network.variables[7], step=self.global_step) 247 | tf.summary.histogram(name='Readout2/bias:0', data=self.primary_network.variables[8], step=self.global_step) 248 | tf.summary.histogram(name='Readout3/kernel:0', data=self.primary_network.variables[9], step=self.global_step) 249 | tf.summary.histogram(name='Readout3/bias:0', data=self.primary_network.variables[10], step=self.global_step) 250 | summary_writer.flush() 251 | self.global_step = self.global_step + 1 252 | 253 | @tf.function 254 | def _forward_pass(self, x): 255 | prediction_state = self.primary_network(x[0], x[1], 
x[2], x[3], x[4], training=True)
256 |         preds_next_target = tf.stop_gradient(self.target_network(x[6], x[7], x[9], x[10], x[11], training=True))
257 |         return prediction_state, preds_next_target
258 | 
259 |     def _train_step(self, batch):
260 |         # Record operations for automatic differentiation
261 |         with tf.GradientTape() as tape:
262 |             preds_state = []
263 |             target = []
264 |             for x in batch:
265 |                 prediction_state, preds_next_target = self._forward_pass(x)
266 |                 # Take q-value of the action performed
267 |                 preds_state.append(prediction_state[0])
268 |                 # We multiply by 0 if done==True to cancel the second term
269 |                 target.append(tf.stop_gradient([x[5] + self.gamma*tf.math.reduce_max(preds_next_target)*(1-x[8])]))
270 | 
271 |             loss = tf.keras.losses.MSE(tf.stack(target, axis=1), tf.stack(preds_state, axis=1))
272 |             # Loss function using L2 Regularization
273 |             regularization_loss = sum(self.primary_network.losses)
274 |             loss = loss + regularization_loss
275 | 
276 |         # Computes the gradient using operations recorded in context of this tape
277 |         grad = tape.gradient(loss, self.primary_network.variables)
278 |         #gradients, _ = tf.clip_by_global_norm(grad, 5.0)
279 |         gradients = [tf.clip_by_value(gradient, -1., 1.) for gradient in grad]
280 |         self.optimizer.apply_gradients(zip(gradients, self.primary_network.variables))
281 |         del tape
282 |         return grad, loss
283 | 
284 |     def replay(self, episode):
285 |         for i in range(MULTI_FACTOR_BATCH):
286 |             batch = random.sample(self.memory, self.numbersamples)
287 | 
288 |             grad, loss = self._train_step(batch)
289 |             if i%store_loss==0:
290 |                 fileLogs.write(".," + '%.9f' % loss.numpy() + ",\n")
291 | 
292 |         # Soft weights update
293 |         # for t, e in zip(self.target_network.trainable_variables, self.primary_network.trainable_variables):
294 |         #     t.assign(t * (1 - TAU) + e * TAU)
295 | 
296 |         # Hard weights update
297 |         if episode % copy_weights_interval == 0:
298 |             self.target_network.set_weights(self.primary_network.get_weights())
299 |         # if episode % evaluation_interval == 0:
300 |         #     self._write_tf_summary(grad, loss)
301 |         gc.collect()
302 | 
303 |     def add_sample(self, env_training, state_action, action, reward, done, new_state, new_demand, new_source, new_destination):
304 |         self.bw_allocated_feature.fill(0.0)
305 |         new_state_copy = np.copy(new_state)
306 | 
307 |         state_action['graph_id'] = tf.fill([tf.shape(state_action['link_state'])[0]], 0)
308 | 
309 |         # We get the K paths between new_source and new_destination
310 |         pathList = env_training.allPaths[str(new_source) +':'+ str(new_destination)]
311 |         path = 0
312 |         list_k_features = list()
313 | 
314 |         # 2. Allocate (S,D, linkDemand) demand using the K shortest paths
315 |         while path < len(pathList):
316 |             currentPath = pathList[path]
317 |             i = 0
318 |             j = 1
319 | 
320 |             # 3. Iterate over the path's pairs of nodes and allocate new_demand to bw_allocated
321 |             while (j < len(currentPath)):
322 |                 new_state_copy[env_training.edgesDict[str(currentPath[i]) + ':' + str(currentPath[j])]][1] = new_demand
323 |                 i = i + 1
324 |                 j = j + 1
325 | 
326 |             # 4. Add the allocated graph's features to the list. Later we will compute their q-values using cummax
327 |             features = agent.get_graph_features(env_training, new_state_copy)
328 | 
329 |             list_k_features.append(features)
330 |             path = path + 1
331 |             new_state_copy[:,1] = 0
332 | 
333 |         vs = [v for v in list_k_features]
334 | 
335 |         # We compute the graph_ids to later perform the unsorted_segment_sum for each graph and obtain the
336 |         # link hidden states for each graph.
337 |         graph_ids = [tf.fill([tf.shape(vs[it]['link_state'])[0]], it) for it in range(len(list_k_features))]
338 |         first_offset = cummax(vs, lambda v: v['first'])
339 |         second_offset = cummax(vs, lambda v: v['second'])
340 | 
341 |         tensors = ({
342 |             'graph_id': tf.concat([v for v in graph_ids], axis=0),
343 |             'link_state': tf.concat([v['link_state'] for v in vs], axis=0),
344 |             'first': tf.concat([v['first'] + m for v, m in zip(vs, first_offset)], axis=0),
345 |             'second': tf.concat([v['second'] + m for v, m in zip(vs, second_offset)], axis=0),
346 |             'num_edges': tf.math.add_n([v['num_edges'] for v in vs]),
347 |         }
348 |         )
349 | 
350 |         # We store the state with the action marked, the graph ids, first, second, num_edges, the reward,
351 |         # new_state (-1 because we don't need it in this case), the graph ids, done, first, second, number of edges
352 |         self.memory.append((state_action['link_state'], state_action['graph_id'], state_action['first'],  # 2
353 |                             state_action['second'], tf.convert_to_tensor(state_action['num_edges']),  # 4
354 |                             tf.convert_to_tensor(reward, dtype=tf.float32), tensors['link_state'], tensors['graph_id'],  # 7
355 |                             tf.convert_to_tensor(int(done==True), dtype=tf.float32), tensors['first'], tensors['second'],  # 10
356 |                             tf.convert_to_tensor(tensors['num_edges'])))  # 12
357 | 
358 | if __name__ == "__main__":
359 |     # python train_DQN.py
360 |     # Get the environment and extract the number of actions.
361 |     env_training = gym.make(ENV_NAME)
362 |     np.random.seed(SEED)
363 |     env_training.seed(SEED)
364 |     env_training.generate_environment(graph_topology, listofDemands)
365 | 
366 |     env_eval = gym.make(ENV_NAME)
367 |     np.random.seed(SEED)
368 |     env_eval.seed(SEED)
369 |     env_eval.generate_environment(graph_topology, listofDemands)
370 | 
371 |     batch_size = hparams['batch_size']
372 |     agent = DQNAgent(batch_size)
373 | 
374 |     eval_ep = 0
375 |     train_ep = 0
376 |     max_reward = 0
377 |     reward_id = 0
378 | 
379 |     if not os.path.exists("./Logs"):
380 |         os.makedirs("./Logs")
381 | 
382 |     # We store all the information in a log file and later we parse this file
383 |     # to extract all the relevant information
384 |     fileLogs = open("./Logs/exp" + differentiation_str + "Logs.txt", "a")
385 | 
386 |     if not os.path.exists(checkpoint_dir):
387 |         os.makedirs(checkpoint_dir)
388 |     checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt")
389 | 
390 |     checkpoint = tf.train.Checkpoint(model=agent.primary_network, optimizer=agent.optimizer)
391 | 
392 |     rewards_test = np.zeros(EVALUATION_EPISODES)
393 | 
394 |     for eps in range(EVALUATION_EPISODES):
395 |         state, demand, source, destination = env_eval.reset()
396 |         rewardAddTest = 0
397 |         while 1:
398 |             # We execute evaluation over the current state
399 |             # demand, src, dst
400 |             action, _ = agent.act(env_eval, state, demand, source, destination, True)
401 | 
402 |             new_state, reward, done, demand, source, destination = env_eval.make_step(state, action, demand, source, destination)
403 |             rewardAddTest = rewardAddTest + reward
404 |             state = new_state
405 |             if done:
406 |                 break
407 |         rewards_test[eps] = rewardAddTest
408 | 
409 |     evalMeanReward = np.mean(rewards_test)
410 |     fileLogs.write(">," + str(evalMeanReward) + ",\n")
411 |     fileLogs.write("-," + str(agent.epsilon) + ",\n")
412 |     fileLogs.flush()
413 | 
414 |     counter_store_model = 1
415 | 
416 |     for ep_it in range(ITERATIONS):
417 |         if ep_it%5==0:
418 |             print("Training iteration: ", ep_it)
419 | 
420 |         if ep_it==0:
421 |             # At the beginning we don't have any experiences in the buffer. Thus, we force the agent to
422 |             # perform more training episodes than usual
423 |             train_episodes = FIRST_WORK_TRAIN_EPISODE
424 |         else:
425 |             train_episodes = TRAINING_EPISODES
426 |         for _ in range(train_episodes):
427 |             # Used to clean the TF cache
428 |             tf.random.set_seed(1)
429 | 
430 |             state, demand, source, destination = env_training.reset()
431 | 
432 |             while 1:
433 |                 # We execute the agent over the current state
434 |                 action, state_action = agent.act(env_training, state, demand, source, destination, False)
435 |                 new_state, reward, done, new_demand, new_source, new_destination = env_training.make_step(state, action, demand, source, destination)
436 | 
437 |                 agent.add_sample(env_training, state_action, action, reward, done, new_state, new_demand, new_source, new_destination)
438 |                 state = new_state
439 |                 demand = new_demand
440 |                 source = new_source
441 |                 destination = new_destination
442 |                 if done:
443 |                     break
444 | 
445 |         agent.replay(ep_it)
446 | 
447 |         # Decrease epsilon (from the epsilon-greedy exploration strategy)
448 |         if ep_it > epsilon_start_decay and agent.epsilon > agent.epsilon_min:
449 |             agent.epsilon *= agent.epsilon_decay
450 |             agent.epsilon *= agent.epsilon_decay
451 | 
452 |         # We only evaluate the model every evaluation_interval steps
453 |         if ep_it % evaluation_interval == 0:
454 |             for eps in range(EVALUATION_EPISODES):
455 |                 state, demand, source, destination = env_eval.reset()
456 |                 rewardAddTest = 0
457 |                 while 1:
458 |                     # We execute evaluation over the current state
459 |                     action, _ = agent.act(env_eval, state, demand, source, destination, True)
460 | 
461 |                     new_state, reward, done, demand, source, destination = env_eval.make_step(state, action, demand, source, destination)
462 |                     rewardAddTest = rewardAddTest + reward
463 |                     state = new_state
464 |                     if done:
465 |                         break
466 |                 rewards_test[eps] = rewardAddTest
467 |             evalMeanReward = np.mean(rewards_test)
468 | 
469 |             if evalMeanReward>max_reward:
470 |                 max_reward = evalMeanReward
471 |                 reward_id = counter_store_model
472 | 
473 |             fileLogs.write(">," + str(evalMeanReward) + ",\n")
474 |             fileLogs.write("-," + str(agent.epsilon) + ",\n")
475 | 
476 |             # Store trained model
477 |             checkpoint.save(checkpoint_prefix)
478 |             fileLogs.write("MAX REWD: " + str(max_reward) + " MODEL_ID: " + str(reward_id) +",\n")
479 |             counter_store_model = counter_store_model + 1
480 | 
481 |         fileLogs.flush()
482 | 
483 |         # Invoke garbage collection
484 |         # tf.keras.backend.clear_session()
485 |         gc.collect()
486 | 
487 |     for eps in range(EVALUATION_EPISODES):
488 |         state, demand, source, destination = env_eval.reset()
489 |         rewardAddTest = 0
490 |         while 1:
491 |             # We execute evaluation over the current state
492 |             # demand, src, dst
493 |             action, _ = agent.act(env_eval, state, demand, source, destination, True)
494 | 
495 |             new_state, reward, done, demand, source, destination = env_eval.make_step(state, action, demand, source, destination)
496 |             rewardAddTest = rewardAddTest + reward
497 |             state = new_state
498 |             if done:
499 |                 break
500 |         rewards_test[eps] = rewardAddTest
501 |     evalMeanReward = np.mean(rewards_test)
502 | 
503 |     if evalMeanReward>max_reward:
504 |         max_reward = evalMeanReward
505 |         reward_id = counter_store_model
506 | 
507 |     fileLogs.write(">," + str(evalMeanReward) + ",\n")
508 |     fileLogs.write("-," + str(agent.epsilon) + ",\n")
509 | 
510 |     # Store trained model
511 |     checkpoint.save(checkpoint_prefix)
512 |     fileLogs.write("MAX REWD: " + str(max_reward) + " MODEL_ID: " + str(reward_id) +",\n")
513 | 
514 |     fileLogs.flush()
515 |     fileLogs.close()
516 | 
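Editor's note on the log format above: each record written by `train_DQN.py` starts with a single-character tag — `.,<loss>,` for average training losses, `>,<mean eval reward>,` for evaluation scores, `-,<epsilon>,` for the current exploration rate, plus `MAX REWD: ... MODEL_ID: ...` lines that track the best checkpoint. The repo's own `parse.py` is the intended tool for plotting these logs; the snippet below is only a minimal reading sketch, assuming the exact format produced by the `fileLogs.write` calls above (the helper name `read_drl_log` is hypothetical and not part of the repository).

```python
# Minimal log-reading sketch (hypothetical helper, not the repo's parse.py).
# Assumes each line looks like "<tag>,<value>,\n" as written by train_DQN.py.
def read_drl_log(path):
    losses, eval_rewards, epsilons = [], [], []
    with open(path) as f:
        for line in f:
            parts = line.strip().split(",")
            if parts[0] == ".":       # average training loss
                losses.append(float(parts[1]))
            elif parts[0] == ">":     # mean evaluation reward
                eval_rewards.append(float(parts[1]))
            elif parts[0] == "-":     # epsilon at evaluation time
                epsilons.append(float(parts[1]))
            # Lines starting with "MAX REWD:" only record the best model id.
    return losses, eval_rewards, epsilons

# Example usage:
# losses, rewards, eps = read_drl_log("./Logs/expsample_DQN_agentLogs.txt")
```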
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | BSD 3-Clause License
2 | 
3 | Copyright (c) 2019, Knowledge-Defined Networking
4 | All rights reserved.
5 | 
6 | Redistribution and use in source and binary forms, with or without
7 | modification, are permitted provided that the following conditions are met:
8 | 
9 | 1. Redistributions of source code must retain the above copyright notice, this
10 |    list of conditions and the following disclaimer.
11 | 
12 | 2. Redistributions in binary form must reproduce the above copyright notice,
13 |    this list of conditions and the following disclaimer in the documentation
14 |    and/or other materials provided with the distribution.
15 | 
16 | 3. Neither the name of the copyright holder nor the names of its
17 |    contributors may be used to endorse or promote products derived from
18 |    this software without specific prior written permission.
19 | 
20 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
21 | AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
22 | IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
23 | DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
24 | FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
25 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
26 | SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
27 | CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
28 | OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
29 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
30 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Deep Reinforcement Learning meets Graph Neural Networks: exploring a routing optimization use case
2 | #### Link to paper: [[here](https://arxiv.org/abs/1910.07421)]
3 | #### P. Almasan, J. Suárez-Varela, A. Badia-Sampera, K. Rusek, P. Barlet-Ros, A. Cabellos-Aparicio.
4 | 
5 | Contact:
6 | 
7 | [![Twitter Follow](https://img.shields.io/twitter/follow/PaulAlmasan?style=social)](https://twitter.com/PaulAlmasan)
8 | [![GitHub watchers](https://img.shields.io/github/watchers/knowledgedefinednetworking/DRL-GNN?style=social&label=Watch)](https://github.com/knowledgedefinednetworking/DRL-GNN)
9 | [![GitHub forks](https://img.shields.io/github/forks/knowledgedefinednetworking/DRL-GNN?style=social&label=Fork)](https://github.com/knowledgedefinednetworking/DRL-GNN)
10 | [![GitHub stars](https://img.shields.io/github/stars/knowledgedefinednetworking/DRL-GNN?style=social&label=Star)](https://github.com/knowledgedefinednetworking/DRL-GNN)
11 | 
12 | ## Abstract
13 | Recent advances in Deep Reinforcement Learning (DRL) have shown a significant improvement in decision-making problems. The networking community has started to investigate how DRL can provide a new breed of solutions to relevant optimization problems, such as routing. However, most of the state-of-the-art DRL-based networking techniques fail to generalize; this means that they can only operate over network topologies seen during training, but not over new topologies. The reason behind this important limitation is that existing DRL networking solutions use standard neural networks (e.g., fully connected), which are unable to learn graph-structured information. In this paper we propose to use Graph Neural Networks (GNN) in combination with DRL. GNNs have recently been proposed to model graphs, and our novel DRL+GNN architecture is able to learn, operate and generalize over arbitrary network topologies. To showcase its generalization capabilities, we evaluate it on an Optical Transport Network (OTN) scenario, where the agent needs to allocate traffic demands efficiently. Our results show that our DRL+GNN agent is able to achieve outstanding performance in topologies unseen during training.
14 | 
15 | # Instructions to execute
16 | 
17 | [See the execution instructions](https://github.com/knowledgedefinednetworking/DRL-GNN/blob/master/DQN/README.md)
18 | 
19 | ## Description
20 | 
21 | For more details about the implementation used in the experiments, contact: [felician.paul.almasan@upc.edu](mailto:felician.paul.almasan@upc.edu)
22 | 
23 | Please cite the corresponding article if you use the code from this repository:
24 | 
25 | ```
26 | @article{almasan2019deep,
27 |   title={Deep reinforcement learning meets graph neural networks: Exploring a routing optimization use case},
28 |   author={Almasan, Paul and Su{\'a}rez-Varela, Jos{\'e} and Badia-Sampera, Arnau and Rusek, Krzysztof and Barlet-Ros, Pere and Cabellos-Aparicio, Albert},
29 |   journal={arXiv preprint arXiv:1910.07421},
30 |   year={2019}
31 | }
32 | ```
33 | 
--------------------------------------------------------------------------------
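Editor's note on evaluation: the checkpoints saved by `train_DQN.py` via `tf.train.Checkpoint` can later be restored for evaluation, which is the role of `evaluate_DQN.py` in this repo. The sketch below is only an illustration under stated assumptions, not the repository's evaluation script: it presumes that `DQNAgent` and `hparams` from the training/evaluation scripts are in scope with the same hyperparameters used at training time, and it reuses the sample checkpoint directory shipped with the repo (`modelssample_DQN_agent`).

```python
import os
import tensorflow as tf

# Hypothetical restore sketch: DQNAgent and hparams are assumed to be imported from
# the training/evaluation scripts; the directory mirrors the sample model in the repo.
checkpoint_dir = "./modelssample_DQN_agent"

agent = DQNAgent(hparams['batch_size'])  # must be built with the same hyperparameters as training
checkpoint = tf.train.Checkpoint(model=agent.primary_network, optimizer=agent.optimizer)

# Restore the latest checkpoint (or a specific one, e.g. the MODEL_ID logged next to "MAX REWD").
status = checkpoint.restore(tf.train.latest_checkpoint(checkpoint_dir))
status.expect_partial()  # optimizer slots may be absent when only evaluating
```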