├── log
│   ├── re_log_2017-06-12_02:55:09
│   └── re_log_2017-06-12_02:59:23
├── pg_re.pyc
├── pg_su.pyc
├── data
│   └── .DS_Store
├── environment.pyc
├── other_agents.pyc
├── parameters.pyc
├── pg_network.pyc
├── plot
│   └── .DS_Store
├── slow_down_cdf.pyc
├── job_distribution.pyc
├── pg_re_single_core.pyc
├── deeprm state space.png
├── deeprm2 state space.png
├── launcher2.py
├── run_script.py
├── other_agents.py
├── job_distribution.py
├── parameters.py
├── README.md
├── launcher.py
├── pg_su.py
├── slow_down_cdf.py
├── pg_re_single_core_o.py
├── pg_re_single_core.py
├── pg_network.py
├── pg_re_o.py
├── pg_re.py
└── environment.py
--------------------------------------------------------------------------------
/launcher2.py:
--------------------------------------------------------------------------------
# tidier and simpler launcher without use of command line
import os
os.environ["THEANO_FLAGS"] = "device=cpu,floatX=float32"
import sys
import getopt
import matplotlib
matplotlib.use('Agg')

import parameters
import pg_re
import pg_re_single_core
import pg_su
import slow_down_cdf

pa = parameters.Parameters()
pa.type_exp = "pg_re"
# pa.pg_resume = "data/pg_su_net_file_9990.pkl"
pa.simu_len = 50
pa.num_ex = 10
pa.output_filename = "data/pg_re_conv"
pa.output_freq = 2
pg_re_single_core.launch(pa)

--------------------------------------------------------------------------------
/run_script.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python

import os

simu_len = 200

for new_job_rate in [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1]:
    for num_seq_per_batch in [20]:
        for num_ex in [100]:
            for num_nw in [10]:

                file_name = 'data/pg_re_rate_' + str(new_job_rate) + '_simu_len_' + str(simu_len) + '_num_seq_per_batch_' + str(num_seq_per_batch) + '_ex_' + str(num_ex) + '_nw_' + str(num_nw)
                log = 'log/pg_re_rate_' + str(new_job_rate) + '_simu_len_' + str(simu_len) + '_num_seq_per_batch_' + str(num_seq_per_batch) + '_ex_' + str(num_ex) + '_nw_' + str(num_nw)

                # run experiment
                os.system('nohup python -u launcher.py --exp_type=pg_re --out_freq=50 --simu_len=' + str(simu_len) + ' --eps_max_len=' + str(simu_len * 4) + ' --num_ex=' + str(num_ex) + ' --new_job_rate=' + str(new_job_rate) + ' --num_seq_per_batch=' + str(num_seq_per_batch) + ' --num_nw=' + str(num_nw) + ' --ofile=' + file_name + ' > ' + log + ' &')

                # plot slowdown
                # it_num = 100
                # os.system('nohup python -u launcher.py --exp_type=test --simu_len=' + str(simu_len) + ' --num_ex=' + str(num_ex) + ' --new_job_rate=' + str(new_job_rate) + ' --num_seq_per_batch=' + str(num_seq_per_batch) + ' --pg_re=' + file_name + '_' + str(it_num) + '.pkl' + ' &')

--------------------------------------------------------------------------------
/other_agents.py:
--------------------------------------------------------------------------------
import numpy as np


def get_packer_action(machine, job_slot):
    align_score = 0
    act = len(job_slot.slot)  # if no action available, hold

    for i in xrange(len(job_slot.slot)):
        new_job = job_slot.slot[i]
        if new_job is not None:  # there is a pending job

            avbl_res = machine.avbl_slot[:new_job.len, :]
            res_left = avbl_res - new_job.res_vec

            if np.all(res_left[:] >= 0):  # enough resource to allocate

                tmp_align_score = avbl_res[0, :].dot(new_job.res_vec)

                if tmp_align_score > align_score:
                    align_score = tmp_align_score
                    act = i
    return act


def get_sjf_action(machine, job_slot):
    sjf_score = 0
    act = len(job_slot.slot)  # if no action available, hold

    for i in xrange(len(job_slot.slot)):
        new_job = job_slot.slot[i]
        if new_job is not None:  # there is a pending job

            avbl_res = machine.avbl_slot[:new_job.len, :]
            res_left = avbl_res - new_job.res_vec

            if np.all(res_left[:] >= 0):  # enough resource to allocate

                tmp_sjf_score = 1 / float(new_job.len)

                if tmp_sjf_score > sjf_score:
                    sjf_score = tmp_sjf_score
                    act = i
    return act


def get_packer_sjf_action(machine, job_slot, knob):  # knob controls which to favor, 1 to packer, 0 to sjf

    combined_score = 0
    act = len(job_slot.slot)  # if no action available, hold

    for i in xrange(len(job_slot.slot)):
        new_job = job_slot.slot[i]
        if new_job is not None:  # there is a pending job

            avbl_res = machine.avbl_slot[:new_job.len, :]
            res_left = avbl_res - new_job.res_vec

            if np.all(res_left[:] >= 0):  # enough resource to allocate

                tmp_align_score = avbl_res[0, :].dot(new_job.res_vec)
                tmp_sjf_score = 1 / float(new_job.len)

                tmp_combined_score = knob * tmp_align_score + (1 - knob) * tmp_sjf_score

                if tmp_combined_score > combined_score:
                    combined_score = tmp_combined_score
                    act = i
    return act


def get_random_action(job_slot):
    num_act = len(job_slot.slot) + 1  # all job slots plus the hold (void) action
    act = np.random.randint(num_act)
    return act
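
A minimal usage sketch (not part of the repo): the heuristics above only read `machine.avbl_slot`, `job.len`, and `job.res_vec`, so they can be exercised with hypothetical stand-in objects; the real `Machine`, `Job`, and `JobSlot` classes live in environment.py.

```python
# Sketch with invented FakeMachine/FakeJob/FakeJobSlot stand-ins; only the
# attributes read by other_agents.py are mocked here.
import numpy as np
import other_agents


class FakeJob(object):
    def __init__(self, job_len, res_vec):
        self.len = job_len                 # duration in time steps
        self.res_vec = np.array(res_vec)   # demand per resource type


class FakeMachine(object):
    def __init__(self, time_horizon, res_slot, num_res):
        # free capacity per (time step, resource), as in environment.py
        self.avbl_slot = np.ones((time_horizon, num_res)) * res_slot


class FakeJobSlot(object):
    def __init__(self, jobs):
        self.slot = jobs


machine = FakeMachine(time_horizon=20, res_slot=10, num_res=2)
queue = FakeJobSlot([FakeJob(3, [4, 1]), FakeJob(1, [2, 2]), None])
print(other_agents.get_sjf_action(machine, queue))  # -> 1, the shortest feasible job
```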
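
A quick sketch (not part of the repo) of what the bimodal distribution produces with the default sizes from parameters.py (`max_nw_size=10`, `job_len=15`):

```python
# Sample a few jobs from Dist.bi_model_dist: ~80% are short (1-3 steps), the
# rest long (10-15 steps); one resource demand is dominant (5-10), the other
# small (1-2).
import numpy as np
import job_distribution

np.random.seed(0)
dist = job_distribution.Dist(num_res=2, max_nw_size=10, job_len=15)
for _ in range(3):
    nw_len, nw_size = dist.bi_model_dist()
    print("len=%d size=%s" % (nw_len, nw_size))
```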
--------------------------------------------------------------------------------
/parameters.py:
--------------------------------------------------------------------------------
import numpy as np
import math

import job_distribution


class Parameters:
    def __init__(self):

        self.output_filename = 'data/tmp'

        self.num_epochs = 1000         # number of training epochs
        self.simu_len = 10             # length of the busy cycle that repeats itself, length of each trajectory
        self.num_ex = 1                # number of sequences, number of episodes

        self.output_freq = 10          # interval for output and store parameters

        self.num_seq_per_batch = 10    # number of sequences to compute baseline
        self.episode_max_length = 200  # enforcing an artificial terminal

        self.num_res = 2               # number of resources in the system
        self.num_nw = 5                # maximum allowed number of work in the queue

        self.time_horizon = 20         # number of time steps in the graph
        self.max_job_len = 15          # maximum duration of new jobs
        self.res_slot = 10             # maximum number of available resource slots
        self.max_job_size = 10         # maximum resource request of new work

        self.backlog_size = 60         # backlog queue size

        self.max_track_since_new = 10  # track how many time steps since last new jobs

        self.job_num_cap = 40          # maximum number of distinct colors in current work graph

        self.new_job_rate = 0.7        # lambda in new job arrival Poisson Process

        self.discount = 1              # discount factor

        # distribution for new job arrival
        self.dist = job_distribution.Dist(self.num_res, self.max_job_size, self.max_job_len)

        # graphical representation (reshaped layout: resources stacked vertically)
        assert self.backlog_size % self.time_horizon == 0  # such that it can be converted into an image
        self.backlog_width = int(math.ceil(self.backlog_size / float(self.time_horizon * self.num_res)))
        self.network_input_height = self.time_horizon * self.num_res
        self.network_input_width = \
            (self.res_slot +
             self.max_job_size * self.num_nw) + \
            self.backlog_width + \
            1  # for extra info, 1) time since last new job

        # compact representation
        self.network_compact_dim = (self.num_res + 1) * \
            (self.time_horizon + self.num_nw) + 1  # + 1 for backlog indicator

        self.network_output_dim = self.num_nw + 1  # + 1 for void action

        self.delay_penalty = -1        # penalty for delaying things in the current work screen
        self.hold_penalty = -1         # penalty for holding things in the new work screen
        self.dismiss_penalty = -1      # penalty for missing a job because the queue is full

        self.num_frames = 1            # number of frames to combine and process
        self.lr_rate = 0.001           # learning rate
        self.rms_rho = 0.9             # for rms prop
        self.rms_eps = 1e-9            # for rms prop

        self.unseen = False            # change random seed to generate unseen example

        # supervised learning mimic policy
        self.batch_size = 10
        self.evaluate_policy_name = "SJF"

    def compute_dependent_parameters(self):
        # note: these dimensions follow the original DeepRM layout (resources side by side)
        assert self.backlog_size % self.time_horizon == 0  # such that it can be converted into an image
        self.backlog_width = self.backlog_size / self.time_horizon
        self.network_input_height = self.time_horizon
        self.network_input_width = \
            (self.res_slot +
             self.max_job_size * self.num_nw) * self.num_res + \
            self.backlog_width + \
            1  # for extra info, 1) time since last new job

        # compact representation
        self.network_compact_dim = (self.num_res + 1) * \
            (self.time_horizon + self.num_nw) + 1  # + 1 for backlog indicator

        self.network_output_dim = self.num_nw + 1  # + 1 for void action
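
As a sanity check (mine, not code from the repo), the image dimensions implied by the defaults above, for the reshaped per-resource layout in `__init__`:

```python
# 20 time steps x 2 resources stacked vertically -> height 40; one resource
# band is machine state (10) + 5 job slots (5 * 10) + backlog + extra info.
import math

time_horizon, num_res, res_slot = 20, 2, 10
max_job_size, num_nw, backlog_size = 10, 5, 60

backlog_width = int(math.ceil(backlog_size / float(time_horizon * num_res)))  # 2
height = time_horizon * num_res                                               # 40
width = res_slot + max_job_size * num_nw + backlog_width + 1                  # 63
print("%d x %d input image" % (height, width))
```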
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# DeepRM+

Based on the work of Hongzi Mao et al.,
HotNets '16: http://people.csail.mit.edu/hongzi/content/publications/DeepRM-HotNets16.pdf

Improvements were made on top of DeepRM: http://github.com/hongzimao/deeprm

 7 | 8 | #### Rebuild the network with a convolution neural network 9 | 10 | File: build_small_conv_pg_network in pg_network.py 11 | 
Network structure:
Input: CNN with size 2*2, 16 filters
 12 | 13 | Output: Fully connected layer with # of actions output
 14 | 
Major improvement. Improved convergence rate (by ??? --> To Do)
 15 | #### Reshape state space. 16 | 17 | File: environment.py 18 | 19 | 
In DeepRM, state space was generated by stacking vertically matrices in the following way: 20 | 21 | 
State matrix for resource 1, job 1's request matrix for resource 1, job 2's request matrix for resource 1, ... , job n's request matrix for resource 1,\
 State matrix for resource 2, job 1's request matrix for resource 2, job 2's request matrix for resource 2, ... , job n's request matrix for resource 2.
 22 | 23 | 
I decide to put the related matrices closer, therefore stacking matrices in the following way: 24 | 25 | 
Stacking vertically respectively: State matrix for resource 1, job 1's request matrix for resource 1, job 2's request matrix for resource 1, ... , job n's request matrix for resource 1,\
and State matrix for resource 2, job 1's request matrix for resource 2, job 2's request matrix for resource 2, ... , job n's request matrix for resource 2.
And then stack the above two long matrices vertically.
 26 | 27 | See picture below for better explanation: 28 | 29 | 30 | 31 | Original state matrix 32 | Original state matrix 33 | 34 | 35 | 36 | Reshaped state matrix 37 | Reshaped state matrix 38 | 39 | 40 | 41 | 
Major improvement. Improved the average slowdown by 8.9% after 1000 epochs of training. 42 | 43 | #### Rewrite penalty function. 44 | 45 | File: parameters.py 46 | 47 | 
I gave different weights of penalty for jobs already planned(in machine matrix), jobs in jobslot queue and jobs in backlog.

Minor improvement. Improved convergence rate. 48 | 49 | ## Others

 50 | * Added log and save checkpoints to make record of slowdown and save models (pg_re_single_core.py and pg_re.py) 51 | 
 52 | 53 | * Added launcher2 for convenient launching and debugging (launcher2.py) 54 | 55 | ## Install prerequisites 56 | 57 | ``` 58 | sudo apt-get update 59 | sudo apt-get install python-numpy python-scipy python-dev python-pip python-nose g++ libopenblas-dev git 60 | pip install --user Theano 61 | pip install --user Lasagne==0.1 62 | sudo apt-get install python-matplotlib 63 | ``` 64 | 65 | ## Run code 66 | In folder RL, create a data/ folder. 67 | 68 | Use `launcher.py` to launch experiments. 69 | 70 | 71 | ``` 72 | --exp_type 73 | --num_res 74 | --num_nw 75 | --simu_len 76 | --num_ex 77 | --num_seq_per_batch 78 | --eps_max_len 79 | --num_epochs 80 | --time_horizon