├── .gitignore
├── LICENSE
├── README.md
├── pytorch-soft-actor-critic
│   ├── LICENSE
│   ├── continous_grids.py
│   ├── exploration_models.py
│   ├── flow_helpers.py
│   ├── flows.py
│   ├── graphics.py
│   ├── main.py
│   ├── model.py
│   ├── normalized_actions.py
│   ├── plots
│   │   └── plot_comet.py
│   ├── replay_memory.py
│   ├── sac.py
│   ├── scripts
│   │   ├── run_contgridworld_exp.sh
│   │   └── run_contgridworld_gauss.sh
│   ├── settings.json
│   └── utils.py
└── pytorch-vanilla-reinforce
    ├── README.md
    ├── main_reinforce.py
    └── reinforce_simple.py
/.gitignore:
--------------------------------------------------------------------------------
1 | *.json
2 | *.rar
3 | *.tar
4 | *.zip
5 | *.pt
6 | *.swp
7 | *.png
8 | *.eps
9 | __pycache__/**
10 | .idea/**
11 | install/**
12 | *.pdf
13 | *.png
14 | *.xls
15 | .nfs*
16 | pytorch-soft-actor-critic/__pycache__/**
17 | pytorch-soft-actor-critic/ddpg_gridworld/__pycache__/**
18 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2019 Avishek (Joey) Bose
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Improving Exploration in SAC with Normalizing Flows Policies
2 |
3 | This codebase was used to generate the results documented in the paper "[Improving Exploration in Soft-Actor-Critic with Normalizing Flows Policies](arxiv_url_placeholder)".
4 | Patrick Nadeem Ward*<sup>1,2</sup>, Ariella Smofsky*<sup>1,2</sup>, Avishek Joey Bose<sup>1,2</sup>. INNF Workshop, ICML 2019.
5 |
6 | * \* Equal contribution, <sup>1</sup> McGill University, <sup>2</sup> Mila
7 | * Correspondence to:
8 | * Patrick Nadeem Ward <[Github: NadeemWard](https://github.com/NadeemWard), patrick.ward@mail.mcgill.ca>
9 | * Ariella Smofsky <[Github: asmoog](https://github.com/asmoog), ariella.smofsky@mail.mcgill.ca>
10 |
11 | ## Requirements
12 | * [PyTorch](https://pytorch.org/)
13 | * [comet.ml](https://www.comet.ml/)
14 |
15 | ## Run Experiments
16 | Gaussian policy on Dense Gridworld environment with REINFORCE:
17 | ```
18 | TODO
19 | ```
20 |
21 | Gaussian policy on Sparse Gridworld environment with REINFORCE:
22 | ```
23 | TODO
24 | ```
25 |
26 | Gaussian policy on Dense Gridworld environment with reparametrization:
27 | ```
28 | python main.py --namestr=G-S-DG-CG --make_cont_grid --batch_size=128 --replay_size=100000 --hidden_size=64 --num_steps=100000 --policy=Gaussian --smol --comet --dense_goals --silent
29 | ```
30 |
31 | Gaussian policy on Sparse Gridworld environment with reparametrization:
32 | ```
33 | python main.py --namestr=G-S-CG --make_cont_grid --batch_size=128 --replay_size=100000 --hidden_size=64 --num_steps=100000 --policy=Gaussian --smol --comet --silent
34 | ```
35 |
36 | Normalizing Flow policy on Dense Gridworld environment:
37 | ```
38 | TODO
39 | ```
40 |
41 | Normalizing Flow policy on Sparse Gridworld environment:
42 | ```
43 | TODO
44 | ```
45 |
46 | To run an experiment with a different policy distribution, modify the `--policy` flag.
47 |
48 | ## References
49 | * Implementation of SAC based on [PyTorch SAC](https://github.com/pranz24/pytorch-soft-actor-critic).
--------------------------------------------------------------------------------
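The run commands above pass `--comet`, and the comet.ml credentials are expected to live in `pytorch-soft-actor-critic/settings.json` (note that `*.json` is git-ignored, so the file must be created locally). Below is a minimal, hypothetical sketch of wiring such a file into comet.ml logging; the JSON key names (`api_key`, `project_name`, `workspace`) are assumptions for illustration, not taken from this repository.

```python
# Hypothetical sketch: load a local settings.json and create a comet.ml Experiment.
# The key names below are assumptions; match them to the real settings.json.
import json

from comet_ml import Experiment


def load_experiment(settings_path="settings.json", run_name="G-S-CG"):
    with open(settings_path) as f:
        settings = json.load(f)

    experiment = Experiment(
        api_key=settings["api_key"],            # assumed key name
        project_name=settings["project_name"],  # assumed key name
        workspace=settings["workspace"],        # assumed key name
    )
    experiment.set_name(run_name)
    return experiment


if __name__ == "__main__":
    exp = load_experiment()
    # Log a dummy metric the way a training loop would.
    for step in range(3):
        exp.log_metric("episode_reward", -0.01 * step, step=step)
```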
/pytorch-soft-actor-critic/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2018 Pranjal Tandon
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/pytorch-soft-actor-critic/continous_grids.py:
--------------------------------------------------------------------------------
1 | """
2 | code from here: https://github.com/junhyukoh/value-prediction-network/blob/master/maze.py
3 | """
4 |
5 | import copy
6 | import pandas as pd
7 |
8 | import seaborn as sns
9 |
10 | import gym.spaces
11 | import matplotlib.patches as patches
12 | import matplotlib.path as path
13 | import matplotlib.pyplot as plt
14 | import numpy as np
15 | from exploration_models import *
16 | from graphics import *
17 | from gym import spaces
18 |
19 |
20 | class GridWorld(gym.Env):
21 | """
22 | empty grid world
23 | """
24 |
25 | def __init__(self,
26 | num_rooms=0,
27 | start_position=(25.0, 25.0),
28 | goal_position=(75.0, 75.0),
29 | goal_reward=+100.0,
30 | dense_goals=None,
31 | dense_reward=+5,
32 | goal_radius=1.0,
33 | per_step_penalty=-0.01,
34 | max_episode_len=1000,
35 | grid_len=100,
36 | wall_breadth=1,
37 | door_breadth=5,
38 | action_limit_max=1.0,
39 | silent_mode=False):
40 | """
41 | params: grid geometry (grid_len, walls, rooms), start/goal (x, y) positions, rewards, and episode limits -- see the keyword defaults above
42 | """
43 |
44 | # num of rooms
45 | self.num_rooms = num_rooms
46 | self.silent_mode = silent_mode
47 |
48 | # grid size
49 | self.grid_len = float(grid_len)
50 | self.wall_breadth = float(wall_breadth)
51 | self.door_breadth = float(door_breadth)
52 | self.min_position = 0.0
53 | self.max_position = float(grid_len)
54 |
55 | # goal stats
56 | self.goal_position = np.array(goal_position)
57 | self.goal_radius = goal_radius
58 | self.start_position = np.array(start_position)
59 |
60 | # Dense reward stuff:
61 | self.dense_reward = dense_reward
62 | # List of dense goal coordinates
63 | self.dense_goals = dense_goals if dense_goals is not None else []  # avoid iterating over None when no dense goals are given
64 |
65 | # rewards
66 | self.goal_reward = goal_reward
67 | self.per_step_penalty = per_step_penalty
68 |
69 | self.max_episode_len = max_episode_len
70 |
71 | # observation space
72 | self.low_state = np.array([self.min_position, self.min_position])
73 | self.high_state = np.array([self.max_position, self.max_position])
74 |
75 | # how much the agent can move in a step (dx,dy)
76 | self.min_action = np.array([-action_limit_max, -action_limit_max])
77 | self.max_action = np.array([+action_limit_max, +action_limit_max])
78 |
79 | self.observation_space = spaces.Box(low=self.low_state, high=self.high_state)
80 | self.action_space = spaces.Box(low=self.min_action, high=self.max_action)
81 | self.nb_actions = self.action_space.shape[-1]
82 |
83 | # add the walls here
84 | self.create_walls()
85 | self.scale = 5
86 |
87 | # This code enables live visualization of trajectories
88 | # Susan added these lines for visual purposes
89 | if not self.silent_mode:
90 | self.win1 = GraphWin("2DGrid", self.max_position * self.scale + 40, self.max_position * self.scale + 40)
91 | rectangle1 = Rectangle(Point(self.min_position * self.scale + 20, self.min_position * self.scale + 20),
92 | Point(self.max_position * self.scale + 20, self.max_position * self.scale + 20))
93 | rectangle1.setOutline('red')
94 | rectangle1.draw(self.win1)
95 |
96 | if self.num_rooms > 0:
97 | wall1 = Rectangle(Point(self.min_position * self.scale + 20,
98 | self.max_position * self.scale / 2 + 20 - self.wall_breadth * self.scale),
99 | Point(self.max_position * self.scale / 2 + 20,
100 | self.max_position * self.scale / 2 + 20 + self.wall_breadth * self.scale))
101 | wall1.draw(self.win1)
102 | wall1.setFill('aquamarine')
103 |
104 | wall2 = Rectangle(Point(self.max_position * self.scale / 2 + 20 - self.wall_breadth * self.scale,
105 | self.min_position * self.scale + 20),
106 | Point(self.max_position * self.scale / 2 + 20 + self.wall_breadth * self.scale,
107 | self.max_position * self.scale / 4 + 20 - self.door_breadth * self.scale))
108 | wall2.draw(self.win1)
109 | wall2.setFill('aquamarine')
110 |
111 | wall3 = Rectangle(Point(self.max_position * self.scale / 2 + 20 - self.wall_breadth * self.scale,
112 | self.max_position * self.scale / 4 + 20 + self.door_breadth * self.scale),
113 | Point(self.max_position * self.scale / 2 + 20 + self.wall_breadth * self.scale,
114 | self.max_position * self.scale / 2 + 20 + self.wall_breadth * self.scale))
115 | wall3.draw(self.win1)
116 | wall3.setFill('aquamarine')
117 | start_point = Circle(Point(start_position[0] * self.scale + 20, start_position[1] * self.scale + 20),
118 | goal_radius * self.scale)
119 | start_point.draw(self.win1)
120 | start_point.setFill('red')
121 | goal_point = Circle(Point(goal_position[0] * self.scale + 20, goal_position[1] * self.scale + 20),
122 | goal_radius * self.scale)
123 | goal_point.draw(self.win1)
124 | goal_point.setFill('green')
125 |
126 | # Drawing the dense goals:
127 | for idx, mini_goal in enumerate(self.dense_goals):
128 | mini_goal_point = Circle(Point(mini_goal[0] * self.scale + 20, mini_goal[1] * self.scale + 20),
129 | goal_radius * self.scale)
130 | mini_goal_point.draw(self.win1)
131 | mini_goal_point.setFill('blue')
132 |
133 | # self.win1.getMouse()
134 |
135 | self.seed()
136 | self.reset()
137 |
138 | def reset(self):
139 | self.state = copy.deepcopy(self.start_position)
140 | self.t = 0
141 | self.done = False
142 |
143 | return self._get_obs()
144 |
145 | def _get_obs(self):
146 | return copy.deepcopy(self.state)
147 |
148 | def step(self, a):
149 | """
150 | take the action here
151 | """
152 |
153 | # check if the action is valid
154 | assert self.action_space.contains(a)
155 | assert self.done is False
156 |
157 | # Susan added this line
158 | self.state_temp = copy.deepcopy(self.state)
159 |
160 | self.t += 1
161 |
162 | # check if collides, if it doesn't then update the state
163 | if self.num_rooms == 0 or not self.collides((self.state[0] + a[0], self.state[1] + a[1])):
164 | # move the agent and update the state
165 | self.state[0] += a[0]
166 | self.state[1] += a[1]
167 |
168 | # clip the state if out of bounds
169 | self.state[0] = np.clip(self.state[0], self.min_position, self.max_position)
170 | self.state[1] = np.clip(self.state[1], self.min_position, self.max_position)
171 |
172 | # the reward logic
173 | reward = self.per_step_penalty
174 |
175 | # Adding dense Rewards:
176 | for idx, mini_goal in enumerate(self.dense_goals):
177 | if np.linalg.norm(np.array(self.state) - np.array(mini_goal), 2) <= self.goal_radius:
178 | reward = self.dense_reward
179 |
180 | # if reached goal (within a radius of 1 unit)
181 | if np.linalg.norm(np.array(self.state) - np.array(self.goal_position), 2) <= self.goal_radius:
182 | # episode done
183 | self.done = True
184 | reward = self.goal_reward
185 |
186 | if self.t >= self.max_episode_len:
187 | self.done = True
188 |
189 | line = Line(Point(self.state_temp[0] * self.scale + 20, self.state_temp[1] * self.scale + 20),
190 | Point(self.state[0] * self.scale + 20, self.state[1] * self.scale + 20))
191 |
192 | if not self.silent_mode:
193 | line.draw(self.win1)
194 | line.setOutline('black')
195 | # self.win1.getMouse()
196 | self.state_temp = self.state
197 |
198 | if self.silent_mode:
199 | return self._get_obs(), reward, self.done, None
200 |
201 | # return self.win1,self._get_obs(), reward, self.done, None
202 | return self._get_obs(), reward, self.done, None
203 |
204 | def sample(self, exploration, b_0, l_p, ou_noise, stddev):
205 | """ take a random sample """
206 | if exploration == 'RandomWalk':
207 | return np.random.uniform(low=self.min_action[0], high=self.max_action[0], size=(2,))
208 | elif exploration == 'PolyRL':
209 | return PolyNoise(L_p=float(l_p), b_0=float(b_0), action_dim=self.nb_actions, ou_noise=ou_noise,
210 | sigma=float(stddev))
211 | else:
212 | raise Exception("The exploration method " + exploration + " is not defined!")
213 |
214 | def create_walls(self):
215 | """
216 | create the walls here, the polygons
217 | """
218 | self.walls = []
219 |
220 | # codes for drawing the polygons in matplotlib
221 | codes = [path.Path.MOVETO,
222 | path.Path.LINETO,
223 | path.Path.LINETO,
224 | path.Path.LINETO,
225 | path.Path.CLOSEPOLY,
226 | ]
227 |
228 | if self.num_rooms == 0:
229 | # no walls required
230 | return
231 | elif self.num_rooms == 1:
232 | # create one room with one opening
233 |
234 | # a wall parallel to x-axis, at (0,grid_len/2), (grid_len/2,grid_len/2)
235 | self.walls.append(path.Path([(0, self.grid_len / 2.0 + self.wall_breadth),
236 | (0, self.grid_len / 2.0 - self.wall_breadth),
237 | (self.grid_len / 2.0, self.grid_len / 2.0 - self.wall_breadth),
238 | (self.grid_len / 2.0, self.grid_len / 2.0 + self.wall_breadth),
239 | (0, self.grid_len / 2.0 + self.wall_breadth)
240 | ], codes=codes))
241 |
242 | # the top part of the wall at x = grid_len/2, parallel to the y-axis, above the door opening
243 | self.walls.append(path.Path([(self.grid_len / 2.0 - self.wall_breadth, self.grid_len / 2.0),
244 | (self.grid_len / 2.0 - self.wall_breadth,
245 | self.grid_len / 4.0 + self.door_breadth),
246 | (self.grid_len / 2.0 + self.wall_breadth,
247 | self.grid_len / 4.0 + self.door_breadth),
248 | (self.grid_len / 2.0 + self.wall_breadth, self.grid_len / 2.0),
249 | (self.grid_len / 2.0 - self.wall_breadth, self.grid_len / 2.0),
250 | ], codes=codes))
251 |
252 | # the bottom part of the wall at x = grid_len/2, parallel to the y-axis, below the door opening
253 | self.walls.append(
254 | path.Path([(self.grid_len / 2.0 - self.wall_breadth, self.grid_len / 4.0 - self.door_breadth),
255 | (self.grid_len / 2.0 - self.wall_breadth, 0.),
256 | (self.grid_len / 2.0 + self.wall_breadth, 0.),
257 | (self.grid_len / 2.0 + self.wall_breadth, self.grid_len / 4.0 - self.door_breadth),
258 | (self.grid_len / 2.0 - self.wall_breadth, self.grid_len / 4.0 - self.door_breadth),
259 | ], codes=codes))
260 |
261 | elif self.num_rooms == 4:
262 | # create 4 rooms
263 | raise Exception("Not implemented yet :(")
264 | else:
265 | raise Exception("Logic for current number of rooms " +
266 | str(self.num_rooms) + " is not implemented yet :(")
267 |
268 | def collides(self, pt):
269 | """
270 | to check if the point (x,y) is in the area defined by the walls polygon (i.e. collides)
271 | """
272 | wall_edge_low = self.grid_len / 2 - self.wall_breadth
273 | wall_edge_high = self.grid_len / 2 + self.wall_breadth
274 | for w in self.walls:
275 | if w.contains_point(pt):
276 | return True
277 | elif pt[0] <= self.min_position and pt[1] > wall_edge_low and pt[1] < wall_edge_high:
278 | return True
279 | elif pt[1] <= self.min_position and pt[0] > wall_edge_low and pt[0] < wall_edge_high:
280 | return True
281 | return False
282 |
283 | def vis_trajectory(self, traj, name_plot, experiment_id=None, imp_states=None):
284 | """
285 | creates the trajectory plot and saves it under install/
286 |
287 | traj: numpy array of visited (x, y) positions
288 |
289 |
290 | Code taken from: https://discuss.pytorch.org/t/example-code-to-put-matplotlib-graph-to-tensorboard-x/15806
291 | """
292 | fig = plt.figure(figsize=(10, 10))
293 | ax = fig.add_subplot(111)
294 |
295 | # convert the environment to the image
296 | ax.set_xlim(0.0, self.max_position)
297 | ax.set_ylim(0.0, self.max_position)
298 |
299 | # add the border here
300 | # for i in ax.spines.itervalues():
301 | # i.set_linewidth(0.1)
302 |
303 | # plot any walls if any
304 | for w in self.walls:
305 | patch = patches.PathPatch(w, facecolor='gray', lw=2)
306 | ax.add_patch(patch)
307 |
308 | # plot the start and goal points
309 | ax.scatter([self.start_position[0]], [self.start_position[1]], c='g')
310 | ax.scatter([self.goal_position[0]], [self.goal_position[1]], c='y')
311 |
312 | # Plot the dense rewards:
313 | for idx, mini_goal in enumerate(self.dense_goals):
314 | ax.scatter([mini_goal[0]], [mini_goal[1]], c='b')
315 |
316 | # add the trajectory here
317 | # https://stackoverflow.com/questions/36607742/drawing-phase-space-trajectories-with-arrows-in-matplotlib
318 |
319 | ax.quiver(traj[:-1, 0], traj[:-1, 1],
320 | traj[1:, 0] - traj[:-1, 0], traj[1:, 1] - traj[:-1, 1],
321 | scale_units='xy', angles='xy', scale=1, color='black')
322 |
323 | # plot the decision points/states
324 | if imp_states is not None:
325 | ax.scatter(imp_states[:, 0], imp_states[:, 1], c='r')
326 |
327 | # return the image buff
328 |
329 | ax.set_title("grid")
330 | # fig.savefig(buf, format='jpeg') # maybe png
331 | fig.savefig('install/{}_{}'.format(name_plot, experiment_id), dpi=300) # maybe png
332 |
333 | def test_vis_trajectory(self, traj, name_plot, heatmap_title, experiment_id=None, heatmap_normalize=False,
334 | heatmap_vertical_clip_value=2500):
335 |
336 | # Trajectory heatmap
337 | x = np.array([point[0] * self.scale for point in traj])
338 | y = np.array([point[1] * self.scale for point in traj])
339 |
340 | # Save heatmap for different bin scales
341 | for num in range(2, 6):
342 | fig, ax = plt.subplots()
343 |
344 | bin_scale = num * 0.1
345 |
346 | h = ax.hist2d(x, y, bins=[np.arange(self.min_position * self.scale,
347 | self.max_position * self.scale, num),
348 | np.arange(self.min_position * self.scale,
349 | self.max_position * self.scale, num)],
350 | cmap='Blues', normed=heatmap_normalize, vmax=heatmap_vertical_clip_value)
351 | image = h[3]
352 | plt.colorbar(image, ax=ax)
353 |
354 | # Build graph barriers and start and goal positions
355 | start_point = (self.start_position[0] * self.scale, self.start_position[1] * self.scale)
356 | radius = self.goal_radius * self.scale / 2
357 | start_circle = patches.Circle(start_point, radius,
358 | facecolor='gold', edgecolor='black', lw=0.5, zorder=10)
359 |
360 | goal_point = (self.goal_position[0] * self.scale, self.goal_position[1] * self.scale)
361 | goal_circle = patches.Circle(goal_point, radius,
362 | facecolor='brown', edgecolor='black', lw=0.5, zorder=10)
363 |
364 | for idx, dense_goal in enumerate(self.dense_goals):
365 | dense_goal_point = (dense_goal[0] * self.scale, dense_goal[1] * self.scale)
366 | dense_goal_circle = patches.Circle(dense_goal_point, radius,
367 | facecolor='crimson', edgecolor='black', lw=0.5, zorder=10)
368 | ax.add_patch(dense_goal_circle)
369 |
370 | ax.add_patch(start_circle)
371 | ax.add_patch(goal_circle)
372 |
373 | if self.num_rooms == 1:
374 | wall1_xy = (self.min_position * self.scale,
375 | self.max_position/2 * self.scale - self.wall_breadth * self.scale)
376 | wall1_width = (self.max_position/2 - self.min_position) * self.scale
377 | wall1_height = 2 * self.wall_breadth * self.scale
378 | wall1_rect = patches.Rectangle(xy=wall1_xy, width=wall1_width, height=wall1_height,
379 | facecolor='grey', zorder=10)
380 |
381 | wall2_xy = (self.max_position/2 * self.scale - self.wall_breadth * self.scale,
382 | self.min_position * self.scale)
383 | wall2_width = 2 * self.wall_breadth * self.scale
384 | wall2_height = (self.max_position/4 - self.door_breadth - self.min_position) * self.scale
385 | wall2_rect = patches.Rectangle(xy=wall2_xy, width=wall2_width, height=wall2_height,
386 | facecolor='grey', zorder=10)
387 |
388 | wall3_xy = ((self.max_position/2 - self.wall_breadth) * self.scale,
389 | (self.max_position/4 + self.door_breadth) * self.scale)
390 | wall3_width = 2 * self.wall_breadth * self.scale
391 | wall3_height = (self.max_position/2 + self.wall_breadth - self.max_position/4 - self.door_breadth) * self.scale
392 | wall3_rect = patches.Rectangle(xy=wall3_xy, width=wall3_width, height=wall3_height,
393 | facecolor='grey', zorder=10)
394 |
395 | ax.add_patch(wall1_rect)
396 | ax.add_patch(wall2_rect)
397 | ax.add_patch(wall3_rect)
398 |
399 | ax.set_title(heatmap_title)
400 | plt.savefig('install/{}_{}_{}.pdf'.format(name_plot, experiment_id, num))
401 |
402 |
--------------------------------------------------------------------------------
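As a quick, self-contained sanity check of the environment above (not part of the repository), the sketch below instantiates `GridWorld` with `silent_mode=True` so that no Tkinter window from `graphics.py` is opened, and rolls out a single episode with the built-in `RandomWalk` sampler. The dense-goal coordinates are made up for illustration.

```python
# Minimal smoke test for GridWorld; assumes it runs next to continous_grids.py.
from continous_grids import GridWorld

# silent_mode=True skips all live drawing; dense_goals here are illustrative only.
env = GridWorld(num_rooms=0,
                max_episode_len=200,
                dense_goals=[(40.0, 40.0), (60.0, 60.0)],
                silent_mode=True)

obs = env.reset()
total_reward, done = 0.0, False
while not done:
    # sample('RandomWalk', ...) only uses the action bounds, so the
    # PolyRL-specific arguments can be left as None.
    action = env.sample('RandomWalk', None, None, None, None)
    obs, reward, done, _ = env.step(action)
    total_reward += reward

print("episode return:", total_reward, "final position:", obs)
```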
/pytorch-soft-actor-critic/exploration_models.py:
--------------------------------------------------------------------------------
1 | import copy
2 | import math
3 | import random
4 |
5 | import numpy as np
6 | from sklearn import preprocessing
7 |
8 |
9 | # action noise models
10 | class ActionNoise(object):
11 | def reset(self):
12 | pass
13 |
14 |
15 | class RandomWalkNoise(object):
16 | def __init__(self, action_dim, max_action_limit):
17 | self.action_dim = action_dim
18 | self.max_action_limit = max_action_limit
19 |
20 | def __call__(self):
21 | return np.around(np.random.uniform(-self.max_action_limit,+self.max_action_limit, (self.action_dim,)), decimals = 10)
22 |
23 |
24 | class NormalActionNoise(ActionNoise):
25 | def __init__(self, mu, sigma):
26 | self.mu = mu
27 | self.sigma = sigma
28 |
29 | def __call__(self):
30 | return np.random.normal(self.mu, self.sigma)
31 |
32 | def __repr__(self):
33 | return 'NormalActionNoise(mu={}, sigma={})'.format(self.mu, self.sigma)
34 |
35 |
36 | # Based on http://math.stackexchange.com/questions/1287634/implementing-ornstein-uhlenbeck-in-matlab
37 | class OrnsteinUhlenbeckActionNoise(ActionNoise):
38 | def __init__(self, mu, sigma, theta=.15, dt=1e-2, x0=None):
39 | self.theta = theta
40 | self.mu = mu
41 | self.sigma = sigma
42 | self.dt = dt
43 | self.x0 = x0
44 | self.reset()
45 |
46 | def __call__(self):
47 | x = self.x_prev + self.theta * (self.mu - self.x_prev) * self.dt + self.sigma * np.sqrt(self.dt) * np.random.normal(size=self.mu.shape)
48 | self.x_prev = x
49 | return x
50 |
51 | def reset(self):
52 | self.x_prev = self.x0 if self.x0 is not None else np.zeros_like(self.mu)
53 |
54 | def __repr__(self):
55 | return 'OrnsteinUhlenbeckActionNoise(mu={}, sigma={})'.format(self.mu, self.sigma)
56 |
57 |
58 |
59 |
60 | # TODO: Adaptive param noise: https://github.com/openai/baselines/blob/master/baselines/ddpg/noise.py
61 | class AdaptiveParamNoiseSpec(object):
62 | def __init__(self, initial_stddev=0.1, desired_action_stddev=0.1, adoption_coefficient=1.01):
63 | self.initial_stddev = initial_stddev
64 | self.desired_action_stddev = desired_action_stddev
65 | self.adoption_coefficient = adoption_coefficient
66 |
67 | self.current_stddev = initial_stddev
68 |
69 | def adapt(self, distance):
70 | if distance > self.desired_action_stddev:
71 | # Decrease stddev.
72 | self.current_stddev /= self.adoption_coefficient
73 | else:
74 | # Increase stddev.
75 | self.current_stddev *= self.adoption_coefficient
76 |
77 | def get_stats(self):
78 | stats = {
79 | 'param_noise_stddev': self.current_stddev,
80 | }
81 | return stats
82 |
83 | def __repr__(self):
84 | fmt = 'AdaptiveParamNoiseSpec(initial_stddev={}, desired_action_stddev={}, adoption_coefficient={})'
85 | return fmt.format(self.initial_stddev, self.desired_action_stddev, self.adoption_coefficient)
86 |
87 |
88 |
89 | class PolyNoise(object):
90 | def __init__(self,
91 | L_p,
92 | b_0,
93 | action_dim,
94 | ou_noise,
95 | sigma = 0.2,):
96 | """
97 | params for the L_p formulation:
98 | L_p: the persistence length
99 | b_0: movement distance
100 | sigma: correlation variance
101 | blind: disregard the current action and only use the previous action
102 | """
103 | action_noise = NormalActionNoise(mu=np.zeros(action_dim), sigma=float(sigma) * np.ones(action_dim))
104 | self.L_p = L_p
105 | self.b_0 = b_0
106 | self.sigma = sigma
107 | self.action_dim = action_dim
108 | self.ou_noise = ou_noise
109 |
110 | # calculate the angle here
111 | self.n = int(L_p/b_0)
112 | self.lambda_ = np.arccos(np.exp((-1. * b_0)/L_p))
113 | # initialize and reset traj-specific stats
114 | self.reset()
115 |
116 |
117 | def reset(self):
118 | """
119 | reset the chain history
120 | """
121 | self.H = None
122 | self.a_p = None
123 | #self.ou_noise.reset()
124 |
125 | self.i = 0
126 | self.t = 0
127 | self.rand_or_poly = []
128 |
129 |
130 | def __call__(self, a):
131 | """
132 | apply the PolyRL persistent exploration noise to the given action
133 | a: the current action proposed by the policy
134 | returns: the (possibly) perturbed action
135 | (the time step and chain history are tracked internally via self.t, self.H, self.a_p)
136 | """
137 | new_a = a
138 |
139 | if self.t==0:
140 | # return original a
141 | pass
142 | elif self.t==1:
143 | # create random trajectory vector
144 | H = np.random.rand(self.action_dim)
145 | self.H = (H * self.b_0) / np.linalg.norm(H, 2)
146 | # append the new H to the previous actions
147 | new_a = self.a_p + self.H
148 | self.i += 1
149 | else:
150 | # done with polyRL noise
151 | if self.i == self.n:
152 | # chain length reached: take an OU-perturbed step and restart the chain
153 | noise = self.ou_noise()
154 |
155 | # add the noise
156 | new_a = a + noise
157 |
158 | # reset i and H
159 | self.i = 0
160 | self.H = new_a - self.a_p
161 | self.rand_or_poly.append(False)
162 | else:
163 | eta = abs(np.random.normal(self.lambda_, self.sigma, 1))
164 | B = sample_persistent_action(self.action_dim, self.H, self.a_p, eta)
165 | self.rand_or_poly.append(True)
166 |
167 | #update the trajectory
168 | self.H = self.b_0 * B
169 |
170 | new_a = self.a_p + self.H
171 | self.i += 1
172 |
173 | #update the previous a_p
174 | self.a_p = new_a
175 | self.t += 1
176 |
177 |
178 |
179 | return new_a
180 |
181 |
182 | class GyroPolyNoise(object):
183 | def __init__(self,
184 | L_p,
185 | b_0,
186 | action_dim,
187 | state_dim,
188 | ou_noise,
189 | sigma = 0.2,):
190 | """
191 | params for the L_p formulation:
192 | L_p: the persistence length
193 | b_0: movement distance
194 | sigma: correlation variance
195 | blind: disregard the current action and only use the previous action
196 | """
197 | self.L_p = L_p
198 | self.b_0 = b_0
199 | self.sigma = sigma
200 | self.action_dim = action_dim
201 | self.state_dim = state_dim
202 | self.ou_noise = ou_noise
203 |
204 | # calculate the angle here
205 | self.n = int(L_p/b_0)
206 | self.lambda_ = np.arccos(np.exp((-1. * b_0)/L_p))
207 | # initialize and reset traj-specific stats
208 | self.reset()
209 |
210 |
211 | def reset(self):
212 | """
213 | reset the chain history
214 | """
215 | self.H = None
216 | self.a_p = None
217 | self.ou_noise.reset()
218 |
219 | self.i = 0
220 | self.t = 0
221 |
222 | # radius of gyration
223 | self.g = 0
224 | self.delta_g = 0
225 |
226 | # centre of mass of gyration
227 | self.C = np.zeros(self.state_dim)
228 |
229 |
230 | self.rand_or_poly = []
231 | self.g_history = []
232 | self.avg_delta_g = 0
233 |
234 |
235 | def __call__(self, a, s):
236 | """
237 | apply the PolyRL noise, restarting the chain when the radius of gyration shrinks
238 | s: the current state
239 | a: the current action
240 | t: the time step
241 | """
242 | new_a = a
243 |
244 | if self.t==0:
245 | # return original a
246 | pass
247 | elif self.t==1:
248 | # create random trajectory vector
249 | H = np.random.rand(self.action_dim)
250 | self.H = (H * self.b_0) / np.linalg.norm(H, 2)
251 |
252 | # append the new H to the previous actions
253 | new_a = self.a_p + self.H
254 | self.i += 1
255 | else:
256 | # done with polyRL noise
257 | if self.delta_g < 0:
258 | # radius of gyration shrank: take an OU-perturbed step and restart the chain
259 | noise = self.ou_noise()
260 |
261 | # add the noise
262 | new_a = a + noise
263 |
264 | # reset i, the gyration statistics, and H
265 | self.i = 0
266 | self.g = 0
267 | self.C = np.zeros(self.state_dim)
268 | self.delta_g = 0
269 | self.H = new_a - self.a_p
270 | self.rand_or_poly.append(False)
271 | else:
272 | eta = abs(np.random.normal(self.lambda_, self.sigma, 1))
273 | B = sample_persistent_action(self.action_dim, self.H, self.a_p, eta)
274 | self.rand_or_poly.append(True)
275 |
276 | #update the trajectory
277 | self.H = self.b_0 * B
278 |
279 | new_a = self.a_p + self.H
280 |
281 | if self.i == self.n:
282 | self.i = 0
283 | self.g = 0
284 | self.C = np.zeros(self.state_dim)
285 | self.delta_g = 0
286 | else:
287 | self.i += 1
288 |
289 | #update the previous a_p
290 | self.a_p = new_a
291 |
292 | if self.i != 0:
293 | g = np.sqrt(((float(self.i-1.0)/self.i) * self.g**2) + (1.0/(self.i+1.0) * np.linalg.norm(s - self.C, 2)**2) )
294 | self.delta_g = g - self.g
295 | self.g = g
296 |
297 | # add to history
298 | self.avg_delta_g += self.delta_g
299 | self.g_history.append(self.g)
300 |
301 |
302 | self.C = (self.i * self.C + s)/(self.i + 1.0)
303 | self.t += 1
304 |
305 |
306 | return new_a
307 | class GyroPolyNoiseActionTraj(object):
308 | def __init__(self,
309 | lambd,
310 | action_dim,
311 | state_dim,
312 | ou_noise,
313 | sigma = 0.2,
314 | max_action_limit = 1.0):
315 | self.lambd = lambd
316 | self.action_dim = action_dim
317 | self.state_dim = state_dim
318 | self.ou_noise = ou_noise
319 | self.sigma = sigma
320 | self.max_action_limit = max_action_limit
321 |
322 |
323 | # initialize and reset traj-specific stats
324 | self.reset()
325 |
326 | def reset(self):
327 |
328 | """
329 | reset the chain history
330 | """
331 | self.a_p = None
332 | self.ou_noise.reset()
333 |
334 | self.i = 0
335 | self.t = 0
336 |
337 | # radius of gyration
338 | self.g = 0
339 | self.delta_g = 0
340 |
341 | # centre of mass
342 | self.C = np.zeros(self.state_dim)
343 |
344 |
345 | self.rand_or_poly = []
346 | self.g_history = []
347 | self.avg_delta_g = 0
348 |
349 |
350 | def __call__(self, a, s):
351 | """
352 | apply the PolyRL noise directly in action space (no H vector), restarting when the radius of gyration shrinks
353 | s: the current state
354 | a: the current action
355 | t: the time step
356 | """
357 | new_a = a
358 |
359 | if self.t==0:
360 | # return original a
361 | pass
362 | else:
363 | # done with polyRL noise
364 | if self.delta_g < 0:
365 | # radius of gyration shrank: take an OU-perturbed step and restart the chain
366 | noise = self.ou_noise()
367 |
368 | # add the noise
369 | new_a = a + noise
370 |
371 | # reset i and the gyration statistics
372 | self.i = 0
373 | self.g = 0
374 | self.C = np.zeros(self.state_dim)
375 | self.delta_g = 0
376 | self.rand_or_poly.append(False)
377 | else:
378 | eta = abs(np.random.normal(self.lambd, self.sigma, 1))
379 | A = sample_persistent_action_noHvector(self.action_dim, self.a_p, eta, self.max_action_limit)
380 | self.rand_or_poly.append(True)
381 |
382 | #update the trajectory
383 | new_a = A
384 |
385 | self.i +=1
386 |
387 |
388 | #update the previous a_p
389 | self.a_p = new_a
390 |
391 | if self.i != 0:
392 | g = np.sqrt(((float(self.i-1.0)/self.i) * self.g**2) + (1.0/(self.i+1.0) * np.linalg.norm(s - self.C, 2)**2) )
393 | self.delta_g = g - self.g
394 | self.g = g
395 |
396 | # add to history
397 | self.avg_delta_g += self.delta_g
398 | self.g_history.append(self.g)
399 |
400 |
401 | self.C = (self.i * self.C + s)/(self.i + 1.0)
402 | self.t += 1
403 | return new_a
404 |
405 |
--------------------------------------------------------------------------------
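A short illustration (not in the repository) of the `OrnsteinUhlenbeckActionNoise` defined above: consecutive samples are temporally correlated, which is the behaviour `PolyNoise` falls back on through its `ou_noise()` call when a chain ends.

```python
# Illustrative only: draw a short correlated noise sequence for a 2-D action space.
import numpy as np

from exploration_models import OrnsteinUhlenbeckActionNoise

ou = OrnsteinUhlenbeckActionNoise(mu=np.zeros(2), sigma=0.2 * np.ones(2))

samples = np.stack([ou() for _ in range(5)])
print(samples)   # each row drifts smoothly from the previous one

ou.reset()       # restart the process at x0 (zeros by default)
print(ou())
```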
/pytorch-soft-actor-critic/flow_helpers.py:
--------------------------------------------------------------------------------
1 | import torch
2 | import torch.nn as nn
3 | import torch.nn.functional as F
4 | import torch.distributions as D
5 | import torchvision.transforms as T
6 | import os
7 | import math
8 | import argparse
9 | import pprint
10 | import copy
11 |
12 | # --------------------
13 | # Model layers and helpers
14 | # --------------------
15 |
16 | def create_masks(input_size, hidden_size, n_hidden, input_order='sequential', input_degrees=None):
17 | # MADE paper sec 4:
18 | # degrees of connections between layers -- ensure at most in_degree - 1 connections
19 | degrees = []
20 |
21 | # set input degrees to what is provided in args (the flipped order of the previous layer in a stack of mades);
22 | # else init input degrees based on strategy in input_order (sequential or random)
23 | if input_order == 'sequential':
24 | degrees += [torch.arange(input_size)] if input_degrees is None else [input_degrees]
25 | for _ in range(n_hidden + 1):
26 | degrees += [torch.arange(hidden_size) % (input_size - 1)]
27 | degrees += [torch.arange(input_size) % input_size - 1] if input_degrees is None else [input_degrees % input_size - 1]
28 |
29 | elif input_order == 'random':
30 | degrees += [torch.randperm(input_size)] if input_degrees is None else [input_degrees]
31 | for _ in range(n_hidden + 1):
32 | min_prev_degree = min(degrees[-1].min().item(), input_size - 1)
33 | degrees += [torch.randint(min_prev_degree, input_size, (hidden_size,))]
34 | min_prev_degree = min(degrees[-1].min().item(), input_size - 1)
35 | degrees += [torch.randint(min_prev_degree, input_size, (input_size,)) - 1] if input_degrees is None else [input_degrees - 1]
36 |
37 | # construct masks
38 | masks = []
39 | for (d0, d1) in zip(degrees[:-1], degrees[1:]):
40 | masks += [(d1.unsqueeze(-1) >= d0.unsqueeze(0)).float()]
41 |
42 | return masks, degrees[0]
43 |
44 |
45 | class MaskedLinear(nn.Linear):
46 | """ MADE building block layer """
47 | def __init__(self, input_size, n_outputs, mask, cond_label_size=None):
48 | super().__init__(input_size, n_outputs)
49 |
50 | self.register_buffer('mask', mask)
51 |
52 | self.cond_label_size = cond_label_size
53 | if cond_label_size is not None:
54 | self.cond_weight = nn.Parameter(torch.rand(n_outputs, cond_label_size) / math.sqrt(cond_label_size))
55 |
56 | def forward(self, x, y=None):
57 | out = F.linear(x, self.weight * self.mask, self.bias)
58 | if y is not None:
59 | out = out + F.linear(y, self.cond_weight)
60 | return out
61 |
62 | def extra_repr(self):
63 | return 'in_features={}, out_features={}, bias={}'.format(
64 | self.in_features, self.out_features, self.bias is not None
65 | ) + (self.cond_label_size != None) * ', cond_features={}'.format(self.cond_label_size)
66 |
67 |
68 | class LinearMaskedCoupling(nn.Module):
69 | """ Modified RealNVP Coupling Layers per the MAF paper """
70 | def __init__(self, input_size, hidden_size, n_hidden, mask, cond_label_size=None):
71 | super().__init__()
72 |
73 | self.register_buffer('mask', mask)
74 |
75 | # scale function
76 | s_net = [nn.Linear(input_size + (cond_label_size if cond_label_size is not None else 0), hidden_size)]
77 | for _ in range(n_hidden):
78 | s_net += [nn.Tanh(), nn.Linear(hidden_size, hidden_size)]
79 | s_net += [nn.Tanh(), nn.Linear(hidden_size, input_size)]
80 | self.s_net = nn.Sequential(*s_net)
81 |
82 | # translation function
83 | self.t_net = copy.deepcopy(self.s_net)
84 | # replace Tanh with ReLU's per MAF paper
85 | for i in range(len(self.t_net)):
86 | if not isinstance(self.t_net[i], nn.Linear): self.t_net[i] = nn.ReLU()
87 |
88 | def forward(self, x, y=None):
89 | # apply mask
90 | mx = x * self.mask
91 |
92 | # run through model
93 | s = self.s_net(mx if y is None else torch.cat([y, mx], dim=1))
94 | t = self.t_net(mx if y is None else torch.cat([y, mx], dim=1))
95 | u = mx + (1 - self.mask) * (x - t) * torch.exp(-s) # cf RealNVP eq 8 where u corresponds to x (here we're modeling u)
96 |
97 | log_abs_det_jacobian = - (1 - self.mask) * s # log det du/dx; cf RealNVP 8 and 6; note, sum over input_size done at model log_prob
98 |
99 | return u, log_abs_det_jacobian
100 |
101 | def inverse(self, u, y=None):
102 | # apply mask
103 | mu = u * self.mask
104 |
105 | # run through model
106 | s = self.s_net(mu if y is None else torch.cat([y, mu], dim=1))
107 | t = self.t_net(mu if y is None else torch.cat([y, mu], dim=1))
108 | x = mu + (1 - self.mask) * (u * s.exp() + t) # cf RealNVP eq 7
109 |
110 | log_abs_det_jacobian = (1 - self.mask) * s # log det dx/du
111 |
112 | return x, log_abs_det_jacobian
113 |
114 |
115 | class BatchNorm(nn.Module):
116 | """ RealNVP BatchNorm layer """
117 | def __init__(self, input_size, momentum=0.9, eps=1e-5):
118 | super().__init__()
119 | self.momentum = momentum
120 | self.eps = eps
121 |
122 | self.log_gamma = nn.Parameter(torch.zeros(input_size))
123 | self.beta = nn.Parameter(torch.zeros(input_size))
124 |
125 | self.register_buffer('running_mean', torch.zeros(input_size))
126 | self.register_buffer('running_var', torch.ones(input_size))
127 |
128 | def forward(self, x, cond_y=None):
129 | if self.training:
130 | self.batch_mean = x.mean(0)
131 | self.batch_var = x.var(0) # note MAF paper uses biased variance estimate; ie x.var(0, unbiased=False)
132 |
133 | # update running mean
134 | self.running_mean.mul_(self.momentum).add_(self.batch_mean.data * (1 - self.momentum))
135 | self.running_var.mul_(self.momentum).add_(self.batch_var.data * (1 - self.momentum))
136 |
137 | mean = self.batch_mean
138 | var = self.batch_var
139 | else:
140 | mean = self.running_mean
141 | var = self.running_var
142 |
143 | # compute normalized input (cf original batch norm paper algo 1)
144 | x_hat = (x - mean) / torch.sqrt(var + self.eps)
145 | y = self.log_gamma.exp() * x_hat + self.beta
146 |
147 | # compute log_abs_det_jacobian (cf RealNVP paper)
148 | log_abs_det_jacobian = self.log_gamma - 0.5 * torch.log(var + self.eps)
149 | # print('in sum log var {:6.3f} ; out sum log var {:6.3f}; sum log det {:8.3f}; mean log_gamma {:5.3f}; mean beta {:5.3f}'.format(
150 | # (var + self.eps).log().sum().data.numpy(), y.var(0).log().sum().data.numpy(), log_abs_det_jacobian.mean(0).item(), self.log_gamma.mean(), self.beta.mean()))
151 | return y, log_abs_det_jacobian.expand_as(x)
152 |
153 | def inverse(self, y, cond_y=None):
154 | if self.training:
155 | mean = self.batch_mean
156 | var = self.batch_var
157 | else:
158 | mean = self.running_mean
159 | var = self.running_var
160 |
161 | x_hat = (y - self.beta) * torch.exp(-self.log_gamma)
162 | x = x_hat * torch.sqrt(var + self.eps) + mean
163 |
164 | log_abs_det_jacobian = 0.5 * torch.log(var + self.eps) - self.log_gamma
165 |
166 | return x, log_abs_det_jacobian.expand_as(x)
167 |
--------------------------------------------------------------------------------
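To see what `create_masks` produces, the sketch below (not part of the repository) builds the masks for a 3-dimensional input with one hidden layer and feeds a batch through a single `MaskedLinear`; the sizes are illustrative.

```python
# Illustrative check of the MADE masks built by create_masks (sequential ordering).
import torch

from flow_helpers import MaskedLinear, create_masks

masks, input_degrees = create_masks(input_size=3, hidden_size=6, n_hidden=1,
                                    input_order='sequential')

print("input degrees:", input_degrees)          # tensor([0, 1, 2])
for i, m in enumerate(masks):
    print("mask", i, "shape:", tuple(m.shape))  # (out_features, in_features)

# A masked layer only lets a unit see inputs of degree <= its own degree,
# which is what makes a stack of these layers autoregressive.
layer = MaskedLinear(3, 6, masks[0])
out = layer(torch.randn(4, 3))
print(out.shape)                                # torch.Size([4, 6])
```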
/pytorch-soft-actor-critic/flows.py:
--------------------------------------------------------------------------------
1 | import torch
2 | import torch.nn as nn
3 | import numpy as np
4 | import torch.nn.functional as F
5 | from torch.distributions import Normal
6 | from flow_helpers import *
7 | import ipdb
8 |
9 | #Reference: https://github.com/ritheshkumar95/pytorch-normalizing-flows/blob/master/modules.py
10 | # Initialize Policy weights
11 | LOG_SIG_MAX = 2
12 | LOG_SIG_MIN = -20
13 | epsilon = 1e-6
14 | def weights_init_(m):
15 | classname = m.__class__.__name__
16 | if classname.find('Linear') != -1:
17 | torch.nn.init.xavier_uniform_(m.weight, gain=1)
18 | torch.nn.init.constant_(m.bias, 0)
19 |
20 | class PlanarBase(nn.Module):
21 | def __init__(self, n_blocks, state_size, input_size, hidden_size, n_hidden, device):
22 | super().__init__()
23 | self.l1 = nn.Linear(state_size, hidden_size)
24 | self.l2 = nn.Linear(hidden_size,hidden_size)
25 | self.mu = nn.Linear(hidden_size, input_size)
26 | self.log_std = nn.Linear(hidden_size, input_size)
27 | self.device = device
28 | self.z_size = input_size
29 | self.num_flows = n_blocks
30 | self.flow = Planar
31 | # Amortized flow parameters
32 | self.amor_u = nn.Linear(hidden_size, self.num_flows * input_size)
33 | self.amor_w = nn.Linear(hidden_size, self.num_flows * input_size)
34 | self.amor_b = nn.Linear(hidden_size, self.num_flows)
35 |
36 | # Normalizing flow layers
37 | for k in range(self.num_flows):
38 | flow_k = self.flow()
39 | self.add_module('flow_' + str(k), flow_k)
40 |
41 | self.apply(weights_init_)
42 |
43 | def encode(self, state):
44 | x = F.relu(self.l1(state))
45 | x = F.relu(self.l2(x))
46 | mean = self.mu(x)
47 | log_std = self.log_std(x)
48 | log_std = torch.clamp(log_std, min=LOG_SIG_MIN, max=LOG_SIG_MAX)
49 | return mean, log_std, x
50 |
51 | def forward(self, state):
52 | batch_size = state.size(0)
53 | mean, log_std, x = self.encode(state)
54 | std = log_std.exp()
55 | normal = Normal(mean, std)
56 | x_t = normal.rsample() # for reparameterization trick (mean + std * N(0,1))
57 | action = torch.tanh(x_t)
58 | log_prob = normal.log_prob(x_t)
59 | # Enforcing Action Bound
60 | log_prob -= torch.log(1 - action.pow(2) + epsilon)
61 | log_prob = log_prob.sum(1, keepdim=True)
62 | z = [action]
63 | u = self.amor_u(x).view(batch_size, self.num_flows, self.z_size, 1)
64 | w = self.amor_w(x).view(batch_size, self.num_flows, 1, self.z_size)
65 | b = self.amor_b(x).view(batch_size, self.num_flows, 1, 1)
66 |
67 | self.log_det_j = torch.zeros(batch_size).to(self.device)
68 |
69 | for k in range(self.num_flows):
70 | flow_k = getattr(self, 'flow_' + str(k))
71 | z_k, log_det_jacobian = flow_k(z[k], u[:, k, :, :],
72 | w[:, k, :, :], b[:, k, :, :])
73 | z.append(z_k)
74 | self.log_det_j += log_det_jacobian
75 |
76 | action = z[-1]
77 | log_prob_final_action = log_prob.squeeze() - self.log_det_j
78 |
79 | probability_final_action = torch.exp(log_prob_final_action)
80 | entropy = (probability_final_action * log_prob_final_action)
81 | normalized_action = torch.tanh(action)
82 | np_action = action.cpu().data.numpy().flatten()
83 | if np.isnan(np_action[0]):
84 | ipdb.set_trace()
85 | return normalized_action, log_prob, action , mean, std
86 |
87 | class PlanarFlow(nn.Module):
88 | def __init__(self, D):
89 | super().__init__()
90 | self.D = D
91 |
92 | def forward(self, z, lamda):
93 | '''
94 | z - latents from prev layer
95 | lambda - Flow parameters (b, w, u)
96 | b - scalar
97 | w - vector
98 | u - vector
99 | '''
100 | b = lamda[:, :1]
101 | w, u = lamda[:, 1:].chunk(2, dim=1)
102 |
103 | # Forward
104 | # f(z) = z + u tanh(w^T z + b)
105 | transf = torch.tanh(
106 | z.unsqueeze(1).bmm(w.unsqueeze(2))[:, 0] + b
107 | )
108 | f_z = z + u * transf
109 |
110 | # Inverse
111 | # psi_z = tanh' (w^T z + b) w
112 | psi_z = (1 - transf ** 2) * w
113 | log_abs_det_jacobian = torch.log(
114 | (1 + psi_z.unsqueeze(1).bmm(u.unsqueeze(2))).abs()
115 | )
116 |
117 | return f_z, log_abs_det_jacobian
118 |
119 | class NormalizingFlow(nn.Module):
120 | def __init__(self, K, D):
121 | super().__init__()
122 | self.flows = nn.ModuleList([PlanarFlow(D) for i in range(K)])
123 |
124 | def forward(self, z_k, flow_params):
125 | # ladj -> log abs det jacobian
126 | sum_ladj = 0
127 | for i, flow in enumerate(self.flows):
128 | z_k, ladj_k = flow(z_k, flow_params[i])
129 | sum_ladj += ladj_k
130 |
131 | return z_k, sum_ladj
132 |
133 | class Planar(nn.Module):
134 | """
135 | PyTorch implementation of planar flows as presented in "Variational Inference with Normalizing Flows"
136 | by Danilo Jimenez Rezende, Shakir Mohamed. Model assumes amortized flow parameters.
137 | """
138 |
139 | def __init__(self):
140 |
141 | super(Planar, self).__init__()
142 |
143 | self.h = nn.Tanh()
144 | self.softplus = nn.Softplus()
145 |
146 | def der_h(self, x):
147 | """ Derivative of tanh """
148 |
149 | return 1 - self.h(x) ** 2
150 |
151 | def forward(self, zk, u, w, b):
152 | """
153 | Forward pass. Assumes amortized u, w and b. Conditions on diagonals of u and w for invertibility
154 | will be be satisfied inside this function. Computes the following transformation:
155 | z' = z + u h( w^T z + b)
156 | or actually
157 | z'^T = z^T + h(z^T w + b)u^T
158 | Assumes the following input shapes:
159 | shape u = (batch_size, z_size, 1)
160 | shape w = (batch_size, 1, z_size)
161 | shape b = (batch_size, 1, 1)
162 | shape z = (batch_size, z_size).
163 | """
164 |
165 | zk = zk.unsqueeze(2)
166 |
167 | # reparameterize u such that the flow becomes invertible (see appendix paper)
168 | uw = torch.bmm(w, u)
169 | m_uw = -1. + self.softplus(uw)
170 | w_norm_sq = torch.sum(w ** 2, dim=2, keepdim=True)
171 | u_hat = u + ((m_uw - uw) * w.transpose(2, 1) / w_norm_sq)
172 |
173 | # compute flow with u_hat
174 | wzb = torch.bmm(w, zk) + b
175 | z = zk + u_hat * self.h(wzb)
176 | z = z.squeeze(2)
177 |
178 | # compute logdetJ
179 | psi = w * self.der_h(wzb)
180 | log_det_jacobian = torch.log(torch.abs(1 + torch.bmm(psi, u_hat)))
181 | log_det_jacobian = log_det_jacobian.squeeze(2).squeeze(1)
182 |
183 | return z, log_det_jacobian
184 |
185 |
186 | # All code below this line is taken from
187 | # https://github.com/kamenbliznashki/normalizing_flows/blob/master/maf.py
188 |
189 | class FlowSequential(nn.Sequential):
190 | """ Container for layers of a normalizing flow """
191 | def forward(self, x, y):
192 | sum_log_abs_det_jacobians = 0
193 | for module in self:
194 | x, log_abs_det_jacobian = module(x, y)
195 | sum_log_abs_det_jacobians = sum_log_abs_det_jacobians + log_abs_det_jacobian
196 | return x, sum_log_abs_det_jacobians
197 |
198 | def inverse(self, u, y):
199 | sum_log_abs_det_jacobians = 0
200 | for module in reversed(self):
201 | u, log_abs_det_jacobian = module.inverse(u, y)
202 | sum_log_abs_det_jacobians = sum_log_abs_det_jacobians + log_abs_det_jacobian
203 | return u, sum_log_abs_det_jacobians
204 |
205 | # --------------------
206 | # Models
207 | # --------------------
208 |
209 | class MADE(nn.Module):
210 | def __init__(self, input_size, hidden_size, n_hidden, cond_label_size=None, activation='relu', input_order='sequential', input_degrees=None):
211 | """
212 | Args:
213 | input_size -- scalar; dim of inputs
214 | hidden_size -- scalar; dim of hidden layers
215 | n_hidden -- scalar; number of hidden layers
216 | activation -- str; activation function to use
217 | input_order -- str or tensor; variable order for creating the autoregressive masks (sequential|random)
218 | or the order flipped from the previous layer in a stack of mades
219 | conditional -- bool; whether model is conditional
220 | """
221 | super().__init__()
222 | # base distribution for calculation of log prob under the model
223 | self.register_buffer('base_dist_mean', torch.zeros(input_size))
224 | self.register_buffer('base_dist_var', torch.ones(input_size))
225 |
226 | # create masks
227 | masks, self.input_degrees = create_masks(input_size, hidden_size, n_hidden, input_order, input_degrees)
228 |
229 | # setup activation
230 | if activation == 'relu':
231 | activation_fn = nn.ReLU()
232 | elif activation == 'tanh':
233 | activation_fn = nn.Tanh()
234 | else:
235 | raise ValueError('Check activation function.')
236 |
237 | # construct model
238 | self.net_input = MaskedLinear(input_size, hidden_size, masks[0], cond_label_size)
239 | self.net = []
240 | for m in masks[1:-1]:
241 | self.net += [activation_fn, MaskedLinear(hidden_size, hidden_size, m)]
242 | self.net += [activation_fn, MaskedLinear(hidden_size, 2 * input_size, masks[-1].repeat(2,1))]
243 | self.net = nn.Sequential(*self.net)
244 |
245 | @property
246 | def base_dist(self):
247 | return D.Normal(self.base_dist_mean, self.base_dist_var)
248 |
249 | def forward(self, x, y=None):
250 | # MAF eq 4 -- return mean and log std
251 | m, loga = self.net(self.net_input(x, y)).chunk(chunks=2, dim=1)
252 | u = (x - m) * torch.exp(-loga)
253 | # MAF eq 5
254 | log_abs_det_jacobian = - loga
255 | return u, log_abs_det_jacobian
256 |
257 | def inverse(self, u, y=None, sum_log_abs_det_jacobians=None):
258 | # MAF eq 3
259 | D = u.shape[1]
260 | x = torch.zeros_like(u)
261 | # run through reverse model
262 | for i in self.input_degrees:
263 | m, loga = self.net(self.net_input(x, y)).chunk(chunks=2, dim=1)
264 | x[:,i] = u[:,i] * torch.exp(loga[:,i]) + m[:,i]
265 | log_abs_det_jacobian = -loga
266 | return x, log_abs_det_jacobian
267 |
268 | def log_prob(self, x, y=None):
269 | u, log_abs_det_jacobian = self.forward(x, y)
270 | return torch.sum(self.base_dist.log_prob(u) + log_abs_det_jacobian, dim=1)
271 |
272 |
273 | class MADEMOG(nn.Module):
274 | """ Mixture of Gaussians MADE """
275 | def __init__(self, n_components, input_size, hidden_size, n_hidden, cond_label_size=None, activation='relu', input_order='sequential', input_degrees=None):
276 | """
277 | Args:
278 | n_components -- scalar; number of gaussian components in the mixture
279 | input_size -- scalar; dim of inputs
280 | hidden_size -- scalar; dim of hidden layers
281 | n_hidden -- scalar; number of hidden layers
282 | activation -- str; activation function to use
283 | input_order -- str or tensor; variable order for creating the autoregressive masks (sequential|random)
284 | or the order flipped from the previous layer in a stack of mades
285 | conditional -- bool; whether model is conditional
286 | """
287 | super().__init__()
288 | self.n_components = n_components
289 |
290 | # base distribution for calculation of log prob under the model
291 | self.register_buffer('base_dist_mean', torch.zeros(input_size))
292 | self.register_buffer('base_dist_var', torch.ones(input_size))
293 |
294 | # create masks
295 | masks, self.input_degrees = create_masks(input_size, hidden_size, n_hidden, input_order, input_degrees)
296 |
297 | # setup activation
298 | if activation == 'relu':
299 | activation_fn = nn.ReLU()
300 | elif activation == 'tanh':
301 | activation_fn = nn.Tanh()
302 | else:
303 | raise ValueError('Check activation function.')
304 |
305 | # construct model
306 | self.net_input = MaskedLinear(input_size, hidden_size, masks[0], cond_label_size)
307 | self.net = []
308 | for m in masks[1:-1]:
309 | self.net += [activation_fn, MaskedLinear(hidden_size, hidden_size, m)]
310 | self.net += [activation_fn, MaskedLinear(hidden_size, n_components * 3 * input_size, masks[-1].repeat(n_components * 3,1))]
311 | self.net = nn.Sequential(*self.net)
312 |
313 | @property
314 | def base_dist(self):
315 | return D.Normal(self.base_dist_mean, self.base_dist_var)
316 |
317 | def forward(self, x, y=None):
318 | # shapes
319 | N, L = x.shape
320 | C = self.n_components
321 | # MAF eq 2 -- parameters of Gaussians - mean, logsigma, log unnormalized cluster probabilities
322 | m, loga, logr = self.net(self.net_input(x, y)).view(N, C, 3 * L).chunk(chunks=3, dim=-1) # out 3 x (N, C, L)
323 | # MAF eq 4
324 | x = x.repeat(1, C).view(N, C, L) # out (N, C, L)
325 | u = (x - m) * torch.exp(-loga) # out (N, C, L)
326 | # MAF eq 5
327 | log_abs_det_jacobian = - loga # out (N, C, L)
328 | # normalize cluster responsibilities
329 | self.logr = logr - logr.logsumexp(1, keepdim=True) # out (N, C, L)
330 | return u, log_abs_det_jacobian
331 |
332 | def inverse(self, u, y=None, sum_log_abs_det_jacobians=None):
333 | # shapes
334 | N, C, L = u.shape
335 | # init output
336 | x = torch.zeros(N, L).to(u.device)
337 | # MAF eq 3
338 | # run through reverse model along each L
339 | for i in self.input_degrees:
340 | m, loga, logr = self.net(self.net_input(x, y)).view(N, C, 3 * L).chunk(chunks=3, dim=-1) # out 3 x (N, C, L)
341 | # normalize cluster responsibilities and sample cluster assignments from a categorical dist
342 | logr = logr - logr.logsumexp(1, keepdim=True) # out (N, C, L)
343 | z = D.Categorical(logits=logr[:,:,i]).sample().unsqueeze(-1) # out (N, 1)
344 | u_z = torch.gather(u[:,:,i], 1, z).squeeze() # out (N, 1)
345 | m_z = torch.gather(m[:,:,i], 1, z).squeeze() # out (N, 1)
346 | loga_z = torch.gather(loga[:,:,i], 1, z).squeeze()
347 | x[:,i] = u_z * torch.exp(loga_z) + m_z
348 | log_abs_det_jacobian = - loga
349 | return x, log_abs_det_jacobian
350 |
351 | def log_prob(self, x, y=None):
352 | u, log_abs_det_jacobian = self.forward(x, y) # u = (N,C,L); log_abs_det_jacobian = (N,C,L)
353 | # marginalize cluster probs
354 | log_probs = torch.logsumexp(self.logr + self.base_dist.log_prob(u) + log_abs_det_jacobian, dim=1) # sum over C; out (N, L)
355 | return log_probs.sum(1) # sum over L; out (N,)
356 |
357 |
358 | class MAF(nn.Module):
359 | def __init__(self, n_blocks, state_size, input_size, hidden_size, n_hidden,
360 | cond_label_size=None, activation='relu',
361 | input_order='sequential', batch_norm=True):
362 | super().__init__()
363 | # base distribution for calculation of log prob under the model
364 | self.register_buffer('base_dist_mean', torch.zeros(input_size))
365 | self.register_buffer('base_dist_var', torch.ones(input_size))
366 | self.linear1 = nn.Linear(state_size, input_size)
367 |
368 | # construct model
369 | modules = []
370 | self.input_degrees = None
371 | for i in range(n_blocks):
372 | modules += [MADE(input_size, hidden_size, n_hidden,
373 | cond_label_size, activation, input_order,
374 | self.input_degrees)]
375 | self.input_degrees = modules[-1].input_degrees.flip(0)
376 | modules += batch_norm * [BatchNorm(input_size)]
377 |
378 | self.net = FlowSequential(*modules)
379 |
380 | @property
381 | def base_dist(self):
382 | return D.Normal(self.base_dist_mean, self.base_dist_var)
383 |
384 | def forward(self, x, y=None):
385 | ''' Projecting the State to the same dim as actions '''
386 | action_proj = F.relu(self.linear1(x))
387 | # action_proj = action_proj.view(1,-1)
388 | if action_proj.size()[0] == 1 and len(action_proj.size()) > 2:
389 | action, sum_log_abs_det_jacobians = self.net(action_proj[0], y)
390 | else:
391 | action, sum_log_abs_det_jacobians = self.net(action_proj, y)
392 | log_prob = torch.sum(self.base_dist.log_prob(action) + sum_log_abs_det_jacobians, dim=1)
393 | normalized_action = torch.tanh(action)
394 | # TODO: Find the mean and log std deviation of a Normalizing Flow
395 | return normalized_action, log_prob, action , action, 0
396 |
397 | def inverse(self, u, y=None):
398 | action_proj = F.relu(self.linear1(u))
399 | action, sum_log_abs_det_jacobians = self.net.inverse(action_proj, y)
400 | log_prob = torch.sum(self.base_dist.log_prob(action) + sum_log_abs_det_jacobians, dim=1)
401 | normalized_action = torch.tanh(action)
402 | return normalized_action, log_prob, action , action, 0
403 | # return self.net.inverse(action_proj, y)
404 |
405 | def log_prob(self, x, y=None):
406 | u, sum_log_abs_det_jacobians = self.forward(x, y)
407 | return torch.sum(self.base_dist.log_prob(u) + sum_log_abs_det_jacobians, dim=1)
408 |
409 | class MAFMOG(nn.Module):
410 | """ MAF on mixture of gaussian MADE """
411 | def __init__(self, n_blocks, n_components, input_size, hidden_size, n_hidden, cond_label_size=None, activation='relu',
412 | input_order='sequential', batch_norm=True):
413 | super().__init__()
414 | # base distribution for calculation of log prob under the model
415 | self.register_buffer('base_dist_mean', torch.zeros(input_size))
416 | self.register_buffer('base_dist_var', torch.ones(input_size))
417 |
418 | self.maf = MAF(n_blocks, input_size, input_size, hidden_size, n_hidden, cond_label_size, activation, input_order, batch_norm)  # MAF's signature now takes state_size first; pass input_size here so the projection keeps the flow dimension
419 | # get reversed input order from the last layer (note: in the maf model, input_degrees are already flipped in the for-loop of the constructor)
420 | input_degrees = self.maf.input_degrees#.flip(0)
421 | self.mademog = MADEMOG(n_components, input_size, hidden_size, n_hidden, cond_label_size, activation, input_order, input_degrees)
422 |
423 | @property
424 | def base_dist(self):
425 | return D.Normal(self.base_dist_mean, self.base_dist_var)
426 |
427 | def forward(self, x, y=None):
428 | u, maf_log_abs_dets = self.maf(x, y)
429 | u, made_log_abs_dets = self.mademog(u, y)
430 | sum_log_abs_det_jacobians = maf_log_abs_dets.unsqueeze(1) + made_log_abs_dets
431 | return u, sum_log_abs_det_jacobians
432 |
433 | def inverse(self, u, y=None):
434 | x, made_log_abs_dets = self.mademog.inverse(u, y)
435 | x, maf_log_abs_dets = self.maf.inverse(x, y)
436 | sum_log_abs_det_jacobians = maf_log_abs_dets.unsqueeze(1) + made_log_abs_dets
437 | return x, sum_log_abs_det_jacobians
438 |
439 | def log_prob(self, x, y=None):
440 | u, log_abs_det_jacobian = self.forward(x, y) # u = (N,C,L); log_abs_det_jacobian = (N,C,L)
441 | # marginalize cluster probs
442 | log_probs = torch.logsumexp(self.mademog.logr + self.base_dist.log_prob(u) + log_abs_det_jacobian, dim=1) # out (N, L)
443 | return log_probs.sum(1) # out (N,)
444 |
445 |
446 | class RealNVP(nn.Module):
447 | def __init__(self, n_blocks, input_size, hidden_size, n_hidden, cond_label_size=None, batch_norm=True):
448 | super().__init__()
449 |
450 | # base distribution for calculation of log prob under the model
451 | self.register_buffer('base_dist_mean', torch.zeros(input_size))
452 | self.register_buffer('base_dist_var', torch.ones(input_size))
453 |
454 | # construct model
455 | modules = []
456 | mask = torch.arange(input_size).float() % 2
457 | for i in range(n_blocks):
458 | modules += [LinearMaskedCoupling(input_size, hidden_size, n_hidden, mask, cond_label_size)]
459 | mask = 1 - mask
460 | modules += batch_norm * [BatchNorm(input_size)]
461 |
462 | self.net = FlowSequential(*modules)
463 |
464 | @property
465 | def base_dist(self):
466 | return D.Normal(self.base_dist_mean, self.base_dist_var)
467 |
468 | def forward(self, x, y=None):
469 | return self.net(x, y)
470 |
471 | def inverse(self, u, y=None):
472 | return self.net.inverse(u, y)
473 |
474 | def log_prob(self, x, y=None):
475 | u, sum_log_abs_det_jacobians = self.forward(x, y)
476 | return torch.sum(self.base_dist.log_prob(u) + sum_log_abs_det_jacobians, dim=1)
477 |
478 |
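# A minimal, illustrative usage sketch (shapes and hyperparameters are assumed;
# this is not part of the original training code): evaluate densities and draw
# samples from an untrained RealNVP over 2-D inputs.
if __name__ == "__main__":
    flow = RealNVP(n_blocks=4, input_size=2, hidden_size=64, n_hidden=1)
    x = torch.randn(16, 2)                # a batch of 2-D points
    log_px = flow.log_prob(x)             # log-density under the flow, shape (16,)
    u = flow.base_dist.sample((16,))      # latents from the standard normal base
    samples, _ = flow.inverse(u)          # map latents back to data space
    print(log_px.shape, samples.shape)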
--------------------------------------------------------------------------------
/pytorch-soft-actor-critic/graphics.py:
--------------------------------------------------------------------------------
1 | # graphics.py
2 | """Simple object oriented graphics library
3 |
4 | The library is designed to make it very easy for novice programmers to
5 | experiment with computer graphics in an object oriented fashion. It is
6 | written by John Zelle for use with the book "Python Programming: An
7 | Introduction to Computer Science" (Franklin, Beedle & Associates).
8 |
9 | LICENSE: This is open-source software released under the terms of the
10 | GPL (http://www.gnu.org/licenses/gpl.html).
11 |
12 | PLATFORMS: The package is a wrapper around Tkinter and should run on
13 | any platform where Tkinter is available.
14 |
15 | INSTALLATION: Put this file somewhere where Python can see it.
16 |
17 | OVERVIEW: There are two kinds of objects in the library. The GraphWin
18 | class implements a window where drawing can be done and various
19 | GraphicsObjects are provided that can be drawn into a GraphWin. As a
20 | simple example, here is a complete program to draw a circle of radius
21 | 10 centered in a 100x100 window:
22 |
23 | --------------------------------------------------------------------
24 | from graphics import *
25 |
26 | def main():
27 | win = GraphWin("My Circle", 100, 100)
28 | c = Circle(Point(50,50), 10)
29 | c.draw(win)
30 | win.getMouse() # Pause to view result
31 | win.close() # Close window when done
32 |
33 | main()
34 | --------------------------------------------------------------------
35 | GraphWin objects support coordinate transformation through the
36 | setCoords method and mouse and keyboard interaction methods.
37 |
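For example, win.setCoords(0, 0, 10, 10) makes (0,0) the lower-left corner
and (10,10) the upper-right corner of the window, so Point(5,5) then refers
to the center of the drawing area regardless of the window's size in pixels.
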
38 | The library provides the following graphical objects:
39 | Point
40 | Line
41 | Circle
42 | Oval
43 | Rectangle
44 | Polygon
45 | Text
46 | Entry (for text-based input)
47 | Image
48 |
49 | Various attributes of graphical objects can be set such as
50 | outline-color, fill-color and line-width. Graphical objects also
51 | support moving and hiding for animation effects.
52 |
53 | The library also provides a very simple class for pixel-based image
54 | manipulation, Pixmap. A pixmap can be loaded from a file and displayed
55 | using an Image object. Both getPixel and setPixel methods are provided
56 | for manipulating the image.
57 |
58 | DOCUMENTATION: For complete documentation, see Chapter 4 of "Python
59 | Programming: An Introduction to Computer Science" by John Zelle,
60 | published by Franklin, Beedle & Associates. Also see
61 | http://mcsp.wartburg.edu/zelle/python for a quick reference"""
62 |
63 | __version__ = "5.0"
64 |
65 | # Version 5 8/26/2016
66 | # * update at bottom to fix MacOS issue causing askopenfile() to hang
67 | # * update takes an optional parameter specifying update rate
68 | # * Entry objects get focus when drawn
69 | # * __repr__ for all objects
70 | # * fixed offset problem in window, made canvas borderless
71 |
72 | # Version 4.3 4/25/2014
73 | # * Fixed Image getPixel to work with Python 3.4, TK 8.6 (tuple type handling)
74 | # * Added interactive keyboard input (getKey and checkKey) to GraphWin
75 | # * Modified setCoords to cause redraw of current objects, thus
76 | # changing the view. This supports scrolling around via setCoords.
77 | #
78 | # Version 4.2 5/26/2011
79 | # * Modified Image to allow multiple undraws like other GraphicsObjects
80 | # Version 4.1 12/29/2009
81 | # * Merged Pixmap and Image class. Old Pixmap removed, use Image.
82 | # Version 4.0.1 10/08/2009
83 | # * Modified the autoflush on GraphWin to default to True
84 | # * Autoflush check on close, setBackground
85 | # * Fixed getMouse to flush pending clicks at entry
86 | # Version 4.0 08/2009
87 | # * Reverted to non-threaded version. The advantages (robustness,
88 | # efficiency, ability to use with other Tk code, etc.) outweigh
89 | # the disadvantage that interactive use with IDLE is slightly more
90 | # cumbersome.
91 | # * Modified to run in either Python 2.x or 3.x (same file).
92 | # * Added Image.getPixmap()
93 | # * Added update() -- stand alone function to cause any pending
94 | # graphics changes to display.
95 | #
96 | # Version 3.4 10/16/07
97 | # Fixed GraphicsError to avoid "exploded" error messages.
98 | # Version 3.3 8/8/06
99 | # Added checkMouse method to GraphWin
100 | # Version 3.2.3
101 | # Fixed error in Polygon init spotted by Andrew Harrington
102 | # Fixed improper threading in Image constructor
103 | # Version 3.2.2 5/30/05
104 | # Cleaned up handling of exceptions in Tk thread. The graphics package
105 | # now raises an exception if attempt is made to communicate with
106 | # a dead Tk thread.
107 | # Version 3.2.1 5/22/05
108 | # Added shutdown function for tk thread to eliminate race-condition
109 | # error "chatter" when main thread terminates
110 | # Renamed various private globals with _
111 | # Version 3.2 5/4/05
112 | # Added Pixmap object for simple image manipulation.
113 | # Version 3.1 4/13/05
114 | # Improved the Tk thread communication so that most Tk calls
115 | # do not have to wait for synchronization with the Tk thread.
116 | # (see _tkCall and _tkExec)
117 | # Version 3.0 12/30/04
118 | # Implemented Tk event loop in separate thread. Should now work
119 | # interactively with IDLE. Undocumented autoflush feature is
120 | # no longer necessary. Its default is now False (off). It may
121 | # be removed in a future version.
122 | # Better handling of errors regarding operations on windows that
123 | # have been closed.
124 | # Addition of an isClosed method to GraphWindow class.
125 |
126 | # Version 2.2 8/26/04
127 | # Fixed cloning bug reported by Joseph Oldham.
128 | # Now implements deep copy of config info.
129 | # Version 2.1 1/15/04
130 | # Added autoflush option to GraphWin. When True (default) updates on
131 | # the window are done after each action. This makes some graphics
132 | # intensive programs sluggish. Turning off autoflush causes updates
133 | # to happen during idle periods or when flush is called.
134 | # Version 2.0
135 | # Updated Documentation
136 | # Made Polygon accept a list of Points in constructor
137 | # Made all drawing functions call TK update for easier animations
138 | # and to make the overall package work better with
139 | # Python 2.3 and IDLE 1.0 under Windows (still some issues).
140 | # Removed vestigial turtle graphics.
141 | # Added ability to configure font for Entry objects (analogous to Text)
142 | # Added setTextColor for Text as an alias of setFill
143 | # Changed to class-style exceptions
144 | # Fixed cloning of Text objects
145 |
146 | # Version 1.6
147 | # Fixed Entry so StringVar uses _root as master, solves weird
148 | # interaction with shell in Idle
149 | # Fixed bug in setCoords. X and Y coordinates can increase in
150 | # "non-intuitive" direction.
151 | # Tweaked wm_protocol so window is not resizable and kill box closes.
152 |
153 | # Version 1.5
154 | # Fixed bug in Entry. Can now define entry before creating a
155 | # GraphWin. All GraphWins are now toplevel windows and share
156 | # a fixed root (called _root).
157 |
158 | # Version 1.4
159 | # Fixed Garbage collection of Tkinter images bug.
160 | # Added ability to set text attributes.
161 | # Added Entry boxes.
162 |
163 | import time, os, sys
164 |
165 | try: # import as appropriate for 2.x vs. 3.x
166 | import tkinter as tk
167 | except:
168 | import Tkinter as tk
169 |
170 |
171 | ##########################################################################
172 | # Module Exceptions
173 |
174 | class GraphicsError(Exception):
175 | """Generic error class for graphics module exceptions."""
176 | pass
177 |
178 | OBJ_ALREADY_DRAWN = "Object currently drawn"
179 | UNSUPPORTED_METHOD = "Object doesn't support operation"
180 | BAD_OPTION = "Illegal option value"
181 |
182 | ##########################################################################
183 | # global variables and functions
184 |
185 | _root = tk.Tk()
186 | _root.withdraw()
187 |
188 | _update_lasttime = time.time()
189 |
190 | def update(rate=None):
191 | global _update_lasttime
192 | if rate:
193 | now = time.time()
194 | pauseLength = 1/rate-(now-_update_lasttime)
195 | if pauseLength > 0:
196 | time.sleep(pauseLength)
197 | _update_lasttime = now + pauseLength
198 | else:
199 | _update_lasttime = now
200 |
201 | _root.update()
202 |
203 | ############################################################################
204 | # Graphics classes start here
205 |
206 | class GraphWin(tk.Canvas):
207 |
208 | """A GraphWin is a toplevel window for displaying graphics."""
209 |
210 | def __init__(self, title="Graphics Window",
211 | width=200, height=200, autoflush=True):
212 | assert type(title) == type(""), "Title must be a string"
213 | master = tk.Toplevel(_root)
214 | master.protocol("WM_DELETE_WINDOW", self.close)
215 | tk.Canvas.__init__(self, master, width=width, height=height,
216 | highlightthickness=0, bd=0)
217 | self.master.title(title)
218 | self.pack()
219 | master.resizable(0,0)
220 | self.foreground = "black"
221 | self.items = []
222 | self.mouseX = None
223 | self.mouseY = None
224 | self.bind("<Button-1>", self._onClick)
225 | self.bind_all("<Key>", self._onKey)
226 | self.height = int(height)
227 | self.width = int(width)
228 | self.autoflush = autoflush
229 | self._mouseCallback = None
230 | self.trans = None
231 | self.closed = False
232 | master.lift()
233 | self.lastKey = ""
234 | if autoflush: _root.update()
235 |
236 | def __repr__(self):
237 | if self.isClosed():
238 | return ""
239 | else:
240 | return "GraphWin('{}', {}, {})".format(self.master.title(),
241 | self.getWidth(),
242 | self.getHeight())
243 |
244 | def __str__(self):
245 | return repr(self)
246 |
247 | def __checkOpen(self):
248 | if self.closed:
249 | raise GraphicsError("window is closed")
250 |
251 | def _onKey(self, evnt):
252 | self.lastKey = evnt.keysym
253 |
254 |
255 | def setBackground(self, color):
256 | """Set background color of the window"""
257 | self.__checkOpen()
258 | self.config(bg=color)
259 | self.__autoflush()
260 |
261 | def setCoords(self, x1, y1, x2, y2):
262 | """Set coordinates of window to run from (x1,y1) in the
263 | lower-left corner to (x2,y2) in the upper-right corner."""
264 | self.trans = Transform(self.width, self.height, x1, y1, x2, y2)
265 | self.redraw()
266 |
267 | def close(self):
268 | """Close the window"""
269 |
270 | if self.closed: return
271 | self.closed = True
272 | self.master.destroy()
273 | self.__autoflush()
274 |
275 |
276 | def isClosed(self):
277 | return self.closed
278 |
279 |
280 | def isOpen(self):
281 | return not self.closed
282 |
283 |
284 | def __autoflush(self):
285 | if self.autoflush:
286 | _root.update()
287 |
288 |
289 | def plot(self, x, y, color="black"):
290 | """Set pixel (x,y) to the given color"""
291 | self.__checkOpen()
292 | xs,ys = self.toScreen(x,y)
293 | self.create_line(xs,ys,xs+1,ys, fill=color)
294 | self.__autoflush()
295 |
296 | def plotPixel(self, x, y, color="black"):
297 | """Set pixel raw (independent of window coordinates) pixel
298 | (x,y) to color"""
299 | self.__checkOpen()
300 | self.create_line(x,y,x+1,y, fill=color)
301 | self.__autoflush()
302 |
303 | def flush(self):
304 | """Update drawing to the window"""
305 | self.__checkOpen()
306 | self.update_idletasks()
307 |
308 | def getMouse(self):
309 | """Wait for mouse click and return Point object representing
310 | the click"""
311 | self.update() # flush any prior clicks
312 | self.mouseX = None
313 | self.mouseY = None
314 | while self.mouseX == None or self.mouseY == None:
315 | self.update()
316 | if self.isClosed(): raise GraphicsError("getMouse in closed window")
317 | time.sleep(.1) # give up thread
318 | x,y = self.toWorld(self.mouseX, self.mouseY)
319 | self.mouseX = None
320 | self.mouseY = None
321 | return Point(x,y)
322 |
323 | def checkMouse(self):
324 | """Return last mouse click or None if mouse has
325 | not been clicked since last call"""
326 | if self.isClosed():
327 | raise GraphicsError("checkMouse in closed window")
328 | self.update()
329 | if self.mouseX != None and self.mouseY != None:
330 | x,y = self.toWorld(self.mouseX, self.mouseY)
331 | self.mouseX = None
332 | self.mouseY = None
333 | return Point(x,y)
334 | else:
335 | return None
336 |
337 | def getKey(self):
338 | """Wait for user to press a key and return it as a string."""
339 | self.lastKey = ""
340 | while self.lastKey == "":
341 | self.update()
342 | if self.isClosed(): raise GraphicsError("getKey in closed window")
343 | time.sleep(.1) # give up thread
344 |
345 | key = self.lastKey
346 | self.lastKey = ""
347 | return key
348 |
349 | def checkKey(self):
350 | """Return last key pressed or None if no key pressed since last call"""
351 | if self.isClosed():
352 | raise GraphicsError("checkKey in closed window")
353 | self.update()
354 | key = self.lastKey
355 | self.lastKey = ""
356 | return key
357 |
358 | def getHeight(self):
359 | """Return the height of the window"""
360 | return self.height
361 |
362 | def getWidth(self):
363 | """Return the width of the window"""
364 | return self.width
365 |
366 | def toScreen(self, x, y):
367 | trans = self.trans
368 | if trans:
369 | return self.trans.screen(x,y)
370 | else:
371 | return x,y
372 |
373 | def toWorld(self, x, y):
374 | trans = self.trans
375 | if trans:
376 | return self.trans.world(x,y)
377 | else:
378 | return x,y
379 |
380 | def setMouseHandler(self, func):
381 | self._mouseCallback = func
382 |
383 | def _onClick(self, e):
384 | self.mouseX = e.x
385 | self.mouseY = e.y
386 | if self._mouseCallback:
387 | self._mouseCallback(Point(e.x, e.y))
388 |
389 | def addItem(self, item):
390 | self.items.append(item)
391 |
392 | def delItem(self, item):
393 | self.items.remove(item)
394 |
395 | def redraw(self):
396 | for item in self.items[:]:
397 | item.undraw()
398 | item.draw(self)
399 | self.update()
400 |
401 |
402 | class Transform:
403 |
404 | """Internal class for 2-D coordinate transformations"""
405 |
406 | def __init__(self, w, h, xlow, ylow, xhigh, yhigh):
407 | # w, h are width and height of window
408 | # (xlow,ylow) coordinates of lower-left [raw (0,h-1)]
409 | # (xhigh,yhigh) coordinates of upper-right [raw (w-1,0)]
410 | xspan = (xhigh-xlow)
411 | yspan = (yhigh-ylow)
412 | self.xbase = xlow
413 | self.ybase = yhigh
414 | self.xscale = xspan/float(w-1)
415 | self.yscale = yspan/float(h-1)
416 |
417 | def screen(self,x,y):
418 | # Returns x,y in screen (actually window) coordinates
419 | xs = (x-self.xbase) / self.xscale
420 | ys = (self.ybase-y) / self.yscale
421 | return int(xs+0.5),int(ys+0.5)
422 |
423 | def world(self,xs,ys):
424 | # Returns xs,ys in world coordinates
425 | x = xs*self.xscale + self.xbase
426 | y = self.ybase - ys*self.yscale
427 | return x,y
428 |
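# Worked example (illustrative): a 200x200 window after setCoords(0, 0, 10, 10)
# gets xscale = yscale = 10/199, so the world point (5, 5) maps to screen pixel
# (100, 100) via screen(), and screen pixel (0, 0) maps back to the world point
# (0.0, 10.0) via world().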
429 |
430 | # Default values for various item configuration options. Only a subset of
431 | # keys may be present in the configuration dictionary for a given item
432 | DEFAULT_CONFIG = {"fill":"",
433 | "outline":"black",
434 | "width":"1",
435 | "arrow":"none",
436 | "text":"",
437 | "justify":"center",
438 | "font": ("helvetica", 12, "normal")}
439 |
440 | class GraphicsObject:
441 |
442 | """Generic base class for all of the drawable objects"""
443 | # A subclass of GraphicsObject should override _draw and
444 | # _move methods.
445 |
446 | def __init__(self, options):
447 | # options is a list of strings indicating which options are
448 | # legal for this object.
449 |
450 | # When an object is drawn, canvas is set to the GraphWin(canvas)
451 | # object where it is drawn and id is the TK identifier of the
452 | # drawn shape.
453 | self.canvas = None
454 | self.id = None
455 |
456 | # config is the dictionary of configuration options for the widget.
457 | config = {}
458 | for option in options:
459 | config[option] = DEFAULT_CONFIG[option]
460 | self.config = config
461 |
462 | def setFill(self, color):
463 | """Set interior color to color"""
464 | self._reconfig("fill", color)
465 |
466 | def setOutline(self, color):
467 | """Set outline color to color"""
468 | self._reconfig("outline", color)
469 |
470 | def setWidth(self, width):
471 | """Set line weight to width"""
472 | self._reconfig("width", width)
473 |
474 | def draw(self, graphwin):
475 |
476 | """Draw the object in graphwin, which should be a GraphWin
477 | object. A GraphicsObject may only be drawn into one
478 | window. Raises an error if attempt made to draw an object that
479 | is already visible."""
480 |
481 | if self.canvas and not self.canvas.isClosed(): raise GraphicsError(OBJ_ALREADY_DRAWN)
482 | if graphwin.isClosed(): raise GraphicsError("Can't draw to closed window")
483 | self.canvas = graphwin
484 | self.id = self._draw(graphwin, self.config)
485 | graphwin.addItem(self)
486 | if graphwin.autoflush:
487 | _root.update()
488 | return self
489 |
490 |
491 | def undraw(self):
492 |
493 | """Undraw the object (i.e. hide it). Returns silently if the
494 | object is not currently drawn."""
495 |
496 | if not self.canvas: return
497 | if not self.canvas.isClosed():
498 | self.canvas.delete(self.id)
499 | self.canvas.delItem(self)
500 | if self.canvas.autoflush:
501 | _root.update()
502 | self.canvas = None
503 | self.id = None
504 |
505 |
506 | def move(self, dx, dy):
507 |
508 | """move object dx units in x direction and dy units in y
509 | direction"""
510 |
511 | self._move(dx,dy)
512 | canvas = self.canvas
513 | if canvas and not canvas.isClosed():
514 | trans = canvas.trans
515 | if trans:
516 | x = dx/ trans.xscale
517 | y = -dy / trans.yscale
518 | else:
519 | x = dx
520 | y = dy
521 | self.canvas.move(self.id, x, y)
522 | if canvas.autoflush:
523 | _root.update()
524 |
525 | def _reconfig(self, option, setting):
526 | # Internal method for changing configuration of the object
527 | # Raises an error if the option does not exist in the config
528 | # dictionary for this object
529 | if option not in self.config:
530 | raise GraphicsError(UNSUPPORTED_METHOD)
531 | options = self.config
532 | options[option] = setting
533 | if self.canvas and not self.canvas.isClosed():
534 | self.canvas.itemconfig(self.id, options)
535 | if self.canvas.autoflush:
536 | _root.update()
537 |
538 |
539 | def _draw(self, canvas, options):
540 | """draws appropriate figure on canvas with options provided
541 | Returns Tk id of item drawn"""
542 | pass # must override in subclass
543 |
544 |
545 | def _move(self, dx, dy):
546 | """updates internal state of object to move it dx,dy units"""
547 | pass # must override in subclass
548 |
549 |
550 | class Point(GraphicsObject):
551 | def __init__(self, x, y):
552 | GraphicsObject.__init__(self, ["outline", "fill"])
553 | self.setFill = self.setOutline
554 | self.x = float(x)
555 | self.y = float(y)
556 |
557 | def __repr__(self):
558 | return "Point({}, {})".format(self.x, self.y)
559 |
560 | def _draw(self, canvas, options):
561 | x,y = canvas.toScreen(self.x,self.y)
562 | return canvas.create_rectangle(x,y,x+1,y+1,options)
563 |
564 | def _move(self, dx, dy):
565 | self.x = self.x + dx
566 | self.y = self.y + dy
567 |
568 | def clone(self):
569 | other = Point(self.x,self.y)
570 | other.config = self.config.copy()
571 | return other
572 |
573 | def getX(self): return self.x
574 | def getY(self): return self.y
575 |
576 | class _BBox(GraphicsObject):
577 | # Internal base class for objects represented by bounding box
578 | # (opposite corners) Line segment is a degenerate case.
579 |
580 | def __init__(self, p1, p2, options=["outline","width","fill"]):
581 | GraphicsObject.__init__(self, options)
582 | self.p1 = p1.clone()
583 | self.p2 = p2.clone()
584 |
585 | def _move(self, dx, dy):
586 | self.p1.x = self.p1.x + dx
587 | self.p1.y = self.p1.y + dy
588 | self.p2.x = self.p2.x + dx
589 | self.p2.y = self.p2.y + dy
590 |
591 | def getP1(self): return self.p1.clone()
592 |
593 | def getP2(self): return self.p2.clone()
594 |
595 | def getCenter(self):
596 | p1 = self.p1
597 | p2 = self.p2
598 | return Point((p1.x+p2.x)/2.0, (p1.y+p2.y)/2.0)
599 |
600 |
601 | class Rectangle(_BBox):
602 |
603 | def __init__(self, p1, p2):
604 | _BBox.__init__(self, p1, p2)
605 |
606 | def __repr__(self):
607 | return "Rectangle({}, {})".format(str(self.p1), str(self.p2))
608 |
609 | def _draw(self, canvas, options):
610 | p1 = self.p1
611 | p2 = self.p2
612 | x1,y1 = canvas.toScreen(p1.x,p1.y)
613 | x2,y2 = canvas.toScreen(p2.x,p2.y)
614 | return canvas.create_rectangle(x1,y1,x2,y2,options)
615 |
616 | def clone(self):
617 | other = Rectangle(self.p1, self.p2)
618 | other.config = self.config.copy()
619 | return other
620 |
621 |
622 | class Oval(_BBox):
623 |
624 | def __init__(self, p1, p2):
625 | _BBox.__init__(self, p1, p2)
626 |
627 | def __repr__(self):
628 | return "Oval({}, {})".format(str(self.p1), str(self.p2))
629 |
630 |
631 | def clone(self):
632 | other = Oval(self.p1, self.p2)
633 | other.config = self.config.copy()
634 | return other
635 |
636 | def _draw(self, canvas, options):
637 | p1 = self.p1
638 | p2 = self.p2
639 | x1,y1 = canvas.toScreen(p1.x,p1.y)
640 | x2,y2 = canvas.toScreen(p2.x,p2.y)
641 | return canvas.create_oval(x1,y1,x2,y2,options)
642 |
643 | class Circle(Oval):
644 |
645 | def __init__(self, center, radius):
646 | p1 = Point(center.x-radius, center.y-radius)
647 | p2 = Point(center.x+radius, center.y+radius)
648 | Oval.__init__(self, p1, p2)
649 | self.radius = radius
650 |
651 | def __repr__(self):
652 | return "Circle({}, {})".format(str(self.getCenter()), str(self.radius))
653 |
654 | def clone(self):
655 | other = Circle(self.getCenter(), self.radius)
656 | other.config = self.config.copy()
657 | return other
658 |
659 | def getRadius(self):
660 | return self.radius
661 |
662 |
663 | class Line(_BBox):
664 |
665 | def __init__(self, p1, p2):
666 | _BBox.__init__(self, p1, p2, ["arrow","fill","width"])
667 | self.setFill(DEFAULT_CONFIG['outline'])
668 | self.setOutline = self.setFill
669 |
670 | def __repr__(self):
671 | return "Line({}, {})".format(str(self.p1), str(self.p2))
672 |
673 | def clone(self):
674 | other = Line(self.p1, self.p2)
675 | other.config = self.config.copy()
676 | return other
677 |
678 | def _draw(self, canvas, options):
679 | p1 = self.p1
680 | p2 = self.p2
681 | x1,y1 = canvas.toScreen(p1.x,p1.y)
682 | x2,y2 = canvas.toScreen(p2.x,p2.y)
683 | return canvas.create_line(x1,y1,x2,y2,options)
684 |
685 | def setArrow(self, option):
686 | if not option in ["first","last","both","none"]:
687 | raise GraphicsError(BAD_OPTION)
688 | self._reconfig("arrow", option)
689 |
690 |
691 | class Polygon(GraphicsObject):
692 |
693 | def __init__(self, *points):
694 | # if points passed as a list, extract it
695 | if len(points) == 1 and type(points[0]) == type([]):
696 | points = points[0]
697 | self.points = list(map(Point.clone, points))
698 | GraphicsObject.__init__(self, ["outline", "width", "fill"])
699 |
700 | def __repr__(self):
701 | return "Polygon"+str(tuple(p for p in self.points))
702 |
703 | def clone(self):
704 | other = Polygon(*self.points)
705 | other.config = self.config.copy()
706 | return other
707 |
708 | def getPoints(self):
709 | return list(map(Point.clone, self.points))
710 |
711 | def _move(self, dx, dy):
712 | for p in self.points:
713 | p.move(dx,dy)
714 |
715 | def _draw(self, canvas, options):
716 | args = [canvas]
717 | for p in self.points:
718 | x,y = canvas.toScreen(p.x,p.y)
719 | args.append(x)
720 | args.append(y)
721 | args.append(options)
722 | return GraphWin.create_polygon(*args)
723 |
724 | class Text(GraphicsObject):
725 |
726 | def __init__(self, p, text):
727 | GraphicsObject.__init__(self, ["justify","fill","text","font"])
728 | self.setText(text)
729 | self.anchor = p.clone()
730 | self.setFill(DEFAULT_CONFIG['outline'])
731 | self.setOutline = self.setFill
732 |
733 | def __repr__(self):
734 | return "Text({}, '{}')".format(self.anchor, self.getText())
735 |
736 | def _draw(self, canvas, options):
737 | p = self.anchor
738 | x,y = canvas.toScreen(p.x,p.y)
739 | return canvas.create_text(x,y,options)
740 |
741 | def _move(self, dx, dy):
742 | self.anchor.move(dx,dy)
743 |
744 | def clone(self):
745 | other = Text(self.anchor, self.config['text'])
746 | other.config = self.config.copy()
747 | return other
748 |
749 | def setText(self,text):
750 | self._reconfig("text", text)
751 |
752 | def getText(self):
753 | return self.config["text"]
754 |
755 | def getAnchor(self):
756 | return self.anchor.clone()
757 |
758 | def setFace(self, face):
759 | if face in ['helvetica','arial','courier','times roman']:
760 | f,s,b = self.config['font']
761 | self._reconfig("font",(face,s,b))
762 | else:
763 | raise GraphicsError(BAD_OPTION)
764 |
765 | def setSize(self, size):
766 | if 5 <= size <= 36:
767 | f,s,b = self.config['font']
768 | self._reconfig("font", (f,size,b))
769 | else:
770 | raise GraphicsError(BAD_OPTION)
771 |
772 | def setStyle(self, style):
773 | if style in ['bold','normal','italic', 'bold italic']:
774 | f,s,b = self.config['font']
775 | self._reconfig("font", (f,s,style))
776 | else:
777 | raise GraphicsError(BAD_OPTION)
778 |
779 | def setTextColor(self, color):
780 | self.setFill(color)
781 |
782 |
783 | class Entry(GraphicsObject):
784 |
785 | def __init__(self, p, width):
786 | GraphicsObject.__init__(self, [])
787 | self.anchor = p.clone()
788 | #print self.anchor
789 | self.width = width
790 | self.text = tk.StringVar(_root)
791 | self.text.set("")
792 | self.fill = "gray"
793 | self.color = "black"
794 | self.font = DEFAULT_CONFIG['font']
795 | self.entry = None
796 |
797 | def __repr__(self):
798 | return "Entry({}, {})".format(self.anchor, self.width)
799 |
800 | def _draw(self, canvas, options):
801 | p = self.anchor
802 | x,y = canvas.toScreen(p.x,p.y)
803 | frm = tk.Frame(canvas.master)
804 | self.entry = tk.Entry(frm,
805 | width=self.width,
806 | textvariable=self.text,
807 | bg = self.fill,
808 | fg = self.color,
809 | font=self.font)
810 | self.entry.pack()
811 | #self.setFill(self.fill)
812 | self.entry.focus_set()
813 | return canvas.create_window(x,y,window=frm)
814 |
815 | def getText(self):
816 | return self.text.get()
817 |
818 | def _move(self, dx, dy):
819 | self.anchor.move(dx,dy)
820 |
821 | def getAnchor(self):
822 | return self.anchor.clone()
823 |
824 | def clone(self):
825 | other = Entry(self.anchor, self.width)
826 | other.config = self.config.copy()
827 | other.text = tk.StringVar()
828 | other.text.set(self.text.get())
829 | other.fill = self.fill
830 | return other
831 |
832 | def setText(self, t):
833 | self.text.set(t)
834 |
835 |
836 | def setFill(self, color):
837 | self.fill = color
838 | if self.entry:
839 | self.entry.config(bg=color)
840 |
841 |
842 | def _setFontComponent(self, which, value):
843 | font = list(self.font)
844 | font[which] = value
845 | self.font = tuple(font)
846 | if self.entry:
847 | self.entry.config(font=self.font)
848 |
849 |
850 | def setFace(self, face):
851 | if face in ['helvetica','arial','courier','times roman']:
852 | self._setFontComponent(0, face)
853 | else:
854 | raise GraphicsError(BAD_OPTION)
855 |
856 | def setSize(self, size):
857 | if 5 <= size <= 36:
858 | self._setFontComponent(1,size)
859 | else:
860 | raise GraphicsError(BAD_OPTION)
861 |
862 | def setStyle(self, style):
863 | if style in ['bold','normal','italic', 'bold italic']:
864 | self._setFontComponent(2,style)
865 | else:
866 | raise GraphicsError(BAD_OPTION)
867 |
868 | def setTextColor(self, color):
869 | self.color=color
870 | if self.entry:
871 | self.entry.config(fg=color)
872 |
873 |
874 | class Image(GraphicsObject):
875 |
876 | idCount = 0
877 | imageCache = {} # tk photoimages go here to avoid GC while drawn
878 |
879 | def __init__(self, p, *pixmap):
880 | GraphicsObject.__init__(self, [])
881 | self.anchor = p.clone()
882 | self.imageId = Image.idCount
883 | Image.idCount = Image.idCount + 1
884 | if len(pixmap) == 1: # file name provided
885 | self.img = tk.PhotoImage(file=pixmap[0], master=_root)
886 | else: # width and height provided
887 | width, height = pixmap
888 | self.img = tk.PhotoImage(master=_root, width=width, height=height)
889 |
890 | def __repr__(self):
891 | return "Image({}, {}, {})".format(self.anchor, self.getWidth(), self.getHeight())
892 |
893 | def _draw(self, canvas, options):
894 | p = self.anchor
895 | x,y = canvas.toScreen(p.x,p.y)
896 | self.imageCache[self.imageId] = self.img # save a reference
897 | return canvas.create_image(x,y,image=self.img)
898 |
899 | def _move(self, dx, dy):
900 | self.anchor.move(dx,dy)
901 |
902 | def undraw(self):
903 | try:
904 | del self.imageCache[self.imageId] # allow gc of tk photoimage
905 | except KeyError:
906 | pass
907 | GraphicsObject.undraw(self)
908 |
909 | def getAnchor(self):
910 | return self.anchor.clone()
911 |
912 | def clone(self):
913 | other = Image(Point(0,0), 0, 0)
914 | other.img = self.img.copy()
915 | other.anchor = self.anchor.clone()
916 | other.config = self.config.copy()
917 | return other
918 |
919 | def getWidth(self):
920 | """Returns the width of the image in pixels"""
921 | return self.img.width()
922 |
923 | def getHeight(self):
924 | """Returns the height of the image in pixels"""
925 | return self.img.height()
926 |
927 | def getPixel(self, x, y):
928 | """Returns a list [r,g,b] with the RGB color values for pixel (x,y)
929 | r,g,b are in range(256)
930 |
931 | """
932 |
933 | value = self.img.get(x,y)
934 | if type(value) == type(0):
935 | return [value, value, value]
936 | elif type(value) == type((0,0,0)):
937 | return list(value)
938 | else:
939 | return list(map(int, value.split()))
940 |
941 | def setPixel(self, x, y, color):
942 | """Sets pixel (x,y) to the given color
943 |
944 | """
945 | self.img.put("{" + color +"}", (x, y))
946 |
947 |
948 | def save(self, filename):
949 | """Saves the pixmap image to filename.
950 | The format of the saved image is determined from the filename extension.
951 |
952 | """
953 |
954 | path, name = os.path.split(filename)
955 | ext = name.split(".")[-1]
956 | self.img.write( filename, format=ext)
957 |
958 |
959 | def color_rgb(r,g,b):
960 | """r,g,b are intensities of red, green, and blue in range(256)
961 | Returns color specifier string for the resulting color"""
962 | return "#%02x%02x%02x" % (r,g,b)
963 |
964 | def test():
965 | win = GraphWin()
966 | win.setCoords(0,0,10,10)
967 | t = Text(Point(5,5), "Centered Text")
968 | t.draw(win)
969 | p = Polygon(Point(1,1), Point(5,3), Point(2,7))
970 | p.draw(win)
971 | e = Entry(Point(5,6), 10)
972 | e.draw(win)
973 | win.getMouse()
974 | p.setFill("red")
975 | p.setOutline("blue")
976 | p.setWidth(2)
977 | s = ""
978 | for pt in p.getPoints():
979 | s = s + "(%0.1f,%0.1f) " % (pt.getX(), pt.getY())
980 | t.setText(e.getText())
981 | e.setFill("green")
982 | e.setText("Spam!")
983 | e.move(2,0)
984 | win.getMouse()
985 | p.move(2,3)
986 | s = ""
987 | for pt in p.getPoints():
988 | s = s + "(%0.1f,%0.1f) " % (pt.getX(), pt.getY())
989 | t.setText(s)
990 | win.getMouse()
991 | p.undraw()
992 | e.undraw()
993 | t.setStyle("bold")
994 | win.getMouse()
995 | t.setStyle("normal")
996 | win.getMouse()
997 | t.setStyle("italic")
998 | win.getMouse()
999 | t.setStyle("bold italic")
1000 | win.getMouse()
1001 | t.setSize(14)
1002 | win.getMouse()
1003 | t.setFace("arial")
1004 | t.setSize(20)
1005 | win.getMouse()
1006 | win.close()
1007 |
1008 | #MacOS fix 2
1009 | #tk.Toplevel(_root).destroy()
1010 |
1011 | # MacOS fix 1
1012 | update()
1013 |
1014 | if __name__ == "__main__":
1015 | test()
1016 |
--------------------------------------------------------------------------------
/pytorch-soft-actor-critic/main.py:
--------------------------------------------------------------------------------
1 | import argparse
2 | import os
3 | import pickle
4 | import time
5 | from comet_ml import Experiment
6 | import json
7 | import gym
8 | from tqdm import tqdm
9 | import numpy as np
10 | import itertools
11 | import torch
12 | import ipdb
13 | from sac import SAC
14 | from normalized_actions import NormalizedActions
15 | from replay_memory import ReplayMemory
16 | from continous_grids import GridWorld
17 |
18 |
19 | def main(args):
20 | # Environment
21 | if args.make_cont_grid:
22 | if args.smol:
23 | dense_goals = []
24 | if args.dense_goals:
25 | dense_goals = [(13.0, 8.0), (18.0, 11.0), (20.0, 15.0), (22.0, 19.0)]
26 | env = GridWorld(max_episode_len=500, num_rooms=1, action_limit_max=1.0, silent_mode=args.silent,
27 | start_position=(8.0, 8.0), goal_position=(22.0, 22.0), goal_reward=+100.0,
28 | dense_goals=dense_goals, dense_reward=+5,
29 | grid_len=30)
30 | env_name = "SmallGridWorld"
31 | elif args.tiny:
32 | env = GridWorld(max_episode_len=500, num_rooms=0, action_limit_max=1.0, silent_mode=args.silent,
33 | start_position=(5.0, 5.0), goal_position=(15.0, 15.0), goal_reward=+100.0,
34 | dense_goals=[], dense_reward=+0,
35 | grid_len=20)
36 | env_name = "TinyGridWorld"
37 | elif args.twotiny:
38 | env = GridWorld(max_episode_len=500, num_rooms=1, action_limit_max=1.0, silent_mode=args.silent,
39 | start_position=(5.0, 5.0), goal_position=(15.0, 15.0), goal_reward=+100.0,
40 | dense_goals=[], dense_reward=+0,
41 | grid_len=20, door_breadth=3)
42 | env_name = "TwoTinyGridWorld"
43 | elif args.threetiny:
44 | env = GridWorld(max_episode_len=500, num_rooms=0, action_limit_max=1.0, silent_mode=args.silent,
45 | start_position=(8.0, 8.0), goal_position=(22.0, 22.0), goal_reward=+100.0,
46 | dense_goals=[], dense_reward=+0,
47 | grid_len=30)
48 | env_name = "ThreeGridWorld"
49 | else:
50 | dense_goals = []
51 | if args.dense_goals:
52 | dense_goals = [(35.0, 25.0), (45.0, 25.0), (55.0, 25.0), (68.0, 33.0), (75.0, 45.0), (75.0, 55.0),
53 | (75.0, 65.0)]
54 | env = GridWorld(max_episode_len=1000, num_rooms=1, action_limit_max=1.0, silent_mode=args.silent,
55 | dense_goals=dense_goals)
56 | env_name = "VeryLargeGridWorld"
57 | else:
58 | env = NormalizedActions(gym.make(args.env_name))
59 |
60 | env.seed(args.seed)
61 | torch.manual_seed(args.seed)
62 | np.random.seed(args.seed)
63 |
64 | # Agent
65 | agent = SAC(env.observation_space.shape[0], env.action_space, args)
66 |
67 | # Memory
68 | memory = ReplayMemory(args.replay_size)
69 |
70 | # Training Loop
71 | rewards = []
72 | test_rewards = []
73 | total_numsteps = 0
74 | updates = 0
75 |
76 | if args.debug:
77 | args.use_logger = False
78 |
79 | # Check that the Comet settings file exists
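# Expected structure of settings.json (placeholder values; the "restapikey" and
# "project" keys are read by plots/plot_comet.py):
#   {"apikey": "...", "username": "...", "restapikey": "...", "project": "..."}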
80 | if os.path.isfile("settings.json"):
81 | with open('settings.json') as f:
82 | data = json.load(f)
83 | args.comet_apikey = data["apikey"]
84 | args.comet_username = data["username"]
85 | else:
86 | raise NotImplementedError
87 |
88 | experiment_id = None
89 | if args.comet:
90 | experiment = Experiment(api_key=args.comet_apikey,\
91 | project_name="florl",auto_output_logging="None",\
92 | workspace=args.comet_username,auto_metric_logging=False,\
93 | auto_param_logging=False)
94 | experiment.set_name(args.namestr)
95 | args.experiment = experiment
96 | experiment_id = experiment.id
97 |
98 | if args.make_cont_grid:
99 | # The following lists are kept for visualization purposes
100 | traj = []
101 | imp_states = []
102 |
103 | for i_episode in itertools.count():
104 | state = env.reset()
105 | if args.make_cont_grid:
106 | traj.append(state)
107 |
108 | episode_reward = 0
109 | while True:
110 | if args.start_steps > total_numsteps:
111 | action = env.action_space.sample()
112 | else:
113 | action = agent.select_action(state) # Sample action from policy
114 | time.sleep(.002)
115 | next_state, reward, done, _ = env.step(action) # Step
116 | #Visual
117 | if args.make_cont_grid:
118 | traj.append(next_state)
119 | if total_numsteps % 10000 == 0 and total_numsteps != 0:
120 | imp_states.append(next_state)
121 |
122 | # Save current trajectories to disk with pickle
123 | filename = 'run_data/{}_{}_{}_{}_{}.txt'.format(args.policy, args.env_name,
124 | experiment_id, args.namestr, total_numsteps)
125 | with open(filename, 'wb') as f:
126 | pickle.dump(traj, f)
127 |
128 |
129 |
130 | mask = not done # 1 for not done and 0 for done
131 | memory.push(state, action, reward, next_state, mask) # Append transition to memory
132 | if len(memory) > args.batch_size:
133 | for i in range(args.updates_per_step): # Number of updates per step in environment
134 | # Sample a batch from memory
135 | state_batch, action_batch, reward_batch, next_state_batch,\
136 | mask_batch = memory.sample(args.batch_size)
137 | # Update parameters of all the networks
138 | value_loss, critic_1_loss, critic_2_loss, policy_loss,\
139 | ent_loss, alpha = agent.update_parameters(state_batch,\
140 | action_batch,reward_batch,next_state_batch,mask_batch, updates)
141 |
142 | if args.comet:
143 | args.experiment.log_metric("Loss Value", value_loss,step=updates)
144 | args.experiment.log_metric("Loss Critic 1",critic_1_loss,step=updates)
145 | args.experiment.log_metric("Loss Critic 2",critic_2_loss,step=updates)
146 | args.experiment.log_metric("Loss Policy",policy_loss,step=updates)
147 | args.experiment.log_metric("Entropy",ent_loss,step=updates)
148 | args.experiment.log_metric("Entropy Temperature",alpha,step=updates)
149 | updates += 1
150 |
151 | state = next_state
152 | total_numsteps += 1
153 | episode_reward += reward
154 |
155 | if done:
156 | break
157 |
158 | if total_numsteps > args.num_steps:
159 | break
160 |
161 | rewards.append(episode_reward)
162 | if args.comet:
163 | args.experiment.log_metric("Train Reward",episode_reward,step=i_episode)
164 | args.experiment.log_metric("Average Train Reward",\
165 | np.round(np.mean(rewards[-100:]),2),step=i_episode)
166 | print("Episode: {}, total numsteps: {}, reward: {}, average reward: {}".format(i_episode,\
167 | total_numsteps, np.round(rewards[-1],2),\
168 | np.round(np.mean(rewards[-100:]),2)))
169 |
170 | if i_episode % 10 == 0 and args.eval == True:
171 | state = torch.Tensor([env.reset()])
172 | episode_reward = 0
173 | while True:
174 | action = agent.select_action(state, eval=True)
175 | next_state, reward, done, _ = env.step(action)
176 | episode_reward += reward
177 |
178 | state = next_state
179 | if done:
180 | break
181 |
182 | if args.comet:
183 | args.experiment.log_metric("Test Reward", episode_reward, step=i_episode)
184 |
185 | test_rewards.append(episode_reward)
186 | print("----------------------------------------")
187 | print("Test Episode: {}, reward: {}".format(i_episode, test_rewards[-1]))
188 | print("----------------------------------------")
189 | if args.make_cont_grid:
190 | #Visual
191 | # env.vis_trajectory(np.asarray(traj), args.namestr, experiment_id, np.asarray(imp_states))
192 |
193 | # Save final trajectories to disk with pickle
194 | filename = 'run_data/finalrun_{}_{}_{}_{}_{}.txt'.format(args.policy, args.env_name,
195 | experiment_id, args.namestr, total_numsteps)
196 | with open(filename, 'wb') as f:
197 | pickle.dump(traj, f)
198 | env.test_vis_trajectory(np.asarray(traj), args.namestr, args.heatmap_title, experiment_id,
199 | args.heatmap_normalize, args.heatmap_vertical_clip_value)
200 |
201 | env.close()
202 |
203 | if __name__ == '__main__':
204 | """
205 | Process command-line arguments, then call main()
206 | """
207 | parser = argparse.ArgumentParser(description='PyTorch Soft Actor-Critic')
208 | parser.add_argument('--env-name', default=None,
209 | help='name of the environment to run')
210 | parser.add_argument('--policy', default="Gaussian",
211 | help='algorithm to use: Gaussian | Deterministic')
212 | parser.add_argument('--eval', type=bool, default=True,
213 | help='Evaluate the policy every 10 episodes (default: True)')
214 | parser.add_argument('--gamma', type=float, default=0.99, metavar='G',
215 | help='discount factor for reward (default: 0.99)')
216 | parser.add_argument('--tau', type=float, default=0.005, metavar='G',
217 | help='target smoothing coefficient (default: 0.005)')
218 | parser.add_argument('--lr', type=float, default=0.0003, metavar='G',
219 | help='learning rate (default: 0.0003)')
220 | parser.add_argument('--alpha', type=float, default=0.1, metavar='G',
221 | help='Temperature parameter α determines the relative importance of the entropy term against the reward (default: 0.1)')
222 | parser.add_argument('--automatic_entropy_tuning', type=bool, default=False, metavar='G',
223 | help='Temperature parameter α is automatically adjusted.')
224 | parser.add_argument('--seed', type=int, default=456, metavar='N',
225 | help='random seed (default: 456)')
226 | parser.add_argument('--batch_size', type=int, default=256, metavar='N',
227 | help='batch size (default: 256)')
228 | parser.add_argument('--clip', type=int, default=1, metavar='N',
229 | help='Clipping for gradient norm')
230 | parser.add_argument('--num_steps', type=int, default=1000000, metavar='N',
231 | help='maximum number of steps (default: 1000000)')
232 | parser.add_argument('--hidden_size', type=int, default=256, metavar='N',
233 | help='hidden size (default: 256)')
234 | parser.add_argument('--updates_per_step', type=int, default=1, metavar='N',
235 | help='model updates per simulator step (default: 1)')
236 | parser.add_argument('--start_steps', type=int, default=10000, metavar='N',
237 | help='Steps sampling random actions (default: 10000)')
238 | parser.add_argument('--target_update_interval', type=int, default=1, metavar='N',
239 | help='Value target update per no. of updates per step (default: 1)')
240 | parser.add_argument('--replay_size', type=int, default=1000000, metavar='N',
241 | help='size of replay buffer (default: 1000000)')
242 | parser.add_argument("--comet", action="store_true", default=False,help='Use comet for logging')
243 | parser.add_argument('--debug', default=False, action='store_true',help='Debug')
244 | parser.add_argument('--namestr', type=str, default='FloRL', \
245 | help='additional info in output filename to describe experiments')
246 | parser.add_argument('--n_blocks', type=int, default=5,\
247 | help='Number of blocks to stack in a model (MADE in MAF; Coupling+BN in RealNVP).')
248 | parser.add_argument('--n_components', type=int, default=1,\
249 | help='Number of Gaussian clusters for mixture of gaussians models.')
250 | parser.add_argument('--flow_hidden_size', type=int, default=100,\
251 | help='Hidden layer size for MADE (and each MADE block in an MAF).')
252 | parser.add_argument('--n_hidden', type=int, default=1, help='Number of hidden layers in each MADE.')
253 | parser.add_argument('--activation_fn', type=str, default='relu',\
254 | help='What activation function to use in the MADEs.')
255 | parser.add_argument('--input_order', type=str, default='sequential',\
256 | help='What input order to use (sequential | random).')
257 | parser.add_argument('--conditional', default=False, action='store_true',\
258 | help='Whether to use a conditional model.')
259 | parser.add_argument('--no_batch_norm', action='store_true')
260 | parser.add_argument('--flow_model', default='maf', help='Which model to use: made, maf.')
261 |
262 | # flags for using reparameterization trick or not
263 | parser.add_argument('--reparam',dest='reparam',action='store_true')
264 | parser.add_argument('--no-reparam',dest='reparam',action='store_false')
265 | # flags for using a tanh activation or not
266 | parser.add_argument('--tanh', dest='tanh', action='store_true')
267 | parser.add_argument('--no-tanh', dest='tanh', action='store_false')
268 | # For different gridworld environments
269 | parser.add_argument('--make_cont_grid', default=False, action='store_true', help='Make GridWorld')
270 | parser.add_argument('--dense_goals', default=False, action='store_true', help='Create sub-goals')
271 | parser.add_argument("--smol", action="store_true", default=False, help='Change to a smaller sized gridworld')
272 | parser.add_argument("--tiny", action="store_true", default=False, help='Change to the smallest sized gridworld')
273 | parser.add_argument("--twotiny", action="store_true", default=False,
274 | help='Change to 2x the smallest sized gridworld')
275 | parser.add_argument("--threetiny", action="store_true", default=False,
276 | help='Change to 3x the smallest sized gridworld')
277 | parser.add_argument("--silent", action="store_true", default=False,
278 | help='Suppress graphical output; set this when running on a server.')
279 | parser.add_argument('--heatmap_title', default='Continuous GridWorld Trajectories')
280 | parser.add_argument('--heatmap_normalize', default=False, action='store_true')
281 | parser.add_argument('--heatmap_vertical_clip_value', type=int, default=2500)
282 |
283 | parser.set_defaults(reparam=True, tanh=True)
284 |
285 | args = parser.parse_args()
286 | args.cond_label_size = None
287 | main(args)
288 |
289 |
--------------------------------------------------------------------------------
/pytorch-soft-actor-critic/model.py:
--------------------------------------------------------------------------------
1 | import sys
2 | import os
3 | import torch
4 | import ipdb
5 | import torch.nn as nn
6 | import torch.nn.functional as F
7 | from torch.distributions import Normal, Exponential, LogNormal, Laplace
8 |
9 | LOG_SIG_MAX = 2
10 | LOG_SIG_MIN = -20
11 | epsilon = 1e-6
12 | device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
13 |
14 | # Initialize Policy weights
15 | def weights_init_(m):
16 | classname = m.__class__.__name__
17 | if classname.find('Linear') != -1:
18 | torch.nn.init.xavier_uniform_(m.weight, gain=1)
19 | torch.nn.init.constant_(m.bias, 0)
20 |
21 |
22 | class ValueNetwork(nn.Module):
23 | def __init__(self, num_inputs, hidden_dim):
24 | super(ValueNetwork, self).__init__()
25 |
26 | self.linear1 = nn.Linear(num_inputs, hidden_dim)
27 | self.linear2 = nn.Linear(hidden_dim, hidden_dim)
28 | self.linear3 = nn.Linear(hidden_dim, 1)
29 |
30 | self.apply(weights_init_)
31 |
32 | def forward(self, state):
33 | x = F.relu(self.linear1(state))
34 | x = F.relu(self.linear2(x))
35 | x = self.linear3(x)
36 | return x
37 |
38 |
39 | class QNetwork(nn.Module):
40 | def __init__(self, num_inputs, num_actions, hidden_dim):
41 | super(QNetwork, self).__init__()
42 |
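# Two independent Q-value heads computed in one module (clipped double-Q);
# taking the minimum of the two estimates (done by the SAC update, not in this
# module) is the standard trick for reducing overestimation bias.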
43 | # Q1 architecture
44 | self.linear1 = nn.Linear(num_inputs + num_actions, hidden_dim)
45 | self.linear2 = nn.Linear(hidden_dim, hidden_dim)
46 | self.linear3 = nn.Linear(hidden_dim, 1)
47 |
48 | # Q2 architecture
49 | self.linear4 = nn.Linear(num_inputs + num_actions, hidden_dim)
50 | self.linear5 = nn.Linear(hidden_dim, hidden_dim)
51 | self.linear6 = nn.Linear(hidden_dim, 1)
52 |
53 | self.apply(weights_init_)
54 |
55 | def forward(self, state, action):
56 | x1 = torch.cat([state, action], 1)
57 | x1 = F.relu(self.linear1(x1))
58 | x1 = F.relu(self.linear2(x1))
59 | x1 = self.linear3(x1)
60 |
61 | x2 = torch.cat([state, action], 1)
62 | x2 = F.relu(self.linear4(x2))
63 | x2 = F.relu(self.linear5(x2))
64 | x2 = self.linear6(x2)
65 |
66 | return x1, x2
67 |
68 |
69 | class GaussianPolicy(nn.Module):
70 | def __init__(self, num_inputs, num_actions, hidden_dim, args):
71 | super(GaussianPolicy, self).__init__()
72 |
73 | self.linear1 = nn.Linear(num_inputs, hidden_dim)
74 | self.linear2 = nn.Linear(hidden_dim, hidden_dim)
75 |
76 | self.mean_linear = nn.Linear(hidden_dim, num_actions)
77 | self.log_std_linear = nn.Linear(hidden_dim, num_actions)
78 |
79 | self.apply(weights_init_)
80 |
81 | self.tanh = args.tanh
82 |
83 | def encode(self, state):
84 | x = F.relu(self.linear1(state))
85 | x = F.relu(self.linear2(x))
86 | mean = self.mean_linear(x)
87 | log_std = self.log_std_linear(x)
88 | log_std = torch.clamp(log_std, min=LOG_SIG_MIN, max=LOG_SIG_MAX)
89 | return mean, log_std
90 |
91 | def forward(self, state, reparam = False):
92 | mean, log_std = self.encode(state)
93 | std = log_std.exp()
94 | normal = Normal(mean, std)
95 |
96 | if reparam == True:
97 | x_t = normal.rsample() # for reparameterization trick (mean + std * N(0,1))
98 | else:
99 | x_t = normal.sample()
100 |
101 | if self.tanh:
102 | action = torch.tanh(x_t)
103 | else:
104 | action = x_t
105 |
106 | log_prob = normal.log_prob(x_t)
107 |
108 | if self.tanh:
109 | # Enforcing Action Bound
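# Change of variables for the tanh squashing a = tanh(x):
# log pi(a|s) = log rho(x|s) - sum_i log(1 - tanh(x_i)^2);
# the epsilon below guards against log(0) when |a| approaches 1.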
110 | log_prob -= torch.log(1 - action.pow(2) + epsilon)
111 | log_prob = log_prob.sum(1, keepdim=True)
112 |
113 | return action, log_prob, x_t, mean, log_std
114 |
115 |
116 | class ExponentialPolicy(nn.Module):
117 | def __init__(self, num_inputs, num_actions, hidden_dim, args):
118 | super(ExponentialPolicy, self).__init__()
119 |
120 | self.linear1 = nn.Linear(num_inputs, hidden_dim)
121 | self.linear2 = nn.Linear(hidden_dim, hidden_dim)
122 |
123 | self.rate_linear = nn.Linear(hidden_dim, num_actions)
124 | self.apply(weights_init_)
125 |
126 | self.tanh = args.tanh
127 |
128 | def encode(self, state):
129 | x = F.relu(self.linear1(state))
130 | x = F.relu(self.linear2(x))
131 | log_rate = self.rate_linear(x)
132 | return log_rate
133 |
134 | def forward(self, state, reparam = False):
135 | log_rate = self.encode(state)
136 | log_rate = torch.clamp(log_rate, min=LOG_SIG_MIN, max=LOG_SIG_MAX)
137 | rate = torch.exp(log_rate)
138 | exponential = Exponential(rate)
139 |
140 | # whether or not to use reparametrization trick
141 | if reparam == True:
142 | x_t = exponential.rsample()
143 | else:
144 | x_t = exponential.sample()
145 | # whether or not to add tanh
146 | if self.tanh:
147 | action = torch.tanh(x_t)
148 | else:
149 | action = x_t
150 |
151 | log_prob = exponential.log_prob(x_t)
152 | mean = exponential.mean
153 | std = torch.sqrt(exponential.variance)
154 | log_std = torch.log(std)
155 | if self.tanh:
156 | # Enforcing Action Bound
157 | log_prob -= torch.log(1 - action.pow(2) + epsilon)
158 | log_prob = log_prob.sum(1, keepdim=True)
159 |
160 | return action, log_prob, x_t, mean, log_std
161 |
162 |
163 | class LogNormalPolicy(nn.Module):
164 | def __init__(self, num_inputs, num_actions, hidden_dim, args):
165 | super(LogNormalPolicy, self).__init__()
166 |
167 | self.linear1 = nn.Linear(num_inputs, hidden_dim)
168 | self.linear2 = nn.Linear(hidden_dim, hidden_dim)
169 |
170 | self.mean_linear = nn.Linear(hidden_dim, num_actions)
171 | self.std_linear = nn.Linear(hidden_dim, num_actions)
172 |
173 | self.apply(weights_init_)
174 |
175 | self.tanh = args.tanh
176 |
177 | def encode(self, state):
178 | x = F.relu(self.linear1(state))
179 | x = F.relu(self.linear2(x))
180 | mean = self.mean_linear(x)
181 | std = self.std_linear(x)
182 | std = torch.clamp(std, min=epsilon, max=LOG_SIG_MAX) # standard deviation has to be > 0
183 | return mean, std
184 |
185 | def forward(self, state, reparam = False):
186 | mean, std = self.encode(state)
187 | log_normal = LogNormal(mean, std)
188 |
189 | # whether or not to use reparametrization trick
190 | if reparam == True:
191 | x_t = log_normal.rsample()
192 | else:
193 | x_t = log_normal.sample()
194 |
195 | # whether or not to add tanh
196 | if self.tanh:
197 | action = torch.tanh(x_t)
198 | else:
199 | action = x_t
200 |
201 | log_prob = log_normal.log_prob(x_t)
202 |
203 | if self.tanh:
204 | # Enforcing Action Bound
205 | log_prob -= torch.log(1 - action.pow(2) + epsilon)
206 | log_prob = log_prob.sum(1, keepdim=True)
207 |
208 | # get mean and standard deviation of the distr
209 | mean = log_normal.mean
210 | std = torch.sqrt(log_normal.variance)
211 | log_std = torch.log(std)
212 | return action, log_prob, x_t, mean, log_std
213 |
214 |
215 | class LaplacePolicy(nn.Module):
216 | def __init__(self, num_inputs, num_actions, hidden_dim, args):
217 | super(LaplacePolicy, self).__init__()
218 |
219 | self.linear1 = nn.Linear(num_inputs, hidden_dim)
220 | self.linear2 = nn.Linear(hidden_dim, hidden_dim)
221 |
222 | self.mean_linear = nn.Linear(hidden_dim, num_actions)
223 | self.log_scale_linear = nn.Linear(hidden_dim, num_actions)
224 |
225 | self.apply(weights_init_)
226 |
227 | self.tanh = args.tanh
228 |
229 | def encode(self, state):
230 | x = F.relu(self.linear1(state))
231 | x = F.relu(self.linear2(x))
232 | mean = self.mean_linear(x)
233 | log_scale = self.log_scale_linear(x)
234 | log_scale = torch.clamp(log_scale, min=LOG_SIG_MIN, max=LOG_SIG_MAX)
235 | return mean, log_scale
236 |
237 | def forward(self, state, reparam = False):
238 | mean, log_scale = self.encode(state)
239 | scale = torch.exp(log_scale)
240 | laplace = Laplace(mean, scale)
241 |
242 | if reparam == True:
243 | x_t = laplace.rsample()
244 | else:
245 | x_t = laplace.sample()
246 |
247 | if self.tanh:
248 | action = torch.tanh(x_t)
249 | else:
250 | action = x_t
251 |
252 | log_prob = laplace.log_prob(x_t)
253 | std = torch.sqrt(laplace.variance)
254 | log_std = torch.log(std)
255 |
256 | if self.tanh:
257 | # Enforcing Action Bound
258 | log_prob -= torch.log(1 - action.pow(2) + epsilon)
259 | log_prob = log_prob.sum(1, keepdim=True)
260 |
261 | mean = laplace.mean
262 | std = torch.sqrt(laplace.variance)
263 | log_std = torch.log(std)
264 | return action, log_prob, x_t, mean, log_std
265 |
266 |
267 | class DeterministicPolicy(nn.Module):
268 | def __init__(self, num_inputs, num_actions, hidden_dim):
269 | super(DeterministicPolicy, self).__init__()
270 | self.linear1 = nn.Linear(num_inputs, hidden_dim)
271 | self.linear2 = nn.Linear(hidden_dim, hidden_dim)
272 |
273 | self.mean = nn.Linear(hidden_dim, num_actions)
274 | self.noise = torch.Tensor(num_actions)
275 |
276 | self.apply(weights_init_)
277 |
278 | def encode(self, state):
279 | x = F.relu(self.linear1(state))
280 | x = F.relu(self.linear2(x))
281 | mean = torch.tanh(self.mean(x))
282 | return mean
283 |
284 | def forward(self, state):
285 | mean = self.encode(state)  # encode, not forward: calling forward here would recurse forever
286 | noise = self.noise.normal_(0., std=0.1)
287 | noise = noise.clamp(-0.25, 0.25)
288 | action = mean + noise
289 | return action, torch.tensor(0.), torch.tensor(0.), mean, torch.tensor(0.)
290 |
291 |
--------------------------------------------------------------------------------
/pytorch-soft-actor-critic/normalized_actions.py:
--------------------------------------------------------------------------------
1 | import gym
2 |
3 |
4 | class NormalizedActions(gym.ActionWrapper):
5 |
6 | def action(self, action):
7 | action = (action + 1) / 2 # [-1, 1] => [0, 1]
8 | action *= (self.action_space.high - self.action_space.low)
9 | action += self.action_space.low
10 | return action
11 |
12 | def _reverse_action(self, action):
13 | action -= self.action_space.low
14 | action /= (self.action_space.high - self.action_space.low)
15 | action = action * 2 - 1
16 | return action
17 |
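# Example (illustrative): for an action space with low = -2 and high = 2, a policy
# output of 0.5 is rescaled by action() to -2 + (0.5 + 1) / 2 * (2 - (-2)) = 1.0.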
--------------------------------------------------------------------------------
/pytorch-soft-actor-critic/plots/plot_comet.py:
--------------------------------------------------------------------------------
1 | import argparse
2 | import csv
3 | import json
4 | import os
5 | from statistics import mean
6 |
7 | from comet_ml import API
8 | import matplotlib
9 | import numpy as np
10 |
11 | matplotlib.use('Agg')
12 | import matplotlib.pyplot as plt
13 | import seaborn as sns
14 |
15 | # Set plotting style
16 | sns.set_context('paper', font_scale=1.3)
17 | sns.set_style('whitegrid')
18 | sns.set_palette('colorblind')
19 | plt.rcParams['text.usetex'] = True
20 |
21 |
22 | def extract_excel_data(source_filename):
23 | with open(source_filename, 'r') as csvfile:
24 | csvreader = csv.reader(csvfile)
25 | rows = {row[0]: row[1:] for row in csvreader}
26 |
27 | labels = {}
28 | labels['title'] = rows.get('filename')[0]
29 | labels['x_label'] = rows.get('xlabel')[0]
30 | labels['y_label'] = rows.get('ylabel')[0]
31 | labels['metric'] = rows.get('metric')[0]
32 |
33 | data = {key: value for key, value in rows.items() if 'experiment' in key.lower()}
34 | labels['experiments'] = [key.split(':')[1] for key, value in data.items()]
35 |
36 | return labels, data
37 |
38 |
39 | def connect_to_comet():
40 | if os.path.isfile("settings.json"):
41 | with open("settings.json") as f:
42 | keys = json.load(f)
43 | comet_apikey = keys.get("apikey")
44 | comet_username = keys.get("username")
45 | comet_restapikey = keys.get("restapikey")
46 | comet_project = keys.get("project")
47 |
48 | print("COMET_REST_API_KEY=%s" %(comet_restapikey))
49 | with open('.env', 'w') as writer:
50 | writer.write("COMET_API_KEY=%s\n" %(comet_apikey))
51 | writer.write("COMET_REST_API_KEY=%s\n" %(comet_restapikey))
52 |
53 | comet_api = API()
54 | return comet_api, comet_username, comet_project
55 |
56 |
57 | def truncate_exp(data_experiments):
58 | last_data_points = [run[-1] for data_run in data_experiments for run in data_run]
59 | run_end_times = [timestep for timestep, value in last_data_points]
60 | earliest_end_time = min(run_end_times)
61 |
62 | clean_data_experiments = []
63 | for exp in data_experiments:
64 | clean_data_runs = []
65 | for run in exp:
66 | clean_data_runs.append({x: y for x, y in run if x <= earliest_end_time})
67 | clean_data_experiments.append(clean_data_runs)
68 |
69 | return clean_data_experiments
70 |
71 |
72 | def get_data(title, x_label, y_label, metric, data):
73 | if not title or not x_label or not y_label or not metric:
74 | print("Error in reading CSV file. Ensure filename, x and y labels, and metric are present.")
75 | exit(1)
76 |
77 | comet_api, comet_username, comet_project = connect_to_comet()
78 |
79 | # Accumulate data for all experiments.
80 | data_experiments = []
81 | for exp_name, runs in data.items():
82 | # Accumulate data for all runs of a given experiment.
83 | data_runs = []
84 | if len(runs) > 0:
85 | for exp_key in runs:
86 | raw_data = comet_api.get("%s/%s/%s" %(comet_username, comet_project, exp_key))
87 | data_points = raw_data.metrics_raw[metric]
88 | data_runs.append(data_points)
89 |
90 | data_experiments.append(data_runs)
91 |
92 | clean_data_experiments = truncate_exp(data_experiments)
93 | return clean_data_experiments
94 |
95 |
96 | def plot(**kwargs):
97 | labels = kwargs.get('labels')
98 | data = kwargs.get('data')
99 |
100 | # Setup figure
101 | fig = plt.figure(figsize=(12, 8))
102 | ax = plt.subplot()
103 | for label in (ax.get_xticklabels()):
104 | label.set_fontname('Arial')
105 | label.set_fontsize(28)
106 | for label in (ax.get_yticklabels()):
107 | label.set_fontname('Arial')
108 | label.set_fontsize(28)
109 | plt.ticklabel_format(style='sci', axis='x', scilimits=(0, 0))
110 | ax.xaxis.get_offset_text().set_fontsize(20)
111 | axis_font = {'fontname': 'Arial', 'size': '32'}
112 | colors = sns.color_palette('colorblind', n_colors=len(data))
113 |
114 | # Plot data
115 | for runs, label, color in zip(data, labels.get('experiments'), colors):
116 | unique_x_values = set()
117 | for run in runs:
118 | for key in run.keys():
119 | unique_x_values.add(key)
120 | x_values = sorted(unique_x_values)
121 |
122 | # Plot mean and standard deviation of all runs
123 | y_values_mean = []
124 | y_values_std = []
125 |
126 | for x in x_values:
127 | y_values_mean.append(mean([run.get(x) for run in runs if run.get(x)]))
128 | y_values_std.append(np.std([run.get(x) for run in runs if run.get(x)]))
129 |
130 | # Plot std
131 | ax.fill_between(x_values, np.add(np.array(y_values_mean), np.array(y_values_std)),
132 | np.subtract(np.array(y_values_mean), np.array(y_values_std)),
133 | alpha=0.3,
134 | edgecolor=color, facecolor=color)
135 | # Plot mean
136 | plt.plot(x_values, y_values_mean, color=color, linewidth=1.5, label=label)
137 |
138 | # Label figure
139 | ax.legend(loc='lower right', prop={'size': 26})
140 | ax.set_xlabel(labels.get('x_label'), **axis_font)
141 | ax.set_ylabel(labels.get('y_label'), **axis_font)
142 | fig.subplots_adjust(bottom=0.2)
143 | fig.subplots_adjust(left=0.2)
144 | ax.set_title(labels.get('title'), **axis_font)
145 |
146 | fig.savefig('../install/{}.pdf'.format(labels.get('title')))
147 |
148 | return
149 |
150 |
151 | def main(args):
152 | source_filename = args.source_filename
153 |
154 | labels, data = extract_excel_data(source_filename)
155 |     data_experiments = get_data(labels.get('title'), labels.get('x_label'),
156 |                                 labels.get('y_label'), labels.get('metric'), data)
157 | plot(labels=labels, data=data_experiments)
158 |
159 |
160 | if __name__ == '__main__':
161 | parser = argparse.ArgumentParser()
162 | parser.add_argument('--source_filename', default='plot_source.csv')
163 | args = parser.parse_args()
164 |
165 | main(args)
166 |
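
`extract_excel_data` expects the source CSV to hold one row per label plus one row per experiment whose remaining cells are Comet experiment keys. A hedged sketch of building such a file — every name, metric, and key below is a placeholder:

```python
import csv

# Hypothetical plot_source.csv layout; all values are placeholders.
rows = [
    ["filename", "dense-gridworld-returns"],                 # used as the plot title
    ["xlabel", "Environment steps"],
    ["ylabel", "Episode reward"],
    ["metric", "episode_reward"],                            # Comet metric name to fetch
    ["Experiment:Gaussian", "comet_key_1", "comet_key_2"],   # one Comet experiment key per run
    ["Experiment:Flow", "comet_key_3", "comet_key_4"],
]
with open("plot_source.csv", "w", newline="") as f:
    csv.writer(f).writerows(rows)
```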
--------------------------------------------------------------------------------
/pytorch-soft-actor-critic/replay_memory.py:
--------------------------------------------------------------------------------
1 | import random
2 | import numpy as np
3 | from collections import namedtuple
4 | import torch
5 |
6 | class ReplayMemory:
7 | def __init__(self, capacity):
8 | self.capacity = capacity
9 | self.buffer = []
10 | self.position = 0
11 |
12 | def push(self, state, action, reward, next_state, done):
13 | if len(self.buffer) < self.capacity:
14 | self.buffer.append(None)
15 | self.buffer[self.position] = (state, action, reward, next_state, done)
16 | self.position = (self.position + 1) % self.capacity
17 |
18 | def sample(self, batch_size):
19 | batch = random.sample(self.buffer, batch_size)
20 | state, action, reward, next_state, done = map(np.stack, zip(*batch))
21 | return state, action, reward, next_state, done
22 |
23 | def __len__(self):
24 | return len(self.buffer)
25 |
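
A minimal usage sketch of the buffer (shapes and values are arbitrary):

```python
import numpy as np
from replay_memory import ReplayMemory

memory = ReplayMemory(capacity=1000)

# Push a few fake (state, action, reward, next_state, done) transitions.
for _ in range(5):
    state, next_state = np.random.randn(2), np.random.randn(2)
    memory.push(state, np.random.randn(1), 0.0, next_state, False)

states, actions, rewards, next_states, dones = memory.sample(batch_size=4)
print(states.shape)  # (4, 2)
```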
--------------------------------------------------------------------------------
/pytorch-soft-actor-critic/sac.py:
--------------------------------------------------------------------------------
1 | import sys
2 | import os
3 | import numpy as np
4 | import torch
5 | import torch.nn.functional as F
6 | import torch.nn as nn
7 | from torch.optim import Adam
8 | from utils import soft_update, hard_update
9 | from model import GaussianPolicy, ExponentialPolicy, LogNormalPolicy, LaplacePolicy, QNetwork, ValueNetwork, DeterministicPolicy
10 | from flows import *
11 | import ipdb
12 |
13 | device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
14 |
15 | class SAC(object):
16 | def __init__(self, num_inputs, action_space, args):
17 |
18 | self.num_inputs = num_inputs
19 | self.action_space = action_space.shape[0]
20 | self.gamma = args.gamma
21 | self.tau = args.tau
22 | self.clip = args.clip
23 |
24 | self.policy_type = args.policy
25 | self.target_update_interval = args.target_update_interval
26 | self.automatic_entropy_tuning = args.automatic_entropy_tuning
27 |
28 | self.critic = QNetwork(self.num_inputs, self.action_space,\
29 | args.hidden_size).to(device)
30 | self.critic_optim = Adam(self.critic.parameters(), lr=args.lr)
31 | self.alpha = args.alpha
32 | self.tanh = args.tanh
33 | self.reparam = args.reparam
34 |
35 | if self.policy_type == "Gaussian" or self.policy_type == "Exponential" or self.policy_type == "LogNormal" or self.policy_type == "Laplace":
36 | # Target Entropy = −dim(A) (e.g. , -6 for HalfCheetah-v2) as given in the paper
37 |             if self.automatic_entropy_tuning:
38 |                 self.target_entropy = -torch.prod(torch.Tensor(action_space.shape)).item()
39 |                 self.log_alpha = torch.zeros(1, requires_grad=True, device=device)  # create on device so it stays a leaf tensor Adam can update
40 | self.alpha_optim = Adam([self.log_alpha], lr=args.lr)
41 | else:
42 | pass
43 |
44 | if self.policy_type == "Gaussian":
45 | self.policy = GaussianPolicy(self.num_inputs, self.action_space,\
46 | args.hidden_size,args).to(device)
47 | elif self.policy_type == "Exponential":
48 | self.policy = ExponentialPolicy(self.num_inputs, self.action_space,\
49 | args.hidden_size,args).to(device)
50 | elif self.policy_type == "LogNormal":
51 | self.policy = LogNormalPolicy(self.num_inputs, self.action_space,\
52 | args.hidden_size,args).to(device)
53 | elif self.policy_type == "Laplace":
54 | self.policy = LaplacePolicy(self.num_inputs, self.action_space,\
55 | args.hidden_size,args).to(device)
56 |
57 | self.policy_optim = Adam(self.policy.parameters(), lr=args.lr,weight_decay=1e-6)
58 |
59 | self.value = ValueNetwork(self.num_inputs,\
60 | args.hidden_size).to(device)
61 | self.value_target = ValueNetwork(self.num_inputs,\
62 | args.hidden_size).to(device)
63 | self.value_optim = Adam(self.value.parameters(), lr=args.lr)
64 | hard_update(self.value_target, self.value)
65 | elif self.policy_type == "Flow":
66 | if args.flow_model == 'made':
67 | self.policy = MADE(self.action_space,self.num_inputs,args.hidden_size,
68 | args.n_hidden, args.cond_label_size,
69 | args.activation_fn,
70 | args.input_order).to(device)
71 | elif args.flow_model == 'mademog':
72 | assert args.n_components > 1, 'Specify more than 1 component for mixture of gaussians models.'
73 | self.policy = MADEMOG(args.n_components, self.num_inputs,
74 | self.action_space, args.flow_hidden_size,
75 | args.n_hidden, args.cond_label_size,
76 | args.activation_fn,
77 | args.input_order).to(device)
78 | elif args.flow_model == 'maf':
79 | self.policy = MAF(args.n_blocks,self.num_inputs,self.action_space,
80 | args.flow_hidden_size, args.n_hidden,
81 | args.cond_label_size, args.activation_fn,
82 | args.input_order, batch_norm=not
83 | args.no_batch_norm).to(device)
84 | elif args.flow_model == 'mafmog':
85 | assert args.n_components > 1, 'Specify more than 1 component for mixture of gaussians models.'
86 | self.policy = MAFMOG(args.n_blocks,self.num_inputs,args.n_components,
87 | self.action_space, args.flow_hidden_size,
88 | args.n_hidden, args.cond_label_size,
89 | args.activation_fn,args.input_order,
90 | batch_norm=not
91 | args.no_batch_norm).to(device)
92 | elif args.flow_model =='realnvp':
93 | self.policy = RealNVP(args.n_blocks,self.num_inputs,self.action_space,
94 | args.flow_hidden_size,args.n_hidden,
95 | args.cond_label_size,batch_norm=not
96 | args.no_batch_norm).to(device)
97 | elif args.flow_model =='planar':
98 | self.policy = PlanarBase(args.n_blocks,self.num_inputs,self.action_space,
99 | args.flow_hidden_size,args.n_hidden,device).to(device)
100 | else:
101 | raise ValueError('Unrecognized model.')
102 | self.policy_optim = Adam(self.policy.parameters(), lr=args.lr, weight_decay=1e-6)
103 | self.value = ValueNetwork(self.num_inputs,\
104 | args.hidden_size).to(device)
105 | self.value_target = ValueNetwork(self.num_inputs,\
106 | args.hidden_size).to(device)
107 | self.value_optim = Adam(self.value.parameters(), lr=args.lr)
108 | hard_update(self.value_target, self.value)
109 | else:
110 |             self.policy = DeterministicPolicy(self.num_inputs, self.action_space, args.hidden_size).to(device)
111 | self.policy_optim = Adam(self.policy.parameters(), lr=args.lr)
112 |
113 | self.critic_target = QNetwork(self.num_inputs, self.action_space,\
114 | args.hidden_size).to(device)
115 | hard_update(self.critic_target, self.critic)
116 |
117 | def select_action(self, state, eval=False):
118 | state = torch.FloatTensor(state).to(device).unsqueeze(0)
119 | if eval == False:
120 | self.policy.train()
121 | if len(state.size()) > 2:
122 | state = state.view(-1,self.num_inputs)
123 | action, _, _, _, _ = self.policy(state, reparam = self.reparam)
124 | else:
125 | self.policy.eval()
126 | if len(state.size()) > 2:
127 | state = state.view(-1,self.num_inputs)
128 | if self.policy_type != 'Flow':
129 | _, _, _, action, _ = self.policy(state, reparam=self.reparam)
130 | else:
131 | _, _, _, action, _ = self.policy.inverse(state)
132 | if self.policy_type == "Gaussian" or self.policy_type == "Exponential" or self.policy_type == "LogNormal" or self.policy_type == "Laplace":
133 | if self.tanh:
134 | action = torch.tanh(action)
135 | elif self.policy_type == "Flow":
136 | if self.tanh:
137 | action = torch.tanh(action)
138 | else:
139 | pass
140 | action = action.detach().cpu().numpy()
141 | return action[0]
142 |
143 | def update_parameters(self, state_batch, action_batch, reward_batch, next_state_batch, mask_batch, updates):
144 | state_batch = torch.FloatTensor(state_batch).to(device)
145 | next_state_batch = torch.FloatTensor(next_state_batch).to(device)
146 | action_batch = torch.FloatTensor(action_batch).to(device)
147 | reward_batch = torch.FloatTensor(reward_batch).to(device).unsqueeze(1)
148 | mask_batch = torch.FloatTensor(np.float32(mask_batch)).to(device).unsqueeze(1)
149 |
150 | """
151 |         Use two Q-functions to mitigate positive bias in the policy improvement step that is known
152 |         to degrade performance of value-based methods. Two Q-functions also significantly speed
153 |         up training, especially on harder tasks.
154 | """
155 | expected_q1_value, expected_q2_value = self.critic(state_batch, action_batch)
156 | if self.policy_type == 'Flow':
157 | new_action, log_prob, _, mean, log_std = self.policy.inverse(state_batch)
158 | else:
159 | new_action, log_prob, _, mean, log_std = self.policy(state_batch, reparam=self.reparam)
160 |
161 | if self.policy_type == "Gaussian" or self.policy_type == "Exponential" or self.policy_type == "LogNormal" or self.policy_type == "Laplace" or self.policy_type == 'Flow':
162 | if self.automatic_entropy_tuning:
163 | """
164 | Alpha Loss
165 | """
166 | alpha_loss = -(self.log_alpha * (log_prob + self.target_entropy).detach()).mean()
167 | self.alpha_optim.zero_grad()
168 | alpha_loss.backward()
169 | self.alpha_optim.step()
170 | self.alpha = self.log_alpha.exp()
171 | alpha_logs = self.alpha.clone() # For TensorboardX logs
172 | else:
173 | alpha_loss = torch.tensor(0.)
174 | alpha_logs = self.alpha # For TensorboardX logs
175 |
176 |
177 | """
178 | Including a separate function approximator for the soft value can stabilize training.
179 | """
180 | expected_value = self.value(state_batch)
181 | target_value = self.value_target(next_state_batch)
182 | next_q_value = reward_batch + mask_batch * self.gamma * (target_value).detach()
183 | else:
184 | """
185 | There is no need in principle to include a separate function approximator for the state value.
186 |             We use a target critic network for the deterministic policy and drop the value network entirely.
187 | """
188 | alpha_loss = torch.tensor(0.)
189 | alpha_logs = self.alpha # For TensorboardX logs
190 |             next_state_action, _, _, _, _ = self.policy(next_state_batch, reparam=self.reparam)
191 | target_critic_1, target_critic_2 = self.critic_target(next_state_batch, next_state_action)
192 | target_critic = torch.min(target_critic_1, target_critic_2)
193 | next_q_value = reward_batch + mask_batch * self.gamma * (target_critic).detach()
194 |
195 | """
196 | Soft Q-function parameters can be trained to minimize the soft Bellman residual
197 | JQ = 𝔼(st,at)~D[0.5(Q1(st,at) - r(st,at) - γ(𝔼st+1~p[V(st+1)]))^2]
198 | ∇JQ = ∇Q(st,at)(Q(st,at) - r(st,at) - γV(target)(st+1))
199 | """
200 | q1_value_loss = F.mse_loss(expected_q1_value, next_q_value)
201 | q2_value_loss = F.mse_loss(expected_q2_value, next_q_value)
202 | q1_new, q2_new = self.critic(state_batch, new_action)
203 | expected_new_q_value = torch.min(q1_new, q2_new)
204 |
205 | if self.policy_type == "Gaussian" or self.policy_type == "Exponential" or self.policy_type == "LogNormal" or self.policy_type == "Laplace" or self.policy_type == 'Flow':
206 | """
207 | Including a separate function approximator for the soft value can stabilize training and is convenient to
208 | train simultaneously with the other networks
209 | Update the V towards the min of two Q-functions in order to reduce overestimation bias from function approximation error.
210 | JV = 𝔼st~D[0.5(V(st) - (𝔼at~π[Qmin(st,at) - α * log π(at|st)]))^2]
211 | ∇JV = ∇V(st)(V(st) - Q(st,at) + (α * logπ(at|st)))
212 | """
213 | next_value = expected_new_q_value - (self.alpha * log_prob)
214 | value_loss = F.mse_loss(expected_value, next_value.detach())
215 | else:
216 | pass
217 | # whether to use reparameterization trick or not
218 | if self.reparam == True:
219 | """
220 | Reparameterization trick is used to get a low variance estimator
221 | f(εt;st) = action sampled from the policy
222 | εt is an input noise vector, sampled from some fixed distribution
223 | Jπ = 𝔼st∼D,εt∼N[α * logπ(f(εt;st)|st) − Q(st,f(εt;st))]
224 | ∇Jπ = ∇log π + ([∇at (α * logπ(at|st)) − ∇at Q(st,at)])∇f(εt;st)
225 | """
226 | policy_loss = ((self.alpha * log_prob) - expected_new_q_value).mean()
227 | else:
228 | log_prob_target = expected_new_q_value - expected_value
229 | policy_loss = (log_prob * ((self.alpha * log_prob) - log_prob_target).detach() ).mean()
230 |
231 | # Regularization Loss
232 | if self.policy_type == "Gaussian" or self.policy_type == "Exponential" or self.policy_type == "LogNormal" or self.policy_type == "Laplace":
233 | mean_loss = 0.001 * mean.pow(2).mean()
234 | std_loss = 0.001 * log_std.pow(2).mean()
235 | policy_loss += mean_loss + std_loss
236 |
237 | self.critic_optim.zero_grad()
238 | q1_value_loss.backward()
239 | self.critic_optim.step()
240 |
241 | self.critic_optim.zero_grad()
242 | q2_value_loss.backward()
243 | self.critic_optim.step()
244 |
245 |         if self.policy_type == "Gaussian" or self.policy_type == "Exponential" or self.policy_type == "LogNormal" or self.policy_type == "Laplace" or self.policy_type == 'Flow':
246 | self.value_optim.zero_grad()
247 | value_loss.backward()
248 | self.value_optim.step()
249 | else:
250 | value_loss = torch.tensor(0.)
251 |
252 | self.policy_optim.zero_grad()
253 | policy_loss.backward()
254 | if self.policy_type == 'Exponential' or self.policy_type == "LogNormal" or self.policy_type == "Laplace" or self.policy_type == 'Flow':
255 | torch.nn.utils.clip_grad_norm_(self.policy.parameters(),self.clip)
256 | self.policy_optim.step()
257 |
258 |         # clip weights of the policy network to ensure the values don't blow up
259 | for p in self.policy.parameters():
260 | p.data.clamp_(-10*self.clip, 10*self.clip)
261 |
262 | """
263 | We update the target weights to match the current value function weights periodically
264 | Update target parameter after every n(args.target_update_interval) updates
265 | """
266 | if updates % self.target_update_interval == 0 and self.policy_type == "Deterministic":
267 | soft_update(self.critic_target, self.critic, self.tau)
268 |         elif updates % self.target_update_interval == 0 and (self.policy_type == "Gaussian" or self.policy_type == "Exponential" or self.policy_type == "LogNormal" or self.policy_type == "Laplace" or self.policy_type == 'Flow'):
269 | soft_update(self.value_target, self.value, self.tau)
270 |
271 | # calculate the entropy
272 | with torch.no_grad():
273 | entropy = -(log_prob.mean())
274 |
275 | # ipdb.set_trace() #alpha_loss.item()
276 | return value_loss.item(), q1_value_loss.item(), q2_value_loss.item(), policy_loss.item(),entropy, alpha_logs
277 |
278 | # Save model parameters
279 | def save_model(self, env_name, suffix="", actor_path=None, critic_path=None, value_path=None):
280 | if not os.path.exists('models/'):
281 | os.makedirs('models/')
282 |
283 | if actor_path is None:
284 | actor_path = "models/sac_actor_{}_{}".format(env_name, suffix)
285 | if critic_path is None:
286 | critic_path = "models/sac_critic_{}_{}".format(env_name, suffix)
287 | if value_path is None:
288 | value_path = "models/sac_value_{}_{}".format(env_name, suffix)
289 | print('Saving models to {}, {} and {}'.format(actor_path, critic_path, value_path))
290 | torch.save(self.value.state_dict(), value_path)
291 | torch.save(self.policy.state_dict(), actor_path)
292 | torch.save(self.critic.state_dict(), critic_path)
293 |
294 | # Load model parameters
295 | def load_model(self, actor_path, critic_path, value_path):
296 | print('Loading models from {}, {} and {}'.format(actor_path, critic_path, value_path))
297 | if actor_path is not None:
298 | self.policy.load_state_dict(torch.load(actor_path))
299 | if critic_path is not None:
300 | self.critic.load_state_dict(torch.load(critic_path))
301 | if value_path is not None:
302 | self.value.load_state_dict(torch.load(value_path))
303 |
304 |
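
The Unicode formulas in the docstrings above, rewritten as conventional LaTeX (SAC-paper notation; here $\bar\psi$ denotes the target value network and $f_\phi(\epsilon_t; s_t)$ the reparameterized action):

$$J_Q(\theta) = \mathbb{E}_{(s_t, a_t) \sim \mathcal{D}}\!\left[ \tfrac{1}{2}\big(Q_\theta(s_t, a_t) - r(s_t, a_t) - \gamma\, \mathbb{E}_{s_{t+1} \sim p}[V_{\bar\psi}(s_{t+1})]\big)^2 \right]$$

$$J_V(\psi) = \mathbb{E}_{s_t \sim \mathcal{D}}\!\left[ \tfrac{1}{2}\big(V_\psi(s_t) - \mathbb{E}_{a_t \sim \pi_\phi}\big[\min(Q_{\theta_1}, Q_{\theta_2})(s_t, a_t) - \alpha \log \pi_\phi(a_t \mid s_t)\big]\big)^2 \right]$$

$$J_\pi(\phi) = \mathbb{E}_{s_t \sim \mathcal{D},\, \epsilon_t \sim \mathcal{N}}\!\left[ \alpha \log \pi_\phi\big(f_\phi(\epsilon_t; s_t) \mid s_t\big) - Q_\theta\big(s_t, f_\phi(\epsilon_t; s_t)\big) \right]$$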
--------------------------------------------------------------------------------
/pytorch-soft-actor-critic/scripts/run_contgridworld_exp.sh:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env bash
2 |
3 | set -x
4 |
5 | #echo "Getting into the script"
6 |
7 | # Script to run multiple batches of experiments together.
8 | # Can also be used for different hyperparameter settings.
9 |
10 | # Run the following scripts in parallel:
11 |
12 | COMET_DISABLE_AUTO_LOGGING=1 python main.py --namestr=E-S-DG-CG --make_cont_grid --batch_size=128 --replay_size=100000 --hidden_size=64 --num_steps=200000 --policy=Exponential --smol --comet --dense_goals --silent --seed=0 &
13 | COMET_DISABLE_AUTO_LOGGING=1 python main.py --namestr=E-S-DG-CG --make_cont_grid --batch_size=128 --replay_size=100000 --hidden_size=64 --num_steps=200000 --policy=Exponential --smol --comet --dense_goals --silent --seed=2 &
14 | COMET_DISABLE_AUTO_LOGGING=1 python main.py --namestr=E-S-DG-CG --make_cont_grid --batch_size=128 --replay_size=100000 --hidden_size=64 --num_steps=200000 --policy=Exponential --smol --comet --dense_goals --silent --seed=4
15 |
16 | #COMET_DISABLE_AUTO_LOGGING=1 python main.py --namestr=E-S-CG --make_cont_grid --batch_size=128 --replay_size=100000 --hidden_size=64 --num_steps=150000 --policy=Exponential --smol --comet --silent --alpha=0.1 &
17 | #COMET_DISABLE_AUTO_LOGGING=1 python main.py --namestr=E-S-CG --make_cont_grid --batch_size=128 --replay_size=100000 --hidden_size=64 --num_steps=150000 --policy=Exponential --smol --comet --silent --alpha=0.2 &
18 | #COMET_DISABLE_AUTO_LOGGING=1 python main.py --namestr=E-S-CG --make_cont_grid --batch_size=128 --replay_size=100000 --hidden_size=64 --num_steps=150000 --policy=Exponential --smol --comet --silent --alpha=0.3 &
19 | #COMET_DISABLE_AUTO_LOGGING=1 python main.py --namestr=E-S-CG --make_cont_grid --batch_size=128 --replay_size=100000 --hidden_size=64 --num_steps=150000 --policy=Exponential --smol --comet --silent --alpha=0.4 &
20 | #COMET_DISABLE_AUTO_LOGGING=1 python main.py --namestr=E-S-CG --make_cont_grid --batch_size=128 --replay_size=100000 --hidden_size=64 --num_steps=150000 --policy=Exponential --smol --comet --silent --alpha=0.5 &
--------------------------------------------------------------------------------
/pytorch-soft-actor-critic/scripts/run_contgridworld_gauss.sh:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env bash
2 |
3 | set -x
4 |
5 | #echo "Getting into the script"
6 |
7 | # Script to run multiple batches of experiments together.
8 | # Can also be used for different hyperparameter settings.
9 |
10 | # Run the following scripts in parallel:
11 |
12 | #COMET_DISABLE_AUTO_LOGGING=1 python main.py --namestr=G-S-DG-CG --make_cont_grid --batch_size=128 --replay_size=100000 --hidden_size=64 --num_steps=100000 --policy=Gaussian --smol --comet --dense_goals --silent &
13 | #COMET_DISABLE_AUTO_LOGGING=1 python main.py --namestr=G-S-DG-CG --make_cont_grid --batch_size=128 --replay_size=100000 --hidden_size=64 --num_steps=100000 --policy=Gaussian --smol --comet --dense_goals --silent
14 |
15 | # Baby parameter sweep
16 | #COMET_DISABLE_AUTO_LOGGING=1 python main.py --namestr=G-S-CG --make_cont_grid --batch_size=128 --replay_size=100000 --hidden_size=64 --num_steps=200000 --policy=Gaussian --smol --comet --alpha=0.1 --silent &
17 | #COMET_DISABLE_AUTO_LOGGING=1 python main.py --namestr=G-S-CG --make_cont_grid --batch_size=128 --replay_size=100000 --hidden_size=64 --num_steps=200000 --policy=Gaussian --smol --comet --alpha=0.2 --silent &
18 | #COMET_DISABLE_AUTO_LOGGING=1 python main.py --namestr=G-S-CG --make_cont_grid --batch_size=128 --replay_size=100000 --hidden_size=64 --num_steps=200000 --policy=Gaussian --smol --comet --alpha=0.3 --silent &
19 | #COMET_DISABLE_AUTO_LOGGING=1 python main.py --namestr=G-S-CG --make_cont_grid --batch_size=128 --replay_size=100000 --hidden_size=64 --num_steps=200000 --policy=Gaussian --smol --comet --alpha=0.4 --silent &
20 | #COMET_DISABLE_AUTO_LOGGING=1 python main.py --namestr=G-S-CG --make_cont_grid --batch_size=128 --replay_size=100000 --hidden_size=64 --num_steps=200000 --policy=Gaussian --smol --comet --alpha=0.5 --silent
21 |
22 | COMET_DISABLE_AUTO_LOGGING=1 python main.py --namestr=G-S-CG --make_cont_grid --batch_size=128 --replay_size=100000 --hidden_size=64 --num_steps=250000 --policy=Gaussian --smol --comet --alpha=0.2 --silent --seed=0 &
23 | COMET_DISABLE_AUTO_LOGGING=1 python main.py --namestr=G-S-CG --make_cont_grid --batch_size=128 --replay_size=100000 --hidden_size=64 --num_steps=250000 --policy=Gaussian --smol --comet --alpha=0.2 --silent --seed=3 &
24 | COMET_DISABLE_AUTO_LOGGING=1 python main.py --namestr=G-S-CG --make_cont_grid --batch_size=128 --replay_size=100000 --hidden_size=64 --num_steps=250000 --policy=Gaussian --smol --comet --alpha=0.2 --silent --seed=4 &
--------------------------------------------------------------------------------
/pytorch-soft-actor-critic/settings.json:
--------------------------------------------------------------------------------
1 | {"username": "florl", "apikey": "q1ucYPwQZ5VndrGsUdtcHAv8y", "restapikey":"6KiaAJZv83a9meof3qkrFx15F", "project":"florl"}
--------------------------------------------------------------------------------
/pytorch-soft-actor-critic/utils.py:
--------------------------------------------------------------------------------
1 | import math
2 | import torch
3 |
4 | def create_log_gaussian(mean, log_std, t):
5 | quadratic = -((0.5 * (t - mean) / (log_std.exp())).pow(2))
6 | l = mean.shape
7 | log_z = log_std
8 | z = l[-1] * math.log(2 * math.pi)
9 | log_p = quadratic.sum(dim=-1) - log_z.sum(dim=-1) - 0.5 * z
10 | return log_p
11 |
12 | def logsumexp(inputs, dim=None, keepdim=False):
13 | if dim is None:
14 | inputs = inputs.view(-1)
15 | dim = 0
16 | s, _ = torch.max(inputs, dim=dim, keepdim=True)
17 | outputs = s + (inputs - s).exp().sum(dim=dim, keepdim=True).log()
18 | if not keepdim:
19 | outputs = outputs.squeeze(dim)
20 | return outputs
21 |
22 | def soft_update(target, source, tau):
23 | for target_param, param in zip(target.parameters(), source.parameters()):
24 | target_param.data.copy_(target_param.data * (1.0 - tau) + param.data * tau)
25 |
26 | def hard_update(target, source):
27 | for target_param, param in zip(target.parameters(), source.parameters()):
28 | target_param.data.copy_(param.data)
29 |
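
A tiny illustration of the two update helpers (the layer sizes and tau value are arbitrary):

```python
import torch.nn as nn
from utils import soft_update, hard_update

net = nn.Linear(4, 2)
target = nn.Linear(4, 2)

hard_update(target, net)         # target <- net (exact copy)
soft_update(target, net, 0.005)  # target <- 0.995 * target + 0.005 * net (polyak averaging)
```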
--------------------------------------------------------------------------------
/pytorch-vanilla-reinforce/README.md:
--------------------------------------------------------------------------------
1 | This implements basic REINFORCE, with and without a baseline value network, for continuous control using Gaussian policies.
2 |
3 | An example of how to run REINFORCE:
4 |
5 | ```bash
6 | > python main_reinforce.py --namestr="name of experiment" --env-name HalfCheetah-v1 --baseline True --num-episodes 4000
7 | ```
8 |
--------------------------------------------------------------------------------
/pytorch-vanilla-reinforce/main_reinforce.py:
--------------------------------------------------------------------------------
1 | import argparse
2 | from comet_ml import Experiment
3 | import torch
4 | from torch.autograd import Variable
5 | import torch.autograd as autograd
6 | import numpy as np
7 | import torch.nn as nn
8 | import json
9 | from reinforce_simple import REINFORCE
10 | import torch.nn.functional as F
11 | import torch.optim as optim
12 | from torch.distributions import Normal
13 | import mujoco_py
14 | import os
15 | import gym
16 | import ipdb
17 |
18 | def evaluate_policy(policy, env, eval_episodes=10):
19 |     '''
20 |     Print the average reward of the policy over eval_episodes evaluation runs.
21 |     '''
22 | avg_reward = 0.0
23 | for _ in range(eval_episodes):
24 | obs = env.reset()
25 | done = False
26 | while not done:
27 | action, log_prob, mean, std = policy.select_action(np.array(obs) )
28 | obs, reward, done, _ = env.step(action)
29 | avg_reward += reward
30 |
31 | avg_reward /= eval_episodes
32 | print("the average reward is: {0}".format(avg_reward))
33 | #return avg_reward
34 |
35 | def render_policy(policy, env):
36 | '''
37 | Function to see the policy in action
38 | '''
39 | obs = env.reset()
40 | done = False
41 | while not done:
42 | env.render()
43 | action,_,_,_ = policy.select_action(np.array(obs))
44 | obs, reward, done, _ = env.step(action)
45 |
46 | env.close()
47 |
48 | def main(args):
49 |
50 | # create env
51 | env = gym.make(args.env_name)
52 | env.seed(args.seed)
53 | torch.manual_seed(args.seed)
54 | np.random.seed(args.seed)
55 |
56 | # get env info
57 | state_dim = env.observation_space.shape[0]
58 | action_dim = env.action_space
59 | max_action = (env.action_space.high)
60 | min_action = (env.action_space.low)
61 |
62 | print("number of actions:{0}, dim of states: {1},\
63 | max_action:{2}, min_action: {3}".format(action_dim,state_dim,max_action,min_action))
64 |
65 | # setup comet_ml to track experiments
66 | if os.path.isfile("settings.json"):
67 | with open('settings.json') as f:
68 | data = json.load(f)
69 | args.comet_apikey = data["apikey"]
70 | args.comet_username = data["username"]
71 | else:
72 | raise NotImplementedError
73 |
74 | experiment = Experiment(api_key=args.comet_apikey,\
75 | project_name="florl",auto_output_logging="None",\
76 | workspace=args.comet_username,auto_metric_logging=False,\
77 | auto_param_logging=False)
78 | experiment.set_name(args.namestr)
79 | args.experiment = experiment
80 |
81 | # construct model
82 | hidden_size = args.hidden_size
83 | policy = REINFORCE(state_dim, hidden_size, action_dim, baseline = args.baseline)
84 |
85 | # start of experiment: Keep looping until desired amount of episodes reached
86 | max_episodes = args.num_episodes
87 | total_episodes = 0 # keep track of amount of episodes that we have done
88 |
89 | while total_episodes < max_episodes:
90 |
91 | obs = env.reset()
92 | done = False
93 | trajectory = [] # trajectory info for reinforce update
94 | episode_reward = 0 # keep track of rewards per episode
95 |
96 | while not done:
97 | action, ln_prob, mean, std = policy.select_action(np.array(obs))
98 | next_state, reward, done, _ = env.step(action)
99 | trajectory.append([np.array(obs), action, ln_prob, reward, next_state, done])
100 |
101 | obs = next_state
102 | episode_reward += reward
103 |
104 | total_episodes += 1
105 |
106 | if args.baseline:
107 | policy_loss, value_loss = policy.train(trajectory)
108 | experiment.log_metric("value function loss", value_loss, step = total_episodes)
109 | else:
110 | policy_loss = policy.train(trajectory)
111 |
112 | experiment.log_metric("policy loss",policy_loss, step = total_episodes)
113 | experiment.log_metric("episode reward", episode_reward, step =total_episodes)
114 |
115 |
116 | env.close()
117 |
118 |
119 | if __name__ == '__main__':
120 |
121 | """
122 | Process command-line arguments, then call main()
123 | """
124 | parser = argparse.ArgumentParser(description='PyTorch REINFORCE example')
125 | parser.add_argument('--env-name', default="HalfCheetah-v1",
126 | help='name of the environment to run')
127 | parser.add_argument('--seed', type=int, default=456, metavar='N',
128 | help='random seed (default: 456)')
129 |     parser.add_argument('--baseline', type=lambda s: str(s).lower() in ('true', '1', 'yes'), default=False, help='Whether to add a baseline to REINFORCE (plain type=bool would treat any non-empty string, including "False", as True)')
130 | parser.add_argument('--namestr', type=str, default='FloRL', \
131 | help='additional info in output filename to describe experiments')
132 | parser.add_argument('--num-episodes', type=int, default=2000, metavar='N',
133 | help='maximum number of episodes (default:2000)')
134 | parser.add_argument('--hidden-size', type=int, default=256, metavar='N',
135 | help='hidden size (default: 256)')
136 | args = parser.parse_args()
137 |
138 | main(args)
139 |
140 |
--------------------------------------------------------------------------------
/pytorch-vanilla-reinforce/reinforce_simple.py:
--------------------------------------------------------------------------------
1 | import torch
2 | from torch.autograd import Variable
3 | import torch.autograd as autograd
4 | import numpy as np
5 | import torch.nn as nn
6 | import torch.nn.functional as F
7 | import torch.optim as optim
8 | from torch.distributions import Normal
9 | import ipdb
10 |
11 | LOG_SIG_MAX = 2
12 | LOG_SIG_MIN = -20
13 | epsilon = 1e-6
14 |
15 | device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
16 |
17 | class Policy(nn.Module):
18 | '''
19 | Gaussian policy that consists of a neural network with 1 hidden layer that
20 | outputs mean and log std dev (the params) of a gaussian policy
21 | '''
22 |
23 | def __init__(self, num_inputs, hidden_size, action_space):
24 |
25 | super(Policy, self).__init__()
26 |
27 | self.action_space = action_space
28 | num_outputs = action_space.shape[0] # the number of output actions
29 |
30 | self.linear = nn.Linear(num_inputs, hidden_size)
31 | self.mean = nn.Linear(hidden_size, num_outputs)
32 | self.log_std = nn.Linear(hidden_size, num_outputs)
33 |
34 | def forward(self, inputs):
35 |
36 | # forward pass of NN
37 | x = inputs
38 | x = F.relu(self.linear(x))
39 |
40 | mean = self.mean(x)
41 | log_std = self.log_std(x) # if more than one action this will give you the diagonal elements of a diagonal covariance matrix
42 |         log_std = torch.clamp(log_std, min=LOG_SIG_MIN, max=LOG_SIG_MAX) # limit the std dev by clamping log_std to [LOG_SIG_MIN, LOG_SIG_MAX] = [-20, 2]
43 | std = log_std.exp()
44 |
45 | return mean, std
46 |
47 | class ValueNetwork(nn.Module):
48 | '''
49 | Value network V(s_t) = E[G_t | s_t] to use as a baseline in the reinforce
50 | update. This a Neural Net with 1 hidden layer
51 | '''
52 |
53 | def __init__(self, num_inputs, hidden_dim):
54 | super(ValueNetwork, self).__init__()
55 | self.linear1 = nn.Linear(num_inputs, hidden_dim)
56 | self.linear2 = nn.Linear(hidden_dim, 1)
57 |
58 | def forward(self, state):
59 |
60 | x = F.relu(self.linear1(state))
61 | x = self.linear2(x)
62 |
63 | return x
64 |
65 | class REINFORCE:
66 | '''
67 | Implementation of the basic online reinforce algorithm for Gaussian policies.
68 | '''
69 |
70 | def __init__(self, num_inputs, hidden_size, action_space, lr_pi = 3e-4,\
71 | lr_vf = 1e-3, baseline = False, gamma = 0.99, train_v_iters = 1):
72 |
73 | self.gamma = gamma
74 | self.action_space = action_space
75 | self.policy = Policy(num_inputs, hidden_size, action_space)
76 | self.policy_optimizer = optim.Adam(self.policy.parameters(), lr = lr_pi)
77 | self.baseline = baseline
78 | self.train_v_iters = train_v_iters # how many times you want to run update loop.
79 |
80 | # create value network if we want to use baseline
81 | if self.baseline:
82 | self.value_function = ValueNetwork(num_inputs, hidden_size)
83 | self.value_optimizer = optim.Adam(self.value_function.parameters(), lr = lr_vf)
84 |
85 | def select_action(self,state):
86 |
87 | state = torch.from_numpy(state).float().unsqueeze(0) # just to make it a Tensor obj
88 | # get mean and std
89 | mean, std = self.policy(state)
90 |
91 | # create normal distribution
92 | normal = Normal(mean, std)
93 |
94 | # sample action
95 | action = normal.sample()
96 |
97 | # get log prob of that action
98 | ln_prob = normal.log_prob(action)
99 | ln_prob = ln_prob.sum()
100 | # squeeze action into [-1,1]
101 | action = torch.tanh(action)
102 | # turn actions into numpy array
103 | action = action.numpy()
104 |
105 | return action[0], ln_prob, mean, std
106 |
107 | def train(self, trajectory):
108 |
109 | '''
110 | The training is done using the rewards-to-go formulation of the policy gradient update of Reinforce.
111 | If we are using a baseline, the value network is also trained.
112 |
113 |         trajectory: a list of the form [(state, action, ln p(a_t|s_t), reward, next_state, done), ...]
114 |
115 | '''
116 |
117 | log_probs = [item[2] for item in trajectory]
118 | rewards = [item[3] for item in trajectory]
119 | states = [item[0] for item in trajectory]
120 | actions = [item[1] for item in trajectory]
121 |
122 | #calculate rewards to go
123 | R = 0
124 | returns = []
125 | for r in rewards[::-1]:
126 |             R = r + self.gamma * R
127 | returns.insert(0, R)
128 |
129 | returns = torch.tensor(returns)
130 |
131 |         # train the Value Network and calculate Advantage
132 | if self.baseline:
133 |
134 | # loop over this a couple of times
135 | for _ in range(self.train_v_iters):
136 | # calculate loss of value function using mean squared error
137 | value_estimates = []
138 | for state in states:
139 | state = torch.from_numpy(state).float().unsqueeze(0) # just to make it a Tensor obj
140 | value_estimates.append( self.value_function(state) )
141 |
142 |                 value_estimates = torch.stack(value_estimates).squeeze() # value estimate for each state of the trajectory
143 |
144 | v_loss = F.mse_loss(value_estimates, returns)
145 | # update the weights
146 | self.value_optimizer.zero_grad()
147 | v_loss.backward()
148 | self.value_optimizer.step()
149 |
150 | # calculate advantage
151 | advantage = []
152 | for value, R in zip(value_estimates, returns):
153 | advantage.append(R - value)
154 |
155 | advantage = torch.Tensor(advantage)
156 |
157 |             # calculate policy loss
158 | policy_loss = []
159 | for log_prob, adv in zip(log_probs, advantage):
160 | policy_loss.append( - log_prob * adv)
161 |
162 |
163 | else:
164 | policy_loss = []
165 | for log_prob, R in zip(log_probs, returns):
166 | policy_loss.append( - log_prob * R)
167 |
168 |
169 | policy_loss = torch.stack( policy_loss ).sum()
170 | # update policy weights
171 | self.policy_optimizer.zero_grad()
172 | policy_loss.backward()
173 | self.policy_optimizer.step()
174 |
175 |
176 | if self.baseline:
177 | return policy_loss, v_loss
178 |
179 | else:
180 | return policy_loss
181 |
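
A standalone check of the rewards-to-go recursion used in `train` (the rewards below are made up for illustration):

```python
# G_t = r_t + gamma * G_{t+1}, computed by scanning the episode backwards.
gamma = 0.99
rewards = [1.0, 0.0, 2.0]  # made-up episode rewards

R, returns = 0.0, []
for r in reversed(rewards):
    R = r + gamma * R
    returns.insert(0, R)

print(returns)  # approximately [2.9602, 1.98, 2.0]
```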
--------------------------------------------------------------------------------