├── 00_generate_expert_trajectories.ipynb
├── 01_bc.ipynb
├── 02_dagger.ipynb
├── 03_maxent_irl.ipynb
├── 04_mce_irl.ipynb
├── 05_relent_irl.ipynb
├── 06_gail.ipynb
├── Readme.md
├── assets
│   ├── gridworld_definition.png
│   └── regularization_plot.png
├── expert_actor.pt
├── expert_data
│   └── ckpt0.pkl
├── expert_value.pt
├── gridworld.json
└── gridworld.py

/01_bc.ipynb:
--------------------------------------------------------------------------------
1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "id": "f65db394-c47a-49e9-b1d2-0b29aae5e186", 6 | "metadata": {}, 7 | "source": [ 8 | "# Behavior Cloning (BC)\n", 9 | "\n", 10 | "In Behavior Cloning (BC), we find the optimal parameters $\\theta$ of a policy $\\pi_{\\theta}$ by solving a regression (or classification) problem on the expert's dataset $\\mathcal{D}$ in a supervised manner.<br>
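Concretely, with a discrete action space this amounts to maximizing the likelihood of the expert's actions, i.e. solving

$$\min_{\theta} \; \mathbb{E}_{(s, a) \sim \mathcal{D}} \left[ -\log \pi_{\theta}(a \mid s) \right],$$

which for a categorical policy is exactly the cross-entropy loss used later in this notebook.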
\n", 11 | "Therefore, you can simply apply existing regression (or classification) methods - such as, Gaussian model, GMM, non-parametric method (LWR, GPR), or neural network learners.
\n", 12 | "See [my post](https://tsmatz.wordpress.com/2017/08/30/regression-in-machine-learning-math-for-beginners/) for the design choice of regression problems.\n", 13 | "\n", 14 | "In this notebook, I'll build neural network policy $\\pi_{\\theta}$ and then optimize parameters (weights) by minimizing cross-entropy loss in PyTorch.\n", 15 | "\n", 16 | "The trained policy is then available in regular reinforcement learning (RL) methods, if you refine models to get better performance. (See [here](https://github.com/tsmatz/reinforcement-learning-tutorials) for RL algorithms.)\n", 17 | "\n", 18 | "BC is a basic approach for imitation learning, and easily applied into the various scenarios.\n", 19 | "\n", 20 | "But it's worth noting that it also has the shortcomings to apply in some situations.
\n", 21 | "One of these is that the agent trained by BC might sometimes happens to encounter unknown states which are not included in the initial expert's behaviors. (Because expert dataset doesn't have enough data for failure scenarios.) In most cases, the trained agent in BC works well in success cases, but it fails when it encounters the irregular states.
\n", 22 | "In such cases, you can apply [DAgger](./02_dagger.ipynb) (next example), or the policy can be transferred to regular reinforcement learning after BC has been applied.\n", 23 | "\n", 24 | "Now let's start.\n", 25 | "\n", 26 | "*(back to [index](https://github.com/tsmatz/imitation-learning-tutorials/))*" 27 | ] 28 | }, 29 | { 30 | "cell_type": "markdown", 31 | "id": "62766662-86b8-4012-93b6-b5a36fbf273e", 32 | "metadata": {}, 33 | "source": [ 34 | "Before we start, we need to install the required packages." 35 | ] 36 | }, 37 | { 38 | "cell_type": "code", 39 | "execution_count": null, 40 | "id": "6e277d08-839a-42a9-98ee-2a6607044036", 41 | "metadata": {}, 42 | "outputs": [], 43 | "source": [ 44 | "!pip install torch numpy" 45 | ] 46 | }, 47 | { 48 | "cell_type": "markdown", 49 | "id": "ffda190f-7c7f-4836-8d4c-252e3c736d1e", 50 | "metadata": {}, 51 | "source": [ 52 | "## Restore environment" 53 | ] 54 | }, 55 | { 56 | "cell_type": "markdown", 57 | "id": "020bea39-9434-4a4d-bc08-7df367d3d764", 58 | "metadata": {}, 59 | "source": [ 60 | "Firstly, I restore GridWorld environment from JSON file.\n", 61 | "\n", 62 | "For details about this environment, see [Readme.md](https://github.com/tsmatz/imitation-learning-tutorials/blob/master/Readme.md).\n", 63 | "\n", 64 | "> Note : See [this script](./00_generate_expert_trajectories.ipynb) for generating the same environment." 65 | ] 66 | }, 67 | { 68 | "cell_type": "code", 69 | "execution_count": 1, 70 | "id": "211f0d0d-c49e-46b1-a827-cfbe1ebf04f4", 71 | "metadata": {}, 72 | "outputs": [], 73 | "source": [ 74 | "import torch\n", 75 | "import json\n", 76 | "from gridworld import GridWorld\n", 77 | "\n", 78 | "device = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")\n", 79 | "\n", 80 | "with open(\"gridworld.json\", \"r\") as f:\n", 81 | " json_object = json.load(f)\n", 82 | " env = GridWorld(**json_object, device=device)" 83 | ] 84 | }, 85 | { 86 | "cell_type": "markdown", 87 | "id": "6a56ebc3-071d-4429-a723-0ea430fff92a", 88 | "metadata": {}, 89 | "source": [ 90 | "Now I visualize our GridWorld environment.\n", 91 | "\n", 92 | "The number in each cell indicates the reward score on this state.
\n", 93 | "The goal state is on the right-bottom corner (in which the reward is ```10.0```), and the initial state is uniformly picked up from the gray-colored cells.
\n", 94 | "If the agent can reach to goal state without losing any rewards, it will get ```10.0``` for total reward.\n", 95 | "\n", 96 | "See [Readme.md](https://github.com/tsmatz/imitation-learning-tutorials/blob/master/Readme.md) for details about the game rule of this environment." 97 | ] 98 | }, 99 | { 100 | "cell_type": "code", 101 | "execution_count": 2, 102 | "id": "1da9931f-23e4-4ee7-a90c-41960bf431d7", 103 | "metadata": {}, 104 | "outputs": [ 105 | { 106 | "data": { 107 | "text/html": [ 108 | "
(HTML table output omitted: the rendered 50x50 GridWorld reward map. Each cell shows its reward of 0 or -1, the bottom-right goal cell shows 10, and the valid start states are shaded gray.)
" 109 | ], 110 | "text/plain": [ 111 | "" 112 | ] 113 | }, 114 | "metadata": {}, 115 | "output_type": "display_data" 116 | } 117 | ], 118 | "source": [ 119 | "from IPython.display import HTML, display\n", 120 | "\n", 121 | "valid_states_all = torch.cat((env.valid_states, torch.tensor([env.grid_size-1,env.grid_size-1]).to(device).unsqueeze(dim=0)))\n", 122 | "valid_states_all = valid_states_all[:,0] * env.grid_size + valid_states_all[:,1]\n", 123 | "\n", 124 | "html_text = \"\"\n", 125 | "for row in range(env.grid_size):\n", 126 | " html_text += \"\"\n", 127 | " for col in range(env.grid_size):\n", 128 | " if row*env.grid_size + col in valid_states_all:\n", 129 | " html_text += \"\"\n", 134 | " html_text += \"\"\n", 135 | "html_text += \"
\"\n", 130 | " else:\n", 131 | " html_text += \"\"\n", 132 | " html_text += str(env.reward_map[row*env.grid_size+col].tolist())\n", 133 | " html_text += \"
\"\n", 136 | "\n", 137 | "display(HTML(html_text))" 138 | ] 139 | }, 140 | { 141 | "cell_type": "markdown", 142 | "id": "010cb629-2b72-43da-8a13-ec2e79ce9d05", 143 | "metadata": {}, 144 | "source": [ 145 | "## Define policy" 146 | ] 147 | }, 148 | { 149 | "cell_type": "markdown", 150 | "id": "7c68de67-c4c7-445a-98ad-c48008785aea", 151 | "metadata": {}, 152 | "source": [ 153 | "Now I build a policy $\\pi_{\\theta}$.\n", 154 | "\n", 155 | "This network receives the current state (one-hot state) as input and returns the optimal action (action's logits) as output." 156 | ] 157 | }, 158 | { 159 | "cell_type": "code", 160 | "execution_count": 3, 161 | "id": "6f3fb3dd-3dc0-4411-94bf-e1c7a0df826f", 162 | "metadata": {}, 163 | "outputs": [], 164 | "source": [ 165 | "import torch.nn as nn\n", 166 | "from torch.nn import functional as F\n", 167 | "\n", 168 | "STATE_SIZE = env.grid_size*env.grid_size # 2500\n", 169 | "ACTION_SIZE = env.action_size # 4\n", 170 | "\n", 171 | "#\n", 172 | "# Define model\n", 173 | "#\n", 174 | "class PolicyNet(nn.Module):\n", 175 | " def __init__(self, hidden_dim=64):\n", 176 | " super().__init__()\n", 177 | " self.hidden = nn.Linear(STATE_SIZE, hidden_dim)\n", 178 | " self.classify = nn.Linear(hidden_dim, ACTION_SIZE)\n", 179 | "\n", 180 | " def forward(self, s):\n", 181 | " outs = self.hidden(s)\n", 182 | " outs = F.relu(outs)\n", 183 | " logits = self.classify(outs)\n", 184 | " return logits\n", 185 | "\n", 186 | "#\n", 187 | "# Generate model\n", 188 | "#\n", 189 | "policy_func = PolicyNet().to(device)" 190 | ] 191 | }, 192 | { 193 | "cell_type": "markdown", 194 | "id": "78cf5978-a292-47aa-9790-424560a93145", 195 | "metadata": {}, 196 | "source": [ 197 | "## Run agent before training" 198 | ] 199 | }, 200 | { 201 | "cell_type": "markdown", 202 | "id": "c8cf48e8-af1b-4600-811f-0afd401b43ed", 203 | "metadata": {}, 204 | "source": [ 205 | "For comparison, now I run this agent without any training.\n", 206 | "\n", 207 | "In this game, the maximum episode's reward without losing any rewards is ```10.0```. (See [Readme.md](https://github.com/tsmatz/imitation-learning-tutorials/blob/master/Readme.md) for game rule in this environment.)
\n", 208 | "As you can see below, it has low average of rewards." 209 | ] 210 | }, 211 | { 212 | "cell_type": "code", 213 | "execution_count": 4, 214 | "id": "da7720a0-cd39-4b3e-ab8a-4bb3932f4ae8", 215 | "metadata": {}, 216 | "outputs": [], 217 | "source": [ 218 | "# Pick stochastic samples with policy model\n", 219 | "def pick_sample_and_logits(policy, s):\n", 220 | " \"\"\"\n", 221 | " Stochastically pick up action and logits with policy model.\n", 222 | "\n", 223 | " Parameters\n", 224 | " ----------\n", 225 | " policy : torch.nn.Module\n", 226 | " Policy network to use\n", 227 | " s : torch.tensor((..., STATE_SIZE), dtype=int)\n", 228 | " The feature (one-hot) of state.\n", 229 | " The above \"...\" can have arbitrary shape with 0 or 1 dimension.\n", 230 | "\n", 231 | " Returns\n", 232 | " ----------\n", 233 | " action : torch.tensor((...), dtype=int)\n", 234 | " The picked-up actions.\n", 235 | " If input shape is (*, STATE_SIZE), this shape becomes (*).\n", 236 | " logits : torch.tensor((..., ACTION_SIZE), dtype=float)\n", 237 | " Logits of categorical distribution (used to optimize model).\n", 238 | " If input shape is (*, STATE_SIZE), this shape becomes (*, ACTION_SIZE).\n", 239 | " \"\"\"\n", 240 | " # get logits from state\n", 241 | " # --> size : (*, ACTION_SIZE)\n", 242 | " logits = policy(s.float())\n", 243 | " # from logits to probabilities\n", 244 | " # --> size : (*, ACTION_SIZE)\n", 245 | " probs = F.softmax(logits, dim=-1)\n", 246 | " # pick up action's sample\n", 247 | " # --> size : (*, 1)\n", 248 | " a = torch.multinomial(probs, num_samples=1)\n", 249 | " # --> size : (*)\n", 250 | " a = a.squeeze()\n", 251 | "\n", 252 | " # Return\n", 253 | " return a, logits" 254 | ] 255 | }, 256 | { 257 | "cell_type": "code", 258 | "execution_count": 5, 259 | "id": "e1456ab6-17f0-4921-9c9b-93674a7f2b16", 260 | "metadata": {}, 261 | "outputs": [ 262 | { 263 | "name": "stdout", 264 | "output_type": "stream", 265 | "text": [ 266 | "Estimated rewards (before training): -67.3\n" 267 | ] 268 | } 269 | ], 270 | "source": [ 271 | "def evaluate(policy, batch_size):\n", 272 | " total_reward = torch.tensor(0.0).to(device)\n", 273 | " s = env.reset(batch_size)\n", 274 | " while True:\n", 275 | " s_onehot = F.one_hot(s, num_classes=STATE_SIZE)\n", 276 | " with torch.no_grad():\n", 277 | " a, _ = pick_sample_and_logits(policy, s_onehot)\n", 278 | " s, r, term, trunc = env.step(a, s)\n", 279 | " total_reward += torch.sum(r)\n", 280 | " done = torch.logical_or(term, trunc)\n", 281 | " work_indices = (done==False).nonzero().squeeze(dim=-1)\n", 282 | " if not (len(work_indices) > 0):\n", 283 | " break;\n", 284 | " s = s[work_indices]\n", 285 | " return total_reward.item() / batch_size\n", 286 | "\n", 287 | "avg_reward = evaluate(policy_func, 300)\n", 288 | "print(f\"Estimated rewards (before training): {avg_reward}\")" 289 | ] 290 | }, 291 | { 292 | "cell_type": "markdown", 293 | "id": "e6d0eff9-1759-4ea3-bba0-65f3b509f683", 294 | "metadata": {}, 295 | "source": [ 296 | "## Train policy\n", 297 | "\n", 298 | "Now we train our policy with expert data.\n", 299 | "\n", 300 | "> Note : The expert data is located in ```./expert_data``` folder in this repository. See [this script](./00_generate_expert_trajectories.ipynb) for generating expert dataset.\n", 301 | "\n", 302 | "In this training, I compute cross-entropy loss for categorical distribution and then optimize the policy with only expert dataset.
\n", 303 | "Unlike [reinforcement learning](https://github.com/tsmatz/reinforcement-learning-tutorials), the reward is unknown in this training.\n", 304 | "\n", 305 | "As you can see below, the average reward becomes high, and the policy is well-trained. (See [Readme.md](https://github.com/tsmatz/imitation-learning-tutorials/blob/master/Readme.md) for game rule in this environment.)\n", 306 | "\n", 307 | "> Note : You can run as a batch to speed up training. (Here I get loss one by one, because the training is very easy.)" 308 | ] 309 | }, 310 | { 311 | "cell_type": "code", 312 | "execution_count": 6, 313 | "id": "088aee9f-0f53-40e2-b5c9-2ac7d81d60f2", 314 | "metadata": {}, 315 | "outputs": [ 316 | { 317 | "name": "stdout", 318 | "output_type": "stream", 319 | "text": [ 320 | "Processed 1000 episodes in checkpoint ckpt0.pkl...\n", 321 | "Evaluation result (Average reward): 3.13\n", 322 | "Processed 2000 episodes in checkpoint ckpt0.pkl...\n", 323 | "Evaluation result (Average reward): 7.215\n", 324 | "Processed 3000 episodes in checkpoint ckpt0.pkl...\n", 325 | "Evaluation result (Average reward): 7.58\n", 326 | "Processed 4000 episodes in checkpoint ckpt0.pkl...\n", 327 | "Evaluation result (Average reward): 8.285\n", 328 | "Processed 5000 episodes in checkpoint ckpt0.pkl...\n", 329 | "Evaluation result (Average reward): 8.225\n", 330 | "Processed 6000 episodes in checkpoint ckpt0.pkl...\n", 331 | "Evaluation result (Average reward): 8.575\n", 332 | "Processed 7000 episodes in checkpoint ckpt0.pkl...\n", 333 | "Evaluation result (Average reward): 8.485\n", 334 | "Processed 8000 episodes in checkpoint ckpt0.pkl...\n", 335 | "Evaluation result (Average reward): 8.715\n", 336 | "Processed 9000 episodes in checkpoint ckpt0.pkl...\n", 337 | "Evaluation result (Average reward): 8.63\n", 338 | "Processed 10000 episodes in checkpoint ckpt0.pkl...\n", 339 | "Evaluation result (Average reward): 8.705\n" 340 | ] 341 | } 342 | ], 343 | "source": [ 344 | "import pickle\n", 345 | "\n", 346 | "# use the following expert dataset\n", 347 | "dest_dir = \"./expert_data\"\n", 348 | "checkpoint_files = [\"ckpt0.pkl\"]\n", 349 | "\n", 350 | "# create optimizer\n", 351 | "opt = torch.optim.AdamW(policy_func.parameters(), lr=0.001)\n", 352 | "\n", 353 | "for ckpt in checkpoint_files:\n", 354 | " # load expert data from pickle\n", 355 | " with open(f\"{dest_dir}/{ckpt}\", \"rb\") as f:\n", 356 | " all_data = pickle.load(f)\n", 357 | " all_states = all_data[\"states\"]\n", 358 | " all_actions = all_data[\"actions\"]\n", 359 | " timestep_lens = all_data[\"timestep_lens\"]\n", 360 | " # loop all episodes in demonstration\n", 361 | " current_timestep = 0\n", 362 | " for i, timestep_len in enumerate(timestep_lens):\n", 363 | " # pick up states and actions in a single episode\n", 364 | " states = all_states[current_timestep:current_timestep+timestep_len]\n", 365 | " actions = all_actions[current_timestep:current_timestep+timestep_len]\n", 366 | " # collect loss and optimize (train)\n", 367 | " opt.zero_grad()\n", 368 | " loss = []\n", 369 | " for s, a in zip(states, actions):\n", 370 | " s_onehot = F.one_hot(torch.tensor(s).to(device), num_classes=STATE_SIZE)\n", 371 | " _, logits = pick_sample_and_logits(policy_func, s_onehot)\n", 372 | " l = F.cross_entropy(logits, torch.tensor(a).to(device), reduction=\"none\")\n", 373 | " loss.append(l)\n", 374 | " total_loss = torch.stack(loss, dim=0)\n", 375 | " total_loss.sum().backward()\n", 376 | " opt.step()\n", 377 | " # log\n", 378 | " print(\"Processed {:5d} episodes in 
checkpoint {}...\".format(i + 1, ckpt), end=\"\\r\")\n", 379 | " # run evaluation in each 1000 episodes\n", 380 | " if i % 1000 == 999:\n", 381 | " avg = evaluate(policy_func, 200)\n", 382 | " print(f\"\\nEvaluation result (Average reward): {avg}\")\n", 383 | " # proceed to next episode\n", 384 | " current_timestep += timestep_len" 385 | ] 386 | }, 387 | { 388 | "cell_type": "code", 389 | "execution_count": null, 390 | "id": "5f2a9ea4-6fca-4934-b2a7-8c7f8e4af359", 391 | "metadata": {}, 392 | "outputs": [], 393 | "source": [] 394 | } 395 | ], 396 | "metadata": { 397 | "kernelspec": { 398 | "display_name": "Python 3 (ipykernel)", 399 | "language": "python", 400 | "name": "python3" 401 | }, 402 | "language_info": { 403 | "codemirror_mode": { 404 | "name": "ipython", 405 | "version": 3 406 | }, 407 | "file_extension": ".py", 408 | "mimetype": "text/x-python", 409 | "name": "python", 410 | "nbconvert_exporter": "python", 411 | "pygments_lexer": "ipython3", 412 | "version": "3.10.12" 413 | } 414 | }, 415 | "nbformat": 4, 416 | "nbformat_minor": 5 417 | } 418 | -------------------------------------------------------------------------------- /02_dagger.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "id": "e7cae5cc-fd75-40a1-80b2-bd48c6e69013", 6 | "metadata": {}, 7 | "source": [ 8 | "# DAgger (Dataset Aggregation)\n", 9 | "\n", 10 | "In Behavioral Cloning (BC), we have seen the method to learn expert's behavior with a manner of solving regression/classification problems.\n", 11 | "\n", 12 | "However, it sometimes happens that the learner encounters unknown states that the expert never encounters in her/his successful demonstrations. A small error at early time-step may then lead to further mistakes, leading to poor performance.
\n", 13 | "DAgger addresses this problem by collecting the additional dataset of trajectories under the state distribution induced by the learned policy.\n", 14 | "\n", 15 | "Unlike previous behavior cloning (BC) example, DAgger queries an expert **online** for demonstrations.
\n", 16 | "This eventually reduces the size of training demonstrations to obtain satisfactory performance (compared to vanilla behavior cloning), but it's limited to the case in which the trainer can query an expert online (interactively) for demonstrations.\n", 17 | "\n", 18 | "I note that DAgger essentially differs from the incremental learning, in which RL method follows to refine the trained policy. Unlike reinforcement learning (RL), DAgger doesn't also use the rewards to refine the policy.\n", 19 | "\n", 20 | "> Note : However, you can progressively combine behavior cloning (BC), data aggregation (DAgger), and reinforcement learning (RL) for refinement to improve performance.
\n", 21 | "> See [here](https://developer.nvidia.com/blog/training-sim-to-real-transferable-robotic-assembly-skills-over-diverse-geometries/) for real Robotics example in NVIDIA.\n", 22 | "\n", 23 | "Now let's start.\n", 24 | "\n", 25 | "*(back to [index](https://github.com/tsmatz/imitation-learning-tutorials/))*" 26 | ] 27 | }, 28 | { 29 | "cell_type": "markdown", 30 | "id": "715f1e95-d88f-4a41-bd0f-9f5b6885cacf", 31 | "metadata": {}, 32 | "source": [ 33 | "Before we start, we need to install the required packages." 34 | ] 35 | }, 36 | { 37 | "cell_type": "code", 38 | "execution_count": null, 39 | "id": "d35dde25-4a6e-4eb1-8fd3-9dc34cb81b58", 40 | "metadata": {}, 41 | "outputs": [], 42 | "source": [ 43 | "!pip install torch numpy matplotlib" 44 | ] 45 | }, 46 | { 47 | "cell_type": "markdown", 48 | "id": "f9beeb14-fa75-4f68-9b50-bd38633ecf83", 49 | "metadata": {}, 50 | "source": [ 51 | "## Restore environment" 52 | ] 53 | }, 54 | { 55 | "cell_type": "markdown", 56 | "id": "3d2413be-f2bd-4a69-befe-73179832b739", 57 | "metadata": {}, 58 | "source": [ 59 | "Firstly, I restore GridWorld environment from JSON file.\n", 60 | "\n", 61 | "For details about this environment, see [Readme.md](https://github.com/tsmatz/imitation-learning-tutorials/blob/master/Readme.md).\n", 62 | "\n", 63 | "> Note : See [this script](./00_generate_expert_trajectories.ipynb) for generating the same environment." 64 | ] 65 | }, 66 | { 67 | "cell_type": "code", 68 | "execution_count": 1, 69 | "id": "5243b603-c3e9-4bbe-85d5-37d4101fae7f", 70 | "metadata": {}, 71 | "outputs": [], 72 | "source": [ 73 | "import torch\n", 74 | "import json\n", 75 | "from gridworld import GridWorld\n", 76 | "\n", 77 | "device = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")\n", 78 | "\n", 79 | "with open(\"gridworld.json\", \"r\") as f:\n", 80 | " json_object = json.load(f)\n", 81 | " env = GridWorld(**json_object, device=device)" 82 | ] 83 | }, 84 | { 85 | "cell_type": "markdown", 86 | "id": "6791fd52-4b0e-445b-8d54-f07870dfd6c3", 87 | "metadata": {}, 88 | "source": [ 89 | "Now I visualize our GridWorld environment.\n", 90 | "\n", 91 | "The number in each cell indicates the reward score on this state.
\n", 92 | "The goal state is on the right-bottom corner (in which the reward is ```10.0```), and the initial state is uniformly picked up from the gray-colored cells.
\n", 93 | "If the agent can reach to goal state without losing any rewards, it will get ```10.0``` for total reward.\n", 94 | "\n", 95 | "See [Readme.md](https://github.com/tsmatz/imitation-learning-tutorials/blob/master/Readme.md) for details about the game rule of this environment." 96 | ] 97 | }, 98 | { 99 | "cell_type": "code", 100 | "execution_count": 2, 101 | "id": "86cb09b4-9b41-4a1c-8a3e-b9a6f73d8252", 102 | "metadata": {}, 103 | "outputs": [ 104 | { 105 | "data": { 106 | "text/html": [ 107 | "
(HTML table output omitted: the rendered 50x50 GridWorld reward map, identical to the one shown in the BC notebook. Each cell shows its reward of 0 or -1, the bottom-right goal cell shows 10, and the valid start states are shaded gray.)
" 108 | ], 109 | "text/plain": [ 110 | "" 111 | ] 112 | }, 113 | "metadata": {}, 114 | "output_type": "display_data" 115 | } 116 | ], 117 | "source": [ 118 | "from IPython.display import HTML, display\n", 119 | "\n", 120 | "valid_states_all = torch.cat((env.valid_states, torch.tensor([env.grid_size-1,env.grid_size-1]).to(device).unsqueeze(dim=0)))\n", 121 | "valid_states_all = valid_states_all[:,0] * env.grid_size + valid_states_all[:,1]\n", 122 | "\n", 123 | "html_text = \"\"\n", 124 | "for row in range(env.grid_size):\n", 125 | " html_text += \"\"\n", 126 | " for col in range(env.grid_size):\n", 127 | " if row*env.grid_size + col in valid_states_all:\n", 128 | " html_text += \"\"\n", 133 | " html_text += \"\"\n", 134 | "html_text += \"
\"\n", 129 | " else:\n", 130 | " html_text += \"\"\n", 131 | " html_text += str(env.reward_map[row*env.grid_size+col].tolist())\n", 132 | " html_text += \"
\"\n", 135 | "\n", 136 | "display(HTML(html_text))" 137 | ] 138 | }, 139 | { 140 | "cell_type": "markdown", 141 | "id": "8892d179-c4a7-4274-b682-dddd8349182f", 142 | "metadata": {}, 143 | "source": [ 144 | "## Define policy" 145 | ] 146 | }, 147 | { 148 | "cell_type": "markdown", 149 | "id": "605ebd18-e547-464a-8977-819f298b0259", 150 | "metadata": {}, 151 | "source": [ 152 | "Now I build a policy $\\pi$.\n", 153 | "\n", 154 | "In DAgger, we need both expert policy and learner policy.
\n", 155 | "The expert policy is already generated in [this script](./00_generate_expert_trajectories.ipynb) and the trained parameters are saved as ```expert_actor.pt``` in this repository.\n", 156 | "\n", 157 | "This network receives the current state (one-hot state) as input and returns the optimal action (action's logits) as output." 158 | ] 159 | }, 160 | { 161 | "cell_type": "code", 162 | "execution_count": 3, 163 | "id": "b5d9fdbd-a62d-4768-aec5-39c75cc29dee", 164 | "metadata": {}, 165 | "outputs": [], 166 | "source": [ 167 | "import torch.nn as nn\n", 168 | "from torch.nn import functional as F\n", 169 | "\n", 170 | "STATE_SIZE = env.grid_size*env.grid_size # 2500\n", 171 | "ACTION_SIZE = env.action_size # 4\n", 172 | "\n", 173 | "# Recover policy for expert\n", 174 | "# (See 00_generate_expert_trajectories.ipynb.)\n", 175 | "class ExpertNet(nn.Module):\n", 176 | " def __init__(self, hidden_dim=16):\n", 177 | " super().__init__()\n", 178 | " self.output = nn.Linear(STATE_SIZE, ACTION_SIZE, bias=False)\n", 179 | "\n", 180 | " def forward(self, state):\n", 181 | " logits = self.output(state)\n", 182 | " return logits\n", 183 | "\n", 184 | "# Define policy for learner\n", 185 | "class LearnerNet(nn.Module):\n", 186 | " def __init__(self, hidden_dim=64):\n", 187 | " super().__init__()\n", 188 | " self.hidden = nn.Linear(STATE_SIZE, hidden_dim)\n", 189 | " self.classify = nn.Linear(hidden_dim, ACTION_SIZE)\n", 190 | "\n", 191 | " def forward(self, s):\n", 192 | " outs = self.hidden(s)\n", 193 | " outs = F.relu(outs)\n", 194 | " logits = self.classify(outs)\n", 195 | " return logits" 196 | ] 197 | }, 198 | { 199 | "cell_type": "code", 200 | "execution_count": 4, 201 | "id": "f9fe58c8-2d6d-4577-be22-2a2a0e48faf9", 202 | "metadata": {}, 203 | "outputs": [], 204 | "source": [ 205 | "# load expert model and freeze\n", 206 | "expert_func = ExpertNet().to(device)\n", 207 | "expert_func.load_state_dict(torch.load(\"expert_actor.pt\"))\n", 208 | "for param in expert_func.parameters():\n", 209 | " param.requires_grad = False\n", 210 | "\n", 211 | "# load learner model\n", 212 | "learner_func = LearnerNet().to(device)" 213 | ] 214 | }, 215 | { 216 | "cell_type": "markdown", 217 | "id": "5af6b909-fb3a-4c7b-be57-9e9d2cdff393", 218 | "metadata": {}, 219 | "source": [ 220 | "## Run agent before training" 221 | ] 222 | }, 223 | { 224 | "cell_type": "markdown", 225 | "id": "bb8bc5a6-b58c-49b2-9975-d6645e2afa5b", 226 | "metadata": {}, 227 | "source": [ 228 | "For comparison, now I run this agent without any training.\n", 229 | "\n", 230 | "In this game, the maximum episode's reward without losing any rewards is ```10.0```. (See [Readme.md](https://github.com/tsmatz/imitation-learning-tutorials/blob/master/Readme.md) for game rule in this environment.)
\n", 231 | "As you can see below, it has low average of rewards." 232 | ] 233 | }, 234 | { 235 | "cell_type": "code", 236 | "execution_count": 5, 237 | "id": "418d8cdc-cb9b-4f2a-836a-e038e561b32b", 238 | "metadata": {}, 239 | "outputs": [], 240 | "source": [ 241 | "# Pick stochastic samples with policy model\n", 242 | "def pick_sample(policy, s):\n", 243 | " \"\"\"\n", 244 | " Stochastically pick up action and logits with policy model.\n", 245 | "\n", 246 | " Parameters\n", 247 | " ----------\n", 248 | " policy : torch.nn.Module\n", 249 | " Policy network to use\n", 250 | " s : torch.tensor((..., STATE_SIZE), dtype=int)\n", 251 | " The feature (one-hot) of state.\n", 252 | " The above \"...\" can have arbitrary shape with 0 or 1 dimension.\n", 253 | "\n", 254 | " Returns\n", 255 | " ----------\n", 256 | " action : torch.tensor((...), dtype=int)\n", 257 | " The picked-up actions.\n", 258 | " If input shape is (*, STATE_SIZE), this shape becomes (*).\n", 259 | " \"\"\"\n", 260 | " # get logits from state\n", 261 | " # --> size : (*, ACTION_SIZE)\n", 262 | " logits = policy(s.float())\n", 263 | " # from logits to probabilities\n", 264 | " # --> size : (*, ACTION_SIZE)\n", 265 | " probs = F.softmax(logits, dim=-1)\n", 266 | " # pick up action's sample\n", 267 | " # --> size : (*, 1)\n", 268 | " a = torch.multinomial(probs, num_samples=1)\n", 269 | " # --> size : (*)\n", 270 | " a = a.squeeze()\n", 271 | "\n", 272 | " # Return\n", 273 | " return a" 274 | ] 275 | }, 276 | { 277 | "cell_type": "code", 278 | "execution_count": 6, 279 | "id": "e57b8b92-2df9-4b44-b4ce-14b123cc7e85", 280 | "metadata": {}, 281 | "outputs": [ 282 | { 283 | "name": "stdout", 284 | "output_type": "stream", 285 | "text": [ 286 | "Estimated rewards (before training): -69.95\n" 287 | ] 288 | } 289 | ], 290 | "source": [ 291 | "def evaluate(policy, batch_size):\n", 292 | " total_reward = torch.tensor(0.0).to(device)\n", 293 | " s = env.reset(batch_size)\n", 294 | " while True:\n", 295 | " s_onehot = F.one_hot(s, num_classes=STATE_SIZE)\n", 296 | " with torch.no_grad():\n", 297 | " a = pick_sample(policy, s_onehot)\n", 298 | " s, r, term, trunc = env.step(a, s)\n", 299 | " total_reward += torch.sum(r)\n", 300 | " done = torch.logical_or(term, trunc)\n", 301 | " work_indices = (done==False).nonzero().squeeze(dim=-1)\n", 302 | " if not (len(work_indices) > 0):\n", 303 | " break;\n", 304 | " s = s[work_indices]\n", 305 | " return total_reward.item() / batch_size\n", 306 | "\n", 307 | "avg_reward = evaluate(learner_func, 300)\n", 308 | "print(f\"Estimated rewards (before training): {avg_reward}\")" 309 | ] 310 | }, 311 | { 312 | "cell_type": "markdown", 313 | "id": "5a25364e-c584-47cb-b090-dc97febf3863", 314 | "metadata": {}, 315 | "source": [ 316 | "## Train policy with DAgger" 317 | ] 318 | }, 319 | { 320 | "cell_type": "markdown", 321 | "id": "eff25e4b-aaec-40e5-81a9-7737a1501004", 322 | "metadata": {}, 323 | "source": [ 324 | "In DAgger, the following stochastic mixing $\\pi_i$ (where $i$ is the number of training iteration) between expert policy $\\pi^*$ and learner policy $\\hat{\\pi}_i$ is used to collect the visited states. (See [original paper](https://arxiv.org/pdf/1011.0686).)\n", 325 | "\n", 326 | "$\\pi_i = \\beta_i \\pi^* + (1 - \\beta_i) \\hat{\\pi}_i$\n", 327 | "\n", 328 | "where $\\beta_i$ satisfies $\\frac{1}{N} \\sum_{i=1}^{N} \\beta_i \\to 0$.\n", 329 | "\n", 330 | "In this example, I briefly set $\\beta_i = p^{i - 1}$ where $0 \\lt p \\lt 1$." 
331 | ] 332 | }, 333 | { 334 | "cell_type": "code", 335 | "execution_count": 7, 336 | "id": "75d853dd-d2b2-41c8-ab1b-f922253443ca", 337 | "metadata": {}, 338 | "outputs": [], 339 | "source": [ 340 | "def get_beta(p, iter_num):\n", 341 | " return p**iter_num" 342 | ] 343 | }, 344 | { 345 | "cell_type": "markdown", 346 | "id": "28619b24-d5b0-402c-9a6f-fe5e5f2a2b85", 347 | "metadata": {}, 348 | "source": [ 349 | "Now I create a function to collect the visited states by applying the stochastic mixing with $\\beta_i$.\n", 350 | "\n", 351 | "> Note : You can run as a batch to speed up training. (Here I run each inference one by one.)" 352 | ] 353 | }, 354 | { 355 | "cell_type": "code", 356 | "execution_count": 8, 357 | "id": "8c87fb33-348f-44c6-852f-f6147341fb4c", 358 | "metadata": {}, 359 | "outputs": [], 360 | "source": [ 361 | "import random\n", 362 | "\n", 363 | "def collect_states(expert, learner, p_for_beta, i, collect_dat):\n", 364 | " \"\"\"\n", 365 | " Collect visited states with mixing policies on i-th iteration\n", 366 | "\n", 367 | " Parameters\n", 368 | " ----------\n", 369 | " expert : torch.nn.Module\n", 370 | " Online expert policy.\n", 371 | " learner : torch.nn.Module\n", 372 | " Learner policy.\n", 373 | " p_for_beta : int\n", 374 | " Base p value used in get_beta() function.\n", 375 | " i : int\n", 376 | " The number of training iteration\n", 377 | " (which is passed as iter_num in get_beta() function)\n", 378 | " collect_dat : int\n", 379 | " The number of data to collect.\n", 380 | "\n", 381 | " Returns\n", 382 | " ----------\n", 383 | " states : tensor of int[collect_dat, STATE_SIZE]\n", 384 | " Collected onehot states. (The number of row is collect_dat.)\n", 385 | " \"\"\"\n", 386 | "\n", 387 | " collected_states = []\n", 388 | " beta = get_beta(p_for_beta, i)\n", 389 | " done = True\n", 390 | " while len(collected_states) < collect_dat:\n", 391 | " if done:\n", 392 | " s = env.reset(batch_size=1)\n", 393 | " done = False\n", 394 | " s_onehot = F.one_hot(s.squeeze(dim=0), num_classes=STATE_SIZE)\n", 395 | " collected_states.append(s_onehot)\n", 396 | " rnd = random.uniform(0, 1)\n", 397 | " policy = expert if rnd < beta else learner\n", 398 | " with torch.no_grad():\n", 399 | " a = pick_sample(policy, s_onehot)\n", 400 | " s, _, term, trunc = env.step(a.unsqueeze(dim=0), s)\n", 401 | " done = torch.logical_or(term, trunc).squeeze(dim=0)\n", 402 | " return torch.stack(collected_states, dim=0)" 403 | ] 404 | }, 405 | { 406 | "cell_type": "markdown", 407 | "id": "e31454b5-6c2e-4fb3-8a96-2e25ef3ab3be", 408 | "metadata": {}, 409 | "source": [ 410 | "Now we start training with DAgger.\n", 411 | "\n", 412 | "> Note : To simplify, here I store all history in memory, but in practice, please save on disk to prevent the failure of memory allocation." 
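As a minimal sketch of the on-disk alternative mentioned in the note above, each aggregation round could dump its newly collected pairs to a file instead of keeping the whole history in memory (hypothetical file layout, not used by the code below):

```python
import os
import pickle

def save_aggregated(iteration, visited_states, expert_actions, dest_dir="./dagger_data"):
    # Persist one round of aggregated (state, expert action) pairs to disk.
    os.makedirs(dest_dir, exist_ok=True)
    with open(os.path.join(dest_dir, f"aggregate_{iteration}.pkl"), "wb") as f:
        pickle.dump({"states": visited_states.cpu(), "actions": expert_actions.cpu()}, f)
```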
413 | ] 414 | }, 415 | { 416 | "cell_type": "code", 417 | "execution_count": 9, 418 | "id": "25168c4b-2f10-490a-90b8-fc2238649d22", 419 | "metadata": {}, 420 | "outputs": [ 421 | { 422 | "name": "stdout", 423 | "output_type": "stream", 424 | "text": [ 425 | "Iteration 1 - Evaluation result (Average reward) -66.795\n", 426 | "Iteration 2 - Evaluation result (Average reward) -62.37\n", 427 | "Iteration 3 - Evaluation result (Average reward) -53.07\n", 428 | "Iteration 4 - Evaluation result (Average reward) -40.26\n", 429 | "Iteration 5 - Evaluation result (Average reward) -34.41\n", 430 | "Iteration 6 - Evaluation result (Average reward) -28.63\n", 431 | "Iteration 7 - Evaluation result (Average reward) -20.155\n", 432 | "Iteration 8 - Evaluation result (Average reward) -13.59\n", 433 | "Iteration 9 - Evaluation result (Average reward) -7.31\n", 434 | "Iteration 10 - Evaluation result (Average reward) -3.105\n", 435 | "Iteration 11 - Evaluation result (Average reward) 0.685\n", 436 | "Iteration 12 - Evaluation result (Average reward) 2.78\n", 437 | "Iteration 13 - Evaluation result (Average reward) 4.215\n", 438 | "Iteration 14 - Evaluation result (Average reward) 5.325\n", 439 | "Iteration 15 - Evaluation result (Average reward) 4.795\n", 440 | "Iteration 16 - Evaluation result (Average reward) 6.05\n", 441 | "Iteration 17 - Evaluation result (Average reward) 6.605\n", 442 | "Iteration 18 - Evaluation result (Average reward) 6.235\n", 443 | "Iteration 19 - Evaluation result (Average reward) 7.415\n", 444 | "Iteration 20 - Evaluation result (Average reward) 7.28\n", 445 | "Iteration 21 - Evaluation result (Average reward) 7.5\n", 446 | "Iteration 22 - Evaluation result (Average reward) 7.79\n", 447 | "Iteration 23 - Evaluation result (Average reward) 7.375\n", 448 | "Iteration 24 - Evaluation result (Average reward) 8.04\n", 449 | "Iteration 25 - Evaluation result (Average reward) 7.46\n", 450 | "Iteration 26 - Evaluation result (Average reward) 8.135\n", 451 | "Iteration 27 - Evaluation result (Average reward) 7.965\n", 452 | "Iteration 28 - Evaluation result (Average reward) 8.145\n", 453 | "Iteration 29 - Evaluation result (Average reward) 8.175\n", 454 | "Iteration 30 - Evaluation result (Average reward) 8.135\n" 455 | ] 456 | } 457 | ], 458 | "source": [ 459 | "#\n", 460 | "# Define train() function\n", 461 | "#\n", 462 | "def train(expert, learner, p_for_beta, collect_dat, total_aggregate, train_batch_size, verbose=False):\n", 463 | " \"\"\"\n", 464 | " Train with DAgger.\n", 465 | "\n", 466 | " Parameters\n", 467 | " ----------\n", 468 | " expert : torch.nn.Module\n", 469 | " Online expert policy.\n", 470 | " learner : torch.nn.Module\n", 471 | " Learner policy.\n", 472 | " p_for_beta : int\n", 473 | " Base p value used in get_beta() function.\n", 474 | " collect_dat : int\n", 475 | " The number of data to collect in each data aggregation.\n", 476 | " total_aggregate : int\n", 477 | " The number of aggregation.\n", 478 | " As a result, the number of total collected data will be collect_dat * total_aggregate.\n", 479 | " train_batch_size : int\n", 480 | " The number of data in each training batch.\n", 481 | " The collected data is divided into this chunk as a batch, and each batch is then trained.\n", 482 | " verbose : bool\n", 483 | " Whether to print results or not\n", 484 | "\n", 485 | " Returns\n", 486 | " ----------\n", 487 | " logs : int[total_aggregate]\n", 488 | " Returns the results of evaluated rewards in each aggregation.\n", 489 | " \"\"\"\n", 490 | "\n", 491 | " reward_log 
= []\n", 492 | " visited_states_all = torch.empty((0,STATE_SIZE), dtype=torch.int64).to(device)\n", 493 | " expert_actions_all = torch.empty((0), dtype=torch.int64).to(device)\n", 494 | "\n", 495 | " # create optimizer\n", 496 | " opt = torch.optim.AdamW(learner.parameters(), lr=0.001)\n", 497 | " \n", 498 | " for i in range(total_aggregate):\n", 499 | " # collect visited states (onehot)\n", 500 | " visited_states = collect_states(expert, learner, p_for_beta, i, collect_dat)\n", 501 | " # collect expert actions\n", 502 | " with torch.no_grad():\n", 503 | " logits = expert(visited_states.float())\n", 504 | " probs = F.softmax(logits, dim=-1)\n", 505 | " expert_actions = torch.multinomial(probs, num_samples=1)\n", 506 | " expert_actions = expert_actions.squeeze(dim=-1)\n", 507 | " # combine dataset\n", 508 | " visited_states_all = torch.cat((visited_states_all, visited_states), dim=0)\n", 509 | " expert_actions_all = torch.cat((expert_actions_all, expert_actions), dim=0)\n", 510 | " # shuffle dataset for training\n", 511 | " indices = torch.randperm(visited_states_all.shape[0])\n", 512 | " visited_states_train = visited_states_all[indices]\n", 513 | " expert_actions_train = expert_actions_all[indices]\n", 514 | " # train leraner policy by chunking dataset\n", 515 | " for j in range(0, collect_dat, train_batch_size):\n", 516 | " opt.zero_grad()\n", 517 | " logits = learner(visited_states_train[j:j+train_batch_size].float())\n", 518 | " loss = F.cross_entropy(logits, expert_actions_train[j:j+train_batch_size], reduction=\"none\")\n", 519 | " loss.sum().backward()\n", 520 | " opt.step()\n", 521 | " # evaluate policy and log\n", 522 | " avg = evaluate(learner, 200)\n", 523 | " reward_log.append(avg)\n", 524 | " if verbose:\n", 525 | " print(\"Iteration {:2d} - Evaluation result (Average reward) {}\".format(i + 1, avg))\n", 526 | "\n", 527 | " # return evaluation reward's log\n", 528 | " return reward_log\n", 529 | "\n", 530 | "#\n", 531 | "# Run training\n", 532 | "#\n", 533 | "dagger_results = train(\n", 534 | " expert=expert_func,\n", 535 | " learner=learner_func,\n", 536 | " p_for_beta=0.3,\n", 537 | " collect_dat=5000,\n", 538 | " total_aggregate=30,\n", 539 | " train_batch_size=200,\n", 540 | " verbose=True,\n", 541 | ")" 542 | ] 543 | }, 544 | { 545 | "cell_type": "markdown", 546 | "id": "6b089553-cad3-48cc-a746-1cd53cb98578", 547 | "metadata": {}, 548 | "source": [ 549 | "## Compare results with BC" 550 | ] 551 | }, 552 | { 553 | "cell_type": "markdown", 554 | "id": "22474b14-0fb1-44ca-afa8-5450a3913715", 555 | "metadata": {}, 556 | "source": [ 557 | "Now let's compare with the results of vanilla BC (behavior cloning).\n", 558 | "\n", 559 | "> Note : In below code, I have trained with BC by setting ```p_for_beta=1.0```.\n", 560 | "\n", 561 | "As you can easily see, DAgger has better performance (converge faster) than the results of vanilla BC.\n", 562 | "\n", 563 | "This example uses a primitive environment (GridWorld) to learn policy, but the real system is more complex and the learner often encounters the unseen states (especillay, in early stages) that the expert never encounters in successful demonstrations. As a result, the learner never converges or very slowly converges into optimal performance in vanilla BC methods.
\n", 564 | "In such case, there might be a clear difference between the results of two methods.\n", 565 | "\n", 566 | "DAgger addresses this problem by applying optimization with the visited states induced also by learner policy (not only expert policy)." 567 | ] 568 | }, 569 | { 570 | "cell_type": "code", 571 | "execution_count": 10, 572 | "id": "f496130a-09e9-4696-8bb6-6f08e021c40d", 573 | "metadata": {}, 574 | "outputs": [ 575 | { 576 | "data": { 577 | "image/png": "iVBORw0KGgoAAAANSUhEUgAAAiwAAAGdCAYAAAAxCSikAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjguNCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8fJSN1AAAACXBIWXMAAA9hAAAPYQGoP6dpAABbIklEQVR4nO3dd3hUZf7+8fekzKQnQBqBBBJKkA5BIK5gAQl+RUVd1w4oyuri7iqsCupi2XXxhy721bWAjVXXhi42EAQLSJOOICUQIAVCyaRnkjm/P04YiNSEJGcmuV/Xda45bc58mAyZO+c8z3NshmEYiIiIiHgxP6sLEBERETkVBRYRERHxegosIiIi4vUUWERERMTrKbCIiIiI11NgEREREa+nwCIiIiJeT4FFREREvF6A1QWcKbfbTXZ2NuHh4dhsNqvLERERkdNgGAaFhYUkJCTg53fq8yc+H1iys7NJTEy0ugwRERGpg127dtG2bdtT7ufzgSU8PBww/8EREREWVyMiIiKnw+l0kpiY6PkePxWfDyyHLwNFREQosIiIiPiY023OoUa3IiIi4vUaNLB8++23XHrppSQkJGCz2Zg9e3aN7YZhMGXKFFq3bk1wcDBDhw5ly5YtDVmSiIiI+KAGDSzFxcX06tWLF1544bjbp02bxrPPPstLL73E0qVLCQ0NJSMjg7KysoYsS0RERHxMg7Zhufjii7n44ouPu80wDJ5++mkefPBBLr/8cgDefPNN4uLimD17Ntdee2291WEYBpWVlVRVVdXbMaVu/P39CQgIUBd0ERGpFcsa3WZmZpKbm8vQoUM96yIjIxkwYABLliw5YWApLy+nvLzcs+x0Ok/6OhUVFeTk5FBSUlI/hcsZCwkJoXXr1tjtdqtLERERH2FZYMnNzQUgLi6uxvq4uDjPtuOZOnUqjzzyyGm9htvtJjMzE39/fxISErDb7frL3kKGYVBRUcG+ffvIzMykU6dOpzVYkIiIiM91a548eTITJkzwLB/ux308FRUVuN1uEhMTCQkJaawS5SSCg4MJDAxk586dVFRUEBQUZHVJIiLiAyz78zY+Ph6AvLy8Guvz8vI8247H4XB4xlw53bFX9Fe8d9HPQ0REasuyb47k5GTi4+OZP3++Z53T6WTp0qWkp6dbVZaIiIh4oQa9JFRUVMTWrVs9y5mZmaxevZqWLVuSlJTEXXfdxd///nc6depEcnIyf/3rX0lISGDkyJENWZaIiIj4mAY9w7JixQr69OlDnz59AJgwYQJ9+vRhypQpANx777388Y9/ZNy4cZx99tkUFRXx5Zdfql0DMGbMGGw2GzabjcDAQOLi4rjooouYMWMGbrf7mP0zMjLw9/dn+fLlFlQrIiLSsGyGYRhWF3EmnE4nkZGRFBQUHNOepaysjMzMTJKTk30uBI0ZM4a8vDxmzpxJVVUVeXl5fPnll0ydOpVBgwbx6aefEhBgniDLysqiW7du3HLLLVRUVPDiiy9aXL2poqLiuF2XffnnIiIi9eNk39/H43O9hJoTh8PhaYDcpk0b+vbty8CBAxkyZAivv/46t956KwAzZ85kxIgR3HHHHQwcOJDp06cTHBzsOU5hYSG33347s2fPJiIignvvvZdPPvmE3r178/TTTwOQk5PDrbfeyoIFC4iPj+exxx7j/vvv56677uKuu+4C4NChQ/zlL3/hk08+oby8nH79+vHUU0/Rq1cvAB5++GFmz57NnXfeyWOPPcbOnTuPezZIRMSXHCyuYEO2k1xnGSF2f0IdAYR6HgMIdZjzjgC/Og2dYRgG5ZVuissrKS6vorDcRXF5FcXllRSVV1JcXklJRRVuw8AwoMowPPNut4HbAHf1OnPiyL5uc50NG/5+4Odnw99mw89m88wfvd7fz9xmPh5Z37NtFF0TrL3BcLMKLIZhUOqyZrTb4ED/ehkD5sILL6RXr1589NFH3HrrrRiGwcyZM3nhhRfo0qULHTt25IMPPuCmm27yPGfChAn88MMPfPrpp8TFxTFlyhR++uknevfu7dln1KhR5Ofns3DhQgIDA5kwYQJ79+6t8dpXX301wcHBfPHFF0RGRvLvf/+bIUOG8Msvv9CyZUsAtm7dyocffshHH32Ev7//Gf97RaR5MgyDvYXlbMguYMMeJxuynWzMcVJSUUlKTBid48LoFBtOp+rH6LAzH2fLMAxynWVs2ONkfXaB+ZrZTvYcKj2t5/v72Qix+xPmCDgq2JiBJsQeQKXbTWFZpSeYFJVXUlxRSVFZJZVu777Yce/wVAWWxlTqqqLrlK8see2Nj2YQYq+ft7tLly6sXbsWgK+//pqSkhIyMjIAuPHGG3nttdc8gaWwsJA33niD//znPwwZMgQwz8gkJCR4jrdp0ya+/vprli9fTr9+/QB49dVX6dSpk2ef77//nmXLlrF3714cDgcATz75JLNnz+aDDz5g3LhxgHkZ6M033yQmJqZe/q0i0jgMw8BZVsn+onIOFFewv7iC/UUVHCguP2reXO9ng4SoYNocnloEe5brEhzcboMd+4vZkO2sngr4OcdJflHFcffPLzrAsswDNda1CAk8KsCE0TkunI5xYcSEOY5bj9ttkHWgxBNM1u8pYGO2k/3Fx3/Ndq1CSGoZQpmriuLyKkoqKimqfiypMP8QrnIbFJZVUlhWWat//9EOB50whxl0wqrngwL9Cag++2GzmWc//P2OzPvZjpwR8Ttq3eHtRvW/ucptmGdoqh+r3IfP0hy9/th927cKrfO/qb40q8DSVBiG4fkPOGPGDK655hpPe5brrruOe+65h23bttGhQwe2b9+Oy+Wif//+nudHRkaSmprqWd68eTMBAQH07dvXs65jx460aNHCs7xmzRqKiopo1apVjVpKS0vZtm2bZ7ldu3YKKyKn4Kpys3N/MQF+fgTb/QkK9Cc40J9Af9sZnyVwuw2KKipxlrpwllbiLHPhLHVRUOrCWVbpmT9QbAaQ/OqAcrCkAlfV6f+VvyH7+LdFsQf4kRAZZIaYyCNhpm2U+Rgd7mBHfrF55qQ6oPyc4/R86R/NzwYdYsLolhBBt4RIuiVEEB4Uy
NZ9hfySV8SWvCK27C0k60AJB0tcLNtxgGU7agaZqJBAOsWG0SkunORWoWQXlHrOnBSVHxss/P1sdIwJo1ubI6/ZNSGCiKDAE74XVW7z7H1x+ZGzJ8UV1fMVVZRUX9qxB/hVn3EJIDwooDqYVJ+JqT4b4++n0dhPpFkFluBAfzY+mmHZa9eXn3/+meTkZA4cOMDHH3+My+Wq0dC2qqqKGTNm8Nhjj9XbaxYVFdG6dWsWLlx4zLaoqCjPfGio9SlcxBtVVrn5cfsBPluXzZfrczlY4jpmH38/G8GB1QHG7kdwdZBxVD8GB/p7Ak5llbs6jFSHkjIXBSUuCssrOZOuFGGOAFqG2mkZaic6zF497zhq3k6V2yD7UCl7DpWx51CpOX+wlLzCMioq3ezYX8KO/bW7f1tQoB9d4s1wcDigdIkPJ+g4vzt7tI2ssVzmqmLr3iK27i3il7xCtuwtYkteITsPlHCoxMXyHQdZvuPgMcexB/hxVnw4XRMi6d7m5K95Mv5+Ns+ZEGk4zerdtdls9XZZxioLFixg3bp13H333cyaNYu2bdsye/bsGvvMnTuXf/7znzz66KOkpKQQGBjI8uXLSUpKAqCgoIBffvmFwYMHA5CamkplZSWrVq0iLS0NMNuiHDx45D943759yc3NJSAggPbt2zfKv1XE11W5DZZlHmDOWjOkHH25IcTuj5/NRklFJYebL1S5DYqq/xo/U/YAPyKDA4kICiAiOJCIoMDqR3O5VXX4aBXm8My3DLXX+sv6aK4qN7kFNUNMdkEpuw9WLx8qpczlJiok0BNKurY2A0pydCgB/nUbaSMo0J/ubSLp3ubYILNt35EgsyO/hNgIB92qA0qHmDAC6/ia0vh8+9u7iSsvLyc3N/eYbs0jRoxg1KhRpKWl8dvf/pbu3bvXeF5iYiKTJ0/myy+/5JJLLmH06NHcc889tGzZktjYWB566CH8/I60Zu/SpQtDhw5l3LhxvPjiiwQGBjJx4kSCg4M9+wwdOpT09HRGjhzJtGnT6Ny5M9nZ2Xz22WdcccUVnrYvIs2d222wYudBPlubzefrc9lXeOTu8i1CAhnevTUjerZmQHJLAvz9MAwDV5V5SaHMVUVpRRVllebjkXVuSl3Vy9Xr/f1sZiA5JpQEEBEUeEbBo64C/f1IbBlCYsvj37vNMMxAFuYIaJQb0QYF+ldf1ok89c7i9RRYvNiXX35J69atCQgIoEWLFvTq1Ytnn32W0aNHs2rVKtasWcMrr7xyzPMiIyMZMmQIr732GpdccgnTp0/n9ttvZ8SIEZ5uzbt27aoxBsqbb77J2LFjGTx4MPHx8UydOpUNGzZ49rHZbHz++ec88MAD3Hzzzezbt4/4+HgGDx58zB23RbxVZZWbXQdL2ba3iK37ithfVE5MuIO4iCDiI4KIjwwiLiKo1l/2brfBql2HmLM2m8/X5ZDnPBJSIoMDyegWx4ieCaR3aHXMX/Q2mw17gM1zRqQps9lshJ+kLYjIyWjguGaouLiYNm3a8M9//pOxY8ced5/du3eTmJjI119/7eldVF/0c5GGVlphXgrYtq/IE0627i1iR34JFVWnHhsoKiSQ+IigGkEmPjLoyLrIIKKCA1m7p4DP1mbz2docsgvKPM8PDwpgWNd4RvRszW86RmMP0GUHkV/TwHFyjFWrVrFp0yb69+9PQUEBjz76KACXX365Z58FCxZQVFREjx49yMnJ4d5776V9+/aedi4i3qigxMXmvEK27i3ytFXYurfopONmBAX6kRIdRsfYMGLCHeQXlZNbUEaes4xcZxllLjeHSlwcKnGxKbfwhMfx97NRddTYGaF2fy7qap5JGdQ5GkeAxiESqU8KLM3Ek08+yebNm7Hb7aSlpfHdd98RHR3t2e5yubj//vvZvn074eHhnHPOOcyaNYvAQJ2+Fe9SXF7J3I25fLwqm++37ONE4221CAmkY6wZTDrEhNEhNoyOMWG0iQrG7wRdRw3DwFlaSY6z9EiIKSgn13l43nzcX1xBldsgxO7PkLPiGNGzNed1jrGk3YhIc6FLQtLo9HOR2qqscvPd1nxmr9rD3A15NUasbhMVXCOYHJ5vGXrsfazqS3llFfsKy4kOcyikiNSRLgmJSJNgGAZrdxfw8ao9zFmbXWPU0+ToUEb2bsPIPgm0s2AETkeAP21bHL8njEgNbjdk/wRb5kHpQTDcRyaMo5aNmtt+vewfCC3aQ8sUaNkBWnWA0BhohN5WVJabNQQGn3rfBqTAIiK1VuU2WL7jAF9tyGXx1v1EBAeQEh1GSkwoKTHmY1LLkDqNcZG1v4TZq/cwe9UetucXe9a3CrVzaa8ERvZpQ6+2kY3SLVakTqpcsOM72PSZORXmNMzr2MOhZbIZXo4OMi1TTi/MVJZDUR4U5h6Zig7P50BhnvlYegD+70nof1vD/DtOkwKLiJyW8soqftiaz1fr8/j657xj7rny65FE/f1sJLUMISU69EiQiQ4lOSb0mPu7HCyuYM66HGav2sPKnUeOExTox7Cu8VzRpw3ndorWIF/ivSqKYevX8PMc2PIVlBUc2WYPg04XmYHC5veryVY9/Xq9H2A7so+rFA7ugAPbYP92KNgFFYWQu9acfq1GmOkAGDWDyeEgcroKc8/wDTpzCiwickJF5ZV8s2kvX23I5ZtNeyk+6n4vkcGBDD0rjqFnxVJR5WbbvmK27ysiM7+Y7fuKKXVVkZlfTGZ+MfM31TxueFBAdZAJo7DMxcLN+zx3q/WzwW86RjOydxsyusdruHPxXsX74ZcvzLMo2xZA5ZGu7YREQ5f/gy6XQsp5EOCo39euLDcDzP5tcGB7dZDZBgcyTx1mjuZvh7B4CI+H8DgIb23Oe9bFm+uCW5z8OI1AvwlEpIb9ReV8/XMeX23I4/st+TXGLYmPCGJYtzgyusXTP7nlCc94GIZBrrOM7dUhZtu+YrbnF5OZX8Tug6UUllWyZncBa3Yf+Su0W0IEV/Rpw6W9EoiLUGNs8VKHso5c6tn5Q3VblGpR7eCsS6HLCEjsD34N2CA7wAExqeb0a64yOLSzOsBUBxq/AAg7KpAcHUR85PKqAouIsOdQKV+tz+WrDbks33GgRlfh5OhQMrrFk9Etjl5to07YJfhoNpuN1pHBtI4M5jcdo2tsK3NVsXN/Cdv3FbE9v5gqt8Hw7vF0jguv73+WSN0YBhTtNc9UHNoJh3aZQWXPCshZU3Pf+B5mQOkyAuK6eceXf2DQicOMD1NgEWmmylxVfL4uh1lLs2q0GwHzbMfwbvFkdI+nU2xYvTZwDQr0JzU+nNR4BRSxiNttNi49lFVzKqgOJgW7a17eOZrND5LSocsl5tSifaOW3pwpsHipMWPG8MYbb3iWW7Zsydlnn820adPo2bMnYJ52f+WVV3jttdfYsGEDAQEBdOzYkRtvvJFx48YREqJul3KsnfuLmbU0i/dX7OJgiQsw/yg8u11LMrrHM6xr3AlvXifiFSrLobzIbKdRXgQVRUctH72u8Mi28uptzj1mIHG7Tv4aNj8IT4CoRIhKgshEaNXR
bDwbGn3y50qDUGDxYsOHD2fmzJkA5Obm8uCDDzJixAiysrIAuOmmm/joo4948MEHef7554mJiWHNmjU8/fTTtG/fnpEjR1pYvXiTyio3Czbt5e2lWXz7yz7P+oTIIK4fkMTv+iUSq3Yj4k0qSiB/M+zbDHt/Nh/3/QzObKiqOPXzT8UvACLamGHkcCCJSjoSUCLamGOfiNdQYPFiDoeD+Ph4AOLj45k0aRKDBg1i3759fPPNN8yaNYvZs2fXuCdQ+/btueyyy3A6nVaVLV5kr7OMd5fv4p1lWeRU35zPZoPBnWK4cWA7LuwSi/9ptEkRL+N2H+kO6+sqiiH/F9i7yQwkhwPKoSzgFAOxBwSDI8zsNuwINyd72InXOcLNhqZRSeZjQzaKlXrXvAKLYYCrxJrXDgw5o18uRUVFvP3223Ts2JFWrVoxa9YsUlNTa4SVw2w2G5GRkWdSrfgwwzBYsn0/s37M4qsNuZ7uwi1CAvldv0SuH5BkyeiwUg9y1sKK12Dt++YX8KC/QNro+u8y21AKc2H7Iti7oTqgbDp5MAlpBTFnQWwXiKmeWrQDR4QZQvyb11dYc9e8ftquEvhHgjWvfX822Gv3JTFnzhzCwsIAKC4upnXr1syZMwc/Pz+2bNlCamrTagEuZ6ag1MWHK3cza+lOtu07MkJsWrsW3DgwiYu7t9Z9b3yRqxQ2fAzLXzN7qXjWF8MX98Di5+D8SdDzGu/7Ane7IXcN/PIV/PIlZK86/n4h0RB7VnXPlupgEnuW2opIDV726ZajXXDBBbz44osAHDx4kH/9619cfPHFLFu2DB+/Z6XUo10HSnjhm63MXr2HMpc5JkSI3Z+Rfdpw44B2dE049U3FxAvlb4WVM2HV21B2yFznFwhdL4O0m83LKIumQUEWfPIH+OFpuOAB6Hq5tZeKKorNsyi/fGkGlaJfjZCa0BfapNU8a6JgIqeheQWWwBDzTIdVr11LoaGhdOzY0bP86quvEhkZySuvvELnzp3ZtGnTSZ4tTV2Zq4qXv93OC99spbzSDCqpceHcODCJkX3aEB6kBoM+p8oFmz83z6ZkLjqyPjIJ+o2BPjdBWKy5LnkQ9LoOlr8C3z9lBpj3R0Pr3jDkr9BhSOMFl0O7jgSUzG+hqvzItsBQ6HABpF4MHS8yR1MVqYPmFVhstlpflvEmNpsNPz8/SktLuf7667n22mv55JNPjmnHYhiG57bd0jR9s2kvD/9vAzv3m22y0lNacfdFnTm7fQvdFNAXFeyBn96AlW8cdUbCBp0zoN9Y6Djk+A1E7SHwmz9D2hhY8oI55ayGt6+Cdr+BIVMgaWD91+uugj0rj4SUvPU1t0clQefh5tT+XN9pYyNerXkFFh9TXl5Obq75y+vgwYM8//zzFBUVcemll3Leeefx8ccfc9111/Hggw8ybNgwYmJiWLduHU899RR//OMf1a25Cdp1oIRH52xk3sY8AOIiHDxwSVcu7dlaQcXXuN2wfQEsn2Hej+bwEO+hMdB3lBlCopJO71hBkXDB/dB/nHm2Zdkr5rDxMzKg0zC48K/Qumfd6jQMs2Hs3o1mMMldDzu+h5L8I/vY/KBtf0itDikxXZpGDybxKgosXuzLL7+kdevWAISHh9OlSxfef/99zj//fAD+85//8PLLLzNjxgwee+wxAgIC6NSpE6NGjSIjI8PCyqW+/fryT4CfjVvOTeZPQzrp5oC+wF1lfunn/2JO+zbDju/Mm9cd1n4Q9LvFHOI9wF631wmNhozHYOAf4Ntp8NNbsGWuOXW70mzjEt3xxM8vc5pdivPWQ94Gc9q7EcqPM0yCI9I889N5OHQcCqGt6lazyGmyGT7eevPwpY+CggIiImo2LiwrKyMzM5Pk5GSCgjQolrfQz6V2jnf559HLu9FJ997xPq4y2L/VHPAsf4sZTPJ/Mdcdb6h3RyT0vs4MKg1x35f92+Cbf8D6D8xlmz/0ucHsDl1VUTOY5K2v7mJ8HH6BZn1x3SC2K7Tpaw5Pr4HV5Ayc7Pv7ePSnmYiX0uUfL1BVCZWlZhCprJ5cpdWPJUfOmuz7xQwpB3dywjFF/B3m0O7Rncwv/9izzMs1DdmurlUH+O1rcO5dsODvZpuTn940pxMJTzCDSVw3iOsOcV2hVae6n/URqScKLCJeRpd/Gsna/8Lq/5jdcD2hpPyo+VJwV9b+uEGREJ0K0Z0hpnP1fCfzJnlWjawa3wOufw+ylsL8R2Hn92bPxdizjgom1WdPQlpaU6PIKei3n4gX0eWfRlBVCXMfhKUv1u55/g4IDIKA6ikwGMLjzUAS09kMKNGpZrdjbz0DljQAxsyB0oMQFAV+flZXJHLaFFhEvIAu/zSS0kPwwc2wbYG5/Ju7oO3Z1UEk+NhAEnDUclP5crfZdBZFfJJXBJYXXniBJ554gtzcXHr16sVzzz1H//79rS5LpFH8nOPkmn8vwVlWqcs/DSl/K7xzjdkANjAErnjJHBVWRHyC5X8yvPfee0yYMIGHHnqIn376iV69epGRkcHevXvr7TV8vCNUk6OfxxE79xczasYynGWV9Gwbyed/HsT9/3eWwkp92zofXr3QDCsRbeGWrxRWRHyM5YFl+vTp3Hbbbdx888107dqVl156iZCQEGbMmHHGxw4MNLvclZRYdIdmOa7DP4/DP5/maq+zjJteW8a+wnK6xIfz1tgBdFZblfplGPDjizDrt1BWAIkDYNw3dR9ETUQsY+mfcRUVFaxcuZLJkyd71vn5+TF06FCWLFly3OeUl5dTXn7kPhVO53EGNKrm7+9PVFSU52xNSEiI2gNYyDAMSkpK2Lt3L1FRUfj7N987BxeUuhg1YxlZB0pIahnCm7f0JzK4eQe4eldZAZ9PPNKFt/cNMOIpDRMv4qMsDSz5+flUVVURF1fzZlhxcXEnvLHf1KlTeeSRR077NeLj4wHq9RKTnJmoqCjPz6U5Kq2o4tY3lrMpt5CYcAdvjx1AbIQG0KtXxfnw31Hm8PQ2P7joUUi/03t774jIKfnchfLJkyczYcIEz7LT6SQxMfGE+9tsNlq3bk1sbCwul6sxSpSTCAwMbNZnVlxVbsb/5yeW7zhIeFAAb97Sn6RWtb+Td5NhGOaw9f71+KsobwO8c605qJsjAn47AzpdVH/HFxFLWBpYoqOj8ff3Jy8vr8b6vLy8E/4F7nA4cDhqf0rX39+/WX9RivXcboN7P1jLgk17CQr0Y8aYszmr9amHo26yKsvhnesgc5F547wOF0DKBZDQp+4BZtNn8NE4qCiCFsnmYGkNMeS9iDQ6Sxvd2u120tLSmD9/vmed2+1m/vz5pKenW1iZSP0yDINH52zk41V7CPCz8eINaZzdvhmPhWEY8OmfYNt8czTZrMXwzWPw2lCYlgLv3gDLXzXvhXM6vcoMA777p/m8iiJIHgy3LVBYEWlCLL8kNGHCBEaPHk2/fv3o378/Tz/9NMXFxdx8881WlyZSb55fsJXXF+8A4Mmre3FBl1hrC7Lawsdh7bvmzfiueMkMGdu+Mc+2lBXApjnmBBCZBB3ON8++pJx
(base64-encoded PNG data omitted - a line plot of the "DAgger" and "BC" result curves produced by this cell's source below)", 578 | "text/plain": [ 579 | "
" 580 | ] 581 | }, 582 | "metadata": {}, 583 | "output_type": "display_data" 584 | } 585 | ], 586 | "source": [ 587 | "import matplotlib.pyplot as plt\n", 588 | "\n", 589 | "# train with behavior cloning (BC)\n", 590 | "# (by setting p_for_beta=1.0)\n", 591 | "bc_learner_func = LearnerNet().to(device)\n", 592 | "bc_results = train(\n", 593 | " expert=expert_func,\n", 594 | " learner=bc_learner_func,\n", 595 | " p_for_beta=1.0,\n", 596 | " collect_dat=5000,\n", 597 | " total_aggregate=30,\n", 598 | " train_batch_size=200,\n", 599 | ")\n", 600 | "\n", 601 | "# plot\n", 602 | "plt.plot(dagger_results, label=\"DAgger\")\n", 603 | "plt.plot(bc_results, label=\"BC\")\n", 604 | "plt.legend()\n", 605 | "plt.show()" 606 | ] 607 | }, 608 | { 609 | "cell_type": "code", 610 | "execution_count": 12, 611 | "id": "0170c629-e047-4933-bdf7-3f2d2e8d44c5", 612 | "metadata": {}, 613 | "outputs": [ 614 | { 615 | "data": { 616 | "text/plain": [ 617 | "[-67.285,\n", 618 | " -65.925,\n", 619 | " -52.175,\n", 620 | " -39.655,\n", 621 | " -31.535,\n", 622 | " -27.555,\n", 623 | " -27.33,\n", 624 | " -21.225,\n", 625 | " -16.61,\n", 626 | " -14.44,\n", 627 | " -10.965,\n", 628 | " -7.35,\n", 629 | " -6.5,\n", 630 | " -4.74,\n", 631 | " -0.695,\n", 632 | " -1.635,\n", 633 | " 1.25,\n", 634 | " 1.37,\n", 635 | " 1.925,\n", 636 | " 3.355,\n", 637 | " 1.525,\n", 638 | " 2.15,\n", 639 | " 3.19,\n", 640 | " 3.77,\n", 641 | " 3.95,\n", 642 | " 4.38,\n", 643 | " 5.48,\n", 644 | " 5.505,\n", 645 | " 5.005,\n", 646 | " 5.355]" 647 | ] 648 | }, 649 | "execution_count": 12, 650 | "metadata": {}, 651 | "output_type": "execute_result" 652 | } 653 | ], 654 | "source": [ 655 | "bc_results" 656 | ] 657 | }, 658 | { 659 | "cell_type": "code", 660 | "execution_count": null, 661 | "id": "5fb236ee-406d-4579-a356-d7d0a47e9e4a", 662 | "metadata": {}, 663 | "outputs": [], 664 | "source": [] 665 | } 666 | ], 667 | "metadata": { 668 | "kernelspec": { 669 | "display_name": "Python 3 (ipykernel)", 670 | "language": "python", 671 | "name": "python3" 672 | }, 673 | "language_info": { 674 | "codemirror_mode": { 675 | "name": "ipython", 676 | "version": 3 677 | }, 678 | "file_extension": ".py", 679 | "mimetype": "text/x-python", 680 | "name": "python", 681 | "nbconvert_exporter": "python", 682 | "pygments_lexer": "ipython3", 683 | "version": "3.10.12" 684 | } 685 | }, 686 | "nbformat": 4, 687 | "nbformat_minor": 5 688 | } 689 | -------------------------------------------------------------------------------- /Readme.md: -------------------------------------------------------------------------------- 1 | # Imitation Learning Algorithms Tutorial (Python) 2 | 3 | This repository shows you the implementation examples of imitation learning (IL) from scratch in Python, with theoretical aspects behind code. 4 | 5 | ## Table of Contents 6 | 7 | - [Behavior Cloning (BC)](01_bc.ipynb) 8 | - [Dataset Aggregation (DAgger)](02_dagger.ipynb) 9 | - [Maximum Entropy Inverse Reinforcement Learning (max-ent IRL)](03_maxent_irl.ipynb) 10 | - [Maximum Causal Entropy Inverse Reinforcement Learning (MCE IRL)](04_mce_irl.ipynb) 11 | - [Relative Entropy Inverse Reinforcement Learning (rel-ent IRL)](05_relent_irl.ipynb) 12 | - [Generative Adversarial Imitation Learning (GAIL)](06_gail.ipynb) 13 | 14 | In this repository, I focus on above 6 IL methods, which affected other works a lot in history.
15 | These are fundamental algorithms, and they might also help you learn more recent IL algorithms (such as [rank-game](https://www.microsoft.com/en-us/research/blog/unifying-learning-from-preferences-and-demonstration-via-a-ranking-game-for-imitation-learning/)). 16 | 17 | In this repository, I often use basic terminology from behavioral learning - such as "discount", "policy", "advantages", ... If you're new to behavioral learning, I recommend briefly studying [reinforcement learning (RL)](https://github.com/tsmatz/reinforcement-learning-tutorials) first. 18 | 19 | > Note : In this repository, I focus on model-free IL algorithms.<br/>
20 | > Also, this repository focuses on state-action learning; trajectory learning (which is sometimes applied in robotics) is out of scope.<br/>
21 | > In trajectory learning, the trajectory is modeled by a [GMM](https://github.com/tsmatz/gmm), an [HMM](https://github.com/tsmatz/hmm-lds-em-algorithm), an MP (Movement Primitive), etc. (See [here](https://arxiv.org/abs/1811.06711) for details.) 22 | 23 | ## Imitation Learning - What and How? 24 | 25 | Like [reinforcement learning](https://github.com/tsmatz/reinforcement-learning-tutorials), imitation learning is an approach for learning how an agent should act to obtain optimal results. However, unlike reinforcement learning, imitation learning does not use a predefined reward function; it learns from the expert's behaviors instead.<br/>
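For illustration, the learner in imitation learning only sees the expert's (state, action) pairs, and in the simplest case it can fit a policy to them by plain supervised learning (this is the Behavior Cloning approach described below). The following is a minimal sketch with placeholder data - the dataset, network shape and training loop are illustrative only, not the ones used in the notebooks.

```python
import torch
import torch.nn as nn

# Placeholder expert demonstrations: integer state ids and the actions the expert took.
n_states, n_actions = 2500, 4
expert_states = torch.randint(0, n_states, (1000,))
expert_actions = torch.randint(0, n_actions, (1000,))

# A small policy network mapping a state id to action logits.
policy = nn.Sequential(nn.Embedding(n_states, 64), nn.ReLU(), nn.Linear(64, n_actions))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

# Plain supervised learning: minimize cross-entropy between predicted and expert actions.
for _ in range(100):
    loss = nn.functional.cross_entropy(policy(expert_states), expert_actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```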
26 | 27 | There are two main approaches to imitation learning - Behavior Cloning (BC) and Inverse Reinforcement Learning (IRL). 28 | 29 | **Behavior Cloning (BC)** directly learns the expert's (demonstrated) behaviors without any reward function, by searching for the optimal mapping from states to actions. It simply treats the expert's behaviors (the dataset) as a supervised learning problem and solves a regression or classification task on it.<br/>
30 | When you want to refine the policy obtained by Behavior Cloning (BC), you can also apply a regular reinforcement learning method afterwards.<br/>
31 | The methods of [Behavior Cloning (BC)](01_bc.ipynb) and [Dataset Aggregation (DAgger)](02_dagger.ipynb) belong to this approach. 32 | 33 | **Inverse Reinforcement Learning (IRL)**, on the other hand, learns a cost function, i.e., it recovers the unknown reward function from the expert's behaviors, and then extracts a policy 34 | from the recovered cost function with reinforcement learning. In complex systems, it is often difficult to design a reward function by hand, especially when human interaction is involved. In such cases, Inverse Reinforcement Learning (IRL) comes into play.<br/>
35 | The methods of [Maximum Entropy Inverse Reinforcement Learning](03_maxent_irl.ipynb), [Maximum Causal Entropy Inverse Reinforcement Learning](04_mce_irl.ipynb) and [Relative Entropy Inverse Reinforcement Learning](05_relent_irl.ipynb) belong to this approach. 36 | 37 | Finally, [Generative Adversarial Imitation Learning (GAIL)](06_gail.ipynb) is a method inspired by Generative Adversarial Networks (GANs) and IRL, but unlike IRL methods, it constrains the agent's behavior to be approximately optimal without explicitly recovering the reward (or cost) function. (Hence GAIL can also be applied in complex systems, unlike BC + RL.)<br/>
38 | GAIL is one of today's state-of-the-art (SOTA) imitation learning algorithms. 39 | 40 | Reinforcement learning (RL) has achieved great success in a wide variety of agentic and autonomous tasks. However, it is sometimes time-consuming and hard to learn from scratch for complex tasks.<br/>
41 | Imitation learning makes sense in such systems, and many prior successful works show the benefit of providing prior knowledge through imitation learning before applying reinforcement learning directly. 42 | 43 | > Note : There also exist many works that learn a policy from expert's behaviors in gaming - such as [1](https://www.nature.com/articles/nature16961), [2](https://openai.com/blog/vpt/), or [3](https://developer.nvidia.com/blog/building-generally-capable-ai-agents-with-minedojo/). 44 | 45 | ## Environment and Expert Dataset 46 | 47 | This repository includes an expert dataset (```./expert_data/ckpt0.pkl```), generated by an expert agent trained with PPO (a state-of-the-art RL algorithm) to solve the GridWorld environment. 48 | 49 | GridWorld is a primitive environment, but it is widely used for behavioral training - such as reinforcement learning or imitation learning.<br/>
50 | The following are the rules of the GridWorld environment used in this repository. (This definition is motivated by the paper "[Relative Entropy Inverse Reinforcement Learning](https://proceedings.mlr.press/v15/boularias11a/boularias11a.pdf)".) 51 | 52 | - It has 50 x 50 grids (cells), and the state corresponds to the location of the agent on the grid. 53 | - The agent has four actions for moving in one of the directions of the compass. 54 | - When the agent reaches the goal state (located in the bottom-right corner), a reward of ```10.0``` is given. 55 | - For the remaining states, the reward was randomly set to ```0.0``` with probability 2/3 and to ```−1.0``` with probability 1/3. 56 | - Each trajectory lasts at most ```200``` time-steps. 57 | - If the agent tries to move across the border, the failure reward (i.e., reward=```-1.0```) is given and the agent stays in the same state. 58 | 59 | The following picture shows the GridWorld environment used in this repository (generated with a fixed seed value, ```1000```).<br/>
60 | From the gray-colored states, the agent can reach the goal state without losing any reward. The initial state is sampled from a uniform distribution over the gray-colored states, so the maximum total reward in a single episode is always ```10.0```. 61 | 62 | ![GridWorld game definition](./assets/gridworld_definition.png) 63 | 64 | The expert dataset ```./expert_data/ckpt0.pkl``` includes the following entities. 65 | 66 | | name | description | 67 | | ------------- | ------- | 68 | | states | Numpy array of visited states.<br/>
The state is an integer, in which the top-left corner is ```0``` and the bottom-right corner is ```2499```. | 69 | | actions | Numpy array of the corresponding actions taken.<br/>
The action is also an integer: 0=UP, 1=DOWN, 2=LEFT, 3=RIGHT. | 70 | | rewards | Numpy array of the corresponding rewards obtained.<br/>
**This is never used in imitation learning.** (It is included only for reference.) | 71 | | timestep_lens | Numpy array of time-step lengths, one per episode.<br/>
Thus, the length of this array equals the number of demonstration episodes. | 72 | 73 | This repository also has the script [00_generate_expert_trajectories.ipynb](./00_generate_expert_trajectories.ipynb), which is used to create the expert model and dataset.<br/>
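For a quick check, the bundled dataset can be loaded and inspected as follows. This is a minimal sketch, and it assumes the pickle stores the four arrays under the key names listed in the table above - adjust the access if the actual format differs.

```python
import pickle
import numpy as np

# Load the bundled expert demonstrations (run from the repository root).
with open("./expert_data/ckpt0.pkl", "rb") as f:
    expert_data = pickle.load(f)

states = np.asarray(expert_data["states"])                # visited state ids (0 ... 2499)
actions = np.asarray(expert_data["actions"])              # 0=UP 1=DOWN 2=LEFT 3=RIGHT
timestep_lens = np.asarray(expert_data["timestep_lens"])  # time-step length of each episode

print("total transitions :", len(states))
print("episodes          :", len(timestep_lens))
```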
74 | By modifying and running ```00_generate_expert_trajectories.ipynb```, you can also customize and build your own expert demonstrations. 75 | 76 | > Note : By setting ```transition_prob=True``` in the environment's constructor, you can enable stochastic transitions - the selected action succeeds with probability `0.7`, and a failure results in a uniform random transition in one of the other directions (with probability `0.1` each).<br/>
77 | > Dataset in this repository (```./expert_data/ckpt0.pkl```) is generated without transition probability (i.e, always transit to the selected direction deterministically). 78 | 79 | *Tsuyoshi Matsuzaki @ Microsoft* 80 | -------------------------------------------------------------------------------- /assets/gridworld_definition.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tsmatz/imitation-learning-tutorials/de6a63a6583d892a647738be0cbb532aced47274/assets/gridworld_definition.png -------------------------------------------------------------------------------- /assets/regularization_plot.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tsmatz/imitation-learning-tutorials/de6a63a6583d892a647738be0cbb532aced47274/assets/regularization_plot.png -------------------------------------------------------------------------------- /expert_actor.pt: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tsmatz/imitation-learning-tutorials/de6a63a6583d892a647738be0cbb532aced47274/expert_actor.pt -------------------------------------------------------------------------------- /expert_data/ckpt0.pkl: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tsmatz/imitation-learning-tutorials/de6a63a6583d892a647738be0cbb532aced47274/expert_data/ckpt0.pkl -------------------------------------------------------------------------------- /expert_value.pt: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tsmatz/imitation-learning-tutorials/de6a63a6583d892a647738be0cbb532aced47274/expert_value.pt -------------------------------------------------------------------------------- /gridworld.json: -------------------------------------------------------------------------------- 1 | {"reward_map": [0, -1, 0, 0, 0, -1, 0, 0, 0, 0, 0, 0, -1, 0, -1, -1, -1, -1, 0, -1, 0, -1, 0, 0, 0, -1, 0, 0, 0, -1, -1, 0, -1, 0, 0, 0, 0, -1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1, 0, 0, 0, 0, -1, 0, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1, -1, -1, 0, -1, 0, 0, 0, 0, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, -1, 0, 0, 0, -1, -1, 0, 0, -1, -1, 0, 0, -1, -1, 0, -1, 0, -1, 0, -1, 0, 0, 0, 0, 0, 0, 0, -1, 0, 0, -1, 0, 0, -1, 0, 0, 0, 0, 0, 0, 0, -1, 0, 0, -1, 0, 0, -1, -1, 0, 0, -1, 0, 0, 0, 0, -1, 0, 0, 0, 0, 0, -1, 0, -1, -1, 0, 0, -1, 0, -1, 0, 0, 0, 0, -1, 0, -1, 0, -1, 0, -1, 0, 0, -1, -1, 0, 0, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1, 0, 0, 0, 0, 0, 0, 0, -1, -1, -1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1, -1, 0, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1, 0, -1, 0, 0, 0, -1, 0, 0, -1, 0, -1, -1, 0, 0, 0, -1, -1, 0, 0, -1, 0, -1, 0, 0, -1, 0, 0, -1, 0, 0, -1, -1, 0, 0, -1, 0, 0, 0, -1, 0, 0, 0, 0, -1, -1, 0, 0, -1, 0, -1, -1, -1, -1, 0, 0, -1, 0, -1, 0, 0, 0, 0, 0, -1, -1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1, 0, -1, 0, 0, 0, 0, 0, -1, -1, 0, 0, -1, -1, 0, 0, -1, -1, 0, -1, 0, 0, 0, 0, 0, 0, 0, -1, -1, 0, 0, 0, -1, 0, -1, 0, -1, -1, 0, 0, 0, 0, -1, 0, 0, 0, -1, 0, 0, 0, 0, -1, 0, 0, -1, -1, 0, 0, -1, 0, 0, -1, 0, 0, -1, 0, 0, 0, -1, -1, 0, 0, -1, 0, -1, 0, -1, 0, 0, -1, 0, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1, -1, 0, -1, 0, 0, -1, 0, 0, -1, -1, 0, 0, 0, 0, 0, 0, -1, 0, 0, 0, -1, -1, -1, -1, -1, -1, 0, -1, -1, 0, -1, 0, 0, 0, -1, 0, 0, 0, 0, 0, 0, 0, -1, 0, -1, 0, 0, -1, 
-1, 0, 0, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1, -1, -1, 0, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1, 0, 0, 0, 0, 0, 0, -1, -1, 0, -1, -1, -1, 0, 0, 0, 0, 0, -1, 0, 0, 0, 0, -1, 0, 0, -1, 0, 0, 0, -1, 0, 0, 0, -1, 0, 0, -1, 0, 0, 0, -1, -1, 0, -1, -1, 0, 0, 0, 0, 0, -1, -1, -1, -1, -1, -1, 0, 0, -1, -1, -1, 0, -1, 0, -1, 0, 0, 0, -1, 0, -1, 0, 0, 0, 0, 0, 0, 0, 0, -1, 0, 0, -1, 0, 0, -1, 0, 0, 0, 0, 0, 0, 0, -1, -1, 0, 0, 0, 0, -1, 0, 0, 0, 0, 0, -1, -1, 0, -1, 0, 0, 0, -1, 0, -1, 0, -1, 0, 0, 0, 0, -1, 0, 0, 0, 0, -1, 0, 0, 0, -1, 0, -1, 0, -1, -1, -1, 0, -1, 0, 0, 0, -1, -1, 0, 0, 0, 0, -1, 0, 0, 0, -1, -1, 0, 0, 0, -1, 0, -1, 0, 0, 0, 0, 0, -1, 0, -1, 0, 0, -1, -1, -1, 0, 0, 0, -1, 0, 0, 0, 0, 0, -1, 0, 0, 0, 0, 0, -1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1, 0, 0, 0, 0, -1, 0, -1, 0, 0, 0, 0, -1, 0, -1, 0, 0, -1, 0, -1, -1, -1, 0, 0, -1, 0, 0, 0, -1, 0, -1, 0, 0, 0, 0, 0, 0, -1, -1, 0, -1, -1, 0, -1, 0, -1, -1, 0, -1, 0, 0, 0, 0, 0, -1, 0, -1, 0, 0, 0, -1, 0, -1, -1, 0, 0, -1, 0, 0, -1, -1, 0, 0, 0, -1, 0, 0, 0, -1, 0, 0, 0, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, -1, -1, 0, 0, 0, 0, 0, -1, -1, -1, -1, -1, -1, -1, -1, 0, 0, -1, 0, -1, 0, -1, 0, 0, 0, 0, 0, -1, 0, 0, 0, 0, 0, 0, -1, 0, 0, -1, -1, 0, 0, -1, -1, 0, -1, 0, -1, 0, -1, -1, 0, 0, 0, 0, 0, 0, 0, -1, -1, 0, -1, 0, -1, -1, 0, -1, 0, -1, -1, -1, 0, -1, 0, 0, -1, -1, 0, 0, 0, 0, 0, 0, 0, -1, 0, 0, 0, 0, 0, -1, 0, 0, 0, -1, 0, 0, -1, 0, 0, 0, -1, -1, 0, -1, 0, -1, -1, 0, -1, 0, 0, 0, -1, -1, -1, 0, 0, 0, 0, -1, 0, -1, -1, 0, 0, 0, 0, 0, 0, 0, 0, -1, 0, 0, -1, 0, -1, -1, -1, 0, 0, 0, -1, 0, 0, -1, 0, 0, 0, -1, 0, 0, 0, 0, 0, -1, 0, 0, 0, 0, -1, 0, 0, 0, 0, 0, 0, 0, -1, 0, 0, -1, -1, 0, 0, 0, -1, 0, 0, 0, -1, 0, 0, 0, 0, -1, 0, 0, 0, 0, 0, 0, -1, 0, -1, 0, -1, -1, -1, 0, 0, 0, 0, 0, -1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1, 0, -1, -1, -1, -1, -1, -1, 0, 0, 0, -1, -1, 0, -1, -1, 0, 0, -1, 0, 0, -1, 0, 0, 0, 0, 0, -1, 0, 0, 0, 0, 0, -1, -1, 0, 0, 0, -1, 0, 0, 0, -1, -1, -1, -1, 0, 0, -1, -1, 0, -1, -1, -1, -1, 0, 0, 0, 0, 0, -1, 0, 0, 0, -1, 0, 0, -1, -1, -1, 0, 0, -1, 0, 0, 0, -1, 0, 0, -1, 0, 0, 0, 0, -1, -1, 0, 0, 0, 0, -1, 0, -1, 0, 0, -1, 0, 0, -1, 0, 0, 0, 0, 0, 0, -1, 0, 0, -1, 0, -1, -1, -1, 0, -1, -1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1, -1, 0, 0, -1, 0, 0, 0, -1, 0, 0, -1, -1, -1, -1, 0, -1, -1, 0, 0, -1, -1, 0, -1, 0, -1, 0, -1, 0, 0, 0, -1, -1, 0, 0, -1, 0, 0, 0, 0, 0, 0, 0, -1, 0, -1, 0, 0, -1, 0, -1, -1, 0, 0, -1, -1, 0, -1, 0, 0, 0, 0, -1, -1, -1, 0, 0, 0, 0, 0, 0, 0, -1, -1, 0, 0, 0, 0, 0, 0, -1, 0, 0, 0, -1, 0, 0, -1, 0, -1, 0, 0, 0, 0, -1, 0, -1, 0, 0, 0, 0, -1, 0, 0, 0, -1, 0, -1, 0, 0, 0, -1, 0, 0, -1, 0, 0, -1, -1, 0, -1, -1, -1, -1, -1, 0, 0, 0, -1, 0, -1, 0, 0, 0, 0, 0, 0, -1, 0, -1, -1, 0, -1, 0, -1, 0, 0, 0, -1, 0, 0, -1, 0, -1, -1, 0, 0, -1, 0, 0, 0, 0, 0, -1, -1, -1, 0, -1, 0, 0, 0, 0, -1, -1, -1, 0, 0, -1, 0, 0, 0, 0, 0, 0, 0, 0, -1, 0, -1, -1, 0, 0, 0, -1, -1, 0, -1, -1, 0, -1, 0, 0, 0, -1, 0, 0, 0, 0, -1, 0, -1, 0, -1, 0, 0, -1, 0, 0, -1, 0, 0, 0, 0, 0, 0, -1, 0, 0, -1, -1, -1, 0, -1, -1, 0, 0, -1, -1, 0, -1, 0, 0, 0, 0, -1, 0, 0, 0, -1, 0, 0, 0, 0, 0, 0, 0, -1, -1, 0, 0, 0, -1, 0, 0, 0, 0, 0, 0, 0, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1, -1, 0, 0, -1, -1, 0, 0, 0, 0, 0, -1, 0, 0, -1, -1, 0, 0, 0, -1, 0, 0, -1, 0, 0, -1, 0, -1, -1, 0, 0, 0, -1, -1, 0, 0, 0, 0, 0, -1, 0, 0, -1, 0, 0, -1, 0, 0, 0, 0, 0, 0, 0, 0, -1, -1, -1, -1, 0, 0, 0, 0, -1, 0, -1, 0, -1, 0, 0, 0, 0, -1, -1, -1, -1, -1, -1, -1, -1, 0, 0, 0, -1, 0, -1, 0, -1, 0, 0, 0, 0, 0, 0, 0, 0, -1, 0, 0, 0, -1, 0, -1, 0, 0, 0, -1, -1, 0, 0, -1, 0, -1, -1, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1, 0, -1, 0, 0, 0, 0, 0, -1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1, 0, 0, -1, -1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1, -1, 0, 0, 0, -1, -1, -1, 0, 0, 0, 0, 0, 0, -1, -1, 0, 0, 0, -1, 0, 0, -1, -1, 0, 0, 0, 0, -1, -1, 0, 0, -1, 0, 0, 0, 0, -1, 0, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1, 0, -1, -1, -1, 0, -1, -1, -1, 0, 0, 0, -1, 0, 0, 0, -1, 0, -1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1, 0, 0, 0, -1, 0, 0, -1, -1, 0, -1, 0, -1, 0, 0, 0, 0, 0, -1, 0, -1, -1, 0, -1, 0, 0, -1, 0, 0, -1, 0, 0, 0, -1, -1, 0, 0, 0, -1, 0, 0, 0, -1, -1, 0, -1, 0, -1, -1, 0, 0, 0, 0, 0, 0, 0, -1, 0, 0, -1, 0, -1, 0, -1, -1, 0, 0, -1, -1, -1, -1, -1, 0, 0, 0, -1, -1, 0, -1, 0, 0, 0, 0, 0, 0, 0, 0, -1, -1, -1, 0, 0, 0, -1, 0, 0, 0, -1, 0, 0, -1, -1, 0, 0, 0, -1, -1, 0, 0, 0, 0, 0, 0, 0, -1, 0, -1, 0, -1, 0, -1, 0, 0, 0, 0, 0, 0, 0, -1, 0, 0, 0, 0, -1, 0, 0, 0, 0, 0, 0, -1, 0, 0, -1, -1, 0, 0, 0, 0, -1, 0, 0, 0, 0, 0, 0, 0, 0, -1, 0, -1, 0, 0, 0, -1, 0, -1, -1, 0, 0, 0, 0, 0, -1, -1, 0, 0, 0, 0, 0, -1, 0, 0, 0, 0, 0, 0, 0, -1, 0, 0, -1, 0, 0, 0, 0, 0, -1, -1, 0, 0, -1, 0, 0, 0, 0, 0, -1, 0, 0, -1, 0, -1, 0, 0, 0, -1, 0, 0, -1, -1, -1, 0, 0, 0, 0, 0, 0, -1, 0, 0, 0, -1, 0, -1, 0, 0, -1, -1, 0, 0, 0, 0, -1, -1, 0, 0, 0, 0, 0, 0, -1, 0, -1, 0, 0, 0, 0, 0, 0, 0, -1, -1, 0, 0, 0, 0, 0, 0, 0, -1, 0, 0, 0, -1, -1, 0, 0, 0, -1, -1, -1, 0, -1, 0, 0, -1, 0, -1, 0, -1, 0, -1, 0, 0, 0, 0, -1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1, 0, 0, -1, 0, 0, -1, 0, -1, 0, 0, 0, 0, 0, -1, 0, -1, 0, 0, 0, 0, 0, 0, -1, 0, -1, 0, -1, 0, 0, 0, 0, 0, 0, 0, -1, 0, -1, -1, 0, 0, 0, -1, 0, 0, -1, 0, 0, 0, -1, -1, -1, 0, 0, 0, -1, 0, -1, 0, -1, 0, -1, -1, 0, 0, -1, -1, 0, 0, 0, -1, -1, 0, 0, 0, 0, 0, -1, 0, -1, 0, 0, -1, -1, -1, -1, -1, 0, 0, 0, -1, -1, -1, 0, 0, 0, 0, 0, -1, -1, -1, 0, 0, -1, 0, -1, 0, -1, 0, -1, -1, -1, 0, 0, -1, 0, 0, 0, 0, -1, -1, -1, 0, 0, 0, 0, -1, 0, 0, -1, 0, -1, 0, 0, 0, 0, 0, -1, -1, 0, 0, 0, -1, 0, 0, 0, 0, -1, 0, 0, 0, 0, 0, 0, 0, -1, 0, 0, 0, 0, 0, 0, -1, 0, 0, -1, -1, 0, -1, 0, -1, 0, 0, -1, 0, 0, 0, 0, -1, 0, 0, 0, 0, 0, 0, -1, 0, 0, -1, -1, -1, 0, -1, -1, 0, 0, 0, 0, 0, 0, 0, 0, -1, -1, 0, 0, 0, -1, 0, 0, 0, -1, -1, -1, 0, 0, 0, 0, 0, 0, 0, 0, -1, 0, -1, -1, 0, -1, 0, -1, 0, 0, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1, -1, 0, 0, 0, 0, 0, -1, 0, 0, 0, -1, 0, 0, -1, -1, -1, 0, -1, 0, 0, 0, -1, 0, 0, 0, 0, -1, 0, 0, -1, -1, 0, -1, 0, -1, -1, 0, 0, -1, -1, 0, -1, 0, -1, 0, 0, 0, 0, 0, -1, 0, -1, -1, -1, -1, 0, 0, 0, -1, 0, 0, -1, 0, 0, 0, -1, 0, 0, 0, -1, -1, 0, 0, 0, -1, 0, -1, 0, 0, 0, 0, -1, 0, 0, 0, 0, -1, -1, 0, 0, 0, 0, 0, -1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1, 0, -1, -1, 0, 0, 0, 0, 0, -1, 0, -1, 0, -1, 0, 0, 0, 0, -1, -1, 0, -1, 0, 0, 0, 0, -1, -1, -1, 0, 0, -1, 0, -1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1, 0, 0, -1, -1, -1, -1, -1, 0, 0, -1, 0, 0, 0, 0, -1, -1, 0, 0, -1, 0, 0, 0, 0, -1, 0, 0, 0, 0, -1, -1, -1, 0, -1, -1, 0, 0, 0, 0, -1, -1, 0, -1, -1, 0, -1, 0, -1, 0, 0, 0, -1, 0, 0, -1, -1, 0, -1, 0, 0, -1, 0, -1, 0, 0, 0, 0, 0, 0, -1, -1, 0, -1, 0, 0, 0, -1, 0, 0, 0, 0, 0, -1, 0, 0, 0, 0, 0, 0, -1, 0, 0, -1, -1, 0, 0, 0, -1, 0, 0, 0, -1, 0, 0, 0, -1, 0, -1, -1, -1, -1, 0, -1, 0, -1, -1, 0, 0, -1, 0, 0, 0, -1, 0, 0, 0, 0, -1, -1, 0, 0, 0, 0, 0, -1, 0, -1, -1, 0, 0, -1, 0, 0, 0, 0, 0, -1, 0, 0, -1, 10], "valid_states": [[48, 49], [47, 49], [47, 48], [46, 48], [45, 48], [44, 48], [43, 48], [43, 47], [42, 47], [41, 47], [41, 46], [41, 45], [40, 45], [39, 45], [38, 45], [37, 45], [36, 45], [36, 46], [35, 46], [35, 47], [35, 48], [34, 48], [33, 48], [32, 48], [31, 48], [30, 48], [29, 48], [28, 
48], [27, 48], [27, 47], [26, 47], [26, 46], [27, 46], [28, 46], [29, 46], [30, 46], [30, 45], [29, 45], [28, 45], [27, 45], [26, 45], [26, 44], [25, 44], [25, 43], [26, 43], [27, 43], [28, 43], [29, 43], [30, 43], [31, 43], [32, 43], [32, 42], [31, 42], [31, 41], [32, 41], [32, 40], [33, 40], [34, 40], [34, 41], [35, 41], [36, 41], [37, 41], [38, 41], [38, 40], [39, 40], [39, 39], [38, 39], [39, 38], [39, 37], [38, 37], [37, 37], [37, 36], [36, 36], [36, 35], [35, 35], [35, 34], [35, 33], [36, 33], [36, 32], [35, 32], [34, 32], [33, 32], [32, 32], [32, 31], [33, 31], [34, 31], [35, 31], [36, 31], [37, 31], [38, 31], [38, 30], [39, 30], [40, 30], [41, 30], [42, 30], [42, 31], [41, 31], [40, 31], [40, 32], [39, 32], [42, 32], [43, 32], [44, 32], [42, 33], [41, 33], [41, 34], [40, 34], [39, 34], [39, 35], [38, 35], [37, 35], [39, 36], [40, 36], [41, 36], [42, 36], [43, 36], [43, 37], [42, 37], [42, 38], [41, 38], [41, 39], [42, 39], [43, 39], [43, 40], [42, 40], [41, 40], [41, 41], [41, 42], [42, 42], [43, 42], [43, 41], [44, 41], [45, 41], [46, 41], [46, 40], [45, 40], [45, 39], [47, 40], [48, 40], [49, 40], [49, 41], [49, 42], [49, 43], [48, 43], [47, 43], [46, 43], [45, 43], [44, 43], [43, 43], [42, 43], [43, 44], [43, 45], [44, 45], [44, 46], [43, 46], [45, 46], [46, 46], [46, 45], [47, 45], [48, 45], [48, 44], [49, 44], [46, 44], [46, 47], [45, 47], [44, 47], [47, 47], [48, 47], [49, 47], [49, 46], [48, 48], [45, 42], [46, 42], [48, 39], [44, 37], [44, 38], [42, 35], [40, 37], [41, 29], [40, 29], [40, 28], [39, 28], [38, 28], [39, 27], [39, 26], [39, 25], [40, 25], [40, 24], [39, 24], [38, 24], [38, 23], [37, 23], [36, 23], [35, 23], [34, 23], [33, 23], [33, 24], [35, 22], [36, 22], [37, 22], [38, 22], [37, 21], [36, 21], [35, 21], [35, 20], [34, 20], [33, 20], [33, 21], [34, 19], [35, 19], [35, 18], [34, 18], [33, 18], [32, 18], [36, 18], [37, 18], [38, 18], [39, 18], [39, 17], [38, 17], [39, 16], [40, 16], [39, 15], [38, 15], [38, 14], [37, 14], [36, 14], [36, 15], [35, 15], [36, 16], [36, 17], [35, 17], [37, 13], [38, 13], [39, 13], [38, 12], [38, 11], [37, 11], [36, 11], [35, 11], [34, 11], [33, 11], [32, 11], [31, 11], [31, 10], [30, 10], [29, 10], [28, 10], [28, 9], [28, 8], [27, 8], [26, 8], [25, 8], [24, 8], [24, 9], [23, 9], [22, 9], [21, 9], [20, 9], [19, 9], [18, 9], [17, 9], [16, 9], [15, 9], [15, 10], [14, 10], [14, 11], [13, 11], [12, 11], [11, 11], [10, 11], [9, 11], [9, 10], [10, 12], [11, 12], [12, 12], [13, 12], [13, 13], [12, 13], [11, 13], [11, 14], [10, 14], [9, 14], [8, 14], [7, 14], [7, 15], [6, 15], [5, 15], [4, 15], [4, 14], [3, 14], [2, 14], [1, 14], [1, 13], [0, 13], [2, 13], [3, 13], [4, 13], [4, 12], [3, 12], [2, 12], [1, 12], [1, 11], [0, 11], [0, 10], [1, 10], [2, 10], [2, 11], [3, 11], [4, 11], [5, 11], [6, 11], [7, 11], [7, 10], [6, 10], [6, 9], [5, 9], [4, 9], [3, 9], [4, 8], [5, 8], [6, 8], [6, 7], [7, 7], [8, 7], [9, 7], [10, 7], [11, 7], [12, 7], [13, 7], [13, 6], [12, 6], [12, 5], [11, 5], [11, 4], [10, 4], [9, 4], [8, 4], [8, 3], [7, 3], [6, 3], [5, 3], [4, 3], [3, 3], [4, 4], [5, 4], [6, 4], [6, 5], [7, 5], [8, 5], [8, 6], [7, 6], [6, 6], [5, 6], [4, 6], [3, 6], [2, 6], [1, 6], [0, 6], [0, 7], [0, 8], [1, 8], [2, 8], [1, 9], [0, 9], [3, 7], [4, 7], [4, 5], [9, 6], [10, 6], [6, 2], [7, 2], [8, 2], [9, 2], [10, 2], [10, 1], [9, 1], [8, 1], [7, 1], [7, 0], [8, 0], [9, 0], [10, 0], [11, 1], [12, 1], [13, 1], [14, 1], [13, 0], [13, 2], [10, 3], [11, 3], [12, 3], [14, 6], [15, 6], [16, 6], [17, 6], [18, 6], [18, 5], [19, 5], [20, 5], [21, 5], [22, 5], 
[21, 4], [20, 4], [19, 4], [18, 4], [18, 3], [17, 3], [16, 3], [15, 3], [14, 3], [14, 4], [13, 4], [15, 2], [16, 2], [17, 2], [16, 4], [16, 5], [15, 5], [19, 3], [21, 3], [22, 3], [23, 3], [23, 2], [22, 2], [21, 2], [21, 1], [24, 2], [25, 2], [26, 2], [26, 1], [26, 3], [26, 4], [25, 4], [24, 4], [24, 5], [25, 5], [26, 5], [26, 6], [25, 6], [24, 6], [23, 6], [23, 7], [25, 7], [26, 7], [27, 7], [27, 4], [28, 4], [29, 4], [30, 4], [31, 4], [31, 3], [32, 3], [32, 2], [33, 2], [33, 1], [32, 1], [31, 1], [31, 0], [30, 0], [29, 0], [28, 0], [27, 0], [28, 1], [29, 1], [29, 2], [30, 2], [29, 3], [32, 0], [34, 1], [34, 0], [35, 0], [36, 0], [37, 0], [38, 0], [39, 0], [40, 0], [41, 0], [42, 0], [43, 0], [43, 1], [42, 1], [41, 1], [41, 2], [40, 2], [39, 2], [38, 2], [37, 2], [36, 2], [36, 1], [37, 1], [38, 1], [39, 1], [36, 3], [35, 3], [34, 3], [34, 4], [35, 4], [35, 5], [36, 5], [37, 5], [38, 5], [39, 5], [39, 4], [38, 4], [37, 4], [38, 3], [39, 3], [40, 3], [41, 3], [40, 4], [38, 6], [35, 6], [34, 6], [35, 7], [36, 7], [37, 7], [36, 8], [35, 8], [34, 8], [35, 9], [36, 9], [37, 9], [38, 9], [38, 10], [37, 10], [39, 10], [40, 10], [41, 10], [42, 10], [43, 10], [44, 10], [45, 10], [46, 10], [47, 10], [48, 10], [49, 10], [47, 11], [46, 11], [45, 11], [46, 12], [47, 12], [48, 12], [49, 12], [48, 13], [47, 13], [46, 13], [45, 13], [44, 13], [44, 12], [43, 12], [42, 12], [41, 12], [40, 12], [42, 13], [42, 14], [41, 14], [40, 14], [43, 14], [43, 15], [42, 15], [42, 16], [43, 16], [42, 17], [43, 11], [45, 14], [45, 15], [46, 15], [48, 14], [48, 15], [49, 15], [49, 16], [48, 16], [47, 16], [47, 17], [46, 17], [45, 17], [44, 17], [45, 18], [45, 19], [44, 19], [44, 20], [43, 20], [42, 20], [42, 19], [41, 19], [40, 19], [39, 19], [38, 19], [37, 19], [37, 20], [40, 20], [41, 18], [43, 21], [44, 21], [43, 22], [43, 23], [42, 23], [41, 23], [40, 23], [39, 23], [41, 22], [41, 21], [41, 24], [44, 23], [45, 23], [45, 22], [46, 22], [47, 22], [48, 22], [49, 22], [49, 23], [48, 23], [48, 24], [47, 24], [47, 25], [46, 25], [47, 26], [48, 26], [48, 27], [47, 27], [46, 27], [46, 28], [45, 28], [44, 28], [44, 29], [43, 29], [45, 29], [46, 29], [46, 30], [45, 30], [45, 31], [48, 28], [49, 28], [49, 29], [48, 29], [48, 30], [49, 30], [49, 31], [49, 32], [48, 32], [48, 33], [48, 34], [47, 34], [46, 34], [45, 34], [44, 34], [43, 34], [45, 33], [45, 35], [46, 35], [47, 35], [48, 35], [48, 36], [47, 36], [47, 37], [46, 37], [48, 37], [49, 37], [49, 38], [45, 36], [49, 34], [49, 24], [49, 25], [47, 21], [46, 21], [46, 20], [47, 20], [48, 20], [49, 20], [49, 19], [49, 18], [47, 19], [46, 19], [45, 24], [44, 24], [43, 24], [43, 25], [42, 25], [44, 25], [44, 26], [43, 26], [43, 27], [42, 27], [42, 28], [41, 28], [45, 26], [48, 17], [46, 9], [44, 9], [44, 8], [45, 8], [45, 7], [45, 6], [44, 6], [43, 6], [42, 6], [42, 5], [43, 5], [44, 5], [44, 4], [43, 4], [42, 4], [43, 3], [44, 3], [44, 2], [43, 2], [42, 2], [42, 7], [41, 7], [41, 8], [40, 8], [39, 8], [39, 7], [40, 9], [42, 8], [42, 9], [43, 7], [46, 6], [39, 11], [30, 5], [29, 5], [28, 5], [30, 6], [24, 1], [23, 1], [24, 0], [25, 0], [20, 6], [18, 7], [17, 7], [17, 8], [16, 8], [18, 8], [19, 8], [19, 7], [12, 8], [10, 8], [10, 9], [11, 9], [11, 10], [12, 10], [8, 8], [8, 9], [7, 9], [4, 10], [7, 12], [8, 12], [5, 12], [1, 15], [2, 15], [2, 16], [1, 16], [1, 17], [1, 18], [0, 18], [2, 18], [3, 18], [4, 18], [5, 18], [6, 18], [7, 18], [8, 18], [9, 18], [10, 18], [11, 18], [11, 19], [10, 19], [9, 19], [9, 20], [10, 20], [9, 21], [8, 21], [7, 21], [7, 22], [6, 22], [5, 22], [5, 21], 
[4, 21], [5, 23], [4, 23], [3, 23], [3, 22], [2, 22], [2, 21], [4, 24], [4, 25], [5, 25], [5, 26], [4, 26], [3, 26], [2, 26], [1, 26], [0, 26], [0, 27], [1, 27], [2, 27], [3, 27], [4, 27], [5, 27], [6, 27], [7, 27], [7, 28], [8, 28], [9, 28], [9, 27], [10, 27], [11, 27], [12, 27], [12, 28], [11, 28], [11, 29], [12, 29], [13, 29], [13, 28], [13, 30], [14, 30], [13, 31], [13, 32], [12, 32], [12, 33], [11, 33], [10, 33], [9, 33], [9, 32], [9, 31], [9, 30], [8, 30], [8, 29], [7, 29], [9, 29], [10, 30], [11, 30], [9, 34], [10, 34], [11, 34], [12, 34], [13, 34], [13, 33], [13, 35], [12, 35], [11, 35], [10, 35], [9, 35], [9, 36], [10, 36], [11, 36], [10, 37], [14, 35], [15, 35], [16, 35], [16, 34], [15, 34], [15, 33], [16, 33], [16, 32], [15, 32], [14, 32], [17, 32], [18, 32], [19, 32], [20, 32], [21, 32], [22, 32], [23, 32], [24, 32], [25, 32], [26, 32], [27, 32], [28, 32], [29, 32], [30, 32], [30, 31], [29, 31], [28, 31], [27, 31], [30, 30], [31, 30], [32, 30], [31, 29], [30, 29], [30, 28], [31, 28], [32, 28], [33, 28], [34, 28], [35, 28], [36, 28], [36, 27], [35, 27], [34, 27], [33, 27], [32, 27], [31, 27], [30, 27], [30, 26], [31, 26], [32, 26], [33, 26], [34, 26], [35, 26], [35, 25], [34, 25], [36, 25], [37, 25], [37, 26], [37, 27], [36, 24], [31, 25], [30, 25], [30, 24], [30, 23], [30, 22], [29, 22], [28, 22], [27, 22], [27, 23], [26, 23], [25, 23], [24, 23], [24, 22], [23, 22], [22, 22], [22, 21], [21, 21], [20, 21], [19, 21], [18, 21], [17, 21], [17, 20], [16, 20], [15, 20], [14, 20], [14, 19], [13, 19], [12, 19], [13, 18], [13, 17], [12, 17], [12, 16], [11, 16], [10, 16], [9, 16], [9, 15], [8, 15], [10, 15], [11, 15], [12, 15], [13, 15], [14, 15], [15, 15], [16, 15], [17, 15], [18, 15], [17, 14], [17, 16], [16, 16], [17, 17], [17, 18], [15, 14], [14, 14], [14, 13], [15, 13], [16, 13], [16, 12], [17, 12], [18, 12], [19, 12], [20, 12], [21, 12], [21, 11], [20, 13], [19, 13], [18, 13], [19, 14], [19, 11], [17, 11], [16, 11], [15, 11], [16, 10], [17, 10], [18, 10], [13, 16], [9, 17], [8, 17], [14, 17], [15, 17], [15, 18], [15, 19], [16, 19], [14, 21], [13, 21], [12, 21], [11, 21], [11, 22], [10, 22], [10, 23], [13, 22], [14, 22], [15, 22], [14, 23], [13, 23], [13, 24], [14, 24], [15, 24], [13, 25], [12, 25], [11, 25], [10, 25], [9, 25], [8, 25], [7, 25], [7, 24], [8, 24], [8, 23], [8, 22], [8, 26], [10, 26], [11, 26], [11, 24], [18, 20], [19, 20], [20, 20], [21, 20], [22, 20], [23, 20], [24, 20], [25, 20], [25, 19], [25, 18], [24, 18], [26, 18], [27, 18], [28, 18], [28, 19], [27, 19], [27, 20], [28, 20], [29, 20], [30, 20], [31, 20], [31, 19], [30, 19], [29, 19], [30, 18], [30, 17], [29, 17], [30, 16], [31, 16], [32, 16], [32, 15], [31, 15], [30, 15], [29, 15], [28, 15], [27, 15], [26, 15], [26, 14], [25, 14], [24, 14], [23, 14], [22, 14], [22, 13], [23, 13], [24, 13], [23, 12], [23, 11], [24, 11], [25, 11], [23, 10], [22, 15], [23, 15], [24, 15], [24, 16], [25, 16], [22, 16], [21, 16], [20, 16], [19, 16], [19, 17], [19, 18], [20, 18], [20, 19], [19, 19], [18, 19], [21, 19], [22, 19], [23, 19], [22, 18], [22, 17], [21, 17], [23, 17], [20, 15], [27, 14], [28, 14], [27, 13], [26, 13], [27, 16], [31, 14], [32, 14], [32, 13], [31, 13], [31, 12], [30, 12], [29, 12], [28, 12], [28, 11], [27, 11], [29, 11], [29, 13], [32, 12], [33, 12], [34, 12], [33, 13], [33, 15], [31, 21], [30, 21], [29, 21], [24, 21], [17, 22], [20, 22], [25, 22], [26, 24], [27, 24], [27, 25], [26, 25], [25, 25], [24, 25], [23, 25], [22, 25], [22, 26], [21, 26], [20, 26], [20, 25], [19, 25], [20, 24], [21, 24], [21, 23], [20, 
27], [21, 27], [21, 28], [20, 28], [22, 28], [22, 29], [21, 29], [23, 29], [24, 29], [25, 29], [26, 29], [26, 28], [25, 28], [27, 28], [28, 28], [27, 27], [26, 27], [27, 26], [28, 26], [28, 25], [26, 30], [22, 30], [23, 26], [24, 26], [24, 27], [28, 23], [36, 29], [37, 29], [36, 30], [35, 30], [34, 30], [34, 29], [29, 33], [28, 33], [27, 33], [25, 31], [24, 31], [23, 31], [25, 33], [24, 33], [23, 33], [22, 33], [21, 33], [20, 33], [19, 33], [18, 33], [18, 34], [17, 34], [21, 34], [21, 35], [20, 35], [20, 36], [19, 36], [18, 36], [18, 37], [17, 37], [19, 37], [20, 37], [21, 37], [19, 38], [18, 38], [18, 39], [17, 39], [16, 39], [17, 40], [17, 41], [16, 41], [18, 41], [19, 41], [19, 40], [19, 39], [19, 42], [18, 42], [18, 43], [19, 43], [20, 43], [21, 43], [21, 42], [20, 42], [22, 42], [23, 42], [24, 42], [25, 42], [26, 42], [25, 41], [23, 41], [22, 41], [23, 40], [23, 39], [24, 39], [25, 39], [26, 39], [27, 39], [28, 39], [29, 39], [30, 39], [30, 38], [31, 38], [32, 38], [32, 37], [31, 37], [30, 37], [29, 37], [28, 37], [27, 37], [26, 37], [26, 36], [27, 36], [28, 36], [27, 35], [27, 38], [28, 38], [30, 36], [33, 37], [34, 37], [35, 37], [35, 38], [34, 38], [36, 38], [37, 38], [36, 39], [36, 40], [34, 36], [33, 36], [33, 35], [32, 39], [30, 40], [29, 40], [28, 40], [27, 40], [26, 40], [27, 41], [29, 41], [29, 42], [28, 42], [24, 38], [23, 38], [22, 38], [23, 43], [23, 44], [19, 44], [18, 44], [18, 45], [17, 45], [17, 46], [16, 46], [15, 46], [14, 46], [13, 46], [12, 46], [13, 47], [14, 47], [13, 48], [12, 48], [11, 48], [11, 47], [11, 49], [10, 49], [12, 49], [13, 49], [14, 49], [14, 45], [15, 45], [18, 46], [18, 47], [17, 47], [17, 48], [19, 47], [19, 45], [23, 34], [25, 34], [26, 34], [25, 35], [24, 35], [24, 36], [20, 31], [19, 31], [18, 31], [18, 30], [19, 30], [19, 29], [16, 31], [16, 30], [16, 36], [15, 36], [13, 36], [13, 37], [12, 37], [12, 38], [11, 38], [11, 39], [12, 39], [13, 39], [14, 39], [14, 38], [14, 37], [14, 40], [13, 40], [13, 41], [14, 41], [13, 42], [12, 42], [11, 42], [11, 41], [11, 40], [12, 43], [12, 44], [13, 44], [6, 26], [5, 28], [4, 28], [4, 29], [3, 29], [2, 29], [1, 29], [1, 28], [0, 28], [2, 28], [1, 30], [2, 30], [3, 30], [4, 30], [4, 31], [3, 31], [3, 32], [2, 32], [1, 32], [2, 33], [3, 33], [4, 33], [4, 32], [5, 32], [6, 32], [7, 32], [7, 33], [6, 33], [6, 34], [5, 34], [4, 34], [3, 34], [3, 35], [2, 35], [2, 36], [1, 36], [0, 36], [0, 35], [0, 34], [1, 34], [0, 33], [3, 36], [4, 36], [3, 37], [3, 38], [4, 38], [4, 39], [3, 39], [2, 39], [2, 40], [3, 40], [4, 40], [5, 40], [5, 39], [6, 35], [7, 35], [6, 36], [6, 37], [7, 37], [8, 37], [6, 38], [5, 31], [2, 25], [2, 24], [1, 24], [0, 24], [0, 23], [0, 22], [6, 23], [7, 19], [6, 19], [5, 19], [6, 17], [6, 16], [4, 17], [4, 16], [3, 16], [2, 19], [1, 19], [1, 20], [0, 20], [5, 14], [9, 13], [21, 8], [21, 7], [25, 9], [30, 9], [31, 9], [32, 9], [33, 9], [33, 10], [32, 10], [32, 8], [31, 8], [32, 7], [33, 7], [37, 32], [33, 33], [38, 42], [37, 42], [36, 42], [35, 42], [34, 42], [33, 42], [35, 43], [35, 44], [34, 44], [34, 45], [33, 45], [32, 45], [32, 44], [31, 44], [30, 44], [29, 44], [33, 46], [33, 47], [32, 47], [31, 47], [30, 47], [37, 43], [37, 44], [39, 42], [27, 44], [27, 49], [26, 49], [25, 49], [24, 49], [23, 49], [22, 49], [21, 49], [21, 48], [21, 47], [22, 47], [23, 47], [23, 48], [24, 48], [25, 48], [21, 46], [20, 46], [21, 45], [22, 45], [28, 49], [29, 49], [30, 49], [32, 49], [33, 49], [34, 49], [35, 49], [36, 49], [36, 48], [37, 48], [38, 48], [38, 47], [39, 47], [39, 46], [38, 46], [37, 46], 
[38, 49], [39, 49], [40, 49], [41, 49], [40, 44], [40, 43], [43, 49], [44, 49], [45, 49]]} -------------------------------------------------------------------------------- /gridworld.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import torch 3 | from torch.nn import functional as F 4 | 5 | device = torch.device("cuda" if torch.cuda.is_available() else "cpu") 6 | 7 | class GridWorld: 8 | """ 9 | This environment is motivated by the following paper. 10 | https://proceedings.mlr.press/v15/boularias11a/boularias11a.pdf 11 | 12 | - It has 50 x 50 grids (cells). 13 | - The agent has four actions for moving in one of the directions of the compass. 14 | - [Optional] If ```transition_prob``` = True, the actions succeed with probability 0.7, 15 | a failure results in a uniform random transition to one of the adjacent states. 16 | - A reward of 10 is given for reaching the goal state, located on the bottom-right corner. 17 | - For the remaining states, 18 | the reward function was randomly set to 0 with probability 2/3 19 | and to −1 with probability 1/3. 20 | - If the agent moves across the border, it's given the fail reward (i.e, reward=`-1`). 21 | - The initial state is sampled from a uniform distribution. 22 | """ 23 | 24 | 25 | def __init__(self, reward_map=None, valid_states=None, seed=None, transition_prob=False, max_timestep=200, device="cuda"): 26 | """ 27 | Initialize class. 28 | 29 | Parameters 30 | ---------- 31 | reward_map : float[grid_size * grid_size] 32 | Reward for each state. 33 | Set this value when you load existing world definition (when seed=None). 34 | valid_states : list(int[2]) 35 | List of states, in which the agent can reach to goal state without losing any rewards. 36 | Each state is a 2d vector, [row, column]. 37 | When you call reset(), the initial state is picked up from these states. 38 | Set this value when you load existing world definition (when seed=None). 39 | seed : int 40 | Seed value to generate new grid (maze). 41 | Set this value when you create a new world. 42 | (Above ```reward_map``` and ```valid_states``` are newly generated.) 43 | transition_prob : bool 44 | True if transition probability (above) is enabled. 45 | False when we generate an expert agent without noise. 46 | (If transition_prob=True, it only returns next states in step() function.) 47 | max_timestep : int 48 | The maximum number of time-step (horizon). 49 | When it doesn't have finite horizon, set None as max_timestep. 50 | (If max_timestep=None, it doesn't return trunc flag in step() function.) 51 | device : string 52 | Device info ("cuda", "cpu", etc). 
53 | """ 54 | 55 | self.device = device 56 | self.transition_prob = transition_prob 57 | self.grid_size = 50 58 | self.action_size = 4 59 | self.max_timestep = max_timestep 60 | self.goal_reward = 10 61 | 62 | if seed is None: 63 | ############################ 64 | ### Load from definition ### 65 | ############################ 66 | self.reward_map = torch.tensor(reward_map).to(self.device) 67 | self.valid_states = torch.tensor(valid_states).to(self.device) 68 | else: 69 | ################################ 70 | ### Generate a new GridWorld ### 71 | ################################ 72 | # generate grid 73 | self.reward_map = torch.zeros(self.grid_size * self.grid_size, dtype=torch.int).to(self.device) 74 | # bottom-right is goal state 75 | self.reward_map[-1] = self.goal_reward 76 | # set reward=−1 with probability 1/3 77 | sample_n = np.floor((self.grid_size * self.grid_size - 1) / 3).astype(int) 78 | rng = np.random.default_rng(seed) 79 | sample_loc = rng.choice(self.grid_size * self.grid_size - 1, size=sample_n, replace=False) 80 | sample_loc = torch.from_numpy(sample_loc).to(self.device) 81 | self.reward_map[sample_loc] = -1 82 | # seek valid states 83 | valid_states_list = self._greedy_seek_valid_states([self.grid_size-1, self.grid_size-1], []) 84 | valid_states_list.remove([self.grid_size-1, self.grid_size-1]) 85 | self.valid_states = torch.tensor(valid_states_list).to(self.device) 86 | 87 | def _greedy_seek_valid_states(self, state, old_state_list): 88 | """ 89 | This method recursively seeks valid state. 90 | e.g, if some state is surrounded by the states with reward=-1, 91 | this state is invalid, because it cannot reach to the goal state 92 | without losing rewards. 93 | 94 | Parameters 95 | ---------- 96 | state : int[2] 97 | State to start seeking. It then seeks this state and all child's states. 98 | This state must be the list of [row, column]. 99 | old_state_list : int[N, 2] 100 | List of states already checked. 101 | Each state must be the list of [row, column]. 102 | These items are then skipped for seeking. 103 | 104 | Returns 105 | ---------- 106 | valid_states : int[N, 2] 107 | List of new valid states. 108 | Each state must be the list of [row, column]. 
109 | """ 110 | # build new list 111 | new_state_list = [] 112 | # if the state is already included in the list, do nothing 113 | if state in old_state_list: 114 | return new_state_list 115 | # if the state has reward=-1, do nothing 116 | if self.reward_map[state[0]*self.grid_size+state[1]] == -1: 117 | return new_state_list 118 | # else add the state into the list 119 | new_state_list.append(state) 120 | # move up 121 | if state[0] > 0: 122 | next_state = list(map(lambda i, j: i + j, state, [-1, 0])) 123 | new_state_list += self._greedy_seek_valid_states( 124 | next_state, 125 | old_state_list + new_state_list) 126 | # move down 127 | if state[0] < self.grid_size - 1: 128 | next_state = list(map(lambda i, j: i + j, state, [1, 0])) 129 | new_state_list += self._greedy_seek_valid_states( 130 | next_state, 131 | old_state_list + new_state_list) 132 | # move left 133 | if state[1] > 0: 134 | next_state = list(map(lambda i, j: i + j, state, [0, -1])) 135 | new_state_list += self._greedy_seek_valid_states( 136 | next_state, 137 | old_state_list + new_state_list) 138 | # move right 139 | if state[1] < self.grid_size - 1: 140 | next_state = list(map(lambda i, j: i + j, state, [0, 1])) 141 | new_state_list += self._greedy_seek_valid_states( 142 | next_state, 143 | old_state_list + new_state_list) 144 | # return result 145 | return new_state_list 146 | 147 | def reset(self, batch_size): 148 | """ 149 | Randomly, get initial state (single state) from valid states. 150 | 151 | Parameters 152 | ---------- 153 | batch_size : int 154 | The number of returned states. 155 | 156 | Returns 157 | ---------- 158 | state : torch.tensor((batch_size), dtype=int) 159 | Return the picked-up state id. 160 | """ 161 | # initialize step count 162 | self.step_count = 0 163 | # pick up sample of valid states 164 | indices = torch.multinomial(torch.ones(len(self.valid_states)).to(self.device), batch_size, replacement=True) 165 | state_2d = self.valid_states[indices] 166 | # convert 2d index to 1d index 167 | state_1d = state_2d[:,0] * self.grid_size + state_2d[:,1] 168 | # return result 169 | return state_1d 170 | 171 | def step(self, actions, states, trans_state_only=False, transition_prob=None): 172 | """ 173 | Take action, proceed step, and return the result. 174 | 175 | Parameters 176 | ---------- 177 | actions : torch.tensor((batch_size), dtype=int) 178 | Actions to take 179 | (0=UP 1=DOWN 2=LEFT 3=RIGHT) 180 | states : torch.tensor((batch_size), dtype=int) 181 | Current state id. 182 | trans_state_only : bool 183 | Set TRUE, when you call only for getting next state by stateless without reset() 184 | (If transition_prob=True, it only returns next states in step() function.) 185 | transition_prob : bool 186 | Set this property, if you overrite default ```transition_prob``` property. 187 | (For this property, see above in __init__() method.) 188 | 189 | Returns 190 | ---------- 191 | new-states : torch.tensor((batch_size), dtype=int) 192 | New state id. 193 | rewards : torch.tensor((batch_size), dtype=float) 194 | The obtained reward. 195 | term : torch.tensor((batch_size), dtype=bool) 196 | Flag to check whether it reaches to the goal and terminates. 197 | trunc : torch.tensor((batch_size), dtype=bool) 198 | Flag to check whether it's truncated by reaching to max time-step. 199 | (When max_timestep is None, this is not returned.) 
200 | """ 201 | # get batch size 202 | batch_size = actions.shape[0] 203 | # if transition prob is enabled, apply stochastic transition 204 | if transition_prob is None: 205 | trans_prob = self.transition_prob # set default 206 | else: 207 | trans_prob = transition_prob # overrite 208 | if trans_prob: 209 | # the action succeeds with probability 0.7 210 | prob = torch.ones(batch_size, self.action_size).to(self.device) 211 | mask = F.one_hot(actions, num_classes=self.action_size).bool() 212 | prob = torch.where(mask, 7.0, prob) 213 | selected_actions = torch.multinomial(prob, 1, replacement=True) 214 | selected_actions = selected_actions.squeeze(dim=1) 215 | action_onehot = F.one_hot(selected_actions, num_classes=self.action_size) 216 | else: 217 | # deterministic (probability=1.0 in one state) 218 | action_onehot = F.one_hot(actions, num_classes=self.action_size) 219 | # get 2d state 220 | mod = torch.div(states, self.grid_size, rounding_mode="floor") 221 | reminder = torch.remainder(states, self.grid_size) 222 | state_2d = torch.cat((mod.unsqueeze(dim=-1), reminder.unsqueeze(dim=-1)), dim=-1) 223 | # move state 224 | # (0=UP 1=DOWN 2=LEFT 3=RIGHT) 225 | up_and_down = action_onehot[:,1] - action_onehot[:,0] 226 | left_and_right = action_onehot[:,3] - action_onehot[:,2] 227 | move = torch.cat((up_and_down.unsqueeze(dim=-1), left_and_right.unsqueeze(dim=-1)), dim=-1) 228 | new_states = state_2d + move 229 | # set reward 230 | if not(trans_state_only): 231 | rewards = torch.zeros(batch_size).to(self.device) 232 | rewards = torch.where(new_states[:,0] < 0, -1.0, rewards) 233 | rewards = torch.where(new_states[:,0] >= self.grid_size, -1.0, rewards) 234 | rewards = torch.where(new_states[:,1] < 0, -1.0, rewards) 235 | rewards = torch.where(new_states[:,1] >= self.grid_size, -1.0, rewards) 236 | # correct location 237 | new_states = torch.clip(new_states, min=0, max=self.grid_size-1) 238 | # if succeed, add reward of current state 239 | states_1d = new_states[:,0] * self.grid_size + new_states[:,1] 240 | if not(trans_state_only): 241 | rewards = torch.where(rewards>=0.0, rewards+self.reward_map[states_1d], rewards) 242 | self.step_count += 1 243 | # return result 244 | if trans_state_only: 245 | return states_1d 246 | elif self.max_timestep is None: 247 | return states_1d, rewards, rewards==self.reward_map[self.grid_size * self.grid_size - 1] 248 | else: 249 | return states_1d, rewards, rewards==self.reward_map[self.grid_size * self.grid_size - 1], torch.tensor(self.step_count==self.max_timestep).to(self.device).unsqueeze(dim=0).expand(batch_size) 250 | --------------------------------------------------------------------------------