├── LICENSE ├── README.md └── gpt_like_llm.ipynb /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2024 Yash Kamble 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # micropgpt -> GPT-like Language Model 2 | 3 | This repository contains a GPT-style Language Model built using PyTorch. The architecture is inspired by the "Attention Is All You Need" paper and focuses on implementing key components such as self-attention and learned positional embeddings. 4 | 5 | ## Features 6 | 7 | - Multi-Head Attention mechanism 8 | - Transformer blocks for sequence modeling 9 | - Trained on the Tiny Shakespeare dataset 10 | - Implements text generation functionality 11 | 12 | ## Table of Contents 13 | - [Usage](#usage) 14 | - [Architecture](#architecture) 15 | - [Training](#training) 16 | - [Generation](#generation) 17 | - [Future Work](#future-work) 18 | - [Acknowledgements](#acknowledgements) 19 | 20 | ## Usage 21 | 22 | ### Training 23 | 24 | Run the `gpt_like_llm.ipynb` Jupyter Notebook or execute the code directly to train the model. 25 | 26 | ### Text Generation 27 | 28 | After training, use the `generate` method of the `GPTLanguageModel` to create text based on a prompt: 29 | ```python 30 | index = torch.tensor([[encode('\n')[0]]], dtype=torch.long, device=device) # start generation from a single token, e.g. a newline 31 | generated_text = model.generate(index, max_new_tokens=100) 32 | print(decode(generated_text.tolist()[0])) 33 | ``` 34 | 35 | ## Architecture 36 | 37 | - **Embedding Layer**: Token and positional embeddings. 38 | - **Transformer Blocks**: Each block consists of a multi-head self-attention layer followed by a feed-forward layer. 39 | - **Output Head**: Maps the final embedding to the vocabulary space. 
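For orientation, here is a minimal, self-contained sketch of how these pieces fit together. It is an illustration rather than the notebook's exact code: it reuses the default hyperparameters listed next, assumes a 65-character vocabulary as produced from Tiny Shakespeare, and substitutes PyTorch's built-in `nn.MultiheadAttention` (in a hypothetical `TinyBlock` class) for the notebook's hand-written `Head`/`MultiHeadAttention`.

```python
import torch
import torch.nn as nn

# Default hyperparameters from the notebook; vocab_size = 65 is an assumption
# (in the notebook it is computed from the Tiny Shakespeare character set).
n_embd, n_head, n_layer, block_size, vocab_size = 384, 4, 4, 128, 65

class TinyBlock(nn.Module):
    """One Transformer block: causal self-attention followed by a feed-forward layer."""
    def __init__(self):
        super().__init__()
        self.attn = nn.MultiheadAttention(n_embd, n_head, batch_first=True)
        self.ffwd = nn.Sequential(nn.Linear(n_embd, 4 * n_embd), nn.ReLU(),
                                  nn.Linear(4 * n_embd, n_embd))
        self.ln1, self.ln2 = nn.LayerNorm(n_embd), nn.LayerNorm(n_embd)

    def forward(self, x):
        T = x.size(1)
        causal = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)  # mask out future positions
        y, _ = self.attn(x, x, x, attn_mask=causal)
        x = self.ln1(x + y)               # residual + norm (post-norm, as in the notebook's Block)
        x = self.ln2(x + self.ffwd(x))
        return x

token_emb = nn.Embedding(vocab_size, n_embd)   # Embedding Layer: tokens
pos_emb = nn.Embedding(block_size, n_embd)     # Embedding Layer: positions
blocks = nn.Sequential(*[TinyBlock() for _ in range(n_layer)])
ln_f = nn.LayerNorm(n_embd)                    # final layer norm
lm_head = nn.Linear(n_embd, vocab_size)        # Output Head

idx = torch.randint(0, vocab_size, (2, block_size))        # (B, T) token ids
x = token_emb(idx) + pos_emb(torch.arange(block_size))     # (B, T, n_embd)
logits = lm_head(ln_f(blocks(x)))                          # (B, T, vocab_size)
print(logits.shape)                                        # torch.Size([2, 128, 65])
```

The notebook's `GPTLanguageModel` wires up the same layers with its own attention implementation, and adds weight initialization and a `generate` loop on top.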
40 | 41 | Key Hyperparameters: 42 | - `n_embd`: 384 (embedding dimension) 43 | - `n_head`: 4 (number of attention heads) 44 | - `n_layer`: 4 (number of Transformer layers) 45 | - `block_size`: 128 (context window size) 46 | 47 | ## Training 48 | 49 | - Optimizer: AdamW 50 | - Loss Function: CrossEntropyLoss 51 | - Default Parameters: 52 | - Learning Rate: 3e-4 53 | - Batch Size: 32 54 | - Maximum Iterations: 3000 55 | - Training and Validation Splits: 56 | - Train: 80% 57 | - Validation: 20% 58 | 59 | ## Generation 60 | 61 | The `generate` method predicts the next tokens for a given input sequence. At each step it samples the next token from the softmax distribution over the vocabulary (via `torch.multinomial`), which introduces variability in the outputs. 62 | 63 | ## Future Work 64 | 65 | - Train on a larger dataset. 66 | - Fine-tune hyperparameters for improved performance. 67 | - Add support for multilingual text. 68 | 69 | ## Acknowledgements 70 | 71 | - Inspired by the "Attention Is All You Need" paper by Vaswani et al. 72 | - Tiny Shakespeare dataset from [karpathy/char-rnn](https://github.com/karpathy/char-rnn). 73 | --- 74 | ## License 75 | 76 | This project is licensed under the MIT License. 77 | -------------------------------------------------------------------------------- /gpt_like_llm.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "nbformat": 4, 3 | "nbformat_minor": 0, 4 | "metadata": { 5 | "colab": { 6 | "provenance": [], 7 | "gpuType": "T4", 8 | "include_colab_link": true 9 | }, 10 | "kernelspec": { 11 | "name": "python3", 12 | "display_name": "Python 3" 13 | }, 14 | "language_info": { 15 | "name": "python" 16 | }, 17 | "accelerator": "GPU" 18 | }, 19 | "cells": [ 20 | { 21 | "cell_type": "markdown", 22 | "metadata": { 23 | "id": "view-in-github", 24 | "colab_type": "text" 25 | }, 26 | "source": [ 27 | "\"Open" 28 | ] 29 | }, 30 | { 31 | "cell_type": "code", 32 | "source": [ 33 | "!wget https://raw.githubusercontent.com/karpathy/char-rnn/master/data/tinyshakespeare/input.txt" 34 | ], 35 | "metadata": { 36 | "colab": { 37 | "base_uri": "https://localhost:8080/" 38 | }, 39 | "id": "VQ5koPc1ZVsD", 40 | "outputId": "036c97c6-04ec-4477-9892-3d9f048d2601" 41 | }, 42 | "execution_count": null, 43 | "outputs": [ 44 | { 45 | "output_type": "stream", 46 | "name": "stdout", 47 | "text": [ 48 | "--2024-09-13 13:24:14-- https://raw.githubusercontent.com/karpathy/char-rnn/master/data/tinyshakespeare/input.txt\n", 49 | "Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.111.133, 185.199.109.133, 185.199.110.133, ...\n", 50 | "Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.111.133|:443... connected.\n", 51 | "HTTP request sent, awaiting response... 
200 OK\n", 52 | "Length: 1115394 (1.1M) [text/plain]\n", 53 | "Saving to: ‘input.txt’\n", 54 | "\n", 55 | "\rinput.txt 0%[ ] 0 --.-KB/s \rinput.txt 100%[===================>] 1.06M --.-KB/s in 0.05s \n", 56 | "\n", 57 | "2024-09-13 13:24:14 (22.4 MB/s) - ‘input.txt’ saved [1115394/1115394]\n", 58 | "\n" 59 | ] 60 | } 61 | ] 62 | }, 63 | { 64 | "cell_type": "code", 65 | "execution_count": null, 66 | "metadata": { 67 | "colab": { 68 | "base_uri": "https://localhost:8080/" 69 | }, 70 | "id": "UQ8iLMWDXNA_", 71 | "outputId": "6c150e90-40f5-489c-c044-fdb6d6ea8ca5" 72 | }, 73 | "outputs": [ 74 | { 75 | "output_type": "stream", 76 | "name": "stdout", 77 | "text": [ 78 | "cuda\n" 79 | ] 80 | } 81 | ], 82 | "source": [ 83 | "import torch\n", 84 | "import torch.nn as nn\n", 85 | "from torch.nn import functional as F\n", 86 | "import mmap\n", 87 | "import random\n", 88 | "import pickle\n", 89 | "import argparse\n", 90 | "\n", 91 | "# parser = argparse.ArgumentParser(description='This is a demonstration program')\n", 92 | "\n", 93 | "# Here we add an argument to the parser, specifying the expected type, a help message, etc.\n", 94 | "# parser.add_argument('-batch_size', type=str, required=True, help='Please provide a batch_size')\n", 95 | "\n", 96 | "# args = parser.parse_args()\n", 97 | "\n", 98 | "# Now we can use the argument value in our program.\n", 99 | "# print(f'batch size: {args.batch_size}')\n", 100 | "device = 'cuda' if torch.cuda.is_available() else 'cpu'\n", 101 | "\n", 102 | "# batch_size = args.batch_size # to use the batch_size cmd arg -> python file_name.py -batch_size 32\n", 103 | "batch_size = 32\n", 104 | "block_size = 128\n", 105 | "max_iters = 3000\n", 106 | "learning_rate = 3e-4\n", 107 | "eval_iters = 50\n", 108 | "n_embd = 384\n", 109 | "n_head = 4\n", 110 | "n_layer = 4\n", 111 | "dropout = 0.2\n", 112 | "\n", 113 | "print(device)" 114 | ] 115 | }, 116 | { 117 | "cell_type": "code", 118 | "source": [ 119 | "with open('input.txt', 'r', encoding='utf-8') as f:\n", 120 | " text = f.read()\n", 121 | "chars = sorted(set(text))\n", 122 | "print(chars)\n", 123 | "vocab_size = len(chars)" 124 | ], 125 | "metadata": { 126 | "colab": { 127 | "base_uri": "https://localhost:8080/" 128 | }, 129 | "id": "kL0c7lLCXjA_", 130 | "outputId": "714be6fa-3c00-4595-9635-4e6ad3ac3bab" 131 | }, 132 | "execution_count": null, 133 | "outputs": [ 134 | { 135 | "output_type": "stream", 136 | "name": "stdout", 137 | "text": [ 138 | "['\\n', ' ', '!', '$', '&', \"'\", ',', '-', '.', '3', ':', ';', '?', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']\n" 139 | ] 140 | } 141 | ] 142 | }, 143 | { 144 | "cell_type": "code", 145 | "source": [ 146 | "string_to_int = { ch:i for i,ch in enumerate(chars) }\n", 147 | "int_to_string = { i:ch for i,ch in enumerate(chars) }\n", 148 | "encode = lambda s: [string_to_int[c] for c in s]\n", 149 | "decode = lambda l: ''.join([int_to_string[i] for i in l])" 150 | ], 151 | "metadata": { 152 | "id": "htTKteuWYZG9" 153 | }, 154 | "execution_count": null, 155 | "outputs": [] 156 | }, 157 | { 158 | "cell_type": "code", 159 | "source": [ 160 | "data = torch.tensor(encode(text), dtype=torch.long)\n", 161 | "n = int(0.8*len(data))\n", 162 | "train_data = data[:n]\n", 163 | "val_data = data[n:]\n", 164 | "\n", 165 | "def get_batch(split):\n", 166 | " data = train_data if split 
== 'train' else val_data\n", 167 | " ix = torch.randint(len(data) - block_size, (batch_size,))\n", 168 | " x = torch.stack([data[i:i+block_size] for i in ix])\n", 169 | " y = torch.stack([data[i+1:i+block_size+1] for i in ix])\n", 170 | " x, y = x.to(device), y.to(device)\n", 171 | " return x, y" 172 | ], 173 | "metadata": { 174 | "id": "fdxKteFBYauu" 175 | }, 176 | "execution_count": null, 177 | "outputs": [] 178 | }, 179 | { 180 | "cell_type": "code", 181 | "source": [ 182 | "@torch.no_grad()\n", 183 | "def estimate_loss():\n", 184 | " out = {}\n", 185 | " model.eval()\n", 186 | " for split in ['train', 'val']:\n", 187 | " losses = torch.zeros(eval_iters)\n", 188 | " for k in range(eval_iters):\n", 189 | " X, Y = get_batch(split)\n", 190 | " logits, loss = model(X, Y)\n", 191 | " losses[k] = loss.item()\n", 192 | " out[split] = losses.mean()\n", 193 | " model.train()\n", 194 | " return out" 195 | ], 196 | "metadata": { 197 | "id": "t3dQXrTfYfNW" 198 | }, 199 | "execution_count": null, 200 | "outputs": [] 201 | }, 202 | { 203 | "cell_type": "code", 204 | "source": [ 205 | "class Head(nn.Module):\n", 206 | " \"\"\" one head of self-attention \"\"\"\n", 207 | "\n", 208 | " def __init__(self, head_size):\n", 209 | " super().__init__()\n", 210 | " self.key = nn.Linear(n_embd, head_size, bias=False)\n", 211 | " self.query = nn.Linear(n_embd, head_size, bias=False)\n", 212 | " self.value = nn.Linear(n_embd, head_size, bias=False)\n", 213 | " self.register_buffer('tril', torch.tril(torch.ones(block_size, block_size)))\n", 214 | "\n", 215 | " self.dropout = nn.Dropout(dropout)\n", 216 | "\n", 217 | " def forward(self, x):\n", 218 | " # input of size (batch, time-step, channels)\n", 219 | " # output of size (batch, time-step, head size)\n", 220 | " B,T,C = x.shape\n", 221 | " k = self.key(x) # (B,T,hs)\n", 222 | " q = self.query(x) # (B,T,hs)\n", 223 | " # compute attention scores (\"affinities\")\n", 224 | " wei = q @ k.transpose(-2,-1) * k.shape[-1]**-0.5 # (B, T, hs) @ (B, hs, T) -> (B, T, T)\n", 225 | " wei = wei.masked_fill(self.tril[:T, :T] == 0, float('-inf')) # (B, T, T)\n", 226 | " wei = F.softmax(wei, dim=-1) # (B, T, T)\n", 227 | " wei = self.dropout(wei)\n", 228 | " # perform the weighted aggregation of the values\n", 229 | " v = self.value(x) # (B,T,hs)\n", 230 | " out = wei @ v # (B, T, T) @ (B, T, hs) -> (B, T, hs)\n", 231 | " return out\n", 232 | "\n", 233 | "# [1, 0, 0]\n", 234 | "# [1, 0.6, 0]\n", 235 | "# [1, 0.6, 0.4]\n", 236 | "class MultiHeadAttention(nn.Module):\n", 237 | " \"\"\" multiple heads of self-attention in parallel \"\"\"\n", 238 | "\n", 239 | " def __init__(self, num_heads, head_size):\n", 240 | " super().__init__()\n", 241 | " self.heads = nn.ModuleList([Head(head_size) for _ in range(num_heads)])\n", 242 | " self.proj = nn.Linear(head_size * num_heads, n_embd)\n", 243 | " self.dropout = nn.Dropout(dropout)\n", 244 | "\n", 245 | " def forward(self, x):\n", 246 | " out = torch.cat([h(x) for h in self.heads], dim=-1) # (B, T, F) -> (B, T, [h1, h1, h1, h1, h2, h2, h2, h2, h3, h3, h3, h3])\n", 247 | " out = self.dropout(self.proj(out))\n", 248 | " return out\n", 249 | "\n", 250 | "\n", 251 | "class FeedFoward(nn.Module):\n", 252 | " \"\"\" a simple linear layer followed by a non-linearity \"\"\"\n", 253 | "\n", 254 | " def __init__(self, n_embd):\n", 255 | " super().__init__()\n", 256 | " self.net = nn.Sequential(\n", 257 | " nn.Linear(n_embd, 4 * n_embd),\n", 258 | " nn.ReLU(),\n", 259 | " nn.Linear(4 * n_embd, n_embd),\n", 260 | " nn.Dropout(dropout),\n", 261 | " 
)\n", 262 | "\n", 263 | " def forward(self, x):\n", 264 | " return self.net(x)\n", 265 | "\n", 266 | "class Block(nn.Module):\n", 267 | " \"\"\" Transformer block: communication followed by computation \"\"\"\n", 268 | "\n", 269 | " def __init__(self, n_embd, n_head):\n", 270 | " # n_embd: embedding dimension, n_head: the number of heads we'd like\n", 271 | " super().__init__()\n", 272 | " head_size = n_embd // n_head\n", 273 | " self.sa = MultiHeadAttention(n_head, head_size)\n", 274 | " self.ffwd = FeedFoward(n_embd)\n", 275 | " self.ln1 = nn.LayerNorm(n_embd)\n", 276 | " self.ln2 = nn.LayerNorm(n_embd)\n", 277 | "\n", 278 | " def forward(self, x):\n", 279 | " y = self.sa(x)\n", 280 | " x = self.ln1(x + y)\n", 281 | " y = self.ffwd(x)\n", 282 | " x = self.ln2(x + y)\n", 283 | " return x\n", 284 | "\n", 285 | "class GPTLanguageModel(nn.Module):\n", 286 | " def __init__(self, vocab_size):\n", 287 | " super().__init__()\n", 288 | " self.token_embedding_table = nn.Embedding(vocab_size, n_embd)\n", 289 | " self.position_embedding_table = nn.Embedding(block_size, n_embd)\n", 290 | " self.blocks = nn.Sequential(*[Block(n_embd, n_head=n_head) for _ in range(n_layer)])\n", 291 | " self.ln_f = nn.LayerNorm(n_embd) # final layer norm\n", 292 | " self.lm_head = nn.Linear(n_embd, vocab_size)\n", 293 | "\n", 294 | "\n", 295 | " self.apply(self._init_weights)\n", 296 | "\n", 297 | " def _init_weights(self, module):\n", 298 | " if isinstance(module, nn.Linear):\n", 299 | " torch.nn.init.normal_(module.weight, mean=0.0, std=0.02)\n", 300 | " if module.bias is not None:\n", 301 | " torch.nn.init.zeros_(module.bias)\n", 302 | " elif isinstance(module, nn.Embedding):\n", 303 | " torch.nn.init.normal_(module.weight, mean=0.0, std=0.02)\n", 304 | "\n", 305 | " def forward(self, index, targets=None):\n", 306 | " B, T = index.shape\n", 307 | "\n", 308 | "\n", 309 | " # idx and targets are both (B,T) tensor of integers\n", 310 | " tok_emb = self.token_embedding_table(index) # (B,T,C)\n", 311 | " pos_emb = self.position_embedding_table(torch.arange(T, device=device)) # (T,C)\n", 312 | " x = tok_emb + pos_emb # (B,T,C)\n", 313 | " x = self.blocks(x) # (B,T,C)\n", 314 | " x = self.ln_f(x) # (B,T,C)\n", 315 | " logits = self.lm_head(x) # (B,T,vocab_size)\n", 316 | "\n", 317 | " if targets is None:\n", 318 | " loss = None\n", 319 | " else:\n", 320 | " B, T, C = logits.shape\n", 321 | " logits = logits.view(B*T, C)\n", 322 | " targets = targets.view(B*T)\n", 323 | " loss = F.cross_entropy(logits, targets)\n", 324 | "\n", 325 | " return logits, loss\n", 326 | "\n", 327 | " def generate(self, index, max_new_tokens):\n", 328 | " # index is (B, T) array of indices in the current context\n", 329 | " for _ in range(max_new_tokens):\n", 330 | " # crop idx to the last block_size tokens\n", 331 | " index_cond = index[:, -block_size:]\n", 332 | " # get the predictions\n", 333 | " logits, loss = self.forward(index_cond)\n", 334 | " # focus only on the last time step\n", 335 | " logits = logits[:, -1, :] # becomes (B, C)\n", 336 | " # apply softmax to get probabilities\n", 337 | " probs = F.softmax(logits, dim=-1) # (B, C)\n", 338 | " # sample from the distribution\n", 339 | " index_next = torch.multinomial(probs, num_samples=1) # (B, 1)\n", 340 | " # append sampled index to the running sequence\n", 341 | " index = torch.cat((index, index_next), dim=1) # (B, T+1)\n", 342 | " return index\n", 343 | "\n", 344 | "model = GPTLanguageModel(vocab_size)\n", 345 | "# print('loading model parameters...')\n", 346 | "# with 
open('model-01.pkl', 'rb') as f:\n", 347 | "# model = pickle.load(f)\n", 348 | "# print('loaded successfully!')\n", 349 | "m = model.to(device)" 350 | ], 351 | "metadata": { 352 | "id": "K_T8l3GkYz0p" 353 | }, 354 | "execution_count": null, 355 | "outputs": [] 356 | }, 357 | { 358 | "cell_type": "code", 359 | "source": [ 360 | "# create a PyTorch optimizer\n", 361 | "optimizer = torch.optim.AdamW(model.parameters(), lr=learning_rate)\n", 362 | "\n", 363 | "for iter in range(max_iters):\n", 364 | " print(iter)\n", 365 | " if iter % eval_iters == 0:\n", 366 | " losses = estimate_loss()\n", 367 | " print(f\"step: {iter}, train loss: {losses['train']:.3f}, val loss: {losses['val']:.3f}\")\n", 368 | "\n", 369 | " # sample a batch of data\n", 370 | " xb, yb = get_batch('train')\n", 371 | "\n", 372 | " # evaluate the loss\n", 373 | " logits, loss = model.forward(xb, yb)\n", 374 | " optimizer.zero_grad(set_to_none=True)\n", 375 | " loss.backward()\n", 376 | " optimizer.step()\n", 377 | "print(loss.item())" 378 | ], 379 | "metadata": { 380 | "colab": { 381 | "base_uri": "https://localhost:8080/" 382 | }, 383 | "id": "Qkqn0mq5Y1YC", 384 | "outputId": "ed9b0974-f993-4949-b0f6-f81b80f55874" 385 | }, 386 | "execution_count": null, 387 | "outputs": [ 388 | { 389 | "output_type": "stream", 390 | "name": "stdout", 391 | "text": [ 392 | "0\n", 393 | "step: 0, train loss: 4.239, val loss: 4.236\n", 394 | "1\n", 395 | "2\n", 396 | "3\n", 397 | "4\n", 398 | "5\n", 399 | "6\n", 400 | "7\n", 401 | "8\n", 402 | "9\n", 403 | "10\n", 404 | "11\n", 405 | "12\n", 406 | "13\n", 407 | "14\n", 408 | "15\n", 409 | "16\n", 410 | "17\n", 411 | "18\n", 412 | "19\n", 413 | "20\n", 414 | "21\n", 415 | "22\n", 416 | "23\n", 417 | "24\n", 418 | "25\n", 419 | "26\n", 420 | "27\n", 421 | "28\n", 422 | "29\n", 423 | "30\n", 424 | "31\n", 425 | "32\n", 426 | "33\n", 427 | "34\n", 428 | "35\n", 429 | "36\n", 430 | "37\n", 431 | "38\n", 432 | "39\n", 433 | "40\n", 434 | "41\n", 435 | "42\n", 436 | "43\n", 437 | "44\n", 438 | "45\n", 439 | "46\n", 440 | "47\n", 441 | "48\n", 442 | "49\n", 443 | "50\n", 444 | "step: 50, train loss: 2.525, val loss: 2.546\n", 445 | "51\n", 446 | "52\n", 447 | "53\n", 448 | "54\n", 449 | "55\n", 450 | "56\n", 451 | "57\n", 452 | "58\n", 453 | "59\n", 454 | "60\n", 455 | "61\n", 456 | "62\n", 457 | "63\n", 458 | "64\n", 459 | "65\n", 460 | "66\n", 461 | "67\n", 462 | "68\n", 463 | "69\n", 464 | "70\n", 465 | "71\n", 466 | "72\n", 467 | "73\n", 468 | "74\n", 469 | "75\n", 470 | "76\n", 471 | "77\n", 472 | "78\n", 473 | "79\n", 474 | "80\n", 475 | "81\n", 476 | "82\n", 477 | "83\n", 478 | "84\n", 479 | "85\n", 480 | "86\n", 481 | "87\n", 482 | "88\n", 483 | "89\n", 484 | "90\n", 485 | "91\n", 486 | "92\n", 487 | "93\n", 488 | "94\n", 489 | "95\n", 490 | "96\n", 491 | "97\n", 492 | "98\n", 493 | "99\n", 494 | "100\n", 495 | "step: 100, train loss: 2.465, val loss: 2.506\n", 496 | "101\n", 497 | "102\n", 498 | "103\n", 499 | "104\n", 500 | "105\n", 501 | "106\n", 502 | "107\n", 503 | "108\n", 504 | "109\n", 505 | "110\n", 506 | "111\n", 507 | "112\n", 508 | "113\n", 509 | "114\n", 510 | "115\n", 511 | "116\n", 512 | "117\n", 513 | "118\n", 514 | "119\n", 515 | "120\n", 516 | "121\n", 517 | "122\n", 518 | "123\n", 519 | "124\n", 520 | "125\n", 521 | "126\n", 522 | "127\n", 523 | "128\n", 524 | "129\n", 525 | "130\n", 526 | "131\n", 527 | "132\n", 528 | "133\n", 529 | "134\n", 530 | "135\n", 531 | "136\n", 532 | "137\n", 533 | "138\n", 534 | "139\n", 535 | "140\n", 536 | "141\n", 537 | "142\n", 538 | "143\n", 
539 | "144\n", 540 | "145\n", 541 | "146\n", 542 | "147\n", 543 | "148\n", 544 | "149\n", 545 | "150\n", 546 | "step: 150, train loss: 2.400, val loss: 2.442\n", 547 | "151\n", 548 | "152\n", 549 | "153\n", 550 | "154\n", 551 | "155\n", 552 | "156\n", 553 | "157\n", 554 | "158\n", 555 | "159\n", 556 | "160\n", 557 | "161\n", 558 | "162\n", 559 | "163\n", 560 | "164\n", 561 | "165\n", 562 | "166\n", 563 | "167\n", 564 | "168\n", 565 | "169\n", 566 | "170\n", 567 | "171\n", 568 | "172\n", 569 | "173\n", 570 | "174\n", 571 | "175\n", 572 | "176\n", 573 | "177\n", 574 | "178\n", 575 | "179\n", 576 | "180\n", 577 | "181\n", 578 | "182\n", 579 | "183\n", 580 | "184\n", 581 | "185\n", 582 | "186\n", 583 | "187\n", 584 | "188\n", 585 | "189\n", 586 | "190\n", 587 | "191\n", 588 | "192\n", 589 | "193\n", 590 | "194\n", 591 | "195\n", 592 | "196\n", 593 | "197\n", 594 | "198\n", 595 | "199\n", 596 | "200\n", 597 | "step: 200, train loss: 2.311, val loss: 2.363\n", 598 | "201\n", 599 | "202\n", 600 | "203\n", 601 | "204\n", 602 | "205\n", 603 | "206\n", 604 | "207\n", 605 | "208\n", 606 | "209\n", 607 | "210\n", 608 | "211\n", 609 | "212\n", 610 | "213\n", 611 | "214\n", 612 | "215\n", 613 | "216\n", 614 | "217\n", 615 | "218\n", 616 | "219\n", 617 | "220\n", 618 | "221\n", 619 | "222\n", 620 | "223\n", 621 | "224\n", 622 | "225\n", 623 | "226\n", 624 | "227\n", 625 | "228\n", 626 | "229\n", 627 | "230\n", 628 | "231\n", 629 | "232\n", 630 | "233\n", 631 | "234\n", 632 | "235\n", 633 | "236\n", 634 | "237\n", 635 | "238\n", 636 | "239\n", 637 | "240\n", 638 | "241\n", 639 | "242\n", 640 | "243\n", 641 | "244\n", 642 | "245\n", 643 | "246\n", 644 | "247\n", 645 | "248\n", 646 | "249\n", 647 | "250\n", 648 | "step: 250, train loss: 2.197, val loss: 2.248\n", 649 | "251\n", 650 | "252\n", 651 | "253\n", 652 | "254\n", 653 | "255\n", 654 | "256\n", 655 | "257\n", 656 | "258\n", 657 | "259\n", 658 | "260\n", 659 | "261\n", 660 | "262\n", 661 | "263\n", 662 | "264\n", 663 | "265\n", 664 | "266\n", 665 | "267\n", 666 | "268\n", 667 | "269\n", 668 | "270\n", 669 | "271\n", 670 | "272\n", 671 | "273\n", 672 | "274\n", 673 | "275\n", 674 | "276\n", 675 | "277\n", 676 | "278\n", 677 | "279\n", 678 | "280\n", 679 | "281\n", 680 | "282\n", 681 | "283\n", 682 | "284\n", 683 | "285\n", 684 | "286\n", 685 | "287\n", 686 | "288\n", 687 | "289\n", 688 | "290\n", 689 | "291\n", 690 | "292\n", 691 | "293\n", 692 | "294\n", 693 | "295\n", 694 | "296\n", 695 | "297\n", 696 | "298\n", 697 | "299\n", 698 | "300\n", 699 | "step: 300, train loss: 2.113, val loss: 2.161\n", 700 | "301\n", 701 | "302\n", 702 | "303\n", 703 | "304\n", 704 | "305\n", 705 | "306\n", 706 | "307\n", 707 | "308\n", 708 | "309\n", 709 | "310\n", 710 | "311\n", 711 | "312\n", 712 | "313\n", 713 | "314\n", 714 | "315\n", 715 | "316\n", 716 | "317\n", 717 | "318\n", 718 | "319\n", 719 | "320\n", 720 | "321\n", 721 | "322\n", 722 | "323\n", 723 | "324\n", 724 | "325\n", 725 | "326\n", 726 | "327\n", 727 | "328\n", 728 | "329\n", 729 | "330\n", 730 | "331\n", 731 | "332\n", 732 | "333\n", 733 | "334\n", 734 | "335\n", 735 | "336\n", 736 | "337\n", 737 | "338\n", 738 | "339\n", 739 | "340\n", 740 | "341\n", 741 | "342\n", 742 | "343\n", 743 | "344\n", 744 | "345\n", 745 | "346\n", 746 | "347\n", 747 | "348\n", 748 | "349\n", 749 | "350\n", 750 | "step: 350, train loss: 2.056, val loss: 2.128\n", 751 | "351\n", 752 | "352\n", 753 | "353\n", 754 | "354\n", 755 | "355\n", 756 | "356\n", 757 | "357\n", 758 | "358\n", 759 | "359\n", 760 | "360\n", 761 | "361\n", 
762 | "362\n", 763 | "363\n", 764 | "364\n", 765 | "365\n", 766 | "366\n", 767 | "367\n", 768 | "368\n", 769 | "369\n", 770 | "370\n", 771 | "371\n", 772 | "372\n", 773 | "373\n", 774 | "374\n", 775 | "375\n", 776 | "376\n", 777 | "377\n", 778 | "378\n", 779 | "379\n", 780 | "380\n", 781 | "381\n", 782 | "382\n", 783 | "383\n", 784 | "384\n", 785 | "385\n", 786 | "386\n", 787 | "387\n", 788 | "388\n", 789 | "389\n", 790 | "390\n", 791 | "391\n", 792 | "392\n", 793 | "393\n", 794 | "394\n", 795 | "395\n", 796 | "396\n", 797 | "397\n", 798 | "398\n", 799 | "399\n", 800 | "400\n", 801 | "step: 400, train loss: 1.992, val loss: 2.070\n", 802 | "401\n", 803 | "402\n", 804 | "403\n", 805 | "404\n", 806 | "405\n", 807 | "406\n", 808 | "407\n", 809 | "408\n", 810 | "409\n", 811 | "410\n", 812 | "411\n", 813 | "412\n", 814 | "413\n", 815 | "414\n", 816 | "415\n", 817 | "416\n", 818 | "417\n", 819 | "418\n", 820 | "419\n", 821 | "420\n", 822 | "421\n", 823 | "422\n", 824 | "423\n", 825 | "424\n", 826 | "425\n", 827 | "426\n", 828 | "427\n", 829 | "428\n", 830 | "429\n", 831 | "430\n", 832 | "431\n", 833 | "432\n", 834 | "433\n", 835 | "434\n", 836 | "435\n", 837 | "436\n", 838 | "437\n", 839 | "438\n", 840 | "439\n", 841 | "440\n", 842 | "441\n", 843 | "442\n", 844 | "443\n", 845 | "444\n", 846 | "445\n", 847 | "446\n", 848 | "447\n", 849 | "448\n", 850 | "449\n", 851 | "450\n", 852 | "step: 450, train loss: 1.923, val loss: 2.019\n", 853 | "451\n", 854 | "452\n", 855 | "453\n", 856 | "454\n", 857 | "455\n", 858 | "456\n", 859 | "457\n", 860 | "458\n", 861 | "459\n", 862 | "460\n", 863 | "461\n", 864 | "462\n", 865 | "463\n", 866 | "464\n", 867 | "465\n", 868 | "466\n", 869 | "467\n", 870 | "468\n", 871 | "469\n", 872 | "470\n", 873 | "471\n", 874 | "472\n", 875 | "473\n", 876 | "474\n", 877 | "475\n", 878 | "476\n", 879 | "477\n", 880 | "478\n", 881 | "479\n", 882 | "480\n", 883 | "481\n", 884 | "482\n", 885 | "483\n", 886 | "484\n", 887 | "485\n", 888 | "486\n", 889 | "487\n", 890 | "488\n", 891 | "489\n", 892 | "490\n", 893 | "491\n", 894 | "492\n", 895 | "493\n", 896 | "494\n", 897 | "495\n", 898 | "496\n", 899 | "497\n", 900 | "498\n", 901 | "499\n", 902 | "500\n", 903 | "step: 500, train loss: 1.874, val loss: 1.981\n", 904 | "501\n", 905 | "502\n", 906 | "503\n", 907 | "504\n", 908 | "505\n", 909 | "506\n", 910 | "507\n", 911 | "508\n", 912 | "509\n", 913 | "510\n", 914 | "511\n", 915 | "512\n", 916 | "513\n", 917 | "514\n", 918 | "515\n", 919 | "516\n", 920 | "517\n", 921 | "518\n", 922 | "519\n", 923 | "520\n", 924 | "521\n", 925 | "522\n", 926 | "523\n", 927 | "524\n", 928 | "525\n", 929 | "526\n", 930 | "527\n", 931 | "528\n", 932 | "529\n", 933 | "530\n", 934 | "531\n", 935 | "532\n", 936 | "533\n", 937 | "534\n", 938 | "535\n", 939 | "536\n", 940 | "537\n", 941 | "538\n", 942 | "539\n", 943 | "540\n", 944 | "541\n", 945 | "542\n", 946 | "543\n", 947 | "544\n", 948 | "545\n", 949 | "546\n", 950 | "547\n", 951 | "548\n", 952 | "549\n", 953 | "550\n", 954 | "step: 550, train loss: 1.839, val loss: 1.961\n", 955 | "551\n", 956 | "552\n", 957 | "553\n", 958 | "554\n", 959 | "555\n", 960 | "556\n", 961 | "557\n", 962 | "558\n", 963 | "559\n", 964 | "560\n", 965 | "561\n", 966 | "562\n", 967 | "563\n", 968 | "564\n", 969 | "565\n", 970 | "566\n", 971 | "567\n", 972 | "568\n", 973 | "569\n", 974 | "570\n", 975 | "571\n", 976 | "572\n", 977 | "573\n", 978 | "574\n", 979 | "575\n", 980 | "576\n", 981 | "577\n", 982 | "578\n", 983 | "579\n", 984 | "580\n", 985 | "581\n", 986 | "582\n", 987 | 
"583\n", 988 | "584\n", 989 | "585\n", 990 | "586\n", 991 | "587\n", 992 | "588\n", 993 | "589\n", 994 | "590\n", 995 | "591\n", 996 | "592\n", 997 | "593\n", 998 | "594\n", 999 | "595\n", 1000 | "596\n", 1001 | "597\n", 1002 | "598\n", 1003 | "599\n", 1004 | "600\n", 1005 | "step: 600, train loss: 1.801, val loss: 1.918\n", 1006 | "601\n", 1007 | "602\n", 1008 | "603\n", 1009 | "604\n", 1010 | "605\n", 1011 | "606\n", 1012 | "607\n", 1013 | "608\n", 1014 | "609\n", 1015 | "610\n", 1016 | "611\n", 1017 | "612\n", 1018 | "613\n", 1019 | "614\n", 1020 | "615\n", 1021 | "616\n", 1022 | "617\n", 1023 | "618\n", 1024 | "619\n", 1025 | "620\n", 1026 | "621\n", 1027 | "622\n", 1028 | "623\n", 1029 | "624\n", 1030 | "625\n", 1031 | "626\n", 1032 | "627\n", 1033 | "628\n", 1034 | "629\n", 1035 | "630\n", 1036 | "631\n", 1037 | "632\n", 1038 | "633\n", 1039 | "634\n", 1040 | "635\n", 1041 | "636\n", 1042 | "637\n", 1043 | "638\n", 1044 | "639\n", 1045 | "640\n", 1046 | "641\n", 1047 | "642\n", 1048 | "643\n", 1049 | "644\n", 1050 | "645\n", 1051 | "646\n", 1052 | "647\n", 1053 | "648\n", 1054 | "649\n", 1055 | "650\n", 1056 | "step: 650, train loss: 1.769, val loss: 1.911\n", 1057 | "651\n", 1058 | "652\n", 1059 | "653\n", 1060 | "654\n", 1061 | "655\n", 1062 | "656\n", 1063 | "657\n", 1064 | "658\n", 1065 | "659\n", 1066 | "660\n", 1067 | "661\n", 1068 | "662\n", 1069 | "663\n", 1070 | "664\n", 1071 | "665\n", 1072 | "666\n", 1073 | "667\n", 1074 | "668\n", 1075 | "669\n", 1076 | "670\n", 1077 | "671\n", 1078 | "672\n", 1079 | "673\n", 1080 | "674\n", 1081 | "675\n", 1082 | "676\n", 1083 | "677\n", 1084 | "678\n", 1085 | "679\n", 1086 | "680\n", 1087 | "681\n", 1088 | "682\n", 1089 | "683\n", 1090 | "684\n", 1091 | "685\n", 1092 | "686\n", 1093 | "687\n", 1094 | "688\n", 1095 | "689\n", 1096 | "690\n", 1097 | "691\n", 1098 | "692\n", 1099 | "693\n", 1100 | "694\n", 1101 | "695\n", 1102 | "696\n", 1103 | "697\n", 1104 | "698\n", 1105 | "699\n", 1106 | "700\n", 1107 | "step: 700, train loss: 1.733, val loss: 1.895\n", 1108 | "701\n", 1109 | "702\n", 1110 | "703\n", 1111 | "704\n", 1112 | "705\n", 1113 | "706\n", 1114 | "707\n", 1115 | "708\n", 1116 | "709\n", 1117 | "710\n", 1118 | "711\n", 1119 | "712\n", 1120 | "713\n", 1121 | "714\n", 1122 | "715\n", 1123 | "716\n", 1124 | "717\n", 1125 | "718\n", 1126 | "719\n", 1127 | "720\n", 1128 | "721\n", 1129 | "722\n", 1130 | "723\n", 1131 | "724\n", 1132 | "725\n", 1133 | "726\n", 1134 | "727\n", 1135 | "728\n", 1136 | "729\n", 1137 | "730\n", 1138 | "731\n", 1139 | "732\n", 1140 | "733\n", 1141 | "734\n", 1142 | "735\n", 1143 | "736\n", 1144 | "737\n", 1145 | "738\n", 1146 | "739\n", 1147 | "740\n", 1148 | "741\n", 1149 | "742\n", 1150 | "743\n", 1151 | "744\n", 1152 | "745\n", 1153 | "746\n", 1154 | "747\n", 1155 | "748\n", 1156 | "749\n", 1157 | "750\n", 1158 | "step: 750, train loss: 1.713, val loss: 1.869\n", 1159 | "751\n", 1160 | "752\n", 1161 | "753\n", 1162 | "754\n", 1163 | "755\n", 1164 | "756\n", 1165 | "757\n", 1166 | "758\n", 1167 | "759\n", 1168 | "760\n", 1169 | "761\n", 1170 | "762\n", 1171 | "763\n", 1172 | "764\n", 1173 | "765\n", 1174 | "766\n", 1175 | "767\n", 1176 | "768\n", 1177 | "769\n", 1178 | "770\n", 1179 | "771\n", 1180 | "772\n", 1181 | "773\n", 1182 | "774\n", 1183 | "775\n", 1184 | "776\n", 1185 | "777\n", 1186 | "778\n", 1187 | "779\n", 1188 | "780\n", 1189 | "781\n", 1190 | "782\n", 1191 | "783\n", 1192 | "784\n", 1193 | "785\n", 1194 | "786\n", 1195 | "787\n", 1196 | "788\n", 1197 | "789\n", 1198 | "790\n", 1199 | 
"791\n", 1200 | "792\n", 1201 | "793\n", 1202 | "794\n", 1203 | "795\n", 1204 | "796\n", 1205 | "797\n", 1206 | "798\n", 1207 | "799\n", 1208 | "800\n", 1209 | "step: 800, train loss: 1.697, val loss: 1.868\n", 1210 | "801\n", 1211 | "802\n", 1212 | "803\n", 1213 | "804\n", 1214 | "805\n", 1215 | "806\n", 1216 | "807\n", 1217 | "808\n", 1218 | "809\n", 1219 | "810\n", 1220 | "811\n", 1221 | "812\n", 1222 | "813\n", 1223 | "814\n", 1224 | "815\n", 1225 | "816\n", 1226 | "817\n", 1227 | "818\n", 1228 | "819\n", 1229 | "820\n", 1230 | "821\n", 1231 | "822\n", 1232 | "823\n", 1233 | "824\n", 1234 | "825\n", 1235 | "826\n", 1236 | "827\n", 1237 | "828\n", 1238 | "829\n", 1239 | "830\n", 1240 | "831\n", 1241 | "832\n", 1242 | "833\n", 1243 | "834\n", 1244 | "835\n", 1245 | "836\n", 1246 | "837\n", 1247 | "838\n", 1248 | "839\n", 1249 | "840\n", 1250 | "841\n", 1251 | "842\n", 1252 | "843\n", 1253 | "844\n", 1254 | "845\n", 1255 | "846\n", 1256 | "847\n", 1257 | "848\n", 1258 | "849\n", 1259 | "850\n", 1260 | "step: 850, train loss: 1.676, val loss: 1.824\n", 1261 | "851\n", 1262 | "852\n", 1263 | "853\n", 1264 | "854\n", 1265 | "855\n", 1266 | "856\n", 1267 | "857\n", 1268 | "858\n", 1269 | "859\n", 1270 | "860\n", 1271 | "861\n", 1272 | "862\n", 1273 | "863\n", 1274 | "864\n", 1275 | "865\n", 1276 | "866\n", 1277 | "867\n", 1278 | "868\n", 1279 | "869\n", 1280 | "870\n", 1281 | "871\n", 1282 | "872\n", 1283 | "873\n", 1284 | "874\n", 1285 | "875\n", 1286 | "876\n", 1287 | "877\n", 1288 | "878\n", 1289 | "879\n", 1290 | "880\n", 1291 | "881\n", 1292 | "882\n", 1293 | "883\n", 1294 | "884\n", 1295 | "885\n", 1296 | "886\n", 1297 | "887\n", 1298 | "888\n", 1299 | "889\n", 1300 | "890\n", 1301 | "891\n", 1302 | "892\n", 1303 | "893\n", 1304 | "894\n", 1305 | "895\n", 1306 | "896\n", 1307 | "897\n", 1308 | "898\n", 1309 | "899\n", 1310 | "900\n", 1311 | "step: 900, train loss: 1.640, val loss: 1.837\n", 1312 | "901\n", 1313 | "902\n", 1314 | "903\n", 1315 | "904\n", 1316 | "905\n", 1317 | "906\n", 1318 | "907\n", 1319 | "908\n", 1320 | "909\n", 1321 | "910\n", 1322 | "911\n", 1323 | "912\n", 1324 | "913\n", 1325 | "914\n", 1326 | "915\n", 1327 | "916\n", 1328 | "917\n", 1329 | "918\n", 1330 | "919\n", 1331 | "920\n", 1332 | "921\n", 1333 | "922\n", 1334 | "923\n", 1335 | "924\n", 1336 | "925\n", 1337 | "926\n", 1338 | "927\n", 1339 | "928\n", 1340 | "929\n", 1341 | "930\n", 1342 | "931\n", 1343 | "932\n", 1344 | "933\n", 1345 | "934\n", 1346 | "935\n", 1347 | "936\n", 1348 | "937\n", 1349 | "938\n", 1350 | "939\n", 1351 | "940\n", 1352 | "941\n", 1353 | "942\n", 1354 | "943\n", 1355 | "944\n", 1356 | "945\n", 1357 | "946\n", 1358 | "947\n", 1359 | "948\n", 1360 | "949\n", 1361 | "950\n", 1362 | "step: 950, train loss: 1.625, val loss: 1.834\n", 1363 | "951\n", 1364 | "952\n", 1365 | "953\n", 1366 | "954\n", 1367 | "955\n", 1368 | "956\n", 1369 | "957\n", 1370 | "958\n", 1371 | "959\n", 1372 | "960\n", 1373 | "961\n", 1374 | "962\n", 1375 | "963\n", 1376 | "964\n", 1377 | "965\n", 1378 | "966\n", 1379 | "967\n", 1380 | "968\n", 1381 | "969\n", 1382 | "970\n", 1383 | "971\n", 1384 | "972\n", 1385 | "973\n", 1386 | "974\n", 1387 | "975\n", 1388 | "976\n", 1389 | "977\n", 1390 | "978\n", 1391 | "979\n", 1392 | "980\n", 1393 | "981\n", 1394 | "982\n", 1395 | "983\n", 1396 | "984\n", 1397 | "985\n", 1398 | "986\n", 1399 | "987\n", 1400 | "988\n", 1401 | "989\n", 1402 | "990\n", 1403 | "991\n", 1404 | "992\n", 1405 | "993\n", 1406 | "994\n", 1407 | "995\n", 1408 | "996\n", 1409 | "997\n", 1410 | "998\n", 
1411 | "999\n", 1412 | "1000\n", 1413 | "step: 1000, train loss: 1.618, val loss: 1.819\n", 1414 | "1001\n", 1415 | "1002\n", 1416 | "1003\n", 1417 | "1004\n", 1418 | "1005\n", 1419 | "1006\n", 1420 | "1007\n", 1421 | "1008\n", 1422 | "1009\n", 1423 | "1010\n", 1424 | "1011\n", 1425 | "1012\n", 1426 | "1013\n", 1427 | "1014\n", 1428 | "1015\n", 1429 | "1016\n", 1430 | "1017\n", 1431 | "1018\n", 1432 | "1019\n", 1433 | "1020\n", 1434 | "1021\n", 1435 | "1022\n", 1436 | "1023\n", 1437 | "1024\n", 1438 | "1025\n", 1439 | "1026\n", 1440 | "1027\n", 1441 | "1028\n", 1442 | "1029\n", 1443 | "1030\n", 1444 | "1031\n", 1445 | "1032\n", 1446 | "1033\n", 1447 | "1034\n", 1448 | "1035\n", 1449 | "1036\n", 1450 | "1037\n", 1451 | "1038\n", 1452 | "1039\n", 1453 | "1040\n", 1454 | "1041\n", 1455 | "1042\n", 1456 | "1043\n", 1457 | "1044\n", 1458 | "1045\n", 1459 | "1046\n", 1460 | "1047\n", 1461 | "1048\n", 1462 | "1049\n", 1463 | "1050\n", 1464 | "step: 1050, train loss: 1.593, val loss: 1.804\n", 1465 | "1051\n", 1466 | "1052\n", 1467 | "1053\n", 1468 | "1054\n", 1469 | "1055\n", 1470 | "1056\n", 1471 | "1057\n", 1472 | "1058\n", 1473 | "1059\n", 1474 | "1060\n", 1475 | "1061\n", 1476 | "1062\n", 1477 | "1063\n", 1478 | "1064\n", 1479 | "1065\n", 1480 | "1066\n", 1481 | "1067\n", 1482 | "1068\n", 1483 | "1069\n", 1484 | "1070\n", 1485 | "1071\n", 1486 | "1072\n", 1487 | "1073\n", 1488 | "1074\n", 1489 | "1075\n", 1490 | "1076\n", 1491 | "1077\n", 1492 | "1078\n", 1493 | "1079\n", 1494 | "1080\n", 1495 | "1081\n", 1496 | "1082\n", 1497 | "1083\n", 1498 | "1084\n", 1499 | "1085\n", 1500 | "1086\n", 1501 | "1087\n", 1502 | "1088\n", 1503 | "1089\n", 1504 | "1090\n", 1505 | "1091\n", 1506 | "1092\n", 1507 | "1093\n", 1508 | "1094\n", 1509 | "1095\n", 1510 | "1096\n", 1511 | "1097\n", 1512 | "1098\n", 1513 | "1099\n", 1514 | "1100\n", 1515 | "step: 1100, train loss: 1.583, val loss: 1.799\n", 1516 | "1101\n", 1517 | "1102\n", 1518 | "1103\n", 1519 | "1104\n", 1520 | "1105\n", 1521 | "1106\n", 1522 | "1107\n", 1523 | "1108\n", 1524 | "1109\n", 1525 | "1110\n", 1526 | "1111\n", 1527 | "1112\n", 1528 | "1113\n", 1529 | "1114\n", 1530 | "1115\n", 1531 | "1116\n", 1532 | "1117\n", 1533 | "1118\n", 1534 | "1119\n", 1535 | "1120\n", 1536 | "1121\n", 1537 | "1122\n", 1538 | "1123\n", 1539 | "1124\n", 1540 | "1125\n", 1541 | "1126\n", 1542 | "1127\n", 1543 | "1128\n", 1544 | "1129\n", 1545 | "1130\n", 1546 | "1131\n", 1547 | "1132\n", 1548 | "1133\n", 1549 | "1134\n", 1550 | "1135\n", 1551 | "1136\n", 1552 | "1137\n", 1553 | "1138\n", 1554 | "1139\n", 1555 | "1140\n", 1556 | "1141\n", 1557 | "1142\n", 1558 | "1143\n", 1559 | "1144\n", 1560 | "1145\n", 1561 | "1146\n", 1562 | "1147\n", 1563 | "1148\n", 1564 | "1149\n", 1565 | "1150\n", 1566 | "step: 1150, train loss: 1.574, val loss: 1.794\n", 1567 | "1151\n", 1568 | "1152\n", 1569 | "1153\n", 1570 | "1154\n", 1571 | "1155\n", 1572 | "1156\n", 1573 | "1157\n", 1574 | "1158\n", 1575 | "1159\n", 1576 | "1160\n", 1577 | "1161\n", 1578 | "1162\n", 1579 | "1163\n", 1580 | "1164\n", 1581 | "1165\n", 1582 | "1166\n", 1583 | "1167\n", 1584 | "1168\n", 1585 | "1169\n", 1586 | "1170\n", 1587 | "1171\n", 1588 | "1172\n", 1589 | "1173\n", 1590 | "1174\n", 1591 | "1175\n", 1592 | "1176\n", 1593 | "1177\n", 1594 | "1178\n", 1595 | "1179\n", 1596 | "1180\n", 1597 | "1181\n", 1598 | "1182\n", 1599 | "1183\n", 1600 | "1184\n", 1601 | "1185\n", 1602 | "1186\n", 1603 | "1187\n", 1604 | "1188\n", 1605 | "1189\n", 1606 | "1190\n", 1607 | "1191\n", 1608 | "1192\n", 1609 | "1193\n", 1610 
| "1194\n", 1611 | "1195\n", 1612 | "1196\n", 1613 | "1197\n", 1614 | "1198\n", 1615 | "1199\n", 1616 | "1200\n", 1617 | "step: 1200, train loss: 1.566, val loss: 1.769\n", 1618 | "1201\n", 1619 | "1202\n", 1620 | "1203\n", 1621 | "1204\n", 1622 | "1205\n", 1623 | "1206\n", 1624 | "1207\n", 1625 | "1208\n", 1626 | "1209\n", 1627 | "1210\n", 1628 | "1211\n", 1629 | "1212\n", 1630 | "1213\n", 1631 | "1214\n", 1632 | "1215\n", 1633 | "1216\n", 1634 | "1217\n", 1635 | "1218\n", 1636 | "1219\n", 1637 | "1220\n", 1638 | "1221\n", 1639 | "1222\n", 1640 | "1223\n", 1641 | "1224\n", 1642 | "1225\n", 1643 | "1226\n", 1644 | "1227\n", 1645 | "1228\n", 1646 | "1229\n", 1647 | "1230\n", 1648 | "1231\n", 1649 | "1232\n", 1650 | "1233\n", 1651 | "1234\n", 1652 | "1235\n", 1653 | "1236\n", 1654 | "1237\n", 1655 | "1238\n", 1656 | "1239\n", 1657 | "1240\n", 1658 | "1241\n", 1659 | "1242\n", 1660 | "1243\n", 1661 | "1244\n", 1662 | "1245\n", 1663 | "1246\n", 1664 | "1247\n", 1665 | "1248\n", 1666 | "1249\n", 1667 | "1250\n", 1668 | "step: 1250, train loss: 1.550, val loss: 1.780\n", 1669 | "1251\n", 1670 | "1252\n", 1671 | "1253\n", 1672 | "1254\n", 1673 | "1255\n", 1674 | "1256\n", 1675 | "1257\n", 1676 | "1258\n", 1677 | "1259\n", 1678 | "1260\n", 1679 | "1261\n", 1680 | "1262\n", 1681 | "1263\n", 1682 | "1264\n", 1683 | "1265\n", 1684 | "1266\n", 1685 | "1267\n", 1686 | "1268\n", 1687 | "1269\n", 1688 | "1270\n", 1689 | "1271\n", 1690 | "1272\n", 1691 | "1273\n", 1692 | "1274\n", 1693 | "1275\n", 1694 | "1276\n", 1695 | "1277\n", 1696 | "1278\n", 1697 | "1279\n", 1698 | "1280\n", 1699 | "1281\n", 1700 | "1282\n", 1701 | "1283\n", 1702 | "1284\n", 1703 | "1285\n", 1704 | "1286\n", 1705 | "1287\n", 1706 | "1288\n", 1707 | "1289\n", 1708 | "1290\n", 1709 | "1291\n", 1710 | "1292\n", 1711 | "1293\n", 1712 | "1294\n", 1713 | "1295\n", 1714 | "1296\n", 1715 | "1297\n", 1716 | "1298\n", 1717 | "1299\n", 1718 | "1300\n", 1719 | "step: 1300, train loss: 1.541, val loss: 1.758\n", 1720 | "1301\n", 1721 | "1302\n", 1722 | "1303\n", 1723 | "1304\n", 1724 | "1305\n", 1725 | "1306\n", 1726 | "1307\n", 1727 | "1308\n", 1728 | "1309\n", 1729 | "1310\n", 1730 | "1311\n", 1731 | "1312\n", 1732 | "1313\n", 1733 | "1314\n", 1734 | "1315\n", 1735 | "1316\n", 1736 | "1317\n", 1737 | "1318\n", 1738 | "1319\n", 1739 | "1320\n", 1740 | "1321\n", 1741 | "1322\n", 1742 | "1323\n", 1743 | "1324\n", 1744 | "1325\n", 1745 | "1326\n", 1746 | "1327\n", 1747 | "1328\n", 1748 | "1329\n", 1749 | "1330\n", 1750 | "1331\n", 1751 | "1332\n", 1752 | "1333\n", 1753 | "1334\n", 1754 | "1335\n", 1755 | "1336\n", 1756 | "1337\n", 1757 | "1338\n", 1758 | "1339\n", 1759 | "1340\n", 1760 | "1341\n", 1761 | "1342\n", 1762 | "1343\n", 1763 | "1344\n", 1764 | "1345\n", 1765 | "1346\n", 1766 | "1347\n", 1767 | "1348\n", 1768 | "1349\n", 1769 | "1350\n", 1770 | "step: 1350, train loss: 1.526, val loss: 1.771\n", 1771 | "1351\n", 1772 | "1352\n", 1773 | "1353\n", 1774 | "1354\n", 1775 | "1355\n", 1776 | "1356\n", 1777 | "1357\n", 1778 | "1358\n", 1779 | "1359\n", 1780 | "1360\n", 1781 | "1361\n", 1782 | "1362\n", 1783 | "1363\n", 1784 | "1364\n", 1785 | "1365\n", 1786 | "1366\n", 1787 | "1367\n", 1788 | "1368\n", 1789 | "1369\n", 1790 | "1370\n", 1791 | "1371\n", 1792 | "1372\n", 1793 | "1373\n", 1794 | "1374\n", 1795 | "1375\n", 1796 | "1376\n", 1797 | "1377\n", 1798 | "1378\n", 1799 | "1379\n", 1800 | "1380\n", 1801 | "1381\n", 1802 | "1382\n", 1803 | "1383\n", 1804 | "1384\n", 1805 | "1385\n", 1806 | "1386\n", 1807 | "1387\n", 1808 | "1388\n", 1809 | 
"1389\n", 1810 | "1390\n", 1811 | "1391\n", 1812 | "1392\n", 1813 | "1393\n", 1814 | "1394\n", 1815 | "1395\n", 1816 | "1396\n", 1817 | "1397\n", 1818 | "1398\n", 1819 | "1399\n", 1820 | "1400\n", 1821 | "step: 1400, train loss: 1.520, val loss: 1.760\n", 1822 | "1401\n", 1823 | "1402\n", 1824 | "1403\n", 1825 | "1404\n", 1826 | "1405\n", 1827 | "1406\n", 1828 | "1407\n", 1829 | "1408\n", 1830 | "1409\n", 1831 | "1410\n", 1832 | "1411\n", 1833 | "1412\n", 1834 | "1413\n", 1835 | "1414\n", 1836 | "1415\n", 1837 | "1416\n", 1838 | "1417\n", 1839 | "1418\n", 1840 | "1419\n", 1841 | "1420\n", 1842 | "1421\n", 1843 | "1422\n", 1844 | "1423\n", 1845 | "1424\n", 1846 | "1425\n", 1847 | "1426\n", 1848 | "1427\n", 1849 | "1428\n", 1850 | "1429\n", 1851 | "1430\n", 1852 | "1431\n", 1853 | "1432\n", 1854 | "1433\n", 1855 | "1434\n", 1856 | "1435\n", 1857 | "1436\n", 1858 | "1437\n", 1859 | "1438\n", 1860 | "1439\n", 1861 | "1440\n", 1862 | "1441\n", 1863 | "1442\n", 1864 | "1443\n", 1865 | "1444\n", 1866 | "1445\n", 1867 | "1446\n", 1868 | "1447\n", 1869 | "1448\n", 1870 | "1449\n", 1871 | "1450\n", 1872 | "step: 1450, train loss: 1.514, val loss: 1.754\n", 1873 | "1451\n", 1874 | "1452\n", 1875 | "1453\n", 1876 | "1454\n", 1877 | "1455\n", 1878 | "1456\n", 1879 | "1457\n", 1880 | "1458\n", 1881 | "1459\n", 1882 | "1460\n", 1883 | "1461\n", 1884 | "1462\n", 1885 | "1463\n", 1886 | "1464\n", 1887 | "1465\n", 1888 | "1466\n", 1889 | "1467\n", 1890 | "1468\n", 1891 | "1469\n", 1892 | "1470\n", 1893 | "1471\n", 1894 | "1472\n", 1895 | "1473\n", 1896 | "1474\n", 1897 | "1475\n", 1898 | "1476\n", 1899 | "1477\n", 1900 | "1478\n", 1901 | "1479\n", 1902 | "1480\n", 1903 | "1481\n", 1904 | "1482\n", 1905 | "1483\n", 1906 | "1484\n", 1907 | "1485\n", 1908 | "1486\n", 1909 | "1487\n", 1910 | "1488\n", 1911 | "1489\n", 1912 | "1490\n", 1913 | "1491\n", 1914 | "1492\n", 1915 | "1493\n", 1916 | "1494\n", 1917 | "1495\n", 1918 | "1496\n", 1919 | "1497\n", 1920 | "1498\n", 1921 | "1499\n", 1922 | "1500\n", 1923 | "step: 1500, train loss: 1.496, val loss: 1.736\n", 1924 | "1501\n", 1925 | "1502\n", 1926 | "1503\n", 1927 | "1504\n", 1928 | "1505\n", 1929 | "1506\n", 1930 | "1507\n", 1931 | "1508\n", 1932 | "1509\n", 1933 | "1510\n", 1934 | "1511\n", 1935 | "1512\n", 1936 | "1513\n", 1937 | "1514\n", 1938 | "1515\n", 1939 | "1516\n", 1940 | "1517\n", 1941 | "1518\n", 1942 | "1519\n", 1943 | "1520\n", 1944 | "1521\n", 1945 | "1522\n", 1946 | "1523\n", 1947 | "1524\n", 1948 | "1525\n", 1949 | "1526\n", 1950 | "1527\n", 1951 | "1528\n", 1952 | "1529\n", 1953 | "1530\n", 1954 | "1531\n", 1955 | "1532\n", 1956 | "1533\n", 1957 | "1534\n", 1958 | "1535\n", 1959 | "1536\n", 1960 | "1537\n", 1961 | "1538\n", 1962 | "1539\n", 1963 | "1540\n", 1964 | "1541\n", 1965 | "1542\n", 1966 | "1543\n", 1967 | "1544\n", 1968 | "1545\n", 1969 | "1546\n", 1970 | "1547\n", 1971 | "1548\n", 1972 | "1549\n", 1973 | "1550\n", 1974 | "step: 1550, train loss: 1.495, val loss: 1.738\n", 1975 | "1551\n", 1976 | "1552\n", 1977 | "1553\n", 1978 | "1554\n", 1979 | "1555\n", 1980 | "1556\n", 1981 | "1557\n", 1982 | "1558\n", 1983 | "1559\n", 1984 | "1560\n", 1985 | "1561\n", 1986 | "1562\n", 1987 | "1563\n", 1988 | "1564\n", 1989 | "1565\n", 1990 | "1566\n", 1991 | "1567\n", 1992 | "1568\n", 1993 | "1569\n", 1994 | "1570\n", 1995 | "1571\n", 1996 | "1572\n", 1997 | "1573\n", 1998 | "1574\n", 1999 | "1575\n", 2000 | "1576\n", 2001 | "1577\n", 2002 | "1578\n", 2003 | "1579\n", 2004 | "1580\n", 2005 | "1581\n", 2006 | "1582\n", 2007 | "1583\n", 2008 | 
"1584\n", 2009 | "1585\n", 2010 | "1586\n", 2011 | "1587\n", 2012 | "1588\n", 2013 | "1589\n", 2014 | "1590\n", 2015 | "1591\n", 2016 | "1592\n", 2017 | "1593\n", 2018 | "1594\n", 2019 | "1595\n", 2020 | "1596\n", 2021 | "1597\n", 2022 | "1598\n", 2023 | "1599\n", 2024 | "1600\n", 2025 | "step: 1600, train loss: 1.493, val loss: 1.726\n", 2026 | "1601\n", 2027 | "1602\n", 2028 | "1603\n", 2029 | "1604\n", 2030 | "1605\n", 2031 | "1606\n", 2032 | "1607\n", 2033 | "1608\n", 2034 | "1609\n", 2035 | "1610\n", 2036 | "1611\n", 2037 | "1612\n", 2038 | "1613\n", 2039 | "1614\n", 2040 | "1615\n", 2041 | "1616\n", 2042 | "1617\n", 2043 | "1618\n", 2044 | "1619\n", 2045 | "1620\n", 2046 | "1621\n", 2047 | "1622\n", 2048 | "1623\n", 2049 | "1624\n", 2050 | "1625\n", 2051 | "1626\n", 2052 | "1627\n", 2053 | "1628\n", 2054 | "1629\n", 2055 | "1630\n", 2056 | "1631\n", 2057 | "1632\n", 2058 | "1633\n", 2059 | "1634\n", 2060 | "1635\n", 2061 | "1636\n", 2062 | "1637\n", 2063 | "1638\n", 2064 | "1639\n", 2065 | "1640\n", 2066 | "1641\n", 2067 | "1642\n", 2068 | "1643\n", 2069 | "1644\n", 2070 | "1645\n", 2071 | "1646\n", 2072 | "1647\n", 2073 | "1648\n", 2074 | "1649\n", 2075 | "1650\n", 2076 | "step: 1650, train loss: 1.477, val loss: 1.730\n", 2077 | "1651\n", 2078 | "1652\n", 2079 | "1653\n", 2080 | "1654\n", 2081 | "1655\n", 2082 | "1656\n", 2083 | "1657\n", 2084 | "1658\n", 2085 | "1659\n", 2086 | "1660\n", 2087 | "1661\n", 2088 | "1662\n", 2089 | "1663\n", 2090 | "1664\n", 2091 | "1665\n", 2092 | "1666\n", 2093 | "1667\n", 2094 | "1668\n", 2095 | "1669\n", 2096 | "1670\n", 2097 | "1671\n", 2098 | "1672\n", 2099 | "1673\n", 2100 | "1674\n", 2101 | "1675\n", 2102 | "1676\n", 2103 | "1677\n", 2104 | "1678\n", 2105 | "1679\n", 2106 | "1680\n", 2107 | "1681\n", 2108 | "1682\n", 2109 | "1683\n", 2110 | "1684\n", 2111 | "1685\n", 2112 | "1686\n", 2113 | "1687\n", 2114 | "1688\n", 2115 | "1689\n", 2116 | "1690\n", 2117 | "1691\n", 2118 | "1692\n", 2119 | "1693\n", 2120 | "1694\n", 2121 | "1695\n", 2122 | "1696\n", 2123 | "1697\n", 2124 | "1698\n", 2125 | "1699\n", 2126 | "1700\n", 2127 | "step: 1700, train loss: 1.472, val loss: 1.727\n", 2128 | "1701\n", 2129 | "1702\n", 2130 | "1703\n", 2131 | "1704\n", 2132 | "1705\n", 2133 | "1706\n", 2134 | "1707\n", 2135 | "1708\n", 2136 | "1709\n", 2137 | "1710\n", 2138 | "1711\n", 2139 | "1712\n", 2140 | "1713\n", 2141 | "1714\n", 2142 | "1715\n", 2143 | "1716\n", 2144 | "1717\n", 2145 | "1718\n", 2146 | "1719\n", 2147 | "1720\n", 2148 | "1721\n", 2149 | "1722\n", 2150 | "1723\n", 2151 | "1724\n", 2152 | "1725\n", 2153 | "1726\n", 2154 | "1727\n", 2155 | "1728\n", 2156 | "1729\n", 2157 | "1730\n", 2158 | "1731\n", 2159 | "1732\n", 2160 | "1733\n", 2161 | "1734\n", 2162 | "1735\n", 2163 | "1736\n", 2164 | "1737\n", 2165 | "1738\n", 2166 | "1739\n", 2167 | "1740\n", 2168 | "1741\n", 2169 | "1742\n", 2170 | "1743\n", 2171 | "1744\n", 2172 | "1745\n", 2173 | "1746\n", 2174 | "1747\n", 2175 | "1748\n", 2176 | "1749\n", 2177 | "1750\n", 2178 | "step: 1750, train loss: 1.463, val loss: 1.729\n", 2179 | "1751\n", 2180 | "1752\n", 2181 | "1753\n", 2182 | "1754\n", 2183 | "1755\n", 2184 | "1756\n", 2185 | "1757\n", 2186 | "1758\n", 2187 | "1759\n", 2188 | "1760\n", 2189 | "1761\n", 2190 | "1762\n", 2191 | "1763\n", 2192 | "1764\n", 2193 | "1765\n", 2194 | "1766\n", 2195 | "1767\n", 2196 | "1768\n", 2197 | "1769\n", 2198 | "1770\n", 2199 | "1771\n", 2200 | "1772\n", 2201 | "1773\n", 2202 | "1774\n", 2203 | "1775\n", 2204 | "1776\n", 2205 | "1777\n", 2206 | "1778\n", 2207 | 
"1779\n", 2208 | "1780\n", 2209 | "1781\n", 2210 | "1782\n", 2211 | "1783\n", 2212 | "1784\n", 2213 | "1785\n", 2214 | "1786\n", 2215 | "1787\n", 2216 | "1788\n", 2217 | "1789\n", 2218 | "1790\n", 2219 | "1791\n", 2220 | "1792\n", 2221 | "1793\n", 2222 | "1794\n", 2223 | "1795\n", 2224 | "1796\n", 2225 | "1797\n", 2226 | "1798\n", 2227 | "1799\n", 2228 | "1800\n", 2229 | "step: 1800, train loss: 1.459, val loss: 1.714\n", 2230 | "1801\n", 2231 | "1802\n", 2232 | "1803\n", 2233 | "1804\n", 2234 | "1805\n", 2235 | "1806\n", 2236 | "1807\n", 2237 | "1808\n", 2238 | "1809\n", 2239 | "1810\n", 2240 | "1811\n", 2241 | "1812\n", 2242 | "1813\n", 2243 | "1814\n", 2244 | "1815\n", 2245 | "1816\n", 2246 | "1817\n", 2247 | "1818\n", 2248 | "1819\n", 2249 | "1820\n", 2250 | "1821\n", 2251 | "1822\n", 2252 | "1823\n", 2253 | "1824\n", 2254 | "1825\n", 2255 | "1826\n", 2256 | "1827\n", 2257 | "1828\n", 2258 | "1829\n", 2259 | "1830\n", 2260 | "1831\n", 2261 | "1832\n", 2262 | "1833\n", 2263 | "1834\n", 2264 | "1835\n", 2265 | "1836\n", 2266 | "1837\n", 2267 | "1838\n", 2268 | "1839\n", 2269 | "1840\n", 2270 | "1841\n", 2271 | "1842\n", 2272 | "1843\n", 2273 | "1844\n", 2274 | "1845\n", 2275 | "1846\n", 2276 | "1847\n", 2277 | "1848\n", 2278 | "1849\n", 2279 | "1850\n", 2280 | "step: 1850, train loss: 1.461, val loss: 1.709\n", 2281 | "1851\n", 2282 | "1852\n", 2283 | "1853\n", 2284 | "1854\n", 2285 | "1855\n", 2286 | "1856\n", 2287 | "1857\n", 2288 | "1858\n", 2289 | "1859\n", 2290 | "1860\n", 2291 | "1861\n", 2292 | "1862\n", 2293 | "1863\n", 2294 | "1864\n", 2295 | "1865\n", 2296 | "1866\n", 2297 | "1867\n", 2298 | "1868\n", 2299 | "1869\n", 2300 | "1870\n", 2301 | "1871\n", 2302 | "1872\n", 2303 | "1873\n", 2304 | "1874\n", 2305 | "1875\n", 2306 | "1876\n", 2307 | "1877\n", 2308 | "1878\n", 2309 | "1879\n", 2310 | "1880\n", 2311 | "1881\n", 2312 | "1882\n", 2313 | "1883\n", 2314 | "1884\n", 2315 | "1885\n", 2316 | "1886\n", 2317 | "1887\n", 2318 | "1888\n", 2319 | "1889\n", 2320 | "1890\n", 2321 | "1891\n", 2322 | "1892\n", 2323 | "1893\n", 2324 | "1894\n", 2325 | "1895\n", 2326 | "1896\n", 2327 | "1897\n", 2328 | "1898\n", 2329 | "1899\n", 2330 | "1900\n", 2331 | "step: 1900, train loss: 1.446, val loss: 1.725\n", 2332 | "1901\n", 2333 | "1902\n", 2334 | "1903\n", 2335 | "1904\n", 2336 | "1905\n", 2337 | "1906\n", 2338 | "1907\n", 2339 | "1908\n", 2340 | "1909\n", 2341 | "1910\n", 2342 | "1911\n", 2343 | "1912\n", 2344 | "1913\n", 2345 | "1914\n", 2346 | "1915\n", 2347 | "1916\n", 2348 | "1917\n", 2349 | "1918\n", 2350 | "1919\n", 2351 | "1920\n", 2352 | "1921\n", 2353 | "1922\n", 2354 | "1923\n", 2355 | "1924\n", 2356 | "1925\n", 2357 | "1926\n", 2358 | "1927\n", 2359 | "1928\n", 2360 | "1929\n", 2361 | "1930\n", 2362 | "1931\n", 2363 | "1932\n", 2364 | "1933\n", 2365 | "1934\n", 2366 | "1935\n", 2367 | "1936\n", 2368 | "1937\n", 2369 | "1938\n", 2370 | "1939\n", 2371 | "1940\n", 2372 | "1941\n", 2373 | "1942\n", 2374 | "1943\n", 2375 | "1944\n", 2376 | "1945\n", 2377 | "1946\n", 2378 | "1947\n", 2379 | "1948\n", 2380 | "1949\n", 2381 | "1950\n", 2382 | "step: 1950, train loss: 1.447, val loss: 1.708\n", 2383 | "1951\n", 2384 | "1952\n", 2385 | "1953\n", 2386 | "1954\n", 2387 | "1955\n", 2388 | "1956\n", 2389 | "1957\n", 2390 | "1958\n", 2391 | "1959\n", 2392 | "1960\n", 2393 | "1961\n", 2394 | "1962\n", 2395 | "1963\n", 2396 | "1964\n", 2397 | "1965\n", 2398 | "1966\n", 2399 | "1967\n", 2400 | "1968\n", 2401 | "1969\n", 2402 | "1970\n", 2403 | "1971\n", 2404 | "1972\n", 2405 | "1973\n", 2406 | 
"1974\n", 2407 | "1975\n", 2408 | "1976\n", 2409 | "1977\n", 2410 | "1978\n", 2411 | "1979\n", 2412 | "1980\n", 2413 | "1981\n", 2414 | "1982\n", 2415 | "1983\n", 2416 | "1984\n", 2417 | "1985\n", 2418 | "1986\n", 2419 | "1987\n", 2420 | "1988\n", 2421 | "1989\n", 2422 | "1990\n", 2423 | "1991\n", 2424 | "1992\n", 2425 | "1993\n", 2426 | "1994\n", 2427 | "1995\n", 2428 | "1996\n", 2429 | "1997\n", 2430 | "1998\n", 2431 | "1999\n", 2432 | "2000\n", 2433 | "step: 2000, train loss: 1.444, val loss: 1.711\n", 2434 | "2001\n", 2435 | "2002\n", 2436 | "2003\n", 2437 | "2004\n", 2438 | "2005\n", 2439 | "2006\n", 2440 | "2007\n", 2441 | "2008\n", 2442 | "2009\n", 2443 | "2010\n", 2444 | "2011\n", 2445 | "2012\n", 2446 | "2013\n", 2447 | "2014\n", 2448 | "2015\n", 2449 | "2016\n", 2450 | "2017\n", 2451 | "2018\n", 2452 | "2019\n", 2453 | "2020\n", 2454 | "2021\n", 2455 | "2022\n", 2456 | "2023\n", 2457 | "2024\n", 2458 | "2025\n", 2459 | "2026\n", 2460 | "2027\n", 2461 | "2028\n", 2462 | "2029\n", 2463 | "2030\n", 2464 | "2031\n", 2465 | "2032\n", 2466 | "2033\n", 2467 | "2034\n", 2468 | "2035\n", 2469 | "2036\n", 2470 | "2037\n", 2471 | "2038\n", 2472 | "2039\n", 2473 | "2040\n", 2474 | "2041\n", 2475 | "2042\n", 2476 | "2043\n", 2477 | "2044\n", 2478 | "2045\n", 2479 | "2046\n", 2480 | "2047\n", 2481 | "2048\n", 2482 | "2049\n", 2483 | "2050\n", 2484 | "step: 2050, train loss: 1.431, val loss: 1.703\n", 2485 | "2051\n", 2486 | "2052\n", 2487 | "2053\n", 2488 | "2054\n", 2489 | "2055\n", 2490 | "2056\n", 2491 | "2057\n", 2492 | "2058\n", 2493 | "2059\n", 2494 | "2060\n", 2495 | "2061\n", 2496 | "2062\n", 2497 | "2063\n", 2498 | "2064\n", 2499 | "2065\n", 2500 | "2066\n", 2501 | "2067\n", 2502 | "2068\n", 2503 | "2069\n", 2504 | "2070\n", 2505 | "2071\n", 2506 | "2072\n", 2507 | "2073\n", 2508 | "2074\n", 2509 | "2075\n", 2510 | "2076\n", 2511 | "2077\n", 2512 | "2078\n", 2513 | "2079\n", 2514 | "2080\n", 2515 | "2081\n", 2516 | "2082\n", 2517 | "2083\n", 2518 | "2084\n", 2519 | "2085\n", 2520 | "2086\n", 2521 | "2087\n", 2522 | "2088\n", 2523 | "2089\n", 2524 | "2090\n", 2525 | "2091\n", 2526 | "2092\n", 2527 | "2093\n", 2528 | "2094\n", 2529 | "2095\n", 2530 | "2096\n", 2531 | "2097\n", 2532 | "2098\n", 2533 | "2099\n", 2534 | "2100\n", 2535 | "step: 2100, train loss: 1.429, val loss: 1.692\n", 2536 | "2101\n", 2537 | "2102\n", 2538 | "2103\n", 2539 | "2104\n", 2540 | "2105\n", 2541 | "2106\n", 2542 | "2107\n", 2543 | "2108\n", 2544 | "2109\n", 2545 | "2110\n", 2546 | "2111\n", 2547 | "2112\n", 2548 | "2113\n", 2549 | "2114\n", 2550 | "2115\n", 2551 | "2116\n", 2552 | "2117\n", 2553 | "2118\n", 2554 | "2119\n", 2555 | "2120\n", 2556 | "2121\n", 2557 | "2122\n", 2558 | "2123\n", 2559 | "2124\n", 2560 | "2125\n", 2561 | "2126\n", 2562 | "2127\n", 2563 | "2128\n", 2564 | "2129\n", 2565 | "2130\n", 2566 | "2131\n", 2567 | "2132\n", 2568 | "2133\n", 2569 | "2134\n", 2570 | "2135\n", 2571 | "2136\n", 2572 | "2137\n", 2573 | "2138\n", 2574 | "2139\n", 2575 | "2140\n", 2576 | "2141\n", 2577 | "2142\n", 2578 | "2143\n", 2579 | "2144\n", 2580 | "2145\n", 2581 | "2146\n", 2582 | "2147\n", 2583 | "2148\n", 2584 | "2149\n", 2585 | "2150\n", 2586 | "step: 2150, train loss: 1.413, val loss: 1.691\n", 2587 | "2151\n", 2588 | "2152\n", 2589 | "2153\n", 2590 | "2154\n", 2591 | "2155\n", 2592 | "2156\n", 2593 | "2157\n", 2594 | "2158\n", 2595 | "2159\n", 2596 | "2160\n", 2597 | "2161\n", 2598 | "2162\n", 2599 | "2163\n", 2600 | "2164\n", 2601 | "2165\n", 2602 | "2166\n", 2603 | "2167\n", 2604 | "2168\n", 2605 | 
"2169\n", 2606 | "2170\n", 2607 | "2171\n", 2608 | "2172\n", 2609 | "2173\n", 2610 | "2174\n", 2611 | "2175\n", 2612 | "2176\n", 2613 | "2177\n", 2614 | "2178\n", 2615 | "2179\n", 2616 | "2180\n", 2617 | "2181\n", 2618 | "2182\n", 2619 | "2183\n", 2620 | "2184\n", 2621 | "2185\n", 2622 | "2186\n", 2623 | "2187\n", 2624 | "2188\n", 2625 | "2189\n", 2626 | "2190\n", 2627 | "2191\n", 2628 | "2192\n", 2629 | "2193\n", 2630 | "2194\n", 2631 | "2195\n", 2632 | "2196\n", 2633 | "2197\n", 2634 | "2198\n", 2635 | "2199\n", 2636 | "2200\n", 2637 | "step: 2200, train loss: 1.431, val loss: 1.691\n", 2638 | "2201\n", 2639 | "2202\n", 2640 | "2203\n", 2641 | "2204\n", 2642 | "2205\n", 2643 | "2206\n", 2644 | "2207\n", 2645 | "2208\n", 2646 | "2209\n", 2647 | "2210\n", 2648 | "2211\n", 2649 | "2212\n", 2650 | "2213\n", 2651 | "2214\n", 2652 | "2215\n", 2653 | "2216\n", 2654 | "2217\n", 2655 | "2218\n", 2656 | "2219\n", 2657 | "2220\n", 2658 | "2221\n", 2659 | "2222\n", 2660 | "2223\n", 2661 | "2224\n", 2662 | "2225\n", 2663 | "2226\n", 2664 | "2227\n", 2665 | "2228\n", 2666 | "2229\n", 2667 | "2230\n", 2668 | "2231\n", 2669 | "2232\n", 2670 | "2233\n", 2671 | "2234\n", 2672 | "2235\n", 2673 | "2236\n", 2674 | "2237\n", 2675 | "2238\n", 2676 | "2239\n", 2677 | "2240\n", 2678 | "2241\n", 2679 | "2242\n", 2680 | "2243\n", 2681 | "2244\n", 2682 | "2245\n", 2683 | "2246\n", 2684 | "2247\n", 2685 | "2248\n", 2686 | "2249\n", 2687 | "2250\n", 2688 | "step: 2250, train loss: 1.416, val loss: 1.699\n", 2689 | "2251\n", 2690 | "2252\n", 2691 | "2253\n", 2692 | "2254\n", 2693 | "2255\n", 2694 | "2256\n", 2695 | "2257\n", 2696 | "2258\n", 2697 | "2259\n", 2698 | "2260\n", 2699 | "2261\n", 2700 | "2262\n", 2701 | "2263\n", 2702 | "2264\n", 2703 | "2265\n", 2704 | "2266\n", 2705 | "2267\n", 2706 | "2268\n", 2707 | "2269\n", 2708 | "2270\n", 2709 | "2271\n", 2710 | "2272\n", 2711 | "2273\n", 2712 | "2274\n", 2713 | "2275\n", 2714 | "2276\n", 2715 | "2277\n", 2716 | "2278\n", 2717 | "2279\n", 2718 | "2280\n", 2719 | "2281\n", 2720 | "2282\n", 2721 | "2283\n", 2722 | "2284\n", 2723 | "2285\n", 2724 | "2286\n", 2725 | "2287\n", 2726 | "2288\n", 2727 | "2289\n", 2728 | "2290\n", 2729 | "2291\n", 2730 | "2292\n", 2731 | "2293\n", 2732 | "2294\n", 2733 | "2295\n", 2734 | "2296\n", 2735 | "2297\n", 2736 | "2298\n", 2737 | "2299\n", 2738 | "2300\n", 2739 | "step: 2300, train loss: 1.408, val loss: 1.679\n", 2740 | "2301\n", 2741 | "2302\n", 2742 | "2303\n", 2743 | "2304\n", 2744 | "2305\n", 2745 | "2306\n", 2746 | "2307\n", 2747 | "2308\n", 2748 | "2309\n", 2749 | "2310\n", 2750 | "2311\n", 2751 | "2312\n", 2752 | "2313\n", 2753 | "2314\n", 2754 | "2315\n", 2755 | "2316\n", 2756 | "2317\n", 2757 | "2318\n", 2758 | "2319\n", 2759 | "2320\n", 2760 | "2321\n", 2761 | "2322\n", 2762 | "2323\n", 2763 | "2324\n", 2764 | "2325\n", 2765 | "2326\n", 2766 | "2327\n", 2767 | "2328\n", 2768 | "2329\n", 2769 | "2330\n", 2770 | "2331\n", 2771 | "2332\n", 2772 | "2333\n", 2773 | "2334\n", 2774 | "2335\n", 2775 | "2336\n", 2776 | "2337\n", 2777 | "2338\n", 2778 | "2339\n", 2779 | "2340\n", 2780 | "2341\n", 2781 | "2342\n", 2782 | "2343\n", 2783 | "2344\n", 2784 | "2345\n", 2785 | "2346\n", 2786 | "2347\n", 2787 | "2348\n", 2788 | "2349\n", 2789 | "2350\n", 2790 | "step: 2350, train loss: 1.397, val loss: 1.672\n", 2791 | "2351\n", 2792 | "2352\n", 2793 | "2353\n", 2794 | "2354\n", 2795 | "2355\n", 2796 | "2356\n", 2797 | "2357\n", 2798 | "2358\n", 2799 | "2359\n", 2800 | "2360\n", 2801 | "2361\n", 2802 | "2362\n", 2803 | "2363\n", 2804 | 
"2364\n", 2805 | "2365\n", 2806 | "2366\n", 2807 | "2367\n", 2808 | "2368\n", 2809 | "2369\n", 2810 | "2370\n", 2811 | "2371\n", 2812 | "2372\n", 2813 | "2373\n", 2814 | "2374\n", 2815 | "2375\n", 2816 | "2376\n", 2817 | "2377\n", 2818 | "2378\n", 2819 | "2379\n", 2820 | "2380\n", 2821 | "2381\n", 2822 | "2382\n", 2823 | "2383\n", 2824 | "2384\n", 2825 | "2385\n", 2826 | "2386\n", 2827 | "2387\n", 2828 | "2388\n", 2829 | "2389\n", 2830 | "2390\n", 2831 | "2391\n", 2832 | "2392\n", 2833 | "2393\n", 2834 | "2394\n", 2835 | "2395\n", 2836 | "2396\n", 2837 | "2397\n", 2838 | "2398\n", 2839 | "2399\n", 2840 | "2400\n", 2841 | "step: 2400, train loss: 1.402, val loss: 1.691\n", 2842 | "2401\n", 2843 | "2402\n", 2844 | "2403\n", 2845 | "2404\n", 2846 | "2405\n", 2847 | "2406\n", 2848 | "2407\n", 2849 | "2408\n", 2850 | "2409\n", 2851 | "2410\n", 2852 | "2411\n", 2853 | "2412\n", 2854 | "2413\n", 2855 | "2414\n", 2856 | "2415\n", 2857 | "2416\n", 2858 | "2417\n", 2859 | "2418\n", 2860 | "2419\n", 2861 | "2420\n", 2862 | "2421\n", 2863 | "2422\n", 2864 | "2423\n", 2865 | "2424\n", 2866 | "2425\n", 2867 | "2426\n", 2868 | "2427\n", 2869 | "2428\n", 2870 | "2429\n", 2871 | "2430\n", 2872 | "2431\n", 2873 | "2432\n", 2874 | "2433\n", 2875 | "2434\n", 2876 | "2435\n", 2877 | "2436\n", 2878 | "2437\n", 2879 | "2438\n", 2880 | "2439\n", 2881 | "2440\n", 2882 | "2441\n", 2883 | "2442\n", 2884 | "2443\n", 2885 | "2444\n", 2886 | "2445\n", 2887 | "2446\n", 2888 | "2447\n", 2889 | "2448\n", 2890 | "2449\n", 2891 | "2450\n", 2892 | "step: 2450, train loss: 1.405, val loss: 1.692\n", 2893 | "2451\n", 2894 | "2452\n", 2895 | "2453\n", 2896 | "2454\n", 2897 | "2455\n", 2898 | "2456\n", 2899 | "2457\n", 2900 | "2458\n", 2901 | "2459\n", 2902 | "2460\n", 2903 | "2461\n", 2904 | "2462\n", 2905 | "2463\n", 2906 | "2464\n", 2907 | "2465\n", 2908 | "2466\n", 2909 | "2467\n", 2910 | "2468\n", 2911 | "2469\n", 2912 | "2470\n", 2913 | "2471\n", 2914 | "2472\n", 2915 | "2473\n", 2916 | "2474\n", 2917 | "2475\n", 2918 | "2476\n", 2919 | "2477\n", 2920 | "2478\n", 2921 | "2479\n", 2922 | "2480\n", 2923 | "2481\n", 2924 | "2482\n", 2925 | "2483\n", 2926 | "2484\n", 2927 | "2485\n", 2928 | "2486\n", 2929 | "2487\n", 2930 | "2488\n", 2931 | "2489\n", 2932 | "2490\n", 2933 | "2491\n", 2934 | "2492\n", 2935 | "2493\n", 2936 | "2494\n", 2937 | "2495\n", 2938 | "2496\n", 2939 | "2497\n", 2940 | "2498\n", 2941 | "2499\n", 2942 | "2500\n", 2943 | "step: 2500, train loss: 1.393, val loss: 1.675\n", 2944 | "2501\n", 2945 | "2502\n", 2946 | "2503\n", 2947 | "2504\n", 2948 | "2505\n", 2949 | "2506\n", 2950 | "2507\n", 2951 | "2508\n", 2952 | "2509\n", 2953 | "2510\n", 2954 | "2511\n", 2955 | "2512\n", 2956 | "2513\n", 2957 | "2514\n", 2958 | "2515\n", 2959 | "2516\n", 2960 | "2517\n", 2961 | "2518\n", 2962 | "2519\n", 2963 | "2520\n", 2964 | "2521\n", 2965 | "2522\n", 2966 | "2523\n", 2967 | "2524\n", 2968 | "2525\n", 2969 | "2526\n", 2970 | "2527\n", 2971 | "2528\n", 2972 | "2529\n", 2973 | "2530\n", 2974 | "2531\n", 2975 | "2532\n", 2976 | "2533\n", 2977 | "2534\n", 2978 | "2535\n", 2979 | "2536\n", 2980 | "2537\n", 2981 | "2538\n", 2982 | "2539\n", 2983 | "2540\n", 2984 | "2541\n", 2985 | "2542\n", 2986 | "2543\n", 2987 | "2544\n", 2988 | "2545\n", 2989 | "2546\n", 2990 | "2547\n", 2991 | "2548\n", 2992 | "2549\n", 2993 | "2550\n", 2994 | "step: 2550, train loss: 1.384, val loss: 1.669\n", 2995 | "2551\n", 2996 | "2552\n", 2997 | "2553\n", 2998 | "2554\n", 2999 | "2555\n", 3000 | "2556\n", 3001 | "2557\n", 3002 | "2558\n", 3003 | 
"2559\n", 3004 | "2560\n", 3005 | "2561\n", 3006 | "2562\n", 3007 | "2563\n", 3008 | "2564\n", 3009 | "2565\n", 3010 | "2566\n", 3011 | "2567\n", 3012 | "2568\n", 3013 | "2569\n", 3014 | "2570\n", 3015 | "2571\n", 3016 | "2572\n", 3017 | "2573\n", 3018 | "2574\n", 3019 | "2575\n", 3020 | "2576\n", 3021 | "2577\n", 3022 | "2578\n", 3023 | "2579\n", 3024 | "2580\n", 3025 | "2581\n", 3026 | "2582\n", 3027 | "2583\n", 3028 | "2584\n", 3029 | "2585\n", 3030 | "2586\n", 3031 | "2587\n", 3032 | "2588\n", 3033 | "2589\n", 3034 | "2590\n", 3035 | "2591\n", 3036 | "2592\n", 3037 | "2593\n", 3038 | "2594\n", 3039 | "2595\n", 3040 | "2596\n", 3041 | "2597\n", 3042 | "2598\n", 3043 | "2599\n", 3044 | "2600\n", 3045 | "step: 2600, train loss: 1.389, val loss: 1.664\n", 3046 | "2601\n", 3047 | "2602\n", 3048 | "2603\n", 3049 | "2604\n", 3050 | "2605\n", 3051 | "2606\n", 3052 | "2607\n", 3053 | "2608\n", 3054 | "2609\n", 3055 | "2610\n", 3056 | "2611\n", 3057 | "2612\n", 3058 | "2613\n", 3059 | "2614\n", 3060 | "2615\n", 3061 | "2616\n", 3062 | "2617\n", 3063 | "2618\n", 3064 | "2619\n", 3065 | "2620\n", 3066 | "2621\n", 3067 | "2622\n", 3068 | "2623\n", 3069 | "2624\n", 3070 | "2625\n", 3071 | "2626\n", 3072 | "2627\n", 3073 | "2628\n", 3074 | "2629\n", 3075 | "2630\n", 3076 | "2631\n", 3077 | "2632\n", 3078 | "2633\n", 3079 | "2634\n", 3080 | "2635\n", 3081 | "2636\n", 3082 | "2637\n", 3083 | "2638\n", 3084 | "2639\n", 3085 | "2640\n", 3086 | "2641\n", 3087 | "2642\n", 3088 | "2643\n", 3089 | "2644\n", 3090 | "2645\n", 3091 | "2646\n", 3092 | "2647\n", 3093 | "2648\n", 3094 | "2649\n", 3095 | "2650\n", 3096 | "step: 2650, train loss: 1.386, val loss: 1.674\n", 3097 | "2651\n", 3098 | "2652\n", 3099 | "2653\n", 3100 | "2654\n", 3101 | "2655\n", 3102 | "2656\n", 3103 | "2657\n", 3104 | "2658\n", 3105 | "2659\n", 3106 | "2660\n", 3107 | "2661\n", 3108 | "2662\n", 3109 | "2663\n", 3110 | "2664\n", 3111 | "2665\n", 3112 | "2666\n", 3113 | "2667\n", 3114 | "2668\n", 3115 | "2669\n", 3116 | "2670\n", 3117 | "2671\n", 3118 | "2672\n", 3119 | "2673\n", 3120 | "2674\n", 3121 | "2675\n", 3122 | "2676\n", 3123 | "2677\n", 3124 | "2678\n", 3125 | "2679\n", 3126 | "2680\n", 3127 | "2681\n", 3128 | "2682\n", 3129 | "2683\n", 3130 | "2684\n", 3131 | "2685\n", 3132 | "2686\n", 3133 | "2687\n", 3134 | "2688\n", 3135 | "2689\n", 3136 | "2690\n", 3137 | "2691\n", 3138 | "2692\n", 3139 | "2693\n", 3140 | "2694\n", 3141 | "2695\n", 3142 | "2696\n", 3143 | "2697\n", 3144 | "2698\n", 3145 | "2699\n", 3146 | "2700\n", 3147 | "step: 2700, train loss: 1.384, val loss: 1.677\n", 3148 | "2701\n", 3149 | "2702\n", 3150 | "2703\n", 3151 | "2704\n", 3152 | "2705\n", 3153 | "2706\n", 3154 | "2707\n", 3155 | "2708\n", 3156 | "2709\n", 3157 | "2710\n", 3158 | "2711\n", 3159 | "2712\n", 3160 | "2713\n", 3161 | "2714\n", 3162 | "2715\n", 3163 | "2716\n", 3164 | "2717\n", 3165 | "2718\n", 3166 | "2719\n", 3167 | "2720\n", 3168 | "2721\n", 3169 | "2722\n", 3170 | "2723\n", 3171 | "2724\n", 3172 | "2725\n", 3173 | "2726\n", 3174 | "2727\n", 3175 | "2728\n", 3176 | "2729\n", 3177 | "2730\n", 3178 | "2731\n", 3179 | "2732\n", 3180 | "2733\n", 3181 | "2734\n", 3182 | "2735\n", 3183 | "2736\n", 3184 | "2737\n", 3185 | "2738\n", 3186 | "2739\n", 3187 | "2740\n", 3188 | "2741\n", 3189 | "2742\n", 3190 | "2743\n", 3191 | "2744\n", 3192 | "2745\n", 3193 | "2746\n", 3194 | "2747\n", 3195 | "2748\n", 3196 | "2749\n", 3197 | "2750\n", 3198 | "step: 2750, train loss: 1.384, val loss: 1.674\n", 3199 | "2751\n", 3200 | "2752\n", 3201 | "2753\n", 3202 | 
"2754\n", 3203 | "2755\n", 3204 | "2756\n", 3205 | "2757\n", 3206 | "2758\n", 3207 | "2759\n", 3208 | "2760\n", 3209 | "2761\n", 3210 | "2762\n", 3211 | "2763\n", 3212 | "2764\n", 3213 | "2765\n", 3214 | "2766\n", 3215 | "2767\n", 3216 | "2768\n", 3217 | "2769\n", 3218 | "2770\n", 3219 | "2771\n", 3220 | "2772\n", 3221 | "2773\n", 3222 | "2774\n", 3223 | "2775\n", 3224 | "2776\n", 3225 | "2777\n", 3226 | "2778\n", 3227 | "2779\n", 3228 | "2780\n", 3229 | "2781\n", 3230 | "2782\n", 3231 | "2783\n", 3232 | "2784\n", 3233 | "2785\n", 3234 | "2786\n", 3235 | "2787\n", 3236 | "2788\n", 3237 | "2789\n", 3238 | "2790\n", 3239 | "2791\n", 3240 | "2792\n", 3241 | "2793\n", 3242 | "2794\n", 3243 | "2795\n", 3244 | "2796\n", 3245 | "2797\n", 3246 | "2798\n", 3247 | "2799\n", 3248 | "2800\n", 3249 | "step: 2800, train loss: 1.376, val loss: 1.658\n", 3250 | "2801\n", 3251 | "2802\n", 3252 | "2803\n", 3253 | "2804\n", 3254 | "2805\n", 3255 | "2806\n", 3256 | "2807\n", 3257 | "2808\n", 3258 | "2809\n", 3259 | "2810\n", 3260 | "2811\n", 3261 | "2812\n", 3262 | "2813\n", 3263 | "2814\n", 3264 | "2815\n", 3265 | "2816\n", 3266 | "2817\n", 3267 | "2818\n", 3268 | "2819\n", 3269 | "2820\n", 3270 | "2821\n", 3271 | "2822\n", 3272 | "2823\n", 3273 | "2824\n", 3274 | "2825\n", 3275 | "2826\n", 3276 | "2827\n", 3277 | "2828\n", 3278 | "2829\n", 3279 | "2830\n", 3280 | "2831\n", 3281 | "2832\n", 3282 | "2833\n", 3283 | "2834\n", 3284 | "2835\n", 3285 | "2836\n", 3286 | "2837\n", 3287 | "2838\n", 3288 | "2839\n", 3289 | "2840\n", 3290 | "2841\n", 3291 | "2842\n", 3292 | "2843\n", 3293 | "2844\n", 3294 | "2845\n", 3295 | "2846\n", 3296 | "2847\n", 3297 | "2848\n", 3298 | "2849\n", 3299 | "2850\n", 3300 | "step: 2850, train loss: 1.377, val loss: 1.660\n", 3301 | "2851\n", 3302 | "2852\n", 3303 | "2853\n", 3304 | "2854\n", 3305 | "2855\n", 3306 | "2856\n", 3307 | "2857\n", 3308 | "2858\n", 3309 | "2859\n", 3310 | "2860\n", 3311 | "2861\n", 3312 | "2862\n", 3313 | "2863\n", 3314 | "2864\n", 3315 | "2865\n", 3316 | "2866\n", 3317 | "2867\n", 3318 | "2868\n", 3319 | "2869\n", 3320 | "2870\n", 3321 | "2871\n", 3322 | "2872\n", 3323 | "2873\n", 3324 | "2874\n", 3325 | "2875\n", 3326 | "2876\n", 3327 | "2877\n", 3328 | "2878\n", 3329 | "2879\n", 3330 | "2880\n", 3331 | "2881\n", 3332 | "2882\n", 3333 | "2883\n", 3334 | "2884\n", 3335 | "2885\n", 3336 | "2886\n", 3337 | "2887\n", 3338 | "2888\n", 3339 | "2889\n", 3340 | "2890\n", 3341 | "2891\n", 3342 | "2892\n", 3343 | "2893\n", 3344 | "2894\n", 3345 | "2895\n", 3346 | "2896\n", 3347 | "2897\n", 3348 | "2898\n", 3349 | "2899\n", 3350 | "2900\n", 3351 | "step: 2900, train loss: 1.364, val loss: 1.670\n", 3352 | "2901\n", 3353 | "2902\n", 3354 | "2903\n", 3355 | "2904\n", 3356 | "2905\n", 3357 | "2906\n", 3358 | "2907\n", 3359 | "2908\n", 3360 | "2909\n", 3361 | "2910\n", 3362 | "2911\n", 3363 | "2912\n", 3364 | "2913\n", 3365 | "2914\n", 3366 | "2915\n", 3367 | "2916\n", 3368 | "2917\n", 3369 | "2918\n", 3370 | "2919\n", 3371 | "2920\n", 3372 | "2921\n", 3373 | "2922\n", 3374 | "2923\n", 3375 | "2924\n", 3376 | "2925\n", 3377 | "2926\n", 3378 | "2927\n", 3379 | "2928\n", 3380 | "2929\n", 3381 | "2930\n", 3382 | "2931\n", 3383 | "2932\n", 3384 | "2933\n", 3385 | "2934\n", 3386 | "2935\n", 3387 | "2936\n", 3388 | "2937\n", 3389 | "2938\n", 3390 | "2939\n", 3391 | "2940\n", 3392 | "2941\n", 3393 | "2942\n", 3394 | "2943\n", 3395 | "2944\n", 3396 | "2945\n", 3397 | "2946\n", 3398 | "2947\n", 3399 | "2948\n", 3400 | "2949\n", 3401 | "2950\n", 3402 | "step: 2950, train loss: 
"step: 2950, train loss: 1.358, val loss: 1.667\n",
"1.4722700119018555\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"# Encode the prompt into token ids, add a batch dimension, sample 1000 new characters, and decode them back to text\n",
"prompt = 'Tree'\n",
"context = torch.tensor(encode(prompt), dtype=torch.long, device=device)\n",
"generated_chars = decode(m.generate(context.unsqueeze(0), max_new_tokens=1000)[0].tolist())\n",
"print(generated_chars)"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "MwDuE-seY8qk",
"outputId": "400202fb-cfaa-44d7-d7e1-3a30d3aee978"
},
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Tree your loves: when me them flower\n",
"Than all glister-only were use you spite to speaks,\n",
"Against to all man's hearm like, at henceforce\n",
"Of was'd your aloof, nor Impatrivious mutines?\n",
"That stand\n",
"An a unparious of, as it to herre. Course's hands,\n",
"To tent assure, her that's guard a very is.\n",
"\n",
"CATUS:\n",
"Now, hast be to thou my faul, not fear therefore pardon\n",
"Let every here; but if this spoke of all for quarre,\n",
"Shot are do-mother darshing king:\n",
"If take mystable throze hope k advise his fault,\n",
"None love from come to dare remember which you\n",
"That a chase, that for thou slay of the hadster:\n",
"Therefore thy magning wast on.\n",
"\n",
"HOMAS SurdOP:\n",
"I will do do havot here, he imasters; they arm mirat.\n",
"A love! my tanley will is very out as my knees,\n",
"Petity is bonest they dire, my reman and stabb's:\n",
"Vallain, sit, a Will.\n",
"\n",
"SICINIUS:\n",
"Stink the is hate figue to too, were mirround,\n",
"Nayent worthink'd bloody on I ours.\n",
"\n",
"ROMEO:\n",
"This notifledNomth abraats, as a determine,\n",
"Stay with a daar.\n",
"\n",
"MENENnIUS:\n",
"Bereath Londier er's se\n"
]
}
]
}
]
}
--------------------------------------------------------------------------------