├── .gitignore ├── 1_Concepts.ipynb ├── 2_Fine_tuning.ipynb ├── README.md └── slide.pdf /.gitignore: -------------------------------------------------------------------------------- 1 | # Byte-compiled / optimized / DLL files 2 | __pycache__/ 3 | *.py[cod] 4 | *$py.class 5 | 6 | # C extensions 7 | *.so 8 | 9 | # Distribution / packaging 10 | .Python 11 | build/ 12 | develop-eggs/ 13 | dist/ 14 | downloads/ 15 | eggs/ 16 | .eggs/ 17 | lib/ 18 | lib64/ 19 | parts/ 20 | sdist/ 21 | var/ 22 | wheels/ 23 | share/python-wheels/ 24 | *.egg-info/ 25 | .installed.cfg 26 | *.egg 27 | MANIFEST 28 | 29 | # PyInstaller 30 | # Usually these files are written by a python script from a template 31 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 32 | *.manifest 33 | *.spec 34 | 35 | # Installer logs 36 | pip-log.txt 37 | pip-delete-this-directory.txt 38 | 39 | # Unit test / coverage reports 40 | htmlcov/ 41 | .tox/ 42 | .nox/ 43 | .coverage 44 | .coverage.* 45 | .cache 46 | nosetests.xml 47 | coverage.xml 48 | *.cover 49 | *.py,cover 50 | .hypothesis/ 51 | .pytest_cache/ 52 | cover/ 53 | 54 | # Translations 55 | *.mo 56 | *.pot 57 | 58 | # Django stuff: 59 | *.log 60 | local_settings.py 61 | db.sqlite3 62 | db.sqlite3-journal 63 | 64 | # Flask stuff: 65 | instance/ 66 | .webassets-cache 67 | 68 | # Scrapy stuff: 69 | .scrapy 70 | 71 | # Sphinx documentation 72 | docs/_build/ 73 | 74 | # PyBuilder 75 | .pybuilder/ 76 | target/ 77 | 78 | # Jupyter Notebook 79 | .ipynb_checkpoints 80 | 81 | # IPython 82 | profile_default/ 83 | ipython_config.py 84 | 85 | # pyenv 86 | # For a library or package, you might want to ignore these files since the code is 87 | # intended to run in multiple environments; otherwise, check them in: 88 | # .python-version 89 | 90 | # pipenv 91 | # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control. 
92 | # However, in case of collaboration, if having platform-specific dependencies or dependencies 93 | # having no cross-platform support, pipenv may install dependencies that don't work, or not 94 | # install all needed dependencies. 95 | #Pipfile.lock 96 | 97 | # poetry 98 | # Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control. 99 | # This is especially recommended for binary packages to ensure reproducibility, and is more 100 | # commonly ignored for libraries. 101 | # https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control 102 | #poetry.lock 103 | 104 | # pdm 105 | # Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control. 106 | #pdm.lock 107 | # pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it 108 | # in version control. 109 | # https://pdm.fming.dev/#use-with-ide 110 | .pdm.toml 111 | 112 | # PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm 113 | __pypackages__/ 114 | 115 | # Celery stuff 116 | celerybeat-schedule 117 | celerybeat.pid 118 | 119 | # SageMath parsed files 120 | *.sage.py 121 | 122 | # Environments 123 | .env 124 | .venv 125 | env/ 126 | venv/ 127 | ENV/ 128 | env.bak/ 129 | venv.bak/ 130 | 131 | # Spyder project settings 132 | .spyderproject 133 | .spyproject 134 | 135 | # Rope project settings 136 | .ropeproject 137 | 138 | # mkdocs documentation 139 | /site 140 | 141 | # mypy 142 | .mypy_cache/ 143 | .dmypy.json 144 | dmypy.json 145 | 146 | # Pyre type checker 147 | .pyre/ 148 | 149 | # pytype static type analyzer 150 | .pytype/ 151 | 152 | # Cython debug symbols 153 | cython_debug/ 154 | 155 | # PyCharm 156 | # JetBrains specific template is maintained in a separate JetBrains.gitignore that can 157 | # be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore 158 | # and can be added to the global gitignore or merged into 
this file. For a more nuclear 159 | # option (not recommended) you can uncomment the following to ignore the entire idea folder. 160 | #.idea/ 161 | -------------------------------------------------------------------------------- /1_Concepts.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "nbformat": 4, 3 | "nbformat_minor": 0, 4 | "metadata": { 5 | "colab": { 6 | "provenance": [], 7 | "gpuType": "T4", 8 | "include_colab_link": true 9 | }, 10 | "kernelspec": { 11 | "name": "python3", 12 | "display_name": "Python 3" 13 | }, 14 | "language_info": { 15 | "name": "python" 16 | }, 17 | "accelerator": "GPU" 18 | }, 19 | "cells": [ 20 | { 21 | "cell_type": "markdown", 22 | "metadata": { 23 | "id": "view-in-github", 24 | "colab_type": "text" 25 | }, 26 | "source": [ 27 | "\"Open" 28 | ] 29 | }, 30 | { 31 | "cell_type": "markdown", 32 | "source": [ 33 | "# Setting Up the Environment" 34 | ], 35 | "metadata": { 36 | "id": "IC-upOpwnQ8c" 37 | } 38 | }, 39 | { 40 | "cell_type": "markdown", 41 | "source": [ 42 | "### 1. Enable the GPU\n", 43 | "From the menu bar, go to [Runtime] -> [Change runtime type] and select GPU under [Hardware accelerator].\n", 44 | "\n", 45 | "Note: Colab GPUs can be used for up to 12 hours per day.\n" 46 | ], 47 | "metadata": { 48 | "id": "uZzOtYQiDPNU" 49 | } 50 | }, 51 | { 52 | "cell_type": "markdown", 53 | "source": [ 54 | "### 2. Korean Font Setup\n", 55 | "\n", 56 | "Caution: the apt-get installation below may not take effect in the current runtime right away.\n", 57 | "After running the code cell below, choose [Runtime] -> [Restart runtime] so that the installed packages are picked up by the current environment." 
58 | ], 59 | "metadata": { 60 | "id": "_Mr8NUV0-cti" 61 | } 62 | }, 63 | { 64 | "cell_type": "code", 65 | "source": [ 66 | "!sudo apt-get install -y fonts-nanum\n", 67 | "!sudo fc-cache -fv\n", 68 | "!rm -rf ~/.cache/matplotlib" 69 | ], 70 | "metadata": { 71 | "id": "4YtZhTaN-eXB" 72 | }, 73 | "execution_count": null, 74 | "outputs": [] 75 | }, 76 | { 77 | "cell_type": "code", 78 | "source": [ 79 | "import matplotlib.pyplot as plt\n", 80 | "\n", 81 | "plt.rc('font', family='NanumBarunGothic')" 82 | ], 83 | "metadata": { 84 | "id": "gGAZEy0A-i15" 85 | }, 86 | "execution_count": 2, 87 | "outputs": [] 88 | }, 89 | { 90 | "cell_type": "markdown", 91 | "source": [ 92 | "### 3. Install the Libraries" 93 | ], 94 | "metadata": { 95 | "id": "mN7IlHVCff2Q" 96 | } 97 | }, 98 | { 99 | "cell_type": "code", 100 | "source": [ 101 | "! pip install torch transformers datasets" 102 | ], 103 | "metadata": { 104 | "id": "krbSJqrrnEav" 105 | }, 106 | "execution_count": null, 107 | "outputs": [] 108 | }, 109 | { 110 | "cell_type": "markdown", 111 | "source": [ 112 | "### 4. Editor Settings\n", 113 | "From the menu bar, go to [Tools] -> [Settings] -> [Editor] and enable useful options such as \"Show line numbers\" and \"Show indentation guides\"." 
114 | ], 115 | "metadata": { 116 | "id": "A2ZQk44ciRBK" 117 | } 118 | }, 119 | { 120 | "cell_type": "markdown", 121 | "source": [ 122 | "# Inspecting the Attention Scores of Multilingual BERT (mBERT)\n" 123 | ], 124 | "metadata": { 125 | "id": "RtRSJp4srq6N" 126 | } 127 | }, 128 | { 129 | "cell_type": "markdown", 130 | "source": [ 131 | "### Download the Model from the Model Hub\n", 132 | "* Hugging Face model hub: https://huggingface.co/bert-base-multilingual-cased" 133 | ], 134 | "metadata": { 135 | "id": "GZ4PWnoRuei3" 136 | } 137 | }, 138 | { 139 | "cell_type": "code", 140 | "source": [ 141 | "from transformers import AutoModel\n", 142 | "\n", 143 | "model = AutoModel.from_pretrained(\"bert-base-multilingual-cased\")" 144 | ], 145 | "metadata": { 146 | "id": "RLj6pMF4F-hv" 147 | }, 148 | "execution_count": 4, 149 | "outputs": [] 150 | }, 151 | { 152 | "cell_type": "code", 153 | "source": [ 154 | "from transformers import AutoTokenizer\n", 155 | "\n", 156 | "tokenizer = AutoTokenizer.from_pretrained(\"bert-base-multilingual-cased\")" 157 | ], 158 | "metadata": { 159 | "id": "UPvmyL1-XOAC" 160 | }, 161 | "execution_count": 5, 162 | "outputs": [] 163 | }, 164 | { 165 | "cell_type": "markdown", 166 | "source": [ 167 | "### Tokenize a Sentence" 168 | ], 169 | "metadata": { 170 | "id": "3vHUkt34nifP" 171 | } 172 | }, 173 | { 174 | "cell_type": "code", 175 | "source": [ 176 | "example_text = \"가는 말이 고와야 오는 말도 곱다\"\n", 177 | "tokens = tokenizer.tokenize(example_text, add_special_tokens=True)\n", 178 | "token_ids = tokenizer.convert_tokens_to_ids(tokens)\n", 179 | "for token, token_id in zip(tokens, token_ids):\n", 180 | " print(f\"{token:>3} :: {token_id}\")" 181 | ], 182 | "metadata": { 183 | "colab": { 184 | "base_uri": "https://localhost:8080/" 185 | }, 186 | "id": "07gyXSamnhz5", 187 | "outputId": "0ffef602-b5ff-4c7f-fb2d-059bb9026d94" 188 | }, 189 | "execution_count": 6, 190 | "outputs": [ 191 | { 192 | "output_type": "stream", 193 | "name": "stdout", 194 | "text": [ 195 | "[CLS] :: 101\n", 196 | " 가 :: 8843\n", 
197 | "##는 :: 11018\n", 198 | " 말 :: 9251\n", 199 | "##이 :: 10739\n", 200 | " 고 :: 8888\n", 201 | "##와 :: 12638\n", 202 | "##야 :: 21711\n", 203 | " 오 :: 9580\n", 204 | "##는 :: 11018\n", 205 | " 말 :: 9251\n", 206 | "##도 :: 12092\n", 207 | " 곱 :: 8894\n", 208 | "##다 :: 11903\n", 209 | "[SEP] :: 102\n" 210 | ] 211 | } 212 | ] 213 | }, 214 | { 215 | "cell_type": "markdown", 216 | "source": [ 217 | "### The Tokenizer's Vocabulary\n", 218 | "* The tokenizer's vocabulary is learned from statistics of the words that appear in its training data.\n", 219 | "* Tokenize the example sentences below to see which words were absent from the training data and which were seen frequently.\n", 220 | " * Frequently seen pieces: '##르고', '##기가', '##에는'\n", 221 | " * Unseen word: '닳도록' --> '[UNK]'" 222 | ], 223 | "metadata": { 224 | "id": "Uq7J5P3cggx0" 225 | } 226 | }, 227 | { 228 | "cell_type": "code", 229 | "source": [ 230 | "example_text = \"동해물과 백두산이 마르고 닳도록\"\n", 231 | "# Call the tokenizer\n", 232 | "tokens = tokenizer.tokenize(example_text, add_special_tokens=True)\n", 233 | "print(tokens)\n", 234 | "\n", 235 | "example_text = \"나 보기가 역겨워 가실 때에는\"\n", 236 | "# Call the tokenizer\n", 237 | "tokens = tokenizer.tokenize(example_text, add_special_tokens=True)\n", 238 | "print(tokens)" 239 | ], 240 | "metadata": { 241 | "id": "DNDR1uy6ga6l", 242 | "colab": { 243 | "base_uri": "https://localhost:8080/" 244 | }, 245 | "outputId": "ed513f50-e020-44bb-d9d6-0a93fed126c4" 246 | }, 247 | "execution_count": 7, 248 | "outputs": [ 249 | { 250 | "output_type": "stream", 251 | "name": "stdout", 252 | "text": [ 253 | "['[CLS]', '동', '##해', '##물', '##과', '백', '##두', '##산', '##이', '마', '##르고', '[UNK]', '[SEP]']\n", 254 | "['[CLS]', '나', '보', '##기가', '역', '##겨', '##워', '가', '##실', '때', '##에는', '[SEP]']\n" 255 | ] 256 | } 257 | ] 258 | }, 259 | { 260 | "cell_type": "markdown", 261 | "source": [ 262 | "### Inspect the Model Architecture\n", 263 | "* Uses the BERT architecture\n", 264 | "* Maximum number of input tokens: 512\n", 265 | " * `model.config.max_position_embeddings`\n", 266 | "* Embeddings Layer + 12 x Encoder Layer + Pooler Layer\n", 267 | " * `(embeddings)`: Embeddings Layer: 119,547 -> 768\n", 268 
| " * `(encoder)`: 12 x Encoder Layer: 768 -> 768\n", 269 | " * `(pooler)`: Pooler Layer: 768 -> 768" 270 | ], 271 | "metadata": { 272 | "id": "mxJH3v2hLAIq" 273 | } 274 | }, 275 | { 276 | "cell_type": "code", 277 | "source": [ 278 | "print(repr(model.config))\n", 279 | "print(repr(model))" 280 | ], 281 | "metadata": { 282 | "colab": { 283 | "base_uri": "https://localhost:8080/" 284 | }, 285 | "id": "3W5E_AGzK2TA", 286 | "outputId": "c7f05916-a12e-4a8b-f1d9-653ec1ee59a4" 287 | }, 288 | "execution_count": 8, 289 | "outputs": [ 290 | { 291 | "output_type": "stream", 292 | "name": "stdout", 293 | "text": [ 294 | "BertConfig {\n", 295 | " \"_name_or_path\": \"bert-base-multilingual-cased\",\n", 296 | " \"architectures\": [\n", 297 | " \"BertForMaskedLM\"\n", 298 | " ],\n", 299 | " \"attention_probs_dropout_prob\": 0.1,\n", 300 | " \"classifier_dropout\": null,\n", 301 | " \"directionality\": \"bidi\",\n", 302 | " \"hidden_act\": \"gelu\",\n", 303 | " \"hidden_dropout_prob\": 0.1,\n", 304 | " \"hidden_size\": 768,\n", 305 | " \"initializer_range\": 0.02,\n", 306 | " \"intermediate_size\": 3072,\n", 307 | " \"layer_norm_eps\": 1e-12,\n", 308 | " \"max_position_embeddings\": 512,\n", 309 | " \"model_type\": \"bert\",\n", 310 | " \"num_attention_heads\": 12,\n", 311 | " \"num_hidden_layers\": 12,\n", 312 | " \"pad_token_id\": 0,\n", 313 | " \"pooler_fc_size\": 768,\n", 314 | " \"pooler_num_attention_heads\": 12,\n", 315 | " \"pooler_num_fc_layers\": 3,\n", 316 | " \"pooler_size_per_head\": 128,\n", 317 | " \"pooler_type\": \"first_token_transform\",\n", 318 | " \"position_embedding_type\": \"absolute\",\n", 319 | " \"transformers_version\": \"4.32.0\",\n", 320 | " \"type_vocab_size\": 2,\n", 321 | " \"use_cache\": true,\n", 322 | " \"vocab_size\": 119547\n", 323 | "}\n", 324 | "\n", 325 | "BertModel(\n", 326 | " (embeddings): BertEmbeddings(\n", 327 | " (word_embeddings): Embedding(119547, 768, padding_idx=0)\n", 328 | " (position_embeddings): Embedding(512, 768)\n", 329 | 
" (token_type_embeddings): Embedding(2, 768)\n", 330 | " (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n", 331 | " (dropout): Dropout(p=0.1, inplace=False)\n", 332 | " )\n", 333 | " (encoder): BertEncoder(\n", 334 | " (layer): ModuleList(\n", 335 | " (0-11): 12 x BertLayer(\n", 336 | " (attention): BertAttention(\n", 337 | " (self): BertSelfAttention(\n", 338 | " (query): Linear(in_features=768, out_features=768, bias=True)\n", 339 | " (key): Linear(in_features=768, out_features=768, bias=True)\n", 340 | " (value): Linear(in_features=768, out_features=768, bias=True)\n", 341 | " (dropout): Dropout(p=0.1, inplace=False)\n", 342 | " )\n", 343 | " (output): BertSelfOutput(\n", 344 | " (dense): Linear(in_features=768, out_features=768, bias=True)\n", 345 | " (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n", 346 | " (dropout): Dropout(p=0.1, inplace=False)\n", 347 | " )\n", 348 | " )\n", 349 | " (intermediate): BertIntermediate(\n", 350 | " (dense): Linear(in_features=768, out_features=3072, bias=True)\n", 351 | " (intermediate_act_fn): GELUActivation()\n", 352 | " )\n", 353 | " (output): BertOutput(\n", 354 | " (dense): Linear(in_features=3072, out_features=768, bias=True)\n", 355 | " (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n", 356 | " (dropout): Dropout(p=0.1, inplace=False)\n", 357 | " )\n", 358 | " )\n", 359 | " )\n", 360 | " )\n", 361 | " (pooler): BertPooler(\n", 362 | " (dense): Linear(in_features=768, out_features=768, bias=True)\n", 363 | " (activation): Tanh()\n", 364 | " )\n", 365 | ")\n" 366 | ] 367 | } 368 | ] 369 | }, 370 | { 371 | "cell_type": "markdown", 372 | "source": [ 373 | "### 모델 크기 계산해서 `pytorch_model.bin` 크기와 비교해보기\n", 374 | "\n", 375 | "* 파라미터 개수 구하기\n", 376 | "* 파라미터 크기 확인\n", 377 | " * `torch.float16` -> 개당 2 byte\n", 378 | " * `torch.float32` -> 개당 4 byte\n", 379 | "* 파라미터 개수 x 파라미터 크기 = 모델의 최소 크기\n", 380 | " * mBERT: 178M * 4byte = 711 MB\n", 381 | " * `pytorch_model.bin` 
is almost exactly this size\n", 382 | "\n", 383 | "(Note) Some models use mixed precision." 384 | ], 385 | "metadata": { 386 | "id": "-fJmpuw8GWBj" 387 | } 388 | }, 389 | { 390 | "cell_type": "code", 391 | "source": [ 392 | "type_params = set([t.dtype for t in model.parameters()])\n", 393 | "type_params" 394 | ], 395 | "metadata": { 396 | "colab": { 397 | "base_uri": "https://localhost:8080/" 398 | }, 399 | "id": "2NkaD2M0H00E", 400 | "outputId": "38001fae-a856-4475-9ac5-5e0cd655bf03" 401 | }, 402 | "execution_count": 9, 403 | "outputs": [ 404 | { 405 | "output_type": "execute_result", 406 | "data": { 407 | "text/plain": [ 408 | "{torch.float32}" 409 | ] 410 | }, 411 | "metadata": {}, 412 | "execution_count": 9 413 | } 414 | ] 415 | }, 416 | { 417 | "cell_type": "code", 418 | "source": [ 419 | "num_params = sum(t.numel() for t in model.parameters())\n", 420 | "print(f\"number of params : {num_params:,}\")\n", 421 | "print(f\"total param size : {num_params * 4:,}\")" 422 | ], 423 | "metadata": { 424 | "colab": { 425 | "base_uri": "https://localhost:8080/" 426 | }, 427 | "id": "_rwpGMrzHVG-", 428 | "outputId": "0a31e5db-b05d-4212-d6cc-5dae8c71e894" 429 | }, 430 | "execution_count": 10, 431 | "outputs": [ 432 | { 433 | "output_type": "stream", 434 | "name": "stdout", 435 | "text": [ 436 | "number of params : 177,853,440\n", 437 | "total param size : 711,413,760\n" 438 | ] 439 | } 440 | ] 441 | }, 442 | { 443 | "cell_type": "markdown", 444 | "source": [ 445 | "### Generate Text Embeddings with mBERT\n" 446 | ], 447 | "metadata": { 448 | "id": "OsU_EXHYXB0s" 449 | } 450 | }, 451 | { 452 | "cell_type": "code", 453 | "source": [ 454 | "import torch\n", 455 | "\n", 456 | "if torch.cuda.is_available():\n", 457 | " device = torch.device(\"cuda\")\n", 458 | "else:\n", 459 | " device = torch.device(\"cpu\")\n", 460 | "\n", 461 | "# Move the model and inputs to the device (GPU if available)\n", 462 | "model = model.to(device)\n", 463 | "\n", 464 | "example_text = \"가는 말이 고와야 오는 말도 곱다\"\n", 465 | "tokenized_inputs = tokenizer(example_text, 
return_tensors='pt')\n", 466 | "\n", 467 | "input_ids = tokenized_inputs.input_ids.to(device)\n", 468 | "attention_mask = tokenized_inputs.attention_mask.to(device)\n", 469 | "\n", 470 | "# Run the model to compute the embeddings\n", 471 | "model_out = model(input_ids=input_ids, attention_mask=attention_mask, output_attentions=True)" 472 | ], 473 | "metadata": { 474 | "id": "W4iCFTZ-Bt1H" 475 | }, 476 | "execution_count": 11, 477 | "outputs": [] 478 | }, 479 | { 480 | "cell_type": "markdown", 481 | "source": [ 482 | "### Inspect mBERT's Attention Scores\n", 483 | "\n", 484 | "* Shape of the attention tensor for each layer: `(batch_size, num_heads, sequence_length, sequence_length)`\n", 485 | "([reference](https://huggingface.co/docs/transformers/main_classes/output#transformers.modeling_outputs.BaseModelOutput.attentions))\n", 486 | "* multi-head attention: a technique that runs several attention computations (heads) in parallel\n", 487 | "* The `attentions` field of a Transformers `ModelOutput` holds each head's attention weights after the softmax, that is, the weights used to compute the weighted average in that self-attention head" 488 | ], 489 | "metadata": { 490 | "id": "Qad3v5J2XedR" 491 | } 492 | }, 493 | { 494 | "cell_type": "code", 495 | "source": [ 496 | "attns = model_out.attentions\n", 497 | "len(attns), attns[0].shape" 498 | ], 499 | "metadata": { 500 | "colab": { 501 | "base_uri": "https://localhost:8080/" 502 | }, 503 | "id": "dvehsz0gXkD5", 504 | "outputId": "6d20bf01-65db-403c-92a0-e2c795fab3ee" 505 | }, 506 | "execution_count": 12, 507 | "outputs": [ 508 | { 509 | "output_type": "execute_result", 510 | "data": { 511 | "text/plain": [ 512 | "(12, torch.Size([1, 12, 15, 15]))" 513 | ] 514 | }, 515 | "metadata": {}, 516 | "execution_count": 12 517 | } 518 | ] 519 | }, 520 | { 521 | "cell_type": "code", 522 | "source": [ 523 | "import matplotlib.pyplot as plt\n", 524 | "import matplotlib.ticker as ticker\n", 525 | "\n", 526 | "\n", 527 | "def plot_attention(attention, tokens):\n", 528 | " fig = plt.figure(figsize=(10, 10))\n", 529 | " ax = fig.add_subplot(1, 1, 1)\n", 530 | "\n", 531 | " attention = attention[:len(tokens), :len(tokens)]\n", 532 
| "\n", 533 | " ax.matshow(attention, cmap='viridis', vmin=0.0)\n", 534 | "\n", 535 | " fontdict = {'fontsize': 14}\n", 536 | "\n", 537 | " ax.set_xticklabels([''] + tokens, fontdict=fontdict, rotation=90)\n", 538 | " ax.set_yticklabels([''] + tokens, fontdict=fontdict)\n", 539 | "\n", 540 | " ax.xaxis.set_major_locator(ticker.MultipleLocator(1))\n", 541 | " ax.yaxis.set_major_locator(ticker.MultipleLocator(1))\n", 542 | "\n", 543 | " ax.set_xlabel('Input text')\n", 544 | " ax.set_ylabel('Output text')\n", 545 | " plt.suptitle('Attention weights')" 546 | ], 547 | "metadata": { 548 | "id": "N6Hrtxlxawy7" 549 | }, 550 | "execution_count": 13, 551 | "outputs": [] 552 | }, 553 | { 554 | "cell_type": "markdown", 555 | "source": [ 556 | "#### Plot the Attention Heatmap of the Last Layer" 557 | ], 558 | "metadata": { 559 | "id": "akG2zTTRCTAA" 560 | } 561 | }, 562 | { 563 | "cell_type": "code", 564 | "source": [ 565 | "layer_idx = -1\n", 566 | "sum_heads = attns[layer_idx][0, 0, :, :]\n", 567 | "for head_idx in range(1, model.config.num_attention_heads):\n", 568 | " sum_heads = sum_heads + attns[layer_idx][0, head_idx, :, :]\n", 569 | "plot_attention(sum_heads.cpu().detach().numpy(), tokenizer.tokenize(example_text, add_special_tokens=True))" 570 | ], 571 | "metadata": { 572 | "id": "3-wHxrEqbfni", 573 | "colab": { 574 | "base_uri": "https://localhost:8080/", 575 | "height": 1000 576 | }, 577 | "outputId": "7d305076-5de8-4f4d-edc8-5bf844f8d8de" 578 | }, 579 | "execution_count": 14, 580 | "outputs": [ 581 | { 582 | "output_type": "stream", 583 | "name": "stderr", 584 | "text": [ 585 | ":15: UserWarning: FixedFormatter should only be used together with FixedLocator\n", 586 | " ax.set_xticklabels([''] + tokens, fontdict=fontdict, rotation=90)\n", 587 | ":16: UserWarning: FixedFormatter should only be used together with FixedLocator\n", 588 | " ax.set_yticklabels([''] + tokens, fontdict=fontdict)\n" 589 | ] 590 | }, 591 | { 592 | "output_type": "display_data", 593 | "data": { 594 | 
"text/plain": [ 595 | "
" 596 | ], 597 | "image/png": "\n" 598 | }, 599 | "metadata": {} 600 | } 601 | ] 602 | } 603 | ] 604 | } -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # SIGPL 2023 튜토리얼: 손에 잡히는 프로그래밍 언어 모델 구현 2 | 3 | ## 준비물 4 | - Google 계정 5 | - Github 계정 6 | 7 | ## 강의 자료 8 | - [슬라이드](slide.pdf) 9 | - [1교시 실습 자료](1_Concepts.ipynb) 10 | - [2교시 실습 자료](2_Fine_tuning.ipynb) 11 | 12 | ## Colab 에서 ipynb 사용 방법 13 | 1. Colab 에 GitHub 계정 연결: 14 | * 구글 Colab 페이지 접속 https://colab.research.google.com/ 15 | * 우측 상단의 :gear: 표시 클릭 16 | * GitHub 탭에서 계정 연결 17 | 2. 파일 상단에 있는 ![image](https://colab.research.google.com/assets/colab-badge.svg) 배지 클릭 18 | 3. Colab 에서 ![image](https://github.com/prosyslab/sigpl23-tutorial/assets/17640199/d72426d3-36fe-4d88-89ac-826a4b64dfb0) 버튼 클릭하여 개인 드라이브로 복사 19 | * 본인의 구글 드라이브 내 `Colab Notebooks` 디렉토리에 저장됨 20 | -------------------------------------------------------------------------------- /slide.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/prosyslab/sigpl23-tutorial/b02623147dbb8bb165e88260057eecf2824129ce/slide.pdf --------------------------------------------------------------------------------