├── .gitignore
├── presentations
    ├── ТЗ ДЗ.pdf
    ├── 01_introduction.pdf
    └── 01_introduction.pptx
├── pics
    └── 01_ml_arch_example.png
├── README.md
└── notebooks
    ├── labs
        ├── 02_lab_EDA.ipynb
        ├── 04_lab_dl_intro.ipynb
        ├── 03_lab_ml_intro.ipynb
        ├── 01_lab_python_intro_old_example.ipynb
        └── 01_lab_python_intro.ipynb
    └── seminars
        ├── 02-python-libs.ipynb
        └── 04-pytorch-intro.ipynb


/.gitignore:
--------------------------------------------------------------------------------
1 | .venv/
2 | .ipynb_checkpoints
3 | 


--------------------------------------------------------------------------------
/presentations/ТЗ ДЗ.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/l3lush/nn_ml_practice/HEAD/presentations/ТЗ ДЗ.pdf


--------------------------------------------------------------------------------
/pics/01_ml_arch_example.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/l3lush/nn_ml_practice/HEAD/pics/01_ml_arch_example.png


--------------------------------------------------------------------------------
/presentations/01_introduction.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/l3lush/nn_ml_practice/HEAD/presentations/01_introduction.pdf


--------------------------------------------------------------------------------
/presentations/01_introduction.pptx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/l3lush/nn_ml_practice/HEAD/presentations/01_introduction.pptx


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Repository for the course "Artificial Neural Networks and Machine Learning" for the БИВТ-21 groups
2 | [Google spreadsheet with your progress](https://docs.google.com/spreadsheets/d/11600OkqdRzxc-xZmFTuBs099VeAqU_GlE0Bx6bBlIRE/edit?usp=sharing)
3 | 


--------------------------------------------------------------------------------
/notebooks/labs/02_lab_EDA.ipynb:
--------------------------------------------------------------------------------
  1 | {
  2 |  "cells": [
  3 |   {
  4 |    "cell_type": "markdown",
  5 |    "id": "bc9520d2",
  6 |    "metadata": {},
  7 |    "source": [
  8 |     "# Lab 2\n",
  9 |     "\n",
 10 |     "Full name:   \n",
 11 |     "Group: \n",
 12 |     "\n",
 13 |     "You can submit in the following ways:\n",
 14 |     "1. Push this notebook to the GitHub repository where you keep your lab notebooks\n",
 15 |     "\n",
 16 |     "Deadlines:\n",
 17 |     "- Class 6 of the semester (in-person class 2)\n",
 18 |     "\n",
 19 |     "What you need to do:  \n",
 20 |     "**In short, just carry out an EDA** (though it is rarely that simple)\n",
 21 |     "## Read the assignment carefully\n",
 22 |     "\n",
 23 |     "Source data:\n",
 24 |     "1. Look up the name of your dataset in the [spreadsheet](https://docs.google.com/spreadsheets/d/11600OkqdRzxc-xZmFTuBs099VeAqU_GlE0Bx6bBlIRE/edit?usp=sharing) \n",
 25 |     "2. Download the data you need from [Google Drive](https://drive.google.com/drive/folders/1Phm-Fq1GL-VX7NS-DCEMRh_Fo1wU7DQ9?usp=sharing)\n",
 26 |     "  \n",
 27 |     "---\n",
 28 |     "Now, item by item, here is what I expect from you:  \n",
 29 |     "1. **Find** the name of your dataset in the spreadsheet (see the source data above)\n",
 30 |     "2. **Describe** the problem statement briefly: what is being asked of you and which variables are available. State whether the target is continuous (we predict a number from $-\\infty$ to $+\\infty$) or discrete (we predict a class from a finite set of options, for example 0 or 1, or an integer in the range [0; 10])\n",
 31 |     "3. Plot the distribution of the target variable as a histogram and draw preliminary conclusions (pay attention to how uniform the distribution is and to possible class imbalance). Count the number of unique values of the target variable.\n",
 32 |     "4. Print the basic statistics for the variables in the dataset (pandas has a ready-made method for this; it computes count, min, max, mean, 25%, 50%, and so on). This is done with **a single** method (you know it).\n",
 33 |     "5. Print the basic information about the dataset (how many columns there are, what their types are, and how many non-null elements each contains). This is done with **a single** method (you know it).\n",
 34 |     "6. Count the missing values (NaN, Null, null, None) in every column. Suggest why these gaps might have appeared and how they could be filled. \n",
 35 |     "7. Plot histograms of any 5 features (from the set `X`, or `data.data` as it is originally called). If you see skew, asymmetry, or anything similar, describe it in words in the notebook.\n",
 36 |     "8. Plot the relationship between the target variable and any 5 variables of your choice (if there are fewer than 5 variables, do as many as you can). Conclude whether these variables can be used to predict the target (in other words, whether there is any relationship between the y variable and the X variable)  \n",
 37 |     "9. _extra_ (optional). Look at how two variables relate to the target at the same time. That is, put variable $X_{n}$ on the OX axis and variable $X_{k}$ on the OY axis; you will get two curves (two point clouds), split by the value of the target variable. Alternatively, you can build something similar for categorical features, but you will need to read up on heatmaps. \n",
 38 |     "\n",
 39 |     "---\n",
 40 |     "P.S.  \n",
 41 |     "Please do each task in separate cells under its own heading (like sections 2 and 3 in this notebook), for example:  \n",
 42 |     "- Heading\n",
 43 |     "- Code cells\n",
 44 |     "- Another heading\n",
 45 |     "- More code cells"
 46 |    ]
 47 |   },
 48 |   {
 49 |    "cell_type": "markdown",
 50 |    "id": "28b51062",
 51 |    "metadata": {},
 52 |    "source": [
 53 |     "## 0. Data import example"
 54 |    ]
 55 |   },
 56 |   {
 57 |    "cell_type": "code",
 58 |    "execution_count": null,
 59 |    "id": "406597a7",
 60 |    "metadata": {},
 61 |    "outputs": [],
 62 |    "source": [
 63 |     "data = pd.read_csv(path_to_dataset, sep='\\t')  # assumes `import pandas as pd` and a defined `path_to_dataset`; change the separator if needed"
 64 |    ]
 65 |   },
 66 |   {
 67 |    "cell_type": "markdown",
 68 |    "id": "0d119028",
 69 |    "metadata": {},
 70 |    "source": [
 71 |     "## 2. Problem statement"
 72 |    ]
 73 |   },
 74 |   {
 75 |    "cell_type": "code",
 76 |    "execution_count": null,
 77 |    "id": "d92464d0",
 78 |    "metadata": {},
 79 |    "outputs": [],
 80 |    "source": []
 81 |   },
 82 |   {
 83 |    "cell_type": "markdown",
 84 |    "id": "34931f43",
 85 |    "metadata": {},
 86 |    "source": [
 87 |     "## 3. Target variable distribution"
 88 |    ]
 89 |   },
 90 |   {
 91 |    "cell_type": "code",
 92 |    "execution_count": null,
 93 |    "id": "67a611d0",
 94 |    "metadata": {},
 95 |    "outputs": [],
 96 |    "source": []
 97 |   }
 98 |  ],
 99 |  "metadata": {
100 |   "kernelspec": {
101 |    "display_name": "nn-ml-bachelor-2024-venv",
102 |    "language": "python",
103 |    "name": "nn-ml-bachelor-2024-venv"
104 |   },
105 |   "language_info": {
106 |    "codemirror_mode": {
107 |     "name": "ipython",
108 |     "version": 3
109 |    },
110 |    "file_extension": ".py",
111 |    "mimetype": "text/x-python",
112 |    "name": "python",
113 |    "nbconvert_exporter": "python",
114 |    "pygments_lexer": "ipython3",
115 |    "version": "3.9.6"
116 |   }
117 |  },
118 |  "nbformat": 4,
119 |  "nbformat_minor": 5
120 | }
121 | 
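
Items 3 and 6 of Lab 2 (unique target values and missing-value counts) can be illustrated with a small standard-library sketch. It is only an illustration under stated assumptions: the `SAMPLE` table, its column names, and the `count_missing` helper are hypothetical and not part of the course materials. In the lab itself, pandas would normally do this work (for example `data.isna().sum()` for per-column gaps).

```python
import csv
import io
from collections import Counter

# Hypothetical tab-separated sample; the real dataset comes from the course Google Drive.
SAMPLE = "age\tincome\ttarget\n25\t1000\t0\n\t1500\t1\n40\tNaN\t0\n31\t\t0\n"

# Tokens treated as missing, mirroring the list in item 6 (compared case-insensitively).
MISSING = {"", "nan", "null", "none"}


def count_missing(rows, columns):
    """Return {column: number of cells that look missing}."""
    return {
        col: sum(1 for row in rows if (row[col] or "").strip().lower() in MISSING)
        for col in columns
    }


reader = csv.DictReader(io.StringIO(SAMPLE), delimiter="\t")
rows = list(reader)
columns = reader.fieldnames

missing = count_missing(rows, columns)                  # gaps per column (item 6)
target_counts = Counter(row["target"] for row in rows)  # class balance (item 3)

print(missing)
print(target_counts)
print(len(target_counts))  # number of unique target values
```

Here one class is three times as frequent as the other, which is the kind of imbalance item 3 asks you to notice and comment on.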


--------------------------------------------------------------------------------
/notebooks/labs/04_lab_dl_intro.ipynb:
--------------------------------------------------------------------------------
  1 | {
  2 |  "cells": [
  3 |   {
  4 |    "cell_type": "markdown",
  5 |    "id": "bc9520d2",
  6 |    "metadata": {},
  7 |    "source": [
  8 |     "# Lab 4\n",
  9 |     "\n",
 10 |     "Full name:   \n",
 11 |     "Group: \n",
 12 |     "\n",
 13 |     "You can submit in the following ways:\n",
 14 |     "1. Push this notebook to the GitHub repository where you keep your lab notebooks\n",
 15 |     "\n",
 16 |     "Deadlines:\n",
 17 |     "- Class 8 of the semester (in-person class 4)\n",
 18 |     "\n",
 19 |     "What you need to do:  \n",
 20 |     "- Train several deep learning models on the available data  \n",
 21 |     "\n",
 22 |     "---\n",
 23 |     "## Read the assignment carefully\n",
 24 |     "\n",
 25 |     "Source data:\n",
 26 |     "1. Look up the name of your dataset in the [spreadsheet](https://docs.google.com/spreadsheets/d/11600OkqdRzxc-xZmFTuBs099VeAqU_GlE0Bx6bBlIRE/edit?usp=sharing) \n",
 27 |     "2. Download the data you need from [Google Drive](https://drive.google.com/drive/folders/1Phm-Fq1GL-VX7NS-DCEMRh_Fo1wU7DQ9?usp=sharing)\n",
 28 |     "  \n",
 29 |     "---\n",
 30 |     "Now, item by item, here is what I expect from you:  \n",
 31 |     "1. Download the required data and read it into a variable.\n",
 32 |     "2. Work out whether your task is classification (binary or multiclass) or regression.\n",
 33 |     "3. Preprocess the data:  \n",
 34 |     "     1. Split the data into training (train) and test sets. _Note that you fit the scalers and decide which values to use for filling gaps on the train set only, and then apply them to both train and test_.\n",
 35 |     "     2. Check the data for missing values. If there are any, fill them using one of the strategies from the seminar 3 notebook. P.S. Numerical and categorical variables call for different strategies.\n",
 36 |     "     3. Normalize the numerical variables (`StandardScaler`, `MinMaxScaler`).\n",
 37 |     "     4. Encode the categorical features using one of the strategies.\n",
 38 |     "4. Wrap the data in a `Dataset` class from the `torch` library (as we did in the seminar), then pass it to a `DataLoader` (also covered in the seminar).\n",
 39 |     "5. Train on the training set:\n",
 40 |     "     1. A very simple single-layer neural network with the `SGD` optimizer ([link](https://pytorch.org/docs/stable/optim.html)).\n",
 41 |     "     2. A more complex network (with 1 hidden layer) with the `Adam` optimizer ([link](https://pytorch.org/docs/stable/optim.html)).\n",
 42 |     "     3. An even more complex network (with 3+ hidden layers) with the `Adam` optimizer ([link](https://pytorch.org/docs/stable/optim.html)).\n",
 43 |     "6. Compute the loss on the train and test sets as a function of the training epoch. Visualize it with the `matplotlib` library (this gives the so-called **learning curve** of the model).\n",
 44 |     "7. Compute the metrics on the train and test sets:\n",
 45 |     "     1. For classification: Accuracy\n",
 46 |     "     2. For regression: MAE\n",
 47 |     "8. Compare the metrics between train and test as well as across the different models. Answer the following questions:\n",
 48 |     "     1. Which model handled the task best?\n",
 49 |     "     2. Is there overfitting?\n",
 50 |     "     3. Is there underfitting?\n",
 51 |     "     4. How could the models' metrics be improved?\n",
 52 |     "\n",
 53 |     "---\n",
 54 |     "P.S.  \n",
 55 |     "Please do each task in separate cells under its own heading (like items 1 and 2 in this notebook), for example:  \n",
 56 |     "- Heading\n",
 57 |     "- Code cells\n",
 58 |     "- Another heading\n",
 59 |     "- More code cells\n",
 60 |     "\n",
 61 |     "P.S.S.  \n",
 62 |     "If you got lucky and ended up with multiclass classification, you will not run into problems: just set the required number of neurons at the output of your network."
 63 |    ]
 64 |   },
 65 |   {
 66 |    "cell_type": "markdown",
 67 |    "id": "28b51062",
 68 |    "metadata": {},
 69 |    "source": [
 70 |     "## 1. Data import example. Loading the data"
 71 |    ]
 72 |   },
 73 |   {
 74 |    "cell_type": "code",
 75 |    "execution_count": null,
 76 |    "id": "406597a7",
 77 |    "metadata": {},
 78 |    "outputs": [],
 79 |    "source": [
 80 |     "data = pd.read_csv(path_to_dataset, sep='\\t')  # assumes `import pandas as pd` and a defined `path_to_dataset`; change the separator if needed"
 81 |    ]
 82 |   },
 83 |   {
 84 |    "cell_type": "markdown",
 85 |    "id": "0d119028",
 86 |    "metadata": {},
 87 |    "source": [
 88 |     "## 2. Working out which task we have"
 89 |    ]
 90 |   },
 91 |   {
 92 |    "cell_type": "code",
 93 |    "execution_count": null,
 94 |    "id": "d92464d0",
 95 |    "metadata": {},
 96 |    "outputs": [],
 97 |    "source": []
 98 |   },
 99 |   {
100 |    "cell_type": "markdown",
101 |    "id": "34931f43",
102 |    "metadata": {},
103 |    "source": [
104 |     "## 3. Preprocessing the data"
105 |    ]
106 |   },
107 |   {
108 |    "cell_type": "code",
109 |    "execution_count": null,
110 |    "id": "67a611d0",
111 |    "metadata": {},
112 |    "outputs": [],
113 |    "source": []
114 |   }
115 |  ],
116 |  "metadata": {
117 |   "kernelspec": {
118 |    "display_name": "nn-ml-bachelor-2024-venv",
119 |    "language": "python",
120 |    "name": "nn-ml-bachelor-2024-venv"
121 |   },
122 |   "language_info": {
123 |    "codemirror_mode": {
124 |     "name": "ipython",
125 |     "version": 3
126 |    },
127 |    "file_extension": ".py",
128 |    "mimetype": "text/x-python",
129 |    "name": "python",
130 |    "nbconvert_exporter": "python",
131 |    "pygments_lexer": "ipython3",
132 |    "version": "3.9.6"
133 |   }
134 |  },
135 |  "nbformat": 4,
136 |  "nbformat_minor": 5
137 | }
138 | 
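
Items 5.1 and 6 of Lab 4 (a single-layer model trained with SGD, plus a per-epoch train/test loss history) can be sketched without any dependencies. This is an illustration under stated assumptions, not the expected solution: the synthetic data, the learning rate, and the names `make_data`, `mse`, and `history` are all hypothetical. In the lab the model would be a `torch` module, the parameter update would be done by `torch.optim.SGD`, and `history` would be plotted with `matplotlib`.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible


def make_data(n):
    """Hypothetical synthetic regression data: y = 3x + 1 plus small Gaussian noise."""
    xs = [random.uniform(-1, 1) for _ in range(n)]
    return [(x, 3 * x + 1 + random.gauss(0, 0.1)) for x in xs]


def mse(data, w, b):
    """Mean squared error of the linear model w * x + b on a dataset."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


train, test = make_data(80), make_data(20)

w, b, lr = 0.0, 0.0, 0.1
# history[k] holds (train loss, test loss) after k epochs; entry 0 is before training.
history = [(mse(train, w, b), mse(test, w, b))]

for epoch in range(20):
    random.shuffle(train)
    for x, y in train:
        err = w * x + b - y
        # Gradient of the squared error for one sample with respect to w and b.
        w -= lr * 2 * err * x
        b -= lr * 2 * err
    history.append((mse(train, w, b), mse(test, w, b)))

for epoch, (train_loss, test_loss) in enumerate(history):
    print(epoch, round(train_loss, 4), round(test_loss, 4))
```

Plotting the two columns of `history` against the epoch number gives the learning curve; a test loss that starts rising while the train loss keeps falling would be a sign of overfitting.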


--------------------------------------------------------------------------------
/notebooks/labs/03_lab_ml_intro.ipynb:
--------------------------------------------------------------------------------
  1 | {
  2 |  "cells": [
  3 |   {
  4 |    "cell_type": "markdown",
  5 |    "id": "bc9520d2",
  6 |    "metadata": {},
  7 |    "source": [
  8 |     "# Lab 3\n",
  9 |     "\n",
 10 |     "Full name:   \n",
 11 |     "Group: \n",
 12 |     "\n",
 13 |     "You can submit in the following ways:\n",
 14 |     "1. Push this notebook to the GitHub repository where you keep your lab notebooks\n",
 15 |     "\n",
 16 |     "Deadlines:\n",
 17 |     "- Class 7 of the semester (in-person class 3)\n",
 18 |     "\n",
 19 |     "What you need to do:  \n",
 20 |     "- Train several kinds of machine learning models and compare them with each other  \n",
 21 |     "\n",
 22 |     "---\n",
 23 |     "## Read the assignment carefully\n",
 24 |     "\n",
 25 |     "Source data:\n",
 26 |     "1. Look up the name of your dataset in the [spreadsheet](https://docs.google.com/spreadsheets/d/11600OkqdRzxc-xZmFTuBs099VeAqU_GlE0Bx6bBlIRE/edit?usp=sharing) \n",
 27 |     "2. Download the data you need from [Google Drive](https://drive.google.com/drive/folders/1Phm-Fq1GL-VX7NS-DCEMRh_Fo1wU7DQ9?usp=sharing)\n",
 28 |     "  \n",
 29 |     "---\n",
 30 |     "Now, item by item, here is what I expect from you:  \n",
 31 |     "1. Download the required data and read it into a variable.\n",
 32 |     "2. Work out whether your task is classification (binary or multiclass) or regression (**if you have multiclass classification, read the P.S.S. below**).\n",
 33 |     "3. Preprocess the data:  \n",
 34 |     "     1. Split the data into training (train) and test sets. _Note that you fit the scalers and decide which values to use for filling gaps on the train set only, and then apply them to both train and test_.\n",
 35 |     "     2. Check the data for missing values. If there are any, fill them using one of the strategies from the seminar 3 notebook. P.S. Numerical and categorical variables call for different strategies.\n",
 36 |     "     3. Normalize the numerical variables (`StandardScaler`, `MinMaxScaler`).\n",
 37 |     "     4. Encode the categorical features using one of the strategies.\n",
 38 |     "4. Train on the training set:\n",
 39 |     "     1. A linear model (`LogisticRegression`, `LinearRegression`)\n",
 40 |     "     2. A tree-based model (`DecisionTreeClassifier`, `DecisionTreeRegressor`) (I recommend trying different tree depths here)\n",
 41 |     "     3. K-nearest neighbors (`KNeighborsClassifier`, `KNeighborsRegressor`) (it also makes sense to try different values of `k` here)\n",
 42 |     "     4. A random forest (`RandomForestClassifier`, `RandomForestRegressor`) \n",
 43 |     "5. Compute the metrics on the train and test sets:\n",
 44 |     "     1. For classification: Accuracy, ROC-AUC (plot + value), PR curve (plot), F1-score\n",
 45 |     "     2. For regression: MAE, RMSE, MAPE\n",
 46 |     "6. Compare the metrics between train and test as well as across the different models. Answer the following questions:\n",
 47 |     "     1. Which model handled the task best?\n",
 48 |     "     2. Is there overfitting?\n",
 49 |     "     3. Is there underfitting?\n",
 50 |     "     4. How could the models' metrics be improved?\n",
 51 |     "\n",
 52 |     "---\n",
 53 |     "P.S.  \n",
 54 |     "Please do each task in separate cells under its own heading (like items 1 and 2 in this notebook), for example:  \n",
 55 |     "- Heading\n",
 56 |     "- Code cells\n",
 57 |     "- Another heading\n",
 58 |     "- More code cells\n",
 59 |     "\n",
 60 |     "P.S.S.  \n",
 61 |     "If you got lucky and ended up with multiclass classification, you will need to find out whether the algorithm can handle several classes at once (not all of them can). So you may need something like OneVsRestClassifier ([link](https://scikit-learn.org/stable/modules/generated/sklearn.multiclass.OneVsRestClassifier.html#sklearn.multiclass.OneVsRestClassifier)), but I recommend reading this [page](https://scikit-learn.org/stable/modules/multiclass.html), which gives more complete information."
 62 |    ]
 63 |   },
 64 |   {
 65 |    "cell_type": "markdown",
 66 |    "id": "28b51062",
 67 |    "metadata": {},
 68 |    "source": [
 69 |     "## 1. Data import example. Loading the data"
 70 |    ]
 71 |   },
 72 |   {
 73 |    "cell_type": "code",
 74 |    "execution_count": null,
 75 |    "id": "406597a7",
 76 |    "metadata": {},
 77 |    "outputs": [],
 78 |    "source": [
 79 |     "data = pd.read_csv(path_to_dataset, sep='\\t')  # assumes `import pandas as pd` and a defined `path_to_dataset`; change the separator if needed"
 80 |    ]
 81 |   },
 82 |   {
 83 |    "cell_type": "markdown",
 84 |    "id": "0d119028",
 85 |    "metadata": {},
 86 |    "source": [
 87 |     "## 2. Working out which task we have"
 88 |    ]
 89 |   },
 90 |   {
 91 |    "cell_type": "code",
 92 |    "execution_count": null,
 93 |    "id": "d92464d0",
 94 |    "metadata": {},
 95 |    "outputs": [],
 96 |    "source": []
 97 |   },
 98 |   {
 99 |    "cell_type": "markdown",
100 |    "id": "34931f43",
101 |    "metadata": {},
102 |    "source": [
103 |     "## 3. Preprocessing the data"
104 |    ]
105 |   },
106 |   {
107 |    "cell_type": "code",
108 |    "execution_count": null,
109 |    "id": "67a611d0",
110 |    "metadata": {},
111 |    "outputs": [],
112 |    "source": []
113 |   }
114 |  ],
115 |  "metadata": {
116 |   "kernelspec": {
117 |    "display_name": "nn-ml-bachelor-2024-venv",
118 |    "language": "python",
119 |    "name": "nn-ml-bachelor-2024-venv"
120 |   },
121 |   "language_info": {
122 |    "codemirror_mode": {
123 |     "name": "ipython",
124 |     "version": 3
125 |    },
126 |    "file_extension": ".py",
127 |    "mimetype": "text/x-python",
128 |    "name": "python",
129 |    "nbconvert_exporter": "python",
130 |    "pygments_lexer": "ipython3",
131 |    "version": "3.9.6"
132 |   }
133 |  },
134 |  "nbformat": 4,
135 |  "nbformat_minor": 5
136 | }
137 | 
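
Two points of Lab 3 are easy to get wrong: fitting the scaler on train only (item 3.1) and computing the regression metrics (item 5.2). The sketch below shows both with the standard library. It is a minimal illustration under stated assumptions: the numbers and the helper names (`fit_standardizer`, `standardize`, `mae`, `rmse`, `mape`) are hypothetical. In the lab, `StandardScaler` and the functions from `sklearn.metrics` would normally be used; `statistics.pstdev` is chosen here because `StandardScaler` also uses the population standard deviation.

```python
import math
import statistics


def fit_standardizer(train_values):
    """Learn the mean and standard deviation from the training split only."""
    mean = statistics.fmean(train_values)
    std = statistics.pstdev(train_values) or 1.0  # guard against constant columns
    return mean, std


def standardize(values, mean, std):
    """Apply previously learned statistics to any split (train or test)."""
    return [(v - mean) / std for v in values]


def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mape(y_true, y_pred):
    # Undefined when a true value is 0; this sketch assumes non-zero targets.
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical feature column, already split into train and test.
x_train, x_test = [1.0, 2.0, 3.0], [4.0]
mean, std = fit_standardizer(x_train)          # statistics come from train only
x_train_scaled = standardize(x_train, mean, std)
x_test_scaled = standardize(x_test, mean, std)  # test reuses the train statistics

# Hypothetical true values and predictions for the metric helpers.
y_true, y_pred = [100.0, 200.0, 400.0], [110.0, 180.0, 400.0]
print(mae(y_true, y_pred), rmse(y_true, y_pred), mape(y_true, y_pred))
```

Computing the mean and standard deviation on the full dataset instead would leak information from the test set into preprocessing and make the test metrics look better than they should.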


--------------------------------------------------------------------------------
/notebooks/labs/01_lab_python_intro_old_example.ipynb:
--------------------------------------------------------------------------------
  1 | {
  2 |  "cells": [
  3 |   {
  4 |    "cell_type": "markdown",
  5 |    "id": "596b4987",
  6 |    "metadata": {},
  7 |    "source": [
  8 |     "# Lab 1\n",
  9 |     "\n",
 10 |     "Full name:   \n",
 11 |     "Group: \n",
 12 |     "\n",
 13 |     "You can submit in the following ways:\n",
 14 |     "1. Create a **private** repository on GitHub and add me by username (l3lush) to Collaborators (Settings -> Collaborators -> Add people)\n",
 15 |     "2. Email the completed notebook to avmysh@gmail.com or m1603956@edu.misis.ru\n",
 16 |     "3. Send the completed notebook to me on Telegram at @l3lush. \n",
 17 |     "\n",
 18 |     "Deadlines:\n",
 19 |     "- soft: **05.03.2023 23:59** (submitting by this time earns +1 point in the spreadsheet)\n",
 20 |     "- hard: **19.03.2023 23:59**\n",
 21 |     "\n",
 22 |     "What you need to do:\n",
 23 |     "1. Fill in all the cells below with code so that every `assert` passes."
 24 |    ]
 25 |   },
 26 |   {
 27 |    "cell_type": "markdown",
 28 |    "id": "2213731a",
 29 |    "metadata": {},
 30 |    "source": [
 31 |     "# 0. Example of how assert works"
 32 |    ]
 33 |   },
 34 |   {
 35 |    "cell_type": "markdown",
 36 |    "id": "0dcee059",
 37 |    "metadata": {},
 38 |    "source": [
 39 |     "`assert` is used to check conditions.  \n",
 40 |     "If the condition is true (evaluates to `True`), the code runs without errors; otherwise an `AssertionError` is raised.  <br />\n",
 41 |     "Below is an example of how `assert` works (you may leave the error in the cell with the `assert a > 6` example, but there must be **no** errors anywhere else in the code)    <br />\n",
 42 |     "P.S. The `assert` statements **must not** be changed in any way"
 43 |    ]
 44 |   },
 45 |   {
 46 |    "cell_type": "code",
 47 |    "execution_count": 1,
 48 |    "id": "31be897b",
 49 |    "metadata": {},
 50 |    "outputs": [],
 51 |    "source": [
 52 |     "a = 5\n",
 53 |     "assert a == 5  # the assert passes because a equals 5"
 54 |    ]
 55 |   },
 56 |   {
 57 |    "cell_type": "code",
 58 |    "execution_count": 2,
 59 |    "id": "139c9ca8",
 60 |    "metadata": {},
 61 |    "outputs": [
 62 |     {
 63 |      "ename": "AssertionError",
 64 |      "evalue": "",
 65 |      "output_type": "error",
 66 |      "traceback": [
 67 |       "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
 68 |       "\u001b[1;31mAssertionError\u001b[0m                            Traceback (most recent call last)",
 69 |       "\u001b[1;32m~\\AppData\\Local\\Temp\\ipykernel_21924\\1589179466.py\u001b[0m in \u001b[0;36m<module>\u001b[1;34m\u001b[0m\n\u001b[1;32m----> 1\u001b[1;33m \u001b[1;32massert\u001b[0m \u001b[0ma\u001b[0m \u001b[1;33m>\u001b[0m \u001b[1;36m6\u001b[0m  \u001b[1;31m# the assert fails because a is not greater than 6, so an error is raised\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m",
 70 |       "\u001b[1;31mAssertionError\u001b[0m: "
 71 |      ]
 72 |     }
 73 |    ],
 74 |    "source": [
 75 |     "assert a > 6  # the assert fails because a is not greater than 6, so an error is raised"
 76 |    ]
 77 |   },
 78 |   {
 79 |    "cell_type": "markdown",
 80 |    "id": "a72f14da",
 81 |    "metadata": {},
 82 |    "source": [
 83 |     "# 1. Basic variable types"
 84 |    ]
 85 |   },
 86 |   {
 87 |    "cell_type": "markdown",
 88 |    "id": "ccfb5acf",
 89 |    "metadata": {},
 90 |    "source": [
 91 |     "Create the variables so that each is an object of the specified type (the types are given in the comments)."
 92 |    ]
 93 |   },
 94 |   {
 95 |    "cell_type": "code",
 96 |    "execution_count": 5,
 97 |    "id": "e88c14eb",
 98 |    "metadata": {},
 99 |    "outputs": [
100 |     {
101 |      "ename": "AssertionError",
102 |      "evalue": "",
103 |      "output_type": "error",
104 |      "traceback": [
105 |       "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
106 |       "\u001b[1;31mAssertionError\u001b[0m                            Traceback (most recent call last)",
107 |       "\u001b[1;32m~\\AppData\\Local\\Temp\\ipykernel_21924\\2380738517.py\u001b[0m in \u001b[0;36m<module>\u001b[1;34m\u001b[0m\n\u001b[0;32m      1\u001b[0m \u001b[0mint_variable\u001b[0m \u001b[1;33m=\u001b[0m \u001b[1;33m[\u001b[0m\u001b[1;36m4.1\u001b[0m\u001b[1;33m]\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m----> 2\u001b[1;33m \u001b[1;32massert\u001b[0m \u001b[0misinstance\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mint_variable\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mint\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m",
108 |       "\u001b[1;31mAssertionError\u001b[0m: "
109 |      ]
110 |     }
111 |    ],
112 |    "source": [
113 |     "int_variable = [4.1]\n",
114 |     "assert isinstance(int_variable, int)"
115 |    ]
116 |   },
117 |   {
118 |    "cell_type": "code",
119 |    "execution_count": null,
120 |    "id": "db46e8d6",
121 |    "metadata": {},
122 |    "outputs": [],
123 |    "source": [
124 |     "int_variable = ...  # int\n",
125 |     "assert isinstance(int_variable, int)\n",
126 |     "\n",
127 |     "float_variable = ...  # float\n",
128 |     "assert isinstance(float_variable, float)\n",
129 |     "\n",
130 |     "string_variable = ...  # string\n",
131 |     "assert isinstance(string_variable, str)\n",
132 |     "\n",
133 |     "bool_variable = ...  # bool\n",
134 |     "assert isinstance(bool_variable, bool)"
135 |    ]
136 |   },
137 |   {
138 |    "cell_type": "code",
139 |    "execution_count": null,
140 |    "id": "ed1861a0",
141 |    "metadata": {},
142 |    "outputs": [],
143 |    "source": [
144 |     "list_variable = ...  # list\n",
145 |     "assert isinstance(list_variable, list)\n",
146 |     "\n",
147 |     "tuple_variable = ...  # tuple\n",
148 |     "assert isinstance(tuple_variable, tuple)\n",
149 |     "\n",
150 |     "set_variable = ...  # set\n",
151 |     "assert isinstance(set_variable, set)\n",
152 |     "\n",
153 |     "dict_variable = ...  # dict\n",
154 |     "assert isinstance(dict_variable, dict)"
155 |    ]
156 |   },
157 |   {
158 |    "cell_type": "markdown",
159 |    "id": "52830470",
160 |    "metadata": {},
161 |    "source": [
162 |     "# 2. Assorted functions"
163 |    ]
164 |   },
165 |   {
166 |    "cell_type": "code",
167 |    "execution_count": null,
168 |    "id": "3eb16cbe",
169 |    "metadata": {},
170 |    "outputs": [],
171 |    "source": [
172 |     "def difference_of_two_numbers(first, second):\n",
173 |     "    \"\"\"Returns the difference between the first and the second argument\"\"\"\n",
174 |     "    # TODO: implement me\n",
175 |     "    pass\n",
176 |     "\n",
177 |     "\n",
178 |     "assert difference_of_two_numbers(2, 1) == 1\n",
179 |     "assert difference_of_two_numbers(4, 1) == 3\n",
180 |     "assert difference_of_two_numbers(10, 0) == 10\n",
181 |     "assert difference_of_two_numbers(-5, -6) == 1"
182 |    ]
183 |   },
184 |   {
185 |    "cell_type": "code",
186 |    "execution_count": null,
187 |    "id": "65ff4881",
188 |    "metadata": {},
189 |    "outputs": [],
190 |    "source": [
191 |     "def condition_function(input_number):\n",
192 |     "    \"\"\"\n",
193 |     "    If the input number is less than or equal to 0, multiply it by 2.\n",
194 |     "    Otherwise, if the number is greater than 0 but less than or equal to 10, multiply it by 3.\n",
195 |     "    In all other cases, divide it by 10.\n",
196 |     "    \"\"\"\n",
197 |     "    # TODO: implement me\n",
198 |     "    pass\n",
199 |     "\n",
200 |     "\n",
201 |     "assert condition_function(0) == 0\n",
202 |     "assert condition_function(-1) == -2\n",
203 |     "assert condition_function(1) == 3\n",
204 |     "assert condition_function(10) == 30\n",
205 |     "assert condition_function(11) == 1.1\n",
206 |     "assert condition_function(20) == 2"
207 |    ]
208 |   },
209 |   {
210 |    "cell_type": "code",
211 |    "execution_count": null,
212 |    "id": "51264059",
213 |    "metadata": {},
214 |    "outputs": [],
215 |    "source": [
216 |     "def calculator(number_1, operation, number_2):\n",
217 |     "    \"\"\"\n",
218 |     "    A simple calculator that supports the operations +, -, *, /.\n",
219 |     "    Takes the first number, the operation as a string, and the second number.\n",
220 |     "    \n",
221 |     "    Example: \n",
222 |     "    >>> calculator(1, \"+\", 1)\n",
223 |     "    2\n",
224 |     "    \"\"\"\n",
225 |     "    # TODO: implement me\n",
226 |     "    pass\n",
227 |     "\n",
228 |     "\n",
229 |     "assert calculator(1, \"+\", 2) == 3\n",
230 |     "assert calculator(3, \"-\", 1) == 2\n",
231 |     "assert calculator(4, \"*\", 3) == 12\n",
232 |     "assert calculator(2, \"/\", 2) == 1"
233 |    ]
234 |   },
235 |   {
236 |    "cell_type": "code",
237 |    "execution_count": null,
238 |    "id": "02255ff4",
239 |    "metadata": {},
240 |    "outputs": [],
241 |    "source": [
242 |     "def number_of_unique_elements(input_list):\n",
243 |     "    \"\"\"\n",
244 |     "    Counts the number of unique elements in the list.\n",
245 |     "    \"\"\"\n",
246 |     "    # TODO: implement me\n",
247 |     "    pass\n",
248 |     "\n",
249 |     "\n",
250 |     "assert number_of_unique_elements([1, 2, 3]) == 3\n",
251 |     "assert number_of_unique_elements([1] * 93) == 1\n",
252 |     "assert number_of_unique_elements(list(range(1000))) == 1000"
253 |    ]
254 |   },
255 |   {
256 |    "cell_type": "code",
257 |    "execution_count": null,
258 |    "id": "2a2a2363",
259 |    "metadata": {},
260 |    "outputs": [],
261 |    "source": [
262 |     "def counter(input_list):\n",
263 |     "    \"\"\"\n",
264 |     "    Counts the occurrences of each element of the list.\n",
265 |     "    Returns a dict of the form {number: number of occurrences}\n",
266 |     "    \n",
267 |     "    Note (!): you may not use the built-in Counter from collections\n",
268 |     "    \n",
269 |     "    For example:\n",
270 |     "    counter([1, 1, 2, 3]) returns {1: 2, 2: 1, 3: 1}\n",
271 |     "    \"\"\"\n",
272 |     "    # TODO: implement me\n",
273 |     "    pass\n",
274 |     "\n",
275 |     "\n",
276 |     "assert counter([1, 1, 1, 2, 3]) == {1: 3, 2: 1, 3: 1}\n",
277 |     "assert counter([1] * 1000) == {1: 1000}\n",
278 |     "assert counter([1, 3, 5] * 100) == {1: 100, 3: 100, 5: 100}"
279 |    ]
280 |   },
281 |   {
282 |    "cell_type": "code",
283 |    "execution_count": null,
284 |    "id": "c1498018",
285 |    "metadata": {},
286 |    "outputs": [],
287 |    "source": [
288 |     "def multiply_nums(input_string):\n",
289 |     "    \"\"\"\n",
290 |     "    Multiply the comma-separated numbers passed in the string.\n",
291 |     "    \n",
292 |     "    hint: you can use the .split() method\n",
293 |     "    \"\"\"\n",
294 |     "    # TODO: implement me\n",
295 |     "    pass\n",
296 |     "\n",
297 |     "\n",
298 |     "assert multiply_nums(\"2, 3\") == 6\n",
299 |     "assert multiply_nums(\"1, 1, 1, 1, 1, 1, 1\") == 1\n",
300 |     "assert multiply_nums(\"345, 4576, 794, 325, 0\") == 0"
301 |    ]
302 |   },
303 |   {
304 |    "cell_type": "markdown",
305 |    "id": "202c838a",
306 |    "metadata": {},
307 |    "source": [
308 |     "Implement the following function:\n",
309 |     "$$\n",
310 |     "y = \\sin(x)\\cdot \\cos(x)\n",
311 |     "$$\n",
312 |     "P.S. use the `math` library"
313 |    ]
314 |   },
315 |   {
316 |    "cell_type": "code",
317 |    "execution_count": null,
318 |    "id": "4121596c",
319 |    "metadata": {},
320 |    "outputs": [],
321 |    "source": [
322 |     "def custom_function(x):\n",
323 |     "    \"\"\"\n",
324 |     "    Implement the function described above.\n",
325 |     "    \"\"\"\n",
326 |     "    # TODO: implement me\n",
327 |     "    pass\n",
328 |     "\n",
329 |     "assert round(custom_function(1), 3) == 0.455\n",
330 |     "assert round(custom_function(1.5), 3) == 0.071\n",
331 |     "assert round(custom_function(2), 3) == -0.378\n",
332 |     "assert custom_function(0) == 0"
333 |    ]
334 |   },
335 |   {
336 |    "cell_type": "markdown",
337 |    "id": "7a2210d6",
338 |    "metadata": {},
339 |    "source": [
340 |     "Implement the following function:\n",
341 |     "$$\n",
342 |     "y = \\prod\\limits_{i = 1}^{n} \\frac{(i+2)^x + \\ln(x)}{x^2 + 4i}\n",
343 |     "$$\n",
344 |     "P.S. Use the `math` library."
345 |    ]
346 |   },
347 |   {
348 |    "cell_type": "code",
349 |    "execution_count": null,
350 |    "id": "b4698ba6",
351 |    "metadata": {},
352 |    "outputs": [],
353 |    "source": [
354 |     "def custom_function_1(x, n):\n",
355 |     "    \"\"\"\n",
356 |     "    Implement the function described above.\n",
357 |     "    \"\"\"\n",
358 |     "    # TODO: implement me\n",
359 |     "    pass\n",
360 |     "\n",
361 |     "\n",
362 |     "assert round(custom_function_1(2, 3), 3) == 2.707\n",
363 |     "assert round(custom_function_1(3, 2), 3) == 8.277\n",
364 |     "assert round(custom_function_1(3, 3), 3) == 49.7"
365 |    ]
366 |   },
367 |   {
368 |    "cell_type": "code",
369 |    "execution_count": null,
370 |    "id": "b9a4db60",
371 |    "metadata": {},
372 |    "outputs": [],
373 |    "source": [
374 |     "class MyList:\n",
375 |     "    def __init__(self):\n",
376 |     "        # TODO: implement me\n",
377 |     "        pass\n",
378 |     "    \n",
379 |     "    def return_sum(self):\n",
380 |     "        \"\"\"\n",
381 |     "        Returns the sum of all elements of the stored list.\n",
382 |     "        Using sum is not allowed!\n",
383 |     "        \"\"\"\n",
384 |     "        # TODO: implement me\n",
385 |     "        pass\n",
386 |     "    \n",
387 |     "    def make_reverse(self):\n",
388 |     "        \"\"\"\n",
389 |     "        Returns the stored list in reverse order.\n",
390 |     "        \"\"\"\n",
391 |     "        # TODO: implement me\n",
392 |     "        pass\n",
393 |     "    \n",
394 |     "    def make_slice(self, start, stop):\n",
395 |     "        \"\"\"\n",
396 |     "        Returns the slice [start:stop] of the stored list.\n",
397 |     "        \"\"\"\n",
398 |     "        # TODO: implement me\n",
399 |     "        pass\n",
400 |     "    \n",
401 |     "    \n",
402 |     "a = MyList([1, 2, 3, 4])\n",
403 |     "assert a.return_sum() == 10\n",
404 |     "assert a.make_reverse() == [4, 3, 2, 1]\n",
405 |     "assert a.make_slice(0, 2) == [1, 2]\n",
406 |     "\n",
407 |     "b = MyList([5, 6, 6, 5])\n",
408 |     "assert b.return_sum() == 22\n",
409 |     "assert b.make_reverse() == [5, 6, 6, 5]\n",
410 |     "assert b.make_slice(1, 2) == [6]"
411 |    ]
412 |   }
413 |  ],
414 |  "metadata": {
415 |   "kernelspec": {
416 |    "display_name": "Python 3 (ipykernel)",
417 |    "language": "python",
418 |    "name": "python3"
419 |   },
420 |   "language_info": {
421 |    "codemirror_mode": {
422 |     "name": "ipython",
423 |     "version": 3
424 |    },
425 |    "file_extension": ".py",
426 |    "mimetype": "text/x-python",
427 |    "name": "python",
428 |    "nbconvert_exporter": "python",
429 |    "pygments_lexer": "ipython3",
430 |    "version": "3.9.13"
431 |   }
432 |  },
433 |  "nbformat": 4,
434 |  "nbformat_minor": 5
435 | }
436 | 


--------------------------------------------------------------------------------
/notebooks/labs/01_lab_python_intro.ipynb:
--------------------------------------------------------------------------------
  1 | {
  2 |  "cells": [
  3 |   {
  4 |    "cell_type": "markdown",
  5 |    "id": "596b4987",
  6 |    "metadata": {},
  7 |    "source": [
  8 |     "# Lab Assignment 1\n",
  9 |     "\n",
 10 |     "Full name:   \n",
 11 |     "Group: \n",
 12 |     "\n",
 13 |     "How to submit:\n",
 14 |     "1. Create a **private** repository on GitHub and add me (username l3lush) to Collaborators (Settings -> Collaborators -> Add people).\n",
 15 |     "\n",
 16 |     "Deadlines:\n",
 17 |     "- Class No. 5 of the semester (in-person class No. 1)\n",
 18 |     "\n",
 19 |     "What to do:\n",
 20 |     "1. Fill in all the cells below with code so that every `assert` passes."
 21 |    ]
 22 |   },
 23 |   {
 24 |    "cell_type": "markdown",
 25 |    "id": "2213731a",
 26 |    "metadata": {},
 27 |    "source": [
 28 |     "# 0. An example of how assert works"
 29 |    ]
 30 |   },
 31 |   {
 32 |    "cell_type": "markdown",
 33 |    "id": "0dcee059",
 34 |    "metadata": {},
 35 |    "source": [
 36 |     "`assert` checks that a condition holds.  \n",
 37 |     "If the condition is true (evaluates to `True`), the code runs without errors; otherwise an `AssertionError` is raised.  <br />\n",
 38 |     "Below is an example of `assert` in action (the error in the `assert a > 6` example cell may stay, but there must be **no** errors anywhere else in the code).    <br />\n",
 39 |     "P.S. The `assert` statements **must not** be modified in any way."
 40 |    ]
 41 |   },
 42 |   {
 43 |    "cell_type": "code",
 44 |    "execution_count": 1,
 45 |    "id": "31be897b",
 46 |    "metadata": {},
 47 |    "outputs": [],
 48 |    "source": [
 49 |     "a = 5\n",
 50 |     "assert a == 5  # the assert passes because a equals 5"
 51 |    ]
 52 |   },
 53 |   {
 54 |    "cell_type": "code",
 55 |    "execution_count": 2,
 56 |    "id": "139c9ca8",
 57 |    "metadata": {},
 58 |    "outputs": [
 59 |     {
 60 |      "ename": "AssertionError",
 61 |      "evalue": "",
 62 |      "output_type": "error",
 63 |      "traceback": [
 64 |       "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
 65 |       "\u001b[1;31mAssertionError\u001b[0m                            Traceback (most recent call last)",
 66 |       "\u001b[1;32m~\\AppData\\Local\\Temp\\ipykernel_23792\\1589179466.py\u001b[0m in \u001b[0;36m<module>\u001b[1;34m\u001b[0m\n\u001b[1;32m----> 1\u001b[1;33m \u001b[1;32massert\u001b[0m \u001b[0ma\u001b[0m \u001b[1;33m>\u001b[0m \u001b[1;36m6\u001b[0m  \u001b[1;31m# the assert fails because a is not greater than 6, so an error is raised\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m",
 67 |       "\u001b[1;31mAssertionError\u001b[0m: "
 68 |      ]
 69 |     }
 70 |    ],
 71 |    "source": [
 72 |     "assert a > 6  # the assert fails because a is not greater than 6, so an error is raised"
 73 |    ]
 74 |   },
 75 |   {
 76 |    "cell_type": "markdown",
 77 |    "id": "a72f14da",
 78 |    "metadata": {},
 79 |    "source": [
 80 |     "# 1. Basic variable types"
 81 |    ]
 82 |   },
 83 |   {
 84 |    "cell_type": "markdown",
 85 |    "id": "ccfb5acf",
 86 |    "metadata": {},
 87 |    "source": [
 88 |     "Create the variables so that each one is an object of the required type (the types are given in the comments)."
 89 |    ]
 90 |   },
 91 |   {
 92 |    "cell_type": "code",
 93 |    "execution_count": null,
 94 |    "id": "db46e8d6",
 95 |    "metadata": {},
 96 |    "outputs": [],
 97 |    "source": [
 98 |     "int_variable = ...  # int\n",
 99 |     "assert isinstance(int_variable, int)\n",
100 |     "\n",
101 |     "float_variable = ...  # float\n",
102 |     "assert isinstance(float_variable, float)\n",
103 |     "\n",
104 |     "string_variable = ...  # string\n",
105 |     "assert isinstance(string_variable, str)\n",
106 |     "\n",
107 |     "bool_variable = ...  # bool\n",
108 |     "assert isinstance(bool_variable, bool)"
109 |    ]
110 |   },
111 |   {
112 |    "cell_type": "code",
113 |    "execution_count": null,
114 |    "id": "ed1861a0",
115 |    "metadata": {},
116 |    "outputs": [],
117 |    "source": [
118 |     "list_variable = ...  # list\n",
119 |     "assert isinstance(list_variable, list)\n",
120 |     "\n",
121 |     "tuple_variable = ...  # tuple\n",
122 |     "assert isinstance(tuple_variable, tuple)\n",
123 |     "\n",
124 |     "set_variable = ...  # set\n",
125 |     "assert isinstance(set_variable, set)\n",
126 |     "\n",
127 |     "dict_variable = ...  # dict\n",
128 |     "assert isinstance(dict_variable, dict)"
129 |    ]
130 |   },
131 |   {
132 |    "cell_type": "markdown",
133 |    "id": "00ec06c0",
134 |    "metadata": {},
135 |    "source": [
136 |     "<br>"
137 |    ]
138 |   },
139 |   {
140 |    "cell_type": "markdown",
141 |    "id": "52830470",
142 |    "metadata": {},
143 |    "source": [
144 |     "# 2.1 Assorted functions"
145 |    ]
146 |   },
147 |   {
148 |    "cell_type": "code",
149 |    "execution_count": null,
150 |    "id": "3eb16cbe",
151 |    "metadata": {},
152 |    "outputs": [],
153 |    "source": [
154 |     "def difference_of_two_numbers(first, second):\n",
155 |     "    \"\"\"Returns the difference between the first and the second argument.\"\"\"\n",
156 |     "    # TODO: implement me\n",
157 |     "    pass\n",
158 |     "\n",
159 |     "\n",
160 |     "assert difference_of_two_numbers(2, 1) == 1\n",
161 |     "assert difference_of_two_numbers(4, 1) == 3\n",
162 |     "assert difference_of_two_numbers(10, 0) == 10\n",
163 |     "assert difference_of_two_numbers(-5, -6) == 1"
164 |    ]
165 |   },
166 |   {
167 |    "cell_type": "code",
168 |    "execution_count": null,
169 |    "id": "65ff4881",
170 |    "metadata": {},
171 |    "outputs": [],
172 |    "source": [
173 |     "def condition_function(input_number):\n",
174 |     "    \"\"\"\n",
175 |     "    If the input number is less than or equal to 0, multiply it by 2.\n",
176 |     "    Otherwise, if it is greater than 0 but less than or equal to 10, multiply it by 3.\n",
177 |     "    In all other cases, divide it by 10.\n",
178 |     "    \"\"\"\n",
179 |     "    # TODO: implement me\n",
180 |     "    pass\n",
181 |     "\n",
182 |     "\n",
183 |     "assert condition_function(0) == 0\n",
184 |     "assert condition_function(-1) == -2\n",
185 |     "assert condition_function(1) == 3\n",
186 |     "assert condition_function(10) == 30\n",
187 |     "assert condition_function(11) == 1.1\n",
188 |     "assert condition_function(20) == 2"
189 |    ]
190 |   },
191 |   {
192 |    "cell_type": "code",
193 |    "execution_count": null,
194 |    "id": "51264059",
195 |    "metadata": {},
196 |    "outputs": [],
197 |    "source": [
198 |     "def calculator(number_1, operation, number_2):\n",
199 |     "    \"\"\"\n",
200 |     "    A simple calculator that can perform the operations +, -, *, /.\n",
201 |     "    Takes the first number, the operation as a string, and the second number.\n",
202 |     "    \n",
203 |     "    Example:\n",
204 |     "    >>> calculator(1, \"+\", 1)\n",
205 |     "    2\n",
206 |     "    \"\"\"\n",
207 |     "    # TODO: implement me\n",
208 |     "    pass\n",
209 |     "\n",
210 |     "\n",
211 |     "assert calculator(1, \"+\", 2) == 3\n",
212 |     "assert calculator(3, \"-\", 1) == 2\n",
213 |     "assert calculator(4, \"*\", 3) == 12\n",
214 |     "assert calculator(2, \"/\", 2) == 1"
215 |    ]
216 |   },
217 |   {
218 |    "cell_type": "code",
219 |    "execution_count": null,
220 |    "id": "02255ff4",
221 |    "metadata": {},
222 |    "outputs": [],
223 |    "source": [
224 |     "def number_of_unique_elements(input_list):\n",
225 |     "    \"\"\"\n",
226 |     "    Counts the number of unique elements in the list.\n",
227 |     "    \"\"\"\n",
228 |     "    # TODO: implement me\n",
229 |     "    pass\n",
230 |     "\n",
231 |     "\n",
232 |     "assert number_of_unique_elements([1, 2, 3]) == 3\n",
233 |     "assert number_of_unique_elements([1] * 93) == 1\n",
234 |     "assert number_of_unique_elements(list(range(1000))) == 1000"
235 |    ]
236 |   },
237 |   {
238 |    "cell_type": "code",
239 |    "execution_count": null,
240 |    "id": "2a2a2363",
241 |    "metadata": {},
242 |    "outputs": [],
243 |    "source": [
244 |     "def counter(input_list):\n",
245 |     "    \"\"\"\n",
246 |     "    Counts the occurrences of each element in the list.\n",
247 |     "    Returns a dict of the form {element: number of occurrences}.\n",
248 |     "    \n",
249 |     "    Note (!): using Counter from collections is not allowed.\n",
250 |     "    \n",
251 |     "    Example:\n",
252 |     "    counter([1, 1, 2, 3]) returns {1: 2, 2: 1, 3: 1}\n",
253 |     "    \"\"\"\n",
254 |     "    # TODO: implement me\n",
255 |     "    pass\n",
256 |     "\n",
257 |     "\n",
258 |     "assert counter([1, 1, 1, 2, 3]) == {1: 3, 2: 1, 3: 1}\n",
259 |     "assert counter([1] * 1000) == {1: 1000}\n",
260 |     "assert counter([1, 3, 5] * 100) == {1: 100, 3: 100, 5: 100}"
261 |    ]
262 |   },
263 |   {
264 |    "cell_type": "code",
265 |    "execution_count": null,
266 |    "id": "c1498018",
267 |    "metadata": {},
268 |    "outputs": [],
269 |    "source": [
270 |     "def multiply_nums(input_string):\n",
271 |     "    \"\"\"\n",
272 |     "    Multiplies together the comma-separated numbers passed in a string.\n",
273 |     "    \n",
274 |     "    Hint: you can use the .split() method.\n",
275 |     "    \"\"\"\n",
276 |     "    # TODO: implement me\n",
277 |     "    pass\n",
278 |     "\n",
279 |     "\n",
280 |     "assert multiply_nums(\"2, 3\") == 6\n",
281 |     "assert multiply_nums(\"1, 1, 1, 1, 1, 1, 1\") == 1\n",
282 |     "assert multiply_nums(\"345, 4576, 794, 325, 0\") == 0"
283 |    ]
284 |   },
285 |   {
286 |    "cell_type": "markdown",
287 |    "id": "202c838a",
288 |    "metadata": {},
289 |    "source": [
290 |     "Implement the following function:\n",
291 |     "$$\n",
292 |     "y = \\sin(x) \\cdot \\cos(x)\n",
293 |     "$$\n",
294 |     "P.S. Use the `math` library."
295 |    ]
296 |   },
297 |   {
298 |    "cell_type": "code",
299 |    "execution_count": null,
300 |    "id": "4121596c",
301 |    "metadata": {},
302 |    "outputs": [],
303 |    "source": [
304 |     "def custom_function(x):\n",
305 |     "    \"\"\"\n",
306 |     "    Implement the function described above.\n",
307 |     "    \"\"\"\n",
308 |     "    # TODO: implement me\n",
309 |     "    pass\n",
310 |     "\n",
311 |     "assert round(custom_function(1), 3) == 0.455\n",
312 |     "assert round(custom_function(1.5), 3) == 0.071\n",
313 |     "assert round(custom_function(2), 3) == -0.378\n",
314 |     "assert custom_function(0) == 0"
315 |    ]
316 |   },
317 |   {
318 |    "cell_type": "markdown",
319 |    "id": "7a2210d6",
320 |    "metadata": {},
321 |    "source": [
322 |     "Implement the following function:\n",
323 |     "$$\n",
324 |     "y = \\prod\\limits_{i = 1}^{n} \\frac{(i+2)^x + \\ln(x)}{x^2 + 4i}\n",
325 |     "$$\n",
326 |     "P.S. Use the `math` library."
327 |    ]
328 |   },
329 |   {
330 |    "cell_type": "code",
331 |    "execution_count": null,
332 |    "id": "b4698ba6",
333 |    "metadata": {},
334 |    "outputs": [],
335 |    "source": [
336 |     "def custom_function_1(x, n):\n",
337 |     "    \"\"\"\n",
338 |     "    Implement the function described above.\n",
339 |     "    \"\"\"\n",
340 |     "    # TODO: implement me\n",
341 |     "    pass\n",
342 |     "\n",
343 |     "\n",
344 |     "assert round(custom_function_1(2, 3), 3) == 2.707\n",
345 |     "assert round(custom_function_1(3, 2), 3) == 8.277\n",
346 |     "assert round(custom_function_1(3, 3), 3) == 49.7"
347 |    ]
348 |   },
349 |   {
350 |    "cell_type": "markdown",
351 |    "id": "79409870",
352 |    "metadata": {},
353 |    "source": [
354 |     "# 2.2 Function-writing exercises"
355 |    ]
356 |   },
357 |   {
358 |    "cell_type": "markdown",
359 |    "id": "b1b313b9",
360 |    "metadata": {},
361 |    "source": [
362 |     "1. You are given a number n: n minutes have passed since midnight. Determine the hours and minutes a digital clock shows at that moment. The result consists of two numbers: the hours (from 0 to 23) and the minutes (from 0 to 59). Keep in mind that n may be larger than the number of minutes in a day."
363 |    ]
364 |   },
365 |   {
366 |    "cell_type": "code",
367 |    "execution_count": 12,
368 |    "id": "42562c52",
369 |    "metadata": {},
370 |    "outputs": [],
371 |    "source": [
372 |     "# Your task here is to write a function named time_converter (exactly this name).\n",
373 |     "# It must return the result as a string of the form 'hours minutes'.\n",
374 |     "\n",
375 |     "\n",
376 |     "assert time_converter(2782) == '22 22'\n",
377 |     "assert time_converter(4733) == '6 53'\n",
378 |     "assert time_converter(1766) == '5 26'\n",
379 |     "assert time_converter(3865) == '16 25'\n",
380 |     "assert time_converter(4628) == '5 8'\n",
381 |     "assert time_converter(4353) == '0 33'\n",
382 |     "assert time_converter(268) == '4 28'\n",
383 |     "assert time_converter(4373) == '0 53'\n",
384 |     "assert time_converter(2722) == '21 22'\n",
385 |     "assert time_converter(1531) == '1 31'"
386 |    ]
387 |   },
388 |   {
389 |    "cell_type": "markdown",
390 |    "id": "824cbdcc",
391 |    "metadata": {},
392 |    "source": [
393 |     "---\n",
394 |     "2. Write a function that returns the minimum of three numbers. Using `min` is not allowed; use only `if`."
395 |    ]
396 |   },
397 |   {
398 |    "cell_type": "code",
399 |    "execution_count": null,
400 |    "id": "23a5e2ef",
401 |    "metadata": {},
402 |    "outputs": [],
403 |    "source": [
404 |     "# Write a function min_of_three_values that takes 3 arguments\n",
405 |     "\n",
406 |     "\n",
407 |     "assert min_of_three_values(1, 2, 3) == 1\n",
408 |     "assert min_of_three_values(1, 1, 9) == 1\n",
409 |     "assert min_of_three_values(18, 7, 11) == 7\n",
410 |     "assert min_of_three_values(2, 10, 10) == 2\n",
411 |     "assert min_of_three_values(17, 14, 17) == 14\n",
412 |     "assert min_of_three_values(9, 2, 10) == 2\n",
413 |     "assert min_of_three_values(7, 4, 7) == 4\n",
414 |     "assert min_of_three_values(0, 8, 3) == 0\n",
415 |     "assert min_of_three_values(8, 10, 6) == 6\n",
416 |     "assert min_of_three_values(1, 4, 8) == 1"
417 |    ]
418 |   },
419 |   {
420 |    "cell_type": "markdown",
421 |    "id": "414dc710",
422 |    "metadata": {},
423 |    "source": [
424 |     "---\n",
425 |     "3. Remove a character from a string. You must use the built-in `.replace()` method."
426 |    ]
427 |   },
428 |   {
429 |    "cell_type": "code",
430 |    "execution_count": 20,
431 |    "id": "3a131bb4",
432 |    "metadata": {},
433 |    "outputs": [],
434 |    "source": [
435 |     "# The function remove_symbol takes two arguments: a string and the character to remove from it\n",
436 |     "\n",
437 |     "\n",
438 |     "assert remove_symbol('aaaaaaaa', 'a') == ''\n",
439 |     "assert remove_symbol('abababa', 'b') == 'aaaa'\n",
440 |     "assert remove_symbol('12341234', '3') == '124124'"
441 |    ]
442 |   },
443 |   {
444 |    "cell_type": "markdown",
445 |    "id": "b6881076",
446 |    "metadata": {},
447 |    "source": [
448 |     "---\n",
449 |     "4. Remove every third character from a string"
450 |    ]
451 |   },
452 |   {
453 |    "cell_type": "code",
454 |    "execution_count": null,
455 |    "id": "52279032",
456 |    "metadata": {},
457 |    "outputs": [],
458 |    "source": [
459 |     "# The function remove_each_third_sym takes one argument: a string.\n",
460 |     "# Important: positions are counted the human way (starting from 1, not 0).\n",
461 |     "# Important: the first character is kept.\n",
462 |     "# The function must return a new string.\n",
463 |     "\n",
464 |     "\n",
465 |     "\n",
466 |     "assert remove_each_third_sym('abcdef') == 'abde'\n",
467 |     "assert remove_each_third_sym('sdfasdfasdfsfa') == 'sdasfadffa'\n",
468 |     "assert remove_each_third_sym('123456789') == '124578'\n",
469 |     "assert remove_each_third_sym('987654321') == '986532'"
470 |    ]
471 |   },
472 |   {
473 |    "cell_type": "markdown",
474 |    "id": "7c72f3ac",
475 |    "metadata": {},
476 |    "source": [
477 |     "--- \n",
478 |     "5. Return the maximum element of a list and its index"
479 |    ]
480 |   },
481 |   {
482 |    "cell_type": "code",
483 |    "execution_count": 54,
484 |    "id": "789f7d5c",
485 |    "metadata": {},
486 |    "outputs": [],
487 |    "source": [
488 |     "# The function find_max takes a list as input\n",
489 |     "# and returns two numbers: the maximum value itself and its index\n",
490 |     "\n",
491 |     "\n",
492 |     "assert find_max([1, 2, 3, 4, 5]) == (5, 4)\n",
493 |     "assert find_max([5, 4, 3, 2, 1]) == (5, 0)\n",
494 |     "assert find_max([96, 82, 72, 48, 93, 88, 79]) == (96, 0)\n",
495 |     "assert find_max([49, 75, 65, 65, 65, 18]) == (75, 1)\n",
496 |     "assert find_max([69, 16, 64, 54, 36, 70, 89, 29]) == (89, 6)\n",
497 |     "assert find_max([17, 80, 27, 36, 21, 85, 63, 27]) == (85, 5)\n",
498 |     "assert find_max([76, 27, 73, 65, 52]) == (76, 0)\n",
499 |     "assert find_max([33, 26, 69, 40, 93]) == (93, 4)\n",
500 |     "assert find_max([87, 5, 95, 52, 21, 76, 22]) == (95, 2)\n",
501 |     "assert find_max([75, 18, 89, 99, 70]) == (99, 3)"
502 |    ]
503 |   },
504 |   {
505 |    "cell_type": "markdown",
506 |    "id": "a8cccf57",
507 |    "metadata": {},
508 |    "source": [
509 |     "---\n",
510 |     "6. Append an element to the end of a list. Using the `append` method is allowed and expected."
511 |    ]
512 |   },
513 |   {
514 |    "cell_type": "code",
515 |    "execution_count": 66,
516 |    "id": "2c127a30",
517 |    "metadata": {},
518 |    "outputs": [],
519 |    "source": [
520 |     "# The function append_to_list takes two arguments: a list and the value to append to the end of it\n",
521 |     "# The function returns the updated list\n",
522 |     "\n",
523 |     "\n",
524 |     "assert append_to_list([1, 2], 3) == [1, 2, 3]\n",
525 |     "assert append_to_list([1, 2], None) == [1, 2, None]\n",
526 |     "assert append_to_list([1, 's'], True) == [1, 's', True]"
527 |    ]
528 |   },
529 |   {
530 |    "cell_type": "markdown",
531 |    "id": "92d53320",
532 |    "metadata": {},
533 |    "source": [
534 |     "---\n",
535 |     "7. Count the unique numbers in a list. You must use sets (`set`)."
536 |    ]
537 |   },
538 |   {
539 |    "cell_type": "code",
540 |    "execution_count": null,
541 |    "id": "e0d01126",
542 |    "metadata": {},
543 |    "outputs": [],
544 |    "source": [
545 |     "# The function number_unique_elements takes a list as input\n",
546 |     "# and returns a single number: the count of unique elements\n",
547 |     "\n",
548 |     "\n",
549 |     "assert number_unique_elements([1, 2, 3]) == 3\n",
550 |     "assert number_unique_elements([1, 2, 1]) == 2\n",
551 |     "assert number_unique_elements([1, 1, 1, 1]) == 1\n",
552 |     "assert number_unique_elements([1, 2, 1, 2]) == 2"
553 |    ]
554 |   },
555 |   {
556 |    "cell_type": "markdown",
557 |    "id": "8c708395",
558 |    "metadata": {},
559 |    "source": [
560 |     "# 3. Classes"
561 |    ]
562 |   },
563 |   {
564 |    "cell_type": "markdown",
565 |    "id": "1a11fb20",
566 |    "metadata": {},
567 |    "source": [
568 |     "Write a class Vehicle with two attributes: maximum speed (max_speed) and mileage (mileage).  \n",
569 |     "Only the constructor method `__init__` needs to be implemented."
570 |    ]
571 |   },
572 |   {
573 |    "cell_type": "code",
574 |    "execution_count": 2,
575 |    "id": "5fc4379c",
576 |    "metadata": {},
577 |    "outputs": [],
578 |    "source": [
579 |     "class Vehicle:\n",
580 |     "    # TODO\n",
581 |     "    pass\n",
582 |     "\n",
583 |     "\n",
584 |     "veh1 = Vehicle(100, 50)\n",
585 |     "assert (veh1.max_speed, veh1.mileage) == (100, 50)\n",
586 |     "\n",
587 |     "veh2 = Vehicle(200, 3)\n",
588 |     "assert (veh2.max_speed, veh2.mileage) == (200, 3)"
589 |    ]
590 |   },
591 |   {
592 |    "cell_type": "markdown",
593 |    "id": "ea4551bf",
594 |    "metadata": {},
595 |    "source": [
596 |     "Write a class Truck that inherits from Vehicle.  \n",
597 |     "P.S. Inheritance is expressed with the construct `class Subclass(MotherClass):`"
598 |    ]
599 |   },
600 |   {
601 |    "cell_type": "code",
602 |    "execution_count": null,
603 |    "id": "2ea8d59d",
604 |    "metadata": {},
605 |    "outputs": [],
606 |    "source": [
607 |     "class Truck:\n",
608 |     "    # TODO\n",
609 |     "    pass\n",
610 |     "\n",
611 |     "\n",
612 |     "truck1 = Truck(50, 1000)\n",
613 |     "assert (truck1.max_speed, truck1.mileage) == (50, 1000)\n",
614 |     "\n",
615 |     "truck2 = Truck(43, 235)\n",
616 |     "assert (truck2.max_speed, truck2.mileage) == (43, 235)"
617 |    ]
618 |   },
619 |   {
620 |    "cell_type": "markdown",
621 |    "id": "5f93b111",
622 |    "metadata": {},
623 |    "source": [
624 |     "Write a class MyList with the functionality described below."
625 |    ]
626 |   },
627 |   {
628 |    "cell_type": "code",
629 |    "execution_count": null,
630 |    "id": "b9a4db60",
631 |    "metadata": {},
632 |    "outputs": [],
633 |    "source": [
634 |     "class MyList:\n",
635 |     "    def __init__(self):\n",
636 |     "        # TODO: implement me\n",
637 |     "        pass\n",
638 |     "    \n",
639 |     "    def return_sum(self):\n",
640 |     "        \"\"\"\n",
641 |     "        Returns the sum of all elements of the stored list.\n",
642 |     "        Using sum is not allowed!\n",
643 |     "        \"\"\"\n",
644 |     "        # TODO: implement me\n",
645 |     "        pass\n",
646 |     "    \n",
647 |     "    def make_reverse(self):\n",
648 |     "        \"\"\"\n",
649 |     "        Returns the stored list in reverse order.\n",
650 |     "        \"\"\"\n",
651 |     "        # TODO: implement me\n",
652 |     "        pass\n",
653 |     "    \n",
654 |     "    def make_slice(self, start, stop):\n",
655 |     "        \"\"\"\n",
656 |     "        Returns the slice [start:stop] of the stored list.\n",
657 |     "        \"\"\"\n",
658 |     "        # TODO: implement me\n",
659 |     "        pass\n",
660 |     "    \n",
661 |     "    \n",
662 |     "a = MyList([1, 2, 3, 4])\n",
663 |     "assert a.return_sum() == 10\n",
664 |     "assert a.make_reverse() == [4, 3, 2, 1]\n",
665 |     "assert a.make_slice(0, 2) == [1, 2]\n",
666 |     "\n",
667 |     "b = MyList([5, 6, 6, 5])\n",
668 |     "assert b.return_sum() == 22\n",
669 |     "assert b.make_reverse() == [5, 6, 6, 5]\n",
670 |     "assert b.make_slice(1, 2) == [6]"
671 |    ]
672 |   },
673 |   {
674 |    "cell_type": "code",
675 |    "execution_count": null,
676 |    "id": "aa5dc08b",
677 |    "metadata": {},
678 |    "outputs": [],
679 |    "source": []
680 |   }
681 |  ],
682 |  "metadata": {
683 |   "kernelspec": {
684 |    "display_name": "Python 3 (ipykernel)",
685 |    "language": "python",
686 |    "name": "python3"
687 |   },
688 |   "language_info": {
689 |    "codemirror_mode": {
690 |     "name": "ipython",
691 |     "version": 3
692 |    },
693 |    "file_extension": ".py",
694 |    "mimetype": "text/x-python",
695 |    "name": "python",
696 |    "nbconvert_exporter": "python",
697 |    "pygments_lexer": "ipython3",
698 |    "version": "3.9.13"
699 |   }
700 |  },
701 |  "nbformat": 4,
702 |  "nbformat_minor": 5
703 | }
704 | 


--------------------------------------------------------------------------------
/notebooks/seminars/02-python-libs.ipynb:
--------------------------------------------------------------------------------
   1 | {
   2 |  "cells": [
   3 |   {
   4 |    "cell_type": "markdown",
   5 |    "id": "e60bae95",
   6 |    "metadata": {},
   7 |    "source": [
   8 |     "# Libraries"
   9 |    ]
  10 |   },
  11 |   {
  12 |    "cell_type": "code",
  13 |    "execution_count": null,
  14 |    "id": "c310ce9d",
  15 |    "metadata": {},
  16 |    "outputs": [],
  17 |    "source": [
  18 |     "import math \n",
  19 |     "import os"
  20 |    ]
  21 |   },
  22 |   {
  23 |    "cell_type": "markdown",
  24 |    "id": "a2b43ec1",
  25 |    "metadata": {},
  26 |    "source": [
  27 |     "# 1. Numpy "
  28 |    ]
  29 |   },
  30 |   {
  31 |    "cell_type": "markdown",
  32 |    "id": "b93bcd0d",
  33 |    "metadata": {},
  34 |    "source": [
  35 |     "NumPy (https://numpy.org) is a library for working with arrays.  \n",
  36 |     "We work through the examples from https://numpy.org/doc/stable/user/quickstart.html#the-basics"
  37 |    ]
  38 |   },
  39 |   {
  40 |    "cell_type": "code",
  41 |    "execution_count": null,
  42 |    "id": "8e31d506",
  43 |    "metadata": {},
  44 |    "outputs": [],
  45 |    "source": [
  46 |     "import numpy as np\n",
  47 |     "\n",
  48 |     "\n",
  49 |     "a = np.arange(15)\n",
  50 |     "a"
  51 |    ]
  52 |   },
  53 |   {
  54 |    "cell_type": "code",
  55 |    "execution_count": null,
  56 |    "id": "4d115b9a",
  57 |    "metadata": {},
  58 |    "outputs": [],
  59 |    "source": [
  60 |     "a.shape"
  61 |    ]
  62 |   },
  63 |   {
  64 |    "cell_type": "code",
  65 |    "execution_count": null,
  66 |    "id": "bf2bcfa1",
  67 |    "metadata": {},
  68 |    "outputs": [],
  69 |    "source": [
  70 |     "type(a)"
  71 |    ]
  72 |   },
  73 |   {
  74 |    "cell_type": "code",
  75 |    "execution_count": null,
  76 |    "id": "ddd6c6c3",
  77 |    "metadata": {},
  78 |    "outputs": [],
  79 |    "source": [
  80 |     "a = a.reshape(3, 5)\n",
  81 |     "a"
  82 |    ]
  83 |   },
  84 |   {
  85 |    "cell_type": "code",
  86 |    "execution_count": null,
  87 |    "id": "2460e6cd",
  88 |    "metadata": {},
  89 |    "outputs": [],
  90 |    "source": [
  91 |     "a.shape"
  92 |    ]
  93 |   },
  94 |   {
  95 |    "cell_type": "code",
  96 |    "execution_count": null,
  97 |    "id": "220643a2",
  98 |    "metadata": {},
  99 |    "outputs": [],
 100 |    "source": [
 101 |     "a.dtype"
 102 |    ]
 103 |   },
 104 |   {
 105 |    "cell_type": "code",
 106 |    "execution_count": null,
 107 |    "id": "68ec0297",
 108 |    "metadata": {},
 109 |    "outputs": [],
 110 |    "source": [
 111 |     "type(a)"
 112 |    ]
 113 |   },
 114 |   {
 115 |    "cell_type": "code",
 116 |    "execution_count": null,
 117 |    "id": "9dac9538",
 118 |    "metadata": {},
 119 |    "outputs": [],
 120 |    "source": [
 121 |     "a = np.array([2, 3, 4])\n",
 122 |     "a"
 123 |    ]
 124 |   },
 125 |   {
 126 |    "cell_type": "code",
 127 |    "execution_count": null,
 128 |    "id": "53754763",
 129 |    "metadata": {},
 130 |    "outputs": [],
 131 |    "source": [
 132 |     "a.dtype"
 133 |    ]
 134 |   },
 135 |   {
 136 |    "cell_type": "code",
 137 |    "execution_count": null,
 138 |    "id": "25babeb3",
 139 |    "metadata": {},
 140 |    "outputs": [],
 141 |    "source": [
 142 |     "b = np.array([1.2, 3.5, 5.1])\n",
 143 |     "b.dtype"
 144 |    ]
 145 |   },
 146 |   {
 147 |    "cell_type": "code",
 148 |    "execution_count": null,
 149 |    "id": "4297d976",
 150 |    "metadata": {},
 151 |    "outputs": [],
 152 |    "source": [
 153 |     "b = np.array([(1.5, 2, 3), (4, 5, 6)])\n",
 154 |     "b"
 155 |    ]
 156 |   },
 157 |   {
 158 |    "cell_type": "code",
 159 |    "execution_count": null,
 160 |    "id": "5745b04c",
 161 |    "metadata": {},
 162 |    "outputs": [],
 163 |    "source": [
 164 |     "np.zeros((3, 4))"
 165 |    ]
 166 |   },
 167 |   {
 168 |    "cell_type": "code",
 169 |    "execution_count": null,
 170 |    "id": "e394a1de",
 171 |    "metadata": {},
 172 |    "outputs": [],
 173 |    "source": [
 174 |     "np.ones((2, 3, 4), dtype=np.int16)"
 175 |    ]
 176 |   },
 177 |   {
 178 |    "cell_type": "code",
 179 |    "execution_count": null,
 180 |    "id": "8b642d5a",
 181 |    "metadata": {},
 182 |    "outputs": [],
 183 |    "source": [
 184 |     "np.arange(10, 30, 5)"
 185 |    ]
 186 |   },
 187 |   {
 188 |    "cell_type": "code",
 189 |    "execution_count": null,
 190 |    "id": "15b61e06",
 191 |    "metadata": {},
 192 |    "outputs": [],
 193 |    "source": [
 194 |     "np.arange(0, 2, 0.3)"
 195 |    ]
 196 |   },
 197 |   {
 198 |    "cell_type": "markdown",
 199 |    "id": "f6473aca",
 200 |    "metadata": {},
 201 |    "source": [
 202 |     "### Operations"
 203 |    ]
 204 |   },
 205 |   {
 206 |    "cell_type": "code",
 207 |    "execution_count": null,
 208 |    "id": "c2c52536",
 209 |    "metadata": {},
 210 |    "outputs": [],
 211 |    "source": [
 212 |     "a = np.array([20, 30, 40, 50])\n",
 213 |     "b = np.arange(4)\n",
 214 |     "print(a)\n",
 215 |     "print(b)"
 216 |    ]
 217 |   },
 218 |   {
 219 |    "cell_type": "code",
 220 |    "execution_count": null,
 221 |    "id": "1a8ec8d4",
 222 |    "metadata": {},
 223 |    "outputs": [],
 224 |    "source": [
 225 |     "c = a - b\n",
 226 |     "c"
 227 |    ]
 228 |   },
 229 |   {
 230 |    "cell_type": "code",
 231 |    "execution_count": null,
 232 |    "id": "08104d1b",
 233 |    "metadata": {},
 234 |    "outputs": [],
 235 |    "source": [
 236 |     "b ** 2"
 237 |    ]
 238 |   },
 239 |   {
 240 |    "cell_type": "code",
 241 |    "execution_count": null,
 242 |    "id": "640af5c6",
 243 |    "metadata": {},
 244 |    "outputs": [],
 245 |    "source": [
 246 |     "b * 10"
 247 |    ]
 248 |   },
 249 |   {
 250 |    "cell_type": "code",
 251 |    "execution_count": null,
 252 |    "id": "d6a54e00",
 253 |    "metadata": {},
 254 |    "outputs": [],
 255 |    "source": [
 256 |     "10 * np.sin(a)"
 257 |    ]
 258 |   },
 259 |   {
 260 |    "cell_type": "code",
 261 |    "execution_count": null,
 262 |    "id": "b92ed553",
 263 |    "metadata": {},
 264 |    "outputs": [],
 265 |    "source": [
 266 |     "a < 35"
 267 |    ]
 268 |   },
 269 |   {
 270 |    "cell_type": "markdown",
 271 |    "id": "d8cf6b45",
 272 |    "metadata": {},
 273 |    "source": [
 274 |     "# Micro ML Intro"
 275 |    ]
 276 |   },
 277 |   {
 278 |    "cell_type": "markdown",
 279 |    "id": "79a961a0",
 280 |    "metadata": {},
 281 |    "source": [
 282 |     "**`Dataset`** -- a collection of data on which a machine learning / deep learning model is trained.  \n",
 283 |     "\n",
 284 |     "A dataset can be split into 2 parts: `X` and `y`   \n",
 285 |     "**`X`** -- the set of features for each object   \n",
 286 |     "**`y`** -- the target variable for each object   \n",
 287 |     "  \n",
 288 |     "Datasets can be found in many places, for example on Kaggle (which hosts about 300k of them) -- https://www.kaggle.com/datasets  \n",
 289 |     "\n",
 290 |     "---\n",
 291 |     "Images are taken from https://builtin.com/data-science/train-test-split"
 292 |    ]
 293 |   },
 294 |   {
 295 |    "cell_type": "markdown",
 296 |    "id": "16593847",
 297 |    "metadata": {},
 298 |    "source": [
 299 |     "![](https://builtin.com/sites/www.builtin.com/files/styles/ckeditor_optimize/public/inline-images/2_train-test-split.jpg)"
 300 |    ]
 301 |   },
 302 |   {
 303 |    "cell_type": "markdown",
 304 |    "id": "2c652b90",
 305 |    "metadata": {},
 306 |    "source": [
 307 |     "### Example datasets:\n",
 308 |     "1. Titanic (https://www.kaggle.com/competitions/titanic/data) <br>\n",
 309 |     "   `X` -- a set of features for each passenger (sex, age, ticket class, etc.)  <br>\n",
 310 |     "   `y` -- a flag indicating whether the passenger survived the sinking of the Titanic   <br>\n",
 311 |     "---\n",
 312 |     "2. Data about games on the IGN portal (https://www.kaggle.com/datasets/kapturovalexander/ign-games-from-best-to-worst)  <br>\n",
 313 |     "   `X` -- a set of features: title, release year, genre   \n",
 314 |     "   `y` -- critic score, player score\n",
 315 |     "---\n",
 316 |     "3. Skin type classification from a photo (https://www.kaggle.com/datasets/shakyadissanayake/oily-dry-and-normal-skin-types-dataset)  <br>\n",
 317 |     "   `X` -- photos of skin   \n",
 318 |     "   `y` -- skin type\n",
 319 |     "---\n",
 320 |     "4. A dataset of 50k songs from Spotify (https://www.kaggle.com/datasets/joebeachcapital/57651-spotify-songs)  <br>\n",
 321 |     "   `X` -- artist, song title, song lyrics   \n",
 322 |     "   `y` -- depends on the task. For example, predicting the artist from the song lyrics, or generating lyrics in the style of a particular artist."
 323 |    ]
 324 |   },
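     |   {
     |    "cell_type": "markdown",
     |    "id": "sketch-xy-md",
     |    "metadata": {},
     |    "source": [
     |     "A minimal sketch of splitting a table into `X` and `y`, as described above. The table `toy` and all of its values are made up for illustration (loosely modelled on the Titanic example); they are not rows from the real dataset, and the column names are assumptions."
     |    ]
     |   },
     |   {
     |    "cell_type": "code",
     |    "execution_count": null,
     |    "id": "sketch-xy-code",
     |    "metadata": {},
     |    "outputs": [],
     |    "source": [
     |     "import pandas as pd\n",
     |     "\n",
     |     "# Hypothetical toy table: made-up values, not rows from the real Titanic data\n",
     |     "toy = pd.DataFrame(\n",
     |     "    {\n",
     |     "        \"sex\": [\"male\", \"female\", \"female\", \"male\"],\n",
     |     "        \"age\": [22, 38, 26, 35],\n",
     |     "        \"pclass\": [3, 1, 3, 1],\n",
     |     "        \"survived\": [0, 1, 1, 0],\n",
     |     "    }\n",
     |     ")\n",
     |     "\n",
     |     "X = toy.drop(columns=[\"survived\"])  # features: one row per object\n",
     |     "y = toy[\"survived\"]  # target variable: one value per object\n",
     |     "X.shape, y.shape"
     |    ]
     |   },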
 325 |   {
 326 |    "cell_type": "markdown",
 327 |    "id": "ba3bc012",
 328 |    "metadata": {},
 329 |    "source": [
 330 |     "![](https://builtin.com/sites/www.builtin.com/files/styles/ckeditor_optimize/public/inline-images/4_train-test-split.jpg)  \n",
 331 |     "---"
 332 |    ]
 333 |   },
 334 |   {
 335 |    "cell_type": "markdown",
 336 |    "id": "38eacdf8",
 337 |    "metadata": {},
 338 |    "source": [
 339 |     "## Example\n",
 340 |     "![](https://builtin.com/sites/www.builtin.com/files/styles/ckeditor_optimize/public/inline-images/5_train-test-split.jpg)"
 341 |    ]
 342 |   },
 343 |   {
 344 |    "cell_type": "markdown",
 345 |    "id": "e7535cfa",
 346 |    "metadata": {},
 347 |    "source": [
 348 |     "![](https://builtin.com/sites/www.builtin.com/files/styles/ckeditor_optimize/public/inline-images/6_train-test-split.jpg)"
 349 |    ]
 350 |   },
 351 |   {
 352 |    "cell_type": "markdown",
 353 |    "id": "c17b8c40",
 354 |    "metadata": {},
 355 |    "source": [
 356 |     "![](https://builtin.com/sites/www.builtin.com/files/styles/ckeditor_optimize/public/inline-images/7_train-test-split.jpg)"
 357 |    ]
 358 |   },
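     |   {
     |    "cell_type": "markdown",
     |    "id": "sketch-split-md",
     |    "metadata": {},
     |    "source": [
     |     "The images above come from an article on the train/test split. Below is a minimal sketch of that idea, assuming an 80/20 split and a made-up table `toy` (both are assumptions, not part of the original material). `DataFrame.sample` is used here as a simple stand-in; scikit-learn's `train_test_split` is a common alternative."
     |    ]
     |   },
     |   {
     |    "cell_type": "code",
     |    "execution_count": null,
     |    "id": "sketch-split-code",
     |    "metadata": {},
     |    "outputs": [],
     |    "source": [
     |     "import pandas as pd\n",
     |     "\n",
     |     "# Hypothetical toy table with made-up values\n",
     |     "toy = pd.DataFrame({\"feature\": range(10), \"target\": [0, 1] * 5})\n",
     |     "\n",
     |     "# Randomly pick 80% of the rows for training; the ratio is an assumption\n",
     |     "train = toy.sample(frac=0.8, random_state=42)\n",
     |     "# The remaining rows (those whose index is not in train) form the test set\n",
     |     "test = toy.drop(train.index)\n",
     |     "\n",
     |     "X_train, y_train = train[[\"feature\"]], train[\"target\"]\n",
     |     "X_test, y_test = test[[\"feature\"]], test[\"target\"]\n",
     |     "len(X_train), len(X_test)"
     |    ]
     |   },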
 359 |   {
 360 |    "cell_type": "markdown",
 361 |    "id": "08a6ba50",
 362 |    "metadata": {},
 363 |    "source": [
 364 |     "# 2. Pandas"
 365 |    ]
 366 |   },
 367 |   {
 368 |    "cell_type": "markdown",
 369 |    "id": "d3a63348",
 370 |    "metadata": {},
 371 |    "source": [
 372 |     "Pandas (https://pandas.pydata.org) -- a library for working with datasets.  \n",
 373 |     "An introduction to the library is available here -- https://pandas.pydata.org/docs/user_guide/10min.html"
 374 |    ]
 375 |   },
 376 |   {
 377 |    "cell_type": "code",
 378 |    "execution_count": null,
 379 |    "id": "d504be18",
 380 |    "metadata": {},
 381 |    "outputs": [],
 382 |    "source": [
 383 |     "import pandas as pd"
 384 |    ]
 385 |   },
 386 |   {
 387 |    "cell_type": "code",
 388 |    "execution_count": null,
 389 |    "id": "44ca2f5f",
 390 |    "metadata": {},
 391 |    "outputs": [],
 392 |    "source": [
 393 |     "s = pd.Series([1, 3, 5, np.nan, 6, 8])\n",
 394 |     "s"
 395 |    ]
 396 |   },
 397 |   {
 398 |    "cell_type": "code",
 399 |    "execution_count": null,
 400 |    "id": "ffec5692",
 401 |    "metadata": {},
 402 |    "outputs": [],
 403 |    "source": [
 404 |     "dates = pd.date_range(\"20130101\", periods=6)\n",
 405 |     "dates"
 406 |    ]
 407 |   },
 408 |   {
 409 |    "cell_type": "code",
 410 |    "execution_count": null,
 411 |    "id": "2882d00c",
 412 |    "metadata": {},
 413 |    "outputs": [],
 414 |    "source": [
 415 |     "df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list(\"ABCD\"))\n",
 416 |     "df"
 417 |    ]
 418 |   },
 419 |   {
 420 |    "cell_type": "code",
 421 |    "execution_count": null,
 422 |    "id": "67e1ff54",
 423 |    "metadata": {},
 424 |    "outputs": [],
 425 |    "source": [
 426 |     "df2 = pd.DataFrame(\n",
 427 |     "    {\n",
 428 |     "        \"A\": 1.0,\n",
 429 |     "        \"B\": pd.Timestamp(\"20130102\"),\n",
 430 |     "        \"C\": pd.Series(1, index=list(range(4)), dtype=\"float32\"),\n",
 431 |     "        \"D\": np.array([3] * 4, dtype=\"int32\"),\n",
 432 |     "        \"E\": pd.Categorical([\"test\", \"train\", \"test\", \"train\"]),\n",
 433 |     "        \"F\": \"foo\",\n",
 434 |     "    }\n",
 435 |     ")\n",
 436 |     "df2"
 437 |    ]
 438 |   },
 439 |   {
 440 |    "cell_type": "code",
 441 |    "execution_count": null,
 442 |    "id": "1469555e",
 443 |    "metadata": {},
 444 |    "outputs": [],
 445 |    "source": [
 446 |     "df2.dtypes"
 447 |    ]
 448 |   },
 449 |   {
 450 |    "cell_type": "code",
 451 |    "execution_count": null,
 452 |    "id": "95698aa8",
 453 |    "metadata": {},
 454 |    "outputs": [],
 455 |    "source": [
 456 |     "df.head()"
 457 |    ]
 458 |   },
 459 |   {
 460 |    "cell_type": "code",
 461 |    "execution_count": null,
 462 |    "id": "96e59a0c",
 463 |    "metadata": {},
 464 |    "outputs": [],
 465 |    "source": [
 466 |     "df.tail(3)"
 467 |    ]
 468 |   },
 469 |   {
 470 |    "cell_type": "code",
 471 |    "execution_count": null,
 472 |    "id": "9138a90b",
 473 |    "metadata": {},
 474 |    "outputs": [],
 475 |    "source": [
 476 |     "df.index"
 477 |    ]
 478 |   },
 479 |   {
 480 |    "cell_type": "code",
 481 |    "execution_count": null,
 482 |    "id": "316e81bb",
 483 |    "metadata": {},
 484 |    "outputs": [],
 485 |    "source": [
 486 |     "df.columns"
 487 |    ]
 488 |   },
 489 |   {
 490 |    "cell_type": "code",
 491 |    "execution_count": null,
 492 |    "id": "aa047d37",
 493 |    "metadata": {},
 494 |    "outputs": [],
 495 |    "source": [
 496 |     "df.to_numpy()"
 497 |    ]
 498 |   },
 499 |   {
 500 |    "cell_type": "code",
 501 |    "execution_count": null,
 502 |    "id": "85716865",
 503 |    "metadata": {},
 504 |    "outputs": [],
 505 |    "source": [
 506 |     "df.describe()"
 507 |    ]
 508 |   },
 509 |   {
 510 |    "cell_type": "code",
 511 |    "execution_count": null,
 512 |    "id": "baec6a63",
 513 |    "metadata": {},
 514 |    "outputs": [],
 515 |    "source": [
 516 |     "df.T"
 517 |    ]
 518 |   },
 519 |   {
 520 |    "cell_type": "code",
 521 |    "execution_count": null,
 522 |    "id": "8a2b7a14",
 523 |    "metadata": {},
 524 |    "outputs": [],
 525 |    "source": [
 526 |     "df.sort_index(axis=1, ascending=False)"
 527 |    ]
 528 |   },
 529 |   {
 530 |    "cell_type": "code",
 531 |    "execution_count": null,
 532 |    "id": "777416db",
 533 |    "metadata": {},
 534 |    "outputs": [],
 535 |    "source": [
 536 |     "df.sort_values(by=\"B\")"
 537 |    ]
 538 |   },
 539 |   {
 540 |    "cell_type": "code",
 541 |    "execution_count": null,
 542 |    "id": "64786844",
 543 |    "metadata": {},
 544 |    "outputs": [],
 545 |    "source": [
 546 |     "df[\"A\"]"
 547 |    ]
 548 |   },
 549 |   {
 550 |    "cell_type": "code",
 551 |    "execution_count": null,
 552 |    "id": "c0e66775",
 553 |    "metadata": {},
 554 |    "outputs": [],
 555 |    "source": [
 556 |     "df[0:3]"
 557 |    ]
 558 |   },
 559 |   {
 560 |    "cell_type": "code",
 561 |    "execution_count": null,
 562 |    "id": "17cd42b9",
 563 |    "metadata": {},
 564 |    "outputs": [],
 565 |    "source": [
 566 |     "df[\"20130102\":\"20130104\"]"
 567 |    ]
 568 |   },
 569 |   {
 570 |    "cell_type": "code",
 571 |    "execution_count": null,
 572 |    "id": "243d6301",
 573 |    "metadata": {},
 574 |    "outputs": [],
 575 |    "source": [
 576 |     "df.loc[dates[0]]"
 577 |    ]
 578 |   },
 579 |   {
 580 |    "cell_type": "code",
 581 |    "execution_count": null,
 582 |    "id": "86e019cd",
 583 |    "metadata": {},
 584 |    "outputs": [],
 585 |    "source": [
 586 |     "df.loc[:, [\"A\", \"B\"]]"
 587 |    ]
 588 |   },
 589 |   {
 590 |    "cell_type": "code",
 591 |    "execution_count": null,
 592 |    "id": "47acc1f7",
 593 |    "metadata": {},
 594 |    "outputs": [],
 595 |    "source": [
 596 |     "df.loc[\"20130102\":\"20130104\", [\"A\", \"B\"]]"
 597 |    ]
 598 |   },
 599 |   {
 600 |    "cell_type": "code",
 601 |    "execution_count": null,
 602 |    "id": "4f3b73e3",
 603 |    "metadata": {},
 604 |    "outputs": [],
 605 |    "source": [
 606 |     "df.loc[\"20130102\", [\"A\", \"B\"]]"
 607 |    ]
 608 |   },
 609 |   {
 610 |    "cell_type": "code",
 611 |    "execution_count": null,
 612 |    "id": "b539dbff",
 613 |    "metadata": {},
 614 |    "outputs": [],
 615 |    "source": [
 616 |     "df.loc[dates[0], \"A\"]"
 617 |    ]
 618 |   },
 619 |   {
 620 |    "cell_type": "code",
 621 |    "execution_count": null,
 622 |    "id": "515c57c7",
 623 |    "metadata": {},
 624 |    "outputs": [],
 625 |    "source": [
 626 |     "df.at[dates[0], \"A\"]"
 627 |    ]
 628 |   },
 629 |   {
 630 |    "cell_type": "code",
 631 |    "execution_count": null,
 632 |    "id": "c98a02a4",
 633 |    "metadata": {},
 634 |    "outputs": [],
 635 |    "source": [
 636 |     "df.iloc[3]"
 637 |    ]
 638 |   },
 639 |   {
 640 |    "cell_type": "code",
 641 |    "execution_count": null,
 642 |    "id": "65a13e7d",
 643 |    "metadata": {},
 644 |    "outputs": [],
 645 |    "source": [
 646 |     "df.iloc[3:5, 0:2]"
 647 |    ]
 648 |   },
 649 |   {
 650 |    "cell_type": "code",
 651 |    "execution_count": null,
 652 |    "id": "424bb0f9",
 653 |    "metadata": {},
 654 |    "outputs": [],
 655 |    "source": [
 656 |     "df.iloc[[1, 2, 4], [0, 2]]"
 657 |    ]
 658 |   },
 659 |   {
 660 |    "cell_type": "code",
 661 |    "execution_count": null,
 662 |    "id": "f9ad2217",
 663 |    "metadata": {},
 664 |    "outputs": [],
 665 |    "source": [
 666 |     "df.iloc[1:3, :]"
 667 |    ]
 668 |   },
 669 |   {
 670 |    "cell_type": "code",
 671 |    "execution_count": null,
 672 |    "id": "6b8a180c",
 673 |    "metadata": {},
 674 |    "outputs": [],
 675 |    "source": [
 676 |     "df.iloc[:, 1:3]"
 677 |    ]
 678 |   },
 679 |   {
 680 |    "cell_type": "code",
 681 |    "execution_count": null,
 682 |    "id": "b5c2e70d",
 683 |    "metadata": {},
 684 |    "outputs": [],
 685 |    "source": [
 686 |     "df.iloc[1, 1]"
 687 |    ]
 688 |   },
 689 |   {
 690 |    "cell_type": "code",
 691 |    "execution_count": null,
 692 |    "id": "98ffeac6",
 693 |    "metadata": {},
 694 |    "outputs": [],
 695 |    "source": [
 696 |     "%timeit df.iloc[1, 1]"
 697 |    ]
 698 |   },
 699 |   {
 700 |    "cell_type": "code",
 701 |    "execution_count": null,
 702 |    "id": "a8013f56",
 703 |    "metadata": {},
 704 |    "outputs": [],
 705 |    "source": [
 706 |     "df.iat[1, 1]"
 707 |    ]
 708 |   },
 709 |   {
 710 |    "cell_type": "code",
 711 |    "execution_count": null,
 712 |    "id": "1ebf9284",
 713 |    "metadata": {},
 714 |    "outputs": [],
 715 |    "source": [
 716 |     "%timeit df.iat[1, 1]"
 717 |    ]
 718 |   },
 719 |   {
 720 |    "cell_type": "code",
 721 |    "execution_count": null,
 722 |    "id": "72cceb89",
 723 |    "metadata": {},
 724 |    "outputs": [],
 725 |    "source": [
 726 |     "df[df[\"A\"] > 0]"
 727 |    ]
 728 |   },
 729 |   {
 730 |    "cell_type": "code",
 731 |    "execution_count": null,
 732 |    "id": "38c9b444",
 733 |    "metadata": {},
 734 |    "outputs": [],
 735 |    "source": [
 736 |     "df[df > 0]"
 737 |    ]
 738 |   },
 739 |   {
 740 |    "cell_type": "code",
 741 |    "execution_count": null,
 742 |    "id": "0b0c5b4f",
 743 |    "metadata": {},
 744 |    "outputs": [],
 745 |    "source": [
 746 |     "df2 = df.copy()"
 747 |    ]
 748 |   },
 749 |   {
 750 |    "cell_type": "code",
 751 |    "execution_count": null,
 752 |    "id": "cb3f27e2",
 753 |    "metadata": {},
 754 |    "outputs": [],
 755 |    "source": [
 756 |     "df2[\"E\"] = [\"one\", \"one\", \"two\", \"three\", \"four\", \"three\"]"
 757 |    ]
 758 |   },
 759 |   {
 760 |    "cell_type": "code",
 761 |    "execution_count": null,
 762 |    "id": "f3081a4f",
 763 |    "metadata": {},
 764 |    "outputs": [],
 765 |    "source": [
 766 |     "df2"
 767 |    ]
 768 |   },
 769 |   {
 770 |    "cell_type": "code",
 771 |    "execution_count": null,
 772 |    "id": "128fd735",
 773 |    "metadata": {},
 774 |    "outputs": [],
 775 |    "source": [
 776 |     "df2[df2[\"E\"].isin([\"two\", \"four\"])]"
 777 |    ]
 778 |   },
 779 |   {
 780 |    "cell_type": "code",
 781 |    "execution_count": null,
 782 |    "id": "92bcd1cb",
 783 |    "metadata": {},
 784 |    "outputs": [],
 785 |    "source": [
 786 |     "s1 = pd.Series([1, 2, 3, 4, 5, 6], index=pd.date_range(\"20130102\", periods=6))\n",
 787 |     "s1"
 788 |    ]
 789 |   },
 790 |   {
 791 |    "cell_type": "code",
 792 |    "execution_count": null,
 793 |    "id": "ff154561",
 794 |    "metadata": {},
 795 |    "outputs": [],
 796 |    "source": [
 797 |     "df[\"F\"] = s1"
 798 |    ]
 799 |   },
 800 |   {
 801 |    "cell_type": "code",
 802 |    "execution_count": null,
 803 |    "id": "799516ba",
 804 |    "metadata": {},
 805 |    "outputs": [],
 806 |    "source": [
 807 |     "df"
 808 |    ]
 809 |   },
 810 |   {
 811 |    "cell_type": "code",
 812 |    "execution_count": null,
 813 |    "id": "3cad2f7b",
 814 |    "metadata": {},
 815 |    "outputs": [],
 816 |    "source": [
 817 |     "df.at[dates[0], \"A\"] = 0"
 818 |    ]
 819 |   },
 820 |   {
 821 |    "cell_type": "code",
 822 |    "execution_count": null,
 823 |    "id": "39db1a86",
 824 |    "metadata": {},
 825 |    "outputs": [],
 826 |    "source": [
 827 |     "df"
 828 |    ]
 829 |   },
 830 |   {
 831 |    "cell_type": "code",
 832 |    "execution_count": null,
 833 |    "id": "200e51fd",
 834 |    "metadata": {},
 835 |    "outputs": [],
 836 |    "source": [
 837 |     "df.iat[0, 1] = 0"
 838 |    ]
 839 |   },
 840 |   {
 841 |    "cell_type": "code",
 842 |    "execution_count": null,
 843 |    "id": "27563656",
 844 |    "metadata": {},
 845 |    "outputs": [],
 846 |    "source": [
 847 |     "df"
 848 |    ]
 849 |   },
 850 |   {
 851 |    "cell_type": "code",
 852 |    "execution_count": null,
 853 |    "id": "dff1a5c2",
 854 |    "metadata": {},
 855 |    "outputs": [],
 856 |    "source": [
 857 |     "df.loc[:, \"D\"] = np.array([5] * len(df))\n",
 858 |     "df"
 859 |    ]
 860 |   },
 861 |   {
 862 |    "cell_type": "code",
 863 |    "execution_count": null,
 864 |    "id": "9223d073",
 865 |    "metadata": {},
 866 |    "outputs": [],
 867 |    "source": [
 868 |     "df2 = df.copy()"
 869 |    ]
 870 |   },
 871 |   {
 872 |    "cell_type": "code",
 873 |    "execution_count": null,
 874 |    "id": "7137549f",
 875 |    "metadata": {},
 876 |    "outputs": [],
 877 |    "source": [
 878 |     "df2[df2 > 0] = -df2"
 879 |    ]
 880 |   },
 881 |   {
 882 |    "cell_type": "code",
 883 |    "execution_count": null,
 884 |    "id": "38ebf7c9",
 885 |    "metadata": {},
 886 |    "outputs": [],
 887 |    "source": [
 888 |     "df2"
 889 |    ]
 890 |   },
 891 |   {
 892 |    "cell_type": "code",
 893 |    "execution_count": null,
 894 |    "id": "6e92f302",
 895 |    "metadata": {},
 896 |    "outputs": [],
 897 |    "source": [
 898 |     "df1 = df.reindex(index=dates[0:4], columns=list(df.columns) + [\"E\"])"
 899 |    ]
 900 |   },
 901 |   {
 902 |    "cell_type": "code",
 903 |    "execution_count": null,
 904 |    "id": "fc07699c",
 905 |    "metadata": {},
 906 |    "outputs": [],
 907 |    "source": [
 908 |     "df1"
 909 |    ]
 910 |   },
 911 |   {
 912 |    "cell_type": "code",
 913 |    "execution_count": null,
 914 |    "id": "c9be24a4",
 915 |    "metadata": {},
 916 |    "outputs": [],
 917 |    "source": [
 918 |     "df1.loc[dates[0] : dates[1], \"E\"] = 1"
 919 |    ]
 920 |   },
 921 |   {
 922 |    "cell_type": "code",
 923 |    "execution_count": null,
 924 |    "id": "b4a7e142",
 925 |    "metadata": {},
 926 |    "outputs": [],
 927 |    "source": [
 928 |     "df1"
 929 |    ]
 930 |   },
 931 |   {
 932 |    "cell_type": "code",
 933 |    "execution_count": null,
 934 |    "id": "a57f4792",
 935 |    "metadata": {},
 936 |    "outputs": [],
 937 |    "source": [
 938 |     "df1.dropna(how=\"any\")"
 939 |    ]
 940 |   },
 941 |   {
 942 |    "cell_type": "code",
 943 |    "execution_count": null,
 944 |    "id": "1049117a",
 945 |    "metadata": {},
 946 |    "outputs": [],
 947 |    "source": [
 948 |     "df1.fillna(value=5)"
 949 |    ]
 950 |   },
 951 |   {
 952 |    "cell_type": "code",
 953 |    "execution_count": null,
 954 |    "id": "f25fb757",
 955 |    "metadata": {},
 956 |    "outputs": [],
 957 |    "source": [
 958 |     "pd.isna(df1)"
 959 |    ]
 960 |   },
 961 |   {
 962 |    "cell_type": "markdown",
 963 |    "id": "05eef9f9",
 964 |    "metadata": {},
 965 |    "source": [
 966 |     "## Operations"
 967 |    ]
 968 |   },
 969 |   {
 970 |    "cell_type": "code",
 971 |    "execution_count": null,
 972 |    "id": "b89bcbd1",
 973 |    "metadata": {},
 974 |    "outputs": [],
 975 |    "source": [
 976 |     "df"
 977 |    ]
 978 |   },
 979 |   {
 980 |    "cell_type": "code",
 981 |    "execution_count": null,
 982 |    "id": "3bf1f0fd",
 983 |    "metadata": {},
 984 |    "outputs": [],
 985 |    "source": [
 986 |     "df.mean()"
 987 |    ]
 988 |   },
 989 |   {
 990 |    "cell_type": "code",
 991 |    "execution_count": null,
 992 |    "id": "b0c4c5b4",
 993 |    "metadata": {},
 994 |    "outputs": [],
 995 |    "source": [
 996 |     "df.mean(axis=1)"
 997 |    ]
 998 |   },
 999 |   {
1000 |    "cell_type": "code",
1001 |    "execution_count": null,
1002 |    "id": "f740babe",
1003 |    "metadata": {},
1004 |    "outputs": [],
1005 |    "source": [
1006 |     "dates"
1007 |    ]
1008 |   },
1009 |   {
1010 |    "cell_type": "code",
1011 |    "execution_count": null,
1012 |    "id": "ddbf9aca",
1013 |    "metadata": {},
1014 |    "outputs": [],
1015 |    "source": [
1016 |     "pd.Series([1, 3, 5, np.nan, 6, 8], index=dates)"
1017 |    ]
1018 |   },
1019 |   {
1020 |    "cell_type": "code",
1021 |    "execution_count": null,
1022 |    "id": "f73070b8",
1023 |    "metadata": {},
1024 |    "outputs": [],
1025 |    "source": [
1026 |     "pd.Series([1, 3, 5, np.nan, 6, 8], index=dates).values"
1027 |    ]
1028 |   },
1029 |   {
1030 |    "cell_type": "code",
1031 |    "execution_count": null,
1032 |    "id": "a60fd060",
1033 |    "metadata": {},
1034 |    "outputs": [],
1035 |    "source": [
1036 |     "dates"
1037 |    ]
1038 |   },
1039 |   {
1040 |    "cell_type": "code",
1041 |    "execution_count": null,
1042 |    "id": "2ffbfa73",
1043 |    "metadata": {},
1044 |    "outputs": [],
1045 |    "source": [
1046 |     "s = pd.Series([1, 3, 5, np.nan, 6, 8], index=dates).shift(2)\n",
1047 |     "s"
1048 |    ]
1049 |   },
1050 |   {
1051 |    "cell_type": "code",
1052 |    "execution_count": null,
1053 |    "id": "94493557",
1054 |    "metadata": {},
1055 |    "outputs": [],
1056 |    "source": [
1057 |     "df"
1058 |    ]
1059 |   },
1060 |   {
1061 |    "cell_type": "code",
1062 |    "execution_count": null,
1063 |    "id": "f8b01395",
1064 |    "metadata": {},
1065 |    "outputs": [],
1066 |    "source": [
1067 |     "df.index"
1068 |    ]
1069 |   },
1070 |   {
1071 |    "cell_type": "code",
1072 |    "execution_count": null,
1073 |    "id": "3617fc30",
1074 |    "metadata": {},
1075 |    "outputs": [],
1076 |    "source": [
1077 |     "s"
1078 |    ]
1079 |   },
1080 |   {
1081 |    "cell_type": "code",
1082 |    "execution_count": null,
1083 |    "id": "105ce0ce",
1084 |    "metadata": {},
1085 |    "outputs": [],
1086 |    "source": [
1087 |     "df.sub(s, axis=\"index\")"
1088 |    ]
1089 |   },
1090 |   {
1091 |    "cell_type": "code",
1092 |    "execution_count": null,
1093 |    "id": "e90abe3d",
1094 |    "metadata": {},
1095 |    "outputs": [],
1096 |    "source": [
1097 |     "df"
1098 |    ]
1099 |   },
1100 |   {
1101 |    "cell_type": "code",
1102 |    "execution_count": null,
1103 |    "id": "bcc4cdf5",
1104 |    "metadata": {},
1105 |    "outputs": [],
1106 |    "source": [
1107 |     "df.apply(np.cumsum)"
1108 |    ]
1109 |   },
1110 |   {
1111 |    "cell_type": "code",
1112 |    "execution_count": null,
1113 |    "id": "164ad202",
1114 |    "metadata": {},
1115 |    "outputs": [],
1116 |    "source": [
1117 |     "df.apply(np.cumsum, axis=1)"
1118 |    ]
1119 |   },
1120 |   {
1121 |    "cell_type": "code",
1122 |    "execution_count": null,
1123 |    "id": "280d90bf",
1124 |    "metadata": {},
1125 |    "outputs": [],
1126 |    "source": [
1127 |     "df.apply(lambda x: x.max() - x.min())"
1128 |    ]
1129 |   },
1130 |   {
1131 |    "cell_type": "code",
1132 |    "execution_count": null,
1133 |    "id": "478f95cf",
1134 |    "metadata": {},
1135 |    "outputs": [],
1136 |    "source": [
1137 |     "s = pd.Series(np.random.randint(0, 7, size=10))"
1138 |    ]
1139 |   },
1140 |   {
1141 |    "cell_type": "code",
1142 |    "execution_count": null,
1143 |    "id": "1ab998b1",
1144 |    "metadata": {},
1145 |    "outputs": [],
1146 |    "source": [
1147 |     "s"
1148 |    ]
1149 |   },
1150 |   {
1151 |    "cell_type": "code",
1152 |    "execution_count": null,
1153 |    "id": "085e4269",
1154 |    "metadata": {},
1155 |    "outputs": [],
1156 |    "source": [
1157 |     "s.value_counts()"
1158 |    ]
1159 |   },
1160 |   {
1161 |    "cell_type": "code",
1162 |    "execution_count": null,
1163 |    "id": "18d39be5",
1164 |    "metadata": {},
1165 |    "outputs": [],
1166 |    "source": [
1167 |     "s = pd.Series([\"A\", \"B\", \"C\", \"Aaba\", \"Baca\", np.nan, \"CABA\", \"dog\", \"cat\"])"
1168 |    ]
1169 |   },
1170 |   {
1171 |    "cell_type": "code",
1172 |    "execution_count": null,
1173 |    "id": "feff87c2",
1174 |    "metadata": {},
1175 |    "outputs": [],
1176 |    "source": [
1177 |     "s"
1178 |    ]
1179 |   },
1180 |   {
1181 |    "cell_type": "code",
1182 |    "execution_count": null,
1183 |    "id": "772a8d4f",
1184 |    "metadata": {},
1185 |    "outputs": [],
1186 |    "source": [
1187 |     "df['A']"
1188 |    ]
1189 |   },
1190 |   {
1191 |    "cell_type": "code",
1192 |    "execution_count": null,
1193 |    "id": "ba743edf",
1194 |    "metadata": {},
1195 |    "outputs": [],
1196 |    "source": [
1197 |     "s.str.lower()"
1198 |    ]
1199 |   },
1200 |   {
1201 |    "cell_type": "code",
1202 |    "execution_count": null,
1203 |    "id": "7d2394af",
1204 |    "metadata": {},
1205 |    "outputs": [],
1206 |    "source": [
1207 |     "s.str.upper()"
1208 |    ]
1209 |   },
1210 |   {
1211 |    "cell_type": "code",
1212 |    "execution_count": null,
1213 |    "id": "5524891d",
1214 |    "metadata": {},
1215 |    "outputs": [],
1216 |    "source": [
1217 |     "df.max()"
1218 |    ]
1219 |   },
1220 |   {
1221 |    "cell_type": "markdown",
1222 |    "id": "8a07abef",
1223 |    "metadata": {},
1224 |    "source": [
1225 |     "## Merge & Concat"
1226 |    ]
1227 |   },
1228 |   {
1229 |    "cell_type": "code",
1230 |    "execution_count": null,
1231 |    "id": "ffa04286",
1232 |    "metadata": {},
1233 |    "outputs": [],
1234 |    "source": [
1235 |     "df = pd.DataFrame(np.random.randn(10, 4))\n",
1236 |     "df"
1237 |    ]
1238 |   },
1239 |   {
1240 |    "cell_type": "code",
1241 |    "execution_count": null,
1242 |    "id": "31d8f50f",
1243 |    "metadata": {},
1244 |    "outputs": [],
1245 |    "source": [
1246 |     "pieces = [df[:3], df[3:7], df[7:]]"
1247 |    ]
1248 |   },
1249 |   {
1250 |    "cell_type": "code",
1251 |    "execution_count": null,
1252 |    "id": "39e96af3",
1253 |    "metadata": {},
1254 |    "outputs": [],
1255 |    "source": [
1256 |     "pieces[1]"
1257 |    ]
1258 |   },
1259 |   {
1260 |    "cell_type": "code",
1261 |    "execution_count": null,
1262 |    "id": "776a6901",
1263 |    "metadata": {},
1264 |    "outputs": [],
1265 |    "source": [
1266 |     "pieces[0]"
1267 |    ]
1268 |   },
1269 |   {
1270 |    "cell_type": "code",
1271 |    "execution_count": null,
1272 |    "id": "b744166b",
1273 |    "metadata": {},
1274 |    "outputs": [],
1275 |    "source": [
1276 |     "pieces[-1]"
1277 |    ]
1278 |   },
1279 |   {
1280 |    "cell_type": "code",
1281 |    "execution_count": null,
1282 |    "id": "8928c666",
1283 |    "metadata": {},
1284 |    "outputs": [],
1285 |    "source": [
1286 |     "pd.concat(pieces)"
1287 |    ]
1288 |   },
1289 |   {
1290 |    "cell_type": "code",
1291 |    "execution_count": null,
1292 |    "id": "b6ae5449",
1293 |    "metadata": {},
1294 |    "outputs": [],
1295 |    "source": [
1296 |     "pd.concat([df[:3], df[7:], df[3:7]], ignore_index=True)"
1297 |    ]
1298 |   },
1299 |   {
1300 |    "cell_type": "code",
1301 |    "execution_count": null,
1302 |    "id": "44a9bbc5",
1303 |    "metadata": {},
1304 |    "outputs": [],
1305 |    "source": [
1306 |     "left = pd.DataFrame({\"key\": [\"foo\", \"foo\"], \"lval\": [1, 2]})\n",
1307 |     "left"
1308 |    ]
1309 |   },
1310 |   {
1311 |    "cell_type": "code",
1312 |    "execution_count": null,
1313 |    "id": "cc52847c",
1314 |    "metadata": {},
1315 |    "outputs": [],
1316 |    "source": [
1317 |     "right = pd.DataFrame({\"key\": [\"foo\", \"foo\"], \"rval\": [4, 5]})\n",
1318 |     "right"
1319 |    ]
1320 |   },
1321 |   {
1322 |    "cell_type": "code",
1323 |    "execution_count": null,
1324 |    "id": "784cb8d7",
1325 |    "metadata": {},
1326 |    "outputs": [],
1327 |    "source": [
1328 |     "pd.merge(left, right, on=\"key\")"
1329 |    ]
1330 |   },
1331 |   {
1332 |    "cell_type": "code",
1333 |    "execution_count": null,
1334 |    "id": "c791cd31",
1335 |    "metadata": {},
1336 |    "outputs": [],
1337 |    "source": [
1338 |     "left = pd.DataFrame({\"key\": [\"foo\", \"bar\", \"ccc\"], \"lval\": [1, 2, 3]})\n",
1339 |     "left"
1340 |    ]
1341 |   },
1342 |   {
1343 |    "cell_type": "code",
1344 |    "execution_count": null,
1345 |    "id": "16055c93",
1346 |    "metadata": {},
1347 |    "outputs": [],
1348 |    "source": [
1349 |     "right = pd.DataFrame({\"key\": [\"foo\", \"bar\", \"foo\", \"bar\"], \"rval\": [4, 5, 8, 3]})\n",
1350 |     "right"
1351 |    ]
1352 |   },
1353 |   {
1354 |    "cell_type": "code",
1355 |    "execution_count": null,
1356 |    "id": "76aefea2",
1357 |    "metadata": {},
1358 |    "outputs": [],
1359 |    "source": [
1360 |     "pd.merge(left, right, on=\"key\")"
1361 |    ]
1362 |   },
1363 |   {
1364 |    "cell_type": "markdown",
1365 |    "id": "c652e699",
1366 |    "metadata": {},
1367 |    "source": [
1368 |     "## Grouping"
1369 |    ]
1370 |   },
1371 |   {
1372 |    "cell_type": "code",
1373 |    "execution_count": null,
1374 |    "id": "c8a8d51c",
1375 |    "metadata": {},
1376 |    "outputs": [],
1377 |    "source": [
1378 |     "df = pd.DataFrame(\n",
1379 |     "    {\n",
1380 |     "        \"A\": [\"foo\", \"bar\", \"foo\", \"bar\", \"foo\", \"bar\", \"foo\", \"foo\"],\n",
1381 |     "        \"B\": [\"one\", \"one\", \"two\", \"three\", \"two\", \"two\", \"one\", \"three\"],\n",
1382 |     "        \"C\": np.random.randn(8),\n",
1383 |     "        \"D\": np.random.randn(8),\n",
1384 |     "    }\n",
1385 |     ")"
1386 |    ]
1387 |   },
1388 |   {
1389 |    "cell_type": "code",
1390 |    "execution_count": null,
1391 |    "id": "84b3aad8",
1392 |    "metadata": {},
1393 |    "outputs": [],
1394 |    "source": [
1395 |     "df"
1396 |    ]
1397 |   },
1398 |   {
1399 |    "cell_type": "code",
1400 |    "execution_count": null,
1401 |    "id": "ce702be3",
1402 |    "metadata": {},
1403 |    "outputs": [],
1404 |    "source": [
1405 |     "df[\"A\"].unique()"
1406 |    ]
1407 |   },
1408 |   {
1409 |    "cell_type": "code",
1410 |    "execution_count": null,
1411 |    "id": "4e38b453",
1412 |    "metadata": {},
1413 |    "outputs": [],
1414 |    "source": [
1415 |     "df[\"A\"].value_counts()"
1416 |    ]
1417 |   },
1418 |   {
1419 |    "cell_type": "code",
1420 |    "execution_count": null,
1421 |    "id": "b8018ce6",
1422 |    "metadata": {},
1423 |    "outputs": [],
1424 |    "source": [
1425 |     "df.groupby(\"A\")[[\"C\", \"D\"]].sum()"
1426 |    ]
1427 |   },
1428 |   {
1429 |    "cell_type": "code",
1430 |    "execution_count": null,
1431 |    "id": "95b962e0",
1432 |    "metadata": {},
1433 |    "outputs": [],
1434 |    "source": [
1435 |     "df.groupby([\"A\", \"B\"]).sum()"
1436 |    ]
1437 |   },
1438 |   {
1439 |    "cell_type": "code",
1440 |    "execution_count": null,
1441 |    "id": "9fa36b25",
1442 |    "metadata": {},
1443 |    "outputs": [],
1444 |    "source": [
1445 |     "df.groupby([\"A\", \"B\"]).mean()[[\"C\"]]"
1446 |    ]
1447 |   },
1448 |   {
1449 |    "cell_type": "code",
1450 |    "execution_count": null,
1451 |    "id": "852e4319",
1452 |    "metadata": {},
1453 |    "outputs": [],
1454 |    "source": [
1455 |     "df.groupby([\"A\", \"B\"]).count()"
1456 |    ]
1457 |   },
1458 |   {
1459 |    "cell_type": "markdown",
1460 |    "id": "e70b94a3",
1461 |    "metadata": {},
1462 |    "source": [
1463 |     "## Reshaping"
1464 |    ]
1465 |   },
1466 |   {
1467 |    "cell_type": "code",
1468 |    "execution_count": null,
1469 |    "id": "82b53b4e",
1470 |    "metadata": {},
1471 |    "outputs": [],
1472 |    "source": [
1473 |     "tuples = list(\n",
1474 |     "    zip(\n",
1475 |     "        [\"bar\", \"bar\", \"baz\", \"baz\", \"foo\", \"foo\", \"qux\", \"qux\"],\n",
1476 |     "        [\"one\", \"two\", \"one\", \"two\", \"one\", \"two\", \"one\", \"two\"],\n",
1477 |     "    )\n",
1478 |     ")\n",
1479 |     "\n",
1480 |     "index = pd.MultiIndex.from_tuples(tuples, names=[\"first\", \"second\"])\n",
1481 |     "df = pd.DataFrame(np.random.randn(8, 2), index=index, columns=[\"A\", \"B\"])\n",
1482 |     "df"
1483 |    ]
1484 |   },
1485 |   {
1486 |    "cell_type": "code",
1487 |    "execution_count": null,
1488 |    "id": "0b84ef5e",
1489 |    "metadata": {},
1490 |    "outputs": [],
1491 |    "source": [
1492 |     "df2 = df[:4]\n",
1493 |     "df2"
1494 |    ]
1495 |   },
1496 |   {
1497 |    "cell_type": "code",
1498 |    "execution_count": null,
1499 |    "id": "646195d2",
1500 |    "metadata": {},
1501 |    "outputs": [],
1502 |    "source": [
1503 |     "df2.values"
1504 |    ]
1505 |   },
1506 |   {
1507 |    "cell_type": "code",
1508 |    "execution_count": null,
1509 |    "id": "440f2280",
1510 |    "metadata": {},
1511 |    "outputs": [],
1512 |    "source": [
1513 |     "stacked = df2.stack()\n",
1514 |     "stacked"
1515 |    ]
1516 |   },
1517 |   {
1518 |    "cell_type": "code",
1519 |    "execution_count": null,
1520 |    "id": "5bfde1ce",
1521 |    "metadata": {},
1522 |    "outputs": [],
1523 |    "source": [
1524 |     "stacked.index"
1525 |    ]
1526 |   },
1527 |   {
1528 |    "cell_type": "code",
1529 |    "execution_count": null,
1530 |    "id": "1a108d06",
1531 |    "metadata": {},
1532 |    "outputs": [],
1533 |    "source": [
1534 |     "stacked.values"
1535 |    ]
1536 |   },
1537 |   {
1538 |    "cell_type": "code",
1539 |    "execution_count": null,
1540 |    "id": "e2fd8b87",
1541 |    "metadata": {},
1542 |    "outputs": [],
1543 |    "source": [
1544 |     "stacked.unstack()"
1545 |    ]
1546 |   },
1547 |   {
1548 |    "cell_type": "code",
1549 |    "execution_count": null,
1550 |    "id": "3503bc31",
1551 |    "metadata": {},
1552 |    "outputs": [],
1553 |    "source": [
1554 |     "stacked.unstack(1)"
1555 |    ]
1556 |   },
1557 |   {
1558 |    "cell_type": "code",
1559 |    "execution_count": null,
1560 |    "id": "064a51e7",
1561 |    "metadata": {},
1562 |    "outputs": [],
1563 |    "source": [
1564 |     "stacked.unstack(0).unstack(0)"
1565 |    ]
1566 |   },
1567 |   {
1568 |    "cell_type": "markdown",
1569 |    "id": "f98ff9a5",
1570 |    "metadata": {},
1571 |    "source": [
1572 |     "## Pivoting"
1573 |    ]
1574 |   },
1575 |   {
1576 |    "cell_type": "code",
1577 |    "execution_count": null,
1578 |    "id": "eb66efff",
1579 |    "metadata": {},
1580 |    "outputs": [],
1581 |    "source": [
1582 |     "df = pd.DataFrame(\n",
1583 |     "    {\n",
1584 |     "        \"A\": [\"one\", \"one\", \"two\", \"three\"] * 3,\n",
1585 |     "        \"B\": [\"A\", \"B\", \"C\"] * 4,\n",
1586 |     "        \"C\": [\"foo\", \"foo\", \"foo\", \"bar\", \"bar\", \"bar\"] * 2,\n",
1587 |     "        \"D\": np.random.randn(12),\n",
1588 |     "        \"E\": np.random.randn(12),\n",
1589 |     "    }\n",
1590 |     ")\n",
1591 |     "df"
1592 |    ]
1593 |   },
1594 |   {
1595 |    "cell_type": "code",
1596 |    "execution_count": null,
1597 |    "id": "e9248bfd",
1598 |    "metadata": {},
1599 |    "outputs": [],
1600 |    "source": [
1601 |     "pd.pivot_table(df, values=\"D\", index=[\"A\", \"B\"], columns=[\"C\"])"
1602 |    ]
1603 |   },
1604 |   {
1605 |    "cell_type": "markdown",
1606 |    "id": "80bb7a2a",
1607 |    "metadata": {},
1608 |    "source": [
1609 |     "## Time Series"
1610 |    ]
1611 |   },
1612 |   {
1613 |    "cell_type": "code",
1614 |    "execution_count": null,
1615 |    "id": "e37886ec",
1616 |    "metadata": {},
1617 |    "outputs": [],
1618 |    "source": [
1619 |     "rng = pd.date_range(\"1/1/2012\", periods=1000, freq=\"s\")\n",
1620 |     "ts = pd.Series(np.random.randint(0, 500, len(rng)), index=rng)\n",
1621 |     "ts"
1622 |    ]
1623 |   },
1624 |   {
1625 |    "cell_type": "code",
1626 |    "execution_count": null,
1627 |    "id": "8e3791e6",
1628 |    "metadata": {},
1629 |    "outputs": [],
1630 |    "source": [
1631 |     "ts.resample(\"1Min\").sum()"
1632 |    ]
1633 |   },
1634 |   {
1635 |    "cell_type": "code",
1636 |    "execution_count": null,
1637 |    "id": "25142ec1",
1638 |    "metadata": {},
1639 |    "outputs": [],
1640 |    "source": [
1641 |     "rng = pd.date_range(\"3/6/2012 00:00\", periods=5, freq=\"D\")\n",
1642 |     "ts = pd.Series(np.random.randn(len(rng)), rng)\n",
1643 |     "ts"
1644 |    ]
1645 |   },
1646 |   {
1647 |    "cell_type": "code",
1648 |    "execution_count": null,
1649 |    "id": "eb5bee69",
1650 |    "metadata": {},
1651 |    "outputs": [],
1652 |    "source": [
1653 |     "ts_utc = ts.tz_localize(\"UTC\")\n",
1654 |     "ts_utc"
1655 |    ]
1656 |   },
1657 |   {
1658 |    "cell_type": "code",
1659 |    "execution_count": null,
1660 |    "id": "1ff05f90",
1661 |    "metadata": {},
1662 |    "outputs": [],
1663 |    "source": [
1664 |     "ts_utc.tz_convert(\"US/Eastern\")"
1665 |    ]
1666 |   },
1667 |   {
1668 |    "cell_type": "code",
1669 |    "execution_count": null,
1670 |    "id": "51a62dc5",
1671 |    "metadata": {},
1672 |    "outputs": [],
1673 |    "source": [
1674 |     "rng = pd.date_range(\"1/1/2012\", periods=5, freq=\"M\")\n",
1675 |     "ts = pd.Series(np.random.randn(len(rng)), index=rng)\n",
1676 |     "ts"
1677 |    ]
1678 |   },
1679 |   {
1680 |    "cell_type": "code",
1681 |    "execution_count": null,
1682 |    "id": "7a10cff1",
1683 |    "metadata": {},
1684 |    "outputs": [],
1685 |    "source": [
1686 |     "ps = ts.to_period()\n",
1687 |     "ps"
1688 |    ]
1689 |   },
1690 |   {
1691 |    "cell_type": "code",
1692 |    "execution_count": null,
1693 |    "id": "c609aff7",
1694 |    "metadata": {},
1695 |    "outputs": [],
1696 |    "source": [
1697 |     "ps.to_timestamp()"
1698 |    ]
1699 |   },
1700 |   {
1701 |    "cell_type": "code",
1702 |    "execution_count": null,
1703 |    "id": "b47a344d",
1704 |    "metadata": {},
1705 |    "outputs": [],
1706 |    "source": [
1707 |     "pd.period_range(\"1990Q1\", \"2000Q4\", freq=\"Q-NOV\")"
1708 |    ]
1709 |   },
1710 |   {
1711 |    "cell_type": "code",
1712 |    "execution_count": null,
1713 |    "id": "b3712d37",
1714 |    "metadata": {},
1715 |    "outputs": [],
1716 |    "source": [
1717 |     "prng = pd.period_range(\"1990Q1\", \"2000Q4\", freq=\"Q-NOV\")\n",
1718 |     "ts = pd.Series(np.random.randn(len(prng)), prng)\n",
1719 |     "ts.index = (prng.asfreq(\"M\", \"e\") + 1).asfreq(\"h\", \"s\") + 9\n",
1720 |     "ts.head()"
1721 |    ]
1722 |   },
1723 |   {
1724 |    "cell_type": "markdown",
1725 |    "id": "286bad2a",
1726 |    "metadata": {},
1727 |    "source": [
1728 |     "## Categoricals"
1729 |    ]
1730 |   },
1731 |   {
1732 |    "cell_type": "code",
1733 |    "execution_count": null,
1734 |    "id": "93c71264",
1735 |    "metadata": {},
1736 |    "outputs": [],
1737 |    "source": [
1738 |     "df = pd.DataFrame(\n",
1739 |     "    {\"id\": [1, 2, 3, 4, 5, 6], \"raw_grade\": [\"a\", \"b\", \"b\", \"a\", \"a\", \"e\"]}\n",
1740 |     ")\n",
1741 |     "df"
1742 |    ]
1743 |   },
1744 |   {
1745 |    "cell_type": "code",
1746 |    "execution_count": null,
1747 |    "id": "b95a599b",
1748 |    "metadata": {},
1749 |    "outputs": [],
1750 |    "source": [
1751 |     "df[\"raw_grade\"].str.upper()"
1752 |    ]
1753 |   },
1754 |   {
1755 |    "cell_type": "code",
1756 |    "execution_count": null,
1757 |    "id": "3d3efdc4",
1758 |    "metadata": {},
1759 |    "outputs": [],
1760 |    "source": [
1761 |     "df[\"grade\"] = df[\"raw_grade\"].astype(\"category\")\n",
1762 |     "df[\"grade\"]"
1763 |    ]
1764 |   },
1765 |   {
1766 |    "cell_type": "code",
1767 |    "execution_count": null,
1768 |    "id": "8ce3a5d3",
1769 |    "metadata": {},
1770 |    "outputs": [],
1771 |    "source": [
1772 |     "new_categories = [\"very good\", \"good\", \"very bad\"]\n",
1773 |     "df[\"grade\"] = df[\"grade\"].cat.rename_categories(new_categories)"
1774 |    ]
1775 |   },
1776 |   {
1777 |    "cell_type": "code",
1778 |    "execution_count": null,
1779 |    "id": "63eab71a",
1780 |    "metadata": {},
1781 |    "outputs": [],
1782 |    "source": [
1783 |     "df"
1784 |    ]
1785 |   },
1786 |   {
1787 |    "cell_type": "code",
1788 |    "execution_count": null,
1789 |    "id": "862f74c9",
1790 |    "metadata": {},
1791 |    "outputs": [],
1792 |    "source": [
1793 |     "df[\"grade\"] = df[\"grade\"].cat.set_categories(\n",
1794 |     "    [\"very bad\", \"bad\", \"medium\", \"good\", \"very good\"]\n",
1795 |     ")\n",
1796 |     "df[\"grade\"]"
1797 |    ]
1798 |   },
1799 |   {
1800 |    "cell_type": "code",
1801 |    "execution_count": null,
1802 |    "id": "103191f2",
1803 |    "metadata": {},
1804 |    "outputs": [],
1805 |    "source": [
1806 |     "df.sort_values(by=\"grade\")"
1807 |    ]
1808 |   },
1809 |   {
1810 |    "cell_type": "code",
1811 |    "execution_count": null,
1812 |    "id": "705d978a",
1813 |    "metadata": {},
1814 |    "outputs": [],
1815 |    "source": [
1816 |     "df.groupby(\"raw_grade\").size()"
1817 |    ]
1818 |   },
1819 |   {
1820 |    "cell_type": "code",
1821 |    "execution_count": null,
1822 |    "id": "51bf8c5c",
1823 |    "metadata": {},
1824 |    "outputs": [],
1825 |    "source": [
1826 |     "df.groupby(\"grade\", observed=False).size()"
1827 |    ]
1828 |   },
1829 |   {
1830 |    "cell_type": "code",
1831 |    "execution_count": null,
1832 |    "id": "086a42e3",
1833 |    "metadata": {},
1834 |    "outputs": [],
1835 |    "source": [
1836 |     "import matplotlib.pyplot as plt"
1837 |    ]
1838 |   },
1839 |   {
1840 |    "cell_type": "code",
1841 |    "execution_count": null,
1842 |    "id": "bb53025d",
1843 |    "metadata": {},
1844 |    "outputs": [],
1845 |    "source": [
1846 |     "pd.Series(np.random.randn(1000), index=pd.date_range(\"1/1/2000\", periods=1000)).plot()"
1847 |    ]
1848 |   },
1849 |   {
1850 |    "cell_type": "code",
1851 |    "execution_count": null,
1852 |    "id": "bd5364d8",
1853 |    "metadata": {},
1854 |    "outputs": [],
1855 |    "source": [
1856 |     "ts = pd.Series(np.random.randn(1000), index=pd.date_range(\"1/1/2000\", periods=1000))\n",
1857 |     "ts = ts.cumsum()\n",
1858 |     "ts.plot();"
1859 |    ]
1860 |   },
1861 |   {
1862 |    "cell_type": "code",
1863 |    "execution_count": null,
1864 |    "id": "18d37930",
1865 |    "metadata": {},
1866 |    "outputs": [],
1867 |    "source": [
1868 |     "df = pd.DataFrame(\n",
1869 |     "    np.random.randn(1000, 4), index=ts.index, columns=[\"A\", \"B\", \"C\", \"D\"]\n",
1870 |     ")\n",
1871 |     "df = df.cumsum()\n",
1872 |     "\n",
1873 |     "\n",
1874 |     "df.plot()\n",
1875 |     "plt.legend(loc='best');"
1876 |    ]
1877 |   },
1878 |   {
1879 |    "cell_type": "markdown",
1880 |    "id": "a2cf4c18",
1881 |    "metadata": {},
1882 |    "source": [
1883 |     "# Working with real data"
1884 |    ]
1885 |   },
1886 |   {
1887 |    "cell_type": "code",
1888 |    "execution_count": null,
1889 |    "id": "89c3d4ba",
1890 |    "metadata": {},
1891 |    "outputs": [],
1892 |    "source": [
1893 |     "data = pd.read_csv('https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv', sep=',')"
1894 |    ]
1895 |   },
1896 |   {
1897 |    "cell_type": "code",
1898 |    "execution_count": null,
1899 |    "id": "586a3735",
1900 |    "metadata": {},
1901 |    "outputs": [],
1902 |    "source": [
1903 |     "data.head()"
1904 |    ]
1905 |   },
1906 |   {
1907 |    "cell_type": "code",
1908 |    "execution_count": null,
1909 |    "id": "0fb15d90",
1910 |    "metadata": {},
1911 |    "outputs": [],
1912 |    "source": [
1913 |     "data.shape"
1914 |    ]
1915 |   },
1916 |   {
1917 |    "cell_type": "code",
1918 |    "execution_count": null,
1919 |    "id": "a26cad5f",
1920 |    "metadata": {},
1921 |    "outputs": [],
1922 |    "source": [
1923 |     "data.info()"
1924 |    ]
1925 |   },
1926 |   {
1927 |    "cell_type": "code",
1928 |    "execution_count": null,
1929 |    "id": "5c2e9f13",
1930 |    "metadata": {},
1931 |    "outputs": [],
1932 |    "source": [
1933 |     "data.describe()"
1934 |    ]
1935 |   },
1936 |   {
1937 |    "cell_type": "code",
1938 |    "execution_count": null,
1939 |    "id": "c30b94bb",
1940 |    "metadata": {},
1941 |    "outputs": [],
1942 |    "source": [
1943 |     "data.to_numpy()"
1944 |    ]
1945 |   },
1946 |   {
1947 |    "cell_type": "code",
1948 |    "execution_count": null,
1949 |    "id": "b48e6200",
1950 |    "metadata": {},
1951 |    "outputs": [],
1952 |    "source": [
1953 |     "type(data[\"Name\"])"
1954 |    ]
1955 |   },
1956 |   {
1957 |    "cell_type": "code",
1958 |    "execution_count": null,
1959 |    "id": "3e3b5e44",
1960 |    "metadata": {},
1961 |    "outputs": [],
1962 |    "source": [
1963 |     "data[\"Name\"]"
1964 |    ]
1965 |   },
1966 |   {
1967 |    "cell_type": "code",
1968 |    "execution_count": null,
1969 |    "id": "be3ee512",
1970 |    "metadata": {},
1971 |    "outputs": [],
1972 |    "source": [
1973 |     "data[[\"Name\"]]"
1974 |    ]
1975 |   },
1976 |   {
1977 |    "cell_type": "code",
1978 |    "execution_count": null,
1979 |    "id": "82623d08",
1980 |    "metadata": {},
1981 |    "outputs": [],
1982 |    "source": [
1983 |     "data[[\"Name\", \"Sex\", \"Age\"]]"
1984 |    ]
1985 |   },
1986 |   {
1987 |    "cell_type": "code",
1988 |    "execution_count": null,
1989 |    "id": "de950aeb",
1990 |    "metadata": {},
1991 |    "outputs": [],
1992 |    "source": [
1993 |     "data[\"Age\"].hist(bins=30)"
1994 |    ]
1995 |   },
1996 |   {
1997 |    "cell_type": "code",
1998 |    "execution_count": null,
1999 |    "id": "a3fbe35a",
2000 |    "metadata": {},
2001 |    "outputs": [],
2002 |    "source": [
2003 |     "data[\"Sex\"].hist()"
2004 |    ]
2005 |   },
2006 |   {
2007 |    "cell_type": "code",
2008 |    "execution_count": null,
2009 |    "id": "c2f48461",
2010 |    "metadata": {},
2011 |    "outputs": [],
2012 |    "source": [
2013 |     "data[\"Age\"] < 30"
2014 |    ]
2015 |   },
2016 |   {
2017 |    "cell_type": "code",
2018 |    "execution_count": null,
2019 |    "id": "d8dcb30b",
2020 |    "metadata": {},
2021 |    "outputs": [],
2022 |    "source": [
2023 |     "bool_mask = data[\"Age\"] < 30\n",
2024 |     "data[bool_mask]"
2025 |    ]
2026 |   },
2027 |   {
2028 |    "cell_type": "code",
2029 |    "execution_count": null,
2030 |    "id": "4174142d",
2031 |    "metadata": {},
2032 |    "outputs": [],
2033 |    "source": [
2034 |     "data[bool_mask].equals(data[data[\"Age\"] < 30])"
2035 |    ]
2036 |   },
2037 |   {
2038 |    "cell_type": "code",
2039 |    "execution_count": null,
2040 |    "id": "6b11fe1c",
2041 |    "metadata": {},
2042 |    "outputs": [],
2043 |    "source": [
2044 |     "data[\"Age\"].mean()"
2045 |    ]
2046 |   },
2047 |   {
2048 |    "cell_type": "code",
2049 |    "execution_count": null,
2050 |    "id": "7068042d",
2051 |    "metadata": {},
2052 |    "outputs": [],
2053 |    "source": [
2054 |     "data.groupby(\"Sex\").mean(numeric_only=True)"
2055 |    ]
2056 |   },
2057 |   {
2058 |    "cell_type": "markdown",
2059 |    "id": "1f32196a",
2060 |    "metadata": {},
2061 |    "source": [
2062 |     "# 3. Matplotlib\n",
2063 |     "  \n",
2064 |     "Official site -- https://matplotlib.org   \n",
2065 |     "Getting started -- https://matplotlib.org/stable/users/getting_started/   \n",
2066 |     "Examples -- https://matplotlib.org/stable/gallery/index.html"
2067 |    ]
2068 |   },
2069 |   {
2070 |    "cell_type": "code",
2071 |    "execution_count": null,
2072 |    "id": "1a0bc676",
2073 |    "metadata": {},
2074 |    "outputs": [],
2075 |    "source": [
2076 |     "import matplotlib as mpl\n",
2077 |     "import matplotlib.pyplot as plt"
2078 |    ]
2079 |   },
2080 |   {
2081 |    "cell_type": "code",
2082 |    "execution_count": null,
2083 |    "id": "843f5b88",
2084 |    "metadata": {},
2085 |    "outputs": [],
2086 |    "source": [
2087 |     "fig, ax = plt.subplots()  \n",
2088 |     "ax.plot([1, 2, 3, 4], [1, 4, 2, 3])"
2089 |    ]
2090 |   },
2091 |   {
2092 |    "cell_type": "code",
2093 |    "execution_count": null,
2094 |    "id": "a077efe3",
2095 |    "metadata": {},
2096 |    "outputs": [],
2097 |    "source": [
2098 |     "np.random.seed(17)\n",
2099 |     "data = {'a': np.arange(50),\n",
2100 |     "        'c': np.random.randint(0, 50, 50),\n",
2101 |     "        'd': np.random.randn(50)}\n",
2102 |     "data['b'] = data['a'] + 10 * np.random.randn(50)\n",
2103 |     "data['d'] = np.abs(data['d']) * 100\n",
2104 |     "\n",
2105 |     "fig, ax = plt.subplots(figsize=(5, 2.7), layout='constrained')\n",
2106 |     "ax.scatter('a', 'b', c='c', s='d', data=data)\n",
2107 |     "ax.set_xlabel('entry a')\n",
2108 |     "ax.set_ylabel('entry b')"
2109 |    ]
2110 |   },
2111 |   {
2112 |    "cell_type": "code",
2113 |    "execution_count": null,
2114 |    "id": "a8d00d46",
2115 |    "metadata": {},
2116 |    "outputs": [],
2117 |    "source": [
2118 |     "x = np.linspace(0, 2, 100) \n",
2119 |     "fig, ax = plt.subplots(figsize=(5, 2.7), layout='constrained')\n",
2120 |     "ax.plot(x, x, label='linear')\n",
2121 |     "ax.plot(x, x**2, label='quadratic')\n",
2122 |     "ax.plot(x, x**3, label='cubic')  \n",
2123 |     "ax.set_xlabel('x label') \n",
2124 |     "ax.set_ylabel('y label') \n",
2125 |     "ax.set_title(\"Simple Plot\") \n",
2126 |     "ax.legend() "
2127 |    ]
2128 |   },
2129 |   {
2130 |    "cell_type": "code",
2131 |    "execution_count": null,
2132 |    "id": "fddaf151",
2133 |    "metadata": {},
2134 |    "outputs": [],
2135 |    "source": [
2136 |     "x = np.linspace(0, 2, 100)\n",
2137 |     "\n",
2138 |     "plt.figure(figsize=(5, 2.7), layout='constrained')\n",
2139 |     "plt.plot(x, x, label='linear')\n",
2140 |     "plt.plot(x, x**2, label='quadratic')\n",
2141 |     "plt.plot(x, x**3, label='cubic')\n",
2142 |     "plt.xlabel('x label')\n",
2143 |     "plt.ylabel('y label')\n",
2144 |     "plt.title(\"Simple Plot\")\n",
2145 |     "plt.legend()"
2146 |    ]
2147 |   },
2148 |   {
2149 |    "cell_type": "code",
2150 |    "execution_count": null,
2151 |    "id": "f58cff42",
2152 |    "metadata": {},
2153 |    "outputs": [],
2154 |    "source": [
2155 |     "data1, data2, data3, data4 = np.random.randn(4, 100)"
2156 |    ]
2157 |   },
2158 |   {
2159 |    "cell_type": "code",
2160 |    "execution_count": null,
2161 |    "id": "39298642",
2162 |    "metadata": {},
2163 |    "outputs": [],
2164 |    "source": [
2165 |     "fig, ax = plt.subplots(figsize=(5, 2.7))\n",
2166 |     "x = np.arange(len(data1))\n",
2167 |     "ax.plot(x, np.cumsum(data1), color='blue', linewidth=3, linestyle='--')\n",
2168 |     "l, = ax.plot(x, np.cumsum(data2), color='orange', linewidth=2)\n",
2169 |     "l.set_linestyle(':')"
2170 |    ]
2171 |   },
2172 |   {
2173 |    "cell_type": "code",
2174 |    "execution_count": null,
2175 |    "id": "71954147",
2176 |    "metadata": {},
2177 |    "outputs": [],
2178 |    "source": [
2179 |     "fig, ax = plt.subplots(figsize=(5, 2.7))\n",
2180 |     "ax.scatter(data1, data2, s=50, facecolor='C0', edgecolor='k')"
2181 |    ]
2182 |   },
2183 |   {
2184 |    "cell_type": "code",
2185 |    "execution_count": null,
2186 |    "id": "f6c8c058",
2187 |    "metadata": {},
2188 |    "outputs": [],
2189 |    "source": [
2190 |     "fig, ax = plt.subplots(figsize=(5, 2.7))\n",
2191 |     "ax.plot(data1, 'o', label='data1')\n",
2192 |     "ax.plot(data2, 'd', label='data2')\n",
2193 |     "ax.plot(data3, 'v', label='data3')\n",
2194 |     "ax.plot(data4, 's', label='data4')\n",
2195 |     "ax.legend()"
2196 |    ]
2197 |   },
2198 |   {
2199 |    "cell_type": "code",
2200 |    "execution_count": null,
2201 |    "id": "44fc4990",
2202 |    "metadata": {},
2203 |    "outputs": [],
2204 |    "source": [
2205 |     "mu, sigma = 115, 15\n",
2206 |     "x = mu + sigma * np.random.randn(10000)\n",
2207 |     "fig, ax = plt.subplots(figsize=(5, 2.7), layout='constrained')\n",
2208 |     "\n",
2209 |     "n, bins, patches = ax.hist(x, 50, density=True, facecolor='C0', alpha=0.75)\n",
2210 |     "\n",
2211 |     "ax.set_xlabel('Length [cm]')\n",
2212 |     "ax.set_ylabel('Probability')\n",
2213 |     "ax.set_title('Aardvark lengths\\n (not really)')\n",
2214 |     "ax.text(75, .025, r'$\\mu=115,\\ \\sigma=15$')\n",
2215 |     "ax.axis([55, 175, 0, 0.03])\n",
2216 |     "ax.grid(True)"
2217 |    ]
2218 |   },
2219 |   {
2220 |    "cell_type": "code",
2221 |    "execution_count": null,
2222 |    "id": "bd4d06d0",
2223 |    "metadata": {},
2224 |    "outputs": [],
2225 |    "source": [
2226 |     "fig, ax = plt.subplots(figsize=(5, 2.7))\n",
2227 |     "\n",
2228 |     "t = np.arange(0.0, 5.0, 0.01)\n",
2229 |     "s = np.cos(2 * np.pi * t)\n",
2230 |     "line, = ax.plot(t, s, lw=2)\n",
2231 |     "\n",
2232 |     "ax.annotate('local max', xy=(2, 1), xytext=(3, 1.5),\n",
2233 |     "            arrowprops=dict(facecolor='black', shrink=0.05))\n",
2234 |     "\n",
2235 |     "ax.set_ylim(-2, 2)"
2236 |    ]
2237 |   },
2238 |   {
2239 |    "cell_type": "code",
2240 |    "execution_count": null,
2241 |    "id": "dcb20e51",
2242 |    "metadata": {},
2243 |    "outputs": [],
2244 |    "source": [
2245 |     "fig, ax = plt.subplots(figsize=(5, 2.7))\n",
2246 |     "ax.plot(np.arange(len(data1)), data1, label='data1')\n",
2247 |     "ax.plot(np.arange(len(data2)), data2, label='data2')\n",
2248 |     "ax.plot(np.arange(len(data3)), data3, 'd', label='data3')\n",
2249 |     "ax.legend()"
2250 |    ]
2251 |   },
2252 |   {
2253 |    "cell_type": "code",
2254 |    "execution_count": null,
2255 |    "id": "9ce0253f",
2256 |    "metadata": {},
2257 |    "outputs": [],
2258 |    "source": [
2259 |     "fig, axs = plt.subplots(1, 2, figsize=(5, 2.7), layout='constrained')\n",
2260 |     "xdata = np.arange(len(data1))  # make an ordinal for this\n",
2261 |     "data = 10**data1\n",
2262 |     "axs[0].plot(xdata, data)\n",
2263 |     "\n",
2264 |     "axs[1].set_yscale('log')\n",
2265 |     "axs[1].plot(xdata, data)"
2266 |    ]
2267 |   },
2268 |   {
2269 |    "cell_type": "code",
2270 |    "execution_count": null,
2271 |    "id": "e8052aaa",
2272 |    "metadata": {},
2273 |    "outputs": [],
2274 |    "source": [
2275 |     "fig, axs = plt.subplots(2, 1, layout='constrained')\n",
2276 |     "axs[0].plot(xdata, data1)\n",
2277 |     "axs[0].set_title('Automatic ticks')\n",
2278 |     "\n",
2279 |     "axs[1].plot(xdata, data1)\n",
2280 |     "axs[1].set_xticks(np.arange(0, 100, 30), ['zero', '30', 'sixty', '90'])\n",
2281 |     "axs[1].set_yticks([-1.5, 0, 1.5])  # note that we don't need to specify labels\n",
2282 |     "axs[1].set_title('Manual ticks')"
2283 |    ]
2284 |   },
2285 |   {
2286 |    "cell_type": "code",
2287 |    "execution_count": null,
2288 |    "id": "b6049ab7",
2289 |    "metadata": {},
2290 |    "outputs": [],
2291 |    "source": [
2292 |     "fig, ax = plt.subplots(figsize=(5, 2.7), layout='constrained')\n",
2293 |     "dates = np.arange(np.datetime64('2021-11-15'), np.datetime64('2021-12-25'),\n",
2294 |     "                  np.timedelta64(1, 'h'))\n",
2295 |     "data = np.cumsum(np.random.randn(len(dates)))\n",
2296 |     "ax.plot(dates, data)\n",
2297 |     "cdf = mpl.dates.ConciseDateFormatter(ax.xaxis.get_major_locator())\n",
2298 |     "ax.xaxis.set_major_formatter(cdf)"
2299 |    ]
2300 |   },
2301 |   {
2302 |    "cell_type": "code",
2303 |    "execution_count": null,
2304 |    "id": "47d167c2",
2305 |    "metadata": {},
2306 |    "outputs": [],
2307 |    "source": [
2308 |     "fig, ax = plt.subplots(figsize=(5, 2.7), layout='constrained')\n",
2309 |     "categories = ['turnips', 'rutabaga', 'cucumber', 'pumpkins']\n",
2310 |     "\n",
2311 |     "ax.bar(categories, np.random.rand(len(categories)))"
2312 |    ]
2313 |   },
2314 |   {
2315 |    "cell_type": "code",
2316 |    "execution_count": null,
2317 |    "id": "c9850fed",
2318 |    "metadata": {},
2319 |    "outputs": [],
2320 |    "source": [
2321 |     "fig, (ax1, ax3) = plt.subplots(1, 2, figsize=(7, 2.7), layout='constrained')\n",
2322 |     "l1, = ax1.plot(t, s)\n",
2323 |     "ax2 = ax1.twinx()\n",
2324 |     "l2, = ax2.plot(t, range(len(t)), 'C1')\n",
2325 |     "ax2.legend([l1, l2], ['Cosine (left)', 'Straight (right)'])\n",
2326 |     "\n",
2327 |     "ax3.plot(t, s)\n",
2328 |     "ax3.set_xlabel('Angle [rad]')\n",
2329 |     "ax4 = ax3.secondary_xaxis('top', functions=(np.rad2deg, np.deg2rad))\n",
2330 |     "ax4.set_xlabel('Angle [°]')"
2331 |    ]
2332 |   },
2333 |   {
2334 |    "cell_type": "code",
2335 |    "execution_count": null,
2336 |    "id": "4dc9684f",
2337 |    "metadata": {},
2338 |    "outputs": [],
2339 |    "source": [
2340 |     "X, Y = np.meshgrid(np.linspace(-3, 3, 128), np.linspace(-3, 3, 128))\n",
2341 |     "Z = (1 - X/2 + X**5 + Y**3) * np.exp(-X**2 - Y**2)\n",
2342 |     "\n",
2343 |     "fig, axs = plt.subplots(2, 2, layout='constrained')\n",
2344 |     "pc = axs[0, 0].pcolormesh(X, Y, Z, vmin=-1, vmax=1, cmap='RdBu_r')\n",
2345 |     "fig.colorbar(pc, ax=axs[0, 0])\n",
2346 |     "axs[0, 0].set_title('pcolormesh()')\n",
2347 |     "\n",
2348 |     "co = axs[0, 1].contourf(X, Y, Z, levels=np.linspace(-1.25, 1.25, 11))\n",
2349 |     "fig.colorbar(co, ax=axs[0, 1])\n",
2350 |     "axs[0, 1].set_title('contourf()')\n",
2351 |     "\n",
2352 |     "pc = axs[1, 0].imshow(Z**2 * 100, cmap='plasma',\n",
2353 |     "                      norm=mpl.colors.LogNorm(vmin=0.01, vmax=100))\n",
2354 |     "fig.colorbar(pc, ax=axs[1, 0], extend='both')\n",
2355 |     "axs[1, 0].set_title('imshow() with LogNorm()')\n",
2356 |     "\n",
2357 |     "pc = axs[1, 1].scatter(data1, data2, c=data3, cmap='RdBu_r')\n",
2358 |     "fig.colorbar(pc, ax=axs[1, 1], extend='both')\n",
2359 |     "axs[1, 1].set_title('scatter()')"
2360 |    ]
2361 |   },
2362 |   {
2363 |    "cell_type": "code",
2364 |    "execution_count": null,
2365 |    "id": "23ef9c07",
2366 |    "metadata": {},
2367 |    "outputs": [],
2368 |    "source": [
2369 |     "fig, axd = plt.subplot_mosaic([['upleft', 'right'],\n",
2370 |     "                               ['lowleft', 'right']], layout='constrained')\n",
2371 |     "axd['upleft'].set_title('upleft')\n",
2372 |     "axd['lowleft'].set_title('lowleft')\n",
2373 |     "axd['right'].set_title('right')"
2374 |    ]
2375 |   }
2376 |  ],
2377 |  "metadata": {
2378 |   "kernelspec": {
2379 |    "display_name": "Python 3 (ipykernel)",
2380 |    "language": "python",
2381 |    "name": "python3"
2382 |   },
2383 |   "language_info": {
2384 |    "codemirror_mode": {
2385 |     "name": "ipython",
2386 |     "version": 3
2387 |    },
2388 |    "file_extension": ".py",
2389 |    "mimetype": "text/x-python",
2390 |    "name": "python",
2391 |    "nbconvert_exporter": "python",
2392 |    "pygments_lexer": "ipython3",
2393 |    "version": "3.9.13"
2394 |   }
2395 |  },
2396 |  "nbformat": 4,
2397 |  "nbformat_minor": 5
2398 | }
2399 | 


--------------------------------------------------------------------------------
/notebooks/seminars/04-pytorch-intro.ipynb:
--------------------------------------------------------------------------------
   1 | {
   2 |  "cells": [
   3 |   {
   4 |    "cell_type": "markdown",
   5 |    "metadata": {
   6 |     "id": "apOCg85d9SLV"
   7 |    },
   8 |    "source": [
   9 |     "# PyTorch seminar\n"
  10 |    ]
  11 |   },
  12 |   {
  13 |    "cell_type": "code",
  14 |    "execution_count": null,
  15 |    "metadata": {
  16 |     "id": "_gLdrFvh9SLX"
  17 |    },
  18 |    "outputs": [],
  19 |    "source": [
  20 |     "import matplotlib.pyplot as plt\n",
  21 |     "import numpy as np\n",
  22 |     "import sklearn\n",
  23 |     "%matplotlib inline"
  24 |    ]
  25 |   },
  26 |   {
  27 |    "cell_type": "markdown",
  28 |    "metadata": {
  29 |     "id": "OYzpWFqt9SLY"
  30 |    },
  31 |    "source": [
  32 |     "Installation: https://pytorch.org/get-started/locally/  \n",
  33 |     "This notebook covers the main features of the PyTorch framework. PyTorch is similar to NumPy, but it can also compute gradients automatically and efficiently."
  34 |    ]
  35 |   },
  36 |   {
  37 |    "cell_type": "code",
  38 |    "execution_count": null,
  39 |    "metadata": {
  40 |     "colab": {
  41 |      "base_uri": "https://localhost:8080/"
  42 |     },
  43 |     "id": "ejEH9lf39SLZ",
  44 |     "outputId": "67145a07-4704-44b0-bfb7-c962c75b03e0"
  45 |    },
  46 |    "outputs": [],
  47 |    "source": [
  48 |     "import torch\n",
  49 |     "\n",
  50 |     "print(torch.__version__)"
  51 |    ]
  52 |   },
  53 |   {
  54 |    "cell_type": "code",
  55 |    "execution_count": null,
  56 |    "metadata": {
  57 |     "id": "CvX6ONy29SLa"
  58 |    },
  59 |    "outputs": [],
  60 |    "source": [
  61 |     "import torchvision\n",
  62 |     "from torch import nn\n",
  63 |     "import os\n",
  64 |     "from torchvision.datasets import MNIST\n",
  65 |     "import torchvision.transforms as transforms\n",
  66 |     "import tqdm"
  67 |    ]
  68 |   },
  69 |   {
  70 |    "cell_type": "code",
  71 |    "execution_count": null,
  72 |    "metadata": {
  73 |     "colab": {
  74 |      "base_uri": "https://localhost:8080/"
  75 |     },
  76 |     "id": "iyXIU9K29SLb",
  77 |     "outputId": "5c8cb232-155a-48ec-a759-6d8bdeb7fd90"
  78 |    },
  79 |    "outputs": [],
  80 |    "source": [
  81 |     "# check whether CUDA is available on your machine\n",
  82 |     "torch.cuda.is_available()"
  83 |    ]
  84 |   },
  85 |   {
  86 |    "cell_type": "markdown",
  87 |    "metadata": {
  88 |     "id": "7eVL0Xh49SLb"
  89 |    },
  90 |    "source": [
  91 |     "# Basic operations"
  92 |    ]
  93 |   },
  94 |   {
  95 |    "cell_type": "code",
  96 |    "execution_count": null,
  97 |    "metadata": {
  98 |     "colab": {
  99 |      "base_uri": "https://localhost:8080/"
 100 |     },
 101 |     "id": "VJh2krfQ9SLb",
 102 |     "outputId": "b6f11b04-1d73-42e3-8433-11254067f174",
 103 |     "scrolled": true
 104 |    },
 105 |    "outputs": [],
 106 |    "source": [
 107 |     "# numpy\n",
 108 |     "\n",
 109 |     "x = np.arange(25).reshape(5, 5)\n",
 110 |     "\n",
 111 |     "print(\"X :\\n {}\\n\".format(x))\n",
 112 |     "print(\"X.shape : {}\\n \".format(x.shape))\n",
 113 |     "print(\"Element-wise square:\\n {}\\n\".format(x * x))\n",
 114 |     "print(\"X*X^T  :\\n {}\\n\".format(np.matmul(x, x.T)))\n",
 115 |     "print(\"Column means:\\n {}\\n\".format(np.mean(x, axis=0)))\n",
 116 |     "print(\"Cumulative sum along rows:\\n {}\\n\".format(np.cumsum(x, axis=1)))"
 117 |    ]
 118 |   },
 119 |   {
 120 |    "cell_type": "code",
 121 |    "execution_count": null,
 122 |    "metadata": {
 123 |     "colab": {
 124 |      "base_uri": "https://localhost:8080/"
 125 |     },
 126 |     "id": "qRoNIUAm9SLc",
 127 |     "outputId": "700f09ba-d36f-43d8-9bb7-592c01222c5c"
 128 |    },
 129 |    "outputs": [],
 130 |    "source": [
 131 |     "# torch\n",
 132 |     "\n",
 133 |     "x = torch.arange(25).reshape(5, 5).float()\n",
 134 |     "\n",
 135 |     "print(\"X :\\n {}\\n\".format(x))\n",
 136 |     "print(\"X.shape : {}\\n \".format(x.shape))\n",
  137 |     "print(\"Element-wise square:\\n {}\\n\".format(x * x))\n",
  138 |     "print(\"X*X^T  :\\n {}\\n\".format(torch.matmul(x, x.T)))\n",
  139 |     "print(\"Column means:\\n {}\\n\".format(torch.mean(x, axis=0)))\n",
  140 |     "print(\"Cumulative sum along rows:\\n {}\\n\".format(torch.cumsum(x, axis=1)))"
 141 |    ]
 142 |   },
 143 |   {
 144 |    "cell_type": "markdown",
 145 |    "metadata": {
 146 |     "id": "PErs03Qf9SLc"
 147 |    },
 148 |    "source": [
  149 |     "# PyTorch is almost NumPy\n",
  150 |     "You can create tensors and inspect their gradients without creating a session, as TensorFlow 1.x required.\n",
  151 |     "Method names are very similar. Where they differ, check this table: https://github.com/torch/torch7/wiki/Torch-for-Numpy-users\n"
 152 |    ]
 153 |   },
 154 |   {
 155 |    "cell_type": "markdown",
 156 |    "metadata": {
 157 |     "id": "OVAZNles9SLd"
 158 |    },
 159 |    "source": [
  160 |     "# PyTorch computes backpropagation for us with the autograd module"
 161 |    ]
 162 |   },
 163 |   {
 164 |    "cell_type": "code",
 165 |    "execution_count": null,
 166 |    "metadata": {
 167 |     "colab": {
 168 |      "base_uri": "https://localhost:8080/"
 169 |     },
 170 |     "id": "BxomQQ3u9SLd",
 171 |     "outputId": "c5eb07db-3143-49ae-f5cf-7bc1c957c161"
 172 |    },
 173 |    "outputs": [],
 174 |    "source": [
  175 |     "# prediction vector\n",
  176 |     "preds = torch.zeros(5, requires_grad=True)\n",
  177 |     "\n",
  178 |     "# target labels\n",
  179 |     "labels = torch.ones(5, requires_grad=True)\n",
 180 |     "\n",
 181 |     "# loss: MAE\n",
 182 |     "loss = torch.mean(torch.abs(labels - preds))\n",
 183 |     "\n",
 184 |     "print(loss)\n",
 185 |     "\n",
  186 |     "# run backprop\n",
 187 |     "loss.backward()"
 188 |    ]
 189 |   },
 190 |   {
 191 |    "cell_type": "code",
 192 |    "execution_count": null,
 193 |    "metadata": {
 194 |     "colab": {
 195 |      "base_uri": "https://localhost:8080/"
 196 |     },
 197 |     "id": "aSu1ITF99SLe",
 198 |     "outputId": "f81d2c14-af85-4dab-815f-09ec4ae1c12f"
 199 |    },
 200 |    "outputs": [],
 201 |    "source": [
  202 |     "# gradients are available in the .grad attribute:\n",
 203 |     "preds.grad"
 204 |    ]
 205 |   },
 206 |   {
 207 |    "cell_type": "code",
 208 |    "execution_count": null,
 209 |    "metadata": {
 210 |     "colab": {
 211 |      "base_uri": "https://localhost:8080/"
 212 |     },
 213 |     "id": "JlpI3Th29SLe",
 214 |     "outputId": "e7c3d66e-9233-41d7-f227-37589db53212"
 215 |    },
 216 |    "outputs": [],
 217 |    "source": [
  218 |     "# gradients can be zeroed in place\n",
 219 |     "preds.grad.zero_()"
 220 |    ]
 221 |   },
 222 |   {
 223 |    "cell_type": "code",
 224 |    "execution_count": null,
 225 |    "metadata": {
 226 |     "colab": {
 227 |      "base_uri": "https://localhost:8080/"
 228 |     },
 229 |     "id": "zBjOk_jH9SLe",
 230 |     "outputId": "07571207-ce54-4452-adb3-705c2f55009f"
 231 |    },
 232 |    "outputs": [],
 233 |    "source": [
  234 |     "# now the gradients are zero again\n",
 235 |     "preds.grad"
 236 |    ]
 237 |   },
 238 |   {
 239 |    "cell_type": "markdown",
 240 |    "metadata": {
 241 |     "id": "rzwlsjQC9SLf"
 242 |    },
 243 |    "source": [
  244 |     "### Gradients accumulate on every call to backward()"
 245 |    ]
 246 |   },
 247 |   {
 248 |    "cell_type": "code",
 249 |    "execution_count": null,
 250 |    "metadata": {
 251 |     "colab": {
 252 |      "base_uri": "https://localhost:8080/"
 253 |     },
 254 |     "id": "kgydtalG9SLf",
 255 |     "outputId": "886288c8-bcd2-4a78-ad5b-f795b8485693"
 256 |    },
 257 |    "outputs": [],
 258 |    "source": [
  259 |     "# prediction vector\n",
  260 |     "preds = torch.zeros(5, requires_grad=True)\n",
  261 |     "\n",
  262 |     "# target labels\n",
  263 |     "labels = torch.ones(5, requires_grad=True)\n",
 264 |     "\n",
 265 |     "# loss: MAE\n",
 266 |     "loss = torch.mean(torch.abs(labels - preds))\n",
 267 |     "\n",
 268 |     "print(loss)\n",
 269 |     "\n",
  270 |     "# run backprop several times; retain_graph=True keeps the graph so backward() can be called again\n",
 271 |     "for i in range(5):\n",
 272 |     "    loss.backward(retain_graph=True)\n",
 273 |     "    print(i, preds.grad)"
 274 |    ]
 275 |   },
 276 |   {
 277 |    "cell_type": "markdown",
 278 |    "metadata": {
 279 |     "id": "xt-q2Fk79SLg"
 280 |    },
 281 |    "source": [
  282 |     "# Writing our own logistic regression in PyTorch"
 283 |    ]
 284 |   },
 285 |   {
 286 |    "cell_type": "code",
 287 |    "execution_count": null,
 288 |    "metadata": {
 289 |     "id": "J3yFTWek9SLg"
 290 |    },
 291 |    "outputs": [],
 292 |    "source": [
 293 |     "from sklearn.datasets import make_blobs\n",
 294 |     "import sklearn\n",
 295 |     "\n",
 296 |     "X, y = make_blobs(\n",
 297 |     "    n_samples=200,\n",
 298 |     "    centers=((10, 5), (5, -5)),\n",
 299 |     "    n_features=2,\n",
 300 |     "    random_state=0,\n",
 301 |     "    cluster_std=3,\n",
 302 |     ")"
 303 |    ]
 304 |   },
 305 |   {
 306 |    "cell_type": "code",
 307 |    "execution_count": null,
 308 |    "metadata": {
 309 |     "colab": {
 310 |      "base_uri": "https://localhost:8080/",
 311 |      "height": 282
 312 |     },
 313 |     "id": "FDd7e9339SLg",
 314 |     "outputId": "1b919df0-f188-447c-e69e-f6a468ce9cb6"
 315 |    },
 316 |    "outputs": [],
 317 |    "source": [
 318 |     "plt.scatter(X[:, 0], X[:, 1], marker=\"o\", c=y, s=25, edgecolor=\"k\")"
 319 |    ]
 320 |   },
 321 |   {
 322 |    "cell_type": "markdown",
 323 |    "metadata": {
 324 |     "id": "5NU6rOBC9SLh"
 325 |    },
 326 |    "source": [
  327 |     "# Reminder: how this was done in scikit-learn"
 328 |    ]
 329 |   },
 330 |   {
 331 |    "cell_type": "code",
 332 |    "execution_count": null,
 333 |    "metadata": {
 334 |     "id": "7wWhBHPH9SLh"
 335 |    },
 336 |    "outputs": [],
 337 |    "source": [
 338 |     "from sklearn.linear_model import LogisticRegression"
 339 |    ]
 340 |   },
 341 |   {
 342 |    "cell_type": "code",
 343 |    "execution_count": null,
 344 |    "metadata": {
 345 |     "id": "2HmP5WB_9SLi"
 346 |    },
 347 |    "outputs": [],
 348 |    "source": [
 349 |     "model = LogisticRegression()"
 350 |    ]
 351 |   },
 352 |   {
 353 |    "cell_type": "code",
 354 |    "execution_count": null,
 355 |    "metadata": {
 356 |     "colab": {
 357 |      "base_uri": "https://localhost:8080/"
 358 |     },
 359 |     "id": "WBcgUibg9SLi",
 360 |     "outputId": "74606d4b-5b33-4025-f275-fd4690b22c71",
 361 |     "scrolled": true
 362 |    },
 363 |    "outputs": [],
 364 |    "source": [
 365 |     "model.fit(X, y)"
 366 |    ]
 367 |   },
 368 |   {
 369 |    "cell_type": "code",
 370 |    "execution_count": null,
 371 |    "metadata": {
 372 |     "colab": {
 373 |      "base_uri": "https://localhost:8080/"
 374 |     },
 375 |     "id": "lk966UsF9SLj",
 376 |     "outputId": "da585bd2-a748-4004-c80a-290471761045"
 377 |    },
 378 |    "outputs": [],
 379 |    "source": [
 380 |     "model.coef_"
 381 |    ]
 382 |   },
 383 |   {
 384 |    "cell_type": "code",
 385 |    "execution_count": null,
 386 |    "metadata": {
 387 |     "colab": {
 388 |      "base_uri": "https://localhost:8080/"
 389 |     },
 390 |     "id": "tInwNtdw9SLj",
 391 |     "outputId": "7547ac62-d286-4045-ec6b-158d31f5b062"
 392 |    },
 393 |    "outputs": [],
 394 |    "source": [
 395 |     "model.intercept_"
 396 |    ]
 397 |   },
 398 |   {
 399 |    "cell_type": "markdown",
 400 |    "metadata": {
 401 |     "id": "tyw6b4ZG9SLk"
 402 |    },
 403 |    "source": [
  404 |     "## Visualizing the decision boundary"
 405 |    ]
 406 |   },
 407 |   {
 408 |    "cell_type": "code",
 409 |    "execution_count": null,
 410 |    "metadata": {
 411 |     "colab": {
 412 |      "base_uri": "https://localhost:8080/",
 413 |      "height": 592
 414 |     },
 415 |     "id": "zeQ1czKR9SLk",
 416 |     "outputId": "404ed742-7fc9-4bfe-c554-c5cd39d579a2"
 417 |    },
 418 |    "outputs": [],
 419 |    "source": [
 420 |     "w_1 = model.coef_[0][0]\n",
 421 |     "w_2 = model.coef_[0][1]\n",
 422 |     "w_0 = model.intercept_[0]\n",
 423 |     "\n",
 424 |     "plt.figure(figsize=(20, 10))\n",
  425 |     "scatter = plt.scatter(X[:, 0], X[:, 1], marker=\"o\", c=y, s=25, edgecolor=\"k\")\n",
  426 |     "plt.legend(*scatter.legend_elements(), title=\"y\")  # make_blobs labels are 0 and 1\n",
 427 |     "x_arr = np.linspace(-3, 18, 1000)\n",
 428 |     "\n",
 429 |     "y_arr = -(w_0 + w_1 * x_arr) / w_2\n",
 430 |     "plt.plot(x_arr, y_arr)\n",
 431 |     "plt.show()"
 432 |    ]
 433 |   },
 434 |   {
 435 |    "cell_type": "markdown",
 436 |    "metadata": {
 437 |     "id": "uqe74hxI9SLl"
 438 |    },
 439 |    "source": [
  440 |     "# PyTorch"
 441 |    ]
 442 |   },
 443 |   {
 444 |    "cell_type": "code",
 445 |    "execution_count": null,
 446 |    "metadata": {
 447 |     "id": "zeYuViOh9SLl"
 448 |    },
 449 |    "outputs": [],
 450 |    "source": [
 451 |     "X, y = make_blobs(n_samples=200, centers=2, n_features=2, random_state=0, cluster_std=3)"
 452 |    ]
 453 |   },
 454 |   {
 455 |    "cell_type": "code",
 456 |    "execution_count": null,
 457 |    "metadata": {
 458 |     "colab": {
 459 |      "base_uri": "https://localhost:8080/"
 460 |     },
 461 |     "id": "jrbIbcUf9SLl",
 462 |     "outputId": "54d5f4cb-3c66-4531-ff11-6406df90fa81"
 463 |    },
 464 |    "outputs": [],
 465 |    "source": [
 466 |     "type(X), type(y)"
 467 |    ]
 468 |   },
 469 |   {
 470 |    "cell_type": "code",
 471 |    "execution_count": null,
 472 |    "metadata": {
 473 |     "id": "NkSB7ldb9SLm"
 474 |    },
 475 |    "outputs": [],
 476 |    "source": [
 477 |     "X = torch.tensor(X)\n",
 478 |     "y = torch.tensor(y)"
 479 |    ]
 480 |   },
 481 |   {
 482 |    "cell_type": "code",
 483 |    "execution_count": null,
 484 |    "metadata": {
 485 |     "colab": {
 486 |      "base_uri": "https://localhost:8080/"
 487 |     },
 488 |     "id": "R5pyIO1k9SLm",
 489 |     "outputId": "00f6053e-2099-495c-e226-867c3650d8bb"
 490 |    },
 491 |    "outputs": [],
 492 |    "source": [
 493 |     "type(X), type(y)"
 494 |    ]
 495 |   },
 496 |   {
 497 |    "cell_type": "code",
 498 |    "execution_count": null,
 499 |    "metadata": {
 500 |     "colab": {
 501 |      "base_uri": "https://localhost:8080/"
 502 |     },
 503 |     "id": "JdoThC549SLm",
 504 |     "outputId": "146701ee-b542-4bce-eb8c-c6a7f8c6a935"
 505 |    },
 506 |    "outputs": [],
 507 |    "source": [
 508 |     "print(X.shape, y.shape)"
 509 |    ]
 510 |   },
 511 |   {
 512 |    "cell_type": "code",
 513 |    "execution_count": null,
 514 |    "metadata": {
 515 |     "colab": {
 516 |      "base_uri": "https://localhost:8080/"
 517 |     },
 518 |     "id": "T6dvOti89SLo",
 519 |     "outputId": "bab39e85-0757-4d7e-bb49-4df45c0e0838"
 520 |    },
 521 |    "outputs": [],
 522 |    "source": [
 523 |     "len(X)"
 524 |    ]
 525 |   },
 526 |   {
 527 |    "cell_type": "code",
 528 |    "execution_count": null,
 529 |    "metadata": {
 530 |     "colab": {
 531 |      "base_uri": "https://localhost:8080/"
 532 |     },
 533 |     "id": "TSwk0ae39SLo",
 534 |     "outputId": "f66f7609-dc2b-41da-bf6a-9d0ef89e24f5"
 535 |    },
 536 |    "outputs": [],
 537 |    "source": [
 538 |     "len(y)"
 539 |    ]
 540 |   },
 541 |   {
 542 |    "cell_type": "markdown",
 543 |    "metadata": {
 544 |     "id": "0DW4OIky9SLo"
 545 |    },
 546 |    "source": [
  547 |     "# Let's write our own logistic regression in PyTorch"
 548 |    ]
 549 |   },
 550 |   {
 551 |    "cell_type": "code",
 552 |    "execution_count": null,
 553 |    "metadata": {
 554 |     "id": "RcTuxVya9SLo"
 555 |    },
 556 |    "outputs": [],
 557 |    "source": [
 558 |     "X, y = make_blobs(\n",
 559 |     "    n_samples=200,\n",
 560 |     "    centers=((10, 5), (5, -5)),\n",
 561 |     "    n_features=2,\n",
 562 |     "    random_state=0,\n",
 563 |     "    cluster_std=3,\n",
 564 |     ")"
 565 |    ]
 566 |   },
 567 |   {
 568 |    "cell_type": "code",
 569 |    "execution_count": null,
 570 |    "metadata": {
 571 |     "colab": {
 572 |      "base_uri": "https://localhost:8080/"
 573 |     },
 574 |     "id": "l9nsTljs9SLp",
 575 |     "outputId": "d0708940-91d1-4b69-da77-6de9cf08320b"
 576 |    },
 577 |    "outputs": [],
 578 |    "source": [
 579 |     "type(X), type(y)"
 580 |    ]
 581 |   },
 582 |   {
 583 |    "cell_type": "code",
 584 |    "execution_count": null,
 585 |    "metadata": {
 586 |     "id": "dRCnYCIZ9SLp"
 587 |    },
 588 |    "outputs": [],
 589 |    "source": [
 590 |     "X = torch.tensor(X)\n",
 591 |     "y = torch.tensor(y)"
 592 |    ]
 593 |   },
 594 |   {
 595 |    "cell_type": "code",
 596 |    "execution_count": null,
 597 |    "metadata": {
 598 |     "id": "XLWyyMy29SLp"
 599 |    },
 600 |    "outputs": [],
 601 |    "source": [
  602 |     "assert isinstance(X, torch.Tensor)\n",
  603 |     "assert isinstance(y, torch.Tensor)"
 604 |    ]
 605 |   },
 606 |   {
 607 |    "cell_type": "code",
 608 |    "execution_count": null,
 609 |    "metadata": {
 610 |     "id": "d5sC_9z79SLp"
 611 |    },
 612 |    "outputs": [],
 613 |    "source": [
 614 |     "w = torch.zeros(2, requires_grad=True, dtype=torch.double)\n",
 615 |     "b = torch.zeros(1, requires_grad=True, dtype=torch.double)"
 616 |    ]
 617 |   },
 618 |   {
 619 |    "cell_type": "code",
 620 |    "execution_count": null,
 621 |    "metadata": {
 622 |     "id": "oXS_iaNV9SLq"
 623 |    },
 624 |    "outputs": [],
 625 |    "source": [
 626 |     "assert w.requires_grad\n",
 627 |     "assert len(w.shape) == 1\n",
 628 |     "assert w.shape[0] == X.shape[1]\n",
 629 |     "assert w.dtype == X.dtype\n",
 630 |     "assert b.requires_grad\n",
 631 |     "assert len(b.shape) == 1\n",
 632 |     "assert b.dtype == X.dtype"
 633 |    ]
 634 |   },
 635 |   {
 636 |    "cell_type": "code",
 637 |    "execution_count": null,
 638 |    "metadata": {
 639 |     "id": "mHrha4x49SLq"
 640 |    },
 641 |    "outputs": [],
 642 |    "source": [
 643 |     "def binary_cross_entropy(y, y_predicted):\n",
 644 |     "    \"\"\"\n",
 645 |     "    y: binary tensor, shape: N, example: [0, 1, 0, 1, 1]\n",
  646 |     "    y_predicted: tensor with values strictly between 0 and 1 (log(0) is -inf), shape: N, example: [0.2, 0.1, 0.9, 0.75, 0.999]\n",
  647 |     "\n",
  648 |     "    output: tensor of per-sample losses, shape: N\n",
 649 |     "\n",
 650 |     "    \"\"\"\n",
 651 |     "    return -(y * (torch.log(y_predicted)) + (1 - y) * torch.log(1 - y_predicted))"
 652 |    ]
 653 |   },
 654 |   {
 655 |    "cell_type": "code",
 656 |    "execution_count": null,
 657 |    "metadata": {
 658 |     "id": "Bc7UtPD-9SLq"
 659 |    },
 660 |    "outputs": [],
 661 |    "source": [
 662 |     "y_test = torch.tensor([1, 0, 1, 1])\n",
 663 |     "y_pred = torch.tensor([0.7, 0.3, 0.5, 0.9])\n",
 664 |     "bce_correct = torch.tensor([0.3567, 0.3567, 0.6931, 0.1054])\n",
 665 |     "bce_predicted = binary_cross_entropy(y_test, y_pred)\n",
 666 |     "assert bce_predicted.shape == y_test.shape\n",
 667 |     "assert torch.allclose(bce_predicted, bce_correct, rtol=1e-03)"
 668 |    ]
 669 |   },
 670 |   {
 671 |    "cell_type": "code",
 672 |    "execution_count": null,
 673 |    "metadata": {
 674 |     "colab": {
 675 |      "base_uri": "https://localhost:8080/",
 676 |      "height": 282
 677 |     },
 678 |     "id": "27EK6JmN9SLq",
 679 |     "outputId": "faa084ab-11aa-424c-bcc3-86fea3c02391",
 680 |     "scrolled": true
 681 |    },
 682 |    "outputs": [],
 683 |    "source": [
 684 |     "from IPython.display import clear_output\n",
 685 |     "\n",
 686 |     "learning_rate = 0.00001\n",
 687 |     "n_epoch = 1000\n",
 688 |     "\n",
 689 |     "for i in range(n_epoch):\n",
 690 |     "    y_pred = torch.matmul(X, w) + b\n",
 691 |     "    y_pred = torch.sigmoid(y_pred)\n",
 692 |     "    loss = binary_cross_entropy(y, y_pred)\n",
 693 |     "    loss = torch.sum(loss)\n",
 694 |     "\n",
 695 |     "    loss.backward()\n",
 696 |     "\n",
  697 |     "    with torch.no_grad():  # the parameter update must not be tracked by autograd\n",
  698 |     "        w -= learning_rate * w.grad\n",
  699 |     "        b -= learning_rate * b.grad\n",
  700 |     "    # zero gradients\n",
  701 |     "    w.grad.zero_()\n",
  702 |     "    b.grad.zero_()\n",
 703 |     "\n",
 704 |     "    if (i + 1) % 5 == 0:\n",
 705 |     "        # if True:\n",
 706 |     "        clear_output(True)\n",
 707 |     "        plt.scatter(X[:, 0], X[:, 1], marker=\"o\", c=y, s=25, edgecolor=\"k\")\n",
 708 |     "\n",
 709 |     "        w_1 = w.data[0]\n",
 710 |     "        w_2 = w.data[1]\n",
 711 |     "        w_0 = b.data[0]\n",
 712 |     "\n",
 713 |     "        x_arr = torch.linspace(-10, 20, 1000)\n",
 714 |     "        plt.plot(x_arr, -(w_0 + w_1 * x_arr) / w_2)\n",
 715 |     "        plt.show()\n",
 716 |     "\n",
 717 |     "        print(\"Iteration: {}, Loss: {}\".format(i, loss))"
 718 |    ]
 719 |   },
 720 |   {
 721 |    "cell_type": "markdown",
 722 |    "metadata": {
 723 |     "id": "dOACWrA69SLr"
 724 |    },
 725 |    "source": [
  726 |     "# Now PyTorch does everything for us"
 727 |    ]
 728 |   },
 729 |   {
 730 |    "cell_type": "code",
 731 |    "execution_count": null,
 732 |    "metadata": {
 733 |     "id": "GFipi35D9SLr"
 734 |    },
 735 |    "outputs": [],
 736 |    "source": [
 737 |     "X, y = make_blobs(\n",
 738 |     "    n_samples=200,\n",
 739 |     "    centers=((10, 5), (5, -5)),\n",
 740 |     "    n_features=2,\n",
 741 |     "    random_state=0,\n",
 742 |     "    cluster_std=3,\n",
 743 |     ")"
 744 |    ]
 745 |   },
 746 |   {
 747 |    "cell_type": "code",
 748 |    "execution_count": null,
 749 |    "metadata": {
 750 |     "id": "Wn8pm63j9SLr"
 751 |    },
 752 |    "outputs": [],
 753 |    "source": [
 754 |     "X = torch.tensor(X)\n",
 755 |     "y = torch.tensor(y)"
 756 |    ]
 757 |   },
 758 |   {
 759 |    "cell_type": "markdown",
 760 |    "metadata": {},
 761 |    "source": [
  762 |     "# A few words about optimizers"
 763 |    ]
 764 |   },
 765 |   {
 766 |    "cell_type": "markdown",
 767 |    "metadata": {},
 768 |    "source": [
 769 |     "![](https://mpopov.com/images/adam-animated.gif)"
 770 |    ]
 771 |   },
 772 |   {
 773 |    "cell_type": "markdown",
 774 |    "metadata": {},
 775 |    "source": [
 776 |     "![](https://ml-cheatsheet.readthedocs.io/en/latest/_images/optimizers.gif)"
 777 |    ]
 778 |   },
 779 |   {
 780 |    "cell_type": "markdown",
 781 |    "metadata": {},
 782 |    "source": [
 783 |     "![](https://images.viblo.asia/993c2fcc-ece4-41d8-bca7-5442e9ce3836.gif)"
 784 |    ]
 785 |   },
 786 |   {
 787 |    "cell_type": "markdown",
 788 |    "metadata": {},
 789 |    "source": [
 790 |     "![](https://camo.githubusercontent.com/cccdc200cc2a3052ad9562f6f58ebd429d634fb1190a84b20db931e8ad4ffb8f/68747470733a2f2f692e737461636b2e696d6775722e636f6d2f316f6274562e676966)"
 791 |    ]
 792 |   },
 793 |   {
 794 |    "cell_type": "markdown",
 795 |    "metadata": {
 796 |     "id": "BcPK0X9A9SLs"
 797 |    },
 798 |    "source": [
  799 |     "# Initializing the model"
 800 |    ]
 801 |   },
 802 |   {
 803 |    "cell_type": "code",
 804 |    "execution_count": null,
 805 |    "metadata": {
 806 |     "colab": {
 807 |      "base_uri": "https://localhost:8080/"
 808 |     },
 809 |     "id": "isKKf6Ph9SLs",
 810 |     "outputId": "171f5352-7f9d-4b8b-e713-9e22bed4aae4"
 811 |    },
 812 |    "outputs": [],
 813 |    "source": [
 814 |     "model = torch.nn.Linear(2, 1)\n",
 815 |     "criterion = torch.nn.BCELoss()\n",
 816 |     "\n",
 817 |     "optim = torch.optim.SGD(model.parameters(), lr=0.1)\n",
 818 |     "\n",
 819 |     "model.train()"
 820 |    ]
 821 |   },
 822 |   {
 823 |    "cell_type": "code",
 824 |    "execution_count": null,
 825 |    "metadata": {
 826 |     "colab": {
 827 |      "base_uri": "https://localhost:8080/"
 828 |     },
 829 |     "id": "G8ZZVN6p9SLs",
 830 |     "outputId": "f726ed1e-75ab-400f-a272-0b97f064c1eb"
 831 |    },
 832 |    "outputs": [],
 833 |    "source": [
 834 |     "model"
 835 |    ]
 836 |   },
 837 |   {
 838 |    "cell_type": "code",
 839 |    "execution_count": null,
 840 |    "metadata": {
 841 |     "colab": {
 842 |      "base_uri": "https://localhost:8080/"
 843 |     },
 844 |     "id": "0EJf9Aa89SLs",
 845 |     "outputId": "cb41bf52-a525-4828-a41c-72d41924471a"
 846 |    },
 847 |    "outputs": [],
 848 |    "source": [
 849 |     "model.weight"
 850 |    ]
 851 |   },
 852 |   {
 853 |    "cell_type": "code",
 854 |    "execution_count": null,
 855 |    "metadata": {
 856 |     "colab": {
 857 |      "base_uri": "https://localhost:8080/"
 858 |     },
 859 |     "id": "kZfdsGbH9SLt",
 860 |     "outputId": "2732e8ba-8903-480a-dabe-9d3686bf9bae"
 861 |    },
 862 |    "outputs": [],
 863 |    "source": [
 864 |     "model.bias"
 865 |    ]
 866 |   },
 867 |   {
 868 |    "cell_type": "code",
 869 |    "execution_count": null,
 870 |    "metadata": {
 871 |     "colab": {
 872 |      "base_uri": "https://localhost:8080/"
 873 |     },
 874 |     "id": "-8fF8Cu89SLt",
 875 |     "outputId": "ad5eb620-6ada-46e4-edea-efe10042f355"
 876 |    },
 877 |    "outputs": [],
 878 |    "source": [
 879 |     "model.weight.data.dtype"
 880 |    ]
 881 |   },
 882 |   {
 883 |    "cell_type": "code",
 884 |    "execution_count": null,
 885 |    "metadata": {
 886 |     "colab": {
 887 |      "base_uri": "https://localhost:8080/",
 888 |      "height": 282
 889 |     },
 890 |     "id": "l0giOSki9SLt",
 891 |     "outputId": "56752055-57ab-4742-bca2-62b16ee88f22"
 892 |    },
 893 |    "outputs": [],
 894 |    "source": [
 895 |     "from IPython.display import clear_output\n",
 896 |     "\n",
 897 |     "\n",
 898 |     "for i in range(1000):\n",
  899 |     "    # compute the prediction\n",
 900 |     "    y_pred = torch.sigmoid(model(X.float()))\n",
 901 |     "\n",
  902 |     "    # compute the loss\n",
 903 |     "    loss = criterion(y_pred.flatten(), y.float())\n",
 904 |     "\n",
  905 |     "    # backpropagate the gradients\n",
 906 |     "    loss.backward()\n",
 907 |     "\n",
  908 |     "    # take an optimizer step\n",
 909 |     "    optim.step()\n",
 910 |     "\n",
  911 |     "    # zero the gradients\n",
 912 |     "    optim.zero_grad()\n",
 913 |     "\n",
 914 |     "    if (i + 1) % 5 == 0:\n",
 915 |     "        clear_output(True)\n",
 916 |     "        plt.scatter(X[:, 0], X[:, 1], marker=\"o\", c=y, s=25, edgecolor=\"k\")\n",
 917 |     "\n",
 918 |     "        w_1 = model.weight.data[0][0]\n",
 919 |     "        w_2 = model.weight.data[0][1]\n",
 920 |     "        w_0 = model.bias.data[0]\n",
 921 |     "\n",
 922 |     "        x_arr = torch.linspace(-10, 20, 1000)\n",
 923 |     "        plt.plot(x_arr, -(w_0 + w_1 * x_arr) / w_2)\n",
 924 |     "        plt.show()\n",
 925 |     "\n",
 926 |     "        print(\"Iteration: {}, Loss: {}\".format(i, loss))"
 927 |    ]
 928 |   },
 929 |   {
 930 |    "cell_type": "markdown",
 931 |    "metadata": {
 932 |     "id": "PvOBE30i9SLu"
 933 |    },
 934 |    "source": [
  935 |     "# OK, now let the problem be nonlinear"
 936 |    ]
 937 |   },
 938 |   {
 939 |    "cell_type": "code",
 940 |    "execution_count": null,
 941 |    "metadata": {
 942 |     "id": "xNdKWnab9SLu"
 943 |    },
 944 |    "outputs": [],
 945 |    "source": [
 946 |     "from sklearn.datasets import make_moons"
 947 |    ]
 948 |   },
 949 |   {
 950 |    "cell_type": "code",
 951 |    "execution_count": null,
 952 |    "metadata": {
 953 |     "id": "ZMiLvnHe9SLu"
 954 |    },
 955 |    "outputs": [],
 956 |    "source": [
 957 |     "X, y = make_moons(n_samples=200, noise=0.1, random_state=17)"
 958 |    ]
 959 |   },
 960 |   {
 961 |    "cell_type": "code",
 962 |    "execution_count": null,
 963 |    "metadata": {
 964 |     "colab": {
 965 |      "base_uri": "https://localhost:8080/",
 966 |      "height": 266
 967 |     },
 968 |     "id": "iKV3vT9q9SLu",
 969 |     "outputId": "2cbe3751-a072-4293-9753-f4e70d1ea88c"
 970 |    },
 971 |    "outputs": [],
 972 |    "source": [
 973 |     "plt.scatter(X[:, 0], X[:, 1], marker=\"o\", c=y, s=25, edgecolor=\"k\")\n",
 974 |     "\n",
 975 |     "X = torch.tensor(X)\n",
 976 |     "y = torch.tensor(y)"
 977 |    ]
 978 |   },
 979 |   {
 980 |    "cell_type": "code",
 981 |    "execution_count": null,
 982 |    "metadata": {
 983 |     "colab": {
 984 |      "base_uri": "https://localhost:8080/"
 985 |     },
 986 |     "id": "A-suQlRt9SLu",
 987 |     "outputId": "af3e8cc1-747e-4195-87a6-a4fe7d01689a"
 988 |    },
 989 |    "outputs": [],
 990 |    "source": [
 991 |     "model = torch.nn.Linear(2, 1)\n",
 992 |     "criterion = torch.nn.BCELoss()\n",
 993 |     "\n",
 994 |     "optim = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)\n",
 995 |     "\n",
 996 |     "model.train()"
 997 |    ]
 998 |   },
 999 |   {
1000 |    "cell_type": "code",
1001 |    "execution_count": null,
1002 |    "metadata": {
1003 |     "colab": {
1004 |      "base_uri": "https://localhost:8080/",
1005 |      "height": 283
1006 |     },
1007 |     "id": "gF_WdOl29SLv",
1008 |     "outputId": "9b84241b-eb28-4ff2-ddef-6d883613de7e"
1009 |    },
1010 |    "outputs": [],
1011 |    "source": [
1012 |     "from IPython.display import clear_output\n",
1013 |     "\n",
1014 |     "\n",
1015 |     "for i in range(30000):\n",
1016 |     "\n",
1017 |     "    y_pred = torch.sigmoid(model(X.float()))\n",
1018 |     "\n",
1019 |     "    loss = criterion(y_pred.flatten(), y.float())\n",
1020 |     "\n",
1021 |     "    loss.backward()\n",
1022 |     "\n",
1023 |     "    optim.step()\n",
1024 |     "    optim.zero_grad()\n",
1025 |     "\n",
1026 |     "    if (i + 1) % 500 == 0:\n",
1027 |     "        clear_output(True)\n",
1028 |     "        plt.scatter(X[:, 0], X[:, 1], marker=\"o\", c=y, s=25, edgecolor=\"k\")\n",
1029 |     "\n",
1030 |     "        w_1 = model.weight.data[0][0]\n",
1031 |     "        w_2 = model.weight.data[0][1]\n",
1032 |     "        w_0 = model.bias.data[0]\n",
1033 |     "\n",
1034 |     "        x_arr = torch.linspace(-2, 2, 1000)\n",
1035 |     "        plt.plot(x_arr, -(w_0 + w_1 * x_arr) / w_2)\n",
1036 |     "        plt.show()\n",
1037 |     "\n",
1038 |     "        print(\"Iteration: {}, Loss: {}\".format(i, loss))"
1039 |    ]
1040 |   },
1041 |   {
1042 |    "cell_type": "markdown",
1043 |    "metadata": {
1044 |     "id": "WevXy3M99SLv"
1045 |    },
1046 |    "source": [
 1047 |     "# Visualizing the decision surface"
1048 |    ]
1049 |   },
1050 |   {
1051 |    "cell_type": "code",
1052 |    "execution_count": null,
1053 |    "metadata": {
1054 |     "id": "hnCBUkTa9SLv"
1055 |    },
1056 |    "outputs": [],
1057 |    "source": [
1058 |     "h = 0.02  # step size in the mesh\n",
1059 |     "cm = plt.cm.RdBu\n",
1060 |     "x_min, x_max = X[:, 0].min() - 0.5, X[:, 0].max() + 0.5\n",
1061 |     "y_min, y_max = X[:, 1].min() - 0.5, X[:, 1].max() + 0.5\n",
1062 |     "xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))\n",
1063 |     "\n",
1064 |     "input_tensor = torch.from_numpy(np.c_[xx.ravel(), yy.ravel()]).float()\n",
1065 |     "model.eval()\n",
1066 |     "Z = torch.sigmoid(model(input_tensor))\n",
1067 |     "Z = Z.reshape(xx.shape)"
1068 |    ]
1069 |   },
1070 |   {
1071 |    "cell_type": "code",
1072 |    "execution_count": null,
1073 |    "metadata": {
1074 |     "colab": {
1075 |      "base_uri": "https://localhost:8080/",
1076 |      "height": 265
1077 |     },
1078 |     "id": "nyNauPQp9SLw",
1079 |     "outputId": "ca257bc2-d4b6-4e79-ee7b-948a2d314d42"
1080 |    },
1081 |    "outputs": [],
1082 |    "source": [
1083 |     "plt.scatter(X[:, 0], X[:, 1], marker=\"o\", c=y, s=25, edgecolor=\"k\")\n",
1084 |     "\n",
1085 |     "plt.contourf(xx, yy, Z.detach().numpy(), cmap=cm, alpha=0.8)\n",
1086 |     "\n",
1087 |     "plt.show()"
1088 |    ]
1089 |   },
1090 |   {
1091 |    "cell_type": "markdown",
1092 |    "metadata": {
1093 |     "id": "UddJchKW9SLw"
1094 |    },
1095 |    "source": [
 1096 |     "# Making the model more complex"
1097 |    ]
1098 |   },
1099 |   {
1100 |    "cell_type": "code",
1101 |    "execution_count": null,
1102 |    "metadata": {
1103 |     "colab": {
1104 |      "base_uri": "https://localhost:8080/",
1105 |      "height": 355
1106 |     },
1107 |     "id": "x69Nn_YN9SLw",
1108 |     "outputId": "2ca4de0c-a95f-4786-f8f1-e529f403fd81"
1109 |    },
1110 |    "outputs": [],
1111 |    "source": [
1112 |     "plt.scatter(X[:, 0], X[:, 1], marker=\"o\", c=y, s=25, edgecolor=\"k\")\n",
1113 |     "\n",
 1114 |     "X = torch.as_tensor(X)  # no copy and no warning if X is already a tensor\n",
 1115 |     "y = torch.as_tensor(y)"
1116 |    ]
1117 |   },
1118 |   {
1119 |    "cell_type": "code",
1120 |    "execution_count": null,
1121 |    "metadata": {
1122 |     "id": "dUhEBnzI9SLx"
1123 |    },
1124 |    "outputs": [],
1125 |    "source": [
1126 |     "from torch import nn"
1127 |    ]
1128 |   },
1129 |   {
1130 |    "cell_type": "markdown",
1131 |    "metadata": {
1132 |     "id": "aFhGOXiu9SLx"
1133 |    },
1134 |    "source": [
 1135 |     "# Build a two-layer model"
1136 |    ]
1137 |   },
1138 |   {
1139 |    "cell_type": "code",
1140 |    "execution_count": null,
1141 |    "metadata": {
1142 |     "id": "wfPiEfSU9SLx"
1143 |    },
1144 |    "outputs": [],
1145 |    "source": [
1146 |     "model = nn.Sequential()\n",
1147 |     "model.add_module(\"first\", nn.Linear(2, 2))\n",
1148 |     "model.add_module(\"first_activation\", nn.Sigmoid())\n",
1149 |     "model.add_module(\"second\", nn.Linear(2, 1))"
1150 |    ]
1151 |   },
1152 |   {
1153 |    "cell_type": "code",
1154 |    "execution_count": null,
1155 |    "metadata": {
1156 |     "colab": {
1157 |      "base_uri": "https://localhost:8080/"
1158 |     },
1159 |     "id": "owO-1I-J9SLy",
1160 |     "outputId": "4c346e55-b4f0-4c45-aee8-72827c80c6b2"
1161 |    },
1162 |    "outputs": [],
1163 |    "source": [
1164 |     "criterion = torch.nn.BCELoss()\n",
1165 |     "\n",
1166 |     "optim = torch.optim.SGD(model.parameters(), lr=2.0)\n",
1167 |     "\n",
1168 |     "model.train()"
1169 |    ]
1170 |   },
1171 |   {
1172 |    "cell_type": "code",
1173 |    "execution_count": null,
1174 |    "metadata": {
1175 |     "colab": {
1176 |      "base_uri": "https://localhost:8080/",
1177 |      "height": 282
1178 |     },
1179 |     "id": "RquUN5YP9SLy",
1180 |     "outputId": "ab12b8c6-00c8-4a99-ead9-ffef5eb71d88"
1181 |    },
1182 |    "outputs": [],
1183 |    "source": [
1184 |     "from IPython.display import clear_output\n",
1185 |     "\n",
1186 |     "\n",
1187 |     "h = 0.02  # step size in the mesh\n",
1188 |     "cm = plt.cm.RdBu\n",
1189 |     "x_min, x_max = X[:, 0].min() - 0.5, X[:, 0].max() + 0.5\n",
1190 |     "y_min, y_max = X[:, 1].min() - 0.5, X[:, 1].max() + 0.5\n",
1191 |     "xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))\n",
1192 |     "\n",
1193 |     "\n",
1194 |     "model.train()\n",
1195 |     "for i in range(30000):\n",
1196 |     "\n",
1197 |     "    y_pred = torch.sigmoid(model(X.float()))\n",
1198 |     "\n",
1199 |     "    loss = criterion(y_pred.flatten(), y.float())\n",
1200 |     "\n",
1201 |     "    loss.backward()\n",
1202 |     "\n",
1203 |     "    optim.step()\n",
1204 |     "    optim.zero_grad()\n",
1205 |     "    if (i + 1) % 500 == 0:\n",
1206 |     "        clear_output(True)\n",
1207 |     "        input_tensor = torch.from_numpy(np.c_[xx.ravel(), yy.ravel()]).float()\n",
1208 |     "        Z = torch.sigmoid(model(input_tensor))\n",
1209 |     "        Z = Z.reshape(xx.shape)\n",
1210 |     "\n",
1211 |     "        plt.scatter(X[:, 0], X[:, 1], marker=\"o\", c=y, s=25, edgecolor=\"k\")\n",
1212 |     "\n",
1213 |     "        plt.contourf(xx, yy, Z.detach().numpy(), cmap=cm, alpha=0.8)\n",
1214 |     "\n",
1215 |     "        plt.show()\n",
1216 |     "\n",
1217 |     "        print(\"Iteration: {}, Loss: {}\".format(i, loss))"
1218 |    ]
1219 |   },
1220 |   {
1221 |    "cell_type": "markdown",
1222 |    "metadata": {
1223 |     "id": "qN9h6lYI9SLy"
1224 |    },
1225 |    "source": [
1226 |     "# Make the model more complex (add more layers)"
1227 |    ]
1228 |   },
1229 |   {
1230 |    "cell_type": "code",
1231 |    "execution_count": null,
1232 |    "metadata": {
1233 |     "id": "79UD5nPe9SLy"
1234 |    },
1235 |    "outputs": [],
1236 |    "source": [
1237 |     "model = nn.Sequential()\n",
1238 |     "model.add_module(\"first\", nn.Linear(2, 2))\n",
1239 |     "model.add_module(\"first_activation\", nn.Sigmoid())\n",
1240 |     "model.add_module(\"second\", nn.Linear(2, 2))\n",
1241 |     "model.add_module(\"second_activation\", nn.Sigmoid())\n",
1242 |     "model.add_module(\"third\", nn.Linear(2, 1))"
1243 |    ]
1244 |   },
1245 |   {
1246 |    "cell_type": "code",
1247 |    "execution_count": null,
1248 |    "metadata": {
1249 |     "colab": {
1250 |      "base_uri": "https://localhost:8080/"
1251 |     },
1252 |     "id": "IAxchrKd9SLz",
1253 |     "outputId": "ca800af1-ecb9-407b-d091-a700f66b5c1e"
1254 |    },
1255 |    "outputs": [],
1256 |    "source": [
1257 |     "criterion = torch.nn.BCELoss()\n",
1258 |     "\n",
1259 |     "optim = torch.optim.SGD(model.parameters(), lr=2.0, momentum=0.9)\n",
1260 |     "\n",
1261 |     "model.train()"
1262 |    ]
1263 |   },
1264 |   {
1265 |    "cell_type": "code",
1266 |    "execution_count": null,
1267 |    "metadata": {
1268 |     "colab": {
1269 |      "base_uri": "https://localhost:8080/",
1270 |      "height": 282
1271 |     },
1272 |     "id": "K7n0LIa09SLz",
1273 |     "outputId": "ca69ae43-9dd6-4ee3-cd12-ff7bdb514e12",
1274 |     "scrolled": false
1275 |    },
1276 |    "outputs": [],
1277 |    "source": [
1278 |     "from IPython.display import clear_output\n",
1279 |     "\n",
1280 |     "\n",
1281 |     "h = 0.02  # step size in the mesh\n",
1282 |     "cm = plt.cm.RdBu\n",
1283 |     "x_min, x_max = X[:, 0].min() - 0.5, X[:, 0].max() + 0.5\n",
1284 |     "y_min, y_max = X[:, 1].min() - 0.5, X[:, 1].max() + 0.5\n",
1285 |     "xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))\n",
1286 |     "\n",
1287 |     "\n",
1288 |     "model.train()\n",
1289 |     "for i in range(30000):\n",
1290 |     "\n",
1291 |     "    y_pred = torch.sigmoid(model(X.float()))\n",
1292 |     "\n",
1293 |     "    loss = criterion(y_pred.flatten(), y.float())\n",
1294 |     "\n",
1295 |     "    loss.backward()\n",
1296 |     "\n",
1297 |     "    optim.step()\n",
1298 |     "    optim.zero_grad()\n",
1299 |     "    if (i + 1) % 500 == 0:\n",
1300 |     "        clear_output(True)\n",
1301 |     "        input_tensor = torch.from_numpy(np.c_[xx.ravel(), yy.ravel()]).float()\n",
1302 |     "        Z = torch.sigmoid(model(input_tensor))\n",
1303 |     "        Z = Z.reshape(xx.shape)\n",
1304 |     "\n",
1305 |     "        plt.scatter(X[:, 0], X[:, 1], marker=\"o\", c=y, s=25, edgecolor=\"k\")\n",
1306 |     "\n",
1307 |     "        plt.contourf(xx, yy, Z.detach().numpy(), cmap=cm, alpha=0.8)\n",
1308 |     "\n",
1309 |     "        plt.show()\n",
1310 |     "\n",
1311 |     "        print(\"Iteration: {}, Loss: {}\".format(i, loss))"
1312 |    ]
1313 |   },
1314 |   {
1315 |    "cell_type": "markdown",
1316 |    "metadata": {
1317 |     "id": "OWZ6cFnP9SLz"
1318 |    },
1319 |    "source": [
1320 |     "# Make the model even more complex (widen the hidden layers)"
1321 |    ]
1322 |   },
1323 |   {
1324 |    "cell_type": "code",
1325 |    "execution_count": null,
1326 |    "metadata": {
1327 |     "id": "weq1btVh9SL0"
1328 |    },
1329 |    "outputs": [],
1330 |    "source": [
1331 |     "model = nn.Sequential()\n",
1332 |     "model.add_module(\"first\", nn.Linear(2, 5))\n",
1333 |     "model.add_module(\"first_activation\", nn.Sigmoid())\n",
1334 |     "model.add_module(\"second\", nn.Linear(5, 5))\n",
1335 |     "model.add_module(\"second_activation\", nn.Sigmoid())\n",
1336 |     "model.add_module(\"third\", nn.Linear(5, 1))\n",
1337 |     "model.add_module(\"third_activation\", nn.Sigmoid())"
1338 |    ]
1339 |   },
1340 |   {
1341 |    "cell_type": "code",
1342 |    "execution_count": null,
1343 |    "metadata": {
1344 |     "colab": {
1345 |      "base_uri": "https://localhost:8080/"
1346 |     },
1347 |     "id": "R9E6zP6p9SL0",
1348 |     "outputId": "0e5cffdc-3b66-4ab7-9819-6a69654ac445"
1349 |    },
1350 |    "outputs": [],
1351 |    "source": [
1352 |     "criterion = torch.nn.BCELoss()\n",
1353 |     "\n",
1354 |     "optim = torch.optim.SGD(model.parameters(), lr=2, momentum=0.9)\n",
1355 |     "\n",
1356 |     "model.train()\n",
1357 |     "\n",
1358 |     "# model.to('cuda:2')"
1359 |    ]
1360 |   },
1361 |   {
1362 |    "cell_type": "code",
1363 |    "execution_count": null,
1364 |    "metadata": {
1365 |     "colab": {
1366 |      "base_uri": "https://localhost:8080/",
1367 |      "height": 282
1368 |     },
1369 |     "id": "XUWTN8rr9SL0",
1370 |     "outputId": "7e5358e6-247b-4697-98f0-100384573ebc"
1371 |    },
1372 |    "outputs": [],
1373 |    "source": [
1374 |     "from IPython.display import clear_output\n",
1375 |     "\n",
1376 |     "\n",
1377 |     "h = 0.02  # step size in the mesh\n",
1378 |     "cm = plt.cm.RdBu\n",
1379 |     "x_min, x_max = X[:, 0].min() - 0.5, X[:, 0].max() + 0.5\n",
1380 |     "y_min, y_max = X[:, 1].min() - 0.5, X[:, 1].max() + 0.5\n",
1381 |     "xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))\n",
1382 |     "\n",
1383 |     "\n",
1384 |     "model.train()\n",
1385 |     "for i in range(3000):\n",
1386 |     "\n",
1387 |     "    y_pred = model(X.float())\n",
1388 |     "\n",
1389 |     "    loss = criterion(y_pred.flatten(), y.float())\n",
1390 |     "\n",
1391 |     "    loss.backward()\n",
1392 |     "\n",
1393 |     "    optim.step()\n",
1394 |     "    optim.zero_grad()\n",
1395 |     "    if (i + 1) % 50 == 0:\n",
1396 |     "        clear_output(True)\n",
1397 |     "        input_tensor = torch.from_numpy(np.c_[xx.ravel(), yy.ravel()]).float()\n",
1398 |     "        Z = model(input_tensor)  # the model already ends with a Sigmoid layer\n",
1399 |     "        Z = Z.reshape(xx.shape)\n",
1400 |     "\n",
1401 |     "        plt.scatter(X[:, 0], X[:, 1], marker=\"o\", c=y, s=25, edgecolor=\"k\")\n",
1402 |     "\n",
1403 |     "        plt.contourf(xx, yy, Z.detach().numpy(), cmap=cm, alpha=0.8)\n",
1404 |     "\n",
1405 |     "        plt.show()\n",
1406 |     "\n",
1407 |     "        print(\"Iteration: {}, Loss: {}\".format(i, loss))"
1408 |    ]
1409 |   },
1410 |   {
1411 |    "cell_type": "markdown",
1412 |    "metadata": {
1413 |     "id": "QvByiISS9SL0"
1414 |    },
1415 |    "source": [
1416 |     "# Move to more than 2 classes"
1417 |    ]
1418 |   },
1419 |   {
1420 |    "cell_type": "code",
1421 |    "execution_count": null,
1422 |    "metadata": {
1423 |     "id": "HKn510MK9SL1"
1424 |    },
1425 |    "outputs": [],
1426 |    "source": [
1427 |     "from sklearn.datasets import make_blobs, make_circles"
1428 |    ]
1429 |   },
1430 |   {
1431 |    "cell_type": "code",
1432 |    "execution_count": null,
1433 |    "metadata": {
1434 |     "id": "swhmG6--9SL1"
1435 |    },
1436 |    "outputs": [],
1437 |    "source": [
1438 |     "# blobs with varied variances\n",
1439 |     "X, y = make_blobs(n_samples=400, cluster_std=[1.0, 1.5, 0.5], random_state=17)"
1440 |    ]
1441 |   },
1442 |   {
1443 |    "cell_type": "code",
1444 |    "execution_count": null,
1445 |    "metadata": {
1446 |     "colab": {
1447 |      "base_uri": "https://localhost:8080/",
1448 |      "height": 265
1449 |     },
1450 |     "id": "pbgyFYMl9SL1",
1451 |     "outputId": "26804b23-87cd-408a-def5-fd296742302d"
1452 |    },
1453 |    "outputs": [],
1454 |    "source": [
1455 |     "plt.scatter(X[:, 0], X[:, 1], marker=\"o\", c=y, s=25, edgecolor=\"k\")\n",
1456 |     "\n",
1457 |     "X = torch.tensor(X)\n",
1458 |     "y = torch.tensor(y)"
1459 |    ]
1460 |   },
1461 |   {
1462 |    "cell_type": "code",
1463 |    "execution_count": null,
1464 |    "metadata": {
1465 |     "id": "YE3gnkkN9SL1"
1466 |    },
1467 |    "outputs": [],
1468 |    "source": [
1469 |     "model = nn.Sequential()\n",
1470 |     "model.add_module(\"first\", nn.Linear(2, 5))\n",
1471 |     "model.add_module(\"first_activation\", nn.Sigmoid())\n",
1472 |     "model.add_module(\"second\", nn.Linear(5, 5))\n",
1473 |     "model.add_module(\"second_activation\", nn.Sigmoid())\n",
1474 |     "model.add_module(\"third\", nn.Linear(5, 3))"
1475 |    ]
1476 |   },
1477 |   {
1478 |    "cell_type": "code",
1479 |    "execution_count": null,
1480 |    "metadata": {
1481 |     "colab": {
1482 |      "base_uri": "https://localhost:8080/"
1483 |     },
1484 |     "id": "tqBQn5yW9SL2",
1485 |     "outputId": "fc339779-bee2-4cc5-aa91-ec285e824bc1"
1486 |    },
1487 |    "outputs": [],
1488 |    "source": [
1489 |     "criterion = torch.nn.CrossEntropyLoss()\n",
1490 |     "\n",
1491 |     "optim = torch.optim.SGD(model.parameters(), lr=0.1)\n",
1492 |     "\n",
1493 |     "model.train()"
1494 |    ]
1495 |   },
1496 |   {
1497 |    "cell_type": "code",
1498 |    "execution_count": null,
1499 |    "metadata": {
1500 |     "colab": {
1501 |      "base_uri": "https://localhost:8080/",
1502 |      "height": 282
1503 |     },
1504 |     "id": "KCBrFli99SL2",
1505 |     "outputId": "8f15148e-39a3-436b-f424-2e03d696b914"
1506 |    },
1507 |    "outputs": [],
1508 |    "source": [
1509 |     "from IPython.display import clear_output\n",
1510 |     "\n",
1511 |     "\n",
1512 |     "h = 0.02  # step size in the mesh\n",
1513 |     "cm = plt.cm.RdBu\n",
1514 |     "x_min, x_max = X[:, 0].min() - 0.5, X[:, 0].max() + 0.5\n",
1515 |     "y_min, y_max = X[:, 1].min() - 0.5, X[:, 1].max() + 0.5\n",
1516 |     "xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))\n",
1517 |     "\n",
1518 |     "model.train()\n",
1519 |     "for i in range(1000):\n",
1520 |     "\n",
1521 |     "    y_pred = model(X.float())\n",
1522 |     "\n",
1523 |     "    loss = criterion(y_pred, y.long())\n",
1524 |     "\n",
1525 |     "    loss.backward()\n",
1526 |     "\n",
1527 |     "    optim.step()\n",
1528 |     "    optim.zero_grad()\n",
1529 |     "    if (i + 1) % 25 == 0:\n",
1530 |     "        clear_output(True)\n",
1531 |     "        input_tensor = torch.from_numpy(np.c_[xx.ravel(), yy.ravel()]).float()\n",
1532 |     "        Z = model(input_tensor)\n",
1533 |     "        Z = torch.argmax(Z, axis=1)\n",
1534 |     "        Z = Z.reshape(xx.shape)\n",
1535 |     "\n",
1536 |     "        plt.scatter(X[:, 0], X[:, 1], marker=\"o\", c=y, s=25, edgecolor=\"k\")\n",
1537 |     "\n",
1538 |     "        plt.contourf(xx, yy, Z.detach().numpy(), cmap=cm, alpha=0.8)\n",
1539 |     "\n",
1540 |     "        plt.show()\n",
1541 |     "\n",
1542 |     "        print(\"Iteration: {}, Loss: {}\".format(i, loss))"
1543 |    ]
1544 |   },
1545 |   {
1546 |    "cell_type": "markdown",
1547 |    "metadata": {},
1548 |    "source": [
1549 |     "# Real data"
1550 |    ]
1551 |   },
1552 |   {
1553 |    "cell_type": "code",
1554 |    "execution_count": null,
1555 |    "metadata": {},
1556 |    "outputs": [],
1557 |    "source": [
1558 |     "from torch.utils.data import Dataset\n",
1559 |     "from torchvision import datasets\n",
1560 |     "from torchvision.transforms import ToTensor, Normalize, Lambda"
1561 |    ]
1562 |   },
1563 |   {
1564 |    "cell_type": "markdown",
1565 |    "metadata": {},
1566 |    "source": [
1567 |     "# 1. Load the data"
1568 |    ]
1569 |   },
1570 |   {
1571 |    "cell_type": "code",
1572 |    "execution_count": null,
1573 |    "metadata": {},
1574 |    "outputs": [],
1575 |    "source": [
1576 |     "data_folderpath = \"../../data\"\n",
1577 |     "training_data = datasets.FashionMNIST(\n",
1578 |     "    root=data_folderpath, train=True, download=True, transform=ToTensor()\n",
1579 |     ")\n",
1580 |     "\n",
1581 |     "test_data = datasets.FashionMNIST(\n",
1582 |     "    root=data_folderpath, train=False, download=True, transform=ToTensor()\n",
1583 |     ")"
1584 |    ]
1585 |   },
1586 |   {
1587 |    "cell_type": "code",
1588 |    "execution_count": null,
1589 |    "metadata": {},
1590 |    "outputs": [],
1591 |    "source": [
1592 |     "labels_map = {\n",
1593 |     "    0: \"T-Shirt\",\n",
1594 |     "    1: \"Trouser\",\n",
1595 |     "    2: \"Pullover\",\n",
1596 |     "    3: \"Dress\",\n",
1597 |     "    4: \"Coat\",\n",
1598 |     "    5: \"Sandal\",\n",
1599 |     "    6: \"Shirt\",\n",
1600 |     "    7: \"Sneaker\",\n",
1601 |     "    8: \"Bag\",\n",
1602 |     "    9: \"Ankle Boot\",\n",
1603 |     "}\n",
1604 |     "figure = plt.figure(figsize=(8, 8))\n",
1605 |     "cols, rows = 3, 3\n",
1606 |     "for i in range(1, cols * rows + 1):\n",
1607 |     "    sample_idx = torch.randint(len(training_data), size=(1,)).item()\n",
1608 |     "    img, label = training_data[sample_idx]\n",
1609 |     "    figure.add_subplot(rows, cols, i)\n",
1610 |     "    plt.title(labels_map[label])\n",
1611 |     "    plt.axis(\"off\")\n",
1612 |     "    plt.imshow(img.squeeze(), cmap=\"gray\")\n",
1613 |     "plt.show()"
1614 |    ]
1615 |   },
1616 |   {
1617 |    "cell_type": "code",
1618 |    "execution_count": null,
1619 |    "metadata": {},
1620 |    "outputs": [],
1621 |    "source": [
1622 |     "X_sample, y_sample = training_data[0]\n",
1623 |     "print(X_sample)\n",
1624 |     "print(y_sample)"
1625 |    ]
1626 |   },
1627 |   {
1628 |    "cell_type": "code",
1629 |    "execution_count": null,
1630 |    "metadata": {},
1631 |    "outputs": [],
1632 |    "source": [
1633 |     "X_sample.shape"
1634 |    ]
1635 |   },
1636 |   {
1637 |    "cell_type": "markdown",
1638 |    "metadata": {},
1639 |    "source": [
1640 |     "# Create a custom dataset"
1641 |    ]
1642 |   },
1643 |   {
1644 |    "cell_type": "code",
1645 |    "execution_count": null,
1646 |    "metadata": {},
1647 |    "outputs": [],
1648 |    "source": [
1649 |     "import pandas as pd\n",
1650 |     "import numpy as np\n",
1651 |     "\n",
1652 |     "\n",
1653 |     "class CustomImageDataset(Dataset):\n",
1654 |     "    def __init__(self, data_file, transform=None, target_transform=None):\n",
1655 |     "        self.data = pd.read_csv(data_file)\n",
1656 |     "        self.transform = transform\n",
1657 |     "        self.target_transform = target_transform\n",
1658 |     "\n",
1659 |     "    def __len__(self):\n",
1660 |     "        return self.data.shape[0]\n",
1661 |     "\n",
1662 |     "    def __getitem__(self, idx):\n",
1663 |     "        image = self.data.iloc[idx, 1:].values.reshape(28, 28).astype(np.uint8)\n",
1664 |     "        label = self.data.iloc[idx, 0]\n",
1665 |     "        if self.transform:\n",
1666 |     "            image = self.transform(image)\n",
1667 |     "        if self.target_transform:\n",
1668 |     "            label = self.target_transform(label)\n",
1669 |     "        return image, label"
1670 |    ]
1671 |   },
1672 |   {
1673 |    "cell_type": "code",
1674 |    "execution_count": null,
1675 |    "metadata": {},
1676 |    "outputs": [],
1677 |    "source": [
1678 |     "train_data_url = \"https://media.githubusercontent.com/media/fpleoni/fashion_mnist/master/fashion-mnist_train.csv\"\n",
1679 |     "test_data_url = \"https://media.githubusercontent.com/media/fpleoni/fashion_mnist/master/fashion-mnist_test.csv\""
1680 |    ]
1681 |   },
1682 |   {
1683 |    "cell_type": "code",
1684 |    "execution_count": null,
1685 |    "metadata": {},
1686 |    "outputs": [],
1687 |    "source": [
1688 |     "train = CustomImageDataset(\n",
1689 |     "    train_data_url,\n",
1690 |     "    transform=ToTensor(),\n",
1691 |     "    target_transform=Lambda(\n",
1692 |     "        lambda y: torch.zeros(10, dtype=torch.float).scatter_(\n",
1693 |     "            0, torch.tensor(y), value=1\n",
1694 |     "        )\n",
1695 |     "    ),\n",
1696 |     ")\n",
1697 |     "test = CustomImageDataset(\n",
1698 |     "    test_data_url,\n",
1699 |     "    transform=ToTensor(),\n",
1700 |     "    target_transform=Lambda(\n",
1701 |     "        lambda y: torch.zeros(10, dtype=torch.float).scatter_(\n",
1702 |     "            0, torch.tensor(y), value=1\n",
1703 |     "        )\n",
1704 |     "    ),\n",
1705 |     ")"
1706 |    ]
1707 |   },
1708 |   {
1709 |    "cell_type": "code",
1710 |    "execution_count": null,
1711 |    "metadata": {},
1712 |    "outputs": [],
1713 |    "source": [
1714 |     "X_sample, y_sample = train[0]"
1715 |    ]
1716 |   },
1717 |   {
1718 |    "cell_type": "code",
1719 |    "execution_count": null,
1720 |    "metadata": {},
1721 |    "outputs": [],
1722 |    "source": [
1723 |     "X_sample"
1724 |    ]
1725 |   },
1726 |   {
1727 |    "cell_type": "code",
1728 |    "execution_count": null,
1729 |    "metadata": {},
1730 |    "outputs": [],
1731 |    "source": [
1732 |     "X_sample.shape"
1733 |    ]
1734 |   },
1735 |   {
1736 |    "cell_type": "code",
1737 |    "execution_count": null,
1738 |    "metadata": {},
1739 |    "outputs": [],
1740 |    "source": [
1741 |     "y_sample"
1742 |    ]
1743 |   },
1744 |   {
1745 |    "cell_type": "code",
1746 |    "execution_count": null,
1747 |    "metadata": {},
1748 |    "outputs": [],
1749 |    "source": [
1750 |     "y_sample.shape"
1751 |    ]
1752 |   },
1753 |   {
1754 |    "cell_type": "markdown",
1755 |    "metadata": {},
1756 |    "source": [
1757 |     "# Getting to know DataLoaders"
1758 |    ]
1759 |   },
1760 |   {
1761 |    "cell_type": "code",
1762 |    "execution_count": null,
1763 |    "metadata": {},
1764 |    "outputs": [],
1765 |    "source": [
1766 |     "from torch.utils.data import DataLoader\n",
1767 |     "\n",
1768 |     "train_dataloader = DataLoader(train, batch_size=64, shuffle=True)\n",
1769 |     "test_dataloader = DataLoader(test, batch_size=64, shuffle=True)"
1770 |    ]
1771 |   },
1772 |   {
1773 |    "cell_type": "code",
1774 |    "execution_count": null,
1775 |    "metadata": {},
1776 |    "outputs": [],
1777 |    "source": [
1778 |     "train_features, train_labels = next(iter(train_dataloader))\n",
1779 |     "print(f\"Feature batch shape: {train_features.size()}\")\n",
1780 |     "print(f\"Labels batch shape: {train_labels.size()}\")\n",
1781 |     "img = train_features[0].squeeze()\n",
1782 |     "label = train_labels[0]\n",
1783 |     "plt.imshow(img, cmap=\"gray\")\n",
1784 |     "plt.show()\n",
1785 |     "print(f\"Label: {label}\")"
1786 |    ]
1787 |   },
1788 |   {
1789 |    "cell_type": "markdown",
1790 |    "metadata": {},
1791 |    "source": [
1792 |     "# 2. Build the neural network"
1793 |    ]
1794 |   },
1795 |   {
1796 |    "cell_type": "code",
1797 |    "execution_count": null,
1798 |    "metadata": {},
1799 |    "outputs": [],
1800 |    "source": [
1801 |     "device = \"cuda\" if torch.cuda.is_available() else \"cpu\"\n",
1802 |     "print(f\"Using {device} device\")"
1803 |    ]
1804 |   },
1805 |   {
1806 |    "cell_type": "code",
1807 |    "execution_count": null,
1808 |    "metadata": {},
1809 |    "outputs": [],
1810 |    "source": [
1811 |     "from torch import nn\n",
1812 |     "\n",
1813 |     "\n",
1814 |     "class NeuralNetwork(nn.Module):\n",
1815 |     "    def __init__(self):\n",
1816 |     "        super(NeuralNetwork, self).__init__()\n",
1817 |     "        self.flatten = nn.Flatten()\n",
1818 |     "        self.linear_relu_stack = nn.Sequential(\n",
1819 |     "            nn.Linear(28 * 28, 512),\n",
1820 |     "            nn.ReLU(),\n",
1821 |     "            nn.Linear(512, 512),\n",
1822 |     "            nn.ReLU(),\n",
1823 |     "            nn.Linear(512, 10),\n",
1824 |     "        )\n",
1825 |     "\n",
1826 |     "    def forward(self, x):\n",
1827 |     "        x = self.flatten(x)\n",
1828 |     "        logits = self.linear_relu_stack(x)\n",
1829 |     "        return logits"
1830 |    ]
1831 |   },
1832 |   {
1833 |    "cell_type": "code",
1834 |    "execution_count": null,
1835 |    "metadata": {},
1836 |    "outputs": [],
1837 |    "source": [
1838 |     "model = NeuralNetwork().to(device)\n",
1839 |     "print(model)"
1840 |    ]
1841 |   },
1842 |   {
1843 |    "cell_type": "code",
1844 |    "execution_count": null,
1845 |    "metadata": {},
1846 |    "outputs": [],
1847 |    "source": [
1848 |     "X = torch.rand(1, 28, 28, device=device)\n",
1849 |     "logits = model(X)\n",
1850 |     "pred_probab = nn.Softmax(dim=1)(logits)\n",
1851 |     "y_pred = pred_probab.argmax(1)\n",
1852 |     "print(f\"Predicted class: {y_pred}\")"
1853 |    ]
1854 |   },
1855 |   {
1856 |    "cell_type": "markdown",
1857 |    "metadata": {},
1858 |    "source": [
1859 |     "# A closer look at the network layers"
1860 |    ]
1861 |   },
1862 |   {
1863 |    "cell_type": "code",
1864 |    "execution_count": null,
1865 |    "metadata": {},
1866 |    "outputs": [],
1867 |    "source": [
1868 |     "input_image = torch.rand(3, 28, 28)\n",
1869 |     "print(input_image.size())"
1870 |    ]
1871 |   },
1872 |   {
1873 |    "cell_type": "code",
1874 |    "execution_count": null,
1875 |    "metadata": {},
1876 |    "outputs": [],
1877 |    "source": [
1878 |     "input_image"
1879 |    ]
1880 |   },
1881 |   {
1882 |    "cell_type": "code",
1883 |    "execution_count": null,
1884 |    "metadata": {},
1885 |    "outputs": [],
1886 |    "source": [
1887 |     "flatten = nn.Flatten()\n",
1888 |     "flat_image = flatten(input_image)\n",
1889 |     "print(flat_image.size())"
1890 |    ]
1891 |   },
1892 |   {
1893 |    "cell_type": "code",
1894 |    "execution_count": null,
1895 |    "metadata": {},
1896 |    "outputs": [],
1897 |    "source": [
1898 |     "flat_image"
1899 |    ]
1900 |   },
1901 |   {
1902 |    "cell_type": "code",
1903 |    "execution_count": null,
1904 |    "metadata": {},
1905 |    "outputs": [],
1906 |    "source": [
1907 |     "layer1 = nn.Linear(in_features=28 * 28, out_features=20)\n",
1908 |     "hidden1 = layer1(flat_image)\n",
1909 |     "print(hidden1.size())"
1910 |    ]
1911 |   },
1912 |   {
1913 |    "cell_type": "code",
1914 |    "execution_count": null,
1915 |    "metadata": {},
1916 |    "outputs": [],
1917 |    "source": [
1918 |     "print(f\"Before ReLU: {hidden1}\\n\\n\")\n",
1919 |     "hidden1 = nn.ReLU()(hidden1)\n",
1920 |     "print(f\"After ReLU: {hidden1}\")"
1921 |    ]
1922 |   },
1923 |   {
1924 |    "cell_type": "code",
1925 |    "execution_count": null,
1926 |    "metadata": {},
1927 |    "outputs": [],
1928 |    "source": [
1929 |     "seq_modules = nn.Sequential(flatten, layer1, nn.ReLU(), nn.Linear(20, 10))\n",
1930 |     "input_image = torch.rand(3, 28, 28)\n",
1931 |     "logits = seq_modules(input_image)"
1932 |    ]
1933 |   },
1934 |   {
1935 |    "cell_type": "code",
1936 |    "execution_count": null,
1937 |    "metadata": {},
1938 |    "outputs": [],
1939 |    "source": [
1940 |     "logits.shape"
1941 |    ]
1942 |   },
1943 |   {
1944 |    "cell_type": "code",
1945 |    "execution_count": null,
1946 |    "metadata": {},
1947 |    "outputs": [],
1948 |    "source": [
1949 |     "logits"
1950 |    ]
1951 |   },
1952 |   {
1953 |    "cell_type": "code",
1954 |    "execution_count": null,
1955 |    "metadata": {},
1956 |    "outputs": [],
1957 |    "source": [
1958 |     "softmax = nn.Softmax(dim=1)\n",
1959 |     "pred_probab = softmax(logits)\n",
1960 |     "pred_probab"
1961 |    ]
1962 |   },
1963 |   {
1964 |    "cell_type": "code",
1965 |    "execution_count": null,
1966 |    "metadata": {},
1967 |    "outputs": [],
1968 |    "source": [
1969 |     "pred_probab.argmax(dim=1)"
1970 |    ]
1971 |   },
1972 |   {
1973 |    "cell_type": "code",
1974 |    "execution_count": null,
1975 |    "metadata": {},
1976 |    "outputs": [],
1977 |    "source": [
1978 |     "print(f\"Model structure: {model}\\n\\n\")\n",
1979 |     "\n",
1980 |     "for name, param in model.named_parameters():\n",
1981 |     "    print(f\"Layer: {name} | Size: {param.size()} | Values : {param[:2]} \\n\")"
1982 |    ]
1983 |   },
1984 |   {
1985 |    "cell_type": "markdown",
1986 |    "metadata": {},
1987 |    "source": [
1988 |     "# 3. Train the network"
1989 |    ]
1990 |   },
1991 |   {
1992 |    "cell_type": "markdown",
1993 |    "metadata": {},
1994 |    "source": [
1995 |     "#### Initialize the model"
1996 |    ]
1997 |   },
1998 |   {
1999 |    "cell_type": "code",
2000 |    "execution_count": null,
2001 |    "metadata": {},
2002 |    "outputs": [],
2003 |    "source": [
2004 |     "model = NeuralNetwork()"
2005 |    ]
2006 |   },
2007 |   {
2008 |    "cell_type": "markdown",
2009 |    "metadata": {},
2010 |    "source": [
2011 |     "#### Set the hyperparameters"
2012 |    ]
2013 |   },
2014 |   {
2015 |    "cell_type": "code",
2016 |    "execution_count": null,
2017 |    "metadata": {},
2018 |    "outputs": [],
2019 |    "source": [
2020 |     "learning_rate = 1e-3\n",
2021 |     "batch_size = 64\n",
2022 |     "epochs = 5"
2023 |    ]
2024 |   },
2025 |   {
2026 |    "cell_type": "markdown",
2027 |    "metadata": {},
2028 |    "source": [
2029 |     "#### Initialize the loss function"
2030 |    ]
2031 |   },
2032 |   {
2033 |    "cell_type": "code",
2034 |    "execution_count": null,
2035 |    "metadata": {},
2036 |    "outputs": [],
2037 |    "source": [
2038 |     "loss_fn = nn.CrossEntropyLoss()"
2039 |    ]
2040 |   },
2041 |   {
2042 |    "cell_type": "markdown",
2043 |    "metadata": {},
2044 |    "source": [
2045 |     "#### Initialize the optimizer"
2046 |    ]
2047 |   },
2048 |   {
2049 |    "cell_type": "code",
2050 |    "execution_count": null,
2051 |    "metadata": {},
2052 |    "outputs": [],
2053 |    "source": [
2054 |     "optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)"
2055 |    ]
2056 |   },
2057 |   {
2058 |    "cell_type": "markdown",
2059 |    "metadata": {},
2060 |    "source": [
2061 |     "#### Define the training and testing functions"
2062 |    ]
2063 |   },
2064 |   {
2065 |    "cell_type": "code",
2066 |    "execution_count": null,
2067 |    "metadata": {},
2068 |    "outputs": [],
2069 |    "source": [
2070 |     "def train_loop(dataloader, model, loss_fn, optimizer):\n",
2071 |     "    size = len(dataloader.dataset)\n",
2072 |     "    for batch, (X, y) in enumerate(dataloader):\n",
2073 |     "        # Compute prediction and loss\n",
2074 |     "        pred = model(X)\n",
2075 |     "        loss = loss_fn(pred, y)\n",
2076 |     "\n",
2077 |     "        # Backpropagation\n",
2078 |     "        optimizer.zero_grad()\n",
2079 |     "        loss.backward()\n",
2080 |     "        optimizer.step()\n",
2081 |     "\n",
2082 |     "        if batch % 100 == 0:\n",
2083 |     "            loss, current = loss.item(), batch * len(X)\n",
2084 |     "            print(f\"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]\")\n",
2085 |     "\n",
2086 |     "\n",
2087 |     "def test_loop(dataloader, model, loss_fn):\n",
2088 |     "    size = len(dataloader.dataset)\n",
2089 |     "    num_batches = len(dataloader)\n",
2090 |     "    test_loss, correct = 0, 0\n",
2091 |     "\n",
2092 |     "    with torch.no_grad():\n",
2093 |     "        for X, y in dataloader:\n",
2094 |     "            pred = model(X)\n",
2095 |     "            test_loss += loss_fn(pred, y).item()\n",
2096 |     "            correct += (pred.argmax(1) == y.argmax(1)).type(torch.float).sum().item()\n",
2097 |     "\n",
2098 |     "    test_loss /= num_batches\n",
2099 |     "    correct /= size\n",
2100 |     "    print(\n",
2101 |     "        f\"Test Error: \\n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \\n\"\n",
2102 |     "    )"
2103 |    ]
2104 |   },
2105 |   {
2106 |    "cell_type": "code",
2107 |    "execution_count": null,
2108 |    "metadata": {},
2109 |    "outputs": [],
2110 |    "source": [
2111 |     "epochs = 10\n",
2112 |     "for t in range(epochs):\n",
2113 |     "    print(f\"Epoch {t+1}\\n-------------------------------\")\n",
2114 |     "    train_loop(train_dataloader, model, loss_fn, optimizer)\n",
2115 |     "    test_loop(test_dataloader, model, loss_fn)\n",
2116 |     "print(\"Done!\")"
2117 |    ]
2118 |   },
2119 |   {
2120 |    "cell_type": "markdown",
2121 |    "metadata": {},
2122 |    "source": [
2123 |     "# Everything below is optional material"
2124 |    ]
2125 |   },
2126 |   {
2127 |    "cell_type": "markdown",
2128 |    "metadata": {},
2129 |    "source": [
2130 |     "---"
2131 |    ]
2132 |   },
2133 |   {
2134 |    "cell_type": "markdown",
2135 |    "metadata": {},
2136 |    "source": [
2137 |     "# 4. Exploring convolutions"
2138 |    ]
2139 |   },
2140 |   {
2141 |    "cell_type": "markdown",
2142 |    "metadata": {},
2143 |    "source": [
2144 |     "![](https://distillery.com/wp-content/uploads/3-1.png)"
2145 |    ]
2146 |   },
2147 |   {
2148 |    "cell_type": "markdown",
2149 |    "metadata": {},
2150 |    "source": [
2151 |     "![](https://wikidocs.net/images/page/164365/Fig_04_cnn_filter.png)"
2152 |    ]
2153 |   },
2154 |   {
2155 |    "cell_type": "markdown",
2156 |    "metadata": {},
2157 |    "source": [
2158 |     "![](https://blog.kakaocdn.net/dn/lgBRU/btqWWtkNJhM/DsJwc0PXrbhS0n33Vqzgm1/img.gif)"
2159 |    ]
2160 |   },
2161 |   {
2162 |    "cell_type": "markdown",
2163 |    "metadata": {},
2164 |    "source": [
2165 |     "![](https://www.mobiquity.com/hs-fs/hubfs/CNN03.gif?width=640&name=CNN03.gif)"
2166 |    ]
2167 |   },
2168 |   {
2169 |    "cell_type": "markdown",
2170 |    "metadata": {},
2171 |    "source": [
2172 |     "![](https://i.stack.imgur.com/9bi5k.gif)"
2173 |    ]
2174 |   },
2175 |   {
2176 |    "cell_type": "code",
2177 |    "execution_count": null,
2178 |    "metadata": {},
2179 |    "outputs": [],
2180 |    "source": [
2181 |     "import matplotlib.pyplot as plt\n",
2182 |     "\n",
2183 |     "%matplotlib inline\n",
2184 |     "import numpy as np\n",
2185 |     "import cv2\n",
2186 |     "import sklearn\n",
2187 |     "import PIL\n",
2188 |     "import torch\n",
2189 |     "import torchvision\n",
2190 |     "\n",
2191 |     "\n",
2192 |     "def show_image(image, figsize=(16, 9), reverse=True):\n",
2193 |     "    plt.figure(figsize=figsize)\n",
2194 |     "    if reverse:\n",
2195 |     "        plt.imshow(image[..., ::-1])\n",
2196 |     "    else:\n",
2197 |     "        plt.imshow(image)\n",
2198 |     "    plt.axis(\"off\")\n",
2199 |     "    plt.show()\n",
2200 |     "\n",
2201 |     "\n",
2202 |     "def show_grayscale_image(image, figsize=(16, 9)):\n",
2203 |     "    plt.figure(figsize=figsize)\n",
2204 |     "    plt.imshow(image, cmap=\"gray\")\n",
2205 |     "    plt.axis(\"off\")\n",
2206 |     "    plt.show()"
2207 |    ]
2208 |   },
2209 |   {
2210 |    "cell_type": "code",
2211 |    "execution_count": null,
2212 |    "metadata": {},
2213 |    "outputs": [],
2214 |    "source": [
2215 |     "layer = torch.nn.Conv2d(1, 1, kernel_size=(3, 3))\n",
2216 |     "layer.weight"
2217 |    ]
2218 |   },
2219 |   {
2220 |    "cell_type": "code",
2221 |    "execution_count": null,
2222 |    "metadata": {},
2223 |    "outputs": [],
2224 |    "source": [
2225 |     "# the weights are accessible via .weight.data\n",
2226 |     "layer.weight.data = torch.ones_like(layer.weight.data)\n",
2227 |     "layer.weight.data /= torch.sum(layer.weight.data)\n",
2228 |     "layer.weight.data"
2229 |    ]
2230 |   },
2231 |   {
2232 |    "cell_type": "code",
2233 |    "execution_count": null,
2234 |    "metadata": {},
2235 |    "outputs": [],
2236 |    "source": [
2237 |     "layer.bias.data = torch.zeros_like(layer.bias.data)\n",
2238 |     "layer.bias.data"
2239 |    ]
2240 |   },
2241 |   {
2242 |    "cell_type": "code",
2243 |    "execution_count": null,
2244 |    "metadata": {},
2245 |    "outputs": [],
2246 |    "source": [
2247 |     "layer.train()\n",
2248 |     "layer.eval()"
2249 |    ]
2250 |   },
2251 |   {
2252 |    "cell_type": "code",
2253 |    "execution_count": null,
2254 |    "metadata": {},
2255 |    "outputs": [],
2256 |    "source": [
2257 |     "input_tensor = torch.arange(9).reshape(1, 1, 3, 3)\n",
2258 |     "input_tensor.shape"
2259 |    ]
2260 |   },
2261 |   {
2262 |    "cell_type": "markdown",
2263 |    "metadata": {},
2264 |    "source": [
2265 |     "Shape `(1, 1, 3, 3)`: the first 1 is the batch size, the second 1 is the number of channels (1 for grayscale, 3 for color), the first 3 is the height, and the second 3 is the width."
2266 |    ]
2267 |   },
2268 |   {
2269 |    "cell_type": "code",
2270 |    "execution_count": null,
2271 |    "metadata": {},
2272 |    "outputs": [],
2273 |    "source": [
2274 |     "input_tensor"
2275 |    ]
2276 |   },
2277 |   {
2278 |    "cell_type": "code",
2279 |    "execution_count": null,
2280 |    "metadata": {},
2281 |    "outputs": [],
2282 |    "source": [
2283 |     "output_tensor = layer(input_tensor.float())"
2284 |    ]
2285 |   },
2286 |   {
2287 |    "cell_type": "code",
2288 |    "execution_count": null,
2289 |    "metadata": {},
2290 |    "outputs": [],
2291 |    "source": [
2292 |     "output_tensor"
2293 |    ]
2294 |   },
2295 |   {
2296 |    "cell_type": "markdown",
2297 |    "metadata": {},
2298 |    "source": [
2299 |     "## Types of filters"
2300 |    ]
2301 |   },
2302 |   {
2303 |    "cell_type": "code",
2304 |    "execution_count": null,
2305 |    "metadata": {},
2306 |    "outputs": [],
2307 |    "source": [
2308 |     "from urllib.request import urlopen\n",
2309 |     "\n",
2310 |     "\n",
2311 |     "req = urlopen(\n",
2312 |     "    \"https://images.unsplash.com/photo-1608848461950-0fe51dfc41cb?ixlib=rb-4.0.3&ixid=MnwxMjA3fDB8MHxleHBsb3JlLWZlZWR8M3x8fGVufDB8fHx8&w=1000&q=80\"\n",
2313 |     ")\n",
2314 |     "arr = np.asarray(bytearray(req.read()), dtype=np.uint8)\n",
2315 |     "img = cv2.imdecode(arr, cv2.IMREAD_GRAYSCALE)"
2316 |    ]
2317 |   },
2318 |   {
2319 |    "cell_type": "code",
2320 |    "execution_count": null,
2321 |    "metadata": {},
2322 |    "outputs": [],
2323 |    "source": [
2324 |     "img.shape"
2325 |    ]
2326 |   },
2327 |   {
2328 |    "cell_type": "code",
2329 |    "execution_count": null,
2330 |    "metadata": {},
2331 |    "outputs": [],
2332 |    "source": [
2333 |     "show_grayscale_image(img)"
2334 |    ]
2335 |   },
2336 |   {
2337 |    "cell_type": "code",
2338 |    "execution_count": null,
2339 |    "metadata": {},
2340 |    "outputs": [],
2341 |    "source": [
2342 |     "to_tensor = torchvision.transforms.ToTensor()\n",
2343 |     "img_tensor = to_tensor(img)\n",
2344 |     "img_tensor"
2345 |    ]
2346 |   },
2347 |   {
2348 |    "cell_type": "code",
2349 |    "execution_count": null,
2350 |    "metadata": {},
2351 |    "outputs": [],
2352 |    "source": [
2353 |     "img_tensor.shape"
2354 |    ]
2355 |   },
2356 |   {
2357 |    "cell_type": "code",
2358 |    "execution_count": null,
2359 |    "metadata": {},
2360 |    "outputs": [],
2361 |    "source": [
2362 |     "# add a batch dimension so the network can read the image: (1, H, W) -> (1, 1, H, W)\n",
2363 |     "\n",
2364 |     "img_tensor = img_tensor.unsqueeze(0)"
2365 |    ]
2366 |   },
2367 |   {
2368 |    "cell_type": "code",
2369 |    "execution_count": null,
2370 |    "metadata": {},
2371 |    "outputs": [],
2372 |    "source": [
2373 |     "kernel_size = 50\n",
2374 |     "layer = torch.nn.Conv2d(1, 1, kernel_size=kernel_size)\n",
2375 |     "\n",
2376 |     "# set the convolution weights to a mean (box blur) filter\n",
2377 |     "\n",
2378 |     "layer.weight.data = torch.ones_like(layer.weight.data)\n",
2379 |     "layer.weight.data /= torch.sum(layer.weight.data)"
2380 |    ]
2381 |   },
2382 |   {
2383 |    "cell_type": "code",
2384 |    "execution_count": null,
2385 |    "metadata": {},
2386 |    "outputs": [],
2387 |    "source": [
2388 |     "layer.weight.data"
2389 |    ]
2390 |   },
2391 |   {
2392 |    "cell_type": "code",
2393 |    "execution_count": null,
2394 |    "metadata": {},
2395 |    "outputs": [],
2396 |    "source": [
2397 |     "output_tensor = layer(img_tensor)\n",
2398 |     "output_tensor.shape"
2399 |    ]
2400 |   },
2401 |   {
2402 |    "cell_type": "code",
2403 |    "execution_count": null,
2404 |    "metadata": {},
2405 |    "outputs": [],
2406 |    "source": [
2407 |     "import torchvision.transforms as transforms\n",
2408 |     "\n",
2409 |     "# transform that converts a tensor to a PIL image\n",
2410 |     "to_pil_image = transforms.ToPILImage()\n",
2411 |     "output_img = to_pil_image(output_tensor.squeeze(0))"
2412 |    ]
2413 |   },
2414 |   {
2415 |    "cell_type": "code",
2416 |    "execution_count": null,
2417 |    "metadata": {},
2418 |    "outputs": [],
2419 |    "source": [
2420 |     "show_grayscale_image(output_img)"
2421 |    ]
2422 |   },
2423 |   {
2424 |    "cell_type": "code",
2425 |    "execution_count": null,
2426 |    "metadata": {},
2427 |    "outputs": [],
2428 |    "source": [
2429 |     "def show_image_with_kernel(img_input, kernel):\n",
2430 |     "    layer = torch.nn.Conv2d(1, 1, kernel_size=3)\n",
2431 |     "    layer.weight.data = kernel.reshape(1, 1, 3, 3)\n",
2432 |     "    output_tensor = layer(img_input)\n",
2433 |     "    output_img = to_pil_image(output_tensor.squeeze(0))\n",
2434 |     "    show_grayscale_image(output_img)"
2435 |    ]
2436 |   },
2437 |   {
2438 |    "cell_type": "code",
2439 |    "execution_count": null,
2440 |    "metadata": {},
2441 |    "outputs": [],
2442 |    "source": [
2443 |     "our_kernel = torch.tensor([[0, -1, 0], [-1, 4, -1], [0, -1, 0]], dtype=torch.float32)\n",
2444 |     "our_kernel = our_kernel.reshape(1, 1, 3, 3)\n",
2445 |     "show_image_with_kernel(img_tensor, our_kernel)"
2446 |    ]
2447 |   },
2448 |   {
2449 |    "cell_type": "code",
2450 |    "execution_count": null,
2451 |    "metadata": {},
2452 |    "outputs": [],
2453 |    "source": [
2454 |     "our_kernel = torch.tensor(\n",
2455 |     "    [[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]], dtype=torch.float32\n",
2456 |     ")\n",
2457 |     "our_kernel = our_kernel.reshape(1, 1, 3, 3)\n",
2458 |     "show_image_with_kernel(img_tensor, our_kernel)"
2459 |    ]
2460 |   },
2461 |   {
2462 |    "cell_type": "code",
2463 |    "execution_count": null,
2464 |    "metadata": {},
2465 |    "outputs": [],
2466 |    "source": [
2467 |     "our_kernel = torch.tensor([[0, -1, 0], [-1, 5, -1], [0, -1, 0]], dtype=torch.float32)\n",
2468 |     "our_kernel = our_kernel.reshape(1, 1, 3, 3)\n",
2469 |     "show_image_with_kernel(img_tensor, our_kernel)"
2470 |    ]
2471 |   },
2472 |   {
2473 |    "cell_type": "code",
2474 |    "execution_count": null,
2475 |    "metadata": {},
2476 |    "outputs": [],
2477 |    "source": [
2478 |     "our_kernel = torch.tensor([[1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=torch.float32)\n",
2479 |     "our_kernel = our_kernel.reshape(1, 1, 3, 3)\n",
2480 |     "show_image_with_kernel(img_tensor, our_kernel)"
2481 |    ]
2482 |   },
2483 |   {
2484 |    "cell_type": "code",
2485 |    "execution_count": null,
2486 |    "metadata": {},
2487 |    "outputs": [],
2488 |    "source": [
2489 |     "our_kernel = torch.tensor([[1, 0, 1], [0, 1, 0], [1, 0, 1]], dtype=torch.float32)\n",
2490 |     "our_kernel = our_kernel.reshape(1, 1, 3, 3)\n",
2491 |     "show_image_with_kernel(img_tensor, our_kernel)"
2492 |    ]
2493 |   },
2494 |   {
2495 |    "cell_type": "markdown",
2496 |    "metadata": {},
2497 |    "source": [
2498 |     "![](https://miro.medium.com/v2/resize:fit:738/0*4tHPDjvQXX9obrhK.gif)"
2499 |    ]
2500 |   },
2501 |   {
2502 |    "cell_type": "markdown",
2503 |    "metadata": {},
2504 |    "source": [
2505 |     "![](https://qph.cf2.quoracdn.net/main-qimg-c05ec72225e19614f0cc10385d27525c)"
2506 |    ]
2507 |   },
2508 |   {
2509 |    "cell_type": "markdown",
2510 |    "metadata": {},
2511 |    "source": [
2512 |     "![](https://blog.ca.meron.dev/images/Convolution.gif)"
2513 |    ]
2514 |   },
2515 |   {
2516 |    "cell_type": "markdown",
2517 |    "metadata": {},
2518 |    "source": [
2519 |     "![](https://i.stack.imgur.com/ptDPZ.gif)"
2520 |    ]
2521 |   },
2522 |   {
2523 |    "cell_type": "markdown",
2524 |    "metadata": {},
2525 |    "source": [
2526 |     "![](https://lukeguerdan.com/assets/img/TCNfigureA.png)"
2527 |    ]
2528 |   },
2529 |   {
2530 |    "cell_type": "markdown",
2531 |    "metadata": {},
2532 |    "source": [
2533 |     "![](https://theaisummer.com/static/490be17ee7f19b78003c3fdf5a6bbafc/83b75/receptive-field-in-convolutional-networks.png)"
2534 |    ]
2535 |   },
2536 |   {
2537 |    "cell_type": "markdown",
2538 |    "metadata": {},
2539 |    "source": [
2540 |     "![](https://i.ytimg.com/vi/KuXjwB4LzSA/maxresdefault.jpg)"
2541 |    ]
2542 |   },
2543 |   {
2544 |    "cell_type": "markdown",
2545 |    "metadata": {},
2546 |    "source": [
2547 |     "![](https://saturncloud.io/images/blog/a-comprehensive-guide-to-convolutional-neural-networks-the-eli5-way.jpg)"
2548 |    ]
2549 |   },
2550 |   {
2551 |    "cell_type": "markdown",
2552 |    "metadata": {},
2553 |    "source": [
2554 |     "![](https://learnopencv.com/wp-content/uploads/2023/01/tensorflow-keras-cnn-vgg-architecture-1024x611.png)"
2555 |    ]
2556 |   },
2557 |   {
2558 |    "cell_type": "markdown",
2559 |    "metadata": {},
2560 |    "source": [
2561 |     "![](https://nico-curti.github.io/NumPyNet/NumPyNet/images/maxpool.gif)"
2562 |    ]
2563 |   },
2564 |   {
2565 |    "cell_type": "markdown",
2566 |    "metadata": {},
2567 |    "source": [
2568 |     "Visualization of convolutional layers: https://distill.pub/2017/feature-visualization/"
2569 |    ]
2570 |   },
2571 |   {
2572 |    "cell_type": "markdown",
2573 |    "metadata": {},
2574 |    "source": [
2575 |     "# Now it's time to write our own convolutional neural network"
2576 |    ]
2577 |   },
2578 |   {
2579 |    "cell_type": "code",
2580 |    "execution_count": null,
2581 |    "metadata": {},
2582 |    "outputs": [],
2583 |    "source": [
2584 |     "import torch\n",
2585 |     "import torchvision\n",
2586 |     "import torchvision.transforms as transforms\n",
2587 |     "\n",
2588 |     "\n",
2589 |     "transform = transforms.Compose(\n",
2590 |     "    [transforms.ToTensor(), transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))]\n",
2591 |     ")"
2592 |    ]
2593 |   },
2594 |   {
2595 |    "cell_type": "code",
2596 |    "execution_count": null,
2597 |    "metadata": {},
2598 |    "outputs": [],
2599 |    "source": [
2600 |     "batch_size = 4\n",
2601 |     "\n",
2602 |     "# the dataset defines the data\n",
2603 |     "trainset = torchvision.datasets.CIFAR10(\n",
2604 |     "    root=\"./data\", train=True, download=True, transform=transform\n",
2605 |     ")\n",
2606 |     "# the dataloader loads it in batches\n",
2607 |     "trainloader = torch.utils.data.DataLoader(\n",
2608 |     "    trainset, batch_size=batch_size, shuffle=True, num_workers=2\n",
2609 |     ")\n",
2610 |     "\n",
2611 |     "testset = torchvision.datasets.CIFAR10(\n",
2612 |     "    root=\"./data\", train=False, download=True, transform=transform\n",
2613 |     ")\n",
2614 |     "testloader = torch.utils.data.DataLoader(\n",
2615 |     "    testset, batch_size=batch_size, shuffle=False, num_workers=2\n",
2616 |     ")\n",
2617 |     "\n",
2618 |     "classes = (\n",
2619 |     "    \"plane\",\n",
2620 |     "    \"car\",\n",
2621 |     "    \"bird\",\n",
2622 |     "    \"cat\",\n",
2623 |     "    \"deer\",\n",
2624 |     "    \"dog\",\n",
2625 |     "    \"frog\",\n",
2626 |     "    \"horse\",\n",
2627 |     "    \"ship\",\n",
2628 |     "    \"truck\",\n",
2629 |     ")"
2630 |    ]
2631 |   },
2632 |   {
2633 |    "cell_type": "code",
2634 |    "execution_count": null,
2635 |    "metadata": {},
2636 |    "outputs": [],
2637 |    "source": [
2638 |     "import matplotlib.pyplot as plt\n",
2639 |     "import numpy as np\n",
2640 |     "\n",
2641 |     "\n",
2642 |     "def imshow(img):\n",
2643 |     "    # undo the normalization\n",
2644 |     "    img = img / 2 + 0.5\n",
2645 |     "    npimg = img.numpy()\n",
2646 |     "    plt.imshow(np.transpose(npimg, (1, 2, 0)))\n",
2647 |     "    plt.show()\n",
2648 |     "\n",
2649 |     "\n",
2650 |     "# take a random batch of images\n",
2651 |     "dataiter = iter(trainloader)\n",
2652 |     "images, labels = next(dataiter)\n",
2653 |     "\n",
2654 |     "imshow(torchvision.utils.make_grid(images))\n",
2655 |     "print(\"  \".join(\"{}\".format(classes[labels[j]]) for j in range(batch_size)))"
2656 |    ]
2657 |   },
2658 |   {
2659 |    "cell_type": "code",
2660 |    "execution_count": null,
2661 |    "metadata": {},
2662 |    "outputs": [],
2663 |    "source": [
2664 |     "def train_model(net, criterion, optimizer, trainloader, num_epochs=5):\n",
2665 |     "    for epoch in range(num_epochs):\n",
2666 |     "        running_loss = 0.0\n",
2667 |     "        for i, data in enumerate(trainloader, 0):\n",
2668 |     "            inputs, labels = data\n",
2669 |     "\n",
2670 |     "            # let's write this part ourselves\n",
2671 |     "\n",
2672 |     "            optimizer.zero_grad()\n",
2673 |     "\n",
2674 |     "            outputs = net(inputs)\n",
2675 |     "            loss = criterion(outputs, labels)\n",
2676 |     "            loss.backward()\n",
2677 |     "            optimizer.step()\n",
2678 |     "\n",
2679 |     "            running_loss += loss.item()\n",
2680 |     "            if i % 2000 == 1999:\n",
2681 |     "                print(\n",
2682 |     "                    \"Epoch {0}/{1}, iteration {2}, loss: {3:.3f}\".format(\n",
2683 |     "                        epoch + 1, num_epochs, i + 1, running_loss / 2000\n",
2684 |     "                    )\n",
2685 |     "                )\n",
2686 |     "                running_loss = 0.0\n",
2687 |     "        print()\n",
2688 |     "\n",
2689 |     "    print(\"Finished Training\")\n",
2690 |     "\n",
2691 |     "    return net"
2692 |    ]
2693 |   },
2694 |   {
2695 |    "cell_type": "code",
2696 |    "execution_count": null,
2697 |    "metadata": {},
2698 |    "outputs": [],
2699 |    "source": [
2700 |     "def all_accuracy(net, testloader):\n",
2701 |     "    correct = 0\n",
2702 |     "    total = 0\n",
2703 |     "    with torch.no_grad():\n",
2704 |     "        for data in testloader:\n",
2705 |     "            images, labels = data\n",
2706 |     "            outputs = net(images)\n",
2707 |     "\n",
2708 |     "            _, predicted = torch.max(outputs.data, 1)\n",
2709 |     "            total += labels.size(0)\n",
2710 |     "            correct += (predicted == labels).sum().item()\n",
2711 |     "\n",
2712 |     "    print(\n",
2713 |     "        \"Accuracy of the network on the 10000 test images: {} %\".format(\n",
2714 |     "            100 * correct / total\n",
2715 |     "        )\n",
2716 |     "    )"
2717 |    ]
2718 |   },
2719 |   {
2720 |    "cell_type": "code",
2721 |    "execution_count": null,
2722 |    "metadata": {},
2723 |    "outputs": [],
2724 |    "source": [
2725 |     "def class_accuracy(net, testloader):\n",
2726 |     "    class_correct = list(0.0 for i in range(10))\n",
2727 |     "    class_total = list(0.0 for i in range(10))\n",
2728 |     "    with torch.no_grad():\n",
2729 |     "        for data in testloader:\n",
2730 |     "            images, labels = data\n",
2731 |     "            outputs = net(images)\n",
2732 |     "            _, predicted = torch.max(outputs, 1)\n",
2733 |     "            c = predicted == labels\n",
2734 |     "            for i in range(labels.size(0)):\n",
2735 |     "                label = labels[i]\n",
2736 |     "                class_correct[label] += c[i].item()\n",
2737 |     "                class_total[label] += 1\n",
2738 |     "\n",
2739 |     "    for i in range(10):\n",
2740 |     "        print(\n",
2741 |     "            \"Accuracy of {} : {} %\".format(\n",
2742 |     "                classes[i], 100 * class_correct[i] / class_total[i]\n",
2743 |     "            )\n",
2744 |     "        )"
2745 |    ]
2746 |   },
2747 |   {
2748 |    "cell_type": "code",
2749 |    "execution_count": null,
2750 |    "metadata": {},
2751 |    "outputs": [],
2752 |    "source": [
2753 |     "import torch.nn as nn\n",
2754 |     "import torch.nn.functional as F\n",
2755 |     "\n",
2756 |     "\n",
2757 |     "class FeedForwardNet(nn.Module):\n",
2758 |     "    def __init__(self):\n",
2759 |     "        super(FeedForwardNet, self).__init__()\n",
2760 |     "        self.fc1 = nn.Linear(3 * 32 * 32, 128)\n",
2761 |     "        self.fc2 = nn.Linear(128, 32)\n",
2762 |     "        self.fc3 = nn.Linear(32, 10)\n",
2763 |     "\n",
2764 |     "    def forward(self, x):\n",
2765 |     "        x = x.view(-1, 3 * 32 * 32)\n",
2766 |     "        x = self.fc1(x)\n",
2767 |     "        x = F.relu(x)\n",
2768 |     "        x = self.fc2(x)\n",
2769 |     "        x = F.relu(x)\n",
2770 |     "        x = self.fc3(x)\n",
2771 |     "        return x\n",
2772 |     "\n",
2773 |     "\n",
2774 |     "net = FeedForwardNet()"
2775 |    ]
2776 |   },
2777 |   {
2778 |    "cell_type": "code",
2779 |    "execution_count": null,
2780 |    "metadata": {},
2781 |    "outputs": [],
2782 |    "source": [
2783 |     "import torch.optim as optim\n",
2784 |     "\n",
2785 |     "criterion = nn.CrossEntropyLoss()\n",
2786 |     "optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)"
2787 |    ]
2788 |   },
2789 |   {
2790 |    "cell_type": "code",
2791 |    "execution_count": null,
2792 |    "metadata": {},
2793 |    "outputs": [],
2794 |    "source": [
2795 |     "net = train_model(net, criterion, optimizer, trainloader, num_epochs=5)"
2796 |    ]
2797 |   },
2798 |   {
2799 |    "cell_type": "code",
2800 |    "execution_count": null,
2801 |    "metadata": {},
2802 |    "outputs": [],
2803 |    "source": [
2804 |     "all_accuracy(net, testloader)"
2805 |    ]
2806 |   },
2807 |   {
2808 |    "cell_type": "code",
2809 |    "execution_count": null,
2810 |    "metadata": {},
2811 |    "outputs": [],
2812 |    "source": [
2813 |     "class_accuracy(net, testloader)"
2814 |    ]
2815 |   },
2816 |   {
2817 |    "cell_type": "markdown",
2818 |    "metadata": {},
2819 |    "source": [
2820 |     "## Convolutional network for classification"
2821 |    ]
2822 |   },
2823 |   {
2824 |    "cell_type": "code",
2825 |    "execution_count": null,
2826 |    "metadata": {},
2827 |    "outputs": [],
2828 |    "source": [
2829 |     "import torch.nn as nn\n",
2830 |     "import torch.nn.functional as F\n",
2831 |     "\n",
2832 |     "\n",
2833 |     "class Net(nn.Module):\n",
2834 |     "    def __init__(self):\n",
2835 |     "        super(Net, self).__init__()\n",
2836 |     "        self.conv1 = nn.Conv2d(3, 6, 5)\n",
2837 |     "        self.pool = nn.MaxPool2d(2, 2)\n",
2838 |     "        self.conv2 = nn.Conv2d(6, 16, 5)\n",
2839 |     "        self.fc1 = nn.Linear(16 * 5 * 5, 120)\n",
2840 |     "        self.fc2 = nn.Linear(120, 84)\n",
2841 |     "        self.fc3 = nn.Linear(84, 10)\n",
2842 |     "\n",
2843 |     "    def forward(self, x):\n",
2844 |     "        x = self.pool(F.relu(self.conv1(x)))\n",
2845 |     "        x = self.pool(F.relu(self.conv2(x)))\n",
2846 |     "        x = x.view(-1, 16 * 5 * 5)\n",
2847 |     "        x = F.relu(self.fc1(x))\n",
2848 |     "        x = F.relu(self.fc2(x))\n",
2849 |     "        x = self.fc3(x)\n",
2850 |     "        return x\n",
2851 |     "\n",
2852 |     "\n",
2853 |     "net = Net()"
2854 |    ]
2855 |   },
2856 |   {
2857 |    "cell_type": "code",
2858 |    "execution_count": null,
2859 |    "metadata": {},
2860 |    "outputs": [],
2861 |    "source": [
2862 |     "criterion = nn.CrossEntropyLoss()\n",
2863 |     "optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)"
2864 |    ]
2865 |   },
2866 |   {
2867 |    "cell_type": "code",
2868 |    "execution_count": null,
2869 |    "metadata": {},
2870 |    "outputs": [],
2871 |    "source": [
2872 |     "net = train_model(net, criterion, optimizer, trainloader, num_epochs=5)"
2873 |    ]
2874 |   },
2875 |   {
2876 |    "cell_type": "code",
2877 |    "execution_count": null,
2878 |    "metadata": {},
2879 |    "outputs": [],
2880 |    "source": [
2881 |     "# save the network\n",
2882 |     "PATH = \"./cifar_net.pth\"\n",
2883 |     "torch.save(net.state_dict(), PATH)"
2884 |    ]
2885 |   },
2886 |   {
2887 |    "cell_type": "code",
2888 |    "execution_count": null,
2889 |    "metadata": {},
2890 |    "outputs": [],
2891 |    "source": [
2892 |     "dataiter = iter(testloader)\n",
2893 |     "images, labels = next(dataiter)\n",
2894 |     "\n",
2895 |     "imshow(torchvision.utils.make_grid(images))\n",
2896 |     "print(\n",
2897 |     "    \"GroundTruth: \",\n",
2898 |     "    \" \".join(\"{}\".format(classes[labels[j]]) for j in range(batch_size)),\n",
2899 |     ")"
2900 |    ]
2901 |   },
2902 |   {
2903 |    "cell_type": "code",
2904 |    "execution_count": null,
2905 |    "metadata": {},
2906 |    "outputs": [],
2907 |    "source": [
2908 |     "# load the network\n",
2909 |     "net_loaded = Net()\n",
2910 |     "net_loaded.load_state_dict(torch.load(PATH))"
2911 |    ]
2912 |   },
2913 |   {
2914 |    "cell_type": "code",
2915 |    "execution_count": null,
2916 |    "metadata": {},
2917 |    "outputs": [],
2918 |    "source": [
2919 |     "outputs = net_loaded(images)\n",
2920 |     "_, predicted = torch.max(outputs, 1)\n",
2921 |     "\n",
2922 |     "print(\"Predicted: \", \" \".join(\"{}\".format(classes[predicted[j]]) for j in range(batch_size)))"
2923 |    ]
2924 |   },
2925 |   {
2926 |    "cell_type": "code",
2927 |    "execution_count": null,
2928 |    "metadata": {},
2929 |    "outputs": [],
2930 |    "source": [
2931 |     "all_accuracy(net, testloader)"
2932 |    ]
2933 |   },
2934 |   {
2935 |    "cell_type": "code",
2936 |    "execution_count": null,
2937 |    "metadata": {},
2938 |    "outputs": [],
2939 |    "source": [
2940 |     "class_accuracy(net, testloader)"
2941 |    ]
2942 |   },
2943 |   {
2944 |    "cell_type": "code",
2945 |    "execution_count": null,
2946 |    "metadata": {},
2947 |    "outputs": [],
2948 |    "source": []
2949 |   }
2950 |  ],
2951 |  "metadata": {
2952 |   "colab": {
2953 |    "collapsed_sections": [],
2954 |    "provenance": []
2955 |   },
2956 |   "kernelspec": {
2957 |    "display_name": "nn-ml-bachelor-2024-venv",
2958 |    "language": "python",
2959 |    "name": "nn-ml-bachelor-2024-venv"
2960 |   },
2961 |   "language_info": {
2962 |    "codemirror_mode": {
2963 |     "name": "ipython",
2964 |     "version": 3
2965 |    },
2966 |    "file_extension": ".py",
2967 |    "mimetype": "text/x-python",
2968 |    "name": "python",
2969 |    "nbconvert_exporter": "python",
2970 |    "pygments_lexer": "ipython3",
2971 |    "version": "3.9.6"
2972 |   }
2973 |  },
2974 |  "nbformat": 4,
2975 |  "nbformat_minor": 1
2976 | }
2977 | 
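Note on the `16 * 5 * 5` input size of `fc1` in `Net`: it follows from the spatial size of the feature map after each layer. The sketch below assumes the usual output-size formula for a convolution or pooling window without dilation, `floor((n + 2 * padding - kernel_size) / stride) + 1`. The helper name `conv_output_size` is hypothetical and is not part of the notebook.

```python
def conv_output_size(n, kernel_size, stride=1, padding=0):
    """Spatial size after a convolution or pooling window (no dilation)."""
    return (n + 2 * padding - kernel_size) // stride + 1


size = 32                                                # CIFAR-10 images are 32x32
size = conv_output_size(size, kernel_size=5)             # conv1: 32 -> 28
size = conv_output_size(size, kernel_size=2, stride=2)   # pool:  28 -> 14
size = conv_output_size(size, kernel_size=5)             # conv2: 14 -> 10
size = conv_output_size(size, kernel_size=2, stride=2)   # pool:  10 -> 5

# conv2 produces 16 channels, so the flattened vector has 16 * 5 * 5 values
fc1_in_features = 16 * size * size
print(size, fc1_in_features)  # prints: 5 400
```

The same formula explains why the 50x50 mean filter shrinks the image: each spatial dimension loses `kernel_size - 1 = 49` pixels when no padding is used.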


--------------------------------------------------------------------------------