├── .gitignore
├── presentations
│   ├── ТЗ ДЗ.pdf
│   ├── 01_introduction.pdf
│   └── 01_introduction.pptx
├── pics
│   └── 01_ml_arch_example.png
├── README.md
└── notebooks
    ├── labs
    │   ├── 02_lab_EDA.ipynb
    │   ├── 04_lab_dl_intro.ipynb
    │   ├── 03_lab_ml_intro.ipynb
    │   ├── 01_lab_python_intro_old_example.ipynb
    │   └── 01_lab_python_intro.ipynb
    └── seminars
        ├── 02-python-libs.ipynb
        └── 04-pytorch-intro.ipynb
/.gitignore:
--------------------------------------------------------------------------------
1 | .venv/
2 | .ipynb_checkpoints
3 |
--------------------------------------------------------------------------------
/presentations/ТЗ ДЗ.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/l3lush/nn_ml_practice/HEAD/presentations/ТЗ ДЗ.pdf
--------------------------------------------------------------------------------
/pics/01_ml_arch_example.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/l3lush/nn_ml_practice/HEAD/pics/01_ml_arch_example.png
--------------------------------------------------------------------------------
/presentations/01_introduction.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/l3lush/nn_ml_practice/HEAD/presentations/01_introduction.pdf
--------------------------------------------------------------------------------
/presentations/01_introduction.pptx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/l3lush/nn_ml_practice/HEAD/presentations/01_introduction.pptx
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Repository for the course "Artificial Neural Networks and Machine Learning" for the БИВТ-21 groups
2 | [Google Sheet with your progress](https://docs.google.com/spreadsheets/d/11600OkqdRzxc-xZmFTuBs099VeAqU_GlE0Bx6bBlIRE/edit?usp=sharing)
3 |
--------------------------------------------------------------------------------
/notebooks/labs/02_lab_EDA.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "id": "bc9520d2",
6 | "metadata": {},
7 | "source": [
8 | "# Lab 2\n",
9 | "\n",
10 | "Full name: \n",
11 | "Group: \n",
12 | "\n",
13 | "Ways to submit:\n",
14 | "1. Push this notebook to the GitHub repository where you keep your lab notebooks\n",
15 | "\n",
16 | "Deadlines:\n",
17 | "- Class 6 of the semester (in-person class 2)\n",
18 | "\n",
19 | "What you need to do: \n",
20 | "**In short, just perform an EDA** (though it is rarely that simple)\n",
21 | "## Read the assignment carefully\n",
22 | "\n",
23 | "Input data:\n",
24 | "1. Look up the name of your dataset in the [spreadsheet](https://docs.google.com/spreadsheets/d/11600OkqdRzxc-xZmFTuBs099VeAqU_GlE0Bx6bBlIRE/edit?usp=sharing) \n",
25 | "2. Download the data you need from [Google Drive](https://drive.google.com/drive/folders/1Phm-Fq1GL-VX7NS-DCEMRh_Fo1wU7DQ9?usp=sharing)\n",
26 | " \n",
27 | "---\n",
28 | "Now, point by point, what I expect from you: \n",
29 | "1. **Find** the name of your dataset in the spreadsheet (see the input data)\n",
30 | "2. **Briefly describe** the problem statement: what is being asked of you and which variables are available. State whether the target is continuous (we predict a number from $-\\infty$ to $\\infty$) or discrete (we predict a class from a finite set of options, for example 0 or 1, or a number in the range [0; 10])\n",
31 | "3. Plot the distribution of the target variable as a histogram and draw interim conclusions (pay attention to how uniform the distribution is and to possible imbalance). Count the number of unique values of the target variable.\n",
32 | "4. Print the basic statistics for the variables in the dataset (pandas has a ready-made method for this; it computes count, min, max, mean, 25%, 50%, etc.). This is done with **one** method (you know it).\n",
33 | "5. Print the basic information about the dataset (how many columns there are, what their types are, how many non-null elements they contain). This is done with **one** method (you know it).\n",
34 | "6. Count the missing values (NaN, Null, null, None) in every column. Suggest why these gaps might have appeared and how they could be filled in. \n",
35 | "7. Plot histograms of any 5 features (from the set `X`, or `data.data` as it is originally called). If you see any skew, asymmetry, or the like, describe it in words in the notebook.\n",
36 | "8. Plot any 5 variables of your choice against the target variable (if there are fewer than 5 variables, do as many as you can). Conclude whether these variables can be used to predict the target variable (in other words, whether there is any relationship between the y variable and the X variable) \n",
37 | "9. _extra_ (optional). Look at how two variables at once relate to the target variable. That is, variable $X_{n}$ goes on the OX axis and variable $X_{k}$ on the OY axis, and you get two curves (two point clouds) split by the target variable. Alternatively, you can build something similar for categorical features, but you will need to read about heatmaps. \n",
38 | "\n",
39 | "---\n",
40 | "P.S. \n",
41 | "Please do each task in separate cells under separate headings (like items 1 and 2 in this notebook), for example \n",
42 | "- Heading\n",
43 | "- Code cells\n",
44 | "- Another heading\n",
45 | "- More code cells"
46 | ]
47 | },
48 | {
49 | "cell_type": "markdown",
50 | "id": "28b51062",
51 | "metadata": {},
52 | "source": [
53 | "## 0. Data import example"
54 | ]
55 | },
56 | {
57 | "cell_type": "code",
58 | "execution_count": null,
59 | "id": "406597a7",
60 | "metadata": {},
61 | "outputs": [],
62 | "source": [
63 | "data = pd.read_csv(path_to_dataset, sep='\\t') # change the separator if needed"
64 | ]
65 | },
66 | {
67 | "cell_type": "markdown",
68 | "id": "0d119028",
69 | "metadata": {},
70 | "source": [
71 | "## 2. Problem statement"
72 | ]
73 | },
74 | {
75 | "cell_type": "code",
76 | "execution_count": null,
77 | "id": "d92464d0",
78 | "metadata": {},
79 | "outputs": [],
80 | "source": []
81 | },
82 | {
83 | "cell_type": "markdown",
84 | "id": "34931f43",
85 | "metadata": {},
86 | "source": [
87 | "## 3. Target variable distribution"
88 | ]
89 | },
90 | {
91 | "cell_type": "code",
92 | "execution_count": null,
93 | "id": "67a611d0",
94 | "metadata": {},
95 | "outputs": [],
96 | "source": []
97 | }
98 | ],
99 | "metadata": {
100 | "kernelspec": {
101 | "display_name": "nn-ml-bachelor-2024-venv",
102 | "language": "python",
103 | "name": "nn-ml-bachelor-2024-venv"
104 | },
105 | "language_info": {
106 | "codemirror_mode": {
107 | "name": "ipython",
108 | "version": 3
109 | },
110 | "file_extension": ".py",
111 | "mimetype": "text/x-python",
112 | "name": "python",
113 | "nbconvert_exporter": "python",
114 | "pygments_lexer": "ipython3",
115 | "version": "3.9.6"
116 | }
117 | },
118 | "nbformat": 4,
119 | "nbformat_minor": 5
120 | }
121 |
--------------------------------------------------------------------------------
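Steps 3 to 6 of the EDA lab (`02_lab_EDA.ipynb`) can be sketched as follows. This is only an illustration under stated assumptions: the toy `DataFrame` and its column names (`age`, `income`, `city`, `target`) are made up here and stand in for the real dataset from the course Google Drive. The pandas calls themselves (`value_counts`, `nunique`, `describe`, `info`, `isna`) are standard.

```python
import numpy as np
import pandas as pd

# Hypothetical toy dataset (an assumption); replace it with your own data.
data = pd.DataFrame({
    "age": [25, 32, np.nan, 41, 37, 29],
    "income": [40.0, 52.5, 61.0, np.nan, 58.0, 45.5],
    "city": ["A", "B", "A", None, "B", "A"],
    "target": [0, 1, 0, 1, 1, 0],
})

# Step 3: class frequencies and number of unique values of the target.
target_counts = data["target"].value_counts()
n_unique_target = data["target"].nunique()

# Step 4: count, mean, std, min, quartiles and max for numeric columns, in one call.
summary = data.describe()

# Step 5: column count, dtypes and non-null counts, in one call (prints to stdout).
data.info()

# Step 6: missing values per column; both NaN and None are counted.
missing_per_column = data.isna().sum()
print(missing_per_column)
```

For the histogram itself, `data["target"].hist()` is one option if matplotlib is installed.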
/notebooks/labs/04_lab_dl_intro.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "id": "bc9520d2",
6 | "metadata": {},
7 | "source": [
8 | "# Lab 4\n",
9 | "\n",
10 | "Full name: \n",
11 | "Group: \n",
12 | "\n",
13 | "Ways to submit:\n",
14 | "1. Push this notebook to the GitHub repository where you keep your lab notebooks\n",
15 | "\n",
16 | "Deadlines:\n",
17 | "- Class 8 of the semester (in-person class 4)\n",
18 | "\n",
19 | "What you need to do: \n",
20 | "- Train several deep learning models on the available data \n",
21 | "\n",
22 | "---\n",
23 | "## Read the assignment carefully\n",
24 | "\n",
25 | "Input data:\n",
26 | "1. Look up the name of your dataset in the [spreadsheet](https://docs.google.com/spreadsheets/d/11600OkqdRzxc-xZmFTuBs099VeAqU_GlE0Bx6bBlIRE/edit?usp=sharing) \n",
27 | "2. Download the data you need from [Google Drive](https://drive.google.com/drive/folders/1Phm-Fq1GL-VX7NS-DCEMRh_Fo1wU7DQ9?usp=sharing)\n",
28 | " \n",
29 | "---\n",
30 | "Now, point by point, what I expect from you: \n",
31 | "1. Download the required data and read it into a variable.\n",
32 | "2. Determine whether you have a classification problem (binary or multiclass) or a regression problem.\n",
33 | "3. Preprocess the data: \n",
34 | "    1. Split the data into training (train) and test sets. _Note that you fit the scalers and choose the values for filling missing data on the train set only, and then apply them to both train and test_.\n",
35 | "    2. Check the data for missing values. If there are any, fill them using one of the strategies suggested in the seminar 3 notebook. P.S. Numerical and categorical variables need different strategies.\n",
36 | "    3. Scale the numerical variables (`StandardScaler`, `MinMaxScaler`).\n",
37 | "    4. Encode the categorical features using one of the strategies.\n",
38 | "4. Wrap the data in a `Dataset` class from the `torch` library (as we did in the seminar), then pass it to a `DataLoader` (also covered in the seminar).\n",
39 | "5. Train on the training set:\n",
40 | "    1. A very simple single-layer neural network with the `SGD` optimizer ([link](https://pytorch.org/docs/stable/optim.html)).\n",
41 | "    2. A more complex neural network (with 1 hidden layer) with the `Adam` optimizer ([link](https://pytorch.org/docs/stable/optim.html)).\n",
42 | "    3. An even more complex neural network (with 3+ hidden layers) with the `Adam` optimizer ([link](https://pytorch.org/docs/stable/optim.html)).\n",
43 | "6. Compute the loss on the train and test sets as a function of the training epoch. Visualize it with the `matplotlib` library (this gives the so-called **learning curve** of the model).\n",
44 | "7. Compute the metrics on the train and test sets:\n",
45 | "    1. For classification -- Accuracy\n",
46 | "    2. For regression -- MAE\n",
47 | "8. Compare the metrics both between train and test and across the different models. Answer the following questions:\n",
48 | "    1. Which model handled the task best?\n",
49 | "    2. Is there overfitting?\n",
50 | "    3. Is there underfitting?\n",
51 | "    4. How could the models' metrics be improved?\n",
52 | "\n",
53 | "---\n",
54 | "P.S. \n",
55 | "Please do each task in separate cells under separate headings (like items 1 and 2 in this notebook), for example \n",
56 | "- Heading\n",
57 | "- Code cells\n",
58 | "- Another heading\n",
59 | "- More code cells\n",
60 | "\n",
61 | "P.P.S. \n",
62 | "If you were lucky enough to get multiclass classification, you will have no problems; you just need to set the required number of neurons at the output of your network."
63 | ]
64 | },
65 | {
66 | "cell_type": "markdown",
67 | "id": "28b51062",
68 | "metadata": {},
69 | "source": [
70 | "## 1. Data import example. Loading the data"
71 | ]
72 | },
73 | {
74 | "cell_type": "code",
75 | "execution_count": null,
76 | "id": "406597a7",
77 | "metadata": {},
78 | "outputs": [],
79 | "source": [
80 | "data = pd.read_csv(path_to_dataset, sep='\\t') # change the separator if needed"
81 | ]
82 | },
83 | {
84 | "cell_type": "markdown",
85 | "id": "0d119028",
86 | "metadata": {},
87 | "source": [
88 | "## 2. Working out which task we are facing"
89 | ]
90 | },
91 | {
92 | "cell_type": "code",
93 | "execution_count": null,
94 | "id": "d92464d0",
95 | "metadata": {},
96 | "outputs": [],
97 | "source": []
98 | },
99 | {
100 | "cell_type": "markdown",
101 | "id": "34931f43",
102 | "metadata": {},
103 | "source": [
104 | "## 3. Preprocessing the data"
105 | ]
106 | },
107 | {
108 | "cell_type": "code",
109 | "execution_count": null,
110 | "id": "67a611d0",
111 | "metadata": {},
112 | "outputs": [],
113 | "source": []
114 | }
115 | ],
116 | "metadata": {
117 | "kernelspec": {
118 | "display_name": "nn-ml-bachelor-2024-venv",
119 | "language": "python",
120 | "name": "nn-ml-bachelor-2024-venv"
121 | },
122 | "language_info": {
123 | "codemirror_mode": {
124 | "name": "ipython",
125 | "version": 3
126 | },
127 | "file_extension": ".py",
128 | "mimetype": "text/x-python",
129 | "name": "python",
130 | "nbconvert_exporter": "python",
131 | "pygments_lexer": "ipython3",
132 | "version": "3.9.6"
133 | }
134 | },
135 | "nbformat": 4,
136 | "nbformat_minor": 5
137 | }
138 |
--------------------------------------------------------------------------------
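Steps 4 to 6 of the deep learning lab (`04_lab_dl_intro.ipynb`) might look roughly like the sketch below: a custom `Dataset`, a `DataLoader`, a single-layer network trained with `SGD`, and per-epoch train/test losses collected for a learning curve. Everything specific here is an assumption for illustration: the synthetic regression data, the `TabularDataset` and `mean_loss` names, the batch size, the learning rate and the number of epochs. It is not the seminar's code.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, Dataset


class TabularDataset(Dataset):
    """Hypothetical wrapper around a feature matrix and a target vector."""

    def __init__(self, features, targets):
        self.features = torch.as_tensor(features, dtype=torch.float32)
        self.targets = torch.as_tensor(targets, dtype=torch.float32)

    def __len__(self):
        return len(self.features)

    def __getitem__(self, idx):
        return self.features[idx], self.targets[idx]


torch.manual_seed(0)
# Synthetic regression data (an assumption): y is a noisy linear function of X.
X = torch.randn(256, 4)
y = X @ torch.tensor([1.5, -2.0, 0.5, 0.0]) + 0.1 * torch.randn(256)

train_ds = TabularDataset(X[:192], y[:192])
test_ds = TabularDataset(X[192:], y[192:])
train_loader = DataLoader(train_ds, batch_size=32, shuffle=True)
test_loader = DataLoader(test_ds, batch_size=32)

model = nn.Linear(4, 1)  # the "very simple single-layer network"
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
loss_fn = nn.MSELoss()


def mean_loss(loader):
    """Average loss over all samples of a loader, without tracking gradients."""
    model.eval()
    total, count = 0.0, 0
    with torch.no_grad():
        for xb, yb in loader:
            total += loss_fn(model(xb).squeeze(1), yb).item() * len(xb)
            count += len(xb)
    return total / count


train_curve, test_curve = [], []
for epoch in range(20):
    model.train()
    for xb, yb in train_loader:
        optimizer.zero_grad()
        loss = loss_fn(model(xb).squeeze(1), yb)
        loss.backward()
        optimizer.step()
    # One point per epoch for each curve.
    train_curve.append(mean_loss(train_loader))
    test_curve.append(mean_loss(test_loader))
```

Plotting `train_curve` and `test_curve` against the epoch index with `matplotlib` gives the learning curve. For the deeper models, swapping `nn.Linear` for an `nn.Sequential` stack and `SGD` for `torch.optim.Adam` leaves the loop unchanged.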
/notebooks/labs/03_lab_ml_intro.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "id": "bc9520d2",
6 | "metadata": {},
7 | "source": [
8 | "# Lab 3\n",
9 | "\n",
10 | "Full name: \n",
11 | "Group: \n",
12 | "\n",
13 | "Ways to submit:\n",
14 | "1. Push this notebook to the GitHub repository where you keep your lab notebooks\n",
15 | "\n",
16 | "Deadlines:\n",
17 | "- Class 7 of the semester (in-person class 3)\n",
18 | "\n",
19 | "What you need to do: \n",
20 | "- Train various machine learning models and compare them with each other \n",
21 | "\n",
22 | "---\n",
23 | "## Read the assignment carefully\n",
24 | "\n",
25 | "Input data:\n",
26 | "1. Look up the name of your dataset in the [spreadsheet](https://docs.google.com/spreadsheets/d/11600OkqdRzxc-xZmFTuBs099VeAqU_GlE0Bx6bBlIRE/edit?usp=sharing) \n",
27 | "2. Download the data you need from [Google Drive](https://drive.google.com/drive/folders/1Phm-Fq1GL-VX7NS-DCEMRh_Fo1wU7DQ9?usp=sharing)\n",
28 | " \n",
29 | "---\n",
30 | "Now, point by point, what I expect from you: \n",
31 | "1. Download the required data and read it into a variable.\n",
32 | "2. Determine whether you have a classification problem (binary or multiclass) or a regression problem (**if you have multiclass classification, read the P.P.S. below**).\n",
33 | "3. Preprocess the data: \n",
34 | "    1. Split the data into training (train) and test sets. _Note that you fit the scalers and choose the values for filling missing data on the train set only, and then apply them to both train and test_.\n",
35 | "    2. Check the data for missing values. If there are any, fill them using one of the strategies suggested in the seminar 3 notebook. P.S. Numerical and categorical variables need different strategies.\n",
36 | "    3. Scale the numerical variables (`StandardScaler`, `MinMaxScaler`).\n",
37 | "    4. Encode the categorical features using one of the strategies.\n",
38 | "4. Train on the training set:\n",
39 | "    1. A linear model (`LogisticRegression`, `LinearRegression`)\n",
40 | "    2. A tree-based model (`DecisionTreeClassifier`, `DecisionTreeRegressor`) (I recommend trying different tree depths here)\n",
41 | "    3. K-nearest neighbors (`KNeighborsClassifier`, `KNeighborsRegressor`) (it also makes sense to try different values of `k` here)\n",
42 | "    4. A random forest (`RandomForestClassifier`, `RandomForestRegressor`) \n",
43 | "5. Compute the metrics on the train and test sets:\n",
44 | "    1. For classification -- Accuracy, ROC-AUC (plot + value), PR curve (plot), F1-score\n",
45 | "    2. For regression -- MAE, RMSE, MAPE\n",
46 | "6. Compare the metrics both between train and test and across the different models. Answer the following questions:\n",
47 | "    1. Which model handled the task best?\n",
48 | "    2. Is there overfitting?\n",
49 | "    3. Is there underfitting?\n",
50 | "    4. How could the models' metrics be improved?\n",
51 | "\n",
52 | "---\n",
53 | "P.S. \n",
54 | "Please do each task in separate cells under separate headings (like items 1 and 2 in this notebook), for example \n",
55 | "- Heading\n",
56 | "- Code cells\n",
57 | "- Another heading\n",
58 | "- More code cells\n",
59 | "\n",
60 | "P.P.S. \n",
61 | "If you were lucky enough to get multiclass classification, check whether your algorithm can handle several classes at once (the scikit-learn estimators listed above support multiclass targets natively, but not every algorithm or metric does). You may therefore need something like OneVsRestClassifier ([link](https://scikit-learn.org/stable/modules/generated/sklearn.multiclass.OneVsRestClassifier.html#sklearn.multiclass.OneVsRestClassifier)), and I recommend reading this [page](https://scikit-learn.org/stable/modules/multiclass.html), which gives more complete information."
62 | ]
63 | },
64 | {
65 | "cell_type": "markdown",
66 | "id": "28b51062",
67 | "metadata": {},
68 | "source": [
69 | "## 1. Data import example. Loading the data"
70 | ]
71 | },
72 | {
73 | "cell_type": "code",
74 | "execution_count": null,
75 | "id": "406597a7",
76 | "metadata": {},
77 | "outputs": [],
78 | "source": [
79 | "data = pd.read_csv(path_to_dataset, sep='\\t') # change the separator if needed"
80 | ]
81 | },
82 | {
83 | "cell_type": "markdown",
84 | "id": "0d119028",
85 | "metadata": {},
86 | "source": [
87 | "## 2. Working out which task we are facing"
88 | ]
89 | },
90 | {
91 | "cell_type": "code",
92 | "execution_count": null,
93 | "id": "d92464d0",
94 | "metadata": {},
95 | "outputs": [],
96 | "source": []
97 | },
98 | {
99 | "cell_type": "markdown",
100 | "id": "34931f43",
101 | "metadata": {},
102 | "source": [
103 | "## 3. Preprocessing the data"
104 | ]
105 | },
106 | {
107 | "cell_type": "code",
108 | "execution_count": null,
109 | "id": "67a611d0",
110 | "metadata": {},
111 | "outputs": [],
112 | "source": []
113 | }
114 | ],
115 | "metadata": {
116 | "kernelspec": {
117 | "display_name": "nn-ml-bachelor-2024-venv",
118 | "language": "python",
119 | "name": "nn-ml-bachelor-2024-venv"
120 | },
121 | "language_info": {
122 | "codemirror_mode": {
123 | "name": "ipython",
124 | "version": 3
125 | },
126 | "file_extension": ".py",
127 | "mimetype": "text/x-python",
128 | "name": "python",
129 | "nbconvert_exporter": "python",
130 | "pygments_lexer": "ipython3",
131 | "version": "3.9.6"
132 | }
133 | },
134 | "nbformat": 4,
135 | "nbformat_minor": 5
136 | }
137 |
--------------------------------------------------------------------------------
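The note in step 3 of the ML lab (`03_lab_ml_intro.ipynb`), that imputers and scalers are fitted on train only and then applied to both train and test, can be sketched with a scikit-learn pipeline. The synthetic data, the 5% missing rate, the median imputation strategy and the choice of `LogisticRegression` are all assumptions for illustration. The point is that `fit` sees only the training split.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic binary classification data (an assumption, not a course dataset).
X = rng.normal(size=(200, 3))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
X[rng.random(X.shape) < 0.05] = np.nan  # inject roughly 5% missing values

# Split first, so that no statistic is ever computed on the test rows.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# The pipeline learns the medians, means and standard deviations in fit(),
# which is called on the training split only; predict() just applies them.
model = make_pipeline(
    SimpleImputer(strategy="median"),
    StandardScaler(),
    LogisticRegression(),
)
model.fit(X_train, y_train)

train_acc = accuracy_score(y_train, model.predict(X_train))
test_acc = accuracy_score(y_test, model.predict(X_test))
print(f"train accuracy: {train_acc:.3f}, test accuracy: {test_acc:.3f}")
```

A large gap between `train_acc` and `test_acc` is one sign of overfitting; low values on both suggest underfitting.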
/notebooks/labs/01_lab_python_intro_old_example.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "id": "596b4987",
6 | "metadata": {},
7 | "source": [
8 | "# Lab 1\n",
9 | "\n",
10 | "Full name: \n",
11 | "Group: \n",
12 | "\n",
13 | "Ways to submit:\n",
14 | "1. Create a **private** repository on GitHub and add me by username (l3lush) to Collaborators (Settings -> Collaborators -> Add people)\n",
15 | "2. Email the completed notebook to avmysh@gmail.com or m1603956@edu.misis.ru\n",
16 | "3. Send the completed notebook to me on Telegram, @l3lush. \n",
17 | "\n",
18 | "Deadlines:\n",
19 | "- soft -- **05.03.2023 23:59** (submitting before this time earns +1 point in the spreadsheet)\n",
20 | "- hard -- **19.03.2023 23:59**\n",
21 | "\n",
22 | "What you need to do:\n",
23 | "1. Fill in all the cells below with code so that every `assert` passes."
24 | ]
25 | },
26 | {
27 | "cell_type": "markdown",
28 | "id": "2213731a",
29 | "metadata": {},
30 | "source": [
31 | "# 0. How assert works"
32 | ]
33 | },
34 | {
35 | "cell_type": "markdown",
36 | "id": "0dcee059",
37 | "metadata": {},
38 | "source": [
39 | "`assert` is used to check conditions. \n",
40 | "If the condition is true (evaluates to `True`), the code runs without errors; otherwise an `AssertionError` is raised. \n",
41 | "Below is an example of how `assert` works (you may leave the error in the cell with the `assert a > 6` example, but there must be **no** errors anywhere further in the code) \n",
42 | "P.S. The `assert` statements must **not** be changed in any way"
43 | ]
44 | },
45 | {
46 | "cell_type": "code",
47 | "execution_count": 1,
48 | "id": "31be897b",
49 | "metadata": {},
50 | "outputs": [],
51 | "source": [
52 | "a = 5\n",
53 | "assert a == 5 # assert passes because a equals 5"
54 | ]
55 | },
56 | {
57 | "cell_type": "code",
58 | "execution_count": 2,
59 | "id": "139c9ca8",
60 | "metadata": {},
61 | "outputs": [
62 | {
63 | "ename": "AssertionError",
64 | "evalue": "",
65 | "output_type": "error",
66 | "traceback": [
67 | "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
68 | "\u001b[1;31mAssertionError\u001b[0m Traceback (most recent call last)",
69 | "\u001b[1;32m~\\AppData\\Local\\Temp\\ipykernel_21924\\1589179466.py\u001b[0m in \u001b[0;36m\u001b[1;34m\u001b[0m\n\u001b[1;32m----> 1\u001b[1;33m \u001b[1;32massert\u001b[0m \u001b[0ma\u001b[0m \u001b[1;33m>\u001b[0m \u001b[1;36m6\u001b[0m \u001b[1;31m# assert не проходит, т.к. а не больше 6, выпадает ошибка\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m",
70 | "\u001b[1;31mAssertionError\u001b[0m: "
71 | ]
72 | }
73 | ],
74 | "source": [
75 | "assert a > 6 # assert fails because a is not greater than 6, so an error is raised"
76 | ]
77 | },
78 | {
79 | "cell_type": "markdown",
80 | "id": "a72f14da",
81 | "metadata": {},
82 | "source": [
83 | "# 1. Basic variable types"
84 | ]
85 | },
86 | {
87 | "cell_type": "markdown",
88 | "id": "ccfb5acf",
89 | "metadata": {},
90 | "source": [
91 | "Create the variables so that they are objects of a specific type (the types are given in the comments)."
92 | ]
93 | },
94 | {
95 | "cell_type": "code",
96 | "execution_count": 5,
97 | "id": "e88c14eb",
98 | "metadata": {},
99 | "outputs": [
100 | {
101 | "ename": "AssertionError",
102 | "evalue": "",
103 | "output_type": "error",
104 | "traceback": [
105 | "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
106 | "\u001b[1;31mAssertionError\u001b[0m Traceback (most recent call last)",
107 | "\u001b[1;32m~\\AppData\\Local\\Temp\\ipykernel_21924\\2380738517.py\u001b[0m in \u001b[0;36m\u001b[1;34m\u001b[0m\n\u001b[0;32m 1\u001b[0m \u001b[0mint_variable\u001b[0m \u001b[1;33m=\u001b[0m \u001b[1;33m[\u001b[0m\u001b[1;36m4.1\u001b[0m\u001b[1;33m]\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m----> 2\u001b[1;33m \u001b[1;32massert\u001b[0m \u001b[0misinstance\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mint_variable\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mint\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m",
108 | "\u001b[1;31mAssertionError\u001b[0m: "
109 | ]
110 | }
111 | ],
112 | "source": [
113 | "int_variable = [4.1]\n",
114 | "assert isinstance(int_variable, int)"
115 | ]
116 | },
117 | {
118 | "cell_type": "code",
119 | "execution_count": null,
120 | "id": "db46e8d6",
121 | "metadata": {},
122 | "outputs": [],
123 | "source": [
124 | "int_variable = ... # int\n",
125 | "assert isinstance(int_variable, int)\n",
126 | "\n",
127 | "float_variable = ... # float\n",
128 | "assert isinstance(float_variable, float)\n",
129 | "\n",
130 | "string_variable = ... # string\n",
131 | "assert isinstance(string_variable, str)\n",
132 | "\n",
133 | "bool_variable = ... # bool\n",
134 | "assert isinstance(bool_variable, bool)"
135 | ]
136 | },
137 | {
138 | "cell_type": "code",
139 | "execution_count": null,
140 | "id": "ed1861a0",
141 | "metadata": {},
142 | "outputs": [],
143 | "source": [
144 | "list_variable = ... # list\n",
145 | "assert isinstance(list_variable, list)\n",
146 | "\n",
147 | "tuple_variable = ... # tuple\n",
148 | "assert isinstance(tuple_variable, tuple)\n",
149 | "\n",
150 | "set_variable = ... # set\n",
151 | "assert isinstance(set_variable, set)\n",
152 | "\n",
153 | "dict_variable = ... # dict\n",
154 | "assert isinstance(dict_variable, dict)"
155 | ]
156 | },
157 | {
158 | "cell_type": "markdown",
159 | "id": "52830470",
160 | "metadata": {},
161 | "source": [
162 | "# 2. Various functions"
163 | ]
164 | },
165 | {
166 | "cell_type": "code",
167 | "execution_count": null,
168 | "id": "3eb16cbe",
169 | "metadata": {},
170 | "outputs": [],
171 | "source": [
172 | "def difference_of_two_numbers(first, second):\n",
173 | "    \"\"\"Return the difference between the first and the second argument.\"\"\"\n",
174 | "    # TODO: implement me\n",
175 | "    pass\n",
176 | "\n",
177 | "\n",
178 | "assert difference_of_two_numbers(2, 1) == 1\n",
179 | "assert difference_of_two_numbers(4, 1) == 3\n",
180 | "assert difference_of_two_numbers(10, 0) == 10\n",
181 | "assert difference_of_two_numbers(-5, -6) == 1"
182 | ]
183 | },
184 | {
185 | "cell_type": "code",
186 | "execution_count": null,
187 | "id": "65ff4881",
188 | "metadata": {},
189 | "outputs": [],
190 | "source": [
191 | "def condition_function(input_number):\n",
192 | "    \"\"\"\n",
193 | "    If the input number is less than or equal to 0, multiply it by 2.\n",
194 | "    Otherwise, if the number is greater than 0 but less than or equal to 10, multiply it by 3.\n",
195 | "    In all other cases, divide it by 10.\n",
196 | "    \"\"\"\n",
197 | "    # TODO: implement me\n",
198 | "    pass\n",
199 | "\n",
200 | "\n",
201 | "assert condition_function(0) == 0\n",
202 | "assert condition_function(-1) == -2\n",
203 | "assert condition_function(1) == 3\n",
204 | "assert condition_function(10) == 30\n",
205 | "assert condition_function(11) == 1.1\n",
206 | "assert condition_function(20) == 2"
207 | ]
208 | },
209 | {
210 | "cell_type": "code",
211 | "execution_count": null,
212 | "id": "51264059",
213 | "metadata": {},
214 | "outputs": [],
215 | "source": [
216 | "def calculator(number_1, operation, number_2):\n",
217 | "    \"\"\"\n",
218 | "    A simple calculator that can perform the operations +, -, *, /.\n",
219 | "    Takes the first number, the operation as a string, and the second number.\n",
220 | "    \n",
221 | "    Example: \n",
222 | "    >>> calculator(1, \"+\", 1)\n",
223 | "    2\n",
224 | "    \"\"\"\n",
225 | "    # TODO: implement me\n",
226 | "    pass\n",
227 | "\n",
228 | "\n",
229 | "assert calculator(1, \"+\", 2) == 3\n",
230 | "assert calculator(3, \"-\", 1) == 2\n",
231 | "assert calculator(4, \"*\", 3) == 12\n",
232 | "assert calculator(2, \"/\", 2) == 1"
233 | ]
234 | },
235 | {
236 | "cell_type": "code",
237 | "execution_count": null,
238 | "id": "02255ff4",
239 | "metadata": {},
240 | "outputs": [],
241 | "source": [
242 | "def number_of_unique_elements(input_list):\n",
243 | "    \"\"\"\n",
244 | "    Count the number of unique elements in the list.\n",
245 | "    \"\"\"\n",
246 | "    # TODO: implement me\n",
247 | "    pass\n",
248 | "\n",
249 | "\n",
250 | "assert number_of_unique_elements([1, 2, 3]) == 3\n",
251 | "assert number_of_unique_elements([1] * 93) == 1\n",
252 | "assert number_of_unique_elements(list(range(1000))) == 1000"
253 | ]
254 | },
255 | {
256 | "cell_type": "code",
257 | "execution_count": null,
258 | "id": "2a2a2363",
259 | "metadata": {},
260 | "outputs": [],
261 | "source": [
262 | "def counter(input_list):\n",
263 | "    \"\"\"\n",
264 | "    Count the number of occurrences of each element of the list.\n",
265 | "    Return a dictionary of the form {number: number of occurrences}\n",
266 | "    \n",
267 | "    Note (!): using the Counter class from collections is not allowed\n",
268 | "    \n",
269 | "    For example:\n",
270 | "    counter([1, 1, 2, 3]) returns {1: 2, 2: 1, 3: 1}\n",
271 | "    \"\"\"\n",
272 | "    # TODO: implement me\n",
273 | "    pass\n",
274 | "\n",
275 | "\n",
276 | "assert counter([1, 1, 1, 2, 3]) == {1: 3, 2: 1, 3: 1}\n",
277 | "assert counter([1] * 1000) == {1: 1000}\n",
278 | "assert counter([1, 3, 5] * 100) == {1: 100, 3: 100, 5: 100}"
279 | ]
280 | },
281 | {
282 | "cell_type": "code",
283 | "execution_count": null,
284 | "id": "c1498018",
285 | "metadata": {},
286 | "outputs": [],
287 | "source": [
288 | "def multiply_nums(input_string):\n",
289 | "    \"\"\"\n",
290 | "    Multiply the comma-separated numbers passed in the string.\n",
291 | "    \n",
292 | "    hint: you can use the .split() method\n",
293 | "    \"\"\"\n",
294 | "    # TODO: implement me\n",
295 | "    pass\n",
296 | "\n",
297 | "\n",
298 | "assert multiply_nums(\"2, 3\") == 6\n",
299 | "assert multiply_nums(\"1, 1, 1, 1, 1, 1, 1\") == 1\n",
300 | "assert multiply_nums(\"345, 4576, 794, 325, 0\") == 0"
301 | ]
302 | },
303 | {
304 | "cell_type": "markdown",
305 | "id": "202c838a",
306 | "metadata": {},
307 | "source": [
308 | "Implement the following function:\n",
309 | "$$\n",
310 | "y = \\sin(x)\\cdot \\cos(x)\n",
311 | "$$\n",
312 | "P.S. Use the `math` library"
313 | ]
314 | },
315 | {
316 | "cell_type": "code",
317 | "execution_count": null,
318 | "id": "4121596c",
319 | "metadata": {},
320 | "outputs": [],
321 | "source": [
322 | "def custom_function(x):\n",
323 | "    \"\"\"\n",
324 | "    Implement the function described above.\n",
325 | "    \"\"\"\n",
326 | "    # TODO: implement me\n",
327 | "    pass\n",
328 | "\n",
329 | "assert round(custom_function(1), 3) == 0.455\n",
330 | "assert round(custom_function(1.5), 3) == 0.071\n",
331 | "assert round(custom_function(2), 3) == -0.378\n",
332 | "assert custom_function(0) == 0"
333 | ]
334 | },
335 | {
336 | "cell_type": "markdown",
337 | "id": "7a2210d6",
338 | "metadata": {},
339 | "source": [
340 | "Implement the following function:\n",
341 | "$$\n",
342 | "y = \\prod\\limits_{i = 1}^{n} \\frac{(i+2)^x + \\ln(x)}{x^2 + 4i}\n",
343 | "$$\n",
344 | "P.S. Use the `math` library"
345 | ]
346 | },
347 | {
348 | "cell_type": "code",
349 | "execution_count": null,
350 | "id": "b4698ba6",
351 | "metadata": {},
352 | "outputs": [],
353 | "source": [
354 | "def custom_function_1(x, n):\n",
355 | "    \"\"\"\n",
356 | "    Implement the function described above.\n",
357 | "    \"\"\"\n",
358 | "    # TODO: implement me\n",
359 | "    pass\n",
360 | "\n",
361 | "\n",
362 | "assert round(custom_function_1(2, 3), 3) == 2.707\n",
363 | "assert round(custom_function_1(3, 2), 3) == 8.277\n",
364 | "assert round(custom_function_1(3, 3), 3) == 49.7"
365 | ]
366 | },
367 | {
368 | "cell_type": "code",
369 | "execution_count": null,
370 | "id": "b9a4db60",
371 | "metadata": {},
372 | "outputs": [],
373 | "source": [
374 | "class MyList:\n",
375 | "    def __init__(self, input_list):\n",
376 | "        # TODO: implement me\n",
377 | "        pass\n",
378 | "    \n",
379 | "    def return_sum(self):\n",
380 | "        \"\"\"\n",
381 | "        Return the sum of all elements of the stored list.\n",
382 | "        Using the built-in sum is not allowed!\n",
383 | "        \"\"\"\n",
384 | "        # TODO: implement me\n",
385 | "        pass\n",
386 | "    \n",
387 | "    def make_reverse(self):\n",
388 | "        \"\"\"\n",
389 | "        Reverse the stored list.\n",
390 | "        \"\"\"\n",
391 | "        # TODO: implement me\n",
392 | "        pass\n",
393 | "    \n",
394 | "    def make_slice(self, start, stop):\n",
395 | "        \"\"\"\n",
396 | "        Slice the stored list.\n",
397 | "        \"\"\"\n",
398 | "        # TODO: implement me\n",
399 | "        pass\n",
400 | " \n",
401 | " \n",
402 | "a = MyList([1, 2, 3, 4])\n",
403 | "assert a.return_sum() == 10\n",
404 | "assert a.make_reverse() == [4, 3, 2, 1]\n",
405 | "assert a.make_slice(0, 2) == [1, 2]\n",
406 | "\n",
407 | "b = MyList([5, 6, 6, 5])\n",
408 | "assert b.return_sum() == 22\n",
409 | "assert b.make_reverse() == [5, 6, 6, 5]\n",
410 | "assert b.make_slice(1, 2) == [6]"
411 | ]
412 | }
413 | ],
414 | "metadata": {
415 | "kernelspec": {
416 | "display_name": "Python 3 (ipykernel)",
417 | "language": "python",
418 | "name": "python3"
419 | },
420 | "language_info": {
421 | "codemirror_mode": {
422 | "name": "ipython",
423 | "version": 3
424 | },
425 | "file_extension": ".py",
426 | "mimetype": "text/x-python",
427 | "name": "python",
428 | "nbconvert_exporter": "python",
429 | "pygments_lexer": "ipython3",
430 | "version": "3.9.13"
431 | }
432 | },
433 | "nbformat": 4,
434 | "nbformat_minor": 5
435 | }
436 |
--------------------------------------------------------------------------------
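As a small aside to the `assert` introduction in the Lab 1 notebook, the sketch below shows an `assert` with an explicit message and how the resulting `AssertionError` can be caught. The `check_positive` helper is hypothetical and exists only for this illustration; it is not part of the assignment.

```python
def check_positive(value):
    """Hypothetical helper: returns the value if it is positive, fails otherwise."""
    # The text after the comma becomes the message of the AssertionError.
    assert value > 0, f"expected a positive number, got {value}"
    return value


check_positive(5)  # the condition is True, so nothing happens

try:
    check_positive(-1)  # the condition is False, so AssertionError is raised
except AssertionError as error:
    message = str(error)
    print(f"AssertionError: {message}")
```

Note that Python skips `assert` statements entirely when run with the `-O` flag, so they suit checks like the ones in this lab rather than input validation in production code.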
/notebooks/labs/01_lab_python_intro.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "id": "596b4987",
6 | "metadata": {},
7 | "source": [
8 | "# Lab 1\n",
9 | "\n",
10 | "Full name:  \n",
11 | "Group:  \n",
12 | "\n",
13 | "Submission options:\n",
14 | "1. Create a **private** repository on GitHub and add me (username l3lush) to Collaborators (Settings -> Collaborators -> Add people)\n",
15 | "\n",
16 | "Deadlines:\n",
17 | "- Class 5 of the semester (in-person class 1)\n",
18 | "\n",
19 | "What to do:\n",
20 | "1. Fill in all the cells below with code so that every `assert` passes."
21 | ]
22 | },
23 | {
24 | "cell_type": "markdown",
25 | "id": "2213731a",
26 | "metadata": {},
27 | "source": [
28 | "# 0. How assert works"
29 | ]
30 | },
31 | {
32 | "cell_type": "markdown",
33 | "id": "0dcee059",
34 | "metadata": {},
35 | "source": [
36 | "`assert` checks that a condition holds.  \n",
37 | "If the condition is true (evaluates to `True`), the code runs without errors; otherwise an `AssertionError` is raised.  \n",
38 | "Below is an example of how `assert` works (the error in the `assert a > 6` example cell may stay, but there must be **no** errors anywhere else in the code).  \n",
39 | "P.S. The `assert`s must **not** be changed in any way."
40 | ]
41 | },
42 | {
43 | "cell_type": "code",
44 | "execution_count": 1,
45 | "id": "31be897b",
46 | "metadata": {},
47 | "outputs": [],
48 | "source": [
49 | "a = 5\n",
50 | "assert a == 5 # assert passes because a equals 5"
51 | ]
52 | },
53 | {
54 | "cell_type": "code",
55 | "execution_count": 2,
56 | "id": "139c9ca8",
57 | "metadata": {},
58 | "outputs": [
59 | {
60 | "ename": "AssertionError",
61 | "evalue": "",
62 | "output_type": "error",
63 | "traceback": [
64 | "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
65 | "\u001b[1;31mAssertionError\u001b[0m Traceback (most recent call last)",
66 | "\u001b[1;32m~\\AppData\\Local\\Temp\\ipykernel_23792\\1589179466.py\u001b[0m in \u001b[0;36m\u001b[1;34m\u001b[0m\n\u001b[1;32m----> 1\u001b[1;33m \u001b[1;32massert\u001b[0m \u001b[0ma\u001b[0m \u001b[1;33m>\u001b[0m \u001b[1;36m6\u001b[0m \u001b[1;31m# assert fails because a is not greater than 6, so an error is raised\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m",
67 | "\u001b[1;31mAssertionError\u001b[0m: "
68 | ]
69 | }
70 | ],
71 | "source": [
72 | "assert a > 6 # assert fails because a is not greater than 6, so an error is raised"
73 | ]
74 | },
75 | {
76 | "cell_type": "markdown",
77 | "id": "a72f14da",
78 | "metadata": {},
79 | "source": [
80 | "# 1. Basic variable types"
81 | ]
82 | },
83 | {
84 | "cell_type": "markdown",
85 | "id": "ccfb5acf",
86 | "metadata": {},
87 | "source": [
88 | "Create the variables so that each one is an object of the required type (the types are given in the comments)."
89 | ]
90 | },
91 | {
92 | "cell_type": "code",
93 | "execution_count": null,
94 | "id": "db46e8d6",
95 | "metadata": {},
96 | "outputs": [],
97 | "source": [
98 | "int_variable = ... # int\n",
99 | "assert isinstance(int_variable, int)\n",
100 | "\n",
101 | "float_variable = ... # float\n",
102 | "assert isinstance(float_variable, float)\n",
103 | "\n",
104 | "string_variable = ... # string\n",
105 | "assert isinstance(string_variable, str)\n",
106 | "\n",
107 | "bool_variable = ... # bool\n",
108 | "assert isinstance(bool_variable, bool)"
109 | ]
110 | },
111 | {
112 | "cell_type": "code",
113 | "execution_count": null,
114 | "id": "ed1861a0",
115 | "metadata": {},
116 | "outputs": [],
117 | "source": [
118 | "list_variable = ... # list\n",
119 | "assert isinstance(list_variable, list)\n",
120 | "\n",
121 | "tuple_variable = ... # tuple\n",
122 | "assert isinstance(tuple_variable, tuple)\n",
123 | "\n",
124 | "set_variable = ... # set\n",
125 | "assert isinstance(set_variable, set)\n",
126 | "\n",
127 | "dict_variable = ... # dict\n",
128 | "assert isinstance(dict_variable, dict)"
129 | ]
130 | },
131 | {
132 | "cell_type": "markdown",
133 | "id": "00ec06c0",
134 | "metadata": {},
135 | "source": [
136 | ""
137 | ]
138 | },
139 | {
140 | "cell_type": "markdown",
141 | "id": "52830470",
142 | "metadata": {},
143 | "source": [
144 | "# 2.1 Assorted functions"
145 | ]
146 | },
147 | {
148 | "cell_type": "code",
149 | "execution_count": null,
150 | "id": "3eb16cbe",
151 | "metadata": {},
152 | "outputs": [],
153 | "source": [
154 | "def difference_of_two_numbers(first, second):\n",
155 | "    \"\"\"Returns the difference between the first and the second argument\"\"\"\n",
156 | "    # TODO: write me\n",
157 | "    pass\n",
158 | "\n",
159 | "\n",
160 | "assert difference_of_two_numbers(2, 1) == 1\n",
161 | "assert difference_of_two_numbers(4, 1) == 3\n",
162 | "assert difference_of_two_numbers(10, 0) == 10\n",
163 | "assert difference_of_two_numbers(-5, -6) == 1"
164 | ]
165 | },
166 | {
167 | "cell_type": "code",
168 | "execution_count": null,
169 | "id": "65ff4881",
170 | "metadata": {},
171 | "outputs": [],
172 | "source": [
173 | "def condition_function(input_number):\n",
174 | "    \"\"\"\n",
175 | "    If the input number is less than or equal to 0, multiply it by 2.\n",
176 | "    Otherwise, if it is greater than 0 but less than or equal to 10, multiply it by 3.\n",
177 | "    In all other cases, divide it by 10.\n",
178 | "    \"\"\"\n",
179 | "    # TODO: write me\n",
180 | "    pass\n",
181 | "\n",
182 | "\n",
183 | "assert condition_function(0) == 0\n",
184 | "assert condition_function(-1) == -2\n",
185 | "assert condition_function(1) == 3\n",
186 | "assert condition_function(10) == 30\n",
187 | "assert condition_function(11) == 1.1\n",
188 | "assert condition_function(20) == 2"
189 | ]
190 | },
191 | {
192 | "cell_type": "code",
193 | "execution_count": null,
194 | "id": "51264059",
195 | "metadata": {},
196 | "outputs": [],
197 | "source": [
198 | "def calculator(number_1, operation, number_2):\n",
199 | "    \"\"\"\n",
200 | "    A simple calculator that can perform the operations +, -, *, /.\n",
201 | "    Takes the first number, the operation as a string, and the second number.\n",
202 | "    \n",
203 | "    Example:\n",
204 | "    >>> calculator(1, \"+\", 1)\n",
205 | "    2\n",
206 | "    \"\"\"\n",
207 | "    # TODO: write me\n",
208 | "    pass\n",
209 | "\n",
210 | "\n",
211 | "assert calculator(1, \"+\", 2) == 3\n",
212 | "assert calculator(3, \"-\", 1) == 2\n",
213 | "assert calculator(4, \"*\", 3) == 12\n",
214 | "assert calculator(2, \"/\", 2) == 1"
215 | ]
216 | },
217 | {
218 | "cell_type": "code",
219 | "execution_count": null,
220 | "id": "02255ff4",
221 | "metadata": {},
222 | "outputs": [],
223 | "source": [
224 | "def number_of_unique_elements(input_list):\n",
225 | "    \"\"\"\n",
226 | "    Counts the number of unique elements in the list.\n",
227 | "    \"\"\"\n",
228 | "    # TODO: write me\n",
229 | "    pass\n",
230 | "\n",
231 | "\n",
232 | "assert number_of_unique_elements([1, 2, 3]) == 3\n",
233 | "assert number_of_unique_elements([1] * 93) == 1\n",
234 | "assert number_of_unique_elements(list(range(1000))) == 1000"
235 | ]
236 | },
237 | {
238 | "cell_type": "code",
239 | "execution_count": null,
240 | "id": "2a2a2363",
241 | "metadata": {},
242 | "outputs": [],
243 | "source": [
244 | "def counter(input_list):\n",
245 | "    \"\"\"\n",
246 | "    Counts the number of occurrences of each element of the list.\n",
247 | "    Returns a dict of the form {number: number of occurrences}\n",
248 | "    \n",
249 | "    Note (!): using Counter from the collections module is not allowed\n",
250 | "    \n",
251 | "    For example:\n",
252 | "    counter([1, 1, 2, 3]) returns {1: 2, 2: 1, 3: 1}\n",
253 | "    \"\"\"\n",
254 | "    # TODO: write me\n",
255 | "    pass\n",
256 | "\n",
257 | "\n",
258 | "assert counter([1, 1, 1, 2, 3]) == {1: 3, 2: 1, 3: 1}\n",
259 | "assert counter([1] * 1000) == {1: 1000}\n",
260 | "assert counter([1, 3, 5] * 100) == {1: 100, 3: 100, 5: 100}"
261 | ]
262 | },
263 | {
264 | "cell_type": "code",
265 | "execution_count": null,
266 | "id": "c1498018",
267 | "metadata": {},
268 | "outputs": [],
269 | "source": [
270 | "def multiply_nums(input_string):\n",
271 | "    \"\"\"\n",
272 | "    Multiplies the comma-separated numbers passed in a string.\n",
273 | "    \n",
274 | "    hint: the .split() method may be useful\n",
275 | "    \"\"\"\n",
276 | "    # TODO: write me\n",
277 | "    pass\n",
278 | "\n",
279 | "\n",
280 | "assert multiply_nums(\"2, 3\") == 6\n",
281 | "assert multiply_nums(\"1, 1, 1, 1, 1, 1, 1\") == 1\n",
282 | "assert multiply_nums(\"345, 4576, 794, 325, 0\") == 0"
283 | ]
284 | },
285 | {
286 | "cell_type": "markdown",
287 | "id": "202c838a",
288 | "metadata": {},
289 | "source": [
290 | "Implement the following function:\n",
291 | "$$\n",
292 | "y = \\sin(x) \\cdot \\cos(x)\n",
293 | "$$\n",
294 | "P.S. use the `math` library"
295 | ]
296 | },
297 | {
298 | "cell_type": "code",
299 | "execution_count": null,
300 | "id": "4121596c",
301 | "metadata": {},
302 | "outputs": [],
303 | "source": [
304 | "def custom_function(x):\n",
305 | "    \"\"\"\n",
306 | "    Implement the function described above.\n",
307 | "    \"\"\"\n",
308 | "    # TODO: write me\n",
309 | "    pass\n",
310 | "\n",
311 | "assert round(custom_function(1), 3) == 0.455\n",
312 | "assert round(custom_function(1.5), 3) == 0.071\n",
313 | "assert round(custom_function(2), 3) == -0.378\n",
314 | "assert custom_function(0) == 0"
315 | ]
316 | },
317 | {
318 | "cell_type": "markdown",
319 | "id": "7a2210d6",
320 | "metadata": {},
321 | "source": [
322 | "Implement the following function:\n",
323 | "$$\n",
324 | "y = \\prod\\limits_{i = 1}^{n} \\frac{(i+2)^x + \\ln(x)}{x^2 + 4i}\n",
325 | "$$\n",
326 | "P.S. use the `math` library"
327 | ]
328 | },
329 | {
330 | "cell_type": "code",
331 | "execution_count": null,
332 | "id": "b4698ba6",
333 | "metadata": {},
334 | "outputs": [],
335 | "source": [
336 | "def custom_function_1(x, n):\n",
337 | "    \"\"\"\n",
338 | "    Implement the function described above.\n",
339 | "    \"\"\"\n",
340 | "    # TODO: write me\n",
341 | "    pass\n",
342 | "\n",
343 | "\n",
344 | "assert round(custom_function_1(2, 3), 3) == 2.707\n",
345 | "assert round(custom_function_1(3, 2), 3) == 8.277\n",
346 | "assert round(custom_function_1(3, 3), 3) == 49.7"
347 | ]
348 | },
349 | {
350 | "cell_type": "markdown",
351 | "id": "79409870",
352 | "metadata": {},
353 | "source": [
354 | "# 2.2 Function-writing exercises"
355 | ]
356 | },
357 | {
358 | "cell_type": "markdown",
359 | "id": "b1b313b9",
360 | "metadata": {},
361 | "source": [
362 | "1. Given a number n: n minutes have passed since midnight. Determine how many hours and minutes a digital clock shows at that moment. The program must output two numbers: the hours (from 0 to 23) and the minutes (from 0 to 59). Note that n may be greater than the number of minutes in a day."
363 | ]
364 | },
365 | {
366 | "cell_type": "code",
367 | "execution_count": 12,
368 | "id": "42562c52",
369 | "metadata": {},
370 | "outputs": [],
371 | "source": [
372 | "# Your task is to write a function named time_converter (exactly this name)\n",
373 | "# it must return the result as a string 'hours minutes'\n",
374 | "\n",
375 | "\n",
376 | "assert time_converter(2782) == '22 22'\n",
377 | "assert time_converter(4733) == '6 53'\n",
378 | "assert time_converter(1766) == '5 26'\n",
379 | "assert time_converter(3865) == '16 25'\n",
380 | "assert time_converter(4628) == '5 8'\n",
381 | "assert time_converter(4353) == '0 33'\n",
382 | "assert time_converter(268) == '4 28'\n",
383 | "assert time_converter(4373) == '0 53'\n",
384 | "assert time_converter(2722) == '21 22'\n",
385 | "assert time_converter(1531) == '1 31'"
386 | ]
387 | },
388 | {
389 | "cell_type": "markdown",
390 | "id": "824cbdcc",
391 | "metadata": {},
392 | "source": [
393 | "---\n",
394 | "2. Write a function that returns the minimum of three numbers. Using `min` is not allowed; use only `if`."
395 | ]
396 | },
397 | {
398 | "cell_type": "code",
399 | "execution_count": null,
400 | "id": "23a5e2ef",
401 | "metadata": {},
402 | "outputs": [],
403 | "source": [
404 | "# Write the function min_of_three_values, which takes 3 arguments\n",
405 | "\n",
406 | "\n",
407 | "assert min_of_three_values(1, 2, 3) == 1\n",
408 | "assert min_of_three_values(1, 1, 9) == 1\n",
409 | "assert min_of_three_values(18, 7, 11) == 7\n",
410 | "assert min_of_three_values(2, 10, 10) == 2\n",
411 | "assert min_of_three_values(17, 14, 17) == 14\n",
412 | "assert min_of_three_values(9, 2, 10) == 2\n",
413 | "assert min_of_three_values(7, 4, 7) == 4\n",
414 | "assert min_of_three_values(0, 8, 3) == 0\n",
415 | "assert min_of_three_values(8, 10, 6) == 6\n",
416 | "assert min_of_three_values(1, 4, 8) == 1"
417 | ]
418 | },
419 | {
420 | "cell_type": "markdown",
421 | "id": "414dc710",
422 | "metadata": {},
423 | "source": [
424 | "---\n",
425 | "3. Removing a symbol from a string. Use the built-in `.replace()` method"
426 | ]
427 | },
428 | {
429 | "cell_type": "code",
430 | "execution_count": 20,
431 | "id": "3a131bb4",
432 | "metadata": {},
433 | "outputs": [],
434 | "source": [
435 | "# The function remove_symbol takes two arguments -- a string and the symbol to remove from it\n",
436 | "\n",
437 | "\n",
438 | "assert remove_symbol('aaaaaaaa', 'a') == ''\n",
439 | "assert remove_symbol('abababa', 'b') == 'aaaa'\n",
440 | "assert remove_symbol('12341234', '3') == '124124'"
441 | ]
442 | },
443 | {
444 | "cell_type": "markdown",
445 | "id": "b6881076",
446 | "metadata": {},
447 | "source": [
448 | "---\n",
449 | "4. Removing every third symbol from a string"
450 | ]
451 | },
452 | {
453 | "cell_type": "code",
454 | "execution_count": null,
455 | "id": "52279032",
456 | "metadata": {},
457 | "outputs": [],
458 | "source": [
459 | "# The function remove_each_third_sym takes one argument -- a string.\n",
460 | "# important -- positions are counted the human way (starting from 1, not 0)\n",
461 | "# important -- the first symbol is kept\n",
462 | "# It must return a new string\n",
463 | "\n",
464 | "\n",
465 | "\n",
466 | "assert remove_each_third_sym('abcdef') == 'abde'\n",
467 | "assert remove_each_third_sym('sdfasdfasdfsfa') == 'sdasfadffa'\n",
468 | "assert remove_each_third_sym('123456789') == '124578'\n",
469 | "assert remove_each_third_sym('987654321') == '986532'"
470 | ]
471 | },
472 | {
473 | "cell_type": "markdown",
474 | "id": "7c72f3ac",
475 | "metadata": {},
476 | "source": [
477 | "--- \n",
478 | "5. Return the maximum element of a list and its index"
479 | ]
480 | },
481 | {
482 | "cell_type": "code",
483 | "execution_count": 54,
484 | "id": "789f7d5c",
485 | "metadata": {},
486 | "outputs": [],
487 | "source": [
488 | "# The function find_max takes a list as input\n",
489 | "# and returns two numbers as a tuple -- the maximum value itself and its index\n",
490 | "\n",
491 | "\n",
492 | "assert find_max([1, 2, 3, 4, 5]) == (5, 4)\n",
493 | "assert find_max([5, 4, 3, 2, 1]) == (5, 0)\n",
494 | "assert find_max([96, 82, 72, 48, 93, 88, 79]) == (96, 0)\n",
495 | "assert find_max([49, 75, 65, 65, 65, 18]) == (75, 1)\n",
496 | "assert find_max([69, 16, 64, 54, 36, 70, 89, 29]) == (89, 6)\n",
497 | "assert find_max([17, 80, 27, 36, 21, 85, 63, 27]) == (85, 5)\n",
498 | "assert find_max([76, 27, 73, 65, 52]) == (76, 0)\n",
499 | "assert find_max([33, 26, 69, 40, 93]) == (93, 4)\n",
500 | "assert find_max([87, 5, 95, 52, 21, 76, 22]) == (95, 2)\n",
501 | "assert find_max([75, 18, 89, 99, 70]) == (99, 3)"
502 | ]
503 | },
504 | {
505 | "cell_type": "markdown",
506 | "id": "a8cccf57",
507 | "metadata": {},
508 | "source": [
509 | "---\n",
510 | "6. Append an element to the end of a list. Using the `append` method is allowed and encouraged"
511 | ]
512 | },
513 | {
514 | "cell_type": "code",
515 | "execution_count": 66,
516 | "id": "2c127a30",
517 | "metadata": {},
518 | "outputs": [],
519 | "source": [
520 | "# the function append_to_list takes two values -- a list and the value to append to the end of the list\n",
521 | "# the function returns the updated list\n",
522 | "\n",
523 | "\n",
524 | "assert append_to_list([1, 2], 3) == [1, 2, 3]\n",
525 | "assert append_to_list([1, 2], None) == [1, 2, None]\n",
526 | "assert append_to_list([1, 's'], True) == [1, 's', True]"
527 | ]
528 | },
529 | {
530 | "cell_type": "markdown",
531 | "id": "92d53320",
532 | "metadata": {},
533 | "source": [
534 | "---\n",
535 | "7. Number of unique values in a list. Use sets (`set`)"
536 | ]
537 | },
538 | {
539 | "cell_type": "code",
540 | "execution_count": null,
541 | "id": "e0d01126",
542 | "metadata": {},
543 | "outputs": [],
544 | "source": [
545 | "# the function number_unique_elements takes a list as input\n",
546 | "# and returns one number -- the count of unique elements\n",
547 | "\n",
548 | "\n",
549 | "assert number_unique_elements([1, 2, 3]) == 3\n",
550 | "assert number_unique_elements([1, 2, 1]) == 2\n",
551 | "assert number_unique_elements([1, 1, 1, 1]) == 1\n",
552 | "assert number_unique_elements([1, 2, 1, 2]) == 2"
553 | ]
554 | },
555 | {
556 | "cell_type": "markdown",
557 | "id": "8c708395",
558 | "metadata": {},
559 | "source": [
560 | "# 3. Classes"
561 | ]
562 | },
563 | {
564 | "cell_type": "markdown",
565 | "id": "1a11fb20",
566 | "metadata": {},
567 | "source": [
568 | "Write a Vehicle class with two attributes -- maximum speed (max_speed) and mileage (mileage)  \n",
569 | "Only the constructor method `__init__` needs to be implemented"
570 | ]
571 | },
572 | {
573 | "cell_type": "code",
574 | "execution_count": 2,
575 | "id": "5fc4379c",
576 | "metadata": {},
577 | "outputs": [],
578 | "source": [
579 | "class Vehicle:\n",
580 | "    # TODO\n",
581 | "    pass\n",
582 | "\n",
583 | "\n",
584 | "veh1 = Vehicle(100, 50)\n",
585 | "assert (veh1.max_speed, veh1.mileage) == (100, 50)\n",
586 | "\n",
587 | "veh2 = Vehicle(200, 3)\n",
588 | "assert (veh2.max_speed, veh2.mileage) == (200, 3)"
589 | ]
590 | },
591 | {
592 | "cell_type": "markdown",
593 | "id": "ea4551bf",
594 | "metadata": {},
595 | "source": [
596 | "Write a Truck class that inherits from Vehicle  \n",
597 | "P.S. Inheritance is done with the construct `class Subclass(MotherClass):`"
598 | ]
599 | },
600 | {
601 | "cell_type": "code",
602 | "execution_count": null,
603 | "id": "2ea8d59d",
604 | "metadata": {},
605 | "outputs": [],
606 | "source": [
607 | "class Truck:\n",
608 | "    # TODO\n",
609 | "    pass\n",
610 | "\n",
611 | "\n",
612 | "truck1 = Truck(50, 1000)\n",
613 | "assert (truck1.max_speed, truck1.mileage) == (50, 1000)\n",
614 | "\n",
615 | "truck2 = Truck(43, 235)\n",
616 | "assert (truck2.max_speed, truck2.mileage) == (43, 235)"
617 | ]
618 | },
619 | {
620 | "cell_type": "markdown",
621 | "id": "5f93b111",
622 | "metadata": {},
623 | "source": [
624 | "Write a MyList class with the functionality described below"
625 | ]
626 | },
627 | {
628 | "cell_type": "code",
629 | "execution_count": null,
630 | "id": "b9a4db60",
631 | "metadata": {},
632 | "outputs": [],
633 | "source": [
634 | "class MyList:\n",
635 | "    def __init__(self):\n",
636 | "        # TODO: write me (the constructor must also accept the list to store)\n",
637 | "        pass\n",
638 | "    \n",
639 | "    def return_sum(self):\n",
640 | "        \"\"\"\n",
641 | "        Returns the sum of all elements of the stored list.\n",
642 | "        Using sum is not allowed!\n",
643 | "        \"\"\"\n",
644 | "        # TODO: write me\n",
645 | "        pass\n",
646 | "    \n",
647 | "    def make_reverse(self):\n",
648 | "        \"\"\"\n",
649 | "        Returns the stored list in reverse order.\n",
650 | "        \"\"\"\n",
651 | "        # TODO: write me\n",
652 | "        pass\n",
653 | "    \n",
654 | "    def make_slice(self, start, stop):\n",
655 | "        \"\"\"\n",
656 | "        Returns the slice of the stored list from start to stop.\n",
657 | "        \"\"\"\n",
658 | "        # TODO: write me\n",
659 | "        pass\n",
660 | "\n",
661 | "\n",
662 | "a = MyList([1, 2, 3, 4])\n",
663 | "assert a.return_sum() == 10\n",
664 | "assert a.make_reverse() == [4, 3, 2, 1]\n",
665 | "assert a.make_slice(0, 2) == [1, 2]\n",
666 | "\n",
667 | "b = MyList([5, 6, 6, 5])\n",
668 | "assert b.return_sum() == 22\n",
669 | "assert b.make_reverse() == [5, 6, 6, 5]\n",
670 | "assert b.make_slice(1, 2) == [6]"
671 | ]
672 | },
673 | {
674 | "cell_type": "code",
675 | "execution_count": null,
676 | "id": "aa5dc08b",
677 | "metadata": {},
678 | "outputs": [],
679 | "source": []
680 | }
681 | ],
682 | "metadata": {
683 | "kernelspec": {
684 | "display_name": "Python 3 (ipykernel)",
685 | "language": "python",
686 | "name": "python3"
687 | },
688 | "language_info": {
689 | "codemirror_mode": {
690 | "name": "ipython",
691 | "version": 3
692 | },
693 | "file_extension": ".py",
694 | "mimetype": "text/x-python",
695 | "name": "python",
696 | "nbconvert_exporter": "python",
697 | "pygments_lexer": "ipython3",
698 | "version": "3.9.13"
699 | }
700 | },
701 | "nbformat": 4,
702 | "nbformat_minor": 5
703 | }
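Note on section 0 of this lab: the behaviour of `assert` described there can be sketched as follows. This is only an illustrative sketch; the helper `check_positive` and the message text are hypothetical and are not part of the lab. It shows that a true condition lets the code continue, while a false one raises `AssertionError`, which can carry an optional message.

```python
def check_positive(value):
    """Hypothetical helper: returns value if it is positive, otherwise the assert fails."""
    # The optional text after the comma becomes the AssertionError message.
    assert value > 0, f"expected a positive number, got {value}"
    return value


# The condition is true, so the assert is silent and the value is returned.
result = check_positive(5)

# The condition is false, so AssertionError is raised; it is caught here
# only to inspect the message without stopping the program.
try:
    check_positive(-1)
    error_message = None
except AssertionError as error:
    error_message = str(error)

print(result)
print(error_message)
```

In the lab cells the `assert`s carry no message, which is why the traceback in section 0 ends with a bare `AssertionError:`.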
704 |
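Note on section 3 of this lab: the `class Subclass(MotherClass):` construct mentioned there can be sketched with a small example. The class names `Animal` and `Dog`, their attributes, and the `speak` method are hypothetical and chosen only for illustration; they are not part of the lab. The sketch shows that a subclass reuses the parent's `__init__` without redefining it and can add its own methods.

```python
class Animal:
    """Hypothetical parent class used only for illustration."""

    def __init__(self, name, legs):
        self.name = name
        self.legs = legs


class Dog(Animal):
    """Subclass: inherits __init__ and both attributes from Animal."""

    def speak(self):
        # A method that exists only on the subclass.
        return f"{self.name} says woof"


dog = Dog("Rex", 4)        # Animal.__init__ runs, because Dog does not define its own
print(dog.name, dog.legs)
print(dog.speak())
print(isinstance(dog, Animal))  # an instance of the subclass is also an instance of the parent
```

A subclass that needs extra attributes would define its own `__init__` and call `super().__init__(...)` inside it to run the parent's constructor first.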
--------------------------------------------------------------------------------
/notebooks/seminars/02-python-libs.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "id": "e60bae95",
6 | "metadata": {},
7 | "source": [
8 | "# Libraries"
9 | ]
10 | },
11 | {
12 | "cell_type": "code",
13 | "execution_count": null,
14 | "id": "c310ce9d",
15 | "metadata": {},
16 | "outputs": [],
17 | "source": [
18 | "import math \n",
19 | "import os"
20 | ]
21 | },
22 | {
23 | "cell_type": "markdown",
24 | "id": "a2b43ec1",
25 | "metadata": {},
26 | "source": [
27 | "# 1. Numpy "
28 | ]
29 | },
30 | {
31 | "cell_type": "markdown",
32 | "id": "b93bcd0d",
33 | "metadata": {},
34 | "source": [
35 | "NumPy (https://numpy.org) -- a library for working with arrays.  \n",
36 | "We work through the examples from https://numpy.org/doc/stable/user/quickstart.html#the-basics"
37 | ]
38 | },
39 | {
40 | "cell_type": "code",
41 | "execution_count": null,
42 | "id": "8e31d506",
43 | "metadata": {},
44 | "outputs": [],
45 | "source": [
46 | "import numpy as np\n",
47 | "\n",
48 | "\n",
49 | "a = np.arange(15)\n",
50 | "a"
51 | ]
52 | },
53 | {
54 | "cell_type": "code",
55 | "execution_count": null,
56 | "id": "4d115b9a",
57 | "metadata": {},
58 | "outputs": [],
59 | "source": [
60 | "a.shape"
61 | ]
62 | },
63 | {
64 | "cell_type": "code",
65 | "execution_count": null,
66 | "id": "bf2bcfa1",
67 | "metadata": {},
68 | "outputs": [],
69 | "source": [
70 | "type(a)"
71 | ]
72 | },
73 | {
74 | "cell_type": "code",
75 | "execution_count": null,
76 | "id": "ddd6c6c3",
77 | "metadata": {},
78 | "outputs": [],
79 | "source": [
80 | "a = a.reshape(3, 5)\n",
81 | "a"
82 | ]
83 | },
84 | {
85 | "cell_type": "code",
86 | "execution_count": null,
87 | "id": "2460e6cd",
88 | "metadata": {},
89 | "outputs": [],
90 | "source": [
91 | "a.shape"
92 | ]
93 | },
94 | {
95 | "cell_type": "code",
96 | "execution_count": null,
97 | "id": "220643a2",
98 | "metadata": {},
99 | "outputs": [],
100 | "source": [
101 | "a.dtype"
102 | ]
103 | },
104 | {
105 | "cell_type": "code",
106 | "execution_count": null,
107 | "id": "68ec0297",
108 | "metadata": {},
109 | "outputs": [],
110 | "source": [
111 | "type(a)"
112 | ]
113 | },
114 | {
115 | "cell_type": "code",
116 | "execution_count": null,
117 | "id": "9dac9538",
118 | "metadata": {},
119 | "outputs": [],
120 | "source": [
121 | "a = np.array([2, 3, 4])\n",
122 | "a"
123 | ]
124 | },
125 | {
126 | "cell_type": "code",
127 | "execution_count": null,
128 | "id": "53754763",
129 | "metadata": {},
130 | "outputs": [],
131 | "source": [
132 | "a.dtype"
133 | ]
134 | },
135 | {
136 | "cell_type": "code",
137 | "execution_count": null,
138 | "id": "25babeb3",
139 | "metadata": {},
140 | "outputs": [],
141 | "source": [
142 | "b = np.array([1.2, 3.5, 5.1])\n",
143 | "b.dtype"
144 | ]
145 | },
146 | {
147 | "cell_type": "code",
148 | "execution_count": null,
149 | "id": "4297d976",
150 | "metadata": {},
151 | "outputs": [],
152 | "source": [
153 | "b = np.array([(1.5, 2, 3), (4, 5, 6)])\n",
154 | "b"
155 | ]
156 | },
157 | {
158 | "cell_type": "code",
159 | "execution_count": null,
160 | "id": "5745b04c",
161 | "metadata": {},
162 | "outputs": [],
163 | "source": [
164 | "np.zeros((3, 4))"
165 | ]
166 | },
167 | {
168 | "cell_type": "code",
169 | "execution_count": null,
170 | "id": "e394a1de",
171 | "metadata": {},
172 | "outputs": [],
173 | "source": [
174 | "np.ones((2, 3, 4), dtype=np.int16)"
175 | ]
176 | },
177 | {
178 | "cell_type": "code",
179 | "execution_count": null,
180 | "id": "8b642d5a",
181 | "metadata": {},
182 | "outputs": [],
183 | "source": [
184 | "np.arange(10, 30, 5)"
185 | ]
186 | },
187 | {
188 | "cell_type": "code",
189 | "execution_count": null,
190 | "id": "15b61e06",
191 | "metadata": {},
192 | "outputs": [],
193 | "source": [
194 | "np.arange(0, 2, 0.3)"
195 | ]
196 | },
197 | {
198 | "cell_type": "markdown",
199 | "id": "f6473aca",
200 | "metadata": {},
201 | "source": [
202 | "### Operations"
203 | ]
204 | },
205 | {
206 | "cell_type": "code",
207 | "execution_count": null,
208 | "id": "c2c52536",
209 | "metadata": {},
210 | "outputs": [],
211 | "source": [
212 | "a = np.array([20, 30, 40, 50])\n",
213 | "b = np.arange(4)\n",
214 | "print(a)\n",
215 | "print(b)"
216 | ]
217 | },
218 | {
219 | "cell_type": "code",
220 | "execution_count": null,
221 | "id": "1a8ec8d4",
222 | "metadata": {},
223 | "outputs": [],
224 | "source": [
225 | "c = a - b\n",
226 | "c"
227 | ]
228 | },
229 | {
230 | "cell_type": "code",
231 | "execution_count": null,
232 | "id": "08104d1b",
233 | "metadata": {},
234 | "outputs": [],
235 | "source": [
236 | "b ** 2"
237 | ]
238 | },
239 | {
240 | "cell_type": "code",
241 | "execution_count": null,
242 | "id": "640af5c6",
243 | "metadata": {},
244 | "outputs": [],
245 | "source": [
246 | "b * 10"
247 | ]
248 | },
249 | {
250 | "cell_type": "code",
251 | "execution_count": null,
252 | "id": "d6a54e00",
253 | "metadata": {},
254 | "outputs": [],
255 | "source": [
256 | "10 * np.sin(a)"
257 | ]
258 | },
259 | {
260 | "cell_type": "code",
261 | "execution_count": null,
262 | "id": "b92ed553",
263 | "metadata": {},
264 | "outputs": [],
265 | "source": [
266 | "a < 35"
267 | ]
268 | },
269 | {
270 | "cell_type": "markdown",
271 | "id": "d8cf6b45",
272 | "metadata": {},
273 | "source": [
274 | "# Micro ML Intro"
275 | ]
276 | },
277 | {
278 | "cell_type": "markdown",
279 | "id": "79a961a0",
280 | "metadata": {},
281 | "source": [
282 | "**`Dataset`** -- a collection of data on which a machine learning / deep learning model is trained.  \n",
283 | "\n",
284 | "A dataset can be split into 2 parts: `X` and `y`  \n",
285 | "**`X`** -- the set of features of each object  \n",
286 | "**`y`** -- the target variable of each object  \n",
287 | "  \n",
288 | "Datasets can be found in many places, for example on Kaggle (which hosts about 300k of them) -- https://www.kaggle.com/datasets  \n",
289 | "\n",
290 | "---\n",
291 | "Images are taken from https://builtin.com/data-science/train-test-split"
292 | ]
293 | },
294 | {
295 | "cell_type": "markdown",
296 | "id": "16593847",
297 | "metadata": {},
298 | "source": [
299 | ""
300 | ]
301 | },
302 | {
303 | "cell_type": "markdown",
304 | "id": "2c652b90",
305 | "metadata": {},
306 | "source": [
307 | "### Dataset examples:\n",
308 | "1. Titanic (https://www.kaggle.com/competitions/titanic/data)  \n",
309 | "   `X` -- a set of features for each passenger (sex, age, ticket class, etc.)  \n",
310 | "   `y` -- a flag indicating whether the passenger survived the sinking of the Titanic  \n",
311 | "---\n",
312 | "2. Game data from the IGN portal (https://www.kaggle.com/datasets/kapturovalexander/ign-games-from-best-to-worst)  \n",
313 | "   `X` -- a set of features: title, release year, genre  \n",
314 | "   `y` -- critics' score, players' score\n",
315 | "---\n",
316 | "3. Skin type detection from a photo (https://www.kaggle.com/datasets/shakyadissanayake/oily-dry-and-normal-skin-types-dataset)  \n",
317 | "   `X` -- photos of skin  \n",
318 | "   `y` -- skin type\n",
319 | "---\n",
320 | "4. A dataset of about 50k Spotify songs (https://www.kaggle.com/datasets/joebeachcapital/57651-spotify-songs)  \n",
321 | "   `X` -- artist, song title, song lyrics  \n",
322 | "   `y` -- depends on the task. For example, predicting the artist from the lyrics, or generating lyrics in the style of a given artist."
323 | ]
324 | },
325 | {
326 | "cell_type": "markdown",
327 | "id": "ba3bc012",
328 | "metadata": {},
329 | "source": [
330 | " \n",
331 | "---"
332 | ]
333 | },
334 | {
335 | "cell_type": "markdown",
336 | "id": "38eacdf8",
337 | "metadata": {},
338 | "source": [
339 | "## Example\n",
340 | ""
341 | ]
342 | },
343 | {
344 | "cell_type": "markdown",
345 | "id": "e7535cfa",
346 | "metadata": {},
347 | "source": [
348 | ""
349 | ]
350 | },
351 | {
352 | "cell_type": "markdown",
353 | "id": "c17b8c40",
354 | "metadata": {},
355 | "source": [
356 | ""
357 | ]
358 | },
359 | {
360 | "cell_type": "markdown",
361 | "id": "08a6ba50",
362 | "metadata": {},
363 | "source": [
364 | "# 2. Pandas"
365 | ]
366 | },
367 | {
368 | "cell_type": "markdown",
369 | "id": "d3a63348",
370 | "metadata": {},
371 | "source": [
372 | "Pandas (https://pandas.pydata.org) -- a library for working with datasets.  \n",
373 | "An introduction to the library is available here -- https://pandas.pydata.org/docs/user_guide/10min.html"
374 | ]
375 | },
376 | {
377 | "cell_type": "code",
378 | "execution_count": null,
379 | "id": "d504be18",
380 | "metadata": {},
381 | "outputs": [],
382 | "source": [
383 | "import pandas as pd"
384 | ]
385 | },
386 | {
387 | "cell_type": "code",
388 | "execution_count": null,
389 | "id": "44ca2f5f",
390 | "metadata": {},
391 | "outputs": [],
392 | "source": [
393 | "s = pd.Series([1, 3, 5, np.nan, 6, 8])\n",
394 | "s"
395 | ]
396 | },
397 | {
398 | "cell_type": "code",
399 | "execution_count": null,
400 | "id": "ffec5692",
401 | "metadata": {},
402 | "outputs": [],
403 | "source": [
404 | "dates = pd.date_range(\"20130101\", periods=6)\n",
405 | "dates"
406 | ]
407 | },
408 | {
409 | "cell_type": "code",
410 | "execution_count": null,
411 | "id": "2882d00c",
412 | "metadata": {},
413 | "outputs": [],
414 | "source": [
415 | "df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list(\"ABCD\"))\n",
416 | "df"
417 | ]
418 | },
419 | {
420 | "cell_type": "code",
421 | "execution_count": null,
422 | "id": "67e1ff54",
423 | "metadata": {},
424 | "outputs": [],
425 | "source": [
426 | "df2 = pd.DataFrame(\n",
427 | " {\n",
428 | " \"A\": 1.0,\n",
429 | " \"B\": pd.Timestamp(\"20130102\"),\n",
430 | " \"C\": pd.Series(1, index=list(range(4)), dtype=\"float32\"),\n",
431 | " \"D\": np.array([3] * 4, dtype=\"int32\"),\n",
432 | " \"E\": pd.Categorical([\"test\", \"train\", \"test\", \"train\"]),\n",
433 | " \"F\": \"foo\",\n",
434 | " }\n",
435 | ")\n",
436 | "df2"
437 | ]
438 | },
439 | {
440 | "cell_type": "code",
441 | "execution_count": null,
442 | "id": "1469555e",
443 | "metadata": {},
444 | "outputs": [],
445 | "source": [
446 | "df2.dtypes"
447 | ]
448 | },
449 | {
450 | "cell_type": "code",
451 | "execution_count": null,
452 | "id": "95698aa8",
453 | "metadata": {},
454 | "outputs": [],
455 | "source": [
456 | "df.head()"
457 | ]
458 | },
459 | {
460 | "cell_type": "code",
461 | "execution_count": null,
462 | "id": "96e59a0c",
463 | "metadata": {},
464 | "outputs": [],
465 | "source": [
466 | "df.tail(3)"
467 | ]
468 | },
469 | {
470 | "cell_type": "code",
471 | "execution_count": null,
472 | "id": "9138a90b",
473 | "metadata": {},
474 | "outputs": [],
475 | "source": [
476 | "df.index"
477 | ]
478 | },
479 | {
480 | "cell_type": "code",
481 | "execution_count": null,
482 | "id": "316e81bb",
483 | "metadata": {},
484 | "outputs": [],
485 | "source": [
486 | "df.columns"
487 | ]
488 | },
489 | {
490 | "cell_type": "code",
491 | "execution_count": null,
492 | "id": "aa047d37",
493 | "metadata": {},
494 | "outputs": [],
495 | "source": [
496 | "df.to_numpy()"
497 | ]
498 | },
499 | {
500 | "cell_type": "code",
501 | "execution_count": null,
502 | "id": "85716865",
503 | "metadata": {},
504 | "outputs": [],
505 | "source": [
506 | "df.describe()"
507 | ]
508 | },
509 | {
510 | "cell_type": "code",
511 | "execution_count": null,
512 | "id": "baec6a63",
513 | "metadata": {},
514 | "outputs": [],
515 | "source": [
516 | "df.T"
517 | ]
518 | },
519 | {
520 | "cell_type": "code",
521 | "execution_count": null,
522 | "id": "8a2b7a14",
523 | "metadata": {},
524 | "outputs": [],
525 | "source": [
526 | "df.sort_index(axis=1, ascending=False)"
527 | ]
528 | },
529 | {
530 | "cell_type": "code",
531 | "execution_count": null,
532 | "id": "777416db",
533 | "metadata": {},
534 | "outputs": [],
535 | "source": [
536 | "df.sort_values(by=\"B\")"
537 | ]
538 | },
539 | {
540 | "cell_type": "code",
541 | "execution_count": null,
542 | "id": "64786844",
543 | "metadata": {},
544 | "outputs": [],
545 | "source": [
546 | "df[\"A\"]"
547 | ]
548 | },
549 | {
550 | "cell_type": "code",
551 | "execution_count": null,
552 | "id": "c0e66775",
553 | "metadata": {},
554 | "outputs": [],
555 | "source": [
556 | "df[0:3]"
557 | ]
558 | },
559 | {
560 | "cell_type": "code",
561 | "execution_count": null,
562 | "id": "17cd42b9",
563 | "metadata": {},
564 | "outputs": [],
565 | "source": [
566 | "df[\"20130102\":\"20130104\"]"
567 | ]
568 | },
569 | {
570 | "cell_type": "code",
571 | "execution_count": null,
572 | "id": "243d6301",
573 | "metadata": {},
574 | "outputs": [],
575 | "source": [
576 | "df.loc[dates[0]]"
577 | ]
578 | },
579 | {
580 | "cell_type": "code",
581 | "execution_count": null,
582 | "id": "86e019cd",
583 | "metadata": {},
584 | "outputs": [],
585 | "source": [
586 | "df.loc[:, [\"A\", \"B\"]]"
587 | ]
588 | },
589 | {
590 | "cell_type": "code",
591 | "execution_count": null,
592 | "id": "47acc1f7",
593 | "metadata": {},
594 | "outputs": [],
595 | "source": [
596 | "df.loc[\"20130102\":\"20130104\", [\"A\", \"B\"]]"
597 | ]
598 | },
599 | {
600 | "cell_type": "code",
601 | "execution_count": null,
602 | "id": "4f3b73e3",
603 | "metadata": {},
604 | "outputs": [],
605 | "source": [
606 | "df.loc[\"20130102\", [\"A\", \"B\"]]"
607 | ]
608 | },
609 | {
610 | "cell_type": "code",
611 | "execution_count": null,
612 | "id": "b539dbff",
613 | "metadata": {},
614 | "outputs": [],
615 | "source": [
616 | "df.loc[dates[0], \"A\"]"
617 | ]
618 | },
619 | {
620 | "cell_type": "code",
621 | "execution_count": null,
622 | "id": "515c57c7",
623 | "metadata": {},
624 | "outputs": [],
625 | "source": [
626 | "df.at[dates[0], \"A\"]"
627 | ]
628 | },
629 | {
630 | "cell_type": "code",
631 | "execution_count": null,
632 | "id": "c98a02a4",
633 | "metadata": {},
634 | "outputs": [],
635 | "source": [
636 | "df.iloc[3]"
637 | ]
638 | },
639 | {
640 | "cell_type": "code",
641 | "execution_count": null,
642 | "id": "65a13e7d",
643 | "metadata": {},
644 | "outputs": [],
645 | "source": [
646 | "df.iloc[3:5, 0:2]"
647 | ]
648 | },
649 | {
650 | "cell_type": "code",
651 | "execution_count": null,
652 | "id": "424bb0f9",
653 | "metadata": {},
654 | "outputs": [],
655 | "source": [
656 | "df.iloc[[1, 2, 4], [0, 2]]"
657 | ]
658 | },
659 | {
660 | "cell_type": "code",
661 | "execution_count": null,
662 | "id": "f9ad2217",
663 | "metadata": {},
664 | "outputs": [],
665 | "source": [
666 | "df.iloc[1:3, :]"
667 | ]
668 | },
669 | {
670 | "cell_type": "code",
671 | "execution_count": null,
672 | "id": "6b8a180c",
673 | "metadata": {},
674 | "outputs": [],
675 | "source": [
676 | "df.iloc[:, 1:3]"
677 | ]
678 | },
679 | {
680 | "cell_type": "code",
681 | "execution_count": null,
682 | "id": "b5c2e70d",
683 | "metadata": {},
684 | "outputs": [],
685 | "source": [
686 | "df.iloc[1, 1]"
687 | ]
688 | },
689 | {
690 | "cell_type": "code",
691 | "execution_count": null,
692 | "id": "98ffeac6",
693 | "metadata": {},
694 | "outputs": [],
695 | "source": [
696 | "%timeit df.iloc[1, 1]"
697 | ]
698 | },
699 | {
700 | "cell_type": "code",
701 | "execution_count": null,
702 | "id": "a8013f56",
703 | "metadata": {},
704 | "outputs": [],
705 | "source": [
706 | "df.iat[1, 1]"
707 | ]
708 | },
709 | {
710 | "cell_type": "code",
711 | "execution_count": null,
712 | "id": "1ebf9284",
713 | "metadata": {},
714 | "outputs": [],
715 | "source": [
716 | "%timeit df.iat[1, 1]"
717 | ]
718 | },
719 | {
720 | "cell_type": "code",
721 | "execution_count": null,
722 | "id": "72cceb89",
723 | "metadata": {},
724 | "outputs": [],
725 | "source": [
726 | "df[df[\"A\"] > 0]"
727 | ]
728 | },
729 | {
730 | "cell_type": "code",
731 | "execution_count": null,
732 | "id": "38c9b444",
733 | "metadata": {},
734 | "outputs": [],
735 | "source": [
736 | "df[df > 0]"
737 | ]
738 | },
739 | {
740 | "cell_type": "code",
741 | "execution_count": null,
742 | "id": "0b0c5b4f",
743 | "metadata": {},
744 | "outputs": [],
745 | "source": [
746 | "df2 = df.copy()"
747 | ]
748 | },
749 | {
750 | "cell_type": "code",
751 | "execution_count": null,
752 | "id": "cb3f27e2",
753 | "metadata": {},
754 | "outputs": [],
755 | "source": [
756 | "df2[\"E\"] = [\"one\", \"one\", \"two\", \"three\", \"four\", \"three\"]"
757 | ]
758 | },
759 | {
760 | "cell_type": "code",
761 | "execution_count": null,
762 | "id": "f3081a4f",
763 | "metadata": {},
764 | "outputs": [],
765 | "source": [
766 | "df2"
767 | ]
768 | },
769 | {
770 | "cell_type": "code",
771 | "execution_count": null,
772 | "id": "128fd735",
773 | "metadata": {},
774 | "outputs": [],
775 | "source": [
776 | "df2[df2[\"E\"].isin([\"two\", \"four\"])]"
777 | ]
778 | },
779 | {
780 | "cell_type": "code",
781 | "execution_count": null,
782 | "id": "92bcd1cb",
783 | "metadata": {},
784 | "outputs": [],
785 | "source": [
786 | "s1 = pd.Series([1, 2, 3, 4, 5, 6], index=pd.date_range(\"20130102\", periods=6))\n",
787 | "s1"
788 | ]
789 | },
790 | {
791 | "cell_type": "code",
792 | "execution_count": null,
793 | "id": "ff154561",
794 | "metadata": {},
795 | "outputs": [],
796 | "source": [
797 | "df[\"F\"] = s1"
798 | ]
799 | },
800 | {
801 | "cell_type": "code",
802 | "execution_count": null,
803 | "id": "799516ba",
804 | "metadata": {},
805 | "outputs": [],
806 | "source": [
807 | "df"
808 | ]
809 | },
810 | {
811 | "cell_type": "code",
812 | "execution_count": null,
813 | "id": "3cad2f7b",
814 | "metadata": {},
815 | "outputs": [],
816 | "source": [
817 | "df.at[dates[0], \"A\"] = 0"
818 | ]
819 | },
820 | {
821 | "cell_type": "code",
822 | "execution_count": null,
823 | "id": "39db1a86",
824 | "metadata": {},
825 | "outputs": [],
826 | "source": [
827 | "df"
828 | ]
829 | },
830 | {
831 | "cell_type": "code",
832 | "execution_count": null,
833 | "id": "200e51fd",
834 | "metadata": {},
835 | "outputs": [],
836 | "source": [
837 | "df.iat[0, 1] = 0"
838 | ]
839 | },
840 | {
841 | "cell_type": "code",
842 | "execution_count": null,
843 | "id": "27563656",
844 | "metadata": {},
845 | "outputs": [],
846 | "source": [
847 | "df"
848 | ]
849 | },
850 | {
851 | "cell_type": "code",
852 | "execution_count": null,
853 | "id": "dff1a5c2",
854 | "metadata": {},
855 | "outputs": [],
856 | "source": [
857 | "df.loc[:, \"D\"] = np.array([5] * len(df))\n",
858 | "df"
859 | ]
860 | },
861 | {
862 | "cell_type": "code",
863 | "execution_count": null,
864 | "id": "9223d073",
865 | "metadata": {},
866 | "outputs": [],
867 | "source": [
868 | "df2 = df.copy()"
869 | ]
870 | },
871 | {
872 | "cell_type": "code",
873 | "execution_count": null,
874 | "id": "7137549f",
875 | "metadata": {},
876 | "outputs": [],
877 | "source": [
878 | "df2[df2 > 0] = -df2"
879 | ]
880 | },
881 | {
882 | "cell_type": "code",
883 | "execution_count": null,
884 | "id": "38ebf7c9",
885 | "metadata": {},
886 | "outputs": [],
887 | "source": [
888 | "df2"
889 | ]
890 | },
891 | {
892 | "cell_type": "code",
893 | "execution_count": null,
894 | "id": "6e92f302",
895 | "metadata": {},
896 | "outputs": [],
897 | "source": [
898 | "df1 = df.reindex(index=dates[0:4], columns=list(df.columns) + [\"E\"])"
899 | ]
900 | },
901 | {
902 | "cell_type": "code",
903 | "execution_count": null,
904 | "id": "fc07699c",
905 | "metadata": {},
906 | "outputs": [],
907 | "source": [
908 | "df1"
909 | ]
910 | },
911 | {
912 | "cell_type": "code",
913 | "execution_count": null,
914 | "id": "c9be24a4",
915 | "metadata": {},
916 | "outputs": [],
917 | "source": [
918 | "df1.loc[dates[0] : dates[1], \"E\"] = 1"
919 | ]
920 | },
921 | {
922 | "cell_type": "code",
923 | "execution_count": null,
924 | "id": "b4a7e142",
925 | "metadata": {},
926 | "outputs": [],
927 | "source": [
928 | "df1"
929 | ]
930 | },
931 | {
932 | "cell_type": "code",
933 | "execution_count": null,
934 | "id": "a57f4792",
935 | "metadata": {},
936 | "outputs": [],
937 | "source": [
938 | "df1.dropna(how=\"any\")"
939 | ]
940 | },
941 | {
942 | "cell_type": "code",
943 | "execution_count": null,
944 | "id": "1049117a",
945 | "metadata": {},
946 | "outputs": [],
947 | "source": [
948 | "df1.fillna(value=5)"
949 | ]
950 | },
951 | {
952 | "cell_type": "code",
953 | "execution_count": null,
954 | "id": "f25fb757",
955 | "metadata": {},
956 | "outputs": [],
957 | "source": [
958 | "pd.isna(df1)"
959 | ]
960 | },
961 | {
962 | "cell_type": "markdown",
963 | "id": "05eef9f9",
964 | "metadata": {},
965 | "source": [
966 | "## Operations"
967 | ]
968 | },
969 | {
970 | "cell_type": "code",
971 | "execution_count": null,
972 | "id": "b89bcbd1",
973 | "metadata": {},
974 | "outputs": [],
975 | "source": [
976 | "df"
977 | ]
978 | },
979 | {
980 | "cell_type": "code",
981 | "execution_count": null,
982 | "id": "3bf1f0fd",
983 | "metadata": {},
984 | "outputs": [],
985 | "source": [
986 | "df.mean()"
987 | ]
988 | },
989 | {
990 | "cell_type": "code",
991 | "execution_count": null,
992 | "id": "b0c4c5b4",
993 | "metadata": {},
994 | "outputs": [],
995 | "source": [
996 | "df.mean(axis=1)"
997 | ]
998 | },
999 | {
1000 | "cell_type": "code",
1001 | "execution_count": null,
1002 | "id": "f740babe",
1003 | "metadata": {},
1004 | "outputs": [],
1005 | "source": [
1006 | "dates"
1007 | ]
1008 | },
1009 | {
1010 | "cell_type": "code",
1011 | "execution_count": null,
1012 | "id": "ddbf9aca",
1013 | "metadata": {},
1014 | "outputs": [],
1015 | "source": [
1016 | "pd.Series([1, 3, 5, np.nan, 6, 8], index=dates)"
1017 | ]
1018 | },
1019 | {
1020 | "cell_type": "code",
1021 | "execution_count": null,
1022 | "id": "f73070b8",
1023 | "metadata": {},
1024 | "outputs": [],
1025 | "source": [
1026 | "pd.Series([1, 3, 5, np.nan, 6, 8], index=dates).values"
1027 | ]
1028 | },
1029 | {
1030 | "cell_type": "code",
1031 | "execution_count": null,
1032 | "id": "a60fd060",
1033 | "metadata": {},
1034 | "outputs": [],
1035 | "source": [
1036 | "dates"
1037 | ]
1038 | },
1039 | {
1040 | "cell_type": "code",
1041 | "execution_count": null,
1042 | "id": "2ffbfa73",
1043 | "metadata": {},
1044 | "outputs": [],
1045 | "source": [
1046 | "s = pd.Series([1, 3, 5, np.nan, 6, 8], index=dates).shift(2)\n",
1047 | "s"
1048 | ]
1049 | },
1050 | {
1051 | "cell_type": "code",
1052 | "execution_count": null,
1053 | "id": "94493557",
1054 | "metadata": {},
1055 | "outputs": [],
1056 | "source": [
1057 | "df"
1058 | ]
1059 | },
1060 | {
1061 | "cell_type": "code",
1062 | "execution_count": null,
1063 | "id": "f8b01395",
1064 | "metadata": {},
1065 | "outputs": [],
1066 | "source": [
1067 | "df.index"
1068 | ]
1069 | },
1070 | {
1071 | "cell_type": "code",
1072 | "execution_count": null,
1073 | "id": "3617fc30",
1074 | "metadata": {},
1075 | "outputs": [],
1076 | "source": [
1077 | "s"
1078 | ]
1079 | },
1080 | {
1081 | "cell_type": "code",
1082 | "execution_count": null,
1083 | "id": "105ce0ce",
1084 | "metadata": {},
1085 | "outputs": [],
1086 | "source": [
1087 | "df.sub(s, axis=\"index\")"
1088 | ]
1089 | },
1090 | {
1091 | "cell_type": "code",
1092 | "execution_count": null,
1093 | "id": "e90abe3d",
1094 | "metadata": {},
1095 | "outputs": [],
1096 | "source": [
1097 | "df"
1098 | ]
1099 | },
1100 | {
1101 | "cell_type": "code",
1102 | "execution_count": null,
1103 | "id": "bcc4cdf5",
1104 | "metadata": {},
1105 | "outputs": [],
1106 | "source": [
1107 | "df.apply(np.cumsum)"
1108 | ]
1109 | },
1110 | {
1111 | "cell_type": "code",
1112 | "execution_count": null,
1113 | "id": "164ad202",
1114 | "metadata": {},
1115 | "outputs": [],
1116 | "source": [
1117 | "df.apply(np.cumsum, axis=1)"
1118 | ]
1119 | },
1120 | {
1121 | "cell_type": "code",
1122 | "execution_count": null,
1123 | "id": "280d90bf",
1124 | "metadata": {},
1125 | "outputs": [],
1126 | "source": [
1127 | "df.apply(lambda x: x.max() - x.min())"
1128 | ]
1129 | },
1130 | {
1131 | "cell_type": "code",
1132 | "execution_count": null,
1133 | "id": "478f95cf",
1134 | "metadata": {},
1135 | "outputs": [],
1136 | "source": [
1137 | "s = pd.Series(np.random.randint(0, 7, size=10))"
1138 | ]
1139 | },
1140 | {
1141 | "cell_type": "code",
1142 | "execution_count": null,
1143 | "id": "1ab998b1",
1144 | "metadata": {},
1145 | "outputs": [],
1146 | "source": [
1147 | "s"
1148 | ]
1149 | },
1150 | {
1151 | "cell_type": "code",
1152 | "execution_count": null,
1153 | "id": "085e4269",
1154 | "metadata": {},
1155 | "outputs": [],
1156 | "source": [
1157 | "s.value_counts()"
1158 | ]
1159 | },
1160 | {
1161 | "cell_type": "code",
1162 | "execution_count": null,
1163 | "id": "18d39be5",
1164 | "metadata": {},
1165 | "outputs": [],
1166 | "source": [
1167 | "s = pd.Series([\"A\", \"B\", \"C\", \"Aaba\", \"Baca\", np.nan, \"CABA\", \"dog\", \"cat\"])"
1168 | ]
1169 | },
1170 | {
1171 | "cell_type": "code",
1172 | "execution_count": null,
1173 | "id": "feff87c2",
1174 | "metadata": {},
1175 | "outputs": [],
1176 | "source": [
1177 | "s"
1178 | ]
1179 | },
1180 | {
1181 | "cell_type": "code",
1182 | "execution_count": null,
1183 | "id": "772a8d4f",
1184 | "metadata": {},
1185 | "outputs": [],
1186 | "source": [
1187 | "df['A']"
1188 | ]
1189 | },
1190 | {
1191 | "cell_type": "code",
1192 | "execution_count": null,
1193 | "id": "ba743edf",
1194 | "metadata": {},
1195 | "outputs": [],
1196 | "source": [
1197 | "s.str.lower()"
1198 | ]
1199 | },
1200 | {
1201 | "cell_type": "code",
1202 | "execution_count": null,
1203 | "id": "7d2394af",
1204 | "metadata": {},
1205 | "outputs": [],
1206 | "source": [
1207 | "s.str.upper()"
1208 | ]
1209 | },
1210 | {
1211 | "cell_type": "code",
1212 | "execution_count": null,
1213 | "id": "5524891d",
1214 | "metadata": {},
1215 | "outputs": [],
1216 | "source": [
1217 | "df.max()"
1218 | ]
1219 | },
1220 | {
1221 | "cell_type": "markdown",
1222 | "id": "8a07abef",
1223 | "metadata": {},
1224 | "source": [
1225 | "## Merge & Concat"
1226 | ]
1227 | },
1228 | {
1229 | "cell_type": "code",
1230 | "execution_count": null,
1231 | "id": "ffa04286",
1232 | "metadata": {},
1233 | "outputs": [],
1234 | "source": [
1235 | "df = pd.DataFrame(np.random.randn(10, 4))\n",
1236 | "df"
1237 | ]
1238 | },
1239 | {
1240 | "cell_type": "code",
1241 | "execution_count": null,
1242 | "id": "31d8f50f",
1243 | "metadata": {},
1244 | "outputs": [],
1245 | "source": [
1246 | "pieces = [df[:3], df[3:7], df[7:]]"
1247 | ]
1248 | },
1249 | {
1250 | "cell_type": "code",
1251 | "execution_count": null,
1252 | "id": "39e96af3",
1253 | "metadata": {},
1254 | "outputs": [],
1255 | "source": [
1256 | "pieces[1]"
1257 | ]
1258 | },
1259 | {
1260 | "cell_type": "code",
1261 | "execution_count": null,
1262 | "id": "776a6901",
1263 | "metadata": {},
1264 | "outputs": [],
1265 | "source": [
1266 | "pieces[0]"
1267 | ]
1268 | },
1269 | {
1270 | "cell_type": "code",
1271 | "execution_count": null,
1272 | "id": "b744166b",
1273 | "metadata": {},
1274 | "outputs": [],
1275 | "source": [
1276 | "pieces[-1]"
1277 | ]
1278 | },
1279 | {
1280 | "cell_type": "code",
1281 | "execution_count": null,
1282 | "id": "8928c666",
1283 | "metadata": {},
1284 | "outputs": [],
1285 | "source": [
1286 | "pd.concat(pieces)"
1287 | ]
1288 | },
1289 | {
1290 | "cell_type": "code",
1291 | "execution_count": null,
1292 | "id": "b6ae5449",
1293 | "metadata": {},
1294 | "outputs": [],
1295 | "source": [
1296 | "pd.concat([df[:3], df[7:], df[3:7]], ignore_index=True)"
1297 | ]
1298 | },
1299 | {
1300 | "cell_type": "code",
1301 | "execution_count": null,
1302 | "id": "44a9bbc5",
1303 | "metadata": {},
1304 | "outputs": [],
1305 | "source": [
1306 | "left = pd.DataFrame({\"key\": [\"foo\", \"foo\"], \"lval\": [1, 2]})\n",
1307 | "left"
1308 | ]
1309 | },
1310 | {
1311 | "cell_type": "code",
1312 | "execution_count": null,
1313 | "id": "cc52847c",
1314 | "metadata": {},
1315 | "outputs": [],
1316 | "source": [
1317 | "right = pd.DataFrame({\"key\": [\"foo\", \"foo\"], \"rval\": [4, 5]})\n",
1318 | "right"
1319 | ]
1320 | },
1321 | {
1322 | "cell_type": "code",
1323 | "execution_count": null,
1324 | "id": "784cb8d7",
1325 | "metadata": {},
1326 | "outputs": [],
1327 | "source": [
1328 | "pd.merge(left, right, on=\"key\")"
1329 | ]
1330 | },
1331 | {
1332 | "cell_type": "code",
1333 | "execution_count": null,
1334 | "id": "c791cd31",
1335 | "metadata": {},
1336 | "outputs": [],
1337 | "source": [
1338 | "left = pd.DataFrame({\"key\": [\"foo\", \"bar\", \"ccc\"], \"lval\": [1, 2, 3]})\n",
1339 | "left"
1340 | ]
1341 | },
1342 | {
1343 | "cell_type": "code",
1344 | "execution_count": null,
1345 | "id": "16055c93",
1346 | "metadata": {},
1347 | "outputs": [],
1348 | "source": [
1349 | "right = pd.DataFrame({\"key\": [\"foo\", \"bar\", \"foo\", \"bar\"], \"rval\": [4, 5, 8, 3]})\n",
1350 | "right"
1351 | ]
1352 | },
1353 | {
1354 | "cell_type": "code",
1355 | "execution_count": null,
1356 | "id": "76aefea2",
1357 | "metadata": {},
1358 | "outputs": [],
1359 | "source": [
1360 | "pd.merge(left, right, on=\"key\")"
1361 | ]
1362 | },
1363 | {
1364 | "cell_type": "markdown",
1365 | "id": "c652e699",
1366 | "metadata": {},
1367 | "source": [
1368 | "## Grouping"
1369 | ]
1370 | },
1371 | {
1372 | "cell_type": "code",
1373 | "execution_count": null,
1374 | "id": "c8a8d51c",
1375 | "metadata": {},
1376 | "outputs": [],
1377 | "source": [
1378 | "df = pd.DataFrame(\n",
1379 | " {\n",
1380 | " \"A\": [\"foo\", \"bar\", \"foo\", \"bar\", \"foo\", \"bar\", \"foo\", \"foo\"],\n",
1381 | " \"B\": [\"one\", \"one\", \"two\", \"three\", \"two\", \"two\", \"one\", \"three\"],\n",
1382 | " \"C\": np.random.randn(8),\n",
1383 | " \"D\": np.random.randn(8),\n",
1384 | " }\n",
1385 | ")"
1386 | ]
1387 | },
1388 | {
1389 | "cell_type": "code",
1390 | "execution_count": null,
1391 | "id": "84b3aad8",
1392 | "metadata": {},
1393 | "outputs": [],
1394 | "source": [
1395 | "df"
1396 | ]
1397 | },
1398 | {
1399 | "cell_type": "code",
1400 | "execution_count": null,
1401 | "id": "ce702be3",
1402 | "metadata": {},
1403 | "outputs": [],
1404 | "source": [
1405 | "df[\"A\"].unique()"
1406 | ]
1407 | },
1408 | {
1409 | "cell_type": "code",
1410 | "execution_count": null,
1411 | "id": "4e38b453",
1412 | "metadata": {},
1413 | "outputs": [],
1414 | "source": [
1415 | "df[\"A\"].value_counts()"
1416 | ]
1417 | },
1418 | {
1419 | "cell_type": "code",
1420 | "execution_count": null,
1421 | "id": "b8018ce6",
1422 | "metadata": {},
1423 | "outputs": [],
1424 | "source": [
1425 | "df.groupby(\"A\")[[\"C\", \"D\"]].sum()"
1426 | ]
1427 | },
1428 | {
1429 | "cell_type": "code",
1430 | "execution_count": null,
1431 | "id": "95b962e0",
1432 | "metadata": {},
1433 | "outputs": [],
1434 | "source": [
1435 | "df.groupby([\"A\", \"B\"]).sum()"
1436 | ]
1437 | },
1438 | {
1439 | "cell_type": "code",
1440 | "execution_count": null,
1441 | "id": "9fa36b25",
1442 | "metadata": {},
1443 | "outputs": [],
1444 | "source": [
1445 | "df.groupby([\"A\", \"B\"]).mean()[[\"C\"]]"
1446 | ]
1447 | },
1448 | {
1449 | "cell_type": "code",
1450 | "execution_count": null,
1451 | "id": "852e4319",
1452 | "metadata": {},
1453 | "outputs": [],
1454 | "source": [
1455 | "df.groupby([\"A\", \"B\"]).count()"
1456 | ]
1457 | },
1458 | {
1459 | "cell_type": "markdown",
1460 | "id": "e70b94a3",
1461 | "metadata": {},
1462 | "source": [
1463 | "## Reshaping"
1464 | ]
1465 | },
1466 | {
1467 | "cell_type": "code",
1468 | "execution_count": null,
1469 | "id": "82b53b4e",
1470 | "metadata": {},
1471 | "outputs": [],
1472 | "source": [
1473 | "tuples = list(\n",
1474 | " zip(\n",
1475 | " [\"bar\", \"bar\", \"baz\", \"baz\", \"foo\", \"foo\", \"qux\", \"qux\"],\n",
1476 | " [\"one\", \"two\", \"one\", \"two\", \"one\", \"two\", \"one\", \"two\"],\n",
1477 | " )\n",
1478 | ")\n",
1479 | "\n",
1480 | "index = pd.MultiIndex.from_tuples(tuples, names=[\"first\", \"second\"])\n",
1481 | "df = pd.DataFrame(np.random.randn(8, 2), index=index, columns=[\"A\", \"B\"])\n",
1482 | "df"
1483 | ]
1484 | },
1485 | {
1486 | "cell_type": "code",
1487 | "execution_count": null,
1488 | "id": "0b84ef5e",
1489 | "metadata": {},
1490 | "outputs": [],
1491 | "source": [
1492 | "df2 = df[:4]\n",
1493 | "df2"
1494 | ]
1495 | },
1496 | {
1497 | "cell_type": "code",
1498 | "execution_count": null,
1499 | "id": "646195d2",
1500 | "metadata": {},
1501 | "outputs": [],
1502 | "source": [
1503 | "df2.values"
1504 | ]
1505 | },
1506 | {
1507 | "cell_type": "code",
1508 | "execution_count": null,
1509 | "id": "440f2280",
1510 | "metadata": {},
1511 | "outputs": [],
1512 | "source": [
1513 | "stacked = df2.stack()\n",
1514 | "stacked"
1515 | ]
1516 | },
1517 | {
1518 | "cell_type": "code",
1519 | "execution_count": null,
1520 | "id": "5bfde1ce",
1521 | "metadata": {},
1522 | "outputs": [],
1523 | "source": [
1524 | "stacked.index"
1525 | ]
1526 | },
1527 | {
1528 | "cell_type": "code",
1529 | "execution_count": null,
1530 | "id": "1a108d06",
1531 | "metadata": {},
1532 | "outputs": [],
1533 | "source": [
1534 | "stacked.values"
1535 | ]
1536 | },
1537 | {
1538 | "cell_type": "code",
1539 | "execution_count": null,
1540 | "id": "e2fd8b87",
1541 | "metadata": {},
1542 | "outputs": [],
1543 | "source": [
1544 | "stacked.unstack()"
1545 | ]
1546 | },
1547 | {
1548 | "cell_type": "code",
1549 | "execution_count": null,
1550 | "id": "3503bc31",
1551 | "metadata": {},
1552 | "outputs": [],
1553 | "source": [
1554 | "stacked.unstack(1)"
1555 | ]
1556 | },
1557 | {
1558 | "cell_type": "code",
1559 | "execution_count": null,
1560 | "id": "064a51e7",
1561 | "metadata": {},
1562 | "outputs": [],
1563 | "source": [
1564 | "stacked.unstack(0).unstack(0)"
1565 | ]
1566 | },
1567 | {
1568 | "cell_type": "markdown",
1569 | "id": "f98ff9a5",
1570 | "metadata": {},
1571 | "source": [
1572 | "## Pivoting"
1573 | ]
1574 | },
1575 | {
1576 | "cell_type": "code",
1577 | "execution_count": null,
1578 | "id": "eb66efff",
1579 | "metadata": {},
1580 | "outputs": [],
1581 | "source": [
1582 | "df = pd.DataFrame(\n",
1583 | " {\n",
1584 | " \"A\": [\"one\", \"one\", \"two\", \"three\"] * 3,\n",
1585 | " \"B\": [\"A\", \"B\", \"C\"] * 4,\n",
1586 | " \"C\": [\"foo\", \"foo\", \"foo\", \"bar\", \"bar\", \"bar\"] * 2,\n",
1587 | " \"D\": np.random.randn(12),\n",
1588 | " \"E\": np.random.randn(12),\n",
1589 | " }\n",
1590 | ")\n",
1591 | "df"
1592 | ]
1593 | },
1594 | {
1595 | "cell_type": "code",
1596 | "execution_count": null,
1597 | "id": "e9248bfd",
1598 | "metadata": {},
1599 | "outputs": [],
1600 | "source": [
1601 | "pd.pivot_table(df, values=\"D\", index=[\"A\", \"B\"], columns=[\"C\"])"
1602 | ]
1603 | },
1604 | {
1605 | "cell_type": "markdown",
1606 | "id": "80bb7a2a",
1607 | "metadata": {},
1608 | "source": [
1609 | "## Time Series"
1610 | ]
1611 | },
1612 | {
1613 | "cell_type": "code",
1614 | "execution_count": null,
1615 | "id": "e37886ec",
1616 | "metadata": {},
1617 | "outputs": [],
1618 | "source": [
1619 | "rng = pd.date_range(\"1/1/2012\", periods=1000, freq=\"s\")\n",
1620 | "ts = pd.Series(np.random.randint(0, 500, len(rng)), index=rng)\n",
1621 | "ts"
1622 | ]
1623 | },
1624 | {
1625 | "cell_type": "code",
1626 | "execution_count": null,
1627 | "id": "8e3791e6",
1628 | "metadata": {},
1629 | "outputs": [],
1630 | "source": [
1631 | "ts.resample(\"1Min\").sum()"
1632 | ]
1633 | },
1634 | {
1635 | "cell_type": "code",
1636 | "execution_count": null,
1637 | "id": "25142ec1",
1638 | "metadata": {},
1639 | "outputs": [],
1640 | "source": [
1641 | "rng = pd.date_range(\"3/6/2012 00:00\", periods=5, freq=\"D\")\n",
1642 | "ts = pd.Series(np.random.randn(len(rng)), rng)\n",
1643 | "ts"
1644 | ]
1645 | },
1646 | {
1647 | "cell_type": "code",
1648 | "execution_count": null,
1649 | "id": "eb5bee69",
1650 | "metadata": {},
1651 | "outputs": [],
1652 | "source": [
1653 | "ts_utc = ts.tz_localize(\"UTC\")\n",
1654 | "ts_utc"
1655 | ]
1656 | },
1657 | {
1658 | "cell_type": "code",
1659 | "execution_count": null,
1660 | "id": "1ff05f90",
1661 | "metadata": {},
1662 | "outputs": [],
1663 | "source": [
1664 | "ts_utc.tz_convert(\"US/Eastern\")"
1665 | ]
1666 | },
1667 | {
1668 | "cell_type": "code",
1669 | "execution_count": null,
1670 | "id": "51a62dc5",
1671 | "metadata": {},
1672 | "outputs": [],
1673 | "source": [
1674 | "rng = pd.date_range(\"1/1/2012\", periods=5, freq=\"M\")\n",
1675 | "ts = pd.Series(np.random.randn(len(rng)), index=rng)\n",
1676 | "ts"
1677 | ]
1678 | },
1679 | {
1680 | "cell_type": "code",
1681 | "execution_count": null,
1682 | "id": "7a10cff1",
1683 | "metadata": {},
1684 | "outputs": [],
1685 | "source": [
1686 | "ps = ts.to_period()\n",
1687 | "ps"
1688 | ]
1689 | },
1690 | {
1691 | "cell_type": "code",
1692 | "execution_count": null,
1693 | "id": "c609aff7",
1694 | "metadata": {},
1695 | "outputs": [],
1696 | "source": [
1697 | "ps.to_timestamp()"
1698 | ]
1699 | },
1700 | {
1701 | "cell_type": "code",
1702 | "execution_count": null,
1703 | "id": "b47a344d",
1704 | "metadata": {},
1705 | "outputs": [],
1706 | "source": [
1707 | "pd.period_range(\"1990Q1\", \"2000Q4\", freq=\"Q-NOV\")"
1708 | ]
1709 | },
1710 | {
1711 | "cell_type": "code",
1712 | "execution_count": null,
1713 | "id": "b3712d37",
1714 | "metadata": {},
1715 | "outputs": [],
1716 | "source": [
1717 | "prng = pd.period_range(\"1990Q1\", \"2000Q4\", freq=\"Q-NOV\")\n",
1718 | "ts = pd.Series(np.random.randn(len(prng)), prng)\n",
1719 | "ts.index = (prng.asfreq(\"M\", \"e\") + 1).asfreq(\"h\", \"s\") + 9\n",
1720 | "ts.head()"
1721 | ]
1722 | },
1723 | {
1724 | "cell_type": "markdown",
1725 | "id": "286bad2a",
1726 | "metadata": {},
1727 | "source": [
1728 | "## Categoricals"
1729 | ]
1730 | },
1731 | {
1732 | "cell_type": "code",
1733 | "execution_count": null,
1734 | "id": "93c71264",
1735 | "metadata": {},
1736 | "outputs": [],
1737 | "source": [
1738 | "df = pd.DataFrame(\n",
1739 | " {\"id\": [1, 2, 3, 4, 5, 6], \"raw_grade\": [\"a\", \"b\", \"b\", \"a\", \"a\", \"e\"]}\n",
1740 | ")\n",
1741 | "df"
1742 | ]
1743 | },
1744 | {
1745 | "cell_type": "code",
1746 | "execution_count": null,
1747 | "id": "b95a599b",
1748 | "metadata": {},
1749 | "outputs": [],
1750 | "source": [
1751 | "df[\"raw_grade\"].str.upper()"
1752 | ]
1753 | },
1754 | {
1755 | "cell_type": "code",
1756 | "execution_count": null,
1757 | "id": "3d3efdc4",
1758 | "metadata": {},
1759 | "outputs": [],
1760 | "source": [
1761 | "df[\"grade\"] = df[\"raw_grade\"].astype(\"category\")\n",
1762 | "df[\"grade\"]"
1763 | ]
1764 | },
1765 | {
1766 | "cell_type": "code",
1767 | "execution_count": null,
1768 | "id": "8ce3a5d3",
1769 | "metadata": {},
1770 | "outputs": [],
1771 | "source": [
1772 | "new_categories = [\"very good\", \"good\", \"very bad\"]\n",
1773 | "df[\"grade\"] = df[\"grade\"].cat.rename_categories(new_categories)"
1774 | ]
1775 | },
1776 | {
1777 | "cell_type": "code",
1778 | "execution_count": null,
1779 | "id": "63eab71a",
1780 | "metadata": {},
1781 | "outputs": [],
1782 | "source": [
1783 | "df"
1784 | ]
1785 | },
1786 | {
1787 | "cell_type": "code",
1788 | "execution_count": null,
1789 | "id": "862f74c9",
1790 | "metadata": {},
1791 | "outputs": [],
1792 | "source": [
1793 | "df[\"grade\"] = df[\"grade\"].cat.set_categories(\n",
1794 | " [\"very bad\", \"bad\", \"medium\", \"good\", \"very good\"]\n",
1795 | ")\n",
1796 | "df[\"grade\"]"
1797 | ]
1798 | },
1799 | {
1800 | "cell_type": "code",
1801 | "execution_count": null,
1802 | "id": "103191f2",
1803 | "metadata": {},
1804 | "outputs": [],
1805 | "source": [
1806 | "df.sort_values(by=\"grade\")"
1807 | ]
1808 | },
1809 | {
1810 | "cell_type": "code",
1811 | "execution_count": null,
1812 | "id": "705d978a",
1813 | "metadata": {},
1814 | "outputs": [],
1815 | "source": [
1816 | "df.groupby(\"raw_grade\").size()"
1817 | ]
1818 | },
1819 | {
1820 | "cell_type": "code",
1821 | "execution_count": null,
1822 | "id": "51bf8c5c",
1823 | "metadata": {},
1824 | "outputs": [],
1825 | "source": [
1826 | "df.groupby(\"grade\").size()"
1827 | ]
1828 | },
1829 | {
1830 | "cell_type": "code",
1831 | "execution_count": null,
1832 | "id": "086a42e3",
1833 | "metadata": {},
1834 | "outputs": [],
1835 | "source": [
1836 | "import matplotlib.pyplot as plt"
1837 | ]
1838 | },
1839 | {
1840 | "cell_type": "code",
1841 | "execution_count": null,
1842 | "id": "bb53025d",
1843 | "metadata": {},
1844 | "outputs": [],
1845 | "source": [
1846 | "pd.Series(np.random.randn(1000), index=pd.date_range(\"1/1/2000\", periods=1000)).plot()"
1847 | ]
1848 | },
1849 | {
1850 | "cell_type": "code",
1851 | "execution_count": null,
1852 | "id": "bd5364d8",
1853 | "metadata": {},
1854 | "outputs": [],
1855 | "source": [
1856 | "ts = pd.Series(np.random.randn(1000), index=pd.date_range(\"1/1/2000\", periods=1000))\n",
1857 | "ts = ts.cumsum()\n",
1858 | "ts.plot();"
1859 | ]
1860 | },
1861 | {
1862 | "cell_type": "code",
1863 | "execution_count": null,
1864 | "id": "18d37930",
1865 | "metadata": {},
1866 | "outputs": [],
1867 | "source": [
1868 | "df = pd.DataFrame(\n",
1869 | " np.random.randn(1000, 4), index=ts.index, columns=[\"A\", \"B\", \"C\", \"D\"]\n",
1870 | ")\n",
1871 | "df = df.cumsum()\n",
1872 | "\n",
1873 | "\n",
1874 | "df.plot()\n",
1875 | "plt.legend(loc='best');"
1876 | ]
1877 | },
1878 | {
1879 | "cell_type": "markdown",
1880 | "id": "a2cf4c18",
1881 | "metadata": {},
1882 | "source": [
1883 | "# Working with real data"
1884 | ]
1885 | },
1886 | {
1887 | "cell_type": "code",
1888 | "execution_count": null,
1889 | "id": "89c3d4ba",
1890 | "metadata": {},
1891 | "outputs": [],
1892 | "source": [
1893 | "data = pd.read_csv('https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv', sep=',')"
1894 | ]
1895 | },
1896 | {
1897 | "cell_type": "code",
1898 | "execution_count": null,
1899 | "id": "586a3735",
1900 | "metadata": {},
1901 | "outputs": [],
1902 | "source": [
1903 | "data.head()"
1904 | ]
1905 | },
1906 | {
1907 | "cell_type": "code",
1908 | "execution_count": null,
1909 | "id": "0fb15d90",
1910 | "metadata": {},
1911 | "outputs": [],
1912 | "source": [
1913 | "data.shape"
1914 | ]
1915 | },
1916 | {
1917 | "cell_type": "code",
1918 | "execution_count": null,
1919 | "id": "a26cad5f",
1920 | "metadata": {},
1921 | "outputs": [],
1922 | "source": [
1923 | "data.info()"
1924 | ]
1925 | },
1926 | {
1927 | "cell_type": "code",
1928 | "execution_count": null,
1929 | "id": "5c2e9f13",
1930 | "metadata": {},
1931 | "outputs": [],
1932 | "source": [
1933 | "data.describe()"
1934 | ]
1935 | },
1936 | {
1937 | "cell_type": "code",
1938 | "execution_count": null,
1939 | "id": "c30b94bb",
1940 | "metadata": {},
1941 | "outputs": [],
1942 | "source": [
1943 | "data.to_numpy()"
1944 | ]
1945 | },
1946 | {
1947 | "cell_type": "code",
1948 | "execution_count": null,
1949 | "id": "b48e6200",
1950 | "metadata": {},
1951 | "outputs": [],
1952 | "source": [
1953 | "type(data[\"Name\"])"
1954 | ]
1955 | },
1956 | {
1957 | "cell_type": "code",
1958 | "execution_count": null,
1959 | "id": "3e3b5e44",
1960 | "metadata": {},
1961 | "outputs": [],
1962 | "source": [
1963 | "data[\"Name\"]"
1964 | ]
1965 | },
1966 | {
1967 | "cell_type": "code",
1968 | "execution_count": null,
1969 | "id": "be3ee512",
1970 | "metadata": {},
1971 | "outputs": [],
1972 | "source": [
1973 | "data[[\"Name\"]]"
1974 | ]
1975 | },
1976 | {
1977 | "cell_type": "code",
1978 | "execution_count": null,
1979 | "id": "82623d08",
1980 | "metadata": {},
1981 | "outputs": [],
1982 | "source": [
1983 | "data[[\"Name\", \"Sex\", \"Age\"]]"
1984 | ]
1985 | },
1986 | {
1987 | "cell_type": "code",
1988 | "execution_count": null,
1989 | "id": "de950aeb",
1990 | "metadata": {},
1991 | "outputs": [],
1992 | "source": [
1993 | "data[\"Age\"].hist(bins=30)"
1994 | ]
1995 | },
1996 | {
1997 | "cell_type": "code",
1998 | "execution_count": null,
1999 | "id": "a3fbe35a",
2000 | "metadata": {},
2001 | "outputs": [],
2002 | "source": [
2003 | "data[\"Sex\"].hist()"
2004 | ]
2005 | },
2006 | {
2007 | "cell_type": "code",
2008 | "execution_count": null,
2009 | "id": "c2f48461",
2010 | "metadata": {},
2011 | "outputs": [],
2012 | "source": [
2013 | "data[\"Age\"] < 30"
2014 | ]
2015 | },
2016 | {
2017 | "cell_type": "code",
2018 | "execution_count": null,
2019 | "id": "d8dcb30b",
2020 | "metadata": {},
2021 | "outputs": [],
2022 | "source": [
2023 | "bool_mask = data[\"Age\"] < 30\n",
2024 | "data[bool_mask]"
2025 | ]
2026 | },
2027 | {
2028 | "cell_type": "code",
2029 | "execution_count": null,
2030 | "id": "4174142d",
2031 | "metadata": {},
2032 | "outputs": [],
2033 | "source": [
2034 | "data[bool_mask].equals(data[data[\"Age\"] < 30])"
2035 | ]
2036 | },
2037 | {
2038 | "cell_type": "code",
2039 | "execution_count": null,
2040 | "id": "6b11fe1c",
2041 | "metadata": {},
2042 | "outputs": [],
2043 | "source": [
2044 | "data[\"Age\"].mean()"
2045 | ]
2046 | },
2047 | {
2048 | "cell_type": "code",
2049 | "execution_count": null,
2050 | "id": "7068042d",
2051 | "metadata": {},
2052 | "outputs": [],
2053 | "source": [
2054 | "data.groupby(\"Sex\").mean(numeric_only=True)"
2055 | ]
2056 | },
2057 | {
2058 | "cell_type": "markdown",
2059 | "id": "1f32196a",
2060 | "metadata": {},
2061 | "source": [
2062 | "# 3. Matplotlib\n",
2063 | " \n",
2064 | "Official site -- https://matplotlib.org  \n",
2065 | "Getting started -- https://matplotlib.org/stable/users/getting_started/  \n",
2066 | "Examples -- https://matplotlib.org/stable/gallery/index.html"
2067 | ]
2068 | },
2069 | {
2070 | "cell_type": "code",
2071 | "execution_count": null,
2072 | "id": "1a0bc676",
2073 | "metadata": {},
2074 | "outputs": [],
2075 | "source": [
2076 | "import matplotlib as mpl\n",
2077 | "import matplotlib.pyplot as plt"
2078 | ]
2079 | },
2080 | {
2081 | "cell_type": "code",
2082 | "execution_count": null,
2083 | "id": "843f5b88",
2084 | "metadata": {},
2085 | "outputs": [],
2086 | "source": [
2087 | "fig, ax = plt.subplots() \n",
2088 | "ax.plot([1, 2, 3, 4], [1, 4, 2, 3])"
2089 | ]
2090 | },
2091 | {
2092 | "cell_type": "code",
2093 | "execution_count": null,
2094 | "id": "a077efe3",
2095 | "metadata": {},
2096 | "outputs": [],
2097 | "source": [
2098 | "np.random.seed(17)\n",
2099 | "data = {'a': np.arange(50),\n",
2100 | " 'c': np.random.randint(0, 50, 50),\n",
2101 | " 'd': np.random.randn(50)}\n",
2102 | "data['b'] = data['a'] + 10 * np.random.randn(50)\n",
2103 | "data['d'] = np.abs(data['d']) * 100\n",
2104 | "\n",
2105 | "fig, ax = plt.subplots(figsize=(5, 2.7), layout='constrained')\n",
2106 | "ax.scatter('a', 'b', c='c', s='d', data=data)\n",
2107 | "ax.set_xlabel('entry a')\n",
2108 | "ax.set_ylabel('entry b')"
2109 | ]
2110 | },
2111 | {
2112 | "cell_type": "code",
2113 | "execution_count": null,
2114 | "id": "a8d00d46",
2115 | "metadata": {},
2116 | "outputs": [],
2117 | "source": [
2118 | "x = np.linspace(0, 2, 100) \n",
2119 | "fig, ax = plt.subplots(figsize=(5, 2.7), layout='constrained')\n",
2120 | "ax.plot(x, x, label='linear')\n",
2121 | "ax.plot(x, x**2, label='quadratic')\n",
2122 | "ax.plot(x, x**3, label='cubic') \n",
2123 | "ax.set_xlabel('x label') \n",
2124 | "ax.set_ylabel('y label') \n",
2125 | "ax.set_title(\"Simple Plot\") \n",
2126 | "ax.legend() "
2127 | ]
2128 | },
2129 | {
2130 | "cell_type": "code",
2131 | "execution_count": null,
2132 | "id": "fddaf151",
2133 | "metadata": {},
2134 | "outputs": [],
2135 | "source": [
2136 | "x = np.linspace(0, 2, 100)\n",
2137 | "\n",
2138 | "plt.figure(figsize=(5, 2.7), layout='constrained')\n",
2139 | "plt.plot(x, x, label='linear')\n",
2140 | "plt.plot(x, x**2, label='quadratic')\n",
2141 | "plt.plot(x, x**3, label='cubic')\n",
2142 | "plt.xlabel('x label')\n",
2143 | "plt.ylabel('y label')\n",
2144 | "plt.title(\"Simple Plot\")\n",
2145 | "plt.legend()"
2146 | ]
2147 | },
2148 | {
2149 | "cell_type": "code",
2150 | "execution_count": null,
2151 | "id": "f58cff42",
2152 | "metadata": {},
2153 | "outputs": [],
2154 | "source": [
2155 | "data1, data2, data3, data4 = np.random.randn(4, 100)"
2156 | ]
2157 | },
2158 | {
2159 | "cell_type": "code",
2160 | "execution_count": null,
2161 | "id": "39298642",
2162 | "metadata": {},
2163 | "outputs": [],
2164 | "source": [
2165 | "fig, ax = plt.subplots(figsize=(5, 2.7))\n",
2166 | "x = np.arange(len(data1))\n",
2167 | "ax.plot(x, np.cumsum(data1), color='blue', linewidth=3, linestyle='--')\n",
2168 | "l, = ax.plot(x, np.cumsum(data2), color='orange', linewidth=2)\n",
2169 | "l.set_linestyle(':')"
2170 | ]
2171 | },
2172 | {
2173 | "cell_type": "code",
2174 | "execution_count": null,
2175 | "id": "71954147",
2176 | "metadata": {},
2177 | "outputs": [],
2178 | "source": [
2179 | "fig, ax = plt.subplots(figsize=(5, 2.7))\n",
2180 | "ax.scatter(data1, data2, s=50, facecolor='C0', edgecolor='k')"
2181 | ]
2182 | },
2183 | {
2184 | "cell_type": "code",
2185 | "execution_count": null,
2186 | "id": "f6c8c058",
2187 | "metadata": {},
2188 | "outputs": [],
2189 | "source": [
2190 | "fig, ax = plt.subplots(figsize=(5, 2.7))\n",
2191 | "ax.plot(data1, 'o', label='data1')\n",
2192 | "ax.plot(data2, 'd', label='data2')\n",
2193 | "ax.plot(data3, 'v', label='data3')\n",
2194 | "ax.plot(data4, 's', label='data4')\n",
2195 | "ax.legend()"
2196 | ]
2197 | },
2198 | {
2199 | "cell_type": "code",
2200 | "execution_count": null,
2201 | "id": "44fc4990",
2202 | "metadata": {},
2203 | "outputs": [],
2204 | "source": [
2205 | "mu, sigma = 115, 15\n",
2206 | "x = mu + sigma * np.random.randn(10000)\n",
2207 | "fig, ax = plt.subplots(figsize=(5, 2.7), layout='constrained')\n",
2208 | "\n",
2209 | "n, bins, patches = ax.hist(x, 50, density=True, facecolor='C0', alpha=0.75)\n",
2210 | "\n",
2211 | "ax.set_xlabel('Length [cm]')\n",
2212 | "ax.set_ylabel('Probability')\n",
2213 | "ax.set_title('Aardvark lengths\\n (not really)')\n",
2214 | "ax.text(75, .025, r'$\\mu=115,\\ \\sigma=15$')\n",
2215 | "ax.axis([55, 175, 0, 0.03])\n",
2216 | "ax.grid(True)"
2217 | ]
2218 | },
2219 | {
2220 | "cell_type": "code",
2221 | "execution_count": null,
2222 | "id": "bd4d06d0",
2223 | "metadata": {},
2224 | "outputs": [],
2225 | "source": [
2226 | "fig, ax = plt.subplots(figsize=(5, 2.7))\n",
2227 | "\n",
2228 | "t = np.arange(0.0, 5.0, 0.01)\n",
2229 | "s = np.cos(2 * np.pi * t)\n",
2230 | "line, = ax.plot(t, s, lw=2)\n",
2231 | "\n",
2232 | "ax.annotate('local max', xy=(2, 1), xytext=(3, 1.5),\n",
2233 | " arrowprops=dict(facecolor='black', shrink=0.05))\n",
2234 | "\n",
2235 | "ax.set_ylim(-2, 2)"
2236 | ]
2237 | },
2238 | {
2239 | "cell_type": "code",
2240 | "execution_count": null,
2241 | "id": "dcb20e51",
2242 | "metadata": {},
2243 | "outputs": [],
2244 | "source": [
2245 | "fig, ax = plt.subplots(figsize=(5, 2.7))\n",
2246 | "ax.plot(np.arange(len(data1)), data1, label='data1')\n",
2247 | "ax.plot(np.arange(len(data2)), data2, label='data2')\n",
2248 | "ax.plot(np.arange(len(data3)), data3, 'd', label='data3')\n",
2249 | "ax.legend()"
2250 | ]
2251 | },
2252 | {
2253 | "cell_type": "code",
2254 | "execution_count": null,
2255 | "id": "9ce0253f",
2256 | "metadata": {},
2257 | "outputs": [],
2258 | "source": [
2259 | "fig, axs = plt.subplots(1, 2, figsize=(5, 2.7), layout='constrained')\n",
2260 | "xdata = np.arange(len(data1)) # make an ordinal for this\n",
2261 | "data = 10**data1\n",
2262 | "axs[0].plot(xdata, data)\n",
2263 | "\n",
2264 | "axs[1].set_yscale('log')\n",
2265 | "axs[1].plot(xdata, data)"
2266 | ]
2267 | },
2268 | {
2269 | "cell_type": "code",
2270 | "execution_count": null,
2271 | "id": "e8052aaa",
2272 | "metadata": {},
2273 | "outputs": [],
2274 | "source": [
2275 | "fig, axs = plt.subplots(2, 1, layout='constrained')\n",
2276 | "axs[0].plot(xdata, data1)\n",
2277 | "axs[0].set_title('Automatic ticks')\n",
2278 | "\n",
2279 | "axs[1].plot(xdata, data1)\n",
2280 | "axs[1].set_xticks(np.arange(0, 100, 30), ['zero', '30', 'sixty', '90'])\n",
2281 | "axs[1].set_yticks([-1.5, 0, 1.5]) # note that we don't need to specify labels\n",
2282 | "axs[1].set_title('Manual ticks')"
2283 | ]
2284 | },
2285 | {
2286 | "cell_type": "code",
2287 | "execution_count": null,
2288 | "id": "b6049ab7",
2289 | "metadata": {},
2290 | "outputs": [],
2291 | "source": [
2292 | "fig, ax = plt.subplots(figsize=(5, 2.7), layout='constrained')\n",
2293 | "dates = np.arange(np.datetime64('2021-11-15'), np.datetime64('2021-12-25'),\n",
2294 | " np.timedelta64(1, 'h'))\n",
2295 | "data = np.cumsum(np.random.randn(len(dates)))\n",
2296 | "ax.plot(dates, data)\n",
2297 | "cdf = mpl.dates.ConciseDateFormatter(ax.xaxis.get_major_locator())\n",
2298 | "ax.xaxis.set_major_formatter(cdf)"
2299 | ]
2300 | },
2301 | {
2302 | "cell_type": "code",
2303 | "execution_count": null,
2304 | "id": "47d167c2",
2305 | "metadata": {},
2306 | "outputs": [],
2307 | "source": [
2308 | "fig, ax = plt.subplots(figsize=(5, 2.7), layout='constrained')\n",
2309 | "categories = ['turnips', 'rutabaga', 'cucumber', 'pumpkins']\n",
2310 | "\n",
2311 | "ax.bar(categories, np.random.rand(len(categories)))"
2312 | ]
2313 | },
2314 | {
2315 | "cell_type": "code",
2316 | "execution_count": null,
2317 | "id": "c9850fed",
2318 | "metadata": {},
2319 | "outputs": [],
2320 | "source": [
2321 | "fig, (ax1, ax3) = plt.subplots(1, 2, figsize=(7, 2.7), layout='constrained')\n",
2322 | "l1, = ax1.plot(t, s)\n",
2323 | "ax2 = ax1.twinx()\n",
2324 | "l2, = ax2.plot(t, range(len(t)), 'C1')\n",
2325 | "ax2.legend([l1, l2], ['Cosine (left)', 'Straight (right)'])\n",
2326 | "\n",
2327 | "ax3.plot(t, s)\n",
2328 | "ax3.set_xlabel('Angle [rad]')\n",
2329 | "ax4 = ax3.secondary_xaxis('top', functions=(np.rad2deg, np.deg2rad))\n",
2330 | "ax4.set_xlabel('Angle [°]')"
2331 | ]
2332 | },
2333 | {
2334 | "cell_type": "code",
2335 | "execution_count": null,
2336 | "id": "4dc9684f",
2337 | "metadata": {},
2338 | "outputs": [],
2339 | "source": [
2340 | "X, Y = np.meshgrid(np.linspace(-3, 3, 128), np.linspace(-3, 3, 128))\n",
2341 | "Z = (1 - X/2 + X**5 + Y**3) * np.exp(-X**2 - Y**2)\n",
2342 | "\n",
2343 | "fig, axs = plt.subplots(2, 2, layout='constrained')\n",
2344 | "pc = axs[0, 0].pcolormesh(X, Y, Z, vmin=-1, vmax=1, cmap='RdBu_r')\n",
2345 | "fig.colorbar(pc, ax=axs[0, 0])\n",
2346 | "axs[0, 0].set_title('pcolormesh()')\n",
2347 | "\n",
2348 | "co = axs[0, 1].contourf(X, Y, Z, levels=np.linspace(-1.25, 1.25, 11))\n",
2349 | "fig.colorbar(co, ax=axs[0, 1])\n",
2350 | "axs[0, 1].set_title('contourf()')\n",
2351 | "\n",
2352 | "pc = axs[1, 0].imshow(Z**2 * 100, cmap='plasma',\n",
2353 | " norm=mpl.colors.LogNorm(vmin=0.01, vmax=100))\n",
2354 | "fig.colorbar(pc, ax=axs[1, 0], extend='both')\n",
2355 | "axs[1, 0].set_title('imshow() with LogNorm()')\n",
2356 | "\n",
2357 | "pc = axs[1, 1].scatter(data1, data2, c=data3, cmap='RdBu_r')\n",
2358 | "fig.colorbar(pc, ax=axs[1, 1], extend='both')\n",
2359 | "axs[1, 1].set_title('scatter()')"
2360 | ]
2361 | },
2362 | {
2363 | "cell_type": "code",
2364 | "execution_count": null,
2365 | "id": "23ef9c07",
2366 | "metadata": {},
2367 | "outputs": [],
2368 | "source": [
2369 | "fig, axd = plt.subplot_mosaic([['upleft', 'right'],\n",
2370 | " ['lowleft', 'right']], layout='constrained')\n",
2371 | "axd['upleft'].set_title('upleft')\n",
2372 | "axd['lowleft'].set_title('lowleft')\n",
2373 | "axd['right'].set_title('right')"
2374 | ]
2375 | }
2376 | ],
2377 | "metadata": {
2378 | "kernelspec": {
2379 | "display_name": "Python 3 (ipykernel)",
2380 | "language": "python",
2381 | "name": "python3"
2382 | },
2383 | "language_info": {
2384 | "codemirror_mode": {
2385 | "name": "ipython",
2386 | "version": 3
2387 | },
2388 | "file_extension": ".py",
2389 | "mimetype": "text/x-python",
2390 | "name": "python",
2391 | "nbconvert_exporter": "python",
2392 | "pygments_lexer": "ipython3",
2393 | "version": "3.9.13"
2394 | }
2395 | },
2396 | "nbformat": 4,
2397 | "nbformat_minor": 5
2398 | }
2399 |
--------------------------------------------------------------------------------
/notebooks/seminars/04-pytorch-intro.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {
6 | "id": "apOCg85d9SLV"
7 | },
8 | "source": [
9 | "# PyTorch seminar\n"
10 | ]
11 | },
12 | {
13 | "cell_type": "code",
14 | "execution_count": null,
15 | "metadata": {
16 | "id": "_gLdrFvh9SLX"
17 | },
18 | "outputs": [],
19 | "source": [
20 | "import matplotlib.pyplot as plt\n",
21 | "import numpy as np\n",
22 | "import sklearn\n",
23 | "%matplotlib inline"
24 | ]
25 | },
26 | {
27 | "cell_type": "markdown",
28 | "metadata": {
29 | "id": "OYzpWFqt9SLY"
30 | },
31 | "source": [
32 | "Installation: https://pytorch.org/get-started/locally/  \n",
33 | "This notebook covers the main features of the PyTorch framework. PyTorch is similar to NumPy, but it can also compute gradients automatically and efficiently."
34 | ]
35 | },
36 | {
37 | "cell_type": "code",
38 | "execution_count": null,
39 | "metadata": {
40 | "colab": {
41 | "base_uri": "https://localhost:8080/"
42 | },
43 | "id": "ejEH9lf39SLZ",
44 | "outputId": "67145a07-4704-44b0-bfb7-c962c75b03e0"
45 | },
46 | "outputs": [],
47 | "source": [
48 | "import torch\n",
49 | "\n",
50 | "print(torch.__version__)"
51 | ]
52 | },
53 | {
54 | "cell_type": "code",
55 | "execution_count": null,
56 | "metadata": {
57 | "id": "CvX6ONy29SLa"
58 | },
59 | "outputs": [],
60 | "source": [
61 | "import torchvision\n",
62 | "from torch import nn\n",
63 | "import os\n",
64 | "from torchvision.datasets import MNIST\n",
65 | "import torchvision.transforms as transforms\n",
66 | "import tqdm"
67 | ]
68 | },
69 | {
70 | "cell_type": "code",
71 | "execution_count": null,
72 | "metadata": {
73 | "colab": {
74 | "base_uri": "https://localhost:8080/"
75 | },
76 | "id": "iyXIU9K29SLb",
77 | "outputId": "5c8cb232-155a-48ec-a759-6d8bdeb7fd90"
78 | },
79 | "outputs": [],
80 | "source": [
81 | "# Check whether CUDA is available on this machine.\n",
82 | "torch.cuda.is_available()"
83 | ]
84 | },
85 | {
86 | "cell_type": "markdown",
87 | "metadata": {
88 | "id": "7eVL0Xh49SLb"
89 | },
90 | "source": [
91 | "# Basic operations"
92 | ]
93 | },
94 | {
95 | "cell_type": "code",
96 | "execution_count": null,
97 | "metadata": {
98 | "colab": {
99 | "base_uri": "https://localhost:8080/"
100 | },
101 | "id": "VJh2krfQ9SLb",
102 | "outputId": "b6f11b04-1d73-42e3-8433-11254067f174",
103 | "scrolled": true
104 | },
105 | "outputs": [],
106 | "source": [
107 | "# numpy\n",
108 | "\n",
109 | "x = np.arange(25).reshape(5, 5)\n",
110 | "\n",
111 | "print(\"X :\\n {}\\n\".format(x))\n",
112 | "print(\"X.shape : {}\\n \".format(x.shape))\n",
113 | "print(\"Elementwise square:\\n {}\\n\".format(x * x))\n",
114 | "print(\"X*X^T :\\n {}\\n\".format(np.matmul(x, x.T)))\n",
115 | "print(\"Mean over columns:\\n {}\\n\".format(np.mean(x, axis=0)))\n",
116 | "print(\"Cumulative sum along rows:\\n {}\\n\".format(np.cumsum(x, axis=1)))"
117 | ]
118 | },
119 | {
120 | "cell_type": "code",
121 | "execution_count": null,
122 | "metadata": {
123 | "colab": {
124 | "base_uri": "https://localhost:8080/"
125 | },
126 | "id": "qRoNIUAm9SLc",
127 | "outputId": "700f09ba-d36f-43d8-9bb7-592c01222c5c"
128 | },
129 | "outputs": [],
130 | "source": [
131 | "# torch\n",
132 | "\n",
133 | "x = torch.arange(25).reshape(5, 5).float()\n",
134 | "\n",
135 | "print(\"X :\\n {}\\n\".format(x))\n",
136 | "print(\"X.shape : {}\\n \".format(x.shape))\n",
137 | "print(\"Elementwise square:\\n {}\\n\".format(x * x))\n",
138 | "print(\"X*X^T :\\n {}\\n\".format(torch.matmul(x, x.T)))\n",
139 | "print(\"Mean over columns:\\n {}\\n\".format(torch.mean(x, dim=0)))\n",
140 | "print(\"Cumulative sum along rows:\\n {}\\n\".format(torch.cumsum(x, dim=1)))"
141 | ]
142 | },
143 | {
144 | "cell_type": "markdown",
145 | "metadata": {
146 | "id": "PErs03Qf9SLc"
147 | },
148 | "source": [
149 | "# PyTorch is almost NumPy\n",
150 | "You can create tensors and inspect their gradients without creating a session, as TensorFlow 1.x required.\n",
151 | "Method names are very similar. Where they differ, see the table: https://github.com/torch/torch7/wiki/Torch-for-Numpy-users\n"
152 | ]
153 | },
154 | {
155 | "cell_type": "markdown",
156 | "metadata": {
157 | "id": "OVAZNles9SLd"
158 | },
159 | "source": [
160 | "# PyTorch runs backpropagation for us through the autograd module"
161 | ]
162 | },
163 | {
164 | "cell_type": "code",
165 | "execution_count": null,
166 | "metadata": {
167 | "colab": {
168 | "base_uri": "https://localhost:8080/"
169 | },
170 | "id": "BxomQQ3u9SLd",
171 | "outputId": "c5eb07db-3143-49ae-f5cf-7bc1c957c161"
172 | },
173 | "outputs": [],
174 | "source": [
175 | "# prediction vector\n",
176 | "preds = torch.zeros(5, requires_grad=True)\n",
177 | "\n",
178 | "# target (label) vector\n",
179 | "labels = torch.ones(5, requires_grad=True)\n",
180 | "\n",
181 | "# loss: MAE\n",
182 | "loss = torch.mean(torch.abs(labels - preds))\n",
183 | "\n",
184 | "print(loss)\n",
185 | "\n",
186 | "# run backprop\n",
187 | "loss.backward()"
188 | ]
189 | },
190 | {
191 | "cell_type": "code",
192 | "execution_count": null,
193 | "metadata": {
194 | "colab": {
195 | "base_uri": "https://localhost:8080/"
196 | },
197 | "id": "aSu1ITF99SLe",
198 | "outputId": "f81d2c14-af85-4dab-815f-09ec4ae1c12f"
199 | },
200 | "outputs": [],
201 | "source": [
202 | "# gradients are available in the .grad attribute:\n",
203 | "preds.grad"
204 | ]
205 | },
206 | {
207 | "cell_type": "code",
208 | "execution_count": null,
209 | "metadata": {
210 | "colab": {
211 | "base_uri": "https://localhost:8080/"
212 | },
213 | "id": "JlpI3Th29SLe",
214 | "outputId": "e7c3d66e-9233-41d7-f227-37589db53212"
215 | },
216 | "outputs": [],
217 | "source": [
218 | "# gradients can be reset to zero\n",
219 | "preds.grad.zero_()"
220 | ]
221 | },
222 | {
223 | "cell_type": "code",
224 | "execution_count": null,
225 | "metadata": {
226 | "colab": {
227 | "base_uri": "https://localhost:8080/"
228 | },
229 | "id": "zBjOk_jH9SLe",
230 | "outputId": "07571207-ce54-4452-adb3-705c2f55009f"
231 | },
232 | "outputs": [],
233 | "source": [
234 | "# now the gradients are zero again\n",
235 | "preds.grad"
236 | ]
237 | },
238 | {
239 | "cell_type": "markdown",
240 | "metadata": {
241 | "id": "rzwlsjQC9SLf"
242 | },
243 | "source": [
244 | "### Gradients accumulate on every call to backward()"
245 | ]
246 | },
247 | {
248 | "cell_type": "code",
249 | "execution_count": null,
250 | "metadata": {
251 | "colab": {
252 | "base_uri": "https://localhost:8080/"
253 | },
254 | "id": "kgydtalG9SLf",
255 | "outputId": "886288c8-bcd2-4a78-ad5b-f795b8485693"
256 | },
257 | "outputs": [],
258 | "source": [
259 | "# prediction vector\n",
260 | "preds = torch.zeros(5, requires_grad=True)\n",
261 | "\n",
262 | "# target (label) vector\n",
263 | "labels = torch.ones(5, requires_grad=True)\n",
264 | "\n",
265 | "# loss: MAE\n",
266 | "loss = torch.mean(torch.abs(labels - preds))\n",
267 | "\n",
268 | "print(loss)\n",
269 | "\n",
270 | "# run backprop several times: each call adds to preds.grad\n",
271 | "for i in range(5):\n",
272 | "    loss.backward(retain_graph=True)\n",
273 | "    print(i, preds.grad)"
274 | ]
275 | },
276 | {
277 | "cell_type": "markdown",
278 | "metadata": {
279 | "id": "xt-q2Fk79SLg"
280 | },
281 | "source": [
282 | "# Writing our own logistic regression in PyTorch"
283 | ]
284 | },
285 | {
286 | "cell_type": "code",
287 | "execution_count": null,
288 | "metadata": {
289 | "id": "J3yFTWek9SLg"
290 | },
291 | "outputs": [],
292 | "source": [
293 | "from sklearn.datasets import make_blobs\n",
294 | "import sklearn\n",
295 | "\n",
296 | "X, y = make_blobs(\n",
297 | " n_samples=200,\n",
298 | " centers=((10, 5), (5, -5)),\n",
299 | " n_features=2,\n",
300 | " random_state=0,\n",
301 | " cluster_std=3,\n",
302 | ")"
303 | ]
304 | },
305 | {
306 | "cell_type": "code",
307 | "execution_count": null,
308 | "metadata": {
309 | "colab": {
310 | "base_uri": "https://localhost:8080/",
311 | "height": 282
312 | },
313 | "id": "FDd7e9339SLg",
314 | "outputId": "1b919df0-f188-447c-e69e-f6a468ce9cb6"
315 | },
316 | "outputs": [],
317 | "source": [
318 | "plt.scatter(X[:, 0], X[:, 1], marker=\"o\", c=y, s=25, edgecolor=\"k\")"
319 | ]
320 | },
321 | {
322 | "cell_type": "markdown",
323 | "metadata": {
324 | "id": "5NU6rOBC9SLh"
325 | },
326 | "source": [
327 | "# Reminder: how this was done in scikit-learn"
328 | ]
329 | },
330 | {
331 | "cell_type": "code",
332 | "execution_count": null,
333 | "metadata": {
334 | "id": "7wWhBHPH9SLh"
335 | },
336 | "outputs": [],
337 | "source": [
338 | "from sklearn.linear_model import LogisticRegression"
339 | ]
340 | },
341 | {
342 | "cell_type": "code",
343 | "execution_count": null,
344 | "metadata": {
345 | "id": "2HmP5WB_9SLi"
346 | },
347 | "outputs": [],
348 | "source": [
349 | "model = LogisticRegression()"
350 | ]
351 | },
352 | {
353 | "cell_type": "code",
354 | "execution_count": null,
355 | "metadata": {
356 | "colab": {
357 | "base_uri": "https://localhost:8080/"
358 | },
359 | "id": "WBcgUibg9SLi",
360 | "outputId": "74606d4b-5b33-4025-f275-fd4690b22c71",
361 | "scrolled": true
362 | },
363 | "outputs": [],
364 | "source": [
365 | "model.fit(X, y)"
366 | ]
367 | },
368 | {
369 | "cell_type": "code",
370 | "execution_count": null,
371 | "metadata": {
372 | "colab": {
373 | "base_uri": "https://localhost:8080/"
374 | },
375 | "id": "lk966UsF9SLj",
376 | "outputId": "da585bd2-a748-4004-c80a-290471761045"
377 | },
378 | "outputs": [],
379 | "source": [
380 | "model.coef_"
381 | ]
382 | },
383 | {
384 | "cell_type": "code",
385 | "execution_count": null,
386 | "metadata": {
387 | "colab": {
388 | "base_uri": "https://localhost:8080/"
389 | },
390 | "id": "tInwNtdw9SLj",
391 | "outputId": "7547ac62-d286-4045-ec6b-158d31f5b062"
392 | },
393 | "outputs": [],
394 | "source": [
395 | "model.intercept_"
396 | ]
397 | },
398 | {
399 | "cell_type": "markdown",
400 | "metadata": {
401 | "id": "tyw6b4ZG9SLk"
402 | },
403 | "source": [
404 | "## Visualizing the decision boundary"
405 | ]
406 | },
407 | {
408 | "cell_type": "code",
409 | "execution_count": null,
410 | "metadata": {
411 | "colab": {
412 | "base_uri": "https://localhost:8080/",
413 | "height": 592
414 | },
415 | "id": "zeQ1czKR9SLk",
416 | "outputId": "404ed742-7fc9-4bfe-c554-c5cd39d579a2"
417 | },
418 | "outputs": [],
419 | "source": [
420 | "w_1 = model.coef_[0][0]\n",
421 | "w_2 = model.coef_[0][1]\n",
422 | "w_0 = model.intercept_[0]\n",
423 | "\n",
424 | "plt.figure(figsize=(20, 10))\n",
425 | "scatter = plt.scatter(X[:, 0], X[:, 1], marker=\"o\", c=y, s=25, edgecolor=\"k\")\n",
426 | "plt.legend(*scatter.legend_elements(), title=\"y\")\n",
427 | "x_arr = np.linspace(-3, 18, 1000)\n",
428 | "\n",
429 | "y_arr = -(w_0 + w_1 * x_arr) / w_2\n",
430 | "plt.plot(x_arr, y_arr)\n",
431 | "plt.show()"
432 | ]
433 | },
434 | {
435 | "cell_type": "markdown",
436 | "metadata": {
437 | "id": "uqe74hxI9SLl"
438 | },
439 | "source": [
440 | "# PyTorch"
441 | ]
442 | },
443 | {
444 | "cell_type": "code",
445 | "execution_count": null,
446 | "metadata": {
447 | "id": "zeYuViOh9SLl"
448 | },
449 | "outputs": [],
450 | "source": [
451 | "X, y = make_blobs(n_samples=200, centers=2, n_features=2, random_state=0, cluster_std=3)"
452 | ]
453 | },
454 | {
455 | "cell_type": "code",
456 | "execution_count": null,
457 | "metadata": {
458 | "colab": {
459 | "base_uri": "https://localhost:8080/"
460 | },
461 | "id": "jrbIbcUf9SLl",
462 | "outputId": "54d5f4cb-3c66-4531-ff11-6406df90fa81"
463 | },
464 | "outputs": [],
465 | "source": [
466 | "type(X), type(y)"
467 | ]
468 | },
469 | {
470 | "cell_type": "code",
471 | "execution_count": null,
472 | "metadata": {
473 | "id": "NkSB7ldb9SLm"
474 | },
475 | "outputs": [],
476 | "source": [
477 | "X = torch.tensor(X)\n",
478 | "y = torch.tensor(y)"
479 | ]
480 | },
481 | {
482 | "cell_type": "code",
483 | "execution_count": null,
484 | "metadata": {
485 | "colab": {
486 | "base_uri": "https://localhost:8080/"
487 | },
488 | "id": "R5pyIO1k9SLm",
489 | "outputId": "00f6053e-2099-495c-e226-867c3650d8bb"
490 | },
491 | "outputs": [],
492 | "source": [
493 | "type(X), type(y)"
494 | ]
495 | },
496 | {
497 | "cell_type": "code",
498 | "execution_count": null,
499 | "metadata": {
500 | "colab": {
501 | "base_uri": "https://localhost:8080/"
502 | },
503 | "id": "JdoThC549SLm",
504 | "outputId": "146701ee-b542-4bce-eb8c-c6a7f8c6a935"
505 | },
506 | "outputs": [],
507 | "source": [
508 | "print(X.shape, y.shape)"
509 | ]
510 | },
511 | {
512 | "cell_type": "code",
513 | "execution_count": null,
514 | "metadata": {
515 | "colab": {
516 | "base_uri": "https://localhost:8080/"
517 | },
518 | "id": "T6dvOti89SLo",
519 | "outputId": "bab39e85-0757-4d7e-bb49-4df45c0e0838"
520 | },
521 | "outputs": [],
522 | "source": [
523 | "len(X)"
524 | ]
525 | },
526 | {
527 | "cell_type": "code",
528 | "execution_count": null,
529 | "metadata": {
530 | "colab": {
531 | "base_uri": "https://localhost:8080/"
532 | },
533 | "id": "TSwk0ae39SLo",
534 | "outputId": "f66f7609-dc2b-41da-bf6a-9d0ef89e24f5"
535 | },
536 | "outputs": [],
537 | "source": [
538 | "len(y)"
539 | ]
540 | },
541 | {
542 | "cell_type": "markdown",
543 | "metadata": {
544 | "id": "0DW4OIky9SLo"
545 | },
546 | "source": [
547 | "# Let's write our own logistic regression in PyTorch"
548 | ]
549 | },
550 | {
551 | "cell_type": "code",
552 | "execution_count": null,
553 | "metadata": {
554 | "id": "RcTuxVya9SLo"
555 | },
556 | "outputs": [],
557 | "source": [
558 | "X, y = make_blobs(\n",
559 | " n_samples=200,\n",
560 | " centers=((10, 5), (5, -5)),\n",
561 | " n_features=2,\n",
562 | " random_state=0,\n",
563 | " cluster_std=3,\n",
564 | ")"
565 | ]
566 | },
567 | {
568 | "cell_type": "code",
569 | "execution_count": null,
570 | "metadata": {
571 | "colab": {
572 | "base_uri": "https://localhost:8080/"
573 | },
574 | "id": "l9nsTljs9SLp",
575 | "outputId": "d0708940-91d1-4b69-da77-6de9cf08320b"
576 | },
577 | "outputs": [],
578 | "source": [
579 | "type(X), type(y)"
580 | ]
581 | },
582 | {
583 | "cell_type": "code",
584 | "execution_count": null,
585 | "metadata": {
586 | "id": "dRCnYCIZ9SLp"
587 | },
588 | "outputs": [],
589 | "source": [
590 | "X = torch.tensor(X)\n",
591 | "y = torch.tensor(y)"
592 | ]
593 | },
594 | {
595 | "cell_type": "code",
596 | "execution_count": null,
597 | "metadata": {
598 | "id": "XLWyyMy29SLp"
599 | },
600 | "outputs": [],
601 | "source": [
602 | "assert isinstance(X, torch.Tensor)\n",
603 | "assert isinstance(y, torch.Tensor)"
604 | ]
605 | },
606 | {
607 | "cell_type": "code",
608 | "execution_count": null,
609 | "metadata": {
610 | "id": "d5sC_9z79SLp"
611 | },
612 | "outputs": [],
613 | "source": [
614 | "w = torch.zeros(2, requires_grad=True, dtype=torch.double)\n",
615 | "b = torch.zeros(1, requires_grad=True, dtype=torch.double)"
616 | ]
617 | },
618 | {
619 | "cell_type": "code",
620 | "execution_count": null,
621 | "metadata": {
622 | "id": "oXS_iaNV9SLq"
623 | },
624 | "outputs": [],
625 | "source": [
626 | "assert w.requires_grad\n",
627 | "assert len(w.shape) == 1\n",
628 | "assert w.shape[0] == X.shape[1]\n",
629 | "assert w.dtype == X.dtype\n",
630 | "assert b.requires_grad\n",
631 | "assert len(b.shape) == 1\n",
632 | "assert b.dtype == X.dtype"
633 | ]
634 | },
635 | {
636 | "cell_type": "code",
637 | "execution_count": null,
638 | "metadata": {
639 | "id": "mHrha4x49SLq"
640 | },
641 | "outputs": [],
642 | "source": [
643 | "def binary_cross_entropy(y, y_predicted):\n",
644 | "    \"\"\"\n",
645 | "    y: binary tensor, shape: N, example: [0, 1, 0, 1, 1]\n",
646 | "    y_predicted: tensor with values strictly between 0 and 1, shape: N, example: [0.2, 0.1, 0.9, 0.75, 0.999]\n",
647 | "\n",
648 | "    output: tensor of per-element losses, shape: N\n",
649 | "\n",
650 | "    \"\"\"\n",
651 | "    return -(y * (torch.log(y_predicted)) + (1 - y) * torch.log(1 - y_predicted))"
652 | ]
653 | },
654 | {
655 | "cell_type": "code",
656 | "execution_count": null,
657 | "metadata": {
658 | "id": "Bc7UtPD-9SLq"
659 | },
660 | "outputs": [],
661 | "source": [
662 | "y_test = torch.tensor([1, 0, 1, 1])\n",
663 | "y_pred = torch.tensor([0.7, 0.3, 0.5, 0.9])\n",
664 | "bce_correct = torch.tensor([0.3567, 0.3567, 0.6931, 0.1054])\n",
665 | "bce_predicted = binary_cross_entropy(y_test, y_pred)\n",
666 | "assert bce_predicted.shape == y_test.shape\n",
667 | "assert torch.allclose(bce_predicted, bce_correct, rtol=1e-03)"
668 | ]
669 | },
670 | {
671 | "cell_type": "code",
672 | "execution_count": null,
673 | "metadata": {
674 | "colab": {
675 | "base_uri": "https://localhost:8080/",
676 | "height": 282
677 | },
678 | "id": "27EK6JmN9SLq",
679 | "outputId": "faa084ab-11aa-424c-bcc3-86fea3c02391",
680 | "scrolled": true
681 | },
682 | "outputs": [],
683 | "source": [
684 | "from IPython.display import clear_output\n",
685 | "\n",
686 | "learning_rate = 0.00001\n",
687 | "n_epoch = 1000\n",
688 | "\n",
689 | "for i in range(n_epoch):\n",
690 | "    y_pred = torch.matmul(X, w) + b\n",
691 | "    y_pred = torch.sigmoid(y_pred)\n",
692 | "    loss = binary_cross_entropy(y, y_pred)\n",
693 | "    loss = torch.sum(loss)\n",
694 | "\n",
695 | "    loss.backward()\n",
696 | "\n",
697 | "    with torch.no_grad():  # the parameter update must not be tracked by autograd\n",
698 | "        w -= learning_rate * w.grad\n",
699 | "        b -= learning_rate * b.grad\n",
700 | "\n",
701 | "    # zero gradients\n",
702 | "    w.grad.zero_()\n",
703 | "    b.grad.zero_()\n",
704 | "\n",
705 | "    if (i + 1) % 5 == 0:\n",
706 | "        clear_output(True)\n",
707 | "        plt.scatter(X[:, 0], X[:, 1], marker=\"o\", c=y, s=25, edgecolor=\"k\")\n",
708 | "\n",
709 | "        w_1 = w[0].item()\n",
710 | "        w_2 = w[1].item()\n",
711 | "        w_0 = b[0].item()\n",
712 | "\n",
713 | "        x_arr = torch.linspace(-10, 20, 1000)\n",
714 | "        plt.plot(x_arr, -(w_0 + w_1 * x_arr) / w_2)\n",
715 | "        plt.show()\n",
716 | "\n",
717 | "        print(\"Iteration: {}, Loss: {}\".format(i, loss.item()))"
718 | ]
719 | },
720 | {
721 | "cell_type": "markdown",
722 | "metadata": {
723 | "id": "dOACWrA69SLr"
724 | },
725 | "source": [
726 | "# Now PyTorch does everything for us"
727 | ]
728 | },
729 | {
730 | "cell_type": "code",
731 | "execution_count": null,
732 | "metadata": {
733 | "id": "GFipi35D9SLr"
734 | },
735 | "outputs": [],
736 | "source": [
737 | "X, y = make_blobs(\n",
738 | " n_samples=200,\n",
739 | " centers=((10, 5), (5, -5)),\n",
740 | " n_features=2,\n",
741 | " random_state=0,\n",
742 | " cluster_std=3,\n",
743 | ")"
744 | ]
745 | },
746 | {
747 | "cell_type": "code",
748 | "execution_count": null,
749 | "metadata": {
750 | "id": "Wn8pm63j9SLr"
751 | },
752 | "outputs": [],
753 | "source": [
754 | "X = torch.tensor(X)\n",
755 | "y = torch.tensor(y)"
756 | ]
757 | },
758 | {
759 | "cell_type": "markdown",
760 | "metadata": {},
761 | "source": [
762 | "# A little about optimizers"
763 | ]
764 | },
793 | {
794 | "cell_type": "markdown",
795 | "metadata": {
796 | "id": "BcPK0X9A9SLs"
797 | },
798 | "source": [
799 | "# Initializing the model"
800 | ]
801 | },
802 | {
803 | "cell_type": "code",
804 | "execution_count": null,
805 | "metadata": {
806 | "colab": {
807 | "base_uri": "https://localhost:8080/"
808 | },
809 | "id": "isKKf6Ph9SLs",
810 | "outputId": "171f5352-7f9d-4b8b-e713-9e22bed4aae4"
811 | },
812 | "outputs": [],
813 | "source": [
814 | "model = torch.nn.Linear(2, 1)\n",
815 | "criterion = torch.nn.BCELoss()\n",
816 | "\n",
817 | "optim = torch.optim.SGD(model.parameters(), lr=0.1)\n",
818 | "\n",
819 | "model.train()"
820 | ]
821 | },
822 | {
823 | "cell_type": "code",
824 | "execution_count": null,
825 | "metadata": {
826 | "colab": {
827 | "base_uri": "https://localhost:8080/"
828 | },
829 | "id": "G8ZZVN6p9SLs",
830 | "outputId": "f726ed1e-75ab-400f-a272-0b97f064c1eb"
831 | },
832 | "outputs": [],
833 | "source": [
834 | "model"
835 | ]
836 | },
837 | {
838 | "cell_type": "code",
839 | "execution_count": null,
840 | "metadata": {
841 | "colab": {
842 | "base_uri": "https://localhost:8080/"
843 | },
844 | "id": "0EJf9Aa89SLs",
845 | "outputId": "cb41bf52-a525-4828-a41c-72d41924471a"
846 | },
847 | "outputs": [],
848 | "source": [
849 | "model.weight"
850 | ]
851 | },
852 | {
853 | "cell_type": "code",
854 | "execution_count": null,
855 | "metadata": {
856 | "colab": {
857 | "base_uri": "https://localhost:8080/"
858 | },
859 | "id": "kZfdsGbH9SLt",
860 | "outputId": "2732e8ba-8903-480a-dabe-9d3686bf9bae"
861 | },
862 | "outputs": [],
863 | "source": [
864 | "model.bias"
865 | ]
866 | },
867 | {
868 | "cell_type": "code",
869 | "execution_count": null,
870 | "metadata": {
871 | "colab": {
872 | "base_uri": "https://localhost:8080/"
873 | },
874 | "id": "-8fF8Cu89SLt",
875 | "outputId": "ad5eb620-6ada-46e4-edea-efe10042f355"
876 | },
877 | "outputs": [],
878 | "source": [
879 | "model.weight.data.dtype"
880 | ]
881 | },
882 | {
883 | "cell_type": "code",
884 | "execution_count": null,
885 | "metadata": {
886 | "colab": {
887 | "base_uri": "https://localhost:8080/",
888 | "height": 282
889 | },
890 | "id": "l0giOSki9SLt",
891 | "outputId": "56752055-57ab-4742-bca2-62b16ee88f22"
892 | },
893 | "outputs": [],
894 | "source": [
895 | "from IPython.display import clear_output\n",
896 | "\n",
897 | "\n",
898 | "for i in range(1000):\n",
899 | "    # compute the prediction\n",
900 | "    y_pred = torch.sigmoid(model(X.float()))\n",
901 | "\n",
902 | "    # compute the loss\n",
903 | "    loss = criterion(y_pred.flatten(), y.float())\n",
904 | "\n",
905 | "    # backpropagate the gradients\n",
906 | "    loss.backward()\n",
907 | "\n",
908 | "    # take an optimizer step\n",
909 | "    optim.step()\n",
910 | "\n",
911 | "    # zero the gradients\n",
912 | "    optim.zero_grad()\n",
913 | "\n",
914 | "    if (i + 1) % 5 == 0:\n",
915 | "        clear_output(True)\n",
916 | "        plt.scatter(X[:, 0], X[:, 1], marker=\"o\", c=y, s=25, edgecolor=\"k\")\n",
917 | "\n",
918 | "        w_1 = model.weight.data[0][0]\n",
919 | "        w_2 = model.weight.data[0][1]\n",
920 | "        w_0 = model.bias.data[0]\n",
921 | "\n",
922 | "        x_arr = torch.linspace(-10, 20, 1000)\n",
923 | "        plt.plot(x_arr, -(w_0 + w_1 * x_arr) / w_2)\n",
924 | "        plt.show()\n",
925 | "\n",
926 | "        print(\"Iteration: {}, Loss: {}\".format(i, loss))"
927 | ]
928 | },
929 | {
930 | "cell_type": "markdown",
931 | "metadata": {
932 | "id": "PvOBE30i9SLu"
933 | },
934 | "source": [
935 | "# OK, now let the problem be nonlinear"
936 | ]
937 | },
938 | {
939 | "cell_type": "code",
940 | "execution_count": null,
941 | "metadata": {
942 | "id": "xNdKWnab9SLu"
943 | },
944 | "outputs": [],
945 | "source": [
946 | "from sklearn.datasets import make_moons"
947 | ]
948 | },
949 | {
950 | "cell_type": "code",
951 | "execution_count": null,
952 | "metadata": {
953 | "id": "ZMiLvnHe9SLu"
954 | },
955 | "outputs": [],
956 | "source": [
957 | "X, y = make_moons(n_samples=200, noise=0.1, random_state=17)"
958 | ]
959 | },
960 | {
961 | "cell_type": "code",
962 | "execution_count": null,
963 | "metadata": {
964 | "colab": {
965 | "base_uri": "https://localhost:8080/",
966 | "height": 266
967 | },
968 | "id": "iKV3vT9q9SLu",
969 | "outputId": "2cbe3751-a072-4293-9753-f4e70d1ea88c"
970 | },
971 | "outputs": [],
972 | "source": [
973 | "plt.scatter(X[:, 0], X[:, 1], marker=\"o\", c=y, s=25, edgecolor=\"k\")\n",
974 | "\n",
975 | "X = torch.tensor(X)\n",
976 | "y = torch.tensor(y)"
977 | ]
978 | },
979 | {
980 | "cell_type": "code",
981 | "execution_count": null,
982 | "metadata": {
983 | "colab": {
984 | "base_uri": "https://localhost:8080/"
985 | },
986 | "id": "A-suQlRt9SLu",
987 | "outputId": "af3e8cc1-747e-4195-87a6-a4fe7d01689a"
988 | },
989 | "outputs": [],
990 | "source": [
991 | "model = torch.nn.Linear(2, 1)\n",
992 | "criterion = torch.nn.BCELoss()\n",
993 | "\n",
994 | "optim = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)\n",
995 | "\n",
996 | "model.train()"
997 | ]
998 | },
999 | {
1000 | "cell_type": "code",
1001 | "execution_count": null,
1002 | "metadata": {
1003 | "colab": {
1004 | "base_uri": "https://localhost:8080/",
1005 | "height": 283
1006 | },
1007 | "id": "gF_WdOl29SLv",
1008 | "outputId": "9b84241b-eb28-4ff2-ddef-6d883613de7e"
1009 | },
1010 | "outputs": [],
1011 | "source": [
1012 | "from IPython.display import clear_output\n",
1013 | "\n",
1014 | "\n",
1015 | "for i in range(30000):\n",
1016 | "\n",
1017 | "    y_pred = torch.sigmoid(model(X.float()))\n",
1018 | "\n",
1019 | "    loss = criterion(y_pred.flatten(), y.float())\n",
1020 | "\n",
1021 | "    loss.backward()\n",
1022 | "\n",
1023 | "    optim.step()\n",
1024 | "    optim.zero_grad()\n",
1025 | "\n",
1026 | "    if (i + 1) % 500 == 0:\n",
1027 | "        clear_output(True)\n",
1028 | "        plt.scatter(X[:, 0], X[:, 1], marker=\"o\", c=y, s=25, edgecolor=\"k\")\n",
1029 | "\n",
1030 | "        w_1 = model.weight.data[0][0]\n",
1031 | "        w_2 = model.weight.data[0][1]\n",
1032 | "        w_0 = model.bias.data[0]\n",
1033 | "\n",
1034 | "        x_arr = torch.linspace(-2, 2, 1000)\n",
1035 | "        plt.plot(x_arr, -(w_0 + w_1 * x_arr) / w_2)\n",
1036 | "        plt.show()\n",
1037 | "\n",
1038 | "        print(\"Iteration: {}, Loss: {}\".format(i, loss))"
1039 | ]
1040 | },
1041 | {
1042 | "cell_type": "markdown",
1043 | "metadata": {
1044 | "id": "WevXy3M99SLv"
1045 | },
1046 | "source": [
1047 | "# Visualize the decision boundary"
1048 | ]
1049 | },
1050 | {
1051 | "cell_type": "code",
1052 | "execution_count": null,
1053 | "metadata": {
1054 | "id": "hnCBUkTa9SLv"
1055 | },
1056 | "outputs": [],
1057 | "source": [
1058 | "h = 0.02 # step size in the mesh\n",
1059 | "cm = plt.cm.RdBu\n",
1060 | "x_min, x_max = X[:, 0].min() - 0.5, X[:, 0].max() + 0.5\n",
1061 | "y_min, y_max = X[:, 1].min() - 0.5, X[:, 1].max() + 0.5\n",
1062 | "xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))\n",
1063 | "\n",
1064 | "input_tensor = torch.from_numpy(np.c_[xx.ravel(), yy.ravel()]).float()\n",
1065 | "model.eval()\n",
1066 | "Z = torch.sigmoid(model(input_tensor))\n",
1067 | "Z = Z.reshape(xx.shape)"
1068 | ]
1069 | },
1070 | {
1071 | "cell_type": "code",
1072 | "execution_count": null,
1073 | "metadata": {
1074 | "colab": {
1075 | "base_uri": "https://localhost:8080/",
1076 | "height": 265
1077 | },
1078 | "id": "nyNauPQp9SLw",
1079 | "outputId": "ca257bc2-d4b6-4e79-ee7b-948a2d314d42"
1080 | },
1081 | "outputs": [],
1082 | "source": [
1083 | "plt.scatter(X[:, 0], X[:, 1], marker=\"o\", c=y, s=25, edgecolor=\"k\")\n",
1084 | "\n",
1085 | "plt.contourf(xx, yy, Z.detach().numpy(), cmap=cm, alpha=0.8)\n",
1086 | "\n",
1087 | "plt.show()"
1088 | ]
1089 | },
1090 | {
1091 | "cell_type": "markdown",
1092 | "metadata": {
1093 | "id": "UddJchKW9SLw"
1094 | },
1095 | "source": [
1096 | "# Make the model more complex"
1097 | ]
1098 | },
1099 | {
1100 | "cell_type": "code",
1101 | "execution_count": null,
1102 | "metadata": {
1103 | "colab": {
1104 | "base_uri": "https://localhost:8080/",
1105 | "height": 355
1106 | },
1107 | "id": "x69Nn_YN9SLw",
1108 | "outputId": "2ca4de0c-a95f-4786-f8f1-e529f403fd81"
1109 | },
1110 | "outputs": [],
1111 | "source": [
1112 | "plt.scatter(X[:, 0], X[:, 1], marker=\"o\", c=y, s=25, edgecolor=\"k\")\n",
1113 | "\n",
1114 | "X = torch.as_tensor(X)\n",
1115 | "y = torch.as_tensor(y)"
1116 | ]
1117 | },
1118 | {
1119 | "cell_type": "code",
1120 | "execution_count": null,
1121 | "metadata": {
1122 | "id": "dUhEBnzI9SLx"
1123 | },
1124 | "outputs": [],
1125 | "source": [
1126 | "from torch import nn"
1127 | ]
1128 | },
1129 | {
1130 | "cell_type": "markdown",
1131 | "metadata": {
1132 | "id": "aFhGOXiu9SLx"
1133 | },
1134 | "source": [
1135 | "# Build a two-layer model"
1136 | ]
1137 | },
1138 | {
1139 | "cell_type": "code",
1140 | "execution_count": null,
1141 | "metadata": {
1142 | "id": "wfPiEfSU9SLx"
1143 | },
1144 | "outputs": [],
1145 | "source": [
1146 | "model = nn.Sequential()\n",
1147 | "model.add_module(\"first\", nn.Linear(2, 2))\n",
1148 | "model.add_module(\"first_activation\", nn.Sigmoid())\n",
1149 | "model.add_module(\"second\", nn.Linear(2, 1))"
1150 | ]
1151 | },
1152 | {
1153 | "cell_type": "code",
1154 | "execution_count": null,
1155 | "metadata": {
1156 | "colab": {
1157 | "base_uri": "https://localhost:8080/"
1158 | },
1159 | "id": "owO-1I-J9SLy",
1160 | "outputId": "4c346e55-b4f0-4c45-aee8-72827c80c6b2"
1161 | },
1162 | "outputs": [],
1163 | "source": [
1164 | "criterion = torch.nn.BCELoss()\n",
1165 | "\n",
1166 | "optim = torch.optim.SGD(model.parameters(), lr=2.0)\n",
1167 | "\n",
1168 | "model.train()"
1169 | ]
1170 | },
1171 | {
1172 | "cell_type": "code",
1173 | "execution_count": null,
1174 | "metadata": {
1175 | "colab": {
1176 | "base_uri": "https://localhost:8080/",
1177 | "height": 282
1178 | },
1179 | "id": "RquUN5YP9SLy",
1180 | "outputId": "ab12b8c6-00c8-4a99-ead9-ffef5eb71d88"
1181 | },
1182 | "outputs": [],
1183 | "source": [
1184 | "from IPython.display import clear_output\n",
1185 | "\n",
1186 | "\n",
1187 | "h = 0.02 # step size in the mesh\n",
1188 | "cm = plt.cm.RdBu\n",
1189 | "x_min, x_max = X[:, 0].min() - 0.5, X[:, 0].max() + 0.5\n",
1190 | "y_min, y_max = X[:, 1].min() - 0.5, X[:, 1].max() + 0.5\n",
1191 | "xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))\n",
1192 | "\n",
1193 | "\n",
1194 | "model.train()\n",
1195 | "for i in range(30000):\n",
1196 | "\n",
1197 | "    y_pred = torch.sigmoid(model(X.float()))\n",
1198 | "\n",
1199 | "    loss = criterion(y_pred.flatten(), y.float())\n",
1200 | "\n",
1201 | "    loss.backward()\n",
1202 | "\n",
1203 | "    optim.step()\n",
1204 | "    optim.zero_grad()\n",
1205 | "    if (i + 1) % 500 == 0:\n",
1206 | "        clear_output(True)\n",
1207 | "        input_tensor = torch.from_numpy(np.c_[xx.ravel(), yy.ravel()]).float()\n",
1208 | "        Z = torch.sigmoid(model(input_tensor))\n",
1209 | "        Z = Z.reshape(xx.shape)\n",
1210 | "\n",
1211 | "        plt.scatter(X[:, 0], X[:, 1], marker=\"o\", c=y, s=25, edgecolor=\"k\")\n",
1212 | "\n",
1213 | "        plt.contourf(xx, yy, Z.detach().numpy(), cmap=cm, alpha=0.8)\n",
1214 | "\n",
1215 | "        plt.show()\n",
1216 | "\n",
1217 | "        print(\"Iteration: {}, Loss: {}\".format(i, loss))"
1218 | ]
1219 | },
1220 | {
1221 | "cell_type": "markdown",
1222 | "metadata": {
1223 | "id": "qN9h6lYI9SLy"
1224 | },
1225 | "source": [
1226 | "# Make the model more complex (add more layers)"
1227 | ]
1228 | },
1229 | {
1230 | "cell_type": "code",
1231 | "execution_count": null,
1232 | "metadata": {
1233 | "id": "79UD5nPe9SLy"
1234 | },
1235 | "outputs": [],
1236 | "source": [
1237 | "model = nn.Sequential()\n",
1238 | "model.add_module(\"first\", nn.Linear(2, 2))\n",
1239 | "model.add_module(\"first_activation\", nn.Sigmoid())\n",
1240 | "model.add_module(\"second\", nn.Linear(2, 2))\n",
1241 | "model.add_module(\"second_activation\", nn.Sigmoid())\n",
1242 | "model.add_module(\"third\", nn.Linear(2, 1))"
1243 | ]
1244 | },
1245 | {
1246 | "cell_type": "code",
1247 | "execution_count": null,
1248 | "metadata": {
1249 | "colab": {
1250 | "base_uri": "https://localhost:8080/"
1251 | },
1252 | "id": "IAxchrKd9SLz",
1253 | "outputId": "ca800af1-ecb9-407b-d091-a700f66b5c1e"
1254 | },
1255 | "outputs": [],
1256 | "source": [
1257 | "criterion = torch.nn.BCELoss()\n",
1258 | "\n",
1259 | "optim = torch.optim.SGD(model.parameters(), lr=2.0, momentum=0.9)\n",
1260 | "\n",
1261 | "model.train()"
1262 | ]
1263 | },
1264 | {
1265 | "cell_type": "code",
1266 | "execution_count": null,
1267 | "metadata": {
1268 | "colab": {
1269 | "base_uri": "https://localhost:8080/",
1270 | "height": 282
1271 | },
1272 | "id": "K7n0LIa09SLz",
1273 | "outputId": "ca69ae43-9dd6-4ee3-cd12-ff7bdb514e12",
1274 | "scrolled": false
1275 | },
1276 | "outputs": [],
1277 | "source": [
1278 | "from IPython.display import clear_output\n",
1279 | "\n",
1280 | "\n",
1281 | "h = 0.02 # step size in the mesh\n",
1282 | "cm = plt.cm.RdBu\n",
1283 | "x_min, x_max = X[:, 0].min() - 0.5, X[:, 0].max() + 0.5\n",
1284 | "y_min, y_max = X[:, 1].min() - 0.5, X[:, 1].max() + 0.5\n",
1285 | "xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))\n",
1286 | "\n",
1287 | "\n",
1288 | "model.train()\n",
1289 | "for i in range(30000):\n",
1290 | "\n",
1291 | "    y_pred = torch.sigmoid(model(X.float()))\n",
1292 | "\n",
1293 | "    loss = criterion(y_pred.flatten(), y.float())\n",
1294 | "\n",
1295 | "    loss.backward()\n",
1296 | "\n",
1297 | "    optim.step()\n",
1298 | "    optim.zero_grad()\n",
1299 | "    if (i + 1) % 500 == 0:\n",
1300 | "        clear_output(True)\n",
1301 | "        input_tensor = torch.from_numpy(np.c_[xx.ravel(), yy.ravel()]).float()\n",
1302 | "        Z = torch.sigmoid(model(input_tensor))\n",
1303 | "        Z = Z.reshape(xx.shape)\n",
1304 | "\n",
1305 | "        plt.scatter(X[:, 0], X[:, 1], marker=\"o\", c=y, s=25, edgecolor=\"k\")\n",
1306 | "\n",
1307 | "        plt.contourf(xx, yy, Z.detach().numpy(), cmap=cm, alpha=0.8)\n",
1308 | "\n",
1309 | "        plt.show()\n",
1310 | "\n",
1311 | "        print(\"Iteration: {}, Loss: {}\".format(i, loss))"
1312 | ]
1313 | },
1314 | {
1315 | "cell_type": "markdown",
1316 | "metadata": {
1317 | "id": "OWZ6cFnP9SLz"
1318 | },
1319 | "source": [
1320 | "# Make the model even more complex (widen the hidden layers)"
1321 | ]
1322 | },
1323 | {
1324 | "cell_type": "code",
1325 | "execution_count": null,
1326 | "metadata": {
1327 | "id": "weq1btVh9SL0"
1328 | },
1329 | "outputs": [],
1330 | "source": [
1331 | "model = nn.Sequential()\n",
1332 | "model.add_module(\"first\", nn.Linear(2, 5))\n",
1333 | "model.add_module(\"first_activation\", nn.Sigmoid())\n",
1334 | "model.add_module(\"second\", nn.Linear(5, 5))\n",
1335 | "model.add_module(\"second_activation\", nn.Sigmoid())\n",
1336 | "model.add_module(\"third\", nn.Linear(5, 1))\n",
1337 | "model.add_module(\"third_activation\", nn.Sigmoid())"
1338 | ]
1339 | },
1340 | {
1341 | "cell_type": "code",
1342 | "execution_count": null,
1343 | "metadata": {
1344 | "colab": {
1345 | "base_uri": "https://localhost:8080/"
1346 | },
1347 | "id": "R9E6zP6p9SL0",
1348 | "outputId": "0e5cffdc-3b66-4ab7-9819-6a69654ac445"
1349 | },
1350 | "outputs": [],
1351 | "source": [
1352 | "criterion = torch.nn.BCELoss()\n",
1353 | "\n",
1354 | "optim = torch.optim.SGD(model.parameters(), lr=2, momentum=0.9)\n",
1355 | "\n",
1356 | "model.train()\n",
1357 | "\n",
1358 | "# model.to('cuda:2')"
1359 | ]
1360 | },
1361 | {
1362 | "cell_type": "code",
1363 | "execution_count": null,
1364 | "metadata": {
1365 | "colab": {
1366 | "base_uri": "https://localhost:8080/",
1367 | "height": 282
1368 | },
1369 | "id": "XUWTN8rr9SL0",
1370 | "outputId": "7e5358e6-247b-4697-98f0-100384573ebc"
1371 | },
1372 | "outputs": [],
1373 | "source": [
1374 | "from IPython.display import clear_output\n",
1375 | "\n",
1376 | "\n",
1377 | "h = 0.02 # step size in the mesh\n",
1378 | "cm = plt.cm.RdBu\n",
1379 | "x_min, x_max = X[:, 0].min() - 0.5, X[:, 0].max() + 0.5\n",
1380 | "y_min, y_max = X[:, 1].min() - 0.5, X[:, 1].max() + 0.5\n",
1381 | "xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))\n",
1382 | "\n",
1383 | "\n",
1384 | "model.train()\n",
1385 | "for i in range(3000):\n",
1386 | "\n",
1387 | "    y_pred = model(X.float())\n",
1388 | "\n",
1389 | "    loss = criterion(y_pred.flatten(), y.float())\n",
1390 | "\n",
1391 | "    loss.backward()\n",
1392 | "\n",
1393 | "    optim.step()\n",
1394 | "    optim.zero_grad()\n",
1395 | "    if (i + 1) % 50 == 0:\n",
1396 | "        clear_output(True)\n",
1397 | "        input_tensor = torch.from_numpy(np.c_[xx.ravel(), yy.ravel()]).float()\n",
1398 | "        Z = model(input_tensor)  # the model already ends with nn.Sigmoid()\n",
1399 | "        Z = Z.reshape(xx.shape)\n",
1400 | "\n",
1401 | "        plt.scatter(X[:, 0], X[:, 1], marker=\"o\", c=y, s=25, edgecolor=\"k\")\n",
1402 | "\n",
1403 | "        plt.contourf(xx, yy, Z.detach().numpy(), cmap=cm, alpha=0.8)\n",
1404 | "\n",
1405 | "        plt.show()\n",
1406 | "\n",
1407 | "        print(\"Iteration: {}, Loss: {}\".format(i, loss))"
1408 | ]
1409 | },
1410 | {
1411 | "cell_type": "markdown",
1412 | "metadata": {
1413 | "id": "QvByiISS9SL0"
1414 | },
1415 | "source": [
1416 | "# Go beyond 2 classes"
1417 | ]
1418 | },
1419 | {
1420 | "cell_type": "code",
1421 | "execution_count": null,
1422 | "metadata": {
1423 | "id": "HKn510MK9SL1"
1424 | },
1425 | "outputs": [],
1426 | "source": [
1427 | "from sklearn.datasets import make_circles"
1428 | ]
1429 | },
1430 | {
1431 | "cell_type": "code",
1432 | "execution_count": null,
1433 | "metadata": {
1434 | "id": "swhmG6--9SL1"
1435 | },
1436 | "outputs": [],
1437 | "source": [
1438 | "# blobs with varied variances\n",
1439 | "X, y = make_blobs(n_samples=400, cluster_std=[1.0, 1.5, 0.5], random_state=17)"
1440 | ]
1441 | },
1442 | {
1443 | "cell_type": "code",
1444 | "execution_count": null,
1445 | "metadata": {
1446 | "colab": {
1447 | "base_uri": "https://localhost:8080/",
1448 | "height": 265
1449 | },
1450 | "id": "pbgyFYMl9SL1",
1451 | "outputId": "26804b23-87cd-408a-def5-fd296742302d"
1452 | },
1453 | "outputs": [],
1454 | "source": [
1455 | "plt.scatter(X[:, 0], X[:, 1], marker=\"o\", c=y, s=25, edgecolor=\"k\")\n",
1456 | "\n",
1457 | "X = torch.tensor(X)\n",
1458 | "y = torch.tensor(y)"
1459 | ]
1460 | },
1461 | {
1462 | "cell_type": "code",
1463 | "execution_count": null,
1464 | "metadata": {
1465 | "id": "YE3gnkkN9SL1"
1466 | },
1467 | "outputs": [],
1468 | "source": [
1469 | "model = nn.Sequential()\n",
1470 | "model.add_module(\"first\", nn.Linear(2, 5))\n",
1471 | "model.add_module(\"first_activation\", nn.Sigmoid())\n",
1472 | "model.add_module(\"second\", nn.Linear(5, 5))\n",
1473 | "model.add_module(\"second_activation\", nn.Sigmoid())\n",
1474 | "model.add_module(\"third\", nn.Linear(5, 3))"
1475 | ]
1476 | },
1477 | {
1478 | "cell_type": "code",
1479 | "execution_count": null,
1480 | "metadata": {
1481 | "colab": {
1482 | "base_uri": "https://localhost:8080/"
1483 | },
1484 | "id": "tqBQn5yW9SL2",
1485 | "outputId": "fc339779-bee2-4cc5-aa91-ec285e824bc1"
1486 | },
1487 | "outputs": [],
1488 | "source": [
1489 | "criterion = torch.nn.CrossEntropyLoss()\n",
1490 | "\n",
1491 | "optim = torch.optim.SGD(model.parameters(), lr=0.1)\n",
1492 | "\n",
1493 | "model.train()"
1494 | ]
1495 | },
1496 | {
1497 | "cell_type": "code",
1498 | "execution_count": null,
1499 | "metadata": {
1500 | "colab": {
1501 | "base_uri": "https://localhost:8080/",
1502 | "height": 282
1503 | },
1504 | "id": "KCBrFli99SL2",
1505 | "outputId": "8f15148e-39a3-436b-f424-2e03d696b914"
1506 | },
1507 | "outputs": [],
1508 | "source": [
1509 | "from IPython.display import clear_output\n",
1510 | "\n",
1511 | "\n",
1512 | "h = 0.02 # step size in the mesh\n",
1513 | "cm = plt.cm.RdBu\n",
1514 | "x_min, x_max = X[:, 0].min() - 0.5, X[:, 0].max() + 0.5\n",
1515 | "y_min, y_max = X[:, 1].min() - 0.5, X[:, 1].max() + 0.5\n",
1516 | "xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))\n",
1517 | "\n",
1518 | "model.train()\n",
1519 | "for i in range(1000):\n",
1520 | "\n",
1521 | "    y_pred = model(X.float())\n",
1522 | "\n",
1523 | "    loss = criterion(y_pred, y.long())\n",
1524 | "\n",
1525 | "    loss.backward()\n",
1526 | "\n",
1527 | "    optim.step()\n",
1528 | "    optim.zero_grad()\n",
1529 | "    if (i + 1) % 25 == 0:\n",
1530 | "        clear_output(True)\n",
1531 | "        input_tensor = torch.from_numpy(np.c_[xx.ravel(), yy.ravel()]).float()\n",
1532 | "        Z = model(input_tensor)\n",
1533 | "        Z = torch.argmax(Z, dim=1)\n",
1534 | "        Z = Z.reshape(xx.shape)\n",
1535 | "\n",
1536 | "        plt.scatter(X[:, 0], X[:, 1], marker=\"o\", c=y, s=25, edgecolor=\"k\")\n",
1537 | "\n",
1538 | "        plt.contourf(xx, yy, Z.detach().numpy(), cmap=cm, alpha=0.8)\n",
1539 | "\n",
1540 | "        plt.show()\n",
1541 | "\n",
1542 | "        print(\"Iteration: {}, Loss: {}\".format(i, loss))"
1543 | ]
1544 | },
1545 | {
1546 | "cell_type": "markdown",
1547 | "metadata": {},
1548 | "source": [
1549 | "# Real data"
1550 | ]
1551 | },
1552 | {
1553 | "cell_type": "code",
1554 | "execution_count": null,
1555 | "metadata": {},
1556 | "outputs": [],
1557 | "source": [
1558 | "from torch.utils.data import Dataset\n",
1559 | "from torchvision import datasets\n",
1560 | "from torchvision.transforms import ToTensor, Normalize, Lambda"
1561 | ]
1562 | },
1563 | {
1564 | "cell_type": "markdown",
1565 | "metadata": {},
1566 | "source": [
1567 | "# 1. Load the data"
1568 | ]
1569 | },
1570 | {
1571 | "cell_type": "code",
1572 | "execution_count": null,
1573 | "metadata": {},
1574 | "outputs": [],
1575 | "source": [
1576 | "data_folderpath = \"../../data\"\n",
1577 | "training_data = datasets.FashionMNIST(\n",
1578 | "    root=data_folderpath, train=True, download=True, transform=ToTensor()\n",
1579 | ")\n",
1580 | "\n",
1581 | "test_data = datasets.FashionMNIST(\n",
1582 | "    root=data_folderpath, train=False, download=True, transform=ToTensor()\n",
1583 | ")"
1584 | ]
1585 | },
1586 | {
1587 | "cell_type": "code",
1588 | "execution_count": null,
1589 | "metadata": {},
1590 | "outputs": [],
1591 | "source": [
1592 | "labels_map = {\n",
1593 | "    0: \"T-Shirt\",\n",
1594 | "    1: \"Trouser\",\n",
1595 | "    2: \"Pullover\",\n",
1596 | "    3: \"Dress\",\n",
1597 | "    4: \"Coat\",\n",
1598 | "    5: \"Sandal\",\n",
1599 | "    6: \"Shirt\",\n",
1600 | "    7: \"Sneaker\",\n",
1601 | "    8: \"Bag\",\n",
1602 | "    9: \"Ankle Boot\",\n",
1603 | "}\n",
1604 | "figure = plt.figure(figsize=(8, 8))\n",
1605 | "cols, rows = 3, 3\n",
1606 | "for i in range(1, cols * rows + 1):\n",
1607 | "    sample_idx = torch.randint(len(training_data), size=(1,)).item()\n",
1608 | "    img, label = training_data[sample_idx]\n",
1609 | "    figure.add_subplot(rows, cols, i)\n",
1610 | "    plt.title(labels_map[label])\n",
1611 | "    plt.axis(\"off\")\n",
1612 | "    plt.imshow(img.squeeze(), cmap=\"gray\")\n",
1613 | "plt.show()"
1614 | ]
1615 | },
1616 | {
1617 | "cell_type": "code",
1618 | "execution_count": null,
1619 | "metadata": {},
1620 | "outputs": [],
1621 | "source": [
1622 | "X_sample, y_sample = training_data[0]\n",
1623 | "print(X_sample)\n",
1624 | "print(y_sample)"
1625 | ]
1626 | },
1627 | {
1628 | "cell_type": "code",
1629 | "execution_count": null,
1630 | "metadata": {},
1631 | "outputs": [],
1632 | "source": [
1633 | "X_sample.shape"
1634 | ]
1635 | },
1636 | {
1637 | "cell_type": "markdown",
1638 | "metadata": {},
1639 | "source": [
1640 | "# Create a custom dataset"
1641 | ]
1642 | },
1643 | {
1644 | "cell_type": "code",
1645 | "execution_count": null,
1646 | "metadata": {},
1647 | "outputs": [],
1648 | "source": [
1649 | "import pandas as pd\n",
1650 | "import numpy as np\n",
1651 | "\n",
1652 | "\n",
1653 | "class CustomImageDataset(Dataset):\n",
1654 | "    def __init__(self, data_file, transform=None, target_transform=None):\n",
1655 | "        self.data = pd.read_csv(data_file)\n",
1656 | "        self.transform = transform\n",
1657 | "        self.target_transform = target_transform\n",
1658 | "\n",
1659 | "    def __len__(self):\n",
1660 | "        return self.data.shape[0]\n",
1661 | "\n",
1662 | "    def __getitem__(self, idx):\n",
1663 | "        image = self.data.iloc[idx, 1:].values.reshape(28, 28).astype(np.uint8)\n",
1664 | "        label = self.data.iloc[idx, 0]\n",
1665 | "        if self.transform:\n",
1666 | "            image = self.transform(image)\n",
1667 | "        if self.target_transform:\n",
1668 | "            label = self.target_transform(label)\n",
1669 | "        return image, label"
1670 | ]
1671 | },
1672 | {
1673 | "cell_type": "code",
1674 | "execution_count": null,
1675 | "metadata": {},
1676 | "outputs": [],
1677 | "source": [
1678 | "train_data_url = \"https://media.githubusercontent.com/media/fpleoni/fashion_mnist/master/fashion-mnist_train.csv\"\n",
1679 | "test_data_url = \"https://media.githubusercontent.com/media/fpleoni/fashion_mnist/master/fashion-mnist_test.csv\""
1680 | ]
1681 | },
1682 | {
1683 | "cell_type": "code",
1684 | "execution_count": null,
1685 | "metadata": {},
1686 | "outputs": [],
1687 | "source": [
1688 | "train = CustomImageDataset(\n",
1689 | "    train_data_url,\n",
1690 | "    transform=ToTensor(),\n",
1691 | "    target_transform=Lambda(\n",
1692 | "        lambda y: torch.zeros(10, dtype=torch.float).scatter_(\n",
1693 | "            0, torch.tensor(y), value=1\n",
1694 | "        )\n",
1695 | "    ),\n",
1696 | ")\n",
1697 | "test = CustomImageDataset(\n",
1698 | "    test_data_url,\n",
1699 | "    transform=ToTensor(),\n",
1700 | "    target_transform=Lambda(\n",
1701 | "        lambda y: torch.zeros(10, dtype=torch.float).scatter_(\n",
1702 | "            0, torch.tensor(y), value=1\n",
1703 | "        )\n",
1704 | "    ),\n",
1705 | ")"
1706 | ]
1707 | },
1708 | {
1709 | "cell_type": "code",
1710 | "execution_count": null,
1711 | "metadata": {},
1712 | "outputs": [],
1713 | "source": [
1714 | "X_sample, y_sample = train[0]"
1715 | ]
1716 | },
1717 | {
1718 | "cell_type": "code",
1719 | "execution_count": null,
1720 | "metadata": {},
1721 | "outputs": [],
1722 | "source": [
1723 | "X_sample"
1724 | ]
1725 | },
1726 | {
1727 | "cell_type": "code",
1728 | "execution_count": null,
1729 | "metadata": {},
1730 | "outputs": [],
1731 | "source": [
1732 | "X_sample.shape"
1733 | ]
1734 | },
1735 | {
1736 | "cell_type": "code",
1737 | "execution_count": null,
1738 | "metadata": {},
1739 | "outputs": [],
1740 | "source": [
1741 | "y_sample"
1742 | ]
1743 | },
1744 | {
1745 | "cell_type": "code",
1746 | "execution_count": null,
1747 | "metadata": {},
1748 | "outputs": [],
1749 | "source": [
1750 | "y_sample.shape"
1751 | ]
1752 | },
1753 | {
1754 | "cell_type": "markdown",
1755 | "metadata": {},
1756 | "source": [
1757 | "# Getting to know DataLoaders"
1758 | ]
1759 | },
1760 | {
1761 | "cell_type": "code",
1762 | "execution_count": null,
1763 | "metadata": {},
1764 | "outputs": [],
1765 | "source": [
1766 | "from torch.utils.data import DataLoader\n",
1767 | "\n",
1768 | "train_dataloader = DataLoader(train, batch_size=64, shuffle=True)\n",
1769 | "test_dataloader = DataLoader(test, batch_size=64, shuffle=True)"
1770 | ]
1771 | },
1772 | {
1773 | "cell_type": "code",
1774 | "execution_count": null,
1775 | "metadata": {},
1776 | "outputs": [],
1777 | "source": [
1778 | "train_features, train_labels = next(iter(train_dataloader))\n",
1779 | "print(f\"Feature batch shape: {train_features.size()}\")\n",
1780 | "print(f\"Labels batch shape: {train_labels.size()}\")\n",
1781 | "img = train_features[0].squeeze()\n",
1782 | "label = train_labels[0]\n",
1783 | "plt.imshow(img, cmap=\"gray\")\n",
1784 | "plt.show()\n",
1785 | "print(f\"Label: {label}\")"
1786 | ]
1787 | },
1788 | {
1789 | "cell_type": "markdown",
1790 | "metadata": {},
1791 | "source": [
1792 | "# 2. Build the neural network"
1793 | ]
1794 | },
1795 | {
1796 | "cell_type": "code",
1797 | "execution_count": null,
1798 | "metadata": {},
1799 | "outputs": [],
1800 | "source": [
1801 | "device = \"cuda\" if torch.cuda.is_available() else \"cpu\"\n",
1802 | "print(f\"Using {device} device\")"
1803 | ]
1804 | },
1805 | {
1806 | "cell_type": "code",
1807 | "execution_count": null,
1808 | "metadata": {},
1809 | "outputs": [],
1810 | "source": [
1811 | "from torch import nn\n",
1812 | "\n",
1813 | "\n",
1814 | "class NeuralNetwork(nn.Module):\n",
1815 | "    def __init__(self):\n",
1816 | "        super(NeuralNetwork, self).__init__()\n",
1817 | "        self.flatten = nn.Flatten()\n",
1818 | "        self.linear_relu_stack = nn.Sequential(\n",
1819 | "            nn.Linear(28 * 28, 512),\n",
1820 | "            nn.ReLU(),\n",
1821 | "            nn.Linear(512, 512),\n",
1822 | "            nn.ReLU(),\n",
1823 | "            nn.Linear(512, 10),\n",
1824 | "        )\n",
1825 | "\n",
1826 | "    def forward(self, x):\n",
1827 | "        x = self.flatten(x)\n",
1828 | "        logits = self.linear_relu_stack(x)\n",
1829 | "        return logits"
1830 | ]
1831 | },
1832 | {
1833 | "cell_type": "code",
1834 | "execution_count": null,
1835 | "metadata": {},
1836 | "outputs": [],
1837 | "source": [
1838 | "model = NeuralNetwork().to(device)\n",
1839 | "print(model)"
1840 | ]
1841 | },
1842 | {
1843 | "cell_type": "code",
1844 | "execution_count": null,
1845 | "metadata": {},
1846 | "outputs": [],
1847 | "source": [
1848 | "X = torch.rand(1, 28, 28, device=device)\n",
1849 | "logits = model(X)\n",
1850 | "pred_probab = nn.Softmax(dim=1)(logits)\n",
1851 | "y_pred = pred_probab.argmax(1)\n",
1852 | "print(f\"Predicted class: {y_pred}\")"
1853 | ]
1854 | },
1855 | {
1856 | "cell_type": "markdown",
1857 | "metadata": {},
1858 | "source": [
1859 | "# A closer look at the network layers"
1860 | ]
1861 | },
1862 | {
1863 | "cell_type": "code",
1864 | "execution_count": null,
1865 | "metadata": {},
1866 | "outputs": [],
1867 | "source": [
1868 | "input_image = torch.rand(3, 28, 28)\n",
1869 | "print(input_image.size())"
1870 | ]
1871 | },
1872 | {
1873 | "cell_type": "code",
1874 | "execution_count": null,
1875 | "metadata": {},
1876 | "outputs": [],
1877 | "source": [
1878 | "input_image"
1879 | ]
1880 | },
1881 | {
1882 | "cell_type": "code",
1883 | "execution_count": null,
1884 | "metadata": {},
1885 | "outputs": [],
1886 | "source": [
1887 | "flatten = nn.Flatten()\n",
1888 | "flat_image = flatten(input_image)\n",
1889 | "print(flat_image.size())"
1890 | ]
1891 | },
1892 | {
1893 | "cell_type": "code",
1894 | "execution_count": null,
1895 | "metadata": {},
1896 | "outputs": [],
1897 | "source": [
1898 | "flat_image"
1899 | ]
1900 | },
1901 | {
1902 | "cell_type": "code",
1903 | "execution_count": null,
1904 | "metadata": {},
1905 | "outputs": [],
1906 | "source": [
1907 | "layer1 = nn.Linear(in_features=28 * 28, out_features=20)\n",
1908 | "hidden1 = layer1(flat_image)\n",
1909 | "print(hidden1.size())"
1910 | ]
1911 | },
1912 | {
1913 | "cell_type": "code",
1914 | "execution_count": null,
1915 | "metadata": {},
1916 | "outputs": [],
1917 | "source": [
1918 | "print(f\"Before ReLU: {hidden1}\\n\\n\")\n",
1919 | "hidden1 = nn.ReLU()(hidden1)\n",
1920 | "print(f\"After ReLU: {hidden1}\")"
1921 | ]
1922 | },
1923 | {
1924 | "cell_type": "code",
1925 | "execution_count": null,
1926 | "metadata": {},
1927 | "outputs": [],
1928 | "source": [
1929 | "seq_modules = nn.Sequential(flatten, layer1, nn.ReLU(), nn.Linear(20, 10))\n",
1930 | "input_image = torch.rand(3, 28, 28)\n",
1931 | "logits = seq_modules(input_image)"
1932 | ]
1933 | },
1934 | {
1935 | "cell_type": "code",
1936 | "execution_count": null,
1937 | "metadata": {},
1938 | "outputs": [],
1939 | "source": [
1940 | "logits.shape"
1941 | ]
1942 | },
1943 | {
1944 | "cell_type": "code",
1945 | "execution_count": null,
1946 | "metadata": {},
1947 | "outputs": [],
1948 | "source": [
1949 | "logits"
1950 | ]
1951 | },
1952 | {
1953 | "cell_type": "code",
1954 | "execution_count": null,
1955 | "metadata": {},
1956 | "outputs": [],
1957 | "source": [
1958 | "softmax = nn.Softmax(dim=1)\n",
1959 | "pred_probab = softmax(logits)\n",
1960 | "pred_probab"
1961 | ]
1962 | },
1963 | {
1964 | "cell_type": "code",
1965 | "execution_count": null,
1966 | "metadata": {},
1967 | "outputs": [],
1968 | "source": [
1969 | "pred_probab.argmax(dim=1)"
1970 | ]
1971 | },
1972 | {
1973 | "cell_type": "code",
1974 | "execution_count": null,
1975 | "metadata": {},
1976 | "outputs": [],
1977 | "source": [
1978 | "print(f\"Model structure: {model}\\n\\n\")\n",
1979 | "\n",
1980 | "for name, param in model.named_parameters():\n",
1981 | "    print(f\"Layer: {name} | Size: {param.size()} | Values : {param[:2]} \\n\")"
1982 | ]
1983 | },
1984 | {
1985 | "cell_type": "markdown",
1986 | "metadata": {},
1987 | "source": [
1988 | "# 3. Train the network"
1989 | ]
1990 | },
1991 | {
1992 | "cell_type": "markdown",
1993 | "metadata": {},
1994 | "source": [
1995 | "#### Initialize the model"
1996 | ]
1997 | },
1998 | {
1999 | "cell_type": "code",
2000 | "execution_count": null,
2001 | "metadata": {},
2002 | "outputs": [],
2003 | "source": [
2004 | "model = NeuralNetwork()"
2005 | ]
2006 | },
2007 | {
2008 | "cell_type": "markdown",
2009 | "metadata": {},
2010 | "source": [
2011 | "#### Initialize the hyperparameters"
2012 | ]
2013 | },
2014 | {
2015 | "cell_type": "code",
2016 | "execution_count": null,
2017 | "metadata": {},
2018 | "outputs": [],
2019 | "source": [
2020 | "learning_rate = 1e-3\n",
2021 | "batch_size = 64\n",
2022 | "epochs = 5"
2023 | ]
2024 | },
2025 | {
2026 | "cell_type": "markdown",
2027 | "metadata": {},
2028 | "source": [
2029 | "#### Initialize the loss function"
2030 | ]
2031 | },
2032 | {
2033 | "cell_type": "code",
2034 | "execution_count": null,
2035 | "metadata": {},
2036 | "outputs": [],
2037 | "source": [
2038 | "loss_fn = nn.CrossEntropyLoss()"
2039 | ]
2040 | },
2041 | {
2042 | "cell_type": "markdown",
2043 | "metadata": {},
2044 | "source": [
2045 | "#### Initialize the optimizer"
2046 | ]
2047 | },
2048 | {
2049 | "cell_type": "code",
2050 | "execution_count": null,
2051 | "metadata": {},
2052 | "outputs": [],
2053 | "source": [
2054 | "optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)"
2055 | ]
2056 | },
2057 | {
2058 | "cell_type": "markdown",
2059 | "metadata": {},
2060 | "source": [
2061 | "#### Define the training and testing functions"
2062 | ]
2063 | },
2064 | {
2065 | "cell_type": "code",
2066 | "execution_count": null,
2067 | "metadata": {},
2068 | "outputs": [],
2069 | "source": [
2070 | "def train_loop(dataloader, model, loss_fn, optimizer):\n",
2071 | "    size = len(dataloader.dataset)\n",
2072 | "    for batch, (X, y) in enumerate(dataloader):\n",
2073 | "        # Compute prediction and loss\n",
2074 | "        pred = model(X)\n",
2075 | "        loss = loss_fn(pred, y)\n",
2076 | "\n",
2077 | "        # Backpropagation\n",
2078 | "        optimizer.zero_grad()\n",
2079 | "        loss.backward()\n",
2080 | "        optimizer.step()\n",
2081 | "\n",
2082 | "        if batch % 100 == 0:\n",
2083 | "            loss, current = loss.item(), batch * len(X)\n",
2084 | "            print(f\"loss: {loss:>7f} [{current:>5d}/{size:>5d}]\")\n",
2085 | "\n",
2086 | "\n",
2087 | "def test_loop(dataloader, model, loss_fn):\n",
2088 | "    size = len(dataloader.dataset)\n",
2089 | "    num_batches = len(dataloader)\n",
2090 | "    test_loss, correct = 0, 0\n",
2091 | "\n",
2092 | "    with torch.no_grad():\n",
2093 | "        for X, y in dataloader:\n",
2094 | "            pred = model(X)\n",
2095 | "            test_loss += loss_fn(pred, y).item()\n",
2096 | "            correct += (pred.argmax(1) == y.argmax(1)).type(torch.float).sum().item()\n",
2097 | "\n",
2098 | "    test_loss /= num_batches\n",
2099 | "    correct /= size\n",
2100 | "    print(\n",
2101 | "        f\"Test Error: \\n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \\n\"\n",
2102 | "    )"
2103 | ]
2104 | },
2105 | {
2106 | "cell_type": "code",
2107 | "execution_count": null,
2108 | "metadata": {},
2109 | "outputs": [],
2110 | "source": [
2111 | "epochs = 10\n",
2112 | "for t in range(epochs):\n",
2113 | "    print(f\"Epoch {t+1}\\n-------------------------------\")\n",
2114 | "    train_loop(train_dataloader, model, loss_fn, optimizer)\n",
2115 | "    test_loop(test_dataloader, model, loss_fn)\n",
2116 | "print(\"Done!\")"
2117 | ]
2118 | },
2119 | {
2120 | "cell_type": "markdown",
2121 | "metadata": {},
2122 | "source": [
2123 | "# All material below is optional"
2124 | ]
2125 | },
2126 | {
2127 | "cell_type": "markdown",
2128 | "metadata": {},
2129 | "source": [
2130 | "---"
2131 | ]
2132 | },
2133 | {
2134 | "cell_type": "markdown",
2135 | "metadata": {},
2136 | "source": [
2137 | "# 4. Exploring convolutions"
2138 | ]
2139 | },
2175 | {
2176 | "cell_type": "code",
2177 | "execution_count": null,
2178 | "metadata": {},
2179 | "outputs": [],
2180 | "source": [
2181 | "import matplotlib.pyplot as plt\n",
2182 | "\n",
2183 | "%matplotlib inline\n",
2184 | "import numpy as np\n",
2185 | "import cv2\n",
2186 | "import sklearn\n",
2187 | "import PIL\n",
2188 | "import torch\n",
2189 | "import torchvision\n",
2190 | "\n",
2191 | "\n",
2192 | "def show_image(image, figsize=(16, 9), reverse=True):\n",
2193 | "    plt.figure(figsize=figsize)\n",
2194 | "    if reverse:\n",
2195 | "        plt.imshow(image[..., ::-1])  # OpenCV stores channels as BGR; flip to RGB\n",
2196 | "    else:\n",
2197 | "        plt.imshow(image)\n",
2198 | "    plt.axis(\"off\")\n",
2199 | "    plt.show()\n",
2200 | "\n",
2201 | "\n",
2202 | "def show_grayscale_image(image, figsize=(16, 9)):\n",
2203 | "    plt.figure(figsize=figsize)\n",
2204 | "    plt.imshow(image, cmap=\"gray\")\n",
2205 | "    plt.axis(\"off\")\n",
2206 | "    plt.show()"
2207 | ]
2208 | },
2209 | {
2210 | "cell_type": "code",
2211 | "execution_count": null,
2212 | "metadata": {},
2213 | "outputs": [],
2214 | "source": [
2215 | "layer = torch.nn.Conv2d(1, 1, kernel_size=(3, 3))\n",
2216 | "layer.weight"
2217 | ]
2218 | },
2219 | {
2220 | "cell_type": "code",
2221 | "execution_count": null,
2222 | "metadata": {},
2223 | "outputs": [],
2224 | "source": [
2225 | "# the weights are accessible via .weight.data\n",
2226 | "layer.weight.data = torch.ones_like(layer.weight.data)\n",
2227 | "layer.weight.data /= torch.sum(layer.weight.data)\n",
2228 | "layer.weight.data"
2229 | ]
2230 | },
2231 | {
2232 | "cell_type": "code",
2233 | "execution_count": null,
2234 | "metadata": {},
2235 | "outputs": [],
2236 | "source": [
2237 | "layer.bias.data = torch.zeros_like(layer.bias.data)\n",
2238 | "layer.bias.data"
2239 | ]
2240 | },
2241 | {
2242 | "cell_type": "code",
2243 | "execution_count": null,
2244 | "metadata": {},
2245 | "outputs": [],
2246 | "source": [
2247 | "layer.train()\n",
2248 | "layer.eval()"
2249 | ]
2250 | },
2251 | {
2252 | "cell_type": "code",
2253 | "execution_count": null,
2254 | "metadata": {},
2255 | "outputs": [],
2256 | "source": [
2257 | "input_tensor = torch.arange(9).reshape(1, 1, 3, 3)\n",
2258 | "input_tensor.shape"
2259 | ]
2260 | },
2261 | {
2262 | "cell_type": "markdown",
2263 | "metadata": {},
2264 | "source": [
2265 | "Tensor shape `(1, 1, 3, 3)`: 1 is the batch size, 1 is the number of channels (1 for grayscale, 3 for color), 3 is the height, 3 is the width."
2266 | ]
2267 | },
2268 | {
2269 | "cell_type": "code",
2270 | "execution_count": null,
2271 | "metadata": {},
2272 | "outputs": [],
2273 | "source": [
2274 | "input_tensor"
2275 | ]
2276 | },
2277 | {
2278 | "cell_type": "code",
2279 | "execution_count": null,
2280 | "metadata": {},
2281 | "outputs": [],
2282 | "source": [
2283 | "output_tensor = layer(input_tensor.float())"
2284 | ]
2285 | },
2286 | {
2287 | "cell_type": "code",
2288 | "execution_count": null,
2289 | "metadata": {},
2290 | "outputs": [],
2291 | "source": [
2292 | "output_tensor"
2293 | ]
2294 | },
2295 | {
2296 | "cell_type": "markdown",
2297 | "metadata": {},
2298 | "source": [
2299 | "## What kinds of filters are there"
2300 | ]
2301 | },
2302 | {
2303 | "cell_type": "code",
2304 | "execution_count": null,
2305 | "metadata": {},
2306 | "outputs": [],
2307 | "source": [
2308 | "from urllib.request import urlopen\n",
2309 | "\n",
2310 | "\n",
2311 | "req = urlopen(\n",
2312 | "    \"https://images.unsplash.com/photo-1608848461950-0fe51dfc41cb?ixlib=rb-4.0.3&ixid=MnwxMjA3fDB8MHxleHBsb3JlLWZlZWR8M3x8fGVufDB8fHx8&w=1000&q=80\"\n",
2313 | ")\n",
2314 | "arr = np.asarray(bytearray(req.read()), dtype=np.uint8)\n",
2315 | "img = cv2.imdecode(arr, cv2.IMREAD_GRAYSCALE)"
2316 | ]
2317 | },
2318 | {
2319 | "cell_type": "code",
2320 | "execution_count": null,
2321 | "metadata": {},
2322 | "outputs": [],
2323 | "source": [
2324 | "img.shape"
2325 | ]
2326 | },
2327 | {
2328 | "cell_type": "code",
2329 | "execution_count": null,
2330 | "metadata": {},
2331 | "outputs": [],
2332 | "source": [
2333 | "show_grayscale_image(img)"
2334 | ]
2335 | },
2336 | {
2337 | "cell_type": "code",
2338 | "execution_count": null,
2339 | "metadata": {},
2340 | "outputs": [],
2341 | "source": [
2342 | "to_tensor = torchvision.transforms.ToTensor()\n",
2343 | "img_tensor = to_tensor(img)\n",
2344 | "img_tensor"
2345 | ]
2346 | },
2347 | {
2348 | "cell_type": "code",
2349 | "execution_count": null,
2350 | "metadata": {},
2351 | "outputs": [],
2352 | "source": [
2353 | "img_tensor.shape"
2354 | ]
2355 | },
2356 | {
2357 | "cell_type": "code",
2358 | "execution_count": null,
2359 | "metadata": {},
2360 | "outputs": [],
2361 | "source": [
2362 | "# add a batch dimension so the convolution layer accepts the image: (1, H, W) -> (1, 1, H, W)\n",
2363 | "\n",
2364 | "img_tensor = img_tensor.unsqueeze(0)"
2365 | ]
2366 | },
2367 | {
2368 | "cell_type": "code",
2369 | "execution_count": null,
2370 | "metadata": {},
2371 | "outputs": [],
2372 | "source": [
2373 | "kernel_size = 50\n",
2374 | "layer = torch.nn.Conv2d(1, 1, kernel_size=kernel_size)\n",
2375 | "\n",
2376 | "# replace the convolution weights with a uniform averaging kernel\n",
2377 | "\n",
2378 | "layer.weight.data = torch.ones_like(layer.weight.data)\n",
2379 | "layer.weight.data /= torch.sum(layer.weight.data)"
2380 | ]
2381 | },
2382 | {
2383 | "cell_type": "code",
2384 | "execution_count": null,
2385 | "metadata": {},
2386 | "outputs": [],
2387 | "source": [
2388 | "layer.weight.data"
2389 | ]
2390 | },
2391 | {
2392 | "cell_type": "code",
2393 | "execution_count": null,
2394 | "metadata": {},
2395 | "outputs": [],
2396 | "source": [
2397 | "output_tensor = layer(img_tensor)\n",
2398 | "output_tensor.shape"
2399 | ]
2400 | },
2401 | {
2402 | "cell_type": "code",
2403 | "execution_count": null,
2404 | "metadata": {},
2405 | "outputs": [],
2406 | "source": [
2407 | "import torchvision.transforms as transforms\n",
2408 | "\n",
2409 | "# transform that converts a tensor into a PIL image\n",
2410 | "to_pil_image = transforms.ToPILImage()\n",
2411 | "output_img = to_pil_image(output_tensor.squeeze(0))"
2412 | ]
2413 | },
2414 | {
2415 | "cell_type": "code",
2416 | "execution_count": null,
2417 | "metadata": {},
2418 | "outputs": [],
2419 | "source": [
2420 | "show_grayscale_image(output_img)"
2421 | ]
2422 | },
2423 | {
2424 | "cell_type": "code",
2425 | "execution_count": null,
2426 | "metadata": {},
2427 | "outputs": [],
2428 | "source": [
2429 | "def show_image_with_kernel(img_input, kernel):\n",
2430 | "    layer = torch.nn.Conv2d(1, 1, kernel_size=3, bias=False)  # no bias, so only the kernel acts\n",
2431 | "    layer.weight.data = kernel.reshape(1, 1, 3, 3)  # install the given 3x3 kernel as the weights\n",
2432 | "    output_tensor = layer(img_input)\n",
2433 | "    output_img = to_pil_image(output_tensor.squeeze(0))\n",
2434 | "    show_grayscale_image(output_img)"
2435 | ]
2436 | },
2437 | {
2438 | "cell_type": "code",
2439 | "execution_count": null,
2440 | "metadata": {},
2441 | "outputs": [],
2442 | "source": [
2443 | "our_kernel = torch.tensor([[0, -1, 0], [-1, 4, -1], [0, -1, 0]], dtype=torch.float32)\n",
2444 | "our_kernel = our_kernel.reshape(1, 1, 3, 3)\n",
2445 | "show_image_with_kernel(img_tensor, our_kernel)"
2446 | ]
2447 | },
2448 | {
2449 | "cell_type": "code",
2450 | "execution_count": null,
2451 | "metadata": {},
2452 | "outputs": [],
2453 | "source": [
2454 | "our_kernel = torch.tensor(\n",
2455 | "    [[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]], dtype=torch.float32\n",
2456 | ")\n",
2457 | "our_kernel = our_kernel.reshape(1, 1, 3, 3)\n",
2458 | "show_image_with_kernel(img_tensor, our_kernel)"
2459 | ]
2460 | },
2461 | {
2462 | "cell_type": "code",
2463 | "execution_count": null,
2464 | "metadata": {},
2465 | "outputs": [],
2466 | "source": [
2467 | "our_kernel = torch.tensor([[0, -1, 0], [-1, 5, -1], [0, -1, 0]], dtype=torch.float32)\n",
2468 | "our_kernel = our_kernel.reshape(1, 1, 3, 3)\n",
2469 | "show_image_with_kernel(img_tensor, our_kernel)"
2470 | ]
2471 | },
2472 | {
2473 | "cell_type": "code",
2474 | "execution_count": null,
2475 | "metadata": {},
2476 | "outputs": [],
2477 | "source": [
2478 | "our_kernel = torch.tensor([[1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=torch.float32)\n",
2479 | "our_kernel = our_kernel.reshape(1, 1, 3, 3)\n",
2480 | "show_image_with_kernel(img_tensor, our_kernel)"
2481 | ]
2482 | },
2483 | {
2484 | "cell_type": "code",
2485 | "execution_count": null,
2486 | "metadata": {},
2487 | "outputs": [],
2488 | "source": [
2489 | "our_kernel = torch.tensor([[1, 0, 1], [0, 1, 0], [1, 0, 1]], dtype=torch.float32)\n",
2490 | "our_kernel = our_kernel.reshape(1, 1, 3, 3)\n",
2491 | "show_image_with_kernel(img_tensor, our_kernel)"
2492 | ]
2493 | },
2564 | {
2565 | "cell_type": "markdown",
2566 | "metadata": {},
2567 | "source": [
2568 | "Visualization of convolutional layers: https://distill.pub/2017/feature-visualization/"
2569 | ]
2570 | },
2571 | {
2572 | "cell_type": "markdown",
2573 | "metadata": {},
2574 | "source": [
2575 | "# Now it is time to write our own convolutional neural network"
2576 | ]
2577 | },
2578 | {
2579 | "cell_type": "code",
2580 | "execution_count": null,
2581 | "metadata": {},
2582 | "outputs": [],
2583 | "source": [
2584 | "import torch\n",
2585 | "import torchvision\n",
2586 | "import torchvision.transforms as transforms\n",
2587 | "\n",
2588 | "\n",
2589 | "transform = transforms.Compose(\n",
2590 | "    [transforms.ToTensor(), transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))]\n",
2591 | ")"
2592 | ]
2593 | },
2594 | {
2595 | "cell_type": "code",
2596 | "execution_count": null,
2597 | "metadata": {},
2598 | "outputs": [],
2599 | "source": [
2600 | "batch_size = 4\n",
2601 | "\n",
2602 | "# the dataset defines the data\n",
2603 | "trainset = torchvision.datasets.CIFAR10(\n",
2604 | "    root=\"./data\", train=True, download=True, transform=transform\n",
2605 | ")\n",
2606 | "# the dataloader loads it in batches\n",
2607 | "trainloader = torch.utils.data.DataLoader(\n",
2608 | "    trainset, batch_size=batch_size, shuffle=True, num_workers=2\n",
2609 | ")\n",
2610 | "\n",
2611 | "testset = torchvision.datasets.CIFAR10(\n",
2612 | "    root=\"./data\", train=False, download=True, transform=transform\n",
2613 | ")\n",
2614 | "testloader = torch.utils.data.DataLoader(\n",
2615 | "    testset, batch_size=batch_size, shuffle=False, num_workers=2\n",
2616 | ")\n",
2617 | "\n",
2618 | "classes = (\n",
2619 | "    \"plane\",\n",
2620 | "    \"car\",\n",
2621 | "    \"bird\",\n",
2622 | "    \"cat\",\n",
2623 | "    \"deer\",\n",
2624 | "    \"dog\",\n",
2625 | "    \"frog\",\n",
2626 | "    \"horse\",\n",
2627 | "    \"ship\",\n",
2628 | "    \"truck\",\n",
2629 | ")"
2630 | ]
2631 | },
2632 | {
2633 | "cell_type": "code",
2634 | "execution_count": null,
2635 | "metadata": {},
2636 | "outputs": [],
2637 | "source": [
2638 | "import matplotlib.pyplot as plt\n",
2639 | "import numpy as np\n",
2640 | "\n",
2641 | "\n",
2642 | "def imshow(img):\n",
2643 | "    # undo the normalization\n",
2644 | "    img = img / 2 + 0.5\n",
2645 | "    npimg = img.numpy()\n",
2646 | "    plt.imshow(np.transpose(npimg, (1, 2, 0)))\n",
2647 | "    plt.show()\n",
2648 | "\n",
2649 | "\n",
2650 | "# take a random batch of images\n",
2651 | "dataiter = iter(trainloader)\n",
2652 | "images, labels = next(dataiter)\n",
2653 | "\n",
2654 | "imshow(torchvision.utils.make_grid(images))\n",
2655 | "print(\" \".join(\"{}\".format(classes[labels[j]]) for j in range(batch_size)))"
2656 | ]
2657 | },
2658 | {
2659 | "cell_type": "code",
2660 | "execution_count": null,
2661 | "metadata": {},
2662 | "outputs": [],
2663 | "source": [
2664 | "def train_model(net, criterion, optimizer, trainloader, num_epochs=5):\n",
2665 | "    for epoch in range(num_epochs):\n",
2666 | "        running_loss = 0.0\n",
2667 | "        for i, data in enumerate(trainloader, 0):\n",
2668 | "            inputs, labels = data\n",
2669 | "\n",
2670 | "            # Let's write this code ourselves\n",
2671 | "\n",
2672 | "            optimizer.zero_grad()\n",
2673 | "\n",
2674 | "            outputs = net(inputs)\n",
2675 | "            loss = criterion(outputs, labels)\n",
2676 | "            loss.backward()\n",
2677 | "            optimizer.step()\n",
2678 | "\n",
2679 | "            running_loss += loss.item()\n",
2680 | "            if i % 2000 == 1999:\n",
2681 | "                print(\n",
2682 | "                    \"Epoch {0}/{1}, iteration {2}, loss: {3:.3f}\".format(\n",
2683 | "                        epoch + 1, num_epochs, i + 1, running_loss / 2000\n",
2684 | "                    )\n",
2685 | "                )\n",
2686 | "                running_loss = 0.0\n",
2687 | "        print()\n",
2688 | "\n",
2689 | "    print(\"Finished Training\")\n",
2690 | "\n",
2691 | "    return net"
2692 | ]
2693 | },
2694 | {
2695 | "cell_type": "code",
2696 | "execution_count": null,
2697 | "metadata": {},
2698 | "outputs": [],
2699 | "source": [
2700 | "def all_accuracy(net, testloader):\n",
2701 | "    correct = 0\n",
2702 | "    total = 0\n",
2703 | "    with torch.no_grad():\n",
2704 | "        for data in testloader:\n",
2705 | "            images, labels = data\n",
2706 | "            outputs = net(images)\n",
2707 | "\n",
2708 | "            _, predicted = torch.max(outputs.data, 1)\n",
2709 | "            total += labels.size(0)\n",
2710 | "            correct += (predicted == labels).sum().item()\n",
2711 | "\n",
2712 | "    print(\n",
2713 | "        \"Accuracy of the network on the 10000 test images: {} %\".format(\n",
2714 | "            100 * correct / total\n",
2715 | "        )\n",
2716 | "    )"
2717 | ]
2718 | },
2719 | {
2720 | "cell_type": "code",
2721 | "execution_count": null,
2722 | "metadata": {},
2723 | "outputs": [],
2724 | "source": [
2725 | "def class_accuracy(net, testloader):\n",
2726 | "    class_correct = list(0.0 for i in range(10))\n",
2727 | "    class_total = list(0.0 for i in range(10))\n",
2728 | "    with torch.no_grad():\n",
2729 | "        for data in testloader:\n",
2730 | "            images, labels = data\n",
2731 | "            outputs = net(images)\n",
2732 | "            _, predicted = torch.max(outputs, 1)\n",
2733 | "            c = predicted == labels\n",
2734 | "            for i in range(labels.size(0)):  # works for any batch size, including a short last batch\n",
2735 | "                label = labels[i]\n",
2736 | "                class_correct[label] += c[i].item()\n",
2737 | "                class_total[label] += 1\n",
2738 | "\n",
2739 | "    for i in range(10):\n",
2740 | "        print(\n",
2741 | "            \"Accuracy of {} : {} %\".format(\n",
2742 | "                classes[i], 100 * class_correct[i] / class_total[i]\n",
2743 | "            )\n",
2744 | "        )"
2745 | ]
2746 | },
2747 | {
2748 | "cell_type": "code",
2749 | "execution_count": null,
2750 | "metadata": {},
2751 | "outputs": [],
2752 | "source": [
2753 | "import torch.nn as nn\n",
2754 | "import torch.nn.functional as F\n",
2755 | "\n",
2756 | "\n",
2757 | "class FeedForwardNet(nn.Module):\n",
2758 | "    def __init__(self):\n",
2759 | "        super(FeedForwardNet, self).__init__()\n",
2760 | "        self.fc1 = nn.Linear(3 * 32 * 32, 128)\n",
2761 | "        self.fc2 = nn.Linear(128, 32)\n",
2762 | "        self.fc3 = nn.Linear(32, 10)\n",
2763 | "\n",
2764 | "    def forward(self, x):\n",
2765 | "        x = x.view(-1, 3 * 32 * 32)\n",
2766 | "        x = self.fc1(x)\n",
2767 | "        x = F.relu(x)\n",
2768 | "        x = self.fc2(x)\n",
2769 | "        x = F.relu(x)\n",
2770 | "        x = self.fc3(x)\n",
2771 | "        return x\n",
2772 | "\n",
2773 | "\n",
2774 | "net = FeedForwardNet()"
2775 | ]
2776 | },
2777 | {
2778 | "cell_type": "code",
2779 | "execution_count": null,
2780 | "metadata": {},
2781 | "outputs": [],
2782 | "source": [
2783 | "import torch.optim as optim\n",
2784 | "\n",
2785 | "criterion = nn.CrossEntropyLoss()\n",
2786 | "optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)"
2787 | ]
2788 | },
2789 | {
2790 | "cell_type": "code",
2791 | "execution_count": null,
2792 | "metadata": {},
2793 | "outputs": [],
2794 | "source": [
2795 | "net = train_model(net, criterion, optimizer, trainloader, num_epochs=5)"
2796 | ]
2797 | },
2798 | {
2799 | "cell_type": "code",
2800 | "execution_count": null,
2801 | "metadata": {},
2802 | "outputs": [],
2803 | "source": [
2804 | "all_accuracy(net, testloader)"
2805 | ]
2806 | },
2807 | {
2808 | "cell_type": "code",
2809 | "execution_count": null,
2810 | "metadata": {},
2811 | "outputs": [],
2812 | "source": [
2813 | "class_accuracy(net, testloader)"
2814 | ]
2815 | },
2816 | {
2817 | "cell_type": "markdown",
2818 | "metadata": {},
2819 | "source": [
2820 | "## Convolutional network for classification"
2821 | ]
2822 | },
2823 | {
2824 | "cell_type": "code",
2825 | "execution_count": null,
2826 | "metadata": {},
2827 | "outputs": [],
2828 | "source": [
2829 | "import torch.nn as nn\n",
2830 | "import torch.nn.functional as F\n",
2831 | "\n",
2832 | "\n",
2833 | "class Net(nn.Module):\n",
2834 | "    def __init__(self):\n",
2835 | "        super(Net, self).__init__()\n",
2836 | "        self.conv1 = nn.Conv2d(3, 6, 5)\n",
2837 | "        self.pool = nn.MaxPool2d(2, 2)\n",
2838 | "        self.conv2 = nn.Conv2d(6, 16, 5)\n",
2839 | "        self.fc1 = nn.Linear(16 * 5 * 5, 120)\n",
2840 | "        self.fc2 = nn.Linear(120, 84)\n",
2841 | "        self.fc3 = nn.Linear(84, 10)\n",
2842 | "\n",
2843 | "    def forward(self, x):\n",
2844 | "        x = self.pool(F.relu(self.conv1(x)))\n",
2845 | "        x = self.pool(F.relu(self.conv2(x)))\n",
2846 | "        x = x.view(-1, 16 * 5 * 5)\n",
2847 | "        x = F.relu(self.fc1(x))\n",
2848 | "        x = F.relu(self.fc2(x))\n",
2849 | "        x = self.fc3(x)\n",
2850 | "        return x\n",
2851 | "\n",
2852 | "\n",
2853 | "net = Net()"
2854 | ]
2855 | },
2856 | {
2857 | "cell_type": "code",
2858 | "execution_count": null,
2859 | "metadata": {},
2860 | "outputs": [],
2861 | "source": [
2862 | "criterion = nn.CrossEntropyLoss()\n",
2863 | "optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)"
2864 | ]
2865 | },
2866 | {
2867 | "cell_type": "code",
2868 | "execution_count": null,
2869 | "metadata": {},
2870 | "outputs": [],
2871 | "source": [
2872 | "net = train_model(net, criterion, optimizer, trainloader, num_epochs=5)"
2873 | ]
2874 | },
2875 | {
2876 | "cell_type": "code",
2877 | "execution_count": null,
2878 | "metadata": {},
2879 | "outputs": [],
2880 | "source": [
2881 | "# save the network\n",
2882 | "PATH = \"./cifar_net.pth\"\n",
2883 | "torch.save(net.state_dict(), PATH)"
2884 | ]
2885 | },
2886 | {
2887 | "cell_type": "code",
2888 | "execution_count": null,
2889 | "metadata": {},
2890 | "outputs": [],
2891 | "source": [
2892 | "dataiter = iter(testloader)\n",
2893 | "images, labels = next(dataiter)\n",
2894 | "\n",
2895 | "imshow(torchvision.utils.make_grid(images))\n",
2896 | "print(\n",
2897 | "    \"GroundTruth: \",\n",
2898 | "    \" \".join(\"{}\".format(classes[labels[j]]) for j in range(batch_size)),\n",
2899 | ")"
2900 | ]
2901 | },
2902 | {
2903 | "cell_type": "code",
2904 | "execution_count": null,
2905 | "metadata": {},
2906 | "outputs": [],
2907 | "source": [
2908 | "# load the network\n",
2909 | "net_loaded = Net()\n",
2910 | "net_loaded.load_state_dict(torch.load(PATH))"
2911 | ]
2912 | },
2913 | {
2914 | "cell_type": "code",
2915 | "execution_count": null,
2916 | "metadata": {},
2917 | "outputs": [],
2918 | "source": [
2919 | "outputs = net_loaded(images)\n",
2920 | "_, predicted = torch.max(outputs, 1)\n",
2921 | "\n",
2922 | "print(\"Predicted: \", \" \".join(\"{}\".format(classes[predicted[j]]) for j in range(batch_size)))"
2923 | ]
2924 | },
2925 | {
2926 | "cell_type": "code",
2927 | "execution_count": null,
2928 | "metadata": {},
2929 | "outputs": [],
2930 | "source": [
2931 | "all_accuracy(net, testloader)"
2932 | ]
2933 | },
2934 | {
2935 | "cell_type": "code",
2936 | "execution_count": null,
2937 | "metadata": {},
2938 | "outputs": [],
2939 | "source": [
2940 | "class_accuracy(net, testloader)"
2941 | ]
2942 | },
2943 | {
2944 | "cell_type": "code",
2945 | "execution_count": null,
2946 | "metadata": {},
2947 | "outputs": [],
2948 | "source": []
2949 | }
2950 | ],
2951 | "metadata": {
2952 | "colab": {
2953 | "collapsed_sections": [],
2954 | "provenance": []
2955 | },
2956 | "kernelspec": {
2957 | "display_name": "nn-ml-bachelor-2024-venv",
2958 | "language": "python",
2959 | "name": "nn-ml-bachelor-2024-venv"
2960 | },
2961 | "language_info": {
2962 | "codemirror_mode": {
2963 | "name": "ipython",
2964 | "version": 3
2965 | },
2966 | "file_extension": ".py",
2967 | "mimetype": "text/x-python",
2968 | "name": "python",
2969 | "nbconvert_exporter": "python",
2970 | "pygments_lexer": "ipython3",
2971 | "version": "3.9.6"
2972 | }
2973 | },
2974 | "nbformat": 4,
2975 | "nbformat_minor": 1
2976 | }
2977 |
--------------------------------------------------------------------------------