├── README.md
├── book_dl_cnn01_MNIST.ipynb
├── dj_AE_20220318.ipynb
├── dj_LSTMLangModel.ipynb
├── dj_TABLE_example.ipynb
├── dj_cat_coding.ipynb
├── dj_dl_cnn01_MNIST.ipynb
├── dj_fast_preprocessing.ipynb
├── dj_intro_pytorch20210125.ipynb
├── dj_matplotlib_intro.ipynb
├── dj_numpy_20181021.ipynb
├── dj_python_0_intro_20181004.ipynb
├── dj_python_2_oop_20181004.ipynb
├── dj_python_4_tonko_20181004.ipynb
└── dj_value_counts.ipynb
/README.md:
--------------------------------------------------------------------------------
1 | # Techniques for writing Python programs
2 |
3 | Made for personal use and for publishing materials related to programming in Python.
4 |
5 | * [Encoding categorical (factor, nominal) features](dj_cat_coding.ipynb)
6 | * [Different ways of counting categories](dj_value_counts.ipynb)
7 | * [4 tasks on fast data preprocessing in Python (pandas)](dj_fast_preprocessing.ipynb)
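To give a flavour of the preprocessing notebook listed above (`dj_fast_preprocessing.ipynb`), here is a minimal sketch of one of its tasks: turning strings such as `17$` into integers. It is not copied from the notebook. The column name `price` follows the notebook, while the sample values and the explicit `regex=False` argument are assumptions added for illustration.

```python
import pandas as pd

# Toy frame; the values are made up for illustration.
data = pd.DataFrame({"price": ["17$", "89$", "39$"]})

# Variant A: drop the last character with the vectorized .str accessor.
by_slice = data["price"].str[:-1].astype(int)

# Variant B: remove the literal "$"; regex=False keeps "$" from being
# interpreted as a regular-expression end-of-string anchor.
by_replace = data["price"].str.replace("$", "", regex=False).astype(int)

print(by_slice.tolist())
print(by_replace.tolist())
```

Both variants give the same integers. Which one is faster depends on the pandas version and the data size, so it is worth timing them on your own data, as the notebook does.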
8 |
9 |
10 | ### For the lecture "Introduction to the Python programming language" of the course https://github.com/Dyakonov/IML
11 | * [Python part I: general overview](dj_python_0_intro_20181004.ipynb)
12 | * part II does not exist yet
13 | * [Python part III: OOP](dj_python_2_oop_20181004.ipynb)
14 | * [Python part IV: subtleties](dj_python_4_tonko_20181004.ipynb)
15 |
16 | * [The NumPy, SciPy, and Matplotlib packages](dj_numpy_20181021.ipynb)
17 |
18 | ### For beginners
19 | * [A very basic introduction to Matplotlib](dj_matplotlib_intro.ipynb)
20 |
21 |
22 | ### PyTorch: for the DL course
23 |
24 | * [Introduction to PyTorch](dj_intro_pytorch20210125.ipynb)
25 | * [Example networks for tabular data](dj_TABLE_example.ipynb)
26 | * [Example of working with images](book_dl_cnn01_MNIST.ipynb)
27 | * [Solving MNIST](dj_dl_cnn01_MNIST.ipynb) (the notebook above covers it too)
28 | * [Autoencoders](dj_AE_20220318.ipynb) (a notebook above also has an example; start with that one)
29 | * [LSTM language model](dj_LSTMLangModel.ipynb)
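As a rough illustration of the data preparation behind the LSTM language model notebook, the sketch below maps characters to integer indices and builds input/target pairs in which the target is the input shifted by one position. The name `get_charmap` follows the notebook. `make_pairs` and the toy corpus are hypothetical, and plain Python lists stand in for the tensors used there.

```python
def get_charmap(corpus):
    # Sorted list of unique characters and a char -> index lookup.
    chars = sorted(set(corpus))
    charmap = {c: i for i, c in enumerate(chars)}
    return chars, charmap


def make_pairs(ids, seq_len):
    # Hypothetical helper: one sliding window per start position;
    # the target is the same window shifted by one character.
    pairs = []
    for start in range(len(ids) - seq_len + 1):
        window = ids[start:start + seq_len]
        pairs.append((window[:-1], window[1:]))
    return pairs


corpus = "abcab"  # toy corpus, for illustration only
chars, charmap = get_charmap(corpus)
ids = [charmap[c] for c in corpus]
pairs = make_pairs(ids, seq_len=3)

print(chars)
print(ids)
print(pairs[0])
```

Each pair holds `seq_len - 1` inputs and `seq_len - 1` targets, so at every position the model is trained to predict the next character.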
30 |
31 |
--------------------------------------------------------------------------------
/dj_LSTMLangModel.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {
6 | "id": "view-in-github",
7 | "colab_type": "text"
8 | },
9 | "source": [
10 | ""
11 | ]
12 | },
13 | {
14 | "cell_type": "markdown",
15 | "metadata": {
16 | "id": "jr_HDBrT-XlW"
17 | },
18 | "source": [
19 | "# LSTM language model\n",
20 | "\n",
21 | "based on https://atcold.github.io/pytorch-Deep-Learning/"
22 | ]
23 | },
24 | {
25 | "cell_type": "code",
26 | "execution_count": null,
27 | "metadata": {
28 | "id": "I5cOUtpz-XlX",
29 | "outputId": "a9a97d48-b9cf-4c13-a072-c67b1323a3d7"
30 | },
31 | "outputs": [
32 | {
33 | "data": {
34 | "text/plain": [
35 | "'cuda'"
36 | ]
37 | },
38 | "execution_count": 1,
39 | "metadata": {},
40 | "output_type": "execute_result"
41 | }
42 | ],
43 | "source": [
44 | "import torch\n",
45 | "import torch.nn as nn\n",
46 | "import torch.nn.utils.rnn as rnn\n",
47 | "from torch.utils.data import Dataset, DataLoader, TensorDataset\n",
48 | "import numpy as np\n",
49 | "import time\n",
50 | "\n",
51 | "# import shakespeare_data as sh\n",
52 | "\n",
53 | "DEVICE = \"cuda\" if torch.cuda.is_available() else \"cpu\"\n",
54 | "DEVICE"
55 | ]
56 | },
57 | {
58 | "cell_type": "markdown",
59 | "metadata": {
60 | "id": "VYT7Q6Vd-XlY"
61 | },
62 | "source": [
63 | "## Load the data"
64 | ]
65 | },
66 | {
67 | "cell_type": "code",
68 | "execution_count": null,
69 | "metadata": {
70 | "scrolled": true,
71 | "id": "VSoPFZvH-XlY",
72 | "outputId": "54991443-96e5-48d4-f280-786e10cc4a48"
73 | },
74 | "outputs": [
75 | {
76 | "name": "stdout",
77 | "output_type": "stream",
78 | "text": [
79 | "александр сергеевич пушкин евгений онегин роман в стихах ,. проникнутый тщеславием, он обладал сверх того еще особенной гордостью, которая побуждает признаваться с одинаковым равнодушием в своих как добрых, так и дурных поступках, следствие чувства превосходства, быть может мнимого.из частного письма фр. мысля гордый свет забавить, вниманье дружбы возлюбя, хотел бы я тебе представить залог достойнее тебя, достойнее души прекрасной, святой исполненной мечты, поэзии живой и ясной, высоких дум и простоты; но так и быть рукой пристрастной прими собранье пестрых глав, полусмешных, полупечальных, простонародных, идеальных, небрежный плод моих забав, бессонниц, легких вдохновений, незрелых и увядших лет, ума холодных наблюдений и сердца горестных замет. глава первая и жить торопится, и чувствовать спешит. князь вяземский эпиграф взят из стихотворения п. а. вяземского первый снег. мой дядя самых честных правил, когда не в шутку занемог, он уважать себя заставил и лучше выдумать не мог. его пример другим наука; но, боже мой, какая скука с больным сидеть и день и ночь, не отходя ни шагу прочь! какое низкое коварство полуживого забавлять, ему подушки поправлять, печально подносить лекарство, вздыхать и думать про себя: когда же черт возьмет тебя! так думал молодой повеса, летя в пыли на почтовых, всевышней волею зевеса наследник всех своих родных. друзья людмилы и руслана! с героем моего романа без предисловий, сей же час позвольте познакомить вас: онегин, добрый мой приятель, родился на брегах невы, где, может быть, родились вы или блистали, мой читатель; там некогда гулял и я: но вреден север для меня писано в бесарабии. служив отлично благородно, долгами жил его отец, давал три бала ежегодно и промотался наконец. судьба евгения хранила: сперва за ним ходила, потом ее сменил; ребенок был резов, но мил. , француз убогой, чтоб не измучилось дитя, учил его всему шутя, не докучал моралью строгой, слегка за шалости бранил и в летний сад гулять водил. 
когда же юности мятежной приш\n"
80 | ]
81 | }
82 | ],
83 | "source": [
84 | "# filename = 'onegin_small.txt'\n",
85 | "filename = 'onegin.txt'\n",
86 | "\n",
87 | "import re\n",
88 | "\n",
89 | "\n",
90 | "\n",
91 | "def read_corpus(filename):\n",
92 | "    # r = re.compile(\"[а-яА-Я .!,;:]+\")\n",
93 | "    lines = []\n",
94 | "    with open(filename, 'r', encoding='Windows-1251', errors='ignore') as f:\n",
95 | "        for pos, line in enumerate(f):\n",
96 | "            # line = line.replace(\"\\t\", \"\").replace(\"\\n\", \" \")\n",
97 | "            #line = ''.join([c for c in filter(r.match, line)]) # keep Russian letters only\n",
98 | "            #\n",
99 | "            line = re.sub('[^а-яА-Я .!,;:]+', ' ', line.replace(\"\\t\", \"\").replace(\"\\n\", \" \")).strip().lower()\n",
100 | "            line = re.sub(\" +\", \" \", line) # collapse runs of spaces\n",
101 | "            line = line.replace(\" .\", \".\")\n",
102 | "            line = line.replace(\" ,\", \",\")\n",
103 | "            line = line.replace(\" !\", \"!\")\n",
104 | "            line = re.sub(\"[.]+\", \".\", line)\n",
105 | "            line = re.sub(\"[,]+\", \",\", line)\n",
106 | "            line = re.sub(\"[!]+\", \"!\", line)\n",
107 | "            if len(line.strip()) > 0:\n",
108 | "                lines.append(line)\n",
109 | "    corpus = \" \".join(lines)\n",
110 | "    return corpus\n",
111 | "\n",
112 | "corpus = read_corpus(filename)\n",
113 | "print (corpus[:2000])"
114 | ]
115 | },
116 | {
117 | "cell_type": "code",
118 | "execution_count": null,
119 | "metadata": {
120 | "id": "l-_7Y72f-XlY",
121 | "outputId": "a2c1f6bb-59bd-45bd-f39d-7562276a45ea"
122 | },
123 | "outputs": [
124 | {
125 | "name": "stdout",
126 | "output_type": "stream",
127 | "text": [
128 | "Number of characters in the corpus: 147224\n",
129 | "Number of unique characters: 38\n",
130 | "corpus_array.shape: (147224,)\n"
131 | ]
132 | }
133 | ],
134 | "source": [
135 | "def get_charmap(corpus):\n",
136 | "    chars = list(set(corpus))\n",
137 | "    chars.sort()\n",
138 | "    charmap = {c: i for i, c in enumerate(chars)}\n",
139 | "    return chars, charmap\n",
140 | "\n",
141 | "\n",
142 | "def map_corpus(corpus, charmap):\n",
143 | "    return np.array([charmap[c] for c in corpus], dtype=np.int64)\n",
144 | "\n",
145 | "\n",
146 | "def to_text(line, charset):\n",
147 | "    return \"\".join([charset[c] for c in line])\n",
148 | "\n",
149 | "print(f\"Number of characters in the corpus: {len(corpus)}\")\n",
150 | "chars, charmap = get_charmap(corpus)\n",
151 | "charcount = len(chars)\n",
152 | "print(f\"Number of unique characters: {len(chars)}\")\n",
153 | "corpus_array = map_corpus(corpus, charmap)\n",
154 | "print(f\"corpus_array.shape: {corpus_array.shape}\")\n",
155 | "\n",
156 | "# Number of characters in the corpus: 158663\n",
157 | "# Number of unique characters: 148\n",
158 | "# corpus_array.shape: (158663,)"
159 | ]
160 | },
161 | {
162 | "cell_type": "code",
163 | "execution_count": null,
164 | "metadata": {
165 | "id": "mVxZUUtV-XlY",
166 | "outputId": "c0173b1a-b9ec-48ee-ece0-7cc1a5eced90"
167 | },
168 | "outputs": [
169 | {
170 | "name": "stdout",
171 | "output_type": "stream",
172 | "text": [
173 | " ! , . : ; а б в г д е ж з и й к л м н о п р с т у ф х ц ч ш щ ъ ы ь э ю я\n"
174 | ]
175 | }
176 | ],
177 | "source": [
178 | "print (' '.join(chars)) # символы"
179 | ]
180 | },
181 | {
182 | "cell_type": "code",
183 | "execution_count": null,
184 | "metadata": {
185 | "id": "uu7TGnaS-XlY",
186 | "outputId": "b509962b-da6f-4bda-a372-6981d9709134"
187 | },
188 | "outputs": [
189 | {
190 | "name": "stdout",
191 | "output_type": "stream",
192 | "text": [
193 | "{' ': 0, '!': 1, ',': 2, '.': 3, ':': 4, ';': 5, 'а': 6, 'б': 7, 'в': 8, 'г': 9, 'д': 10, 'е': 11, 'ж': 12, 'з': 13, 'и': 14, 'й': 15, 'к': 16, 'л': 17, 'м': 18, 'н': 19, 'о': 20, 'п': 21, 'р': 22, 'с': 23, 'т': 24, 'у': 25, 'ф': 26, 'х': 27, 'ц': 28, 'ч': 29, 'ш': 30, 'щ': 31, 'ъ': 32, 'ы': 33, 'ь': 34, 'э': 35, 'ю': 36, 'я': 37}\n"
194 | ]
195 | }
196 | ],
197 | "source": [
198 | "print (charmap)"
199 | ]
200 | },
201 | {
202 | "cell_type": "code",
203 | "execution_count": null,
204 | "metadata": {
205 | "id": "wF577ue0-XlZ"
206 | },
207 | "outputs": [],
208 | "source": [
209 | "# text -> fixed-length sequences\n",
210 | "# bad version!!!\n",
211 | "\n",
212 | "class TextDataset(Dataset):\n",
213 | "\n",
214 | "    def __init__(self, text, seq_len = 200):\n",
215 | "        n_seq = len(text) // seq_len\n",
216 | "        text = text[:n_seq * seq_len]\n",
217 | "        self.data = torch.tensor(text).view(-1,seq_len)\n",
218 | "\n",
219 | "    def __getitem__(self,i):\n",
220 | "        txt = self.data[i]\n",
221 | "        return txt[:-1], txt[1:] # the targets are the same sequences shifted by 1\n",
222 | "\n",
223 | "    def __len__(self):\n",
224 | "        return self.data.size(0)\n",
225 | "\n",
226 | "# used by DataLoader: turns a list of sequences into a batch\n",
227 | "# result: (seq_len - 1) x batch_size for both inputs and targets\n",
228 | "def collate(seq_list):\n",
229 | "    inputs = torch.cat([s[0].unsqueeze(1) for s in seq_list], dim=1)\n",
230 | "    targets = torch.cat([s[1].unsqueeze(1) for s in seq_list], dim=1)\n",
231 | "    return inputs, targets"
232 | ]
233 | },
234 | {
235 | "cell_type": "code",
236 | "execution_count": null,
237 | "metadata": {
238 | "id": "zW0cIQ3j-XlZ"
239 | },
240 | "outputs": [],
241 | "source": [
242 | "\"\"\"\n",
243 | "a better version\n",
244 | "\"\"\"\n",
245 | "class TextDataset(Dataset):\n",
246 | "\n",
247 | "    def __init__(self, text, seq_len = 200):\n",
248 | "        self.len = len(text) - seq_len + 1\n",
249 | "        self.data = []\n",
250 | "        self.seq_len = seq_len\n",
251 | "        for i in range(self.len):\n",
252 | "            self.data.append(torch.tensor(text[i: i+self.seq_len]))\n",
253 | "\n",
254 | "    def __getitem__(self, i):\n",
255 | "        #line = self.data[i: i+self.seq_len]\n",
256 | "        #line = torch.tensor(line) # this is bad\n",
257 | "        line = self.data[i]\n",
258 | "        return line[:-1].to(DEVICE), line[1:].to(DEVICE)\n",
259 | "\n",
260 | "    def __len__(self):\n",
261 | "        return self.len"
262 | ]
263 | },
264 | {
265 | "cell_type": "code",
266 | "execution_count": null,
267 | "metadata": {
268 | "id": "GD7qZAN--XlZ"
269 | },
270 | "outputs": [],
271 | "source": [
272 | "# Model\n",
273 | "class CharLanguageModel(nn.Module):\n",
274 | "\n",
275 | "    def __init__(self, vocab_size, embed_size, hidden_size, nlayers):\n",
276 | "        super(CharLanguageModel,self).__init__()\n",
277 | "        self.vocab_size = vocab_size\n",
278 | "        self.embed_size = embed_size\n",
279 | "        self.hidden_size = hidden_size\n",
280 | "        self.nlayers = nlayers\n",
281 | "        self.embedding = nn.Embedding(vocab_size,\n",
282 | "                                      embed_size) # Embedding layer\n",
283 | "        self.rnn = nn.LSTM(input_size = embed_size,\n",
284 | "                           hidden_size=hidden_size,\n",
285 | "                           num_layers=nlayers) # Recurrent network\n",
286 | "        self.scoring = nn.Linear(hidden_size, vocab_size) # Projection layer\n",
287 | "\n",
288 | "    def forward(self, seq_batch): # L x N\n",
289 | "        # returns 3D logits\n",
290 | "        batch_size = seq_batch.size(1) # the batch is dimension 1 here\n",
291 | "        embed = self.embedding(seq_batch) # L x N x E\n",
292 | "        hidden = None\n",
293 | "        output_lstm, hidden = self.rnn(embed, hidden) # L x N x H\n",
294 | "        output_lstm_flatten = output_lstm.view(-1, self.hidden_size) # (L*N) x H\n",
295 | "        output_flatten = self.scoring(output_lstm_flatten) #(L*N) x V\n",
296 | "        return output_flatten.view(-1, batch_size, self.vocab_size)\n",
297 | "\n",
298 | "    def generate(self, seq, n_words): # seq: L (1D tensor of character indices)\n",
299 | "        # greedy search: generates n_words characters, one at a time\n",
300 | "        generated_words = []\n",
301 | "        embed = self.embedding(seq).unsqueeze(1) # L x 1 x E\n",
302 | "        hidden = None\n",
303 | "        output_lstm, hidden = self.rnn(embed, hidden) # L x 1 x H\n",
304 | "        output = output_lstm[-1] # 1 x H\n",
305 | "        scores = self.scoring(output) # 1 x V\n",
306 | "        _, current_word = torch.max(scores, dim=1) # 1\n",
307 | "        generated_words.append(current_word)\n",
308 | "        if n_words > 1:\n",
309 | "            for i in range(n_words-1):\n",
310 | "                embed = self.embedding(current_word).unsqueeze(0) # 1 x 1 x E\n",
311 | "                output_lstm, hidden = self.rnn(embed, hidden) # 1 x 1 x H\n",
312 | "                output = output_lstm[0] # 1 x H\n",
313 | "                scores = self.scoring(output) # 1 x V\n",
314 | "                _,current_word = torch.max(scores, dim=1) # 1\n",
315 | "                generated_words.append(current_word)\n",
316 | "        return torch.cat(generated_words, dim=0)"
317 | ]
318 | },
319 | {
320 | "cell_type": "code",
321 | "execution_count": null,
322 | "metadata": {
323 | "id": "pKyS-MEm-XlZ"
324 | },
325 | "outputs": [],
326 | "source": [
327 | "def train_epoch(model, optimizer, train_loader, val_loader):\n",
328 | "    criterion = nn.CrossEntropyLoss()\n",
329 | "    criterion = criterion.to(DEVICE)\n",
330 | "    train_loss = 0\n",
331 | "    before = time.time()\n",
332 | "    print(\"training\", len(train_loader), \"number of batches\")\n",
333 | "    for batch_idx, (inputs,targets) in enumerate(train_loader):\n",
334 | "        if batch_idx == 0:\n",
335 | "            first_time = time.time()\n",
336 | "        inputs = inputs.to(DEVICE)\n",
337 | "        targets = targets.to(DEVICE)\n",
338 | "        outputs = model(inputs) # 3D\n",
339 | "        loss = criterion(outputs.view(-1,outputs.size(2)),targets.view(-1)) # Loss of the flattened outputs\n",
340 | "        optimizer.zero_grad()\n",
341 | "        loss.backward()\n",
342 | "        optimizer.step()\n",
343 | "\n",
344 | "        train_loss += loss.item()\n",
345 | "\n",
346 | "        #if batch_idx == 0:\n",
347 | "        #    print(\"Time elapsed\", time.time() - first_time)\n",
348 | "\n",
349 | "        #if batch_idx % 500 == 0 and batch_idx != 0:\n",
350 | "        #    after = time.time()\n",
351 | "        #    print(\"Time: \", after - before)\n",
352 | "        #    print(\"Loss per word: \", loss.item() / batch_idx)\n",
353 | "        #    print(\"Perplexity: \", np.exp(loss.item() / batch_idx))\n",
354 | "        #    after = before\n",
355 | "\n",
356 | "    train_loss = train_loss / len(train_loader) # mean loss over all training batches\n",
357 | "\n",
358 | "    val_loss = 0\n",
359 | "    batch_id = 0\n",
360 | "    for inputs,targets in val_loader:\n",
361 | "        batch_id += 1\n",
362 | "        inputs = inputs.to(DEVICE)\n",
363 | "        targets = targets.to(DEVICE)\n",
364 | "        outputs = model(inputs)\n",
365 | "        loss = criterion(outputs.view(-1,outputs.size(2)), targets.view(-1))\n",
366 | "        val_loss += loss.item()\n",
367 | "    val_lpw = val_loss / batch_id\n",
368 | "    # print(\"\\nValidation loss per word:\",val_lpw)\n",
369 | "    print(\"Train perplexity :\", np.exp(train_loss))\n",
370 | "    print(\"Validation perplexity :\", np.exp(val_loss / batch_id))\n",
371 | "    return val_lpw"
372 | ]
373 | },
374 | {
375 | "cell_type": "code",
376 | "execution_count": null,
377 | "metadata": {
378 | "id": "cc9E1D1q-XlZ"
379 | },
380 | "outputs": [],
381 | "source": [
382 | "#model = CharLanguageModel(charcount, 256, 256,3)\n",
383 | "model = CharLanguageModel(vocab_size=charcount,\n",
384 | " embed_size=256,\n",
385 | " hidden_size=256,\n",
386 | " nlayers=2)\n",
387 | "model = model.to(DEVICE)\n",
388 | "optimizer = torch.optim.Adam(model.parameters(),\n",
389 | " lr=0.01, weight_decay=1e-7) #, # lr=0.001,\n",
390 | " #weight_decay=1e-6)\n",
391 | "\n",
392 | "split = 120000\n",
393 | "train_dataset = TextDataset(corpus_array[:split], seq_len=100)\n",
394 | "val_dataset = TextDataset(corpus_array[split:], seq_len=100)\n",
395 | "train_loader = DataLoader(train_dataset, shuffle=True, batch_size=64, collate_fn = collate)\n",
396 | "val_loader = DataLoader(val_dataset, shuffle=False, batch_size=64, collate_fn = collate, drop_last=True)"
397 | ]
398 | },
399 | {
400 | "cell_type": "code",
401 | "execution_count": null,
402 | "metadata": {
403 | "id": "elQpQNEe-XlZ"
404 | },
405 | "outputs": [],
406 | "source": [
407 | "#train_dataset = TextDataset(shakespeare_array, seq_len = 10)\n",
408 | "#train_loader = DataLoader(train_dataset, shuffle=True, batch_size=1, collate_fn = collate)"
409 | ]
410 | },
411 | {
412 | "cell_type": "code",
413 | "execution_count": null,
414 | "metadata": {
415 | "id": "NSGHOPo--XlZ",
416 | "outputId": "c1131169-c035-4748-8335-db1ee0b16308"
417 | },
418 | "outputs": [
419 | {
420 | "data": {
421 | "text/plain": [
422 | "(119901, 27125)"
423 | ]
424 | },
425 | "execution_count": 16,
426 | "metadata": {},
427 | "output_type": "execute_result"
428 | }
429 | ],
430 | "source": [
431 | "train_dataset.__len__(), val_dataset.__len__()\n",
432 | "# (1200, 386)"
433 | ]
434 | },
435 | {
436 | "cell_type": "code",
437 | "execution_count": null,
438 | "metadata": {
439 | "id": "2hlve73c-XlZ"
440 | },
441 | "outputs": [],
442 | "source": [
443 | "# i = 0\n",
444 | "# for i1, i2 in val_loader:\n",
445 | "# i = i + 1\n",
446 | "# if (i>10): break;\n",
447 | "# # print(i1.shape, i2.shape)\n",
448 | "# print(i1, i2)\n",
449 | "\n",
450 | "# 128 3 0.01\n",
451 | "# Train perplexity : 1.708431299024099\n",
452 | "# Validation perplexity : 68.95001547975727\n",
453 | "\n",
454 | "# 256 3 lr=0.01, weight_decay=1e-7\n",
455 | "# Train perplexity : 1.6952816075689339\n",
456 | "# Validation perplexity : 55.76105459243429\n",
457 | "\n",
458 | "# 256 2 lr=0.01, weight_decay=1e-7\n",
459 | "# Train perplexity : 1.9566967733083707\n",
460 | "# Validation perplexity : 32.95505913167213"
461 | ]
462 | },
463 | {
464 | "cell_type": "code",
465 | "execution_count": null,
466 | "metadata": {
467 | "id": "vc_pd3kk-XlZ",
468 | "outputId": "bf04fc1a-d0c7-4eed-bbdb-448cc57f9fb0"
469 | },
470 | "outputs": [
471 | {
472 | "name": "stdout",
473 | "output_type": "stream",
474 | "text": [
475 | "training 1874 number of batches\n",
476 | "Train perplexity : 3.100737592348649\n",
477 | "Validation perplexity : 20.92226048805598\n",
478 | "training 1874 number of batches\n",
479 | "Train perplexity : 1.68108752638697\n",
480 | "Validation perplexity : 44.035060143245296\n"
481 | ]
482 | }
483 | ],
484 | "source": [
485 | "for i in range(2):\n",
486 | " train_epoch(model=model,\n",
487 | " optimizer=optimizer,\n",
488 | " train_loader=train_loader,\n",
489 | " val_loader=val_loader)"
490 | ]
491 | },
492 | {
493 | "cell_type": "code",
494 | "execution_count": null,
495 | "metadata": {
496 | "id": "0WtciWCT-XlZ"
497 | },
498 | "outputs": [],
499 | "source": [
500 | "def generate(model, seed, nwords):\n",
501 | "    seq = map_corpus(seed, charmap)\n",
502 | "    seq = torch.tensor(seq).to(DEVICE)\n",
503 | "    out = model.generate(seq, nwords)\n",
504 | "    return to_text(out.cpu().detach().numpy(), chars)"
505 | ]
506 | },
507 | {
508 | "cell_type": "code",
509 | "execution_count": null,
510 | "metadata": {
511 | "id": "hciqG-mN-XlZ",
512 | "outputId": "07b643a8-af9b-4178-c80a-9bbe99ba9c02"
513 | },
514 | "outputs": [
515 | {
516 | "name": "stdout",
517 | "output_type": "stream",
518 | "text": [
519 | " значит видеть свет! где ж лучше гордым и после важно повторять одно, стараться вас задригалы иль пр\n"
520 | ]
521 | }
522 | ],
523 | "source": [
524 | "print(generate(model, \"онегин встал и подошел, сказав, что\", 100))"
525 | ]
526 | },
527 | {
528 | "cell_type": "code",
529 | "execution_count": null,
530 | "metadata": {
531 | "id": "m8Q05TkI-XlZ",
532 | "outputId": "e1bb8d44-8c2e-4830-e58b-da67d38f0450"
533 | },
534 | "outputs": [
535 | {
536 | "name": "stdout",
537 | "output_type": "stream",
538 | "text": [
539 | " занеможественной мечты, поэзии живой и кума ему не наши внемя. тут же полуравет, и вдруг нетвал, пр\n"
540 | ]
541 | }
542 | ],
543 | "source": [
544 | "print(generate(model, \"мой дядя самых честных правил, когда не в шутку\", 100))"
545 | ]
546 | },
547 | {
548 | "cell_type": "code",
549 | "execution_count": null,
550 | "metadata": {
551 | "id": "6qtp4tIl-Xla"
552 | },
553 | "outputs": [],
554 | "source": []
555 | },
556 | {
557 | "cell_type": "code",
558 | "execution_count": null,
559 | "metadata": {
560 | "id": "KE139E0x-Xla"
561 | },
562 | "outputs": [],
563 | "source": [
564 | "seq = map_corpus(corpus[500:530], charmap) # \"Высоких дум и простоты\"\n",
565 | "seq = torch.tensor(seq).to(DEVICE)\n",
566 | "out = model.generate(seq, 100)\n"
567 | ]
568 | },
569 | {
570 | "cell_type": "code",
571 | "execution_count": null,
572 | "metadata": {
573 | "id": "RAR_fVDY-Xla",
574 | "outputId": "59ce5e54-9618-45e4-8c85-dee9003708d0"
575 | },
576 | "outputs": [
577 | {
578 | "data": {
579 | "text/plain": [
580 | "'александр сергеевич пушкин евгений онегин роман в стихах ,. проникнутый тщеславием, он обладал сверх того еще особенной гордостью, которая побуждает признаваться с одинаковым равнодушием в своих как добрых, так и дурных поступках, следствие чувства превосходства, быть может мнимого.из частного письма фр. мысля гордый свет забавить, вниманье дружбы возлюбя, хотел бы я тебе представить залог достойнее тебя, достойнее души прекрасной, святой исполненной мечты, поэзии живой и ясной, высоких дум и простоты; но так и быть рукой пристрастной прими собранье пестрых глав, полусмешных, полупечальных, простонародных, идеальных, небрежный плод моих забав, бессонниц, легких вдохновений, незрелых и увядших лет, ума холодных наблюдений и сердца горестных замет. глава первая и жить торопится, и чувствовать спешит. князь вяземский эпиграф взят из стихотворения п. а. вяземского первый снег. мой дядя самых честных правил, когда не в шутку занемог, он уважать себя заставил и лучше выдумать не мог. его при'"
581 | ]
582 | },
583 | "execution_count": 89,
584 | "metadata": {},
585 | "output_type": "execute_result"
586 | }
587 | ],
588 | "source": [
589 | "corpus[:1000] # 500:1530]"
590 | ]
591 | },
592 | {
593 | "cell_type": "code",
594 | "execution_count": null,
595 | "metadata": {
596 | "id": "DEk6k6WM-Xla",
597 | "outputId": "0df0524a-a0a4-49bb-b9a0-ea81771af85f"
598 | },
599 | "outputs": [
600 | {
601 | "data": {
602 | "text/plain": [
603 | "tensor([21, 20, 23, 24, 6, 17, 34, 19, 20, 0, 21, 20, 23, 24, 6, 17, 34, 19,\n",
604 | " 20, 0, 21, 20, 23, 24, 6, 17, 34, 19, 20, 0, 21, 20, 23, 24, 6, 17,\n",
605 | " 34, 19, 20, 0, 21, 20, 23, 24, 6, 17, 34, 19, 20, 0, 21, 20, 23, 24,\n",
606 | " 6, 17, 34, 19, 20, 0, 21, 20, 23, 24, 6, 17, 34, 19, 20, 0, 21, 20,\n",
607 | " 23, 24, 6, 17, 34, 19, 20, 0, 21, 20, 23, 24, 6, 17, 34, 19, 20, 0,\n",
608 | " 21, 20, 23, 24, 6, 17, 34, 19, 20, 0], device='cuda:0')"
609 | ]
610 | },
611 | "execution_count": 23,
612 | "metadata": {},
613 | "output_type": "execute_result"
614 | }
615 | ],
616 | "source": [
617 | "out"
618 | ]
619 | },
620 | {
621 | "cell_type": "code",
622 | "execution_count": null,
623 | "metadata": {
624 | "id": "PltYvTZW-Xla",
625 | "outputId": "6350a55a-3af4-4dbd-ca9f-532f76f099bf"
626 | },
627 | "outputs": [
628 | {
629 | "data": {
630 | "text/plain": [
631 | "148"
632 | ]
633 | },
634 | "execution_count": 72,
635 | "metadata": {},
636 | "output_type": "execute_result"
637 | }
638 | ],
639 | "source": [
640 | "charcount"
641 | ]
642 | }
643 | ],
644 | "metadata": {
645 | "kernelspec": {
646 | "display_name": "Python 3",
647 | "language": "python",
648 | "name": "python3"
649 | },
650 | "language_info": {
651 | "codemirror_mode": {
652 | "name": "ipython",
653 | "version": 3
654 | },
655 | "file_extension": ".py",
656 | "mimetype": "text/x-python",
657 | "name": "python",
658 | "nbconvert_exporter": "python",
659 | "pygments_lexer": "ipython3",
660 | "version": "3.8.8"
661 | },
662 | "colab": {
663 | "provenance": [],
664 | "include_colab_link": true
665 | }
666 | },
667 | "nbformat": 4,
668 | "nbformat_minor": 0
669 | }
--------------------------------------------------------------------------------
/dj_fast_preprocessing.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "## Experiments on speeding up data processing in Python\n",
8 | "\n",
9 | "* 2019, Alexander Dyakonov (https://dyakonov.org/ag/)\n",
10 | "\n",
11 | "Partly based on materials by\n",
12 | "\n",
13 | "* Gleb Maslyakov https://nbviewer.jupyter.org/github/glebmaslyak/PZAD_Homeworks/blob/student/PZAD_feature_preprocessing_hw.ipynb\n",
14 | "\n",
15 | "* Denis Bibik https://nbviewer.jupyter.org/github/den-bibik/PZAD/blob/master/optimize.ipynb"
16 | ]
17 | },
18 | {
19 | "cell_type": "code",
20 | "execution_count": 136,
21 | "metadata": {},
22 | "outputs": [],
23 | "source": [
24 | "import pandas as pd\n",
25 | "import numpy as np"
26 | ]
27 | },
28 | {
29 | "cell_type": "markdown",
30 | "metadata": {},
31 | "source": [
32 | "### Task 1: remove the dollar sign"
33 | ]
34 | },
35 | {
36 | "cell_type": "code",
37 | "execution_count": 137,
38 | "metadata": {},
39 | "outputs": [
40 | {
41 | "data": {
42 | "text/html": [
43 | "
\n",
44 | "\n",
57 | "
\n",
58 | " \n",
59 | " \n",
60 | " | \n",
61 | " price | \n",
62 | " feature | \n",
63 | "
\n",
64 | " \n",
65 | " \n",
66 | " \n",
67 | " 0 | \n",
68 | " 17$ | \n",
69 | " 0.0 | \n",
70 | "
\n",
71 | " \n",
72 | " 1 | \n",
73 | " 89$ | \n",
74 | " 0.0 | \n",
75 | "
\n",
76 | " \n",
77 | " 2 | \n",
78 | " 39$ | \n",
79 | " 0.0 | \n",
80 | "
\n",
81 | " \n",
82 | " 3 | \n",
83 | " 97$ | \n",
84 | " 0.0 | \n",
85 | "
\n",
86 | " \n",
87 | " 4 | \n",
88 | " 23$ | \n",
89 | " 0.0 | \n",
90 | "
\n",
91 | " \n",
92 | "
\n",
93 | "
"
94 | ],
95 | "text/plain": [
96 | " price feature\n",
97 | "0 17$ 0.0\n",
98 | "1 89$ 0.0\n",
99 | "2 39$ 0.0\n",
100 | "3 97$ 0.0\n",
101 | "4 23$ 0.0"
102 | ]
103 | },
104 | "execution_count": 137,
105 | "metadata": {},
106 | "output_type": "execute_result"
107 | }
108 | ],
109 | "source": [
110 | "def make_s(n_rows):\n",
111 | "    tmp = pd.DataFrame({'price': (100*np.random.rand(n_rows)).astype(int), 'feature': np.zeros(n_rows)})\n",
112 | "    tmp['price'] = tmp['price'].astype(str) + '$'\n",
113 | "    return tmp\n",
114 | "\n",
115 | "data = make_s(5)\n",
116 | "data"
117 | ]
118 | },
119 | {
120 | "cell_type": "code",
121 | "execution_count": 139,
122 | "metadata": {},
123 | "outputs": [
124 | {
125 | "data": {
203 | "text/plain": [
204 | " price feature price($)_v1 price($)_v2 price($)_v3 price($)_v4\n",
205 | "0 17$ 0.0 17 17 17 17\n",
206 | "1 89$ 0.0 89 89 89 89\n",
207 | "2 39$ 0.0 39 39 39 39\n",
208 | "3 97$ 0.0 97 97 97 97\n",
209 | "4 23$ 0.0 23 23 23 23"
210 | ]
211 | },
212 | "execution_count": 139,
213 | "metadata": {},
214 | "output_type": "execute_result"
215 | }
216 | ],
217 | "source": [
218 | "data['price($)_v1'] = data['price'].apply(lambda x: int(x[:-1]))\n",
219 | "data['price($)_v2'] = data['price'].apply(lambda x: x[:-1]).astype(int)\n",
220 | "data['price($)_v3'] = data['price'].apply(lambda x: x.replace('$', '')).astype(int)\n",
221 | "data['price($)_v4'] = data['price'].str.replace('$', '', regex=False).astype(int)\n",
222 | "data"
223 | ]
224 | },
225 | {
226 | "cell_type": "code",
227 | "execution_count": 140,
228 | "metadata": {},
229 | "outputs": [],
230 | "source": [
231 | "data = make_s(10000000)"
232 | ]
233 | },
234 | {
235 | "cell_type": "code",
236 | "execution_count": 141,
237 | "metadata": {},
238 | "outputs": [
239 | {
240 | "name": "stdout",
241 | "output_type": "stream",
242 | "text": [
243 | "CPU times: user 4.26 s, sys: 60 ms, total: 4.32 s\n",
244 | "Wall time: 4.36 s\n"
245 | ]
246 | }
247 | ],
248 | "source": [
249 | "%%time\n",
250 | "data['price($)_v1'] = data['price'].apply(lambda x: int(x[:-1]))\n",
251 | "# 4.2-4.33"
252 | ]
253 | },
254 | {
255 | "cell_type": "code",
256 | "execution_count": 147,
257 | "metadata": {},
258 | "outputs": [
259 | {
260 | "name": "stdout",
261 | "output_type": "stream",
262 | "text": [
263 | "CPU times: user 2.5 s, sys: 124 ms, total: 2.62 s\n",
264 | "Wall time: 2.65 s\n"
265 | ]
266 | }
267 | ],
268 | "source": [
269 | "%%time\n",
270 | "data['price($)_v2'] = data['price'].apply(lambda x: x[:-1]).astype(int)\n",
271 | "# 2.47 s - 2.52 s"
272 | ]
273 | },
274 | {
275 | "cell_type": "code",
276 | "execution_count": 144,
277 | "metadata": {},
278 | "outputs": [
279 | {
280 | "name": "stdout",
281 | "output_type": "stream",
282 | "text": [
283 | "CPU times: user 3.19 s, sys: 152 ms, total: 3.34 s\n",
284 | "Wall time: 3.35 s\n"
285 | ]
286 | }
287 | ],
288 | "source": [
289 | "%%time\n",
290 | "data['price($)_v3'] = data['price'].apply(lambda x: x.replace('$', '')).astype(int) \n",
291 | "# 3.14 - 3.31"
292 | ]
293 | },
294 | {
295 | "cell_type": "code",
296 | "execution_count": 145,
297 | "metadata": {},
298 | "outputs": [
299 | {
300 | "name": "stdout",
301 | "output_type": "stream",
302 | "text": [
303 | "CPU times: user 3.43 s, sys: 164 ms, total: 3.6 s\n",
304 | "Wall time: 3.63 s\n"
305 | ]
306 | }
307 | ],
308 | "source": [
309 | "%%time\n",
310 | "data['price($)_v4'] = data['price'].str.replace('$', '', regex=False).astype(int)\n",
311 | "# 3.43 - 4"
312 | ]
313 | },
314 | {
315 | "cell_type": "markdown",
316 | "metadata": {},
317 | "source": [
318 | "### Task: binarize"
319 | ]
320 | },
321 | {
322 | "cell_type": "code",
323 | "execution_count": 150,
324 | "metadata": {},
325 | "outputs": [
326 | {
327 | "data": {
381 | "text/plain": [
382 | " type feature\n",
383 | "0 A 0.0\n",
384 | "1 B 0.0\n",
385 | "2 A 0.0\n",
386 | "3 A 0.0\n",
387 | "4 A 0.0"
388 | ]
389 | },
390 | "execution_count": 150,
391 | "metadata": {},
392 | "output_type": "execute_result"
393 | }
394 | ],
395 | "source": [
396 | "def make_t(n_rows):\n",
397 | "    tmp = pd.DataFrame({'type': np.where(np.random.rand(n_rows)<0.5, 'A', 'B'), 'feature': np.zeros(n_rows)})\n",
398 | "    return tmp\n",
399 | "\n",
400 | "data = make_t(5)\n",
401 | "data"
402 | ]
403 | },
404 | {
405 | "cell_type": "code",
406 | "execution_count": 155,
407 | "metadata": {},
408 | "outputs": [
409 | {
410 | "data": {
506 | "text/plain": [
507 | " type feature type_v1 type_v2 type_v3 type_v4 type_v5 type_v6 type_v7\n",
508 | "0 A 0.0 1 1 1 1 0 1 0\n",
509 | "1 B 0.0 0 0 0 0 1 0 1\n",
510 | "2 A 0.0 1 1 1 1 0 1 0\n",
511 | "3 A 0.0 1 1 1 1 0 1 0\n",
512 | "4 A 0.0 1 1 1 1 0 1 0"
513 | ]
514 | },
515 | "execution_count": 155,
516 | "metadata": {},
517 | "output_type": "execute_result"
518 | }
519 | ],
520 | "source": [
521 | "data['type_v1'] = data['type'].apply(lambda x: 1 if x == \"A\" else 0)\n",
522 | "data['type_v2'] = (data['type']=='A').astype(int)\n",
523 | "data['type_v3'] = np.where(data['type'] == 'A', 1, 0)\n",
524 | "data['type_v4'] = data['type'].map({'A': 1, 'B': 0})\n",
525 | "data['type_v5'] = data['type'].factorize()[0] # incorrect result: codes follow the order of first appearance, so A is not guaranteed to map to 1\n",
526 | "data['type_v6'] = pd.get_dummies(data['type'])['A'] # uint8!!!\n",
527 | "from sklearn import preprocessing\n",
528 | "data['type_v7'] = preprocessing.LabelEncoder().fit_transform(data['type']) # incorrect result: classes are sorted, so A -> 0 and B -> 1\n",
529 | "\n",
530 | "data"
531 | ]
532 | },
533 | {
534 | "cell_type": "code",
535 | "execution_count": 156,
536 | "metadata": {},
537 | "outputs": [],
538 | "source": [
539 | "data = make_t(10000000)"
540 | ]
541 | },
542 | {
543 | "cell_type": "code",
544 | "execution_count": 157,
545 | "metadata": {},
546 | "outputs": [
547 | {
548 | "name": "stdout",
549 | "output_type": "stream",
550 | "text": [
551 | "CPU times: user 2.14 s, sys: 72 ms, total: 2.22 s\n",
552 | "Wall time: 2.25 s\n"
553 | ]
554 | }
555 | ],
556 | "source": [
557 | "%%time\n",
558 | "data['type_v1'] = data['type'].apply(lambda x: 1 if x == \"A\" else 0)\n",
559 | "# 2.14 s - 2.18 s"
560 | ]
561 | },
562 | {
563 | "cell_type": "code",
564 | "execution_count": 159,
565 | "metadata": {},
566 | "outputs": [
567 | {
568 | "name": "stdout",
569 | "output_type": "stream",
570 | "text": [
571 | "CPU times: user 1.82 s, sys: 60 ms, total: 1.88 s\n",
572 | "Wall time: 1.89 s\n"
573 | ]
574 | }
575 | ],
576 | "source": [
577 | "%%time\n",
578 | "data['type_v1'] = data['type'].apply(lambda x: \"1\" if x == \"A\" else \"0\").astype(int)\n",
579 | "# 1.82 s - 1.86 s"
580 | ]
581 | },
582 | {
583 | "cell_type": "code",
584 | "execution_count": 160,
585 | "metadata": {},
586 | "outputs": [
587 | {
588 | "name": "stdout",
589 | "output_type": "stream",
590 | "text": [
591 | "CPU times: user 364 ms, sys: 20 ms, total: 384 ms\n",
592 | "Wall time: 386 ms\n"
593 | ]
594 | }
595 | ],
596 | "source": [
597 | "%%time\n",
598 | "data['type_v2'] = (data['type']=='A').astype(int)\n",
599 | "# 348 - 364"
600 | ]
601 | },
602 | {
603 | "cell_type": "code",
604 | "execution_count": 161,
605 | "metadata": {},
606 | "outputs": [
607 | {
608 | "name": "stdout",
609 | "output_type": "stream",
610 | "text": [
611 | "CPU times: user 376 ms, sys: 20 ms, total: 396 ms\n",
612 | "Wall time: 398 ms\n"
613 | ]
614 | }
615 | ],
616 | "source": [
617 | "%%time\n",
618 | "data['type_v3'] = np.where(data['type'] == 'A', 1 ,0)\n",
619 | "# 380-398"
620 | ]
621 | },
622 | {
623 | "cell_type": "code",
624 | "execution_count": 162,
625 | "metadata": {},
626 | "outputs": [
627 | {
628 | "name": "stdout",
629 | "output_type": "stream",
630 | "text": [
631 | "CPU times: user 420 ms, sys: 16 ms, total: 436 ms\n",
632 | "Wall time: 443 ms\n"
633 | ]
634 | }
635 | ],
636 | "source": [
637 | "%%time\n",
638 | "data['type_v4'] = data['type'].map({'A': 1, 'B': 0})\n",
639 | "# 400-424"
640 | ]
641 | },
642 | {
643 | "cell_type": "code",
644 | "execution_count": 163,
645 | "metadata": {},
646 | "outputs": [
647 | {
648 | "name": "stdout",
649 | "output_type": "stream",
650 | "text": [
651 | "CPU times: user 304 ms, sys: 36 ms, total: 340 ms\n",
652 | "Wall time: 357 ms\n"
653 | ]
654 | }
655 | ],
656 | "source": [
657 | "%%time\n",
658 | "data['type_v5'] = data['type'].factorize()[0]\n",
659 | "# 304-324"
660 | ]
661 | },
662 | {
663 | "cell_type": "code",
664 | "execution_count": 164,
665 | "metadata": {},
666 | "outputs": [
667 | {
668 | "name": "stdout",
669 | "output_type": "stream",
670 | "text": [
671 | "CPU times: user 364 ms, sys: 28 ms, total: 392 ms\n",
672 | "Wall time: 395 ms\n"
673 | ]
674 | }
675 | ],
676 | "source": [
677 | "%%time\n",
678 | "data['type_v6'] = pd.get_dummies(data['type'])['A']\n",
679 | "# 360-392"
680 | ]
681 | },
682 | {
683 | "cell_type": "code",
684 | "execution_count": 168,
685 | "metadata": {},
686 | "outputs": [
687 | {
688 | "name": "stdout",
689 | "output_type": "stream",
690 | "text": [
691 | "CPU times: user 5.5 s, sys: 36 ms, total: 5.53 s\n",
692 | "Wall time: 5.52 s\n"
693 | ]
694 | }
695 | ],
696 | "source": [
697 | "%%time\n",
698 | "from sklearn import preprocessing\n",
699 | "data['type_v7'] = preprocessing.LabelEncoder().fit_transform(data['type']) # incorrect result: classes are sorted, so A -> 0 and B -> 1\n",
700 | "# 5.47 s - 5.57 s"
701 | ]
702 | },
703 | {
704 | "cell_type": "markdown",
705 | "metadata": {},
706 | "source": [
707 | "### Task: split"
708 | ]
709 | },
710 | {
711 | "cell_type": "code",
712 | "execution_count": 169,
713 | "metadata": {},
714 | "outputs": [
715 | {
716 | "data": {
770 | "text/plain": [
771 | " feature A/B\n",
772 | "0 0.0 23/60\n",
773 | "1 0.0 65/76\n",
774 | "2 0.0 66/53\n",
775 | "3 0.0 57/53\n",
776 | "4 0.0 85/18"
777 | ]
778 | },
779 | "execution_count": 169,
780 | "metadata": {},
781 | "output_type": "execute_result"
782 | }
783 | ],
784 | "source": [
785 | "def make_ab(n_rows):\n",
786 | "    tmp = pd.DataFrame({'A': (100*np.random.rand(n_rows)).astype(int), 'B': (100*np.random.rand(n_rows)).astype(int), 'feature': np.zeros(n_rows)})\n",
787 | "    tmp['A/B'] = tmp['A'].astype(str) + '/' + tmp['B'].astype(str)\n",
788 | "    del tmp['A']\n",
789 | "    del tmp['B']\n",
790 | "    return tmp\n",
791 | "\n",
792 | "data = make_ab(5)\n",
793 | "data"
794 | ]
795 | },
796 | {
797 | "cell_type": "code",
798 | "execution_count": 171,
799 | "metadata": {},
800 | "outputs": [
801 | {
802 | "data": {
904 | "text/plain": [
905 | " feature A/B A_v1 B_v1 A_v2 B_v2 A_v3 B_v3 A_v4 B_v4\n",
906 | "0 0.0 23/60 23 60 23 60 23 60 23 60\n",
907 | "1 0.0 65/76 65 76 65 76 65 76 65 76\n",
908 | "2 0.0 66/53 66 53 66 53 66 53 66 53\n",
909 | "3 0.0 57/53 57 53 57 53 57 53 57 53\n",
910 | "4 0.0 85/18 85 18 85 18 85 18 85 18"
911 | ]
912 | },
913 | "execution_count": 171,
914 | "metadata": {},
915 | "output_type": "execute_result"
916 | }
917 | ],
918 | "source": [
919 | "tmp = data['A/B'].str.split('/')\n",
920 | "data['A_v1'] = tmp.apply(lambda x: x[0])\n",
921 | "data['B_v1'] = tmp.apply(lambda x: x[1])\n",
922 | "\n",
923 | "data[['A_v2', 'B_v2']] = pd.DataFrame(data['A/B'].str.split('/', n=1).tolist())\n",
924 | "\n",
925 | "data[['A_v3', 'B_v3']] = data['A/B'].str.split('/', expand=True)\n",
926 | "\n",
927 | "st = '/'.join(data['A/B'])\n",
928 | "data[['A_v4', 'B_v4']] = pd.DataFrame(np.array(st.split('/')).reshape(-1, 2))\n",
929 | "\n",
930 | "data"
931 | ]
932 | },
933 | {
934 | "cell_type": "code",
935 | "execution_count": 172,
936 | "metadata": {},
937 | "outputs": [],
938 | "source": [
939 | "data = make_ab(10000000)"
940 | ]
941 | },
942 | {
943 | "cell_type": "code",
944 | "execution_count": 177,
945 | "metadata": {},
946 | "outputs": [
947 | {
948 | "name": "stdout",
949 | "output_type": "stream",
950 | "text": [
951 | "CPU times: user 13.1 s, sys: 1.51 s, total: 14.6 s\n",
952 | "Wall time: 14.6 s\n"
953 | ]
954 | }
955 | ],
956 | "source": [
957 | "%%time\n",
958 | "tmp = data['A/B'].str.split('/')\n",
959 | "data['A_v1'] = tmp.apply(lambda x: x[0])\n",
960 | "data['B_v1'] = tmp.apply(lambda x: x[1])\n",
961 | "# 12.5 s-13.4 s"
962 | ]
963 | },
964 | {
965 | "cell_type": "code",
966 | "execution_count": 174,
967 | "metadata": {},
968 | "outputs": [
969 | {
970 | "name": "stdout",
971 | "output_type": "stream",
972 | "text": [
973 | "CPU times: user 9.94 s, sys: 176 ms, total: 10.1 s\n",
974 | "Wall time: 10.2 s\n"
975 | ]
976 | }
977 | ],
978 | "source": [
979 | "%%time\n",
980 | "data[['A_v2', 'B_v2']] = pd.DataFrame(data['A/B'].str.split('/', n=1).tolist())\n",
981 | "# 10.2 s - 12.2 s"
982 | ]
983 | },
984 | {
985 | "cell_type": "code",
986 | "execution_count": 175,
987 | "metadata": {},
988 | "outputs": [
989 | {
990 | "name": "stdout",
991 | "output_type": "stream",
992 | "text": [
993 | "CPU times: user 26.1 s, sys: 368 ms, total: 26.5 s\n",
994 | "Wall time: 26.5 s\n"
995 | ]
996 | }
997 | ],
998 | "source": [
999 | "%%time\n",
1000 | "data[['A_v3', 'B_v3']] = data['A/B'].str.split('/', expand=True)\n",
1001 | "# 26.1 s - 29.2 s"
1002 | ]
1003 | },
1004 | {
1005 | "cell_type": "code",
1006 | "execution_count": 176,
1007 | "metadata": {},
1008 | "outputs": [
1009 | {
1010 | "name": "stdout",
1011 | "output_type": "stream",
1012 | "text": [
1013 | "CPU times: user 3.65 s, sys: 168 ms, total: 3.82 s\n",
1014 | "Wall time: 3.84 s\n"
1015 | ]
1016 | }
1017 | ],
1018 | "source": [
1019 | "%%time\n",
1020 | "st = '/'.join(data['A/B'])\n",
1021 | "data[['A_v4', 'B_v4']] = pd.DataFrame(np.array(st.split('/')).reshape(-1, 2))\n",
1022 | "# 3.65 s - 4.54 s"
1023 | ]
1024 | },
1025 | {
1026 | "cell_type": "markdown",
1027 | "metadata": {},
1028 | "source": [
1029 | "### Task: fill missing values with the mean"
1030 | ]
1031 | },
1032 | {
1033 | "cell_type": "code",
1034 | "execution_count": 178,
1035 | "metadata": {},
1036 | "outputs": [
1037 | {
1038 | "data": {
1161 | "text/plain": [
1162 | " type feature feature_v1 feature_v2 feature_v3 feature_v4\n",
1163 | "0 test NaN NaN NaN NaN NaN\n",
1164 | "1 test 43.0 43.0 43.0 43.0 43.0\n",
1165 | "2 test NaN NaN NaN NaN NaN\n",
1166 | "3 train 4.0 4.0 4.0 4.0 4.0\n",
1167 | "4 train 18.0 18.0 18.0 18.0 18.0\n",
1168 | "5 train NaN NaN NaN NaN NaN\n",
1169 | "6 train NaN NaN NaN NaN NaN\n",
1170 | "7 train 25.0 25.0 25.0 25.0 25.0\n",
1171 | "8 train NaN NaN NaN NaN NaN\n",
1172 | "9 train NaN NaN NaN NaN NaN"
1173 | ]
1174 | },
1175 | "execution_count": 178,
1176 | "metadata": {},
1177 | "output_type": "execute_result"
1178 | }
1179 | ],
1180 | "source": [
1181 | "def make_t(n_rows):\n",
1182 | "    tmp = pd.DataFrame({'type': np.where(np.random.rand(n_rows)<0.5, 'train', 'test'),\n",
1183 | "                        'feature': np.where(np.random.rand(n_rows)<0.5, (100*np.random.rand(n_rows)).astype(int), np.nan)})\n",
1184 | "    tmp['feature_v1'] = tmp['feature']\n",
1185 | "    tmp['feature_v2'] = tmp['feature']\n",
1186 | "    tmp['feature_v3'] = tmp['feature']\n",
1187 | "    tmp['feature_v4'] = tmp['feature']\n",
1188 | "    return tmp\n",
1189 | "\n",
1190 | "data = make_t(10)\n",
1191 | "data"
1192 | ]
1193 | },
1194 | {
1195 | "cell_type": "code",
1196 | "execution_count": 182,
1197 | "metadata": {},
1198 | "outputs": [
1199 | {
1200 | "data": {
1323 | "text/plain": [
1324 | " type feature feature_v1 feature_v2 feature_v3 feature_v4\n",
1325 | "0 test NaN 43.000000 43.000000 43.000000 43.000000\n",
1326 | "1 test 43.0 43.000000 43.000000 43.000000 43.000000\n",
1327 | "2 test NaN 43.000000 43.000000 43.000000 43.000000\n",
1328 | "3 train 4.0 4.000000 4.000000 4.000000 4.000000\n",
1329 | "4 train 18.0 18.000000 18.000000 18.000000 18.000000\n",
1330 | "5 train NaN 15.666667 15.666667 15.666667 15.666667\n",
1331 | "6 train NaN 15.666667 15.666667 15.666667 15.666667\n",
1332 | "7 train 25.0 25.000000 25.000000 25.000000 25.000000\n",
1333 | "8 train NaN 15.666667 15.666667 15.666667 15.666667\n",
1334 | "9 train NaN 15.666667 15.666667 15.666667 15.666667"
1335 | ]
1336 | },
1337 | "execution_count": 182,
1338 | "metadata": {},
1339 | "output_type": "execute_result"
1340 | }
1341 | ],
1342 | "source": [
1343 | "name = 'feature_v1'\n",
1344 | "data.loc[data['type'] == 'test', name] = \\\n",
1345 | " data[data['type'] == 'test'][name].fillna(data[data['type'] == 'test'][name].mean())\n",
1346 | "data.loc[data['type'] == 'train', name] = \\\n",
1347 | " data[data['type'] == 'train'][name].fillna(data[data['type'] == 'train'][name].mean())\n",
1348 | "\n",
1349 | "name = 'feature_v2'\n",
1350 | "data[name] = data.groupby('type')[name].transform(lambda x: x.fillna(x.mean()))\n",
1351 | "\n",
1352 | "name = 'feature_v3'\n",
1353 | "data.loc[data[name].isnull(), name] = data.groupby('type')[name].transform('mean')\n",
1354 | "\n",
1355 | "name = 'feature_v4'\n",
1356 | "data[name] = np.where(data[name].isnull(), data['type'].map(data.groupby('type')[name].mean()), data[name])\n",
1357 | "\n",
1358 | "#name = 'feature_v4'\n",
1359 | "#gb = data.groupby('type')\n",
1360 | "#mn = gb.mean()\n",
1361 | "#for gn, x in gb:\n",
1362 | "# x[name].fillna(mn.loc[gn], inplace=True)\n",
1363 | "\n",
1364 | "data"
1365 | ]
1366 | },
1367 | {
1368 | "cell_type": "code",
1369 | "execution_count": 191,
1370 | "metadata": {},
1371 | "outputs": [],
1372 | "source": [
1373 | "data = make_t(10000000)"
1374 | ]
1375 | },
1376 | {
1377 | "cell_type": "code",
1378 | "execution_count": 195,
1379 | "metadata": {},
1380 | "outputs": [
1381 | {
1382 | "name": "stdout",
1383 | "output_type": "stream",
1384 | "text": [
1385 | "CPU times: user 3.66 s, sys: 132 ms, total: 3.79 s\n",
1386 | "Wall time: 3.81 s\n"
1387 | ]
1388 | }
1389 | ],
1390 | "source": [
1391 | "%%time\n",
1392 | "\n",
1393 | "name = 'feature_v1'\n",
1394 | "data.loc[data['type'] == 'test', name] = data[data['type'] == 'test'][name].fillna(data[data['type'] == 'test'][name].mean())\n",
1395 | "data.loc[data['type'] == 'train', name] = data[data['type'] == 'train'][name].fillna(data[data['type'] == 'train'][name].mean())\n",
1396 | "\n",
1397 | "# 3.44 s - 3.84 s"
1398 | ]
1399 | },
1400 | {
1401 | "cell_type": "code",
1402 | "execution_count": 194,
1403 | "metadata": {},
1404 | "outputs": [
1405 | {
1406 | "name": "stdout",
1407 | "output_type": "stream",
1408 | "text": [
1409 | "CPU times: user 1.9 s, sys: 152 ms, total: 2.05 s\n",
1410 | "Wall time: 2.06 s\n"
1411 | ]
1412 | }
1413 | ],
1414 | "source": [
1415 | "%%time\n",
1416 | "name = 'feature_v2'\n",
1417 | "data[name] = data.groupby('type')[name].transform(lambda x: x.fillna(x.mean()))\n",
1418 | "# 1.9 s - 2.04 s"
1419 | ]
1420 | },
1421 | {
1422 | "cell_type": "code",
1423 | "execution_count": 193,
1424 | "metadata": {},
1425 | "outputs": [
1426 | {
1427 | "name": "stdout",
1428 | "output_type": "stream",
1429 | "text": [
1430 | "CPU times: user 1.2 s, sys: 128 ms, total: 1.32 s\n",
1431 | "Wall time: 1.35 s\n"
1432 | ]
1433 | }
1434 | ],
1435 | "source": [
1436 | "%%time\n",
1437 | "name = 'feature_v3'\n",
1438 | "data.loc[data[name].isnull(), name] = data.groupby('type')[name].transform('mean')\n",
1439 | "# 1.17 - 1.18 s"
1440 | ]
1441 | },
1442 | {
1443 | "cell_type": "code",
1444 | "execution_count": 192,
1445 | "metadata": {},
1446 | "outputs": [
1447 | {
1448 | "name": "stdout",
1449 | "output_type": "stream",
1450 | "text": [
1451 | "CPU times: user 1.37 s, sys: 72 ms, total: 1.44 s\n",
1452 | "Wall time: 1.45 s\n"
1453 | ]
1454 | }
1455 | ],
1456 | "source": [
1457 | "%%time\n",
1458 | "name = 'feature_v4'\n",
1459 | "data[name] = np.where(data[name].isnull(), data['type'].map(data.groupby('type')[name].mean()), data[name])\n",
1460 | "# 1.26 s - 1.38"
1461 | ]
1462 | },
1463 | {
1464 | "cell_type": "code",
1465 | "execution_count": null,
1466 | "metadata": {},
1467 | "outputs": [],
1468 | "source": []
1469 | }
1470 | ],
1471 | "metadata": {
1472 | "kernelspec": {
1473 | "display_name": "Python 3",
1474 | "language": "python",
1475 | "name": "python3"
1476 | },
1477 | "language_info": {
1478 | "codemirror_mode": {
1479 | "name": "ipython",
1480 | "version": 3
1481 | },
1482 | "file_extension": ".py",
1483 | "mimetype": "text/x-python",
1484 | "name": "python",
1485 | "nbconvert_exporter": "python",
1486 | "pygments_lexer": "ipython3",
1487 | "version": "3.6.8"
1488 | }
1489 | },
1490 | "nbformat": 4,
1491 | "nbformat_minor": 2
1492 | }
1493 |
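
The `%%time` cells above each record a single run, so small differences between variants can be noise. As a minimal sketch (not part of the original notebook), the two plain-Python ways of dropping the trailing `$` can be compared with the standard `timeit` module, taking the best of several runs. The sample list `prices` and the helper names below are hypothetical and only mirror the shape of the `price` column.

```python
import timeit

# Hypothetical sample: price strings with a trailing "$", like the 'price' column.
prices = [f"{i % 1000}$" for i in range(100_000)]


def strip_by_slice(values):
    """Drop the last character of every string (assumes it is always '$')."""
    return [int(v[:-1]) for v in values]


def strip_by_replace(values):
    """Remove every '$' wherever it occurs in the string."""
    return [int(v.replace("$", "")) for v in values]


def best_time(func, values, repeat=5):
    """Return the fastest of `repeat` single runs, in seconds."""
    return min(timeit.repeat(lambda: func(values), number=1, repeat=repeat))


slice_s = best_time(strip_by_slice, prices)
replace_s = best_time(strip_by_replace, prices)
print(f"slice:   {slice_s:.4f} s")
print(f"replace: {replace_s:.4f} s")
```

Taking the minimum over repeats reduces the influence of background load; the absolute numbers depend on the machine and will not match the timings recorded in the notebook.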
--------------------------------------------------------------------------------
/dj_python_2_oop_20181004.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# The Python programming language\n",
8 | "## Part II - OOP\n",
9 | "\n",
10 | "author: **Дьяконов Александр www.dyakonov.org**\n",
11 | "\n",
12 | "**in support of the author's courses, in particular https://github.com/Dyakonov/IML**\n",
13 | "\n",
14 | "\n",
15 | "##### the material is partly based on...\n",
16 | "\n",
17 | "* *Bruce Eckel* **Python 3 Patterns, Recipes and Idioms**\n",
18 | "* *Никита Лесников* **Беглый обзор внутренностей Python** // slideshare\n",
19 | "* *Сергей Лебедев* **Лекции по языку Питон** // YouTube, \"Computer Science Center\" channel\n",
20 | "* Learn X in Y minutes https://learnxinyminutes.com/docs/python/\n",
21 | "* *Дэн Бейдер* **Чистый Python. Тонкости программирования для профи**"
22 | ]
23 | },
24 | {
25 | "cell_type": "markdown",
26 | "metadata": {},
27 | "source": [
28 | "- Everything is an object\n",
29 | "- A program = objects sending messages to each other. \n",
30 | "- An object has its own memory and may be composed of other objects. \n",
31 | "- An object has a type (class), which formalizes the actions of the object and the actions on it"
32 | ]
33 | },
34 | {
35 | "cell_type": "code",
36 | "execution_count": 8,
37 | "metadata": {},
38 | "outputs": [
39 | {
40 | "name": "stdout",
41 | "output_type": "stream",
42 | "text": [
43 | "3.5\n",
44 | "NotImplemented\n"
45 | ]
46 | }
47 | ],
48 | "source": [
49 | "a = 1\n",
50 | "b = 2.5\n",
51 | "print (a + b)\n",
52 | "print (a.__add__(b)) # does NOT work: int.__add__ returns NotImplemented for a float"
53 | ]
54 | },
55 | {
56 | "cell_type": "code",
57 | "execution_count": 34,
58 | "metadata": {},
59 | "outputs": [
60 | {
61 | "name": "stdout",
62 | "output_type": "stream",
63 | "text": [
64 | "1578102544 73531656\n",
65 | "1578102576 73531656\n"
66 | ]
67 | }
68 | ],
69 | "source": [
70 | "# in Python everything is an object, with an id and a value\n",
71 | "a = 1\n",
72 | "b = [1, 2]\n",
73 | "print (id(a), id(b))\n",
74 | "a = a + 1\n",
75 | "b.append(3) # id does not change\n",
76 | "print (id(a), id(b))\n",
77 | "del a # removes the name a (the object itself is freed once nothing references it)\n",
78 | "# a # would raise NameError"
79 | ]
80 | },
81 | {
82 | "cell_type": "code",
83 | "execution_count": 21,
84 | "metadata": {},
85 | "outputs": [
86 | {
87 | "name": "stdout",
88 | "output_type": "stream",
89 | "text": [
90 | "{1, 2, 3, 4}; {(1, 2), (1, 3), (3, 4)}\n"
91 | ]
92 | }
93 | ],
94 | "source": [
95 | "# class definition\n",
96 | "# the first argument of every method is the class instance\n",
97 | "\n",
98 | "class MyGraph:\n",
99 | "    def __init__(self, V, E): # constructor (the destructor is __del__)\n",
100 | "        self.vertices = set(V)\n",
101 | "        self.edges = set(E)\n",
102 | "    def add_vertex(self, v): # a method is a function declared in the class body\n",
103 | "        self.vertices.add(v)\n",
104 | "    def add_edge(self, e):\n",
105 | "        self.vertices.add(e[0])\n",
106 | "        self.vertices.add(e[1])\n",
107 | "        self.edges.add(e)\n",
108 | "    def __str__(self): # string representation\n",
109 | "        return (\"%s; %s\" % (self.vertices, self.edges))\n",
110 | "\n",
111 | "g = MyGraph([1,2,3], [(1,2), (1,3)])\n",
112 | "g.add_edge((3,4))\n",
113 | "print (g)"
114 | ]
115 | },
116 | {
117 | "cell_type": "code",
118 | "execution_count": 22,
119 | "metadata": {},
120 | "outputs": [
121 | {
122 | "name": "stdout",
123 | "output_type": "stream",
124 | "text": [
125 | "{1, 2, 3, 4}\n",
126 | "{1, 2, 3, 4}\n",
127 | "{1, 2, 3, 4, 5}; {(1, 2), (1, 3), (3, 4)}\n"
128 | ]
129 | }
130 | ],
131 | "source": [
132 | "print (g.vertices)\n",
133 | "print (g.__getattribute__('vertices'))\n",
134 | "g.__setattr__('vertices', set([1,2,3,4,5]))\n",
135 | "print (g)"
136 | ]
137 | },
138 | {
139 | "cell_type": "code",
140 | "execution_count": 1,
141 | "metadata": {},
142 | "outputs": [
143 | {
144 | "name": "stdout",
145 | "output_type": "stream",
146 | "text": [
147 | "['_X__c', '__doc__', '__module__', '_b', 'a']\n",
148 | "{'a': 0, '__module__': '__main__', '_X__c': 2, '_b': 1, '__doc__': None}\n"
149 | ]
150 | }
151 | ],
152 | "source": [
153 | "class X:\n",
154 | "    a = 0 # ordinary attribute\n",
155 | "    _b = 1 # use is discouraged, but it is accessible\n",
156 | "    __c = 2 # accessible under a mangled name ('_X__c')\n",
157 | "print (dir(X))\n",
158 | "print (X.__dict__) # all attributes as a dictionary"
159 | ]
160 | },
161 | {
162 | "cell_type": "code",
163 | "execution_count": 1,
164 | "metadata": {},
165 | "outputs": [
166 | {
167 | "name": "stdout",
168 | "output_type": "stream",
169 | "text": [
170 | "3\n",
171 | "(<class '__main__.C'>, <class '__main__.A'>, <class '__main__.B'>, <class 'object'>)\n"
172 | ]
173 | }
174 | ],
175 | "source": [
176 | "# multiple inheritance\n",
177 | "class A:\n",
178 | "    def make(self, x):\n",
179 | "        print(x+1)\n",
180 | "class B:\n",
181 | "    def make(self, x):\n",
182 | "        print(x-1)\n",
183 | "class C(A, B):\n",
184 | "    pass\n",
185 | "\n",
186 | "c = C()\n",
187 | "c.make(2)\n",
188 | "print (C.__mro__) # methods are looked up in this order"
189 | ]
190 | },
191 | {
192 | "cell_type": "code",
193 | "execution_count": 6,
194 | "metadata": {},
195 | "outputs": [],
196 | "source": [
197 | "# class\n",
198 | "class Human(object):\n",
199 | "\n",
200 | "    species = \"H. sapiens\" # class attribute\n",
201 | "\n",
202 | "    # initializer\n",
203 | "    # called when an instance is created\n",
204 | "    def __init__(self, name):\n",
205 | "        self.name = name # initialize an attribute\n",
206 | "        self.age = 0 # initialize the property (goes through the setter)\n",
207 | "\n",
208 | "\n",
209 | "    # instance method: the first argument is self\n",
210 | "    def say(self, msg):\n",
211 | "        return \"{0}: {1}\".format(self.name, msg)\n",
212 | "\n",
213 | "    # method shared by all instances\n",
214 | "    # the first argument is the class it was called on\n",
215 | "    @classmethod\n",
216 | "    def get_species(cls):\n",
217 | "        return cls.species\n",
218 | "\n",
219 | "    # called without a reference to the caller\n",
220 | "    @staticmethod\n",
221 | "    def grunt():\n",
222 | "        return \"статика...\"\n",
223 | "\n",
224 | "    # property: turns a method into an attribute\n",
225 | "    @property\n",
226 | "    def age(self):\n",
227 | "        return self._age\n",
228 | "\n",
229 | "    # for assigning to the property\n",
230 | "    @age.setter\n",
231 | "    def age(self, age):\n",
232 | "        self._age = age\n",
233 | "\n",
234 | "    # for deleting the property\n",
235 | "    @age.deleter\n",
236 | "    def age(self):\n",
237 | "        del self._age"
238 | ]
239 | },
240 | {
241 | "cell_type": "code",
242 | "execution_count": 7,
243 | "metadata": {},
244 | "outputs": [
245 | {
246 | "name": "stdout",
247 | "output_type": "stream",
248 | "text": [
249 | "Иван: привет\n",
250 | "Сергей: пока\n",
251 | "H. sapiens\n",
252 | "H. neanderthalensis H. neanderthalensis\n",
253 | "H. neanderthalensis H. neanderthalensis\n",
254 | "статика...\n",
255 | "42\n"
256 | ]
257 | }
258 | ],
259 | "source": [
260 | "# instantiation\n",
261 | "i = Human(name=\"Иван\")\n",
262 | "print (i.say(\"привет\"))\n",
263 | "\n",
264 | "\n",
265 | "j = Human(\"Сергей\")\n",
266 | "print (j.say(\"пока\"))\n",
267 | "\n",
268 | "\n",
269 | "print (i.get_species())\n",
270 | "\n",
271 | "\n",
272 | "# change the shared attribute for the whole class\n",
273 | "Human.species = \"H. neanderthalensis\"\n",
274 | "print (i.get_species(), i.species)\n",
275 | "print (j.get_species(), j.species)\n",
276 | "\n",
277 | "\n",
278 | "\n",
279 | "print (Human.grunt()) # static method\n",
280 | "\n",
281 | "\n",
282 | "i.age = 42 # property\n",
283 | "print (i.age)\n",
284 | "del i.age\n",
285 | "# i.age # would raise an exception\n"
286 | ]
287 | },
288 | {
289 | "cell_type": "code",
290 | "execution_count": 5,
291 | "metadata": {},
292 | "outputs": [
293 | {
294 | "data": {
295 | "text/plain": [
296 | "4"
297 | ]
298 | },
299 | "execution_count": 5,
300 | "metadata": {},
301 | "output_type": "execute_result"
302 | }
303 | ],
304 | "source": [
305 | "class Myclass:\n",
306 | "    __slots__ = ['name1', 'name2']  # list ALL allowed attributes\n",
307 | "\n",
308 | "# instances take less memory\n",
309 | "\n",
310 | "c = Myclass()\n",
311 | "c.name1 = 10\n",
312 | "c.name2 = lambda x: x * x\n",
313 | "\n",
314 | "c.name2(2)\n",
315 | "\n",
316 | "# c.name3 = 20  # raises AttributeError (in Python 2 it does not, because the class is old-style)\n",
317 | "# c.name3"
318 | ]
319 | },
320 | {
321 | "cell_type": "code",
322 | "execution_count": 9,
323 | "metadata": {},
324 | "outputs": [
325 | {
326 | "name": "stdout",
327 | "output_type": "stream",
328 | "text": [
329 | "[1, 2]\n",
330 | "[3, 2]\n",
331 | "[3, 2]\n"
332 | ]
333 | }
334 | ],
335 | "source": [
336 | "a = [1, 2]\n",
337 | "b = a\n",
338 | "print(b)\n",
339 | "a[0] = 3\n",
340 | "print(b)  # a and b refer to the same list\n",
341 | "del a\n",
342 | "print(b)  # but the list is still reachable through b!"
343 | ]
344 | },
345 | {
346 | "cell_type": "code",
347 | "execution_count": 17,
348 | "metadata": {},
349 | "outputs": [
350 | {
351 | "name": "stdout",
352 | "output_type": "stream",
353 | "text": [
354 | "dict_values(['mac_greek', 'zlib_codec', 'iso8859_15', 'cp1255', 'iso8859_9', 'shift_jis', 'gb2312', 'cp500', 'cp1140', 'iso8859_8', 'hp_roman8', 'cp037', 'cp949', 'cp864', 'euc_kr', 'gbk', 'cp852', 'iso8859_7', 'cp037', 'latin_1', 'utf_7', 'cp860', 'cp1253', 'iso8859_5', 'cp1125', 'big5', 'shift_jis_2004', 'cp850', 'utf_16_be', 'cp932', 'utf_8', 'iso2022_jp', 'iso8859_16', 'cp500', 'utf_8', 'ascii', 'iso8859_14', 'latin_1', 'base64_codec', 'gb2312', 'cp949', 'hp_roman8', 'euc_jis_2004', 'cp1258', 'quopri_codec', 'uu_codec', 'mac_turkish', 'euc_kr', 'iso8859_14', 'rot_13', 'euc_jp', 'cp1254', 'iso8859_8', 'cp1253', 'ptcp154', 'cp273', 'iso8859_14', 'cp1125', 'iso8859_14', 'cp424', 'big5hkscs', 'cp500', 'iso8859_2', 'cp437', 'hz', 'gbk', 'euc_jp', 'cp852', 'iso2022_jp_ext', 'euc_kr', 'iso8859_15', 'utf_7', 'utf_8', 'cp850', 'shift_jisx0213', 'euc_kr', 'cp437', 'iso8859_11', 'cp855', 'cp861', 'latin_1', 'cp1026', 'cp273', 'iso8859_16', 'shift_jis', 'cp932', 'iso8859_8', 'cp865', 'tis_620', 'iso8859_6', 'cp866', 'utf_8', 'iso8859_13', 'iso8859_16', 'iso8859_14', 'iso2022_jp_1', 'mac_iceland', 'iso8859_6', 'johab', 'iso8859_9', 'cp424', 'cp273', 'iso2022_jp_2004', 'ascii', 'utf_16_le', 'iso2022_jp_1', 'cp857', 'cp950', 'utf_32_le', 'cp1252', 'iso8859_3', 'cp037', 'gb2312', 'hz', 'cp1250', 'utf_16_le', 'iso8859_2', 'cp855', 'iso8859_7', 'iso8859_10', 'cp1256', 'iso8859_8', 'cp857', 'iso8859_14', 'cp437', 'iso8859_10', 'iso8859_2', 'iso8859_9', 'cp857', 'gb2312', 'iso8859_6', 'kz1048', 'iso8859_6', 'hex_codec', 'iso8859_7', 'euc_jis_2004', 'cp1125', 'cp869', 'zlib_codec', 'cp775', 'iso8859_3', 'iso8859_3', 'cp1257', 'iso8859_4', 'iso2022_jp_2', 'kz1048', 'shift_jis', 'cp1255', 'cp1254', 'iso8859_10', 'iso8859_10', 'cp424', 'iso8859_8', 'cp775', 'tis_620', 'latin_1', 'iso8859_3', 'iso8859_5', 'mac_roman', 'kz1048', 'cp949', 'cp037', 'cp037', 'cp1250', 'iso2022_jp_3', 'gb2312', 'latin_1', 'cp424', 'tis_620', 'euc_kr', 'ascii', 'iso8859_13', 'euc_jis_2004', 
'iso8859_10', 'cp850', 'utf_16_be', 'ascii', 'iso8859_4', 'ptcp154', 'iso8859_4', 'iso8859_16', 'cp862', 'euc_jisx0213', 'iso8859_7', 'ascii', 'iso2022_kr', 'cp864', 'tis_620', 'utf_32_be', 'cp500', 'latin_1', 'big5', 'ascii', 'iso2022_jp_2004', 'cp037', 'iso2022_jp_2', 'iso8859_11', 'iso2022_jp_ext', 'big5', 'iso8859_9', 'cp1026', 'gb2312', 'cp862', 'cp861', 'latin_1', 'cp866', 'iso2022_jp', 'cp500', 'shift_jis_2004', 'cp860', 'cp950', 'koi8_r', 'cp1251', 'cp863', 'cp852', 'latin_1', 'gb2312', 'cp858', 'euc_kr', 'gb18030', 'cp932', 'utf_8', 'cp1125', 'cp037', 'base64_codec', 'ptcp154', 'cp1251', 'cp869', 'cp860', 'cp858', 'ptcp154', 'cp1256', 'iso8859_4', 'gb2312', 'iso8859_2', 'iso8859_9', 'quopri_codec', 'bz2_codec', 'cp865', 'johab', 'ascii', 'ascii', 'cp1258', 'mac_latin2', 'iso2022_kr', 'cp037', 'utf_7', 'mac_cyrillic', 'iso8859_9', 'cp861', 'iso8859_10', 'iso8859_5', 'utf_16', 'latin_1', 'iso8859_15', 'latin_1', 'cp863', 'iso8859_2', 'iso8859_6', 'utf_32', 'iso2022_jp', 'cp869', 'ascii', 'iso8859_4', 'cp864', 'cp863', 'cp862', 'gb2312', 'cp861', 'cp932', 'cp855', 'utf_32', 'ascii', 'mac_roman', 'quopri_codec', 'iso8859_6', 'ascii', 'iso8859_11', 'mac_latin2', 'euc_kr', 'cp869', 'iso8859_2', 'cp866', 'iso8859_4', 'ascii', 'shift_jisx0213', 'shift_jisx0213', 'cp1140', 'utf_16', 'iso8859_5', 'cp858', 'iso8859_5', 'shift_jis', 'mbcs', 'iso2022_jp_3', 'iso8859_7', 'cp1257', 'hz', 'iso8859_7', 'iso8859_6', 'big5hkscs', 'tis_620', 'cp865', 'tactis', 'iso2022_kr', 'euc_kr', 'hp_roman8', 'shift_jis', 'cp1026', 'iso8859_3', 'latin_1', 'iso8859_16', 'iso8859_3', 'euc_jp', 'shift_jis_2004', 'iso8859_7', 'cp775', 'latin_1', 'gbk', 'cp1252', 'iso8859_7', 'iso8859_13'])\n"
355 | ]
356 | }
357 | ],
358 | "source": [
359 | "# for the interpreter to read non-ASCII literals correctly, declare the source encoding at the top of the file\n",
360 | "# -*- coding: koi8-r -*-\n",
361 | "# -*- coding: cp1251 -*-\n",
362 | "# the second form is the one typical on Windows\n",
363 | "\n",
364 | "import encodings.aliases\n",
365 | "print(encodings.aliases.aliases.values())  # codec names known to Python"
366 | ]
367 | },
368 | {
369 | "cell_type": "code",
370 | "execution_count": 17,
371 | "metadata": {},
372 | "outputs": [
373 | {
374 | "name": "stdout",
375 | "output_type": "stream",
376 | "text": [
377 | "First created, counter = 1\n",
378 | "Second created, counter = 2\n",
379 | "First deleted, counter = 1\n",
380 | "Third created, counter = 2\n",
381 | "Third deleted, counter = 1\n"
382 | ]
383 | }
384 | ],
385 | "source": [
386 | "# count the live instances of a class\n",
387 | "\n",
388 | "class Counter:\n",
389 | "    Count = 0  # counter\n",
390 | "\n",
391 | "    def __init__(self, name):  # mind the indentation\n",
392 | "        self.name = name  # instance attribute, set through self\n",
393 | "        Counter.Count += 1\n",
394 | "        print(name, 'created, counter =', Counter.Count)\n",
395 | "        # Counter.Count belongs to the class, not to the instance!\n",
396 | "\n",
397 | "    def __del__(self):\n",
398 | "        Counter.Count -= 1\n",
399 | "        print(self.name, 'deleted, counter =', Counter.Count)\n",
400 | "        if Counter.Count == 0:\n",
401 | "            print('No objects left...')\n",
402 | "\n",
403 | "x = Counter(\"First\")\n",
404 | "y = Counter(\"Second\")\n",
405 | "del x\n",
406 | "z = Counter(\"Third\")\n",
407 | "del z\n",
408 | "\n",
409 | "# on a second run the output looks odd: objects left over from the first run (here y) are destroyed when their names are rebound, and their __del__ decrements the new counter"
410 | ]
411 | },
412 | {
413 | "cell_type": "code",
414 | "execution_count": 16,
415 | "metadata": {},
416 | "outputs": [],
417 | "source": [
418 | "# del x\n",
419 | "# del y\n",
420 | "del Counter"
421 | ]
422 | },
423 | {
424 | "cell_type": "markdown",
425 | "metadata": {},
426 | "source": [
427 | "## Metaclasses\n",
428 | "\n",
429 | "A metaclass is a class that creates other classes: its instances are themselves classes.\n",
430 | "\n",
431 | "Example: the built-in **type** (by default every class is an instance of it).\n",
432 | "\n",
433 | "**What they are for** (not covered here)\n",
434 | "- changing how classes behave\n",
435 | "- they are preserved under inheritance\n",
436 | "- temporarily substituting `__dict__` (?)"
437 | ]
438 | },
439 | {
440 | "cell_type": "code",
441 | "execution_count": 14,
442 | "metadata": {},
443 | "outputs": [
444 | {
445 | "name": "stdout",
446 | "output_type": "stream",
447 | "text": [
448 | "<class 'type'>\n",
449 | "<class 'type'>\n"
450 | ]
451 | }
452 | ],
453 | "source": [
454 | "class Anyclass:\n",
455 | "    x = 10\n",
456 | "\n",
457 | "print(type(Anyclass))\n",
458 | "\n",
459 | "class Nextclass(Anyclass):  # plain inheritance; in Python 3 a metaclass is passed as metaclass=...\n",
460 | "    y = 20\n",
461 | "\n",
462 | "print(type(Nextclass))"
463 | ]
464 | },
465 | {
466 | "cell_type": "code",
467 | "execution_count": 8,
468 | "metadata": {
469 | "collapsed": true
470 | },
471 | "outputs": [],
472 | "source": [
473 | "class C: pass\n",
474 | "# equivalent form:\n",
475 | "C = type('C', (), {})\n",
476 | "\n",
477 | "# type is the default metaclass used to create classes"
478 | ]
479 | },
480 | {
481 | "cell_type": "code",
482 | "execution_count": 11,
483 | "metadata": {},
484 | "outputs": [
485 | {
486 | "name": "stdout",
487 | "output_type": "stream",
488 | "text": [
489 | "list: ['one', 'two']\n"
490 | ]
491 | }
492 | ],
493 | "source": [
494 | "def myprint(self):\n",
495 | "    print(\"list:\", self)\n",
496 | "\n",
497 | "# a class can also be created by calling type directly\n",
498 | "MyList = type('MyList', (list,), dict(x=10, myprint=myprint))\n",
499 | "\n",
500 | "\n",
501 | "ml = MyList()\n",
502 | "ml.append(\"one\")\n",
503 | "ml.append(\"two\")\n",
504 | "ml.myprint()"
505 | ]
506 | },
507 | {
508 | "cell_type": "code",
509 | "execution_count": 13,
510 | "metadata": {},
511 | "outputs": [
512 | {
513 | "name": "stdout",
514 | "output_type": "stream",
515 | "text": [
516 | "10 Hello, world! 5 Goodbye, world!\n"
517 | ]
518 | }
519 | ],
520 | "source": [
521 | "class MyClass: pass\n",
522 | "# this is a class!\n",
523 | "MyClass.field = 10\n",
524 | "MyClass.method = lambda x: \"Hello, world!\"\n",
525 | "\n",
526 | "x = MyClass()\n",
527 | "x.field2 = 5\n",
528 | "x.method2 = lambda x: \"Goodbye, world!\"\n",
529 | "\n",
530 | "print(x.field, x.method(), x.field2, x.method2(None))\n",
531 | "# x.method2 is stored on the instance, so it is not bound and needs an explicit argument"
532 | ]
533 | },
534 | {
535 | "cell_type": "code",
536 | "execution_count": 10,
537 | "metadata": {},
538 | "outputs": [
539 | {
540 | "name": "stdout",
541 | "output_type": "stream",
542 | "text": [
543 | "5 5\n"
544 | ]
545 | }
546 | ],
547 | "source": [
548 | "# there are instance attributes\n",
549 | "# and there are class attributes\n",
550 | "\n",
551 | "class MyClass: pass\n",
552 | "\n",
553 | "MyClass.field = 10\n",
554 | "x = MyClass()\n",
555 | "y = MyClass()\n",
556 | "MyClass.field = 5  # the change is visible through every instance\n",
557 | "\n",
558 | "print(x.field, y.field)"
559 | ]
560 | },
561 | {
562 | "cell_type": "code",
563 | "execution_count": 26,
564 | "metadata": {},
565 | "outputs": [],
566 | "source": [
567 | "# there is much more to metaclasses; it is not covered here"
568 | ]
569 | },
570 | {
571 | "cell_type": "code",
572 | "execution_count": 2,
573 | "metadata": {},
574 | "outputs": [
575 | {
576 | "data": {
577 | "text/plain": [
578 | "Path(./main/file.txt)"
579 | ]
580 | },
581 | "execution_count": 2,
582 | "metadata": {},
583 | "output_type": "execute_result"
584 | }
585 | ],
586 | "source": [
587 | "from os.path import dirname  # needed by Path.parent (???? - see Python 3)\n",
588 | "class Path:\n",
589 | "    def __init__(self, directory):\n",
590 | "        self.directory = directory\n",
591 | "\n",
592 | "    def __repr__(self):\n",
593 | "        return \"Path({})\".format(self.directory)\n",
594 | "\n",
595 | "    @property\n",
596 | "    def parent(self):\n",
597 | "        return Path(dirname(self.directory))\n",
598 | "\n",
599 | "p = Path(\"./main/file.txt\")\n",
600 | "p"
601 | ]
602 | },
603 | {
604 | "cell_type": "code",
605 | "execution_count": 17,
606 | "metadata": {},
607 | "outputs": [
608 | {
609 | "name": "stdout",
610 | "output_type": "stream",
611 | "text": [
612 | "my_attr\n",
613 | "0\n",
614 | "val_other\n"
615 | ]
616 | }
617 | ],
618 | "source": [
619 | "class Myclass:\n",
620 | "    val = 0\n",
621 | "    def __getattr__(self, name):  # called only when the attribute is not found\n",
622 | "        return name\n",
623 | "\n",
624 | "m = Myclass()\n",
625 | "print(m.my_attr, m.val, m.val_other, sep='\\n')"
626 | ]
627 | },
628 | {
629 | "cell_type": "code",
630 | "execution_count": 31,
631 | "metadata": {},
632 | "outputs": [
633 | {
634 | "name": "stdout",
635 | "output_type": "stream",
636 | "text": [
637 | "2\n",
638 | "1.0\n"
639 | ]
640 | },
641 | {
642 | "data": {
643 | "text/plain": [
644 | "{'val': 2, 'val_other': 3.0}"
645 | ]
646 | },
647 | "execution_count": 31,
648 | "metadata": {},
649 | "output_type": "execute_result"
650 | }
651 | ],
652 | "source": [
653 | "class Myclass:\n",
654 | "    val = 0\n",
655 | "\n",
656 | "m = Myclass()\n",
657 | "\n",
658 | "setattr(m, \"val\", 2)  # setattr sets an attribute by name\n",
659 | "setattr(m, \"val_other\", 3.0)  # same as m.val_other = 3.0: a new attribute is created\n",
660 | "print(getattr(m, \"val\"))  # getattr reads an attribute by name\n",
661 | "print(getattr(m, \"val_some_other\", 1.0))  # the default is returned if the attribute is missing\n",
662 | "\n",
663 | "m.__dict__"
664 | ]
665 | },
666 | {
667 | "cell_type": "code",
668 | "execution_count": 5,
669 | "metadata": {},
670 | "outputs": [
671 | {
672 | "name": "stdout",
673 | "output_type": "stream",
674 | "text": [
675 | "(1, 2)\n"
676 | ]
677 | }
678 | ],
679 | "source": [
680 | "class Vec:\n",
681 | "    def __init__(self, x=0.0, y=0.0):\n",
682 | "        self.x = x\n",
683 | "        self.y = y\n",
684 | "\n",
685 | "    def __str__(self):  # string representation\n",
686 | "        return \"(%s, %s)\" % (self.x, self.y)\n",
687 | "\n",
688 | "x = Vec(1, 2)\n",
689 | "print(x)"
690 | ]
691 | },
692 | {
693 | "cell_type": "markdown",
694 | "metadata": {},
695 | "source": [
696 | "## Descriptor\n",
697 | "\n",
698 | "An object attribute with hidden behavior, which is defined by the methods of the descriptor protocol: `__get__()`, `__set__()`, and `__delete__()`"
699 | ]
700 | },
701 | {
702 | "cell_type": "code",
703 | "execution_count": 5,
704 | "metadata": {},
705 | "outputs": [
706 | {
707 | "name": "stdout",
708 | "output_type": "stream",
709 | "text": [
710 | "Retrieving var \"x\"\n",
711 | "11\n",
712 | "Updating var \"x\"\n",
713 | "Retrieving var \"x\"\n",
714 | "21\n",
715 | "Updating var \"x\"\n",
716 | "Negative value: it will be reset to zero\n",
717 | "Retrieving var \"x\"\n",
718 | "1\n",
719 | "5\n"
720 | ]
721 | }
722 | ],
723 | "source": [
724 | "class RevealAccess(object):\n",
725 | "    \"\"\"Example of a data descriptor:\n",
726 | "    sets and returns values,\n",
727 | "    and also prints messages\n",
728 | "    \"\"\"\n",
729 | "\n",
730 | "    def __init__(self, initval=None, name='var'):\n",
731 | "        self.val = initval\n",
732 | "        self.name = name\n",
733 | "\n",
734 | "    def __get__(self, obj, objtype):\n",
735 | "        print('Retrieving', self.name)\n",
736 | "        return self.val + 1  # returns the next number\n",
737 | "\n",
738 | "    def __set__(self, obj, val):\n",
739 | "        print('Updating', self.name)\n",
740 | "        if val < 0:\n",
741 | "            print('Negative value: it will be reset to zero')\n",
742 | "            self.val = 0\n",
743 | "        else:\n",
744 | "            self.val = val\n",
745 | "\n",
746 | "class MyClass(object):\n",
747 | "    x = RevealAccess(10, 'var \"x\"')  # this attribute has hidden behavior\n",
748 | "    y = 5\n",
749 | "\n",
750 | "\n",
751 | "m = MyClass()\n",
752 | "print(m.x)\n",
753 | "\n",
754 | "m.x = 20\n",
755 | "print(m.x)\n",
756 | "m.x = -20\n",
757 | "print(m.x)\n",
758 | "\n",
759 | "\n",
760 | "print(m.y)"
761 | ]
762 | },
763 | {
764 | "cell_type": "code",
765 | "execution_count": 6,
766 | "metadata": {
767 | "collapsed": true
768 | },
769 | "outputs": [],
770 | "source": [
771 | "# a property in Python is a \"quick\" descriptor\n",
772 | "\n",
773 | "# a \"safe class\"\n",
774 | "# that validates attribute values\n",
775 | "\n",
776 | "class SafeClass:\n",
777 | "    def _get_attr(self):\n",
778 | "        return self._x\n",
779 | "\n",
780 | "    def _set_attr(self, x):\n",
781 | "        assert x > 0, \"a positive value is required\"\n",
782 | "        self._x = x\n",
783 | "\n",
784 | "    def _del_attr(self):\n",
785 | "        del self._x\n",
786 | "\n",
787 | "    x = property(_get_attr, _set_attr, _del_attr)\n",
788 | "\n",
789 | "\n",
790 | "safe = SafeClass()\n",
791 | "safe.x = 1\n",
792 | "# safe.x = -2  # would raise AssertionError"
793 | ]
794 | },
795 | {
796 | "cell_type": "code",
797 | "execution_count": 13,
798 | "metadata": {},
799 | "outputs": [],
800 | "source": [
801 | "# ??? not understood yet, CAN BE DELETED (NonNegative has no __get__/__set__, so it is not a descriptor and validates nothing)\n",
802 | "class NonNegative:\n",
803 | "    def _get_attr(self):\n",
804 | "        return self._x\n",
805 | "\n",
806 | "    def _set_attr(self, x):\n",
807 | "        assert x > 0, \"a positive value is required\"\n",
808 | "        self._x = x\n",
809 | "\n",
810 | "    def _del_attr(self):\n",
811 | "        del self._x\n",
812 | "\n",
813 | "class SafeClass2:\n",
814 | "    x = NonNegative()\n",
815 | "    y = NonNegative()"
816 | ]
817 | },
818 | {
819 | "cell_type": "code",
820 | "execution_count": 17,
821 | "metadata": {
822 | "collapsed": true
823 | },
824 | "outputs": [],
825 | "source": [
826 | "safe = SafeClass2()"
827 | ]
828 | },
829 | {
830 | "cell_type": "code",
831 | "execution_count": 18,
832 | "metadata": {},
833 | "outputs": [],
834 | "source": [
835 | "safe.x = -2"
836 | ]
837 | },
838 | {
839 | "cell_type": "code",
840 | "execution_count": 21,
841 | "metadata": {},
842 | "outputs": [],
843 | "source": [
844 | "safe.y = -2"
845 | ]
846 | },
847 | {
848 | "cell_type": "code",
849 | "execution_count": 23,
850 | "metadata": {
851 | "collapsed": true
852 | },
853 | "outputs": [],
854 | "source": [
855 | "class SafeClass2:\n",
856 | "    x = SafeClass()\n",
857 | "    y = SafeClass()"
858 | ]
859 | }
860 | ],
861 | "metadata": {
862 | "kernelspec": {
863 | "display_name": "Python 3",
864 | "language": "python",
865 | "name": "python3"
866 | },
867 | "language_info": {
868 | "codemirror_mode": {
869 | "name": "ipython",
870 | "version": 3
871 | },
872 | "file_extension": ".py",
873 | "mimetype": "text/x-python",
874 | "name": "python",
875 | "nbconvert_exporter": "python",
876 | "pygments_lexer": "ipython3",
877 | "version": "3.6.5"
878 | }
879 | },
880 | "nbformat": 4,
881 | "nbformat_minor": 1
882 | }
883 |
--------------------------------------------------------------------------------
/dj_python_4_tonko_20181004.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# the Python programming language\n",
8 | "## part IV: some subtleties\n",
9 | "\n",
10 | "author: **Дьяконов Александр (Alexander Dyakonov) www.dyakonov.org**\n",
11 | "\n",
12 | "**in support of the author's courses, in particular https://github.com/Dyakonov/IML**\n",
13 | "\n",
14 | "\n",
15 | "### the material is based on...\n",
16 | "the book by ***Дэн Бейдер (Dan Bader)***, **Чистый Python. Тонкости программирования для профи**. Питер, 2018."
17 | ]
18 | },
19 | {
20 | "cell_type": "markdown",
21 | "metadata": {},
22 | "source": [
23 | "### assert\n",
24 | "\n",
25 | "`assert` statements should be used only to help developers find bugs. They are not a mechanism for handling run-time errors, not least because assertions are skipped entirely when Python runs with the `-O` flag."
26 | ]
27 | },
28 | {
29 | "cell_type": "markdown",
30 | "metadata": {},
31 | "source": [
32 | "## a common mistake: a missing comma"
33 | ]
34 | },
35 | {
36 | "cell_type": "code",
37 | "execution_count": 2,
38 | "metadata": {},
39 | "outputs": [
40 | {
41 | "data": {
42 | "text/plain": [
43 | "['Петя', 'МашаВася']"
44 | ]
45 | },
46 | "execution_count": 2,
47 | "metadata": {},
48 | "output_type": "execute_result"
49 | }
50 | ],
51 | "source": [
52 | "lst = ['Петя',\n",
53 | "       'Маша'  # missing comma: adjacent string literals are concatenated\n",
54 | "       'Вася'\n",
55 | "      ]\n",
56 | "\n",
57 | "lst"
58 | ]
59 | },
60 | {
61 | "cell_type": "markdown",
62 | "metadata": {},
63 | "source": [
64 | "## context manager\n",
65 | "\n",
66 | "equivalent pieces of code"
67 | ]
68 | },
69 | {
70 | "cell_type": "code",
71 | "execution_count": null,
72 | "metadata": {},
73 | "outputs": [],
74 | "source": [
75 | "with open('hello.txt', 'w') as f:\n",
76 | "    f.write('hello, world!')\n",
77 | "\n",
78 | "\n",
79 | "f = open('hello.txt', 'w')\n",
80 | "try:\n",
81 | "    f.write('hello, world!')\n",
82 | "finally:\n",
83 | "    f.close()"
84 | ]
85 | },
86 | {
87 | "cell_type": "code",
88 | "execution_count": 3,
89 | "metadata": {},
90 | "outputs": [
91 | {
92 | "name": "stdout",
93 | "output_type": "stream",
94 | "text": [
95 | "    one\n",
96 | "        two\n",
97 | "            three\n",
98 | "    four\n"
99 | ]
100 | }
101 | ],
102 | "source": [
103 | "class Indenter:\n",
104 | "    def __init__(self):\n",
105 | "        self.level = 0\n",
106 | "\n",
107 | "    def __enter__(self):\n",
108 | "        self.level += 1\n",
109 | "        return self\n",
110 | "\n",
111 | "    def __exit__(self, exc_type, exc_val, exc_tb):\n",
112 | "        self.level -= 1\n",
113 | "\n",
114 | "    def print(self, text):\n",
115 | "        print('    ' * self.level + text)\n",
116 | "\n",
117 | "\n",
118 | "with Indenter() as indent:\n",
119 | "    indent.print('one')\n",
120 | "    with indent:\n",
121 | "        indent.print('two')\n",
122 | "        with indent:\n",
123 | "            indent.print('three')\n",
124 | "    indent.print('four')"
125 | ]
126 | },
127 | {
128 | "cell_type": "markdown",
129 | "metadata": {},
130 | "source": [
131 | "## naming variables\n",
132 | "\n",
133 | "\n",
134 | "\n",
135 | "\"dunders\" (short for \"double underscores\"): names written with double underscores"
136 | ]
137 | },
138 | {
139 | "cell_type": "code",
140 | "execution_count": null,
141 | "metadata": {},
142 | "outputs": [],
143 | "source": [
144 | "_var = 1  # for internal use; not imported by from my_module import *\n",
145 | "var_ = 1  # avoids a name clash\n",
146 | "__var = 1  # triggers name mangling when used inside a class\n",
147 | "__var__ = 1  # no mangling! Reserved for special names: __init__, __call__, etc.\n",
148 | "_ = 1  # a throwaway name (mostly when unpacking); in the interpreter, often the result of the last expression\n"
149 | ]
150 | },
151 | {
152 | "cell_type": "markdown",
153 | "metadata": {},
154 | "source": [
155 | "# string formatting"
156 | ]
157 | },
158 | {
159 | "cell_type": "code",
160 | "execution_count": 1,
161 | "metadata": {},
162 | "outputs": [
163 | {
164 | "name": "stdout",
165 | "output_type": "stream",
166 | "text": [
167 | "Hello, Иван\n",
168 | "Hello, Иван\n",
169 | "Hello, Иван\n",
170 | "Hello, Иван\n"
171 | ]
172 | }
173 | ],
174 | "source": [
175 | "name = 'Иван'\n",
176 | "print('Hello, %s' % name)\n",
177 | "print('Hello, {}'.format(name))\n",
178 | "print(f'Hello, {name}')  # formatted string literals (f-strings)\n",
179 | "\n",
180 | "from string import Template\n",
181 | "print(Template('Hello, $name').substitute(name=name))  # template strings"
182 | ]
183 | },
184 | {
185 | "cell_type": "markdown",
186 | "metadata": {},
187 | "source": [
188 | "## lambda functions"
189 | ]
190 | },
191 | {
192 | "cell_type": "code",
193 | "execution_count": 7,
194 | "metadata": {},
195 | "outputs": [
196 | {
197 | "data": {
198 | "text/plain": [
199 | "3"
200 | ]
201 | },
202 | "execution_count": 7,
203 | "metadata": {},
204 | "output_type": "execute_result"
205 | }
206 | ],
207 | "source": [
208 | "(lambda x, y: x + y)(1, 2)"
209 | ]
210 | },
211 | {
212 | "cell_type": "code",
213 | "execution_count": 55,
214 | "metadata": {},
215 | "outputs": [
216 | {
217 | "name": "stdout",
218 | "output_type": "stream",
219 | "text": [
220 | "[(-2, 'MINUS TWO'), (-1, 'MINUS ONE'), (1, 'ONE'), (4, 'FOUR')]\n",
221 | "[(1, 'ONE'), (-1, 'MINUS ONE'), (-2, 'MINUS TWO'), (4, 'FOUR')]\n"
222 | ]
223 | }
224 | ],
225 | "source": [
226 | "tuples = [(1, 'ONE'), (4, 'FOUR'), (-1, 'MINUS ONE'), (-2, 'MINUS TWO')]\n",
227 | "\n",
228 | "print(sorted(tuples, key=lambda x: x[0]))  # the same order as the default sort gives here\n",
229 | "print(sorted(tuples, key=lambda x: x[0]*x[0]))"
230 | ]
231 | },
232 | {
233 | "cell_type": "code",
234 | "execution_count": 60,
235 | "metadata": {},
236 | "outputs": [
237 | {
238 | "data": {
239 | "text/plain": [
240 | "[(-2, 'MINUS TWO'), (-1, 'MINUS ONE'), (1, 'ONE'), (4, 'FOUR')]"
241 | ]
242 | },
243 | "execution_count": 60,
244 | "metadata": {},
245 | "output_type": "execute_result"
246 | }
247 | ],
248 | "source": [
249 | "import operator\n",
250 | "sorted(tuples, key=operator.itemgetter(0))"
251 | ]
252 | },
253 | {
254 | "cell_type": "code",
255 | "execution_count": 11,
256 | "metadata": {},
257 | "outputs": [
258 | {
259 | "data": {
260 | "text/plain": [
261 | "[(-2, 'MINUS TWO'), (-1, 'MINUS ONE'), (1, 'ONE'), (4, 'FOUR')]"
262 | ]
263 | },
264 | "execution_count": 11,
265 | "metadata": {},
266 | "output_type": "execute_result"
267 | }
268 | ],
269 | "source": [
270 | "sorted(tuples)"
271 | ]
272 | },
273 | {
274 | "cell_type": "code",
275 | "execution_count": 13,
276 | "metadata": {},
277 | "outputs": [
278 | {
279 | "data": {
280 | "text/plain": [
281 | "[0, 2, 4, 6, 8, 10, 12, 14]"
282 | ]
283 | },
284 | "execution_count": 13,
285 | "metadata": {},
286 | "output_type": "execute_result"
287 | }
288 | ],
289 | "source": [
290 | "list(filter(lambda x: x % 2 == 0, range(16)))"
291 | ]
292 | },
293 | {
294 | "cell_type": "code",
295 | "execution_count": 14,
296 | "metadata": {},
297 | "outputs": [
298 | {
299 | "data": {
300 | "text/plain": [
301 | "[0, 2, 4, 6, 8, 10, 12, 14]"
302 | ]
303 | },
304 | "execution_count": 14,
305 | "metadata": {},
306 | "output_type": "execute_result"
307 | }
308 | ],
309 | "source": [
310 | "[x for x in range(16) if x % 2 == 0]"
311 | ]
312 | },
313 | {
314 | "cell_type": "code",
315 | "execution_count": 15,
316 | "metadata": {},
317 | "outputs": [
318 | {
319 | "name": "stdout",
320 | "output_type": "stream",
321 | "text": [
322 | "[(4, 'FOUR'), (-1, 'MINUS ONE'), (-2, 'MINUS TWO'), (1, 'ONE')]\n"
323 | ]
324 | }
325 | ],
326 | "source": [
327 | "tuples = [(1, 'ONE'), (4, 'FOUR'), (-1, 'MINUS ONE'), (-2, 'MINUS TWO')]\n",
328 | "\n",
329 | "print(sorted(tuples, key=lambda x: x[1]))  # sorts by the second element (the string), unlike the default"
330 | ]
331 | },
332 | {
333 | "cell_type": "markdown",
334 | "metadata": {},
335 | "source": [
336 | "# decorators"
337 | ]
338 | },
339 | {
340 | "cell_type": "code",
341 | "execution_count": 24,
342 | "metadata": {},
343 | "outputs": [
344 | {
345 | "data": {
346 | "text/plain": [
347 | "'HELLO!'"
348 | ]
349 | },
350 | "execution_count": 24,
351 | "metadata": {},
352 | "output_type": "execute_result"
353 | }
354 | ],
355 | "source": [
356 | "def dec_upper(f):\n",
357 | "    return lambda: f().upper()\n",
358 | "\n",
359 | "def dec_lower(f):\n",
360 | "    return lambda: f().lower()\n",
361 | "\n",
362 | "@dec_upper  # applied last: decorators are applied bottom-up\n",
363 | "@dec_lower\n",
364 | "def say():\n",
365 | "    return 'Hello!'\n",
366 | "\n",
367 | "say()"
368 | ]
369 | },
370 | {
371 | "cell_type": "code",
372 | "execution_count": 29,
373 | "metadata": {},
374 | "outputs": [
375 | {
376 | "data": {
377 | "text/plain": [
378 | "'HELLOMIKE!'"
379 | ]
380 | },
381 | "execution_count": 29,
382 | "metadata": {},
383 | "output_type": "execute_result"
384 | }
385 | ],
386 | "source": [
387 | "def dec_upper(f):\n",
388 | "    def g(*args, **kwargs):\n",
389 | "        return f(*args, **kwargs).upper()\n",
390 | "    return g\n",
391 | "\n",
392 | "def dec_lower(f):\n",
393 | "    def g(*args, **kwargs):\n",
394 | "        return f(*args, **kwargs).lower()\n",
395 | "    return g\n",
396 | "\n",
397 | "@dec_upper\n",
398 | "@dec_lower\n",
399 | "def say(x):\n",
400 | "    return 'Hello' + x + '!'\n",
401 | "\n",
402 | "say('Mike')"
403 | ]
404 | },
405 | {
406 | "cell_type": "markdown",
407 | "metadata": {},
408 | "source": [
409 | "## optional arguments"
410 | ]
411 | },
412 | {
413 | "cell_type": "code",
414 | "execution_count": 21,
415 | "metadata": {},
416 | "outputs": [
417 | {
418 | "name": "stdout",
419 | "output_type": "stream",
420 | "text": [
421 | "()\n"
422 | ]
423 | }
424 | ],
425 | "source": [
426 | "def f(z, *x):\n",
427 | "    print(x)  # extra positional arguments arrive as a tuple (empty here)\n",
428 | "\n",
429 | "f(1)"
430 | ]
431 | },
432 | {
433 | "cell_type": "markdown",
434 | "metadata": {},
435 | "source": [
436 | "# argument unpacking"
437 | ]
438 | },
439 | {
440 | "cell_type": "code",
441 | "execution_count": 33,
442 | "metadata": {},
443 | "outputs": [
444 | {
445 | "name": "stdout",
446 | "output_type": "stream",
447 | "text": [
448 | "<1, 2, 3>\n",
449 | "<4, 5, 6>\n",
450 | "<0, 1, 4>\n",
451 | "<y, z, x>\n",
452 | "<30, 10, 20>\n"
453 | ]
454 | }
455 | ],
456 | "source": [
457 | "def print_vec(x, y, z):\n",
458 | "    print('<%s, %s, %s>' % (x, y, z))\n",
459 | "\n",
460 | "lst = [1, 2, 3]\n",
461 | "tpl = (4, 5, 6)\n",
462 | "gen = (i*i for i in range(3))\n",
463 | "\n",
464 | "print_vec(*lst)\n",
465 | "print_vec(*tpl)\n",
466 | "print_vec(*gen)\n",
467 | "\n",
468 | "\n",
469 | "dct = {'y': 10, 'z': 20, 'x': 30}\n",
470 | "print_vec(*dct)  # * unpacks the keys\n",
471 | "print_vec(**dct)  # ** passes the items as keyword arguments"
472 | ]
473 | },
474 | {
475 | "cell_type": "markdown",
476 | "metadata": {},
477 | "source": [
478 | "## OOP"
479 | ]
480 | },
481 | {
482 | "cell_type": "code",
483 | "execution_count": 34,
484 | "metadata": {},
485 | "outputs": [
486 | {
487 | "data": {
488 | "text/plain": [
489 | "(False, False, True, False, [1, 2], None, [1, 2])"
490 | ]
491 | },
492 | "execution_count": 34,
493 | "metadata": {},
494 | "output_type": "execute_result"
495 | }
496 | ],
497 | "source": [
498 | "a = [1, 2]\n",
499 | "b = [1].append(2)  # list.append returns None, so b is None\n",
500 | "c = [1, 2]\n",
501 | "\n",
502 | "a == b, a is b, a == c, a is c, a, b, c"
503 | ]
504 | },
505 | {
506 | "cell_type": "markdown",
507 | "metadata": {},
508 | "source": [
509 | "## what a function returns"
510 | ]
511 | },
512 | {
513 | "cell_type": "code",
514 | "execution_count": 35,
515 | "metadata": {},
516 | "outputs": [
517 | {
518 | "name": "stdout",
519 | "output_type": "stream",
520 | "text": [
521 | "None\n"
522 | ]
523 | }
524 | ],
525 | "source": [
526 | "def f(x, y):\n",
527 | "    z = x + y  # no return statement, so the function returns None\n",
528 | "\n",
529 | "print(f(1, 2))"
530 | ]
531 | },
532 | {
533 | "cell_type": "markdown",
534 | "metadata": {},
535 | "source": [
536 | "# describing a class (docstring, `__repr__`, `__str__`)"
537 | ]
538 | },
539 | {
540 | "cell_type": "code",
541 | "execution_count": 36,
542 | "metadata": {},
543 | "outputs": [
544 | {
545 | "name": "stdout",
546 | "output_type": "stream",
547 | "text": [
548 | "__str__ 2\n",
549 | "<class '__main__.MyClass'>\n",
550 | "Help on MyClass in module __main__ object:\n",
551 | "\n",
552 | "class MyClass(builtins.object)\n",
553 | " |  __comment__\n",
554 | " |  \n",
555 | " |  Methods defined here:\n",
556 | " |  \n",
557 | " |  __init__(self, var)\n",
558 | " |      Initialize self.  See help(type(self)) for accurate signature.\n",
559 | " |  \n",
560 | " |  __repr__(self)\n",
561 | " |      Return repr(self).\n",
562 | " |  \n",
563 | " |  __str__(self)\n",
564 | " |      Return str(self).\n",
565 | " |  \n",
566 | " |  ----------------------------------------------------------------------\n",
567 | " |  Data descriptors defined here:\n",
568 | " |  \n",
569 | " |  __dict__\n",
570 | " |      dictionary for instance variables (if defined)\n",
571 | " |  \n",
572 | " |  __weakref__\n",
573 | " |      list of weak references to the object (if defined)\n",
574 | "\n"
575 | ]
576 | },
577 | {
578 | "data": {
579 | "text/plain": [
580 | "__repr__ 2"
581 | ]
582 | },
583 | "execution_count": 36,
584 | "metadata": {},
585 | "output_type": "execute_result"
586 | }
587 | ],
588 | "source": [
589 | "class MyClass:\n",
590 | "    \"\"\"\n",
591 | "    __comment__\n",
592 | "    \"\"\"\n",
593 | "    def __init__(self, var):\n",
594 | "        self.var = var\n",
595 | "    def __repr__(self):\n",
596 | "        # for developers\n",
597 | "        return '__repr__ %g' % self.var\n",
598 | "    def __str__(self):\n",
599 | "        # human-readable\n",
600 | "        return '__str__ %g' % self.var\n",
601 | "\n",
602 | "mc = MyClass(2)\n",
603 | "print(mc)\n",
604 | "print(MyClass)\n",
605 | "help(mc)\n",
606 | "mc"
607 | ]
608 | },
609 | {
610 | "cell_type": "code",
611 | "execution_count": 37,
612 | "metadata": {},
613 | "outputs": [
614 | {
615 | "data": {
616 | "text/plain": [
617 | "'__str__ 2'"
618 | ]
619 | },
620 | "execution_count": 37,
621 | "metadata": {},
622 | "output_type": "execute_result"
623 | }
624 | ],
625 | "source": [
626 | "str(mc)"
627 | ]
628 | },
629 | {
630 | "cell_type": "markdown",
631 | "metadata": {},
632 | "source": [
633 | "## copying"
634 | ]
635 | },
636 | {
637 | "cell_type": "code",
638 | "execution_count": 39,
639 | "metadata": {},
640 | "outputs": [
641 | {
642 | "name": "stdout",
643 | "output_type": "stream",
644 | "text": [
645 | "[[1, 4], [2], [3]]\n",
646 | "[[1, 4], [2]]\n"
647 | ]
648 | }
649 | ],
650 | "source": [
651 | "x = [[1], [2]]\n",
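652 | "y = list(x)  # shallow copy: a new outer list that shares the inner lists\n",
653 | "x.append([3])  # not seen by y\n",
654 | "x[0].append(4)  # seen by y, because x[0] is y[0]\n",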
655 | "print (x)\n",
656 | "print (y)"
657 | ]
658 | },
659 | {
660 | "cell_type": "markdown",
661 | "metadata": {},
662 | "source": [
663 | "## Class and instance variables"
664 | ]
665 | },
666 | {
667 | "cell_type": "code",
668 | "execution_count": 40,
669 | "metadata": {},
670 | "outputs": [
671 | {
672 | "name": "stdout",
673 | "output_type": "stream",
674 | "text": [
675 | "2\n",
676 | "2 10\n",
677 | "0 10 2 20 2\n"
678 | ]
679 | }
680 | ],
681 | "source": [
682 | "class My:\n",
683 | "    a = 2 # class variable\n",
684 | "    def __init__(self, b):\n",
685 | "        self.b = b # instance variable\n",
686 | "    \n",
687 | "print (My.a)\n",
688 | "m = My(10)\n",
689 | "print (m.a, m.b)\n",
690 | "m2 = My(20)\n",
691 | "m.a = 0 # creates an instance attribute on m that shadows the class variable\n",
692 | "print (m.a, m.b, m2.a, m2.b, My.a)"
693 | ]
694 | },
695 | {
696 | "cell_type": "code",
697 | "execution_count": 41,
698 | "metadata": {},
699 | "outputs": [
700 | {
701 | "name": "stdout",
702 | "output_type": "stream",
703 | "text": [
704 | "10 20\n"
705 | ]
706 | }
707 | ],
708 | "source": [
709 | "class My:\n",
710 | "    a = 2 # class variable\n",
711 | "    def __init__(self, b):\n",
712 | "        self.b = b # instance variable\n",
713 | "    \n",
714 | "m = My(0)\n",
715 | "m.a = 10\n",
716 | "m.b = 20\n",
717 | "My.a = 30 # changes the class variable; m.a still resolves to the instance attribute\n",
718 | "# m = My(0)\n",
719 | "print (m.a, m.b)"
720 | ]
721 | },
722 | {
723 | "cell_type": "code",
724 | "execution_count": 42,
725 | "metadata": {},
726 | "outputs": [],
727 | "source": [
728 | "class MyClass:\n",
729 | "    def method(self):\n",
730 | "        return 'instance method called', self\n",
731 | "    @classmethod\n",
732 | "    def classmethod(cls):\n",
733 | "        return 'class method called', cls\n",
734 | "    @staticmethod\n",
735 | "    def staticmethod():\n",
736 | "        return 'static method called'"
737 | ]
738 | },
739 | {
740 | "cell_type": "code",
741 | "execution_count": 43,
742 | "metadata": {},
743 | "outputs": [
744 | {
745 | "data": {
746 | "text/plain": [
747 | "('instance method called', <__main__.MyClass at 0x7f3a8f1ce8d0>)"
748 | ]
749 | },
750 | "execution_count": 43,
751 | "metadata": {},
752 | "output_type": "execute_result"
753 | }
754 | ],
755 | "source": [
756 | "obj = MyClass()\n",
757 | "obj.method()"
758 | ]
759 | },
760 | {
761 | "cell_type": "code",
762 | "execution_count": 44,
763 | "metadata": {},
764 | "outputs": [
765 | {
766 | "data": {
767 | "text/plain": [
768 | "('instance method called', <__main__.MyClass at 0x7f3a8f1ce8d0>)"
769 | ]
770 | },
771 | "execution_count": 44,
772 | "metadata": {},
773 | "output_type": "execute_result"
774 | }
775 | ],
776 | "source": [
777 | "MyClass.method(obj)"
778 | ]
779 | },
780 | {
781 | "cell_type": "code",
782 | "execution_count": 45,
783 | "metadata": {},
784 | "outputs": [
785 | {
786 | "data": {
787 | "text/plain": [
788 | "('class method called', __main__.MyClass)"
789 | ]
790 | },
791 | "execution_count": 45,
792 | "metadata": {},
793 | "output_type": "execute_result"
794 | }
795 | ],
796 | "source": [
797 | "obj.classmethod()"
798 | ]
799 | },
800 | {
801 | "cell_type": "code",
802 | "execution_count": 46,
803 | "metadata": {},
804 | "outputs": [
805 | {
806 | "data": {
807 | "text/plain": [
808 | "'static method called'"
809 | ]
810 | },
811 | "execution_count": 46,
812 | "metadata": {},
813 | "output_type": "execute_result"
814 | }
815 | ],
816 | "source": [
817 | "obj.staticmethod()\n",
818 | "\n",
819 | "# a static method receives neither self nor cls, so it has no implicit access to instance or class state"
820 | ]
821 | },
822 | {
823 | "cell_type": "code",
824 | "execution_count": 47,
825 | "metadata": {},
826 | "outputs": [
827 | {
828 | "data": {
829 | "text/plain": [
830 | "('class method called', __main__.MyClass)"
831 | ]
832 | },
833 | "execution_count": 47,
834 | "metadata": {},
835 | "output_type": "execute_result"
836 | }
837 | ],
838 | "source": [
839 | "MyClass.classmethod()"
840 | ]
841 | },
842 | {
843 | "cell_type": "code",
844 | "execution_count": 49,
845 | "metadata": {},
846 | "outputs": [
847 | {
848 | "data": {
849 | "text/plain": [
850 | "'static method called'"
851 | ]
852 | },
853 | "execution_count": 49,
854 | "metadata": {},
855 | "output_type": "execute_result"
856 | }
857 | ],
858 | "source": [
859 | "MyClass.staticmethod()"
860 | ]
861 | },
862 | {
863 | "cell_type": "code",
864 | "execution_count": 50,
865 | "metadata": {},
866 | "outputs": [
867 | {
868 | "ename": "TypeError",
869 | "evalue": "method() missing 1 required positional argument: 'self'",
870 | "output_type": "error",
871 | "traceback": [
872 | "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
873 | "\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)",
874 | "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mMyClass\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mmethod\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
875 | "\u001b[0;31mTypeError\u001b[0m: method() missing 1 required positional argument: 'self'"
876 | ]
877 | }
878 | ],
879 | "source": [
880 | "MyClass.method()"
881 | ]
882 | },
883 | {
884 | "cell_type": "markdown",
885 | "metadata": {},
886 | "source": [
887 | "# Dictionaries"
888 | ]
889 | },
890 | {
891 | "cell_type": "code",
892 | "execution_count": 6,
893 | "metadata": {},
894 | "outputs": [
895 | {
896 | "data": {
897 | "text/plain": [
898 | "{(1, 2): [1, 2]}"
899 | ]
900 | },
901 | "execution_count": 6,
902 | "metadata": {},
903 | "output_type": "execute_result"
904 | }
905 | ],
906 | "source": [
907 | "dict({(1, 2): [1, 2]})"
908 | ]
909 | },
910 | {
911 | "cell_type": "code",
912 | "execution_count": 63,
913 | "metadata": {},
914 | "outputs": [
915 | {
916 | "data": {
917 | "text/plain": [
918 | "{True: 'maybe'}"
919 | ]
920 | },
921 | "execution_count": 63,
922 | "metadata": {},
923 | "output_type": "execute_result"
924 | }
925 | ],
926 | "source": [
927 | "{True: 'yes', 1: 'no', 1.0: 'maybe'}"
928 | ]
929 | },
930 | {
931 | "cell_type": "code",
932 | "execution_count": 64,
933 | "metadata": {},
934 | "outputs": [
935 | {
936 | "data": {
937 | "text/plain": [
938 | "True"
939 | ]
940 | },
941 | "execution_count": 64,
942 | "metadata": {},
943 | "output_type": "execute_result"
944 | }
945 | ],
946 | "source": [
947 | "True == 1 == 1.0"
948 | ]
949 | },
950 | {
951 | "cell_type": "markdown",
952 | "metadata": {},
953 | "source": [
954 | "# Ordered dictionary"
955 | ]
956 | },
957 | {
958 | "cell_type": "code",
959 | "execution_count": 7,
960 | "metadata": {},
961 | "outputs": [
962 | {
963 | "data": {
964 | "text/plain": [
965 | "OrderedDict([('one', 1), ('two', 2), ('three', 3)])"
966 | ]
967 | },
968 | "execution_count": 7,
969 | "metadata": {},
970 | "output_type": "execute_result"
971 | }
972 | ],
973 | "source": [
974 | "# dictionary that remembers the insertion order of its keys\n",
975 | "import collections\n",
976 | "d = collections.OrderedDict(one=1, two=2, three=3)\n",
977 | "d"
978 | ]
979 | },
980 | {
981 | "cell_type": "code",
982 | "execution_count": 12,
983 | "metadata": {},
984 | "outputs": [
985 | {
986 | "data": {
987 | "text/plain": [
988 | "(1, [])"
989 | ]
990 | },
991 | "execution_count": 12,
992 | "metadata": {},
993 | "output_type": "execute_result"
994 | }
995 | ],
996 | "source": [
997 | "# dictionary with a default value; note that default='111' is just an ordinary key\n",
998 | "import collections\n",
999 | "d = collections.defaultdict(list, one=1, two=2, three=3, default='111')\n",
1000 | "d['one'], d['2'] # a missing key is created with the factory's value, here an empty list"
1001 | ]
1002 | },
1003 | {
1004 | "cell_type": "code",
1005 | "execution_count": 14,
1006 | "metadata": {},
1007 | "outputs": [
1008 | {
1009 | "data": {
1010 | "text/plain": [
1011 | "(1, 2, 4, 6)"
1012 | ]
1013 | },
1014 | "execution_count": 14,
1015 | "metadata": {},
1016 | "output_type": "execute_result"
1017 | }
1018 | ],
1019 | "source": [
1020 | "# several dictionaries viewed as one; lookups return the first match in the chain\n",
1021 | "from collections import ChainMap \n",
1022 | "d1 = {1:1, 2:2}\n",
1023 | "d2 = {2:3, 3:4}\n",
1024 | "d3 = {3:5, 4:6}\n",
1025 | "cm = ChainMap(d1, d2, d3)\n",
1026 | "\n",
1027 | "cm[1], cm[2], cm[3], cm[4]"
1028 | ]
1029 | },
1030 | {
1031 | "cell_type": "markdown",
1032 | "metadata": {},
1033 | "source": [
1034 | "# Tuples"
1035 | ]
1036 | },
1037 | {
1038 | "cell_type": "code",
1039 | "execution_count": 19,
1040 | "metadata": {},
1041 | "outputs": [
1042 | {
1043 | "data": {
1044 | "text/plain": [
1045 | "((), (1,), 1)"
1046 | ]
1047 | },
1048 | "execution_count": 19,
1049 | "metadata": {},
1050 | "output_type": "execute_result"
1051 | }
1052 | ],
1053 | "source": [
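1054 | "a = ()      # empty tuple\n",
1055 | "# b = (,)   # SyntaxError\n",
1056 | "c = (1,)    # one-element tuple: the trailing comma is required\n",
1057 | "# d = (,1)  # SyntaxError\n",
1058 | "e = (1)     # just the int 1 in parentheses, not a tuple\n",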
1059 | "\n",
1060 | "a, c, e"
1061 | ]
1062 | },
1063 | {
1064 | "cell_type": "code",
1065 | "execution_count": 20,
1066 | "metadata": {},
1067 | "outputs": [
1068 | {
1069 | "data": {
1070 | "text/plain": [
1071 | "(1,)"
1072 | ]
1073 | },
1074 | "execution_count": 20,
1075 | "metadata": {},
1076 | "output_type": "execute_result"
1077 | }
1078 | ],
1079 | "source": [
1080 | "a + c"
1081 | ]
1082 | },
1083 | {
1084 | "cell_type": "markdown",
1085 | "metadata": {},
1086 | "source": [
1087 | "## Typed arrays"
1088 | ]
1089 | },
1090 | {
1091 | "cell_type": "code",
1092 | "execution_count": 21,
1093 | "metadata": {},
1094 | "outputs": [
1095 | {
1096 | "data": {
1097 | "text/plain": [
1098 | "(array('f', [1.0, 1.5, 2.0, 2.5]), array('f', [1.0, 1.5]))"
1099 | ]
1100 | },
1101 | "execution_count": 21,
1102 | "metadata": {},
1103 | "output_type": "execute_result"
1104 | }
1105 | ],
1106 | "source": [
1107 | "import array \n",
1108 | "arr = array.array('f', (1.0, 1.5, 2.0, 2.5))\n",
1109 | "arr, arr[:2]"
1110 | ]
1111 | },
1112 | {
1113 | "cell_type": "markdown",
1114 | "metadata": {},
1115 | "source": [
1116 | "# Strings"
1117 | ]
1118 | },
1119 | {
1120 | "cell_type": "code",
1121 | "execution_count": 22,
1122 | "metadata": {},
1123 | "outputs": [
1124 | {
1125 | "data": {
1126 | "text/plain": [
1127 | "'3'"
1128 | ]
1129 | },
1130 | "execution_count": 22,
1131 | "metadata": {},
1132 | "output_type": "execute_result"
1133 | }
1134 | ],
1135 | "source": [
1136 | "s1 = '123'\n",
1137 | "s2 = '45'\n",
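1138 | "s1 + s2     # concatenation gives '12345'; only the cell's last expression is displayed\n",
1139 | "#s1[4]      # IndexError: string index out of range\n",
1140 | "s1[-1]      # negative indices count from the end\n",
1141 | "# (s1 s2)   # SyntaxError\n",
1142 | "# del s1[1] # TypeError: strings are immutable"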
1143 | ]
1144 | },
1145 | {
1146 | "cell_type": "markdown",
1147 | "metadata": {},
1148 | "source": [
1149 | "## Named tuple"
1150 | ]
1151 | },
1152 | {
1153 | "cell_type": "code",
1154 | "execution_count": 31,
1155 | "metadata": {},
1156 | "outputs": [
1157 | {
1158 | "name": "stdout",
1159 | "output_type": "stream",
1160 | "text": [
1161 | "Авто(цвет='red', пробег='1200', автомат=True)\n"
1162 | ]
1163 | }
1164 | ],
1165 | "source": [
1166 | "# named tuple\n",
1167 | "from collections import namedtuple\n",
1168 | "Car = namedtuple('Авто', 'цвет пробег автомат')\n",
1169 | "print (Car('red', '1200', True))"
1170 | ]
1171 | },
1172 | {
1173 | "cell_type": "code",
1174 | "execution_count": 32,
1175 | "metadata": {},
1176 | "outputs": [
1177 | {
1178 | "name": "stdout",
1179 | "output_type": "stream",
1180 | "text": [
1181 | "Car(цвет='red', пробег='1200', автомат=True)\n"
1182 | ]
1183 | }
1184 | ],
1185 | "source": [
1186 | "# Python 3.6+; the annotations are not enforced at runtime, so '1200' stays a str\n",
1187 | "from typing import NamedTuple\n",
1188 | "class Car(NamedTuple):\n",
1189 | "    цвет: str\n",
1190 | "    пробег: float\n",
1191 | "    автомат: bool\n",
1192 | "print (Car('red', '1200', True))"
1193 | ]
1194 | },
1195 | {
1196 | "cell_type": "markdown",
1197 | "metadata": {},
1198 | "source": [
1199 | "# Multiset"
1200 | ]
1201 | },
1202 | {
1203 | "cell_type": "code",
1204 | "execution_count": 35,
1205 | "metadata": {},
1206 | "outputs": [
1207 | {
1208 | "data": {
1209 | "text/plain": [
1210 | "Counter({'клинок': 1, 'хлеб': 3})"
1211 | ]
1212 | },
1213 | "execution_count": 35,
1214 | "metadata": {},
1215 | "output_type": "execute_result"
1216 | }
1217 | ],
1218 | "source": [
1219 | "from collections import Counter \n",
1220 | "\n",
1221 | "inventory = Counter() \n",
1222 | "loot = {'клинок': 1, 'хлеб': 3} \n",
1223 | "inventory.update(loot) \n",
1224 | "inventory "
1225 | ]
1226 | },
1227 | {
1228 | "cell_type": "code",
1229 | "execution_count": 36,
1230 | "metadata": {},
1231 | "outputs": [
1232 | {
1233 | "data": {
1234 | "text/plain": [
1235 | "Counter({'клинок': 2, 'хлеб': 3, 'яблоко': 1})"
1236 | ]
1237 | },
1238 | "execution_count": 36,
1239 | "metadata": {},
1240 | "output_type": "execute_result"
1241 | }
1242 | ],
1243 | "source": [
1244 | "inventory.update({'клинок': 1, 'яблоко': 1}) \n",
1245 | "inventory"
1246 | ]
1247 | },
1248 | {
1249 | "cell_type": "code",
1250 | "execution_count": 40,
1251 | "metadata": {},
1252 | "outputs": [
1253 | {
1254 | "data": {
1255 | "text/plain": [
1256 | "(3, 6)"
1257 | ]
1258 | },
1259 | "execution_count": 40,
1260 | "metadata": {},
1261 | "output_type": "execute_result"
1262 | }
1263 | ],
1264 | "source": [
1265 | "len(inventory), sum(inventory.values())"
1266 | ]
1267 | },
1268 | {
1269 | "cell_type": "markdown",
1270 | "metadata": {},
1271 | "source": [
1272 | "# Double-ended queue (deque)"
1273 | ]
1274 | },
1275 | {
1276 | "cell_type": "code",
1277 | "execution_count": 41,
1278 | "metadata": {},
1279 | "outputs": [
1280 | {
1281 | "data": {
1282 | "text/plain": [
1283 | "('2', '1')"
1284 | ]
1285 | },
1286 | "execution_count": 41,
1287 | "metadata": {},
1288 | "output_type": "execute_result"
1289 | }
1290 | ],
1291 | "source": [
1292 | "# deque works as a stack (LIFO) here and as a FIFO queue with popleft()\n",
1293 | "\n",
1294 | "from collections import deque\n",
1295 | "# queue.LifoQueue is intended for multi-threaded code\n",
1296 | "s = deque()\n",
1297 | "s.append('1')\n",
1298 | "s.append('2')\n",
1299 | "s.pop(), s.pop()  # LIFO: pop() removes from the right end"
1300 | ]
1301 | },
1302 | {
1303 | "cell_type": "code",
1304 | "execution_count": 42,
1305 | "metadata": {},
1306 | "outputs": [
1307 | {
1308 | "data": {
1309 | "text/plain": [
1310 | "('1', '2')"
1311 | ]
1312 | },
1313 | "execution_count": 42,
1314 | "metadata": {},
1315 | "output_type": "execute_result"
1316 | }
1317 | ],
1318 | "source": [
1319 | "s = deque() \n",
1320 | "s.append('1') \n",
1321 | "s.append('2') \n",
1322 | "s.popleft(), s.popleft()  # FIFO: popleft() removes from the left end"
1323 | ]
1324 | },
1325 | {
1326 | "cell_type": "markdown",
1327 | "metadata": {},
1328 | "source": [
1329 | "# Priority queues"
1330 | ]
1331 | },
1332 | {
1333 | "cell_type": "code",
1334 | "execution_count": 43,
1335 | "metadata": {},
1336 | "outputs": [
1337 | {
1338 | "name": "stdout",
1339 | "output_type": "stream",
1340 | "text": [
1341 | "(1, 'a')\n",
1342 | "(2, 'b')\n",
1343 | "(3, 'c')\n"
1344 | ]
1345 | }
1346 | ],
1347 | "source": [
1348 | "from queue import PriorityQueue \n",
1349 | "q = PriorityQueue()\n",
1350 | "q.put((2, 'b')) \n",
1351 | "q.put((1, 'a')) \n",
1352 | "q.put((3, 'c'))\n",
1353 | "while not q.empty():\n",
1354 | "    next_item = q.get()\n",
1355 | "    print(next_item)"
1356 | ]
1357 | },
1358 | {
1359 | "cell_type": "markdown",
1360 | "metadata": {},
1361 | "source": [
1362 | "## Lists"
1363 | ]
1364 | },
1365 | {
1366 | "cell_type": "code",
1367 | "execution_count": 45,
1368 | "metadata": {},
1369 | "outputs": [
1370 | {
1371 | "name": "stdout",
1372 | "output_type": "stream",
1373 | "text": [
1374 | "[]\n",
1375 | "[]\n"
1376 | ]
1377 | }
1378 | ],
1379 | "source": [
1380 | "# clearing a list in place\n",
1381 | "lst = [1, 2, 3]\n",
1382 | "del lst[:]\n",
1383 | "print (lst)\n",
1384 | "\n",
1385 | "lst = [1, 2, 3]\n",
1386 | "lst.clear()\n",
1387 | "print (lst)\n"
1388 | ]
1389 | },
1390 | {
1391 | "cell_type": "markdown",
1392 | "metadata": {},
1393 | "source": [
1394 | "# Generators"
1395 | ]
1396 | },
1397 | {
1398 | "cell_type": "code",
1399 | "execution_count": 46,
1400 | "metadata": {},
1401 | "outputs": [],
1402 | "source": [
1403 | "def repeater(value):\n",
1404 | "    while True:\n",
1405 | "        yield value"
1406 | ]
1407 | },
1408 | {
1409 | "cell_type": "code",
1410 | "execution_count": 54,
1411 | "metadata": {},
1412 | "outputs": [
1413 | {
1414 | "data": {
1415 | "text/plain": [
1416 | "[0, -1, -4]"
1417 | ]
1418 | },
1419 | "execution_count": 54,
1420 | "metadata": {},
1421 | "output_type": "execute_result"
1422 | }
1423 | ],
1424 | "source": [
1425 | "vals = range(3) \n",
1426 | "squared = (i * i for i in vals) \n",
1427 | "negated = (-i for i in squared)\n",
1428 | "list(negated)"
1429 | ]
1430 | },
1431 | {
1432 | "cell_type": "markdown",
1433 | "metadata": {},
1434 | "source": [
1435 | "## Functions as list elements"
1436 | ]
1437 | },
1438 | {
1439 | "cell_type": "code",
1440 | "execution_count": 6,
1441 | "metadata": {},
1442 | "outputs": [
1443 | {
1444 | "name": "stdout",
1445 | "output_type": "stream",
1446 | "text": [
1447 | "Populating the interactive namespace from numpy and matplotlib\n"
1448 | ]
1449 | },
1450 | {
1451 | "name": "stderr",
1452 | "output_type": "stream",
1453 | "text": [
1454 | "/home/dash/anaconda3/lib/python3.6/site-packages/IPython/core/magics/pylab.py:160: UserWarning: pylab import has clobbered these variables: ['f']\n",
1455 | "`%matplotlib` prevents importing * from pylab and numpy\n",
1456 | " \"\\n`%matplotlib` prevents importing * from pylab and numpy\"\n",
1457 | "/home/dash/anaconda3/lib/python3.6/site-packages/ipykernel_launcher.py:13: RuntimeWarning: divide by zero encountered in log\n",
1458 | " del sys.path[0]\n"
1459 | ]
1460 | },
1461 | {
1462 | "data": {
1463 | "text/plain": [
1464 | "(0.0, 2.0)"
1465 | ]
1466 | },
1467 | "execution_count": 6,
1468 | "metadata": {},
1469 | "output_type": "execute_result"
1470 | },
1471 | {
1472 | "data": {
1473 | "image/png": "\n",
1474 | "text/plain": [
1475 | ""
1476 | ]
1477 | },
1478 | "metadata": {},
1479 | "output_type": "display_data"
1480 | }
1481 | ],
1482 | "source": [
1483 | "import pandas as pd\n",
1484 | "import numpy as np\n",
1485 | "# render figures inline\n",
1486 | "%pylab inline\n",
1487 | "import matplotlib.pyplot as plt\n",
1488 | "# slightly nicer-looking figures (newer matplotlib releases name this style 'seaborn-v0_8-dark'):\n",
1489 | "plt.style.use('seaborn-dark')\n",
1490 | "plt.rc('font', size=14)\n",
1491 | "\n",
1492 | "plt.figure(figsize=(7, 3))\n",
1493 | "x = np.linspace(0, 2)  # x starts at 0, where np.log gives -inf and a divide-by-zero warning\n",
1494 | "for f in [np.sin, np.cos, np.exp, np.log]:\n",
1495 | "    plt.plot(x, f(x), label=f.__name__)\n",
1496 | "plt.legend(loc=(1,0))\n",
1497 | "plt.grid(lw=2)\n",
1498 | "plt.xlim(x.min(), x.max())"
1499 | ]
1500 | },
1501 | {
1502 | "cell_type": "code",
1503 | "execution_count": null,
1504 | "metadata": {},
1505 | "outputs": [],
1506 | "source": []
1507 | }
1508 | ],
1509 | "metadata": {
1510 | "kernelspec": {
1511 | "display_name": "Python 3",
1512 | "language": "python",
1513 | "name": "python3"
1514 | },
1515 | "language_info": {
1516 | "codemirror_mode": {
1517 | "name": "ipython",
1518 | "version": 3
1519 | },
1520 | "file_extension": ".py",
1521 | "mimetype": "text/x-python",
1522 | "name": "python",
1523 | "nbconvert_exporter": "python",
1524 | "pygments_lexer": "ipython3",
1525 | "version": "3.6.5"
1526 | }
1527 | },
1528 | "nbformat": 4,
1529 | "nbformat_minor": 2
1530 | }
1531 |
--------------------------------------------------------------------------------