├── Nov-2017 ├── array_archive.npz ├── array_ex.txt ├── numpy-1-student.ipynb ├── numpy-1.ipynb ├── numpy-2-student.ipynb ├── numpy-2.ipynb └── some_array.npy ├── README.md ├── array_archive.npz ├── array_ex.txt ├── numpy-tutorial-student.ipynb ├── proj ├── .ipynb_checkpoints │ └── Untitled-checkpoint.ipynb ├── Untitled.ipynb ├── dict.pkl ├── embed.npy ├── p_vector.npy └── senti.binary.test.txt ├── python-numpy-tutorial.ipynb └── some_array.npy /Nov-2017/array_archive.npz: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ZeweiChu/numpy-tutorial/bcda441bee52af9ff392f58f688ec7f5cc57d8d5/Nov-2017/array_archive.npz -------------------------------------------------------------------------------- /Nov-2017/array_ex.txt: -------------------------------------------------------------------------------- 1 | 0.580052,0.186730,1.040717,1.134411 2 | 0.194163,-0.636917,-0.938659,0.124094 3 | -0.126410,0.268607,-0.695724,0.047428 4 | -1.484413,0.004176,-0.744203,0.005487 5 | 2.302869,0.200131,1.670238,-1.881090 6 | -0.193230,1.047233,0.482803,0.960334 -------------------------------------------------------------------------------- /Nov-2017/numpy-1-student.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# numpy基础\n", 8 | "\n", 9 | "### 七月在线python数据分析集训营 julyedu.com\n", 10 | "\n", 11 | "褚则伟 zeweichu@gmail.com" 12 | ] 13 | }, 14 | { 15 | "cell_type": "markdown", 16 | "metadata": {}, 17 | "source": [ 18 | "## Numpy简介\n", 19 | "\n", 20 | "- Numpy是Python语言的一个library [numpy](http://www.numpy.org/)\n", 21 | "- Numpy主要支持矩阵操作和运算\n", 22 | "- Numpy非常高效,core代码由C语言写成\n", 23 | "- 我们第三课要讲的pandas也是基于Numpy构建的一个library\n", 24 | "- 现在比较流行的机器学习框架(例如Tensorflow/PyTorch等等),语法都与Numpy比较接近" 25 | ] 26 | }, 27 | { 28 | "cell_type": "markdown", 29 | "metadata": {}, 30 | "source": [ 31 | "## 目录\n", 32 | "- 
数组简介和数组的构造(ndarray)\n", 33 | "- 数组取值和赋值\n", 34 | "- 数学运算\n", 35 | "- broadcasting广播" 36 | ] 37 | }, 38 | { 39 | "cell_type": "markdown", 40 | "metadata": {}, 41 | "source": [ 42 | "python里面调用一个包,用import对吧, 所以我们import `numpy` 包:\n", 43 | "\n", 44 | "如果还没有安装的话,你可以在command line界面使用`pip install numpy`" 45 | ] 46 | }, 47 | { 48 | "cell_type": "markdown", 49 | "metadata": {}, 50 | "source": [ 51 | "## Arrays/数组\n", 52 | "\n", 53 | "### 七月在线python数据分析集训营 julyedu.com" 54 | ] 55 | }, 56 | { 57 | "cell_type": "markdown", 58 | "metadata": {}, 59 | "source": [ 60 | "看你数组的维度啦,我自己的话比较简单粗暴,一般直接把1维数组就看做向量/vector,2维数组看做2维矩阵,3维数组看做3维矩阵..." 61 | ] 62 | }, 63 | { 64 | "cell_type": "markdown", 65 | "metadata": {}, 66 | "source": [ 67 | "可以调用np.array去从list初始化一个数组:" 68 | ] 69 | }, 70 | { 71 | "cell_type": "markdown", 72 | "metadata": {}, 73 | "source": [ 74 | "查看每个element的大小" 75 | ] 76 | }, 77 | { 78 | "cell_type": "markdown", 79 | "metadata": {}, 80 | "source": [ 81 | "有一些内置的创建数组的函数:" 82 | ] 83 | }, 84 | { 85 | "cell_type": "markdown", 86 | "metadata": {}, 87 | "source": [ 88 | "linspace也是一个很常用的初始化数据的手段,它可以帮我们产生一连串等间距的数组" 89 | ] 90 | }, 91 | { 92 | "cell_type": "markdown", 93 | "metadata": {}, 94 | "source": [ 95 | "## 使用reshape来改变tensor的形状\n", 96 | "### 七月在线python数据分析集训营 julyedu.com" 97 | ] 98 | }, 99 | { 100 | "cell_type": "markdown", 101 | "metadata": {}, 102 | "source": [ 103 | "numpy可以很容易地把一维数组转成二维数组,三维数组。" 104 | ] 105 | }, 106 | { 107 | "cell_type": "markdown", 108 | "metadata": {}, 109 | "source": [ 110 | "直接把shape给重新定义了其实也可以" 111 | ] 112 | }, 113 | { 114 | "cell_type": "markdown", 115 | "metadata": {}, 116 | "source": [ 117 | "如果我们在某一个维度上写上-1,numpy会帮我们自动推导出正确的维度" 118 | ] 119 | }, 120 | { 121 | "cell_type": "markdown", 122 | "metadata": {}, 123 | "source": [ 124 | "还可以从其他的ndarray中获取shape信息然后reshape" 125 | ] 126 | }, 127 | { 128 | "cell_type": "markdown", 129 | "metadata": {}, 130 | "source": [ 131 | "高维数组可以用ravel来拉平" 132 | ] 133 | }, 134 | { 135 | "cell_type": "markdown", 136 | 
"metadata": {}, 137 | "source": [ 138 | "### 数组的数据类型 dtype\n", 139 | "\n", 140 | "数组可以有不同的数据类型" 141 | ] 142 | }, 143 | { 144 | "cell_type": "markdown", 145 | "metadata": {}, 146 | "source": [ 147 | "生成数组时可以指定数据类型,如果不指定numpy会自动匹配合适的类型" 148 | ] 149 | }, 150 | { 151 | "cell_type": "markdown", 152 | "metadata": {}, 153 | "source": [ 154 | "有时候如果我们需要ndarray是一个特定的数据类型,可以使用astype复制数组并转换数据类型" 155 | ] 156 | }, 157 | { 158 | "cell_type": "markdown", 159 | "metadata": {}, 160 | "source": [ 161 | "使用astype将float转换为int时小数部分被舍弃" 162 | ] 163 | }, 164 | { 165 | "cell_type": "markdown", 166 | "metadata": {}, 167 | "source": [ 168 | "使用astype把字符串转换为数组,如果失败抛出异常。" 169 | ] 170 | }, 171 | { 172 | "cell_type": "markdown", 173 | "metadata": {}, 174 | "source": [ 175 | "astype使用其它数组的数据类型作为参数" 176 | ] 177 | }, 178 | { 179 | "cell_type": "markdown", 180 | "metadata": {}, 181 | "source": [ 182 | "更多的内容可以读读[文档](http://docs.scipy.org/doc/numpy/reference/arrays.dtypes.html)." 183 | ] 184 | }, 185 | { 186 | "cell_type": "markdown", 187 | "metadata": {}, 188 | "source": [ 189 | "## Array indexing/数组取值和赋值\n", 190 | "\n", 191 | "### 七月在线python数据分析集训营 julyedu.com" 192 | ] 193 | }, 194 | { 195 | "cell_type": "markdown", 196 | "metadata": {}, 197 | "source": [ 198 | "Numpy提供了蛮多种取值的方式的." 199 | ] 200 | }, 201 | { 202 | "cell_type": "markdown", 203 | "metadata": {}, 204 | "source": [ 205 | "可以像list一样切片(多维数组可以从各个维度同时切片):" 206 | ] 207 | }, 208 | { 209 | "cell_type": "markdown", 210 | "metadata": {}, 211 | "source": [ 212 | "虽然,怎么说呢,不建议你这样去赋值,但是你确实可以修改切片出来的对象,然后完成对原数组的赋值." 213 | ] 214 | }, 215 | { 216 | "cell_type": "markdown", 217 | "metadata": {}, 218 | "source": [ 219 | "关于Copy和View的关系\n", 220 | "- 简单的数组赋值,切片,包括作为函数的参数传递一个数组--并不会复制出一个新的数组,只是制造了一个新的reference。所以如果我们在新赋值的变量上改变数组的内容,原来的那个数组内容也会发生改变。这一点千万要注意哦!" 
221 | ] 222 | }, 223 | { 224 | "cell_type": "markdown", 225 | "metadata": {}, 226 | "source": [ 227 | "- 使用`view`方法,我们可以拿到数组的一部分或者全部,但是在view上面修改内容还是会把原来的数组给更改了" 228 | ] 229 | }, 230 | { 231 | "cell_type": "markdown", 232 | "metadata": {}, 233 | "source": [ 234 | "使用`base`方法可以查看一个数组的owner是谁,也就是说这个数组是由谁制造产生的。" 235 | ] 236 | }, 237 | { 238 | "cell_type": "markdown", 239 | "metadata": {}, 240 | "source": [ 241 | "其实使用切片方法我们拿到的也是一个view" 242 | ] 243 | }, 244 | { 245 | "cell_type": "markdown", 246 | "metadata": {}, 247 | "source": [ 248 | "所以更改切片上的内容之后,原来数组的内容也被更改了" 249 | ] 250 | }, 251 | { 252 | "cell_type": "markdown", 253 | "metadata": {}, 254 | "source": [ 255 | "如果要复制出一个新的数组,我们就需要使用`copy()`这个方法了" 256 | ] 257 | }, 258 | { 259 | "cell_type": "markdown", 260 | "metadata": {}, 261 | "source": [ 262 | "下面我们继续回到数组切片的问题上\n", 263 | "\n", 264 | "创建3x4的2维数组/矩阵" 265 | ] 266 | }, 267 | { 268 | "cell_type": "markdown", 269 | "metadata": {}, 270 | "source": [ 271 | "你就放心大胆地去取你想要的数咯:" 272 | ] 273 | }, 274 | { 275 | "cell_type": "markdown", 276 | "metadata": {}, 277 | "source": [ 278 | "试试在第2个维度上切片也一样的:" 279 | ] 280 | }, 281 | { 282 | "cell_type": "markdown", 283 | "metadata": {}, 284 | "source": [ 285 | "dots(...)" 286 | ] 287 | }, 288 | { 289 | "cell_type": "markdown", 290 | "metadata": {}, 291 | "source": [ 292 | "下面这个高级了,更自由地取值和组合,但是要看清楚一点:" 293 | ] 294 | }, 295 | { 296 | "cell_type": "markdown", 297 | "metadata": {}, 298 | "source": [ 299 | "再来熟悉一下\n", 300 | "\n", 301 | "先创建一个2维数组" 302 | ] 303 | }, 304 | { 305 | "cell_type": "markdown", 306 | "metadata": {}, 307 | "source": [ 308 | "用下标生成一个向量" 309 | ] 310 | }, 311 | { 312 | "cell_type": "markdown", 313 | "metadata": {}, 314 | "source": [ 315 | "你能看明白下面做的事情吗?" 
316 | ] 317 | }, 318 | { 319 | "cell_type": "markdown", 320 | "metadata": {}, 321 | "source": [ 322 | "Since we can select these elements, we can of course also operate on them." 323 | ] 324 | }, 325 | { 326 | "cell_type": "markdown", 327 | "metadata": {}, 328 | "source": [ 329 | "### Conditional selection in NumPy\n", 330 | "\n", 331 | "One of the fancier ways to select elements is with a condition (and it is very handy):" 332 | ] 333 | }, 334 | { 335 | "cell_type": "markdown", 336 | "metadata": {}, 337 | "source": [ 338 | "Using the boolean array from the previous step as an index selects the elements that satisfy the condition." 339 | ] 340 | }, 341 | { 342 | "cell_type": "markdown", 343 | "metadata": {}, 344 | "source": [ 345 | "This can also be done in a single expression:" 346 | ] 347 | }, 348 | { 349 | "cell_type": "markdown", 350 | "metadata": {}, 351 | "source": [ 352 | "There are many more details and other ways to index; see the official documentation." 353 | ] 354 | }, 355 | { 356 | "cell_type": "markdown", 357 | "metadata": {}, 358 | "source": [ 359 | "To summarize, look at the slicing and indexing patterns below (each color marks the result of the matching expression):" 360 | ] 361 | }, 362 | { 363 | "cell_type": "markdown", 364 | "metadata": {}, 365 | "source": [ 366 | "![](http://old.sebug.net/paper/books/scipydoc/_images/numpy_intro_02.png)\n", 367 | "![](http://old.sebug.net/paper/books/scipydoc/_images/numpy_intro_03.png)" 368 | ] 369 | }, 370 | { 371 | "cell_type": "markdown", 372 | "metadata": {}, 373 | "source": [ 374 | "## Basic math operations\n", 375 | "### 七月在线 Python data analysis bootcamp, julyedu.com" 376 | ] 377 | }, 378 | { 379 | "cell_type": "markdown", 380 | "metadata": {}, 381 | "source": [ 382 | "The operations below come up constantly in scientific computing, starting with elementwise operations:" 383 | ] 384 | }, 385 | { 386 | "cell_type": "markdown", 387 | "metadata": {}, 388 | "source": [ 389 | "Elementwise addition can be written in the following two ways" 390 | ] 391 | }, 392 | { 393 | "cell_type": "markdown", 394 | "metadata": {}, 395 | "source": [ 396 | "Elementwise subtraction" 397 | ] 398 | }, 399 | { 400 | "cell_type": "markdown", 401 | "metadata": {}, 402 | "source": [ 403 | "Elementwise multiplication" 404 | ] 405 | }, 406 | { 407 | "cell_type": "markdown", 408 | "metadata": {}, 409 | "source": [ 410 | "Elementwise division" 411 | ] 412 | }, 413 | { 414 | "cell_type": "markdown", 415 | "metadata": {}, 416 | "source": [ 417 | "Elementwise square root" 
418 | ] 419 | }, 420 | { 421 | "cell_type": "markdown", 422 | "metadata": {}, 423 | "source": [ 424 | "Elementwise squaring works too" 425 | ] 426 | }, 427 | { 428 | "cell_type": "markdown", 429 | "metadata": {}, 430 | "source": [ 431 | "Can you guess the most common reduction over the elements of a matrix in scientific computing? Right, it is summation, which `sum` handles:" 432 | ] 433 | }, 434 | { 435 | "cell_type": "markdown", 436 | "metadata": {}, 437 | "source": [ 438 | "Other reductions are available too: NumPy can compute sums, means, cumulative sums, and cumulative products" 439 | ] 440 | }, 441 | { 442 | "cell_type": "markdown", 443 | "metadata": {}, 444 | "source": [ 445 | "That covers the most basic operations; for more, consult the [documentation](http://docs.scipy.org/doc/numpy/reference/routines.math.html).\n", 446 | "\n", 447 | "Beyond basic arithmetic, we often need to reshape, transpose, or reorder a matrix:" 448 | ] 449 | }, 450 | { 451 | "cell_type": "markdown", 452 | "metadata": {}, 453 | "source": [ 454 | "Sorting a 1-D array" 455 | ] 456 | }, 457 | { 458 | "cell_type": "markdown", 459 | "metadata": {}, 460 | "source": [ 461 | "A 2-D array can also be sorted along a chosen axis" 462 | ] 463 | }, 464 | { 465 | "cell_type": "markdown", 466 | "metadata": {}, 467 | "source": [ 468 | "As a small exercise, find the value that sits at the 5% position after sorting" 469 | ] 470 | }, 471 | { 472 | "cell_type": "markdown", 473 | "metadata": {}, 474 | "source": [ 475 | "## Broadcasting\n", 476 | "### 七月在线 Python data analysis bootcamp, julyedu.com" 477 | ] 478 | }, 479 | { 480 | "cell_type": "markdown", 481 | "metadata": {}, 482 | "source": [ 483 | "There is no perfectly fitting Chinese term for this; for now we will call it “传播” (propagation):
\n", 484 | "What is it for? Picture a small matrix that has to be combined with a larger one, where the small matrix should be reused against each matching block of the large one. That is exactly what broadcasting does." 485 | ] 486 | }, 487 | { 488 | "cell_type": "markdown", 489 | "metadata": {}, 490 | "source": [ 491 | "Here is the task: add a vector elementwise to every row of x, producing y" 492 | ] 493 | }, 494 | { 495 | "cell_type": "markdown", 496 | "metadata": {}, 497 | "source": [ 498 | "The brute-force approach is a for loop that adds the vector row by row" 499 | ] 500 | }, 501 | { 502 | "cell_type": "markdown", 503 | "metadata": {}, 504 | "source": [ 505 | "That works, but it is inefficient: when x has a very large number of rows, the loop becomes slow:" 506 | ] 507 | }, 508 | { 509 | "cell_type": "markdown", 510 | "metadata": {}, 511 | "source": [ 512 | "NumPy broadcasting allows us to perform this computation without actually creating multiple copies of v. Consider this version, using broadcasting:" 513 | ] 514 | }, 515 | { 516 | "cell_type": "markdown", 517 | "metadata": {}, 518 | "source": [ 519 | "Thanks to broadcasting, the whole operation above collapses into a single addition" 520 | ] 521 | }, 522 | { 523 | "cell_type": "markdown", 524 | "metadata": {}, 525 | "source": [ 526 | "When operating on two arrays, NumPy compares their shapes dimension by dimension, starting from the trailing dimension. Two dimensions are compatible, and broadcasting produces a result, when:
\n", 527 | "\n", 528 | "1. They are equal\n", 529 | "2. One of them is 1 (that dimension is then stretched, by conceptual copying, until the shapes match)\n", 530 | "3. When the two ndarrays have different numbers of dimensions, the shape of the lower-rank ndarray is padded with leading dimensions of size 1 until both ranks match, and compatibility is then checked\n", 531 | "\n", 532 | "For example, when adding:\n", 533 | "```python\n", 534 | "Image (3d array): 256 x 256 x 3\n", 535 | "Scale (1d array): 3\n", 536 | "Result (3d array): 256 x 256 x 3\n", 537 | "\n", 538 | "A (4d array): 8 x 1 x 6 x 1\n", 539 | "B (3d array): 7 x 1 x 5\n", 540 | "Result (4d array): 8 x 7 x 6 x 5\n", 541 | "\n", 542 | "A (2d array): 5 x 4\n", 543 | "B (1d array): 1\n", 544 | "Result (2d array): 5 x 4\n", 545 | "\n", 546 | "A (3d array): 15 x 3 x 5\n", 547 | "B (3d array): 15 x 1 x 5\n", 548 | "Result (3d array): 15 x 3 x 5\n", 549 | "```\n", 550 | "\n", 551 | "Here are some broadcasting examples:" 552 | ] 553 | }, 554 | { 555 | "cell_type": "markdown", 556 | "metadata": {}, 557 | "source": [ 558 | "Let's work through this use of broadcasting.\n", 559 | "\n", 560 | "First reshape v into a 3x1 array, which can then be broadcast and added to w:" 561 | ] 562 | }, 563 | { 564 | "cell_type": "markdown", 565 | "metadata": {}, 566 | "source": [ 567 | "What about adding a vector to every row of a matrix?" 568 | ] 569 | }, 570 | { 571 | "cell_type": "markdown", 572 | "metadata": {}, 573 | "source": [ 574 | "That was more complicated than necessary; we can simply do this instead" 575 | ] 576 | }, 577 | { 578 | "cell_type": "markdown", 579 | "metadata": {}, 580 | "source": [ 581 | "Broadcasting naturally works with elementwise operations" 582 | ] 583 | }, 584 | { 585 | "cell_type": "markdown", 586 | "metadata": {}, 587 | "source": [ 588 | "To summarize broadcasting, see the figure below:
\n", 589 | "![](http://www.astroml.org/_images/fig_broadcast_visual_1.png)" 590 | ] 591 | }, 592 | { 593 | "cell_type": "markdown", 594 | "metadata": {}, 595 | "source": [ 596 | "## 逻辑运算\n", 597 | "### 七月在线python数据分析班 2017升级版 julyedu.com" 598 | ] 599 | }, 600 | { 601 | "cell_type": "markdown", 602 | "metadata": {}, 603 | "source": [ 604 | "where可以帮我们选择是取第一个ndarray的元素还是第二个的" 605 | ] 606 | }, 607 | { 608 | "cell_type": "markdown", 609 | "metadata": {}, 610 | "source": [ 611 | "## 连接两个二维数组\n", 612 | "### 七月在线python数据分析集训营 julyedu.com" 613 | ] 614 | }, 615 | { 616 | "cell_type": "markdown", 617 | "metadata": {}, 618 | "source": [ 619 | "所谓堆叠,参考叠盘子。。。连接的另一种表述\n", 620 | "垂直stack与水平stack" 621 | ] 622 | }, 623 | { 624 | "cell_type": "markdown", 625 | "metadata": {}, 626 | "source": [ 627 | "拆分数组, 我们使用[split方法](https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.split.html)。\n", 628 | "\n", 629 | "split(array, indices_or_sections, axis=0)\n", 630 | "\n", 631 | "第一个参数array没有什么疑问,第二个参数可以是切断的index,也可以是切分的个数,第三个参数是我们切块的维度" 632 | ] 633 | }, 634 | { 635 | "cell_type": "markdown", 636 | "metadata": {}, 637 | "source": [ 638 | "如果我们想要直接平均切分成三块呢?" 
639 | ] 640 | }, 641 | { 642 | "cell_type": "markdown", 643 | "metadata": {}, 644 | "source": [ 645 | "Stacking helpers" 646 | ] 647 | }, 648 | { 649 | "cell_type": "markdown", 650 | "metadata": {}, 651 | "source": [ 652 | "r_ stacks row-wise" 653 | ] 654 | }, 655 | { 656 | "cell_type": "markdown", 657 | "metadata": {}, 658 | "source": [ 659 | "c_ stacks column-wise" 660 | ] 661 | }, 662 | { 663 | "cell_type": "markdown", 664 | "metadata": {}, 665 | "source": [ 666 | "Slice notation is converted directly into an array" 667 | ] 668 | }, 669 | { 670 | "cell_type": "markdown", 671 | "metadata": {}, 672 | "source": [ 673 | "Use repeat to repeat the elements of an ndarray" 674 | ] 675 | }, 676 | { 677 | "cell_type": "markdown", 678 | "metadata": {}, 679 | "source": [ 680 | "Repeat element by element" 681 | ] 682 | }, 683 | { 684 | "cell_type": "markdown", 685 | "metadata": {}, 686 | "source": [ 687 | "Repeat along a specified axis" 688 | ] 689 | }, 690 | { 691 | "cell_type": "markdown", 692 | "metadata": {}, 693 | "source": [ 694 | "Tile: think of laying tiles\n", 695 | "[numpy tile](https://docs.scipy.org/doc/numpy/reference/generated/numpy.tile.html)" 696 | ] 697 | } 698 | ], 699 | "metadata": { 700 | "kernelspec": { 701 | "display_name": "Python 3", 702 | "language": "python", 703 | "name": "python3" 704 | }, 705 | "language_info": { 706 | "codemirror_mode": { 707 | "name": "ipython", 708 | "version": 3 709 | }, 710 | "file_extension": ".py", 711 | "mimetype": "text/x-python", 712 | "name": "python", 713 | "nbconvert_exporter": "python", 714 | "pygments_lexer": "ipython3", 715 | "version": "3.6.1" 716 | } 717 | }, 718 | "nbformat": 4, 719 | "nbformat_minor": 1 720 | } 721 | -------------------------------------------------------------------------------- /Nov-2017/numpy-1.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# NumPy basics\n", 8 | "\n", 9 | "### 七月在线 Python data analysis bootcamp, julyedu.com\n", 10 | "\n", 11 | "褚则伟 zeweichu@gmail.com" 12 | ] 13 | }, 14 | { 15 | "cell_type": "markdown", 16 | 
"metadata": {}, 17 | "source": [ 18 | "## Numpy简介\n", 19 | "\n", 20 | "- Numpy是Python语言的一个library [numpy](http://www.numpy.org/)\n", 21 | "- Numpy主要支持矩阵操作和运算\n", 22 | "- Numpy非常高效,core代码由C语言写成\n", 23 | "- 我们第三课要讲的pandas也是基于Numpy构建的一个library\n", 24 | "- 现在比较流行的机器学习框架(例如Tensorflow/PyTorch等等),语法都与Numpy比较接近" 25 | ] 26 | }, 27 | { 28 | "cell_type": "markdown", 29 | "metadata": {}, 30 | "source": [ 31 | "## 目录\n", 32 | "- 数组简介和数组的构造(ndarray)\n", 33 | "- 数组取值和赋值\n", 34 | "- 数学运算" 35 | ] 36 | }, 37 | { 38 | "cell_type": "markdown", 39 | "metadata": {}, 40 | "source": [ 41 | "python里面调用一个包,用import对吧, 所以我们import `numpy` 包:\n", 42 | "\n", 43 | "如果还没有安装的话,你可以在command line界面使用`pip install numpy`" 44 | ] 45 | }, 46 | { 47 | "cell_type": "code", 48 | "execution_count": 1, 49 | "metadata": { 50 | "collapsed": true 51 | }, 52 | "outputs": [], 53 | "source": [ 54 | "import numpy as np" 55 | ] 56 | }, 57 | { 58 | "cell_type": "markdown", 59 | "metadata": {}, 60 | "source": [ 61 | "## Arrays/数组\n", 62 | "\n", 63 | "### 七月在线python数据分析集训营 julyedu.com" 64 | ] 65 | }, 66 | { 67 | "cell_type": "markdown", 68 | "metadata": {}, 69 | "source": [ 70 | "看你数组的维度啦,我自己的话比较简单粗暴,一般直接把1维数组就看做向量/vector,2维数组看做2维矩阵,3维数组看做3维矩阵..." 
71 | ] 72 | }, 73 | { 74 | "cell_type": "markdown", 75 | "metadata": {}, 76 | "source": [ 77 | "Call np.array to initialize an array from a list:" 78 | ] 79 | }, 80 | { 81 | "cell_type": "code", 82 | "execution_count": 2, 83 | "metadata": {}, 84 | "outputs": [ 85 | { 86 | "name": "stdout", 87 | "output_type": "stream", 88 | "text": [ 89 | "<class 'numpy.ndarray'> (3,) 1 2 3\n", 90 | "[5 2 3]\n" 91 | ] 92 | } 93 | ], 94 | "source": [ 95 | "a = np.array([1, 2, 3]) # 1-D array\n", 96 | "print(type(a), a.shape, a[0], a[1], a[2])\n", 97 | "a[0] = 5 # assign a new value\n", 98 | "print(a) " 99 | ] 100 | }, 101 | { 102 | "cell_type": "code", 103 | "execution_count": 3, 104 | "metadata": {}, 105 | "outputs": [ 106 | { 107 | "name": "stdout", 108 | "output_type": "stream", 109 | "text": [ 110 | "[[1 2 3]\n", 111 | " [4 5 6]]\n" 112 | ] 113 | } 114 | ], 115 | "source": [ 116 | "b = np.array([[1,2,3],[4,5,6]]) # 2-D array\n", 117 | "print(b)" 118 | ] 119 | }, 120 | { 121 | "cell_type": "code", 122 | "execution_count": 4, 123 | "metadata": {}, 124 | "outputs": [ 125 | { 126 | "name": "stdout", 127 | "output_type": "stream", 128 | "text": [ 129 | "(2, 3)\n", 130 | "1 2 4\n" 131 | ] 132 | } 133 | ], 134 | "source": [ 135 | "print(b.shape) # inspect the shape (used all the time!)
\n", 136 | "print(b[0, 0], b[0, 1], b[1, 0])" 137 | ] 138 | }, 139 | { 140 | "cell_type": "code", 141 | "execution_count": 5, 142 | "metadata": {}, 143 | "outputs": [ 144 | { 145 | "name": "stdout", 146 | "output_type": "stream", 147 | "text": [ 148 | "6\n" 149 | ] 150 | } 151 | ], 152 | "source": [ 153 | "print(b.size)" 154 | ] 155 | }, 156 | { 157 | "cell_type": "code", 158 | "execution_count": 6, 159 | "metadata": {}, 160 | "outputs": [ 161 | { 162 | "name": "stdout", 163 | "output_type": "stream", 164 | "text": [ 165 | "int64\n" 166 | ] 167 | } 168 | ], 169 | "source": [ 170 | "print(b.dtype)" 171 | ] 172 | }, 173 | { 174 | "cell_type": "markdown", 175 | "metadata": {}, 176 | "source": [ 177 | "查看每个element的大小" 178 | ] 179 | }, 180 | { 181 | "cell_type": "code", 182 | "execution_count": 7, 183 | "metadata": { 184 | "scrolled": true 185 | }, 186 | "outputs": [ 187 | { 188 | "name": "stdout", 189 | "output_type": "stream", 190 | "text": [ 191 | "8\n" 192 | ] 193 | } 194 | ], 195 | "source": [ 196 | "print(b.itemsize)" 197 | ] 198 | }, 199 | { 200 | "cell_type": "markdown", 201 | "metadata": {}, 202 | "source": [ 203 | "有一些内置的创建数组的函数:" 204 | ] 205 | }, 206 | { 207 | "cell_type": "code", 208 | "execution_count": 8, 209 | "metadata": {}, 210 | "outputs": [ 211 | { 212 | "name": "stdout", 213 | "output_type": "stream", 214 | "text": [ 215 | "[[ 0. 0.]\n", 216 | " [ 0. 0.]]\n" 217 | ] 218 | } 219 | ], 220 | "source": [ 221 | "a = np.zeros((2,2)) # 创建2x2的全0数组\n", 222 | "print(a)" 223 | ] 224 | }, 225 | { 226 | "cell_type": "code", 227 | "execution_count": 9, 228 | "metadata": {}, 229 | "outputs": [ 230 | { 231 | "name": "stdout", 232 | "output_type": "stream", 233 | "text": [ 234 | "[[ 1. 
1.]]\n" 235 | ] 236 | } 237 | ], 238 | "source": [ 239 | "b = np.ones((1,2)) # 创建1x2的全1数组\n", 240 | "print(b)" 241 | ] 242 | }, 243 | { 244 | "cell_type": "code", 245 | "execution_count": 10, 246 | "metadata": {}, 247 | "outputs": [ 248 | { 249 | "name": "stdout", 250 | "output_type": "stream", 251 | "text": [ 252 | "[[7 7]\n", 253 | " [7 7]]\n" 254 | ] 255 | } 256 | ], 257 | "source": [ 258 | "c = np.full((2,2), 7) # 定值数组\n", 259 | "print(c) " 260 | ] 261 | }, 262 | { 263 | "cell_type": "code", 264 | "execution_count": 11, 265 | "metadata": {}, 266 | "outputs": [ 267 | { 268 | "name": "stdout", 269 | "output_type": "stream", 270 | "text": [ 271 | "[[ 1. 0.]\n", 272 | " [ 0. 1.]]\n" 273 | ] 274 | } 275 | ], 276 | "source": [ 277 | "d = np.eye(2) # 对角矩阵(对角元素为1)\n", 278 | "print(d)" 279 | ] 280 | }, 281 | { 282 | "cell_type": "code", 283 | "execution_count": 12, 284 | "metadata": {}, 285 | "outputs": [ 286 | { 287 | "name": "stdout", 288 | "output_type": "stream", 289 | "text": [ 290 | "[[ 0.18371333 0.67849295]\n", 291 | " [ 0.56642033 0.87021502]]\n" 292 | ] 293 | } 294 | ], 295 | "source": [ 296 | "e = np.random.random((2,2)) # 2x2的随机数组(矩阵)\n", 297 | "print(e)" 298 | ] 299 | }, 300 | { 301 | "cell_type": "code", 302 | "execution_count": 13, 303 | "metadata": {}, 304 | "outputs": [ 305 | { 306 | "name": "stdout", 307 | "output_type": "stream", 308 | "text": [ 309 | "[[[ 0.00000000e+000 3.11108892e+231]\n", 310 | " [ 2.96439388e-323 0.00000000e+000]\n", 311 | " [ 2.12199579e-314 1.58817677e-052]]\n", 312 | "\n", 313 | " [[ 5.20845631e-090 1.69175720e-052]\n", 314 | " [ 3.61111103e+174 4.79126305e-037]\n", 315 | " [ 3.99910963e+252 8.34404912e-309]]]\n", 316 | "(2, 3, 2)\n" 317 | ] 318 | } 319 | ], 320 | "source": [ 321 | "f = np.empty((2,3,2)) # empty是未初始化的数据\n", 322 | "print(f)\n", 323 | "print(f.shape)" 324 | ] 325 | }, 326 | { 327 | "cell_type": "code", 328 | "execution_count": 14, 329 | "metadata": {}, 330 | "outputs": [ 331 | { 332 | "name": "stdout", 333 | 
"output_type": "stream", 334 | "text": [ 335 | "[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14]\n", 336 | "(15,)\n" 337 | ] 338 | } 339 | ], 340 | "source": [ 341 | "g = np.arange(15) # 用arange可以生成连续的一串元素\n", 342 | "print(g)\n", 343 | "print(g.shape)" 344 | ] 345 | }, 346 | { 347 | "cell_type": "markdown", 348 | "metadata": {}, 349 | "source": [ 350 | "linspace也是一个很常用的初始化数据的手段,它可以帮我们产生一连串等间距的数组" 351 | ] 352 | }, 353 | { 354 | "cell_type": "code", 355 | "execution_count": 15, 356 | "metadata": {}, 357 | "outputs": [ 358 | { 359 | "data": { 360 | "text/plain": [ 361 | "array([ 2. , 2.25, 2.5 , 2.75, 3. ])" 362 | ] 363 | }, 364 | "execution_count": 15, 365 | "metadata": {}, 366 | "output_type": "execute_result" 367 | } 368 | ], 369 | "source": [ 370 | "np.linspace(2.0, 3.0, 5)" 371 | ] 372 | }, 373 | { 374 | "cell_type": "markdown", 375 | "metadata": {}, 376 | "source": [ 377 | "## 使用reshape来改变tensor的形状\n", 378 | "### 七月在线python数据分析集训营 julyedu.com" 379 | ] 380 | }, 381 | { 382 | "cell_type": "markdown", 383 | "metadata": {}, 384 | "source": [ 385 | "numpy可以很容易地把一维数组转成二维数组,三维数组。" 386 | ] 387 | }, 388 | { 389 | "cell_type": "code", 390 | "execution_count": 16, 391 | "metadata": {}, 392 | "outputs": [ 393 | { 394 | "name": "stdout", 395 | "output_type": "stream", 396 | "text": [ 397 | "(4,2): [[0 1]\n", 398 | " [2 3]\n", 399 | " [4 5]\n", 400 | " [6 7]]\n", 401 | "\n", 402 | "(2,2,2): [[[0 1]\n", 403 | " [2 3]]\n", 404 | "\n", 405 | " [[4 5]\n", 406 | " [6 7]]]\n" 407 | ] 408 | } 409 | ], 410 | "source": [ 411 | "import numpy as np\n", 412 | "\n", 413 | "arr = np.arange(8)\n", 414 | "print(\"(4,2):\", arr.reshape((4,2)))\n", 415 | "print()\n", 416 | "print(\"(2,2,2):\", arr.reshape((2,2,2)))" 417 | ] 418 | }, 419 | { 420 | "cell_type": "markdown", 421 | "metadata": {}, 422 | "source": [ 423 | "直接把shape给重新定义了其实也可以" 424 | ] 425 | }, 426 | { 427 | "cell_type": "code", 428 | "execution_count": 17, 429 | "metadata": {}, 430 | "outputs": [ 431 | { 432 | "data": { 433 | "text/plain": [ 
434 | "array([[0, 1, 2, 3],\n", 435 | " [4, 5, 6, 7]])" 436 | ] 437 | }, 438 | "execution_count": 17, 439 | "metadata": {}, 440 | "output_type": "execute_result" 441 | } 442 | ], 443 | "source": [ 444 | "arr = np.arange(8)\n", 445 | "arr.shape = 2,4\n", 446 | "arr" 447 | ] 448 | }, 449 | { 450 | "cell_type": "markdown", 451 | "metadata": {}, 452 | "source": [ 453 | "如果我们在某一个维度上写上-1,numpy会帮我们自动推导出正确的维度" 454 | ] 455 | }, 456 | { 457 | "cell_type": "code", 458 | "execution_count": 18, 459 | "metadata": {}, 460 | "outputs": [ 461 | { 462 | "name": "stdout", 463 | "output_type": "stream", 464 | "text": [ 465 | "[[ 0 1 2]\n", 466 | " [ 3 4 5]\n", 467 | " [ 6 7 8]\n", 468 | " [ 9 10 11]\n", 469 | " [12 13 14]]\n", 470 | "(5, 3)\n" 471 | ] 472 | } 473 | ], 474 | "source": [ 475 | "arr = np.arange(15)\n", 476 | "print(arr.reshape((5,-1)))\n", 477 | "print(arr.reshape((5,-1)).shape)" 478 | ] 479 | }, 480 | { 481 | "cell_type": "markdown", 482 | "metadata": {}, 483 | "source": [ 484 | "还可以从其他的ndarray中获取shape信息然后reshape" 485 | ] 486 | }, 487 | { 488 | "cell_type": "code", 489 | "execution_count": 19, 490 | "metadata": {}, 491 | "outputs": [ 492 | { 493 | "name": "stdout", 494 | "output_type": "stream", 495 | "text": [ 496 | "(3, 5)\n", 497 | "[[ 0 1 2 3 4]\n", 498 | " [ 5 6 7 8 9]\n", 499 | " [10 11 12 13 14]]\n" 500 | ] 501 | } 502 | ], 503 | "source": [ 504 | "other_arr = np.ones((3,5))\n", 505 | "print(other_arr.shape)\n", 506 | "print(arr.reshape(other_arr.shape))" 507 | ] 508 | }, 509 | { 510 | "cell_type": "markdown", 511 | "metadata": {}, 512 | "source": [ 513 | "高维数组可以用ravel来拉平" 514 | ] 515 | }, 516 | { 517 | "cell_type": "code", 518 | "execution_count": 20, 519 | "metadata": {}, 520 | "outputs": [ 521 | { 522 | "name": "stdout", 523 | "output_type": "stream", 524 | "text": [ 525 | "[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14]\n" 526 | ] 527 | } 528 | ], 529 | "source": [ 530 | "print(arr.ravel())" 531 | ] 532 | }, 533 | { 534 | "cell_type": "markdown", 535 | "metadata": {}, 
536 | "source": [ 537 | "### Array data types: dtype\n", 538 | "\n", 539 | "Arrays can hold different data types" 540 | ] 541 | }, 542 | { 543 | "cell_type": "markdown", 544 | "metadata": {}, 545 | "source": [ 546 | "A data type can be specified when an array is created; if none is given, NumPy picks a suitable one automatically" 547 | ] 548 | }, 549 | { 550 | "cell_type": "code", 551 | "execution_count": 21, 552 | "metadata": {}, 553 | "outputs": [ 554 | { 555 | "name": "stdout", 556 | "output_type": "stream", 557 | "text": [ 558 | "float64\n" 559 | ] 560 | } 561 | ], 562 | "source": [ 563 | "arr = np.array([1,2,3], dtype=np.float64)\n", 564 | "print(arr.dtype)" 565 | ] 566 | }, 567 | { 568 | "cell_type": "code", 569 | "execution_count": 22, 570 | "metadata": {}, 571 | "outputs": [ 572 | { 573 | "name": "stdout", 574 | "output_type": "stream", 575 | "text": [ 576 | "int32\n" 577 | ] 578 | } 579 | ], 580 | "source": [ 581 | "arr = np.array([1,2,3], dtype=np.int32)\n", 582 | "print(arr.dtype)" 583 | ] 584 | }, 585 | { 586 | "cell_type": "markdown", 587 | "metadata": {}, 588 | "source": [ 589 | "When an ndarray needs to have a specific data type, astype copies the array and converts it to that type" 590 | ] 591 | }, 592 | { 593 | "cell_type": "code", 594 | "execution_count": 23, 595 | "metadata": {}, 596 | "outputs": [ 597 | { 598 | "name": "stdout", 599 | "output_type": "stream", 600 | "text": [ 601 | "int64\n", 602 | "float64\n" 603 | ] 604 | } 605 | ], 606 | "source": [ 607 | "int_arr = np.array([1,2,3,4,5])\n", 608 | "float_arr = int_arr.astype(np.float64)\n", 609 | "print(int_arr.dtype)\n", 610 | "print(float_arr.dtype)" 611 | ] 612 | }, 613 | { 614 | "cell_type": "markdown", 615 | "metadata": {}, 616 | "source": [ 617 | "When astype converts float to int, the fractional part is discarded (truncation toward zero)" 618 | ] 619 | }, 620 | { 621 | "cell_type": "code", 622 | "execution_count": 24, 623 | "metadata": {}, 624 | "outputs": [ 625 | { 626 | "name": "stdout", 627 | "output_type": "stream", 628 | "text": [ 629 | "[ 3 -1 -2 0 12 10]\n" 630 | ] 631 | } 632 | ], 633 | "source": [ 634 | "float_arr = np.array([3.7, -1.2, -2.6, 0.5, 12.9, 10.1])\n", 635 | "int_arr = float_arr.astype(dtype = 
np.int64)\n", 636 | "print(int_arr)" 637 | ] 638 | }, 639 | { 640 | "cell_type": "markdown", 641 | "metadata": {}, 642 | "source": [ 643 | "Use astype to convert an array of strings to numbers; an exception is raised if the conversion fails." 644 | ] 645 | }, 646 | { 647 | "cell_type": "code", 648 | "execution_count": 25, 649 | "metadata": {}, 650 | "outputs": [ 651 | { 652 | "name": "stdout", 653 | "output_type": "stream", 654 | "text": [ 655 | "[ 1.25 -9.6 42. ]\n" 656 | ] 657 | } 658 | ], 659 | "source": [ 660 | "str_arr = np.array(['1.25', '-9.6', '42'], dtype = np.bytes_)\n", 661 | "float_arr = str_arr.astype(dtype = np.float64)\n", 662 | "print(float_arr)" 663 | ] 664 | }, 665 | { 666 | "cell_type": "markdown", 667 | "metadata": {}, 668 | "source": [ 669 | "astype also accepts the dtype of another array as its argument" 670 | ] 671 | }, 672 | { 673 | "cell_type": "code", 674 | "execution_count": 26, 675 | "metadata": {}, 676 | "outputs": [ 677 | { 678 | "name": "stdout", 679 | "output_type": "stream", 680 | "text": [ 681 | "[ 0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]\n", 682 | "0 1\n" 683 | ] 684 | } 685 | ], 686 | "source": [ 687 | "int_arr = np.arange(10)\n", 688 | "float_arr = np.array([.23, 0.270, .357, 0.44, 0.5], dtype = np.float64)\n", 689 | "print(int_arr.astype(float_arr.dtype))\n", 690 | "print(int_arr[0], int_arr[1])" 691 | ] 692 | }, 693 | { 694 | "cell_type": "markdown", 695 | "metadata": {}, 696 | "source": [ 697 | "For more, read the [documentation](http://docs.scipy.org/doc/numpy/reference/arrays.dtypes.html)." 698 | ] 699 | }, 700 | { 701 | "cell_type": "markdown", 702 | "metadata": {}, 703 | "source": [ 704 | "## Array indexing: reading and assigning values\n", 705 | "\n", 706 | "### 七月在线 Python data analysis bootcamp, julyedu.com" 707 | ] 708 | }, 709 | { 710 | "cell_type": "markdown", 711 | "metadata": {}, 712 | "source": [ 713 | "NumPy offers quite a few ways to index into an array." 
714 | ] 715 | }, 716 | { 717 | "cell_type": "markdown", 718 | "metadata": {}, 719 | "source": [ 720 | "可以像list一样切片(多维数组可以从各个维度同时切片):" 721 | ] 722 | }, 723 | { 724 | "cell_type": "code", 725 | "execution_count": 27, 726 | "metadata": {}, 727 | "outputs": [ 728 | { 729 | "name": "stdout", 730 | "output_type": "stream", 731 | "text": [ 732 | "[[2 3]\n", 733 | " [6 7]]\n" 734 | ] 735 | } 736 | ], 737 | "source": [ 738 | "import numpy as np\n", 739 | "\n", 740 | "# 创建一个如下格式的3x4数组\n", 741 | "# [[ 1 2 3 4]\n", 742 | "# [ 5 6 7 8]\n", 743 | "# [ 9 10 11 12]]\n", 744 | "a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])\n", 745 | "\n", 746 | "# 在两个维度上分别按照[:2]和[1:3]进行切片,取需要的部分\n", 747 | "# [[2 3]\n", 748 | "# [6 7]]\n", 749 | "b = a[:2, 1:3]\n", 750 | "print(b)" 751 | ] 752 | }, 753 | { 754 | "cell_type": "markdown", 755 | "metadata": {}, 756 | "source": [ 757 | "虽然,怎么说呢,不建议你这样去赋值,但是你确实可以修改切片出来的对象,然后完成对原数组的赋值." 758 | ] 759 | }, 760 | { 761 | "cell_type": "code", 762 | "execution_count": 28, 763 | "metadata": {}, 764 | "outputs": [ 765 | { 766 | "name": "stdout", 767 | "output_type": "stream", 768 | "text": [ 769 | "2\n", 770 | "77\n" 771 | ] 772 | } 773 | ], 774 | "source": [ 775 | "print(a[0, 1]) \n", 776 | "b[0, 0] = 77 # b[0, 0]改了,很遗憾a[0, 1]也被修改了\n", 777 | "print(a[0, 1])" 778 | ] 779 | }, 780 | { 781 | "cell_type": "markdown", 782 | "metadata": {}, 783 | "source": [ 784 | "关于Copy和View的关系\n", 785 | "- 简单的数组赋值,切片,包括作为函数的参数传递一个数组--并不会复制出一个新的数组,只是制造了一个新的reference。所以如果我们在新赋值的变量上改变数组的内容,原来的那个数组内容也会发生改变。这一点千万要注意哦!" 
786 | ] 787 | }, 788 | { 789 | "cell_type": "code", 790 | "execution_count": 29, 791 | "metadata": {}, 792 | "outputs": [ 793 | { 794 | "data": { 795 | "text/plain": [ 796 | "True" 797 | ] 798 | }, 799 | "execution_count": 29, 800 | "metadata": {}, 801 | "output_type": "execute_result" 802 | } 803 | ], 804 | "source": [ 805 | "b = a\n", 806 | "b is a" 807 | ] 808 | }, 809 | { 810 | "cell_type": "markdown", 811 | "metadata": {}, 812 | "source": [ 813 | "- 使用`view`方法,我们可以拿到数组的一部分或者全部,但是在view上面修改内容还是会把原来的数组给更改了" 814 | ] 815 | }, 816 | { 817 | "cell_type": "code", 818 | "execution_count": 30, 819 | "metadata": {}, 820 | "outputs": [ 821 | { 822 | "data": { 823 | "text/plain": [ 824 | "False" 825 | ] 826 | }, 827 | "execution_count": 30, 828 | "metadata": {}, 829 | "output_type": "execute_result" 830 | } 831 | ], 832 | "source": [ 833 | "c = a.view()\n", 834 | "c is a" 835 | ] 836 | }, 837 | { 838 | "cell_type": "markdown", 839 | "metadata": {}, 840 | "source": [ 841 | "使用`base`方法可以查看一个数组的owner是谁,也就是说这个数组是由谁制造产生的。" 842 | ] 843 | }, 844 | { 845 | "cell_type": "code", 846 | "execution_count": 31, 847 | "metadata": { 848 | "scrolled": false 849 | }, 850 | "outputs": [ 851 | { 852 | "data": { 853 | "text/plain": [ 854 | "True" 855 | ] 856 | }, 857 | "execution_count": 31, 858 | "metadata": {}, 859 | "output_type": "execute_result" 860 | } 861 | ], 862 | "source": [ 863 | "c.base is a" 864 | ] 865 | }, 866 | { 867 | "cell_type": "markdown", 868 | "metadata": {}, 869 | "source": [ 870 | "其实使用切片方法我们拿到的也是一个view" 871 | ] 872 | }, 873 | { 874 | "cell_type": "code", 875 | "execution_count": 32, 876 | "metadata": { 877 | "scrolled": true 878 | }, 879 | "outputs": [ 880 | { 881 | "data": { 882 | "text/plain": [ 883 | "True" 884 | ] 885 | }, 886 | "execution_count": 32, 887 | "metadata": {}, 888 | "output_type": "execute_result" 889 | } 890 | ], 891 | "source": [ 892 | "s = a[:, 2:]\n", 893 | "s.base is a" 894 | ] 895 | }, 896 | { 897 | "cell_type": "markdown", 898 | "metadata": {}, 
899 | "source": [ 900 | "So after changing the contents of the slice, the contents of the original array change too" 901 | ] 902 | }, 903 | { 904 | "cell_type": "code", 905 | "execution_count": 33, 906 | "metadata": {}, 907 | "outputs": [ 908 | { 909 | "data": { 910 | "text/plain": [ 911 | "array([[ 1, 77, 10, 10],\n", 912 | " [ 5, 6, 10, 10],\n", 913 | " [ 9, 10, 10, 10]])" 914 | ] 915 | }, 916 | "execution_count": 33, 917 | "metadata": {}, 918 | "output_type": "execute_result" 919 | } 920 | ], 921 | "source": [ 922 | "s[:] = 10\n", 923 | "a" 924 | ] 925 | }, 926 | { 927 | "cell_type": "markdown", 928 | "metadata": {}, 929 | "source": [ 930 | "To get an independent new array, we need the `copy()` method" 931 | ] 932 | }, 933 | { 934 | "cell_type": "code", 935 | "execution_count": 34, 936 | "metadata": {}, 937 | "outputs": [ 938 | { 939 | "data": { 940 | "text/plain": [ 941 | "False" 942 | ] 943 | }, 944 | "execution_count": 34, 945 | "metadata": {}, 946 | "output_type": "execute_result" 947 | } 948 | ], 949 | "source": [ 950 | "d = a.copy()\n", 951 | "d is a" 952 | ] 953 | }, 954 | { 955 | "cell_type": "code", 956 | "execution_count": 35, 957 | "metadata": { 958 | "scrolled": true 959 | }, 960 | "outputs": [ 961 | { 962 | "data": { 963 | "text/plain": [ 964 | "False" 965 | ] 966 | }, 967 | "execution_count": 35, 968 | "metadata": {}, 969 | "output_type": "execute_result" 970 | } 971 | ], 972 | "source": [ 973 | "d.base is a" 974 | ] 975 | }, 976 | { 977 | "cell_type": "code", 978 | "execution_count": 36, 979 | "metadata": {}, 980 | "outputs": [ 981 | { 982 | "data": { 983 | "text/plain": [ 984 | "array([[ 1, 77, 10, 10],\n", 985 | " [ 5, 6, 10, 10],\n", 986 | " [ 9, 10, 10, 10]])" 987 | ] 988 | }, 989 | "execution_count": 36, 990 | "metadata": {}, 991 | "output_type": "execute_result" 992 | } 993 | ], 994 | "source": [ 995 | "d[0,0] = 9999\n", 996 | "a" 997 | ] 998 | }, 999 | { 1000 | "cell_type": "markdown", 1001 | "metadata": {}, 1002 | "source": [ 1003 | "Now let's return to array slicing\n", 1004 | "\n", 1005 | "Create a 3x4 2-D array/matrix" 1006 | ] 1007 | }, 1008 | { 1009 | 
"cell_type": "code", 1010 | "execution_count": 37, 1011 | "metadata": {}, 1012 | "outputs": [ 1013 | { 1014 | "name": "stdout", 1015 | "output_type": "stream", 1016 | "text": [ 1017 | "[[ 1 2 3 4]\n", 1018 | " [ 5 6 7 8]\n", 1019 | " [ 9 10 11 12]]\n" 1020 | ] 1021 | } 1022 | ], 1023 | "source": [ 1024 | "a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])\n", 1025 | "print(a)" 1026 | ] 1027 | }, 1028 | { 1029 | "cell_type": "markdown", 1030 | "metadata": {}, 1031 | "source": [ 1032 | "Now go ahead and pick out whatever values you want:" 1033 | ] 1034 | }, 1035 | { 1036 | "cell_type": "code", 1037 | "execution_count": 38, 1038 | "metadata": {}, 1039 | "outputs": [ 1040 | { 1041 | "name": "stdout", 1042 | "output_type": "stream", 1043 | "text": [ 1044 | "[5 6 7 8] (4,)\n", 1045 | "[[5 6 7 8]] (1, 4)\n", 1046 | "[[5 6 7 8]] (1, 4)\n" 1047 | ] 1048 | } 1049 | ], 1050 | "source": [ 1051 | "row_r1 = a[1, :] # second row, returned as a 1-D array of shape (4,)\n", 1052 | "row_r2 = a[1:2, :] # 2-D output of shape (1, 4)\n", 1053 | "row_r3 = a[[1], :] # same as above\n", 1054 | "print(row_r1, row_r1.shape)\n", 1055 | "print(row_r2, row_r2.shape)\n", 1056 | "print(row_r3, row_r3.shape)" 1057 | ] 1058 | }, 1059 | { 1060 | "cell_type": "markdown", 1061 | "metadata": {}, 1062 | "source": [ 1063 | "Slicing along the second dimension works the same way:" 1064 | ] 1065 | }, 1066 | { 1067 | "cell_type": "code", 1068 | "execution_count": 39, 1069 | "metadata": {}, 1070 | "outputs": [ 1071 | { 1072 | "name": "stdout", 1073 | "output_type": "stream", 1074 | "text": [ 1075 | "[ 2 6 10] (3,)\n", 1076 | "\n", 1077 | "[[ 2]\n", 1078 | " [ 6]\n", 1079 | " [10]] (3, 1)\n" 1080 | ] 1081 | } 1082 | ], 1083 | "source": [ 1084 | "col_r1 = a[:, 1]\n", 1085 | "col_r2 = a[:, 1:2]\n", 1086 | "print(col_r1, col_r1.shape)\n", 1087 | "print()\n", 1088 | "print(col_r2, col_r2.shape)" 1089 | ] 1090 | }, 1091 | { 1092 | "cell_type": "markdown", 1093 | "metadata": {}, 1094 | "source": [ 1095 | "Ellipsis (`...`)" 1096 | ] 1097 | }, 1098 | { 1099 | "cell_type": "code", 1100 | "execution_count": 40, 1101 | "metadata": {}, 1102 | "outputs": [ 
1103 | { 1104 | "data": { 1105 | "text/plain": [ 1106 | "array([[ 75, 76, 77, 78, 79],\n", 1107 | " [ 95, 96, 97, 98, 99],\n", 1108 | " [115, 116, 117, 118, 119]])" 1109 | ] 1110 | }, 1111 | "execution_count": 40, 1112 | "metadata": {}, 1113 | "output_type": "execute_result" 1114 | } 1115 | ], 1116 | "source": [ 1117 | "import numpy as np\n", 1118 | "c = np.arange(120).reshape(2,3,4,5)\n", 1119 | "c[1, ..., 3, :]" 1120 | ] 1121 | }, 1122 | { 1123 | "cell_type": "markdown", 1124 | "metadata": {}, 1125 | "source": [ 1126 | "The next one is more advanced: it lets us pick and combine values more freely, but read it carefully:" 1127 | ] 1128 | }, 1129 | { 1130 | "cell_type": "code", 1131 | "execution_count": 41, 1132 | "metadata": {}, 1133 | "outputs": [ 1134 | { 1135 | "name": "stdout", 1136 | "output_type": "stream", 1137 | "text": [ 1138 | "[1 4 5]\n", 1139 | "[1 4 5]\n" 1140 | ] 1141 | } 1142 | ], 1143 | "source": [ 1144 | "a = np.array([[1,2], [3, 4], [5, 6]])\n", 1145 | "\n", 1146 | "# this picks the elements at (0,0), (1,1), (2,0) and puts them together\n", 1147 | "print(a[[0, 1, 2], [0, 1, 0]])\n", 1148 | "\n", 1149 | "# the version below spells it out\n", 1150 | "print(np.array([a[0, 0], a[1, 1], a[2, 0]]))" 1151 | ] 1152 | }, 1153 | { 1154 | "cell_type": "code", 1155 | "execution_count": 42, 1156 | "metadata": {}, 1157 | "outputs": [ 1158 | { 1159 | "data": { 1160 | "text/plain": [ 1161 | "array([ 1, 39, 77, 110])" 1162 | ] 1163 | }, 1164 | "execution_count": 42, 1165 | "metadata": {}, 1166 | "output_type": "execute_result" 1167 | } 1168 | ], 1169 | "source": [ 1170 | "a = np.arange(4*5*6).reshape(4,5,6)\n", 1171 | "a[np.arange(4), np.arange(4), [1,3,5,2]]" 1172 | ] 1173 | }, 1174 | { 1175 | "cell_type": "code", 1176 | "execution_count": 43, 1177 | "metadata": {}, 1178 | "outputs": [ 1179 | { 1180 | "name": "stdout", 1181 | "output_type": "stream", 1182 | "text": [ 1183 | "[[ 6 7 8 9 10 11]\n", 1184 | " [ 6 7 8 9 10 11]]\n", 1185 | "[[ 6 7 8 9 10 11]\n", 1186 | " [ 6 7 8 9 10 11]]\n" 1187 | ] 1188 | } 1189 | ], 1190 | "source": [ 1191 | "# one more try\n", 1192 | "print(a[[0, 0], [1, 1]])\n", 1193 | 
"\n", 1194 | "# same result again\n", 1195 | "print(np.array([a[0, 1], a[0, 1]]))" 1196 | ] 1197 | }, 1198 | { 1199 | "cell_type": "markdown", 1200 | "metadata": {}, 1201 | "source": [ 1202 | "Some more practice\n", 1203 | "\n", 1204 | "First create a 2-D array" 1205 | ] 1206 | }, 1207 | { 1208 | "cell_type": "code", 1209 | "execution_count": 44, 1210 | "metadata": {}, 1211 | "outputs": [ 1212 | { 1213 | "name": "stdout", 1214 | "output_type": "stream", 1215 | "text": [ 1216 | "[[ 1 2 3]\n", 1217 | " [ 4 5 6]\n", 1218 | " [ 7 8 9]\n", 1219 | " [10 11 12]]\n" 1220 | ] 1221 | } 1222 | ], 1223 | "source": [ 1224 | "a = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])\n", 1225 | "print(a)" 1226 | ] 1227 | }, 1228 | { 1229 | "cell_type": "markdown", 1230 | "metadata": {}, 1231 | "source": [ 1232 | "Create a vector of indices" 1233 | ] 1234 | }, 1235 | { 1236 | "cell_type": "code", 1237 | "execution_count": 45, 1238 | "metadata": { 1239 | "collapsed": true 1240 | }, 1241 | "outputs": [], 1242 | "source": [ 1243 | "b = np.array([0, 2, 0, 1])" 1244 | ] 1245 | }, 1246 | { 1247 | "cell_type": "markdown", 1248 | "metadata": {}, 1249 | "source": [ 1250 | "Can you work out what the code below does?" 
1251 | ] 1252 | }, 1253 | { 1254 | "cell_type": "code", 1255 | "execution_count": 46, 1256 | "metadata": {}, 1257 | "outputs": [ 1258 | { 1259 | "name": "stdout", 1260 | "output_type": "stream", 1261 | "text": [ 1262 | "[ 1 6 7 11]\n" 1263 | ] 1264 | } 1265 | ], 1266 | "source": [ 1267 | "print(a[np.arange(4), b]) " 1268 | ] 1269 | }, 1270 | { 1271 | "cell_type": "markdown", 1272 | "metadata": {}, 1273 | "source": [ 1274 | "Since we can select these elements, we can of course also modify them" 1275 | ] 1276 | }, 1277 | { 1278 | "cell_type": "code", 1279 | "execution_count": 47, 1280 | "metadata": {}, 1281 | "outputs": [ 1282 | { 1283 | "name": "stdout", 1284 | "output_type": "stream", 1285 | "text": [ 1286 | "[[11 2 3]\n", 1287 | " [ 4 5 16]\n", 1288 | " [17 8 9]\n", 1289 | " [10 21 12]]\n" 1290 | ] 1291 | } 1292 | ], 1293 | "source": [ 1294 | "a[np.arange(4), b] += 10\n", 1295 | "print(a)" 1296 | ] 1297 | }, 1298 | { 1299 | "cell_type": "markdown", 1300 | "metadata": {}, 1301 | "source": [ 1302 | "### Conditional selection in NumPy\n", 1303 | "\n", 1304 | "One of the fancier ways to select values is with a condition (and it is very handy):" 1305 | ] 1306 | }, 1307 | { 1308 | "cell_type": "code", 1309 | "execution_count": 48, 1310 | "metadata": {}, 1311 | "outputs": [ 1312 | { 1313 | "name": "stdout", 1314 | "output_type": "stream", 1315 | "text": [ 1316 | "[[False False]\n", 1317 | " [ True True]\n", 1318 | " [ True True]]\n" 1319 | ] 1320 | } 1321 | ], 1322 | "source": [ 1323 | "a = np.array([[1,2], [3, 4], [5, 6]])\n", 1324 | "\n", 1325 | "bool_idx = (a > 2) # test whether each element is greater than 2\n", 1326 | "\n", 1327 | "print(bool_idx) # a boolean array of shape 3x2" 1328 | ] 1329 | }, 1330 | { 1331 | "cell_type": "markdown", 1332 | "metadata": {}, 1333 | "source": [ 1334 | "Using that boolean array as an index picks out the elements that satisfy the condition" 1335 | ] 1336 | }, 1337 | { 1338 | "cell_type": "code", 1339 | "execution_count": 49, 1340 | "metadata": {}, 1341 | "outputs": [ 1342 | { 1343 | "name": "stdout", 1344 | "output_type": "stream", 1345 | "text": [ 1346 | "[3 4 5 6]\n" 1347 | ] 1348 | } 1349 | ], 1350 | "source": [ 1351 | "print(a[bool_idx])" 1352 | ] 1353 
| }, 1354 | { 1355 | "cell_type": "markdown", 1356 | "metadata": {}, 1357 | "source": [ 1358 | "This can also be done in a single line, right?" 1359 | ] 1360 | }, 1361 | { 1362 | "cell_type": "code", 1363 | "execution_count": 50, 1364 | "metadata": {}, 1365 | "outputs": [ 1366 | { 1367 | "name": "stdout", 1368 | "output_type": "stream", 1369 | "text": [ 1370 | "[3 4 5 6]\n" 1371 | ] 1372 | } 1373 | ], 1374 | "source": [ 1375 | "print(a[a > 2])" 1376 | ] 1377 | }, 1378 | { 1379 | "cell_type": "markdown", 1380 | "metadata": {}, 1381 | "source": [ 1382 | "There are many more details and other ways to select values; have a look at the official documentation." 1383 | ] 1384 | }, 1385 | { 1386 | "cell_type": "markdown", 1387 | "metadata": {}, 1388 | "source": [ 1389 | "Let's summarize with the slicing patterns below (each color marks the result of the matching expression):" 1390 | ] 1391 | }, 1392 | { 1393 | "cell_type": "markdown", 1394 | "metadata": {}, 1395 | "source": [ 1396 | "![](http://old.sebug.net/paper/books/scipydoc/_images/numpy_intro_02.png)\n", 1397 | "![](http://old.sebug.net/paper/books/scipydoc/_images/numpy_intro_03.png)" 1398 | ] 1399 | }, 1400 | { 1401 | "cell_type": "markdown", 1402 | "metadata": {}, 1403 | "source": [ 1404 | "## Basic math operations\n", 1405 | "### 七月在线 Python data analysis bootcamp julyedu.com" 1406 | ] 1407 | }, 1408 | { 1409 | "cell_type": "markdown", 1410 | "metadata": {}, 1411 | "source": [ 1412 | "The operations below come up all the time in scientific computing, for example the elementwise operations:" 1413 | ] 1414 | }, 1415 | { 1416 | "cell_type": "code", 1417 | "execution_count": 2, 1418 | "metadata": { 1419 | "collapsed": true 1420 | }, 1421 | "outputs": [], 1422 | "source": [ 1423 | "import numpy as np\n", 1424 | "x = np.array([[1,2],[3,4]], dtype=np.float64)\n", 1425 | "y = np.array([[5,6],[7,8]], dtype=np.float64)" 1426 | ] 1427 | }, 1428 | { 1429 | "cell_type": "markdown", 1430 | "metadata": {}, 1431 | "source": [ 1432 | "Elementwise addition can be written in two ways" 1433 | ] 1434 | }, 1435 | { 1436 | "cell_type": "code", 1437 | "execution_count": 52, 1438 | "metadata": {}, 1439 | "outputs": [ 1440 | { 1441 | "name": "stdout", 1442 | "output_type": "stream", 1443 | "text": [ 1444 | "[[ 6. 8.]\n", 1445 | " [ 10. 
12.]]\n", 1446 | "[[ 6. 8.]\n", 1447 | " [ 10. 12.]]\n" 1448 | ] 1449 | } 1450 | ], 1451 | "source": [ 1452 | "print(x + y)\n", 1453 | "print(np.add(x, y))" 1454 | ] 1455 | }, 1456 | { 1457 | "cell_type": "markdown", 1458 | "metadata": {}, 1459 | "source": [ 1460 | "Elementwise subtraction" 1461 | ] 1462 | }, 1463 | { 1464 | "cell_type": "code", 1465 | "execution_count": 53, 1466 | "metadata": {}, 1467 | "outputs": [ 1468 | { 1469 | "name": "stdout", 1470 | "output_type": "stream", 1471 | "text": [ 1472 | "[[-4. -4.]\n", 1473 | " [-4. -4.]]\n", 1474 | "[[-4. -4.]\n", 1475 | " [-4. -4.]]\n" 1476 | ] 1477 | } 1478 | ], 1479 | "source": [ 1480 | "print(x - y)\n", 1481 | "print(np.subtract(x, y))" 1482 | ] 1483 | }, 1484 | { 1485 | "cell_type": "markdown", 1486 | "metadata": {}, 1487 | "source": [ 1488 | "Elementwise multiplication" 1489 | ] 1490 | }, 1491 | { 1492 | "cell_type": "code", 1493 | "execution_count": 54, 1494 | "metadata": {}, 1495 | "outputs": [ 1496 | { 1497 | "name": "stdout", 1498 | "output_type": "stream", 1499 | "text": [ 1500 | "[[ 5. 12.]\n", 1501 | " [ 21. 32.]]\n", 1502 | "[[ 5. 12.]\n", 1503 | " [ 21. 32.]]\n" 1504 | ] 1505 | } 1506 | ], 1507 | "source": [ 1508 | "print(x * y)\n", 1509 | "print(np.multiply(x, y))" 1510 | ] 1511 | }, 1512 | { 1513 | "cell_type": "markdown", 1514 | "metadata": {}, 1515 | "source": [ 1516 | "Elementwise division" 1517 | ] 1518 | }, 1519 | { 1520 | "cell_type": "code", 1521 | "execution_count": 55, 1522 | "metadata": {}, 1523 | "outputs": [ 1524 | { 1525 | "name": "stdout", 1526 | "output_type": "stream", 1527 | "text": [ 1528 | "[[ 0.2 0.33333333]\n", 1529 | " [ 0.42857143 0.5 ]]\n", 1530 | "[[ 0.2 0.33333333]\n", 1531 | " [ 0.42857143 0.5 ]]\n" 1532 | ] 1533 | } 1534 | ], 1535 | "source": [ 1536 | "print(x / y)\n", 1537 | "print(np.divide(x, y))" 1538 | ] 1539 | }, 1540 | { 1541 | "cell_type": "markdown", 1542 | "metadata": {}, 1543 | "source": [ 1544 | "Elementwise square root" 
1545 | ] 1546 | }, 1547 | { 1548 | "cell_type": "code", 1549 | "execution_count": 56, 1550 | "metadata": {}, 1551 | "outputs": [ 1552 | { 1553 | "name": "stdout", 1554 | "output_type": "stream", 1555 | "text": [ 1556 | "[[ 1. 1.41421356]\n", 1557 | " [ 1.73205081 2. ]]\n" 1558 | ] 1559 | } 1560 | ], 1561 | "source": [ 1562 | "print(np.sqrt(x))" 1563 | ] 1564 | }, 1565 | { 1566 | "cell_type": "markdown", 1567 | "metadata": {}, 1568 | "source": [ 1569 | "And of course elementwise squaring" 1570 | ] 1571 | }, 1572 | { 1573 | "cell_type": "code", 1574 | "execution_count": 57, 1575 | "metadata": {}, 1576 | "outputs": [ 1577 | { 1578 | "name": "stdout", 1579 | "output_type": "stream", 1580 | "text": [ 1581 | "[[ 1. 4.]\n", 1582 | " [ 9. 16.]]\n" 1583 | ] 1584 | } 1585 | ], 1586 | "source": [ 1587 | "print(x**2)" 1588 | ] 1589 | }, 1590 | { 1591 | "cell_type": "markdown", 1592 | "metadata": {}, 1593 | "source": [ 1594 | "Guess which reduction over a matrix's elements you will use most in scientific computing? Right, summation, which `sum` handles:" 1595 | ] 1596 | }, 1597 | { 1598 | "cell_type": "code", 1599 | "execution_count": 58, 1600 | "metadata": {}, 1601 | "outputs": [ 1602 | { 1603 | "name": "stdout", 1604 | "output_type": "stream", 1605 | "text": [ 1606 | "10\n", 1607 | "[4 6]\n", 1608 | "[3 7]\n" 1609 | ] 1610 | } 1611 | ], 1612 | "source": [ 1613 | "x = np.array([[1,2],[3,4]])\n", 1614 | "\n", 1615 | "print(np.sum(x)) # sum of all elements of the array/matrix; prints \"10\"\n", 1616 | "print(np.sum(x, axis=0)) # sum along axis 0 (down each column); prints \"[4 6]\"\n", 1617 | "print(np.sum(x, axis=1)) # sum along axis 1 (across each row); prints \"[3 7]\"" 1618 | ] 1619 | }, 1620 | { 1621 | "cell_type": "markdown", 1622 | "metadata": {}, 1623 | "source": [ 1624 | "Other operations you might think of, such as the mean, cumulative sum, and cumulative product, are all available in NumPy" 1625 | ] 1626 | }, 1627 | { 1628 | "cell_type": "code", 1629 | "execution_count": 59, 1630 | "metadata": {}, 1631 | "outputs": [ 1632 | { 1633 | "name": "stdout", 1634 | "output_type": "stream", 1635 | "text": [ 1636 | "2.5\n", 1637 | "[ 2. 
3.]\n", 1638 | "[ 1.5 3.5]\n", 1639 | "[[1 2]\n", 1640 | " [4 6]]\n", 1641 | "[[ 1 2]\n", 1642 | " [ 3 12]]\n" 1643 | ] 1644 | } 1645 | ], 1646 | "source": [ 1647 | "print(np.mean(x))\n", 1648 | "print(np.mean(x, axis=0))\n", 1649 | "print(np.mean(x, axis=1))\n", 1650 | "print(x.cumsum(axis=0))\n", 1651 | "print(x.cumprod(axis=1))" 1652 | ] 1653 | }, 1654 | { 1655 | "cell_type": "markdown", 1656 | "metadata": {}, 1657 | "source": [ 1658 | "When we sum or average an ndarray along some dimension, that dimension is removed from the result. To keep it (with size 1), pass the keepdims parameter; this small trick is sometimes very useful" 1659 | ] 1660 | }, 1661 | { 1662 | "cell_type": "code", 1663 | "execution_count": 12, 1664 | "metadata": {}, 1665 | "outputs": [ 1666 | { 1667 | "name": "stdout", 1668 | "output_type": "stream", 1669 | "text": [ 1670 | "[[ 1. 2.]\n", 1671 | " [ 3. 4.]]\n" 1672 | ] 1673 | } 1674 | ], 1675 | "source": [ 1676 | "print(x)" 1677 | ] 1678 | }, 1679 | { 1680 | "cell_type": "code", 1681 | "execution_count": 9, 1682 | "metadata": {}, 1683 | "outputs": [ 1684 | { 1685 | "name": "stdout", 1686 | "output_type": "stream", 1687 | "text": [ 1688 | "(2, 1) \n", 1689 | " [[ 1.5]\n", 1690 | " [ 3.5]]\n" 1691 | ] 1692 | } 1693 | ], 1694 | "source": [ 1695 | "x_mean = x.mean(1, keepdims=True)\n", 1696 | "print(x_mean.shape, \"\\n\", x_mean)" 1697 | ] 1698 | }, 1699 | { 1700 | "cell_type": "code", 1701 | "execution_count": 10, 1702 | "metadata": {}, 1703 | "outputs": [ 1704 | { 1705 | "data": { 1706 | "text/plain": [ 1707 | "array([[-0.5, -1.5],\n", 1708 | " [ 1.5, 0.5]])" 1709 | ] 1710 | }, 1711 | "execution_count": 10, 1712 | "metadata": {}, 1713 | "output_type": "execute_result" 1714 | } 1715 | ], 1716 | "source": [ 1717 | "x - x.mean(1)" 1718 | ] 1719 | }, 1720 | { 1721 | "cell_type": "code", 1722 | "execution_count": 11, 1723 | "metadata": {}, 1724 | "outputs": [ 1725 | { 1726 | "data": { 1727 | "text/plain": [ 1728 | "array([[-0.5, 0.5],\n", 1729 | " [-0.5, 0.5]])" 1730 | ] 1731 | }, 1732 | "execution_count": 11, 1733 | "metadata": {}, 1734 | 
"output_type": "execute_result" 1735 | } 1736 | ], 1737 | "source": [ 1738 | "x - x.mean(1, keepdims=True)" 1739 | ] 1740 | }, 1741 | { 1742 | "cell_type": "markdown", 1743 | "metadata": {}, 1744 | "source": [ 1745 | "Those are the most basic operations; for more, check the [documentation](http://docs.scipy.org/doc/numpy/reference/routines.math.html)." 1746 | ] 1747 | }, 1748 | { 1749 | "cell_type": "markdown", 1750 | "metadata": {}, 1751 | "source": [ 1752 | "Sorting a 1-D array" 1753 | ] 1754 | }, 1755 | { 1756 | "cell_type": "code", 1757 | "execution_count": 60, 1758 | "metadata": { 1759 | "scrolled": true 1760 | }, 1761 | "outputs": [ 1762 | { 1763 | "name": "stdout", 1764 | "output_type": "stream", 1765 | "text": [ 1766 | "[-0.59089959 -0.69464228 0.19764173 1.06542957 -0.93167911 0.72010009\n", 1767 | " 0.98485164 0.64554892]\n", 1768 | "[-0.93167911 -0.69464228 -0.59089959 0.19764173 0.64554892 0.72010009\n", 1769 | " 0.98485164 1.06542957]\n" 1770 | ] 1771 | } 1772 | ], 1773 | "source": [ 1774 | "arr = np.random.randn(8)\n", 1775 | "print(arr)\n", 1776 | "arr.sort()\n", 1777 | "print(arr)" 1778 | ] 1779 | }, 1780 | { 1781 | "cell_type": "markdown", 1782 | "metadata": {}, 1783 | "source": [ 1784 | "A 2-D array can also be sorted along a chosen axis" 1785 | ] 1786 | }, 1787 | { 1788 | "cell_type": "code", 1789 | "execution_count": 61, 1790 | "metadata": {}, 1791 | "outputs": [ 1792 | { 1793 | "name": "stdout", 1794 | "output_type": "stream", 1795 | "text": [ 1796 | "[[ 0.96442199 0.24170399 -0.34868107]\n", 1797 | " [ 0.49019122 -0.44247649 0.26807994]\n", 1798 | " [-0.19606933 0.8373728 -0.42110106]\n", 1799 | " [-1.17488438 -0.01514267 -1.40175246]\n", 1800 | " [ 1.03809644 -0.32226042 1.21621558]]\n", 1801 | "[[-0.34868107 0.24170399 0.96442199]\n", 1802 | " [-0.44247649 0.26807994 0.49019122]\n", 1803 | " [-0.42110106 -0.19606933 0.8373728 ]\n", 1804 | " [-1.40175246 -1.17488438 -0.01514267]\n", 1805 | " [-0.32226042 1.03809644 1.21621558]]\n" 1806 | ] 1807 | } 1808 | ], 1809 | "source": [ 1810 | "arr = np.random.randn(5,3)\n", 1811 | 
"print(arr)\n", 1812 | "arr.sort(1)\n", 1813 | "print(arr)" 1814 | ] 1815 | }, 1816 | { 1817 | "cell_type": "markdown", 1818 | "metadata": {}, 1819 | "source": [ 1820 | "Here is a small example: find the value that sits at the 5% position after sorting" 1821 | ] 1822 | }, 1823 | { 1824 | "cell_type": "code", 1825 | "execution_count": 62, 1826 | "metadata": {}, 1827 | "outputs": [ 1828 | { 1829 | "name": "stdout", 1830 | "output_type": "stream", 1831 | "text": [ 1832 | "-1.69029967076\n" 1833 | ] 1834 | } 1835 | ], 1836 | "source": [ 1837 | "large_arr = np.random.randn(1000)\n", 1838 | "large_arr.sort()\n", 1839 | "print(large_arr[int(0.05*len(large_arr))])" 1840 | ] 1841 | }, 1842 | { 1843 | "cell_type": "markdown", 1844 | "metadata": {}, 1845 | "source": [ 1846 | "What if we want the index of the maximum along some dimension?" 1847 | ] 1848 | }, 1849 | { 1850 | "cell_type": "code", 1851 | "execution_count": 16, 1852 | "metadata": {}, 1853 | "outputs": [ 1854 | { 1855 | "name": "stdout", 1856 | "output_type": "stream", 1857 | "text": [ 1858 | "[[ 0.69729261 0.46836516 0.61262327 0.5116643 0.11963729 0.65744612]\n", 1859 | " [ 0.59042301 0.52653756 0.83107804 0.49619956 0.8131979 0.90982086]\n", 1860 | " [ 0.54387051 0.7645951 0.03996066 0.60462687 0.21541442 0.33530842]\n", 1861 | " [ 0.89684909 0.46083355 0.45639174 0.03490184 0.54921917 0.42301243]\n", 1862 | " [ 0.23118945 0.46970828 0.25111209 0.48423839 0.69496104 0.22514291]]\n" 1863 | ] 1864 | } 1865 | ], 1866 | "source": [ 1867 | "x = np.random.random((5, 6))\n", 1868 | "print(x)" 1869 | ] 1870 | }, 1871 | { 1872 | "cell_type": "code", 1873 | "execution_count": 17, 1874 | "metadata": { 1875 | "scrolled": true 1876 | }, 1877 | "outputs": [ 1878 | { 1879 | "data": { 1880 | "text/plain": [ 1881 | "array([0, 5, 1, 0, 4])" 1882 | ] 1883 | }, 1884 | "execution_count": 17, 1885 | "metadata": {}, 1886 | "output_type": "execute_result" 1887 | } 1888 | ], 1889 | "source": [ 1890 | "np.argmax(x, 1)" 1891 | ] 1892 | }, 1893 | { 1894 | "cell_type": "markdown", 1895 | "metadata": {}, 1896 | "source": [ 1897 | 
"What if we want the indices of the top k values?" 1898 | ] 1899 | }, 1900 | { 1901 | "cell_type": "code", 1902 | "execution_count": 20, 1903 | "metadata": {}, 1904 | "outputs": [ 1905 | { 1906 | "data": { 1907 | "text/plain": [ 1908 | "array([[0, 5, 2],\n", 1909 | " [5, 2, 4],\n", 1910 | " [1, 3, 0],\n", 1911 | " [0, 4, 1],\n", 1912 | " [4, 3, 1]])" 1913 | ] 1914 | }, 1915 | "execution_count": 20, 1916 | "metadata": {}, 1917 | "output_type": "execute_result" 1918 | } 1919 | ], 1920 | "source": [ 1921 | "x.argsort()[:, -3:][:, ::-1]" 1922 | ] 1923 | } 1924 | ], 1925 | "metadata": { 1926 | "kernelspec": { 1927 | "display_name": "Python 3", 1928 | "language": "python", 1929 | "name": "python3" 1930 | }, 1931 | "language_info": { 1932 | "codemirror_mode": { 1933 | "name": "ipython", 1934 | "version": 3 1935 | }, 1936 | "file_extension": ".py", 1937 | "mimetype": "text/x-python", 1938 | "name": "python", 1939 | "nbconvert_exporter": "python", 1940 | "pygments_lexer": "ipython3", 1941 | "version": "3.6.1" 1942 | } 1943 | }, 1944 | "nbformat": 4, 1945 | "nbformat_minor": 1 1946 | } 1947 | -------------------------------------------------------------------------------- /Nov-2017/numpy-2-student.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# NumPy Basics\n", 8 | "\n", 9 | "### 七月在线 Python data analysis bootcamp julyedu.com\n", 10 | "\n", 11 | "褚则伟 zeweichu@gmail.com" 12 | ] 13 | }, 14 | { 15 | "cell_type": "markdown", 16 | "metadata": {}, 17 | "source": [ 18 | "## Contents\n", 19 | "- Broadcasting\n", 20 | "- File input and output\n", 21 | "- Linear algebra operations\n", 22 | "- In-class mini project: writing a softmax with NumPy" 23 | ] 24 | }, 25 | { 26 | "cell_type": "markdown", 27 | "metadata": {}, 28 | "source": [ 29 | "## Review" 30 | ] 31 | }, 32 | { 33 | "cell_type": "markdown", 34 | "metadata": {}, 35 | "source": [ 36 | "Let's first review the previous lecture, starting by generating a random NumPy ndarray" 37 | ] 38 | }, 39 | { 40 | "cell_type": "markdown", 41 | "metadata": {}, 42 | "source": 
[ 43 | "## Broadcasting\n", 44 | "### 七月在线 Python data analysis bootcamp julyedu.com" 45 | ] 46 | }, 47 | { 48 | "cell_type": "markdown", 49 | "metadata": {}, 50 | "source": [ 51 | "I have not settled on the most fitting Chinese word for this term; for now let's call it “传播” (propagation):
\n", 52 | "What is it for? Imagine combining a small matrix with a large one, where the small matrix should be applied repeatedly to each matching block of the large matrix. That is exactly what broadcasting does" 53 | ] 54 | }, 55 | { 56 | "cell_type": "markdown", 57 | "metadata": {}, 58 | "source": [ 59 | "We want to add a vector elementwise to every row of x, producing y" 60 | ] 61 | }, 62 | { 63 | "cell_type": "markdown", 64 | "metadata": {}, 65 | "source": [ 66 | "The brute-force way is to add it row by row in a for loop" 67 | ] 68 | }, 69 | { 70 | "cell_type": "markdown", 71 | "metadata": {}, 72 | "source": [ 73 | "This works, but it is inefficient: if x has a very large number of rows, the loop becomes slow:" 74 | ] 75 | }, 76 | { 77 | "cell_type": "markdown", 78 | "metadata": {}, 79 | "source": [ 80 | "Numpy broadcasting allows us to perform this computation without actually creating multiple copies of v. Consider this version, using broadcasting:" 81 | ] 82 | }, 83 | { 84 | "cell_type": "markdown", 85 | "metadata": {}, 86 | "source": [ 87 | "Thanks to broadcasting, the operation above reduces to a single addition" 88 | ] 89 | }, 90 | { 91 | "cell_type": "markdown", 92 | "metadata": {}, 93 | "source": [ 94 | "When operating on two arrays, NumPy compares their shapes dimension by dimension, starting from the trailing dimension. Two dimensions are compatible, and the result can be broadcast, in the following cases:
\n", 95 | "\n", 96 | "1. They are equal\n", 97 | "2. One of them is 1 (that dimension is then conceptually stretched by copying until the shapes match)\n", 98 | "3. When the two ndarrays have different numbers of dimensions, the one with the smaller rank automatically gets dimensions of size 1 prepended until both ranks are equal, and only then is compatibility checked\n", 99 | "\n", 100 | "For example, when adding:\n", 101 | "```python\n", 102 | "Image (3d array): 256 x 256 x 3\n", 103 | "Scale (1d array): 3\n", 104 | "Result (3d array): 256 x 256 x 3\n", 105 | "\n", 106 | "A (4d array): 8 x 1 x 6 x 1\n", 107 | "B (3d array): 7 x 1 x 5\n", 108 | "Result (4d array): 8 x 7 x 6 x 5\n", 109 | "\n", 110 | "A (2d array): 5 x 4\n", 111 | "B (1d array): 1\n", 112 | "Result (2d array): 5 x 4\n", 113 | "\n", 114 | "A (3d array): 15 x 3 x 5\n", 115 | "B (3d array): 15 x 1 x 5\n", 116 | "Result (3d array): 15 x 3 x 5\n", 117 | "```\n", 118 | "\n", 119 | "Here are some broadcasting examples:" 120 | ] 121 | }, 122 | { 123 | "cell_type": "markdown", 124 | "metadata": {}, 125 | "source": [ 126 | "Let's make sense of this use of broadcasting\n", 127 | "\n", 128 | "First reshape v into a 3x1 array/matrix; it can then be broadcast and added to w:" 129 | ] 130 | }, 131 | { 132 | "cell_type": "markdown", 133 | "metadata": {}, 134 | "source": [ 135 | "What if we want to add a vector to every row of a matrix?" 136 | ] 137 | }, 138 | { 139 | "cell_type": "markdown", 140 | "metadata": {}, 141 | "source": [ 142 | "That operation was too complicated; we can simply do the following instead" 143 | ] 144 | }, 145 | { 146 | "cell_type": "markdown", 147 | "metadata": {}, 148 | "source": [ 149 | "Broadcasting of course also works with elementwise operations" 150 | ] 151 | }, 152 | { 153 | "cell_type": "markdown", 154 | "metadata": {}, 155 | "source": [ 156 | "To summarize broadcasting, take a look at the figure below:
\n", 157 | "![](http://www.astroml.org/_images/fig_broadcast_visual_1.png)" 158 | ] 159 | }, 160 | { 161 | "cell_type": "markdown", 162 | "metadata": {}, 163 | "source": [ 164 | "## Logical operations\n", 165 | "### 七月在线 Python data analysis class, 2017 upgraded edition julyedu.com" 166 | ] 167 | }, 168 | { 169 | "cell_type": "markdown", 170 | "metadata": {}, 171 | "source": [ 172 | "`where` lets us choose, element by element, whether to take the value from the first ndarray or from the second" 173 | ] 174 | }, 175 | { 176 | "cell_type": "markdown", 177 | "metadata": {}, 178 | "source": [ 179 | "## Joining two 2-D arrays\n", 180 | "### 七月在线 Python data analysis bootcamp julyedu.com" 181 | ] 182 | }, 183 | { 184 | "cell_type": "markdown", 185 | "metadata": {}, 186 | "source": [ 187 | "Stacking, as in stacking plates, is another way of describing joining.\n", 188 | "Vertical stack and horizontal stack" 189 | ] 190 | }, 191 | { 192 | "cell_type": "markdown", 193 | "metadata": {}, 194 | "source": [ 195 | "To split an array, we use the [split function](https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.split.html).\n", 196 | "\n", 197 | "split(array, indices_or_sections, axis=0)\n", 198 | "\n", 199 | "The first argument is the array to split; the second is either the indices at which to cut or the number of equal sections; the third is the axis along which to split" 200 | ] 201 | }, 202 | { 203 | "cell_type": "markdown", 204 | "metadata": {}, 205 | "source": [ 206 | "What if we simply want to split it evenly into three pieces?" 
207 | ] 208 | }, 209 | { 210 | "cell_type": "markdown", 211 | "metadata": {}, 212 | "source": [ 213 | "Stacking helpers" 214 | ] 215 | }, 216 | { 217 | "cell_type": "markdown", 218 | "metadata": {}, 219 | "source": [ 220 | "r_ is used to stack row-wise" 221 | ] 222 | }, 223 | { 224 | "cell_type": "markdown", 225 | "metadata": {}, 226 | "source": [ 227 | "c_ is used to stack column-wise" 228 | ] 229 | }, 230 | { 231 | "cell_type": "markdown", 232 | "metadata": {}, 233 | "source": [ 234 | "Turning slice notation directly into an array" 235 | ] 236 | }, 237 | { 238 | "cell_type": "markdown", 239 | "metadata": {}, 240 | "source": [ 241 | "Use repeat to repeat the elements of an ndarray" 242 | ] 243 | }, 244 | { 245 | "cell_type": "markdown", 246 | "metadata": {}, 247 | "source": [ 248 | "Repeat element by element" 249 | ] 250 | }, 251 | { 252 | "cell_type": "markdown", 253 | "metadata": {}, 254 | "source": [ 255 | "Repeat along a specified axis" 256 | ] 257 | }, 258 | { 259 | "cell_type": "markdown", 260 | "metadata": {}, 261 | "source": [ 262 | "Tile: think of laying tiles\n", 263 | "[numpy tile](https://docs.scipy.org/doc/numpy/reference/generated/numpy.tile.html)" 264 | ] 265 | }, 266 | { 267 | "cell_type": "markdown", 268 | "metadata": {}, 269 | "source": [ 270 | "## File input and output with NumPy\n", 271 | "### 七月在线 Python data analysis bootcamp julyedu.com" 272 | ] 273 | }, 274 | { 275 | "cell_type": "markdown", 276 | "metadata": {}, 277 | "source": [ 278 | "Read a CSV file into an array" 279 | ] 280 | }, 281 | { 282 | "cell_type": "markdown", 283 | "metadata": {}, 284 | "source": [ 285 | "Another commonly used function for converting text data into an ndarray is genfromtxt" 286 | ] 287 | }, 288 | { 289 | "cell_type": "markdown", 290 | "metadata": {}, 291 | "source": [ 292 | "Reading and writing array files" 293 | ] 294 | }, 295 | { 296 | "cell_type": "markdown", 297 | "metadata": {}, 298 | "source": [ 299 | "Several arrays can be saved together in one archive" 300 | ] 301 | }, 302 | { 303 | "cell_type": "markdown", 304 | "metadata": {}, 305 | "source": [ 306 | "## Math operations with NumPy and SciPy\n", 307 | "### 七月在线 Python data analysis bootcamp julyedu.com" 308 | ] 309 | }, 310 | { 311 | "cell_type": "markdown", 312 | "metadata": {}, 313 | "source": [ 314 | "What about matrix multiplication? No need to worry, just follow the code below:\n", 315 | "\n", 316 | "[matrix 
multiplication](http://mathworld.wolfram.com/MatrixMultiplication.html)" 317 | ] 318 | }, 319 | { 320 | "cell_type": "markdown", 321 | "metadata": {}, 322 | "source": [ 323 | "Inner product of vectors" 324 | ] 325 | }, 326 | { 327 | "cell_type": "markdown", 328 | "metadata": {}, 329 | "source": [ 330 | "Matrix multiplication" 331 | ] 332 | }, 333 | { 334 | "cell_type": "markdown", 335 | "metadata": {}, 336 | "source": [ 337 | "Inner product of vectors with [inner](https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.inner.html#numpy.inner)" 338 | ] 339 | }, 340 | { 341 | "cell_type": "markdown", 342 | "metadata": { 343 | "collapsed": true 344 | }, 345 | "source": [ 346 | "Transposing works just like the math notation, plain and simple" 347 | ] 348 | }, 349 | { 350 | "cell_type": "markdown", 351 | "metadata": {}, 352 | "source": [ 353 | "Note that transposing a 1-D vector returns the vector itself" 354 | ] 355 | }, 356 | { 357 | "cell_type": "markdown", 358 | "metadata": {}, 359 | "source": [ 360 | "A 2-D array is different" 361 | ] 362 | }, 363 | { 364 | "cell_type": "markdown", 365 | "metadata": {}, 366 | "source": [ 367 | "Using the transposed matrix in a dot product" 368 | ] 369 | }, 370 | { 371 | "cell_type": "markdown", 372 | "metadata": {}, 373 | "source": [ 374 | "Higher-dimensional tensors can be transposed too" 375 | ] 376 | }, 377 | { 378 | "cell_type": "markdown", 379 | "metadata": {}, 380 | "source": [ 381 | "### [matmul](https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.matmul.html#numpy.matmul)\n", 382 | "\n", 383 | "Very commonly used, for computing matrix products" 384 | ] 385 | }, 386 | { 387 | "cell_type": "markdown", 388 | "metadata": {}, 389 | "source": [ 390 | "### [outer product](https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.outer.html)\n", 391 | "\n", 392 | "As in the mathematical definition, the outer product of two vectors yields a matrix" 393 | ] 394 | }, 395 | { 396 | "cell_type": "markdown", 397 | "metadata": {}, 398 | "source": [ 399 | "### Some more advanced linear algebra operations" 400 | ] 401 | }, 402 | { 403 | "cell_type": "markdown", 404 | "metadata": {}, 405 | "source": [ 406 | "Computing the determinant" 407 | ] 408 | }, 409 | { 410 | "cell_type": "markdown", 411 | "metadata": {}, 412 | "source": [ 413 | "Computing the inverse" 
414 | ] 415 | }, 416 | { 417 | "cell_type": "markdown", 418 | "metadata": {}, 419 | "source": [ 420 | "Computing the pseudo-inverse" 421 | ] 422 | }, 423 | { 424 | "cell_type": "markdown", 425 | "metadata": {}, 426 | "source": [ 427 | "Computing the matrix [norm](https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.linalg.norm.html#numpy.linalg.norm)" 428 | ] 429 | }, 430 | { 431 | "cell_type": "markdown", 432 | "metadata": {}, 433 | "source": [ 434 | "Computing the singular value decomposition (SVD)" 435 | ] 436 | }, 437 | { 438 | "cell_type": "markdown", 439 | "metadata": {}, 440 | "source": [ 441 | "\n", 442 | "## In-class mini project\n", 443 | "\n", 444 | "### 七月在线 Python data analysis bootcamp julyedu.com\n", 445 | "\n", 446 | "Write a softmax with NumPy\n", 447 | "\n", 448 | "[What is softmax?](http://cs231n.github.io/linear-classify/#softmax)" 449 | ] 450 | }, 451 | { 452 | "cell_type": "markdown", 453 | "metadata": {}, 454 | "source": [ 455 | "For more NumPy details and usage, see the official [NumPy reference](http://docs.scipy.org/doc/numpy/reference/)" 456 | ] 457 | } 458 | ], 459 | "metadata": { 460 | "kernelspec": { 461 | "display_name": "Python 3", 462 | "language": "python", 463 | "name": "python3" 464 | }, 465 | "language_info": { 466 | "codemirror_mode": { 467 | "name": "ipython", 468 | "version": 3 469 | }, 470 | "file_extension": ".py", 471 | "mimetype": "text/x-python", 472 | "name": "python", 473 | "nbconvert_exporter": "python", 474 | "pygments_lexer": "ipython3", 475 | "version": "3.6.1" 476 | } 477 | }, 478 | "nbformat": 4, 479 | "nbformat_minor": 1 480 | } 481 | -------------------------------------------------------------------------------- /Nov-2017/numpy-2.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# NumPy Basics\n", 8 | "\n", 9 | "### 七月在线 Python data analysis bootcamp julyedu.com\n", 10 | "\n", 11 | "褚则伟 zeweichu@gmail.com" 12 | ] 13 | }, 14 | { 15 | "cell_type": "markdown", 16 | "metadata": {}, 17 | "source": [ 18 | "## 
目录\n", 19 | "- broadcasting广播\n", 20 | "- 文件输入输出\n", 21 | "- 线性代数运算\n", 22 | "- 随堂小项目:用Numpy写一个Softmax" 23 | ] 24 | }, 25 | { 26 | "cell_type": "markdown", 27 | "metadata": {}, 28 | "source": [ 29 | "## 复习" 30 | ] 31 | }, 32 | { 33 | "cell_type": "markdown", 34 | "metadata": {}, 35 | "source": [ 36 | "首先复习一下上次讲课的内容,我们首先产生一个随机的numpy ndarray" 37 | ] 38 | }, 39 | { 40 | "cell_type": "code", 41 | "execution_count": 1, 42 | "metadata": {}, 43 | "outputs": [ 44 | { 45 | "name": "stdout", 46 | "output_type": "stream", 47 | "text": [ 48 | "(3, 5, 6) (3, 6, 4)\n" 49 | ] 50 | } 51 | ], 52 | "source": [ 53 | "import numpy as np\n", 54 | "x = (10 * np.random.random((3, 5, 6)) - 5).astype(np.int32)\n", 55 | "y = (10 * np.random.random((3, 6, 4)) - 5).astype(np.int32)\n", 56 | "print(x.shape, y.shape)" 57 | ] 58 | }, 59 | { 60 | "cell_type": "code", 61 | "execution_count": 2, 62 | "metadata": {}, 63 | "outputs": [ 64 | { 65 | "data": { 66 | "text/plain": [ 67 | "array([[ 2, 1, -3, 2, -3, -1],\n", 68 | " [-3, 4, -1, -2, -2, -1],\n", 69 | " [-2, 2, 4, 0, 1, -2]], dtype=int32)" 70 | ] 71 | }, 72 | "execution_count": 2, 73 | "metadata": {}, 74 | "output_type": "execute_result" 75 | } 76 | ], 77 | "source": [ 78 | "x[:, 2, :]" 79 | ] 80 | }, 81 | { 82 | "cell_type": "code", 83 | "execution_count": 3, 84 | "metadata": {}, 85 | "outputs": [ 86 | { 87 | "data": { 88 | "text/plain": [ 89 | "array([[-2.5 , 0. , -0.33333333, 0.83333333, -0.66666667],\n", 90 | " [ 0.16666667, 1.83333333, -0.83333333, 1.66666667, 0.5 ],\n", 91 | " [-0.16666667, -0.5 , 0.5 , -1.16666667, 0.5 ]])" 92 | ] 93 | }, 94 | "execution_count": 3, 95 | "metadata": {}, 96 | "output_type": "execute_result" 97 | } 98 | ], 99 | "source": [ 100 | "np.mean(x, -1)" 101 | ] 102 | }, 103 | { 104 | "cell_type": "code", 105 | "execution_count": 4, 106 | "metadata": {}, 107 | "outputs": [ 108 | { 109 | "data": { 110 | "text/plain": [ 111 | "array([[[-2.5 ],\n", 112 | " [ 0. 
],\n", 113 | " [-0.33333333],\n", 114 | " [ 0.83333333],\n", 115 | " [-0.66666667]],\n", 116 | "\n", 117 | " [[ 0.16666667],\n", 118 | " [ 1.83333333],\n", 119 | " [-0.83333333],\n", 120 | " [ 1.66666667],\n", 121 | " [ 0.5 ]],\n", 122 | "\n", 123 | " [[-0.16666667],\n", 124 | " [-0.5 ],\n", 125 | " [ 0.5 ],\n", 126 | " [-1.16666667],\n", 127 | " [ 0.5 ]]])" 128 | ] 129 | }, 130 | "execution_count": 4, 131 | "metadata": {}, 132 | "output_type": "execute_result" 133 | } 134 | ], 135 | "source": [ 136 | "np.mean(x, -1, keepdims=True)" 137 | ] 138 | }, 139 | { 140 | "cell_type": "markdown", 141 | "metadata": {}, 142 | "source": [ 143 | "## Broadcasting\n", 144 | "### 七月在线python数据分析集训营 julyedu.com" 145 | ] 146 | }, 147 | { 148 | "cell_type": "markdown", 149 | "metadata": {}, 150 | "source": [ 151 | "I have not settled on the best Chinese word for this; for now let's call it “传播” (propagation):
\n", 152 | "What is it for? Imagine combining a small matrix with a large one, where the small matrix should be applied repeatedly to each matching block of the large matrix. That is exactly what broadcasting does." 153 | ] 154 | }, 155 | { 156 | "cell_type": "markdown", 157 | "metadata": {}, 158 | "source": [ 159 | "Our task: add a vector elementwise to every row of x and store the result in y" 160 | ] 161 | }, 162 | { 163 | "cell_type": "code", 164 | "execution_count": 5, 165 | "metadata": {}, 166 | "outputs": [ 167 | { 168 | "name": "stdout", 169 | "output_type": "stream", 170 | "text": [ 171 | "[[0 0 0]\n", 172 | " [0 0 0]\n", 173 | " [0 0 0]\n", 174 | " [0 0 0]]\n" 175 | ] 176 | } 177 | ], 178 | "source": [ 179 | "x = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])\n", 180 | "v = np.array([1, 0, 1])\n", 181 | "y = np.zeros_like(x) # create an all-zero array with the same shape and dtype as x\n", 182 | "print(y)" 183 | ] 184 | }, 185 | { 186 | "cell_type": "markdown", 187 | "metadata": {}, 188 | "source": [ 189 | "The brute-force way is to add the elements one at a time in for loops" 190 | ] 191 | }, 192 | { 193 | "cell_type": "code", 194 | "execution_count": 6, 195 | "metadata": {}, 196 | "outputs": [ 197 | { 198 | "name": "stdout", 199 | "output_type": "stream", 200 | "text": [ 201 | "[[ 2 2 4]\n", 202 | " [ 5 5 7]\n", 203 | " [ 8 8 10]\n", 204 | " [11 11 13]]\n" 205 | ] 206 | } 207 | ], 208 | "source": [ 209 | "for i in range(x.shape[0]):\n", 210 | "    for j in range(x.shape[1]):\n", 211 | "        y[i, j] = x[i, j] + v[j]\n", 212 | "print(y)" 213 | ] 214 | }, 215 | { 216 | "cell_type": "markdown", 217 | "metadata": {}, 218 | "source": [ 219 | "This works, but it is inefficient: when x has many rows it becomes very slow:" 220 | ] 221 | }, 222 | { 223 | "cell_type": "code", 224 | "execution_count": 7, 225 | "metadata": { 226 | "collapsed": true 227 | }, 228 | "outputs": [], 229 | "source": [ 230 | "import time" 231 | ] 232 | }, 233 | { 234 | "cell_type": "code", 235 | "execution_count": 8, 236 | "metadata": {}, 237 | "outputs": [ 238 | { 239 | "name": "stdout", 240 | "output_type": "stream", 241 | "text": [ 242 | "[[ 500. 500. 500. ..., 500. 500. 500.]\n", 243 | " [ 500. 500. 500. ..., 500. 500. 500.]\n", 244 | " [ 500. 500. 500. ..., 500. 500. 
500.]\n", 245 | " ..., \n", 246 | " [ 500. 500. 500. ..., 500. 500. 500.]\n", 247 | " [ 500. 500. 500. ..., 500. 500. 500.]\n", 248 | " [ 500. 500. 500. ..., 500. 500. 500.]]\n", 249 | "It took 18.60887122154236 seconds to finish\n" 250 | ] 251 | } 252 | ], 253 | "source": [ 254 | "start = time.time()\n", 255 | "x = 200 * np.ones((5000, 6000))\n", 256 | "v = 300 * np.ones((6000))\n", 257 | "y = np.zeros_like(x)\n", 258 | "for i in range(x.shape[0]):\n", 259 | "    for j in range(x.shape[1]):\n", 260 | "        y[i, j] = x[i, j] + v[j]\n", 261 | "print(y)\n", 262 | "print(\"It took {} seconds to finish\".format(time.time() - start))" 263 | ] 264 | }, 265 | { 266 | "cell_type": "markdown", 267 | "metadata": {}, 268 | "source": [ 269 | "NumPy broadcasting allows us to perform this computation without actually creating multiple copies of v. Consider this version, using broadcasting:" 270 | ] 271 | }, 272 | { 273 | "cell_type": "markdown", 274 | "metadata": {}, 275 | "source": [ 276 | "Thanks to broadcasting, the loops above collapse into a single addition" 277 | ] 278 | }, 279 | { 280 | "cell_type": "code", 281 | "execution_count": 9, 282 | "metadata": {}, 283 | "outputs": [ 284 | { 285 | "name": "stdout", 286 | "output_type": "stream", 287 | "text": [ 288 | "[[ 2 2 4]\n", 289 | " [ 5 5 7]\n", 290 | " [ 8 8 10]\n", 291 | " [11 11 13]]\n" 292 | ] 293 | } 294 | ], 295 | "source": [ 296 | "x = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])\n", 297 | "v = np.array([1, 0, 1])\n", 298 | "y = x + v # Add v to each row of x using broadcasting\n", 299 | "print(y)" 300 | ] 301 | }, 302 | { 303 | "cell_type": "code", 304 | "execution_count": 10, 305 | "metadata": {}, 306 | "outputs": [ 307 | { 308 | "name": "stdout", 309 | "output_type": "stream", 310 | "text": [ 311 | "It took 0.2812681198120117 seconds to finish\n" 312 | ] 313 | } 314 | ], 315 | "source": [ 316 | "start = time.time()\n", 317 | "x = 200 * np.ones((5000, 6000))\n", 318 | "v = 300 * np.ones((6000))\n", 319 | "y = x + v\n", 320 | "print(\"It took {} 
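seconds to finish\".format(time.time() - start))" 321 | ] 322 | }, 323 | { 324 | "cell_type": "markdown", 325 | "metadata": {}, 326 | "source": [ 327 | "When operating on two arrays, NumPy compares their shapes dimension by dimension, starting from the trailing one. Two dimensions are compatible, and the operation can be broadcast, in the following cases: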
\n", 328 | "\n", 329 | "1. They are equal.\n", 330 | "2. One of them is 1 (that dimension is then stretched, conceptually by copying, until the shapes match).\n", 331 | "3. When the two ndarrays have different numbers of dimensions, the one with the lower rank is automatically given leading dimensions of size 1 until both have the same rank, and compatibility is checked after that.\n", 332 | "\n", 333 | "For example, when adding arrays:\n", 334 | "```python\n", 335 | "Image (3d array): 256 x 256 x 3\n", 336 | "Scale (1d array): 3\n", 337 | "Result (3d array): 256 x 256 x 3\n", 338 | "\n", 339 | "A (4d array): 8 x 1 x 6 x 1\n", 340 | "B (3d array): 7 x 1 x 5\n", 341 | "Result (4d array): 8 x 7 x 6 x 5\n", 342 | "\n", 343 | "A (2d array): 5 x 4\n", 344 | "B (1d array): 1\n", 345 | "Result (2d array): 5 x 4\n", 346 | "\n", 347 | "A (3d array): 15 x 3 x 5\n", 348 | "B (3d array): 15 x 1 x 5\n", 349 | "Result (3d array): 15 x 3 x 5\n", 350 | "```\n", 351 | "\n", 352 | "Here are some broadcasting examples:" 353 | ] 354 | }, 355 | { 356 | "cell_type": "markdown", 357 | "metadata": {}, 358 | "source": [ 359 | "Let's work through this use of broadcasting\n", 360 | "\n", 361 | "First reshape v into a 3x1 array/matrix; it can then be broadcast against w (here, multiplied elementwise):" 362 | ] 363 | }, 364 | { 365 | "cell_type": "code", 366 | "execution_count": 11, 367 | "metadata": {}, 368 | "outputs": [ 369 | { 370 | "name": "stdout", 371 | "output_type": "stream", 372 | "text": [ 373 | "[[ 4 5]\n", 374 | " [ 8 10]\n", 375 | " [12 15]]\n" 376 | ] 377 | } 378 | ], 379 | "source": [ 380 | "v = np.array([1,2,3]) # v has shape (3,)\n", 381 | "w = np.array([4,5]) # w has shape (2,)\n", 382 | "\n", 383 | "print(np.reshape(v, (3, 1)) * w) # (3, 1), (2,) -> (3, 1), (1, 2) -> (3, 2)" 384 | ] 385 | }, 386 | { 387 | "cell_type": "markdown", 388 | "metadata": {}, 389 | "source": [ 390 | "What if we want to add a vector to every row of a matrix?" 391 | ] 392 | }, 393 | { 394 | "cell_type": "code", 395 | "execution_count": 12, 396 | "metadata": {}, 397 | "outputs": [ 398 | { 399 | "name": "stdout", 400 | "output_type": "stream", 401 | "text": [ 402 | "[[2 4 6]\n", 403 | " [5 7 9]]\n" 404 | ] 405 | } 406 | ], 407 | "source": [ 408 | "x = np.array([[1,2,3], [4,5,6]]) # (2,3)\n", 409 | "v = np.array([1,2,3]) # (3,)\n", 410 | "print(x + v) #(2, 3), (3,) -> (2, 3), (1, 3) 
-> (2, 3)" 411 | ] 412 | }, 413 | { 414 | "cell_type": "code", 415 | "execution_count": 13, 416 | "metadata": { 417 | "scrolled": true 418 | }, 419 | "outputs": [ 420 | { 421 | "ename": "ValueError", 422 | "evalue": "operands could not be broadcast together with shapes (2,3) (2,) ", 423 | "output_type": "error", 424 | "traceback": [ 425 | "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", 426 | "\u001b[0;31mValueError\u001b[0m Traceback (most recent call last)", 427 | "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0mx\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mnp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0marray\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;36m2\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;36m3\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0;36m4\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;36m5\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;36m6\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;31m# 2x3的\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2\u001b[0m \u001b[0mw\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mnp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0marray\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m4\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;36m5\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;31m# w 形状是 (2,)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 3\u001b[0;31m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mx\u001b[0m \u001b[0;34m+\u001b[0m \u001b[0mw\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;31m# (2, 3), (2, ) -> (2, 3), (1, 2) -> not compatible\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", 428 | "\u001b[0;31mValueError\u001b[0m: operands could not be broadcast together with shapes (2,3) (2,) " 429 | ] 430 | } 431 | ], 432 | "source": [ 433 | "x = 
np.array([[1,2,3], [4,5,6]]) # 2x3的\n", 434 | "w = np.array([4,5]) # w 形状是 (2,)\n", 435 | "print(x + w) # (2, 3), (2, ) -> (2, 3), (1, 2) -> not compatible" 436 | ] 437 | }, 438 | { 439 | "cell_type": "code", 440 | "execution_count": null, 441 | "metadata": { 442 | "collapsed": true 443 | }, 444 | "outputs": [], 445 | "source": [ 446 | "print((x.T + w).T)" 447 | ] 448 | }, 449 | { 450 | "cell_type": "markdown", 451 | "metadata": {}, 452 | "source": [ 453 | "上面那个操作太复杂了,其实我们可以直接这么做嘛" 454 | ] 455 | }, 456 | { 457 | "cell_type": "code", 458 | "execution_count": null, 459 | "metadata": { 460 | "collapsed": true 461 | }, 462 | "outputs": [], 463 | "source": [ 464 | "print(x + np.reshape(w, (2, 1)))" 465 | ] 466 | }, 467 | { 468 | "cell_type": "markdown", 469 | "metadata": {}, 470 | "source": [ 471 | "broadcasting当然可以逐元素运算了" 472 | ] 473 | }, 474 | { 475 | "cell_type": "code", 476 | "execution_count": null, 477 | "metadata": { 478 | "collapsed": true 479 | }, 480 | "outputs": [], 481 | "source": [ 482 | "print(x * 2)" 483 | ] 484 | }, 485 | { 486 | "cell_type": "markdown", 487 | "metadata": {}, 488 | "source": [ 489 | "总结一下broadcasting,可以看看下面的图:
\n", 490 | "![](http://www.astroml.org/_images/fig_broadcast_visual_1.png)" 491 | ] 492 | }, 493 | { 494 | "cell_type": "markdown", 495 | "metadata": {}, 496 | "source": [ 497 | "## 逻辑运算\n", 498 | "### 七月在线python数据分析班 2017升级版 julyedu.com" 499 | ] 500 | }, 501 | { 502 | "cell_type": "markdown", 503 | "metadata": {}, 504 | "source": [ 505 | "where可以帮我们选择是取第一个ndarray的元素还是第二个的" 506 | ] 507 | }, 508 | { 509 | "cell_type": "code", 510 | "execution_count": 92, 511 | "metadata": {}, 512 | "outputs": [ 513 | { 514 | "name": "stdout", 515 | "output_type": "stream", 516 | "text": [ 517 | "[ 1.1 2.2 1.3 1.4 2.5]\n" 518 | ] 519 | } 520 | ], 521 | "source": [ 522 | "x_arr = np.array([1.1, 1.2, 1.3, 1.4, 1.5])\n", 523 | "y_arr = np.array([2.1, 2.2, 2.3, 2.4, 2.5])\n", 524 | "cond = np.array([True, False, True, True, False])\n", 525 | "print(np.where(cond, x_arr, y_arr))" 526 | ] 527 | }, 528 | { 529 | "cell_type": "code", 530 | "execution_count": 93, 531 | "metadata": {}, 532 | "outputs": [ 533 | { 534 | "name": "stdout", 535 | "output_type": "stream", 536 | "text": [ 537 | "[[-0.70291816 -0.48078299 -0.07345543 0.37364768]\n", 538 | " [-2.12054472 0.12560835 0.53658201 -0.34450973]\n", 539 | " [-0.23174391 -0.78220029 -0.34650272 0.16584218]\n", 540 | " [-0.12586755 -0.46684574 -1.76005006 -0.93146404]]\n" 541 | ] 542 | } 543 | ], 544 | "source": [ 545 | "arr = np.random.randn(4,4)\n", 546 | "print(arr)" 547 | ] 548 | }, 549 | { 550 | "cell_type": "code", 551 | "execution_count": 94, 552 | "metadata": {}, 553 | "outputs": [ 554 | { 555 | "name": "stdout", 556 | "output_type": "stream", 557 | "text": [ 558 | "[[-2 -2 -2 2]\n", 559 | " [-2 2 2 -2]\n", 560 | " [-2 -2 -2 2]\n", 561 | " [-2 -2 -2 -2]]\n" 562 | ] 563 | } 564 | ], 565 | "source": [ 566 | "print(np.where(arr > 0, 2, -2))" 567 | ] 568 | }, 569 | { 570 | "cell_type": "code", 571 | "execution_count": 95, 572 | "metadata": {}, 573 | "outputs": [ 574 | { 575 | "name": "stdout", 576 | "output_type": "stream", 577 | "text": [ 
578 | "[[-0.70291816 -0.48078299 -0.07345543 2. ]\n", 579 | " [-2.12054472 2. 2. -0.34450973]\n", 580 | " [-0.23174391 -0.78220029 -0.34650272 2. ]\n", 581 | " [-0.12586755 -0.46684574 -1.76005006 -0.93146404]]\n" 582 | ] 583 | } 584 | ], 585 | "source": [ 586 | "print(np.where(arr > 0, 2, arr))" 587 | ] 588 | }, 589 | { 590 | "cell_type": "code", 591 | "execution_count": 96, 592 | "metadata": {}, 593 | "outputs": [ 594 | { 595 | "name": "stdout", 596 | "output_type": "stream", 597 | "text": [ 598 | "[1 2 1 0 3]\n" 599 | ] 600 | } 601 | ], 602 | "source": [ 603 | "cond_1 = np.array([True, False, True, True, False])\n", 604 | "cond_2 = np.array([False, True, False, True, False])\n", 605 | "result = np.where(cond_1 & cond_2, 0, \\\n", 606 | " np.where(cond_1, 1, np.where(cond_2, 2, 3)))\n", 607 | "print(result)" 608 | ] 609 | }, 610 | { 611 | "cell_type": "code", 612 | "execution_count": 97, 613 | "metadata": {}, 614 | "outputs": [ 615 | { 616 | "name": "stdout", 617 | "output_type": "stream", 618 | "text": [ 619 | "[ 1.84333075 -0.18505244 -0.3696118 1.36176081 1.36693291 0.41808203\n", 620 | " -1.03304133 -0.04080082 0.03553841 -0.29910141]\n", 621 | "5\n" 622 | ] 623 | } 624 | ], 625 | "source": [ 626 | "arr = np.random.randn(10)\n", 627 | "print(arr)\n", 628 | "print((arr > 0).sum())" 629 | ] 630 | }, 631 | { 632 | "cell_type": "code", 633 | "execution_count": 98, 634 | "metadata": {}, 635 | "outputs": [ 636 | { 637 | "name": "stdout", 638 | "output_type": "stream", 639 | "text": [ 640 | "True\n", 641 | "False\n" 642 | ] 643 | } 644 | ], 645 | "source": [ 646 | "bools = np.array([False, False, True, False])\n", 647 | "print(bools.any()) # 有一个为True则返回True\n", 648 | "print(bools.all()) # 有一个为False则返回False" 649 | ] 650 | }, 651 | { 652 | "cell_type": "markdown", 653 | "metadata": {}, 654 | "source": [ 655 | "## 连接两个二维数组\n", 656 | "### 七月在线python数据分析集训营 julyedu.com" 657 | ] 658 | }, 659 | { 660 | "cell_type": "code", 661 | "execution_count": null, 662 | "metadata": { 
663 | "collapsed": true 664 | }, 665 | "outputs": [], 666 | "source": [ 667 | "arr1 = np.array([[1, 2, 3], [4, 5, 6]])\n", 668 | "arr2 = np.array([[7, 8, 9], [10, 11, 12]])\n", 669 | "print(np.concatenate([arr1, arr2], axis = 0)) # 按行连接\n", 670 | "print(np.concatenate([arr1, arr2], axis = 1)) # 按列连接" 671 | ] 672 | }, 673 | { 674 | "cell_type": "markdown", 675 | "metadata": {}, 676 | "source": [ 677 | "所谓堆叠,参考叠盘子。。。连接的另一种表述\n", 678 | "垂直stack与水平stack" 679 | ] 680 | }, 681 | { 682 | "cell_type": "code", 683 | "execution_count": null, 684 | "metadata": { 685 | "collapsed": true 686 | }, 687 | "outputs": [], 688 | "source": [ 689 | "print(np.vstack((arr1, arr2))) # 垂直堆叠\n", 690 | "print(np.hstack((arr1, arr2))) # 水平堆叠" 691 | ] 692 | }, 693 | { 694 | "cell_type": "markdown", 695 | "metadata": {}, 696 | "source": [ 697 | "拆分数组, 我们使用[split方法](https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.split.html)。\n", 698 | "\n", 699 | "split(array, indices_or_sections, axis=0)\n", 700 | "\n", 701 | "第一个参数array没有什么疑问,第二个参数可以是切断的index,也可以是切分的个数,第三个参数是我们切块的维度" 702 | ] 703 | }, 704 | { 705 | "cell_type": "code", 706 | "execution_count": 87, 707 | "metadata": {}, 708 | "outputs": [ 709 | { 710 | "name": "stdout", 711 | "output_type": "stream", 712 | "text": [ 713 | "[[ 0.02748613 0.80183338 0.98362064 0.83390233 0.30820675 0.62237232]\n", 714 | " [ 0.24180617 0.50848842 0.11817702 0.63971147 0.95449527 0.77232103]\n", 715 | " [ 0.65504176 0.33856181 0.58431342 0.11515941 0.50000158 0.56214734]\n", 716 | " [ 0.36666571 0.11613323 0.01241145 0.67861831 0.46134197 0.69705024]\n", 717 | " [ 0.68029107 0.12991374 0.98166857 0.5981871 0.80964768 0.44394885]\n", 718 | " [ 0.72437319 0.5260204 0.05226753 0.51586905 0.71076813 0.83842862]]\n" 719 | ] 720 | } 721 | ], 722 | "source": [ 723 | "arr = np.random.rand(6,6)\n", 724 | "print(arr)" 725 | ] 726 | }, 727 | { 728 | "cell_type": "code", 729 | "execution_count": 88, 730 | "metadata": {}, 731 | "outputs": [ 732 | { 733 | 
"name": "stdout", 734 | "output_type": "stream", 735 | "text": [ 736 | "[[ 0.02748613 0.80183338 0.98362064 0.83390233 0.30820675 0.62237232]]\n", 737 | "\n", 738 | "[[ 0.24180617 0.50848842 0.11817702 0.63971147 0.95449527 0.77232103]\n", 739 | " [ 0.65504176 0.33856181 0.58431342 0.11515941 0.50000158 0.56214734]]\n", 740 | "\n", 741 | "[[ 0.36666571 0.11613323 0.01241145 0.67861831 0.46134197 0.69705024]\n", 742 | " [ 0.68029107 0.12991374 0.98166857 0.5981871 0.80964768 0.44394885]\n", 743 | " [ 0.72437319 0.5260204 0.05226753 0.51586905 0.71076813 0.83842862]]\n" 744 | ] 745 | } 746 | ], 747 | "source": [ 748 | "first, second, third = np.split(arr, [1,3], axis = 0)\n", 749 | "print(first)\n", 750 | "print()\n", 751 | "print(second)\n", 752 | "print()\n", 753 | "print(third)" 754 | ] 755 | }, 756 | { 757 | "cell_type": "code", 758 | "execution_count": 89, 759 | "metadata": { 760 | "scrolled": true 761 | }, 762 | "outputs": [ 763 | { 764 | "name": "stdout", 765 | "output_type": "stream", 766 | "text": [ 767 | "[[ 0.02748613]\n", 768 | " [ 0.24180617]\n", 769 | " [ 0.65504176]\n", 770 | " [ 0.36666571]\n", 771 | " [ 0.68029107]\n", 772 | " [ 0.72437319]]\n", 773 | "\n", 774 | "[[ 0.80183338 0.98362064]\n", 775 | " [ 0.50848842 0.11817702]\n", 776 | " [ 0.33856181 0.58431342]\n", 777 | " [ 0.11613323 0.01241145]\n", 778 | " [ 0.12991374 0.98166857]\n", 779 | " [ 0.5260204 0.05226753]]\n", 780 | "\n", 781 | "[[ 0.83390233 0.30820675 0.62237232]\n", 782 | " [ 0.63971147 0.95449527 0.77232103]\n", 783 | " [ 0.11515941 0.50000158 0.56214734]\n", 784 | " [ 0.67861831 0.46134197 0.69705024]\n", 785 | " [ 0.5981871 0.80964768 0.44394885]\n", 786 | " [ 0.51586905 0.71076813 0.83842862]]\n" 787 | ] 788 | } 789 | ], 790 | "source": [ 791 | "first, second, third = np.split(arr, [1, 3], axis = 1)\n", 792 | "print(first)\n", 793 | "print()\n", 794 | "print(second)\n", 795 | "print()\n", 796 | "print(third)" 797 | ] 798 | }, 799 | { 800 | "cell_type": "markdown", 801 | 
"metadata": {}, 802 | "source": [ 803 | "如果我们想要直接平均切分成三块呢?" 804 | ] 805 | }, 806 | { 807 | "cell_type": "code", 808 | "execution_count": 90, 809 | "metadata": {}, 810 | "outputs": [ 811 | { 812 | "name": "stdout", 813 | "output_type": "stream", 814 | "text": [ 815 | "\n", 816 | "3\n", 817 | "[array([[ 0.02748613, 0.80183338],\n", 818 | " [ 0.24180617, 0.50848842],\n", 819 | " [ 0.65504176, 0.33856181],\n", 820 | " [ 0.36666571, 0.11613323],\n", 821 | " [ 0.68029107, 0.12991374],\n", 822 | " [ 0.72437319, 0.5260204 ]]), array([[ 0.98362064, 0.83390233],\n", 823 | " [ 0.11817702, 0.63971147],\n", 824 | " [ 0.58431342, 0.11515941],\n", 825 | " [ 0.01241145, 0.67861831],\n", 826 | " [ 0.98166857, 0.5981871 ],\n", 827 | " [ 0.05226753, 0.51586905]]), array([[ 0.30820675, 0.62237232],\n", 828 | " [ 0.95449527, 0.77232103],\n", 829 | " [ 0.50000158, 0.56214734],\n", 830 | " [ 0.46134197, 0.69705024],\n", 831 | " [ 0.80964768, 0.44394885],\n", 832 | " [ 0.71076813, 0.83842862]])]\n" 833 | ] 834 | } 835 | ], 836 | "source": [ 837 | "blocks = np.split(arr, 3, axis = 1)\n", 838 | "print(type(blocks)) # 我们会拿到一个list of ndarray\n", 839 | "print(len(blocks))\n", 840 | "print(blocks)" 841 | ] 842 | }, 843 | { 844 | "cell_type": "markdown", 845 | "metadata": {}, 846 | "source": [ 847 | "堆叠辅助" 848 | ] 849 | }, 850 | { 851 | "cell_type": "code", 852 | "execution_count": 91, 853 | "metadata": { 854 | "collapsed": true, 855 | "scrolled": true 856 | }, 857 | "outputs": [], 858 | "source": [ 859 | "arr = np.arange(6)\n", 860 | "arr1 = arr.reshape((3, 2))\n", 861 | "arr2 = np.random.randn(3, 2)" 862 | ] 863 | }, 864 | { 865 | "cell_type": "markdown", 866 | "metadata": {}, 867 | "source": [ 868 | "r_用于按行堆叠" 869 | ] 870 | }, 871 | { 872 | "cell_type": "code", 873 | "execution_count": 92, 874 | "metadata": {}, 875 | "outputs": [ 876 | { 877 | "name": "stdout", 878 | "output_type": "stream", 879 | "text": [ 880 | "[[ 0. 1. ]\n", 881 | " [ 2. 3. ]\n", 882 | " [ 4. 5. 
]\n", 883 | " [ 1.72687736 1.39613883]\n", 884 | " [-0.48292151 1.21469352]\n", 885 | " [ 0.59093029 1.92159834]]\n", 886 | "\n" 887 | ] 888 | } 889 | ], 890 | "source": [ 891 | "print(np.r_[arr1, arr2])\n", 892 | "print()" 893 | ] 894 | }, 895 | { 896 | "cell_type": "markdown", 897 | "metadata": {}, 898 | "source": [ 899 | "c_ stacks along columns" 900 | ] 901 | }, 902 | { 903 | "cell_type": "code", 904 | "execution_count": 93, 905 | "metadata": {}, 906 | "outputs": [ 907 | { 908 | "name": "stdout", 909 | "output_type": "stream", 910 | "text": [ 911 | "[[ 0. 1. 0. ]\n", 912 | " [ 2. 3. 1. ]\n", 913 | " [ 4. 5. 2. ]\n", 914 | " [ 1.72687736 1.39613883 3. ]\n", 915 | " [-0.48292151 1.21469352 4. ]\n", 916 | " [ 0.59093029 1.92159834 5. ]]\n", 917 | "\n" 918 | ] 919 | } 920 | ], 921 | "source": [ 922 | "print(np.c_[np.r_[arr1, arr2], arr])\n", 923 | "print()" 924 | ] 925 | }, 926 | { 927 | "cell_type": "markdown", 928 | "metadata": {}, 929 | "source": [ 930 | "Slices are converted directly into arrays" 931 | ] 932 | }, 933 | { 934 | "cell_type": "code", 935 | "execution_count": 94, 936 | "metadata": {}, 937 | "outputs": [ 938 | { 939 | "name": "stdout", 940 | "output_type": "stream", 941 | "text": [ 942 | "[[ 1 -10]\n", 943 | " [ 2 -9]\n", 944 | " [ 3 -8]\n", 945 | " [ 4 -7]\n", 946 | " [ 5 -6]]\n", 947 | "\n" 948 | ] 949 | } 950 | ], 951 | "source": [ 952 | "print(np.c_[1:6, -10:-5])\n", 953 | "print()" 954 | ] 955 | }, 956 | { 957 | "cell_type": "markdown", 958 | "metadata": {}, 959 | "source": [ 960 | "Use repeat to repeat the elements of an ndarray" 961 | ] 962 | }, 963 | { 964 | "cell_type": "markdown", 965 | "metadata": {}, 966 | "source": [ 967 | "Repeating element by element" 968 | ] 969 | }, 970 | { 971 | "cell_type": "code", 972 | "execution_count": 95, 973 | "metadata": {}, 974 | "outputs": [ 975 | { 976 | "name": "stdout", 977 | "output_type": "stream", 978 | "text": [ 979 | "[0 0 0 1 1 1 2 2 2]\n", 980 | "[0 0 1 1 1 2 2 2 2]\n", 981 | "\n" 982 | ] 983 | } 984 | ], 985 | "source": [ 986 | "arr = np.arange(3)\n", 987 | "print(arr.repeat(3))\n", 988 | 
"print(arr.repeat([2,3,4]))\n", 989 | "print()" 990 | ] 991 | }, 992 | { 993 | "cell_type": "markdown", 994 | "metadata": {}, 995 | "source": [ 996 | "指定axis来重复" 997 | ] 998 | }, 999 | { 1000 | "cell_type": "code", 1001 | "execution_count": 72, 1002 | "metadata": {}, 1003 | "outputs": [ 1004 | { 1005 | "name": "stdout", 1006 | "output_type": "stream", 1007 | "text": [ 1008 | "[[ 0.01909565 0.27303844]\n", 1009 | " [ 0.15173119 0.04216735]]\n" 1010 | ] 1011 | } 1012 | ], 1013 | "source": [ 1014 | "arr = np.random.rand(2,2)\n", 1015 | "print(arr)" 1016 | ] 1017 | }, 1018 | { 1019 | "cell_type": "code", 1020 | "execution_count": 73, 1021 | "metadata": {}, 1022 | "outputs": [ 1023 | { 1024 | "name": "stdout", 1025 | "output_type": "stream", 1026 | "text": [ 1027 | "[[ 0.01909565 0.27303844]\n", 1028 | " [ 0.01909565 0.27303844]\n", 1029 | " [ 0.15173119 0.04216735]\n", 1030 | " [ 0.15173119 0.04216735]]\n", 1031 | "[[ 0.01909565 0.01909565 0.27303844 0.27303844]\n", 1032 | " [ 0.15173119 0.15173119 0.04216735 0.04216735]]\n" 1033 | ] 1034 | } 1035 | ], 1036 | "source": [ 1037 | "print(arr.repeat(2, axis=0))\n", 1038 | "print(arr.repeat(2, axis=1))" 1039 | ] 1040 | }, 1041 | { 1042 | "cell_type": "markdown", 1043 | "metadata": {}, 1044 | "source": [ 1045 | "Tile: 参考贴瓷砖\n", 1046 | "[numpy tile](https://docs.scipy.org/doc/numpy/reference/generated/numpy.tile.html)" 1047 | ] 1048 | }, 1049 | { 1050 | "cell_type": "code", 1051 | "execution_count": 74, 1052 | "metadata": {}, 1053 | "outputs": [ 1054 | { 1055 | "name": "stdout", 1056 | "output_type": "stream", 1057 | "text": [ 1058 | "[[ 0.01909565 0.27303844 0.01909565 0.27303844]\n", 1059 | " [ 0.15173119 0.04216735 0.15173119 0.04216735]]\n", 1060 | "[[ 0.01909565 0.27303844 0.01909565 0.27303844 0.01909565 0.27303844]\n", 1061 | " [ 0.15173119 0.04216735 0.15173119 0.04216735 0.15173119 0.04216735]\n", 1062 | " [ 0.01909565 0.27303844 0.01909565 0.27303844 0.01909565 0.27303844]\n", 1063 | " [ 0.15173119 0.04216735 
0.15173119 0.04216735 0.15173119 0.04216735]]\n" 1064 | ] 1065 | } 1066 | ], 1067 | "source": [ 1068 | "print(np.tile(arr, 2))\n", 1069 | "print(np.tile(arr, (2,3)))" 1070 | ] 1071 | }, 1072 | { 1073 | "cell_type": "markdown", 1074 | "metadata": {}, 1075 | "source": [ 1076 | "## numpy的文件输入输出\n", 1077 | "### 七月在线python数据分析集训营 julyedu.com" 1078 | ] 1079 | }, 1080 | { 1081 | "cell_type": "markdown", 1082 | "metadata": {}, 1083 | "source": [ 1084 | "读取csv文件作为数组" 1085 | ] 1086 | }, 1087 | { 1088 | "cell_type": "code", 1089 | "execution_count": 1, 1090 | "metadata": {}, 1091 | "outputs": [ 1092 | { 1093 | "name": "stdout", 1094 | "output_type": "stream", 1095 | "text": [ 1096 | "[[ 0.580052 0.18673 1.040717 1.134411]\n", 1097 | " [ 0.194163 -0.636917 -0.938659 0.124094]\n", 1098 | " [-0.12641 0.268607 -0.695724 0.047428]\n", 1099 | " [-1.484413 0.004176 -0.744203 0.005487]\n", 1100 | " [ 2.302869 0.200131 1.670238 -1.88109 ]\n", 1101 | " [-0.19323 1.047233 0.482803 0.960334]]\n" 1102 | ] 1103 | } 1104 | ], 1105 | "source": [ 1106 | "import numpy as np\n", 1107 | "arr = np.loadtxt('array_ex.txt', delimiter=',')\n", 1108 | "print(arr)" 1109 | ] 1110 | }, 1111 | { 1112 | "cell_type": "markdown", 1113 | "metadata": {}, 1114 | "source": [ 1115 | "数组文件读写" 1116 | ] 1117 | }, 1118 | { 1119 | "cell_type": "code", 1120 | "execution_count": 3, 1121 | "metadata": { 1122 | "collapsed": true 1123 | }, 1124 | "outputs": [], 1125 | "source": [ 1126 | "arr = np.arange(10)\n", 1127 | "np.save('some_array', arr)" 1128 | ] 1129 | }, 1130 | { 1131 | "cell_type": "code", 1132 | "execution_count": 4, 1133 | "metadata": {}, 1134 | "outputs": [ 1135 | { 1136 | "name": "stdout", 1137 | "output_type": "stream", 1138 | "text": [ 1139 | "[0 1 2 3 4 5 6 7 8 9]\n" 1140 | ] 1141 | } 1142 | ], 1143 | "source": [ 1144 | "print(np.load('some_array.npy'))" 1145 | ] 1146 | }, 1147 | { 1148 | "cell_type": "markdown", 1149 | "metadata": {}, 1150 | "source": [ 1151 | "多个数组可以一起压缩存储" 1152 | ] 1153 | }, 1154 | { 
1155 | "cell_type": "code", 1156 | "execution_count": 5, 1157 | "metadata": { 1158 | "collapsed": true 1159 | }, 1160 | "outputs": [], 1161 | "source": [ 1162 | "arr2 = np.arange(15).reshape(3,5)\n", 1163 | "np.savez('array_archive.npz', a=arr, b=arr2)" 1164 | ] 1165 | }, 1166 | { 1167 | "cell_type": "code", 1168 | "execution_count": 6, 1169 | "metadata": {}, 1170 | "outputs": [ 1171 | { 1172 | "name": "stdout", 1173 | "output_type": "stream", 1174 | "text": [ 1175 | "[0 1 2 3 4 5 6 7 8 9]\n", 1176 | "[[ 0 1 2 3 4]\n", 1177 | " [ 5 6 7 8 9]\n", 1178 | " [10 11 12 13 14]]\n" 1179 | ] 1180 | } 1181 | ], 1182 | "source": [ 1183 | "arch = np.load('array_archive.npz')\n", 1184 | "print(arch['a'])\n", 1185 | "print(arch['b'])" 1186 | ] 1187 | }, 1188 | { 1189 | "cell_type": "markdown", 1190 | "metadata": {}, 1191 | "source": [ 1192 | "## numpy和scipy的相关数学运算\n", 1193 | "### 七月在线python数据分析集训营 julyedu.com" 1194 | ] 1195 | }, 1196 | { 1197 | "cell_type": "code", 1198 | "execution_count": 7, 1199 | "metadata": { 1200 | "collapsed": true 1201 | }, 1202 | "outputs": [], 1203 | "source": [ 1204 | "import numpy as np" 1205 | ] 1206 | }, 1207 | { 1208 | "cell_type": "markdown", 1209 | "metadata": {}, 1210 | "source": [ 1211 | "那如果我要做矩阵的乘法运算怎么办!!!恩,别着急,照着下面写就可以了:\n", 1212 | "\n", 1213 | "[matrix multiplication](http://mathworld.wolfram.com/MatrixMultiplication.html)" 1214 | ] 1215 | }, 1216 | { 1217 | "cell_type": "code", 1218 | "execution_count": 8, 1219 | "metadata": {}, 1220 | "outputs": [ 1221 | { 1222 | "name": "stdout", 1223 | "output_type": "stream", 1224 | "text": [ 1225 | "[[ 1. 2.]\n", 1226 | " [ 3. 4.]]\n", 1227 | "[[ 5. 6.]\n", 1228 | " [ 7. 
8.]]\n" 1229 | ] 1230 | } 1231 | ], 1232 | "source": [ 1233 | "x = np.array([[1,2],[3,4]], dtype=np.float64)\n", 1234 | "y = np.array([[5,6],[7,8]], dtype=np.float64)\n", 1235 | "v = np.array([9,10])\n", 1236 | "w = np.array([11, 12])\n", 1237 | "print(x)\n", 1238 | "print(y)" 1239 | ] 1240 | }, 1241 | { 1242 | "cell_type": "markdown", 1243 | "metadata": {}, 1244 | "source": [ 1245 | "求向量内积" 1246 | ] 1247 | }, 1248 | { 1249 | "cell_type": "code", 1250 | "execution_count": 9, 1251 | "metadata": {}, 1252 | "outputs": [ 1253 | { 1254 | "name": "stdout", 1255 | "output_type": "stream", 1256 | "text": [ 1257 | "219\n", 1258 | "219\n" 1259 | ] 1260 | } 1261 | ], 1262 | "source": [ 1263 | "print(v.dot(w))\n", 1264 | "print(np.dot(v, w))" 1265 | ] 1266 | }, 1267 | { 1268 | "cell_type": "markdown", 1269 | "metadata": {}, 1270 | "source": [ 1271 | "矩阵的乘法" 1272 | ] 1273 | }, 1274 | { 1275 | "cell_type": "code", 1276 | "execution_count": 10, 1277 | "metadata": {}, 1278 | "outputs": [ 1279 | { 1280 | "name": "stdout", 1281 | "output_type": "stream", 1282 | "text": [ 1283 | "[ 29. 67.]\n", 1284 | "[ 29. 67.]\n" 1285 | ] 1286 | } 1287 | ], 1288 | "source": [ 1289 | "print(x.dot(v))\n", 1290 | "print(np.dot(x, v))" 1291 | ] 1292 | }, 1293 | { 1294 | "cell_type": "code", 1295 | "execution_count": 11, 1296 | "metadata": { 1297 | "scrolled": true 1298 | }, 1299 | "outputs": [ 1300 | { 1301 | "name": "stdout", 1302 | "output_type": "stream", 1303 | "text": [ 1304 | "[[ 19. 22.]\n", 1305 | " [ 43. 50.]]\n", 1306 | "[[ 19. 22.]\n", 1307 | " [ 43. 
50.]]\n" 1308 | ] 1309 | } 1310 | ], 1311 | "source": [ 1312 | "print(x.dot(y))\n", 1313 | "print(np.dot(x, y))" 1314 | ] 1315 | }, 1316 | { 1317 | "cell_type": "markdown", 1318 | "metadata": {}, 1319 | "source": [ 1320 | "向量的内积[inner](https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.inner.html#numpy.inner)" 1321 | ] 1322 | }, 1323 | { 1324 | "cell_type": "code", 1325 | "execution_count": 12, 1326 | "metadata": {}, 1327 | "outputs": [ 1328 | { 1329 | "data": { 1330 | "text/plain": [ 1331 | "array([[ 17., 23.],\n", 1332 | " [ 39., 53.]])" 1333 | ] 1334 | }, 1335 | "execution_count": 12, 1336 | "metadata": {}, 1337 | "output_type": "execute_result" 1338 | } 1339 | ], 1340 | "source": [ 1341 | "np.inner(x, y)" 1342 | ] 1343 | }, 1344 | { 1345 | "cell_type": "code", 1346 | "execution_count": 13, 1347 | "metadata": {}, 1348 | "outputs": [ 1349 | { 1350 | "data": { 1351 | "text/plain": [ 1352 | "array([[[ 14, 38, 62],\n", 1353 | " [ 38, 126, 214],\n", 1354 | " [ 62, 214, 366]],\n", 1355 | "\n", 1356 | " [[ 86, 302, 518],\n", 1357 | " [110, 390, 670],\n", 1358 | " [134, 478, 822]]])" 1359 | ] 1360 | }, 1361 | "execution_count": 13, 1362 | "metadata": {}, 1363 | "output_type": "execute_result" 1364 | } 1365 | ], 1366 | "source": [ 1367 | "X = np.arange(24).reshape(2,3,4)\n", 1368 | "Y = np.arange(12).reshape(3,4)\n", 1369 | "np.inner(X, Y)" 1370 | ] 1371 | }, 1372 | { 1373 | "cell_type": "code", 1374 | "execution_count": 14, 1375 | "metadata": {}, 1376 | "outputs": [ 1377 | { 1378 | "data": { 1379 | "text/plain": [ 1380 | "(2, 3, 4)" 1381 | ] 1382 | }, 1383 | "execution_count": 14, 1384 | "metadata": {}, 1385 | "output_type": "execute_result" 1386 | } 1387 | ], 1388 | "source": [ 1389 | "X = np.arange(24).reshape(2,3,4)\n", 1390 | "Y = np.arange(16).reshape(4,4)\n", 1391 | "np.inner(X, Y).shape" 1392 | ] 1393 | }, 1394 | { 1395 | "cell_type": "markdown", 1396 | "metadata": { 1397 | "collapsed": true 1398 | }, 1399 | "source": [ 1400 | "转置和数学公式一样,简单粗暴" 1401 
| ] 1402 | }, 1403 | { 1404 | "cell_type": "code", 1405 | "execution_count": 15, 1406 | "metadata": {}, 1407 | "outputs": [ 1408 | { 1409 | "name": "stdout", 1410 | "output_type": "stream", 1411 | "text": [ 1412 | "[[ 1. 2.]\n", 1413 | " [ 3. 4.]]\n", 1414 | "[[ 1. 3.]\n", 1415 | " [ 2. 4.]]\n" 1416 | ] 1417 | } 1418 | ], 1419 | "source": [ 1420 | "print(x)\n", 1421 | "print(x.T)" 1422 | ] 1423 | }, 1424 | { 1425 | "cell_type": "markdown", 1427 | "metadata": {}, 1438 | "source": [ 1439 | "Note that transposing a 1-D vector returns the vector itself." 1440 | ] 1441 | }, 1442 | { 1443 | "cell_type": "code", 1444 | "execution_count": 17, 1445 | "metadata": {}, 1446 | "outputs": [ 1447 | { 1448 | "name": "stdout", 1449 | "output_type": "stream", 1450 | "text": [ 1451 | "[1 2 3]\n", 1452 | "[1 2 3]\n" 1453 | ] 1454 | } 1455 | ], 1456 | "source": [ 1457 | "v = np.array([1,2,3])\n", 1458 | "print(v)\n", 1459 | "print(v.T)" 1460 | ] 1461 | }, 1462 | { 1463 | "cell_type": "markdown", 1464 | "metadata": {}, 1465 | "source": [ 1466 | "A 2-D array behaves differently:" 1467 | ] 1468 | }, 1469 | { 1470 | "cell_type": "code", 1471 | "execution_count": 18, 1472 | "metadata": {}, 1473 | "outputs": [ 1474 | { 1475 | "name": "stdout", 1476 | "output_type": "stream", 1477 | "text": [ 1478 | "[[1 2 3]]\n", 1479 | "[[1]\n", 1480 | " [2]\n", 1481 | " [3]]\n" 1482 | ] 1483 | } 1484 | ], 1485 | "source": [ 1486 | "w = np.array([[1,2,3]])\n", 1487 | "print(w)\n", 1488 | "print(w.T)" 1489 | ] 1490 | }, 1491 | { 1492 | "cell_type": "markdown", 1493 | "metadata": {}, 1494 | "source": [ 1495
| "Use the transpose to compute a dot product:" 1496 | ] 1497 | }, 1498 | { 1499 | "cell_type": "code", 1500 | "execution_count": 19, 1501 | "metadata": {}, 1502 | "outputs": [ 1503 | { 1504 | "name": "stdout", 1505 | "output_type": "stream", 1506 | "text": [ 1507 | "[[ 3.25570055 0.34061858 -0.66837506]\n", 1508 | " [ 0.34061858 4.34204493 -0.08812162]\n", 1509 | " [ -0.66837506 -0.08812162 12.28257546]]\n" 1510 | ] 1511 | } 1512 | ], 1513 | "source": [ 1514 | "arr = np.random.randn(6,3)\n", 1515 | "print(np.dot(arr.T, arr))" 1516 | ] 1517 | }, 1518 | { 1519 | "cell_type": "code", 1520 | "execution_count": 20, 1521 | "metadata": {}, 1522 | "outputs": [ 1523 | { 1524 | "ename": "ValueError", 1525 | "evalue": "shapes (6,3) and (6,3) not aligned: 3 (dim 1) != 6 (dim 0)", 1526 | "output_type": "error", 1527 | "traceback": [ 1528 | "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", 1529 | "\u001b[0;31mValueError\u001b[0m Traceback (most recent call last)", 1530 | "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mnp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdot\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0marr\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0marr\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", 1531 | "\u001b[0;31mValueError\u001b[0m: shapes (6,3) and (6,3) not aligned: 3 (dim 1) != 6 (dim 0)" 1532 | ] 1533 | } 1534 | ], 1535 | "source": [ 1536 | "print(np.dot(arr, arr))" 1537 | ] 1538 | }, 1539 | { 1540 | "cell_type": "markdown", 1541 | "metadata": {}, 1542 | "source": [ 1543 | "Higher-dimensional tensors can be transposed too:" 1544 | ] 1545 | }, 1546 | { 1547 | "cell_type": "code", 1548 | "execution_count": 21, 1549 | "metadata": {}, 1550 | "outputs": [ 1551 | { 1552 | "name": "stdout", 1553 | "output_type": "stream", 1554 | "text": [ 1555 | "[[[ 0 1 2 3]\n", 1556 | " [ 4 5 6 7]]\n", 1557 | "\n", 1558 | " [[ 8 9 10 11]\n", 1559 | " [12
13 14 15]]]\n" 1560 | ] 1561 | } 1562 | ], 1563 | "source": [ 1564 | "arr = np.arange(16).reshape((2, 2, 4))\n", 1565 | "print(arr)" 1566 | ] 1567 | }, 1568 | { 1569 | "cell_type": "code", 1570 | "execution_count": 22, 1571 | "metadata": {}, 1572 | "outputs": [ 1573 | { 1574 | "name": "stdout", 1575 | "output_type": "stream", 1576 | "text": [ 1577 | "[[[ 0 1 2 3]\n", 1578 | " [ 8 9 10 11]]\n", 1579 | "\n", 1580 | " [[ 4 5 6 7]\n", 1581 | " [12 13 14 15]]]\n" 1582 | ] 1583 | } 1584 | ], 1585 | "source": [ 1586 | "print(arr.transpose((1,0,2)))" 1587 | ] 1588 | }, 1589 | { 1590 | "cell_type": "code", 1591 | "execution_count": 23, 1592 | "metadata": {}, 1593 | "outputs": [ 1594 | { 1595 | "name": "stdout", 1596 | "output_type": "stream", 1597 | "text": [ 1598 | "[[[ 0 4]\n", 1599 | " [ 1 5]\n", 1600 | " [ 2 6]\n", 1601 | " [ 3 7]]\n", 1602 | "\n", 1603 | " [[ 8 12]\n", 1604 | " [ 9 13]\n", 1605 | " [10 14]\n", 1606 | " [11 15]]]\n" 1607 | ] 1608 | } 1609 | ], 1610 | "source": [ 1611 | "print(arr.swapaxes(1,2))" 1612 | ] 1613 | }, 1614 | { 1615 | "cell_type": "markdown", 1616 | "metadata": {}, 1617 | "source": [ 1618 | "### [matmul](https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.matmul.html#numpy.matmul)\n", 1619 | "\n", 1620 | "Very commonly used; it computes matrix multiplication." 1621 | ] 1622 | }, 1623 | { 1624 | "cell_type": "code", 1625 | "execution_count": 24, 1626 | "metadata": { 1627 | "collapsed": true 1628 | }, 1629 | "outputs": [], 1630 | "source": [ 1631 | "import numpy as np" 1632 | ] 1633 | }, 1634 | { 1635 | "cell_type": "code", 1636 | "execution_count": 25, 1637 | "metadata": {}, 1638 | "outputs": [ 1639 | { 1640 | "name": "stdout", 1641 | "output_type": "stream", 1642 | "text": [ 1643 | "[[[ 28 34]\n", 1644 | " [ 76 98]\n", 1645 | " [124 162]]\n", 1646 | "\n", 1647 | " [[172 226]\n", 1648 | " [220 290]\n", 1649 | " [268 354]]]\n" 1650 | ] 1651 | } 1652 | ], 1653 | "source": [ 1654 | "x = np.arange(24).reshape(2,3,4)\n", 1655 | "y = np.arange(8).reshape(4,2)\n", 1656 |
"print(np.matmul(x,y))" 1657 | ] 1658 | }, 1659 | { 1660 | "cell_type": "code", 1661 | "execution_count": 26, 1662 | "metadata": {}, 1663 | "outputs": [ 1664 | { 1665 | "name": "stdout", 1666 | "output_type": "stream", 1667 | "text": [ 1668 | "[[[ 28 34]\n", 1669 | " [ 76 98]\n", 1670 | " [124 162]]\n", 1671 | "\n", 1672 | " [[172 226]\n", 1673 | " [220 290]\n", 1674 | " [268 354]]]\n" 1675 | ] 1676 | } 1677 | ], 1678 | "source": [ 1679 | "print(np.dot(x, y))" 1680 | ] 1681 | }, 1682 | { 1683 | "cell_type": "code", 1684 | "execution_count": 27, 1685 | "metadata": {}, 1686 | "outputs": [ 1687 | { 1688 | "name": "stdout", 1689 | "output_type": "stream", 1690 | "text": [ 1691 | "[[ 28 34]\n", 1692 | " [ 76 98]\n", 1693 | " [124 162]]\n", 1694 | "[[172 226]\n", 1695 | " [220 290]\n", 1696 | " [268 354]]\n" 1697 | ] 1698 | } 1699 | ], 1700 | "source": [ 1701 | "x1 = np.arange(12).reshape(3,4)\n", 1702 | "print(np.matmul(x1, y))\n", 1703 | "x2 = np.arange(12,24).reshape(3,4)\n", 1704 | "print(np.matmul(x2, y))" 1705 | ] 1706 | }, 1707 | { 1708 | "cell_type": "code", 1709 | "execution_count": 28, 1710 | "metadata": {}, 1711 | "outputs": [ 1712 | { 1713 | "name": "stdout", 1714 | "output_type": "stream", 1715 | "text": [ 1716 | "(2, 3, 2, 2)\n" 1717 | ] 1718 | } 1719 | ], 1720 | "source": [ 1721 | "y = np.arange(16).reshape(2,4,2)\n", 1722 | "print(x.dot(y).shape)" 1723 | ] 1724 | }, 1725 | { 1726 | "cell_type": "code", 1727 | "execution_count": 29, 1728 | "metadata": {}, 1729 | "outputs": [ 1730 | { 1731 | "name": "stdout", 1732 | "output_type": "stream", 1733 | "text": [ 1734 | "(2, 3, 2)\n" 1735 | ] 1736 | } 1737 | ], 1738 | "source": [ 1739 | "print(np.matmul(x,y).shape)" 1740 | ] 1741 | }, 1742 | { 1743 | "cell_type": "code", 1744 | "execution_count": 30, 1745 | "metadata": {}, 1746 | "outputs": [ 1747 | { 1748 | "name": "stdout", 1749 | "output_type": "stream", 1750 | "text": [ 1751 | "[[[ 28 34]\n", 1752 | " [ 76 98]\n", 1753 | " [ 124 162]]\n", 1754 | "\n", 1755 |
" [[ 604 658]\n", 1756 | " [ 780 850]\n", 1757 | " [ 956 1042]]]\n" 1758 | ] 1759 | } 1760 | ], 1761 | "source": [ 1762 | "x = np.arange(24).reshape(2,3,4)\n", 1763 | "y = np.arange(16).reshape(2,4,2)\n", 1764 | "print(np.matmul(x,y))" 1765 | ] 1766 | }, 1767 | { 1768 | "cell_type": "code", 1769 | "execution_count": 31, 1770 | "metadata": {}, 1771 | "outputs": [ 1772 | { 1773 | "name": "stdout", 1774 | "output_type": "stream", 1775 | "text": [ 1776 | "[[[ 28 34]\n", 1777 | " [ 76 98]\n", 1778 | " [124 162]]\n", 1779 | "\n", 1780 | " [[172 226]\n", 1781 | " [220 290]\n", 1782 | " [268 354]]]\n" 1783 | ] 1784 | } 1785 | ], 1786 | "source": [ 1787 | "x = np.arange(24).reshape(2,3,4) \n", 1788 | "y = np.arange(8).reshape(1,4,2)\n", 1789 | "print(np.matmul(x,y))" 1790 | ] 1791 | }, 1792 | { 1793 | "cell_type": "code", 1794 | "execution_count": 32, 1795 | "metadata": {}, 1796 | "outputs": [ 1797 | { 1798 | "name": "stdout", 1799 | "output_type": "stream", 1800 | "text": [ 1801 | "x [[ 0 1 2 3]\n", 1802 | " [ 4 5 6 7]\n", 1803 | " [ 8 9 10 11]] [[12 13 14 15]\n", 1804 | " [16 17 18 19]\n", 1805 | " [20 21 22 23]]\n", 1806 | "[[[0 1]\n", 1807 | " [2 3]\n", 1808 | " [4 5]\n", 1809 | " [6 7]]]\n" 1810 | ] 1811 | } 1812 | ], 1813 | "source": [ 1814 | "print(\"x\", x[0], x[1])\n", 1815 | "print(y)" 1816 | ] 1817 | }, 1818 | { 1819 | "cell_type": "markdown", 1820 | "metadata": {}, 1821 | "source": [ 1822 | "### [outer product](https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.outer.html)\n", 1823 | "\n", 1824 | "As in the mathematical definition, the outer product of two vectors yields a matrix." 1825 | ] 1826 | }, 1827 | { 1828 | "cell_type": "code", 1829 | "execution_count": 33, 1830 | "metadata": {}, 1831 | "outputs": [ 1832 | { 1833 | "data": { 1834 | "text/plain": [ 1835 | "array([[-10., -15., -20.],\n", 1836 | " [ 0., 0., 0.],\n", 1837 | " [ 10., 15., 20.]])" 1838 | ] 1839 | }, 1840 | "execution_count": 33, 1841 | "metadata": {}, 1842 | "output_type": "execute_result" 1843 | } 1844 | ], 1845 |
"source": [ 1846 | "a = np.linspace(-5,5,3)\n", 1847 | "b = np.arange(2,5)\n", 1848 | "np.outer(a, b)" 1849 | ] 1850 | }, 1851 | { 1852 | "cell_type": "code", 1853 | "execution_count": null, 1854 | "metadata": { 1855 | "collapsed": true 1856 | }, 1857 | "outputs": [], 1858 | "source": [] 1859 | }, 1860 | { 1861 | "cell_type": "markdown", 1862 | "metadata": {}, 1863 | "source": [ 1864 | "### Some more advanced linear algebra operations" 1865 | ] 1866 | }, 1867 | { 1868 | "cell_type": "markdown", 1869 | "metadata": {}, 1870 | "source": [ 1871 | "Compute the determinant" 1872 | ] 1873 | }, 1874 | { 1875 | "cell_type": "code", 1876 | "execution_count": 34, 1877 | "metadata": {}, 1878 | "outputs": [ 1879 | { 1880 | "data": { 1881 | "text/plain": [ 1882 | "-9.0000000000000018" 1883 | ] 1884 | }, 1885 | "execution_count": 34, 1886 | "metadata": {}, 1887 | "output_type": "execute_result" 1888 | } 1889 | ], 1890 | "source": [ 1891 | "x = np.array([[1, 5], [2, 1]])\n", 1892 | "np.linalg.det(x)" 1893 | ] 1894 | }, 1895 | { 1896 | "cell_type": "markdown", 1897 | "metadata": {}, 1898 | "source": [ 1899 | "Compute the inverse" 1900 | ] 1901 | }, 1902 | { 1903 | "cell_type": "code", 1904 | "execution_count": 35, 1905 | "metadata": {}, 1906 | "outputs": [ 1907 | { 1908 | "name": "stdout", 1909 | "output_type": "stream", 1910 | "text": [ 1911 | "x_inv [[-0.11111111 0.55555556]\n", 1912 | " [ 0.22222222 -0.11111111]]\n" 1913 | ] 1914 | }, 1915 | { 1916 | "data": { 1917 | "text/plain": [ 1918 | "array([[ 1.00000000e+00, 5.55111512e-17],\n", 1919 | " [ 0.00000000e+00, 1.00000000e+00]])" 1920 | ] 1921 | }, 1922 | "execution_count": 35, 1923 | "metadata": {}, 1924 | "output_type": "execute_result" 1925 | } 1926 | ], 1927 | "source": [ 1928 | "x_inv = np.linalg.inv(x)\n", 1929 | "print(\"x_inv\", x_inv)\n", 1930 | "np.dot(x, x_inv)" 1931 | ] 1932 | }, 1933 | { 1934 | "cell_type": "markdown", 1935 | "metadata": {}, 1936 | "source": [ 1937 | "Compute the pseudo-inverse" 1938 | ] 1939 | }, 1940 | { 1941 | "cell_type": "code", 1942 | "execution_count":
36, 1943 | "metadata": {}, 1944 | "outputs": [ 1945 | { 1946 | "data": { 1947 | "text/plain": [ 1948 | "0.0" 1949 | ] 1950 | }, 1951 | "execution_count": 36, 1952 | "metadata": {}, 1953 | "output_type": "execute_result" 1954 | } 1955 | ], 1956 | "source": [ 1957 | "x = np.array([[1,2,3], [2,4,6], [1,3,5]])\n", 1958 | "np.linalg.det(x)" 1959 | ] 1960 | }, 1961 | { 1962 | "cell_type": "code", 1963 | "execution_count": 37, 1964 | "metadata": {}, 1965 | "outputs": [ 1966 | { 1967 | "ename": "LinAlgError", 1968 | "evalue": "Singular matrix", 1969 | "output_type": "error", 1970 | "traceback": [ 1971 | "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", 1972 | "\u001b[0;31mLinAlgError\u001b[0m Traceback (most recent call last)", 1973 | "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mx_inv\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mnp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mlinalg\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0minv\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mx\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", 1974 | "\u001b[0;32m~/anaconda3/envs/py36/lib/python3.6/site-packages/numpy/linalg/linalg.py\u001b[0m in \u001b[0;36minv\u001b[0;34m(a)\u001b[0m\n\u001b[1;32m 511\u001b[0m \u001b[0msignature\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m'D->D'\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0misComplexType\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mt\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;32melse\u001b[0m \u001b[0;34m'd->d'\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 512\u001b[0m \u001b[0mextobj\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mget_linalg_error_extobj\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0m_raise_linalgerror_singular\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 513\u001b[0;31m \u001b[0mainv\u001b[0m \u001b[0;34m=\u001b[0m 
\u001b[0m_umath_linalg\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0minv\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0ma\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0msignature\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0msignature\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mextobj\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mextobj\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 514\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mwrap\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mainv\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mastype\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mresult_t\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mcopy\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mFalse\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 515\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", 1975 | "\u001b[0;32m~/anaconda3/envs/py36/lib/python3.6/site-packages/numpy/linalg/linalg.py\u001b[0m in \u001b[0;36m_raise_linalgerror_singular\u001b[0;34m(err, flag)\u001b[0m\n\u001b[1;32m 88\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 89\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0m_raise_linalgerror_singular\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0merr\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mflag\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 90\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mLinAlgError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"Singular matrix\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 91\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 92\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0m_raise_linalgerror_nonposdef\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0merr\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mflag\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", 1976 | "\u001b[0;31mLinAlgError\u001b[0m: Singular matrix" 1977 | ] 1978 | } 1979 | ], 1980 | "source": [ 1981 | "x_inv = np.linalg.inv(x)" 1982 | ] 1983 | }, 
1984 | { 1985 | "cell_type": "code", 1986 | "execution_count": 38, 1987 | "metadata": { 1988 | "scrolled": true 1989 | }, 1990 | "outputs": [ 1991 | { 1992 | "name": "stdout", 1993 | "output_type": "stream", 1994 | "text": [ 1995 | "x_pinv [[ 0.43333333 0.86666667 -1.33333333]\n", 1996 | " [ 0.13333333 0.26666667 -0.33333333]\n", 1997 | " [-0.16666667 -0.33333333 0.66666667]]\n" 1998 | ] 1999 | } 2000 | ], 2001 | "source": [ 2002 | "x_pinv = np.linalg.pinv(x)\n", 2003 | "print(\"x_pinv\", x_pinv)" 2004 | ] 2005 | }, 2006 | { 2007 | "cell_type": "code", 2008 | "execution_count": 39, 2009 | "metadata": {}, 2010 | "outputs": [ 2011 | { 2012 | "data": { 2013 | "text/plain": [ 2014 | "array([[ 2.00000000e-01, 4.00000000e-01, 0.00000000e+00],\n", 2015 | " [ 4.00000000e-01, 8.00000000e-01, 0.00000000e+00],\n", 2016 | " [ 1.11022302e-16, 0.00000000e+00, 1.00000000e+00]])" 2017 | ] 2018 | }, 2019 | "execution_count": 39, 2020 | "metadata": {}, 2021 | "output_type": "execute_result" 2022 | } 2023 | ], 2024 | "source": [ 2025 | "np.dot(x, x_pinv)" 2026 | ] 2027 | }, 2028 | { 2029 | "cell_type": "markdown", 2030 | "metadata": {}, 2031 | "source": [ 2032 | "Compute the matrix [norm](https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.linalg.norm.html#numpy.linalg.norm)" 2033 | ] 2034 | }, 2035 | { 2036 | "cell_type": "code", 2037 | "execution_count": 40, 2038 | "metadata": {}, 2039 | "outputs": [ 2040 | { 2041 | "data": { 2042 | "text/plain": [ 2043 | "31.859064644147981" 2044 | ] 2045 | }, 2046 | "execution_count": 40, 2047 | "metadata": {}, 2048 | "output_type": "execute_result" 2049 | } 2050 | ], 2051 | "source": [ 2052 | "x = np.arange(15).reshape(3,5)\n", 2053 | "np.linalg.norm(x, \"fro\")" 2054 | ] 2055 | }, 2056 | { 2057 | "cell_type": "code", 2058 | "execution_count": 41, 2059 | "metadata": {}, 2060 | "outputs": [ 2061 | { 2062 | "data": { 2063 | "text/plain": [ 2064 | "31.859064644147981" 2065 | ] 2066 | }, 2067 | "execution_count": 41, 2068 | "metadata": {}, 2069
| "output_type": "execute_result" 2070 | } 2071 | ], 2072 | "source": [ 2073 | "np.sqrt(np.sum(x**2))" 2074 | ] 2075 | }, 2076 | { 2077 | "cell_type": "code", 2078 | "execution_count": 42, 2079 | "metadata": {}, 2080 | "outputs": [ 2081 | { 2082 | "data": { 2083 | "text/plain": [ 2084 | "60.0" 2085 | ] 2086 | }, 2087 | "execution_count": 42, 2088 | "metadata": {}, 2089 | "output_type": "execute_result" 2090 | } 2091 | ], 2092 | "source": [ 2093 | "np.linalg.norm(x, np.inf)" 2094 | ] 2095 | }, 2096 | { 2097 | "cell_type": "markdown", 2098 | "metadata": {}, 2099 | "source": [ 2100 | "Compute the singular value decomposition (SVD)" 2101 | ] 2102 | }, 2103 | { 2104 | "cell_type": "code", 2105 | "execution_count": 43, 2106 | "metadata": { 2107 | "collapsed": true 2108 | }, 2109 | "outputs": [], 2110 | "source": [ 2111 | "U, s, V = np.linalg.svd(x)" 2112 | ] 2113 | }, 2114 | { 2115 | "cell_type": "code", 2116 | "execution_count": 44, 2117 | "metadata": {}, 2118 | "outputs": [ 2119 | { 2120 | "data": { 2121 | "text/plain": [ 2122 | "array([[ 1.00000000e+00, 0.00000000e+00, -2.77555756e-17],\n", 2123 | " [ 0.00000000e+00, 1.00000000e+00, -5.55111512e-17],\n", 2124 | " [ -2.77555756e-17, -5.55111512e-17, 1.00000000e+00]])" 2125 | ] 2126 | }, 2127 | "execution_count": 44, 2128 | "metadata": {}, 2129 | "output_type": "execute_result" 2130 | } 2131 | ], 2132 | "source": [ 2133 | "np.dot(U, U.T)" 2134 | ] 2135 | }, 2136 | { 2137 | "cell_type": "code", 2138 | "execution_count": 45, 2139 | "metadata": {}, 2140 | "outputs": [ 2141 | { 2142 | "data": { 2143 | "text/plain": [ 2144 | "array([[ 1.00000000e+00, -1.07948583e-16, 5.91865369e-17,\n", 2145 | " -4.17545215e-17, -4.14054997e-17],\n", 2146 | " [ -1.07948583e-16, 1.00000000e+00, -1.25162789e-16,\n", 2147 | " -1.68536677e-17, 5.08778614e-18],\n", 2148 | " [ 5.91865369e-17, -1.25162789e-16, 1.00000000e+00,\n", 2149 | " 4.99764062e-17, -8.35727138e-17],\n", 2150 | " [ -4.17545215e-17, -1.68536677e-17, 4.99764062e-17,\n", 2151 | "
1.00000000e+00, -8.67263621e-17],\n", 2152 | " [ -4.14054997e-17, 5.08778614e-18, -8.35727138e-17,\n", 2153 | " -8.67263621e-17, 1.00000000e+00]])" 2154 | ] 2155 | }, 2156 | "execution_count": 45, 2157 | "metadata": {}, 2158 | "output_type": "execute_result" 2159 | } 2160 | ], 2161 | "source": [ 2162 | "np.dot(V, V.T)" 2163 | ] 2164 | }, 2165 | { 2166 | "cell_type": "code", 2167 | "execution_count": 46, 2168 | "metadata": {}, 2169 | "outputs": [ 2170 | { 2171 | "data": { 2172 | "text/plain": [ 2173 | "array([ 3.17420265e+01, 2.72832424e+00, 8.33338143e-16])" 2174 | ] 2175 | }, 2176 | "execution_count": 46, 2177 | "metadata": {}, 2178 | "output_type": "execute_result" 2179 | } 2180 | ], 2181 | "source": [ 2182 | "s" 2183 | ] 2184 | }, 2185 | { 2186 | "cell_type": "markdown", 2187 | "metadata": {}, 2188 | "source": [ 2189 | "\n", 2190 | "## In-class mini project\n", 2191 | "\n", 2192 | "### 七月在线 Python data analysis bootcamp julyedu.com\n", 2193 | "\n", 2194 | "Write a softmax with NumPy\n", 2195 | "\n", 2196 | "[What is softmax?](http://cs231n.github.io/linear-classify/#softmax)" 2197 | ] 2198 | }, 2199 | { 2200 | "cell_type": "markdown", 2201 | "metadata": {}, 2202 | "source": [ 2203 | "1-D softmax" 2204 | ] 2205 | }, 2206 | { 2207 | "cell_type": "code", 2208 | "execution_count": 99, 2209 | "metadata": {}, 2210 | "outputs": [ 2211 | { 2212 | "data": { 2213 | "text/plain": [ 2214 | "array([ 0.60621965, 0.30030324, 0.89137532, 0.71493725, 0.13655471,\n", 2215 | " 0.08581598, 0.54112516, 0.4707926 , 0.35316744, 0.35783616])" 2216 | ] 2217 | }, 2218 | "execution_count": 99, 2219 | "metadata": {}, 2220 | "output_type": "execute_result" 2221 | } 2222 | ], 2223 | "source": [ 2224 | "import numpy as np\n", 2225 | "x = np.random.random(10)\n", 2226 | "x" 2227 | ] 2228 | }, 2229 | { 2230 | "cell_type": "code", 2231 | "execution_count": 101, 2232 | "metadata": {}, 2233 | "outputs": [ 2234 | { 2235 | "data": { 2236 | "text/plain": [ 2237 | "array([ 1.83348706, 1.3502682 , 2.43848103, 2.04405842, 1.14631759,\n", 2238 | "
1.0896058 , 1.71793872, 1.60126285, 1.42356949, 1.43023128])" 2239 | ] 2240 | }, 2241 | "execution_count": 101, 2242 | "metadata": {}, 2243 | "output_type": "execute_result" 2244 | } 2245 | ], 2246 | "source": [ 2247 | "np.exp(x)" 2248 | ] 2249 | }, 2250 | { 2251 | "cell_type": "code", 2252 | "execution_count": 102, 2253 | "metadata": {}, 2254 | "outputs": [ 2255 | { 2256 | "data": { 2257 | "text/plain": [ 2258 | "16.075220445857994" 2259 | ] 2260 | }, 2261 | "execution_count": 102, 2262 | "metadata": {}, 2263 | "output_type": "execute_result" 2264 | } 2265 | ], 2266 | "source": [ 2267 | "np.sum(np.exp(x))" 2268 | ] 2269 | }, 2270 | { 2271 | "cell_type": "code", 2272 | "execution_count": 100, 2273 | "metadata": {}, 2274 | "outputs": [ 2275 | { 2276 | "data": { 2277 | "text/plain": [ 2278 | "array([ 0.11405673, 0.08399687, 0.15169192, 0.12715586, 0.0713096 ,\n", 2279 | " 0.0677817 , 0.10686875, 0.09961063, 0.08855676, 0.08897118])" 2280 | ] 2281 | }, 2282 | "execution_count": 100, 2283 | "metadata": {}, 2284 | "output_type": "execute_result" 2285 | } 2286 | ], 2287 | "source": [ 2288 | "np.exp(x) / np.sum(np.exp(x))" 2289 | ] 2290 | }, 2291 | { 2292 | "cell_type": "code", 2293 | "execution_count": 48, 2294 | "metadata": {}, 2295 | "outputs": [ 2296 | { 2297 | "name": "stdout", 2298 | "output_type": "stream", 2299 | "text": [ 2300 | "[[ 1009.03960456 1000.28966207 1007.0243779 1005.12220239\n", 2301 | " 1002.88437093 1008.84302621 1009.51564452 1004.52647942\n", 2302 | " 1007.62835009 1008.12790242]\n", 2303 | " [ 1003.55735494 1001.23541286 1007.98665582 1009.49467382\n", 2304 | " 1002.31208185 1007.62423241 1007.39623205 1004.85250709\n", 2305 | " 1008.49656807 1003.80373337]\n", 2306 | " [ 1009.55551008 1001.83598146 1000.82767674 1009.83673379\n", 2307 | " 1000.46585151 1002.29082922 1008.02347323 1001.54300225 1002.5740486\n", 2308 | " 1003.26800962]\n", 2309 | " [ 1003.98037258 1008.25950365 1000.73334725 1006.18337055\n", 2310 | " 1005.91710081 1003.29850781 
1009.37108919 1000.71425167\n", 2311 | " 1006.56877464 1004.29557635]\n", 2312 | " [ 1009.52417036 1005.76606876 1001.65168779 1000.34081781\n", 2313 | " 1003.53449811 1002.72862727 1000.80267248 1009.70808009\n", 2314 | " 1007.96610372 1000.50550359]\n", 2315 | " [ 1005.48887008 1002.22319984 1000.76703623 1005.11631226\n", 2316 | " 1006.19447414 1006.16004298 1001.07526485 1005.16117179\n", 2317 | " 1001.39018188 1002.61539398]\n", 2318 | " [ 1004.08661371 1003.84655825 1003.65662011 1000.81745635\n", 2319 | " 1006.05343756 1005.86074863 1009.81171013 1003.1970601 1003.3602387\n", 2320 | " 1007.25948129]\n", 2321 | " [ 1001.52682237 1009.01222274 1005.9308933 1009.42206593\n", 2322 | " 1001.90505273 1001.93671271 1005.26838395 1004.79170226\n", 2323 | " 1003.69677991 1007.48275556]\n", 2324 | " [ 1002.05268084 1007.16277577 1009.38249775 1008.39492843\n", 2325 | " 1003.98635282 1007.43979093 1001.40709911 1002.6240636 1003.62269888\n", 2326 | " 1008.41843796]\n", 2327 | " [ 1007.43767778 1006.55560766 1005.18042169 1005.12971307\n", 2328 | " 1005.62346619 1004.48468658 1005.2506437 1007.44010259\n", 2329 | " 1002.50114765 1003.87657108]]\n" 2330 | ] 2331 | } 2332 | ], 2333 | "source": [ 2334 | "import numpy as np\n", 2335 | "m = np.random.rand(10, 10) * 10 + 1000\n", 2336 | "print(m)" 2337 | ] 2338 | }, 2339 | { 2340 | "cell_type": "code", 2341 | "execution_count": 49, 2342 | "metadata": {}, 2343 | "outputs": [ 2344 | { 2345 | "name": "stdout", 2346 | "output_type": "stream", 2347 | "text": [ 2348 | "[[ inf inf inf inf inf inf inf inf inf inf]\n", 2349 | " [ inf inf inf inf inf inf inf inf inf inf]\n", 2350 | " [ inf inf inf inf inf inf inf inf inf inf]\n", 2351 | " [ inf inf inf inf inf inf inf inf inf inf]\n", 2352 | " [ inf inf inf inf inf inf inf inf inf inf]\n", 2353 | " [ inf inf inf inf inf inf inf inf inf inf]\n", 2354 | " [ inf inf inf inf inf inf inf inf inf inf]\n", 2355 | " [ inf inf inf inf inf inf inf inf inf inf]\n", 2356 | " [ inf inf inf inf inf 
inf inf inf inf inf]\n", 2357 | " [ inf inf inf inf inf inf inf inf inf inf]]\n" 2358 | ] 2359 | }, 2360 | { 2361 | "name": "stderr", 2362 | "output_type": "stream", 2363 | "text": [ 2364 | "/Users/zeweichu/anaconda3/envs/py36/lib/python3.6/site-packages/ipykernel_launcher.py:1: RuntimeWarning: overflow encountered in exp\n", 2365 | " \"\"\"Entry point for launching an IPython kernel.\n" 2366 | ] 2367 | } 2368 | ], 2369 | "source": [ 2370 | "print(np.exp(m))" 2371 | ] 2372 | }, 2373 | { 2374 | "cell_type": "code", 2375 | "execution_count": 50, 2376 | "metadata": {}, 2377 | "outputs": [ 2378 | { 2379 | "name": "stdout", 2380 | "output_type": "stream", 2381 | "text": [ 2382 | "[ 1009.51564452 1009.49467382 1009.83673379 1009.37108919 1009.70808009\n", 2383 | " 1006.19447414 1009.81171013 1009.42206593 1009.38249775 1007.44010259] (10,)\n" 2384 | ] 2385 | } 2386 | ], 2387 | "source": [ 2388 | "m_row_max = m.max(axis=1)\n", 2389 | "print(m_row_max, m_row_max.shape)" 2390 | ] 2391 | }, 2392 | { 2393 | "cell_type": "code", 2394 | "execution_count": 51, 2395 | "metadata": {}, 2396 | "outputs": [ 2397 | { 2398 | "name": "stdout", 2399 | "output_type": "stream", 2400 | "text": [ 2401 | "[[ -4.76039960e-01 -9.20501175e+00 -2.81235589e+00 -4.24888680e+00\n", 2402 | " -6.82370916e+00 2.64855206e+00 -2.96065608e-01 -4.89558651e+00\n", 2403 | " -1.75414766e+00 6.87799827e-01]\n", 2404 | " [ -5.95828958e+00 -8.25926095e+00 -1.85007797e+00 1.23584629e-01\n", 2405 | " -7.39599824e+00 1.42975827e+00 -2.41547808e+00 -4.56955883e+00\n", 2406 | " -8.85929673e-01 -3.63636923e+00]\n", 2407 | " [ 3.98655546e-02 -7.65869236e+00 -9.00905705e+00 4.65644603e-01\n", 2408 | " -9.24222857e+00 -3.90364492e+00 -1.78823690e+00 -7.87906367e+00\n", 2409 | " -6.80844914e+00 -4.17209297e+00]\n", 2410 | " [ -5.53527194e+00 -1.23517017e+00 -9.10338654e+00 -3.18771864e+00\n", 2411 | " -3.79097927e+00 -2.89596633e+00 -4.40620942e-01 -8.70781425e+00\n", 2412 | " -2.81372310e+00 -3.14452624e+00]\n", 2413 | " 
[ 8.52583931e-03 -3.72860506e+00 -8.18504600e+00 -9.03027138e+00\n", 2414 | " -6.17358197e+00 -3.46584687e+00 -9.00903765e+00 2.86014159e-01\n", 2415 | " -1.41639402e+00 -6.93459900e+00]\n", 2416 | " [ -4.02677444e+00 -7.27147398e+00 -9.06969756e+00 -4.25477693e+00\n", 2417 | " -3.51360594e+00 -3.44311628e-02 -8.73644528e+00 -4.26089414e+00\n", 2418 | " -7.99231586e+00 -4.82470861e+00]\n", 2419 | " [ -5.42903082e+00 -5.64811557e+00 -6.18011368e+00 -8.55363284e+00\n", 2420 | " -3.65464253e+00 -3.33725512e-01 0.00000000e+00 -6.22500582e+00\n", 2421 | " -6.02225905e+00 -1.80621301e-01]\n", 2422 | " [ -7.98882216e+00 -4.82451080e-01 -3.90584049e+00 5.09767380e-02\n", 2423 | " -7.80302735e+00 -4.25776143e+00 -4.54332618e+00 -4.63036366e+00\n", 2424 | " -5.68571783e+00 4.26529647e-02]\n", 2425 | " [ -7.46296368e+00 -2.33189805e+00 -4.54236045e-01 -9.76160754e-01\n", 2426 | " -5.72172726e+00 1.24531679e+00 -8.40461102e+00 -6.79800233e+00\n", 2427 | " -5.75979887e+00 9.78335370e-01]\n", 2428 | " [ -2.07796674e+00 -2.93906616e+00 -4.65631210e+00 -4.24137611e+00\n", 2429 | " -4.08461389e+00 -1.70978756e+00 -4.56106643e+00 -1.98196333e+00\n", 2430 | " -6.88135010e+00 -3.56353152e+00]]\n" 2431 | ] 2432 | } 2433 | ], 2434 | "source": [ 2435 | "m = m - m_row_max\n", 2436 | "print(m)" 2437 | ] 2438 | }, 2439 | { 2440 | "cell_type": "code", 2441 | "execution_count": 52, 2442 | "metadata": {}, 2443 | "outputs": [ 2444 | { 2445 | "name": "stdout", 2446 | "output_type": "stream", 2447 | "text": [ 2448 | "[[ 6.21238657e-01 1.00534285e-04 6.00633229e-02 1.42801218e-02\n", 2449 | " 1.08767906e-03 1.41335593e+01 7.43738631e-01 7.47952116e-03\n", 2450 | " 1.73054682e-01 1.98933384e+00]\n", 2451 | " [ 2.58432847e-03 2.58850223e-04 1.57224907e-01 1.13154576e+00\n", 2452 | " 6.13703750e-04 4.17768918e+00 8.93246242e-02 1.03625303e-02\n", 2453 | " 4.12330662e-01 2.63478335e-02]\n", 2454 | " [ 1.04067085e+00 4.71924116e-04 1.22297121e-04 1.59304074e+00\n", 2455 | " 9.68614866e-05 
2.01682655e-02 1.67254797e-01 3.78587373e-04\n", 2456 | " 1.10440435e-03 1.54199529e-02]\n", 2457 | " [ 3.94513565e-03 2.90785275e-01 1.11288288e-04 4.12659061e-02\n", 2458 | " 2.25734854e-02 5.52456136e-02 6.43636636e-01 1.65289140e-04\n", 2459 | " 5.99812597e-02 4.30873321e-02]\n", 2460 | " [ 1.00856229e+00 2.40263278e-02 2.78791604e-04 1.19729997e-04\n", 2461 | " 2.08375865e-03 3.12465324e-02 1.22299494e-04 1.33111130e+00\n", 2462 | " 2.42587206e-01 9.73513369e-04]\n", 2463 | " [ 1.78317546e-02 6.95086697e-04 1.15101344e-04 1.41962572e-02\n", 2464 | " 2.97893021e-02 9.66154845e-01 1.60623848e-04 1.41096808e-02\n", 2465 | " 3.38050298e-04 8.02889305e-03]\n", 2466 | " [ 4.38734589e-03 3.52415153e-03 2.07019250e-03 1.92843258e-04\n", 2467 | " 2.58707439e-02 7.16250357e-01 1.00000000e+00 1.97931229e-03\n", 2468 | " 2.42418705e-03 8.34751418e-01]\n", 2469 | " [ 3.39233412e-04 6.17268561e-01 2.01240333e-02 1.05229841e+00\n", 2470 | " 4.08496442e-04 1.41539515e-02 1.06379639e-02 9.75121230e-03\n", 2471 | " 3.39409598e-03 1.04357567e+00]\n", 2472 | " [ 5.73952636e-04 9.71112500e-02 6.34932843e-01 3.76754780e-01\n", 2473 | " 3.27405087e-03 3.47403515e+00 2.23832843e-04 1.11600233e-03\n", 2474 | " 3.15174546e-03 2.66002460e+00]\n", 2475 | " [ 1.25184486e-01 5.29151200e-02 9.50143822e-03 1.43877790e-02\n", 2476 | " 1.68296361e-02 1.80904219e-01 1.04509078e-02 1.37798427e-01\n", 2477 | " 1.02675689e-03 2.83385696e-02]] (10, 10)\n" 2478 | ] 2479 | } 2480 | ], 2481 | "source": [ 2482 | "m_exp = np.exp(m)\n", 2483 | "print(m_exp, m_exp.shape)" 2484 | ] 2485 | }, 2486 | { 2487 | "cell_type": "code", 2488 | "execution_count": 53, 2489 | "metadata": {}, 2490 | "outputs": [ 2491 | { 2492 | "name": "stdout", 2493 | "output_type": "stream", 2494 | "text": [ 2495 | "[[ 17.74393632]\n", 2496 | " [ 6.00828239]\n", 2497 | " [ 2.83872868]\n", 2498 | " [ 1.16079722]\n", 2499 | " [ 2.64111175]\n", 2500 | " [ 1.05141959]\n", 2501 | " [ 2.59145055]\n", 2502 | " [ 2.77195164]\n", 2503 | " [ 
7.2511982 ]\n", 2504 | " [ 0.57733734]] (10, 1)\n" 2505 | ] 2506 | } 2507 | ], 2508 | "source": [ 2509 | "m_exp_row_sum = m_exp.sum(axis = 1).reshape(10,1)\n", 2510 | "print(m_exp_row_sum, m_exp_row_sum.shape)" 2511 | ] 2512 | }, 2513 | { 2514 | "cell_type": "code", 2515 | "execution_count": 54, 2516 | "metadata": {}, 2517 | "outputs": [ 2518 | { 2519 | "name": "stdout", 2520 | "output_type": "stream", 2521 | "text": [ 2522 | "[[ 3.50113214e-02 5.66583891e-06 3.38500555e-03 8.04788830e-04\n", 2523 | " 6.12986339e-05 7.96528971e-01 4.19150868e-02 4.21525473e-04\n", 2524 | " 9.75289126e-03 1.12113445e-01]\n", 2525 | " [ 4.30127665e-04 4.30822332e-05 2.61680288e-02 1.88330989e-01\n", 2526 | " 1.02142960e-04 6.95321710e-01 1.48669151e-02 1.72470760e-03\n", 2527 | " 6.86270444e-02 4.38525219e-03]\n", 2528 | " [ 3.66597505e-01 1.66244883e-04 4.30816521e-05 5.61181049e-01\n", 2529 | " 3.41214317e-05 7.10468234e-03 5.89189092e-02 1.33365114e-04\n", 2530 | " 3.89048928e-04 5.43199249e-03]\n", 2531 | " [ 3.39864326e-03 2.50504800e-01 9.58722898e-05 3.55496252e-02\n", 2532 | " 1.94465364e-02 4.75928203e-02 5.54478099e-01 1.42392777e-04\n", 2533 | " 5.16724701e-02 3.71187416e-02]\n", 2534 | " [ 3.81870357e-01 9.09705083e-03 1.05558428e-04 4.53331808e-05\n", 2535 | " 7.88970271e-04 1.18308256e-02 4.63060656e-05 5.03996585e-01\n", 2536 | " 9.18504134e-02 3.68599840e-04]\n", 2537 | " [ 1.69596940e-02 6.61093535e-04 1.09472322e-04 1.35019903e-02\n", 2538 | " 2.83324585e-02 9.18905116e-01 1.52768551e-04 1.34196479e-02\n", 2539 | " 3.21517974e-04 7.63624066e-03]\n", 2540 | " [ 1.69300776e-03 1.35991464e-03 7.98854717e-04 7.44151794e-05\n", 2541 | " 9.98311308e-03 2.76389745e-01 3.85884268e-01 7.63785475e-04\n", 2542 | " 9.35455647e-04 3.22117440e-01]\n", 2543 | " [ 1.22380711e-04 2.22683741e-01 7.25987895e-03 3.79623656e-01\n", 2544 | " 1.47367810e-04 5.10613221e-03 3.83771627e-03 3.51781473e-03\n", 2545 | " 1.22444271e-03 3.76476869e-01]\n", 2546 | " [ 7.91527993e-05 1.33924418e-02 
8.75624724e-02 5.19575895e-02\n", 2547 | " 4.51518601e-04 4.79098082e-01 3.08683940e-05 1.53905920e-04\n", 2548 | " 4.34651676e-04 3.66839317e-01]\n", 2549 | " [ 2.16830745e-01 9.16537289e-02 1.64573423e-02 2.49209223e-02\n", 2550 | " 2.91504375e-02 3.13342316e-01 1.81019087e-02 2.38679222e-01\n", 2551 | " 1.77843493e-03 4.90849416e-02]]\n" 2552 | ] 2553 | } 2554 | ], 2555 | "source": [ 2556 | "m_softmax = m_exp / m_exp_row_sum\n", 2557 | "print(m_softmax)" 2558 | ] 2559 | }, 2560 | { 2561 | "cell_type": "code", 2562 | "execution_count": 55, 2563 | "metadata": {}, 2564 | "outputs": [ 2565 | { 2566 | "name": "stdout", 2567 | "output_type": "stream", 2568 | "text": [ 2569 | "[ 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]\n" 2570 | ] 2571 | } 2572 | ], 2573 | "source": [ 2574 | "print(m_softmax.sum(axis=1))" 2575 | ] 2576 | }, 2577 | { 2578 | "cell_type": "markdown", 2579 | "metadata": {}, 2580 | "source": [ 2581 | "For more NumPy details and usage, see the official [NumPy guide](http://docs.scipy.org/doc/numpy/reference/)" 2582 | ] 2583 | } 2584 | ], 2585 | "metadata": { 2586 | "kernelspec": { 2587 | "display_name": "Python 3", 2588 | "language": "python", 2589 | "name": "python3" 2590 | }, 2591 | "language_info": { 2592 | "codemirror_mode": { 2593 | "name": "ipython", 2594 | "version": 3 2595 | }, 2596 | "file_extension": ".py", 2597 | "mimetype": "text/x-python", 2598 | "name": "python", 2599 | "nbconvert_exporter": "python", 2600 | "pygments_lexer": "ipython3", 2601 | "version": "3.6.1" 2602 | } 2603 | }, 2604 | "nbformat": 4, 2605 | "nbformat_minor": 1 2606 | } 2607 | -------------------------------------------------------------------------------- /Nov-2017/some_array.npy: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ZeweiChu/numpy-tutorial/bcda441bee52af9ff392f58f688ec7f5cc57d8d5/Nov-2017/some_array.npy -------------------------------------------------------------------------------- /README.md: 
-------------------------------------------------------------------------------- 1 | # numpy-tutorial 2 | numpy tutorial for julyedu 3 | -------------------------------------------------------------------------------- /array_archive.npz: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ZeweiChu/numpy-tutorial/bcda441bee52af9ff392f58f688ec7f5cc57d8d5/array_archive.npz -------------------------------------------------------------------------------- /array_ex.txt: -------------------------------------------------------------------------------- 1 | 0.580052,0.186730,1.040717,1.134411 2 | 0.194163,-0.636917,-0.938659,0.124094 3 | -0.126410,0.268607,-0.695724,0.047428 4 | -1.484413,0.004176,-0.744203,0.005487 5 | 2.302869,0.200131,1.670238,-1.881090 6 | -0.193230,1.047233,0.482803,0.960334 -------------------------------------------------------------------------------- /proj/.ipynb_checkpoints/Untitled-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 2, 6 | "metadata": { 7 | "collapsed": true 8 | }, 9 | "outputs": [], 10 | "source": [ 11 | "import numpy as np" 12 | ] 13 | }, 14 | { 15 | "cell_type": "code", 16 | "execution_count": 4, 17 | "metadata": {}, 18 | "outputs": [ 19 | { 20 | "ename": "UnicodeDecodeError", 21 | "evalue": "'charmap' codec can't decode byte 0x90 in position 104: character maps to ", 22 | "output_type": "error", 23 | "traceback": [ 24 | "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m", 25 | "\u001b[1;31mUnicodeDecodeError\u001b[0m Traceback (most recent call last)", 26 | "\u001b[1;32m\u001b[0m in \u001b[0;36m\u001b[1;34m()\u001b[0m\n\u001b[1;32m----> 1\u001b[1;33m \u001b[0membed\u001b[0m \u001b[1;33m=\u001b[0m 
\u001b[0mnp\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mload\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mopen\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;34m\"embed.npy\"\u001b[0m\u001b[1;33m,\u001b[0m \u001b[1;34m\"r\"\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 2\u001b[0m \u001b[0mp_vector\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mnp\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mload\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mopen\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;34m\"p_vector\"\u001b[0m\u001b[1;33m,\u001b[0m \u001b[1;34m\"r\"\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n", 27 | "\u001b[1;32mc:\\users\\jasonchuzewei\\anaconda3\\envs\\julyedu\\lib\\site-packages\\numpy\\lib\\npyio.py\u001b[0m in \u001b[0;36mload\u001b[1;34m(file, mmap_mode, allow_pickle, fix_imports, encoding)\u001b[0m\n\u001b[0;32m 400\u001b[0m \u001b[0m_ZIP_PREFIX\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0masbytes\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;34m'PK\\x03\\x04'\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 401\u001b[0m \u001b[0mN\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mlen\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mformat\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mMAGIC_PREFIX\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m--> 402\u001b[1;33m \u001b[0mmagic\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mfid\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mread\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mN\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 403\u001b[0m \u001b[1;31m# If the file size is less than N, we need to make sure not\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 404\u001b[0m \u001b[1;31m# to seek past the beginning of the file\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n", 28 | "\u001b[1;32mc:\\users\\jasonchuzewei\\anaconda3\\envs\\julyedu\\lib\\encodings\\cp1252.py\u001b[0m in 
\u001b[0;36mdecode\u001b[1;34m(self, input, final)\u001b[0m\n\u001b[0;32m 21\u001b[0m \u001b[1;32mclass\u001b[0m \u001b[0mIncrementalDecoder\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mcodecs\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mIncrementalDecoder\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 22\u001b[0m \u001b[1;32mdef\u001b[0m \u001b[0mdecode\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mself\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0minput\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mfinal\u001b[0m\u001b[1;33m=\u001b[0m\u001b[1;32mFalse\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m---> 23\u001b[1;33m \u001b[1;32mreturn\u001b[0m \u001b[0mcodecs\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mcharmap_decode\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0minput\u001b[0m\u001b[1;33m,\u001b[0m\u001b[0mself\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0merrors\u001b[0m\u001b[1;33m,\u001b[0m\u001b[0mdecoding_table\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m[\u001b[0m\u001b[1;36m0\u001b[0m\u001b[1;33m]\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 24\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 25\u001b[0m \u001b[1;32mclass\u001b[0m \u001b[0mStreamWriter\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mCodec\u001b[0m\u001b[1;33m,\u001b[0m\u001b[0mcodecs\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mStreamWriter\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n", 29 | "\u001b[1;31mUnicodeDecodeError\u001b[0m: 'charmap' codec can't decode byte 0x90 in position 104: character maps to " 30 | ] 31 | } 32 | ], 33 | "source": [ 34 | "embed = np.load(open(\"embed.npy\", \"r\"))\n", 35 | "p_vector = np.load(open(\"p_vector\", \"r\"))" 36 | ] 37 | }, 38 | { 39 | "cell_type": "code", 40 | "execution_count": 11, 41 | "metadata": {}, 42 | "outputs": [], 43 | "source": [ 44 | "def load_data(in_file, relabeling=True):\n", 45 | " docs = []\n", 46 | " labels = []\n", 47 | " num_examples = 
0\n", 48 | " f = open(in_file, 'r')\n", 49 | " line = f.readline()\n", 50 | " while line != \"\": \n", 51 | " line = line.strip().split(\"\\t\") \n", 52 | " \n", 53 | " if len(line) >= 2:\n", 54 | " docs.append(line[0].split())\n", 55 | " labels.append(line[1])\n", 56 | " num_examples += 1\n", 57 | " else:\n", 58 | " docs.append(line[0].split())\n", 59 | " num_examples += 1\n", 60 | "\n", 61 | " line = f.readline()\n", 62 | " f.close()\n", 63 | " return (docs, labels)\n", 64 | "\n", 65 | "def encode(examples, word_dict, pos_examples=None, pos_dict=None, sort_by_len=True):\n", 66 | " '''\n", 67 | " Encode the sequences. \n", 68 | " '''\n", 69 | " in_doc = []\n", 70 | " in_l = []\n", 71 | " in_pos = []\n", 72 | "\n", 73 | " \n", 74 | " if pos_examples is not None:\n", 75 | " for idx, (d_words, l_words, pos_words) in enumerate(zip(examples[0], examples[1], pos_examples[0])):\n", 76 | " seq1 = [word_dict[w] if w in word_dict else 0 for w in d_words]\n", 77 | " seq2 = [int(w) for w in l_words]\n", 78 | " seq3 = [pos_dict[w] if w in pos_dict else 0 for w in pos_words]\n", 79 | " \n", 80 | " if len(seq1) > 0:\n", 81 | " in_doc.append(seq1)\n", 82 | " in_l.append(seq2)\n", 83 | " in_pos.append(seq3)\n", 84 | " else:\n", 85 | " for idx, (d_words, l_words) in enumerate(zip(examples[0], examples[1])):\n", 86 | " seq1 = [word_dict[w] if w in word_dict else 0 for w in d_words]\n", 87 | " seq2 = [int(w) for w in l_words]\n", 88 | " \n", 89 | " if len(seq1) > 0:\n", 90 | " in_doc.append(seq1)\n", 91 | " in_l.append(seq2)\n", 92 | "\n", 93 | " def len_argsort(seq):\n", 94 | " return sorted(range(len(seq)), key=lambda x: len(seq[x]))\n", 95 | "\n", 96 | " if sort_by_len:\n", 97 | " # sort by the document length\n", 98 | " sorted_index = len_argsort(in_doc)\n", 99 | " in_doc = [in_doc[i] for i in sorted_index]\n", 100 | " in_l = [in_l[i] for i in sorted_index]\n", 101 | " if pos_examples is not None:\n", 102 | " in_pos = [in_pos[i] for i in sorted_index]\n", 103 | "\n", 104 | " if 
pos_examples is not None:\n", 105 | " return in_doc, in_l, in_pos\n", 106 | " else:\n", 107 | " return in_doc, in_l\n", 108 | "\n", 109 | "def get_minibatches(n, minibatch_size, shuffle=False):\n", 110 | " idx_list = np.arange(0, n, minibatch_size)\n", 111 | " if shuffle:\n", 112 | " np.random.shuffle(idx_list)\n", 113 | " minibatches = []\n", 114 | " for idx in idx_list:\n", 115 | " minibatches.append(np.arange(idx, min(idx + minibatch_size, n)))\n", 116 | " return minibatches\n", 117 | "\n", 118 | "def prepare_data(seqs):\n", 119 | " lengths = [len(seq) for seq in seqs]\n", 120 | " n_samples = len(seqs)\n", 121 | " max_len = np.max(lengths)\n", 122 | " x = np.zeros((n_samples, max_len)).astype('int32')\n", 123 | " x_mask = np.zeros((n_samples, max_len)).astype('float32')\n", 124 | " for idx, seq in enumerate(seqs):\n", 125 | " x[idx, :lengths[idx]] = seq\n", 126 | " x_mask[idx, :lengths[idx]] = 1.0\n", 127 | " return x, x_mask\n", 128 | "\n", 129 | "def gen_examples(d, l, batch_size, pos=None):\n", 130 | "\n", 131 | " minibatches = get_minibatches(len(d), batch_size)\n", 132 | " all_ex = []\n", 133 | " for minibatch in minibatches:\n", 134 | " mb_d = [d[t] for t in minibatch]\n", 135 | " mb_l = [l[t] for t in minibatch]\n", 136 | " mb_d, mb_mask_d = prepare_data(mb_d)\n", 137 | " if pos is not None:\n", 138 | " mb_pos = [pos[t] for t in minibatch]\n", 139 | " mb_pos, mb_mask_pos = prepare_data(mb_pos)\n", 140 | " all_ex.append((mb_d, mb_mask_d, mb_l, mb_pos))\n", 141 | " else:\n", 142 | " all_ex.append((mb_d, mb_mask_d, mb_l))\n", 143 | " return all_ex" 144 | ] 145 | }, 146 | { 147 | "cell_type": "code", 148 | "execution_count": null, 149 | "metadata": { 150 | "collapsed": true 151 | }, 152 | "outputs": [], 153 | "source": [ 154 | "data = load_data(\"senti.binary.test.txt\", \"r\")\n", 155 | "docs, labels = utils.encode(data, word_dict)\n", 156 | "data = utils.gen_examples(d_docs, d_labels, args.batch_size)\n", 157 | "for idx, (mb_d, mb_mask_d, mb_l) in 
enumerate(data):\n", 158 | " \n", 159 | "\n", 160 | "print()" 161 | ] 162 | }, 163 | { 164 | "cell_type": "code", 165 | "execution_count": null, 166 | "metadata": { 167 | "collapsed": true 168 | }, 169 | "outputs": [], 170 | "source": [] 171 | } 172 | ], 173 | "metadata": { 174 | "kernelspec": { 175 | "display_name": "Python 3", 176 | "language": "python", 177 | "name": "python3" 178 | }, 179 | "language_info": { 180 | "codemirror_mode": { 181 | "name": "ipython", 182 | "version": 3 183 | }, 184 | "file_extension": ".py", 185 | "mimetype": "text/x-python", 186 | "name": "python", 187 | "nbconvert_exporter": "python", 188 | "pygments_lexer": "ipython3", 189 | "version": "3.6.1" 190 | } 191 | }, 192 | "nbformat": 4, 193 | "nbformat_minor": 2 194 | } 195 | -------------------------------------------------------------------------------- /proj/Untitled.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 2, 6 | "metadata": { 7 | "collapsed": true 8 | }, 9 | "outputs": [], 10 | "source": [ 11 | "import numpy as np" 12 | ] 13 | }, 14 | { 15 | "cell_type": "code", 16 | "execution_count": 4, 17 | "metadata": {}, 18 | "outputs": [ 19 | { 20 | "ename": "UnicodeDecodeError", 21 | "evalue": "'charmap' codec can't decode byte 0x90 in position 104: character maps to ", 22 | "output_type": "error", 23 | "traceback": [ 24 | "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m", 25 | "\u001b[1;31mUnicodeDecodeError\u001b[0m Traceback (most recent call last)", 26 | "\u001b[1;32m\u001b[0m in \u001b[0;36m\u001b[1;34m()\u001b[0m\n\u001b[1;32m----> 1\u001b[1;33m \u001b[0membed\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mnp\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mload\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mopen\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;34m\"embed.npy\"\u001b[0m\u001b[1;33m,\u001b[0m 
\u001b[1;34m\"r\"\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 2\u001b[0m \u001b[0mp_vector\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mnp\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mload\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mopen\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;34m\"p_vector\"\u001b[0m\u001b[1;33m,\u001b[0m \u001b[1;34m\"r\"\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n", 27 | "\u001b[1;32mc:\\users\\jasonchuzewei\\anaconda3\\envs\\julyedu\\lib\\site-packages\\numpy\\lib\\npyio.py\u001b[0m in \u001b[0;36mload\u001b[1;34m(file, mmap_mode, allow_pickle, fix_imports, encoding)\u001b[0m\n\u001b[0;32m 400\u001b[0m \u001b[0m_ZIP_PREFIX\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0masbytes\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;34m'PK\\x03\\x04'\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 401\u001b[0m \u001b[0mN\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mlen\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mformat\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mMAGIC_PREFIX\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m--> 402\u001b[1;33m \u001b[0mmagic\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mfid\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mread\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mN\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 403\u001b[0m \u001b[1;31m# If the file size is less than N, we need to make sure not\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 404\u001b[0m \u001b[1;31m# to seek past the beginning of the file\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n", 28 | "\u001b[1;32mc:\\users\\jasonchuzewei\\anaconda3\\envs\\julyedu\\lib\\encodings\\cp1252.py\u001b[0m in \u001b[0;36mdecode\u001b[1;34m(self, input, final)\u001b[0m\n\u001b[0;32m 21\u001b[0m \u001b[1;32mclass\u001b[0m 
\u001b[0mIncrementalDecoder\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mcodecs\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mIncrementalDecoder\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 22\u001b[0m \u001b[1;32mdef\u001b[0m \u001b[0mdecode\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mself\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0minput\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mfinal\u001b[0m\u001b[1;33m=\u001b[0m\u001b[1;32mFalse\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m---> 23\u001b[1;33m \u001b[1;32mreturn\u001b[0m \u001b[0mcodecs\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mcharmap_decode\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0minput\u001b[0m\u001b[1;33m,\u001b[0m\u001b[0mself\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0merrors\u001b[0m\u001b[1;33m,\u001b[0m\u001b[0mdecoding_table\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m[\u001b[0m\u001b[1;36m0\u001b[0m\u001b[1;33m]\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 24\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 25\u001b[0m \u001b[1;32mclass\u001b[0m \u001b[0mStreamWriter\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mCodec\u001b[0m\u001b[1;33m,\u001b[0m\u001b[0mcodecs\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mStreamWriter\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n", 29 | "\u001b[1;31mUnicodeDecodeError\u001b[0m: 'charmap' codec can't decode byte 0x90 in position 104: character maps to " 30 | ] 31 | } 32 | ], 33 | "source": [ 34 | "embed = np.load(open(\"embed.npy\", \"r\"))\n", 35 | "p_vector = np.load(open(\"p_vector\", \"r\"))" 36 | ] 37 | }, 38 | { 39 | "cell_type": "code", 40 | "execution_count": 11, 41 | "metadata": {}, 42 | "outputs": [], 43 | "source": [ 44 | "def load_data(in_file, relabeling=True):\n", 45 | " docs = []\n", 46 | " labels = []\n", 47 | " num_examples = 0\n", 48 | " f = open(in_file, 'r')\n", 49 | " line = f.readline()\n", 50 | " while line != \"\": \n", 51 | " 
line = line.strip().split(\"\\t\") \n", 52 | " \n", 53 | " if len(line) >= 2:\n", 54 | " docs.append(line[0].split())\n", 55 | " labels.append(line[1])\n", 56 | " num_examples += 1\n", 57 | " else:\n", 58 | " docs.append(line[0].split())\n", 59 | " num_examples += 1\n", 60 | "\n", 61 | " line = f.readline()\n", 62 | " f.close()\n", 63 | " return (docs, labels)\n", 64 | "\n", 65 | "def encode(examples, word_dict, pos_examples=None, pos_dict=None, sort_by_len=True):\n", 66 | " '''\n", 67 | " Encode the sequences. \n", 68 | " '''\n", 69 | " in_doc = []\n", 70 | " in_l = []\n", 71 | " in_pos = []\n", 72 | "\n", 73 | " \n", 74 | " if pos_examples is not None:\n", 75 | " for idx, (d_words, l_words, pos_words) in enumerate(zip(examples[0], examples[1], pos_examples[0])):\n", 76 | " seq1 = [word_dict[w] if w in word_dict else 0 for w in d_words]\n", 77 | " seq2 = [int(w) for w in l_words]\n", 78 | " seq3 = [pos_dict[w] if w in pos_dict else 0 for w in pos_words]\n", 79 | " \n", 80 | " if len(seq1) > 0:\n", 81 | " in_doc.append(seq1)\n", 82 | " in_l.append(seq2)\n", 83 | " in_pos.append(seq3)\n", 84 | " else:\n", 85 | " for idx, (d_words, l_words) in enumerate(zip(examples[0], examples[1])):\n", 86 | " seq1 = [word_dict[w] if w in word_dict else 0 for w in d_words]\n", 87 | " seq2 = [int(w) for w in l_words]\n", 88 | " \n", 89 | " if len(seq1) > 0:\n", 90 | " in_doc.append(seq1)\n", 91 | " in_l.append(seq2)\n", 92 | "\n", 93 | " def len_argsort(seq):\n", 94 | " return sorted(range(len(seq)), key=lambda x: len(seq[x]))\n", 95 | "\n", 96 | " if sort_by_len:\n", 97 | " # sort by the document length\n", 98 | " sorted_index = len_argsort(in_doc)\n", 99 | " in_doc = [in_doc[i] for i in sorted_index]\n", 100 | " in_l = [in_l[i] for i in sorted_index]\n", 101 | " if pos_examples is not None:\n", 102 | " in_pos = [in_pos[i] for i in sorted_index]\n", 103 | "\n", 104 | " if pos_examples is not None:\n", 105 | " return in_doc, in_l, in_pos\n", 106 | " else:\n", 107 | " return in_doc, 
in_l\n", 108 | "\n", 109 | "def get_minibatches(n, minibatch_size, shuffle=False):\n", 110 | " idx_list = np.arange(0, n, minibatch_size)\n", 111 | " if shuffle:\n", 112 | " np.random.shuffle(idx_list)\n", 113 | " minibatches = []\n", 114 | " for idx in idx_list:\n", 115 | " minibatches.append(np.arange(idx, min(idx + minibatch_size, n)))\n", 116 | " return minibatches\n", 117 | "\n", 118 | "def prepare_data(seqs):\n", 119 | " lengths = [len(seq) for seq in seqs]\n", 120 | " n_samples = len(seqs)\n", 121 | " max_len = np.max(lengths)\n", 122 | " x = np.zeros((n_samples, max_len)).astype('int32')\n", 123 | " x_mask = np.zeros((n_samples, max_len)).astype('float32')\n", 124 | " for idx, seq in enumerate(seqs):\n", 125 | " x[idx, :lengths[idx]] = seq\n", 126 | " x_mask[idx, :lengths[idx]] = 1.0\n", 127 | " return x, x_mask\n", 128 | "\n", 129 | "def gen_examples(d, l, batch_size, pos=None):\n", 130 | "\n", 131 | " minibatches = get_minibatches(len(d), batch_size)\n", 132 | " all_ex = []\n", 133 | " for minibatch in minibatches:\n", 134 | " mb_d = [d[t] for t in minibatch]\n", 135 | " mb_l = [l[t] for t in minibatch]\n", 136 | " mb_d, mb_mask_d = prepare_data(mb_d)\n", 137 | " if pos is not None:\n", 138 | " mb_pos = [pos[t] for t in minibatch]\n", 139 | " mb_pos, mb_mask_pos = prepare_data(mb_pos)\n", 140 | " all_ex.append((mb_d, mb_mask_d, mb_l, mb_pos))\n", 141 | " else:\n", 142 | " all_ex.append((mb_d, mb_mask_d, mb_l))\n", 143 | " return all_ex" 144 | ] 145 | }, 146 | { 147 | "cell_type": "code", 148 | "execution_count": null, 149 | "metadata": { 150 | "collapsed": true 151 | }, 152 | "outputs": [], 153 | "source": [ 154 | "data = load_data(\"senti.binary.test.txt\", \"r\")\n", 155 | "docs, labels = utils.encode(data, word_dict)\n", 156 | "data = utils.gen_examples(d_docs, d_labels, args.batch_size)\n", 157 | "for idx, (mb_d, mb_mask_d, mb_l) in enumerate(data):\n", 158 | " \n", 159 | "\n", 160 | "print()" 161 | ] 162 | }, 163 | { 164 | "cell_type": "code", 165 | 
"execution_count": null, 166 | "metadata": { 167 | "collapsed": true 168 | }, 169 | "outputs": [], 170 | "source": [] 171 | } 172 | ], 173 | "metadata": { 174 | "kernelspec": { 175 | "display_name": "Python 3", 176 | "language": "python", 177 | "name": "python3" 178 | }, 179 | "language_info": { 180 | "codemirror_mode": { 181 | "name": "ipython", 182 | "version": 3 183 | }, 184 | "file_extension": ".py", 185 | "mimetype": "text/x-python", 186 | "name": "python", 187 | "nbconvert_exporter": "python", 188 | "pygments_lexer": "ipython3", 189 | "version": "3.6.1" 190 | } 191 | }, 192 | "nbformat": 4, 193 | "nbformat_minor": 2 194 | } 195 | -------------------------------------------------------------------------------- /proj/embed.npy: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ZeweiChu/numpy-tutorial/bcda441bee52af9ff392f58f688ec7f5cc57d8d5/proj/embed.npy -------------------------------------------------------------------------------- /proj/p_vector.npy: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ZeweiChu/numpy-tutorial/bcda441bee52af9ff392f58f688ec7f5cc57d8d5/proj/p_vector.npy -------------------------------------------------------------------------------- /some_array.npy: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ZeweiChu/numpy-tutorial/bcda441bee52af9ff392f58f688ec7f5cc57d8d5/some_array.npy --------------------------------------------------------------------------------
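Note on the `UnicodeDecodeError` recorded in `proj/Untitled.ipynb`: the traceback is consistent with `embed.npy` being opened in text mode (`"r"`), so Python tries to decode the binary `.npy` bytes with the cp1252 codec before NumPy can parse them. Below is a minimal sketch of two ways to avoid that. It makes no assumption about the real contents of `embed.npy`; the array and the temporary path are hypothetical stand-ins so the snippet can run on its own.

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-in for proj/embed.npy so the sketch is self-contained.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "embed.npy")
original = np.arange(12, dtype=np.float32).reshape(3, 4)
np.save(path, original)

# Option 1: pass the path and let NumPy open the file itself.
embed_from_path = np.load(path)

# Option 2: if a file object is needed, open it in binary mode ("rb"),
# not text mode ("r"), so no character decoding is attempted.
with open(path, "rb") as f:
    embed_from_handle = np.load(f)

print(np.array_equal(embed_from_path, original))
print(np.array_equal(embed_from_handle, original))
```

The second call in the notebook opens `"p_vector"`, while the repository tree lists `p_vector.npy`, so that filename probably needs the extension as well.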