├── .ipynb_checkpoints
│   ├── 01. Numpy和原生Python用于数组计算的性能对比-checkpoint.ipynb
│   ├── 02. Numpy的核心array对象以及创建array的方法-checkpoint.ipynb
│   ├── 03. Numpy对数组按索引查询-checkpoint.ipynb
│   ├── 04. Numpy常用random随机函数汇总-checkpoint.ipynb
│   ├── 05. Numpy的数学统计函数-checkpoint.ipynb
│   ├── 06. Numpy计算数组中满足条件元素个数-checkpoint.ipynb
│   ├── 07. Numpy怎样给数组增加一个维度-checkpoint.ipynb
│   ├── 08. Numpy实现K折交叉验证的数据划分-checkpoint.ipynb
│   ├── 09. Numpy非常有用的数组合并操作-checkpoint.ipynb
│   ├── 10. Numpy怎样对数组排序-checkpoint.ipynb
│   ├── 11. Numpy中数组的乘法-checkpoint.ipynb
│   ├── 12. Numpy中重要的广播概念-checkpoint.ipynb
│   ├── 13. Numpy求解线性方程组-checkpoint.ipynb
│   ├── 14. Numpy实现SVD矩阵分解-checkpoint.ipynb
│   ├── 15. Numpy实现多项式曲线拟合-checkpoint.ipynb
│   ├── 16. Numpy使用Matplotlib实现可视化绘图-checkpoint.ipynb
│   ├── 17. Numpy计算逆矩阵求解线性方程组-checkpoint.ipynb
│   ├── 18. Numpy怎样将数组读写到文件-checkpoint.ipynb
│   ├── 19. Numpy的结构化数组-checkpoint.ipynb
│   ├── 20. Numpy与Pandas数据的相互转换-checkpoint.ipynb
│   ├── 21. Numpy数据输入给Scikit-learn实现模型训练-checkpoint.ipynb
│   ├── Untitled-checkpoint.ipynb
│   └── Untitled1-checkpoint.ipynb
├── 01. Numpy和原生Python用于数组计算的性能对比.ipynb
├── 02. Numpy的核心array对象以及创建array的方法.ipynb
├── 03. Numpy对数组按索引查询.ipynb
├── 04. Numpy常用random随机函数汇总.ipynb
├── 05. Numpy的数学统计函数.ipynb
├── 06. Numpy计算数组中满足条件元素个数.ipynb
├── 07. Numpy怎样给数组增加一个维度.ipynb
├── 08. Numpy实现K折交叉验证的数据划分.ipynb
├── 09. Numpy非常有用的数组合并操作.ipynb
├── 10. Numpy怎样对数组排序.ipynb
├── 11. Numpy中数组的乘法.ipynb
├── 12. Numpy中重要的广播概念.ipynb
├── 13. Numpy求解线性方程组.ipynb
├── 14. Numpy实现SVD矩阵分解.ipynb
├── 15. Numpy实现多项式曲线拟合.ipynb
├── 16. Numpy使用Matplotlib实现可视化绘图.ipynb
├── 17. Numpy计算逆矩阵求解线性方程组.ipynb
├── 18. Numpy怎样将数组读写到文件.ipynb
├── 19. Numpy的结构化数组.ipynb
├── 20. Numpy与Pandas数据的相互转换.ipynb
├── 21. Numpy数据输入给Scikit-learn实现模型训练.ipynb
├── README.md
├── Untitled.ipynb
├── Untitled1.ipynb
├── arr_a.npy
├── arr_ab.npz
├── arr_ab_compressed.npz
└── other_files
    ├── numpy-array-inv.jpg
    ├── numpy-kfold-validation.jpg
    ├── numpy-kfold-validation.png
    └── numpy_random_functions.png
/.ipynb_checkpoints/06. 
Numpy计算数组中满足条件元素个数-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "## Counting the array elements that satisfy a condition with NumPy\n", 8 | "\n", 9 | "Requirement: given a very large array, e.g. 100 million numbers, count how many of them are greater than 5000" 10 | ] 11 | }, 12 | { 13 | "cell_type": "markdown", 14 | "metadata": {}, 15 | "source": [ 16 | "### 1. Generate 100 million numbers with numpy's random module" 17 | ] 18 | }, 19 | { 20 | "cell_type": "code", 21 | "execution_count": 1, 22 | "metadata": {}, 23 | "outputs": [], 24 | "source": [ 25 | "import numpy as np" 26 | ] 27 | }, 28 | { 29 | "cell_type": "code", 30 | "execution_count": 2, 31 | "metadata": {}, 32 | "outputs": [], 33 | "source": [ 34 | "arr = np.random.randint(1, 10000, size=int(1e8))" 35 | ] 36 | }, 37 | { 38 | "cell_type": "code", 39 | "execution_count": 3, 40 | "metadata": {}, 41 | "outputs": [ 42 | { 43 | "data": { 44 | "text/plain": [ 45 | "array([8855, 6014, 4193, 7830, 355, 9469, 1661, 6569, 7647, 5907])" 46 | ] 47 | }, 48 | "execution_count": 3, 49 | "metadata": {}, 50 | "output_type": "execute_result" 51 | } 52 | ], 53 | "source": [ 54 | "arr[:10]" 55 | ] 56 | }, 57 | { 58 | "cell_type": "code", 59 | "execution_count": 4, 60 | "metadata": {}, 61 | "outputs": [ 62 | { 63 | "data": { 64 | "text/plain": [ 65 | "100000000" 66 | ] 67 | }, 68 | "execution_count": 4, 69 | "metadata": {}, 70 | "output_type": "execute_result" 71 | } 72 | ], 73 | "source": [ 74 | "arr.size" 75 | ] 76 | }, 77 | { 78 | "cell_type": "markdown", 79 | "metadata": {}, 80 | "source": [ 81 | "### 2. 
使用Python原生语法实现" 82 | ] 83 | }, 84 | { 85 | "cell_type": "code", 86 | "execution_count": 5, 87 | "metadata": {}, 88 | "outputs": [], 89 | "source": [ 90 | "pyarr = list(arr)" 91 | ] 92 | }, 93 | { 94 | "cell_type": "code", 95 | "execution_count": 6, 96 | "metadata": {}, 97 | "outputs": [ 98 | { 99 | "data": { 100 | "text/plain": [ 101 | "50001207" 102 | ] 103 | }, 104 | "execution_count": 6, 105 | "metadata": {}, 106 | "output_type": "execute_result" 107 | } 108 | ], 109 | "source": [ 110 | "# 计算下结果,用于对比是否准确\n", 111 | "len([x for x in pyarr if x>5000])" 112 | ] 113 | }, 114 | { 115 | "cell_type": "code", 116 | "execution_count": 7, 117 | "metadata": {}, 118 | "outputs": [ 119 | { 120 | "name": "stdout", 121 | "output_type": "stream", 122 | "text": [ 123 | "16.6 s ± 252 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n" 124 | ] 125 | } 126 | ], 127 | "source": [ 128 | "# 记一下时间\n", 129 | "%timeit len([x for x in pyarr if x>5000])" 130 | ] 131 | }, 132 | { 133 | "cell_type": "markdown", 134 | "metadata": {}, 135 | "source": [ 136 | "### 3. 
使用numpy的向量化操作实现" 137 | ] 138 | }, 139 | { 140 | "cell_type": "code", 141 | "execution_count": 8, 142 | "metadata": {}, 143 | "outputs": [ 144 | { 145 | "data": { 146 | "text/plain": [ 147 | "50001207" 148 | ] 149 | }, 150 | "execution_count": 8, 151 | "metadata": {}, 152 | "output_type": "execute_result" 153 | } 154 | ], 155 | "source": [ 156 | "# 计算下结果,用于对比是否准确\n", 157 | "arr[arr>5000].size" 158 | ] 159 | }, 160 | { 161 | "cell_type": "code", 162 | "execution_count": 9, 163 | "metadata": {}, 164 | "outputs": [ 165 | { 166 | "data": { 167 | "text/plain": [ 168 | "array([ True, True, False, True, False, True, False, True, True,\n", 169 | " True])" 170 | ] 171 | }, 172 | "execution_count": 9, 173 | "metadata": {}, 174 | "output_type": "execute_result" 175 | } 176 | ], 177 | "source": [ 178 | "(arr>5000)[:10]" 179 | ] 180 | }, 181 | { 182 | "cell_type": "code", 183 | "execution_count": 10, 184 | "metadata": {}, 185 | "outputs": [ 186 | { 187 | "name": "stdout", 188 | "output_type": "stream", 189 | "text": [ 190 | "556 ms ± 3.45 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n" 191 | ] 192 | } 193 | ], 194 | "source": [ 195 | "# 记一下时间\n", 196 | "%timeit arr[arr>5000].size" 197 | ] 198 | }, 199 | { 200 | "cell_type": "markdown", 201 | "metadata": {}, 202 | "source": [ 203 | "### 4. 
对比下时间" 204 | ] 205 | }, 206 | { 207 | "cell_type": "code", 208 | "execution_count": 12, 209 | "metadata": {}, 210 | "outputs": [ 211 | { 212 | "data": { 213 | "text/plain": [ 214 | "29.90990990990991" 215 | ] 216 | }, 217 | "execution_count": 12, 218 | "metadata": {}, 219 | "output_type": "execute_result" 220 | } 221 | ], 222 | "source": [ 223 | "16.6*1000 / 555 " 224 | ] 225 | }, 226 | { 227 | "cell_type": "code", 228 | "execution_count": null, 229 | "metadata": {}, 230 | "outputs": [], 231 | "source": [] 232 | } 233 | ], 234 | "metadata": { 235 | "kernelspec": { 236 | "display_name": "Python 3", 237 | "language": "python", 238 | "name": "python3" 239 | }, 240 | "language_info": { 241 | "codemirror_mode": { 242 | "name": "ipython", 243 | "version": 3 244 | }, 245 | "file_extension": ".py", 246 | "mimetype": "text/x-python", 247 | "name": "python", 248 | "nbconvert_exporter": "python", 249 | "pygments_lexer": "ipython3", 250 | "version": "3.7.6" 251 | } 252 | }, 253 | "nbformat": 4, 254 | "nbformat_minor": 4 255 | } 256 | -------------------------------------------------------------------------------- /.ipynb_checkpoints/07. 
Numpy怎样给数组增加一个维度-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "## How to add a dimension to an array with NumPy" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "***Background:*** \n", 15 | "Many computations operate on 2-D or 3-D data, so a 1-D input often has to be promoted to 2-D for the shapes to match\n", 16 | "\n", 17 | "***Goal:*** \n", 18 | "Add a dimension to an array without changing its data (in this example the dimensions change but the data does not) \n", 19 | "Original array: 1-D array arr=[1,2,3,4] with shape (4,), indexed as arr[0],arr[1],arr[2],arr[3] \n", 20 | "Reshaped array: 2-D array arr=[[1,2,3,4]] with shape (1,4), indexed as arr[0,0],arr[0,1],arr[0,2],arr[0,3]\n", 21 | "\n", 22 | "***In practice:*** \n", 23 | "It often helps to sketch the array shapes on paper to check whether the shapes match and whether a dimension needs to be added or removed\n", 24 | "\n", 25 | "***Three methods:*** \n", 26 | "* np.newaxis: a constant used in indexing syntax to add a dimension to an array\n", 27 | "* np.expand_dims(arr, axis): a function that does the same as np.newaxis, adding a dimension to arr at position axis\n", 28 | "* np.reshape(a, newshape): a function; set one dimension to 1 to add a dimension" 29 | ] 30 | }, 31 | { 32 | "cell_type": "code", 33 | "execution_count": 1, 34 | "metadata": {}, 35 | "outputs": [], 36 | "source": [ 37 | "import numpy as np" 38 | ] 39 | }, 40 | { 41 | "cell_type": "code", 42 | "execution_count": 3, 43 | "metadata": {}, 44 | "outputs": [ 45 | { 46 | "data": { 47 | "text/plain": [ 48 | "array([0, 1, 2, 3, 4])" 49 | ] 50 | }, 51 | "execution_count": 3, 52 | "metadata": {}, 53 | "output_type": "execute_result" 54 | } 55 | ], 56 | "source": [ 57 | "arr = np.arange(5)\n", 58 | "arr" 59 | ] 60 | }, 61 | { 62 | "cell_type": "code", 63 | "execution_count": 5, 64 | "metadata": {}, 65 | "outputs": [ 66 | { 67 | "data": { 68 | "text/plain": [ 69 | "(5,)" 70 | ] 71 | }, 72 | "execution_count": 5, 73 | "metadata": {}, 74 | "output_type": "execute_result" 75 | } 76 | ], 77 | "source": [ 78 | "# Note: this is currently a 1-D vector\n", 79 | "arr.shape" 80 | ] 81 | }, 82 | { 83 | "cell_type": "markdown", 84 | "metadata": {}, 85 | "source": [ 86 | "### Method 1: np.newaxis" 87 | ] 88 | }, 89 | { 90 | "cell_type": "markdown", 91 | "metadata": {}, 92 | "source": [ 93 | "#### 
注意:np.newaxis其实就是None的别名" 94 | ] 95 | }, 96 | { 97 | "cell_type": "code", 98 | "execution_count": 6, 99 | "metadata": {}, 100 | "outputs": [ 101 | { 102 | "data": { 103 | "text/plain": [ 104 | "True" 105 | ] 106 | }, 107 | "execution_count": 6, 108 | "metadata": {}, 109 | "output_type": "execute_result" 110 | } 111 | ], 112 | "source": [ 113 | "np.newaxis is None" 114 | ] 115 | }, 116 | { 117 | "cell_type": "code", 118 | "execution_count": 7, 119 | "metadata": {}, 120 | "outputs": [ 121 | { 122 | "data": { 123 | "text/plain": [ 124 | "True" 125 | ] 126 | }, 127 | "execution_count": 7, 128 | "metadata": {}, 129 | "output_type": "execute_result" 130 | } 131 | ], 132 | "source": [ 133 | "np.newaxis == None" 134 | ] 135 | }, 136 | { 137 | "cell_type": "markdown", 138 | "metadata": {}, 139 | "source": [ 140 | "即以下所有的np.newaxis的位置,都可以用None替代" 141 | ] 142 | }, 143 | { 144 | "cell_type": "markdown", 145 | "metadata": {}, 146 | "source": [ 147 | "#### 给一维向量添加一个行维度" 148 | ] 149 | }, 150 | { 151 | "cell_type": "code", 152 | "execution_count": 8, 153 | "metadata": {}, 154 | "outputs": [ 155 | { 156 | "data": { 157 | "text/plain": [ 158 | "array([[0, 1, 2, 3, 4]])" 159 | ] 160 | }, 161 | "execution_count": 8, 162 | "metadata": {}, 163 | "output_type": "execute_result" 164 | } 165 | ], 166 | "source": [ 167 | "arr[np.newaxis, :]" 168 | ] 169 | }, 170 | { 171 | "cell_type": "code", 172 | "execution_count": 9, 173 | "metadata": { 174 | "scrolled": true 175 | }, 176 | "outputs": [ 177 | { 178 | "data": { 179 | "text/plain": [ 180 | "(1, 5)" 181 | ] 182 | }, 183 | "execution_count": 9, 184 | "metadata": {}, 185 | "output_type": "execute_result" 186 | } 187 | ], 188 | "source": [ 189 | "arr[np.newaxis, :].shape" 190 | ] 191 | }, 192 | { 193 | "cell_type": "markdown", 194 | "metadata": {}, 195 | "source": [ 196 | "数据现在是一行*五列,数据本身没有增减,只是多了一级括号" 197 | ] 198 | }, 199 | { 200 | "cell_type": "markdown", 201 | "metadata": {}, 202 | "source": [ 203 | "#### 给一维向量添加一个列维度" 204 | ] 205 | }, 206 
| { 207 | "cell_type": "code", 208 | "execution_count": 10, 209 | "metadata": {}, 210 | "outputs": [ 211 | { 212 | "data": { 213 | "text/plain": [ 214 | "array([[0],\n", 215 | " [1],\n", 216 | " [2],\n", 217 | " [3],\n", 218 | " [4]])" 219 | ] 220 | }, 221 | "execution_count": 10, 222 | "metadata": {}, 223 | "output_type": "execute_result" 224 | } 225 | ], 226 | "source": [ 227 | "arr[:, np.newaxis]" 228 | ] 229 | }, 230 | { 231 | "cell_type": "code", 232 | "execution_count": 11, 233 | "metadata": {}, 234 | "outputs": [ 235 | { 236 | "data": { 237 | "text/plain": [ 238 | "(5, 1)" 239 | ] 240 | }, 241 | "execution_count": 11, 242 | "metadata": {}, 243 | "output_type": "execute_result" 244 | } 245 | ], 246 | "source": [ 247 | "arr[:, np.newaxis].shape" 248 | ] 249 | }, 250 | { 251 | "cell_type": "markdown", 252 | "metadata": {}, 253 | "source": [ 254 | "The data is now five rows by one column" 255 | ] 256 | }, 257 | { 258 | "cell_type": "markdown", 259 | "metadata": {}, 260 | "source": [ 261 | "### Method 2: np.expand_dims" 262 | ] 263 | }, 264 | { 265 | "cell_type": "markdown", 266 | "metadata": {}, 267 | "source": [ 268 | "np.expand_dims produces exactly the same result as indexing with np.newaxis" 269 | ] 270 | }, 271 | { 272 | "cell_type": "code", 273 | "execution_count": 13, 274 | "metadata": {}, 275 | "outputs": [ 276 | { 277 | "data": { 278 | "text/plain": [ 279 | "array([0, 1, 2, 3, 4])" 280 | ] 281 | }, 282 | "execution_count": 13, 283 | "metadata": {}, 284 | "output_type": "execute_result" 285 | } 286 | ], 287 | "source": [ 288 | "arr" 289 | ] 290 | }, 291 | { 292 | "cell_type": "markdown", 293 | "metadata": {}, 294 | "source": [ 295 | "#### Adding a row dimension to a 1-D array" 296 | ] 297 | }, 298 | { 299 | "cell_type": "markdown", 300 | "metadata": {}, 301 | "source": [ 302 | "Equivalent to arr[np.newaxis, :]" 303 | ] 304 | }, 305 | { 306 | "cell_type": "code", 307 | "execution_count": 14, 308 | "metadata": {}, 309 | "outputs": [ 310 | { 311 | "data": { 312 | "text/plain": [ 313 | "array([[0, 1, 2, 3, 4]])" 314 | ] 315 | }, 316 | "execution_count": 
14, 317 | "metadata": {}, 318 | "output_type": "execute_result" 319 | } 320 | ], 321 | "source": [ 322 | "np.expand_dims(arr, axis=0)" 323 | ] 324 | }, 325 | { 326 | "cell_type": "code", 327 | "execution_count": 15, 328 | "metadata": {}, 329 | "outputs": [ 330 | { 331 | "data": { 332 | "text/plain": [ 333 | "(1, 5)" 334 | ] 335 | }, 336 | "execution_count": 15, 337 | "metadata": {}, 338 | "output_type": "execute_result" 339 | } 340 | ], 341 | "source": [ 342 | "np.expand_dims(arr, axis=0).shape" 343 | ] 344 | }, 345 | { 346 | "cell_type": "markdown", 347 | "metadata": {}, 348 | "source": [ 349 | "#### Adding a column dimension to a 1-D array" 350 | ] 351 | }, 352 | { 353 | "cell_type": "markdown", 354 | "metadata": {}, 355 | "source": [ 356 | "Equivalent to arr[:, np.newaxis]" 357 | ] 358 | }, 359 | { 360 | "cell_type": "code", 361 | "execution_count": 16, 362 | "metadata": {}, 363 | "outputs": [ 364 | { 365 | "data": { 366 | "text/plain": [ 367 | "array([[0],\n", 368 | " [1],\n", 369 | " [2],\n", 370 | " [3],\n", 371 | " [4]])" 372 | ] 373 | }, 374 | "execution_count": 16, 375 | "metadata": {}, 376 | "output_type": "execute_result" 377 | } 378 | ], 379 | "source": [ 380 | "np.expand_dims(arr, axis=1)" 381 | ] 382 | }, 383 | { 384 | "cell_type": "code", 385 | "execution_count": 17, 386 | "metadata": {}, 387 | "outputs": [ 388 | { 389 | "data": { 390 | "text/plain": [ 391 | "(5, 1)" 392 | ] 393 | }, 394 | "execution_count": 17, 395 | "metadata": {}, 396 | "output_type": "execute_result" 397 | } 398 | ], 399 | "source": [ 400 | "np.expand_dims(arr, axis=1).shape" 401 | ] 402 | }, 403 | { 404 | "cell_type": "markdown", 405 | "metadata": {}, 406 | "source": [ 407 | "### Method 3: np.reshape" 408 | ] 409 | }, 410 | { 411 | "cell_type": "markdown", 412 | "metadata": {}, 413 | "source": [ 414 | "#### Adding a row dimension to a 1-D array" 415 | ] 416 | }, 417 | { 418 | "cell_type": "code", 419 | "execution_count": 23, 420 | "metadata": {}, 421 | "outputs": [ 422 | { 423 | "data": { 424 | "text/plain": [ 425 | "array([0, 1, 2, 3, 4])" 
426 | ] 427 | }, 428 | "execution_count": 23, 429 | "metadata": {}, 430 | "output_type": "execute_result" 431 | } 432 | ], 433 | "source": [ 434 | "arr" 435 | ] 436 | }, 437 | { 438 | "cell_type": "code", 439 | "execution_count": 32, 440 | "metadata": {}, 441 | "outputs": [ 442 | { 443 | "data": { 444 | "text/plain": [ 445 | "array([[0, 1, 2, 3, 4]])" 446 | ] 447 | }, 448 | "execution_count": 32, 449 | "metadata": {}, 450 | "output_type": "execute_result" 451 | } 452 | ], 453 | "source": [ 454 | "np.reshape(arr, (1, 5))" 455 | ] 456 | }, 457 | { 458 | "cell_type": "code", 459 | "execution_count": 33, 460 | "metadata": {}, 461 | "outputs": [ 462 | { 463 | "data": { 464 | "text/plain": [ 465 | "array([[0, 1, 2, 3, 4]])" 466 | ] 467 | }, 468 | "execution_count": 33, 469 | "metadata": {}, 470 | "output_type": "execute_result" 471 | } 472 | ], 473 | "source": [ 474 | "np.reshape(arr, (1, -1))" 475 | ] 476 | }, 477 | { 478 | "cell_type": "code", 479 | "execution_count": 34, 480 | "metadata": {}, 481 | "outputs": [ 482 | { 483 | "data": { 484 | "text/plain": [ 485 | "(1, 5)" 486 | ] 487 | }, 488 | "execution_count": 34, 489 | "metadata": {}, 490 | "output_type": "execute_result" 491 | } 492 | ], 493 | "source": [ 494 | "np.reshape(arr, (1, -1)).shape" 495 | ] 496 | }, 497 | { 498 | "cell_type": "markdown", 499 | "metadata": {}, 500 | "source": [ 501 | "#### 给一维数组添加一个列维度" 502 | ] 503 | }, 504 | { 505 | "cell_type": "code", 506 | "execution_count": 35, 507 | "metadata": {}, 508 | "outputs": [ 509 | { 510 | "data": { 511 | "text/plain": [ 512 | "array([[0],\n", 513 | " [1],\n", 514 | " [2],\n", 515 | " [3],\n", 516 | " [4]])" 517 | ] 518 | }, 519 | "execution_count": 35, 520 | "metadata": {}, 521 | "output_type": "execute_result" 522 | } 523 | ], 524 | "source": [ 525 | "np.reshape(arr, (-1, 1))" 526 | ] 527 | }, 528 | { 529 | "cell_type": "code", 530 | "execution_count": 36, 531 | "metadata": {}, 532 | "outputs": [ 533 | { 534 | "data": { 535 | "text/plain": [ 536 | "(5, 
1)" 537 | ] 538 | }, 539 | "execution_count": 36, 540 | "metadata": {}, 541 | "output_type": "execute_result" 542 | } 543 | ], 544 | "source": [ 545 | "np.reshape(arr, (-1, 1)).shape" 546 | ] 547 | } 548 | ], 549 | "metadata": { 550 | "kernelspec": { 551 | "display_name": "Python 3", 552 | "language": "python", 553 | "name": "python3" 554 | }, 555 | "language_info": { 556 | "codemirror_mode": { 557 | "name": "ipython", 558 | "version": 3 559 | }, 560 | "file_extension": ".py", 561 | "mimetype": "text/x-python", 562 | "name": "python", 563 | "nbconvert_exporter": "python", 564 | "pygments_lexer": "ipython3", 565 | "version": "3.7.6" 566 | } 567 | }, 568 | "nbformat": 4, 569 | "nbformat_minor": 4 570 | } 571 | -------------------------------------------------------------------------------- /.ipynb_checkpoints/08. Numpy实现K折交叉验证的数据划分-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "## Splitting data for K-fold cross-validation with NumPy" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "This example uses NumPy's array slicing syntax to split data for K-fold cross-validation" 15 | ] 16 | }, 17 | { 18 | "cell_type": "markdown", 19 | "metadata": {}, 20 | "source": [ 21 | "### Background: K-fold cross-validation\n", 22 | "\n", 23 | "***Why is it needed?*** \n", 24 | "In machine learning, K-fold cross-validation gives a better estimate of model performance for the following reasons:\n", 25 | "1. When samples are scarce, splitting them into a training set and a test set leaves even less data for training;\n", 26 | "2. Different train/test splits can yield different model performance results;\n", 27 | "\n", 28 | "\n", 29 | "***What is K-fold validation?*** \n", 30 | "K-fold validation splits the data into K partitions of equal size. \n", 31 | "For each partition i, train the model on the remaining K-1 partitions, then evaluate it on partition i. \n", 32 | "The final score is the mean of the K scores; averaging reduces the effect of any particular train/test split;\n", 33 | "\n", 34 | "" 35 | ] 36 | }, 37 | { 38 | "cell_type": "markdown", 39 | "metadata": {}, 40 | "source": [ 41 | "### 1. 
模拟构造样本集合" 42 | ] 43 | }, 44 | { 45 | "cell_type": "code", 46 | "execution_count": 1, 47 | "metadata": {}, 48 | "outputs": [], 49 | "source": [ 50 | "import numpy as np" 51 | ] 52 | }, 53 | { 54 | "cell_type": "code", 55 | "execution_count": 2, 56 | "metadata": {}, 57 | "outputs": [ 58 | { 59 | "data": { 60 | "text/plain": [ 61 | "array([[ 0, 1, 2, 3],\n", 62 | " [ 4, 5, 6, 7],\n", 63 | " [ 8, 9, 10, 11],\n", 64 | " [12, 13, 14, 15],\n", 65 | " [16, 17, 18, 19],\n", 66 | " [20, 21, 22, 23],\n", 67 | " [24, 25, 26, 27],\n", 68 | " [28, 29, 30, 31],\n", 69 | " [32, 33, 34, 35]])" 70 | ] 71 | }, 72 | "execution_count": 2, 73 | "metadata": {}, 74 | "output_type": "execute_result" 75 | } 76 | ], 77 | "source": [ 78 | "data = np.arange(36).reshape(9,4)\n", 79 | "data" 80 | ] 81 | }, 82 | { 83 | "cell_type": "markdown", 84 | "metadata": {}, 85 | "source": [ 86 | "用样本的角度解释下data数组:\n", 87 | "* 这是一个二维矩阵,行代表每个样本,列代表每个特征\n", 88 | "* 这里有9个样本,每个样本有4个特征\n", 89 | "\n", 90 | "这是scikit-learn模型训练输入的标准格式" 91 | ] 92 | }, 93 | { 94 | "cell_type": "markdown", 95 | "metadata": {}, 96 | "source": [ 97 | "### 2. 
使用Numpy实现K次划分" 98 | ] 99 | }, 100 | { 101 | "cell_type": "code", 102 | "execution_count": 3, 103 | "metadata": {}, 104 | "outputs": [], 105 | "source": [ 106 | "# 我们想进行4折交叉验证\n", 107 | "k = 4" 108 | ] 109 | }, 110 | { 111 | "cell_type": "code", 112 | "execution_count": 4, 113 | "metadata": {}, 114 | "outputs": [ 115 | { 116 | "data": { 117 | "text/plain": [ 118 | "2" 119 | ] 120 | }, 121 | "execution_count": 4, 122 | "metadata": {}, 123 | "output_type": "execute_result" 124 | } 125 | ], 126 | "source": [ 127 | "# 算出来每个fold的样本个数\n", 128 | "k_samples_count = data.shape[0]//k\n", 129 | "k_samples_count" 130 | ] 131 | }, 132 | { 133 | "cell_type": "code", 134 | "execution_count": 5, 135 | "metadata": { 136 | "scrolled": false 137 | }, 138 | "outputs": [ 139 | { 140 | "name": "stdout", 141 | "output_type": "stream", 142 | "text": [ 143 | "\n", 144 | "#####第0折#####\n", 145 | "验证集:\n", 146 | " [[0 1 2 3]\n", 147 | " [4 5 6 7]]\n", 148 | "训练集:\n", 149 | " [[ 8 9 10 11]\n", 150 | " [12 13 14 15]\n", 151 | " [16 17 18 19]\n", 152 | " [20 21 22 23]\n", 153 | " [24 25 26 27]\n", 154 | " [28 29 30 31]\n", 155 | " [32 33 34 35]]\n", 156 | "\n", 157 | "#####第1折#####\n", 158 | "验证集:\n", 159 | " [[ 8 9 10 11]\n", 160 | " [12 13 14 15]]\n", 161 | "训练集:\n", 162 | " [[ 0 1 2 3]\n", 163 | " [ 4 5 6 7]\n", 164 | " [16 17 18 19]\n", 165 | " [20 21 22 23]\n", 166 | " [24 25 26 27]\n", 167 | " [28 29 30 31]\n", 168 | " [32 33 34 35]]\n", 169 | "\n", 170 | "#####第2折#####\n", 171 | "验证集:\n", 172 | " [[16 17 18 19]\n", 173 | " [20 21 22 23]]\n", 174 | "训练集:\n", 175 | " [[ 0 1 2 3]\n", 176 | " [ 4 5 6 7]\n", 177 | " [ 8 9 10 11]\n", 178 | " [12 13 14 15]\n", 179 | " [24 25 26 27]\n", 180 | " [28 29 30 31]\n", 181 | " [32 33 34 35]]\n", 182 | "\n", 183 | "#####第3折#####\n", 184 | "验证集:\n", 185 | " [[24 25 26 27]\n", 186 | " [28 29 30 31]]\n", 187 | "训练集:\n", 188 | " [[ 0 1 2 3]\n", 189 | " [ 4 5 6 7]\n", 190 | " [ 8 9 10 11]\n", 191 | " [12 13 14 15]\n", 192 | " [16 17 18 19]\n", 193 | " [20 21 
22 23]\n", 194 | " [32 33 34 35]]\n" 195 | ] 196 | } 197 | ], 198 | "source": [ 199 | "for fold in range(k):\n", 200 | " validation_begin = k_samples_count*fold\n", 201 | " validation_end = k_samples_count*(fold+1)\n", 202 | " \n", 203 | " validation_data = data[validation_begin:validation_end]\n", 204 | " \n", 205 | " # np.vstack,沿着垂直的方向堆叠数组\n", 206 | " train_data = np.vstack([\n", 207 | " data[:validation_begin], \n", 208 | " data[validation_end:]\n", 209 | " ])\n", 210 | " \n", 211 | " print()\n", 212 | " print(f\"#####第{fold}折#####\")\n", 213 | " print(\"验证集:\\n\", validation_data)\n", 214 | " print(\"训练集:\\n\", train_data)" 215 | ] 216 | }, 217 | { 218 | "cell_type": "markdown", 219 | "metadata": {}, 220 | "source": [ 221 | "如果使用scikit-learn,已经有封装好的实现: \n", 222 | "from sklearn.model_selection import cross_val_score" 223 | ] 224 | } 225 | ], 226 | "metadata": { 227 | "kernelspec": { 228 | "display_name": "Python 3", 229 | "language": "python", 230 | "name": "python3" 231 | }, 232 | "language_info": { 233 | "codemirror_mode": { 234 | "name": "ipython", 235 | "version": 3 236 | }, 237 | "file_extension": ".py", 238 | "mimetype": "text/x-python", 239 | "name": "python", 240 | "nbconvert_exporter": "python", 241 | "pygments_lexer": "ipython3", 242 | "version": "3.7.6" 243 | } 244 | }, 245 | "nbformat": 4, 246 | "nbformat_minor": 4 247 | } 248 | -------------------------------------------------------------------------------- /.ipynb_checkpoints/09. Numpy非常有用的数组合并操作-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "## Numpy非常重要有用的数组合并操作" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "背景:在给机器学习准备数据的过程中,经常需要进行不同来源的数据合并的操作。\n", 15 | "\n", 16 | "两类场景:\n", 17 | "1. 给已有的数据添加多行,比如增添一些样本数据进去;\n", 18 | "2. 
给已有的数据添加多列,比如增添一些特征进去;\n", 19 | "\n", 20 | "以下操作均可以实现数组合并:\n", 21 | "* np.concatenate(array_list, axis=0/1):沿着指定axis进行数组的合并\n", 22 | "* np.vstack或者np.row_stack(array_list):垂直vertically、按行row wise进行数据合并\n", 23 | "* np.hstack或者np.column_stack(array_list):水平horizontally、按列column wise进行数据合并" 24 | ] 25 | }, 26 | { 27 | "cell_type": "code", 28 | "execution_count": 1, 29 | "metadata": {}, 30 | "outputs": [], 31 | "source": [ 32 | "import numpy as np" 33 | ] 34 | }, 35 | { 36 | "cell_type": "markdown", 37 | "metadata": {}, 38 | "source": [ 39 | "### 1. 怎样给数据添加新的多行" 40 | ] 41 | }, 42 | { 43 | "cell_type": "code", 44 | "execution_count": 2, 45 | "metadata": {}, 46 | "outputs": [], 47 | "source": [ 48 | "a = np.arange(6).reshape(2,3)\n", 49 | "b = np.random.randint(10,20,size=(4,3))" 50 | ] 51 | }, 52 | { 53 | "cell_type": "code", 54 | "execution_count": 3, 55 | "metadata": {}, 56 | "outputs": [ 57 | { 58 | "data": { 59 | "text/plain": [ 60 | "array([[0, 1, 2],\n", 61 | " [3, 4, 5]])" 62 | ] 63 | }, 64 | "execution_count": 3, 65 | "metadata": {}, 66 | "output_type": "execute_result" 67 | } 68 | ], 69 | "source": [ 70 | "a" 71 | ] 72 | }, 73 | { 74 | "cell_type": "code", 75 | "execution_count": 4, 76 | "metadata": {}, 77 | "outputs": [ 78 | { 79 | "data": { 80 | "text/plain": [ 81 | "array([[12, 11, 14],\n", 82 | " [18, 15, 10],\n", 83 | " [11, 15, 15],\n", 84 | " [19, 16, 10]])" 85 | ] 86 | }, 87 | "execution_count": 4, 88 | "metadata": {}, 89 | "output_type": "execute_result" 90 | } 91 | ], 92 | "source": [ 93 | "b" 94 | ] 95 | }, 96 | { 97 | "cell_type": "code", 98 | "execution_count": 5, 99 | "metadata": {}, 100 | "outputs": [ 101 | { 102 | "data": { 103 | "text/plain": [ 104 | "array([[ 0, 1, 2],\n", 105 | " [ 3, 4, 5],\n", 106 | " [12, 11, 14],\n", 107 | " [18, 15, 10],\n", 108 | " [11, 15, 15],\n", 109 | " [19, 16, 10]])" 110 | ] 111 | }, 112 | "execution_count": 5, 113 | "metadata": {}, 114 | "output_type": "execute_result" 115 | } 116 | ], 117 | "source": [ 118 | "# 
方法1:\n", 119 | "np.concatenate([a,b])" 120 | ] 121 | }, 122 | { 123 | "cell_type": "code", 124 | "execution_count": 6, 125 | "metadata": {}, 126 | "outputs": [ 127 | { 128 | "data": { 129 | "text/plain": [ 130 | "array([[ 0, 1, 2],\n", 131 | " [ 3, 4, 5],\n", 132 | " [12, 11, 14],\n", 133 | " [18, 15, 10],\n", 134 | " [11, 15, 15],\n", 135 | " [19, 16, 10]])" 136 | ] 137 | }, 138 | "execution_count": 6, 139 | "metadata": {}, 140 | "output_type": "execute_result" 141 | } 142 | ], 143 | "source": [ 144 | "# 方法2\n", 145 | "np.vstack([a,b])" 146 | ] 147 | }, 148 | { 149 | "cell_type": "code", 150 | "execution_count": 7, 151 | "metadata": {}, 152 | "outputs": [ 153 | { 154 | "data": { 155 | "text/plain": [ 156 | "array([[ 0, 1, 2],\n", 157 | " [ 3, 4, 5],\n", 158 | " [12, 11, 14],\n", 159 | " [18, 15, 10],\n", 160 | " [11, 15, 15],\n", 161 | " [19, 16, 10]])" 162 | ] 163 | }, 164 | "execution_count": 7, 165 | "metadata": {}, 166 | "output_type": "execute_result" 167 | } 168 | ], 169 | "source": [ 170 | "# 方法3\n", 171 | "np.row_stack([a, b])" 172 | ] 173 | }, 174 | { 175 | "cell_type": "markdown", 176 | "metadata": {}, 177 | "source": [ 178 | "### 2. 
怎样给数据添加新的多列" 179 | ] 180 | }, 181 | { 182 | "cell_type": "code", 183 | "execution_count": 8, 184 | "metadata": {}, 185 | "outputs": [], 186 | "source": [ 187 | "a = np.arange(12).reshape(3,4)\n", 188 | "b = np.random.randint(10,20,size=(3,2))" 189 | ] 190 | }, 191 | { 192 | "cell_type": "code", 193 | "execution_count": 9, 194 | "metadata": {}, 195 | "outputs": [ 196 | { 197 | "data": { 198 | "text/plain": [ 199 | "array([[ 0, 1, 2, 3],\n", 200 | " [ 4, 5, 6, 7],\n", 201 | " [ 8, 9, 10, 11]])" 202 | ] 203 | }, 204 | "execution_count": 9, 205 | "metadata": {}, 206 | "output_type": "execute_result" 207 | } 208 | ], 209 | "source": [ 210 | "a" 211 | ] 212 | }, 213 | { 214 | "cell_type": "code", 215 | "execution_count": 10, 216 | "metadata": {}, 217 | "outputs": [ 218 | { 219 | "data": { 220 | "text/plain": [ 221 | "array([[16, 10],\n", 222 | " [12, 17],\n", 223 | " [12, 10]])" 224 | ] 225 | }, 226 | "execution_count": 10, 227 | "metadata": {}, 228 | "output_type": "execute_result" 229 | } 230 | ], 231 | "source": [ 232 | "b" 233 | ] 234 | }, 235 | { 236 | "cell_type": "code", 237 | "execution_count": 11, 238 | "metadata": {}, 239 | "outputs": [ 240 | { 241 | "data": { 242 | "text/plain": [ 243 | "array([[ 0, 1, 2, 3, 16, 10],\n", 244 | " [ 4, 5, 6, 7, 12, 17],\n", 245 | " [ 8, 9, 10, 11, 12, 10]])" 246 | ] 247 | }, 248 | "execution_count": 11, 249 | "metadata": {}, 250 | "output_type": "execute_result" 251 | } 252 | ], 253 | "source": [ 254 | "# 方法1\n", 255 | "np.concatenate([a,b], axis=1)" 256 | ] 257 | }, 258 | { 259 | "cell_type": "code", 260 | "execution_count": 12, 261 | "metadata": {}, 262 | "outputs": [ 263 | { 264 | "data": { 265 | "text/plain": [ 266 | "array([[ 0, 1, 2, 3, 16, 10],\n", 267 | " [ 4, 5, 6, 7, 12, 17],\n", 268 | " [ 8, 9, 10, 11, 12, 10]])" 269 | ] 270 | }, 271 | "execution_count": 12, 272 | "metadata": {}, 273 | "output_type": "execute_result" 274 | } 275 | ], 276 | "source": [ 277 | "# 方法2\n", 278 | "np.hstack([a,b])" 279 | ] 280 | }, 281 | { 
282 | "cell_type": "code", 283 | "execution_count": 13, 284 | "metadata": {}, 285 | "outputs": [ 286 | { 287 | "data": { 288 | "text/plain": [ 289 | "array([[ 0, 1, 2, 3, 16, 10],\n", 290 | " [ 4, 5, 6, 7, 12, 17],\n", 291 | " [ 8, 9, 10, 11, 12, 10]])" 292 | ] 293 | }, 294 | "execution_count": 13, 295 | "metadata": {}, 296 | "output_type": "execute_result" 297 | } 298 | ], 299 | "source": [ 300 | "# Method 3\n", 301 | "np.column_stack([a,b])" 302 | ] 303 | } 304 | ], 305 | "metadata": { 306 | "kernelspec": { 307 | "display_name": "Python 3", 308 | "language": "python", 309 | "name": "python3" 310 | }, 311 | "language_info": { 312 | "codemirror_mode": { 313 | "name": "ipython", 314 | "version": 3 315 | }, 316 | "file_extension": ".py", 317 | "mimetype": "text/x-python", 318 | "name": "python", 319 | "nbconvert_exporter": "python", 320 | "pygments_lexer": "ipython3", 321 | "version": "3.7.6" 322 | } 323 | }, 324 | "nbformat": 4, 325 | "nbformat_minor": 4 326 | } 327 | -------------------------------------------------------------------------------- /.ipynb_checkpoints/10. Numpy怎样对数组排序-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "## Sorting arrays with NumPy\n", 8 | "\n", 9 | "NumPy offers three ways to sort an array: \n", 10 | "* numpy.sort: returns a sorted copy of the array\n", 11 | "* array.sort: sorts the array in place instead of returning a copy\n", 12 | "* numpy.argsort: indirect sort; returns the indices that would sort the array\n", 13 | "\n", 14 | "All three accept a kind parameter, which can be one of the following:\n", 15 | "* quicksort: quicksort, O(n log n) on average, not stable\n", 16 | "* mergesort: merge sort, O(n log n) on average, stable\n", 17 | "* heapsort: heapsort, O(n log n) on average, not stable\n", 18 | "* stable: a stable sort\n", 19 | "\n", 20 | "kind defaults to quicksort, which is the fastest on average, so the default can usually be kept" 21 | ] 22 | }, 23 | { 24 | "cell_type": "code", 25 | "execution_count": 1, 26 | "metadata": {}, 27 | "outputs": [], 28 | "source": [ 29 | "import numpy as np" 30 | ] 31 | }, 32 | { 33 | "cell_type": "markdown", 34 | "metadata": {}, 35 | "source": [ 36 | "### 1. 
np.sort返回排序后的数组" 37 | ] 38 | }, 39 | { 40 | "cell_type": "code", 41 | "execution_count": 2, 42 | "metadata": {}, 43 | "outputs": [], 44 | "source": [ 45 | "arr = np.array([3, 2, 4, 5, 1, 9, 7, 8, 6])" 46 | ] 47 | }, 48 | { 49 | "cell_type": "code", 50 | "execution_count": 3, 51 | "metadata": {}, 52 | "outputs": [ 53 | { 54 | "data": { 55 | "text/plain": [ 56 | "array([1, 2, 3, 4, 5, 6, 7, 8, 9])" 57 | ] 58 | }, 59 | "execution_count": 3, 60 | "metadata": {}, 61 | "output_type": "execute_result" 62 | } 63 | ], 64 | "source": [ 65 | "# 返回拷贝后的数组\n", 66 | "np.sort(arr)" 67 | ] 68 | }, 69 | { 70 | "cell_type": "code", 71 | "execution_count": 4, 72 | "metadata": {}, 73 | "outputs": [ 74 | { 75 | "data": { 76 | "text/plain": [ 77 | "array([3, 2, 4, 5, 1, 9, 7, 8, 6])" 78 | ] 79 | }, 80 | "execution_count": 4, 81 | "metadata": {}, 82 | "output_type": "execute_result" 83 | } 84 | ], 85 | "source": [ 86 | "arr" 87 | ] 88 | }, 89 | { 90 | "cell_type": "markdown", 91 | "metadata": {}, 92 | "source": [ 93 | "### 2. 
array.sort进行原地排序" 94 | ] 95 | }, 96 | { 97 | "cell_type": "code", 98 | "execution_count": 5, 99 | "metadata": {}, 100 | "outputs": [], 101 | "source": [ 102 | "arr2 = arr.copy()" 103 | ] 104 | }, 105 | { 106 | "cell_type": "code", 107 | "execution_count": 6, 108 | "metadata": {}, 109 | "outputs": [ 110 | { 111 | "data": { 112 | "text/plain": [ 113 | "array([3, 2, 4, 5, 1, 9, 7, 8, 6])" 114 | ] 115 | }, 116 | "execution_count": 6, 117 | "metadata": {}, 118 | "output_type": "execute_result" 119 | } 120 | ], 121 | "source": [ 122 | "arr2" 123 | ] 124 | }, 125 | { 126 | "cell_type": "code", 127 | "execution_count": 7, 128 | "metadata": {}, 129 | "outputs": [], 130 | "source": [ 131 | "arr2.sort()" 132 | ] 133 | }, 134 | { 135 | "cell_type": "code", 136 | "execution_count": 8, 137 | "metadata": {}, 138 | "outputs": [ 139 | { 140 | "data": { 141 | "text/plain": [ 142 | "array([1, 2, 3, 4, 5, 6, 7, 8, 9])" 143 | ] 144 | }, 145 | "execution_count": 8, 146 | "metadata": {}, 147 | "output_type": "execute_result" 148 | } 149 | ], 150 | "source": [ 151 | "arr2" 152 | ] 153 | }, 154 | { 155 | "cell_type": "markdown", 156 | "metadata": {}, 157 | "source": [ 158 | "### 3. 
np.argsort returns the indices that would sort the array" 159 | ] 160 | }, 161 | { 162 | "cell_type": "code", 163 | "execution_count": 9, 164 | "metadata": {}, 165 | "outputs": [ 166 | { 167 | "data": { 168 | "text/plain": [ 169 | "array([3, 2, 4, 5, 1, 9, 7, 8, 6])" 170 | ] 171 | }, 172 | "execution_count": 9, 173 | "metadata": {}, 174 | "output_type": "execute_result" 175 | } 176 | ], 177 | "source": [ 178 | "arr" 179 | ] 180 | }, 181 | { 182 | "cell_type": "code", 183 | "execution_count": 10, 184 | "metadata": {}, 185 | "outputs": [ 186 | { 187 | "data": { 188 | "text/plain": [ 189 | "array([4, 1, 0, 2, 3, 8, 6, 7, 5], dtype=int64)" 190 | ] 191 | }, 192 | "execution_count": 10, 193 | "metadata": {}, 194 | "output_type": "execute_result" 195 | } 196 | ], 197 | "source": [ 198 | "# Get the indices of the elements in sorted order\n", 199 | "indices = np.argsort(arr)\n", 200 | "indices" 201 | ] 202 | }, 203 | { 204 | "cell_type": "code", 205 | "execution_count": 11, 206 | "metadata": { 207 | "scrolled": true 208 | }, 209 | "outputs": [ 210 | { 211 | "data": { 212 | "text/plain": [ 213 | "array([1, 2, 3, 4, 5, 6, 7, 8, 9])" 214 | ] 215 | }, 216 | "execution_count": 11, 217 | "metadata": {}, 218 | "output_type": "execute_result" 219 | } 220 | ], 221 | "source": [ 222 | "# Indexing with these indices yields the sorted data directly\n", 223 | "arr[indices]" 224 | ] 225 | }, 226 | { 227 | "cell_type": "markdown", 228 | "metadata": {}, 229 | "source": [ 230 | "### 4. Performance of Python's built-in sorted vs. np.sort" 231 | ] 232 | }, 233 | { 234 | "cell_type": "code", 235 | "execution_count": 12, 236 | "metadata": {}, 237 | "outputs": [], 238 | "source": [ 239 | "arr_np = np.random.randint(0, 100, 100*10000)" 240 | ] 241 | }, 242 | { 243 | "cell_type": "code", 244 | "execution_count": 13, 245 | "metadata": {}, 246 | "outputs": [ 247 | { 248 | "name": "stdout", 249 | "output_type": "stream", 250 | "text": [ 251 | "24 ms ± 2.14 ms per loop (mean ± std. dev. 
of 7 runs, 10 loops each)\n" 252 | ] 253 | } 254 | ], 255 | "source": [ 256 | "%timeit np.sort(arr_np)" 257 | ] 258 | }, 259 | { 260 | "cell_type": "code", 261 | "execution_count": 14, 262 | "metadata": {}, 263 | "outputs": [], 264 | "source": [ 265 | "# Convert the NumPy array to a Python list\n", 266 | "arr_py = arr_np.tolist()" 267 | ] 268 | }, 269 | { 270 | "cell_type": "code", 271 | "execution_count": 15, 272 | "metadata": {}, 273 | "outputs": [ 274 | { 275 | "name": "stdout", 276 | "output_type": "stream", 277 | "text": [ 278 | "90.1 ms ± 726 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)\n" 279 | ] 280 | } 281 | ], 282 | "source": [ 283 | "%timeit sorted(arr_py)" 284 | ] 285 | }, 286 | { 287 | "cell_type": "code", 288 | "execution_count": null, 289 | "metadata": {}, 290 | "outputs": [], 291 | "source": [] 292 | } 293 | ], 294 | "metadata": { 295 | "kernelspec": { 296 | "display_name": "Python 3", 297 | "language": "python", 298 | "name": "python3" 299 | }, 300 | "language_info": { 301 | "codemirror_mode": { 302 | "name": "ipython", 303 | "version": 3 304 | }, 305 | "file_extension": ".py", 306 | "mimetype": "text/x-python", 307 | "name": "python", 308 | "nbconvert_exporter": "python", 309 | "pygments_lexer": "ipython3", 310 | "version": "3.7.6" 311 | } 312 | }, 313 | "nbformat": 4, 314 | "nbformat_minor": 4 315 | } 316 | -------------------------------------------------------------------------------- /.ipynb_checkpoints/12. Numpy中重要的广播概念-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "## Broadcasting, an important NumPy concept" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "***Broadcasting:*** \n", 15 | "Put simply, a set of rules for applying binary universal functions (addition, subtraction, multiplication, etc.) to arrays of different shapes\n", 16 | "\n", 17 | "***Broadcasting rules:***\n", 18 | "1. If the two arrays have a different number of dimensions (ndim), the shape of the array with fewer dimensions is padded with 1s on its left\n", 19 | "2. If the shapes differ in some dimension but one of the two sizes there is 1, the size-1 dimension is stretched to match the other array;\n", 20 | "3. 
If the shapes differ in some dimension and neither size there is 1, broadcasting fails and an error is raised;" 21 | ] 22 | }, 23 | { 24 | "cell_type": "code", 25 | "execution_count": 1, 26 | "metadata": {}, 27 | "outputs": [], 28 | "source": [ 29 | "import numpy as np" 30 | ] 31 | }, 32 | { 33 | "cell_type": "markdown", 34 | "metadata": {}, 35 | "source": [ 36 | "### Example 1: Adding a 2-D array and a 1-D array" 37 | ] 38 | }, 39 | { 40 | "cell_type": "code", 41 | "execution_count": 2, 42 | "metadata": {}, 43 | "outputs": [ 44 | { 45 | "data": { 46 | "text/plain": [ 47 | "array([[1., 1., 1.],\n", 48 | " [1., 1., 1.]])" 49 | ] 50 | }, 51 | "execution_count": 2, 52 | "metadata": {}, 53 | "output_type": "execute_result" 54 | } 55 | ], 56 | "source": [ 57 | "a = np.ones((2,3))\n", 58 | "a" 59 | ] 60 | }, 61 | { 62 | "cell_type": "code", 63 | "execution_count": 3, 64 | "metadata": {}, 65 | "outputs": [ 66 | { 67 | "data": { 68 | "text/plain": [ 69 | "array([0, 1, 2])" 70 | ] 71 | }, 72 | "execution_count": 3, 73 | "metadata": {}, 74 | "output_type": "execute_result" 75 | } 76 | ], 77 | "source": [ 78 | "b = np.arange(3)\n", 79 | "b" 80 | ] 81 | }, 82 | { 83 | "cell_type": "code", 84 | "execution_count": 4, 85 | "metadata": {}, 86 | "outputs": [ 87 | { 88 | "data": { 89 | "text/plain": [ 90 | "((2, 3), (3,))" 91 | ] 92 | }, 93 | "execution_count": 4, 94 | "metadata": {}, 95 | "output_type": "execute_result" 96 | } 97 | ], 98 | "source": [ 99 | "a.shape, b.shape" 100 | ] 101 | }, 102 | { 103 | "cell_type": "code", 104 | "execution_count": 5, 105 | "metadata": {}, 106 | "outputs": [ 107 | { 108 | "data": { 109 | "text/plain": [ 110 | "array([[1., 2., 3.],\n", 111 | " [1., 2., 3.]])" 112 | ] 113 | }, 114 | "execution_count": 5, 115 | "metadata": {}, 116 | "output_type": "execute_result" 117 | } 118 | ], 119 | "source": [ 120 | "# The shapes differ, but the arrays can still be added\n", 121 | "a + b" 122 | ] 123 | }, 124 | { 125 | "cell_type": "markdown", 126 | "metadata": {}, 127 | "source": [ 128 | "***Analysis: a.shape=(2, 3), b.shape=(3,)***\n", 129 | "1. By rule 1, b.shape becomes (1, 3)\n", 130 | "2. 
By rule 2, b.shape then becomes (2, 3), which is equivalent to repeating b along the rows\n", 131 | "3. The shapes now match" 132 | ] 133 | }, 134 | { 135 | "cell_type": "markdown", 136 | "metadata": {}, 137 | "source": [ 138 | "### Example 2: Both arrays need broadcasting" 139 | ] 140 | }, 141 | { 142 | "cell_type": "code", 143 | "execution_count": 6, 144 | "metadata": {}, 145 | "outputs": [ 146 | { 147 | "data": { 148 | "text/plain": [ 149 | "array([[0],\n", 150 | " [1],\n", 151 | " [2]])" 152 | ] 153 | }, 154 | "execution_count": 6, 155 | "metadata": {}, 156 | "output_type": "execute_result" 157 | } 158 | ], 159 | "source": [ 160 | "a = np.arange(3).reshape((3, 1))\n", 161 | "a" 162 | ] 163 | }, 164 | { 165 | "cell_type": "code", 166 | "execution_count": 7, 167 | "metadata": {}, 168 | "outputs": [ 169 | { 170 | "data": { 171 | "text/plain": [ 172 | "array([0, 1, 2])" 173 | ] 174 | }, 175 | "execution_count": 7, 176 | "metadata": {}, 177 | "output_type": "execute_result" 178 | } 179 | ], 180 | "source": [ 181 | "b = np.arange(3)\n", 182 | "b" 183 | ] 184 | }, 185 | { 186 | "cell_type": "code", 187 | "execution_count": 8, 188 | "metadata": {}, 189 | "outputs": [ 190 | { 191 | "data": { 192 | "text/plain": [ 193 | "((3, 1), (3,))" 194 | ] 195 | }, 196 | "execution_count": 8, 197 | "metadata": {}, 198 | "output_type": "execute_result" 199 | } 200 | ], 201 | "source": [ 202 | "a.shape, b.shape" 203 | ] 204 | }, 205 | { 206 | "cell_type": "code", 207 | "execution_count": 9, 208 | "metadata": { 209 | "scrolled": true 210 | }, 211 | "outputs": [ 212 | { 213 | "data": { 214 | "text/plain": [ 215 | "array([[0, 1, 2],\n", 216 | " [1, 2, 3],\n", 217 | " [2, 3, 4]])" 218 | ] 219 | }, 220 | "execution_count": 9, 221 | "metadata": {}, 222 | "output_type": "execute_result" 223 | } 224 | ], 225 | "source": [ 226 | "a + b" 227 | ] 228 | }, 229 | { 230 | "cell_type": "markdown", 231 | "metadata": {}, 232 | "source": [ 233 | "***Analysis: a.shape is (3,1), b.shape is (3,)***:\n", 234 | "1. By rule 1, b.shape becomes (1, 3)\n", 235 | "2. By rule 2, b.shape then becomes (3, 3), which is equivalent to repeating b along the rows\n", 236 | "3. 
By rule 2, a.shape then becomes (3, 3), which is equivalent to repeating a along the columns\n", 237 | "4. The shapes now match" 238 | ] 239 | }, 240 | { 241 | "cell_type": "markdown", 242 | "metadata": {}, 243 | "source": [ 244 | "### Example 3: Shapes that cannot be broadcast" 245 | ] 246 | }, 247 | { 248 | "cell_type": "code", 249 | "execution_count": 10, 250 | "metadata": {}, 251 | "outputs": [ 252 | { 253 | "data": { 254 | "text/plain": [ 255 | "array([[1., 1.],\n", 256 | " [1., 1.],\n", 257 | " [1., 1.]])" 258 | ] 259 | }, 260 | "execution_count": 10, 261 | "metadata": {}, 262 | "output_type": "execute_result" 263 | } 264 | ], 265 | "source": [ 266 | "a = np.ones((3,2))\n", 267 | "a" 268 | ] 269 | }, 270 | { 271 | "cell_type": "code", 272 | "execution_count": 11, 273 | "metadata": {}, 274 | "outputs": [ 275 | { 276 | "data": { 277 | "text/plain": [ 278 | "array([0, 1, 2])" 279 | ] 280 | }, 281 | "execution_count": 11, 282 | "metadata": {}, 283 | "output_type": "execute_result" 284 | } 285 | ], 286 | "source": [ 287 | "b = np.arange(3)\n", 288 | "b" 289 | ] 290 | }, 291 | { 292 | "cell_type": "code", 293 | "execution_count": 12, 294 | "metadata": {}, 295 | "outputs": [ 296 | { 297 | "data": { 298 | "text/plain": [ 299 | "((3, 2), (3,))" 300 | ] 301 | }, 302 | "execution_count": 12, 303 | "metadata": {}, 304 | "output_type": "execute_result" 305 | } 306 | ], 307 | "source": [ 308 | "a.shape, b.shape" 309 | ] 310 | }, 311 | { 312 | "cell_type": "code", 313 | "execution_count": 13, 314 | "metadata": {}, 315 | "outputs": [ 316 | { 317 | "ename": "ValueError", 318 | "evalue": "operands could not be broadcast together with shapes (3,2) (3,) ", 319 | "output_type": "error", 320 | "traceback": [ 321 | "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m", 322 | "\u001b[1;31mValueError\u001b[0m Traceback (most recent call last)", 323 | "\u001b[1;32m\u001b[0m in \u001b[0;36m\u001b[1;34m\u001b[0m\n\u001b[1;32m----> 1\u001b[1;33m \u001b[0ma\u001b[0m \u001b[1;33m+\u001b[0m 
\u001b[0mb\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m", 324 | "\u001b[1;31mValueError\u001b[0m: operands could not be broadcast together with shapes (3,2) (3,) " 325 | ] 326 | } 327 | ], 328 | "source": [ 329 | "a + b" 330 | ] 331 | }, 332 | { 333 | "cell_type": "markdown", 334 | "metadata": {}, 335 | "source": [ 336 | "***Analysis: a.shape is (3,2), b.shape is (3,)***:\n", 337 | "1. By rule 1, b.shape becomes (1, 3)\n", 338 | "2. By rule 2, b.shape then becomes (3, 3), which is equivalent to repeating b along the rows\n", 339 | "3. By rule 3, the last dimensions (2 and 3) still differ and neither is 1, so broadcasting fails with an error" 340 | ] 341 | } 342 | ], 343 | "metadata": { 344 | "kernelspec": { 345 | "display_name": "Python 3", 346 | "language": "python", 347 | "name": "python3" 348 | }, 349 | "language_info": { 350 | "codemirror_mode": { 351 | "name": "ipython", 352 | "version": 3 353 | }, 354 | "file_extension": ".py", 355 | "mimetype": "text/x-python", 356 | "name": "python", 357 | "nbconvert_exporter": "python", 358 | "pygments_lexer": "ipython3", 359 | "version": "3.7.6" 360 | } 361 | }, 362 | "nbformat": 4, 363 | "nbformat_minor": 4 364 | } 365 | -------------------------------------------------------------------------------- /.ipynb_checkpoints/13. Numpy求解线性方程组-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "## Solving a system of linear equations with NumPy\n", 8 | "\n", 9 | "Given A and b in Ax=b, how do we compute x?" 10 | ] 11 | }, 12 | { 13 | "cell_type": "markdown", 14 | "metadata": {}, 15 | "source": [ 16 | "### 1. Import the package" 17 | ] 18 | }, 19 | { 20 | "cell_type": "code", 21 | "execution_count": 1, 22 | "metadata": {}, 23 | "outputs": [], 24 | "source": [ 25 | "import numpy as np" 26 | ] 27 | }, 28 | { 29 | "cell_type": "markdown", 30 | "metadata": {}, 31 | "source": [ 32 | "### 2. 
Solve" 33 | ] 34 | }, 35 | { 36 | "cell_type": "code", 37 | "execution_count": 3, 38 | "metadata": {}, 39 | "outputs": [ 40 | { 41 | "data": { 42 | "text/plain": [ 43 | "array([[ 1, -2, 1],\n", 44 | " [ 0, 2, -8],\n", 45 | " [-4, 5, 9]])" 46 | ] 47 | }, 48 | "execution_count": 3, 49 | "metadata": {}, 50 | "output_type": "execute_result" 51 | } 52 | ], 53 | "source": [ 54 | "A = np.array(\n", 55 | " [\n", 56 | " [1, -2, 1],\n", 57 | " [0, 2, -8],\n", 58 | " [-4, 5, 9]\n", 59 | " ]\n", 60 | ")\n", 61 | "A" 62 | ] 63 | }, 64 | { 65 | "cell_type": "code", 66 | "execution_count": 5, 67 | "metadata": {}, 68 | "outputs": [ 69 | { 70 | "data": { 71 | "text/plain": [ 72 | "array([ 0, 8, -9])" 73 | ] 74 | }, 75 | "execution_count": 5, 76 | "metadata": {}, 77 | "output_type": "execute_result" 78 | } 79 | ], 80 | "source": [ 81 | "b = np.array([0, 8, -9])\n", 82 | "b" 83 | ] 84 | }, 85 | { 86 | "cell_type": "code", 87 | "execution_count": 7, 88 | "metadata": {}, 89 | "outputs": [ 90 | { 91 | "data": { 92 | "text/plain": [ 93 | "array([29., 16., 3.])" 94 | ] 95 | }, 96 | "execution_count": 7, 97 | "metadata": {}, 98 | "output_type": "execute_result" 99 | } 100 | ], 101 | "source": [ 102 | "# Call solve to compute x directly\n", 103 | "x = np.linalg.solve(A, b)\n", 104 | "x" 105 | ] 106 | }, 107 | { 108 | "cell_type": "markdown", 109 | "metadata": {}, 110 | "source": [ 111 | "### Verify" 112 | ] 113 | }, 114 | { 115 | "cell_type": "code", 116 | "execution_count": 9, 117 | "metadata": {}, 118 | "outputs": [ 119 | { 120 | "data": { 121 | "text/plain": [ 122 | "8.0" 123 | ] 124 | }, 125 | "execution_count": 9, 126 | "metadata": {}, 127 | "output_type": "execute_result" 128 | } 129 | ], 130 | "source": [ 131 | "# Verify a single equation\n", 132 | "A[1].dot(x)" 133 | ] 134 | }, 135 | { 136 | "cell_type": "code", 137 | "execution_count": 11, 138 | "metadata": {}, 139 | "outputs": [ 140 | { 141 | "data": { 142 | "text/plain": [ 143 | "array([ True, True, True])" 144 | ] 145 | }, 146 | "execution_count": 11, 147 | "metadata": 
{}, 148 | "output_type": "execute_result" 149 | } 150 | ], 151 | "source": [ 152 | "# Verify the full matrix product (exact == happens to hold here; np.allclose is the safer check for floats)\n", 153 | "A.dot(x) == b" 154 | ] 155 | } 156 | ], 157 | "metadata": { 158 | "kernelspec": { 159 | "display_name": "Python 3", 160 | "language": "python", 161 | "name": "python3" 162 | }, 163 | "language_info": { 164 | "codemirror_mode": { 165 | "name": "ipython", 166 | "version": 3 167 | }, 168 | "file_extension": ".py", 169 | "mimetype": "text/x-python", 170 | "name": "python", 171 | "nbconvert_exporter": "python", 172 | "pygments_lexer": "ipython3", 173 | "version": "3.7.6" 174 | } 175 | }, 176 | "nbformat": 4, 177 | "nbformat_minor": 4 178 | } 179 | -------------------------------------------------------------------------------- /.ipynb_checkpoints/14. Numpy实现SVD矩阵分解-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "## SVD matrix decomposition with NumPy" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "### 1. Import the package" 15 | ] 16 | }, 17 | { 18 | "cell_type": "code", 19 | "execution_count": 1, 20 | "metadata": {}, 21 | "outputs": [], 22 | "source": [ 23 | "import numpy as np" 24 | ] 25 | }, 26 | { 27 | "cell_type": "markdown", 28 | "metadata": {}, 29 | "source": [ 30 | "### 2. 
Decompose the matrix" 31 | ] 32 | }, 33 | { 34 | "cell_type": "code", 35 | "execution_count": 2, 36 | "metadata": {}, 37 | "outputs": [], 38 | "source": [ 39 | "A = np.random.randint(1, 10, (8, 4))" 40 | ] 41 | }, 42 | { 43 | "cell_type": "code", 44 | "execution_count": 3, 45 | "metadata": {}, 46 | "outputs": [ 47 | { 48 | "data": { 49 | "text/plain": [ 50 | "array([[6, 5, 1, 5],\n", 51 | " [1, 7, 9, 7],\n", 52 | " [7, 2, 4, 2],\n", 53 | " [6, 4, 3, 5],\n", 54 | " [2, 8, 8, 6],\n", 55 | " [5, 2, 8, 6],\n", 56 | " [7, 8, 2, 3],\n", 57 | " [1, 3, 6, 9]])" 58 | ] 59 | }, 60 | "execution_count": 3, 61 | "metadata": {}, 62 | "output_type": "execute_result" 63 | } 64 | ], 65 | "source": [ 66 | "A" 67 | ] 68 | }, 69 | { 70 | "cell_type": "code", 71 | "execution_count": 4, 72 | "metadata": {}, 73 | "outputs": [], 74 | "source": [ 75 | "# Perform the decomposition (the third return value is already V transposed)\n", 76 | "U, S, V = np.linalg.svd(A, full_matrices=False)" 77 | ] 78 | }, 79 | { 80 | "cell_type": "code", 81 | "execution_count": 5, 82 | "metadata": {}, 83 | "outputs": [ 84 | { 85 | "data": { 86 | "text/plain": [ 87 | "((8, 4), (4,), (4, 4))" 88 | ] 89 | }, 90 | "execution_count": 5, 91 | "metadata": {}, 92 | "output_type": "execute_result" 93 | } 94 | ], 95 | "source": [ 96 | "U.shape, S.shape, V.shape" 97 | ] 98 | }, 99 | { 100 | "cell_type": "code", 101 | "execution_count": 6, 102 | "metadata": {}, 103 | "outputs": [ 104 | { 105 | "data": { 106 | "text/plain": [ 107 | "array([[-0.28611227, -0.38768744, -0.07088588, -0.47757145],\n", 108 | " [-0.44374671, 0.40390585, -0.25458601, 0.20383531],\n", 109 | " [-0.24657791, -0.34884357, 0.43054458, 0.4062272 ],\n", 110 | " [-0.30673084, -0.27495123, 0.14797683, -0.2218886 ],\n", 111 | " [-0.43671345, 0.23339125, -0.39431663, 0.27599841],\n", 112 | " [-0.37257929, 0.10313032, 0.59362412, 0.23542645],\n", 113 | " [-0.33314069, -0.52514475, -0.41727103, 0.07285924],\n", 114 | " [-0.35472167, 0.38520663, 0.20225001, -0.61580222]])" 115 | ] 116 | }, 117 | "execution_count": 6, 118 | "metadata": {}, 119 
| "output_type": "execute_result" 120 | } 121 | ], 122 | "source": [ 123 | "U" 124 | ] 125 | }, 126 | { 127 | "cell_type": "code", 128 | "execution_count": 7, 129 | "metadata": {}, 130 | "outputs": [ 131 | { 132 | "data": { 133 | "text/plain": [ 134 | "array([28.44730142, 10.24874824, 6.39012419, 4.56952014])" 135 | ] 136 | }, 137 | "execution_count": 7, 138 | "metadata": {}, 139 | "output_type": "execute_result" 140 | } 141 | ], 142 | "source": [ 143 | "# S is a diagonal matrix, so only its diagonal entries are returned\n", 144 | "S" 145 | ] 146 | }, 147 | { 148 | "cell_type": "code", 149 | "execution_count": 8, 150 | "metadata": {}, 151 | "outputs": [ 152 | { 153 | "data": { 154 | "text/plain": [ 155 | "array([[28.44730142, 0. , 0. , 0. ],\n", 156 | " [ 0. , 10.24874824, 0. , 0. ],\n", 157 | " [ 0. , 0. , 6.39012419, 0. ],\n", 158 | " [ 0. , 0. , 0. , 4.56952014]])" 159 | ] 160 | }, 161 | "execution_count": 8, 162 | "metadata": {}, 163 | "output_type": "execute_result" 164 | } 165 | ], 166 | "source": [ 167 | "np.diag(S)" 168 | ] 169 | }, 170 | { 171 | "cell_type": "code", 172 | "execution_count": 9, 173 | "metadata": {}, 174 | "outputs": [ 175 | { 176 | "data": { 177 | "text/plain": [ 178 | "array([[-0.39194862, -0.50004828, -0.54329548, -0.54877866],\n", 179 | " [-0.81202147, -0.18350883, 0.48594814, 0.26608277],\n", 180 | " [ 0.41980592, -0.84227439, 0.27814277, 0.19228481],\n", 181 | " [ 0.10373231, 0.08276523, 0.62555658, -0.76880979]])" 182 | ] 183 | }, 184 | "execution_count": 9, 185 | "metadata": {}, 186 | "output_type": "execute_result" 187 | } 188 | ], 189 | "source": [ 190 | "V" 191 | ] 192 | }, 193 | { 194 | "cell_type": "markdown", 195 | "metadata": {}, 196 | "source": [ 197 | "### 3. 
Reconstruct the matrix from its factors" 198 | ] 199 | }, 200 | { 201 | "cell_type": "code", 202 | "execution_count": 10, 203 | "metadata": {}, 204 | "outputs": [ 205 | { 206 | "data": { 207 | "text/plain": [ 208 | "array([[6., 5., 1., 5.],\n", 209 | " [1., 7., 9., 7.],\n", 210 | " [7., 2., 4., 2.],\n", 211 | " [6., 4., 3., 5.],\n", 212 | " [2., 8., 8., 6.],\n", 213 | " [5., 2., 8., 6.],\n", 214 | " [7., 8., 2., 3.],\n", 215 | " [1., 3., 6., 9.]])" 216 | ] 217 | }, 218 | "execution_count": 10, 219 | "metadata": {}, 220 | "output_type": "execute_result" 221 | } 222 | ], 223 | "source": [ 224 | "U @ np.diag(S) @ V" 225 | ] 226 | }, 227 | { 228 | "cell_type": "code", 229 | "execution_count": null, 230 | "metadata": {}, 231 | "outputs": [], 232 | "source": [] 233 | } 234 | ], 235 | "metadata": { 236 | "kernelspec": { 237 | "display_name": "Python 3", 238 | "language": "python", 239 | "name": "python3" 240 | }, 241 | "language_info": { 242 | "codemirror_mode": { 243 | "name": "ipython", 244 | "version": 3 245 | }, 246 | "file_extension": ".py", 247 | "mimetype": "text/x-python", 248 | "name": "python", 249 | "nbconvert_exporter": "python", 250 | "pygments_lexer": "ipython3", 251 | "version": "3.7.6" 252 | } 253 | }, 254 | "nbformat": 4, 255 | "nbformat_minor": 4 256 | } 257 | -------------------------------------------------------------------------------- /.ipynb_checkpoints/17. 
Numpy计算逆矩阵求解线性方程组-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "## Solving a system of linear equations with the inverse matrix in NumPy" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "Consider a system of linear equations such as:\n", 15 | "* x + y + z = 6\n", 16 | "* 2y + 5z = -4\n", 17 | "* 2x + 5y - z = 27\n", 18 | "\n", 19 | "It can be written in matrix form:\n", 20 | "\n", 21 | "\n", 22 | "As a formula this is Ax=b, where A is a matrix and x and b are column vectors\n", 23 | "\n", 24 | "***Definition of the inverse matrix:*** \n", 25 | "Let A be an n×n matrix over a number field. If there exists another n×n matrix B such that AB=BA=E, then B is called the inverse of A, and A is called an invertible matrix. Note: E is the identity matrix.\n", 26 | "\n", 27 | "***Solving a linear system with the inverse matrix:*** \n", 28 | "Multiply both sides on the left by $A^{-1}$ to get $A^{-1}$Ax=$A^{-1}$b. Since $A^{-1}$A=E and the identity matrix times any vector is that vector itself, x=$A^{-1}$b" 29 | ] 30 | }, 31 | { 32 | "cell_type": "code", 33 | "execution_count": 1, 34 | "metadata": {}, 35 | "outputs": [], 36 | "source": [ 37 | "import numpy as np" 38 | ] 39 | }, 40 | { 41 | "cell_type": "markdown", 42 | "metadata": {}, 43 | "source": [ 44 | "### 1. 
Compute the inverse matrix" 45 | ] 46 | }, 47 | { 48 | "cell_type": "code", 49 | "execution_count": 2, 50 | "metadata": {}, 51 | "outputs": [ 52 | { 53 | "data": { 54 | "text/plain": [ 55 | "array([[ 1, 1, 1],\n", 56 | " [ 0, 2, 5],\n", 57 | " [ 2, 5, -1]])" 58 | ] 59 | }, 60 | "execution_count": 2, 61 | "metadata": {}, 62 | "output_type": "execute_result" 63 | } 64 | ], 65 | "source": [ 66 | "A = np.array([\n", 67 | " [1,1,1],\n", 68 | " [0,2,5],\n", 69 | " [2,5,-1]\n", 70 | "])\n", 71 | "A" 72 | ] 73 | }, 74 | { 75 | "cell_type": "code", 76 | "execution_count": 3, 77 | "metadata": {}, 78 | "outputs": [ 79 | { 80 | "data": { 81 | "text/plain": [ 82 | "array([[ 1.28571429, -0.28571429, -0.14285714],\n", 83 | " [-0.47619048, 0.14285714, 0.23809524],\n", 84 | " [ 0.19047619, 0.14285714, -0.0952381 ]])" 85 | ] 86 | }, 87 | "execution_count": 3, 88 | "metadata": {}, 89 | "output_type": "execute_result" 90 | } 91 | ], 92 | "source": [ 93 | "# B is the inverse of A\n", 94 | "B = np.linalg.inv(A)\n", 95 | "B" 96 | ] 97 | }, 98 | { 99 | "cell_type": "markdown", 100 | "metadata": {}, 101 | "source": [ 102 | "### 2. 
Verify that the product of the matrix and its inverse is the identity matrix" 103 | ] 104 | }, 105 | { 106 | "cell_type": "code", 107 | "execution_count": 6, 108 | "metadata": {}, 109 | "outputs": [ 110 | { 111 | "data": { 112 | "text/plain": [ 113 | "array([[ 1.00000000e+00, -2.77555756e-17, 2.77555756e-17],\n", 114 | " [ 0.00000000e+00, 1.00000000e+00, 0.00000000e+00],\n", 115 | " [-2.22044605e-16, 5.55111512e-17, 1.00000000e+00]])" 116 | ] 117 | }, 118 | "execution_count": 6, 119 | "metadata": {}, 120 | "output_type": "execute_result" 121 | } 122 | ], 123 | "source": [ 124 | "A@B" 125 | ] 126 | }, 127 | { 128 | "cell_type": "code", 129 | "execution_count": 7, 130 | "metadata": {}, 131 | "outputs": [ 132 | { 133 | "data": { 134 | "text/plain": [ 135 | "array([[ 1.00000000e+00, -2.77555756e-17, 2.77555756e-17],\n", 136 | " [ 0.00000000e+00, 1.00000000e+00, 0.00000000e+00],\n", 137 | " [-2.22044605e-16, 5.55111512e-17, 1.00000000e+00]])" 138 | ] 139 | }, 140 | "execution_count": 7, 141 | "metadata": {}, 142 | "output_type": "execute_result" 143 | } 144 | ], 145 | "source": [ 146 | "np.matmul(A, B)" 147 | ] 148 | }, 149 | { 150 | "cell_type": "markdown", 151 | "metadata": {}, 152 | "source": [ 153 | "### 3. 
Verify the linear system" 154 | ] 155 | }, 156 | { 157 | "cell_type": "code", 158 | "execution_count": 8, 159 | "metadata": {}, 160 | "outputs": [], 161 | "source": [ 162 | "# Build b in Ax=b\n", 163 | "b = np.array([6, -4, 27])" 164 | ] 165 | }, 166 | { 167 | "cell_type": "code", 168 | "execution_count": 9, 169 | "metadata": {}, 170 | "outputs": [], 171 | "source": [ 172 | "# Solve for x with the inverse matrix\n", 173 | "x = B@b" 174 | ] 175 | }, 176 | { 177 | "cell_type": "code", 178 | "execution_count": 10, 179 | "metadata": {}, 180 | "outputs": [ 181 | { 182 | "data": { 183 | "text/plain": [ 184 | "array([ 5., 3., -2.])" 185 | ] 186 | }, 187 | "execution_count": 10, 188 | "metadata": {}, 189 | "output_type": "execute_result" 190 | } 191 | ], 192 | "source": [ 193 | "x" 194 | ] 195 | }, 196 | { 197 | "cell_type": "code", 198 | "execution_count": 12, 199 | "metadata": {}, 200 | "outputs": [ 201 | { 202 | "data": { 203 | "text/plain": [ 204 | "array([ 6., -4., 27.])" 205 | ] 206 | }, 207 | "execution_count": 12, 208 | "metadata": {}, 209 | "output_type": "execute_result" 210 | } 211 | ], 212 | "source": [ 213 | "# Verify that A@x = b\n", 214 | "A@x" 215 | ] 216 | } 217 | ], 218 | "metadata": { 219 | "kernelspec": { 220 | "display_name": "Python 3", 221 | "language": "python", 222 | "name": "python3" 223 | }, 224 | "language_info": { 225 | "codemirror_mode": { 226 | "name": "ipython", 227 | "version": 3 228 | }, 229 | "file_extension": ".py", 230 | "mimetype": "text/x-python", 231 | "name": "python", 232 | "nbconvert_exporter": "python", 233 | "pygments_lexer": "ipython3", 234 | "version": "3.7.6" 235 | } 236 | }, 237 | "nbformat": 4, 238 | "nbformat_minor": 4 239 | } 240 | -------------------------------------------------------------------------------- /.ipynb_checkpoints/18. 
Numpy怎样将数组读写到文件-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "## Reading and writing NumPy arrays to files\n", 8 | "\n", 9 | "This notebook covers how NumPy writes arrays to files, and loads them back, using its own built-in binary formats;\n", 10 | "\n", 11 | "For text or tabular data, the pandas library is normally used for loading and processing instead of numpy\n", 12 | "\n", 13 | "The main functions:\n", 14 | "1. np.load(filename): load numpy arrays from a .npy or .npz file\n", 15 | "2. np.save(filename, arr): save a single numpy array to a .npy file\n", 16 | "3. np.savez(filename, arra=arra, arrb=arrb): save several numpy arrays to an uncompressed .npz file\n", 17 | "4. np.savez_compressed(filename, arra=arra, arrb=arrb): save several numpy arrays to a compressed .npz file\n", 18 | "\n", 19 | ".npy and .npz are both binary formats and show up as gibberish in a plain-text editor" 20 | ] 21 | }, 22 | { 23 | "cell_type": "code", 24 | "execution_count": 1, 25 | "metadata": {}, 26 | "outputs": [], 27 | "source": [ 28 | "import numpy as np" 29 | ] 30 | }, 31 | { 32 | "cell_type": "markdown", 33 | "metadata": {}, 34 | "source": [ 35 | "### 1. Saving and loading a single array with np.save and np.load" 36 | ] 37 | }, 38 | { 39 | "cell_type": "code", 40 | "execution_count": 2, 41 | "metadata": {}, 42 | "outputs": [ 43 | { 44 | "data": { 45 | "text/plain": [ 46 | "array([[ 0, 1, 2, 3],\n", 47 | " [ 4, 5, 6, 7],\n", 48 | " [ 8, 9, 10, 11]])" 49 | ] 50 | }, 51 | "execution_count": 2, 52 | "metadata": {}, 53 | "output_type": "execute_result" 54 | } 55 | ], 56 | "source": [ 57 | "a = np.arange(12).reshape(3,4)\n", 58 | "a" 59 | ] 60 | }, 61 | { 62 | "cell_type": "code", 63 | "execution_count": 3, 64 | "metadata": {}, 65 | "outputs": [], 66 | "source": [ 67 | "np.save(\"arr_a.npy\", a)" 68 | ] 69 | }, 70 | { 71 | "cell_type": "code", 72 | "execution_count": null, 73 | "metadata": {}, 74 | "outputs": [], 75 | "source": [] 76 | } 77 | ], 78 | "metadata": { 79 | "kernelspec": { 80 | "display_name": "Python 3", 81 | "language": "python", 82 | "name": "python3" 83 | }, 84 | "language_info": { 85 | "codemirror_mode": { 86 | "name": "ipython", 87 | "version": 3 88 | }, 89 | 
"file_extension": ".py", 90 | "mimetype": "text/x-python", 91 | "name": "python", 92 | "nbconvert_exporter": "python", 93 | "pygments_lexer": "ipython3", 94 | "version": "3.7.6" 95 | } 96 | }, 97 | "nbformat": 4, 98 | "nbformat_minor": 4 99 | } 100 | -------------------------------------------------------------------------------- /.ipynb_checkpoints/19. Numpy的结构化数组-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "## NumPy structured arrays" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "Normally, all elements of a NumPy array share one data type, such as int or float; \n", 15 | "this is also why NumPy is fast: the data is stored compactly in memory and is very quick to read; \n", 16 | "\n", 17 | "However, NumPy can also store heterogeneous records, such as the data below: \n", 18 | "\n", 19 | " \n", 20 | " \n", 21 | " \n", 22 | " \n", 23 | " \n", 24 | " \n", 25 | " \n", 26 | " \n", 27 | " \n", 28 | " \n", 29 | " \n", 30 | " \n", 31 | " \n", 32 | " \n", 33 | " \n", 34 | " \n", 35 | " \n", 36 | " \n", 37 | " \n", 38 | " \n", 39 | "
| Name | Age | Weight |
| --- | --- | --- |
| Xiao Wang | 30 | 80.5 |
| Xiao Li | 28 | 70.3 |
| Xiao Tian | 29 | 78.6 |
\n", 40 | "\n", 41 | "This is the NumPy “structured array” feature introduced in this section; " 42 | ] 43 | }, 44 | { 45 | "cell_type": "code", 46 | "execution_count": 1, 47 | "metadata": {}, 48 | "outputs": [], 49 | "source": [ 50 | "import numpy as np" 51 | ] 52 | }, 53 | { 54 | "cell_type": "markdown", 55 | "metadata": {}, 56 | "source": [ 57 | "### 1. An ordinary NumPy array has a single dtype" 58 | ] 59 | }, 60 | { 61 | "cell_type": "code", 62 | "execution_count": 2, 63 | "metadata": {}, 64 | "outputs": [ 65 | { 66 | "data": { 67 | "text/plain": [ 68 | "(array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]), dtype('int32'))" 69 | ] 70 | }, 71 | "execution_count": 2, 72 | "metadata": {}, 73 | "output_type": "execute_result" 74 | } 75 | ], 76 | "source": [ 77 | "arr = np.arange(10)\n", 78 | "arr, arr.dtype" 79 | ] 80 | }, 81 | { 82 | "cell_type": "code", 83 | "execution_count": 3, 84 | "metadata": {}, 85 | "outputs": [ 86 | { 87 | "data": { 88 | "text/plain": [ 89 | "(array([[0.13813273, 0.69213455, 0.2869116 , 0.64065806],\n", 90 | " [0.5972653 , 0.42803843, 0.84914465, 0.0502318 ],\n", 91 | " [0.31351949, 0.87095862, 0.52867948, 0.83884873]]),\n", 92 | " dtype('float64'))" 93 | ] 94 | }, 95 | "execution_count": 3, 96 | "metadata": {}, 97 | "output_type": "execute_result" 98 | } 99 | ], 100 | "source": [ 101 | "arr = np.random.rand(3, 4)\n", 102 | "arr, arr.dtype" 103 | ] 104 | }, 105 | { 106 | "cell_type": "markdown", 107 | "metadata": {}, 108 | "source": [ 109 | "### 2. 
How to represent heterogeneous data with NumPy" 110 | ] 111 | }, 112 | { 113 | "cell_type": "code", 114 | "execution_count": 4, 115 | "metadata": {}, 116 | "outputs": [ 117 | { 118 | "data": { 119 | "text/plain": [ 120 | "dtype([('name', '= 29]" 314 | ] 315 | }, 316 | { 317 | "cell_type": "code", 318 | "execution_count": 13, 319 | "metadata": {}, 320 | "outputs": [ 321 | { 322 | "data": { 323 | "text/plain": [ 324 | "array([('xiaowang', 30, 80.5)],\n", 325 | " dtype=[('name', '= 29) & (my_arr[\"weight\"] > 80)]" 336 | ] 337 | }, 338 | { 339 | "cell_type": "markdown", 340 | "metadata": {}, 341 | "source": [ 342 | "#### Element-wise computation on a single column" 343 | ] 344 | }, 345 | { 346 | "cell_type": "code", 347 | "execution_count": 14, 348 | "metadata": {}, 349 | "outputs": [ 350 | { 351 | "data": { 352 | "text/plain": [ 353 | "array([30, 28, 29])" 354 | ] 355 | }, 356 | "execution_count": 14, 357 | "metadata": {}, 358 | "output_type": "execute_result" 359 | } 360 | ], 361 | "source": [ 362 | "my_arr[\"age\"]" 363 | ] 364 | }, 365 | { 366 | "cell_type": "code", 367 | "execution_count": 15, 368 | "metadata": {}, 369 | "outputs": [], 370 | "source": [ 371 | "my_arr[\"age\"] += 1" 372 | ] 373 | }, 374 | { 375 | "cell_type": "code", 376 | "execution_count": 16, 377 | "metadata": {}, 378 | "outputs": [ 379 | { 380 | "data": { 381 | "text/plain": [ 382 | "array([31, 29, 30])" 383 | ] 384 | }, 385 | "execution_count": 16, 386 | "metadata": {}, 387 | "output_type": "execute_result" 388 | } 389 | ], 390 | "source": [ 391 | "my_arr[\"age\"]" 392 | ] 393 | }, 394 | { 395 | "cell_type": "markdown", 396 | "metadata": {}, 397 | "source": [ 398 | "A final note: \n", 399 | "* For this kind of “heterogeneous data”, where each column has a different type, Pandas is the better tool;\n", 400 | "* It is still worth learning NumPy structured arrays: you may never use them yourself, but you should be able to read other people's code" 401 | ] 402 | } 403 | ], 404 | "metadata": { 405 | "kernelspec": { 406 | "display_name": "Python 3", 407 | "language": "python", 408 | "name": "python3" 409 | }, 410 | "language_info": { 411 | "codemirror_mode": { 412 | "name": "ipython", 413 | "version": 3 414 | }, 415 | "file_extension": 
".py", 416 | "mimetype": "text/x-python", 417 | "name": "python", 418 | "nbconvert_exporter": "python", 419 | "pygments_lexer": "ipython3", 420 | "version": "3.7.6" 421 | } 422 | }, 423 | "nbformat": 4, 424 | "nbformat_minor": 4 425 | } 426 | -------------------------------------------------------------------------------- /.ipynb_checkpoints/20. Numpy与Pandas数据的相互转换-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "## Converting data between NumPy and Pandas\n", 8 | "\n", 9 | "Pandas is a very popular data-analysis library built on top of NumPy; \n", 10 | "it provides powerful tools for processing and analysing heterogeneous, tabular data.\n", 11 | "\n", 12 | "This section covers conversion between NumPy and Pandas: \n", 13 | "1. How to pass NumPy arrays to a Pandas Series or DataFrame;\n", 14 | "2. How to convert a Pandas Series or DataFrame into NumPy arrays" 15 | ] 16 | }, 17 | { 18 | "cell_type": "code", 19 | "execution_count": 1, 20 | "metadata": {}, 21 | "outputs": [], 22 | "source": [ 23 | "import numpy as np\n", 24 | "import pandas as pd" 25 | ] 26 | }, 27 | { 28 | "cell_type": "markdown", 29 | "metadata": {}, 30 | "source": [ 31 | "### Converting NumPy arrays into Pandas data structures" 32 | ] 33 | }, 34 | { 35 | "cell_type": "markdown", 36 | "metadata": {}, 37 | "source": [ 38 | "#### Converting a 1-D NumPy array to a Pandas Series" 39 | ] 40 | }, 41 | { 42 | "cell_type": "code", 43 | "execution_count": 2, 44 | "metadata": {}, 45 | "outputs": [ 46 | { 47 | "data": { 48 | "text/plain": [ 49 | "array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])" 50 | ] 51 | }, 52 | "execution_count": 2, 53 | "metadata": {}, 54 | "output_type": "execute_result" 55 | } 56 | ], 57 | "source": [ 58 | "arr = np.arange(10)\n", 59 | "arr" 60 | ] 61 | }, 62 | { 63 | "cell_type": "code", 64 | "execution_count": 3, 65 | "metadata": {}, 66 | "outputs": [ 67 | { 68 | "data": { 69 | "text/plain": [ 70 | "0 0\n", 71 | "1 1\n", 72 | "2 2\n", 73 | "3 3\n", 74 | "4 4\n", 75 | "5 5\n", 76 | "6 6\n", 77 | "7 7\n", 78 | "8 8\n", 79 | "9 9\n", 80 | "dtype: int32" 81 | ] 82 | }, 83 | "execution_count": 
3, 84 | "metadata": {}, 85 | "output_type": "execute_result" 86 | } 87 | ], 88 | "source": [ 89 | "series = pd.Series(arr)\n", 90 | "series" 91 | ] 92 | }, 93 | { 94 | "cell_type": "markdown", 95 | "metadata": {}, 96 | "source": [ 97 | "#### 怎样将Numpy的二维数组转换成Pandas的DataFrame" 98 | ] 99 | }, 100 | { 101 | "cell_type": "code", 102 | "execution_count": 4, 103 | "metadata": {}, 104 | "outputs": [ 105 | { 106 | "data": { 107 | "text/plain": [ 108 | "array([[3, 9, 6, 3],\n", 109 | " [4, 1, 8, 1],\n", 110 | " [2, 4, 4, 7],\n", 111 | " [4, 8, 4, 7],\n", 112 | " [8, 3, 9, 8]])" 113 | ] 114 | }, 115 | "execution_count": 4, 116 | "metadata": {}, 117 | "output_type": "execute_result" 118 | } 119 | ], 120 | "source": [ 121 | "arr = np.random.randint(1, 10, size=(5, 4))\n", 122 | "arr" 123 | ] 124 | }, 125 | { 126 | "cell_type": "code", 127 | "execution_count": 5, 128 | "metadata": {}, 129 | "outputs": [ 130 | { 131 | "data": { 132 | "text/html": [ 133 | "
\n", 134 | "\n", 147 | "\n", 148 | " \n", 149 | " \n", 150 | " \n", 151 | " \n", 152 | " \n", 153 | " \n", 154 | " \n", 155 | " \n", 156 | " \n", 157 | " \n", 158 | " \n", 159 | " \n", 160 | " \n", 161 | " \n", 162 | " \n", 163 | " \n", 164 | " \n", 165 | " \n", 166 | " \n", 167 | " \n", 168 | " \n", 169 | " \n", 170 | " \n", 171 | " \n", 172 | " \n", 173 | " \n", 174 | " \n", 175 | " \n", 176 | " \n", 177 | " \n", 178 | " \n", 179 | " \n", 180 | " \n", 181 | " \n", 182 | " \n", 183 | " \n", 184 | " \n", 185 | " \n", 186 | " \n", 187 | " \n", 188 | " \n", 189 | " \n", 190 | " \n", 191 | " \n", 192 | " \n", 193 | " \n", 194 | "
   ca  cb  cc  cd
0   3   9   6   3
1   4   1   8   1
2   2   4   4   7
3   4   8   4   7
4   8   3   9   8
\n", 195 | "
" 196 | ], 197 | "text/plain": [ 198 | " ca cb cc cd\n", 199 | "0 3 9 6 3\n", 200 | "1 4 1 8 1\n", 201 | "2 2 4 4 7\n", 202 | "3 4 8 4 7\n", 203 | "4 8 3 9 8" 204 | ] 205 | }, 206 | "execution_count": 5, 207 | "metadata": {}, 208 | "output_type": "execute_result" 209 | } 210 | ], 211 | "source": [ 212 | "df = pd.DataFrame(arr, columns = [\"ca\", \"cb\", \"cc\", \"cd\"])\n", 213 | "df" 214 | ] 215 | }, 216 | { 217 | "cell_type": "code", 218 | "execution_count": 6, 219 | "metadata": {}, 220 | "outputs": [ 221 | { 222 | "data": { 223 | "text/html": [ 224 | "
\n", 225 | "\n", 238 | "\n", 239 | " \n", 240 | " \n", 241 | " \n", 242 | " \n", 243 | " \n", 244 | " \n", 245 | " \n", 246 | " \n", 247 | " \n", 248 | " \n", 249 | " \n", 250 | " \n", 251 | " \n", 252 | " \n", 253 | " \n", 254 | " \n", 255 | " \n", 256 | " \n", 257 | "
   ca  cb  cc  cd
4   8   3   9   8
\n", 258 | "
" 259 | ], 260 | "text/plain": [ 261 | " ca cb cc cd\n", 262 | "4 8 3 9 8" 263 | ] 264 | }, 265 | "execution_count": 6, 266 | "metadata": {}, 267 | "output_type": "execute_result" 268 | } 269 | ], 270 | "source": [ 271 | "df[df[\"ca\"] > 4]" 272 | ] 273 | }, 274 | { 275 | "cell_type": "markdown", 276 | "metadata": {}, 277 | "source": [ 278 | "### Converting Pandas data structures to Numpy arrays\n", 279 | "\n", 280 | "* Option 1: the .values attribute (a property, accessed without parentheses)\n", 281 | "* Option 2: the .to_numpy() method\n", 282 | "\n", 283 | "Use case: \n", 284 | "Scikit-Learn models, for example, expect Numpy arrays as input. \n", 285 | "You can do the heavy preprocessing of the raw data in Pandas, then convert the result to a Numpy array and feed it to the model. " 286 | ] 287 | }, 288 | { 289 | "cell_type": "markdown", 290 | "metadata": {}, 291 | "source": [ 292 | "#### Converting a Series to a Numpy array" 293 | ] 294 | }, 295 | { 296 | "cell_type": "code", 297 | "execution_count": 7, 298 | "metadata": {}, 299 | "outputs": [ 300 | { 301 | "data": { 302 | "text/plain": [ 303 | "0 0\n", 304 | "1 1\n", 305 | "2 2\n", 306 | "3 3\n", 307 | "4 4\n", 308 | "5 5\n", 309 | "6 6\n", 310 | "7 7\n", 311 | "8 8\n", 312 | "9 9\n", 313 | "dtype: int64" 314 | ] 315 | }, 316 | "execution_count": 7, 317 | "metadata": {}, 318 | "output_type": "execute_result" 319 | } 320 | ], 321 | "source": [ 322 | "series = pd.Series(range(10))\n", 323 | "series" 324 | ] 325 | }, 326 | { 327 | "cell_type": "code", 328 | "execution_count": 8, 329 | "metadata": {}, 330 | "outputs": [ 331 | { 332 | "data": { 333 | "text/plain": [ 334 | "array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int64)" 335 | ] 336 | }, 337 | "execution_count": 8, 338 | "metadata": {}, 339 | "output_type": "execute_result" 340 | } 341 | ], 342 | "source": [ 343 | "series.values" 344 | ] 345 | }, 346 | { 347 | "cell_type": "code", 348 | "execution_count": 9, 349 | "metadata": {}, 350 | "outputs": [ 351 | { 352 | "data": { 353 | "text/plain": [ 354 | "array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int64)" 355 | ] 356 | }, 357 | "execution_count": 9, 358 | "metadata": {}, 359 | "output_type": "execute_result" 360 | } 361 | ], 362 | "source": [ 363 | 
"series.to_numpy()" 364 | ] 365 | }, 366 | { 367 | "cell_type": "markdown", 368 | "metadata": {}, 369 | "source": [ 370 | "#### 将DataFrame转换成Numpy数组" 371 | ] 372 | }, 373 | { 374 | "cell_type": "code", 375 | "execution_count": 10, 376 | "metadata": {}, 377 | "outputs": [ 378 | { 379 | "data": { 380 | "text/html": [ 381 | "
\n", 382 | "\n", 395 | "\n", 396 | " \n", 397 | " \n", 398 | " \n", 399 | " \n", 400 | " \n", 401 | " \n", 402 | " \n", 403 | " \n", 404 | " \n", 405 | " \n", 406 | " \n", 407 | " \n", 408 | " \n", 409 | " \n", 410 | " \n", 411 | " \n", 412 | " \n", 413 | " \n", 414 | " \n", 415 | " \n", 416 | " \n", 417 | " \n", 418 | " \n", 419 | " \n", 420 | " \n", 421 | " \n", 422 | " \n", 423 | " \n", 424 | " \n", 425 | " \n", 426 | " \n", 427 | " \n", 428 | " \n", 429 | " \n", 430 | "
   feature_a  feature_b  feature_c
0         11      12.23      45.23
1         21      22.23      55.23
2         31      32.23      65.23
3         41      42.23      75.23
\n", 431 | "
" 432 | ], 433 | "text/plain": [ 434 | " feature_a feature_b feature_c\n", 435 | "0 11 12.23 45.23\n", 436 | "1 21 22.23 55.23\n", 437 | "2 31 32.23 65.23\n", 438 | "3 41 42.23 75.23" 439 | ] 440 | }, 441 | "execution_count": 10, 442 | "metadata": {}, 443 | "output_type": "execute_result" 444 | } 445 | ], 446 | "source": [ 447 | "df = pd.DataFrame(\n", 448 | " [\n", 449 | " [11, 12.23, 45.23],\n", 450 | " [21, 22.23, 55.23],\n", 451 | " [31, 32.23, 65.23],\n", 452 | " [41, 42.23, 75.23]\n", 453 | " ],\n", 454 | " columns = [\"feature_a\", \"feature_b\", \"feature_c\"]\n", 455 | ")\n", 456 | "df" 457 | ] 458 | }, 459 | { 460 | "cell_type": "code", 461 | "execution_count": 11, 462 | "metadata": {}, 463 | "outputs": [ 464 | { 465 | "data": { 466 | "text/plain": [ 467 | "array([[11. , 12.23, 45.23],\n", 468 | " [21. , 22.23, 55.23],\n", 469 | " [31. , 32.23, 65.23],\n", 470 | " [41. , 42.23, 75.23]])" 471 | ] 472 | }, 473 | "execution_count": 11, 474 | "metadata": {}, 475 | "output_type": "execute_result" 476 | } 477 | ], 478 | "source": [ 479 | "df.values" 480 | ] 481 | }, 482 | { 483 | "cell_type": "code", 484 | "execution_count": 12, 485 | "metadata": {}, 486 | "outputs": [ 487 | { 488 | "data": { 489 | "text/plain": [ 490 | "array([[11. , 12.23, 45.23],\n", 491 | " [21. , 22.23, 55.23],\n", 492 | " [31. , 32.23, 65.23],\n", 493 | " [41. 
, 42.23, 75.23]])" 494 | ] 495 | }, 496 | "execution_count": 12, 497 | "metadata": {}, 498 | "output_type": "execute_result" 499 | } 500 | ], 501 | "source": [ 502 | "df.to_numpy()" 503 | ] 504 | }, 505 | { 506 | "cell_type": "code", 507 | "execution_count": null, 508 | "metadata": {}, 509 | "outputs": [], 510 | "source": [] 511 | } 512 | ], 513 | "metadata": { 514 | "kernelspec": { 515 | "display_name": "Python 3", 516 | "language": "python", 517 | "name": "python3" 518 | }, 519 | "language_info": { 520 | "codemirror_mode": { 521 | "name": "ipython", 522 | "version": 3 523 | }, 524 | "file_extension": ".py", 525 | "mimetype": "text/x-python", 526 | "name": "python", 527 | "nbconvert_exporter": "python", 528 | "pygments_lexer": "ipython3", 529 | "version": "3.7.6" 530 | } 531 | }, 532 | "nbformat": 4, 533 | "nbformat_minor": 4 534 | } 535 | -------------------------------------------------------------------------------- /.ipynb_checkpoints/21. Numpy数据输入给Scikit-learn实现模型训练-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "## Numpy数据输入给Sklearn实现模型训练\n", 8 | "\n", 9 | "***本视频的目的,向大家演示:*** \n", 10 | "Numpy的数组怎样与sklearn模型交互,包括训练测试集拆分、输入给模型、评估模型、模型预估\n", 11 | "\n", 12 | "对于大家自己的任务,可以提前处理成这样的Numpy格式,然后输入给sklearn模型" 13 | ] 14 | }, 15 | { 16 | "cell_type": "code", 17 | "execution_count": 1, 18 | "metadata": {}, 19 | "outputs": [], 20 | "source": [ 21 | "import numpy as np\n", 22 | "# 使用sklearn自带的数据集,这些数据集都是Numpy的形式\n", 23 | "# 我们自己的数据,也可以处理成这种格式,然后就可以输入给模型\n", 24 | "from sklearn import datasets\n", 25 | "# 用train_test_split可以拆分训练集和测试集\n", 26 | "from sklearn.model_selection import train_test_split\n", 27 | "# 使用LinearRegression训练线性回归模型\n", 28 | "from sklearn.linear_model import LinearRegression" 29 | ] 30 | }, 31 | { 32 | "cell_type": "markdown", 33 | "metadata": {}, 34 | "source": [ 35 | "### 1. 
Load the Boston housing dataset" 36 | ] 37 | }, 38 | { 39 | "cell_type": "code", 40 | "execution_count": 2, 41 | "metadata": {}, 42 | "outputs": [], 43 | "source": [ 44 | "# Load the dataset: feature matrix data and target vector target (load_boston was removed in scikit-learn 1.2, so this call needs an older release)\n", 45 | "data, target = datasets.load_boston(return_X_y=True)" 46 | ] 47 | }, 48 | { 49 | "cell_type": "code", 50 | "execution_count": 3, 51 | "metadata": {}, 52 | "outputs": [ 53 | { 54 | "data": { 55 | "text/plain": [ 56 | "(numpy.ndarray, numpy.ndarray)" 57 | ] 58 | }, 59 | "execution_count": 3, 60 | "metadata": {}, 61 | "output_type": "execute_result" 62 | } 63 | ], 64 | "source": [ 65 | "type(data), type(target)" 66 | ] 67 | }, 68 | { 69 | "cell_type": "code", 70 | "execution_count": 4, 71 | "metadata": {}, 72 | "outputs": [ 73 | { 74 | "data": { 75 | "text/plain": [ 76 | "((506, 13), (506,))" 77 | ] 78 | }, 79 | "execution_count": 4, 80 | "metadata": {}, 81 | "output_type": "execute_result" 82 | } 83 | ], 84 | "source": [ 85 | "data.shape, target.shape" 86 | ] 87 | }, 88 | { 89 | "cell_type": "code", 90 | "execution_count": 5, 91 | "metadata": {}, 92 | "outputs": [ 93 | { 94 | "data": { 95 | "text/plain": [ 96 | "array([[6.3200e-03, 1.8000e+01, 2.3100e+00, 0.0000e+00, 5.3800e-01,\n", 97 | " 6.5750e+00, 6.5200e+01, 4.0900e+00, 1.0000e+00, 2.9600e+02,\n", 98 | " 1.5300e+01, 3.9690e+02, 4.9800e+00],\n", 99 | " [2.7310e-02, 0.0000e+00, 7.0700e+00, 0.0000e+00, 4.6900e-01,\n", 100 | " 6.4210e+00, 7.8900e+01, 4.9671e+00, 2.0000e+00, 2.4200e+02,\n", 101 | " 1.7800e+01, 3.9690e+02, 9.1400e+00],\n", 102 | " [2.7290e-02, 0.0000e+00, 7.0700e+00, 0.0000e+00, 4.6900e-01,\n", 103 | " 7.1850e+00, 6.1100e+01, 4.9671e+00, 2.0000e+00, 2.4200e+02,\n", 104 | " 1.7800e+01, 3.9283e+02, 4.0300e+00]])" 105 | ] 106 | }, 107 | "execution_count": 5, 108 | "metadata": {}, 109 | "output_type": "execute_result" 110 | } 111 | ], 112 | "source": [ 113 | "# Show the features of the first three houses\n", 114 | "data[:3]" 115 | ] 116 | }, 117 | { 118 | "cell_type": "code", 119 | "execution_count": 6, 120 | "metadata": {}, 121 | "outputs": [ 
122 | { 123 | "data": { 124 | "text/plain": [ 125 | "array([24. , 21.6, 34.7])" 126 | ] 127 | }, 128 | "execution_count": 6, 129 | "metadata": {}, 130 | "output_type": "execute_result" 131 | } 132 | ], 133 | "source": [ 134 | "# 查看前三条房价结果\n", 135 | "target[:3]" 136 | ] 137 | }, 138 | { 139 | "cell_type": "markdown", 140 | "metadata": {}, 141 | "source": [ 142 | "### 2. 拆分训练集和测试集" 143 | ] 144 | }, 145 | { 146 | "cell_type": "code", 147 | "execution_count": 7, 148 | "metadata": {}, 149 | "outputs": [], 150 | "source": [ 151 | "# 拆分训练集和测试集\n", 152 | "X_train, X_test, y_train, y_test = train_test_split(data, target)" 153 | ] 154 | }, 155 | { 156 | "cell_type": "code", 157 | "execution_count": 8, 158 | "metadata": {}, 159 | "outputs": [ 160 | { 161 | "data": { 162 | "text/plain": [ 163 | "((379, 13), (379,))" 164 | ] 165 | }, 166 | "execution_count": 8, 167 | "metadata": {}, 168 | "output_type": "execute_result" 169 | } 170 | ], 171 | "source": [ 172 | "# 训练集的数据\n", 173 | "X_train.shape, y_train.shape" 174 | ] 175 | }, 176 | { 177 | "cell_type": "code", 178 | "execution_count": 9, 179 | "metadata": {}, 180 | "outputs": [ 181 | { 182 | "data": { 183 | "text/plain": [ 184 | "((127, 13), (127,))" 185 | ] 186 | }, 187 | "execution_count": 9, 188 | "metadata": {}, 189 | "output_type": "execute_result" 190 | } 191 | ], 192 | "source": [ 193 | "# 测试集的数据\n", 194 | "X_test.shape, y_test.shape" 195 | ] 196 | }, 197 | { 198 | "cell_type": "markdown", 199 | "metadata": {}, 200 | "source": [ 201 | "### 3. 
训练线性回归模型" 202 | ] 203 | }, 204 | { 205 | "cell_type": "code", 206 | "execution_count": 10, 207 | "metadata": {}, 208 | "outputs": [], 209 | "source": [ 210 | "# 构造线性回归对象,使用默认参数即可\n", 211 | "clf = LinearRegression()" 212 | ] 213 | }, 214 | { 215 | "cell_type": "code", 216 | "execution_count": 11, 217 | "metadata": {}, 218 | "outputs": [ 219 | { 220 | "data": { 221 | "text/plain": [ 222 | "LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)" 223 | ] 224 | }, 225 | "execution_count": 11, 226 | "metadata": {}, 227 | "output_type": "execute_result" 228 | } 229 | ], 230 | "source": [ 231 | "# 执行训练\n", 232 | "clf.fit(X_train, y_train)" 233 | ] 234 | }, 235 | { 236 | "cell_type": "code", 237 | "execution_count": 12, 238 | "metadata": {}, 239 | "outputs": [ 240 | { 241 | "data": { 242 | "text/plain": [ 243 | "0.7290997955432121" 244 | ] 245 | }, 246 | "execution_count": 12, 247 | "metadata": {}, 248 | "output_type": "execute_result" 249 | } 250 | ], 251 | "source": [ 252 | "# 在训练集上的打分\n", 253 | "clf.score(X_train, y_train)" 254 | ] 255 | }, 256 | { 257 | "cell_type": "markdown", 258 | "metadata": {}, 259 | "source": [ 260 | "### 4. 
评估模型和使用模型" 261 | ] 262 | }, 263 | { 264 | "cell_type": "code", 265 | "execution_count": 13, 266 | "metadata": {}, 267 | "outputs": [ 268 | { 269 | "data": { 270 | "text/plain": [ 271 | "0.7658281007291711" 272 | ] 273 | }, 274 | "execution_count": 13, 275 | "metadata": {}, 276 | "output_type": "execute_result" 277 | } 278 | ], 279 | "source": [ 280 | "# 在测试集上打分评估\n", 281 | "clf.score(X_test, y_test)" 282 | ] 283 | }, 284 | { 285 | "cell_type": "code", 286 | "execution_count": 14, 287 | "metadata": {}, 288 | "outputs": [ 289 | { 290 | "data": { 291 | "text/plain": [ 292 | "array([36.1889043 , 17.05681981, 26.1238293 ])" 293 | ] 294 | }, 295 | "execution_count": 14, 296 | "metadata": {}, 297 | "output_type": "execute_result" 298 | } 299 | ], 300 | "source": [ 301 | "# 只取前三条数据,实现房价预估\n", 302 | "clf.predict(X_test[:3])" 303 | ] 304 | }, 305 | { 306 | "cell_type": "code", 307 | "execution_count": 15, 308 | "metadata": {}, 309 | "outputs": [ 310 | { 311 | "data": { 312 | "text/plain": [ 313 | "array([50. 
, 23.1, 22.8])" 314 | ] 315 | }, 316 | "execution_count": 15, 317 | "metadata": {}, 318 | "output_type": "execute_result" 319 | } 320 | ], 321 | "source": [ 322 | "# Compare with the actual house prices\n", 323 | "y_test[:3]" 324 | ] 325 | }, 326 | { 327 | "cell_type": "code", 328 | "execution_count": null, 329 | "metadata": {}, 330 | "outputs": [], 331 | "source": [] 332 | } 333 | ], 334 | "metadata": { 335 | "kernelspec": { 336 | "display_name": "Python 3", 337 | "language": "python", 338 | "name": "python3" 339 | }, 340 | "language_info": { 341 | "codemirror_mode": { 342 | "name": "ipython", 343 | "version": 3 344 | }, 345 | "file_extension": ".py", 346 | "mimetype": "text/x-python", 347 | "name": "python", 348 | "nbconvert_exporter": "python", 349 | "pygments_lexer": "ipython3", 350 | "version": "3.7.6" 351 | } 352 | }, 353 | "nbformat": 4, 354 | "nbformat_minor": 4 355 | } 356 | -------------------------------------------------------------------------------- /.ipynb_checkpoints/Untitled-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [], 3 | "metadata": {}, 4 | "nbformat": 4, 5 | "nbformat_minor": 4 6 | } 7 | -------------------------------------------------------------------------------- /.ipynb_checkpoints/Untitled1-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [], 3 | "metadata": {}, 4 | "nbformat": 4, 5 | "nbformat_minor": 4 6 | } 7 | -------------------------------------------------------------------------------- /06. Numpy计算数组中满足条件元素个数.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "## Counting the elements of a Numpy array that satisfy a condition\n", 8 | "\n", 9 | "Requirement: given a very large array, say 100 million numbers, count how many of them are greater than 5000" 10 | ] 11 | }, 12 | { 13 | "cell_type": "markdown", 14 | "metadata": {}, 15 | "source": [ 16 | "### 1. 
使用numpy的random模块生成1亿个数字" 17 | ] 18 | }, 19 | { 20 | "cell_type": "code", 21 | "execution_count": 1, 22 | "metadata": {}, 23 | "outputs": [], 24 | "source": [ 25 | "import numpy as np" 26 | ] 27 | }, 28 | { 29 | "cell_type": "code", 30 | "execution_count": 2, 31 | "metadata": {}, 32 | "outputs": [], 33 | "source": [ 34 | "arr = np.random.randint(1, 10000, size=int(1e8))" 35 | ] 36 | }, 37 | { 38 | "cell_type": "code", 39 | "execution_count": 3, 40 | "metadata": {}, 41 | "outputs": [ 42 | { 43 | "data": { 44 | "text/plain": [ 45 | "array([5682, 8924, 7737, 7717, 8871, 2469, 1807, 6847, 8138, 1779])" 46 | ] 47 | }, 48 | "execution_count": 3, 49 | "metadata": {}, 50 | "output_type": "execute_result" 51 | } 52 | ], 53 | "source": [ 54 | "arr[:10]" 55 | ] 56 | }, 57 | { 58 | "cell_type": "code", 59 | "execution_count": 4, 60 | "metadata": {}, 61 | "outputs": [ 62 | { 63 | "data": { 64 | "text/plain": [ 65 | "100000000" 66 | ] 67 | }, 68 | "execution_count": 4, 69 | "metadata": {}, 70 | "output_type": "execute_result" 71 | } 72 | ], 73 | "source": [ 74 | "arr.size" 75 | ] 76 | }, 77 | { 78 | "cell_type": "markdown", 79 | "metadata": {}, 80 | "source": [ 81 | "### 2. 使用Python原生语法实现" 82 | ] 83 | }, 84 | { 85 | "cell_type": "code", 86 | "execution_count": 5, 87 | "metadata": {}, 88 | "outputs": [], 89 | "source": [ 90 | "pyarr = list(arr)" 91 | ] 92 | }, 93 | { 94 | "cell_type": "code", 95 | "execution_count": 6, 96 | "metadata": {}, 97 | "outputs": [ 98 | { 99 | "data": { 100 | "text/plain": [ 101 | "49997444" 102 | ] 103 | }, 104 | "execution_count": 6, 105 | "metadata": {}, 106 | "output_type": "execute_result" 107 | } 108 | ], 109 | "source": [ 110 | "# 计算下结果,用于对比是否准确\n", 111 | "len([x for x in pyarr if x>5000])" 112 | ] 113 | }, 114 | { 115 | "cell_type": "code", 116 | "execution_count": 7, 117 | "metadata": {}, 118 | "outputs": [ 119 | { 120 | "name": "stdout", 121 | "output_type": "stream", 122 | "text": [ 123 | "16.8 s ± 204 ms per loop (mean ± std. dev. 
of 7 runs, 1 loop each)\n" 124 | ] 125 | } 126 | ], 127 | "source": [ 128 | "# 记一下时间\n", 129 | "%timeit len([x for x in pyarr if x>5000])" 130 | ] 131 | }, 132 | { 133 | "cell_type": "markdown", 134 | "metadata": {}, 135 | "source": [ 136 | "### 3. 使用numpy的向量化操作实现" 137 | ] 138 | }, 139 | { 140 | "cell_type": "code", 141 | "execution_count": 8, 142 | "metadata": {}, 143 | "outputs": [ 144 | { 145 | "data": { 146 | "text/plain": [ 147 | "49997444" 148 | ] 149 | }, 150 | "execution_count": 8, 151 | "metadata": {}, 152 | "output_type": "execute_result" 153 | } 154 | ], 155 | "source": [ 156 | "# 计算下结果,用于对比是否准确\n", 157 | "arr[arr>5000].size" 158 | ] 159 | }, 160 | { 161 | "cell_type": "code", 162 | "execution_count": 9, 163 | "metadata": {}, 164 | "outputs": [ 165 | { 166 | "data": { 167 | "text/plain": [ 168 | "array([ True, True, True, True, True, False, False, True, True,\n", 169 | " False])" 170 | ] 171 | }, 172 | "execution_count": 9, 173 | "metadata": {}, 174 | "output_type": "execute_result" 175 | } 176 | ], 177 | "source": [ 178 | "(arr>5000)[:10]" 179 | ] 180 | }, 181 | { 182 | "cell_type": "code", 183 | "execution_count": 10, 184 | "metadata": {}, 185 | "outputs": [ 186 | { 187 | "name": "stdout", 188 | "output_type": "stream", 189 | "text": [ 190 | "590 ms ± 33.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n" 191 | ] 192 | } 193 | ], 194 | "source": [ 195 | "# 记一下时间\n", 196 | "%timeit arr[arr>5000].size" 197 | ] 198 | }, 199 | { 200 | "cell_type": "markdown", 201 | "metadata": {}, 202 | "source": [ 203 | "### 4. 
对比下时间" 204 | ] 205 | }, 206 | { 207 | "cell_type": "code", 208 | "execution_count": 11, 209 | "metadata": {}, 210 | "outputs": [ 211 | { 212 | "data": { 213 | "text/plain": [ 214 | "28.47457627118644" 215 | ] 216 | }, 217 | "execution_count": 11, 218 | "metadata": {}, 219 | "output_type": "execute_result" 220 | } 221 | ], 222 | "source": [ 223 | "16.8*1000 / 590 " 224 | ] 225 | }, 226 | { 227 | "cell_type": "code", 228 | "execution_count": null, 229 | "metadata": {}, 230 | "outputs": [], 231 | "source": [] 232 | } 233 | ], 234 | "metadata": { 235 | "kernelspec": { 236 | "display_name": "Python 3", 237 | "language": "python", 238 | "name": "python3" 239 | }, 240 | "language_info": { 241 | "codemirror_mode": { 242 | "name": "ipython", 243 | "version": 3 244 | }, 245 | "file_extension": ".py", 246 | "mimetype": "text/x-python", 247 | "name": "python", 248 | "nbconvert_exporter": "python", 249 | "pygments_lexer": "ipython3", 250 | "version": "3.7.6" 251 | } 252 | }, 253 | "nbformat": 4, 254 | "nbformat_minor": 4 255 | } 256 | -------------------------------------------------------------------------------- /07. 
Numpy怎样给数组增加一个维度.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "## How to add a dimension to a Numpy array" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "***Background:*** \n", 15 | "Many computations work on 2-D or 3-D data, so a 1-D input often has to be promoted to 2-D for the shapes to match\n", 16 | "\n", 17 | "***Goal:*** \n", 18 | "Add a dimension to an array without changing its data (in the example below the dimensions change but the data does not) \n", 19 | "Original array: the 1-D array arr=[1,2,3,4] has shape (4,) and its elements are arr[0],arr[1],arr[2],arr[3] \n", 20 | "Reshaped array: the 2-D array arr=[[1,2,3,4]] has shape (1,4) and its elements are arr[0,0],arr[0,1],arr[0,2],arr[0,3]\n", 21 | "\n", 22 | "***In practice:*** \n", 23 | "It often helps to sketch the array shapes on paper to check whether they match and whether a dimension must be added or removed\n", 24 | "\n", 25 | "***Three approaches:*** \n", 26 | "* np.newaxis: an index object (an alias for None, not a Python keyword) that adds a dimension through indexing syntax\n", 27 | "* np.expand_dims(arr, axis): a function with the same effect as np.newaxis; it adds a dimension to arr at position axis\n", 28 | "* np.reshape(a, newshape): a function; setting one dimension of the new shape to 1 adds the dimension" 29 | ] 30 | }, 31 | { 32 | "cell_type": "code", 33 | "execution_count": 1, 34 | "metadata": {}, 35 | "outputs": [], 36 | "source": [ 37 | "import numpy as np" 38 | ] 39 | }, 40 | { 41 | "cell_type": "code", 42 | "execution_count": 2, 43 | "metadata": {}, 44 | "outputs": [ 45 | { 46 | "data": { 47 | "text/plain": [ 48 | "array([0, 1, 2, 3, 4])" 49 | ] 50 | }, 51 | "execution_count": 2, 52 | "metadata": {}, 53 | "output_type": "execute_result" 54 | } 55 | ], 56 | "source": [ 57 | "arr = np.arange(5)\n", 58 | "arr" 59 | ] 60 | }, 61 | { 62 | "cell_type": "code", 63 | "execution_count": 3, 64 | "metadata": {}, 65 | "outputs": [ 66 | { 67 | "data": { 68 | "text/plain": [ 69 | "(5,)" 70 | ] 71 | }, 72 | "execution_count": 3, 73 | "metadata": {}, 74 | "output_type": "execute_result" 75 | } 76 | ], 77 | "source": [ 78 | "# Note: at this point arr is a 1-D vector\n", 79 | "arr.shape" 80 | ] 81 | }, 82 | { 83 | "cell_type": "markdown", 84 | "metadata": {}, 85 | "source": [ 86 | "### Method 1: np.newaxis" 87 | ] 88 | }, 89 | { 90 | "cell_type": "markdown", 91 | "metadata": {}, 92 | "source": [ 93 | "#### 
注意:np.newaxis其实就是None的别名" 94 | ] 95 | }, 96 | { 97 | "cell_type": "code", 98 | "execution_count": 4, 99 | "metadata": {}, 100 | "outputs": [ 101 | { 102 | "data": { 103 | "text/plain": [ 104 | "True" 105 | ] 106 | }, 107 | "execution_count": 4, 108 | "metadata": {}, 109 | "output_type": "execute_result" 110 | } 111 | ], 112 | "source": [ 113 | "np.newaxis is None" 114 | ] 115 | }, 116 | { 117 | "cell_type": "code", 118 | "execution_count": 5, 119 | "metadata": {}, 120 | "outputs": [ 121 | { 122 | "data": { 123 | "text/plain": [ 124 | "True" 125 | ] 126 | }, 127 | "execution_count": 5, 128 | "metadata": {}, 129 | "output_type": "execute_result" 130 | } 131 | ], 132 | "source": [ 133 | "np.newaxis == None" 134 | ] 135 | }, 136 | { 137 | "cell_type": "markdown", 138 | "metadata": {}, 139 | "source": [ 140 | "即以下所有的np.newaxis的位置,都可以用None替代" 141 | ] 142 | }, 143 | { 144 | "cell_type": "markdown", 145 | "metadata": {}, 146 | "source": [ 147 | "#### 给一维向量添加一个行维度" 148 | ] 149 | }, 150 | { 151 | "cell_type": "code", 152 | "execution_count": 6, 153 | "metadata": {}, 154 | "outputs": [ 155 | { 156 | "data": { 157 | "text/plain": [ 158 | "array([[0, 1, 2, 3, 4]])" 159 | ] 160 | }, 161 | "execution_count": 6, 162 | "metadata": {}, 163 | "output_type": "execute_result" 164 | } 165 | ], 166 | "source": [ 167 | "arr[np.newaxis, :]" 168 | ] 169 | }, 170 | { 171 | "cell_type": "code", 172 | "execution_count": 7, 173 | "metadata": { 174 | "scrolled": true 175 | }, 176 | "outputs": [ 177 | { 178 | "data": { 179 | "text/plain": [ 180 | "(1, 5)" 181 | ] 182 | }, 183 | "execution_count": 7, 184 | "metadata": {}, 185 | "output_type": "execute_result" 186 | } 187 | ], 188 | "source": [ 189 | "arr[np.newaxis, :].shape" 190 | ] 191 | }, 192 | { 193 | "cell_type": "markdown", 194 | "metadata": {}, 195 | "source": [ 196 | "数据现在是一行*五列,数据本身没有增减,只是多了一级括号" 197 | ] 198 | }, 199 | { 200 | "cell_type": "markdown", 201 | "metadata": {}, 202 | "source": [ 203 | "#### 给一维向量添加一个列维度" 204 | ] 205 | }, 206 
| { 207 | "cell_type": "code", 208 | "execution_count": 8, 209 | "metadata": {}, 210 | "outputs": [ 211 | { 212 | "data": { 213 | "text/plain": [ 214 | "array([[0],\n", 215 | " [1],\n", 216 | " [2],\n", 217 | " [3],\n", 218 | " [4]])" 219 | ] 220 | }, 221 | "execution_count": 8, 222 | "metadata": {}, 223 | "output_type": "execute_result" 224 | } 225 | ], 226 | "source": [ 227 | "arr[:, np.newaxis]" 228 | ] 229 | }, 230 | { 231 | "cell_type": "code", 232 | "execution_count": 9, 233 | "metadata": {}, 234 | "outputs": [ 235 | { 236 | "data": { 237 | "text/plain": [ 238 | "(5, 1)" 239 | ] 240 | }, 241 | "execution_count": 9, 242 | "metadata": {}, 243 | "output_type": "execute_result" 244 | } 245 | ], 246 | "source": [ 247 | "arr[:, np.newaxis].shape" 248 | ] 249 | }, 250 | { 251 | "cell_type": "markdown", 252 | "metadata": {}, 253 | "source": [ 254 | "数据现在是五行*一列" 255 | ] 256 | }, 257 | { 258 | "cell_type": "markdown", 259 | "metadata": {}, 260 | "source": [ 261 | "### 方法2:np.expand_dims方法" 262 | ] 263 | }, 264 | { 265 | "cell_type": "markdown", 266 | "metadata": {}, 267 | "source": [ 268 | "np.expand_dims方法实现的效果,和np.newaxis关键字是一模一样的" 269 | ] 270 | }, 271 | { 272 | "cell_type": "code", 273 | "execution_count": 10, 274 | "metadata": {}, 275 | "outputs": [ 276 | { 277 | "data": { 278 | "text/plain": [ 279 | "array([0, 1, 2, 3, 4])" 280 | ] 281 | }, 282 | "execution_count": 10, 283 | "metadata": {}, 284 | "output_type": "execute_result" 285 | } 286 | ], 287 | "source": [ 288 | "arr" 289 | ] 290 | }, 291 | { 292 | "cell_type": "markdown", 293 | "metadata": {}, 294 | "source": [ 295 | "#### 给一维数组添加一个行维度" 296 | ] 297 | }, 298 | { 299 | "cell_type": "markdown", 300 | "metadata": {}, 301 | "source": [ 302 | "相当于arr[np.newaxis, :]" 303 | ] 304 | }, 305 | { 306 | "cell_type": "code", 307 | "execution_count": 11, 308 | "metadata": {}, 309 | "outputs": [ 310 | { 311 | "data": { 312 | "text/plain": [ 313 | "array([[0, 1, 2, 3, 4]])" 314 | ] 315 | }, 316 | "execution_count": 11, 317 
| "metadata": {}, 318 | "output_type": "execute_result" 319 | } 320 | ], 321 | "source": [ 322 | "np.expand_dims(arr, axis=0)" 323 | ] 324 | }, 325 | { 326 | "cell_type": "code", 327 | "execution_count": 12, 328 | "metadata": {}, 329 | "outputs": [ 330 | { 331 | "data": { 332 | "text/plain": [ 333 | "(1, 5)" 334 | ] 335 | }, 336 | "execution_count": 12, 337 | "metadata": {}, 338 | "output_type": "execute_result" 339 | } 340 | ], 341 | "source": [ 342 | "np.expand_dims(arr, axis=0).shape" 343 | ] 344 | }, 345 | { 346 | "cell_type": "markdown", 347 | "metadata": {}, 348 | "source": [ 349 | "#### 给一维数组添加一个列维度" 350 | ] 351 | }, 352 | { 353 | "cell_type": "markdown", 354 | "metadata": {}, 355 | "source": [ 356 | "相当于arr[:, np.newaxis]" 357 | ] 358 | }, 359 | { 360 | "cell_type": "code", 361 | "execution_count": 13, 362 | "metadata": {}, 363 | "outputs": [ 364 | { 365 | "data": { 366 | "text/plain": [ 367 | "array([[0],\n", 368 | " [1],\n", 369 | " [2],\n", 370 | " [3],\n", 371 | " [4]])" 372 | ] 373 | }, 374 | "execution_count": 13, 375 | "metadata": {}, 376 | "output_type": "execute_result" 377 | } 378 | ], 379 | "source": [ 380 | "np.expand_dims(arr, axis=1)" 381 | ] 382 | }, 383 | { 384 | "cell_type": "code", 385 | "execution_count": 14, 386 | "metadata": {}, 387 | "outputs": [ 388 | { 389 | "data": { 390 | "text/plain": [ 391 | "(5, 1)" 392 | ] 393 | }, 394 | "execution_count": 14, 395 | "metadata": {}, 396 | "output_type": "execute_result" 397 | } 398 | ], 399 | "source": [ 400 | "np.expand_dims(arr, axis=1).shape" 401 | ] 402 | }, 403 | { 404 | "cell_type": "markdown", 405 | "metadata": {}, 406 | "source": [ 407 | "### 方法3:np.reshape方法" 408 | ] 409 | }, 410 | { 411 | "cell_type": "markdown", 412 | "metadata": {}, 413 | "source": [ 414 | "#### 给一维数组添加一个行维度" 415 | ] 416 | }, 417 | { 418 | "cell_type": "code", 419 | "execution_count": 15, 420 | "metadata": {}, 421 | "outputs": [ 422 | { 423 | "data": { 424 | "text/plain": [ 425 | "array([0, 1, 2, 3, 4])" 426 | ] 427 
| }, 428 | "execution_count": 15, 429 | "metadata": {}, 430 | "output_type": "execute_result" 431 | } 432 | ], 433 | "source": [ 434 | "arr" 435 | ] 436 | }, 437 | { 438 | "cell_type": "code", 439 | "execution_count": 16, 440 | "metadata": {}, 441 | "outputs": [ 442 | { 443 | "data": { 444 | "text/plain": [ 445 | "array([[0, 1, 2, 3, 4]])" 446 | ] 447 | }, 448 | "execution_count": 16, 449 | "metadata": {}, 450 | "output_type": "execute_result" 451 | } 452 | ], 453 | "source": [ 454 | "np.reshape(arr, (1, 5))" 455 | ] 456 | }, 457 | { 458 | "cell_type": "code", 459 | "execution_count": 17, 460 | "metadata": {}, 461 | "outputs": [ 462 | { 463 | "data": { 464 | "text/plain": [ 465 | "array([[0, 1, 2, 3, 4]])" 466 | ] 467 | }, 468 | "execution_count": 17, 469 | "metadata": {}, 470 | "output_type": "execute_result" 471 | } 472 | ], 473 | "source": [ 474 | "np.reshape(arr, (1, -1))" 475 | ] 476 | }, 477 | { 478 | "cell_type": "code", 479 | "execution_count": 18, 480 | "metadata": {}, 481 | "outputs": [ 482 | { 483 | "data": { 484 | "text/plain": [ 485 | "(1, 5)" 486 | ] 487 | }, 488 | "execution_count": 18, 489 | "metadata": {}, 490 | "output_type": "execute_result" 491 | } 492 | ], 493 | "source": [ 494 | "np.reshape(arr, (1, -1)).shape" 495 | ] 496 | }, 497 | { 498 | "cell_type": "markdown", 499 | "metadata": {}, 500 | "source": [ 501 | "#### 给一维数组添加一个列维度" 502 | ] 503 | }, 504 | { 505 | "cell_type": "code", 506 | "execution_count": null, 507 | "metadata": {}, 508 | "outputs": [], 509 | "source": [ 510 | "np.reshape(arr, (-1, 1))" 511 | ] 512 | }, 513 | { 514 | "cell_type": "code", 515 | "execution_count": null, 516 | "metadata": {}, 517 | "outputs": [], 518 | "source": [ 519 | "np.reshape(arr, (-1, 1)).shape" 520 | ] 521 | } 522 | ], 523 | "metadata": { 524 | "kernelspec": { 525 | "display_name": "Python 3", 526 | "language": "python", 527 | "name": "python3" 528 | }, 529 | "language_info": { 530 | "codemirror_mode": { 531 | "name": "ipython", 532 | "version": 3 533 | 
}, 534 | "file_extension": ".py", 535 | "mimetype": "text/x-python", 536 | "name": "python", 537 | "nbconvert_exporter": "python", 538 | "pygments_lexer": "ipython3", 539 | "version": "3.7.6" 540 | } 541 | }, 542 | "nbformat": 4, 543 | "nbformat_minor": 4 544 | } 545 | -------------------------------------------------------------------------------- /08. Numpy实现K折交叉验证的数据划分.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "## Splitting data for K-fold cross-validation with Numpy" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "This example uses Numpy array slicing to split a dataset for K-fold cross-validation" 15 | ] 16 | }, 17 | { 18 | "cell_type": "markdown", 19 | "metadata": {}, 20 | "source": [ 21 | "### Background: K-fold cross-validation\n", 22 | "\n", 23 | "***Why is it needed?*** \n", 24 | "In machine learning, K-fold cross-validation gives a better estimate of model quality for two reasons:\n", 25 | "1. When samples are scarce, splitting off a test set leaves even less data for training;\n", 26 | "2. Different train/test splits can lead to different measured model performance;\n", 27 | "\n", 28 | "\n", 29 | "***What is K-fold validation?*** \n", 30 | "K-fold validation splits the data into K partitions of equal size. \n", 31 | "For each partition i, the model is trained on the remaining K-1 partitions and then evaluated on partition i. \n", 32 | "The final score is the mean of the K scores; averaging reduces the influence of any single train/test split;\n", 33 | "\n", 34 | "" 35 | ] 36 | }, 37 | { 38 | "cell_type": "markdown", 39 | "metadata": {}, 40 | "source": [ 41 | "### 1. 
Construct a simulated sample set" 42 | ] 43 | }, 44 | { 45 | "cell_type": "code", 46 | "execution_count": 1, 47 | "metadata": {}, 48 | "outputs": [], 49 | "source": [ 50 | "import numpy as np" 51 | ] 52 | }, 53 | { 54 | "cell_type": "code", 55 | "execution_count": 2, 56 | "metadata": {}, 57 | "outputs": [ 58 | { 59 | "data": { 60 | "text/plain": [ 61 | "array([[ 0, 1, 2, 3],\n", 62 | " [ 4, 5, 6, 7],\n", 63 | " [ 8, 9, 10, 11],\n", 64 | " [12, 13, 14, 15],\n", 65 | " [16, 17, 18, 19],\n", 66 | " [20, 21, 22, 23],\n", 67 | " [24, 25, 26, 27],\n", 68 | " [28, 29, 30, 31],\n", 69 | " [32, 33, 34, 35]])" 70 | ] 71 | }, 72 | "execution_count": 2, 73 | "metadata": {}, 74 | "output_type": "execute_result" 75 | } 76 | ], 77 | "source": [ 78 | "data = np.arange(36).reshape(9,4)\n", 79 | "data" 80 | ] 81 | }, 82 | { 83 | "cell_type": "markdown", 84 | "metadata": {}, 85 | "source": [ 86 | "Interpreting the data array in terms of samples:\n", 87 | "* It is a 2-D matrix: each row is a sample and each column is a feature\n", 88 | "* There are 9 samples, each with 4 features\n", 89 | "\n", 90 | "This is the standard input format for training scikit-learn models" 91 | ] 92 | }, 93 | { 94 | "cell_type": "markdown", 95 | "metadata": {}, 96 | "source": [ 97 | "### 2. 
Perform the K splits with NumPy" 98 | ] 99 | }, 100 | { 101 | "cell_type": "code", 102 | "execution_count": 3, 103 | "metadata": {}, 104 | "outputs": [], 105 | "source": [ 106 | "# We want 4-fold cross-validation\n", 107 | "k = 4" 108 | ] 109 | }, 110 | { 111 | "cell_type": "code", 112 | "execution_count": 4, 113 | "metadata": {}, 114 | "outputs": [ 115 | { 116 | "data": { 117 | "text/plain": [ 118 | "2" 119 | ] 120 | }, 121 | "execution_count": 4, 122 | "metadata": {}, 123 | "output_type": "execute_result" 124 | } 125 | ], 126 | "source": [ 127 | "# Number of samples per fold (integer division: 9 // 4 = 2, so the last sample is never used for validation)\n", 128 | "k_samples_count = data.shape[0]//k\n", 129 | "k_samples_count" 130 | ] 131 | }, 132 | { 133 | "cell_type": "code", 134 | "execution_count": 5, 135 | "metadata": { 136 | "scrolled": false 137 | }, 138 | "outputs": [ 139 | { 140 | "name": "stdout", 141 | "output_type": "stream", 142 | "text": [ 143 | "\n", 144 | "#####Fold 0#####\n", 145 | "Validation set:\n", 146 | " [[0 1 2 3]\n", 147 | " [4 5 6 7]]\n", 148 | "Training set:\n", 149 | " [[ 8 9 10 11]\n", 150 | " [12 13 14 15]\n", 151 | " [16 17 18 19]\n", 152 | " [20 21 22 23]\n", 153 | " [24 25 26 27]\n", 154 | " [28 29 30 31]\n", 155 | " [32 33 34 35]]\n", 156 | "\n", 157 | "#####Fold 1#####\n", 158 | "Validation set:\n", 159 | " [[ 8 9 10 11]\n", 160 | " [12 13 14 15]]\n", 161 | "Training set:\n", 162 | " [[ 0 1 2 3]\n", 163 | " [ 4 5 6 7]\n", 164 | " [16 17 18 19]\n", 165 | " [20 21 22 23]\n", 166 | " [24 25 26 27]\n", 167 | " [28 29 30 31]\n", 168 | " [32 33 34 35]]\n", 169 | "\n", 170 | "#####Fold 2#####\n", 171 | "Validation set:\n", 172 | " [[16 17 18 19]\n", 173 | " [20 21 22 23]]\n", 174 | "Training set:\n", 175 | " [[ 0 1 2 3]\n", 176 | " [ 4 5 6 7]\n", 177 | " [ 8 9 10 11]\n", 178 | " [12 13 14 15]\n", 179 | " [24 25 26 27]\n", 180 | " [28 29 30 31]\n", 181 | " [32 33 34 35]]\n", 182 | "\n", 183 | "#####Fold 3#####\n", 184 | "Validation set:\n", 185 | " [[24 25 26 27]\n", 186 | " [28 29 30 31]]\n", 187 | "Training set:\n", 188 | " [[ 0 1 2 3]\n", 189 | " [ 4 5 6 7]\n", 190 | " [ 8 9 10 11]\n", 191 | " [12 13 14 15]\n", 192 | " [16 17 18 19]\n", 193 | " [20 21 
22 23]\n", 194 | " [32 33 34 35]]\n" 195 | ] 196 | } 197 | ], 198 | "source": [ 199 | "for fold in range(k):\n", 200 | "    validation_begin = k_samples_count*fold\n", 201 | "    validation_end = k_samples_count*(fold+1)\n", 202 | "    \n", 203 | "    validation_data = data[validation_begin:validation_end]\n", 204 | "    \n", 205 | "    # np.vstack stacks arrays along the vertical direction\n", 206 | "    train_data = np.vstack([\n", 207 | "        data[:validation_begin], \n", 208 | "        data[validation_end:]\n", 209 | "    ])\n", 210 | "    \n", 211 | "    print()\n", 212 | "    print(f\"#####Fold {fold}#####\")\n", 213 | "    print(\"Validation set:\\n\", validation_data)\n", 214 | "    print(\"Training set:\\n\", train_data)" 215 | ] 216 | }, 217 | { 218 | "cell_type": "markdown", 219 | "metadata": {}, 220 | "source": [ 221 | "If you use scikit-learn, a ready-made implementation already exists: \n", 222 | "from sklearn.model_selection import cross_val_score" 223 | ] 224 | } 225 | ], 226 | "metadata": { 227 | "kernelspec": { 228 | "display_name": "Python 3", 229 | "language": "python", 230 | "name": "python3" 231 | }, 232 | "language_info": { 233 | "codemirror_mode": { 234 | "name": "ipython", 235 | "version": 3 236 | }, 237 | "file_extension": ".py", 238 | "mimetype": "text/x-python", 239 | "name": "python", 240 | "nbconvert_exporter": "python", 241 | "pygments_lexer": "ipython3", 242 | "version": "3.7.6" 243 | } 244 | }, 245 | "nbformat": 4, 246 | "nbformat_minor": 4 247 | } 248 | -------------------------------------------------------------------------------- /09. Numpy非常有用的数组合并操作.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "## NumPy's Very Useful Array Merging Operations" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "Background: when preparing data for machine learning, you often need to merge data from different sources.\n", 15 | "\n", 16 | "Two kinds of scenarios:\n", 17 | "1. Adding rows to existing data, for example appending more samples;\n", 18 | "2. 
Adding columns to existing data, for example appending more features;\n", 19 | "\n", 20 | "All of the following operations merge arrays:\n", 21 | "* np.concatenate(array_list, axis=0/1): joins arrays along the given axis\n", 22 | "* np.vstack or its alias np.row_stack(array_list): stacks arrays vertically (row-wise)\n", 23 | "* np.hstack or np.column_stack(array_list): stacks arrays horizontally (column-wise)" 24 | ] 25 | }, 26 | { 27 | "cell_type": "code", 28 | "execution_count": 1, 29 | "metadata": {}, 30 | "outputs": [], 31 | "source": [ 32 | "import numpy as np" 33 | ] 34 | }, 35 | { 36 | "cell_type": "markdown", 37 | "metadata": {}, 38 | "source": [ 39 | "### 1. How to add new rows to the data" 40 | ] 41 | }, 42 | { 43 | "cell_type": "code", 44 | "execution_count": 2, 45 | "metadata": {}, 46 | "outputs": [], 47 | "source": [ 48 | "a = np.arange(6).reshape(2,3)\n", 49 | "b = np.random.randint(10,20,size=(4,3))" 50 | ] 51 | }, 52 | { 53 | "cell_type": "code", 54 | "execution_count": 3, 55 | "metadata": {}, 56 | "outputs": [ 57 | { 58 | "data": { 59 | "text/plain": [ 60 | "array([[0, 1, 2],\n", 61 | " [3, 4, 5]])" 62 | ] 63 | }, 64 | "execution_count": 3, 65 | "metadata": {}, 66 | "output_type": "execute_result" 67 | } 68 | ], 69 | "source": [ 70 | "a" 71 | ] 72 | }, 73 | { 74 | "cell_type": "code", 75 | "execution_count": 4, 76 | "metadata": {}, 77 | "outputs": [ 78 | { 79 | "data": { 80 | "text/plain": [ 81 | "array([[13, 16, 13],\n", 82 | " [17, 15, 14],\n", 83 | " [19, 13, 19],\n", 84 | " [10, 13, 10]])" 85 | ] 86 | }, 87 | "execution_count": 4, 88 | "metadata": {}, 89 | "output_type": "execute_result" 90 | } 91 | ], 92 | "source": [ 93 | "b" 94 | ] 95 | }, 96 | { 97 | "cell_type": "code", 98 | "execution_count": 5, 99 | "metadata": {}, 100 | "outputs": [ 101 | { 102 | "data": { 103 | "text/plain": [ 104 | "array([[ 0, 1, 2],\n", 105 | " [ 3, 4, 5],\n", 106 | " [13, 16, 13],\n", 107 | " [17, 15, 14],\n", 108 | " [19, 13, 19],\n", 109 | " [10, 13, 10]])" 110 | ] 111 | }, 112 | "execution_count": 5, 113 | "metadata": {}, 114 | "output_type": "execute_result" 115 | } 116 | ], 117 | "source": [ 118 | "# 
Method 1\n", 119 | "np.concatenate([a,b])" 120 | ] 121 | }, 122 | { 123 | "cell_type": "code", 124 | "execution_count": 6, 125 | "metadata": {}, 126 | "outputs": [ 127 | { 128 | "data": { 129 | "text/plain": [ 130 | "array([[ 0, 1, 2],\n", 131 | " [ 3, 4, 5],\n", 132 | " [13, 16, 13],\n", 133 | " [17, 15, 14],\n", 134 | " [19, 13, 19],\n", 135 | " [10, 13, 10]])" 136 | ] 137 | }, 138 | "execution_count": 6, 139 | "metadata": {}, 140 | "output_type": "execute_result" 141 | } 142 | ], 143 | "source": [ 144 | "# Method 2\n", 145 | "np.vstack([a,b])" 146 | ] 147 | }, 148 | { 149 | "cell_type": "code", 150 | "execution_count": 7, 151 | "metadata": {}, 152 | "outputs": [ 153 | { 154 | "data": { 155 | "text/plain": [ 156 | "array([[ 0, 1, 2],\n", 157 | " [ 3, 4, 5],\n", 158 | " [13, 16, 13],\n", 159 | " [17, 15, 14],\n", 160 | " [19, 13, 19],\n", 161 | " [10, 13, 10]])" 162 | ] 163 | }, 164 | "execution_count": 7, 165 | "metadata": {}, 166 | "output_type": "execute_result" 167 | } 168 | ], 169 | "source": [ 170 | "# Method 3\n", 171 | "np.row_stack([a, b])" 172 | ] 173 | }, 174 | { 175 | "cell_type": "markdown", 176 | "metadata": {}, 177 | "source": [ 178 | "### 2. 
How to add new columns to the data" 179 | ] 180 | }, 181 | { 182 | "cell_type": "code", 183 | "execution_count": 8, 184 | "metadata": {}, 185 | "outputs": [], 186 | "source": [ 187 | "a = np.arange(12).reshape(3,4)\n", 188 | "b = np.random.randint(10,20,size=(3,2))" 189 | ] 190 | }, 191 | { 192 | "cell_type": "code", 193 | "execution_count": 9, 194 | "metadata": {}, 195 | "outputs": [ 196 | { 197 | "data": { 198 | "text/plain": [ 199 | "array([[ 0, 1, 2, 3],\n", 200 | " [ 4, 5, 6, 7],\n", 201 | " [ 8, 9, 10, 11]])" 202 | ] 203 | }, 204 | "execution_count": 9, 205 | "metadata": {}, 206 | "output_type": "execute_result" 207 | } 208 | ], 209 | "source": [ 210 | "a" 211 | ] 212 | }, 213 | { 214 | "cell_type": "code", 215 | "execution_count": 10, 216 | "metadata": {}, 217 | "outputs": [ 218 | { 219 | "data": { 220 | "text/plain": [ 221 | "array([[12, 16],\n", 222 | " [18, 12],\n", 223 | " [12, 12]])" 224 | ] 225 | }, 226 | "execution_count": 10, 227 | "metadata": {}, 228 | "output_type": "execute_result" 229 | } 230 | ], 231 | "source": [ 232 | "b" 233 | ] 234 | }, 235 | { 236 | "cell_type": "code", 237 | "execution_count": 11, 238 | "metadata": {}, 239 | "outputs": [ 240 | { 241 | "data": { 242 | "text/plain": [ 243 | "array([[ 0, 1, 2, 3, 12, 16],\n", 244 | " [ 4, 5, 6, 7, 18, 12],\n", 245 | " [ 8, 9, 10, 11, 12, 12]])" 246 | ] 247 | }, 248 | "execution_count": 11, 249 | "metadata": {}, 250 | "output_type": "execute_result" 251 | } 252 | ], 253 | "source": [ 254 | "# Method 1\n", 255 | "np.concatenate([a,b], axis=1)" 256 | ] 257 | }, 258 | { 259 | "cell_type": "code", 260 | "execution_count": null, 261 | "metadata": {}, 262 | "outputs": [], 263 | "source": [ 264 | "# Method 2\n", 265 | "np.hstack([a,b])" 266 | ] 267 | }, 268 | { 269 | "cell_type": "code", 270 | "execution_count": null, 271 | "metadata": {}, 272 | "outputs": [], 273 | "source": [ 274 | "# Method 3\n", 275 | "np.column_stack([a,b])" 276 | ] 277 | } 278 | ], 279 | "metadata": { 280 | "kernelspec": { 281 | "display_name": "Python 3", 282 | 
"language": "python", 283 | "name": "python3" 284 | }, 285 | "language_info": { 286 | "codemirror_mode": { 287 | "name": "ipython", 288 | "version": 3 289 | }, 290 | "file_extension": ".py", 291 | "mimetype": "text/x-python", 292 | "name": "python", 293 | "nbconvert_exporter": "python", 294 | "pygments_lexer": "ipython3", 295 | "version": "3.7.6" 296 | } 297 | }, 298 | "nbformat": 4, 299 | "nbformat_minor": 4 300 | } 301 | -------------------------------------------------------------------------------- /10. Numpy怎样对数组排序.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "## Numpy怎样对数组排序\n", 8 | "\n", 9 | "Numpy给数组排序的三个方法: \n", 10 | "* numpy.sort:返回排序后数组的拷贝\n", 11 | "* array.sort:原地排序数组而不是返回拷贝\n", 12 | "* numpy.argsort:间接排序,返回的是排序后的数字索引\n", 13 | "\n", 14 | "3个方法都支持一个参数kind,可以是以下一个值:\n", 15 | "* quicksort:快速排序,平均O(nlogn),不稳定情况\n", 16 | "* mergesort:归并排序,平均O(nlogn),稳定排序\n", 17 | "* heapsort:堆排序,平均O(nlogn),不稳定排序\n", 18 | "* stable:稳定排序\n", 19 | "\n", 20 | "kind默认值是quicksort,快速排序平均情况是最快,保持默认即可" 21 | ] 22 | }, 23 | { 24 | "cell_type": "code", 25 | "execution_count": 1, 26 | "metadata": {}, 27 | "outputs": [], 28 | "source": [ 29 | "import numpy as np" 30 | ] 31 | }, 32 | { 33 | "cell_type": "markdown", 34 | "metadata": {}, 35 | "source": [ 36 | "### 1. 
np.sort returns a sorted copy of the array" 37 | ] 38 | }, 39 | { 40 | "cell_type": "code", 41 | "execution_count": 2, 42 | "metadata": {}, 43 | "outputs": [], 44 | "source": [ 45 | "arr = np.array([3, 2, 4, 5, 1, 9, 7, 8, 6])" 46 | ] 47 | }, 48 | { 49 | "cell_type": "code", 50 | "execution_count": 3, 51 | "metadata": {}, 52 | "outputs": [ 53 | { 54 | "data": { 55 | "text/plain": [ 56 | "array([1, 2, 3, 4, 5, 6, 7, 8, 9])" 57 | ] 58 | }, 59 | "execution_count": 3, 60 | "metadata": {}, 61 | "output_type": "execute_result" 62 | } 63 | ], 64 | "source": [ 65 | "# Returns a sorted copy; the original array is left unchanged\n", 66 | "np.sort(arr)" 67 | ] 68 | }, 69 | { 70 | "cell_type": "code", 71 | "execution_count": 4, 72 | "metadata": {}, 73 | "outputs": [ 74 | { 75 | "data": { 76 | "text/plain": [ 77 | "array([3, 2, 4, 5, 1, 9, 7, 8, 6])" 78 | ] 79 | }, 80 | "execution_count": 4, 81 | "metadata": {}, 82 | "output_type": "execute_result" 83 | } 84 | ], 85 | "source": [ 86 | "arr" 87 | ] 88 | }, 89 | { 90 | "cell_type": "markdown", 91 | "metadata": {}, 92 | "source": [ 93 | "### 2. 
array.sort sorts in place" 94 | ] 95 | }, 96 | { 97 | "cell_type": "code", 98 | "execution_count": 5, 99 | "metadata": {}, 100 | "outputs": [], 101 | "source": [ 102 | "arr2 = arr.copy()" 103 | ] 104 | }, 105 | { 106 | "cell_type": "code", 107 | "execution_count": 6, 108 | "metadata": {}, 109 | "outputs": [ 110 | { 111 | "data": { 112 | "text/plain": [ 113 | "array([3, 2, 4, 5, 1, 9, 7, 8, 6])" 114 | ] 115 | }, 116 | "execution_count": 6, 117 | "metadata": {}, 118 | "output_type": "execute_result" 119 | } 120 | ], 121 | "source": [ 122 | "arr2" 123 | ] 124 | }, 125 | { 126 | "cell_type": "code", 127 | "execution_count": 7, 128 | "metadata": {}, 129 | "outputs": [], 130 | "source": [ 131 | "arr2.sort()" 132 | ] 133 | }, 134 | { 135 | "cell_type": "code", 136 | "execution_count": 8, 137 | "metadata": {}, 138 | "outputs": [ 139 | { 140 | "data": { 141 | "text/plain": [ 142 | "array([1, 2, 3, 4, 5, 6, 7, 8, 9])" 143 | ] 144 | }, 145 | "execution_count": 8, 146 | "metadata": {}, 147 | "output_type": "execute_result" 148 | } 149 | ], 150 | "source": [ 151 | "arr2" 152 | ] 153 | }, 154 | { 155 | "cell_type": "markdown", 156 | "metadata": {}, 157 | "source": [ 158 | "### 3. 
np.argsort returns the indices of the sorted elements" 159 | ] 160 | }, 161 | { 162 | "cell_type": "code", 163 | "execution_count": 9, 164 | "metadata": {}, 165 | "outputs": [ 166 | { 167 | "data": { 168 | "text/plain": [ 169 | "array([3, 2, 4, 5, 1, 9, 7, 8, 6])" 170 | ] 171 | }, 172 | "execution_count": 9, 173 | "metadata": {}, 174 | "output_type": "execute_result" 175 | } 176 | ], 177 | "source": [ 178 | "arr" 179 | ] 180 | }, 181 | { 182 | "cell_type": "code", 183 | "execution_count": 10, 184 | "metadata": {}, 185 | "outputs": [ 186 | { 187 | "data": { 188 | "text/plain": [ 189 | "array([4, 1, 0, 2, 3, 8, 6, 7, 5], dtype=int64)" 190 | ] 191 | }, 192 | "execution_count": 10, 193 | "metadata": {}, 194 | "output_type": "execute_result" 195 | } 196 | ], 197 | "source": [ 198 | "# Get the indices that put the elements in sorted order\n", 199 | "indices = np.argsort(arr)\n", 200 | "indices" 201 | ] 202 | }, 203 | { 204 | "cell_type": "code", 205 | "execution_count": 11, 206 | "metadata": { 207 | "scrolled": true 208 | }, 209 | "outputs": [ 210 | { 211 | "data": { 212 | "text/plain": [ 213 | "array([1, 2, 3, 4, 5, 6, 7, 8, 9])" 214 | ] 215 | }, 216 | "execution_count": 11, 217 | "metadata": {}, 218 | "output_type": "execute_result" 219 | } 220 | ], 221 | "source": [ 222 | "# Indexing with them yields the sorted data directly\n", 223 | "arr[indices]" 224 | ] 225 | }, 226 | { 227 | "cell_type": "markdown", 228 | "metadata": {}, 229 | "source": [ 230 | "### 4. Performance comparison: Python's built-in sorted vs. np.sort" 231 | ] 232 | }, 233 | { 234 | "cell_type": "code", 235 | "execution_count": 12, 236 | "metadata": {}, 237 | "outputs": [], 238 | "source": [ 239 | "arr_np = np.random.randint(0, 100, 100*10000)" 240 | ] 241 | }, 242 | { 243 | "cell_type": "code", 244 | "execution_count": 13, 245 | "metadata": {}, 246 | "outputs": [ 247 | { 248 | "name": "stdout", 249 | "output_type": "stream", 250 | "text": [ 251 | "24 ms ± 2.14 ms per loop (mean ± std. dev. 
of 7 runs, 10 loops each)\n" 252 | ] 253 | } 254 | ], 255 | "source": [ 256 | "%timeit np.sort(arr_np)" 257 | ] 258 | }, 259 | { 260 | "cell_type": "code", 261 | "execution_count": 14, 262 | "metadata": {}, 263 | "outputs": [], 264 | "source": [ 265 | "# Convert the NumPy array to a Python list\n", 266 | "arr_py = arr_np.tolist()" 267 | ] 268 | }, 269 | { 270 | "cell_type": "code", 271 | "execution_count": 15, 272 | "metadata": {}, 273 | "outputs": [ 274 | { 275 | "name": "stdout", 276 | "output_type": "stream", 277 | "text": [ 278 | "90.1 ms ± 726 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)\n" 279 | ] 280 | } 281 | ], 282 | "source": [ 283 | "%timeit sorted(arr_py)" 284 | ] 285 | }, 286 | { 287 | "cell_type": "code", 288 | "execution_count": null, 289 | "metadata": {}, 290 | "outputs": [], 291 | "source": [] 292 | } 293 | ], 294 | "metadata": { 295 | "kernelspec": { 296 | "display_name": "Python 3", 297 | "language": "python", 298 | "name": "python3" 299 | }, 300 | "language_info": { 301 | "codemirror_mode": { 302 | "name": "ipython", 303 | "version": 3 304 | }, 305 | "file_extension": ".py", 306 | "mimetype": "text/x-python", 307 | "name": "python", 308 | "nbconvert_exporter": "python", 309 | "pygments_lexer": "ipython3", 310 | "version": "3.7.6" 311 | } 312 | }, 313 | "nbformat": 4, 314 | "nbformat_minor": 4 315 | } 316 | -------------------------------------------------------------------------------- /11. Numpy中数组的乘法.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "## Array Multiplication in NumPy\n", 8 | "\n", 9 | "Depending on the dimensions of the two operands A and B, multiplication falls into the following cases:\n", 10 | "1. A scalar times a 1-D/2-D array;\n", 11 | "2. A 1-D array times a 1-D array;\n", 12 | "3. A 2-D array times a 1-D array;\n", 13 | "4. A 2-D array times a 2-D array;\n", 14 | "\n", 15 | "**NumPy provides the following multiplication functions:** \n", 16 | "1. The * operator or np.multiply: element-wise multiplication of elements at matching positions; the shapes must be identical or broadcast-compatible\n", 17 | "2. The @ operator or np.matmul: matrix multiplication; the shapes must satisfy (n,k),(k,m)->(n,m)\n", 18 | "3. 
np.dot: dot product\n", 19 | "\n", 20 | "**Note: the dot product is also called the inner product or the scalar product** \n", 21 | "The dot product of two vectors a = [a1, a2, …, an] and b = [b1, b2, …, bn] is defined as: \n", 22 | "a·b = a1b1 + a2b2 + … + anbn." 23 | ] 24 | }, 25 | { 26 | "cell_type": "code", 27 | "execution_count": 1, 28 | "metadata": {}, 29 | "outputs": [], 30 | "source": [ 31 | "import numpy as np" 32 | ] 33 | }, 34 | { 35 | "cell_type": "markdown", 36 | "metadata": {}, 37 | "source": [ 38 | "### 1. A scalar times a 1-D/2-D array" 39 | ] 40 | }, 41 | { 42 | "cell_type": "markdown", 43 | "metadata": {}, 44 | "source": [ 45 | "#### 1-D array" 46 | ] 47 | }, 48 | { 49 | "cell_type": "code", 50 | "execution_count": 2, 51 | "metadata": {}, 52 | "outputs": [ 53 | { 54 | "data": { 55 | "text/plain": [ 56 | "array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])" 57 | ] 58 | }, 59 | "execution_count": 2, 60 | "metadata": {}, 61 | "output_type": "execute_result" 62 | } 63 | ], 64 | "source": [ 65 | "A = np.arange(10)\n", 66 | "A" 67 | ] 68 | }, 69 | { 70 | "cell_type": "code", 71 | "execution_count": 3, 72 | "metadata": {}, 73 | "outputs": [ 74 | { 75 | "data": { 76 | "text/plain": [ 77 | "array([0. , 0.5, 1. , 1.5, 2. , 2.5, 3. , 3.5, 4. 
, 4.5])" 78 | ] 79 | }, 80 | "execution_count": 3, 81 | "metadata": {}, 82 | "output_type": "execute_result" 83 | } 84 | ], 85 | "source": [ 86 | "# * means element-wise multiplication\n", 87 | "A * 0.5" 88 | ] 89 | }, 90 | { 91 | "cell_type": "markdown", 92 | "metadata": {}, 93 | "source": [ 94 | "#### 2-D array" 95 | ] 96 | }, 97 | { 98 | "cell_type": "code", 99 | "execution_count": 4, 100 | "metadata": {}, 101 | "outputs": [ 102 | { 103 | "data": { 104 | "text/plain": [ 105 | "array([[ 0, 1, 2, 3],\n", 106 | " [ 4, 5, 6, 7],\n", 107 | " [ 8, 9, 10, 11]])" 108 | ] 109 | }, 110 | "execution_count": 4, 111 | "metadata": {}, 112 | "output_type": "execute_result" 113 | } 114 | ], 115 | "source": [ 116 | "B = np.arange(12).reshape(3, 4)\n", 117 | "B" 118 | ] 119 | }, 120 | { 121 | "cell_type": "code", 122 | "execution_count": 5, 123 | "metadata": {}, 124 | "outputs": [ 125 | { 126 | "data": { 127 | "text/plain": [ 128 | "array([[0. , 0.5, 1. , 1.5],\n", 129 | " [2. , 2.5, 3. , 3.5],\n", 130 | " [4. , 4.5, 5. , 5.5]])" 131 | ] 132 | }, 133 | "execution_count": 5, 134 | "metadata": {}, 135 | "output_type": "execute_result" 136 | } 137 | ], 138 | "source": [ 139 | "B * 0.5" 140 | ] 141 | }, 142 | { 143 | "cell_type": "markdown", 144 | "metadata": {}, 145 | "source": [ 146 | "### 2. A 1-D array times a 1-D array" 147 | ] 148 | }, 149 | { 150 | "cell_type": "code", 151 | "execution_count": 6, 152 | "metadata": {}, 153 | "outputs": [ 154 | { 155 | "data": { 156 | "text/plain": [ 157 | "array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])" 158 | ] 159 | }, 160 | "execution_count": 6, 161 | "metadata": {}, 162 | "output_type": "execute_result" 163 | } 164 | ], 165 | "source": [ 166 | "A = np.arange(1, 11)\n", 167 | "A" 168 | ] 169 | }, 170 | { 171 | "cell_type": "code", 172 | "execution_count": 7, 173 | "metadata": {}, 174 | "outputs": [ 175 | { 176 | "data": { 177 | "text/plain": [ 178 | "array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1. 
])" 179 | ] 180 | }, 181 | "execution_count": 7, 182 | "metadata": {}, 183 | "output_type": "execute_result" 184 | } 185 | ], 186 | "source": [ 187 | "B = np.arange(1, 11) * 0.1\n", 188 | "B" 189 | ] 190 | }, 191 | { 192 | "cell_type": "markdown", 193 | "metadata": {}, 194 | "source": [ 195 | "#### Element-wise multiplication" 196 | ] 197 | }, 198 | { 199 | "cell_type": "code", 200 | "execution_count": 8, 201 | "metadata": {}, 202 | "outputs": [ 203 | { 204 | "data": { 205 | "text/plain": [ 206 | "array([ 0.1, 0.4, 0.9, 1.6, 2.5, 3.6, 4.9, 6.4, 8.1, 10. ])" 207 | ] 208 | }, 209 | "execution_count": 8, 210 | "metadata": {}, 211 | "output_type": "execute_result" 212 | } 213 | ], 214 | "source": [ 215 | "np.multiply(A, B)" 216 | ] 217 | }, 218 | { 219 | "cell_type": "code", 220 | "execution_count": 9, 221 | "metadata": {}, 222 | "outputs": [ 223 | { 224 | "data": { 225 | "text/plain": [ 226 | "array([ 0.1, 0.4, 0.9, 1.6, 2.5, 3.6, 4.9, 6.4, 8.1, 10. ])" 227 | ] 228 | }, 229 | "execution_count": 9, 230 | "metadata": {}, 231 | "output_type": "execute_result" 232 | } 233 | ], 234 | "source": [ 235 | "A * B" 236 | ] 237 | }, 238 | { 239 | "cell_type": "markdown", 240 | "metadata": {}, 241 | "source": [ 242 | "#### Dot product / inner product / scalar product" 243 | ] 244 | }, 245 | { 246 | "cell_type": "code", 247 | "execution_count": 11, 248 | "metadata": {}, 249 | "outputs": [ 250 | { 251 | "data": { 252 | "text/plain": [ 253 | "38.5" 254 | ] 255 | }, 256 | "execution_count": 11, 257 | "metadata": {}, 258 | "output_type": "execute_result" 259 | } 260 | ], 261 | "source": [ 262 | "A@B" 263 | ] 264 | }, 265 | { 266 | "cell_type": "code", 267 | "execution_count": 12, 268 | "metadata": {}, 269 | "outputs": [ 270 | { 271 | "data": { 272 | "text/plain": [ 273 | "38.5" 274 | ] 275 | }, 276 | "execution_count": 12, 277 | "metadata": {}, 278 | "output_type": "execute_result" 279 | } 280 | ], 281 | "source": [ 282 | "np.matmul(A, B)" 283 | ] 284 | }, 285 | { 286 | "cell_type": "code", 287 | "execution_count": 10, 288 | "metadata": {}, 
289 | "outputs": [ 290 | { 291 | "data": { 292 | "text/plain": [ 293 | "38.5" 294 | ] 295 | }, 296 | "execution_count": 10, 297 | "metadata": {}, 298 | "output_type": "execute_result" 299 | } 300 | ], 301 | "source": [ 302 | "np.dot(A, B)" 303 | ] 304 | }, 305 | { 306 | "cell_type": "code", 307 | "execution_count": 13, 308 | "metadata": {}, 309 | "outputs": [ 310 | { 311 | "data": { 312 | "text/plain": [ 313 | "38.5" 314 | ] 315 | }, 316 | "execution_count": 13, 317 | "metadata": {}, 318 | "output_type": "execute_result" 319 | } 320 | ], 321 | "source": [ 322 | "# The three calls above are equivalent to\n", 323 | "np.sum(A*B)" 324 | ] 325 | }, 326 | { 327 | "cell_type": "markdown", 328 | "metadata": {}, 329 | "source": [ 330 | "### 3. A 2-D array times a 1-D array" 331 | ] 332 | }, 333 | { 334 | "cell_type": "code", 335 | "execution_count": 14, 336 | "metadata": {}, 337 | "outputs": [ 338 | { 339 | "data": { 340 | "text/plain": [ 341 | "array([[ 1, 2, 3, 4],\n", 342 | " [ 5, 6, 7, 8],\n", 343 | " [ 9, 10, 11, 12],\n", 344 | " [13, 14, 15, 16],\n", 345 | " [17, 18, 19, 20]])" 346 | ] 347 | }, 348 | "execution_count": 14, 349 | "metadata": {}, 350 | "output_type": "execute_result" 351 | } 352 | ], 353 | "source": [ 354 | "A = np.arange(1, 21).reshape(5, 4)\n", 355 | "A" 356 | ] 357 | }, 358 | { 359 | "cell_type": "code", 360 | "execution_count": 15, 361 | "metadata": {}, 362 | "outputs": [ 363 | { 364 | "data": { 365 | "text/plain": [ 366 | "array([0.1, 0.2, 0.3, 0.4])" 367 | ] 368 | }, 369 | "execution_count": 15, 370 | "metadata": {}, 371 | "output_type": "execute_result" 372 | } 373 | ], 374 | "source": [ 375 | "B = np.arange(1, 5) * 0.1\n", 376 | "B" 377 | ] 378 | }, 379 | { 380 | "cell_type": "markdown", 381 | "metadata": {}, 382 | "source": [ 383 | "#### Element-wise multiplication" 384 | ] 385 | }, 386 | { 387 | "cell_type": "code", 388 | "execution_count": 16, 389 | "metadata": {}, 390 | "outputs": [ 391 | { 392 | "data": { 393 | "text/plain": [ 394 | "array([[0.1, 0.4, 0.9, 1.6],\n", 395 | " [0.5, 1.2, 2.1, 3.2],\n", 396 | " 
[0.9, 2. , 3.3, 4.8],\n", 397 | " [1.3, 2.8, 4.5, 6.4],\n", 398 | " [1.7, 3.6, 5.7, 8. ]])" 399 | ] 400 | }, 401 | "execution_count": 16, 402 | "metadata": {}, 403 | "output_type": "execute_result" 404 | } 405 | ], 406 | "source": [ 407 | "A*B" 408 | ] 409 | }, 410 | { 411 | "cell_type": "code", 412 | "execution_count": 17, 413 | "metadata": {}, 414 | "outputs": [ 415 | { 416 | "data": { 417 | "text/plain": [ 418 | "array([[0.1, 0.4, 0.9, 1.6],\n", 419 | " [0.5, 1.2, 2.1, 3.2],\n", 420 | " [0.9, 2. , 3.3, 4.8],\n", 421 | " [1.3, 2.8, 4.5, 6.4],\n", 422 | " [1.7, 3.6, 5.7, 8. ]])" 423 | ] 424 | }, 425 | "execution_count": 17, 426 | "metadata": {}, 427 | "output_type": "execute_result" 428 | } 429 | ], 430 | "source": [ 431 | "np.multiply(A, B)" 432 | ] 433 | }, 434 | { 435 | "cell_type": "markdown", 436 | "metadata": {}, 437 | "source": [ 438 | "#### Matrix multiplication" 439 | ] 440 | }, 441 | { 442 | "cell_type": "code", 443 | "execution_count": 18, 444 | "metadata": {}, 445 | "outputs": [ 446 | { 447 | "data": { 448 | "text/plain": [ 449 | "array([ 3., 7., 11., 15., 19.])" 450 | ] 451 | }, 452 | "execution_count": 18, 453 | "metadata": {}, 454 | "output_type": "execute_result" 455 | } 456 | ], 457 | "source": [ 458 | "A@B" 459 | ] 460 | }, 461 | { 462 | "cell_type": "code", 463 | "execution_count": 19, 464 | "metadata": {}, 465 | "outputs": [ 466 | { 467 | "data": { 468 | "text/plain": [ 469 | "array([ 3., 7., 11., 15., 19.])" 470 | ] 471 | }, 472 | "execution_count": 19, 473 | "metadata": {}, 474 | "output_type": "execute_result" 475 | } 476 | ], 477 | "source": [ 478 | "np.matmul(A, B)" 479 | ] 480 | }, 481 | { 482 | "cell_type": "code", 483 | "execution_count": 20, 484 | "metadata": {}, 485 | "outputs": [ 486 | { 487 | "data": { 488 | "text/plain": [ 489 | "array([ 3., 7., 11., 15., 19.])" 490 | ] 491 | }, 492 | "execution_count": 20, 493 | "metadata": {}, 494 | "output_type": "execute_result" 495 | } 496 | ], 497 | "source": [ 498 | "np.dot(A, B)" 499 | ] 500 | }, 501 | { 
502 | "cell_type": "markdown", 503 | "metadata": {}, 504 | "source": [ 505 | "### 4. Matrix multiplication when A and B are both 2-D arrays" 506 | ] 507 | }, 508 | { 509 | "cell_type": "code", 510 | "execution_count": 21, 511 | "metadata": {}, 512 | "outputs": [ 513 | { 514 | "data": { 515 | "text/plain": [ 516 | "array([[ 0, 1, 2, 3],\n", 517 | " [ 4, 5, 6, 7],\n", 518 | " [ 8, 9, 10, 11]])" 519 | ] 520 | }, 521 | "execution_count": 21, 522 | "metadata": {}, 523 | "output_type": "execute_result" 524 | } 525 | ], 526 | "source": [ 527 | "A = np.arange(12).reshape(3, 4)\n", 528 | "A" 529 | ] 530 | }, 531 | { 532 | "cell_type": "code", 533 | "execution_count": 22, 534 | "metadata": {}, 535 | "outputs": [ 536 | { 537 | "data": { 538 | "text/plain": [ 539 | "array([[ 0, 1, 2, 3, 4],\n", 540 | " [ 5, 6, 7, 8, 9],\n", 541 | " [10, 11, 12, 13, 14],\n", 542 | " [15, 16, 17, 18, 19]])" 543 | ] 544 | }, 545 | "execution_count": 22, 546 | "metadata": {}, 547 | "output_type": "execute_result" 548 | } 549 | ], 550 | "source": [ 551 | "B = np.arange(20).reshape(4, 5)\n", 552 | "B" 553 | ] 554 | }, 555 | { 556 | "cell_type": "code", 557 | "execution_count": 23, 558 | "metadata": { 559 | "scrolled": true 560 | }, 561 | "outputs": [ 562 | { 563 | "data": { 564 | "text/plain": [ 565 | "array([[ 70, 76, 82, 88, 94],\n", 566 | " [190, 212, 234, 256, 278],\n", 567 | " [310, 348, 386, 424, 462]])" 568 | ] 569 | }, 570 | "execution_count": 23, 571 | "metadata": {}, 572 | "output_type": "execute_result" 573 | } 574 | ], 575 | "source": [ 576 | "A@B" 577 | ] 578 | }, 579 | { 580 | "cell_type": "code", 581 | "execution_count": 24, 582 | "metadata": {}, 583 | "outputs": [ 584 | { 585 | "data": { 586 | "text/plain": [ 587 | "array([[ 70, 76, 82, 88, 94],\n", 588 | " [190, 212, 234, 256, 278],\n", 589 | " [310, 348, 386, 424, 462]])" 590 | ] 591 | }, 592 | "execution_count": 24, 593 | "metadata": {}, 594 | "output_type": "execute_result" 595 | } 596 | ], 597 | "source": [ 598 | "np.matmul(A, B)" 599 | ] 600 | }, 601 | { 602 | 
"cell_type": "code", 603 | "execution_count": 25, 604 | "metadata": {}, 605 | "outputs": [ 606 | { 607 | "data": { 608 | "text/plain": [ 609 | "array([[ 70, 76, 82, 88, 94],\n", 610 | " [190, 212, 234, 256, 278],\n", 611 | " [310, 348, 386, 424, 462]])" 612 | ] 613 | }, 614 | "execution_count": 25, 615 | "metadata": {}, 616 | "output_type": "execute_result" 617 | } 618 | ], 619 | "source": [ 620 | "np.dot(A, B)" 621 | ] 622 | }, 623 | { 624 | "cell_type": "code", 625 | "execution_count": null, 626 | "metadata": {}, 627 | "outputs": [], 628 | "source": [] 629 | } 630 | ], 631 | "metadata": { 632 | "kernelspec": { 633 | "display_name": "Python 3", 634 | "language": "python", 635 | "name": "python3" 636 | }, 637 | "language_info": { 638 | "codemirror_mode": { 639 | "name": "ipython", 640 | "version": 3 641 | }, 642 | "file_extension": ".py", 643 | "mimetype": "text/x-python", 644 | "name": "python", 645 | "nbconvert_exporter": "python", 646 | "pygments_lexer": "ipython3", 647 | "version": "3.7.6" 648 | } 649 | }, 650 | "nbformat": 4, 651 | "nbformat_minor": 4 652 | } 653 | -------------------------------------------------------------------------------- /12. Numpy中重要的广播概念.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "## Numpy中重要的广播概念" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "***广播:*** \n", 15 | "简单理解为用于不同大小数组的二元通用函数(加、减、乘等)的一组规则\n", 16 | "\n", 17 | "***广播的规则:***\n", 18 | "1. 如果两个数组的维度数dim不相同,那么小维度数组的形状将会在左边补1\n", 19 | "2. 如果shape的维度不匹配,但是有维度是1,那么可以扩展维度是1的维度匹配另一个数组;\n", 20 | "3. 
If the shapes do not match in some dimension and neither size is 1, matching fails and an error is raised;" 21 | ] 22 | }, 23 | { 24 | "cell_type": "code", 25 | "execution_count": 1, 26 | "metadata": {}, 27 | "outputs": [], 28 | "source": [ 29 | "import numpy as np" 30 | ] 31 | }, 32 | { 33 | "cell_type": "markdown", 34 | "metadata": {}, 35 | "source": [ 36 | "### Example 1: adding a 1-D array to a 2-D array" 37 | ] 38 | }, 39 | { 40 | "cell_type": "code", 41 | "execution_count": 2, 42 | "metadata": {}, 43 | "outputs": [ 44 | { 45 | "data": { 46 | "text/plain": [ 47 | "array([[1., 1., 1.],\n", 48 | " [1., 1., 1.]])" 49 | ] 50 | }, 51 | "execution_count": 2, 52 | "metadata": {}, 53 | "output_type": "execute_result" 54 | } 55 | ], 56 | "source": [ 57 | "a = np.ones((2,3))\n", 58 | "a" 59 | ] 60 | }, 61 | { 62 | "cell_type": "code", 63 | "execution_count": 3, 64 | "metadata": {}, 65 | "outputs": [ 66 | { 67 | "data": { 68 | "text/plain": [ 69 | "array([0, 1, 2])" 70 | ] 71 | }, 72 | "execution_count": 3, 73 | "metadata": {}, 74 | "output_type": "execute_result" 75 | } 76 | ], 77 | "source": [ 78 | "b = np.arange(3)\n", 79 | "b" 80 | ] 81 | }, 82 | { 83 | "cell_type": "code", 84 | "execution_count": 4, 85 | "metadata": {}, 86 | "outputs": [ 87 | { 88 | "data": { 89 | "text/plain": [ 90 | "((2, 3), (3,))" 91 | ] 92 | }, 93 | "execution_count": 4, 94 | "metadata": {}, 95 | "output_type": "execute_result" 96 | } 97 | ], 98 | "source": [ 99 | "a.shape, b.shape" 100 | ] 101 | }, 102 | { 103 | "cell_type": "code", 104 | "execution_count": 5, 105 | "metadata": {}, 106 | "outputs": [ 107 | { 108 | "data": { 109 | "text/plain": [ 110 | "array([[1., 2., 3.],\n", 111 | " [1., 2., 3.]])" 112 | ] 113 | }, 114 | "execution_count": 5, 115 | "metadata": {}, 116 | "output_type": "execute_result" 117 | } 118 | ], 119 | "source": [ 120 | "# The shapes differ, but the arrays can still be added\n", 121 | "a + b" 122 | ] 123 | }, 124 | { 125 | "cell_type": "markdown", 126 | "metadata": {}, 127 | "source": [ 128 | "***Analysis: a.shape=(2, 3), b.shape=(3,)***\n", 129 | "1. By rule 1, b.shape becomes (1, 3)\n", 130 | "2. 
By rule 2, b.shape is then stretched to (2, 3), which is equivalent to repeating b along the rows\n", 131 | "3. The shapes now match" 132 | ] 133 | }, 134 | { 135 | "cell_type": "markdown", 136 | "metadata": {}, 137 | "source": [ 138 | "### Example 2: both arrays need broadcasting" 139 | ] 140 | }, 141 | { 142 | "cell_type": "code", 143 | "execution_count": 6, 144 | "metadata": {}, 145 | "outputs": [ 146 | { 147 | "data": { 148 | "text/plain": [ 149 | "array([[0],\n", 150 | " [1],\n", 151 | " [2]])" 152 | ] 153 | }, 154 | "execution_count": 6, 155 | "metadata": {}, 156 | "output_type": "execute_result" 157 | } 158 | ], 159 | "source": [ 160 | "a = np.arange(3).reshape((3, 1))\n", 161 | "a" 162 | ] 163 | }, 164 | { 165 | "cell_type": "code", 166 | "execution_count": 7, 167 | "metadata": {}, 168 | "outputs": [ 169 | { 170 | "data": { 171 | "text/plain": [ 172 | "array([0, 1, 2])" 173 | ] 174 | }, 175 | "execution_count": 7, 176 | "metadata": {}, 177 | "output_type": "execute_result" 178 | } 179 | ], 180 | "source": [ 181 | "b = np.arange(3)\n", 182 | "b" 183 | ] 184 | }, 185 | { 186 | "cell_type": "code", 187 | "execution_count": 8, 188 | "metadata": {}, 189 | "outputs": [ 190 | { 191 | "data": { 192 | "text/plain": [ 193 | "((3, 1), (3,))" 194 | ] 195 | }, 196 | "execution_count": 8, 197 | "metadata": {}, 198 | "output_type": "execute_result" 199 | } 200 | ], 201 | "source": [ 202 | "a.shape, b.shape" 203 | ] 204 | }, 205 | { 206 | "cell_type": "code", 207 | "execution_count": 9, 208 | "metadata": { 209 | "scrolled": true 210 | }, 211 | "outputs": [ 212 | { 213 | "data": { 214 | "text/plain": [ 215 | "array([[0, 1, 2],\n", 216 | " [1, 2, 3],\n", 217 | " [2, 3, 4]])" 218 | ] 219 | }, 220 | "execution_count": 9, 221 | "metadata": {}, 222 | "output_type": "execute_result" 223 | } 224 | ], 225 | "source": [ 226 | "a + b" 227 | ] 228 | }, 229 | { 230 | "cell_type": "markdown", 231 | "metadata": {}, 232 | "source": [ 233 | "***Analysis: a.shape is (3, 1), b.shape is (3,)***\n", 234 | "1. By rule 1, b.shape becomes (1, 3)\n", 235 | "2. By rule 2, b.shape is then stretched to (3, 3), which is equivalent to repeating b along the rows\n", 236 | "3. 
By rule 2, a.shape is likewise stretched to (3, 3), which is equivalent to repeating a along the columns\n", 237 | "4. The shapes now match" 238 | ] 239 | }, 240 | { 241 | "cell_type": "markdown", 242 | "metadata": {}, 243 | "source": [ 244 | "### Example 3: shapes that cannot be broadcast" 245 | ] 246 | }, 247 | { 248 | "cell_type": "code", 249 | "execution_count": 10, 250 | "metadata": {}, 251 | "outputs": [ 252 | { 253 | "data": { 254 | "text/plain": [ 255 | "array([[1., 1.],\n", 256 | " [1., 1.],\n", 257 | " [1., 1.]])" 258 | ] 259 | }, 260 | "execution_count": 10, 261 | "metadata": {}, 262 | "output_type": "execute_result" 263 | } 264 | ], 265 | "source": [ 266 | "a = np.ones((3,2))\n", 267 | "a" 268 | ] 269 | }, 270 | { 271 | "cell_type": "code", 272 | "execution_count": 11, 273 | "metadata": {}, 274 | "outputs": [ 275 | { 276 | "data": { 277 | "text/plain": [ 278 | "array([0, 1, 2])" 279 | ] 280 | }, 281 | "execution_count": 11, 282 | "metadata": {}, 283 | "output_type": "execute_result" 284 | } 285 | ], 286 | "source": [ 287 | "b = np.arange(3)\n", 288 | "b" 289 | ] 290 | }, 291 | { 292 | "cell_type": "code", 293 | "execution_count": 12, 294 | "metadata": {}, 295 | "outputs": [ 296 | { 297 | "data": { 298 | "text/plain": [ 299 | "((3, 2), (3,))" 300 | ] 301 | }, 302 | "execution_count": 12, 303 | "metadata": {}, 304 | "output_type": "execute_result" 305 | } 306 | ], 307 | "source": [ 308 | "a.shape, b.shape" 309 | ] 310 | }, 311 | { 312 | "cell_type": "code", 313 | "execution_count": 13, 314 | "metadata": {}, 315 | "outputs": [ 316 | { 317 | "ename": "ValueError", 318 | "evalue": "operands could not be broadcast together with shapes (3,2) (3,) ", 319 | "output_type": "error", 320 | "traceback": [ 321 | "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m", 322 | "\u001b[1;31mValueError\u001b[0m Traceback (most recent call last)", 323 | "\u001b[1;32m\u001b[0m in \u001b[0;36m\u001b[1;34m\u001b[0m\n\u001b[1;32m----> 1\u001b[1;33m \u001b[0ma\u001b[0m \u001b[1;33m+\u001b[0m 
\u001b[0mb\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m", 324 | "\u001b[1;31mValueError\u001b[0m: operands could not be broadcast together with shapes (3,2) (3,) " 325 | ] 326 | } 327 | ], 328 | "source": [ 329 | "a + b" 330 | ] 331 | }, 332 | { 333 | "cell_type": "markdown", 334 | "metadata": {}, 335 | "source": [ 336 | "***Analysis: a.shape is (3, 2), b.shape is (3,)***\n", 337 | "1. By rule 1, b.shape becomes (1, 3)\n", 338 | "2. By rule 2, b.shape is then stretched to (3, 3), which is equivalent to repeating b along the rows\n", 339 | "3. By rule 3, the last dimensions still differ (2 vs. 3) and neither of them is 1, so broadcasting fails with an error" 340 | ] 341 | } 342 | ], 343 | "metadata": { 344 | "kernelspec": { 345 | "display_name": "Python 3", 346 | "language": "python", 347 | "name": "python3" 348 | }, 349 | "language_info": { 350 | "codemirror_mode": { 351 | "name": "ipython", 352 | "version": 3 353 | }, 354 | "file_extension": ".py", 355 | "mimetype": "text/x-python", 356 | "name": "python", 357 | "nbconvert_exporter": "python", 358 | "pygments_lexer": "ipython3", 359 | "version": "3.7.6" 360 | } 361 | }, 362 | "nbformat": 4, 363 | "nbformat_minor": 4 364 | } 365 | -------------------------------------------------------------------------------- /13. Numpy求解线性方程组.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "## Solving a system of linear equations with NumPy\n", 8 | "\n", 9 | "Given Ax=b with A and b known, how do we compute x?" 10 | ] 11 | }, 12 | { 13 | "cell_type": "markdown", 14 | "metadata": {}, 15 | "source": [ 16 | "### 1. Import the package" 17 | ] 18 | }, 19 | { 20 | "cell_type": "code", 21 | "execution_count": 1, 22 | "metadata": {}, 23 | "outputs": [], 24 | "source": [ 25 | "import numpy as np" 26 | ] 27 | }, 28 | { 29 | "cell_type": "markdown", 30 | "metadata": {}, 31 | "source": [ 32 | "### 2. 
求解" 33 | ] 34 | }, 35 | { 36 | "cell_type": "code", 37 | "execution_count": 2, 38 | "metadata": {}, 39 | "outputs": [ 40 | { 41 | "data": { 42 | "text/plain": [ 43 | "array([[ 1, -2, 1],\n", 44 | " [ 0, 2, -8],\n", 45 | " [-4, 5, 9]])" 46 | ] 47 | }, 48 | "execution_count": 2, 49 | "metadata": {}, 50 | "output_type": "execute_result" 51 | } 52 | ], 53 | "source": [ 54 | "A = np.array(\n", 55 | " [\n", 56 | " [1, -2, 1],\n", 57 | " [0, 2, -8],\n", 58 | " [-4, 5, 9]\n", 59 | " ]\n", 60 | ")\n", 61 | "A" 62 | ] 63 | }, 64 | { 65 | "cell_type": "code", 66 | "execution_count": 3, 67 | "metadata": {}, 68 | "outputs": [ 69 | { 70 | "data": { 71 | "text/plain": [ 72 | "array([ 0, 8, -9])" 73 | ] 74 | }, 75 | "execution_count": 3, 76 | "metadata": {}, 77 | "output_type": "execute_result" 78 | } 79 | ], 80 | "source": [ 81 | "b = np.array([0, 8, -9])\n", 82 | "b" 83 | ] 84 | }, 85 | { 86 | "cell_type": "code", 87 | "execution_count": 4, 88 | "metadata": {}, 89 | "outputs": [ 90 | { 91 | "data": { 92 | "text/plain": [ 93 | "array([29., 16., 3.])" 94 | ] 95 | }, 96 | "execution_count": 4, 97 | "metadata": {}, 98 | "output_type": "execute_result" 99 | } 100 | ], 101 | "source": [ 102 | "# 调用solve方法直接求解\n", 103 | "x = np.linalg.solve(A, b)\n", 104 | "x" 105 | ] 106 | }, 107 | { 108 | "cell_type": "markdown", 109 | "metadata": {}, 110 | "source": [ 111 | "### 验证" 112 | ] 113 | }, 114 | { 115 | "cell_type": "code", 116 | "execution_count": 5, 117 | "metadata": {}, 118 | "outputs": [ 119 | { 120 | "data": { 121 | "text/plain": [ 122 | "8.0" 123 | ] 124 | }, 125 | "execution_count": 5, 126 | "metadata": {}, 127 | "output_type": "execute_result" 128 | } 129 | ], 130 | "source": [ 131 | "# 验证单个方程\n", 132 | "A[1].dot(x)" 133 | ] 134 | }, 135 | { 136 | "cell_type": "code", 137 | "execution_count": 6, 138 | "metadata": {}, 139 | "outputs": [ 140 | { 141 | "data": { 142 | "text/plain": [ 143 | "array([ True, True, True])" 144 | ] 145 | }, 146 | "execution_count": 6, 147 | "metadata": {}, 
148 | "output_type": "execute_result" 149 | } 150 | ], 151 | "source": [ 152 | "# Verify the whole system at once; exact == on floats happens to hold here, but np.allclose(A.dot(x), b) is the more robust check\n", 153 | "A.dot(x) == b" 154 | ] 155 | }, 156 | { 157 | "cell_type": "code", 158 | "execution_count": null, 159 | "metadata": {}, 160 | "outputs": [], 161 | "source": [] 162 | } 163 | ], 164 | "metadata": { 165 | "kernelspec": { 166 | "display_name": "Python 3", 167 | "language": "python", 168 | "name": "python3" 169 | }, 170 | "language_info": { 171 | "codemirror_mode": { 172 | "name": "ipython", 173 | "version": 3 174 | }, 175 | "file_extension": ".py", 176 | "mimetype": "text/x-python", 177 | "name": "python", 178 | "nbconvert_exporter": "python", 179 | "pygments_lexer": "ipython3", 180 | "version": "3.7.6" 181 | } 182 | }, 183 | "nbformat": 4, 184 | "nbformat_minor": 4 185 | } 186 | -------------------------------------------------------------------------------- /14. Numpy实现SVD矩阵分解.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "## SVD matrix decomposition with NumPy" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "### 1. Import the package" 15 | ] 16 | }, 17 | { 18 | "cell_type": "code", 19 | "execution_count": 1, 20 | "metadata": {}, 21 | "outputs": [], 22 | "source": [ 23 | "import numpy as np" 24 | ] 25 | }, 26 | { 27 | "cell_type": "markdown", 28 | "metadata": {}, 29 | "source": [ 30 | "### 2. 
Perform the decomposition" 31 | ] 32 | }, 33 | { 34 | "cell_type": "code", 35 | "execution_count": 2, 36 | "metadata": {}, 37 | "outputs": [], 38 | "source": [ 39 | "A = np.random.randint(1, 10, (8, 4))" 40 | ] 41 | }, 42 | { 43 | "cell_type": "code", 44 | "execution_count": 3, 45 | "metadata": {}, 46 | "outputs": [ 47 | { 48 | "data": { 49 | "text/plain": [ 50 | "array([[6, 5, 1, 5],\n", 51 | " [1, 7, 9, 7],\n", 52 | " [7, 2, 4, 2],\n", 53 | " [6, 4, 3, 5],\n", 54 | " [2, 8, 8, 6],\n", 55 | " [5, 2, 8, 6],\n", 56 | " [7, 8, 2, 3],\n", 57 | " [1, 3, 6, 9]])" 58 | ] 59 | }, 60 | "execution_count": 3, 61 | "metadata": {}, 62 | "output_type": "execute_result" 63 | } 64 | ], 65 | "source": [ 66 | "A" 67 | ] 68 | }, 69 | { 70 | "cell_type": "code", 71 | "execution_count": 4, 72 | "metadata": {}, 73 | "outputs": [], 74 | "source": [ 75 | "# Decompose A; the third return value is already the transposed factor (often written Vh), so A = U @ diag(S) @ V\n", 76 | "U, S, V = np.linalg.svd(A, full_matrices=False)" 77 | ] 78 | }, 79 | { 80 | "cell_type": "code", 81 | "execution_count": 5, 82 | "metadata": {}, 83 | "outputs": [ 84 | { 85 | "data": { 86 | "text/plain": [ 87 | "((8, 4), (4,), (4, 4))" 88 | ] 89 | }, 90 | "execution_count": 5, 91 | "metadata": {}, 92 | "output_type": "execute_result" 93 | } 94 | ], 95 | "source": [ 96 | "U.shape, S.shape, V.shape" 97 | ] 98 | }, 99 | { 100 | "cell_type": "code", 101 | "execution_count": 6, 102 | "metadata": {}, 103 | "outputs": [ 104 | { 105 | "data": { 106 | "text/plain": [ 107 | "array([[-0.28611227, -0.38768744, -0.07088588, -0.47757145],\n", 108 | " [-0.44374671, 0.40390585, -0.25458601, 0.20383531],\n", 109 | " [-0.24657791, -0.34884357, 0.43054458, 0.4062272 ],\n", 110 | " [-0.30673084, -0.27495123, 0.14797683, -0.2218886 ],\n", 111 | " [-0.43671345, 0.23339125, -0.39431663, 0.27599841],\n", 112 | " [-0.37257929, 0.10313032, 0.59362412, 0.23542645],\n", 113 | " [-0.33314069, -0.52514475, -0.41727103, 0.07285924],\n", 114 | " [-0.35472167, 0.38520663, 0.20225001, -0.61580222]])" 115 | ] 116 | }, 117 | "execution_count": 6, 118 | "metadata": {}, 119 
| "output_type": "execute_result" 120 | } 121 | ], 122 | "source": [ 123 | "U" 124 | ] 125 | }, 126 | { 127 | "cell_type": "code", 128 | "execution_count": 7, 129 | "metadata": {}, 130 | "outputs": [ 131 | { 132 | "data": { 133 | "text/plain": [ 134 | "array([28.44730142, 10.24874824, 6.39012419, 4.56952014])" 135 | ] 136 | }, 137 | "execution_count": 7, 138 | "metadata": {}, 139 | "output_type": "execute_result" 140 | } 141 | ], 142 | "source": [ 143 | "# 因为是对角矩阵,这里进行了简写\n", 144 | "S" 145 | ] 146 | }, 147 | { 148 | "cell_type": "code", 149 | "execution_count": 8, 150 | "metadata": {}, 151 | "outputs": [ 152 | { 153 | "data": { 154 | "text/plain": [ 155 | "array([[28.44730142, 0. , 0. , 0. ],\n", 156 | " [ 0. , 10.24874824, 0. , 0. ],\n", 157 | " [ 0. , 0. , 6.39012419, 0. ],\n", 158 | " [ 0. , 0. , 0. , 4.56952014]])" 159 | ] 160 | }, 161 | "execution_count": 8, 162 | "metadata": {}, 163 | "output_type": "execute_result" 164 | } 165 | ], 166 | "source": [ 167 | "np.diag(S)" 168 | ] 169 | }, 170 | { 171 | "cell_type": "code", 172 | "execution_count": 9, 173 | "metadata": {}, 174 | "outputs": [ 175 | { 176 | "data": { 177 | "text/plain": [ 178 | "array([[-0.39194862, -0.50004828, -0.54329548, -0.54877866],\n", 179 | " [-0.81202147, -0.18350883, 0.48594814, 0.26608277],\n", 180 | " [ 0.41980592, -0.84227439, 0.27814277, 0.19228481],\n", 181 | " [ 0.10373231, 0.08276523, 0.62555658, -0.76880979]])" 182 | ] 183 | }, 184 | "execution_count": 9, 185 | "metadata": {}, 186 | "output_type": "execute_result" 187 | } 188 | ], 189 | "source": [ 190 | "V" 191 | ] 192 | }, 193 | { 194 | "cell_type": "markdown", 195 | "metadata": {}, 196 | "source": [ 197 | "### 3. 
从分量还原矩阵" 198 | ] 199 | }, 200 | { 201 | "cell_type": "code", 202 | "execution_count": 10, 203 | "metadata": {}, 204 | "outputs": [ 205 | { 206 | "data": { 207 | "text/plain": [ 208 | "array([[6., 5., 1., 5.],\n", 209 | " [1., 7., 9., 7.],\n", 210 | " [7., 2., 4., 2.],\n", 211 | " [6., 4., 3., 5.],\n", 212 | " [2., 8., 8., 6.],\n", 213 | " [5., 2., 8., 6.],\n", 214 | " [7., 8., 2., 3.],\n", 215 | " [1., 3., 6., 9.]])" 216 | ] 217 | }, 218 | "execution_count": 10, 219 | "metadata": {}, 220 | "output_type": "execute_result" 221 | } 222 | ], 223 | "source": [ 224 | "U @ np.diag(S) @ V" 225 | ] 226 | }, 227 | { 228 | "cell_type": "code", 229 | "execution_count": null, 230 | "metadata": {}, 231 | "outputs": [], 232 | "source": [] 233 | } 234 | ], 235 | "metadata": { 236 | "kernelspec": { 237 | "display_name": "Python 3", 238 | "language": "python", 239 | "name": "python3" 240 | }, 241 | "language_info": { 242 | "codemirror_mode": { 243 | "name": "ipython", 244 | "version": 3 245 | }, 246 | "file_extension": ".py", 247 | "mimetype": "text/x-python", 248 | "name": "python", 249 | "nbconvert_exporter": "python", 250 | "pygments_lexer": "ipython3", 251 | "version": "3.7.6" 252 | } 253 | }, 254 | "nbformat": 4, 255 | "nbformat_minor": 4 256 | } 257 | -------------------------------------------------------------------------------- /17. 
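Numpy计算逆矩阵求解线性方程组.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "## Solving a linear system with the inverse matrix in NumPy" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "Consider this system of linear equations:\n", 15 | "* x + y + z = 6\n", 16 | "* 2y + 5z = -4\n", 17 | "* 2x + 5y - z = 27\n", 18 | "\n", 19 | "It can be written in matrix form:\n", 20 | "$$\\begin{bmatrix}1 & 1 & 1\\\\ 0 & 2 & 5\\\\ 2 & 5 & -1\\end{bmatrix}\\begin{bmatrix}x\\\\ y\\\\ z\\end{bmatrix}=\\begin{bmatrix}6\\\\ -4\\\\ 27\\end{bmatrix}$$\n", 21 | "\n", 22 | "As a formula this is Ax=b, where A is a matrix and x and b are column vectors\n", 23 | "\n", 24 | "***Definition of the inverse matrix:*** \n", 25 | "Let A be an n×n matrix over a number field. If there exists another n×n matrix B such that AB=BA=E, then B is called the inverse of A, and A is said to be invertible. Note: E is the identity matrix.\n", 26 | "\n", 27 | "***Solving a linear system with the inverse matrix:*** \n", 28 | "Left-multiply both sides by $A^{-1}$ to get $A^{-1}$Ax=$A^{-1}$b. Since $A^{-1}$A=E and multiplying by the identity matrix leaves a vector or matrix unchanged, x=$A^{-1}$b" 29 | ] 30 | }, 31 | { 32 | "cell_type": "code", 33 | "execution_count": 1, 34 | "metadata": {}, 35 | "outputs": [], 36 | "source": [ 37 | "import numpy as np" 38 | ] 39 | }, 40 | { 41 | "cell_type": "markdown", 42 | "metadata": {}, 43 | "source": [ 44 | "### 1. 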
求解逆矩阵" 45 | ] 46 | }, 47 | { 48 | "cell_type": "code", 49 | "execution_count": 2, 50 | "metadata": {}, 51 | "outputs": [ 52 | { 53 | "data": { 54 | "text/plain": [ 55 | "array([[ 1, 1, 1],\n", 56 | " [ 0, 2, 5],\n", 57 | " [ 2, 5, -1]])" 58 | ] 59 | }, 60 | "execution_count": 2, 61 | "metadata": {}, 62 | "output_type": "execute_result" 63 | } 64 | ], 65 | "source": [ 66 | "A = np.array([\n", 67 | " [1,1,1],\n", 68 | " [0,2,5],\n", 69 | " [2,5,-1]\n", 70 | "])\n", 71 | "A" 72 | ] 73 | }, 74 | { 75 | "cell_type": "code", 76 | "execution_count": 3, 77 | "metadata": {}, 78 | "outputs": [ 79 | { 80 | "data": { 81 | "text/plain": [ 82 | "array([[ 1.28571429, -0.28571429, -0.14285714],\n", 83 | " [-0.47619048, 0.14285714, 0.23809524],\n", 84 | " [ 0.19047619, 0.14285714, -0.0952381 ]])" 85 | ] 86 | }, 87 | "execution_count": 3, 88 | "metadata": {}, 89 | "output_type": "execute_result" 90 | } 91 | ], 92 | "source": [ 93 | "# B为A的逆矩阵\n", 94 | "B = np.linalg.inv(A)\n", 95 | "B" 96 | ] 97 | }, 98 | { 99 | "cell_type": "markdown", 100 | "metadata": {}, 101 | "source": [ 102 | "### 2. 
验证矩阵和逆矩阵的乘积是单位矩阵" 103 | ] 104 | }, 105 | { 106 | "cell_type": "code", 107 | "execution_count": 4, 108 | "metadata": {}, 109 | "outputs": [ 110 | { 111 | "data": { 112 | "text/plain": [ 113 | "array([[ 1.00000000e+00, -2.77555756e-17, 2.77555756e-17],\n", 114 | " [ 0.00000000e+00, 1.00000000e+00, 0.00000000e+00],\n", 115 | " [-2.22044605e-16, 5.55111512e-17, 1.00000000e+00]])" 116 | ] 117 | }, 118 | "execution_count": 4, 119 | "metadata": {}, 120 | "output_type": "execute_result" 121 | } 122 | ], 123 | "source": [ 124 | "A@B" 125 | ] 126 | }, 127 | { 128 | "cell_type": "code", 129 | "execution_count": 5, 130 | "metadata": {}, 131 | "outputs": [ 132 | { 133 | "data": { 134 | "text/plain": [ 135 | "array([[ 1.00000000e+00, -2.77555756e-17, 2.77555756e-17],\n", 136 | " [ 0.00000000e+00, 1.00000000e+00, 0.00000000e+00],\n", 137 | " [-2.22044605e-16, 5.55111512e-17, 1.00000000e+00]])" 138 | ] 139 | }, 140 | "execution_count": 5, 141 | "metadata": {}, 142 | "output_type": "execute_result" 143 | } 144 | ], 145 | "source": [ 146 | "np.matmul(A, B)" 147 | ] 148 | }, 149 | { 150 | "cell_type": "markdown", 151 | "metadata": {}, 152 | "source": [ 153 | "### 3. 
验证线性方程组" 154 | ] 155 | }, 156 | { 157 | "cell_type": "code", 158 | "execution_count": 6, 159 | "metadata": {}, 160 | "outputs": [], 161 | "source": [ 162 | "# 构造Ax=b中的b\n", 163 | "b = np.array([6, -4, 27])" 164 | ] 165 | }, 166 | { 167 | "cell_type": "code", 168 | "execution_count": 7, 169 | "metadata": {}, 170 | "outputs": [], 171 | "source": [ 172 | "# 使用逆矩阵求解x\n", 173 | "x = B@b" 174 | ] 175 | }, 176 | { 177 | "cell_type": "code", 178 | "execution_count": 8, 179 | "metadata": {}, 180 | "outputs": [ 181 | { 182 | "data": { 183 | "text/plain": [ 184 | "array([ 5., 3., -2.])" 185 | ] 186 | }, 187 | "execution_count": 8, 188 | "metadata": {}, 189 | "output_type": "execute_result" 190 | } 191 | ], 192 | "source": [ 193 | "x" 194 | ] 195 | }, 196 | { 197 | "cell_type": "code", 198 | "execution_count": 9, 199 | "metadata": {}, 200 | "outputs": [ 201 | { 202 | "data": { 203 | "text/plain": [ 204 | "array([ 6., -4., 27.])" 205 | ] 206 | }, 207 | "execution_count": 9, 208 | "metadata": {}, 209 | "output_type": "execute_result" 210 | } 211 | ], 212 | "source": [ 213 | "# 验证A@x = b\n", 214 | "A@x" 215 | ] 216 | }, 217 | { 218 | "cell_type": "code", 219 | "execution_count": null, 220 | "metadata": {}, 221 | "outputs": [], 222 | "source": [] 223 | } 224 | ], 225 | "metadata": { 226 | "kernelspec": { 227 | "display_name": "Python 3", 228 | "language": "python", 229 | "name": "python3" 230 | }, 231 | "language_info": { 232 | "codemirror_mode": { 233 | "name": "ipython", 234 | "version": 3 235 | }, 236 | "file_extension": ".py", 237 | "mimetype": "text/x-python", 238 | "name": "python", 239 | "nbconvert_exporter": "python", 240 | "pygments_lexer": "ipython3", 241 | "version": "3.7.6" 242 | } 243 | }, 244 | "nbformat": 4, 245 | "nbformat_minor": 4 246 | } 247 | -------------------------------------------------------------------------------- /18. 
Numpy怎样将数组读写到文件.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "## Reading and writing NumPy arrays to files\n", 8 | "\n", 9 | "This notebook covers NumPy's own built-in binary formats for writing arrays to a file and loading them back;\n", 10 | "\n", 11 | "For text or tabular data, pandas is normally used for loading and processing instead of numpy\n", 12 | "\n", 13 | "The relevant functions:\n", 14 | "1. np.load(filename): loads numpy arrays from a .npy or .npz file \n", 15 | "For a .npy file it returns a single array; for a .npz file it returns a dict-like object holding several arrays\n", 16 | "2. np.save(filename, arr): saves a single numpy array to a .npy file\n", 17 | "3. np.savez(filename, arra=arra, arrb=arrb): saves several numpy arrays to an uncompressed .npz file\n", 18 | "4. np.savez_compressed(filename, arra=arra, arrb=arrb): saves several numpy arrays to a compressed .npz file\n", 19 | "\n", 20 | "Both .npy and .npz are binary formats and look like gibberish in a plain-text editor" 21 | ] 22 | }, 23 | { 24 | "cell_type": "code", 25 | "execution_count": 1, 26 | "metadata": {}, 27 | "outputs": [], 28 | "source": [ 29 | "import numpy as np" 30 | ] 31 | }, 32 | { 33 | "cell_type": "markdown", 34 | "metadata": {}, 35 | "source": [ 36 | "### 1. 
使用np.save和np.load保存和加载单个数组" 37 | ] 38 | }, 39 | { 40 | "cell_type": "code", 41 | "execution_count": 2, 42 | "metadata": {}, 43 | "outputs": [ 44 | { 45 | "data": { 46 | "text/plain": [ 47 | "array([[ 0, 1, 2, 3],\n", 48 | " [ 4, 5, 6, 7],\n", 49 | " [ 8, 9, 10, 11]])" 50 | ] 51 | }, 52 | "execution_count": 2, 53 | "metadata": {}, 54 | "output_type": "execute_result" 55 | } 56 | ], 57 | "source": [ 58 | "a = np.arange(12).reshape(3,4)\n", 59 | "a" 60 | ] 61 | }, 62 | { 63 | "cell_type": "code", 64 | "execution_count": 3, 65 | "metadata": {}, 66 | "outputs": [], 67 | "source": [ 68 | "# 把单个数组保存到.npy文件\n", 69 | "np.save(\"arr_a.npy\", a)" 70 | ] 71 | }, 72 | { 73 | "cell_type": "code", 74 | "execution_count": 4, 75 | "metadata": {}, 76 | "outputs": [ 77 | { 78 | "data": { 79 | "text/plain": [ 80 | "array([[ 0, 1, 2, 3],\n", 81 | " [ 4, 5, 6, 7],\n", 82 | " [ 8, 9, 10, 11]])" 83 | ] 84 | }, 85 | "execution_count": 4, 86 | "metadata": {}, 87 | "output_type": "execute_result" 88 | } 89 | ], 90 | "source": [ 91 | "# 从.npy文件加载单个数组\n", 92 | "b = np.load(\"arr_a.npy\")\n", 93 | "b" 94 | ] 95 | }, 96 | { 97 | "cell_type": "markdown", 98 | "metadata": {}, 99 | "source": [ 100 | "### 2. 
Saving and loading several arrays with np.savez and np.load" 101 | ] 102 | }, 103 | { 104 | "cell_type": "code", 105 | "execution_count": 5, 106 | "metadata": {}, 107 | "outputs": [ 108 | { 109 | "data": { 110 | "text/plain": [ 111 | "array([[ 0, 1, 2, 3],\n", 112 | " [ 4, 5, 6, 7],\n", 113 | " [ 8, 9, 10, 11]])" 114 | ] 115 | }, 116 | "execution_count": 5, 117 | "metadata": {}, 118 | "output_type": "execute_result" 119 | } 120 | ], 121 | "source": [ 122 | "a" 123 | ] 124 | }, 125 | { 126 | "cell_type": "code", 127 | "execution_count": 6, 128 | "metadata": {}, 129 | "outputs": [ 130 | { 131 | "data": { 132 | "text/plain": [ 133 | "array([[0.06355473, 0.69576567, 0.17754786],\n", 134 | " [0.28343315, 0.29994149, 0.76737219]])" 135 | ] 136 | }, 137 | "execution_count": 6, 138 | "metadata": {}, 139 | "output_type": "execute_result" 140 | } 141 | ], 142 | "source": [ 143 | "b = np.random.rand(2, 3)\n", 144 | "b" 145 | ] 146 | }, 147 | { 148 | "cell_type": "code", 149 | "execution_count": 7, 150 | "metadata": {}, 151 | "outputs": [], 152 | "source": [ 153 | "# Save several arrays to one file\n", 154 | "np.savez(\"arr_ab.npz\", a=a, b=b)" 155 | ] 156 | }, 157 | { 158 | "cell_type": "code", 159 | "execution_count": 8, 160 | "metadata": {}, 161 | "outputs": [ 162 | { 163 | "data": { 164 | "text/plain": [ 165 | "" 166 | ] 167 | }, 168 | "execution_count": 8, 169 | "metadata": {}, 170 | "output_type": "execute_result" 171 | } 172 | ], 173 | "source": [ 174 | "# Load several arrays from the .npz file; the result is a dict-like NpzFile object keyed by array name\n", 175 | "data = np.load(\"arr_ab.npz\")\n", 176 | "data" 177 | ] 178 | }, 179 | { 180 | "cell_type": "code", 181 | "execution_count": 9, 182 | "metadata": {}, 183 | "outputs": [ 184 | { 185 | "data": { 186 | "text/plain": [ 187 | "array([[ 0, 1, 2, 3],\n", 188 | " [ 4, 5, 6, 7],\n", 189 | " [ 8, 9, 10, 11]])" 190 | ] 191 | }, 192 | "execution_count": 9, 193 | "metadata": {}, 194 | "output_type": "execute_result" 195 | } 196 | ], 197 | "source": [ 198 | "data[\"a\"]" 199 | ] 200 | }, 201 | { 202 | "cell_type": "code", 203 | "execution_count": 10, 
204 | "metadata": {}, 205 | "outputs": [ 206 | { 207 | "data": { 208 | "text/plain": [ 209 | "array([[0.06355473, 0.69576567, 0.17754786],\n", 210 | " [0.28343315, 0.29994149, 0.76737219]])" 211 | ] 212 | }, 213 | "execution_count": 10, 214 | "metadata": {}, 215 | "output_type": "execute_result" 216 | } 217 | ], 218 | "source": [ 219 | "data[\"b\"]" 220 | ] 221 | }, 222 | { 223 | "cell_type": "markdown", 224 | "metadata": {}, 225 | "source": [ 226 | "### 3. 使用np.savez_compressed和np.load保存和加载多个数组到压缩格式文件" 227 | ] 228 | }, 229 | { 230 | "cell_type": "code", 231 | "execution_count": 11, 232 | "metadata": {}, 233 | "outputs": [ 234 | { 235 | "data": { 236 | "text/plain": [ 237 | "array([[ 0, 1, 2, 3],\n", 238 | " [ 4, 5, 6, 7],\n", 239 | " [ 8, 9, 10, 11]])" 240 | ] 241 | }, 242 | "execution_count": 11, 243 | "metadata": {}, 244 | "output_type": "execute_result" 245 | } 246 | ], 247 | "source": [ 248 | "a" 249 | ] 250 | }, 251 | { 252 | "cell_type": "code", 253 | "execution_count": 12, 254 | "metadata": {}, 255 | "outputs": [ 256 | { 257 | "data": { 258 | "text/plain": [ 259 | "array([[0.06355473, 0.69576567, 0.17754786],\n", 260 | " [0.28343315, 0.29994149, 0.76737219]])" 261 | ] 262 | }, 263 | "execution_count": 12, 264 | "metadata": {}, 265 | "output_type": "execute_result" 266 | } 267 | ], 268 | "source": [ 269 | "b" 270 | ] 271 | }, 272 | { 273 | "cell_type": "code", 274 | "execution_count": 13, 275 | "metadata": {}, 276 | "outputs": [], 277 | "source": [ 278 | "# 保存多个数组到压缩文件\n", 279 | "np.savez_compressed(\"arr_ab_compressed.npz\", a=a, b=b)" 280 | ] 281 | }, 282 | { 283 | "cell_type": "code", 284 | "execution_count": 14, 285 | "metadata": {}, 286 | "outputs": [], 287 | "source": [ 288 | "# 同样用np.load加载.npz文件\n", 289 | "data = np.load(\"arr_ab_compressed.npz\")" 290 | ] 291 | }, 292 | { 293 | "cell_type": "code", 294 | "execution_count": 15, 295 | "metadata": {}, 296 | "outputs": [ 297 | { 298 | "data": { 299 | "text/plain": [ 300 | "array([[ 0, 1, 2, 3],\n", 301 
| " [ 4, 5, 6, 7],\n", 302 | " [ 8, 9, 10, 11]])" 303 | ] 304 | }, 305 | "execution_count": 15, 306 | "metadata": {}, 307 | "output_type": "execute_result" 308 | } 309 | ], 310 | "source": [ 311 | "data[\"a\"]" 312 | ] 313 | }, 314 | { 315 | "cell_type": "code", 316 | "execution_count": 16, 317 | "metadata": {}, 318 | "outputs": [ 319 | { 320 | "data": { 321 | "text/plain": [ 322 | "array([[0.06355473, 0.69576567, 0.17754786],\n", 323 | " [0.28343315, 0.29994149, 0.76737219]])" 324 | ] 325 | }, 326 | "execution_count": 16, 327 | "metadata": {}, 328 | "output_type": "execute_result" 329 | } 330 | ], 331 | "source": [ 332 | "data[\"b\"]" 333 | ] 334 | }, 335 | { 336 | "cell_type": "code", 337 | "execution_count": null, 338 | "metadata": {}, 339 | "outputs": [], 340 | "source": [] 341 | } 342 | ], 343 | "metadata": { 344 | "kernelspec": { 345 | "display_name": "Python 3", 346 | "language": "python", 347 | "name": "python3" 348 | }, 349 | "language_info": { 350 | "codemirror_mode": { 351 | "name": "ipython", 352 | "version": 3 353 | }, 354 | "file_extension": ".py", 355 | "mimetype": "text/x-python", 356 | "name": "python", 357 | "nbconvert_exporter": "python", 358 | "pygments_lexer": "ipython3", 359 | "version": "3.7.6" 360 | } 361 | }, 362 | "nbformat": 4, 363 | "nbformat_minor": 4 364 | } 365 | -------------------------------------------------------------------------------- /19. 
Numpy的结构化数组.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "## NumPy structured arrays" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "Normally every element of a NumPy array has the same data type, such as int or float; \n", 15 | "this is one reason NumPy is fast: the data is stored compactly in memory and is very quick to read; \n", 16 | "\n", 17 | "NumPy can also hold heterogeneous records, such as the data below: \n", 18 | "\n", 19 | " \n", 20 | " \n", 21 | " \n", 22 | " \n", 23 | " \n", 24 | " \n", 25 | " \n", 26 | " \n", 27 | " \n", 28 | " \n", 29 | " \n", 30 | " \n", 31 | " \n", 32 | " \n", 33 | " \n", 34 | " \n", 35 | " \n", 36 | " \n", 37 | " \n", 38 | " \n", 39 | "
| Name | Age | Weight |
| --- | --- | --- |
| 小王 | 30 | 80.5 |
| 小李 | 28 | 70.3 |
| 小天 | 29 | 78.6 |
\n", 40 | "\n", 41 | "This is the NumPy “structured array” feature introduced in this section; " 42 | ] 43 | }, 44 | { 45 | "cell_type": "code", 46 | "execution_count": 1, 47 | "metadata": {}, 48 | "outputs": [], 49 | "source": [ 50 | "import numpy as np" 51 | ] 52 | }, 53 | { 54 | "cell_type": "markdown", 55 | "metadata": {}, 56 | "source": [ 57 | "### 1. An ordinary NumPy array has a single dtype" 58 | ] 59 | }, 60 | { 61 | "cell_type": "code", 62 | "execution_count": 2, 63 | "metadata": {}, 64 | "outputs": [ 65 | { 66 | "data": { 67 | "text/plain": [ 68 | "(array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]), dtype('int32'))" 69 | ] 70 | }, 71 | "execution_count": 2, 72 | "metadata": {}, 73 | "output_type": "execute_result" 74 | } 75 | ], 76 | "source": [ 77 | "arr = np.arange(10)\n", 78 | "arr, arr.dtype" 79 | ] 80 | }, 81 | { 82 | "cell_type": "code", 83 | "execution_count": 3, 84 | "metadata": {}, 85 | "outputs": [ 86 | { 87 | "data": { 88 | "text/plain": [ 89 | "(array([[0.13813273, 0.69213455, 0.2869116 , 0.64065806],\n", 90 | " [0.5972653 , 0.42803843, 0.84914465, 0.0502318 ],\n", 91 | " [0.31351949, 0.87095862, 0.52867948, 0.83884873]]),\n", 92 | " dtype('float64'))" 93 | ] 94 | }, 95 | "execution_count": 3, 96 | "metadata": {}, 97 | "output_type": "execute_result" 98 | } 99 | ], 100 | "source": [ 101 | "arr = np.random.rand(3, 4)\n", 102 | "arr, arr.dtype" 103 | ] 104 | }, 105 | { 106 | "cell_type": "markdown", 107 | "metadata": {}, 108 | "source": [ 109 | "### 2. 
How to represent heterogeneous data with NumPy" 110 | ] 111 | }, 112 | { 113 | "cell_type": "code", 114 | "execution_count": 4, 115 | "metadata": {}, 116 | "outputs": [ 117 | { 118 | "data": { 119 | "text/plain": [ 120 | "dtype([('name', '= 29]" 314 | ] 315 | }, 316 | { 317 | "cell_type": "code", 318 | "execution_count": 13, 319 | "metadata": {}, 320 | "outputs": [ 321 | { 322 | "data": { 323 | "text/plain": [ 324 | "array([('xiaowang', 30, 80.5)],\n", 325 | " dtype=[('name', '= 29) & (my_arr[\"weight\"] > 80)]" 336 | ] 337 | }, 338 | { 339 | "cell_type": "markdown", 340 | "metadata": {}, 341 | "source": [ 342 | "#### Element-wise computation on a single column" 343 | ] 344 | }, 345 | { 346 | "cell_type": "code", 347 | "execution_count": 14, 348 | "metadata": {}, 349 | "outputs": [ 350 | { 351 | "data": { 352 | "text/plain": [ 353 | "array([30, 28, 29])" 354 | ] 355 | }, 356 | "execution_count": 14, 357 | "metadata": {}, 358 | "output_type": "execute_result" 359 | } 360 | ], 361 | "source": [ 362 | "my_arr[\"age\"]" 363 | ] 364 | }, 365 | { 366 | "cell_type": "code", 367 | "execution_count": 15, 368 | "metadata": {}, 369 | "outputs": [], 370 | "source": [ 371 | "my_arr[\"age\"] += 1" 372 | ] 373 | }, 374 | { 375 | "cell_type": "code", 376 | "execution_count": 16, 377 | "metadata": {}, 378 | "outputs": [ 379 | { 380 | "data": { 381 | "text/plain": [ 382 | "array([31, 29, 30])" 383 | ] 384 | }, 385 | "execution_count": 16, 386 | "metadata": {}, 387 | "output_type": "execute_result" 388 | } 389 | ], 390 | "source": [ 391 | "my_arr[\"age\"]" 392 | ] 393 | }, 394 | { 395 | "cell_type": "markdown", 396 | "metadata": {}, 397 | "source": [ 398 | "A final note: \n", 399 | "* For this kind of “heterogeneous data”, where each column has a different type, Pandas is the better tool;\n", 400 | "* NumPy structured arrays are still worth learning: you may never use them yourself, but you should be able to read other people's code that does" 401 | ] 402 | } 403 | ], 404 | "metadata": { 405 | "kernelspec": { 406 | "display_name": "Python 3", 407 | "language": "python", 408 | "name": "python3" 409 | }, 410 | "language_info": { 411 | "codemirror_mode": { 412 | "name": "ipython", 413 | "version": 3 414 | }, 415 | "file_extension": 
".py", 416 | "mimetype": "text/x-python", 417 | "name": "python", 418 | "nbconvert_exporter": "python", 419 | "pygments_lexer": "ipython3", 420 | "version": "3.7.6" 421 | } 422 | }, 423 | "nbformat": 4, 424 | "nbformat_minor": 4 425 | } 426 | -------------------------------------------------------------------------------- /20. Numpy与Pandas数据的相互转换.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "## Numpy与Pandas数据的相互转换\n", 8 | "\n", 9 | "Pandas是在Numpy基础上建立的非常流行的数据分析类库; \n", 10 | "提供了强大针对异构、表格类型数据的处理与分析能力。\n", 11 | "\n", 12 | "本节介绍Numpy和Pandas的转换方法: \n", 13 | "1. Numpy数组怎样输入给Pandas的Series、DataFrame;\n", 14 | "2. Pandas的Series、DataFrame怎样转换成Numpy的数组" 15 | ] 16 | }, 17 | { 18 | "cell_type": "code", 19 | "execution_count": 1, 20 | "metadata": {}, 21 | "outputs": [], 22 | "source": [ 23 | "import numpy as np\n", 24 | "import pandas as pd" 25 | ] 26 | }, 27 | { 28 | "cell_type": "markdown", 29 | "metadata": {}, 30 | "source": [ 31 | "### 怎样将Numpy数组转换成Pandas的数据结构" 32 | ] 33 | }, 34 | { 35 | "cell_type": "markdown", 36 | "metadata": {}, 37 | "source": [ 38 | "#### 怎样将Numpy的一维数组变成Pandas的Series" 39 | ] 40 | }, 41 | { 42 | "cell_type": "code", 43 | "execution_count": 2, 44 | "metadata": {}, 45 | "outputs": [ 46 | { 47 | "data": { 48 | "text/plain": [ 49 | "array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])" 50 | ] 51 | }, 52 | "execution_count": 2, 53 | "metadata": {}, 54 | "output_type": "execute_result" 55 | } 56 | ], 57 | "source": [ 58 | "arr = np.arange(10)\n", 59 | "arr" 60 | ] 61 | }, 62 | { 63 | "cell_type": "code", 64 | "execution_count": 3, 65 | "metadata": {}, 66 | "outputs": [ 67 | { 68 | "data": { 69 | "text/plain": [ 70 | "0 0\n", 71 | "1 1\n", 72 | "2 2\n", 73 | "3 3\n", 74 | "4 4\n", 75 | "5 5\n", 76 | "6 6\n", 77 | "7 7\n", 78 | "8 8\n", 79 | "9 9\n", 80 | "dtype: int32" 81 | ] 82 | }, 83 | "execution_count": 3, 84 | "metadata": {}, 85 | 
"output_type": "execute_result" 86 | } 87 | ], 88 | "source": [ 89 | "series = pd.Series(arr)\n", 90 | "series" 91 | ] 92 | }, 93 | { 94 | "cell_type": "markdown", 95 | "metadata": {}, 96 | "source": [ 97 | "#### 怎样将Numpy的二维数组转换成Pandas的DataFrame" 98 | ] 99 | }, 100 | { 101 | "cell_type": "code", 102 | "execution_count": 4, 103 | "metadata": {}, 104 | "outputs": [ 105 | { 106 | "data": { 107 | "text/plain": [ 108 | "array([[3, 9, 6, 3],\n", 109 | " [4, 1, 8, 1],\n", 110 | " [2, 4, 4, 7],\n", 111 | " [4, 8, 4, 7],\n", 112 | " [8, 3, 9, 8]])" 113 | ] 114 | }, 115 | "execution_count": 4, 116 | "metadata": {}, 117 | "output_type": "execute_result" 118 | } 119 | ], 120 | "source": [ 121 | "arr = np.random.randint(1, 10, size=(5, 4))\n", 122 | "arr" 123 | ] 124 | }, 125 | { 126 | "cell_type": "code", 127 | "execution_count": 5, 128 | "metadata": {}, 129 | "outputs": [ 130 | { 131 | "data": { 132 | "text/html": [ 133 | "
\n", 134 | "\n", 147 | "\n", 148 | " \n", 149 | " \n", 150 | " \n", 151 | " \n", 152 | " \n", 153 | " \n", 154 | " \n", 155 | " \n", 156 | " \n", 157 | " \n", 158 | " \n", 159 | " \n", 160 | " \n", 161 | " \n", 162 | " \n", 163 | " \n", 164 | " \n", 165 | " \n", 166 | " \n", 167 | " \n", 168 | " \n", 169 | " \n", 170 | " \n", 171 | " \n", 172 | " \n", 173 | " \n", 174 | " \n", 175 | " \n", 176 | " \n", 177 | " \n", 178 | " \n", 179 | " \n", 180 | " \n", 181 | " \n", 182 | " \n", 183 | " \n", 184 | " \n", 185 | " \n", 186 | " \n", 187 | " \n", 188 | " \n", 189 | " \n", 190 | " \n", 191 | " \n", 192 | " \n", 193 | " \n", 194 | "
cacbcccd
03963
14181
22447
34847
48398
\n", 195 | "
" 196 | ], 197 | "text/plain": [ 198 | " ca cb cc cd\n", 199 | "0 3 9 6 3\n", 200 | "1 4 1 8 1\n", 201 | "2 2 4 4 7\n", 202 | "3 4 8 4 7\n", 203 | "4 8 3 9 8" 204 | ] 205 | }, 206 | "execution_count": 5, 207 | "metadata": {}, 208 | "output_type": "execute_result" 209 | } 210 | ], 211 | "source": [ 212 | "df = pd.DataFrame(arr, columns = [\"ca\", \"cb\", \"cc\", \"cd\"])\n", 213 | "df" 214 | ] 215 | }, 216 | { 217 | "cell_type": "code", 218 | "execution_count": 6, 219 | "metadata": {}, 220 | "outputs": [ 221 | { 222 | "data": { 223 | "text/html": [ 224 | "
" 259 | ], 260 | "text/plain": [ 261 | " ca cb cc cd\n", 262 | "4 8 3 9 8" 263 | ] 264 | }, 265 | "execution_count": 6, 266 | "metadata": {}, 267 | "output_type": "execute_result" 268 | } 269 | ], 270 | "source": [ 271 | "df[df[\"ca\"] > 4]" 272 | ] 273 | }, 274 | { 275 | "cell_type": "markdown", 276 | "metadata": {}, 277 | "source": [ 278 | "### How to convert Pandas data structures into Numpy arrays\n", 279 | "\n", 280 | "* Method 1: the .values attribute (no parentheses)\n", 281 | "* Method 2: the .to_numpy() method\n", 282 | "\n", 283 | "Use case: \n", 284 | "For example, Scikit-Learn models expect Numpy arrays as input \n", 285 | "You can do extensive preprocessing of the raw data with Pandas, then convert the result into a Numpy array to use as input " 286 | ] 287 | }, 288 | { 289 | "cell_type": "markdown", 290 | "metadata": {}, 291 | "source": [ 292 | "#### Converting a Series into a Numpy array" 293 | ] 294 | }, 295 | { 296 | "cell_type": "code", 297 | "execution_count": 7, 298 | "metadata": {}, 299 | "outputs": [ 300 | { 301 | "data": { 302 | "text/plain": [ 303 | "0 0\n", 304 | "1 1\n", 305 | "2 2\n", 306 | "3 3\n", 307 | "4 4\n", 308 | "5 5\n", 309 | "6 6\n", 310 | "7 7\n", 311 | "8 8\n", 312 | "9 9\n", 313 | "dtype: int64" 314 | ] 315 | }, 316 | "execution_count": 7, 317 | "metadata": {}, 318 | "output_type": "execute_result" 319 | } 320 | ], 321 | "source": [ 322 | "series = pd.Series(range(10))\n", 323 | "series" 324 | ] 325 | }, 326 | { 327 | "cell_type": "code", 328 | "execution_count": 8, 329 | "metadata": {}, 330 | "outputs": [ 331 | { 332 | "data": { 333 | "text/plain": [ 334 | "array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int64)" 335 | ] 336 | }, 337 | "execution_count": 8, 338 | "metadata": {}, 339 | "output_type": "execute_result" 340 | } 341 | ], 342 | "source": [ 343 | "series.values" 344 | ] 345 | }, 346 | { 347 | "cell_type": "code", 348 | "execution_count": 9, 349 | "metadata": {}, 350 | "outputs": [ 351 | { 352 | "data": { 353 | "text/plain": [ 354 | "array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int64)" 355 | ] 356 | }, 357 | "execution_count": 9, 358 | "metadata": {}, 359 | "output_type": "execute_result" 360 | } 361 | ], 362 | "source": [ 363 | 
"series.to_numpy()" 364 | ] 365 | }, 366 | { 367 | "cell_type": "markdown", 368 | "metadata": {}, 369 | "source": [ 370 | "#### Converting a DataFrame into a Numpy array" 371 | ] 372 | }, 373 | { 374 | "cell_type": "code", 375 | "execution_count": 10, 376 | "metadata": {}, 377 | "outputs": [ 378 | { 379 | "data": { 380 | "text/html": [ 381 | "
" 432 | ], 433 | "text/plain": [ 434 | " feature_a feature_b feature_c\n", 435 | "0 11 12.23 45.23\n", 436 | "1 21 22.23 55.23\n", 437 | "2 31 32.23 65.23\n", 438 | "3 41 42.23 75.23" 439 | ] 440 | }, 441 | "execution_count": 10, 442 | "metadata": {}, 443 | "output_type": "execute_result" 444 | } 445 | ], 446 | "source": [ 447 | "df = pd.DataFrame(\n", 448 | " [\n", 449 | " [11, 12.23, 45.23],\n", 450 | " [21, 22.23, 55.23],\n", 451 | " [31, 32.23, 65.23],\n", 452 | " [41, 42.23, 75.23]\n", 453 | " ],\n", 454 | " columns = [\"feature_a\", \"feature_b\", \"feature_c\"]\n", 455 | ")\n", 456 | "df" 457 | ] 458 | }, 459 | { 460 | "cell_type": "code", 461 | "execution_count": 11, 462 | "metadata": {}, 463 | "outputs": [ 464 | { 465 | "data": { 466 | "text/plain": [ 467 | "array([[11. , 12.23, 45.23],\n", 468 | " [21. , 22.23, 55.23],\n", 469 | " [31. , 32.23, 65.23],\n", 470 | " [41. , 42.23, 75.23]])" 471 | ] 472 | }, 473 | "execution_count": 11, 474 | "metadata": {}, 475 | "output_type": "execute_result" 476 | } 477 | ], 478 | "source": [ 479 | "df.values" 480 | ] 481 | }, 482 | { 483 | "cell_type": "code", 484 | "execution_count": 12, 485 | "metadata": {}, 486 | "outputs": [ 487 | { 488 | "data": { 489 | "text/plain": [ 490 | "array([[11. , 12.23, 45.23],\n", 491 | " [21. , 22.23, 55.23],\n", 492 | " [31. , 32.23, 65.23],\n", 493 | " [41. 
, 42.23, 75.23]])" 494 | ] 495 | }, 496 | "execution_count": 12, 497 | "metadata": {}, 498 | "output_type": "execute_result" 499 | } 500 | ], 501 | "source": [ 502 | "df.to_numpy()" 503 | ] 504 | }, 505 | { 506 | "cell_type": "code", 507 | "execution_count": null, 508 | "metadata": {}, 509 | "outputs": [], 510 | "source": [] 511 | } 512 | ], 513 | "metadata": { 514 | "kernelspec": { 515 | "display_name": "Python 3", 516 | "language": "python", 517 | "name": "python3" 518 | }, 519 | "language_info": { 520 | "codemirror_mode": { 521 | "name": "ipython", 522 | "version": 3 523 | }, 524 | "file_extension": ".py", 525 | "mimetype": "text/x-python", 526 | "name": "python", 527 | "nbconvert_exporter": "python", 528 | "pygments_lexer": "ipython3", 529 | "version": "3.7.6" 530 | } 531 | }, 532 | "nbformat": 4, 533 | "nbformat_minor": 4 534 | } 535 | -------------------------------------------------------------------------------- /21. Numpy数据输入给Scikit-learn实现模型训练.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "## Feeding Numpy data into Sklearn for model training\n", 8 | "\n", 9 | "***The goal of this video is to demonstrate:*** \n", 10 | "How Numpy arrays interact with sklearn models, including splitting into training and test sets, feeding data to the model, evaluating the model, and making predictions\n", 11 | "\n", 12 | "For your own tasks, you can preprocess the data into this Numpy format in advance and then feed it to an sklearn model" 13 | ] 14 | }, 15 | { 16 | "cell_type": "code", 17 | "execution_count": 1, 18 | "metadata": {}, 19 | "outputs": [], 20 | "source": [ 21 | "import numpy as np\n", 22 | "# Use sklearn's built-in datasets, which all come as Numpy arrays\n", 23 | "# Our own data can be processed into the same format and then fed to the model\n", 24 | "from sklearn import datasets\n", 25 | "# train_test_split splits the data into training and test sets\n", 26 | "from sklearn.model_selection import train_test_split\n", 27 | "# LinearRegression trains a linear regression model\n", 28 | "from sklearn.linear_model import LinearRegression" 29 | ] 30 | }, 31 | { 32 | "cell_type": "markdown", 33 | "metadata": {}, 34 | "source": [ 35 | "### 1. 
Load the Boston housing dataset" 36 | ] 37 | }, 38 | { 39 | "cell_type": "code", 40 | "execution_count": 2, 41 | "metadata": {}, 42 | "outputs": [], 43 | "source": [ 44 | "# Load the dataset into the feature matrix data and the target vector target (load_boston was removed in scikit-learn 1.2, so this cell needs an older version)\n", 45 | "data, target = datasets.load_boston(return_X_y=True)" 46 | ] 47 | }, 48 | { 49 | "cell_type": "code", 50 | "execution_count": 3, 51 | "metadata": {}, 52 | "outputs": [ 53 | { 54 | "data": { 55 | "text/plain": [ 56 | "(numpy.ndarray, numpy.ndarray)" 57 | ] 58 | }, 59 | "execution_count": 3, 60 | "metadata": {}, 61 | "output_type": "execute_result" 62 | } 63 | ], 64 | "source": [ 65 | "type(data), type(target)" 66 | ] 67 | }, 68 | { 69 | "cell_type": "code", 70 | "execution_count": 4, 71 | "metadata": {}, 72 | "outputs": [ 73 | { 74 | "data": { 75 | "text/plain": [ 76 | "((506, 13), (506,))" 77 | ] 78 | }, 79 | "execution_count": 4, 80 | "metadata": {}, 81 | "output_type": "execute_result" 82 | } 83 | ], 84 | "source": [ 85 | "data.shape, target.shape" 86 | ] 87 | }, 88 | { 89 | "cell_type": "code", 90 | "execution_count": 5, 91 | "metadata": {}, 92 | "outputs": [ 93 | { 94 | "data": { 95 | "text/plain": [ 96 | "array([[6.3200e-03, 1.8000e+01, 2.3100e+00, 0.0000e+00, 5.3800e-01,\n", 97 | " 6.5750e+00, 6.5200e+01, 4.0900e+00, 1.0000e+00, 2.9600e+02,\n", 98 | " 1.5300e+01, 3.9690e+02, 4.9800e+00],\n", 99 | " [2.7310e-02, 0.0000e+00, 7.0700e+00, 0.0000e+00, 4.6900e-01,\n", 100 | " 6.4210e+00, 7.8900e+01, 4.9671e+00, 2.0000e+00, 2.4200e+02,\n", 101 | " 1.7800e+01, 3.9690e+02, 9.1400e+00],\n", 102 | " [2.7290e-02, 0.0000e+00, 7.0700e+00, 0.0000e+00, 4.6900e-01,\n", 103 | " 7.1850e+00, 6.1100e+01, 4.9671e+00, 2.0000e+00, 2.4200e+02,\n", 104 | " 1.7800e+01, 3.9283e+02, 4.0300e+00]])" 105 | ] 106 | }, 107 | "execution_count": 5, 108 | "metadata": {}, 109 | "output_type": "execute_result" 110 | } 111 | ], 112 | "source": [ 113 | "# Look at the features of the first three houses\n", 114 | "data[:3]" 115 | ] 116 | }, 117 | { 118 | "cell_type": "code", 119 | "execution_count": 6, 120 | "metadata": {}, 121 | "outputs": [ 
122 | { 123 | "data": { 124 | "text/plain": [ 125 | "array([24. , 21.6, 34.7])" 126 | ] 127 | }, 128 | "execution_count": 6, 129 | "metadata": {}, 130 | "output_type": "execute_result" 131 | } 132 | ], 133 | "source": [ 134 | "# Look at the first three house prices\n", 135 | "target[:3]" 136 | ] 137 | }, 138 | { 139 | "cell_type": "markdown", 140 | "metadata": {}, 141 | "source": [ 142 | "### 2. Split into training and test sets" 143 | ] 144 | }, 145 | { 146 | "cell_type": "code", 147 | "execution_count": 7, 148 | "metadata": {}, 149 | "outputs": [], 150 | "source": [ 151 | "# Split into training and test sets (by default 25% of the rows go to the test set)\n", 152 | "X_train, X_test, y_train, y_test = train_test_split(data, target)" 153 | ] 154 | }, 155 | { 156 | "cell_type": "code", 157 | "execution_count": 8, 158 | "metadata": {}, 159 | "outputs": [ 160 | { 161 | "data": { 162 | "text/plain": [ 163 | "((379, 13), (379,))" 164 | ] 165 | }, 166 | "execution_count": 8, 167 | "metadata": {}, 168 | "output_type": "execute_result" 169 | } 170 | ], 171 | "source": [ 172 | "# Shapes of the training set\n", 173 | "X_train.shape, y_train.shape" 174 | ] 175 | }, 176 | { 177 | "cell_type": "code", 178 | "execution_count": 9, 179 | "metadata": {}, 180 | "outputs": [ 181 | { 182 | "data": { 183 | "text/plain": [ 184 | "((127, 13), (127,))" 185 | ] 186 | }, 187 | "execution_count": 9, 188 | "metadata": {}, 189 | "output_type": "execute_result" 190 | } 191 | ], 192 | "source": [ 193 | "# Shapes of the test set\n", 194 | "X_test.shape, y_test.shape" 195 | ] 196 | }, 197 | { 198 | "cell_type": "markdown", 199 | "metadata": {}, 200 | "source": [ 201 | "### 3. 
Train a linear regression model" 202 | ] 203 | }, 204 | { 205 | "cell_type": "code", 206 | "execution_count": 10, 207 | "metadata": {}, 208 | "outputs": [], 209 | "source": [ 210 | "# Create the linear regression object; the default parameters are fine\n", 211 | "clf = LinearRegression()" 212 | ] 213 | }, 214 | { 215 | "cell_type": "code", 216 | "execution_count": 11, 217 | "metadata": {}, 218 | "outputs": [ 219 | { 220 | "data": { 221 | "text/plain": [ 222 | "LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)" 223 | ] 224 | }, 225 | "execution_count": 11, 226 | "metadata": {}, 227 | "output_type": "execute_result" 228 | } 229 | ], 230 | "source": [ 231 | "# Run the training\n", 232 | "clf.fit(X_train, y_train)" 233 | ] 234 | }, 235 | { 236 | "cell_type": "code", 237 | "execution_count": 12, 238 | "metadata": {}, 239 | "outputs": [ 240 | { 241 | "data": { 242 | "text/plain": [ 243 | "0.7290997955432121" 244 | ] 245 | }, 246 | "execution_count": 12, 247 | "metadata": {}, 248 | "output_type": "execute_result" 249 | } 250 | ], 251 | "source": [ 252 | "# Score on the training set (R^2, the coefficient of determination)\n", 253 | "clf.score(X_train, y_train)" 254 | ] 255 | }, 256 | { 257 | "cell_type": "markdown", 258 | "metadata": {}, 259 | "source": [ 260 | "### 4. 
Evaluate and use the model" 261 | ] 262 | }, 263 | { 264 | "cell_type": "code", 265 | "execution_count": 13, 266 | "metadata": {}, 267 | "outputs": [ 268 | { 269 | "data": { 270 | "text/plain": [ 271 | "0.7658281007291711" 272 | ] 273 | }, 274 | "execution_count": 13, 275 | "metadata": {}, 276 | "output_type": "execute_result" 277 | } 278 | ], 279 | "source": [ 280 | "# Evaluate by scoring on the test set\n", 281 | "clf.score(X_test, y_test)" 282 | ] 283 | }, 284 | { 285 | "cell_type": "code", 286 | "execution_count": 14, 287 | "metadata": {}, 288 | "outputs": [ 289 | { 290 | "data": { 291 | "text/plain": [ 292 | "array([36.1889043 , 17.05681981, 26.1238293 ])" 293 | ] 294 | }, 295 | "execution_count": 14, 296 | "metadata": {}, 297 | "output_type": "execute_result" 298 | } 299 | ], 300 | "source": [ 301 | "# Predict house prices for just the first three test rows\n", 302 | "clf.predict(X_test[:3])" 303 | ] 304 | }, 305 | { 306 | "cell_type": "code", 307 | "execution_count": 15, 308 | "metadata": {}, 309 | "outputs": [ 310 | { 311 | "data": { 312 | "text/plain": [ 313 | "array([50. 
, 23.1, 22.8])" 314 | ] 315 | }, 316 | "execution_count": 15, 317 | "metadata": {}, 318 | "output_type": "execute_result" 319 | } 320 | ], 321 | "source": [ 322 | "# Look at the actual house prices\n", 323 | "y_test[:3]" 324 | ] 325 | }, 326 | { 327 | "cell_type": "code", 328 | "execution_count": null, 329 | "metadata": {}, 330 | "outputs": [], 331 | "source": [] 332 | } 333 | ], 334 | "metadata": { 335 | "kernelspec": { 336 | "display_name": "Python 3", 337 | "language": "python", 338 | "name": "python3" 339 | }, 340 | "language_info": { 341 | "codemirror_mode": { 342 | "name": "ipython", 343 | "version": 3 344 | }, 345 | "file_extension": ".py", 346 | "mimetype": "text/x-python", 347 | "name": "python", 348 | "nbconvert_exporter": "python", 349 | "pygments_lexer": "ipython3", 350 | "version": "3.7.6" 351 | } 352 | }, 353 | "nbformat": 4, 354 | "nbformat_minor": 4 355 | } 356 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # ant-learn-numpy 2 | Code examples for Numpy, the Python scientific computing library 3 | 4 | You are also welcome to follow my WeChat official account, where I share many videos on learning Python 5 | Topics: Python basics, web scraping, data analysis, big data processing, machine learning, recommender systems, and more 6 | 7 | Official account name: 蚂蚁学Python 8 | 9 | -------------------------------------------------------------------------------- /Untitled.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [], 3 | "metadata": {}, 4 | "nbformat": 4, 5 | "nbformat_minor": 4 6 | } 7 | -------------------------------------------------------------------------------- /Untitled1.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [], 3 | "metadata": {}, 4 | "nbformat": 4, 5 | "nbformat_minor": 4 6 | } 7 | -------------------------------------------------------------------------------- /arr_a.npy: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/peiss/ant-learn-numpy/dca4191fcf9c762902632c59cb49c73b12332397/arr_a.npy -------------------------------------------------------------------------------- /arr_ab.npz: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/peiss/ant-learn-numpy/dca4191fcf9c762902632c59cb49c73b12332397/arr_ab.npz -------------------------------------------------------------------------------- /arr_ab_compressed.npz: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/peiss/ant-learn-numpy/dca4191fcf9c762902632c59cb49c73b12332397/arr_ab_compressed.npz -------------------------------------------------------------------------------- /other_files/numpy-array-inv.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/peiss/ant-learn-numpy/dca4191fcf9c762902632c59cb49c73b12332397/other_files/numpy-array-inv.jpg -------------------------------------------------------------------------------- /other_files/numpy-kfold-validation.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/peiss/ant-learn-numpy/dca4191fcf9c762902632c59cb49c73b12332397/other_files/numpy-kfold-validation.jpg -------------------------------------------------------------------------------- /other_files/numpy-kfold-validation.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/peiss/ant-learn-numpy/dca4191fcf9c762902632c59cb49c73b12332397/other_files/numpy-kfold-validation.png -------------------------------------------------------------------------------- /other_files/numpy_random_functions.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/peiss/ant-learn-numpy/dca4191fcf9c762902632c59cb49c73b12332397/other_files/numpy_random_functions.png --------------------------------------------------------------------------------
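Supplementary sketch for notebook 21 (split, fit, score, predict). The notebook relies on `datasets.load_boston`, which is no longer available in scikit-learn 1.2 and later, so the block below walks through the same four steps with NumPy alone. Everything in it is an assumption made for illustration: the data is synthetic (random features and a made-up linear target, shaped 506 x 13 only to mirror the notebook), `np.linalg.lstsq` stands in for `LinearRegression`, and the helper names `add_intercept`, `predict`, and `r2_score` are hypothetical, not scikit-learn's API.

```python
import numpy as np

# Synthetic stand-in for the housing data: 506 rows, 13 features.
# The values are random, NOT real house prices; only the shapes mirror the notebook.
rng = np.random.default_rng(0)
n_samples, n_features = 506, 13
data = rng.normal(size=(n_samples, n_features))
true_coef = rng.normal(size=n_features)
target = data @ true_coef + 3.0 + rng.normal(scale=0.1, size=n_samples)

# Step 2: shuffle the row indices and hold out 25% of the rows as the test set.
indices = rng.permutation(n_samples)
n_test = int(np.ceil(0.25 * n_samples))
test_idx, train_idx = indices[:n_test], indices[n_test:]
X_train, y_train = data[train_idx], target[train_idx]
X_test, y_test = data[test_idx], target[test_idx]


def add_intercept(X):
    """Append a column of ones so the last coefficient acts as the intercept."""
    return np.hstack([X, np.ones((X.shape[0], 1))])


# Step 3: ordinary least squares fit on the training rows.
coef, *_ = np.linalg.lstsq(add_intercept(X_train), y_train, rcond=None)


def predict(X):
    """Apply the fitted coefficients (including the intercept) to new rows."""
    return add_intercept(X) @ coef


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Step 4: score on both splits and predict the first three test rows.
train_r2 = r2_score(y_train, predict(X_train))
test_r2 = r2_score(y_test, predict(X_test))
print(X_train.shape, X_test.shape)
print(train_r2, test_r2)
print(predict(X_test[:3]), y_test[:3])
```

Because the synthetic target is almost exactly linear, both scores come out close to 1, unlike the roughly 0.73 and 0.77 the notebook reports on the real data. The point of the sketch is the data flow: plain NumPy arrays go in, are split by index, and the same arrays feed fitting, scoring, and prediction.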