├── LICENSE ├── NumPy ├── (1)numpy_array_basis1.ipynb ├── (10)assert_function.ipynb ├── (11)unit_test.ipynb ├── (2)numpy_array_basis2.ipynb ├── (3)common_functions1.ipynb ├── (4)common_functions2——stock_analysis.ipynb ├── (5)convenience_function.ipynb ├── (6)linear_algebra.ipynb ├── (7)universal_functions.ipynb ├── (8)random_module.ipynb ├── (9)sort_and_search.ipynb ├── .ipynb_checkpoints │ ├── (1)numpy_array_basis1-checkpoint.ipynb │ ├── (10)assert_function-checkpoint.ipynb │ ├── (11)unit_test-checkpoint.ipynb │ ├── (2)numpy_array_basis2-checkpoint.ipynb │ ├── (3)common_functions1-checkpoint.ipynb │ ├── (4)common_functions2——stock_analysis-checkpoint.ipynb │ ├── (5)convenience_function-checkpoint.ipynb │ ├── (6)linear_algebra-checkpoint.ipynb │ ├── (7)universal_functions-checkpoint.ipynb │ ├── (8)random_module-checkpoint.ipynb │ └── (9)sort_and_search-checkpoint.ipynb ├── AAPL.csv ├── BHP.csv ├── VALE.csv └── data.csv ├── Pandas ├── (1)pandas_introduction.ipynb ├── (2)dataframe_slice_selection.ipynb └── .ipynb_checkpoints │ ├── (1)pandas_introduction-checkpoint.ipynb │ └── (2)dataframe_slice_selection-checkpoint.ipynb ├── README.md ├── Scikit-learn ├── (1)getting_started_with_iris.ipynb ├── (2)choose_a_ml_model.ipynb ├── (3)linear_regression.ipynb ├── (4)cross_validation.ipynb ├── (5)grid_search.ipynb ├── (6)classification_metrics.ipynb ├── .ipynb_checkpoints │ ├── (1)getting_started_with_iris-checkpoint.ipynb │ ├── (2)choose_a_ml_model-checkpoint.ipynb │ ├── (3)linear_regression-checkpoint.ipynb │ ├── (4)cross_validation-checkpoint.ipynb │ ├── (5)grid_search-checkpoint.ipynb │ └── (6)classification_metrics-checkpoint.ipynb └── Image │ ├── Data3classes.png │ ├── Map1NN.png │ ├── Map5NN.png │ ├── confusion_matrix.png │ ├── cross_validation_diagram.png │ ├── grid_vs_random.jpeg │ └── iris_petal_sepal.jpg └── Visualization ├── (1)plot_base.ipynb ├── (2)interesting_plot.ipynb ├── (3)special_curves_plot.ipynb ├── .ipynb_checkpoints ├── (1)plot_base-checkpoint.ipynb ├── 
(2)interesting_plot-checkpoint.ipynb └── (3)special_curves_plot-checkpoint.ipynb └── baidu_stock_price.py /LICENSE: -------------------------------------------------------------------------------- 1 | The MIT License (MIT) 2 | 3 | Copyright (c) 2015 Jason Ding 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 
22 | 23 | -------------------------------------------------------------------------------- /NumPy/(1)numpy_array_basis1.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "##Contents\n", 8 | "This section covers the basic operations on NumPy arrays.\n", 9 | "\n", 10 | "Part 1 covers creating and indexing arrays, data types, the dtype class, and custom heterogeneous data types.\n", 11 | "\n", 12 | "Part 2 covers array indexing and slicing, mainly operations with the [] operator.\n", 13 | "\n", 14 | "Part 3 covers how to change an array's dimensions, using **the ravel, flatten, transpose, resize, and reshape functions.**" 15 | ] 16 | }, 17 | { 18 | "cell_type": "code", 19 | "execution_count": 1, 20 | "metadata": { 21 | "collapsed": false 22 | }, 23 | "outputs": [ 24 | { 25 | "name": "stdout", 26 | "output_type": "stream", 27 | "text": [ 28 | "Populating the interactive namespace from numpy and matplotlib\n" 29 | ] 30 | } 31 | ], 32 | "source": [ 33 | "%pylab inline" 34 | ] 35 | }, 36 | { 37 | "cell_type": "markdown", 38 | "metadata": {}, 39 | "source": [ 40 | "An ndarray is a multidimensional array object made up of **the actual data and the metadata describing that data**. Most array operations modify only the metadata and leave the underlying data unchanged." 41 | ] 42 | }, 43 | { 44 | "cell_type": "markdown", 45 | "metadata": {}, 46 | "source": [ 47 | "Create an array with the arange function" 48 | ] 49 | }, 50 | { 51 | "cell_type": "code", 52 | "execution_count": 2, 53 | "metadata": { 54 | "collapsed": false 55 | }, 56 | "outputs": [ 57 | { 58 | "data": { 59 | "text/plain": [ 60 | "dtype('int32')" 61 | ] 62 | }, 63 | "execution_count": 2, 64 | "metadata": {}, 65 | "output_type": "execute_result" 66 | } 67 | ], 68 | "source": [ 69 | "a = arange(5)\n", 70 | "a.dtype" 71 | ] 72 | }, 73 | { 74 | "cell_type": "code", 75 | "execution_count": 3, 76 | "metadata": { 77 | "collapsed": false 78 | }, 79 | "outputs": [ 80 | { 81 | "data": { 82 | "text/plain": [ 83 | "array([0, 1, 2, 3, 4])" 84 | ] 85 | }, 86 | "execution_count": 3, 87 | "metadata": {}, 88 | "output_type": "execute_result" 89 | } 90 | ], 91 | "source": [ 92 | "a" 93 | ] 94 | }, 95 | { 96 | "cell_type": "code", 97 | "execution_count": 4, 
98 | "metadata": { 99 | "collapsed": false 100 | }, 101 | "outputs": [ 102 | { 103 | "data": { 104 | "text/plain": [ 105 | "(5,)" 106 | ] 107 | }, 108 | "execution_count": 4, 109 | "metadata": {}, 110 | "output_type": "execute_result" 111 | } 112 | ], 113 | "source": [ 114 | "a.shape" 115 | ] 116 | }, 117 | { 118 | "cell_type": "markdown", 119 | "metadata": {}, 120 | "source": [ 121 | "The shape attribute of an array returns a tuple whose elements are the sizes of each dimension of the NumPy array." 122 | ] 123 | }, 124 | { 125 | "cell_type": "markdown", 126 | "metadata": { 127 | "collapsed": true 128 | }, 129 | "source": [ 130 | "##1. Creating multidimensional arrays\n", 131 | "The array function builds an array from a given object.\n", 132 | "The object should be array-like, such as a Python list or the result of NumPy's arange function." 133 | ] 134 | }, 135 | { 136 | "cell_type": "code", 137 | "execution_count": 5, 138 | "metadata": { 139 | "collapsed": false 140 | }, 141 | "outputs": [], 142 | "source": [ 143 | "m = array([arange(2), arange(2)])" 144 | ] 145 | }, 146 | { 147 | "cell_type": "code", 148 | "execution_count": 6, 149 | "metadata": { 150 | "collapsed": false 151 | }, 152 | "outputs": [ 153 | { 154 | "name": "stdout", 155 | "output_type": "stream", 156 | "text": [ 157 | "[[0 1]\n", 158 | " [0 1]]\n", 159 | "(2, 2)\n", 160 | "<type 'numpy.ndarray'>\n", 161 | "<type 'tuple'>\n" 162 | ] 163 | } 164 | ], 165 | "source": [ 166 | "print m\n", 167 | "print m.shape\n", 168 | "print type(m)\n", 169 | "print type(m.shape)" 170 | ] 171 | }, 172 | { 173 | "cell_type": "markdown", 174 | "metadata": {}, 175 | "source": [ 176 | "###Selecting elements" 177 | ] 178 | }, 179 | { 180 | "cell_type": "code", 181 | "execution_count": 7, 182 | "metadata": { 183 | "collapsed": false 184 | }, 185 | "outputs": [ 186 | { 187 | "name": "stdout", 188 | "output_type": "stream", 189 | "text": [ 190 | "1\n", 191 | "2\n" 192 | ] 193 | } 194 | ], 195 | "source": [ 196 | "a = array([[1,2],[3,4]])\n", 197 | "print a[0,0]\n", 198 | "print a[0,1]" 199 | ] 200 | }, 201 | { 202 | "cell_type": "markdown", 203 | "metadata": {}, 204 | "source": [ 205 | "###NumPy data types\n", 206 | "In addition to the integer, floating-point, and complex types that Python supports, NumPy adds many other data types.\n", 
207 | "\n", 208 | "Type\tRemarks\t Character code\n", 209 | "bool_\tcompatible: Python bool\t'?'\n", 210 | "bool8\t8 bits\t \n", 211 | "Integers:\n", 212 | "\n", 213 | "byte\tcompatible: C char\t'b'\n", 214 | "short\tcompatible: C short\t'h'\n", 215 | "intc\tcompatible: C int\t'i'\n", 216 | "int_\tcompatible: Python int\t'l'\n", 217 | "longlong\tcompatible: C long long\t'q'\n", 218 | "intp\tlarge enough to fit a pointer\t'p'\n", 219 | "int8\t8 bits\t \n", 220 | "int16\t16 bits\t \n", 221 | "int32\t32 bits\t \n", 222 | "int64\t64 bits\t \n", 223 | "Unsigned integers:\n", 224 | "\n", 225 | "ubyte\tcompatible: C unsigned char\t'B'\n", 226 | "ushort\tcompatible: C unsigned short\t'H'\n", 227 | "uintc\tcompatible: C unsigned int\t'I'\n", 228 | "uint\tcompatible: Python int\t'L'\n", 229 | "ulonglong\tcompatible: C long long\t'Q'\n", 230 | "uintp\tlarge enough to fit a pointer\t'P'\n", 231 | "uint8\t8 bits\t \n", 232 | "uint16\t16 bits\t \n", 233 | "uint32\t32 bits\t \n", 234 | "uint64\t64 bits\t \n", 235 | "Floating-point numbers:\n", 236 | "\n", 237 | "half\t \t'e'\n", 238 | "single\tcompatible: C float\t'f'\n", 239 | "double\tcompatible: C double\t \n", 240 | "float_\tcompatible: Python float\t'd'\n", 241 | "longfloat\tcompatible: C long float\t'g'\n", 242 | "float16\t16 bits\t \n", 243 | "float32\t32 bits\t \n", 244 | "float64\t64 bits\t \n", 245 | "float96\t96 bits, platform?\t \n", 246 | "float128\t128 bits, platform?\t \n", 247 | "Complex floating-point numbers:\n", 248 | "\n", 249 | "csingle\t \t'F'\n", 250 | "complex_\tcompatible: Python complex\t'D'\n", 251 | "clongfloat\t \t'G'\n", 252 | "complex64\ttwo 32-bit floats\t \n", 253 | "complex128\ttwo 64-bit floats\t \n", 254 | "complex192\ttwo 96-bit floats, platform?\t \n", 255 | "complex256\ttwo 128-bit floats, platform?\t \n", 256 | "Any Python object:\n", 257 | "\n", 258 | "object_\tany Python object\t'O'" 259 | ] 260 | }, 261 | { 262 | "cell_type": "markdown", 263 | "metadata": {}, 264 | "source": [ 265 | 
"**每一种数据类型均有对应的类型转换函数**" 266 | ] 267 | }, 268 | { 269 | "cell_type": "code", 270 | "execution_count": 8, 271 | "metadata": { 272 | "collapsed": false 273 | }, 274 | "outputs": [ 275 | { 276 | "name": "stdout", 277 | "output_type": "stream", 278 | "text": [ 279 | "42.0\n", 280 | "42\n", 281 | "True\n", 282 | "1.0\n" 283 | ] 284 | } 285 | ], 286 | "source": [ 287 | "print float64(42)\n", 288 | "print int8(42.0)\n", 289 | "print bool(42)\n", 290 | "print float(True)" 291 | ] 292 | }, 293 | { 294 | "cell_type": "code", 295 | "execution_count": 9, 296 | "metadata": { 297 | "collapsed": false 298 | }, 299 | "outputs": [ 300 | { 301 | "data": { 302 | "text/plain": [ 303 | "array([0, 1, 2, 3, 4, 5, 6, 7], dtype=uint16)" 304 | ] 305 | }, 306 | "execution_count": 9, 307 | "metadata": {}, 308 | "output_type": "execute_result" 309 | } 310 | ], 311 | "source": [ 312 | "arange(8, dtype=uint16)" 313 | ] 314 | }, 315 | { 316 | "cell_type": "markdown", 317 | "metadata": {}, 318 | "source": [ 319 | "**复数不能转换成整数和浮点数**" 320 | ] 321 | }, 322 | { 323 | "cell_type": "markdown", 324 | "metadata": {}, 325 | "source": [ 326 | "Numpy数组中每一个元素均为相同的数据类型,现在给出单个元素所占字节" 327 | ] 328 | }, 329 | { 330 | "cell_type": "code", 331 | "execution_count": 10, 332 | "metadata": { 333 | "collapsed": false 334 | }, 335 | "outputs": [ 336 | { 337 | "data": { 338 | "text/plain": [ 339 | "dtype('int32')" 340 | ] 341 | }, 342 | "execution_count": 10, 343 | "metadata": {}, 344 | "output_type": "execute_result" 345 | } 346 | ], 347 | "source": [ 348 | "a.dtype" 349 | ] 350 | }, 351 | { 352 | "cell_type": "code", 353 | "execution_count": 11, 354 | "metadata": { 355 | "collapsed": false 356 | }, 357 | "outputs": [ 358 | { 359 | "data": { 360 | "text/plain": [ 361 | "4" 362 | ] 363 | }, 364 | "execution_count": 11, 365 | "metadata": {}, 366 | "output_type": "execute_result" 367 | } 368 | ], 369 | "source": [ 370 | "a.dtype.itemsize" 371 | ] 372 | }, 373 | { 374 | "cell_type": "markdown", 375 | "metadata": {}, 376 | 
"source": [ 377 | "**dtype类的属性**" 378 | ] 379 | }, 380 | { 381 | "cell_type": "code", 382 | "execution_count": 12, 383 | "metadata": { 384 | "collapsed": false 385 | }, 386 | "outputs": [ 387 | { 388 | "name": "stdout", 389 | "output_type": "stream", 390 | "text": [ 391 | "d\n", 392 | "\n", 393 | "表示;与之相反,小端序是将最低位字节存储在最低的内存地址处,用<表示。" 411 | ] 412 | }, 413 | { 414 | "cell_type": "markdown", 415 | "metadata": {}, 416 | "source": [ 417 | "**创建自定义数据类型**" 418 | ] 419 | }, 420 | { 421 | "cell_type": "markdown", 422 | "metadata": {}, 423 | "source": [ 424 | "自定义数据类型是一种异构数据类型,可以当做用来记录电子表格或数据库中一行数据的结构。\n", 425 | "\n", 426 | "下面我们创建一种自定义的异构数据类型,该数据类型包括一个用字符串记录的名字、一个用整数记录的数字以及一个用浮点数记录的价格。" 427 | ] 428 | }, 429 | { 430 | "cell_type": "code", 431 | "execution_count": 13, 432 | "metadata": { 433 | "collapsed": true 434 | }, 435 | "outputs": [], 436 | "source": [ 437 | "t = dtype([('name', str_, 40), ('numitems', int32), ('price', float32)])" 438 | ] 439 | }, 440 | { 441 | "cell_type": "code", 442 | "execution_count": 14, 443 | "metadata": { 444 | "collapsed": false 445 | }, 446 | "outputs": [ 447 | { 448 | "data": { 449 | "text/plain": [ 450 | "dtype([('name', 'S40'), ('numitems', '" 686 | ] 687 | }, 688 | "execution_count": 26, 689 | "metadata": {}, 690 | "output_type": "execute_result" 691 | } 692 | ], 693 | "source": [ 694 | "f = b.flat\n", 695 | "f" 696 | ] 697 | }, 698 | { 699 | "cell_type": "code", 700 | "execution_count": 27, 701 | "metadata": { 702 | "collapsed": false 703 | }, 704 | "outputs": [ 705 | { 706 | "name": "stdout", 707 | "output_type": "stream", 708 | "text": [ 709 | "0\n", 710 | "1\n", 711 | "2\n", 712 | "3\n" 713 | ] 714 | } 715 | ], 716 | "source": [ 717 | "for item in f:\n", 718 | " print item" 719 | ] 720 | }, 721 | { 722 | "cell_type": "code", 723 | "execution_count": 28, 724 | "metadata": { 725 | "collapsed": false 726 | }, 727 | "outputs": [ 728 | { 729 | "data": { 730 | "text/plain": [ 731 | "2" 732 | ] 733 | }, 734 | "execution_count": 28, 735 | 
"metadata": {}, 736 | "output_type": "execute_result" 737 | } 738 | ], 739 | "source": [ 740 | "#使用flatiter对象直接获取一个数组的元素\n", 741 | "b.flat[2]" 742 | ] 743 | }, 744 | { 745 | "cell_type": "code", 746 | "execution_count": 29, 747 | "metadata": { 748 | "collapsed": false 749 | }, 750 | "outputs": [ 751 | { 752 | "data": { 753 | "text/plain": [ 754 | "array([1, 3])" 755 | ] 756 | }, 757 | "execution_count": 29, 758 | "metadata": {}, 759 | "output_type": "execute_result" 760 | } 761 | ], 762 | "source": [ 763 | "b.flat[[1,3]]" 764 | ] 765 | }, 766 | { 767 | "cell_type": "code", 768 | "execution_count": 30, 769 | "metadata": { 770 | "collapsed": true 771 | }, 772 | "outputs": [], 773 | "source": [ 774 | "#对flat属性赋值将导致整个数组的元素被覆盖\n", 775 | "b.flat = 7" 776 | ] 777 | }, 778 | { 779 | "cell_type": "code", 780 | "execution_count": 31, 781 | "metadata": { 782 | "collapsed": false 783 | }, 784 | "outputs": [ 785 | { 786 | "data": { 787 | "text/plain": [ 788 | "array([[7, 7],\n", 789 | " [7, 7]])" 790 | ] 791 | }, 792 | "execution_count": 31, 793 | "metadata": {}, 794 | "output_type": "execute_result" 795 | } 796 | ], 797 | "source": [ 798 | "b" 799 | ] 800 | }, 801 | { 802 | "cell_type": "code", 803 | "execution_count": 34, 804 | "metadata": { 805 | "collapsed": false 806 | }, 807 | "outputs": [ 808 | { 809 | "data": { 810 | "text/plain": [ 811 | "array([[7, 1],\n", 812 | " [7, 1]])" 813 | ] 814 | }, 815 | "execution_count": 34, 816 | "metadata": {}, 817 | "output_type": "execute_result" 818 | } 819 | ], 820 | "source": [ 821 | "b.flat[[1,3]] = 1\n", 822 | "b" 823 | ] 824 | }, 825 | { 826 | "cell_type": "markdown", 827 | "metadata": {}, 828 | "source": [ 829 | "##4. 
Array conversion\n", 830 | "**The tolist function converts a NumPy array to a Python list**" 831 | ] 832 | }, 833 | { 834 | "cell_type": "code", 835 | "execution_count": 35, 836 | "metadata": { 837 | "collapsed": false 838 | }, 839 | "outputs": [ 840 | { 841 | "data": { 842 | "text/plain": [ 843 | "array([[7, 1],\n", 844 | " [7, 1]])" 845 | ] 846 | }, 847 | "execution_count": 35, 848 | "metadata": {}, 849 | "output_type": "execute_result" 850 | } 851 | ], 852 | "source": [ 853 | "b" 854 | ] 855 | }, 856 | { 857 | "cell_type": "code", 858 | "execution_count": 36, 859 | "metadata": { 860 | "collapsed": false 861 | }, 862 | "outputs": [ 863 | { 864 | "data": { 865 | "text/plain": [ 866 | "[[7, 1], [7, 1]]" 867 | ] 868 | }, 869 | "execution_count": 36, 870 | "metadata": {}, 871 | "output_type": "execute_result" 872 | } 873 | ], 874 | "source": [ 875 | "b.tolist()" 876 | ] 877 | }, 878 | { 879 | "cell_type": "markdown", 880 | "metadata": {}, 881 | "source": [ 882 | "**The astype function converts an array to a specified data type**" 883 | ] 884 | }, 885 | { 886 | "cell_type": "code", 887 | "execution_count": 37, 888 | "metadata": { 889 | "collapsed": false 890 | }, 891 | "outputs": [ 892 | { 893 | "data": { 894 | "text/plain": [ 895 | "array([[ 7., 1.],\n", 896 | " [ 7., 1.]])" 897 | ] 898 | }, 899 | "execution_count": 37, 900 | "metadata": {}, 901 | "output_type": "execute_result" 902 | } 903 | ], 904 | "source": [ 905 | "b.astype(float)" 906 | ] 907 | }, 908 | { 909 | "cell_type": "code", 910 | "execution_count": null, 911 | "metadata": { 912 | "collapsed": true 913 | }, 914 | "outputs": [], 915 | "source": [] 916 | } 917 | ], 918 | "metadata": { 919 | "kernelspec": { 920 | "display_name": "Python 2", 921 | "language": "python", 922 | "name": "python2" 923 | }, 924 | "language_info": { 925 | "codemirror_mode": { 926 | "name": "ipython", 927 | "version": 2 928 | }, 929 | "file_extension": ".py", 930 | "mimetype": "text/x-python", 931 | "name": "python", 932 | "nbconvert_exporter": "python", 933 | "pygments_lexer": "ipython2", 934 | "version": "2.7.5" 
935 | } 936 | }, 937 | "nbformat": 4, 938 | "nbformat_minor": 0 939 | } 940 | -------------------------------------------------------------------------------- /NumPy/(3)common_functions1.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "##Contents\n", 8 | "This section introduces commonly used NumPy functions.\n", 9 | "\n", 10 | "1. Reading CSV\n", 11 | "the loadtxt function\n", 12 | "\n", 13 | "2. Computing averages\n", 14 | "the average and mean functions\n", 15 | "\n", 16 | "3. Finding the maximum and minimum\n", 17 | "the max and min functions\n", 18 | "\n", 19 | "4. Computing the median\n", 20 | "the median and msort functions\n", 21 | "\n", 22 | "5. Computing the variance\n", 23 | "the var function" 24 | ] 25 | }, 26 | { 27 | "cell_type": "markdown", 28 | "metadata": {}, 29 | "source": [ 30 | "##1. Reading and writing CSV files\n", 31 | "CSV (comma-separated values) is a common file format. Database dump files are often in CSV format, with each field in the file corresponding to a column in the database table." 32 | ] 33 | }, 34 | { 35 | "cell_type": "markdown", 36 | "metadata": {}, 37 | "source": [ 38 | "**NumPy's loadtxt function reads CSV files conveniently: it splits the fields automatically and loads the data into NumPy arrays.**" 39 | ] 40 | }, 41 | { 42 | "cell_type": "code", 43 | "execution_count": 1, 44 | "metadata": { 45 | "collapsed": true 46 | }, 47 | "outputs": [], 48 | "source": [ 49 | "import numpy as np" 50 | ] 51 | }, 52 | { 53 | "cell_type": "code", 54 | "execution_count": 2, 55 | "metadata": { 56 | "collapsed": true 57 | }, 58 | "outputs": [], 59 | "source": [ 60 | "c, v = np.loadtxt('data.csv', delimiter=',', usecols=(6,7), unpack=True)" 61 | ] 62 | }, 63 | { 64 | "cell_type": "markdown", 65 | "metadata": {}, 66 | "source": [ 67 | "The data.csv file contains historical stock prices for Apple. The first column is the ticker symbol, the second is the date in dd-mm-yyyy format, and the third is empty; the following columns are, in order, **the opening price (4), high (5), low (6), and closing price (7)**, and the last column is the day's trading volume (8)." 68 | ] 69 | }, 70 | { 71 | "cell_type": "markdown", 72 | "metadata": {}, 73 | "source": [ 74 | "In the loadtxt call, the *usecols parameter* is a tuple that selects fields 7 and 8 (zero-based indices 6 and 7), that is, the closing price and the trading volume.\n", 75 | "\n", 76 | "Setting the *unpack parameter* to True stores each column separately, so the closing-price and volume arrays are assigned to the variables c and v.\n", 77 | "\n", 78 | "**Use usecols to specify the columns of interest, and set unpack to True so that each column is stored separately.**" 79 | ] 80 | }, 81 | { 82 | 
"cell_type": "code", 83 | "execution_count": 3, 84 | "metadata": { 85 | "collapsed": false 86 | }, 87 | "outputs": [ 88 | { 89 | "data": { 90 | "text/plain": [ 91 | "array([ 336.1 , 339.32, 345.03, 344.32, 343.44, 346.5 , 351.88,\n", 92 | " 355.2 , 358.16, 354.54, 356.85, 359.18, 359.9 , 363.13,\n", 93 | " 358.3 , 350.56, 338.61, 342.62, 342.88, 348.16, 353.21,\n", 94 | " 349.31, 352.12, 359.56, 360. , 355.36, 355.76, 352.47,\n", 95 | " 346.67, 351.99])" 96 | ] 97 | }, 98 | "execution_count": 3, 99 | "metadata": {}, 100 | "output_type": "execute_result" 101 | } 102 | ], 103 | "source": [ 104 | "c" 105 | ] 106 | }, 107 | { 108 | "cell_type": "code", 109 | "execution_count": 4, 110 | "metadata": { 111 | "collapsed": false 112 | }, 113 | "outputs": [ 114 | { 115 | "data": { 116 | "text/plain": [ 117 | "array([ 21144800., 13473000., 15236800., 9242600., 14064100.,\n", 118 | " 11494200., 17322100., 13608500., 17240800., 33162400.,\n", 119 | " 13127500., 11086200., 10149000., 17184100., 18949000.,\n", 120 | " 29144500., 31162200., 23994700., 17853500., 13572000.,\n", 121 | " 14395400., 16290300., 21521000., 17885200., 16188000.,\n", 122 | " 19504300., 12718000., 16192700., 18138800., 16824200.])" 123 | ] 124 | }, 125 | "execution_count": 4, 126 | "metadata": {}, 127 | "output_type": "execute_result" 128 | } 129 | ], 130 | "source": [ 131 | "v" 132 | ] 133 | }, 134 | { 135 | "cell_type": "code", 136 | "execution_count": 18, 137 | "metadata": { 138 | "collapsed": false 139 | }, 140 | "outputs": [], 141 | "source": [ 142 | "#选择第4列,开盘价\n", 143 | "opening_price = np.loadtxt('data.csv', delimiter=',', usecols=(3,), unpack=True)" 144 | ] 145 | }, 146 | { 147 | "cell_type": "code", 148 | "execution_count": 19, 149 | "metadata": { 150 | "collapsed": false 151 | }, 152 | "outputs": [ 153 | { 154 | "name": "stdout", 155 | "output_type": "stream", 156 | "text": [ 157 | "[ 344.17 335.8 341.3 344.45 343.8 343.61 347.89 353.68 355.19\n", 158 | " 357.39 354.75 356.79 359.19 360.8 357.1 
358.21 342.05 338.77\n", 159 | " 344.02 345.29 351.21 355.47 349.96 357.2 360.07 361.11 354.91\n", 160 | " 354.69 349.69 345.4 ]\n" 161 | ] 162 | } 163 | ], 164 | "source": [ 165 | "print opening_price" 166 | ] 167 | }, 168 | { 169 | "cell_type": "markdown", 170 | "metadata": {}, 171 | "source": [ 172 | "##2. Computing averages" 173 | ] 174 | }, 175 | { 176 | "cell_type": "markdown", 177 | "metadata": {}, 178 | "source": [ 179 | "###2.1 Weighted average\n", 180 | "VWAP stands for Volume-Weighted Average Price: the higher the trading volume at a price, the more weight that price carries. **VWAP is the weighted average computed with trading volume as the weights.**" 181 | ] 182 | }, 183 | { 184 | "cell_type": "code", 185 | "execution_count": 20, 186 | "metadata": { 187 | "collapsed": false 188 | }, 189 | "outputs": [ 190 | { 191 | "name": "stdout", 192 | "output_type": "stream", 193 | "text": [ 194 | "VWAP = 350.589549353\n" 195 | ] 196 | } 197 | ], 198 | "source": [ 199 | "vwap = np.average(c, weights=v)\n", 200 | "print \"VWAP =\", vwap" 201 | ] 202 | }, 203 | { 204 | "cell_type": "markdown", 205 | "metadata": {}, 206 | "source": [ 207 | "TWAP stands for Time-Weighted Average Price. The basic idea is that recent prices matter more, so we give them higher weights.\n", 208 | "We use the arange function to create a sequence of integers increasing from 0, with as many elements as there are closing prices." 209 | ] 210 | }, 211 | { 212 | "cell_type": "code", 213 | "execution_count": 23, 214 | "metadata": { 215 | "collapsed": false 216 | }, 217 | "outputs": [ 218 | { 219 | "name": "stdout", 220 | "output_type": "stream", 221 | "text": [ 222 | "twap = 352.428321839\n" 223 | ] 224 | } 225 | ], 226 | "source": [ 227 | "t = np.arange(len(c))\n", 228 | "print \"twap = \",np.average(c, weights=t)" 229 | ] 230 | }, 231 | { 232 | "cell_type": "markdown", 233 | "metadata": {}, 234 | "source": [ 235 | "###2.2 Arithmetic mean\n", 236 | "Use the mean function to compute the arithmetic mean." 237 | ] 238 | }, 239 | { 240 | "cell_type": "code", 241 | "execution_count": 21, 242 | "metadata": { 243 | "collapsed": false 244 | }, 245 | "outputs": [ 246 | { 247 | "name": "stdout", 248 | "output_type": "stream", 249 | "text": [ 250 | "mean = 351.037666667\n" 251 | ] 252 | } 253 | ], 254 | "source": 
[ 255 | "mean = np.mean(c)\n", 256 | "print \"mean = \",mean" 257 | ] 258 | }, 259 | { 260 | "cell_type": "code", 261 | "execution_count": 32, 262 | "metadata": { 263 | "collapsed": false 264 | }, 265 | "outputs": [ 266 | { 267 | "name": "stdout", 268 | "output_type": "stream", 269 | "text": [ 270 | "mean = 351.037666667\n" 271 | ] 272 | } 273 | ], 274 | "source": [ 275 | "print \"mean = \", c.mean()" 276 | ] 277 | }, 278 | { 279 | "cell_type": "markdown", 280 | "metadata": {}, 281 | "source": [ 282 | "##3. Maximum, minimum, and range" 283 | ] 284 | }, 285 | { 286 | "cell_type": "markdown", 287 | "metadata": {}, 288 | "source": [ 289 | "Steps: read in the daily high and low prices, then use the max and min functions to get the maximum and minimum." 290 | ] 291 | }, 292 | { 293 | "cell_type": "code", 294 | "execution_count": 24, 295 | "metadata": { 296 | "collapsed": false 297 | }, 298 | "outputs": [ 299 | { 300 | "name": "stdout", 301 | "output_type": "stream", 302 | "text": [ 303 | "highest = 364.9\n", 304 | "lowest = 333.53\n" 305 | ] 306 | } 307 | ], 308 | "source": [ 309 | "h,l = np.loadtxt('data.csv', delimiter=',', usecols=(4,5), unpack=True)\n", 310 | "print 'highest = ', np.max(h)\n", 311 | "print 'lowest = ', np.min(l)" 312 | ] 313 | }, 314 | { 315 | "cell_type": "markdown", 316 | "metadata": {}, 317 | "source": [ 318 | "NumPy's ptp function computes the range of an array: it returns the difference between the largest and smallest elements, that is, max(array) - min(array)." 319 | ] 320 | }, 321 | { 322 | "cell_type": "code", 323 | "execution_count": 25, 324 | "metadata": { 325 | "collapsed": false 326 | }, 327 | "outputs": [ 328 | { 329 | "name": "stdout", 330 | "output_type": "stream", 331 | "text": [ 332 | "Spread high price : 24.86\n", 333 | "Spread low price : 26.97\n" 334 | ] 335 | } 336 | ], 337 | "source": [ 338 | "print 'Spread high price : ', np.ptp(h)\n", 339 | "print 'Spread low price : ', np.ptp(l)" 340 | ] 341 | }, 342 | { 343 | "cell_type": "markdown", 344 | "metadata": {}, 345 | "source": [ 346 | "##4. 
Computing the median" 347 | ] 348 | }, 349 | { 350 | "cell_type": "markdown", 351 | "metadata": {}, 352 | "source": [ 353 | "Compute the median of the closing prices" 354 | ] 355 | }, 356 | { 357 | "cell_type": "code", 358 | "execution_count": 26, 359 | "metadata": { 360 | "collapsed": false 361 | }, 362 | "outputs": [ 363 | { 364 | "name": "stdout", 365 | "output_type": "stream", 366 | "text": [ 367 | "median = 352.055\n" 368 | ] 369 | } 370 | ], 371 | "source": [ 372 | "closing_price = np.loadtxt('data.csv', delimiter=',', usecols=(6,), unpack=True)\n", 373 | "print 'median = ', np.median(closing_price)" 374 | ] 375 | }, 376 | { 377 | "cell_type": "markdown", 378 | "metadata": {}, 379 | "source": [ 380 | "Sort the array, then take the median" 381 | ] 382 | }, 383 | { 384 | "cell_type": "code", 385 | "execution_count": 27, 386 | "metadata": { 387 | "collapsed": false 388 | }, 389 | "outputs": [ 390 | { 391 | "name": "stdout", 392 | "output_type": "stream", 393 | "text": [ 394 | "sorted_closing_price = [ 336.1 338.61 339.32 342.62 342.88 343.44 344.32 345.03 346.5\n", 395 | " 346.67 348.16 349.31 350.56 351.88 351.99 352.12 352.47 353.21\n", 396 | " 354.54 355.2 355.36 355.76 356.85 358.16 358.3 359.18 359.56\n", 397 | " 359.9 360. 
363.13]\n" 398 | ] 399 | } 400 | ], 401 | "source": [ 402 | "sorted_closing = np.msort(closing_price)\n", 403 | "print \"sorted_closing_price = \", sorted_closing" 404 | ] 405 | }, 406 | { 407 | "cell_type": "code", 408 | "execution_count": 29, 409 | "metadata": { 410 | "collapsed": false 411 | }, 412 | "outputs": [ 413 | { 414 | "name": "stdout", 415 | "output_type": "stream", 416 | "text": [ 417 | "median = 352.055\n" 418 | ] 419 | } 420 | ], 421 | "source": [ 422 | "#First check whether the number of elements is odd or even\n", 423 | "N = len(closing_price)\n", 424 | "median_ind = (N-1)/2\n", 425 | "if N & 0x1 :\n", 426 | " print \"median = \", sorted_closing[median_ind]\n", 427 | "else:\n", 428 | " print \"median = \", (sorted_closing[median_ind]+sorted_closing[median_ind+1])/2" 429 | ] 430 | }, 431 | { 432 | "cell_type": "markdown", 433 | "metadata": {}, 434 | "source": [ 435 | "##5. Computing the variance\n", 436 | "Variance measures how much a variable fluctuates; a stock whose price swings too sharply is bound to cause trouble for its holders." 437 | ] 438 | }, 439 | { 440 | "cell_type": "code", 441 | "execution_count": 30, 442 | "metadata": { 443 | "collapsed": false 444 | }, 445 | "outputs": [ 446 | { 447 | "name": "stdout", 448 | "output_type": "stream", 449 | "text": [ 450 | "variance = 50.1265178889\n" 451 | ] 452 | } 453 | ], 454 | "source": [ 455 | "print \"variance = \", np.var(closing_price)" 456 | ] 457 | }, 458 | { 459 | "cell_type": "code", 460 | "execution_count": 31, 461 | "metadata": { 462 | "collapsed": false 463 | }, 464 | "outputs": [ 465 | { 466 | "name": "stdout", 467 | "output_type": "stream", 468 | "text": [ 469 | "variance from definition = 50.1265178889\n" 470 | ] 471 | } 472 | ], 473 | "source": [ 474 | "#Compute the variance by hand\n", 475 | "print 'variance from definition = ', np.mean( (closing_price-c.mean())**2 )" 476 | ] 477 | } 478 | ], 479 | "metadata": { 480 | "kernelspec": { 481 | "display_name": "Python 2", 482 | "language": "python", 483 | "name": "python2" 484 | }, 485 | "language_info": { 486 | "codemirror_mode": { 487 | "name": "ipython", 488 | "version": 2 489 | }, 490 | "file_extension": 
".py", 491 | "mimetype": "text/x-python", 492 | "name": "python", 493 | "nbconvert_exporter": "python", 494 | "pygments_lexer": "ipython2", 495 | "version": "2.7.5" 496 | } 497 | }, 498 | "nbformat": 4, 499 | "nbformat_minor": 0 500 | } 501 | -------------------------------------------------------------------------------- /NumPy/(6)linear_algebra.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "##内容索引\n", 8 | "1. 矩阵 --- mat函数\n", 9 | "2. 线性代数 --- \n", 10 | "numpy.linalg中的逆矩阵函数inv函数、行列式det函数、求解线性方程组的solve函数、内积dot函数、特征分解eigvals函数、eig函数、奇异值分解svd函数、广义逆矩阵的pinv函数" 11 | ] 12 | }, 13 | { 14 | "cell_type": "code", 15 | "execution_count": 1, 16 | "metadata": { 17 | "collapsed": true 18 | }, 19 | "outputs": [], 20 | "source": [ 21 | "import numpy as np" 22 | ] 23 | }, 24 | { 25 | "cell_type": "markdown", 26 | "metadata": {}, 27 | "source": [ 28 | "##1. 矩阵\n", 29 | "在NumP中,矩阵是ndarray的子类,可以由专用的字符串格式来创建。我们可以使用mat、matrix、以及bmat函数来创建矩阵。" 30 | ] 31 | }, 32 | { 33 | "cell_type": "markdown", 34 | "metadata": { 35 | "collapsed": true 36 | }, 37 | "source": [ 38 | "###1.1 创建矩阵\n", 39 | "mat函数创建矩阵时,若输入已经为matrix或ndarray对象,则不会为它们创建副本。因此,调用mat函数和调用matrix(data, copy=False)等价。" 40 | ] 41 | }, 42 | { 43 | "cell_type": "markdown", 44 | "metadata": {}, 45 | "source": [ 46 | "**在创建矩阵的专用字符串中,矩阵的行与行之间用分号隔开,行内的元素之间用空格隔开。**" 47 | ] 48 | }, 49 | { 50 | "cell_type": "code", 51 | "execution_count": 2, 52 | "metadata": { 53 | "collapsed": false 54 | }, 55 | "outputs": [ 56 | { 57 | "name": "stdout", 58 | "output_type": "stream", 59 | "text": [ 60 | "Creation from string:\n", 61 | "[[1 2 3]\n", 62 | " [4 5 6]\n", 63 | " [7 8 9]]\n" 64 | ] 65 | } 66 | ], 67 | "source": [ 68 | "A = np.mat('1 2 3; 4 5 6; 7 8 9')\n", 69 | "print \"Creation from string:\\n\", A" 70 | ] 71 | }, 72 | { 73 | "cell_type": "code", 74 | "execution_count": 3, 75 | "metadata": { 76 | "collapsed": 
false 77 | }, 78 | "outputs": [ 79 | { 80 | "name": "stdout", 81 | "output_type": "stream", 82 | "text": [ 83 | "Transpose A :\n", 84 | "[[1 4 7]\n", 85 | " [2 5 8]\n", 86 | " [3 6 9]]\n", 87 | "Inverse A :\n", 88 | "[[ -4.50359963e+15 9.00719925e+15 -4.50359963e+15]\n", 89 | " [ 9.00719925e+15 -1.80143985e+16 9.00719925e+15]\n", 90 | " [ -4.50359963e+15 9.00719925e+15 -4.50359963e+15]]\n" 91 | ] 92 | } 93 | ], 94 | "source": [ 95 | "# Transpose\n", 96 | "print \"Transpose A :\\n\", A.T\n", 97 | "\n", 98 | "# Inverse (this A is singular, so the huge values in the output are floating-point artifacts, not a true inverse)\n", 99 | "print \"Inverse A :\\n\", A.I" 100 | ] 101 | }, 102 | { 103 | "cell_type": "code", 104 | "execution_count": 4, 105 | "metadata": { 106 | "collapsed": false 107 | }, 108 | "outputs": [ 109 | { 110 | "name": "stdout", 111 | "output_type": "stream", 112 | "text": [ 113 | "Creation from array: \n", 114 | "[[0 1 2]\n", 115 | " [3 4 5]\n", 116 | " [6 7 8]]\n" 117 | ] 118 | } 119 | ], 120 | "source": [ 121 | "# Create a matrix from a NumPy array\n", 122 | "print \"Creation from array: \\n\", np.mat(np.arange(9).reshape(3,3))" 123 | ] 124 | }, 125 | { 126 | "cell_type": "markdown", 127 | "metadata": {}, 128 | "source": [ 129 | "###1.2 Building a new matrix from existing matrices\n", 130 | "We can combine existing smaller matrices into a new, larger matrix with the bmat function." 131 | ] 132 | }, 133 | { 134 | "cell_type": "code", 135 | "execution_count": 5, 136 | "metadata": { 137 | "collapsed": false 138 | }, 139 | "outputs": [ 140 | { 141 | "name": "stdout", 142 | "output_type": "stream", 143 | "text": [ 144 | "A:\n", 145 | "[[ 1. 0.]\n", 146 | " [ 0. 1.]]\n", 147 | "B:\n", 148 | "[[ 2. 0.]\n", 149 | " [ 0. 2.]]\n", 150 | "Compound matrix:\n", 151 | "[[ 1. 0. 2. 0.]\n", 152 | " [ 0. 1. 0. 2.]]\n", 153 | "Compound matrix:\n", 154 | "[[ 1. 0. 2. 0.]\n", 155 | " [ 0. 1. 0. 2.]\n", 156 | " [ 2. 0. 1. 0.]\n", 157 | " [ 0. 2. 0. 
1.]]\n" 158 | ] 159 | } 160 | ], 161 | "source": [ 162 | "A = np.eye(2)\n", 163 | "print \"A:\\n\", A\n", 164 | "\n", 165 | "B = 2 * A\n", 166 | "print \"B:\\n\", B\n", 167 | "\n", 168 | "# 使用字符串创建复合矩阵\n", 169 | "print \"Compound matrix:\\n\", np.bmat(\"A B\")\n", 170 | "print \"Compound matrix:\\n\", np.bmat(\"A B; B A\")" 171 | ] 172 | }, 173 | { 174 | "cell_type": "markdown", 175 | "metadata": { 176 | "collapsed": true 177 | }, 178 | "source": [ 179 | "##2. 线性代数\n", 180 | "线性代数是数学的一个重要分支。numpy.linalg模块包含线性代数的函数。使用这个模块,我们可以计算逆矩阵、求特征值、解线性方程组以及求解行列式。" 181 | ] 182 | }, 183 | { 184 | "cell_type": "markdown", 185 | "metadata": {}, 186 | "source": [ 187 | "###2.1 计算逆矩阵\n", 188 | "使用inv函数计算逆矩阵。" 189 | ] 190 | }, 191 | { 192 | "cell_type": "code", 193 | "execution_count": 6, 194 | "metadata": { 195 | "collapsed": false 196 | }, 197 | "outputs": [ 198 | { 199 | "name": "stdout", 200 | "output_type": "stream", 201 | "text": [ 202 | "A:\n", 203 | "[[ 0 1 2]\n", 204 | " [ 1 0 3]\n", 205 | " [ 4 -3 8]]\n", 206 | "inverse of A:\n", 207 | "[[-4.5 7. -1.5]\n", 208 | " [-2. 4. -1. ]\n", 209 | " [ 1.5 -2. 0.5]]\n", 210 | "check inverse:\n", 211 | "[[ 1. 0. 0.]\n", 212 | " [ 0. 1. 0.]\n", 213 | " [ 0. 0. 
1.]]\n" 214 | ] 215 | } 216 | ], 217 | "source": [ 218 | "A = np.mat(\"0 1 2; 1 0 3; 4 -3 8\")\n", 219 | "print \"A:\\n\", A\n", 220 | "inverse = np.linalg.inv(A)\n", 221 | "print \"inverse of A:\\n\", inverse\n", 222 | "\n", 223 | "print \"check inverse:\\n\", inverse * A" 224 | ] 225 | }, 226 | { 227 | "cell_type": "markdown", 228 | "metadata": {}, 229 | "source": [ 230 | "###2.2 行列式\n", 231 | "行列式是与方阵相关的一个标量值。对于一个n*n的实数矩阵,**行列式描述的是一个线性变换对“有向体积”所造成的影响。行列式的值为正,表示保持了空间的定向(顺时针或逆时针),为负表示颠倒空间的定向。**numpy.linalg模块中的det函数可以计算矩阵的行列式。" 232 | ] 233 | }, 234 | { 235 | "cell_type": "code", 236 | "execution_count": 7, 237 | "metadata": { 238 | "collapsed": false 239 | }, 240 | "outputs": [ 241 | { 242 | "name": "stdout", 243 | "output_type": "stream", 244 | "text": [ 245 | "A:\n", 246 | "[[3 4]\n", 247 | " [5 6]]\n", 248 | "Determinant:\n", 249 | "-2.0\n" 250 | ] 251 | } 252 | ], 253 | "source": [ 254 | "A = np.mat(\"3 4; 5 6\")\n", 255 | "print \"A:\\n\", A\n", 256 | "\n", 257 | "print \"Determinant:\\n\", np.linalg.det(A)" 258 | ] 259 | }, 260 | { 261 | "cell_type": "markdown", 262 | "metadata": {}, 263 | "source": [ 264 | "###2.3 求解线性方程组\n", 265 | "矩阵可以对向量进行线性变换,这对应于数学中的线性方程组。solve函数可以求解形如Ax = b的线性方程组,其中A是矩阵,b是一维或二维的数组,x是未知变量。" 266 | ] 267 | }, 268 | { 269 | "cell_type": "code", 270 | "execution_count": 8, 271 | "metadata": { 272 | "collapsed": false 273 | }, 274 | "outputs": [ 275 | { 276 | "name": "stdout", 277 | "output_type": "stream", 278 | "text": [ 279 | "A:\n", 280 | "[[ 1 -2 1]\n", 281 | " [ 0 2 -8]\n", 282 | " [-4 5 9]]\n", 283 | "b:\n", 284 | "[ 0 8 -9]\n" 285 | ] 286 | } 287 | ], 288 | "source": [ 289 | "A = np.mat(\"1 -2 1; 0 2 -8; -4 5 9\")\n", 290 | "print \"A:\\n\", A\n", 291 | "b = np.array([0,8,-9])\n", 292 | "print \"b:\\n\", b" 293 | ] 294 | }, 295 | { 296 | "cell_type": "code", 297 | "execution_count": 9, 298 | "metadata": { 299 | "collapsed": false 300 | }, 301 | "outputs": [ 302 | { 303 | "name": "stdout", 304 | "output_type": "stream", 305 | "text": 
[ 306 | "Solution:\n", 307 | "[ 29. 16. 3.]\n", 308 | "Check:\n", 309 | "[[ True True True]]\n", 310 | "[[ 0. 8. -9.]]\n" 311 | ] 312 | } 313 | ], 314 | "source": [ 315 | "x = np.linalg.solve(A, b)\n", 316 | "print \"Solution:\\n\", x\n", 317 | "\n", 318 | "# check\n", 319 | "print \"Check:\\n\",b == np.dot(A, x)\n", 320 | "print np.dot(A, x)" 321 | ] 322 | }, 323 | { 324 | "cell_type": "markdown", 325 | "metadata": {}, 326 | "source": [ 327 | "###2.4 特征值和特征向量\n", 328 | "特征值(eigenvalue)即方程Ax = ax的根,是一个标量。其中,A是一个二维矩阵,x是一个一维向量。特征向量(eigenvector)是关于特征值的向量。在numpy.linalg模块中,eigvals函数可以计算矩阵的特征值,而eig函数可以返回一个包含特征值和对应特征向量的元组。" 329 | ] 330 | }, 331 | { 332 | "cell_type": "code", 333 | "execution_count": 10, 334 | "metadata": { 335 | "collapsed": false 336 | }, 337 | "outputs": [ 338 | { 339 | "name": "stdout", 340 | "output_type": "stream", 341 | "text": [ 342 | "A:\n", 343 | "[[ 3 -2]\n", 344 | " [ 1 0]]\n", 345 | "Eigenvalues:\n", 346 | "[ 2. 1.]\n", 347 | "Eigenvalues:\n", 348 | "[ 2. 1.]\n", 349 | "Eigenvectors:\n", 350 | "[[ 0.89442719 0.70710678]\n", 351 | " [ 0.4472136 0.70710678]]\n" 352 | ] 353 | } 354 | ], 355 | "source": [ 356 | "A = np.mat(\"3 -2; 1 0\")\n", 357 | "print \"A:\\n\", A\n", 358 | "\n", 359 | "print \"Eigenvalues:\\n\", np.linalg.eigvals(A)\n", 360 | "\n", 361 | "eigenvalues, eigenvectors = np.linalg.eig(A)\n", 362 | "print \"Eigenvalues:\\n\", eigenvalues\n", 363 | "print \"Eigenvectors:\\n\", eigenvectors" 364 | ] 365 | }, 366 | { 367 | "cell_type": "code", 368 | "execution_count": 11, 369 | "metadata": { 370 | "collapsed": false 371 | }, 372 | "outputs": [ 373 | { 374 | "name": "stdout", 375 | "output_type": "stream", 376 | "text": [ 377 | "Left:\n", 378 | "[[ 1.78885438]\n", 379 | " [ 0.89442719]]\n", 380 | "Right:\n", 381 | "[[ 1.78885438]\n", 382 | " [ 0.89442719]]\n", 383 | "\n", 384 | "Left:\n", 385 | "[[ 0.70710678]\n", 386 | " [ 0.70710678]]\n", 387 | "Right:\n", 388 | "[[ 0.70710678]\n", 389 | " [ 0.70710678]]\n", 390 | "\n" 391 | ] 392 | } 
393 | ], 394 | "source": [ 395 | "# check\n", 396 | "# 计算 Ax = ax的左右两部分的值\n", 397 | "for i in range(len(eigenvalues)):\n", 398 | " print \"Left:\\n\", np.dot(A, eigenvectors[:,i])\n", 399 | " print \"Right:\\n\", np.dot(eigenvalues[i], eigenvectors[:,i])\n", 400 | " print" 401 | ] 402 | }, 403 | { 404 | "cell_type": "markdown", 405 | "metadata": { 406 | "collapsed": true 407 | }, 408 | "source": [ 409 | "###2.5 奇异值分解\n", 410 | "SVD(Singular Value Decomposition,奇异值分解)是一种因子分解运算,将一个矩阵分解为3个矩阵的乘积。奇异值分解是特征值分解的一种推广。\n", 411 | "\n", 412 | "在numpy.linalg模块中的svd函数可以对矩阵进行奇异值分解。该函数返回3个矩阵——U、Sigma和V,其中U和V是正交矩阵,Sigma包含输入矩阵的奇异值。" 413 | ] 414 | }, 415 | { 416 | "cell_type": "code", 417 | "execution_count": 12, 418 | "metadata": { 419 | "collapsed": false 420 | }, 421 | "outputs": [ 422 | { 423 | "data": { 424 | "text/latex": [ 425 | "$M=U \\Sigma V^*$" 426 | ], 427 | "text/plain": [ 428 | "" 429 | ] 430 | }, 431 | "execution_count": 12, 432 | "metadata": {}, 433 | "output_type": "execute_result" 434 | } 435 | ], 436 | "source": [ 437 | "from IPython.display import Latex\n", 438 | "Latex(r\"$M=U \\Sigma V^*$\")" 439 | ] 440 | }, 441 | { 442 | "cell_type": "markdown", 443 | "metadata": {}, 444 | "source": [ 445 | "*号表示共轭转置" 446 | ] 447 | }, 448 | { 449 | "cell_type": "code", 450 | "execution_count": 13, 451 | "metadata": { 452 | "collapsed": false 453 | }, 454 | "outputs": [ 455 | { 456 | "name": "stdout", 457 | "output_type": "stream", 458 | "text": [ 459 | "A:\n", 460 | "[[ 4 11 14]\n", 461 | " [ 8 7 -2]]\n", 462 | "U:\n", 463 | "[[-0.9486833 -0.31622777]\n", 464 | " [-0.31622777 0.9486833 ]]\n", 465 | "Sigma:\n", 466 | "[ 18.97366596 9.48683298]\n", 467 | "V:\n", 468 | "[[-0.33333333 -0.66666667 -0.66666667]\n", 469 | " [ 0.66666667 0.33333333 -0.66666667]]\n" 470 | ] 471 | } 472 | ], 473 | "source": [ 474 | "A = np.mat(\"4 11 14;8 7 -2\")\n", 475 | "print \"A:\\n\", A\n", 476 | "\n", 477 | "U, Sigma, V = np.linalg.svd(A, full_matrices=False)\n", 478 | "print \"U:\\n\", U\n", 479 
| "print \"Sigma:\\n\", Sigma\n", 480 | "print \"V:\\n\", V" 481 | ] 482 | }, 483 | { 484 | "cell_type": "code", 485 | "execution_count": 14, 486 | "metadata": { 487 | "collapsed": false 488 | }, 489 | "outputs": [ 490 | { 491 | "data": { 492 | "text/plain": [ 493 | "array([[ 18.97366596, 0. ],\n", 494 | " [ 0. , 9.48683298]])" 495 | ] 496 | }, 497 | "execution_count": 14, 498 | "metadata": {}, 499 | "output_type": "execute_result" 500 | } 501 | ], 502 | "source": [ 503 | "# Sigma矩阵是奇异值矩阵对角线上的值\n", 504 | "np.diag(Sigma)" 505 | ] 506 | }, 507 | { 508 | "cell_type": "code", 509 | "execution_count": 17, 510 | "metadata": { 511 | "collapsed": false 512 | }, 513 | "outputs": [ 514 | { 515 | "name": "stdout", 516 | "output_type": "stream", 517 | "text": [ 518 | "Product:\n", 519 | "[[ 4. 11. 14.]\n", 520 | " [ 8. 7. -2.]]\n" 521 | ] 522 | } 523 | ], 524 | "source": [ 525 | "# check\n", 526 | "M = U*np.diag(Sigma)*V\n", 527 | "print \"Product:\\n\", M" 528 | ] 529 | }, 530 | { 531 | "cell_type": "markdown", 532 | "metadata": {}, 533 | "source": [ 534 | "###2.6 广义逆矩阵\n", 535 | "广义逆矩阵可以使用numpy.linalg模块中的pinv函数进行求解。inv函数只接受方阵作为输入矩阵,而pinv函数没有这个限制。" 536 | ] 537 | }, 538 | { 539 | "cell_type": "code", 540 | "execution_count": 18, 541 | "metadata": { 542 | "collapsed": false 543 | }, 544 | "outputs": [ 545 | { 546 | "name": "stdout", 547 | "output_type": "stream", 548 | "text": [ 549 | "A:\n", 550 | "[[ 4 11 14]\n", 551 | " [ 8 7 -2]]\n" 552 | ] 553 | } 554 | ], 555 | "source": [ 556 | "A = np.mat(\"4 11 14; 8 7 -2\")\n", 557 | "print \"A:\\n\", A" 558 | ] 559 | }, 560 | { 561 | "cell_type": "code", 562 | "execution_count": 19, 563 | "metadata": { 564 | "collapsed": false 565 | }, 566 | "outputs": [ 567 | { 568 | "name": "stdout", 569 | "output_type": "stream", 570 | "text": [ 571 | "Pseudo inverse:\n", 572 | "[[-0.00555556 0.07222222]\n", 573 | " [ 0.02222222 0.04444444]\n", 574 | " [ 0.05555556 -0.05555556]]\n" 575 | ] 576 | } 577 | ], 578 | "source": [ 579 | "pseudoinv = 
np.linalg.pinv(A)\n", 580 | "print \"Pseudo inverse:\\n\", pseudoinv" 581 | ] 582 | }, 583 | { 584 | "cell_type": "code", 585 | "execution_count": 20, 586 | "metadata": { 587 | "collapsed": false 588 | }, 589 | "outputs": [ 590 | { 591 | "name": "stdout", 592 | "output_type": "stream", 593 | "text": [ 594 | "Check pseudo inverse:\n", 595 | "[[ 1.00000000e+00 0.00000000e+00]\n", 596 | " [ 8.32667268e-17 1.00000000e+00]]\n" 597 | ] 598 | } 599 | ], 600 | "source": [ 601 | "# check\n", 602 | "print \"Check pseudo inverse:\\n\", A*pseudoinv" 603 | ] 604 | }, 605 | { 606 | "cell_type": "markdown", 607 | "metadata": {}, 608 | "source": [ 609 | "得到的结果并非严格意义上的单位矩阵,但是非常近似。" 610 | ] 611 | }, 612 | { 613 | "cell_type": "code", 614 | "execution_count": 21, 615 | "metadata": { 616 | "collapsed": false 617 | }, 618 | "outputs": [ 619 | { 620 | "name": "stdout", 621 | "output_type": "stream", 622 | "text": [ 623 | "A:\n", 624 | "[[ 0 1 2]\n", 625 | " [ 1 0 3]\n", 626 | " [ 4 -3 8]]\n", 627 | "inverse of A:\n", 628 | "[[-4.5 7. -1.5]\n", 629 | " [-2. 4. -1. ]\n", 630 | " [ 1.5 -2. 0.5]]\n", 631 | "check inverse:\n", 632 | "[[ 1. 0. 0.]\n", 633 | " [ 0. 1. 0.]\n", 634 | " [ 0. 0. 1.]]\n", 635 | "Pseudo inverse:\n", 636 | "[[-4.5 7. -1.5]\n", 637 | " [-2. 4. -1. ]\n", 638 | " [ 1.5 -2. 
0.5]]\n", 639 | "Check pseudo inverse:\n", 640 | "[[ 1.00000000e+00 -2.66453526e-15 8.88178420e-16]\n", 641 | " [ 8.88178420e-16 1.00000000e+00 2.22044605e-16]\n", 642 | " [ 0.00000000e+00 3.55271368e-15 1.00000000e+00]]\n" 643 | ] 644 | } 645 | ], 646 | "source": [ 647 | "A = np.mat(\"0 1 2; 1 0 3; 4 -3 8\")\n", 648 | "print \"A:\\n\", A\n", 649 | "inverse = np.linalg.inv(A)\n", 650 | "print \"inverse of A:\\n\", inverse\n", 651 | "print \"check inverse:\\n\", inverse * A\n", 652 | "\n", 653 | "pseudoinv = np.linalg.pinv(A)\n", 654 | "print \"Pseudo inverse:\\n\", pseudoinv\n", 655 | "print \"Check pseudo inverse:\\n\", A*pseudoinv" 656 | ] 657 | } 658 | ], 659 | "metadata": { 660 | "kernelspec": { 661 | "display_name": "Python 2", 662 | "language": "python", 663 | "name": "python2" 664 | }, 665 | "language_info": { 666 | "codemirror_mode": { 667 | "name": "ipython", 668 | "version": 2 669 | }, 670 | "file_extension": ".py", 671 | "mimetype": "text/x-python", 672 | "name": "python", 673 | "nbconvert_exporter": "python", 674 | "pygments_lexer": "ipython2", 675 | "version": "2.7.5" 676 | } 677 | }, 678 | "nbformat": 4, 679 | "nbformat_minor": 0 680 | } 681 | -------------------------------------------------------------------------------- /NumPy/(7)universal_functions.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "##内容索引\n", 8 | "1. 通用函数\n", 9 | " - 创建通用函数 --- frompyfunc工厂函数\n", 10 | " - 通用函数的方法 --- reduce函数、accumulate函数、reduceat函数、outer函数\n", 11 | "2. 数组的除法运算 --- divide函数、true_divide函数、floor_divide函数\n", 12 | "3. 数组的模运算 --- mod函数、remainder函数、fmod函数\n", 13 | "4. 
Bitwise and comparison functions --- " 14 | ] 15 | }, 16 | { 17 | "cell_type": "code", 18 | "execution_count": 1, 19 | "metadata": { 20 | "collapsed": true 21 | }, 22 | "outputs": [], 23 | "source": [ 24 | "import numpy as np" 25 | ] 26 | }, 27 | { 28 | "cell_type": "markdown", 29 | "metadata": {}, 30 | "source": [ 31 | "##1. Universal functions\n", 32 | "A universal function takes a set of scalars as input and produces a set of scalars as output. Universal functions usually correspond to basic mathematical operations such as addition, subtraction, multiplication and division.\n", 33 | "\n", 34 | "[Universal function documentation](http://docs.scipy.org/doc/numpy/reference/ufuncs.html#ufunc)\n", 35 | "\n", 36 | "A universal function vectorizes an ordinary Python function: it operates on an ndarray element by element." 37 | ] 38 | }, 39 | { 40 | "cell_type": "markdown", 41 | "metadata": {}, 42 | "source": [ 43 | "###1.1 Creating a universal function\n", 44 | "Use NumPy's frompyfunc function to create a universal function from a Python function." 45 | ] 46 | }, 47 | { 48 | "cell_type": "code", 49 | "execution_count": 2, 50 | "metadata": { 51 | "collapsed": true 52 | }, 53 | "outputs": [], 54 | "source": [ 55 | "# Define a Python function\n", 56 | "def pyFunc(a):\n", 57 | " result = np.zeros_like(a)\n", 58 | " # called once per element: each element is mapped to 42\n", 59 | " result = 42\n", 60 | " return result" 61 | ] 62 | }, 63 | { 64 | "cell_type": "markdown", 65 | "metadata": {}, 66 | "source": [ 67 | "zeros_like creates an array result with the same shape as a, filled with zeros. The flat attribute provides a flat iterator that can set array elements one by one (`result.flat = 42`); the code above instead rebinds result to the scalar 42, so the zeros array is discarded and 42 is returned for every element." 68 | ] 69 | }, 70 | { 71 | "cell_type": "markdown", 72 | "metadata": {}, 73 | "source": [ 74 | "Create the universal function with frompyfunc, specifying 1 input argument and 1 output argument." 75 | ] 76 | }, 77 | { 78 | "cell_type": "code", 79 | "execution_count": 3, 80 | "metadata": { 81 | "collapsed": false 82 | }, 83 | "outputs": [ 84 | { 85 | "name": "stdout", 86 | "output_type": "stream", 87 | "text": [ 88 | "The answer:\n", 89 | "[42 42 42 42]\n" 90 | ] 91 | } 92 | ], 93 | "source": [ 94 | "ufunc1 = np.frompyfunc(pyFunc, 1, 1)\n", 95 | "ret = ufunc1(np.arange(4))\n", 96 | "print \"The answer:\\n\", ret" 97 | ] 98 | }, 99 | { 100 | "cell_type": "code", 101 | "execution_count": 4, 102 | "metadata": { 103 | "collapsed": false 104 | }, 105 | "outputs": [ 106 | { 107 | "name": "stdout", 108 | "output_type": "stream", 109 | "text": [ 110 | "The answer:\n", 111 | 
"[[42 42]\n", 112 | " [42 42]]\n" 113 | ] 114 | } 115 | ], 116 | "source": [ 117 | "ret = ufunc1(np.arange(4).reshape(2,2))\n", 118 | "print \"The answer:\\n\", ret" 119 | ] 120 | }, 121 | { 122 | "cell_type": "markdown", 123 | "metadata": { 124 | "collapsed": true 125 | }, 126 | "source": [ 127 | "**使用在第五节介绍的vectorize函数**" 128 | ] 129 | }, 130 | { 131 | "cell_type": "code", 132 | "execution_count": 5, 133 | "metadata": { 134 | "collapsed": false 135 | }, 136 | "outputs": [ 137 | { 138 | "name": "stdout", 139 | "output_type": "stream", 140 | "text": [ 141 | "The answer:\n", 142 | "[42 42 42 42]\n" 143 | ] 144 | } 145 | ], 146 | "source": [ 147 | "func2 = np.vectorize(pyFunc)\n", 148 | "ret = func2(np.arange(4))\n", 149 | "print \"The answer:\\n\", ret" 150 | ] 151 | }, 152 | { 153 | "cell_type": "markdown", 154 | "metadata": {}, 155 | "source": [ 156 | "###1.2 通用函数的方法\n", 157 | "通用函数并不是真正的函数,而是能够表示函数的numpy.ufunc的对象。frompyfunc是一个构造ufunc类对象的工厂函数。\n", 158 | "\n", 159 | "通用函数类有4个方法:**reduce、accumulate、reduceat、outer。这些方法只对输入两个参数、输出一个参数的ufunc对象有效**。" 160 | ] 161 | }, 162 | { 163 | "cell_type": "markdown", 164 | "metadata": {}, 165 | "source": [ 166 | "###1.3 在add函数上分别调用4个方法" 167 | ] 168 | }, 169 | { 170 | "cell_type": "markdown", 171 | "metadata": {}, 172 | "source": [ 173 | "####(1) reduce方法:沿着指定的轴,在连续的数组元素之间递归调用通用函数,即可得到输入数组的规约(reduce)计算结果。对于add函数,其对数组的reduce计算结果等价于对数组元素求和。" 174 | ] 175 | }, 176 | { 177 | "cell_type": "code", 178 | "execution_count": 6, 179 | "metadata": { 180 | "collapsed": false 181 | }, 182 | "outputs": [ 183 | { 184 | "name": "stdout", 185 | "output_type": "stream", 186 | "text": [ 187 | "a:\n", 188 | "[0 1 2 3 4 5 6 7 8]\n", 189 | "Reduce:\n", 190 | "36\n" 191 | ] 192 | } 193 | ], 194 | "source": [ 195 | "a = np.arange(9)\n", 196 | "print \"a:\\n\", a\n", 197 | "print \"Reduce:\\n\", np.add.reduce(a)" 198 | ] 199 | }, 200 | { 201 | "cell_type": "markdown", 202 | "metadata": {}, 203 | "source": [ 204 | "####(2) 
accumulate方法:可以递归作用于输入数组,与reduce不同的是,它将存储运算的中间结果并返回。add函数上调用accumulate方法,等价于直接调用cumsum函数。" 205 | ] 206 | }, 207 | { 208 | "cell_type": "code", 209 | "execution_count": 7, 210 | "metadata": { 211 | "collapsed": false 212 | }, 213 | "outputs": [ 214 | { 215 | "name": "stdout", 216 | "output_type": "stream", 217 | "text": [ 218 | "Accumulate:\n", 219 | "[ 0 1 3 6 10 15 21 28 36]\n" 220 | ] 221 | } 222 | ], 223 | "source": [ 224 | "print \"Accumulate:\\n\", np.add.accumulate(a)" 225 | ] 226 | }, 227 | { 228 | "cell_type": "code", 229 | "execution_count": 8, 230 | "metadata": { 231 | "collapsed": false 232 | }, 233 | "outputs": [ 234 | { 235 | "name": "stdout", 236 | "output_type": "stream", 237 | "text": [ 238 | "cumsum:\n", 239 | "[ 0 1 3 6 10 15 21 28 36]\n" 240 | ] 241 | } 242 | ], 243 | "source": [ 244 | "print \"cumsum:\\n\", np.cumsum(a)" 245 | ] 246 | }, 247 | { 248 | "cell_type": "markdown", 249 | "metadata": {}, 250 | "source": [ 251 | "####(3) reduceat方法有点复杂,它需要输入一个数组以及一个索引值列表作为参数" 252 | ] 253 | }, 254 | { 255 | "cell_type": "code", 256 | "execution_count": 9, 257 | "metadata": { 258 | "collapsed": false 259 | }, 260 | "outputs": [ 261 | { 262 | "name": "stdout", 263 | "output_type": "stream", 264 | "text": [ 265 | "Reduceat:\n", 266 | "[10 5 20 15]\n" 267 | ] 268 | } 269 | ], 270 | "source": [ 271 | "print \"Reduceat:\\n\", np.add.reduceat(a, [0,5,2,7])" 272 | ] 273 | }, 274 | { 275 | "cell_type": "markdown", 276 | "metadata": {}, 277 | "source": [ 278 | "- 第一步:用到索引值列表中的0和5,实际上就是对数组中索引值在0到5之间的元素进行reduce操作\n", 279 | "- 第二步:用到索引值5和2.由于2比5小,所以直接返回索引值为5的元素\n", 280 | "- 第三步:用到索引值2和7,计算2到7的数组的reduce操作\n", 281 | "- 第四步:用到索引值7,对索引值7开始直到数组末尾的元素进行reduce操作" 282 | ] 283 | }, 284 | { 285 | "cell_type": "code", 286 | "execution_count": 10, 287 | "metadata": { 288 | "collapsed": false 289 | }, 290 | "outputs": [ 291 | { 292 | "name": "stdout", 293 | "output_type": "stream", 294 | "text": [ 295 | "Reduceat step 1: 10\n", 296 | "Reduceat step 2: 5\n", 297 | "Reduceat step 3: 
20\n", 298 | "Reduceat step 4: 15\n" 299 | ] 300 | } 301 | ], 302 | "source": [ 303 | "print \"Reduceat step 1:\", np.add.reduce(a[0:5])\n", 304 | "print \"Reduceat step 2:\", a[5]\n", 305 | "print \"Reduceat step 3:\", np.add.reduce(a[2:7])\n", 306 | "print \"Reduceat step 4:\", np.add.reduce(a[7:])" 307 | ] 308 | }, 309 | { 310 | "cell_type": "markdown", 311 | "metadata": {}, 312 | "source": [ 313 | "####(4) outer方法:返回一个数组,它的秩(rank)等于两个输入数组的秩的和。它会作用于两个输入数组之间存在的所有元素对。" 314 | ] 315 | }, 316 | { 317 | "cell_type": "code", 318 | "execution_count": 11, 319 | "metadata": { 320 | "collapsed": false 321 | }, 322 | "outputs": [ 323 | { 324 | "name": "stdout", 325 | "output_type": "stream", 326 | "text": [ 327 | "Outer:\n", 328 | "[[ 0 1 2 3 4 5 6 7 8]\n", 329 | " [ 1 2 3 4 5 6 7 8 9]\n", 330 | " [ 2 3 4 5 6 7 8 9 10]]\n" 331 | ] 332 | } 333 | ], 334 | "source": [ 335 | "print \"Outer:\\n\", np.add.outer(np.arange(3), a)" 336 | ] 337 | }, 338 | { 339 | "cell_type": "markdown", 340 | "metadata": {}, 341 | "source": [ 342 | "##2. 
Array division" 343 | ] 344 | }, 345 | { 346 | "cell_type": "markdown", 347 | "metadata": {}, 348 | "source": [ 349 | "In NumPy, the arithmetic operators +, - and * are implicitly tied to the universal functions add, subtract and multiply. In other words, when you apply these operators to NumPy arrays, the corresponding universal functions are called automatically.\n", 350 | "\n", 351 | "Division is more involved: array division involves three universal functions, divide, true_divide and floor_divide, and two corresponding operators, / and //." 352 | ] 353 | }, 354 | { 355 | "cell_type": "markdown", 356 | "metadata": {}, 357 | "source": [ 358 | "###(1) For integer operands, divide keeps only the integer part (Python 2 behavior, as in this notebook; under Python 3, divide performs true division)" 359 | ] 360 | }, 361 | { 362 | "cell_type": "code", 363 | "execution_count": 12, 364 | "metadata": { 365 | "collapsed": false 366 | }, 367 | "outputs": [ 368 | { 369 | "name": "stdout", 370 | "output_type": "stream", 371 | "text": [ 372 | "Divide:\n", 373 | "[2 3 1] [0 0 0]\n" 374 | ] 375 | } 376 | ], 377 | "source": [ 378 | "a = np.array([2, 6, 5])\n", 379 | "b = np.array([1, 2, 3])\n", 380 | "print \"Divide:\\n\", np.divide(a, b), np.divide(b, a)" 381 | ] 382 | }, 383 | { 384 | "cell_type": "markdown", 385 | "metadata": {}, 386 | "source": [ 387 | "The fractional part of the result is truncated." 388 | ] 389 | }, 390 | { 391 | "cell_type": "code", 392 | "execution_count": 13, 393 | "metadata": { 394 | "collapsed": false 395 | }, 396 | "outputs": [ 397 | { 398 | "name": "stdout", 399 | "output_type": "stream", 400 | "text": [ 401 | "Divide:\n", 402 | "[ 2.1 3.1 2.63157895] [ 0.47619048 0.32258065 0.38 ]\n" 403 | ] 404 | } 405 | ], 406 | "source": [ 407 | "c = np.array([2.1, 6.2, 5.0])\n", 408 | "d = np.array([1, 2, 1.9])\n", 409 | "print \"Divide:\\n\", np.divide(c, d), np.divide(d, c)" 410 | ] 411 | }, 412 | { 413 | "cell_type": "markdown", 414 | "metadata": {}, 415 | "source": [ 416 | "If either operand is a floating-point number, divide returns a floating-point result." 417 | ] 418 | }, 419 | { 420 | "cell_type": "markdown", 421 | "metadata": {}, 422 | "source": [ 423 | "###(2) true_divide is closer to the mathematical definition of division: it returns the floating-point quotient without truncation" 424 | ] 425 | }, 426 | { 427 | "cell_type": "code", 428 | "execution_count": 14, 429 | "metadata": { 430 | "collapsed": false 431 | }, 432 | "outputs": [ 433 | { 434 | "name": "stdout", 435 | 
"output_type": "stream", 436 | "text": [ 437 | "True Divide:\n", 438 | "[ 2. 3. 1.66666667] [ 0.5 0.33333333 0.6 ]\n" 439 | ] 440 | } 441 | ], 442 | "source": [ 443 | "print \"True Divide:\\n\", np.true_divide(a, b), np.true_divide(b, a)" 444 | ] 445 | }, 446 | { 447 | "cell_type": "markdown", 448 | "metadata": {}, 449 | "source": [ 450 | "###(3) floor_divide always returns an integer-valued result, equivalent to dividing and then applying floor.\n", 451 | "floor rounds a floating-point number **down** to the nearest integer value." 452 | ] 453 | }, 454 | { 455 | "cell_type": "code", 456 | "execution_count": 15, 457 | "metadata": { 458 | "collapsed": false 459 | }, 460 | "outputs": [ 461 | { 462 | "name": "stdout", 463 | "output_type": "stream", 464 | "text": [ 465 | "Floor Divide:\n", 466 | "[2 3 1] [0 0 0]\n" 467 | ] 468 | } 469 | ], 470 | "source": [ 471 | "print \"Floor Divide:\\n\", np.floor_divide(a, b), np.floor_divide(b, a)" 472 | ] 473 | }, 474 | { 475 | "cell_type": "markdown", 476 | "metadata": {}, 477 | "source": [ 478 | "**By default in Python 2 (without `from __future__ import division`), the / operator is equivalent to calling divide, and the // operator corresponds to floor_divide**" 479 | ] 480 | }, 481 | { 482 | "cell_type": "markdown", 483 | "metadata": {}, 484 | "source": [ 485 | "##3. 
Array modulo operations\n", 486 | "To compute a modulus or remainder, use NumPy's mod, remainder and fmod functions, or the % operator." 487 | ] 488 | }, 489 | { 490 | "cell_type": "markdown", 491 | "metadata": {}, 492 | "source": [ 493 | "###(1) remainder returns the element-wise remainder of dividing the two arrays; for integer arrays, if the divisor is 0, it returns 0" 494 | ] 495 | }, 496 | { 497 | "cell_type": "code", 498 | "execution_count": 16, 499 | "metadata": { 500 | "collapsed": false 501 | }, 502 | "outputs": [ 503 | { 504 | "name": "stdout", 505 | "output_type": "stream", 506 | "text": [ 507 | "a:\n", 508 | "[-4 -3 -2 -1 0 1 2 3]\n", 509 | "Remainder:\n", 510 | "[0 1 0 1 0 1 0 1]\n" 511 | ] 512 | } 513 | ], 514 | "source": [ 515 | "a = np.arange(-4,4)\n", 516 | "print \"a:\\n\", a\n", 517 | "print \"Remainder:\\n\", np.remainder(a, 2)" 518 | ] 519 | }, 520 | { 521 | "cell_type": "markdown", 522 | "metadata": {}, 523 | "source": [ 524 | "**mod behaves exactly like remainder, and the % operator is just shorthand for remainder**" 525 | ] 526 | }, 527 | { 528 | "cell_type": "markdown", 529 | "metadata": {}, 530 | "source": [ 531 | "###(2) fmod handles negative numbers differently from remainder: the sign of the result follows the dividend and does not depend on the sign of the divisor" 532 | ] 533 | }, 534 | { 535 | "cell_type": "code", 536 | "execution_count": 17, 537 | "metadata": { 538 | "collapsed": false 539 | }, 540 | "outputs": [ 541 | { 542 | "name": "stdout", 543 | "output_type": "stream", 544 | "text": [ 545 | "Fmod:\n", 546 | "[ 0 -1 0 -1 0 1 0 1]\n" 547 | ] 548 | } 549 | ], 550 | "source": [ 551 | "print \"Fmod:\\n\", np.fmod(a, 2)" 552 | ] 553 | }, 554 | { 555 | "cell_type": "code", 556 | "execution_count": 18, 557 | "metadata": { 558 | "collapsed": false 559 | }, 560 | "outputs": [ 561 | { 562 | "name": "stdout", 563 | "output_type": "stream", 564 | "text": [ 565 | "[ 0 -1 0 -1 0 1 0 1]\n" 566 | ] 567 | } 568 | ], 569 | "source": [ 570 | "print np.fmod(a, -2)" 571 | ] 572 | }, 573 | { 574 | "cell_type": "markdown", 575 | "metadata": {}, 576 | "source": [ 577 | "##4. 
位操作函数和比较函数\n", 578 | "位操作函数可以在整数或整数数组的位上进行操作,它们都是通用函数。\n", 579 | "\n", 580 | "位操作符:^、&、|、<<、>>等。\n", 581 | "\n", 582 | "比较操作符:<、>、==等。" 583 | ] 584 | }, 585 | { 586 | "cell_type": "markdown", 587 | "metadata": {}, 588 | "source": [ 589 | "###4.1 检查两个整数的符号是否一致\n", 590 | "这里要用到XOR或者^操作符。XOR操作符又称为**不等运算符**,因此当两个操作数的符号不一致时,XOR操作的结果为负数。\n", 591 | "\n", 592 | "在NumPy中,^操作符对应于bitwise_xor函数,<操作符对应于less函数。" 593 | ] 594 | }, 595 | { 596 | "cell_type": "code", 597 | "execution_count": 19, 598 | "metadata": { 599 | "collapsed": false 600 | }, 601 | "outputs": [ 602 | { 603 | "name": "stdout", 604 | "output_type": "stream", 605 | "text": [ 606 | "Sign different? [ True True True True True True True True True False True True\n", 607 | " True True True True True True]\n", 608 | "Sign different? [ True True True True True True True True True False True True\n", 609 | " True True True True True True]\n" 610 | ] 611 | } 612 | ], 613 | "source": [ 614 | "x = np.arange(-9, 9)\n", 615 | "y = -x\n", 616 | "print \"Sign different? \", (x^y) < 0\n", 617 | "print \"Sign different? 
\", np.less(np.bitwise_xor(x, y), 0)" 618 | ] 619 | }, 620 | { 621 | "cell_type": "markdown", 622 | "metadata": {}, 623 | "source": [ 624 | "除了等于0的情况,所有整数对的符号都不一样。" 625 | ] 626 | }, 627 | { 628 | "cell_type": "markdown", 629 | "metadata": {}, 630 | "source": [ 631 | "###4.2 检查一个数是否为2的幂数\n", 632 | "在二进制数中,2的幂数表示为一个1后面跟着一串0的形式。**如果在2的幂数以及比它小1的数之间进行位与操作AND,那么应该等于0。**\n", 633 | "\n", 634 | "在NumPy中,&操作符对应于bitwise_and函数,==操作符对应于equal函数。" 635 | ] 636 | }, 637 | { 638 | "cell_type": "code", 639 | "execution_count": 20, 640 | "metadata": { 641 | "collapsed": false 642 | }, 643 | "outputs": [ 644 | { 645 | "name": "stdout", 646 | "output_type": "stream", 647 | "text": [ 648 | "[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19]\n", 649 | "Power of 2 ?\n", 650 | "[ True True True False True False False False True False False False\n", 651 | " False False False False True False False False]\n", 652 | "Power of 2 ?\n", 653 | "[ True True True False True False False False True False False False\n", 654 | " False False False False True False False False]\n" 655 | ] 656 | } 657 | ], 658 | "source": [ 659 | "b = np.arange(20)\n", 660 | "print b\n", 661 | "print \"Power of 2 ?\\n\", (b & (b-1)) == 0\n", 662 | "print \"Power of 2 ?\\n\", np.equal(np.bitwise_and(b, (b-1)), 0)" 663 | ] 664 | }, 665 | { 666 | "cell_type": "markdown", 667 | "metadata": {}, 668 | "source": [ 669 | "###4.3 计算一个数被2的幂数整除后的余数\n", 670 | "计算余数的技巧只在模为2的幂数时有效。二进制的位左移一位,数值翻倍。\n", 671 | "\n", 672 | "**上一个例子看到,将2的幂数减去1,得到一串1组成的二进制数,这为我们提供了掩码,与这样的掩码做位与操作,即可得到以2的幂数作为模的余数。**\n", 673 | "\n", 674 | "在NumPy中,<<操作符对应于left_shift函数。" 675 | ] 676 | }, 677 | { 678 | "cell_type": "code", 679 | "execution_count": 21, 680 | "metadata": { 681 | "collapsed": false 682 | }, 683 | "outputs": [ 684 | { 685 | "name": "stdout", 686 | "output_type": "stream", 687 | "text": [ 688 | "Modulus 4:\n", 689 | "[3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0]\n" 690 | ] 691 | } 692 | ], 693 | "source": [ 694 | "print \"Modulus 4:\\n\", x & ((1<<2) - 1)" 
695 | ] 696 | }, 697 | { 698 | "cell_type": "code", 699 | "execution_count": 25, 700 | "metadata": { 701 | "collapsed": false 702 | }, 703 | "outputs": [], 704 | "source": [ 705 | "def mod_2_pow(x, n):\n", 706 | " mod = x & ((1<" 686 | ] 687 | }, 688 | "execution_count": 26, 689 | "metadata": {}, 690 | "output_type": "execute_result" 691 | } 692 | ], 693 | "source": [ 694 | "f = b.flat\n", 695 | "f" 696 | ] 697 | }, 698 | { 699 | "cell_type": "code", 700 | "execution_count": 27, 701 | "metadata": { 702 | "collapsed": false 703 | }, 704 | "outputs": [ 705 | { 706 | "name": "stdout", 707 | "output_type": "stream", 708 | "text": [ 709 | "0\n", 710 | "1\n", 711 | "2\n", 712 | "3\n" 713 | ] 714 | } 715 | ], 716 | "source": [ 717 | "for item in f:\n", 718 | " print item" 719 | ] 720 | }, 721 | { 722 | "cell_type": "code", 723 | "execution_count": 28, 724 | "metadata": { 725 | "collapsed": false 726 | }, 727 | "outputs": [ 728 | { 729 | "data": { 730 | "text/plain": [ 731 | "2" 732 | ] 733 | }, 734 | "execution_count": 28, 735 | "metadata": {}, 736 | "output_type": "execute_result" 737 | } 738 | ], 739 | "source": [ 740 | "#使用flatiter对象直接获取一个数组的元素\n", 741 | "b.flat[2]" 742 | ] 743 | }, 744 | { 745 | "cell_type": "code", 746 | "execution_count": 29, 747 | "metadata": { 748 | "collapsed": false 749 | }, 750 | "outputs": [ 751 | { 752 | "data": { 753 | "text/plain": [ 754 | "array([1, 3])" 755 | ] 756 | }, 757 | "execution_count": 29, 758 | "metadata": {}, 759 | "output_type": "execute_result" 760 | } 761 | ], 762 | "source": [ 763 | "b.flat[[1,3]]" 764 | ] 765 | }, 766 | { 767 | "cell_type": "code", 768 | "execution_count": 30, 769 | "metadata": { 770 | "collapsed": true 771 | }, 772 | "outputs": [], 773 | "source": [ 774 | "#对flat属性赋值将导致整个数组的元素被覆盖\n", 775 | "b.flat = 7" 776 | ] 777 | }, 778 | { 779 | "cell_type": "code", 780 | "execution_count": 31, 781 | "metadata": { 782 | "collapsed": false 783 | }, 784 | "outputs": [ 785 | { 786 | "data": { 787 | "text/plain": [ 788 | 
"array([[7, 7],\n", 789 | " [7, 7]])" 790 | ] 791 | }, 792 | "execution_count": 31, 793 | "metadata": {}, 794 | "output_type": "execute_result" 795 | } 796 | ], 797 | "source": [ 798 | "b" 799 | ] 800 | }, 801 | { 802 | "cell_type": "code", 803 | "execution_count": 34, 804 | "metadata": { 805 | "collapsed": false 806 | }, 807 | "outputs": [ 808 | { 809 | "data": { 810 | "text/plain": [ 811 | "array([[7, 1],\n", 812 | " [7, 1]])" 813 | ] 814 | }, 815 | "execution_count": 34, 816 | "metadata": {}, 817 | "output_type": "execute_result" 818 | } 819 | ], 820 | "source": [ 821 | "b.flat[[1,3]] = 1\n", 822 | "b" 823 | ] 824 | }, 825 | { 826 | "cell_type": "markdown", 827 | "metadata": {}, 828 | "source": [ 829 | "##4. 数组的转换\n", 830 | "**tolist函数将numpy数组转换成python列表**" 831 | ] 832 | }, 833 | { 834 | "cell_type": "code", 835 | "execution_count": 35, 836 | "metadata": { 837 | "collapsed": false 838 | }, 839 | "outputs": [ 840 | { 841 | "data": { 842 | "text/plain": [ 843 | "array([[7, 1],\n", 844 | " [7, 1]])" 845 | ] 846 | }, 847 | "execution_count": 35, 848 | "metadata": {}, 849 | "output_type": "execute_result" 850 | } 851 | ], 852 | "source": [ 853 | "b" 854 | ] 855 | }, 856 | { 857 | "cell_type": "code", 858 | "execution_count": 36, 859 | "metadata": { 860 | "collapsed": false 861 | }, 862 | "outputs": [ 863 | { 864 | "data": { 865 | "text/plain": [ 866 | "[[7, 1], [7, 1]]" 867 | ] 868 | }, 869 | "execution_count": 36, 870 | "metadata": {}, 871 | "output_type": "execute_result" 872 | } 873 | ], 874 | "source": [ 875 | "b.tolist()" 876 | ] 877 | }, 878 | { 879 | "cell_type": "markdown", 880 | "metadata": {}, 881 | "source": [ 882 | "**astype函数可以在转换数组时指定数据类型**" 883 | ] 884 | }, 885 | { 886 | "cell_type": "code", 887 | "execution_count": 37, 888 | "metadata": { 889 | "collapsed": false 890 | }, 891 | "outputs": [ 892 | { 893 | "data": { 894 | "text/plain": [ 895 | "array([[ 7., 1.],\n", 896 | " [ 7., 1.]])" 897 | ] 898 | }, 899 | "execution_count": 37, 900 | "metadata": {}, 
901 | "output_type": "execute_result" 902 | } 903 | ], 904 | "source": [ 905 | "b.astype(float)" 906 | ] 907 | }, 908 | { 909 | "cell_type": "code", 910 | "execution_count": null, 911 | "metadata": { 912 | "collapsed": true 913 | }, 914 | "outputs": [], 915 | "source": [] 916 | } 917 | ], 918 | "metadata": { 919 | "kernelspec": { 920 | "display_name": "Python 2", 921 | "language": "python", 922 | "name": "python2" 923 | }, 924 | "language_info": { 925 | "codemirror_mode": { 926 | "name": "ipython", 927 | "version": 2 928 | }, 929 | "file_extension": ".py", 930 | "mimetype": "text/x-python", 931 | "name": "python", 932 | "nbconvert_exporter": "python", 933 | "pygments_lexer": "ipython2", 934 | "version": "2.7.5" 935 | } 936 | }, 937 | "nbformat": 4, 938 | "nbformat_minor": 0 939 | } 940 | -------------------------------------------------------------------------------- /NumPy/.ipynb_checkpoints/(3)common_functions1-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "##内容索引\n", 8 | "这一小节介绍NumPy的常用函数。\n", 9 | "\n", 10 | "1. 读入csv\n", 11 | "loadtxt函数\n", 12 | "\n", 13 | "2. 计算平均值\n", 14 | "average、mean函数\n", 15 | "\n", 16 | "3. 求最大最小值\n", 17 | "max、min函数\n", 18 | "\n", 19 | "4. 计算中位数\n", 20 | "median、msort函数\n", 21 | "\n", 22 | "5. 计算方差\n", 23 | "var函数" 24 | ] 25 | }, 26 | { 27 | "cell_type": "markdown", 28 | "metadata": {}, 29 | "source": [ 30 | "##1. 
读写CSV文件\n", 31 | "CSV(Comma-Separated Value,逗号分隔值)格式是一种常见的文件格式。通常,数据库的转存文件就是csv格式的,文件中的各个字段对应于数据库中的列。" 32 | ] 33 | }, 34 | { 35 | "cell_type": "markdown", 36 | "metadata": {}, 37 | "source": [ 38 | "**Numpy中的loadtxt函数可以方便地读取csv文件,自动切分字段,并将数据载入Numpy数组。**" 39 | ] 40 | }, 41 | { 42 | "cell_type": "code", 43 | "execution_count": 1, 44 | "metadata": { 45 | "collapsed": true 46 | }, 47 | "outputs": [], 48 | "source": [ 49 | "import numpy as np" 50 | ] 51 | }, 52 | { 53 | "cell_type": "code", 54 | "execution_count": 2, 55 | "metadata": { 56 | "collapsed": true 57 | }, 58 | "outputs": [], 59 | "source": [ 60 | "c, v = np.loadtxt('data.csv', delimiter=',', usecols=(6,7), unpack=True)" 61 | ] 62 | }, 63 | { 64 | "cell_type": "markdown", 65 | "metadata": {}, 66 | "source": [ 67 | "data.csv文件是苹果公司的历史股价数据。第一列为股票代码,第二列为dd-mm-yyyy格式的日期,第三列为空,随后各列依次是**开盘价(4)、最高价(5)、最低价(6)和收盘价(7)**,最后一列为当日的成交量(8)。" 68 | ] 69 | }, 70 | { 71 | "cell_type": "markdown", 72 | "metadata": {}, 73 | "source": [ 74 | "loadtxt函数中,*usecols参数*为一个元组,以获得第7字段至第8字段的数据,也就是股票的收盘价和成交量数据。\n", 75 | "\n", 76 | "*unpack参数*设置为True,意思是分拆存储不同列的数据,即分别将收盘价和成交量的数组赋值给变量c和v。\n", 77 | "\n", 78 | "**用usecols中的参数指定我们感兴趣的数据列,并将unpack参数设置为True使得不同列的数据分别存储。**" 79 | ] 80 | }, 81 | { 82 | "cell_type": "code", 83 | "execution_count": 3, 84 | "metadata": { 85 | "collapsed": false 86 | }, 87 | "outputs": [ 88 | { 89 | "data": { 90 | "text/plain": [ 91 | "array([ 336.1 , 339.32, 345.03, 344.32, 343.44, 346.5 , 351.88,\n", 92 | " 355.2 , 358.16, 354.54, 356.85, 359.18, 359.9 , 363.13,\n", 93 | " 358.3 , 350.56, 338.61, 342.62, 342.88, 348.16, 353.21,\n", 94 | " 349.31, 352.12, 359.56, 360. 
, 355.36, 355.76, 352.47,\n", 95 | " 346.67, 351.99])" 96 | ] 97 | }, 98 | "execution_count": 3, 99 | "metadata": {}, 100 | "output_type": "execute_result" 101 | } 102 | ], 103 | "source": [ 104 | "c" 105 | ] 106 | }, 107 | { 108 | "cell_type": "code", 109 | "execution_count": 4, 110 | "metadata": { 111 | "collapsed": false 112 | }, 113 | "outputs": [ 114 | { 115 | "data": { 116 | "text/plain": [ 117 | "array([ 21144800., 13473000., 15236800., 9242600., 14064100.,\n", 118 | " 11494200., 17322100., 13608500., 17240800., 33162400.,\n", 119 | " 13127500., 11086200., 10149000., 17184100., 18949000.,\n", 120 | " 29144500., 31162200., 23994700., 17853500., 13572000.,\n", 121 | " 14395400., 16290300., 21521000., 17885200., 16188000.,\n", 122 | " 19504300., 12718000., 16192700., 18138800., 16824200.])" 123 | ] 124 | }, 125 | "execution_count": 4, 126 | "metadata": {}, 127 | "output_type": "execute_result" 128 | } 129 | ], 130 | "source": [ 131 | "v" 132 | ] 133 | }, 134 | { 135 | "cell_type": "code", 136 | "execution_count": 18, 137 | "metadata": { 138 | "collapsed": false 139 | }, 140 | "outputs": [], 141 | "source": [ 142 | "# select the 4th column (zero-based index 3), the opening price\n", 143 | "opening_price = np.loadtxt('data.csv', delimiter=',', usecols=(3,), unpack=True)" 144 | ] 145 | }, 146 | { 147 | "cell_type": "code", 148 | "execution_count": 19, 149 | "metadata": { 150 | "collapsed": false 151 | }, 152 | "outputs": [ 153 | { 154 | "name": "stdout", 155 | "output_type": "stream", 156 | "text": [ 157 | "[ 344.17 335.8 341.3 344.45 343.8 343.61 347.89 353.68 355.19\n", 158 | " 357.39 354.75 356.79 359.19 360.8 357.1 358.21 342.05 338.77\n", 159 | " 344.02 345.29 351.21 355.47 349.96 357.2 360.07 361.11 354.91\n", 160 | " 354.69 349.69 345.4 ]\n" 161 | ] 162 | } 163 | ], 164 | "source": [ 165 | "print opening_price" 166 | ] 167 | }, 168 | { 169 | "cell_type": "markdown", 170 | "metadata": {}, 171 | "source": [ 172 | "## 2. 
Computing averages" 173 | ] 174 | }, 175 | { 176 | "cell_type": "markdown", 177 | "metadata": {}, 178 | "source": [ 179 | "### 2.1 Weighted average\n", 180 | "VWAP stands for Volume-Weighted Average Price: the higher the volume traded at a price, the more weight that price carries. **VWAP is the average of the prices weighted by trading volume.**" 181 | ] 182 | }, 183 | { 184 | "cell_type": "code", 185 | "execution_count": 20, 186 | "metadata": { 187 | "collapsed": false 188 | }, 189 | "outputs": [ 190 | { 191 | "name": "stdout", 192 | "output_type": "stream", 193 | "text": [ 194 | "VWAP = 350.589549353\n" 195 | ] 196 | } 197 | ], 198 | "source": [ 199 | "vwap = np.average(c, weights=v)\n", 200 | "print \"VWAP =\", vwap" 201 | ] 202 | }, 203 | { 204 | "cell_type": "markdown", 205 | "metadata": {}, 206 | "source": [ 207 | "TWAP stands for Time-Weighted Average Price. The basic idea is that recent prices matter more, so they should be given higher weights.\n", 208 | "We use the arange function to create a sequence of integers increasing from 0, one for each closing price, and use it as the weights." 209 | ] 210 | }, 211 | { 212 | "cell_type": "code", 213 | "execution_count": 23, 214 | "metadata": { 215 | "collapsed": false 216 | }, 217 | "outputs": [ 218 | { 219 | "name": "stdout", 220 | "output_type": "stream", 221 | "text": [ 222 | "twap = 352.428321839\n" 223 | ] 224 | } 225 | ], 226 | "source": [ 227 | "t = np.arange(len(c))\n", 228 | "print \"twap = \",np.average(c, weights=t)" 229 | ] 230 | }, 231 | { 232 | "cell_type": "markdown", 233 | "metadata": {}, 234 | "source": [ 235 | "### 2.2 Arithmetic mean\n", 236 | "Use the mean function to compute the arithmetic mean." 237 | ] 238 | }, 239 | { 240 | "cell_type": "code", 241 | "execution_count": 21, 242 | "metadata": { 243 | "collapsed": false 244 | }, 245 | "outputs": [ 246 | { 247 | "name": "stdout", 248 | "output_type": "stream", 249 | "text": [ 250 | "mean = 351.037666667\n" 251 | ] 252 | } 253 | ], 254 | "source": [ 255 | "mean = np.mean(c)\n", 256 | "print \"mean = \",mean" 257 | ] 258 | }, 259 | { 260 | "cell_type": "code", 261 | "execution_count": 32, 262 | "metadata": { 263 | "collapsed": false 264 | }, 265 | "outputs": [ 266 | { 267 | "name": "stdout", 268 | "output_type": "stream", 269 | "text": [ 270 | "mean = 
351.037666667\n" 271 | ] 272 | } 273 | ], 274 | "source": [ 275 | "print \"mean = \", c.mean()" 276 | ] 277 | }, 278 | { 279 | "cell_type": "markdown", 280 | "metadata": {}, 281 | "source": [ 282 | "## 3. Maximum, minimum, and range" 283 | ] 284 | }, 285 | { 286 | "cell_type": "markdown", 287 | "metadata": {}, 288 | "source": [ 289 | "Steps: load the daily high and low prices, then use the max and min functions to get the maximum and minimum." 290 | ] 291 | }, 292 | { 293 | "cell_type": "code", 294 | "execution_count": 24, 295 | "metadata": { 296 | "collapsed": false 297 | }, 298 | "outputs": [ 299 | { 300 | "name": "stdout", 301 | "output_type": "stream", 302 | "text": [ 303 | "highest = 364.9\n", 304 | "lowest = 333.53\n" 305 | ] 306 | } 307 | ], 308 | "source": [ 309 | "h,l = np.loadtxt('data.csv', delimiter=',', usecols=(4,5), unpack=True)\n", 310 | "print 'highest = ', np.max(h)\n", 311 | "print 'lowest = ', np.min(l)" 312 | ] 313 | }, 314 | { 315 | "cell_type": "markdown", 316 | "metadata": {}, 317 | "source": [ 318 | "NumPy's ptp function computes the range of an array: it returns the difference between the largest and smallest elements, that is, max(array) - min(array)." 319 | ] 320 | }, 321 | { 322 | "cell_type": "code", 323 | "execution_count": 25, 324 | "metadata": { 325 | "collapsed": false 326 | }, 327 | "outputs": [ 328 | { 329 | "name": "stdout", 330 | "output_type": "stream", 331 | "text": [ 332 | "Spread high price : 24.86\n", 333 | "Spread low price : 26.97\n" 334 | ] 335 | } 336 | ], 337 | "source": [ 338 | "print 'Spread high price : ', np.ptp(h)\n", 339 | "print 'Spread low price : ', np.ptp(l)" 340 | ] 341 | }, 342 | { 343 | "cell_type": "markdown", 344 | "metadata": {}, 345 | "source": [ 346 | "## 4. 
Computing the median" 347 | ] 348 | }, 349 | { 350 | "cell_type": "markdown", 351 | "metadata": {}, 352 | "source": [ 353 | "Compute the median of the closing prices." 354 | ] 355 | }, 356 | { 357 | "cell_type": "code", 358 | "execution_count": 26, 359 | "metadata": { 360 | "collapsed": false 361 | }, 362 | "outputs": [ 363 | { 364 | "name": "stdout", 365 | "output_type": "stream", 366 | "text": [ 367 | "median = 352.055\n" 368 | ] 369 | } 370 | ], 371 | "source": [ 372 | "closing_price = np.loadtxt('data.csv', delimiter=',', usecols=(6,), unpack=True)\n", 373 | "print 'median = ', np.median(closing_price)" 374 | ] 375 | }, 376 | { 377 | "cell_type": "markdown", 378 | "metadata": {}, 379 | "source": [ 380 | "Alternatively, sort the array first and then take the median." 381 | ] 382 | }, 383 | { 384 | "cell_type": "code", 385 | "execution_count": 27, 386 | "metadata": { 387 | "collapsed": false 388 | }, 389 | "outputs": [ 390 | { 391 | "name": "stdout", 392 | "output_type": "stream", 393 | "text": [ 394 | "sorted_closing_price = [ 336.1 338.61 339.32 342.62 342.88 343.44 344.32 345.03 346.5\n", 395 | " 346.67 348.16 349.31 350.56 351.88 351.99 352.12 352.47 353.21\n", 396 | " 354.54 355.2 355.36 355.76 356.85 358.16 358.3 359.18 359.56\n", 397 | " 359.9 360. 
363.13]\n" 398 | ] 399 | } 400 | ], 401 | "source": [ 402 | "sorted_closing = np.msort(closing_price)\n", 403 | "print \"sorted_closing_price = \", sorted_closing" 404 | ] 405 | }, 406 | { 407 | "cell_type": "code", 408 | "execution_count": 29, 409 | "metadata": { 410 | "collapsed": false 411 | }, 412 | "outputs": [ 413 | { 414 | "name": "stdout", 415 | "output_type": "stream", 416 | "text": [ 417 | "median = 352.055\n" 418 | ] 419 | } 420 | ], 421 | "source": [ 422 | "# first check whether the number of elements is odd or even\n", 423 | "N = len(closing_price)\n", 424 | "median_ind = (N-1)//2\n", 425 | "if N & 0x1 :\n", 426 | " print \"median = \", sorted_closing[median_ind]\n", 427 | "else:\n", 428 | " print \"median = \", (sorted_closing[median_ind]+sorted_closing[median_ind+1])/2" 429 | ] 430 | }, 431 | { 432 | "cell_type": "markdown", 433 | "metadata": {}, 434 | "source": [ 435 | "## 5. Computing the variance\n", 436 | "Variance measures how much a variable fluctuates. A stock whose price swings too wildly is bound to cause trouble for its holders." 437 | ] 438 | }, 439 | { 440 | "cell_type": "code", 441 | "execution_count": 30, 442 | "metadata": { 443 | "collapsed": false 444 | }, 445 | "outputs": [ 446 | { 447 | "name": "stdout", 448 | "output_type": "stream", 449 | "text": [ 450 | "variance = 50.1265178889\n" 451 | ] 452 | } 453 | ], 454 | "source": [ 455 | "print \"variance = \", np.var(closing_price)" 456 | ] 457 | }, 458 | { 459 | "cell_type": "code", 460 | "execution_count": 31, 461 | "metadata": { 462 | "collapsed": false 463 | }, 464 | "outputs": [ 465 | { 466 | "name": "stdout", 467 | "output_type": "stream", 468 | "text": [ 469 | "variance from definition = 50.1265178889\n" 470 | ] 471 | } 472 | ], 473 | "source": [ 474 | "# compute the variance by hand from its definition\n", 475 | "print 'variance from definition = ', np.mean( (closing_price-closing_price.mean())**2 )" 476 | ] 477 | } 478 | ], 479 | "metadata": { 480 | "kernelspec": { 481 | "display_name": "Python 2", 482 | "language": "python", 483 | "name": "python2" 484 | }, 485 | "language_info": { 486 | "codemirror_mode": { 487 | "name": "ipython", 488 | "version": 2 489 | }, 490 | "file_extension": 
".py", 491 | "mimetype": "text/x-python", 492 | "name": "python", 493 | "nbconvert_exporter": "python", 494 | "pygments_lexer": "ipython2", 495 | "version": "2.7.5" 496 | } 497 | }, 498 | "nbformat": 4, 499 | "nbformat_minor": 0 500 | } 501 | -------------------------------------------------------------------------------- /NumPy/.ipynb_checkpoints/(6)linear_algebra-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "## Contents\n", 8 | "1. Matrices --- the mat function\n", 9 | "2. Linear algebra --- \n", 10 | "From numpy.linalg: inv for the matrix inverse, det for the determinant, solve for systems of linear equations, eigvals and eig for eigendecomposition, svd for singular value decomposition, and pinv for the pseudo-inverse, together with NumPy's dot function for the inner product" 11 | ] 12 | }, 13 | { 14 | "cell_type": "code", 15 | "execution_count": 1, 16 | "metadata": { 17 | "collapsed": true 18 | }, 19 | "outputs": [], 20 | "source": [ 21 | "import numpy as np" 22 | ] 23 | }, 24 | { 25 | "cell_type": "markdown", 26 | "metadata": {}, 27 | "source": [ 28 | "## 1. 
Matrices\n", 29 | "In NumPy, matrix is a subclass of ndarray, and a matrix can be created from a special string format. We can create matrices with the mat, matrix, and bmat functions." 30 | ] 31 | }, 32 | { 33 | "cell_type": "markdown", 34 | "metadata": { 35 | "collapsed": true 36 | }, 37 | "source": [ 38 | "### 1.1 Creating a matrix\n", 39 | "When mat is given an input that is already a matrix or ndarray object, it does not make a copy. Calling mat(data) is therefore equivalent to calling matrix(data, copy=False)." 40 | ] 41 | }, 42 | { 43 | "cell_type": "markdown", 44 | "metadata": {}, 45 | "source": [ 46 | "**In the string format, rows are separated by semicolons and the elements within a row are separated by spaces.**" 47 | ] 48 | }, 49 | { 50 | "cell_type": "code", 51 | "execution_count": 2, 52 | "metadata": { 53 | "collapsed": false 54 | }, 55 | "outputs": [ 56 | { 57 | "name": "stdout", 58 | "output_type": "stream", 59 | "text": [ 60 | "Creation from string:\n", 61 | "[[1 2 3]\n", 62 | " [4 5 6]\n", 63 | " [7 8 9]]\n" 64 | ] 65 | } 66 | ], 67 | "source": [ 68 | "A = np.mat('1 2 3; 4 5 6; 7 8 9')\n", 69 | "print \"Creation from string:\\n\", A" 70 | ] 71 | }, 72 | { 73 | "cell_type": "code", 74 | "execution_count": 3, 75 | "metadata": { 76 | "collapsed": false 77 | }, 78 | "outputs": [ 79 | { 80 | "name": "stdout", 81 | "output_type": "stream", 82 | "text": [ 83 | "Transpose A :\n", 84 | "[[1 4 7]\n", 85 | " [2 5 8]\n", 86 | " [3 6 9]]\n", 87 | "Inverse A :\n", 88 | "[[ -4.50359963e+15 9.00719925e+15 -4.50359963e+15]\n", 89 | " [ 9.00719925e+15 -1.80143985e+16 9.00719925e+15]\n", 90 | " [ -4.50359963e+15 9.00719925e+15 -4.50359963e+15]]\n" 91 | ] 92 | } 93 | ], 94 | "source": [ 95 | "# transpose\n", 96 | "print \"Transpose A :\\n\", A.T\n", 97 | "\n", 98 | "# inverse (this A is singular, so the huge values printed are numerical noise)\n", 99 | "print \"Inverse A :\\n\", A.I" 100 | ] 101 | }, 102 | { 103 | "cell_type": "code", 104 | "execution_count": 4, 105 | "metadata": { 106 | "collapsed": false 107 | }, 108 | "outputs": [ 109 | { 110 | "name": "stdout", 111 | "output_type": "stream", 112 | "text": [ 113 | "Creation from array: \n", 114 | "[[0 1 2]\n", 115 | " [3 4 5]\n", 116 | " [6 7 8]]\n" 117 | ] 118 | } 119 | ], 120 | "source": [ 121 | "# create a matrix from a NumPy array\n", 122 | "print \"Creation from array: \\n\", 
np.mat(np.arange(9).reshape(3,3))" 123 | ] 124 | }, 125 | { 126 | "cell_type": "markdown", 127 | "metadata": {}, 128 | "source": [ 129 | "### 1.2 Building a matrix from existing matrices\n", 130 | "We can assemble a larger matrix from existing smaller ones with the bmat function." 131 | ] 132 | }, 133 | { 134 | "cell_type": "code", 135 | "execution_count": 5, 136 | "metadata": { 137 | "collapsed": false 138 | }, 139 | "outputs": [ 140 | { 141 | "name": "stdout", 142 | "output_type": "stream", 143 | "text": [ 144 | "A:\n", 145 | "[[ 1. 0.]\n", 146 | " [ 0. 1.]]\n", 147 | "B:\n", 148 | "[[ 2. 0.]\n", 149 | " [ 0. 2.]]\n", 150 | "Compound matrix:\n", 151 | "[[ 1. 0. 2. 0.]\n", 152 | " [ 0. 1. 0. 2.]]\n", 153 | "Compound matrix:\n", 154 | "[[ 1. 0. 2. 0.]\n", 155 | " [ 0. 1. 0. 2.]\n", 156 | " [ 2. 0. 1. 0.]\n", 157 | " [ 0. 2. 0. 1.]]\n" 158 | ] 159 | } 160 | ], 161 | "source": [ 162 | "A = np.eye(2)\n", 163 | "print \"A:\\n\", A\n", 164 | "\n", 165 | "B = 2 * A\n", 166 | "print \"B:\\n\", B\n", 167 | "\n", 168 | "# build compound matrices from a string\n", 169 | "print \"Compound matrix:\\n\", np.bmat(\"A B\")\n", 170 | "print \"Compound matrix:\\n\", np.bmat(\"A B; B A\")" 171 | ] 172 | }, 173 | { 174 | "cell_type": "markdown", 175 | "metadata": { 176 | "collapsed": true 177 | }, 178 | "source": [ 179 | "## 2. Linear algebra\n", 180 | "Linear algebra is an important branch of mathematics. The numpy.linalg module contains linear algebra functions; with it we can compute matrix inverses, eigenvalues, and determinants, and solve systems of linear equations." 181 | ] 182 | }, 183 | { 184 | "cell_type": "markdown", 185 | "metadata": {}, 186 | "source": [ 187 | "### 2.1 Matrix inverse\n", 188 | "Use the inv function to compute the inverse of a matrix." 189 | ] 190 | }, 191 | { 192 | "cell_type": "code", 193 | "execution_count": 6, 194 | "metadata": { 195 | "collapsed": false 196 | }, 197 | "outputs": [ 198 | { 199 | "name": "stdout", 200 | "output_type": "stream", 201 | "text": [ 202 | "A:\n", 203 | "[[ 0 1 2]\n", 204 | " [ 1 0 3]\n", 205 | " [ 4 -3 8]]\n", 206 | "inverse of A:\n", 207 | "[[-4.5 7. -1.5]\n", 208 | " [-2. 4. -1. ]\n", 209 | " [ 1.5 -2. 0.5]]\n", 210 | "check inverse:\n", 211 | "[[ 1. 0. 0.]\n", 212 | " [ 0. 1. 0.]\n", 213 | " [ 0. 0. 
1.]]\n" 214 | ] 215 | } 216 | ], 217 | "source": [ 218 | "A = np.mat(\"0 1 2; 1 0 3; 4 -3 8\")\n", 219 | "print \"A:\\n\", A\n", 220 | "inverse = np.linalg.inv(A)\n", 221 | "print \"inverse of A:\\n\", inverse\n", 222 | "\n", 223 | "print \"check inverse:\\n\", inverse * A" 224 | ] 225 | }, 226 | { 227 | "cell_type": "markdown", 228 | "metadata": {}, 229 | "source": [ 230 | "### 2.2 Determinant\n", 231 | "The determinant is a scalar value associated with a square matrix. For an n*n real matrix, **the determinant describes how the corresponding linear transformation scales “signed volume”. A positive determinant means the orientation of space (clockwise or counterclockwise) is preserved; a negative one means it is reversed.** The det function in numpy.linalg computes the determinant of a matrix." 232 | ] 233 | }, 234 | { 235 | "cell_type": "code", 236 | "execution_count": 7, 237 | "metadata": { 238 | "collapsed": false 239 | }, 240 | "outputs": [ 241 | { 242 | "name": "stdout", 243 | "output_type": "stream", 244 | "text": [ 245 | "A:\n", 246 | "[[3 4]\n", 247 | " [5 6]]\n", 248 | "Determinant:\n", 249 | "-2.0\n" 250 | ] 251 | } 252 | ], 253 | "source": [ 254 | "A = np.mat(\"3 4; 5 6\")\n", 255 | "print \"A:\\n\", A\n", 256 | "\n", 257 | "print \"Determinant:\\n\", np.linalg.det(A)" 258 | ] 259 | }, 260 | { 261 | "cell_type": "markdown", 262 | "metadata": {}, 263 | "source": [ 264 | "### 2.3 Solving linear systems\n", 265 | "A matrix applies a linear transformation to a vector, which corresponds to a system of linear equations. The solve function solves systems of the form Ax = b, where A is a matrix, b is a one- or two-dimensional array, and x is the unknown." 266 | ] 267 | }, 268 | { 269 | "cell_type": "code", 270 | "execution_count": 8, 271 | "metadata": { 272 | "collapsed": false 273 | }, 274 | "outputs": [ 275 | { 276 | "name": "stdout", 277 | "output_type": "stream", 278 | "text": [ 279 | "A:\n", 280 | "[[ 1 -2 1]\n", 281 | " [ 0 2 -8]\n", 282 | " [-4 5 9]]\n", 283 | "b:\n", 284 | "[ 0 8 -9]\n" 285 | ] 286 | } 287 | ], 288 | "source": [ 289 | "A = np.mat(\"1 -2 1; 0 2 -8; -4 5 9\")\n", 290 | "print \"A:\\n\", A\n", 291 | "b = np.array([0,8,-9])\n", 292 | "print \"b:\\n\", b" 293 | ] 294 | }, 295 | { 296 | "cell_type": "code", 297 | "execution_count": 9, 298 | "metadata": { 299 | "collapsed": false 300 | }, 301 | "outputs": [ 302 | { 303 | "name": "stdout", 304 | "output_type": "stream", 305 | "text": 
[ 306 | "Solution:\n", 307 | "[ 29. 16. 3.]\n", 308 | "Check:\n", 309 | "[[ True True True]]\n", 310 | "[[ 0. 8. -9.]]\n" 311 | ] 312 | } 313 | ], 314 | "source": [ 315 | "x = np.linalg.solve(A, b)\n", 316 | "print \"Solution:\\n\", x\n", 317 | "\n", 318 | "# check\n", 319 | "print \"Check:\\n\",b == np.dot(A, x)\n", 320 | "print np.dot(A, x)" 321 | ] 322 | }, 323 | { 324 | "cell_type": "markdown", 325 | "metadata": {}, 326 | "source": [ 327 | "### 2.4 Eigenvalues and eigenvectors\n", 328 | "An eigenvalue is a scalar a that satisfies the equation Ax = ax for some nonzero vector x, where A is a two-dimensional (square) matrix and x is a one-dimensional vector. Such a vector x is the eigenvector associated with that eigenvalue. In numpy.linalg, the eigvals function computes the eigenvalues of a matrix, while eig returns a tuple containing the eigenvalues and the corresponding eigenvectors." 329 | ] 330 | }, 331 | { 332 | "cell_type": "code", 333 | "execution_count": 10, 334 | "metadata": { 335 | "collapsed": false 336 | }, 337 | "outputs": [ 338 | { 339 | "name": "stdout", 340 | "output_type": "stream", 341 | "text": [ 342 | "A:\n", 343 | "[[ 3 -2]\n", 344 | " [ 1 0]]\n", 345 | "Eigenvalues:\n", 346 | "[ 2. 1.]\n", 347 | "Eigenvalues:\n", 348 | "[ 2. 1.]\n", 349 | "Eigenvectors:\n", 350 | "[[ 0.89442719 0.70710678]\n", 351 | " [ 0.4472136 0.70710678]]\n" 352 | ] 353 | } 354 | ], 355 | "source": [ 356 | "A = np.mat(\"3 -2; 1 0\")\n", 357 | "print \"A:\\n\", A\n", 358 | "\n", 359 | "print \"Eigenvalues:\\n\", np.linalg.eigvals(A)\n", 360 | "\n", 361 | "eigenvalues, eigenvectors = np.linalg.eig(A)\n", 362 | "print \"Eigenvalues:\\n\", eigenvalues\n", 363 | "print \"Eigenvectors:\\n\", eigenvectors" 364 | ] 365 | }, 366 | { 367 | "cell_type": "code", 368 | "execution_count": 11, 369 | "metadata": { 370 | "collapsed": false 371 | }, 372 | "outputs": [ 373 | { 374 | "name": "stdout", 375 | "output_type": "stream", 376 | "text": [ 377 | "Left:\n", 378 | "[[ 1.78885438]\n", 379 | " [ 0.89442719]]\n", 380 | "Right:\n", 381 | "[[ 1.78885438]\n", 382 | " [ 0.89442719]]\n", 383 | "\n", 384 | "Left:\n", 385 | "[[ 0.70710678]\n", 386 | " [ 0.70710678]]\n", 387 | "Right:\n", 388 | "[[ 0.70710678]\n", 389 | " [ 0.70710678]]\n", 390 | "\n" 391 | ] 392 | } 
393 | ], 394 | "source": [ 395 | "# check\n", 396 | "# compute the left- and right-hand sides of Ax = ax\n", 397 | "for i in range(len(eigenvalues)):\n", 398 | " print \"Left:\\n\", np.dot(A, eigenvectors[:,i])\n", 399 | " print \"Right:\\n\", np.dot(eigenvalues[i], eigenvectors[:,i])\n", 400 | " print" 401 | ] 402 | }, 403 | { 404 | "cell_type": "markdown", 405 | "metadata": { 406 | "collapsed": true 407 | }, 408 | "source": [ 409 | "### 2.5 Singular value decomposition\n", 410 | "SVD (Singular Value Decomposition) is a factorization that writes a matrix as the product of three matrices. It generalizes the eigendecomposition.\n", 411 | "\n", 412 | "The svd function in numpy.linalg performs the decomposition. It returns three arrays, U, Sigma, and V, where U and V are orthogonal (with full_matrices=False they have orthonormal columns and rows respectively) and Sigma holds the singular values of the input matrix. The third return value is already the conjugate transpose, so the product of U, diag(Sigma), and V reconstructs the input." 413 | ] 414 | }, 415 | { 416 | "cell_type": "code", 417 | "execution_count": 12, 418 | "metadata": { 419 | "collapsed": false 420 | }, 421 | "outputs": [ 422 | { 423 | "data": { 424 | "text/latex": [ 425 | "$M=U \\Sigma V^*$" 426 | ], 427 | "text/plain": [ 428 | "" 429 | ] 430 | }, 431 | "execution_count": 12, 432 | "metadata": {}, 433 | "output_type": "execute_result" 434 | } 435 | ], 436 | "source": [ 437 | "from IPython.display import Latex\n", 438 | "Latex(r\"$M=U \\Sigma V^*$\")" 439 | ] 440 | }, 441 | { 442 | "cell_type": "markdown", 443 | "metadata": {}, 444 | "source": [ 445 | "The asterisk denotes the conjugate transpose." 446 | ] 447 | }, 448 | { 449 | "cell_type": "code", 450 | "execution_count": 13, 451 | "metadata": { 452 | "collapsed": false 453 | }, 454 | "outputs": [ 455 | { 456 | "name": "stdout", 457 | "output_type": "stream", 458 | "text": [ 459 | "A:\n", 460 | "[[ 4 11 14]\n", 461 | " [ 8 7 -2]]\n", 462 | "U:\n", 463 | "[[-0.9486833 -0.31622777]\n", 464 | " [-0.31622777 0.9486833 ]]\n", 465 | "Sigma:\n", 466 | "[ 18.97366596 9.48683298]\n", 467 | "V:\n", 468 | "[[-0.33333333 -0.66666667 -0.66666667]\n", 469 | " [ 0.66666667 0.33333333 -0.66666667]]\n" 470 | ] 471 | } 472 | ], 473 | "source": [ 474 | "A = np.mat(\"4 11 14;8 7 -2\")\n", 475 | "print \"A:\\n\", A\n", 476 | "\n", 477 | "U, Sigma, V = np.linalg.svd(A, full_matrices=False)\n", 478 | "print \"U:\\n\", U\n", 479 
| "print \"Sigma:\\n\", Sigma\n", 480 | "print \"V:\\n\", V" 481 | ] 482 | }, 483 | { 484 | "cell_type": "code", 485 | "execution_count": 14, 486 | "metadata": { 487 | "collapsed": false 488 | }, 489 | "outputs": [ 490 | { 491 | "data": { 492 | "text/plain": [ 493 | "array([[ 18.97366596, 0. ],\n", 494 | " [ 0. , 9.48683298]])" 495 | ] 496 | }, 497 | "execution_count": 14, 498 | "metadata": {}, 499 | "output_type": "execute_result" 500 | } 501 | ], 502 | "source": [ 503 | "# Sigma holds only the diagonal of the singular-value matrix; np.diag rebuilds the full matrix\n", 504 | "np.diag(Sigma)" 505 | ] 506 | }, 507 | { 508 | "cell_type": "code", 509 | "execution_count": 17, 510 | "metadata": { 511 | "collapsed": false 512 | }, 513 | "outputs": [ 514 | { 515 | "name": "stdout", 516 | "output_type": "stream", 517 | "text": [ 518 | "Product:\n", 519 | "[[ 4. 11. 14.]\n", 520 | " [ 8. 7. -2.]]\n" 521 | ] 522 | } 523 | ], 524 | "source": [ 525 | "# check\n", 526 | "M = U*np.diag(Sigma)*V\n", 527 | "print \"Product:\\n\", M" 528 | ] 529 | }, 530 | { 531 | "cell_type": "markdown", 532 | "metadata": {}, 533 | "source": [ 534 | "### 2.6 Pseudo-inverse\n", 535 | "The (Moore-Penrose) pseudo-inverse can be computed with the pinv function in numpy.linalg. The inv function only accepts square matrices, whereas pinv has no such restriction." 536 | ] 537 | }, 538 | { 539 | "cell_type": "code", 540 | "execution_count": 18, 541 | "metadata": { 542 | "collapsed": false 543 | }, 544 | "outputs": [ 545 | { 546 | "name": "stdout", 547 | "output_type": "stream", 548 | "text": [ 549 | "A:\n", 550 | "[[ 4 11 14]\n", 551 | " [ 8 7 -2]]\n" 552 | ] 553 | } 554 | ], 555 | "source": [ 556 | "A = np.mat(\"4 11 14; 8 7 -2\")\n", 557 | "print \"A:\\n\", A" 558 | ] 559 | }, 560 | { 561 | "cell_type": "code", 562 | "execution_count": 19, 563 | "metadata": { 564 | "collapsed": false 565 | }, 566 | "outputs": [ 567 | { 568 | "name": "stdout", 569 | "output_type": "stream", 570 | "text": [ 571 | "Pseudo inverse:\n", 572 | "[[-0.00555556 0.07222222]\n", 573 | " [ 0.02222222 0.04444444]\n", 574 | " [ 0.05555556 -0.05555556]]\n" 575 | ] 576 | } 577 | ], 578 | "source": [ 579 | "pseudoinv = 
np.linalg.pinv(A)\n", 580 | "print \"Pseudo inverse:\\n\", pseudoinv" 581 | ] 582 | }, 583 | { 584 | "cell_type": "code", 585 | "execution_count": 20, 586 | "metadata": { 587 | "collapsed": false 588 | }, 589 | "outputs": [ 590 | { 591 | "name": "stdout", 592 | "output_type": "stream", 593 | "text": [ 594 | "Check pseudo inverse:\n", 595 | "[[ 1.00000000e+00 0.00000000e+00]\n", 596 | " [ 8.32667268e-17 1.00000000e+00]]\n" 597 | ] 598 | } 599 | ], 600 | "source": [ 601 | "# check\n", 602 | "print \"Check pseudo inverse:\\n\", A*pseudoinv" 603 | ] 604 | }, 605 | { 606 | "cell_type": "markdown", 607 | "metadata": {}, 608 | "source": [ 609 | "The result is not exactly the identity matrix, but it is very close; the tiny off-diagonal entries are floating-point rounding error." 610 | ] 611 | }, 612 | { 613 | "cell_type": "code", 614 | "execution_count": 21, 615 | "metadata": { 616 | "collapsed": false 617 | }, 618 | "outputs": [ 619 | { 620 | "name": "stdout", 621 | "output_type": "stream", 622 | "text": [ 623 | "A:\n", 624 | "[[ 0 1 2]\n", 625 | " [ 1 0 3]\n", 626 | " [ 4 -3 8]]\n", 627 | "inverse of A:\n", 628 | "[[-4.5 7. -1.5]\n", 629 | " [-2. 4. -1. ]\n", 630 | " [ 1.5 -2. 0.5]]\n", 631 | "check inverse:\n", 632 | "[[ 1. 0. 0.]\n", 633 | " [ 0. 1. 0.]\n", 634 | " [ 0. 0. 1.]]\n", 635 | "Pseudo inverse:\n", 636 | "[[-4.5 7. -1.5]\n", 637 | " [-2. 4. -1. ]\n", 638 | " [ 1.5 -2. 
0.5]]\n", 639 | "Check pseudo inverse:\n", 640 | "[[ 1.00000000e+00 -2.66453526e-15 8.88178420e-16]\n", 641 | " [ 8.88178420e-16 1.00000000e+00 2.22044605e-16]\n", 642 | " [ 0.00000000e+00 3.55271368e-15 1.00000000e+00]]\n" 643 | ] 644 | } 645 | ], 646 | "source": [ 647 | "A = np.mat(\"0 1 2; 1 0 3; 4 -3 8\")\n", 648 | "print \"A:\\n\", A\n", 649 | "inverse = np.linalg.inv(A)\n", 650 | "print \"inverse of A:\\n\", inverse\n", 651 | "print \"check inverse:\\n\", inverse * A\n", 652 | "\n", 653 | "pseudoinv = np.linalg.pinv(A)\n", 654 | "print \"Pseudo inverse:\\n\", pseudoinv\n", 655 | "print \"Check pseudo inverse:\\n\", A*pseudoinv" 656 | ] 657 | } 658 | ], 659 | "metadata": { 660 | "kernelspec": { 661 | "display_name": "Python 2", 662 | "language": "python", 663 | "name": "python2" 664 | }, 665 | "language_info": { 666 | "codemirror_mode": { 667 | "name": "ipython", 668 | "version": 2 669 | }, 670 | "file_extension": ".py", 671 | "mimetype": "text/x-python", 672 | "name": "python", 673 | "nbconvert_exporter": "python", 674 | "pygments_lexer": "ipython2", 675 | "version": "2.7.5" 676 | } 677 | }, 678 | "nbformat": 4, 679 | "nbformat_minor": 0 680 | } 681 | -------------------------------------------------------------------------------- /NumPy/.ipynb_checkpoints/(7)universal_functions-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "## Contents\n", 8 | "1. Universal functions\n", 9 | " - Creating a universal function --- the frompyfunc factory function\n", 10 | " - Methods of universal functions --- reduce, accumulate, reduceat, and outer\n", 11 | "2. Array division --- divide, true_divide, and floor_divide\n", 12 | "3. Array modulo --- mod, remainder, and fmod\n", 13 | "4. 
Bitwise and comparison functions --- " 14 | ] 15 | }, 16 | { 17 | "cell_type": "code", 18 | "execution_count": 1, 19 | "metadata": { 20 | "collapsed": true 21 | }, 22 | "outputs": [], 23 | "source": [ 24 | "import numpy as np" 25 | ] 26 | }, 27 | { 28 | "cell_type": "markdown", 29 | "metadata": {}, 30 | "source": [ 31 | "## 1. Universal functions\n", 32 | "A universal function (ufunc) takes a set of scalars as input and produces a set of scalars as output. Ufuncs typically correspond to basic mathematical operations such as addition, subtraction, multiplication, and division.\n", 33 | "\n", 34 | "[Universal functions documentation](http://docs.scipy.org/doc/numpy/reference/ufuncs.html#ufunc)\n", 35 | "\n", 36 | "A universal function is a vectorized version of an ordinary Python function: it operates on ndarray objects element by element." 37 | ] 38 | }, 39 | { 40 | "cell_type": "markdown", 41 | "metadata": {}, 42 | "source": [ 43 | "### 1.1 Creating a universal function\n", 44 | "NumPy's frompyfunc function creates a universal function from a Python function." 45 | ] 46 | }, 47 | { 48 | "cell_type": "code", 49 | "execution_count": 2, 50 | "metadata": { 51 | "collapsed": true 52 | }, 53 | "outputs": [], 54 | "source": [ 55 | "# define a Python function\n", 56 | "def pyFunc(a):\n", 57 | " result = np.zeros_like(a)\n", 58 | " # the ufunc calls this function once per element, so a is a single value here\n", 59 | " result = 42\n", 60 | " return result" 61 | ] 62 | }, 63 | { 64 | "cell_type": "markdown", 65 | "metadata": {}, 66 | "source": [ 67 | "zeros_like creates an array result with the same shape as a, filled with zeros. The flat attribute provides a flat iterator that can set array elements one by one (result.flat = 42); the code above instead rebinds result to the scalar 42, which the ufunc then places in each output element." 68 | ] 69 | }, 70 | { 71 | "cell_type": "markdown", 72 | "metadata": {}, 73 | "source": [ 74 | "Create the universal function with frompyfunc, specifying one input argument and one output argument." 75 | ] 76 | }, 77 | { 78 | "cell_type": "code", 79 | "execution_count": 3, 80 | "metadata": { 81 | "collapsed": false 82 | }, 83 | "outputs": [ 84 | { 85 | "name": "stdout", 86 | "output_type": "stream", 87 | "text": [ 88 | "The answer:\n", 89 | "[42 42 42 42]\n" 90 | ] 91 | } 92 | ], 93 | "source": [ 94 | "ufunc1 = np.frompyfunc(pyFunc, 1, 1)\n", 95 | "ret = ufunc1(np.arange(4))\n", 96 | "print \"The answer:\\n\", ret" 97 | ] 98 | }, 99 | { 100 | "cell_type": "code", 101 | "execution_count": 4, 102 | "metadata": { 103 | "collapsed": false 104 | }, 105 | "outputs": [ 106 | { 107 | "name": "stdout", 108 | "output_type": "stream", 109 | "text": [ 110 | "The answer:\n", 111 | 
"[[42 42]\n", 112 | " [42 42]]\n" 113 | ] 114 | } 115 | ], 116 | "source": [ 117 | "ret = ufunc1(np.arange(4).reshape(2,2))\n", 118 | "print \"The answer:\\n\", ret" 119 | ] 120 | }, 121 | { 122 | "cell_type": "markdown", 123 | "metadata": { 124 | "collapsed": true 125 | }, 126 | "source": [ 127 | "**Using the vectorize function introduced in section 5**" 128 | ] 129 | }, 130 | { 131 | "cell_type": "code", 132 | "execution_count": 5, 133 | "metadata": { 134 | "collapsed": false 135 | }, 136 | "outputs": [ 137 | { 138 | "name": "stdout", 139 | "output_type": "stream", 140 | "text": [ 141 | "The answer:\n", 142 | "[42 42 42 42]\n" 143 | ] 144 | } 145 | ], 146 | "source": [ 147 | "func2 = np.vectorize(pyFunc)\n", 148 | "ret = func2(np.arange(4))\n", 149 | "print \"The answer:\\n\", ret" 150 | ] 151 | }, 152 | { 153 | "cell_type": "markdown", 154 | "metadata": {}, 155 | "source": [ 156 | "### 1.2 Methods of universal functions\n", 157 | "Universal functions are not ordinary Python functions but objects of the numpy.ufunc class that represent functions. frompyfunc is a factory function that constructs ufunc objects.\n", 158 | "\n", 159 | "The ufunc class has four methods: **reduce, accumulate, reduceat, and outer. These methods are only valid for ufuncs that take two inputs and return one output**." 160 | ] 161 | }, 162 | { 163 | "cell_type": "markdown", 164 | "metadata": {}, 165 | "source": [ 166 | "### 1.3 Calling the four methods on the add function" 167 | ] 168 | }, 169 | { 170 | "cell_type": "markdown", 171 | "metadata": {}, 172 | "source": [ 173 | "#### (1) reduce: applies the universal function recursively to successive elements along the given axis, reducing the input array to a single result. For add, reducing an array is equivalent to summing its elements." 174 | ] 175 | }, 176 | { 177 | "cell_type": "code", 178 | "execution_count": 6, 179 | "metadata": { 180 | "collapsed": false 181 | }, 182 | "outputs": [ 183 | { 184 | "name": "stdout", 185 | "output_type": "stream", 186 | "text": [ 187 | "a:\n", 188 | "[0 1 2 3 4 5 6 7 8]\n", 189 | "Reduce:\n", 190 | "36\n" 191 | ] 192 | } 193 | ], 194 | "source": [ 195 | "a = np.arange(9)\n", 196 | "print \"a:\\n\", a\n", 197 | "print \"Reduce:\\n\", np.add.reduce(a)" 198 | ] 199 | }, 200 | { 201 | "cell_type": "markdown", 202 | "metadata": {}, 203 | "source": [ 204 | "#### (2) 
accumulate: also applies the function recursively to the input array, but unlike reduce it stores and returns the intermediate results. Calling accumulate on add is equivalent to calling cumsum directly." 205 | ] 206 | }, 207 | { 208 | "cell_type": "code", 209 | "execution_count": 7, 210 | "metadata": { 211 | "collapsed": false 212 | }, 213 | "outputs": [ 214 | { 215 | "name": "stdout", 216 | "output_type": "stream", 217 | "text": [ 218 | "Accumulate:\n", 219 | "[ 0 1 3 6 10 15 21 28 36]\n" 220 | ] 221 | } 222 | ], 223 | "source": [ 224 | "print \"Accumulate:\\n\", np.add.accumulate(a)" 225 | ] 226 | }, 227 | { 228 | "cell_type": "code", 229 | "execution_count": 8, 230 | "metadata": { 231 | "collapsed": false 232 | }, 233 | "outputs": [ 234 | { 235 | "name": "stdout", 236 | "output_type": "stream", 237 | "text": [ 238 | "cumsum:\n", 239 | "[ 0 1 3 6 10 15 21 28 36]\n" 240 | ] 241 | } 242 | ], 243 | "source": [ 244 | "print \"cumsum:\\n\", np.cumsum(a)" 245 | ] 246 | }, 247 | { 248 | "cell_type": "markdown", 249 | "metadata": {}, 250 | "source": [ 251 | "#### (3) reduceat is a little more involved: it takes an array and a list of indices as arguments" 252 | ] 253 | }, 254 | { 255 | "cell_type": "code", 256 | "execution_count": 9, 257 | "metadata": { 258 | "collapsed": false 259 | }, 260 | "outputs": [ 261 | { 262 | "name": "stdout", 263 | "output_type": "stream", 264 | "text": [ 265 | "Reduceat:\n", 266 | "[10 5 20 15]\n" 267 | ] 268 | } 269 | ], 270 | "source": [ 271 | "print \"Reduceat:\\n\", np.add.reduceat(a, [0,5,2,7])" 272 | ] 273 | }, 274 | { 275 | "cell_type": "markdown", 276 | "metadata": {}, 277 | "source": [ 278 | "- Step 1: uses indices 0 and 5 from the list, reducing the elements a[0:5] (index 0 up to but not including 5)\n", 279 | "- Step 2: uses indices 5 and 2. Since 2 is less than 5, it simply returns the element at index 5\n", 280 | "- Step 3: uses indices 2 and 7, reducing the elements a[2:7]\n", 281 | "- Step 4: uses index 7, reducing the elements from index 7 to the end of the array" 282 | ] 283 | }, 284 | { 285 | "cell_type": "code", 286 | "execution_count": 10, 287 | "metadata": { 288 | "collapsed": false 289 | }, 290 | "outputs": [ 291 | { 292 | "name": "stdout", 293 | "output_type": "stream", 294 | "text": [ 295 | "Reduceat step 1: 10\n", 296 | "Reduceat step 2: 5\n", 297 | "Reduceat step 3: 
20\n", 298 | "Reduceat step 4: 15\n" 299 | ] 300 | } 301 | ], 302 | "source": [ 303 | "print \"Reduceat step 1:\", np.add.reduce(a[0:5])\n", 304 | "print \"Reduceat step 2:\", a[5]\n", 305 | "print \"Reduceat step 3:\", np.add.reduce(a[2:7])\n", 306 | "print \"Reduceat step 4:\", np.add.reduce(a[7:])" 307 | ] 308 | }, 309 | { 310 | "cell_type": "markdown", 311 | "metadata": {}, 312 | "source": [ 313 | "#### (4) outer: returns an array whose rank (number of dimensions) equals the sum of the ranks of the two input arrays. The function is applied to every pair of elements drawn from the two inputs." 314 | ] 315 | }, 316 | { 317 | "cell_type": "code", 318 | "execution_count": 11, 319 | "metadata": { 320 | "collapsed": false 321 | }, 322 | "outputs": [ 323 | { 324 | "name": "stdout", 325 | "output_type": "stream", 326 | "text": [ 327 | "Outer:\n", 328 | "[[ 0 1 2 3 4 5 6 7 8]\n", 329 | " [ 1 2 3 4 5 6 7 8 9]\n", 330 | " [ 2 3 4 5 6 7 8 9 10]]\n" 331 | ] 332 | } 333 | ], 334 | "source": [ 335 | "print \"Outer:\\n\", np.add.outer(np.arange(3), a)" 336 | ] 337 | }, 338 | { 339 | "cell_type": "markdown", 340 | "metadata": {}, 341 | "source": [ 342 | "## 2. 
Array division" 343 | ] 344 | }, 345 | { 346 | "cell_type": "markdown", 347 | "metadata": {}, 348 | "source": [ 349 | "In NumPy, the arithmetic operators +, -, and * are implicitly linked to the universal functions add, subtract, and multiply: when you apply these operators to NumPy arrays, the corresponding ufuncs are called automatically.\n", 350 | "\n", 351 | "Division is more involved. Array division involves three universal functions, divide, true_divide, and floor_divide, and two corresponding operators, / and //." 352 | ] 353 | }, 354 | { 355 | "cell_type": "markdown", 356 | "metadata": {}, 357 | "source": [ 358 | "### (1) Under Python 2, divide keeps only the integer part when both inputs are integers" 359 | ] 360 | }, 361 | { 362 | "cell_type": "code", 363 | "execution_count": 12, 364 | "metadata": { 365 | "collapsed": false 366 | }, 367 | "outputs": [ 368 | { 369 | "name": "stdout", 370 | "output_type": "stream", 371 | "text": [ 372 | "Divide:\n", 373 | "[2 3 1] [0 0 0]\n" 374 | ] 375 | } 376 | ], 377 | "source": [ 378 | "a = np.array([2, 6, 5])\n", 379 | "b = np.array([1, 2, 3])\n", 380 | "print \"Divide:\\n\", np.divide(a, b), np.divide(b, a)" 381 | ] 382 | }, 383 | { 384 | "cell_type": "markdown", 385 | "metadata": {}, 386 | "source": [ 387 | "The fractional part of each result is truncated." 388 | ] 389 | }, 390 | { 391 | "cell_type": "code", 392 | "execution_count": 13, 393 | "metadata": { 394 | "collapsed": false 395 | }, 396 | "outputs": [ 397 | { 398 | "name": "stdout", 399 | "output_type": "stream", 400 | "text": [ 401 | "Divide:\n", 402 | "[ 2.1 3.1 2.63157895] [ 0.47619048 0.32258065 0.38 ]\n" 403 | ] 404 | } 405 | ], 406 | "source": [ 407 | "c = np.array([2.1, 6.2, 5.0])\n", 408 | "d = np.array([1, 2, 1.9])\n", 409 | "print \"Divide:\\n\", np.divide(c, d), np.divide(d, c)" 410 | ] 411 | }, 412 | { 413 | "cell_type": "markdown", 414 | "metadata": {}, 415 | "source": [ 416 | "If either operand is a floating-point number, divide returns a floating-point result." 417 | ] 418 | }, 419 | { 420 | "cell_type": "markdown", 421 | "metadata": {}, 422 | "source": [ 423 | "### (2) true_divide is closer to the mathematical definition of division: it returns the floating-point result without truncation" 424 | ] 425 | }, 426 | { 427 | "cell_type": "code", 428 | "execution_count": 14, 429 | "metadata": { 430 | "collapsed": false 431 | }, 432 | "outputs": [ 433 | { 434 | "name": "stdout", 435 | 
"output_type": "stream", 436 | "text": [ 437 | "True Divide:\n", 438 | "[ 2. 3. 1.66666667] [ 0.5 0.33333333 0.6 ]\n" 439 | ] 440 | } 441 | ], 442 | "source": [ 443 | "print \"True Divide:\\n\", np.true_divide(a, b), np.true_divide(b, a)" 444 | ] 445 | }, 446 | { 447 | "cell_type": "markdown", 448 | "metadata": {}, 449 | "source": [ 450 | "###(3) floor_divide函数总是返回整数结果,相当于先调用divide函数再调用floor函数。\n", 451 | "floor函数对浮点数进行**向下取整**并返回整数。" 452 | ] 453 | }, 454 | { 455 | "cell_type": "code", 456 | "execution_count": 15, 457 | "metadata": { 458 | "collapsed": false 459 | }, 460 | "outputs": [ 461 | { 462 | "name": "stdout", 463 | "output_type": "stream", 464 | "text": [ 465 | "Floor Divide:\n", 466 | "[2 3 1] [0 0 0]\n" 467 | ] 468 | } 469 | ], 470 | "source": [ 471 | "print \"Floor Divide:\\n\", np.floor_divide(a, b), np.floor_divide(b, a)" 472 | ] 473 | }, 474 | { 475 | "cell_type": "markdown", 476 | "metadata": {}, 477 | "source": [ 478 | "**默认情况下,使用/运算符相当于调用divide函数,使用//运算符对应于floor_divide函数**" 479 | ] 480 | }, 481 | { 482 | "cell_type": "markdown", 483 | "metadata": {}, 484 | "source": [ 485 | "##3. 
Array modulo operations\n", 486 | "To compute a modulus or remainder, use the mod, remainder and fmod functions in NumPy. The % operator also works." 487 | ] 488 | }, 489 | { 490 | "cell_type": "markdown", 491 | "metadata": {}, 492 | "source": [ 493 | "###(1) The remainder function returns, element by element, the remainder after dividing the elements of two arrays; if the second number is 0, it returns 0" 494 | ] 495 | }, 496 | { 497 | "cell_type": "code", 498 | "execution_count": 16, 499 | "metadata": { 500 | "collapsed": false 501 | }, 502 | "outputs": [ 503 | { 504 | "name": "stdout", 505 | "output_type": "stream", 506 | "text": [ 507 | "a:\n", 508 | "[-4 -3 -2 -1 0 1 2 3]\n", 509 | "Remainder:\n", 510 | "[0 1 0 1 0 1 0 1]\n" 511 | ] 512 | } 513 | ], 514 | "source": [ 515 | "a = np.arange(-4,4)\n", 516 | "print \"a:\\n\", a\n", 517 | "print \"Remainder:\\n\", np.remainder(a, 2)" 518 | ] 519 | }, 520 | { 521 | "cell_type": "markdown", 522 | "metadata": {}, 523 | "source": [ 524 | "**The mod function behaves exactly like remainder, and the % operator is simply shorthand for remainder**" 525 | ] 526 | }, 527 | { 528 | "cell_type": "markdown", 529 | "metadata": {}, 530 | "source": [ 531 | "###(2) The fmod function handles negative numbers differently from remainder. The sign of the result is determined by the dividend and does not depend on the sign of the divisor" 532 | ] 533 | }, 534 | { 535 | "cell_type": "code", 536 | "execution_count": 17, 537 | "metadata": { 538 | "collapsed": false 539 | }, 540 | "outputs": [ 541 | { 542 | "name": "stdout", 543 | "output_type": "stream", 544 | "text": [ 545 | "Fmod:\n", 546 | "[ 0 -1 0 -1 0 1 0 1]\n" 547 | ] 548 | } 549 | ], 550 | "source": [ 551 | "print \"Fmod:\\n\", np.fmod(a, 2)" 552 | ] 553 | }, 554 | { 555 | "cell_type": "code", 556 | "execution_count": 18, 557 | "metadata": { 558 | "collapsed": false 559 | }, 560 | "outputs": [ 561 | { 562 | "name": "stdout", 563 | "output_type": "stream", 564 | "text": [ 565 | "[ 0 -1 0 -1 0 1 0 1]\n" 566 | ] 567 | } 568 | ], 569 | "source": [ 570 | "print np.fmod(a, -2)" 571 | ] 572 | }, 573 | { 574 | "cell_type": "markdown", 575 | "metadata": {}, 576 | "source": [ 577 | "##4. 
Bitwise and comparison functions\n", 578 | "Bitwise functions operate on the bits of integers or integer arrays; they are all universal functions.\n", 579 | "\n", 580 | "Bitwise operators: ^, &, |, <<, >> and so on.\n", 581 | "\n", 582 | "Comparison operators: <, >, == and so on." 583 | ] 584 | }, 585 | { 586 | "cell_type": "markdown", 587 | "metadata": {}, 588 | "source": [ 589 | "###4.1 Checking whether two integers have the same sign\n", 590 | "This uses XOR, the ^ operator. XOR is also known as the **inequality operator**: a result bit is 1 only where the operand bits differ, so when the sign bits of the two operands differ, the XOR result is negative.\n", 591 | "\n", 592 | "In NumPy, the ^ operator corresponds to the bitwise_xor function and the < operator corresponds to the less function." 593 | ] 594 | }, 595 | { 596 | "cell_type": "code", 597 | "execution_count": 19, 598 | "metadata": { 599 | "collapsed": false 600 | }, 601 | "outputs": [ 602 | { 603 | "name": "stdout", 604 | "output_type": "stream", 605 | "text": [ 606 | "Sign different? [ True True True True True True True True True False True True\n", 607 | " True True True True True True]\n", 608 | "Sign different? [ True True True True True True True True True False True True\n", 609 | " True True True True True True]\n" 610 | ] 611 | } 612 | ], 613 | "source": [ 614 | "x = np.arange(-9, 9)\n", 615 | "y = -x\n", 616 | "print \"Sign different? \", (x^y) < 0\n", 617 | "print \"Sign different? 
\", np.less(np.bitwise_xor(x, y), 0)" 618 | ] 619 | }, 620 | { 621 | "cell_type": "markdown", 622 | "metadata": {}, 623 | "source": [ 624 | "Except where the value is 0, every pair of integers has opposite signs." 625 | ] 626 | }, 627 | { 628 | "cell_type": "markdown", 629 | "metadata": {}, 630 | "source": [ 631 | "###4.2 Checking whether a number is a power of 2\n", 632 | "In binary, a power of 2 is written as a single 1 followed by a run of 0s. **A bitwise AND between a power of 2 and the number one less than it should therefore equal 0.** Note that this check also returns True for 0, which is not a power of 2.\n", 633 | "\n", 634 | "In NumPy, the & operator corresponds to the bitwise_and function and the == operator corresponds to the equal function." 635 | ] 636 | }, 637 | { 638 | "cell_type": "code", 639 | "execution_count": 20, 640 | "metadata": { 641 | "collapsed": false 642 | }, 643 | "outputs": [ 644 | { 645 | "name": "stdout", 646 | "output_type": "stream", 647 | "text": [ 648 | "[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19]\n", 649 | "Power of 2 ?\n", 650 | "[ True True True False True False False False True False False False\n", 651 | " False False False False True False False False]\n", 652 | "Power of 2 ?\n", 653 | "[ True True True False True False False False True False False False\n", 654 | " False False False False True False False False]\n" 655 | ] 656 | } 657 | ], 658 | "source": [ 659 | "b = np.arange(20)\n", 660 | "print b\n", 661 | "print \"Power of 2 ?\\n\", (b & (b-1)) == 0\n", 662 | "print \"Power of 2 ?\\n\", np.equal(np.bitwise_and(b, (b-1)), 0)" 663 | ] 664 | }, 665 | { 666 | "cell_type": "markdown", 667 | "metadata": {}, 668 | "source": [ 669 | "###4.3 Computing the remainder of a number divided by a power of 2\n", 670 | "This remainder trick only works when the modulus is a power of 2. Shifting the bits of a binary number left by one position doubles its value.\n", 671 | "\n", 672 | "**As the previous example showed, subtracting 1 from a power of 2 gives a binary number made up of a run of 1s. This serves as a mask: a bitwise AND with it yields the remainder modulo that power of 2.**\n", 673 | "\n", 674 | "In NumPy, the << operator corresponds to the left_shift function." 675 | ] 676 | }, 677 | { 678 | "cell_type": "code", 679 | "execution_count": 21, 680 | "metadata": { 681 | "collapsed": false 682 | }, 683 | "outputs": [ 684 | { 685 | "name": "stdout", 686 | "output_type": "stream", 687 | "text": [ 688 | "Modulus 4:\n", 689 | "[3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0]\n" 690 | ] 691 | } 692 | ], 693 | "source": [ 694 | "print \"Modulus 4:\\n\", x & ((1<<2) - 1)" 
695 | ] 696 | }, 697 | { 698 | "cell_type": "code", 699 | "execution_count": 25, 700 | "metadata": { 701 | "collapsed": false 702 | }, 703 | "outputs": [], 704 | "source": [ 705 | "def mod_2_pow(x, n):\n", 706 | " mod = x & ((1<\n", 27 | "\n", 28 | " \n", 29 | " \n", 30 | " \n", 31 | " \n", 32 | " \n", 33 | " \n", 34 | " \n", 35 | " \n", 36 | " \n", 37 | " \n", 38 | " \n", 39 | " \n", 40 | " \n", 41 | " \n", 42 | " \n", 43 | " \n", 44 | " \n", 45 | " \n", 46 | " \n", 47 | " \n", 48 | " \n", 49 | " \n", 50 | " \n", 51 | " \n", 52 | " \n", 53 | " \n", 54 | " \n", 55 | " \n", 56 | " \n", 57 | " \n", 58 | " \n", 59 | " \n", 60 | " \n", 61 | " \n", 62 | " \n", 63 | " \n", 64 | " \n", 65 | " \n", 66 | " \n", 67 | " \n", 68 | " \n", 69 | " \n", 70 | " \n", 71 | " \n", 72 | " \n", 73 | " \n", 74 | " \n", 75 | " \n", 76 | " \n", 77 | " \n", 78 | " \n", 79 | " \n", 80 | " \n", 81 | "
\n", 82 | "" 83 | ], 84 | "text/plain": [ 85 | " A B C D\n", 86 | "0 0.081227 1.651024 -0.063561 1.992570\n", 87 | "1 -0.060838 -0.293773 -0.757681 -0.397578\n", 88 | "2 1.025647 -0.353300 -0.878448 -2.015514\n", 89 | "3 -0.788950 -0.221509 -1.079488 -0.833900\n", 90 | "4 1.038247 0.376582 0.698767 0.401919\n", 91 | "5 -0.067863 0.174289 1.914769 -0.808617" 92 | ] 93 | }, 94 | "execution_count": 8, 95 | "metadata": {}, 96 | "output_type": "execute_result" 97 | } 98 | ], 99 | "source": [ 100 | "df = DataFrame(np.random.randn(6,4), columns=list('ABCD'))\n", 101 | "df" 102 | ] 103 | }, 104 | { 105 | "cell_type": "markdown", 106 | "metadata": {}, 107 | "source": [ 108 | "##1. Selecting data from a DataFrame" 109 | ] 110 | }, 111 | { 112 | "cell_type": "markdown", 113 | "metadata": {}, 114 | "source": [ 115 | "**Select the data in column A**" 116 | ] 117 | }, 118 | { 119 | "cell_type": "code", 120 | "execution_count": 3, 121 | "metadata": { 122 | "collapsed": false 123 | }, 124 | "outputs": [ 125 | { 126 | "data": { 127 | "text/plain": [ 128 | "0 -0.532235\n", 129 | "1 1.282245\n", 130 | "2 1.894709\n", 131 | "3 -1.421003\n", 132 | "4 -0.477041\n", 133 | "5 -2.055907\n", 134 | "Name: A, dtype: float64" 135 | ] 136 | }, 137 | "execution_count": 3, 138 | "metadata": {}, 139 | "output_type": "execute_result" 140 | } 141 | ], 142 | "source": [ 143 | "df['A']" 144 | ] 145 | }, 146 | { 147 | "cell_type": "markdown", 148 | "metadata": {}, 149 | "source": [ 150 | "**Slice to get rows**" 151 | ] 152 | }, 153 | { 154 | "cell_type": "code", 155 | "execution_count": 4, 156 | "metadata": { 157 | "collapsed": false 158 | }, 159 | "outputs": [ 160 | { 161 | "data": { 162 | "text/html": [ 163 | "
\n", 164 | "\n", 165 | " \n", 166 | " \n", 167 | " \n", 168 | " \n", 169 | " \n", 170 | " \n", 171 | " \n", 172 | " \n", 173 | " \n", 174 | " \n", 175 | " \n", 176 | " \n", 177 | " \n", 178 | " \n", 179 | " \n", 180 | " \n", 181 | " \n", 182 | " \n", 183 | " \n", 184 | " \n", 185 | " \n", 186 | " \n", 187 | " \n", 188 | " \n", 189 | " \n", 190 | "
\n", 191 | "
" 192 | ], 193 | "text/plain": [ 194 | " A B C D\n", 195 | "1 1.282245 -2.136740 0.969922 0.110193\n", 196 | "2 1.894709 0.732707 -1.164495 -0.379666" 197 | ] 198 | }, 199 | "execution_count": 4, 200 | "metadata": {}, 201 | "output_type": "execute_result" 202 | } 203 | ], 204 | "source": [ 205 | "df[1:3]" 206 | ] 207 | }, 208 | { 209 | "cell_type": "markdown", 210 | "metadata": {}, 211 | "source": [ 212 | "**Use the DataFrame loc accessor to select data by label**" 213 | ] 214 | }, 215 | { 216 | "cell_type": "code", 217 | "execution_count": 9, 218 | "metadata": { 219 | "collapsed": false 220 | }, 221 | "outputs": [ 222 | { 223 | "data": { 224 | "text/plain": [ 225 | "A 0.081227\n", 226 | "B 1.651024\n", 227 | "C -0.063561\n", 228 | "D 1.992570\n", 229 | "Name: 0, dtype: float64" 230 | ] 231 | }, 232 | "execution_count": 9, 233 | "metadata": {}, 234 | "output_type": "execute_result" 235 | } 236 | ], 237 | "source": [ 238 | "# Select the row labeled 0\n", 239 | "df.loc[0]" 240 | ] 241 | }, 242 | { 243 | "cell_type": "code", 244 | "execution_count": 10, 245 | "metadata": { 246 | "collapsed": false 247 | }, 248 | "outputs": [ 249 | { 250 | "data": { 251 | "text/html": [ 252 | "
\n", 253 | "\n", 254 | " \n", 255 | " \n", 256 | " \n", 257 | " \n", 258 | " \n", 259 | " \n", 260 | " \n", 261 | " \n", 262 | " \n", 263 | " \n", 264 | " \n", 265 | " \n", 266 | " \n", 267 | " \n", 268 | " \n", 269 | " \n", 270 | " \n", 271 | " \n", 272 | " \n", 273 | " \n", 274 | " \n", 275 | " \n", 276 | " \n", 277 | " \n", 278 | " \n", 279 | " \n", 280 | " \n", 281 | " \n", 282 | " \n", 283 | " \n", 284 | " \n", 285 | " \n", 286 | " \n", 287 | " \n", 288 | " \n", 289 | " \n", 290 | " \n", 291 | " \n", 292 | " \n", 293 | "
\n", 294 | "
" 295 | ], 296 | "text/plain": [ 297 | " A B\n", 298 | "0 0.081227 1.651024\n", 299 | "1 -0.060838 -0.293773\n", 300 | "2 1.025647 -0.353300\n", 301 | "3 -0.788950 -0.221509\n", 302 | "4 1.038247 0.376582\n", 303 | "5 -0.067863 0.174289" 304 | ] 305 | }, 306 | "execution_count": 10, 307 | "metadata": {}, 308 | "output_type": "execute_result" 309 | } 310 | ], 311 | "source": [ 312 | "# Select multiple columns\n", 313 | "df.loc[:, ['A', 'B']]" 314 | ] 315 | }, 316 | { 317 | "cell_type": "code", 318 | "execution_count": 11, 319 | "metadata": { 320 | "collapsed": false 321 | }, 322 | "outputs": [ 323 | { 324 | "data": { 325 | "text/html": [ 326 | "
\n", 327 | "\n", 328 | " \n", 329 | " \n", 330 | " \n", 331 | " \n", 332 | " \n", 333 | " \n", 334 | " \n", 335 | " \n", 336 | " \n", 337 | " \n", 338 | " \n", 339 | " \n", 340 | " \n", 341 | " \n", 342 | " \n", 343 | " \n", 344 | " \n", 345 | " \n", 346 | " \n", 347 | " \n", 348 | " \n", 349 | " \n", 350 | " \n", 351 | " \n", 352 | "
\n", 353 | "
" 354 | ], 355 | "text/plain": [ 356 | " A B\n", 357 | "0 0.081227 1.651024\n", 358 | "1 -0.060838 -0.293773\n", 359 | "2 1.025647 -0.353300" 360 | ] 361 | }, 362 | "execution_count": 11, 363 | "metadata": {}, 364 | "output_type": "execute_result" 365 | } 366 | ], 367 | "source": [ 368 | "# Select a sub-region: the intersection of the chosen rows and columns\n", 369 | "df.loc[0:2, ['A', 'B']]" 370 | ] 371 | }, 372 | { 373 | "cell_type": "code", 374 | "execution_count": 12, 375 | "metadata": { 376 | "collapsed": false 377 | }, 378 | "outputs": [ 379 | { 380 | "data": { 381 | "text/plain": [ 382 | "0.081227162656888133" 383 | ] 384 | }, 385 | "execution_count": 12, 386 | "metadata": {}, 387 | "output_type": "execute_result" 388 | } 389 | ], 390 | "source": [ 391 | "# Select a single value\n", 392 | "df.loc[0, 'A']" 393 | ] 394 | }, 395 | { 396 | "cell_type": "markdown", 397 | "metadata": {}, 398 | "source": [ 399 | "**The at accessor is dedicated to fetching a single value**" 400 | ] 401 | }, 402 | { 403 | "cell_type": "code", 404 | "execution_count": 13, 405 | "metadata": { 406 | "collapsed": false 407 | }, 408 | "outputs": [ 409 | { 410 | "data": { 411 | "text/plain": [ 412 | "0.081227162656888133" 413 | ] 414 | }, 415 | "execution_count": 13, 416 | "metadata": {}, 417 | "output_type": "execute_result" 418 | } 419 | ], 420 | "source": [ 421 | "df.at[0, 'A']" 422 | ] 423 | }, 424 | { 425 | "cell_type": "markdown", 426 | "metadata": {}, 427 | "source": [ 428 | "##2. 
DataFrame slicing" 429 | ] 430 | }, 431 | { 432 | "cell_type": "markdown", 433 | "metadata": {}, 434 | "source": [ 435 | "**Use iloc to extract the fourth row**" 436 | ] 437 | }, 438 | { 439 | "cell_type": "code", 440 | "execution_count": 14, 441 | "metadata": { 442 | "collapsed": false 443 | }, 444 | "outputs": [ 445 | { 446 | "data": { 447 | "text/plain": [ 448 | "A -0.788950\n", 449 | "B -0.221509\n", 450 | "C -1.079488\n", 451 | "D -0.833900\n", 452 | "Name: 3, dtype: float64" 453 | ] 454 | }, 455 | "execution_count": 14, 456 | "metadata": {}, 457 | "output_type": "execute_result" 458 | } 459 | ], 460 | "source": [ 461 | "df.iloc[3]" 462 | ] 463 | }, 464 | { 465 | "cell_type": "code", 466 | "execution_count": 15, 467 | "metadata": { 468 | "collapsed": false 469 | }, 470 | "outputs": [ 471 | { 472 | "data": { 473 | "text/plain": [ 474 | "pandas.core.series.Series" 475 | ] 476 | }, 477 | "execution_count": 15, 478 | "metadata": {}, 479 | "output_type": "execute_result" 480 | } 481 | ], 482 | "source": [ 483 | "# The result is a Series\n", 484 | "type(df.iloc[3])" 485 | ] 486 | }, 487 | { 488 | "cell_type": "code", 489 | "execution_count": 16, 490 | "metadata": { 491 | "collapsed": false 492 | }, 493 | "outputs": [ 494 | { 495 | "data": { 496 | "text/html": [ 497 | "
\n", 498 | "\n", 499 | " \n", 500 | " \n", 501 | " \n", 502 | " \n", 503 | " \n", 504 | " \n", 505 | " \n", 506 | " \n", 507 | " \n", 508 | " \n", 509 | " \n", 510 | " \n", 511 | " \n", 512 | " \n", 513 | " \n", 514 | " \n", 515 | " \n", 516 | " \n", 517 | " \n", 518 | "
\n", 519 | "
" 520 | ], 521 | "text/plain": [ 522 | " A B\n", 523 | "3 -0.788950 -0.221509\n", 524 | "4 1.038247 0.376582" 525 | ] 526 | }, 527 | "execution_count": 16, 528 | "metadata": {}, 529 | "output_type": "execute_result" 530 | } 531 | ], 532 | "source": [ 533 | "# Return rows 4-5 and columns 1-2\n", 534 | "df.iloc[3:5, 0:2]" 535 | ] 536 | }, 537 | { 538 | "cell_type": "code", 539 | "execution_count": 17, 540 | "metadata": { 541 | "collapsed": false 542 | }, 543 | "outputs": [ 544 | { 545 | "data": { 546 | "text/html": [ 547 | "
\n", 548 | "\n", 549 | " \n", 550 | " \n", 551 | " \n", 552 | " \n", 553 | " \n", 554 | " \n", 555 | " \n", 556 | " \n", 557 | " \n", 558 | " \n", 559 | " \n", 560 | " \n", 561 | " \n", 562 | " \n", 563 | " \n", 564 | " \n", 565 | " \n", 566 | " \n", 567 | " \n", 568 | " \n", 569 | " \n", 570 | " \n", 571 | " \n", 572 | " \n", 573 | "
\n", 574 | "
" 575 | ], 576 | "text/plain": [ 577 | " A C\n", 578 | "1 -0.060838 -0.757681\n", 579 | "2 1.025647 -0.878448\n", 580 | "4 1.038247 0.698767" 581 | ] 582 | }, 583 | "execution_count": 17, 584 | "metadata": {}, 585 | "output_type": "execute_result" 586 | } 587 | ], 588 | "source": [ 589 | "# Extract values from non-contiguous rows and columns\n", 590 | "df.iloc[[1,2,4], [0,2]]" 591 | ] 592 | }, 593 | { 594 | "cell_type": "code", 595 | "execution_count": 18, 596 | "metadata": { 597 | "collapsed": false 598 | }, 599 | "outputs": [ 600 | { 601 | "data": { 602 | "text/plain": [ 603 | "-0.29377253872215964" 604 | ] 605 | }, 606 | "execution_count": 18, 607 | "metadata": {}, 608 | "output_type": "execute_result" 609 | } 610 | ], 611 | "source": [ 612 | "# Extract a single value\n", 613 | "df.iloc[1,1]" 614 | ] 615 | }, 616 | { 617 | "cell_type": "markdown", 618 | "metadata": {}, 619 | "source": [ 620 | "**iat is dedicated to extracting a single value and is more efficient**" 621 | ] 622 | }, 623 | { 624 | "cell_type": "code", 625 | "execution_count": 19, 626 | "metadata": { 627 | "collapsed": false 628 | }, 629 | "outputs": [ 630 | { 631 | "data": { 632 | "text/plain": [ 633 | "-0.29377253872215964" 634 | ] 635 | }, 636 | "execution_count": 19, 637 | "metadata": {}, 638 | "output_type": "execute_result" 639 | } 640 | ], 641 | "source": [ 642 | "df.iat[1,1]" 643 | ] 644 | }, 645 | { 646 | "cell_type": "markdown", 647 | "metadata": {}, 648 | "source": [ 649 | "##3. Filtering data in a DataFrame" 650 | ] 651 | }, 652 | { 653 | "cell_type": "code", 654 | "execution_count": 21, 655 | "metadata": { 656 | "collapsed": false 657 | }, 658 | "outputs": [ 659 | { 660 | "data": { 661 | "text/html": [ 662 | "
\n", 663 | "\n", 664 | " \n", 665 | " \n", 666 | " \n", 667 | " \n", 668 | " \n", 669 | " \n", 670 | " \n", 671 | " \n", 672 | " \n", 673 | " \n", 674 | " \n", 675 | " \n", 676 | " \n", 677 | " \n", 678 | " \n", 679 | " \n", 680 | " \n", 681 | " \n", 682 | " \n", 683 | " \n", 684 | " \n", 685 | " \n", 686 | " \n", 687 | " \n", 688 | " \n", 689 | "
\n", 690 | "
" 691 | ], 692 | "text/plain": [ 693 | " A B C D\n", 694 | "0 0.081227 1.651024 -0.063561 1.992570\n", 695 | "4 1.038247 0.376582 0.698767 0.401919" 696 | ] 697 | }, 698 | "execution_count": 21, 699 | "metadata": {}, 700 | "output_type": "execute_result" 701 | } 702 | ], 703 | "source": [ 704 | "# Keep the rows where column D is greater than 0\n", 705 | "df[df.D > 0]" 706 | ] 707 | }, 708 | { 709 | "cell_type": "code", 710 | "execution_count": 22, 711 | "metadata": { 712 | "collapsed": false 713 | }, 714 | "outputs": [ 715 | { 716 | "data": { 717 | "text/html": [ 718 | "
\n", 719 | "\n", 720 | " \n", 721 | " \n", 722 | " \n", 723 | " \n", 724 | " \n", 725 | " \n", 726 | " \n", 727 | " \n", 728 | " \n", 729 | " \n", 730 | " \n", 731 | " \n", 732 | " \n", 733 | " \n", 734 | " \n", 735 | " \n", 736 | " \n", 737 | " \n", 738 | "
\n", 739 | "
" 740 | ], 741 | "text/plain": [ 742 | " A B C D\n", 743 | "0 0.081227 1.651024 -0.063561 1.99257" 744 | ] 745 | }, 746 | "execution_count": 22, 747 | "metadata": {}, 748 | "output_type": "execute_result" 749 | } 750 | ], 751 | "source": [ 752 | "# Use the & operator to combine several filter conditions\n", 753 | "df[(df.D > 0) & (df.C < 0)]" 754 | ] 755 | }, 756 | { 757 | "cell_type": "markdown", 758 | "metadata": {}, 759 | "source": [ 760 | "**If we only need the data in columns A and B, while columns D and C are used only for filtering, we can write it like this**" 761 | ] 762 | }, 763 | { 764 | "cell_type": "code", 765 | "execution_count": 23, 766 | "metadata": { 767 | "collapsed": false 768 | }, 769 | "outputs": [ 770 | { 771 | "data": { 772 | "text/html": [ 773 | "
\n", 774 | "\n", 775 | " \n", 776 | " \n", 777 | " \n", 778 | " \n", 779 | " \n", 780 | " \n", 781 | " \n", 782 | " \n", 783 | " \n", 784 | " \n", 785 | " \n", 786 | " \n", 787 | " \n", 788 | " \n", 789 | "
\n", 790 | "
" 791 | ], 792 | "text/plain": [ 793 | " A B\n", 794 | "0 0.081227 1.651024" 795 | ] 796 | }, 797 | "execution_count": 23, 798 | "metadata": {}, 799 | "output_type": "execute_result" 800 | } 801 | ], 802 | "source": [ 803 | "df[['A', 'B']][(df.D > 0) & (df.C < 0)]" 804 | ] 805 | }, 806 | { 807 | "cell_type": "markdown", 808 | "metadata": {}, 809 | "source": [ 810 | "**Use the isin method to filter for specific values**" 811 | ] 812 | }, 813 | { 814 | "cell_type": "code", 815 | "execution_count": null, 816 | "metadata": { 817 | "collapsed": true 818 | }, 819 | "outputs": [], 820 | "source": [ 821 | "# \n", 822 | "alist = [1, 0.054497, 0.36]" 823 | ] 824 | } 825 | ], 826 | "metadata": { 827 | "kernelspec": { 828 | "display_name": "Python 2", 829 | "language": "python", 830 | "name": "python2" 831 | }, 832 | "language_info": { 833 | "codemirror_mode": { 834 | "name": "ipython", 835 | "version": 2 836 | }, 837 | "file_extension": ".py", 838 | "mimetype": "text/x-python", 839 | "name": "python", 840 | "nbconvert_exporter": "python", 841 | "pygments_lexer": "ipython2", 842 | "version": "2.7.5" 843 | } 844 | }, 845 | "nbformat": 4, 846 | "nbformat_minor": 0 847 | } 848 | -------------------------------------------------------------------------------- /Pandas/.ipynb_checkpoints/(2)dataframe_slice_selection-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [], 3 | "metadata": {}, 4 | "nbformat": 4, 5 | "nbformat_minor": 0 6 | } 7 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # pyDataScienceToolkits_Base 2 | 3 | This repository uses IPython Notebooks to introduce the basics of Python's scientific computing libraries NumPy, SciPy and Matplotlib and the machine learning library Scikit-learn. 4 | All of the content comes from reading notes and everyday learning experiments; I hope it is helpful to readers. 5 | 6 | ##Getting started with NumPy 7 | * 
[NumPy array basics: creation, indexing, slicing, reshaping](http://nbviewer.ipython.org/github/jasonding1354/pyDataScienceToolkits_Base/blob/master/NumPy/%281%29numpy_array_basis1.ipynb) 8 | * [NumPy array basics: stacking, splitting, attributes, conversion](http://nbviewer.ipython.org/github/jasonding1354/pyDataScienceToolkits_Base/blob/master/NumPy/%282%29numpy_array_basis2.ipynb) 9 | * [Common NumPy functions: mean, maximum and minimum, median, variance](http://nbviewer.ipython.org/github/jasonding1354/pyDataScienceToolkits_Base/blob/master/NumPy/%283%29common_functions1.ipynb) 10 | * [Common NumPy functions: learning NumPy through a stock analysis example](http://nbviewer.ipython.org/github/jasonding1354/pyDataScienceToolkits_Base/blob/master/NumPy/%284%29common_functions2%E2%80%94%E2%80%94stock_analysis.ipynb) 11 | * [NumPy convenience functions: correlation analysis, polynomial fitting, and more](http://nbviewer.ipython.org/github/jasonding1354/pyDataScienceToolkits_Base/blob/master/NumPy/%285%29convenience_function.ipynb) 12 | * [NumPy matrices and linear algebra: basic matrix usage, common linear algebra functions](http://nbviewer.ipython.org/github/jasonding1354/pyDataScienceToolkits_Base/blob/master/NumPy/%286%29linear_algebra.ipynb) 13 | * [NumPy universal functions: ufuncs, arithmetic, bitwise and comparison functions](http://nbviewer.ipython.org/github/jasonding1354/pyDataScienceToolkits_Base/blob/master/NumPy/%287%29universal_functions.ipynb) 14 | * [The NumPy random module: random numbers, hypergeometric distribution, continuous distributions](http://nbviewer.ipython.org/github/jasonding1354/pyDataScienceToolkits_Base/blob/master/NumPy/%288%29random_module.ipynb) 15 | * [Sorting and searching in NumPy](http://nbviewer.ipython.org/github/jasonding1354/pyDataScienceToolkits_Base/blob/master/NumPy/%289%29sort_and_search.ipynb) 16 | * [NumPy assert functions](http://nbviewer.ipython.org/github/jasonding1354/pyDataScienceToolkits_Base/blob/master/NumPy/%2810%29assert_function.ipynb) 17 | 18 | ##Getting started with Matplotlib plotting 19 | * [Plotting basics: a one-year candlestick chart of Baidu's stock price, histograms, scatter plots, shading, legends](http://nbviewer.ipython.org/github/jasonding1354/pyDataScienceToolkits_Base/blob/master/Visualization/%281%29plot_base.ipynb) 20 | * [Fun plots: animation, 3D plots, contour plots](http://nbviewer.ipython.org/github/jasonding1354/pyDataScienceToolkits_Base/blob/master/Visualization/%282%29interesting_plot.ipynb) 21 | * 
[Special curves: Lissajous curves, square waves, sawtooth waves](http://nbviewer.ipython.org/github/jasonding1354/pyDataScienceToolkits_Base/blob/master/Visualization/%283%29special_curves_plot.ipynb) 23 | 24 | ##Getting started with the Scikit-learn machine learning library 25 | * [Getting started with scikit-learn on the iris dataset: the general workflow for training and predicting with KNN in sklearn](http://nbviewer.ipython.org/github/jasonding1354/pyDataScienceToolkits_Base/blob/master/Scikit-learn/%281%29getting_started_with_iris.ipynb) 26 | * [How to choose model parameters in scikit-learn](http://nbviewer.ipython.org/github/jasonding1354/pyDataScienceToolkits_Base/blob/master/Scikit-learn/%282%29choose_a_ml_model.ipynb) 27 | * [Regression prediction and feature selection with scikit-learn's linear regression model](http://nbviewer.ipython.org/github/jasonding1354/pyDataScienceToolkits_Base/blob/master/Scikit-learn/%283%29linear_regression.ipynb) 28 | * [Cross-validation, with examples of using it for parameter, model and feature selection](http://nbviewer.ipython.org/github/jasonding1354/pyDataScienceToolkits_Base/blob/master/Scikit-learn/%284%29cross_validation.ipynb) 29 | * [Efficient parameter tuning with grid search](http://nbviewer.ipython.org/github/jasonding1354/pyDataScienceToolkits_Base/blob/master/Scikit-learn/%285%29grid_search.ipynb) 30 | * [Metrics for evaluating classifier performance, such as the confusion matrix, ROC and AUC](http://nbviewer.ipython.org/github/jasonding1354/pyDataScienceToolkits_Base/blob/master/Scikit-learn/%286%29classification_metrics.ipynb) 31 | 32 | ##Getting started with data analysis in Pandas 33 | * [Introduction to the Pandas Series and DataFrame data structures](http://nbviewer.ipython.org/github/jasonding1354/pyDataScienceToolkits_Base/blob/master/Pandas/%281%29pandas_introduction.ipynb) 34 | * [Slicing and selection on Pandas DataFrames](http://nbviewer.ipython.org/github/jasonding1354/pyDataScienceToolkits_Base/blob/master/Pandas/%282%29dataframe_slice_selection.ipynb) 35 | -------------------------------------------------------------------------------- /Scikit-learn/Image/Data3classes.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jasonding1354/pyDataScienceToolkits_Base/ec3acee7ad037adf12747b659c842327c10d8b85/Scikit-learn/Image/Data3classes.png 
-------------------------------------------------------------------------------- /Scikit-learn/Image/Map1NN.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jasonding1354/pyDataScienceToolkits_Base/ec3acee7ad037adf12747b659c842327c10d8b85/Scikit-learn/Image/Map1NN.png -------------------------------------------------------------------------------- /Scikit-learn/Image/Map5NN.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jasonding1354/pyDataScienceToolkits_Base/ec3acee7ad037adf12747b659c842327c10d8b85/Scikit-learn/Image/Map5NN.png -------------------------------------------------------------------------------- /Scikit-learn/Image/confusion_matrix.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jasonding1354/pyDataScienceToolkits_Base/ec3acee7ad037adf12747b659c842327c10d8b85/Scikit-learn/Image/confusion_matrix.png -------------------------------------------------------------------------------- /Scikit-learn/Image/cross_validation_diagram.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jasonding1354/pyDataScienceToolkits_Base/ec3acee7ad037adf12747b659c842327c10d8b85/Scikit-learn/Image/cross_validation_diagram.png -------------------------------------------------------------------------------- /Scikit-learn/Image/grid_vs_random.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jasonding1354/pyDataScienceToolkits_Base/ec3acee7ad037adf12747b659c842327c10d8b85/Scikit-learn/Image/grid_vs_random.jpeg -------------------------------------------------------------------------------- /Scikit-learn/Image/iris_petal_sepal.jpg: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/jasonding1354/pyDataScienceToolkits_Base/ec3acee7ad037adf12747b659c842327c10d8b85/Scikit-learn/Image/iris_petal_sepal.jpg -------------------------------------------------------------------------------- /Visualization/.ipynb_checkpoints/(3)special_curves_plot-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [], 3 | "metadata": {}, 4 | "nbformat": 4, 5 | "nbformat_minor": 0 6 | } 7 | -------------------------------------------------------------------------------- /Visualization/baidu_stock_price.py: -------------------------------------------------------------------------------- 1 | #coding: utf-8 2 | # Use the date one year before today as the start date 3 | import matplotlib.pyplot as plt 4 | from matplotlib.dates import DateFormatter 5 | from matplotlib.dates import DayLocator 6 | from matplotlib.dates import MonthLocator 7 | from matplotlib.finance import quotes_historical_yahoo_ochl # matplotlib.finance was deprecated in Matplotlib 2.0 and later removed; the code moved to the separate mpl_finance package 8 | from matplotlib.finance import candlestick_ochl 9 | import sys 10 | from datetime import date 11 | 12 | today = date.today() 13 | start = (today.year - 1, today.month, today.day) 14 | 15 | alldays = DayLocator() 16 | months = MonthLocator() 17 | month_formatter = DateFormatter("%b %Y") 18 | 19 | # Download the stock price data from Yahoo Finance 20 | symbol = 'BIDU' # Baidu's ticker symbol 21 | quotes = quotes_historical_yahoo_ochl(symbol, start, today) 22 | 23 | # Create the figure object, the top-level container for plot components 24 | fig = plt.figure() 25 | # Add a subplot 26 | ax = fig.add_subplot(111) 27 | # Set the major locator on the x-axis to the month locator, which places the coarser ticks 28 | ax.xaxis.set_major_locator(months) 29 | # Set the minor locator on the x-axis to the day locator, which places the finer ticks 30 | ax.xaxis.set_minor_locator(alldays) 31 | # Set the major formatter on the x-axis to the month formatter, which labels the coarser ticks 32 | ax.xaxis.set_major_formatter(month_formatter) 33 | 34 | # Draw the candlestick chart with the candlestick function from matplotlib.finance 35 | candlestick_ochl(ax, quotes) 36 | # Format the x-axis labels as dates 37 | fig.autofmt_xdate() 38 | plt.title('Baidu, Inc. 
(BIDU)') 39 | plt.show() 40 | 41 | --------------------------------------------------------------------------------
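The script above builds its start date as the tuple `(today.year - 1, today.month, today.day)`. A minimal sketch of the same "one year ago" calculation as a real `date` object follows. The helper name `one_year_before` and the fallback to 28 February are assumptions made for illustration and are not part of the original script; the fallback covers 29 February, which has no matching day in the previous year.

```python
from datetime import date


def one_year_before(day):
    """Return the same calendar day one year earlier.

    Hypothetical helper: 29 February has no counterpart in a non-leap
    year, so it falls back to 28 February (an assumed choice).
    """
    try:
        return day.replace(year=day.year - 1)
    except ValueError:
        # Only reached for 29 February, when the previous year is not a leap year.
        return day.replace(year=day.year - 1, day=28)


if __name__ == "__main__":
    today = date.today()
    start = one_year_before(today)
    print(start.isoformat(), "to", today.isoformat())
```

Keeping the start as a `date` rather than a bare tuple means an impossible day is caught when the date is built instead of later, inside whatever function consumes it.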