├── README.md
├── utils.pyx
├── setup.py
├── 10 minutes to cython.md
├── Python multi process.md
├── .gitignore
├── Python multi threads.md
├── More efficient pandas.md
├── 10 minutes to cython.ipynb
├── Python multi process.ipynb
├── Python Standard Library.md
├── Python coroutines.ipynb
├── More efficient array.md
├── Itertools for efficient looping.md
├── Python multi threads.ipynb
├── More efficient pandas.ipynb
├── Python Standard Library.ipynb
├── Built-in method.md
├── More efficient array.ipynb
├── Using C++ in Cython.md
├── Itertools for efficient looping.ipynb
├── Using C++ in Cython.ipynb
└── Built-in method.ipynb

/README.md:
--------------------------------------------------------------------------------
# flying-python
How to write fast and efficient Python code.

--------------------------------------------------------------------------------
/utils.pyx:
--------------------------------------------------------------------------------
import cython
import numpy as np
cimport numpy as cnp
ctypedef cnp.int_t DTYPE_t


# Disable bounds checking and negative-index wraparound for speed.
# This is only safe because the loop index always stays in [0, n).
@cython.boundscheck(False)
@cython.wraparound(False)
cpdef cnp.ndarray[DTYPE_t] _transform(cnp.ndarray[DTYPE_t] arr):
    """Return a new array: odd values become x + 1, even values become x - 1."""
    cdef:
        int i = 0
        int n = arr.shape[0]
        int x
        cnp.ndarray[DTYPE_t] new_arr = np.empty_like(arr)

    while i < n:
        x = arr[i]
        if x % 2:
            new_arr[i] = x + 1
        else:
            new_arr[i] = x - 1
        i += 1
    return new_arr

--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
"""
@Author: tushushu
@Date: 2019-06-20 10:32:30
"""

# setuptools replaces distutils, which is deprecated and was removed in Python 3.12
from setuptools import setup
from Cython.Build import cythonize
import numpy


def compile_file(file_name: str):
    """Compile a .pyx file into an extension module."""

    ext_modules = cythonize(file_name)
    name = file_name.split(".")[0] if "." in file_name else file_name
    # NumPy headers are needed because the .pyx file cimports numpy
    setup(name=name, ext_modules=ext_modules, include_dirs=[numpy.get_include()])


if __name__ == "__main__":
    compile_file("utils.pyx")

# source activate py36
# python setup.py build_ext --inplace

--------------------------------------------------------------------------------
/10 minutes to cython.md:
--------------------------------------------------------------------------------
# Getting Started with Cython in 10 Minutes
Author: tushushu  
Project: https://github.com/tushushu/flying-python

## 1. What is Cython?
Cython is a programming language for writing C extensions in Python-like syntax. The compiled extensions can be imported and called from Python. You keep Python's fast development cycle, your code can run at close to C speed, and calling C libraries becomes straightforward.

## 2. How to install Cython
Unlike most Python libraries, Cython needs a C compiler. How you set one up depends on the platform.
### 2.1 Set up a C compiler (gcc)
- **Windows**  
Install the MinGW-w64 compiler: ``conda install libpython m2w64-toolchain -c msys2``  
Next, open the \Lib\distutils folder under the Python installation directory. Create a file named distutils.cfg there. Put ``[build]`` on its first line and ``compiler=mingw32`` on its second.

- **macOS**  
Installing Xcode is enough. The Xcode Command Line Tools alone are sufficient.

- **Linux**  
gcc is usually already available. If it is not, run ``sudo apt-get install build-essential`` on Debian/Ubuntu.


### 2.2 Install the cython package
- Without Anaconda: ``pip install cython``  
- With Anaconda: ``conda install cython``

## 3. Using Cython in Jupyter Notebook
- First, load the Cython extension with the magic command ``%load_ext Cython``.
- Then start any cell that contains Cython code with the cell magic ``%%cython``.


```python
%load_ext Cython
```


```cython
%%cython
# Sum the natural numbers from 1 to 100
total = 0
for i in range(1, 101):
    total += i
print(total)
```

    5050


## 4. How fast is Cython, really?
- Python function: 261 ns per call
- Cython function: 44.1 ns per call

The run time drops to roughly one sixth of the original, about 5.9x faster. The main reason is that the parameter x is declared with the static C type int. The arithmetic therefore runs on a C integer instead of going through Python's dynamic type checks.

### 4.1 Python function


```python
def f(x):
    return x ** 2 - x
```


```python
%timeit f(100)
```

    261 ns ± 8.78 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


### 4.2 Cython function


```cython
%%cython
def g(int x):
    return x ** 2 - x
```


```python
%timeit g(100)
```

    44.1 ns ± 1.09 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)


## References
Parts of this article are adapted from the [official Cython documentation](http://docs.cython.org/en/latest/index.html).

--------------------------------------------------------------------------------
/Python multi process.md:
--------------------------------------------------------------------------------
# Python Multitasking (Processes)
Author: tushushu  
Project: https://github.com/tushushu/flying-python

## Using multiple processes for CPU-bound tasks
CPU-bound tasks perform a lot of computation and consume CPU resources. Examples include computing π and decoding high-definition video. They depend entirely on the CPU's processing power.

While a thread runs a CPU-bound task, the CPU stays busy. The thread hands the GIL to another thread only when asked to at a periodic switch point. In CPython 3.2+ this happens after a switch interval of 5 ms by default (see `sys.getswitchinterval()`). The threads therefore just take turns, and with the added cost of switching between them the result can be even slower than serial code.

In Python Multitasking (Threads), running a CPU-bound task on multiple threads gave no speedup. Now let's try multiple processes instead. Each process has its own interpreter and its own GIL, so the work can run on several cores in parallel.

### 1. Create a process pool


```python
from concurrent.futures import ProcessPoolExecutor
from time import sleep, time
import os
print("Number of CPU cores: %s" % os.cpu_count())
```

    Number of CPU cores: 8



```python
# Number of workers
N = 8
# Create the process pool
pool = ProcessPoolExecutor(max_workers=N)
```

### 2. Define a CPU-bound function
This function sums the integers in [1, x].


```python
def cpu_bound_func(x):
    tot = 0
    a = 1
    while a <= x:
        tot += a
        a += 1
    print("Finish sum from 1 to %d!" % x)
    return tot
```

### 3. Process serially
Iterate over every element of the list and call cpu_bound_func on it.


```python
def process_array(arr):
    for x in arr:
        cpu_bound_func(x)
```

### 4. Process with multiple processes
The process pool's map method applies the same function to every element of the list and spreads the calls over the worker processes.

Note that this relies on the fork start method, the default on Linux. Under the spawn start method, worker processes cannot find functions defined interactively in a notebook. Spawn is the default on Windows, and on macOS since Python 3.8. On those platforms, cpu_bound_func would have to live in an importable module.


```python
def fast_process_array(arr):
    for x in pool.map(cpu_bound_func, arr):
        pass
```

### 5. Measure the run time
- Serial version: 5.7 seconds
- Multi-process version: 1.6 seconds


```python
def time_it(fn, *args):
    start = time()
    fn(*args)
    print("%s version took %.5f seconds!" % (fn.__name__, time() - start))
```


```python
time_it(process_array, [10**7 for _ in range(8)])
```

    Finish sum from 1 to 10000000!
    Finish sum from 1 to 10000000!
    Finish sum from 1 to 10000000!
    Finish sum from 1 to 10000000!
    Finish sum from 1 to 10000000!
    Finish sum from 1 to 10000000!
    Finish sum from 1 to 10000000!
    Finish sum from 1 to 10000000!
    process_array version took 5.74394 seconds!



```python
time_it(fast_process_array, [10**7 for _ in range(8)])
```

    fast_process_array version took 1.62266 seconds!


--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
#  Usually these files are written by a python script from a template
#  before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
.python-version

# pipenv
#   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
#   However, in case of collaboration, if having platform-specific dependencies or dependencies
#   having no cross-platform support, pipenv may install dependencies that don't work, or not
#   install all needed dependencies.
#Pipfile.lock

# celery beat schedule file
celerybeat-schedule

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# VS Code
.vscode/
.ipynb_checkpoints/

--------------------------------------------------------------------------------
/Python multi threads.md:
--------------------------------------------------------------------------------
# Python Multitasking (Threads)
Author: tushushu  
Project: https://github.com/tushushu/flying-python

## 1. GIL

CPython, the reference interpreter written in C, has a global interpreter lock (GIL). Because of the GIL, the interpreter can execute Python code in only one thread at a time. This severely limits the performance of multi-threaded Python programs. For historical reasons the GIL has been very hard to remove. CPython 3.13 introduced an experimental free-threaded build without it (PEP 703).

The GIL hurts multi-threaded performance because a thread can run Python code only while it holds this global lock, and there is only one such lock. Even with many threads, only one of them executes Python code at any given moment. On a multi-core machine, multi-threaded Python code therefore still gets roughly the performance of a single core.


## 2. Multi-threading for IO-bound tasks
In an IO-bound task, the CPU is much faster than the disk or network it depends on. Most of the time the CPU is waiting for I/O reads and writes, and its load stays low. Tasks that involve network or disk I/O are typically IO-bound. While a thread is blocked on I/O it does not need the CPU and releases the GIL to other threads. The waits therefore overlap, and the total run time shrinks.


```python
from concurrent.futures import ThreadPoolExecutor
from time import sleep, time
```


```python
# Number of workers
N = 4
# Create the thread pool
pool = ThreadPoolExecutor(max_workers=N)
```

### 2.1 Define an IO-bound function
This function sleeps for x seconds.


```python
def io_bound_func(x):
    sleep(x)
    print("Sleep for %d seconds." % x)
```

### 2.2 Process serially
Iterate over every element of the list and call io_bound_func on it.


```python
def process_array(arr):
    for x in arr:
        io_bound_func(x)
```

### 2.3 Process with multiple threads
The thread pool's map method applies the same function to every element of the list.


```python
def fast_process_array(arr):
    for x in pool.map(io_bound_func, arr):
        pass
```

### 2.4 Measure the run time
- Serial version: 1 + 2 + 3 = 6 seconds
- Multi-threaded version: max(1, 2, 3) = 3 seconds


```python
def time_it(fn, *args):
    start = time()
    fn(*args)
    print("%s version took %.5f seconds!" % (fn.__name__, time() - start))
```


```python
time_it(process_array, [1, 2, 3])
```

    Sleep for 1 seconds.
    Sleep for 2 seconds.
    Sleep for 3 seconds.
    process_array version took 6.00883 seconds!



```python
time_it(fast_process_array, [1, 2, 3])
```

    Sleep for 1 seconds.
    Sleep for 2 seconds.
    Sleep for 3 seconds.
    fast_process_array version took 3.00300 seconds!


## 3. Multi-threading for CPU-bound tasks
CPU-bound tasks perform a lot of computation and consume CPU resources. Examples include computing π and decoding high-definition video. They depend entirely on the CPU's processing power.

While a thread runs a CPU-bound task, the CPU stays busy. The thread hands the GIL to another thread only when asked to at a periodic switch point. In CPython 3.2+ this happens after a switch interval of 5 ms by default (see `sys.getswitchinterval()`). The threads therefore just take turns, and with the added cost of switching between them the result can be even slower than serial code.

### 3.1 Define a CPU-bound function
This function sums the integers in [1, x].


```python
def cpu_bound_func(x):
    tot = 0
    a = 1
    while a <= x:
        tot += a
        a += 1
    print("Finish sum from 1 to %d!" % x)
    return tot
```

### 3.2 Process serially
Iterate over every element of the list and call cpu_bound_func on it.


```python
def process_array(arr):
    for x in arr:
        cpu_bound_func(x)
```

### 3.3 Process with multiple threads
The thread pool's map method applies the same function to every element of the list.


```python
def fast_process_array(arr):
    for x in pool.map(cpu_bound_func, arr):
        pass
```

### 3.4 Measure the run time
- Serial version: 2.1 seconds
- Multi-threaded version: 2.2 seconds


```python
def time_it(fn, *args):
    start = time()
    fn(*args)
    print("%s version took %.5f seconds!" % (fn.__name__, time() - start))
```


```python
time_it(process_array, [10**7, 10**7, 10**7])
```

    Finish sum from 1 to 10000000!
    Finish sum from 1 to 10000000!
    Finish sum from 1 to 10000000!
    process_array version took 2.10489 seconds!



```python
time_it(fast_process_array, [10**7, 10**7, 10**7])
```

    Finish sum from 1 to 10000000!
    Finish sum from 1 to 10000000!
    Finish sum from 1 to 10000000!
    fast_process_array version took 2.20897 seconds!


## References
https://www.jianshu.com/p/c75ed8a6e9af  
https://www.cnblogs.com/tusheng/articles/10630662.html

--------------------------------------------------------------------------------
/More efficient pandas.md:
--------------------------------------------------------------------------------
# Making Pandas DataFrame Processing 40x Faster

## 1. A first try
Pandas is the go-to tool for data analysis. Sometimes we need to apply very complex processing to tens or even hundreds of millions of rows. At that scale, run time is a problem you can't ignore.

Take the simple example below. We randomly generate 1 million rows and transform the 'val' column: even values have 1 subtracted and odd values have 1 added. Real data analysis is much more complex than this. Since we (mainly I) don't have much time to wait for results, let's keep it simple. As the timing below shows, the transform function takes 284 ms on average.


```python
import pandas as pd
import numpy as np

def gen_data(size):
    d = dict()
    d["genre"] = np.random.choice(["A", "B", "C", "D"], size=size)
    d["val"] = np.random.randint(low=0, high=100, size=size)
    return pd.DataFrame(d)

data = gen_data(1000000)
data.head()
```

|   | genre | val |
|---|-------|-----|
| 0 | C     | 54  |
| 1 | A     | 5   |
| 2 | D     | 0   |
| 3 | D     | 42  |
| 4 | C     | 91  |
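The odd/even rule used throughout this article can also be written with vectorized NumPy operations, which avoids calling a Python function once per row. This is a minimal sketch, not part of the original benchmark, and it was not timed here. The function name `vectorized_transform` is hypothetical.

```python
import numpy as np


def vectorized_transform(values: np.ndarray) -> np.ndarray:
    """Hypothetical vectorized version of the odd/even rule (not from the original article).

    Odd values become x + 1 and even values become x - 1, matching
    `lambda x: x + 1 if x % 2 else x - 1`.
    """
    # For a positive divisor, NumPy's % returns 0 or 1 (also for negative integers)
    is_odd = values % 2 != 0
    # np.where takes x + 1 where is_odd is True and x - 1 everywhere else
    return np.where(is_odd, values + 1, values - 1)


if __name__ == "__main__":
    # The first five 'val' values shown in the data.head() output
    sample = np.array([54, 5, 0, 42, 91])
    print(vectorized_transform(sample))
```

To apply it to the DataFrame above, you would write something like `data["new_val"] = vectorized_transform(data["val"].to_numpy())`.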

```python
def transform(data):
    data.loc[:, "new_val"] = data.val.apply(lambda x: x + 1 if x % 2 else x - 1)
```


```python
%timeit -n 1 transform(data)
```

    284 ms ± 8.95 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


## 2. Writing a C extension with Cython
Let's try writing the function `x + 1 if x % 2 else x - 1` with our old friend Cython. The average run time drops to 202 ms, so it is indeed faster. That is only about a 1.4x speedup, though, still far from the 40x promised in the title (facepalm).


```python
%load_ext cython
```


```cython
%%cython
cpdef int _transform(int x):
    if x % 2:
        return x + 1
    return x - 1

def transform(data):
    data.loc[:, "new_val"] = data.val.apply(_transform)
```


```python
%timeit -n 1 transform(data)
```

    202 ms ± 13.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


## 3. Reducing type conversions
To cut down on conversions between C and Python types, we pass the whole 'val' column to the Cython function as a NumPy array. Note the difference between `cnp` and `np`: `cnp` is the compile-time `cimport` and `np` is the runtime module. The average run time drops to 10.8 ms, about a 26x speedup. There is a glimmer of hope.


```cython
%%cython
import numpy as np
cimport numpy as cnp
ctypedef cnp.int_t DTYPE_t

cpdef cnp.ndarray[DTYPE_t] _transform(cnp.ndarray[DTYPE_t] arr):
    cdef:
        int i = 0
        int n = arr.shape[0]
        int x
        cnp.ndarray[DTYPE_t] new_arr = np.empty_like(arr)

    while i < n:
        x = arr[i]
        if x % 2:
            new_arr[i] = x + 1
        else:
            new_arr[i] = x - 1
        i += 1
    return new_arr

def transform(data):
    data.loc[:, "new_val"] = _transform(data.val.values)
```


```python
%timeit -n 1 transform(data)
```

    10.8 ms ± 512 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)


## 4. Using unsafe array access
The decorators `@cython.boundscheck(False)` and `@cython.wraparound(False)` turn off bounds checking and negative-index handling for array access. This is safe here only because the loop index always stays within [0, n). The average run time becomes 6.76 ms, about 42x faster than the original 284 ms. Mission accomplished.


```cython
%%cython
import cython
import numpy as np
cimport numpy as cnp
ctypedef cnp.int_t DTYPE_t


@cython.boundscheck(False)
@cython.wraparound(False)
cpdef cnp.ndarray[DTYPE_t] _transform(cnp.ndarray[DTYPE_t] arr):
    cdef:
        int i = 0
        int n = arr.shape[0]
        int x
        cnp.ndarray[DTYPE_t] new_arr = np.empty_like(arr)

    while i < n:
        x = arr[i]
        if x % 2:
            new_arr[i] = x + 1
        else:
            new_arr[i] = x - 1
        i += 1
    return new_arr

def transform(data):
    data.loc[:, "new_val"] = _transform(data.val.values)
```


```python
%timeit -n 1 transform(data)
```

    6.76 ms ± 545 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

--------------------------------------------------------------------------------
/10 minutes to cython.ipynb:
--------------------------------------------------------------------------------
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Getting Started with Cython in 10 Minutes\n",
    "Author: tushushu  \n",
    "Project: https://github.com/tushushu/flying-python"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. What is Cython?\n",
    "Cython is a programming language for writing C extensions in Python-like syntax. The compiled extensions can be imported and called from Python. You keep Python's fast development cycle, your code can run at close to C speed, and calling C libraries becomes straightforward."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. How to install Cython\n",
    "Unlike most Python libraries, Cython needs a C compiler. How you set one up depends on the platform.\n",
    "### 2.1 Set up a C compiler (gcc)\n",
    "- **Windows**  \n",
    "Install the MinGW-w64 compiler: ``conda install libpython m2w64-toolchain -c msys2``  \n",
    "Next, open the \\Lib\\distutils folder under the Python installation directory. Create a file named distutils.cfg there. Put ``[build]`` on its first line and ``compiler=mingw32`` on its second.\n",
    "\n",
    "- **macOS**  \n",
    "Installing Xcode is enough. The Xcode Command Line Tools alone are sufficient.  \n",
    "\n",
    "- **Linux**  \n",
    "gcc is usually already available. If it is not, run ``sudo apt-get install build-essential`` on Debian/Ubuntu.  \n",
    "\n",
    "\n",
    "### 2.2 Install the cython package\n",
    "- Without Anaconda: ``pip install cython``  \n",
    "- With Anaconda: ``conda install cython``"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. Using Cython in Jupyter Notebook\n",
    "- First, load the Cython extension with the magic command ``%load_ext Cython``.\n",
    "- Then start any cell that contains Cython code with the cell magic ``%%cython``."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "%load_ext Cython"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "5050\n"
     ]
    }
   ],
   "source": [
    "%%cython\n",
    "# Sum the natural numbers from 1 to 100\n",
    "total = 0\n",
    "for i in range(1, 101):\n",
    "    total += i\n",
    "print(total)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4. How fast is Cython, really?\n",
    "- Python function: 261 ns per call\n",
    "- Cython function: 44.1 ns per call  \n",
    "\n",
    "The run time drops to roughly one sixth of the original, about 5.9x faster. The main reason is that the parameter x is declared with the static C type int. The arithmetic therefore runs on a C integer instead of going through Python's dynamic type checks."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.1 Python function"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "def f(x):\n",
    "    return x ** 2 - x"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "261 ns ± 8.78 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)\n"
     ]
    }
   ],
   "source": [
    "%timeit f(100)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.2 Cython function"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%cython\n",
    "def g(int x):\n",
    "    return x ** 2 - x"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "44.1 ns ± 1.09 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)\n"
     ]
    }
   ],
   "source": [
    "%timeit g(100)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## References\n",
    "Parts of this article are adapted from the [official Cython documentation](http://docs.cython.org/en/latest/index.html)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}

--------------------------------------------------------------------------------
/Python multi process.ipynb:
--------------------------------------------------------------------------------
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Python Multitasking (Processes)\n",
    "Author: tushushu  \n",
    "Project: https://github.com/tushushu/flying-python"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Using multiple processes for CPU-bound tasks\n",
    "CPU-bound tasks perform a lot of computation and consume CPU resources. Examples include computing π and decoding high-definition video. They depend entirely on the CPU's processing power.\n",
    "\n",
    "While a thread runs a CPU-bound task, the CPU stays busy. The thread hands the GIL to another thread only when asked to at a periodic switch point. In CPython 3.2+ this happens after a switch interval of 5 ms by default (see `sys.getswitchinterval()`). The threads therefore just take turns, and with the added cost of switching between them the result can be even slower than serial code.\n",
    "\n",
    "In Python Multitasking (Threads), running a CPU-bound task on multiple threads gave no speedup. Now let's try multiple processes instead. Each process has its own interpreter and its own GIL, so the work can run on several cores in parallel."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1. Create a process pool"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Number of CPU cores: 8\n"
     ]
    }
   ],
   "source": [
    "from concurrent.futures import ProcessPoolExecutor\n",
    "from time import sleep, time\n",
    "import os\n",
    "print(\"Number of CPU cores: %s\" % os.cpu_count())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Number of workers\n",
    "N = 8\n",
    "# Create the process pool\n",
    "pool = ProcessPoolExecutor(max_workers=N)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2. Define a CPU-bound function\n",
    "This function sums the integers in [1, x]."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "def cpu_bound_func(x):\n",
    "    tot = 0\n",
    "    a = 1\n",
    "    while a <= x:\n",
    "        tot += a\n",
    "        a += 1\n",
    "    print(\"Finish sum from 1 to %d!\" % x)\n",
    "    return tot"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3. Process serially\n",
    "Iterate over every element of the list and call cpu_bound_func on it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "def process_array(arr):\n",
    "    for x in arr:\n",
    "        cpu_bound_func(x)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4. Process with multiple processes\n",
    "The process pool's map method applies the same function to every element of the list and spreads the calls over the worker processes.\n",
    "\n",
    "Note that this relies on the fork start method, the default on Linux. Under the spawn start method, worker processes cannot find functions defined interactively in a notebook. Spawn is the default on Windows, and on macOS since Python 3.8. On those platforms, cpu_bound_func would have to live in an importable module."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [],
   "source": [
    "def fast_process_array(arr):\n",
    "    for x in pool.map(cpu_bound_func, arr):\n",
    "        pass"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 5. Measure the run time\n",
    "- Serial version: 5.7 seconds\n",
    "- Multi-process version: 1.6 seconds"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [],
   "source": [
    "def time_it(fn, *args):\n",
    "    start = time()\n",
    "    fn(*args)\n",
    "    print(\"%s version took %.5f seconds!\" % (fn.__name__, time() - start))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Finish sum from 1 to 10000000!\n",
      "Finish sum from 1 to 10000000!\n",
      "Finish sum from 1 to 10000000!\n",
      "Finish sum from 1 to 10000000!\n",
      "Finish sum from 1 to 10000000!\n",
      "Finish sum from 1 to 10000000!\n",
      "Finish sum from 1 to 10000000!\n",
      "Finish sum from 1 to 10000000!\n",
      "process_array version took 5.74394 seconds!\n"
     ]
    }
   ],
   "source": [
    "time_it(process_array, [10**7 for _ in range(8)])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "fast_process_array version took 1.62266 seconds!\n"
     ]
    }
   ],
   "source": [
    "time_it(fast_process_array, [10**7 for _ in range(8)])"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
192 | "codemirror_mode": { 193 | "name": "ipython", 194 | "version": 3 195 | }, 196 | "file_extension": ".py", 197 | "mimetype": "text/x-python", 198 | "name": "python", 199 | "nbconvert_exporter": "python", 200 | "pygments_lexer": "ipython3", 201 | "version": "3.6.6" 202 | } 203 | }, 204 | "nbformat": 4, 205 | "nbformat_minor": 2 206 | } 207 | -------------------------------------------------------------------------------- /Python Standard Library.md: -------------------------------------------------------------------------------- 1 | # 用Python标准库写出高效的代码 2 | 作者: tushushu 3 | 项目地址: https://github.com/tushushu/flying-python 4 | 5 | ## 1. bisect - 二分查找 6 | 给定一个列表对象,我们要对目标元素进行查找,返回其在列表中的下标。 7 | * 首先想到的是Python列表的index方法。建立一个长度为10000的升序列表,编写search函数使用index方式把里面的每一个元素查找一遍,平均运行时间437毫秒。 8 | * 使用bisect模块的bisect_left,也就是我们熟知的二分查找。编写fast_search函数,平均运行时间3.94毫秒,性能提升了110倍! 9 | 10 | 11 | ```python 12 | import bisect 13 | ``` 14 | 15 | 16 | ```python 17 | def search(nums): 18 | for x in nums: 19 | nums.index(x) 20 | ``` 21 | 22 | 23 | ```python 24 | def fast_search(nums): 25 | for x in nums: 26 | bisect.bisect_left(nums, x) 27 | ``` 28 | 29 | 30 | ```python 31 | arr = list(range(10000)) 32 | ``` 33 | 34 | 35 | ```python 36 | %timeit -n 1 search(arr) 37 | ``` 38 | 39 | 437 ms ± 12.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) 40 | 41 | 42 | 43 | ```python 44 | %timeit -n 1 fast_search(arr) 45 | ``` 46 | 47 | 3.94 ms ± 407 µs per loop (mean ± std. dev. of 7 runs, 1 loop each) 48 | 49 | 50 | ## 2. 
Counter - 高效计数 51 | 给定一个列表对象,我们要统计其中的每个不重复的元素出现了多少次,返回一个字典对象。 52 | * 创建一个长度为10000,元素为1-3之间的随机数的列表。编写count函数,创建一个空字典,用for循环遍历该列表,将计数结果写入字典。平均运行时间937微秒。 53 | * 使用collections模块的Counter,编写fast_count函数,一行代码搞定。平均运行时间494微秒,性能几乎是原来的2倍。 54 | 55 | 56 | ```python 57 | from collections import Counter 58 | from random import randint 59 | ``` 60 | 61 | 62 | ```python 63 | def count(nums): 64 | res = dict() 65 | for x in nums: 66 | if x in res: 67 | res[x] += 1 68 | else: 69 | res[x] = 0 70 | return x 71 | ``` 72 | 73 | 74 | ```python 75 | def fast_count(nums): 76 | return Counter(nums) 77 | ``` 78 | 79 | 80 | ```python 81 | nums = [randint(1, 3) for _ in range(10000)] 82 | ``` 83 | 84 | 85 | ```python 86 | %timeit -n 1 count(nums) 87 | ``` 88 | 89 | 937 µs ± 153 µs per loop (mean ± std. dev. of 7 runs, 1 loop each) 90 | 91 | 92 | 93 | ```python 94 | %timeit -n 1 fast_count(nums) 95 | ``` 96 | 97 | 494 µs ± 240 µs per loop (mean ± std. dev. of 7 runs, 1 loop each) 98 | 99 | 100 | ## 3. heapq - 堆 101 | 给定一个列表对象,返回该列表中最小的3个元素。 102 | * 创建一个长度为10000的列表,对元素进行随机打乱。编写top_3函数,对列表进行排序,返回前3个元素。平均运行时间2.03毫秒。 103 | * 使用heapq模块,也就是我们熟悉的堆,编写fast_top_3函数。平均运行时间296微秒,性能提升了6.8倍。 104 | 105 | 106 | ```python 107 | import heapq 108 | from random import shuffle 109 | ``` 110 | 111 | 112 | ```python 113 | def top_3(nums): 114 | return sorted(nums)[:3] 115 | ``` 116 | 117 | 118 | ```python 119 | def fast_top_3(nums): 120 | return heapq.nsmallest(3, nums) 121 | ``` 122 | 123 | 124 | ```python 125 | nums = list(range(10000)) 126 | shuffle(nums) 127 | ``` 128 | 129 | 130 | ```python 131 | %timeit -n 1 top_3(nums) 132 | ``` 133 | 134 | 2.03 ms ± 236 µs per loop (mean ± std. dev. of 7 runs, 1 loop each) 135 | 136 | 137 | 138 | ```python 139 | %timeit -n 1 fast_top_3(nums) 140 | ``` 141 | 142 | 296 µs ± 56.2 µs per loop (mean ± std. dev. of 7 runs, 1 loop each) 143 | 144 | 145 | ## 4. 
itemgetter - 批量get元素 146 | 给定一个字典和一个列表,列表中包含一个或多个字典中的key,返回对应的values。 147 | * 创建一个元素数量为10万的字典,从字典的key中随机抽样10万,形成一个长度为1万的列表。编写get_items函数,平均运行时间1.12毫秒 148 | * 使用itemgetter批量读取这些元素,编写fast_get_items函数,平均运行时间836微秒,性能是原来的1.3倍。 149 | 150 | 151 | 152 | ```python 153 | from operator import itemgetter 154 | from random import choices 155 | ``` 156 | 157 | 158 | ```python 159 | def get_items(data, keys): 160 | return [data[x] for x in keys] 161 | ``` 162 | 163 | 164 | ```python 165 | def fast_get_items(data, keys): 166 | return itemgetter(*keys)(data) 167 | ``` 168 | 169 | 170 | ```python 171 | data= dict(enumerate(range(100000))) 172 | keys = choices(list(data.keys()), k=10000) 173 | ``` 174 | 175 | 176 | ```python 177 | %timeit -n 5 get_items(data, keys) 178 | ``` 179 | 180 | 1.12 ms ± 354 µs per loop (mean ± std. dev. of 7 runs, 5 loops each) 181 | 182 | 183 | 184 | ```python 185 | %timeit -n 5 fast_get_items(data, keys) 186 | ``` 187 | 188 | 836 µs ± 287 µs per loop (mean ± std. dev. of 7 runs, 5 loops each) 189 | 190 | 191 | ## 5. lru_cache - 空间换时间 192 | 给定数字n,返回长度为n的斐波那且数列 193 | * 使用递归方式,编写fib函数,并用fib_seq函数对其进行循环调用。令n等于20,平均运行时间3.28ms。 194 | * 使用@lru_cache语法糖,将已经计算出来的结果缓存起来,比如fib(4),计算fib(5)的时候可以直接调用缓存的fib(4)。平均运行时间144微秒,性能提升了22倍。 195 | 196 | 197 | ```python 198 | from functools import lru_cache 199 | ``` 200 | 201 | 202 | ```python 203 | def fib(n): 204 | if n < 2: 205 | return n 206 | return fib(n-1) + fib(n-2) 207 | 208 | def fib_seq(n): 209 | return [fib(x) for x in range(n)] 210 | ``` 211 | 212 | 213 | ```python 214 | @lru_cache(maxsize=None) 215 | def fast_fib(n): 216 | if n < 2: 217 | return n 218 | return fib(n-1) + fib(n-2) 219 | 220 | def fast_fib_seq(n): 221 | return [fast_fib(x) for x in range(n)] 222 | ``` 223 | 224 | 225 | ```python 226 | %timeit -n 5 fib_seq(20) 227 | ``` 228 | 229 | 3.28 ms ± 220 µs per loop (mean ± std. dev. 
of 7 runs, 3 loops each) 230 | 231 | 232 | 233 | ```python 234 | %timeit -n 5 fast_fib_seq(20) 235 | ``` 236 | 237 | The slowest run took 524.07 times longer than the fastest. This could mean that an intermediate result is being cached. 238 | 144 µs ± 347 µs per loop (mean ± std. dev. of 7 runs, 3 loops each) 239 | 240 | 241 | 242 | ```python 243 | 244 | ``` 245 | -------------------------------------------------------------------------------- /Python coroutines.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Python多任务处理(协程篇)\n", 8 | "作者: tushushu \n", 9 | "项目地址: https://github.com/tushushu/flying-python" 10 | ] 11 | }, 12 | { 13 | "cell_type": "markdown", 14 | "metadata": {}, 15 | "source": [ 16 | "## 1. 协程\n", 17 | "协程,英文名是Coroutine,又称为微线程,是一种用户态的轻量级线程。协程不像线程和进程那样,需要进行系统内核上的上下文切换,协程的上下文切换是由程序员决定的。协程通过 async/await 语法进行声明,是编写异步应用的推荐方式。\n" 18 | ] 19 | }, 20 | { 21 | "cell_type": "code", 22 | "execution_count": 1, 23 | "metadata": {}, 24 | "outputs": [], 25 | "source": [ 26 | "import asyncio\n", 27 | "\n", 28 | "async def hello_world():\n", 29 | " print('hello')\n", 30 | " await asyncio.sleep(1)\n", 31 | " print('world')\n", 32 | "\n", 33 | "# asyncio.run(main())" 34 | ] 35 | }, 36 | { 37 | "cell_type": "markdown", 38 | "metadata": {}, 39 | "source": [ 40 | "## 2. 
Handling I/O-bound tasks with coroutines\n", 41 | "An I/O-bound task is one where the CPU is much faster than the disk and memory it relies on, so most of the time the CPU is waiting for I/O (disk/memory) reads and writes and its load stays low. Tasks that involve network or disk I/O are I/O-bound." 42 | ] 43 | }, 44 | { 45 | "cell_type": "code", 46 | "execution_count": 1, 47 | "metadata": {}, 48 | "outputs": [], 49 | "source": [ 50 | "import requests\n", 51 | "import time" 52 | ] 53 | }, 54 | { 55 | "cell_type": "markdown", 56 | "metadata": {}, 57 | "source": [ 58 | "### 2.1 Define an I/O-bound function" 59 | ] 60 | }, 61 | { 62 | "cell_type": "code", 63 | "execution_count": 2, 64 | "metadata": {}, 65 | "outputs": [], 66 | "source": [ 67 | "def _sleep(n):\n", 68 | " time.sleep(n)\n", 69 | " print(\"Sleep for %d seconds.\" % n)" 70 | ] 71 | }, 72 | { 73 | "cell_type": "markdown", 74 | "metadata": {}, 75 | "source": [ 76 | "### 2.2 Process serially" 77 | ] 78 | }, 79 | { 80 | "cell_type": "code", 81 | "execution_count": 3, 82 | "metadata": {}, 83 | "outputs": [], 84 | "source": [ 85 | "def sleep(m, n):\n", 86 | " for i in range(m):\n", 87 | " _sleep(n)" 88 | ] 89 | }, 90 | { 91 | "cell_type": "markdown", 92 | "metadata": {}, 93 | "source": [ 94 | "### 2.3 Process with coroutines\n", 95 | "Because of a bug that has not been fixed yet, Jupyter Notebook currently cannot run the coroutine version directly, so the cell below writes it to tmp.py and it is later run as a separate script." 96 | ] 97 | }, 98 | { 99 | "cell_type": "code", 100 | "execution_count": 4, 101 | "metadata": {}, 102 | "outputs": [], 103 | "source": [ 104 | "f = open(\"tmp.py\", \"w\")\n", 105 | "f.write(\n", 106 | "\"\"\"\n", 107 | "import time\n", 108 | "import asyncio\n", 109 | "\n", 110 | "\n", 111 | "async def _sleep(n):\n", 112 | " await asyncio.sleep(n)\n", 113 | " print(\"Sleep for %d seconds.\" % n)\n", 114 | "\n", 115 | "\n", 116 | "def sleep(m, n):\n", 117 | " loop = asyncio.get_event_loop()\n", 118 | " loop.run_until_complete(asyncio.gather(*[_sleep(n) for _ in range(m)]))\n", 119 | " loop.close()\n", 120 | "\n", 121 | "\n", 122 | "if __name__ == '__main__':\n", 123 | " start = time.perf_counter()\n", 124 | " m = 3\n", 125 | " n = 1\n", 126 | " sleep(m, n)\n", 127 | " print(\"%s ran in %.5f seconds!\" % (sleep.__name__, time.perf_counter() -
start))\"\"\"\n", 128 | ")\n", 129 | "f.close()" 130 | ] 131 | }, 132 | { 133 | "cell_type": "markdown", 134 | "metadata": {}, 135 | "source": [ 136 | "### 2.4 Measure the running time\n", 137 | "- Serial version: 1 + 1 + 1 = 3 seconds \n", 138 | "- Coroutine version: max(1, 1, 1) = 1 second" 139 | ] 140 | }, 141 | { 142 | "cell_type": "code", 143 | "execution_count": 5, 144 | "metadata": {}, 145 | "outputs": [], 146 | "source": [ 147 | "def time_it(fn, *args):\n", 148 | " start = time.perf_counter()\n", 149 | " fn(*args)\n", 150 | " print(\"%s ran in %.5f seconds!\" % (fn.__name__, time.perf_counter() - start))" 151 | ] 152 | }, 153 | { 154 | "cell_type": "code", 155 | "execution_count": 6, 156 | "metadata": {}, 157 | "outputs": [ 158 | { 159 | "name": "stdout", 160 | "output_type": "stream", 161 | "text": [ 162 | "Sleep for 1 seconds.\n", 163 | "Sleep for 1 seconds.\n", 164 | "Sleep for 1 seconds.\n", 165 | "sleep ran in 3.01054 seconds!\n" 166 | ] 167 | } 168 | ], 169 | "source": [ 170 | "time_it(sleep, 3, 1)" 171 | ] 172 | }, 173 | { 174 | "cell_type": "code", 175 | "execution_count": 7, 176 | "metadata": {}, 177 | "outputs": [ 178 | { 179 | "name": "stdout", 180 | "output_type": "stream", 181 | "text": [ 182 | "Sleep for 1 seconds.\n", 183 | "Sleep for 1 seconds.\n", 184 | "Sleep for 1 seconds.\n", 185 | "sleep ran in 1.00305 seconds!\n", 186 | "\n" 187 | ] 188 | } 189 | ], 190 | "source": [ 191 | "import subprocess\n", 192 | "print(str(subprocess.check_output(\"python tmp.py\", shell=True), encoding = \"utf-8\"))" 193 | ] 194 | }, 195 | { 196 | "cell_type": "markdown", 197 | "metadata": {}, 198 | "source": [ 199 | "## References\n", 200 | "https://docs.python.org/zh-cn/3.7/library/asyncio-task.html" 201 | ] 202 | }, 203 | { 204 | "cell_type": "code", 205 | "execution_count": null, 206 | "metadata": {}, 207 | "outputs": [], 208 | "source": [] 209 | } 210 | ], 211 | "metadata": { 212 | "kernelspec": { 213 | "display_name": "Python 3", 214 | "language": "python", 215 | "name": "python3" 216 | }, 217 | "language_info": {
218 | "codemirror_mode": { 219 | "name": "ipython", 220 | "version": 3 221 | }, 222 | "file_extension": ".py", 223 | "mimetype": "text/x-python", 224 | "name": "python", 225 | "nbconvert_exporter": "python", 226 | "pygments_lexer": "ipython3", 227 | "version": "3.6.10" 228 | } 229 | }, 230 | "nbformat": 4, 231 | "nbformat_minor": 2 232 | } 233 | -------------------------------------------------------------------------------- /More efficient array.md: -------------------------------------------------------------------------------- 1 | # 4 ways to make Python array computation faster 2 | 3 | ## 1. Why are Python lists slow? 4 | A Python list is a dynamic array: its size can change, and the array stores pointers (PyObject*) to the list's elements. The elements can have different types, e.g. my_list = ['a', 1, True]; the underlying array then holds three pointers, one to each of the three elements. So why are Python lists slower than arrays in other languages? There are two main reasons: 5 | 1. Python is dynamically typed, so type checks cost extra time at run time. 6 | 2. Python, or more precisely CPython, has no JIT compiler. 7 | 8 | ## 2. How to do fast array computation in Python 9 | The mainstream solutions today are: 10 | 1. Numpy - a Numpy array behaves more like a C/C++ array: all elements share one data type, and array methods such as sum are implemented in C. 11 | 2. Numba - uses JIT compilation to speed up Numpy code. Both calls to Numpy methods and for loops over Numpy arrays get faster. 12 | 3. Numexpr - speeds up Numpy by not allocating memory for intermediate results; mainly used to evaluate expressions on large arrays. 13 | 4. Cython - lets you write C/C++ extensions for Python. 14 | 15 | The two examples below show how to speed up array computation with each of these four tools. 16 | 17 | ## 3. Sum of squares of an array 18 | 19 | 20 | ```python 21 | arr = [x for x in range(10000)] 22 | ``` 23 | 24 | ### 3.1 for loop 25 | 26 | 27 | ```python 28 | def sqr_sum(arr): 29 | total = 0 30 | for x in arr: 31 | total += x ** 2 32 | return total 33 | 34 | print("The result is:", sqr_sum(arr)) 35 | %timeit sqr_sum(arr) 36 | ``` 37 | 38 | The result is: 333283335000 39 | 2.53 ms ± 91.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) 40 | 41 | 42 | ### 3.2 Numpy 43 | 44 | 45 | ```python 46 | import numpy as np 47 | ``` 48 | 49 | 50 | ```python 51 | def sqr_sum(arr): 52 | return (arr ** 2).sum() 53 | 54 | arr = np.array(arr) 55 | print("The result is:", sqr_sum(arr)) 56 | %timeit sqr_sum(arr) 57 | ``` 58 | 59 | The result is: 333283335000 60 | 9.66 µs ± 275 ns per loop (mean ± std. dev.
of 7 runs, 100000 loops each) 61 | 62 | 63 | ### 3.3 Numba 64 | 65 | 66 | ```python 67 | from numba import jit 68 | ``` 69 | 70 | 71 | ```python 72 | @jit(nopython=True) 73 | def sqr_sum(arr): 74 | return (arr ** 2).sum() 75 | 76 | arr = np.array(arr) 77 | print("The result is:", sqr_sum(arr)) 78 | %timeit sqr_sum(arr) 79 | ``` 80 | 81 | The result is: 333283335000 82 | 3.39 µs ± 57.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) 83 | 84 | 85 | ### 3.4 Numexpr 86 | 87 | 88 | ```python 89 | import numexpr as ne 90 | ``` 91 | 92 | 93 | ```python 94 | def sqr_sum(arr): 95 | return ne.evaluate("sum(arr * arr)") 96 | 97 | arr = np.array(arr) 98 | print("The result is:", sqr_sum(arr)) 99 | %timeit sqr_sum(arr) 100 | ``` 101 | 102 | The result is: 333283335000 103 | 14.9 µs ± 144 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) 104 | 105 | 106 | ### 3.5 Cython 107 | 108 | 109 | ```python 110 | %load_ext cython 111 | ``` 112 | 113 | 114 | ```cython 115 | %%cython 116 | cimport numpy as np 117 | ctypedef np.int_t DTYPE_t 118 | 119 | def sqr_sum(np.ndarray[DTYPE_t] arr): 120 | cdef: 121 | DTYPE_t total = 0 122 | DTYPE_t x 123 | int i = 0 124 | int n = len(arr) 125 | while i < n: 126 | total += arr[i] ** 2 127 | i += 1 128 | return total 129 | ``` 130 | 131 | 132 | ```python 133 | arr = np.array(arr, dtype="int") 134 | print("The result is:", sqr_sum(arr)) 135 | %timeit sqr_sum(arr) 136 | ``` 137 | 138 | The result is: 333283335000 139 | 5.51 µs ± 62.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) 140 | 141 | 142 | ## 4. Array transformation 143 | 144 | 145 | ```python 146 | arr = [x for x in range(1000000)] 147 | ``` 148 | 149 | ### 4.1 for loop 150 | 151 | 152 | ```python 153 | def transform(arr): 154 | return [x * 2 + 1 for x in arr] 155 | 156 | print("The result is:", transform(arr)[:5], "...") 157 | %timeit transform(arr) 158 | ``` 159 | 160 | The result is: [1, 3, 5, 7, 9] ... 161 | 84.5 ms ± 381 µs per loop (mean ± std. dev.
of 7 runs, 10 loops each) 162 | 163 | 164 | ### 4.2 Numpy 165 | 166 | 167 | ```python 168 | import numpy as np 169 | ``` 170 | 171 | 172 | ```python 173 | def transform(arr): 174 | return arr * 2 + 1 175 | 176 | arr = np.array(arr) 177 | print("The result is:", transform(arr)[:5], "...") 178 | %timeit transform(arr) 179 | ``` 180 | 181 | The result is: [1 3 5 7 9] ... 182 | 803 µs ± 11.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) 183 | 184 | 185 | ### 4.3 Numba 186 | 187 | 188 | ```python 189 | from numba import jit 190 | ``` 191 | 192 | 193 | ```python 194 | @jit(nopython=True) 195 | def transform(arr): 196 | return arr * 2 + 1 197 | 198 | arr = np.array(arr) 199 | print("The result is:", transform(arr)[:5], "...") 200 | %timeit transform(arr) 201 | ``` 202 | 203 | The result is: [1 3 5 7 9] ... 204 | 498 µs ± 8.71 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) 205 | 206 | 207 | ### 4.4 Numexpr 208 | 209 | 210 | ```python 211 | import numexpr as ne 212 | ``` 213 | 214 | 215 | ```python 216 | def transform(arr): 217 | return ne.evaluate("arr * 2 + 1") 218 | 219 | arr = np.array(arr) 220 | print("The result is:", transform(arr)[:5], "...") 221 | %timeit transform(arr) 222 | ``` 223 | 224 | The result is: [1 3 5 7 9] ... 225 | 369 µs ± 13.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) 226 | 227 | 228 | ### 4.5 Cython 229 | 230 | 231 | ```python 232 | %load_ext cython 233 | ``` 234 | 235 | The cython extension is already loaded. 
To reload it, use: 236 | %reload_ext cython 237 | 238 | 239 | 240 | ```cython 241 | %%cython 242 | import numpy as np 243 | cimport numpy as np 244 | ctypedef np.int_t DTYPE_t 245 | 246 | def transform(np.ndarray[DTYPE_t] arr): 247 | cdef: 248 | np.ndarray[DTYPE_t] new_arr = np.empty_like(arr) 249 | int i = 0 250 | int n = len(arr) 251 | while i < n: 252 | new_arr[i] = arr[i] * 2 + 1 253 | i += 1 254 | return new_arr 255 | ``` 256 | 257 | 258 | ```python 259 | arr = np.array(arr) 260 | print("The result is:", transform(arr)[:5], "...") 261 | %timeit transform(arr) 262 | ``` 263 | 264 | The result is: [1 3 5 7 9] ... 265 | 887 µs ± 29.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) 266 | 267 | 268 | ## 5. References 269 | [How does python have different data types in an array?](https://stackoverflow.com/questions/10558670/how-does-python-have-different-data-types-in-an-array) 270 | [Why are Python Programs often slower than the Equivalent Program Written in C or C++?](https://stackoverflow.com/questions/3033329/why-are-python-programs-often-slower-than-the-equivalent-program-written-in-c-or) 271 | [How Fast Numpy Really is and Why?](https://towardsdatascience.com/how-fast-numpy-really-is-e9111df44347) 272 | 273 | 274 | ```python 275 | 276 | ``` 277 | -------------------------------------------------------------------------------- /Itertools for efficient looping.md: -------------------------------------------------------------------------------- 1 | # Python Itertools - efficient looping 2 | Author: tushushu 3 | Project: https://github.com/tushushu/flying-python 4 | 5 | The official Python documentation describes the itertools module as "functions creating iterators for efficient looping". Some of these tools do bring a speedup, while others are not faster at all and only save some development time; overused, they can even make code harder to read. Let's take the itertools family out for a spin. 6 | 7 | 8 | ## 1.
Cumulative sums 9 | Given a list An, return its running sums Sn. 10 | For example: 11 | * Input: [1, 2, 3, 4, 5] 12 | * Output: [1, 3, 6, 10, 15] 13 | 14 | With accumulate, the code runs about 2.9x as fast (61 µs vs. 21.3 µs). 15 | 16 | 17 | ```python 18 | from itertools import accumulate 19 | ``` 20 | 21 | 22 | ```python 23 | def _accumulate_list(arr): 24 | tot = 0 25 | for x in arr: 26 | tot += x 27 | yield tot 28 | 29 | def accumulate_list(arr): 30 | return list(_accumulate_list(arr)) 31 | ``` 32 | 33 | 34 | ```python 35 | def fast_accumulate_list(arr): 36 | return list(accumulate(arr)) 37 | ``` 38 | 39 | 40 | ```python 41 | arr = list(range(1000)) 42 | ``` 43 | 44 | 45 | ```python 46 | %timeit accumulate_list(arr) 47 | ``` 48 | 49 | 61 µs ± 2.91 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each) 50 | 51 | 52 | 53 | ```python 54 | %timeit fast_accumulate_list(arr) 55 | ``` 56 | 57 | 21.3 µs ± 811 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each) 58 | 59 | 60 | ## 2. Selecting data 61 | Given a list data and a list selectors of 0/1 flags, return the selected elements. 62 | For example: 63 | * Input: [1, 2, 3, 4, 5], [0, 1, 0, 1, 0] 64 | * Output: [2, 4] 65 | 66 | With compress, the code runs about 2.6x as fast (341 µs vs. 130 µs). 67 | 68 | 69 | ```python 70 | from itertools import compress 71 | from random import randint 72 | ``` 73 | 74 | 75 | ```python 76 | def select_data(data, selectors): 77 | return [x for x, y in zip(data, selectors) if y] 78 | ``` 79 | 80 | 81 | ```python 82 | def fast_select_data(data, selectors): 83 | return list(compress(data, selectors)) 84 | ``` 85 | 86 | 87 | ```python 88 | data = list(range(10000)) 89 | selectors = [randint(0, 1) for _ in range(10000)] 90 | ``` 91 | 92 | 93 | ```python 94 | %timeit select_data(data, selectors) 95 | ``` 96 | 97 | 341 µs ± 17.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) 98 | 99 | 100 | 101 | ```python 102 | %timeit fast_select_data(data, selectors) 103 | ``` 104 | 105 | 130 µs ± 3.19 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each) 106 | 107 | 108 | ## 3.
Permutations 109 | Given a list arr and a number k, return every ordered selection of k elements from arr. 110 | For example: 111 | * Input: [1, 2, 3], 2 112 | * Output: [(1, 2), (1, 3), (2, 1), (2, 3), (3, 1), (3, 2)] 113 | 114 | With permutations, the code runs about 10x as fast. 115 | 116 | 117 | ```python 118 | from itertools import permutations 119 | ``` 120 | 121 | 122 | ```python 123 | def _get_permutations(arr, k, i): 124 | if i == k: 125 | return [arr[:k]] 126 | res = [] 127 | for j in range(i, len(arr)): 128 | arr_cpy = arr.copy() 129 | arr_cpy[i], arr_cpy[j] = arr_cpy[j], arr_cpy[i] 130 | res += _get_permutations(arr_cpy, k, i + 1) 131 | return res 132 | 133 | def get_permutations(arr, k): 134 | return _get_permutations(arr, k, 0) 135 | ``` 136 | 137 | 138 | ```python 139 | def fast_get_permutations(arr, k): 140 | return list(permutations(arr, k)) 141 | ``` 142 | 143 | 144 | ```python 145 | arr = list(range(10)) 146 | k = 5 147 | ``` 148 | 149 | 150 | ```python 151 | %timeit -n 1 get_permutations(arr, k) 152 | ``` 153 | 154 | 15.5 ms ± 1.96 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) 155 | 156 | 157 | 158 | ```python 159 | %timeit -n 1 fast_get_permutations(arr, k) 160 | ``` 161 | 162 | 1.56 ms ± 284 µs per loop (mean ± std. dev. of 7 runs, 1 loop each) 163 | 164 | 165 | ## 4. Filtering data 166 | Given a list arr, keep only the even numbers. 167 | For example: 168 | * Input: [3, 1, 4, 5, 9, 2] 169 | * Output: [4, 2] 170 | 171 | Here filterfalse is actually slower (likely because of the per-element lambda call), so don't assume itertools is always faster. 172 | 173 | 174 | ```python 175 | from itertools import filterfalse 176 | ``` 177 | 178 | 179 | ```python 180 | def get_even_nums(arr): 181 | return [x for x in arr if x % 2 == 0] 182 | ``` 183 | 184 | 185 | ```python 186 | def fast_get_even_nums(arr): 187 | return list(filterfalse(lambda x: x % 2, arr)) 188 | ``` 189 | 190 | 191 | ```python 192 | arr = list(range(10000)) 193 | ``` 194 | 195 | 196 | ```python 197 | %timeit get_even_nums(arr) 198 | ``` 199 | 200 | 417 µs ± 18.8 µs per loop (mean ± std. dev.
of 7 runs, 1000 loops each) 201 | 202 | 203 | 204 | ```python 205 | %timeit fast_get_even_nums(arr) 206 | ``` 207 | 208 | 823 µs ± 22.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) 209 | 210 | 211 | ## 5. Conditional termination 212 | Given a list arr, add up its numbers in order and stop as soon as an element greater than target is encountered; return the sum so far. 213 | For example: 214 | * Input: [1, 2, 3, 4, 5], 3 215 | * Output: 6 (4 > 3, so the sum stops) 216 | 217 | Here takewhile is actually slower as well, so again, don't assume itertools is always faster. 218 | 219 | 220 | ```python 221 | from itertools import takewhile 222 | ``` 223 | 224 | 225 | ```python 226 | def cond_sum(arr, target): 227 | res = 0 228 | for x in arr: 229 | if x > target: 230 | break 231 | res += x 232 | return res 233 | ``` 234 | 235 | 236 | ```python 237 | def fast_cond_sum(arr, target): 238 | return sum(takewhile(lambda x: x <= target, arr)) 239 | ``` 240 | 241 | 242 | ```python 243 | arr = list(range(10000)) 244 | target = 5000 245 | ``` 246 | 247 | 248 | ```python 249 | %timeit cond_sum(arr, target) 250 | ``` 251 | 252 | 245 µs ± 11.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) 253 | 254 | 255 | 256 | ```python 257 | %timeit fast_cond_sum(arr, target) 258 | ``` 259 | 260 | 404 µs ± 13.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) 261 | 262 | 263 | ## 6.
Nested loops 264 | Given lists arr1 and arr2, return the sum of every pair of elements, one taken from each list. 265 | For example: 266 | * Input: [1, 2], [4, 5] 267 | * Output: [1 + 4, 1 + 5, 2 + 4, 2 + 5] 268 | 269 | With product, the code runs about 1.3x as fast (484 µs vs. 373 µs). 270 | 271 | 272 | ```python 273 | from itertools import product 274 | ``` 275 | 276 | 277 | ```python 278 | def _cross_sum(arr1, arr2): 279 | for x in arr1: 280 | for y in arr2: 281 | yield x + y 282 | 283 | def cross_sum(arr1, arr2): 284 | return list(_cross_sum(arr1, arr2)) 285 | ``` 286 | 287 | 288 | ```python 289 | def fast_cross_sum(arr1, arr2): 290 | return [x + y for x, y in product(arr1, arr2)] 291 | ``` 292 | 293 | 294 | ```python 295 | arr1 = list(range(100)) 296 | arr2 = list(range(100)) 297 | ``` 298 | 299 | 300 | ```python 301 | %timeit cross_sum(arr1, arr2) 302 | ``` 303 | 304 | 484 µs ± 16.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) 305 | 306 | 307 | 308 | ```python 309 | %timeit fast_cross_sum(arr1, arr2) 310 | ``` 311 | 312 | 373 µs ± 11.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) 313 | 314 | 315 | ## 7. Flattening a 2D list 316 | Given a 2D list arr2d, convert it into a 1D list. 317 | For example: 318 | * Input: [[1, 2], [3, 4]] 319 | * Output: [1, 2, 3, 4] 320 | 321 | With chain, the code runs about 5.7x as fast (379 µs vs. 66.9 µs). 322 | 323 | 324 | ```python 325 | from itertools import chain 326 | ``` 327 | 328 | 329 | ```python 330 | def _flatten(arr2d): 331 | for arr in arr2d: 332 | for x in arr: 333 | yield x 334 | 335 | def flatten(arr2d): 336 | return list(_flatten(arr2d)) 337 | ``` 338 | 339 | 340 | ```python 341 | def fast_flatten(arr2d): 342 | return list(chain(*arr2d)) 343 | ``` 344 | 345 | 346 | ```python 347 | arr2d = [[x + y * 100 for x in range(100)] for y in range(100)] 348 | ``` 349 | 350 | 351 | ```python 352 | %timeit flatten(arr2d) 353 | ``` 354 | 355 | 379 µs ± 15.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) 356 | 357 | 358 | 359 | ```python 360 | %timeit fast_flatten(arr2d) 361 | ``` 362 | 363 | 66.9 µs ± 3.43 µs per loop (mean ± std. dev.
of 7 runs, 10000 loops each) 364 | 365 | 366 | 367 | ```python 368 | 369 | ``` 370 | -------------------------------------------------------------------------------- /Python multi threads.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Python multitasking (multithreading)\n", 8 | "Author: tushushu \n", 9 | "Project: https://github.com/tushushu/flying-python" 10 | ] 11 | }, 12 | { 13 | "cell_type": "markdown", 14 | "metadata": {}, 15 | "source": [ 16 | "## 1. GIL\n", 17 | "\n", 18 | "As anyone familiar with Python knows, the Python interpreter written in C (CPython) has a global interpreter lock (GIL). Because of the GIL, the interpreter can execute the code of only one thread at a time, which severely limits the performance of multithreaded Python. For historical reasons, this lock is very hard to remove. \n", 19 | " \n", 20 | "The GIL hurts multithreaded performance because a thread can run Python code only while it holds the global lock, and there is only one such lock. So with Python threads only one thread is running at any given moment, and even on a multi-core machine pure-Python code gets the performance of a single core. \n" 21 | ] 22 | }, 23 | { 24 | "cell_type": "markdown", 25 | "metadata": {}, 26 | "source": [ 27 | "## 2. Handling I/O-bound tasks with multiple threads\n", 28 | "An I/O-bound task is one where the CPU is much faster than the disk and memory it relies on, so most of the time the CPU is waiting for I/O (disk/memory) reads and writes and its load stays low. Tasks that involve network or disk I/O are I/O-bound. While a thread is blocked in an I/O call it releases the GIL, so other threads can run in the meantime, which cuts the total waiting time." 29 | ] 30 | }, 31 | { 32 | "cell_type": "code", 33 | "execution_count": 1, 34 | "metadata": {}, 35 | "outputs": [], 36 | "source": [ 37 | "from concurrent.futures import ThreadPoolExecutor\n", 38 | "from time import sleep, time" 39 | ] 40 | }, 41 | { 42 | "cell_type": "code", 43 | "execution_count": 2, 44 | "metadata": {}, 45 | "outputs": [], 46 | "source": [ 47 | "# number of workers\n", 48 | "N = 4\n", 49 | "# create the thread pool\n", 50 | "pool = ThreadPoolExecutor(max_workers=N)" 51 | ] 52 | }, 53 | { 54 | "cell_type": "markdown", 55 | "metadata": {}, 56 | "source": [ 57 | "### 2.1 Define an I/O-bound function\n", 58 | "This function sleeps for x seconds." 59 | ] 60 | }, 61 | { 62 | "cell_type": "code", 63 | "execution_count": 3, 64 | "metadata": {}, 65 | "outputs": [], 66 | "source": [ 67 | "def io_bound_func(x):\n", 68 | " sleep(x)\n", 69 | " print(\"Sleep for %d seconds.\"
% x)" 70 | ] 71 | }, 72 | { 73 | "cell_type": "markdown", 74 | "metadata": {}, 75 | "source": [ 76 | "### 2.2 Process serially\n", 77 | "Loop over every element of the list and call io_bound_func on it." 78 | ] 79 | }, 80 | { 81 | "cell_type": "code", 82 | "execution_count": 4, 83 | "metadata": {}, 84 | "outputs": [], 85 | "source": [ 86 | "def process_array(arr):\n", 87 | " for x in arr:\n", 88 | " io_bound_func(x)" 89 | ] 90 | }, 91 | { 92 | "cell_type": "markdown", 93 | "metadata": {}, 94 | "source": [ 95 | "### 2.3 Process with multiple threads\n", 96 | "The thread pool's map method applies the same function to every element of the list." 97 | ] 98 | }, 99 | { 100 | "cell_type": "code", 101 | "execution_count": 5, 102 | "metadata": {}, 103 | "outputs": [], 104 | "source": [ 105 | "def fast_process_array(arr):\n", 106 | " for x in pool.map(io_bound_func, arr):\n", 107 | " pass" 108 | ] 109 | }, 110 | { 111 | "cell_type": "markdown", 112 | "metadata": {}, 113 | "source": [ 114 | "### 2.4 Measure the running time\n", 115 | "- Serial version: 1 + 2 + 3 = 6 seconds \n", 116 | "- Multithreaded version: max(1, 2, 3) = 3 seconds" 117 | ] 118 | }, 119 | { 120 | "cell_type": "code", 121 | "execution_count": 6, 122 | "metadata": {}, 123 | "outputs": [], 124 | "source": [ 125 | "def time_it(fn, *args):\n", 126 | " start = time()\n", 127 | " fn(*args)\n", 128 | " print(\"%s took %.5f seconds!\" % (fn.__name__, time() - start))" 129 | ] 130 | }, 131 | { 132 | "cell_type": "code", 133 | "execution_count": 7, 134 | "metadata": {}, 135 | "outputs": [ 136 | { 137 | "name": "stdout", 138 | "output_type": "stream", 139 | "text": [ 140 | "Sleep for 1 seconds.\n", 141 | "Sleep for 2 seconds.\n", 142 | "Sleep for 3 seconds.\n", 143 | "process_array took 6.00883 seconds!\n" 144 | ] 145 | } 146 | ], 147 | "source": [ 148 | "time_it(process_array, [1, 2, 3])" 149 | ] 150 | }, 151 | { 152 | "cell_type": "code", 153 | "execution_count": 8, 154 | "metadata": {}, 155 | "outputs": [ 156 | { 157 | "name": "stdout", 158 | "output_type": "stream", 159 | "text": [ 160 | "Sleep for 1 seconds.\n", 161 | "Sleep for 2 seconds.\n", 162 | "Sleep for 3 seconds.\n", 163 |
"fast_process_array took 3.00300 seconds!\n" 164 | ] 165 | } 166 | ], 167 | "source": [ 168 | "time_it(fast_process_array, [1, 2, 3])" 169 | ] 170 | }, 171 | { 172 | "cell_type": "markdown", 173 | "metadata": {}, 174 | "source": [ 175 | "## 3. CPU-bound tasks with multiple threads\n", 176 | "A CPU-bound task involves heavy computation and consumes CPU resources, such as computing pi or decoding high-definition video; it depends entirely on the CPU's processing power. While a thread runs a CPU-bound task the CPU stays busy, and the GIL is handed to another waiting thread only after a switch interval (5 ms by default in CPython 3.2+; see sys.getswitchinterval()). Add the cost of switching between threads, and the multithreaded version can end up slower than the serial one." 177 | ] 178 | }, 179 | { 180 | "cell_type": "markdown", 181 | "metadata": {}, 182 | "source": [ 183 | "### 3.1 Define a CPU-bound function\n", 184 | "This function sums the integers in [1, x]." 185 | ] 186 | }, 187 | { 188 | "cell_type": "code", 189 | "execution_count": 9, 190 | "metadata": {}, 191 | "outputs": [], 192 | "source": [ 193 | "def cpu_bound_func(x):\n", 194 | " tot = 0\n", 195 | " a = 1\n", 196 | " while a <= x:\n", 197 | " tot += a\n", 198 | " a += 1\n", 199 | " print(\"Finish sum from 1 to %d!\" % x)\n", 200 | " return tot" 201 | ] 202 | }, 203 | { 204 | "cell_type": "markdown", 205 | "metadata": {}, 206 | "source": [ 207 | "### 3.2 Process serially\n", 208 | "Loop over every element of the list and call cpu_bound_func on it." 209 | ] 210 | }, 211 | { 212 | "cell_type": "code", 213 | "execution_count": 10, 214 | "metadata": {}, 215 | "outputs": [], 216 | "source": [ 217 | "def process_array(arr):\n", 218 | " for x in arr:\n", 219 | " cpu_bound_func(x)" 220 | ] 221 | }, 222 | { 223 | "cell_type": "markdown", 224 | "metadata": {}, 225 | "source": [ 226 | "### 3.3 Process with multiple threads\n", 227 | "The thread pool's map method applies the same function to every element of the list." 228 | ] 229 | }, 230 | { 231 | "cell_type": "code", 232 | "execution_count": 11, 233 | "metadata": {}, 234 | "outputs": [], 235 | "source": [ 236 | "def fast_process_array(arr):\n", 237 | " for x in pool.map(cpu_bound_func, arr):\n", 238 | " pass" 239 | ] 240 | }, 241 | { 242 | "cell_type": "markdown", 243 | "metadata": {}, 244 | "source": [ 245 | "### 3.4 Measure the running time\n", 246 | "- Serial version: 2.1 seconds\n", 247 | "- Multithreaded version: 2.2 seconds" 248 | ] 249 | }, 250 | { 251 | "cell_type": "code", 252 | "execution_count": 12, 253 | "metadata": {}, 254 |
"outputs": [], 255 | "source": [ 256 | "def time_it(fn, *args):\n", 257 | " start = time()\n", 258 | " fn(*args)\n", 259 | " print(\"%s took %.5f seconds!\" % (fn.__name__, time() - start))" 260 | ] 261 | }, 262 | { 263 | "cell_type": "code", 264 | "execution_count": 13, 265 | "metadata": {}, 266 | "outputs": [ 267 | { 268 | "name": "stdout", 269 | "output_type": "stream", 270 | "text": [ 271 | "Finish sum from 1 to 10000000!\n", 272 | "Finish sum from 1 to 10000000!\n", 273 | "Finish sum from 1 to 10000000!\n", 274 | "process_array took 2.10489 seconds!\n" 275 | ] 276 | } 277 | ], 278 | "source": [ 279 | "time_it(process_array, [10**7, 10**7, 10**7])" 280 | ] 281 | }, 282 | { 283 | "cell_type": "code", 284 | "execution_count": 14, 285 | "metadata": {}, 286 | "outputs": [ 287 | { 288 | "name": "stdout", 289 | "output_type": "stream", 290 | "text": [ 291 | "Finish sum from 1 to 10000000!\n", 292 | "Finish sum from 1 to 10000000!\n", 293 | "Finish sum from 1 to 10000000!\n", 294 | "fast_process_array took 2.20897 seconds!\n" 295 | ] 296 | } 297 | ], 298 | "source": [ 299 | "time_it(fast_process_array, [10**7, 10**7, 10**7])" 300 | ] 301 | }, 302 | { 303 | "cell_type": "markdown", 304 | "metadata": {}, 305 | "source": [ 306 | "## References\n", 307 | "https://www.jianshu.com/p/c75ed8a6e9af \n", 308 | "https://www.cnblogs.com/tusheng/articles/10630662.html" 309 | ] 310 | }, 311 | { 312 | "cell_type": "code", 313 | "execution_count": null, 314 | "metadata": {}, 315 | "outputs": [], 316 | "source": [] 317 | } 318 | ], 319 | "metadata": { 320 | "kernelspec": { 321 | "display_name": "Python 3", 322 | "language": "python", 323 | "name": "python3" 324 | }, 325 | "language_info": { 326 | "codemirror_mode": { 327 | "name": "ipython", 328 | "version": 3 329 | }, 330 | "file_extension": ".py", 331 | "mimetype": "text/x-python", 332 | "name": "python", 333 | "nbconvert_exporter": "python", 334 | "pygments_lexer": "ipython3", 335 | "version": "3.6.6" 336 | } 337 | }, 338 | "nbformat": 4, 339 |
"nbformat_minor": 2 340 | } 341 | -------------------------------------------------------------------------------- /More efficient pandas.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Make Pandas DataFrame processing 40x faster" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "## 1. A first try\n", 15 | "Pandas is the famous workhorse of data analysis. Sometimes we need to run very complex processing on tens or even hundreds of millions of rows, and then running time is a problem that cannot be ignored. Take the simple example below: we randomly generate 1 million rows and transform the 'val' column, subtracting 1 from even values and adding 1 to odd values. Real data analysis is much more complicated than this, but since we (well, mainly I) don't have that much time to wait for results, let's keep it simple. The transform function takes 284 ms on average." 16 | ] 17 | }, 18 | { 19 | "cell_type": "code", 20 | "execution_count": 1, 21 | "metadata": {}, 22 | "outputs": [ 23 | { 24 | "data": { 25 | "text/html": [ 26 | "
" 77 | ], 78 | "text/plain": [ 79 | " genre val\n", 80 | "0 C 54\n", 81 | "1 A 5\n", 82 | "2 D 0\n", 83 | "3 D 42\n", 84 | "4 C 91" 85 | ] 86 | }, 87 | "execution_count": 1, 88 | "metadata": {}, 89 | "output_type": "execute_result" 90 | } 91 | ], 92 | "source": [ 93 | "import pandas as pd\n", 94 | "import numpy as np\n", 95 | "\n", 96 | "def gen_data(size):\n", 97 | " d = dict()\n", 98 | " d[\"genre\"] = np.random.choice([\"A\", \"B\", \"C\", \"D\"], size=size)\n", 99 | " d[\"val\"] = np.random.randint(low=0, high=100, size=size)\n", 100 | " return pd.DataFrame(d)\n", 101 | "\n", 102 | "data = gen_data(1000000)\n", 103 | "data.head()" 104 | ] 105 | }, 106 | { 107 | "cell_type": "code", 108 | "execution_count": 2, 109 | "metadata": {}, 110 | "outputs": [], 111 | "source": [ 112 | "def transform(data):\n", 113 | " data.loc[:, \"new_val\"] = data.val.apply(lambda x: x + 1 if x % 2 else x - 1)" 114 | ] 115 | }, 116 | { 117 | "cell_type": "code", 118 | "execution_count": 3, 119 | "metadata": {}, 120 | "outputs": [ 121 | { 122 | "name": "stdout", 123 | "output_type": "stream", 124 | "text": [ 125 | "284 ms ± 8.95 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n" 126 | ] 127 | } 128 | ], 129 | "source": [ 130 | "%timeit -n 1 transform(data)" 131 | ] 132 | }, 133 | { 134 | "cell_type": "markdown", 135 | "metadata": {}, 136 | "source": [ 137 | "## 2. 
Writing a C extension with Cython\n", 138 | "Let's try writing the function `x + 1 if x % 2 else x - 1` with our old friend Cython. The average running time drops to 202 ms, so it is indeed faster: about 1.4x, but still a long way from the promised 40x [facepalm]." 139 | ] 140 | }, 141 | { 142 | "cell_type": "code", 143 | "execution_count": 4, 144 | "metadata": {}, 145 | "outputs": [], 146 | "source": [ 147 | "%load_ext cython" 148 | ] 149 | }, 150 | { 151 | "cell_type": "code", 152 | "execution_count": 5, 153 | "metadata": {}, 154 | "outputs": [], 155 | "source": [ 156 | "%%cython\n", 157 | "cpdef int _transform(int x):\n", 158 | " if x % 2:\n", 159 | " return x + 1\n", 160 | " return x - 1\n", 161 | "\n", 162 | "def transform(data):\n", 163 | " data.loc[:, \"new_val\"] = data.val.apply(_transform)" 164 | ] 165 | }, 166 | { 167 | "cell_type": "code", 168 | "execution_count": 6, 169 | "metadata": {}, 170 | "outputs": [ 171 | { 172 | "name": "stdout", 173 | "output_type": "stream", 174 | "text": [ 175 | "202 ms ± 13.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n" 176 | ] 177 | } 178 | ], 179 | "source": [ 180 | "%timeit -n 1 transform(data)" 181 | ] 182 | }, 183 | { 184 | "cell_type": "markdown", 185 | "metadata": {}, 186 | "source": [ 187 | "## 3.
Reducing type conversions\n", 188 | "To cut down on conversions between C and Python types, we pass the whole 'val' column to the Cython function as a Numpy array (note the difference between cnp, the cimported C-level numpy, and np, the Python module). The average running time drops straight to 10.8 ms, about 26x faster; now there is a glimmer of hope." 189 | ] 190 | }, 191 | { 192 | "cell_type": "code", 193 | "execution_count": 7, 194 | "metadata": {}, 195 | "outputs": [], 196 | "source": [ 197 | "%%cython\n", 198 | "import numpy as np\n", 199 | "cimport numpy as cnp\n", 200 | "ctypedef cnp.int_t DTYPE_t\n", 201 | "\n", 202 | "cpdef cnp.ndarray[DTYPE_t] _transform(cnp.ndarray[DTYPE_t] arr):\n", 203 | " cdef:\n", 204 | " int i = 0\n", 205 | " int n = arr.shape[0]\n", 206 | " int x\n", 207 | " cnp.ndarray[DTYPE_t] new_arr = np.empty_like(arr)\n", 208 | "\n", 209 | " while i < n:\n", 210 | " x = arr[i]\n", 211 | " if x % 2:\n", 212 | " new_arr[i] = x + 1\n", 213 | " else:\n", 214 | " new_arr[i] = x - 1\n", 215 | " i += 1\n", 216 | " return new_arr\n", 217 | "\n", 218 | "def transform(data):\n", 219 | " data.loc[:, \"new_val\"] = _transform(data.val.values)" 220 | ] 221 | }, 222 | { 223 | "cell_type": "code", 224 | "execution_count": 8, 225 | "metadata": {}, 226 | "outputs": [ 227 | { 228 | "name": "stdout", 229 | "output_type": "stream", 230 | "text": [ 231 | "10.8 ms ± 512 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)\n" 232 | ] 233 | } 234 | ], 235 | "source": [ 236 | "%timeit -n 1 transform(data)" 237 | ] 238 | }, 239 | { 240 | "cell_type": "markdown", 241 | "metadata": {}, 242 | "source": [ 243 | "## 4.
Using unsafe array access\n", 244 | "The @cython.boundscheck(False) and @cython.wraparound(False) decorators turn off bounds checking and negative-index handling for array access, so the loop must never index outside the array. The average running time becomes 6.76 ms, about 42x faster than the original: mission accomplished." 245 | ] 246 | }, 247 | { 248 | "cell_type": "code", 249 | "execution_count": 9, 250 | "metadata": {}, 251 | "outputs": [], 252 | "source": [ 253 | "%%cython\n", 254 | "import cython\n", 255 | "import numpy as np\n", 256 | "cimport numpy as cnp\n", 257 | "ctypedef cnp.int_t DTYPE_t\n", 258 | "\n", 259 | "\n", 260 | "@cython.boundscheck(False)\n", 261 | "@cython.wraparound(False)\n", 262 | "cpdef cnp.ndarray[DTYPE_t] _transform(cnp.ndarray[DTYPE_t] arr):\n", 263 | " cdef:\n", 264 | " int i = 0\n", 265 | " int n = arr.shape[0]\n", 266 | " int x\n", 267 | " cnp.ndarray[DTYPE_t] new_arr = np.empty_like(arr)\n", 268 | "\n", 269 | " while i < n:\n", 270 | " x = arr[i]\n", 271 | " if x % 2:\n", 272 | " new_arr[i] = x + 1\n", 273 | " else:\n", 274 | " new_arr[i] = x - 1\n", 275 | " i += 1\n", 276 | " return new_arr\n", 277 | "\n", 278 | "def transform(data):\n", 279 | " data.loc[:, \"new_val\"] = _transform(data.val.values)" 280 | ] 281 | }, 282 | { 283 | "cell_type": "code", 284 | "execution_count": 10, 285 | "metadata": {}, 286 | "outputs": [ 287 | { 288 | "name": "stdout", 289 | "output_type": "stream", 290 | "text": [ 291 | "6.76 ms ± 545 µs per loop (mean ± std. dev.
of 7 runs, 1 loop each)\n" 292 | ] 293 | } 294 | ], 295 | "source": [ 296 | "%timeit -n 1 transform(data)" 297 | ] 298 | } 299 | ], 300 | "metadata": { 301 | "kernelspec": { 302 | "display_name": "Python 3", 303 | "language": "python", 304 | "name": "python3" 305 | }, 306 | "language_info": { 307 | "codemirror_mode": { 308 | "name": "ipython", 309 | "version": 3 310 | }, 311 | "file_extension": ".py", 312 | "mimetype": "text/x-python", 313 | "name": "python", 314 | "nbconvert_exporter": "python", 315 | "pygments_lexer": "ipython3", 316 | "version": "3.6.6" 317 | } 318 | }, 319 | "nbformat": 4, 320 | "nbformat_minor": 2 321 | } 322 | -------------------------------------------------------------------------------- /Python Standard Library.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Writing efficient code with the Python standard library\n", 8 | "Author: tushushu \n", 9 | "Project URL: https://github.com/tushushu/flying-python" 10 | ] 11 | }, 12 | { 13 | "cell_type": "markdown", 14 | "metadata": {}, 15 | "source": [ 16 | "## 1. bisect - binary search\n", 17 | "Given a list, we want to look up a target element and return its index in the list. \n", 18 | "* The first idea is the list's index method. We build an ascending list of length 10000 and write a search function that looks up every element with index; the average running time is 437 ms.\n", 19 | "* bisect_left from the bisect module performs the familiar binary search, which requires the list to be sorted. list.index scans linearly (O(n) per lookup), while binary search needs only O(log n). The fast_search function built on bisect_left runs in 3.94 ms on average, about 110x faster!"
20 | ] 21 | }, 22 | { 23 | "cell_type": "code", 24 | "execution_count": 1, 25 | "metadata": {}, 26 | "outputs": [], 27 | "source": [ 28 | "import bisect" 29 | ] 30 | }, 31 | { 32 | "cell_type": "code", 33 | "execution_count": 2, 34 | "metadata": {}, 35 | "outputs": [], 36 | "source": [ 37 | "def search(nums):\n", 38 | " for x in nums:\n", 39 | " nums.index(x)" 40 | ] 41 | }, 42 | { 43 | "cell_type": "code", 44 | "execution_count": 3, 45 | "metadata": {}, 46 | "outputs": [], 47 | "source": [ 48 | "def fast_search(nums):\n", 49 | " for x in nums:\n", 50 | " bisect.bisect_left(nums, x)" 51 | ] 52 | }, 53 | { 54 | "cell_type": "code", 55 | "execution_count": 4, 56 | "metadata": {}, 57 | "outputs": [], 58 | "source": [ 59 | "arr = list(range(10000))" 60 | ] 61 | }, 62 | { 63 | "cell_type": "code", 64 | "execution_count": 5, 65 | "metadata": {}, 66 | "outputs": [ 67 | { 68 | "name": "stdout", 69 | "output_type": "stream", 70 | "text": [ 71 | "437 ms ± 12.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n" 72 | ] 73 | } 74 | ], 75 | "source": [ 76 | "%timeit -n 1 search(arr)" 77 | ] 78 | }, 79 | { 80 | "cell_type": "code", 81 | "execution_count": 6, 82 | "metadata": {}, 83 | "outputs": [ 84 | { 85 | "name": "stdout", 86 | "output_type": "stream", 87 | "text": [ 88 | "3.94 ms ± 407 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)\n" 89 | ] 90 | } 91 | ], 92 | "source": [ 93 | "%timeit -n 1 fast_search(arr)" 94 | ] 95 | }, 96 | { 97 | "cell_type": "markdown", 98 | "metadata": {}, 99 | "source": [ 100 | "## 2. 
Counter - efficient counting\n", 101 | "Given a list, we want to count how many times each distinct element occurs and return the result as a dictionary. \n", 102 | "* Create a list of 10000 random integers between 1 and 3. The count function creates an empty dictionary, iterates over the list with a for loop and writes the counts into the dictionary. Average running time: 937 µs.\n", 103 | "* With Counter from the collections module, the fast_count function takes a single line. Average running time: 494 µs, almost 2x faster." 104 | ] 105 | }, 106 | { 107 | "cell_type": "code", 108 | "execution_count": 7, 109 | "metadata": {}, 110 | "outputs": [], 111 | "source": [ 112 | "from collections import Counter\n", 113 | "from random import randint" 114 | ] 115 | }, 116 | { 117 | "cell_type": "code", 118 | "execution_count": 8, 119 | "metadata": {}, 120 | "outputs": [], 121 | "source": [ 122 | "def count(nums):\n", 123 | " res = dict()\n", 124 | " for x in nums:\n", 125 | " if x in res:\n", 126 | " res[x] += 1\n", 127 | " else:\n", 128 | " res[x] = 1\n", 129 | " return res" 130 | ] 131 | }, 132 | { 133 | "cell_type": "code", 134 | "execution_count": 9, 135 | "metadata": {}, 136 | "outputs": [], 137 | "source": [ 138 | "def fast_count(nums):\n", 139 | " return Counter(nums)" 140 | ] 141 | }, 142 | { 143 | "cell_type": "code", 144 | "execution_count": 10, 145 | "metadata": {}, 146 | "outputs": [], 147 | "source": [ 148 | "nums = [randint(1, 3) for _ in range(10000)]" 149 | ] 150 | }, 151 | { 152 | "cell_type": "code", 153 | "execution_count": 11, 154 | "metadata": {}, 155 | "outputs": [ 156 | { 157 | "name": "stdout", 158 | "output_type": "stream", 159 | "text": [ 160 | "937 µs ± 153 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)\n" 161 | ] 162 | } 163 | ], 164 | "source": [ 165 | "%timeit -n 1 count(nums)" 166 | ] 167 | }, 168 | { 169 | "cell_type": "code", 170 | "execution_count": 12, 171 | "metadata": {}, 172 | "outputs": [ 173 | { 174 | "name": "stdout", 175 | "output_type": "stream", 176 | "text": [ 177 | "494 µs ± 240 µs per loop (mean ± std. dev. 
of 7 runs, 1 loop each)\n" 178 | ] 179 | } 180 | ], 181 | "source": [ 182 | "%timeit -n 1 fast_count(nums)" 183 | ] 184 | }, 185 | { 186 | "cell_type": "markdown", 187 | "metadata": {}, 188 | "source": [ 189 | "## 3. heapq - heap\n", 190 | "Given a list, return its 3 smallest elements.\n", 191 | "* Create a list of length 10000 and shuffle it. The top_3 function sorts the whole list and returns the first 3 elements. Average running time: 2.03 ms.\n", 192 | "* The fast_top_3 function uses heapq.nsmallest from the heapq module, i.e. the familiar heap, which only has to keep the 3 smallest items seen so far instead of sorting everything. Average running time: 296 µs, about 6.8x faster." 193 | ] 194 | }, 195 | { 196 | "cell_type": "code", 197 | "execution_count": 13, 198 | "metadata": {}, 199 | "outputs": [], 200 | "source": [ 201 | "import heapq\n", 202 | "from random import shuffle" 203 | ] 204 | }, 205 | { 206 | "cell_type": "code", 207 | "execution_count": 14, 208 | "metadata": {}, 209 | "outputs": [], 210 | "source": [ 211 | "def top_3(nums):\n", 212 | " return sorted(nums)[:3]" 213 | ] 214 | }, 215 | { 216 | "cell_type": "code", 217 | "execution_count": 15, 218 | "metadata": {}, 219 | "outputs": [], 220 | "source": [ 221 | "def fast_top_3(nums):\n", 222 | " return heapq.nsmallest(3, nums)" 223 | ] 224 | }, 225 | { 226 | "cell_type": "code", 227 | "execution_count": 16, 228 | "metadata": {}, 229 | "outputs": [], 230 | "source": [ 231 | "nums = list(range(10000))\n", 232 | "shuffle(nums)" 233 | ] 234 | }, 235 | { 236 | "cell_type": "code", 237 | "execution_count": 17, 238 | "metadata": {}, 239 | "outputs": [ 240 | { 241 | "name": "stdout", 242 | "output_type": "stream", 243 | "text": [ 244 | "2.03 ms ± 236 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)\n" 245 | ] 246 | } 247 | ], 248 | "source": [ 249 | "%timeit -n 1 top_3(nums)" 250 | ] 251 | }, 252 | { 253 | "cell_type": "code", 254 | "execution_count": 18, 255 | "metadata": {}, 256 | "outputs": [ 257 | { 258 | "name": "stdout", 259 | "output_type": "stream", 260 | "text": [ 261 | "296 µs ± 56.2 µs per loop (mean ± std. dev. 
of 7 runs, 1 loop each)\n" 262 | ] 263 | } 264 | ], 265 | "source": [ 266 | "%timeit -n 1 fast_top_3(nums)" 267 | ] 268 | }, 269 | { 270 | "cell_type": "markdown", 271 | "metadata": {}, 272 | "source": [ 273 | "## 4. itemgetter - fetching items in bulk\n", 274 | "Given a dictionary and a list containing one or more of its keys, return the corresponding values.\n", 275 | "* Create a dictionary with 100,000 entries and randomly sample 10,000 of its keys (with replacement) to form a list of length 10,000. The get_items function takes 1.12 ms on average.\n", 276 | "* The fast_get_items function reads all these items at once with itemgetter and takes 836 µs on average, about 1.3x faster. Note that itemgetter returns a tuple for several keys but the bare value for a single key.\n" 277 | ] 278 | }, 279 | { 280 | "cell_type": "code", 281 | "execution_count": 19, 282 | "metadata": {}, 283 | "outputs": [], 284 | "source": [ 285 | "from operator import itemgetter\n", 286 | "from random import choices" 287 | ] 288 | }, 289 | { 290 | "cell_type": "code", 291 | "execution_count": 20, 292 | "metadata": {}, 293 | "outputs": [], 294 | "source": [ 295 | "def get_items(data, keys):\n", 296 | " return [data[x] for x in keys]" 297 | ] 298 | }, 299 | { 300 | "cell_type": "code", 301 | "execution_count": 21, 302 | "metadata": {}, 303 | "outputs": [], 304 | "source": [ 305 | "def fast_get_items(data, keys):\n", 306 | " return itemgetter(*keys)(data)" 307 | ] 308 | }, 309 | { 310 | "cell_type": "code", 311 | "execution_count": 22, 312 | "metadata": {}, 313 | "outputs": [], 314 | "source": [ 315 | "data = dict(enumerate(range(100000)))\n", 316 | "keys = choices(list(data.keys()), k=10000)" 317 | ] 318 | }, 319 | { 320 | "cell_type": "code", 321 | "execution_count": 23, 322 | "metadata": {}, 323 | "outputs": [ 324 | { 325 | "name": "stdout", 326 | "output_type": "stream", 327 | "text": [ 328 | "1.12 ms ± 354 µs per loop (mean ± std. dev. of 7 runs, 5 loops each)\n" 329 | ] 330 | } 331 | ], 332 | "source": [ 333 | "%timeit -n 5 get_items(data, keys)" 334 | ] 335 | }, 336 | { 337 | "cell_type": "code", 338 | "execution_count": 24, 339 | "metadata": {}, 340 | "outputs": [ 341 | { 342 | "name": "stdout", 343 | "output_type": "stream", 344 | "text": [ 345 | "836 µs ± 287 µs per loop (mean ± std. dev. 
of 7 runs, 5 loops each)\n" 346 | ] 347 | } 348 | ], 349 | "source": [ 350 | "%timeit -n 5 fast_get_items(data, keys)" 351 | ] 352 | }, 353 | { 354 | "cell_type": "markdown", 355 | "metadata": {}, 356 | "source": [ 357 | "## 5. lru_cache - trading space for time\n", 358 | "Given a number n, return the first n Fibonacci numbers.\n", 359 | "* The fib function computes each number recursively, and fib_seq calls it in a loop. With n = 20 the average running time is 3.28 ms.\n", 360 | "* The @lru_cache decorator caches results that have already been computed: once fast_fib(4) is cached, computing fast_fib(5) can reuse it directly. Average running time: 144 µs, about 22x faster. Because the cache persists across %timeit runs, later runs are mostly cache hits, which is why the slowest run below took far longer than the fastest." 361 | ] 362 | }, 363 | { 364 | "cell_type": "code", 365 | "execution_count": 25, 366 | "metadata": {}, 367 | "outputs": [], 368 | "source": [ 369 | "from functools import lru_cache" 370 | ] 371 | }, 372 | { 373 | "cell_type": "code", 374 | "execution_count": 36, 375 | "metadata": {}, 376 | "outputs": [], 377 | "source": [ 378 | "def fib(n):\n", 379 | " if n < 2:\n", 380 | " return n\n", 381 | " return fib(n-1) + fib(n-2)\n", 382 | "\n", 383 | "def fib_seq(n):\n", 384 | " return [fib(x) for x in range(n)]" 385 | ] 386 | }, 387 | { 388 | "cell_type": "code", 389 | "execution_count": 27, 390 | "metadata": {}, 391 | "outputs": [], 392 | "source": [ 393 | "@lru_cache(maxsize=None)\n", 394 | "def fast_fib(n):\n", 395 | " if n < 2:\n", 396 | " return n\n", 397 | " return fast_fib(n-1) + fast_fib(n-2)\n", 398 | "\n", 399 | "def fast_fib_seq(n):\n", 400 | " return [fast_fib(x) for x in range(n)]" 401 | ] 402 | }, 403 | { 404 | "cell_type": "code", 405 | "execution_count": 28, 406 | "metadata": {}, 407 | "outputs": [ 408 | { 409 | "name": "stdout", 410 | "output_type": "stream", 411 | "text": [ 412 | "3.28 ms ± 220 µs per loop (mean ± std. dev. of 7 runs, 3 loops each)\n" 413 | ] 414 | } 415 | ], 416 | "source": [ 417 | "%timeit -n 3 fib_seq(20)" 418 | ] 419 | }, 420 | { 421 | "cell_type": "code", 422 | "execution_count": 29, 423 | "metadata": {}, 424 | "outputs": [ 425 | { 426 | "name": "stdout", 427 | "output_type": "stream", 428 | "text": [ 429 | "The slowest run took 524.07 times longer than the fastest. 
This could mean that an intermediate result is being cached.\n", 430 | "144 µs ± 347 µs per loop (mean ± std. dev. of 7 runs, 3 loops each)\n" 431 | ] 432 | } 433 | ], 434 | "source": [ 435 | "%timeit -n 3 fast_fib_seq(20)" 436 | ] 437 | }, 438 | { 439 | "cell_type": "code", 440 | "execution_count": null, 441 | "metadata": {}, 442 | "outputs": [], 443 | "source": [] 444 | } 445 | ], 446 | "metadata": { 447 | "kernelspec": { 448 | "display_name": "Python 3", 449 | "language": "python", 450 | "name": "python3" 451 | }, 452 | "language_info": { 453 | "codemirror_mode": { 454 | "name": "ipython", 455 | "version": 3 456 | }, 457 | "file_extension": ".py", 458 | "mimetype": "text/x-python", 459 | "name": "python", 460 | "nbconvert_exporter": "python", 461 | "pygments_lexer": "ipython3", 462 | "version": "3.6.6" 463 | } 464 | }, 465 | "nbformat": 4, 466 | "nbformat_minor": 2 467 | } 468 | -------------------------------------------------------------------------------- /Built-in method.md: -------------------------------------------------------------------------------- 1 | # Optimizing Python performance with built-in functions 2 | Author: tushushu 3 | Project URL: https://github.com/tushushu/flying-python 4 | 5 | One possible reason a Python program runs too slowly is that it does not make enough use of built-in functions. The five examples below show how built-ins can improve the performance of Python programs. 6 | 7 | ## 1. Sum of squares of an array 8 | Given a list, compute the sum of the squares of its numbers. The final version is about 1.4x as fast as the first. 9 | 10 | First, create a list of length 10000. 11 | 12 | 13 | ```python 14 | arr = list(range(10000)) 15 | ``` 16 | 17 | ### 1.1 The most conventional approach 18 | Iterate over the list with a while loop and accumulate the squares. Average running time: 2.97 ms. 19 | 20 | 21 | ```python 22 | def sum_sqr_0(arr): 23 | res = 0 24 | n = len(arr) 25 | i = 0 26 | while i < n: 27 | res += arr[i] ** 2 28 | i += 1 29 | return res 30 | ``` 31 | 32 | 33 | ```python 34 | %timeit sum_sqr_0(arr) 35 | ``` 36 | 37 | 2.97 ms ± 36.4 µs per loop (mean ± std. dev. 
of 7 runs, 100 loops each) 38 | 39 | 40 | ### 1.2 for range instead of a while loop 41 | Avoids the overhead of executing `i += 1` and the `i < n` comparison, with their dynamic type checks, as interpreted bytecode on every iteration. Average running time: 2.9 ms. 42 | 43 | 44 | ```python 45 | def sum_sqr_1(arr): 46 | res = 0 47 | for i in range(len(arr)): 48 | res += arr[i] ** 2 49 | return res 50 | ``` 51 | 52 | 53 | ```python 54 | %timeit sum_sqr_1(arr) 55 | ``` 56 | 57 | 2.9 ms ± 137 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) 58 | 59 | 60 | ### 1.3 for x in arr instead of for range 61 | Avoids the overhead of the `arr[i]` indexing operation, with its type checks, on every iteration. Average running time: 2.59 ms. 62 | 63 | 64 | ```python 65 | def sum_sqr_2(arr): 66 | res = 0 67 | for x in arr: 68 | res += x ** 2 69 | return res 70 | ``` 71 | 72 | 73 | ```python 74 | %timeit sum_sqr_2(arr) 75 | ``` 76 | 77 | 2.59 ms ± 89 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) 78 | 79 | 80 | ### 1.4 sum over map 81 | Average running time: 2.36 ms. 82 | 83 | 84 | ```python 85 | def sum_sqr_3(arr): 86 | return sum(map(lambda x: x**2, arr)) 87 | ``` 88 | 89 | 90 | ```python 91 | %timeit sum_sqr_3(arr) 92 | ``` 93 | 94 | 2.36 ms ± 15.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) 95 | 96 | 97 | ### 1.5 sum over a generator expression 98 | When a generator expression is the only argument of a function call, its own parentheses can be omitted. Average running time: 2.35 ms. 99 | 100 | 101 | ```python 102 | def sum_sqr_4(arr): 103 | return sum(x ** 2 for x in arr) 104 | ``` 105 | 106 | 107 | ```python 108 | %timeit sum_sqr_4(arr) 109 | ``` 110 | 111 | 2.35 ms ± 107 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) 112 | 113 | 114 | ### 1.6 sum over a list comprehension 115 | Average running time: 2.06 ms. 116 | 117 | 118 | ```python 119 | def sum_sqr_5(arr): 120 | return sum([x ** 2 for x in arr]) 121 | ``` 122 | 123 | 124 | ```python 125 | %timeit sum_sqr_5(arr) 126 | ``` 127 | 128 | 2.06 ms ± 27.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) 129 | 130 | 131 | ## 2. 
String concatenation 132 | Given a list of strings, concatenate the first 3 characters of each string into a single string. The final version is about 2.1x as fast as the first. 133 | 134 | First, create a list of 10000 strings with random lengths and contents. 135 | 136 | 137 | ```python 138 | from random import randint 139 | 140 | def random_letter(): 141 | return chr(ord('a') + randint(0, 25)) 142 | 143 | def random_letters(n): 144 | return "".join([random_letter() for _ in range(n)]) 145 | 146 | strings = [random_letters(randint(1, 10)) for _ in range(10000)] 147 | ``` 148 | 149 | ### 2.1 The most conventional approach 150 | Iterate over the list with a while loop and concatenate the strings. Average running time: 1.86 ms. 151 | 152 | 153 | ```python 154 | def concat_strings_0(strings): 155 | res = "" 156 | n = len(strings) 157 | i = 0 158 | while i < n: 159 | res += strings[i][:3] 160 | i += 1 161 | return res 162 | ``` 163 | 164 | 165 | ```python 166 | %timeit concat_strings_0(strings) 167 | ``` 168 | 169 | 1.86 ms ± 74.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) 170 | 171 | 172 | ### 2.2 for range instead of a while loop 173 | Avoids the overhead of executing `i += 1` and the `i < n` comparison as interpreted bytecode on every iteration. Average running time: 1.55 ms. 174 | 175 | 176 | ```python 177 | def concat_strings_1(strings): 178 | res = "" 179 | for i in range(len(strings)): 180 | res += strings[i][:3] 181 | return res 182 | ``` 183 | 184 | 185 | ```python 186 | %timeit concat_strings_1(strings) 187 | ``` 188 | 189 | 1.55 ms ± 32.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) 190 | 191 | 192 | ### 2.3 for x in strings instead of for range 193 | Avoids the overhead of the `strings[i]` indexing operation on every iteration. Average running time: 1.32 ms. 194 | 195 | 196 | ```python 197 | def concat_strings_2(strings): 198 | res = "" 199 | for x in strings: 200 | res += x[:3] 201 | return res 202 | ``` 203 | 204 | 205 | ```python 206 | %timeit concat_strings_2(strings) 207 | ``` 208 | 209 | 1.32 ms ± 19.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) 210 | 211 | 212 | ### 2.4 .join over a generator expression 213 | Average running time: 1.06 ms. 214 | 215 | 216 | ```python 217 | def concat_strings_3(strings): 218 | return "".join(x[:3] for x in strings) 219 | ``` 220 | 221 | 222 | ```python 223 | %timeit concat_strings_3(strings) 224 | ``` 225 | 226 | 1.06 ms ± 15.2 µs per loop (mean ± std. 
dev. of 7 runs, 1000 loops each) 227 | 228 | 229 | ### 2.5 .join over a list comprehension 230 | Average running time: 0.858 ms. In CPython, str.join first turns its argument into a list anyway, so passing a list directly skips that step, which is why this beats the generator version. 231 | 232 | 233 | ```python 234 | def concat_strings_4(strings): 235 | return "".join([x[:3] for x in strings]) 236 | ``` 237 | 238 | 239 | ```python 240 | %timeit concat_strings_4(strings) 241 | ``` 242 | 243 | 858 µs ± 14.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) 244 | 245 | 246 | ## 3. Filtering odd numbers 247 | 248 | Given a list, select all of its odd numbers. The final version is about 3.6x as fast as the first. 249 | 250 | First, create a list of length 10000. 251 | 252 | 253 | ```python 254 | arr = list(range(10000)) 255 | ``` 256 | 257 | ### 3.1 The most conventional approach 258 | Create an empty list res, iterate over the input with a while loop and append every odd number to res. Average running time: 1.03 ms. 259 | 260 | 261 | ```python 262 | def filter_odd_0(arr): 263 | res = [] 264 | i = 0 265 | n = len(arr) 266 | while i < n: 267 | if arr[i] % 2: 268 | res.append(arr[i]) 269 | i += 1 270 | return res 271 | ``` 272 | 273 | 274 | ```python 275 | %timeit filter_odd_0(arr) 276 | ``` 277 | 278 | 1.03 ms ± 34.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) 279 | 280 | 281 | ### 3.2 for range instead of a while loop 282 | Avoids the overhead of executing `i += 1` and the `i < n` comparison as interpreted bytecode on every iteration. Average running time: 0.965 ms. 283 | 284 | 285 | ```python 286 | def filter_odd_1(arr): 287 | res = [] 288 | for i in range(len(arr)): 289 | if arr[i] % 2: 290 | res.append(arr[i]) 291 | 292 | return res 293 | ``` 294 | 295 | 296 | ```python 297 | %timeit filter_odd_1(arr) 298 | ``` 299 | 300 | 965 µs ± 4.02 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) 301 | 302 | 303 | ### 3.3 for x in arr instead of for range 304 | Avoids the overhead of the `arr[i]` indexing operation on every iteration. Average running time: 0.430 ms. 305 | 306 | 307 | ```python 308 | def filter_odd_2(arr): 309 | res = [] 310 | for x in arr: 311 | if x % 2: 312 | res.append(x) 313 | return res 314 | ``` 315 | 316 | 317 | ```python 318 | %timeit filter_odd_2(arr) 319 | ``` 320 | 321 | 430 µs ± 9.25 µs per loop (mean ± std. dev. 
of 7 runs, 1000 loops each) 322 | 323 | 324 | ### 3.4 list over filter 325 | Average running time: 0.763 ms. filter is slow here because it calls the Python-level lambda once per element, which makes it of little use for this task in Python 3.6. 326 | 327 | 328 | ```python 329 | def filter_odd_3(arr): 330 | return list(filter(lambda x: x % 2, arr)) 331 | ``` 332 | 333 | 334 | ```python 335 | %timeit filter_odd_3(arr) 336 | ``` 337 | 338 | 763 µs ± 15.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) 339 | 340 | 341 | ### 3.5 list over a generator expression 342 | Average running time: 0.398 ms. 343 | 344 | 345 | ```python 346 | def filter_odd_4(arr): 347 | return list(x for x in arr if x % 2) 348 | ``` 349 | 350 | 351 | ```python 352 | %timeit filter_odd_4(arr) 353 | ``` 354 | 355 | 398 µs ± 16.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) 356 | 357 | 358 | ### 3.6 List comprehension with a condition 359 | Average running time: 0.290 ms. 360 | 361 | 362 | ```python 363 | def filter_odd_5(arr): 364 | return [x for x in arr if x % 2] 365 | ``` 366 | 367 | 368 | ```python 369 | %timeit filter_odd_5(arr) 370 | ``` 371 | 372 | 290 µs ± 5.54 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) 373 | 374 | 375 | ## 4. Adding two arrays 376 | 377 | Given two lists of the same length, compute the sums of the elements at corresponding positions and return them as a list of the same length. The final version is about 2.7x as fast as the first. 378 | 379 | First, create two lists of length 10000. 380 | 381 | 382 | ```python 383 | arr1 = list(range(10000)) 384 | arr2 = list(range(10000)) 385 | ``` 386 | 387 | ### 4.1 The most conventional approach 388 | Create an empty list res, iterate with a while loop and append the sum of the corresponding elements of the two lists to res. Average running time: 1.23 ms. 389 | 390 | 391 | ```python 392 | def arr_sum_0(arr1, arr2): 393 | i = 0 394 | n = len(arr1) 395 | res = [] 396 | while i < n: 397 | res.append(arr1[i] + arr2[i]) 398 | i += 1 399 | return res 400 | ``` 401 | 402 | 403 | ```python 404 | %timeit arr_sum_0(arr1, arr2) 405 | ``` 406 | 407 | 1.23 ms ± 3.77 µs per loop (mean ± std. dev. 
of 7 runs, 1000 loops each) 408 | 409 | 410 | ### 4.2 for range instead of a while loop 411 | Avoids the overhead of executing `i += 1` and the `i < n` comparison as interpreted bytecode on every iteration. Average running time: 0.997 ms. 412 | 413 | 414 | ```python 415 | def arr_sum_1(arr1, arr2): 416 | res = [] 417 | for i in range(len(arr1)): 418 | res.append(arr1[i] + arr2[i]) 419 | return res 420 | ``` 421 | 422 | 423 | ```python 424 | %timeit arr_sum_1(arr1, arr2) 425 | ``` 426 | 427 | 997 µs ± 7.42 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) 428 | 429 | 430 | ### 4.3 for i, x in enumerate instead of for range 431 | Partly avoids the overhead of indexing with `arr[i]`: arr2 is iterated directly, while res is still indexed. Average running time: 0.799 ms. 432 | 433 | 434 | ```python 435 | def arr_sum_2(arr1, arr2): 436 | res = arr1.copy() 437 | for i, x in enumerate(arr2): 438 | res[i] += x 439 | return res 440 | ``` 441 | 442 | 443 | ```python 444 | %timeit arr_sum_2(arr1, arr2) 445 | ``` 446 | 447 | 799 µs ± 16.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) 448 | 449 | 450 | ### 4.4 for x, y in zip instead of for range 451 | Avoids the overhead of indexing with `arr[i]` on every iteration. Average running time: 0.769 ms. 452 | 453 | 454 | ```python 455 | def arr_sum_3(arr1, arr2): 456 | res = [] 457 | for x, y in zip(arr1, arr2): 458 | res.append(x + y) 459 | return res 460 | ``` 461 | 462 | 463 | ```python 464 | %timeit arr_sum_3(arr1, arr2) 465 | ``` 466 | 467 | 769 µs ± 12.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) 468 | 469 | 470 | ### 4.5 List comprehension over zip 471 | Average running time: 0.462 ms. 472 | 473 | 474 | ```python 475 | def arr_sum_4(arr1, arr2): 476 | return [x + y for x, y in zip(arr1, arr2)] 477 | ``` 478 | 479 | 480 | ```python 481 | %timeit arr_sum_4(arr1, arr2) 482 | ``` 483 | 484 | 462 µs ± 3.43 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) 485 | 486 | 487 | ## 5. 
Number of common elements in two lists 488 | Given two lists, count how many elements they have in common. The elements within each list are distinct. The final version is about 5000x as fast as the first. 489 | 490 | First, create two lists and shuffle their elements. 491 | 492 | 493 | ```python 494 | from random import shuffle 495 | arr1 = list(range(2000)) 496 | shuffle(arr1) 497 | arr2 = list(range(1000, 3000)) 498 | shuffle(arr2) 499 | ``` 500 | 501 | ### 5.1 The most conventional approach 502 | Use nested while loops to check whether arr1[i] equals arr2[j]. Average running time: 338 ms. 503 | 504 | 505 | ```python 506 | def n_common_0(arr1, arr2): 507 | res = 0 508 | i = 0 509 | m = len(arr1) 510 | n = len(arr2) 511 | while i < m: 512 | j = 0 513 | while j < n: 514 | if arr1[i] == arr2[j]: 515 | res += 1 516 | j += 1 517 | i += 1 518 | return res 519 | ``` 520 | 521 | 522 | ```python 523 | %timeit n_common_0(arr1, arr2) 524 | ``` 525 | 526 | 338 ms ± 7.81 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) 527 | 528 | 529 | ### 5.2 for range instead of while loops 530 | Avoids the overhead of executing the `i += 1` and `j += 1` increments and the loop comparisons as interpreted bytecode. Average running time: 233 ms. 531 | 532 | 533 | ```python 534 | def n_common_1(arr1, arr2): 535 | res = 0 536 | for i in range(len(arr1)): 537 | for j in range(len(arr2)): 538 | if arr1[i] == arr2[j]: 539 | res += 1 540 | return res 541 | ``` 542 | 543 | 544 | ```python 545 | %timeit n_common_1(arr1, arr2) 546 | ``` 547 | 548 | 233 ms ± 10.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) 549 | 550 | 551 | ### 5.3 for x in arr instead of for range 552 | Avoids the overhead of indexing with `arr1[i]` and `arr2[j]`. Average running time: 84.8 ms. 553 | 554 | 555 | ```python 556 | def n_common_2(arr1, arr2): 557 | res = 0 558 | for x in arr1: 559 | for y in arr2: 560 | if x == y: 561 | res += 1 562 | return res 563 | ``` 564 | 565 | 566 | ```python 567 | %timeit n_common_2(arr1, arr2) 568 | ``` 569 | 570 | 84.8 ms ± 1.38 ms per loop (mean ± std. dev. 
of 7 runs, 10 loops each) 571 | 572 | 573 | ### 5.4 if x in arr2 instead of the inner loop 574 | Average running time: 24.9 ms. `x in arr2` still scans the list linearly, but the scan runs in C rather than in Python bytecode. 575 | 576 | 577 | ```python 578 | def n_common_3(arr1, arr2): 579 | res = 0 580 | for x in arr1: 581 | if x in arr2: 582 | res += 1 583 | return res 584 | ``` 585 | 586 | 587 | ```python 588 | %timeit n_common_3(arr1, arr2) 589 | ``` 590 | 591 | 24.9 ms ± 1.39 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) 592 | 593 | 594 | ### 5.5 A faster algorithm 595 | Sort both lists with the .sort method, then walk through them in a single two-pointer loop. This lowers the time complexity from O(n²) to O(n log n). Average running time: 0.329 ms. Note that .sort modifies the input lists in place, so after the first %timeit run the lists are already sorted, which makes this timing look better than it would on freshly shuffled input. 596 | 597 | 598 | ```python 599 | def n_common_4(arr1, arr2): 600 | arr1.sort() 601 | arr2.sort() 602 | res = i = j = 0 603 | m, n = len(arr1), len(arr2) 604 | while i < m and j < n: 605 | if arr1[i] == arr2[j]: 606 | res += 1 607 | i += 1 608 | j += 1 609 | elif arr1[i] > arr2[j]: 610 | j += 1 611 | else: 612 | i += 1 613 | return res 614 | ``` 615 | 616 | 617 | ```python 618 | %timeit n_common_4(arr1, arr2) 619 | ``` 620 | 621 | 329 µs ± 12.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) 622 | 623 | 624 | ### 5.6 A better data structure 625 | Convert both lists to sets and take the size of their intersection, which takes O(m + n) time on average. Average running time: 0.067 ms. 626 | 627 | 628 | ```python 629 | def n_common_5(arr1, arr2): 630 | return len(set(arr1) & set(arr2)) 631 | ``` 632 | 633 | 634 | ```python 635 | %timeit n_common_5(arr1, arr2) 636 | ``` 637 | 638 | 67.2 µs ± 755 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each) 639 | 640 | 641 | 642 | ```python 643 | 644 | ``` 645 | -------------------------------------------------------------------------------- /More efficient array.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# 4 ways to make Python arrays more efficient" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "## 1. 
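Why Python lists are slow\n", 15 | "A Python list is a dynamic array, i.e. its size can change, and the array stores pointers (PyObject*) to the list's elements. The elements can have different types, e.g. my_list = ['a', 1, True]: the array actually holds three pointers, one to each of these three objects. So why are Python lists slow compared with arrays in other languages? There are two main reasons:\n", 16 | "1. Python is dynamically typed, so operations have to check the types of their operands at runtime, and every element is a separate object reached through a pointer.\n", 17 | "2. Python, or more precisely CPython, has no JIT compiler." 18 | ] 19 | }, 20 | { 21 | "cell_type": "markdown", 22 | "metadata": {}, 23 | "source": [ 24 | "## 2. How to do fast array computation in Python\n", 25 | "The mainstream solutions today are:\n", 26 | "1. Numpy - a NumPy array is much like a C/C++ array: all elements share one data type, and array methods (such as sum) are implemented in C.\n", 27 | "2. Numba - uses JIT compilation to speed up NumPy code. Both calls to NumPy methods and for loops over NumPy arrays get faster.\n", 28 | "3. Numexpr - speeds up NumPy by avoiding memory allocation for intermediate results; mainly used to evaluate expressions on large arrays.\n", 29 | "4. Cython - writes C/C++ extensions for Python.\n", 30 | "\n", 31 | "The following two examples show how to use these four tools to speed up array computations." 32 | ] 33 | }, 34 | { 35 | "cell_type": "markdown", 36 | "metadata": {}, 37 | "source": [ 38 | "## 3. Sum of squares of an array" 39 | ] 40 | }, 41 | { 42 | "cell_type": "code", 43 | "execution_count": 1, 44 | "metadata": {}, 45 | "outputs": [], 46 | "source": [ 47 | "arr = [x for x in range(10000)]" 48 | ] 49 | }, 50 | { 51 | "cell_type": "markdown", 52 | "metadata": {}, 53 | "source": [ 54 | "### 3.1 for loop" 55 | ] 56 | }, 57 | { 58 | "cell_type": "code", 59 | "execution_count": 2, 60 | "metadata": {}, 61 | "outputs": [ 62 | { 63 | "name": "stdout", 64 | "output_type": "stream", 65 | "text": [ 66 | "The result is: 333283335000\n", 67 | "2.53 ms ± 91.7 µs per loop (mean ± std. dev. 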
of 7 runs, 100 loops each)\n" 68 | ] 69 | } 70 | ], 71 | "source": [ 72 | "def sqr_sum(arr):\n", 73 | " total = 0\n", 74 | " for x in arr:\n", 75 | " total += x ** 2\n", 76 | " return total\n", 77 | "\n", 78 | "print(\"The result is:\", sqr_sum(arr))\n", 79 | "%timeit sqr_sum(arr)" 80 | ] 81 | }, 82 | { 83 | "cell_type": "markdown", 84 | "metadata": {}, 85 | "source": [ 86 | "### 3.2 Numpy" 87 | ] 88 | }, 89 | { 90 | "cell_type": "code", 91 | "execution_count": 3, 92 | "metadata": {}, 93 | "outputs": [], 94 | "source": [ 95 | "import numpy as np" 96 | ] 97 | }, 98 | { 99 | "cell_type": "code", 100 | "execution_count": 4, 101 | "metadata": {}, 102 | "outputs": [ 103 | { 104 | "name": "stdout", 105 | "output_type": "stream", 106 | "text": [ 107 | "The result is: 333283335000\n", 108 | "9.66 µs ± 275 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)\n" 109 | ] 110 | } 111 | ], 112 | "source": [ 113 | "def sqr_sum(arr):\n", 114 | " return (arr ** 2).sum()\n", 115 | "\n", 116 | "arr = np.array(arr)\n", 117 | "print(\"The result is:\", sqr_sum(arr))\n", 118 | "%timeit sqr_sum(arr)" 119 | ] 120 | }, 121 | { 122 | "cell_type": "markdown", 123 | "metadata": {}, 124 | "source": [ 125 | "### 3.3 Numba" 126 | ] 127 | }, 128 | { 129 | "cell_type": "code", 130 | "execution_count": 5, 131 | "metadata": {}, 132 | "outputs": [], 133 | "source": [ 134 | "from numba import jit" 135 | ] 136 | }, 137 | { 138 | "cell_type": "code", 139 | "execution_count": 6, 140 | "metadata": {}, 141 | "outputs": [ 142 | { 143 | "name": "stdout", 144 | "output_type": "stream", 145 | "text": [ 146 | "The result is: 333283335000\n", 147 | "3.39 µs ± 57.2 ns per loop (mean ± std. dev. 
of 7 runs, 100000 loops each)\n" 148 | ] 149 | } 150 | ], 151 | "source": [ 152 | "@jit(nopython=True)\n", 153 | "def sqr_sum(arr):\n", 154 | " return (arr ** 2).sum()\n", 155 | "\n", 156 | "arr = np.array(arr)\n", 157 | "print(\"The result is:\", sqr_sum(arr))\n", 158 | "%timeit sqr_sum(arr)" 159 | ] 160 | }, 161 | { 162 | "cell_type": "markdown", 163 | "metadata": {}, 164 | "source": [ 165 | "### 3.4 Numexpr" 166 | ] 167 | }, 168 | { 169 | "cell_type": "code", 170 | "execution_count": 7, 171 | "metadata": {}, 172 | "outputs": [], 173 | "source": [ 174 | "import numexpr as ne" 175 | ] 176 | }, 177 | { 178 | "cell_type": "code", 179 | "execution_count": 8, 180 | "metadata": {}, 181 | "outputs": [ 182 | { 183 | "name": "stdout", 184 | "output_type": "stream", 185 | "text": [ 186 | "The result is: 333283335000\n", 187 | "14.9 µs ± 144 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)\n" 188 | ] 189 | } 190 | ], 191 | "source": [ 192 | "def sqr_sum(arr):\n", 193 | " return ne.evaluate(\"sum(arr * arr)\")\n", 194 | "\n", 195 | "arr = np.array(arr)\n", 196 | "print(\"The result is:\", sqr_sum(arr))\n", 197 | "%timeit sqr_sum(arr)" 198 | ] 199 | }, 200 | { 201 | "cell_type": "markdown", 202 | "metadata": {}, 203 | "source": [ 204 | "### 3.5 Cython" 205 | ] 206 | }, 207 | { 208 | "cell_type": "code", 209 | "execution_count": 9, 210 | "metadata": {}, 211 | "outputs": [], 212 | "source": [ 213 | "%load_ext cython" 214 | ] 215 | }, 216 | { 217 | "cell_type": "code", 218 | "execution_count": 10, 219 | "metadata": {}, 220 | "outputs": [], 221 | "source": [ 222 | "%%cython\n", 223 | "cimport numpy as np\n", 224 | "ctypedef np.int_t DTYPE_t\n", 225 | "\n", 226 | "def sqr_sum(np.ndarray[DTYPE_t] arr):\n", 227 | " cdef:\n", 228 | " DTYPE_t total = 0\n", 229 | " DTYPE_t x\n", 230 | " int i = 0\n", 231 | " int n = len(arr)\n", 232 | " while i < n:\n", 233 | " total += arr[i] ** 2\n", 234 | " i += 1\n", 235 | " return total" 236 | ] 237 | }, 238 | { 239 | "cell_type": 
"code", 240 | "execution_count": 11, 241 | "metadata": {}, 242 | "outputs": [ 243 | { 244 | "name": "stdout", 245 | "output_type": "stream", 246 | "text": [ 247 | "The result is: 333283335000\n", 248 | "5.51 µs ± 62.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)\n" 249 | ] 250 | } 251 | ], 252 | "source": [ 253 | "arr = np.array(arr, dtype=\"int\")\n", 254 | "print(\"The result is:\", sqr_sum(arr))\n", 255 | "%timeit sqr_sum(arr)" 256 | ] 257 | }, 258 | { 259 | "cell_type": "markdown", 260 | "metadata": {}, 261 | "source": [ 262 | "## 4. Array transformation" 263 | ] 264 | }, 265 | { 266 | "cell_type": "code", 267 | "execution_count": 12, 268 | "metadata": {}, 269 | "outputs": [], 270 | "source": [ 271 | "arr = [x for x in range(1000000)]" 272 | ] 273 | }, 274 | { 275 | "cell_type": "markdown", 276 | "metadata": {}, 277 | "source": [ 278 | "### 4.1 for loop (list comprehension)" 279 | ] 280 | }, 281 | { 282 | "cell_type": "code", 283 | "execution_count": 13, 284 | "metadata": {}, 285 | "outputs": [ 286 | { 287 | "name": "stdout", 288 | "output_type": "stream", 289 | "text": [ 290 | "The result is: [1, 3, 5, 7, 9] ...\n", 291 | "84.5 ms ± 381 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)\n" 292 | ] 293 | } 294 | ], 295 | "source": [ 296 | "def transform(arr):\n", 297 | " return [x * 2 + 1 for x in arr]\n", 298 | "\n", 299 | "print(\"The result is:\", transform(arr)[:5], \"...\")\n", 300 | "%timeit transform(arr)" 301 | ] 302 | }, 303 | { 304 | "cell_type": "markdown", 305 | "metadata": {}, 306 | "source": [ 307 | "### 4.2 Numpy" 308 | ] 309 | }, 310 | { 311 | "cell_type": "code", 312 | "execution_count": 14, 313 | "metadata": {}, 314 | "outputs": [], 315 | "source": [ 316 | "import numpy as np" 317 | ] 318 | }, 319 | { 320 | "cell_type": "code", 321 | "execution_count": 15, 322 | "metadata": {}, 323 | "outputs": [ 324 | { 325 | "name": "stdout", 326 | "output_type": "stream", 327 | "text": [ 328 | "The result is: [1 3 5 7 9] ...\n", 329 | "803 µs ± 11.4 µs per loop (mean ± std. 
dev. of 7 runs, 1000 loops each)\n" 330 | ] 331 | } 332 | ], 333 | "source": [ 334 | "def transform(arr):\n", 335 | " return arr * 2 + 1\n", 336 | "\n", 337 | "arr = np.array(arr)\n", 338 | "print(\"The result is:\", transform(arr)[:5], \"...\")\n", 339 | "%timeit transform(arr)" 340 | ] 341 | }, 342 | { 343 | "cell_type": "markdown", 344 | "metadata": {}, 345 | "source": [ 346 | "### 4.3 Numba" 347 | ] 348 | }, 349 | { 350 | "cell_type": "code", 351 | "execution_count": 16, 352 | "metadata": {}, 353 | "outputs": [], 354 | "source": [ 355 | "from numba import jit" 356 | ] 357 | }, 358 | { 359 | "cell_type": "code", 360 | "execution_count": 17, 361 | "metadata": {}, 362 | "outputs": [ 363 | { 364 | "name": "stdout", 365 | "output_type": "stream", 366 | "text": [ 367 | "The result is: [1 3 5 7 9] ...\n", 368 | "498 µs ± 8.71 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n" 369 | ] 370 | } 371 | ], 372 | "source": [ 373 | "@jit(nopython=True)\n", 374 | "def transform(arr):\n", 375 | " return arr * 2 + 1\n", 376 | "\n", 377 | "arr = np.array(arr)\n", 378 | "print(\"The result is:\", transform(arr)[:5], \"...\")\n", 379 | "%timeit transform(arr)" 380 | ] 381 | }, 382 | { 383 | "cell_type": "markdown", 384 | "metadata": {}, 385 | "source": [ 386 | "### 4.4 Numexpr" 387 | ] 388 | }, 389 | { 390 | "cell_type": "code", 391 | "execution_count": 18, 392 | "metadata": {}, 393 | "outputs": [], 394 | "source": [ 395 | "import numexpr as ne" 396 | ] 397 | }, 398 | { 399 | "cell_type": "code", 400 | "execution_count": 19, 401 | "metadata": {}, 402 | "outputs": [ 403 | { 404 | "name": "stdout", 405 | "output_type": "stream", 406 | "text": [ 407 | "The result is: [1 3 5 7 9] ...\n", 408 | "369 µs ± 13.2 µs per loop (mean ± std. dev. 
of 7 runs, 1000 loops each)\n" 409 | ] 410 | } 411 | ], 412 | "source": [ 413 | "def transform(arr):\n", 414 | " return ne.evaluate(\"arr * 2 + 1\")\n", 415 | "\n", 416 | "arr = np.array(arr)\n", 417 | "print(\"The result is:\", transform(arr)[:5], \"...\")\n", 418 | "%timeit transform(arr)" 419 | ] 420 | }, 421 | { 422 | "cell_type": "markdown", 423 | "metadata": {}, 424 | "source": [ 425 | "### 4.5 Cython" 426 | ] 427 | }, 428 | { 429 | "cell_type": "code", 430 | "execution_count": 20, 431 | "metadata": {}, 432 | "outputs": [ 433 | { 434 | "name": "stdout", 435 | "output_type": "stream", 436 | "text": [ 437 | "The cython extension is already loaded. To reload it, use:\n", 438 | " %reload_ext cython\n" 439 | ] 440 | } 441 | ], 442 | "source": [ 443 | "%load_ext cython" 444 | ] 445 | }, 446 | { 447 | "cell_type": "code", 448 | "execution_count": 21, 449 | "metadata": {}, 450 | "outputs": [], 451 | "source": [ 452 | "%%cython\n", 453 | "import numpy as np\n", 454 | "cimport numpy as np\n", 455 | "ctypedef np.int_t DTYPE_t\n", 456 | "\n", 457 | "def transform(np.ndarray[DTYPE_t] arr):\n", 458 | " cdef:\n", 459 | " np.ndarray[DTYPE_t] new_arr = np.empty_like(arr)\n", 460 | " int i = 0\n", 461 | " int n = len(arr)\n", 462 | " while i < n:\n", 463 | " new_arr[i] = arr[i] * 2 + 1\n", 464 | " i += 1\n", 465 | " return new_arr" 466 | ] 467 | }, 468 | { 469 | "cell_type": "code", 470 | "execution_count": 22, 471 | "metadata": {}, 472 | "outputs": [ 473 | { 474 | "name": "stdout", 475 | "output_type": "stream", 476 | "text": [ 477 | "The result is: [1 3 5 7 9] ...\n", 478 | "887 µs ± 29.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n" 479 | ] 480 | } 481 | ], 482 | "source": [ 483 | "arr = np.array(arr)\n", 484 | "print(\"The result is:\", transform(arr)[:5], \"...\")\n", 485 | "%timeit transform(arr)" 486 | ] 487 | }, 488 | { 489 | "cell_type": "markdown", 490 | "metadata": {}, 491 | "source": [ 492 | "## 5. 
References\n", 493 | "[How does python have different data types in an array?](https://stackoverflow.com/questions/10558670/how-does-python-have-different-data-types-in-an-array) \n", 494 | "[Why are Python Programs often slower than the Equivalent Program Written in C or C++?](https://stackoverflow.com/questions/3033329/why-are-python-programs-often-slower-than-the-equivalent-program-written-in-c-or) \n", 495 | "[How Fast Numpy Really is and Why?](https://towardsdatascience.com/how-fast-numpy-really-is-e9111df44347)" 496 | ] 497 | }, 498 | { 499 | "cell_type": "code", 500 | "execution_count": null, 501 | "metadata": {}, 502 | "outputs": [], 503 | "source": [] 504 | } 505 | ], 506 | "metadata": { 507 | "kernelspec": { 508 | "display_name": "Python 3", 509 | "language": "python", 510 | "name": "python3" 511 | }, 512 | "language_info": { 513 | "codemirror_mode": { 514 | "name": "ipython", 515 | "version": 3 516 | }, 517 | "file_extension": ".py", 518 | "mimetype": "text/x-python", 519 | "name": "python", 520 | "nbconvert_exporter": "python", 521 | "pygments_lexer": "ipython3", 522 | "version": "3.6.6" 523 | } 524 | }, 525 | "nbformat": 4, 526 | "nbformat_minor": 4 527 | } 528 | -------------------------------------------------------------------------------- /Using C++ in Cython.md: -------------------------------------------------------------------------------- 1 | ## Using C++ in Cython 2 | Author: tushushu 3 | Project: https://github.com/tushushu/flying-python 4 | 5 | ## 1. 
Using C++ in Jupyter Notebook 6 | - First, load the Cython extension with the magic command ``%load_ext Cython`` 7 | - Then run Cython code with the magic command ``%%cython --cplus`` 8 | - On macOS, use the magic command ``%%cython --cplus --compile-args=-stdlib=libc++ --link-args=-stdlib=libc++`` instead; see https://stackoverflow.com/questions/57367764/cant-import-cpplist-into-cython for details. All the cells below use the macOS flags; on Linux, ``%%cython --cplus`` alone is usually enough. 9 | 10 | 11 | ```python 12 | %load_ext Cython 13 | ``` 14 | 15 | 16 | ```cython 17 | %%cython --cplus --compile-args=-stdlib=libc++ --link-args=-stdlib=libc++ 18 | # Note: use 'cimport', not 'import' 19 | from libcpp.string cimport string 20 | cdef string s 21 | s = b"Hello world!" 22 | print(s.decode("utf-8")) 23 | ``` 24 | 25 | Hello world! 26 | 27 | 28 | ## 2. Conversion between C++ and Python types 29 | 30 | | Python type (to C++) | C++ type | Python type (from C++) | 31 | | ------ | ------ | ------ | 32 | | bytes | std::string | bytes | 33 | |iterable|std::vector|list| 34 | |iterable|std::list|list| 35 | |iterable|std::set|set| 36 | |iterable (len 2)|std::pair|tuple (len 2)| 37 | 38 | ## 3. Using the C++ STL 39 | 40 | ### 3.1 Using C++ Vector 41 | Can replace Python's list. 42 | 1. Initialization - initialize from a Python iterable; the element type must be declared (e.g. ``vector[int]``) 43 | 2. Traversal - increment an index in a while loop 44 | 3. Access - use the '[]' operator, just as in Python 45 | 4. Append - use the vector's push_back method, similar to the append method of a Python list 46 | 47 | Finally, we compare the performance of an element-counting function implemented in Python and in C++: the C++ version is over 200 times faster (29.9 µs vs. 130 ns). 48 | Note: for fairness, neither function takes arguments; both read variables defined outside the function body, so the timing does not include converting the Python list into a C++ vector. If that conversion is included, the C++ version is only about 4 times faster. 49 | 50 | 51 | ```cython 52 | %%cython --cplus --compile-args=-stdlib=libc++ --link-args=-stdlib=libc++ 53 | from libcpp.vector cimport vector 54 | # Initialize from a Python object 55 | cdef vector[int] vec = range(5) 56 | # Traverse 57 | cdef: 58 | int i = 0 59 | int n = vec.size() 60 | print("Start traversing...") 61 | while i < n: 62 | # Access 63 | print("\tThe element at position %d is %d" % (i, vec[i])) 64 | i += 1 65 | print() 66 | # Append 67 | vec.push_back(5) 68 | print("After appending, vec is", vec) 69 | ``` 70 | 71 | Start traversing...
 72 | The element at position 0 is 0 73 | The element at position 1 is 1 74 | The element at position 2 is 2 75 | The element at position 3 is 3 76 | The element at position 4 is 4 77 | 78 | After appending, vec is [0, 1, 2, 3, 4, 5] 79 | 80 | 81 | 82 | ```python 83 | arr = [x // 100 for x in range(1000)] 84 | target = 6 85 | 86 | def count_py(): 87 | return sum(1 for x in arr if x == target) 88 | 89 | print("Python implementation, result: %d!"% count_py()) 90 | ``` 91 | 92 | Python implementation, result: 100! 93 | 94 | 95 | 96 | ```cython 97 | %%cython --cplus --compile-args=-stdlib=libc++ --link-args=-stdlib=libc++ 98 | from libcpp.vector cimport vector 99 | 100 | cdef: 101 | int target = 6 102 | vector[int] v = [x // 100 for x in range(1000)] 103 | 104 | cdef int _count_cpp(): 105 | cdef: 106 | int i = 0 107 | int n = v.size() 108 | int ret = 0 109 | while i < n: 110 | if v[i] == target: 111 | ret += 1 112 | i += 1 113 | return ret 114 | 115 | def count_cpp(): 116 | return _count_cpp() 117 | 118 | print("Cython (C++) implementation, result: %d!"% count_cpp()) 119 | ``` 120 | 121 | Cython (C++) implementation, result: 100! 122 | 123 | 124 | 125 | ```python 126 | print("Comparing the performance of the Python and C++ versions...") 127 | %timeit count_py() 128 | %timeit count_cpp() 129 | ``` 130 | 131 | Comparing the performance of the Python and C++ versions... 132 | 29.9 µs ± 995 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each) 133 | 130 ns ± 2.91 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each) 134 | 135 | 136 | ### 3.2 Using C++ Unordered Map 137 | Can replace Python's dict. 138 | 1. Initialization - initialize from a Python object (here a dict); the key and value types must be declared 139 | 2. Traversal - increment an iterator in a while loop (the iteration order of an unordered_map is unspecified) 140 | 3. Access - dereference the iterator with deref (the '*' operator in C++) to get a pair; use .first for the key and .second for the value 141 | 4. Lookup - unordered_map.count returns 1 or 0; alternatively, unordered_map.find returns an iterator, which equals unordered_map.end() if the key is not found. 142 | 5. 
Insert/modify - unordered_map[key] = value. If the key does not exist, the '[]' operator inserts it with a default-constructed value, e.g. 0.0 for a float. So unless you are sure this cannot cause a bug, check whether the key exists before modifying its value. This is somewhat similar to Python's ``collections.defaultdict``. 143 | 144 | Finally, we compare the performance of a conditional sum over a map implemented in Python and in C++: the C++ version is about 40 times faster. 145 | 146 | 147 | ```cython 148 | %%cython --cplus --compile-args=-stdlib=libc++ --link-args=-stdlib=libc++ 149 | from cython.operator cimport dereference as deref, preincrement as inc 150 | from libcpp.unordered_map cimport unordered_map 151 | # Initialize from a Python object 152 | cdef unordered_map[int, float] mymap = {i: i/10 for i in range(10)} 153 | # Traverse 154 | cdef: 155 | unordered_map[int, float].iterator it = mymap.begin() 156 | unordered_map[int, float].iterator end = mymap.end() 157 | print("Start traversing...") 158 | while it != end: 159 | # Access 160 | print("\tKey is %d, Value is %.1f" % (deref(it).first, deref(it).second)) 161 | inc(it) 162 | print() 163 | 164 | # Look up 165 | print("Start looking up...") 166 | if mymap.count(-2): 167 | print("\tKey -2 exists!") 168 | else: 169 | print("\tKey -2 does not exist!") 170 | 171 | it = mymap.find(3) 172 | if it != end: 173 | print("\tKey 3 exists, its value is %.1f!" % deref(it).second) 174 | else: 175 | print("\tKey 3 does not exist!") 176 | print() 177 | 178 | # Modify 179 | print("Modifying elements...") 180 | if mymap.count(3): 181 | mymap[3] += 1.0 182 | mymap[-2] # Key -2 does not exist, so it is inserted with the default value 0.0 183 | print("\tKey is 3, Value is %.1f" % mymap[3]) 184 | print("\tKey is -2, Value is %.1f" % mymap[-2]) 185 | ``` 186 | 187 | Start traversing... 188 | Key is 0, Value is 0.0 189 | Key is 1, Value is 0.1 190 | Key is 2, Value is 0.2 191 | Key is 3, Value is 0.3 192 | Key is 4, Value is 0.4 193 | Key is 5, Value is 0.5 194 | Key is 6, Value is 0.6 195 | Key is 7, Value is 0.7 196 | Key is 8, Value is 0.8 197 | Key is 9, Value is 0.9 198 | 199 | Start looking up... 200 | Key -2 does not exist! 201 | Key 3 exists, its value is 0.3! 202 | 203 | Modifying elements...
 204 | Key is 3, Value is 1.3 205 | Key is -2, Value is 0.0 206 | 207 | 208 | 209 | ```python 210 | my_map = {x: x for x in range(100)} 211 | target = 50 212 | 213 | def sum_lt_py(): 214 | return sum(my_map[x] for x in my_map if x < target) 215 | 216 | print("Python implementation, result: %d!"% sum_lt_py()) 217 | ``` 218 | 219 | Python implementation, result: 1225! 220 | 221 | 222 | 223 | ```cython 224 | %%cython --cplus --compile-args=-stdlib=libc++ --link-args=-stdlib=libc++ 225 | from libcpp.unordered_map cimport unordered_map 226 | from cython.operator cimport dereference as deref, preincrement as inc 227 | 228 | cdef: 229 | unordered_map[int, int] my_map = {x: x for x in range(100)} 230 | int target = 50 231 | 232 | cdef _sum_lt_cpp(): 233 | cdef: 234 | unordered_map[int, int].iterator it = my_map.begin() 235 | int ret = 0 # C local variables are not zero-initialized automatically 236 | while it != my_map.end(): 237 | if deref(it).first < target: 238 | ret += deref(it).second 239 | inc(it) 240 | return ret 241 | 242 | def sum_lt_cpp(): 243 | return _sum_lt_cpp() 244 | 245 | print("Cython (C++) implementation, result: %d!"% sum_lt_cpp()) 246 | ``` 247 | 248 | Cython (C++) implementation, result: 1225! 249 | 250 | 251 | 252 | ```python 253 | print("Comparing the performance of the Python and C++ versions...") 254 | %timeit sum_lt_py() 255 | %timeit sum_lt_cpp() 256 | ``` 257 | 258 | Comparing the performance of the Python and C++ versions... 259 | 6.56 µs ± 117 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) 260 | 162 ns ± 6.29 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each) 261 | 262 | 263 | ### 3.3 Using C++ Unordered Set 264 | Can replace Python's set. 265 | 1. Initialization - initialize from a Python iterable; the element type must be declared 266 | 2. Traversal - increment an iterator in a while loop 267 | 3. Access - dereference the iterator with deref (the '*' operator in C++) 268 | 4. Lookup - unordered_set.count returns 1 or 0 269 | 5. Insert - unordered_set.insert; if the element already exists, it is not inserted again 270 | 6. 
Intersection, union, difference - as far as I know, unordered_set does not provide these operations, so you have to implement them yourself, which is less convenient than Python's set. 271 | 272 | Finally, we compare the performance of set intersection implemented in Python and in C++: the C++ version is about 20 times **slower**. For details, see https://stackoverflow.com/questions/54763112/how-to-improve-stdset-intersection-performance-in-c 273 | If we only count the common elements of the two sets, however, the C++ version is about 6 times faster than Python. A plausible explanation is that lookups in a C++ unordered_set are fast while building one is slow; intersection_cpp also has to convert its resulting unordered_set back into a Python set on every call. 274 | 275 | 276 | ```cython 277 | %%cython --cplus --compile-args=-stdlib=libc++ --link-args=-stdlib=libc++ 278 | from cython.operator cimport dereference as deref, preincrement as inc 279 | from libcpp.unordered_set cimport unordered_set 280 | # Initialize from a Python object 281 | cdef unordered_set[int] myset = {i for i in range(5)} 282 | # Traverse 283 | cdef: 284 | unordered_set[int].iterator it = myset.begin() 285 | unordered_set[int].iterator end = myset.end() 286 | print("Start traversing...") 287 | while it != end: 288 | # Access 289 | print("\tValue is %d" % deref(it)) 290 | inc(it) 291 | print() 292 | 293 | # Look up 294 | print("Start looking up...") 295 | if myset.count(-2): 296 | print("\tElement -2 exists!") 297 | else: 298 | print("\tElement -2 does not exist!") 299 | 300 | print() 301 | 302 | # Insert 303 | print("Inserting elements...") 304 | myset.insert(0) 305 | myset.insert(-1) 306 | 307 | print("\tMyset is: ", myset) 308 | ``` 309 | 310 | Start traversing... 311 | Value is 0 312 | Value is 1 313 | Value is 2 314 | Value is 3 315 | Value is 4 316 | 317 | Start looking up... 318 | Element -2 does not exist! 319 | 320 | Inserting elements... 321 | Myset is: {0, 1, 2, 3, 4, -1} 322 | 323 | 324 | 325 | ```python 326 | myset1 = {x for x in range(100)} 327 | myset2 = {x for x in range(50, 60)} 328 | 329 | def intersection_py(): 330 | return myset1 & myset2 331 | 332 | print("Python implementation, result: %s!"% intersection_py()) 333 | ``` 334 | 335 | Python implementation, result: {50, 51, 52, 53, 54, 55, 56, 57, 58, 59}!
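Because unordered_set exposes only a membership test (count) and insert, intersection, union and difference must be built from those two primitives. The Cython cell below does exactly that for intersection. As a rough sketch of the pattern, the pure-Python helpers here use only `in` and `add`. The names manual_intersection, manual_union and manual_difference are hypothetical and come from no library. In ordinary Python code the built-in `&`, `|` and `-` operators are simpler and faster.

```python
def manual_intersection(a: set, b: set) -> set:
    """Intersection built only from a membership test and insert."""
    # Iterate over the smaller set and probe the larger one.
    if len(a) > len(b):
        a, b = b, a
    result = set()
    for x in a:
        if x in b:         # plays the role of unordered_set.count(x)
            result.add(x)  # plays the role of unordered_set.insert(x)
    return result


def manual_union(a: set, b: set) -> set:
    """Union: start from a copy of a, then insert every element of b."""
    result = set(a)
    for x in b:
        result.add(x)  # inserting an element that is already present is a no-op
    return result


def manual_difference(a: set, b: set) -> set:
    """Difference a - b: keep the elements of a that are not in b."""
    result = set()
    for x in a:
        if x not in b:
            result.add(x)
    return result


a = {x for x in range(100)}
b = {x for x in range(50, 60)}
print(manual_intersection(a, b) == a & b)  # True
print(manual_union(a, b) == a | b)         # True
print(manual_difference(a, b) == a - b)    # True
```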
 336 | 337 | 338 | 339 | ```cython 340 | %%cython --cplus --compile-args=-stdlib=libc++ --link-args=-stdlib=libc++ 341 | from cython.operator cimport dereference as deref, preincrement as inc 342 | from libcpp.unordered_set cimport unordered_set 343 | 344 | cdef: 345 | unordered_set[int] myset1 = {x for x in range(100)} 346 | unordered_set[int] myset2 = {x for x in range(50, 60)} 347 | 348 | cdef unordered_set[int] _intersection_cpp(): 349 | cdef: 350 | unordered_set[int].iterator it = myset1.begin() 351 | unordered_set[int] ret 352 | while it != myset1.end(): 353 | if myset2.count(deref(it)): 354 | ret.insert(deref(it)) 355 | inc(it) 356 | return ret 357 | 358 | def intersection_cpp(): 359 | return _intersection_cpp() # the unordered_set is converted to a Python set here 360 | 361 | print("Cython (C++) implementation, result: %s!"% intersection_cpp()) 362 | ``` 363 | 364 | Cython (C++) implementation, result: {50, 51, 52, 53, 54, 55, 56, 57, 58, 59}! 365 | 366 | 367 | 368 | ```python 369 | print("Comparing the performance of the Python and C++ versions...") 370 | %timeit intersection_py() 371 | %timeit intersection_cpp() 372 | ``` 373 | 374 | Comparing the performance of the Python and C++ versions... 375 | 274 ns ± 13.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) 376 | 5.28 µs ± 220 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) 377 | 378 | 379 | 380 | ```python 381 | myset1 = {x for x in range(100)} 382 | myset2 = {x for x in range(50, 60)} 383 | 384 | def count_common_py(): 385 | return len(myset1 & myset2) 386 | 387 | print("Python implementation, result: %s!"% count_common_py()) 388 | ``` 389 | 390 | Python implementation, result: 10!
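The counting version in the next Cython cell (_count_common_cpp) never builds a result set. It swaps the two sets so that it iterates over the smaller one, then only increments a counter. Below is a minimal pure-Python sketch of the same idea; count_common_no_alloc is a hypothetical name. It is not guaranteed to beat `len(myset1 & myset2)` in CPython, because the built-in intersection runs in C, so time it with `%timeit` before relying on it.

```python
def count_common_no_alloc(a: set, b: set) -> int:
    """Count the elements present in both sets without building their intersection."""
    # Same idea as the swap in _count_common_cpp: loop over the smaller set
    # and probe the larger one (average O(1) per membership test).
    if len(a) > len(b):
        a, b = b, a
    count = 0
    for x in a:
        if x in b:
            count += 1
    return count


a = {x for x in range(100)}
b = {x for x in range(50, 60)}
print(count_common_no_alloc(a, b))  # 10
```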
 391 | 392 | 393 | 394 | ```cython 395 | %%cython --cplus --compile-args=-stdlib=libc++ --link-args=-stdlib=libc++ 396 | from cython.operator cimport dereference as deref, preincrement as inc 397 | from libcpp.unordered_set cimport unordered_set 398 | 399 | cdef: 400 | unordered_set[int] myset2 = {x for x in range(100)} 401 | unordered_set[int] myset1 = {x for x in range(50, 60)} 402 | 403 | cdef int _count_common_cpp(): 404 | if myset1.size() > myset2.size(): # make sure we iterate over the smaller set 405 | myset1.swap(myset2) 406 | cdef: 407 | unordered_set[int].iterator it = myset1.begin() 408 | int ret = 0 409 | while it != myset1.end(): 410 | if myset2.count(deref(it)): 411 | ret += 1 412 | inc(it) 413 | return ret 414 | 415 | def count_common_cpp(): 416 | return _count_common_cpp() 417 | 418 | print("Cython (C++) implementation, result: %s!"% count_common_cpp()) 419 | ``` 420 | 421 | Cython (C++) implementation, result: 10! 422 | 423 | 424 | 425 | ```python 426 | print("Comparing the performance of the Python and C++ versions...") 427 | %timeit count_common_py() 428 | %timeit count_common_cpp() 429 | ``` 430 | 431 | Comparing the performance of the Python and C++ versions... 432 | 295 ns ± 5.91 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) 433 | 46.1 ns ± 0.785 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each) 434 | 435 | 436 | ## 4. 
Pass by value vs. pass by reference 437 | Python functions receive references to objects. If the argument is a mutable container (such as a list or set), changes made inside the function are visible to the caller; immutable values (such as int and float) cannot be modified in place, so they behave as if passed by value. If you don't want a function to modify a container, make a copy of it first, e.g. with deepcopy. 438 | In C++, by contrast, arguments are passed by value by default; passing by reference must be declared explicitly. 439 | Taking an int vector as an example, v1 is not modified by pass_value but is modified by pass_reference. 440 | - Pass by value uses ``vector[int]``: pass_value only receives a copy of v1, so the function cannot modify v1 441 | - Pass by reference uses ``vector[int]&``: pass_reference receives a reference to v1, so the function can modify v1. 442 | 443 | The two code blocks below show the difference between Python and C++. 444 | 445 | 446 | ```python 447 | from copy import deepcopy 448 | 449 | def pass_value(v): 450 | v = deepcopy(v) # work on a copy, so the caller's list is untouched 451 | v[0] = -1 452 | 453 | def pass_reference(v): 454 | v[0] = -1 455 | 456 | v1 = [0, 0, 0] 457 | print("The initial value of v1 is %s" % v1) 458 | pass_value(v1) 459 | print("After pass_value, v1 is %s" % v1) 460 | pass_reference(v1) 461 | print("After pass_reference, v1 is %s" % v1) 462 | ``` 463 | 464 | The initial value of v1 is [0, 0, 0] 465 | After pass_value, v1 is [0, 0, 0] 466 | After pass_reference, v1 is [-1, 0, 0] 467 | 468 | 469 | 470 | ```cython 471 | %%cython --cplus --compile-args=-stdlib=libc++ --link-args=-stdlib=libc++ 472 | 473 | from libcpp.vector cimport vector 474 | 475 | cdef void pass_value(vector[int] v): 476 | v[0] = -1 477 | 478 | cdef void pass_reference(vector[int]& v): 479 | v[0] = -1 480 | 481 | cdef vector[int] v1 = [0, 0, 0] 482 | print("The initial value of v1 is %s" % v1) 483 | pass_value(v1) 484 | print("After pass_value, v1 is %s" % v1) 485 | pass_reference(v1) 486 | print("After pass_reference, v1 is %s" % v1) 487 | ``` 488 | 489 | The initial value of v1 is [0, 0, 0] 490 | After pass_value, v1 is [0, 0, 0] 491 | After pass_reference, v1 is [-1, 0, 0] 492 | 493 | 494 | ## 5. 
Ranges of numeric types 495 | Python has a single integer type, int, whose range is effectively unlimited (bounded only by available memory), so Python users rarely worry about numeric overflow. In C++, however, you need to be careful. The typical sizes and ranges of the C++ numeric types are listed below (actual sizes depend on the compiler and platform): 496 | 497 | 498 | |Type |Typical Size |Typical Range| 499 | | ------ | ------ | ------ | 500 | |char |1 byte |-128 to 127 or 0 to 255| 501 | |unsigned char |1 byte |0 to 255| 502 | |signed char |1 byte |-128 to 127| 503 | |int |4 bytes |-2,147,483,648 to 2,147,483,647| 504 | |unsigned int |4 bytes |0 to 4,294,967,295| 505 | |signed int |4 bytes |-2,147,483,648 to 2,147,483,647| 506 | |short int |2 bytes |-32,768 to 32,767| 507 | |unsigned short int |2 bytes |0 to 65,535| 508 | |signed short int |2 bytes |-32,768 to 32,767| 509 | |long int |4 or 8 bytes |same as int (4 bytes, e.g. Windows) or as long long int (8 bytes, e.g. 64-bit Linux/macOS)| 510 | |signed long int |4 or 8 bytes |same as long int| 511 | |unsigned long int |4 or 8 bytes |0 to 4,294,967,295 (4 bytes) or 0 to 18,446,744,073,709,551,615 (8 bytes)| 512 | |long long int |8 bytes |-(2^63) to (2^63)-1| 513 | |unsigned long long int |8 bytes |0 to 18,446,744,073,709,551,615| 514 | |float |4 bytes |about 7 significant decimal digits| 515 | |double |8 bytes |about 15 to 16 significant decimal digits| 516 | |long double |8, 12 or 16 bytes |at least the precision of double| 517 | |wchar_t |2 or 4 bytes |1 wide character| 518 | 519 | 520 | For example, the following functions give wrong results (strictly speaking, signed integer overflow is undefined behavior in C++; in practice it usually wraps around, as shown below). 521 | 522 | 523 | ```cython 524 | %%cython --cplus --compile-args=-stdlib=libc++ --link-args=-stdlib=libc++ 525 | def sum_py(num1, num2): 526 | print("The result by python is:", num1 + num2) 527 | 528 | cdef int _sum_cpp(int num1, int num2): # overflows past 2**31-1; use long long instead of int to fix it 529 | return num1 + num2 530 | 531 | def sum_cpp(num1, num2): 532 | print("The result by cpp is:", _sum_cpp(num1, num2)) 533 | ``` 534 | 535 | 536 | ```python 537 | sum_py(2**31-1, 1) 538 | sum_cpp(2**31-1, 1) 539 | ``` 540 | 541 | The result by python is: 2147483648 542 | The result by cpp is: -2147483648 543 | 544 | 545 | 546 | ```cython 547 | %%cython --cplus --compile-args=-stdlib=libc++ --link-args=-stdlib=libc++ 548 | from libcpp cimport bool 549 | 550 | def lt_py(num1, num2): 551 | print("The result by python is:", num1 < num2) 552 | 553 | cdef bool _lt_cpp(float num1, float num2): # as floats, both inputs round to 1234567936.0; use double instead to fix it 554 | return num1 < num2 555 | 556 | def lt_cpp(num1, num2):
 557 | print("The result by cpp is:", _lt_cpp(num1, num2)) 558 | ``` 559 | 560 | 561 | ```python 562 | lt_py(1234567890.0, 1234567891.0) 563 | lt_cpp(1234567890.0, 1234567891.0) 564 | ``` 565 | 566 | The result by python is: True 567 | The result by cpp is: False 568 | 569 | 570 | 571 | ```python 572 | 573 | ``` 574 | -------------------------------------------------------------------------------- /Itertools for efficient looping.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Python Itertools - Efficient Looping\n", 8 | "Author: tushushu \n", 9 | "Project: https://github.com/tushushu/flying-python \n", 10 | "\n", 11 | "The official Python documentation describes the itertools module as functions for \"efficient looping\". Some of these tools do improve performance, while others are not faster and merely save some development time; overused, they can even make code less readable. Let's take the itertools family out for a test drive.\n" 12 | ] 13 | }, 14 | { 15 | "cell_type": "markdown", 16 | "metadata": {}, 17 | "source": [ 18 | "## 1. Cumulative sums\n", 19 | "Given a list An, return the sequence of cumulative sums Sn.\n", 20 | "Example:\n", 21 | "* Input: [1, 2, 3, 4, 5]\n", 22 | "* Output: [1, 3, 6, 10, 15] \n", 23 | "\n", 24 | "With accumulate, the code runs nearly 3 times as fast (61 µs vs. 21.3 µs)." 25 | ] 26 | }, 27 | { 28 | "cell_type": "code", 29 | "execution_count": 1, 30 | "metadata": {}, 31 | "outputs": [], 32 | "source": [ 33 | "from itertools import accumulate" 34 | ] 35 | }, 36 | { 37 | "cell_type": "code", 38 | "execution_count": 2, 39 | "metadata": {}, 40 | "outputs": [], 41 | "source": [ 42 | "def _accumulate_list(arr):\n", 43 | " tot = 0\n", 44 | " for x in arr:\n", 45 | " tot += x\n", 46 | " yield tot\n", 47 | "\n", 48 | "def accumulate_list(arr):\n", 49 | " return list(_accumulate_list(arr))" 50 | ] 51 | }, 52 | { 53 | "cell_type": "code", 54 | "execution_count": 3, 55 | "metadata": {}, 56 | "outputs": [], 57 | "source": [ 58 | "def fast_accumulate_list(arr):\n", 59 | " return list(accumulate(arr))" 60 | ] 61 | }, 62 | { 63 | "cell_type": "code", 64 | "execution_count": 4, 65 | "metadata": {}, 66 | "outputs": [], 67 | "source": [ 68
| "arr = list(range(1000))" 69 | ] 70 | }, 71 | { 72 | "cell_type": "code", 73 | "execution_count": 5, 74 | "metadata": {}, 75 | "outputs": [ 76 | { 77 | "name": "stdout", 78 | "output_type": "stream", 79 | "text": [ 80 | "61 µs ± 2.91 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)\n" 81 | ] 82 | } 83 | ], 84 | "source": [ 85 | "%timeit accumulate_list(arr)" 86 | ] 87 | }, 88 | { 89 | "cell_type": "code", 90 | "execution_count": 6, 91 | "metadata": {}, 92 | "outputs": [ 93 | { 94 | "name": "stdout", 95 | "output_type": "stream", 96 | "text": [ 97 | "21.3 µs ± 811 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)\n" 98 | ] 99 | } 100 | ], 101 | "source": [ 102 | "%timeit fast_accumulate_list(arr)" 103 | ] 104 | }, 105 | { 106 | "cell_type": "markdown", 107 | "metadata": {}, 108 | "source": [ 109 | "## 2. Selecting data\n", 110 | "Given a list data and a list selectors of 0/1 flags, return the selected elements of data.\n", 111 | "Example:\n", 112 | "* Input: [1, 2, 3, 4, 5], [0, 1, 0, 1, 0]\n", 113 | "* Output: [2, 4] \n", 114 | "\n", 115 | "With compress, the code runs about 2.6 times as fast (341 µs vs. 130 µs)." 116 | ] 117 | }, 118 | { 119 | "cell_type": "code", 120 | "execution_count": 7, 121 | "metadata": {}, 122 | "outputs": [], 123 | "source": [ 124 | "from itertools import compress\n", 125 | "from random import randint" 126 | ] 127 | }, 128 | { 129 | "cell_type": "code", 130 | "execution_count": 8, 131 | "metadata": {}, 132 | "outputs": [], 133 | "source": [ 134 | "def select_data(data, selectors):\n", 135 | " return [x for x, y in zip(data, selectors) if y]" 136 | ] 137 | }, 138 | { 139 | "cell_type": "code", 140 | "execution_count": 9, 141 | "metadata": {}, 142 | "outputs": [], 143 | "source": [ 144 | "def fast_select_data(data, selectors):\n", 145 | " return list(compress(data, selectors))" 146 | ] 147 | }, 148 | { 149 | "cell_type": "code", 150 | "execution_count": 10, 151 | "metadata": {}, 152 | "outputs": [], 153 | "source": [ 154 | "data = list(range(10000))\n", 155 | "selectors = [randint(0, 1) for _ in range(10000)]" 156 | ] 157 | }, 158 | { 159 |
"cell_type": "code", 160 | "execution_count": 11, 161 | "metadata": {}, 162 | "outputs": [ 163 | { 164 | "name": "stdout", 165 | "output_type": "stream", 166 | "text": [ 167 | "341 µs ± 17.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n" 168 | ] 169 | } 170 | ], 171 | "source": [ 172 | "%timeit select_data(data, selectors)" 173 | ] 174 | }, 175 | { 176 | "cell_type": "code", 177 | "execution_count": 12, 178 | "metadata": {}, 179 | "outputs": [ 180 | { 181 | "name": "stdout", 182 | "output_type": "stream", 183 | "text": [ 184 | "130 µs ± 3.19 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)\n" 185 | ] 186 | } 187 | ], 188 | "source": [ 189 | "%timeit fast_select_data(data, selectors)" 190 | ] 191 | }, 192 | { 193 | "cell_type": "markdown", 194 | "metadata": {}, 195 | "source": [ 196 | "## 3. Permutations\n", 197 | "Given a list arr and a number k, return all ordered selections of k elements from arr.\n", 198 | "Example:\n", 199 | "* Input: [1, 2, 3], 2\n", 200 | "* Output: [(1, 2), (1, 3), (2, 1), (2, 3), (3, 1), (3, 2)] \n", 201 | "\n", 202 | "With permutations, the code runs about 10 times as fast (15.5 ms vs. 1.56 ms)." 203 | ] 204 | }, 205 | { 206 | "cell_type": "code", 207 | "execution_count": 13, 208 | "metadata": {}, 209 | "outputs": [], 210 | "source": [ 211 | "from itertools import permutations" 212 | ] 213 | }, 214 | { 215 | "cell_type": "code", 216 | "execution_count": 14, 217 | "metadata": {}, 218 | "outputs": [], 219 | "source": [ 220 | "def _get_permutations(arr, k, i):\n", 221 | " if i == k:\n", 222 | " return [arr[:k]]\n", 223 | " res = []\n", 224 | " for j in range(i, len(arr)):\n", 225 | " arr_cpy = arr.copy()\n", 226 | " arr_cpy[i], arr_cpy[j] = arr_cpy[j], arr_cpy[i]\n", 227 | " res += _get_permutations(arr_cpy, k, i + 1)\n", 228 | " return res\n", 229 | " \n", 230 | "def get_permutations(arr, k):\n", 231 | " return _get_permutations(arr, k, 0)" 232 | ] 233 | }, 234 | { 235 | "cell_type": "code", 236 | "execution_count": 15, 237 | "metadata": {}, 238 | "outputs": [], 239 | "source": [ 240 | "def fast_get_permutations(arr, k):\n", 241 | " return
list(permutations(arr, k))" 242 | ] 243 | }, 244 | { 245 | "cell_type": "code", 246 | "execution_count": 16, 247 | "metadata": {}, 248 | "outputs": [], 249 | "source": [ 250 | "arr = list(range(10))\n", 251 | "k = 5" 252 | ] 253 | }, 254 | { 255 | "cell_type": "code", 256 | "execution_count": 17, 257 | "metadata": {}, 258 | "outputs": [ 259 | { 260 | "name": "stdout", 261 | "output_type": "stream", 262 | "text": [ 263 | "15.5 ms ± 1.96 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n" 264 | ] 265 | } 266 | ], 267 | "source": [ 268 | "%timeit -n 1 get_permutations(arr, k)" 269 | ] 270 | }, 271 | { 272 | "cell_type": "code", 273 | "execution_count": 18, 274 | "metadata": {}, 275 | "outputs": [ 276 | { 277 | "name": "stdout", 278 | "output_type": "stream", 279 | "text": [ 280 | "1.56 ms ± 284 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)\n" 281 | ] 282 | } 283 | ], 284 | "source": [ 285 | "%timeit -n 1 fast_get_permutations(arr, k)" 286 | ] 287 | }, 288 | { 289 | "cell_type": "markdown", 290 | "metadata": {}, 291 | "source": [ 292 | "## 4. 
Filtering data\n", 293 | "Given a list arr, select all the even numbers.\n", 294 | "Example:\n", 295 | "* Input: [3, 1, 4, 5, 9, 2]\n", 296 | "* Output: [4, 2] \n", 297 | "\n", 298 | "With filterfalse, the code actually gets slower (823 µs vs. 417 µs), probably because of the extra lambda call per element, so don't assume itertools is always faster." 299 | ] 300 | }, 301 | { 302 | "cell_type": "code", 303 | "execution_count": 19, 304 | "metadata": {}, 305 | "outputs": [], 306 | "source": [ 307 | "from itertools import filterfalse" 308 | ] 309 | }, 310 | { 311 | "cell_type": "code", 312 | "execution_count": 20, 313 | "metadata": {}, 314 | "outputs": [], 315 | "source": [ 316 | "def get_even_nums(arr):\n", 317 | " return [x for x in arr if x % 2 == 0]" 318 | ] 319 | }, 320 | { 321 | "cell_type": "code", 322 | "execution_count": 21, 323 | "metadata": {}, 324 | "outputs": [], 325 | "source": [ 326 | "def fast_get_even_nums(arr):\n", 327 | " return list(filterfalse(lambda x: x % 2, arr))" 328 | ] 329 | }, 330 | { 331 | "cell_type": "code", 332 | "execution_count": 22, 333 | "metadata": {}, 334 | "outputs": [], 335 | "source": [ 336 | "arr = list(range(10000))" 337 | ] 338 | }, 339 | { 340 | "cell_type": "code", 341 | "execution_count": 23, 342 | "metadata": {}, 343 | "outputs": [ 344 | { 345 | "name": "stdout", 346 | "output_type": "stream", 347 | "text": [ 348 | "417 µs ± 18.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n" 349 | ] 350 | } 351 | ], 352 | "source": [ 353 | "%timeit get_even_nums(arr)" 354 | ] 355 | }, 356 | { 357 | "cell_type": "code", 358 | "execution_count": 24, 359 | "metadata": {}, 360 | "outputs": [ 361 | { 362 | "name": "stdout", 363 | "output_type": "stream", 364 | "text": [ 365 | "823 µs ± 22.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n" 366 | ] 367 | } 368 | ], 369 | "source": [ 370 | "%timeit fast_get_even_nums(arr)" 371 | ] 372 | }, 373 | { 374 | "cell_type": "markdown", 375 | "metadata": {}, 376 | "source": [ 377 | "## 5. 
Conditional termination\n", 378 | "Given a list arr and a number target, add up the numbers in order and stop as soon as an element greater than target is encountered; return the sum so far.\n", 379 | "Example:\n", 380 | "* Input: [1, 2, 3, 4, 5], 3\n", 381 | "* Output: 6 (stop at 4, because 4 > 3)\n", 382 | "\n", 383 | "With takewhile, the code actually gets slower (404 µs vs. 245 µs), so again, don't assume itertools is always faster." 384 | ] 385 | }, 386 | { 387 | "cell_type": "code", 388 | "execution_count": 25, 389 | "metadata": {}, 390 | "outputs": [], 391 | "source": [ 392 | "from itertools import takewhile" 393 | ] 394 | }, 395 | { 396 | "cell_type": "code", 397 | "execution_count": 26, 398 | "metadata": {}, 399 | "outputs": [], 400 | "source": [ 401 | "def cond_sum(arr, target):\n", 402 | " res = 0\n", 403 | " for x in arr:\n", 404 | " if x > target:\n", 405 | " break\n", 406 | " res += x\n", 407 | " return res" 408 | ] 409 | }, 410 | { 411 | "cell_type": "code", 412 | "execution_count": 27, 413 | "metadata": {}, 414 | "outputs": [], 415 | "source": [ 416 | "def fast_cond_sum(arr, target):\n", 417 | " return sum(takewhile(lambda x: x <= target, arr))" 418 | ] 419 | }, 420 | { 421 | "cell_type": "code", 422 | "execution_count": 28, 423 | "metadata": {}, 424 | "outputs": [], 425 | "source": [ 426 | "arr = list(range(10000))\n", 427 | "target = 5000" 428 | ] 429 | }, 430 | { 431 | "cell_type": "code", 432 | "execution_count": 29, 433 | "metadata": {}, 434 | "outputs": [ 435 | { 436 | "name": "stdout", 437 | "output_type": "stream", 438 | "text": [ 439 | "245 µs ± 11.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n" 440 | ] 441 | } 442 | ], 443 | "source": [ 444 | "%timeit cond_sum(arr, target)" 445 | ] 446 | }, 447 | { 448 | "cell_type": "code", 449 | "execution_count": 30, 450 | "metadata": {}, 451 | "outputs": [ 452 | { 453 | "name": "stdout", 454 | "output_type": "stream", 455 | "text": [ 456 | "404 µs ± 13.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n" 457 | ] 458 | } 459 | ], 460 | "source": [ 461 | "%timeit fast_cond_sum(arr, target)" 462 | ] 463 | }, 464 | { 465 | "cell_type": "markdown", 466 | "metadata": {}, 467 | "source": [ 468 | "## 6. 
Nested loops\n", 469 | "Given lists arr1 and arr2, return the sums of all pairs formed by one element from each list.\n", 470 | "Example:\n", 471 | "* Input: [1, 2], [4, 5]\n", 472 | "* Output: [1 + 4, 1 + 5, 2 + 4, 2 + 5]\n", 473 | "\n", 474 | "With product, the code runs about 1.3 times as fast (484 µs vs. 373 µs)." 475 | ] 476 | }, 477 | { 478 | "cell_type": "code", 479 | "execution_count": 31, 480 | "metadata": {}, 481 | "outputs": [], 482 | "source": [ 483 | "from itertools import product" 484 | ] 485 | }, 486 | { 487 | "cell_type": "code", 488 | "execution_count": 32, 489 | "metadata": {}, 490 | "outputs": [], 491 | "source": [ 492 | "def _cross_sum(arr1, arr2):\n", 493 | " for x in arr1:\n", 494 | " for y in arr2:\n", 495 | " yield x + y\n", 496 | "\n", 497 | "def cross_sum(arr1, arr2):\n", 498 | " return list(_cross_sum(arr1, arr2))" 499 | ] 500 | }, 501 | { 502 | "cell_type": "code", 503 | "execution_count": 33, 504 | "metadata": {}, 505 | "outputs": [], 506 | "source": [ 507 | "def fast_cross_sum(arr1, arr2):\n", 508 | " return [x + y for x, y in product(arr1, arr2)]" 509 | ] 510 | }, 511 | { 512 | "cell_type": "code", 513 | "execution_count": 34, 514 | "metadata": {}, 515 | "outputs": [], 516 | "source": [ 517 | "arr1 = list(range(100))\n", 518 | "arr2 = list(range(100))" 519 | ] 520 | }, 521 | { 522 | "cell_type": "code", 523 | "execution_count": 35, 524 | "metadata": {}, 525 | "outputs": [ 526 | { 527 | "name": "stdout", 528 | "output_type": "stream", 529 | "text": [ 530 | "484 µs ± 16.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n" 531 | ] 532 | } 533 | ], 534 | "source": [ 535 | "%timeit cross_sum(arr1, arr2)" 536 | ] 537 | }, 538 | { 539 | "cell_type": "code", 540 | "execution_count": 36, 541 | "metadata": {}, 542 | "outputs": [ 543 | { 544 | "name": "stdout", 545 | "output_type": "stream", 546 | "text": [ 547 | "373 µs ± 11.4 µs per loop (mean ± std. dev.
of 7 runs, 1000 loops each)\n" 548 | ] 549 | } 550 | ], 551 | "source": [ 552 | "%timeit fast_cross_sum(arr1, arr2)" 553 | ] 554 | }, 555 | { 556 | "cell_type": "markdown", 557 | "metadata": {}, 558 | "source": [ 559 | "## 7. Flattening a 2D list\n", 560 | "Given a two-dimensional list arr2d, convert it into a one-dimensional list.\n", 561 | "Example:\n", 562 | "* Input: [[1, 2], [3, 4]]\n", 563 | "* Output: [1, 2, 3, 4]\n", 564 | "\n", 565 | "With chain, the code runs nearly 6 times as fast (379 µs vs. 66.9 µs)." 566 | ] 567 | }, 568 | { 569 | "cell_type": "code", 570 | "execution_count": 37, 571 | "metadata": {}, 572 | "outputs": [], 573 | "source": [ 574 | "from itertools import chain" 575 | ] 576 | }, 577 | { 578 | "cell_type": "code", 579 | "execution_count": 38, 580 | "metadata": {}, 581 | "outputs": [], 582 | "source": [ 583 | "def _flatten(arr2d):\n", 584 | " for arr in arr2d:\n", 585 | " for x in arr:\n", 586 | " yield x\n", 587 | "\n", 588 | "def flatten(arr2d):\n", 589 | " return list(_flatten(arr2d))" 590 | ] 591 | }, 592 | { 593 | "cell_type": "code", 594 | "execution_count": 39, 595 | "metadata": {}, 596 | "outputs": [], 597 | "source": [ 598 | "def fast_flatten(arr2d):\n", 599 | " return list(chain(*arr2d))" 600 | ] 601 | }, 602 | { 603 | "cell_type": "code", 604 | "execution_count": 40, 605 | "metadata": {}, 606 | "outputs": [], 607 | "source": [ 608 | "arr2d = [[x + y * 100 for x in range(100)] for y in range(100)]" 609 | ] 610 | }, 611 | { 612 | "cell_type": "code", 613 | "execution_count": 41, 614 | "metadata": {}, 615 | "outputs": [ 616 | { 617 | "name": "stdout", 618 | "output_type": "stream", 619 | "text": [ 620 | "379 µs ± 15.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n" 621 | ] 622 | } 623 | ], 624 | "source": [ 625 | "%timeit flatten(arr2d)" 626 | ] 627 | }, 628 | { 629 | "cell_type": "code", 630 | "execution_count": 42, 631 | "metadata": {}, 632 | "outputs": [ 633 | { 634 | "name": "stdout", 635 | "output_type": "stream", 636 | "text": [ 637 | "66.9 µs ± 3.43 µs per loop (mean ± std. dev.
of 7 runs, 10000 loops each)\n" 638 | ] 639 | } 640 | ], 641 | "source": [ 642 | "%timeit fast_flatten(arr2d)" 643 | ] 644 | }, 645 | { 646 | "cell_type": "code", 647 | "execution_count": null, 648 | "metadata": {}, 649 | "outputs": [], 650 | "source": [] 651 | } 652 | ], 653 | "metadata": { 654 | "kernelspec": { 655 | "display_name": "Python 3", 656 | "language": "python", 657 | "name": "python3" 658 | }, 659 | "language_info": { 660 | "codemirror_mode": { 661 | "name": "ipython", 662 | "version": 3 663 | }, 664 | "file_extension": ".py", 665 | "mimetype": "text/x-python", 666 | "name": "python", 667 | "nbconvert_exporter": "python", 668 | "pygments_lexer": "ipython3", 669 | "version": "3.6.6" 670 | } 671 | }, 672 | "nbformat": 4, 673 | "nbformat_minor": 2 674 | } 675 | -------------------------------------------------------------------------------- /Using C++ in Cython.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "## Using C++ in Cython\n", 8 | "Author: tushushu \n", 9 | "Project: https://github.com/tushushu/flying-python" 10 | ] 11 | }, 12 | { 13 | "cell_type": "markdown", 14 | "metadata": {}, 15 | "source": [ 16 | "## 1.
Using C++ in Jupyter Notebook \n", 17 | "- First load the Cython extension with the magic command ``%load_ext Cython``\n", 18 | "- Then run Cython code with the magic command ``%%cython --cplus``\n", 19 | "- On macOS, use the magic command ``%%cython --cplus --compile-args=-stdlib=libc++ --link-args=-stdlib=libc++`` instead; for details see https://stackoverflow.com/questions/57367764/cant-import-cpplist-into-cython" 20 | ] 21 | }, 22 | { 23 | "cell_type": "code", 24 | "execution_count": 1, 25 | "metadata": {}, 26 | "outputs": [], 27 | "source": [ 28 | "%load_ext Cython" 29 | ] 30 | }, 31 | { 32 | "cell_type": "code", 33 | "execution_count": 2, 34 | "metadata": {}, 35 | "outputs": [ 36 | { 37 | "name": "stdout", 38 | "output_type": "stream", 39 | "text": [ 40 | "Hello world!\n" 41 | ] 42 | } 43 | ], 44 | "source": [ 45 | "%%cython --cplus --compile-args=-stdlib=libc++ --link-args=-stdlib=libc++\n", 46 | "# Note: use 'cimport', not 'import'\n", 47 | "from libcpp.string cimport string\n", 48 | "cdef string s\n", 49 | "s = b\"Hello world!\"\n", 50 | "print(s.decode(\"utf-8\"))" 51 | ] 52 | }, 53 | { 54 | "cell_type": "markdown", 55 | "metadata": {}, 56 | "source": [ 57 | "## 2. Conversion between C++ and Python types" 58 | ] 59 | }, 60 | { 61 | "cell_type": "markdown", 62 | "metadata": {}, 63 | "source": [ 64 | "| Python type (input) | C++ type | Python type (output) |\n", 65 | "| ------ | ------ | ------ |\n", 66 | "| bytes | std::string | bytes |\n", 67 | "|iterable|std::vector|list|\n", 68 | "|iterable|std::list|list|\n", 69 | "|iterable|std::set|set|\n", 70 | "|iterable (len 2)|std::pair|tuple (len 2)|" 71 | ] 72 | }, 73 | { 74 | "cell_type": "markdown", 75 | "metadata": {}, 76 | "source": [ 77 | "## 3. Using the C++ STL" 78 | ] 79 | }, 80 | { 81 | "cell_type": "markdown", 82 | "metadata": {}, 83 | "source": [ 84 | "### 3.1 Using C++ Vector\n", 85 | "Can be used in place of a Python list.\n", 86 | "1. Initialization - initialize from a Python iterable; the element type must be declared, e.g. ``vector[int]``\n", 87 | "2. Iteration - increment an index in a while loop\n", 88 | "3. Access - use the '[]' operator, just as in Python\n", 89 | "4.
Append - similar to Python's list.append, use the vector's push_back method to append an element\n", 90 | "\n", 91 | "Finally, we compare performance by implementing an element-counting function in both Python and C++; the C++ version is roughly 230x faster. \n", 92 | "Note: for a fair comparison, neither function takes arguments; both read variables defined outside the function body, so the C++ timing excludes the cost of converting the Python list into a C++ vector. If that conversion is included, the C++ version is only about 4x faster." 93 | ] 94 | }, 95 | { 96 | "cell_type": "code", 97 | "execution_count": 3, 98 | "metadata": {}, 99 | "outputs": [ 100 | { 101 | "name": "stdout", 102 | "output_type": "stream", 103 | "text": [ 104 | "Iterating...\n", 105 | "\tElement at index 0 is 0\n", 106 | "\tElement at index 1 is 1\n", 107 | "\tElement at index 2 is 2\n", 108 | "\tElement at index 3 is 3\n", 109 | "\tElement at index 4 is 4\n", 110 | "\n", 111 | "After appending, vec is [0, 1, 2, 3, 4, 5]\n" 112 | ] 113 | } 114 | ], 115 | "source": [ 116 | "%%cython --cplus --compile-args=-stdlib=libc++ --link-args=-stdlib=libc++\n", 117 | "from libcpp.vector cimport vector\n", 118 | "# Initialize from a Python object\n", 119 | "cdef vector[int] vec = range(5)\n", 120 | "# Iterate\n", 121 | "cdef:\n", 122 | " int i = 0\n", 123 | " int n = vec.size()\n", 124 | "print(\"Iterating...\")\n", 125 | "while i < n:\n", 126 | " # Access\n", 127 | " print(\"\\tElement at index %d is %d\" % (i, vec[i]))\n", 128 | " i += 1\n", 129 | "print()\n", 130 | "# Append\n", 131 | "vec.push_back(5)\n", 132 | "print(\"After appending, vec is\", vec)" 133 | ] 134 | }, 135 | { 136 | "cell_type": "code", 137 | "execution_count": 4, 138 | "metadata": {}, 139 | "outputs": [ 140 | { 141 | "name": "stdout", 142 | "output_type": "stream", 143 | "text": [ 144 | "Python implementation, result: 100!\n" 145 | ] 146 | } 147 | ], 148 | "source": [ 149 | "arr = [x // 100 for x in range(1000)]\n", 150 | "target = 6\n", 151 | "\n", 152 | "def count_py():\n", 153 | " return sum(1 for x in arr if x == target)\n", 154 | "\n", 155 | "print(\"Python implementation, result: %d!\" % count_py())" 156 | ] 157 | }, 158 | { 159 | "cell_type": "code", 160 | "execution_count": 5, 161 | "metadata": {}, 162 | "outputs": [ 163 | { 164 | "name": "stdout", 165 | "output_type": "stream", 166 | "text": [ 167 | "Cython (C++) implementation, result: 100!\n" 168 | ] 169 | } 170 | ], 171 | "source": [ 172 | "%%cython --cplus
--compile-args=-stdlib=libc++ --link-args=-stdlib=libc++\n", 173 | "from libcpp.vector cimport vector\n", 174 | "\n", 175 | "cdef:\n", 176 | " int target = 6\n", 177 | " vector[int] v = [x // 100 for x in range(1000)]\n", 178 | "\n", 179 | "cdef int _count_cpp():\n", 180 | " cdef:\n", 181 | " int i = 0\n", 182 | " int n = v.size()\n", 183 | " int ret = 0\n", 184 | " while i < n:\n", 185 | " if v[i] == target:\n", 186 | " ret += 1\n", 187 | " i += 1\n", 188 | " return ret\n", 189 | "\n", 190 | "def count_cpp():\n", 191 | " return _count_cpp()\n", 192 | "\n", 193 | "print(\"Cython (C++) implementation, result: %d!\" % count_cpp())" 194 | ] 195 | }, 196 | { 197 | "cell_type": "code", 198 | "execution_count": 6, 199 | "metadata": {}, 200 | "outputs": [ 201 | { 202 | "name": "stdout", 203 | "output_type": "stream", 204 | "text": [ 205 | "Comparing the performance of the Python and C++ versions...\n", 206 | "29.9 µs ± 995 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)\n", 207 | "130 ns ± 2.91 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)\n" 208 | ] 209 | } 210 | ], 211 | "source": [ 212 | "print(\"Comparing the performance of the Python and C++ versions...\")\n", 213 | "%timeit count_py()\n", 214 | "%timeit count_cpp()" 215 | ] 216 | }, 217 | { 218 | "cell_type": "markdown", 219 | "metadata": {}, 220 | "source": [ 221 | "### 3.2 Using C++ Unordered Map\n", 222 | "Can be used in place of a Python dict.\n", 223 | "1. Initialization - initialize from a Python dict; the key and value types must be declared, e.g. ``unordered_map[int, float]``\n", 224 | "2. Iteration - increment an iterator in a while loop\n", 225 | "3. Access - dereference the iterator with deref (the C++ '*' operator) to get a pair; use .first for the key and .second for the value\n", 226 | "4. Lookup - unordered_map.count returns 1 or 0; alternatively, unordered_map.find returns an iterator, which equals unordered_map.end() if the key is not found.\n", 227 | "5.
Insert/update - unordered_map[key] = value. If the key does not exist, the '[]' operator inserts it with a default-initialized value (e.g. 0.0). So unless this is known to be harmless, check that the key exists before modifying its value. This behaves somewhat like Python's defaultdict. \n", 228 | "\n", 229 | "Finally, we compare performance with Python and C++ implementations of a conditional sum over a map; the C++ version is roughly 40x faster." 230 | ] 231 | }, 232 | { 233 | "cell_type": "code", 234 | "execution_count": 7, 235 | "metadata": {}, 236 | "outputs": [ 237 | { 238 | "name": "stdout", 239 | "output_type": "stream", 240 | "text": [ 241 | "Iterating...\n", 242 | "\tKey is 0, Value is 0.0\n", 243 | "\tKey is 1, Value is 0.1\n", 244 | "\tKey is 2, Value is 0.2\n", 245 | "\tKey is 3, Value is 0.3\n", 246 | "\tKey is 4, Value is 0.4\n", 247 | "\tKey is 5, Value is 0.5\n", 248 | "\tKey is 6, Value is 0.6\n", 249 | "\tKey is 7, Value is 0.7\n", 250 | "\tKey is 8, Value is 0.8\n", 251 | "\tKey is 9, Value is 0.9\n", 252 | "\n", 253 | "Looking up...\n", 254 | "\tKey -2 does not exist!\n", 255 | "\tKey 3 exists, its value is 0.3!\n", 256 | "\n", 257 | "Modifying elements...\n", 258 | "\tKey is 3, Value is 1.3\n", 259 | "\tKey is -2, Value is 0.0\n" 260 | ] 261 | } 262 | ], 263 | "source": [ 264 | "%%cython --cplus --compile-args=-stdlib=libc++ --link-args=-stdlib=libc++\n", 265 | "from cython.operator cimport dereference as deref, preincrement as inc\n", 266 | "from libcpp.unordered_map cimport unordered_map\n", 267 | "# Initialize from a Python object\n", 268 | "cdef unordered_map[int, float] mymap = {i: i/10 for i in range(10)}\n", 269 | "# Iterate\n", 270 | "cdef:\n", 271 | " unordered_map[int, float].iterator it = mymap.begin()\n", 272 | " unordered_map[int, float].iterator end = mymap.end()\n", 273 | "print(\"Iterating...\")\n", 274 | "while it != end:\n", 275 | " # Access\n", 276 | " print(\"\\tKey is %d, Value is %.1f\" % (deref(it).first, deref(it).second))\n", 277 | " inc(it)\n", 278 | "print()\n", 279 | "\n", 280 | "# Look up\n", 281 | "print(\"Looking up...\")\n", 282 | "if mymap.count(-2):\n", 283 | " print(\"\\tKey -2 exists!\")\n", 284 | "else:\n", 285 | " print(\"\\tKey -2 does not exist!\")\n", 286 | "\n", 287 | "it = mymap.find(3)\n", 288 | "if it != end:\n", 289 | "
print(\"\\tKey 3 exists, its value is %.1f!\" % deref(it).second)\n", 290 | "else:\n", 291 | " print(\"\\tKey 3 does not exist!\")\n", 292 | "print()\n", 293 | "\n", 294 | "# Modify\n", 295 | "print(\"Modifying elements...\")\n", 296 | "if mymap.count(3):\n", 297 | " mymap[3] += 1.0\n", 298 | "mymap[-2] # Key -2 does not exist, so it is inserted with the default value 0.0\n", 299 | "print(\"\\tKey is 3, Value is %.1f\" % mymap[3])\n", 300 | "print(\"\\tKey is -2, Value is %.1f\" % mymap[-2])" 301 | ] 302 | }, 303 | { 304 | "cell_type": "code", 305 | "execution_count": 8, 306 | "metadata": {}, 307 | "outputs": [ 308 | { 309 | "name": "stdout", 310 | "output_type": "stream", 311 | "text": [ 312 | "Python implementation, result: 1225!\n" 313 | ] 314 | } 315 | ], 316 | "source": [ 317 | "my_map = {x: x for x in range(100)}\n", 318 | "target = 50\n", 319 | "\n", 320 | "def sum_lt_py():\n", 321 | " return sum(my_map[x] for x in my_map if x < target)\n", 322 | "\n", 323 | "print(\"Python implementation, result: %d!\" % sum_lt_py())" 324 | ] 325 | }, 326 | { 327 | "cell_type": "code", 328 | "execution_count": 9, 329 | "metadata": {}, 330 | "outputs": [ 331 | { 332 | "name": "stdout", 333 | "output_type": "stream", 334 | "text": [ 335 | "Cython (C++) implementation, result: 1225!\n" 336 | ] 337 | } 338 | ], 339 | "source": [ 340 | "%%cython --cplus --compile-args=-stdlib=libc++ --link-args=-stdlib=libc++\n", 341 | "from libcpp.unordered_map cimport unordered_map\n", 342 | "from cython.operator cimport dereference as deref, preincrement as inc\n", 343 | "\n", 344 | "cdef:\n", 345 | " unordered_map[int, int] my_map = {x: x for x in range(100)}\n", 346 | " int target = 50\n", 347 | "\n", 348 | "cdef _sum_lt_cpp():\n", 349 | " cdef:\n", 350 | " unordered_map[int, int].iterator it = my_map.begin()\n", 351 | " int ret = 0\n", 352 | " while it != my_map.end():\n", 353 | " if deref(it).first < target:\n", 354 | " ret += deref(it).second\n", 355 | " inc(it)\n", 356 | " return ret\n", 357 | "\n", 358 | "def sum_lt_cpp():\n", 359 | " return _sum_lt_cpp()\n", 360 | "\n", 361 | "print(\"Cython (C++) implementation, result: %d!\"%
 sum_lt_cpp())" 362 | ] 363 | }, 364 | { 365 | "cell_type": "code", 366 | "execution_count": 10, 367 | "metadata": {}, 368 | "outputs": [ 369 | { 370 | "name": "stdout", 371 | "output_type": "stream", 372 | "text": [ 373 | "Comparing the performance of the Python and C++ versions...\n", 374 | "6.56 µs ± 117 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)\n", 375 | "162 ns ± 6.29 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)\n" 376 | ] 377 | } 378 | ], 379 | "source": [ 380 | "print(\"Comparing the performance of the Python and C++ versions...\")\n", 381 | "%timeit sum_lt_py()\n", 382 | "%timeit sum_lt_cpp()" 383 | ] 384 | }, 385 | { 386 | "cell_type": "markdown", 387 | "metadata": {}, 388 | "source": [ 389 | "### 3.3 Using C++ Unordered Set\n", 390 | "Can be used in place of a Python set. \n", 391 | "1. Initialization - initialize from a Python iterable; the element type must be declared, e.g. ``unordered_set[int]``\n", 392 | "2. Iteration - increment an iterator in a while loop\n", 393 | "3. Access - dereference the iterator with deref (the C++ '*' operator)\n", 394 | "4. Lookup - unordered_set.count returns 1 or 0\n", 395 | "5. Insert - use unordered_set.insert; if the element already exists, it is not inserted again\n", 396 | "6. Intersection, union, difference - as far as I know, unordered_set does not provide these operations, so you have to implement them yourself, which is less convenient than Python's set.\n", 397 | " \n", 398 | "Finally, we compare performance by implementing set intersection in Python and C++: the C++ version is about 20x **slower**. For details see https://stackoverflow.com/questions/54763112/how-to-improve-stdset-intersection-performance-in-c \n", 399 | "If we only count the number of common elements, the C++ version is about 6x faster than Python. This suggests that lookups in a C++ unordered_set are fast, while building one is slow." 400 | ] 401 | }, 402 | { 403 | "cell_type": "code", 404 | "execution_count": 11, 405 | "metadata": {}, 406 | "outputs": [ 407 | { 408 | "name": "stdout", 409 | "output_type": "stream", 410 | "text": [ 411 | "Iterating...\n", 412 | "\tValue is 0\n", 413 | "\tValue is 1\n", 414 | "\tValue is 2\n", 415 | "\tValue is 3\n", 416 | "\tValue is 4\n", 417 | "\n", 418 | "Looking up...\n", 419 | "\tElement -2 does not exist!\n", 420 | "\n", 421 | "Inserting elements...\n", 422 | "\tMyset is: {0, 1, 2, 3, 4, -1}\n" 423 | ] 424 | } 425 | ], 426 | "source": [ 427 | "%%cython --cplus --compile-args=-stdlib=libc++ --link-args=-stdlib=libc++\n", 428 | "from cython.operator cimport dereference as deref, preincrement as inc\n", 429 | "from
libcpp.unordered_set cimport unordered_set\n", 430 | "# Initialize from a Python object\n", 431 | "cdef unordered_set[int] myset = {i for i in range(5)}\n", 432 | "# Iterate\n", 433 | "cdef:\n", 434 | " unordered_set[int].iterator it = myset.begin()\n", 435 | " unordered_set[int].iterator end = myset.end()\n", 436 | "print(\"Iterating...\")\n", 437 | "while it != end:\n", 438 | " # Access\n", 439 | " print(\"\\tValue is %d\" % deref(it))\n", 440 | " inc(it)\n", 441 | "print()\n", 442 | "\n", 443 | "# Look up\n", 444 | "print(\"Looking up...\")\n", 445 | "if myset.count(-2):\n", 446 | " print(\"\\tElement -2 exists!\")\n", 447 | "else:\n", 448 | " print(\"\\tElement -2 does not exist!\")\n", 449 | "\n", 450 | "print()\n", 451 | "\n", 452 | "# Insert\n", 453 | "print(\"Inserting elements...\")\n", 454 | "myset.insert(0)\n", 455 | "myset.insert(-1)\n", 456 | "\n", 457 | "print(\"\\tMyset is:\", myset)" 458 | ] 459 | }, 460 | { 461 | "cell_type": "code", 462 | "execution_count": 12, 463 | "metadata": {}, 464 | "outputs": [ 465 | { 466 | "name": "stdout", 467 | "output_type": "stream", 468 | "text": [ 469 | "Python implementation, result: {50, 51, 52, 53, 54, 55, 56, 57, 58, 59}!\n" 470 | ] 471 | } 472 | ], 473 | "source": [ 474 | "myset1 = {x for x in range(100)}\n", 475 | "myset2 = {x for x in range(50, 60)}\n", 476 | "\n", 477 | "def intersection_py():\n", 478 | " return myset1 & myset2\n", 479 | "\n", 480 | "print(\"Python implementation, result: %s!\" % intersection_py())" 481 | ] 482 | }, 483 | { 484 | "cell_type": "code", 485 | "execution_count": 13, 486 | "metadata": {}, 487 | "outputs": [ 488 | { 489 | "name": "stdout", 490 | "output_type": "stream", 491 | "text": [ 492 | "Cython (C++) implementation, result: {50, 51, 52, 53, 54, 55, 56, 57, 58, 59}!\n" 493 | ] 494 | } 495 | ], 496 | "source": [ 497 | "%%cython --cplus --compile-args=-stdlib=libc++ --link-args=-stdlib=libc++\n", 498 | "from cython.operator cimport dereference as deref, preincrement as inc\n", 499 | "from libcpp.unordered_set cimport unordered_set\n", 500 | "\n", 501 | "cdef:\n", 502 | " unordered_set[int] myset1 = {x for x in
range(100)}\n", 503 | " unordered_set[int] myset2 = {x for x in range(50, 60)}\n", 504 | "\n", 505 | "cdef unordered_set[int] _intersection_cpp():\n", 506 | " cdef:\n", 507 | " unordered_set[int].iterator it = myset1.begin()\n", 508 | " unordered_set[int] ret\n", 509 | " while it != myset1.end():\n", 510 | " if myset2.count(deref(it)):\n", 511 | " ret.insert(deref(it))\n", 512 | " inc(it)\n", 513 | " return ret\n", 514 | "\n", 515 | "def intersection_cpp():\n", 516 | " return _intersection_cpp()\n", 517 | "\n", 518 | "print(\"Cython (C++) implementation, result: %s!\" % intersection_cpp())" 519 | ] 520 | }, 521 | { 522 | "cell_type": "code", 523 | "execution_count": 14, 524 | "metadata": {}, 525 | "outputs": [ 526 | { 527 | "name": "stdout", 528 | "output_type": "stream", 529 | "text": [ 530 | "Comparing the performance of the Python and C++ versions...\n", 531 | "274 ns ± 13.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)\n", 532 | "5.28 µs ± 220 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)\n" 533 | ] 534 | } 535 | ], 536 | "source": [ 537 | "print(\"Comparing the performance of the Python and C++ versions...\")\n", 538 | "%timeit intersection_py()\n", 539 | "%timeit intersection_cpp()" 540 | ] 541 | }, 542 | { 543 | "cell_type": "code", 544 | "execution_count": 15, 545 | "metadata": {}, 546 | "outputs": [ 547 | { 548 | "name": "stdout", 549 | "output_type": "stream", 550 | "text": [ 551 | "Python implementation, result: 10!\n" 552 | ] 553 | } 554 | ], 555 | "source": [ 556 | "myset1 = {x for x in range(100)}\n", 557 | "myset2 = {x for x in range(50, 60)}\n", 558 | "\n", 559 | "def count_common_py():\n", 560 | " return len(myset1 & myset2)\n", 561 | "\n", 562 | "print(\"Python implementation, result: %s!\" % count_common_py())" 563 | ] 564 | }, 565 | { 566 | "cell_type": "code", 567 | "execution_count": 16, 568 | "metadata": {}, 569 | "outputs": [ 570 | { 571 | "name": "stdout", 572 | "output_type": "stream", 573 | "text": [ 574 | "Cython (C++) implementation, result: 10!\n" 575 | ] 576 | } 577 | ], 578 | "source": [ 579 | "%%cython --cplus
--compile-args=-stdlib=libc++ --link-args=-stdlib=libc++\n", 580 | "from cython.operator cimport dereference as deref, preincrement as inc\n", 581 | "from libcpp.unordered_set cimport unordered_set\n", 582 | "\n", 583 | "cdef:\n", 584 | " unordered_set[int] myset2 = {x for x in range(100)}\n", 585 | " unordered_set[int] myset1 = {x for x in range(50, 60)}\n", 586 | "\n", 587 | "cdef int _count_common_cpp():\n", 588 | " if myset1.size() > myset2.size():\n", 589 | " myset1.swap(myset2)\n", 590 | " cdef:\n", 591 | " unordered_set[int].iterator it = myset1.begin()\n", 592 | " int ret = 0\n", 593 | " while it != myset1.end():\n", 594 | " if myset2.count(deref(it)):\n", 595 | " ret += 1\n", 596 | " inc(it)\n", 597 | " return ret\n", 598 | "\n", 599 | "def count_common_cpp():\n", 600 | " return _count_common_cpp()\n", 601 | "\n", 602 | "print(\"Cython (C++) implementation, result: %s!\" % count_common_cpp())" 603 | ] 604 | }, 605 | { 606 | "cell_type": "code", 607 | "execution_count": 17, 608 | "metadata": {}, 609 | "outputs": [ 610 | { 611 | "name": "stdout", 612 | "output_type": "stream", 613 | "text": [ 614 | "Comparing the performance of the Python and C++ versions...\n", 615 | "295 ns ± 5.91 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)\n", 616 | "46.1 ns ± 0.785 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)\n" 617 | ] 618 | } 619 | ], 620 | "source": [ 621 | "print(\"Comparing the performance of the Python and C++ versions...\")\n", 622 | "%timeit count_common_py()\n", 623 | "%timeit count_common_cpp()" 624 | ] 625 | }, 626 | { 627 | "cell_type": "markdown", 628 | "metadata": {}, 629 | "source": [ 630 | "## 4.
Pass by value vs. pass by reference\n", 631 | "Python always passes references to objects. A mutable container (such as a list or set) can therefore be modified inside the function, whereas immutable values (such as int and float) behave as if passed by value. If a function must not modify a container, make a copy with deepcopy first. \n", 632 | "In C++, by contrast, arguments are passed by value by default; pass-by-reference must be declared explicitly.\n", 633 | "Taking an int vector as an example, v1 is not modified by pass_value but is modified by pass_reference.\n", 634 | "- Pass by value uses ``vector[int]``: pass_value receives only a copy of v1, so it cannot modify v1\n", 635 | "- Pass by reference uses ``vector[int]&``: pass_reference receives a reference to v1, so it can modify v1. \n", 636 | "\n", 637 | "The two code cells below show the difference between Python and C++." 638 | ] 639 | }, 640 | { 641 | "cell_type": "code", 642 | "execution_count": 18, 643 | "metadata": {}, 644 | "outputs": [ 645 | { 646 | "name": "stdout", 647 | "output_type": "stream", 648 | "text": [ 649 | "Initial value of v1: [0, 0, 0]\n", 650 | "After pass_value, v1 is [0, 0, 0]\n", 651 | "After pass_reference, v1 is [-1, 0, 0]\n" 652 | ] 653 | } 654 | ], 655 | "source": [ 656 | "from copy import deepcopy\n", 657 | "\n", 658 | "def pass_value(v):\n", 659 | " v = deepcopy(v)\n", 660 | " v[0] = -1\n", 661 | "\n", 662 | "def pass_reference(v):\n", 663 | " v[0] = -1\n", 664 | "\n", 665 | "v1 = [0, 0, 0]\n", 666 | "print(\"Initial value of v1: %s\" % v1)\n", 667 | "pass_value(v1)\n", 668 | "print(\"After pass_value, v1 is %s\" % v1)\n", 669 | "pass_reference(v1)\n", 670 | "print(\"After pass_reference, v1 is %s\" % v1)" 671 | ] 672 | }, 673 | { 674 | "cell_type": "code", 675 | "execution_count": 19, 676 | "metadata": {}, 677 | "outputs": [ 678 | { 679 | "name": "stdout", 680 | "output_type": "stream", 681 | "text": [ 682 | "Initial value of v1: [0, 0, 0]\n", 683 | "After pass_value, v1 is [0, 0, 0]\n", 684 | "After pass_reference, v1 is [-1, 0, 0]\n" 685 | ] 686 | } 687 | ], 688 | "source": [ 689 | "%%cython --cplus --compile-args=-stdlib=libc++ --link-args=-stdlib=libc++\n", 690 | "\n", 691 | "from libcpp.vector cimport vector\n", 692 | "\n", 693 | "cdef void pass_value(vector[int] v):\n", 694 | " v[0] = -1\n", 695 | "\n", 696 | "cdef void pass_reference(vector[int]& v):\n", 697 | " v[0] = -1\n", 698 | "\n", 699 | "cdef vector[int] v1 = [0, 0, 0]\n", 700 | "print(\"Initial value of v1: %s\" % v1)\n", 701 | "pass_value(v1)\n", 
702 | "print(\"After pass_value, v1 is %s\" % v1)\n", 703 | "pass_reference(v1)\n", 704 | "print(\"After pass_reference, v1 is %s\" % v1)" 705 | ] 706 | }, 707 | { 708 | "cell_type": "markdown", 709 | "metadata": {}, 710 | "source": [ 711 | "## 5. Numeric ranges\n", 712 | "Python has a single integer type, int, whose range is effectively unbounded (limited only by available memory), so Python users rarely worry about numeric overflow. In C++ you need to be careful; the typical sizes and ranges of the C++ numeric types are: \n", 713 | "\n", 714 | "\n", 715 | "|Type|Typical Size|Typical Range|\n", 716 | "| ------ | ------ | ------ |\n", 717 | "|char|1 byte|-128 to 127 or 0 to 255|\n", 718 | "|unsigned char|1 byte|0 to 255|\n", 719 | "|signed char|1 byte|-128 to 127|\n", 720 | "|int|4 bytes|-2147483648 to 2147483647|\n", 721 | "|unsigned int|4 bytes|0 to 4294967295|\n", 722 | "|signed int|4 bytes|-2147483648 to 2147483647|\n", 723 | "|short int|2 bytes|-32768 to 32767|\n", 724 | "|unsigned short int|2 bytes|0 to 65,535|\n", 725 | "|signed short int|2 bytes|-32768 to 32767|\n", 726 | "|long int|4 or 8 bytes|-(2^31) to (2^31)-1 if 4 bytes (e.g. Windows); -(2^63) to (2^63)-1 if 8 bytes (e.g. 64-bit Linux/macOS)|\n", 727 | "|signed long int|same as long int|same as long int|\n", 728 | "|unsigned long int|4 or 8 bytes|0 to (2^32)-1 if 4 bytes; 0 to (2^64)-1 if 8 bytes|\n", 729 | "|long long int|8 bytes|-(2^63) to (2^63)-1|\n", 730 | "|unsigned long long int|8 bytes|0 to 18,446,744,073,709,551,615|\n", 731 | "|float|4 bytes|about ±3.4e38, ~7 significant decimal digits|\n", 732 | "|double|8 bytes|about ±1.8e308, ~15-16 significant decimal digits|\n", 733 | "|long double|8, 12 or 16 bytes|platform-dependent|\n", 734 | "|wchar_t|2 or 4 bytes|1 wide character|\n", 735 | "\n", 736 | "\n", 737 | "For example, the functions below produce incorrect results." 738 | ] 739 | }, 740 | { 741 | "cell_type": "code", 742 | "execution_count": 20, 743 | "metadata": {}, 744 | "outputs": [], 745 | "source": [ 746 | "%%cython --cplus --compile-args=-stdlib=libc++ --link-args=-stdlib=libc++\n", 747 | "def sum_py(num1, num2):\n", 748 | " print(\"The result by python is:\", num1 + num2)\n", 749 | "\n", 750 | "cdef int _sum_cpp(int num1, int num2): # int overflows here; use long long to fix\n", 751 | " return num1 + num2\n", 752 | "\n", 753 | "def sum_cpp(num1, num2):\n", 754 | " print(\"The result by cpp
is:\", _sum_cpp(num1, num2))" 755 | ] 756 | }, 757 | { 758 | "cell_type": "code", 759 | "execution_count": 21, 760 | "metadata": {}, 761 | "outputs": [ 762 | { 763 | "name": "stdout", 764 | "output_type": "stream", 765 | "text": [ 766 | "The result by python is: 2147483648\n", 767 | "The result by cpp is: -2147483648\n" 768 | ] 769 | } 770 | ], 771 | "source": [ 772 | "sum_py(2**31-1, 1)\n", 773 | "sum_cpp(2**31-1, 1)" 774 | ] 775 | }, 776 | { 777 | "cell_type": "code", 778 | "execution_count": 22, 779 | "metadata": {}, 780 | "outputs": [], 781 | "source": [ 782 | "%%cython --cplus --compile-args=-stdlib=libc++ --link-args=-stdlib=libc++\n", 783 | "from libcpp cimport bool\n", 784 | "\n", 785 | "def lt_py(num1, num2):\n", 786 | " print(\"The result by python is:\", num1 < num2)\n", 787 | "\n", 788 | "cdef bool _lt_cpp(float num1, float num2): # float loses precision here; use double to fix\n", 789 | " return num1 < num2\n", 790 | "\n", 791 | "def lt_cpp(num1, num2):\n", 792 | " print(\"The result by cpp is:\", _lt_cpp(num1, num2))" 793 | ] 794 | }, 795 | { 796 | "cell_type": "code", 797 | "execution_count": 23, 798 | "metadata": {}, 799 | "outputs": [ 800 | { 801 | "name": "stdout", 802 | "output_type": "stream", 803 | "text": [ 804 | "The result by python is: True\n", 805 | "The result by cpp is: False\n" 806 | ] 807 | } 808 | ], 809 | "source": [ 810 | "lt_py(1234567890.0, 1234567891.0)\n", 811 | "lt_cpp(1234567890.0, 1234567891.0)" 812 | ] 813 | }, 814 | { 815 | "cell_type": "code", 816 | "execution_count": null, 817 | "metadata": {}, 818 | "outputs": [], 819 | "source": [] 820 | } 821 | ], 822 | "metadata": { 823 | "kernelspec": { 824 | "display_name": "Python 3", 825 | "language": "python", 826 | "name": "python3" 827 | }, 828 | "language_info": { 829 | "codemirror_mode": { 830 | "name": "ipython", 831 | "version": 3 832 | }, 833 | "file_extension": ".py", 834 | "mimetype": "text/x-python", 835 | "name": "python", 836 | "nbconvert_exporter": "python", 837 | "pygments_lexer": "ipython3",
838 | "version": "3.6.6" 839 | } 840 | }, 841 | "nbformat": 4, 842 | "nbformat_minor": 2 843 | } 844 | -------------------------------------------------------------------------------- /Built-in method.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Optimizing Python performance with built-in functions\n", 8 | "Author: tushushu \n", 9 | "Project: https://github.com/tushushu/flying-python\n", 10 | "\n", 11 | "One possible reason a Python program runs slowly is that it does not make enough use of built-in functions. The following 5 examples show how built-in functions can improve the performance of Python programs." 12 | ] 13 | }, 14 | { 15 | "cell_type": "markdown", 16 | "metadata": {}, 17 | "source": [ 18 | "## 1. Sum of squares of a list\n", 19 | "Given a list, compute the sum of the squares of its numbers. The final version is about 1.4x faster." 20 | ] 21 | }, 22 | { 23 | "cell_type": "markdown", 24 | "metadata": {}, 25 | "source": [ 26 | "First, create a list of length 10000." 27 | ] 28 | }, 29 | { 30 | "cell_type": "code", 31 | "execution_count": 1, 32 | "metadata": {}, 33 | "outputs": [], 34 | "source": [ 35 | "arr = list(range(10000))" 36 | ] 37 | }, 38 | { 39 | "cell_type": "markdown", 40 | "metadata": {}, 41 | "source": [ 42 | "### 1.1 The most conventional approach\n", 43 | "Iterate over the list with a while loop and sum the squares. Mean run time: 2.97 ms." 44 | ] 45 | }, 46 | { 47 | "cell_type": "code", 48 | "execution_count": 2, 49 | "metadata": {}, 50 | "outputs": [], 51 | "source": [ 52 | "def sum_sqr_0(arr):\n", 53 | " res = 0\n", 54 | " n = len(arr)\n", 55 | " i = 0\n", 56 | " while i < n:\n", 57 | " res += arr[i] ** 2\n", 58 | " i += 1\n", 59 | " return res" 60 | ] 61 | }, 62 | { 63 | "cell_type": "code", 64 | "execution_count": 3, 65 | "metadata": {}, 66 | "outputs": [ 67 | { 68 | "name": "stdout", 69 | "output_type": "stream", 70 | "text": [ 71 | "2.97 ms ± 36.4 µs per loop (mean ± std. dev.
of 7 runs, 100 loops each)\n" 72 | ] 73 | } 74 | ], 75 | "source": [ 76 | "%timeit sum_sqr_0(arr)" 77 | ] 78 | }, 79 | { 80 | "cell_type": "markdown", 81 | "metadata": {}, 82 | "source": [ 83 | "### 1.2 Replace the while loop with for range\n", 84 | "This avoids the per-iteration overhead of the dynamically typed i += 1. Mean run time: 2.9 ms." 85 | ] 86 | }, 87 | { 88 | "cell_type": "code", 89 | "execution_count": 4, 90 | "metadata": {}, 91 | "outputs": [], 92 | "source": [ 93 | "def sum_sqr_1(arr):\n", 94 | " res = 0\n", 95 | " for i in range(len(arr)):\n", 96 | " res += arr[i] ** 2\n", 97 | " return res" 98 | ] 99 | }, 100 | { 101 | "cell_type": "code", 102 | "execution_count": 5, 103 | "metadata": {}, 104 | "outputs": [ 105 | { 106 | "name": "stdout", 107 | "output_type": "stream", 108 | "text": [ 109 | "2.9 ms ± 137 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n" 110 | ] 111 | } 112 | ], 113 | "source": [ 114 | "%timeit sum_sqr_1(arr)" 115 | ] 116 | }, 117 | { 118 | "cell_type": "markdown", 119 | "metadata": {}, 120 | "source": [ 121 | "### 1.3 Replace for range with for x in arr\n", 122 | "This avoids the per-iteration overhead of the dynamically typed indexing arr[i]. Mean run time: 2.59 ms." 123 | ] 124 | }, 125 | { 126 | "cell_type": "code", 127 | "execution_count": 6, 128 | "metadata": {}, 129 | "outputs": [], 130 | "source": [ 131 | "def sum_sqr_2(arr):\n", 132 | " res = 0\n", 133 | " for x in arr:\n", 134 | " res += x ** 2\n", 135 | " return res" 136 | ] 137 | }, 138 | { 139 | "cell_type": "code", 140 | "execution_count": 7, 141 | "metadata": {}, 142 | "outputs": [ 143 | { 144 | "name": "stdout", 145 | "output_type": "stream", 146 | "text": [ 147 | "2.59 ms ± 89 µs per loop (mean ± std. dev.
of 7 runs, 100 loops each)\n" 148 | ] 149 | } 150 | ], 151 | "source": [ 152 | "%timeit sum_sqr_2(arr)" 153 | ] 154 | }, 155 | { 156 | "cell_type": "markdown", 157 | "metadata": {}, 158 | "source": [ 159 | "### 1.4 sum over map\n", 160 | "Mean run time: 2.36 ms." 161 | ] 162 | }, 163 | { 164 | "cell_type": "code", 165 | "execution_count": 8, 166 | "metadata": {}, 167 | "outputs": [], 168 | "source": [ 169 | "def sum_sqr_3(arr):\n", 170 | " return sum(map(lambda x: x**2, arr))" 171 | ] 172 | }, 173 | { 174 | "cell_type": "code", 175 | "execution_count": 9, 176 | "metadata": {}, 177 | "outputs": [ 178 | { 179 | "name": "stdout", 180 | "output_type": "stream", 181 | "text": [ 182 | "2.36 ms ± 15.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n" 183 | ] 184 | } 185 | ], 186 | "source": [ 187 | "%timeit sum_sqr_3(arr)" 188 | ] 189 | }, 190 | { 191 | "cell_type": "markdown", 192 | "metadata": {}, 193 | "source": [ 194 | "### 1.5 sum over a generator expression\n", 195 | "When a generator expression is the only argument of a function call, its own parentheses can be omitted. Mean run time: 2.35 ms." 196 | ] 197 | }, 198 | { 199 | "cell_type": "code", 200 | "execution_count": 10, 201 | "metadata": {}, 202 | "outputs": [], 203 | "source": [ 204 | "def sum_sqr_4(arr):\n", 205 | " return sum(x ** 2 for x in arr)" 206 | ] 207 | }, 208 | { 209 | "cell_type": "code", 210 | "execution_count": 11, 211 | "metadata": {}, 212 | "outputs": [ 213 | { 214 | "name": "stdout", 215 | "output_type": "stream", 216 | "text": [ 217 | "2.35 ms ± 107 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n" 218 | ] 219 | } 220 | ], 221 | "source": [ 222 | "%timeit sum_sqr_4(arr)" 223 | ] 224 | }, 225 | { 226 | "cell_type": "markdown", 227 | "metadata": {}, 228 | "source": [ 229 | "### 1.
6 sum over a list comprehension\n", 230 | "Mean run time: 2.06 ms." 231 | ] 232 | }, 233 | { 234 | "cell_type": "code", 235 | "execution_count": 12, 236 | "metadata": {}, 237 | "outputs": [], 238 | "source": [ 239 | "def sum_sqr_5(arr):\n", 240 | " return sum([x ** 2 for x in arr])" 241 | ] 242 | }, 243 | { 244 | "cell_type": "code", 245 | "execution_count": 13, 246 | "metadata": {}, 247 | "outputs": [ 248 | { 249 | "name": "stdout", 250 | "output_type": "stream", 251 | "text": [ 252 | "2.06 ms ± 27.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n" 253 | ] 254 | } 255 | ], 256 | "source": [ 257 | "%timeit sum_sqr_5(arr)" 258 | ] 259 | }, 260 | { 261 | "cell_type": "markdown", 262 | "metadata": {}, 263 | "source": [ 264 | "## 2. String concatenation\n", 265 | "Given a list of strings, concatenate the first 3 characters of each string into a single string. The final version is about 2.2x faster." 266 | ] 267 | }, 268 | { 269 | "cell_type": "markdown", 270 | "metadata": {}, 271 | "source": [ 272 | "First, create a list of 10000 strings with random lengths and contents." 273 | ] 274 | }, 275 | { 276 | "cell_type": "code", 277 | "execution_count": 1, 278 | "metadata": {}, 279 | "outputs": [], 280 | "source": [ 281 | "from random import randint\n", 282 | "\n", 283 | "def random_letter():\n", 284 | " return chr(ord('a') + randint(0, 25))\n", 285 | "\n", 286 | "def random_letters(n):\n", 287 | " return \"\".join([random_letter() for _ in range(n)])\n", 288 | "\n", 289 | "strings = [random_letters(randint(1, 10)) for _ in range(10000)]" 290 | ] 291 | }, 292 | { 293 | "cell_type": "markdown", 294 | "metadata": {}, 295 | "source": [ 296 | "### 2.1 The most conventional approach\n", 297 | "Iterate over the list with a while loop and concatenate the strings. Mean run time: 1.86 ms." 298 | ] 299 | }, 300 | { 301 | "cell_type": "code", 302 | "execution_count": 2, 303 | "metadata": {}, 304 | "outputs": [], 305 | "source": [ 306 | "def concat_strings_0(strings):\n", 307 | " res = \"\"\n", 308 | " n = len(strings)\n", 309 | " i = 0\n", 310 | " while i < n:\n", 311 | " res += strings[i][:3]\n", 312 | " i += 1\n", 313 | " return res" 314 | ] 315 | }, 316 | { 317 | "cell_type": "code", 318 | "execution_count": 3, 319 | 
"metadata": {}, 320 | "outputs": [ 321 | { 322 | "name": "stdout", 323 | "output_type": "stream", 324 | "text": [ 325 | "1.86 ms ± 74.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n" 326 | ] 327 | } 328 | ], 329 | "source": [ 330 | "%timeit concat_strings_0(strings)" 331 | ] 332 | }, 333 | { 334 | "cell_type": "markdown", 335 | "metadata": {}, 336 | "source": [ 337 | "### 2.2 Replace the while loop with for range\n", 338 | "This avoids the per-iteration overhead of the dynamically typed i += 1. Mean run time: 1.55 ms." 339 | ] 340 | }, 341 | { 342 | "cell_type": "code", 343 | "execution_count": 4, 344 | "metadata": {}, 345 | "outputs": [], 346 | "source": [ 347 | "def concat_strings_1(strings):\n", 348 | " res = \"\"\n", 349 | " for i in range(len(strings)):\n", 350 | " res += strings[i][:3]\n", 351 | " return res" 352 | ] 353 | }, 354 | { 355 | "cell_type": "code", 356 | "execution_count": 5, 357 | "metadata": {}, 358 | "outputs": [ 359 | { 360 | "name": "stdout", 361 | "output_type": "stream", 362 | "text": [ 363 | "1.55 ms ± 32.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n" 364 | ] 365 | } 366 | ], 367 | "source": [ 368 | "%timeit concat_strings_1(strings)" 369 | ] 370 | }, 371 | { 372 | "cell_type": "markdown", 373 | "metadata": {}, 374 | "source": [ 375 | "### 2.3 Replace for range with for x in strings\n", 376 | "This avoids the per-iteration overhead of the dynamically typed indexing strings[i]. Mean run time: 1.32 ms." 377 | ] 378 | }, 379 | { 380 | "cell_type": "code", 381 | "execution_count": 6, 382 | "metadata": {}, 383 | "outputs": [], 384 | "source": [ 385 | "def concat_strings_2(strings):\n", 386 | " res = \"\"\n", 387 | " for x in strings:\n", 388 | " res += x[:3]\n", 389 | " return res" 390 | ] 391 | }, 392 | { 393 | "cell_type": "code", 394 | "execution_count": 7, 395 | "metadata": {}, 396 | "outputs": [ 397 | { 398 | "name": "stdout", 399 | "output_type": "stream", 400 | "text": [ 401 | "1.32 ms ± 19.5 µs per loop (mean ± std. dev.
of 7 runs, 1000 loops each)\n" 402 | ] 403 | } 404 | ], 405 | "source": [ 406 | "%timeit concat_strings_2(strings)" 407 | ] 408 | }, 409 | { 410 | "cell_type": "markdown", 411 | "metadata": {}, 412 | "source": [ 413 | "### 2.4 str.join with a generator expression\n", 414 | "A single join call computes the total length once and copies each piece into the result, instead of growing res with repeated +=. Average run time: 1.06 ms." 415 | ] 416 | }, 417 | { 418 | "cell_type": "code", 419 | "execution_count": 8, 420 | "metadata": {}, 421 | "outputs": [], 422 | "source": [ 423 | "def concat_strings_3(strings):\n", 424 | "    return \"\".join(x[:3] for x in strings)" 425 | ] 426 | }, 427 | { 428 | "cell_type": "code", 429 | "execution_count": 9, 430 | "metadata": {}, 431 | "outputs": [ 432 | { 433 | "name": "stdout", 434 | "output_type": "stream", 435 | "text": [ 436 | "1.06 ms ± 15.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n" 437 | ] 438 | } 439 | ], 440 | "source": [ 441 | "%timeit concat_strings_3(strings)" 442 | ] 443 | }, 444 | { 445 | "cell_type": "markdown", 446 | "metadata": {}, 447 | "source": [ 448 | "### 2.5 str.join with a list comprehension\n", 449 | "str.join needs the whole sequence up front, so it turns a generator into a list internally anyway; passing a list comprehension skips that extra step. Average run time: 0.858 ms." 450 | ] 451 | }, 452 | { 453 | "cell_type": "code", 454 | "execution_count": 10, 455 | "metadata": {}, 456 | "outputs": [], 457 | "source": [ 458 | "def concat_strings_4(strings):\n", 459 | "    return \"\".join([x[:3] for x in strings])" 460 | ] 461 | }, 462 | { 463 | "cell_type": "code", 464 | "execution_count": 11, 465 | "metadata": {}, 466 | "outputs": [ 467 | { 468 | "name": "stdout", 469 | "output_type": "stream", 470 | "text": [ 471 | "858 µs ± 14.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n" 472 | ] 473 | } 474 | ], 475 | "source": [ 476 | "%timeit concat_strings_4(strings)" 477 | ] 478 | }, 479 | { 480 | "cell_type": "markdown", 481 | "metadata": {}, 482 | "source": [ 483 | "## 3.
Filtering odd numbers" 484 | ] 485 | }, 486 | { 487 | "cell_type": "markdown", 488 | "metadata": {}, 489 | "source": [ 490 | "Given a list, select all of its odd numbers. The final version is about 3.6 times faster (1.03 ms vs. 0.290 ms)." 491 | ] 492 | }, 493 | { 494 | "cell_type": "markdown", 495 | "metadata": {}, 496 | "source": [ 497 | "First, create a list of length 10,000." 498 | ] 499 | }, 500 | { 501 | "cell_type": "code", 502 | "execution_count": 26, 503 | "metadata": {}, 504 | "outputs": [], 505 | "source": [ 506 | "arr = list(range(10000))" 507 | ] 508 | }, 509 | { 510 | "cell_type": "markdown", 511 | "metadata": {}, 512 | "source": [ 513 | "### 3.1 The conventional approach\n", 514 | "Create an empty list res, iterate over the list with a while loop, and append each odd number to res. Average run time: 1.03 ms." 515 | ] 516 | }, 517 | { 518 | "cell_type": "code", 519 | "execution_count": 27, 520 | "metadata": {}, 521 | "outputs": [], 522 | "source": [ 523 | "def filter_odd_0(arr):\n", 524 | "    res = []\n", 525 | "    i = 0\n", 526 | "    n = len(arr)\n", 527 | "    while i < n:\n", 528 | "        if arr[i] % 2:\n", 529 | "            res.append(arr[i])\n", 530 | "        i += 1\n", 531 | "    return res" 532 | ] 533 | }, 534 | { 535 | "cell_type": "code", 536 | "execution_count": 28, 537 | "metadata": {}, 538 | "outputs": [ 539 | { 540 | "name": "stdout", 541 | "output_type": "stream", 542 | "text": [ 543 | "1.03 ms ± 34.1 µs per loop (mean ± std. dev.
of 7 runs, 1000 loops each)\n" 544 | ] 545 | } 546 | ], 547 | "source": [ 548 | "%timeit filter_odd_0(arr)" 549 | ] 550 | }, 551 | { 552 | "cell_type": "markdown", 553 | "metadata": {}, 554 | "source": [ 555 | "### 3.2 for range instead of a while loop\n", 556 | "range advances the counter in C, so the interpreted i < n comparison is no longer needed. The leftover i += 1 in the loop body is redundant (range reassigns i on every iteration), but it still executes and is included in the measured time. Average run time: 0.965 ms." 557 | ] 558 | }, 559 | { 560 | "cell_type": "code", 561 | "execution_count": 29, 562 | "metadata": {}, 563 | "outputs": [], 564 | "source": [ 565 | "def filter_odd_1(arr):\n", 566 | "    res = []\n", 567 | "    for i in range(len(arr)):\n", 568 | "        if arr[i] % 2:\n", 569 | "            res.append(arr[i])\n", 570 | "        i += 1\n", 571 | "    return res" 572 | ] 573 | }, 574 | { 575 | "cell_type": "code", 576 | "execution_count": 30, 577 | "metadata": {}, 578 | "outputs": [ 579 | { 580 | "name": "stdout", 581 | "output_type": "stream", 582 | "text": [ 583 | "965 µs ± 4.02 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n" 584 | ] 585 | } 586 | ], 587 | "source": [ 588 | "%timeit filter_odd_1(arr)" 589 | ] 590 | }, 591 | { 592 | "cell_type": "markdown", 593 | "metadata": {}, 594 | "source": [ 595 | "### 3.3 for x in arr instead of for range\n", 596 | "Iterating over the list directly avoids the arr[i] index lookups on every iteration. Average run time: 0.430 ms." 597 | ] 598 | }, 599 | { 600 | "cell_type": "code", 601 | "execution_count": 31, 602 | "metadata": {}, 603 | "outputs": [], 604 | "source": [ 605 | "def filter_odd_2(arr):\n", 606 | "    res = []\n", 607 | "    for x in arr:\n", 608 | "        if x % 2:\n", 609 | "            res.append(x)\n", 610 | "    return res" 611 | ] 612 | }, 613 | { 614 | "cell_type": "code", 615 | "execution_count": 32, 616 | "metadata": {}, 617 | "outputs": [ 618 | { 619 | "name": "stdout", 620 | "output_type": "stream", 621 | "text": [ 622 | "430 µs ± 9.25 µs per loop (mean ± std. dev.
of 7 runs, 1000 loops each)\n" 623 | ] 624 | } 625 | ], 626 | "source": [ 627 | "%timeit filter_odd_2(arr)" 628 | ] 629 | }, 630 | { 631 | "cell_type": "markdown", 632 | "metadata": {}, 633 | "source": [ 634 | "### 3.4 list() over filter()\n", 635 | "Average run time: 0.763 ms. filter with a lambda calls a Python function for every element, which makes it slower here than the plain for x in arr loop (0.430 ms); in Python 3.6 it offers little benefit." 636 | ] 637 | }, 638 | { 639 | "cell_type": "code", 640 | "execution_count": 33, 641 | "metadata": {}, 642 | "outputs": [], 643 | "source": [ 644 | "def filter_odd_3(arr):\n", 645 | "    return list(filter(lambda x: x % 2, arr))" 646 | ] 647 | }, 648 | { 649 | "cell_type": "code", 650 | "execution_count": 34, 651 | "metadata": {}, 652 | "outputs": [ 653 | { 654 | "name": "stdout", 655 | "output_type": "stream", 656 | "text": [ 657 | "763 µs ± 15.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n" 658 | ] 659 | } 660 | ], 661 | "source": [ 662 | "%timeit filter_odd_3(arr)" 663 | ] 664 | }, 665 | { 666 | "cell_type": "markdown", 667 | "metadata": {}, 668 | "source": [ 669 | "### 3.5 list() over a generator expression\n", 670 | "Average run time: 0.398 ms." 671 | ] 672 | }, 673 | { 674 | "cell_type": "code", 675 | "execution_count": 35, 676 | "metadata": {}, 677 | "outputs": [], 678 | "source": [ 679 | "def filter_odd_4(arr):\n", 680 | "    return list(x for x in arr if x % 2)" 681 | ] 682 | }, 683 | { 684 | "cell_type": "code", 685 | "execution_count": 36, 686 | "metadata": {}, 687 | "outputs": [ 688 | { 689 | "name": "stdout", 690 | "output_type": "stream", 691 | "text": [ 692 | "398 µs ± 16.4 µs per loop (mean ± std. dev.
of 7 runs, 1000 loops each)\n" 693 | ] 694 | } 695 | ], 696 | "source": [ 697 | "%timeit filter_odd_4(arr)" 698 | ] 699 | }, 700 | { 701 | "cell_type": "markdown", 702 | "metadata": {}, 703 | "source": [ 704 | "### 3.6 List comprehension with a condition\n", 705 | "Average run time: 0.290 ms." 706 | ] 707 | }, 708 | { 709 | "cell_type": "code", 710 | "execution_count": 37, 711 | "metadata": {}, 712 | "outputs": [], 713 | "source": [ 714 | "def filter_odd_5(arr):\n", 715 | "    return [x for x in arr if x % 2]" 716 | ] 717 | }, 718 | { 719 | "cell_type": "code", 720 | "execution_count": 38, 721 | "metadata": {}, 722 | "outputs": [ 723 | { 724 | "name": "stdout", 725 | "output_type": "stream", 726 | "text": [ 727 | "290 µs ± 5.54 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n" 728 | ] 729 | } 730 | ], 731 | "source": [ 732 | "%timeit filter_odd_5(arr)" 733 | ] 734 | }, 735 | { 736 | "cell_type": "markdown", 737 | "metadata": {}, 738 | "source": [ 739 | "## 4. Adding two arrays element-wise" 740 | ] 741 | }, 742 | { 743 | "cell_type": "markdown", 744 | "metadata": {}, 745 | "source": [ 746 | "Given two lists of equal length, compute the sum of the numbers at each position and return a list of the same length. The final version is about 2.7 times faster (1.23 ms vs. 0.462 ms)." 747 | ] 748 | }, 749 | { 750 | "cell_type": "markdown", 751 | "metadata": {}, 752 | "source": [ 753 | "First, create two lists of length 10,000." 754 | ] 755 | }, 756 | { 757 | "cell_type": "code", 758 | "execution_count": 40, 759 | "metadata": {}, 760 | "outputs": [], 761 | "source": [ 762 | "arr1 = list(range(10000))\n", 763 | "arr2 = list(range(10000))" 764 | ] 765 | }, 766 | { 767 | "cell_type": "markdown", 768 | "metadata": {}, 769 | "source": [ 770 | "### 4.1 The conventional approach\n", 771 | "Create an empty list res, iterate with a while loop, and append the sum of the corresponding elements of the two lists to res. Average run time: 1.23 ms." 772 | ] 773 | }, 774 | { 775 | "cell_type": "code", 776 | "execution_count": 41, 777 | "metadata": {}, 778 | "outputs": [], 779 | "source": [ 780 | "def arr_sum_0(arr1, arr2):\n", 781 | "    i = 0\n", 782 | "    n = len(arr1)\n", 783 | "    res = []\n", 784 | "    while i < n:\n", 785 | "        res.append(arr1[i] + arr2[i])\n", 786 | "        i += 1\n", 787 | "    return res" 788 | ]
789 | }, 790 | { 791 | "cell_type": "code", 792 | "execution_count": 42, 793 | "metadata": {}, 794 | "outputs": [ 795 | { 796 | "name": "stdout", 797 | "output_type": "stream", 798 | "text": [ 799 | "1.23 ms ± 3.77 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n" 800 | ] 801 | } 802 | ], 803 | "source": [ 804 | "%timeit arr_sum_0(arr1, arr2)" 805 | ] 806 | }, 807 | { 808 | "cell_type": "markdown", 809 | "metadata": {}, 810 | "source": [ 811 | "### 4.2 for range instead of a while loop\n", 812 | "range advances the counter in C, which avoids the interpreted i < n comparison and i += 1 addition on every iteration. Average run time: 0.997 ms." 813 | ] 814 | }, 815 | { 816 | "cell_type": "code", 817 | "execution_count": 43, 818 | "metadata": {}, 819 | "outputs": [], 820 | "source": [ 821 | "def arr_sum_1(arr1, arr2):\n", 822 | "    res = []\n", 823 | "    for i in range(len(arr1)):\n", 824 | "        res.append(arr1[i] + arr2[i])\n", 825 | "    return res" 826 | ] 827 | }, 828 | { 829 | "cell_type": "code", 830 | "execution_count": 44, 831 | "metadata": {}, 832 | "outputs": [ 833 | { 834 | "name": "stdout", 835 | "output_type": "stream", 836 | "text": [ 837 | "997 µs ± 7.42 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n" 838 | ] 839 | } 840 | ], 841 | "source": [ 842 | "%timeit arr_sum_1(arr1, arr2)" 843 | ] 844 | }, 845 | { 846 | "cell_type": "markdown", 847 | "metadata": {}, 848 | "source": [ 849 | "### 4.3 for i, x in enumerate instead of for range\n", 850 | "Copy arr1 first and add the elements of arr2 into the copy in place. enumerate yields the elements of arr2 directly, which removes the arr2[i] lookups, although res[i] is still indexed. Average run time: 0.799 ms." 851 | ] 852 | }, 853 | { 854 | "cell_type": "code", 855 | "execution_count": 45, 856 | "metadata": {}, 857 | "outputs": [], 858 | "source": [ 859 | "def arr_sum_2(arr1, arr2):\n", 860 | "    res = arr1.copy()\n", 861 | "    for i, x in enumerate(arr2):\n", 862 | "        res[i] += x\n", 863 | "    return res" 864 | ] 865 | }, 866 | { 867 | "cell_type": "code", 868 | "execution_count": 46, 869 | "metadata": {}, 870 | "outputs": [ 871 | { 872 | "name": "stdout", 873 | "output_type": "stream", 874 | "text": [ 875 | "799 µs ± 16.7 µs per loop (mean ± std. dev.
of 7 runs, 1000 loops each)\n" 876 | ] 877 | } 878 | ], 879 | "source": [ 880 | "%timeit arr_sum_2(arr1, arr2)" 881 | ] 882 | }, 883 | { 884 | "cell_type": "markdown", 885 | "metadata": {}, 886 | "source": [ 887 | "### 4.4 for x, y in zip instead of for range\n", 888 | "zip yields the paired elements directly, so no arr1[i] or arr2[i] lookups are needed. Average run time: 0.769 ms." 889 | ] 890 | }, 891 | { 892 | "cell_type": "code", 893 | "execution_count": 47, 894 | "metadata": {}, 895 | "outputs": [], 896 | "source": [ 897 | "def arr_sum_3(arr1, arr2):\n", 898 | "    res = []\n", 899 | "    for x, y in zip(arr1, arr2):\n", 900 | "        res.append(x + y)\n", 901 | "    return res" 902 | ] 903 | }, 904 | { 905 | "cell_type": "code", 906 | "execution_count": 48, 907 | "metadata": {}, 908 | "outputs": [ 909 | { 910 | "name": "stdout", 911 | "output_type": "stream", 912 | "text": [ 913 | "769 µs ± 12.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n" 914 | ] 915 | } 916 | ], 917 | "source": [ 918 | "%timeit arr_sum_3(arr1, arr2)" 919 | ] 920 | }, 921 | { 922 | "cell_type": "markdown", 923 | "metadata": {}, 924 | "source": [ 925 | "### 4.5 List comprehension over zip\n", 926 | "Average run time: 0.462 ms." 927 | ] 928 | }, 929 | { 930 | "cell_type": "code", 931 | "execution_count": 49, 932 | "metadata": {}, 933 | "outputs": [], 934 | "source": [ 935 | "def arr_sum_4(arr1, arr2):\n", 936 | "    return [x + y for x, y in zip(arr1, arr2)]" 937 | ] 938 | }, 939 | { 940 | "cell_type": "code", 941 | "execution_count": 50, 942 | "metadata": {}, 943 | "outputs": [ 944 | { 945 | "name": "stdout", 946 | "output_type": "stream", 947 | "text": [ 948 | "462 µs ± 3.43 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n" 949 | ] 950 | } 951 | ], 952 | "source": [ 953 | "%timeit arr_sum_4(arr1, arr2)" 954 | ] 955 | }, 956 | { 957 | "cell_type": "markdown", 958 | "metadata": {}, 959 | "source": [ 960 | "## 5.
Counting the common elements of two lists\n", 961 | "Given two lists, count the elements they have in common. The elements within each list are unique. The final version is about 5,000 times faster (338 ms vs. 67.2 µs)." 962 | ] 963 | }, 964 | { 965 | "cell_type": "markdown", 966 | "metadata": {}, 967 | "source": [ 968 | "First, create two lists of 2,000 integers each that share 1,000 values, and shuffle both." 969 | ] 970 | }, 971 | { 972 | "cell_type": "code", 973 | "execution_count": 51, 974 | "metadata": {}, 975 | "outputs": [], 976 | "source": [ 977 | "from random import shuffle\n", 978 | "arr1 = list(range(2000))\n", 979 | "shuffle(arr1)\n", 980 | "arr2 = list(range(1000, 3000))\n", 981 | "shuffle(arr2)" 982 | ] 983 | }, 984 | { 985 | "cell_type": "markdown", 986 | "metadata": {}, 987 | "source": [ 988 | "### 5.1 The conventional approach\n", 989 | "Use nested while loops and check whether arr1[i] equals arr2[j]. Average run time: 338 ms." 990 | ] 991 | }, 992 | { 993 | "cell_type": "code", 994 | "execution_count": 52, 995 | "metadata": {}, 996 | "outputs": [], 997 | "source": [ 998 | "def n_common_0(arr1, arr2):\n", 999 | "    res = 0\n", 1000 | "    i = 0\n", 1001 | "    m = len(arr1)\n", 1002 | "    n = len(arr2)\n", 1003 | "    while i < m:\n", 1004 | "        j = 0\n", 1005 | "        while j < n:\n", 1006 | "            if arr1[i] == arr2[j]:\n", 1007 | "                res += 1\n", 1008 | "            j += 1\n", 1009 | "        i += 1\n", 1010 | "    return res" 1011 | ] 1012 | }, 1013 | { 1014 | "cell_type": "code", 1015 | "execution_count": 53, 1016 | "metadata": {}, 1017 | "outputs": [ 1018 | { 1019 | "name": "stdout", 1020 | "output_type": "stream", 1021 | "text": [ 1022 | "338 ms ± 7.81 ms per loop (mean ± std. dev.
of 7 runs, 1 loop each)\n" 1023 | ] 1024 | } 1025 | ], 1026 | "source": [ 1027 | "%timeit n_common_0(arr1, arr2)" 1028 | ] 1029 | }, 1030 | { 1031 | "cell_type": "markdown", 1032 | "metadata": {}, 1033 | "source": [ 1034 | "### 5.2 for range instead of while loops\n", 1035 | "range advances both counters in C, which avoids the interpreted i < m and j < n comparisons and the i += 1 and j += 1 additions. Average run time: 233 ms." 1036 | ] 1037 | }, 1038 | { 1039 | "cell_type": "code", 1040 | "execution_count": 54, 1041 | "metadata": {}, 1042 | "outputs": [], 1043 | "source": [ 1044 | "def n_common_1(arr1, arr2):\n", 1045 | "    res = 0\n", 1046 | "    for i in range(len(arr1)):\n", 1047 | "        for j in range(len(arr2)):\n", 1048 | "            if arr1[i] == arr2[j]:\n", 1049 | "                res += 1\n", 1050 | "    return res" 1051 | ] 1052 | }, 1053 | { 1054 | "cell_type": "code", 1055 | "execution_count": 55, 1056 | "metadata": {}, 1057 | "outputs": [ 1058 | { 1059 | "name": "stdout", 1060 | "output_type": "stream", 1061 | "text": [ 1062 | "233 ms ± 10.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n" 1063 | ] 1064 | } 1065 | ], 1066 | "source": [ 1067 | "%timeit n_common_1(arr1, arr2)" 1068 | ] 1069 | }, 1070 | { 1071 | "cell_type": "markdown", 1072 | "metadata": {}, 1073 | "source": [ 1074 | "### 5.3 for x in arr instead of for range\n", 1075 | "Iterating over the lists directly avoids the arr1[i] and arr2[j] index lookups. Average run time: 84.8 ms." 1076 | ] 1077 | }, 1078 | { 1079 | "cell_type": "code", 1080 | "execution_count": 56, 1081 | "metadata": {}, 1082 | "outputs": [], 1083 | "source": [ 1084 | "def n_common_2(arr1, arr2):\n", 1085 | "    res = 0\n", 1086 | "    for x in arr1:\n", 1087 | "        for y in arr2:\n", 1088 | "            if x == y:\n", 1089 | "                res += 1\n", 1090 | "    return res" 1091 | ] 1092 | }, 1093 | { 1094 | "cell_type": "code", 1095 | "execution_count": 57, 1096 | "metadata": {}, 1097 | "outputs": [ 1098 | { 1099 | "name": "stdout", 1100 | "output_type": "stream", 1101 | "text": [ 1102 | "84.8 ms ± 1.38 ms per loop (mean ± std. dev.
of 7 runs, 10 loops each)\n" 1103 | ] 1104 | } 1105 | ], 1106 | "source": [ 1107 | "%timeit n_common_2(arr1, arr2)" 1108 | ] 1109 | }, 1110 | { 1111 | "cell_type": "markdown", 1112 | "metadata": {}, 1113 | "source": [ 1114 | "### 5.4 if x in arr2 instead of the inner loop\n", 1115 | "The membership test is still a linear scan of arr2, so the algorithm remains O(n^2), but the scan runs in C rather than in interpreted bytecode. Average run time: 24.9 ms." 1116 | ] 1117 | }, 1118 | { 1119 | "cell_type": "code", 1120 | "execution_count": 58, 1121 | "metadata": {}, 1122 | "outputs": [], 1123 | "source": [ 1124 | "def n_common_3(arr1, arr2):\n", 1125 | "    res = 0\n", 1126 | "    for x in arr1:\n", 1127 | "        if x in arr2:\n", 1128 | "            res += 1\n", 1129 | "    return res" 1130 | ] 1131 | }, 1132 | { 1133 | "cell_type": "code", 1134 | "execution_count": 59, 1135 | "metadata": {}, 1136 | "outputs": [ 1137 | { 1138 | "name": "stdout", 1139 | "output_type": "stream", 1140 | "text": [ 1141 | "24.9 ms ± 1.39 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)\n" 1142 | ] 1143 | } 1144 | ], 1145 | "source": [ 1146 | "%timeit n_common_3(arr1, arr2)" 1147 | ] 1148 | }, 1149 | { 1150 | "cell_type": "markdown", 1151 | "metadata": {}, 1152 | "source": [ 1153 | "### 5.5 A faster algorithm\n", 1154 | "Sort both lists with .sort(), then walk through them with two pointers in a single loop. This lowers the time complexity from O(n^2) to O(n log n). Average run time: 0.329 ms. Note that .sort() sorts the caller's lists in place, so after the first %timeit loop the inputs are already sorted and later sorts are much cheaper; this figure is therefore optimistic. Using sorted(arr1) and sorted(arr2) would leave the inputs untouched." 1155 | ] 1156 | }, 1157 | { 1158 | "cell_type": "code", 1159 | "execution_count": 60, 1160 | "metadata": {}, 1161 | "outputs": [], 1162 | "source": [ 1163 | "def n_common_4(arr1, arr2):\n", 1164 | "    arr1.sort()\n", 1165 | "    arr2.sort()\n", 1166 | "    res = i = j = 0\n", 1167 | "    m, n = len(arr1), len(arr2)\n", 1168 | "    while i < m and j < n:\n", 1169 | "        if arr1[i] == arr2[j]:\n", 1170 | "            res += 1\n", 1171 | "            i += 1\n", 1172 | "            j += 1\n", 1173 | "        elif arr1[i] > arr2[j]:\n", 1174 | "            j += 1\n", 1175 | "        else:\n", 1176 | "            i += 1\n", 1177 | "    return res" 1178 | ] 1179 | }, 1180 | { 1181 | "cell_type": "code", 1182 | "execution_count": 61, 1183 | "metadata": {}, 1184 | "outputs": [ 1185 | { 1186 | "name": "stdout", 1187 | "output_type": "stream", 1188 | "text": [ 1189 | "329 µs ± 12.3 µs per loop (mean ± std. dev.
of 7 runs, 1000 loops each)\n" 1190 | ] 1191 | } 1192 | ], 1193 | "source": [ 1194 | "%timeit n_common_4(arr1, arr2)" 1195 | ] 1196 | }, 1197 | { 1198 | "cell_type": "markdown", 1199 | "metadata": {}, 1200 | "source": [ 1201 | "### 5.6 A better data structure\n", 1202 | "Convert both lists to sets and take the length of their intersection. Building the sets takes linear time and each membership check is O(1) on average. Average run time: 0.067 ms." 1203 | ] 1204 | }, 1205 | { 1206 | "cell_type": "code", 1207 | "execution_count": 62, 1208 | "metadata": {}, 1209 | "outputs": [], 1210 | "source": [ 1211 | "def n_common_5(arr1, arr2):\n", 1212 | "    return len(set(arr1) & set(arr2))" 1213 | ] 1214 | }, 1215 | { 1216 | "cell_type": "code", 1217 | "execution_count": 63, 1218 | "metadata": {}, 1219 | "outputs": [ 1220 | { 1221 | "name": "stdout", 1222 | "output_type": "stream", 1223 | "text": [ 1224 | "67.2 µs ± 755 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)\n" 1225 | ] 1226 | } 1227 | ], 1228 | "source": [ 1229 | "%timeit n_common_5(arr1, arr2)" 1230 | ] 1231 | }, 1232 | { 1233 | "cell_type": "code", 1234 | "execution_count": null, 1235 | "metadata": {}, 1236 | "outputs": [], 1237 | "source": [] 1238 | } 1239 | ], 1240 | "metadata": { 1241 | "kernelspec": { 1242 | "display_name": "Python 3", 1243 | "language": "python", 1244 | "name": "python3" 1245 | }, 1246 | "language_info": { 1247 | "codemirror_mode": { 1248 | "name": "ipython", 1249 | "version": 3 1250 | }, 1251 | "file_extension": ".py", 1252 | "mimetype": "text/x-python", 1253 | "name": "python", 1254 | "nbconvert_exporter": "python", 1255 | "pygments_lexer": "ipython3", 1256 | "version": "3.6.6" 1257 | } 1258 | }, 1259 | "nbformat": 4, 1260 | "nbformat_minor": 2 1261 | } 1262 | --------------------------------------------------------------------------------