├── README.md
├── utils.pyx
├── setup.py
├── 10 minutes to cython.md
├── Python multi process.md
├── .gitignore
├── Python multi threads.md
├── More efficient pandas.md
├── 10 minutes to cython.ipynb
├── Python multi process.ipynb
├── Python Standard Library.md
├── Python coroutines.ipynb
├── More efficient array.md
├── Itertools for efficient looping.md
├── Python multi threads.ipynb
├── More efficient pandas.ipynb
├── Python Standard Library.ipynb
├── Built-in method.md
├── More efficient array.ipynb
├── Using C++ in Cython.md
├── Itertools for efficient looping.ipynb
├── Using C++ in Cython.ipynb
└── Built-in method.ipynb
/README.md:
--------------------------------------------------------------------------------
1 | # flying-python
2 | How to write fast and efficient Python code.
3 |
--------------------------------------------------------------------------------
/utils.pyx:
--------------------------------------------------------------------------------
1 | import cython
2 | import numpy as np
3 | cimport numpy as cnp
4 | ctypedef cnp.int_t DTYPE_t
5 |
6 |
7 | @cython.boundscheck(False)
8 | @cython.wraparound(False)
9 | cpdef cnp.ndarray[DTYPE_t] _transform(cnp.ndarray[DTYPE_t] arr):
10 |     cdef:
11 |         int i = 0
12 |         int n = arr.shape[0]
13 |         int x
14 |         cnp.ndarray[DTYPE_t] new_arr = np.empty_like(arr)
15 |
16 |     while i < n:
17 |         x = arr[i]
18 |         if x % 2:
19 |             new_arr[i] = x + 1
20 |         else:
21 |             new_arr[i] = x - 1
22 |         i += 1
23 |     return new_arr
24 |
--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
1 | """
2 | @Author: tushushu
3 | @Date: 2019-06-20 10:32:30
4 | """
5 |
6 | from distutils.core import setup
7 | from Cython.Build import cythonize
8 | import numpy
9 |
10 |
11 | def compile_file(file_name: str):
12 |     """Compile pyx file."""
13 |
14 |     ext_modules = cythonize(file_name)
15 |     name = file_name.split(".")[0] if "." in file_name else file_name
16 |     setup(name=name, ext_modules=ext_modules, include_dirs=[numpy.get_include()])
17 |
18 |
19 | if __name__ == "__main__":
20 |     compile_file("utils.pyx")
21 |
22 | # source activate py36
23 | # python setup.py build_ext --inplace
24 |
--------------------------------------------------------------------------------
/10 minutes to cython.md:
--------------------------------------------------------------------------------
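1 | # Cython in 10 Minutes
2 | Author: tushushu
3 | Project: https://github.com/tushushu/flying-python
4 |
5 | ## 1. What is Cython?
6 | Cython is a programming language that uses Python-like syntax to write C extensions callable from Python. It combines Python's fast development cycle with execution speed close to C, and it makes calling C libraries straightforward.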
7 |
8 | ## 2. How do I install Cython?
9 | Unlike most Python libraries, Cython needs a C compiler, and setting one up differs by platform.
10 | ### 2.1 Configure gcc
11 | - **Windows**
12 | Install the MinGW-w64 compiler: ``conda install libpython m2w64-toolchain -c msys2``
13 | In the Python installation directory, open the \Lib\distutils folder and create distutils.cfg with the following content:
14 | ``[build] compiler=mingw32``
15 |
16 | - **macOS**
17 | Install Xcode.
18 |
19 | - **Linux**
20 | gcc is usually preinstalled; if not, run: ``sudo apt-get install build-essential``
21 |
22 |
23 | ### 2.2 Install the Cython package
24 | - Without Anaconda: ``pip install cython``
25 | - With Anaconda: ``conda install cython``
26 |
27 | ## 3. Using Cython in Jupyter Notebook
28 | - First load the Cython extension with the magic command ``%load_ext Cython``
29 | - Then run Cython code with the cell magic ``%%cython``
30 |
31 |
32 | ```python
33 | %load_ext Cython
34 | ```
35 |
36 |
37 | ```cython
38 | %%cython
39 | # Sum the natural numbers from 1 to 100
40 | total = 0
41 | for i in range(1, 101):
42 |     total += i
43 | print(total)
44 | ```
45 |
46 | 5050
47 |
48 |
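49 | ## 4. How fast is Cython?
50 | - Python function: 261 ns per call
51 | - Cython function: 44.1 ns per call
52 |
53 | The run time drops to roughly one sixth of the original. The key is that the parameter x is declared with the static type int, which avoids the overhead of run-time type checking.
54 |
55 | ### 4.1 Python function
56 |
57 |
58 | ```python
59 | def f(x):
60 |     return x ** 2 - x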
61 | ```
62 |
63 |
64 | ```python
65 | %timeit f(100)
66 | ```
67 |
68 | 261 ns ± 8.78 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
69 |
70 |
71 | ### 4.2 Cython function
72 |
73 |
74 | ```cython
75 | %%cython
76 | def g(int x):
77 |     return x ** 2 - x
78 | ```
79 |
80 |
81 | ```python
82 | %timeit g(100)
83 | ```
84 |
85 | 44.1 ns ± 1.09 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
86 |
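Outside a notebook, the `%timeit` measurement of the pure-Python function can be approximated with the standard-library `timeit` module. This is a minimal sketch, not the author's benchmark: the loop counts are arbitrary, and the `lambda` wrapper adds a little call overhead, so absolute numbers will differ from the ones shown above.

```python
import timeit


def f(x):
    # Same pure-Python function as in section 4.1.
    return x ** 2 - x


# Time `number` calls per trial, run 5 trials, and keep the fastest one.
number = 100_000
best = min(timeit.repeat(lambda: f(100), number=number, repeat=5))

# Convert total seconds for the best trial into nanoseconds per call.
per_call_ns = best / number * 1e9
print("%.1f ns per call (best of 5)" % per_call_ns)
```

Taking the minimum over several trials reduces noise from other processes; `%timeit` reports a mean and standard deviation instead.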
87 |
88 | ## References
89 | Some content is adapted from the [official Cython documentation](http://docs.cython.org/en/latest/index.html)
90 |
91 |
95 |
--------------------------------------------------------------------------------
/Python multi process.md:
--------------------------------------------------------------------------------
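1 | # Python Multitasking (Multiprocessing)
2 | Author: tushushu
3 | Project: https://github.com/tushushu/flying-python
4 |
5 | ## Using multiple processes for CPU-bound tasks
6 | CPU-bound tasks perform heavy computation and consume CPU resources, for example computing pi or decoding high-definition video; they depend entirely on the CPU's processing power. While one thread runs a CPU-bound task the CPU stays busy, and the interpreter periodically forces that thread to release the GIL so other threads can run (in Python 3.2+ this happens after a switch interval, 5 ms by default, rather than after a fixed number of bytecodes). Once the cost of switching threads is added, the result can be slower than serial code. In Python Multitasking (Multithreading) we tried running CPU-bound tasks on multiple threads and saw no performance gain. Here we try multiple processes instead.
7 |
8 | ### 1. Create a process pool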
9 |
10 |
11 | ```python
12 | from concurrent.futures import ProcessPoolExecutor
13 | from time import sleep, time
14 | import os
15 | print("CPU核数为%s个!" % os.cpu_count())
16 | ```
17 |
18 | CPU核数为8个!
19 |
20 |
21 |
22 | ```python
23 | # Number of workers
24 | N = 8
25 | # Create the process pool
26 | pool = ProcessPoolExecutor(max_workers=N)
27 | ```
28 |
29 | ### 2. Define a CPU-bound function
30 | This function sums the integers in [1, x].
31 |
32 |
33 | ```python
34 | def cpu_bound_func(x):
35 |     tot = 0
36 |     a = 1
37 |     while a <= x:
38 |         tot += a  # add the counter, not x, so the result is 1 + 2 + ... + x
39 |         a += 1
40 |     print("Finish sum from 1 to %d!" % x)
41 |     return tot
42 | ```
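A quick way to confirm that a summing loop matches its description is to compare it with the closed form x(x + 1)/2. The sketch below is illustrative only; `sum_to` and `closed_form` are hypothetical names that do not appear in the original notebook.

```python
def sum_to(x):
    """Sum the integers in [1, x] with an explicit loop."""
    tot = 0
    a = 1
    while a <= x:
        tot += a  # accumulate the counter a on each step
        a += 1
    return tot


def closed_form(x):
    # Gauss's formula for 1 + 2 + ... + x.
    return x * (x + 1) // 2


for x in (1, 10, 1000):
    assert sum_to(x) == closed_form(x)
print(sum_to(1000))  # 500500
```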
43 |
44 | ### 3. Serial processing
45 | Iterate over every element of a list and call cpu_bound_func on each.
46 |
47 |
48 | ```python
49 | def process_array(arr):
50 |     for x in arr:
51 |         cpu_bound_func(x)
52 | ```
53 |
54 | ### 4. Multiprocess processing
55 | The process pool's map method applies the same function to every element of the list.
56 |
57 |
58 | ```python
59 | def fast_process_array(arr):
60 |     for x in pool.map(cpu_bound_func, arr):
61 |         pass
62 | ```
63 |
64 | ### 5. Measure the run time
65 | - Serial version: 5.7 seconds
66 | - Multiprocess version: 1.6 seconds
67 |
68 |
69 | ```python
70 | def time_it(fn, *args):
71 |     start = time()
72 |     fn(*args)
73 |     print("%s版本的运行时间为 %.5f 秒!" % (fn.__name__, time() - start))
74 | ```
75 |
76 |
77 | ```python
78 | time_it(process_array, [10**7 for _ in range(8)])
79 | ```
80 |
81 | Finish sum from 1 to 10000000!
82 | Finish sum from 1 to 10000000!
83 | Finish sum from 1 to 10000000!
84 | Finish sum from 1 to 10000000!
85 | Finish sum from 1 to 10000000!
86 | Finish sum from 1 to 10000000!
87 | Finish sum from 1 to 10000000!
88 | Finish sum from 1 to 10000000!
89 | process_array版本的运行时间为 5.74394 秒!
90 |
91 |
92 |
93 | ```python
94 | time_it(fast_process_array, [10**7 for _ in range(8)])
95 | ```
96 |
97 | fast_process_array版本的运行时间为 1.62266 秒!
98 |
99 |
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | # Byte-compiled / optimized / DLL files
2 | __pycache__/
3 | *.py[cod]
4 | *$py.class
5 |
6 | # C extensions
7 | *.so
8 |
9 | # Distribution / packaging
10 | .Python
11 | build/
12 | develop-eggs/
13 | dist/
14 | downloads/
15 | eggs/
16 | .eggs/
17 | lib/
18 | lib64/
19 | parts/
20 | sdist/
21 | var/
22 | wheels/
23 | pip-wheel-metadata/
24 | share/python-wheels/
25 | *.egg-info/
26 | .installed.cfg
27 | *.egg
28 | MANIFEST
29 |
30 | # PyInstaller
31 | # Usually these files are written by a python script from a template
32 | # before PyInstaller builds the exe, so as to inject date/other infos into it.
33 | *.manifest
34 | *.spec
35 |
36 | # Installer logs
37 | pip-log.txt
38 | pip-delete-this-directory.txt
39 |
40 | # Unit test / coverage reports
41 | htmlcov/
42 | .tox/
43 | .nox/
44 | .coverage
45 | .coverage.*
46 | .cache
47 | nosetests.xml
48 | coverage.xml
49 | *.cover
50 | .hypothesis/
51 | .pytest_cache/
52 |
53 | # Translations
54 | *.mo
55 | *.pot
56 |
57 | # Django stuff:
58 | *.log
59 | local_settings.py
60 | db.sqlite3
61 | db.sqlite3-journal
62 |
63 | # Flask stuff:
64 | instance/
65 | .webassets-cache
66 |
67 | # Scrapy stuff:
68 | .scrapy
69 |
70 | # Sphinx documentation
71 | docs/_build/
72 |
73 | # PyBuilder
74 | target/
75 |
76 | # Jupyter Notebook
77 | .ipynb_checkpoints
78 |
79 | # IPython
80 | profile_default/
81 | ipython_config.py
82 |
83 | # pyenv
84 | .python-version
85 |
86 | # pipenv
87 | # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
88 | # However, in case of collaboration, if having platform-specific dependencies or dependencies
89 | # having no cross-platform support, pipenv may install dependencies that don't work, or not
90 | # install all needed dependencies.
91 | #Pipfile.lock
92 |
93 | # celery beat schedule file
94 | celerybeat-schedule
95 |
96 | # SageMath parsed files
97 | *.sage.py
98 |
99 | # Environments
100 | .env
101 | .venv
102 | env/
103 | venv/
104 | ENV/
105 | env.bak/
106 | venv.bak/
107 |
108 | # Spyder project settings
109 | .spyderproject
110 | .spyproject
111 |
112 | # Rope project settings
113 | .ropeproject
114 |
115 | # mkdocs documentation
116 | /site
117 |
118 | # mypy
119 | .mypy_cache/
120 | .dmypy.json
121 | dmypy.json
122 |
123 | # Pyre type checker
124 | .pyre/
125 |
126 | # VS Code
127 | .vscode/
128 | .ipynb_checkpoints/
129 |
--------------------------------------------------------------------------------
/Python multi threads.md:
--------------------------------------------------------------------------------
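1 | # Python Multitasking (Multithreading)
2 | Author: tushushu
3 | Project: https://github.com/tushushu/flying-python
4 |
5 | ## 1. GIL
6 |
7 | Anyone familiar with Python knows that CPython, the interpreter written in C, has a global interpreter lock (GIL). Because of the GIL, the interpreter can run only one thread's code at any moment, which severely limits multithreaded performance. For historical reasons this lock is now almost impossible to remove.
8 |
9 | The GIL affects multithreaded performance because a thread can run only while it holds the global lock, and there is only one such lock. So even with multiple threads only one is running at any instant, and a multi-core machine delivers only single-core performance.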
10 |
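In CPython 3.2 and later, how long a running thread may keep the GIL before it is asked to yield is controlled by a switch interval, which can be inspected from Python. A minimal sketch, assuming a standard CPython interpreter:

```python
import sys

# Seconds a running thread may hold the GIL before CPython asks it to
# yield to another waiting thread (CPython 3.2+).
interval = sys.getswitchinterval()
print("switch interval: %.3f s" % interval)
```

The value can be changed with `sys.setswitchinterval`, although tuning it rarely helps CPU-bound threads because only one of them can execute Python bytecode at a time.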
11 |
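12 | ## 2. Multithreading for IO-bound tasks
13 | In an IO-bound task the CPU is much faster than the disk or memory it depends on, so most of the time the CPU is waiting for I/O (disk/memory) reads and writes and CPU load stays low. Any task involving network or disk IO is IO-bound. While a thread waits on IO the CPU is idle, so the GIL is released to other threads, which shortens the total waiting time.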
14 |
15 |
16 | ```python
17 | from concurrent.futures import ThreadPoolExecutor
18 | from time import sleep, time
19 | ```
20 |
21 |
22 | ```python
23 | # Number of workers
24 | N = 4
25 | # Create the thread pool
26 | pool = ThreadPoolExecutor(max_workers=N)
27 | ```
28 |
29 | ### 2.1 Define an IO-bound function
30 | This function "sleeps" for x seconds.
31 |
32 |
33 | ```python
34 | def io_bound_func(x):
35 |     sleep(x)
36 |     print("Sleep for %d seconds." % x)
37 | ```
38 |
39 | ### 2.2 Serial processing
40 | Iterate over every element of a list and call io_bound_func on each.
41 |
42 |
43 | ```python
44 | def process_array(arr):
45 |     for x in arr:
46 |         io_bound_func(x)
47 | ```
48 |
49 | ### 2.3 Multithreaded processing
50 | The thread pool's map method applies the same function to every element of the list.
51 |
52 |
53 | ```python
54 | def fast_process_array(arr):
55 |     for x in pool.map(io_bound_func, arr):
56 |         pass
57 | ```
58 |
59 | ### 2.4 Measure the run time
60 | - Serial version: 1 + 2 + 3 = 6 seconds
61 | - Multithreaded version: max(1, 2, 3) = 3 seconds
62 |
63 |
64 | ```python
65 | def time_it(fn, *args):
66 |     start = time()
67 |     fn(*args)
68 |     print("%s版本的运行时间为 %.5f 秒!" % (fn.__name__, time() - start))
69 | ```
70 |
71 |
72 | ```python
73 | time_it(process_array, [1, 2, 3])
74 | ```
75 |
76 | Sleep for 1 seconds.
77 | Sleep for 2 seconds.
78 | Sleep for 3 seconds.
79 | process_array版本的运行时间为 6.00883 秒!
80 |
81 |
82 |
83 | ```python
84 | time_it(fast_process_array, [1, 2, 3])
85 | ```
86 |
87 | Sleep for 1 seconds.
88 | Sleep for 2 seconds.
89 | Sleep for 3 seconds.
90 | fast_process_array版本的运行时间为 3.00300 秒!
91 |
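The same serial-versus-threaded comparison can be run as a standalone script. This is a sketch with shortened, illustrative delays; `nap` is a hypothetical stand-in for any blocking IO call and is not part of the original notebook.

```python
from concurrent.futures import ThreadPoolExecutor
from time import perf_counter, sleep


def nap(seconds):
    # Stand-in for an IO-bound call; sleeping releases the GIL.
    sleep(seconds)
    return seconds


delays = [0.1, 0.2, 0.3]

# Serial: total time is roughly the sum of the delays.
start = perf_counter()
serial = [nap(d) for d in delays]
serial_time = perf_counter() - start

# Threaded: total time is roughly the longest single delay.
start = perf_counter()
with ThreadPoolExecutor(max_workers=3) as pool:
    threaded = list(pool.map(nap, delays))
threaded_time = perf_counter() - start

print("serial:   %.2f s" % serial_time)
print("threaded: %.2f s" % threaded_time)
```

`pool.map` yields results in input order, so `threaded` matches `serial` even though the calls overlap in time.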
92 |
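93 | ## 3. Multithreading for CPU-bound tasks
94 | CPU-bound tasks perform heavy computation and consume CPU resources, for example computing pi or decoding high-definition video; they depend entirely on the CPU's processing power. While one thread runs a CPU-bound task the CPU stays busy, and the interpreter periodically forces that thread to release the GIL so other threads can run (in Python 3.2+ this happens after a switch interval, 5 ms by default, rather than after a fixed number of bytecodes). Once the cost of switching threads is added, the result can be slower than serial code.
95 |
96 | ### 3.1 Define a CPU-bound function
97 | This function sums the integers in [1, x].
98 |
99 |
100 | ```python
101 | def cpu_bound_func(x):
102 |     tot = 0
103 |     a = 1
104 |     while a <= x:
105 |         tot += a  # add the counter, not x, so the result is 1 + 2 + ... + x
106 |         a += 1
107 |     print("Finish sum from 1 to %d!" % x)
108 |     return tot
109 | ```
110 |
111 | ### 3.2 Serial processing
112 | Iterate over every element of a list and call cpu_bound_func on each.
113 |
114 |
115 | ```python
116 | def process_array(arr):
117 |     for x in arr:
118 |         cpu_bound_func(x)
119 | ```
120 |
121 | ### 3.3 Multithreaded processing
122 | The thread pool's map method applies the same function to every element of the list.
123 |
124 |
125 | ```python
126 | def fast_process_array(arr):
127 |     for x in pool.map(cpu_bound_func, arr):
128 |         pass
129 | ```
130 |
131 | ### 3.4 Measure the run time
132 | - Serial version: 2.1 seconds
133 | - Multithreaded version: 2.2 seconds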
134 |
135 |
136 | ```python
137 | def time_it(fn, *args):
138 |     start = time()
139 |     fn(*args)
140 |     print("%s版本的运行时间为 %.5f 秒!" % (fn.__name__, time() - start))
141 | ```
142 |
143 |
144 | ```python
145 | time_it(process_array, [10**7, 10**7, 10**7])
146 | ```
147 |
148 | Finish sum from 1 to 10000000!
149 | Finish sum from 1 to 10000000!
150 | Finish sum from 1 to 10000000!
151 | process_array版本的运行时间为 2.10489 秒!
152 |
153 |
154 |
155 | ```python
156 | time_it(fast_process_array, [10**7, 10**7, 10**7])
157 | ```
158 |
159 | Finish sum from 1 to 10000000!
160 | Finish sum from 1 to 10000000!
161 | Finish sum from 1 to 10000000!
162 | fast_process_array版本的运行时间为 2.20897 秒!
163 |
164 |
165 | ## References
166 | https://www.jianshu.com/p/c75ed8a6e9af
167 | https://www.cnblogs.com/tusheng/articles/10630662.html
168 |
169 |
173 |
--------------------------------------------------------------------------------
/More efficient pandas.md:
--------------------------------------------------------------------------------
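1 | # Making a Pandas DataFrame 40x Faster
2 |
3 | ## 1. A first attempt
4 | Pandas is a famous and powerful tool for data analysis. Sometimes we need to run very complex processing on tens or even hundreds of millions of rows, and then run-time efficiency cannot be ignored. Take the simple example below: we randomly generate 1 million rows and process the 'val' column, subtracting 1 from even values and adding 1 to odd values. Real analysis work is far more complex than this, but since we (mostly I) don't have that much time to wait for results, we cut a corner here. The transform function averages 284 ms per run.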
5 |
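To make the rule concrete, here is a tiny pure-Python sketch that applies it to a handful of values. `transform_value` is a hypothetical helper name and the sample values are illustrative.

```python
def transform_value(x):
    # Odd values (x % 2 is 1) get +1; even values get -1.
    return x + 1 if x % 2 else x - 1


sample = [54, 5, 0, 42, 91]
print([transform_value(x) for x in sample])  # [53, 6, -1, 41, 92]
```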
6 |
7 | ```python
8 | import pandas as pd
9 | import numpy as np
10 |
11 | def gen_data(size):
12 |     d = dict()
13 |     d["genre"] = np.random.choice(["A", "B", "C", "D"], size=size)
14 |     d["val"] = np.random.randint(low=0, high=100, size=size)
15 |     return pd.DataFrame(d)
16 |
17 | data = gen_data(1000000)
18 | data.head()
19 | ```
20 |
21 | |   | genre | val |
22 | |---|-------|-----|
23 | | 0 | C     | 54  |
24 | | 1 | A     | 5   |
25 | | 2 | D     | 0   |
26 | | 3 | D     | 42  |
27 | | 4 | C     | 91  |
28 |
79 | ```python
80 | def transform(data):
81 |     data.loc[:, "new_val"] = data.val.apply(lambda x: x + 1 if x % 2 else x - 1)
82 | ```
83 |
84 |
85 | ```python
86 | %timeit -n 1 transform(data)
87 | ```
88 |
89 | 284 ms ± 8.95 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
90 |
91 |
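92 | ## 2. Writing a C extension with Cython
93 | Let's rewrite the function `x + 1 if x % 2 else x - 1` with our old friend Cython. The average run time drops to 202 ms, so it is indeed faster. That is about a 1.4x improvement, still a long way from the 40x goal.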
94 |
95 |
96 | ```python
97 | %load_ext cython
98 | ```
99 |
100 |
101 | ```cython
102 | %%cython
103 | cpdef int _transform(int x):
104 |     if x % 2:
105 |         return x + 1
106 |     return x - 1
107 |
108 | def transform(data):
109 |     data.loc[:, "new_val"] = data.val.apply(_transform)
110 | ```
111 |
112 |
113 | ```python
114 | %timeit -n 1 transform(data)
115 | ```
116 |
117 | 202 ms ± 13.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
118 |
119 |
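120 | ## 3. Reducing type conversions
121 | To reduce conversions between C and Python types, we pass the 'val' column to the Cython function directly as a NumPy array; note the difference between cnp and np. The average run time drops to 10.8 ms, about 26x faster than the original, and the goal starts to look reachable.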
122 |
123 |
124 | ```cython
125 | %%cython
126 | import numpy as np
127 | cimport numpy as cnp
128 | ctypedef cnp.int_t DTYPE_t
129 |
130 | cpdef cnp.ndarray[DTYPE_t] _transform(cnp.ndarray[DTYPE_t] arr):
131 |     cdef:
132 |         int i = 0
133 |         int n = arr.shape[0]
134 |         int x
135 |         cnp.ndarray[DTYPE_t] new_arr = np.empty_like(arr)
136 |
137 |     while i < n:
138 |         x = arr[i]
139 |         if x % 2:
140 |             new_arr[i] = x + 1
141 |         else:
142 |             new_arr[i] = x - 1
143 |         i += 1
144 |     return new_arr
145 |
146 | def transform(data):
147 |     data.loc[:, "new_val"] = _transform(data.val.values)
148 | ```
149 |
150 |
151 | ```python
152 | %timeit -n 1 transform(data)
153 | ```
154 |
155 | 10.8 ms ± 512 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
156 |
157 |
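158 | ## 4. Using unchecked arrays
159 | The decorators @cython.boundscheck(False) and @cython.wraparound(False) turn off array bounds checking and negative-index handling. The average run time becomes 6.76 ms, about 42x faster than the original, so the goal is met.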
160 |
161 |
162 | ```cython
163 | %%cython
164 | import cython
165 | import numpy as np
166 | cimport numpy as cnp
167 | ctypedef cnp.int_t DTYPE_t
168 |
169 |
170 | @cython.boundscheck(False)
171 | @cython.wraparound(False)
172 | cpdef cnp.ndarray[DTYPE_t] _transform(cnp.ndarray[DTYPE_t] arr):
173 |     cdef:
174 |         int i = 0
175 |         int n = arr.shape[0]
176 |         int x
177 |         cnp.ndarray[DTYPE_t] new_arr = np.empty_like(arr)
178 |
179 |     while i < n:
180 |         x = arr[i]
181 |         if x % 2:
182 |             new_arr[i] = x + 1
183 |         else:
184 |             new_arr[i] = x - 1
185 |         i += 1
186 |     return new_arr
187 |
188 | def transform(data):
189 |     data.loc[:, "new_val"] = _transform(data.val.values)
190 | ```
191 |
192 |
193 | ```python
194 | %timeit -n 1 transform(data)
195 | ```
196 |
197 | 6.76 ms ± 545 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
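The speedup factors quoted in each section follow directly from the `%timeit` outputs. A small sketch of the arithmetic, using the timings reported above (the dictionary labels are descriptive names chosen here, not identifiers from the notebook):

```python
baseline_ms = 284.0  # Series.apply with a Python lambda (section 1)
timings_ms = {
    "Cython function via apply": 202.0,
    "typed NumPy array": 10.8,
    "bounds checks disabled": 6.76,
}

# Speedup = baseline time / optimized time.
speedups = {name: baseline_ms / ms for name, ms in timings_ms.items()}
for name, factor in speedups.items():
    print("%-26s %5.1fx" % (name, factor))
```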
198 |
199 |
--------------------------------------------------------------------------------
/10 minutes to cython.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Cython in 10 Minutes\n",
8 | "Author: tushushu \n",
9 | "Project: https://github.com/tushushu/flying-python"
10 | ]
11 | },
12 | {
13 | "cell_type": "markdown",
14 | "metadata": {},
15 | "source": [
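16 | "## 1. What is Cython? \n",
17 | "Cython is a programming language that uses Python-like syntax to write C extensions callable from Python. It combines Python's fast development cycle with execution speed close to C, and it makes calling C libraries straightforward."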
18 | ]
19 | },
20 | {
21 | "cell_type": "markdown",
22 | "metadata": {},
23 | "source": [
24 | "## 2. How do I install Cython?\n",
25 | "Unlike most Python libraries, Cython needs a C compiler, and setting one up differs by platform.\n",
26 | "### 2.1 Configure gcc\n",
27 | "- **Windows** \n",
28 | "Install the MinGW-w64 compiler: ``conda install libpython m2w64-toolchain -c msys2`` \n",
29 | "In the Python installation directory, open the \\Lib\\distutils folder and create distutils.cfg with the following content: \n",
30 | "``[build] compiler=mingw32``\n",
31 | "\n",
32 | "- **macOS** \n",
33 | "Install Xcode. \n",
34 | "\n",
35 | "- **Linux** \n",
36 | "gcc is usually preinstalled; if not, run: ``sudo apt-get install build-essential`` \n",
37 | "\n",
38 | "\n",
39 | "### 2.2 Install the Cython package\n",
40 | "- Without Anaconda: ``pip install cython`` \n",
41 | "- With Anaconda: ``conda install cython``"
42 | ]
43 | },
44 | {
45 | "cell_type": "markdown",
46 | "metadata": {},
47 | "source": [
48 | "## 3. Using Cython in Jupyter Notebook \n",
49 | "- First load the Cython extension with the magic command ``%load_ext Cython``\n",
50 | "- Then run Cython code with the cell magic ``%%cython``"
51 | ]
52 | },
53 | {
54 | "cell_type": "code",
55 | "execution_count": 1,
56 | "metadata": {},
57 | "outputs": [],
58 | "source": [
59 | "%load_ext Cython"
60 | ]
61 | },
62 | {
63 | "cell_type": "code",
64 | "execution_count": 2,
65 | "metadata": {},
66 | "outputs": [
67 | {
68 | "name": "stdout",
69 | "output_type": "stream",
70 | "text": [
71 | "5050\n"
72 | ]
73 | }
74 | ],
75 | "source": [
76 | "%%cython\n",
77 | "# Sum the natural numbers from 1 to 100\n",
78 | "total = 0\n",
79 | "for i in range(1, 101):\n",
80 | "    total += i\n",
81 | "print(total)"
82 | ]
83 | },
84 | {
85 | "cell_type": "markdown",
86 | "metadata": {},
87 | "source": [
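88 | "## 4. How fast is Cython?\n",
89 | "- Python function: 261 ns per call\n",
90 | "- Cython function: 44.1 ns per call \n",
91 | "\n",
92 | "The run time drops to roughly one sixth of the original. The key is that the parameter x is declared with the static type int, which avoids the overhead of run-time type checking."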
93 | ]
94 | },
95 | {
96 | "cell_type": "markdown",
97 | "metadata": {},
98 | "source": [
99 | "### 4.1 Python function"
100 | ]
101 | },
102 | {
103 | "cell_type": "code",
104 | "execution_count": 3,
105 | "metadata": {},
106 | "outputs": [],
107 | "source": [
108 | "def f(x):\n",
109 | "    return x ** 2 - x"
110 | ]
111 | },
112 | {
113 | "cell_type": "code",
114 | "execution_count": 4,
115 | "metadata": {},
116 | "outputs": [
117 | {
118 | "name": "stdout",
119 | "output_type": "stream",
120 | "text": [
121 | "261 ns ± 8.78 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)\n"
122 | ]
123 | }
124 | ],
125 | "source": [
126 | "%timeit f(100)"
127 | ]
128 | },
129 | {
130 | "cell_type": "markdown",
131 | "metadata": {},
132 | "source": [
133 | "### 4.2 Cython function"
134 | ]
135 | },
136 | {
137 | "cell_type": "code",
138 | "execution_count": 5,
139 | "metadata": {},
140 | "outputs": [],
141 | "source": [
142 | "%%cython\n",
143 | "def g(int x):\n",
144 | "    return x ** 2 - x"
145 | ]
146 | },
147 | {
148 | "cell_type": "code",
149 | "execution_count": 6,
150 | "metadata": {},
151 | "outputs": [
152 | {
153 | "name": "stdout",
154 | "output_type": "stream",
155 | "text": [
156 | "44.1 ns ± 1.09 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)\n"
157 | ]
158 | }
159 | ],
160 | "source": [
161 | "%timeit g(100)"
162 | ]
163 | },
164 | {
165 | "cell_type": "markdown",
166 | "metadata": {},
167 | "source": [
168 | "## References\n",
169 | "Some content is adapted from the [official Cython documentation](http://docs.cython.org/en/latest/index.html)"
170 | ]
171 | },
172 | {
173 | "cell_type": "code",
174 | "execution_count": null,
175 | "metadata": {},
176 | "outputs": [],
177 | "source": []
178 | }
179 | ],
180 | "metadata": {
181 | "kernelspec": {
182 | "display_name": "Python 3",
183 | "language": "python",
184 | "name": "python3"
185 | },
186 | "language_info": {
187 | "codemirror_mode": {
188 | "name": "ipython",
189 | "version": 3
190 | },
191 | "file_extension": ".py",
192 | "mimetype": "text/x-python",
193 | "name": "python",
194 | "nbconvert_exporter": "python",
195 | "pygments_lexer": "ipython3",
196 | "version": "3.6.6"
197 | }
198 | },
199 | "nbformat": 4,
200 | "nbformat_minor": 2
201 | }
202 |
--------------------------------------------------------------------------------
/Python multi process.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Python Multitasking (Multiprocessing)\n",
8 | "Author: tushushu \n",
9 | "Project: https://github.com/tushushu/flying-python"
10 | ]
11 | },
12 | {
13 | "cell_type": "markdown",
14 | "metadata": {},
15 | "source": [
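16 | "## Using multiple processes for CPU-bound tasks\n",
17 | "CPU-bound tasks perform heavy computation and consume CPU resources, for example computing pi or decoding high-definition video; they depend entirely on the CPU's processing power. While one thread runs a CPU-bound task the CPU stays busy, and the interpreter periodically forces that thread to release the GIL so other threads can run (in Python 3.2+ this happens after a switch interval, 5 ms by default, rather than after a fixed number of bytecodes). Once the cost of switching threads is added, the result can be slower than serial code. In Python Multitasking (Multithreading) we tried running CPU-bound tasks on multiple threads and saw no performance gain. Here we try multiple processes instead."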
18 | ]
19 | },
20 | {
21 | "cell_type": "markdown",
22 | "metadata": {},
23 | "source": [
24 | "### 1. Create a process pool"
25 | ]
26 | },
27 | {
28 | "cell_type": "code",
29 | "execution_count": 10,
30 | "metadata": {},
31 | "outputs": [
32 | {
33 | "name": "stdout",
34 | "output_type": "stream",
35 | "text": [
36 | "CPU核数为8个!\n"
37 | ]
38 | }
39 | ],
40 | "source": [
41 | "from concurrent.futures import ProcessPoolExecutor\n",
42 | "from time import sleep, time\n",
43 | "import os\n",
44 | "print(\"CPU核数为%s个!\" % os.cpu_count())"
45 | ]
46 | },
47 | {
48 | "cell_type": "code",
49 | "execution_count": 11,
50 | "metadata": {},
51 | "outputs": [],
52 | "source": [
53 | "# Number of workers\n",
54 | "N = 8\n",
55 | "# Create the process pool\n",
56 | "pool = ProcessPoolExecutor(max_workers=N)"
57 | ]
58 | },
59 | {
60 | "cell_type": "markdown",
61 | "metadata": {},
62 | "source": [
63 | "### 2. Define a CPU-bound function\n",
64 | "This function sums the integers in [1, x]."
65 | ]
66 | },
67 | {
68 | "cell_type": "code",
69 | "execution_count": 12,
70 | "metadata": {},
71 | "outputs": [],
72 | "source": [
73 | "def cpu_bound_func(x):\n",
74 | "    tot = 0\n",
75 | "    a = 1\n",
76 | "    while a <= x:\n",
77 | "        tot += a  # add the counter, not x\n",
78 | "        a += 1\n",
79 | "    print(\"Finish sum from 1 to %d!\" % x)\n",
80 | "    return tot"
81 | ]
82 | },
83 | {
84 | "cell_type": "markdown",
85 | "metadata": {},
86 | "source": [
87 | "### 3. Serial processing\n",
88 | "Iterate over every element of a list and call cpu_bound_func on each."
89 | ]
90 | },
91 | {
92 | "cell_type": "code",
93 | "execution_count": 13,
94 | "metadata": {},
95 | "outputs": [],
96 | "source": [
97 | "def process_array(arr):\n",
98 | "    for x in arr:\n",
99 | "        cpu_bound_func(x)"
100 | ]
101 | },
102 | {
103 | "cell_type": "markdown",
104 | "metadata": {},
105 | "source": [
106 | "### 4. Multiprocess processing\n",
107 | "The process pool's map method applies the same function to every element of the list."
108 | ]
109 | },
110 | {
111 | "cell_type": "code",
112 | "execution_count": 14,
113 | "metadata": {},
114 | "outputs": [],
115 | "source": [
116 | "def fast_process_array(arr):\n",
117 | "    for x in pool.map(cpu_bound_func, arr):\n",
118 | "        pass"
119 | ]
120 | },
121 | {
122 | "cell_type": "markdown",
123 | "metadata": {},
124 | "source": [
125 | "### 5. Measure the run time\n",
126 | "- Serial version: 5.7 seconds\n",
127 | "- Multiprocess version: 1.6 seconds"
128 | ]
129 | },
130 | {
131 | "cell_type": "code",
132 | "execution_count": 15,
133 | "metadata": {},
134 | "outputs": [],
135 | "source": [
136 | "def time_it(fn, *args):\n",
137 | "    start = time()\n",
138 | "    fn(*args)\n",
139 | "    print(\"%s版本的运行时间为 %.5f 秒!\" % (fn.__name__, time() - start))"
140 | ]
141 | },
142 | {
143 | "cell_type": "code",
144 | "execution_count": 20,
145 | "metadata": {},
146 | "outputs": [
147 | {
148 | "name": "stdout",
149 | "output_type": "stream",
150 | "text": [
151 | "Finish sum from 1 to 10000000!\n",
152 | "Finish sum from 1 to 10000000!\n",
153 | "Finish sum from 1 to 10000000!\n",
154 | "Finish sum from 1 to 10000000!\n",
155 | "Finish sum from 1 to 10000000!\n",
156 | "Finish sum from 1 to 10000000!\n",
157 | "Finish sum from 1 to 10000000!\n",
158 | "Finish sum from 1 to 10000000!\n",
159 | "process_array版本的运行时间为 5.74394 秒!\n"
160 | ]
161 | }
162 | ],
163 | "source": [
164 | "time_it(process_array, [10**7 for _ in range(8)])"
165 | ]
166 | },
167 | {
168 | "cell_type": "code",
169 | "execution_count": 21,
170 | "metadata": {},
171 | "outputs": [
172 | {
173 | "name": "stdout",
174 | "output_type": "stream",
175 | "text": [
176 | "fast_process_array版本的运行时间为 1.62266 秒!\n"
177 | ]
178 | }
179 | ],
180 | "source": [
181 | "time_it(fast_process_array, [10**7 for _ in range(8)])"
182 | ]
183 | }
184 | ],
185 | "metadata": {
186 | "kernelspec": {
187 | "display_name": "Python 3",
188 | "language": "python",
189 | "name": "python3"
190 | },
191 | "language_info": {
192 | "codemirror_mode": {
193 | "name": "ipython",
194 | "version": 3
195 | },
196 | "file_extension": ".py",
197 | "mimetype": "text/x-python",
198 | "name": "python",
199 | "nbconvert_exporter": "python",
200 | "pygments_lexer": "ipython3",
201 | "version": "3.6.6"
202 | }
203 | },
204 | "nbformat": 4,
205 | "nbformat_minor": 2
206 | }
207 |
--------------------------------------------------------------------------------
/Python Standard Library.md:
--------------------------------------------------------------------------------
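1 | # Writing Efficient Code with the Python Standard Library
2 | Author: tushushu
3 | Project: https://github.com/tushushu/flying-python
4 |
5 | ## 1. bisect - binary search
6 | Given a list, find a target element and return its index in the list.
7 | * The first idea is the list's index method. Build an ascending list of length 10000 and write a search function that looks up every element with index. Average run time: 437 ms.
8 | * Use bisect_left from the bisect module, which is the familiar binary search. The fast_search function averages 3.94 ms, about 110x faster!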
9 |
10 |
11 | ```python
12 | import bisect
13 | ```
14 |
15 |
16 | ```python
17 | def search(nums):
18 |     for x in nums:
19 |         nums.index(x)
20 | ```
21 |
22 |
23 | ```python
24 | def fast_search(nums):
25 |     for x in nums:
26 |         bisect.bisect_left(nums, x)
27 | ```
28 |
29 |
30 | ```python
31 | arr = list(range(10000))
32 | ```
33 |
34 |
35 | ```python
36 | %timeit -n 1 search(arr)
37 | ```
38 |
39 | 437 ms ± 12.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
40 |
41 |
42 |
43 | ```python
44 | %timeit -n 1 fast_search(arr)
45 | ```
46 |
47 | 3.94 ms ± 407 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
48 |
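`bisect_left` returns an insertion point, not a membership test, so a lookup that should behave like `list.index` needs one extra comparison. A minimal sketch, assuming the list is sorted and has no duplicates; `index_of` is a hypothetical helper name.

```python
import bisect

nums = list(range(0, 100, 5))  # sorted, no duplicates: 0, 5, 10, ..., 95


def index_of(sorted_nums, target):
    """Return the index of target in sorted_nums, or -1 if it is absent."""
    i = bisect.bisect_left(sorted_nums, target)
    # bisect_left gives the leftmost position where target could be inserted;
    # the element is present only if that slot already holds it.
    if i < len(sorted_nums) and sorted_nums[i] == target:
        return i
    return -1


print(index_of(nums, 35))  # 7
print(index_of(nums, 36))  # -1
```

Each lookup costs O(log n) comparisons, whereas `list.index` scans linearly, which is where the gap in the timings above comes from.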
49 |
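50 | ## 2. Counter - efficient counting
51 | Given a list, count how many times each distinct element occurs and return a dict.
52 | * Create a list of length 10000 whose elements are random integers between 1 and 3. Write a count function that creates an empty dict, loops over the list, and writes the counts into the dict. Average run time: 937 µs.
53 | * Use Counter from the collections module; fast_count is a one-liner. Average run time: 494 µs, almost 2x as fast.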
54 |
55 |
56 | ```python
57 | from collections import Counter
58 | from random import randint
59 | ```
60 |
61 |
62 | ```python
63 | def count(nums):
64 |     res = dict()
65 |     for x in nums:
66 |         if x in res:
67 |             res[x] += 1
68 |         else:
69 |             res[x] = 1  # first occurrence counts as 1
70 |     return res  # return the counts, not the last element
71 | ```
72 |
73 |
74 | ```python
75 | def fast_count(nums):
76 |     return Counter(nums)
77 | ```
78 |
79 |
80 | ```python
81 | nums = [randint(1, 3) for _ in range(10000)]
82 | ```
83 |
84 |
85 | ```python
86 | %timeit -n 1 count(nums)
87 | ```
88 |
89 | 937 µs ± 153 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
90 |
91 |
92 |
93 | ```python
94 | %timeit -n 1 fast_count(nums)
95 | ```
96 |
97 | 494 µs ± 240 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
98 |
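A hand-written counting loop and `Counter` should produce the same mapping. The sketch below checks that on a small illustrative list; `count_manual` is a hypothetical helper name.

```python
from collections import Counter


def count_manual(nums):
    # The first sighting of a value starts its count at 1; later ones add 1.
    res = {}
    for x in nums:
        if x in res:
            res[x] += 1
        else:
            res[x] = 1
    return res


nums = [1, 3, 2, 3, 3, 1]
print(count_manual(nums))                   # {1: 2, 3: 3, 2: 1}
print(Counter(nums) == count_manual(nums))  # True
```

`Counter` is a `dict` subclass, so it compares equal to a plain dict holding the same counts.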
99 |
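100 | ## 3. heapq - heaps
101 | Given a list, return its 3 smallest elements.
102 | * Create a list of length 10000 and shuffle it. Write a top_3 function that sorts the list and returns the first 3 elements. Average run time: 2.03 ms.
103 | * Use the heapq module, which implements the familiar heap, to write fast_top_3. Average run time: 296 µs, about 6.8x faster.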
104 |
105 |
106 | ```python
107 | import heapq
108 | from random import shuffle
109 | ```
110 |
111 |
112 | ```python
113 | def top_3(nums):
114 |     return sorted(nums)[:3]
115 | ```
116 |
117 |
118 | ```python
119 | def fast_top_3(nums):
120 |     return heapq.nsmallest(3, nums)
121 | ```
122 |
123 |
124 | ```python
125 | nums = list(range(10000))
126 | shuffle(nums)
127 | ```
128 |
129 |
130 | ```python
131 | %timeit -n 1 top_3(nums)
132 | ```
133 |
134 | 2.03 ms ± 236 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
135 |
136 |
137 |
138 | ```python
139 | %timeit -n 1 fast_top_3(nums)
140 | ```
141 |
142 | 296 µs ± 56.2 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
143 |
144 |
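145 | ## 4. itemgetter - fetching items in bulk
146 | Given a dict and a list containing one or more of its keys, return the corresponding values.
147 | * Create a dict with 100,000 items and randomly sample 10,000 of its keys (with replacement) into a list. Write a get_items function. Average run time: 1.12 ms.
148 | * Use itemgetter to read these items in one call and write fast_get_items. Average run time: 836 µs, about 1.3x as fast.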
149 |
150 |
151 |
152 | ```python
153 | from operator import itemgetter
154 | from random import choices
155 | ```
156 |
157 |
158 | ```python
159 | def get_items(data, keys):
160 |     return [data[x] for x in keys]
161 | ```
162 |
163 |
164 | ```python
165 | def fast_get_items(data, keys):
166 |     return itemgetter(*keys)(data)
167 | ```
168 |
169 |
170 | ```python
171 | data = dict(enumerate(range(100000)))
172 | keys = choices(list(data.keys()), k=10000)
173 | ```
174 |
175 |
176 | ```python
177 | %timeit -n 5 get_items(data, keys)
178 | ```
179 |
180 | 1.12 ms ± 354 µs per loop (mean ± std. dev. of 7 runs, 5 loops each)
181 |
182 |
183 |
184 | ```python
185 | %timeit -n 5 fast_get_items(data, keys)
186 | ```
187 |
188 | 836 µs ± 287 µs per loop (mean ± std. dev. of 7 runs, 5 loops each)
189 |
190 |
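191 | ## 5. lru_cache - trading space for time
192 | Given a number n, return the first n Fibonacci numbers.
193 | * Write a recursive fib function and call it in a loop from fib_seq. With n = 20 the average run time is 3.28 ms.
194 | * Use the @lru_cache decorator to cache results that have already been computed; for example, computing fib(5) can reuse the cached fib(4). Average run time: 144 µs, about 22x faster.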
195 |
196 |
197 | ```python
198 | from functools import lru_cache
199 | ```
200 |
201 |
202 | ```python
203 | def fib(n):
204 |     if n < 2:
205 |         return n
206 |     return fib(n-1) + fib(n-2)
207 |
208 | def fib_seq(n):
209 |     return [fib(x) for x in range(n)]
210 | ```
211 |
212 |
213 | ```python
214 | @lru_cache(maxsize=None)
215 | def fast_fib(n):
216 | if n < 2:
217 | return n
218 | return fib(n-1) + fib(n-2)
219 |
220 | def fast_fib_seq(n):
221 | return [fast_fib(x) for x in range(n)]
222 | ```
223 |
224 |
225 | ```python
226 | %timeit -n 5 fib_seq(20)
227 | ```
228 |
229 | 3.28 ms ± 220 µs per loop (mean ± std. dev. of 7 runs, 3 loops each)
230 |
231 |
232 |
233 | ```python
234 | %timeit -n 5 fast_fib_seq(20)
235 | ```
236 |
237 | The slowest run took 524.07 times longer than the fastest. This could mean that an intermediate result is being cached.
238 | 144 µs ± 347 µs per loop (mean ± std. dev. of 7 runs, 3 loops each)
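The effect of the cache can be observed with `cache_info()`, which functions wrapped by `lru_cache` expose. A minimal sketch; `cached_fib` is a hypothetical name used here so the example stays self-contained.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def cached_fib(n):
    if n < 2:
        return n
    # The recursive calls go through the cached wrapper as well.
    return cached_fib(n - 1) + cached_fib(n - 2)


print([cached_fib(x) for x in range(10)])  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

# misses = distinct arguments actually computed; hits = calls served from cache.
info = cached_fib.cache_info()
print(info.misses, info.hits)
```

Every distinct n is computed once, and each recursive call to an already-seen argument is answered from the cache. This also explains the `%timeit` warning above: the first run fills the cache and later runs only read from it.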
239 |
240 |
241 |
245 |
--------------------------------------------------------------------------------
/Python coroutines.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Python Multitasking (Coroutines)\n",
8 | "Author: tushushu \n",
9 | "Project: https://github.com/tushushu/flying-python"
10 | ]
11 | },
12 | {
13 | "cell_type": "markdown",
14 | "metadata": {},
15 | "source": [
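16 | "## 1. Coroutines\n",
17 | "A coroutine, also called a micro-thread, is a lightweight thread that lives in user space. Unlike threads and processes, coroutines do not need context switches in the OS kernel; the program itself decides when a coroutine gives up control. Coroutines are declared with the async/await syntax, which is the recommended way to write asynchronous applications.\n"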
18 | ]
19 | },
20 | {
21 | "cell_type": "code",
22 | "execution_count": 1,
23 | "metadata": {},
24 | "outputs": [],
25 | "source": [
26 | "import asyncio\n",
27 | "\n",
28 | "async def hello_world():\n",
29 | "    print('hello')\n",
30 | "    await asyncio.sleep(1)\n",
31 | "    print('world')\n",
32 | "\n",
33 | "# asyncio.run(hello_world())"
34 | ]
35 | },
36 | {
37 | "cell_type": "markdown",
38 | "metadata": {},
39 | "source": [
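40 | "## 2. Coroutines for IO-bound tasks\n",
41 | "An IO-bound task is one where the CPU is much faster than the disk or memory it depends on, so most of the time the CPU is waiting for I/O (disk/memory) reads and writes and CPU load stays low. Any task that involves network or disk I/O is IO-bound."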
42 | ]
43 | },
44 | {
45 | "cell_type": "code",
46 | "execution_count": 1,
47 | "metadata": {},
48 | "outputs": [],
49 | "source": [
50 | "import requests\n",
51 | "import time"
52 | ]
53 | },
54 | {
55 | "cell_type": "markdown",
56 | "metadata": {},
57 | "source": [
58 | "### 2.1 Define an IO-bound function"
59 | ]
60 | },
61 | {
62 | "cell_type": "code",
63 | "execution_count": 2,
64 | "metadata": {},
65 | "outputs": [],
66 | "source": [
67 | "def _sleep(n):\n",
68 | "    time.sleep(n)\n",
69 | "    print(\"Sleep for %d seconds.\" % n)"
70 | ]
71 | },
72 | {
73 | "cell_type": "markdown",
74 | "metadata": {},
75 | "source": [
76 | "### 2.2 Process serially"
77 | ]
78 | },
79 | {
80 | "cell_type": "code",
81 | "execution_count": 3,
82 | "metadata": {},
83 | "outputs": [],
84 | "source": [
85 | "def sleep(m, n):\n",
86 | "    for i in range(m):\n",
87 | "        _sleep(n)"
88 | ]
89 | },
90 | {
91 | "cell_type": "markdown",
92 | "metadata": {},
93 | "source": [
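94 | "### 2.3 Process with coroutines\n",
95 | "At the time of writing, Jupyter Notebook cannot run this coroutine code directly because of some unfixed bugs, so the code is written to tmp.py and run in a subprocess."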
96 | ]
97 | },
98 | {
99 | "cell_type": "code",
100 | "execution_count": 4,
101 | "metadata": {},
102 | "outputs": [],
103 | "source": [
104 | "f = open(\"tmp.py\", \"w\")\n",
105 | "f.write(\n",
106 | "\"\"\"\n",
107 | "import time\n",
108 | "import asyncio\n",
109 | "\n",
110 | "\n",
111 | "async def _sleep(n):\n",
112 | "    await asyncio.sleep(n)\n",
113 | "    print(\"Sleep for %d seconds.\" % n)\n",
114 | "\n",
115 | "\n",
116 | "def sleep(m, n):\n",
117 | "    loop = asyncio.get_event_loop()\n",
118 | "    loop.run_until_complete(asyncio.gather(*[_sleep(n) for _ in range(m)]))\n",
119 | "    loop.close()\n",
120 | "\n",
121 | "\n",
122 | "if __name__ == '__main__':\n",
123 | "    start = time.perf_counter()\n",
124 | "    m = 3\n",
125 | "    n = 1\n",
126 | "    sleep(m, n)\n",
127 | "    print(\"%s函数的运行时间为 %.5f 秒!\" % (sleep.__name__, time.perf_counter() - start))\"\"\"\n",
128 | ")\n",
129 | "f.close()"
130 | ]
131 | },
132 | {
133 | "cell_type": "markdown",
134 | "metadata": {},
135 | "source": [
136 | "### 2.4 Measure the run time\n",
137 | "- Serial version: 1 + 1 + 1 = 3 seconds \n",
138 | "- Coroutine version: max(1, 1, 1) = 1 second"
139 | ]
140 | },
141 | {
142 | "cell_type": "code",
143 | "execution_count": 5,
144 | "metadata": {},
145 | "outputs": [],
146 | "source": [
147 | "def time_it(fn, *args):\n",
148 | "    start = time.perf_counter()\n",
149 | "    fn(*args)\n",
150 | "    print(\"%s函数的运行时间为 %.5f 秒!\" % (fn.__name__, time.perf_counter() - start))"
151 | ]
152 | },
153 | {
154 | "cell_type": "code",
155 | "execution_count": 6,
156 | "metadata": {},
157 | "outputs": [
158 | {
159 | "name": "stdout",
160 | "output_type": "stream",
161 | "text": [
162 | "Sleep for 1 seconds.\n",
163 | "Sleep for 1 seconds.\n",
164 | "Sleep for 1 seconds.\n",
165 | "sleep函数的运行时间为 3.01054 秒!\n"
166 | ]
167 | }
168 | ],
169 | "source": [
170 | "time_it(sleep, 3, 1)"
171 | ]
172 | },
173 | {
174 | "cell_type": "code",
175 | "execution_count": 7,
176 | "metadata": {},
177 | "outputs": [
178 | {
179 | "name": "stdout",
180 | "output_type": "stream",
181 | "text": [
182 | "Sleep for 1 seconds.\n",
183 | "Sleep for 1 seconds.\n",
184 | "Sleep for 1 seconds.\n",
185 | "sleep函数的运行时间为 1.00305 秒!\n",
186 | "\n"
187 | ]
188 | }
189 | ],
190 | "source": [
191 | "import subprocess\n",
192 | "print(str(subprocess.check_output(\"python tmp.py\", shell=True), encoding = \"utf-8\"))"
193 | ]
194 | },
195 | {
196 | "cell_type": "markdown",
197 | "metadata": {},
198 | "source": [
199 | "## References\n",
200 | "https://docs.python.org/zh-cn/3.7/library/asyncio-task.html"
201 | ]
202 | },
203 | {
204 | "cell_type": "code",
205 | "execution_count": null,
206 | "metadata": {},
207 | "outputs": [],
208 | "source": []
209 | }
210 | ],
211 | "metadata": {
212 | "kernelspec": {
213 | "display_name": "Python 3",
214 | "language": "python",
215 | "name": "python3"
216 | },
217 | "language_info": {
218 | "codemirror_mode": {
219 | "name": "ipython",
220 | "version": 3
221 | },
222 | "file_extension": ".py",
223 | "mimetype": "text/x-python",
224 | "name": "python",
225 | "nbconvert_exporter": "python",
226 | "pygments_lexer": "ipython3",
227 | "version": "3.6.10"
228 | }
229 | },
230 | "nbformat": 4,
231 | "nbformat_minor": 2
232 | }
233 |
--------------------------------------------------------------------------------
/More efficient array.md:
--------------------------------------------------------------------------------
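1 | # 4 ways to make Python arrays more efficient
2 | 
3 | ## 1. Why Python lists are slow
4 | A Python list is a dynamic array, meaning its size can change, and the array stores pointers (PyObject*) to the list's elements. The elements can have different types, for example my_list = ['a', 1, True]; the array actually holds three pointers, one to each of these three objects. So why is a Python list slower than an array in other languages? There are two main reasons:
5 | 1. Python is a dynamically typed language, so type checks cost extra time.
6 | 2. Python, or more precisely CPython, has no JIT optimizer.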
7 |
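The difference between a list of object references and a typed buffer can be seen with the standard library alone. This is a small sketch, assuming the `array` module as a stand-in for a C-style array (the text above does not use it):

```python
from array import array

# A list stores references, so one list can mix element types.
my_list = ['a', 1, True]
type_names = [type(x).__name__ for x in my_list]
print(type_names)  # ['str', 'int', 'bool']

# A typed array stores raw values of a single C type instead.
typed = array('q', [1, 2, 3])  # 'q' = signed long long
try:
    typed.append('a')  # rejected: the buffer only holds integers
    accepted = True
except TypeError:
    accepted = False
print(accepted)  # False
```

Because every element of the typed array has the same C type, code that walks it needs no per-element type check, which is the property the tools below rely on.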
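8 | ## 2. How to do fast array computation in Python
9 | The most common solutions today are:
10 | 1. Numpy - a Numpy array is closer to a C/C++ array: all elements share one data type, and the array methods (such as sum) are implemented in C.
11 | 2. Numba - uses JIT compilation to speed up Numpy code. Both calls to Numpy methods and for loops over Numpy arrays can get faster.
12 | 3. Numexpr - avoids allocating memory for Numpy's intermediate results; it is mainly used to evaluate expressions on large arrays.
13 | 4. Cython - write C/C++ extensions for Python.
14 | 
15 | The next two examples show how to use each of these four tools.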
16 |
17 | ## 3. Sum of squares of an array
18 |
19 |
20 | ```python
21 | arr = [x for x in range(10000)]
22 | ```
23 |
24 | ### 3.1 for loop
25 |
26 |
27 | ```python
28 | def sqr_sum(arr):
29 |     total = 0
30 |     for x in arr:
31 |         total += x ** 2
32 |     return total
33 |
34 | print("The result is:", sqr_sum(arr))
35 | %timeit sqr_sum(arr)
36 | ```
37 |
38 | The result is: 333283335000
39 | 2.53 ms ± 91.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
40 |
41 |
42 | ### 3.2 Numpy
43 |
44 |
45 | ```python
46 | import numpy as np
47 | ```
48 |
49 |
50 | ```python
51 | def sqr_sum(arr):
52 |     return (arr ** 2).sum()
53 |
54 | arr = np.array(arr)
55 | print("The result is:", sqr_sum(arr))
56 | %timeit sqr_sum(arr)
57 | ```
58 |
59 | The result is: 333283335000
60 | 9.66 µs ± 275 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
61 |
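As a sanity check on the result printed above, the sum of squares of 0..n-1 has the closed form (n-1)n(2n-1)/6. The sketch below compares it with a pure-Python loop, the vectorised expression, and `np.dot`, which computes the same sum without building the temporary `arr ** 2` array. The explicit `int64` dtype is an assumption added here to avoid overflow on platforms where the default integer is 32-bit; no timings are claimed.

```python
import numpy as np

n = 10000
arr = np.arange(n, dtype=np.int64)  # explicit 64-bit integers

loop_total = sum(x ** 2 for x in range(n))    # pure-Python reference
numpy_total = int((arr ** 2).sum())           # allocates arr ** 2 first
dot_total = int(np.dot(arr, arr))             # same sum, no temporary array
closed_form = (n - 1) * n * (2 * n - 1) // 6  # formula for 0..n-1

print(loop_total, numpy_total, dot_total, closed_form)
```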
62 |
63 | ### 3.3 Numba
64 |
65 |
66 | ```python
67 | from numba import jit
68 | ```
69 |
70 |
71 | ```python
72 | @jit(nopython=True)
73 | def sqr_sum(arr):
74 |     return (arr ** 2).sum()
75 |
76 | arr = np.array(arr)
77 | print("The result is:", sqr_sum(arr))
78 | %timeit sqr_sum(arr)
79 | ```
80 |
81 | The result is: 333283335000
82 | 3.39 µs ± 57.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
83 |
84 |
85 | ### 3.4 Numexpr
86 |
87 |
88 | ```python
89 | import numexpr as ne
90 | ```
91 |
92 |
93 | ```python
94 | def sqr_sum(arr):
95 |     return ne.evaluate("sum(arr * arr)")
96 |
97 | arr = np.array(arr)
98 | print("The result is:", sqr_sum(arr))
99 | %timeit sqr_sum(arr)
100 | ```
101 |
102 | The result is: 333283335000
103 | 14.9 µs ± 144 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
104 |
105 |
106 | ### 3.5 Cython
107 |
108 |
109 | ```python
110 | %load_ext cython
111 | ```
112 |
113 |
114 | ```cython
115 | %%cython
116 | cimport numpy as np
117 | ctypedef np.int_t DTYPE_t
118 |
119 | def sqr_sum(np.ndarray[DTYPE_t] arr):
120 |     cdef:
121 |         DTYPE_t total = 0
122 |         DTYPE_t x
123 |         int i = 0
124 |         int n = len(arr)
125 |     while i < n:
126 |         total += arr[i] ** 2
127 |         i += 1
128 |     return total
129 | ```
130 |
131 |
132 | ```python
133 | arr = np.array(arr, dtype="int")
134 | print("The result is:", sqr_sum(arr))
135 | %timeit sqr_sum(arr)
136 | ```
137 |
138 | The result is: 333283335000
139 | 5.51 µs ± 62.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
140 |
141 |
142 | ## 4. Array transformation
143 |
144 |
145 | ```python
146 | arr = [x for x in range(1000000)]
147 | ```
148 |
149 | ### 4.1 for loop
150 |
151 |
152 | ```python
153 | def transform(arr):
154 |     return [x * 2 + 1 for x in arr]
155 |
156 | print("The result is:", transform(arr)[:5], "...")
157 | %timeit transform(arr)
158 | ```
159 |
160 | The result is: [1, 3, 5, 7, 9] ...
161 | 84.5 ms ± 381 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
162 |
163 |
164 | ### 4.2 Numpy
165 |
166 |
167 | ```python
168 | import numpy as np
169 | ```
170 |
171 |
172 | ```python
173 | def transform(arr):
174 |     return arr * 2 + 1
175 |
176 | arr = np.array(arr)
177 | print("The result is:", transform(arr)[:5], "...")
178 | %timeit transform(arr)
179 | ```
180 |
181 | The result is: [1 3 5 7 9] ...
182 | 803 µs ± 11.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
183 |
184 |
185 | ### 4.3 Numba
186 |
187 |
188 | ```python
189 | from numba import jit
190 | ```
191 |
192 |
193 | ```python
194 | @jit(nopython=True)
195 | def transform(arr):
196 |     return arr * 2 + 1
197 |
198 | arr = np.array(arr)
199 | print("The result is:", transform(arr)[:5], "...")
200 | %timeit transform(arr)
201 | ```
202 |
203 | The result is: [1 3 5 7 9] ...
204 | 498 µs ± 8.71 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
205 |
206 |
207 | ### 4.4 Numexpr
208 |
209 |
210 | ```python
211 | import numexpr as ne
212 | ```
213 |
214 |
215 | ```python
216 | def transform(arr):
217 |     return ne.evaluate("arr * 2 + 1")
218 |
219 | arr = np.array(arr)
220 | print("The result is:", transform(arr)[:5], "...")
221 | %timeit transform(arr)
222 | ```
223 |
224 | The result is: [1 3 5 7 9] ...
225 | 369 µs ± 13.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
226 |
227 |
228 | ### 4.5 Cython
229 |
230 |
231 | ```python
232 | %load_ext cython
233 | ```
234 |
235 | The cython extension is already loaded. To reload it, use:
236 | %reload_ext cython
237 |
238 |
239 |
240 | ```cython
241 | %%cython
242 | import numpy as np
243 | cimport numpy as np
244 | ctypedef np.int_t DTYPE_t
245 |
246 | def transform(np.ndarray[DTYPE_t] arr):
247 |     cdef:
248 |         np.ndarray[DTYPE_t] new_arr = np.empty_like(arr)
249 |         int i = 0
250 |         int n = len(arr)
251 |     while i < n:
252 |         new_arr[i] = arr[i] * 2 + 1
253 |         i += 1
254 |     return new_arr
255 | ```
256 |
257 |
258 | ```python
259 | arr = np.array(arr)
260 | print("The result is:", transform(arr)[:5], "...")
261 | %timeit transform(arr)
262 | ```
263 |
264 | The result is: [1 3 5 7 9] ...
265 | 887 µs ± 29.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
266 |
267 |
268 | ## 5. References
269 | [How does python have different data types in an array?](https://stackoverflow.com/questions/10558670/how-does-python-have-different-data-types-in-an-array)
270 | [Why are Python Programs often slower than the Equivalent Program Written in C or C++?](https://stackoverflow.com/questions/3033329/why-are-python-programs-often-slower-than-the-equivalent-program-written-in-c-or)
271 | [How Fast Numpy Really is and Why?](https://towardsdatascience.com/how-fast-numpy-really-is-e9111df44347)
272 |
273 |
277 |
--------------------------------------------------------------------------------
/Itertools for efficient looping.md:
--------------------------------------------------------------------------------
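1 | # Python Itertools - Efficient Looping
2 | Author: tushushu  
3 | Project: https://github.com/tushushu/flying-python
4 | 
5 | The official Python documentation describes the itertools module as tools for "efficient looping". Some of these tools do improve performance, while others are no faster and only save development time; overusing them can also make code harder to read. Let's put the itertools family through its paces.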
6 |
7 |
8 | ## 1. Cumulative sum
9 | Given a list An, return its running sums Sn.  
10 | Example:  
11 | * Input: [1, 2, 3, 4, 5]
12 | * Output: [1, 3, 6, 10, 15]
13 | 
14 | Using accumulate makes this about 2.9x faster (61 µs vs. 21.3 µs).
15 |
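A minimal sketch of `accumulate` on the example input. The second argument (a two-argument function) is part of the standard `itertools.accumulate` signature; the running product and running maximum are extra illustrations, not cases from the benchmark.

```python
from itertools import accumulate
import operator

nums = [1, 2, 3, 4, 5]

running_sum = list(accumulate(nums))                    # default: addition
running_product = list(accumulate(nums, operator.mul))  # any binary function works
running_max = list(accumulate([3, 1, 4, 1, 5], max))

print(running_sum)      # [1, 3, 6, 10, 15]
print(running_product)  # [1, 2, 6, 24, 120]
print(running_max)      # [3, 3, 4, 4, 5]
```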
16 |
17 | ```python
18 | from itertools import accumulate
19 | ```
20 |
21 |
22 | ```python
23 | def _accumulate_list(arr):
24 |     tot = 0
25 |     for x in arr:
26 |         tot += x
27 |         yield tot
28 | 
29 | def accumulate_list(arr):
30 |     return list(_accumulate_list(arr))
31 | ```
32 | 
33 | 
34 | ```python
35 | def fast_accumulate_list(arr):
36 |     return list(accumulate(arr))
37 | ```
38 |
39 |
40 | ```python
41 | arr = list(range(1000))
42 | ```
43 |
44 |
45 | ```python
46 | %timeit accumulate_list(arr)
47 | ```
48 |
49 | 61 µs ± 2.91 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
50 |
51 |
52 |
53 | ```python
54 | %timeit fast_accumulate_list(arr)
55 | ```
56 |
57 | 21.3 µs ± 811 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
58 |
59 |
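60 | ## 2. Selecting data
61 | Given a list data and a list selectors of 0/1 flags, return the selected items.  
62 | Example:  
63 | * Input: [1, 2, 3, 4, 5], [0, 1, 0, 1, 0]
64 | * Output: [2, 4]
65 | 
66 | Using compress makes this about 2.6x faster (341 µs vs. 130 µs).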
67 |
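A short sketch of `compress` on the example input. The selectors only need to be truthy or falsy, so a mask built from a predicate works too; the `mask` and `short` cases are illustrative additions.

```python
from itertools import compress

data = [1, 2, 3, 4, 5]
selectors = [0, 1, 0, 1, 0]

picked = list(compress(data, selectors))

# Any truthy/falsy values can act as selectors, e.g. a boolean mask.
mask = [x > 2 for x in data]
large = list(compress(data, mask))

# compress stops as soon as the shorter input is exhausted.
short = list(compress(data, [1, 1]))

print(picked)  # [2, 4]
print(large)   # [3, 4, 5]
print(short)   # [1, 2]
```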
68 |
69 | ```python
70 | from itertools import compress
71 | from random import randint
72 | ```
73 |
74 |
75 | ```python
76 | def select_data(data, selectors):
77 |     return [x for x, y in zip(data, selectors) if y]
78 | ```
79 | 
80 | 
81 | ```python
82 | def fast_select_data(data, selectors):
83 |     return list(compress(data, selectors))
84 | ```
85 |
86 |
87 | ```python
88 | data = list(range(10000))
89 | selectors = [randint(0, 1) for _ in range(10000)]
90 | ```
91 |
92 |
93 | ```python
94 | %timeit select_data(data, selectors)
95 | ```
96 |
97 | 341 µs ± 17.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
98 |
99 |
100 |
101 | ```python
102 | %timeit fast_select_data(data, selectors)
103 | ```
104 |
105 | 130 µs ± 3.19 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
106 |
107 |
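108 | ## 3. Permutations
109 | Given a list arr and a number k, return every ordered selection of k elements from arr.  
110 | Example:  
111 | * Input: [1, 2, 3], 2
112 | * Output: [(1, 2), (1, 3), (2, 1), (2, 3), (3, 1), (3, 2)]
113 | 
114 | Using permutations makes this about 10x faster (15.5 ms vs. 1.56 ms).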
115 |
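Since order matters here, the right tool is `permutations`, not `combinations`. The sketch below contrasts the two on the example input and counts the results for the benchmark size (10 elements, k = 5), which is 10 * 9 * 8 * 7 * 6.

```python
from itertools import combinations, permutations

arr = [1, 2, 3]

ordered = list(permutations(arr, 2))    # (1, 2) and (2, 1) are both produced
unordered = list(combinations(arr, 2))  # each pair appears once

# Number of results for the benchmark input: P(10, 5).
count = sum(1 for _ in permutations(range(10), 5))

print(ordered)    # [(1, 2), (1, 3), (2, 1), (2, 3), (3, 1), (3, 2)]
print(unordered)  # [(1, 2), (1, 3), (2, 3)]
print(count)      # 30240
```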
116 |
117 | ```python
118 | from itertools import permutations
119 | ```
120 |
121 |
122 | ```python
123 | def _get_permutations(arr, k, i):
124 |     if i == k:
125 |         return [arr[:k]]
126 |     res = []
127 |     for j in range(i, len(arr)):
128 |         arr_cpy = arr.copy()
129 |         arr_cpy[i], arr_cpy[j] = arr_cpy[j], arr_cpy[i]
130 |         res += _get_permutations(arr_cpy, k, i + 1)
131 |     return res
132 | 
133 | def get_permutations(arr, k):
134 |     return _get_permutations(arr, k, 0)
135 | ```
136 | 
137 | 
138 | ```python
139 | def fast_get_permutations(arr, k):
140 |     return list(permutations(arr, k))
141 | ```
142 |
143 |
144 | ```python
145 | arr = list(range(10))
146 | k = 5
147 | ```
148 |
149 |
150 | ```python
151 | %timeit -n 1 get_permutations(arr, k)
152 | ```
153 |
154 | 15.5 ms ± 1.96 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
155 |
156 |
157 |
158 | ```python
159 | %timeit -n 1 fast_get_permutations(arr, k)
160 | ```
161 |
162 | 1.56 ms ± 284 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
163 |
164 |
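165 | ## 4. Filtering data
166 | Given a list arr, keep only the even numbers.  
167 | Example:  
168 | * Input: [3, 1, 4, 5, 9, 2]
169 | * Output: [4, 2]
170 | 
171 | Using filterfalse is actually slower here (823 µs vs. 417 µs for the list comprehension), so don't trust itertools blindly.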
172 |
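A small sketch of what `filterfalse` does on the example input: it keeps the items for which the predicate is falsy, the mirror image of the built-in `filter`. A plausible reason for the slowdown measured below is the Python-level lambda call per element, which the list comprehension avoids; that explanation is an assumption, not something the benchmark isolates.

```python
from itertools import filterfalse

arr = [3, 1, 4, 5, 9, 2]

evens = list(filterfalse(lambda x: x % 2, arr))  # keep items where x % 2 is falsy
odds = list(filter(lambda x: x % 2, arr))        # built-in filter keeps the rest
comprehension = [x for x in arr if x % 2 == 0]   # same result, no function call

print(evens)          # [4, 2]
print(odds)           # [3, 1, 5, 9]
print(comprehension)  # [4, 2]
```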
173 |
174 | ```python
175 | from itertools import filterfalse
176 | ```
177 |
178 |
179 | ```python
180 | def get_even_nums(arr):
181 |     return [x for x in arr if x % 2 == 0]
182 | ```
183 | 
184 | 
185 | ```python
186 | def fast_get_even_nums(arr):
187 |     return list(filterfalse(lambda x: x % 2, arr))
188 | ```
189 |
190 |
191 | ```python
192 | arr = list(range(10000))
193 | ```
194 |
195 |
196 | ```python
197 | %timeit get_even_nums(arr)
198 | ```
199 |
200 | 417 µs ± 18.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
201 |
202 |
203 |
204 | ```python
205 | %timeit fast_get_even_nums(arr)
206 | ```
207 |
208 | 823 µs ± 22.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
209 |
210 |
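211 | ## 5. Conditional termination
212 | Given a list arr, add up its numbers in order, stop at the first element greater than target, and return the sum so far.  
213 | Example:  
214 | * Input: [1, 2, 3, 4, 5], 3
215 | * Output: 6 (4 > 3, so stop)
216 | 
217 | Using takewhile is actually slower here (404 µs vs. 245 µs for the plain loop), so don't trust itertools blindly.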
218 |
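A minimal sketch of `takewhile` on the example input. It yields items until the predicate first fails and then stops for good, so later items that would pass are not revisited; the second list is an illustrative input chosen to show that.

```python
from itertools import takewhile

arr = [1, 2, 3, 4, 5]
target = 3

total = sum(takewhile(lambda x: x <= target, arr))

# Iteration ends at the first failure (5), so the trailing 3 is never seen.
prefix = list(takewhile(lambda x: x <= target, [1, 2, 5, 3]))

print(total)   # 6
print(prefix)  # [1, 2]
```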
219 |
220 | ```python
221 | from itertools import takewhile
222 | ```
223 |
224 |
225 | ```python
226 | def cond_sum(arr, target):
227 |     res = 0
228 |     for x in arr:
229 |         if x > target:
230 |             break
231 |         res += x
232 |     return res
233 | ```
234 | 
235 | 
236 | ```python
237 | def fast_cond_sum(arr, target):
238 |     return sum(takewhile(lambda x: x <= target, arr))
239 | ```
240 |
241 |
242 | ```python
243 | arr = list(range(10000))
244 | target = 5000
245 | ```
246 |
247 |
248 | ```python
249 | %timeit cond_sum(arr, target)
250 | ```
251 |
252 | 245 µs ± 11.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
253 |
254 |
255 |
256 | ```python
257 | %timeit fast_cond_sum(arr, target)
258 | ```
259 |
260 | 404 µs ± 13.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
261 |
262 |
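263 | ## 6. Nested loops
264 | Given lists arr1 and arr2, return the sums of every pair of elements, one taken from each list.  
265 | Example:  
266 | * Input: [1, 2], [4, 5]
267 | * Output: [1 + 4, 1 + 5, 2 + 4, 2 + 5]
268 | 
269 | Using product makes this about 1.3x faster (484 µs vs. 373 µs).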
270 |
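A short sketch of `product` on the example input, next to the equivalent nested comprehension. The `repeat` keyword is part of the standard `itertools.product` signature and is shown only as an extra illustration.

```python
from itertools import product

arr1 = [1, 2]
arr2 = [4, 5]

sums = [x + y for x, y in product(arr1, arr2)]  # one flat loop over all pairs
nested = [x + y for x in arr1 for y in arr2]    # equivalent nested loops

# repeat=n pairs an iterable with itself n times.
bits = list(product([0, 1], repeat=3))

print(sums)       # [5, 6, 6, 7]
print(len(bits))  # 8
```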
271 |
272 | ```python
273 | from itertools import product
274 | ```
275 |
276 |
277 | ```python
278 | def _cross_sum(arr1, arr2):
279 |     for x in arr1:
280 |         for y in arr2:
281 |             yield x + y
282 | 
283 | def cross_sum(arr1, arr2):
284 |     return list(_cross_sum(arr1, arr2))
285 | ```
286 | 
287 | 
288 | ```python
289 | def fast_cross_sum(arr1, arr2):
290 |     return [x + y for x, y in product(arr1, arr2)]
291 | ```
292 |
293 |
294 | ```python
295 | arr1 = list(range(100))
296 | arr2 = list(range(100))
297 | ```
298 |
299 |
300 | ```python
301 | %timeit cross_sum(arr1, arr2)
302 | ```
303 |
304 | 484 µs ± 16.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
305 |
306 |
307 |
308 | ```python
309 | %timeit fast_cross_sum(arr1, arr2)
310 | ```
311 |
312 | 373 µs ± 11.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
313 |
314 |
315 | ## 7. Flattening a 2D list
316 | Given a 2D list arr, convert it to a 1D list.  
317 | Example:  
318 | * Input: [[1, 2], [3, 4]]
319 | * Output: [1, 2, 3, 4]
320 | 
321 | Using chain makes this about 5.7x faster (379 µs vs. 66.9 µs).
322 |
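A small sketch of flattening with `chain`. `chain(*rows)` unpacks all rows as arguments up front, while `chain.from_iterable(rows)` takes the outer iterable directly and therefore also accepts a lazy generator of rows; the generator case is an illustrative addition.

```python
from itertools import chain

arr2d = [[1, 2], [3, 4]]

flat_star = list(chain(*arr2d))           # rows unpacked as arguments
flat = list(chain.from_iterable(arr2d))   # outer iterable consumed lazily

# from_iterable works on a generator of rows without materialising it first.
lazy = list(chain.from_iterable(range(i) for i in range(4)))

print(flat)  # [1, 2, 3, 4]
print(lazy)  # [0, 0, 1, 0, 1, 2]
```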
323 |
324 | ```python
325 | from itertools import chain
326 | ```
327 |
328 |
329 | ```python
330 | def _flatten(arr2d):
331 |     for arr in arr2d:
332 |         for x in arr:
333 |             yield x
334 | 
335 | def flatten(arr2d):
336 |     return list(_flatten(arr2d))
337 | ```
338 | 
339 | 
340 | ```python
341 | def fast_flatten(arr2d):
342 |     return list(chain(*arr2d))
343 | ```
344 |
345 |
346 | ```python
347 | arr2d = [[x + y * 100 for x in range(100)] for y in range(100)]
348 | ```
349 |
350 |
351 | ```python
352 | %timeit flatten(arr2d)
353 | ```
354 |
355 | 379 µs ± 15.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
356 |
357 |
358 |
359 | ```python
360 | %timeit fast_flatten(arr2d)
361 | ```
362 |
363 | 66.9 µs ± 3.43 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
364 |
365 |
366 |
370 |
--------------------------------------------------------------------------------
/Python multi threads.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Python Multitasking (Multithreading)\n",
8 | "Author: tushushu \n",
9 | "Project: https://github.com/tushushu/flying-python"
10 | ]
11 | },
12 | {
13 | "cell_type": "markdown",
14 | "metadata": {},
15 | "source": [
16 | "## 1. GIL\n",
17 | "\n",
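18 | "Anyone familiar with Python knows that the interpreter written in C (CPython) has a global interpreter lock (GIL). Because of the GIL, the interpreter can run the code of only one thread at any given time, which greatly limits the performance of Python threads. For historical reasons this lock is now almost impossible to remove. \n",
19 | " \n",
20 | "The GIL hurts multithreaded performance because a thread can run its code only after it has acquired the global lock, and there is only one such lock. With Python threads, therefore, only one thread is running at any moment, so even on a multi-core machine you get only single-core performance. \n"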
21 | ]
22 | },
23 | {
24 | "cell_type": "markdown",
25 | "metadata": {},
26 | "source": [
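27 | "## 2. Multithreading for IO-bound tasks\n",
28 | "An IO-bound task is one where the CPU is much faster than the disk or memory it depends on, so most of the time the CPU is waiting for I/O (disk/memory) reads and writes and CPU load stays low. Any task that involves network or disk I/O is IO-bound. While one thread is performing an IO-bound task the CPU is idle, so the GIL is released to other threads, which shortens the total waiting time."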
29 | ]
30 | },
31 | {
32 | "cell_type": "code",
33 | "execution_count": 1,
34 | "metadata": {},
35 | "outputs": [],
36 | "source": [
37 | "from concurrent.futures import ThreadPoolExecutor\n",
38 | "from time import sleep, time"
39 | ]
40 | },
41 | {
42 | "cell_type": "code",
43 | "execution_count": 2,
44 | "metadata": {},
45 | "outputs": [],
46 | "source": [
47 | "# Number of workers\n",
48 | "N = 4\n",
49 | "# Create the thread pool\n",
50 | "pool = ThreadPoolExecutor(max_workers=N)"
51 | ]
52 | },
53 | {
54 | "cell_type": "markdown",
55 | "metadata": {},
56 | "source": [
57 | "### 2.1 Define an IO-bound function\n",
58 | "This function sleeps for x seconds."
59 | ]
60 | },
61 | {
62 | "cell_type": "code",
63 | "execution_count": 3,
64 | "metadata": {},
65 | "outputs": [],
66 | "source": [
67 | "def io_bound_func(x):\n",
68 | "    sleep(x)\n",
69 | "    print(\"Sleep for %d seconds.\" % x)"
70 | ]
71 | },
72 | {
73 | "cell_type": "markdown",
74 | "metadata": {},
75 | "source": [
76 | "### 2.2 Process serially\n",
77 | "Iterate over every element of a list and call io_bound_func on it."
78 | ]
79 | },
80 | {
81 | "cell_type": "code",
82 | "execution_count": 4,
83 | "metadata": {},
84 | "outputs": [],
85 | "source": [
86 | "def process_array(arr):\n",
87 | "    for x in arr:\n",
88 | "        io_bound_func(x)"
89 | ]
90 | },
91 | {
92 | "cell_type": "markdown",
93 | "metadata": {},
94 | "source": [
95 | "### 2.3 Process with multiple threads\n",
96 | "The thread pool's map method applies the same function to every element of the list."
97 | ]
98 | },
99 | {
100 | "cell_type": "code",
101 | "execution_count": 5,
102 | "metadata": {},
103 | "outputs": [],
104 | "source": [
105 | "def fast_process_array(arr):\n",
106 | "    for x in pool.map(io_bound_func, arr):\n",
107 | "        pass"
108 | ]
109 | },
110 | {
111 | "cell_type": "markdown",
112 | "metadata": {},
113 | "source": [
114 | "### 2.4 Measure the run time\n",
115 | "- Serial version: 1 + 2 + 3 = 6 seconds \n",
116 | "- Multithreaded version: max(1, 2, 3) = 3 seconds"
117 | ]
118 | },
119 | {
120 | "cell_type": "code",
121 | "execution_count": 6,
122 | "metadata": {},
123 | "outputs": [],
124 | "source": [
125 | "def time_it(fn, *args):\n",
126 | "    start = time()\n",
127 | "    fn(*args)\n",
128 | "    print(\"%s版本的运行时间为 %.5f 秒!\" % (fn.__name__, time() - start))"
129 | ]
130 | },
131 | {
132 | "cell_type": "code",
133 | "execution_count": 7,
134 | "metadata": {},
135 | "outputs": [
136 | {
137 | "name": "stdout",
138 | "output_type": "stream",
139 | "text": [
140 | "Sleep for 1 seconds.\n",
141 | "Sleep for 2 seconds.\n",
142 | "Sleep for 3 seconds.\n",
143 | "process_array版本的运行时间为 6.00883 秒!\n"
144 | ]
145 | }
146 | ],
147 | "source": [
148 | "time_it(process_array, [1, 2, 3])"
149 | ]
150 | },
151 | {
152 | "cell_type": "code",
153 | "execution_count": 8,
154 | "metadata": {},
155 | "outputs": [
156 | {
157 | "name": "stdout",
158 | "output_type": "stream",
159 | "text": [
160 | "Sleep for 1 seconds.\n",
161 | "Sleep for 2 seconds.\n",
162 | "Sleep for 3 seconds.\n",
163 | "fast_process_array版本的运行时间为 3.00300 秒!\n"
164 | ]
165 | }
166 | ],
167 | "source": [
168 | "time_it(fast_process_array, [1, 2, 3])"
169 | ]
170 | },
171 | {
172 | "cell_type": "markdown",
173 | "metadata": {},
174 | "source": [
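175 | "## 3. Multithreading for CPU-bound tasks\n",
176 | "CPU-bound tasks do a large amount of computation and consume CPU resources, for example computing pi or decoding HD video; they depend entirely on the CPU's computing power. While a thread runs a CPU-bound task the CPU is busy, and the interpreter only forces that thread to hand the GIL to other threads periodically (in Python 3.2+ after a switch interval of 5 ms by default, see sys.getswitchinterval()). Once the cost of switching threads is added, the multithreaded version can end up slower than the serial code."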
177 | ]
178 | },
179 | {
180 | "cell_type": "markdown",
181 | "metadata": {},
182 | "source": [
183 | "### 3.1 Define a CPU-bound function\n",
184 | "This function sums the integers in [1, x]."
185 | ]
186 | },
187 | {
188 | "cell_type": "code",
189 | "execution_count": 9,
190 | "metadata": {},
191 | "outputs": [],
192 | "source": [
193 | "def cpu_bound_func(x):\n",
194 | "    tot = 0\n",
195 | "    a = 1\n",
196 | "    while a <= x:\n",
197 | "        tot += a\n",
198 | "        a += 1\n",
199 | "    print(\"Finish sum from 1 to %d!\" % x)\n",
200 | "    return tot"
201 | ]
202 | },
203 | {
204 | "cell_type": "markdown",
205 | "metadata": {},
206 | "source": [
207 | "### 3.2 Process serially\n",
208 | "Iterate over every element of a list and call cpu_bound_func on it."
209 | ]
210 | },
211 | {
212 | "cell_type": "code",
213 | "execution_count": 10,
214 | "metadata": {},
215 | "outputs": [],
216 | "source": [
217 | "def process_array(arr):\n",
218 | "    for x in arr:\n",
219 | "        cpu_bound_func(x)"
220 | ]
221 | },
222 | {
223 | "cell_type": "markdown",
224 | "metadata": {},
225 | "source": [
226 | "### 3.3 Process with multiple threads\n",
227 | "The thread pool's map method applies the same function to every element of the list."
228 | ]
229 | },
230 | {
231 | "cell_type": "code",
232 | "execution_count": 11,
233 | "metadata": {},
234 | "outputs": [],
235 | "source": [
236 | "def fast_process_array(arr):\n",
237 | "    for x in pool.map(cpu_bound_func, arr):\n",
238 | "        pass"
239 | ]
240 | },
241 | {
242 | "cell_type": "markdown",
243 | "metadata": {},
244 | "source": [
245 | "### 3.4 Measure the run time\n",
246 | "- Serial version: 2.1 seconds\n",
247 | "- Multithreaded version: 2.2 seconds"
248 | ]
249 | },
250 | {
251 | "cell_type": "code",
252 | "execution_count": 12,
253 | "metadata": {},
254 | "outputs": [],
255 | "source": [
256 | "def time_it(fn, *args):\n",
257 | "    start = time()\n",
258 | "    fn(*args)\n",
259 | "    print(\"%s版本的运行时间为 %.5f 秒!\" % (fn.__name__, time() - start))"
260 | ]
261 | },
262 | {
263 | "cell_type": "code",
264 | "execution_count": 13,
265 | "metadata": {},
266 | "outputs": [
267 | {
268 | "name": "stdout",
269 | "output_type": "stream",
270 | "text": [
271 | "Finish sum from 1 to 10000000!\n",
272 | "Finish sum from 1 to 10000000!\n",
273 | "Finish sum from 1 to 10000000!\n",
274 | "process_array版本的运行时间为 2.10489 秒!\n"
275 | ]
276 | }
277 | ],
278 | "source": [
279 | "time_it(process_array, [10**7, 10**7, 10**7])"
280 | ]
281 | },
282 | {
283 | "cell_type": "code",
284 | "execution_count": 14,
285 | "metadata": {},
286 | "outputs": [
287 | {
288 | "name": "stdout",
289 | "output_type": "stream",
290 | "text": [
291 | "Finish sum from 1 to 10000000!\n",
292 | "Finish sum from 1 to 10000000!\n",
293 | "Finish sum from 1 to 10000000!\n",
294 | "fast_process_array版本的运行时间为 2.20897 秒!\n"
295 | ]
296 | }
297 | ],
298 | "source": [
299 | "time_it(fast_process_array, [10**7, 10**7, 10**7])"
300 | ]
301 | },
302 | {
303 | "cell_type": "markdown",
304 | "metadata": {},
305 | "source": [
306 | "## References\n",
307 | "https://www.jianshu.com/p/c75ed8a6e9af \n",
308 | "https://www.cnblogs.com/tusheng/articles/10630662.html"
309 | ]
310 | },
311 | {
312 | "cell_type": "code",
313 | "execution_count": null,
314 | "metadata": {},
315 | "outputs": [],
316 | "source": []
317 | }
318 | ],
319 | "metadata": {
320 | "kernelspec": {
321 | "display_name": "Python 3",
322 | "language": "python",
323 | "name": "python3"
324 | },
325 | "language_info": {
326 | "codemirror_mode": {
327 | "name": "ipython",
328 | "version": 3
329 | },
330 | "file_extension": ".py",
331 | "mimetype": "text/x-python",
332 | "name": "python",
333 | "nbconvert_exporter": "python",
334 | "pygments_lexer": "ipython3",
335 | "version": "3.6.6"
336 | }
337 | },
338 | "nbformat": 4,
339 | "nbformat_minor": 2
340 | }
341 |
--------------------------------------------------------------------------------
/More efficient pandas.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Making a Pandas DataFrame operation 40x faster"
8 | ]
9 | },
10 | {
11 | "cell_type": "markdown",
12 | "metadata": {},
13 | "source": [
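14 | "## 1. Warm-up\n",
15 | "Pandas is the famous workhorse of data analysis. Sometimes we need to run very complex processing over tens or even hundreds of millions of rows, and then run time becomes a problem we cannot ignore. Take the simple example below: we randomly generate 1 million rows and process the 'val' column, subtracting 1 from even values and adding 1 to odd values. Real data analysis is far more complex than this, but since we (mainly I) do not have that much time to wait for results, we will keep it simple. The transform function takes 284 ms on average."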
16 | ]
17 | },
18 | {
19 | "cell_type": "code",
20 | "execution_count": 1,
21 | "metadata": {},
22 | "outputs": [
23 | {
24 | "data": {
78 | "text/plain": [
79 | " genre val\n",
80 | "0 C 54\n",
81 | "1 A 5\n",
82 | "2 D 0\n",
83 | "3 D 42\n",
84 | "4 C 91"
85 | ]
86 | },
87 | "execution_count": 1,
88 | "metadata": {},
89 | "output_type": "execute_result"
90 | }
91 | ],
92 | "source": [
93 | "import pandas as pd\n",
94 | "import numpy as np\n",
95 | "\n",
96 | "def gen_data(size):\n",
97 | "    d = dict()\n",
98 | "    d[\"genre\"] = np.random.choice([\"A\", \"B\", \"C\", \"D\"], size=size)\n",
99 | "    d[\"val\"] = np.random.randint(low=0, high=100, size=size)\n",
100 | "    return pd.DataFrame(d)\n",
101 | "\n",
102 | "data = gen_data(1000000)\n",
103 | "data.head()"
104 | ]
105 | },
106 | {
107 | "cell_type": "code",
108 | "execution_count": 2,
109 | "metadata": {},
110 | "outputs": [],
111 | "source": [
112 | "def transform(data):\n",
113 | "    data.loc[:, \"new_val\"] = data.val.apply(lambda x: x + 1 if x % 2 else x - 1)"
114 | ]
115 | },
116 | {
117 | "cell_type": "code",
118 | "execution_count": 3,
119 | "metadata": {},
120 | "outputs": [
121 | {
122 | "name": "stdout",
123 | "output_type": "stream",
124 | "text": [
125 | "284 ms ± 8.95 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
126 | ]
127 | }
128 | ],
129 | "source": [
130 | "%timeit -n 1 transform(data)"
131 | ]
132 | },
133 | {
134 | "cell_type": "markdown",
135 | "metadata": {},
136 | "source": [
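137 | "## 2. Writing a C extension with Cython\n",
138 | "Let's try writing the `x + 1 if x % 2 else x - 1` function with our old friend Cython. The average run time drops to 202 ms, so it is indeed faster: about a 1.4x speedup, which is still a long way from the 40x goal."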
139 | ]
140 | },
141 | {
142 | "cell_type": "code",
143 | "execution_count": 4,
144 | "metadata": {},
145 | "outputs": [],
146 | "source": [
147 | "%load_ext cython"
148 | ]
149 | },
150 | {
151 | "cell_type": "code",
152 | "execution_count": 5,
153 | "metadata": {},
154 | "outputs": [],
155 | "source": [
156 | "%%cython\n",
157 | "cpdef int _transform(int x):\n",
158 | "    if x % 2:\n",
159 | "        return x + 1\n",
160 | "    return x - 1\n",
161 | "\n",
162 | "def transform(data):\n",
163 | "    data.loc[:, \"new_val\"] = data.val.apply(_transform)"
164 | ]
165 | },
166 | {
167 | "cell_type": "code",
168 | "execution_count": 6,
169 | "metadata": {},
170 | "outputs": [
171 | {
172 | "name": "stdout",
173 | "output_type": "stream",
174 | "text": [
175 | "202 ms ± 13.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
176 | ]
177 | }
178 | ],
179 | "source": [
180 | "%timeit -n 1 transform(data)"
181 | ]
182 | },
183 | {
184 | "cell_type": "markdown",
185 | "metadata": {},
186 | "source": [
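187 | "## 3. Reducing type conversions\n",
188 | "To reduce type conversions between C and Python, we pass the 'val' column to the Cython function directly as a Numpy array; note the difference between cnp and np. The average run time drops straight to 10.8 ms, about a 26x speedup, and the goal starts to look reachable."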
189 | ]
190 | },
191 | {
192 | "cell_type": "code",
193 | "execution_count": 7,
194 | "metadata": {},
195 | "outputs": [],
196 | "source": [
197 | "%%cython\n",
198 | "import numpy as np\n",
199 | "cimport numpy as cnp\n",
200 | "ctypedef cnp.int_t DTYPE_t\n",
201 | "\n",
202 | "cpdef cnp.ndarray[DTYPE_t] _transform(cnp.ndarray[DTYPE_t] arr):\n",
203 | "    cdef:\n",
204 | "        int i = 0\n",
205 | "        int n = arr.shape[0]\n",
206 | "        int x\n",
207 | "        cnp.ndarray[DTYPE_t] new_arr = np.empty_like(arr)\n",
208 | "\n",
209 | "    while i < n:\n",
210 | "        x = arr[i]\n",
211 | "        if x % 2:\n",
212 | "            new_arr[i] = x + 1\n",
213 | "        else:\n",
214 | "            new_arr[i] = x - 1\n",
215 | "        i += 1\n",
216 | "    return new_arr\n",
217 | "\n",
218 | "def transform(data):\n",
219 | "    data.loc[:, \"new_val\"] = _transform(data.val.values)"
220 | ]
221 | },
222 | {
223 | "cell_type": "code",
224 | "execution_count": 8,
225 | "metadata": {},
226 | "outputs": [
227 | {
228 | "name": "stdout",
229 | "output_type": "stream",
230 | "text": [
231 | "10.8 ms ± 512 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
232 | ]
233 | }
234 | ],
235 | "source": [
236 | "%timeit -n 1 transform(data)"
237 | ]
238 | },
239 | {
240 | "cell_type": "markdown",
241 | "metadata": {},
242 | "source": [
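243 | "## 4. Using unchecked arrays\n",
244 | "The @cython.boundscheck(False) and @cython.wraparound(False) decorators turn off array bounds checking and negative-index handling. The average run time becomes 6.76 ms, a speedup of about 42x, so the goal is met."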
245 | ]
246 | },
247 | {
248 | "cell_type": "code",
249 | "execution_count": 9,
250 | "metadata": {},
251 | "outputs": [],
252 | "source": [
253 | "%%cython\n",
254 | "import cython\n",
255 | "import numpy as np\n",
256 | "cimport numpy as cnp\n",
257 | "ctypedef cnp.int_t DTYPE_t\n",
258 | "\n",
259 | "\n",
260 | "@cython.boundscheck(False)\n",
261 | "@cython.wraparound(False)\n",
262 | "cpdef cnp.ndarray[DTYPE_t] _transform(cnp.ndarray[DTYPE_t] arr):\n",
263 | "    cdef:\n",
264 | "        int i = 0\n",
265 | "        int n = arr.shape[0]\n",
266 | "        int x\n",
267 | "        cnp.ndarray[DTYPE_t] new_arr = np.empty_like(arr)\n",
268 | "\n",
269 | "    while i < n:\n",
270 | "        x = arr[i]\n",
271 | "        if x % 2:\n",
272 | "            new_arr[i] = x + 1\n",
273 | "        else:\n",
274 | "            new_arr[i] = x - 1\n",
275 | "        i += 1\n",
276 | "    return new_arr\n",
277 | "\n",
278 | "def transform(data):\n",
279 | "    data.loc[:, \"new_val\"] = _transform(data.val.values)"
280 | ]
281 | },
282 | {
283 | "cell_type": "code",
284 | "execution_count": 10,
285 | "metadata": {},
286 | "outputs": [
287 | {
288 | "name": "stdout",
289 | "output_type": "stream",
290 | "text": [
291 | "6.76 ms ± 545 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
292 | ]
293 | }
294 | ],
295 | "source": [
296 | "%timeit -n 1 transform(data)"
297 | ]
298 | }
299 | ],
300 | "metadata": {
301 | "kernelspec": {
302 | "display_name": "Python 3",
303 | "language": "python",
304 | "name": "python3"
305 | },
306 | "language_info": {
307 | "codemirror_mode": {
308 | "name": "ipython",
309 | "version": 3
310 | },
311 | "file_extension": ".py",
312 | "mimetype": "text/x-python",
313 | "name": "python",
314 | "nbconvert_exporter": "python",
315 | "pygments_lexer": "ipython3",
316 | "version": "3.6.6"
317 | }
318 | },
319 | "nbformat": 4,
320 | "nbformat_minor": 2
321 | }
322 |
--------------------------------------------------------------------------------
/Python Standard Library.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Writing efficient code with the Python standard library\n",
8 | "Author: tushushu  \n",
9 | "Project: https://github.com/tushushu/flying-python"
10 | ]
11 | },
12 | {
13 | "cell_type": "markdown",
14 | "metadata": {},
15 | "source": [
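16 | "## 1. bisect - binary search\n",
17 | "Given a list, we want to look up a target element and return its index in the list.  \n",
18 | "* The first thing that comes to mind is the list's index method. Build an ascending list of length 10000 and write a search function that looks up every element with index. Mean run time: 437 ms.\n",
19 | "* Use bisect_left from the bisect module, which is the familiar binary search and requires the list to be sorted. Write a fast_search function. Mean run time: 3.94 ms, about 110x faster."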
20 | ]
21 | },
22 | {
23 | "cell_type": "code",
24 | "execution_count": 1,
25 | "metadata": {},
26 | "outputs": [],
27 | "source": [
28 | "import bisect"
29 | ]
30 | },
31 | {
32 | "cell_type": "code",
33 | "execution_count": 2,
34 | "metadata": {},
35 | "outputs": [],
36 | "source": [
37 | "def search(nums):\n",
38 | "    for x in nums:\n",
39 | "        nums.index(x)"
40 | ]
41 | },
42 | {
43 | "cell_type": "code",
44 | "execution_count": 3,
45 | "metadata": {},
46 | "outputs": [],
47 | "source": [
48 | "def fast_search(nums):\n",
49 | "    for x in nums:\n",
50 | "        bisect.bisect_left(nums, x)"
51 | ]
52 | },
53 | {
54 | "cell_type": "code",
55 | "execution_count": 4,
56 | "metadata": {},
57 | "outputs": [],
58 | "source": [
59 | "arr = list(range(10000))"
60 | ]
61 | },
62 | {
63 | "cell_type": "code",
64 | "execution_count": 5,
65 | "metadata": {},
66 | "outputs": [
67 | {
68 | "name": "stdout",
69 | "output_type": "stream",
70 | "text": [
71 | "437 ms ± 12.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
72 | ]
73 | }
74 | ],
75 | "source": [
76 | "%timeit -n 1 search(arr)"
77 | ]
78 | },
79 | {
80 | "cell_type": "code",
81 | "execution_count": 6,
82 | "metadata": {},
83 | "outputs": [
84 | {
85 | "name": "stdout",
86 | "output_type": "stream",
87 | "text": [
88 | "3.94 ms ± 407 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
89 | ]
90 | }
91 | ],
92 | "source": [
93 | "%timeit -n 1 fast_search(arr)"
94 | ]
95 | },
96 | {
97 | "cell_type": "markdown",
98 | "metadata": {},
99 | "source": [
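100 | "## 2. Counter - efficient counting\n",
101 | "Given a list, count how many times each distinct element occurs and return a dict.  \n",
102 | "* Create a list of length 10000 whose elements are random integers between 1 and 3. Write a count function that creates an empty dict, iterates over the list with a for loop, and writes the counts into the dict. Mean run time: 937 µs.\n",
103 | "* Use Counter from the collections module to write fast_count in a single line. Mean run time: 494 µs, nearly 2x faster."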
104 | ]
105 | },
106 | {
107 | "cell_type": "code",
108 | "execution_count": 7,
109 | "metadata": {},
110 | "outputs": [],
111 | "source": [
112 | "from collections import Counter\n",
113 | "from random import randint"
114 | ]
115 | },
116 | {
117 | "cell_type": "code",
118 | "execution_count": 8,
119 | "metadata": {},
120 | "outputs": [],
121 | "source": [
122 | "def count(nums):\n",
123 | "    res = dict()\n",
124 | "    for x in nums:\n",
125 | "        if x in res:\n",
126 | "            res[x] += 1\n",
127 | "        else:\n",
128 | "            res[x] = 1\n",
129 | "    return res"
130 | ]
131 | },
132 | {
133 | "cell_type": "code",
134 | "execution_count": 9,
135 | "metadata": {},
136 | "outputs": [],
137 | "source": [
138 | "def fast_count(nums):\n",
139 | "    return Counter(nums)"
140 | ]
141 | },
142 | {
143 | "cell_type": "code",
144 | "execution_count": 10,
145 | "metadata": {},
146 | "outputs": [],
147 | "source": [
148 | "nums = [randint(1, 3) for _ in range(10000)]"
149 | ]
150 | },
151 | {
152 | "cell_type": "code",
153 | "execution_count": 11,
154 | "metadata": {},
155 | "outputs": [
156 | {
157 | "name": "stdout",
158 | "output_type": "stream",
159 | "text": [
160 | "937 µs ± 153 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
161 | ]
162 | }
163 | ],
164 | "source": [
165 | "%timeit -n 1 count(nums)"
166 | ]
167 | },
168 | {
169 | "cell_type": "code",
170 | "execution_count": 12,
171 | "metadata": {},
172 | "outputs": [
173 | {
174 | "name": "stdout",
175 | "output_type": "stream",
176 | "text": [
177 | "494 µs ± 240 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
178 | ]
179 | }
180 | ],
181 | "source": [
182 | "%timeit -n 1 fast_count(nums)"
183 | ]
184 | },
185 | {
186 | "cell_type": "markdown",
187 | "metadata": {},
188 | "source": [
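189 | "## 3. heapq - heap\n",
190 | "Given a list, return its 3 smallest elements.\n",
191 | "* Create a list of length 10000 and shuffle it. Write a top_3 function that sorts the list and returns the first 3 elements. Mean run time: 2.03 ms.\n",
192 | "* Use the heapq module, the familiar heap, to write fast_top_3. Mean run time: 296 µs, about 6.8x faster."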
193 | ]
194 | },
195 | {
196 | "cell_type": "code",
197 | "execution_count": 13,
198 | "metadata": {},
199 | "outputs": [],
200 | "source": [
201 | "import heapq\n",
202 | "from random import shuffle"
203 | ]
204 | },
205 | {
206 | "cell_type": "code",
207 | "execution_count": 14,
208 | "metadata": {},
209 | "outputs": [],
210 | "source": [
211 | "def top_3(nums):\n",
212 | "    return sorted(nums)[:3]"
213 | ]
214 | },
215 | {
216 | "cell_type": "code",
217 | "execution_count": 15,
218 | "metadata": {},
219 | "outputs": [],
220 | "source": [
221 | "def fast_top_3(nums):\n",
222 | "    return heapq.nsmallest(3, nums)"
223 | ]
224 | },
225 | {
226 | "cell_type": "code",
227 | "execution_count": 16,
228 | "metadata": {},
229 | "outputs": [],
230 | "source": [
231 | "nums = list(range(10000))\n",
232 | "shuffle(nums)"
233 | ]
234 | },
235 | {
236 | "cell_type": "code",
237 | "execution_count": 17,
238 | "metadata": {},
239 | "outputs": [
240 | {
241 | "name": "stdout",
242 | "output_type": "stream",
243 | "text": [
244 | "2.03 ms ± 236 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
245 | ]
246 | }
247 | ],
248 | "source": [
249 | "%timeit -n 1 top_3(nums)"
250 | ]
251 | },
252 | {
253 | "cell_type": "code",
254 | "execution_count": 18,
255 | "metadata": {},
256 | "outputs": [
257 | {
258 | "name": "stdout",
259 | "output_type": "stream",
260 | "text": [
261 | "296 µs ± 56.2 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
262 | ]
263 | }
264 | ],
265 | "source": [
266 | "%timeit -n 1 fast_top_3(nums)"
267 | ]
268 | },
269 | {
270 | "cell_type": "markdown",
271 | "metadata": {},
272 | "source": [
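273 | "## 4. itemgetter - fetch items in bulk\n",
274 | "Given a dict and a list containing one or more of the dict's keys, return the corresponding values.\n",
275 | "* Create a dict with 100,000 entries and randomly sample its keys 10,000 times (with replacement) to form a list of length 10,000. Write a get_items function. Mean run time: 1.12 ms.\n",
276 | "* Use itemgetter to read these items in bulk and write fast_get_items, which returns a tuple rather than a list. Mean run time: 836 µs, about 1.3x as fast.\n"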
277 | ]
278 | },
279 | {
280 | "cell_type": "code",
281 | "execution_count": 19,
282 | "metadata": {},
283 | "outputs": [],
284 | "source": [
285 | "from operator import itemgetter\n",
286 | "from random import choices"
287 | ]
288 | },
289 | {
290 | "cell_type": "code",
291 | "execution_count": 20,
292 | "metadata": {},
293 | "outputs": [],
294 | "source": [
295 | "def get_items(data, keys):\n",
296 | "    return [data[x] for x in keys]"
297 | ]
298 | },
299 | {
300 | "cell_type": "code",
301 | "execution_count": 21,
302 | "metadata": {},
303 | "outputs": [],
304 | "source": [
305 | "def fast_get_items(data, keys):\n",
306 | "    return itemgetter(*keys)(data)"
307 | ]
308 | },
309 | {
310 | "cell_type": "code",
311 | "execution_count": 22,
312 | "metadata": {},
313 | "outputs": [],
314 | "source": [
315 | "data = dict(enumerate(range(100000)))\n",
316 | "keys = choices(list(data.keys()), k=10000)"
317 | ]
318 | },
319 | {
320 | "cell_type": "code",
321 | "execution_count": 23,
322 | "metadata": {},
323 | "outputs": [
324 | {
325 | "name": "stdout",
326 | "output_type": "stream",
327 | "text": [
328 | "1.12 ms ± 354 µs per loop (mean ± std. dev. of 7 runs, 5 loops each)\n"
329 | ]
330 | }
331 | ],
332 | "source": [
333 | "%timeit -n 5 get_items(data, keys)"
334 | ]
335 | },
336 | {
337 | "cell_type": "code",
338 | "execution_count": 24,
339 | "metadata": {},
340 | "outputs": [
341 | {
342 | "name": "stdout",
343 | "output_type": "stream",
344 | "text": [
345 | "836 µs ± 287 µs per loop (mean ± std. dev. of 7 runs, 5 loops each)\n"
346 | ]
347 | }
348 | ],
349 | "source": [
350 | "%timeit -n 5 fast_get_items(data, keys)"
351 | ]
352 | },
353 | {
354 | "cell_type": "markdown",
355 | "metadata": {},
356 | "source": [
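357 | "## 5. lru_cache - trading space for time\n",
358 | "Given a number n, return the Fibonacci sequence of length n.\n",
359 | "* Write a recursive fib function and call it in a loop from fib_seq. With n = 20, the mean run time is 3.28 ms.\n",
360 | "* Use the @lru_cache decorator to cache results that have already been computed; for example, computing fib(5) can reuse the cached fib(4). Mean run time: 144 µs, about 22x faster."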
361 | ]
362 | },
363 | {
364 | "cell_type": "code",
365 | "execution_count": 25,
366 | "metadata": {},
367 | "outputs": [],
368 | "source": [
369 | "from functools import lru_cache"
370 | ]
371 | },
372 | {
373 | "cell_type": "code",
374 | "execution_count": 36,
375 | "metadata": {},
376 | "outputs": [],
377 | "source": [
378 | "def fib(n):\n",
379 | "    if n < 2:\n",
380 | "        return n\n",
381 | "    return fib(n-1) + fib(n-2)\n",
382 | "\n",
383 | "def fib_seq(n):\n",
384 | "    return [fib(x) for x in range(n)]"
385 | ]
386 | },
387 | {
388 | "cell_type": "code",
389 | "execution_count": 27,
390 | "metadata": {},
391 | "outputs": [],
392 | "source": [
393 | "@lru_cache(maxsize=None)\n",
394 | "def fast_fib(n):\n",
395 | "    if n < 2:\n",
396 | "        return n\n",
397 | "    return fast_fib(n-1) + fast_fib(n-2)\n",
398 | "\n",
399 | "def fast_fib_seq(n):\n",
400 | "    return [fast_fib(x) for x in range(n)]"
401 | ]
402 | },
403 | {
404 | "cell_type": "code",
405 | "execution_count": 28,
406 | "metadata": {},
407 | "outputs": [
408 | {
409 | "name": "stdout",
410 | "output_type": "stream",
411 | "text": [
412 | "3.28 ms ± 220 µs per loop (mean ± std. dev. of 7 runs, 3 loops each)\n"
413 | ]
414 | }
415 | ],
416 | "source": [
417 | "%timeit -n 5 fib_seq(20)"
418 | ]
419 | },
420 | {
421 | "cell_type": "code",
422 | "execution_count": 29,
423 | "metadata": {},
424 | "outputs": [
425 | {
426 | "name": "stdout",
427 | "output_type": "stream",
428 | "text": [
429 | "The slowest run took 524.07 times longer than the fastest. This could mean that an intermediate result is being cached.\n",
430 | "144 µs ± 347 µs per loop (mean ± std. dev. of 7 runs, 3 loops each)\n"
431 | ]
432 | }
433 | ],
434 | "source": [
435 | "%timeit -n 5 fast_fib_seq(20)"
436 | ]
437 | },
438 | {
439 | "cell_type": "code",
440 | "execution_count": null,
441 | "metadata": {},
442 | "outputs": [],
443 | "source": []
444 | }
445 | ],
446 | "metadata": {
447 | "kernelspec": {
448 | "display_name": "Python 3",
449 | "language": "python",
450 | "name": "python3"
451 | },
452 | "language_info": {
453 | "codemirror_mode": {
454 | "name": "ipython",
455 | "version": 3
456 | },
457 | "file_extension": ".py",
458 | "mimetype": "text/x-python",
459 | "name": "python",
460 | "nbconvert_exporter": "python",
461 | "pygments_lexer": "ipython3",
462 | "version": "3.6.6"
463 | }
464 | },
465 | "nbformat": 4,
466 | "nbformat_minor": 2
467 | }
468 |
--------------------------------------------------------------------------------
/Built-in method.md:
--------------------------------------------------------------------------------
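1 | # Optimizing Python performance with built-in methods
2 | Author: tushushu  
3 | Project: https://github.com/tushushu/flying-python
4 |
5 | One possible reason a Python program runs too slowly is that it does not use built-in methods wherever it can. The five examples below show how built-in methods can improve the performance of Python programs.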
6 |
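7 | ## 1. Sum of squares of an array
8 | Given a list, compute the sum of the squares of its numbers. The final version is about 1.4x faster than the first.
9 |
10 | First, create a list of length 10000.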
11 |
12 |
13 | ```python
14 | arr = list(range(10000))
15 | ```
16 |
17 | ### 1.1 The most conventional approach
18 | Traverse the list with a while loop and accumulate the sum of squares. Mean run time: 2.97 ms.
19 |
20 |
21 | ```python
22 | def sum_sqr_0(arr):
23 |     res = 0
24 |     n = len(arr)
25 |     i = 0
26 |     while i < n:
27 |         res += arr[i] ** 2
28 |         i += 1
29 |     return res
30 | ```
30 | ```
31 |
32 |
33 | ```python
34 | %timeit sum_sqr_0(arr)
35 | ```
36 |
37 | 2.97 ms ± 36.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
38 |
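The `%timeit` magic used throughout only works inside IPython or Jupyter. As a rough sketch for reproducing a measurement in a plain Python script, the standard-library `timeit` module can be used instead. The `number` and `repeat` values are arbitrary choices, not the settings behind the figures in this article, and `mean_ms` is a name introduced only for this illustration.

```python
import timeit

arr = list(range(10000))


def sum_sqr_0(arr):
    res = 0
    n = len(arr)
    i = 0
    while i < n:
        res += arr[i] ** 2
        i += 1
    return res


# Each entry of `runs` is the total time in seconds for `number` calls.
number = 20
runs = timeit.repeat(lambda: sum_sqr_0(arr), number=number, repeat=5)

# Mean time per call, converted to milliseconds.
mean_ms = sum(runs) / len(runs) / number * 1000
print(f"{mean_ms:.3f} ms per loop (mean of {len(runs)} runs, {number} loops each)")
```

Absolute numbers will differ by machine and Python version, so only the relative ordering of the variants is worth comparing.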
39 |
40 | ### 1.2 Replace the while loop with for range
41 | This avoids the extra overhead of the type checks incurred by i += 1. Mean run time: 2.9 ms.
42 |
43 |
44 | ```python
45 | def sum_sqr_1(arr):
46 |     res = 0
47 |     for i in range(len(arr)):
48 |         res += arr[i] ** 2
49 |     return res
50 | ```
51 |
52 |
53 | ```python
54 | %timeit sum_sqr_1(arr)
55 | ```
56 |
57 | 2.9 ms ± 137 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
58 |
59 |
60 | ### 1.3 Replace for range with for x in arr
61 | This avoids the extra overhead of the type checks incurred by arr[i]. Mean run time: 2.59 ms.
62 |
63 |
64 | ```python
65 | def sum_sqr_2(arr):
66 |     res = 0
67 |     for x in arr:
68 |         res += x ** 2
69 |     return res
70 | ```
71 |
72 |
73 | ```python
74 | %timeit sum_sqr_2(arr)
75 | ```
76 |
77 | 2.59 ms ± 89 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
78 |
79 |
80 | ### 1.4 sum with map
81 | Mean run time: 2.36 ms.
82 |
83 |
84 | ```python
85 | def sum_sqr_3(arr):
86 |     return sum(map(lambda x: x**2, arr))
87 | ```
88 |
89 |
90 | ```python
91 | %timeit sum_sqr_3(arr)
92 | ```
93 |
94 | 2.36 ms ± 15.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
95 |
96 |
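97 | ### 1.5 sum with a generator expression
98 | When a generator expression is the sole argument of a function call, its own parentheses can be omitted. Mean run time: 2.35 ms.
99 |
100 |
101 | ```python
102 | def sum_sqr_4(arr):
103 |     return sum(x ** 2 for x in arr)
104 | ```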
105 |
106 |
107 | ```python
108 | %timeit sum_sqr_4(arr)
109 | ```
110 |
111 | 2.35 ms ± 107 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
112 |
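The parenthesis rule for generator expressions can be checked directly. The sketch below is only an illustration; the names `total_a`, `total_b` and `total_c` are made up for it.

```python
arr = list(range(10000))

# Sole argument: the generator expression needs no parentheses of its own.
total_a = sum(x ** 2 for x in arr)

# The explicitly parenthesized form is equivalent.
total_b = sum((x ** 2 for x in arr))

# With a second argument (here sum's start value), the generator
# expression must be parenthesized; leaving them out is a SyntaxError.
total_c = sum((x ** 2 for x in arr), 0)

print(total_a == total_b == total_c)  # True
```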
113 |
114 | ### 1.6 sum with a list comprehension
115 | Mean run time: 2.06 ms.
116 |
117 |
118 | ```python
119 | def sum_sqr_5(arr):
120 |     return sum([x ** 2 for x in arr])
121 | ```
122 |
123 |
124 | ```python
125 | %timeit sum_sqr_5(arr)
126 | ```
127 |
128 | 2.06 ms ± 27.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
129 |
130 |
131 | ## 2. String concatenation
132 | Given a list of strings, concatenate the first 3 characters of every string into one string. The final version is about 2.1x faster than the first.
133 |
134 | First, create a list of 10000 strings with random lengths and contents.
135 |
136 |
137 | ```python
138 | from random import randint
139 |
140 | def random_letter():
141 |     return chr(ord('a') + randint(0, 25))
142 |
143 | def random_letters(n):
144 |     return "".join([random_letter() for _ in range(n)])
145 |
146 | strings = [random_letters(randint(1, 10)) for _ in range(10000)]
147 | ```
148 |
149 | ### 2.1 The most conventional approach
150 | Traverse the list with a while loop and concatenate the strings. Mean run time: 1.86 ms.
151 |
152 |
153 | ```python
154 | def concat_strings_0(strings):
155 |     res = ""
156 |     n = len(strings)
157 |     i = 0
158 |     while i < n:
159 |         res += strings[i][:3]
160 |         i += 1
161 |     return res
162 | ```
163 |
164 |
165 | ```python
166 | %timeit concat_strings_0(strings)
167 | ```
168 |
169 | 1.86 ms ± 74.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
170 |
171 |
172 | ### 2.2 Replace the while loop with for range
173 | This avoids the extra overhead of the type checks incurred by i += 1. Mean run time: 1.55 ms.
174 |
175 |
176 | ```python
177 | def concat_strings_1(strings):
178 |     res = ""
179 |     for i in range(len(strings)):
180 |         res += strings[i][:3]
181 |     return res
182 | ```
183 |
184 |
185 | ```python
186 | %timeit concat_strings_1(strings)
187 | ```
188 |
189 | 1.55 ms ± 32.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
190 |
191 |
192 | ### 2.3 Replace for range with for x in strings
193 | This avoids the extra overhead of the type checks incurred by strings[i]. Mean run time: 1.32 ms.
194 |
195 |
196 | ```python
197 | def concat_strings_2(strings):
198 |     res = ""
199 |     for x in strings:
200 |         res += x[:3]
201 |     return res
202 | ```
203 |
204 |
205 | ```python
206 | %timeit concat_strings_2(strings)
207 | ```
208 |
209 | 1.32 ms ± 19.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
210 |
211 |
212 | ### 2.4 .join with a generator expression
213 | Mean run time: 1.06 ms.
214 |
215 |
216 | ```python
217 | def concat_strings_3(strings):
218 |     return "".join(x[:3] for x in strings)
219 | ```
220 |
221 |
222 | ```python
223 | %timeit concat_strings_3(strings)
224 | ```
225 |
226 | 1.06 ms ± 15.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
227 |
228 |
229 | ### 2.5 .join with a list comprehension
230 | Mean run time: 0.858 ms.
231 |
232 |
233 | ```python
234 | def concat_strings_4(strings):
235 |     return "".join([x[:3] for x in strings])
236 | ```
237 |
238 |
239 | ```python
240 | %timeit concat_strings_4(strings)
241 | ```
242 |
243 | 858 µs ± 14.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
244 |
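Before trusting a faster variant, it is worth confirming that it returns the same string as the loop version. A minimal check is sketched below; the small `sample` list is made up for the illustration and is not the random data used in the timings.

```python
def concat_strings_2(strings):
    # Loop version: repeated += on a str.
    res = ""
    for x in strings:
        res += x[:3]
    return res


def concat_strings_4(strings):
    # Build all the pieces first, then join them in one call.
    return "".join([x[:3] for x in strings])


sample = ["python", "is", "fast", "", "enough"]
print(concat_strings_4(sample))  # pytisfaseno
print(concat_strings_2(sample) == concat_strings_4(sample))  # True
```

Strings shorter than 3 characters, including the empty string, are handled by slicing without any special case.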
245 |
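246 | ## 3. Filtering odd numbers
247 |
248 | Given a list, select all the odd numbers in it. The final version is about 3.6x faster than the first.
249 |
250 | First, create a list of length 10000.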
251 |
252 |
253 | ```python
254 | arr = list(range(10000))
255 | ```
256 |
257 | ### 3.1 The most conventional approach
258 | Create an empty list res, traverse the list with a while loop, and append the odd numbers to res. Mean run time: 1.03 ms.
259 |
260 |
261 | ```python
262 | def filter_odd_0(arr):
263 |     res = []
264 |     i = 0
265 |     n = len(arr)
266 |     while i < n:
267 |         if arr[i] % 2:
268 |             res.append(arr[i])
269 |         i += 1
270 |     return res
271 | ```
272 |
273 |
274 | ```python
275 | %timeit filter_odd_0(arr)
276 | ```
277 |
278 | 1.03 ms ± 34.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
279 |
280 |
281 | ### 3.2 Replace the while loop with for range
282 | This avoids the extra overhead of the type checks incurred by i += 1. Mean run time: 0.965 ms.
283 |
284 |
285 | ```python
286 | def filter_odd_1(arr):
287 |     res = []
288 |     for i in range(len(arr)):
289 |         if arr[i] % 2:
290 |             res.append(arr[i])
291 |     return res
292 | ```
294 |
295 |
296 | ```python
297 | %timeit filter_odd_1(arr)
298 | ```
299 |
300 | 965 µs ± 4.02 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
301 |
302 |
303 | ### 3.3 Replace for range with for x in arr
304 | This avoids the extra overhead of the type checks incurred by arr[i]. Mean run time: 0.430 ms.
305 |
306 |
307 | ```python
308 | def filter_odd_2(arr):
309 |     res = []
310 |     for x in arr:
311 |         if x % 2:
312 |             res.append(x)
313 |     return res
314 | ```
315 |
316 |
317 | ```python
318 | %timeit filter_odd_2(arr)
319 | ```
320 |
321 | 430 µs ± 9.25 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
322 |
323 |
324 | ### 3.4 list with filter
325 | Mean run time: 0.763 ms. Note that filter is slow here and is of little use in Python 3.6.
326 |
327 |
328 | ```python
329 | def filter_odd_3(arr):
330 |     return list(filter(lambda x: x % 2, arr))
331 | ```
332 |
333 |
334 | ```python
335 | %timeit filter_odd_3(arr)
336 | ```
337 |
338 | 763 µs ± 15.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
339 |
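The `list(...)` wrapper in `filter_odd_3` is needed because `filter` in Python 3 returns a lazy iterator rather than a list. A small sketch on a made-up 10-element input shows the behaviour; `odd_iter` and `odds` are illustrative names.

```python
arr = list(range(10))

odd_iter = filter(lambda x: x % 2, arr)

# In Python 3, filter returns a lazy iterator; nothing is computed yet.
print(isinstance(odd_iter, list))  # False

# Consuming the iterator produces the filtered values.
odds = list(odd_iter)
print(odds)  # [1, 3, 5, 7, 9]

# The iterator is now exhausted, so a second pass yields nothing.
print(list(odd_iter))  # []
```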
340 |
341 | ### 3.5 list with a generator expression
342 | Mean run time: 0.398 ms.
343 |
344 |
345 | ```python
346 | def filter_odd_4(arr):
347 |     return list((x for x in arr if x % 2))
348 | ```
349 |
350 |
351 | ```python
352 | %timeit filter_odd_4(arr)
353 | ```
354 |
355 | 398 µs ± 16.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
356 |
357 |
358 | ### 3.6 List comprehension with a condition
359 | Mean run time: 0.290 ms.
360 |
361 |
362 | ```python
363 | def filter_odd_5(arr):
364 |     return [x for x in arr if x % 2]
365 | ```
366 |
367 |
368 | ```python
369 | %timeit filter_odd_5(arr)
370 | ```
371 |
372 | 290 µs ± 5.54 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
373 |
374 |
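375 | ## 4. Adding two arrays
376 |
377 | Given two lists of the same length, compute the sums of the numbers at matching positions and return a list of the same length as the inputs. The final version is about 2.7x faster than the first.
378 |
379 | First, create two lists of length 10000.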
380 |
381 |
382 | ```python
383 | arr1 = list(range(10000))
384 | arr2 = list(range(10000))
385 | ```
386 |
387 | ### 4.1 The most conventional approach
388 | Create an empty list res, traverse the lists with a while loop, and append the sum of each pair of matching elements to res. Mean run time: 1.23 ms.
389 |
390 |
391 | ```python
392 | def arr_sum_0(arr1, arr2):
393 |     i = 0
394 |     n = len(arr1)
395 |     res = []
396 |     while i < n:
397 |         res.append(arr1[i] + arr2[i])
398 |         i += 1
399 |     return res
400 | ```
401 |
402 |
403 | ```python
404 | %timeit arr_sum_0(arr1, arr2)
405 | ```
406 |
407 | 1.23 ms ± 3.77 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
408 |
409 |
410 | ### 4.2 Replace the while loop with for range
411 | This avoids the extra overhead of the type checks incurred by i += 1. Mean run time: 0.997 ms.
412 |
413 |
414 | ```python
415 | def arr_sum_1(arr1, arr2):
416 |     res = []
417 |     for i in range(len(arr1)):
418 |         res.append(arr1[i] + arr2[i])
419 |     return res
420 | ```
421 |
422 |
423 | ```python
424 | %timeit arr_sum_1(arr1, arr2)
425 | ```
426 |
427 | 997 µs ± 7.42 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
428 |
429 |
430 | ### 4.3 Replace for range with for i, x in enumerate
431 | This partly avoids the extra overhead of the type checks incurred by arr[i]. Mean run time: 0.799 ms.
432 |
433 |
434 | ```python
435 | def arr_sum_2(arr1, arr2):
436 |     res = arr1.copy()
437 |     for i, x in enumerate(arr2):
438 |         res[i] += x
439 |     return res
440 | ```
441 |
442 |
443 | ```python
444 | %timeit arr_sum_2(arr1, arr2)
445 | ```
446 |
447 | 799 µs ± 16.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
448 |
449 |
450 | ### 4.4 Replace for range with for x, y in zip
451 | This avoids the extra overhead of the type checks incurred by arr[i]. Mean run time: 0.769 ms.
452 |
453 |
454 | ```python
455 | def arr_sum_3(arr1, arr2):
456 |     res = []
457 |     for x, y in zip(arr1, arr2):
458 |         res.append(x + y)
459 |     return res
460 | ```
461 |
462 |
463 | ```python
464 | %timeit arr_sum_3(arr1, arr2)
465 | ```
466 |
467 | 769 µs ± 12.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
468 |
469 |
470 | ### 4.5 List comprehension with zip
471 | Mean run time: 0.462 ms.
472 |
473 |
474 | ```python
475 | def arr_sum_4(arr1, arr2):
476 |     return [x + y for x, y in zip(arr1, arr2)]
477 | ```
478 |
479 |
480 | ```python
481 | %timeit arr_sum_4(arr1, arr2)
482 | ```
483 |
484 | 462 µs ± 3.43 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
485 |
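The zip-based versions rely on the stated assumption that both inputs have the same length, because `zip` silently stops at the shorter one. The sketch below shows this on made-up inputs. `arr_sum_map` is a hypothetical extra variant, not part of the original article and not timed here, which expresses the same pairing with `map` and `operator.add`.

```python
from operator import add


def arr_sum_4(arr1, arr2):
    return [x + y for x, y in zip(arr1, arr2)]


def arr_sum_map(arr1, arr2):
    # Hypothetical variant for illustration: map pairs the elements
    # of both iterables and applies operator.add to each pair.
    return list(map(add, arr1, arr2))


print(arr_sum_4([1, 2, 3], [10, 20, 30]))    # [11, 22, 33]
print(arr_sum_map([1, 2, 3], [10, 20, 30]))  # [11, 22, 33]

# With unequal lengths, zip truncates to the shorter input without raising.
print(arr_sum_4([1, 2, 3], [10, 20]))  # [11, 22]
```

By contrast, the index-based versions 4.1 and 4.2 raise an IndexError when arr2 is shorter than arr1.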
486 |
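487 | ## 5. Number of common elements in two lists
488 | Given two lists, count the elements they have in common. The elements within each list are unique. The final version is about 5000x faster than the first.
489 |
490 | First, create two lists and shuffle their elements.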
491 |
492 |
493 | ```python
494 | from random import shuffle
495 | arr1 = list(range(2000))
496 | shuffle(arr1)
497 | arr2 = list(range(1000, 3000))
498 | shuffle(arr2)
499 | ```
500 |
501 | ### 5.1 The most conventional approach
502 | Use nested while loops and test whether arr1[i] equals arr2[j]. Mean run time: 338 ms.
503 |
504 |
505 | ```python
506 | def n_common_0(arr1, arr2):
507 |     res = 0
508 |     i = 0
509 |     m = len(arr1)
510 |     n = len(arr2)
511 |     while i < m:
512 |         j = 0
513 |         while j < n:
514 |             if arr1[i] == arr2[j]:
515 |                 res += 1
516 |             j += 1
517 |         i += 1
518 |     return res
519 | ```
520 |
521 |
522 | ```python
523 | %timeit n_common_0(arr1, arr2)
524 | ```
525 |
526 | 338 ms ± 7.81 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
527 |
528 |
529 | ### 5.2 Replace the while loops with for range
530 | This avoids the extra overhead of the type checks incurred by i += 1. Mean run time: 233 ms.
531 |
532 |
533 | ```python
534 | def n_common_1(arr1, arr2):
535 |     res = 0
536 |     for i in range(len(arr1)):
537 |         for j in range(len(arr2)):
538 |             if arr1[i] == arr2[j]:
539 |                 res += 1
540 |     return res
541 | ```
542 |
543 |
544 | ```python
545 | %timeit n_common_1(arr1, arr2)
546 | ```
547 |
548 | 233 ms ± 10.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
549 |
550 |
551 | ### 5.3 Replace for range with for x in arr
552 | This avoids the extra overhead of the type checks incurred by arr[i]. Mean run time: 84.8 ms.
553 |
554 |
555 | ```python
556 | def n_common_2(arr1, arr2):
557 |     res = 0
558 |     for x in arr1:
559 |         for y in arr2:
560 |             if x == y:
561 |                 res += 1
562 |     return res
563 | ```
564 |
565 |
566 | ```python
567 | %timeit n_common_2(arr1, arr2)
568 | ```
569 |
570 | 84.8 ms ± 1.38 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
571 |
572 |
573 | ### 5.4 Replace the inner loop with if x in arr2
574 | Mean run time: 24.9 ms.
575 |
576 |
577 | ```python
578 | def n_common_3(arr1, arr2):
579 |     res = 0
580 |     for x in arr1:
581 |         if x in arr2:
582 |             res += 1
583 |     return res
584 | ```
585 |
586 |
587 | ```python
588 | %timeit n_common_3(arr1, arr2)
589 | ```
590 |
591 | 24.9 ms ± 1.39 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
592 |
593 |
594 | ### 5.5 Use a faster algorithm
595 | Sort both arrays with the .sort method, then traverse them in a single loop. This lowers the time complexity from O(n^2) to O(n log n). Mean run time: 0.329 ms. Note that .sort reorders the input lists in place.
596 |
597 |
598 | ```python
599 | def n_common_4(arr1, arr2):
600 |     arr1.sort()
601 |     arr2.sort()
602 |     res = i = j = 0
603 |     m, n = len(arr1), len(arr2)
604 |     while i < m and j < n:
605 |         if arr1[i] == arr2[j]:
606 |             res += 1
607 |             i += 1
608 |             j += 1
609 |         elif arr1[i] > arr2[j]:
610 |             j += 1
611 |         else:
612 |             i += 1
613 |     return res
614 | ```
615 |
616 |
617 | ```python
618 | %timeit n_common_4(arr1, arr2)
619 | ```
620 |
621 | 329 µs ± 12.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
622 |
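Because `n_common_4` calls `.sort()` on its arguments, it reorders the caller's lists, and any run after the first receives input that is already sorted. A non-mutating sketch of the same sort-and-merge idea is shown below; `n_common_sorted` is a hypothetical name, and using `sorted()` copies is a choice made here, not the article's implementation.

```python
def n_common_sorted(arr1, arr2):
    # sorted() returns new lists, so the caller's inputs are left untouched.
    a, b = sorted(arr1), sorted(arr2)
    res = i = j = 0
    m, n = len(a), len(b)
    # Walk both sorted lists in step, advancing whichever side is smaller.
    while i < m and j < n:
        if a[i] == b[j]:
            res += 1
            i += 1
            j += 1
        elif a[i] > b[j]:
            j += 1
        else:
            i += 1
    return res


arr1 = [5, 1, 4, 2, 3]
arr2 = [7, 3, 6, 5, 4]
print(n_common_sorted(arr1, arr2))  # 3
print(arr1)  # [5, 1, 4, 2, 3]
```

The copies cost extra memory, so whether this trade-off is worthwhile depends on whether the caller still needs the original order.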
623 |
624 | ### 5.6 Use a better data structure
625 | Convert the arrays to sets and take the size of their intersection. Mean run time: 0.067 ms.
626 |
627 |
628 | ```python
629 | def n_common_5(arr1, arr2):
630 |     return len(set(arr1) & set(arr2))
631 | ```
632 |
633 |
634 | ```python
635 | %timeit n_common_5(arr1, arr2)
636 | ```
637 |
638 | 67.2 µs ± 755 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
639 |
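The set version agrees with the nested-loop version only under the stated assumption that each list contains no duplicates. The sketch below illustrates this with made-up inputs; the `unique*` and `dup*` names are introduced only for this example.

```python
def n_common_2(arr1, arr2):
    # Nested loops: counts every matching pair (x, y).
    res = 0
    for x in arr1:
        for y in arr2:
            if x == y:
                res += 1
    return res


def n_common_5(arr1, arr2):
    # Sets: counts distinct values present in both lists.
    return len(set(arr1) & set(arr2))


unique1, unique2 = [1, 2, 3, 4], [3, 4, 5, 6]
print(n_common_2(unique1, unique2), n_common_5(unique1, unique2))  # 2 2

# With duplicates the two definitions diverge.
dup1, dup2 = [1, 1, 2], [1, 1, 3]
print(n_common_2(dup1, dup2), n_common_5(dup1, dup2))  # 4 1
```

The set approach also requires the elements to be hashable, which holds for the integers used here.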
640 |
641 |
645 |
--------------------------------------------------------------------------------
/More efficient array.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# 4 ways to make Python arrays more efficient"
8 | ]
9 | },
10 | {
11 | "cell_type": "markdown",
12 | "metadata": {},
13 | "source": [
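14 | "## 1. Why Python lists are slow\n",
15 | "A Python list is a dynamic array: its size can be adjusted, and the array stores pointers (PyObject*) to the individual list elements. The elements can have different types, for example my_list = ['a', 1, True]; the array actually holds three pointers, one to each of these elements. So why are Python lists slow compared with arrays in other languages? There are two main reasons:\n",
16 | "1. Python is a dynamically typed language, so type checks cost extra time at run time.\n",
17 | "2. Python, or more precisely CPython, has no JIT optimizer."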
18 | ]
19 | },
20 | {
21 | "cell_type": "markdown",
22 | "metadata": {},
23 | "source": [
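24 | "## 2. How to do fast array computation in Python\n",
25 | "The mainstream solutions today are:\n",
26 | "1. Numpy - A Numpy array is more like a C/C++ array: the elements share one data type, and the array methods (such as sum) are implemented in C.\n",
27 | "2. Numba - Uses JIT compilation to optimize Numpy performance. Both calls to Numpy methods and for loops over Numpy arrays can get faster.\n",
28 | "3. Numexpr - Avoids allocating memory for Numpy's intermediate results, which optimizes Numpy performance; mainly used for evaluating expressions on large arrays.\n",
29 | "4. Cython - Write C/C++ extensions for Python.\n",
30 | "\n",
31 | "Next, two examples demonstrate how to use these four tools."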
32 | ]
33 | },
34 | {
35 | "cell_type": "markdown",
36 | "metadata": {},
37 | "source": [
38 | "## 3. Sum of squares of an array"
39 | ]
40 | },
41 | {
42 | "cell_type": "code",
43 | "execution_count": 1,
44 | "metadata": {},
45 | "outputs": [],
46 | "source": [
47 | "arr = [x for x in range(10000)]"
48 | ]
49 | },
50 | {
51 | "cell_type": "markdown",
52 | "metadata": {},
53 | "source": [
54 | "### 3.1 for loop"
55 | ]
56 | },
57 | {
58 | "cell_type": "code",
59 | "execution_count": 2,
60 | "metadata": {},
61 | "outputs": [
62 | {
63 | "name": "stdout",
64 | "output_type": "stream",
65 | "text": [
66 | "The result is: 333283335000\n",
67 | "2.53 ms ± 91.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n"
68 | ]
69 | }
70 | ],
71 | "source": [
72 | "def sqr_sum(arr):\n",
73 | "    total = 0\n",
74 | "    for x in arr:\n",
75 | "        total += x ** 2\n",
76 | "    return total\n",
77 | "\n",
78 | "print(\"The result is:\", sqr_sum(arr))\n",
79 | "%timeit sqr_sum(arr)"
80 | ]
81 | },
82 | {
83 | "cell_type": "markdown",
84 | "metadata": {},
85 | "source": [
86 | "### 3.2 Numpy"
87 | ]
88 | },
89 | {
90 | "cell_type": "code",
91 | "execution_count": 3,
92 | "metadata": {},
93 | "outputs": [],
94 | "source": [
95 | "import numpy as np"
96 | ]
97 | },
98 | {
99 | "cell_type": "code",
100 | "execution_count": 4,
101 | "metadata": {},
102 | "outputs": [
103 | {
104 | "name": "stdout",
105 | "output_type": "stream",
106 | "text": [
107 | "The result is: 333283335000\n",
108 | "9.66 µs ± 275 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)\n"
109 | ]
110 | }
111 | ],
112 | "source": [
113 | "def sqr_sum(arr):\n",
114 | "    return (arr ** 2).sum()\n",
115 | "\n",
116 | "arr = np.array(arr)\n",
117 | "print(\"The result is:\", sqr_sum(arr))\n",
118 | "%timeit sqr_sum(arr)"
119 | ]
120 | },
121 | {
122 | "cell_type": "markdown",
123 | "metadata": {},
124 | "source": [
125 | "### 3.3 Numba"
126 | ]
127 | },
128 | {
129 | "cell_type": "code",
130 | "execution_count": 5,
131 | "metadata": {},
132 | "outputs": [],
133 | "source": [
134 | "from numba import jit"
135 | ]
136 | },
137 | {
138 | "cell_type": "code",
139 | "execution_count": 6,
140 | "metadata": {},
141 | "outputs": [
142 | {
143 | "name": "stdout",
144 | "output_type": "stream",
145 | "text": [
146 | "The result is: 333283335000\n",
147 | "3.39 µs ± 57.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)\n"
148 | ]
149 | }
150 | ],
151 | "source": [
152 | "@jit(nopython=True)\n",
153 | "def sqr_sum(arr):\n",
154 | "    return (arr ** 2).sum()\n",
155 | "\n",
156 | "arr = np.array(arr)\n",
157 | "print(\"The result is:\", sqr_sum(arr))\n",
158 | "%timeit sqr_sum(arr)"
159 | ]
160 | },
161 | {
162 | "cell_type": "markdown",
163 | "metadata": {},
164 | "source": [
165 | "### 3.4 Numexpr"
166 | ]
167 | },
168 | {
169 | "cell_type": "code",
170 | "execution_count": 7,
171 | "metadata": {},
172 | "outputs": [],
173 | "source": [
174 | "import numexpr as ne"
175 | ]
176 | },
177 | {
178 | "cell_type": "code",
179 | "execution_count": 8,
180 | "metadata": {},
181 | "outputs": [
182 | {
183 | "name": "stdout",
184 | "output_type": "stream",
185 | "text": [
186 | "The result is: 333283335000\n",
187 | "14.9 µs ± 144 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)\n"
188 | ]
189 | }
190 | ],
191 | "source": [
192 | "def sqr_sum(arr):\n",
193 | "    return ne.evaluate(\"sum(arr * arr)\")\n",
194 | "\n",
195 | "arr = np.array(arr)\n",
196 | "print(\"The result is:\", sqr_sum(arr))\n",
197 | "%timeit sqr_sum(arr)"
198 | ]
199 | },
200 | {
201 | "cell_type": "markdown",
202 | "metadata": {},
203 | "source": [
204 | "### 3.5 Cython"
205 | ]
206 | },
207 | {
208 | "cell_type": "code",
209 | "execution_count": 9,
210 | "metadata": {},
211 | "outputs": [],
212 | "source": [
213 | "%load_ext cython"
214 | ]
215 | },
216 | {
217 | "cell_type": "code",
218 | "execution_count": 10,
219 | "metadata": {},
220 | "outputs": [],
221 | "source": [
222 | "%%cython\n",
223 | "cimport numpy as np\n",
224 | "ctypedef np.int_t DTYPE_t\n",
225 | "\n",
226 | "def sqr_sum(np.ndarray[DTYPE_t] arr):\n",
227 | "    cdef:\n",
228 | "        DTYPE_t total = 0\n",
229 | "        DTYPE_t x\n",
230 | "        int i = 0\n",
231 | "        int n = len(arr)\n",
232 | "    while i < n:\n",
233 | "        total += arr[i] ** 2\n",
234 | "        i += 1\n",
235 | "    return total"
236 | ]
237 | },
238 | {
239 | "cell_type": "code",
240 | "execution_count": 11,
241 | "metadata": {},
242 | "outputs": [
243 | {
244 | "name": "stdout",
245 | "output_type": "stream",
246 | "text": [
247 | "The result is: 333283335000\n",
248 | "5.51 µs ± 62.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)\n"
249 | ]
250 | }
251 | ],
252 | "source": [
253 | "arr = np.array(arr, dtype=\"int\")\n",
254 | "print(\"The result is:\", sqr_sum(arr))\n",
255 | "%timeit sqr_sum(arr)"
256 | ]
257 | },
258 | {
259 | "cell_type": "markdown",
260 | "metadata": {},
261 | "source": [
262 | "## 4. Array transformation"
263 | ]
264 | },
265 | {
266 | "cell_type": "code",
267 | "execution_count": 12,
268 | "metadata": {},
269 | "outputs": [],
270 | "source": [
271 | "arr = [x for x in range(1000000)]"
272 | ]
273 | },
274 | {
275 | "cell_type": "markdown",
276 | "metadata": {},
277 | "source": [
278 | "### 4.1 for loop"
279 | ]
280 | },
281 | {
282 | "cell_type": "code",
283 | "execution_count": 13,
284 | "metadata": {},
285 | "outputs": [
286 | {
287 | "name": "stdout",
288 | "output_type": "stream",
289 | "text": [
290 | "The result is: [1, 3, 5, 7, 9] ...\n",
291 | "84.5 ms ± 381 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)\n"
292 | ]
293 | }
294 | ],
295 | "source": [
296 | "def transform(arr):\n",
297 | " return [x * 2 + 1 for x in arr]\n",
298 | "\n",
299 | "print(\"The result is:\", transform(arr)[:5], \"...\")\n",
300 | "%timeit transform(arr)"
301 | ]
302 | },
303 | {
304 | "cell_type": "markdown",
305 | "metadata": {},
306 | "source": [
307 | "### 4.2 Numpy"
308 | ]
309 | },
310 | {
311 | "cell_type": "code",
312 | "execution_count": 14,
313 | "metadata": {},
314 | "outputs": [],
315 | "source": [
316 | "import numpy as np"
317 | ]
318 | },
319 | {
320 | "cell_type": "code",
321 | "execution_count": 15,
322 | "metadata": {},
323 | "outputs": [
324 | {
325 | "name": "stdout",
326 | "output_type": "stream",
327 | "text": [
328 | "The result is: [1 3 5 7 9] ...\n",
329 | "803 µs ± 11.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n"
330 | ]
331 | }
332 | ],
333 | "source": [
334 | "def transform(arr):\n",
335 | " return arr * 2 + 1\n",
336 | "\n",
337 | "arr = np.array(arr)\n",
338 | "print(\"The result is:\", transform(arr)[:5], \"...\")\n",
339 | "%timeit transform(arr)"
340 | ]
341 | },
342 | {
343 | "cell_type": "markdown",
344 | "metadata": {},
345 | "source": [
346 | "### 4.3 Numba"
347 | ]
348 | },
349 | {
350 | "cell_type": "code",
351 | "execution_count": 16,
352 | "metadata": {},
353 | "outputs": [],
354 | "source": [
355 | "from numba import jit"
356 | ]
357 | },
358 | {
359 | "cell_type": "code",
360 | "execution_count": 17,
361 | "metadata": {},
362 | "outputs": [
363 | {
364 | "name": "stdout",
365 | "output_type": "stream",
366 | "text": [
367 | "The result is: [1 3 5 7 9] ...\n",
368 | "498 µs ± 8.71 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n"
369 | ]
370 | }
371 | ],
372 | "source": [
373 | "@jit(nopython=True)\n",
374 | "def transform(arr):\n",
375 | " return arr * 2 + 1\n",
376 | "\n",
377 | "arr = np.array(arr)\n",
378 | "print(\"The result is:\", transform(arr)[:5], \"...\")\n",
379 | "%timeit transform(arr)"
380 | ]
381 | },
382 | {
383 | "cell_type": "markdown",
384 | "metadata": {},
385 | "source": [
386 | "### 4.4 Numexpr"
387 | ]
388 | },
389 | {
390 | "cell_type": "code",
391 | "execution_count": 18,
392 | "metadata": {},
393 | "outputs": [],
394 | "source": [
395 | "import numexpr as ne"
396 | ]
397 | },
398 | {
399 | "cell_type": "code",
400 | "execution_count": 19,
401 | "metadata": {},
402 | "outputs": [
403 | {
404 | "name": "stdout",
405 | "output_type": "stream",
406 | "text": [
407 | "The result is: [1 3 5 7 9] ...\n",
408 | "369 µs ± 13.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n"
409 | ]
410 | }
411 | ],
412 | "source": [
413 | "def transform(arr):\n",
414 | " return ne.evaluate(\"arr * 2 + 1\")\n",
415 | "\n",
416 | "arr = np.array(arr)\n",
417 | "print(\"The result is:\", transform(arr)[:5], \"...\")\n",
418 | "%timeit transform(arr)"
419 | ]
420 | },
421 | {
422 | "cell_type": "markdown",
423 | "metadata": {},
424 | "source": [
425 | "### 4.5 Cython"
426 | ]
427 | },
428 | {
429 | "cell_type": "code",
430 | "execution_count": 20,
431 | "metadata": {},
432 | "outputs": [
433 | {
434 | "name": "stdout",
435 | "output_type": "stream",
436 | "text": [
437 | "The cython extension is already loaded. To reload it, use:\n",
438 | " %reload_ext cython\n"
439 | ]
440 | }
441 | ],
442 | "source": [
443 | "%load_ext cython"
444 | ]
445 | },
446 | {
447 | "cell_type": "code",
448 | "execution_count": 21,
449 | "metadata": {},
450 | "outputs": [],
451 | "source": [
452 | "%%cython\n",
453 | "import numpy as np\n",
454 | "cimport numpy as np\n",
455 | "ctypedef np.int_t DTYPE_t\n",
456 | "\n",
457 | "def transform(np.ndarray[DTYPE_t] arr):\n",
458 | " cdef:\n",
459 | " np.ndarray[DTYPE_t] new_arr = np.empty_like(arr)\n",
460 | " int i = 0\n",
461 | " int n = len(arr)\n",
462 | " while i < n:\n",
463 | " new_arr[i] = arr[i] * 2 + 1\n",
464 | " i += 1\n",
465 | " return new_arr"
466 | ]
467 | },
468 | {
469 | "cell_type": "code",
470 | "execution_count": 22,
471 | "metadata": {},
472 | "outputs": [
473 | {
474 | "name": "stdout",
475 | "output_type": "stream",
476 | "text": [
477 | "The result is: [1 3 5 7 9] ...\n",
478 | "887 µs ± 29.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n"
479 | ]
480 | }
481 | ],
482 | "source": [
483 | "arr = np.array(arr)\n",
484 | "print(\"The result is:\", transform(arr)[:5], \"...\")\n",
485 | "%timeit transform(arr)"
486 | ]
487 | },
488 | {
489 | "cell_type": "markdown",
490 | "metadata": {},
491 | "source": [
492 |     "## 5. References\n",
493 | "[How does python have different data types in an array?](https://stackoverflow.com/questions/10558670/how-does-python-have-different-data-types-in-an-array) \n",
494 | "[Why are Python Programs often slower than the Equivalent Program Written in C or C++?](https://stackoverflow.com/questions/3033329/why-are-python-programs-often-slower-than-the-equivalent-program-written-in-c-or) \n",
495 | "[How Fast Numpy Really is and Why?](https://towardsdatascience.com/how-fast-numpy-really-is-e9111df44347)"
496 | ]
497 | },
498 | {
499 | "cell_type": "code",
500 | "execution_count": null,
501 | "metadata": {},
502 | "outputs": [],
503 | "source": []
504 | }
505 | ],
506 | "metadata": {
507 | "kernelspec": {
508 | "display_name": "Python 3",
509 | "language": "python",
510 | "name": "python3"
511 | },
512 | "language_info": {
513 | "codemirror_mode": {
514 | "name": "ipython",
515 | "version": 3
516 | },
517 | "file_extension": ".py",
518 | "mimetype": "text/x-python",
519 | "name": "python",
520 | "nbconvert_exporter": "python",
521 | "pygments_lexer": "ipython3",
522 | "version": "3.6.6"
523 | }
524 | },
525 | "nbformat": 4,
526 | "nbformat_minor": 4
527 | }
528 |
--------------------------------------------------------------------------------
/Using C++ in Cython.md:
--------------------------------------------------------------------------------
1 | ## Using C++ in Cython
2 | Author: tushushu  
3 | Project: https://github.com/tushushu/flying-python  
4 | 
5 | ## 1. Using C++ in Jupyter Notebook
6 | - First, load the Cython extension with the magic command ``%load_ext Cython``.
7 | - Next, run Cython code as C++ with the cell magic ``%%cython --cplus``.
8 | - On macOS, use ``%%cython --cplus --compile-args=-stdlib=libc++ --link-args=-stdlib=libc++`` instead; for details see https://stackoverflow.com/questions/57367764/cant-import-cpplist-into-cython
9 |
10 |
11 | ```python
12 | %load_ext Cython
13 | ```
14 |
15 |
16 | ```cython
17 | %%cython --cplus --compile-args=-stdlib=libc++ --link-args=-stdlib=libc++
18 | # Note: use 'cimport' rather than 'import'
19 | from libcpp.string cimport string
20 | cdef string s
21 | s = b"Hello world!"
22 | print(s.decode("utf-8"))
23 | ```
24 |
25 | Hello world!
26 |
27 |
28 | ## 2. Converting between C++ and Python types
29 | 
30 | | Python type (input) | C++ type | Python type (output) |
31 | | ------ | ------ | ------ |
32 | | bytes | std::string | bytes |
33 | |iterable|std::vector|list|
34 | |iterable|std::list|list|
35 | |iterable|std::set|set|
36 | |iterable (len 2)|std::pair|tuple (len 2)|
37 |
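Because `std::string` maps to `bytes` rather than `str`, text usually has to be encoded before it crosses into C++ and decoded on the way back, as the `s.decode("utf-8")` call in section 1 does. The pure-Python sketch below shows only that round trip; the helper names `to_cpp_bytes` and `from_cpp_bytes` are hypothetical and are not part of Cython.

```python
def to_cpp_bytes(text: str) -> bytes:
    """Encode a Python str as UTF-8 bytes, the form a std::string argument accepts."""
    return text.encode("utf-8")


def from_cpp_bytes(raw: bytes) -> str:
    """Decode bytes coming back from a std::string into a Python str."""
    return raw.decode("utf-8")


payload = to_cpp_bytes("Hello world!")
print(payload)                  # the bytes object that would be handed to C++
print(from_cpp_bytes(payload))  # the text recovered on the Python side
```

UTF-8 is assumed here because the original example decodes with it; any encoding works as long as both directions agree.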
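38 | ## 3. Using the C++ STL
39 | 
40 | ### 3.1 Using C++ Vector
41 | A replacement for the Python list.
42 | 1. Initialization - initialize from a Python iterable; the nested (element) type must be declared.
43 | 2. Iteration - increment an index inside a while loop.
44 | 3. Access - use the '[]' operator, just as in Python.
45 | 4. Append - use the push_back method of the C++ vector, which is similar to the append method of a Python list.
46 | 
47 | Finally, we compare performance by implementing an element-counting function in both Python and C++. In the timings below, the C++ version is roughly 230 times faster (29.9 µs vs. 130 ns).  
48 | Note: to keep the comparison fair, the functions take no arguments and read variables defined outside the function body. This keeps the cost of converting the Python list to a C++ vector out of the C++ timing. If that conversion is included, the C++ version is only about 4 times faster.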
50 |
51 | ```cython
52 | %%cython --cplus --compile-args=-stdlib=libc++ --link-args=-stdlib=libc++
53 | from libcpp.vector cimport vector
54 | # Initialize from a Python object
55 | cdef vector[int] vec = range(5)
56 | # Iterate
57 | cdef:
58 |     int i = 0
59 |     int n = vec.size()
60 | print("Start iterating...")
61 | while i < n:
62 |     # Access
63 |     print("\tThe element at position %d is %d" % (i, vec[i]))
64 |     i += 1
65 | print()
66 | # Append
67 | vec.push_back(5)
68 | print("After appending, vec becomes", vec)
69 | ```
70 | 
71 |     Start iterating...
72 |     	The element at position 0 is 0
73 |     	The element at position 1 is 1
74 |     	The element at position 2 is 2
75 |     	The element at position 3 is 3
76 |     	The element at position 4 is 4
77 |     
78 |     After appending, vec becomes [0, 1, 2, 3, 4, 5]
79 |
80 |
81 |
82 | ```python
83 | arr = [x // 100 for x in range(1000)]
84 | target = 6
85 |
86 | def count_py():
87 | return sum(1 for x in arr if x == target)
88 |
89 | print("Python implementation, result: %d!" % count_py())
90 | ```
91 | 
92 |     Python implementation, result: 100!
93 |
94 |
95 |
96 | ```cython
97 | %%cython --cplus --compile-args=-stdlib=libc++ --link-args=-stdlib=libc++
98 | from libcpp.vector cimport vector
99 |
100 | cdef:
101 | int target = 6
102 | vector[int] v = [x // 100 for x in range(1000)]
103 |
104 | cdef int _count_cpp():
105 | cdef:
106 | int i = 0
107 | int n = v.size()
108 | int ret = 0
109 | while i < n:
110 | if v[i] == target:
111 | ret += 1
112 | i += 1
113 | return ret
114 |
115 | def count_cpp():
116 | return _count_cpp()
117 |
118 | print("Cython (C++) implementation, result: %d!" % count_cpp())
119 | ```
120 | 
121 |     Cython (C++) implementation, result: 100!
122 |
123 |
124 |
125 | ```python
126 | print("Comparing the performance of the Python and C++ versions...")
127 | %timeit count_py()
128 | %timeit count_cpp()
129 | ```
130 | 
131 |     Comparing the performance of the Python and C++ versions...
132 | 29.9 µs ± 995 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
133 | 130 ns ± 2.91 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
134 |
135 |
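136 | ### 3.2 Using C++ Unordered Map
137 | A replacement for the Python dict.
138 | 1. Initialization - initialize from a Python iterable; the nested (key and value) types must be declared.
139 | 2. Iteration - increment an iterator inside a while loop.
140 | 3. Access - dereference the iterator with deref (the '*' operator in C++), which returns a pair; use .first for the key and .second for the value.
141 | 4. Lookup - use unordered_map.count, which returns 1 or 0, or unordered_map.find, which returns an iterator; if the iterator equals unordered_map.end(), the key was not found.
142 | 5. Insert/update - unordered_map[key] = value. If the key does not exist, the '[]' operator inserts it with a default value such as 0.0. So unless you are sure this cannot cause a bug, check that the key exists before modifying its value. This behavior is somewhat similar to Python's defaultdict.
143 | 
144 | Finally, we compare performance by implementing a conditional sum over a map in both Python and C++. The C++ version is about 40 times faster.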
145 |
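The comparison with `defaultdict` in item 5 can be seen directly in plain Python. The sketch below is only an analogy for the behavior of the C++ '[]' operator, not Cython code; the variable names are made up for illustration.

```python
from collections import defaultdict

# defaultdict(float) supplies 0.0 for missing keys, much like unordered_map[int, float].
scores = defaultdict(float)
scores[3] = 0.3

# Merely reading a missing key inserts it with the default value.
_ = scores[-2]
print(dict(scores))

# A plain dict never inserts on lookup when .get() or 'in' is used,
# which mirrors checking count() or find() before touching the map.
plain = {3: 0.3}
value = plain.get(-2, 0.0)
print(value, -2 in plain)
```

The practical takeaway is the same in both languages: use a membership test (`count`/`find` in C++, `in`/`.get()` in Python) when a lookup must not create the key.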
146 |
147 | ```cython
148 | %%cython --cplus --compile-args=-stdlib=libc++ --link-args=-stdlib=libc++
149 | from cython.operator cimport dereference as deref, preincrement as inc
150 | from libcpp.unordered_map cimport unordered_map
151 | # Initialize from a Python object
152 | cdef unordered_map[int, float] mymap = {i: i/10 for i in range(10)}
153 | # Iterate
154 | cdef:
155 |     unordered_map[int, float].iterator it = mymap.begin()
156 |     unordered_map[int, float].iterator end = mymap.end()
157 | print("Start iterating...")
158 | while it != end:
159 |     # Access
160 |     print("\tKey is %d, Value is %.1f" % (deref(it).first, deref(it).second))
161 |     inc(it)
162 | print()
163 | 
164 | # Lookup
165 | print("Start searching...")
166 | if mymap.count(-2):
167 |     print("\tElement -2 exists!")
168 | else:
169 |     print("\tElement -2 does not exist!")
170 | 
171 | it = mymap.find(3)
172 | if it != end:
173 |     print("\tElement 3 exists, its value is %.1f!" % deref(it).second)
174 | else:
175 |     print("\tElement 3 does not exist!")
176 | print()
177 | 
178 | # Update
179 | print("Modifying elements...")
180 | if mymap.count(3):
181 |     mymap[3] += 1.0
182 | mymap[-2]  # Key -2 does not exist, so it is inserted with the default value 0.0
183 | print("\tKey is 3, Value is %.1f" % mymap[3])
184 | print("\tKey is -2, Value is %.1f" % mymap[-2])
185 | ```
186 | 
187 |     Start iterating...
188 |     	Key is 0, Value is 0.0
189 |     	Key is 1, Value is 0.1
190 |     	Key is 2, Value is 0.2
191 |     	Key is 3, Value is 0.3
192 |     	Key is 4, Value is 0.4
193 |     	Key is 5, Value is 0.5
194 |     	Key is 6, Value is 0.6
195 |     	Key is 7, Value is 0.7
196 |     	Key is 8, Value is 0.8
197 |     	Key is 9, Value is 0.9
198 |     
199 |     Start searching...
200 |     	Element -2 does not exist!
201 |     	Element 3 exists, its value is 0.3!
202 |     
203 |     Modifying elements...
204 |     	Key is 3, Value is 1.3
205 |     	Key is -2, Value is 0.0
206 |
207 |
208 |
209 | ```python
210 | my_map = {x: x for x in range(100)}
211 | target = 50
212 |
213 | def sum_lt_py():
214 | return sum(my_map[x] for x in my_map if x < target)
215 |
216 | print("Python implementation, result: %d!" % sum_lt_py())
217 | ```
218 | 
219 |     Python implementation, result: 1225!
220 |
221 |
222 |
223 | ```cython
224 | %%cython --cplus --compile-args=-stdlib=libc++ --link-args=-stdlib=libc++
225 | from libcpp.unordered_map cimport unordered_map
226 | from cython.operator cimport dereference as deref, preincrement as inc
227 |
228 | cdef:
229 | unordered_map[int, int] my_map = {x: x for x in range(100)}
230 | int target = 50
231 |
232 | cdef _sum_lt_cpp():
233 | cdef:
234 | unordered_map[int, int].iterator it = my_map.begin()
235 |         int ret = 0  # must be initialized: a C int has no default value
236 | while it != my_map.end():
237 | if deref(it).first < target:
238 | ret += deref(it).second
239 | inc(it)
240 | return ret
241 |
242 | def sum_lt_cpp():
243 | return _sum_lt_cpp()
244 |
245 | print("Cython (C++) implementation, result: %d!" % sum_lt_cpp())
246 | ```
247 | 
248 |     Cython (C++) implementation, result: 1225!
249 |
250 |
251 |
252 | ```python
253 | print("Comparing the performance of the Python and C++ versions...")
254 | %timeit sum_lt_py()
255 | %timeit sum_lt_cpp()
256 | ```
257 | 
258 |     Comparing the performance of the Python and C++ versions...
259 | 6.56 µs ± 117 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
260 | 162 ns ± 6.29 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
261 |
262 |
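263 | ### 3.3 Using C++ Unordered Set
264 | A replacement for the Python set.
265 | 1. Initialization - initialize from a Python iterable; the nested (element) type must be declared.
266 | 2. Iteration - increment an iterator inside a while loop.
267 | 3. Access - dereference the iterator with deref (the '*' operator in C++).
268 | 4. Lookup - use unordered_set.count, which returns 1 or 0.
269 | 5. Insert - use unordered_set.insert; if the element already exists, nothing is added.
270 | 6. Intersection, union, difference - as far as I know, unordered_set provides no such operations, so developers have to implement them themselves; this is less convenient than Python's set.
271 | 
272 | Finally, we compare performance by implementing set intersection in both Python and C++. Here the C++ version is about 20 times **slower**. For details see https://stackoverflow.com/questions/54763112/how-to-improve-stdset-intersection-performance-in-c  
273 | If we only count the elements the two sets have in common, the C++ version is about 6 times faster than Python. This suggests that lookups in a C++ unordered set are fast but building one is slow (the intersection timing also includes converting the result back to a Python set).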
274 |
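The counting strategy used in the C++ comparison (walk the smaller set and probe the larger one) can be sketched in pure Python to make the algorithm explicit. This is an illustrative re-implementation with a hypothetical function name, not the code that was timed.

```python
def count_common(a: set, b: set) -> int:
    """Count elements present in both sets without building the intersection."""
    # Iterate over the smaller set so the number of lookups is minimal.
    small, large = (a, b) if len(a) <= len(b) else (b, a)
    total = 0
    for item in small:
        if item in large:  # average O(1) hash lookup
            total += 1
    return total


print(count_common(set(range(100)), set(range(50, 60))))
```

No result set is allocated, which is the part the text identifies as expensive on the C++ side.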
275 |
276 | ```cython
277 | %%cython --cplus --compile-args=-stdlib=libc++ --link-args=-stdlib=libc++
278 | from cython.operator cimport dereference as deref, preincrement as inc
279 | from libcpp.unordered_set cimport unordered_set
280 | # Initialize from a Python object
281 | cdef unordered_set[int] myset = {i for i in range(5)}
282 | # Iterate
283 | cdef:
284 |     unordered_set[int].iterator it = myset.begin()
285 |     unordered_set[int].iterator end = myset.end()
286 | print("Start iterating...")
287 | while it != end:
288 |     # Access
289 |     print("\tValue is %d" % deref(it))
290 |     inc(it)
291 | print()
292 | 
293 | # Lookup
294 | print("Start searching...")
295 | if myset.count(-2):
296 |     print("\tElement -2 exists!")
297 | else:
298 |     print("\tElement -2 does not exist!")
299 | 
300 | print()
301 | 
302 | # Insert
303 | print("Inserting elements...")
304 | myset.insert(0)
305 | myset.insert(-1)
306 | 
307 | print("\tMyset is: ", myset)
308 | ```
309 | 
310 |     Start iterating...
311 |     	Value is 0
312 |     	Value is 1
313 |     	Value is 2
314 |     	Value is 3
315 |     	Value is 4
316 |     
317 |     Start searching...
318 |     	Element -2 does not exist!
319 |     
320 |     Inserting elements...
321 |     	Myset is:  {0, 1, 2, 3, 4, -1}
322 |
323 |
324 |
325 | ```python
326 | myset1 = {x for x in range(100)}
327 | myset2 = {x for x in range(50, 60)}
328 |
329 | def intersection_py():
330 | return myset1 & myset2
331 |
332 | print("Python implementation, result: %s!" % intersection_py())
333 | ```
334 | 
335 |     Python implementation, result: {50, 51, 52, 53, 54, 55, 56, 57, 58, 59}!
336 |
337 |
338 |
339 | ```cython
340 | %%cython --cplus --compile-args=-stdlib=libc++ --link-args=-stdlib=libc++
341 | from cython.operator cimport dereference as deref, preincrement as inc
342 | from libcpp.unordered_set cimport unordered_set
343 |
344 | cdef:
345 | unordered_set[int] myset1 = {x for x in range(100)}
346 | unordered_set[int] myset2 = {x for x in range(50, 60)}
347 |
348 | cdef unordered_set[int] _intersection_cpp():
349 | cdef:
350 | unordered_set[int].iterator it = myset1.begin()
351 | unordered_set[int] ret
352 | while it != myset1.end():
353 | if myset2.count(deref(it)):
354 | ret.insert(deref(it))
355 | inc(it)
356 | return ret
357 |
358 | def intersection_cpp():
359 | return _intersection_cpp()
360 |
361 | print("Cython (C++) implementation, result: %s!" % intersection_cpp())
362 | ```
363 | 
364 |     Cython (C++) implementation, result: {50, 51, 52, 53, 54, 55, 56, 57, 58, 59}!
365 |
366 |
367 |
368 | ```python
369 | print("Comparing the performance of the Python and C++ versions...")
370 | %timeit intersection_py()
371 | %timeit intersection_cpp()
372 | ```
373 | 
374 |     Comparing the performance of the Python and C++ versions...
375 | 274 ns ± 13.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
376 | 5.28 µs ± 220 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
377 |
378 |
379 |
380 | ```python
381 | myset1 = {x for x in range(100)}
382 | myset2 = {x for x in range(50, 60)}
383 |
384 | def count_common_py():
385 | return len(myset1 & myset2)
386 |
387 | print("Python implementation, result: %s!" % count_common_py())
388 | ```
389 | 
390 |     Python implementation, result: 10!
391 |
392 |
393 |
394 | ```cython
395 | %%cython --cplus --compile-args=-stdlib=libc++ --link-args=-stdlib=libc++
396 | from cython.operator cimport dereference as deref, preincrement as inc
397 | from libcpp.unordered_set cimport unordered_set
398 |
399 | cdef:
400 | unordered_set[int] myset2 = {x for x in range(100)}
401 | unordered_set[int] myset1 = {x for x in range(50, 60)}
402 |
403 | cdef int _count_common_cpp():
404 | if myset1.size() > myset2.size():
405 | myset1.swap(myset2)
406 | cdef:
407 | unordered_set[int].iterator it = myset1.begin()
408 | int ret = 0
409 | while it != myset1.end():
410 | if myset2.count(deref(it)):
411 | ret += 1
412 | inc(it)
413 | return ret
414 |
415 | def count_common_cpp():
416 | return _count_common_cpp()
417 |
418 | print("Cython (C++) implementation, result: %s!" % count_common_cpp())
419 | ```
420 | 
421 |     Cython (C++) implementation, result: 10!
422 |
423 |
424 |
425 | ```python
426 | print("Comparing the performance of the Python and C++ versions...")
427 | %timeit count_common_py()
428 | %timeit count_common_cpp()
429 | ```
430 | 
431 |     Comparing the performance of the Python and C++ versions...
432 | 295 ns ± 5.91 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
433 | 46.1 ns ± 0.785 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
434 |
435 |
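436 | ## 4. Pass by value and pass by reference
437 | Python always passes a reference to the object. A function can therefore modify a mutable container (such as a list or set) in place, while immutable objects (such as int or float) cannot be changed by the callee, so they behave as if passed by value. To stop a function from modifying a container, make a copy first, for example with deepcopy.  
438 | In C++, by contrast, arguments are passed by value by default, and passing by reference must be declared explicitly.  
439 | Taking a vector of int as an example, v1 is left unchanged by pass_value but is modified by pass_reference.
440 | - Pass by value uses ``vector[int]``: pass_value receives only a copy of v1, so it cannot modify v1.
441 | - Pass by reference uses ``vector[int]&``: pass_reference receives a reference to v1, so it can modify v1.
442 | 
443 | The two code blocks below show how Python and C++ differ.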
444 |
445 |
446 | ```python
447 | from copy import deepcopy
448 |
449 | def pass_value(v):
450 | v = deepcopy(v)
451 | v[0] = -1
452 |
453 | def pass_reference(v):
454 | v[0] = -1
455 |
456 | v1 = [0, 0, 0]
457 | print("Initial value of v1: %s" % v1)
458 | pass_value(v1)
459 | print("After pass_value, v1 is %s" % v1)
460 | pass_reference(v1)
461 | print("After pass_reference, v1 is %s" % v1)
462 | ```
463 | 
464 |     Initial value of v1: [0, 0, 0]
465 |     After pass_value, v1 is [0, 0, 0]
466 |     After pass_reference, v1 is [-1, 0, 0]
467 |
468 |
469 |
470 | ```cython
471 | %%cython --cplus --compile-args=-stdlib=libc++ --link-args=-stdlib=libc++
472 |
473 | from libcpp.vector cimport vector
474 |
475 | cdef void pass_value(vector[int] v):
476 | v[0] = -1
477 |
478 | cdef void pass_reference(vector[int]& v):
479 | v[0] = -1
480 |
481 | cdef vector[int] v1 = [0, 0, 0]
482 | print("Initial value of v1: %s" % v1)
483 | pass_value(v1)
484 | print("After pass_value, v1 is %s" % v1)
485 | pass_reference(v1)
486 | print("After pass_reference, v1 is %s" % v1)
487 | ```
488 | 
489 |     Initial value of v1: [0, 0, 0]
490 |     After pass_value, v1 is [0, 0, 0]
491 |     After pass_reference, v1 is [-1, 0, 0]
492 |
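As a side note on the Python half of this comparison, the difference between rebinding a parameter and mutating the object it refers to explains why lists can be changed by a callee while ints cannot. The function names below are hypothetical and exist only for this sketch.

```python
def rebind(values):
    # Assignment rebinds the local name only; the caller's list is untouched.
    values = [-1, -1, -1]


def mutate(values):
    # Item assignment changes the shared list object in place.
    values[0] = -1


def increment(number):
    # ints are immutable, so += creates a new object bound to the local name.
    number += 1


v1 = [0, 0, 0]
n = 1

rebind(v1)
print(v1)  # still the original contents

mutate(v1)
print(v1)  # first element changed

increment(n)
print(n)   # unchanged
```

In C++ terms, every Python argument behaves like a pointer passed by value: the callee can modify what it points to, but cannot make the caller's variable point somewhere else.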
493 |
494 | ## 5. Numeric ranges
495 | Python has a single int type whose range is effectively unbounded as long as memory allows, so Python users rarely worry about integer overflow. C++ requires more care. Typical sizes and ranges of the C++ numeric types are listed below; the actual sizes depend on the platform (for example, long int is 8 bytes on 64-bit Linux and macOS but 4 bytes on 64-bit Windows).
496 | 
497 | 
498 | |Type |Typical Size |Typical Range|
499 | | ------ | ------ | ------ |
500 | |char |1 byte |-128 to 127 or 0 to 255|
501 | |unsigned char |1 byte |0 to 255|
502 | |signed char |1 byte |-128 to 127|
503 | |int |4 bytes |-2,147,483,648 to 2,147,483,647|
504 | |unsigned int |4 bytes |0 to 4,294,967,295|
505 | |signed int |4 bytes |-2,147,483,648 to 2,147,483,647|
506 | |short int |2 bytes |-32,768 to 32,767|
507 | |unsigned short int |2 bytes |0 to 65,535|
508 | |signed short int |2 bytes |-32,768 to 32,767|
509 | |long int |4 bytes |-2,147,483,648 to 2,147,483,647|
510 | |signed long int |4 bytes |same as long int|
511 | |unsigned long int |4 bytes |0 to 4,294,967,295|
512 | |long long int |8 bytes |-(2^63) to (2^63)-1|
513 | |unsigned long long int |8 bytes |0 to 18,446,744,073,709,551,615|
514 | |float |4 bytes ||
515 | |double |8 bytes ||
516 | |long double |12 bytes ||
517 | |wchar_t |2 or 4 bytes |1 wide character|
518 |
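Since the table lists typical values only, it can be worth checking the sizes on the machine at hand. The sketch below uses the standard-library `ctypes` module to query the size of a few C integer types and derives the two's-complement range from the byte count; the helper names `signed_range` and `unsigned_range` are made up for illustration.

```python
import ctypes


def signed_range(n_bytes: int) -> tuple:
    """Range of a two's-complement signed integer of the given size."""
    bits = 8 * n_bytes
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def unsigned_range(n_bytes: int) -> tuple:
    """Range of an unsigned integer of the given size."""
    return 0, 2 ** (8 * n_bytes) - 1


c_types = [
    ("short", ctypes.c_short),
    ("int", ctypes.c_int),
    ("long", ctypes.c_long),
    ("long long", ctypes.c_longlong),
]

for name, c_type in c_types:
    size = ctypes.sizeof(c_type)  # size in bytes on this platform
    low, high = signed_range(size)
    print(f"{name:<10} {size} bytes  {low} to {high}")
```

The printed sizes depend on the platform and compiler ABI, so no fixed output is shown here.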
519 |
520 | For example, the functions below produce wrong results.
521 |
522 |
523 | ```cython
524 | %%cython --cplus --compile-args=-stdlib=libc++ --link-args=-stdlib=libc++
525 | def sum_py(num1, num2):
526 | print("The result by python is:", num1 + num2)
527 |
528 | cdef int _sum_cpp(int num1, int num2): # fix: int -> long int (where long is 8 bytes)
529 | return num1 + num2
530 |
531 | def sum_cpp(num1, num2):
532 | print("The result by cpp is:", _sum_cpp(num1, num2))
533 | ```
534 |
535 |
536 | ```python
537 | sum_py(2**31-1, 1)
538 | sum_cpp(2**31-1, 1)
539 | ```
540 |
541 | The result by python is: 2147483648
542 | The result by cpp is: -2147483648
543 |
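The wrapped result above can be reproduced without compiling anything. The sketch below uses `ctypes.c_int32` from the standard library to truncate a Python int to 32 bits; the helper name `add_int32` is hypothetical. Strictly speaking, signed overflow is undefined behavior in C++, so this only models the wraparound that two's-complement hardware typically produces.

```python
import ctypes


def add_int32(a: int, b: int) -> int:
    """Add two ints and truncate the result to a 32-bit signed integer."""
    # ctypes does no overflow checking, so the value wraps around.
    return ctypes.c_int32(a + b).value


print(2**31 - 1 + 1)             # Python int: no overflow
print(add_int32(2**31 - 1, 1))   # 32-bit int: wraps to the most negative value
```

Any sum that fits in 32 bits passes through unchanged, which is why such bugs often stay hidden until inputs grow.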
544 |
545 |
546 | ```cython
547 | %%cython --cplus --compile-args=-stdlib=libc++ --link-args=-stdlib=libc++
548 | from libcpp cimport bool
549 |
550 | def lt_py(num1, num2):
551 | print("The result by python is:", num1 < num2)
552 |
553 | cdef bool _lt_cpp(float num1, float num2): # fix: float -> double
554 |     return num1 < num2
555 |
556 | def lt_cpp(num1, num2):
557 | print("The result by cpp is:", _lt_cpp(num1, num2))
558 | ```
559 |
560 |
561 | ```python
562 | lt_py(1234567890.0, 1234567891.0)
563 | lt_cpp(1234567890.0, 1234567891.0)
564 | ```
565 |
566 | The result by python is: True
567 | The result by cpp is: False
568 |
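The comparison fails because a C++ `float` is a 32-bit value with a 24-bit significand, so near 1.2 billion it can only represent multiples of 128. A minimal sketch with the standard-library `struct` module shows both inputs rounding to the same 32-bit value; the helper name `to_float32` is made up for this example.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]


a = to_float32(1234567890.0)
b = to_float32(1234567891.0)

print(a, b)    # both collapse to the same representable value
print(a < b)   # the difference is lost, so the comparison is False
```

Declaring the parameters as `double` keeps the full 53-bit significand of a Python float, which is what the `float -> double` comment in the Cython code refers to.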
569 |
570 |
574 |
--------------------------------------------------------------------------------
/Itertools for efficient looping.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
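7 |     "# Python Itertools - Efficient looping\n",
8 |     "Author: tushushu  \n",
9 |     "Project: https://github.com/tushushu/flying-python  \n",
10 |     "\n",
11 |     "The official Python documentation describes the itertools module as \"efficient looping\". Some of its tools do improve performance, while others are no faster and merely save development time; overusing them can even hurt readability. Let's put the itertools family through its paces.\n"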
12 | ]
13 | },
14 | {
15 | "cell_type": "markdown",
16 | "metadata": {},
17 | "source": [
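18 |     "## 1. Cumulative sum\n",
19 |     "Given a list An, return its running sums Sn.  \n",
20 |     "Example:\n",
21 |     "* Input: [1, 2, 3, 4, 5]\n",
22 |     "* Output: [1, 3, 6, 10, 15]  \n",
23 |     "\n",
24 |     "With accumulate, the function is about 2.9 times faster in the timings below (61 µs vs. 21.3 µs)."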
25 | ]
26 | },
27 | {
28 | "cell_type": "code",
29 | "execution_count": 1,
30 | "metadata": {},
31 | "outputs": [],
32 | "source": [
33 | "from itertools import accumulate"
34 | ]
35 | },
36 | {
37 | "cell_type": "code",
38 | "execution_count": 2,
39 | "metadata": {},
40 | "outputs": [],
41 | "source": [
42 | "def _accumulate_list(arr):\n",
43 | " tot = 0\n",
44 | " for x in arr:\n",
45 | " tot += x\n",
46 | " yield tot\n",
47 | "\n",
48 | "def accumulate_list(arr):\n",
49 | " return list(_accumulate_list(arr))"
50 | ]
51 | },
52 | {
53 | "cell_type": "code",
54 | "execution_count": 3,
55 | "metadata": {},
56 | "outputs": [],
57 | "source": [
58 | "def fast_accumulate_list(arr):\n",
59 | " return list(accumulate(arr))"
60 | ]
61 | },
62 | {
63 | "cell_type": "code",
64 | "execution_count": 4,
65 | "metadata": {},
66 | "outputs": [],
67 | "source": [
68 | "arr = list(range(1000))"
69 | ]
70 | },
71 | {
72 | "cell_type": "code",
73 | "execution_count": 5,
74 | "metadata": {},
75 | "outputs": [
76 | {
77 | "name": "stdout",
78 | "output_type": "stream",
79 | "text": [
80 | "61 µs ± 2.91 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)\n"
81 | ]
82 | }
83 | ],
84 | "source": [
85 | "%timeit accumulate_list(arr)"
86 | ]
87 | },
88 | {
89 | "cell_type": "code",
90 | "execution_count": 6,
91 | "metadata": {},
92 | "outputs": [
93 | {
94 | "name": "stdout",
95 | "output_type": "stream",
96 | "text": [
97 | "21.3 µs ± 811 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)\n"
98 | ]
99 | }
100 | ],
101 | "source": [
102 | "%timeit fast_accumulate_list(arr)"
103 | ]
104 | },
105 | {
106 | "cell_type": "markdown",
107 | "metadata": {},
108 | "source": [
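109 |     "## 2. Selecting data\n",
110 |     "Given a list data and a list selectors of 0/1 flags, return the selected items.  \n",
111 |     "Example:\n",
112 |     "* Input: [1, 2, 3, 4, 5], [0, 1, 0, 1, 0]\n",
113 |     "* Output: [2, 4]  \n",
114 |     "\n",
115 |     "With compress, the function is about 2.6 times faster in the timings below (341 µs vs. 130 µs)."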
116 | ]
117 | },
118 | {
119 | "cell_type": "code",
120 | "execution_count": 7,
121 | "metadata": {},
122 | "outputs": [],
123 | "source": [
124 | "from itertools import compress\n",
125 | "from random import randint"
126 | ]
127 | },
128 | {
129 | "cell_type": "code",
130 | "execution_count": 8,
131 | "metadata": {},
132 | "outputs": [],
133 | "source": [
134 | "def select_data(data, selectors):\n",
135 | " return [x for x, y in zip(data, selectors) if y]"
136 | ]
137 | },
138 | {
139 | "cell_type": "code",
140 | "execution_count": 9,
141 | "metadata": {},
142 | "outputs": [],
143 | "source": [
144 | "def fast_select_data(data, selectors):\n",
145 | " return list(compress(data, selectors))"
146 | ]
147 | },
148 | {
149 | "cell_type": "code",
150 | "execution_count": 10,
151 | "metadata": {},
152 | "outputs": [],
153 | "source": [
154 | "data = list(range(10000))\n",
155 | "selectors = [randint(0, 1) for _ in range(10000)]"
156 | ]
157 | },
158 | {
159 | "cell_type": "code",
160 | "execution_count": 11,
161 | "metadata": {},
162 | "outputs": [
163 | {
164 | "name": "stdout",
165 | "output_type": "stream",
166 | "text": [
167 | "341 µs ± 17.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n"
168 | ]
169 | }
170 | ],
171 | "source": [
172 | "%timeit select_data(data, selectors)"
173 | ]
174 | },
175 | {
176 | "cell_type": "code",
177 | "execution_count": 12,
178 | "metadata": {},
179 | "outputs": [
180 | {
181 | "name": "stdout",
182 | "output_type": "stream",
183 | "text": [
184 | "130 µs ± 3.19 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)\n"
185 | ]
186 | }
187 | ],
188 | "source": [
189 | "%timeit fast_select_data(data, selectors)"
190 | ]
191 | },
192 | {
193 | "cell_type": "markdown",
194 | "metadata": {},
195 | "source": [
196 |     "## 3. Permutations\n",
197 |     "Given a list arr and a number k, return every ordered selection of k elements from arr.  \n",
198 |     "Example:\n",
199 |     "* Input: [1, 2, 3], 2\n",
200 |     "* Output: [(1, 2), (1, 3), (2, 1), (2, 3), (3, 1), (3, 2)]  \n",
201 |     "\n",
202 |     "With permutations, the function is about 10 times faster."
203 | ]
204 | },
205 | {
206 | "cell_type": "code",
207 | "execution_count": 13,
208 | "metadata": {},
209 | "outputs": [],
210 | "source": [
211 | "from itertools import permutations"
212 | ]
213 | },
214 | {
215 | "cell_type": "code",
216 | "execution_count": 14,
217 | "metadata": {},
218 | "outputs": [],
219 | "source": [
220 | "def _get_permutations(arr, k, i):\n",
221 | " if i == k:\n",
222 | " return [arr[:k]]\n",
223 | " res = []\n",
224 | " for j in range(i, len(arr)):\n",
225 | " arr_cpy = arr.copy()\n",
226 | " arr_cpy[i], arr_cpy[j] = arr_cpy[j], arr_cpy[i]\n",
227 | " res += _get_permutations(arr_cpy, k, i + 1)\n",
228 | " return res\n",
229 | " \n",
230 | "def get_permutations(arr, k):\n",
231 | " return _get_permutations(arr, k, 0)"
232 | ]
233 | },
234 | {
235 | "cell_type": "code",
236 | "execution_count": 15,
237 | "metadata": {},
238 | "outputs": [],
239 | "source": [
240 | "def fast_get_permutations(arr, k):\n",
241 | " return list(permutations(arr, k))"
242 | ]
243 | },
244 | {
245 | "cell_type": "code",
246 | "execution_count": 16,
247 | "metadata": {},
248 | "outputs": [],
249 | "source": [
250 | "arr = list(range(10))\n",
251 | "k = 5"
252 | ]
253 | },
254 | {
255 | "cell_type": "code",
256 | "execution_count": 17,
257 | "metadata": {},
258 | "outputs": [
259 | {
260 | "name": "stdout",
261 | "output_type": "stream",
262 | "text": [
263 | "15.5 ms ± 1.96 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
264 | ]
265 | }
266 | ],
267 | "source": [
268 | "%timeit -n 1 get_permutations(arr, k)"
269 | ]
270 | },
271 | {
272 | "cell_type": "code",
273 | "execution_count": 18,
274 | "metadata": {},
275 | "outputs": [
276 | {
277 | "name": "stdout",
278 | "output_type": "stream",
279 | "text": [
280 | "1.56 ms ± 284 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
281 | ]
282 | }
283 | ],
284 | "source": [
285 | "%timeit -n 1 fast_get_permutations(arr, k)"
286 | ]
287 | },
288 | {
289 | "cell_type": "markdown",
290 | "metadata": {},
291 | "source": [
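292 |     "## 4. Filtering data\n",
293 |     "Given a list arr, keep only the even numbers.  \n",
294 |     "Example:\n",
295 |     "* Input: [3, 1, 4, 5, 9, 2]\n",
296 |     "* Output: [4, 2]  \n",
297 |     "\n",
298 |     "With filterfalse, the function is actually slower (about 2 times in the timings below), so do not assume itertools is always faster."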
299 | ]
300 | },
301 | {
302 | "cell_type": "code",
303 | "execution_count": 19,
304 | "metadata": {},
305 | "outputs": [],
306 | "source": [
307 | "from itertools import filterfalse"
308 | ]
309 | },
310 | {
311 | "cell_type": "code",
312 | "execution_count": 20,
313 | "metadata": {},
314 | "outputs": [],
315 | "source": [
316 | "def get_even_nums(arr):\n",
317 | " return [x for x in arr if x % 2 == 0]"
318 | ]
319 | },
320 | {
321 | "cell_type": "code",
322 | "execution_count": 21,
323 | "metadata": {},
324 | "outputs": [],
325 | "source": [
326 | "def fast_get_even_nums(arr):\n",
327 | " return list(filterfalse(lambda x: x % 2, arr))"
328 | ]
329 | },
330 | {
331 | "cell_type": "code",
332 | "execution_count": 22,
333 | "metadata": {},
334 | "outputs": [],
335 | "source": [
336 | "arr = list(range(10000))"
337 | ]
338 | },
339 | {
340 | "cell_type": "code",
341 | "execution_count": 23,
342 | "metadata": {},
343 | "outputs": [
344 | {
345 | "name": "stdout",
346 | "output_type": "stream",
347 | "text": [
348 | "417 µs ± 18.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n"
349 | ]
350 | }
351 | ],
352 | "source": [
353 | "%timeit get_even_nums(arr)"
354 | ]
355 | },
356 | {
357 | "cell_type": "code",
358 | "execution_count": 24,
359 | "metadata": {},
360 | "outputs": [
361 | {
362 | "name": "stdout",
363 | "output_type": "stream",
364 | "text": [
365 | "823 µs ± 22.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n"
366 | ]
367 | }
368 | ],
369 | "source": [
370 | "%timeit fast_get_even_nums(arr)"
371 | ]
372 | },
373 | {
374 | "cell_type": "markdown",
375 | "metadata": {},
376 | "source": [
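377 |     "## 5. Conditional termination\n",
378 |     "Given a list arr, sum its elements in order and stop as soon as an element greater than target is encountered; return the sum.  \n",
379 |     "Example:\n",
380 |     "* Input: [1, 2, 3, 4, 5], 3\n",
381 |     "* Output: 6 (4 > 3, so the summation stops)\n",
382 |     "\n",
383 |     "With takewhile, the function is actually slower, so do not assume itertools is always faster."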
384 | ]
385 | },
386 | {
387 | "cell_type": "code",
388 | "execution_count": 25,
389 | "metadata": {},
390 | "outputs": [],
391 | "source": [
392 | "from itertools import takewhile"
393 | ]
394 | },
395 | {
396 | "cell_type": "code",
397 | "execution_count": 26,
398 | "metadata": {},
399 | "outputs": [],
400 | "source": [
401 | "def cond_sum(arr, target):\n",
402 | " res = 0\n",
403 | " for x in arr:\n",
404 | " if x > target:\n",
405 | " break\n",
406 | " res += x\n",
407 | " return res"
408 | ]
409 | },
410 | {
411 | "cell_type": "code",
412 | "execution_count": 27,
413 | "metadata": {},
414 | "outputs": [],
415 | "source": [
416 | "def fast_cond_sum(arr, target):\n",
417 | " return sum(takewhile(lambda x: x <= target, arr))"
418 | ]
419 | },
420 | {
421 | "cell_type": "code",
422 | "execution_count": 28,
423 | "metadata": {},
424 | "outputs": [],
425 | "source": [
426 | "arr = list(range(10000))\n",
427 | "target = 5000"
428 | ]
429 | },
430 | {
431 | "cell_type": "code",
432 | "execution_count": 29,
433 | "metadata": {},
434 | "outputs": [
435 | {
436 | "name": "stdout",
437 | "output_type": "stream",
438 | "text": [
439 | "245 µs ± 11.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n"
440 | ]
441 | }
442 | ],
443 | "source": [
444 | "%timeit cond_sum(arr, target)"
445 | ]
446 | },
447 | {
448 | "cell_type": "code",
449 | "execution_count": 30,
450 | "metadata": {},
451 | "outputs": [
452 | {
453 | "name": "stdout",
454 | "output_type": "stream",
455 | "text": [
456 | "404 µs ± 13.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n"
457 | ]
458 | }
459 | ],
460 | "source": [
461 | "%timeit fast_cond_sum(arr, target)"
462 | ]
463 | },
464 | {
465 | "cell_type": "markdown",
466 | "metadata": {},
467 | "source": [
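468 |     "## 6. Nested loops\n",
469 |     "Given lists arr1 and arr2, return the sums of every pair of elements, one taken from each list.  \n",
470 |     "Example:\n",
471 |     "* Input: [1, 2], [4, 5]\n",
472 |     "* Output: [1 + 4, 1 + 5, 2 + 4, 2 + 5]\n",
473 |     "\n",
474 |     "With product, the function is about 1.3 times faster in the timings below (484 µs vs. 373 µs)."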
475 | ]
476 | },
477 | {
478 | "cell_type": "code",
479 | "execution_count": 31,
480 | "metadata": {},
481 | "outputs": [],
482 | "source": [
483 | "from itertools import product"
484 | ]
485 | },
486 | {
487 | "cell_type": "code",
488 | "execution_count": 32,
489 | "metadata": {},
490 | "outputs": [],
491 | "source": [
492 | "def _cross_sum(arr1, arr2):\n",
493 | "    for x in arr1:\n",
494 | "        for y in arr2:\n",
495 | "            yield x + y\n",
496 | "\n",
497 | "def cross_sum(arr1, arr2):\n",
498 | "    return list(_cross_sum(arr1, arr2))"
499 | ]
500 | },
501 | {
502 | "cell_type": "code",
503 | "execution_count": 33,
504 | "metadata": {},
505 | "outputs": [],
506 | "source": [
507 | "def fast_cross_sum(arr1, arr2):\n",
508 | "    return [x + y for x, y in product(arr1, arr2)]"
509 | ]
510 | },
511 | {
512 | "cell_type": "code",
513 | "execution_count": 34,
514 | "metadata": {},
515 | "outputs": [],
516 | "source": [
517 | "arr1 = list(range(100))\n",
518 | "arr2 = list(range(100))"
519 | ]
520 | },
521 | {
522 | "cell_type": "code",
523 | "execution_count": 35,
524 | "metadata": {},
525 | "outputs": [
526 | {
527 | "name": "stdout",
528 | "output_type": "stream",
529 | "text": [
530 | "484 µs ± 16.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n"
531 | ]
532 | }
533 | ],
534 | "source": [
535 | "%timeit cross_sum(arr1, arr2)"
536 | ]
537 | },
538 | {
539 | "cell_type": "code",
540 | "execution_count": 36,
541 | "metadata": {},
542 | "outputs": [
543 | {
544 | "name": "stdout",
545 | "output_type": "stream",
546 | "text": [
547 | "373 µs ± 11.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n"
548 | ]
549 | }
550 | ],
551 | "source": [
552 | "%timeit fast_cross_sum(arr1, arr2)"
553 | ]
554 | },
555 | {
556 | "cell_type": "markdown",
557 | "metadata": {},
558 | "source": [
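559 | "## 7. Flattening a 2D list into a 1D list\n",
560 | "Given a two-dimensional list arr, flatten it into a one-dimensional list.  \n",
561 | "Example:\n",
562 | "* Input: [[1, 2], [3, 4]]\n",
563 | "* Output: [1, 2, 3, 4]\n",
564 | "\n",
565 | "Using chain makes this about 5.7 times faster (379 µs versus 66.9 µs)."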
566 | ]
567 | },
568 | {
569 | "cell_type": "code",
570 | "execution_count": 37,
571 | "metadata": {},
572 | "outputs": [],
573 | "source": [
574 | "from itertools import chain"
575 | ]
576 | },
577 | {
578 | "cell_type": "code",
579 | "execution_count": 38,
580 | "metadata": {},
581 | "outputs": [],
582 | "source": [
583 | "def _flatten(arr2d):\n",
584 | "    for arr in arr2d:\n",
585 | "        for x in arr:\n",
586 | "            yield x\n",
587 | "\n",
588 | "def flatten(arr2d):\n",
589 | "    return list(_flatten(arr2d))"
590 | ]
591 | },
592 | {
593 | "cell_type": "code",
594 | "execution_count": 39,
595 | "metadata": {},
596 | "outputs": [],
597 | "source": [
598 | "def fast_flatten(arr2d):\n",
599 | "    return list(chain(*arr2d))"
600 | ]
601 | },
602 | {
603 | "cell_type": "code",
604 | "execution_count": 40,
605 | "metadata": {},
606 | "outputs": [],
607 | "source": [
608 | "arr2d = [[x + y * 100 for x in range(100)] for y in range(100)]"
609 | ]
610 | },
611 | {
612 | "cell_type": "code",
613 | "execution_count": 41,
614 | "metadata": {},
615 | "outputs": [
616 | {
617 | "name": "stdout",
618 | "output_type": "stream",
619 | "text": [
620 | "379 µs ± 15.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n"
621 | ]
622 | }
623 | ],
624 | "source": [
625 | "%timeit flatten(arr2d)"
626 | ]
627 | },
628 | {
629 | "cell_type": "code",
630 | "execution_count": 42,
631 | "metadata": {},
632 | "outputs": [
633 | {
634 | "name": "stdout",
635 | "output_type": "stream",
636 | "text": [
637 | "66.9 µs ± 3.43 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)\n"
638 | ]
639 | }
640 | ],
641 | "source": [
642 | "%timeit fast_flatten(arr2d)"
643 | ]
644 | },
645 | {
646 | "cell_type": "code",
647 | "execution_count": null,
648 | "metadata": {},
649 | "outputs": [],
650 | "source": []
651 | }
652 | ],
653 | "metadata": {
654 | "kernelspec": {
655 | "display_name": "Python 3",
656 | "language": "python",
657 | "name": "python3"
658 | },
659 | "language_info": {
660 | "codemirror_mode": {
661 | "name": "ipython",
662 | "version": 3
663 | },
664 | "file_extension": ".py",
665 | "mimetype": "text/x-python",
666 | "name": "python",
667 | "nbconvert_exporter": "python",
668 | "pygments_lexer": "ipython3",
669 | "version": "3.6.6"
670 | }
671 | },
672 | "nbformat": 4,
673 | "nbformat_minor": 2
674 | }
675 |
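Before reading too much into the timings for the takewhile, product and chain examples above, it can help to confirm that each itertools variant returns the same result as the plain loop. The sketch below is a minimal check under stated assumptions: the names `takewhile_cond_sum`, `product_cross_sum` and `chain_flatten` are illustrative and do not come from the notebook, and `chain.from_iterable` is swapped in for `chain(*arr2d)` as an alternative that does not unpack the outer list into arguments.

```python
from itertools import chain, product, takewhile


def cond_sum(arr, target):
    """Sum leading elements until one exceeds target (plain loop)."""
    res = 0
    for x in arr:
        if x > target:
            break
        res += x
    return res


def takewhile_cond_sum(arr, target):
    """Same computation with takewhile (hypothetical name)."""
    return sum(takewhile(lambda x: x <= target, arr))


def cross_sum(arr1, arr2):
    """All pairwise sums with two nested loops."""
    return [x + y for x in arr1 for y in arr2]


def product_cross_sum(arr1, arr2):
    """Same computation with product (hypothetical name)."""
    return [x + y for x, y in product(arr1, arr2)]


def flatten(arr2d):
    """Flatten a list of lists with two nested loops."""
    return [x for arr in arr2d for x in arr]


def chain_flatten(arr2d):
    """Same computation with chain.from_iterable (hypothetical name)."""
    return list(chain.from_iterable(arr2d))


arr = list(range(10000))
arr2d = [[x + y * 100 for x in range(100)] for y in range(100)]

# Each pair must agree before its timings are worth comparing.
assert cond_sum(arr, 5000) == takewhile_cond_sum(arr, 5000)
assert cross_sum([1, 2], [4, 5]) == product_cross_sum([1, 2], [4, 5])
assert flatten(arr2d) == chain_flatten(arr2d)
```

Once the results agree, each pair can be timed with `%timeit` exactly as in the cells above.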
--------------------------------------------------------------------------------
/Using C++ in Cython.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "## Using C++ in Cython\n",
8 | "Author: tushushu  \n",
9 | "Project URL: https://github.com/tushushu/flying-python"
10 | ]
11 | },
12 | {
13 | "cell_type": "markdown",
14 | "metadata": {},
15 | "source": [
16 | "## 1. Using C++ in Jupyter Notebook  \n",
17 | "- First, load the Cython extension with the magic command ``%load_ext Cython``\n",
18 | "- Then run Cython code with the cell magic ``%%cython --cplus``\n",
19 | "- On macOS, use ``%%cython --cplus --compile-args=-stdlib=libc++ --link-args=-stdlib=libc++`` instead; for details see https://stackoverflow.com/questions/57367764/cant-import-cpplist-into-cython"
20 | ]
21 | },
22 | {
23 | "cell_type": "code",
24 | "execution_count": 1,
25 | "metadata": {},
26 | "outputs": [],
27 | "source": [
28 | "%load_ext Cython"
29 | ]
30 | },
31 | {
32 | "cell_type": "code",
33 | "execution_count": 2,
34 | "metadata": {},
35 | "outputs": [
36 | {
37 | "name": "stdout",
38 | "output_type": "stream",
39 | "text": [
40 | "Hello world!\n"
41 | ]
42 | }
43 | ],
44 | "source": [
45 | "%%cython --cplus --compile-args=-stdlib=libc++ --link-args=-stdlib=libc++\n",
46 | "# Note: use 'cimport', not 'import'\n",
47 | "from libcpp.string cimport string\n",
48 | "cdef string s\n",
49 | "s = b\"Hello world!\"\n",
50 | "print(s.decode(\"utf-8\"))"
51 | ]
52 | },
53 | {
54 | "cell_type": "markdown",
55 | "metadata": {},
56 | "source": [
57 | "## 2. Converting between C++ and Python types"
58 | ]
59 | },
60 | {
61 | "cell_type": "markdown",
62 | "metadata": {},
63 | "source": [
64 | "| From Python type | C++ type | To Python type |\n",
65 | "| ------ | ------ | ------ |\n",
66 | "| bytes | std::string | bytes |\n",
67 | "|iterable|std::vector|list|\n",
68 | "|iterable|std::list|list|\n",
69 | "|iterable|std::set|set|\n",
70 | "|iterable (len 2)|std::pair|tuple (len 2)|"
71 | ]
72 | },
73 | {
74 | "cell_type": "markdown",
75 | "metadata": {},
76 | "source": [
77 | "## 3. Using the C++ STL"
78 | ]
79 | },
80 | {
81 | "cell_type": "markdown",
82 | "metadata": {},
83 | "source": [
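84 | "### 3.1 Using C++ Vector\n",
85 | "A vector can replace a Python list.\n",
86 | "1. Initialization - initialize from a Python iterable; the element type must be declared\n",
87 | "2. Iteration - increment an index inside a while loop\n",
88 | "3. Access - use the '[]' operator, as in Python\n",
89 | "4. Append - use the push_back method, which is similar to the append method of a Python list\n",
90 | "\n",
91 | "Finally, we compare performance by implementing an element-counting function in both Python and C++. The C++ version is about 230 times faster (29.9 µs versus 130 ns).  \n",
92 | "Note: to keep the comparison fair, the functions take no arguments and read variables defined outside the function body. This excludes the time needed to convert a Python list into a C++ vector. If that conversion is included, the C++ version is only about 4 times faster."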
93 | ]
94 | },
95 | {
96 | "cell_type": "code",
97 | "execution_count": 3,
98 | "metadata": {},
99 | "outputs": [
100 | {
101 | "name": "stdout",
102 | "output_type": "stream",
103 | "text": [
104 | "Start iterating...\n",
105 | "\tThe element at index 0 is 0\n",
106 | "\tThe element at index 1 is 1\n",
107 | "\tThe element at index 2 is 2\n",
108 | "\tThe element at index 3 is 3\n",
109 | "\tThe element at index 4 is 4\n",
110 | "\n",
111 | "After appending, vec becomes [0, 1, 2, 3, 4, 5]\n"
112 | ]
113 | }
114 | ],
115 | "source": [
116 | "%%cython --cplus --compile-args=-stdlib=libc++ --link-args=-stdlib=libc++\n",
117 | "from libcpp.vector cimport vector\n",
118 | "# Initialize from a Python object\n",
119 | "cdef vector[int] vec = range(5)\n",
120 | "# Iterate\n",
121 | "cdef:\n",
122 | "    int i = 0\n",
123 | "    int n = vec.size()\n",
124 | "print(\"Start iterating...\")\n",
125 | "while i < n:\n",
126 | "    # Access\n",
127 | "    print(\"\\tThe element at index %d is %d\" % (i, vec[i]))\n",
128 | "    i += 1\n",
129 | "print()\n",
130 | "# Append\n",
131 | "vec.push_back(5)\n",
132 | "print(\"After appending, vec becomes\", vec)"
133 | ]
134 | },
135 | {
136 | "cell_type": "code",
137 | "execution_count": 4,
138 | "metadata": {},
139 | "outputs": [
140 | {
141 | "name": "stdout",
142 | "output_type": "stream",
143 | "text": [
144 | "Python implementation: the result is 100!\n"
145 | ]
146 | }
147 | ],
148 | "source": [
149 | "arr = [x // 100 for x in range(1000)]\n",
150 | "target = 6\n",
151 | "\n",
152 | "def count_py():\n",
153 | "    return sum(1 for x in arr if x == target)\n",
154 | "\n",
155 | "print(\"Python implementation: the result is %d!\" % count_py())"
156 | ]
157 | },
158 | {
159 | "cell_type": "code",
160 | "execution_count": 5,
161 | "metadata": {},
162 | "outputs": [
163 | {
164 | "name": "stdout",
165 | "output_type": "stream",
166 | "text": [
167 | "Cython (C++) implementation: the result is 100!\n"
168 | ]
169 | }
170 | ],
171 | "source": [
172 | "%%cython --cplus --compile-args=-stdlib=libc++ --link-args=-stdlib=libc++\n",
173 | "from libcpp.vector cimport vector\n",
174 | "\n",
175 | "cdef:\n",
176 | "    int target = 6\n",
177 | "    vector[int] v = [x // 100 for x in range(1000)]\n",
178 | "\n",
179 | "cdef int _count_cpp():\n",
180 | "    cdef:\n",
181 | "        int i = 0\n",
182 | "        int n = v.size()\n",
183 | "        int ret = 0\n",
184 | "    while i < n:\n",
185 | "        if v[i] == target:\n",
186 | "            ret += 1\n",
187 | "        i += 1\n",
188 | "    return ret\n",
189 | "\n",
190 | "def count_cpp():\n",
191 | "    return _count_cpp()\n",
192 | "\n",
193 | "print(\"Cython (C++) implementation: the result is %d!\" % count_cpp())"
194 | ]
195 | },
196 | {
197 | "cell_type": "code",
198 | "execution_count": 6,
199 | "metadata": {},
200 | "outputs": [
201 | {
202 | "name": "stdout",
203 | "output_type": "stream",
204 | "text": [
205 | "Comparing the performance of the Python and C++ versions...\n",
206 | "29.9 µs ± 995 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)\n",
207 | "130 ns ± 2.91 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)\n"
208 | ]
209 | }
210 | ],
211 | "source": [
212 | "print(\"Comparing the performance of the Python and C++ versions...\")\n",
213 | "%timeit count_py()\n",
214 | "%timeit count_cpp()"
215 | ]
216 | },
217 | {
218 | "cell_type": "markdown",
219 | "metadata": {},
220 | "source": [
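221 | "### 3.2 Using C++ Unordered Map\n",
222 | "An unordered_map can replace a Python dict.\n",
223 | "1. Initialization - initialize from a Python object (here a dict); the key and value types must be declared\n",
224 | "2. Iteration - increment an iterator inside a while loop\n",
225 | "3. Access - dereference the iterator with deref (the '*' operator in C++) to get a pair object; .first is the key and .second is the value\n",
226 | "4. Lookup - use unordered_map.count, which returns 1 or 0, or unordered_map.find, which returns an iterator; if that iterator equals unordered_map.end, the key was not found\n",
227 | "5. Insert/Update - unordered_map[key] = value. If the key does not exist, the '[]' operator inserts it with a default value, such as 0.0. So unless you are sure this cannot cause an error, check that the key exists before modifying its value. This behaviour is somewhat similar to Python's defaultdict.  \n",
228 | "\n",
229 | "Finally, we compare performance by implementing a conditional sum over a map in both Python and C++. The C++ version is about 40 times faster (6.56 µs versus 162 ns)."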
230 | ]
231 | },
232 | {
233 | "cell_type": "code",
234 | "execution_count": 7,
235 | "metadata": {},
236 | "outputs": [
237 | {
238 | "name": "stdout",
239 | "output_type": "stream",
240 | "text": [
241 | "Start iterating...\n",
242 | "\tKey is 0, Value is 0.0\n",
243 | "\tKey is 1, Value is 0.1\n",
244 | "\tKey is 2, Value is 0.2\n",
245 | "\tKey is 3, Value is 0.3\n",
246 | "\tKey is 4, Value is 0.4\n",
247 | "\tKey is 5, Value is 0.5\n",
248 | "\tKey is 6, Value is 0.6\n",
249 | "\tKey is 7, Value is 0.7\n",
250 | "\tKey is 8, Value is 0.8\n",
251 | "\tKey is 9, Value is 0.9\n",
252 | "\n",
253 | "Start lookup...\n",
254 | "\tElement -2 does not exist!\n",
255 | "\tElement 3 exists, its value is 0.3!\n",
256 | "\n",
257 | "Modifying elements...\n",
258 | "\tKey is 3, Value is 1.3\n",
259 | "\tKey is -2, Value is 0.0\n"
260 | ]
261 | }
262 | ],
263 | "source": [
264 | "%%cython --cplus --compile-args=-stdlib=libc++ --link-args=-stdlib=libc++\n",
265 | "from cython.operator cimport dereference as deref, preincrement as inc\n",
266 | "from libcpp.unordered_map cimport unordered_map\n",
267 | "# Initialize from a Python object\n",
268 | "cdef unordered_map[int, float] mymap = {i: i/10 for i in range(10)}\n",
269 | "# Iterate\n",
270 | "cdef:\n",
271 | "    unordered_map[int, float].iterator it = mymap.begin()\n",
272 | "    unordered_map[int, float].iterator end = mymap.end()\n",
273 | "print(\"Start iterating...\")\n",
274 | "while it != end:\n",
275 | "    # Access\n",
276 | "    print(\"\\tKey is %d, Value is %.1f\" % (deref(it).first, deref(it).second))\n",
277 | "    inc(it)\n",
278 | "print()\n",
279 | "\n",
280 | "# Lookup\n",
281 | "print(\"Start lookup...\")\n",
282 | "if mymap.count(-2):\n",
283 | "    print(\"\\tElement -2 exists!\")\n",
284 | "else:\n",
285 | "    print(\"\\tElement -2 does not exist!\")\n",
286 | "\n",
287 | "it = mymap.find(3)\n",
288 | "if it != end:\n",
289 | "    print(\"\\tElement 3 exists, its value is %.1f!\" % deref(it).second)\n",
290 | "else:\n",
291 | "    print(\"\\tElement 3 does not exist!\")\n",
292 | "print()\n",
293 | "\n",
294 | "# Modify\n",
295 | "print(\"Modifying elements...\")\n",
296 | "if mymap.count(3):\n",
297 | "    mymap[3] += 1.0\n",
298 | "mymap[-2]  # Key -2 does not exist, so it is inserted with the default value 0.0\n",
299 | "print(\"\\tKey is 3, Value is %.1f\" % mymap[3])\n",
300 | "print(\"\\tKey is -2, Value is %.1f\" % mymap[-2])"
301 | ]
302 | },
303 | {
304 | "cell_type": "code",
305 | "execution_count": 8,
306 | "metadata": {},
307 | "outputs": [
308 | {
309 | "name": "stdout",
310 | "output_type": "stream",
311 | "text": [
312 | "Python implementation: the result is 1225!\n"
313 | ]
314 | }
315 | ],
316 | "source": [
317 | "my_map = {x: x for x in range(100)}\n",
318 | "target = 50\n",
319 | "\n",
320 | "def sum_lt_py():\n",
321 | "    return sum(my_map[x] for x in my_map if x < target)\n",
322 | "\n",
323 | "print(\"Python implementation: the result is %d!\" % sum_lt_py())"
324 | ]
325 | },
326 | {
327 | "cell_type": "code",
328 | "execution_count": 9,
329 | "metadata": {},
330 | "outputs": [
331 | {
332 | "name": "stdout",
333 | "output_type": "stream",
334 | "text": [
335 | "Cython (C++) implementation: the result is 1225!\n"
336 | ]
337 | }
338 | ],
339 | "source": [
340 | "%%cython --cplus --compile-args=-stdlib=libc++ --link-args=-stdlib=libc++\n",
341 | "from libcpp.unordered_map cimport unordered_map\n",
342 | "from cython.operator cimport dereference as deref, preincrement as inc\n",
343 | "\n",
344 | "cdef:\n",
345 | "    unordered_map[int, int] my_map = {x: x for x in range(100)}\n",
346 | "    int target = 50\n",
347 | "\n",
348 | "cdef _sum_lt_cpp():\n",
349 | "    cdef:\n",
350 | "        unordered_map[int, int].iterator it = my_map.begin()\n",
351 | "        int ret = 0  # initialize explicitly; a C int local is not zeroed automatically\n",
352 | "    while it != my_map.end():\n",
353 | "        if deref(it).first < target:\n",
354 | "            ret += deref(it).second\n",
355 | "        inc(it)\n",
356 | "    return ret\n",
357 | "\n",
358 | "def sum_lt_cpp():\n",
359 | "    return _sum_lt_cpp()\n",
360 | "\n",
361 | "print(\"Cython (C++) implementation: the result is %d!\" % sum_lt_cpp())"
362 | ]
363 | },
364 | {
365 | "cell_type": "code",
366 | "execution_count": 10,
367 | "metadata": {},
368 | "outputs": [
369 | {
370 | "name": "stdout",
371 | "output_type": "stream",
372 | "text": [
373 | "Comparing the performance of the Python and C++ versions...\n",
374 | "6.56 µs ± 117 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)\n",
375 | "162 ns ± 6.29 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)\n"
376 | ]
377 | }
378 | ],
379 | "source": [
380 | "print(\"Comparing the performance of the Python and C++ versions...\")\n",
381 | "%timeit sum_lt_py()\n",
382 | "%timeit sum_lt_cpp()"
383 | ]
384 | },
385 | {
386 | "cell_type": "markdown",
387 | "metadata": {},
388 | "source": [
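389 | "### 3.3 Using C++ Unordered Set\n",
390 | "An unordered_set can replace a Python set.  \n",
391 | "1. Initialization - initialize from a Python iterable; the element type must be declared\n",
392 | "2. Iteration - increment an iterator inside a while loop\n",
393 | "3. Access - dereference the iterator with deref (the '*' operator in C++)\n",
394 | "4. Lookup - use unordered_set.count, which returns 1 or 0\n",
395 | "5. Insert - use unordered_set.insert; if the element already exists, it is not inserted again\n",
396 | "6. Intersection, union, difference - as far as I know, these operations must be implemented by the developer for unordered_set, which makes it less convenient than Python's set.\n",
397 | " \n",
398 | "Finally, we compare performance by implementing set intersection in both Python and C++. The C++ version is about 20 times **slower** (274 ns versus 5.28 µs). For details see https://stackoverflow.com/questions/54763112/how-to-improve-stdset-intersection-performance-in-c  \n",
399 | "If we only count the elements that the two sets have in common, the C++ version is about 6 times faster than Python (295 ns versus 46.1 ns). This suggests that lookups in a C++ unordered set are fast while creating one is slow; note that the timed intersection function also converts its result back to a Python set."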
400 | ]
401 | },
402 | {
403 | "cell_type": "code",
404 | "execution_count": 11,
405 | "metadata": {},
406 | "outputs": [
407 | {
408 | "name": "stdout",
409 | "output_type": "stream",
410 | "text": [
411 | "Start iterating...\n",
412 | "\tValue is 0\n",
413 | "\tValue is 1\n",
414 | "\tValue is 2\n",
415 | "\tValue is 3\n",
416 | "\tValue is 4\n",
417 | "\n",
418 | "Start lookup...\n",
419 | "\tElement -2 does not exist!\n",
420 | "\n",
421 | "Inserting elements...\n",
422 | "\tMyset is: {0, 1, 2, 3, 4, -1}\n"
423 | ]
424 | }
425 | ],
426 | "source": [
427 | "%%cython --cplus --compile-args=-stdlib=libc++ --link-args=-stdlib=libc++\n",
428 | "from cython.operator cimport dereference as deref, preincrement as inc\n",
429 | "from libcpp.unordered_set cimport unordered_set\n",
430 | "# Initialize from a Python object\n",
431 | "cdef unordered_set[int] myset = {i for i in range(5)}\n",
432 | "# Iterate\n",
433 | "cdef:\n",
434 | "    unordered_set[int].iterator it = myset.begin()\n",
435 | "    unordered_set[int].iterator end = myset.end()\n",
436 | "print(\"Start iterating...\")\n",
437 | "while it != end:\n",
438 | "    # Access\n",
439 | "    print(\"\\tValue is %d\" % deref(it))\n",
440 | "    inc(it)\n",
441 | "print()\n",
442 | "\n",
443 | "# Lookup\n",
444 | "print(\"Start lookup...\")\n",
445 | "if myset.count(-2):\n",
446 | "    print(\"\\tElement -2 exists!\")\n",
447 | "else:\n",
448 | "    print(\"\\tElement -2 does not exist!\")\n",
449 | "\n",
450 | "print()\n",
451 | "\n",
452 | "# Insert\n",
453 | "print(\"Inserting elements...\")\n",
454 | "myset.insert(0)\n",
455 | "myset.insert(-1)\n",
456 | "\n",
457 | "print(\"\\tMyset is:\", myset)"
458 | ]
459 | },
460 | {
461 | "cell_type": "code",
462 | "execution_count": 12,
463 | "metadata": {},
464 | "outputs": [
465 | {
466 | "name": "stdout",
467 | "output_type": "stream",
468 | "text": [
469 | "Python implementation: the result is {50, 51, 52, 53, 54, 55, 56, 57, 58, 59}!\n"
470 | ]
471 | }
472 | ],
473 | "source": [
474 | "myset1 = {x for x in range(100)}\n",
475 | "myset2 = {x for x in range(50, 60)}\n",
476 | "\n",
477 | "def intersection_py():\n",
478 | "    return myset1 & myset2\n",
479 | "\n",
480 | "print(\"Python implementation: the result is %s!\" % intersection_py())"
481 | ]
482 | },
483 | {
484 | "cell_type": "code",
485 | "execution_count": 13,
486 | "metadata": {},
487 | "outputs": [
488 | {
489 | "name": "stdout",
490 | "output_type": "stream",
491 | "text": [
492 | "Cython (C++) implementation: the result is {50, 51, 52, 53, 54, 55, 56, 57, 58, 59}!\n"
493 | ]
494 | }
495 | ],
496 | "source": [
497 | "%%cython --cplus --compile-args=-stdlib=libc++ --link-args=-stdlib=libc++\n",
498 | "from cython.operator cimport dereference as deref, preincrement as inc\n",
499 | "from libcpp.unordered_set cimport unordered_set\n",
500 | "\n",
501 | "cdef:\n",
502 | "    unordered_set[int] myset1 = {x for x in range(100)}\n",
503 | "    unordered_set[int] myset2 = {x for x in range(50, 60)}\n",
504 | "\n",
505 | "cdef unordered_set[int] _intersection_cpp():\n",
506 | "    cdef:\n",
507 | "        unordered_set[int].iterator it = myset1.begin()\n",
508 | "        unordered_set[int] ret\n",
509 | "    while it != myset1.end():\n",
510 | "        if myset2.count(deref(it)):\n",
511 | "            ret.insert(deref(it))\n",
512 | "        inc(it)\n",
513 | "    return ret\n",
514 | "\n",
515 | "def intersection_cpp():\n",
516 | "    return _intersection_cpp()\n",
517 | "\n",
518 | "print(\"Cython (C++) implementation: the result is %s!\" % intersection_cpp())"
519 | ]
520 | },
521 | {
522 | "cell_type": "code",
523 | "execution_count": 14,
524 | "metadata": {},
525 | "outputs": [
526 | {
527 | "name": "stdout",
528 | "output_type": "stream",
529 | "text": [
530 | "Comparing the performance of the Python and C++ versions...\n",
531 | "274 ns ± 13.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)\n",
532 | "5.28 µs ± 220 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)\n"
533 | ]
534 | }
535 | ],
536 | "source": [
537 | "print(\"Comparing the performance of the Python and C++ versions...\")\n",
538 | "%timeit intersection_py()\n",
539 | "%timeit intersection_cpp()"
540 | ]
541 | },
542 | {
543 | "cell_type": "code",
544 | "execution_count": 15,
545 | "metadata": {},
546 | "outputs": [
547 | {
548 | "name": "stdout",
549 | "output_type": "stream",
550 | "text": [
551 | "Python implementation: the result is 10!\n"
552 | ]
553 | }
554 | ],
555 | "source": [
556 | "myset1 = {x for x in range(100)}\n",
557 | "myset2 = {x for x in range(50, 60)}\n",
558 | "\n",
559 | "def count_common_py():\n",
560 | "    return len(myset1 & myset2)\n",
561 | "\n",
562 | "print(\"Python implementation: the result is %s!\" % count_common_py())"
563 | ]
564 | },
565 | {
566 | "cell_type": "code",
567 | "execution_count": 16,
568 | "metadata": {},
569 | "outputs": [
570 | {
571 | "name": "stdout",
572 | "output_type": "stream",
573 | "text": [
574 | "Cython (C++) implementation: the result is 10!\n"
575 | ]
576 | }
577 | ],
578 | "source": [
579 | "%%cython --cplus --compile-args=-stdlib=libc++ --link-args=-stdlib=libc++\n",
580 | "from cython.operator cimport dereference as deref, preincrement as inc\n",
581 | "from libcpp.unordered_set cimport unordered_set\n",
582 | "\n",
583 | "cdef:\n",
584 | "    unordered_set[int] myset2 = {x for x in range(100)}\n",
585 | "    unordered_set[int] myset1 = {x for x in range(50, 60)}\n",
586 | "\n",
587 | "cdef int _count_common_cpp():\n",
588 | "    if myset1.size() > myset2.size():\n",
589 | "        myset1.swap(myset2)\n",
590 | "    cdef:\n",
591 | "        unordered_set[int].iterator it = myset1.begin()\n",
592 | "        int ret = 0\n",
593 | "    while it != myset1.end():\n",
594 | "        if myset2.count(deref(it)):\n",
595 | "            ret += 1\n",
596 | "        inc(it)\n",
597 | "    return ret\n",
598 | "\n",
599 | "def count_common_cpp():\n",
600 | "    return _count_common_cpp()\n",
601 | "\n",
602 | "print(\"Cython (C++) implementation: the result is %s!\" % count_common_cpp())"
603 | ]
604 | },
605 | {
606 | "cell_type": "code",
607 | "execution_count": 17,
608 | "metadata": {},
609 | "outputs": [
610 | {
611 | "name": "stdout",
612 | "output_type": "stream",
613 | "text": [
614 | "Comparing the performance of the Python and C++ versions...\n",
615 | "295 ns ± 5.91 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)\n",
616 | "46.1 ns ± 0.785 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)\n"
617 | ]
618 | }
619 | ],
620 | "source": [
621 | "print(\"Comparing the performance of the Python and C++ versions...\")\n",
622 | "%timeit count_common_py()\n",
623 | "%timeit count_common_cpp()"
624 | ]
625 | },
626 | {
627 | "cell_type": "markdown",
628 | "metadata": {},
629 | "source": [
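630 | "## 4. Pass by value and pass by reference\n",
631 | "Python always passes function arguments as references to objects. A mutable container (such as a list or set) can therefore be modified in place by the function, while immutable objects (such as int and float) cannot, so they behave as if they were passed by value. If a function should not modify a container, make a copy of it first, for example with deepcopy.  \n",
632 | "In C++, however, everything is passed by value by default, and pass by reference must be declared explicitly.\n",
633 | "Taking a vector of int as an example, v1 is not modified by pass_value but is modified by pass_reference.\n",
634 | "- Pass by value uses ``vector[int]``: pass_value receives only a copy of v1, so the function cannot modify v1\n",
635 | "- Pass by reference uses ``vector[int]&``: pass_reference receives a reference to v1, so the function can modify v1  \n",
636 | "\n",
637 | "The two code cells below show the difference between Python and C++."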
638 | ]
639 | },
640 | {
641 | "cell_type": "code",
642 | "execution_count": 18,
643 | "metadata": {},
644 | "outputs": [
645 | {
646 | "name": "stdout",
647 | "output_type": "stream",
648 | "text": [
649 | "Initial value of v1: [0, 0, 0]\n",
650 | "After calling pass_value, v1 is [0, 0, 0]\n",
651 | "After calling pass_reference, v1 is [-1, 0, 0]\n"
652 | ]
653 | }
654 | ],
655 | "source": [
656 | "from copy import deepcopy\n",
657 | "\n",
658 | "def pass_value(v):\n",
659 | "    v = deepcopy(v)\n",
660 | "    v[0] = -1\n",
661 | "\n",
662 | "def pass_reference(v):\n",
663 | "    v[0] = -1\n",
664 | "\n",
665 | "v1 = [0, 0, 0]\n",
666 | "print(\"Initial value of v1: %s\" % v1)\n",
667 | "pass_value(v1)\n",
668 | "print(\"After calling pass_value, v1 is %s\" % v1)\n",
669 | "pass_reference(v1)\n",
670 | "print(\"After calling pass_reference, v1 is %s\" % v1)"
671 | ]
672 | },
673 | {
674 | "cell_type": "code",
675 | "execution_count": 19,
676 | "metadata": {},
677 | "outputs": [
678 | {
679 | "name": "stdout",
680 | "output_type": "stream",
681 | "text": [
682 | "Initial value of v1: [0, 0, 0]\n",
683 | "After calling pass_value, v1 is [0, 0, 0]\n",
684 | "After calling pass_reference, v1 is [-1, 0, 0]\n"
685 | ]
686 | }
687 | ],
688 | "source": [
689 | "%%cython --cplus --compile-args=-stdlib=libc++ --link-args=-stdlib=libc++\n",
690 | "\n",
691 | "from libcpp.vector cimport vector\n",
692 | "\n",
693 | "cdef void pass_value(vector[int] v):\n",
694 | "    v[0] = -1\n",
695 | "\n",
696 | "cdef void pass_reference(vector[int]& v):\n",
697 | "    v[0] = -1\n",
698 | "\n",
699 | "cdef vector[int] v1 = [0, 0, 0]\n",
700 | "print(\"Initial value of v1: %s\" % v1)\n",
701 | "pass_value(v1)\n",
702 | "print(\"After calling pass_value, v1 is %s\" % v1)\n",
703 | "pass_reference(v1)\n",
704 | "print(\"After calling pass_reference, v1 is %s\" % v1)"
705 | ]
706 | },
707 | {
708 | "cell_type": "markdown",
709 | "metadata": {},
710 | "source": [
711 | "## 5. Numeric ranges\n",
712 | "Python has a single int type whose range is effectively unbounded as long as memory allows, so Python users rarely worry about numeric overflow. C++ requires more care. Typical sizes and ranges of the C++ numeric types are listed below; the exact values depend on the platform and compiler:  \n",
713 | "\n",
714 | "\n",
715 | "|Type|Typical Size|Typical Range|\n",
716 | "| ------ | ------ | ------ |\n",
717 | "|char|1 byte|-128 to 127 or 0 to 255|\n",
718 | "|unsigned char|1 byte|0 to 255|\n",
719 | "|signed char|1 byte|-128 to 127|\n",
720 | "|int|4 bytes|-2147483648 to 2147483647|\n",
721 | "|unsigned int|4 bytes|0 to 4294967295|\n",
722 | "|signed int|4 bytes|-2147483648 to 2147483647|\n",
723 | "|short int|2 bytes|-32768 to 32767|\n",
724 | "|unsigned short int|2 bytes|0 to 65,535|\n",
725 | "|signed short int|2 bytes|-32768 to 32767|\n",
726 | "|long int|4 or 8 bytes|-2,147,483,648 to 2,147,483,647 (4 bytes) or -(2^63) to (2^63)-1 (8 bytes)|\n",
727 | "|signed long int|4 or 8 bytes|same as long int|\n",
728 | "|unsigned long int|4 or 8 bytes|0 to 4,294,967,295 (4 bytes) or 0 to 18,446,744,073,709,551,615 (8 bytes)|\n",
729 | "|long long int|8 bytes|-(2^63) to (2^63)-1|\n",
730 | "|unsigned long long int|8 bytes|0 to 18,446,744,073,709,551,615|\n",
731 | "|float|4 bytes|about ±3.4e38, roughly 7 significant decimal digits|\n",
732 | "|double|8 bytes|about ±1.8e308, roughly 15 to 16 significant decimal digits|\n",
733 | "|long double|8, 12 or 16 bytes||\n",
734 | "|wchar_t|2 or 4 bytes|1 wide character|\n",
735 | "\n",
736 | "\n",
737 | "For example, the functions below give wrong results."
738 | ]
739 | },
740 | {
741 | "cell_type": "code",
742 | "execution_count": 20,
743 | "metadata": {},
744 | "outputs": [],
745 | "source": [
746 | "%%cython --cplus --compile-args=-stdlib=libc++ --link-args=-stdlib=libc++\n",
747 | "def sum_py(num1, num2):\n",
748 | "    print(\"The result by python is:\", num1 + num2)\n",
749 | "\n",
750 | "cdef int _sum_cpp(int num1, int num2):  # overflows a 4-byte int; to fix, change int to a wider type such as long long\n",
751 | "    return num1 + num2\n",
752 | "\n",
753 | "def sum_cpp(num1, num2):\n",
754 | "    print(\"The result by cpp is:\", _sum_cpp(num1, num2))"
755 | ]
756 | },
757 | {
758 | "cell_type": "code",
759 | "execution_count": 21,
760 | "metadata": {},
761 | "outputs": [
762 | {
763 | "name": "stdout",
764 | "output_type": "stream",
765 | "text": [
766 | "The result by python is: 2147483648\n",
767 | "The result by cpp is: -2147483648\n"
768 | ]
769 | }
770 | ],
771 | "source": [
772 | "sum_py(2**31-1, 1)\n",
773 | "sum_cpp(2**31-1, 1)"
774 | ]
775 | },
776 | {
777 | "cell_type": "code",
778 | "execution_count": 22,
779 | "metadata": {},
780 | "outputs": [],
781 | "source": [
782 | "%%cython --cplus --compile-args=-stdlib=libc++ --link-args=-stdlib=libc++\n",
783 | "from libcpp cimport bool\n",
784 | "\n",
785 | "def lt_py(num1, num2):\n",
786 | "    print(\"The result by python is:\", num1 < num2)\n",
787 | "\n",
788 | "cdef bool _lt_cpp(float num1, float num2):  # float cannot tell these two inputs apart; to fix, change float to double\n",
789 | "    return num1 < num2\n",
790 | "\n",
791 | "def lt_cpp(num1, num2):\n",
792 | "    print(\"The result by cpp is:\", _lt_cpp(num1, num2))"
793 | ]
794 | },
795 | {
796 | "cell_type": "code",
797 | "execution_count": 23,
798 | "metadata": {},
799 | "outputs": [
800 | {
801 | "name": "stdout",
802 | "output_type": "stream",
803 | "text": [
804 | "The result by python is: True\n",
805 | "The result by cpp is: False\n"
806 | ]
807 | }
808 | ],
809 | "source": [
810 | "lt_py(1234567890.0, 1234567891.0)\n",
811 | "lt_cpp(1234567890.0, 1234567891.0)"
812 | ]
813 | },
814 | {
815 | "cell_type": "code",
816 | "execution_count": null,
817 | "metadata": {},
818 | "outputs": [],
819 | "source": []
820 | }
821 | ],
822 | "metadata": {
823 | "kernelspec": {
824 | "display_name": "Python 3",
825 | "language": "python",
826 | "name": "python3"
827 | },
828 | "language_info": {
829 | "codemirror_mode": {
830 | "name": "ipython",
831 | "version": 3
832 | },
833 | "file_extension": ".py",
834 | "mimetype": "text/x-python",
835 | "name": "python",
836 | "nbconvert_exporter": "python",
837 | "pygments_lexer": "ipython3",
838 | "version": "3.6.6"
839 | }
840 | },
841 | "nbformat": 4,
842 | "nbformat_minor": 2
843 | }
844 |
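The two pitfalls from the numeric ranges section can also be reproduced in plain Python, without compiling any Cython. The sketch below is an approximation under stated assumptions: `add_as_int32` and `lt_as_float32` are hypothetical helper names, `ctypes.c_int32` stands in for a 4-byte C++ `int`, and `ctypes.c_float` stands in for a C++ `float`. Signed overflow is formally undefined behaviour in C++, so the wraparound here only mimics the result observed in the notebook.

```python
import ctypes


def add_as_int32(num1, num2):
    """Add two Python ints, then wrap the result to a signed 32-bit integer."""
    # ctypes integer types perform no overflow checking, so the value wraps.
    return ctypes.c_int32(num1 + num2).value


def lt_as_float32(num1, num2):
    """Compare two numbers after rounding each one to single precision."""
    return ctypes.c_float(num1).value < ctypes.c_float(num2).value


# 2**31 - 1 is the largest 4-byte signed int; adding 1 wraps to the minimum.
print(add_as_int32(2**31 - 1, 1))

# Near 1.2e9, adjacent float32 values are 128 apart, so both inputs
# round to the same number and the comparison is False.
print(ctypes.c_float(1234567890.0).value)
print(ctypes.c_float(1234567891.0).value)
print(lt_as_float32(1234567890.0, 1234567891.0))
```

With 64-bit integers or double precision both results match the Python versions again, which is what the `long int` and `double` hints in the Cython cells point at.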
--------------------------------------------------------------------------------
/Built-in method.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
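7 | "# Optimizing Python performance with built-in methods\n",
8 | "Author: tushushu  \n",
9 | "Project URL: https://github.com/tushushu/flying-python\n",
10 | "\n",
11 | "One possible reason a Python program runs slowly is that it does not use built-in methods wherever it can. The five examples below show how built-in methods can improve the performance of Python programs."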
12 | ]
13 | },
14 | {
15 | "cell_type": "markdown",
16 | "metadata": {},
17 | "source": [
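18 | "## 1. Sum of squares of an array\n",
19 | "Given a list, compute the sum of the squares of its numbers. The final version is about 1.4 times faster than the first (2.97 ms versus 2.06 ms)."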
20 | ]
21 | },
22 | {
23 | "cell_type": "markdown",
24 | "metadata": {},
25 | "source": [
26 | "First, create a list of length 10000."
27 | ]
28 | },
29 | {
30 | "cell_type": "code",
31 | "execution_count": 1,
32 | "metadata": {},
33 | "outputs": [],
34 | "source": [
35 | "arr = list(range(10000))"
36 | ]
37 | },
38 | {
39 | "cell_type": "markdown",
40 | "metadata": {},
41 | "source": [
42 | "### 1.1 The most conventional approach\n",
43 | "Traverse the list with a while loop and accumulate the squares. Mean run time: 2.97 ms."
44 | ]
45 | },
46 | {
47 | "cell_type": "code",
48 | "execution_count": 2,
49 | "metadata": {},
50 | "outputs": [],
51 | "source": [
52 | "def sum_sqr_0(arr):\n",
53 | "    res = 0\n",
54 | "    n = len(arr)\n",
55 | "    i = 0\n",
56 | "    while i < n:\n",
57 | "        res += arr[i] ** 2\n",
58 | "        i += 1\n",
59 | "    return res"
60 | ]
61 | },
62 | {
63 | "cell_type": "code",
64 | "execution_count": 3,
65 | "metadata": {},
66 | "outputs": [
67 | {
68 | "name": "stdout",
69 | "output_type": "stream",
70 | "text": [
71 | "2.97 ms ± 36.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n"
72 | ]
73 | }
74 | ],
75 | "source": [
76 | "%timeit sum_sqr_0(arr)"
77 | ]
78 | },
79 | {
80 | "cell_type": "markdown",
81 | "metadata": {},
82 | "source": [
83 | "### 1.2 Replace the while loop with for range\n",
84 | "This avoids the overhead of the type checks performed for i += 1. Mean run time: 2.9 ms."
85 | ]
86 | },
87 | {
88 | "cell_type": "code",
89 | "execution_count": 4,
90 | "metadata": {},
91 | "outputs": [],
92 | "source": [
93 | "def sum_sqr_1(arr):\n",
94 | "    res = 0\n",
95 | "    for i in range(len(arr)):\n",
96 | "        res += arr[i] ** 2\n",
97 | "    return res"
98 | ]
99 | },
100 | {
101 | "cell_type": "code",
102 | "execution_count": 5,
103 | "metadata": {},
104 | "outputs": [
105 | {
106 | "name": "stdout",
107 | "output_type": "stream",
108 | "text": [
109 | "2.9 ms ± 137 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n"
110 | ]
111 | }
112 | ],
113 | "source": [
114 | "%timeit sum_sqr_1(arr)"
115 | ]
116 | },
117 | {
118 | "cell_type": "markdown",
119 | "metadata": {},
120 | "source": [
121 | "### 1.3 Replace for range with for x in arr\n",
122 | "This avoids the overhead of the type checks performed for arr[i]. Mean run time: 2.59 ms."
123 | ]
124 | },
125 | {
126 | "cell_type": "code",
127 | "execution_count": 6,
128 | "metadata": {},
129 | "outputs": [],
130 | "source": [
131 | "def sum_sqr_2(arr):\n",
132 | "    res = 0\n",
133 | "    for x in arr:\n",
134 | "        res += x ** 2\n",
135 | "    return res"
136 | ]
137 | },
138 | {
139 | "cell_type": "code",
140 | "execution_count": 7,
141 | "metadata": {},
142 | "outputs": [
143 | {
144 | "name": "stdout",
145 | "output_type": "stream",
146 | "text": [
147 | "2.59 ms ± 89 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n"
148 | ]
149 | }
150 | ],
151 | "source": [
152 | "%timeit sum_sqr_2(arr)"
153 | ]
154 | },
155 | {
156 | "cell_type": "markdown",
157 | "metadata": {},
158 | "source": [
159 | "### 1.4 sum with map\n",
160 | "Mean run time: 2.36 ms."
161 | ]
162 | },
163 | {
164 | "cell_type": "code",
165 | "execution_count": 8,
166 | "metadata": {},
167 | "outputs": [],
168 | "source": [
169 | "def sum_sqr_3(arr):\n",
170 | "    return sum(map(lambda x: x**2, arr))"
171 | ]
172 | },
173 | {
174 | "cell_type": "code",
175 | "execution_count": 9,
176 | "metadata": {},
177 | "outputs": [
178 | {
179 | "name": "stdout",
180 | "output_type": "stream",
181 | "text": [
182 | "2.36 ms ± 15.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n"
183 | ]
184 | }
185 | ],
186 | "source": [
187 | "%timeit sum_sqr_3(arr)"
188 | ]
189 | },
190 | {
191 | "cell_type": "markdown",
192 | "metadata": {},
193 | "source": [
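194 | "### 1.5 sum with a generator expression\n",
195 | "When a generator expression is the sole argument of a function call, its own parentheses can be omitted. Average run time: 2.35 ms."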
196 | ]
197 | },
198 | {
199 | "cell_type": "code",
200 | "execution_count": 10,
201 | "metadata": {},
202 | "outputs": [],
203 | "source": [
204 | "def sum_sqr_4(arr):\n",
205 | "    return sum(x ** 2 for x in arr)"
206 | ]
207 | },
208 | {
209 | "cell_type": "code",
210 | "execution_count": 11,
211 | "metadata": {},
212 | "outputs": [
213 | {
214 | "name": "stdout",
215 | "output_type": "stream",
216 | "text": [
217 | "2.35 ms ± 107 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n"
218 | ]
219 | }
220 | ],
221 | "source": [
222 | "%timeit sum_sqr_4(arr)"
223 | ]
224 | },
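The parenthesis rule mentioned in 1.5 can be illustrated with a small sketch. The list `values` and the start value `10` are made-up inputs for illustration and are not part of the notebook's benchmark.

```python
values = [1, 2, 3]

# Sole argument: the generator expression needs no parentheses of its own.
total = sum(x ** 2 for x in values)

# A second argument is present, so the generator expression must be parenthesized.
total_with_start = sum((x ** 2 for x in values), 10)

print(total)             # 14
print(total_with_start)  # 24
```

Writing `sum(x ** 2 for x in values, 10)` without the inner parentheses is a syntax error.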
225 | {
226 | "cell_type": "markdown",
227 | "metadata": {},
228 | "source": [
229 | "### 1.6 sum with a list comprehension\n",
230 | "Average run time: 2.06 ms."
231 | ]
232 | },
233 | {
234 | "cell_type": "code",
235 | "execution_count": 12,
236 | "metadata": {},
237 | "outputs": [],
238 | "source": [
239 | "def sum_sqr_5(arr):\n",
240 | "    return sum([x ** 2 for x in arr])"
241 | ]
242 | },
243 | {
244 | "cell_type": "code",
245 | "execution_count": 13,
246 | "metadata": {},
247 | "outputs": [
248 | {
249 | "name": "stdout",
250 | "output_type": "stream",
251 | "text": [
252 | "2.06 ms ± 27.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n"
253 | ]
254 | }
255 | ],
256 | "source": [
257 | "%timeit sum_sqr_5(arr)"
258 | ]
259 | },
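Outside IPython, where `%timeit` is unavailable, a comparison like the ones above might be reproduced with the standard-library `timeit` module. This is a minimal sketch: the input list is an assumed stand-in for the notebook's `arr`, the function names are illustrative, and absolute timings will differ by machine and Python version.

```python
import timeit

# Assumed input: a stand-in for the notebook's `arr`.
arr = list(range(10000))


def sum_sqr_loop(arr):
    res = 0
    for x in arr:
        res += x ** 2
    return res


def sum_sqr_listcomp(arr):
    return sum([x ** 2 for x in arr])


# Both variants must agree before their speed is compared.
assert sum_sqr_loop(arr) == sum_sqr_listcomp(arr)

for fn in (sum_sqr_loop, sum_sqr_listcomp):
    # Best of 3 repeats, 20 calls per repeat, reported per call.
    best = min(timeit.repeat(lambda: fn(arr), number=20, repeat=3)) / 20
    print(f"{fn.__name__}: {best * 1e3:.3f} ms per call")
```

Taking the minimum of the repeats reduces the influence of background noise on the reported figure.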
260 | {
261 | "cell_type": "markdown",
262 | "metadata": {},
263 | "source": [
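264 | "## 2. String concatenation\n",
265 | "Given a list of strings, join the first 3 characters of every string into a single string. The fastest version is more than twice as fast as the baseline (1.86 ms down to 858 µs)."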
266 | ]
267 | },
268 | {
269 | "cell_type": "markdown",
270 | "metadata": {},
271 | "source": [
272 | "First, create a list of 10,000 strings with random lengths and contents."
273 | ]
274 | },
275 | {
276 | "cell_type": "code",
277 | "execution_count": 1,
278 | "metadata": {},
279 | "outputs": [],
280 | "source": [
281 | "from random import randint\n",
282 | "\n",
283 | "def random_letter():\n",
284 | "    return chr(ord('a') + randint(0, 25))\n",
285 | "\n",
286 | "def random_letters(n):\n",
287 | "    return \"\".join([random_letter() for _ in range(n)])\n",
288 | "\n",
289 | "strings = [random_letters(randint(1, 10)) for _ in range(10000)]"
290 | ]
291 | },
292 | {
293 | "cell_type": "markdown",
294 | "metadata": {},
295 | "source": [
296 | "### 2.1 The most conventional approach\n",
297 | "Traverse the list with a while loop and concatenate the strings one by one. Average run time: 1.86 ms."
298 | ]
299 | },
300 | {
301 | "cell_type": "code",
302 | "execution_count": 2,
303 | "metadata": {},
304 | "outputs": [],
305 | "source": [
306 | "def concat_strings_0(strings):\n",
307 | "    res = \"\"\n",
308 | "    n = len(strings)\n",
309 | "    i = 0\n",
310 | "    while i < n:\n",
311 | "        res += strings[i][:3]\n",
312 | "        i += 1\n",
313 | "    return res"
314 | ]
315 | },
316 | {
317 | "cell_type": "code",
318 | "execution_count": 3,
319 | "metadata": {},
320 | "outputs": [
321 | {
322 | "name": "stdout",
323 | "output_type": "stream",
324 | "text": [
325 | "1.86 ms ± 74.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n"
326 | ]
327 | }
328 | ],
329 | "source": [
330 | "%timeit concat_strings_0(strings)"
331 | ]
332 | },
333 | {
334 | "cell_type": "markdown",
335 | "metadata": {},
336 | "source": [
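337 | "### 2.2 Replace the while loop with for range\n",
338 | "`range` produces the indices in C, which avoids executing the `i < n` comparison and the `i += 1` increment as Python bytecode on every iteration. Average run time: 1.55 ms."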
339 | ]
340 | },
341 | {
342 | "cell_type": "code",
343 | "execution_count": 4,
344 | "metadata": {},
345 | "outputs": [],
346 | "source": [
347 | "def concat_strings_1(strings):\n",
348 | "    res = \"\"\n",
349 | "    for i in range(len(strings)):\n",
350 | "        res += strings[i][:3]\n",
351 | "    return res"
352 | ]
353 | },
354 | {
355 | "cell_type": "code",
356 | "execution_count": 5,
357 | "metadata": {},
358 | "outputs": [
359 | {
360 | "name": "stdout",
361 | "output_type": "stream",
362 | "text": [
363 | "1.55 ms ± 32.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n"
364 | ]
365 | }
366 | ],
367 | "source": [
368 | "%timeit concat_strings_1(strings)"
369 | ]
370 | },
371 | {
372 | "cell_type": "markdown",
373 | "metadata": {},
374 | "source": [
375 | "### 2.3 Replace for range with for x in strings\n",
376 | "Iterating over the strings directly avoids the per-iteration cost of the `strings[i]` index lookup. Average run time: 1.32 ms."
377 | ]
378 | },
379 | {
380 | "cell_type": "code",
381 | "execution_count": 6,
382 | "metadata": {},
383 | "outputs": [],
384 | "source": [
385 | "def concat_strings_2(strings):\n",
386 | "    res = \"\"\n",
387 | "    for x in strings:\n",
388 | "        res += x[:3]\n",
389 | "    return res"
390 | ]
391 | },
392 | {
393 | "cell_type": "code",
394 | "execution_count": 7,
395 | "metadata": {},
396 | "outputs": [
397 | {
398 | "name": "stdout",
399 | "output_type": "stream",
400 | "text": [
401 | "1.32 ms ± 19.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n"
402 | ]
403 | }
404 | ],
405 | "source": [
406 | "%timeit concat_strings_2(strings)"
407 | ]
408 | },
409 | {
410 | "cell_type": "markdown",
411 | "metadata": {},
412 | "source": [
413 | "### 2.4 .join with a generator expression\n",
414 | "Average run time: 1.06 ms."
415 | ]
416 | },
417 | {
418 | "cell_type": "code",
419 | "execution_count": 8,
420 | "metadata": {},
421 | "outputs": [],
422 | "source": [
423 | "def concat_strings_3(strings):\n",
424 | "    return \"\".join(x[:3] for x in strings)"
425 | ]
426 | },
427 | {
428 | "cell_type": "code",
429 | "execution_count": 9,
430 | "metadata": {},
431 | "outputs": [
432 | {
433 | "name": "stdout",
434 | "output_type": "stream",
435 | "text": [
436 | "1.06 ms ± 15.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n"
437 | ]
438 | }
439 | ],
440 | "source": [
441 | "%timeit concat_strings_3(strings)"
442 | ]
443 | },
444 | {
445 | "cell_type": "markdown",
446 | "metadata": {},
447 | "source": [
448 | "### 2.5 .join with a list comprehension\n",
449 | "Average run time: 0.86 ms."
450 | ]
451 | },
452 | {
453 | "cell_type": "code",
454 | "execution_count": 10,
455 | "metadata": {},
456 | "outputs": [],
457 | "source": [
458 | "def concat_strings_4(strings):\n",
459 | "    return \"\".join([x[:3] for x in strings])"
460 | ]
461 | },
462 | {
463 | "cell_type": "code",
464 | "execution_count": 11,
465 | "metadata": {},
466 | "outputs": [
467 | {
468 | "name": "stdout",
469 | "output_type": "stream",
470 | "text": [
471 | "858 µs ± 14.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n"
472 | ]
473 | }
474 | ],
475 | "source": [
476 | "%timeit concat_strings_4(strings)"
477 | ]
478 | },
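Before comparing speed, it helps to confirm that the loop-based and `.join`-based versions build the same string. The sketch below uses a tiny hand-written list, `sample`, instead of the random strings above; the function names are illustrative.

```python
def concat_loop(strings):
    # Repeated += on a str, as in the loop-based versions above.
    res = ""
    for x in strings:
        res += x[:3]
    return res


def concat_join(strings):
    # Collect the slices first, then join them in a single call.
    return "".join([x[:3] for x in strings])


sample = ["apple", "hi", "banana"]
print(concat_loop(sample))  # apphiban
print(concat_join(sample))  # apphiban
```

Strings shorter than 3 characters, such as `"hi"`, are taken whole, because slicing past the end of a string does not raise an error.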
479 | {
480 | "cell_type": "markdown",
481 | "metadata": {},
482 | "source": [
483 | "## 3. Filtering odd numbers"
484 | ]
485 | },
486 | {
487 | "cell_type": "markdown",
488 | "metadata": {},
489 | "source": [
490 | "Given a list, select all the odd numbers in it. The fastest version is about 3.6 times as fast as the baseline."
491 | ]
492 | },
493 | {
494 | "cell_type": "markdown",
495 | "metadata": {},
496 | "source": [
497 | "First, create a list of length 10,000."
498 | ]
499 | },
500 | {
501 | "cell_type": "code",
502 | "execution_count": 26,
503 | "metadata": {},
504 | "outputs": [],
505 | "source": [
506 | "arr = list(range(10000))"
507 | ]
508 | },
509 | {
510 | "cell_type": "markdown",
511 | "metadata": {},
512 | "source": [
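513 | "### 3.1 The most conventional approach\n",
514 | "Create an empty list `res`, traverse the input with a while loop, and append each odd number to `res`. Average run time: 1.03 ms."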
515 | ]
516 | },
517 | {
518 | "cell_type": "code",
519 | "execution_count": 27,
520 | "metadata": {},
521 | "outputs": [],
522 | "source": [
523 | "def filter_odd_0(arr):\n",
524 | "    res = []\n",
525 | "    i = 0\n",
526 | "    n = len(arr)\n",
527 | "    while i < n:\n",
528 | "        if arr[i] % 2:\n",
529 | "            res.append(arr[i])\n",
530 | "        i += 1\n",
531 | "    return res"
532 | ]
533 | },
534 | {
535 | "cell_type": "code",
536 | "execution_count": 28,
537 | "metadata": {},
538 | "outputs": [
539 | {
540 | "name": "stdout",
541 | "output_type": "stream",
542 | "text": [
543 | "1.03 ms ± 34.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n"
544 | ]
545 | }
546 | ],
547 | "source": [
548 | "%timeit filter_odd_0(arr)"
549 | ]
550 | },
551 | {
552 | "cell_type": "markdown",
553 | "metadata": {},
554 | "source": [
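555 | "### 3.2 Replace the while loop with for range\n",
556 | "`range` produces the indices in C, which avoids executing the `i < n` comparison and the `i += 1` increment as Python bytecode on every iteration. Average run time: 0.965 ms."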
557 | ]
558 | },
559 | {
560 | "cell_type": "code",
561 | "execution_count": 29,
562 | "metadata": {},
563 | "outputs": [],
564 | "source": [
565 | "def filter_odd_1(arr):\n",
566 | "    res = []\n",
567 | "    for i in range(len(arr)):\n",
568 | "        if arr[i] % 2:\n",
569 | "            res.append(arr[i])\n",
571 | "    return res"
572 | ]
573 | },
574 | {
575 | "cell_type": "code",
576 | "execution_count": 30,
577 | "metadata": {},
578 | "outputs": [
579 | {
580 | "name": "stdout",
581 | "output_type": "stream",
582 | "text": [
583 | "965 µs ± 4.02 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n"
584 | ]
585 | }
586 | ],
587 | "source": [
588 | "%timeit filter_odd_1(arr)"
589 | ]
590 | },
591 | {
592 | "cell_type": "markdown",
593 | "metadata": {},
594 | "source": [
595 | "### 3.3 Replace for range with for x in arr\n",
596 | "Iterating over the elements directly avoids the per-iteration cost of the `arr[i]` index lookup. Average run time: 0.430 ms."
597 | ]
598 | },
599 | {
600 | "cell_type": "code",
601 | "execution_count": 31,
602 | "metadata": {},
603 | "outputs": [],
604 | "source": [
605 | "def filter_odd_2(arr):\n",
606 | "    res = []\n",
607 | "    for x in arr:\n",
608 | "        if x % 2:\n",
609 | "            res.append(x)\n",
610 | "    return res"
611 | ]
612 | },
613 | {
614 | "cell_type": "code",
615 | "execution_count": 32,
616 | "metadata": {},
617 | "outputs": [
618 | {
619 | "name": "stdout",
620 | "output_type": "stream",
621 | "text": [
622 | "430 µs ± 9.25 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n"
623 | ]
624 | }
625 | ],
626 | "source": [
627 | "%timeit filter_odd_2(arr)"
628 | ]
629 | },
630 | {
631 | "cell_type": "markdown",
632 | "metadata": {},
633 | "source": [
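634 | "### 3.4 list with filter\n",
635 | "Average run time: 0.763 ms. Note that `filter` with a lambda is slow, because every element costs a Python-level function call; in Python 3.6 it is slower than the plain for loop above."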
636 | ]
637 | },
638 | {
639 | "cell_type": "code",
640 | "execution_count": 33,
641 | "metadata": {},
642 | "outputs": [],
643 | "source": [
644 | "def filter_odd_3(arr):\n",
645 | "    return list(filter(lambda x: x % 2, arr))"
646 | ]
647 | },
648 | {
649 | "cell_type": "code",
650 | "execution_count": 34,
651 | "metadata": {},
652 | "outputs": [
653 | {
654 | "name": "stdout",
655 | "output_type": "stream",
656 | "text": [
657 | "763 µs ± 15.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n"
658 | ]
659 | }
660 | ],
661 | "source": [
662 | "%timeit filter_odd_3(arr)"
663 | ]
664 | },
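The cost noted in 3.4 comes from calling a Python-level lambda once per element. As a hedged illustration, the sketch below puts the lambda version next to a comprehension and a variant that passes the bound method `(1).__and__` to `filter`, so the predicate is not a Python function. That last variant is an illustrative trick, not something the notebook measures, and it only works for integers.

```python
def filter_odd_lambda(arr):
    return list(filter(lambda x: x % 2, arr))


def filter_odd_bound(arr):
    # (1).__and__ is the bound method int.__and__ of the integer 1,
    # so calling it with x computes 1 & x: 1 for odd ints, 0 for even ints.
    return list(filter((1).__and__, arr))


def filter_odd_comp(arr):
    return [x for x in arr if x % 2]


sample = list(range(10))
assert filter_odd_lambda(sample) == filter_odd_bound(sample) == filter_odd_comp(sample)
print(filter_odd_comp(sample))  # [1, 3, 5, 7, 9]
```

The comprehension stays the most readable of the three, and it is also the fastest variant measured in this section.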
665 | {
666 | "cell_type": "markdown",
667 | "metadata": {},
668 | "source": [
669 | "### 3.5 list with a generator expression\n",
670 | "Average run time: 0.398 ms."
671 | ]
672 | },
673 | {
674 | "cell_type": "code",
675 | "execution_count": 35,
676 | "metadata": {},
677 | "outputs": [],
678 | "source": [
679 | "def filter_odd_4(arr):\n",
680 | "    return list(x for x in arr if x % 2)"
681 | ]
682 | },
683 | {
684 | "cell_type": "code",
685 | "execution_count": 36,
686 | "metadata": {},
687 | "outputs": [
688 | {
689 | "name": "stdout",
690 | "output_type": "stream",
691 | "text": [
692 | "398 µs ± 16.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n"
693 | ]
694 | }
695 | ],
696 | "source": [
697 | "%timeit filter_odd_4(arr)"
698 | ]
699 | },
700 | {
701 | "cell_type": "markdown",
702 | "metadata": {},
703 | "source": [
704 | "### 3.6 List comprehension with a condition\n",
705 | "Average run time: 0.290 ms."
706 | ]
707 | },
708 | {
709 | "cell_type": "code",
710 | "execution_count": 37,
711 | "metadata": {},
712 | "outputs": [],
713 | "source": [
714 | "def filter_odd_5(arr):\n",
715 | "    return [x for x in arr if x % 2]"
716 | ]
717 | },
718 | {
719 | "cell_type": "code",
720 | "execution_count": 38,
721 | "metadata": {},
722 | "outputs": [
723 | {
724 | "name": "stdout",
725 | "output_type": "stream",
726 | "text": [
727 | "290 µs ± 5.54 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n"
728 | ]
729 | }
730 | ],
731 | "source": [
732 | "%timeit filter_odd_5(arr)"
733 | ]
734 | },
735 | {
736 | "cell_type": "markdown",
737 | "metadata": {},
738 | "source": [
739 | "## 4. Adding two arrays"
740 | ]
741 | },
742 | {
743 | "cell_type": "markdown",
744 | "metadata": {},
745 | "source": [
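746 | "Given two lists of equal length, compute the sum of the numbers at each position and return a list of the same length as the inputs. The fastest version is about 2.7 times as fast as the baseline."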
747 | ]
748 | },
749 | {
750 | "cell_type": "markdown",
751 | "metadata": {},
752 | "source": [
753 | "First, create two lists of length 10,000."
754 | ]
755 | },
756 | {
757 | "cell_type": "code",
758 | "execution_count": 40,
759 | "metadata": {},
760 | "outputs": [],
761 | "source": [
762 | "arr1 = list(range(10000))\n",
763 | "arr2 = list(range(10000))"
764 | ]
765 | },
766 | {
767 | "cell_type": "markdown",
768 | "metadata": {},
769 | "source": [
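770 | "### 4.1 The most conventional approach\n",
771 | "Create an empty list `res`, traverse the lists with a while loop, and append the sum of each pair of corresponding elements to `res`. Average run time: 1.23 ms."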
772 | ]
773 | },
774 | {
775 | "cell_type": "code",
776 | "execution_count": 41,
777 | "metadata": {},
778 | "outputs": [],
779 | "source": [
780 | "def arr_sum_0(arr1, arr2):\n",
781 | "    i = 0\n",
782 | "    n = len(arr1)\n",
783 | "    res = []\n",
784 | "    while i < n:\n",
785 | "        res.append(arr1[i] + arr2[i])\n",
786 | "        i += 1\n",
787 | "    return res"
788 | ]
789 | },
790 | {
791 | "cell_type": "code",
792 | "execution_count": 42,
793 | "metadata": {},
794 | "outputs": [
795 | {
796 | "name": "stdout",
797 | "output_type": "stream",
798 | "text": [
799 | "1.23 ms ± 3.77 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n"
800 | ]
801 | }
802 | ],
803 | "source": [
804 | "%timeit arr_sum_0(arr1, arr2)"
805 | ]
806 | },
807 | {
808 | "cell_type": "markdown",
809 | "metadata": {},
810 | "source": [
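811 | "### 4.2 Replace the while loop with for range\n",
812 | "`range` produces the indices in C, which avoids executing the `i < n` comparison and the `i += 1` increment as Python bytecode on every iteration. Average run time: 0.997 ms."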
813 | ]
814 | },
815 | {
816 | "cell_type": "code",
817 | "execution_count": 43,
818 | "metadata": {},
819 | "outputs": [],
820 | "source": [
821 | "def arr_sum_1(arr1, arr2):\n",
822 | "    res = []\n",
823 | "    for i in range(len(arr1)):\n",
824 | "        res.append(arr1[i] + arr2[i])\n",
825 | "    return res"
826 | ]
827 | },
828 | {
829 | "cell_type": "code",
830 | "execution_count": 44,
831 | "metadata": {},
832 | "outputs": [
833 | {
834 | "name": "stdout",
835 | "output_type": "stream",
836 | "text": [
837 | "997 µs ± 7.42 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n"
838 | ]
839 | }
840 | ],
841 | "source": [
842 | "%timeit arr_sum_1(arr1, arr2)"
843 | ]
844 | },
845 | {
846 | "cell_type": "markdown",
847 | "metadata": {},
848 | "source": [
849 | "### 4.3 Replace for range with for i, x in enumerate\n",
850 | "This avoids part of the index lookups: `arr2[i]` is no longer needed, although `res[i]` still is. Average run time: 0.799 ms."
851 | ]
852 | },
853 | {
854 | "cell_type": "code",
855 | "execution_count": 45,
856 | "metadata": {},
857 | "outputs": [],
858 | "source": [
859 | "def arr_sum_2(arr1, arr2):\n",
860 | "    res = arr1.copy()\n",
861 | "    for i, x in enumerate(arr2):\n",
862 | "        res[i] += x\n",
863 | "    return res"
864 | ]
865 | },
866 | {
867 | "cell_type": "code",
868 | "execution_count": 46,
869 | "metadata": {},
870 | "outputs": [
871 | {
872 | "name": "stdout",
873 | "output_type": "stream",
874 | "text": [
875 | "799 µs ± 16.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n"
876 | ]
877 | }
878 | ],
879 | "source": [
880 | "%timeit arr_sum_2(arr1, arr2)"
881 | ]
882 | },
883 | {
884 | "cell_type": "markdown",
885 | "metadata": {},
886 | "source": [
887 | "### 4.4 Replace for range with for x, y in zip\n",
888 | "Iterating over both lists in parallel avoids the `arr1[i]` and `arr2[i]` index lookups entirely. Average run time: 0.769 ms."
889 | ]
890 | },
891 | {
892 | "cell_type": "code",
893 | "execution_count": 47,
894 | "metadata": {},
895 | "outputs": [],
896 | "source": [
897 | "def arr_sum_3(arr1, arr2):\n",
898 | "    res = []\n",
899 | "    for x, y in zip(arr1, arr2):\n",
900 | "        res.append(x + y)\n",
901 | "    return res"
902 | ]
903 | },
904 | {
905 | "cell_type": "code",
906 | "execution_count": 48,
907 | "metadata": {},
908 | "outputs": [
909 | {
910 | "name": "stdout",
911 | "output_type": "stream",
912 | "text": [
913 | "769 µs ± 12.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n"
914 | ]
915 | }
916 | ],
917 | "source": [
918 | "%timeit arr_sum_3(arr1, arr2)"
919 | ]
920 | },
921 | {
922 | "cell_type": "markdown",
923 | "metadata": {},
924 | "source": [
925 | "### 4.5 List comprehension with zip\n",
926 | "Average run time: 0.462 ms."
927 | ]
928 | },
929 | {
930 | "cell_type": "code",
931 | "execution_count": 49,
932 | "metadata": {},
933 | "outputs": [],
934 | "source": [
935 | "def arr_sum_4(arr1, arr2):\n",
936 | "    return [x + y for x, y in zip(arr1, arr2)]"
937 | ]
938 | },
939 | {
940 | "cell_type": "code",
941 | "execution_count": 50,
942 | "metadata": {},
943 | "outputs": [
944 | {
945 | "name": "stdout",
946 | "output_type": "stream",
947 | "text": [
948 | "462 µs ± 3.43 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n"
949 | ]
950 | }
951 | ],
952 | "source": [
953 | "%timeit arr_sum_4(arr1, arr2)"
954 | ]
955 | },
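One detail of the `zip`-based versions is worth keeping in mind: `zip` stops at the end of the shorter input, so lists of unequal length are truncated silently instead of raising an error. The sketch below shows this on small made-up inputs and adds a `map(operator.add, ...)` variant for comparison. That variant is an assumption of this sketch, not one of the notebook's measured versions, and its speed is not claimed here.

```python
from operator import add


def arr_sum_zip(arr1, arr2):
    return [x + y for x, y in zip(arr1, arr2)]


def arr_sum_map(arr1, arr2):
    # map with two iterables calls add(x, y) for each pair of elements.
    return list(map(add, arr1, arr2))


print(arr_sum_zip([1, 2, 3], [10, 20, 30]))  # [11, 22, 33]
print(arr_sum_map([1, 2, 3], [10, 20, 30]))  # [11, 22, 33]

# Unequal lengths: the extra element of the longer list is dropped.
print(arr_sum_zip([1, 2, 3], [10, 20]))      # [11, 22]
```

The index-based versions behave differently here: with a shorter second list they raise `IndexError` rather than truncating.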
956 | {
957 | "cell_type": "markdown",
958 | "metadata": {},
959 | "source": [
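960 | "## 5. Number of common elements in two lists\n",
961 | "Given two lists, count the elements that appear in both. The elements within each list are unique. The fastest version is about 5,000 times as fast as the baseline."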
962 | ]
963 | },
964 | {
965 | "cell_type": "markdown",
966 | "metadata": {},
967 | "source": [
968 | "First, create two lists and shuffle their elements."
969 | ]
970 | },
971 | {
972 | "cell_type": "code",
973 | "execution_count": 51,
974 | "metadata": {},
975 | "outputs": [],
976 | "source": [
977 | "from random import shuffle\n",
978 | "arr1 = list(range(2000))\n",
979 | "shuffle(arr1)\n",
980 | "arr2 = list(range(1000, 3000))\n",
981 | "shuffle(arr2)"
982 | ]
983 | },
984 | {
985 | "cell_type": "markdown",
986 | "metadata": {},
987 | "source": [
988 | "### 5.1 The most conventional approach\n",
989 | "Nested while loops that test whether `arr1[i]` equals `arr2[j]`. Average run time: 338 ms."
990 | ]
991 | },
992 | {
993 | "cell_type": "code",
994 | "execution_count": 52,
995 | "metadata": {},
996 | "outputs": [],
997 | "source": [
998 | "def n_common_0(arr1, arr2):\n",
999 | "    res = 0\n",
1000 | "    i = 0\n",
1001 | "    m = len(arr1)\n",
1002 | "    n = len(arr2)\n",
1003 | "    while i < m:\n",
1004 | "        j = 0\n",
1005 | "        while j < n:\n",
1006 | "            if arr1[i] == arr2[j]:\n",
1007 | "                res += 1\n",
1008 | "            j += 1\n",
1009 | "        i += 1\n",
1010 | "    return res"
1011 | ]
1012 | },
1013 | {
1014 | "cell_type": "code",
1015 | "execution_count": 53,
1016 | "metadata": {},
1017 | "outputs": [
1018 | {
1019 | "name": "stdout",
1020 | "output_type": "stream",
1021 | "text": [
1022 | "338 ms ± 7.81 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
1023 | ]
1024 | }
1025 | ],
1026 | "source": [
1027 | "%timeit n_common_0(arr1, arr2)"
1028 | ]
1029 | },
1030 | {
1031 | "cell_type": "markdown",
1032 | "metadata": {},
1033 | "source": [
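1034 | "### 5.2 Replace the while loops with for range\n",
1035 | "`range` produces the indices in C, which avoids executing the loop comparisons and the `i += 1` and `j += 1` increments as Python bytecode. Average run time: 233 ms."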
1036 | ]
1037 | },
1038 | {
1039 | "cell_type": "code",
1040 | "execution_count": 54,
1041 | "metadata": {},
1042 | "outputs": [],
1043 | "source": [
1044 | "def n_common_1(arr1, arr2):\n",
1045 | "    res = 0\n",
1046 | "    for i in range(len(arr1)):\n",
1047 | "        for j in range(len(arr2)):\n",
1048 | "            if arr1[i] == arr2[j]:\n",
1049 | "                res += 1\n",
1050 | "    return res"
1051 | ]
1052 | },
1053 | {
1054 | "cell_type": "code",
1055 | "execution_count": 55,
1056 | "metadata": {},
1057 | "outputs": [
1058 | {
1059 | "name": "stdout",
1060 | "output_type": "stream",
1061 | "text": [
1062 | "233 ms ± 10.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
1063 | ]
1064 | }
1065 | ],
1066 | "source": [
1067 | "%timeit n_common_1(arr1, arr2)"
1068 | ]
1069 | },
1070 | {
1071 | "cell_type": "markdown",
1072 | "metadata": {},
1073 | "source": [
1074 | "### 5.3 Replace for range with for x in arr\n",
1075 | "Iterating over the elements directly avoids the `arr1[i]` and `arr2[j]` index lookups. Average run time: 84.8 ms."
1076 | ]
1077 | },
1078 | {
1079 | "cell_type": "code",
1080 | "execution_count": 56,
1081 | "metadata": {},
1082 | "outputs": [],
1083 | "source": [
1084 | "def n_common_2(arr1, arr2):\n",
1085 | "    res = 0\n",
1086 | "    for x in arr1:\n",
1087 | "        for y in arr2:\n",
1088 | "            if x == y:\n",
1089 | "                res += 1\n",
1090 | "    return res"
1091 | ]
1092 | },
1093 | {
1094 | "cell_type": "code",
1095 | "execution_count": 57,
1096 | "metadata": {},
1097 | "outputs": [
1098 | {
1099 | "name": "stdout",
1100 | "output_type": "stream",
1101 | "text": [
1102 | "84.8 ms ± 1.38 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)\n"
1103 | ]
1104 | }
1105 | ],
1106 | "source": [
1107 | "%timeit n_common_2(arr1, arr2)"
1108 | ]
1109 | },
1110 | {
1111 | "cell_type": "markdown",
1112 | "metadata": {},
1113 | "source": [
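1114 | "### 5.4 Replace the inner loop with if x in arr2\n",
1115 | "The membership test still scans the list element by element, but the scan now runs in C. Average run time: 24.9 ms."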
1116 | ]
1117 | },
1118 | {
1119 | "cell_type": "code",
1120 | "execution_count": 58,
1121 | "metadata": {},
1122 | "outputs": [],
1123 | "source": [
1124 | "def n_common_3(arr1, arr2):\n",
1125 | "    res = 0\n",
1126 | "    for x in arr1:\n",
1127 | "        if x in arr2:\n",
1128 | "            res += 1\n",
1129 | "    return res"
1130 | ]
1131 | },
1132 | {
1133 | "cell_type": "code",
1134 | "execution_count": 59,
1135 | "metadata": {},
1136 | "outputs": [
1137 | {
1138 | "name": "stdout",
1139 | "output_type": "stream",
1140 | "text": [
1141 | "24.9 ms ± 1.39 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)\n"
1142 | ]
1143 | }
1144 | ],
1145 | "source": [
1146 | "%timeit n_common_3(arr1, arr2)"
1147 | ]
1148 | },
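The `x in arr2` test above is still a linear scan when `arr2` is a list. A minimal sketch of one way to remove that scan is to build a set from `arr2` once and test membership against it, which takes constant time on average. The function name `n_common_set_lookup` is illustrative and this variant is not one of the notebook's timed versions.

```python
def n_common_set_lookup(arr1, arr2):
    lookup = set(arr2)  # built once, in time proportional to len(arr2)
    res = 0
    for x in arr1:
        if x in lookup:  # average constant-time hash lookup
            res += 1
    return res


# Same value ranges as the notebook's inputs (left unshuffled here,
# since the order does not affect the count).
arr1 = list(range(2000))
arr2 = list(range(1000, 3000))
print(n_common_set_lookup(arr1, arr2))  # 1000
```

The loop structure is unchanged; only the container used for the membership test differs.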
1149 | {
1150 | "cell_type": "markdown",
1151 | "metadata": {},
1152 | "source": [
1153 | "### 5.5 Use a faster algorithm\n",
1154 | "Sort both arrays with the `.sort` method, then walk through them in a single loop. This lowers the time complexity from O(n^2) to O(n log n). Average run time: 0.329 ms. Note that `.sort` sorts the input lists in place, so repeated timing runs operate on lists that are already sorted."
1155 | ]
1156 | },
1157 | {
1158 | "cell_type": "code",
1159 | "execution_count": 60,
1160 | "metadata": {},
1161 | "outputs": [],
1162 | "source": [
1163 | "def n_common_4(arr1, arr2):\n",
1164 | "    arr1.sort()\n",
1165 | "    arr2.sort()\n",
1166 | "    res = i = j = 0\n",
1167 | "    m, n = len(arr1), len(arr2)\n",
1168 | "    while i < m and j < n:\n",
1169 | "        if arr1[i] == arr2[j]:\n",
1170 | "            res += 1\n",
1171 | "            i += 1\n",
1172 | "            j += 1\n",
1173 | "        elif arr1[i] > arr2[j]:\n",
1174 | "            j += 1\n",
1175 | "        else:\n",
1176 | "            i += 1\n",
1177 | "    return res"
1178 | ]
1179 | },
1180 | {
1181 | "cell_type": "code",
1182 | "execution_count": 61,
1183 | "metadata": {},
1184 | "outputs": [
1185 | {
1186 | "name": "stdout",
1187 | "output_type": "stream",
1188 | "text": [
1189 | "329 µs ± 12.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n"
1190 | ]
1191 | }
1192 | ],
1193 | "source": [
1194 | "%timeit n_common_4(arr1, arr2)"
1195 | ]
1196 | },
1197 | {
1198 | "cell_type": "markdown",
1199 | "metadata": {},
1200 | "source": [
1201 | "### 5.6 Use a better data structure\n",
1202 | "Convert the arrays to sets and take the size of their intersection. Average run time: 0.067 ms."
1203 | ]
1204 | },
1205 | {
1206 | "cell_type": "code",
1207 | "execution_count": 62,
1208 | "metadata": {},
1209 | "outputs": [],
1210 | "source": [
1211 | "def n_common_5(arr1, arr2):\n",
1212 | "    return len(set(arr1) & set(arr2))"
1213 | ]
1214 | },
1215 | {
1216 | "cell_type": "code",
1217 | "execution_count": 63,
1218 | "metadata": {},
1219 | "outputs": [
1220 | {
1221 | "name": "stdout",
1222 | "output_type": "stream",
1223 | "text": [
1224 | "67.2 µs ± 755 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)\n"
1225 | ]
1226 | }
1227 | ],
1228 | "source": [
1229 | "%timeit n_common_5(arr1, arr2)"
1230 | ]
1231 | },
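As a cross-check, the sort-based and set-based approaches can be run side by side to confirm that they agree. The sketch below rests on a few assumptions: it sorts copies with `sorted` so the inputs are not modified, it seeds the shuffle for reproducibility, and the function names are illustrative. It also shows `set.intersection`, which accepts any iterable, so the second list does not need an explicit `set(...)` call.

```python
from random import seed, shuffle


def n_common_sorted(arr1, arr2):
    # Work on sorted copies so the callers' lists keep their order.
    a, b = sorted(arr1), sorted(arr2)
    res = i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            res += 1
            i += 1
            j += 1
        elif a[i] > b[j]:
            j += 1
        else:
            i += 1
    return res


def n_common_sets(arr1, arr2):
    return len(set(arr1) & set(arr2))


def n_common_intersection(arr1, arr2):
    # set.intersection takes any iterable as its argument.
    return len(set(arr1).intersection(arr2))


seed(0)  # assumed seed, only to make the sketch reproducible
arr1 = list(range(2000))
shuffle(arr1)
arr2 = list(range(1000, 3000))
shuffle(arr2)

print(n_common_sorted(arr1, arr2))        # 1000
print(n_common_sets(arr1, arr2))          # 1000
print(n_common_intersection(arr1, arr2))  # 1000
```

Both inputs share the values 1000 through 1999, which is why every variant reports 1000.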
1232 | {
1233 | "cell_type": "code",
1234 | "execution_count": null,
1235 | "metadata": {},
1236 | "outputs": [],
1237 | "source": []
1238 | }
1239 | ],
1240 | "metadata": {
1241 | "kernelspec": {
1242 | "display_name": "Python 3",
1243 | "language": "python",
1244 | "name": "python3"
1245 | },
1246 | "language_info": {
1247 | "codemirror_mode": {
1248 | "name": "ipython",
1249 | "version": 3
1250 | },
1251 | "file_extension": ".py",
1252 | "mimetype": "text/x-python",
1253 | "name": "python",
1254 | "nbconvert_exporter": "python",
1255 | "pygments_lexer": "ipython3",
1256 | "version": "3.6.6"
1257 | }
1258 | },
1259 | "nbformat": 4,
1260 | "nbformat_minor": 2
1261 | }
1262 |
--------------------------------------------------------------------------------