├── .gitignore ├── LICENSE ├── Markdowns ├── 2016-03-06-The-Zen-of-Python.md ├── 2016-03-07-iterator-and-generator.md ├── 2016-03-08-Functional-Programming-in-Python.md ├── 2016-03-09-List-Comprehension.md ├── 2016-03-10-Scope-and-Closure.md ├── 2016-03-11-Arguments-and-Unpacking.md ├── 2016-03-14-Command-Line-tools-in-Python.md ├── 2016-03-15-Unicode-String.md ├── 2016-03-16-Bytes-and-Bytearray.md ├── 2016-03-17-Bytes-decode-Unicode-encode-Bytes.md ├── 2016-03-18-String-Format.md ├── 2016-03-21-Try-else.md ├── 2016-03-22-Shallow-and-Deep-Copy.md ├── 2016-03-23-With-Context-Manager.md ├── 2016-03-24-Sort-and-Sorted.md └── 2016-03-25-Decorator-and-functools.md ├── PyTips.png ├── README.md └── Tips ├── 2016-03-06-The-Zen-of-Python.ipynb ├── 2016-03-07-iterator-and-generator.ipynb ├── 2016-03-08-Functional-Programming-in-Python.ipynb ├── 2016-03-09-List-Comprehension.ipynb ├── 2016-03-10-Scope-and-Closure.ipynb ├── 2016-03-11-Arguments-and-Unpacking.ipynb ├── 2016-03-14-Command-Line-tools-in-Python.ipynb ├── 2016-03-15-Unicode-String.ipynb ├── 2016-03-16-Bytes-and-Bytearray.ipynb ├── 2016-03-17-Bytes-decode-Unicode-encode-Bytes.ipynb ├── 2016-03-18-String-Format.ipynb ├── 2016-03-21-Try-else.ipynb ├── 2016-03-22-Shallow-and-Deep-Copy.ipynb ├── 2016-03-23-With-Context-Manager.ipynb ├── 2016-03-24-Sort-and-Sorted.ipynb ├── 2016-03-25-Decorator-and-functools.ipynb ├── gb2312.txt └── utf8.txt /.gitignore: -------------------------------------------------------------------------------- 1 | venv3/ 2 | .ipynb_checkpoints/ 3 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | The MIT License (MIT) 2 | 3 | Copyright (c) 2015 Yusheng 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including 
without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /Markdowns/2016-03-06-The-Zen-of-Python.md: -------------------------------------------------------------------------------- 1 | 2 | ### Python 之禅与 Pythonic 3 | 4 | Python 之禅是 Python 语言的设计哲学与所倡导的编程理念,Pythonic 则是指基于 Python 理念编写更加符合 Python 语法习惯(idiomatic Python)的代码,这也是本项目所追求的目标,因此以本篇作为开头。 5 | 6 | 7 | ```python 8 | import this 9 | ``` 10 | 11 | The Zen of Python, by Tim Peters 12 | 13 | Beautiful is better than ugly. 14 | Explicit is better than implicit. 15 | Simple is better than complex. 16 | Complex is better than complicated. 17 | Flat is better than nested. 18 | Sparse is better than dense. 19 | Readability counts. 20 | Special cases aren't special enough to break the rules. 21 | Although practicality beats purity. 22 | Errors should never pass silently. 23 | Unless explicitly silenced. 24 | In the face of ambiguity, refuse the temptation to guess. 25 | There should be one-- and preferably only one --obvious way to do it. 26 | Although that way may not be obvious at first unless you're Dutch. 
27 | Now is better than never. 28 | Although never is often better than *right* now. 29 | If the implementation is hard to explain, it's a bad idea. 30 | If the implementation is easy to explain, it may be a good idea. 31 | Namespaces are one honking great idea -- let's do more of those! 32 | 33 | 34 | Python 之禅,by Tim Peters 35 | 36 | 优美胜于丑陋 37 | 38 | 明确胜于隐晦 39 | 40 | 简单胜于复杂 41 | 42 | 复杂胜于凌乱 43 | 44 | 扁平胜于嵌套 45 | 46 | 稀疏胜于紧凑 47 | 48 | 可读性至关重要 49 | 50 | 即便特例,也需服从以上规则 51 | 52 | 53 | 除非刻意追求,错误不应跳过 54 | 55 | 面对歧义条件,拒绝尝试猜测 56 | 57 | 58 | 解决问题的最优方法应该有且只有一个 59 | 60 | 尽管这一方法并非显而易见(除非你是Python之父) 61 | 62 | 63 | 动手胜于空想 64 | 65 | 空想胜于不想 66 | 67 | 68 | 难以解释的实现方案,不是好方案 69 | 70 | 易于解释的实现方案,才是好方案 71 | 72 | 73 | 命名空间是个绝妙的理念,多多益善! 74 | 75 | #### 参考 76 | 77 | 1. [《Python之禅》的翻译和解释](http://blog.csdn.net/gzlaiyonghao/article/details/2151918) 78 | 2. [What is Pythonic?](http://blog.startifact.com/posts/older/what-is-pythonic.html) 79 | -------------------------------------------------------------------------------- /Markdowns/2016-03-07-iterator-and-generator.md: -------------------------------------------------------------------------------- 1 | 2 | ### 迭代器与生成器 3 | 4 | 迭代器(iterator)与生成器(generator)是 Python 中比较常用又很容易混淆的两个概念,今天就把它们梳理一遍,并举一些常用的例子。 5 | 6 | **`for` 语句与可迭代对象(iterable object):** 7 | 8 | 9 | ```python 10 | for i in [1, 2, 3]: 11 | print(i) 12 | ``` 13 | 14 | 1 15 | 2 16 | 3 17 | 18 | 19 | 20 | ```python 21 | obj = {"a": 123, "b": 456} 22 | for k in obj: 23 | print(k) 24 | ``` 25 | 26 | b 27 | a 28 | 29 | 30 | 这些可以用在 `for` 语句进行循环的对象就是**可迭代对象**。除了内置的数据类型(列表、元组、字符串、字典等)可以通过 `for` 语句进行迭代,我们也可以自己创建一个容器,包含一系列元素,可以通过 `for` 语句依次循环取出每一个元素,这种容器就是**迭代器(iterator)**。除了用 `for` 遍历,迭代器还可以通过 `next()` 方法逐一读取下一个元素。要创建一个迭代器有3种方法,其中前两种分别是: 31 | 32 | 1. 为容器对象添加 `__iter__()` 和 `__next__()` 方法(Python 2.7 中是 `next()`);`__iter__()` 返回迭代器对象本身 `self`,`__next__()` 则返回每次调用 `next()` 或迭代时的元素; 33 | 2. 
内置函数 `iter()` 将可迭代对象转化为迭代器 34 | 35 | 36 | ```python 37 | # iter(IterableObject) 38 | ita = iter([1, 2, 3]) 39 | print(type(ita)) 40 | 41 | print(next(ita)) 42 | print(next(ita)) 43 | print(next(ita)) 44 | 45 | # Create iterator Object 46 | class Container: 47 | def __init__(self, start = 0, end = 0): 48 | self.start = start 49 | self.end = end 50 | def __iter__(self): 51 | print("[LOG] I made this iterator!") 52 | return self 53 | def __next__(self): 54 | print("[LOG] Calling __next__ method!") 55 | if self.start < self.end: 56 | i = self.start 57 | self.start += 1 58 | return i 59 | else: 60 | raise StopIteration() 61 | c = Container(0, 5) 62 | for i in c: 63 | print(i) 64 | 65 | ``` 66 | 67 | 68 | 1 69 | 2 70 | 3 71 | [LOG] I made this iterator! 72 | [LOG] Calling __next__ method! 73 | 0 74 | [LOG] Calling __next__ method! 75 | 1 76 | [LOG] Calling __next__ method! 77 | 2 78 | [LOG] Calling __next__ method! 79 | 3 80 | [LOG] Calling __next__ method! 81 | 4 82 | [LOG] Calling __next__ method! 

The benefit of an iterator is lower memory use for long sequences, because only the current value has to be kept at any moment. (This is why Python 2.7 tutorials recommend `xrange` over `range` for large ranges; Python 3 dropped `xrange`, and its `range` is itself a lazy object that produces values on demand.)

#### Generators

The third of the three ways to create an iterator mentioned earlier is the **generator**. A generator function uses the `yield` statement to build an iterator without writing `__iter__()` and `__next__()` by hand:


```python
def container(start, end):
    while start < end:
        yield start
        start += 1
c = container(0, 5)
print(type(c))
print(next(c))
next(c)
for i in c:
    print(i)
```

    <class 'generator'>
    0
    2
    3
    4


Put simply, a `yield` statement turns an ordinary function into a generator function, and each `__next__()` call returns the value that follows `yield`. A more intuitive description: execution returns a value and pauses at `yield`, and the next call to `next()` resumes from the point where it paused:


```python
def gen():
    yield 5
    yield "Hello"
    yield "World"
    yield 4
for i in gen():
    print(i)
```

    5
    Hello
    World
    4


Generators have gained further features over time, including `yield from` (added in Python 3.3) and `send()` (available since Python 2.5), which passes a value back into the generator at the point where it paused. To keep this tip short they are not covered here; see the [official documentation](https://docs.python.org/3/reference/expressions.html#yieldexpr) and reference 2 below.

#### References

1. [Iterators & Generators](http://anandology.com/python-practice-book/iterators.html)
2. [How the heck does async/await work in Python 3.5?](http://www.snarky.ca/how-the-heck-does-async-await-work-in-python-3-5)
3. 
[Python's yield from](http://charlesleifer.com/blog/python-s-yield-from/) 138 | -------------------------------------------------------------------------------- /Markdowns/2016-03-08-Functional-Programming-in-Python.md: -------------------------------------------------------------------------------- 1 | 2 | ### Python 中的函数式编程 3 | 4 | > 函数式编程(英语:functional programming)或称函数程序设计,又称泛函编程,是一种编程范型,它将电脑运算视为数学上的函数计算,并且避免使用程序状态以及易变对象。函数编程语言最重要的基础是λ演算(lambda calculus)。而且λ演算的函数可以接受函数当作输入(引数)和输出(传出值)。(维基百科:函数式编程) 5 | 6 | 所谓编程范式(Programming paradigm)是指编程风格、方法或模式,比如面向过程编程(C语言)、面向对象编程(C++)、面向函数式编程(Haskell),并不是说某种编程语言一定属于某种范式,例如 Python 就是多范式编程语言。 7 | 8 | #### 函数式编程 9 | 10 | 函数式编程具有以下特点: 11 | 12 | 1. 避免状态变量 13 | 2. 函数也是变量(一等公民,First-Class Citizen) 14 | 3. 高阶函数 15 | 4. 面向问题描述而不是面向问题解决步骤 16 | 17 | 值得一提的是,函数式编程的这些特点在实践过程中可能并不是那么 Pythonic,甚至与**[0x00](https://github.com/rainyear/pytips/blob/master/Tips/2016-03-06-The-Zen-of-Python.ipynb)**中提到的 The Zen of Python 相悖。例如函数式编程面向问题描述的特点可能让你更快地写出更简洁的代码,但可读性却也大打折扣(可参考这一段[Haskell代码](https://gist.github.com/rainyear/94b5d9a865601f075719))。不过,虽然 Pythonic 很重要但并不是唯一的准则,_The Choice Is Yours_。 18 | 19 | #### `map(function, iterable, ...)`/`filter(function, iterable)` 20 | 21 | 22 | ```python 23 | # map 函数的模拟实现 24 | def myMap(func, iterable): 25 | for arg in iterable: 26 | yield func(arg) 27 | 28 | names = ["ana", "bob", "dogge"] 29 | 30 | print(map(lambda x: x.capitalize(), names)) # Python 2.7 中直接返回列表 31 | for name in myMap(lambda x: x.capitalize(), names): 32 | print(name) 33 | ``` 34 | 35 | 36 | Ana 37 | Bob 38 | Dogge 39 | 40 | 41 | 42 | ```python 43 | # filter 函数的模拟实现 44 | def myFilter(func, iterable): 45 | for arg in iterable: 46 | if func(arg): 47 | yield arg 48 | 49 | print(filter(lambda x: x % 2 == 0, range(10))) # Python 2.7 中直接返回列表 50 | for i in myFilter(lambda x: x % 2 == 0, range(10)): 51 | print(i) 52 | ``` 53 | 54 | 55 | 0 56 | 2 57 | 4 58 | 6 59 | 8 60 | 61 | 62 | #### `functools.reduce(function, iterable[, initializer])` 63 | 64 | 
In Python 3, `reduce` was moved out of the built-ins into the standard-library module `functools`. Like `map` and `filter`, it walks an iterable; on each step it passes the accumulated result and the next element to the function given as its first argument, and carries the return value forward:


```python
from functools import reduce

print(reduce(lambda a, b: a*b, range(1,5)))
```

    24


#### `functools.partial(func, *args, **keywords)`

Partial application lets us fix some of a function's arguments in advance:


```python
from functools import partial

add = lambda a, b: a + b
add1024 = partial(add, 1024)

add1024(1)
```




    1025



This tip only sketches a few common functional-programming tools and concepts. The most important idea to take away is that **functions are values: they can be returned and passed to higher-order functions**, which gives us more flexibility in how we use them. That does not mean the tools above must be used to shorten code. The more Pythonic recommendation, for example, is to prefer a list comprehension over `map`/`filter` wherever possible (list comprehensions get their own tip later). If you do want to write Python in a strongly functional style, you can try [Fn.py](https://github.com/kachayev/fn.py), or try [Haskell](https://www.haskell.org/).

#### References

1. [Wikipedia: Functional programming](https://zh.wikipedia.org/wiki/%E5%87%BD%E6%95%B8%E7%A8%8B%E5%BC%8F%E8%AA%9E%E8%A8%80)
2. [byvoid: APIO lecture notes on functional programming](http://byvoid.github.io/slides/apio-fp/index.html)

--------------------------------------------------------------------------------
/Markdowns/2016-03-09-List-Comprehension.md:
--------------------------------------------------------------------------------

### 0x03 - List comprehensions in Python

The `map`/`filter` functions from **[0x02](https://github.com/rainyear/pytips/blob/master/Tips/2016-03-08-Functional-Programming-in-Python.ipynb)** build the list (or other iterable) we need with compact syntax. Python also offers **list comprehensions**, which do a similar job. When I first learned Python I treated this syntax as mere **syntactic sugar** for building particular lists quickly; only later, while learning Haskell, did I find out that the form is called a list comprehension. (There seems to be no settled Chinese translation; renderings such as 列表速构 and 列表解析 are in use, but all of them convey that the list is derived by a rule rather than by enumerating every element.)

A list comprehension resembles set-builder notation in mathematics. For example, the set of even numbers in $[0, 10)$ can be written as:

$$\left\{x \mid x \in \mathbb{N},\ x < 10,\ x \bmod 2 = 0\right\}$$

Translated into a Python expression:


```python
evens = [x for x in range(10) if x % 2 == 0]
print(evens)
```

    [0, 2, 4, 6, 8]


This gives the same result as `filter`:


```python
fevens = filter(lambda x: x % 2 == 0, range(10))
print(list(fevens) == evens)
```

    True


A list comprehension can likewise do the job of `map`:


```python
squares = [x ** 2 for x in range(1, 6)]
print(squares)

msquares = map(lambda x: x ** 2, range(1, 6))
print(list(msquares) == squares)
```

    [1, 4, 9, 16, 25]
    True


The comprehension syntax is the more readable of the two, so the Pythonic choice is to avoid `map`/`filter` whenever a list comprehension will do.

Beyond simple iteration and filtering, list comprehensions also support nested loops:


```python
cords = [(x, y) for x in range(3) for y in range(3) if x > 0]
print(cords)

# equivalent to
lcords = []
for x in range(3):
    for y in range(3):
        if x > 0:
            lcords.append((x, y))

print(lcords == cords)
```

    [(1, 0), (1, 1), (1, 2), (2, 0), (2, 1), (2, 2)]
    True


The comparison makes the advantage of the comprehension clear. Once the loops nest more than two levels deep, however, readability drops sharply, so for deeper nesting plain `for` statements are still the better choice.

#### Dict and set comprehensions

Comprehensions are not limited to lists; dicts and sets can be built the same way:


```python
dns = {domain : ip
       for domain in ["github.com", "git.io"]
       for ip in ["23.22.145.36", "23.22.145.48"]}
print(dns)

names = {name for name in ["ana", "bob", "catty", "octocat"] if len(name) > 3}
print(names)
```

    {'github.com': '23.22.145.48', 'git.io': '23.22.145.48'}
    {'octocat', 'catty'}


#### Generators

Besides using the `yield` keyword inside a function, the generators introduced in **[0x01](https://github.com/rainyear/pytips/blob/master/Tips/2016-03-07-iterator-and-generator.ipynb)** have another, less obvious form: a comprehension written inside parentheses. The result is not a tuple but a generator expression:


```python
squares = (x for x in range(10) if x % 2 == 0)
print(squares)

print(next(squares))
next(squares)

for i in squares:
    print(i)
```

    <generator object <genexpr> at 0x1104fbba0>
    0
    4
    6
    8


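The difference between the bracketed and the parenthesised form can be made visible with a small experiment. This is a minimal sketch rather than part of the original tip; the names `limit`, `as_list`, `as_gen`, `first_three` and `total` are illustrative only.

```python
import sys

limit = 10 ** 6

# A list comprehension builds every element immediately.
as_list = [x * x for x in range(limit)]

# A generator expression only stores how to produce the next element.
as_gen = (x * x for x in range(limit))

# The list object is far larger than the generator object.
print(sys.getsizeof(as_list) > sys.getsizeof(as_gen))

# Values are produced on demand, and each one can be consumed only once.
first_three = [next(as_gen) for _ in range(3)]
print(first_three)

# Functions that consume an iterable accept a generator expression directly.
total = sum(x * x for x in range(10))
print(total)
```

A generator expression is a reasonable default when the values are consumed once, for instance by `sum()` or a `for` loop; a list comprehension is the better fit when the result has to be indexed or traversed repeatedly.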
-------------------------------------------------------------------------------- /Markdowns/2016-03-10-Scope-and-Closure.md: -------------------------------------------------------------------------------- 1 | 2 | ### 闭包(Closure) 3 | 4 | > 在计算机科学中,闭包(英语:Closure),又称词法闭包(Lexical Closure)或函数闭包(function closures),是引用了自由变量的函数。这个被引用的自由变量将和这个函数一同存在,即使已经离开了创造它的环境也不例外。 5 | [[维基百科::闭包(计算机科学)](https://zh.wikipedia.org/wiki/闭包_%28计算机科学%29)] 6 | 7 | [0x02 Python 中的函数式编程](https://github.com/rainyear/pytips/blob/master/Markdowns/2016-03-08-Functional-Programming-in-Python.md) 本来也应该包括闭包的概念,但是我觉得闭包更重要的是对**作用域(Scope)**的理解,因此把它单独列出来,同时可以理顺一下 Python 的作用域规则。 8 | 9 | 闭包的概念最早出现在函数式编程语言中,后来被一些命令式编程语言所借鉴。尤其是在一些函数作为一等公民的语言中,例如JavaScript就经常用到(在JavaScript中函数几乎可以当做“特等公民”看待),我之前也写过一篇关于JavaScript闭包的文章([图解Javascript上下文与作用域](http://blog.rainy.im/2015/07/04/scope-chain-and-prototype-chain-in-js/)),实际上闭包并不是太复杂的概念,但是可以借助闭包更好地理解不同语言的作用域规则。 10 | 11 | #### 命名空间与作用域 12 | 13 | [0x00 The Zen of Python](https://github.com/rainyear/pytips/blob/master/Markdowns/2016-03-06-The-Zen-of-Python.md)的最后一句重点强调命名空间的概念,我们可以把命名空间看做一个大型的字典类型(Dict),里面包含了所有变量的名字和值的映射关系。在 Python 中,作用域实际上可以看做是“**在当前上下文的位置,获取命名空间变量的规则**”。在 Python 代码执行的任意位置,都至少存在三层嵌套的作用域: 14 | 15 | 1. 最内层作用域,最早搜索,包含所有局部变量**(Python 默认所有变量声明均为局部变量)** 16 | 2. 所有包含当前上下文的外层函数的作用域,由内而外依次搜索,这里包含的是**非局部**也**非全局**的变量 17 | 3. 一直向上搜索,直到当前模块的全局变量 18 | 4. 
最外层,最后搜索的,内置(built-in)变量 19 | 20 | 在任意执行位置,可以将作用域看成是对下面这样一个命名空间的搜索: 21 | 22 | 23 | ```python 24 | scopes = { 25 | "local": {"locals": None, 26 | "non-local": {"locals": None, 27 | "global": {"locals": None, 28 | "built-in": ["built-ins"]}}}, 29 | } 30 | ``` 31 | 32 | 除了默认的局部变量声明方式,Python 还有`global`和`nonlocal`两种类型的声明(**`nonlocal`是Python 3.x之后才有,2.7没有**),其中 `global` 指定的变量直接**指向**(3)当前模块的全局变量,而`nonlocal`则指向(2)最内层之外,`global`以内的变量。这里需要强调指向(references and assignments)的原因是,普通的局部变量对最内层局部作用域之外只有**只读(read-only)**的访问权限,比如下面的例子: 33 | 34 | 35 | ```python 36 | x = 100 37 | def main(): 38 | x += 1 39 | print(x) 40 | main() 41 | ``` 42 | 43 | 44 | --------------------------------------------------------------------------- 45 | 46 | UnboundLocalError Traceback (most recent call last) 47 | 48 | in () 49 | 3 x += 1 50 | 4 print(x) 51 | ----> 5 main() 52 | 53 | 54 | in main() 55 | 1 x = 100 56 | 2 def main(): 57 | ----> 3 x += 1 58 | 4 print(x) 59 | 5 main() 60 | 61 | 62 | UnboundLocalError: local variable 'x' referenced before assignment 63 | 64 | 65 | 这里抛出`UnboundLocalError`,是因为`main()`函数内部的作用域对于全局变量`x`仅有只读权限,想要在`main()`中对`x`进行改变,不会影响全局变量,而是会创建一个新的局部变量,显然无法对还未创建的局部变量直接使用`x += 1`。如果想要获得全局变量的完全引用,则需要`global`声明: 66 | 67 | 68 | ```python 69 | x = 100 70 | def main(): 71 | global x 72 | x += 1 73 | print(x) 74 | 75 | main() 76 | print(x) # 全局变量已被改变 77 | ``` 78 | 79 | 101 80 | 101 81 | 82 | 83 | #### Python 闭包 84 | 85 | 到这里基本上已经了解了 Python 作用域的规则,那么我们来仿照 JavaScript 写一个计数器的闭包: 86 | 87 | 88 | ```python 89 | """ 90 | /* JavaScript Closure example */ 91 | var inc = function(){ 92 | var x = 0; 93 | return function(){ 94 | console.log(x++); 95 | }; 96 | }; 97 | var inc1 = inc() 98 | var inc2 = inc() 99 | """ 100 | 101 | # Python 3.5 102 | def inc(): 103 | x = 0 104 | def inner(): 105 | nonlocal x 106 | x += 1 107 | print(x) 108 | return inner 109 | inc1 = inc() 110 | inc2 = inc() 111 | 112 | inc1() 113 | inc1() 114 | inc1() 115 | inc2() 116 | ``` 117 | 118 | 1 119 | 2 120 | 3 121 | 1 122 | 123 | 
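The lookup levels listed in the scope section above can also be observed directly. The following is a minimal sketch under the assumption that it runs as a standalone module; the names `label`, `outer`, `inner`, `shadow` and `found` are illustrative and do not come from the original tip.

```python
label = "global"

def outer():
    label = "enclosing"

    def inner():
        # No local `label` here, so the enclosing function's value is found first.
        return label

    def shadow():
        label = "local"  # a new local name; the outer values are untouched
        return label

    return inner(), shadow(), label

found = outer()
print(found)   # values seen from the enclosing and local scopes
print(label)   # the module-level name was never rebound

# Names that are neither local, enclosing nor global fall through to built-ins.
print(len("abc"))
```

Only an assignment creates a new local name; a plain read walks outward through the scopes until the name is found.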
124 | 对于还没有`nonlocal`关键字的 Python 2.7,可以通过一点小技巧来规避局部作用域只读的限制: 125 | 126 | 127 | ```python 128 | # Python 2.7 129 | def inc(): 130 | x = [0] 131 | def inner(): 132 | x[0] += 1 133 | print(x[0]) 134 | return inner 135 | inc1 = inc() 136 | inc2 = inc() 137 | 138 | inc1() 139 | inc1() 140 | inc1() 141 | inc2() 142 | ``` 143 | 144 | 1 145 | 2 146 | 3 147 | 1 148 | 149 | 150 | 上面的例子中,`inc1()`是在全局环境下执行的,虽然全局环境是不能向下获取到`inc()`中的局部变量`x`的,但是我们返回了一个`inc()`内部的函数`inner()`,而`inner()`对`inc()`中的局部变量是有访问权限的。也就是说`inner()`将`inc()`内的局部作用域打包送给了`inc1`和`inc2`,从而使它们各自独立拥有了一块封闭起来的作用域,不受全局变量或者任何其它运行环境的影响,因此称为闭包。 151 | 152 | 闭包函数都有一个`__closure__`属性,其中包含了它所引用的上层作用域中的变量: 153 | 154 | 155 | ```python 156 | print(inc1.__closure__[0].cell_contents) 157 | print(inc2.__closure__[0].cell_contents) 158 | ``` 159 | 160 | [3] 161 | [1] 162 | 163 | 164 | #### 参考 165 | 166 | 1. [9.2. Python Scopes and Namespaces](https://docs.python.org/3/tutorial/classes.html#python-scopes-and-namespaces) 167 | 2. [Visualize Python Execution](http://www.pythontutor.com/visualize.html#mode=edit) 168 | 3. [Wikipedia::Closure](https://en.wikipedia.org/wiki/Closure_%28computer_programming%29) 169 | -------------------------------------------------------------------------------- /Markdowns/2016-03-11-Arguments-and-Unpacking.md: -------------------------------------------------------------------------------- 1 | 2 | ### 函数调用的参数规则与解包 3 | 4 | Python 的函数在声明参数时大概有下面 4 种形式: 5 | 6 | 1. 不带默认值的:`def func(a): pass` 7 | 2. 带有默认值的:`def func(a, b = 1): pass` 8 | 3. 任意位置参数:`def func(a, b = 1, *c): pass` 9 | 4. 任意键值参数:`def func(a, b = 1, *c, **d): pass` 10 | 11 | 在调用函数时,有两种情况: 12 | 13 | 1. 没有关键词的参数:`func("G", 20)` 14 | 2. 
With keywords: `func(a = "G", b = 20)` (keyword arguments may be given in any order: `func(b = 20, a = "G")`)

The two styles can of course be mixed, as in `func("G", b = 20)`, but the most important rule is that **a positional argument must not follow a keyword argument**:


```python
def func(a, b = 1):
    pass
func(a = "G", 20) # SyntaxError
```


      File "", line 3
        func(a = "G", 20) # SyntaxError
                     ^
    SyntaxError: positional argument follows keyword argument



The other rule is that **positional arguments are bound first**:


```python
def func(a, b = 1):
    pass
func(20, a = "G") # TypeError: a receives two values
```


    ---------------------------------------------------------------------------

    TypeError                                 Traceback (most recent call last)

    in ()
          1 def func(a, b = 1):
          2     pass
    ----> 3 func(20, a = "G") # TypeError: a receives two values


    TypeError: func() got multiple values for argument 'a'


The safest approach is to pass everything by keyword.

#### Arbitrary arguments

A function can accept any number of arguments: `*a` collects any number of positional arguments, and `**d` collects any number of keyword arguments:


```python
def concat(*lst, sep = "/"):
    return sep.join((str(i) for i in lst))

print(concat("G", 20, "@", "Hz", sep = ""))
```

    G20@Hz


The `def concat(*lst, sep = "/")` syntax above was proposed in [PEP 3102](https://www.python.org/dev/peps/pep-3102/) and is available from Python 3.0 on. A keyword-only argument such as `sep` must be named explicitly; it is never inferred from position:


```python
print(concat("G", 20, "-")) # Not G-20
```

    G/20/-


`**d` collects any number of keyword arguments:


```python
def dconcat(sep = ":", **dic):
    for k in dic.keys():
        print("{}{}{}".format(k, sep, dic[k]))

dconcat(hello = "world", python = "rocks", sep = "~")
```

    hello~world
    python~rocks


#### Unpacking

A feature added in Python 3.5 ([PEP 448](https://www.python.org/dev/peps/pep-0448/)) allows `*a` and `**d` to be used outside function parameters:


```python
print(*range(5))
lst = [0, 1, 2, 3]
print(*lst)

a = *range(3), # the trailing comma is required
print(a)


d = {"hello": "world", "python": "rocks"}
print({**d}["python"])
```

    0 1 2 3 4
    0 1 2 3
    (0, 1, 2)
    rocks


Unpacking can be thought of as a tuple with its `()` removed, or a dict with its `{}` removed. The syntax also gives a more Pythonic way to merge dicts:


```python
user = {'name': "Trey", 'website': "http://treyhunner.com"}
defaults = {'name': "Anonymous User", 'page_name': "Profile Page"}

print({**defaults, **user})
```

    {'page_name': 'Profile Page', 'name': 'Trey', 'website': 'http://treyhunner.com'}


Unpacking at a function call site, by contrast, also works in Python 2.7:


```python
print(concat(*"ILovePython"))
```

    I/L/o/v/e/P/y/t/h/o/n


#### References

1. [The Idiomatic Way to Merge Dictionaries in Python](https://treyhunner.com/2016/02/how-to-merge-dictionaries-in-python/)

--------------------------------------------------------------------------------
/Markdowns/2016-03-14-Command-Line-tools-in-Python.md:
--------------------------------------------------------------------------------

### Building command-line tools in Python

As a scripting language, Python is very convenient for writing command-line tools, especially on \*nix systems. Its standard library includes modules dedicated to command-line work.

#### General structure of a command-line tool

![CL-in-Python](http://7xiijd.com1.z0.glb.clouddn.com/CL-in-Python.png)

**1. Standard input and output**

On \*nix systems everything is a file, so standard input and output can be treated as ordinary file operations. Standard input can be supplied through a pipe or a redirect:


```python
#!/usr/bin/env python
import sys
for l in sys.stdin.readlines():
    sys.stdout.write(l[::-1])
```

Save this as `reverse.py` (the shebang has to be the first line of the file) and feed it through a pipe `|`:

```sh
chmod +x reverse.py
cat reverse.py | ./reverse.py

nohtyp vne/nib/rsu/!#
sys tropmi
:)(senildaer.nidts.sys ni l rof
)]1-::[l(etirw.tuodts.sys
```

Or through a redirect `<`:

```sh
./reverse.py < reverse.py
# same output as above
```

**2. 
Command-line arguments**

Arguments appended to the command line are available through `sys.argv`, a list whose first element is the name of the running script:


```python
#!/usr/bin/env python
# script argv.py
import sys
print(sys.argv) # the output below comes from running inside Jupyter
```

    ['/Users/rainy/Projects/GitHub/pytips/venv3/lib/python3.5/site-packages/ipykernel/__main__.py', '-f', '/Users/rainy/Library/Jupyter/runtime/kernel-0533e681-bd7c-4c4d-9094-a78fde7fc2ed.json']


Running the script above:

```sh
chmod +x argv.py
./argv.py hello world
# ['./argv.py', 'hello', 'world']
python argv.py hello world
# ['argv.py', 'hello', 'world']
# the two results differ only in the first element, the script name as it was typed
```

For more complex arguments, such as options passed as `--option`, parsing `sys.argv` item by item quickly becomes tedious. The standard library provides [`argparse`](https://docs.python.org/3/library/argparse.html) for exactly this purpose (its predecessor `optparse` is deprecated):


```python
#!/usr/bin/env python
# script convert.py
import argparse as apa
def loadConfig(config):
    print("Load config from: {}".format(config))
def setTheme(theme):
    print("Set theme: {}".format(theme))
def main():
    parser = apa.ArgumentParser(prog="convert") # program name shown in the help text
    parser.add_argument("-c", "--config", required=False, default="config.ini")
    parser.add_argument("-t", "--theme", required=False, default="default.theme")
    parser.add_argument("-f") # Accept Jupyter runtime option
    args = parser.parse_args()
    loadConfig(args.config)
    setTheme(args.theme)

if __name__ == "__main__":
    main()
```

    Load config from: config.ini
    Set theme: default.theme


`argparse` makes it easy to parse options and to declare properties for each one (whether it is required, its default value, and so on), and it generates the help text automatically. Running the script above:

```sh
./convert.py -h
usage: convert [-h] [-c CONFIG] [-t THEME]

optional arguments:
  -h, --help            show this help message and exit
  -c CONFIG, --config CONFIG
  -t THEME, --theme THEME
```

**3. 
Executing system commands**

Once Python can read its input and arguments reliably, it can go on to do anything. This part covers calling system commands from Python, that is, using Python in place of `Shell` scripts for system administration. I used to run commands through `os.system(command)`, but the better practice is the [`subprocess`](https://docs.python.org/3.5/library/subprocess.html) standard-library module, which exists to replace the older `os.system` and `os.spawn*`.

The `subprocess` module offers simple helpers that run a command directly, such as `call()` and `check_output()`, as well as the more involved `Popen` object for closer interaction with the child process.


```python
#!/usr/bin/env python
# script list_files.py
import subprocess as sb
# For safety, commands are not run through the system shell by default;
# shell=True is needed here so that the shell expands the ./*.ipynb glob.
res = sb.check_output("ls -lh ./*.ipynb", shell=True)
print(res.decode()) # the return value is bytes and has to be decoded
```

    -rw-r--r-- 1 rainy staff 3.4K 3 8 17:36 ./2016-03-06-The-Zen-of-Python.ipynb
    -rw-r--r-- 1 rainy staff 6.7K 3 8 17:45 ./2016-03-07-iterator-and-generator.ipynb
    -rw-r--r-- 1 rainy staff 6.0K 3 10 12:35 ./2016-03-08-Functional-Programming-in-Python.ipynb
    -rw-r--r-- 1 rainy staff 5.9K 3 9 16:28 ./2016-03-09-List-Comprehension.ipynb
    -rw-r--r-- 1 rainy staff 10K 3 10 14:14 ./2016-03-10-Scope-and-Closure.ipynb
    -rw-r--r-- 1 rainy staff 8.0K 3 11 16:30 ./2016-03-11-Arguments-and-Unpacking.ipynb
    -rw-r--r-- 1 rainy staff 8.5K 3 14 19:31 ./2016-03-14-Command-Line-tools-in-Python.ipynb



If simply running a command is not enough, `subprocess.Popen` allows more interaction with the child process it creates:


```python
import subprocess as sb

p = sb.Popen(['grep', 'communicate'], stdin=sb.PIPE, stdout=sb.PIPE)
# communicate() sends the bytes to grep's stdin and returns (stdout, stderr);
# stderr is not piped here, so err is always None.
res, err = p.communicate(sb.check_output('cat ./*', shell=True))
if not err:
    print(res.decode())
```

    "    \"p = sb.Popen(['grep', 'communicate'], stdout=sb.PIPE)\\n\",\n",
    "    \"# res = p.communicate(sb.check_output('cat ./*'))\"\n",
    "p = sb.Popen(['grep', 'communicate'], stdin=sb.PIPE, stdout=sb.PIPE)\n",
    "res, err = p.communicate(sb.check_output('cat ./*', shell=True))\n",


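For simple cases, `subprocess.run()` (added in Python 3.5) wraps the `Popen` plus `communicate()` pattern shown above in a single call. The following is a minimal sketch and not part of the original tip: it assumes the current interpreter (`sys.executable`) as the child command so that it does not depend on `ls` or `grep`, and the names `completed` and `echo` are illustrative.

```python
import subprocess
import sys

# Run a child Python process without going through the shell;
# each list item is passed to the program as one argument.
completed = subprocess.run(
    [sys.executable, "-c", "print('hello from child')"],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
)

print(completed.returncode)               # exit status of the child
print(completed.stdout.decode().strip())  # captured output is bytes

# Feed bytes to the child's stdin, similar to Popen.communicate(data).
echo = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.stdin.read().upper())"],
    input=b"piped text",
    stdout=subprocess.PIPE,
)
print(echo.stdout.decode().strip())
```

Passing the command as a list avoids shell quoting issues; `shell=True` is only needed when shell features such as globs or pipes are part of the command string.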
-------------------------------------------------------------------------------- /Markdowns/2016-03-15-Unicode-String.md: -------------------------------------------------------------------------------- 1 | 2 | ### Python 字符串 3 | 4 | 所有用过 Python (2&3)的人应该都看过下面两行错误信息: 5 | 6 | > `UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)` 7 | 8 | > `UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 0-1: invalid continuation byte` 9 | 10 | 这就是 Python 界的"锟斤拷"! 11 | 12 | 今天和接下来几期的内容将主要关注 Python 中的字符串(`str`)、字节(`bytes`)及两者之间的相互转换(`encode`/`decode`)。也许不能让你突然间解决所有乱码问题,但希望可以帮助你迅速找到问题所在。 13 | 14 | ### 定义 15 | 16 | Python 中对字符串的定义如下: 17 | 18 | > Textual data in Python is handled with `str` objects, or strings. Strings are immutable sequences of Unicode code points. 19 | 20 | Python 3.5 中字符串是由一系列 Unicode 码位(code point)所组成的**不可变序列**: 21 | 22 | 23 | ```python 24 | ('S' 'T' 'R' 'I' 'N' 'G') 25 | ``` 26 | 27 | 28 | 29 | 30 | 'STRING' 31 | 32 | 33 | 34 | **不可变**是指无法对字符串本身进行更改操作: 35 | 36 | 37 | ```python 38 | s = 'Hello' 39 | print(s[3]) 40 | s[3] = 'o' 41 | ``` 42 | 43 | l 44 | 45 | 46 | 47 | --------------------------------------------------------------------------- 48 | 49 | TypeError Traceback (most recent call last) 50 | 51 | in () 52 | 1 s = 'Hello' 53 | 2 print(s[3]) 54 | ----> 3 s[3] = 'o' 55 | 56 | 57 | TypeError: 'str' object does not support item assignment 58 | 59 | 60 | 而**序列(sequence)**则是指字符串继承序列类型(`list/tuple/range`)的通用操作: 61 | 62 | 63 | ```python 64 | [i.upper() for i in "hello"] 65 | ``` 66 | 67 | 68 | 69 | 70 | ['H', 'E', 'L', 'L', 'O'] 71 | 72 | 73 | 74 | 至于 Unicode 暂时可以看作一张非常大的地图,这张地图里面记录了世界上所有的符号,而码位则是每个符号所对应的坐标(具体内容将在后面的几期介绍)。 75 | 76 | 77 | ```python 78 | s = '雨' 79 | print(s) 80 | print(len(s)) 81 | print(s.encode()) 82 | ``` 83 | 84 | 雨 85 | 1 86 | b'\xe9\x9b\xa8' 87 | 88 | 89 | ### 常用操作 90 | 91 | - **`len`**:字符串长度; 92 | - **`split` & `join`** 93 | - **`find` & `index`** 94 | - **`strip`** 95 | - 
**`upper` & `lower` & `swapcase` & `title` & `capitalize`** 96 | - **`endswith` & `startswith` & `is*`** 97 | - **`zfill`** 98 | 99 | 100 | ```python 101 | # split & join 102 | s = "Hello world!" 103 | print(",".join(s.split())) # 常用的切分 & 重组操作 104 | 105 | "https://github.com/rainyear/pytips".split("/", 2) # 限定切分次数 106 | ``` 107 | 108 | Hello,world! 109 | 110 | 111 | 112 | 113 | 114 | ['https:', '', 'github.com/rainyear/pytips'] 115 | 116 | 117 | 118 | 119 | ```python 120 | s = "coffee" 121 | print(s.find('f')) # 从左至右搜索,返回第一个下标 122 | print(s.rfind('f')) # 从右至左搜索,返回第一个下表 123 | 124 | print(s.find('a')) # 若不存在则返回 -1 125 | print(s.index('a')) # 若不存在则抛出 ValueError,其余与 find 相同 126 | ``` 127 | 128 | 2 129 | 3 130 | -1 131 | 132 | 133 | 134 | --------------------------------------------------------------------------- 135 | 136 | ValueError Traceback (most recent call last) 137 | 138 | in () 139 | 4 140 | 5 print(s.find('a')) # 若不存在则返回 -1 141 | ----> 6 print(s.index('a')) # 若不存在则抛出 ValueError,其余与 find 相同 142 | 143 | 144 | ValueError: substring not found 145 | 146 | 147 | 148 | ```python 149 | print(" hello world ".strip()) 150 | print("helloworld".strip("heo")) 151 | print("["+" i ".lstrip() +"]") 152 | print("["+" i ".rstrip() +"]") 153 | ``` 154 | 155 | hello world 156 | lloworld 157 | [i ] 158 | [ i] 159 | 160 | 161 | 162 | ```python 163 | print("{}\n{}\n{}\n{}\n{}".format( 164 | "hello, WORLD".upper(), 165 | "hello, WORLD".lower(), 166 | "hello, WORLD".swapcase(), 167 | "hello, WORLD".capitalize(), 168 | "hello, WORLD".title())) 169 | ``` 170 | 171 | HELLO, WORLD 172 | hello, world 173 | HELLO, world 174 | Hello, world 175 | Hello, World 176 | 177 | 178 | 179 | ```python 180 | print(""" 181 | {}|{} 182 | {}|{} 183 | {}|{} 184 | {}|{} 185 | {}|{} 186 | {}|{} 187 | """.format( 188 | "Python".startswith("P"),"Python".startswith("y"), 189 | "Python".endswith("n"),"Python".endswith("o"), 190 | "i23o6".isalnum(),"1 2 3 0 6".isalnum(), 191 | 
"isalpha".isalpha(),"isa1pha".isalpha(), 192 | "python".islower(),"Python".islower(), 193 | "PYTHON".isupper(),"Python".isupper(), 194 | )) 195 | ``` 196 | 197 | 198 | True|False 199 | True|False 200 | True|False 201 | True|False 202 | True|False 203 | True|False 204 | 205 | 206 | 207 | 208 | ```python 209 | "101".zfill(8) 210 | ``` 211 | 212 | 213 | 214 | 215 | '00000101' 216 | 217 | 218 | 219 | **`format` / `encode`** 220 | 221 | 格式化输出 `format` 是非常有用的工具,将会单独进行介绍;`encode` 会在 `bytes-decode-Unicode-encode-bytes` 中详细介绍。 222 | -------------------------------------------------------------------------------- /Markdowns/2016-03-16-Bytes-and-Bytearray.md: -------------------------------------------------------------------------------- 1 | 2 | ### 字节与字节数组 3 | 4 | [0x07](https://github.com/rainyear/pytips/blob/master/Tips/2016-03-15-Unicode-String.ipynb) 中介绍了 Python 中的字符串类型,字符串类型是对人类友好的符号,但计算机只认识一种符号,那就是二进制(binary)数,或者说是数字: 5 | 6 | ![OpenCV](http://docs.opencv.org/2.4/_images/MatBasicImageForComputer.jpg) 7 | 8 | 上面这张图片来自 [OpenCV](http://docs.opencv.org/2.4/),非常直观地解释了计算机处理的信息与我们看到的图像之间的关系。回到 Python 对字节和字节数组的定义: 9 | 10 | > The core built-in types for manipulating binary data are `bytes` and `bytearray`. 

### 1 Byte of ASCII

To describe human characters with numbers a computer can understand, we need a table that maps numbers to characters. In a computer `1 byte = 8 bits`, which can store the 256 values `0~255`, so `1 byte` can represent at most 256 characters. In the early days of computing that was more than enough for all upper- and lower-case English letters, the digits `0~9`, and some common symbols (ASCII itself defines only 128 of them), and so the ASCII encoding was born:

![ascii](http://7xiijd.com1.z0.glb.clouddn.com/asciix400.jpg)

Creating bytes in Python is similar to creating a string, except that you put a `b` prefix in front of the quotes:


```python
print(b"Python")
python = (b'P' b'y' b"t" b'o' b'n')  # adjacent bytes literals are concatenated
print(python)
```

    b'Python'
    b'Pyton'


Bytes represent a sequence of (binary) numbers; they only look like characters because they are displayed through the `ASCII` table. If we take out a single byte, it is still a number:


```python
print(b"Python"[0])
```

    80


We can create bytes with the `b"*"` form, provided that `*` is a character available in `ASCII`; otherwise it is out of range:


```python
print(b"雨")
```


      File "<ipython-input-...>", line 1
        print(b"雨")
             ^
    SyntaxError: bytes can only contain ASCII literal characters.



The error message says it directly: a bytes literal may only contain ASCII characters.

**0~127~255**

So here is the question: `ASCII` only defines the code points `0~127`, and its printable characters only occupy `[32, 126]`. How do we write a value outside that range as bytes? The answer is the escape sequence `\x` followed by two hexadecimal digits:


```python
print(b'\xff'[0])
print(b'\x24')
```

    255
    b'$'


Conversely, we can turn numbers (0~255) into (escaped) bytes:


```python
print(bytes([24]))
print(bytes([36,36,36])) # remember that bytes is a sequence
```

    b'\x18'
    b'$$$'


Or build them directly from hexadecimal:


```python
print(bytes.fromhex("7b 7d"))

# the inverse operation
print(b'{ }'.hex())

int(b' '.hex(), base=16)
```

    b'{}'
    7b207d





    32



### `encode`

Strings have an `encode` method and bytes have a `decode` method. Here we take a quick look at `encode('ascii')`. For a given **character**, encoding gives us its coordinate in the code table (its code point), so calling `encode('ascii')` on a character looks up its position in `ASCII`:


```python
print("$".encode('ascii'))
print("$".encode('ascii')[0])
```

    b'$'
    36


In other words, the character `"$"` (a Unicode character, as [0x07](https://github.com/rainyear/pytips/blob/master/Tips/2016-03-15-Unicode-String.ipynb) explained) sits at position `b'$'` (that is, 36) in `ASCII`.

But if we try to `ASCII`-encode some more unusual characters, this happens:


```python
snake = '🐍'
try:
    snake.encode('ascii')
except UnicodeEncodeError as err:
    print(err)

# The right approach is to encode with UTF-8, which can represent every Unicode character
print(snake.encode()) # utf-8 by default
```

    'ascii' codec can't encode character '\U0001f40d' in position 0: ordinal not in range(128)
    b'\xf0\x9f\x90\x8d'


And so we get the error we all know best: `ordinal not in range(128)`. Why it says 128 should be easy to understand by now!

### Bytearray

Like strings, bytes are immutable sequences; a bytearray is the mutable version of bytes. Their relationship is like that of `tuple` and `list`.


```python
ba = bytearray(b'hello')
ba[0:1] = b'w'
print(ba)
```

    bytearray(b'wello')


Because they are sequence types just like strings, bytes and bytearray offer similar methods, which are not listed one by one here.

### Summary

1. Bytes (and bytearrays) are sequences of binary data in which each element is 8 bits, i.e. 1 byte, i.e. two hexadecimal digits, i.e. a value in 0~255;
2. Bytes are the computer's language and strings are human language; code tables map one onto the other;
3. The smallest table, `ASCII`, needs only one byte and uses only the code points `0~127` within it;

The relationship between bytes and strings is covered in detail in the next tip (0x09).

### References

1. [Pragmatic Unicode](http://nedbatchelder.com/text/unipain/unipain.html#1)

--------------------------------------------------------------------------------
/Markdowns/2016-03-17-Bytes-decode-Unicode-encode-Bytes.md:
--------------------------------------------------------------------------------

## Using Unicode Correctly in Python

[0x07](https://github.com/rainyear/pytips/blob/master/Tips/2016-03-15-Unicode-String.ipynb) and [0x08](https://github.com/rainyear/pytips/blob/master/Tips/2016-03-16-Bytes-and-Bytearray.ipynb) introduced Python's string type (`str`) and bytes type (`bytes`), along with the two most common and most stubborn encoding errors in Python:

> UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)

> UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 0-1: invalid continuation byte

This tip starts from these two errors and looks at how to use Unicode correctly in Python. This short article cannot guarantee that you will never see them again, but hopefully the next time you do, you will know what went wrong and where to start.

### Encoding and decoding

The two errors above are `UnicodeEncodeError` and `UnicodeDecodeError`, which means something went wrong while encoding and while decoding respectively. So what do encoding and decoding actually mean? According to Wikipedia's definition of [character encoding](https://zh.wikipedia.org/wiki/字符编码):

> Character encoding maps the characters of a character set to objects in a specified set (for example bit patterns, sequences of natural numbers, octets, or electrical pulses), so that text can be stored in computers and transmitted over communication networks.

Put simply, encoding translates **symbols that humans use** into **objects that computers use**, and the translation in the other direction is **decoding**. Python's string type represents human symbols, so strings have an `encode()` method; the bytes type represents computer objects (binary data), so bytes have a `decode()` method.


```python
print("🌎🌏".encode())
```

    b'\xf0\x9f\x8c\x8e\xf0\x9f\x8c\x8f'



```python
print(b'\xf0\x9f\x8c\x8e\xf0\x9f\x8c\x8f'.decode())
```

    🌎🌏


Since encoding and decoding are both **translation**, we need a dictionary that maps human and computer language onto each other. That dictionary is called a **character set**. From the earliest ASCII to today's near-universal Unicode, they are essentially the same thing; the two dictionaries only differ in thickness. ASCII contains the 26 basic Latin letters, the Arabic numerals, and English punctuation, 128 characters in total, so it fits in (and does not even fill) one byte. Unicode covers not only glyphs, encoding methods, and standard character encodings but also character properties such as letter case; it has room for about 1.1M characters, of which only about 110K had been assigned at the time of writing.

The position of a character in the character set (in other words, the number the computer uses for it) is called its code point. For example, in ASCII the code point of `'$'` is:


```python
print(ord('$'))
```

    36


ASCII fits all of its code points in one byte, whereas Unicode needs several bytes. There are several schemes (rules) for implementing Unicode's mapping, for example the most common one, UTF-8 (also Python's default), as well as UTF-16 and UTF-32; how their rules differ is not covered here. Of course, between ASCII and Unicode there are many other character sets and encodings, such as GB2312 for Chinese and Big5 for Traditional Chinese, but that does not change how encoding and decoding work.

### Unicode\*Error

Now that strings and bytes, encoding and decoding are clear, let's produce the two `Unicode*Error`s by hand, starting with the encoding error:


```python
def tryEncode(s, encoding="utf-8"):
    try:
        print(s.encode(encoding))
    except UnicodeEncodeError as err:
        print(err)

s = "$" # str (Unicode)
tryEncode(s)                   # encode with UTF-8 by default
tryEncode(s, "ascii")          # try to encode with ASCII

s = "雨" # str (Unicode)
tryEncode(s)                   # encode with UTF-8 by default
tryEncode(s, "ascii")          # try to encode with ASCII
tryEncode(s, "GB2312")         # try to encode with GB2312
```

    b'$'
    b'$'
    b'\xe9\x9b\xa8'
    'ascii' codec can't encode character '\u96e8' in position 0: ordinal not in range(128)
    b'\xd3\xea'


Because UTF-8 is compatible with ASCII, `"$"` can be encoded with ASCII. `"雨"` cannot, because it lies outside ASCII's 128 characters, which raises `UnicodeEncodeError`. In GB2312, `"雨"` is encoded as `b'\xd3\xea'`, which differs from UTF-8 but is still a valid encoding. So a `UnicodeEncodeError` means you picked the wrong dictionary: the character you want to translate has no entry in it!
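
When a character really has no entry in the target encoding, `str.encode()` also accepts an `errors` argument that picks a fallback instead of raising. The following is only a sketch, assuming lossy output is acceptable for your use case; the helper name `encode_with_fallbacks` is made up for illustration and is not part of the original tip:

```python
def encode_with_fallbacks(text, encoding="ascii"):
    """Encode `text` once per standard error handler and collect the results."""
    results = {}
    for handler in ("ignore", "replace", "backslashreplace", "xmlcharrefreplace"):
        # errors= tells the codec what to do with characters it cannot encode
        results[handler] = text.encode(encoding, errors=handler)
    return results

fallbacks = encode_with_fallbacks("雨")
for handler, encoded in fallbacks.items():
    print("{:>18}: {}".format(handler, encoded))
```

All four handlers lose or rewrite information, so they suit logging or display rather than storage; choosing an encoding that actually contains the character (such as UTF-8) remains the real fix.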

Now the decoding error:


```python
def tryDecode(s, decoding="utf-8"):
    try:
        print(s.decode(decoding))
    except UnicodeDecodeError as err:
        print(err)

b = b'$' # Bytes
tryDecode(b)                   # decode with UTF-8 by default
tryDecode(b, "ascii")          # try to decode with ASCII
tryDecode(b, "GB2312")         # try to decode with GB2312

b = b'\xd3\xea' # the bytes produced by GB2312 encoding in the example above
tryDecode(b)                   # decode with UTF-8 by default
tryDecode(b, "ascii")          # try to decode with ASCII
tryDecode(b, "GB2312")         # try to decode with GB2312
tryDecode(b, "GBK")            # try to decode with GBK
tryDecode(b, "Big5")           # try to decode with Big5

tryDecode(b.decode("GB2312").encode()) # Byte-Decode-Unicode-Encode-Byte
```

    $
    $
    $
    'utf-8' codec can't decode byte 0xd3 in position 0: invalid continuation byte
    'ascii' codec can't decode byte 0xd3 in position 0: ordinal not in range(128)
    雨
    雨
    迾
    雨


Character sets that appeared after ASCII are generally compatible with it, so ASCII can be seen as a subset of them: whatever can be decoded (or encoded) with ASCII can generally be handled by the other codecs too. For encodings that have no such subset relationship, forcing a decode may produce an error or garbled text!

### Strategies in practice

With all of the above principles clear, how do we avoid errors and garbled text in practice?

1. Keep the direction of encoding and decoding straight;
2. Inside Python, stick to UTF-8 as much as possible, and decide only at input or output time whether you need to encode to binary:


```python
# cat utf8.txt
# 你好,世界!
# file utf8.txt
# utf8.txt: UTF-8 Unicode text

with open("utf8.txt", "rb") as f:
    content = f.read()
    print(content)
    print(content.decode())
with open("utf8.txt", "r") as f:
    print(f.read())

# cat gb2312.txt
# 你好,Unicode!
# file gb2312.txt
# gb2312.txt: ISO-8859 text

with open("gb2312.txt", "r") as f:
    try:
        print(f.read())
    except UnicodeDecodeError:
        print("Failed to decode file!")
with open("gb2312.txt", "rb") as f:
    print(f.read().decode("gb2312"))
```

    b'\xe4\xbd\xa0\xe5\xa5\xbd\xef\xbc\x8c\xe4\xb8\x96\xe7\x95\x8c\xef\xbc\x81\n'
    你好,世界!
    
    你好,世界!
    
    Failed to decode file!
    你好,Unicode!
    


![Unicode](http://7xiijd.com1.z0.glb.clouddn.com/Pragmatic_Unicode.jpg)

### References

1. [Pragmatic Unicode](http://nedbatchelder.com/text/unipain/unipain.html)
2. [字符编码笔记:ASCII,Unicode和UTF-8](http://www.ruanyifeng.com/blog/2007/10/ascii_unicode_and_utf-8.html)

--------------------------------------------------------------------------------
/Markdowns/2016-03-18-String-Format.md:
--------------------------------------------------------------------------------

## String Formatting in Python

Many people format strings with the `"%s" % v` syntax. [PEP 3101](https://www.python.org/dev/peps/pep-3101/) proposed a more advanced formatting method, `str.format()`, which became the Python 3 standard intended to replace the old `%s` syntax. CPython has implemented it since 2.6 (other interpreters not verified).

### `format()`

The new `format()` method is really more like a miniature template engine and is very feature-rich. The official documentation describes its syntax as follows:


```python
"""
replacement_field ::=  "{" [field_name] ["!" conversion] [":" format_spec] "}"
field_name        ::=  arg_name ("." attribute_name | "[" element_index "]")*
arg_name          ::=  [identifier | integer]
attribute_name    ::=  identifier
element_index     ::=  integer | index_string
index_string      ::=  <any source character except "]"> +
conversion        ::=  "r" | "s" | "a"
format_spec       ::=  <described in the next section>
"""
pass # Do not output
```

I converted it into a [railroad diagram](https://en.wikipedia.org/wiki/Syntax_diagram), which is (perhaps) more intuitive:

![replacement_field.jpg](http://7xiijd.com1.z0.glb.clouddn.com/replacement_field.jpg)

Replacement fields in the template are wrapped in `{}` and split by `:` into two parts. The second part, `format_spec`, is discussed separately below. The first part can take three forms:

1. empty
2. a number denoting a position
3. an identifier denoting a keyword

This matches the kinds of arguments in a function call:


```python
print("{} {}".format("Hello", "World"))
# is equal to...
print("{0} {1}".format("Hello", "World"))
print("{hello} {world}".format(hello="Hello", world="World"))

print("{0}{1}{0}".format("H", "e"))
```

    Hello World
    Hello World
    Hello World
    HeH


In addition, as mentioned in [0x05 Arguments and Unpacking](https://github.com/rainyear/pytips/blob/master/Tips/2016-03-11-Arguments-and-Unpacking.ipynb), unpacking can be used directly with `format()`:


```python
print("{lang}.{suffix}".format(**{"lang": "Python", "suffix": "py"}))
print("{} {}".format(*["Python", "Rocks"]))
```

    Python.py
    Python Rocks


Inside the template you can also use `.identifier` and `[key]` to reach attributes or items of a variable (note that `"{}{}"` is equivalent to `"{0}{1}"`):


```python
data = {'name': 'Python', 'score': 100}
print("Name: {0[name]}, Score: {0[score]}".format(data)) # no quotes needed

langs = ["Python", "Ruby"]
print("{0[0]} vs {0[1]}".format(langs))

print("\n====\nHelp(format):\n {.__doc__}".format(str.format))
```

    Name: Python, Score: 100
    Python vs Ruby
    
    ====
    Help(format):
     S.format(*args, **kwargs) -> str
    
    Return a formatted version of S, using substitutions from args and kwargs.
    The substitutions are identified by braces ('{' and '}').


### Conversion

`!` followed by `r|s|a` forces a conversion on the value being substituted:

1. `"{!r}"` calls `repr()` on the value
2. `"{!s}"` calls `str()` on the value
3. `"{!a}"` calls `ascii()` on the value

### Format

Finally, the part after `:` defines the output style:

![format_spec.jpg](http://7xiijd.com1.z0.glb.clouddn.com/format_spec.jpg)

`align` is the alignment direction and is usually combined with `width`, while `fill` is the padding character (a space by default):


```python
for align, text in zip("<^>", ["left", "center", "right"]):
    print("{:{fill}{align}16}".format(text, fill=align, align=align))

print("{:0=10}".format(100)) # = is only allowed for numbers
```

    left<<<<<<<<<<<<
    ^^^^^center^^^^^
    >>>>>>>>>>>right
    0000000100


This also shows that `{}` can be nested inside the format spec (here the nested fields are given by keyword), but only one level deep.

Next comes the sign option: `+|-|' '` controls whether a number is forced to show its sign (the space means a positive number shows no `+` but keeps one blank in its place):


```python
print("{0:+}\n{1:-}\n{0: }".format(3.14, -3.14))
```

    +3.14
    -3.14
     3.14


`#` controls whether numbers in special notations (binary, hexadecimal, etc.) get a prefix; `,` controls whether numbers get a thousands separator; `0` is equivalent to the earlier `{:0=}`, right-aligning and padding the empty positions with `0`:


```python
print("Binary: {0:b} => {0:#b}".format(3))

print("Large Number: {0:} => {0:,}".format(1.25e6))

print("Padding: {0:16} => {0:016}".format(3))
```

    Binary: 11 => 0b11
    Large Number: 1250000.0 => 1,250,000.0
    Padding:                3 => 0000000000000003


The last two are the familiar precision `.n` and the presentation type. Only a few examples are given here; see the [documentation](https://docs.python.org/3/library/string.html#formatexamples) for details:


```python
from math import pi
print("pi = {pi:.2}, also = {pi:.7}".format(pi=pi))
```

    pi = 3.1, also = 3.141593


**Integer**


```python
for t in "b c d #o #x #X n".split():
    print("Type {0:>2} of {1} shows: {1:{t}}".format(t, 97, t=t))
```

    Type  b of 97 shows: 1100001
    Type  c of 97 shows: a
    Type  d of 97 shows: 97
    Type #o of 97 shows: 0o141
    Type #x of 97 shows: 0x61
    Type #X of 97 shows: 0X61
    Type  n of 97 shows: 97


**Float**


```python
for t, n in zip("eEfFgGn%", [12345, 12345, 1.3, 1.3, 1, 2, 3.14, 0.985]):
    print("Type {} shows: {:.2{t}}".format(t, n, t=t))
```

    Type e shows: 1.23e+04
    Type E shows: 1.23E+04
    Type f shows: 1.30
    Type F shows: 1.30
    Type g shows: 1
    Type G shows: 2
    Type n shows: 3.1
    Type % shows: 98.50%


**String (default)**


```python
try:
    print("{:s}".format(123))   # 's' is not a valid format code for int
except ValueError:
    print("{}".format(456))
```

    456


--------------------------------------------------------------------------------
/Markdowns/2016-03-21-Try-else.md:
--------------------------------------------------------------------------------

### Python's Ubiquitous `else`

We all know the basic use of `else` in Python: the conditional `if...elif...else...`. But `else` has two other uses, one at the end of a loop and one in the error-handling `try` statement. Both are standard Python syntax, yet because they differ from the habits of most other programming languages, people overlook them, deliberately or not. There is also plenty of debate about whether these uses fit the principles of [0x00 The Zen of Python](https://github.com/rainyear/pytips/blob/master/Tips/2016-03-06-The-Zen-of-Python.ipynb) and whether they should be used widely. For example, in two books I have read ([Effective Python](http://www.effectivepython.com/) vs [Write Idiomatic Python](https://jeffknupp.com/writing-idiomatic-python-ebook/)), the two authors take completely opposite positions.

**`else` in loops**

An `else` clause that follows a loop runs only if no `break` occurred inside the loop, that is, when the loop ran to completion. Let's first look at an insertion sort:


```python
from random import randrange
def insertion_sort(seq):
    if len(seq) <= 1:
        return seq
    _sorted = seq[:1]
    for i in seq[1:]:
        inserted = False
        for j in range(len(_sorted)):
            if i < _sorted[j]:
                _sorted = [*_sorted[:j], i, *_sorted[j:]]
                inserted = True
                break
        if not inserted:
            _sorted.append(i)
    return _sorted

print(insertion_sort([randrange(1, 100) for i in range(10)]))
```

    [8, 12, 12, 34, 38, 68, 72, 78, 84, 90]


In this example, `i` is compared with the already-sorted elements of `_sorted` one by one. If `i` is larger than every sorted element, it can only go at the end of the sorted list. That requires an extra state variable, `inserted`, to record whether the inner loop ran to completion or was cut short by `break`. In a case like this we can use `else` in place of the state variable:


```python
def insertion_sort(seq):
    if len(seq) <= 1:
        return seq
    _sorted = seq[:1]
    for i in seq[1:]:
        for j in range(len(_sorted)):
            if i < _sorted[j]:
                _sorted = [*_sorted[:j], i, *_sorted[j:]]
                break
        else:
            _sorted.append(i)
    return _sorted
print(insertion_sort([randrange(1, 100) for i in range(10)]))
```

    [1, 10, 27, 32, 32, 43, 50, 55, 80, 94]


I think this is a very cool technique! Note, however, that the `else` clause runs not only when a loop finishes without hitting `break`, but also when the loop body never executes at all:


```python
while False:
    print("Will never print!")
else:
    print("Loop failed!")
```

    Loop failed!


**`else` in error handling**

The `try...except...else...finally` control flow catches exceptions that may occur and handles them accordingly. `except` catches errors raised in the `try` block; `else` handles the case where **no error occurred**; `finally` takes care of the "cleanup" for the `try` statement and runs no matter what. A simple example shows this:


```python
def divide(x, y):
    try:
        result = x / y
    except ZeroDivisionError:
        print("division by 0!")
    else:
        print("result = {}".format(result))
    finally:
        print("divide finished!")
divide(5,2)
print("*"*20)
divide(5,0)
```

    result = 2.5
    divide finished!
    ********************
    division by 0!
    divide finished!


Of course, a state variable can stand in for `else` here as well:


```python
def divide(x, y):
    result = None
    try:
        result = x / y
    except ZeroDivisionError:
        print("division by 0!")
    if result is not None:
        print("result = {}".format(result))
    print("divide finished!")


divide(5,2)
print("*"*20)
divide(5,0)
```

    result = 2.5
    divide finished!
    ********************
    division by 0!
    divide finished!
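
One practical reason to prefer `else` over widening the `try` block is scope: only the statements inside `try` are guarded, so an error raised in the `else` branch is not swallowed by the `except` clause. A small sketch of that behavior, where `parse_port` is a hypothetical helper invented for this illustration:

```python
def parse_port(text):
    """Return `text` as a TCP port number, or None if it is not an integer."""
    try:
        port = int(text)              # only this call is guarded
    except ValueError:
        return None                   # not a number at all
    else:
        # Runs only when int() succeeded. A ValueError raised here is NOT
        # caught by the except clause above; it propagates to the caller.
        if not 0 < port < 65536:
            raise ValueError("port out of range: {}".format(port))
        return port

print(parse_port("8080"))
print(parse_port("http"))
```

Had the range check lived inside the `try` block, the out-of-range error would have been silently turned into `None` as well.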


**Summary**

Some people feel these uses of `else` are counter-intuitive, or **implicit** rather than **explicit**, and should not be encouraged. I think that verdict depends on the concrete use case and on how well we understand Python; a syntax does not have to be beginner-friendly to count as **explicit**. That said, I do not recommend using it everywhere. The biggest drawback of `for/while...else` is that the `else` has to line up with the `for/while`, so with several levels of nesting or a long loop body it becomes a very poor fit (recall the vernier caliper joke :P). Only in short loop constructs, where `else` lets us get rid of a clumsy state variable, is it at its most Pythonic!

--------------------------------------------------------------------------------
/Markdowns/2016-03-22-Shallow-and-Deep-Copy.md:
--------------------------------------------------------------------------------

## Python: Shallow and Deep Copy

Python objects come in two kinds: mutable and immutable. Immutable objects include int, float, str, tuple (and `long` in Python 2); mutable objects include list, set, dict, and so on. In Python, assignment (`=`) is nothing more than:

1. creating an object (for some value);
2. pointing the variable name at (referencing) that object.

This resembles pointers in C, except that it is more flexible: a Python variable can be pointed at a different object at any time (regardless of type), and other variables can point at the same object. If that object is mutable, changing it through one reference affects all the others:


```python
lst = [1, 2, 3]
s = lst
s.pop()
print(lst)

d = {'a': 0}
e = d
e['b'] = 1
print(d)
```

    [1, 2]
    {'b': 1, 'a': 0}


Unless you do this on purpose (which is rarely what you want), it can lead to unexpected bugs, especially when passing arguments to functions. The simplest way around the problem is not to point the variable at the existing object directly, but to create a new copy and assign that to the new variable. Several syntaxes can do this:


```python
lst = [1,2,3]

llst = [lst,
        lst[:],
        lst.copy(),
        [*lst]] # invalid in 2.7
for i, v in enumerate(llst):
    v.append("#{}".format(i))
print(lst)

d = {"a": 0}
dd = [d,
      d.copy(),
      {**d}] # invalid in 2.7
for i, v in enumerate(dd):
    v['dd'] = "#{}".format(i)
print(d)
```

    [1, 2, 3, '#0']
    {'dd': '#0', 'a': 0}


### `deep` vs `shallow`

The copy examples above are simple and involve no nesting. What happens if the mutable object contains other mutable objects?


```python
lst = [0, 1, [2, 3]]

llst = [lst,
        lst[:],
        lst.copy(),
        [*lst]]
for i, v in enumerate(llst):
    v[2].append("#{}".format(i))
print(lst)

d = {"a": {"b": [0]}}
dd = [d,
      d.copy(),
      {**d}]
for i, v in enumerate(dd):
    v['a']['b'].append("#{}".format(i))
print(d)
```

    [0, 1, [2, 3, '#0', '#1', '#2', '#3']]
    {'a': {'b': [0, '#0', '#1', '#2']}}


These ways of copying are called **shallow copies**. A shallow copy goes one step beyond plain assignment by creating a new outer object, but the nested objects are still shared by reference, just as with assignment. To go one step further you need a **deep copy**, provided by the standard library module `copy`:


```python
from copy import deepcopy

lst = [0, 1, [2, 3]]
lst2 = deepcopy(lst)
lst2[2].append(4)
print(lst2)
print(lst)

d = {"a": {"b": [0]}}
d2 = deepcopy(d)
d2["a"]["b"].append(1)
print(d2)
print(d)
```

    [0, 1, [2, 3, 4]]
    [0, 1, [2, 3]]
    {'a': {'b': [0, 1]}}
    {'a': {'b': [0]}}


Understanding the difference between assignment (reference), copy, and `deepcopy` helps you avoid unexpected bugs, and you can also exploit these properties for a few little tricks; for example, in [0x04 Scope and Closure](https://github.com/rainyear/pytips/blob/master/Tips/2016-03-10-Scope-and-Closure.ipynb) we used a mutable object to get the effect of `nonlocal`. References to and passing of mutable objects are a fundamental property of Python, yet the machinery hidden behind the scenes makes them easy to misunderstand. To dig deeper, read the article linked in the references; I will also keep learning and adding more on this topic in later tips.

### References

1. [python基础(5):深入理解 python 中的赋值、引用、拷贝、作用域](http://my.oschina.net/leejun2005/blog/145911)

--------------------------------------------------------------------------------
/Markdowns/2016-03-23-With-Context-Manager.md:
--------------------------------------------------------------------------------

## Python Context Managers

Python 2.5 introduced the `with` statement ([PEP 343](https://www.python.org/dev/peps/pep-0343/)) and context manager types ([Context Manager Types](https://docs.python.org/3/library/stdtypes.html#context-manager-types)). Their main uses include:

> Saving and restoring various kinds of global state, locking and unlocking resources, closing opened files, etc. [With Statement Context Managers](https://docs.python.org/3/reference/datamodel.html#with-statement-context-managers)

One of the most common uses is working with files:


```python
with open("utf8.txt", "r") as f:
    print(f.read())
```

    你好,世界!
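
To see the cleanup actually happen, one can check the file object's `closed` flag after the `with` block exits, even when the block raises. This is a self-contained sketch that writes its own temporary file rather than relying on `utf8.txt`; the variable names are illustrative only:

```python
import os
import tempfile

# Create a throwaway UTF-8 file so the example does not depend on utf8.txt.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
with open(path, "w", encoding="utf-8") as f:
    f.write("你好,世界!\n")

handle = None
try:
    with open(path, "r", encoding="utf-8") as f:
        handle = f                      # keep a reference to inspect later
        text = f.read()
        raise RuntimeError("simulated failure inside the with block")
except RuntimeError:
    pass

closed_after_error = handle.closed      # the file was closed on the way out
os.remove(path)
print(text.strip(), closed_after_error)
```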



The `with open(...)` example can also be written with `try...finally...`, and the effect is the same (in other words, a context manager wraps up and simplifies the error-handling boilerplate):


```python
f = open("utf8.txt", "r")   # open outside try, so f always exists in finally
try:
    print(f.read())
finally:
    f.close()
```

    你好,世界!
    


Besides file objects, we can create context managers of our own. Much like the iterators introduced in [0x01](https://github.com/rainyear/pytips/blob/master/Tips/2016-03-07-iterator-and-generator.ipynb), any type that defines `__enter__()` and `__exit__()` is a context manager type. A `with` statement executes as follows:

1. Evaluate the expression after `with` to obtain the context manager; for example, `open('utf8.txt', 'r')` returns a `file object`;
2. Load the `__exit__()` method for later use;
3. Call `__enter__()`; its return value is bound to the variable after `as` (if there is one);
4. Execute the body of the `with` block;
5. Call `__exit__()`. If an exception occurred in the body, its `type, value, traceback` are passed to `__exit__()`; otherwise all three are `None`. If `__exit__()` returns `False`, the exception is re-raised for the outer code to handle; if it returns `True`, the exception is suppressed.

With the execution order clear, we can write our own context manager. Suppose we need a reference counter, and for some special reason several counters have to share global state and affect one another, and the initial global state must be restored once the counters are done:


```python
_G = {"counter": 99, "user": "admin"}

class Refs():
    def __init__(self, name = None):
        self.name = name
        self._G = _G
        self.init = self._G['counter']
    def __enter__(self):
        return self
    def __exit__(self, *args):
        self._G["counter"] = self.init
        return False
    def acc(self, n = 1):
        self._G["counter"] += n
    def dec(self, n = 1):
        self._G["counter"] -= n
    def __str__(self):
        return "COUNTER #{name}: {counter}".format(**self._G, name=self.name)

with Refs("ref1") as ref1, Refs("ref2") as ref2: # multiple managers in one `with` were added in Python 3.1
    for _ in range(3):
        ref1.dec()
        print(ref1)
        ref2.acc(2)
        print(ref2)
print(_G)
```

    COUNTER #ref1: 98
    COUNTER #ref2: 100
    COUNTER #ref1: 99
    COUNTER #ref2: 101
    COUNTER #ref1: 100
    COUNTER #ref2: 102
    {'user': 'admin', 'counter': 99}


The example above is contrived, but it illustrates the execution order of `with` well. Defining two methods every time is not very concise, though, so as usual Python offers a shortcut: `@contextlib.contextmanager` plus a `generator` (just as `yield` simplifies iterators in [0x01](https://github.com/rainyear/pytips/blob/master/Tips/2016-03-07-iterator-and-generator.ipynb)):


```python
from contextlib import contextmanager as cm
_G = {"counter": 99, "user": "admin"}

@cm
def ref():
    counter = _G["counter"]
    try:
        yield _G
    finally:
        # restore the saved value even if the with-body raised
        _G["counter"] = counter

with ref() as r1, ref() as r2:
    for _ in range(3):
        r1["counter"] -= 1
        print("COUNTER #ref1: {}".format(_G["counter"]))
        r2["counter"] += 2
        print("COUNTER #ref2: {}".format(_G["counter"]))
print("*"*20)
print(_G)
```

    COUNTER #ref1: 98
    COUNTER #ref2: 100
    COUNTER #ref1: 99
    COUNTER #ref2: 101
    COUNTER #ref1: 100
    COUNTER #ref2: 102
    ********************
    {'user': 'admin', 'counter': 99}


The generator must yield exactly one value (a single `yield`). The yielded value plays the role of the return value of `__enter__()`, and the statements after `yield` play the role of `__exit__()`.

The generator form is more concise and well suited to quickly creating a simple context manager.

Besides these two approaches, Python 3.2 added `contextlib.ContextDecorator`, which lets us define new "context-manager decorators" at the `class` level; see the [official documentation](https://docs.python.org/3/library/contextlib.html#contextlib.ContextDecorator) if you are interested. At least for now it does not seem to me to add much convenience (apart from saving one level of indentation :().

Context managers have a lot in common with decorators, but keep in mind that the purpose of the `with` statement is to clean up more gracefully, not to replace `try...finally...`. After all, in [The Zen of Python](https://github.com/rainyear/pytips/blob/master/Tips/2016-03-06-The-Zen-of-Python.ipynb),

> Explicit is better than implicit.

matters more than

> Simple is better than complex.

:P

--------------------------------------------------------------------------------
/Markdowns/2016-03-24-Sort-and-Sorted.md:
--------------------------------------------------------------------------------

### Python's Built-in Sorting

Python provides two built-in ways to sort: `list.sort()`, an in-place method that exists only on `List`, and `sorted()`, a non-in-place function that works on any iterable.

In-place sorting means the list object being sorted is modified immediately, just like `append()`/`pop()` and similar methods:


```python
from random import randrange
lst = [randrange(1, 100) for _ in range(10)]
print(lst)
lst.sort()

print(lst)
```

    [57, 81, 32, 74, 12, 89, 76, 21, 75, 6]
    [6, 12, 21, 32, 57, 74, 75, 76, 81, 89]


`sorted()` is not limited to lists; it builds and returns a new sorted **list** and leaves the original object untouched:


```python
lst = [randrange(1, 100) for _ in range(10)]
tup = tuple(lst)

print(sorted(tup)) # return List
print(tup)
```

    [11, 36, 39, 41, 48, 48, 50, 76, 79, 99]
    (11, 41, 79, 48, 48, 99, 39, 76, 36, 50)


Although it does not sort in place, a generator passed to it will still be exhausted:


```python
tup = (randrange(1, 100) for _ in range(10))
print(sorted(tup))
for i in tup:
    print(i)
```

    [5, 12, 15, 21, 57, 69, 73, 83, 90, 95]


### Key

Sorting a simple iterable only requires pulling out its elements and comparing them. If you want to process each element before comparing, pass a function via the `key` parameter. `key` works much like the functions accepted by `map`/`filter` in [0x02 Functional Programming](https://github.com/rainyear/pytips/blob/master/Tips/2016-03-08-Functional-Programming-in-Python.ipynb); the difference is that the `key` function only transforms elements for the comparison and does not change their actual values. For example, to sort a group of integers **by** (read `key` as "by") absolute value:


```python
lst = [randrange(-10, 10) for _ in range(10)]
print(lst)
print(sorted(lst, key=abs))
```

    [0, 7, 0, -10, 3, 7, -9, -10, -7, -10]
    [0, 0, 3, 7, 7, -7, -9, -10, -10, -10]


Or, when the elements are more complex, you can sort **by** just some of their attributes:


```python
lst = list(zip("hello world hail python".split(), [randrange(1, 10) for _ in range(4)]))
print(lst)
print(sorted(lst, key=lambda item: item[1]))
```

    [('hello', 3), ('world', 3), ('hail', 9), ('python', 9)]
    [('hello', 3), ('world', 3), ('hail', 9), ('python', 9)]


Python's `operator` standard library module provides operator-related helpers that make it easier to pull out items or attributes:


```python
from operator import itemgetter, attrgetter

print(lst)
print(sorted(lst, key=itemgetter(1)))

# everything is just a function
fitemgetter = lambda ind: lambda item: item[ind]
print(sorted(lst, key=fitemgetter(1)))

class P(object):
    def __init__(self, w, n):
        self.w = w
        self.n = n
    def __repr__(self):
        return "{}=>{}".format(self.w, self.n)
ps = [P(i[0], i[1]) for i in lst]

print(sorted(ps, key=attrgetter('n')))
```

    [('hello', 3), ('world', 3), ('hail', 9), ('python', 9)]
    [('hello', 3), ('world', 3), ('hail', 9), ('python', 9)]
    [('hello', 3), ('world', 3), ('hail', 9), ('python', 9)]
    [hello=>3, world=>3, hail=>9, python=>9]


After `key` has been applied, elements are compared with `<`. In Python 2.7, `sorted()` also accepted another parameter, `cmp`, to take over the `<` comparison. Python 3 dropped this entirely (since 3.0), including the `cmp` parameter of `sorted()` and the `__cmp__` method on objects. You would only need it in Python 3.5 when staying compatible with older code, and the replacement looks like this:


```python
from functools import cmp_to_key as new_cmp_to_key

# new_cmp_to_key works like this
def cmp_to_key(mycmp):
    'Convert a cmp= function into a key= function'
    class K:
        def __init__(self, obj, *args):
            self.obj = obj
        def __lt__(self, other):
            return mycmp(self.obj, other.obj) < 0
    return K
def reverse_cmp(x, y):
    return y[1] - x[1]
sorted(lst, key=cmp_to_key(reverse_cmp))

```




    [('hail', 9), ('python', 9), ('hello', 3), ('world', 3)]



To sort in descending order, simply pass `reverse=True`.

--------------------------------------------------------------------------------
/Markdowns/2016-03-25-Decorator-and-functools.md:
--------------------------------------------------------------------------------

## Python Decorators and `functools`

Python decorators are syntactic sugar, which means that:

```python
@decorator
@wrap
def func():
    pass
```

is shorthand for:

```python
def func():
    pass
func = decorator(wrap(func))
```

Two main questions about decorators:

1. What can a decorator decorate
2. What can serve as a decorator

### Decorating functions

The most common use of a decorator is to decorate a newly defined function. [0x0d Context Managers](https://github.com/rainyear/pytips/blob/master/Tips/2016-03-23-With-Context-Manager.ipynb) noted that context managers are mainly about **cleaning up more gracefully**, whereas decorators are usually used to extend a function's behavior or attributes:


```python
def log(func):
    def wraper():
        print("INFO: Starting {}".format(func.__name__))
        func()
        print("INFO: Finishing {}".format(func.__name__))
    return wraper

@log
def run():
    print("Running run...")
run()
```

    INFO: Starting run
    Running run...
    INFO: Finishing run


### Decorating classes

Besides functions, Python 3.0 added support for decorating newly defined classes ([PEP 3129](https://www.python.org/dev/peps/pep-3129/)). Class attributes can already be modified through [`Metaclasses`](https://www.python.org/doc/essays/metaclasses/) or inheritance, and the new class decorators were motivated more by [Jython](https://mail.python.org/pipermail/python-dev/2006-March/062942.html) and [IronPython](http://lists.ironpython.com/pipermail/users-ironpython.com/2006-March/002007.html), but the syntax is entirely consistent:


```python
from time import sleep, time
def timer(Cls):
    def wraper():
        s = time()
        obj = Cls()
        e = time()
        print("Cost {:.3f}s to init.".format(e - s))
        return obj
    return wraper
@timer
class Obj:
    def __init__(self):
        print("Hello")
        sleep(3)
        print("Obj")
o = Obj()
```

    Hello
    Obj
    Cost 3.005s to init.
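
Note that a class decorator written like `timer` rebinds the class name to a plain function, so the name can no longer be used with `isinstance()`. A common alternative is to modify the class and return it unchanged in identity. The sketch below assumes that is acceptable for the use case; `auto_repr` and `Point` are hypothetical names made up for this illustration:

```python
def auto_repr(cls):
    """Class decorator: attach a generic __repr__ and return the same class."""
    def __repr__(self):
        fields = ", ".join(
            "{}={!r}".format(k, v) for k, v in sorted(vars(self).items()))
        return "{}({})".format(type(self).__name__, fields)
    cls.__repr__ = __repr__
    return cls   # the name stays bound to a class, so isinstance() keeps working

@auto_repr
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(1, 2)
print(p)
print(isinstance(p, Point))
```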
77 | 78 | 79 | ### 类作为修饰器 80 | 81 | 上面两个例子都是以函数作为修饰器,因为函数才可以被调用(callable) `decorator(wrap(func))`。除了函数之外,我们也可以定义可被调用的类,只要添加 `__call__` 方法即可: 82 | 83 | 84 | ```python 85 | class HTML(object): 86 | """ 87 | Baking HTML Tags! 88 | """ 89 | def __init__(self, tag="p"): 90 | print("LOG: Baking Tag <{}>!".format(tag)) 91 | self.tag = tag 92 | def __call__(self, func): 93 | return lambda: "<{0}>{1}".format(self.tag, func(), self.tag) 94 | 95 | @HTML("html") 96 | @HTML("body") 97 | @HTML("div") 98 | def body(): 99 | return "Hello" 100 | 101 | print(body()) 102 | ``` 103 | 104 | LOG: Baking Tag ! 105 | LOG: Baking Tag ! 106 | LOG: Baking Tag
! 107 |
Hello
108 | 109 | 110 | ### 传递参数 111 | 112 | 在实际使用过程中,我们可能需要向修饰器传递参数,也有可能需要向被修饰的函数(或类)传递参数。按照语法约定,只要修饰器 `@decorator` 中的 `decorator` 是可调用即可,`decorator(123)` 如果返回一个新的可调用函数,那么也是合理的,上面的 `@HTML('html')` 即是一例,下面再以 [flask](https://github.com/mitsuhiko/flask/blob/master/flask%2Fapp.py) 的路由修饰器为例说明如何传递参数给修饰器: 113 | 114 | 115 | ```python 116 | RULES = {} 117 | def route(rule): 118 | def decorator(hand): 119 | RULES.update({rule: hand}) 120 | return hand 121 | return decorator 122 | @route("/") 123 | def index(): 124 | print("Hello world!") 125 | 126 | def home(): 127 | print("Welcome Home!") 128 | home = route("/home")(home) 129 | 130 | index() 131 | home() 132 | print(RULES) 133 | ``` 134 | 135 | Hello world! 136 | Welcome Home! 137 | {'/': , '/home': } 138 | 139 | 140 | 向被修饰的函数传递参数,要看我们的修饰器是如何作用的,如果像上面这个例子一样未执行被修饰函数只是将其原模原样地返回,则不需要任何处理(这就把函数当做普通的值一样看待即可): 141 | 142 | 143 | ```python 144 | @route("/login") 145 | def login(user = "user", pwd = "pwd"): 146 | print("DB.findOne({{{}, {}}})".format(user, pwd)) 147 | login("hail", "python") 148 | ``` 149 | 150 | DB.findOne({hail, python}) 151 | 152 | 153 | 如果需要在修饰器内执行,则需要稍微变动一下: 154 | 155 | 156 | ```python 157 | def log(f): 158 | def wraper(*args, **kargs): 159 | print("INFO: Start Logging") 160 | f(*args, **kargs) 161 | print("INFO: Finish Logging") 162 | return wraper 163 | 164 | @log 165 | def run(hello = "world"): 166 | print("Hello {}".format(hello)) 167 | run("Python") 168 | ``` 169 | 170 | INFO: Start Logging 171 | Hello Python 172 | INFO: Finish Logging 173 | 174 | 175 | ### functools 176 | 177 | 由于修饰器将函数(或类)进行包装之后重新返回:`func = decorator(func)`,那么有可能改变原本函数(或类)的一些信息,以上面的 `HTML` 修饰器为例: 178 | 179 | 180 | ```python 181 | @HTML("body") 182 | def body(): 183 | """ 184 | return body content 185 | """ 186 | return "Hello, body!" 187 | print(body.__name__) 188 | print(body.__doc__) 189 | ``` 190 | 191 | LOG: Baking Tag ! 
    <lambda>
    None


Since `body = HTML("body")(body)` and `HTML("body").__call__()` returns a `lambda`, the name `body` now refers to that `lambda`. Both are callable, but attributes of the originally defined `body`, such as `__doc__`, `__name__` and `__module__`, have been replaced (`__module__` happens to be unchanged here because everything lives in the same file). To solve this, Python provides the [`functools`](https://docs.python.org/3.5/library/functools.html) standard library module, which includes the two functions `update_wrapper` and `wraps` ([source](https://hg.python.org/cpython/file/3.5/Lib/functools.py)). `update_wrapper` copies the metadata of the original function onto the function returned by the decorator:


```python
from functools import update_wrapper
"""
functools.update_wrapper(wrapper, wrapped[, assigned][, updated])
"""


class HTML(object):
    """
    Baking HTML Tags!
    """
    def __init__(self, tag="p"):
        print("LOG: Baking Tag <{}>!".format(tag))
        self.tag = tag
    def __call__(self, func):
        wraper = lambda: "<{0}>{1}</{0}>".format(self.tag, func())
        update_wrapper(wraper, func)
        return wraper
@HTML("body")
def body():
    """
    return body content!
    """
    return "Hello, body!"
print(body.__name__)
print(body.__doc__)
```

    LOG: Baking Tag <body>!
    body

        return body content!
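
To see what `update_wrapper` leaves behind, here is a small sketch. It assumes Python 3.2 or later, where `update_wrapper` also stores the original function on the wrapper as `__wrapped__`; the decorator `shout` and the function `greet` are hypothetical names made up for this illustration.

```python
from functools import update_wrapper, WRAPPER_ASSIGNMENTS


def shout(func):
    # hypothetical decorator, used only for this illustration
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs).upper()
    update_wrapper(wrapper, func)   # copy metadata from func onto wrapper
    return wrapper


@shout
def greet(name):
    """Return a greeting."""
    return "hello, {}".format(name)


print(WRAPPER_ASSIGNMENTS)            # attribute names copied by default
print(greet.__name__, greet.__doc__)  # metadata of the original function
print(greet.__wrapped__("bob"))       # the undecorated function is still reachable
print(greet("bob"))                   # the decorated behaviour
```

Keeping `__wrapped__` around is convenient in tests, where calling the undecorated function directly is sometimes easier than working around the decorator.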


Interestingly, the way `update_wrapper` is used already looks a lot like a decorator, so `functools.wraps` uses `functools.partial` (remember partial application from the functional programming tip!) to turn it into one:


```python
from functools import update_wrapper, partial

def my_wraps(wrapped):
    # fix the `wrapped` argument; the result then takes the wrapper function
    return partial(update_wrapper, wrapped=wrapped)

def log(func):
    @my_wraps(func)
    def wraper():
        print("INFO: Starting {}".format(func.__name__))
        func()
        print("INFO: Finishing {}".format(func.__name__))
    return wraper

@log
def run():
    """
    Docs' of run
    """
    print("Running run...")
print(run.__name__)
print(run.__doc__)
```

    run

        Docs' of run



### References

1. [Python修饰器的函数式编程](http://coolshell.cn/articles/11265.html)

-------------------------------------------------------------------------------- /PyTips.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/coodict/pytips/b09321e30f66449056e0832b1a499b8a9623e461/PyTips.png -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | ### Python Tips 2 | [![Join the chat at https://gitter.im/rainyear/pytips](https://badges.gitter.im/rainyear/pytips.svg)](https://gitter.im/rainyear/pytips) 3 | 4 | Inspired by the [jstips](https://github.com/loverajoel/jstips) project. 5 | 6 | > One useful Python tip every day 7 | 8 | Spend less than two minutes a day learning a small trick that improves the quality of your Python code and solves problems in a more Pythonic way. 9 | 10 | *Mainly based on Python 3.5, staying backward compatible (down to Python 2.7) where possible.* 11 | 12 | ### Tips list 13 | 14 | - **0x0f** - [Decorator and functools](https://github.com/rainyear/pytips/blob/master/Tips/2016-03-25-Decorator-and-functools.ipynb) [[markdown](https://github.com/rainyear/pytips/blob/master/Markdowns/2016-03-25-Decorator-and-functools.md)] 15 | - **0x0e** - [Sort and
Sorted](https://github.com/rainyear/pytips/blob/master/Tips/2016-03-24-Sort-and-Sorted.ipynb) [[markdown](https://github.com/rainyear/pytips/blob/master/Markdowns/2016-03-24-Sort-and-Sorted.md)] 16 | - **0x0d** - [With Context Manager](https://github.com/rainyear/pytips/blob/master/Tips/2016-03-23-With-Context-Manager.ipynb) [[markdown](https://github.com/rainyear/pytips/blob/master/Markdowns/2016-03-23-With-Context-Manager.md)] 17 | - **0x0c** - [Shallow and Deep Copy](https://github.com/rainyear/pytips/blob/master/Tips/2016-03-22-Shallow-and-Deep-Copy.ipynb) [[markdown](https://github.com/rainyear/pytips/blob/master/Markdowns/2016-03-22-Shallow-and-Deep-Copy.md)] 18 | - **0x0b** - [Try else](https://github.com/rainyear/pytips/blob/master/Tips/2016-03-21-Try-else.ipynb) [[markdown](https://github.com/rainyear/pytips/blob/master/Markdowns/2016-03-21-Try-else.md)] 19 | - **0x0a** - [String Format](https://github.com/rainyear/pytips/blob/master/Tips/2016-03-18-String-Format.ipynb) [[markdown](https://github.com/rainyear/pytips/blob/master/Markdowns/2016-03-18-String-Format.md)] 20 | - **0x09** - [Bytes decode Unicode encode Bytes](https://github.com/rainyear/pytips/blob/master/Tips/2016-03-17-Bytes-decode-Unicode-encode-Bytes.ipynb) [[markdown](https://github.com/rainyear/pytips/blob/master/Markdowns/2016-03-17-Bytes-decode-Unicode-encode-Bytes.md)] 21 | - **0x08** - [Bytes and Bytearray](https://github.com/rainyear/pytips/blob/master/Tips/2016-03-16-Bytes-and-Bytearray.ipynb) [[markdown](https://github.com/rainyear/pytips/blob/master/Markdowns/2016-03-16-Bytes-and-Bytearray.md)] 22 | - **0x07** - [Unicode String](https://github.com/rainyear/pytips/blob/master/Tips/2016-03-15-Unicode-String.ipynb) [[markdown](https://github.com/rainyear/pytips/blob/master/Markdowns/2016-03-15-Unicode-String.md)] 23 | - **0x06** - [Command Line tools in Python](https://github.com/rainyear/pytips/blob/master/Tips/2016-03-14-Command-Line-tools-in-Python.ipynb) 
[[markdown](https://github.com/rainyear/pytips/blob/master/Markdowns/2016-03-14-Command-Line-tools-in-Python.md)] 24 | - **0x05** - [Arguments and Unpacking](https://github.com/rainyear/pytips/blob/master/Tips/2016-03-11-Arguments-and-Unpacking.ipynb) [[markdown](https://github.com/rainyear/pytips/blob/master/Markdowns/2016-03-11-Arguments-and-Unpacking.md)] 25 | - **0x04** - [Scope and Closure](https://github.com/rainyear/pytips/blob/master/Tips/2016-03-10-Scope-and-Closure.ipynb) [[markdown](https://github.com/rainyear/pytips/blob/master/Markdowns/2016-03-10-Scope-and-Closure.md)] 26 | - **0x03** - [List Comprehension](https://github.com/rainyear/pytips/blob/master/Tips/2016-03-09-List-Comprehension.ipynb) [[markdown](https://github.com/rainyear/pytips/blob/master/Markdowns/2016-03-09-List-Comprehension.md)] 27 | - **0x02** - [Functional Programming in Python](https://github.com/rainyear/pytips/blob/master/Tips/2016-03-08-Functional-Programming-in-Python.ipynb) [[markdown](https://github.com/rainyear/pytips/blob/master/Markdowns/2016-03-08-Functional-Programming-in-Python.md)] 28 | - **0x01** - [Iterator and Generator](https://github.com/rainyear/pytips/blob/master/Tips/2016-03-07-iterator-and-generator.ipynb) [[markdown](https://github.com/rainyear/pytips/blob/master/Markdowns/2016-03-07-iterator-and-generator.md)] 29 | - **0x00** - [The Zen of Python](https://github.com/rainyear/pytips/blob/master/Tips/2016-03-06-The-Zen-of-Python.ipynb) [[markdown](https://github.com/rainyear/pytips/blob/master/Markdowns/2016-03-06-The-Zen-of-Python.md)] 30 | 31 | ### How to Contribute 32 | 33 | 1. 
提供问题、话题或应用场景:[Issue](https://github.com/rainyear/pytips/issues) 34 | 35 | ### Thanks 36 | 37 | - [Python](http://www.python.org/) 38 | - [Jupyter](https://jupyter.org/) 39 | - [jupyter-vim-binding](https://github.com/lambdalisue/jupyter-vim-binding) 40 | 41 | ### Donate 42 | 43 | 如果你觉得对你有所帮助,[不妨请我喝杯咖啡~ :beers:](http://rainy.im/donate/) 44 | 45 | ### License 46 | 47 | [MIT](./LICENSE) 48 | -------------------------------------------------------------------------------- /Tips/2016-03-06-The-Zen-of-Python.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "### Python 之禅与 Pythonic\n", 8 | "\n", 9 | "Python 之禅是 Python 语言的设计哲学与所倡导的编程理念,Pythonic 则是指基于 Python 理念编写更加符合 Python 语法习惯(idiomatic Python)的代码,这也是本项目所追求的目标,因此以本篇作为开头。" 10 | ] 11 | }, 12 | { 13 | "cell_type": "code", 14 | "execution_count": 1, 15 | "metadata": { 16 | "collapsed": false 17 | }, 18 | "outputs": [ 19 | { 20 | "name": "stdout", 21 | "output_type": "stream", 22 | "text": [ 23 | "The Zen of Python, by Tim Peters\n", 24 | "\n", 25 | "Beautiful is better than ugly.\n", 26 | "Explicit is better than implicit.\n", 27 | "Simple is better than complex.\n", 28 | "Complex is better than complicated.\n", 29 | "Flat is better than nested.\n", 30 | "Sparse is better than dense.\n", 31 | "Readability counts.\n", 32 | "Special cases aren't special enough to break the rules.\n", 33 | "Although practicality beats purity.\n", 34 | "Errors should never pass silently.\n", 35 | "Unless explicitly silenced.\n", 36 | "In the face of ambiguity, refuse the temptation to guess.\n", 37 | "There should be one-- and preferably only one --obvious way to do it.\n", 38 | "Although that way may not be obvious at first unless you're Dutch.\n", 39 | "Now is better than never.\n", 40 | "Although never is often better than *right* now.\n", 41 | "If the implementation is hard to explain, it's a bad idea.\n", 42 
| "If the implementation is easy to explain, it may be a good idea.\n", 43 | "Namespaces are one honking great idea -- let's do more of those!\n" 44 | ] 45 | } 46 | ], 47 | "source": [ 48 | "import this" 49 | ] 50 | }, 51 | { 52 | "cell_type": "markdown", 53 | "metadata": {}, 54 | "source": [ 55 | "Python 之禅,by Tim Peters\n", 56 | "\n", 57 | "优美胜于丑陋\n", 58 | "\n", 59 | "明确胜于隐晦\n", 60 | "\n", 61 | "简单胜于复杂\n", 62 | "\n", 63 | "复杂胜于凌乱\n", 64 | "\n", 65 | "扁平胜于嵌套\n", 66 | "\n", 67 | "稀疏胜于紧凑\n", 68 | "\n", 69 | "可读性至关重要\n", 70 | "\n", 71 | "即便特例,也需服从以上规则\n", 72 | "\n", 73 | "\n", 74 | "除非刻意追求,错误不应跳过\n", 75 | "\n", 76 | "面对歧义条件,拒绝尝试猜测\n", 77 | "\n", 78 | "\n", 79 | "解决问题的最优方法应该有且只有一个\n", 80 | "\n", 81 | "尽管这一方法并非显而易见(除非你是Python之父)\n", 82 | "\n", 83 | "\n", 84 | "动手胜于空想\n", 85 | "\n", 86 | "空想胜于不想\n", 87 | "\n", 88 | "\n", 89 | "难以解释的实现方案,不是好方案\n", 90 | "\n", 91 | "易于解释的实现方案,才是好方案\n", 92 | "\n", 93 | "\n", 94 | "命名空间是个绝妙的理念,多多益善!" 95 | ] 96 | }, 97 | { 98 | "cell_type": "markdown", 99 | "metadata": {}, 100 | "source": [ 101 | "#### 参考\n", 102 | "\n", 103 | "1. [《Python之禅》的翻译和解释](http://blog.csdn.net/gzlaiyonghao/article/details/2151918)\n", 104 | "2. 
[What is Pythonic?](http://blog.startifact.com/posts/older/what-is-pythonic.html)" 105 | ] 106 | } 107 | ], 108 | "metadata": { 109 | "kernelspec": { 110 | "display_name": "Python 3", 111 | "language": "python", 112 | "name": "python3" 113 | }, 114 | "language_info": { 115 | "codemirror_mode": { 116 | "name": "ipython", 117 | "version": 3 118 | }, 119 | "file_extension": ".py", 120 | "mimetype": "text/x-python", 121 | "name": "python", 122 | "nbconvert_exporter": "python", 123 | "pygments_lexer": "ipython3", 124 | "version": "3.5.0" 125 | } 126 | }, 127 | "nbformat": 4, 128 | "nbformat_minor": 0 129 | } 130 | -------------------------------------------------------------------------------- /Tips/2016-03-07-iterator-and-generator.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "### 迭代器与生成器\n", 8 | "\n", 9 | "迭代器(iterator)与生成器(generator)是 Python 中比较常用又很容易混淆的两个概念,今天就把它们梳理一遍,并举一些常用的例子。\n", 10 | "\n", 11 | "**`for` 语句与可迭代对象(iterable object):**" 12 | ] 13 | }, 14 | { 15 | "cell_type": "code", 16 | "execution_count": 1, 17 | "metadata": { 18 | "collapsed": false 19 | }, 20 | "outputs": [ 21 | { 22 | "name": "stdout", 23 | "output_type": "stream", 24 | "text": [ 25 | "1\n", 26 | "2\n", 27 | "3\n" 28 | ] 29 | } 30 | ], 31 | "source": [ 32 | "for i in [1, 2, 3]:\n", 33 | " print(i)" 34 | ] 35 | }, 36 | { 37 | "cell_type": "code", 38 | "execution_count": 2, 39 | "metadata": { 40 | "collapsed": false 41 | }, 42 | "outputs": [ 43 | { 44 | "name": "stdout", 45 | "output_type": "stream", 46 | "text": [ 47 | "b\n", 48 | "a\n" 49 | ] 50 | } 51 | ], 52 | "source": [ 53 | "obj = {\"a\": 123, \"b\": 456}\n", 54 | "for k in obj:\n", 55 | " print(k)" 56 | ] 57 | }, 58 | { 59 | "cell_type": "markdown", 60 | "metadata": {}, 61 | "source": [ 62 | "这些可以用在 `for` 语句进行循环的对象就是**可迭代对象**。除了内置的数据类型(列表、元组、字符串、字典等)可以通过 `for` 语句进行迭代,我们也可以自己创建一个容器,包含一系列元素,可以通过 `for` 
语句依次循环取出每一个元素,这种容器就是**迭代器(iterator)**。除了用 `for` 遍历,迭代器还可以通过 `next()` 方法逐一读取下一个元素。要创建一个迭代器有3种方法,其中前两种分别是:\n", 63 | "\n", 64 | "1. 为容器对象添加 `__iter__()` 和 `__next__()` 方法(Python 2.7 中是 `next()`);`__iter__()` 返回迭代器对象本身 `self`,`__next__()` 则返回每次调用 `next()` 或迭代时的元素;\n", 65 | "2. 内置函数 `iter()` 将可迭代对象转化为迭代器" 66 | ] 67 | }, 68 | { 69 | "cell_type": "code", 70 | "execution_count": 14, 71 | "metadata": { 72 | "collapsed": false 73 | }, 74 | "outputs": [ 75 | { 76 | "name": "stdout", 77 | "output_type": "stream", 78 | "text": [ 79 | "\n", 80 | "1\n", 81 | "2\n", 82 | "3\n", 83 | "[LOG] I made this iterator!\n", 84 | "[LOG] Calling __next__ method!\n", 85 | "0\n", 86 | "[LOG] Calling __next__ method!\n", 87 | "1\n", 88 | "[LOG] Calling __next__ method!\n", 89 | "2\n", 90 | "[LOG] Calling __next__ method!\n", 91 | "3\n", 92 | "[LOG] Calling __next__ method!\n", 93 | "4\n", 94 | "[LOG] Calling __next__ method!\n" 95 | ] 96 | } 97 | ], 98 | "source": [ 99 | "# iter(IterableObject)\n", 100 | "ita = iter([1, 2, 3])\n", 101 | "print(type(ita))\n", 102 | "\n", 103 | "print(next(ita))\n", 104 | "print(next(ita))\n", 105 | "print(next(ita))\n", 106 | "\n", 107 | "# Create iterator Object\n", 108 | "class Container:\n", 109 | " def __init__(self, start = 0, end = 0):\n", 110 | " self.start = start\n", 111 | " self.end = end\n", 112 | " def __iter__(self):\n", 113 | " print(\"[LOG] I made this iterator!\")\n", 114 | " return self\n", 115 | " def __next__(self):\n", 116 | " print(\"[LOG] Calling __next__ method!\")\n", 117 | " if self.start < self.end:\n", 118 | " i = self.start\n", 119 | " self.start += 1\n", 120 | " return i\n", 121 | " else:\n", 122 | " raise StopIteration()\n", 123 | "c = Container(0, 5)\n", 124 | "for i in c:\n", 125 | " print(i)\n", 126 | " " 127 | ] 128 | }, 129 | { 130 | "cell_type": "markdown", 131 | "metadata": {}, 132 | "source": [ 133 | "创建迭代器对象的好处是当序列长度很大时,可以减少内存消耗,因为每次只需要记录一个值即刻(经常看到人们介绍 Python 2.7 的 `range` 函数时,建议当长度太大时用 `xrange` 更快,在 Python 3.5 中已经去除了 
`xrange` 只有一个类似迭代器一样的 `range`)。\n", 134 | "\n", 135 | "#### 生成器\n", 136 | "\n", 137 | "前面说到创建迭代器有3种方法,其中第三种就是**生成器(generator)**。生成器通过 `yield` 语句快速生成迭代器,省略了复杂的 `__iter__()` & `__next__()` 方式:" 138 | ] 139 | }, 140 | { 141 | "cell_type": "code", 142 | "execution_count": 24, 143 | "metadata": { 144 | "collapsed": false 145 | }, 146 | "outputs": [ 147 | { 148 | "name": "stdout", 149 | "output_type": "stream", 150 | "text": [ 151 | "\n", 152 | "0\n", 153 | "2\n", 154 | "3\n", 155 | "4\n" 156 | ] 157 | } 158 | ], 159 | "source": [ 160 | "def container(start, end):\n", 161 | " while start < end:\n", 162 | " yield start\n", 163 | " start += 1\n", 164 | "c = container(0, 5)\n", 165 | "print(type(c))\n", 166 | "print(next(c))\n", 167 | "next(c)\n", 168 | "for i in c:\n", 169 | " print(i)" 170 | ] 171 | }, 172 | { 173 | "cell_type": "markdown", 174 | "metadata": {}, 175 | "source": [ 176 | "简单来说,`yield` 语句可以让普通函数变成一个生成器,并且相应的 `__next__()` 方法返回的是 `yield` 后面的值。一种更直观的解释是:程序执行到 `yield` 会返回值并暂停,再次调用 `next()` 时会从上次暂停的地方继续开始执行:" 177 | ] 178 | }, 179 | { 180 | "cell_type": "code", 181 | "execution_count": 26, 182 | "metadata": { 183 | "collapsed": false 184 | }, 185 | "outputs": [ 186 | { 187 | "name": "stdout", 188 | "output_type": "stream", 189 | "text": [ 190 | "5\n", 191 | "Hello\n", 192 | "World\n", 193 | "4\n" 194 | ] 195 | } 196 | ], 197 | "source": [ 198 | "def gen():\n", 199 | " yield 5\n", 200 | " yield \"Hello\"\n", 201 | " yield \"World\"\n", 202 | " yield 4\n", 203 | "for i in gen():\n", 204 | " print(i)" 205 | ] 206 | }, 207 | { 208 | "cell_type": "markdown", 209 | "metadata": {}, 210 | "source": [ 211 | "Python 3.5 (准确地说应该是 3.3 以后)中为生成器添加了更多特性,包括 `yield from` 以及在暂停的地方传值回生成器的 `send()`等,为了保持简洁这里就不深入介绍了,有兴趣可以阅读[官方文档](https://docs.python.org/3/reference/expressions.html#yieldexpr)说明以及参考链接2。" 212 | ] 213 | }, 214 | { 215 | "cell_type": "markdown", 216 | "metadata": {}, 217 | "source": [ 218 | "#### 参考\n", 219 | "\n", 220 | "1. 
[Iterators & Generators](http://anandology.com/python-practice-book/iterators.html)\n", 221 | "2. [How the heck does async/await work in Python 3.5?](http://www.snarky.ca/how-the-heck-does-async-await-work-in-python-3-5)\n", 222 | "3. [Python's yield from](http://charlesleifer.com/blog/python-s-yield-from/)" 223 | ] 224 | } 225 | ], 226 | "metadata": { 227 | "kernelspec": { 228 | "display_name": "Python 3", 229 | "language": "python", 230 | "name": "python3" 231 | }, 232 | "language_info": { 233 | "codemirror_mode": { 234 | "name": "ipython", 235 | "version": 3 236 | }, 237 | "file_extension": ".py", 238 | "mimetype": "text/x-python", 239 | "name": "python", 240 | "nbconvert_exporter": "python", 241 | "pygments_lexer": "ipython3", 242 | "version": "3.5.0" 243 | } 244 | }, 245 | "nbformat": 4, 246 | "nbformat_minor": 0 247 | } 248 | -------------------------------------------------------------------------------- /Tips/2016-03-08-Functional-Programming-in-Python.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "### Python 中的函数式编程\n", 8 | "\n", 9 | "> 函数式编程(英语:functional programming)或称函数程序设计,又称泛函编程,是一种编程范型,它将电脑运算视为数学上的函数计算,并且避免使用程序状态以及易变对象。函数编程语言最重要的基础是λ演算(lambda calculus)。而且λ演算的函数可以接受函数当作输入(引数)和输出(传出值)。(维基百科:函数式编程)\n", 10 | "\n", 11 | "所谓编程范式(Programming paradigm)是指编程风格、方法或模式,比如面向过程编程(C语言)、面向对象编程(C++)、面向函数式编程(Haskell),并不是说某种编程语言一定属于某种范式,例如 Python 就是多范式编程语言。\n", 12 | "\n", 13 | "#### 函数式编程\n", 14 | "\n", 15 | "函数式编程具有以下特点:\n", 16 | "\n", 17 | "1. 避免状态变量\n", 18 | "2. 函数也是变量(一等公民,First-Class Citizen)\n", 19 | "3. 高阶函数\n", 20 | "4. 
面向问题描述而不是面向问题解决步骤\n", 21 | "\n", 22 | "值得一提的是,函数式编程的这些特点在实践过程中可能并不是那么 Pythonic,甚至与**[0x00](https://github.com/rainyear/pytips/blob/master/Tips/2016-03-06-The-Zen-of-Python.ipynb)**中提到的 The Zen of Python 相悖。例如函数式编程面向问题描述的特点可能让你更快地写出更简洁的代码,但可读性却也大打折扣(可参考这一段[Haskell代码](https://gist.github.com/rainyear/94b5d9a865601f075719))。不过,虽然 Pythonic 很重要但并不是唯一的准则,_The Choice Is Yours_。" 23 | ] 24 | }, 25 | { 26 | "cell_type": "markdown", 27 | "metadata": {}, 28 | "source": [ 29 | "#### `map(function, iterable, ...)`/`filter(function, iterable)`" 30 | ] 31 | }, 32 | { 33 | "cell_type": "code", 34 | "execution_count": 1, 35 | "metadata": { 36 | "collapsed": false 37 | }, 38 | "outputs": [ 39 | { 40 | "name": "stdout", 41 | "output_type": "stream", 42 | "text": [ 43 | "\n", 44 | "Ana\n", 45 | "Bob\n", 46 | "Dogge\n" 47 | ] 48 | } 49 | ], 50 | "source": [ 51 | "# map 函数的模拟实现\n", 52 | "def myMap(func, iterable):\n", 53 | " for arg in iterable:\n", 54 | " yield func(arg)\n", 55 | "\n", 56 | "names = [\"ana\", \"bob\", \"dogge\"]\n", 57 | "\n", 58 | "print(map(lambda x: x.capitalize(), names)) # Python 2.7 中直接返回列表\n", 59 | "for name in myMap(lambda x: x.capitalize(), names):\n", 60 | " print(name)" 61 | ] 62 | }, 63 | { 64 | "cell_type": "code", 65 | "execution_count": 2, 66 | "metadata": { 67 | "collapsed": false 68 | }, 69 | "outputs": [ 70 | { 71 | "name": "stdout", 72 | "output_type": "stream", 73 | "text": [ 74 | "\n", 75 | "0\n", 76 | "2\n", 77 | "4\n", 78 | "6\n", 79 | "8\n" 80 | ] 81 | } 82 | ], 83 | "source": [ 84 | "# filter 函数的模拟实现\n", 85 | "def myFilter(func, iterable):\n", 86 | " for arg in iterable:\n", 87 | " if func(arg):\n", 88 | " yield arg\n", 89 | " \n", 90 | "print(filter(lambda x: x % 2 == 0, range(10))) # Python 2.7 中直接返回列表\n", 91 | "for i in myFilter(lambda x: x % 2 == 0, range(10)):\n", 92 | " print(i)" 93 | ] 94 | }, 95 | { 96 | "cell_type": "markdown", 97 | "metadata": {}, 98 | "source": [ 99 | "#### `functools.reduce(function, iterable[, initializer])`\n", 
100 | "\n", 101 | "Python 3.5 中`reduce` 被降格到标准库`functools`,`reduce` 也是遍历可迭代对象元素作为第一个函数的参数,并将结果累计:" 102 | ] 103 | }, 104 | { 105 | "cell_type": "code", 106 | "execution_count": 3, 107 | "metadata": { 108 | "collapsed": false 109 | }, 110 | "outputs": [ 111 | { 112 | "name": "stdout", 113 | "output_type": "stream", 114 | "text": [ 115 | "24\n" 116 | ] 117 | } 118 | ], 119 | "source": [ 120 | "from functools import reduce\n", 121 | "\n", 122 | "print(reduce(lambda a, b: a*b, range(1,5)))" 123 | ] 124 | }, 125 | { 126 | "cell_type": "markdown", 127 | "metadata": {}, 128 | "source": [ 129 | "#### `functools.partial(func, *args, **keywords)`\n", 130 | "\n", 131 | "偏应用函数(Partial Application)让我们可以固定函数的某些参数:" 132 | ] 133 | }, 134 | { 135 | "cell_type": "code", 136 | "execution_count": 4, 137 | "metadata": { 138 | "collapsed": false 139 | }, 140 | "outputs": [ 141 | { 142 | "data": { 143 | "text/plain": [ 144 | "1025" 145 | ] 146 | }, 147 | "execution_count": 4, 148 | "metadata": {}, 149 | "output_type": "execute_result" 150 | } 151 | ], 152 | "source": [ 153 | "from functools import partial\n", 154 | "\n", 155 | "add = lambda a, b: a + b\n", 156 | "add1024 = partial(add, 1024)\n", 157 | "\n", 158 | "add1024(1)" 159 | ] 160 | }, 161 | { 162 | "cell_type": "markdown", 163 | "metadata": {}, 164 | "source": [ 165 | "这里简单介绍了一些常用函数式编程的方法和概念,实际上要传达的一个最重要的观念就是**函数本身也可以作为变量被返回、传递给高阶函数**,这使得我们可以更灵活地运用函数解决问题。但是这并不意味着一定要使用上面这些方法来简化代码,例如更 Pythonic 的方法推荐尽可能使用 List Comprehension 替代`map`/`filter`(关于 List Comprehension 后面会再单独介绍)。如果一定想要用函数式编程的方法来写 Python,也可以尝试[Fn.py](https://github.com/kachayev/fn.py),或者,试试 [Haskell](https://www.haskell.org/)。" 166 | ] 167 | }, 168 | { 169 | "cell_type": "markdown", 170 | "metadata": {}, 171 | "source": [ 172 | "#### 参考\n", 173 | "\n", 174 | "1. [维基百科:函数式编程](https://zh.wikipedia.org/wiki/%E5%87%BD%E6%95%B8%E7%A8%8B%E5%BC%8F%E8%AA%9E%E8%A8%80)\n", 175 | "2. 
[byvoid:APIO讲稿——函数式编程](http://byvoid.github.io/slides/apio-fp/index.html)" 176 | ] 177 | } 178 | ], 179 | "metadata": { 180 | "kernelspec": { 181 | "display_name": "Python 3", 182 | "language": "python", 183 | "name": "python3" 184 | }, 185 | "language_info": { 186 | "codemirror_mode": { 187 | "name": "ipython", 188 | "version": 3 189 | }, 190 | "file_extension": ".py", 191 | "mimetype": "text/x-python", 192 | "name": "python", 193 | "nbconvert_exporter": "python", 194 | "pygments_lexer": "ipython3", 195 | "version": "3.5.0" 196 | } 197 | }, 198 | "nbformat": 4, 199 | "nbformat_minor": 0 200 | } 201 | -------------------------------------------------------------------------------- /Tips/2016-03-09-List-Comprehension.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "### 0x03 - Python 列表推导\n", 8 | "\n", 9 | "**[0x02](https://github.com/rainyear/pytips/blob/master/Tips/2016-03-08-Functional-Programming-in-Python.ipynb)** 中提到的 `map`/`filter` 方法可以通过简化的语法快速构建我们需要的列表(或其它可迭代对象),与它们功能相似的,Python 还提供**列表推导(List Comprehension)**的语法。最初学 Python 的时候,我只是把这种语法当做一种**语法糖**,可以用来快速构建特定的列表,后来学习 Haskell 的时候才知道这种形式叫做 List Comprehension(中文我好像没有找到固定的翻译,有翻译成**列表速构、列表解析**之类的,但意思上都是在定义列表结构的时候按照一定的规则进行推导,而不是穷举所有元素)。\n", 10 | "\n", 11 | "这种列表推导与数学里面集合的表达形式有些相似,例如$[0, 10)$之间偶数集合可以表示为:\n", 12 | "\n", 13 | "$$\\left\\{x\\ |\\ x \\in N, x \\lt 10, x\\ mod\\ 2\\ ==\\ 0\\right\\}$$\n", 14 | "\n", 15 | "翻译成 Python 表达式为:" 16 | ] 17 | }, 18 | { 19 | "cell_type": "code", 20 | "execution_count": 1, 21 | "metadata": { 22 | "collapsed": false 23 | }, 24 | "outputs": [ 25 | { 26 | "name": "stdout", 27 | "output_type": "stream", 28 | "text": [ 29 | "[0, 2, 4, 6, 8]\n" 30 | ] 31 | } 32 | ], 33 | "source": [ 34 | "evens = [x for x in range(10) if x % 2 == 0]\n", 35 | "print(evens)" 36 | ] 37 | }, 38 | { 39 | "cell_type": "markdown", 40 | "metadata": {}, 41 | "source": [ 42 | 
"这与`filter`效果一样:" 43 | ] 44 | }, 45 | { 46 | "cell_type": "code", 47 | "execution_count": 2, 48 | "metadata": { 49 | "collapsed": false 50 | }, 51 | "outputs": [ 52 | { 53 | "name": "stdout", 54 | "output_type": "stream", 55 | "text": [ 56 | "True\n" 57 | ] 58 | } 59 | ], 60 | "source": [ 61 | "fevens = filter(lambda x: x % 2 == 0, range(10))\n", 62 | "print(list(evens) == evens)" 63 | ] 64 | }, 65 | { 66 | "cell_type": "markdown", 67 | "metadata": {}, 68 | "source": [ 69 | "同样,列表推导也可以实现`map`的功能:" 70 | ] 71 | }, 72 | { 73 | "cell_type": "code", 74 | "execution_count": 3, 75 | "metadata": { 76 | "collapsed": false 77 | }, 78 | "outputs": [ 79 | { 80 | "name": "stdout", 81 | "output_type": "stream", 82 | "text": [ 83 | "[1, 4, 9, 16, 25]\n", 84 | "True\n" 85 | ] 86 | } 87 | ], 88 | "source": [ 89 | "squares = [x ** 2 for x in range(1, 6)]\n", 90 | "print(squares)\n", 91 | "\n", 92 | "msquares = map(lambda x: x ** 2, range(1, 6))\n", 93 | "print(list(msquares) == squares)" 94 | ] 95 | }, 96 | { 97 | "cell_type": "markdown", 98 | "metadata": {}, 99 | "source": [ 100 | "相比之下,列表推导的语法更加直观,因此更 Pythonic 的写法是在可以用列表推导的时候尽量避免`map`/`filter`。\n", 101 | "\n", 102 | "除了上面简单的迭代、过滤推导之外,列表推导还支持嵌套结构:" 103 | ] 104 | }, 105 | { 106 | "cell_type": "code", 107 | "execution_count": 4, 108 | "metadata": { 109 | "collapsed": false 110 | }, 111 | "outputs": [ 112 | { 113 | "name": "stdout", 114 | "output_type": "stream", 115 | "text": [ 116 | "[(1, 0), (1, 1), (1, 2), (2, 0), (2, 1), (2, 2)]\n", 117 | "True\n" 118 | ] 119 | } 120 | ], 121 | "source": [ 122 | "cords = [(x, y) for x in range(3) for y in range(3) if x > 0]\n", 123 | "print(cords)\n", 124 | "\n", 125 | "# 相当于\n", 126 | "lcords = []\n", 127 | "for x in range(3):\n", 128 | " for y in range(3):\n", 129 | " if x > 0:\n", 130 | " lcords.append((x, y))\n", 131 | " \n", 132 | "print(lcords == cords)" 133 | ] 134 | }, 135 | { 136 | "cell_type": "markdown", 137 | "metadata": {}, 138 | "source": [ 139 | "#### 字典与集合的推导\n", 140 | "\n", 141 | 
"这样一比较更加能够突出列表推导的优势,但是当嵌套的循环超过2层之后,列表推导语法的可读性也会大大下降,所以当循环嵌套层数增加时,还是建议用直接的语法。\n", 142 | "\n", 143 | "Python 中除了列表(List)可以进行列表推导之外,字典(Dict)、集合(Set)同样可以:" 144 | ] 145 | }, 146 | { 147 | "cell_type": "code", 148 | "execution_count": 5, 149 | "metadata": { 150 | "collapsed": false 151 | }, 152 | "outputs": [ 153 | { 154 | "name": "stdout", 155 | "output_type": "stream", 156 | "text": [ 157 | "{'github.com': '23.22.145.48', 'git.io': '23.22.145.48'}\n", 158 | "{'octocat', 'catty'}\n" 159 | ] 160 | } 161 | ], 162 | "source": [ 163 | "dns = {domain : ip\n", 164 | " for domain in [\"github.com\", \"git.io\"]\n", 165 | " for ip in [\"23.22.145.36\", \"23.22.145.48\"]}\n", 166 | "print(dns)\n", 167 | "\n", 168 | "names = {name for name in [\"ana\", \"bob\", \"catty\", \"octocat\"] if len(name) > 3}\n", 169 | "print(names)" 170 | ] 171 | }, 172 | { 173 | "cell_type": "markdown", 174 | "metadata": {}, 175 | "source": [ 176 | "#### 生成器\n", 177 | "\n", 178 | "**[0x01](https://github.com/rainyear/pytips/blob/master/Tips/2016-03-07-iterator-and-generator.ipynb)**中提到的生成器(Generator),除了在函数中使用 `yield` 关键字之外还有另外一种隐藏方法,那就是对元组(Tuple)使用列表推导:" 179 | ] 180 | }, 181 | { 182 | "cell_type": "code", 183 | "execution_count": 6, 184 | "metadata": { 185 | "collapsed": false 186 | }, 187 | "outputs": [ 188 | { 189 | "name": "stdout", 190 | "output_type": "stream", 191 | "text": [ 192 | " at 0x1104fbba0>\n", 193 | "0\n", 194 | "4\n", 195 | "6\n", 196 | "8\n" 197 | ] 198 | } 199 | ], 200 | "source": [ 201 | "squares = (x for x in range(10) if x % 2 == 0)\n", 202 | "print(squares)\n", 203 | "\n", 204 | "print(next(squares))\n", 205 | "next(squares)\n", 206 | "\n", 207 | "for i in squares:\n", 208 | " print(i)" 209 | ] 210 | } 211 | ], 212 | "metadata": { 213 | "kernelspec": { 214 | "display_name": "Python 3", 215 | "language": "python", 216 | "name": "python3" 217 | }, 218 | "language_info": { 219 | "codemirror_mode": { 220 | "name": "ipython", 221 | "version": 3 222 | }, 223 | "file_extension": ".py", 
224 | "mimetype": "text/x-python", 225 | "name": "python", 226 | "nbconvert_exporter": "python", 227 | "pygments_lexer": "ipython3", 228 | "version": "3.5.0" 229 | } 230 | }, 231 | "nbformat": 4, 232 | "nbformat_minor": 0 233 | } 234 | -------------------------------------------------------------------------------- /Tips/2016-03-10-Scope-and-Closure.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "### 闭包(Closure)\n", 8 | "\n", 9 | "> 在计算机科学中,闭包(英语:Closure),又称词法闭包(Lexical Closure)或函数闭包(function closures),是引用了自由变量的函数。这个被引用的自由变量将和这个函数一同存在,即使已经离开了创造它的环境也不例外。\n", 10 | "[[维基百科::闭包(计算机科学)](https://zh.wikipedia.org/wiki/闭包_%28计算机科学%29)]\n", 11 | "\n", 12 | "[0x02 Python 中的函数式编程](https://github.com/rainyear/pytips/blob/master/Markdowns/2016-03-08-Functional-Programming-in-Python.md) 本来也应该包括闭包的概念,但是我觉得闭包更重要的是对**作用域(Scope)**的理解,因此把它单独列出来,同时可以理顺一下 Python 的作用域规则。\n", 13 | "\n", 14 | "闭包的概念最早出现在函数式编程语言中,后来被一些命令式编程语言所借鉴。尤其是在一些函数作为一等公民的语言中,例如JavaScript就经常用到(在JavaScript中函数几乎可以当做“特等公民”看待),我之前也写过一篇关于JavaScript闭包的文章([图解Javascript上下文与作用域](http://blog.rainy.im/2015/07/04/scope-chain-and-prototype-chain-in-js/)),实际上闭包并不是太复杂的概念,但是可以借助闭包更好地理解不同语言的作用域规则。\n", 15 | "\n", 16 | "#### 命名空间与作用域\n", 17 | "\n", 18 | "[0x00 The Zen of Python](https://github.com/rainyear/pytips/blob/master/Markdowns/2016-03-06-The-Zen-of-Python.md)的最后一句重点强调命名空间的概念,我们可以把命名空间看做一个大型的字典类型(Dict),里面包含了所有变量的名字和值的映射关系。在 Python 中,作用域实际上可以看做是“**在当前上下文的位置,获取命名空间变量的规则**”。在 Python 代码执行的任意位置,都至少存在三层嵌套的作用域:\n", 19 | "\n", 20 | "1. 最内层作用域,最早搜索,包含所有局部变量**(Python 默认所有变量声明均为局部变量)**\n", 21 | "2. 所有包含当前上下文的外层函数的作用域,由内而外依次搜索,这里包含的是**非局部**也**非全局**的变量\n", 22 | "3. 一直向上搜索,直到当前模块的全局变量\n", 23 | "4. 
最外层,最后搜索的,内置(built-in)变量\n", 24 | "\n", 25 | "在任意执行位置,可以将作用域看成是对下面这样一个命名空间的搜索:" 26 | ] 27 | }, 28 | { 29 | "cell_type": "code", 30 | "execution_count": 1, 31 | "metadata": { 32 | "collapsed": false 33 | }, 34 | "outputs": [], 35 | "source": [ 36 | "scopes = {\n", 37 | " \"local\": {\"locals\": None,\n", 38 | " \"non-local\": {\"locals\": None,\n", 39 | " \"global\": {\"locals\": None,\n", 40 | " \"built-in\": [\"built-ins\"]}}},\n", 41 | "}" 42 | ] 43 | }, 44 | { 45 | "cell_type": "markdown", 46 | "metadata": {}, 47 | "source": [ 48 | "除了默认的局部变量声明方式,Python 还有`global`和`nonlocal`两种类型的声明(**`nonlocal`是Python 3.x之后才有,2.7没有**),其中 `global` 指定的变量直接**指向**(3)当前模块的全局变量,而`nonlocal`则指向(2)最内层之外,`global`以内的变量。这里需要强调指向(references and assignments)的原因是,普通的局部变量对最内层局部作用域之外只有**只读(read-only)**的访问权限,比如下面的例子:" 49 | ] 50 | }, 51 | { 52 | "cell_type": "code", 53 | "execution_count": 2, 54 | "metadata": { 55 | "collapsed": false 56 | }, 57 | "outputs": [ 58 | { 59 | "ename": "UnboundLocalError", 60 | "evalue": "local variable 'x' referenced before assignment", 61 | "output_type": "error", 62 | "traceback": [ 63 | "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", 64 | "\u001b[0;31mUnboundLocalError\u001b[0m Traceback (most recent call last)", 65 | "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 3\u001b[0m \u001b[0mx\u001b[0m \u001b[0;34m+=\u001b[0m \u001b[0;36m1\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 4\u001b[0m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mx\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 5\u001b[0;31m \u001b[0mmain\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", 66 | "\u001b[0;32m\u001b[0m in \u001b[0;36mmain\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0mx\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;36m100\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2\u001b[0m 
\u001b[0;32mdef\u001b[0m \u001b[0mmain\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 3\u001b[0;31m \u001b[0mx\u001b[0m \u001b[0;34m+=\u001b[0m \u001b[0;36m1\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 4\u001b[0m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mx\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 5\u001b[0m \u001b[0mmain\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", 67 | "\u001b[0;31mUnboundLocalError\u001b[0m: local variable 'x' referenced before assignment" 68 | ] 69 | } 70 | ], 71 | "source": [ 72 | "x = 100\n", 73 | "def main():\n", 74 | " x += 1\n", 75 | " print(x)\n", 76 | "main()" 77 | ] 78 | }, 79 | { 80 | "cell_type": "markdown", 81 | "metadata": {}, 82 | "source": [ 83 | "这里抛出`UnboundLocalError`,是因为`main()`函数内部的作用域对于全局变量`x`仅有只读权限,想要在`main()`中对`x`进行改变,不会影响全局变量,而是会创建一个新的局部变量,显然无法对还未创建的局部变量直接使用`x += 1`。如果想要获得全局变量的完全引用,则需要`global`声明:" 84 | ] 85 | }, 86 | { 87 | "cell_type": "code", 88 | "execution_count": 3, 89 | "metadata": { 90 | "collapsed": false 91 | }, 92 | "outputs": [ 93 | { 94 | "name": "stdout", 95 | "output_type": "stream", 96 | "text": [ 97 | "101\n", 98 | "101\n" 99 | ] 100 | } 101 | ], 102 | "source": [ 103 | "x = 100\n", 104 | "def main():\n", 105 | " global x\n", 106 | " x += 1\n", 107 | " print(x)\n", 108 | " \n", 109 | "main()\n", 110 | "print(x) # 全局变量已被改变" 111 | ] 112 | }, 113 | { 114 | "cell_type": "markdown", 115 | "metadata": {}, 116 | "source": [ 117 | "#### Python 闭包\n", 118 | "\n", 119 | "到这里基本上已经了解了 Python 作用域的规则,那么我们来仿照 JavaScript 写一个计数器的闭包:" 120 | ] 121 | }, 122 | { 123 | "cell_type": "code", 124 | "execution_count": 4, 125 | "metadata": { 126 | "collapsed": false 127 | }, 128 | "outputs": [ 129 | { 130 | "name": "stdout", 131 | "output_type": "stream", 132 | "text": [ 133 | "1\n", 134 | "2\n", 135 | "3\n", 136 | "1\n" 137 | ] 138 | } 139 | ], 140 | 
"source": [ 141 | "\"\"\"\n", 142 | "/* JavaScript Closure example */\n", 143 | "var inc = function(){ \n", 144 | " var x = 0;\n", 145 | " return function(){\n", 146 | " console.log(x++);\n", 147 | " };\n", 148 | "};\n", 149 | "var inc1 = inc()\n", 150 | "var inc2 = inc()\n", 151 | "\"\"\"\n", 152 | "\n", 153 | "# Python 3.5\n", 154 | "def inc():\n", 155 | " x = 0\n", 156 | " def inner():\n", 157 | " nonlocal x\n", 158 | " x += 1\n", 159 | " print(x)\n", 160 | " return inner\n", 161 | "inc1 = inc()\n", 162 | "inc2 = inc()\n", 163 | "\n", 164 | "inc1()\n", 165 | "inc1()\n", 166 | "inc1()\n", 167 | "inc2()" 168 | ] 169 | }, 170 | { 171 | "cell_type": "markdown", 172 | "metadata": {}, 173 | "source": [ 174 | "对于还没有`nonlocal`关键字的 Python 2.7,可以通过一点小技巧来规避局部作用域只读的限制:" 175 | ] 176 | }, 177 | { 178 | "cell_type": "code", 179 | "execution_count": 5, 180 | "metadata": { 181 | "collapsed": false 182 | }, 183 | "outputs": [ 184 | { 185 | "name": "stdout", 186 | "output_type": "stream", 187 | "text": [ 188 | "1\n", 189 | "2\n", 190 | "3\n", 191 | "1\n" 192 | ] 193 | } 194 | ], 195 | "source": [ 196 | "# Python 2.7\n", 197 | "def inc():\n", 198 | " x = [0]\n", 199 | " def inner():\n", 200 | " x[0] += 1\n", 201 | " print(x[0])\n", 202 | " return inner\n", 203 | "inc1 = inc()\n", 204 | "inc2 = inc()\n", 205 | "\n", 206 | "inc1()\n", 207 | "inc1()\n", 208 | "inc1()\n", 209 | "inc2()" 210 | ] 211 | }, 212 | { 213 | "cell_type": "markdown", 214 | "metadata": {}, 215 | "source": [ 216 | "上面的例子中,`inc1()`是在全局环境下执行的,虽然全局环境是不能向下获取到`inc()`中的局部变量`x`的,但是我们返回了一个`inc()`内部的函数`inner()`,而`inner()`对`inc()`中的局部变量是有访问权限的。也就是说`inner()`将`inc()`内的局部作用域打包送给了`inc1`和`inc2`,从而使它们各自独立拥有了一块封闭起来的作用域,不受全局变量或者任何其它运行环境的影响,因此称为闭包。\n", 217 | "\n", 218 | "闭包函数都有一个`__closure__`属性,其中包含了它所引用的上层作用域中的变量:" 219 | ] 220 | }, 221 | { 222 | "cell_type": "code", 223 | "execution_count": 6, 224 | "metadata": { 225 | "collapsed": false 226 | }, 227 | "outputs": [ 228 | { 229 | "name": "stdout", 230 | "output_type": "stream", 231 
| "text": [ 232 | "[3]\n", 233 | "[1]\n" 234 | ] 235 | } 236 | ], 237 | "source": [ 238 | "print(inc1.__closure__[0].cell_contents)\n", 239 | "print(inc2.__closure__[0].cell_contents)" 240 | ] 241 | }, 242 | { 243 | "cell_type": "markdown", 244 | "metadata": {}, 245 | "source": [ 246 | "#### References\n", 247 | "\n", 248 | "1. [9.2. Python Scopes and Namespaces](https://docs.python.org/3/tutorial/classes.html#python-scopes-and-namespaces)\n", 249 | "2. [Visualize Python Execution](http://www.pythontutor.com/visualize.html#mode=edit)\n", 250 | "3. [Wikipedia::Closure](https://en.wikipedia.org/wiki/Closure_%28computer_programming%29)" 251 | ] 252 | } 253 | ], 254 | "metadata": { 255 | "kernelspec": { 256 | "display_name": "Python 3", 257 | "language": "python", 258 | "name": "python3" 259 | }, 260 | "language_info": { 261 | "codemirror_mode": { 262 | "name": "ipython", 263 | "version": 3 264 | }, 265 | "file_extension": ".py", 266 | "mimetype": "text/x-python", 267 | "name": "python", 268 | "nbconvert_exporter": "python", 269 | "pygments_lexer": "ipython3", 270 | "version": "3.5.0" 271 | } 272 | }, 273 | "nbformat": 4, 274 | "nbformat_minor": 0 275 | } 276 | -------------------------------------------------------------------------------- /Tips/2016-03-11-Arguments-and-Unpacking.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "collapsed": false 7 | }, 8 | "source": [ 9 | "### Function call argument rules and unpacking\n", 10 | "\n", 11 | "A Python function can declare its parameters in roughly four forms:\n", 12 | "\n", 13 | "1. Without a default value: `def func(a): pass`\n", 14 | "2. With a default value: `def func(a, b = 1): pass`\n", 15 | "3. Arbitrary positional arguments: `def func(a, b = 1, *c): pass`\n", 16 | "4. Arbitrary keyword arguments: `def func(a, b = 1, *c, **d): pass`\n", 17 | "\n", 18 | "When calling a function, arguments can be passed in two ways:\n", 19 | "\n", 20 | "1. Without keywords (positional): `func(\"G\", 20)`\n", 21 | "2. 
With keywords: `func(a = \"G\", b = 20)` (keyword arguments may be given in any order: `func(b = 20, a = \"G\")`)\n", 22 | "\n", 23 | "The two styles can of course be mixed: `func(\"G\", b = 20)`, but the most important rule is that **a positional argument must not appear after a keyword argument**:" 24 | ] 25 | }, 26 | { 27 | "cell_type": "code", 28 | "execution_count": 1, 29 | "metadata": { 30 | "collapsed": false 31 | }, 32 | "outputs": [ 33 | { 34 | "ename": "SyntaxError", 35 | "evalue": "positional argument follows keyword argument (, line 3)", 36 | "output_type": "error", 37 | "traceback": [ 38 | "\u001b[0;36m File \u001b[0;32m\"\"\u001b[0;36m, line \u001b[0;32m3\u001b[0m\n\u001b[0;31m func(a = \"G\", 20) # SyntaxError\u001b[0m\n\u001b[0m ^\u001b[0m\n\u001b[0;31mSyntaxError\u001b[0m\u001b[0;31m:\u001b[0m positional argument follows keyword argument\n" 39 | ] 40 | } 41 | ], 42 | "source": [ 43 | "def func(a, b = 1):\n", 44 | " pass\n", 45 | "func(a = \"G\", 20) # SyntaxError" 46 | ] 47 | }, 48 | { 49 | "cell_type": "markdown", 50 | "metadata": {}, 51 | "source": [ 52 | "Another rule is that **positional arguments are bound first**, so a parameter already filled by position cannot be passed again by keyword:" 53 | ] 54 | }, 55 | { 56 | "cell_type": "code", 57 | "execution_count": 2, 58 | "metadata": { 59 | "collapsed": false 60 | }, 61 | "outputs": [ 62 | { 63 | "ename": "TypeError", 64 | "evalue": "func() got multiple values for argument 'a'", 65 | "output_type": "error", 66 | "traceback": [ 67 | "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", 68 | "\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)", 69 | "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mfunc\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0ma\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mb\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;36m1\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2\u001b[0m \u001b[0;32mpass\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 3\u001b[0;31m 
\u001b[0mfunc\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;36m20\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0ma\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m\"G\"\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;31m# TypeError: a is assigned twice\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", 70 | "\u001b[0;31mTypeError\u001b[0m: func() got multiple values for argument 'a'" 71 | ] 72 | } 73 | ], 74 | "source": [ 75 | "def func(a, b = 1):\n", 76 | " pass\n", 77 | "func(20, a = \"G\") # TypeError: a is assigned twice" 78 | ] 79 | }, 80 | { 81 | "cell_type": "markdown", 82 | "metadata": {}, 83 | "source": [ 84 | "The safest approach is to pass every argument by keyword." 85 | ] 86 | }, 87 | { 88 | "cell_type": "markdown", 89 | "metadata": {}, 90 | "source": [ 91 | "#### Arbitrary arguments\n", 92 | "\n", 93 | "Arbitrary-argument parameters accept any number of arguments: the `*c` form collects any number of positional arguments, and `**d` collects any number of keyword arguments:" 94 | ] 95 | }, 96 | { 97 | "cell_type": "code", 98 | "execution_count": 3, 99 | "metadata": { 100 | "collapsed": false 101 | }, 102 | "outputs": [ 103 | { 104 | "name": "stdout", 105 | "output_type": "stream", 106 | "text": [ 107 | "G20@Hz\n" 108 | ] 109 | } 110 | ], 111 | "source": [ 112 | "def concat(*lst, sep = \"/\"):\n", 113 | " return sep.join((str(i) for i in lst))\n", 114 | "\n", 115 | "print(concat(\"G\", 20, \"@\", \"Hz\", sep = \"\"))" 116 | ] 117 | }, 118 | { 119 | "cell_type": "markdown", 120 | "metadata": {}, 121 | "source": [ 122 | "The `def concat(*lst, sep = \"/\")` syntax above was proposed in [PEP 3102](https://www.python.org/dev/peps/pep-3102/) and implemented in Python 3.0. The keyword-only argument here must be passed by name; it cannot be inferred from position:" 123 | ] 124 | }, 125 | { 126 | "cell_type": "code", 127 | "execution_count": 4, 128 | "metadata": { 129 | "collapsed": false 130 | }, 131 | "outputs": [ 132 | { 133 | "name": "stdout", 134 | "output_type": "stream", 135 | "text": [ 136 | "G/20/-\n" 137 | ] 138 | } 139 | ], 140 | "source": [ 141 | "print(concat(\"G\", 20, \"-\")) # Not G-20" 142 | ] 143 | }, 144 | { 145 | "cell_type": "markdown", 146 | "metadata": {}, 147 | "source": [ 148 | "`**d`, in turn, collects any number of keyword arguments:" 149 | ] 150 | }, 151 | { 152 | "cell_type": "code", 
153 | "execution_count": 5, 154 | "metadata": { 155 | "collapsed": false 156 | }, 157 | "outputs": [ 158 | { 159 | "name": "stdout", 160 | "output_type": "stream", 161 | "text": [ 162 | "hello~world\n", 163 | "python~rocks\n" 164 | ] 165 | } 166 | ], 167 | "source": [ 168 | "def dconcat(sep = \":\", **dic):\n", 169 | " for k in dic.keys():\n", 170 | " print(\"{}{}{}\".format(k, sep, dic[k]))\n", 171 | "\n", 172 | "dconcat(hello = \"world\", python = \"rocks\", sep = \"~\")" 173 | ] 174 | }, 175 | { 176 | "cell_type": "markdown", 177 | "metadata": {}, 178 | "source": [ 179 | "#### Unpacking\n", 180 | "\n", 181 | "Python 3.5 添加的新特性([PEP 448](https://www.python.org/dev/peps/pep-0448/)),使得`*a`、`**d`可以在函数参数之外使用:" 182 | ] 183 | }, 184 | { 185 | "cell_type": "code", 186 | "execution_count": 6, 187 | "metadata": { 188 | "collapsed": false 189 | }, 190 | "outputs": [ 191 | { 192 | "name": "stdout", 193 | "output_type": "stream", 194 | "text": [ 195 | "0 1 2 3 4\n", 196 | "0 1 2 3\n", 197 | "(0, 1, 2)\n", 198 | "rocks\n" 199 | ] 200 | } 201 | ], 202 | "source": [ 203 | "print(*range(5))\n", 204 | "lst = [0, 1, 2, 3]\n", 205 | "print(*lst)\n", 206 | "\n", 207 | "a = *range(3), # 这里的逗号不能漏掉\n", 208 | "print(a)\n", 209 | "\n", 210 | "d = {\"hello\": \"world\", \"python\": \"rocks\"}\n", 211 | "print({**d}[\"python\"])" 212 | ] 213 | }, 214 | { 215 | "cell_type": "markdown", 216 | "metadata": {}, 217 | "source": [ 218 | "所谓的解包(Unpacking)实际上可以看做是去掉`()`的元组或者是去掉`{}`的字典。这一语法也提供了一个更加 Pythonic 地合并字典的方法:" 219 | ] 220 | }, 221 | { 222 | "cell_type": "code", 223 | "execution_count": 7, 224 | "metadata": { 225 | "collapsed": false 226 | }, 227 | "outputs": [ 228 | { 229 | "name": "stdout", 230 | "output_type": "stream", 231 | "text": [ 232 | "{'page_name': 'Profile Page', 'name': 'Trey', 'website': 'http://treyhunner.com'}\n" 233 | ] 234 | } 235 | ], 236 | "source": [ 237 | "user = {'name': \"Trey\", 'website': \"http://treyhunner.com\"}\n", 238 | "defaults = {'name': \"Anonymous User\", 
'page_name': \"Profile Page\"}\n", 239 | "\n", 240 | "print({**defaults, **user})" 241 | ] 242 | }, 243 | { 244 | "cell_type": "markdown", 245 | "metadata": {}, 246 | "source": [ 247 | "Unpacking inside a function call, by contrast, is also available in Python 2.7:" 248 | ] 249 | }, 250 | { 251 | "cell_type": "code", 252 | "execution_count": 8, 253 | "metadata": { 254 | "collapsed": false 255 | }, 256 | "outputs": [ 257 | { 258 | "name": "stdout", 259 | "output_type": "stream", 260 | "text": [ 261 | "I/L/o/v/e/P/y/t/h/o/n\n" 262 | ] 263 | } 264 | ], 265 | "source": [ 266 | "print(concat(*\"ILovePython\"))" 267 | ] 268 | }, 269 | { 270 | "cell_type": "markdown", 271 | "metadata": {}, 272 | "source": [ 273 | "#### References\n", 274 | "\n", 275 | "1. [The Idiomatic Way to Merge Dictionaries in Python](https://treyhunner.com/2016/02/how-to-merge-dictionaries-in-python/)" 276 | ] 277 | } 278 | ], 279 | "metadata": { 280 | "kernelspec": { 281 | "display_name": "Python 3", 282 | "language": "python", 283 | "name": "python3" 284 | }, 285 | "language_info": { 286 | "codemirror_mode": { 287 | "name": "ipython", 288 | "version": 3 289 | }, 290 | "file_extension": ".py", 291 | "mimetype": "text/x-python", 292 | "name": "python", 293 | "nbconvert_exporter": "python", 294 | "pygments_lexer": "ipython3", 295 | "version": "3.5.0" 296 | } 297 | }, 298 | "nbformat": 4, 299 | "nbformat_minor": 0 300 | } 301 | -------------------------------------------------------------------------------- /Tips/2016-03-14-Command-Line-tools-in-Python.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "slideshow": { 7 | "slide_type": "slide" 8 | } 9 | }, 10 | "source": [ 11 | "### Building command-line tools with Python\n", 12 | "\n", 13 | "As a scripting language, Python is very convenient for developing command-line tools, especially on \\*nix systems. Python also ships standard-library modules dedicated to command-line tasks." 14 | ] 15 | }, 16 | { 17 | "cell_type": "markdown", 18 | "metadata": { 19 | "slideshow": { 20 | "slide_type": "slide" 21 | } 22 | }, 23 | 
"source": [ 24 | "#### General structure of a command-line tool\n", 25 | "\n", 26 | "![CL-in-Python](http://7xiijd.com1.z0.glb.clouddn.com/CL-in-Python.png)" 27 | ] 28 | }, 29 | { 30 | "cell_type": "markdown", 31 | "metadata": { 32 | "collapsed": true, 33 | "slideshow": { 34 | "slide_type": "slide" 35 | } 36 | }, 37 | "source": [ 38 | "**1. Standard input and output**\n", 39 | "\n", 40 | "On \\*nix systems everything is a file, so standard input and output can be treated entirely as file operations. Standard input can be supplied through a pipe or a redirect:" 41 | ] 42 | }, 43 | { 44 | "cell_type": "code", 45 | "execution_count": 1, 46 | "metadata": { 47 | "collapsed": true, 48 | "slideshow": { 49 | "slide_type": "slide" 50 | } 51 | }, 52 | "outputs": [], 53 | "source": [ 54 | "# script reverse.py (in the saved file the shebang must be the first line)\n", 55 | "#!/usr/bin/env python\n", 56 | "import sys\n", 57 | "for l in sys.stdin.readlines():\n", 58 | " sys.stdout.write(l[::-1])" 59 | ] 60 | }, 61 | { 62 | "cell_type": "markdown", 63 | "metadata": { 64 | "slideshow": { 65 | "slide_type": "slide" 66 | } 67 | }, 68 | "source": [ 69 | "Save it as `reverse.py` and feed it through a pipe `|`:\n", 70 | "\n", 71 | "```sh\n", 72 | "chmod +x reverse.py\n", 73 | "cat reverse.py | ./reverse.py\n", 74 | "\n", 75 | "nohtyp vne/nib/rsu/!#\n", 76 | "sys tropmi\n", 77 | ":)(senildaer.nidts.sys ni l rof\n", 78 | ")]1-::[l(etirw.tuodts.sys\n", 79 | "```\n", 80 | "\n", 81 | "Feed it through a redirect `<`:\n", 82 | "\n", 83 | "```sh\n", 84 | "./reverse.py < reverse.py\n", 85 | "# same output as above\n", 86 | "```" 87 | ] 88 | }, 89 | { 90 | "cell_type": "markdown", 91 | "metadata": { 92 | "slideshow": { 93 | "slide_type": "slide" 94 | } 95 | }, 96 | "source": [ 97 | "**2. 
Command-line arguments**\n", 98 | "\n", 99 | "Arguments appended to a command can be read from `sys.argv`. `sys.argv` is a list whose first element is the name of the current script:" 100 | ] 101 | }, 102 | { 103 | "cell_type": "code", 104 | "execution_count": 2, 105 | "metadata": { 106 | "collapsed": false, 107 | "slideshow": { 108 | "slide_type": "slide" 109 | } 110 | }, 111 | "outputs": [ 112 | { 113 | "name": "stdout", 114 | "output_type": "stream", 115 | "text": [ 116 | "['/Users/rainy/Projects/GitHub/pytips/venv3/lib/python3.5/site-packages/ipykernel/__main__.py', '-f', '/Users/rainy/Library/Jupyter/runtime/kernel-0533e681-bd7c-4c4d-9094-a78fde7fc2ed.json']\n" 117 | ] 118 | } 119 | ], 120 | "source": [ 121 | "# script argv.py\n", 122 | "#!/usr/bin/env python\n", 123 | "import sys\n", 124 | "print(sys.argv) # the output below comes from running inside Jupyter" 125 | ] 126 | }, 127 | { 128 | "cell_type": "markdown", 129 | "metadata": {}, 130 | "source": [ 131 | "Run the script above:\n", 132 | "\n", 133 | "```sh\n", 134 | "chmod +x argv.py\n", 135 | "./argv.py hello world\n", 136 | "python argv.py hello world\n", 137 | "\n", 138 | "# the results are the same except for argv[0], the script name as typed:\n", 139 | "# ['./argv.py', 'hello', 'world'] and ['argv.py', 'hello', 'world']\n", 140 | "```" 141 | ] 142 | }, 143 | { 144 | "cell_type": "markdown", 145 | "metadata": {}, 146 | "source": [ 147 | "For more complex command-line arguments, such as options passed as `--option`, parsing `sys.argv` item by item quickly becomes tedious. Python provides the standard library [`argparse`](https://docs.python.org/3/library/argparse.html) (the older `optparse` library is no longer being developed) specifically for parsing command-line arguments:" 148 | ] 149 | }, 150 | { 151 | "cell_type": "code", 152 | "execution_count": 3, 153 | "metadata": { 154 | "collapsed": false 155 | }, 156 | "outputs": [ 157 | { 158 | "name": "stdout", 159 | "output_type": "stream", 160 | "text": [ 161 | "Load config from: config.ini\n", 162 | "Set theme: default.theme\n" 163 | ] 164 | } 165 | ], 166 | "source": [ 167 | "# script convert.py\n", 168 | "#!/usr/bin/env python\n", 169 | "import argparse as apa\n", 170 | "def loadConfig(config):\n", 171 | " print(\"Load config from: {}\".format(config))\n", 172 | "def setTheme(theme):\n", 173 | " print(\"Set theme: {}\".format(theme))\n", 
174 | "def main():\n", 175 | " parser = apa.ArgumentParser(prog=\"convert\") # 设定命令信息,用于输出帮助信息\n", 176 | " parser.add_argument(\"-c\", \"--config\", required=False, default=\"config.ini\")\n", 177 | " parser.add_argument(\"-t\", \"--theme\", required=False, default=\"default.theme\")\n", 178 | " parser.add_argument(\"-f\") # Accept Jupyter runtime option\n", 179 | " args = parser.parse_args()\n", 180 | " loadConfig(args.config)\n", 181 | " setTheme(args.theme)\n", 182 | "\n", 183 | "if __name__ == \"__main__\":\n", 184 | " main()" 185 | ] 186 | }, 187 | { 188 | "cell_type": "markdown", 189 | "metadata": { 190 | "slideshow": { 191 | "slide_type": "slide" 192 | } 193 | }, 194 | "source": [ 195 | "利用 `argparse` 可以很方便地解析选项参数,同时可以定义指定参数的相关属性(是否必须、默认值等),同时还可以自动生成帮助文档。执行上面的脚本:\n", 196 | "\n", 197 | "```sh\n", 198 | "./convert.py -h\n", 199 | "usage: convert [-h] [-c CONFIG] [-t THEME]\n", 200 | "\n", 201 | "optional arguments:\n", 202 | " -h, --help show this help message and exit\n", 203 | " -c CONFIG, --config CONFIG\n", 204 | " -t THEME, --theme THEME\n", 205 | "```" 206 | ] 207 | }, 208 | { 209 | "cell_type": "markdown", 210 | "metadata": { 211 | "slideshow": { 212 | "slide_type": "slide" 213 | } 214 | }, 215 | "source": [ 216 | "**3. 
执行系统命令**\n", 217 | "\n", 218 | "当 Python 能够准确地解读输入信息或参数之后,就可以通过 Python 去做任何事情了。这里主要介绍通过 Python 调用系统命令,也就是替代 `Shell` 脚本完成系统管理的功能。我以前的习惯是将命令行指令通过 `os.system(command)` 执行,但是更好的做法应该是用 [`subprocess`](https://docs.python.org/3.5/library/subprocess.html) 标准库,它的存在就是为了替代旧的 `os.system; os.spawn*` 。" 219 | ] 220 | }, 221 | { 222 | "cell_type": "markdown", 223 | "metadata": { 224 | "slideshow": { 225 | "slide_type": "slide" 226 | } 227 | }, 228 | "source": [ 229 | "`subprocess` 模块提供简便的直接调用系统指令的`call()`方法,以及较为复杂可以让用户更加深入地与系统命令进行交互的`Popen`对象。" 230 | ] 231 | }, 232 | { 233 | "cell_type": "code", 234 | "execution_count": 4, 235 | "metadata": { 236 | "collapsed": false, 237 | "slideshow": { 238 | "slide_type": "slide" 239 | } 240 | }, 241 | "outputs": [ 242 | { 243 | "name": "stdout", 244 | "output_type": "stream", 245 | "text": [ 246 | "-rw-r--r-- 1 rainy staff 3.4K 3 8 17:36 ./2016-03-06-The-Zen-of-Python.ipynb\n", 247 | "-rw-r--r-- 1 rainy staff 6.7K 3 8 17:45 ./2016-03-07-iterator-and-generator.ipynb\n", 248 | "-rw-r--r-- 1 rainy staff 6.0K 3 10 12:35 ./2016-03-08-Functional-Programming-in-Python.ipynb\n", 249 | "-rw-r--r-- 1 rainy staff 5.9K 3 9 16:28 ./2016-03-09-List-Comprehension.ipynb\n", 250 | "-rw-r--r-- 1 rainy staff 10K 3 10 14:14 ./2016-03-10-Scope-and-Closure.ipynb\n", 251 | "-rw-r--r-- 1 rainy staff 8.0K 3 11 16:30 ./2016-03-11-Arguments-and-Unpacking.ipynb\n", 252 | "-rw-r--r-- 1 rainy staff 8.5K 3 14 19:31 ./2016-03-14-Command-Line-tools-in-Python.ipynb\n", 253 | "\n" 254 | ] 255 | } 256 | ], 257 | "source": [ 258 | "# script list_files.py\n", 259 | "#!/usr/bin/env python\n", 260 | "import subprocess as sb\n", 261 | "res = sb.check_output(\"ls -lh ./*.ipynb\", shell=True) # 为了安全起见,默认不通过系统 Shell 执行,因此需要设定 shell=True\n", 262 | "print(res.decode()) # 默认返回值为 bytes 类型,需要进行解码操作" 263 | ] 264 | }, 265 | { 266 | "cell_type": "markdown", 267 | "metadata": { 268 | "slideshow": { 269 | "slide_type": "slide" 270 | } 271 | }, 272 | "source": [ 273 | "如果只是简单地执行系统命令还不能满足你的需求,可以使用 
`subprocess.Popen` 与生成的子进程进行更多交互:" 274 | ] 275 | }, 276 | { 277 | "cell_type": "code", 278 | "execution_count": 5, 279 | "metadata": { 280 | "collapsed": false, 281 | "slideshow": { 282 | "slide_type": "slide" 283 | } 284 | }, 285 | "outputs": [ 286 | { 287 | "name": "stdout", 288 | "output_type": "stream", 289 | "text": [ 290 | " \" \\\"p = sb.Popen(['grep', 'communicate'], stdout=sb.PIPE)\\\\n\\\",\\n\",\n", 291 | " \" \\\"# res = p.communicate(sb.check_output('cat ./*'))\\\"\\n\",\n", 292 | " \"p = sb.Popen(['grep', 'communicate'], stdin=sb.PIPE, stdout=sb.PIPE)\\n\",\n", 293 | " \"res, err = p.communicate(sb.check_output('cat ./*', shell=True))\\n\",\n", 294 | "\n" 295 | ] 296 | } 297 | ], 298 | "source": [ 299 | "import subprocess as sb\n", 300 | "\n", 301 | "p = sb.Popen(['grep', 'communicate'], stdin=sb.PIPE, stdout=sb.PIPE)\n", 302 | "res, err = p.communicate(sb.check_output('cat ./*', shell=True))\n", 303 | "if not err:\n", 304 | " print(res.decode())" 305 | ] 306 | } 307 | ], 308 | "metadata": { 309 | "celltoolbar": "Slideshow", 310 | "kernelspec": { 311 | "display_name": "Python 3", 312 | "language": "python", 313 | "name": "python3" 314 | }, 315 | "language_info": { 316 | "codemirror_mode": { 317 | "name": "ipython", 318 | "version": 3 319 | }, 320 | "file_extension": ".py", 321 | "mimetype": "text/x-python", 322 | "name": "python", 323 | "nbconvert_exporter": "python", 324 | "pygments_lexer": "ipython3", 325 | "version": "3.5.0" 326 | } 327 | }, 328 | "nbformat": 4, 329 | "nbformat_minor": 0 330 | } 331 | -------------------------------------------------------------------------------- /Tips/2016-03-15-Unicode-String.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "### Python 字符串\n", 8 | "\n", 9 | "所有用过 Python (2&3)的人应该都看过下面两行错误信息:\n", 10 | "\n", 11 | "> `UnicodeEncodeError: 'ascii' codec can't encode characters in 
position 0-1: ordinal not in range(128)`\n", 12 | "\n", 13 | "> `UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 0-1: invalid continuation byte`\n", 14 | "\n", 15 | "这就是 Python 界的\"锟斤拷\"!\n", 16 | "\n", 17 | "今天和接下来几期的内容将主要关注 Python 中的字符串(`str`)、字节(`bytes`)及两者之间的相互转换(`encode`/`decode`)。也许不能让你突然间解决所有乱码问题,但希望可以帮助你迅速找到问题所在。\n", 18 | "\n", 19 | "### 定义\n", 20 | "\n", 21 | "Python 中对字符串的定义如下:\n", 22 | "\n", 23 | "> Textual data in Python is handled with `str` objects, or strings. Strings are immutable sequences of Unicode code points.\n", 24 | "\n", 25 | "Python 3.5 中字符串是由一系列 Unicode 码位(code point)所组成的**不可变序列**:" 26 | ] 27 | }, 28 | { 29 | "cell_type": "code", 30 | "execution_count": 1, 31 | "metadata": { 32 | "collapsed": false 33 | }, 34 | "outputs": [ 35 | { 36 | "data": { 37 | "text/plain": [ 38 | "'STRING'" 39 | ] 40 | }, 41 | "execution_count": 1, 42 | "metadata": {}, 43 | "output_type": "execute_result" 44 | } 45 | ], 46 | "source": [ 47 | "('S' 'T' 'R' 'I' 'N' 'G')" 48 | ] 49 | }, 50 | { 51 | "cell_type": "markdown", 52 | "metadata": {}, 53 | "source": [ 54 | "**不可变**是指无法对字符串本身进行更改操作:" 55 | ] 56 | }, 57 | { 58 | "cell_type": "code", 59 | "execution_count": 2, 60 | "metadata": { 61 | "collapsed": false 62 | }, 63 | "outputs": [ 64 | { 65 | "name": "stdout", 66 | "output_type": "stream", 67 | "text": [ 68 | "l\n" 69 | ] 70 | }, 71 | { 72 | "ename": "TypeError", 73 | "evalue": "'str' object does not support item assignment", 74 | "output_type": "error", 75 | "traceback": [ 76 | "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", 77 | "\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)", 78 | "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0ms\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m'Hello'\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2\u001b[0m 
\u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0ms\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m3\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 3\u001b[0;31m \u001b[0ms\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m3\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m'o'\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", 79 | "\u001b[0;31mTypeError\u001b[0m: 'str' object does not support item assignment" 80 | ] 81 | } 82 | ], 83 | "source": [ 84 | "s = 'Hello'\n", 85 | "print(s[3])\n", 86 | "s[3] = 'o'" 87 | ] 88 | }, 89 | { 90 | "cell_type": "markdown", 91 | "metadata": {}, 92 | "source": [ 93 | "而**序列(sequence)**则是指字符串继承序列类型(`list/tuple/range`)的通用操作:" 94 | ] 95 | }, 96 | { 97 | "cell_type": "code", 98 | "execution_count": 3, 99 | "metadata": { 100 | "collapsed": false 101 | }, 102 | "outputs": [ 103 | { 104 | "data": { 105 | "text/plain": [ 106 | "['H', 'E', 'L', 'L', 'O']" 107 | ] 108 | }, 109 | "execution_count": 3, 110 | "metadata": {}, 111 | "output_type": "execute_result" 112 | } 113 | ], 114 | "source": [ 115 | "[i.upper() for i in \"hello\"]" 116 | ] 117 | }, 118 | { 119 | "cell_type": "markdown", 120 | "metadata": {}, 121 | "source": [ 122 | "至于 Unicode 暂时可以看作一张非常大的地图,这张地图里面记录了世界上所有的符号,而码位则是每个符号所对应的坐标(具体内容将在后面的几期介绍)。" 123 | ] 124 | }, 125 | { 126 | "cell_type": "code", 127 | "execution_count": 4, 128 | "metadata": { 129 | "collapsed": false 130 | }, 131 | "outputs": [ 132 | { 133 | "name": "stdout", 134 | "output_type": "stream", 135 | "text": [ 136 | "雨\n", 137 | "1\n", 138 | "b'\\xe9\\x9b\\xa8'\n" 139 | ] 140 | } 141 | ], 142 | "source": [ 143 | "s = '雨'\n", 144 | "print(s)\n", 145 | "print(len(s))\n", 146 | "print(s.encode())" 147 | ] 148 | }, 149 | { 150 | "cell_type": "markdown", 151 | "metadata": {}, 152 | "source": [ 153 | "### 常用操作\n", 154 | "\n", 155 | "- **`len`**:字符串长度;\n", 156 | "- **`split` & `join`**\n", 157 | "- **`find` & `index`**\n", 158 | "- **`strip`**\n", 159 | "- 
**`upper` & `lower` & `swapcase` & `title` & `capitalize`**\n", 160 | "- **`endswith` & `startswith` & `is*`**\n", 161 | "- **`zfill`**" 162 | ] 163 | }, 164 | { 165 | "cell_type": "code", 166 | "execution_count": 5, 167 | "metadata": { 168 | "collapsed": false 169 | }, 170 | "outputs": [ 171 | { 172 | "name": "stdout", 173 | "output_type": "stream", 174 | "text": [ 175 | "Hello,world!\n" 176 | ] 177 | }, 178 | { 179 | "data": { 180 | "text/plain": [ 181 | "['https:', '', 'github.com/rainyear/pytips']" 182 | ] 183 | }, 184 | "execution_count": 5, 185 | "metadata": {}, 186 | "output_type": "execute_result" 187 | } 188 | ], 189 | "source": [ 190 | "# split & join\n", 191 | "s = \"Hello world!\"\n", 192 | "print(\",\".join(s.split())) # 常用的切分 & 重组操作\n", 193 | "\n", 194 | "\"https://github.com/rainyear/pytips\".split(\"/\", 2) # 限定切分次数" 195 | ] 196 | }, 197 | { 198 | "cell_type": "code", 199 | "execution_count": 6, 200 | "metadata": { 201 | "collapsed": false 202 | }, 203 | "outputs": [ 204 | { 205 | "name": "stdout", 206 | "output_type": "stream", 207 | "text": [ 208 | "2\n", 209 | "3\n", 210 | "-1\n" 211 | ] 212 | }, 213 | { 214 | "ename": "ValueError", 215 | "evalue": "substring not found", 216 | "output_type": "error", 217 | "traceback": [ 218 | "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", 219 | "\u001b[0;31mValueError\u001b[0m Traceback (most recent call last)", 220 | "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 4\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 5\u001b[0m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0ms\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mfind\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'a'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;31m# 若不存在则返回 -1\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 6\u001b[0;31m 
\u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0ms\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mindex\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'a'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;31m# raises ValueError if not found; otherwise the same as find\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", 221 | "\u001b[0;31mValueError\u001b[0m: substring not found" 222 | ] 223 | } 224 | ], 225 | "source": [ 226 | "s = \"coffee\"\n", 227 | "print(s.find('f')) # search from the left, return the index of the first match\n", 228 | "print(s.rfind('f')) # search from the right, return the index of the first match\n", 229 | "\n", 230 | "print(s.find('a')) # returns -1 if not found\n", 231 | "print(s.index('a')) # raises ValueError if not found; otherwise the same as find" 232 | ] 233 | }, 234 | { 235 | "cell_type": "code", 236 | "execution_count": 7, 237 | "metadata": { 238 | "collapsed": false 239 | }, 240 | "outputs": [ 241 | { 242 | "name": "stdout", 243 | "output_type": "stream", 244 | "text": [ 245 | "hello world\n", 246 | "lloworld\n", 247 | "[i ]\n", 248 | "[ i]\n" 249 | ] 250 | } 251 | ], 252 | "source": [ 253 | "print(\" hello world \".strip())\n", 254 | "print(\"helloworld\".strip(\"heo\"))\n", 255 | "print(\"[\"+\" i \".lstrip() +\"]\")\n", 256 | "print(\"[\"+\" i \".rstrip() +\"]\")" 257 | ] 258 | }, 259 | { 260 | "cell_type": "code", 261 | "execution_count": 8, 262 | "metadata": { 263 | "collapsed": false 264 | }, 265 | "outputs": [ 266 | { 267 | "name": "stdout", 268 | "output_type": "stream", 269 | "text": [ 270 | "HELLO, WORLD\n", 271 | "hello, world\n", 272 | "HELLO, world\n", 273 | "Hello, world\n", 274 | "Hello, World\n" 275 | ] 276 | } 277 | ], 278 | "source": [ 279 | "print(\"{}\\n{}\\n{}\\n{}\\n{}\".format(\n", 280 | " \"hello, WORLD\".upper(),\n", 281 | " \"hello, WORLD\".lower(),\n", 282 | " \"hello, WORLD\".swapcase(),\n", 283 | " \"hello, WORLD\".capitalize(),\n", 284 | " \"hello, WORLD\".title()))" 285 | ] 286 | }, 287 | { 288 | "cell_type": "code", 289 | "execution_count": 9, 290 | "metadata": { 291 | "collapsed": false 292 | }, 293 | "outputs": [ 294 | { 295 | "name": "stdout", 
296 | "output_type": "stream", 297 | "text": [ 298 | "\n", 299 | "True|False\n", 300 | "True|False\n", 301 | "True|False\n", 302 | "True|False\n", 303 | "True|False\n", 304 | "True|False\n", 305 | "\n" 306 | ] 307 | } 308 | ], 309 | "source": [ 310 | "print(\"\"\"\n", 311 | "{}|{}\n", 312 | "{}|{}\n", 313 | "{}|{}\n", 314 | "{}|{}\n", 315 | "{}|{}\n", 316 | "{}|{}\n", 317 | "\"\"\".format(\n", 318 | " \"Python\".startswith(\"P\"),\"Python\".startswith(\"y\"),\n", 319 | " \"Python\".endswith(\"n\"),\"Python\".endswith(\"o\"),\n", 320 | " \"i23o6\".isalnum(),\"1 2 3 0 6\".isalnum(),\n", 321 | " \"isalpha\".isalpha(),\"isa1pha\".isalpha(),\n", 322 | " \"python\".islower(),\"Python\".islower(),\n", 323 | " \"PYTHON\".isupper(),\"Python\".isupper(),\n", 324 | "))" 325 | ] 326 | }, 327 | { 328 | "cell_type": "code", 329 | "execution_count": 10, 330 | "metadata": { 331 | "collapsed": false 332 | }, 333 | "outputs": [ 334 | { 335 | "data": { 336 | "text/plain": [ 337 | "'00000101'" 338 | ] 339 | }, 340 | "execution_count": 10, 341 | "metadata": {}, 342 | "output_type": "execute_result" 343 | } 344 | ], 345 | "source": [ 346 | "\"101\".zfill(8)" 347 | ] 348 | }, 349 | { 350 | "cell_type": "markdown", 351 | "metadata": {}, 352 | "source": [ 353 | "**`format` / `encode`**\n", 354 | "\n", 355 | "Formatted output with `format` is a very useful tool and will be covered separately; `encode` is covered in detail in `bytes-decode-Unicode-encode-bytes`." 356 | ] 357 | } 358 | ], 359 | "metadata": { 360 | "kernelspec": { 361 | "display_name": "Python 3", 362 | "language": "python", 363 | "name": "python3" 364 | }, 365 | "language_info": { 366 | "codemirror_mode": { 367 | "name": "ipython", 368 | "version": 3 369 | }, 370 | "file_extension": ".py", 371 | "mimetype": "text/x-python", 372 | "name": "python", 373 | "nbconvert_exporter": "python", 374 | "pygments_lexer": "ipython3", 375 | "version": "3.5.0" 376 | } 377 | }, 378 | "nbformat": 4, 379 | "nbformat_minor": 0 380 | } 381 | 
-------------------------------------------------------------------------------- /Tips/2016-03-16-Bytes-and-Bytearray.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "### Bytes and bytearray\n", 8 | "\n", 9 | "[0x07](https://github.com/rainyear/pytips/blob/master/Tips/2016-03-15-Unicode-String.ipynb) introduced Python's string type. Strings are symbols that are friendly to humans, but a computer understands only one kind of symbol: binary numbers, or simply numbers:\n", 10 | "\n", 11 | "![OpenCV](http://docs.opencv.org/2.4/_images/MatBasicImageForComputer.jpg)\n", 12 | "\n", 13 | "The image above comes from [OpenCV](http://docs.opencv.org/2.4/) and shows very intuitively how the data a computer processes relates to the picture we see. Back to Python's definition of bytes and bytearray:\n", 14 | "\n", 15 | "> The core built-in types for manipulating binary data are `bytes` and `bytearray`.\n", 16 | "\n", 17 | "### 1Byte of ASCII\n", 18 | "\n", 19 | "To describe human characters with numbers a computer can understand, we need a table that maps numbers to characters. In a computer `1 byte = 8bits`, which can store the 256 values `0~255`, so `1byte` can represent at most 256 characters. In the early days of computing that was more than enough for the upper- and lowercase English letters, the digits `0~9`, and some common symbols (ASCII itself defines only the 128 values `0~127`), which gave us the ASCII encoding:\n", 20 | "\n", 21 | "![ascii](http://7xiijd.com1.z0.glb.clouddn.com/asciix400.jpg)" 22 | ] 23 | }, 24 | { 25 | "cell_type": "markdown", 26 | "metadata": { 27 | "collapsed": true 28 | }, 29 | "source": [ 30 | "Creating bytes in Python is similar to creating a string, except that the prefix `b` is placed before the opening quote:" 31 | ] 32 | }, 33 | { 34 | "cell_type": "code", 35 | "execution_count": 1, 36 | "metadata": { 37 | "collapsed": false 38 | }, 39 | "outputs": [ 40 | { 41 | "name": "stdout", 42 | "output_type": "stream", 43 | "text": [ 44 | "b'Python'\n", 45 | "b'Pyton'\n" 46 | ] 47 | } 48 | ], 49 | "source": [ 50 | "print(b\"Python\")\n", 51 | "python = (b'P' b'y' b\"t\" b'o' b'n')\n", 52 | "print(python)" 53 | ] 54 | }, 55 | { 56 | "cell_type": "markdown", 57 | "metadata": {}, 58 | "source": [ 59 | "Bytes represents a sequence of (binary) numbers; it only appears as characters after being displayed through `ASCII`. If we take out a single byte, it is still a number:" 60 | ] 61 | }, 62 | { 63 | "cell_type": "code", 64 | "execution_count": 2, 65 | "metadata": { 66 | "collapsed": false 
67 | }, 68 | "outputs": [ 69 | { 70 | "name": "stdout", 71 | "output_type": "stream", 72 | "text": [ 73 | "80\n" 74 | ] 75 | } 76 | ], 77 | "source": [ 78 | "print(b\"Python\"[0])" 79 | ] 80 | }, 81 | { 82 | "cell_type": "markdown", 83 | "metadata": {}, 84 | "source": [ 85 | "We can create a bytes object with the form `b\"*\"`, provided that `*` is a character available in `ASCII`; otherwise it is out of range:" 86 | ] 87 | }, 88 | { 89 | "cell_type": "code", 90 | "execution_count": 3, 91 | "metadata": { 92 | "collapsed": false 93 | }, 94 | "outputs": [ 95 | { 96 | "ename": "SyntaxError", 97 | "evalue": "bytes can only contain ASCII literal characters. (, line 1)", 98 | "output_type": "error", 99 | "traceback": [ 100 | "\u001b[0;36m File \u001b[0;32m\"\"\u001b[0;36m, line \u001b[0;32m1\u001b[0m\n\u001b[0;31m print(b\"雨\")\u001b[0m\n\u001b[0m ^\u001b[0m\n\u001b[0;31mSyntaxError\u001b[0m\u001b[0;31m:\u001b[0m bytes can only contain ASCII literal characters.\n" 101 | ] 102 | } 103 | ], 104 | "source": [ 105 | "print(b\"雨\")" 106 | ] 107 | }, 108 | { 109 | "cell_type": "markdown", 110 | "metadata": {}, 111 | "source": [ 112 | "The error message says it all: a bytes literal may only contain ASCII characters." 113 | ] 114 | }, 115 | { 116 | "cell_type": "markdown", 117 | "metadata": {}, 118 | "source": [ 119 | "**0~127~255**\n", 120 | "\n", 121 | "So here is the question: the printable characters in the `ASCII` table above only occupy `[32, 126]`, and the table as a whole stops at 127. How do we write a number outside the printable range as bytes? The answer is the special escape `\\x` followed by two hexadecimal digits:" 122 | ] 123 | }, 124 | { 125 | "cell_type": "code", 126 | "execution_count": 4, 127 | "metadata": { 128 | "collapsed": false 129 | }, 130 | "outputs": [ 131 | { 132 | "name": "stdout", 133 | "output_type": "stream", 134 | "text": [ 135 | "255\n", 136 | "b'$'\n" 137 | ] 138 | } 139 | ], 140 | "source": [ 141 | "print(b'\\xff'[0])\n", 142 | "print(b'\\x24')" 143 | ] 144 | }, 145 | { 146 | "cell_type": "markdown", 147 | "metadata": {}, 148 | "source": [ 149 | "Conversely, we can turn numbers (0~255) into bytes, which are displayed escaped where necessary:" 150 | ] 151 | }, 152 | { 153 | "cell_type": "code", 154 | "execution_count": 5, 155 | "metadata": { 156 | "collapsed": false 157 | }, 158 | "outputs": [ 159 | { 
160 | "name": "stdout", 161 | "output_type": "stream", 162 | "text": [ 163 | "b'\\x18'\n", 164 | "b'$$$'\n" 165 | ] 166 | } 167 | ], 168 | "source": [ 169 | "print(bytes([24]))\n", 170 | "print(bytes([36,36,36])) # remember that bytes is a sequence" 171 | ] 172 | }, 173 | { 174 | "cell_type": "markdown", 175 | "metadata": {}, 176 | "source": [ 177 | "Or build it directly from hexadecimal:" 178 | ] 179 | }, 180 | { 181 | "cell_type": "code", 182 | "execution_count": 7, 183 | "metadata": { 184 | "collapsed": false 185 | }, 186 | "outputs": [ 187 | { 188 | "name": "stdout", 189 | "output_type": "stream", 190 | "text": [ 191 | "b'{}'\n", 192 | "7b207d\n" 193 | ] 194 | }, 195 | { 196 | "data": { 197 | "text/plain": [ 198 | "32" 199 | ] 200 | }, 201 | "execution_count": 7, 202 | "metadata": {}, 203 | "output_type": "execute_result" 204 | } 205 | ], 206 | "source": [ 207 | "print(bytes.fromhex(\"7b 7d\"))\n", 208 | "\n", 209 | "# the inverse operation\n", 210 | "print(b'{ }'.hex())\n", 211 | "\n", 212 | "int(b' '.hex(), base=16)" 213 | ] 214 | }, 215 | { 216 | "cell_type": "markdown", 217 | "metadata": {}, 218 | "source": [ 219 | "### `encode`\n", 220 | "\n", 221 | "Strings have an `encode` method and bytes have a `decode` method. Here we take a quick look at `encode('ascii')`. For a given **character**, encoding gives us its coordinate in the code table (its code point), so calling `encode('ascii')` on a character looks up its position in `ASCII`:" 222 | ] 223 | }, 224 | { 225 | "cell_type": "code", 226 | "execution_count": 8, 227 | "metadata": { 228 | "collapsed": false 229 | }, 230 | "outputs": [ 231 | { 232 | "name": "stdout", 233 | "output_type": "stream", 234 | "text": [ 235 | "b'$'\n", 236 | "36\n" 237 | ] 238 | } 239 | ], 240 | "source": [ 241 | "print(\"$\".encode('ascii'))\n", 242 | "print(\"$\".encode('ascii')[0])" 243 | ] 244 | }, 245 | { 246 | "cell_type": "markdown", 247 | "metadata": {}, 248 | "source": [ 249 | "In other words, the character `\"$\"` (as [0x07](https://github.com/rainyear/pytips/blob/master/Tips/2016-03-15-Unicode-String.ipynb) already explained, this is a Unicode character) sits at position `$` (that is, 36) in `ASCII`." 250 | ] 251 | }, 252 | { 253 | "cell_type": "markdown", 254 | "metadata": {}, 255 | 
"source": [ 256 | "But if we try to `ASCII`-encode a character that is not in that table, this happens:" 257 | ] 258 | }, 259 | { 260 | "cell_type": "code", 261 | "execution_count": 9, 262 | "metadata": { 263 | "collapsed": false 264 | }, 265 | "outputs": [ 266 | { 267 | "name": "stdout", 268 | "output_type": "stream", 269 | "text": [ 270 | "'ascii' codec can't encode character '\\U0001f40d' in position 0: ordinal not in range(128)\n", 271 | "b'\\xf0\\x9f\\x90\\x8d'\n" 272 | ] 273 | } 274 | ], 275 | "source": [ 276 | "snake = '🐍'\n", 277 | "try:\n", 278 | " snake.encode('ascii')\n", 279 | "except UnicodeEncodeError as err:\n", 280 | " print(err)\n", 281 | "\n", 282 | "# The right approach is to encode with UTF-8, which can represent every Unicode character\n", 283 | "print(snake.encode()) # utf-8 by default" 284 | ] 285 | }, 286 | { 287 | "cell_type": "markdown", 288 | "metadata": {}, 289 | "source": [ 290 | "And so we get our all-too-familiar error: `ordinal not in range(128)`. Why the limit is 128 should be easy to understand by now!" 291 | ] 292 | }, 293 | { 294 | "cell_type": "markdown", 295 | "metadata": {}, 296 | "source": [ 297 | "### Bytearray\n", 298 | "\n", 299 | "Like strings, bytes are an immutable sequence, while a bytearray is the mutable version of bytes; the two relate to each other like `tuple` and `list`." 300 | ] 301 | }, 302 | { 303 | "cell_type": "code", 304 | "execution_count": 10, 305 | "metadata": { 306 | "collapsed": false 307 | }, 308 | "outputs": [ 309 | { 310 | "name": "stdout", 311 | "output_type": "stream", 312 | "text": [ 313 | "bytearray(b'wello')\n" 314 | ] 315 | } 316 | ], 317 | "source": [ 318 | "ba = bytearray(b'hello')\n", 319 | "ba[0:1] = b'w'\n", 320 | "print(ba)" 321 | ] 322 | }, 323 | { 324 | "cell_type": "markdown", 325 | "metadata": {}, 326 | "source": [ 327 | "Since they are sequence types just like strings, bytes and bytearrays offer similar methods, so they are not listed one by one here." 328 | ] 329 | }, 330 | { 331 | "cell_type": "markdown", 332 | "metadata": {}, 333 | "source": [ 334 | "### Summary\n", 335 | "\n", 336 | "1. Bytes (and bytearrays) are sequences of binary data in which each element is 8 binary bits, i.e. 1 byte, i.e. 2 hexadecimal digits, i.e. a value in 0~255;\n", 337 | "2. Bytes are the computer's language and strings are human language; a code table maps one to the other;\n", 338 | "3. 
The smallest code table, `ASCII`, needs only one byte and uses only the code points `[0, 127]` in it;\n", 339 | "\n", 340 | "The relationship between bytes and strings is covered in detail in the next installment, [0x08]()." 341 | ] 342 | }, 343 | { 344 | "cell_type": "markdown", 345 | "metadata": {}, 346 | "source": [ 347 | "### References\n", 348 | "\n", 349 | "1. [Pragmatic Unicode](http://nedbatchelder.com/text/unipain/unipain.html#1)" 350 | ] 351 | } 352 | ], 353 | "metadata": { 354 | "kernelspec": { 355 | "display_name": "Python 3", 356 | "language": "python", 357 | "name": "python3" 358 | }, 359 | "language_info": { 360 | "codemirror_mode": { 361 | "name": "ipython", 362 | "version": 3 363 | }, 364 | "file_extension": ".py", 365 | "mimetype": "text/x-python", 366 | "name": "python", 367 | "nbconvert_exporter": "python", 368 | "pygments_lexer": "ipython3", 369 | "version": "3.5.0" 370 | } 371 | }, 372 | "nbformat": 4, 373 | "nbformat_minor": 0 374 | } 375 | -------------------------------------------------------------------------------- /Tips/2016-03-17-Bytes-decode-Unicode-encode-Bytes.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "## Using Unicode Correctly in Python\n", 8 | "\n", 9 | "[0x07](https://github.com/rainyear/pytips/blob/master/Tips/2016-03-15-Unicode-String.ipynb) and [0x08](https://github.com/rainyear/pytips/blob/master/Tips/2016-03-16-Bytes-and-Bytearray.ipynb) introduced Python's string type (`str`) and bytes type (`bytes`), as well as the two most common and most stubborn errors in Python encoding:\n", 10 | "\n", 11 | "> UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)\n", 12 | "\n", 13 | "> UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 0-1: invalid continuation byte\n", 14 | "\n", 15 | "This installment starts from these two errors and looks at how to use Unicode correctly in Python. This short article cannot guarantee that you will never see the two errors above again, but hopefully the next time you hit one you will know what went wrong and where to start.\n", 16 | "\n", 17 | "### Encoding and Decoding\n", 18 | "\n", 19 | "The two errors above are `UnicodeEncodeError` and `UnicodeDecodeError`, meaning that something went wrong during Unicode
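encoding and decoding respectively. So what exactly do encoding and decoding mean? According to Wikipedia's definition of [character encoding](https://zh.wikipedia.org/wiki/字符编码):\n", 20 | "\n", 21 | "> A character encoding (or character set code) encodes the characters of a character set as objects from a specified set (for example bit patterns, sequences of natural numbers, octets, or electrical pulses), so that text can be stored in a computer and transmitted over communication networks.\n", 22 | "\n", 23 | "Put simply, encoding translates **symbols of human language** into **objects a computer can handle**, and the reverse translation is naturally **decoding**. Python's string type represents the symbols of human language, so strings have an `encode()` method; the bytes type represents the computer's objects (binary data), so bytes have a `decode()` method." 24 | ] 25 | }, 26 | { 27 | "cell_type": "code", 28 | "execution_count": 1, 29 | "metadata": { 30 | "collapsed": false 31 | }, 32 | "outputs": [ 33 | { 34 | "name": "stdout", 35 | "output_type": "stream", 36 | "text": [ 37 | "b'\\xf0\\x9f\\x8c\\x8e\\xf0\\x9f\\x8c\\x8f'\n" 38 | ] 39 | } 40 | ], 41 | "source": [ 42 | "print(\"🌎🌏\".encode())" 43 | ] 44 | }, 45 | { 46 | "cell_type": "code", 47 | "execution_count": 2, 48 | "metadata": { 49 | "collapsed": false 50 | }, 51 | "outputs": [ 52 | { 53 | "name": "stdout", 54 | "output_type": "stream", 55 | "text": [ 56 | "🌎🌏\n" 57 | ] 58 | } 59 | ], 60 | "source": [ 61 | "print(b'\\xf0\\x9f\\x8c\\x8e\\xf0\\x9f\\x8c\\x8f'.decode())" 62 | ] 63 | }, 64 | { 65 | "cell_type": "markdown", 66 | "metadata": {}, 67 | "source": [ 68 | "Since encoding and decoding are both **translation**, we need a dictionary that maps human language to the computer's language one to one. That dictionary is called a **character set**. From the earliest ASCII to today's ubiquitous Unicode they are the same in essence; the two dictionaries just differ in thickness. ASCII contains only the 26 basic Latin letters, the Arabic numerals and English punctuation, 128 characters in total, so one byte (not even a full one) is enough to store it. Unicode, besides glyphs, encoding methods and standard character encodings, also covers character properties such as letter case; it has room for about 1.1M characters, of which only about 110K had been assigned at the time of writing.\n", 69 | "\n", 70 | "The position at which a character is stored in a character set (that is, the number the computer uses for it) is called its code point. For example, the code point of the character `'$'` in ASCII is:" 71 | ] 72 | }, 73 | { 74 | "cell_type": "code", 75 | "execution_count": 3, 76 | "metadata": { 77 | "collapsed": false 78 | }, 79 | "outputs": [ 80 | { 81 | "name": "stdout", 82 | "output_type": "stream", 83 | "text": [ 84 | "36\n" 85 | ] 86 | } 87 | ], 88 | "source": [ 89 | "print(ord('$'))" 90 | ] 91 | }, 92 | { 93 | "cell_type": "markdown", 94 | "metadata": {}, 95 | "source": [ 96 | "ASCII needs only one byte to hold all of its code points, whereas Unicode needs several. But when it comes to the concrete scheme used to implement this Unicode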
mapping, there are again many different options (or rules), for example the most common one (also Python's default), UTF-8, as well as UTF-16, UTF-32 and so on; the differences between their rules are not explored here. Of course, between ASCII and Unicode there are many other character sets and encodings, such as GB2312 for Chinese and Big5 for Traditional Chinese, but this does not affect our understanding of the encoding and decoding process.\n", 97 | "\n", 98 | "### Unicode\\*Error\n", 99 | "\n", 100 | "Now that strings and bytes, encoding and decoding are clear, let's deliberately trigger the two `Unicode*Error`s above, starting with the encoding error:" 101 | ] 102 | }, 103 | { 104 | "cell_type": "code", 105 | "execution_count": 4, 106 | "metadata": { 107 | "collapsed": false 108 | }, 109 | "outputs": [ 110 | { 111 | "name": "stdout", 112 | "output_type": "stream", 113 | "text": [ 114 | "b'$'\n", 115 | "b'$'\n", 116 | "b'\\xe9\\x9b\\xa8'\n", 117 | "'ascii' codec can't encode character '\\u96e8' in position 0: ordinal not in range(128)\n", 118 | "b'\\xd3\\xea'\n" 119 | ] 120 | } 121 | ], 122 | "source": [ 123 | "def tryEncode(s, encoding=\"utf-8\"):\n", 124 | " try:\n", 125 | " print(s.encode(encoding))\n", 126 | " except UnicodeEncodeError as err:\n", 127 | " print(err)\n", 128 | " \n", 129 | "s = \"$\" # a str (Unicode text)\n", 130 | "tryEncode(s) # encode with UTF-8 by default\n", 131 | "tryEncode(s, \"ascii\") # try to encode with ASCII\n", 132 | "\n", 133 | "s = \"雨\" # a str (Unicode text)\n", 134 | "tryEncode(s) # encode with UTF-8 by default\n", 135 | "tryEncode(s, \"ascii\") # try to encode with ASCII\n", 136 | "tryEncode(s, \"GB2312\") # try to encode with GB2312" 137 | ] 138 | }, 139 | { 140 | "cell_type": "markdown", 141 | "metadata": {}, 142 | "source": [ 143 | "Because `\"$\"` is in the ASCII character set, it can be encoded with ASCII (and UTF-8, being ASCII-compatible, produces the same byte). `\"雨\"`, however, cannot be encoded with ASCII because it lies outside the 128 characters of the ASCII character set, which raises `UnicodeEncodeError`. In GB2312 `\"雨\"` is encoded as `b'\\xd3\\xea'`, which differs from its UTF-8 bytes but is still a valid encoding. So if you see a `UnicodeEncodeError`, you are using the wrong dictionary: the character you want to translate has no entry in it!\n", 144 | "\n", 145 | "Now for the decoding error:" 146 | ] 147 | }, 148 | { 149 | "cell_type": "code", 150 | "execution_count": 5, 151 | "metadata": { 152 | "collapsed": false 153 | }, 154 | "outputs": [ 155 | { 156 | "name": "stdout", 157 | "output_type": "stream", 158 | "text": [ 159 | "$\n", 160 | "$\n", 161 | "$\n", 162 | "'utf-8' codec can't decode byte 0xd3 in position 0: invalid continuation
byte\n", 163 | "'ascii' codec can't decode byte 0xd3 in position 0: ordinal not in range(128)\n", 164 | "雨\n", 165 | "雨\n", 166 | "迾\n", 167 | "雨\n" 168 | ] 169 | } 170 | ], 171 | "source": [ 172 | "def tryDecode(s, decoding=\"utf-8\"):\n", 173 | " try:\n", 174 | " print(s.decode(decoding))\n", 175 | " except UnicodeDecodeError as err:\n", 176 | " print(err)\n", 177 | " \n", 178 | "b = b'$' # Bytes\n", 179 | "tryDecode(b) # decode with UTF-8 by default\n", 180 | "tryDecode(b, \"ascii\") # try to decode with ASCII\n", 181 | "tryDecode(b, \"GB2312\") # try to decode with GB2312\n", 182 | "\n", 183 | "b = b'\\xd3\\xea' # the bytes produced by the GB2312 encoding in the example above\n", 184 | "tryDecode(b) # decode with UTF-8 by default\n", 185 | "tryDecode(b, \"ascii\") # try to decode with ASCII\n", 186 | "tryDecode(b, \"GB2312\") # try to decode with GB2312\n", 187 | "tryDecode(b, \"GBK\") # try to decode with GBK\n", 188 | "tryDecode(b, \"Big5\") # try to decode with Big5\n", 189 | "\n", 190 | "tryDecode(b.decode(\"GB2312\").encode()) # Byte-Decode-Unicode-Encode-Byte" 191 | ] 192 | }, 193 | { 194 | "cell_type": "markdown", 195 | "metadata": {}, 196 | "source": [ 197 | "Most character sets that came later are ASCII-compatible, so ASCII can be regarded as a subset of them; whatever can be decoded (encoded) with ASCII can generally be handled by the others too. For encodings that are not in such a subset relationship, forcing a decode may produce an error or garbled text!" 198 | ] 199 | }, 200 | { 201 | "cell_type": "markdown", 202 | "metadata": {}, 203 | "source": [ 204 | "### Strategies in Practice\n", 205 | "\n", 206 | "With all the principles above in mind, how do we avoid errors or garbled text in practice?\n", 207 | "\n", 208 | "1. Keep the direction of encoding and decoding straight;\n", 209 | "2. 
Work with Unicode strings (`str`) inside Python as much as possible, and decide only at input or output time whether you need to encode to binary:" 210 | ] 211 | }, 212 | { 213 | "cell_type": "code", 214 | "execution_count": 6, 215 | "metadata": { 216 | "collapsed": false 217 | }, 218 | "outputs": [ 219 | { 220 | "name": "stdout", 221 | "output_type": "stream", 222 | "text": [ 223 | "b'\\xe4\\xbd\\xa0\\xe5\\xa5\\xbd\\xef\\xbc\\x8c\\xe4\\xb8\\x96\\xe7\\x95\\x8c\\xef\\xbc\\x81\\n'\n", 224 | "你好,世界!\n", 225 | "\n", 226 | "你好,世界!\n", 227 | "\n", 228 | "Failed to decode file!\n", 229 | "你好,Unicode!\n", 230 | "\n" 231 | ] 232 | } 233 | ], 234 | "source": [ 235 | "# cat utf8.txt\n", 236 | "# 你好,世界!\n", 237 | "# file utf8.txt\n", 238 | "# utf8.txt: UTF-8 Unicode text\n", 239 | "\n", 240 | "with open(\"utf8.txt\", \"rb\") as f:\n", 241 | " content = f.read()\n", 242 | " print(content)\n", 243 | " print(content.decode())\n", 244 | "with open(\"utf8.txt\", \"r\") as f:\n", 245 | " print(f.read())\n", 246 | " \n", 247 | "# cat gb2312.txt\n", 248 | "# 你好,Unicode!\n", 249 | "# file gb2312.txt\n", 250 | "# gb2312.txt: ISO-8859 text\n", 251 | "\n", 252 | "with open(\"gb2312.txt\", \"r\") as f:\n", 253 | " try:\n", 254 | " print(f.read())\n", 255 | " except UnicodeDecodeError:\n", 256 | " print(\"Failed to decode file!\")\n", 257 | "with open(\"gb2312.txt\", \"rb\") as f:\n", 258 | " print(f.read().decode(\"gb2312\"))" 259 | ] 260 | }, 261 | { 262 | "cell_type": "markdown", 263 | "metadata": {}, 264 | "source": [ 265 | "![Unicode](http://7xiijd.com1.z0.glb.clouddn.com/Pragmatic_Unicode.jpg)" 266 | ] 267 | }, 268 | { 269 | "cell_type": "markdown", 270 | "metadata": {}, 271 | "source": [ 272 | "### References\n", 273 | "\n", 274 | "1. [Pragmatic Unicode](http://nedbatchelder.com/text/unipain/unipain.html)\n", 275 | "2. 
[字符编码笔记:ASCII,Unicode和UTF-8](http://www.ruanyifeng.com/blog/2007/10/ascii_unicode_and_utf-8.html)" 276 | ] 277 | } 278 | ], 279 | "metadata": { 280 | "kernelspec": { 281 | "display_name": "Python 3", 282 | "language": "python", 283 | "name": "python3" 284 | }, 285 | "language_info": { 286 | "codemirror_mode": { 287 | "name": "ipython", 288 | "version": 3 289 | }, 290 | "file_extension": ".py", 291 | "mimetype": "text/x-python", 292 | "name": "python", 293 | "nbconvert_exporter": "python", 294 | "pygments_lexer": "ipython3", 295 | "version": "3.5.0" 296 | } 297 | }, 298 | "nbformat": 4, 299 | "nbformat_minor": 0 300 | } 301 | -------------------------------------------------------------------------------- /Tips/2016-03-18-String-Format.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "## String Formatting in Python\n", 8 | "\n", 9 | "Many people probably format strings with the `\"%s\" % v` syntax. [PEP 3101](https://www.python.org/dev/peps/pep-3101/) proposed a more advanced formatting method, `str.format()`, which became the standard in Python 3 as a replacement for the old `%s` formatting syntax. CPython has implemented this method since 2.6 (other interpreters were not checked).\n", 10 | "\n", 11 | "### `format()`\n", 12 | "\n", 13 | "The new `format()` method is really more like a stripped-down template engine, and it is very feature-rich. The official documentation describes its syntax as follows:" 14 | ] 15 | }, 16 | { 17 | "cell_type": "code", 18 | "execution_count": 1, 19 | "metadata": { 20 | "collapsed": false 21 | }, 22 | "outputs": [], 23 | "source": [ 24 | "\"\"\"\n", 25 | "replacement_field ::= \"{\" [field_name] [\"!\" conversion] [\":\" format_spec] \"}\"\n", 26 | "field_name ::= arg_name (\".\" attribute_name | \"[\" element_index \"]\")*\n", 27 | "arg_name ::= [identifier | integer]\n", 28 | "attribute_name ::= identifier\n", 29 | "element_index ::= integer | index_string\n", 30 | "index_string ::= <any source character except \"]\"> +\n", 31 | "conversion ::= \"r\" | \"s\" | \"a\"\n", 32 | "format_spec ::= <described in the next section>\n", 33 | "\"\"\"\n", 34 | "pass # Do not output" 35 | ] 36 | }, 37 | { 38 | "cell_type": "markdown", 39 | "metadata": 
{}, 40 | "source": [ 41 | "I converted it into a [railroad diagram](https://en.wikipedia.org/wiki/Syntax_diagram), which is (perhaps) more intuitive:\n", 42 | "\n", 43 | "![replacement_field.jpg](http://7xiijd.com1.z0.glb.clouddn.com/replacement_field.jpg)" 44 | ] 45 | }, 46 | { 47 | "cell_type": "markdown", 48 | "metadata": {}, 49 | "source": [ 50 | "Replacement fields in the template are wrapped in `{}` and split into two parts by `:`. The second part, `format_spec`, is discussed separately later. The first part can take three forms:\n", 51 | "\n", 52 | "1. Empty\n", 53 | "2. A number denoting a position\n", 54 | "3. An identifier denoting a keyword\n", 55 | "\n", 56 | "This matches the kinds of arguments in a function call:" 57 | ] 58 | }, 59 | { 60 | "cell_type": "code", 61 | "execution_count": 2, 62 | "metadata": { 63 | "collapsed": false 64 | }, 65 | "outputs": [ 66 | { 67 | "name": "stdout", 68 | "output_type": "stream", 69 | "text": [ 70 | "Hello World\n", 71 | "Hello World\n", 72 | "Hello World\n", 73 | "HeH\n" 74 | ] 75 | } 76 | ], 77 | "source": [ 78 | "print(\"{} {}\".format(\"Hello\", \"World\"))\n", 79 | "# is equal to...\n", 80 | "print(\"{0} {1}\".format(\"Hello\", \"World\"))\n", 81 | "print(\"{hello} {world}\".format(hello=\"Hello\", world=\"World\"))\n", 82 | "\n", 83 | "print(\"{0}{1}{0}\".format(\"H\", \"e\"))" 84 | ] 85 | }, 86 | { 87 | "cell_type": "markdown", 88 | "metadata": {}, 89 | "source": [ 90 | "In addition, as mentioned in [0x05 Arguments and Unpacking](https://github.com/rainyear/pytips/blob/master/Tips/2016-03-11-Arguments-and-Unpacking.ipynb), unpacking can be used directly in `format()` as well:" 91 | ] 92 | }, 93 | { 94 | "cell_type": "code", 95 | "execution_count": 3, 96 | "metadata": { 97 | "collapsed": false 98 | }, 99 | "outputs": [ 100 | { 101 | "name": "stdout", 102 | "output_type": "stream", 103 | "text": [ 104 | "Python.py\n", 105 | "Python Rocks\n" 106 | ] 107 | } 108 | ], 109 | "source": [ 110 | "print(\"{lang}.{suffix}\".format(**{\"lang\": \"Python\", \"suffix\": \"py\"}))\n", 111 | "print(\"{} {}\".format(*[\"Python\", \"Rocks\"]))" 112 | ] 113 | }, 114 | { 115 | "cell_type": "markdown", 116 | "metadata": {}, 117 | "source": [ 118 | "Inside the template you can also reach an attribute or item of a variable via `.identifier` and `[key]` (note that `\"{}{}\"` is equivalent to `\"{0}{1}\"`):" 119 
| ] 120 | }, 121 | { 122 | "cell_type": "code", 123 | "execution_count": 4, 124 | "metadata": { 125 | "collapsed": false 126 | }, 127 | "outputs": [ 128 | { 129 | "name": "stdout", 130 | "output_type": "stream", 131 | "text": [ 132 | "Name: Python, Score: 100\n", 133 | "Python vs Ruby\n", 134 | "\n", 135 | "====\n", 136 | "Help(format):\n", 137 | " S.format(*args, **kwargs) -> str\n", 138 | "\n", 139 | "Return a formatted version of S, using substitutions from args and kwargs.\n", 140 | "The substitutions are identified by braces ('{' and '}').\n" 141 | ] 142 | } 143 | ], 144 | "source": [ 145 | "data = {'name': 'Python', 'score': 100}\n", 146 | "print(\"Name: {0[name]}, Score: {0[score]}\".format(data)) # no quotes needed around the key\n", 147 | "\n", 148 | "langs = [\"Python\", \"Ruby\"]\n", 149 | "print(\"{0[0]} vs {0[1]}\".format(langs))\n", 150 | "\n", 151 | "print(\"\\n====\\nHelp(format):\\n {.__doc__}\".format(str.format))" 152 | ] 153 | }, 154 | { 155 | "cell_type": "markdown", 156 | "metadata": {}, 157 | "source": [ 158 | "### Conversion\n", 159 | "\n", 160 | "A replacement value can be explicitly converted with `!` followed by `r|s|a`:\n", 161 | "\n", 162 | "1. `\"{!r}\"` calls `repr()` on the value\n", 163 | "2. `\"{!s}\"` calls `str()` on the value\n", 164 | "3. 
`\"{!a}\"` calls `ascii()` on the value" 165 | ] 166 | }, 167 | { 168 | "cell_type": "markdown", 169 | "metadata": {}, 170 | "source": [ 171 | "### Format Spec\n", 172 | "\n", 173 | "Finally, the part after `:` defines the style of the output:\n", 174 | "\n", 175 | "![format_spec.jpg](http://7xiijd.com1.z0.glb.clouddn.com/format_spec.jpg)" 176 | ] 177 | }, 178 | { 179 | "cell_type": "markdown", 180 | "metadata": {}, 181 | "source": [ 182 | "`align` is the alignment direction and is usually used together with `width`, while `fill` is the padding character (a space by default):" 183 | ] 184 | }, 185 | { 186 | "cell_type": "code", 187 | "execution_count": 5, 188 | "metadata": { 189 | "collapsed": false 190 | }, 191 | "outputs": [ 192 | { 193 | "name": "stdout", 194 | "output_type": "stream", 195 | "text": [ 196 | "left<<<<<<<<<<<<\n", 197 | "^^^^^center^^^^^\n", 198 | ">>>>>>>>>>>right\n", 199 | "0000000100\n" 200 | ] 201 | } 202 | ], 203 | "source": [ 204 | "for align, text in zip(\"<^>\", [\"left\", \"center\", \"right\"]):\n", 205 | " print(\"{:{fill}{align}16}\".format(text, fill=align, align=align))\n", 206 | " \n", 207 | "print(\"{:0=10}\".format(100)) # = is only valid for numbers" 208 | ] 209 | }, 210 | { 211 | "cell_type": "markdown", 212 | "metadata": {}, 213 | "source": [ 214 | "This also shows that `{}` can be nested inside the format spec (here the nested fields are supplied by keyword), but only one level deep.\n", 215 | "\n", 216 | "Next is the sign option: `+|-|' '` controls how the sign of a number is shown (`+` always shows a sign, `-` shows one only for negative numbers, and a space shows no `+` for positive numbers but keeps one leading space):" 217 | ] 218 | }, 219 | { 220 | "cell_type": "code", 221 | "execution_count": 6, 222 | "metadata": { 223 | "collapsed": false 224 | }, 225 | "outputs": [ 226 | { 227 | "name": "stdout", 228 | "output_type": "stream", 229 | "text": [ 230 | "+3.14\n", 231 | "-3.14\n", 232 | " 3.14\n" 233 | ] 234 | } 235 | ], 236 | "source": [ 237 | "print(\"{0:+}\\n{1:-}\\n{0: }\".format(3.14, -3.14))" 238 | ] 239 | }, 240 | { 241 | "cell_type": "markdown", 242 | "metadata": {}, 243 | "source": [ 244 | "`#` controls whether numbers in special formats (binary, hexadecimal, etc.) get a prefix; `,` controls whether a number is shown with thousands separators; `0` is equivalent to the earlier `{:0=}`: right-align and pad the empty positions with `0`:" 245 | ] 246 | }, 247 | { 248 | "cell_type": "code", 249 | "execution_count": 7, 250 | "metadata": { 251 | "collapsed": false 252 
| }, 253 | "outputs": [ 254 | { 255 | "name": "stdout", 256 | "output_type": "stream", 257 | "text": [ 258 | "Binary: 11 => 0b11\n", 259 | "Large Number: 1250000.0 => 1,250,000.0\n", 260 | "Padding: 3 => 0000000000000003\n" 261 | ] 262 | } 263 | ], 264 | "source": [ 265 | "print(\"Binary: {0:b} => {0:#b}\".format(3))\n", 266 | "\n", 267 | "print(\"Large Number: {0:} => {0:,}\".format(1.25e6))\n", 268 | "\n", 269 | "print(\"Padding: {0:16} => {0:016}\".format(3))" 270 | ] 271 | }, 272 | { 273 | "cell_type": "markdown", 274 | "metadata": {}, 275 | "source": [ 276 | "The last two are the familiar precision `.n` and the presentation type. Only a few examples are given here; see the [documentation](https://docs.python.org/3/library/string.html#formatexamples) for details:" 277 | ] 278 | }, 279 | { 280 | "cell_type": "code", 281 | "execution_count": 8, 282 | "metadata": { 283 | "collapsed": false 284 | }, 285 | "outputs": [ 286 | { 287 | "name": "stdout", 288 | "output_type": "stream", 289 | "text": [ 290 | "pi = 3.1, also = 3.141593\n" 291 | ] 292 | } 293 | ], 294 | "source": [ 295 | "from math import pi\n", 296 | "print(\"pi = {pi:.2}, also = {pi:.7}\".format(pi=pi))" 297 | ] 298 | }, 299 | { 300 | "cell_type": "markdown", 301 | "metadata": {}, 302 | "source": [ 303 | "**Integer**" 304 | ] 305 | }, 306 | { 307 | "cell_type": "code", 308 | "execution_count": 9, 309 | "metadata": { 310 | "collapsed": false 311 | }, 312 | "outputs": [ 313 | { 314 | "name": "stdout", 315 | "output_type": "stream", 316 | "text": [ 317 | "Type b of 97 shows: 1100001\n", 318 | "Type c of 97 shows: a\n", 319 | "Type d of 97 shows: 97\n", 320 | "Type #o of 97 shows: 0o141\n", 321 | "Type #x of 97 shows: 0x61\n", 322 | "Type #X of 97 shows: 0X61\n", 323 | "Type n of 97 shows: 97\n" 324 | ] 325 | } 326 | ], 327 | "source": [ 328 | "for t in \"b c d #o #x #X n\".split():\n", 329 | " print(\"Type {0:>2} of {1} shows: {1:{t}}\".format(t, 97, t=t))" 330 | ] 331 | }, 332 | { 333 | "cell_type": "markdown", 334 | "metadata": {}, 335 | "source": [ 336 | "**Float**" 337 | ] 338 | }, 339 | { 340 
| "cell_type": "code", 341 | "execution_count": 10, 342 | "metadata": { 343 | "collapsed": false 344 | }, 345 | "outputs": [ 346 | { 347 | "name": "stdout", 348 | "output_type": "stream", 349 | "text": [ 350 | "Type e shows: 1.23e+04\n", 351 | "Type E shows: 1.23E+04\n", 352 | "Type f shows: 1.30\n", 353 | "Type F shows: 1.30\n", 354 | "Type g shows: 1\n", 355 | "Type G shows: 2\n", 356 | "Type n shows: 3.1\n", 357 | "Type % shows: 98.50%\n" 358 | ] 359 | } 360 | ], 361 | "source": [ 362 | "for t, n in zip(\"eEfFgGn%\", [12345, 12345, 1.3, 1.3, 1, 2, 3.14, 0.985]):\n", 363 | " print(\"Type {} shows: {:.2{t}}\".format(t, n, t=t))" 364 | ] 365 | }, 366 | { 367 | "cell_type": "markdown", 368 | "metadata": {}, 369 | "source": [ 370 | "**String (default)**" 371 | ] 372 | }, 373 | { 374 | "cell_type": "code", 375 | "execution_count": 11, 376 | "metadata": { 377 | "collapsed": false 378 | }, 379 | "outputs": [ 380 | { 381 | "name": "stdout", 382 | "output_type": "stream", 383 | "text": [ 384 | "456\n" 385 | ] 386 | } 387 | ], 388 | "source": [ 389 | "try:\n", 390 | " print(\"{:s}\".format(123))\n", 391 | "except ValueError:\n", 392 | " print(\"{}\".format(456))" 393 | ] 394 | } 395 | ], 396 | "metadata": { 397 | "kernelspec": { 398 | "display_name": "Python 3", 399 | "language": "python", 400 | "name": "python3" 401 | }, 402 | "language_info": { 403 | "codemirror_mode": { 404 | "name": "ipython", 405 | "version": 3 406 | }, 407 | "file_extension": ".py", 408 | "mimetype": "text/x-python", 409 | "name": "python", 410 | "nbconvert_exporter": "python", 411 | "pygments_lexer": "ipython3", 412 | "version": "3.5.0" 413 | } 414 | }, 415 | "nbformat": 4, 416 | "nbformat_minor": 0 417 | } 418 | -------------------------------------------------------------------------------- /Tips/2016-03-21-Try-else.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "### Python
`else` Is Everywhere\n", 8 | "\n", 9 | "We all know the basic use of `else` in Python is in the conditional statement `if...elif...else...`, but `else` has two other uses: one at the end of a loop, and one in `try` for error handling. These are standard Python syntax, but because they differ from the habits of most other programming languages, people overlook them, deliberately or not. There is also a lot of debate about whether these uses follow the principles of [0x00 The Zen of Python](https://github.com/rainyear/pytips/blob/master/Tips/2016-03-06-The-Zen-of-Python.ipynb) and whether they should be used widely. For example, in two books I have read ([Effective Python](http://www.effectivepython.com/) vs. [Write Idiomatic Python](https://jeffknupp.com/writing-idiomatic-python-ebook/)), the two authors take completely opposite positions on them.\n", 10 | "\n", 11 | "**`else` in loops**\n", 12 | "\n", 13 | "The `else` clause that follows a loop runs only when no `break` occurred inside the loop, that is, when the loop ran to completion normally. Let's first look at an insertion sort example:" 14 | ] 15 | }, 16 | { 17 | "cell_type": "code", 18 | "execution_count": 1, 19 | "metadata": { 20 | "collapsed": false 21 | }, 22 | "outputs": [ 23 | { 24 | "name": "stdout", 25 | "output_type": "stream", 26 | "text": [ 27 | "[8, 12, 12, 34, 38, 68, 72, 78, 84, 90]\n" 28 | ] 29 | } 30 | ], 31 | "source": [ 32 | "from random import randrange\n", 33 | "def insertion_sort(seq):\n", 34 | " if len(seq) <= 1:\n", 35 | " return seq\n", 36 | " _sorted = seq[:1]\n", 37 | " for i in seq[1:]:\n", 38 | " inserted = False\n", 39 | " for j in range(len(_sorted)):\n", 40 | " if i < _sorted[j]:\n", 41 | " _sorted = [*_sorted[:j], i, *_sorted[j:]]\n", 42 | " inserted = True\n", 43 | " break\n", 44 | " if not inserted:\n", 45 | " _sorted.append(i)\n", 46 | " return _sorted\n", 47 | "\n", 48 | "print(insertion_sort([randrange(1, 100) for i in range(10)]))" 49 | ] 50 | }, 51 | { 52 | "cell_type": "markdown", 53 | "metadata": {}, 54 | "source": [ 55 | "In this example, the elements of the sorted list `_sorted` are compared with `i` one by one; if `i` is not smaller than any sorted element, it can only go at the end of the sorted list. That requires an extra state variable, `inserted`, to record whether the loop ran to completion or was cut short by `break`. In this situation we can use `else` to replace the state variable:" 56 | ] 57 | }, 58 | { 59 | "cell_type": "code", 60 | "execution_count": 2, 61 | "metadata": { 62 | "collapsed": false 63 | }, 64 | "outputs": [ 65 | { 66 | "name": "stdout", 67 | "output_type": "stream", 68 | "text": [ 69 | "[1, 10, 27, 32, 32, 43, 50, 55, 
80, 94]\n" 70 | ] 71 | } 72 | ], 73 | "source": [ 74 | "def insertion_sort(seq):\n", 75 | " if len(seq) <= 1:\n", 76 | " return seq\n", 77 | " _sorted = seq[:1]\n", 78 | " for i in seq[1:]:\n", 79 | " for j in range(len(_sorted)):\n", 80 | " if i < _sorted[j]:\n", 81 | " _sorted = [*_sorted[:j], i, *_sorted[j:]]\n", 82 | " break\n", 83 | " else:\n", 84 | " _sorted.append(i)\n", 85 | " return _sorted\n", 86 | "print(insertion_sort([randrange(1, 100) for i in range(10)]))" 87 | ] 88 | }, 89 | { 90 | "cell_type": "markdown", 91 | "metadata": {}, 92 | "source": [ 93 | "I think this is a really cool technique! Note, however, that the `else` clause runs not only when the loop finishes without hitting `break`, but also when the loop body never executes at all:" 94 | ] 95 | }, 96 | { 97 | "cell_type": "code", 98 | "execution_count": 3, 99 | "metadata": { 100 | "collapsed": false 101 | }, 102 | "outputs": [ 103 | { 104 | "name": "stdout", 105 | "output_type": "stream", 106 | "text": [ 107 | "Loop failed!\n" 108 | ] 109 | } 110 | ], 111 | "source": [ 112 | "while False:\n", 113 | " print(\"Will never print!\")\n", 114 | "else:\n", 115 | " print(\"Loop failed!\")" 116 | ] 117 | }, 118 | { 119 | "cell_type": "markdown", 120 | "metadata": {}, 121 | "source": [ 122 | "**`else` in error handling**\n", 123 | "\n", 124 | "The `try...except...else...finally` control flow syntax catches exceptions that may occur and handles them. `except` catches errors raised in the `try` block; `else` handles the case where **no error occurred**; `finally` takes care of the \"cleanup\" for the `try` statement and always runs. A simple example shows this:" 125 | ] 126 | }, 127 | { 128 | "cell_type": "code", 129 | "execution_count": 4, 130 | "metadata": { 131 | "collapsed": false 132 | }, 133 | "outputs": [ 134 | { 135 | "name": "stdout", 136 | "output_type": "stream", 137 | "text": [ 138 | "result = 2.5\n", 139 | "divide finished!\n", 140 | "********************\n", 141 | "division by 0!\n", 142 | "divide finished!\n" 143 | ] 144 | } 145 | ], 146 | "source": [ 147 | "def divide(x, y):\n", 148 | " try:\n", 149 | " result = x / y\n", 150 | " except ZeroDivisionError:\n", 151 | " print(\"division by 0!\")\n", 152 | " else:\n", 153 | " print(\"result = {}\".format(result))\n", 154 | " 
finally:\n", 155 | " print(\"divide finished!\")\n", 156 | "divide(5,2)\n", 157 | "print(\"*\"*20)\n", 158 | "divide(5,0)" 159 | ] 160 | }, 161 | { 162 | "cell_type": "markdown", 163 | "metadata": {}, 164 | "source": [ 165 | "Of course, a state variable can be used instead of `else`:" 166 | ] 167 | }, 168 | { 169 | "cell_type": "code", 170 | "execution_count": 5, 171 | "metadata": { 172 | "collapsed": false 173 | }, 174 | "outputs": [ 175 | { 176 | "name": "stdout", 177 | "output_type": "stream", 178 | "text": [ 179 | "result = 2.5\n", 180 | "divide finished!\n", 181 | "********************\n", 182 | "division by 0!\n", 183 | "divide finished!\n" 184 | ] 185 | } 186 | ], 187 | "source": [ 188 | "def divide(x, y):\n", 189 | " result = None\n", 190 | " try:\n", 191 | " result = x / y\n", 192 | " except ZeroDivisionError:\n", 193 | " print(\"division by 0!\")\n", 194 | " if result is not None:\n", 195 | " print(\"result = {}\".format(result))\n", 196 | " print(\"divide finished!\")\n", 197 | "\n", 198 | " \n", 199 | "divide(5,2)\n", 200 | "print(\"*\"*20)\n", 201 | "divide(5,0)" 202 | ] 203 | }, 204 | { 205 | "cell_type": "markdown", 206 | "metadata": {}, 207 | "source": [ 208 | "**Summary**\n", 209 | "\n", 210 | "Some people feel these uses of `else` are counterintuitive, or **implicit** rather than **explicit**, and should not be encouraged. But I think this \"verdict\" depends on the concrete use case and on how well we understand Python; syntax does not have to be beginner-friendly to count as **explicit**. Of course I do not recommend using this syntax everywhere either. The biggest drawback of `for/while...else` is that `else` has to line up with `for/while`, so with several levels of nesting or a very long loop body `else` is a very poor fit (recall the joke about needing calipers to read Python indentation :P). Only in short loops, where `else` lets us get rid of a clumsy state variable, is it at its most Pythonic!" 
211 | ] 212 | } 213 | ], 214 | "metadata": { 215 | "kernelspec": { 216 | "display_name": "Python 3", 217 | "language": "python", 218 | "name": "python3" 219 | }, 220 | "language_info": { 221 | "codemirror_mode": { 222 | "name": "ipython", 223 | "version": 3 224 | }, 225 | "file_extension": ".py", 226 | "mimetype": "text/x-python", 227 | "name": "python", 228 | "nbconvert_exporter": "python", 229 | "pygments_lexer": "ipython3", 230 | "version": "3.5.0" 231 | } 232 | }, 233 | "nbformat": 4, 234 | "nbformat_minor": 0 235 | } 236 | -------------------------------------------------------------------------------- /Tips/2016-03-22-Shallow-and-Deep-Copy.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "## Python: Shallow and Deep Copy\n", 8 | "\n", 9 | "Objects in Python come in two kinds: mutable and immutable. Immutable objects include int, float, str, tuple and so on (plus long in Python 2); mutable objects include list, set, dict and so on. In Python, assignment (`=`) merely:\n", 10 | "\n", 11 | "1. creates an object (for some value);\n", 12 | "2. 
将变量名指向(引用)这个对象。\n", 13 | "\n", 14 | "这就像 C 语言中指针的概念,只不过更灵活地是 Python 中的变量随时可以指向其它对象(不分类型),其它变量也可以指向这一对象。如果这一对象是可变的,那么对其中一个引用变量的改变会影响其它变量:" 15 | ] 16 | }, 17 | { 18 | "cell_type": "code", 19 | "execution_count": 1, 20 | "metadata": { 21 | "collapsed": false 22 | }, 23 | "outputs": [ 24 | { 25 | "name": "stdout", 26 | "output_type": "stream", 27 | "text": [ 28 | "[1, 2]\n", 29 | "{'b': 1, 'a': 0}\n" 30 | ] 31 | } 32 | ], 33 | "source": [ 34 | "lst = [1, 2, 3]\n", 35 | "s = lst\n", 36 | "s.pop()\n", 37 | "print(lst)\n", 38 | "\n", 39 | "d = {'a': 0}\n", 40 | "e = d\n", 41 | "e['b'] = 1\n", 42 | "print(d)" 43 | ] 44 | }, 45 | { 46 | "cell_type": "markdown", 47 | "metadata": {}, 48 | "source": [ 49 | "如果你不是刻意想要这样做(实际也很少会要这样操作),那么就可能导致一些意想不到的错误(尤其是在传递参数给函数的时候)。为了解决这一麻烦,最简单的方法就是不直接变量指向现有的对象,而是生成一份新的 copy 赋值给新的变量,有很多种语法可以实现:" 50 | ] 51 | }, 52 | { 53 | "cell_type": "code", 54 | "execution_count": 2, 55 | "metadata": { 56 | "collapsed": false 57 | }, 58 | "outputs": [ 59 | { 60 | "name": "stdout", 61 | "output_type": "stream", 62 | "text": [ 63 | "[1, 2, 3, '#0']\n", 64 | "{'dd': '#0', 'a': 0}\n" 65 | ] 66 | } 67 | ], 68 | "source": [ 69 | "lst = [1,2,3]\n", 70 | "\n", 71 | "llst = [lst,\n", 72 | " lst[:],\n", 73 | " lst.copy(),\n", 74 | " [*lst]] # invalid in 2.7\n", 75 | "for i, v in enumerate(llst):\n", 76 | " v.append(\"#{}\".format(i))\n", 77 | "print(lst)\n", 78 | "\n", 79 | "d = {\"a\": 0}\n", 80 | "dd = [d,\n", 81 | " d.copy(),\n", 82 | " {**d}] # invalid in 2.7\n", 83 | "for i, v in enumerate(dd):\n", 84 | " v['dd'] = \"#{}\".format(i)\n", 85 | "print(d)" 86 | ] 87 | }, 88 | { 89 | "cell_type": "markdown", 90 | "metadata": {}, 91 | "source": [ 92 | "### `deep` vs `shallow`\n", 93 | "\n", 94 | "上面给出的这些 copy 的例子比较简单,都没有嵌套的情况出现,如果这里的可变对象中还包含其它可变对象,结果会怎样呢:" 95 | ] 96 | }, 97 | { 98 | "cell_type": "code", 99 | "execution_count": 3, 100 | "metadata": { 101 | "collapsed": false 102 | }, 103 | "outputs": [ 104 | { 105 | "name": "stdout", 106 | "output_type": "stream", 107 | 
"text": [ 108 | "[0, 1, [2, 3, '#0', '#1', '#2', '#3']]\n", 109 | "{'a': {'b': [0, '#0', '#1', '#2']}}\n" 110 | ] 111 | } 112 | ], 113 | "source": [ 114 | "lst = [0, 1, [2, 3]]\n", 115 | "\n", 116 | "llst = [lst,\n", 117 | " lst[:],\n", 118 | " lst.copy(),\n", 119 | " [*lst]]\n", 120 | "for i, v in enumerate(llst):\n", 121 | " v[2].append(\"#{}\".format(i))\n", 122 | "print(lst)\n", 123 | "\n", 124 | "d = {\"a\": {\"b\": [0]}}\n", 125 | "dd = [d,\n", 126 | " d.copy(),\n", 127 | " {**d}]\n", 128 | "for i, v in enumerate(dd):\n", 129 | " v['a']['b'].append(\"#{}\".format(i))\n", 130 | "print(d)" 131 | ] 132 | }, 133 | { 134 | "cell_type": "markdown", 135 | "metadata": {}, 136 | "source": [ 137 | "这些 copy 的方法称为**浅拷贝(shallow copy)**,它相比直接赋值更进了一步生成了新的对象,但是对于嵌套的对象仍然采用了赋值的方法来创建;如果要再进一步,则需要**深拷贝(deep copy)**,由标准库 `copy` 提供:" 138 | ] 139 | }, 140 | { 141 | "cell_type": "code", 142 | "execution_count": 4, 143 | "metadata": { 144 | "collapsed": false 145 | }, 146 | "outputs": [ 147 | { 148 | "name": "stdout", 149 | "output_type": "stream", 150 | "text": [ 151 | "[0, 1, [2, 3, 4]]\n", 152 | "[0, 1, [2, 3]]\n", 153 | "{'a': {'b': [0, 1]}}\n", 154 | "{'a': {'b': [0]}}\n" 155 | ] 156 | } 157 | ], 158 | "source": [ 159 | "from copy import deepcopy\n", 160 | "\n", 161 | "lst = [0, 1, [2, 3]]\n", 162 | "lst2 = deepcopy(lst)\n", 163 | "lst2[2].append(4)\n", 164 | "print(lst2)\n", 165 | "print(lst)\n", 166 | "\n", 167 | "d = {\"a\": {\"b\": [0]}}\n", 168 | "d2 = deepcopy(d)\n", 169 | "d2[\"a\"][\"b\"].append(1)\n", 170 | "print(d2)\n", 171 | "print(d)" 172 | ] 173 | }, 174 | { 175 | "cell_type": "markdown", 176 | "metadata": {}, 177 | "source": [ 178 | "清楚了赋值(引用)、copy 还是 `deepcopy` 之间的区别才能更好地避免意想不到的错误,同样也可以利用它们的特性去实现一些 little tricks,例如我们在 [0x04 闭包与作用域](https://github.com/rainyear/pytips/blob/master/Tips/2016-03-10-Scope-and-Closure.ipynb) 中利用可变对象的特性实现 `nonlocal` 的功能。关于可变对象的引用、传递等既是 Python 的基本属性,同时又因为隐藏在背后的“暗箱操作”而容易引起误解,想要深入了解可以进一步阅读参考链接的文章,我也会在后面的文章中继续一边学习、一边补充更多这方面的知识。" 179 | ] 180 | 
}, 181 | { 182 | "cell_type": "markdown", 183 | "metadata": {}, 184 | "source": [ 185 | "### References\n", 186 | "\n", 187 | "1. [python基础(5):深入理解 python 中的赋值、引用、拷贝、作用域](http://my.oschina.net/leejun2005/blog/145911)" 188 | ] 189 | } 190 | ], 191 | "metadata": { 192 | "kernelspec": { 193 | "display_name": "Python 3", 194 | "language": "python", 195 | "name": "python3" 196 | }, 197 | "language_info": { 198 | "codemirror_mode": { 199 | "name": "ipython", 200 | "version": 3 201 | }, 202 | "file_extension": ".py", 203 | "mimetype": "text/x-python", 204 | "name": "python", 205 | "nbconvert_exporter": "python", 206 | "pygments_lexer": "ipython3", 207 | "version": "3.5.0" 208 | } 209 | }, 210 | "nbformat": 4, 211 | "nbformat_minor": 0 212 | } 213 | -------------------------------------------------------------------------------- /Tips/2016-03-23-With-Context-Manager.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "## Python Context Managers\n", 8 | "\n", 9 | "Python 2.5 introduced the `with` statement ([PEP 343](https://www.python.org/dev/peps/pep-0343/)) and context manager types ([Context Manager Types](https://docs.python.org/3/library/stdtypes.html#context-manager-types)), whose main uses include:\n", 10 | "\n", 11 | "> Saving and restoring various kinds of global state, locking and unlocking resources, closing opened files, etc. [With Statement Context Managers](https://docs.python.org/3/reference/datamodel.html#with-statement-context-managers)\n", 12 | "\n", 13 | "One of the most common uses is working with files:" 14 | ] 15 | }, 16 | { 17 | "cell_type": "code", 18 | "execution_count": 1, 19 | "metadata": { 20 | "collapsed": false 21 | }, 22 | "outputs": [ 23 | { 24 | "name": "stdout", 25 | "output_type": "stream", 26 | "text": [ 27 | "你好,世界!\n", 28 | "\n" 29 | ] 30 | } 31 | ], 32 | "source": [ 33 | "with open(\"utf8.txt\", \"r\") as f:\n", 34 | "    print(f.read())" 35 | ] 36 | }, 37 | { 38 | "cell_type": "markdown", 39 | "metadata": {}, 40 | "source": [ 41 | "The example above can also be written with `try...finally...` to the same effect (put differently, a context manager wraps up and simplifies the cleanup-on-error pattern):" 42 | ] 43 | }, 44 | { 45 | "cell_type": "code", 46 | "execution_count": 2, 47 | "metadata": { 48 | "collapsed": false 49 | }, 50 | "outputs": [ 51 | { 52 | "name": "stdout", 53 | "output_type": "stream", 54 | "text": [ 55 | "你好,世界!\n", 56 | "\n" 57 | ] 58 | } 59 | ], 60 | "source": [ 61 | "f = open(\"utf8.txt\", \"r\")\n", 62 | "try:\n", 63 | "    print(f.read())\n", 64 | "finally:\n", 65 | "    f.close()" 66 | ] 67 | }, 68 | { 69 | "cell_type": "markdown", 70 | "metadata": {}, 71 | "source": [ 72 | "Besides file objects, we can create our own context managers. As with the iterators covered in [0x01](https://github.com/rainyear/pytips/blob/master/Tips/2016-03-07-iterator-and-generator.ipynb), any type that defines `__enter__()` and `__exit__()` is a context manager. The `with` statement runs as follows:\n", 73 | "\n", 74 | "1. Evaluate the expression after `with` to obtain the context manager; for example, `open('utf8.txt', 'r')` returns a `file object`;\n", 75 | "2. Load `__exit__()` for later use;\n", 76 | "3. Call `__enter__()`; its return value is bound to the name after `as` (if there is one);\n", 77 | "4. Execute the body of the `with` block;\n", 78 | "5. Call `__exit__()`. If the body raised an exception, its `type, value, traceback` are passed to `__exit__()`; otherwise all three are `None`. If `__exit__()` returns a false value, the exception propagates to the caller; if it returns a true value, the exception is suppressed." 79 | ] 80 | }, 81 | { 82 | "cell_type": "markdown", 83 | "metadata": {}, 84 | "source": [ 85 | "Knowing how `with` executes, we can write our own context manager. Suppose we need a reference counter and, for some special reason, several counters must share global state and affect one another, and the initial global state must be restored once the counters are done:" 86 | ] 87 | },
 88 | { 89 | "cell_type": "code", 90 | "execution_count": 3, 91 | "metadata": { 92 | "collapsed": false 93 | }, 94 | "outputs": [ 95 | { 96 | "name": "stdout", 97 | "output_type": "stream", 98 | "text": [ 99 | "COUNTER #ref1: 98\n", 100 | "COUNTER #ref2: 100\n", 101 | "COUNTER #ref1: 99\n", 102 | "COUNTER #ref2: 101\n", 103 | "COUNTER #ref1: 100\n", 104 | "COUNTER #ref2: 102\n", 105 | "{'user': 'admin', 'counter': 99}\n" 106 | ] 107 | } 108 | ], 109 | "source": [ 110 | "_G = {\"counter\": 99, \"user\": \"admin\"}\n", 111 | "\n", 112 | "class Refs():\n", 113 | "    def __init__(self, name = None):\n", 114 | "        self.name = name\n", 115 | "        self._G = _G\n", 116 | "        self.init = self._G['counter']\n", 117 | "    def __enter__(self):\n", 118 | "        return self\n", 119 | "    def __exit__(self, *args):\n", 120 | "        self._G[\"counter\"] = self.init\n", 121 | "        return False\n", 122 | "    def acc(self, n = 1):\n", 123 | "        self._G[\"counter\"] += n\n", 124 | "    def dec(self, n = 1):\n", 125 | "        self._G[\"counter\"] -= n\n", 126 | "    def __str__(self):\n", 127 | "        return \"COUNTER #{name}: {counter}\".format(**self._G, name=self.name)\n", 128 | " \n", 129 | "with Refs(\"ref1\") as ref1, Refs(\"ref2\") as ref2: # Python 3.1 added multiple context managers in one with statement\n", 130 | "    for _ in range(3):\n", 131 | "        ref1.dec()\n", 132 | "        print(ref1)\n", 133 | "        ref2.acc(2)\n", 134 | "        print(ref2)\n", 135 | "print(_G)" 136 | ] 137 | }, 138 | { 139 | "cell_type": "markdown", 140 | "metadata": {}, 141 | "source": [ 142 | "The example above is contrived, but it shows the execution order of `with` well. Defining two methods every time is not very concise, though, so as usual Python offers a shortcut: `@contextlib.contextmanager` plus a `generator` (just as `yield` simplified iterators in [0x01](https://github.com/rainyear/pytips/blob/master/Tips/2016-03-07-iterator-and-generator.ipynb)):" 143 | ] 144 | }, 145 | { 146 | "cell_type": "code", 147 | "execution_count": 4, 148 | "metadata": { 149 | "collapsed": false 150 | }, 151 | "outputs": [ 152 | { 153 | "name": "stdout", 154 | "output_type": "stream", 155 | "text": [ 156 | "COUNTER #ref1: 98\n", 157 | "COUNTER #ref2: 100\n", 158 | "COUNTER #ref1: 99\n", 159 | "COUNTER #ref2: 101\n", 160 | "COUNTER #ref1: 100\n", 161 | "COUNTER #ref2: 102\n", 162 | "********************\n", 163 | "{'user': 'admin', 'counter': 99}\n" 164 | ] 165 | } 166 | ], 167 | "source": [ 168 | "from contextlib import contextmanager as cm\n", 169 | "_G = {\"counter\": 99, \"user\": \"admin\"}\n", 170 | "\n", 171 | "@cm\n", 172 | "def ref():\n", 173 | "    counter = _G[\"counter\"]\n", 174 | "    yield _G\n", 175 | "    _G[\"counter\"] = counter\n", 176 | "\n", 177 | "with ref() as r1, ref() as r2:\n", 178 | "    for _ in range(3):\n", 179 | "        r1[\"counter\"] -= 1\n", 180 | "        print(\"COUNTER #ref1: {}\".format(_G[\"counter\"]))\n", 181 | "        r2[\"counter\"] += 2\n", 182 | "        print(\"COUNTER #ref2: {}\".format(_G[\"counter\"]))\n", 183 | "print(\"*\"*20)\n", 184 | "print(_G)" 185 | ] 186 | },
 187 | { 188 | "cell_type": "markdown", 189 | "metadata": {}, 190 | "source": [ 191 | "The generator must yield exactly once. The yielded value plays the role of the return value of `__enter__()`, and the statements after `yield` play the role of `__exit__()`. Note that if the `with` body raises, those statements only run when the `yield` is wrapped in `try...finally`." 192 | ] 193 | }, 194 | { 195 | "cell_type": "markdown", 196 | "metadata": {}, 197 | "source": [ 198 | "The generator form is more concise and suits quickly putting together a simple context manager.\n", 199 | "\n", 200 | "Besides these two approaches, Python 3.2 added `contextlib.ContextDecorator`, which lets us define, at the `class` level, context managers that also work as decorators; see the [official documentation](https://docs.python.org/3/library/contextlib.html#contextlib.ContextDecorator) if you are interested. As far as I can tell so far, it does not add much convenience (apart from saving one level of indentation :( ).\n", 201 | "\n", 202 | "Context managers have a lot in common with decorators, but remember that the purpose of `with` is to clean up more gracefully, not to replace `try...finally...`. After all, in [The Zen of Python](https://github.com/rainyear/pytips/blob/master/Tips/2016-03-06-The-Zen-of-Python.ipynb),\n", 203 | "\n", 204 | "> Explicit is better than implicit.\n", 205 | "\n", 206 | "matters more than\n", 207 | "\n", 208 | "> Simple is better than complex.\n", 209 | "\n", 210 | ":P" 211 | ] 212 | } 213 | ], 214 | "metadata": { 215 | "kernelspec": { 216 | "display_name": "Python 3", 217 | "language": "python", 218 | "name": "python3" 219 | }, 220 | "language_info": { 221 | "codemirror_mode": { 222 | "name": "ipython", 223 | "version": 3 224 | }, 225 | "file_extension": ".py", 226 | "mimetype": "text/x-python", 227 | "name": "python", 228 | "nbconvert_exporter": "python", 229 | "pygments_lexer": "ipython3", 230 | "version": "3.5.0" 231 | } 232 | }, 233 | "nbformat": 4, 234 | "nbformat_minor": 0 235 | } 236 | -------------------------------------------------------------------------------- /Tips/2016-03-24-Sort-and-Sorted.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "### 
Built-in Sorting in Python\n", 8 | "\n", 9 | "Python has two built-in ways to sort: `list.sort()`, an in-place method available only on `list`, and `sorted()`, a function that accepts any iterable and does not sort in place.\n", 10 | "\n", 11 | "In-place means the list object itself is modified, just as with `append()`/`pop()`:" 12 | ] 13 | }, 14 | { 15 | "cell_type": "code", 16 | "execution_count": 1, 17 | "metadata": { 18 | "collapsed": false 19 | }, 20 | "outputs": [ 21 | { 22 | "name": "stdout", 23 | "output_type": "stream", 24 | "text": [ 25 | "[57, 81, 32, 74, 12, 89, 76, 21, 75, 6]\n", 26 | "[6, 12, 21, 32, 57, 74, 75, 76, 81, 89]\n" 27 | ] 28 | } 29 | ], 30 | "source": [ 31 | "from random import randrange\n", 32 | "lst = [randrange(1, 100) for _ in range(10)]\n", 33 | "print(lst)\n", 34 | "lst.sort()\n", 35 | "\n", 36 | "print(lst)" 37 | ] 38 | }, 39 | { 40 | "cell_type": "markdown", 41 | "metadata": {}, 42 | "source": [ 43 | "`sorted()` is not limited to lists; it builds and returns a new sorted **list** and leaves the original object untouched:" 44 | ] 45 | }, 46 | { 47 | "cell_type": "code", 48 | "execution_count": 2, 49 | "metadata": { 50 | "collapsed": false 51 | }, 52 | "outputs": [ 53 | { 54 | "name": "stdout", 55 | "output_type": "stream", 56 | "text": [ 57 | "[11, 36, 39, 41, 48, 48, 50, 76, 79, 99]\n", 58 | "(11, 41, 79, 48, 48, 99, 39, 76, 36, 50)\n" 59 | ] 60 | } 61 | ], 62 | "source": [ 63 | "lst = [randrange(1, 100) for _ in range(10)]\n", 64 | "tup = tuple(lst)\n", 65 | "\n", 66 | "print(sorted(tup)) # return List\n", 67 | "print(tup)" 68 | ] 69 | }, 70 | { 71 | "cell_type": "markdown", 72 | "metadata": {}, 73 | "source": [ 74 | "Although it does not sort in place, a generator passed to it is still consumed:" 75 | ] 76 | }, 77 | { 78 | "cell_type": "code", 79 | "execution_count": 3, 80 | "metadata": { 81 | "collapsed": false 82 | }, 83 | "outputs": [ 84 | { 85 | "name": "stdout", 86 | "output_type": "stream", 87 | "text": [ 88 | "[5, 12, 15, 21, 57, 69, 73, 83, 90, 95]\n" 89 | ] 90 | } 91 | ], 92 | "source": [ 93 | "tup = (randrange(1, 100) for _ in range(10))\n", 94 | "print(sorted(tup))\n", 95 | "for i in tup:\n", 96 | "    print(i)" 97 | ] 98 | }, 99 | { 100 | "cell_type": "markdown", 
101 | "metadata": {}, 102 | "source": [ 103 | "### Key\n", 104 | "\n", 105 | "Sorting a simple iterable only requires comparing its elements pair by pair. To transform the elements before comparing them, pass a function through the `key` parameter. `key` is used much like the functions accepted by `map`/`filter` in [0x02 Functional Programming](https://github.com/rainyear/pytips/blob/master/Tips/2016-03-08-Functional-Programming-in-Python.ipynb); the difference is that the `key` function only processes elements for the comparison and does not change their values. For example, we can sort a list of integers **by** absolute value (read `key` as `by`):" 106 | ] 107 | }, 108 | { 109 | "cell_type": "code", 110 | "execution_count": 4, 111 | "metadata": { 112 | "collapsed": false 113 | }, 114 | "outputs": [ 115 | { 116 | "name": "stdout", 117 | "output_type": "stream", 118 | "text": [ 119 | "[0, 7, 0, -10, 3, 7, -9, -10, -7, -10]\n", 120 | "[0, 0, 3, 7, 7, -7, -9, -10, -10, -10]\n" 121 | ] 122 | } 123 | ], 124 | "source": [ 125 | "lst = [randrange(-10, 10) for _ in range(10)]\n", 126 | "print(lst)\n", 127 | "print(sorted(lst, key=abs))" 128 | ] 129 | }, 130 | { 131 | "cell_type": "markdown", 132 | "metadata": {}, 133 | "source": [ 134 | "Or, when the elements are more complex, sort **by** just some of their attributes:" 135 | ] 136 | }, 137 | { 138 | "cell_type": "code", 139 | "execution_count": 5, 140 | "metadata": { 141 | "collapsed": false 142 | }, 143 | "outputs": [ 144 | { 145 | "name": "stdout", 146 | "output_type": "stream", 147 | "text": [ 148 | "[('hello', 3), ('world', 3), ('hail', 9), ('python', 9)]\n", 149 | "[('hello', 3), ('world', 3), ('hail', 9), ('python', 9)]\n" 150 | ] 151 | } 152 | ], 153 | "source": [ 154 | "lst = list(zip(\"hello world hail python\".split(), [randrange(1, 10) for _ in range(4)]))\n", 155 | "print(lst)\n", 156 | "print(sorted(lst, key=lambda item: item[1]))" 157 | ] 158 | }, 159 | { 160 | "cell_type": "markdown", 161 | "metadata": {}, 162 | "source": [ 163 | "The standard library module `operator` provides operator-related helpers that make it easier to pull out an item or attribute:" 164 | ] 165 | }, 166 | { 167 | "cell_type": "code", 168 | "execution_count": 6, 169 | "metadata": { 170 | "collapsed": false 171 | }, 172 | "outputs": [ 173 | { 174 | "name": "stdout", 175 | "output_type": "stream", 176 | "text": 
[ 177 | "[('hello', 3), ('world', 3), ('hail', 9), ('python', 9)]\n", 178 | "[('hello', 3), ('world', 3), ('hail', 9), ('python', 9)]\n", 179 | "[('hello', 3), ('world', 3), ('hail', 9), ('python', 9)]\n", 180 | "[hello=>3, world=>3, hail=>9, python=>9]\n" 181 | ] 182 | } 183 | ], 184 | "source": [ 185 | "from operator import itemgetter, attrgetter\n", 186 | "\n", 187 | "print(lst)\n", 188 | "print(sorted(lst, key=itemgetter(1)))\n", 189 | "\n", 190 | "# Everything is just a function\n", 191 | "fitemgetter = lambda ind: lambda item: item[ind]\n", 192 | "print(sorted(lst, key=fitemgetter(1)))\n", 193 | "\n", 194 | "class P(object):\n", 195 | "    def __init__(self, w, n):\n", 196 | "        self.w = w\n", 197 | "        self.n = n\n", 198 | "    def __repr__(self):\n", 199 | "        return \"{}=>{}\".format(self.w, self.n)\n", 200 | "ps = [P(i[0], i[1]) for i in lst]\n", 201 | "\n", 202 | "print(sorted(ps, key=attrgetter('n')))" 203 | ] 204 | }, 205 | { 206 | "cell_type": "markdown", 207 | "metadata": {}, 208 | "source": [ 209 | "After `key` is applied, elements are compared with `<`. In Python 2.7, `sorted()` also accepts a `cmp` parameter that takes over that comparison. Python 3 dropped this entirely, both the `cmp` parameter of `sorted()` and the `__cmp__` method on objects. You should only need it in Python 3 when porting older code, and the replacement is:" 210 | ] 211 | }, 212 | { 213 | "cell_type": "code", 214 | "execution_count": 7, 215 | "metadata": { 216 | "collapsed": false 217 | }, 218 | "outputs": [ 219 | { 220 | "data": { 221 | "text/plain": [ 222 | "[('hail', 9), ('python', 9), ('hello', 3), ('world', 3)]" 223 | ] 224 | }, 225 | "execution_count": 7, 226 | "metadata": {}, 227 | "output_type": "execute_result" 228 | } 229 | ], 230 | "source": [ 231 | "from functools import cmp_to_key as new_cmp_to_key\n", 232 | "\n", 233 | "# new_cmp_to_key works roughly like this\n", 234 | "def cmp_to_key(mycmp):\n", 235 | "    'Convert a cmp= function into a key= function'\n", 236 | "    class K:\n", 237 | "        def __init__(self, obj, *args):\n", 238 | "            self.obj = obj\n", 239 | "        def __lt__(self, other):\n", 240 | "            return mycmp(self.obj, other.obj) < 0\n", 241 | "    return K\n", 242 | "def reverse_cmp(x, y):\n", 243 | "    return y[1] - x[1]\n", 244 | "sorted(lst, key=cmp_to_key(reverse_cmp))\n" 245 | ] 246 | }, 247 | { 248 | "cell_type": "markdown", 249 | "metadata": {}, 250 | "source": [ 251 | "To sort in descending order, just pass `reverse=True`." 252 | ] 253 | } 254 | ], 255 | "metadata": { 256 | "kernelspec": { 257 | "display_name": "Python 3", 258 | "language": "python", 259 | "name": "python3" 260 | }, 261 | "language_info": { 262 | "codemirror_mode": { 263 | "name": "ipython", 264 | "version": 3 265 | }, 266 | "file_extension": ".py", 267 | "mimetype": "text/x-python", 268 | "name": "python", 269 | "nbconvert_exporter": "python", 270 | "pygments_lexer": "ipython3", 271 | "version": "3.5.0" 272 | } 273 | }, 274 | "nbformat": 4, 275 | "nbformat_minor": 0 276 | } 277 | -------------------------------------------------------------------------------- /Tips/2016-03-25-Decorator-and-functools.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "## Python Decorators and `functools`\n", 8 | "\n", 9 | "A Python decorator is syntactic sugar. That is,\n", 10 | "\n", 11 | "```python\n", 12 | "@decorator\n", 13 | "@wrap\n", 14 | "def func():\n", 15 | "    pass\n", 16 | "```\n", 17 | "\n", 18 | "is shorthand for:\n", 19 | "\n", 20 | "```python\n", 21 | "def func():\n", 22 | "    pass\n", 23 | "func = decorator(wrap(func))\n", 24 | "```\n", 25 | "\n", 26 | "Two main questions about decorators:\n", 27 | "\n", 28 | "1. What can a decorator decorate?\n", 29 | "2. 
What can act as a decorator?\n", 30 | "\n", 31 | "### Decorating Functions\n", 32 | "\n", 33 | "The most common use of a decorator is to decorate a newly defined function. [0x0d Context Managers](https://github.com/rainyear/pytips/blob/master/Tips/2016-03-23-With-Context-Manager.ipynb) noted that context managers are mainly for **cleaning up more gracefully**; decorators, by contrast, are usually used to extend a function's behavior or attributes:" 34 | ] 35 | }, 36 | { 37 | "cell_type": "code", 38 | "execution_count": 1, 39 | "metadata": { 40 | "collapsed": false 41 | }, 42 | "outputs": [ 43 | { 44 | "name": "stdout", 45 | "output_type": "stream", 46 | "text": [ 47 | "INFO: Starting run\n", 48 | "Running run...\n", 49 | "INFO: Finishing run\n" 50 | ] 51 | } 52 | ], 53 | "source": [ 54 | "def log(func):\n", 55 | "    def wraper():\n", 56 | "        print(\"INFO: Starting {}\".format(func.__name__))\n", 57 | "        func()\n", 58 | "        print(\"INFO: Finishing {}\".format(func.__name__))\n", 59 | "    return wraper\n", 60 | "\n", 61 | "@log\n", 62 | "def run():\n", 63 | "    print(\"Running run...\")\n", 64 | "run()" 65 | ] 66 | }, 67 | { 68 | "cell_type": "markdown", 69 | "metadata": {}, 70 | "source": [ 71 | "### Decorating Classes\n", 72 | "\n", 73 | "Besides functions, Python 3.0 added support for decorating newly defined classes ([PEP 3129](https://www.python.org/dev/peps/pep-3129/)). Class attributes can already be changed through [`Metaclasses`](https://www.python.org/doc/essays/metaclasses/) or inheritance, and class decorators were added largely with [Jython](https://mail.python.org/pipermail/python-dev/2006-March/062942.html) and [IronPython](http://lists.ironpython.com/pipermail/users-ironpython.com/2006-March/002007.html) in mind, but the syntax is consistent with function decorators:" 74 | ] 75 | }, 76 | { 77 | "cell_type": "code", 78 | "execution_count": 2, 79 | "metadata": { 80 | "collapsed": false 81 | }, 82 | "outputs": [ 83 | { 84 | "name": "stdout", 85 | "output_type": "stream", 86 | "text": [ 87 | "Hello\n", 88 | "Obj\n", 89 | "Cost 3.005s to init.\n" 90 | ] 91 | } 92 | ], 93 | "source": [ 94 | "from time import sleep, time\n", 95 | "def timer(Cls):\n", 96 | "    def wraper():\n", 97 | "        s = time()\n", 98 | "        obj = Cls()\n", 99 | "        e = time()\n", 100 | "        print(\"Cost {:.3f}s to init.\".format(e - s))\n", 101 | "        return obj\n", 102 | "    return wraper\n", 103 | "@timer\n", 104 | "class Obj:\n", 105 | "    def __init__(self):\n", 106 | "        print(\"Hello\")\n", 107 | "        sleep(3)\n", 108 | "        print(\"Obj\")\n", 109 | "o = Obj()" 110 | ] 111 | }, 112 | { 113 | "cell_type": "markdown", 114 | "metadata": {}, 115 | "source": [ 116 | "### Classes as Decorators\n", 117 | "\n", 118 | "Both examples above use a function as the decorator, because a decorator must be callable: `decorator(wrap(func))`. Functions are not the only callables, though; a class whose instances define `__call__` works too:" 119 | ] 120 | }, 121 | { 122 | "cell_type": "code", 123 | "execution_count": 3, 124 | "metadata": { 125 | "collapsed": false 126 | }, 127 | "outputs": [ 128 | { 129 | "name": "stdout", 130 | "output_type": "stream", 131 | "text": [ 132 | "LOG: Baking Tag <html>!\n", 133 | "LOG: Baking Tag <body>!\n", 134 | "LOG: Baking Tag <div>!\n", 135 | "<html><body><div>Hello</div></body></html>\n" 136 | ] 137 | } 138 | ], 139 | "source": [ 140 | "class HTML(object):\n", 141 | "    \"\"\"\n", 142 | "    Baking HTML Tags!\n", 143 | "    \"\"\"\n", 144 | "    def __init__(self, tag=\"p\"):\n", 145 | "        print(\"LOG: Baking Tag <{}>!\".format(tag))\n", 146 | "        self.tag = tag\n", 147 | "    def __call__(self, func):\n", 148 | "        return lambda: \"<{0}>{1}</{2}>\".format(self.tag, func(), self.tag)\n", 149 | "\n", 150 | "@HTML(\"html\")\n", 151 | "@HTML(\"body\")\n", 152 | "@HTML(\"div\")\n", 153 | "def body():\n", 154 | "    return \"Hello\"\n", 155 | "\n", 156 | "print(body())" 157 | ] 158 | },
 159 | { 160 | "cell_type": "markdown", 161 | "metadata": {}, 162 | "source": [ 163 | "### Passing Arguments\n", 164 | "\n", 165 | "In practice we may need to pass arguments to the decorator, or to the decorated function (or class). The syntax only requires that the `decorator` in `@decorator` be callable, so `decorator(123)` is fine as long as it returns a new callable; `@HTML('html')` above is one example. The next example, modeled on the route decorator in [flask](https://github.com/mitsuhiko/flask/blob/master/flask%2Fapp.py), shows how to pass arguments to a decorator:" 166 | ] 167 | }, 168 | { 169 | "cell_type": "code", 170 | "execution_count": 4, 171 | "metadata": { 172 | "collapsed": false 173 | }, 174 | "outputs": [ 175 | { 176 | "name": "stdout", 177 | "output_type": "stream", 178 | "text": [ 179 | "Hello world!\n", 180 | "Welcome Home!\n", 181 | "{'/': <function index at 0x...>, '/home': <function home at 0x...>}\n" 182 | ] 183 | } 184 | ], 185 | "source": [ 186 | "RULES = {}\n", 187 | "def route(rule):\n", 188 | "    def decorator(hand):\n", 189 | "        RULES.update({rule: hand})\n", 190 | "        return hand\n", 191 | "    return decorator\n", 192 | "@route(\"/\")\n", 193 | "def index():\n", 194 | "    print(\"Hello world!\")\n", 195 | "\n", 196 | "def home():\n", 197 | "    print(\"Welcome Home!\")\n", 198 | "home = route(\"/home\")(home)\n", 199 | "\n", 200 | "index()\n", 201 | "home()\n", 202 | "print(RULES)" 203 | ] 204 | }, 205 | { 206 | "cell_type": "markdown", 207 | "metadata": {}, 208 | "source": [ 209 | "How to pass arguments to the decorated function depends on what the decorator does. If, as in the example above, it never calls the decorated function and simply returns it unchanged, nothing special is needed (the function is treated like any ordinary value):" 210 | ] 211 | }, 212 | { 213 | "cell_type": "code", 214 
| "execution_count": 5, 215 | "metadata": { 216 | "collapsed": false 217 | }, 218 | "outputs": [ 219 | { 220 | "name": "stdout", 221 | "output_type": "stream", 222 | "text": [ 223 | "DB.findOne({hail, python})\n" 224 | ] 225 | } 226 | ], 227 | "source": [ 228 | "@route(\"/login\")\n", 229 | "def login(user = \"user\", pwd = \"pwd\"):\n", 230 | "    print(\"DB.findOne({{{}, {}}})\".format(user, pwd))\n", 231 | "login(\"hail\", \"python\")" 232 | ] 233 | }, 234 | { 235 | "cell_type": "markdown", 236 | "metadata": {}, 237 | "source": [ 238 | "If the decorated function has to be called inside the decorator, a small change is needed:" 239 | ] 240 | }, 241 | { 242 | "cell_type": "code", 243 | "execution_count": 6, 244 | "metadata": { 245 | "collapsed": false 246 | }, 247 | "outputs": [ 248 | { 249 | "name": "stdout", 250 | "output_type": "stream", 251 | "text": [ 252 | "INFO: Start Logging\n", 253 | "Hello Python\n", 254 | "INFO: Finish Logging\n" 255 | ] 256 | } 257 | ], 258 | "source": [ 259 | "def log(f):\n", 260 | "    def wraper(*args, **kargs):\n", 261 | "        print(\"INFO: Start Logging\")\n", 262 | "        f(*args, **kargs)\n", 263 | "        print(\"INFO: Finish Logging\")\n", 264 | "    return wraper\n", 265 | "\n", 266 | "@log\n", 267 | "def run(hello = \"world\"):\n", 268 | "    print(\"Hello {}\".format(hello))\n", 269 | "run(\"Python\")" 270 | ] 271 | }, 272 | { 273 | "cell_type": "markdown", 274 | "metadata": {}, 275 | "source": [ 276 | "### functools\n", 277 | "\n", 278 | "Because a decorator wraps the function (or class) and returns the result, `func = decorator(func)`, some metadata of the original function (or class) may change. Take the `HTML` decorator above:" 279 | ] 280 | }, 281 | { 282 | "cell_type": "code", 283 | "execution_count": 7, 284 | "metadata": { 285 | "collapsed": false 286 | }, 287 | "outputs": [ 288 | { 289 | "name": "stdout", 290 | "output_type": "stream", 291 | "text": [ 292 | "LOG: Baking Tag <body>!\n", 293 | "<lambda>\n", 294 | "None\n" 295 | ] 296 | } 297 | ], 298 | "source": [ 299 | "@HTML(\"body\")\n", 300 | "def body():\n", 301 | "    \"\"\"\n", 302 | "    return body content\n", 303 | "    \"\"\"\n", 304 | "    return \"Hello, body!\"\n", 305 | "print(body.__name__)\n", 306 | "print(body.__doc__)" 307 | ] 308 | },
 309 | { 310 | "cell_type": "markdown", 311 | "metadata": {}, 312 | "source": [ 313 | "Since `body = HTML(\"body\")(body)` and `HTML(\"body\").__call__()` returns a `lambda`, `body` has been replaced by that `lambda`. Both are callable, but attributes of the original `body` such as `__doc__`/`__name__`/`__module__` are replaced (`__module__` is unchanged here only because everything lives in the same file). To fix this, Python's standard library module [`functools`](https://docs.python.org/3.5/library/functools.html) provides `update_wrapper` and `wraps` ([source](https://hg.python.org/cpython/file/3.5/Lib/functools.py)). `update_wrapper` copies the original function's metadata onto the function returned by the decorator:" 314 | ] 315 | }, 316 | { 317 | "cell_type": "code", 318 | "execution_count": 8, 319 | "metadata": { 320 | "collapsed": false 321 | }, 322 | "outputs": [ 323 | { 324 | "name": "stdout", 325 | "output_type": "stream", 326 | "text": [ 327 | "LOG: Baking Tag <body>!\n", 328 | "body\n", 329 | "\n", 330 | "    return body content!\n", 331 | "    \n" 332 | ] 333 | } 334 | ], 335 | "source": [ 336 | "from functools import update_wrapper\n", 337 | "\"\"\"\n", 338 | "functools.update_wrapper(wrapper, wrapped[, assigned][, updated])\n", 339 | "\"\"\"\n", 340 | "\n", 341 | "\n", 342 | "class HTML(object):\n", 343 | "    \"\"\"\n", 344 | "    Baking HTML Tags!\n", 345 | "    \"\"\"\n", 346 | "    def __init__(self, tag=\"p\"):\n", 347 | "        print(\"LOG: Baking Tag <{}>!\".format(tag))\n", 348 | "        self.tag = tag\n", 349 | "    def __call__(self, func):\n", 350 | "        wraper = lambda: \"<{0}>{1}</{2}>\".format(self.tag, func(), self.tag)\n", 351 | "        update_wrapper(wraper, func)\n", 352 | "        return wraper\n", 353 | "@HTML(\"body\")\n", 354 | "def body():\n", 355 | "    \"\"\"\n", 356 | "    return body content!\n", 357 | "    \"\"\"\n", 358 | "    return \"Hello, body!\"\n", 359 | "print(body.__name__)\n", 360 | "print(body.__doc__)" 361 | ] 362 | }, 363 | { 364 | "cell_type": "markdown", 365 | "metadata": {}, 366 | "source": [ 367 | "Interestingly, `update_wrapper` itself is used much like a decorator, so `functools.wraps` uses `functools.partial` (remember partial application from the functional programming tip!) to turn it into one:" 368 | ] 369 | },
 370 | { 371 | "cell_type": "code", 372 | "execution_count": 9, 373 | "metadata": { 374 | "collapsed": false 375 | }, 376 | "outputs": [ 377 | { 378 | "name": "stdout", 379 | "output_type": "stream", 380 | "text": [ 381 | "run\n", 382 | "\n", 383 | "    Docs' of run\n", 384 | "    \n" 385 | ] 386 | } 387 | ], 388 | "source": [ 389 | "from functools import update_wrapper, partial\n", 390 | "\n", 391 | "def my_wraps(wrapped):\n", 392 | "    return partial(update_wrapper, wrapped=wrapped)\n", 393 | "\n", 394 | "def log(func):\n", 395 | "    @my_wraps(func)\n", 396 | "    def wraper():\n", 397 | "        print(\"INFO: Starting {}\".format(func.__name__))\n", 398 | "        func()\n", 399 | "        print(\"INFO: Finishing {}\".format(func.__name__))\n", 400 | "    return wraper\n", 401 | "\n", 402 | "@log\n", 403 | "def run():\n", 404 | "    \"\"\"\n", 405 | "    Docs' of run\n", 406 | "    \"\"\"\n", 407 | "    print(\"Running run...\")\n", 408 | "print(run.__name__)\n", 409 | "print(run.__doc__)" 410 | ] 411 | }, 412 | { 413 | "cell_type": "markdown", 414 | "metadata": {}, 415 | "source": [ 416 | "### References\n", 417 | "\n", 418 | "1. 
[Python修饰器的函数式编程](http://coolshell.cn/articles/11265.html)" 419 | ] 420 | } 421 | ], 422 | "metadata": { 423 | "kernelspec": { 424 | "display_name": "Python 3", 425 | "language": "python", 426 | "name": "python3" 427 | }, 428 | "language_info": { 429 | "codemirror_mode": { 430 | "name": "ipython", 431 | "version": 3 432 | }, 433 | "file_extension": ".py", 434 | "mimetype": "text/x-python", 435 | "name": "python", 436 | "nbconvert_exporter": "python", 437 | "pygments_lexer": "ipython3", 438 | "version": "3.5.0" 439 | } 440 | }, 441 | "nbformat": 4, 442 | "nbformat_minor": 0 443 | } 444 | -------------------------------------------------------------------------------- /Tips/gb2312.txt: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/coodict/pytips/b09321e30f66449056e0832b1a499b8a9623e461/Tips/gb2312.txt -------------------------------------------------------------------------------- /Tips/utf8.txt: -------------------------------------------------------------------------------- 1 | 你好,世界! 2 | --------------------------------------------------------------------------------