├── .travis.yml
├── README.md
├── ac.py
├── avl_tree.py
├── base64_str.py
├── btree.py
├── calc24.py
├── celery
│   └── tasks.py
├── compress.py
├── coroutine.py
├── crawl_360
│   ├── crawl_360
│   │   ├── __init__.py
│   │   ├── __init__.pyc
│   │   ├── items.py
│   │   ├── items.pyc
│   │   ├── middlewares.py
│   │   ├── models
│   │   │   ├── __init__.py
│   │   │   ├── __init__.pyc
│   │   │   ├── db.py
│   │   │   ├── db.pyc
│   │   │   ├── models.py
│   │   │   └── models.pyc
│   │   ├── pipelines.py
│   │   ├── pipelines.pyc
│   │   ├── reademe
│   │   │   └── sql.sql
│   │   ├── settings.py
│   │   ├── settings.pyc
│   │   └── spiders
│   │       ├── __init__.py
│   │       ├── __init__.pyc
│   │       ├── butian.py
│   │       └── butian.pyc
│   └── scrapy.cfg
├── dispatch.py
├── hashtable.py
├── heapq_sort.py
├── httpstat.py
├── img
│   ├── ac_fail_pointer.png
│   ├── btree.png
│   ├── cmd.png
│   ├── crawl_db_data.png
│   ├── crawl_run.gif
│   ├── download.gif
│   ├── knn.png
│   ├── redpackage.gif
│   ├── spider-wx.png
│   ├── svm.png
│   └── tire.png
├── interpreter.py
├── kmp.py
├── knn.py
├── linked_list.py
├── nice_download.py
├── palindrome.py
├── rb_tree.py
├── red_package_optimize.py
├── redpackage.py
├── revert_list.py
├── rpn.py
├── rsa.py
├── selenium.py
├── svm.py
├── tensorflow
│   ├── cnn_test.py
│   ├── create_captcha_img.py
│   └── train.py
└── word.md

/.travis.yml:
--------------------------------------------------------------------------------
language: python
script: true
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Python
[![contributions welcome](https://img.shields.io/badge/contributions-welcome-brightgreen.svg?style=flat)](https://github.com/LockGit/gochat/issues)

### Fetching a WeChat official account's full article history, comments and read counts via a mitm proxy
```
feature
1. scales horizontally
2. incremental updates plus automated capture of read counts and comment data
3. resumes crawling after being blocked
4. rate control for every stage
5. continuous monitoring of specified official accounts
```
![](https://github.com/LockGit/Py/blob/master/img/spider-wx.png)

### nice_download.py multi-threaded file downloader
```
For large files, with enough bandwidth, this can be tens of times faster.
It downloads the target file in chunks, one thread per chunk:
1. Send a HEAD request to get the total file size and whether ranged downloads are
   supported (see the HTTP Range request header and the Content-Range response
   header); nearly every server supports this today.
2. Before downloading, create a local file of the same size as the target.
3. Using the size from step 1, split the file into chunks and let each thread
   download a different chunk.
Small files will not show much speedup; the gap widens on large files.
About HTTP Range:
Downloaders that resume after an interruption instead of restarting from scratch
rely on Range support: they record the file offset at the interruption point. In
practice, write the offset to a temp file when the interrupt exception fires,
read that offset back next time to resume, and delete the temp file once the
download completes.
Note:
nice_download.py is multi-threaded, so resume support was dropped; maintaining
per-thread offsets in temp files is much more complex than maintaining a single
process's offset.
Help: python nice_download.py -h
```
![](https://github.com/LockGit/Py/blob/master/img/download.gif)


### Captcha recognition based on TensorFlow
```
Dependencies:
pip install tensorflow
pip install numpy

0x01, cd tensorflow
0x02, train the model: python train.py
0x03, verify: python cnn_test.py

Plenty of similar examples already exist; test notes and screenshots below:
```
![](https://github.com/LockGit/Hacking/blob/master/img/cnn_test.png)

[Related screenshots and details](https://github.com/LockGit/Hacking#基于机器学习tensorflow的复杂验证码识别)

Summary doc: [基于机器学习(TensorFlow)的复杂验证码识别.pdf](https://github.com/LockGit/Hacking/blob/master/res/doc/基于机器学习(TensorFlow)的复杂验证码识别.pdf)


### redpackage.py && red_package_optimize.py a red-packet allocation scheme
```
red_package_optimize.py is the optimized version; redpackage.py's use of range
wastes memory when the packet count is very large.

Specify the total amount and the number of packets to get each packet's share.

Example: a total of 10 yuan split into 7 packets
➜  Py git:(master) ✗ py redpackage.py 10 7
[0.57, 2.37, 1.91, 0.32, 1.3, 2.24, 1.29]
第 1 个红包金额:0.57元
第 2 个红包金额:2.37元
第 3 个红包金额:1.91元
第 4 个红包金额:0.32元
第 5 个红包金额:1.3元
第 6 个红包金额:2.24元
第 7 个红包金额:1.29元
验证:红包总金额 is 10.0元, 分配后 res sum is 10.0元
```
![](https://github.com/LockGit/Py/blob/master/img/redpackage.gif)

### ac.py string search (trie + Aho-Corasick automaton)
```
Study notes:
If you only have a handful, or a few dozen, local words there is no need for any
of this: keep them in a config file and do a dict lookup, which is far faster
than an HTTP request to an API. But once the word list keeps growing, a flat
file becomes hard to maintain and the lookup is worth turning into a service.

This algorithm shows up in real scenarios: deciding whether a word is a
sensitive word is exactly string search. The word list is wrapped behind an API;
submit a word and the response tells you whether it hit. A hit means the string
exists, i.e. it was found.

Data structure and algorithm background needed:
Reference 1 (the trie, a.k.a. dictionary tree, for massive-data processing):
http://blog.csdn.net/ts173383201/article/details/7858598
Reference 2 (a summary of the Aho-Corasick automaton):
http://blog.csdn.net/mobius_strip/article/details/22549517

The core idea of a trie is trading space for time, the same idea behind rainbow
tables (though a trie is not a rainbow table). In short, a trie uses the common
prefixes of strings to cut query-time cost and so improve efficiency.

It has 3 basic properties:
The root holds no character; every other node holds exactly one character.
Concatenating the characters along the path from the root to a node gives that
node's string.
The children of any node all hold different characters.

Copied someone else's drawing; the structure is roughly the tree below, and all
the code has to do is build that tree:
```
![](https://github.com/LockGit/Py/blob/master/img/tire.png)

```
An illustration of the fail pointer; read the following carefully.
Reference: http://www.cnblogs.com/crazyacking/p/4659501.html
```
![](https://github.com/LockGit/Py/blob/master/img/ac_fail_pointer.png)

```
The words in the tree are:
{ he , hers , his , she }
drawn in 3 layers as in the figure. Look at the third layer, "she":
① s points to root
② h first checks s's fail pointer, finds pointer 0, which is not an h, so h,
  displeased, asks s's fail pointer root: "do you have a son named h?"
  root: "I do, point at him", and h happily points at the h of the first row.
③ now e starts: it first asks its father h "who does your fail pointer point at?"
  h: "that h in the first row of the figure"
  so e trots over and asks that first-row h: "do you have a son with my name?"
  the first-row h: "yes, his address is xxx"
  and e's fail pointer ends up at address xxx, the e of the first row.
So if a string mismatches only after the e of the third row, there must be a "he"
in front of it, and e's fail pointer points exactly at the e of the first row's
"he..."; matching continues from that e instead of rescanning from h, which is
what saves time.
```

```
➜  ~ du -h word.md && wc -l word.md
1.0M	word.md
57193	word.md

A quick local test: 57,000 entries take about 1 MB of disk, so about 6 MB would
hold roughly 340,000 entries. The word.md pushed to GitHub only has a few
entries for the demo, and each word also carries a rank level and a \t
separator, so the real footprint should be even smaller; in production these
data could simply be cached in memory.
```
![](https://github.com/LockGit/Py/blob/master/img/cmd.png)
```
Searching for a given string:

found:
➜  ~ python ac.py lock
Good ! Find it, the item is:
[(0, 3, 'lock', 1, 2)]

found:
➜  ~ python ac.py stop
Good ! Find it, the item is:
[(0, 3, 'stop', 2, 3)]

not found:
➜  ~ python ac.py test
Sorry, The item not in file dict

On a hit it returns a list whose items are tuples, including the start and end
match index positions in the tree.
```

### calc24.py the 24 game
```
Rules: given 4 numbers and the operations + - * /, find a computation whose
result is 24.

get help:
➜  Py git:(master) ✗ py calc24.py -h
Usage: usage -n 1,2,3,4

Options:
  -h, --help  show this help message and exit
  -n NUMS     specify num list

exp:
➜  Py git:(master) ✗ py calc24.py -n 10,8,9,4
[10, 8, 9, 4]
9 - 10 = -1
4 + -1 = 3
8 * 3 = 24
Success

or random test:
➜  Py git:(master) ✗ py calc24.py
[9, 10, 3, 6]
10 - 9 = 1
3 + 1 = 4
6 * 4 = 24
Success

~~~Python's wheels are powerful~~~
```

### rpn.py reverse Polish notation in Python
```
Reverse Polish notation, introduced by the Polish mathematician Jan Łukasiewicz
in 1920, is widely used in compiler theory. All operators are placed after their
operands, hence the name postfix notation; no parentheses are needed to mark
operator precedence, and evaluation with a stack reduces memory accesses.
➜  Py git:(master) ✗ python rpn.py
['11111111111111', '9999999999999', '*', '99', '12', '4', '/', '-', '10', '+', '+']
True 111111111111098888888888995 111111111111098888888888995
True 326 326
```


### dispatch.py rotating queue | coroutine implementation
```
You have several tasks at hand, each long-running, and rather than process them
synchronously you want them to take turns, as with multiple threads.
yield carries no logic here; it is only a marker where execution pauses.
The program flow can stop at that point and resume there later; by implementing
a scheduler, several tasks are processed concurrently. A rotating queue wakes
each task in turn and removes finished tasks from the queue, simulating task
scheduling.
Core code:
from collections import deque

class Runner(object):
    def __init__(self, tasks):
        self.tasks = deque(tasks)

    def next(self):
        return self.tasks.pop()

    def run(self):
        while len(self.tasks):
            task = self.next()
            try:
                next(task)
            except StopIteration:
                pass
            else:
                self.tasks.appendleft(task)

def task(name, times):
    for i in range(times):
        yield
        print(name, i)

Runner([
    task('hsfzxjy', 5),
    task('Jack', 4),
    task('Bob', 6)
]).run()
```

### coroutine.py coroutines via the third-party gevent library
```
dispatch.py above supports coroutines through yield and simulates task
scheduling; the third-party gevent library below makes this even simpler.

gevent gives Python fairly complete coroutine support, implemented with
greenlets. The basic idea: when a greenlet hits an I/O operation, such as a
network access, it automatically switches to another greenlet, and switches
back at a suitable moment once the I/O completes. Since I/O is very slow and
programs often sit waiting on it, gevent's automatic coroutine switching
guarantees some greenlet is always running instead of waiting for I/O.

Because the switching happens on I/O, gevent has to modify parts of Python's
standard library; this is done at startup via monkey patching.

Dependency:
pip install gevent

Run:
➜  Py git:(master) ✗ python coroutine.py
GET: https://www.python.org/
GET: https://www.yahoo.com/
GET: https://github.com/
91430 bytes received from https://github.com/.
47391 bytes received from https://www.python.org/.
461975 bytes received from https://www.yahoo.com/.

```


### base64_str.py how base64 encoding works
```
Base64 encoding implemented in Python; may still have bugs, not a fully
polished version.
1. Prepare an array of 64 characters.
2. Process the binary data 3 bytes at a time, 3x8 = 24 bits, split into 4 groups
   of exactly 6 bits each.
3. The 4 resulting numbers are indexes into the table; the 4 characters looked
   up form the encoded string.
4. If the binary data's length is not a multiple of 3, 1 or 2 bytes remain at
   the end; base64 pads them with \x00 bytes and appends 1 or 2 = signs to
   record how many bytes were padded, and decoding strips them automatically.

Base64 encodes 3 bytes of binary data as 4 bytes of text, a 33% size increase.

Example:
➜  Py git:(master) ✗ python base64_str.py lock
bG9jaw==
➜  Py git:(master) ✗ echo -n lock|base64
bG9jaw==

```

### rsa.py RSA algorithm demo
```
➜  py python rsa.py
下面是一个RSA加解密算法的简单演示:

报文	加密	加密后密文

12	248832	17
15	759375	15
22	5153632	22
5	3125	10


---------------------------
----------执行解密---------
---------------------------
原始报文	密文	加密	解密报文

12	17	1419857	12
15	15	759375	15
22	22	5153632	22
5	10	100000	5
```

### selenium.py automated-testing demo
```
Gotcha 1:
Running python selenium.py could never wake up Chrome.
It turned out chromedriver had been installed long ago without ever running
brew upgrade chromedriver, which made the script error out; upgrading
chromedriver fixed it. The official docs note that selenium supports several
browser drivers. The demo uses Chrome with Python's unittest module; per the
docs, pytest works as well.

Roughly the following DOM lookups are supported, with small differences between
language bindings:
driver.findElement(By.id())
driver.findElement(By.name())
driver.findElement(By.className())
driver.findElement(By.tagName())
driver.findElement(By.linkText())
driver.findElement(By.partialLinkText())
driver.findElement(By.cssSelector())
driver.findElement(By.xpath())

Using Selenium with a remote WebDriver is supported; it listens on port 4444 by
default.
Start: brew services start selenium-server-standalone
Stop: brew services stop selenium-server-standalone
Open http://127.0.0.1:4444 and click console to create the webdriver under
test; for a running driver you can screenshot where the test program currently
is.
```


### Python sandbox escape
```
Revisiting a 2012 hack.lu challenge: the goal is to read the contents of the
'./1.key' file. They first destroy the built-in functions for opening files by
deleting their references, then let you execute user input. A slightly modified
version of their code:

def make_secure():
    UNSAFE = ['open',
              'file',
              'execfile',
              'compile',
              'reload',
              '__import__',
              'eval',
              'input']
    for func in UNSAFE:
        del __builtins__.__dict__[func]

from re import findall

# Remove dangerous builtins
make_secure()

print 'Go Ahead, Expoit me >;D'
while True:
    try:
        # Read user input until the first whitespace character
        inp = findall('\S+', raw_input())[0]
        a = None
        # Set a to the result from executing the user input
        exec 'a=' + inp
        print 'Return Value:', a
    except Exception, e:
        print 'Exception:', e

With file and open no longer referenced in __builtins__, the usual coding
tricks fail, but the Python interpreter still exposes other ways to recover a
replacement for the file or open reference.

An alternative way to read a file:
().__class__.__bases__[0].__subclasses__()[40]('1.key').read()
This still reads the contents of 1.key; coders, hackers and geeks may want to
dig deeper. Tested on Python 2.7.12.
```

### avl_tree.py balanced binary search tree
```
Properties:

1. If its left subtree is non-empty, every node value in it is smaller than the
   root's value.
2. If its right subtree is non-empty, every node value in it is greater than
   the root's value.
3. Its left and right subtrees are themselves binary search trees.
4. The heights of each node's left and right subtrees differ by at most 1.

If an ordinary BST grows deep with long one-sided chains of left (or right)
children, lookups degrade to nearly linear time. Because an AVL tree caps the
height difference of every node's subtrees at 1, its lookup time complexity
stays around O(log n).

➜  Py git:(master) ✗ py avl_tree.py
8
9
1

```
### rb_tree.py red-black tree
```
Red-black trees are mostly used for internal sorting, i.e. data held entirely
in memory; the map and set in Microsoft's STL are implemented with red-black
trees. B-trees are used when the data does not fit in memory and mostly lives
on external storage: a B-tree has few levels, which keeps the number of disk
reads per operation as small as possible.
When the data is small and fits entirely in memory, a red-black tree has lower
time complexity than a B-tree; when the data is large and mostly on external
storage, a B-tree is faster thanks to fewer disk reads.

Properties:
(1) Every node is either black or red.
(2) The root is black.
(3) Every leaf (NIL) is black. [Note: "leaf" here means the empty (NIL or NULL)
    leaf nodes!]
(4) If a node is red, its children must be black.
(5) All paths from a node to its descendant leaves contain the same number of
    black nodes.
```

### revert_list.py reversing a linked list
```
➜  Py git:(master) ✗ py revert_list.py
1
2
3
start revert list ...
3
2
1
```

### palindrome.py palindromes in Python; heapq_sort.py heap sort
```
life is short , use python
- (1) O(n) time, O(1) space: scan from both ends toward the middle.
- (2) O(n) time, O(1) space: start from the middle and expand toward both ends.

Heap sort; Python also ships the corresponding ready-made heapq module.
py heapq_sort.py
```


### kmp.py KMP string search
```
➜  Py git:(master) ✗ python kmp.py
Found 'sase' start at string 'asfdehhaassdsdasasedwa' 15 index position, find use times: 23
Found 'sase' start at string '12s3sasexxx' 4 index position, find use times: 9

Core algorithm:
def kmp(string, match):
    n = len(string)
    m = len(match)
    i = 0
    j = 0
    count_times_used = 0
    while i < n:
        count_times_used += 1
        if match[j] == string[i]:
            if j == m - 1:
                print "Found '%s' start at string '%s' %s index position, find use times: %s" % (match, string, i - m + 1, count_times_used,)
                return
            i += 1
            j += 1
        elif j > 0:
            j = j - 1
        else:
            i += 1

# Note: the mismatch step above (j = j - 1) is a simplification, not the classic
# KMP failure-function jump (j = fail[j - 1]); for self-overlapping patterns it
# can mis-report a match (e.g. pattern 'aba' in text 'abba'), so treat it as a
# learning sketch rather than a reference KMP.
```


### compress.py string compression
```
Compresses strings with many consecutive repeated characters; without long runs
there is no compression gain.
➜  Py git:(master) ✗ python compress.py
原始字符串:xAAACCCBBDBB111
压缩后:x1A3C3B2D1B213
执行解压...
x
A
A
A
C
C
C
B
B
D
B
B
1
1
1
解压完毕
解压后:xAAACCCBBDBB111
```

### hashtable.py a hash table implementation
```
hash_table = HashTable(5)  # allocate 5 buckets
hash_table.set(1, 'x')
print hash_table.get(1)

Core code:
class Item(object):
    def __init__(self, key, value):
        self.key = key
        self.value = value


class HashTable(object):
    def __init__(self, size):
        self.size = size
        self.table = [[] for _ in xrange(self.size)]

    def hash_function(self, key):
        return key % self.size

    def set(self, key, value):
        hash_index = self.hash_function(key)
        for item in self.table[hash_index]:
            if item.key == key:
                item.value = value
                return
        self.table[hash_index].append(Item(key, value))

    def get(self, key):
        hash_index = self.hash_function(key)
        for item in self.table[hash_index]:
            if item.key == key:
                return item.value
        return None

    def remove(self, key):
        hash_index = self.hash_function(key)
        for i, item in enumerate(self.table[hash_index]):
            if item.key == key:
                del self.table[hash_index][i]
                return  # stop: don't keep iterating over the mutated bucket
```


### interpreter.py understanding the Python interpreter
```
Python performs 3 extra steps first: lexing, parsing and compilation. Together
they turn source code into a code object containing instructions the
interpreter can understand; the interpreter's job is then to interpret the
instructions in that code object.
Core code:
class Interpreter:
    def __init__(self):
        self.stack = []

    def load_value(self, number):
        self.stack.append(number)

    def print_answer(self):
        answer = self.stack.pop()
        print(answer)

    def add_two_values(self):
        first_num = self.stack.pop()
        second_num = self.stack.pop()
        total = first_num + second_num
        self.stack.append(total)

    def run_code(self, what_to_execute):
        instructions = what_to_execute["instructions"]
        numbers = what_to_execute["numbers"]
        for each_step in instructions:
            instruction, argument = each_step
            if instruction == "load_value":
                number = numbers[argument]
                self.load_value(number)
            elif instruction == "add_two_values":
                self.add_two_values()
            elif instruction == "print_answer":
                self.print_answer()
```


### linked_list.py fast lookup of a singly linked list's middle node
```
➜  Py git:(master) py linked_list.py
普通遍历方式,单链表中间节点为:n3,索引为:2,遍历一次链表,在从0遍历到中间位置
快慢指针方式,单链表中间节点为:n3,索引为:2,只遍历一次链表

Core code:
class Node(object):
    def __init__(self, data, next):
        self.data = data
        self.next = next

n1 = Node('n1', None)
n2 = Node('n2', n1)
n3 = Node('n3', n2)
n4 = Node('n4', n3)
n5 = Node('n5', n4)

head = n5  # head node of the list

p1 = head  # advances 1 node per step
p2 = head  # advances 2 nodes per step

step = 0
while (p2.next is not None and p2.next.next is not None):
    p2 = p2.next.next
    p1 = p1.next
    step = step + 1
print '快慢指针方式,单链表中间节点为:%s,索引为:%s,只遍历一次链表' % (p1.data, step)
```

### K-nearest neighbours
```
This algorithm is much simpler than svm.
It only needs the two-point distance formula from middle school (Euclidean
distance): compute the target point's distance to each group and see which
group the green point is closer to. k is the number of nearest points
considered: if most of those k points are red, the green point is assigned to
the red group, otherwise to the black group.
k correlates with the number of classes: with 2 groups take k = 3; with 3
groups, k = 5.
Reference: https://zh.wikipedia.org/wiki/最近鄰居法
Dependencies:
pip install numpy
pip install matplotlib

In the figure below, the larger marked red point is assigned to the red group
after the computation.
Run: python knn.py
```
![](https://github.com/LockGit/Py/blob/master/img/knn.png)


### Support vector machine svm.py
```
The svm you will forget sooner or later.
A classification algorithm whose goal is to find an optimal separating
hyperplane; more complex than the knn algorithm.
The demo uses linearly separable data.

Reference 1: https://zh.wikipedia.org/zh-hans/支持向量机
Reference 2: http://blog.csdn.net/viewcode/article/details/12840405
Reference 3: http://blog.csdn.net/lisi1129/article/details/70209945?locationNum=8&fps=1

Dependencies:
pip install numpy
pip install matplotlib

Run: python svm.py
```
![](https://github.com/LockGit/Py/blob/master/img/svm.png)


### btree.py (pre-order, in-order, post-order, level-order)
```
➜  Py git:(master) ✗ python btree.py
前序遍历: root A C D F G B E
中序遍历: C F D G A root B E
后序遍历: F G D C A E B root
层序遍历: root A B C E D F G
The constructed tree structure is shown in the figure below.
```
![](https://github.com/LockGit/Py/blob/master/img/btree.png)


### Scrapy crawler test (project code under the crawl_360 directory)
```
Install dependencies:
pip install Scrapy
pip install sqlalchemy
pip install sqlacodegen
pip install mysql-connector

Create the db: CREATE DATABASE crawl DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci

Create the table: the crawl_360/readme/sql.sql file

Generate the models with sqlacodegen:
sqlacodegen --outfile=models.py mysql://root@localhost:3306/crawl --tables butian


Target page to crawl: http://butian.360.cn/Loo , a list of companies with
disclosed vulnerabilities

Create the project: scrapy startproject crawl_360

Directory layout:
➜  crawl_360 tree
.
├── crawl_360
│   ├── __init__.py
│   ├── __init__.pyc
│   ├── items.py
│   ├── items.pyc
│   ├── middlewares.py
│   ├── models
│   │   ├── __init__.py
│   │   ├── __init__.pyc
│   │   ├── db.py
│   │   ├── db.pyc
│   │   ├── models.py
│   │   └── models.pyc
│   ├── pipelines.py
│   ├── pipelines.pyc
│   ├── reademe
│   │   └── sql.sql
│   ├── settings.py
│   ├── settings.pyc
│   └── spiders
│       ├── __init__.py
│       ├── __init__.pyc
│       ├── butian.py
│       └── butian.pyc
└── scrapy.cfg

Generate a spider:
cd crawl_360 && scrapy genspider butian butian.360.cn/Loo

Write the spider code (under the crawl_360 directory; about 30 lines of xpath
code is enough)

Crawl: scrapy crawl butian

Also: selenium is an excellent tool as well; driving a browser driver through
selenium mimics a real user more closely.
```
![](https://github.com/LockGit/Py/blob/master/img/crawl_run.gif)
![](https://github.com/LockGit/Py/blob/master/img/crawl_db_data.png)



### Celery distributed task queue test (under the celery folder)
```
pip3 install celery
pip3 install redis
Write tasks.py
```
```python
from celery import Celery

app = Celery('TASK', broker='redis://127.0.0.1', backend='redis://127.0.0.1')


@app.task
def add(x, y):
    print 'start ...'
715 | print 'get param :%s,%s' % (x, y,) 716 | return x + y 717 | ``` 718 | ``` 719 | 启动celery worker 来开始监听并执行任务 720 | celery -A tasks worker --loglevel=info 721 | tasks 任务文件名,worker 任务角色,--loglevel=info 任务日志级别 722 | 723 | 127.0.0.1:6379> keys * 724 | 1) "_kombu.binding.celery" 725 | 2) "_kombu.binding.celeryev" 726 | 3) "_kombu.binding.celery.pidbox" 727 | 127.0.0.1:6379> 728 | 729 | redis 集合结构(set),查看value: 730 | SMEMBERS _kombu.binding.celery 731 | 732 | 在tasks.py文件目录打开终端进入py的交互式模式 733 | >>> from tasks import add 734 | >>> add.delay(1,2) 735 | 736 | >>> t = add.delay(4,5) 737 | >>> t.get() 738 | 9 739 | >>> t.ready() 740 | True 741 | 742 | celery常用接口 743 | tasks.add(4,6) ---> 本地执行 744 | tasks.add.delay(3,4) --> worker执行 745 | t=tasks.add.delay(3,4) --> t.get() 获取结果,或卡住,阻塞 746 | t.ready()---> False:未执行完,True:已执行完 747 | t.get(propagate=False) 抛出简单异常,但程序不会停止 748 | t.traceback 追踪完整异常 749 | 750 | 计算结果保存在redis中,默认结果有效期为1天 751 | 127.0.0.1:6379> ttl celery-task-meta-6eb3ee46-e86d-409a-9eb5-0c7d9b005035 752 | (integer) 85917 753 | 127.0.0.1:6379> get celery-task-meta-6eb3ee46-e86d-409a-9eb5-0c7d9b005035 754 | "{\"status\": \"SUCCESS\", \"traceback\": null, \"result\": 9, \"task_id\": \"6eb3ee46-e86d-409a-9eb5-0c7d9b005035\", \"children\": []}" 755 | 127.0.0.1:6379> 756 | ``` 757 | -------------------------------------------------------------------------------- /ac.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # @Author: lock 3 | # @Date: 2017-05-08 16:32:38 4 | # @Last Modified by: lock 5 | # @Last Modified time: 2017-05-08 22:48:16 6 | import time 7 | import logging 8 | import sys 9 | 10 | log = logging.getLogger('dict_word') 11 | 12 | 13 | _word_cells = {} 14 | #预先生成好组成单词的字符 15 | for c in [chr(i) for i in range(ord('a'), ord('z') + 1)]: 16 | _word_cells[unicode(c)] = 1 17 | for c in [chr(i) for i in range(ord('A'), ord('Z') + 1)]: 18 | _word_cells[unicode(c)] = 1 19 | for c in [chr(i) for i in 
range(ord('0'), ord('9') + 1)]: 20 | _word_cells[unicode(c)] = 1 21 | 22 | 23 | #固定的英文单词组成部分 24 | _word_cells[u'_'] = 1 25 | _word_cells[u'-'] = 1 26 | 27 | # 缓存 28 | _cache = { 29 | 'acm': None, 30 | 'load_time': 0 31 | } 32 | 33 | #词默认等级 34 | DEFAULT_RANK=1 35 | 36 | 37 | def isWordCell(a): 38 | ''' 39 | 当前字符是否为单词的非边界,或者是组成部分 40 | :param a: 41 | :return: 42 | ''' 43 | return a in _word_cells 44 | 45 | 46 | 47 | class Node(object): 48 | ''' 49 | Node树节点 50 | :next : 用dict字典结构模拟动态链表 51 | :fail : 辅助初始值None 52 | :param isWord : 当前树节点是否为存在的单词 53 | :param rank : 等级 54 | ''' 55 | def __init__(self): 56 | self.next = {} 57 | self.fail = None 58 | self.isWord = False 59 | self.rank = 0 60 | 61 | 62 | class Ahocorasick(object): 63 | def __init__(self): 64 | self.__root = Node() 65 | 66 | 67 | def make(self): 68 | ''' 69 | build the fail function 70 | 构建自动机,失效函数 71 | ''' 72 | tmpQueue = [] 73 | tmpQueue.append(self.__root) 74 | while (len(tmpQueue) > 0): 75 | temp = tmpQueue.pop() 76 | p = None 77 | for k, v in temp.next.items(): 78 | if temp == self.__root: 79 | temp.next[k].fail = self.__root 80 | else: 81 | p = temp.fail 82 | while p is not None: 83 | if p.next.has_key(k): 84 | temp.next[k].fail = p.next[k] 85 | break 86 | p = p.fail 87 | if p is None: 88 | temp.next[k].fail = self.__root 89 | tmpQueue.append(temp.next[k]) 90 | 91 | 92 | def addWord(self, word, rank=1,line=0): 93 | ''' 94 | @param word: add word to Tire tree 95 | 添加关键词到Tire树中 96 | ''' 97 | word = word.lower() 98 | tmp = self.__root 99 | for i in range(0, len(word)): 100 | if not tmp.next.has_key(word[i]): 101 | tmp.next[word[i]] = Node() 102 | tmp = tmp.next[word[i]] 103 | tmp.isWord = True 104 | tmp.rank = rank 105 | tmp.line = line 106 | 107 | 108 | def search(self, content): 109 | ''' 110 | @return 如果查找到了返回一个list,list中item类型为tuple, 并且包含了匹配的起,终点位置index 111 | ''' 112 | #不区分大小写 113 | raw_content=content 114 | content = content.lower() 115 | 116 | p = self.__root 117 | result = [] 118 | startWordIndex = 0 
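        # Added note (not in the original source): the loop below walks the
        # automaton one character at a time; on a mismatch it follows fail
        # pointers instead of rescanning the text, which is what makes
        # Aho-Corasick matching linear in the length of the input.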
119 | endWordIndex = -1 120 | currentPosition = 0 121 | 122 | content_len = len(content) 123 | while currentPosition < content_len: 124 | word = content[currentPosition] 125 | #print 'word:', word 126 | # 检索状态机,直到匹配 127 | while p.next.has_key(word) == False and p != self.__root: 128 | p = p.fail 129 | 130 | if p.next.has_key(word): 131 | if p == self.__root: 132 | # 若当前节点是根且存在转移状态,则说明是匹配词的开头,记录词的起始位置 133 | startWordIndex = currentPosition 134 | # 转移状态机的状态 135 | p = p.next[word] 136 | else: 137 | p = self.__root 138 | 139 | if p.isWord: 140 | # 若状态为词的结尾,则把词放进结果集 141 | # 判断当前这些位置是否为单词的边界 142 | if startWordIndex > 0 and isWordCell(content[startWordIndex - 1]) and isWordCell(content[startWordIndex]): 143 | # 当前字符和前面的字符都是字母,那么它是连续单词 144 | # print '前面不是单词边界', [startWordIndex > 0, str(content[startWordIndex - 1].encode('utf-8')),isWordCell(content[startWordIndex - 1]),str(content[startWordIndex].encode('utf-8')),isWordCell(content[startWordIndex])] 145 | currentPosition += 1 146 | continue 147 | 148 | if currentPosition < content_len - 1 and isWordCell(content[currentPosition + 1]) and isWordCell(content[currentPosition]): 149 | # print '后面不是单词边界' 150 | currentPosition += 1 151 | continue 152 | 153 | result.append((startWordIndex, currentPosition, raw_content[startWordIndex:currentPosition + 1], p.rank,p.line)) 154 | 155 | currentPosition += 1 156 | return result 157 | 158 | 159 | 160 | def load_acm(filename): 161 | ''' 162 | 加载词表 163 | :param filename: 164 | :return: 165 | 词表 分为很多行 166 | 每行 有2列组成 167 | 词 [tab] 等级 168 | exp: 169 | sharen [\t] 2 170 | ''' 171 | import os.path 172 | 173 | mtime = os.path.getmtime(filename) 174 | if _cache['load_time'] < mtime or _cache['acm'] is None: 175 | log.info('start load data') 176 | _cache['load_time'] = mtime 177 | start_time = time.time() 178 | acm = Ahocorasick() 179 | 180 | with open(filename) as fp: 181 | line_count = 0 182 | for line in fp: 183 | line_count += 1 184 | w = line.strip().decode('utf-8') 185 | arr2 = w.split('\t') 
186 | # 默认等级 187 | if len(arr2) == 1: 188 | arr2.append(DEFAULT_RANK) 189 | try: 190 | acm.addWord(arr2[0], int(arr2[1]), line_count) 191 | except Exception as e: 192 | print 'error', e 193 | print 'line', line_count, line 194 | acm.make() 195 | _cache['acm'] = acm 196 | log.info('load ok time:%.2f' % (time.time() - start_time)) 197 | else: 198 | # print 'hit cache' 199 | pass 200 | 201 | 202 | return _cache['acm'] 203 | 204 | def help(): 205 | print "example: python ac.py str\n" 206 | 207 | 208 | if __name__ == '__main__': 209 | 210 | args = sys.argv 211 | if len(args) != 2: 212 | help() 213 | exit() 214 | 215 | # 预加载 216 | acm = load_acm('./word.md') 217 | # 指定搜索的文本 218 | content = args[1] 219 | search_result = acm.search(content) 220 | if len(search_result) > 0: 221 | print 'Good ! Find it, the item is:\n%s'%(search_result) 222 | else: 223 | print 'Sorry, The item not in file dict' 224 | 225 | -------------------------------------------------------------------------------- /avl_tree.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # encoding: utf-8 3 | # author: Lock 4 | # Created by Vim 5 | """ 6 | 平衡二叉搜索树 7 | 1、若它的左子树不为空,则左子树上所有的节点值都小于它的根节点值。 8 | 2、若它的右子树不为空,则右子树上所有的节点值均大于它的根节点值。 9 | 3、它的左右子树也分别可以充当为二叉查找树。 10 | 4、每个节点的左子树和右子树的高度差至多等于1。 11 | """ 12 | 13 | 14 | class Node(object): 15 | def __init__(self, key): 16 | self.key = key 17 | self.left = None 18 | self.right = None 19 | self.height = 0 20 | 21 | 22 | class AvlTree(object): 23 | def __init__(self): 24 | self.root = None 25 | 26 | def find(self, key): 27 | if self.root is None: 28 | return None 29 | else: 30 | return self._find(key, self.root) 31 | 32 | def _find(self, key, node): 33 | if node is None: 34 | return None 35 | elif key < node.key: 36 | return self._find(key, self.left) 37 | elif key > node.key: 38 | return self._find(key, self.right) 39 | else: 40 | return node 41 | 42 | def find_min(self): 43 | if self.root is None: 44 | 
return None 45 | else: 46 | return self._find_min(self.root) 47 | 48 | def _find_min(self, node): 49 | if node.left: 50 | return self._find_min(node.left) 51 | else: 52 | return node 53 | 54 | def find_max(self): 55 | if self.root is None: 56 | return None 57 | else: 58 | return self._find_max(self.root) 59 | 60 | def _find_max(self, node): 61 | if node.right: 62 | return self._find_max(node.right) 63 | else: 64 | return node 65 | 66 | def height(self, node): 67 | if node is None: 68 | return -1 69 | else: 70 | return node.height 71 | 72 | def single_left_rotate(self, node): 73 | k1 = node.left 74 | node.left = k1.right 75 | k1.right = node 76 | node.height = max(self.height(node.right), self.height(node.left)) + 1 77 | k1.height = max(self.height(k1.left), node.height) + 1 78 | return k1 79 | 80 | def single_right_rotate(self, node): 81 | k1 = node.right 82 | node.right = k1.left 83 | k1.left = node 84 | node.height = max(self.height(node.right), self.height(node.left)) + 1 85 | k1.height = max(self.height(k1.right), node.height) + 1 86 | return k1 87 | 88 | def double_left_rotate(self, node): 89 | node.left = self.single_right_rotate(node.left) 90 | return self.single_left_rotate(node) 91 | 92 | def double_right_rotate(self, node): 93 | node.right = self.single_left_rotate(node.right) 94 | return self.single_right_rotate(node) 95 | 96 | def put(self, key): 97 | if not self.root: 98 | self.root = Node(key) 99 | else: 100 | self.root = self._put(key, self.root) 101 | 102 | def _put(self, key, node): 103 | if node is None: 104 | node = Node(key) 105 | elif key < node.key: 106 | node.left = self._put(key, node.left) 107 | if (self.height(node.left) - self.height(node.right)) == 2: 108 | if key < node.left.key: 109 | node = self.single_left_rotate(node) 110 | else: 111 | node = self.double_left_rotate(node) 112 | elif key > node.key: 113 | node.right = self._put(key, node.right) 114 | if (self.height(node.right) - self.height(node.left)) == 2: 115 | if key < 
node.right.key: 116 | node = self.double_right_rotate(node) 117 | else: 118 | node = self.single_right_rotate(node) 119 | 120 | node.height = max(self.height(node.right), self.height(node.left)) + 1 121 | return node 122 | 123 | def delete(self, key): 124 | self.root = self.remove(key, self.root) 125 | 126 | def remove(self, key, node): 127 | if node is None: 128 | raise KeyError, 'Error,key not in tree' 129 | elif key < node.key: 130 | node.left = self.remove(key, node.left) 131 | if (self.height(node.right) - self.height(node.left)) == 2: 132 | if self.height(node.right.right) >= self.height(node.right.left): 133 | node = self.single_right_rotate(node) 134 | else: 135 | node = self.double_right_rotate(node) 136 | node.height = max(self.height(node.left), self.height(node.right)) + 1 137 | elif key > node.key: 138 | node.right = self.remove(key, node.right) 139 | if (self.height(node.left) - self.height(node.right)) == 2: 140 | if self.height(node.left.left) >= self.height(node.left.right): 141 | node = self.single_left_rotate(node) 142 | else: 143 | node = self.double_left_rotate(node) 144 | node.height = max(self.height(node.left), self.height(node.right)) + 1 145 | elif node.left and node.right: 146 | if node.left.height <= node.right.height: 147 | min_node = self._find_min(node.right) 148 | node.key = min_node.key 149 | node.right = self.remove(node.key, node.right) 150 | else: 151 | max_node = self._find_max(node.left) 152 | node.key = max_node.key 153 | node.left = self.remove(node.key, node.left) 154 | node.height = max(self.height(node.left), self.height(node.right)) + 1 155 | else: 156 | if node.right: 157 | node = node.right 158 | else: 159 | node = node.left 160 | 161 | return node 162 | 163 | 164 | if __name__ == '__main__': 165 | avlTree = AvlTree() 166 | avlTree.put(1) 167 | avlTree.put(2) 168 | avlTree.put(3) 169 | avlTree.put(4) 170 | avlTree.put(5) 171 | avlTree.put(6) 172 | avlTree.put(7) 173 | avlTree.put(8) 174 | print avlTree.find_max().key 
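    # Added note (not in the original source): with keys 1..8 inserted,
    # find_max() prints 8 here; after put(9) below it prints 9, and
    # find_min() prints 1, matching the README's sample output of 8 / 9 / 1.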
175 | avlTree.put(9) 176 | print avlTree.find_max().key 177 | print avlTree.find_min().key 178 | -------------------------------------------------------------------------------- /base64_str.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # @Author: lock 3 | # @Date: 2016-09-14 00:33:53 4 | # @Last Modified by: lock 5 | # @Last Modified time: 2016-09-14 00:49:09 6 | import string 7 | import sys 8 | def get_payloads(): 9 | payloads = list(string.ascii_uppercase) 10 | payloads = payloads + list(string.ascii_lowercase) 11 | for i in xrange(0,10): 12 | payloads.append(i) 13 | payloads.extend(['+','-']) 14 | return payloads 15 | 16 | def encode(s): 17 | if s=='': 18 | return '' 19 | if len(s)%3==1: 20 | s = s+'00' 21 | elif len(s)%3==2: 22 | s = s+'0' 23 | bin_code,tmp = [],[] 24 | for i in xrange(0,len(s),3): 25 | code = s[i:i+3] 26 | for j in code: 27 | if j=='0': 28 | bin_code.append('0'*8*len(j)) 29 | else: 30 | bin_code.append(bin(ord(j)).replace('0b','0')) # 10进制 to 2进制 31 | 32 | base_str = ''.join(map(str,bin_code)) 33 | translate_list = [] 34 | for bit in xrange(0,len(base_str),6): 35 | split_code = base_str[bit:bit+6] 36 | translate_list.append(str(int(split_code,2))) #二进制 to 十进制 37 | payloads = get_payloads() 38 | for i in translate_list: 39 | if i=='0': 40 | tmp.append('=') 41 | else: 42 | tmp.append(payloads[int(i)]) 43 | return ''.join(map(str,tmp)) 44 | 45 | def help(): 46 | print 'args error!\nexample:\n\tpython base64.py lock' 47 | exit() 48 | 49 | if __name__ == '__main__': 50 | args = sys.argv 51 | if len(args) > 2 or len(args)==1: 52 | help() 53 | print(encode(s = args[1])) 54 | -------------------------------------------------------------------------------- /btree.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # encoding: utf-8 3 | # author: Lock 4 | # time: 2018/1/24 00:38 5 | class BTree: 6 | def __init__(self, 
value): 7 | self.left = None 8 | self.data = value 9 | self.right = None 10 | 11 | def insertLeft(self, value): 12 | self.left = BTree(value) 13 | return self.left 14 | 15 | def insertRight(self, value): 16 | self.right = BTree(value) 17 | return self.right 18 | 19 | def show(self): 20 | print self.data, 21 | 22 | 23 | def preorder(node): 24 | if node.data: 25 | node.show() 26 | if node.left: 27 | preorder(node.left) 28 | if node.right: 29 | preorder(node.right) 30 | 31 | 32 | def inorder(node): 33 | if node.data: 34 | if node.left: 35 | inorder(node.left) 36 | node.show() 37 | if node.right: 38 | inorder(node.right) 39 | 40 | 41 | def postorder(node): 42 | if node.data: 43 | if node.left: 44 | postorder(node.left) 45 | if node.right: 46 | postorder(node.right) 47 | node.show() 48 | 49 | 50 | def layerorder(node): 51 | queue = [node] # used as a FIFO queue: pop(0) yields nodes level by level 52 | while len(queue): 53 | node = queue.pop(0) 54 | if node.data: 55 | node.show() 56 | if node.left: 57 | queue.append(node.left) 58 | if node.right: 59 | queue.append(node.right) 60 | 61 | 62 | if __name__ == "__main__": 63 | Root = BTree("root") 64 | A = Root.insertLeft("A") 65 | C = A.insertLeft("C") 66 | D = C.insertRight("D") 67 | F = D.insertLeft("F") 68 | G = D.insertRight("G") 69 | B = Root.insertRight("B") 70 | E = B.insertRight("E") 71 | 72 | print "Preorder:", 73 | preorder(Root) 74 | 75 | print "" 76 | print "Inorder:", 77 | inorder(Root) 78 | 79 | print "" 80 | print "Postorder:", 81 | postorder(Root) 82 | 83 | print "" 84 | print "Level order:", 85 | layerorder(Root) 86 | -------------------------------------------------------------------------------- /calc24.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # @Author: lock 3 | # @Date: 2017-06-09 22:48:14 4 | # @Last Modified by: lock 5 | # @Last Modified time: 2018-06-29 12:38:35 6 | 7 | import optparse 8 | import itertools 9 | import random 10 | 11 | 12 | # shuffle: return the first m of n shuffled indices (partial Fisher-Yates) 13 | def shuffle(n, m=-1): 14 | if m
== -1: 15 | m = n 16 | l = range(n) 17 | for i in range(len(l) - 1): 18 | x = random.randint(i, len(l) - 1) 19 | l[x], l[i] = l[i], l[x] 20 | if i == m - 1: 21 | break 22 | return l[:m] # the first m shuffled indices 23 | 24 | 25 | # deal 4 cards from a 52-card deck (J/Q/K count as 10) 26 | def Get4Card(): 27 | card = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10, 10] * 4 28 | cardidxs = shuffle(52, 4) 29 | return [card[idx] for idx in cardidxs] 30 | 31 | 32 | def GenAllExpr(card_4, ops_iter): 33 | try: 34 | while True: 35 | l = list(ops_iter.next()) + card_4 36 | its = itertools.permutations(l, len(l)) 37 | try: 38 | while True: 39 | yield its.next() 40 | except StopIteration: 41 | pass 42 | except StopIteration: 43 | pass 44 | 45 | 46 | def CalcRes(expr, isprint=False): 47 | opmap = {'+': lambda a, b: a + b, '-': lambda a, b: a - b, '*': lambda a, b: a * b, 48 | '/': lambda a, b: a / (b + 0.0)} 49 | expr_stack = [] 50 | while expr: 51 | t = expr.pop(0) 52 | if type(t) == int: 53 | expr_stack.append(t) 54 | else: 55 | if len(expr_stack) < 2: 56 | return False 57 | else: 58 | a = expr_stack.pop() # top of stack; operands apply as "a op b", the reverse of usual RPN order -- harmless here since every permutation is tried 59 | b = expr_stack.pop() 60 | if isprint: 61 | print a, t, b, '=', opmap[t](a, b) 62 | try: 63 | expr_stack.append(opmap[t](a, b)) 64 | except ZeroDivisionError: 65 | return False 66 | return expr_stack[0] 67 | 68 | 69 | if __name__ == "__main__": 70 | parser = optparse.OptionParser('usage -n 1,2,3,4') 71 | parser.add_option('-n', dest='nums', type='string', help='specify num list') 72 | (options, args) = parser.parse_args() 73 | nums = options.nums 74 | if nums is None: 75 | input_card = Get4Card() 76 | else: 77 | input_card = [int(x) for x in nums.split(',')] 78 | card = input_card 79 | if len(input_card) != 4: 80 | print(parser.usage) 81 | exit(0) 82 | print card 83 | ops = itertools.combinations_with_replacement('+-*/', 3) # a 24-point expression needs exactly 3 operators 84 | allexpr = GenAllExpr(card, ops) # interleave numbers and operators to enumerate every candidate sequence 85 | for expr in allexpr: 86 | res = CalcRes(list(expr)) 87 | if res and abs(res - 24) < 1e-6: # compare with a tolerance: division yields floats 88 |
CalcRes(list(expr), True) # print the calculation steps 89 | print "Success" 90 | break 91 | -------------------------------------------------------------------------------- /celery/tasks.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # encoding: utf-8 3 | # author: Lock 4 | # time: 2018/5/2 11:19 5 | 6 | from celery import Celery 7 | 8 | app = Celery('TASK', broker='redis://127.0.0.1', backend='redis://127.0.0.1') 9 | 10 | 11 | @app.task 12 | def add(x, y): 13 | print 'start ...' 14 | print 'get param :%s,%s' % (x, y,) 15 | return x + y 16 | -------------------------------------------------------------------------------- /compress.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # @Author: lock 3 | # @Date: 2017-12-15 00:11:32 4 | # @Last Modified by: lock 5 | # @Last Modified time: 2017-12-15 00:34:55 6 | 7 | 8 | def compress(string): 9 | compressed = [] 10 | count = 0 11 | temp = string[0] 12 | 13 | for i in range(0, len(string)): 14 | if temp == string[i]: 15 | count = count + 1 16 | else: 17 | compressed.append(str(temp) + str(count)) 18 | count = 1 19 | temp = string[i] 20 | 21 | if i == len(string) - 1: 22 | compressed.append(str(temp) + str(count)) 23 | 24 | return ''.join([str(x) for x in compressed]) 25 | 26 | 27 | def decompress(string): 28 | print 'decompressing ...' 29 | decompress_list = [] 30 | for j in xrange(0, len(string) - 1): # walk (char, count) pairs; only single-digit run lengths round-trip 31 | if j % 2 == 0: 32 | for i in xrange(0, int(string[j + 1])): 33 | decompress_list.append(string[j]) 34 | print 'decompress done' 35 | return ''.join(decompress_list) 36 | 37 | def main(): 38 | string = "xAAACCCBBDBB111" 39 | print 'original: %s' % (string,) 40 | print 'compressed: %s' % (compress(string),) 41 | print 'decompressed: %s' % (decompress(compress(string)),) 42 | 43 | if __name__ == '__main__': 44 | main() 45 | -------------------------------------------------------------------------------- /coroutine.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # @Author: lock 3 | # @Date: 2017-05-03 00:06:17 4 | # @Last Modified by: lock 5 | # @Last Modified time: 2017-05-03 00:06:22 6 | from gevent import monkey; monkey.patch_all() 7 | import gevent 8 | import urllib2 9 | 10 | def f(url): 11 | print('GET: %s' % url) 12 | resp = urllib2.urlopen(url) 13 | data = resp.read() 14 | print('%d bytes received from %s.'
% (len(data), url)) 15 | 16 | gevent.joinall([ 17 | gevent.spawn(f, 'https://www.python.org/'), 18 | gevent.spawn(f, 'https://www.yahoo.com/'), 19 | gevent.spawn(f, 'https://github.com/'), 20 | ]) -------------------------------------------------------------------------------- /crawl_360/crawl_360/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/LockGit/Py/795feb1efcd03296501127201a812adff3fa150f/crawl_360/crawl_360/__init__.py -------------------------------------------------------------------------------- /crawl_360/crawl_360/__init__.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/LockGit/Py/795feb1efcd03296501127201a812adff3fa150f/crawl_360/crawl_360/__init__.pyc -------------------------------------------------------------------------------- /crawl_360/crawl_360/items.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | 3 | # Define here the models for your scraped items 4 | # 5 | # See documentation in: 6 | # https://doc.scrapy.org/en/latest/topics/items.html 7 | 8 | import scrapy 9 | 10 | 11 | class Crawl360Item(scrapy.Item): 12 | # define the fields for your item here like: 13 | # name = scrapy.Field() 14 | pass 15 | 16 | 17 | class ButianItem(scrapy.Item): 18 | author = scrapy.Field() # reporter 19 | company_name = scrapy.Field() # company name 20 | vul_name = scrapy.Field() # vulnerability name, e.g. "SQL injection" 21 | vul_level = scrapy.Field() # severity, e.g. "high" 22 | vul_type = scrapy.Field() # category, e.g. "generic" 23 | vul_money = scrapy.Field() # bounty amount 24 | vul_find_time = scrapy.Field() # discovery time 25 | link_url = scrapy.Field() # crawled page URL 26 | create_time = scrapy.Field() # row creation time 27 | -------------------------------------------------------------------------------- /crawl_360/crawl_360/items.pyc: --------------------------------------------------------------------------------
https://raw.githubusercontent.com/LockGit/Py/795feb1efcd03296501127201a812adff3fa150f/crawl_360/crawl_360/items.pyc -------------------------------------------------------------------------------- /crawl_360/crawl_360/middlewares.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | 3 | # Define here the models for your spider middleware 4 | # 5 | # See documentation in: 6 | # https://doc.scrapy.org/en/latest/topics/spider-middleware.html 7 | 8 | from scrapy import signals 9 | 10 | 11 | class Crawl360SpiderMiddleware(object): 12 | # Not all methods need to be defined. If a method is not defined, 13 | # scrapy acts as if the spider middleware does not modify the 14 | # passed objects. 15 | 16 | @classmethod 17 | def from_crawler(cls, crawler): 18 | # This method is used by Scrapy to create your spiders. 19 | s = cls() 20 | crawler.signals.connect(s.spider_opened, signal=signals.spider_opened) 21 | return s 22 | 23 | def process_spider_input(self, response, spider): 24 | # Called for each response that goes through the spider 25 | # middleware and into the spider. 26 | 27 | # Should return None or raise an exception. 28 | return None 29 | 30 | def process_spider_output(self, response, result, spider): 31 | # Called with the results returned from the Spider, after 32 | # it has processed the response. 33 | 34 | # Must return an iterable of Request, dict or Item objects. 35 | for i in result: 36 | yield i 37 | 38 | def process_spider_exception(self, response, exception, spider): 39 | # Called when a spider or process_spider_input() method 40 | # (from other spider middleware) raises an exception. 41 | 42 | # Should return either None or an iterable of Response, dict 43 | # or Item objects. 
44 | pass 45 | 46 | def process_start_requests(self, start_requests, spider): 47 | # Called with the start requests of the spider, and works 48 | # similarly to the process_spider_output() method, except 49 | # that it doesn’t have a response associated. 50 | 51 | # Must return only requests (not items). 52 | for r in start_requests: 53 | yield r 54 | 55 | def spider_opened(self, spider): 56 | spider.logger.info('Spider opened: %s' % spider.name) 57 | 58 | 59 | class Crawl360DownloaderMiddleware(object): 60 | # Not all methods need to be defined. If a method is not defined, 61 | # scrapy acts as if the downloader middleware does not modify the 62 | # passed objects. 63 | 64 | @classmethod 65 | def from_crawler(cls, crawler): 66 | # This method is used by Scrapy to create your spiders. 67 | s = cls() 68 | crawler.signals.connect(s.spider_opened, signal=signals.spider_opened) 69 | return s 70 | 71 | def process_request(self, request, spider): 72 | # Called for each request that goes through the downloader 73 | # middleware. 74 | 75 | # Must either: 76 | # - return None: continue processing this request 77 | # - or return a Response object 78 | # - or return a Request object 79 | # - or raise IgnoreRequest: process_exception() methods of 80 | # installed downloader middleware will be called 81 | return None 82 | 83 | def process_response(self, request, response, spider): 84 | # Called with the response returned from the downloader. 85 | 86 | # Must either; 87 | # - return a Response object 88 | # - return a Request object 89 | # - or raise IgnoreRequest 90 | return response 91 | 92 | def process_exception(self, request, exception, spider): 93 | # Called when a download handler or a process_request() 94 | # (from other downloader middleware) raises an exception. 
95 | 96 | # Must either: 97 | # - return None: continue processing this exception 98 | # - return a Response object: stops process_exception() chain 99 | # - return a Request object: stops process_exception() chain 100 | pass 101 | 102 | def spider_opened(self, spider): 103 | spider.logger.info('Spider opened: %s' % spider.name) 104 | -------------------------------------------------------------------------------- /crawl_360/crawl_360/models/__init__.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # encoding: utf-8 3 | # author: Lock 4 | # time: 2018/4/28 11:35 5 | 6 | -------------------------------------------------------------------------------- /crawl_360/crawl_360/models/__init__.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/LockGit/Py/795feb1efcd03296501127201a812adff3fa150f/crawl_360/crawl_360/models/__init__.pyc -------------------------------------------------------------------------------- /crawl_360/crawl_360/models/db.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # encoding: utf-8 3 | # author: Lock 4 | # time: 2018/4/28 11:36 5 | 6 | from sqlalchemy import create_engine 7 | from sqlalchemy.orm import sessionmaker 8 | from sqlalchemy.ext.declarative import declarative_base 9 | 10 | # declarative base class that ORM models inherit from: 11 | Base = declarative_base() 12 | 13 | CONFIG = { 14 | 'db_host': '127.0.0.1', 15 | 'db_user': 'root', 16 | 'db_pass': '', 17 | 'db_port': 3306, 18 | 'db_name': 'crawl' 19 | } 20 | 21 | # initialize the database engine: 22 | engine = create_engine('mysql+mysqlconnector://%s:%s@%s:%s/%s' % ( 23 | CONFIG.get('db_user'), 24 | CONFIG.get('db_pass'), 25 | CONFIG.get('db_host'), 26 | CONFIG.get('db_port'), 27 | CONFIG.get('db_name'), 28 | )) 29 | 30 | # session factory bound to the engine: 31 | DBSession = sessionmaker(bind=engine) 32 |
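The connection URL in db.py above is assembled by %-formatting the CONFIG dict. A minimal sketch of that same formatting (Python 3 here for convenience — the repo itself is Python 2 — and `build_dsn` is a hypothetical helper, not part of the project):

```python
# Mirrors how db.py builds its SQLAlchemy engine URL from CONFIG.
# build_dsn is illustrative only; the project inlines this in create_engine().
CONFIG = {
    'db_host': '127.0.0.1',
    'db_user': 'root',
    'db_pass': '',
    'db_port': 3306,
    'db_name': 'crawl',
}

def build_dsn(cfg):
    # Template: mysql+mysqlconnector://user:password@host:port/dbname
    return 'mysql+mysqlconnector://%s:%s@%s:%s/%s' % (
        cfg.get('db_user'),
        cfg.get('db_pass'),
        cfg.get('db_host'),
        cfg.get('db_port'),
        cfg.get('db_name'),
    )

# With an empty password the "user:@host" form is produced:
print(build_dsn(CONFIG))  # mysql+mysqlconnector://root:@127.0.0.1:3306/crawl
```

`sessionmaker(bind=engine)` then returns the DBSession factory; each `DBSession()` call yields an independent session, which is why pipelines.py can create one per pipeline instance.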
-------------------------------------------------------------------------------- /crawl_360/crawl_360/models/db.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/LockGit/Py/795feb1efcd03296501127201a812adff3fa150f/crawl_360/crawl_360/models/db.pyc -------------------------------------------------------------------------------- /crawl_360/crawl_360/models/models.py: -------------------------------------------------------------------------------- 1 | # coding: utf-8 2 | from sqlalchemy import Column, DateTime, Integer, Numeric, String, text 3 | from sqlalchemy.ext.declarative import declarative_base 4 | 5 | Base = declarative_base() 6 | metadata = Base.metadata 7 | 8 | 9 | class Butian(Base): 10 | __tablename__ = 'butian' 11 | 12 | id = Column(Integer, primary_key=True) 13 | author = Column(String(100), nullable=False, server_default=text("''")) 14 | company_name = Column(String(100), nullable=False, server_default=text("''")) 15 | vul_level = Column(String(100), nullable=False, server_default=text("''")) 16 | vul_name = Column(String(100), nullable=False, server_default=text("''")) 17 | vul_money = Column(Numeric(10, 2), nullable=False) 18 | vul_find_time = Column(DateTime, nullable=False, server_default=text("'0000-00-00 00:00:00'")) 19 | link_url = Column(String(255), nullable=False, server_default=text("''")) 20 | create_time = Column(DateTime, nullable=False) 21 | -------------------------------------------------------------------------------- /crawl_360/crawl_360/models/models.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/LockGit/Py/795feb1efcd03296501127201a812adff3fa150f/crawl_360/crawl_360/models/models.pyc -------------------------------------------------------------------------------- /crawl_360/crawl_360/pipelines.py: -------------------------------------------------------------------------------- 1 | # -*- 
coding: utf-8 -*- 2 | 3 | # Define your item pipelines here 4 | # 5 | # Don't forget to add your pipeline to the ITEM_PIPELINES setting 6 | # See: https://doc.scrapy.org/en/latest/topics/item-pipeline.html 7 | from crawl_360.items import ButianItem 8 | from crawl_360.models.db import DBSession 9 | from crawl_360.models.models import Butian 10 | 11 | 12 | class Crawl360Pipeline(object): 13 | def __init__(self): 14 | self.db_session = DBSession() 15 | 16 | def process_item(self, item, spider): 17 | if isinstance(item, ButianItem): 18 | data_item = Butian(**item) 19 | # persist the scraped row 20 | self.db_session.add(data_item) 21 | try: 22 | self.db_session.commit() 23 | except Exception, e: 24 | print e.message 25 | self.db_session.rollback() 26 | return item 27 | 28 | def close_spider(self, spider): 29 | self.db_session.close() 30 | -------------------------------------------------------------------------------- /crawl_360/crawl_360/pipelines.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/LockGit/Py/795feb1efcd03296501127201a812adff3fa150f/crawl_360/crawl_360/pipelines.pyc -------------------------------------------------------------------------------- /crawl_360/crawl_360/reademe/sql.sql: -------------------------------------------------------------------------------- 1 | CREATE TABLE butian ( 2 | id INT(11) UNSIGNED NOT NULL AUTO_INCREMENT COMMENT 'auto-increment id', 3 | author VARCHAR(100) not null DEFAULT '' COMMENT 'reporter', 4 | company_name VARCHAR(100) NOT NULL DEFAULT '' COMMENT 'company name', 5 | vul_level VARCHAR(100) not null DEFAULT '' COMMENT 'vulnerability severity', 6 | vul_name VARCHAR(100) not null DEFAULT '' COMMENT 'vulnerability name', 7 | vul_money DECIMAL(10,2) not NULL DEFAULT 0 COMMENT 'bounty amount', 8 | vul_find_time DATETIME not NULL DEFAULT '0000-00-00 00:00:00' COMMENT 'discovery time', 9 | link_url VARCHAR(255) not null DEFAULT '' COMMENT 'page url', 10 | create_time TIMESTAMP not null DEFAULT current_timestamp COMMENT 'row creation time', 11 | PRIMARY KEY (id) 12 |
)ENGINE=INNODB DEFAULT CHARSET utf8; 13 | -------------------------------------------------------------------------------- /crawl_360/crawl_360/settings.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | 3 | # Scrapy settings for crawl_360 project 4 | # 5 | # For simplicity, this file contains only settings considered important or 6 | # commonly used. You can find more settings consulting the documentation: 7 | # 8 | # https://doc.scrapy.org/en/latest/topics/settings.html 9 | # https://doc.scrapy.org/en/latest/topics/downloader-middleware.html 10 | # https://doc.scrapy.org/en/latest/topics/spider-middleware.html 11 | 12 | BOT_NAME = 'crawl_360' 13 | 14 | SPIDER_MODULES = ['crawl_360.spiders'] 15 | NEWSPIDER_MODULE = 'crawl_360.spiders' 16 | 17 | 18 | # Crawl responsibly by identifying yourself (and your website) on the user-agent 19 | #USER_AGENT = 'crawl_360 (+http://www.yourdomain.com)' 20 | 21 | # Obey robots.txt rules 22 | ROBOTSTXT_OBEY = True 23 | 24 | # Configure maximum concurrent requests performed by Scrapy (default: 16) 25 | #CONCURRENT_REQUESTS = 32 26 | 27 | # Configure a delay for requests for the same website (default: 0) 28 | # See https://doc.scrapy.org/en/latest/topics/settings.html#download-delay 29 | # See also autothrottle settings and docs 30 | #DOWNLOAD_DELAY = 3 31 | # The download delay setting will honor only one of: 32 | #CONCURRENT_REQUESTS_PER_DOMAIN = 16 33 | #CONCURRENT_REQUESTS_PER_IP = 16 34 | 35 | # Disable cookies (enabled by default) 36 | #COOKIES_ENABLED = False 37 | 38 | # Disable Telnet Console (enabled by default) 39 | #TELNETCONSOLE_ENABLED = False 40 | 41 | # Override the default request headers: 42 | #DEFAULT_REQUEST_HEADERS = { 43 | # 'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8', 44 | # 'Accept-Language': 'en', 45 | #} 46 | 47 | # Enable or disable spider middlewares 48 | # See 
https://doc.scrapy.org/en/latest/topics/spider-middleware.html 49 | #SPIDER_MIDDLEWARES = { 50 | # 'crawl_360.middlewares.Crawl360SpiderMiddleware': 543, 51 | #} 52 | 53 | # Enable or disable downloader middlewares 54 | # See https://doc.scrapy.org/en/latest/topics/downloader-middleware.html 55 | #DOWNLOADER_MIDDLEWARES = { 56 | # 'crawl_360.middlewares.Crawl360DownloaderMiddleware': 543, 57 | #} 58 | 59 | # Enable or disable extensions 60 | # See https://doc.scrapy.org/en/latest/topics/extensions.html 61 | #EXTENSIONS = { 62 | # 'scrapy.extensions.telnet.TelnetConsole': None, 63 | #} 64 | 65 | # Configure item pipelines 66 | # See https://doc.scrapy.org/en/latest/topics/item-pipeline.html 67 | ITEM_PIPELINES = { 68 | 'crawl_360.pipelines.Crawl360Pipeline': 300, 69 | } 70 | 71 | # Enable and configure the AutoThrottle extension (disabled by default) 72 | # See https://doc.scrapy.org/en/latest/topics/autothrottle.html 73 | #AUTOTHROTTLE_ENABLED = True 74 | # The initial download delay 75 | #AUTOTHROTTLE_START_DELAY = 5 76 | # The maximum download delay to be set in case of high latencies 77 | #AUTOTHROTTLE_MAX_DELAY = 60 78 | # The average number of requests Scrapy should be sending in parallel to 79 | # each remote server 80 | #AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0 81 | # Enable showing throttling stats for every response received: 82 | #AUTOTHROTTLE_DEBUG = False 83 | 84 | # Enable and configure HTTP caching (disabled by default) 85 | # See https://doc.scrapy.org/en/latest/topics/downloader-middleware.html#httpcache-middleware-settings 86 | #HTTPCACHE_ENABLED = True 87 | #HTTPCACHE_EXPIRATION_SECS = 0 88 | #HTTPCACHE_DIR = 'httpcache' 89 | #HTTPCACHE_IGNORE_HTTP_CODES = [] 90 | #HTTPCACHE_STORAGE = 'scrapy.extensions.httpcache.FilesystemCacheStorage' 91 | -------------------------------------------------------------------------------- /crawl_360/crawl_360/settings.pyc: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/LockGit/Py/795feb1efcd03296501127201a812adff3fa150f/crawl_360/crawl_360/settings.pyc -------------------------------------------------------------------------------- /crawl_360/crawl_360/spiders/__init__.py: -------------------------------------------------------------------------------- 1 | # This package will contain the spiders of your Scrapy project 2 | # 3 | # Please refer to the documentation for information on how to create and manage 4 | # your spiders. 5 | -------------------------------------------------------------------------------- /crawl_360/crawl_360/spiders/__init__.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/LockGit/Py/795feb1efcd03296501127201a812adff3fa150f/crawl_360/crawl_360/spiders/__init__.pyc -------------------------------------------------------------------------------- /crawl_360/crawl_360/spiders/butian.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | import scrapy 3 | 4 | from crawl_360.items import ButianItem 5 | import time 6 | 7 | 8 | class ButianSpider(scrapy.Spider): 9 | name = 'butian' 10 | allowed_domains = ['butian.360.cn'] # domains only: entries with a URL path are not valid here 11 | start_urls = ['http://butian.360.cn/Loo/'] 12 | 13 | def parse(self, response): 14 | self.logger.info('start parse dst page ...') 15 | item = ButianItem() 16 | # import ipdb 17 | # ipdb.set_trace() 18 | for sel in response.xpath('//ul[@class="loopListBottom"]/li'): 19 | item['author'] = sel.xpath('dl/dd/span[1]/text()').extract_first(default='').strip() 20 | item['company_name'] = sel.xpath('dl/dd/a/text()').extract_first(default='').strip() 21 | item['vul_name'] = sel.xpath('dl/dd/span[3]/text()').extract_first(default='').replace(u'的一个', '').strip() 22 | item['vul_level'] = sel.xpath('dl/dd[2]/strong[@class="loopHigh"]/text()').extract_first(default='').strip() 23 | item['vul_money'] =
sel.xpath('dl/p[@class="loopJiangjin"]/text()').extract_first(default=0) 24 | item['vul_find_time'] = sel.xpath('dl/dd[2]/em/text()').extract_first(default='').strip() 25 | item['link_url'] = response.url.strip() 26 | item['create_time'] = time.strftime("%Y-%m-%d %H:%M:%S") 27 | self.logger.info('find item data is:%s' % (item,)) 28 | yield item 29 | 30 | next_page = response.xpath(u'//div[@class="btPage page"]/a[contains(text(),"下一页")]/@href').extract_first() 31 | if next_page is not None: 32 | next_page = response.urljoin(next_page) 33 | self.logger.info('next page url is:%s' % (next_page,)) 34 | yield scrapy.Request(url=next_page, callback=self.parse) 35 | -------------------------------------------------------------------------------- /crawl_360/crawl_360/spiders/butian.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/LockGit/Py/795feb1efcd03296501127201a812adff3fa150f/crawl_360/crawl_360/spiders/butian.pyc -------------------------------------------------------------------------------- /crawl_360/scrapy.cfg: -------------------------------------------------------------------------------- 1 | # Automatically created by: scrapy startproject 2 | # 3 | # For more information about the [deploy] section see: 4 | # https://scrapyd.readthedocs.io/en/latest/deploy.html 5 | 6 | [settings] 7 | default = crawl_360.settings 8 | 9 | [deploy] 10 | #url = http://localhost:6800/ 11 | project = crawl_360 12 | -------------------------------------------------------------------------------- /dispatch.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # @Author: lock 3 | # @Date: 2016-05-18 23:47:54 4 | # @Last Modified by: lock 5 | # @Last Modified time: 2016-05-18 23:47:54 6 | from collections import deque 7 | class Runner(object): 8 | def __init__(self, tasks): 9 | self.tasks = deque(tasks) 10 | 11 | def next(self): 12 | return self.tasks.pop() 
13 | 14 | def run(self): 15 | while len(self.tasks): 16 | task = self.next() 17 | try: 18 | next(task) 19 | except StopIteration: 20 | pass 21 | else: 22 | self.tasks.appendleft(task) 23 | 24 | def task(name, times): 25 | for i in range(times): 26 | yield 27 | print(name, i) 28 | 29 | Runner([ 30 | task('hsfzxjy', 5), 31 | task('Jack', 4), 32 | task('Bob', 6) 33 | ]).run() 34 | -------------------------------------------------------------------------------- /hashtable.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # @Author: lock 3 | # @Date: 2017-12-15 00:49:17 4 | # @Last Modified by: lock 5 | # @Last Modified time: 2017-12-15 01:00:13 6 | class Item(object): 7 | 8 | def __init__(self, key, value): 9 | self.key = key 10 | self.value = value 11 | 12 | 13 | class HashTable(object): 14 | 15 | def __init__(self, size): 16 | self.size = size 17 | self.table = [[] for _ in xrange(self.size)] 18 | 19 | def hash_function(self, key): 20 | return key % self.size 21 | 22 | def set(self, key, value): 23 | hash_index = self.hash_function(key) 24 | for item in self.table[hash_index]: 25 | if item.key == key: 26 | item.value = value 27 | return 28 | self.table[hash_index].append(Item(key, value)) 29 | 30 | def get(self, key): 31 | hash_index = self.hash_function(key) 32 | for item in self.table[hash_index]: 33 | if item.key == key: 34 | return item.value 35 | return None 36 | 37 | def remove(self, key): 38 | hash_index = self.hash_function(key) 39 | for i, item in enumerate(self.table[hash_index]): 40 | if item.key == key: 41 | del self.table[hash_index][i] 42 | 43 | if __name__ == '__main__': 44 | hash_table = HashTable(5); 45 | hash_table.set(1,'x') 46 | hash_table.set(1,'m') 47 | hash_table.set(2,'y') 48 | hash_table.set(3,'z') 49 | print hash_table.get(1) 50 | print hash_table.get(2) 51 | print hash_table.get(3) -------------------------------------------------------------------------------- 
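hashtable.py above resolves collisions by separate chaining: `key % size` picks a bucket, and each bucket is a list of entries that is scanned linearly. A Python 3 re-sketch of the same idea (names differ slightly from the original; list pairs stand in for the `Item` class), showing two keys sharing one bucket:

```python
# Separate-chaining hash table, same scheme as hashtable.py:
# buckets are lists, and the hash function is simply key % size.
class HashTable:
    def __init__(self, size):
        self.size = size
        self.table = [[] for _ in range(size)]

    def _index(self, key):
        return key % self.size  # same hash function as hashtable.py

    def set(self, key, value):
        bucket = self.table[self._index(key)]
        for pair in bucket:
            if pair[0] == key:   # existing key: update in place
                pair[1] = value
                return
        bucket.append([key, value])

    def get(self, key):
        for k, v in self.table[self._index(key)]:
            if k == key:
                return v
        return None

ht = HashTable(5)
ht.set(1, 'x')
ht.set(6, 'y')                   # 6 % 5 == 1, so it chains in bucket 1
print(ht.get(1), ht.get(6))      # x y
print(len(ht.table[1]))          # 2 -- both entries share bucket 1
```

With `size == 5`, keys 1, 6, 11, ... all map to bucket 1, which is exactly the case the chained lists exist to handle; lookup degrades from O(1) toward O(n) as chains grow, which is why real tables resize.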
/heapq_sort.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # encoding: utf-8 3 | # author: Lock 4 | # Created by Vim 5 | """ 6 | Algorithm: 7 | (1) Build heap: call the adjust (sift-down) procedure for every index from len/2 down to node 0, where len is the array length; indices from len/2 onward are leaves. 8 | (2) Adjust heap: compare node i with its children left(i) and right(i) and take the largest of the three; if the largest is a child rather than node i, swap it with node i and recursively adjust from that child. The cost is bounded by the heap depth, i.e. O(log n). 9 | (3) Heap sort: build a heap from the elements, swap the root (the maximum) with the last element, re-adjust the remaining len-1 elements, extract the new root, and repeat until every element has been taken out. 10 | """ 11 | 12 | 13 | def build_heap(seq): 14 | length = len(seq) 15 | for item in range(0, int((length / 2)))[::-1]: 16 | adjust_heap(seq, item, length) 17 | 18 | 19 | def adjust_heap(seq, root, length): 20 | left_child = 2 * root + 1 21 | right_child = 2 * root + 2 22 | root_max = root 23 | if left_child < length and seq[left_child] > seq[root_max]: 24 | root_max = left_child 25 | if right_child < length and seq[right_child] > seq[root_max]: 26 | root_max = right_child 27 | if root_max != root: # the maximum sits in a child, so swap values and keep sifting down 28 | seq[root_max], seq[root] = seq[root], seq[root_max] 29 | adjust_heap(seq, root_max, length) 30 | 31 | 32 | def heap_sort(seq): 33 | length = len(seq) 34 | build_heap(seq) # build the initial heap 35 | for i in range(0, length)[::-1]: 36 | seq[0], seq[i] = seq[i], seq[0] # move the current maximum (root) to the end 37 | adjust_heap(seq, 0, i) # continue adjusting the first i elements 38 | return seq 39 | 40 | 41 | if __name__ == "__main__": 42 | arr = [2, 1, 3, 8, 12, 5, 5, 6, 4, 10, 0] 43 | print(arr) 44 | heap_sort(arr) 45 | print(arr) 46 | -------------------------------------------------------------------------------- /httpstat.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # coding: utf-8 3 | # References: 4 | # man curl 5 | # https://curl.haxx.se/libcurl/c/curl_easy_getinfo.html 6 | # https://curl.haxx.se/libcurl/c/easy_getinfo_options.html 7 | # http://blog.kenweiner.com/2014/11/http-request-timings-with-curl.html 8 | 9
| from __future__ import print_function 10 | 11 | import os 12 | import json 13 | import sys 14 | import logging 15 | import tempfile 16 | import subprocess 17 | 18 | 19 | __version__ = '1.3.1' 20 | 21 | 22 | PY3 = sys.version_info >= (3,) 23 | 24 | if PY3: 25 | xrange = range 26 | 27 | 28 | # Env class is copied from https://github.com/reorx/getenv/blob/master/getenv.py 29 | class Env(object): 30 | prefix = 'HTTPSTAT' 31 | _instances = [] 32 | 33 | def __init__(self, key): 34 | self.key = key.format(prefix=self.prefix) 35 | Env._instances.append(self) 36 | 37 | def get(self, default=None): 38 | return os.environ.get(self.key, default) 39 | 40 | 41 | ENV_SHOW_BODY = Env('{prefix}_SHOW_BODY') 42 | ENV_SHOW_IP = Env('{prefix}_SHOW_IP') 43 | ENV_SHOW_SPEED = Env('{prefix}_SHOW_SPEED') 44 | ENV_SAVE_BODY = Env('{prefix}_SAVE_BODY') 45 | ENV_CURL_BIN = Env('{prefix}_CURL_BIN') 46 | ENV_METRICS_ONLY = Env('{prefix}_METRICS_ONLY') 47 | ENV_DEBUG = Env('{prefix}_DEBUG') 48 | 49 | 50 | curl_format = """{ 51 | "time_namelookup": %{time_namelookup}, 52 | "time_connect": %{time_connect}, 53 | "time_appconnect": %{time_appconnect}, 54 | "time_pretransfer": %{time_pretransfer}, 55 | "time_redirect": %{time_redirect}, 56 | "time_starttransfer": %{time_starttransfer}, 57 | "time_total": %{time_total}, 58 | "speed_download": %{speed_download}, 59 | "speed_upload": %{speed_upload}, 60 | "remote_ip": "%{remote_ip}", 61 | "remote_port": "%{remote_port}", 62 | "local_ip": "%{local_ip}", 63 | "local_port": "%{local_port}" 64 | }""" 65 | 66 | https_template = """ 67 | DNS Lookup TCP Connection TLS Handshake Server Processing Content Transfer 68 | [ {a0000} | {a0001} | {a0002} | {a0003} | {a0004} ] 69 | | | | | | 70 | namelookup:{b0000} | | | | 71 | connect:{b0001} | | | 72 | pretransfer:{b0002} | | 73 | starttransfer:{b0003} | 74 | total:{b0004} 75 | """[1:] 76 | 77 | http_template = """ 78 | DNS Lookup TCP Connection Server Processing Content Transfer 79 | [ {a0000} | {a0001} | {a0003} 
| {a0004} ] 80 | | | | | 81 | namelookup:{b0000} | | | 82 | connect:{b0001} | | 83 | starttransfer:{b0003} | 84 | total:{b0004} 85 | """[1:] 86 | 87 | 88 | # Color code is copied from https://github.com/reorx/python-terminal-color/blob/master/color_simple.py 89 | ISATTY = sys.stdout.isatty() 90 | 91 | 92 | def make_color(code): 93 | def color_func(s): 94 | if not ISATTY: 95 | return s 96 | tpl = '\x1b[{}m{}\x1b[0m' 97 | return tpl.format(code, s) 98 | return color_func 99 | 100 | 101 | red = make_color(31) 102 | green = make_color(32) 103 | yellow = make_color(33) 104 | blue = make_color(34) 105 | magenta = make_color(35) 106 | cyan = make_color(36) 107 | 108 | bold = make_color(1) 109 | underline = make_color(4) 110 | 111 | grayscale = {(i - 232): make_color('38;5;' + str(i)) for i in xrange(232, 256)} 112 | 113 | 114 | def quit(s, code=0): 115 | if s is not None: 116 | print(s) 117 | sys.exit(code) 118 | 119 | 120 | def print_help(): 121 | help = """ 122 | Usage: httpstat URL [CURL_OPTIONS] 123 | httpstat -h | --help 124 | httpstat --version 125 | 126 | Arguments: 127 | URL url to request, could be with or without `http(s)://` prefix 128 | 129 | Options: 130 | CURL_OPTIONS any curl supported options, except for -w -D -o -S -s, 131 | which are already used internally. 132 | -h --help show this screen. 133 | --version show version. 134 | 135 | Environments: 136 | HTTPSTAT_SHOW_BODY Set to `true` to show response body in the output, 137 | note that body length is limited to 1023 bytes, will be 138 | truncated if exceeds. Default is `false`. 139 | HTTPSTAT_SHOW_IP By default httpstat shows remote and local IP/port address. 140 | Set to `false` to disable this feature. Default is `true`. 141 | HTTPSTAT_SHOW_SPEED Set to `true` to show download and upload speed. 142 | Default is `false`. 143 | HTTPSTAT_SAVE_BODY By default httpstat stores body in a tmp file, 144 | set to `false` to disable this feature. 
Default is `true` 145 | HTTPSTAT_CURL_BIN Indicate the curl bin path to use. Default is `curl` 146 | from current shell $PATH. 147 | HTTPSTAT_DEBUG Set to `true` to see debugging logs. Default is `false` 148 | """[1:-1] 149 | print(help) 150 | 151 | 152 | def main(): 153 | args = sys.argv[1:] 154 | if not args: 155 | print_help() 156 | quit(None, 0) 157 | 158 | # get envs 159 | show_body = 'true' in ENV_SHOW_BODY.get('false').lower() 160 | show_ip = 'true' in ENV_SHOW_IP.get('true').lower() 161 | show_speed = 'true'in ENV_SHOW_SPEED.get('false').lower() 162 | save_body = 'true' in ENV_SAVE_BODY.get('true').lower() 163 | curl_bin = ENV_CURL_BIN.get('curl') 164 | metrics_only = 'true' in ENV_METRICS_ONLY.get('false').lower() 165 | is_debug = 'true' in ENV_DEBUG.get('false').lower() 166 | 167 | # configure logging 168 | if is_debug: 169 | log_level = logging.DEBUG 170 | else: 171 | log_level = logging.INFO 172 | logging.basicConfig(level=log_level) 173 | lg = logging.getLogger('httpstat') 174 | 175 | # log envs 176 | lg.debug('Envs:\n%s', '\n'.join(' {}={}'.format(i.key, i.get('')) for i in Env._instances)) 177 | lg.debug('Flags: %s', dict( 178 | show_body=show_body, 179 | show_ip=show_ip, 180 | show_speed=show_speed, 181 | save_body=save_body, 182 | curl_bin=curl_bin, 183 | is_debug=is_debug, 184 | )) 185 | 186 | # get url 187 | url = args[0] 188 | if url in ['-h', '--help']: 189 | print_help() 190 | quit(None, 0) 191 | elif url == '--version': 192 | print('httpstat {}'.format(__version__)) 193 | quit(None, 0) 194 | 195 | curl_args = args[1:] 196 | 197 | # check curl args 198 | exclude_options = [ 199 | '-w', '--write-out', 200 | '-D', '--dump-header', 201 | '-o', '--output', 202 | '-s', '--silent', 203 | ] 204 | for i in exclude_options: 205 | if i in curl_args: 206 | quit(yellow('Error: {} is not allowed in extra curl args'.format(i)), 1) 207 | 208 | # tempfile for output 209 | bodyf = tempfile.NamedTemporaryFile(delete=False) 210 | bodyf.close() 211 | 212 | 
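The `'true' in ENV_X.get(...).lower()` checks in `main()` above can be factored into a tiny helper. A minimal sketch — the `env_flag` name is mine, not part of httpstat:

```python
import os

def env_flag(key, default='false'):
    # True only when the value contains "true" (case-insensitive),
    # mirroring httpstat's `'true' in ENV_SHOW_BODY.get('false').lower()` checks.
    return 'true' in os.environ.get(key, default).lower()

os.environ['HTTPSTAT_SHOW_BODY'] = 'TRUE'
os.environ.pop('HTTPSTAT_SHOW_SPEED', None)   # ensure unset for the demo
print(env_flag('HTTPSTAT_SHOW_BODY'))   # True
print(env_flag('HTTPSTAT_SHOW_SPEED'))  # False (unset, default 'false')
```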
headerf = tempfile.NamedTemporaryFile(delete=False) 213 | headerf.close() 214 | 215 | # run cmd 216 | cmd_env = os.environ.copy() 217 | cmd_env.update( 218 | LC_ALL='C', 219 | ) 220 | cmd_core = [curl_bin, '-w', curl_format, '-D', headerf.name, '-o', bodyf.name, '-s', '-S'] 221 | cmd = cmd_core + curl_args + [url] 222 | lg.debug('cmd: %s', cmd) 223 | p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, env=cmd_env) 224 | out, err = p.communicate() 225 | if PY3: 226 | out, err = out.decode(), err.decode() 227 | lg.debug('out: %s', out) 228 | 229 | # print stderr 230 | if p.returncode == 0: 231 | if err: 232 | print(grayscale[16](err)) 233 | else: 234 | _cmd = list(cmd) 235 | _cmd[2] = '' 236 | _cmd[4] = '' 237 | _cmd[6] = '' 238 | print('> {}'.format(' '.join(_cmd))) 239 | quit(yellow('curl error: {}'.format(err)), p.returncode) 240 | 241 | # parse output 242 | try: 243 | d = json.loads(out) 244 | except ValueError as e: 245 | print(yellow('Could not decode json: {}'.format(e))) 246 | print('curl result:', p.returncode, grayscale[16](out), grayscale[16](err)) 247 | quit(None, 1) 248 | 249 | # convert time_ metrics from seconds to milliseconds 250 | for k in d: 251 | if k.startswith('time_'): 252 | v = d[k] 253 | # Convert time_ values to milliseconds in int 254 | if isinstance(v, float): 255 | # Before 7.61.0, time values are represented as seconds in float 256 | d[k] = int(v * 1000) 257 | elif isinstance(v, int): 258 | # Starting from 7.61.0, libcurl uses microsecond in int 259 | # to return time values, references: 260 | # https://daniel.haxx.se/blog/2018/07/11/curl-7-61-0/ 261 | # https://curl.se/bug/?i=2495 262 | d[k] = int(v / 1000) 263 | else: 264 | raise TypeError('{} value type is invalid: {}'.format(k, type(v))) 265 | 266 | # calculate ranges 267 | d.update( 268 | range_dns=d['time_namelookup'], 269 | range_connection=d['time_connect'] - d['time_namelookup'], 270 | range_ssl=d['time_pretransfer'] - d['time_connect'], 271 | 
range_server=d['time_starttransfer'] - d['time_pretransfer'], 272 | range_transfer=d['time_total'] - d['time_starttransfer'], 273 | ) 274 | 275 | # print json if metrics_only is enabled 276 | if metrics_only: 277 | print(json.dumps(d, indent=2)) 278 | quit(None, 0) 279 | 280 | # ip 281 | if show_ip: 282 | s = 'Connected to {}:{} from {}:{}'.format( 283 | cyan(d['remote_ip']), cyan(d['remote_port']), 284 | d['local_ip'], d['local_port'], 285 | ) 286 | print(s) 287 | print() 288 | 289 | # print header & body summary 290 | with open(headerf.name, 'r') as f: 291 | headers = f.read().strip() 292 | # remove header file 293 | lg.debug('rm header file %s', headerf.name) 294 | os.remove(headerf.name) 295 | 296 | for loop, line in enumerate(headers.split('\n')): 297 | if loop == 0: 298 | p1, p2 = tuple(line.split('/')) 299 | print(green(p1) + grayscale[14]('/') + cyan(p2)) 300 | else: 301 | pos = line.find(':') 302 | print(grayscale[14](line[:pos + 1]) + cyan(line[pos + 1:])) 303 | 304 | print() 305 | 306 | # body 307 | if show_body: 308 | body_limit = 1024 309 | with open(bodyf.name, 'r') as f: 310 | body = f.read().strip() 311 | body_len = len(body) 312 | 313 | if body_len > body_limit: 314 | print(body[:body_limit] + cyan('...')) 315 | print() 316 | s = '{} is truncated ({} out of {})'.format(green('Body'), body_limit, body_len) 317 | if save_body: 318 | s += ', stored in: {}'.format(bodyf.name) 319 | print(s) 320 | else: 321 | print(body) 322 | else: 323 | if save_body: 324 | print('{} stored in: {}'.format(green('Body'), bodyf.name)) 325 | 326 | # remove body file 327 | if not save_body: 328 | lg.debug('rm body file %s', bodyf.name) 329 | os.remove(bodyf.name) 330 | 331 | # print stat 332 | if url.startswith('https://'): 333 | template = https_template 334 | else: 335 | template = http_template 336 | 337 | # colorize template first line 338 | tpl_parts = template.split('\n') 339 | tpl_parts[0] = grayscale[16](tpl_parts[0]) 340 | template = '\n'.join(tpl_parts) 341 | 342 
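The seconds-vs-microseconds branch above is easy to get wrong; isolated, it is just the following (a sketch, `to_ms` is my name):

```python
def to_ms(value):
    # curl < 7.61.0 writes time_* as float seconds;
    # curl >= 7.61.0 writes them as int microseconds (see the links above).
    if isinstance(value, float):
        return int(value * 1000)
    if isinstance(value, int):
        return int(value / 1000)
    raise TypeError('invalid time value type: {}'.format(type(value)))

print(to_ms(0.234567))  # 234 (float seconds -> ms)
print(to_ms(234567))    # 234 (int microseconds -> ms)
```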
| def fmta(s): 343 | return cyan('{:^7}'.format(str(s) + 'ms')) 344 | 345 | def fmtb(s): 346 | return cyan('{:<7}'.format(str(s) + 'ms')) 347 | 348 | stat = template.format( 349 | # a 350 | a0000=fmta(d['range_dns']), 351 | a0001=fmta(d['range_connection']), 352 | a0002=fmta(d['range_ssl']), 353 | a0003=fmta(d['range_server']), 354 | a0004=fmta(d['range_transfer']), 355 | # b 356 | b0000=fmtb(d['time_namelookup']), 357 | b0001=fmtb(d['time_connect']), 358 | b0002=fmtb(d['time_pretransfer']), 359 | b0003=fmtb(d['time_starttransfer']), 360 | b0004=fmtb(d['time_total']), 361 | ) 362 | print() 363 | print(stat) 364 | 365 | # speed, originally bytes per second 366 | if show_speed: 367 | print('speed_download: {:.1f} KiB/s, speed_upload: {:.1f} KiB/s'.format( 368 | d['speed_download'] / 1024, d['speed_upload'] / 1024)) 369 | 370 | 371 | if __name__ == '__main__': 372 | main() 373 | -------------------------------------------------------------------------------- /img/ac_fail_pointer.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/LockGit/Py/795feb1efcd03296501127201a812adff3fa150f/img/ac_fail_pointer.png -------------------------------------------------------------------------------- /img/btree.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/LockGit/Py/795feb1efcd03296501127201a812adff3fa150f/img/btree.png -------------------------------------------------------------------------------- /img/cmd.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/LockGit/Py/795feb1efcd03296501127201a812adff3fa150f/img/cmd.png -------------------------------------------------------------------------------- /img/crawl_db_data.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/LockGit/Py/795feb1efcd03296501127201a812adff3fa150f/img/crawl_db_data.png -------------------------------------------------------------------------------- /img/crawl_run.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/LockGit/Py/795feb1efcd03296501127201a812adff3fa150f/img/crawl_run.gif -------------------------------------------------------------------------------- /img/download.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/LockGit/Py/795feb1efcd03296501127201a812adff3fa150f/img/download.gif -------------------------------------------------------------------------------- /img/knn.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/LockGit/Py/795feb1efcd03296501127201a812adff3fa150f/img/knn.png -------------------------------------------------------------------------------- /img/redpackage.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/LockGit/Py/795feb1efcd03296501127201a812adff3fa150f/img/redpackage.gif -------------------------------------------------------------------------------- /img/spider-wx.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/LockGit/Py/795feb1efcd03296501127201a812adff3fa150f/img/spider-wx.png -------------------------------------------------------------------------------- /img/svm.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/LockGit/Py/795feb1efcd03296501127201a812adff3fa150f/img/svm.png -------------------------------------------------------------------------------- /img/tire.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/LockGit/Py/795feb1efcd03296501127201a812adff3fa150f/img/tire.png -------------------------------------------------------------------------------- /interpreter.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # @Author: lock 3 | # @Date: 2017-12-18 15:21:43 4 | # @Last Modified by: lock 5 | # @Last Modified time: 2017-12-18 15:40:25 6 | 7 | class Interpreter: 8 | def __init__(self): 9 | self.stack = [] 10 | 11 | def load_value(self, number): 12 | self.stack.append(number) 13 | 14 | def print_answer(self): 15 | answer = self.stack.pop() 16 | print(answer) 17 | 18 | def add_two_values(self): 19 | first_num = self.stack.pop() 20 | second_num = self.stack.pop() 21 | total = first_num + second_num 22 | self.stack.append(total) 23 | 24 | def run_code(self, what_to_execute): 25 | instructions = what_to_execute["instructions"] 26 | numbers = what_to_execute["numbers"] 27 | for each_step in instructions: 28 | instruction, argument = each_step 29 | if instruction == "load_value": 30 | number = numbers[argument] 31 | self.load_value(number) 32 | elif instruction == "add_two_values": 33 | self.add_two_values() 34 | elif instruction == "print_answer": 35 | self.print_answer() 36 | 37 | interpreter = Interpreter() 38 | what_to_execute = { 39 | "instructions": [("load_value", 0), 40 | ("load_value", 1), 41 | ("add_two_values", None), 42 | ("print_answer", None)], 43 | "numbers": [7, 5] } 44 | interpreter.run_code(what_to_execute) -------------------------------------------------------------------------------- /kmp.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # @Author: lock 3 | # @Date: 2017-12-14 18:08:13 4 | # @Last Modified by: lock 5 | # @Last Modified time: 2017-12-14 23:31:06 6 | 7 | 8 | # 只是字符串匹配,还不是真正的kmp 9 | def kmp(string, match): 10 | n = len(string) 11 | m = len(match) 12 | i = 0 13 | j = 0 14 | 
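Stepping back to `interpreter.py` above: its dispatch loop is a miniature stack VM. A condensed, self-contained restatement for illustration — `MiniVM` and `last_answer` are my names, not the file's:

```python
class MiniVM(object):
    # Condensed version of interpreter.py's Interpreter: a stack machine
    # driven by (instruction, argument) pairs.
    def __init__(self):
        self.stack = []
        self.last_answer = None

    def run_code(self, program):
        numbers = program["numbers"]
        for instruction, argument in program["instructions"]:
            if instruction == "load_value":
                self.stack.append(numbers[argument])
            elif instruction == "add_two_values":
                self.stack.append(self.stack.pop() + self.stack.pop())
            elif instruction == "print_answer":
                self.last_answer = self.stack.pop()
                print(self.last_answer)

vm = MiniVM()
vm.run_code({
    "instructions": [("load_value", 0), ("load_value", 1),
                     ("add_two_values", None), ("print_answer", None)],
    "numbers": [7, 5],
})  # prints 12
```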
count_times_used = 0 15 | while i < n: 16 | count_times_used += 1 17 | if match[j] == string[i]: 18 | if j == m - 1: 19 | print "Found '%s' start at string '%s' %s index position, find use times: %s" % (match, string, i - m + 1, count_times_used,) 20 | return 21 | i += 1 22 | j += 1 23 | elif j > 0: 24 | i, j = i - j + 1, 0 # on a mismatch, restart one past the previous alignment (naive backtrack; real KMP would consult a failure table here) 25 | else: 26 | i += 1 27 | 28 | kmp("asfdehhaassdsdasasedwa", "sase") 29 | kmp("12s3sasexxx", "sase") 30 | -------------------------------------------------------------------------------- /knn.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # @Author: lock 3 | # @Date: 2017-12-23 19:24:54 4 | # @Last Modified by: lock 5 | # @Last Modified time: 2017-12-23 19:41:34 6 | import math 7 | import numpy as np 8 | from matplotlib import pyplot 9 | from collections import Counter 10 | import warnings 11 | # k-nearest-neighbour algorithm 12 | # with two groups take k=3, with three groups take k=5... 13 | 14 | # k-Nearest Neighbor algorithm 15 | def k_nearest_neighbors(data, predict, k=3): 16 | 17 | if len(data) >= k: 18 | warnings.warn("k is too small") 19 | 20 | # compute the distance from the predict point to every sample point 21 | distances = [] 22 | for group in data: 23 | for features in data[group]: 24 | # euclidean_distance = np.sqrt(np.sum((np.array(features)-np.array(predict))**2)) # explicit Euclidean distance; slower than the np.linalg.norm call below 25 | euclidean_distance = np.linalg.norm(np.array(features)-np.array(predict)) 26 | distances.append([euclidean_distance, group]) 27 | 28 | sorted_distances = [i[1] for i in sorted(distances)] 29 | top_nearest = sorted_distances[:k] 30 | 31 | # e.g. top_nearest == ['red','black','red']; most_common(n) returns the n most frequent items (all items if n is omitted; ties come back in arbitrary order) 32 | group_res = Counter(top_nearest).most_common(1)[0][0] 33 | confidence = Counter(top_nearest).most_common(1)[0][1]*1.0/k 34 | # confidence expresses how certain the vote is: (red,red,red) and (red,red,black) both classify as red, but the former is more confident 35 | return group_res, confidence 36 | 37 | if __name__=='__main__': 38 | 39 | dataset = {'black':[ [1,2], [2,3], [3,1] ], 'red':[ [6,5], [7,7], [8,6] ]} 40 | 
new_features = [3.5,5.2] # 判断这个样本属于哪个组 41 | 42 | for i in dataset: 43 | for ii in dataset[i]: 44 | pyplot.scatter(ii[0], ii[1], s=50, color=i) 45 | 46 | #两个分组时k值取3,3个分组时k值取5 47 | which_group,confidence = k_nearest_neighbors(dataset, new_features, k=3) 48 | print(which_group, confidence) 49 | 50 | #s表示点的大小 51 | pyplot.scatter(new_features[0], new_features[1], s=300, color=which_group) 52 | 53 | pyplot.show() -------------------------------------------------------------------------------- /linked_list.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # @Author: lock 3 | # @Date: 2017-12-20 22:53:58 4 | # @Last Modified by: lock 5 | # @Last Modified time: 2017-12-20 23:27:23 6 | 7 | #常规做法,遍历一次链表,获得长度step,在从0的位置遍历到step/2的位置 8 | class Node(object): 9 | def __init__(self,data,next): 10 | self.data=data 11 | self.next=next 12 | 13 | n1 = Node('n1',None) 14 | n2 = Node('n2',n1) 15 | n3 = Node('n3',n2) 16 | n4 = Node('n4',n3) 17 | n5 = Node('n5',n4) 18 | 19 | head = n5 20 | step = 0 21 | while head.next is not None: 22 | step = step+1 23 | head = head.next 24 | 25 | 26 | head = n5 27 | for x in xrange(0,step/2): 28 | head = head.next 29 | 30 | 31 | print '普通遍历方式,单链表中间节点为:%s,索引为:%s,遍历一次链表,在从0遍历到中间位置' % (head.data,step/2) 32 | 33 | 34 | #快慢指针方式,遍历一次链表,快指针到达链表末尾,慢指针到达链表中间 35 | class Node(object): 36 | def __init__(self,data,next): 37 | self.data=data 38 | self.next=next 39 | 40 | n1 = Node('n1',None) 41 | n2 = Node('n2',n1) 42 | n3 = Node('n3',n2) 43 | n4 = Node('n4',n3) 44 | n5 = Node('n5',n4) 45 | 46 | head = n5 # 链表的头节点 47 | 48 | p1 = head # 一次步进1个node 49 | p2 = head # 一次步进2个node 50 | 51 | step = 0 52 | while (p2.next is not None and p2.next.next is not None): 53 | p2 = p2.next.next 54 | p1 = p1.next 55 | step = step + 1 56 | 57 | 58 | print '快慢指针方式,单链表中间节点为:%s,索引为:%s,只遍历一次链表' % (p1.data,step) 59 | 60 | 61 | -------------------------------------------------------------------------------- /nice_download.py: 
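The voting step in `knn.py` is worth isolating: after sorting by distance, the top-k group labels are counted and the majority wins. A self-contained sketch (the `knn_vote` name is mine):

```python
from collections import Counter

def knn_vote(distances, k=3):
    # distances: (euclidean_distance, group) pairs, as built in knn.py's loop.
    top_nearest = [group for _, group in sorted(distances)[:k]]
    group, votes = Counter(top_nearest).most_common(1)[0]
    # confidence: fraction of the k neighbours that agree with the winner
    return group, votes / float(k)

dists = [(0.9, 'red'), (1.1, 'red'), (2.3, 'black'), (4.0, 'black')]
print(knn_vote(dists, k=3))  # ('red', 2/3 of the neighbours agree)
```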
-------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # encoding: utf-8 3 | # author: Lock 4 | # time: 2017/12/21 18:28 5 | # 多线程文件下载器,默认单线程 6 | 7 | import sys 8 | import optparse 9 | import threading 10 | import requests 11 | import re 12 | import time 13 | 14 | 15 | class Download(object): 16 | def __init__(self, config_dict): 17 | self.url = config_dict['url'] 18 | self.filename = self.clear_name(config_dict['url'].split('/')[-1]) 19 | self.thread = config_dict['thread'] 20 | self.user_agent = config_dict['user_agent'] 21 | self.fileSize = 0 22 | self.supportThread = True 23 | self.show_print = (config_dict['show_print'] == 'yes') and True or False 24 | 25 | # 移除文件名的一些特殊字符 26 | def clear_name(self, filename): 27 | (filename, _) = re.subn(ur'[\\\/\:\*\?\"\<\>\|]', '', filename) 28 | return filename 29 | 30 | # 初始化目标文件信息 31 | def init_file_info(self): 32 | headers = { 33 | 'User-Agent': self.user_agent, 34 | 'Range': 'bytes=0-4' 35 | } 36 | try: 37 | r = requests.head(self.url, headers=headers) 38 | rang_content = r.headers['content-range'] 39 | self.fileSize = int(re.match(ur'^bytes 0-4/(\d+)$', rang_content).group(1)) 40 | return True 41 | except Exception, e: 42 | print 'can not support breakpoint download,msg:%s' % (e.message,) 43 | 44 | try: 45 | self.fileSize = int(r.headers['content-length']) 46 | except Exception, e: 47 | self.supportThread = False 48 | print 'can not support multi thread download , error:%s' % (e.message,) 49 | return False 50 | 51 | def start_part_download(self, thread_id, start_index, stop_index): 52 | try: 53 | headers = {'Range': 'bytes=%d-%d' % (start_index, stop_index,), 'User-Agent': self.user_agent} 54 | r = requests.get(self.url, headers=headers, stream=True, allow_redirects=True) 55 | if r.status_code == 206: 56 | with open(self.filename, "rb+") as fp: 57 | fp.seek(start_index) 58 | fp.write(r.content) 59 | if self.show_print: 60 | sys.stdout.write('thread %s download part 
size:%.2f KB\n' % (thread_id, (r.content.__len__()) / 1024)) 61 | sys.stdout.flush() 62 | except Exception, e: 63 | if self.show_print: 64 | sys.stdout.write('下载出现错误,错误位置:%s,状态码:%s,错误信息:%s\n' % (start_index, r.status_code, e.message)) 65 | sys.stdout.flush() 66 | 67 | def run(self): 68 | print 'Start...' 69 | start_time = time.time() 70 | self.init_file_info() 71 | # 创建一个和要下载文件一样大小的文件 72 | with open(self.filename, "wb") as fp: 73 | fp.truncate(self.fileSize) 74 | 75 | if self.fileSize > 0: 76 | if self.supportThread is False and self.thread > 1: 77 | print 'sorry,only support single thread' 78 | self.thread = 1 79 | print 'Thread count is:%s' % (self.thread,) 80 | part = self.fileSize / self.thread 81 | for i in xrange(0, self.thread): 82 | start_index = part * i 83 | stop_index = start_index + part 84 | if i == self.thread - 1: 85 | stop_index = self.fileSize 86 | download_args = {'thread_id': i, 'start_index': start_index, 'stop_index': stop_index} 87 | worker = threading.Thread(target=self.start_part_download, kwargs=download_args) 88 | worker.setDaemon(True) 89 | worker.start() 90 | # 等待所有线程下载完成 91 | main_thread = threading.current_thread() 92 | for t in threading.enumerate(): 93 | if t is main_thread: 94 | continue 95 | t.join() 96 | print 'Success.\nTime:%.2fs , Size:%.2fKB' % (time.time() - start_time, self.fileSize / 1024) 97 | else: 98 | print 'Can not download' 99 | 100 | 101 | if __name__ == '__main__': 102 | parser = optparse.OptionParser(usage='python %s.py [options]' % (sys.argv[0],)) 103 | parser.add_option('-u', dest='url', type='string', help='specify download resource url') 104 | parser.add_option('-t', dest='thread', type='int', help='specify download thread count', default=1) 105 | parser.add_option('-p', dest='show_print', type='string', help='yes/no,show print info,default enable', default='yes') 106 | parser.add_option("-a", dest="user_agent", help="specify request user agent", default='Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:57.0) 
Gecko/20100101 Firefox/57.0') 107 | (options, args) = parser.parse_args() 108 | if options.url is None: 109 | parser.print_help() 110 | exit() 111 | config = { 112 | 'url': options.url, 113 | 'thread': options.thread, 114 | 'user_agent': options.user_agent, 115 | 'show_print': options.show_print 116 | } 117 | try: 118 | Download(config).run() 119 | except KeyboardInterrupt: 120 | print '\nCancel Download' 121 | -------------------------------------------------------------------------------- /palindrome.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # @Author: lock 3 | # @Date: 2017-06-25 00:48:56 4 | # @Last Modified by: lock 5 | # @Last Modified time: 2017-06-25 01:39:49 6 | 7 | # 时间复杂度:O(n),空间复杂度:O(1)。从两头向中间扫描 8 | 9 | s = "abcmnmcba" 10 | 11 | 12 | def check(s): 13 | start = 0 14 | end = len(s) - 1 15 | while start < end: 16 | if s[start:start + 1] != s[end:end + 1]: 17 | print s[start:start + 1] + '---' + s[end:end + 1] 18 | return False 19 | start = start + 1 20 | end = end - 1 21 | return True 22 | 23 | # print check(s) 24 | 25 | 26 | s2 = '12311211321' 27 | 28 | # 时间复杂度:O(n),空间复杂度:O(1)。先从中间开始、然后向两边扩展 29 | def check2(s): 30 | if len(s) % 2 == 0: 31 | mid = len(s) / 2 32 | start, end = mid - 1, mid 33 | if len(s) % 2 == 1: 34 | mid = len(s) / 2 35 | start, end = mid - 1, mid+1 36 | while mid > 0: 37 | if s[start:start+1] != s[end:end+1]: 38 | return False 39 | start = start - 1 40 | end = end + 1 41 | mid = mid - 1 42 | return True 43 | 44 | print check2(s2) 45 | -------------------------------------------------------------------------------- /rb_tree.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # encoding: utf-8 3 | # author: Lock 4 | # Created by Vim 5 | """ 6 | 红黑树多用在内部排序,即全放在内存中的,微软STL的map和set的内部实现就是红黑树。 7 | B树多用在内存里放不下,大部分数据存储在外存上时。因为B树层数少,因此可以确保每次操作,读取磁盘的次数尽可能的少。 8 | 
在数据较小,可以完全放到内存中时,红黑树的时间复杂度比B树低。反之,数据量较大,外存中占主要部分时,B树因其读磁盘次数少,而具有更快的速度。 9 | """ 10 | 11 | 12 | class RBTree(object): 13 | def __init__(self): 14 | self.nil = RBTreeNode(0) 15 | self.root = self.nil 16 | 17 | 18 | class RBTreeNode(object): 19 | def __init__(self, x): 20 | self.key = x 21 | self.left = None 22 | self.right = None 23 | self.parent = None 24 | self.color = 'black' 25 | self.size = None 26 | 27 | 28 | # 左旋转 29 | def left_rotate(T, x): 30 | y = x.right 31 | x.right = y.left 32 | if y.left != T.nil: 33 | y.left.parent = x 34 | y.parent = x.parent 35 | if x.parent == T.nil: 36 | T.root = y 37 | elif x == x.parent.left: 38 | x.parent.left = y 39 | else: 40 | x.parent.right = y 41 | y.left = x 42 | x.parent = y 43 | 44 | 45 | # 右旋转 46 | def right_rotate(T, x): 47 | y = x.left 48 | x.left = y.right 49 | if y.right != T.nil: 50 | y.right.parent = x 51 | y.parent = x.parent 52 | if x.parent == T.nil: 53 | T.root = y 54 | elif x == x.parent.right: 55 | x.parent.right = y 56 | else: 57 | x.parent.left = y 58 | y.right = x 59 | x.parent = y 60 | 61 | 62 | # 红黑树的插入 63 | def rb_insert(T, z): 64 | y = T.nil 65 | x = T.root 66 | while x != T.nil: 67 | y = x 68 | if z.key < x.key: 69 | x = x.left 70 | else: 71 | x = x.right 72 | z.parent = y 73 | if y == T.nil: 74 | T.root = z 75 | elif z.key < y.key: 76 | y.left = z 77 | else: 78 | y.right = z 79 | z.left = T.nil 80 | z.right = T.nil 81 | z.color = 'red' 82 | rb_insert_fix_up(T, z) 83 | return "%s,%s,%s" % (z.key, "颜色为", z.color) 84 | 85 | 86 | # 红黑树的上色 87 | def rb_insert_fix_up(T, z): 88 | while z.parent.color == 'red': 89 | if z.parent == z.parent.parent.left: 90 | y = z.parent.parent.right 91 | if y.color == 'red': 92 | z.parent.color = 'black' 93 | y.color = 'black' 94 | z.parent.parent.color = 'red' 95 | z = z.parent.parent 96 | else: 97 | if z == z.parent.right: 98 | z = z.parent 99 | left_rotate(T, z) 100 | z.parent.color = 'black' 101 | z.parent.parent.color = 'red' 102 | right_rotate(T, z.parent.parent) 103 | 
else: 104 | y = z.parent.parent.left 105 | if y.color == 'red': 106 | z.parent.color = 'black' 107 | y.color = 'black' 108 | z.parent.parent.color = 'red' 109 | z = z.parent.parent 110 | else: 111 | if z == z.parent.left: 112 | z = z.parent 113 | right_rotate(T, z) 114 | z.parent.color = 'black' 115 | z.parent.parent.color = 'red' 116 | left_rotate(T, z.parent.parent) 117 | T.root.color = 'black' 118 | 119 | 120 | def rb_transplant(T, u, v): 121 | if u.parent == T.nil: 122 | T.root = v 123 | elif u == u.parent.left: 124 | u.parent.left = v 125 | else: 126 | u.parent.right = v 127 | v.parent = u.parent 128 | 129 | 130 | def rb_delete(T, z): 131 | y = z 132 | y_original_color = y.color 133 | if z.left == T.nil: 134 | x = z.right 135 | rb_transplant(T, z, z.right) 136 | elif z.right == T.nil: 137 | x = z.left 138 | rb_transplant(T, z, z.left) 139 | else: 140 | y = tree_minimum(z.right) 141 | y_original_color = y.color 142 | x = y.right 143 | if y.parent == z: 144 | x.parent = y 145 | else: 146 | rb_transplant(T, y, y.right) 147 | y.right = z.right 148 | y.right.parent = y 149 | rb_transplant(T, z, y) 150 | y.left = z.left 151 | y.left.parent = y 152 | y.color = z.color 153 | if y_original_color == 'black': 154 | rb_delete_fix_up(T, x) 155 | 156 | 157 | # 红黑树的删除 158 | def rb_delete_fix_up(T, x): 159 | while x != T.root and x.color == 'black': 160 | if x == x.parent.left: 161 | w = x.parent.right 162 | if w.color == 'red': 163 | w.color = 'black' 164 | x.parent.color = 'red' 165 | left_rotate(T, x.parent) 166 | w = x.parent.right 167 | if w.left.color == 'black' and w.right.color == 'black': 168 | w.color = 'red' 169 | x = x.parent 170 | else: 171 | if w.right.color == 'black': 172 | w.left.color = 'black' 173 | w.color = 'red' 174 | right_rotate(T, w) 175 | w = x.parent.right 176 | w.color = x.parent.color 177 | x.parent.color = 'black' 178 | w.right.color = 'black' 179 | left_rotate(T, x.parent) 180 | x = T.root 181 | else: 182 | w = x.parent.left 183 | if w.color == 
'red': 184 | w.color = 'black' 185 | x.parent.color = 'red' 186 | right_rotate(T, x.parent) 187 | w = x.parent.left 188 | if w.right.color == 'black' and w.left.color == 'black': 189 | w.color = 'red' 190 | x = x.parent 191 | else: 192 | if w.left.color == 'black': 193 | w.right.color = 'black' 194 | w.color = 'red' 195 | left_rotate(T, w) 196 | w = x.parent.left 197 | w.color = x.parent.color 198 | x.parent.color = 'black' 199 | w.left.color = 'black' 200 | right_rotate(T, x.parent) 201 | x = T.root 202 | x.color = 'black' 203 | 204 | 205 | def tree_minimum(x): 206 | while x.left != T.nil: 207 | x = x.left 208 | return x 209 | 210 | 211 | # 中序遍历 212 | def mid_sort(x): 213 | if x is not None: 214 | mid_sort(x.left) 215 | if x.key != 0: 216 | print('key:', x.key, 'x.parent', x.parent.key) 217 | mid_sort(x.right) 218 | 219 | 220 | if __name__ == '__main__': 221 | nodes = [11, 2, 14, 1, 7, 15, 5, 8, 4] 222 | T = RBTree() 223 | for node in nodes: 224 | print '插入数据', rb_insert(T, RBTreeNode(node)) 225 | print('中序遍历') 226 | mid_sort(T.root) 227 | rb_delete(T, T.root) 228 | print('中序遍历') 229 | mid_sort(T.root) 230 | rb_delete(T, T.root) 231 | print('中序遍历') 232 | mid_sort(T.root) 233 | -------------------------------------------------------------------------------- /red_package_optimize.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # encoding: utf-8 3 | # author: Lock 4 | # Created by Vim 5 | from random import choice 6 | import random 7 | import sys 8 | 9 | def calc_red_package(m,c): 10 | if c*0.01*100>m: 11 | print '红包总金额为:%s元不能划分成%s个红包'%(m/100.0,c,) 12 | exit() 13 | 14 | all = {} 15 | val = m/c 16 | left = m-(val*c) 17 | for i in range(0,c): 18 | all[i] = [1,val] 19 | 20 | if left>0: 21 | rand = random.randint(0,c-1) 22 | all[rand].append(all[rand][-1]+left) 23 | 24 | pos = {} 25 | for index in all: 26 | stop=random.randint(0,val) 27 | pos[index] = stop 28 | 29 | res = [] 30 | 31 | if len(pos)>=2: 32 | 
for point in pos: 33 | left = pos[point] 34 | if left==0: 35 | left=1 36 | if point==0: 37 | res.append(left/100.0) 38 | last_right = all[point][-1]-left 39 | else: 40 | right = last_right 41 | if point==(c-1): 42 | res.append((left+right)/100.0) 43 | end = all[point][-1]-left 44 | randMax = random.randint(0,len(res)-1) 45 | res[randMax] = (res[randMax]*100 + end)/100.0 46 | else: 47 | res.append((left+right)/100.0) 48 | last_right = all[point][-1]-left 49 | else: 50 | res.append((all[0][-1])/100.0) 51 | 52 | print res 53 | 54 | for key,item in enumerate(res): 55 | print '第 %s 个红包金额:%s元' %(key+1,item) 56 | print '验证:红包总金额 is %s元, 分配后 res sum is %s元'%(m/100.0,sum(res),) 57 | 58 | 59 | if __name__ == '__main__': 60 | m = 1000 # 红包金额,单位:分 61 | c = 4 # 红包个数 62 | if len(sys.argv)==3: 63 | m = int(float(sys.argv[1])*100) 64 | c = int(sys.argv[2]) 65 | calc_red_package(m,c) 66 | 67 | 68 | 69 | -------------------------------------------------------------------------------- /redpackage.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # encoding: utf-8 3 | # author: Lock 4 | # Created by Vim 5 | from random import choice 6 | import sys 7 | 8 | def calc_red_package(m,c): 9 | if c*0.01*100>m: 10 | print '红包总金额为:%s元不能划分成%s个红包'%(m/100.0,c,) 11 | exit() 12 | 13 | all = {} 14 | val = m/c 15 | left = m-(val*c) 16 | for i in range(0,c): 17 | all[i] = range(0,val) 18 | 19 | if left>0: 20 | rand = choice(range(0,c)) 21 | all[rand].append(all[rand][-1]+left) 22 | 23 | pos = {} 24 | for index in all: 25 | stop=choice(all[index]) 26 | pos[index] = stop 27 | 28 | res = [] 29 | 30 | if len(pos)>=2: 31 | for point in pos: 32 | left = pos[point]+1 33 | if point==0: 34 | res.append(left/100.0) 35 | else: 36 | right = all[point-1][-1]-pos[point-1] 37 | if point==(c-1): 38 | end = all[point][-1]-pos[point] 39 | randMax = choice(range(0,len(res))) 40 | res[randMax] = (res[randMax]*100 + end)/100.0 41 | 
res.append((left+right)/100.0) 42 | else: 43 | res.append((left+right)/100.0) 44 | else: 45 | res.append((all[0][-1]+1)/100.0) 46 | 47 | print res 48 | 49 | for key,item in enumerate(res): 50 | print '第 %s 个红包金额:%s元' %(key+1,item) 51 | print '验证:红包总金额 is %s元, 分配后 res sum is %s元'%(m/100.0,sum(res),) 52 | 53 | 54 | if __name__ == '__main__': 55 | m = 1000 # 红包金额,单位:分 56 | c = 4 # 红包个数 57 | if len(sys.argv)==3: 58 | m = int(float(sys.argv[1])*100) 59 | c = int(sys.argv[2]) 60 | calc_red_package(m,c) 61 | 62 | 63 | 64 | -------------------------------------------------------------------------------- /revert_list.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # @Author: lock 3 | # @Date: 2017-09-05 10:53:46 4 | # @Last Modified by: lock 5 | # @Last Modified time: 2019-06-10 16:21:00 6 | 7 | class Node(): 8 | 9 | def __init__(self, value): 10 | self.next = None 11 | self.value = value 12 | 13 | 14 | class DoubleNode: 15 | 16 | def __init__(self, value): 17 | self.value = value 18 | self.next = None 19 | self.pre = None 20 | 21 | 22 | class RevertList(): 23 | 24 | @classmethod 25 | def revert_linked_list(cls, head): 26 | pre = None 27 | while head is not None: 28 | next = head.next 29 | head.next = pre 30 | pre = head 31 | head = next 32 | return pre 33 | 34 | @classmethod 35 | def revert_double_linked_list(cls, head): 36 | pre = None 37 | while head is not None: 38 | next = head.next 39 | head.next = pre 40 | head.pre = next 41 | pre = head 42 | head = next 43 | return pre 44 | 45 | 46 | if __name__ == '__main__': 47 | node = Node(1); 48 | node.next = Node(2); 49 | node.next.next = Node(3) 50 | print node.value 51 | print node.next.value 52 | print node.next.next.value 53 | print 'start revert list ...' 
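The single-list reversal that `RevertList.revert_linked_list` performs is the classic three-pointer walk. Isolated as a sketch (not `revert_list.py` itself):

```python
class Node(object):
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

def revert_linked_list(head):
    # Same pointer dance as RevertList.revert_linked_list: one pass,
    # flipping each node's next pointer back to its predecessor.
    pre = None
    while head is not None:
        next_node = head.next
        head.next = pre
        pre = head
        head = next_node
    return pre

head = Node(1, Node(2, Node(3)))
new_head = revert_linked_list(head)
values = []
while new_head is not None:
    values.append(new_head.value)
    new_head = new_head.next
print(values)  # [3, 2, 1]
```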
54 |     newNode = RevertList.revert_linked_list(node)
55 |     print newNode.value
56 |     print newNode.next.value
57 |     print newNode.next.next.value
58 | 
59 |     node2 = DoubleNode(1)
60 |     node2.next = DoubleNode(2)
61 |     node2.next.pre = node2
62 |     node2.next.next = DoubleNode(3)
63 |     node2.next.next.pre = node2.next
64 |     node2.next.next.next = DoubleNode(4)
65 |     node2.next.next.next.pre = node2.next.next
66 |     node2.next.next.next.next = DoubleNode(5)
67 |     node2.next.next.next.next.pre = node2.next.next.next
68 |     node2.next.next.next.next.next = DoubleNode(6)
69 |     node2.next.next.next.next.next.pre = node2.next.next.next.next
70 | 
-------------------------------------------------------------------------------- /rpn.py: --------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 | # @Author: lock
3 | # @Date: 2017-06-26 15:51:28
4 | # @Last Modified by: lock
5 | # @Last Modified time: 2017-06-26 15:55:53
6 | def calc(s):
7 |     if type(s) != list: s = s.split(' ')
8 |     operaList = ['+', '-', '*', '/']
9 |     for key, item in enumerate(s):
10 |         if item in operaList:
11 |             val = eval(s[key - 2] + item + s[key - 1])
12 |             s.insert(key - 2, str(val))
13 |             for _ in range(3): del s[key - 1]
14 |             calc(s)
15 |     return sum(map(eval, s))
16 | def translate(calcStr):
17 |     element, calcList, s, stackStr, i = '', [], [], [], 0
18 |     for key, item in enumerate(calcStr):
19 |         if item.isdigit():
20 |             element = element + item
21 |             if key == len(calcStr)-1: calcList.append(element)
22 |         else:
23 |             if element != '':
24 |                 calcList.append(element)
25 |                 element = ''
26 |             if item in ['+', '-', '*', '/', '(', ')']:
27 |                 calcList.append(item)
28 |     calcList.insert(0, '(')
29 |     calcList.append(')')
30 |     calcList.append('#')
31 |     while calcList[i] != "#":
32 |         if calcList[i].isdigit():
33 |             stackStr.append(calcList[i])
34 |         elif calcList[i] == '(':
35 |             s.append(calcList[i])
36 |         elif calcList[i] == ')':
37 |             while s[-1] != '(':
38 |                 stackStr.append(s.pop())
39 |             s.pop()
40 |         elif calcList[i] in ['+', '-']:
41 |             while s[-1] != '(':
42 |                 stackStr.append(s.pop())
43 |             s.append(calcList[i])
44 |         elif calcList[i] in ['*', '/']:
45 |             while s[-1] in ['*', '/']:
46 |                 stackStr.append(s.pop())
47 |             s.append(calcList[i])
48 |         i = i + 1
49 |     return stackStr
50 | if __name__ == '__main__':
51 |     s = '11111111111111*9999999999999+(99-(12/4)+10)'
52 |     print translate(s)
53 |     print str(calc(translate('11111111111111*9999999999999+(99-(12/4)+10)'))) == str(11111111111111*9999999999999+(99-(12/4)+10)), str(calc(translate(s))), str(11111111111111*9999999999999+(99-(12/4)+10))
54 |     print str(calc(translate('12+1+12+33*9+4'))) == str(12+1+12+33*9+4), str(calc(translate('12+1+12+33*9+4'))), str(12+1+12+33*9+4)
55 | 
56 | 
57 | 
58 | 
-------------------------------------------------------------------------------- /rsa.py: --------------------------------------------------------------------------------
1 | #!/usr/bin/env python
2 | # encoding: utf-8
3 | # author: Lock
4 | # time: 2016/11/8 16:51
5 | 
6 | 
7 | # In RSA the encrypting side publishes the key (n, e); the decrypting side uses (n, d)
8 | 
9 | # p and q must be prime; real deployments use very large values
10 | p = 5
11 | q = 7
12 | 
13 | n = p * q
14 | z = (p - 1) * (q - 1)
15 | 
16 | e = 5  # chosen by the encrypting side; e must be coprime with z (their only common divisor is 1)
17 | d = 5  # (e * d - 1) must be divisible by z; d = 5 is kept small so the demo numbers stay manageable
18 | 
19 | 
20 | def run():
21 |     raw_msg = [12, 15, 22, 5]
22 |     en, de = [], []
23 |     sec_code, de_msg = [], []
24 |     print "下面是一个RSA加解密算法的简单演示:\n"
25 |     print "报文\t加密\t 加密后密文\n"
26 |     for item in raw_msg:
27 |         en_key_item = pow(item, e)
28 |         en.append(en_key_item)
29 |         sec_code_item = en_key_item % n
30 |         sec_code.append(sec_code_item)
31 |         print "%d\t%d\t\t%d" % (item, en_key_item, sec_code_item)
32 | 
33 |     print "\n"
34 |     print "---------------------------"
35 |     print "----------执行解密---------"
36 |     print "---------------------------"
37 | 
38 |     print "原始报文\t密文\t加密\t解密报文\n"
39 |     for key, item in enumerate(sec_code):
40 |         de_key_item = pow(item, d)
41 |         de_msg_item = de_key_item % n
42 |         de_msg.append(de_msg_item)
43 |         print 
"%d\t\t%d\t%d\t\t%d" % (raw_msg[key], item, de_key_item, de_msg_item) 44 | 45 | if __name__ == '__main__': 46 | run() 47 | 48 | 49 | 50 | -------------------------------------------------------------------------------- /selenium.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # @Author: lock 3 | # @Date: 2017-04-04 01:40:22 4 | # @Last Modified by: lock 5 | # @Last Modified time: 2017-04-04 23:38:09 6 | import unittest 7 | from selenium import webdriver 8 | from selenium.webdriver.common.keys import Keys 9 | import time 10 | 11 | class BaiduSearch(unittest.TestCase): 12 | 13 | def setUp(self): 14 | self.driver = webdriver.Chrome() 15 | 16 | 17 | def test_lock(self): 18 | driver = self.driver 19 | driver.get("http://www.baidu.com") 20 | self.assertIn(u"百度一下", driver.title) 21 | elem = driver.find_element_by_id("kw") 22 | elem.send_keys("lock") 23 | elem.send_keys(Keys.RETURN) 24 | i = 0 25 | while 1: 26 | if i>=2: 27 | break 28 | time.sleep(1) 29 | i+=1 30 | print "not test %s , wait %s second continue ..." % ('lock',i,) 31 | 32 | def test_search(self): 33 | driver = self.driver 34 | driver.get("http://www.baidu.com") 35 | self.assertIn(u"百度一下", driver.title) 36 | elem = driver.find_element_by_id("kw") 37 | elem.send_keys("php") 38 | elem.send_keys(Keys.RETURN) 39 | i = 0 40 | while 1: 41 | if i>=2: 42 | break 43 | time.sleep(1) 44 | i+=1 45 | print "not test %s , wait %s second continue ..." % ('php',i,) 46 | assert "No results found." 
not in driver.page_source
47 | 
48 | 
49 |     def tearDown(self):
50 |         self.driver.close()
51 | 
52 | if __name__ == "__main__":
53 |     unittest.main()
-------------------------------------------------------------------------------- /svm.py: --------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 | # @Author: lock
3 | # @Date: 2017-12-21 09:58:01
4 | # @Last Modified by: lock
5 | # @Last Modified time: 2017-12-21 17:41:07
6 | # SVM classification; more involved than the KNN example
7 | # demo on the simplest linearly separable data
8 | # reference: http://blog.csdn.net/lisi1129/article/details/70209945?locationNum=8&fps=1
9 | 
10 | import numpy as np
11 | from matplotlib import pyplot
12 | import math
13 | import sys
14 | 
15 | class SVM(object):
16 |     def __init__(self, visual=True):
17 |         self.visual = visual
18 |         self.colors = {1:'r', -1:'b'}
19 |         if self.visual:
20 |             self.fig = pyplot.figure()
21 |             self.ax = self.fig.add_subplot(1,1,1)
22 | 
23 |     def train(self, data):
24 |         self.data = data
25 |         opt_dict = {}
26 | 
27 |         transforms = [[1,1],
28 |                       [-1,1],
29 |                       [-1,-1],
30 |                       [1,-1]]
31 | 
32 |         # find the largest and smallest feature values in the data set
33 |         self.max_feature_value = float('-inf')  # start from negative infinity
34 |         self.min_feature_value = float('inf')   # start from positive infinity
35 |         for y in self.data:
36 |             for features in self.data[y]:
37 |                 for feature in features:
38 |                     if feature > self.max_feature_value:
39 |                         self.max_feature_value = feature
40 |                     if feature < self.min_feature_value:
41 |                         self.min_feature_value = feature
42 |         print(self.max_feature_value, self.min_feature_value)
43 | 
44 |         # as in gradient descent, define the step sizes: start coarse, then refine (smaller steps take longer)
45 |         step_sizes = [self.max_feature_value * 0.1, self.max_feature_value * 0.01, self.max_feature_value * 0.001]
46 | 
47 |         b_range_multiple = 5
48 |         b_multiple = 5
49 |         lastest_optimum = self.max_feature_value * 10
50 | 
51 |         for step in step_sizes:
52 |             w = np.array([lastest_optimum, lastest_optimum])
53 |             optimized = False
54 |             while not optimized:
55 |                 for b in np.arange(self.max_feature_value*b_range_multiple*-1, 
                                  self.max_feature_value*b_range_multiple, step*b_multiple):
56 |                     for transformation in transforms:
57 |                         w_t = w * transformation
58 |                         found_option = True
59 |                         for i in self.data:
60 |                             for x in self.data[i]:
61 |                                 y = i
62 |                                 if not y*(np.dot(w_t, x)+b) >= 1:
63 |                                     found_option = False
64 |                                 # print(x, ':', y*(np.dot(w_t, x)+b))  # converges gradually
65 | 
66 |                         if found_option:
67 |                             opt_dict[np.linalg.norm(w_t)] = [w_t, b]
68 | 
69 |                 if w[0] < 0:
70 |                     optimized = True
71 |                 else:
72 |                     w = w - step
73 | 
74 |             norms = sorted([n for n in opt_dict])
75 |             opt_choice = opt_dict[norms[0]]
76 |             self.w = opt_choice[0]
77 |             self.b = opt_choice[1]
78 |             print(self.w, self.b)
79 |             lastest_optimum = opt_choice[0][0] + step*2
80 | 
81 | 
82 |     def predict(self, features):
83 |         classification = np.sign( np.dot(features, self.w) + self.b )
84 | 
85 |         if classification != 0 and self.visual:
86 |             self.ax.scatter(features[0], features[1], s=300, marker='*', c=self.colors[classification])
87 | 
88 |         return classification
89 | 
90 | 
91 |     # plot the data points, support vectors and decision boundary
92 |     def visualize(self):
93 |         for i in self.data:
94 |             for x in self.data[i]:
95 |                 self.ax.scatter(x[0], x[1], s=50, c=self.colors[i])
96 | 
97 |         # hyperplane: x·w + b = v
98 |         def hyperplane(x, w, b, v):
99 |             return (-w[0]*x-b+v) / w[1]
100 | 
101 |         data_range = (self.min_feature_value*0.9, self.max_feature_value*1.1)
102 | 
103 |         hyp_x_min = data_range[0]
104 |         hyp_x_max = data_range[1]
105 | 
106 |         psv1 = hyperplane(hyp_x_min, self.w, self.b, 1)
107 |         psv2 = hyperplane(hyp_x_max, self.w, self.b, 1)
108 |         self.ax.plot([hyp_x_min, hyp_x_max], [psv1, psv2], c=self.colors[1])
109 | 
110 |         nsv1 = hyperplane(hyp_x_min, self.w, self.b, -1)
111 |         nsv2 = hyperplane(hyp_x_max, self.w, self.b, -1)
112 |         self.ax.plot([hyp_x_min, hyp_x_max], [nsv1, nsv2], c=self.colors[-1])
113 | 
114 |         db1 = hyperplane(hyp_x_min, self.w, self.b, 0)
115 |         db2 = hyperplane(hyp_x_max, self.w, self.b, 0)
116 |         self.ax.plot([hyp_x_min, hyp_x_max], [db1, db2], 'y--')
117 | 
118 |         pyplot.show()
119 | 
120 | if __name__ == '__main__':
121 | 
data_set = {-1:np.array([[1,7], 122 | [2,8], 123 | [3,8]]), 124 | 1:np.array([[5,1], 125 | [6,-1], 126 | [7,3]])} 127 | print(data_set) 128 | 129 | svm = SVM() 130 | svm.train(data_set) 131 | 132 | # 预测 133 | for predict_feature in [[0,10],[2,6],[1,3], [4,3], [5.5,7.5], [8,3]]: 134 | print(svm.predict(predict_feature)) 135 | 136 | svm.visualize() 137 | 138 | -------------------------------------------------------------------------------- /tensorflow/cnn_test.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # encoding: utf-8 3 | # author: Lock 4 | # time: 2018/3/18 17:26 5 | 6 | import tensorflow as tf 7 | from train import cnn_graph 8 | from train import get_random_captcha_text_and_image 9 | from train import vec2text, convert2gray 10 | from create_captcha_img import CAPTCHA_LIST, CAPTCHA_WIDTH, CAPTCHA_HEIGHT, CAPTCHA_LEN 11 | 12 | 13 | def captcha_to_text(image_list, height=CAPTCHA_HEIGHT, width=CAPTCHA_WIDTH): 14 | ''' 15 | 验证码图片转化为文本 16 | :param image_list: 17 | :param height: 18 | :param width: 19 | :return: 20 | ''' 21 | x = tf.placeholder(tf.float32, [None, height * width]) 22 | keep_prob = tf.placeholder(tf.float32) 23 | y_conv = cnn_graph(x, keep_prob, (height, width)) 24 | saver = tf.train.Saver() 25 | with tf.Session() as sess: 26 | saver.restore(sess, tf.train.latest_checkpoint('.')) 27 | predict = tf.argmax(tf.reshape(y_conv, [-1, CAPTCHA_LEN, len(CAPTCHA_LIST)]), 2) 28 | vector_list = sess.run(predict, feed_dict={x: image_list, keep_prob: 1}) 29 | vector_list = vector_list.tolist() 30 | text_list = [vec2text(vector) for vector in vector_list] 31 | return text_list[0] 32 | 33 | 34 | def multi_test(height=CAPTCHA_HEIGHT, width=CAPTCHA_WIDTH): 35 | x = tf.placeholder(tf.float32, [None, height * width]) 36 | keep_prob = tf.placeholder(tf.float32) 37 | y_conv = cnn_graph(x, keep_prob, (height, width)) 38 | saver = tf.train.Saver() 39 | with tf.Session() as sess: 40 | saver.restore(sess, 
tf.train.latest_checkpoint('.'))
41 |         while 1:
42 |             text, image = get_random_captcha_text_and_image()
43 |             image = convert2gray(image)
44 |             image = image.flatten() / 255
45 |             image_list = [image]
46 |             predict = tf.argmax(tf.reshape(y_conv, [-1, CAPTCHA_LEN, len(CAPTCHA_LIST)]), 2)
47 |             vector_list = sess.run(predict, feed_dict={x: image_list, keep_prob: 1})
48 |             vector_list = vector_list.tolist()
49 |             text_list = [vec2text(vector) for vector in vector_list]
50 |             pre_text = text_list[0]
51 |             flag = u'错误'
52 |             if text == pre_text:
53 |                 flag = u'正确'
54 |             print u"实际值(actual):%s, 预测值(predict):%s, 预测结果:%s" % (text, pre_text, flag,)
55 | 
56 | 
57 | if __name__ == '__main__':
58 |     try:
59 |         # run predictions in a loop
60 |         multi_test()
61 |         exit()
62 | 
63 |         text, image = get_random_captcha_text_and_image()
64 |         image = convert2gray(image)
65 |         image = image.flatten() / 255
66 |         pre_text = captcha_to_text([image])
67 |         flag = u'错误'
68 |         if text == pre_text:
69 |             flag = u'正确'
70 |         print u"实际值(actual):%s, 预测值(predict):%s, 预测结果:%s" % (text, pre_text, flag,)
71 |     except KeyboardInterrupt as e:
72 |         print e.message
73 | 
-------------------------------------------------------------------------------- /tensorflow/create_captcha_img.py: --------------------------------------------------------------------------------
1 | #!/usr/bin/env python
2 | # encoding: utf-8
3 | # author: Lock
4 | # time: 2018/3/18 13:25
5 | 
6 | import string
7 | import random
8 | from captcha.image import ImageCaptcha
9 | from PIL import Image
10 | import numpy as np
11 | import os
12 | 
13 | CAPTCHA_HEIGHT = 60  # captcha image height
14 | CAPTCHA_WIDTH = 160  # captcha image width
15 | CAPTCHA_LEN = 4  # captcha length
16 | # CAPTCHA_LIST = [str(i) for i in range(0, 10)] + list(string.ascii_letters)  # full captcha character set
17 | CAPTCHA_LIST = [str(i) for i in range(0, 10)]  # captcha character set; kept small to speed up training
18 | 
19 | 
20 | def get_random_captcha_text(char_set=CAPTCHA_LIST, length=CAPTCHA_LEN):
21 |     captcha_text = [random.choice(char_set) for _ in range(length)]
22 |     return ''.join(captcha_text) 
23 | 
24 | 
25 | def get_random_captcha_text_and_image(width=CAPTCHA_WIDTH, height=CAPTCHA_HEIGHT, save=None):
26 |     image = ImageCaptcha(width=width, height=height)
27 |     captcha_text = get_random_captcha_text()
28 |     captcha = image.generate(captcha_text)
29 |     if save:
30 |         image.write(captcha_text, 'image/' + captcha_text + '.jpg')
31 |     captcha_image = Image.open(captcha)
32 |     # convert to a numpy array
33 |     captcha_image_np = np.array(captcha_image)
34 |     return captcha_text, captcha_image_np
35 | 
36 | 
37 | if __name__ == "__main__":
38 |     if os.path.exists('image') is False:
39 |         os.mkdir('image')
40 | 
41 |     while 1:
42 |         text, np_data = get_random_captcha_text_and_image(CAPTCHA_WIDTH, CAPTCHA_HEIGHT, 1)
43 |         print text
44 | 
-------------------------------------------------------------------------------- /tensorflow/train.py: --------------------------------------------------------------------------------
1 | #!/usr/bin/env python
2 | # encoding: utf-8
3 | # author: Lock
4 | # time: 2018/3/18 14:48
5 | 
6 | import tensorflow as tf
7 | import os, numpy as np
8 | from datetime import datetime
9 | from create_captcha_img import CAPTCHA_LIST, CAPTCHA_WIDTH, CAPTCHA_HEIGHT, CAPTCHA_LEN
10 | from create_captcha_img import get_random_captcha_text_and_image
11 | 
12 | os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
13 | 
14 | 
15 | def weight_variable(shape, w_alpha=0.01):
16 |     """
17 |     add noise: randomly initialize the weights
18 |     :param shape:
19 |     :param w_alpha:
20 |     :return: Tensor; shape is still of the form [batch, height, width, channels]
21 |     """
22 |     initial = w_alpha * tf.random_normal(shape)
23 |     return tf.Variable(initial)
24 | 
25 | 
26 | def bias_variable(shape, b_alpha=0.01):
27 |     """
28 |     add noise: randomly initialize the bias terms
29 |     :param shape:
30 |     :param b_alpha:
31 |     :return: Tensor; shape is still of the form [batch, height, width, channels]
32 |     """
33 |     initial = b_alpha * tf.random_normal(shape)
34 |     return tf.Variable(initial)
35 | 
36 | 
37 | def conv2d(input, filter):
38 |     """
39 |     convolution helper
40 |     linear combination over local windows, stride 1; 'SAME' padding keeps the image size unchanged after convolution (zero margin), see
41 | 
https://www.cnblogs.com/qggg/p/6832342.html
42 |     :param input: a Tensor of shape [batch, height, width, channels], i.e. [images per batch, image height, image width, channels]
43 |     :param filter: a Tensor of shape [filter height, filter width, input channels, number of filters]
44 |     :return: Tensor; shape is still of the form [batch, height, width, channels]
45 |     """
46 |     # the third argument, strides, is the per-dimension stride of the convolution window: a 1-D vector of length 4
47 |     # http://blog.csdn.net/wuzqChom/article/details/74785643  the difference between SAME and VALID
48 |     return tf.nn.conv2d(input, filter, strides=[1, 1, 1, 1], padding='SAME')
49 | 
50 | 
51 | def max_pool(val):
52 |     """
53 |     pooling; max pooling takes the maximum inside a window, and its usage is very similar to convolution
54 |     http://blog.csdn.net/mao_xiao_feng/article/details/53453926
55 |     :param val: a pooling layer usually follows a convolution layer, so the input is typically a feature map, still of shape [batch, height, width, channels]
56 |     :return: Tensor; shape is still of the form [batch, height, width, channels]
57 |     """
58 |     # ksize is the pooling window size, a 4-D vector, usually [1, height, width, 1]: we do not pool over batch or channels, so those two dimensions are 1
59 |     # strides: as with convolution, the per-dimension stride of the window, usually [1, stride, stride, 1]
60 |     return tf.nn.max_pool(val, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
61 | 
62 | 
63 | def cnn_graph(x, keep_prob, size, captcha_list=CAPTCHA_LIST, captcha_len=CAPTCHA_LEN):
64 |     """
65 |     three-layer convolutional network graph
66 |     :param x:
67 |     :param keep_prob:
68 |     :param size:
69 |     :param captcha_list:
70 |     :param captcha_len:
71 |     :return:
72 |     """
73 |     # reshape the image into a 4-D tensor
74 |     image_height, image_width = size
75 |     # http://blog.csdn.net/lxg0807/article/details/53021859  introduction to reshape
76 |     x_image = tf.reshape(x, shape=[-1, image_height, image_width, 1])
77 | 
78 |     # first layer: 3x3x1 filters, 32 output features, i.e. 32 filters
79 |     w_conv1 = weight_variable([3, 3, 1, 32])
80 |     b_conv1 = bias_variable([32])
81 |     # relu activation
82 |     # (a function, e.g. ReLU or sigmoid, that takes the weighted sum of all inputs from the previous layer,
83 |     # produces an output value, usually non-linear, and passes it on to the next layer)
84 |     h_conv1 = tf.nn.relu(tf.nn.bias_add(conv2d(x_image, w_conv1), b_conv1))
85 |     # pooling
86 |     h_pool1 = max_pool(h_conv1)
87 |     # dropout against overfitting
88 |     h_drop1 = tf.nn.dropout(h_pool1, keep_prob)
89 | 
90 |     # second layer
91 |     w_conv2 = weight_variable([3, 3, 32, 64])
92 |     b_conv2 = bias_variable([64]) 
93 | h_conv2 = tf.nn.relu(tf.nn.bias_add(conv2d(h_drop1, w_conv2), b_conv2)) 94 | h_pool2 = max_pool(h_conv2) 95 | h_drop2 = tf.nn.dropout(h_pool2, keep_prob) 96 | 97 | # 第三层 98 | w_conv3 = weight_variable([3, 3, 64, 64]) 99 | b_conv3 = bias_variable([64]) 100 | h_conv3 = tf.nn.relu(tf.nn.bias_add(conv2d(h_drop2, w_conv3), b_conv3)) 101 | h_pool3 = max_pool(h_conv3) 102 | h_drop3 = tf.nn.dropout(h_pool3, keep_prob) 103 | 104 | # 全连接层 105 | image_height = int(h_drop3.shape[1]) 106 | image_width = int(h_drop3.shape[2]) 107 | w_fc = weight_variable([image_height * image_width * 64, 1024]) 108 | b_fc = bias_variable([1024]) 109 | h_drop3_re = tf.reshape(h_drop3, [-1, image_height * image_width * 64]) 110 | h_fc = tf.nn.relu(tf.add(tf.matmul(h_drop3_re, w_fc), b_fc)) 111 | h_drop_fc = tf.nn.dropout(h_fc, keep_prob) 112 | 113 | # 输出层 114 | w_out = weight_variable([1024, len(captcha_list) * captcha_len]) 115 | b_out = bias_variable([len(captcha_list) * captcha_len]) 116 | y_conv = tf.add(tf.matmul(h_drop_fc, w_out), b_out) 117 | return y_conv 118 | 119 | 120 | def optimize_graph(y, y_conv): 121 | ''' 122 | 优化计算图 123 | :param y: 124 | :param y_conv: 125 | :return: 126 | ''' 127 | # 交叉熵计算loss 注意logits输入是在函数内部进行sigmod操作 128 | # sigmod_cross适用于每个类别相互独立但不互斥,如图中可以有字母和数字 129 | # softmax_cross适用于每个类别独立且排斥的情况,如数字和字母不可以同时出现 130 | loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=y_conv, labels=y)) 131 | # 最小化loss优化 132 | optimizer = tf.train.AdamOptimizer(learning_rate=0.001).minimize(loss) 133 | return optimizer 134 | 135 | 136 | def accuracy_graph(y, y_conv, width=len(CAPTCHA_LIST), height=CAPTCHA_LEN): 137 | ''' 138 | 偏差计算图 139 | :param y: 140 | :param y_conv: 141 | :param width: 142 | :param height: 143 | :return: 144 | ''' 145 | # 这里区分了大小写 实际上验证码一般不区分大小写 146 | # 预测值 147 | predict = tf.reshape(y_conv, [-1, height, width]) 148 | max_predict_idx = tf.argmax(predict, 2) 149 | # 标签 150 | label = tf.reshape(y, [-1, height, width]) 151 | max_label_idx = 
tf.argmax(label, 2) 152 | correct_p = tf.equal(max_predict_idx, max_label_idx) 153 | # reduce_mean求tensor中平均值 154 | accuracy = tf.reduce_mean(tf.cast(correct_p, tf.float32)) 155 | return accuracy 156 | 157 | 158 | def convert2gray(img): 159 | ''' 160 | 图片转为黑白,3维转1维 161 | :param img: 162 | :return: 163 | ''' 164 | if len(img.shape) > 2: 165 | img = np.mean(img, -1) 166 | return img 167 | 168 | 169 | def text2vec(text, captcha_len=CAPTCHA_LEN, captcha_list=CAPTCHA_LIST): 170 | ''' 171 | 验证码文本转为向量 172 | :param text: 173 | :param captcha_len: 174 | :param captcha_list: 175 | :return: vector 176 | ''' 177 | text_len = len(text) 178 | if text_len > captcha_len: 179 | raise ValueError('验证码最长4个字符') 180 | vector = np.zeros(captcha_len * len(captcha_list)) 181 | for i in range(text_len): 182 | vector[captcha_list.index(text[i]) + i * len(captcha_list)] = 1 183 | return vector 184 | 185 | 186 | def vec2text(vec, captcha_list=CAPTCHA_LIST, size=CAPTCHA_LEN): 187 | ''' 188 | 验证码向量转为文本 189 | :param vec: 190 | :param captcha_list: 191 | :param size: 192 | :return: 193 | ''' 194 | # if np.size(np.shape(vec)) is not 1: 195 | # raise ValueError('向量限定为1维') 196 | # vec = np.reshape(vec, (size, -1)) 197 | # vec_idx = np.argmax(vec, 1) 198 | vec_idx = vec 199 | text_list = [captcha_list[v] for v in vec_idx] 200 | return ''.join(text_list) 201 | 202 | 203 | def wrap_gen_captcha_text_and_image(shape=(60, 160, 3)): 204 | ''' 205 | 返回特定shape图片 206 | :param shape: 207 | :return: 208 | ''' 209 | while True: 210 | t, im = get_random_captcha_text_and_image() 211 | if im.shape == shape: 212 | return t, im 213 | 214 | 215 | def next_batch(batch_count=60, width=CAPTCHA_WIDTH, height=CAPTCHA_HEIGHT): 216 | ''' 217 | 获取训练图片组 218 | :param batch_count: 219 | :param width: 220 | :param height: 221 | :return: 222 | ''' 223 | # np.zeros()返回来一个给定形状和类型的用0填充的数组; 224 | batch_x = np.zeros([batch_count, width * height]) 225 | batch_y = np.zeros([batch_count, CAPTCHA_LEN * len(CAPTCHA_LIST)]) 226 | for i in 
range(batch_count): 227 | text, image = wrap_gen_captcha_text_and_image() 228 | image = convert2gray(image) 229 | # 将图片数组一维化 同时将文本也对应在两个二维组的同一行 230 | batch_x[i, :] = image.flatten() / 255 231 | batch_y[i, :] = text2vec(text) 232 | # 返回该训练批次 233 | return batch_x, batch_y 234 | 235 | 236 | def start_train(height=CAPTCHA_HEIGHT, width=CAPTCHA_WIDTH, y_size=len(CAPTCHA_LIST) * CAPTCHA_LEN): 237 | """ 238 | cnn 训练 239 | :param height: 240 | :param width: 241 | :param y_size: 242 | :return: 243 | """ 244 | acc_rate = 0.95 245 | # 按照图片大小申请占位符 246 | x = tf.placeholder(tf.float32, [None, height * width]) # (这里的None表示此张量的第一个维度可以是任何长度的) 247 | y = tf.placeholder(tf.float32, [None, y_size]) 248 | # 防止过拟合 训练时启用 测试时不启用 (过拟合是指为了得到一致假设而使假设变得过度严格) 249 | keep_prob = tf.placeholder(tf.float32) 250 | # cnn模型 251 | y_conv = cnn_graph(x, keep_prob, (height, width)) 252 | # 最优化 253 | optimizer = optimize_graph(y, y_conv) 254 | # 偏差 255 | accuracy = accuracy_graph(y, y_conv) 256 | # 启动会话.开始训练 257 | saver = tf.train.Saver() 258 | sess = tf.Session() 259 | sess.run(tf.global_variables_initializer()) 260 | step = 0 261 | while 1: 262 | batch_x, batch_y = next_batch(64) 263 | sess.run(optimizer, feed_dict={x: batch_x, y: batch_y, keep_prob: 0.75}) 264 | # 每训练一百次测试一次 265 | if step % 100 == 0: 266 | batch_x_test, batch_y_test = next_batch(100) 267 | acc = sess.run(accuracy, feed_dict={x: batch_x_test, y: batch_y_test, keep_prob: 1.0}) 268 | print(datetime.now().strftime('%c'), ' step:', step, ' accuracy:', acc) 269 | # 偏差满足要求,保存模型 270 | if acc > acc_rate: 271 | model_path = os.getcwd() + os.sep + str(acc_rate) + "captcha.model" 272 | saver.save(sess, model_path, global_step=step) 273 | acc_rate += 0.01 274 | if acc_rate > 0.99: 275 | break 276 | step += 1 277 | sess.close() 278 | 279 | 280 | if __name__ == "__main__": 281 | start_train() 282 | -------------------------------------------------------------------------------- /word.md: 
-------------------------------------------------------------------------------- 1 | abcd 1 2 | lock 1 3 | stop 2 4 | aaaa 1 5 | bbbm 1 6 | dddd 1 7 | --------------------------------------------------------------------------------
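A note on rsa.py above: it hard-codes d = 5, which happens to satisfy (e * d - 1) % z == 0 for p = 5, q = 7, e = 5. In general d is the modular inverse of e modulo z. The sketch below (an assumption-laden illustration in plain Python 3, not part of the repo) derives d with the extended Euclidean algorithm and round-trips one message:

```python
# Hedged sketch: derive the RSA private exponent d = e^(-1) mod z
# via the extended Euclidean algorithm, using rsa.py's toy parameters.

def egcd(a, b):
    # Returns (g, x, y) such that a*x + b*y == g == gcd(a, b).
    if b == 0:
        return a, 1, 0
    g, x, y = egcd(b, a % b)
    return g, y, x - (a // b) * y

def modinv(e, z):
    # Modular inverse of e mod z; only exists when gcd(e, z) == 1.
    g, x, _ = egcd(e, z)
    if g != 1:
        raise ValueError('e and z must be coprime')
    return x % z

p, q = 5, 7
n, z = p * q, (p - 1) * (q - 1)   # n = 35, z = 24
e = 5
d = modinv(e, z)                  # 5*5 = 25 ≡ 1 (mod 24), so d == 5

msg = 12
cipher = pow(msg, e, n)           # encrypt: msg^e mod n
plain = pow(cipher, d, n)         # decrypt: cipher^d mod n, recovers msg
```

Three-argument `pow` keeps the intermediate numbers small, unlike the `pow(item, e) % n` form in rsa.py, which computes the full power first.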