├── .travis.yml ├── README.md ├── ac.py ├── avl_tree.py ├── base64_str.py ├── btree.py ├── calc24.py ├── celery └── tasks.py ├── compress.py ├── coroutine.py ├── crawl_360 ├── crawl_360 │ ├── __init__.py │ ├── __init__.pyc │ ├── items.py │ ├── items.pyc │ ├── middlewares.py │ ├── models │ │ ├── __init__.py │ │ ├── __init__.pyc │ │ ├── db.py │ │ ├── db.pyc │ │ ├── models.py │ │ └── models.pyc │ ├── pipelines.py │ ├── pipelines.pyc │ ├── reademe │ │ └── sql.sql │ ├── settings.py │ ├── settings.pyc │ └── spiders │ │ ├── __init__.py │ │ ├── __init__.pyc │ │ ├── butian.py │ │ └── butian.pyc └── scrapy.cfg ├── dispatch.py ├── hashtable.py ├── heapq_sort.py ├── httpstat.py ├── img ├── ac_fail_pointer.png ├── btree.png ├── cmd.png ├── crawl_db_data.png ├── crawl_run.gif ├── download.gif ├── knn.png ├── redpackage.gif ├── spider-wx.png ├── svm.png └── tire.png ├── interpreter.py ├── kmp.py ├── knn.py ├── linked_list.py ├── nice_download.py ├── palindrome.py ├── rb_tree.py ├── red_package_optimize.py ├── redpackage.py ├── revert_list.py ├── rpn.py ├── rsa.py ├── selenium.py ├── svm.py ├── tensorflow ├── cnn_test.py ├── create_captcha_img.py └── train.py └── word.md /.travis.yml: -------------------------------------------------------------------------------- 1 | language: python 2 | script: true -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Python 2 | [![contributions welcome](https://img.shields.io/badge/contributions-welcome-brightgreen.svg?style=flat)](https://github.com/LockGit/gochat/issues) 3 | 4 | ### 基于mitm中间人的形式获取所有公众号历史文章,评论,阅读量 5 | ``` 6 | feature 7 | 1,支持水平扩展 8 | 2,支持增量更新与自动化抓取阅读量,评论数据 9 | 3,支持抓取封禁后继续恢复抓取 10 | 4,支持各阶段时间频率控制 11 | 5,支持持续监控指定公众号 12 | ``` 13 | ![](https://github.com/LockGit/Py/blob/master/img/spider-wx.png) 14 | 15 | ### nice_download.py 多线程文件下载器 16 | ``` 17 | 理论在大型文件下载,带宽充足的情况下,可增加数十倍下载速度 18 | 原理是多线程对目标文件分块下载 19 | 1,发送head请求获取目标文件总大小,以及当前是否支持分块下载(详情:http协议header头range及response的content-range),现在基本都支持 20 | 2,下载前创建一个和要下载文件一样大小的文件 21 | 3,根据1中获得的文件大小分块多线程,各个线程下载不同的数据块 22 | 小型文件可能看不出加速效果,在大型文件上就会拉大差距 23 | 关于http的range特性: 24 | 有些文件下载器在下载中断之后可以在中断位置继续下载,而不必重新开始的原因就是利用了支持range的特性 25 | 记录了中断时的文件偏移位置,在实现时只要在中断异常的时候记录文件偏移位置到临时文件 26 | 下次继续下载读取临时文件中的偏移即可支持断点下载,下载完成时删除记录文件偏移的临时文件即可 27 | 说明: 28 | nice_download.py是多线程模式,所以去除断点下载功能,否则维护临时文件偏移位置比维护单一进程的临时文件偏移位置要复杂的多 29 | 查看帮助:python nice_download.py -h 30 | ``` 31 | ![](https://github.com/LockGit/Py/blob/master/img/download.gif) 32 | 33 | 34 | ### 基于tensorflow的验证码识别 35 | ``` 36 | 依赖: 37 | pip install tensorflow 38 | pip install numpy 39 | 40 | 0x01,cd tensorflow 41 | 0x02,模型训练:python train.py 42 | 0x03,验证验证:python cnn_test.py 43 | 44 | 已有大多相关案例,测试相关总结与截图如下: 45 | ``` 46 | ![](https://github.com/LockGit/Hacking/blob/master/img/cnn_test.png) 47 | 48 | [详相说明相关截图](https://github.com/LockGit/Hacking#基于机器学习tensorflow的复杂验证码识别) 49 | 50 | 总结文档:[基于机器学习(TensorFlow)的复杂验证码识别.pdf](https://github.com/LockGit/Hacking/blob/master/res/doc/基于机器学习(TensorFlow)的复杂验证码识别.pdf) 51 | 52 | 53 | ### redpackage.py && red_package_optimize.py 一种红包分配思路 54 | ``` 55 | red_package_optimize.py为优化版,redpackage.py的range有点浪费内存,比如在红包个数特别大的情况下 56 | 57 | 指定红包总金额,再指定红包的个数,获得每个红包分配金额详情 58 | 59 | 例,红包总金额为10元,分成7个 60 | ➜ Py git:(master) ✗ py redpackage.py 10 7 61 | [0.57, 2.37, 1.91, 0.32, 1.3, 2.24, 1.29] 62 | 第 1 个红包金额:0.57元 63 | 第 2 个红包金额:2.37元 64 | 第 3 个红包金额:1.91元 65 | 第 4 个红包金额:0.32元 66 | 第 5 个红包金额:1.3元 67 | 第 6 个红包金额:2.24元 68 | 第 7 个红包金额:1.29元 69 | 验证:红包总金额 is 10.0元, 分配后 res sum is 10.0元 70 | ``` 71 | ![](https://github.com/LockGit/Py/blob/master/img/redpackage.gif) 72 | 73 | ### ac.py 字符串搜索算法(tire树+AC自动机) 74 | ``` 75 | 学习记录: 76 | 如果你的本地只有几个,几十个词,那么没有必要使用,直接存配置文件,字典查找即可, 77 | 这比向api发起http请求要快的多。但如果词的数目不断增加,那么后期将不利于维护, 78 | 需要服务化。 79 | 80 | 这个算法存在于实际场景,判断某个单词是否是敏感词,就涉及到字符串查找。 81 | 敏感词被封装成了一个api接口,使用起来也很方便,直接向api提交单词, 82 | 看返回结果就知道是否命中,命中了则字符串存在,表明查找到了。 83 | 84 | 需要数据结构与算法知识: 85 | 参考文档1(海量数据处理之Tire树(字典树)): 86 | http://blog.csdn.net/ts173383201/article/details/7858598 87 | 参考文档2(AC自动机总结): 88 | http://blog.csdn.net/mobius_strip/article/details/22549517 89 | 90 | trie的核心思想是空间换时间,跟彩虹表的思想一致,但trie树不是彩虹表, 91 | 简而言之,trie树利用字符串的公共前缀来降低查询时间的开销以达到提高效率的目的。 92 | 93 | 它有3个基本性质: 94 | 根节点不包含字符,除根节点外每一个节点都只包含一个字符。 95 | 从根节点到某一节点,路径上经过的字符连接起来,为该节点对应的字符串。 96 | 每个节点的所有子节点包含的字符都不相同。 97 | 98 | 复制了别人画的图,大致就是一种如下的树结构,需要用语言构造这棵树即可: 99 | ``` 100 | ![](https://github.com/LockGit/Py/blob/master/img/tire.png) 101 | 102 | ``` 103 | fail 指针的理解图解,以下内容需要仔细读 104 | 参考:http://www.cnblogs.com/crazyacking/p/4659501.html 105 | ``` 106 | ![](https://github.com/LockGit/Py/blob/master/img/ac_fail_pointer.png) 107 | 108 | ``` 109 | 树上的词分别是: 110 | { he , hers , his , she} 111 | 按图所示分成3层。看到第三层,是"she",其中: 112 | ①s指向root 113 | ②h先找到s的fail指针 114 | 发现是0号指针,不是h,然后h就不高兴了,再问问s的fail指针root:“你有没有儿子和我同名叫h的” 115 | root说:“有,你指向他吧”,然后h就高兴的指向了第一行的h. 116 | ③e开始找了,首先问他老爸h:“你的fail指针指着谁” 117 | h说:“图上第一行那个h啊” 118 | 然后e就屁颠屁颠地跑去问图上第一行那个h:“你有没有名字和我一样的儿子啊” 119 | 图上第一行那个h说:“有,他地址是xxx” 120 | 最后e的fail指针就指向xxx地址,也就是第一行那个e了 121 | 发现这样,如果一个字符串查到第三行的e以后的字符才不匹配,那说明他前面应该有个‘he’ 122 | 刚好e的失败指针指向的是第一行的‘he...’的那个e; 123 | 这样就不用从h开始再找一遍,而是接着第一行的e继续往后找,从而节省了时间. 124 | ``` 125 | 126 | ``` 127 | ➜ ~ du -h word.md && wc -l word.md 128 | 1.0M word.md 129 | 57193 word.md 130 | 131 | 本地测试了一下,57000条记录大于占1M硬盘空间,那么6M的空间大约包含记录34W条记录, 132 | 我传到github的word.md没有几个字符,只做了演示,而且每个单词还加了rank等级,\t制表符,实际占用空间应该更小, 133 | 生产环境甚至可以直接将这些数据缓存到内存中。 134 | ``` 135 | ![](https://github.com/LockGit/Py/blob/master/img/cmd.png) 136 | ``` 137 | 测试搜索指定字符串: 138 | 139 | 查找到了 140 | ➜ ~ python ac.py lock 141 | Good ! Find it, the item is: 142 | [(0, 3, 'lock', 1, 2)] 143 | 144 | 查找到了 145 | ➜ ~ python ac.py stop 146 | Good ! Find it, the item is: 147 | [(0, 3, 'stop', 2, 3)] 148 | 149 | 没有查找到 150 | ➜ ~ python ac.py test 151 | Sorry, The item not in file dict 152 | 153 | 如果查找到了返回一个list,list中item类型为tuple, 并且包含了在树中匹配的起,终点位置index 154 | ``` 155 | 156 | ### calc24.py 算24游戏小程序 157 | ``` 158 | 游戏规则:给定4个数,可以执行的运算有 + - * / , 求出算的结果是24的算法过程 159 | 160 | get help: 161 | ➜ Py git:(master) ✗ py calc24.py -h 162 | Usage: usage -n 1,2,3,4 163 | 164 | Options: 165 | -h, --help show this help message and exit 166 | -n NUMS specify num list 167 | 168 | exp: 169 | ➜ Py git:(master) ✗ py calc24.py -n 10,8,9,4 170 | [10, 8, 9, 4] 171 | 9 - 10 = -1 172 | 4 + -1 = 3 173 | 8 * 3 = 24 174 | Success 175 | 176 | or random test: 177 | ➜ Py git:(master) ✗ py calc24.py 178 | [9, 10, 3, 6] 179 | 10 - 9 = 1 180 | 3 + 1 = 4 181 | 6 * 4 = 24 182 | Success 183 | 184 | ~~~python轮子很强大~~~ 185 | ``` 186 | 187 | ### rpn.py 逆波兰表达式 python 版实现 188 | ``` 189 | 逆波兰表达式被广泛应用于编译原理中,是一种是由波兰数学家扬·武卡谢维奇1920年引入的数学表达式方式,在逆波兰记法中, 190 | 所有操作符置于操作数的后面,因此也被称为后缀表示法。逆波兰记法不需要括号来标识操作符的优先级。 191 | 以利用堆栈结构减少计算机内存访问。 192 | ➜ Py git:(master) ✗ python rpn.py 193 | ['11111111111111', '9999999999999', '*', '99', '12', '4', '/', '-', '10', '+', '+'] 194 | True 111111111111098888888888995 111111111111098888888888995 195 | True 326 326 196 | ``` 197 | 198 | 199 | ### dispatch.py 轮转队列 | 协程实现 200 | ``` 201 | 你的手头上会有多个任务,每个任务耗时很长,而你又不想同步处理,而是希望能像多线程一样交替执行。 202 | yield 没有逻辑意义,仅是作为暂停的标志点。 203 | 程序流可以在此暂停,也可以在此恢复。而通过实现一个调度器,完成多个任务的并行处理。 204 | 通过轮转队列依次唤起任务,并将已经完成的任务清出队列,模拟任务调度的过程。 205 | 核心代码: 206 | from collections import deque 207 | class Runner(object): 208 | def __init__(self, tasks): 209 | self.tasks = deque(tasks) 210 | 211 | def next(self): 212 | return self.tasks.pop() 213 | 214 | def run(self): 215 | while len(self.tasks): 216 | task = self.next() 217 | try: 218 | next(task) 219 | except StopIteration: 220 | pass 221 | else: 222 | self.tasks.appendleft(task) 223 | 224 | def task(name, times): 225 | for i in range(times): 226 | yield 227 | print(name, i) 228 | 229 | Runner([ 230 | task('hsfzxjy', 5), 231 | task('Jack', 4), 232 | task('Bob', 6) 233 | ]).run() 234 | ``` 235 | 236 | ### coroutine.py 通过gevent第三方库实现协程 237 | ``` 238 | 上面的dispatch.py通过yield提供了对协程的支持,模拟了任务调度。而下面的这个gevent第三方库就更简单了。 239 | 240 | 第三方的gevent为Python提供了比较完善的协程支持。通过greenlet实现协程,其基本思想是: 241 | 当一个greenlet遇到IO操作时,比如访问网络,就自动切换到其他的greenlet,等到IO操作完成,再在适当的时候切换回来继续执行。 242 | 由于IO操作非常耗时,经常使程序处于等待状态,有了gevent为我们自动切换协程,就保证总有greenlet在运行,而不是等待IO。 243 | 244 | 由于切换是在IO操作时自动完成,所以gevent需要修改Python自带的一些标准库,这一过程在启动时通过monkey patch完成: 245 | 246 | 依赖: 247 | pip install gevent 248 | 249 | 执行: 250 | ➜ Py git:(master) ✗ python coroutine.py 251 | GET: https://www.python.org/ 252 | GET: https://www.yahoo.com/ 253 | GET: https://github.com/ 254 | 91430 bytes received from https://github.com/. 255 | 47391 bytes received from https://www.python.org/. 256 | 461975 bytes received from https://www.yahoo.com/. 257 | 258 | ``` 259 | 260 | 261 | ### base64_str.py base64编码原理 262 | ``` 263 | base64编码原理,使用Python实现base64编码,可能有bug,未完全完善版 264 | 1,准备一个包含64个字符的数组 265 | 2,对二进制数据进行处理,每3个字节一组,一共是3x8=24bit,划为4组,每组正好6个bit 266 | 3,得到4个数字作为索引,然后查表,获得相应的4个字符,就是编码后的字符串 267 | 4,如果要编码的二进制数据不是3的倍数,最后会剩下1个或2个字节,Base64用\x00字节在末尾补足后,再在编码的末尾加上1个或2个=号, 268 | 表示补了多少字节,解码的时候,会自动去掉。 269 | 270 | Base64编码会把3字节的二进制数据编码为4字节的文本数据,长度增加33% 271 | 272 | 例: 273 | ➜ Py git:(master) ✗ python base64_str.py lock 274 | bG9jaw== 275 | ➜ Py git:(master) ✗ echo -n lock|base64 276 | bG9jaw== 277 | 278 | ``` 279 | 280 | ### rsa.py RSA算法演示 281 | ``` 282 | ➜ py python rsa.py 283 | 下面是一个RSA加解密算法的简单演示: 284 | 285 | 报文 加密 加密后密文 286 | 287 | 12 248832 17 288 | 15 759375 15 289 | 22 5153632 22 290 | 5 3125 10 291 | 292 | 293 | --------------------------- 294 | ----------执行解密--------- 295 | --------------------------- 296 | 原始报文 密文 加密 解密报文 297 | 298 | 12 17 1419857 12 299 | 15 15 759375 15 300 | 22 22 5153632 22 301 | 5 10 100000 5 302 | ``` 303 | 304 | ### selenium.py 自动化测试demo 305 | ``` 306 | 坑1: 307 | 执行 python selenium.py 始终无法唤醒chrome。 308 | 最终发现chromedriver很早之前安装的,没有进行:brew upgrade chromedriver,导致执行脚本时报错 309 | upgrade chromedriver 之后解决问题,官方文档说明了selenium支持好几个Browser driver。 310 | 演示时用的是Chrome,python的unittest模块,文档上说也可以用pytest 311 | 312 | 大致支持这以下几种DOM查找,不同语言的接口略微的小区别 313 | driver.findElement(By.id()) 314 | driver.findElement(By.name()) 315 | driver.findElement(By.className()) 316 | driver.findElement(By.tagName()) 317 | driver.findElement(By.linkText()) 318 | driver.findElement(By.partialLinkText()) 319 | driver.findElement(By.cssSelector()) 320 | driver.findElement(By.xpath()) 321 | 322 | 支持Using Selenium with remote WebDriver 323 | 支持远程WebDriver,默认监听4444端口 324 | 启动:brew services start selenium-server-standalone 325 | 停止:brew services stop selenium-server-standalone 326 | 访问http://127.0.0.1:4444 点击console, 327 | 新建正在测试所使用的webdriver,对于正在运行driver的测试程序,可以截图看当前测试程序的运行位置 328 | ``` 329 | 330 | 331 | ### Python 沙箱逃逸 332 | ``` 333 | 重温2012.hack.lu的比赛题目,在这次挑战中,需要读取'./1.key'文件的内容。 334 | 他们首先通过删除引用来销毁打开文件的内置函数。然后它们允许您执行用户输入。看看他们的代码稍微修改的版本: 335 | 336 | def make_secure(): 337 | UNSAFE = ['open', 338 | 'file', 339 | 'execfile', 340 | 'compile', 341 | 'reload', 342 | '__import__', 343 | 'eval', 344 | 'input'] 345 | for func in UNSAFE: 346 | del __builtins__.__dict__[func] 347 | from re import findall 348 | # Remove dangerous builtins 349 | make_secure() 350 | print 'Go Ahead, Expoit me >;D' 351 | while True: 352 | try: 353 | # Read user input until the first whitespace character 354 | inp = findall('\S+', raw_input())[0] 355 | a = None 356 | # Set a to the result from executing the user input 357 | exec 'a=' + inp 358 | print 'Return Value:', a 359 | except Exception, e: 360 | print 'Exception:', e 361 | 由于没有在__builtins__中引用file和open,所以常规的编码技巧是行不通的。但可以在Python解释器中挖掘出另一种代替file或open引用的方法。 362 | 363 | 另类读取文件的方式: 364 | ().__class__.__bases__[0].__subclasses__()[40]('1.key').read() 365 | 这个方法依然可以读取到1.key的内容,coder,hack,geek可以深入了解下,本人测试时的python版本为:Python 2.7.12 366 | ``` 367 | 368 | ### avl_tree.py 平衡二叉搜索树 369 | ``` 370 | 特点: 371 | 1、若它的左子树不为空,则左子树上所有的节点值都小于它的根节点值。 372 | 2、若它的右子树不为空,则右子树上所有的节点值均大于它的根节点值。 373 | 3、它的左右子树也分别可以充当为二叉查找树。 374 | 4、每个节点的左子树和右子树的高度差至多等于1。 375 | 376 | 如果普通二叉搜索树的深度很高且单一左边节点很多或者单一右边节点很多,那么查找性能几乎就变成了线性的 377 | 而平衡二叉树的每个节点的左子树和右子树的高度差至多等于1,这种树结构的查找性能时间复杂度趋向lgn 378 | 379 | ➜ Py git:(master) ✗ py avl_tree.py 380 | 8 381 | 9 382 | 1 383 | 384 | ``` 385 | ### rb_tree.py 红黑树 386 | ``` 387 | 红黑树多用在内部排序,即全放在内存中的,微软STL的map和set的内部实现就是红黑树。 388 | B树多用在内存里放不下,大部分数据存储在外存上时。因为B树层数少,因此可以确保每次操作,读取磁盘的次数尽可能的少。 389 | 在数据较小,可以完全放到内存中时,红黑树的时间复杂度比B树低。 390 | 反之,数据量较大,外存中占主要部分时,B树因其读磁盘次数少,而具有更快的速度。 391 | 392 | 特点: 393 | (1)每个节点或者是黑色,或者是红色。 394 | (2)根节点是黑色。 395 | (3)每个叶子节点(NIL)是黑色。 [注意:这里叶子节点,是指为空(NIL或NULL)的叶子节点!] 396 | (4)如果一个节点是红色的,则它的子节点必须是黑色的。 397 | (5)从一个节点到该节点的子孙节点的所有路径上包含相同数目的黑节点。 398 | ``` 399 | 400 | ### revert_list.py 反转链表 401 | ``` 402 | ➜ Py git:(master) ✗ py revert_list.py 403 | 1 404 | 2 405 | 3 406 | start revert list ... 407 | 3 408 | 2 409 | 1 410 | ``` 411 | 412 | ### palindrome.py python版回文数,heapq_sort.py基于堆排序 413 | ``` 414 | life is short , use python 415 | -(1)时间复杂度:O(n),空间复杂度:O(1)。从两头向中间扫描 416 | -(2)时间复杂度:O(n),空间复杂度:O(1)。先从中间开始、然后向两边扩展 417 | 418 | 堆排实现,python对有对应封装好的heapq模块 419 | py heapq_sort.py 420 | ``` 421 | 422 | 423 | ### kmp.py kmp字符串查找算法 424 | ``` 425 | ➜ Py git:(master) ✗ python kmp.py 426 | Found 'sase' start at string 'asfdehhaassdsdasasedwa' 15 index position, find use times: 23 427 | Found 'sase' start at string '12s3sasexxx' 4 index position, find use times: 9 428 | 429 | 核心算法: 430 | def kmp(string, match): 431 | n = len(string) 432 | m = len(match) 433 | i = 0 434 | j = 0 435 | count_times_used = 0 436 | while i < n: 437 | count_times_used += 1 438 | if match[j] == string[i]: 439 | if j == m - 1: 440 | print "Found '%s' start at string '%s' %s index position, find use times: %s" % (match, string, i - m + 1, count_times_used,) 441 | return 442 | i += 1 443 | j += 1 444 | elif j > 0: 445 | j = j - 1 446 | else: 447 | i += 1 448 | ``` 449 | 450 | 451 | ### compress.py 字符串压缩 452 | ``` 453 | 针对连续重复较多的字符压缩,否则不起压缩效果 454 | ➜ Py git:(master) ✗ python compress.py 455 | 原始字符串:xAAACCCBBDBB111 456 | 压缩后:x1A3C3B2D1B213 457 | 执行解压... 458 | x 459 | A 460 | A 461 | A 462 | C 463 | C 464 | C 465 | B 466 | B 467 | D 468 | B 469 | B 470 | 1 471 | 1 472 | 1 473 | 解压完毕 474 | 解压后:xAAACCCBBDBB111 475 | ``` 476 | 477 | ### hashtable.py hash表实现 478 | ``` 479 | hash_table = HashTable(5); # 分配5块 480 | hash_table.set(1,'x') 481 | print hash_table.get(1) 482 | 483 | 核心代码: 484 | class Item(object): 485 | def __init__(self, key, value): 486 | self.key = key 487 | self.value = value 488 | 489 | 490 | class HashTable(object): 491 | def __init__(self, size): 492 | self.size = size 493 | self.table = [[] for _ in xrange(self.size)] 494 | 495 | def hash_function(self, key): 496 | return key % self.size 497 | 498 | def set(self, key, value): 499 | hash_index = self.hash_function(key) 500 | for item in self.table[hash_index]: 501 | if item.key == key: 502 | item.value = value 503 | return 504 | self.table[hash_index].append(Item(key, value)) 505 | 506 | def get(self, key): 507 | hash_index = self.hash_function(key) 508 | for item in self.table[hash_index]: 509 | if item.key == key: 510 | return item.value 511 | return None 512 | 513 | def remove(self, key): 514 | hash_index = self.hash_function(key) 515 | for i, item in enumerate(self.table[hash_index]): 516 | if item.key == key: 517 | del self.table[hash_index][i] 518 | ``` 519 | 520 | 521 | ### interpreter.py Python解释器理解 522 | ``` 523 | Python会执行其他3个步骤:词法分析,语法解析和编译。 524 | 这三步合起来把源代码转换成code object,它包含着解释器可以理解的指令。而解释器的工作就是解释code object中的指令。 525 | 核心代码 526 | class Interpreter: 527 | def __init__(self): 528 | self.stack = [] 529 | 530 | def load_value(self, number): 531 | self.stack.append(number) 532 | 533 | def print_answer(self): 534 | answer = self.stack.pop() 535 | print(answer) 536 | 537 | def add_two_values(self): 538 | first_num = self.stack.pop() 539 | second_num = self.stack.pop() 540 | total = first_num + second_num 541 | self.stack.append(total) 542 | 543 | def run_code(self, what_to_execute): 544 | instructions = what_to_execute["instructions"] 545 | numbers = what_to_execute["numbers"] 546 | for each_step in instructions: 547 | instruction, argument = each_step 548 | if instruction == "load_value": 549 | number = numbers[argument] 550 | self.load_value(number) 551 | elif instruction == "add_two_values": 552 | self.add_two_values() 553 | elif instruction == "print_answer": 554 | self.print_answer() 555 | ``` 556 | 557 | 558 | ### linked_list.py 快速查找单链表中间节点 559 | ``` 560 | ➜ Py git:(master) py linked_list.py 561 | 普通遍历方式,单链表中间节点为:n3,索引为:2,遍历一次链表,在从0遍历到中间位置 562 | 快慢指针方式,单链表中间节点为:n3,索引为:2,只遍历一次链表 563 | 564 | 核心代码: 565 | class Node(object): 566 | def __init__(self,data,next): 567 | self.data=data 568 | self.next=next 569 | 570 | n1 = Node('n1',None) 571 | n2 = Node('n2',n1) 572 | n3 = Node('n3',n2) 573 | n4 = Node('n4',n3) 574 | n5 = Node('n5',n4) 575 | 576 | head = n5 # 链表的头节点 577 | 578 | p1 = head # 一次步进1个node 579 | p2 = head # 一次步进2个node 580 | 581 | step = 0 582 | while (p2.next is not None and p2.next.next is not None): 583 | p2 = p2.next.next 584 | p1 = p1.next 585 | step = step + 1 586 | print '快慢指针方式,单链表中间节点为:%s,索引为:%s,只遍历一次链表' % (p1.data,step) 587 | ``` 588 | 589 | ### K最近邻算法 590 | ``` 591 | 这个算法比svm简单很多 592 | 只需使用初中所学的两点距离公式(欧拉距离公式),计算目标点到各组的距离,看绿点和哪组更接近。 593 | k代表取当前要分类的点最近的k个点,这k个点如果其中属于红点个数占多数,我们就认为绿点应该划分为红组,反之,则划分为黑组。 594 | k值与分类数成正相关,现在是2个分组,那么k值取3,假设是3个分组,那么k值就要取5 595 | 参考说明:https://zh.wikipedia.org/wiki/最近鄰居法 596 | 依赖: 597 | pip install numpy 598 | pip install matplotlib 599 | 600 | 下图中标注较大的红点在计算之后被分配到红组 601 | 执行:python knn.py 602 | ``` 603 | ![](https://github.com/LockGit/Py/blob/master/img/knn.png) 604 | 605 | 606 | ### 支持向量机 svm.py 607 | ``` 608 | 迟早会忘记的svm 609 | 属分类算法,目标是寻找一个最优超平面,比knn算法复杂 610 | demo为线性可分离数据 611 | 612 | 参考1:https://zh.wikipedia.org/zh-hans/支持向量机 613 | 参考2:http://blog.csdn.net/viewcode/article/details/12840405 614 | 参考3:http://blog.csdn.net/lisi1129/article/details/70209945?locationNum=8&fps=1 615 | 616 | 依赖: 617 | pip install numpy 618 | pip install matplotlib 619 | 620 | 执行:python svm.py 621 | ``` 622 | ![](https://github.com/LockGit/Py/blob/master/img/svm.png) 623 | 624 | 625 | ### (前序,中序,后序,层序) btree.py 626 | ``` 627 | ➜ Py git:(master) ✗ python btree.py 628 | 前序遍历: root A C D F G B E 629 | 中序遍历: C F D G A root B E 630 | 后序遍历: F G D C A E B root 631 | 层序遍历: root A B C E D F G 632 | 构造树结构如下图 633 | ``` 634 | ![](https://github.com/LockGit/Py/blob/master/img/btree.png) 635 | 636 | 637 | ### Scrapy 爬虫测试(项目代码在仓库crawl_360目录下) 638 | ``` 639 | 安装依赖: 640 | pip install Scrapy 641 | pip install sqlalchemy 642 | pip install sqlacodegen 643 | pip install mysql-connector 644 | 645 | 创建db:CREATE DATABASE crawl DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci 646 | 647 | 创建表:crawl_360/readme/sql.sql 文件 648 | 649 | sqlacodegen创建models: 650 | sqlacodegen --outfile=models.py mysql://root@localhost:3306/crawl --tables butian 651 | 652 | 653 | 找测试的目标抓取页面:http://butian.360.cn/Loo 页面被披露漏洞的企业列表 654 | 655 | 创建项目: scrapy startproject crawl_360 656 | 657 | 目录结构: 658 | ➜ crawl_360 tree 659 | . 660 | ├── crawl_360 661 | │   ├── __init__.py 662 | │   ├── __init__.pyc 663 | │   ├── items.py 664 | │   ├── items.pyc 665 | │   ├── middlewares.py 666 | │   ├── models 667 | │   │   ├── __init__.py 668 | │   │   ├── __init__.pyc 669 | │   │   ├── db.py 670 | │   │   ├── db.pyc 671 | │   │   ├── models.py 672 | │   │   └── models.pyc 673 | │   ├── pipelines.py 674 | │   ├── pipelines.pyc 675 | │   ├── reademe 676 | │   │   └── sql.sql 677 | │   ├── settings.py 678 | │   ├── settings.pyc 679 | │   └── spiders 680 | │   ├── __init__.py 681 | │   ├── __init__.pyc 682 | │   ├── butian.py 683 | │   └── butian.pyc 684 | └── scrapy.cfg 685 | 686 | 生成一个爬虫: 687 | cd crawl_360 && scrapy genspider butian butian.360.cn/Loo 688 | 689 | 编写爬虫代码 (crawl_360目录下,xpath代码30行即可) 690 | 691 | 爬取:scrapy crawl butian 692 | 693 | 另:selenium也是一款非常不错的工具,可是使用selenium调用Browser driver更加逼真真实用户操作 694 | ``` 695 | ![](https://github.com/LockGit/Py/blob/master/img/crawl_run.gif) 696 | ![](https://github.com/LockGit/Py/blob/master/img/crawl_db_data.png) 697 | 698 | 699 | 700 | ### Celery 分布式任务队列Test (仓库celery文件夹下) 701 | ``` 702 | pip3 install celery 703 | pip3 install redis 704 | 编写tasks.py 705 | ``` 706 | ```python 707 | from celery import Celery 708 | 709 | app = Celery('TASK', broker='redis://127.0.0.1', backend='redis://127.0.0.1') 710 | 711 | 712 | @app.task 713 | def add(x, y): 714 | print 'start ...' 715 | print 'get param :%s,%s' % (x, y,) 716 | return x + y 717 | ``` 718 | ``` 719 | 启动celery worker 来开始监听并执行任务 720 | celery -A tasks worker --loglevel=info 721 | tasks 任务文件名,worker 任务角色,--loglevel=info 任务日志级别 722 | 723 | 127.0.0.1:6379> keys * 724 | 1) "_kombu.binding.celery" 725 | 2) "_kombu.binding.celeryev" 726 | 3) "_kombu.binding.celery.pidbox" 727 | 127.0.0.1:6379> 728 | 729 | redis 集合结构(set),查看value: 730 | SMEMBERS _kombu.binding.celery 731 | 732 | 在tasks.py文件目录打开终端进入py的交互式模式 733 | >>> from tasks import add 734 | >>> add.delay(1,2) 735 | 736 | >>> t = add.delay(4,5) 737 | >>> t.get() 738 | 9 739 | >>> t.ready() 740 | True 741 | 742 | celery常用接口 743 | tasks.add(4,6) ---> 本地执行 744 | tasks.add.delay(3,4) --> worker执行 745 | t=tasks.add.delay(3,4) --> t.get() 获取结果,或卡住,阻塞 746 | t.ready()---> False:未执行完,True:已执行完 747 | t.get(propagate=False) 抛出简单异常,但程序不会停止 748 | t.traceback 追踪完整异常 749 | 750 | 计算结果保存在redis中,默认结果有效期为1天 751 | 127.0.0.1:6379> ttl celery-task-meta-6eb3ee46-e86d-409a-9eb5-0c7d9b005035 752 | (integer) 85917 753 | 127.0.0.1:6379> get celery-task-meta-6eb3ee46-e86d-409a-9eb5-0c7d9b005035 754 | "{\"status\": \"SUCCESS\", \"traceback\": null, \"result\": 9, \"task_id\": \"6eb3ee46-e86d-409a-9eb5-0c7d9b005035\", \"children\": []}" 755 | 127.0.0.1:6379> 756 | ``` 757 | -------------------------------------------------------------------------------- /ac.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # @Author: lock 3 | # @Date: 2017-05-08 16:32:38 4 | # @Last Modified by: lock 5 | # @Last Modified time: 2017-05-08 22:48:16 6 | import time 7 | import logging 8 | import sys 9 | 10 | log = logging.getLogger('dict_word') 11 | 12 | 13 | _word_cells = {} 14 | #预先生成好组成单词的字符 15 | for c in [chr(i) for i in range(ord('a'), ord('z') + 1)]: 16 | _word_cells[unicode(c)] = 1 17 | for c in [chr(i) for i in range(ord('A'), ord('Z') + 1)]: 18 | _word_cells[unicode(c)] = 1 19 | for c in [chr(i) for i in range(ord('0'), ord('9') + 1)]: 20 | _word_cells[unicode(c)] = 1 21 | 22 | 23 | #固定的英文单词组成部分 24 | _word_cells[u'_'] = 1 25 | _word_cells[u'-'] = 1 26 | 27 | # 缓存 28 | _cache = { 29 | 'acm': None, 30 | 'load_time': 0 31 | } 32 | 33 | #词默认等级 34 | DEFAULT_RANK=1 35 | 36 | 37 | def isWordCell(a): 38 | ''' 39 | 当前字符是否为单词的非边界,或者是组成部分 40 | :param a: 41 | :return: 42 | ''' 43 | return a in _word_cells 44 | 45 | 46 | 47 | class Node(object): 48 | ''' 49 | Node树节点 50 | :next : 用dict字典结构模拟动态链表 51 | :fail : 辅助初始值None 52 | :param isWord : 当前树节点是否为存在的单词 53 | :param rank : 等级 54 | ''' 55 | def __init__(self): 56 | self.next = {} 57 | self.fail = None 58 | self.isWord = False 59 | self.rank = 0 60 | 61 | 62 | class Ahocorasick(object): 63 | def __init__(self): 64 | self.__root = Node() 65 | 66 | 67 | def make(self): 68 | ''' 69 | build the fail function 70 | 构建自动机,失效函数 71 | ''' 72 | tmpQueue = [] 73 | tmpQueue.append(self.__root) 74 | while (len(tmpQueue) > 0): 75 | temp = tmpQueue.pop() 76 | p = None 77 | for k, v in temp.next.items(): 78 | if temp == self.__root: 79 | temp.next[k].fail = self.__root 80 | else: 81 | p = temp.fail 82 | while p is not None: 83 | if p.next.has_key(k): 84 | temp.next[k].fail = p.next[k] 85 | break 86 | p = p.fail 87 | if p is None: 88 | temp.next[k].fail = self.__root 89 | tmpQueue.append(temp.next[k]) 90 | 91 | 92 | def addWord(self, word, rank=1,line=0): 93 | ''' 94 | @param word: add word to Tire tree 95 | 添加关键词到Tire树中 96 | ''' 97 | word = word.lower() 98 | tmp = self.__root 99 | for i in range(0, len(word)): 100 | if not tmp.next.has_key(word[i]): 101 | tmp.next[word[i]] = Node() 102 | tmp = tmp.next[word[i]] 103 | tmp.isWord = True 104 | tmp.rank = rank 105 | tmp.line = line 106 | 107 | 108 | def search(self, content): 109 | ''' 110 | @return 如果查找到了返回一个list,list中item类型为tuple, 并且包含了匹配的起,终点位置index 111 | ''' 112 | #不区分大小写 113 | raw_content=content 114 | content = content.lower() 115 | 116 | p = self.__root 117 | result = [] 118 | startWordIndex = 0 119 | endWordIndex = -1 120 | currentPosition = 0 121 | 122 | content_len = len(content) 123 | while currentPosition < content_len: 124 | word = content[currentPosition] 125 | #print 'word:', word 126 | # 检索状态机,直到匹配 127 | while p.next.has_key(word) == False and p != self.__root: 128 | p = p.fail 129 | 130 | if p.next.has_key(word): 131 | if p == self.__root: 132 | # 若当前节点是根且存在转移状态,则说明是匹配词的开头,记录词的起始位置 133 | startWordIndex = currentPosition 134 | # 转移状态机的状态 135 | p = p.next[word] 136 | else: 137 | p = self.__root 138 | 139 | if p.isWord: 140 | # 若状态为词的结尾,则把词放进结果集 141 | # 判断当前这些位置是否为单词的边界 142 | if startWordIndex > 0 and isWordCell(content[startWordIndex - 1]) and isWordCell(content[startWordIndex]): 143 | # 当前字符和前面的字符都是字母,那么它是连续单词 144 | # print '前面不是单词边界', [startWordIndex > 0, str(content[startWordIndex - 1].encode('utf-8')),isWordCell(content[startWordIndex - 1]),str(content[startWordIndex].encode('utf-8')),isWordCell(content[startWordIndex])] 145 | currentPosition += 1 146 | continue 147 | 148 | if currentPosition < content_len - 1 and isWordCell(content[currentPosition + 1]) and isWordCell(content[currentPosition]): 149 | # print '后面不是单词边界' 150 | currentPosition += 1 151 | continue 152 | 153 | result.append((startWordIndex, currentPosition, raw_content[startWordIndex:currentPosition + 1], p.rank,p.line)) 154 | 155 | currentPosition += 1 156 | return result 157 | 158 | 159 | 160 | def load_acm(filename): 161 | ''' 162 | 加载词表 163 | :param filename: 164 | :return: 165 | 词表 分为很多行 166 | 每行 有2列组成 167 | 词 [tab] 等级 168 | exp: 169 | sharen [\t] 2 170 | ''' 171 | import os.path 172 | 173 | mtime = os.path.getmtime(filename) 174 | if _cache['load_time'] < mtime or _cache['acm'] is None: 175 | log.info('start load data') 176 | _cache['load_time'] = mtime 177 | start_time = time.time() 178 | acm = Ahocorasick() 179 | 180 | with open(filename) as fp: 181 | line_count = 0 182 | for line in fp: 183 | line_count += 1 184 | w = line.strip().decode('utf-8') 185 | arr2 = w.split('\t') 186 | # 默认等级 187 | if len(arr2) == 1: 188 | arr2.append(DEFAULT_RANK) 189 | try: 190 | acm.addWord(arr2[0], int(arr2[1]), line_count) 191 | except Exception as e: 192 | print 'error', e 193 | print 'line', line_count, line 194 | acm.make() 195 | _cache['acm'] = acm 196 | log.info('load ok time:%.2f' % (time.time() - start_time)) 197 | else: 198 | # print 'hit cache' 199 | pass 200 | 201 | 202 | return _cache['acm'] 203 | 204 | def help(): 205 | print "example: python ac.py str\n" 206 | 207 | 208 | if __name__ == '__main__': 209 | 210 | args = sys.argv 211 | if len(args) != 2: 212 | help() 213 | exit() 214 | 215 | # 预加载 216 | acm = load_acm('./word.md') 217 | # 指定搜索的文本 218 | content = args[1] 219 | search_result = acm.search(content) 220 | if len(search_result) > 0: 221 | print 'Good ! Find it, the item is:\n%s'%(search_result) 222 | else: 223 | print 'Sorry, The item not in file dict' 224 | 225 | -------------------------------------------------------------------------------- /avl_tree.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # encoding: utf-8 3 | # author: Lock 4 | # Created by Vim 5 | """ 6 | 平衡二叉搜索树 7 | 1、若它的左子树不为空,则左子树上所有的节点值都小于它的根节点值。 8 | 2、若它的右子树不为空,则右子树上所有的节点值均大于它的根节点值。 9 | 3、它的左右子树也分别可以充当为二叉查找树。 10 | 4、每个节点的左子树和右子树的高度差至多等于1。 11 | """ 12 | 13 | 14 | class Node(object): 15 | def __init__(self, key): 16 | self.key = key 17 | self.left = None 18 | self.right = None 19 | self.height = 0 20 | 21 | 22 | class AvlTree(object): 23 | def __init__(self): 24 | self.root = None 25 | 26 | def find(self, key): 27 | if self.root is None: 28 | return None 29 | else: 30 | return self._find(key, self.root) 31 | 32 | def _find(self, key, node): 33 | if node is None: 34 | return None 35 | elif key < node.key: 36 | return self._find(key, self.left) 37 | elif key > node.key: 38 | return self._find(key, self.right) 39 | else: 40 | return node 41 | 42 | def find_min(self): 43 | if self.root is None: 44 | return None 45 | else: 46 | return self._find_min(self.root) 47 | 48 | def _find_min(self, node): 49 | if node.left: 50 | return self._find_min(node.left) 51 | else: 52 | return node 53 | 54 | def find_max(self): 55 | if self.root is None: 56 | return None 57 | else: 58 | return self._find_max(self.root) 59 | 60 | def _find_max(self, node): 61 | if node.right: 62 | return self._find_max(node.right) 63 | else: 64 | return node 65 | 66 | def height(self, node): 67 | if node is None: 68 | return -1 69 | else: 70 | return node.height 71 | 72 | def single_left_rotate(self, node): 73 | k1 = node.left 74 | node.left = k1.right 75 | k1.right = node 76 | node.height = max(self.height(node.right), self.height(node.left)) + 1 77 | k1.height = max(self.height(k1.left), node.height) + 1 78 | return k1 79 | 80 | def single_right_rotate(self, node): 81 | k1 = node.right 82 | node.right = k1.left 83 | k1.left = node 84 | node.height = max(self.height(node.right), self.height(node.left)) + 1 85 | k1.height = max(self.height(k1.right), node.height) + 1 86 | return k1 87 | 88 | def double_left_rotate(self, node): 89 | node.left = self.single_right_rotate(node.left) 90 | return self.single_left_rotate(node) 91 | 92 | def double_right_rotate(self, node): 93 | node.right = self.single_left_rotate(node.right) 94 | return self.single_right_rotate(node) 95 | 96 | def put(self, key): 97 | if not self.root: 98 | self.root = Node(key) 99 | else: 100 | self.root = self._put(key, self.root) 101 | 102 | def _put(self, key, node): 103 | if node is None: 104 | node = Node(key) 105 | elif key < node.key: 106 | node.left = self._put(key, node.left) 107 | if (self.height(node.left) - self.height(node.right)) == 2: 108 | if key < node.left.key: 109 | node = self.single_left_rotate(node) 110 | else: 111 | node = self.double_left_rotate(node) 112 | elif key > node.key: 113 | node.right = self._put(key, node.right) 114 | if (self.height(node.right) - self.height(node.left)) == 2: 115 | if key < node.right.key: 116 | node = self.double_right_rotate(node) 117 | else: 118 | node = self.single_right_rotate(node) 119 | 120 | node.height = max(self.height(node.right), self.height(node.left)) + 1 121 | return node 122 | 123 | def delete(self, key): 124 | self.root = self.remove(key, self.root) 125 | 126 | def remove(self, key, node): 127 | if node is None: 128 | raise KeyError, 'Error,key not in tree' 129 | elif key < node.key: 130 | node.left = self.remove(key, node.left) 131 | if (self.height(node.right) - self.height(node.left)) == 2: 132 | if self.height(node.right.right) >= self.height(node.right.left): 133 | node = self.single_right_rotate(node) 134 | else: 135 | node = self.double_right_rotate(node) 136 | node.height = max(self.height(node.left), self.height(node.right)) + 1 137 | elif key > node.key: 138 | node.right = self.remove(key, node.right) 139 | if (self.height(node.left) - self.height(node.right)) == 2: 140 | if self.height(node.left.left) >= self.height(node.left.right): 141 | node = self.single_left_rotate(node) 142 | else: 143 | node = self.double_left_rotate(node) 144 | node.height = max(self.height(node.left), self.height(node.right)) + 1 145 | elif node.left and node.right: 146 | if node.left.height <= node.right.height: 147 | min_node = self._find_min(node.right) 148 | node.key = min_node.key 149 | node.right = self.remove(node.key, node.right) 150 | else: 151 | max_node = self._find_max(node.left) 152 | node.key = max_node.key 153 | node.left = self.remove(node.key, node.left) 154 | node.height = max(self.height(node.left), self.height(node.right)) + 1 155 | else: 156 | if node.right: 157 | node = node.right 158 | else: 159 | node = node.left 160 | 161 | return node 162 | 163 | 164 | if __name__ == '__main__': 165 | avlTree = AvlTree() 166 | avlTree.put(1) 167 | avlTree.put(2) 168 | avlTree.put(3) 169 | avlTree.put(4) 170 | avlTree.put(5) 171 | avlTree.put(6) 172 | avlTree.put(7) 173 | avlTree.put(8) 174 | print avlTree.find_max().key 175 | avlTree.put(9) 176 | print avlTree.find_max().key 177 | print avlTree.find_min().key 178 | -------------------------------------------------------------------------------- /base64_str.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # @Author: lock 3 | # @Date: 2016-09-14 00:33:53 4 | # @Last Modified by: lock 5 | # @Last Modified time: 2016-09-14 00:49:09 6 | import string 7 | import sys 8 | def get_payloads(): 9 | payloads = list(string.ascii_uppercase) 10 | payloads = payloads + list(string.ascii_lowercase) 11 | for i in xrange(0,10): 12 | payloads.append(i) 13 | payloads.extend(['+','-']) 14 | return payloads 15 | 16 | def encode(s): 17 | if s=='': 18 | return '' 19 | if len(s)%3==1: 20 | s = s+'00' 21 | elif len(s)%3==2: 22 | s = s+'0' 23 | bin_code,tmp = [],[] 24 | for i in xrange(0,len(s),3): 25 | code = s[i:i+3] 26 | for j in code: 27 | if j=='0': 28 | bin_code.append('0'*8*len(j)) 29 | else: 30 | bin_code.append(bin(ord(j)).replace('0b','0')) # 10进制 to 2进制 31 | 32 | base_str = ''.join(map(str,bin_code)) 33 | translate_list = [] 34 | for bit in xrange(0,len(base_str),6): 35 | split_code = base_str[bit:bit+6] 36 | translate_list.append(str(int(split_code,2))) #二进制 to 十进制 37 | payloads = get_payloads() 38 | for i in translate_list: 39 | if i=='0': 40 | tmp.append('=') 41 | else: 42 | tmp.append(payloads[int(i)]) 43 | return ''.join(map(str,tmp)) 44 | 45 | def help(): 46 | print 'args error!\nexample:\n\tpython base64.py lock' 47 | exit() 48 | 49 | if __name__ == '__main__': 50 | args = sys.argv 51 | if len(args) > 2 or len(args)==1: 52 | help() 53 | print(encode(s = args[1])) 54 | -------------------------------------------------------------------------------- /btree.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # encoding: utf-8 3 | # author: Lock 4 | # time: 2018/1/24 00:38 5 | class BTree: 6 | def __init__(self, value): 7 | self.left = None 8 | self.data = value 9 | self.right = None 10 | 11 | def insertLeft(self, value): 12 | self.left = BTree(value) 13 | return self.left 14 | 15 | def insertRight(self, value): 16 | self.right = BTree(value) 17 | return self.right 18 | 19 | def show(self): 20 | print self.data, 21 | 22 | 23 | def preorder(node): 24 | if node.data: 25 | node.show() 26 | if node.left: 27 | preorder(node.left) 28 | if node.right: 29 | preorder(node.right) 30 | 31 | 32 | def inorder(node): 33 | if node.data: 34 | if node.left: 35 | inorder(node.left) 36 | node.show() 37 | if node.right: 38 | inorder(node.right) 39 | 40 | 41 | def postorder(node): 42 | if node.data: 43 | if node.left: 44 | postorder(node.left) 45 | if node.right: 46 | postorder(node.right) 47 | node.show() 48 | 49 | 50 | def layerorder(node): 51 | stack = [node] 52 | while len(stack): 53 | node = stack.pop(0) 54 | if node.data: 55 | node.show() 56 | if node.left: 57 | stack.append(node.left) 58 | if node.right: 59 | stack.append(node.right) 60 | 61 | 62 | if __name__ == "__main__": 63 | Root = BTree("root") 64 | A = Root.insertLeft("A") 65 | C = A.insertLeft("C") 66 | D = C.insertRight("D") 67 | F = D.insertLeft("F") 68 | G = D.insertRight("G") 69 | B = Root.insertRight("B") 70 | E = B.insertRight("E") 71 | 72 | print "前序遍历:", 73 | preorder(Root) 74 | 75 | print "" 76 | print "中序遍历:", 77 | inorder(Root) 78 | 79 | print "" 80 | print "后序遍历:", 81 | postorder(Root) 82 | 83 | print "" 84 | print "层序遍历:", 85 | layerorder(Root) 86 | -------------------------------------------------------------------------------- /calc24.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # @Author: lock 3 | # @Date: 2017-06-09 22:48:14 4 | # @Last Modified by: lock 5 | # @Last Modified time: 2018-06-29 12:38:35 6 | # -*- coding: utf-8 -*- 7 | import optparse 8 | import itertools 9 | import random 10 | 11 | 12 | # 洗牌 13 | def shuffle(n, m=-1): 14 | if m == -1: 15 | m = n 16 | l = range(n) 17 | for i in range(len(l) - 1): 18 | x = random.randint(i, len(l) - 1) 19 | l[x], l[i] = l[i], l[x] 20 | if i == m - 1: 21 | break 22 | return [l[idx] for idx in range(n) if idx >= 0 and idx < m] 23 | 24 | 25 | # 生成4张牌 26 | def Get4Card(): 27 | card = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10, 10] * 4 28 | cardidxs = shuffle(52, 4) 29 | return [card[idx] for idx in cardidxs] 30 | 31 | 32 | def GenAllExpr(card_4, ops_iter): 33 | try: 34 | while True: 35 | l = list(ops_iter.next()) + card_4 36 | its = itertools.permutations(l, len(l)) 37 | try: 38 | while True: 39 | yield its.next() 40 | except StopIteration: 41 | pass 42 | except StopIteration: 43 | pass 44 | 45 | 46 | def CalcRes(expr, isprint=False): 47 | opmap = {'+': lambda a, b: a + b, '-': lambda a, b: a - b, '*': lambda a, b: a * b, 48 | '/': lambda a, b: a / (b + 0.0)} 49 | expr_stack = [] 50 | while expr: 51 | t = expr.pop(0) 52 | if type(t) == int: 53 | expr_stack.append(t) 54 | else: 55 | if len(expr_stack) < 2: 56 | return False 57 | else: 58 | a = expr_stack.pop() 59 | b = expr_stack.pop() 60 | if isprint: 61 | print a, t, b, '=', opmap[t](a, b) 62 | try: 63 | expr_stack.append(opmap[t](a, b)) 64 | except ZeroDivisionError: 65 | return False 66 | return expr_stack[0] 67 | 68 | 69 | if __name__ == "__main__": 70 | parser = optparse.OptionParser('usage -n 1,2,3,4') 71 | parser.add_option('-n', dest='nums', type='string', help='specify num list') 72 | (options, args) = parser.parse_args() 73 | nums = options.nums 74 | if nums is None: 75 | input_card = Get4Card() 76 | else: 77 | input_card = [int(x) for x in nums.split(',')] 78 | card = input_card 79 | if len(input_card) != 4: 80 | print(parser.usage) 81 | exit(0) 82 | print card 83 | ops = itertools.combinations_with_replacement('+-*/', 3) # 一个24点的计算公式可以表达成3个操作符的形式 84 | allexpr = GenAllExpr(card, ops) # 数和操作符混合,得到所有可能序列 85 | for expr in allexpr: 86 | res = CalcRes(list(expr)) 87 | if res and res == 24: 88 | CalcRes(list(expr), True) # 输出计算过程 89 | print "Success" 90 | break 91 | -------------------------------------------------------------------------------- /celery/tasks.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # encoding: utf-8 3 | # author: Lock 4 | # time: 2018/5/2 11:19 5 | 6 | from celery import Celery 7 | 8 | app = Celery('TASK', broker='redis://127.0.0.1', backend='redis://127.0.0.1') 9 | 10 | 11 | @app.task 12 | def add(x, y): 13 | print 'start ...' 14 | print 'get param :%s,%s' % (x, y,) 15 | return x + y 16 | -------------------------------------------------------------------------------- /compress.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # @Author: lock 3 | # @Date: 2017-12-15 00:11:32 4 | # @Last Modified by: lock 5 | # @Last Modified time: 2017-12-15 00:34:55 6 | 7 | 8 | def compress(string): 9 | compressed = [] 10 | count = 0 11 | temp = string[0] 12 | 13 | for i in range(0, len(string)): 14 | if temp == string[i]: 15 | count = count + 1 16 | else: 17 | compressed.append(str(temp) + str(count)) 18 | count = 1 19 | temp = string[i] 20 | 21 | if i == len(string) - 1: 22 | compressed.append(str(temp) + str(count)) 23 | 24 | return ''.join([str(x) for x in compressed]) 25 | 26 | 27 | def decompress(string): 28 | print '执行解压...' 29 | decompress_list = [] 30 | for j in xrange(0, len(string) - 1): 31 | if j % 2 == 0: 32 | for i in xrange(0, int(string[j + 1])): 33 | decompress_list.append(string[j]) 34 | print string[j] 35 | print '解压完毕' 36 | return ''.join(decompress_list) 37 | 38 | def main(): 39 | string = "xAAACCCBBDBB111" 40 | print '原始字符串:%s' % (string,) 41 | print '压缩后:%s' % (compress(string),) 42 | print '解压后:%s' % (decompress(compress(string)),) 43 | 44 | if __name__ == '__main__': 45 | main() 46 | -------------------------------------------------------------------------------- /coroutine.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # @Author: lock 3 | # @Date: 2017-05-03 00:06:17 4 | # @Last Modified by: lock 5 | # @Last Modified time: 2017-05-03 00:06:22 6 | from gevent import monkey; monkey.patch_all() 7 | import gevent 8 | import urllib2 9 | 10 | def f(url): 11 | print('GET: %s' % url) 12 | resp = urllib2.urlopen(url) 13 | data = resp.read() 14 | print('%d bytes received from %s.' % (len(data), url)) 15 | 16 | gevent.joinall([ 17 | gevent.spawn(f, 'https://www.python.org/'), 18 | gevent.spawn(f, 'https://www.yahoo.com/'), 19 | gevent.spawn(f, 'https://github.com/'), 20 | ]) -------------------------------------------------------------------------------- /crawl_360/crawl_360/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/LockGit/Py/795feb1efcd03296501127201a812adff3fa150f/crawl_360/crawl_360/__init__.py -------------------------------------------------------------------------------- /crawl_360/crawl_360/__init__.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/LockGit/Py/795feb1efcd03296501127201a812adff3fa150f/crawl_360/crawl_360/__init__.pyc -------------------------------------------------------------------------------- /crawl_360/crawl_360/items.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | 3 | # Define here the models for your scraped items 4 | # 5 | # See documentation in: 6 | # https://doc.scrapy.org/en/latest/topics/items.html 7 | 8 | import scrapy 9 | 10 | 11 | class Crawl360Item(scrapy.Item): 12 | # define the fields for your item here like: 13 | # name = scrapy.Field() 14 | pass 15 | 16 | 17 | class ButianItem(scrapy.Item): 18 | author = scrapy.Field() # 作者 19 | company_name = scrapy.Field() # 企业名称 20 | vul_name = scrapy.Field() # SQL注入漏洞 21 | vul_level = scrapy.Field() # 高危 22 | vul_type = scrapy.Field() # 通用型 23 | vul_money = scrapy.Field() # 奖励 24 | vul_find_time = scrapy.Field() # 时间 25 | link_url = scrapy.Field() # 抓取url 26 | create_time = scrapy.Field() # 创建时间 27 | -------------------------------------------------------------------------------- /crawl_360/crawl_360/items.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/LockGit/Py/795feb1efcd03296501127201a812adff3fa150f/crawl_360/crawl_360/items.pyc -------------------------------------------------------------------------------- /crawl_360/crawl_360/middlewares.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | 3 | # Define here the models for your spider middleware 4 | # 5 | # See documentation in: 6 | # https://doc.scrapy.org/en/latest/topics/spider-middleware.html 7 | 8 | from scrapy import signals 9 | 10 | 11 | class Crawl360SpiderMiddleware(object): 12 | # Not all methods need to be defined. If a method is not defined, 13 | # scrapy acts as if the spider middleware does not modify the 14 | # passed objects. 15 | 16 | @classmethod 17 | def from_crawler(cls, crawler): 18 | # This method is used by Scrapy to create your spiders. 19 | s = cls() 20 | crawler.signals.connect(s.spider_opened, signal=signals.spider_opened) 21 | return s 22 | 23 | def process_spider_input(self, response, spider): 24 | # Called for each response that goes through the spider 25 | # middleware and into the spider. 26 | 27 | # Should return None or raise an exception. 28 | return None 29 | 30 | def process_spider_output(self, response, result, spider): 31 | # Called with the results returned from the Spider, after 32 | # it has processed the response. 33 | 34 | # Must return an iterable of Request, dict or Item objects. 35 | for i in result: 36 | yield i 37 | 38 | def process_spider_exception(self, response, exception, spider): 39 | # Called when a spider or process_spider_input() method 40 | # (from other spider middleware) raises an exception. 41 | 42 | # Should return either None or an iterable of Response, dict 43 | # or Item objects. 44 | pass 45 | 46 | def process_start_requests(self, start_requests, spider): 47 | # Called with the start requests of the spider, and works 48 | # similarly to the process_spider_output() method, except 49 | # that it doesn’t have a response associated. 50 | 51 | # Must return only requests (not items). 52 | for r in start_requests: 53 | yield r 54 | 55 | def spider_opened(self, spider): 56 | spider.logger.info('Spider opened: %s' % spider.name) 57 | 58 | 59 | class Crawl360DownloaderMiddleware(object): 60 | # Not all methods need to be defined. If a method is not defined, 61 | # scrapy acts as if the downloader middleware does not modify the 62 | # passed objects. 63 | 64 | @classmethod 65 | def from_crawler(cls, crawler): 66 | # This method is used by Scrapy to create your spiders. 67 | s = cls() 68 | crawler.signals.connect(s.spider_opened, signal=signals.spider_opened) 69 | return s 70 | 71 | def process_request(self, request, spider): 72 | # Called for each request that goes through the downloader 73 | # middleware. 74 | 75 | # Must either: 76 | # - return None: continue processing this request 77 | # - or return a Response object 78 | # - or return a Request object 79 | # - or raise IgnoreRequest: process_exception() methods of 80 | # installed downloader middleware will be called 81 | return None 82 | 83 | def process_response(self, request, response, spider): 84 | # Called with the response returned from the downloader. 85 | 86 | # Must either; 87 | # - return a Response object 88 | # - return a Request object 89 | # - or raise IgnoreRequest 90 | return response 91 | 92 | def process_exception(self, request, exception, spider): 93 | # Called when a download handler or a process_request() 94 | # (from other downloader middleware) raises an exception. 95 | 96 | # Must either: 97 | # - return None: continue processing this exception 98 | # - return a Response object: stops process_exception() chain 99 | # - return a Request object: stops process_exception() chain 100 | pass 101 | 102 | def spider_opened(self, spider): 103 | spider.logger.info('Spider opened: %s' % spider.name) 104 | -------------------------------------------------------------------------------- /crawl_360/crawl_360/models/__init__.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # encoding: utf-8 3 | # author: Lock 4 | # time: 2018/4/28 11:35 5 | 6 | -------------------------------------------------------------------------------- /crawl_360/crawl_360/models/__init__.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/LockGit/Py/795feb1efcd03296501127201a812adff3fa150f/crawl_360/crawl_360/models/__init__.pyc -------------------------------------------------------------------------------- /crawl_360/crawl_360/models/db.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # encoding: utf-8 3 | # author: Lock 4 | # time: 2018/4/28 11:36 5 | 6 | from sqlalchemy import create_engine 7 | from sqlalchemy.orm import sessionmaker 8 | from sqlalchemy.ext.declarative import declarative_base 9 | 10 | # 创建对象的基类: 11 | Base = declarative_base() 12 | 13 | CONFIG = { 14 | 'db_host': '127.0.0.1', 15 | 'db_user': 'root', 16 | 'db_pass': '', 17 | 'db_port': 3306, 18 | 'db_name': 'crawl' 19 | } 20 | 21 | # 初始化数据库连接: 22 | engine = create_engine('mysql+mysqlconnector://%s:%s@%s:%s/%s' % ( 23 | CONFIG.get('db_user'), 24 | CONFIG.get('db_pass'), 25 | CONFIG.get('db_host'), 26 | CONFIG.get('db_port'), 27 | CONFIG.get('db_name'), 28 | )) 29 | 30 | # 创建DBSession类型: 31 | DBSession = sessionmaker(bind=engine) 32 | -------------------------------------------------------------------------------- /crawl_360/crawl_360/models/db.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/LockGit/Py/795feb1efcd03296501127201a812adff3fa150f/crawl_360/crawl_360/models/db.pyc -------------------------------------------------------------------------------- /crawl_360/crawl_360/models/models.py: -------------------------------------------------------------------------------- 1 | # coding: utf-8 2 | from sqlalchemy import Column, DateTime, Integer, Numeric, String, text 3 | from sqlalchemy.ext.declarative import declarative_base 4 | 5 | Base = declarative_base() 6 | metadata = Base.metadata 7 | 8 | 9 | class Butian(Base): 10 | __tablename__ = 'butian' 11 | 12 | id = Column(Integer, primary_key=True) 13 | author = Column(String(100), nullable=False, server_default=text("''")) 14 | company_name = Column(String(100), nullable=False, server_default=text("''")) 15 | vul_level = Column(String(100), nullable=False, server_default=text("''")) 16 | vul_name = Column(String(100), nullable=False, server_default=text("''")) 17 | vul_money = Column(Numeric(10, 2), nullable=False) 18 | vul_find_time = Column(DateTime, nullable=False, server_default=text("'0000-00-00 00:00:00'")) 19 | link_url = Column(String(255), nullable=False, server_default=text("''")) 20 | create_time = Column(DateTime, nullable=False) 21 | -------------------------------------------------------------------------------- /crawl_360/crawl_360/models/models.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/LockGit/Py/795feb1efcd03296501127201a812adff3fa150f/crawl_360/crawl_360/models/models.pyc -------------------------------------------------------------------------------- /crawl_360/crawl_360/pipelines.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | 3 | # Define your item pipelines here 4 | # 5 | # Don't forget to add your pipeline to the ITEM_PIPELINES setting 6 | # See: https://doc.scrapy.org/en/latest/topics/item-pipeline.html 7 | from crawl_360.items import ButianItem 8 | from crawl_360.models.db import DBSession 9 | from crawl_360.models.models import Butian 10 | 11 | 12 | class Crawl360Pipeline(object): 13 | def __init__(self): 14 | self.db_session = DBSession() 15 | 16 | def process_item(self, item, spider): 17 | if isinstance(item, ButianItem): 18 | data_item = Butian(**item) 19 | # 数据入库 20 | self.db_session.add(data_item) 21 | try: 22 | self.db_session.commit() 23 | except Exception, e: 24 | print e.message 25 | self.db_session.rollback() 26 | return item 27 | 28 | def close_spider(self, spider): 29 | self.db_session.close() 30 | -------------------------------------------------------------------------------- /crawl_360/crawl_360/pipelines.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/LockGit/Py/795feb1efcd03296501127201a812adff3fa150f/crawl_360/crawl_360/pipelines.pyc -------------------------------------------------------------------------------- /crawl_360/crawl_360/reademe/sql.sql: -------------------------------------------------------------------------------- 1 | CREATE TABLE butian ( 2 | id INT(11) UNSIGNED NOT NULL AUTO_INCREMENT COMMENT '自增id', 3 | author VARCHAR(100) not null DEFAULT '' COMMENT '作者', 4 | company_name VARCHAR(100) NOT NULL DEFAULT '' COMMENT '公司名', 5 | vul_level VARCHAR(100) not null DEFAULT '' COMMENT '漏洞级别', 6 | vul_name VARCHAR(100) not null DEFAULT '' COMMENT '漏洞名', 7 | vul_money DECIMAL(10,2) not NULL DEFAULT 0 COMMENT '漏洞奖金', 8 | vul_find_time DATETIME not NULL DEFAULT '0000-00-00 00:00:00' COMMENT '漏洞发现时间', 9 | link_url VARCHAR(255) not null DEFAULT '' COMMENT '页面url', 10 | create_time TIMESTAMP not null DEFAULT current_timestamp COMMENT '创建时间', 11 | PRIMARY KEY (id) 12 | )ENGINE=INNODB DEFAULT CHARSET utf8; 13 | -------------------------------------------------------------------------------- /crawl_360/crawl_360/settings.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | 3 | # Scrapy settings for crawl_360 project 4 | # 5 | # For simplicity, this file contains only settings considered important or 6 | # commonly used. You can find more settings consulting the documentation: 7 | # 8 | # https://doc.scrapy.org/en/latest/topics/settings.html 9 | # https://doc.scrapy.org/en/latest/topics/downloader-middleware.html 10 | # https://doc.scrapy.org/en/latest/topics/spider-middleware.html 11 | 12 | BOT_NAME = 'crawl_360' 13 | 14 | SPIDER_MODULES = ['crawl_360.spiders'] 15 | NEWSPIDER_MODULE = 'crawl_360.spiders' 16 | 17 | 18 | # Crawl responsibly by identifying yourself (and your website) on the user-agent 19 | #USER_AGENT = 'crawl_360 (+http://www.yourdomain.com)' 20 | 21 | # Obey robots.txt rules 22 | ROBOTSTXT_OBEY = True 23 | 24 | # Configure maximum concurrent requests performed by Scrapy (default: 16) 25 | #CONCURRENT_REQUESTS = 32 26 | 27 | # Configure a delay for requests for the same website (default: 0) 28 | # See https://doc.scrapy.org/en/latest/topics/settings.html#download-delay 29 | # See also autothrottle settings and docs 30 | #DOWNLOAD_DELAY = 3 31 | # The download delay setting will honor only one of: 32 | #CONCURRENT_REQUESTS_PER_DOMAIN = 16 33 | #CONCURRENT_REQUESTS_PER_IP = 16 34 | 35 | # Disable cookies (enabled by default) 36 | #COOKIES_ENABLED = False 37 | 38 | # Disable Telnet Console (enabled by default) 39 | #TELNETCONSOLE_ENABLED = False 40 | 41 | # Override the default request headers: 42 | #DEFAULT_REQUEST_HEADERS = { 43 | # 'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8', 44 | # 'Accept-Language': 'en', 45 | #} 46 | 47 | # Enable or disable spider middlewares 48 | # See https://doc.scrapy.org/en/latest/topics/spider-middleware.html 49 | #SPIDER_MIDDLEWARES = { 50 | # 'crawl_360.middlewares.Crawl360SpiderMiddleware': 543, 51 | #} 52 | 53 | # Enable or disable downloader middlewares 54 | # See https://doc.scrapy.org/en/latest/topics/downloader-middleware.html 55 | #DOWNLOADER_MIDDLEWARES = { 56 | # 'crawl_360.middlewares.Crawl360DownloaderMiddleware': 543, 57 | #} 58 | 59 | # Enable or disable extensions 60 | # See https://doc.scrapy.org/en/latest/topics/extensions.html 61 | #EXTENSIONS = { 62 | # 'scrapy.extensions.telnet.TelnetConsole': None, 63 | #} 64 | 65 | # Configure item pipelines 66 | # See https://doc.scrapy.org/en/latest/topics/item-pipeline.html 67 | ITEM_PIPELINES = { 68 | 'crawl_360.pipelines.Crawl360Pipeline': 300, 69 | } 70 | 71 | # Enable and configure the AutoThrottle extension (disabled by default) 72 | # See https://doc.scrapy.org/en/latest/topics/autothrottle.html 73 | #AUTOTHROTTLE_ENABLED = True 74 | # The initial download delay 75 | #AUTOTHROTTLE_START_DELAY = 5 76 | # The maximum download delay to be set in case of high latencies 77 | #AUTOTHROTTLE_MAX_DELAY = 60 78 | # The average number of requests Scrapy should be sending in parallel to 79 | # each remote server 80 | #AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0 81 | # Enable showing throttling stats for every response received: 82 | #AUTOTHROTTLE_DEBUG = False 83 | 84 | # Enable and configure HTTP caching (disabled by default) 85 | # See https://doc.scrapy.org/en/latest/topics/downloader-middleware.html#httpcache-middleware-settings 86 | #HTTPCACHE_ENABLED = True 87 | #HTTPCACHE_EXPIRATION_SECS = 0 88 | #HTTPCACHE_DIR = 'httpcache' 89 | #HTTPCACHE_IGNORE_HTTP_CODES = [] 90 | #HTTPCACHE_STORAGE = 'scrapy.extensions.httpcache.FilesystemCacheStorage' 91 | -------------------------------------------------------------------------------- /crawl_360/crawl_360/settings.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/LockGit/Py/795feb1efcd03296501127201a812adff3fa150f/crawl_360/crawl_360/settings.pyc -------------------------------------------------------------------------------- /crawl_360/crawl_360/spiders/__init__.py: -------------------------------------------------------------------------------- 1 | # This package will contain the spiders of your Scrapy project 2 | # 3 | # Please refer to the documentation for information on how to create and manage 4 | # your spiders. 5 | -------------------------------------------------------------------------------- /crawl_360/crawl_360/spiders/__init__.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/LockGit/Py/795feb1efcd03296501127201a812adff3fa150f/crawl_360/crawl_360/spiders/__init__.pyc -------------------------------------------------------------------------------- /crawl_360/crawl_360/spiders/butian.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | import scrapy 3 | 4 | from crawl_360.items import ButianItem 5 | import time 6 | 7 | 8 | class ButianSpider(scrapy.Spider): 9 | name = 'butian' 10 | allowed_domains = ['butian.360.cn/Loo', 'butian.360.cn'] 11 | start_urls = ['http://butian.360.cn/Loo/'] 12 | 13 | def parse(self, response): 14 | self.logger.info('strart parse dst page ...') 15 | item = ButianItem() 16 | # import ipdb 17 | # ipdb.set_trace() 18 | for sel in response.xpath('//ul[@class="loopListBottom"]/li'): 19 | item['author'] = sel.xpath('dl/dd/span[1]/text()').extract_first(default='').strip() 20 | item['company_name'] = sel.xpath('dl/dd/a/text()').extract_first(default='').strip() 21 | item['vul_name'] = sel.xpath('dl/dd/span[3]/text()').extract_first(default='').replace(u'的一个', '').strip() 22 | item['vul_level'] = sel.xpath('dl/dd[2]/strong[@class="loopHigh"]/text()').extract_first(default='').strip() 23 | item['vul_money'] = sel.xpath('dl/p[@class="loopJiangjin"]/text()').extract_first(default=0) 24 | item['vul_find_time'] = sel.xpath('dl/dd[2]/em/text()').extract_first(default='').strip() 25 | item['link_url'] = response.url.strip() 26 | item['create_time'] = time.strftime("%Y-%m-%d %H:%M:%S") 27 | self.logger.info('find item data is:%s' % (item,)) 28 | yield item 29 | 30 | next_page = response.xpath(u'//div[@class="btPage page"]/a[contains(text(),"下一页")]/@href').extract_first() 31 | if next_page is not None: 32 | next_page = response.urljoin(next_page) 33 | self.logger.info('next page url is:%s' % (next_page,)) 34 | yield scrapy.Request(url=next_page, callback=self.parse) 35 | -------------------------------------------------------------------------------- /crawl_360/crawl_360/spiders/butian.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/LockGit/Py/795feb1efcd03296501127201a812adff3fa150f/crawl_360/crawl_360/spiders/butian.pyc -------------------------------------------------------------------------------- /crawl_360/scrapy.cfg: -------------------------------------------------------------------------------- 1 | # Automatically created by: scrapy startproject 2 | # 3 | # For more information about the [deploy] section see: 4 | # https://scrapyd.readthedocs.io/en/latest/deploy.html 5 | 6 | [settings] 7 | default = crawl_360.settings 8 | 9 | [deploy] 10 | #url = http://localhost:6800/ 11 | project = crawl_360 12 | -------------------------------------------------------------------------------- /dispatch.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # @Author: lock 3 | # @Date: 2016-05-18 23:47:54 4 | # @Last Modified by: lock 5 | # @Last Modified time: 2016-05-18 23:47:54 6 | from collections import deque 7 | class Runner(object): 8 | def __init__(self, tasks): 9 | self.tasks = deque(tasks) 10 | 11 | def next(self): 12 | return self.tasks.pop() 13 | 14 | def run(self): 15 | while len(self.tasks): 16 | task = self.next() 17 | try: 18 | next(task) 19 | except StopIteration: 20 | pass 21 | else: 22 | self.tasks.appendleft(task) 23 | 24 | def task(name, times): 25 | for i in range(times): 26 | yield 27 | print(name, i) 28 | 29 | Runner([ 30 | task('hsfzxjy', 5), 31 | task('Jack', 4), 32 | task('Bob', 6) 33 | ]).run() 34 | -------------------------------------------------------------------------------- /hashtable.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # @Author: lock 3 | # @Date: 2017-12-15 00:49:17 4 | # @Last Modified by: lock 5 | # @Last Modified time: 2017-12-15 01:00:13 6 | class Item(object): 7 | 8 | def __init__(self, key, value): 9 | self.key = key 10 | self.value = value 11 | 12 | 13 | class HashTable(object): 14 | 15 | def __init__(self, size): 16 | self.size = size 17 | self.table = [[] for _ in xrange(self.size)] 18 | 19 | def hash_function(self, key): 20 | return key % self.size 21 | 22 | def set(self, key, value): 23 | hash_index = self.hash_function(key) 24 | for item in self.table[hash_index]: 25 | if item.key == key: 26 | item.value = value 27 | return 28 | self.table[hash_index].append(Item(key, value)) 29 | 30 | def get(self, key): 31 | hash_index = self.hash_function(key) 32 | for item in self.table[hash_index]: 33 | if item.key == key: 34 | return item.value 35 | return None 36 | 37 | def remove(self, key): 38 | hash_index = self.hash_function(key) 39 | for i, item in enumerate(self.table[hash_index]): 40 | if item.key == key: 41 | del self.table[hash_index][i] 42 | 43 | if __name__ == '__main__': 44 | hash_table = HashTable(5); 45 | hash_table.set(1,'x') 46 | hash_table.set(1,'m') 47 | hash_table.set(2,'y') 48 | hash_table.set(3,'z') 49 | print hash_table.get(1) 50 | print hash_table.get(2) 51 | print hash_table.get(3) -------------------------------------------------------------------------------- /heapq_sort.py: -------------------------------------------------------------------------------- 1 | # !/usr/bin/env python 2 | # encoding: utf-8 3 | # author: Lock 4 | # Created by Vim 5 | """ 6 | 算法过程: 7 | (1)、建堆:从len/2到第一个节点0处一直调用调整堆的过程,其中len为数组长度,len/2表示节点深度。 8 | (2)、调整堆:比较节点i和它的孩子节点left(i),right(i),选出三者最大者,如果最大值不是节点i而是它的一个孩子节点,那便交换节点i和该节点,然后再调用调整堆过程,这是一个递归的过程。调整堆的过程时间复杂度与堆的深度有关系,是lgn的操作。 9 | (3)、堆排序:主要利用上面两个过程进行。首先是根据元素构建堆,然后将堆的根节点取出(一般是与最后一个节点进行交换),将前面len-1个节点继续进行堆调整的过程,然后再将根节点取出,这样一直到所有节点都取出。 10 | """ 11 | 12 | 13 | def build_heap(seq): 14 | length = len(seq) 15 | for item in range(0, int((length / 2)))[::-1]: 16 | adjust_heap(seq, item, length) 17 | 18 | 19 | def adjust_heap(seq, root, length): 20 | left_child = 2 * root + 1 21 | right_child = 2 * root + 2 22 | root_max = root 23 | if left_child < length and seq[left_child] > seq[root_max]: 24 | root_max = left_child 25 | if right_child < length and seq[right_child] > seq[root_max]: 26 | root_max = right_child 27 | if root_max != root: # 如果做了堆调整,则root_max变更后是左节点或者右节点的,进行对调值操作 28 | seq[root_max], seq[root] = seq[root], seq[root_max] 29 | adjust_heap(seq, root_max, length) 30 | 31 | 32 | def heap_sort(seq): 33 | length = len(seq) 34 | build_heap(seq) # 建立初始堆 35 | for i in range(0, length)[::-1]: 36 | seq[0], seq[i] = seq[i], seq[0] # 将根节点取出与最后一位做对调 37 | adjust_heap(seq, 0, i) # 对前面len-1个节点继续进行堆调整过程 38 | return seq 39 | 40 | 41 | if __name__ == "__main__": 42 | arr = [2, 1, 3, 8, 12, 5, 5, 6, 4, 10, 0] 43 | print(arr) 44 | heap_sort(arr) 45 | print(arr) 46 | -------------------------------------------------------------------------------- /httpstat.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # coding: utf-8 3 | # References: 4 | # man curl 5 | # https://curl.haxx.se/libcurl/c/curl_easy_getinfo.html 6 | # https://curl.haxx.se/libcurl/c/easy_getinfo_options.html 7 | # http://blog.kenweiner.com/2014/11/http-request-timings-with-curl.html 8 | 9 | from __future__ import print_function 10 | 11 | import os 12 | import json 13 | import sys 14 | import logging 15 | import tempfile 16 | import subprocess 17 | 18 | 19 | __version__ = '1.3.1' 20 | 21 | 22 | PY3 = sys.version_info >= (3,) 23 | 24 | if PY3: 25 | xrange = range 26 | 27 | 28 | # Env class is copied from https://github.com/reorx/getenv/blob/master/getenv.py 29 | class Env(object): 30 | prefix = 'HTTPSTAT' 31 | _instances = [] 32 | 33 | def __init__(self, key): 34 | self.key = key.format(prefix=self.prefix) 35 | Env._instances.append(self) 36 | 37 | def get(self, default=None): 38 | return os.environ.get(self.key, default) 39 | 40 | 41 | ENV_SHOW_BODY = Env('{prefix}_SHOW_BODY') 42 | ENV_SHOW_IP = Env('{prefix}_SHOW_IP') 43 | ENV_SHOW_SPEED = Env('{prefix}_SHOW_SPEED') 44 | ENV_SAVE_BODY = Env('{prefix}_SAVE_BODY') 45 | ENV_CURL_BIN = Env('{prefix}_CURL_BIN') 46 | ENV_METRICS_ONLY = Env('{prefix}_METRICS_ONLY') 47 | ENV_DEBUG = Env('{prefix}_DEBUG') 48 | 49 | 50 | curl_format = """{ 51 | "time_namelookup": %{time_namelookup}, 52 | "time_connect": %{time_connect}, 53 | "time_appconnect": %{time_appconnect}, 54 | "time_pretransfer": %{time_pretransfer}, 55 | "time_redirect": %{time_redirect}, 56 | "time_starttransfer": %{time_starttransfer}, 57 | "time_total": %{time_total}, 58 | "speed_download": %{speed_download}, 59 | "speed_upload": %{speed_upload}, 60 | "remote_ip": "%{remote_ip}", 61 | "remote_port": "%{remote_port}", 62 | "local_ip": "%{local_ip}", 63 | "local_port": "%{local_port}" 64 | }""" 65 | 66 | https_template = """ 67 | DNS Lookup TCP Connection TLS Handshake Server Processing Content Transfer 68 | [ {a0000} | {a0001} | {a0002} | {a0003} | {a0004} ] 69 | | | | | | 70 | namelookup:{b0000} | | | | 71 | connect:{b0001} | | | 72 | pretransfer:{b0002} | | 73 | starttransfer:{b0003} | 74 | total:{b0004} 75 | """[1:] 76 | 77 | http_template = """ 78 | DNS Lookup TCP Connection Server Processing Content Transfer 79 | [ {a0000} | {a0001} | {a0003} | {a0004} ] 80 | | | | | 81 | namelookup:{b0000} | | | 82 | connect:{b0001} | | 83 | starttransfer:{b0003} | 84 | total:{b0004} 85 | """[1:] 86 | 87 | 88 | # Color code is copied from https://github.com/reorx/python-terminal-color/blob/master/color_simple.py 89 | ISATTY = sys.stdout.isatty() 90 | 91 | 92 | def make_color(code): 93 | def color_func(s): 94 | if not ISATTY: 95 | return s 96 | tpl = '\x1b[{}m{}\x1b[0m' 97 | return tpl.format(code, s) 98 | return color_func 99 | 100 | 101 | red = make_color(31) 102 | green = make_color(32) 103 | yellow = make_color(33) 104 | blue = make_color(34) 105 | magenta = make_color(35) 106 | cyan = make_color(36) 107 | 108 | bold = make_color(1) 109 | underline = make_color(4) 110 | 111 | grayscale = {(i - 232): make_color('38;5;' + str(i)) for i in xrange(232, 256)} 112 | 113 | 114 | def quit(s, code=0): 115 | if s is not None: 116 | print(s) 117 | sys.exit(code) 118 | 119 | 120 | def print_help(): 121 | help = """ 122 | Usage: httpstat URL [CURL_OPTIONS] 123 | httpstat -h | --help 124 | httpstat --version 125 | 126 | Arguments: 127 | URL url to request, could be with or without `http(s)://` prefix 128 | 129 | Options: 130 | CURL_OPTIONS any curl supported options, except for -w -D -o -S -s, 131 | which are already used internally. 132 | -h --help show this screen. 133 | --version show version. 134 | 135 | Environments: 136 | HTTPSTAT_SHOW_BODY Set to `true` to show response body in the output, 137 | note that body length is limited to 1023 bytes, will be 138 | truncated if exceeds. Default is `false`. 139 | HTTPSTAT_SHOW_IP By default httpstat shows remote and local IP/port address. 140 | Set to `false` to disable this feature. Default is `true`. 141 | HTTPSTAT_SHOW_SPEED Set to `true` to show download and upload speed. 142 | Default is `false`. 143 | HTTPSTAT_SAVE_BODY By default httpstat stores body in a tmp file, 144 | set to `false` to disable this feature. Default is `true` 145 | HTTPSTAT_CURL_BIN Indicate the curl bin path to use. Default is `curl` 146 | from current shell $PATH. 147 | HTTPSTAT_DEBUG Set to `true` to see debugging logs. Default is `false` 148 | """[1:-1] 149 | print(help) 150 | 151 | 152 | def main(): 153 | args = sys.argv[1:] 154 | if not args: 155 | print_help() 156 | quit(None, 0) 157 | 158 | # get envs 159 | show_body = 'true' in ENV_SHOW_BODY.get('false').lower() 160 | show_ip = 'true' in ENV_SHOW_IP.get('true').lower() 161 | show_speed = 'true'in ENV_SHOW_SPEED.get('false').lower() 162 | save_body = 'true' in ENV_SAVE_BODY.get('true').lower() 163 | curl_bin = ENV_CURL_BIN.get('curl') 164 | metrics_only = 'true' in ENV_METRICS_ONLY.get('false').lower() 165 | is_debug = 'true' in ENV_DEBUG.get('false').lower() 166 | 167 | # configure logging 168 | if is_debug: 169 | log_level = logging.DEBUG 170 | else: 171 | log_level = logging.INFO 172 | logging.basicConfig(level=log_level) 173 | lg = logging.getLogger('httpstat') 174 | 175 | # log envs 176 | lg.debug('Envs:\n%s', '\n'.join(' {}={}'.format(i.key, i.get('')) for i in Env._instances)) 177 | lg.debug('Flags: %s', dict( 178 | show_body=show_body, 179 | show_ip=show_ip, 180 | show_speed=show_speed, 181 | save_body=save_body, 182 | curl_bin=curl_bin, 183 | is_debug=is_debug, 184 | )) 185 | 186 | # get url 187 | url = args[0] 188 | if url in ['-h', '--help']: 189 | print_help() 190 | quit(None, 0) 191 | elif url == '--version': 192 | print('httpstat {}'.format(__version__)) 193 | quit(None, 0) 194 | 195 | curl_args = args[1:] 196 | 197 | # check curl args 198 | exclude_options = [ 199 | '-w', '--write-out', 200 | '-D', '--dump-header', 201 | '-o', '--output', 202 | '-s', '--silent', 203 | ] 204 | for i in exclude_options: 205 | if i in curl_args: 206 | quit(yellow('Error: {} is not allowed in extra curl args'.format(i)), 1) 207 | 208 | # tempfile for output 209 | bodyf = tempfile.NamedTemporaryFile(delete=False) 210 | bodyf.close() 211 | 212 | headerf = tempfile.NamedTemporaryFile(delete=False) 213 | headerf.close() 214 | 215 | # run cmd 216 | cmd_env = os.environ.copy() 217 | cmd_env.update( 218 | LC_ALL='C', 219 | ) 220 | cmd_core = [curl_bin, '-w', curl_format, '-D', headerf.name, '-o', bodyf.name, '-s', '-S'] 221 | cmd = cmd_core + curl_args + [url] 222 | lg.debug('cmd: %s', cmd) 223 | p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, env=cmd_env) 224 | out, err = p.communicate() 225 | if PY3: 226 | out, err = out.decode(), err.decode() 227 | lg.debug('out: %s', out) 228 | 229 | # print stderr 230 | if p.returncode == 0: 231 | if err: 232 | print(grayscale[16](err)) 233 | else: 234 | _cmd = list(cmd) 235 | _cmd[2] = '' 236 | _cmd[4] = '' 237 | _cmd[6] = '' 238 | print('> {}'.format(' '.join(_cmd))) 239 | quit(yellow('curl error: {}'.format(err)), p.returncode) 240 | 241 | # parse output 242 | try: 243 | d = json.loads(out) 244 | except ValueError as e: 245 | print(yellow('Could not decode json: {}'.format(e))) 246 | print('curl result:', p.returncode, grayscale[16](out), grayscale[16](err)) 247 | quit(None, 1) 248 | 249 | # convert time_ metrics from seconds to milliseconds 250 | for k in d: 251 | if k.startswith('time_'): 252 | v = d[k] 253 | # Convert time_ values to milliseconds in int 254 | if isinstance(v, float): 255 | # Before 7.61.0, time values are represented as seconds in float 256 | d[k] = int(v * 1000) 257 | elif isinstance(v, int): 258 | # Starting from 7.61.0, libcurl uses microsecond in int 259 | # to return time values, references: 260 | # https://daniel.haxx.se/blog/2018/07/11/curl-7-61-0/ 261 | # https://curl.se/bug/?i=2495 262 | d[k] = int(v / 1000) 263 | else: 264 | raise TypeError('{} value type is invalid: {}'.format(k, type(v))) 265 | 266 | # calculate ranges 267 | d.update( 268 | range_dns=d['time_namelookup'], 269 | range_connection=d['time_connect'] - d['time_namelookup'], 270 | range_ssl=d['time_pretransfer'] - d['time_connect'], 271 | range_server=d['time_starttransfer'] - d['time_pretransfer'], 272 | range_transfer=d['time_total'] - d['time_starttransfer'], 273 | ) 274 | 275 | # print json if metrics_only is enabled 276 | if metrics_only: 277 | print(json.dumps(d, indent=2)) 278 | quit(None, 0) 279 | 280 | # ip 281 | if show_ip: 282 | s = 'Connected to {}:{} from {}:{}'.format( 283 | cyan(d['remote_ip']), cyan(d['remote_port']), 284 | d['local_ip'], d['local_port'], 285 | ) 286 | print(s) 287 | print() 288 | 289 | # print header & body summary 290 | with open(headerf.name, 'r') as f: 291 | headers = f.read().strip() 292 | # remove header file 293 | lg.debug('rm header file %s', headerf.name) 294 | os.remove(headerf.name) 295 | 296 | for loop, line in enumerate(headers.split('\n')): 297 | if loop == 0: 298 | p1, p2 = tuple(line.split('/')) 299 | print(green(p1) + grayscale[14]('/') + cyan(p2)) 300 | else: 301 | pos = line.find(':') 302 | print(grayscale[14](line[:pos + 1]) + cyan(line[pos + 1:])) 303 | 304 | print() 305 | 306 | # body 307 | if show_body: 308 | body_limit = 1024 309 | with open(bodyf.name, 'r') as f: 310 | body = f.read().strip() 311 | body_len = len(body) 312 | 313 | if body_len > body_limit: 314 | print(body[:body_limit] + cyan('...')) 315 | print() 316 | s = '{} is truncated ({} out of {})'.format(green('Body'), body_limit, body_len) 317 | if save_body: 318 | s += ', stored in: {}'.format(bodyf.name) 319 | print(s) 320 | else: 321 | print(body) 322 | else: 323 | if save_body: 324 | print('{} stored in: {}'.format(green('Body'), bodyf.name)) 325 | 326 | # remove body file 327 | if not save_body: 328 | lg.debug('rm body file %s', bodyf.name) 329 | os.remove(bodyf.name) 330 | 331 | # print stat 332 | if url.startswith('https://'): 333 | template = https_template 334 | else: 335 | template = http_template 336 | 337 | # colorize template first line 338 | tpl_parts = template.split('\n') 339 | tpl_parts[0] = grayscale[16](tpl_parts[0]) 340 | template = '\n'.join(tpl_parts) 341 | 342 | def fmta(s): 343 | return cyan('{:^7}'.format(str(s) + 'ms')) 344 | 345 | def fmtb(s): 346 | return cyan('{:<7}'.format(str(s) + 'ms')) 347 | 348 | stat = template.format( 349 | # a 350 | a0000=fmta(d['range_dns']), 351 | a0001=fmta(d['range_connection']), 352 | a0002=fmta(d['range_ssl']), 353 | a0003=fmta(d['range_server']), 354 | a0004=fmta(d['range_transfer']), 355 | # b 356 | b0000=fmtb(d['time_namelookup']), 357 | b0001=fmtb(d['time_connect']), 358 | b0002=fmtb(d['time_pretransfer']), 359 | b0003=fmtb(d['time_starttransfer']), 360 | b0004=fmtb(d['time_total']), 361 | ) 362 | print() 363 | print(stat) 364 | 365 | # speed, originally bytes per second 366 | if show_speed: 367 | print('speed_download: {:.1f} KiB/s, speed_upload: {:.1f} KiB/s'.format( 368 | d['speed_download'] / 1024, d['speed_upload'] / 1024)) 369 | 370 | 371 | if __name__ == '__main__': 372 | main() 373 | -------------------------------------------------------------------------------- /img/ac_fail_pointer.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/LockGit/Py/795feb1efcd03296501127201a812adff3fa150f/img/ac_fail_pointer.png -------------------------------------------------------------------------------- /img/btree.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/LockGit/Py/795feb1efcd03296501127201a812adff3fa150f/img/btree.png -------------------------------------------------------------------------------- /img/cmd.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/LockGit/Py/795feb1efcd03296501127201a812adff3fa150f/img/cmd.png -------------------------------------------------------------------------------- /img/crawl_db_data.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/LockGit/Py/795feb1efcd03296501127201a812adff3fa150f/img/crawl_db_data.png -------------------------------------------------------------------------------- /img/crawl_run.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/LockGit/Py/795feb1efcd03296501127201a812adff3fa150f/img/crawl_run.gif -------------------------------------------------------------------------------- /img/download.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/LockGit/Py/795feb1efcd03296501127201a812adff3fa150f/img/download.gif -------------------------------------------------------------------------------- /img/knn.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/LockGit/Py/795feb1efcd03296501127201a812adff3fa150f/img/knn.png -------------------------------------------------------------------------------- /img/redpackage.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/LockGit/Py/795feb1efcd03296501127201a812adff3fa150f/img/redpackage.gif -------------------------------------------------------------------------------- /img/spider-wx.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/LockGit/Py/795feb1efcd03296501127201a812adff3fa150f/img/spider-wx.png -------------------------------------------------------------------------------- /img/svm.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/LockGit/Py/795feb1efcd03296501127201a812adff3fa150f/img/svm.png -------------------------------------------------------------------------------- /img/tire.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/LockGit/Py/795feb1efcd03296501127201a812adff3fa150f/img/tire.png -------------------------------------------------------------------------------- /interpreter.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # @Author: lock 3 | # @Date: 2017-12-18 15:21:43 4 | # @Last Modified by: lock 5 | # @Last Modified time: 2017-12-18 15:40:25 6 | 7 | class Interpreter: 8 | def __init__(self): 9 | self.stack = [] 10 | 11 | def load_value(self, number): 12 | self.stack.append(number) 13 | 14 | def print_answer(self): 15 | answer = self.stack.pop() 16 | print(answer) 17 | 18 | def add_two_values(self): 19 | first_num = self.stack.pop() 20 | second_num = self.stack.pop() 21 | total = first_num + second_num 22 | self.stack.append(total) 23 | 24 | def run_code(self, what_to_execute): 25 | instructions = what_to_execute["instructions"] 26 | numbers = what_to_execute["numbers"] 27 | for each_step in instructions: 28 | instruction, argument = each_step 29 | if instruction == "load_value": 30 | number = numbers[argument] 31 | self.load_value(number) 32 | elif instruction == "add_two_values": 33 | self.add_two_values() 34 | elif instruction == "print_answer": 35 | self.print_answer() 36 | 37 | interpreter = Interpreter() 38 | what_to_execute = { 39 | "instructions": [("load_value", 0), 40 | ("load_value", 1), 41 | ("add_two_values", None), 42 | ("print_answer", None)], 43 | "numbers": [7, 5] } 44 | interpreter.run_code(what_to_execute) -------------------------------------------------------------------------------- /kmp.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # @Author: lock 3 | # @Date: 2017-12-14 18:08:13 4 | # @Last Modified by: lock 5 | # @Last Modified time: 2017-12-14 23:31:06 6 | 7 | 8 | # 只是字符串匹配,还不是真正的kmp 9 | def kmp(string, match): 10 | n = len(string) 11 | m = len(match) 12 | i = 0 13 | j = 0 14 | count_times_used = 0 15 | while i < n: 16 | count_times_used += 1 17 | if match[j] == string[i]: 18 | if j == m - 1: 19 | print "Found '%s' start at string '%s' %s index position, find use times: %s" % (match, string, i - m + 1, count_times_used,) 20 | return 21 | i += 1 22 | j += 1 23 | elif j > 0: 24 | j = j - 1 25 | else: 26 | i += 1 27 | 28 | kmp("asfdehhaassdsdasasedwa", "sase") 29 | kmp("12s3sasexxx", "sase") 30 | -------------------------------------------------------------------------------- /knn.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # @Author: lock 3 | # @Date: 2017-12-23 19:24:54 4 | # @Last Modified by: lock 5 | # @Last Modified time: 2017-12-23 19:41:34 6 | import math 7 | import numpy as np 8 | from matplotlib import pyplot 9 | from collections import Counter 10 | import warnings 11 | # K最近邻算法 12 | # 两个分组时k值取3,3个分组时k值取5... 13 | 14 | # k-Nearest Neighbor算法 15 | def k_nearest_neighbors(data, predict, k=3): 16 | 17 | if len(data) >= k: 18 | warnings.warn("k is too small") 19 | 20 | # 计算predict点到各点的距离 21 | distances = [] 22 | for group in data: 23 | for features in data[group]: 24 | #euclidean_distance = np.sqrt(np.sum((np.array(features)-np.array(predict))**2)) # 计算欧拉距离,这个方法没有下面一行代码快 25 | euclidean_distance = np.linalg.norm(np.array(features)-np.array(predict)) 26 | distances.append([euclidean_distance, group]) 27 | 28 | sorted_distances =[i[1] for i in sorted(distances)] 29 | top_nearest = sorted_distances[:k] 30 | 31 | #print(top_nearest) ['red','black','red'] 出现次数最多,返回一个TopN列表。如果n没有被指定,则返回所有元素。当多个元素计数值相同时,排列是无确定顺序的。 32 | group_res = Counter(top_nearest).most_common(1)[0][0] 33 | confidence = Counter(top_nearest).most_common(1)[0][1]*1.0/k 34 | # confidences是对本次分类的确定程度,例如(red,red,red),(red,red,black)都分为red组,但是前者显的更自信 35 | return group_res, confidence 36 | 37 | if __name__=='__main__': 38 | 39 | dataset = {'black':[ [1,2], [2,3], [3,1] ], 'red':[ [6,5], [7,7], [8,6] ]} 40 | new_features = [3.5,5.2] # 判断这个样本属于哪个组 41 | 42 | for i in dataset: 43 | for ii in dataset[i]: 44 | pyplot.scatter(ii[0], ii[1], s=50, color=i) 45 | 46 | #两个分组时k值取3,3个分组时k值取5 47 | which_group,confidence = k_nearest_neighbors(dataset, new_features, k=3) 48 | print(which_group, confidence) 49 | 50 | #s表示点的大小 51 | pyplot.scatter(new_features[0], new_features[1], s=300, color=which_group) 52 | 53 | pyplot.show() -------------------------------------------------------------------------------- /linked_list.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # @Author: lock 3 | # @Date: 2017-12-20 22:53:58 4 | # @Last Modified by: lock 5 | # @Last Modified time: 2017-12-20 23:27:23 6 | 7 | #常规做法,遍历一次链表,获得长度step,在从0的位置遍历到step/2的位置 8 | class Node(object): 9 | def __init__(self,data,next): 10 | self.data=data 11 | self.next=next 12 | 13 | n1 = Node('n1',None) 14 | n2 = Node('n2',n1) 15 | n3 = Node('n3',n2) 16 | n4 = Node('n4',n3) 17 | n5 = Node('n5',n4) 18 | 19 | head = n5 20 | step = 0 21 | while head.next is not None: 22 | step = step+1 23 | head = head.next 24 | 25 | 26 | head = n5 27 | for x in xrange(0,step/2): 28 | head = head.next 29 | 30 | 31 | print '普通遍历方式,单链表中间节点为:%s,索引为:%s,遍历一次链表,在从0遍历到中间位置' % (head.data,step/2) 32 | 33 | 34 | #快慢指针方式,遍历一次链表,快指针到达链表末尾,慢指针到达链表中间 35 | class Node(object): 36 | def __init__(self,data,next): 37 | self.data=data 38 | self.next=next 39 | 40 | n1 = Node('n1',None) 41 | n2 = Node('n2',n1) 42 | n3 = Node('n3',n2) 43 | n4 = Node('n4',n3) 44 | n5 = Node('n5',n4) 45 | 46 | head = n5 # 链表的头节点 47 | 48 | p1 = head # 一次步进1个node 49 | p2 = head # 一次步进2个node 50 | 51 | step = 0 52 | while (p2.next is not None and p2.next.next is not None): 53 | p2 = p2.next.next 54 | p1 = p1.next 55 | step = step + 1 56 | 57 | 58 | print '快慢指针方式,单链表中间节点为:%s,索引为:%s,只遍历一次链表' % (p1.data,step) 59 | 60 | 61 | -------------------------------------------------------------------------------- /nice_download.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # encoding: utf-8 3 | # author: Lock 4 | # time: 2017/12/21 18:28 5 | # 多线程文件下载器,默认单线程 6 | 7 | import sys 8 | import optparse 9 | import threading 10 | import requests 11 | import re 12 | import time 13 | 14 | 15 | class Download(object): 16 | def __init__(self, config_dict): 17 | self.url = config_dict['url'] 18 | self.filename = self.clear_name(config_dict['url'].split('/')[-1]) 19 | self.thread = config_dict['thread'] 20 | self.user_agent = config_dict['user_agent'] 21 | self.fileSize = 0 22 | self.supportThread = True 23 | self.show_print = (config_dict['show_print'] == 'yes') and True or False 24 | 25 | # 移除文件名的一些特殊字符 26 | def clear_name(self, filename): 27 | (filename, _) = re.subn(ur'[\\\/\:\*\?\"\<\>\|]', '', filename) 28 | return filename 29 | 30 | # 初始化目标文件信息 31 | def init_file_info(self): 32 | headers = { 33 | 'User-Agent': self.user_agent, 34 | 'Range': 'bytes=0-4' 35 | } 36 | try: 37 | r = requests.head(self.url, headers=headers) 38 | rang_content = r.headers['content-range'] 39 | self.fileSize = int(re.match(ur'^bytes 0-4/(\d+)$', rang_content).group(1)) 40 | return True 41 | except Exception, e: 42 | print 'can not support breakpoint download,msg:%s' % (e.message,) 43 | 44 | try: 45 | self.fileSize = int(r.headers['content-length']) 46 | except Exception, e: 47 | self.supportThread = False 48 | print 'can not support multi thread download , error:%s' % (e.message,) 49 | return False 50 | 51 | def start_part_download(self, thread_id, start_index, stop_index): 52 | try: 53 | headers = {'Range': 'bytes=%d-%d' % (start_index, stop_index,), 'User-Agent': self.user_agent} 54 | r = requests.get(self.url, headers=headers, stream=True, allow_redirects=True) 55 | if r.status_code == 206: 56 | with open(self.filename, "rb+") as fp: 57 | fp.seek(start_index) 58 | fp.write(r.content) 59 | if self.show_print: 60 | sys.stdout.write('thread %s download part size:%.2f KB\n' % (thread_id, (r.content.__len__()) / 1024)) 61 | sys.stdout.flush() 62 | except Exception, e: 63 | if self.show_print: 64 | sys.stdout.write('下载出现错误,错误位置:%s,状态码:%s,错误信息:%s\n' % (start_index, r.status_code, e.message)) 65 | sys.stdout.flush() 66 | 67 | def run(self): 68 | print 'Start...' 69 | start_time = time.time() 70 | self.init_file_info() 71 | # 创建一个和要下载文件一样大小的文件 72 | with open(self.filename, "wb") as fp: 73 | fp.truncate(self.fileSize) 74 | 75 | if self.fileSize > 0: 76 | if self.supportThread is False and self.thread > 1: 77 | print 'sorry,only support single thread' 78 | self.thread = 1 79 | print 'Thread count is:%s' % (self.thread,) 80 | part = self.fileSize / self.thread 81 | for i in xrange(0, self.thread): 82 | start_index = part * i 83 | stop_index = start_index + part 84 | if i == self.thread - 1: 85 | stop_index = self.fileSize 86 | download_args = {'thread_id': i, 'start_index': start_index, 'stop_index': stop_index} 87 | worker = threading.Thread(target=self.start_part_download, kwargs=download_args) 88 | worker.setDaemon(True) 89 | worker.start() 90 | # 等待所有线程下载完成 91 | main_thread = threading.current_thread() 92 | for t in threading.enumerate(): 93 | if t is main_thread: 94 | continue 95 | t.join() 96 | print 'Success.\nTime:%.2fs , Size:%.2fKB' % (time.time() - start_time, self.fileSize / 1024) 97 | else: 98 | print 'Can not download' 99 | 100 | 101 | if __name__ == '__main__': 102 | parser = optparse.OptionParser(usage='python %s.py [options]' % (sys.argv[0],)) 103 | parser.add_option('-u', dest='url', type='string', help='specify download resource url') 104 | parser.add_option('-t', dest='thread', type='int', help='specify download thread count', default=1) 105 | parser.add_option('-p', dest='show_print', type='string', help='yes/no,show print info,default enable', default='yes') 106 | parser.add_option("-a", dest="user_agent", help="specify request user agent", default='Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:57.0) Gecko/20100101 Firefox/57.0') 107 | (options, args) = parser.parse_args() 108 | if options.url is None: 109 | parser.print_help() 110 | exit() 111 | config = { 112 | 'url': options.url, 113 | 'thread': options.thread, 114 | 'user_agent': options.user_agent, 115 | 'show_print': options.show_print 116 | } 117 | try: 118 | Download(config).run() 119 | except KeyboardInterrupt: 120 | print '\nCancel Download' 121 | -------------------------------------------------------------------------------- /palindrome.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # @Author: lock 3 | # @Date: 2017-06-25 00:48:56 4 | # @Last Modified by: lock 5 | # @Last Modified time: 2017-06-25 01:39:49 6 | 7 | # 时间复杂度:O(n),空间复杂度:O(1)。从两头向中间扫描 8 | 9 | s = "abcmnmcba" 10 | 11 | 12 | def check(s): 13 | start = 0 14 | end = len(s) - 1 15 | while start < end: 16 | if s[start:start + 1] != s[end:end + 1]: 17 | print s[start:start + 1] + '---' + s[end:end + 1] 18 | return False 19 | start = start + 1 20 | end = end - 1 21 | return True 22 | 23 | # print check(s) 24 | 25 | 26 | s2 = '12311211321' 27 | 28 | # 时间复杂度:O(n),空间复杂度:O(1)。先从中间开始、然后向两边扩展 29 | def check2(s): 30 | if len(s) % 2 == 0: 31 | mid = len(s) / 2 32 | start, end = mid - 1, mid 33 | if len(s) % 2 == 1: 34 | mid = len(s) / 2 35 | start, end = mid - 1, mid+1 36 | while mid > 0: 37 | if s[start:start+1] != s[end:end+1]: 38 | return False 39 | start = start - 1 40 | end = end + 1 41 | mid = mid - 1 42 | return True 43 | 44 | print check2(s2) 45 | -------------------------------------------------------------------------------- /rb_tree.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # encoding: utf-8 3 | # author: Lock 4 | # Created by Vim 5 | """ 6 | 红黑树多用在内部排序,即全放在内存中的,微软STL的map和set的内部实现就是红黑树。 7 | B树多用在内存里放不下,大部分数据存储在外存上时。因为B树层数少,因此可以确保每次操作,读取磁盘的次数尽可能的少。 8 | 在数据较小,可以完全放到内存中时,红黑树的时间复杂度比B树低。反之,数据量较大,外存中占主要部分时,B树因其读磁盘次数少,而具有更快的速度。 9 | """ 10 | 11 | 12 | class RBTree(object): 13 | def __init__(self): 14 | self.nil = RBTreeNode(0) 15 | self.root = self.nil 16 | 17 | 18 | class RBTreeNode(object): 19 | def __init__(self, x): 20 | self.key = x 21 | self.left = None 22 | self.right = None 23 | self.parent = None 24 | self.color = 'black' 25 | self.size = None 26 | 27 | 28 | # 左旋转 29 | def left_rotate(T, x): 30 | y = x.right 31 | x.right = y.left 32 | if y.left != T.nil: 33 | y.left.parent = x 34 | y.parent = x.parent 35 | if x.parent == T.nil: 36 | T.root = y 37 | elif x == x.parent.left: 38 | x.parent.left = y 39 | else: 40 | x.parent.right = y 41 | y.left = x 42 | x.parent = y 43 | 44 | 45 | # 右旋转 46 | def right_rotate(T, x): 47 | y = x.left 48 | x.left = y.right 49 | if y.right != T.nil: 50 | y.right.parent = x 51 | y.parent = x.parent 52 | if x.parent == T.nil: 53 | T.root = y 54 | elif x == x.parent.right: 55 | x.parent.right = y 56 | else: 57 | x.parent.left = y 58 | y.right = x 59 | x.parent = y 60 | 61 | 62 | # 红黑树的插入 63 | def rb_insert(T, z): 64 | y = T.nil 65 | x = T.root 66 | while x != T.nil: 67 | y = x 68 | if z.key < x.key: 69 | x = x.left 70 | else: 71 | x = x.right 72 | z.parent = y 73 | if y == T.nil: 74 | T.root = z 75 | elif z.key < y.key: 76 | y.left = z 77 | else: 78 | y.right = z 79 | z.left = T.nil 80 | z.right = T.nil 81 | z.color = 'red' 82 | rb_insert_fix_up(T, z) 83 | return "%s,%s,%s" % (z.key, "颜色为", z.color) 84 | 85 | 86 | # 红黑树的上色 87 | def rb_insert_fix_up(T, z): 88 | while z.parent.color == 'red': 89 | if z.parent == z.parent.parent.left: 90 | y = z.parent.parent.right 91 | if y.color == 'red': 92 | z.parent.color = 'black' 93 | y.color = 'black' 94 | z.parent.parent.color = 'red' 95 | z = z.parent.parent 96 | else: 97 | if z == z.parent.right: 98 | z = z.parent 99 | left_rotate(T, z) 100 | z.parent.color = 'black' 101 | z.parent.parent.color = 'red' 102 | right_rotate(T, z.parent.parent) 103 | else: 104 | y = z.parent.parent.left 105 | if y.color == 'red': 106 | z.parent.color = 'black' 107 | y.color = 'black' 108 | z.parent.parent.color = 'red' 109 | z = z.parent.parent 110 | else: 111 | if z == z.parent.left: 112 | z = z.parent 113 | right_rotate(T, z) 114 | z.parent.color = 'black' 115 | z.parent.parent.color = 'red' 116 | left_rotate(T, z.parent.parent) 117 | T.root.color = 'black' 118 | 119 | 120 | def rb_transplant(T, u, v): 121 | if u.parent == T.nil: 122 | T.root = v 123 | elif u == u.parent.left: 124 | u.parent.left = v 125 | else: 126 | u.parent.right = v 127 | v.parent = u.parent 128 | 129 | 130 | def rb_delete(T, z): 131 | y = z 132 | y_original_color = y.color 133 | if z.left == T.nil: 134 | x = z.right 135 | rb_transplant(T, z, z.right) 136 | elif z.right == T.nil: 137 | x = z.left 138 | rb_transplant(T, z, z.left) 139 | else: 140 | y = tree_minimum(z.right) 141 | y_original_color = y.color 142 | x = y.right 143 | if y.parent == z: 144 | x.parent = y 145 | else: 146 | rb_transplant(T, y, y.right) 147 | y.right = z.right 148 | y.right.parent = y 149 | rb_transplant(T, z, y) 150 | y.left = z.left 151 | y.left.parent = y 152 | y.color = z.color 153 | if y_original_color == 'black': 154 | rb_delete_fix_up(T, x) 155 | 156 | 157 | # 红黑树的删除 158 | def rb_delete_fix_up(T, x): 159 | while x != T.root and x.color == 'black': 160 | if x == x.parent.left: 161 | w = x.parent.right 162 | if w.color == 'red': 163 | w.color = 'black' 164 | x.parent.color = 'red' 165 | left_rotate(T, x.parent) 166 | w = x.parent.right 167 | if w.left.color == 'black' and w.right.color == 'black': 168 | w.color = 'red' 169 | x = x.parent 170 | else: 171 | if w.right.color == 'black': 172 | w.left.color = 'black' 173 | w.color = 'red' 174 | right_rotate(T, w) 175 | w = x.parent.right 176 | w.color = x.parent.color 177 | x.parent.color = 'black' 178 | w.right.color = 'black' 179 | left_rotate(T, x.parent) 180 | x = T.root 181 | else: 182 | w = x.parent.left 183 | if w.color == 'red': 184 | w.color = 'black' 185 | x.parent.color = 'red' 186 | right_rotate(T, x.parent) 187 | w = x.parent.left 188 | if w.right.color == 'black' and w.left.color == 'black': 189 | w.color = 'red' 190 | x = x.parent 191 | else: 192 | if w.left.color == 'black': 193 | w.right.color = 'black' 194 | w.color = 'red' 195 | left_rotate(T, w) 196 | w = x.parent.left 197 | w.color = x.parent.color 198 | x.parent.color = 'black' 199 | w.left.color = 'black' 200 | right_rotate(T, x.parent) 201 | x = T.root 202 | x.color = 'black' 203 | 204 | 205 | def tree_minimum(x): 206 | while x.left != T.nil: 207 | x = x.left 208 | return x 209 | 210 | 211 | # 中序遍历 212 | def mid_sort(x): 213 | if x is not None: 214 | mid_sort(x.left) 215 | if x.key != 0: 216 | print('key:', x.key, 'x.parent', x.parent.key) 217 | mid_sort(x.right) 218 | 219 | 220 | if __name__ == '__main__': 221 | nodes = [11, 2, 14, 1, 7, 15, 5, 8, 4] 222 | T = RBTree() 223 | for node in nodes: 224 | print '插入数据', rb_insert(T, RBTreeNode(node)) 225 | print('中序遍历') 226 | mid_sort(T.root) 227 | rb_delete(T, T.root) 228 | print('中序遍历') 229 | mid_sort(T.root) 230 | rb_delete(T, T.root) 231 | print('中序遍历') 232 | mid_sort(T.root) 233 | -------------------------------------------------------------------------------- /red_package_optimize.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # encoding: utf-8 3 | # author: Lock 4 | # Created by Vim 5 | from random import choice 6 | import random 7 | import sys 8 | 9 | def calc_red_package(m,c): 10 | if c*0.01*100>m: 11 | print '红包总金额为:%s元不能划分成%s个红包'%(m/100.0,c,) 12 | exit() 13 | 14 | all = {} 15 | val = m/c 16 | left = m-(val*c) 17 | for i in range(0,c): 18 | all[i] = [1,val] 19 | 20 | if left>0: 21 | rand = random.randint(0,c-1) 22 | all[rand].append(all[rand][-1]+left) 23 | 24 | pos = {} 25 | for index in all: 26 | stop=random.randint(0,val) 27 | pos[index] = stop 28 | 29 | res = [] 30 | 31 | if len(pos)>=2: 32 | for point in pos: 33 | left = pos[point] 34 | if left==0: 35 | left=1 36 | if point==0: 37 | res.append(left/100.0) 38 | last_right = all[point][-1]-left 39 | else: 40 | right = last_right 41 | if point==(c-1): 42 | res.append((left+right)/100.0) 43 | end = all[point][-1]-left 44 | randMax = random.randint(0,len(res)-1) 45 | res[randMax] = (res[randMax]*100 + end)/100.0 46 | else: 47 | res.append((left+right)/100.0) 48 | last_right = all[point][-1]-left 49 | else: 50 | res.append((all[0][-1])/100.0) 51 | 52 | print res 53 | 54 | for key,item in enumerate(res): 55 | print '第 %s 个红包金额:%s元' %(key+1,item) 56 | print '验证:红包总金额 is %s元, 分配后 res sum is %s元'%(m/100.0,sum(res),) 57 | 58 | 59 | if __name__ == '__main__': 60 | m = 1000 # 红包金额,单位:分 61 | c = 4 # 红包个数 62 | if len(sys.argv)==3: 63 | m = int(float(sys.argv[1])*100) 64 | c = int(sys.argv[2]) 65 | calc_red_package(m,c) 66 | 67 | 68 | 69 | -------------------------------------------------------------------------------- /redpackage.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # encoding: utf-8 3 | # author: Lock 4 | # Created by Vim 5 | from random import choice 6 | import sys 7 | 8 | def calc_red_package(m,c): 9 | if c*0.01*100>m: 10 | print '红包总金额为:%s元不能划分成%s个红包'%(m/100.0,c,) 11 | exit() 12 | 13 | all = {} 14 | val = m/c 15 | left = m-(val*c) 16 | for i in range(0,c): 17 | all[i] = range(0,val) 18 | 19 | if left>0: 20 | rand = choice(range(0,c)) 21 | all[rand].append(all[rand][-1]+left) 22 | 23 | pos = {} 24 | for index in all: 25 | stop=choice(all[index]) 26 | pos[index] = stop 27 | 28 | res = [] 29 | 30 | if len(pos)>=2: 31 | for point in pos: 32 | left = pos[point]+1 33 | if point==0: 34 | res.append(left/100.0) 35 | else: 36 | right = all[point-1][-1]-pos[point-1] 37 | if point==(c-1): 38 | end = all[point][-1]-pos[point] 39 | randMax = choice(range(0,len(res))) 40 | res[randMax] = (res[randMax]*100 + end)/100.0 41 | res.append((left+right)/100.0) 42 | else: 43 | res.append((left+right)/100.0) 44 | else: 45 | res.append((all[0][-1]+1)/100.0) 46 | 47 | print res 48 | 49 | for key,item in enumerate(res): 50 | print '第 %s 个红包金额:%s元' %(key+1,item) 51 | print '验证:红包总金额 is %s元, 分配后 res sum is %s元'%(m/100.0,sum(res),) 52 | 53 | 54 | if __name__ == '__main__': 55 | m = 1000 # 红包金额,单位:分 56 | c = 4 # 红包个数 57 | if len(sys.argv)==3: 58 | m = int(float(sys.argv[1])*100) 59 | c = int(sys.argv[2]) 60 | calc_red_package(m,c) 61 | 62 | 63 | 64 | -------------------------------------------------------------------------------- /revert_list.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # @Author: lock 3 | # @Date: 2017-09-05 10:53:46 4 | # @Last Modified by: lock 5 | # @Last Modified time: 2019-06-10 16:21:00 6 | 7 | class Node(): 8 | 9 | def __init__(self, value): 10 | self.next = None 11 | self.value = value 12 | 13 | 14 | class DoubleNode: 15 | 16 | def __init__(self, value): 17 | self.value = value 18 | self.next = None 19 | self.pre = None 20 | 21 | 22 | class RevertList(): 23 | 24 | @classmethod 25 | def revert_linked_list(cls, head): 26 | pre = None 27 | while head is not None: 28 | next = head.next 29 | head.next = pre 30 | pre = head 31 | head = next 32 | return pre 33 | 34 | @classmethod 35 | def revert_double_linked_list(cls, head): 36 | pre = None 37 | while head is not None: 38 | next = head.next 39 | head.next = pre 40 | head.pre = next 41 | pre = head 42 | head = next 43 | return pre 44 | 45 | 46 | if __name__ == '__main__': 47 | node = Node(1); 48 | node.next = Node(2); 49 | node.next.next = Node(3) 50 | print node.value 51 | print node.next.value 52 | print node.next.next.value 53 | print 'start revert list ...' 54 | newNode = RevertList.revert_linked_list(node) 55 | print newNode.value 56 | print newNode.next.value 57 | print newNode.next.next.value 58 | 59 | node2 = DoubleNode(1) 60 | node2.next = DoubleNode(2) 61 | node2.next.pre = node2 62 | node2.next.next = DoubleNode(3) 63 | node2.next.next.pre = node2.next 64 | node2.next.next.next = DoubleNode(4) 65 | node2.next.next.next.pre = node2.next.next 66 | node2.next.next.next.next = DoubleNode(5) 67 | node2.next.next.next.next.pre = node2.next.next.next 68 | node2.next.next.next.next.next = DoubleNode(6) 69 | node2.next.next.next.next.next.pre = node2.next.next.next 70 | -------------------------------------------------------------------------------- /rpn.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # @Author: lock 3 | # @Date: 2017-06-26 15:51:28 4 | # @Last Modified by: lock 5 | # @Last Modified time: 2017-06-26 15:55:53 6 | def calc(s): 7 | if type(s) != list:s = s.split(' ') 8 | operaList = ['+', '-', '*', '/']; 9 | for key, item in enumerate(s): 10 | if item in operaList: 11 | val = eval(s[key - 2] + item + s[key - 1]) 12 | s.insert(key - 2, str(val)) 13 | for x in ['1','2','3']:del s[key - 1] 14 | calc(s) 15 | return sum(map(eval, s)) 16 | def translate(calcStr): 17 | element, calcList,s,stackStr,i = '', [],[],[],0 18 | for key,item in enumerate(calcStr): 19 | if item.isdigit(): 20 | element = element + item 21 | if key == len(calcStr)-1:calcList.append(element) 22 | else: 23 | if element != '': 24 | calcList.append(element) 25 | element = '' 26 | if item in ['+', '-', '*', '/', '(', ')']: 27 | calcList.append(item) 28 | calcList.insert(0,'(') 29 | calcList.append(')') 30 | calcList.append('#') 31 | while calcList[i] != "#": 32 | if (calcList[i].isdigit()): 33 | stackStr.append(calcList[i]) 34 | elif calcList[i] == '(': 35 | s.append(calcList[i]) 36 | elif calcList[i] == ')': 37 | while s[-1] != '(': 38 | stackStr.append(s.pop()) 39 | s.pop() 40 | elif calcList[i] in ['+', '-']: 41 | while s[-1] != '(': 42 | stackStr.append(s.pop()) 43 | s.append(calcList[i]) 44 | elif calcList[i] in ['*', '/']: 45 | while s[-1] in ['*', '/']: 46 | stackStr.append(s.pop()) 47 | s.append(calcList[i]) 48 | i = i + 1 49 | return stackStr 50 | if __name__ == '__main__': 51 | s = '11111111111111*9999999999999+(99-(12/4)+10)' 52 | print translate(s) 53 | print str(calc(translate('11111111111111*9999999999999+(99-(12/4)+10)'))) == str(11111111111111*9999999999999+(99-(12/4)+10)),str(calc(translate(s))) , str(11111111111111*9999999999999+(99-(12/4)+10)) 54 | print str(calc(translate('12+1+12+33*9+4'))) == str(12+1+12+33*9+4),str(calc(translate('12+1+12+33*9+4'))) , str(12+1+12+33*9+4) 55 | 56 | 57 | 58 | -------------------------------------------------------------------------------- /rsa.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # encoding: utf-8 3 | # author: Lock 4 | # time: 2016/11/8 16:51 5 | 6 | 7 | # RSA算法中加密方公布的密钥是n和e,解密方使用n和d解密 8 | 9 | # p和q必须为素数,在实际运用中通常为很大的数 10 | p = 5 11 | q = 7 12 | 13 | n = p * q 14 | z = (p - 1) * (q - 1) 15 | 16 | e = 5 # 加密方选择e,e必须和z只有一个公约数 17 | d = 5 # (e * d - 1)必须能够被z整除 , 由于long int无法表示过大的数字,所以d取5 18 | 19 | 20 | def run(): 21 | raw_msg = [12, 15, 22, 5] 22 | en, de = [], [] 23 | sec_code, de_msg = [], [] 24 | print "下面是一个RSA加解密算法的简单演示:\n" 25 | print "报文\t加密\t 加密后密文\n" 26 | for item in raw_msg: 27 | en_key_item = pow(item, e) 28 | en.append(en_key_item) 29 | sec_code_item = en_key_item % n 30 | sec_code.append(sec_code_item) 31 | print "%d\t%d\t\t%d" % (item, en_key_item, sec_code_item) 32 | 33 | print "\n" 34 | print "---------------------------" 35 | print "----------执行解密---------" 36 | print "---------------------------" 37 | 38 | print "原始报文\t密文\t加密\t解密报文\n" 39 | for key,item in enumerate(sec_code): 40 | de_key_item = pow(item,d) 41 | de_msg_item = de_key_item % n 42 | de_msg.append(de_msg_item) 43 | print "%d\t\t%d\t%d\t\t%d" % (raw_msg[key], item, de_key_item, de_msg_item) 44 | 45 | if __name__ == '__main__': 46 | run() 47 | 48 | 49 | 50 | -------------------------------------------------------------------------------- /selenium.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # @Author: lock 3 | # @Date: 2017-04-04 01:40:22 4 | # @Last Modified by: lock 5 | # @Last Modified time: 2017-04-04 23:38:09 6 | import unittest 7 | from selenium import webdriver 8 | from selenium.webdriver.common.keys import Keys 9 | import time 10 | 11 | class BaiduSearch(unittest.TestCase): 12 | 13 | def setUp(self): 14 | self.driver = webdriver.Chrome() 15 | 16 | 17 | def test_lock(self): 18 | driver = self.driver 19 | driver.get("http://www.baidu.com") 20 | self.assertIn(u"百度一下", driver.title) 21 | elem = driver.find_element_by_id("kw") 22 | elem.send_keys("lock") 23 | elem.send_keys(Keys.RETURN) 24 | i = 0 25 | while 1: 26 | if i>=2: 27 | break 28 | time.sleep(1) 29 | i+=1 30 | print "not test %s , wait %s second continue ..." % ('lock',i,) 31 | 32 | def test_search(self): 33 | driver = self.driver 34 | driver.get("http://www.baidu.com") 35 | self.assertIn(u"百度一下", driver.title) 36 | elem = driver.find_element_by_id("kw") 37 | elem.send_keys("php") 38 | elem.send_keys(Keys.RETURN) 39 | i = 0 40 | while 1: 41 | if i>=2: 42 | break 43 | time.sleep(1) 44 | i+=1 45 | print "not test %s , wait %s second continue ..." % ('php',i,) 46 | assert "No results found." not in driver.page_source 47 | 48 | 49 | def tearDown(self): 50 | self.driver.close() 51 | 52 | if __name__ == "__main__": 53 | unittest.main() -------------------------------------------------------------------------------- /svm.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # @Author: lock 3 | # @Date: 2017-12-21 09:58:01 4 | # @Last Modified by: lock 5 | # @Last Modified time: 2017-12-21 17:41:07 6 | # 分类算法之SVM,比KNN算法更加复杂 7 | # demo最简单的线性可分离数据 8 | # 参考:http://blog.csdn.net/lisi1129/article/details/70209945?locationNum=8&fps=1 9 | 10 | import numpy as np 11 | from matplotlib import pyplot 12 | import math 13 | import sys 14 | 15 | class SVM(object): 16 | def __init__(self, visual=True): 17 | self.visual = visual 18 | self.colors = {1:'r', -1:'b'} 19 | if self.visual: 20 | self.fig = pyplot.figure() 21 | self.ax = self.fig.add_subplot(1,1,1) 22 | 23 | def train(self, data): 24 | self.data = data 25 | opt_dict = {} 26 | 27 | transforms = [[1,1], 28 | [-1,1], 29 | [-1,-1], 30 | [1,-1]] 31 | 32 | # 找到数据集中最大值和最小值 33 | self.max_feature_value = float('-inf') # 正无穷 34 | self.min_feature_value = float('inf') # 负无穷 35 | for y in self.data: 36 | for features in self.data[y]: 37 | for feature in features: 38 | if feature > self.max_feature_value: 39 | self.max_feature_value = feature 40 | if feature < self.min_feature_value: 41 | self.min_feature_value = feature 42 | print(self.max_feature_value, self.min_feature_value) 43 | 44 | # 和梯度下降一样,定义每一步的大小;开始快,然后慢,越慢越耗时 45 | step_sizes = [self.max_feature_value * 0.1, self.max_feature_value * 0.01, self.max_feature_value * 0.001] 46 | 47 | b_range_multiple = 5 48 | b_multiple = 5 49 | lastest_optimum = self.max_feature_value * 10 50 | 51 | for step in step_sizes: 52 | w = np.array([lastest_optimum,lastest_optimum]) 53 | optimized = False 54 | while not optimized: 55 | for b in np.arange(self.max_feature_value*b_range_multiple*-1, self.max_feature_value*b_range_multiple, step*b_multiple): 56 | for transformation in transforms: 57 | w_t = w * transformation 58 | found_option = True 59 | for i in self.data: 60 | for x in self.data[i]: 61 | y = i 62 | if not y*(np.dot(w_t, x)+b) >= 1: 63 | found_option = False 64 | #print(x,':',y*(np.dot(w_t, x)+b)) 逐渐收敛 65 | 66 | if found_option: 67 | opt_dict[np.linalg.norm(w_t)] = [w_t,b] 68 | 69 | if w[0] < 0: 70 | optimized = True 71 | else: 72 | w = w - step 73 | 74 | norms = sorted([n for n in opt_dict]) 75 | opt_choice = opt_dict[norms[0]] 76 | self.w = opt_choice[0] 77 | self.b = opt_choice[1] 78 | print(self.w, self.b) 79 | lastest_optimum = opt_choice[0][0] + step*2 80 | 81 | 82 | def predict(self, features): 83 | classification = np.sign( np.dot(features, self.w) + self.b ) 84 | 85 | if classification != 0 and self.visual: 86 | self.ax.scatter(features[0], features[1], s=300, marker='*', c=self.colors[classification]) 87 | 88 | return classification 89 | 90 | 91 | # 显示picture 92 | def visualize(self): 93 | for i in self.data: 94 | for x in self.data[i]: 95 | self.ax.scatter(x[0], x[1], s=50, c=self.colors[i]) 96 | 97 | # 超平面 98 | def hyperplane(x,w,b,v): 99 | return (-w[0]*x-b+v) / w[1] 100 | 101 | data_range = (self.min_feature_value*0.9, self.max_feature_value*1.1) 102 | 103 | hyp_x_min = data_range[0] 104 | hyp_x_man = data_range[1] 105 | 106 | psv1 = hyperplane(hyp_x_min, self.w, self.b, 1) 107 | psv2 = hyperplane(hyp_x_man, self.w, self.b, 1) 108 | self.ax.plot([hyp_x_min, hyp_x_man], [psv1, psv2], c=self.colors[1]) 109 | 110 | nsv1 = hyperplane(hyp_x_min, self.w, self.b, -1) 111 | nsv2 = hyperplane(hyp_x_man, self.w, self.b, -1) 112 | self.ax.plot([hyp_x_min, hyp_x_man], [nsv1, nsv2], c=self.colors[-1]) 113 | 114 | db1 = hyperplane(hyp_x_min, self.w, self.b, 0) 115 | db2 = hyperplane(hyp_x_man, self.w, self.b, 0) 116 | self.ax.plot([hyp_x_min, hyp_x_man], [db1, db2], 'y--') 117 | 118 | pyplot.show() 119 | 120 | if __name__ == '__main__': 121 | data_set = {-1:np.array([[1,7], 122 | [2,8], 123 | [3,8]]), 124 | 1:np.array([[5,1], 125 | [6,-1], 126 | [7,3]])} 127 | print(data_set) 128 | 129 | svm = SVM() 130 | svm.train(data_set) 131 | 132 | # 预测 133 | for predict_feature in [[0,10],[2,6],[1,3], [4,3], [5.5,7.5], [8,3]]: 134 | print(svm.predict(predict_feature)) 135 | 136 | svm.visualize() 137 | 138 | -------------------------------------------------------------------------------- /tensorflow/cnn_test.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # encoding: utf-8 3 | # author: Lock 4 | # time: 2018/3/18 17:26 5 | 6 | import tensorflow as tf 7 | from train import cnn_graph 8 | from train import get_random_captcha_text_and_image 9 | from train import vec2text, convert2gray 10 | from create_captcha_img import CAPTCHA_LIST, CAPTCHA_WIDTH, CAPTCHA_HEIGHT, CAPTCHA_LEN 11 | 12 | 13 | def captcha_to_text(image_list, height=CAPTCHA_HEIGHT, width=CAPTCHA_WIDTH): 14 | ''' 15 | 验证码图片转化为文本 16 | :param image_list: 17 | :param height: 18 | :param width: 19 | :return: 20 | ''' 21 | x = tf.placeholder(tf.float32, [None, height * width]) 22 | keep_prob = tf.placeholder(tf.float32) 23 | y_conv = cnn_graph(x, keep_prob, (height, width)) 24 | saver = tf.train.Saver() 25 | with tf.Session() as sess: 26 | saver.restore(sess, tf.train.latest_checkpoint('.')) 27 | predict = tf.argmax(tf.reshape(y_conv, [-1, CAPTCHA_LEN, len(CAPTCHA_LIST)]), 2) 28 | vector_list = sess.run(predict, feed_dict={x: image_list, keep_prob: 1}) 29 | vector_list = vector_list.tolist() 30 | text_list = [vec2text(vector) for vector in vector_list] 31 | return text_list[0] 32 | 33 | 34 | def multi_test(height=CAPTCHA_HEIGHT, width=CAPTCHA_WIDTH): 35 | x = tf.placeholder(tf.float32, [None, height * width]) 36 | keep_prob = tf.placeholder(tf.float32) 37 | y_conv = cnn_graph(x, keep_prob, (height, width)) 38 | saver = tf.train.Saver() 39 | with tf.Session() as sess: 40 | saver.restore(sess, tf.train.latest_checkpoint('.')) 41 | while 1: 42 | text, image = get_random_captcha_text_and_image() 43 | image = convert2gray(image) 44 | image = image.flatten() / 255 45 | image_list = [image] 46 | predict = tf.argmax(tf.reshape(y_conv, [-1, CAPTCHA_LEN, len(CAPTCHA_LIST)]), 2) 47 | vector_list = sess.run(predict, feed_dict={x: image_list, keep_prob: 1}) 48 | vector_list = vector_list.tolist() 49 | text_list = [vec2text(vector) for vector in vector_list] 50 | pre_text = text_list[0] 51 | flag = u'错误' 52 | if text == pre_text: 53 | flag = u'正确' 54 | print u"实际值(actual):%s, 预测值(predict):%s, 预测结果:%s" % (text, pre_text, flag,) 55 | 56 | 57 | if __name__ == '__main__': 58 | try: 59 | # 多个测试 60 | multi_test() 61 | exit() 62 | 63 | text, image = get_random_captcha_text_and_image() 64 | image = convert2gray(image) 65 | image = image.flatten() / 255 66 | pre_text = captcha_to_text([image]) 67 | flag = u'错误' 68 | if text == pre_text: 69 | flag = u'正确' 70 | print u"实际值(actual):%s, 预测值(predict):%s, 预测结果:%s" % (text, pre_text, flag,) 71 | except KeyboardInterrupt as e: 72 | print e.message 73 | -------------------------------------------------------------------------------- /tensorflow/create_captcha_img.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # encoding: utf-8 3 | # author: Lock 4 | # time: 2018/3/18 13:25 5 | 6 | import string 7 | import random 8 | from captcha.image import ImageCaptcha 9 | from PIL import Image 10 | import numpy as np 11 | import os 12 | 13 | CAPTCHA_HEIGHT = 60 # 验证码高度 14 | CAPTCHA_WIDTH = 160 # 验证码宽度 15 | CAPTCHA_LEN = 4 # 验证码长度 16 | # CAPTCHA_LIST = [str(i) for i in range(0, 10)] + list(string.ascii_letters) # 验证码字符列表 17 | CAPTCHA_LIST = [str(i) for i in range(0, 10)] # 验证码字符列表,改小一点的访问,提高速度 18 | 19 | 20 | def get_random_captcha_text(char_set=CAPTCHA_LIST, length=CAPTCHA_LEN): 21 | captcha_text = [random.choice(char_set) for _ in range(length)] 22 | return ''.join(captcha_text) 23 | 24 | 25 | def get_random_captcha_text_and_image(width=CAPTCHA_WIDTH, height=CAPTCHA_HEIGHT, save=None): 26 | image = ImageCaptcha(width=width, height=height) 27 | captcha_text = get_random_captcha_text() 28 | captcha = image.generate(captcha_text) 29 | if save: 30 | image.write(captcha_text, 'image/' + captcha_text + '.jpg') 31 | captcha_image = Image.open(captcha) 32 | # 转化为np数组 33 | captcha_image_np = np.array(captcha_image) 34 | return captcha_text, captcha_image_np 35 | 36 | 37 | if __name__ == "__main__": 38 | if os.path.exists('image') is False: 39 | os.mkdir('image') 40 | 41 | while 1: 42 | text, np_data = get_random_captcha_text_and_image(CAPTCHA_WIDTH, CAPTCHA_HEIGHT, 1) 43 | print text 44 | -------------------------------------------------------------------------------- /tensorflow/train.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # encoding: utf-8 3 | # author: Lock 4 | # time: 2018/3/18 14:48 5 | 6 | import tensorflow as tf 7 | import os, numpy as np 8 | from datetime import datetime 9 | from create_captcha_img import CAPTCHA_LIST, CAPTCHA_WIDTH, CAPTCHA_HEIGHT, CAPTCHA_LEN 10 | from create_captcha_img import get_random_captcha_text_and_image 11 | 12 | os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2' 13 | 14 | 15 | def weight_variable(shape, w_alpha=0.01): 16 | """ 17 | 增加噪音,随机生成权重 18 | :param shape: 19 | :param w_alpha: 20 | :return: Tensor , shape仍然是[batch, height, width, channels]这种形式 21 | """ 22 | initial = w_alpha * tf.random_normal(shape) 23 | return tf.Variable(initial) 24 | 25 | 26 | def bias_variable(shape, b_alpha=0.01): 27 | """ 28 | 增加噪音,随机生成偏置项 29 | :param shape: 30 | :param b_alpha: 31 | :return: Tensor , shape仍然是[batch, height, width, channels]这种形式 32 | """ 33 | initial = b_alpha * tf.random_normal(shape) 34 | return tf.Variable(initial) 35 | 36 | 37 | def conv2d(input, filter): 38 | """ 39 | 实现卷积的函数 40 | 局部变量线性组合,部长为1,模式same代表卷积后图片尺寸不变,即零边距 41 | https://www.cnblogs.com/qggg/p/6832342.html 42 | :param input: 具体含义是[训练时一个batch的图片数量, 图片高度, 图片宽度, 图像通道数],Tensor 43 | :param filter: 具体含义是[卷积核的高度,卷积核的宽度,图像通道数,卷积核个数],Tensor 44 | :return: Tensor , shape仍然是[batch, height, width, channels]这种形式 45 | """ 46 | # 第三个参数strides:卷积时在图像每一维的步长,这是一个一维的向量,长度4 47 | # http://blog.csdn.net/wuzqChom/article/details/74785643 SAME与VALID的区别 48 | return tf.nn.conv2d(input, filter, strides=[1, 1, 1, 1], padding='SAME') 49 | 50 | 51 | def max_pool(val): 52 | """ 53 | 池化操作,max pooling是CNN当中的最大值池化操作,其实用法和卷积很类似 54 | http://blog.csdn.net/mao_xiao_feng/article/details/53453926 55 | :param val:一般池化层接在卷积层后面,所以输入通常是feature map,依然是[batch, height, width, channels]这样的shape 56 | :return: Tensor , shape仍然是[batch, height, width, channels]这种形式 57 | """ 58 | # ksize池化窗口的大小,取一个四维向量,一般是[1, height, width, 1],因为我们不想在batch和channels上做池化,所以这两个维度设为了1 59 | # strides:和卷积类似,窗口在每一个维度上滑动的步长,一般也是[1, stride,stride, 1] 60 | return tf.nn.max_pool(val, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME') 61 | 62 | 63 | def cnn_graph(x, keep_prob, size, captcha_list=CAPTCHA_LIST, captcha_len=CAPTCHA_LEN): 64 | """ 65 | 三层卷积神经网络计算图 66 | :param x: 67 | :param keep_prob: 68 | :param size: 69 | :param captcha_list: 70 | :param captcha_len: 71 | :return: 72 | """ 73 | # 图片reshape为4维向量 74 | image_height, image_width = size 75 | # http://blog.csdn.net/lxg0807/article/details/53021859 reshape介绍 76 | x_image = tf.reshape(x, shape=[-1, image_height, image_width, 1]) 77 | 78 | # 第一层 ,filter定义为3x3x1, 输出32个特征, 即32个filter 79 | w_conv1 = weight_variable([3, 3, 1, 32]) 80 | b_conv1 = bias_variable([32]) 81 | # rulu激活函数 ( 82 | # 一种函数(例如 ReLU 或 S 型函数),用于对上一层的所有输入求加权和,然后生成一个输出值) 83 | # (通常为非线性值),并将其传递给下一层。 84 | h_conv1 = tf.nn.relu(tf.nn.bias_add(conv2d(x_image, w_conv1), b_conv1)) 85 | # 池化 86 | h_pool1 = max_pool(h_conv1) 87 | # dropout防止过拟合 88 | h_drop1 = tf.nn.dropout(h_pool1, keep_prob) 89 | 90 | # 第二层 91 | w_conv2 = weight_variable([3, 3, 32, 64]) 92 | b_conv2 = bias_variable([64]) 93 | h_conv2 = tf.nn.relu(tf.nn.bias_add(conv2d(h_drop1, w_conv2), b_conv2)) 94 | h_pool2 = max_pool(h_conv2) 95 | h_drop2 = tf.nn.dropout(h_pool2, keep_prob) 96 | 97 | # 第三层 98 | w_conv3 = weight_variable([3, 3, 64, 64]) 99 | b_conv3 = bias_variable([64]) 100 | h_conv3 = tf.nn.relu(tf.nn.bias_add(conv2d(h_drop2, w_conv3), b_conv3)) 101 | h_pool3 = max_pool(h_conv3) 102 | h_drop3 = tf.nn.dropout(h_pool3, keep_prob) 103 | 104 | # 全连接层 105 | image_height = int(h_drop3.shape[1]) 106 | image_width = int(h_drop3.shape[2]) 107 | w_fc = weight_variable([image_height * image_width * 64, 1024]) 108 | b_fc = bias_variable([1024]) 109 | h_drop3_re = tf.reshape(h_drop3, [-1, image_height * image_width * 64]) 110 | h_fc = tf.nn.relu(tf.add(tf.matmul(h_drop3_re, w_fc), b_fc)) 111 | h_drop_fc = tf.nn.dropout(h_fc, keep_prob) 112 | 113 | # 输出层 114 | w_out = weight_variable([1024, len(captcha_list) * captcha_len]) 115 | b_out = bias_variable([len(captcha_list) * captcha_len]) 116 | y_conv = tf.add(tf.matmul(h_drop_fc, w_out), b_out) 117 | return y_conv 118 | 119 | 120 | def optimize_graph(y, y_conv): 121 | ''' 122 | 优化计算图 123 | :param y: 124 | :param y_conv: 125 | :return: 126 | ''' 127 | # 交叉熵计算loss 注意logits输入是在函数内部进行sigmod操作 128 | # sigmod_cross适用于每个类别相互独立但不互斥,如图中可以有字母和数字 129 | # softmax_cross适用于每个类别独立且排斥的情况,如数字和字母不可以同时出现 130 | loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=y_conv, labels=y)) 131 | # 最小化loss优化 132 | optimizer = tf.train.AdamOptimizer(learning_rate=0.001).minimize(loss) 133 | return optimizer 134 | 135 | 136 | def accuracy_graph(y, y_conv, width=len(CAPTCHA_LIST), height=CAPTCHA_LEN): 137 | ''' 138 | 偏差计算图 139 | :param y: 140 | :param y_conv: 141 | :param width: 142 | :param height: 143 | :return: 144 | ''' 145 | # 这里区分了大小写 实际上验证码一般不区分大小写 146 | # 预测值 147 | predict = tf.reshape(y_conv, [-1, height, width]) 148 | max_predict_idx = tf.argmax(predict, 2) 149 | # 标签 150 | label = tf.reshape(y, [-1, height, width]) 151 | max_label_idx = tf.argmax(label, 2) 152 | correct_p = tf.equal(max_predict_idx, max_label_idx) 153 | # reduce_mean求tensor中平均值 154 | accuracy = tf.reduce_mean(tf.cast(correct_p, tf.float32)) 155 | return accuracy 156 | 157 | 158 | def convert2gray(img): 159 | ''' 160 | 图片转为黑白,3维转1维 161 | :param img: 162 | :return: 163 | ''' 164 | if len(img.shape) > 2: 165 | img = np.mean(img, -1) 166 | return img 167 | 168 | 169 | def text2vec(text, captcha_len=CAPTCHA_LEN, captcha_list=CAPTCHA_LIST): 170 | ''' 171 | 验证码文本转为向量 172 | :param text: 173 | :param captcha_len: 174 | :param captcha_list: 175 | :return: vector 176 | ''' 177 | text_len = len(text) 178 | if text_len > captcha_len: 179 | raise ValueError('验证码最长4个字符') 180 | vector = np.zeros(captcha_len * len(captcha_list)) 181 | for i in range(text_len): 182 | vector[captcha_list.index(text[i]) + i * len(captcha_list)] = 1 183 | return vector 184 | 185 | 186 | def vec2text(vec, captcha_list=CAPTCHA_LIST, size=CAPTCHA_LEN): 187 | ''' 188 | 验证码向量转为文本 189 | :param vec: 190 | :param captcha_list: 191 | :param size: 192 | :return: 193 | ''' 194 | # if np.size(np.shape(vec)) is not 1: 195 | # raise ValueError('向量限定为1维') 196 | # vec = np.reshape(vec, (size, -1)) 197 | # vec_idx = np.argmax(vec, 1) 198 | vec_idx = vec 199 | text_list = [captcha_list[v] for v in vec_idx] 200 | return ''.join(text_list) 201 | 202 | 203 | def wrap_gen_captcha_text_and_image(shape=(60, 160, 3)): 204 | ''' 205 | 返回特定shape图片 206 | :param shape: 207 | :return: 208 | ''' 209 | while True: 210 | t, im = get_random_captcha_text_and_image() 211 | if im.shape == shape: 212 | return t, im 213 | 214 | 215 | def next_batch(batch_count=60, width=CAPTCHA_WIDTH, height=CAPTCHA_HEIGHT): 216 | ''' 217 | 获取训练图片组 218 | :param batch_count: 219 | :param width: 220 | :param height: 221 | :return: 222 | ''' 223 | # np.zeros()返回来一个给定形状和类型的用0填充的数组; 224 | batch_x = np.zeros([batch_count, width * height]) 225 | batch_y = np.zeros([batch_count, CAPTCHA_LEN * len(CAPTCHA_LIST)]) 226 | for i in range(batch_count): 227 | text, image = wrap_gen_captcha_text_and_image() 228 | image = convert2gray(image) 229 | # 将图片数组一维化 同时将文本也对应在两个二维组的同一行 230 | batch_x[i, :] = image.flatten() / 255 231 | batch_y[i, :] = text2vec(text) 232 | # 返回该训练批次 233 | return batch_x, batch_y 234 | 235 | 236 | def start_train(height=CAPTCHA_HEIGHT, width=CAPTCHA_WIDTH, y_size=len(CAPTCHA_LIST) * CAPTCHA_LEN): 237 | """ 238 | cnn 训练 239 | :param height: 240 | :param width: 241 | :param y_size: 242 | :return: 243 | """ 244 | acc_rate = 0.95 245 | # 按照图片大小申请占位符 246 | x = tf.placeholder(tf.float32, [None, height * width]) # (这里的None表示此张量的第一个维度可以是任何长度的) 247 | y = tf.placeholder(tf.float32, [None, y_size]) 248 | # 防止过拟合 训练时启用 测试时不启用 (过拟合是指为了得到一致假设而使假设变得过度严格) 249 | keep_prob = tf.placeholder(tf.float32) 250 | # cnn模型 251 | y_conv = cnn_graph(x, keep_prob, (height, width)) 252 | # 最优化 253 | optimizer = optimize_graph(y, y_conv) 254 | # 偏差 255 | accuracy = accuracy_graph(y, y_conv) 256 | # 启动会话.开始训练 257 | saver = tf.train.Saver() 258 | sess = tf.Session() 259 | sess.run(tf.global_variables_initializer()) 260 | step = 0 261 | while 1: 262 | batch_x, batch_y = next_batch(64) 263 | sess.run(optimizer, feed_dict={x: batch_x, y: batch_y, keep_prob: 0.75}) 264 | # 每训练一百次测试一次 265 | if step % 100 == 0: 266 | batch_x_test, batch_y_test = next_batch(100) 267 | acc = sess.run(accuracy, feed_dict={x: batch_x_test, y: batch_y_test, keep_prob: 1.0}) 268 | print(datetime.now().strftime('%c'), ' step:', step, ' accuracy:', acc) 269 | # 偏差满足要求,保存模型 270 | if acc > acc_rate: 271 | model_path = os.getcwd() + os.sep + str(acc_rate) + "captcha.model" 272 | saver.save(sess, model_path, global_step=step) 273 | acc_rate += 0.01 274 | if acc_rate > 0.99: 275 | break 276 | step += 1 277 | sess.close() 278 | 279 | 280 | if __name__ == "__main__": 281 | start_train() 282 | -------------------------------------------------------------------------------- /word.md: -------------------------------------------------------------------------------- 1 | abcd 1 2 | lock 1 3 | stop 2 4 | aaaa 1 5 | bbbm 1 6 | dddd 1 7 | --------------------------------------------------------------------------------