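├── 2.PNG
├── README.md
├── scrapyTumblr.py
├── tumblr_normal.txt
└── 捕获.PNG

/2.PNG:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/henan715/tumblrScrapy/b5b22125749ed9b39158a979dba4ce78e393e87f/2.PNG
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# tumblrScrapy
A crawler for Tumblr that uses the API to obtain the URLs of videos and images and saves them locally.

## tumblrScrapy
Tumblr pages automatically load the next page of content when the scroll bar reaches the bottom. After many pages the browser becomes extremely sluggish, and access to Tumblr from mainland China is painfully slow, so I came up with the idea of writing a crawler to fetch the content automatically.

I started by analyzing Tumblr pages and found that resource links appear in plain text. Entering a resource link directly in the browser loads it very quickly, whereas Tumblr itself keeps loading over and over, which is probably related to Tumblr's backend logic. After some analysis I found Tumblr's JSON data endpoint: `http://username.tumblr.com/api/read/json?start=0&num=200`, where `username` is the user's name, `start` is the offset of the first post, and `num` is the number of posts to return. I then wrote the crawler in Python. For the past couple of days Tumblr seems to have become inaccessible from within China, so I went through a proxy to capture a sample of the data returned by the API and included it as a txt file. Anyone who cannot get around the firewall can practice with this txt file.

The initial plan is to save the scraped data to a local txt file and to MongoDB. Later I will write a download script in Python that reads these URLs in bulk and downloads them.

ToDo:
- Video URL extraction is not finished yet; I plan to use regular expressions for it
- Storing results in MongoDB is not finished; pending
- An auto-publishing script could post popular items to a personal website (not implemented, because of copyright and ethical concerns)

The extracted URLs look like this:

[Video URLs](https://github.com/henan715/tumblrScrapy/blob/master/2.PNG)
[Image URLs](https://github.com/henan715/tumblrScrapy/blob/master/捕获.PNG)

--------------------------------------------------------------------------------
/scrapyTumblr.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/henan715/tumblrScrapy/b5b22125749ed9b39158a979dba4ce78e393e87f/scrapyTumblr.py
--------------------------------------------------------------------------------
/tumblr_normal.txt:
--------------------------------------------------------------------------------
var tumblr_api_read = {"tumblelog":{"title":"BEAUTIFUL.","description":"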

@jennyvorwaller<\/p>","name":"beautiful","timezone":"US\/Pacific","cname":false,"feeds":[]},"posts-start":0,"posts-total":2633,"posts-type":false,"posts":[{"id":"145031492179","url":"http:\/\/beautiful.tumblr.com\/post\/145031492179","url-with-slug":"http:\/\/beautiful.tumblr.com\/post\/145031492179\/ways-of-seeing","type":"video","date-gmt":"2016-05-28 01:56:12 GMT","date":"Fri, 27 May 2016 18:56:12","bookmarklet":0,"mobile":0,"feed-item":"","from-feed-id":0,"unix-timestamp":1464400572,"format":"html","reblog-key":"DpFQHBwT","slug":"ways-of-seeing","video-caption":"

Ways of seeing<\/p>","video-source":"https:\/\/www.youtube.com\/watch?v=0pDE4VX_9Kk&feature=share","video-player":"