├── img
    ├── a
    ├── 风景_132.jpg
    ├── 风景_152.jpg
    ├── 风景_204.jpg
    ├── 风景_280.jpg
    ├── 风景_296.jpg
    ├── 风景_300.jpg
    ├── 风景_312.jpg
    ├── 风景_344.jpg
    ├── 风景_368.jpg
    ├── 风景_376.jpg
    ├── 风景_40.jpg
    ├── 风景_472.jpg
    ├── 风景_48.jpg
    ├── 风景_492.jpg
    ├── 风景_524.jpg
    ├── 风景_588.jpg
    ├── 风景_640.jpg
    ├── 风景_644.jpg
    ├── 风景_716.jpg
    ├── 风景_76.jpg
    ├── 风景_800.jpg
    ├── 风景_868.jpg
    ├── 风景_944.jpg
    ├── 风景_984.jpg
    ├── 风景_552.jpg
    └── 风景_720.jpg
├── 1.jpg
├── Spider.py
└── README.md

/img/a:
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
/1.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/JackHCC/KeyWord-Crawler/HEAD/1.jpg
--------------------------------------------------------------------------------
/img/风景_132.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/JackHCC/KeyWord-Crawler/HEAD/img/风景_132.jpg
--------------------------------------------------------------------------------
/img/风景_152.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/JackHCC/KeyWord-Crawler/HEAD/img/风景_152.jpg
--------------------------------------------------------------------------------
/img/风景_204.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/JackHCC/KeyWord-Crawler/HEAD/img/风景_204.jpg
--------------------------------------------------------------------------------
/img/风景_280.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/JackHCC/KeyWord-Crawler/HEAD/img/风景_280.jpg
--------------------------------------------------------------------------------
/img/风景_296.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/JackHCC/KeyWord-Crawler/HEAD/img/风景_296.jpg
--------------------------------------------------------------------------------
/img/风景_300.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/JackHCC/KeyWord-Crawler/HEAD/img/风景_300.jpg
--------------------------------------------------------------------------------
/img/风景_312.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/JackHCC/KeyWord-Crawler/HEAD/img/风景_312.jpg
--------------------------------------------------------------------------------
/img/风景_344.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/JackHCC/KeyWord-Crawler/HEAD/img/风景_344.jpg
--------------------------------------------------------------------------------
/img/风景_368.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/JackHCC/KeyWord-Crawler/HEAD/img/风景_368.jpg
--------------------------------------------------------------------------------
/img/风景_376.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/JackHCC/KeyWord-Crawler/HEAD/img/风景_376.jpg
--------------------------------------------------------------------------------
/img/风景_40.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/JackHCC/KeyWord-Crawler/HEAD/img/风景_40.jpg
--------------------------------------------------------------------------------
/img/风景_472.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/JackHCC/KeyWord-Crawler/HEAD/img/风景_472.jpg
--------------------------------------------------------------------------------
/img/风景_48.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/JackHCC/KeyWord-Crawler/HEAD/img/风景_48.jpg
--------------------------------------------------------------------------------
/img/风景_492.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/JackHCC/KeyWord-Crawler/HEAD/img/风景_492.jpg
--------------------------------------------------------------------------------
/img/风景_524.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/JackHCC/KeyWord-Crawler/HEAD/img/风景_524.jpg
--------------------------------------------------------------------------------
/img/风景_588.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/JackHCC/KeyWord-Crawler/HEAD/img/风景_588.jpg
--------------------------------------------------------------------------------
/img/风景_640.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/JackHCC/KeyWord-Crawler/HEAD/img/风景_640.jpg
--------------------------------------------------------------------------------
/img/风景_644.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/JackHCC/KeyWord-Crawler/HEAD/img/风景_644.jpg
--------------------------------------------------------------------------------
/img/风景_716.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/JackHCC/KeyWord-Crawler/HEAD/img/风景_716.jpg
--------------------------------------------------------------------------------
/img/风景_76.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/JackHCC/KeyWord-Crawler/HEAD/img/风景_76.jpg
--------------------------------------------------------------------------------
/img/风景_800.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/JackHCC/KeyWord-Crawler/HEAD/img/风景_800.jpg
--------------------------------------------------------------------------------
/img/风景_868.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/JackHCC/KeyWord-Crawler/HEAD/img/风景_868.jpg
--------------------------------------------------------------------------------
/img/风景_944.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/JackHCC/KeyWord-Crawler/HEAD/img/风景_944.jpg
--------------------------------------------------------------------------------
/img/风景_984.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/JackHCC/KeyWord-Crawler/HEAD/img/风景_984.jpg
--------------------------------------------------------------------------------
/img/风景_552.jpg:
--------------------------------------------------------------------------------
Request forbidden
--------------------------------------------------------------------------------
/Spider.py:
--------------------------------------------------------------------------------
import re        # regular expressions
import requests  # HTTP client, commonly used for crawlers and for testing servers
import random    # used to pick a number for the saved file name
import os


# Download every image referenced in the search result page
def spiderPic(html, keyword):  # html: page source; keyword: search keyword
    print('Searching for images matching ' + keyword + ', please wait......')
    for addr in re.findall('"objURL":"(.*?)"', html, re.S):  # find the image URLs in the page
        print('Fetching URL: ' + str(addr)[0:40] + '...')  # print only the first 40 characters of the URL

        try:
            pics = requests.get(addr, timeout=10)  # give up after 10 seconds
        except requests.exceptions.RequestException:  # covers connection errors and timeouts
            print('The requested URL could not be fetched')
            continue

        # Save the response as <keyword>_<n>.jpg, where n is a multiple of 4 in [0, 1000)
        fq = open('E:\\img\\' + (keyword + '_' + str(random.randrange(0, 1000, 4)) + '.jpg'), 'wb')
        fq.write(pics.content)
        fq.close()


# Entry point
if __name__ == '__main__':
    word = input('Enter the image keyword to search for: ')
    result = requests.get(
        # Baidu image search URL for the keyword
        'http://image.baidu.com/search/index?tn=baiduimage&ps=1&ct=201326592&lm=-1&cl=2&nc=1&ie=utf-8&word=' + word)

    path = 'E:\\img\\'
    # Check whether the directory exists
    isExists = os.path.exists(path)

    if not isExists:
        # Create the directory if it does not exist
        os.makedirs(path)
        print(path + ' created')
    else:
        # Otherwise leave it alone and report that it already exists
        print(path + ' already exists')

    # Run the crawler
    spiderPic(result.text, word)
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Crawling images dynamically by keyword

### Import the required libraries
The main one is the requests library.
```
import re        # regular expressions
import requests  # HTTP client, commonly used for crawlers and for testing servers
import random    # used to pick a number for the saved file name
import os        # create the output directory
```

### Write the image-crawling function
```
def spiderPic(html, keyword):  # html: page source; keyword: search keyword
    print('Searching for images matching ' + keyword + ', please wait......')
    for addr in re.findall('"objURL":"(.*?)"', html, re.S):  # find the image URLs in the page
        print('Fetching URL: ' + str(addr)[0:40] + '...')  # print only the first 40 characters of the URL

        try:
            pics = requests.get(addr, timeout=10)  # give up after 10 seconds
        except requests.exceptions.RequestException:  # covers connection errors and timeouts
            print('The requested URL could not be fetched')
            continue

        # Save the response as <keyword>_<n>.jpg, where n is a multiple of 4 in [0, 1000)
        fq = open('E:\\img\\' + (keyword + '_' + str(random.randrange(0, 1000, 4)) + '.jpg'), 'wb')
        fq.write(pics.content)
        fq.close()
```

### Main function
```
if __name__ == '__main__':
    word = input('Enter the image keyword to search for: ')
    result = requests.get(
        # Baidu image search URL for the keyword
        'http://image.baidu.com/search/index?tn=baiduimage&ps=1&ct=201326592&lm=-1&cl=2&nc=1&ie=utf-8&word=' + word)

```

### Create the image folder
Check first whether the directory already exists.
```
    path = 'E:\\img\\'
    # Check whether the directory exists
    isExists = os.path.exists(path)

    if not isExists:
        # Create the directory if it does not exist
        os.makedirs(path)
        print(path + ' created')
    else:
        # Otherwise leave it alone and report that it already exists
        print(path + ' already exists')
```

### Call the function
```
    spiderPic(result.text, word)
```

### Results
At the input prompt we enter the keyword "风景" (scenery) and the crawl starts.

The downloaded images are in the img folder.

--------------------------------------------------------------------------------
/img/风景_720.jpg:
--------------------------------------------------------------------------------
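Classics encountered by chance
An encounter is a matter of fate
Beauty met scene after scene
Enjoy a sumptuous feast

Liu Shuai (刘帅) was born on 23 July 1988 in Qingdao, Shandong. From September 2008 to June 2012 he studied at the Central Academy of Fine Arts in Bi Jianxun's ink figure painting studio. In 2013 he was admitted, ranked first in the specialty examination, to the graduate program of the Academy's Chinese Painting Department, studying under Professor Bi Jianxun. He graduated from the School of Chinese Painting of the Central Academy of Fine Arts with a bachelor's degree in 2012 and with a master's degree in 2016. He is currently a contracted artist with 博宝艺品万家.

Li Xiaocheng (李小成), courtesy name Yueji, was born in February 1968 in Zhangqiu, Shandong, and is university educated. He is an art creator at the PLA Navy Cultural Center and a tutor at the Li Xiaocheng Chinese Painting Studio of the School of Continuing Education, Renmin University of China. He is a member of the China Artists Association and of the China Calligraphers Association, a member of the Calligraphy Art Committee of the China Couplet Society, vice president of the Artists' Federation of the All-China Federation of Returned Overseas Chinese, deputy secretary-general of the Beijing International Century Celebrities Painting and Calligraphy Academy, and a contracted artist with 博宝艺品万家.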
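Two files in this dump hold text rather than image data: `/img/风景_552.jpg` contains `Request forbidden`, and `/img/风景_720.jpg` contains the text of a web page. `spiderPic` writes whatever the server returns, and `random.randrange(0, 1000, 4)` can only produce 250 distinct names, so later downloads may overwrite earlier ones. The sketch below is one possible way to guard against both, not part of the original repository. The helper names `extract_image_urls`, `looks_like_image`, and `unique_filename` are hypothetical, and dropping duplicate URLs is an added assumption.

```python
import hashlib
import re


def extract_image_urls(html):
    """Return the objURL values in the page, in order, skipping repeats."""
    seen = set()
    urls = []
    for url in re.findall(r'"objURL":"(.*?)"', html, re.S):
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


def looks_like_image(status_code, content_type):
    """True only for a 200 response whose Content-Type starts with image/."""
    return status_code == 200 and (content_type or '').lower().startswith('image/')


def unique_filename(keyword, url):
    """Derive the file name from the URL so different URLs get different names."""
    digest = hashlib.md5(url.encode('utf-8')).hexdigest()[:12]
    return keyword + '_' + digest + '.jpg'
```

Inside the download loop, a check such as `looks_like_image(pics.status_code, pics.headers.get('Content-Type', ''))` before opening the output file would skip responses like the two above, assuming the server sets the `Content-Type` header correctly.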










