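├── .gitignore
├── README.md
├── config.ini
├── gcj02towgs84.py
├── getDistrictShp.py
├── getPoiShp.py
├── lib
    ├── AMap_adcode_citycode_2020_4_10.xlsx
    └── amap_poicode.xlsx
└── result
    ├── China
        ├── China.png
        └── China.svg
    ├── aoi_shp
        ├── Wuhan_college.mdb
        ├── Wuhan_park.mdb
        └── Wuhan_scenic spots.mdb
    └── district_shp
        ├── China_province.mdb
        ├── Hubei.mdb
        └── Wuhan_district.mdb


/.gitignore:
--------------------------------------------------------------------------------
/.vscode
/__pycache__
/assets

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Generate shp files from crawled data

## 1. Overview

The project has two main functions. The first fetches administrative-district boundary coordinate strings from the AMap (Gaode Maps) Web Service API and writes them to district shp files. The second fetches POI coordinate strings from an AMap endpoint and writes them to AOI (area of interest) shp files.

## 2. Files

**lib folder**: contains two xlsx files, the AMap city code table and the AMap POI category code table.

**result/district_shp folder**: stores the generated district shp files.

**result/aoi_shp folder**: stores the generated AOI shp files.

**config.ini**: configuration file. Fill in the AMap Web Service key, the category code of the POIs to crawl, and the adcode of the target city.

**getPoiShp.py**: generates AOI shp files for a given theme and city.

**getDistrictShp.py**: generates administrative-district shp files.

**gcj02towgs84.py**: AMap uses the GCJ-02 coordinate system; this script converts coordinates to WGS-84.

> GCJ-02 is a coordinate system for geographic information systems formulated by China's State Bureau of Surveying and Mapping (G for Guojia, "national"; C for Cehui, "surveying and mapping"; J for Ju, "bureau"). It is an obfuscation algorithm applied to latitude and longitude data, that is, it adds apparently random offsets. All map products published in China, including electronic ones, must apply at least GCJ-02 as a first encryption of geographic positions.

## 3. How it works

Before you start, register an AMap developer account and apply for a Web Service API key.

### 3.1 Getting district shp files

1. Build the request URL for the AMap Web API district query, for example: [http://restapi.amap.com/v3/config/district?key=<your-key>&keywords=<keyword>&subdistrict=<subdistrict-level-0-or-1>&extensions=all](http://restapi.amap.com/v3/config/district?key=<your-key>&keywords=<keyword>&subdistrict=<subdistrict-level-0-or-1>&extensions=all). Note that the `extensions` parameter must be `all`; with `base`, only basic information is returned, which does not include the coordinate string.

2. Convert the returned coordinate string from GCJ-02 to WGS-84.

3. Use the third-party library pyshp to write the coordinate string to the corresponding shp file.

### 3.2 Getting AOI shp files

1. 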
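Build the request URL for the AMap Web API POI search. There are four ways to search for POIs: keyword search, nearby search, polygon search, and ID lookup. This project uses keyword search, specifying `city` and setting `citylimit` to `true` so that only data inside the city is returned. For example: [https://restapi.amap.com/v3/place/text?keywords=北京大学&city=beijing&output=xml&offset=20&page=1&key=<your-key>&extensions=all](https://restapi.amap.com/v3/place/text?keywords=北京大学&city=beijing&output=xml&offset=20&page=1&key=<your-key>&extensions=all).

2. After obtaining a POI id, append it to [https://www.amap.com/detail/get/detail?id=](https://www.amap.com/detail/get/detail?id=) and request that URL.

3. If the response contains boundary coordinates, write them to the corresponding shp file. If it does not, add the id and name of the parent POI to the list being iterated.

## 4. Third-party dependencies

- requests

- configparser

- [pyshp](https://github.com/GeospatialPython/pyshp)

- fake_useragent (imported by getPoiShp.py)



## 5. Notes

- The result/district_shp folder contains personal geodatabases for the provinces of China, the cities of Hubei, and the districts of Wuhan. The result/aoi_shp folder contains personal geodatabases for Wuhan's higher-education institutions, parks, and scenic spots. These databases were built in ArcMap and are uploaded for readers who want to download and use them.
- The cookies are set in the `headers` argument inside the `getRawData` function of getPoiShp.py. The `cookies` option in config.ini is not used yet, because the `=` and `;` characters in cookies cause trouble when reading the ini file; this may be improved later.
- After each shp file is written successfully, a message is printed to the console.
- To learn how to use pyshp, see its GitHub page; the author's documentation is very detailed. I additionally added code that writes the .prj file.

## 6. Contact Me

If you have any suggestions, feel free to contact me at [zixinwan@foxmail.com](mailto:zixinwan@foxmail.com) or open an issue. Stars are welcome!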
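
As a supplement to section 3.1, the following is a minimal sketch of the first step and of how the returned coordinate string can be split before conversion. It assumes the `polyline` format used in getDistrictShp.py (`|` between polygons, `;` between points, `,` between longitude and latitude). The helper names `build_district_url` and `parse_polyline` are hypothetical and are not part of this repository; no network request is made.

```python
from urllib.parse import urlencode

# Endpoint taken from section 3.1 of this README.
DISTRICT_ENDPOINT = "http://restapi.amap.com/v3/config/district"


def build_district_url(key, keywords, subdistrict=0):
    """Build the district query URL (hypothetical helper).

    extensions=all is required, otherwise the response has no polyline.
    """
    query = urlencode({
        "key": key,
        "keywords": keywords,
        "subdistrict": subdistrict,
        "extensions": "all",
    })
    return f"{DISTRICT_ENDPOINT}?{query}"


def parse_polyline(polyline):
    """Split 'lng,lat;lng,lat|lng,lat;...' into rings of (lng, lat) floats."""
    rings = []
    for part in polyline.split("|"):
        ring = [
            tuple(float(value) for value in pair.split(","))
            for pair in part.split(";")
            if pair  # skip empty fragments such as a trailing ';'
        ]
        if ring:
            rings.append(ring)
    return rings


if __name__ == "__main__":
    # Placeholder key and made-up coordinates, for illustration only.
    print(build_district_url("YOUR_KEY", "610100", subdistrict=1))
    print(parse_polyline("114.1,30.5;114.2,30.5;114.2,30.6|113.9,30.1;114.0,30.1"))
```

Each ring returned by `parse_polyline` is still in GCJ-02; every point would be passed through `gcj02towgs84` before being handed to pyshp. Using `urlencode` also escapes non-ASCII keywords, which plain string formatting does not.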

--------------------------------------------------------------------------------
/config.ini:
--------------------------------------------------------------------------------
[parameter]
; AMap Web Service key
amapWebKey=
; Cookies from a successful request; automatic cookie refresh is not implemented yet
cookies=
; POI category code, see the POI category code table; 141201 is science and education services, higher-education institutions
poiType=141201

[district]
; Standard name or adcode of the district to query, e.g. 610100 for Xi'an; adcode is recommended because it gives precise results
district=610100

--------------------------------------------------------------------------------
/gcj02towgs84.py:
--------------------------------------------------------------------------------
# -*- coding:utf-8 -*-
# Author: Zixin Wan
# Time: 2020/05/19
# Function: convert GCJ-02 coordinates to WGS-84

import json
import math


x_pi = 3.14159265358979324 * 3000.0 / 180.0
pi = 3.1415926535897932384626  # π
a = 6378245.0  # semi-major axis
ee = 0.00669342162296594323  # first eccentricity squared

def gcj02towgs84(lng, lat):
    """
    Convert GCJ-02 (Mars coordinates) to WGS-84 (GPS).
    :param lng: longitude in GCJ-02
    :param lat: latitude in GCJ-02
    :return: [lng, lat] in WGS-84
    """
    if out_of_china(lng, lat):
        # Return a list here too, so both branches have the same type
        return [lng, lat]
    dlat = transformlat(lng - 105.0, lat - 35.0)
    dlng = transformlng(lng - 105.0, lat - 35.0)
    radlat = lat / 180.0 * pi
    magic = math.sin(radlat)
    magic = 1 - ee * magic * magic
    sqrtmagic = math.sqrt(magic)
    dlat = (dlat * 180.0) / ((a * (1 - ee)) / (magic * sqrtmagic) * pi)
    dlng = (dlng * 180.0) / (a / sqrtmagic * math.cos(radlat) * pi)
    mglat = lat + dlat
    mglng = lng + dlng
    return [lng * 2 - mglng, lat * 2 - mglat]

def transformlat(lng, lat):
    ret = -100.0 + 2.0 * lng + 3.0 * lat + 0.2 * lat * lat + \
        0.1 * lng * lat + 0.2 * math.sqrt(math.fabs(lng))
    ret += (20.0 * math.sin(6.0 * lng * pi) + 20.0 *
            math.sin(2.0 * lng * pi)) * 2.0 / 3.0
    ret += (20.0 * math.sin(lat * pi) + 40.0 *
            math.sin(lat / 3.0 * pi)) * 2.0 / 3.0
    ret += (160.0 * math.sin(lat / 12.0 
* pi) + 320 *
            math.sin(lat * pi / 30.0)) * 2.0 / 3.0
    return ret

def transformlng(lng, lat):
    ret = 300.0 + lng + 2.0 * lat + 0.1 * lng * lng + \
        0.1 * lng * lat + 0.1 * math.sqrt(math.fabs(lng))
    ret += (20.0 * math.sin(6.0 * lng * pi) + 20.0 *
            math.sin(2.0 * lng * pi)) * 2.0 / 3.0
    ret += (20.0 * math.sin(lng * pi) + 40.0 *
            math.sin(lng / 3.0 * pi)) * 2.0 / 3.0
    ret += (150.0 * math.sin(lng / 12.0 * pi) + 300.0 *
            math.sin(lng / 30.0 * pi)) * 2.0 / 3.0
    return ret

def out_of_china(lng, lat):
    """
    Check whether the point lies outside China; points outside are not shifted.
    :param lng: longitude
    :param lat: latitude
    :return: True if the point is outside the bounding box used for China
    """
    if lng < 72.004 or lng > 137.8347:
        return True
    if lat < 0.8293 or lat > 55.8271:
        return True
    return False

--------------------------------------------------------------------------------
/getDistrictShp.py:
--------------------------------------------------------------------------------
# -*- coding:utf-8 -*-
# Author: Zixin Wan
# Time: 2020/05/18
# Function: generate administrative-district shp files in WGS-84

import requests
import json
import configparser
import shapefile
from gcj02towgs84 import gcj02towgs84

# Generate shp files (including the .prj file) from the AMap Web Service district query.
# District levels: province/municipality, city, district/county
class DistrictShp(object):
    def __init__(self,subdistrict=0):
        self.name=[]
        self.subdistrict=subdistrict
        self.amapWebKey=''
        self.read_ini()
        # self.amapWebKey=amapWebKey

    def read_ini(self,inipath='config.ini'):
        config = configparser.ConfigParser()
        config.read(inipath,encoding='utf-8')
        self.amapWebKey=config.get('parameter','amapWebKey')
        self.keywords=config.get('district','district')

    def getRawData(self,name):
        # subdistrict defaults to 0: return only the queried district, not its sub-districts
        url='http://restapi.amap.com/v3/config/district?key=%s&keywords=%s&subdistrict=%d&extensions=all'\
            %(self.amapWebKey,name,self.subdistrict)
        res=requests.get(url)
        if 
res.status_code == 200:
            return json.loads(res.text)
        else:
            return '0'

    # Slow because it needs many network requests; not used for now, the local computation is used instead
    def gcj02towgs84(self,point):
        # Original intent: convert GCJ-02 (Mars coordinates) to GPS WGS-84.
        # Caution: AMap documents coordsys as the system of the input coordinates,
        # so coordsys=gps converts WGS-84 to GCJ-02, the opposite direction.
        url='https://restapi.amap.com/v3/assistant/coordinate/convert?locations=%s&coordsys=gps&key=%s'\
            %(str(point[0])+','+str(point[1]),self.amapWebKey)
        res=requests.get(url)
        if res.status_code == 200:
            return json.loads(res.text)['locations']
        else:
            return '0'

    def writePrj(self,name):
        # Write the WGS-84 definition as a .prj sidecar file
        epsg = ('GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563]],'
                'PRIMEM["Greenwich",0],UNIT["degree",0.0174532925199433]]')
        with open('result/district_shp/'+name+'.prj','w') as prj:
            prj.write(epsg)

    def getName(self,name):
        self.name.append(name)
        info=self.getRawData(name)
        # getRawData returns '0' on HTTP failure; skip the sub-district lookup in that case
        if info == '0' or not info.get('districts'):
            return
        if 'districts' in info['districts'][0]:
            subdistricts=info['districts'][0]['districts']
            for i in subdistricts:
                self.name.append(i['name'])

    def creatShp(self):
        for i in self.name:
            info=self.getRawData(i)
            if info != '0' and info['status'] == '1' and info['districts']:
                name=info['districts'][0]['name']
                level=info['districts'][0]['level']
                if 'polyline' in info['districts'][0]:
                    polyline_raw=info['districts'][0]['polyline']
                else:
                    continue

                w=shapefile.Writer('result/district_shp/'+name)
                w.field('NAME','C')
                w.field('LEVEL','C')

                # '|' separates polygons, ';' separates points, ',' separates lng and lat
                for polygon in polyline_raw.split('|'):
                    polygons=[]
                    for j in polygon.split(';'):
                        point=[]
                        for k in j.split(','):
                            point.append(float(k))
                        point=gcj02towgs84(point[0],point[1])
                        polygons.append(point)
                    # One shape and one record per polygon, so the .shp and .dbf stay balanced
                    w.poly([polygons])
                    w.record(name,level)

                w.close()
                self.writePrj(name)
                print(name+'.shp has been written.')

def main():
    # Number of sub-district levels to return; the default 0 returns only the queried district. Note: only 0 or 1 can be set
    ds=DistrictShp(subdistrict=1)

    config = configparser.ConfigParser()
    config.read('config.ini',encoding='utf-8')
    keywords=config.get('district','district')

    ds.getName(keywords)
    ds.creatShp()

if __name__ == '__main__':
    main()


--------------------------------------------------------------------------------
/getPoiShp.py:
--------------------------------------------------------------------------------
# -*- coding:utf-8 -*-
# Author: Zixin Wan
# Time: 2020/05/20
# Function: generate AOI shp files for a given theme and city

import json
import requests
import math
import shapefile
import webbrowser
import configparser
from fake_useragent import UserAgent
from gcj02towgs84 import gcj02towgs84

class POIShp(object):
    def __init__(self,city,poiType,amapWebKey='eb69a25118bfbd1f06c1a9103c24df91'):
        self.city=city
        self.key=amapWebKey
        self.poiid=[]
        self.poiType=poiType

    def read_ini(self,inipath='config.ini'):
        config = configparser.ConfigParser()
        config.read(inipath,encoding='utf-8')
        # cookies=config.get('parameter','cookies')
        self.key=config.get('parameter','amapWebKey')
        self.poiType=config.get('parameter','poiType')
        # return cookies

    def getRawData(self,url):
        headers ={
            'User-Agent': UserAgent(use_cache_server=False).random,
            'Cookie': '_uab_collina=155559528978903628447463; passport_login=MjM0MjkyMjI0LGFtYXBfMTU1NDk0NDgwMzVBbTROZ3ZScmksdmdlNnpvdWh1d25qZHlrcDJzY21wdXAzZGNpaWRkdmksMTU4ODY2MjQ5NixNR013TVRVMlpUZGtOemt3WWpBeU0yTTBNelJsT0RFNFptVmtOR1F5TWprPQ%3D%3D; dev_help=FRfQXSEvKkXdez53qnNDADQ2MDMyZWMxYjYwYTE1YmEwZDFjYjk0YjdjYTE4Zjk2MWUxYzNlYTRkYzBkZTdlZTJkZTg0ZmI2MDk4OWYwZjNXFm3EvkYizgU9gvifa9d8zShm%2B3jBzHQHpkqzzI5ZADCa8QMqNZePRIxLSEoIm9HhRtgaVxoRJPYRhLmE6Id%2Bjo4LZP544sy02%2F%2FJ%2B%2FA7sH3cqzZK5jI276lMN3cs8tPngqpD4D2LAGs%2FcEpT4KxL; cna=frsFF4wtqVsCAW8SOjOSbpsM; UM_distinctid=17245eeaa373a-0c3d4a8df057cb-f313f6d-100200-17245eeaa3857; 
CNZZDATA1255626299=550611881-1590309744-https%253A%252F%252Fwww.baidu.com%252F%7C1590315144; guid=2189-384f-c9bc-8d07; x-csrf-token=5a211b81825fe73616c183da39e88b70; x5sec=7b22617365727665723b32223a223966663663633631366564393636666138383266613237656561336461336264434f2b7171665946454d2f466776577279716a317077453d227d; l=eBIG3Jjuv3Q68TU9BOfwourza77tSIRfguPzaNbMiOCPOa1M5DeCWZANKFTHCnGVnsc2R3oGfmNDByYUuyUICZ28nkVylJsSedLh.; isg=BEZGK1GrZJvZLzIN4Mm5hVO1lzzIp4phCmmw9TBvOGlEM-dNmDZacbiJD2__m4J5'
        }

        res=requests.get(url,headers=headers)
        info=json.loads(res.text)
        if res.status_code == 200:# and info['status'] == '1':
            return info
        elif 'url' in info:
            # The response points to a verification page; open it in the browser
            url='https://www.amap.com/'+info['url']
            webbrowser.open(url)
            # Report failure explicitly instead of falling through and returning None
            return '0'
        else:
            return '0'

    def getPoiId(self):
        page=1
        totalPage=1000  # at most 1000
        while page <= totalPage:
            url='https://restapi.amap.com/v3/place/text?types=%d&city=%s&offset=20&page=%d&key=%s&extensions=base&citylimit=true'\
                %(self.poiType,self.city,page,self.key)
            info=self.getRawData(url)
            # Stop if the request failed or the response has no POI list
            if not isinstance(info,dict) or 'pois' not in info:
                break

            # Total number of pages, rounded up; each request returns 20 records
            totalPage=math.ceil(int(info['count'])/20)

            for i in range(len(info['pois'])):
                self.poiid.append(info['pois'][i]['id'])

            page+=1

        print('POI ids collected: %d POIs in total'%len(self.poiid))

    def creatShp(self):
        for poiid in self.poiid:
            url='https://www.amap.com/detail/get/detail?id=%s'%poiid
            info=self.getRawData(url)

            # Skip POIs whose detail request failed or returned no base info
            if not isinstance(info,dict) or 'base' not in info.get('data',{}):
                continue
            name=info['data']['base']['name']
            address=info['data']['base']['address']
            city_name=info['data']['base']['city_name']
            city_adcode=int(info['data']['base']['city_adcode'])
            x=float(info['data']['base']['x'])
            y=float(info['data']['base']['y'])
            classify=info['data']['base']['classify']
            business=info['data']['base']['business']

            # shape=info['data']['spec']['mining_shape']['shape']

            spec=info['data']['spec']
            if 'mining_shape' in spec:
                if 'shape' in spec['mining_shape']:
                    shape=spec['mining_shape']['shape']
                else:
                    continue
            elif 'aoi' in info['data']['base']['geodata']:
                # No boundary for this POI: queue its parent POI instead
                mainpoi=info['data']['base']['geodata']['aoi'][0]['mainpoi']
                if mainpoi not in self.poiid:
                    self.poiid.append(mainpoi)
                print(name+' has no boundary coordinate string; its parent POI is '+info['data']['base']['geodata']['aoi'][0]['name'])
                continue
            else:
                continue

            w=shapefile.Writer('result/aoi_shp/'+name)
            w.field('NAME','C')
            w.field('ADDRESS','C')
            w.field('CITY_NAME','C')
            w.field('CITY_ADCODE','N')
            w.field('LONGITUDE','F',decimal=6)
            w.field('LATITUDE','F',decimal=6)
            w.field('CLASSIFY','C')
            w.field('BUSINESS','C')

            polygon=[]
            for i in shape.split(';'):
                point=[]
                for j in i.split(','):
                    point.append(float(j))
                point=gcj02towgs84(point[0],point[1])
                polygon.append(point)
            w.poly([polygon])
            w.record(name,address,city_name,city_adcode,x,y,classify,business)
            # Close explicitly so the shp, shx and dbf headers are flushed
            w.close()
            self.writePrj(name)
            print(name+'.shp has been written.')

    def writePrj(self,name):
        # Write the WGS-84 definition as a .prj sidecar file
        epsg = ('GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563]],'
                'PRIMEM["Greenwich",0],UNIT["degree",0.0174532925199433]]')
        with open('result/aoi_shp/'+name+'.prj','w') as prj:
            prj.write(epsg)

def main():
    '''
    See the POI category code table in the lib folder, for example:
    141201  science, education and culture services; higher-education institutions
    110000  scenic spots; scenic-spot related; tourist attractions
    110101  scenic spots; parks and squares; parks
    '''
    config = configparser.ConfigParser()
    config.read('config.ini',encoding='utf-8')
    poiType=config.get('parameter','poiType')

    # '武汉' is Wuhan
    p=POIShp('武汉',int(poiType))
    p.getPoiId()
    p.creatShp()

if __name__ == '__main__':
    main()

--------------------------------------------------------------------------------
/lib/AMap_adcode_citycode_2020_4_10.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wanzixin/getShp/306bb186f968b93200501e8e3e04ce6ec08ebd8d/lib/AMap_adcode_citycode_2020_4_10.xlsx

--------------------------------------------------------------------------------
/lib/amap_poicode.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wanzixin/getShp/306bb186f968b93200501e8e3e04ce6ec08ebd8d/lib/amap_poicode.xlsx

--------------------------------------------------------------------------------
/result/China/China.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wanzixin/getShp/306bb186f968b93200501e8e3e04ce6ec08ebd8d/result/China/China.png

--------------------------------------------------------------------------------
/result/aoi_shp/Wuhan_college.mdb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wanzixin/getShp/306bb186f968b93200501e8e3e04ce6ec08ebd8d/result/aoi_shp/Wuhan_college.mdb

--------------------------------------------------------------------------------
/result/aoi_shp/Wuhan_park.mdb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wanzixin/getShp/306bb186f968b93200501e8e3e04ce6ec08ebd8d/result/aoi_shp/Wuhan_park.mdb

--------------------------------------------------------------------------------
/result/aoi_shp/Wuhan_scenic spots.mdb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wanzixin/getShp/306bb186f968b93200501e8e3e04ce6ec08ebd8d/result/aoi_shp/Wuhan_scenic spots.mdb

--------------------------------------------------------------------------------
/result/district_shp/China_province.mdb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wanzixin/getShp/306bb186f968b93200501e8e3e04ce6ec08ebd8d/result/district_shp/China_province.mdb

--------------------------------------------------------------------------------
/result/district_shp/Hubei.mdb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wanzixin/getShp/306bb186f968b93200501e8e3e04ce6ec08ebd8d/result/district_shp/Hubei.mdb

--------------------------------------------------------------------------------
/result/district_shp/Wuhan_district.mdb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wanzixin/getShp/306bb186f968b93200501e8e3e04ce6ec08ebd8d/result/district_shp/Wuhan_district.mdb

--------------------------------------------------------------------------------