├── .gitattributes
├── .gitignore
├── LICENSE
├── README.md
├── config.ini
├── db
    └── 20171227
    │   └── 20171227-174401-ofo.csv
├── image
    ├── 1.png
    └── 2.png
├── ofoRegister
    ├── Api_360Yzm.py
    ├── login.py
    └── rk.py
├── ofoSpider
    ├── __init__.py
    └── spider.py
├── requirements.txt
└── run.py


/.gitattributes:
--------------------------------------------------------------------------------
1 | # Auto detect text files and perform LF normalization
2 | * text=auto
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | *.xml
2 | *.pyc
3 | 
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 | 
3 | Copyright (c) 2019 silverbell
4 | 
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
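22 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | #### The API now returns 6666... for every bike id
2 | #### The 360 verification-code platform has been shut down, so the registration tool does not work for now.
3 | 
4 | ofo Shared-Bike Map Crawler
5 | ====================
6 | 
7 | ```
8 | Thank you for your support.
9 | A friend doing research on urban data asked me to write a small crawler for the ofo map.
10 | I first came across the bike map written by derekhe and thought it was excellent.
11 | However, the bike map was forced to shut down because the operators kept blocking crawlers.
12 | derekhe's crawler no longer worked either, so I used derekhe's code as a reference,
13 | analyzed the current ofo API, and even wrote an ofo token registration tool,
14 | which made it possible to crawl the positions of nearby ofo bikes.
15 | 
16 | If this program helped you, please take a moment to give it a Star!
17 | ```
18 | ## Stargazers over time
19 | [![Stargazers over time](https://starchart.cc/SilverBooker/ofoSpider.svg)](https://starchart.cc/SilverBooker/ofoSpider)
20 | 
21 | This crawler collects the ofo bikes near each queried location.
22 | * Added an ofo registration tool
23 | * Currently supports ofo only
24 | * Multithreaded crawling
25 | * Automatic deduplication
26 | * Outputs a CSV file stored at db/<date>/<date>-<time>-<brand>.csv
27 | * gzip-compressed storage
28 | 
29 | # Runtime environment
30 | * Python3
31 | * Linux/Mac/Windows
32 | 
33 | Edit the configuration file config.ini to suit your needs; see the comments inside the file.
34 | 
35 | ## Linux/Mac
36 | * Download the [latest code](https://github.com/SilverBooker/ofoSpider/archive/master.zip) and unzip it
37 | * Edit config.ini and make sure the coordinates, area, and other parameters are correct
38 | * Run:
39 | ```
40 | pip3 install -r requirements.txt
41 | python3 run.py
42 | ```
43 | 
44 | ## Windows
45 | * Download the [latest code](https://github.com/SilverBooker/ofoSpider/archive/master.zip) and unzip it
46 | * Install python3.5.3
47 | * In cmd, run pip install -r requirements.txt
48 | * Edit config.ini and make sure the coordinates, area, and other parameters are correct
49 | * In cmd, run python run.py
50 | 
51 | # Output format
52 | 
53 | Output format: CSV
54 | 
55 | Format of each row: timestamp, bike number, latitude, longitude
56 | 
57 | # About the token
58 | ```
59 | Many people have asked me how to get a token; after enough of them asked, I realized how important good documentation is.
60 | The token is a required field in the nearby-bikes request and is stored in the cookie after you log in to an ofo account.
61 | The steps to obtain it are as follows.
62 | ```
63 | * Open https://common.ofo.so/newdist/?Login&~next=%22%3FJourney%22 in the Chrome browser
64 | * Enter your account and verification code and log in (a throwaway account with no deposit is strongly recommended; if it gets banned, that is your own responsibility)
65 | * Click "Secure" to the left of the address bar to view the cookies of the current site
66 | * The token of the current account then appears under the ofo.so domain
67 | 
68 | ![Figure 1](/image/1.png)
69 | ![Figure 2](/image/2.png)
70 | 
71 | # Getting tokens in bulk: the ofo registration tool
72 | ```
73 | This registration tool is not very well written; after all, it uses selenium, which I personally dislike.
74 | CAPTCHA solving, SMS code receiving, and token recording are all fully automated.
75 | I may revise it later.
76 | ```
77 | * Make sure Chrome is installed, then open Settings -> About Chrome to check the Chrome version
78 | * Visit http://blog.csdn.net/huilan_same/article/details/51896672 to find the chromedriver that matches that version
79 | * Visit http://chromedriver.storage.googleapis.com/index.html to download the matching chromedriver
80 | * In the start() function of ofoRegister/login.py, change the chromedriver path to where you placed it
81 | * Visit http://www.360yzm.com/ (the 360 verification-code platform), register an account, top it up (0.1 yuan per account), and fill in your username and password at the corresponding place in login.py
82 | * Visit http://www.ruokuai.com/ (the Ruokuai CAPTCHA-solving platform), register an account, top it up (recognizing one CAPTCHA seems to cost 0.01 yuan), and fill in your username and password at the corresponding place in login.py
83 | * Adjust the number of iterations of the for loop in start() in ofoRegister/login.py as needed; each iteration creates one account
84 | * Finally, run python login.py in the ofoRegister directory (if a package is reported missing, install it with pip)
85 | * A txt file containing the tokens is then generated in the ofoRegister directory
86 | 
--------------------------------------------------------------------------------
/config.ini:
--------------------------------------------------------------------------------
1 | [DEFAULT]
2 | # Pick longitude/latitude values with the AMap (Gaode) coordinate picker
3 | # Top-left point, longitude first, latitude second
4 | top_left = 123.223838,41.923016
5 | 
6 | # Bottom-right point
7 | bottom_right = 123.607881,41.641582
8 | 
9 | # Step size: the minimum interval used to sweep the whole area; adjust as needed
10 | # Too small and the crawl is too dense, producing too much duplicate data
11 | # Too large and the crawl is too sparse, missing some data
12 | offset = 0.002
13 | 
14 | # Number of threads. Use resources sensibly and keep this moderate; if it is too large the server returns errors
15 | workers = 100
16 | 
17 | # Connection timeout. On a poor network a larger timeout can help, but a larger timeout slows the crawl.
18 | timeout = 30
19 | 
20 | # token: taken from the cookie after logging in with my throwaway account
21 | # It may become invalid or get banned at any time. Contact QQ43057852; please read the documentation yourself before contacting me.
22 | token = 9A9974E0-DB34-11E7-8613-359A94BAD51B
23 | 
24 | # Run continuously
25 | always_run = false
26 | 
27 | # Interval between continuous runs (minutes)
28 | wait_time = 1
29 | 
30 | # Compressed storage
31 | compress = false
32 | 
--------------------------------------------------------------------------------
/image/1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SilverBooker/ofoSpider/1ef0ab7b0027c3613ca456df915437ed4a740813/image/1.png
--------------------------------------------------------------------------------
/image/2.png:
--------------------------------------------------------------------------------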
https://raw.githubusercontent.com/SilverBooker/ofoSpider/1ef0ab7b0027c3613ca456df915437ed4a740813/image/2.png
--------------------------------------------------------------------------------
/ofoRegister/Api_360Yzm.py:
--------------------------------------------------------------------------------
1 | import urllib
2 | import urllib.parse
3 | import urllib.request
4 | import time
5 | import http.client
6 | 
7 | 
8 | # GET
9 | def http_get(url, debug=False):
10 |     headers = {'Content-type': 'application/x-www-form-urlencoded', 'Accept': 'text/plain',
11 |                'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko',
12 |                }
13 |     urlparsestr = urllib.parse.urlparse(url)
14 |     conn = http.client.HTTPConnection(urlparsestr.netloc)
15 |     if debug:
16 |         conn.set_debuglevel(1)
17 |     conn.request("GET", urlparsestr.path + "?" + urlparsestr.query, "", headers)
18 |     result = conn.getresponse()
19 |     data = result.read()
20 |     return {"status": result.status, "data": data, "cookie": result.getheader("Set-Cookie")}
21 | 
22 | 
23 | # Shared retry helper: issue the GET up to 6 times and return the response
24 | # body (bytes), or "" (str) if every attempt raised an exception.
25 | def _get_with_retry(url, attempts=6):
26 |     for _ in range(attempts):
27 |         try:
28 |             return http_get(url)["data"]
29 |         except Exception:
30 |             continue
31 |     return ""
32 | 
33 | # Log in
34 | # One dama_360token can be used to fetch multiple phone numbers
35 | # When using multiple threads there is no need to call the login endpoint repeatedly
36 | # Success returns: 1|dama_360token|balance
37 | # Failure returns: 0|error reason
38 | def loginIn(dama_360Uname, dama_360Pwd, author_uid):
39 |     url = "http://api.360yzm.com/user.do!loginIn?uname=" + dama_360Uname + "&pwd=" + dama_360Pwd + "&author_uid=" + author_uid
40 |     html = ""
41 |     try:
42 |         html = http_get(url)["data"]
43 |     except Exception as e:
44 |         # Wait one second and retry once; a second failure propagates to the caller
45 |         time.sleep(1)
46 |         html = http_get(url)["data"]
47 |     return html
48 | 
49 | # Fetch a phone number
50 | # phone number|dama_360token
51 | # Success returns: 1|phone number
52 | # Failure returns: 0|failure reason
53 | def getPhone(dama_360token, dama_360Pid):
54 |     url = "http://api.360yzm.com/user.do!getPhone?pid=" + dama_360Pid + "&token=" + dama_360token
55 |     return _get_with_retry(url)
56 | 
57 | 
58 | # Fetch phone numbers [multiple parameters]
59 | # phoneType: one of 1, 2, 3 {1 = China Mobile, 2 = China Unicom, 3 = China Telecom}
60 | # wantCount: how many numbers to fetch; separate multiple values with semicolons
61 | # area: the city the number should belong to; pass the city name directly
62 | # channelKey: key for the card vendor, format: projectID-randomNumber
63 | # Success returns: 1|phone number
64 | # Failure returns: 0|failure reason
65 | def getPhones(dama_360token, dama_360Pid, wantCount, wantPhone, area, phoneType, channelKey):
66 |     url = "http://api.360yzm.com/user.do!getPhone?pid=" + dama_360Pid + "&token=" + dama_360token + "&count=" + wantCount + "&phone=" + wantPhone + "&area=" + area + "&phoneType=" + phoneType + "&channelKey=" + channelKey
67 |     return _get_with_retry(url)
68 | 
69 | # Fetch the SMS message
70 | # Success returns: 1|SMS content
71 | # Failure returns: 0|nothing yet
72 | def getMessage(dama_360token, dama_360Pid, phone):
73 |     url = "http://api.360yzm.com/user.do!getMessage?token=%s&pid=%s&phone=%s" % (dama_360token, dama_360Pid, phone)
74 |     return _get_with_retry(url)
75 | 
76 | # Release a phone number
77 | # Only projects of the type "send, then receive multiple messages" need to release numbers explicitly!
78 | # Projects that receive a single message do not need to call this function!
79 | # Success returns: 1|success
80 | # Failure returns: 0|failure reason
81 | def releasePhone(dama_360token, dama_360Pid, phone):
82 |     url = "http://api.360yzm.com/user.do!releasePhone?token=" + dama_360token + "&pid=" + dama_360Pid + "&phone=%s" % phone
83 |     return _get_with_retry(url)
84 | 
85 | # Release all phone numbers; rarely needed
86 | # May come in handy when the program has gotten into a messy state -_-
87 | # Success returns: 1|success
88 | # Failure returns: 0|failure reason
89 | def releaseAllPhone(dama_360token):
90 |     url = "http://api.360yzm.com/user.do!releaseAllPhone?token=" + dama_360token
91 |     return _get_with_retry(url)
92 | 
93 | # Add a number to the blacklist
94 | # Do NOT call this after simply fetching a number; the platform blacklists fetched numbers automatically
95 | # This endpoint is mainly for cases where you have your own check for already-registered numbers.
96 | # Success returns: 1|success
97 | # Failure returns: 0|failure reason
98 | def addBlack(dama_360token, dama_360Pid, phone):
99 |     url = "http://api.360yzm.com/user.do!addBlack?token=" + dama_360token + "&pid=" + dama_360Pid + "&phone=%s" % phone
100 |     return _get_with_retry(url)
101 | 
102 | 
103 | # Send an SMS message
104 | # Success returns: 1|success
105 | # Failure returns: 0|failure reason
106 | def sendMessage(dama_360token, dama_360Pid, phone, msgContent):
107 |     url = "http://api.360yzm.com/user.do!sendMessage?token=" + dama_360token + "&pid=" + dama_360Pid + "&phone=" + phone + "&msg=" + msgContent
108 |     return _get_with_retry(url)
109 | 
--------------------------------------------------------------------------------
/ofoRegister/login.py:
--------------------------------------------------------------------------------
1 | import re
2 | import time
3 | import base64
4 | import sys
5 | import threading
6 | import Api_360Yzm
7 | from rk import *
8 | # from multiprocessing import Pool
9 | from selenium import webdriver
10 | from selenium.webdriver import ActionChains
11 | 
12 | 
13 | 
14 | class Register:
15 |     def __init__(self):
16 |         # Placeholders: fill in your own Ruokuai username and password
17 |         self.rc = RClient('RUOKUAI_USERNAME', 'RUOKUAI_PASSWORD', '94468', '0a5cda058154411280029dc311a84011')
18 |         options = webdriver.ChromeOptions()
19 |         options.add_argument('--ignore-certificate-errors')
20 |         options.add_argument('--ignore-ssl-errors')
21 |         self.msg = ''
22 |         while (True):
23 |             # Placeholders: fill in your own 360Yzm username and password
24 |             retVal = Api_360Yzm.loginIn('360YZM_USERNAME', '360YZM_PASSWORD', '24607')
25 |             print(retVal)
26 |             retValArr = retVal.decode("ascii").split("|")
27 |             if (len(retValArr) == 3):
28 |                 self.dama_360token = retValArr[1]
29 |                 dama_money = retValArr[2]
30 |                 print("360Yzm login ok,token:" + self.dama_360token)
31 |                 print("money:" + dama_money)
32 |                 break
33 |             else:
34 |                 print(retVal)
35 |                 print("360Yzm login error")
36 |                 # Wait, then try to log in again
37 |                 time.sleep(10)
38 |                 continue
39 | 
40 |     def get_phone(self):
41 |         i = 0
42 |         while i < 3:
43 |             phone = Api_360Yzm.getPhone(self.dama_360token, "10597")
44 |             # getPhone returns bytes on success and "" (str) if every HTTP attempt failed
45 |             if isinstance(phone, bytes):
46 |                 phone = phone.decode()
47 |             # A successful response has the form "1|phone number"
48 |             if phone.startswith("1|"):
49 |                 phone = phone[2:]
50 |                 print("get phone ok:%s" % phone)
51 |                 self.phone = phone
52 |                 return phone
53 |             i = i + 1
54 |         print("Please check your 360Yzm balance")
55 |         sys.exit(0)
56 | 
57 | 
58 |     def get_message(self):
59 |         i = 1
60 |         while (True):
61 |             msg = Api_360Yzm.getMessage(self.dama_360token, "10597", self.phone)
62 |             if isinstance(msg, bytes):
63 |                 msg = msg.decode("utf-8", errors="ignore")
64 |             # A successful response has the form "1|SMS content"; the code is the first 4-digit run
65 |             code = re.search(r"\d{4}", msg) if msg.startswith("1|") else None
66 |             if code:
67 |                 self.msg = code.group()
68 |                 return self.msg
69 |             i = i + 1
70 |             time.sleep(6)
71 |             print("SMS verification code not received yet, waiting a little longer")
72 |             if i > 10:
73 |                 print("Waited a minute without receiving the SMS code; giving up and blacklisting the number")
74 |                 Api_360Yzm.addBlack(self.dama_360token, "10597", self.phone)
75 |                 return 0
76 | 
77 |     def get_captcha(self):
78 |         page = self.browser.page_source
79 |         captcha = re.search(r"data:image/jpeg;base64,([a-z,A-Z,0-9,/,+]*[=]{0,})", page).group(1)
80 |         # print(page)
81 |         captcha = base64.b64decode(captcha.encode("utf-8"))
82 |         with open("captcha.jpg", "wb") as pic:
83 |             pic.write(captcha)
84 | 
85 |     def input_number(self, number):
86 |         # On-screen keypad: digit -> (table row, table column); 0 sits in row 4, column 2
87 |         keypad = {'1': (1, 1), '2': (1, 2), '3': (1, 3),
88 |                   '4': (2, 1), '5': (2, 2), '6': (2, 3),
89 |                   '7': (3, 1), '8': (3, 2), '9': (3, 3),
90 |                   '0': (4, 2)}
91 |         for digit in str(number):
92 |             if digit not in keypad:
93 |                 continue
94 |             print(digit, end='')
95 |             row, col = keypad[digit]
96 |             ac = self.browser.find_element_by_xpath(
97 |                 "//*[@id='app']/div/div[1]/div[2]/table/tbody/tr[%d]/td[%d]/div/canvas" % (row, col))
98 |             ActionChains(self.browser).move_to_element(ac).click(ac).perform()
99 | 
100 |     def start(self):
101 |         for i in range(5):
102 |             # Reset so that an SMS code from a previous account is never reused
103 |             self.msg = ''
104 |             self.browser = webdriver.Chrome(executable_path="D:/chromedriver/chromedriver.exe")
105 |             self.browser.get("https://common.ofo.so/newdist/?Profile")
106 |             time.sleep(3)
107 |             self.input_number(self.get_phone())
108 |             t = threading.Thread(target=self.get_message)
109 |             t.start()
110 |             while True:
111 |                 self.get_captcha()
112 |                 with open('captcha.jpg', 'rb') as f:
113 |                     im = f.read()
114 |                 captcha = self.rc.rk_create(im, 1040)['Result']
115 |                 print("CAPTCHA is {}".format(captcha))
116 |                 print()
117 |                 self.input_number(captcha)
118 |                 print()
119 |                 time.sleep(1)
120 |                 # The page shows this text ("wrong digits entered") on a bad CAPTCHA; it must match the site verbatim
121 |                 if "数字输入错误" in self.browser.page_source:
122 |                     print("CAPTCHA recognition failed, retrying")
123 |                 else:
124 |                     break
125 | 
126 |             t.join()
127 |             if self.msg:
128 |                 self.input_number(self.msg)
129 |                 ac = self.browser.find_element_by_xpath(
130 |                     "//*[@id='app']/div/div[1]/div[1]/div[2]/div[4]/button")
131 |                 ActionChains(self.browser).move_to_element(ac).click(ac).perform()
132 |                 time.sleep(1)
133 |                 self.browser.get("https://common.ofo.so/newdist/?Profile")
134 |                 time.sleep(3)
135 |                 print(self.browser.get_cookies())
136 |                 for cookie in self.browser.get_cookies():
137 |                     if cookie['name'] == 'ofo-tokened' and cookie['value']:
138 |                         with open("token.txt", "a") as f:
139 |                             f.write(cookie['value'])
140 |                             f.write(",")
141 |                             f.write("\n")
142 |                 self.browser.delete_all_cookies()
143 |                 self.browser.quit()
144 |             else:
145 |                 print("Failed to get the SMS verification code, moving on to the next account")
146 |                 self.browser.quit()
147 | 
148 | if __name__ == '__main__':
149 |     l = Register()
150 |     l.start()
151 | 
--------------------------------------------------------------------------------
/ofoRegister/rk.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python
2 | # coding:utf-8
3 | 
4 | import requests
5 | from hashlib import md5
6 | 
7 | class RClient(object):
8 | 
9 |     def __init__(self, username, password, soft_id, soft_key):
10 |         self.username = username
11 |         self.password = md5(password.encode("utf-8")).hexdigest()
12 |         self.soft_id = soft_id
13 |         self.soft_key = soft_key
14 |         self.base_params = {
15 |             'username': self.username,
16 |             'password': self.password,
17 |             'softid': self.soft_id,
18 |             'softkey': self.soft_key,
19 |         }
20 |         self.headers = {
21 |             'Connection': 'Keep-Alive',
22 |             'Expect': '100-continue',
23 |             'User-Agent': 'ben',
24 |         }
25 | 
26 |     def rk_create(self, im, im_type, timeout=60):
27 |         """
28 |         im: image bytes
29 |         im_type: CAPTCHA type
30 |         """
31 |         params = {
32 |             'typeid': im_type,
33 |             'timeout': timeout,
34 |         }
35 |         params.update(self.base_params)
36 |         files = {'image': ('a.jpg', im)}
37 |         r = requests.post('http://api.ruokuai.com/create.json', data=params, files=files, headers=self.headers)
38 |         return r.json()
39 | 
40 |     def rk_report_error(self, im_id):
41 |         """
42 |         im_id: ID of the CAPTCHA being reported as wrongly solved
43 |         """
44 |         params = {
45 |             'id': im_id,
46 |         }
47 |         params.update(self.base_params)
48 |         r = requests.post('http://api.ruokuai.com/reporterror.json', data=params, headers=self.headers)
49 |         return r.json()
50 | 
51 | if __name__ == '__main__':
52 |     rc = RClient('username', 'password', 'soft_id', 'soft_key')
53 |     with open('a.jpg', 'rb') as f:
54 |         im = f.read()
55 |     print(rc.rk_create(im, 3040))
56 | 
57 | 
--------------------------------------------------------------------------------
/ofoSpider/__init__.py:
--------------------------------------------------------------------------------
1 | from .spider import *
2 | 
3 | print ("ofoSpider Module Loaded!")
4 | print ("Author: SilverBooker")
--------------------------------------------------------------------------------
/ofoSpider/spider.py:
--------------------------------------------------------------------------------
1 | import requests
2 | import datetime
3 | import threading
4 | import json
5 | import os
6 | import pandas as pd
7 | import numpy as np
8 | import time
9 | import sqlite3
10 | from configparser import ConfigParser
11 | from requests.packages.urllib3.exceptions import InsecureRequestWarning
12 | from requests_toolbelt.multipart.encoder import MultipartEncoder
13 | from concurrent.futures import ThreadPoolExecutor
14 | requests.packages.urllib3.disable_warnings(InsecureRequestWarning)
15 | 
16 | 
17 | class Crawler:
18 |     def __init__(self):
19 |         self.start_time = datetime.datetime.now()
20 |         # Shared in-memory SQLite database; the primary key does the deduplication
21 |         self.db_name = "file:database?mode=memory&cache=shared"
22 |         self.csv_path = "./db/" + datetime.datetime.now().strftime("%Y%m%d")
23 |         os.makedirs(self.csv_path, exist_ok=True)
24 |         self.csv_name = self.csv_path + "/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
25 |         cfg = ConfigParser()
26 |         cfg.read("config.ini", encoding='utf-8-sig')
27 |         self.config = cfg
28 |         self.lock = threading.Lock()
29 |         self.total = 0
30 |         self.done = 0
31 |         self.bikes_count = 0
32 | 
33 |     def get_nearby_bikes(self, args):
34 |         try:
35 |             url = "https://san.ofo.so/ofo/Api/nearbyofoCar"
36 | 
37 |             headers = {
38 |                 'Accept': '*/*',
39 |                 'Accept-Encoding': 'gzip, deflate',
40 |                 'Accept-Language': 'zh-CN',
41 |                 'Content-Type': 'multipart/form-data; boundary=----ofo-boundary-MC40MjcxMzUw',
42 |                 'Host': 'san.ofo.so',
43 |                 'Origin': 'https://common.ofo.so',
44 |                 'Referer': 'https://common.ofo.so/newdist/?Journey',
45 |                 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:50.0) Gecko/20100101 Firefox/50.0'
46 |             }
47 | 
48 |             self.request(headers, args, url)
49 |         except Exception as ex:
50 |             print(ex)
51 | 
52 |     def request(self, headers, args, url):
53 |         # args is (lat, lng, token)
54 |         multipart_encoder = MultipartEncoder(
55 |             fields={
56 |                 "token": str(args[2]),
57 |                 "source": "0",
58 |                 "source-version": "9999",
59 |                 # "lat": "39.928845",
60 |                 "lat": str(args[0]),
61 |                 # "lng":"116.422077"
62 |                 "lng": str(args[1])
63 |                 # file is a path
64 |             },
65 |             boundary='----ofo-boundary-MC40MjcxMzUw'
66 |         )
67 |         response = requests.request(
68 |             "POST", url, headers=headers,
69 |             timeout=self.config.getint("DEFAULT", "timeout"),
70 |             verify=False,
71 |             data=multipart_encoder
72 |         )
73 | 
74 |         with self.lock:
75 |             with self.connect_db() as c:
76 |                 try:
77 |                     decoded = json.loads(response.text)['values']['info']['cars']
78 |                     self.done += 1
79 |                     for x in decoded:
80 |                         self.bikes_count += 1
81 |                         # Parameterized query; duplicates of (bikeId, lat, lon) are ignored
82 |                         c.execute("INSERT OR IGNORE INTO ofo VALUES (?,?,?,?)", (
83 |                             int(time.time()) * 1000, x['carno'], x['lat'], x['lng']))
84 | 
85 |                     timespent = datetime.datetime.now() - self.start_time
86 |                     percent = self.done / self.total
87 |                     total = timespent / percent
88 |                     print("Position %s, bikes before dedup %s, progress %0.2f%%, speed %0.2f/min, total time %s, time remaining %s" % (
89 |                         args, self.bikes_count, percent * 100, self.done / timespent.total_seconds() * 60, total, total - timespent))
90 |                 except Exception as ex:
91 |                     print(ex)
92 | 
93 |     def connect_db(self):
94 |         return sqlite3.connect(self.db_name, uri=True)
95 | 
96 |     def start(self):
97 |         while True:
98 |             self.__init__()
99 | 
100 |             try:
101 |                 with self.connect_db() as c:
102 |                     c.execute(self.generate_create_table_sql('ofo'))
103 |             except Exception as ex:
104 |                 print(ex)
105 |                 pass
106 | 
107 |             executor = ThreadPoolExecutor(max_workers=self.config.getint('DEFAULT', 'workers'))
108 |             print("Start")
109 | 
110 |             self.total = 0
111 |             top_lng, top_lat = self.config.get("DEFAULT", "top_left").split(",")
112 |             bottom_lng, bottom_lat = self.config.get("DEFAULT", "bottom_right").split(",")
113 |             # Sweep from the top-left corner: latitude decreases, longitude increases
114 |             lat_range = np.arange(float(top_lat), float(bottom_lat), -self.config.getfloat('DEFAULT', 'offset'))
115 |             for lat in lat_range:
116 |                 lng_range = np.arange(float(top_lng), float(bottom_lng), self.config.getfloat('DEFAULT', 'offset'))
117 |                 for lon in lng_range:
118 |                     self.total += 1
119 |                     executor.submit(self.get_nearby_bikes, (lat, lon, self.config.get('DEFAULT', 'token')))
120 | 
121 |             executor.shutdown()
122 |             self.group_data()
123 | 
124 |             if not self.config.getboolean("DEFAULT", 'always_run'):
125 |                 break
126 | 
127 |             waittime = self.config.getint("DEFAULT", 'wait_time')
128 |             print("Waiting %s minute(s) before the next run" % waittime)
129 |             time.sleep(waittime * 60)
130 | 
131 |     def generate_create_table_sql(self, brand):
132 |         return '''CREATE TABLE {0}
133 |                 (
134 |                     "Time" DATETIME,
135 |                     "bikeId" VARCHAR(12),
136 |                     lat DOUBLE,
137 |                     lon DOUBLE,
138 |                     CONSTRAINT "{0}_bikeId_lat_lon_pk"
139 |                         PRIMARY KEY (bikeId, lat, lon)
140 |                 );'''.format(brand)
141 | 
142 |     def group_data(self):
143 |         print("Exporting data")
144 |         conn = self.connect_db()
145 | 
146 |         self.export_to_csv(conn, "ofo")
147 | 
148 |     def export_to_csv(self, conn, brand):
149 |         df = pd.read_sql_query("SELECT * FROM %s" % brand, conn, parse_dates=True)
150 |         print(brand, "count after deduplication", len(df))
151 |         df['Time'] = pd.to_datetime(df['Time'], unit='ms').dt.tz_localize('UTC').dt.tz_convert('Asia/Chongqing')
152 |         compress = None
153 |         csv_file = self.csv_name + "-" + brand + ".csv"
154 |         if self.config.getboolean("DEFAULT", "compress"):
155 |             compress = 'gzip'
156 |             csv_file = self.csv_name + "-" + brand + ".csv.gz"
157 | 
158 |         df.to_csv(csv_file, header=False, index=False, compression=compress)
159 | 
160 | if __name__ == '__main__':
161 |     c = Crawler()
162 |     c.start()
163 |     print("Done")
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | # Requirements automatically generated by pigar.
2 | # https://github.com/damnever/pigar
3 | 
4 | # ofoSpider/spider.py: 7
5 | numpy == 1.22.0
6 | 
7 | # ofoSpider/spider.py: 6
8 | pandas == 0.24.1
9 | 
10 | # ofoRegister/rk.py: 4
11 | # ofoSpider/spider.py: 1,11
12 | requests == 2.31.0
13 | 
14 | # ofoSpider/spider.py: 12
15 | requests-toolbelt == 0.9.1
16 | 
17 | # ofoRegister/login.py: 7
18 | # `rk` here is the local module ofoRegister/rk.py, not a PyPI package, so nothing is installed for it
19 | 
20 | # ofoRegister/login.py: 9,10
21 | selenium == 3.141.0
22 | 
--------------------------------------------------------------------------------
/run.py:
--------------------------------------------------------------------------------
1 | from ofoSpider import *
2 | 
3 | c = spider.Crawler()
4 | c.start()
5 | print("Done")
--------------------------------------------------------------------------------
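
Note on the crawl grid: `Crawler.start()` sweeps the box defined by `top_left`, `bottom_right`, and `offset` in config.ini and submits one request per grid point. The sketch below is a minimal standard-library illustration of that sweep, not the repository's code. The function name `grid_points` and the rounding to six decimals are assumptions made for this example; the repository itself builds the ranges with `np.arange`.

```python
import math


def grid_points(top_left, bottom_right, offset):
    """Yield (lat, lng) query points covering the bounding box.

    top_left / bottom_right follow the config.ini convention "lng,lat".
    grid_points is a hypothetical helper, not part of ofoSpider.
    """
    top_lng, top_lat = (float(v) for v in top_left.split(","))
    bottom_lng, bottom_lat = (float(v) for v in bottom_right.split(","))
    # Count the steps up front instead of repeatedly adding `offset`,
    # so floating-point error does not accumulate across the grid.
    n_lat = max(0, math.ceil((top_lat - bottom_lat) / offset))
    n_lng = max(0, math.ceil((bottom_lng - top_lng) / offset))
    for i in range(n_lat):
        lat = round(top_lat - i * offset, 6)  # latitude decreases going south
        for j in range(n_lng):
            yield lat, round(top_lng + j * offset, 6)  # longitude increases going east


if __name__ == "__main__":
    # Values copied from the config.ini shown above
    points = list(grid_points("123.223838,41.923016", "123.607881,41.641582", 0.002))
    print(len(points), "requests per full pass; first point:", points[0])
```

Counting the points before a run gives a quick estimate of how many requests a given `offset` produces, which helps when choosing `offset` and `workers`.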