├── UNLICENSE
├── proxy-scrape.py
└── README.md


/UNLICENSE:
--------------------------------------------------------------------------------
This is free and unencumbered software released into the public domain.

Anyone is free to copy, modify, publish, use, compile, sell, or
distribute this software, either in source code form or as a compiled
binary, for any purpose, commercial or non-commercial, and by any
means.

In jurisdictions that recognize copyright laws, the author or authors
of this software dedicate any and all copyright interest in the
software to the public domain. We make this dedication for the benefit
of the public at large and to the detriment of our heirs and
successors. We intend this dedication to be an overt act of
relinquishment in perpetuity of all present and future rights to this
software under copyright law.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR
OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
OTHER DEALINGS IN THE SOFTWARE.

For more information, please refer to

--------------------------------------------------------------------------------
/proxy-scrape.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

"""
proxyscrape.py: scrapin' proxies with ocr
https://github.com/vesche
"""

import bs4
import os
import pytesseract
import requests
import subprocess

from io import BytesIO
from PIL import Image

BASE_URL = 'https://www.torvpn.com'
PROXY_LOC = '/en/proxy-list'


def process_image(img):
    """Uses ImageMagick to resize an image to 200% size."""
    img.save('/tmp/tmp_img1.png')
    command = 'convert /tmp/tmp_img1.png -resize 200% /tmp/tmp_img2.png'
    subprocess.call(command.split())

    img = Image.open('/tmp/tmp_img2.png')
    # Image.open is lazy, so read the pixel data before deleting the file
    img.load()
    os.remove('/tmp/tmp_img1.png')
    os.remove('/tmp/tmp_img2.png')
    return img


def main():
    page = requests.get('{}{}'.format(BASE_URL, PROXY_LOC)).content
    soup = bs4.BeautifulSoup(page, 'html.parser')
    table = soup.find('table', attrs={'class': 'table table-striped'})
    rows = table.find_all('tr')

    # collect the <td> cells of every table row
    data = []
    for row in rows:
        data.append(row.find_all('td'))

    proxy_dicts = []
    for cols in data:
        try:
            d = {'img_pic': cols[1].find('img')['src'],
                 'port': cols[2].text,
                 'country': cols[3].text.split('\n')[1],
                 'proxy_type': cols[4].text.lower()}
            proxy_dicts.append(d)
        except IndexError:
            # rows without the expected cells (e.g. the header row) are skipped
            continue

    # convert image to IP
    for d in proxy_dicts:
        response = requests.get('{}{}'.format(BASE_URL, d['img_pic']))
        img = Image.open(BytesIO(response.content))

        # double image size to correct OCR errors
        img = process_image(img)

        # strip surrounding whitespace from the OCR output
        d['ip'] = pytesseract.image_to_string(img).strip()

    # print in proxychains format
    for d in proxy_dicts:
        print('{} {:15} {:5} # {}'.format(
            d['proxy_type'], d['ip'], d['port'], d['country']))


if __name__ == '__main__':
    main()

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# proxy-scrape

**The TorVPN proxy list is no longer populating as of July 2018, so this tool is defunct. If you're looking for a decent free proxy list, check out [a2u/free-proxy-list](https://github.com/a2u/free-proxy-list).**

This is a command-line tool to scrape the [TorVPN proxy list](https://www.torvpn.com/en/proxy-list) for use with [proxychains](https://github.com/haad/proxychains). TorVPN's proxy list uses images to list IP addresses (likely to avoid scrapers), so this tool uses [Tesseract OCR](https://github.com/tesseract-ocr/tesseract) via the [pytesseract](https://github.com/madmaze/pytesseract) wrapper for [optical character recognition](https://en.wikipedia.org/wiki/Optical_character_recognition) and [ImageMagick](https://www.imagemagick.org/script/index.php) for image manipulation.
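
OCR on small images can misread digits or pick up stray whitespace, so it may be worth validating each recognized string before it goes into a proxychains config. The following is only a minimal sketch: the helper `normalize_ocr_ip` and the substitution table `OCR_FIXES` are hypothetical names made up for illustration and are not part of proxy-scrape.

```python
import ipaddress

# Hypothetical table of characters that OCR may confuse with digits or dots.
# These mappings are illustrative assumptions, not measured Tesseract behavior.
OCR_FIXES = str.maketrans({
    'O': '0', 'o': '0',
    'l': '1', 'I': '1',
    'S': '5', 'B': '8',
    ',': '.',
})


def normalize_ocr_ip(text):
    """Return a cleaned IPv4 string, or None if the text is not a valid address."""
    # Drop all whitespace (including trailing newlines), then apply the fixes.
    candidate = ''.join(text.split()).translate(OCR_FIXES)
    try:
        return str(ipaddress.IPv4Address(candidate))
    except ipaddress.AddressValueError:
        return None


if __name__ == '__main__':
    for raw in ['139.59.117.11\n', 'l58.69.2O4.32', 'not an ip']:
        print(repr(raw), '->', normalize_ocr_ip(raw))
```

A result of `None` means the string could not be turned into a valid address, so that proxy entry could be skipped rather than printed.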

## Install

```
$ sudo apt-get install proxychains tesseract-ocr imagemagick
$ git clone https://github.com/vesche/proxy-scrape
```

## Usage

Run proxy-scrape like so; be patient, it takes about 30 seconds to process:

```
$ ./proxy-scrape.py
http 33.169.33.46 3128 # Germany
http 158.69.204.32 3128 # United States
http 139.59.117.11 3128 # Australia
http 128.199.66.186 443 # United Kingdom
http 139.59.125.12 8080 # Australia
http 128.199.190.243 443 # United Kingdom
http 139.59.125.12 80 # Australia
http 128.199.75.57 8080 # United Kingdom
http 14.141.73.11 8080 # India
http 138.197.137.90 3128 # United States
http 35.185.80.76 3128 # United States
http 143.107.228.57 3128 # Brazil
http 188.166.82.80 3000 # Russian Federation
...
```

You can easily append the results of proxy-scrape to proxychains. Some proxies may be denied or time out; just remove these manually. Here I do a simple curl and run my tool [scanless](https://github.com/vesche/scanless) through many proxies:

```
$ ./proxy-scrape.py > results.txt
$ head -n 20 results.txt | sudo tee -a /etc/proxychains.conf > /dev/null
$ proxychains curl https://ipinfo.io/8.8.8.8
ProxyChains-3.1 (http://proxychains.sf.net)
|D-chain|-<>-158.69.204.32:3128-<>-139.59.117.11:3128-<>-128.199.66.186:443-<>-139.59.125.53:443-<>-139.59.125.12:8080-<>-128.199.75.57:80-<>-128.199.190.243:443-<>-139.59.125.12:80-<>-128.199.75.57:8080-<>-138.197.137.90:3128-<>-35.185.80.76:3128-<>-143.107.228.57:3128-<><>-216.239.36.21:443-<><>-OK
{
  "ip": "8.8.8.8",
  "hostname": "google-public-dns-a.google.com",
  "city": "Mountain View",
  "region": "California",
  "country": "US",
  "loc": "37.3860,-122.0840",
  "org": "AS15169 Google Inc.",
  "postal": "94035",
  "phone": "650"
}
$ proxychains scanless -t scanme.nmap.org -s hackertarget
ProxyChains-3.1 (http://proxychains.sf.net)
Running scanless...
|D-chain|-<>-158.69.204.32:3128-<>-139.59.117.11:3128-<>-128.199.66.186:443-<>-139.59.125.53:443-<>-139.59.125.12:8080-<>-128.199.75.57:80-<>-128.199.190.243:443-<>-139.59.125.12:80-<>-128.199.75.57:8080-<>-138.197.137.90:3128-<>-35.185.80.76:3128-<>-143.107.228.57:3128-<><>-35.186.165.146:443-<><>-OK

------- hackertarget -------
Starting Nmap 7.01 ( https://nmap.org ) at 2017-09-25 14:42 UTC
Nmap scan report for scanme.nmap.org (45.33.32.156)
Host is up (0.063s latency).
Other addresses for scanme.nmap.org (not scanned): 2600:3c01::f03c:91ff:fe18:bb2f
PORT STATE SERVICE VERSION
21/tcp closed ftp
22/tcp open ssh OpenSSH 6.6.1p1 Ubuntu 2ubuntu2.8 (Ubuntu Linux; protocol 2.0)
23/tcp closed telnet
25/tcp closed smtp
80/tcp open http Apache httpd 2.4.7 ((Ubuntu))
110/tcp closed pop3
143/tcp closed imap
443/tcp closed https
445/tcp closed microsoft-ds
3389/tcp closed ms-wbt-server
Service Info: OS: Linux; CPE: cpe:/o:linux:linux_kernel

Service detection performed. Please report any incorrect results at https://nmap.org/submit/ .
Nmap done: 1 IP address (1 host up) scanned in 7.18 seconds
----------------------------
```

## Notes
If you're having DNS issues, remove `proxy_dns` from your `/etc/proxychains.conf` to do DNS resolution locally. I'd recommend using a DNS server from [OpenNIC](https://servers.opennic.org/) in your `/etc/resolv.conf`.


--------------------------------------------------------------------------------