├── .gitignore
├── README.md
├── main.py
└── requirements.txt


/.gitignore:
--------------------------------------------------------------------------------
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
.*.pyc

# IDE
.idea/

# virtualenv
venv/

# MAC
.DS_Store

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Tor IP Rotation in Python
A simple Python script that requests new IPs from the Tor network.

Article:
https://medium.com/@amine.btt/a-crawler-that-beats-bot-detection-879888f470eb

Adapted from:
- *"[Crawling anonymously with Tor in Python](http://sacharya.com/crawling-anonymously-with-tor-in-python/)" by S. Acharya, Nov 2, 2013.*
- *[PyTorStemPrivoxy](https://github.com/FrackingAnalysis/PyTorStemPrivoxy) repo of [FrackingAnalysis](https://github.com/FrackingAnalysis)*

## Requirements
Note: **These are the requirements for Mac OS X**. You can find the requirements for Linux in [PyTorStemPrivoxy](https://github.com/FrackingAnalysis/PyTorStemPrivoxy).

### Tor
```shell
brew update
brew install tor
```

*Notice that the socks listener is on port 9050.*

Next, do the following:
- Enable the ControlPort listener on port 9051. This is the port on which Tor listens for commands from applications talking to the Tor controller.
- Hash a new password that prevents random access to the port by outside agents.
- Implement cookie authentication as well.

You can create a hashed password out of your password using:
```shell
tor --hash-password my_password
```

Then, update `/usr/local/etc/tor/torrc` with the port, hashed password, and cookie authentication.
```shell
# content of torrc
ControlPort 9051
# hashed password below is obtained via `tor --hash-password my_password`
HashedControlPassword 16:E600ADC1B52C80BB6022A0E999A7734571A451EB6AE50FED489B72E3DF
CookieAuthentication 1
```

Restart Tor so that the configuration changes are applied.
```shell
brew services restart tor
```

### Privoxy

Tor itself is not an HTTP proxy. To reach the Tor network over HTTP, use `privoxy` as an HTTP proxy that forwards traffic through SOCKS5.

Install `privoxy` via the following command:

```shell
brew install privoxy
```

Now, tell `privoxy` to use Tor by routing all traffic through the SOCKS server at localhost port 9050.
To do that, append the following to `/usr/local/etc/privoxy/config`:
```shell
forward-socks5t / 127.0.0.1:9050 . # the dot at the end is important
```

Restart `privoxy` after making the change to the configuration file.
```shell
brew services restart privoxy
```

### Stem

Next, install `stem`, a Python module for interacting with the Tor controller. It lets us send commands to the Tor control port, and receive replies from it, programmatically.

```shell
pip install stem
```

## Example Script

In the script below, `urllib` sends requests through `privoxy`, which listens on port 8118 by default and forwards the traffic to port 9050, where the Tor SOCKS listener is running.

Additionally, the `renew_connection()` function sends a signal to the Tor controller to change the identity, so you get new identities without restarting Tor. This comes in handy when crawling a website and you don't want to be blocked based on your IP address.

```python
...

wait_time = 2
number_of_ip_rotations = 3
tor_handler = TorHandler()

ip = tor_handler.open_url('http://icanhazip.com/')
print('My first IP: {}'.format(ip))

# Cycle through the specified number of IP addresses via TOR
for i in range(0, number_of_ip_rotations):
    old_ip = ip
    seconds = 0

    tor_handler.renew_connection()

    # Loop until the 'new' IP address is different than the 'old' IP address,
    # It may take the TOR network some time to effect a different IP address
    while ip == old_ip:
        time.sleep(wait_time)
        seconds += wait_time
        print('{} seconds elapsed awaiting a different IP address.'.format(seconds))

        ip = tor_handler.open_url('http://icanhazip.com/')

    print('My new IP: {}'.format(ip))
```
Execute the Python 3 script above via the following command:

```shell
python main.py
```
When the above script is executed, one should see that the IP address is changing every few seconds.
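
The polling loop in the script keeps waiting for as long as the reported IP stays the same. As a minimal sketch, and not part of `main.py`, the same idea can be written with an upper bound on the wait. The names `wait_for_new_ip`, `fetch_ip`, `max_wait`, and `sleep` are hypothetical and introduced only for this illustration; `fetch_ip` stands for any callable that returns the current IP as text, such as a call to `open_url()`.

```python
import time


def wait_for_new_ip(fetch_ip, old_ip, wait_time=2, max_wait=60, sleep=time.sleep):
    """Poll fetch_ip() until it returns an address other than old_ip.

    Hypothetical helper: returns the new address, or None if max_wait
    seconds pass without a change.
    """
    old_ip = old_ip.strip()
    elapsed = 0
    while elapsed < max_wait:
        sleep(wait_time)
        elapsed += wait_time
        # Strip whitespace because the service may end its reply with a newline
        ip = fetch_ip().strip()
        if ip != old_ip:
            return ip
    return None


if __name__ == '__main__':
    # Stand-in for tor_handler.open_url(...): two stale answers, then a new one
    answers = iter(['1.1.1.1\n', '1.1.1.1\n', '2.2.2.2\n'])
    print(wait_for_new_ip(lambda: next(answers), '1.1.1.1\n', sleep=lambda s: None))
```

With the script above, this could be called after `tor_handler.renew_connection()` as `wait_for_new_ip(lambda: tor_handler.open_url('http://icanhazip.com/'), old_ip)`. Returning `None` on timeout leaves the decision to retry or give up to the caller.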


## Changes from PyTorStemPrivoxy
- *Requirements for Mac OS X*
- *Python 3*
- *Coding style*

--------------------------------------------------------------------------------
/main.py:
--------------------------------------------------------------------------------
import time
from urllib.request import ProxyHandler, build_opener, install_opener, Request, urlopen

from stem import Signal
from stem.control import Controller


class TorHandler:
    def __init__(self):
        self.headers = {
            'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.7) Gecko/2009021910 Firefox/3.0.7'}

    def open_url(self, url):
        # communicate with TOR via a local proxy (privoxy)
        def _set_url_proxy():
            proxy_support = ProxyHandler({'http': '127.0.0.1:8118'})
            opener = build_opener(proxy_support)
            install_opener(opener)

        _set_url_proxy()
        request = Request(url, None, self.headers)
        return urlopen(request).read().decode('utf-8')

    @staticmethod
    def renew_connection():
        # ask the Tor controller (ControlPort 9051) for a new identity
        with Controller.from_port(port=9051) as controller:
            # the password must match the one hashed into HashedControlPassword in torrc
            controller.authenticate(password='btt')
            controller.signal(Signal.NEWNYM)
            # no explicit close() needed: the `with` block closes the controller


if __name__ == '__main__':
    wait_time = 2
    number_of_ip_rotations = 3
    tor_handler = TorHandler()

    ip = tor_handler.open_url('http://icanhazip.com/')
    print('My first IP: {}'.format(ip))

    # Cycle through the specified number of IP addresses via TOR
    for i in range(0, number_of_ip_rotations):
        old_ip = ip
        seconds = 0

        tor_handler.renew_connection()

        # Loop until the 'new' IP address is different than the 'old' IP address,
        # It may take the TOR network some time to effect a different IP address
        while ip == old_ip:
            time.sleep(wait_time)
            seconds += wait_time
            print('{} seconds elapsed awaiting a different IP address.'.format(seconds))

            ip = tor_handler.open_url('http://icanhazip.com/')

        print('My new IP: {}'.format(ip))

--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
stem==1.6.0

--------------------------------------------------------------------------------