├── .project
├── .pydevproject
├── PyTorStemPrivoxy.py
├── README.md
└── documentation
    └── Linux_TOR_Install.md

/.project:
--------------------------------------------------------------------------------
PyTorStemPrivoxyTest
org.python.pydev.PyDevBuilder
org.python.pydev.pythonNature
--------------------------------------------------------------------------------
/.pydevproject:
--------------------------------------------------------------------------------
/${PROJECT_DIR_NAME}
python 2.7
python27
--------------------------------------------------------------------------------
/PyTorStemPrivoxy.py:
--------------------------------------------------------------------------------
'''
Python script to connect to Tor via Stem and Privoxy, requesting a new connection (hence a new IP as well) as desired.
'''

import time
import urllib2

from stem import Signal
from stem.control import Controller

# initialize some HTTP headers
# for later usage in URL requests
user_agent = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.7) Gecko/2009021910 Firefox/3.0.7'
headers = {'User-Agent': user_agent}

# initialize some
# holding variables
oldIP = "0.0.0.0"
newIP = "0.0.0.0"

# how many IP addresses
# through which to iterate?
nbrOfIpAddresses = 3

# seconds between
# IP address checks
secondsBetweenChecks = 2

# request a URL
def request(url):
    # communicate with Tor via a local proxy (privoxy)
    def _set_urlproxy():
        proxy_support = urllib2.ProxyHandler({"http" : "127.0.0.1:8118"})
        opener = urllib2.build_opener(proxy_support)
        urllib2.install_opener(opener)

    # request a URL
    # via the proxy
    _set_urlproxy()
    req = urllib2.Request(url, None, headers)
    # strip the trailing newline that icanhazip.com appends
    return urllib2.urlopen(req).read().strip()

# signal Tor for a new connection
def renew_connection():
    with Controller.from_port(port = 9051) as controller:
        controller.authenticate(password = 'my_password')
        controller.signal(Signal.NEWNYM)
        # no explicit close() needed: the "with" block closes the controller

# cycle through
# the specified number
# of IP addresses via Tor
for i in range(0, nbrOfIpAddresses):

    # if it's the first pass
    if newIP == "0.0.0.0":
        # renew the Tor connection
        renew_connection()
        # obtain the "new" IP address
        newIP = request("http://icanhazip.com/")
    # otherwise
    else:
        # remember the
        # "new" IP address
        # as the "old" IP address
        oldIP = newIP
        # refresh the Tor connection
        renew_connection()
        # obtain the "new" IP address
        newIP = request("http://icanhazip.com/")

    # zero the
    # elapsed seconds
    seconds = 0

    # loop until the "new" IP address
    # is different than the "old" IP address,
    # as it may take the Tor network some
    # time to effect a different IP address
    while oldIP == newIP:
        # sleep this thread
        # for the specified duration
        time.sleep(secondsBetweenChecks)
        # track the elapsed seconds
        seconds += secondsBetweenChecks
        # obtain the current IP address
        newIP = request("http://icanhazip.com/")
        # signal that the program is still awaiting a different IP address
        print ("%d seconds elapsed awaiting a different IP address." % seconds)
    # output the
    # new IP address
    print ("")
    print ("newIP: %s" % newIP)
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
PyTorStemPrivoxy
================

A Python program using Tor, Stem, and Privoxy that requests new connections via Tor and thereby obtains new IP addresses as well.

# Crawling Anonymously with Tor in Python #

*adapted from the article "[Crawling anonymously with Tor in Python](http://sacharya.com/crawling-anonymously-with-tor-in-python/)" by S. Acharya, Nov 2, 2013.*

The most common use case is hiding one's identity with Tor, or changing identities programmatically: for example, when crawling a website like Google without being rate-limited or blocked by IP address.

## Tor ##

Install Tor.

```shell
sudo apt-get update
sudo apt-get install tor
sudo /etc/init.d/tor restart
```

*Notice that the SOCKS listener is on port 9050.*

Next, do the following:

- Enable the ControlPort listener on port 9051; this is the port on which Tor listens for commands from applications talking to the Tor controller.
- Hash a new password to prevent random access to the port by outside agents.
- Enable cookie authentication as well.

You can create a hashed password from your password using:

```shell
tor --hash-password my_password
```

Then, update `/etc/tor/torrc` with the port, hashed password, and cookie authentication.
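For reference, the `16:` string printed by `tor --hash-password` is not a plain SHA-1 of the password. The following is a sketch of how we understand it to be derived, assuming Tor uses the OpenPGP (RFC 2440) iterated-and-salted S2K with a random 8-byte salt and the indicator byte `0x60`; `tor_hash_password` is a hypothetical helper, not part of Tor or Stem. Because the salt is random, running `tor --hash-password` twice prints different strings, and both authenticate the same password.

```python
import binascii
import hashlib
import os

# Assumption: constants of the RFC 2440 iterated-and-salted S2K as used by Tor
EXPBIAS = 6
DEFAULT_INDICATOR = 0x60  # encodes how many bytes are fed to SHA-1 (65536)


def tor_hash_password(secret, salt=None, indicator=DEFAULT_INDICATOR):
    """Hypothetical sketch of `tor --hash-password`.

    Returns "16:" + upper-case hex of (salt || indicator || SHA-1 digest)."""
    if not isinstance(secret, bytes):
        secret = secret.encode("utf-8")
    if salt is None:
        salt = os.urandom(8)  # fresh random 8-byte salt
    # total bytes to hash: (16 + low nibble) << (high nibble + EXPBIAS)
    count = (16 + (indicator & 15)) << ((indicator >> 4) + EXPBIAS)
    data = salt + secret
    digest = hashlib.sha1()
    # feed salt+secret repeatedly until exactly `count` bytes have been hashed
    while count > 0:
        chunk = data[:count]  # a full copy, or a truncated final copy
        digest.update(chunk)
        count -= len(chunk)
    return "16:%s%02X%s" % (
        binascii.hexlify(salt).decode("ascii").upper(),
        indicator,
        digest.hexdigest().upper(),
    )
```

The result has the same shape as the `HashedControlPassword` value in the torrc snippet: 16 hex digits of salt, `60`, then 40 hex digits of digest. Treat `tor --hash-password` itself as the authoritative tool.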

```shell
sudo gedit /etc/tor/torrc
```

```shell
ControlPort 9051
# hashed password below is obtained via `tor --hash-password my_password`
HashedControlPassword 16:E600ADC1B52C80BB6022A0E999A7734571A451EB6AE50FED489B72E3DF
CookieAuthentication 1
```

Restart Tor again so that the configuration changes are applied.

```shell
sudo /etc/init.d/tor restart
```

## python-stem ##

Next, install `python-stem`, a Python module for interacting with the Tor controller that lets us send commands to, and receive replies from, the Tor control port programmatically.

```shell
sudo apt-get install python-stem
```

## privoxy ##

Tor itself is not an HTTP proxy. To reach the Tor network over HTTP, use `privoxy` as an HTTP proxy that forwards traffic through SOCKS5.

Install `privoxy` via the following command:

```shell
sudo apt-get install privoxy
```

Now, tell `privoxy` to use Tor by routing all traffic through the SOCKS server at localhost port 9050.

```shell
sudo gedit /etc/privoxy/config
```

and enable `forward-socks5` as follows:

```shell
# source https://stackoverflow.com/questions/9887505/how-to-change-tor-identity-in-python
# the dot at the end of the next line is important
forward-socks5 / localhost:9050 .
```

Restart `privoxy` after making the change to the configuration file.

```shell
sudo /etc/init.d/privoxy restart
```

## Python Script ##

In the script below, `urllib2` sends its requests through the proxy: `privoxy` listens on port 8118 by default and forwards the traffic to port 9050, on which Tor's SOCKS listener runs.

Additionally, the `renew_connection()` function sends a `NEWNYM` signal to the Tor controller to change identity, so you get new identities without restarting Tor. This comes in handy when crawling a website where you don't want to be blocked based on IP address.

**[PyTorStemPrivoxy.py](https://gist.github.com/KhepryQuixote/46cf4f3b999d7f658853#file-pytorstemprivoxy-py)**

```python
'''
Python script to connect to Tor via Stem and Privoxy, requesting a new connection (hence a new IP as well) as desired.
'''

import time
import urllib2

from stem import Signal
from stem.control import Controller

# initialize some HTTP headers
# for later usage in URL requests
user_agent = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.7) Gecko/2009021910 Firefox/3.0.7'
headers = {'User-Agent': user_agent}

# initialize some
# holding variables
oldIP = "0.0.0.0"
newIP = "0.0.0.0"

# how many IP addresses
# through which to iterate?
nbrOfIpAddresses = 3

# seconds between
# IP address checks
secondsBetweenChecks = 2

# request a URL
def request(url):
    # communicate with Tor via a local proxy (privoxy)
    def _set_urlproxy():
        proxy_support = urllib2.ProxyHandler({"http" : "127.0.0.1:8118"})
        opener = urllib2.build_opener(proxy_support)
        urllib2.install_opener(opener)

    # request a URL
    # via the proxy
    _set_urlproxy()
    req = urllib2.Request(url, None, headers)
    # strip the trailing newline that icanhazip.com appends
    return urllib2.urlopen(req).read().strip()

# signal Tor for a new connection
def renew_connection():
    with Controller.from_port(port = 9051) as controller:
        controller.authenticate(password = 'my_password')
        controller.signal(Signal.NEWNYM)
        # no explicit close() needed: the "with" block closes the controller

# cycle through
# the specified number
# of IP addresses via Tor
for i in range(0, nbrOfIpAddresses):

    # if it's the first pass
    if newIP == "0.0.0.0":
        # renew the Tor connection
        renew_connection()
        # obtain the "new" IP address
        newIP = request("http://icanhazip.com/")
    # otherwise
    else:
        # remember the
        # "new" IP address
        # as the "old" IP address
        oldIP = newIP
        # refresh the Tor connection
        renew_connection()
        # obtain the "new" IP address
        newIP = request("http://icanhazip.com/")

    # zero the
    # elapsed seconds
    seconds = 0

    # loop until the "new" IP address
    # is different than the "old" IP address,
    # as it may take the Tor network some
    # time to effect a different IP address
    while oldIP == newIP:
        # sleep this thread
        # for the specified duration
        time.sleep(secondsBetweenChecks)
        # track the elapsed seconds
        seconds += secondsBetweenChecks
        # obtain the current IP address
        newIP = request("http://icanhazip.com/")
        # signal that the program is still awaiting a different IP address
        print ("%d seconds elapsed awaiting a different IP address." % seconds)
    # output the
    # new IP address
    print ("")
    print ("newIP: %s" % newIP)
```

Execute the Python 2.7 script above with the following command:

```shell
python PyTorStemPrivoxy.py
```

When the script above runs, you should see the IP address change every few seconds.
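The loop in the script can wait forever if Tor keeps handing out the same exit address (a new circuit does not guarantee a different exit IP), and Tor rate-limits `NEWNYM`, so a signal sent too soon after the previous one may not take effect right away. Here is one possible hardening, sketched under stated assumptions rather than taken from the original article. It builds a Privoxy-routed opener without calling `install_opener` (so other code in the process is unaffected), uses Stem's `Controller.get_newnym_wait()` to wait until Tor will honor `NEWNYM`, and stops polling for a new IP after a timeout. The helper names (`make_privoxy_opener`, `fetch_ip`, `renew_identity`, `wait_for_new_ip`, `main`) and the 60-second timeout are hypothetical; the ports and password are the ones configured above. It is written to run on Python 2.7 or 3, and Stem is imported only inside `main()`.

```python
import time

try:
    from urllib.request import ProxyHandler, Request, build_opener  # Python 3
except ImportError:
    from urllib2 import ProxyHandler, Request, build_opener  # Python 2.7

PRIVOXY = "http://127.0.0.1:8118"  # Privoxy's default listener
IP_ECHO_URL = "http://icanhazip.com/"


def make_privoxy_opener(proxy=PRIVOXY):
    """Route HTTP(S) through Privoxy without replacing the global opener."""
    return build_opener(ProxyHandler({"http": proxy, "https": proxy}))


def fetch_ip(opener, url=IP_ECHO_URL, headers=None):
    """Return the exit IP reported by `url`, without the trailing newline."""
    req = Request(url, None, headers or {})
    return opener.open(req, timeout=30).read().decode("ascii").strip()


def renew_identity(controller, newnym_signal, sleep=time.sleep):
    """Wait until Tor will honor NEWNYM, then send it.

    `controller` is assumed to be an authenticated stem Controller; its
    get_newnym_wait() returns the seconds left before Tor accepts NEWNYM."""
    wait = controller.get_newnym_wait()
    if wait > 0:
        sleep(wait)
    controller.signal(newnym_signal)
    return wait


def wait_for_new_ip(get_ip, old_ip, interval=2, timeout=60, sleep=time.sleep):
    """Poll get_ip() until it differs from old_ip; return None on timeout."""
    elapsed = 0
    new_ip = get_ip()
    while new_ip == old_ip:
        if elapsed >= timeout:
            return None
        sleep(interval)
        elapsed += interval
        new_ip = get_ip()
    return new_ip


def main(count=3, password="my_password"):
    # imported here so the helpers above can be used without Stem installed
    from stem import Signal
    from stem.control import Controller

    opener = make_privoxy_opener()
    with Controller.from_port(port=9051) as controller:
        controller.authenticate(password=password)
        old_ip = fetch_ip(opener)
        for _ in range(count):
            renew_identity(controller, Signal.NEWNYM)
            new_ip = wait_for_new_ip(lambda: fetch_ip(opener), old_ip)
            if new_ip is None:
                print("no new IP address within the timeout; still %s" % old_ip)
            else:
                print("newIP: %s" % new_ip)
                old_ip = new_ip
```

Calling `main()` with Tor and Privoxy running as configured above should print one line per iteration. Keeping a single controller connection open across iterations, instead of reconnecting in every call as `renew_connection()` does, is a design choice of this sketch; the injectable `sleep` and `get_ip` parameters exist only so the helpers can be exercised without a network.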

## Adaptations to the original article ##

- *tweaks of grammar.*
- *the use of `python-stem` instead of `pytorctl`.*
- *a slight difference of settings within the `/etc/tor/torrc` file.*
- *the use of a different hashed password for the Tor controller, in this case `my_password`.*
- *some modifications in the sample program to accommodate the use of `python-stem`, cleaner logic, and more comprehensive commentary.*
--------------------------------------------------------------------------------
/documentation/Linux_TOR_Install.md:
--------------------------------------------------------------------------------
## Linux TOR Install ##

- 7zip
    - p7zip
    - p7zip-full
    - `sudo apt-get install p7zip p7zip-full`
- nautilus
    - nautilus-open-terminal
    - `sudo apt-get install nautilus-open-terminal`
- browsers
    - firefox
    - chromium-browser
    - `sudo apt-get install chromium-browser firefox`
- cryptography
    - gnupg
    - gpa
    - kleopatra
    - seahorse
    - seahorse-nautilus
    - rng-tools
        - This package may be required on virtual machines for the following reasons:
            - Key generation requires the system to work with a source of random numbers. Systems which are better at generating random numbers than others are said to have higher entropy. This is typically obtained from the system hardware; the GnuPG documentation recommends that keys be generated only on a local machine (i.e. not one being accessed across a network), and that keyboard, mouse and disk activity be maximized during key generation to increase the entropy of the system.
            - Unfortunately, there are some scenarios (for example, virtual machines without real hardware) where insufficient entropy causes key generation to be extremely slow. If you come across this problem, you should investigate means of increasing the system entropy. On virtualized Linux systems, this can often be achieved by installing the rng-tools package. This is available at least on RPM-based and APT-based systems (Red Hat/Fedora, Debian, Ubuntu and derivative distributions).
    - haveged
        - Installing this may help obtain the entropy needed for GPG key generation to finish within an acceptable time.
    - `sudo apt-get install gnupg gpa kleopatra seahorse seahorse-nautilus rng-tools haveged`
- databases
    - sqlite
        - sqlite3
        - sqlitebrowser
        - sqliteman
        - sqliteman-doc
        - libqt4-dev
        - libsqlite3-dev
        - sqlite3-doc
        - `sudo apt-get install sqlite3 sqlitebrowser sqliteman sqliteman-doc libsqlite3-dev libqt4-dev sqlite3-doc`
        - [sqlitestudio (Linux 64-bit)](http://sqlitestudio.pl/files/free/stable/linux64/sqlitestudio-2.1.5.bin)
- downloaders
    - filezilla
    - transmission (a BitTorrent client)
    - transmission-gtk
    - `sudo apt-get install filezilla transmission transmission-gtk`
- editors
    - gedit (usually preinstalled on Ubuntu)
    - retext (Markdown editor)
    - `sudo apt-get install gedit retext`
- folders
    - ~/data
        - voters
            - nc (these are North Carolina voter files suitable for use as "test" files)
    - ~/projects
        - python *(this is the "workspace" folder for the Spyder IDE)*
    - ~/temp
- python
    - python 2.7
        - libraries
            - python
            - python-all
            - python-dev
            - python-bcrypt
            - python-configparser
            - python-crypto
            - python-gdal
            - python-gnupg
            - python-iniparse
            - python-pip
            - python-numpy
            - python-pandas
            - python-pyodbc
            - python-pysqlite2
            - python-socksipy
            - python-sphinx
            - python-stem
            - python-torctl
            - python-xlrd
            - python-zmq
            - `sudo apt-get install python python-all python-dev python-bcrypt python-configparser python-crypto python-gdal python-gnupg python-iniparse python-numpy python-pyodbc python-pandas python-pip python-pysqlite2 python-socksipy python-sphinx python-stem python-torctl python-xlrd python-zmq`
        - connectors
            - ***NOTE: add freetds.conf file to the {home} folder of the user***
                - `[global]`
                - `tds version = auto`
            - freetds-bin
            - freetds-common
            - freetds-dev
            - libdbd-freetds
            - python-pymssql
            - python-mysql.connector
            - python-pysqlite2 (for SQLite 3)
            - python-pysqlite2-doc
            - tdsodbc
            - unixodbc
            - unixodbc-dev
            - `sudo apt-get install freetds-bin freetds-common freetds-dev libdbd-freetds python-pymssql python-mysql.connector python-pysqlite2 tdsodbc unixodbc unixodbc-dev`
        - documentation
            - python-doc
            - python-pysqlite2-doc
            - python-numpy-doc
            - `sudo apt-get install python-doc python-pysqlite2-doc python-numpy-doc`
        - ides
            - spyder (Python IDE)
            - `sudo apt-get install spyder`
            - `sudo pip install --upgrade spyder`
            - [Spyder 2.3.0](https://pypi.python.org/packages/source/s/spyder/spyder-2.3.0.zip#md5=7c99e0bc6485b0700f9570201282a139)
    - python 3.4
        - libraries
            - python3
            - python3-all
            - python3-dev
            - python3-bcrypt
            - *python3-configparser (now integrated into Python 3)*
            - python3-crypto
            - python3-gdal
            - python3-gnupg
            - *python3-iniparse (not present yet)*
            - python3-pip
            - python3-numpy
            - python3-pandas
            - *python3-pyodbc (not present yet)*
            - *python3-socksipy (not present yet)*
            - python3-sphinx
            - python3-stem
            - *python3-torctl (not present yet)*
            - python3-xlrd
            - python3-zmq
            - `sudo apt-get install python3 python3-all python3-dev python3-bcrypt python3-crypto python3-gdal python3-gnupg python3-numpy python3-pandas python3-pip python3-sphinx python3-stem python3-xlrd python3-zmq`
        - connectors
            - ***NOTE: add freetds.conf file to the {home} folder of the user***
                - `[global]`
                - `tds version = auto`
            - freetds-bin
            - freetds-common
            - freetds-dev
            - libdbd-freetds
            - *python3-pymssql (not present yet)*
            - python3-mysql.connector
            - tdsodbc
            - unixodbc
            - unixodbc-dev
            - `sudo apt-get install freetds-bin freetds-common freetds-dev libdbd-freetds python3-mysql.connector tdsodbc unixodbc unixodbc-dev`
        - documentation
            - python3-doc
            - `sudo apt-get install python3-doc`
        - ides
            - spyder3 (Python IDE)
            - `sudo apt-get install spyder3`
            - `sudo pip3 install --upgrade spyder`
- source code managers
    - git
        - rabbitvcs-cli
        - rabbitvcs-gedit
        - rabbitvcs-nautilus
        - rabbitvcs-core
        - `sudo apt-get install rabbitvcs-cli rabbitvcs-gedit rabbitvcs-nautilus rabbitvcs-core`
    - mercurial
        - mercurial
        - tortoisehg-nautilus
        - tortoisehg
        - `sudo apt-get install mercurial tortoisehg-nautilus tortoisehg`
- tor
    - tor
    - tor-geoipdb
    - vidalia
    - torchat
    - privoxy
    - `sudo apt-get install tor tor-geoipdb torchat privoxy vidalia`
    - To keep AppArmor from interfering with Vidalia when it is invoked:
        - `sudo ln -s /etc/apparmor.d/usr.bin.vidalia /etc/apparmor.d/disable/`
        - `sudo apparmor_parser -R /etc/apparmor.d/usr.bin.vidalia`
    - `sudo /etc/init.d/tor start` or `sudo /etc/init.d/tor restart`
--------------------------------------------------------------------------------