├── README.md
├── no_proxy.py
├── requirements.txt
├── rotating_multiple_proxies.py
├── rotating_multiple_proxies_async.py
└── single_proxy.py


/README.md:
--------------------------------------------------------------------------------
  1 | # Rotating Proxies With Python
  2 | [<img src="https://img.shields.io/static/v1?label=&message=Python&color=brightgreen" />](https://github.com/topics/python) [<img src="https://img.shields.io/static/v1?label=&message=Web%20Scraping&color=important" />](https://github.com/topics/web-scraping) [<img src="https://img.shields.io/static/v1?label=&message=Rotating%20Proxies&color=blueviolet" />](https://github.com/topics/rotating-proxies)
  3 | 
  4 | [![Oxylabs promo code](https://raw.githubusercontent.com/oxylabs/product-integrations/refs/heads/master/Affiliate-Universal-1090x275.png)](https://oxylabs.go2cloud.org/aff_c?offer_id=7&aff_id=877&url_id=112)
  5 | 
  6 | [![](https://dcbadge.vercel.app/api/server/eWsVUJrnG5)](https://discord.gg/GbxmdGhZjq)
  7 | 
  8 | ## Table of Contents
  9 | 
 10 | - [Finding Current IP Address](#finding-your-current-ip-address)
 11 | - [Using A Single Proxy](#using-a-single-proxy)
 12 | - [Rotating Multiple Proxies](#rotating-multiple-proxies)
 13 | - [Rotating Multiple Proxies Using Async](#rotating-multiple-proxies-using-async)
 14 | 
 15 | ## Prerequisites
 16 | 
 17 | This article uses the python `requests` module. In order to install it, you can use `virtualenv`. `virtualenv` is a tool to create isolated Python environments.
 18 | 
 19 | Start by creating a virtual environment in your project folder by running
 20 | ```bash
 21 | $ virtualenv venv
 22 | ```
 23 | This will install python, pip and common libraries in your project folder.
 24 | 
 25 | Next, invoke the source command to activate the environment. 
 26 | ```bash
 27 | $ source venv/bin/activate
 28 | ```
 29 | 
 30 | Lastly, install the `requests` module in the current virtual environment
 31 | ```bash
 32 | $ pip install requests
 33 | ```
 34 | 
 35 | Alternatively, you can install the dependencies from the included [requirements.txt](requirements.txt) file by running
 36 | 
 37 | ```bash
 38 | $ pip install -r requirements.txt
 39 | ```
 40 | 
 41 | Congratulations, you have successfully installed the `request` module. Now, it's time to find out your current Ip address!
 42 | 
 43 | ## Finding Your Current IP Address
 44 | 
 45 | Create a file with the `.py` extension with the following contents (or just copy [no_proxy.py](src/no_proxy.py)):
 46 | 
 47 | ```python
 48 | import requests
 49 | 
 50 | response = requests.get('https://ip.oxylabs.io/location')
 51 | print(response.text)
 52 | ```
 53 | 
 54 | Now, run it from a terminal
 55 | 
 56 | ```bash
 57 | $ python no_proxy.py
 58 | 
 59 | 128.90.50.100
 60 | ```
 61 | The output of this script will show your current IP address, which uniquely identifies you on the network. Instead of exposing it directly when requesting pages, we will use a proxy server.
 62 | 
 63 | Let's start by using a single proxy.
 64 | 
 65 | ## Using A Single Proxy 
 66 | 
 67 | Your first step is to [find a free proxy server](https://www.google.com/search?q=free+proxy+server+list).
 68 | 
 69 | **Important Note**: free proxies are unreliable, slow and can collect the data about the pages you access. If you're looking for a reliable paid option, we highly recommend using [oxylabs.io](https://oxy.yt/GrVD) 
 70 | 
 71 | To use a proxy, you will need its:
 72 | * scheme (e.g. `http`)
 73 | * ip (e.g. `2.56.215.247`)
 74 | * port (e.g. `3128`)
 75 | * username and password that is used to connect to the proxy (optional)
 76 | 
 77 | Once you have it, you need to set it up in the following format
 78 | ```
 79 | SCHEME://USERNAME:PASSWORD@YOUR_PROXY_IP:YOUR_PROXY_PORT
 80 | ```
 81 | 
 82 | Here are a few examples of the proxy formats you may encounter:
 83 | ```text
 84 | http://2.56.215.247:3128
 85 | https://2.56.215.247:8091
 86 | https://my-user:aegi1Ohz@2.56.215.247:8044
 87 | ```
 88 | 
 89 | Once you have the proxy information, assign it to a constant.
 90 | 
 91 | ```python
 92 | PROXY = 'http://2.56.215.247:3128'
 93 | ```
 94 | 
 95 | Next, define a timeout in seconds as it is always a good idea to avoid waiting indefinitely for the response that may never be returned (due to network issues, server issues or the problems with the proxy server)
 96 | ```python
 97 | TIMEOUT_IN_SECONDS = 10
 98 | ```
 99 | 
100 | The requests module [needs to know](https://docs.python-requests.org/en/master/user/advanced/#proxies) when to actually use the proxy.
101 | For that, consider the website you are attempting to access. Does it use http or https?
102 | Since we're trying to access **https**://ip.oxylabs.io/location, we can define this configuration as follows
103 | ```python
104 | scheme_proxy_map = {
105 |     'https': PROXY,
106 | }
107 | ```
108 | 
109 | **Note**: you can specify multiple protocols, and even define specific domains for which a different proxy will be used
110 | 
111 | ```python
112 | scheme_proxy_map = {
113 |     'http': PROXY1,
114 |     'https': PROXY2,
115 |     'https://example.org': PROXY3,
116 | }
117 | ```
118 | 
119 | Finally, we make the request by calling `requests.get` and passing all the variables we defined earlier. We also handle the exceptions and show the error when a network issue occurs.
120 | 
121 | ```python
122 | try:
123 |     response = requests.get('https://ip.oxylabs.io/location', proxies=scheme_proxy_map, timeout=TIMEOUT_IN_SECONDS)
124 | except (ProxyError, ReadTimeout, ConnectTimeout) as error:
125 |         print('Unable to connect to the proxy: ', error)
126 | else:
127 |     print(response.text)
128 | ```
129 | 
130 | The output of this script should show you the ip of your proxy:
131 | 
132 | ```bash
133 | $ python single_proxy.py
134 | 
135 | 2.56.215.247
136 | ```
137 | 
138 | You are now hidden behind a proxy when making your requests through the python script.
139 | You can find the complete code in the file [single_proxy.py](src/single_proxy.py).
140 | 
141 | Now we're ready to rotate through a list of proxies, instead of using a single one!
142 | 
143 | ## Rotating Multiple Proxies
144 | 
145 | If you're using unreliable proxies, it could prove beneficial to save a bunch of them into a csv file and run a loop to determine whether they are still available.
146 | 
147 | For that purpose, first create a file `proxies.csv` with the following content:
148 | ```text
149 | http://2.56.215.247:3128
150 | https://88.198.24.108:8080
151 | http://50.206.25.108:80
152 | http://68.188.59.198:80
153 | ... any other proxy servers, each of them on a separate line
154 | ```
155 | 
156 | Then, create a python file and define both the filename, and how long are you willing to wait for a single proxy to respond:
157 | 
158 | ```python
159 | TIMEOUT_IN_SECONDS = 10
160 | CSV_FILENAME = 'proxies.csv'
161 | ```
162 | 
163 | Next, write the code that opens the csv file and reads every proxy server line by line into a `csv_row` variable and builds `scheme_proxy_map` configuration needed by the requests module.
164 | 
165 | ```python
166 | with open(CSV_FILENAME) as open_file:
167 |     reader = csv.reader(open_file)
168 |     for csv_row in reader:
169 |         scheme_proxy_map = {
170 |             'https': csv_row[0],
171 |         }
172 | ```
173 | 
174 | And finally, we use the same scraping code from the previous section to access the website via proxy
175 | 
176 | ```python
177 | with open(CSV_FILENAME) as open_file:
178 |     reader = csv.reader(open_file)
179 |     for csv_row in reader:
180 |         scheme_proxy_map = {
181 |             'https': csv_row[0],
182 |         }
183 |         
184 |         # Access the website via proxy
185 |         try:
186 |             response = requests.get('https://ip.oxylabs.io/location', proxies=scheme_proxy_map, timeout=TIMEOUT_IN_SECONDS)
187 |         except (ProxyError, ReadTimeout, ConnectTimeout) as error:
188 |             pass
189 |         else:
190 |             print(response.text)
191 | ```
192 | 
193 | **Note**: if you are only interested in scraping the content using *any* working proxy from the list, then add a break after print to stop going through the proxies in the csv file
194 | 
195 | ```python
196 |         try:
197 |             response = requests.get('https://ip.oxylabs.io/location', proxies=scheme_proxy_map, timeout=TIMEOUT_IN_SECONDS)
198 |         except (ProxyError, ReadTimeout, ConnectTimeout) as error:
199 |             pass
200 |         else:
201 |             print(response.text)
202 |             break # notice the break here
203 | ```
204 | 
205 | This complete code is available in [rotating_multiple_proxies.py](src/rotating_multiple_proxies.py)
206 | 
207 | The only thing that is preventing us from reaching our full potential is speed.
208 | It's time to tackle that in the next section!
209 | 
210 | ## Rotating Multiple Proxies Using Async
211 | 
212 | Checking all the proxies in the list one by one may be an option for some, but it has one significant downside - this approach is painfully slow. This is because we are using a synchronous approach. We tackle requests one at a time and only move to the next once the previous one is completed. 
213 | 
214 | A better option would be to make requests and wait for responses in a non-blocking way - this would speed up the script significantly.
215 | 
216 | In order to do that we use the `aiohttp` module. You can install it using the following cli command: 
217 | 
218 | ```bash
219 | $ pip install aiohttp
220 | ```
221 | 
222 | Then, create a python file where you define:
223 | * the csv filename that contains the proxy list
224 | * url that you wish to use to check the proxies
225 | * how long are you willing to wait for each proxy - the timeout setting
226 | 
227 | ```python
228 | CSV_FILENAME = 'proxies.csv'
229 | URL_TO_CHECK = 'https://ip.oxylabs.io/location'
230 | TIMEOUT_IN_SECONDS = 10
231 | ```
232 | 
233 | Next, we define an async function and run it using the asyncio module.
234 | It accepts two parameters: 
235 | * the url it needs to request
236 | * the proxy to use to access it
237 | 
238 | We then print the response. If the script received an error when attempting to access the url via proxy, it will print it as well.
239 | 
240 | ```python
241 | 
242 | async def check_proxy(url, proxy):
243 |     try:
244 |         session_timeout = aiohttp.ClientTimeout(total=None,
245 |                                                 sock_connect=TIMEOUT_IN_SECONDS,
246 |                                                 sock_read=TIMEOUT_IN_SECONDS)
247 |         async with aiohttp.ClientSession(timeout=session_timeout) as session:
248 |             async with session.get(url, proxy=proxy, timeout=TIMEOUT_IN_SECONDS) as resp:
249 |                 print(await resp.text())
250 |     except Exception as error:
251 |         # you can comment out this line to only see valid proxies printed out in the command line
252 |         print('Proxy responded with an error: ', error)
253 |         return
254 | ```
255 | 
256 | Then, we define a main function that reads the csv file and creates an asynchronous task to check the proxy for every single record in the csv file. 
257 | 
258 | ```python
259 | 
260 | async def main():
261 |     tasks = []
262 |     with open(CSV_FILENAME) as open_file:
263 |         reader = csv.reader(open_file)
264 |         for csv_row in reader:
265 |             task = asyncio.create_task(check_proxy(URL_TO_CHECK, csv_row[0]))
266 |             tasks.append(task)
267 | 
268 |     await asyncio.gather(*tasks)
269 | ```
270 | 
271 | Finally, we run the main function and wait until all the async tasks complete
272 | ```python
273 | asyncio.run(main())
274 | ```
275 | 
276 | This complete code is available in [rotating_multiple_proxies.py](src/rotating_multiple_proxies_async.py)
277 | 
278 | This code now runs exceptionally fast!
279 | 
280 | # We are open to contribution!
281 | 
282 | Be sure to play around with it and create a pull request with any improvements you may find.
283 | Also, check this [Best rotating proxy service](https://medium.com/@oxylabs.io/10-best-rotating-proxy-services-for-2024-853d840af1a4) list.
284 | 
285 | Happy coding!
286 | 


--------------------------------------------------------------------------------
/no_proxy.py:
--------------------------------------------------------------------------------
1 | import requests
2 | 
3 | response = requests.get('https://ip.oxylabs.io/location')
4 | print(response.text)
5 | 


--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | aiohttp==3.8.1
2 | requests==2.27.1
3 | 


--------------------------------------------------------------------------------
/rotating_multiple_proxies.py:
--------------------------------------------------------------------------------
 1 | import csv
 2 | 
 3 | import requests
 4 | from requests.exceptions import ProxyError, ReadTimeout, ConnectTimeout
 5 | 
 6 | TIMEOUT_IN_SECONDS = 10
 7 | CSV_FILENAME = 'proxies.csv'
 8 | 
 9 | with open(CSV_FILENAME) as open_file:
10 |     reader = csv.reader(open_file)
11 |     for csv_row in reader:
12 |         scheme_proxy_map = {
13 |             'https': csv_row[0],
14 |         }
15 | 
16 |         try:
17 |             response = requests.get(
18 |                 'https://ip.oxylabs.io/location',
19 |                 proxies=scheme_proxy_map,
20 |                 timeout=TIMEOUT_IN_SECONDS,
21 |             )
22 |         except (ProxyError, ReadTimeout, ConnectTimeout) as error:
23 |             pass
24 |         else:
25 |             print(response.text)
26 | 


--------------------------------------------------------------------------------
/rotating_multiple_proxies_async.py:
--------------------------------------------------------------------------------
 1 | import csv
 2 | import aiohttp
 3 | import asyncio
 4 | 
 5 | CSV_FILENAME = 'proxies.csv'
 6 | URL_TO_CHECK = 'https://ip.oxylabs.io/location'
 7 | TIMEOUT_IN_SECONDS = 10
 8 | 
 9 | 
10 | async def check_proxy(url, proxy):
11 |     try:
12 |         session_timeout = aiohttp.ClientTimeout(
13 |             total=None, sock_connect=TIMEOUT_IN_SECONDS, sock_read=TIMEOUT_IN_SECONDS
14 |         )
15 |         async with aiohttp.ClientSession(timeout=session_timeout) as session:
16 |             async with session.get(
17 |                 url, proxy=proxy, timeout=TIMEOUT_IN_SECONDS
18 |             ) as resp:
19 |                 print(await resp.text())
20 |     except Exception as error:
21 |         print('Proxy responded with an error: ', error)
22 |         return
23 | 
24 | 
25 | async def main():
26 |     tasks = []
27 |     with open(CSV_FILENAME) as open_file:
28 |         reader = csv.reader(open_file)
29 |         for csv_row in reader:
30 |             task = asyncio.create_task(check_proxy(URL_TO_CHECK, csv_row[0]))
31 |             tasks.append(task)
32 | 
33 |     await asyncio.gather(*tasks)
34 | 
35 | 
36 | asyncio.run(main())
37 | 


--------------------------------------------------------------------------------
/single_proxy.py:
--------------------------------------------------------------------------------
 1 | import requests
 2 | from requests.exceptions import ProxyError, ReadTimeout, ConnectTimeout
 3 | 
 4 | PROXY = 'http://2.56.215.247:3128'
 5 | TIMEOUT_IN_SECONDS = 10
 6 | 
 7 | scheme_proxy_map = {
 8 |     'https': PROXY,
 9 | }
10 | try:
11 |     response = requests.get(
12 |         'https://ip.oxylabs.io/location', proxies=scheme_proxy_map, timeout=TIMEOUT_IN_SECONDS
13 |     )
14 | except (ProxyError, ReadTimeout, ConnectTimeout) as error:
15 |     print('Unable to connect to the proxy: ', error)
16 | else:
17 |     print(response.text)
18 | 


--------------------------------------------------------------------------------