├── .gitignore
├── LICENSE
├── .github
    └── workflows
    │   └── python-app.yml
├── README.md
├── main.py
└── proxies.txt

/.gitignore:
--------------------------------------------------------------------------------
.DS_Store

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2024 Berkay

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

--------------------------------------------------------------------------------
/.github/workflows/python-app.yml:
--------------------------------------------------------------------------------
name: Check New Proxies

on:
  workflow_dispatch:
    inputs:
      tags:
        description: 'Test scenario'
        required: false
  schedule:
    - cron: '0 4,10,16,22 * * *'

permissions:
  contents: write

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v3
    - name: Set up Python 3.10
      uses: actions/setup-python@v3
      with:
        python-version: "3.10"
    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install requests termcolor
        if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
    - name: Run the code
      run: |
        python main.py
    - uses: stefanzweifel/git-auto-commit-action@v5
      with:
        commit_message: Fetch New Proxies
        commit_user_name: berkay-digital
        commit_user_email: berkay-digital@users.noreply.github.com
        commit_author: berkay-digital

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Proxy Scraper

Proxy Scraper is a Python-based project that scrapes and validates HTTP proxies from various sources. It automatically updates [proxies.txt](proxies.txt) every six hours so you can use it in your code. Don't forget to use the [raw version](https://raw.githubusercontent.com/berkay-digital/Proxy-Scraper/main/proxies.txt).

![Demo](https://cdn.discordapp.com/attachments/853203826600181790/1194217286916128808/scraper_3.gif)

## Project Structure

The main script of the project is located in `main.py`. This script performs the scraping and validation of proxies. The validated proxies are then written to a file named `proxies.txt`.
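
As a rough usage sketch for the raw `proxies.txt` linked above, the snippet below parses the list and turns one entry into the `proxies` mapping that `requests` accepts. The helper names (`parse_proxy_list`, `as_requests_proxies`, `fetch_via_random_proxy`) and the timeout values are illustrative assumptions and are not part of this repository. Only the `http` scheme is mapped, on the assumption that the entries are plain HTTP proxies that were checked against an `http://` URL.

```python
import random

# Raw URL taken from the README; everything else in this sketch is hypothetical.
RAW_URL = "https://raw.githubusercontent.com/berkay-digital/Proxy-Scraper/main/proxies.txt"


def parse_proxy_list(text):
    """Return the non-empty lines of a proxy list with surrounding whitespace removed."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def as_requests_proxies(proxy):
    """Build the `proxies` mapping that requests expects for one ip:port entry."""
    return {"http": f"http://{proxy}"}


def fetch_via_random_proxy(target="http://api.ipify.org/"):
    """Download the raw list, pick one proxy at random, and fetch `target` through it."""
    import requests  # third-party; imported lazily so the helpers above need only the stdlib

    listing = requests.get(RAW_URL, timeout=10)
    listing.raise_for_status()
    proxy = random.choice(parse_proxy_list(listing.text))
    reply = requests.get(target, proxies=as_requests_proxies(proxy), timeout=5)
    return proxy, reply.text
```

Calling `fetch_via_random_proxy()` performs two network requests and may well fail, since free proxies tend to go offline quickly; retrying with another entry is a reasonable fallback.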

## How It Works

The script fetches proxy lists from two sources. It then validates each proxy by attempting to open `http://api.ipify.org/` through it with a 3-second timeout. If the request succeeds, the proxy is considered working; otherwise, it is considered bad.

The script uses multithreading to speed up the validation process. Checks run on a `concurrent.futures.ThreadPoolExecutor` with its default worker count, so the number of threads is capped by the executor rather than equal to the number of proxies.

Once all proxies have been validated, the script writes the working proxies to a file named `proxies.txt`.

## How to Run

To run the script, run `python3 main.py`. Make sure the required Python libraries are installed first. You can install them using pip:

```sh
pip install requests termcolor
```

## Output
The output of the script is a file named **proxies.txt** that contains the list of working proxies, one proxy per line.

## Future Improvements
Future improvements to the project could include adding more sources for proxy lists, improving the validation process, and adding a user interface for easier use.

## Contributing
Contributions to the project are welcome. If you have a feature request, a bug report, or an improvement to the code, feel free to open an issue or submit a pull request.

## License
This project is open source and available under the [MIT License](LICENSE).
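
The validation flow described under "How It Works" can also be sketched without shared counters, by letting the executor return one verdict per proxy. This is only an illustration under stated assumptions: `filter_working`, the `is_working` callback, and the `max_workers=32` default are hypothetical names and values, not what `main.py` does (the script tracks results in module-level variables).

```python
from concurrent.futures import ThreadPoolExecutor


def filter_working(proxies, is_working, max_workers=32):
    """Return the proxies for which is_working(proxy) is truthy, in input order.

    `is_working` is a hypothetical callback; in practice it would wrap a
    network check such as opening a test URL through the proxy.
    """
    proxies = list(proxies)
    if not proxies:
        return []
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # map() yields results in input order, so zip() pairs each proxy
        # with its own verdict even though the checks run concurrently.
        verdicts = list(executor.map(is_working, proxies))
    return [proxy for proxy, ok in zip(proxies, verdicts) if ok]
```

Because the check is passed in as a function, the thread-pool logic can be exercised with a fake checker and no network access, for example `filter_working(["a:1", "b:2"], lambda p: p == "a:1")`.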
--------------------------------------------------------------------------------
/main.py:
--------------------------------------------------------------------------------
import concurrent.futures
import threading
import time
import urllib.error
import urllib.request

import requests
from termcolor import colored

# Seconds to wait for each proxy check before giving up.
TIMEOUT = 3
# Seconds to wait for each proxy-list download.
FETCH_TIMEOUT = 30
url = "https://api.proxyscrape.com/v4/free-proxy-list/get?request=display_proxies&protocol=http&proxy_format=ipport&format=text&timeout=20000"
url2 = "https://raw.githubusercontent.com/monosans/proxy-list/main/proxies/http.txt"


headers = {
    "accept": "text/plain, */*; q=0.01",
    "accept-language": "en-US,en;q=0.8",
    "user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
}

response = requests.get(url, headers=headers, timeout=FETCH_TIMEOUT)
response2 = requests.get(url2, headers=headers, timeout=FETCH_TIMEOUT)
print(response.text)
print(response2.text)

# splitlines() plus strip() drops any "\r" left by CRLF line endings, which
# would otherwise make the same proxy from two sources look like two entries.
proxies = []
proxies.extend(line.strip() for line in response.text.splitlines())
proxies.extend(line.strip() for line in response2.text.splitlines())

proxies = list(set(filter(None, proxies)))
print("Amount of proxies after removing duplicates:", len(proxies))
time.sleep(3)


def is_bad_proxy(pip):
    """Return 0 if the test URL opens through the proxy, otherwise a truthy error code."""
    try:
        proxy_handler = urllib.request.ProxyHandler({"http": pip})
        opener = urllib.request.build_opener(proxy_handler)
        opener.addheaders = [("User-agent", "Mozilla/5.0")]
        # Use this opener directly instead of install_opener(): the installed
        # opener is global, so concurrent checks would overwrite each other's proxy.
        with opener.open("http://api.ipify.org/", timeout=TIMEOUT):
            pass
    except urllib.error.HTTPError as e:
        return e.code
    except Exception:
        return 1
    return 0


working = []
badcount = 0
workingcount = 0
# Guards the counters and the result list, which worker threads update concurrently.
lock = threading.Lock()


def check_proxy(proxy):
    global badcount, workingcount
    bad = is_bad_proxy(proxy)
    with lock:
        if bad:
            badcount += 1
            print(colored(f"{badcount} bad proxies", "red"))
        else:
            workingcount += 1
            print(colored(f"{workingcount} working proxies", "green"))
            working.append(proxy + "\n")


start_time = time.time()

# Leaving the with-block waits for every submitted check to finish.
with concurrent.futures.ThreadPoolExecutor() as executor:
    executor.map(check_proxy, proxies)
end_time = time.time()
print("Time taken: ", end_time - start_time, "seconds")
with open("proxies.txt", "w") as f:
    f.writelines(working)

--------------------------------------------------------------------------------
/proxies.txt:
--------------------------------------------------------------------------------
170.114.45.161:80
108.162.192.247:80
45.67.215.115:80
141.101.113.62:80
69.84.182.85:80
45.67.215.106:80
108.162.193.113:80
45.67.215.163:80
185.238.228.132:80
45.67.215.19:80
141.101.113.184:80
69.84.182.200:80
170.114.45.147:80
69.84.182.185:80
102.177.176.73:80
108.162.192.123:80
69.84.182.74:80
45.67.215.165:80
185.238.228.209:80
8.35.211.123:80
170.114.45.13:80
45.67.215.24:80
108.162.192.113:80
141.101.113.133:80
8.35.211.196:80
69.84.182.118:80
216.205.52.215:80
102.177.176.233:80
141.101.113.14:80
8.35.211.176:80
69.84.182.217:80
69.84.182.181:80
45.67.215.62:80
185.238.228.67:80
69.84.182.106:80
170.114.45.185:80
216.205.52.170:80
108.162.192.211:80
45.67.215.12:80
170.114.45.192:80
102.177.176.68:80
8.35.211.181:80
185.238.228.215:80
141.101.113.196:80
45.67.215.240:80
8.35.211.51:80
108.162.193.153:80
185.238.228.181:80
170.114.45.36:80
69.84.182.184:80
216.205.52.12:80
141.101.113.190:80
45.67.215.44:80
170.114.45.179:80
185.238.228.129:80
108.162.192.158:80
69.84.182.206:80
45.67.215.193:80
108.162.193.109:80
45.67.215.180:80
108.162.192.125:80
216.205.52.120:80
108.162.193.186:80
170.114.45.46:80
216.205.52.248:80
141.101.114.140:80
45.67.215.151:80
170.114.45.37:80
45.67.215.71:80
68.183.68.1:80
170.114.45.218:80
69.84.182.179:80
216.205.52.187:80
108.162.193.129:80
102.177.176.44:80
216.205.52.22:80
45.67.215.219:80
69.84.182.252:80
185.238.228.191:80
170.114.45.118:80
108.162.192.86:80
45.67.215.144:80
69.84.182.249:80
185.238.228.104:80
170.114.45.70:80
69.84.182.56:80
45.67.215.122:80
69.84.182.223:80
185.238.228.202:80
8.35.211.122:80
102.177.176.212:80
216.205.52.245:80
69.84.182.183:80
185.238.228.241:80
67.43.236.22:13731
69.84.182.105:80
170.114.45.33:80
102.177.176.134:80
102.177.176.147:80
185.238.228.252:80
8.35.211.113:80
170.114.45.3:80
141.101.113.146:80
141.101.113.53:80
108.162.193.207:80
79.175.188.151:80
141.101.113.141:80
170.114.45.209:80
141.101.113.243:80
108.162.193.211:80
102.177.176.39:80
8.35.211.160:80
216.205.52.123:80
8.243.68.11:8080
74.176.195.135:80
102.177.176.216:80
185.238.228.121:80
141.101.113.230:80
8.35.211.107:80
102.177.176.141:80
8.35.211.161:80
216.205.52.125:80
170.114.45.110:80
185.238.228.255:80
216.205.52.221:80
141.101.113.111:80
69.84.182.178:80
108.162.193.226:80
108.162.193.21:80
185.238.228.155:80
8.35.211.127:80
185.238.228.141:80
8.35.211.206:80
185.238.228.143:80
141.101.113.99:80
45.67.215.139:80
170.114.45.35:80
134.209.29.120:80
108.162.193.243:80
185.238.228.149:80
69.84.182.154:80
139.59.1.14:3128
69.84.182.137:80
141.101.113.127:80
18.202.158.161:80
216.205.52.58:80
8.35.211.135:80
141.101.113.253:80
102.177.176.186:80
108.162.193.112:80
8.35.211.182:80
8.35.211.147:80
102.177.176.11:80
69.84.182.17:80
170.114.45.142:80
141.101.113.172:80
185.238.228.235:80
216.205.52.203:80
102.177.176.12:80
8.219.255.17:3128
8.35.211.208:80
108.162.193.219:80
141.101.113.60:80
8.213.151.128:3128
8.35.211.222:80
170.114.45.79:80
141.101.113.156:80
102.177.176.191:80
102.177.176.222:80
216.205.52.34:80
185.238.228.125:80
45.67.215.169:80
102.177.176.173:80
216.205.52.37:80
45.67.215.220:80
108.162.193.122:80
45.67.215.150:80
102.177.176.231:80
108.162.193.248:80
185.238.228.192:80
102.177.176.33:80
216.205.52.138:80
185.238.228.147:80
102.177.176.165:80
185.238.228.115:80
8.35.211.114:80
141.101.113.28:80
170.114.45.165:80
45.67.215.65:80
216.205.52.32:80
108.162.192.151:80
102.177.176.201:80
8.35.211.223:80
69.84.182.145:80
45.67.215.227:80
45.67.215.173:80
216.205.52.204:80
170.114.45.116:80
45.67.215.160:80
102.177.176.87:80
170.114.45.20:80
108.162.193.116:80
141.101.113.58:80
185.200.37.245:8080
69.84.182.35:80
102.177.176.179:80
141.101.113.205:80
102.177.176.84:80
45.67.215.120:80
141.101.113.124:80
8.35.211.210:80
185.238.228.220:80
45.67.215.197:80
185.238.228.100:80
8.35.211.177:80
216.205.52.190:80
170.114.45.29:80
185.238.228.163:80
170.114.45.137:80
108.162.192.173:80
170.114.45.151:80
216.205.52.205:80
102.177.176.219:80
8.35.211.169:80
170.114.45.83:80
108.162.192.12:80
141.101.113.52:80
185.238.228.237:80
216.205.52.236:80
45.67.215.108:80
170.114.45.115:80
45.67.215.128:80
102.177.176.127:80
141.101.113.246:80
103.116.7.101:80
102.177.176.2:80
216.205.52.247:80
170.114.45.202:80
45.67.215.170:80
141.101.113.174:80
45.67.215.87:80
108.162.192.192:80
185.238.228.24:80
102.177.176.71:80
69.84.182.117:80
141.101.113.119:80
102.177.176.248:80
108.162.193.204:80
45.67.215.164:80
216.205.52.200:80
170.114.45.160:80
170.114.45.143:80
45.67.215.228:80
102.177.176.205:80
108.162.193.247:80
45.67.215.109:80
185.238.228.63:80
185.238.228.142:80
216.205.52.0:80
69.84.182.91:80
102.177.176.193:80
141.101.113.207:80
8.35.211.118:80
108.162.193.179:80
185.238.228.131:80
8.35.211.116:80
69.84.182.152:80
176.126.103.194:44214
141.101.113.237:80
141.101.113.123:80
170.114.45.52:80
45.67.215.112:80
8.35.211.175:80
69.84.182.53:80
141.101.113.11:80
102.177.176.202:80
45.67.215.218:80
216.205.52.178:80
141.101.113.214:80
141.101.113.177:80
8.35.211.195:80
102.177.176.21:80
108.162.193.162:80
108.162.192.172:80
69.84.182.156:80
108.162.193.163:80
141.101.113.42:80
185.238.228.109:80
185.238.228.134:80
216.205.52.237:80
45.67.215.238:80
45.67.215.121:80
8.35.211.109:80
8.212.157.10:443
170.114.45.193:80
108.162.193.188:80
216.205.52.153:80
170.114.45.251:80
8.35.211.172:80
8.35.211.105:80
102.177.176.120:80
185.238.228.114:80
45.67.215.241:80
8.35.211.246:80
102.177.176.133:80
45.67.215.183:80
8.35.211.12:80
141.101.113.161:80
8.35.211.143:80
102.177.176.99:80
45.67.215.140:80
69.84.182.203:80
8.35.211.190:80
103.35.188.243:3128
103.250.70.193:1121
72.10.164.178:19971
78.26.146.16:443
8.219.255.17:3128
52.78.193.98:3128
195.158.8.123:3128
144.125.164.222:8081
209.97.150.167:80
68.235.35.171:3128
144.125.164.158:8081
94.131.8.178:8887
103.76.107.6:8080
41.223.119.156:3128
198.199.86.11:3128
94.131.8.178:8883
144.125.164.222:8080
139.162.78.109:3128
138.68.60.8:80
94.131.8.178:8884
200.24.159.230:8080
150.107.140.238:3128
59.6.25.118:3128
72.10.164.178:20769
8.219.97.248:80
46.161.6.165:8080
185.191.236.162:3128
8.212.177.126:8080
185.225.20.242:80
14.228.106.39:8080
68.235.35.171:3128
101.50.101.91:80
94.131.8.178:8885
78.129.253.149:80
8.218.220.123:8080
8.209.255.13:3128
8.212.157.10:8080
45.225.207.183:999
129.150.39.251:8000
77.110.104.126:80
8.221.118.5:9090
8.243.68.11:8080
139.162.200.213:80
78.157.58.254:80
185.88.177.40:80
14.228.106.39:8080
8.212.160.196:8080
144.125.164.158:8080
159.203.61.169:80
45.59.186.60:80
139.162.236.244:80
8.212.160.196:8080
185.235.16.12:80
161.35.70.249:80
8.219.77.141:80
43.161.250.102:8080
8.213.222.247:8080

--------------------------------------------------------------------------------