[![Oxylabs promo code](https://raw.githubusercontent.com/oxylabs/product-integrations/refs/heads/master/Affiliate-Universal-1090x275.png)](https://oxylabs.go2cloud.org/aff_c?offer_id=7&aff_id=877&url_id=112)

[![](https://dcbadge.vercel.app/api/server/eWsVUJrnG5)](https://discord.gg/Pds3gBmKMH)


# How to Bypass CAPTCHA With Puppeteer

- [How to Bypass CAPTCHA With Puppeteer](#how-to-bypass-captcha-with-puppeteer)
  * [Using Puppeteer-stealth to bypass CAPTCHA](#using-puppeteer-stealth-to-bypass-captcha)
  * [Using Web Unblocker with Node.js](#using-web-unblocker-with-nodejs)

To access protected websites, you must bypass CAPTCHA. Puppeteer, a Node.js library with a user-friendly API for managing Chrome/Chromium via the DevTools Protocol, can help. It runs headless by default and can also be configured to run a full, visible browser.

So why isn’t Puppeteer enough on its own? Websites often detect the automation and respond with a CAPTCHA or a block.

Let’s verify this with the following steps:

### 1. You must have Node.js installed on your system.

Create a new Node.js project and install Puppeteer using the following `npm` command:

```
npm i puppeteer
```

### 2. Import the Puppeteer library in your Node.js file.

```
const puppeteer = require('puppeteer');
```

### 3. Create a new browser instance in headless mode and a new page using the following code:

```
(async () => {
  // Create a browser instance
  const browserObj = await puppeteer.launch();

  // Create a new page
  const newpage = await browserObj.newPage();
```

### 4. Since the screenshot should match a desktop device, set the viewport size using the following code:

```
  // Set the width and height of viewport
  await newpage.setViewport({ width: 1920, height: 1080 });
```

The `setViewport()` method sets the size of the page viewport. You can change it to match your device requirements.

### 5. After that, navigate to a page URL (that you think is a CAPTCHA-protected page) and take a screenshot.

For demonstration purposes, the code uses the Oxylabs [scraping sandbox](https://sandbox.oxylabs.io/products). Remember to close the browser object at the end.

```
  const url = 'https://sandbox.oxylabs.io/products';

  // Open the required URL in the newpage object
  await newpage.goto(url);
  await newpage.waitForNetworkIdle(); // Wait for network resources to fully load

  // Capture screenshot
  await newpage.screenshot({
    path: 'screenshot.png',
  });

  // Close the browser object
  await browserObj.close();
})();
```

This is what the complete code looks like:

```
const puppeteer = require('puppeteer');

(async () => {
  const browserObj = await puppeteer.launch();
  const newpage = await browserObj.newPage();
  await newpage.setViewport({ width: 1920, height: 1080 });

  const url = 'https://sandbox.oxylabs.io/products';

  await newpage.goto(url);
  await newpage.waitForNetworkIdle();
  await newpage.screenshot({
    path: 'screenshot.png',
  });

  await browserObj.close();
})();
```

## Using Puppeteer-stealth to bypass CAPTCHA

Here is the step-by-step procedure to implement this CAPTCHA bypass:

### 1. To start, you need to install the `puppeteer-extra` and `puppeteer-extra-plugin-stealth` packages.

```
npm install puppeteer-extra-plugin-stealth puppeteer-extra
```

### 2. After that, import the following required libraries in your Node.js file:

```
const puppeteerExtra = require('puppeteer-extra');
const Stealth = require('puppeteer-extra-plugin-stealth');

puppeteerExtra.use(Stealth());
```

### 3. The next step is to create the browser object in headless mode, navigate to the URL, and take a screenshot.

```
(async () => {
  const browserObj = await puppeteerExtra.launch();
  const newpage = await browserObj.newPage();

  await newpage.setViewport({ width: 1920, height: 1080 });

  await newpage.setUserAgent(
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36'
  );

  await newpage.goto('https://sandbox.oxylabs.io/products');
  await newpage.waitForNetworkIdle(); // Wait for network resources to fully load

  await newpage.screenshot({ path: 'screenshot_stealth.png' });

  await browserObj.close();
})();
```

The `setUserAgent` method replaces the default User-Agent with one from a real browser, so the automated headless browser looks more like a regular user. Using a common User-Agent string helps evade anti-bot mechanisms that analyze the User-Agent header.
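
To see why the User-Agent matters, here is a minimal sketch of the kind of header check an anti-bot system might perform. The `looksAutomated` helper and the `headlessDefault` sample string are hypothetical illustrations, not part of Puppeteer or any real detection product. The sketch assumes that headless Chrome builds have typically identified themselves with a `HeadlessChrome` token.

```javascript
// Hypothetical, simplified check: flag User-Agent strings that carry
// the "HeadlessChrome" token (an assumption about typical headless builds).
function looksAutomated(userAgent) {
  return /HeadlessChrome/i.test(userAgent);
}

// Illustrative example of a default headless User-Agent (assumed format).
const headlessDefault =
  'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/114.0.0.0 Safari/537.36';

// The User-Agent string passed to setUserAgent() in the script above.
const overridden =
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36';

console.log(looksAutomated(headlessDefault)); // true
console.log(looksAutomated(overridden)); // false
```

Real anti-bot systems usually combine many more signals than this single header, which is why the stealth plugin is used alongside the User-Agent override.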

Here is what our complete script looks like:

```
const puppeteerExtra = require('puppeteer-extra');
const Stealth = require('puppeteer-extra-plugin-stealth');

puppeteerExtra.use(Stealth());

(async () => {
  const browserObj = await puppeteerExtra.launch();
  const newpage = await browserObj.newPage();

  await newpage.setViewport({ width: 1920, height: 1080 });

  await newpage.setUserAgent(
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36'
  );

  await newpage.goto('https://sandbox.oxylabs.io/products');
  await newpage.waitForNetworkIdle(); // Wait for network resources to fully load

  await newpage.screenshot({ path: 'screenshot_stealth.png' });

  await browserObj.close();
})();
```

## Using Web Unblocker with Node.js

Web Unblocker uses AI to help users avoid CAPTCHAs and access public data on websites with advanced anti-bot systems. To begin, you can send a basic query without any special options: Web Unblocker will select the fastest CAPTCHA proxy, add all necessary headers, and return the response body.

### 1. Install the `node-fetch` and `https-proxy-agent` packages using the following command:

```
npm install node-fetch https-proxy-agent
```

### 2. [Sign up to Oxylabs](https://dashboard.oxylabs.io/en/) and get your credentials for using the API.

### 3. Before importing the libraries, open the `package.json` file and add the line `"type": "module"`, for example:

```
{
  "type": "module",
  "dependencies": {
    "https-proxy-agent": "^7.0.4",
    "node-fetch": "^3.3.2",
    "puppeteer": "^22.6.5",
    "puppeteer-extra": "^3.3.6",
    "puppeteer-extra-plugin-stealth": "^2.11.2"
  }
}
```

Since the newest version of `node-fetch` is an ESM-only module, you can’t import it using the `require()` function. Learn more about it [here](https://www.npmjs.com/package/node-fetch#installation).

Next, import the required modules in your JS file using the `import ... from` syntax:

```
import fetch from 'node-fetch';
import HttpsProxyAgent from 'https-proxy-agent';
import fs from 'fs';
```

The built-in `fs` module is used to save the response to an HTML file.

### 4. Provide your user credentials and set up a proxy using `HttpsProxyAgent`.

```
const username = '';
const password = '';

(async () => {
  const agent = new HttpsProxyAgent.HttpsProxyAgent(
    `http://${username}:${password}@unblock.oxylabs.io:60000`
  );
```

### 5. Next, set the URL and issue a fetch request.

```
  // Ignore the certificate
  process.env['NODE_TLS_REJECT_UNAUTHORIZED'] = 0;

  const response = await fetch('https://ip.oxylabs.io/', {
    method: 'get',
    agent: agent,
  });
```

The environment variable `NODE_TLS_REJECT_UNAUTHORIZED` is set to zero so that Node.js doesn't verify the SSL/TLS certificates. This is a required setting if you’re using Oxylabs’ Web Unblocker.

### 6. Finally, convert the response into text and save it to an HTML file.

```
  const resp = await response.text();
  fs.writeFile('result.html', resp.toString(), (err) => {
    if (err) throw err;
    console.log('Result saved to result.html');
  });
})();
```

Here is the complete script:

```
import fetch from 'node-fetch';
import HttpsProxyAgent from 'https-proxy-agent';
import fs from 'fs';

const username = '';
const password = '';

(async () => {
  const agent = new HttpsProxyAgent.HttpsProxyAgent(
    `http://${username}:${password}@unblock.oxylabs.io:60000`
  );

  // Ignore the certificate
  process.env['NODE_TLS_REJECT_UNAUTHORIZED'] = 0;

  const response = await fetch('https://ip.oxylabs.io/', {
    method: 'get',
    agent: agent,
  });

  const resp = await response.text();
  fs.writeFile('result.html', resp.toString(), (err) => {
    if (err) throw err;
    console.log('Result saved to result.html');
  });
})();
```
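
The script above interpolates the credentials directly into the proxy URL. If a password contains characters such as `@` or `:`, the URL can be parsed incorrectly. The following is a minimal sketch of one way to guard against that, assuming a hypothetical `buildProxyUrl` helper (not part of any Oxylabs SDK) and made-up demo credentials. It percent-encodes both values, which is the standard way to carry such characters in the userinfo part of a URL.

```javascript
// Hypothetical helper: build the proxy URL with percent-encoded credentials.
// The default host and port are the Web Unblocker endpoint used in the script above.
function buildProxyUrl(username, password, host = 'unblock.oxylabs.io', port = 60000) {
  if (!username || !password) {
    throw new Error('Proxy credentials are missing');
  }
  const user = encodeURIComponent(username);
  const pass = encodeURIComponent(password);
  return `http://${user}:${pass}@${host}:${port}`;
}

// Demo values only; replace them with your own credentials.
const proxyUrl = buildProxyUrl('demo_user', 'p@ss:word');
console.log(proxyUrl);
// → http://demo_user:p%40ss%3Aword@unblock.oxylabs.io:60000

// The encoded URL still parses into the expected host and original password.
const parsed = new URL(proxyUrl);
console.log(parsed.hostname); // unblock.oxylabs.io
console.log(decodeURIComponent(parsed.password)); // p@ss:word
```

The resulting string can be passed to the `HttpsProxyAgent` constructor in place of the inline template literal. Reading the credentials from environment variables rather than hardcoding them is another option, though the variable names would be your own choice.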