├── readme.md └── src ├── indeed_title.py ├── job_search_payload.json ├── parse_jobs.py ├── result.json └── scraper_api_demo.py /readme.md: -------------------------------------------------------------------------------- 1 | # How to Scrape Indeed 2 | 3 | [![Oxylabs promo code](https://raw.githubusercontent.com/oxylabs/product-integrations/refs/heads/master/Affiliate-Universal-1090x275.png)](https://oxylabs.go2cloud.org/aff_c?offer_id=7&aff_id=877&url_id=112) 4 | 5 | [![](https://dcbadge.vercel.app/api/server/eWsVUJrnG5)](https://discord.gg/Pds3gBmKMH) 6 | 7 | This guide shows how to extract job postings from [Indeed](https://www.indeed.com/) using Oxylabs [Web Scraper API](https://oxylabs.io/products/scraper-api/web) (**1-week free trial**) and Python. 8 | 9 | For the complete guide with in-depth explanations and visuals, check our [blog post](https://oxylabs.io/blog/how-to-scrape-indeed). 10 | 11 | ## Project setup 12 | 13 | ### Creating a virtual environment 14 | 15 | ```bash 16 | python -m venv indeed_env # Windows 17 | python3 -m venv indeed_env # Mac and Linux 18 | ``` 19 | 20 | ### Activating the virtual environment 21 | 22 | ```bash 23 | .\indeed_env\Scripts\Activate # Windows 24 | source indeed_env/bin/activate # Mac and Linux 25 | ``` 26 | 27 | ### Installing libraries 28 | 29 | ```bash 30 | $ pip install requests pandas 31 | ``` 32 | 33 | ## Overview of Web Scraper API 34 | 35 | The following example shows how Web Scraper API works.
36 | 37 | ```python 38 | # scraper_api_demo.py 39 | import requests 40 | payload = { 41 |     "source": "universal", 42 |     "url": "https://www.indeed.com" 43 | } 44 | response = requests.post( 45 |     url="https://realtime.oxylabs.io/v1/queries", 46 |     json=payload, 47 |     auth=("username", "password"),  # replace with your API credentials 48 | ) 49 | print(response.json()) 50 | ``` 51 | 52 | ## Web Scraper API parameters 53 | 54 | ### Parsing the page title and retrieving results in JSON 55 | 56 | ```json 57 | { 58 |     "title": { 59 |         "_fns": [ 60 |             { 61 |                 "_fn": "xpath_one", "_args": ["//title/text()"] 62 |             } 63 |         ] 64 |     } 65 | } 66 | ``` 67 | 68 | If you send this as `parsing_instructions`, the output would be the following JSON. 69 | 70 | ```json 71 | { "title": "Job Search | Indeed", "parse_status_code": 12000 } 72 | ``` 73 | 74 | Note that a `parse_status_code` of `12000` means the page was parsed successfully. 75 | 76 | The following code prints the title of the Indeed page.
77 | 78 | ```python 79 | # indeed_title.py 80 | 81 | import requests 82 | payload = { 83 |     "source": "universal", 84 |     "url": "https://www.indeed.com", 85 |     "parse": True, 86 |     "parsing_instructions": { 87 |         "title": { 88 |             "_fns": [ 89 |                 { 90 |                     "_fn": "xpath_one", 91 |                     "_args": [ 92 |                         "//title/text()" 93 |                     ] 94 |                 } 95 |             ] 96 |         } 97 |     }, 98 | } 99 | response = requests.post( 100 |     url="https://realtime.oxylabs.io/v1/queries", 101 |     json=payload, 102 |     auth=("username", "password"),  # replace with your API credentials 103 | ) 104 | print(response.json()["results"][0]["content"]) 105 | ``` 106 | 107 | ## Scraping Indeed job postings 108 | 109 | ### Selecting a job listing 110 | 111 | ```css 112 | .job_seen_beacon 113 | ``` 114 | 115 | ### Creating the placeholder for a job listing 116 | 117 | ``` 118 | "job_listings": { 119 |     "_fns": [ 120 |         { 121 |             "_fn": "css", 122 |             "_args": [".job_seen_beacon"] 123 |         } 124 |     ], 125 |     "_items": { 126 |         "job_title": { 127 |             "_fns": [ 128 |                 { 129 |                     "_fn": "xpath_one", 130 |                     "_args": [".//h2[contains(@class,'jobTitle')]/a/span/text()"] 131 |                 } 132 |             ] 133 |         }, 134 |         "company_name": { 135 |             "_fns": [ 136 |                 { 137 |                     "_fn": "xpath_one", 138 |                     "_args": [".//span[@data-testid='company-name']/text()"] 139 |                 } 140 |             ] 141 |         }, 142 | ``` 143 | 144 | ### Adding other selectors 145 | 146 | ```json 147 | { 148 | "source": "universal", 149 | "url": "https://www.indeed.com/jobs?q=work+from+home&l=San+Francisco%2C+CA", 150 | "parse": true, 151 |
"parsing_instructions": { 152 | "job_listings": { 153 | "_fns": [ 154 | { 155 | "_fn": "css", 156 | "_args": [".job_seen_beacon"] 157 | } 158 | ], 159 | "_items": { 160 | "job_title": { 161 | "_fns": [ 162 | { 163 | "_fn": "xpath_one", 164 | "_args": [".//h2[contains(@class,'jobTitle')]/a/span/text()"] 165 | } 166 | ] 167 | }, 168 | "company_name": { 169 | "_fns": [ 170 | { 171 | "_fn": "xpath_one", 172 | "_args": [".//span[@data-testid='company-name']/text()"] 173 | } 174 | ] 175 | } 176 | } 177 | } 178 | } 179 | } 180 | ``` 181 | 182 | For the remaining data points (location, salary range, date posted, and job description), see [job_search_payload.json](src/job_search_payload.json). 183 | 184 | ### Saving the payload as a separate JSON file 185 | 186 | ```python 187 | # parse_jobs.py 188 | 189 | import requests 190 | import json 191 | payload = {} 192 | with open("job_search_payload.json") as f: 193 |     payload = json.load(f) 194 | response = requests.post( 195 |     url="https://realtime.oxylabs.io/v1/queries", 196 |     json=payload, 197 |     auth=("username", "password"),  # replace with your API credentials 198 | ) 199 | print(response.status_code) 200 | with open("result.json", "w") as f: 201 |     json.dump(response.json(), f, indent=4) 202 | ``` 203 | 204 | ## Exporting to JSON and CSV 205 | 206 | ```python 207 | # parse_jobs.py (continued; assumes `import pandas as pd` and `data = response.json()`) 208 | with open("results.json", "w") as f: 209 |     json.dump(data, f, indent=4) 210 | df = pd.DataFrame(data["results"][0]["content"]["job_listings"]) 211 | df.to_csv("job_search_results.csv", index=False) 212 | ``` 213 | 214 | ## Final word 215 | 216 | Check our [documentation](https://developers.oxylabs.io/scraper-apis/web-scraper-api) for more details on the API parameters and variables used in this tutorial. 217 | 218 | If you have any questions, feel free to contact us at support@oxylabs.io.
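As a complement to the export step in "Exporting to JSON and CSV", here is a minimal sketch that writes the parsed listings to CSV using only the standard library. It assumes the response has the shape shown in `src/result.json`, and that `content` is an unparsed string rather than a dictionary when parsing was not enabled. The helper names `extract_job_listings` and `listings_to_csv`, the `FIELDS` column order, and the abridged `sample` record are illustrative and not part of the Oxylabs API or this repository.

```python
import csv
import io

# Column order for the CSV export; the keys match the parsing
# instructions in job_search_payload.json.
FIELDS = [
    "job_title",
    "company_name",
    "location",
    "salary_range",
    "date_posted",
    "job_description",
]


def extract_job_listings(data):
    """Return the parsed job listings from a response dictionary."""
    content = data["results"][0]["content"]
    # Assumption: without parsing, content is a raw string, not a dict.
    if not isinstance(content, dict):
        raise ValueError("content is not parsed JSON; was 'parse' enabled?")
    return content.get("job_listings", [])


def listings_to_csv(listings):
    """Serialize listings to CSV text, writing missing values as empty cells."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    for job in listings:
        # salary_range is null for some listings; emit an empty cell instead.
        writer.writerow({key: job.get(key) or "" for key in FIELDS})
    return buffer.getvalue()


# Abridged sample shaped like one record from src/result.json.
sample = {
    "results": [
        {
            "content": {
                "job_listings": [
                    {
                        "job_title": "Workday Developer",
                        "company_name": "MasterBrand Cabinets LLC",
                        "location": "Remote",
                        "salary_range": None,
                        "date_posted": "Posted 30+ days ago",
                        "job_description": "Some off hour and weekend work may be required.",
                    }
                ],
                "parse_status_code": 12005,
            }
        }
    ]
}

jobs = extract_job_listings(sample)
csv_text = listings_to_csv(jobs)
print(csv_text)
```

In a real run, `sample` would be replaced by `response.json()` from `parse_jobs.py`, and `csv_text` could be written to `job_search_results.csv`. Listings without a salary would then show an empty cell.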
219 | -------------------------------------------------------------------------------- /src/indeed_title.py: -------------------------------------------------------------------------------- 1 | import requests 2 | 3 | payload = { 4 | "source": "universal", 5 | "url": "https://www.indeed.com", 6 | "parse": True, 7 | "parsing_instructions": { 8 | "title": {"_fns": [{"_fn": "xpath_one", "_args": ["//title/text()"]}]} 9 | }, 10 | } 11 | 12 | response = requests.post( 13 | url="https://realtime.oxylabs.io/v1/queries", 14 | json=payload, 15 | auth=("john_snow", "APIuser!123"), 16 | ) 17 | 18 | print(response.json()["results"][0]["content"]) 19 | -------------------------------------------------------------------------------- /src/job_search_payload.json: -------------------------------------------------------------------------------- 1 | { 2 | "source": "universal", 3 | "url": "https://www.indeed.com/jobs?q=work+from+home&l=San+Francisco%2C+CA", 4 | "render": "html", 5 | "parse": true, 6 | "parsing_instructions": { 7 | "job_listings": { 8 | "_fns": [ 9 | { 10 | "_fn": "css", 11 | "_args": [".job_seen_beacon"] 12 | } 13 | ], 14 | "_items": { 15 | "job_title": { 16 | "_fns": [ 17 | { 18 | "_fn": "xpath_one", 19 | "_args": [".//h2[contains(@class,'jobTitle')]/a/span/text()"] 20 | } 21 | ] 22 | }, 23 | "company_name": { 24 | "_fns": [ 25 | { 26 | "_fn": "xpath_one", 27 | "_args": [".//span[@data-testid='company-name']/text()"] 28 | } 29 | ] 30 | }, 31 | "location": { 32 | "_fns": [ 33 | { 34 | "_fn": "xpath_one", 35 | "_args": [".//div[@data-testid='text-location']//text()"] 36 | } 37 | ] 38 | }, 39 | "salary_range": { 40 | "_fns": [ 41 | { 42 | "_fn": "xpath_one", 43 | "_args": [".//div[contains(@class, 'salary-snippet-container') or contains(@class, 'estimated-salary')]//text()"] 44 | } 45 | ] 46 | }, 47 | "date_posted": { 48 | "_fns": [ 49 | { 50 | "_fn": "xpath_one", 51 | "_args": [".//span[@class='date']/text()"] 52 | } 53 | ] 54 | }, 55 | "job_description": { 56 | 
"_fns": [ 57 | { 58 | "_fn": "xpath_one", 59 | "_args": ["normalize-space(.//div[@class='job-snippet'])"] 60 | } 61 | ] 62 | } 63 | } 64 | } 65 | } 66 | } -------------------------------------------------------------------------------- /src/parse_jobs.py: -------------------------------------------------------------------------------- 1 | """parse_jobs.py""" 2 | import json 3 | import requests 4 | import pandas as pd 5 | 6 | payload = {} 7 | with open("job_search_payload.json", encoding="utf-8") as f: 8 |     payload = json.load(f) 9 | 10 | response = requests.post( 11 |     url="https://realtime.oxylabs.io/v1/queries", 12 |     json=payload, 13 |     auth=("username", "password"), 14 |     timeout=180, 15 | ) 16 | 17 | print(response.status_code) 18 | 19 | 20 | with open("result.json", "w", encoding="utf-8") as f: 21 |     json.dump(response.json(), f, indent=4) 22 | 23 | 24 | # Save the results into a variable named data 25 | data = response.json() 26 | # Save the Indeed data as a JSON file 27 | 28 | with open("results.json", "w", encoding="utf-8") as f: 29 |     json.dump(data, f, indent=4) 30 | df = pd.DataFrame(data["results"][0]["content"]["job_listings"]) 31 | df.to_csv("job_search_results.csv", index=False) 32 | -------------------------------------------------------------------------------- /src/result.json: -------------------------------------------------------------------------------- 1 | { 2 | "results": [ 3 | { 4 | "content": { 5 | "job_listings": [ 6 | { 7 | "location": "Remote in Surprise, AZ", 8 | "job_title": "Board Certified Behavior Analyst (BCBA) Regional Director", 9 | "date_posted": "Active 23 days ago", 10 | "company_name": "PsycTalent", 11 | "salary_range": "$100,000 - $110,000 a year", 12 | "job_description": "Remote work with travel to West Valley Clinics (Peoria, Surprise, North Phoenix, and Paradise Valley). Manage 8-10 BCBA's."
13 | }, 14 | { 15 | "location": "Remote", 16 | "job_title": "Customer Service Professional - Phone, Chat, Email", 17 | "date_posted": "Active 6 days ago", 18 | "company_name": "Miaplaza Inc.", 19 | "salary_range": "$17 - $23 an hour", 20 | "job_description": "Benefits: *We offer health, dental, and vision insurance for all full-time employees! Our health insurance will also include specialty care like fertility\u2026" 21 | }, 22 | { 23 | "location": "Remote in Ontario, CA", 24 | "job_title": "Licensed Mental Health Therapist (Remote)", 25 | "date_posted": "Posted 12 days ago", 26 | "company_name": "Foresight Mental Health", 27 | "salary_range": "$68,000 - $100,000 a year", 28 | "job_description": "Part-time clinicians see between 7 and 20 clients per week (based on clinician availability) and are offered an hourly rate of $50.00 per hour." 29 | }, 30 | { 31 | "location": "Remote in Westlake, TX 76262", 32 | "job_title": "Director of Clinical Data Management - Remote", 33 | "date_posted": "Active 18 days ago", 34 | "company_name": "Sound Physicians", 35 | "salary_range": null, 36 | "job_description": "Medical insurance, Dental insurance, and Vision insurance. Advanced reasoning to define problems, collect data, establish facts, draw valid conclusions, and\u2026" 37 | }, 38 | { 39 | "location": "Remote", 40 | "job_title": "Workday Developer", 41 | "date_posted": "Posted 30+ days ago", 42 | "company_name": "MasterBrand Cabinets LLC", 43 | "salary_range": null, 44 | "job_description": "Some off hour and weekend work may be required. Solve complex business problems integrating the Workday Cloud application with external applications from a\u2026" 45 | }, 46 | { 47 | "location": "Remote", 48 | "job_title": "Customer Service Representative (Remote)", 49 | "date_posted": "Active 16 days ago", 50 | "company_name": "Interioricons.com", 51 | "salary_range": null, 52 | "job_description": "The ability to work effectively from home with a designated workspace. 
A highly positive person who can provide an excellent customer experience and build long\u2026" 53 | }, 54 | { 55 | "location": "Remote in San Antonio, TX", 56 | "job_title": "Debt Collector*REMOTE* NO Evenings or Weekends-Unlimited Commision", 57 | "date_posted": "Active 5 days ago", 58 | "company_name": "McCarthy, Burgess & Wolff, Inc", 59 | "salary_range": "$15 an hour", 60 | "job_description": "Are you looking to work from home full time? Experienced with using CRM or other customer service management programs. WORK FROM HOME FULL TIME." 61 | }, 62 | { 63 | "location": "Remote", 64 | "job_title": "Transcriptionist", 65 | "date_posted": "Active 13 days ago", 66 | "company_name": "Allegis Transcription", 67 | "salary_range": "$20 - $40 an hour", 68 | "job_description": "Available and willing to commit time to an initial quality development program. A transcription community network with discussion forum and resource library." 69 | }, 70 | { 71 | "location": "Remote in Milwaukee, WI 53201", 72 | "job_title": "Work-From-Home Phone Representative", 73 | "date_posted": "Posted 1 day ago", 74 | "company_name": "DATA ACQUISITION SYSTEMS LLC", 75 | "salary_range": "$22 - $30 an hour", 76 | "job_description": "A high tech company which specializes in providing solutions for small businesses is seeking a motivated Phone Representative to provide exceptional customer\u2026" 77 | }, 78 | { 79 | "location": "Remote in Harrisburg, PA", 80 | "job_title": "Customer Service Representative - Work From Home", 81 | "date_posted": "Posted 4 days ago", 82 | "company_name": "CVS Health", 83 | "salary_range": "$17.00 - $27.90 an hour", 84 | "job_description": "The Company provides a fully-paid term life insurance plan to eligible employees, and short-term and long term disability benefits." 
85 | }, 86 | { 87 | "location": "Remote in Tampa, FL", 88 | "job_title": "Remote Accountant (Part-Time or Full-Time)", 89 | "date_posted": "Posted 30+ days ago", 90 | "company_name": "Supporting Strategies", 91 | "salary_range": null, 92 | "job_description": "Exhibited a passion for and sense of personal satisfaction in delighting clients and helping businesses succeed." 93 | }, 94 | { 95 | "location": "Remote in Parker, CO 80134", 96 | "job_title": "Customer Service Consultant - Work From Home - Denver", 97 | "date_posted": "Posted 30+ days ago", 98 | "company_name": "CarMax", 99 | "salary_range": "$18.00 - $26.70 an hour", 100 | "job_description": "Acquire the Automotive Sales Persons License in specific states - may require testing and travel as some states request physical presence to apply for the\u2026" 101 | }, 102 | { 103 | "location": "Remote", 104 | "job_title": "Live Chat Representative", 105 | "date_posted": "Posted 3 days ago", 106 | "company_name": "Odyssey Health Group", 107 | "salary_range": null, 108 | "job_description": "As a Chat Representative, you will play a key role in providing exceptional customer service to our clients. Bachelor's degree in a relevant field." 109 | }, 110 | { 111 | "location": "Remote in Massachusetts", 112 | "job_title": "Data Entry Clerk", 113 | "date_posted": "Today", 114 | "company_name": "Compliance Medical Services", 115 | "salary_range": null, 116 | "job_description": "These benefits include top-notch health, dental, and vision care for you and your family, life insurance, and a 401(k) plan to help you plan for your future." 
117 | }, 118 | { 119 | "location": "Remote", 120 | "job_title": "Remote Accounting Proposal Manager", 121 | "date_posted": "Posted 30+ days ago", 122 | "company_name": "AccountingDepartment.com", 123 | "salary_range": null, 124 | "job_description": "The Proposal Manager plays a key role in the AccountingDepartment.com\u2019s sales process by working collaboratively with the Business Development Team and Proposal\u2026" 125 | } 126 | ], 127 | "parse_status_code": 12005 128 | }, 129 | "created_at": "2023-11-26 11:11:54", 130 | "updated_at": "2023-11-26 11:13:03", 131 | "page": 1, 132 | "url": "https://www.indeed.com/jobs?q=work+from+home&l=San+Francisco,+CA", 133 | "job_id": "7134499003198100481", 134 | "status_code": 200, 135 | "parser_type": "custom", 136 | "_request": { 137 | "cookies": [], 138 | "headers": { 139 | "Accept": "text/html,application/xhtml+xml,image/jxr,*/*", 140 | "Referer": "https://www.indeed.com/", 141 | "Sec-Ch-Ua": "\"Opera\";v=\"79\", \"Chromium\";v=\"90\", \"Not-A.Brand\";v=\"24\"", 142 | "Connection": "keep-alive", 143 | "User-Agent": "Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.0 Safari/537.36 OPR/79.0.0", 144 | "Sec-Fetch-Dest": "document", 145 | "Sec-Fetch-Mode": "navigate", 146 | "Sec-Fetch-Site": "same-origin", 147 | "Sec-Fetch-User": "?1", 148 | "Accept-Encoding": "br, gzip, deflate", 149 | "Accept-Language": "en-US;q=0.8", 150 | "Sec-Ch-Ua-Model": "\"\"", 151 | "Sec-Ch-Ua-Mobile": "?0", 152 | "Sec-Ch-Ua-Platform": "\"Windows\"", 153 | "Upgrade-Insecure-Requests": "1", 154 | "Sec-Ch-Ua-Platform-Version": "\"10\"", 155 | "Sec-Ch-Ua-Full-Version-List": "\"Opera\";v=\"79.0.0\", \"Chromium\";v=\"90.0.0\", \"Not-A.Brand\";v=\"24\"" 156 | } 157 | }, 158 | "_response": { 159 | "cookies": [ 160 | { 161 | "key": "CTK", 162 | "path": "/", 163 | "value": "1hg5l9tvdis0j800", 164 | "domain": ".indeed.com", 165 | "secure": true, 166 | "comment": "", 167 | "expires": "", 168 | "max-age": "157680000", 169 | 
"version": "", 170 | "httponly": "", 171 | "samesite": "None" 172 | }, 173 | { 174 | "key": "INDEED_CSRF_TOKEN", 175 | "path": "/", 176 | "value": "qmZKIbvrmQ7JFFwjNcNRNyrZmhSfGokl", 177 | "domain": "", 178 | "secure": true, 179 | "comment": "", 180 | "expires": "", 181 | "max-age": "", 182 | "version": "", 183 | "httponly": "", 184 | "samesite": "" 185 | }, 186 | { 187 | "key": "LV", 188 | "path": "/", 189 | "value": "LA=1700997167:CV=1700997167:TS=1700997167", 190 | "domain": "", 191 | "secure": "", 192 | "comment": "", 193 | "expires": "Fri, 24-Nov-2028 11:12:47 GMT", 194 | "max-age": "157680000", 195 | "version": "1", 196 | "httponly": "", 197 | "samesite": "" 198 | }, 199 | { 200 | "key": "PREF", 201 | "path": "/", 202 | "value": "TM=1700997167110:L=San+Francisco%2C+CA", 203 | "domain": ".indeed.com", 204 | "secure": "", 205 | "comment": "", 206 | "expires": "Fri, 24-Nov-2028 11:12:47 GMT", 207 | "max-age": "157680000", 208 | "version": "1", 209 | "httponly": "", 210 | "samesite": "" 211 | }, 212 | { 213 | "key": "ras", 214 | "path": "/", 215 | "value": "", 216 | "domain": "", 217 | "secure": "", 218 | "comment": "", 219 | "expires": "Thu, 01-Jan-1970 00:00:10 GMT", 220 | "max-age": "", 221 | "version": "", 222 | "httponly": "", 223 | "samesite": "" 224 | }, 225 | { 226 | "key": "RQ", 227 | "path": "/", 228 | "value": "q=work+from+home&l=San+Francisco%2C+CA&ts=1700997167117", 229 | "domain": "", 230 | "secure": "", 231 | "comment": "", 232 | "expires": "Fri, 24-Nov-2028 11:12:47 GMT", 233 | "max-age": "157680000", 234 | "version": "1", 235 | "httponly": "", 236 | "samesite": "" 237 | }, 238 | { 239 | "key": "PP", 240 | "path": "/m", 241 | "value": "1", 242 | "domain": "", 243 | "secure": "", 244 | "comment": "", 245 | "expires": "", 246 | "max-age": "", 247 | "version": "", 248 | "httponly": "", 249 | "samesite": "" 250 | }, 251 | { 252 | "key": "__cf_bm", 253 | "path": "/", 254 | "value": 
"_5ek._BLa0K2n1Oycs8YCEEzsdTlr9aUBYMa1P_2q74-1700997167-0-AYi9IVXClLDwFf6ElCxd1z5sUDQKiG3Auzwla7W4YvR1QDSrfJwLhyfbcHUjzgcH8eeGj15xppUJzHHFRYnI7XE=", 255 | "domain": ".indeed.com", 256 | "secure": true, 257 | "comment": "", 258 | "expires": "Sun, 26-Nov-23 11:42:47 GMT", 259 | "max-age": "", 260 | "version": "", 261 | "httponly": true, 262 | "samesite": "None" 263 | }, 264 | { 265 | "key": "_cfuvid", 266 | "path": "/", 267 | "value": "72qV0sXfEK8Rlk6dANSDLZxscAH6NfSaSrZ9Mbo5VOI-1700997167689-0-604800000", 268 | "domain": ".indeed.com", 269 | "secure": true, 270 | "comment": "", 271 | "expires": "", 272 | "max-age": "", 273 | "version": "", 274 | "httponly": true, 275 | "samesite": "None" 276 | } 277 | ], 278 | "headers": { 279 | "date": "Sun, 26 Nov 2023 11:12:47 GMT", 280 | "cf-ray": "82c1b2c61fa7b0c9-ATL", 281 | "server": "cloudflare", 282 | "alt-svc": "h3=\":443\"; ma=86400", 283 | "x-indeed-dp": "cmh/cmh", 284 | "content-type": "text/html;charset=UTF-8", 285 | "cf-cache-status": "DYNAMIC", 286 | "x-frame-options": "SAMEORIGIN", 287 | "content-encoding": "br", 288 | "deployment_group": "cmh", 289 | "content-security-policy": "upgrade-insecure-requests; object-src 'none'; form-action 'self' *.indeed.com indeedapply.indeedusercontent.com/callback/ go.indeedassessments.com/ take.indeedassessments.com/; frame-src 'self' *.indeed.com smartlock.google.com/ accounts.google.com/ www.accounts.google.com/ www.google.com/recaptcha/ hcaptcha.com *.hcaptcha.com www.youtube.com/embed/ indeedapply.indeedusercontent.com/callback/ 6927552.fls.doubleclick.net/ 8232301.fls.doubleclick.net/ d1kxbtm38oz100.cloudfront.net/ *.indeedassessments.com; frame-ancestors 'self' *.indeed.com ; img-src 'self' *.indeed.com data: smartlock.google.com/ accounts.google.com/ googletagmanager.com d2q79iu7y748jz.cloudfront.net ds61c5zfdq8r8.cloudfront.net d3s4xzh46vzktb.cloudfront.net d3fw5vlhllyvee.cloudfront.net d3hbwax96mbv6t.cloudfront.net dl76z5cuv1as3.cloudfront.net www.facebook.com/tr/ 
sb.scorecardresearch.com bs.serving-sys.com maps.googleapis.com csi.gstatic.com maps.gstatic.com www.youtube.com www.google-analytics.com/collect ad.doubleclick.net/ddm/ adservice.google.com www.googletagmanager.com/ stats.g.doubleclick.net www.google.co.jp/ads/ga-audiences www.google.com/ads/ga-audience www.google.com/ads/ga-audiences jas.indeednps.com api.pitneybowes.com api.precisely.com locate.pitneybowes.com api-maps.precisely.com res.cloudinary.com staging-pt.ispot.tv pt.ispot.tv developer.precisely.com/images/precisely-logo-purple.gif pxl.indeed.com/usersync match.prod.bidr.io/cookie-sync/indeed firebaseinstallations.googleapis.com fcmregistrations.googleapis.com 6927552.fls.doubleclick.net/ 8232301.fls.doubleclick.net/ i.ytimg.com/ ucarecdn.com app.appsflyer.com https://d23zm5r1n38khq.cloudfront.net https://click.appcast.io https://jsv3.recruitics.com; script-src 'self' *.indeed.com 'unsafe-inline' 'unsafe-eval' data: blob: smartlock.google.com/ accounts.google.com/ d3hbwax96mbv6t.cloudfront.net d2q79iu7y748jz.cloudfront.net d13w2n5a9i6r56.cloudfront.net d3fw5vlhllyvee.cloudfront.net dl76z5cuv1as3.cloudfront.net www.google-analytics.com sb.scorecardresearch.com connect.facebook.net *.serving-sys.com maps.googleapis.com csi.gstatic.com ad.doubleclick.net/ddm/ www.google.com/recaptcha/ hcaptcha.com *.hcaptcha.com www.gstatic.com www.youtube.com maps.gstatic.com adservice.google.com www.googletagmanager.com/ stats.g.doubleclick.net www.google.co.jp/ads/ga-audiences www.google.com/ads/ga-audience www.google.com/ads/ga-audiences firebaseinstallations.googleapis.com fcmregistrations.googleapis.com static.cloudflareinsights.com *.browser-intake-datadoghq.com https://www.datadoghq-browser-agent.com; style-src 'self' 'unsafe-inline' data: *.indeed.com c03.s3.indeed.com c03.s3.indeed.com c03.s3.indeed.com c03.s3.indeed.com accounts.google.com/ d3hbwax96mbv6t.cloudfront.net d12632ofg6v5f7.cloudfront.net d3fw5vlhllyvee.cloudfront.net d13w2n5a9i6r56.cloudfront.net 
dl76z5cuv1as3.cloudfront.net fonts.googleapis.com/; default-src 'self' 'unsafe-inline' data: *.indeed.com c03.s3.indeed.com c03.s3.indeed.com c03.s3.indeed.com c03.s3.indeed.com accounts.google.com/ d1ymdoy4af119w.cloudfront.net/ d3fw5vlhllyvee.cloudfront.net dl76z5cuv1as3.cloudfront.net www.google-analytics.com www.google.com/maps/search/ hcaptcha.com *.hcaptcha.com stats.g.doubleclick.net privacyportal.onetrust.com api.pitneybowes.com api.precisely.com api-maps.precisely.com fonts.gstatic.com/ res.cloudinary.com developer.precisely.com/images/precisely-logo-purple.gif firebaseinstallations.googleapis.com fcmregistrations.googleapis.com rs.fullstory.com/rec/ i.ytimg.com/ static.cloudflareinsights.com *.browser-intake-datadoghq.com https://www.datadoghq-browser-agent.com https://click.appcast.io https://jsv3.recruitics.com;" 290 | } 291 | }, 292 | "session_info": { 293 | "id": null, 294 | "expires_at": null, 295 | "remaining": null 296 | } 297 | } 298 | ], 299 | "job": { 300 | "callback_url": "https://realtime.oxylabs.io:443/api/done", 301 | "client_id": 36698, 302 | "context": [ 303 | { 304 | "key": "force_headers", 305 | "value": null 306 | }, 307 | { 308 | "key": "successful_status_codes", 309 | "value": [] 310 | }, 311 | { 312 | "key": "follow_redirects", 313 | "value": null 314 | }, 315 | { 316 | "key": "cookies", 317 | "value": [] 318 | }, 319 | { 320 | "key": "headers", 321 | "value": [] 322 | }, 323 | { 324 | "key": "session_id", 325 | "value": null 326 | }, 327 | { 328 | "key": "http_method", 329 | "value": "get" 330 | }, 331 | { 332 | "key": "content", 333 | "value": null 334 | }, 335 | { 336 | "key": "store_id", 337 | "value": null 338 | } 339 | ], 340 | "created_at": "2023-11-26 11:11:54", 341 | "domain": "com", 342 | "geo_location": null, 343 | "id": "7134499003198100481", 344 | "limit": 10, 345 | "locale": null, 346 | "pages": 1, 347 | "parse": true, 348 | "parser_type": "custom", 349 | "parsing_instructions": { 350 | "job_listings": { 351 | "_fns": [ 
352 | { 353 | "_fn": "css", 354 | "_args": [ 355 | ".job_seen_beacon" 356 | ] 357 | } 358 | ], 359 | "_items": { 360 | "location": { 361 | "_fns": [ 362 | { 363 | "_fn": "xpath_one", 364 | "_args": [ 365 | ".//div[@data-testid='text-location']//text()" 366 | ] 367 | } 368 | ] 369 | }, 370 | "job_title": { 371 | "_fns": [ 372 | { 373 | "_fn": "xpath_one", 374 | "_args": [ 375 | ".//h2[contains(@class,'jobTitle')]/a/span/text()" 376 | ] 377 | } 378 | ] 379 | }, 380 | "date_posted": { 381 | "_fns": [ 382 | { 383 | "_fn": "xpath_one", 384 | "_args": [ 385 | ".//span[@class='date']/text()" 386 | ] 387 | } 388 | ] 389 | }, 390 | "company_name": { 391 | "_fns": [ 392 | { 393 | "_fn": "xpath_one", 394 | "_args": [ 395 | ".//span[@data-testid='company-name']/text()" 396 | ] 397 | } 398 | ] 399 | }, 400 | "salary_range": { 401 | "_fns": [ 402 | { 403 | "_fn": "xpath_one", 404 | "_args": [ 405 | ".//div[contains(@class, 'salary-snippet-container') or contains(@class, 'estimated-salary')]//text()" 406 | ] 407 | } 408 | ] 409 | }, 410 | "job_description": { 411 | "_fns": [ 412 | { 413 | "_fn": "xpath_one", 414 | "_args": [ 415 | "normalize-space(.//div[@class='job-snippet'])" 416 | ] 417 | } 418 | ] 419 | } 420 | } 421 | } 422 | }, 423 | "browser_instructions": null, 424 | "render": "html", 425 | "url": "https://www.indeed.com/jobs?q=work+from+home&l=San+Francisco%2C+CA", 426 | "query": "", 427 | "source": "universal", 428 | "start_page": 1, 429 | "status": "done", 430 | "storage_type": null, 431 | "storage_url": null, 432 | "subdomain": "www", 433 | "content_encoding": "utf-8", 434 | "updated_at": "2023-11-26 11:13:03", 435 | "user_agent_type": "desktop", 436 | "session_info": null, 437 | "statuses": [], 438 | "client_notes": null, 439 | "_links": [ 440 | { 441 | "rel": "self", 442 | "href": "http://data.oxylabs.io/v1/queries/7134499003198100481", 443 | "method": "GET" 444 | }, 445 | { 446 | "rel": "results", 447 | "href": 
"http://data.oxylabs.io/v1/queries/7134499003198100481/results", 448 | "method": "GET" 449 | } 450 | ] 451 | } 452 | } -------------------------------------------------------------------------------- /src/scraper_api_demo.py: -------------------------------------------------------------------------------- 1 | import requests 2 | 3 | payload = {"source": "universal", "url": "https://www.indeed.com"} 4 | 5 | response = requests.post( 6 | url="https://realtime.oxylabs.io/v1/queries", 7 | json=payload, 8 | auth=("username", "password"), 9 | ) 10 | 11 | print(response.json()) 12 | --------------------------------------------------------------------------------