├── .gitignore
├── README.md
├── indeed
│   ├── __init__.py
│   ├── items.py
│   ├── middlewares.py
│   ├── pipelines.py
│   ├── settings.py
│   └── spiders
│       ├── __init__.py
│       ├── jobs_spider.py
│       └── search_spider.py
├── requirements.txt
└── scrapy.cfg

/.gitignore:
--------------------------------------------------------------------------------
1 | # Byte-compiled / optimized / DLL files
2 | __pycache__/
3 | *.py[cod]
4 | *$py.class
5 | 
6 | # C extensions
7 | *.so
8 | 
9 | # Distribution / packaging
10 | .Python
11 | build/
12 | develop-eggs/
13 | dist/
14 | downloads/
15 | eggs/
16 | .eggs/
17 | lib/
18 | lib64/
19 | parts/
20 | sdist/
21 | var/
22 | wheels/
23 | pip-wheel-metadata/
24 | share/python-wheels/
25 | *.egg-info/
26 | .installed.cfg
27 | *.egg
28 | MANIFEST
29 | venv/
30 | 
31 | 
32 | ## Custom
33 | data/
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # indeed-python-scrapy-scraper
2 | Python Scrapy spider that scrapes job data from [Indeed.com](https://www.indeed.com/). There are two versions:
3 | 
4 | 1. **Scrapes Job Summary Data:** The scraper will query the Indeed search page with your query parameters and extract the job data directly from the search results.
5 | 2. **Scrapes Full Job Data:** The scraper will crawl the Indeed search pages with your query parameters, then send a request to each individual job page and scrape all the job data from the page.
6 | 
7 | Both of these scrapers only scrape some of the available data; however, you can easily expand them to scrape other data that is available in the response.
8 | 
9 | These scrapers extract the following fields from Indeed jobs pages:
10 | 
11 | - Company Name
12 | - Company Location
13 | - Job Title
14 | - Job Description
15 | - Job Salary
16 | - Job Location
17 | - Etc.
18 | 
19 | The following article goes through in detail how these Indeed spiders were developed, which you can use to understand the spiders and edit them for your own use case.
20 | 
21 | [Python Scrapy: Build An Indeed Scraper](https://scrapeops.io/python-scrapy-playbook/python-scrapy-indeed-scraper/)
22 | 
23 | ## ScrapeOps Proxy
24 | This Indeed spider uses the [ScrapeOps Proxy](https://scrapeops.io/proxy-aggregator/) as its proxy solution. ScrapeOps has a free plan that allows you to make up to 1,000 requests per month, which makes it ideal for the development phase, but it can easily be scaled up to millions of pages per month if need be.
25 | 
26 | You can [sign up for a free API key here](https://scrapeops.io/app/register/main).
27 | 
28 | 
29 | ## ScrapeOps Monitoring
30 | To monitor our scraper, this spider uses the [ScrapeOps Monitor](https://scrapeops.io/monitoring-scheduling/), a free monitoring tool specifically designed for web scraping.
31 | 
32 | **Live demo here:** [ScrapeOps Demo](https://scrapeops.io/app/login/demo)
33 | 
34 | ![ScrapeOps Dashboard](https://scrapeops.io/assets/images/scrapeops-promo-286a59166d9f41db1c195f619aa36a06.png)
35 | 
36 | 
37 | ## Installing Required Modules
38 | 
39 | To install the required modules into your Python virtual environment,
40 | run the following from the top level of the project:
41 | 
42 | ```
43 | 
44 | pip install -r requirements.txt
45 | 
46 | ```
47 | 
48 | ### Troubleshooting
49 | 
50 | If you have issues running `scrapy crawl` after installing the above, try deactivating your virtual environment and then reactivating it.
51 | ```
52 | 
53 | deactivate
54 | 
55 | ```
56 | Followed by
57 | 
58 | ```
59 | 
60 | source venv/bin/activate
61 | 
62 | ```
63 | 
64 | 
65 | 
66 | ## Running The Scrapers
67 | 
68 | To run the Indeed spiders, first set the job search parameters you want to query by updating the `keyword_list` and `location_list` lists in the spiders:
69 | 
70 | ```python
71 | 
72 | def start_requests(self):
73 |     keyword_list = ['software engineer']
74 |     location_list = ['California']
75 |     for keyword in keyword_list:
76 |         for location in location_list:
77 |             indeed_jobs_url = self.get_indeed_search_url(keyword, location)
78 |             yield scrapy.Request(url=indeed_jobs_url, callback=self.parse_search_results, meta={'keyword': keyword, 'location': location, 'offset': 0})
79 | 
80 | ```
81 | 
82 | Then to run the spiders, enter one of the following commands:
83 | 
84 | | Spider | Command |
85 | |----------|-------------|
86 | | **Job Summary Data** | `scrapy crawl indeed_search` |
87 | | **Full Job Data** | `scrapy crawl indeed_jobs` |
88 | 
89 | 
90 | ## Customizing The Indeed Scraper
91 | The following are instructions on how to modify the Indeed scrapers for your particular use case.
92 | 
93 | Check out this [guide to building an Indeed.com Scrapy spider](https://scrapeops.io/python-scrapy-playbook/python-scrapy-indeed-scraper/) if you need any more information.
94 | 
95 | ### Configuring Job Search
96 | To change the query parameters for the job search, just change the keywords and locations in the `keyword_list` and `location_list` lists in each spider.
97 | 
98 | For example:
99 | 
100 | ```python
101 | 
102 | def start_requests(self):
103 |     keyword_list = ['software engineer', 'devops engineer', 'product manager']
104 |     location_list = ['California', 'texas']
105 |     for keyword in keyword_list:
106 |         for location in location_list:
107 |             indeed_jobs_url = self.get_indeed_search_url(keyword, location)
108 |             yield scrapy.Request(url=indeed_jobs_url, callback=self.parse_search_results, meta={'keyword': keyword, 'location': location, 'offset': 0})
109 | 
110 | ```
111 | 
112 | ### Extract More/Different Data
113 | The JSON blobs the spiders extract the job data from are pretty big, so the spiders are configured to parse only some of the data.
114 | 
115 | You can expand or change the data that gets extracted by changing the yield statements:
116 | 
117 | ```python
118 | 
119 | yield {
120 |     'keyword': keyword,
121 |     'location': location,
122 |     'page': round(offset / 10) + 1 if offset > 0 else 1,
123 |     'position': index,
124 |     'company': job.get('company'),
125 |     'companyRating': job.get('companyRating'),
126 |     'companyReviewCount': job.get('companyReviewCount'),
127 |     ## add any other fields you need from the job card JSON here
128 |     'highlyRatedEmployer': job.get('highlyRatedEmployer'),
129 |     'jobkey': job.get('jobkey'),
130 |     'jobTitle': job.get('title'),
131 |     'jobLocationCity': job.get('jobLocationCity'),
132 |     'jobLocationPostal': job.get('jobLocationPostal'),
133 |     'jobLocationState': job.get('jobLocationState'),
134 |     'maxSalary': job.get('estimatedSalary').get('max') if job.get('estimatedSalary') is not None else 0,
135 |     'minSalary': job.get('estimatedSalary').get('min') if job.get('estimatedSalary') is not None else 0,
136 |     'salaryType': job.get('estimatedSalary').get('type') if job.get('estimatedSalary') is not None else 'none',
137 |     'pubDate': job.get('pubDate'),
138 | }
139 | 
140 | ```
141 | 
142 | ### Speeding Up The Crawl
143 | The spiders are set to use only 1 concurrent thread in the `settings.py` file, as the ScrapeOps Free Proxy Plan only gives you 1 concurrent thread.
144 | 
145 | However, if you upgrade to a paid ScrapeOps Proxy plan, you will have more concurrent threads and can then increase the concurrency limit in your scraper by updating the `CONCURRENT_REQUESTS` value in your `settings.py` file.
146 | 
147 | ```python
148 | # settings.py
149 | 
150 | CONCURRENT_REQUESTS = 10
151 | 
152 | ```
153 | 
154 | ### Storing Data
155 | The spiders are set to save the scraped data into a CSV file and store it in a `data` folder using [Scrapy's Feed Export functionality](https://docs.scrapy.org/en/latest/topics/feed-exports.html).
156 | 
157 | ```python
158 | 
159 | custom_settings = {
160 |     'FEEDS': { 'data/%(name)s_%(time)s.csv': { 'format': 'csv',}}
161 | }
162 | 
163 | ```
164 | 
165 | If you would like to save your CSV files to an AWS S3 bucket, then check out our [Saving CSV/JSON Files to Amazon AWS S3 Bucket guide here](https://scrapeops.io//python-scrapy-playbook/scrapy-save-aws-s3)
166 | 
167 | Or if you would like to save your data to another type of database, then be sure to check out these guides:
168 | 
169 | - [Saving Data to JSON](https://scrapeops.io/python-scrapy-playbook/scrapy-save-json-files)
170 | - [Saving Data to SQLite Database](https://scrapeops.io/python-scrapy-playbook/scrapy-save-data-sqlite)
171 | - [Saving Data to MySQL Database](https://scrapeops.io/python-scrapy-playbook/scrapy-save-data-mysql)
172 | - [Saving Data to Postgres Database](https://scrapeops.io/python-scrapy-playbook/scrapy-save-data-postgres)
173 | 
174 | ### Deactivating ScrapeOps Proxy & Monitor
175 | To deactivate the ScrapeOps Proxy & Monitor, simply comment out the following code in your `settings.py` file:
176 | 
177 | ```python
178 | # settings.py
179 | 
180 | # ## Enable ScrapeOps Proxy
181 | # SCRAPEOPS_PROXY_ENABLED = True
182 | 
183 | # # Add In The ScrapeOps Monitoring Extension
184 | # EXTENSIONS = {
185 | #     'scrapeops_scrapy.extension.ScrapeOpsMonitor': 500,
186 | # }
187 | 
188 | 
189 | # DOWNLOADER_MIDDLEWARES = {
190 | 
191 | #     ## ScrapeOps Monitor
192 | #     'scrapeops_scrapy.middleware.retry.RetryMiddleware': 550,
193 | #     'scrapy.downloadermiddlewares.retry.RetryMiddleware': None,
194 | 
195 | #     ## Proxy Middleware
196 | #     'indeed.middlewares.ScrapeOpsProxyMiddleware': 725,
197 | # }
198 | 
199 | ```
200 | 
201 | 
--------------------------------------------------------------------------------
/indeed/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/python-scrapy-playbook/indeed-python-scrapy-scraper/3a1cb7d124a53a2945e07967d074b1cff4d00f85/indeed/__init__.py
--------------------------------------------------------------------------------
/indeed/items.py:
--------------------------------------------------------------------------------
1 | # Define here the models for your scraped items
2 | #
3 | # See documentation in:
4 | # https://docs.scrapy.org/en/latest/topics/items.html
5 | 
6 | import scrapy
7 | 
8 | 
9 | class IndeedItem(scrapy.Item):
10 |     # define the fields for your item here like:
11 |     # name = scrapy.Field()
12 |     pass
13 | 
--------------------------------------------------------------------------------
/indeed/middlewares.py:
--------------------------------------------------------------------------------
1 | from urllib.parse import urlencode
2 | from scrapy import Request
3 | class ScrapeOpsProxyMiddleware:
4 | 
5 |     @classmethod
6 |     def from_crawler(cls, crawler):
7 |         return cls(crawler.settings)
8 | 
9 | 
10 |     def __init__(self, settings):
11 |         self.scrapeops_api_key = settings.get('SCRAPEOPS_API_KEY')
12 |         self.scrapeops_endpoint = 'https://proxy.scrapeops.io/v1/?'
13 |         self.scrapeops_proxy_active = settings.get('SCRAPEOPS_PROXY_ENABLED', False)
14 | 
15 | 
16 |     @staticmethod
17 |     def _param_is_true(request, key):
18 |         if request.meta.get(key, False) is True or str(request.meta.get(key, 'false')).lower() == 'true':
19 |             return True
20 |         return False
21 | 
22 | 
23 |     @staticmethod
24 |     def _replace_response_url(response):
25 |         real_url = response.headers.get('Sops-Final-Url')
26 |         if real_url is None:
27 |             return response
28 |         return response.replace(url=real_url.decode(response.headers.encoding))
29 | 
30 | 
31 |     def _get_scrapeops_url(self, request):
32 |         payload = {'api_key': self.scrapeops_api_key, 'url': request.url}
33 |         if self._param_is_true(request, 'sops_render_js'):
34 |             payload['render_js'] = True
35 |         if self._param_is_true(request, 'sops_residential'):
36 |             payload['residential'] = True
37 |         if self._param_is_true(request, 'sops_keep_headers'):
38 |             payload['keep_headers'] = True
39 |         if request.meta.get('sops_country') is not None:
40 |             payload['country'] = request.meta.get('sops_country')
41 |         proxy_url = self.scrapeops_endpoint + urlencode(payload)
42 |         return proxy_url
43 | 
44 | 
45 |     def _scrapeops_proxy_enabled(self):
46 |         if self.scrapeops_api_key is None or self.scrapeops_api_key == '' or not self.scrapeops_proxy_active:
47 |             return False
48 |         return True
49 | 
50 |     def process_request(self, request, spider):
51 |         if self._scrapeops_proxy_enabled() is False or self.scrapeops_endpoint in request.url:
52 |             return None
53 | 
54 |         scrapeops_url = self._get_scrapeops_url(request)
55 |         new_request = request.replace(
56 |             cls=Request, url=scrapeops_url, meta=request.meta)
57 |         return new_request
58 | 
59 | 
60 |     def process_response(self, request, response, spider):
61 |         new_response = self._replace_response_url(response)
62 |         return new_response
63 | 
--------------------------------------------------------------------------------
/indeed/pipelines.py:
--------------------------------------------------------------------------------
1 | # Define your item pipelines here
2 | #
3 | # Don't forget to add your pipeline to the ITEM_PIPELINES setting
4 | # See: https://docs.scrapy.org/en/latest/topics/item-pipeline.html
5 | 
6 | 
7 | # useful for handling different item types with a single interface
8 | from itemadapter import ItemAdapter
9 | 
10 | 
11 | class IndeedPipeline:
12 |     def process_item(self, item, spider):
13 |         return item
14 | 
--------------------------------------------------------------------------------
/indeed/settings.py:
--------------------------------------------------------------------------------
1 | 
2 | BOT_NAME = 'indeed'
3 | 
4 | SPIDER_MODULES = ['indeed.spiders']
5 | NEWSPIDER_MODULE = 'indeed.spiders'
6 | 
7 | 
8 | # Obey robots.txt rules
9 | ROBOTSTXT_OBEY = False
10 | 
11 | ## ScrapeOps API Key
12 | SCRAPEOPS_API_KEY = 'YOUR_API_KEY' ## Get Free API KEY here: https://scrapeops.io/app/register/main
13 | 
14 | ## Enable ScrapeOps Proxy
15 | SCRAPEOPS_PROXY_ENABLED = True
16 | 
17 | # Add In The ScrapeOps Monitoring Extension
18 | EXTENSIONS = {
19 |     'scrapeops_scrapy.extension.ScrapeOpsMonitor': 500,
20 | }
21 | 
22 | 
23 | DOWNLOADER_MIDDLEWARES = {
24 | 
25 |     ## ScrapeOps Monitor
26 |     'scrapeops_scrapy.middleware.retry.RetryMiddleware': 550,
27 |     'scrapy.downloadermiddlewares.retry.RetryMiddleware': None,
28 | 
29 |     ## Proxy Middleware
30 |     'indeed.middlewares.ScrapeOpsProxyMiddleware': 725,
31 | }
32 | 
33 | # Max Concurrency On ScrapeOps Proxy Free Plan is 1 thread
34 | CONCURRENT_REQUESTS = 1
35 | 
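36 | ## Optional tuning examples (commented out). These are standard Scrapy settings and the
37 | ## values below are illustrative assumptions only; adjust them to your own plan and use case.
38 | # CONCURRENT_REQUESTS = 10        ## only raise this if your ScrapeOps plan allows more than 1 concurrent thread
39 | # DOWNLOAD_DELAY = 1              ## seconds to wait between requests if you want to slow the crawl down
40 | # FEED_EXPORT_ENCODING = 'utf-8'  ## force UTF-8 encoding for the exported CSV files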
--------------------------------------------------------------------------------
/indeed/spiders/__init__.py:
--------------------------------------------------------------------------------
1 | # This package will contain the spiders of your Scrapy project
2 | #
3 | # Please refer to the documentation for information on how to create and manage
4 | # your spiders.
5 | 
--------------------------------------------------------------------------------
/indeed/spiders/jobs_spider.py:
--------------------------------------------------------------------------------
1 | import re
2 | import json
3 | import scrapy
4 | from urllib.parse import urlencode
5 | 
6 | class IndeedJobSpider(scrapy.Spider):
7 |     name = "indeed_jobs"
8 |     custom_settings = {
9 |         'FEEDS': { 'data/%(name)s_%(time)s.csv': { 'format': 'csv',}}
10 |     }
11 | 
12 |     def get_indeed_search_url(self, keyword, location, offset=0):
13 |         parameters = {"q": keyword, "l": location, "filter": 0, "start": offset}
14 |         return "https://www.indeed.com/jobs?" + urlencode(parameters)
15 | 
16 | 
17 |     def start_requests(self):
18 |         keyword_list = ['software engineer']
19 |         location_list = ['California']
20 |         for keyword in keyword_list:
21 |             for location in location_list:
22 |                 indeed_jobs_url = self.get_indeed_search_url(keyword, location)
23 |                 yield scrapy.Request(url=indeed_jobs_url, callback=self.parse_search_results, meta={'keyword': keyword, 'location': location, 'offset': 0})
24 | 
25 |     def parse_search_results(self, response):
26 |         location = response.meta['location']
27 |         keyword = response.meta['keyword']
28 |         offset = response.meta['offset']
29 |         script_tag = re.findall(r'window.mosaic.providerData\["mosaic-provider-jobcards"\]=(\{.+?\});', response.text)
30 |         if script_tag:  ## re.findall returns a list, so check that it found a match
31 |             json_blob = json.loads(script_tag[0])
32 | 
33 |             ## Extract Jobs From Search Page
34 |             jobs_list = json_blob['metaData']['mosaicProviderJobCardsModel']['results']
35 |             for index, job in enumerate(jobs_list):
36 |                 if job.get('jobkey') is not None:
37 |                     job_url = 'https://www.indeed.com/m/basecamp/viewjob?viewtype=embedded&jk=' + job.get('jobkey')
38 |                     yield scrapy.Request(url=job_url,
39 |                             callback=self.parse_job,
40 |                             meta={
41 |                                 'keyword': keyword,
42 |                                 'location': location,
43 |                                 'page': round(offset / 10) + 1 if offset > 0 else 1,
44 |                                 'position': index,
45 |                                 'jobKey': job.get('jobkey'),
46 |                             })
47 | 
48 | 
49 |             # Paginate Through Jobs Pages
50 |             if offset == 0:
51 |                 meta_data = json_blob["metaData"]["mosaicProviderJobCardsModel"]["tierSummaries"]
52 |                 num_results = sum(category["jobCount"] for category in meta_data)
53 |                 if num_results > 1000:
54 |                     num_results = 50
55 | 
56 |                 for offset in range(10, num_results + 10, 10):
57 |                     url = self.get_indeed_search_url(keyword, location, offset)
58 |                     yield scrapy.Request(url=url, callback=self.parse_search_results, meta={'keyword': keyword, 'location': location, 'offset': offset})
59 | 
60 |     def parse_job(self, response):
61 |         location = response.meta['location']
62 |         keyword = response.meta['keyword']
63 |         page = response.meta['page']
64 |         position = response.meta['position']
65 | 
66 | 
67 |         script_tag = re.findall(r"_initialData=(\{.+?\});", response.text)
68 |         if script_tag:  ## re.findall returns a list, so check that it found a match
69 |             json_blob = json.loads(script_tag[0])
70 |             job = json_blob["jobInfoWrapperModel"]["jobInfoModel"]['jobInfoHeaderModel']
71 |             sanitizedJobDescription = json_blob["jobInfoWrapperModel"]["jobInfoModel"]['sanitizedJobDescription']
72 |             yield {
73 |                 'keyword': keyword,
74 |                 'location': location,
75 |                 'page': page,
76 |                 'position': position,
77 |                 'company': job.get('companyName'),
78 |                 'jobkey': response.meta['jobKey'],
79 |                 'jobTitle': job.get('jobTitle'),
80 |                 'jobDescription': sanitizedJobDescription
81 |             }
82 | 
83 | 
84 | 
85 | 
86 | 
--------------------------------------------------------------------------------
/indeed/spiders/search_spider.py:
--------------------------------------------------------------------------------
1 | import re
2 | import json
3 | import scrapy
4 | from urllib.parse import urlencode
5 | 
6 | class IndeedSearchSpider(scrapy.Spider):
7 |     name = "indeed_search"
8 |     custom_settings = {
9 |         'FEEDS': { 'data/%(name)s_%(time)s.csv': { 'format': 'csv',}}
10 |     }
11 | 
12 |     def get_indeed_search_url(self, keyword, location, offset=0):
13 |         parameters = {"q": keyword, "l": location, "filter": 0, "start": offset}
14 |         return "https://www.indeed.com/jobs?" + urlencode(parameters)
15 | 
16 |     def start_requests(self):
17 |         keyword_list = ['software engineer']
18 |         location_list = ['California']
19 |         for keyword in keyword_list:
20 |             for location in location_list:
21 |                 indeed_jobs_url = self.get_indeed_search_url(keyword, location)
22 |                 yield scrapy.Request(url=indeed_jobs_url, callback=self.parse_search_results, meta={'keyword': keyword, 'location': location, 'offset': 0})
23 | 
24 |     def parse_search_results(self, response):
25 |         location = response.meta['location']
26 |         keyword = response.meta['keyword']
27 |         offset = response.meta['offset']
28 |         script_tag = re.findall(r'window.mosaic.providerData\["mosaic-provider-jobcards"\]=(\{.+?\});', response.text)
29 |         if script_tag:  ## re.findall returns a list, so check that it found a match
30 |             json_blob = json.loads(script_tag[0])
31 | 
32 |             ## Extract Jobs From Search Page
33 |             jobs_list = json_blob['metaData']['mosaicProviderJobCardsModel']['results']
34 |             for index, job in enumerate(jobs_list):
35 |                 yield {
36 |                     'keyword': keyword,
37 |                     'location': location,
38 |                     'page': round(offset / 10) + 1 if offset > 0 else 1,
39 |                     'position': index,
40 |                     'company': job.get('company'),
41 |                     'companyRating': job.get('companyRating'),
42 |                     'companyReviewCount': job.get('companyReviewCount'),
43 |                     ## add any other fields you need from the job card JSON here
44 |                     'highlyRatedEmployer': job.get('highlyRatedEmployer'),
45 |                     'jobkey': job.get('jobkey'),
46 |                     'jobTitle': job.get('title'),
47 |                     'jobLocationCity': job.get('jobLocationCity'),
48 |                     'jobLocationPostal': job.get('jobLocationPostal'),
49 |                     'jobLocationState': job.get('jobLocationState'),
50 |                     'maxSalary': job.get('estimatedSalary').get('max') if job.get('estimatedSalary') is not None else 0,
51 |                     'minSalary': job.get('estimatedSalary').get('min') if job.get('estimatedSalary') is not None else 0,
52 |                     'salaryType': job.get('estimatedSalary').get('type') if job.get('estimatedSalary') is not None else 'none',
53 |                     'pubDate': job.get('pubDate'),
54 |                 }
55 | 
56 |             ## Paginate Through Jobs Pages
57 |             if offset == 0:
58 |                 meta_data = json_blob["metaData"]["mosaicProviderJobCardsModel"]["tierSummaries"]
59 |                 num_results = sum(category["jobCount"] for category in meta_data)
60 |                 if num_results > 1000:
61 |                     num_results = 50
62 | 
63 |                 for offset in range(10, num_results + 10, 10):
64 |                     url = self.get_indeed_search_url(keyword, location, offset)
65 |                     yield scrapy.Request(url=url, callback=self.parse_search_results, meta={'keyword': keyword, 'location': location, 'offset': offset})
66 | 
67 | 
68 | 
69 | 
70 | 
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | attrs==22.2.0
2 | Automat==22.10.0
3 | certifi==2022.12.7
4 | cffi==1.15.1
5 | charset-normalizer==3.1.0
6 | constantly==15.1.0
7 | cryptography==40.0.1
8 | cssselect==1.2.0
9 | filelock==3.10.7
10 | hyperlink==21.0.0
11 | idna==3.4
12 | incremental==22.10.0
13 | itemadapter==0.7.0
14 | itemloaders==1.0.6
15 | jmespath==1.0.1
16 | json5==0.9.11
17 | lxml==4.9.2
18 | packaging==23.0
19 | parsel==1.7.0
20 | Protego==0.2.1
21 | pyasn1==0.4.8
22 | pyasn1-modules==0.2.8
23 | pycparser==2.21
24 | PyDispatcher==2.0.7
25 | pyOpenSSL==23.1.1
26 | queuelib==1.6.2
27 | requests==2.28.2
28 | requests-file==1.5.1
29 | scrapeops-scrapy==0.5.2
30 | scrapeops-scrapy-proxy-sdk==1.0
31 | Scrapy==2.8.0
32 | service-identity==21.1.0
33 | six==1.16.0
34 | tld==0.13
35 | tldextract==3.4.0
36 | Twisted==22.10.0
37 | typing_extensions==4.5.0
38 | urllib3==1.26.15
39 | w3lib==2.1.1
40 | zope.interface==6.0
41 | 
--------------------------------------------------------------------------------
/scrapy.cfg:
--------------------------------------------------------------------------------
1 | # Automatically created by: scrapy startproject
2 | #
3 | # For more information about the [deploy] section see:
4 | # https://scrapyd.readthedocs.io/en/latest/deploy.html
5 | 
6 | [settings]
7 | default = indeed.settings
8 | 
9 | [deploy]
10 | #url = http://localhost:6800/
11 | project = indeed
12 | 
--------------------------------------------------------------------------------