├── Apify Scraper Tool
│   ├── requirements.txt
│   ├── src
│   │   ├── __main__.py
│   │   └── main.py
│   └── README.md
└── README.md

/Apify Scraper Tool/requirements.txt:
--------------------------------------------------------------------------------
# Feel free to add your Python dependencies below. For formatting guidelines, see:
# https://pip.pypa.io/en/latest/reference/requirements-file-format/

apify ~= 1.7.0
beautifulsoup4 ~= 4.12.2
httpx ~= 0.25.2
types-beautifulsoup4 ~= 4.12.0.7
requests ~= 2.28.1
--------------------------------------------------------------------------------
/Apify Scraper Tool/src/__main__.py:
--------------------------------------------------------------------------------
"""
This module serves as the entry point for executing the Apify Actor. It handles the configuration
of logging settings. The `main()` coroutine is then executed using `asyncio.run()`.

Feel free to modify this file to suit your specific needs.
"""

import asyncio
import logging

from apify.log import ActorLogFormatter

from .main import main

# Configure loggers
handler = logging.StreamHandler()
handler.setFormatter(ActorLogFormatter())

apify_client_logger = logging.getLogger('apify_client')
apify_client_logger.setLevel(logging.INFO)
apify_client_logger.addHandler(handler)

apify_logger = logging.getLogger('apify')
apify_logger.setLevel(logging.DEBUG)
apify_logger.addHandler(handler)

# Execute the Actor main coroutine
asyncio.run(main())
--------------------------------------------------------------------------------
/Apify Scraper Tool/src/main.py:
--------------------------------------------------------------------------------
import asyncio
import re
import urllib.parse

import requests
from apify import Actor
from bs4 import BeautifulSoup


def build_search_urls(keywords, pages, base_url="https://www.google.com/search", query_prefix="site:linkedin.com/in/"):
    """Build one Google search URL per results page, restricted to LinkedIn profile pages."""
    urls = []
    # Wrap each keyword in quotes so it is matched as an exact phrase, then URL-encode it.
    encoded_keywords = [urllib.parse.quote(f'"{keyword}"') for keyword in keywords]
    query = f"{query_prefix}+({' + '.join(encoded_keywords)})"
    for page in range(pages):
        # Google paginates results ten per page via the `start` parameter.
        start_parameter = page * 10
        urls.append(f"{base_url}?q={query}&start={start_parameter}")
    return urls


def fetch_linkedin_profiles(urls):
    """Fetch each search results page and collect LinkedIn profile URLs found in its links."""
    linkedin_url_pattern = re.compile(r'https://www\.linkedin\.com/in/[\w-]+/?')
    linkedin_profiles = []
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'}

    for url in urls:
        print(url)
        response = requests.get(url, headers=headers)
        if response.status_code == 200:
            soup = BeautifulSoup(response.content, 'html.parser')
            # Scan every anchor on the results page for a LinkedIn profile URL.
            links = soup.find_all('a', href=True)
            for link in links:
                match = linkedin_url_pattern.search(link['href'])
                if match:
                    profile_url = match.group(0)
                    linkedin_profiles.append(profile_url)

    return linkedin_profiles


async def main() -> None:
    async with Actor() as actor:
        actor_input = await actor.get_input() or {}
        keywords = actor_input.get('keywords', ["Chief Product Officer", "United States", "Insurance"])
        pages = actor_input.get('pages', 1)

        urls = build_search_urls(keywords, pages)
        linkedin_profiles = fetch_linkedin_profiles(urls)
        if linkedin_profiles:
            # Push each profile URL to the dataset as its own record.
            for profile_url in linkedin_profiles:
                await actor.push_data({"linkedin_profile": profile_url})
            actor.log.info(f'Found and pushed {len(linkedin_profiles)} LinkedIn profiles individually.')
        else:
            actor.log.info('No LinkedIn profiles found.')


if __name__ == "__main__":
    asyncio.run(main())
--------------------------------------------------------------------------------
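For reference, here is a quick, illustrative sketch of what `build_search_urls` produces for the default input. It simply mirrors the string construction in the function above for `pages=1`; nothing here comes from an actual run against Google.

```python
from urllib.parse import quote

# Mirrors build_search_urls() for the default keywords (illustrative only).
keywords = ["Chief Product Officer", "United States", "Insurance"]
encoded = [quote(f'"{k}"') for k in keywords]
query = f"site:linkedin.com/in/+({' + '.join(encoded)})"
print(f"https://www.google.com/search?q={query}&start=0")
# https://www.google.com/search?q=site:linkedin.com/in/+(%22Chief%20Product%20Officer%22 + %22United%20States%22 + %22Insurance%22)&start=0
```

Note that the keywords themselves are percent-encoded, while the parentheses and the `+` separators from the prefix and join are left as-is in the query string.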
/README.md:
--------------------------------------------------------------------------------
LinkedIn Profile Finder & Data Organizer

Project Description:

This project is a Python tool that automatically searches for LinkedIn profiles based on user-defined search phrases. For example, by entering phrases such as "Mechanic, California, Tesla", the tool finds profiles matching these criteria and makes it easy to sort and organize the results. This helps you reach a specific target audience, which makes the tool particularly useful for recruitment, market research, and marketing activities.

Key Features:

- Automatic Profile Search: The tool searches LinkedIn using the search phrases provided by the user, such as job title, location, or company name.
- Organization and Sorting of Results: The search results can be automatically sorted based on selected criteria, making it easier to analyze and choose the most appropriate profiles.
- Data Export: You can save the search results in CSV format or directly to Google Sheets, allowing for further analysis and data management in a flexible and convenient way (see the sketch after this list).
- Potential for Expansion: There is an option to add additional features, such as collecting publicly available information from profiles (in accordance with LinkedIn's guidelines) and generating personalized email messages.
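The CSV export mentioned above is not part of the current Actor code. As a rough, hypothetical sketch of what it could look like once the profile URLs have been collected, one option is Python's standard `csv` module (the function name, file name, and column name below are placeholders, not existing project code):

```python
import csv

def export_profiles_to_csv(profile_urls, path="linkedin_profiles.csv"):
    """Write a list of LinkedIn profile URLs to a one-column CSV file."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["linkedin_profile"])  # header row matching the dataset field
        for url in profile_urls:
            writer.writerow([url])

# Example usage:
# export_profiles_to_csv(["https://www.linkedin.com/in/example-profile/"])
```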
Applications:

With this tool, you can:
- Precisely Target a Specific Audience: The tool facilitates the identification and organization of profiles that best meet your needs, making it ideal for recruiters, marketers, and analysts.
- Streamline the Recruitment Process: Find potential candidates for positions based on detailed search criteria.
- Conduct Market Analysis: Collect data on specialists in a specific industry or region, helping you better understand the market and plan business activities.
- Manage Data: Save search results to Google Sheets, making it easier to manage and share information with your team.

Functionality Extensions:

- Adding Scraping Features: In the future, the tool can be expanded to collect publicly available information from LinkedIn profiles, such as contact details, in full compliance with LinkedIn's privacy policy and terms of service.

Contributions:

If you have an idea for improving this project, open an issue or submit a pull request. All suggestions and improvements are welcome.

Authors and Contributions:

Main Author: Patryk Rogowski

We welcome external contributions. Please report issues and submit pull requests.

Contact and Support:

For questions or issues, please contact us via GitHub Issues or directly by email: jeremyspace@spacemillerco.com
--------------------------------------------------------------------------------
/Apify Scraper Tool/README.md:
--------------------------------------------------------------------------------
## Scrape single-page in Python template

A template for [web scraping](https://apify.com/web-scraping) data from a single web page in Python. The URL of the web page is passed in via input, which is defined by the [input schema](https://docs.apify.com/platform/actors/development/input-schema). The template uses [HTTPX](https://www.python-httpx.org) to get the HTML of the page and [Beautiful Soup](https://www.crummy.com/software/BeautifulSoup/bs4/doc/) to parse the data from it. The data is then stored in a [dataset](https://docs.apify.com/sdk/python/docs/concepts/storages#working-with-datasets) where you can easily access it.

The scraped data in this template are page headings, but you can easily edit the code to scrape whatever you want from the page.

## Included features

- **[Apify SDK](https://docs.apify.com/sdk/python/)** for Python - a toolkit for building Apify [Actors](https://apify.com/actors) and scrapers in Python
- **[Input schema](https://docs.apify.com/platform/actors/development/input-schema)** - define and easily validate a schema for your Actor's input
- **[Request queue](https://docs.apify.com/sdk/python/docs/concepts/storages#working-with-request-queues)** - queues into which you can put the URLs you want to scrape
- **[Dataset](https://docs.apify.com/sdk/python/docs/concepts/storages#working-with-datasets)** - store structured data where each object stored has the same attributes
- **[HTTPX](https://www.python-httpx.org)** - library for making asynchronous HTTP requests in Python
- **[Beautiful Soup](https://www.crummy.com/software/BeautifulSoup/bs4/doc/)** - library for pulling data out of HTML and XML files

## How it works

1. `Actor.get_input()` gets the input where the page URL is defined
2. `httpx.AsyncClient().get(url)` fetches the page
3. `BeautifulSoup(response.content, 'html.parser')` loads the page data and enables parsing the headings
4. The following loop parses the headings from the page; here you can edit the code to parse whatever you need from the page
   ```python
   for heading in soup.find_all(["h1", "h2", "h3", "h4", "h5", "h6"]):
   ```
5. `Actor.push_data(headings)` stores the headings in the dataset
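Putting the steps above together, a condensed sketch of the template's flow might look like the following. It is not the template's exact source: the `url` input field name and the shape of the heading records are assumptions made here for illustration.

```python
from apify import Actor
from bs4 import BeautifulSoup
from httpx import AsyncClient


async def main() -> None:
    async with Actor:
        # 1. Read the Actor input; the 'url' field name is assumed here.
        actor_input = await Actor.get_input() or {}
        url = actor_input.get('url')

        # 2. Fetch the page with HTTPX.
        async with AsyncClient() as client:
            response = await client.get(url, follow_redirects=True)

        # 3. Parse the HTML with Beautiful Soup.
        soup = BeautifulSoup(response.content, 'html.parser')

        # 4. Extract the headings (edit this part to scrape something else).
        headings = [
            {'level': heading.name, 'text': heading.get_text(strip=True)}
            for heading in soup.find_all(['h1', 'h2', 'h3', 'h4', 'h5', 'h6'])
        ]

        # 5. Store the results in the default dataset.
        await Actor.push_data(headings)
```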
## Resources

- [BeautifulSoup Scraper](https://apify.com/apify/beautifulsoup-scraper)
- [Python tutorials in Academy](https://docs.apify.com/academy/python)
- [Web scraping with Beautiful Soup and Requests](https://blog.apify.com/web-scraping-with-beautiful-soup/)
- [Beautiful Soup vs. Scrapy for web scraping](https://blog.apify.com/beautiful-soup-vs-scrapy-web-scraping/)
- [Integration with Make, GitHub, Zapier, Google Drive, and other apps](https://apify.com/integrations)
- [Video guide on getting scraped data using Apify API](https://www.youtube.com/watch?v=ViYYDHSBAKM)
- A short guide on how to build web scrapers using code templates: [web scraper template](https://www.youtube.com/watch?v=u-i-Korzf8w)

## Getting started

For complete information, [see this article](https://docs.apify.com/platform/actors/development#build-actor-at-apify-console). In short, you will:

1. Build the Actor
2. Run the Actor

## Pull the Actor for local development

If you would like to develop locally, you can pull the existing Actor from the Apify console using the Apify CLI:

1. Install `apify-cli`

**Using Homebrew**

```bash
brew install apify-cli
```

**Using NPM**

```bash
npm -g install apify-cli
```

2. Pull the Actor by its unique identifier, which is one of the following:

- the unique name of the Actor to pull (e.g. "apify/hello-world")
- or the ID of the Actor to pull (e.g. "E2jjCZBezvAZnX8Rb")

You can find both by clicking on the Actor title at the top of the page, which opens a modal containing both the Actor's unique name and the Actor ID.

This command will copy the Actor into the current directory on your local machine:

```bash
apify pull
```

## Documentation reference

To learn more about Apify and Actors, take a look at the following resources:

- [Apify SDK for JavaScript documentation](https://docs.apify.com/sdk/js)
- [Apify SDK for Python documentation](https://docs.apify.com/sdk/python)
- [Apify Platform documentation](https://docs.apify.com/platform)
- [Join our developer community on Discord](https://discord.com/invite/jyEM2PRvMU)
--------------------------------------------------------------------------------