├── README.md
└── tracker.py

/README.md:
--------------------------------------------------------------------------------

# How to Build a Price Tracker With Python

[![Oxylabs promo code](https://raw.githubusercontent.com/oxylabs/product-integrations/refs/heads/master/Affiliate-Universal-1090x275.png)](https://oxylabs.go2cloud.org/aff_c?offer_id=7&aff_id=877&url_id=112)

[![](https://dcbadge.vercel.app/api/server/eWsVUJrnG5)](https://discord.gg/GbxmdGhZjq)

## Project requirements

The following price monitoring script works with Python version 3.6 and above. The recommended libraries are as follows:

- `Requests` – for sending HTTP requests. In other words, for downloading web pages without a browser. It's the essential library for the upcoming price monitoring script.
- `BeautifulSoup` – for querying the HTML for specific elements. It's a wrapper over a parser library.
- `lxml` – for parsing the HTML. The HTML retrieved by the Requests library is a string that requires parsing into a Python object before querying. Instead of using this library directly, we'll use BeautifulSoup as a wrapper for a more straightforward API.
- `Price-parser` – a library useful for every price monitoring script. It helps to extract the price component from a string that contains it.
- `smtplib` – for sending emails.
- `Pandas` – for filtering product data and reading and writing CSV files.

Optionally, creating a virtual environment will keep the whole process more organized:

```bash
$ python3 -m venv .venv
$ source .venv/bin/activate
```

To install the dependencies, open the terminal and run the following command:

```bash
$ pip install pandas requests beautifulsoup4 price-parser lxml
```

Note that the `smtplib` library is part of the Python Standard Library and doesn't need to be installed separately.
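Before moving on, it helps to prepare the input file the script will read. As a sketch, the following generates a minimal `products.csv` with the columns the script expects; the example rows point at books.toscrape.com, a demo site for scraping practice, and both the URLs and alert prices are placeholders you should replace with your own targets:

```python
import pandas as pd

# Hypothetical product list for illustration only; swap in your own
# product names, URLs, and alert prices.
products = pd.DataFrame([
    {
        "product": "A Light in the Attic",
        "url": "https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html",
        "alert_price": 60.0,
    },
    {
        "product": "Tipping the Velvet",
        "url": "https://books.toscrape.com/catalogue/tipping-the-velvet_999/index.html",
        "alert_price": 50.0,
    },
])

# Write the CSV without the DataFrame index column.
products.to_csv("products.csv", index=False)
```

You can equally well create the file by hand in a text editor or spreadsheet application, as discussed in the next section.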
Once the installation is complete, create a new Python file and add the following imports:

```python
import smtplib
import pandas as pd
import requests
from bs4 import BeautifulSoup
from price_parser import Price
```

Additionally, add the following lines for the initial configuration:

```python
PRODUCT_URL_CSV = "products.csv"
SAVE_TO_CSV = True
PRICES_CSV = "prices.csv"
SEND_MAIL = True
```

The CSV that contains the target URLs is supplied as `PRODUCT_URL_CSV`.

If the `SAVE_TO_CSV` flag is set to `True`, the fetched prices will be saved to the CSV file specified as `PRICES_CSV`.

`SEND_MAIL` is a flag that can be set to `True` to send email alerts.

## Reading a list of product URLs

The easiest way to store and manage the product URLs is to keep them in a CSV or JSON file. This time we'll use CSV, as it's easy to update with a text editor or spreadsheet application.

The CSV should contain at least two fields: `url` and `alert_price`. The product's title can be extracted from the product URL or stored in the same CSV file. If the price monitor finds the product's price dropping below the value of the `alert_price` field, it will trigger an email alert.

![](https://images.prismic.io/oxylabs-sm/ZWRmNGFkMTQtNTBmNS00ZDkzLWFjOTMtOGZmMjdiZDZkYjQx_product-urls.png?auto=compress,format&rect=0,0,1530,276&w=1530&h=276&fm=webp&dpr=2&q=50)

The CSV file can be read into a DataFrame using Pandas. Let's wrap this up in a simple function:

```python
def get_urls(csv_file):
    df = pd.read_csv(csv_file)
    return df
```

The function will return a Pandas DataFrame object that contains three columns: `product`, `url`, and `alert_price` (see the image above).

## Scraping the prices

The initial step is to loop over the target URLs.
Note that `get_urls()` returns a DataFrame object.

To run a loop, first use the `to_dict()` method of Pandas. When `to_dict()` is called with `"records"` as its argument, it converts the DataFrame into a list of dictionaries.

Run a loop over each dictionary as follows:

```python
def process_products(df):
    for product in df.to_dict("records"):
        pass  # product["url"] holds the product URL
```

We'll revisit this method after writing two additional functions: the first gets the HTML, and the second extracts the price from it.

To get the HTML for each URL, use the following function:

```python
def get_response(url):
    response = requests.get(url)
    return response.text
```

Next, create a BeautifulSoup object from the response and locate the price element using a CSS selector. Use the Price-parser library to extract the price as a float for comparison with the alert price. If you want to better understand how the Price-parser library works, head over to our GitHub repository for examples.

The following function will extract the price from the given HTML, returning it as a float:

```python
def get_price(html):
    soup = BeautifulSoup(html, "lxml")
    el = soup.select_one(".price_color")
    price = Price.fromstring(el.text)
    return price.amount_float
```

Note that the CSS selector used in this example is specific to the scraping target. If you are working with any other site, this is the only place where you would have to change the code.

We're using BeautifulSoup to locate an element containing the price via CSS selectors. The element is stored in the `el` variable. The text attribute of the `el` tag, `el.text`, contains the price and the currency symbol. Price-parser parses this string to extract the price as a float value.
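To make that parsing step concrete, here is a simplified sketch of what extracting a float from a price string involves. This is not Price-parser's actual implementation (the library handles many more currency formats and locales); it only illustrates the idea of pulling a numeric value out of a string such as `£51.77`:

```python
import re

def parse_price(text):
    # Simplified stand-in for price_parser.Price.fromstring():
    # find the first numeric run, drop thousands separators, return a float.
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if not match:
        return None
    return float(match.group().replace(",", ""))

print(parse_price("£51.77"))     # 51.77
print(parse_price("$1,299.00"))  # 1299.0
```

In the real script, `Price.fromstring(el.text).amount_float` does this work for us, which is why the tutorial recommends the library rather than hand-rolled parsing.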
There is more than one product URL in the DataFrame object. Let's loop over all the rows and update the DataFrame with new information.

The easiest approach is to convert each row into a dictionary. This way, you can read the URL, call the `get_price()` function, and update the required fields.

We'll add two new keys: the extracted price (`price`) and a boolean value (`alert`), which is used to filter rows when sending an email.

The `process_products()` function can now be extended to demonstrate the aforementioned sequence:

```python
def process_products(df):
    updated_products = []
    for product in df.to_dict("records"):
        html = get_response(product["url"])
        product["price"] = get_price(html)
        product["alert"] = product["price"] < product["alert_price"]
        updated_products.append(product)
    return pd.DataFrame(updated_products)
```

This function will return a new DataFrame object containing the product URL and the name read from the CSV. Additionally, it includes the price and the alert flag used to send an email on a price drop.

## Saving the output

The final DataFrame containing the updated product data can be saved as CSV with a simple call to the `to_csv()` method.

Additionally, we'll check the `SAVE_TO_CSV` flag as follows:

```python
if SAVE_TO_CSV:
    df_updated.to_csv(PRICES_CSV, index=False, mode="a")
```

You'll notice that the mode is set to `"a"`, which stands for "append", so new rows are added if the CSV file already exists. Note that appending also repeats the header row on every run; pass `header=False` once the file exists if you want to avoid that.

## Sending email alerts

Optionally, you can send an email alert on a price drop based on the alert flag.
First, create a function that filters the DataFrame and returns the email's subject and body:

```python
def get_mail(df):
    subject = "Price Drop Alert"
    body = df[df["alert"]].to_string()
    subject_and_message = f"Subject:{subject}\n\n{body}"
    return subject_and_message
```

Now, using `smtplib`, create another function that sends alert emails:

```python
def send_mail(df):
    message_text = get_mail(df)
    with smtplib.SMTP("smtp.server.address", 587) as smtp:
        smtp.starttls()
        smtp.login(mail_user, mail_pass)
        smtp.sendmail(mail_user, mail_to, message_text)
```

This code snippet assumes that you'll set the variables `mail_user`, `mail_pass`, and `mail_to`, and replace `smtp.server.address` with your SMTP server's address.

Putting everything together, this is the main function:

```python
def main():
    df = get_urls(PRODUCT_URL_CSV)
    df_updated = process_products(df)
    if SAVE_TO_CSV:
        df_updated.to_csv(PRICES_CSV, index=False, mode="a")
    if SEND_MAIL:
        send_mail(df_updated)


if __name__ == "__main__":
    main()
```

The `__name__` guard calls `main()` when the file is run directly, so executing the script runs the entire code.

If you wish to run this automatically at certain intervals, use a cron job on macOS/Linux or Task Scheduler on Windows.

Alternatively, you can deploy this price monitoring script in any cloud service environment.
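For example, on macOS or Linux, a crontab entry along these lines would run the tracker every morning at 9 AM. The paths below are placeholders; point them at your own virtual environment and script location:

```shell
# minute hour day-of-month month day-of-week  command
0 9 * * * /path/to/.venv/bin/python /path/to/tracker.py
```

Add the entry with `crontab -e`, and make sure the paths to `products.csv` and `prices.csv` resolve correctly when the script runs from cron (using absolute paths in the configuration constants is the simplest fix).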
--------------------------------------------------------------------------------

/tracker.py:
--------------------------------------------------------------------------------

import smtplib
import pandas as pd
import requests
from bs4 import BeautifulSoup
from price_parser import Price

PRODUCT_URL_CSV = "products.csv"
SAVE_TO_CSV = True
PRICES_CSV = "prices.csv"
SEND_MAIL = True

# Email settings: fill in these placeholders before enabling SEND_MAIL.
mail_user = "your_email@example.com"  # sender address / SMTP login
mail_pass = "your_password"           # SMTP password
mail_to = "recipient@example.com"     # alert recipient

def get_urls(csv_file):
    df = pd.read_csv(csv_file)
    return df

def get_response(url):
    response = requests.get(url)
    return response.text

def get_price(html):
    soup = BeautifulSoup(html, "lxml")
    el = soup.select_one(".price_color")
    price = Price.fromstring(el.text)
    return price.amount_float

def process_products(df):
    updated_products = []
    for product in df.to_dict("records"):
        html = get_response(product["url"])
        product["price"] = get_price(html)
        product["alert"] = product["price"] < product["alert_price"]
        updated_products.append(product)
    return pd.DataFrame(updated_products)

def get_mail(df):
    subject = "Price Drop Alert"
    body = df[df["alert"]].to_string()
    subject_and_message = f"Subject:{subject}\n\n{body}"
    return subject_and_message

def send_mail(df):
    message_text = get_mail(df)
    # Replace "smtp.server.address" with your SMTP server's address.
    with smtplib.SMTP("smtp.server.address", 587) as smtp:
        smtp.starttls()
        smtp.login(mail_user, mail_pass)
        smtp.sendmail(mail_user, mail_to, message_text)

def main():
    df = get_urls(PRODUCT_URL_CSV)
    df_updated = process_products(df)
    if SAVE_TO_CSV:
        df_updated.to_csv(PRICES_CSV, index=False, mode="a")
    if SEND_MAIL:
        send_mail(df_updated)

if __name__ == "__main__":
    main()
--------------------------------------------------------------------------------