├── README.md
└── lighthouse.py

/README.md:
--------------------------------------------------------------------------------
# SEO: Run multiple URLs and get Lighthouse V5 Scores

## How the Script Works

This Python script extracts web performance metrics, including the Core Web Vitals, from a list of URLs using the Google PageSpeed Insights API. It uses the `aiohttp` library for asynchronous HTTP requests and `asyncio` for handling concurrency. The extracted metrics are then processed and saved to an Excel file for further analysis.

### Script Workflow

1. **URL List**: The script starts with a predefined list of URLs to analyze. You can customize this list by adding or removing URLs in the `url_list` variable.

2. **API Configuration**: Key configuration parameters are set, including:
   - `category`: The performance category for analysis.
   - `today`: The current date in the format "yyyy-mm-dd".
   - `locale`: The locale for analysis (e.g., 'en' for English).
   - `key`: Your API key, which you can obtain from [Google's PageSpeed Insights API](https://developers.google.com/speed/docs/insights/v5/get-started).

3. **API Data Extraction**: The script defines an asynchronous function, `webcorevitals`, that makes an API request for each URL on both 'mobile' and 'desktop'. It extracts First Input Delay (FID), Interaction to Next Paint (INP), Time to First Byte (TTFB), First Contentful Paint (FCP), Speed Index (SI), Largest Contentful Paint (LCP), Time to Interactive (TTI), Total Blocking Time (TBT), Cumulative Layout Shift (CLS), Total Page Size, and the overall performance score.

4. **Data Transformation**: The extracted values are cleaned (unit suffixes and thousands separators are stripped) and converted to consistent numeric types.

5. **DataFrame Creation**: A pandas DataFrame is created to organize the extracted metrics, with columns for Date, URL, Score, FID, INP, TTFB, FCP, SI, LCP, TTI, TBT, CLS, Size in MB, and Device.

6. **Concurrent Execution**: The script uses asyncio to run the API requests for all URLs and devices concurrently, significantly speeding up the data extraction process.

7. **Excel Output**: The DataFrames from all requests are concatenated and saved as an Excel file named 'output.xlsx' in the same directory as the script.

## How to Use It

1. **Install Dependencies**: Make sure you have the required Python libraries installed (`asyncio` itself ships with Python; `openpyxl` is the engine pandas uses to write `.xlsx` files). You can install them using pip:

   `pip install aiohttp pandas openpyxl`

2. **API Key**: Obtain an API key from [Google's PageSpeed Insights API](https://developers.google.com/speed/docs/insights/v5/get-started) and replace the `key` variable in the script with your key.

3. **Customize URL List**: Customize the list of URLs to analyze by modifying the `url_list` variable in the script.

4. **Run the Script**: Execute the script using Python:

   `python lighthouse.py`

5. **Output**: Once the script finishes, you will find an Excel file named 'output.xlsx' containing the extracted web performance metrics in the same directory as the script. You can load this file back into pandas for further analysis, as shown below.
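A minimal sketch of re-loading the results (this assumes the script has already produced `output.xlsx` and that `openpyxl` is installed):

```python
import pandas as pd

# Read the report written by lighthouse.py
df = pd.read_excel('output.xlsx')

# e.g., compare the average performance score of mobile vs. desktop runs
print(df.groupby('Device')['Score'].mean())
```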
**Example output** (abridged; the actual file also contains FID, INP, and TTFB columns, under their full names):

| Date       | URL                                              | Score | FCP (s) | SI (s) | LCP (s) | TTI (s) | TBT (ms) | CLS   | Size (MB)   | Device  |
|------------|--------------------------------------------------|-------|---------|--------|---------|---------|----------|-------|-------------|---------|
| 2023-09-25 | [https://www.google.com](https://www.google.com) | 76    | 2       | 3.2    | 2       | 8.5     | 910      | 0.014 | 1.123100281 | mobile  |
| 2023-09-25 | [https://www.google.com](https://www.google.com) | 92    | 0.4     | 0.8    | 0.6     | 1.9     | 220      | 0.007 | 1.246808052 | desktop |

## Contributing

If you want to contribute, please open an issue or send an email to hello@kburchardt.com. If not, just give the project a star.

## Authors

* **Konrad Burchardt** - *Initial work* - [Sundios](https://github.com/sundios)
* **Vinicius Stanula** - *Added new metrics and implemented Async Function* - [Vinicius](https://github.com/ViniciusStanula)
--------------------------------------------------------------------------------

/lighthouse.py:
--------------------------------------------------------------------------------
import aiohttp
import asyncio
import pandas as pd
from datetime import date

# =============================================================================
# # Extracting Metrics from API Response
# =============================================================================

## List of URLs to run in a loop
url_list = [
    "https://www.amazon.com",
    "https://www.nytimes.com/",
]

## Definition of analysis type, date, analysis locale, and API key
category = 'performance'
today = date.today().strftime("%Y-%m-%d")
locale = 'en'
key = 'get your api key here: https://developers.google.com/speed/docs/insights/v5/get-started'

## Function to extract API data for one URL/device combination
async def webcorevitals(session, url, device, category, today):

    ## Headers to ensure responses are not served from a cache
    headers = {
        'Cache-Control': 'no-cache, no-store, must-revalidate',
        'Pragma': 'no-cache',
        'Expires': '0',
    }

    ## API call; passing params lets aiohttp URL-encode the query string for us
    async with session.get(
        "https://www.googleapis.com/pagespeedonline/v5/runPagespeed",
        params={
            'url': url,
            'key': key,
            'strategy': device,
            'category': category,
            'locale': locale,
        },
        headers=headers,
    ) as response:
        data = await response.json()

    print('Running:', url, device)

    try:
        # Field data (loadingExperience) carries FID, INP, and TTFB
        data_loading = data['loadingExperience']

        # Lab data (lighthouseResult) carries FCP, LCP, CLS, SI, TTI, TBT, page size, and the score
        data = data['lighthouseResult']
    except KeyError:
        print('No Values')
        # Empty dicts make every lookup below fall through to its default value
        data_loading = {}
        data = {}
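    # For reference, the parts of the PageSpeed Insights v5 response this
    # script reads look roughly like this (abridged and illustrative; real
    # responses contain many more fields):
    #
    #   {
    #     "loadingExperience": {
    #       "metrics": {
    #         "FIRST_INPUT_DELAY_MS":            {"percentile": 17},
    #         "INTERACTION_TO_NEXT_PAINT":       {"percentile": 190},
    #         "EXPERIMENTAL_TIME_TO_FIRST_BYTE": {"percentile": 800}
    #       }
    #     },
    #     "lighthouseResult": {
    #       "audits": {
    #         "first-contentful-paint": {"displayValue": "1.2 s"},
    #         "total-byte-weight":      {"numericValue": 1177000}
    #       },
    #       "categories": {"performance": {"score": 0.92}}
    #     }
    #   }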
    # First Contentful Paint (FCP)
    try:
        fcp = data['audits']['first-contentful-paint']['displayValue']
    except KeyError:
        print('No Values')
        fcp = 0

    # Largest Contentful Paint (LCP)
    try:
        lcp = data['audits']['largest-contentful-paint']['displayValue']
    except KeyError:
        print('No Values')
        lcp = 0

    # Cumulative Layout Shift (CLS)
    try:
        cls = data['audits']['cumulative-layout-shift']['displayValue']
    except KeyError:
        print('No Values')
        cls = 0

    # Speed Index (SI)
    try:
        si = data['audits']['speed-index']['displayValue']
    except KeyError:
        print('No Values')
        si = 0

    # Time to Interactive (TTI)
    try:
        tti = data['audits']['interactive']['displayValue']
    except KeyError:
        print('No Values')
        tti = 0

    # Total Page Size in bytes ('page_bytes' avoids shadowing the built-in 'bytes')
    try:
        page_bytes = data['audits']['total-byte-weight']['numericValue']
    except KeyError:
        print('No Values')
        page_bytes = 0

    # Total Blocking Time (TBT)
    try:
        tbt = data['audits']['total-blocking-time']['displayValue']
    except KeyError:
        print('No Values')
        tbt = 0

    # Score (0-1 in the raw response; scaled to 0-100 below)
    try:
        score = data['categories']['performance']['score']
    except KeyError:
        print('No Values')
        score = 0

    # First Input Delay (FID)
    try:
        fid = data_loading["metrics"]["FIRST_INPUT_DELAY_MS"]["percentile"]
    except KeyError:
        print('No Values')
        fid = 0

    # Interaction to Next Paint (INP)
    try:
        inp = data_loading["metrics"]["INTERACTION_TO_NEXT_PAINT"]["percentile"]
    except KeyError:
        print('No Values')
        inp = 0

    # Time to First Byte (TTFB)
    try:
        ttfb = data_loading["metrics"]["EXPERIMENTAL_TIME_TO_FIRST_BYTE"]["percentile"]
    except KeyError:
        print('No Values')
        ttfb = 0

    ## Build a one-row DataFrame with all metrics
    values = [url, score, fid, inp, ttfb, fcp, si, lcp, tti, tbt, cls, page_bytes, today, device]
    df_score = pd.DataFrame(
        [values],
        columns=['URL', 'Score', 'FID', 'INP', 'TTFB', 'FCP', 'SI', 'LCP', 'TTI', 'TBT', 'CLS', 'Size in MB', 'Date', 'Device'],
    )

    ## Transformations so every metric is a plain number: displayValues carry
    ## 's'/'ms' suffixes and thousands separators; TTFB arrives in milliseconds
    df_score['FID'] = df_score['FID'].astype(str).str.replace(',', '').astype(float)
    df_score['INP'] = df_score['INP'].astype(str).str.replace(',', '').astype(float)
    df_score['TTFB'] = df_score['TTFB'].astype(float) / 1000
    df_score['LCP'] = df_score['LCP'].astype(str).str.replace('s', '').astype(float)
    df_score['FCP'] = df_score['FCP'].astype(str).str.replace('s', '').astype(float)
    df_score['SI'] = df_score['SI'].astype(str).str.replace('s', '').astype(float)
    df_score['TTI'] = df_score['TTI'].astype(str).str.replace('s', '').astype(float)
    df_score['TBT'] = df_score['TBT'].astype(str).str.replace('ms', '').str.replace(',', '').astype(float)
    df_score['Score'] = df_score['Score'].astype(float) * 100
    df_score['CLS'] = df_score['CLS'].astype(float)
    df_score['Size in MB'] = df_score['Size in MB'].astype(float) / (1024 * 1024)

    ## Reorder the columns and expand the names for the final report
    df_score = df_score[['Date', 'URL', 'Score', 'FID', 'INP', 'TTFB', 'FCP', 'SI', 'LCP', 'TTI', 'TBT',
                         'CLS', 'Size in MB', 'Device']]
    df_score.columns = ['Date', 'URL', 'Score', 'First Input Delay (FID)', 'Interaction to Next Paint (INP)',
                        'Time to First Byte (TTFB)', 'First Contentful Paint (FCP)', 'Speed Index (SI)',
                        'Largest Contentful Paint (LCP)', 'Time to Interactive (TTI)', 'Total Blocking Time (TBT)',
                        'Cumulative Layout Shift (CLS)', 'Size in MB', 'Device']

    return df_score
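## Example: fetching the metrics for a single URL in isolation (an illustrative
## sketch; main() below is the intended entry point):
##
##     async def demo():
##         async with aiohttp.ClientSession() as session:
##             df = await webcorevitals(session, 'https://www.example.com', 'mobile', category, today)
##             print(df.iloc[0])
##
##     asyncio.run(demo())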
# =============================================================================
# # Run the Requests and pass the DataFrame to an Excel File
# =============================================================================

## Run the mobile and desktop requests for every URL concurrently
async def main():
    async with aiohttp.ClientSession() as session:
        tasks = []
        for url in url_list:
            tasks.append(webcorevitals(session, url, 'mobile', category, today))
            tasks.append(webcorevitals(session, url, 'desktop', category, today))

        results = await asyncio.gather(*tasks)

    ## Combine the per-request DataFrames into a single result
    df_final = pd.concat(results, ignore_index=True)

    ## Save the DataFrame to an Excel file (pandas uses openpyxl for .xlsx)
    df_final.to_excel('output.xlsx', index=False)

if __name__ == '__main__':
    ## asyncio.run creates, runs, and closes the event loop for us
    asyncio.run(main())
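## Note: the PageSpeed Insights API enforces per-key request quotas. If a large
## url_list triggers rate-limit errors, one option (an illustrative sketch, not
## part of the original flow) is to cap concurrency with an asyncio.Semaphore:
##
##     sem = asyncio.Semaphore(4)  # at most 4 requests in flight at once
##
##     async def limited(session, url, device):
##         async with sem:
##             return await webcorevitals(session, url, device, category, today)
##
## ...and then build the task list in main() with limited(...) instead of
## calling webcorevitals(...) directly.
--------------------------------------------------------------------------------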