├── README.md
└── lighthouse.py

/README.md:
--------------------------------------------------------------------------------
# SEO: Run multiple URLs and get Lighthouse V5 Scores

## How the Script Works

This Python script extracts web performance metrics, including the Core Web Vitals, from a list of URLs using the Google PageSpeed Insights API. It uses the `aiohttp` library for asynchronous HTTP requests and `asyncio` for handling concurrency. The extracted metrics are then processed and saved to an Excel file for further analysis.

### Script Workflow

1. **URL List**: The script starts with a predefined list of URLs to analyze. You can customize this list by adding or removing URLs in the `url_list` variable.

2. **API Configuration**: Key configuration parameters are set, including:
   - `category`: The performance category for analysis.
   - `today`: The current date in the format "yyyy-mm-dd".
   - `locale`: The locale for analysis (e.g., 'en' for English).
   - `key`: Your API key, which you can obtain from [Google's PageSpeed Insights API](https://developers.google.com/speed/docs/insights/v5/get-started).

3. **API Data Extraction**: The script defines an asynchronous function, `webcorevitals`, that makes an API request for each URL on both 'mobile' and 'desktop'. It extracts First Input Delay (FID), Interaction to Next Paint (INP), Time to First Byte (TTFB), First Contentful Paint (FCP), Speed Index (SI), Largest Contentful Paint (LCP), Time to Interactive (TTI), Total Blocking Time (TBT), Cumulative Layout Shift (CLS), Total Page Size, and the overall performance score.

4. **Data Transformation**: The extracted values are cleaned (unit suffixes and thousands separators are stripped) and converted to consistent numeric types.

5. **DataFrame Creation**: A pandas DataFrame is created to organize the extracted metrics, with columns for Date, URL, Score, FID, INP, TTFB, FCP, SI, LCP, TTI, TBT, CLS, Size in MB, and Device.

6. **Concurrent Execution**: The script uses asyncio to run the API requests for all URLs and devices concurrently, significantly speeding up the data extraction process.

7. **Excel Output**: The DataFrames from all requests are concatenated and saved as an Excel file named 'output.xlsx' in the same directory as the script.

## How to Use It

1. **Install Dependencies**: Make sure you have the required Python libraries installed (`asyncio` itself ships with Python; `openpyxl` is the engine pandas uses to write `.xlsx` files). You can install them using pip:

   `pip install aiohttp pandas openpyxl`

2. **API Key**: Obtain an API key from [Google's PageSpeed Insights API](https://developers.google.com/speed/docs/insights/v5/get-started) and replace the `key` variable in the script with your key.

3. **Customize URL List**: Customize the list of URLs to analyze by modifying the `url_list` variable in the script.

4. **Run the Script**: Execute the script using Python:

   `python lighthouse.py`

5. **Output**: Once the script finishes, you will find an Excel file named 'output.xlsx' containing the extracted web performance metrics in the same directory as the script. You can load this file back into pandas for further analysis, as shown below.
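A minimal sketch of re-loading the results (this assumes the script has already produced `output.xlsx` and that `openpyxl` is installed):

```python
import pandas as pd

# Read the report written by lighthouse.py
df = pd.read_excel('output.xlsx')

# e.g., compare the average performance score of mobile vs. desktop runs
print(df.groupby('Device')['Score'].mean())
```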
**Example output** (abridged; the actual file also contains FID, INP, and TTFB columns, under their full names):

| Date       | URL                                              | Score | FCP (s) | SI (s) | LCP (s) | TTI (s) | TBT (ms) | CLS   | Size (MB)   | Device  |
|------------|--------------------------------------------------|-------|---------|--------|---------|---------|----------|-------|-------------|---------|
| 2023-09-25 | [https://www.google.com](https://www.google.com) | 76    | 2       | 3.2    | 2       | 8.5     | 910      | 0.014 | 1.123100281 | mobile  |
| 2023-09-25 | [https://www.google.com](https://www.google.com) | 92    | 0.4     | 0.8    | 0.6     | 1.9     | 220      | 0.007 | 1.246808052 | desktop |

## Contributing

If you want to contribute, please open an issue or send an email to hello@kburchardt.com. If not, just give the project a star.

## Authors

* **Konrad Burchardt** - *Initial work* - [Sundios](https://github.com/sundios)
* **Vinicius Stanula** - *Added new metrics and implemented Async Function* - [Vinicius](https://github.com/ViniciusStanula)
--------------------------------------------------------------------------------

/lighthouse.py:
--------------------------------------------------------------------------------
import aiohttp
import asyncio
import pandas as pd
from datetime import date

# =============================================================================
# # Extracting Metrics from API Response
# =============================================================================

## List of URLs to run in a loop
url_list = [
    "https://www.amazon.com",
    "https://www.nytimes.com/",
]

## Definition of analysis type, date, analysis locale, and API key
category = 'performance'
today = date.today().strftime("%Y-%m-%d")
locale = 'en'
key = 'get your api key here: https://developers.google.com/speed/docs/insights/v5/get-started'

## Function to extract API data for one URL/device combination
async def webcorevitals(session, url, device, category, today):

    ## Headers to ensure responses are not served from a cache
    headers = {
        'Cache-Control': 'no-cache, no-store, must-revalidate',
        'Pragma': 'no-cache',
        'Expires': '0',
    }

    ## API call; passing params lets aiohttp URL-encode the query string for us
    async with session.get(
        "https://www.googleapis.com/pagespeedonline/v5/runPagespeed",
        params={
            'url': url,
            'key': key,
            'strategy': device,
            'category': category,
            'locale': locale,
        },
        headers=headers,
    ) as response:
        data = await response.json()

    print('Running:', url, device)

    try:
        # Field data (loadingExperience) carries FID, INP, and TTFB
        data_loading = data['loadingExperience']

        # Lab data (lighthouseResult) carries FCP, LCP, CLS, SI, TTI, TBT, page size, and the score
        data = data['lighthouseResult']
    except KeyError:
        print('No Values')
        # Empty dicts make every lookup below fall through to its default value
        data_loading = {}
        data = {}
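    # For reference, the parts of the PageSpeed Insights v5 response this
    # script reads look roughly like this (abridged and illustrative; real
    # responses contain many more fields):
    #
    #   {
    #     "loadingExperience": {
    #       "metrics": {
    #         "FIRST_INPUT_DELAY_MS":            {"percentile": 17},
    #         "INTERACTION_TO_NEXT_PAINT":       {"percentile": 190},
    #         "EXPERIMENTAL_TIME_TO_FIRST_BYTE": {"percentile": 800}
    #       }
    #     },
    #     "lighthouseResult": {
    #       "audits": {
    #         "first-contentful-paint": {"displayValue": "1.2 s"},
    #         "total-byte-weight":      {"numericValue": 1177000}
    #       },
    #       "categories": {"performance": {"score": 0.92}}
    #     }
    #   }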
    # First Contentful Paint (FCP)
    try:
        fcp = data['audits']['first-contentful-paint']['displayValue']
    except KeyError:
        print('No Values')
        fcp = 0

    # Largest Contentful Paint (LCP)
    try:
        lcp = data['audits']['largest-contentful-paint']['displayValue']
    except KeyError:
        print('No Values')
        lcp = 0

    # Cumulative Layout Shift (CLS)
    try:
        cls = data['audits']['cumulative-layout-shift']['displayValue']
    except KeyError:
        print('No Values')
        cls = 0

    # Speed Index (SI)
    try:
        si = data['audits']['speed-index']['displayValue']
    except KeyError:
        print('No Values')
        si = 0

    # Time to Interactive (TTI)
    try:
        tti = data['audits']['interactive']['displayValue']
    except KeyError:
        print('No Values')
        tti = 0

    # Total Page Size in bytes ('page_bytes' avoids shadowing the built-in 'bytes')
    try:
        page_bytes = data['audits']['total-byte-weight']['numericValue']
    except KeyError:
        print('No Values')
        page_bytes = 0

    # Total Blocking Time (TBT)
    try:
        tbt = data['audits']['total-blocking-time']['displayValue']
    except KeyError:
        print('No Values')
        tbt = 0

    # Score (0-1 in the raw response; scaled to 0-100 below)
    try:
        score = data['categories']['performance']['score']
    except KeyError:
        print('No Values')
        score = 0

    # First Input Delay (FID)
    try:
        fid = data_loading["metrics"]["FIRST_INPUT_DELAY_MS"]["percentile"]
    except KeyError:
        print('No Values')
        fid = 0

    # Interaction to Next Paint (INP)
    try:
        inp = data_loading["metrics"]["INTERACTION_TO_NEXT_PAINT"]["percentile"]
    except KeyError:
        print('No Values')
        inp = 0

    # Time to First Byte (TTFB)
    try:
        ttfb = data_loading["metrics"]["EXPERIMENTAL_TIME_TO_FIRST_BYTE"]["percentile"]
    except KeyError:
        print('No Values')
        ttfb = 0

    ## Build a one-row DataFrame with all metrics
    values = [url, score, fid, inp, ttfb, fcp, si, lcp, tti, tbt, cls, page_bytes, today, device]
    df_score = pd.DataFrame(
        [values],
        columns=['URL', 'Score', 'FID', 'INP', 'TTFB', 'FCP', 'SI', 'LCP', 'TTI', 'TBT', 'CLS', 'Size in MB', 'Date', 'Device'],
    )

    ## Transformations so every metric is a plain number: displayValues carry
    ## 's'/'ms' suffixes and thousands separators; TTFB arrives in milliseconds
    df_score['FID'] = df_score['FID'].astype(str).str.replace(',', '').astype(float)
    df_score['INP'] = df_score['INP'].astype(str).str.replace(',', '').astype(float)
    df_score['TTFB'] = df_score['TTFB'].astype(float) / 1000
    df_score['LCP'] = df_score['LCP'].astype(str).str.replace('s', '').astype(float)
    df_score['FCP'] = df_score['FCP'].astype(str).str.replace('s', '').astype(float)
    df_score['SI'] = df_score['SI'].astype(str).str.replace('s', '').astype(float)
    df_score['TTI'] = df_score['TTI'].astype(str).str.replace('s', '').astype(float)
    df_score['TBT'] = df_score['TBT'].astype(str).str.replace('ms', '').str.replace(',', '').astype(float)
    df_score['Score'] = df_score['Score'].astype(float) * 100
    df_score['CLS'] = df_score['CLS'].astype(float)
    df_score['Size in MB'] = df_score['Size in MB'].astype(float) / (1024 * 1024)

    ## Reorder the columns and expand the names for the final report
    df_score = df_score[['Date', 'URL', 'Score', 'FID', 'INP', 'TTFB', 'FCP', 'SI', 'LCP', 'TTI', 'TBT',
                         'CLS', 'Size in MB', 'Device']]
    df_score.columns = ['Date', 'URL', 'Score', 'First Input Delay (FID)', 'Interaction to Next Paint (INP)',
                        'Time to First Byte (TTFB)', 'First Contentful Paint (FCP)', 'Speed Index (SI)',
                        'Largest Contentful Paint (LCP)', 'Time to Interactive (TTI)', 'Total Blocking Time (TBT)',
                        'Cumulative Layout Shift (CLS)', 'Size in MB', 'Device']

    return df_score
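## Example: fetching the metrics for a single URL in isolation (an illustrative
## sketch; main() below is the intended entry point):
##
##     async def demo():
##         async with aiohttp.ClientSession() as session:
##             df = await webcorevitals(session, 'https://www.example.com', 'mobile', category, today)
##             print(df.iloc[0])
##
##     asyncio.run(demo())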
# =============================================================================
# # Run the Requests and pass the DataFrame to an Excel File
# =============================================================================

## Run the mobile and desktop requests for every URL concurrently
async def main():
    async with aiohttp.ClientSession() as session:
        tasks = []
        for url in url_list:
            tasks.append(webcorevitals(session, url, 'mobile', category, today))
            tasks.append(webcorevitals(session, url, 'desktop', category, today))

        results = await asyncio.gather(*tasks)

    ## Combine the per-request DataFrames into a single result
    df_final = pd.concat(results, ignore_index=True)

    ## Save the DataFrame to an Excel file (pandas uses openpyxl for .xlsx)
    df_final.to_excel('output.xlsx', index=False)

if __name__ == '__main__':
    ## asyncio.run creates, runs, and closes the event loop for us
    asyncio.run(main())
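## Note: the PageSpeed Insights API enforces per-key request quotas. If a large
## url_list triggers rate-limit errors, one option (an illustrative sketch, not
## part of the original flow) is to cap concurrency with an asyncio.Semaphore:
##
##     sem = asyncio.Semaphore(4)  # at most 4 requests in flight at once
##
##     async def limited(session, url, device):
##         async with sem:
##             return await webcorevitals(session, url, device, category, today)
##
## ...and then build the task list in main() with limited(...) instead of
## calling webcorevitals(...) directly.
--------------------------------------------------------------------------------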