# Guide to Scraping Data from Website to Excel with Web Query

[![Oxylabs promo code](https://raw.githubusercontent.com/oxylabs/product-integrations/refs/heads/master/Affiliate-Universal-1090x275.png)](https://oxylabs.io/pages/gitoxy?utm_source=877&utm_medium=affiliate&groupid=877&utm_content=web-scraping-excel-web-query-github&transaction_id=102f49063ab94276ae8f116d224b67)

[![](https://dcbadge.limes.pink/api/server/Pds3gBmKMH?style=for-the-badge&theme=discord)](https://discord.gg/Pds3gBmKMH) [![YouTube](https://img.shields.io/badge/YouTube-Oxylabs-red?style=for-the-badge&logo=youtube&logoColor=white)](https://www.youtube.com/@oxylabs)

In this article, we will learn how to build a web scraper in Excel with Web Query. We will first explore the basics of web scraping in Excel, and then use Web Query to scrape data from a website into a spreadsheet. Let's get started.

## How web scraping in Excel works

Microsoft Excel has a powerful built-in feature for extracting data from websites, called `Web Query`. Web Query lets users scrape data from the Internet automatically or with a few button clicks. If your target website contains a static table and you have a computer running Microsoft Excel with an active Internet connection, you are good to go!
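
A table is "static" when it is already present in the HTML that the server sends, rather than being built later by JavaScript in the browser. One rough way to check this for a target page, outside Excel, is to download its raw HTML and count the `<table>` elements in it. The sketch below does this with Python's standard library only; the script and function names are hypothetical, and a count of zero only suggests (it does not prove) that the page builds its tables with JavaScript.

```python
"""Count the <table> elements in a page's raw HTML (a rough "is it static?" check)."""
import sys
import urllib.request
from html.parser import HTMLParser


class TableCounter(HTMLParser):
    """Counts <table> start tags; HTMLParser never executes JavaScript."""

    def __init__(self) -> None:
        super().__init__()
        self.tables = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag names in lowercase, so <TABLE> matches too.
        if tag == "table":
            self.tables += 1


def count_static_tables(html: str) -> int:
    """Return how many <table> elements appear in the given HTML source."""
    counter = TableCounter()
    counter.feed(html)
    counter.close()
    return counter.tables


def fetch_html(url: str) -> str:
    """Download the HTML exactly as the server sends it (no JavaScript runs)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage (hypothetical script name): python count_tables.py <url>
    found = count_static_tables(fetch_html(sys.argv[1]))
    print(f"{found} <table> element(s) in the raw HTML")
```

Run it as, for example, `python count_tables.py <url>`; without a URL argument, the script only defines its functions.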

Web Query uses the operating system's built-in web browser (usually Internet Explorer on older versions of Windows and Edge on newer ones) to load the website. This gives Web Query the ability to render many JavaScript-based websites. Once the website finishes loading, Web Query automatically parses the whole page, finds all the static HTML tables on it, and highlights them so that they can be selected. Once the user picks the desired table, Web Query extracts its data without altering the table's format or dimensions.

## How to scrape website data using Excel web queries

Now that we've learned what Web Query is, let's scrape some data from the Internet. We will scrape the `books.toscrape.com` website, extract a book's data, and save it to an Excel spreadsheet. We will also explore various Web Query features along the way. Before we begin, we need to make sure that:

- We have an active Internet connection.
- We have Microsoft Office installed so that we can use Microsoft Excel. If you don't have Microsoft Office, you can download and install it from [here](https://support.microsoft.com/en-us/office/download-and-install-or-reinstall-microsoft-365-or-office-2021-on-a-pc-or-mac-4414eaaf-0478-48be-9c42-23adc4716658).

Once you have Microsoft Office installed, follow the steps below:

### Step 1: Open a blank spreadsheet

Open a blank spreadsheet in Microsoft Excel and click `Data` in the menu.

![Go to data](Images/1_gotodata.png)

### Step 2: Click From Web

Clicking `Data` shows new menu items with a variety of options for importing data. Pick `From Web` and click it. This opens a new window.

![Click from Web](Images/2_clickfromweb.png)

### Step 3: Type the URL in the address bar and click Go

You will see an address bar in the `New Web Query` window. In this text box, type the URL `https://books.toscrape.com` and click `Go`.

![Go to website](Images/3_gotowebsite.png)

### Step 4: Navigate to the book page

The website loads in the mini web browser, where you can interact with it and browse as usual. Scroll down and click a book link to open that book's page.

![Scroll to book](Images/4.scrolltobook.png)

### Step 5: Select the desired table to scrape

Scroll down a bit on the book page and you will find a table. Next to it is a small yellow arrow icon; clicking it selects the table associated with it. Once the table is selected, click the `Import` button at the bottom of the window.

![Click on import](Images/5_clickimport.png)

### Step 6: Select "Existing worksheet" and click OK

When you click the `Import` button, a small window similar to the screenshot below appears. Make sure `Existing worksheet` is selected and click `OK`:

![Select Existing Worksheet](Images/6_existingworksheet.png)

And that's it! Web Query runs the query in a background process and fetches the website. It then parses the table and extracts the text into Excel columns. The output will look similar to the screenshot below.

## Output

You can compare it with the website to validate that all the data from the table is correct.

![output](Images/7_extracteddata.png)

All the columns and rows are linked to the web query, so whenever we refresh the data, manually or automatically, Microsoft Excel knows which rows and columns to update. In the next section, we will explore multiple ways to refresh and update the data.
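
Outside Excel, the flow of Steps 3 to 6 (load a page, pick a table, copy its cells into rows and columns) can be approximated in Python. The sketch below is an illustration under stated assumptions, not how Web Query works internally: it assumes the wanted table is the first `<table>` on the page, it uses only the standard library, and the script and function names are hypothetical.

```python
"""Rough Python analogue of Steps 3-6: fetch a page, take its first table, save it as CSV."""
import csv
import sys
import urllib.request
from html.parser import HTMLParser


class FirstTableParser(HTMLParser):
    """Collects the text of each <th>/<td> cell, row by row, from the first <table>."""

    def __init__(self) -> None:
        super().__init__()
        self.rows = []      # one list of cell texts per <tr>
        self._depth = 0     # table nesting depth; 1 = directly inside the first table
        self._done = False  # True once the first table has been closed
        self._cell = None   # text chunks of the open <th>/<td>, else None

    def _flush_cell(self) -> None:
        # Store the open cell with whitespace collapsed; tolerates omitted </td>.
        if self._cell is not None:
            self.rows[-1].append(" ".join("".join(self._cell).split()))
            self._cell = None

    def handle_starttag(self, tag, attrs):
        if self._done:
            return
        if tag == "table":
            self._depth += 1
        elif self._depth == 1 and tag == "tr":
            self._flush_cell()
            self.rows.append([])
        elif self._depth == 1 and tag in ("th", "td") and self.rows:
            self._flush_cell()
            self._cell = []

    def handle_endtag(self, tag):
        if self._done or self._depth == 0:
            return
        if tag == "table":
            if self._depth == 1:
                self._flush_cell()
                self._done = True
            self._depth -= 1
        elif self._depth == 1 and tag in ("th", "td", "tr"):
            self._flush_cell()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def extract_first_table(html: str) -> list:
    """Return the first table's rows as lists of cell strings."""
    parser = FirstTableParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def fetch_html(url: str) -> str:
    with urllib.request.urlopen(url, timeout=30) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def save_csv(rows: list, path: str) -> None:
    # utf-8-sig writes a BOM so that Excel detects UTF-8 (e.g. for the "£" sign).
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        csv.writer(f).writerows(rows)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical script name): python book_table.py <book-page-url> <output.csv>
    save_csv(extract_first_table(fetch_html(sys.argv[1])), sys.argv[2])
```

Running `python book_table.py <book-page-url> book.csv` with the URL of the book page opened in Step 4 writes one CSV row per table row, which Excel can then open. Unlike a web query, the CSV is not linked to the website, so refreshing the data means running the script again.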

## How to Update and Refresh Data

There are two main ways to refresh a Web Query:

- Automatic
- Manual

In automatic mode, Excel periodically pulls the data in the background and keeps the sheet up to date. We can customize the refresh interval, as we will see in a few moments. Before that, let's explore several ways of refreshing the data manually.

First, let's delete a few items from the existing data so that we can verify that the refresh works as expected.

![delete a few data](Images/8_cleardata.png)

Notice that we removed the price, tax, and availability values.

### Option 1: Click Refresh All from the top menu

In the `Data` tab of the top menu, there is a `Refresh All` button. We can pull the latest data by clicking this button or by pressing the shortcut `Ctrl + Alt + F5`. After clicking it, the missing price, tax, and availability values are filled in again with the latest data from the website.

![option1](Images/9_refreshmanual.png)

### Option 2: Refresh from the context menu

Let's remove a few fields again to test an alternative way of refreshing the data. This time, we will use the `Refresh` item of the context menu instead of the menu button. Right-click a cell and select `Refresh`. Note that we have to right-click a cell that is updated by the web query; for cells that are not part of the web query, the `Refresh` item won't show up.

![option 2](Images/12_refreshclick.png)

### Option 3: Edit & Rerun the Query

Now we will remove some fields again and try another method of updating the data. After removing some data, right-click a cell of the query range; the context menu shows an `Edit Query` option.

![option 3 a](Images/10_rightclick.png)

Click on it.

This opens a new `Edit Web Query` window. If we click `Import` in this window, Web Query runs a background process that fetches the latest data from the website and refreshes the existing range, replacing the old data with the new.

![option 3 b](Images/11_clickimport.png)

This method is useful if we want to modify the scraper, e.g., change the website URL or update the query to fetch a different table or page.

Now, let's learn how to avoid manual refreshes and automate the whole refresh process.

## Configure Automatic Refresh from Properties

The steps are simple. First, open the context menu again by right-clicking a cell associated with a web query. From the menu, select `Data Range Properties`.

![data range properties](Images/13_datarange.png)

This opens the `External Data Range Properties` window, similar to the one below:

![data range properties](Images/14_refresh_control.png)

Here we are looking for the refresh control options. Check `Enable background refresh` so that refreshes run in the background while you keep working in the workbook. Then use the second checkbox, `Refresh every`, to tell Excel to refresh the data periodically. For example, if we set it to refresh every 5 minutes, Excel will pull the data every five minutes in a background process and update the table automatically, without us having to click any buttons.

If we check the third checkbox, `Refresh data when opening the file`, Excel will pull fresh data every time we open the spreadsheet.

## Conclusion

Let's revisit what we've learned so far. Web Query makes web data extraction a breeze in Excel, especially for websites with tables. It enables us to automate simple tasks and extract web data with little or no interaction.
Web Query also allows us to scrape data from dynamic websites that use JavaScript.

Before wrapping up, we must note a few things. Web Query is not suitable for developing custom, sophisticated web scrapers. For example, web scrapers that require logging in, interacting with buttons or other web elements, or integrating proxies for large-scale scraping are almost impossible to build with Web Query.

In such cases, we have multiple alternatives, such as developing a web scraper in Python, JavaScript, or Go. Python in particular is popular for developing large-scale, cross-platform web scrapers. Combining Google Apps Script with Google Sheets is another option. Last but not least, we can also use VBA scripts to interact with websites from Excel; however, VBA is not as flexible as the other options mentioned above.
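
To give a feel for the Python route, the sketch below covers two things discussed in this guide: re-fetching a page on a fixed interval, like Excel's periodic refresh option described earlier, and sending requests through a proxy, which is hard to do with Web Query. It uses only Python's standard library; the function names, the five-minute interval, and the proxy URL format are illustrative assumptions.

```python
"""Sketch: periodic re-scraping with an optional proxy, done in Python instead of Excel."""
import sys
import time
import urllib.request
from typing import Callable, Iterator, Optional, TypeVar

T = TypeVar("T")


def make_opener(proxy_url: Optional[str] = None) -> urllib.request.OpenerDirector:
    """Build a URL opener; if proxy_url is given (e.g. "http://host:port"), use it for HTTP and HTTPS."""
    handlers = []
    if proxy_url:
        handlers.append(urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url}))
    return urllib.request.build_opener(*handlers)


def refresh_every(fetch: Callable[[], T], interval_seconds: float,
                  runs: Optional[int] = None) -> Iterator[T]:
    """Yield fetch() results, sleeping interval_seconds between calls (forever if runs is None)."""
    count = 0
    while runs is None or count < runs:
        if count:
            time.sleep(interval_seconds)
        yield fetch()
        count += 1


if __name__ == "__main__" and len(sys.argv) >= 2:
    # Usage (hypothetical script name): python refresh_scraper.py <url> [proxy-url]
    url = sys.argv[1]
    opener = make_opener(sys.argv[2] if len(sys.argv) > 2 else None)

    def fetch_page() -> str:
        with opener.open(url, timeout=30) as response:
            return response.read().decode("utf-8", errors="replace")

    # Five-minute interval, mirroring the example above; runs until interrupted.
    for html in refresh_every(fetch_page, interval_seconds=5 * 60):
        print(time.strftime("%H:%M:%S"), f"fetched {len(html)} characters")
```

For example, `python refresh_scraper.py <url> http://host:port` re-downloads the page every five minutes through the given proxy until interrupted; parsing and saving the data would go inside the loop.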