├── website_assets
    └── README_NOTES.txt
├── LICENSE
├── extract_site.py
└── README.md


/website_assets/README_NOTES.txt:
--------------------------------------------------------------------------------
🚀 Website Extractor Output 🚀

The extracted website files (HTML, CSS, JS) are saved in this folder.

📂 Steps:
1️⃣ Copy the extracted files.
2️⃣ Paste them into your own project.
3️⃣ Modify as needed. 🎨

✅ Happy coding! 😊

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2025 Lakshitha Madumal

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

--------------------------------------------------------------------------------
/extract_site.py:
--------------------------------------------------------------------------------
import os
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin, urlparse

# Target website URL (replace the placeholder before running)
url = "Target-Website-URL"

# Create folder for saving files
os.makedirs("website_assets", exist_ok=True)

# Download a single file into the given folder
def download_file(file_url, folder):
    parsed_url = urlparse(file_url)
    filename = os.path.basename(parsed_url.path)
    if not filename:
        # URLs whose path ends in "/" have no file name to save under
        print(f"Skipped (no file name): {file_url}")
        return
    save_path = os.path.join(folder, filename)

    try:
        response = requests.get(file_url, timeout=30)
        response.raise_for_status()  # Raise on 4xx/5xx responses
        with open(save_path, "wb") as file:
            file.write(response.content)
        print(f"Downloaded: {filename}")
    except (requests.RequestException, OSError) as e:
        print(f"Failed to download {file_url}: {e}")

# Step 1: Get HTML content
response = requests.get(url, timeout=30)
response.raise_for_status()  # Stop early if the page itself cannot be fetched
soup = BeautifulSoup(response.text, "html.parser")

# Save HTML
html_file = os.path.join("website_assets", "index.html")
with open(html_file, "w", encoding="utf-8") as file:
    file.write(soup.prettify())
print("HTML Saved!")

# Step 2: Find and download CSS files
css_folder = os.path.join("website_assets", "css")
os.makedirs(css_folder, exist_ok=True)

# href=True skips <link> tags that have no href attribute
for css_link in soup.find_all("link", rel="stylesheet", href=True):
    css_url = urljoin(url, css_link.get("href"))
    download_file(css_url, css_folder)

# Step 3: Find and download JavaScript files
js_folder = os.path.join("website_assets", "js")
os.makedirs(js_folder, exist_ok=True)

for script in soup.find_all("script", src=True):
    js_url = urljoin(url, script.get("src"))
    download_file(js_url, js_folder)

print("✅ Extraction Complete! HTML, CSS, and JS saved in 'website_assets' folder.")

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# 🚀 Website Extractor

This Python script downloads the HTML of a given web page together with the CSS and JavaScript files it links to. 🌍✨

## 🎯 Features
- 📜 Downloads the main HTML file of the target website.
- 🎨 Extracts and downloads linked CSS files.
- 🛠️ Extracts and downloads JavaScript files.
- 📂 Saves all assets inside the `website_assets` folder for easy access.
- 📝 Ships with a simple text file (`README_NOTES.txt`) for guidance.

## 🔧 Prerequisites
Before running the script, make sure the following is installed on your system:
- **🐍 Python 3** (You can check by running `python --version` in the terminal.)

## 📥 Installation
1. Install the required Python libraries by running:
   ```bash
   pip install requests beautifulsoup4
   ```

## ▶️ Usage
1. Open the `extract_site.py` file and replace `Target-Website-URL` with the URL of the website you want to extract.
2. Run the script using the following command:
   ```bash
   python extract_site.py
   ```
3. If the run succeeds, the extracted files (HTML, CSS, and JS) will be saved inside the `website_assets` folder. ✅
4. Navigate to `website_assets` and copy the extracted files as needed. 📁
5. Open the `README_NOTES.txt` inside `website_assets` for further guidance.
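
For reference, the way the script turns an `href` or `src` value into a download URL and a local file name can be sketched in isolation. This is a minimal illustration, not part of `extract_site.py`: the helper name `resolve_asset` and the `example.com` / `example.org` URLs are placeholders chosen here, and only the standard-library calls the script already uses (`urljoin`, `urlparse`, `os.path.basename`) are involved.

```python
import os
from urllib.parse import urljoin, urlparse


def resolve_asset(page_url, ref):
    """Return (absolute_url, local_filename) for an href/src value found on page_url.

    Hypothetical helper for illustration only.
    """
    # Relative references are resolved against the page URL.
    absolute_url = urljoin(page_url, ref)
    # Only the URL path is used for the name, so query strings such as ?v=3 are dropped.
    filename = os.path.basename(urlparse(absolute_url).path)
    return absolute_url, filename


if __name__ == "__main__":
    page = "https://example.com/blog/index.html"  # placeholder page URL
    refs = ["css/style.css", "/static/app.js?v=3", "https://cdn.example.org/lib.min.js"]
    for ref in refs:
        print(resolve_asset(page, ref))
```

Because only the base name is kept, two assets with the same file name in different remote directories (for example `a/main.css` and `b/main.css`) would be written to the same local path, and the later download would overwrite the earlier one.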

## 📂 Output
After execution, the `website_assets` folder will have the following structure (`README_NOTES.txt` already ships with the repository; the script adds the rest):
```
website_assets/
│-- index.html          # Extracted HTML file
│-- css/                # Folder containing CSS files
│-- js/                 # Folder containing JS files
│-- README_NOTES.txt    # Simple text file for guidance
```

## 📜 README_NOTES.txt Content
A simple text file is included in `website_assets/` with the following content:
```
🚀 Website Extractor Output 🚀

The extracted website files (HTML, CSS, JS) are saved in this folder.

📂 Steps:
1️⃣ Copy the extracted files.
2️⃣ Paste them into your own project.
3️⃣ Modify as needed. 🎨

✅ Happy coding! 😊
```

## ⚠️ Notes
- 🔍 This script does not download images or other assets (like fonts or videos) from the website.
- 🚨 Ensure that you have permission to scrape the target website before using this script.

## 💡 Credits
Developed with ❤️ by **Lsky**
📧 Email: [mandujayaweera2003@gmail.com](mailto:mandujayaweera2003@gmail.com)

## 📜 License
This project is licensed under the **MIT License**. 📝

✅ Happy coding! 🚀

--------------------------------------------------------------------------------