├── website_assets
│   └── README_NOTES.txt
├── LICENSE
├── extract_site.py
└── README.md


--------------------------------------------------------------------------------
/website_assets/README_NOTES.txt:
--------------------------------------------------------------------------------
🚀 Website Extractor Output 🚀

The extracted website files (HTML, CSS, JS) are saved in this folder.

📂 Steps:
1️⃣ Copy the extracted files.
2️⃣ Paste them into your own project.
3️⃣ Modify as needed. 🎨

✅ Happy coding! 😊



--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2025 Lakshitha Madumal

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.



--------------------------------------------------------------------------------
/extract_site.py:
--------------------------------------------------------------------------------
import os
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup

# Target website URL: replace with the full URL, e.g. "https://example.com/"
url = "Target-Website-URL"

# Folder where all extracted files are saved
OUTPUT_DIR = "website_assets"
# Seconds to wait for a server response before giving up
TIMEOUT = 15


# Function to download a file; returns True on success
def download_file(file_url, folder, default_name):
    parsed_url = urlparse(file_url)
    # A path ending in "/" has no file name, so fall back to default_name
    filename = os.path.basename(parsed_url.path) or default_name
    save_path = os.path.join(folder, filename)

    try:
        response = requests.get(file_url, timeout=TIMEOUT)
        response.raise_for_status()  # Raise an error for 4xx/5xx responses
        with open(save_path, "wb") as file:
            file.write(response.content)
    except (requests.RequestException, OSError) as e:
        print(f"Failed to download {file_url}: {e}")
        return False

    print(f"Downloaded: {filename}")
    return True


def main():
    os.makedirs(OUTPUT_DIR, exist_ok=True)

    # Step 1: Get HTML content (stop here if the page itself cannot be fetched)
    response = requests.get(url, timeout=TIMEOUT)
    response.raise_for_status()
    # Pass raw bytes so BeautifulSoup can detect the page's encoding itself
    soup = BeautifulSoup(response.content, "html.parser")

    # Save a formatted copy of the HTML
    html_file = os.path.join(OUTPUT_DIR, "index.html")
    with open(html_file, "w", encoding="utf-8") as file:
        file.write(soup.prettify())
    print("HTML Saved!")

    # Step 2: Find and download CSS files (href=True skips <link> tags without an href)
    css_folder = os.path.join(OUTPUT_DIR, "css")
    os.makedirs(css_folder, exist_ok=True)
    for css_link in soup.find_all("link", rel="stylesheet", href=True):
        css_url = urljoin(url, css_link["href"])
        download_file(css_url, css_folder, "style.css")

    # Step 3: Find and download external JavaScript files
    js_folder = os.path.join(OUTPUT_DIR, "js")
    os.makedirs(js_folder, exist_ok=True)
    for script in soup.find_all("script", src=True):
        js_url = urljoin(url, script["src"])
        download_file(js_url, js_folder, "script.js")

    print(f"✅ Extraction Complete! HTML, CSS, and JS saved in '{OUTPUT_DIR}' folder.")


if __name__ == "__main__":
    main()



--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# 🚀 Website Extractor

This Python script downloads the HTML of a single web page, along with the external CSS and JavaScript files it links to. 🌍✨

## 🎯 Features
- 📜 Downloads the HTML of the target page and saves a formatted copy as `index.html`.
- 🎨 Downloads stylesheets linked with `<link rel="stylesheet" href="...">`.
- 🛠️ Downloads external scripts loaded with `<script src="...">`.
- 📂 Saves everything in the `website_assets` folder (CSS in `css/`, JS in `js/`).
- 📝 Comes with a short `README_NOTES.txt` guidance file inside `website_assets`.

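Both asset types are found by scanning the page's tags; `extract_site.py` does this with BeautifulSoup's `find_all`. As a dependency-free illustration of the same idea, the sketch below swaps in Python's built-in `html.parser.HTMLParser`. The `AssetFinder` class and the sample HTML are hypothetical and are not part of the script.

```python
from html.parser import HTMLParser


class AssetFinder(HTMLParser):
    """Collect stylesheet hrefs and script srcs from an HTML document."""

    def __init__(self):
        super().__init__()
        self.css = []
        self.js = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # attrs arrives as a list of (name, value) pairs
        # rel may hold several space-separated values, e.g. "preload stylesheet"
        rel = (attrs.get("rel") or "").lower().split()
        if tag == "link" and "stylesheet" in rel and attrs.get("href"):
            self.css.append(attrs["href"])
        elif tag == "script" and attrs.get("src"):
            # Inline <script> blocks have no src and are skipped
            self.js.append(attrs["src"])


sample = """<html><head>
<link rel="stylesheet" href="css/main.css">
<link rel="icon" href="favicon.ico">
<script src="/js/app.js"></script>
<script>console.log("inline scripts have no src")</script>
</head></html>"""

finder = AssetFinder()
finder.feed(sample)
print(finder.css)  # ['css/main.css']
print(finder.js)   # ['/js/app.js']
```

Splitting `rel` before checking for `stylesheet` mirrors the fact that BeautifulSoup treats `rel` as a multi-valued attribute, so its `rel="stylesheet"` filter also matches tags that list other values alongside it.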
## 🔧 Prerequisites
Before running the script, make sure the following is installed on your system:
- **🐍 Python 3** with `pip`. You can check by running `python --version` in the terminal.

## 📥 Installation
1. Install the required Python libraries by running:
   ```bash
   pip install requests beautifulsoup4
   ```

## ▶️ Usage
1. Open `extract_site.py` and replace `Target-Website-URL` with the full URL of the page you want to extract, including the scheme (for example, `https://example.com/`).
2. Run the script:
   ```bash
   python extract_site.py
   ```
3. If the requests succeed, the HTML, CSS, and JS files are saved inside the `website_assets` folder. ✅ Any CSS or JS file that fails to download is reported in the terminal and skipped.
4. Navigate to `website_assets` and copy the files you need. 📁
5. Open `README_NOTES.txt` inside `website_assets` for further guidance.

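Relative `href` and `src` values are resolved against the page URL with `urljoin`, and each download is named after the last segment of its URL path. The snippet below illustrates both rules using only the standard library. The URLs are made-up examples, and `local_name` is a hypothetical helper; its `default_name` fallback for URLs ending in `/` is an assumption of this sketch.

```python
import os
from urllib.parse import urljoin, urlparse

# Hypothetical page URL standing in for the script's target URL
page_url = "https://example.com/blog/post.html"


def local_name(asset_url, default_name="asset"):
    """Return the file name used for a download: the last path segment.

    Query strings and fragments are not part of the path, so they are dropped.
    A URL ending in "/" has no last segment, so fall back to default_name
    (an assumption of this sketch).
    """
    return os.path.basename(urlparse(asset_url).path) or default_name


for ref in ["css/site.css?v=3", "/static/app.js", "https://cdn.example.net/lib/"]:
    absolute = urljoin(page_url, ref)
    print(absolute, "->", local_name(absolute))
```

Relative paths such as `css/site.css` resolve against the page's directory (`/blog/`), root-relative paths such as `/static/app.js` resolve against the domain, and absolute URLs pass through unchanged.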
## 📂 Output
After a successful run, the `website_assets` folder looks like this:
```
website_assets/
│-- index.html        # Extracted HTML file
│-- css/              # Downloaded CSS files
│-- js/               # Downloaded JS files
│-- README_NOTES.txt  # Guidance notes (included in the repository)
```

## 📜 README_NOTES.txt Content
The repository already includes `website_assets/README_NOTES.txt` (the script does not generate it). It contains:
```
🚀 Website Extractor Output 🚀

The extracted website files (HTML, CSS, JS) are saved in this folder.

📂 Steps:
1️⃣ Copy the extracted files.
2️⃣ Paste them into your own project.
3️⃣ Modify as needed. 🎨

✅ Happy coding! 😊
```

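## ⚠️ Notes
- 🔍 Only the single HTML page and its external CSS and JS files are downloaded. Images, fonts, videos, and files referenced from inside CSS (such as `url(...)` or `@import`) are not.
- 🔗 The saved `index.html` is not rewritten, so it still points to the original asset URLs rather than the local copies in `css/` and `js/`.
- 📛 Files are named after the last part of their URL path, so two assets with the same file name overwrite each other.
- 🚨 Make sure you have permission to download content from the target website, and respect its terms of service and `robots.txt`.
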
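One way to act on the permission note is to check the site's `robots.txt` before running the extractor. Below is a minimal sketch using the standard library's `urllib.robotparser`; the rules are invented for illustration and parsed offline. Against a real site you would instead call `set_url("https://<site>/robots.txt")` followed by `read()`. Keep in mind that `robots.txt` only expresses crawling preferences and does not replace the site's terms of service.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, parsed offline for illustration
rules = [
    "User-agent: *",
    "Disallow: /private/",
]

parser = RobotFileParser()
parser.parse(rules)

for page in ["https://example.com/index.html", "https://example.com/private/a.html"]:
    # can_fetch(user_agent, url) applies the parsed rules to the URL's path
    verdict = "allowed" if parser.can_fetch("*", page) else "disallowed"
    print(page, verdict)
```

With these rules, `index.html` is allowed and anything under `/private/` is disallowed for every user agent.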
## 💡 Credits
Developed with ❤️ by **Lsky**  
📧 Email: [mandujayaweera2003@gmail.com](mailto:mandujayaweera2003@gmail.com)

## 📜 License
This project is licensed under the **MIT License**. 📝

✅ Happy coding! 🚀



--------------------------------------------------------------------------------