├── README.md
└── code

/README.md:
--------------------------------------------------------------------------------
# Automated ETL Pipeline for Weather Data

## Overview
This project is an **Automated ETL (Extract, Transform, Load) Pipeline** that collects, processes, and stores weather data from various sources, making the results readily available for analytics, reporting, and visualization.

## Features
- **Automated Data Extraction**: Retrieves weather data from APIs, CSV files, or databases.
- **Data Transformation**: Cleans, formats, and enriches the data to ensure consistency.
- **Efficient Data Loading**: Stores the processed data in a database or a cloud storage solution.
- **Scheduled Execution**: Uses cron jobs or workflow orchestration tools for automation.
- **Logging & Monitoring**: Tracks data processing stages and errors.

## Technologies Used
- **Python**: Main scripting language for ETL tasks.
- **Pandas**: Data manipulation and transformation.
- **SQL / PostgreSQL**: Database storage.
- **Apache Airflow / Prefect**: Workflow orchestration.
- **APIs / Web Scraping**: Data extraction from online sources.
- **AWS S3 / Google Cloud Storage** (optional): Cloud-based data storage.

## Installation
### Prerequisites
- Python 3.x
- PostgreSQL or another database service (optional)
- Required Python packages (listed in `requirements.txt`)

### Setup
1. Clone the repository:
   ```sh
   git clone https://github.com/yourusername/Automated-ETL-Pipeline-for-Weather-Data.git
   cd Automated-ETL-Pipeline-for-Weather-Data
   ```
2. Install dependencies:
   ```sh
   pip install -r requirements.txt
   ```
3. Configure the `.env` file with API keys, database credentials, and other settings.
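A minimal `.env` sketch is shown below. The variable names are illustrative assumptions (the repository does not specify them), so adapt them to whatever config loader you use:

```env
# Example .env — variable names are illustrative, not required by the code
OPENWEATHER_API_KEY=your_api_key_here
DB_HOST=localhost
DB_PORT=5432
DB_NAME=weather
DB_USER=etl_user
DB_PASSWORD=change_me
```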
## Usage
Run the ETL script manually:
```sh
python main.py
```
Or schedule the pipeline with Airflow, Prefect, or cron jobs.

## Project Structure
```
Automated-ETL-Pipeline-for-Weather-Data/
│-- src/
│   │-- extract.py        # Handles data extraction
│   │-- transform.py      # Cleans and processes data
│   │-- load.py           # Stores processed data
│   │-- config.py         # Configuration settings
│-- main.py               # Main execution script
│-- requirements.txt      # Dependencies
│-- README.md             # Project documentation
```

## Contributing
Feel free to contribute by opening an issue or submitting a pull request.

## License
This project is licensed under the MIT License.

## Contact
For any inquiries, reach out via desmondeteh@gmail.com.

--------------------------------------------------------------------------------
/code:
--------------------------------------------------------------------------------
import requests
import sqlite3
import pandas as pd
import datetime
import time
import logging
import random

# Set up logging
logging.basicConfig(
    filename="etl_pipeline.log",
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
)

# OpenWeatherMap API configuration
API_KEY = "your_api_key_here"  # Replace with your API key
CITY = "New York"
BASE_URL = "https://api.openweathermap.org/data/2.5/weather"

# SQLite database configuration
DB_NAME = "weather_data.db"
TABLE_NAME = "weather"

# Synthetic weather conditions used as a fallback when the API is unreachable
WEATHER_CONDITIONS = ["Clear", "Cloudy", "Rainy", "Stormy", "Snowy", "Foggy", "Windy"]


def extract_weather_data(city):
    """Extract weather data from the OpenWeatherMap API."""
    try:
        params = {"q": city, "appid": API_KEY, "units": "metric"}
        # A timeout prevents the pipeline from hanging on a stalled connection
        response = requests.get(BASE_URL, params=params, timeout=10)
        response.raise_for_status()
        data = response.json()
        logging.info(f"Successfully extracted weather data for {city}.")
        return data
    except requests.exceptions.RequestException as e:
        logging.error(f"API request failed. Generating synthetic data. Error: {e}")
        return generate_synthetic_data(city)


def generate_synthetic_data(city):
    """Generate synthetic weather data when the API call fails."""
    logging.info(f"Generating synthetic weather data for {city}.")
    return {
        "name": city,
        "main": {
            "temp": round(random.uniform(-10, 40), 2),  # Temperature (-10 to 40 °C)
            "humidity": random.randint(10, 100),        # Humidity (10-100%)
            "pressure": random.randint(980, 1050),      # Pressure (980-1050 hPa)
        },
        "weather": [{"description": random.choice(WEATHER_CONDITIONS)}],
        "wind": {"speed": round(random.uniform(0, 20), 2)},  # Wind speed (0-20 m/s)
    }


def transform_weather_data(data):
    """Transform the extracted weather data into a structured format."""
    try:
        if not data:
            return None

        transformed_data = {
            "city": data["name"],
            "temperature": data["main"]["temp"],
            "humidity": data["main"]["humidity"],
            "pressure": data["main"]["pressure"],
            "weather": data["weather"][0]["description"],
            "wind_speed": data["wind"]["speed"],
            "date_time": datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
        }

        df = pd.DataFrame([transformed_data])
        logging.info("Successfully transformed weather data.")
        return df
    except Exception as e:
        logging.error(f"Error transforming weather data: {e}")
        return None


def load_data_to_db(df, db_name, table_name):
    """Load the transformed data into an SQLite database."""
    try:
        conn = sqlite3.connect(db_name)
        df.to_sql(table_name, conn, if_exists="append", index=False)
        conn.close()
        logging.info("Successfully loaded data into database.")
    except Exception as e:
        logging.error(f"Error loading data into database: {e}")


def run_etl():
    """Run the ETL pipeline once: extract, transform, load."""
    logging.info("ETL pipeline started.")
    data = extract_weather_data(CITY)
    transformed_data = transform_weather_data(data)
    if transformed_data is not None:
        load_data_to_db(transformed_data, DB_NAME, TABLE_NAME)
    logging.info("ETL pipeline completed.")


if __name__ == "__main__":
    while True:
        run_etl()
        time.sleep(3600)  # Run once per hour
--------------------------------------------------------------------------------
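Once the pipeline has run, the accumulated rows can be inspected directly with the standard-library `sqlite3` module. A minimal sketch follows; the helper name `fetch_latest_readings` is hypothetical, but the table and column names match those the script's `load_data_to_db` step creates:

```python
import sqlite3


def fetch_latest_readings(db_name, table_name, limit=5):
    """Return the most recent weather rows as a list of dicts,
    newest first (ordered by the date_time column)."""
    conn = sqlite3.connect(db_name)
    conn.row_factory = sqlite3.Row  # access columns by name
    try:
        rows = conn.execute(
            f"SELECT * FROM {table_name} ORDER BY date_time DESC LIMIT ?",
            (limit,),
        ).fetchall()
        return [dict(r) for r in rows]
    finally:
        conn.close()
```

For example, `fetch_latest_readings("weather_data.db", "weather", limit=3)` would return the three most recent hourly readings appended by the pipeline.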