├── AI-Based-Energy-Consumption-Profiling.xlsx ├── README.md └── file1 /AI-Based-Energy-Consumption-Profiling.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Beckybams/AI-Based-Energy-Consumption-Profiling/HEAD/AI-Based-Energy-Consumption-Profiling.xlsx -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | AI-Based-Energy-Consumption-ProfilingA compact, practical README for the AI-Based-Energy-Consumption-Profiling project. This repository contains code and notebooks to profile and analyze energy consumption patterns using machine learning — useful for demand-side energy management, anomaly detection, load forecasting, and customer segmentation.Table of ContentsProject OverviewFeaturesRepository StructureGetting StartedRequirementsInstallationDataExample SchemaPreparing Your DataUsageQuick Start (notebook)Train a Model (script)Profile & Export ResultsModeling & EvaluationExamplesContributingLicenseContactProject OverviewThis project uses machine learning to create energy consumption profiles from smart meter or building-level energy data. It extracts features, clusters users or devices into consumption archetypes, detects anomalies, and can produce short-term forecasts to inform demand-side management actions.FeaturesData ingestion and cleaning pipelineFeature extraction: daily/weekly patterns, peak/off-peak behavior, seasonality metricsClustering to generate consumption profiles (e.g., K-Means, DBSCAN)Anomaly detection module (statistical + ML-based)Simple forecasting baselines (ARIMA, XGBoost / LightGBM)Export results to CSV/Excel for reportingExample Jupyter notebooks for exploration and demonstrationRepository Structure. 2 | ├── data/ # example and synthetic datasets (do not store PII) 3 | ├── notebooks/ # exploratory notebooks and tutorials 4 | │ ├── 01_data_overview.ipynb 5 | │ ├── 02_feature_engineering.ipynb 6 | │ └── 03_clustering_profiling.ipynb 7 | ├── src/ # core python modules 8 | │ ├── ingest.py 9 | │ ├── features.py 10 | │ ├── clustering.py 11 | │ ├── anomaly.py 12 | │ └── forecasting.py 13 | ├── scripts/ # runnable scripts (train, eval, export) 14 | │ ├── train_model.py 15 | │ └── profile_export.py 16 | ├── tests/ # unit tests 17 | ├── requirements.txt 18 | ├── README.md # this file 19 | └── LICENSEGetting StartedRequirementsPython 3.9+Packages (see requirements.txt), typical examples:pandas, numpy, scikit-learn, scipymatplotlib (for plotting in notebooks)xgboost or lightgbm (optional, for forecasting)openpyxl (for Excel export)jupyterlab / notebookInstallationClone the repo:git clone https://your.git.repo/AI-Based-Energy-Consumption-Profiling.git 20 | cd AI-Based-Energy-Consumption-ProfilingCreate a virtual environment and install dependencies:python -m venv venv 21 | source venv/bin/activate # macOS / Linux 22 | venv\Scripts\activate # Windows 23 | pip install -r requirements.txtDataExample SchemaInput data should be a time-series table (CSV/Parquet) with a datetime column and one or more consumption columns. Example:timestampmeter_idconsumption_kwh2024-01-01 00:00:00MTR-0010.422024-01-01 00:15:00MTR-0010.38.........Minimum requirements:timestamp column (ISO format recommended)meter_id (or device_id / building_id) for multi-entity datasetsconsumption column(s) (kWh, W, etc.)Preparing Your DataResample to regular intervals (e.g., 15min, 1H) if necessary.Handle missing data: interpolation, forward/backfill, or masking.Normalize units (kW vs kWh) and timezone-aware timestamps.Split into train/validation/test for forecasting tasks.UsageQuick Start (notebook)Open notebooks/01_data_overview.ipynb and follow the step-by-step guided analysis:jupyter lab 24 | # then open the notebook in the browserTrain a Model (script)Example to train clustering and profiling:python scripts/train_model.py \ 25 | --data data/example_consumption.csv \ 26 | --outdir outputs/ \ 27 | --freq 15min \ 28 | --model clusteringScript options:--data: path to CSV or Parquet`-- 29 | -------------------------------------------------------------------------------- /file1: -------------------------------------------------------------------------------- 1 | """ 2 | AI-Based-Energy-Consumption-Profiling 3 | File: AI-Based-Energy-Consumption-Profiling.py 4 | 5 | What this script does: 6 | - Generates synthetic household energy consumption data (hourly aggregated to daily and 24-hour profiles). 7 | - Performs feature engineering to create household-level features and daily load profiles. 8 | - Uses clustering (KMeans) to create consumption profiles (i.e., consumer segments). 9 | - Trains a RandomForest classifier to predict the profile label from household features. 10 | - Produces visualizations and exports results to CSV / Excel. 11 | 12 | Requirements: 13 | - Python 3.8+ 14 | - pandas, numpy, scikit-learn, matplotlib, seaborn, openpyxl 15 | 16 | Run: python AI-Based-Energy-Consumption-Profiling.py 17 | """ 18 | 19 | import os 20 | import numpy as np 21 | import pandas as pd 22 | from sklearn.cluster import KMeans 23 | from sklearn.preprocessing import StandardScaler 24 | from sklearn.decomposition import PCA 25 | from sklearn.ensemble import RandomForestClassifier 26 | from sklearn.model_selection import train_test_split 27 | from sklearn.metrics import classification_report, confusion_matrix 28 | import matplotlib.pyplot as plt 29 | import seaborn as sns 30 | 31 | sns.set(style="whitegrid") 32 | np.random.seed(42) 33 | 34 | OUTPUT_DIR = "output" 35 | os.makedirs(OUTPUT_DIR, exist_ok=True) 36 | 37 | # ----------------------------- 38 | # 1) Synthetic data generation 39 | # ----------------------------- 40 | 41 | def generate_synthetic_households(num_households=500): 42 | """ 43 | Generate synthetic household meta-data. 44 | Returns a DataFrame with one row per household. 45 | """ 46 | household_ids = [f"H{idx:04d}" for idx in range(1, num_households + 1)] 47 | household_size = np.random.choice([1, 2, 3, 4, 5, 6], size=num_households, p=[0.15,0.25,0.25,0.2,0.1,0.05]) 48 | building_type = np.random.choice(["apartment", "detached", "semi-detached", "townhouse"], size=num_households, p=[0.35,0.35,0.15,0.15]) 49 | region = np.random.choice(["north","south","east","west","central"], size=num_households) 50 | hvac_efficiency = np.round(np.random.uniform(0.6, 1.0, size=num_households), 2) # 1.0 best 51 | has_solar = np.random.choice([0,1], size=num_households, p=[0.85,0.15]) 52 | base_usage = np.round(np.random.uniform(3,8,size=num_households),2) # kWh baseline per day 53 | 54 | households = pd.DataFrame({ 55 | "household_id": household_ids, 56 | "household_size": household_size, 57 | "building_type": building_type, 58 | "region": region, 59 | "hvac_efficiency": hvac_efficiency, 60 | "has_solar": has_solar, 61 | "base_usage": base_usage 62 | }) 63 | return households 64 | 65 | 66 | def generate_daily_profiles(households, days=365): 67 | """ 68 | For each household, generate `days` days of hourly consumption (24 values per day). 69 | We'll produce an aggregated daily consumption and also store the 24-hour profile. 70 | Returns a DataFrame with columns: household_id, date, total_kwh, profile_0..profile_23 71 | """ 72 | profiles = [] 73 | date_range = pd.date_range(end=pd.Timestamp.today().normalize(), periods=days) 74 | 75 | for _, row in households.iterrows(): 76 | hid = row.household_id 77 | size = row.household_size 78 | base = row.base_usage 79 | hvac_eff = row.hvac_efficiency 80 | solar = row.has_solar 81 | 82 | # Create a daily pattern template (morning peak, evening peak) 83 | # Base profile shape (24 hours) 84 | x = np.arange(24) 85 | morning_peak = 0.6 * np.exp(-0.5 * ((x - 7) / 2)**2) 86 | evening_peak = 1.0 * np.exp(-0.5 * ((x - 19) / 2.5)**2) 87 | night_usage = 0.2 * np.exp(-0.5 * ((x - 2) / 3)**2) 88 | workday_modifier = 1.0 89 | 90 | for d, date in enumerate(date_range): 91 | # Seasonal effect: simple sinusoidal over the year to simulate heating/cooling 92 | seasonality = 1.0 + 0.25 * np.sin(2 * np.pi * (d / days)) 93 | # weekday/weekend modifier 94 | weekday = date.weekday() < 5 95 | wd_factor = 1.0 if weekday else 0.9 96 | 97 | # temperature-driven HVAC effect (simulate hotter/cooler days randomly) 98 | temp_factor = 1.0 + np.random.normal(0, 0.05) 99 | 100 | # random daily variation 101 | noise = np.random.normal(1.0, 0.08, size=24) 102 | 103 | profile = (0.5 * morning_peak + 0.9 * evening_peak + night_usage) * (base * size / 3.0) 104 | profile = profile * seasonality * wd_factor * temp_factor * noise 105 | 106 | # solar generation subtracts from daytime usage 107 | if solar: 108 | solar_generation = np.maximum(0, 0.6 * np.exp(-0.5*((x-13)/3)**2)) # midday generation shape 109 | # solar reduces consumption between 8 and 16 110 | profile = profile - 0.7 * solar_generation * base 111 | profile = np.clip(profile, a_min=0.05, a_max=None) 112 | 113 | total_kwh = profile.sum() 114 | day_record = { 115 | "household_id": hid, 116 | "date": date, 117 | "total_kwh": total_kwh, 118 | "weekday": int(weekday) 119 | } 120 | # attach hourly columns 121 | for h in range(24): 122 | day_record[f"h_{h}"] = profile[h] 123 | 124 | profiles.append(day_record) 125 | 126 | profiles_df = pd.DataFrame(profiles) 127 | return profiles_df 128 | 129 | # ----------------------------- 130 | # 2) Create dataset 131 | # ----------------------------- 132 | 133 | print("Generating synthetic households...") 134 | households = generate_synthetic_households(num_households=200) 135 | print("Generating daily profiles (this may take a moment)...") 136 | daily = generate_daily_profiles(households, days=365) 137 | 138 | # Merge household meta into daily 139 | data = daily.merge(households, on="household_id", how="left") 140 | 141 | # ----------------------------- 142 | # 3) Feature engineering 143 | # ----------------------------- 144 | print("Feature engineering...") 145 | 146 | # Daily features 147 | data['hourly_profile'] = data[[f"h_{h}" for h in range(24)]].values.tolist() 148 | 149 | # Calculate simple statistics from 24-hour profile 150 | data['peak_kwh'] = data[[f"h_{h}" for h in range(24)]].max(axis=1) 151 | data['offpeak_kwh'] = data[[f"h_{h}" for h in range(24)]].min(axis=1) 152 | data['mean_hourly_kwh'] = data[[f"h_{h}" for h in range(24)]].mean(axis=1) 153 | 154 | data['evening_share'] = data[[f"h_{h}" for h in range(17,23)]].sum(axis=1) / data['total_kwh'] 155 | 156 | data['morning_share'] = data[[f"h_{h}" for h in range(5,10)]].sum(axis=1) / data['total_kwh'] 157 | 158 | # Household-level aggregates: average daily consumption and variability 159 | agg = data.groupby('household_id').agg( 160 | avg_daily_kwh=('total_kwh','mean'), 161 | std_daily_kwh=('total_kwh','std'), 162 | median_daily_kwh=('total_kwh','median') 163 | ).reset_index() 164 | 165 | household_features = households.merge(agg, on='household_id', how='left') 166 | 167 | # For clustering, we'll use normalized 24-hour profiles averaged per household 168 | profile_cols = [f"h_{h}" for h in range(24)] 169 | avg_profile = data.groupby('household_id')[profile_cols].mean().reset_index() 170 | avg_profile = avg_profile.merge(household_features, on='household_id', how='left') 171 | 172 | # Normalize profiles 173 | scaler = StandardScaler() 174 | profile_matrix = scaler.fit_transform(avg_profile[profile_cols].values) 175 | 176 | # ----------------------------- 177 | # 4) Clustering to create profiles 178 | # ----------------------------- 179 | print("Clustering to create consumption profiles...") 180 | 181 | n_clusters = 4 182 | kmeans = KMeans(n_clusters=n_clusters, random_state=42) 183 | profile_labels = kmeans.fit_predict(profile_matrix) 184 | avg_profile['profile_label'] = profile_labels 185 | 186 | # Add label to household_features 187 | household_features = household_features.merge(avg_profile[['household_id','profile_label']], on='household_id', how='left') 188 | 189 | # Save cluster centers (for interpretation) 190 | cluster_centers = scaler.inverse_transform(kmeans.cluster_centers_) 191 | cluster_centers_df = pd.DataFrame(cluster_centers, columns=profile_cols) 192 | cluster_centers_df['profile'] = [f"Profile_{i}" for i in range(n_clusters)] 193 | cluster_centers_df.to_csv(os.path.join(OUTPUT_DIR, 'cluster_centers.csv'), index=False) 194 | 195 | # ----------------------------- 196 | # 5) Train classifier to predict profile from household features 197 | # ----------------------------- 198 | print("Training classifier to predict profile labels from household features...") 199 | 200 | # Prepare features 201 | cat_cols = ['building_type','region','has_solar'] 202 | clf_df = household_features.copy() 203 | clf_df = pd.get_dummies(clf_df, columns=['building_type','region'], drop_first=True) 204 | 205 | feature_cols = ['household_size','hvac_efficiency','has_solar','avg_daily_kwh','std_daily_kwh','median_daily_kwh'] + \ 206 | [c for c in clf_df.columns if c.startswith('building_type_') or c.startswith('region_')] 207 | 208 | X = clf_df[feature_cols].fillna(0) 209 | y = clf_df['profile_label'] 210 | 211 | X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42, stratify=y) 212 | 213 | clf = RandomForestClassifier(n_estimators=200, random_state=42) 214 | clf.fit(X_train, y_train) 215 | 216 | y_pred = clf.predict(X_test) 217 | 218 | report = classification_report(y_test, y_pred, zero_division=0) 219 | cm = confusion_matrix(y_test, y_pred) 220 | 221 | with open(os.path.join(OUTPUT_DIR, 'classification_report.txt'), 'w') as f: 222 | f.write(report) 223 | 224 | # ----------------------------- 225 | # 6) Visualizations 226 | # ----------------------------- 227 | print("Generating visualizations...") 228 | 229 | # PCA of profiles colored by cluster 230 | pca = PCA(n_components=2) 231 | proj = pca.fit_transform(profile_matrix) 232 | proj_df = pd.DataFrame(proj, columns=['pc1','pc2']) 233 | proj_df['profile_label'] = profile_labels 234 | proj_df['household_id'] = avg_profile['household_id'].values 235 | 236 | plt.figure(figsize=(8,6)) 237 | for lbl in sorted(proj_df['profile_label'].unique()): 238 | subset = proj_df[proj_df['profile_label']==lbl] 239 | plt.scatter(subset['pc1'], subset['pc2'], label=f'Profile {lbl}', alpha=0.6) 240 | plt.legend() 241 | plt.title('PCA of average daily profiles by cluster') 242 | plt.xlabel('PC1') 243 | plt.ylabel('PC2') 244 | plt.tight_layout() 245 | plt.savefig(os.path.join(OUTPUT_DIR,'profiles_pca.png'), dpi=150) 246 | plt.close() 247 | 248 | # Plot cluster center load shapes 249 | plt.figure(figsize=(10,6)) 250 | for i in range(n_clusters): 251 | plt.plot(range(24), cluster_centers_df.loc[i, profile_cols].values, marker='o', label=f'Profile_{i}') 252 | plt.xlabel('Hour of day') 253 | plt.ylabel('kWh') 254 | plt.title('Cluster center daily load shapes') 255 | plt.legend() 256 | plt.tight_layout() 257 | plt.savefig(os.path.join(OUTPUT_DIR,'cluster_centers.png'), dpi=150) 258 | plt.close() 259 | 260 | # Confusion matrix heatmap 261 | plt.figure(figsize=(6,5)) 262 | sns.heatmap(cm, annot=True, fmt='d', cmap='Blues') 263 | plt.title('Confusion Matrix (profile prediction)') 264 | plt.xlabel('Predicted') 265 | plt.ylabel('Actual') 266 | plt.tight_layout() 267 | plt.savefig(os.path.join(OUTPUT_DIR,'confusion_matrix.png'), dpi=150) 268 | plt.close() 269 | 270 | # ----------------------------- 271 | # 7) Feature importance 272 | # ----------------------------- 273 | feat_imp = pd.Series(clf.feature_importances_, index=X.columns).sort_values(ascending=False) 274 | feat_imp.to_csv(os.path.join(OUTPUT_DIR,'feature_importances.csv')) 275 | 276 | plt.figure(figsize=(8,6)) 277 | feat_imp.head(15).plot(kind='bar') 278 | plt.title('Top feature importances for profile prediction') 279 | plt.tight_layout() 280 | plt.savefig(os.path.join(OUTPUT_DIR,'feature_importances.png'), dpi=150) 281 | plt.close() 282 | 283 | # ----------------------------- 284 | # 8) Save outputs (CSV / Excel) 285 | # ----------------------------- 286 | print("Saving outputs to the output/ folder...") 287 | 288 | # household features with labels 289 | household_features.to_csv(os.path.join(OUTPUT_DIR,'household_features_with_profiles.csv'), index=False) 290 | # example daily data sample 291 | data.sample(500, random_state=42).to_csv(os.path.join(OUTPUT_DIR,'daily_sample.csv'), index=False) 292 | 293 | # Save to Excel workbook with multiple sheets 294 | with pd.ExcelWriter(os.path.join(OUTPUT_DIR,'energy_profiling_results.xlsx'), engine='openpyxl') as writer: 295 | household_features.to_excel(writer, sheet_name='household_profiles', index=False) 296 | cluster_centers_df.to_excel(writer, sheet_name='cluster_centers', index=False) 297 | feat_imp.to_frame('importance').to_excel(writer, sheet_name='feature_importances') 298 | 299 | # Also print summary to console 300 | print("--- Summary ---") 301 | print(f"Number of households: {households.shape[0]}") 302 | print(f"Number of days per household: 365") 303 | print(f"Cluster counts:\n{avg_profile['profile_label'].value_counts().sort_index()}\n") 304 | print("Classification report (saved to output/classification_report.txt):") 305 | print(report) 306 | 307 | print("All done. Results are in the 'output' folder.") 308 | --------------------------------------------------------------------------------