├── AI-Based-Energy-Consumption-Profiling.xlsx
├── README.md
└── file1


/AI-Based-Energy-Consumption-Profiling.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Beckybams/AI-Based-Energy-Consumption-Profiling/HEAD/AI-Based-Energy-Consumption-Profiling.xlsx


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | AI-Based-Energy-Consumption-ProfilingA compact, practical README for the AI-Based-Energy-Consumption-Profiling project. This repository contains code and notebooks to profile and analyze energy consumption patterns using machine learning — useful for demand-side energy management, anomaly detection, load forecasting, and customer segmentation.Table of ContentsProject OverviewFeaturesRepository StructureGetting StartedRequirementsInstallationDataExample SchemaPreparing Your DataUsageQuick Start (notebook)Train a Model (script)Profile & Export ResultsModeling & EvaluationExamplesContributingLicenseContactProject OverviewThis project uses machine learning to create energy consumption profiles from smart meter or building-level energy data. It extracts features, clusters users or devices into consumption archetypes, detects anomalies, and can produce short-term forecasts to inform demand-side management actions.FeaturesData ingestion and cleaning pipelineFeature extraction: daily/weekly patterns, peak/off-peak behavior, seasonality metricsClustering to generate consumption profiles (e.g., K-Means, DBSCAN)Anomaly detection module (statistical + ML-based)Simple forecasting baselines (ARIMA, XGBoost / LightGBM)Export results to CSV/Excel for reportingExample Jupyter notebooks for exploration and demonstrationRepository Structure.
 2 | ├── data/                      # example and synthetic datasets (do not store PII)
 3 | ├── notebooks/                 # exploratory notebooks and tutorials
 4 | │   ├── 01_data_overview.ipynb
 5 | │   ├── 02_feature_engineering.ipynb
 6 | │   └── 03_clustering_profiling.ipynb
 7 | ├── src/                       # core python modules
 8 | │   ├── ingest.py
 9 | │   ├── features.py
10 | │   ├── clustering.py
11 | │   ├── anomaly.py
12 | │   └── forecasting.py
13 | ├── scripts/                   # runnable scripts (train, eval, export)
14 | │   ├── train_model.py
15 | │   └── profile_export.py
16 | ├── tests/                     # unit tests
17 | ├── requirements.txt
18 | ├── README.md                  # this file
19 | └── LICENSEGetting StartedRequirementsPython 3.9+Packages (see requirements.txt), typical examples:pandas, numpy, scikit-learn, scipymatplotlib (for plotting in notebooks)xgboost or lightgbm (optional, for forecasting)openpyxl (for Excel export)jupyterlab / notebookInstallationClone the repo:git clone https://your.git.repo/AI-Based-Energy-Consumption-Profiling.git
20 | cd AI-Based-Energy-Consumption-ProfilingCreate a virtual environment and install dependencies:python -m venv venv
21 | source venv/bin/activate      # macOS / Linux
22 | venv\Scripts\activate         # Windows
23 | pip install -r requirements.txtDataExample SchemaInput data should be a time-series table (CSV/Parquet) with a datetime column and one or more consumption columns. Example:timestampmeter_idconsumption_kwh2024-01-01 00:00:00MTR-0010.422024-01-01 00:15:00MTR-0010.38.........Minimum requirements:timestamp column (ISO format recommended)meter_id (or device_id / building_id) for multi-entity datasetsconsumption column(s) (kWh, W, etc.)Preparing Your DataResample to regular intervals (e.g., 15min, 1H) if necessary.Handle missing data: interpolation, forward/backfill, or masking.Normalize units (kW vs kWh) and timezone-aware timestamps.Split into train/validation/test for forecasting tasks.UsageQuick Start (notebook)Open notebooks/01_data_overview.ipynb and follow the step-by-step guided analysis:jupyter lab
24 | # then open the notebook in the browserTrain a Model (script)Example to train clustering and profiling:python scripts/train_model.py \
25 |   --data data/example_consumption.csv \
26 |   --outdir outputs/ \
27 |   --freq 15min \
28 |   --model clusteringScript options:--data: path to CSV or Parquet`--
29 | 


--------------------------------------------------------------------------------
/file1:
--------------------------------------------------------------------------------
  1 | """
  2 | AI-Based-Energy-Consumption-Profiling
  3 | File: AI-Based-Energy-Consumption-Profiling.py
  4 | 
  5 | What this script does:
  6 | - Generates synthetic household energy consumption data (hourly aggregated to daily and 24-hour profiles).
  7 | - Performs feature engineering to create household-level features and daily load profiles.
  8 | - Uses clustering (KMeans) to create consumption profiles (i.e., consumer segments).
  9 | - Trains a RandomForest classifier to predict the profile label from household features.
 10 | - Produces visualizations and exports results to CSV / Excel.
 11 | 
 12 | Requirements:
 13 | - Python 3.8+
 14 | - pandas, numpy, scikit-learn, matplotlib, seaborn, openpyxl
 15 | 
 16 | Run: python AI-Based-Energy-Consumption-Profiling.py
 17 | """
 18 | 
 19 | import os
 20 | import numpy as np
 21 | import pandas as pd
 22 | from sklearn.cluster import KMeans
 23 | from sklearn.preprocessing import StandardScaler
 24 | from sklearn.decomposition import PCA
 25 | from sklearn.ensemble import RandomForestClassifier
 26 | from sklearn.model_selection import train_test_split
 27 | from sklearn.metrics import classification_report, confusion_matrix
 28 | import matplotlib.pyplot as plt
 29 | import seaborn as sns
 30 | 
 31 | sns.set(style="whitegrid")
 32 | np.random.seed(42)
 33 | 
 34 | OUTPUT_DIR = "output"
 35 | os.makedirs(OUTPUT_DIR, exist_ok=True)
 36 | 
 37 | # -----------------------------
 38 | # 1) Synthetic data generation
 39 | # -----------------------------
 40 | 
 41 | def generate_synthetic_households(num_households=500):
 42 |     """
 43 |     Generate synthetic household meta-data.
 44 |     Returns a DataFrame with one row per household.
 45 |     """
 46 |     household_ids = [f"H{idx:04d}" for idx in range(1, num_households + 1)]
 47 |     household_size = np.random.choice([1, 2, 3, 4, 5, 6], size=num_households, p=[0.15,0.25,0.25,0.2,0.1,0.05])
 48 |     building_type = np.random.choice(["apartment", "detached", "semi-detached", "townhouse"], size=num_households, p=[0.35,0.35,0.15,0.15])
 49 |     region = np.random.choice(["north","south","east","west","central"], size=num_households)
 50 |     hvac_efficiency = np.round(np.random.uniform(0.6, 1.0, size=num_households), 2)  # 1.0 best
 51 |     has_solar = np.random.choice([0,1], size=num_households, p=[0.85,0.15])
 52 |     base_usage = np.round(np.random.uniform(3,8,size=num_households),2)  # kWh baseline per day
 53 | 
 54 |     households = pd.DataFrame({
 55 |         "household_id": household_ids,
 56 |         "household_size": household_size,
 57 |         "building_type": building_type,
 58 |         "region": region,
 59 |         "hvac_efficiency": hvac_efficiency,
 60 |         "has_solar": has_solar,
 61 |         "base_usage": base_usage
 62 |     })
 63 |     return households
 64 | 
 65 | 
 66 | def generate_daily_profiles(households, days=365):
 67 |     """
 68 |     For each household, generate `days` days of hourly consumption (24 values per day).
 69 |     We'll produce an aggregated daily consumption and also store the 24-hour profile.
 70 |     Returns a DataFrame with columns: household_id, date, total_kwh, profile_0..profile_23
 71 |     """
 72 |     profiles = []
 73 |     date_range = pd.date_range(end=pd.Timestamp.today().normalize(), periods=days)
 74 | 
 75 |     for _, row in households.iterrows():
 76 |         hid = row.household_id
 77 |         size = row.household_size
 78 |         base = row.base_usage
 79 |         hvac_eff = row.hvac_efficiency
 80 |         solar = row.has_solar
 81 | 
 82 |         # Create a daily pattern template (morning peak, evening peak)
 83 |         # Base profile shape (24 hours)
 84 |         x = np.arange(24)
 85 |         morning_peak = 0.6 * np.exp(-0.5 * ((x - 7) / 2)**2)
 86 |         evening_peak = 1.0 * np.exp(-0.5 * ((x - 19) / 2.5)**2)
 87 |         night_usage = 0.2 * np.exp(-0.5 * ((x - 2) / 3)**2)
 88 |         workday_modifier = 1.0
 89 | 
 90 |         for d, date in enumerate(date_range):
 91 |             # Seasonal effect: simple sinusoidal over the year to simulate heating/cooling
 92 |             seasonality = 1.0 + 0.25 * np.sin(2 * np.pi * (d / days))
 93 |             # weekday/weekend modifier
 94 |             weekday = date.weekday() < 5
 95 |             wd_factor = 1.0 if weekday else 0.9
 96 | 
 97 |             # temperature-driven HVAC effect (simulate hotter/cooler days randomly)
 98 |             temp_factor = 1.0 + np.random.normal(0, 0.05)
 99 | 
100 |             # random daily variation
101 |             noise = np.random.normal(1.0, 0.08, size=24)
102 | 
103 |             profile = (0.5 * morning_peak + 0.9 * evening_peak + night_usage) * (base * size / 3.0)
104 |             profile = profile * seasonality * wd_factor * temp_factor * noise
105 | 
106 |             # solar generation subtracts from daytime usage
107 |             if solar:
108 |                 solar_generation = np.maximum(0, 0.6 * np.exp(-0.5*((x-13)/3)**2))  # midday generation shape
109 |                 # solar reduces consumption between 8 and 16
110 |                 profile = profile - 0.7 * solar_generation * base
111 |                 profile = np.clip(profile, a_min=0.05, a_max=None)
112 | 
113 |             total_kwh = profile.sum()
114 |             day_record = {
115 |                 "household_id": hid,
116 |                 "date": date,
117 |                 "total_kwh": total_kwh,
118 |                 "weekday": int(weekday)
119 |             }
120 |             # attach hourly columns
121 |             for h in range(24):
122 |                 day_record[f"h_{h}"] = profile[h]
123 | 
124 |             profiles.append(day_record)
125 | 
126 |     profiles_df = pd.DataFrame(profiles)
127 |     return profiles_df
128 | 
129 | # -----------------------------
130 | # 2) Create dataset
131 | # -----------------------------
132 | 
133 | print("Generating synthetic households...")
134 | households = generate_synthetic_households(num_households=200)
135 | print("Generating daily profiles (this may take a moment)...")
136 | daily = generate_daily_profiles(households, days=365)
137 | 
138 | # Merge household meta into daily
139 | data = daily.merge(households, on="household_id", how="left")
140 | 
141 | # -----------------------------
142 | # 3) Feature engineering
143 | # -----------------------------
144 | print("Feature engineering...")
145 | 
146 | # Daily features
147 | data['hourly_profile'] = data[[f"h_{h}" for h in range(24)]].values.tolist()
148 | 
149 | # Calculate simple statistics from 24-hour profile
150 | data['peak_kwh'] = data[[f"h_{h}" for h in range(24)]].max(axis=1)
151 | data['offpeak_kwh'] = data[[f"h_{h}" for h in range(24)]].min(axis=1)
152 | data['mean_hourly_kwh'] = data[[f"h_{h}" for h in range(24)]].mean(axis=1)
153 | 
154 | data['evening_share'] = data[[f"h_{h}" for h in range(17,23)]].sum(axis=1) / data['total_kwh']
155 | 
156 | data['morning_share'] = data[[f"h_{h}" for h in range(5,10)]].sum(axis=1) / data['total_kwh']
157 | 
158 | # Household-level aggregates: average daily consumption and variability
159 | agg = data.groupby('household_id').agg(
160 |     avg_daily_kwh=('total_kwh','mean'),
161 |     std_daily_kwh=('total_kwh','std'),
162 |     median_daily_kwh=('total_kwh','median')
163 | ).reset_index()
164 | 
165 | household_features = households.merge(agg, on='household_id', how='left')
166 | 
167 | # For clustering, we'll use normalized 24-hour profiles averaged per household
168 | profile_cols = [f"h_{h}" for h in range(24)]
169 | avg_profile = data.groupby('household_id')[profile_cols].mean().reset_index()
170 | avg_profile = avg_profile.merge(household_features, on='household_id', how='left')
171 | 
172 | # Normalize profiles
173 | scaler = StandardScaler()
174 | profile_matrix = scaler.fit_transform(avg_profile[profile_cols].values)
175 | 
176 | # -----------------------------
177 | # 4) Clustering to create profiles
178 | # -----------------------------
179 | print("Clustering to create consumption profiles...")
180 | 
181 | n_clusters = 4
182 | kmeans = KMeans(n_clusters=n_clusters, random_state=42)
183 | profile_labels = kmeans.fit_predict(profile_matrix)
184 | avg_profile['profile_label'] = profile_labels
185 | 
186 | # Add label to household_features
187 | household_features = household_features.merge(avg_profile[['household_id','profile_label']], on='household_id', how='left')
188 | 
189 | # Save cluster centers (for interpretation)
190 | cluster_centers = scaler.inverse_transform(kmeans.cluster_centers_)
191 | cluster_centers_df = pd.DataFrame(cluster_centers, columns=profile_cols)
192 | cluster_centers_df['profile'] = [f"Profile_{i}" for i in range(n_clusters)]
193 | cluster_centers_df.to_csv(os.path.join(OUTPUT_DIR, 'cluster_centers.csv'), index=False)
194 | 
195 | # -----------------------------
196 | # 5) Train classifier to predict profile from household features
197 | # -----------------------------
198 | print("Training classifier to predict profile labels from household features...")
199 | 
200 | # Prepare features
201 | cat_cols = ['building_type','region','has_solar']
202 | clf_df = household_features.copy()
203 | clf_df = pd.get_dummies(clf_df, columns=['building_type','region'], drop_first=True)
204 | 
205 | feature_cols = ['household_size','hvac_efficiency','has_solar','avg_daily_kwh','std_daily_kwh','median_daily_kwh'] + \
206 |                [c for c in clf_df.columns if c.startswith('building_type_') or c.startswith('region_')]
207 | 
208 | X = clf_df[feature_cols].fillna(0)
209 | y = clf_df['profile_label']
210 | 
211 | X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42, stratify=y)
212 | 
213 | clf = RandomForestClassifier(n_estimators=200, random_state=42)
214 | clf.fit(X_train, y_train)
215 | 
216 | y_pred = clf.predict(X_test)
217 | 
218 | report = classification_report(y_test, y_pred, zero_division=0)
219 | cm = confusion_matrix(y_test, y_pred)
220 | 
221 | with open(os.path.join(OUTPUT_DIR, 'classification_report.txt'), 'w') as f:
222 |     f.write(report)
223 | 
224 | # -----------------------------
225 | # 6) Visualizations
226 | # -----------------------------
227 | print("Generating visualizations...")
228 | 
229 | # PCA of profiles colored by cluster
230 | pca = PCA(n_components=2)
231 | proj = pca.fit_transform(profile_matrix)
232 | proj_df = pd.DataFrame(proj, columns=['pc1','pc2'])
233 | proj_df['profile_label'] = profile_labels
234 | proj_df['household_id'] = avg_profile['household_id'].values
235 | 
236 | plt.figure(figsize=(8,6))
237 | for lbl in sorted(proj_df['profile_label'].unique()):
238 |     subset = proj_df[proj_df['profile_label']==lbl]
239 |     plt.scatter(subset['pc1'], subset['pc2'], label=f'Profile {lbl}', alpha=0.6)
240 | plt.legend()
241 | plt.title('PCA of average daily profiles by cluster')
242 | plt.xlabel('PC1')
243 | plt.ylabel('PC2')
244 | plt.tight_layout()
245 | plt.savefig(os.path.join(OUTPUT_DIR,'profiles_pca.png'), dpi=150)
246 | plt.close()
247 | 
248 | # Plot cluster center load shapes
249 | plt.figure(figsize=(10,6))
250 | for i in range(n_clusters):
251 |     plt.plot(range(24), cluster_centers_df.loc[i, profile_cols].values, marker='o', label=f'Profile_{i}')
252 | plt.xlabel('Hour of day')
253 | plt.ylabel('kWh')
254 | plt.title('Cluster center daily load shapes')
255 | plt.legend()
256 | plt.tight_layout()
257 | plt.savefig(os.path.join(OUTPUT_DIR,'cluster_centers.png'), dpi=150)
258 | plt.close()
259 | 
260 | # Confusion matrix heatmap
261 | plt.figure(figsize=(6,5))
262 | sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
263 | plt.title('Confusion Matrix (profile prediction)')
264 | plt.xlabel('Predicted')
265 | plt.ylabel('Actual')
266 | plt.tight_layout()
267 | plt.savefig(os.path.join(OUTPUT_DIR,'confusion_matrix.png'), dpi=150)
268 | plt.close()
269 | 
270 | # -----------------------------
271 | # 7) Feature importance
272 | # -----------------------------
273 | feat_imp = pd.Series(clf.feature_importances_, index=X.columns).sort_values(ascending=False)
274 | feat_imp.to_csv(os.path.join(OUTPUT_DIR,'feature_importances.csv'))
275 | 
276 | plt.figure(figsize=(8,6))
277 | feat_imp.head(15).plot(kind='bar')
278 | plt.title('Top feature importances for profile prediction')
279 | plt.tight_layout()
280 | plt.savefig(os.path.join(OUTPUT_DIR,'feature_importances.png'), dpi=150)
281 | plt.close()
282 | 
283 | # -----------------------------
284 | # 8) Save outputs (CSV / Excel)
285 | # -----------------------------
286 | print("Saving outputs to the output/ folder...")
287 | 
288 | # household features with labels
289 | household_features.to_csv(os.path.join(OUTPUT_DIR,'household_features_with_profiles.csv'), index=False)
290 | # example daily data sample
291 | data.sample(500, random_state=42).to_csv(os.path.join(OUTPUT_DIR,'daily_sample.csv'), index=False)
292 | 
293 | # Save to Excel workbook with multiple sheets
294 | with pd.ExcelWriter(os.path.join(OUTPUT_DIR,'energy_profiling_results.xlsx'), engine='openpyxl') as writer:
295 |     household_features.to_excel(writer, sheet_name='household_profiles', index=False)
296 |     cluster_centers_df.to_excel(writer, sheet_name='cluster_centers', index=False)
297 |     feat_imp.to_frame('importance').to_excel(writer, sheet_name='feature_importances')
298 | 
299 | # Also print summary to console
300 | print("--- Summary ---")
301 | print(f"Number of households: {households.shape[0]}")
302 | print(f"Number of days per household: 365")
303 | print(f"Cluster counts:\n{avg_profile['profile_label'].value_counts().sort_index()}\n")
304 | print("Classification report (saved to output/classification_report.txt):")
305 | print(report)
306 | 
307 | print("All done. Results are in the 'output' folder.")
308 | 


--------------------------------------------------------------------------------