├── 7fbdbf_c9754d269f2c476c85ea0c9cc78fb805~mv2.webp
├── 7fbdbf_fbcb57db75e54e0dad434e394b3df0cc~mv2.webp
├── AI_CUP_2024_README_zh.md
├── L1_Generated.gz
├── README.md
├── report_satacking_model.ipynb
├── upload_24.gz
└── 競賽報告與程式碼TEAM_5709根據區域微氣候資料預測發電量競賽.docx


/7fbdbf_c9754d269f2c476c85ea0c9cc78fb805~mv2.webp:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/xup6193193/Predicting_Solar_Power_Generation_Using_Microclimate_Data/a06044b29a53db0491f5ea8db0e66f21d28b62c8/7fbdbf_c9754d269f2c476c85ea0c9cc78fb805~mv2.webp
--------------------------------------------------------------------------------
/7fbdbf_fbcb57db75e54e0dad434e394b3df0cc~mv2.webp:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/xup6193193/Predicting_Solar_Power_Generation_Using_Microclimate_Data/a06044b29a53db0491f5ea8db0e66f21d28b62c8/7fbdbf_fbcb57db75e54e0dad434e394b3df0cc~mv2.webp
--------------------------------------------------------------------------------
/AI_CUP_2024_README_zh.md:
--------------------------------------------------------------------------------

# AI CUP 2024 - Predicting Power Generation from Regional Microclimate Data

## Team Information

- **Team Name**: TEAM_5709
- **Competition Results**:
  - **Private Leaderboard Score**: 310,872.06
  - **Private Leaderboard Rank**: 1st place
- **Team Members**:
  - Team Leader: 林靖鈞 (Lin Jingjun)

---

## Project Overview

This project was entered in the AI CUP 2024 autumn competition, whose task was to predict solar power generation from regional microclimate data. Our solution combines novel feature engineering strategies with an efficient stacking regression model and took first place on the leaderboard.

---

## Project Goals

- **Accurate Prediction**: Predict solar power generation with high accuracy from regional meteorological data.
- **Efficient Modeling**: Use several base models within a stacking regression framework to improve predictions.
- **Innovative Strategies**: Increase data diversity and model generalization through methods such as weather feature swapping.
- **General Applicability**: Provide a reference for site selection and performance assessment of new power plants.

---

## Environment Setup

- **Programming Language**: Python 3
- **Development Environment**: Jupyter Notebook
- **Operating System**: Ubuntu 24.04
- **Packages Used**:
  ```text
  numpy==1.24.3
  pandas==1.5.3
  matplotlib==3.7.1
  seaborn==0.12.2
  yellowbrick==1.5
  scikit-learn==1.2.2
  xgboost==1.7.6
  lightgbm==3.3.5
  catboost==1.2
  ```

### Hardware Configuration

- **Server**:
  - **CPU**: AMD Ryzen 7 8845HS
  - **RAM**: 64GB
  - **GPU**: None
- **Laptop**:
  - **CPU**: AMD Ryzen 5 4600H
  - **RAM**: 32GB
  - **GPU**: None

---

## Core Methods

### Data Processing

1. **Outlier Handling**: Remove anomalous samples based on typhoon and heavy-rain data, then standardize the data.
2. **Dataset Expansion**: Use the "weather feature swapping" method to expand the training data from 110,000 to 750,000 records.
3. **Downsampling and Granularity Adjustment**:
   - Training data: Use data downsampled to 10-minute resolution to reduce noise.
   - Test data: Restore minute-level resolution to improve precision.
4. **Feature Engineering**:
   - **Time-related Features**: Extract hour, minute, and periodic features (e.g., cos/sin transforms).
   - **Irradiance Features**: Mean, standard deviation, and normalization.

### Model Architecture

- **First Layer (Base Models)**:
  - **RandomForest (RFRA)**: Uses absolute error as the loss function and builds 1,000 decision trees.
  - **HistGradientBoosting (HGBR)**: Histogram-based gradient boosting with early stopping enabled.
  - **CatBoost Regressor (CatBoostR)**: Supports modeling of categorical data and is insensitive to data order.
  - **LightGBM (LGBM)**: Configured with different loss functions to suit varied data characteristics.

- **Second Layer (Stacking Model)**:
  - Uses LightGBM as the final estimator, with added randomness to improve generalization.

### Innovative Techniques

- **Weather Feature Swapping**: Increases data diversity and learns cross-site correlations.
- **Pseudo-Labeling**: Refines predictions over multiple iterations.
- **Flexible Time Granularity**: Combines low-noise training data with high-granularity test data to improve stability and precision.

---

## Results and Insights

### Competition Results

1. **Model Performance**:
   - The stacking regression model achieved the highest leaderboard score.
   - Prediction accuracy improved under different weather conditions.

2. **Innovative Contributions**:
   - Proposed the weather feature swapping strategy to make full use of microclimate data.
   - Designed an extensible prediction framework applicable to new power plants with no historical data.

### Model Comparison

- **Stacking Model vs. CNN-LSTM**:
  - CNN-LSTM errors increased markedly as the forecast horizon lengthened.
  - The stacking regression model was stronger in overall accuracy and stability.

---

## Usage Guide

### Environment Setup

1. Install Python 3 and Conda.
2. Create a virtual environment and install the dependencies:
   ```bash
   conda create -n aicup python=3.8
   conda activate aicup
   pip install -r requirements.txt
   ```

### Execution Steps

1. **Data Processing**: Run the data preprocessing script to generate the training and test data.
2. **Model Training**: Train the stacking regression model and save the results.
3. **Model Evaluation**: Evaluate model performance and generate a visual report.

---

## Contact Information

- **Team Leader**: 林靖鈞 (Lin Jingjun)
- **School**: National Chupei Senior High School (國立竹北高中普通科)
- **Email**: xup6vu84m3ul6@gmail.com

---

## License Statement

This project is for academic use only. For commercial use or redistribution, please contact the author for permission.

---

## References and External Resources

1. Tovar, M. et al. (2020). *PV Power Prediction using CNN-LSTM hybrid neural network model*.
2. Himawari satellite data: [NOAA Himawari](https://registry.opendata.aws/noaa-himawari)
3. Ministry of Environment air quality data: [MOE](https://data.moenv.gov.tw/dataset/detail/AQX_P_35)

For more details, see the competition report.

--------------------------------------------------------------------------------
/L1_Generated.gz:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/xup6193193/Predicting_Solar_Power_Generation_Using_Microclimate_Data/a06044b29a53db0491f5ea8db0e66f21d28b62c8/L1_Generated.gz
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------

# AI CUP 2024 - Predicting Solar Power Generation Using Microclimate Data

## Team Information

- **Team Name**: TEAM_5709
- **Performance**:
  - **Private Leaderboard Score**: 310,872.06
  - **Private Leaderboard Rank**: 1st Place
    ![Award](https://github.com/xup6193193/Predicting_Solar_Power_Generation_Using_Microclimate_Data/blob/main/7fbdbf_fbcb57db75e54e0dad434e394b3df0cc~mv2.webp)
    ![Award](https://github.com/xup6193193/Predicting_Solar_Power_Generation_Using_Microclimate_Data/blob/main/7fbdbf_c9754d269f2c476c85ea0c9cc78fb805~mv2.webp)
- **Team Members**:
  - Team Leader: Lin Jingjun (林靖鈞)

---

## Project Overview

This project was developed for the AI CUP 2024 competition, which focused on predicting solar power generation using regional microclimate data.
Our solution combines feature engineering with a stacking regressor model and took first place on the competition's Private Leaderboard.

---

## Objectives

- **Accurate Predictions**: Leverage regional weather data to predict solar power output.
- **Efficient Modeling**: Employ a stacking regression framework combining diverse base models.
- **Innovative Strategies**: Enhance data diversity and model generalization using data augmentation methods such as weather feature swapping.
- **Scalable Application**: Provide insights for new plant site selection and performance assessment.

---

## Environment Setup

- **Programming Language**: Python 3
- **Development Environment**: Jupyter Notebook
- **Operating System**: Ubuntu 24.04
- **Required Libraries**:
  ```text
  numpy==1.24.3
  pandas==1.5.3
  matplotlib==3.7.1
  seaborn==0.12.2
  yellowbrick==1.5
  scikit-learn==1.2.2
  xgboost==1.7.6
  lightgbm==3.3.5
  catboost==1.2
  ```

### Hardware Configuration

- **Server**:
  - **CPU**: AMD Ryzen 7 8845HS
  - **RAM**: 64GB
  - **GPU**: None
- **Laptop**:
  - **CPU**: AMD Ryzen 5 4600H
  - **RAM**: 32GB
  - **GPU**: None

---

## Key Methodologies

### Data Processing

1. **Outlier Handling**: Remove anomalous samples based on typhoon and heavy rainfall data, then normalize the features.
2. **Data Augmentation**: Use a "weather feature swapping" strategy to expand the training dataset from 110,000 to 750,000 samples.
3. **Frequency Adjustment**:
   - Training: Downsample to 10-minute intervals for noise reduction.
   - Testing: Restore to minute-level granularity for precision.
4. **Feature Engineering**:
   - **Time-related Features**: Hour, minute, and periodic transformations (e.g., cosine/sine for day cycles).
   - **Irradiance Metrics**: Statistical properties such as mean and standard deviation, plus normalization.

### Model Architecture

- **First Layer (Base Models)**:
  - **RandomForest (RFRA)**: Trained with absolute error as the loss function, using 1,000 trees.
  - **HistGradientBoosting (HGBR)**: Histogram-based gradient boosting with early stopping enabled.
  - **CatBoost Regressor (CatBoostR)**: Supports categorical data natively and is insensitive to data order.
  - **LightGBM (LGBM)**: Configured with different loss functions for diverse data characteristics.

- **Second Layer (Meta-Model)**:
  - LightGBM as the final estimator, with added randomness to enhance generalization.

### Innovative Techniques

- **Weather Feature Swapping**: Increases data diversity and learns cross-location correlations.
- **Pseudo-Labeling**: Iteratively refines predictions by using prior predictions as pseudo-labels.
- **Granular Time Usage**: Combines low-noise (10-minute) training data with high-granularity (minute-level) testing data to balance stability and precision.

---

## Results and Insights

### Achievements

1. **Model Performance**:
   - Achieved the top leaderboard score with a stacking regression model.
   - Improved prediction accuracy across varying weather conditions.

2. **Innovative Contributions**:
   - Introduced weather feature swapping to make fuller use of microclimate data.
   - Designed a scalable prediction framework applicable to new power plants with no historical data.

### Key Comparisons

- **Stacking vs. CNN-LSTM**:
  - CNN-LSTM exhibited higher errors as the forecast horizon lengthened.
  - The stacking regressor model demonstrated better overall accuracy and robustness.

---

## How to Use

### Environment Setup

1. Install Python 3 and Conda.
2. Create a virtual environment and install the required libraries:
   ```bash
   conda create -n aicup python=3.8
   conda activate aicup
   pip install -r requirements.txt
   ```

### Execution Steps

1. **Data Preprocessing**: Run the data preparation script to clean and engineer features.
2. **Model Training**: Train the stacking regressor model and save the results.
3. **Evaluation**: Evaluate model performance and visualize the outcomes.

---

## Contact

- **Team Leader**: Lin Jingjun (林靖鈞)
- **School**: National Chupei Senior High School (國立竹北高中普通科)
- **Email**: xup6vu84m3ul6@gmail.com

---

## License

This project is released for academic purposes only. For commercial use or redistribution, please contact the author for permission.

---

## References and External Resources

1. Tovar, M. et al. (2020). *PV Power Prediction using CNN-LSTM hybrid neural network model*.
2. Himawari Satellite Data: [NOAA Himawari](https://registry.opendata.aws/noaa-himawari)
3. Environmental Quality Data: [MOE](https://data.moenv.gov.tw/dataset/detail/AQX_P_35)

For more details, refer to the competition report.
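
As a rough illustration of the Frequency Adjustment and Time-related Features steps listed under Data Processing, the sketch below averages minute-level readings into 10-minute bins and encodes the time of day with sine and cosine. It is not taken from the competition notebook: the record layout, the function names `downsample_10min` and `cyclical_time_features`, and the sample values are all assumptions made for illustration, and it uses only the Python standard library rather than pandas.

```python
import math
from collections import defaultdict
from datetime import datetime
from statistics import mean


def downsample_10min(records):
    """Average (timestamp, value) pairs into 10-minute bins.

    `records` is a hypothetical list of (datetime, float) readings taken
    about once per minute; the real competition schema may differ.
    """
    bins = defaultdict(list)
    for ts, value in records:
        # Floor the timestamp to the start of its 10-minute window.
        bin_start = ts.replace(minute=ts.minute - ts.minute % 10,
                               second=0, microsecond=0)
        bins[bin_start].append(value)
    return [(start, mean(values)) for start, values in sorted(bins.items())]


def cyclical_time_features(ts):
    """Encode time of day as a point on the unit circle (one cycle per day)."""
    minutes = ts.hour * 60 + ts.minute
    angle = 2 * math.pi * minutes / (24 * 60)
    return {
        "hour": ts.hour,
        "minute": ts.minute,
        "day_sin": math.sin(angle),
        "day_cos": math.cos(angle),
    }


# Made-up minute-level readings covering 12:00 to 12:19.
readings = [(datetime(2024, 6, 1, 12, m), 100.0 + m) for m in range(20)]
downsampled = downsample_10min(readings)
features = [cyclical_time_features(ts) for ts, _ in downsampled]
```

The sine/cosine pair keeps 23:59 and 00:00 close together in feature space, which a raw hour or minute column does not. Averaging within each bin is only one possible way to downsample; the project may have used a different aggregation.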

---

--------------------------------------------------------------------------------
/upload_24.gz:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/xup6193193/Predicting_Solar_Power_Generation_Using_Microclimate_Data/a06044b29a53db0491f5ea8db0e66f21d28b62c8/upload_24.gz
--------------------------------------------------------------------------------
/競賽報告與程式碼TEAM_5709根據區域微氣候資料預測發電量競賽.docx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/xup6193193/Predicting_Solar_Power_Generation_Using_Microclimate_Data/a06044b29a53db0491f5ea8db0e66f21d28b62c8/競賽報告與程式碼TEAM_5709根據區域微氣候資料預測發電量競賽.docx
--------------------------------------------------------------------------------