├── fig
    └── certificate.png
├── requirements.txt
├── .gitignore
├── LICENSE
├── dataset
    └── data_source.md
├── README.md
└── cmi-piu-silver-medal-solution.ipynb


/fig/certificate.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chenfeng-huang/Kaggle_Silver_Medal_Solutioun_CMI-PIU/HEAD/fig/certificate.png


--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
 1 | numpy
 2 | pandas
 3 | polars
 4 | scikit-learn
 5 | lightgbm
 6 | xgboost
 7 | catboost
 8 | scipy
 9 | matplotlib
10 | seaborn
11 | tqdm
12 | colorama
13 | ipython
14 | ipykernel
15 | pyarrow


--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
 1 | # Ignore large dataset files
 2 | 
 3 | dataset/train.csv
 4 | dataset/test.csv
 5 | dataset/series_train.parquet/
 6 | dataset/series_test.parquet/
 7 | dataset/sample_submission.csv
 8 | 
 9 | # Ignore catboost info
10 | catboost_info/*


--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
 1 | MIT License
 2 | 
 3 | Copyright (c) 2025 Chenfeng Huang(AIrick_H)
 4 | 
 5 | Permission is hereby granted, free of charge, to any person obtaining a copy
 6 | of this software and associated documentation files (the "Software"), to deal
 7 | in the Software without restriction, including without limitation the rights
 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 


--------------------------------------------------------------------------------
/dataset/data_source.md:
--------------------------------------------------------------------------------
 1 | ## Data source and usage
 2 | 
 3 | Data for this project comes from the Kaggle competition [Child Mind Institute — Problematic Internet Use](https://www.kaggle.com/competitions/child-mind-institute-problematic-internet-use/data).
 4 | 
 5 | ### What to download
 6 | 
 7 | - `train.csv`, `test.csv`, `sample_submission.csv`
 8 | - `series_train.parquet/` and `series_test.parquet/`, each containing per‑participant folders like `id=<participant_id>/part-0.parquet`
 9 | 
10 | ### Where to place the files
11 | 
12 | Place all files under `dataset/` with the following layout:
13 | 
14 | ```
15 | dataset/
16 | ├─ train.csv
17 | ├─ test.csv
18 | ├─ sample_submission.csv
19 | ├─ series_train.parquet/
20 | │  └─ id=<participant_id>/part-0.parquet
21 | └─ series_test.parquet/
22 |    └─ id=<participant_id>/part-0.parquet
23 | ```
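
A quick way to confirm the files landed in the right place is a small check script. The sketch below is not part of this repository: the function name `check_layout` and the `dataset` default path are assumptions for illustration. It only verifies that the expected entries exist and that each series folder follows the `id=<participant_id>/part-0.parquet` pattern.

```python
from pathlib import Path

# Files and folders the layout above expects under dataset/.
EXPECTED_FILES = ["train.csv", "test.csv", "sample_submission.csv"]
EXPECTED_SERIES_DIRS = ["series_train.parquet", "series_test.parquet"]


def check_layout(root="dataset"):
    """Return a list of problems found; an empty list means the layout looks right."""
    root = Path(root)
    problems = []

    for name in EXPECTED_FILES:
        if not (root / name).is_file():
            problems.append(f"missing file: {name}")

    for name in EXPECTED_SERIES_DIRS:
        series_dir = root / name
        if not series_dir.is_dir():
            problems.append(f"missing folder: {name}/")
            continue
        for child in sorted(series_dir.iterdir()):
            # Each participant folder should be named id=<participant_id>
            # and contain a part-0.parquet file.
            if not (child.is_dir() and child.name.startswith("id=")):
                problems.append(f"unexpected entry: {name}/{child.name}")
            elif not (child / "part-0.parquet").is_file():
                problems.append(f"missing part-0.parquet in {name}/{child.name}")

    return problems
```

Calling `check_layout()` from the repository root and printing the returned list is enough to spot a misplaced download before running the notebook.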
24 | 
25 | ### Git policy (large files)
26 | 
27 | This repository ignores large competition artifacts to respect size and licensing:
28 | 
29 | - Ignored: `dataset/train.csv`, `dataset/test.csv`, `dataset/sample_submission.csv`, `dataset/series_train.parquet/`, `dataset/series_test.parquet/`
30 | - Tracked: `dataset/data_source.md`
31 | 
32 | See `.gitignore` for details.
33 | 
34 | ### Competition data rules (summary)
35 | 
36 | - The competition test data is split into public and private sets, and participants are not told which records belong to which.
37 | - Access and use are allowed for non‑commercial purposes only (participation, research, education) during the competition; terms may change thereafter.
38 | - Phenotypic/tabular survey data are de‑identified. You must not redistribute the data, attempt re‑identification, or probe the test labels. Report any PII findings to organizers via Kaggle forums.
39 | - Keep data secure; do not share it with non‑participants. Notify Kaggle of any unauthorized access or transmission.
40 | - External data may be used only if it is publicly available, equally accessible to all participants, and free of charge, and all other competition rules still apply.
41 | 
42 | ### Citation
43 | 
44 | If you reference or use the dataset, include the following citation:
45 | 
46 | “CMI 2024 Problematic Internet Use Detection Challenge”
47 | 
48 | Adam Santorelli, Arianna Zuanazzi, Michael Leyden, Logan Lawler, Maggie Devkin, Yuki Kotani, and Gregory Kiar. Child Mind Institute — Problematic Internet Use. https://kaggle.com/competitions/child-mind-institute-problematic-internet-use, 2024. Kaggle.
49 | 
50 | BibTeX:
51 | 
52 | ```bibtex
53 | @misc{child-mind-institute-problematic-internet-use,
54 |     author = {Adam Santorelli and Arianna Zuanazzi and Michael Leyden and Logan Lawler and Maggie Devkin and Yuki Kotani and Gregory Kiar},
55 |     title = {Child Mind Institute — Problematic Internet Use},
56 |     year = {2024},
57 |     howpublished = {\url{https://kaggle.com/competitions/child-mind-institute-problematic-internet-use}},
58 |     note = {Kaggle}
59 | }
60 | ```


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | ## Child Mind Institute — Problematic Internet Use (CMI‑PIU) Solution
 2 | ![Python](https://img.shields.io/badge/python-3.10+-blue.svg)
 3 | 
 4 | 
 5 | 
 6 | This solution was developed for the Kaggle competition [Child Mind Institute — Problematic Internet Use](https://www.kaggle.com/competitions/child-mind-institute-problematic-internet-use), where participants predict the severity category `sii ∈ {0,1,2,3}` using demographics, clinical/questionnaire data, and wearable time‑series signals. The approach combines tabular features with per‑participant time‑series summary statistics and trains tree‑based regressors with out‑of‑fold validation and threshold optimization for the Quadratic Weighted Kappa (QWK) metric.
 7 | 
 8 | This solution earned a Silver Medal. 🥈 The repository contains an end‑to‑end notebook that reproduces training and inference and generates a valid `submission.csv`.
 9 | 
10 | For a detailed walkthrough of the solution and insights into the modeling process, see my [blog post](https://chenfenghuang.info/2024/12/20/Kaggle-CMI-PIU/).
11 | ![CMI‑PIU — Silver Medal](./fig/certificate.png)
12 | 
13 | ## Competition Overview
14 | 
15 | ### Competition Introduction
16 | 
17 | The task is to build a predictive model for `sii` (four ordered categories) using a mixture of tabular data (demographics, physical health, and questionnaires such as SDS/PCIAT) and multivariate time‑series recorded per participant. The evaluation metric is **Quadratic Weighted Kappa (QWK)**, which measures agreement between predicted classes and ground truth while penalizing larger class discrepancies more heavily.
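
To make the metric concrete, here is a minimal, dependency-free sketch of QWK for integer labels in 0..3. The function name and the `n_classes` parameter are illustrative assumptions; the notebook itself relies on `cohen_kappa_score(..., weights='quadratic')` from scikit-learn rather than this code.

```python
def quadratic_weighted_kappa(y_true, y_pred, n_classes=4):
    """QWK = 1 - sum(w * observed) / sum(w * expected), with w[i][j] = (i - j)^2."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    n = len(y_true)

    # Observed confusion counts and the label histogram of each side.
    observed = [[0] * n_classes for _ in range(n_classes)]
    hist_true = [0] * n_classes
    hist_pred = [0] * n_classes
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1
        hist_true[t] += 1
        hist_pred[p] += 1

    numerator = 0.0    # weighted disagreement actually observed
    denominator = 0.0  # weighted disagreement expected by chance
    for i in range(n_classes):
        for j in range(n_classes):
            weight = (i - j) ** 2
            numerator += weight * observed[i][j]
            denominator += weight * hist_true[i] * hist_pred[j] / n

    if denominator == 0:
        # Both vectors contain the same single class; the ratio is undefined.
        return float("nan")
    return 1.0 - numerator / denominator


# One mistake out of four in both cases, but the size of the miss differs.
near_miss = quadratic_weighted_kappa([0, 0, 3, 3], [1, 0, 3, 3])  # off by one class
far_miss = quadratic_weighted_kappa([0, 0, 3, 3], [3, 0, 3, 3])   # off by three classes
print(near_miss, far_miss)
```

The near miss scores higher than the far miss even though both predictions contain exactly one error, which is the "penalizing larger class discrepancies more heavily" behavior described above.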
18 | 
19 | ### Competition Background
20 | 
21 | Problematic Internet Use is an emerging mental‑health concern. The dataset couples questionnaire and clinical information with wearable time‑series, enabling models that can leverage both static and dynamic signals. A practical solution must align columns across train/test, summarize long time‑series efficiently, and handle missing/categorical features robustly while optimizing directly for the ordered‑class QWK objective.
22 | 
23 | ## Solution Overview
24 | 
25 | This solution comprises three components:
26 | 
27 | 1. **Time‑Series Feature Extraction**: Drop the `step` column, then aggregate each participant’s parquet time‑series into compact summary statistics via `DataFrame.describe()` (count, mean, std, min, quartiles, and max per channel), flattened into `stat_<i>` columns.
28 | 2. **Tabular Fusion and Preprocessing**: Merge time‑series stats with tabular features by `id`, fill missing season values with `'Missing'`, integer‑encode the season categoricals, and align train/test columns.
29 | 3. **Modeling and Thresholding**: Train tree‑based regressors with 5‑fold stratified CV on `sii`, average test predictions across folds (and across models where ensembling helps), and optimize three thresholds on the out‑of‑fold predictions to discretize continuous outputs into {0,1,2,3} while maximizing QWK.
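
The flattening in step 1 determines what each time-series feature means. `describe()` returns one row per statistic and one column per channel, and the notebook flattens that table row by row, so feature `i` holds statistic `i // n_channels` of channel `i % n_channels`. The sketch below mimics this with the standard library instead of pandas. `summarize_series` and the toy values are hypothetical, and it assumes complete numeric channels with at least two samples each.

```python
import statistics

# The eight statistics pandas' describe() reports for numeric columns, in order.
STAT_NAMES = ["count", "mean", "std", "min", "25%", "50%", "75%", "max"]


def summarize_series(channels):
    """Flatten per-channel summary statistics in statistic-major order.

    `channels` maps a channel name to a list of numeric samples.
    """
    rows = {name: [] for name in STAT_NAMES}
    for values in channels.values():
        # method="inclusive" interpolates linearly between data points,
        # which is intended to mirror pandas' default quantile behavior.
        q25, q50, q75 = statistics.quantiles(values, n=4, method="inclusive")
        rows["count"].append(float(len(values)))
        rows["mean"].append(statistics.fmean(values))
        rows["std"].append(statistics.stdev(values))  # sample std (n - 1)
        rows["min"].append(float(min(values)))
        rows["25%"].append(q25)
        rows["50%"].append(q50)
        rows["75%"].append(q75)
        rows["max"].append(float(max(values)))

    # Row-major flattening: all counts first, then all means, and so on.
    return [value for name in STAT_NAMES for value in rows[name]]


toy = {"enmo": [1.0, 2.0, 3.0, 4.0], "light": [10.0, 10.0, 30.0, 30.0]}
flat = summarize_series(toy)
n_channels = len(toy)
mean_row = STAT_NAMES.index("mean")
print(len(flat))                        # 8 statistics x 2 channels
print(flat[mean_row * n_channels + 0])  # mean of the first channel
```

Because the feature names carry no channel information, this index arithmetic is the only way to trace a `stat_<i>` column back to a signal and statistic.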
30 | 
31 | The final output is a two‑column `submission.csv` with `id,sii`.
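
The thresholding step can be illustrated without SciPy. The notebook tunes the three cut points with `scipy.optimize.minimize(..., method='Nelder-Mead')` starting from `[0.5, 1.5, 2.5]`; the sketch below swaps in a simple coordinate-wise grid search purely for illustration, so it is not the notebook's optimizer. The names `apply_thresholds`, `qwk`, and `tune_thresholds`, as well as the toy data, are assumptions.

```python
from bisect import bisect_right


def apply_thresholds(preds, thresholds):
    """Map continuous predictions to classes 0..3; thresholds must be ascending."""
    # bisect_right counts the cut points that are <= the prediction, so values
    # below the first cut map to 0 and values at or above the last map to 3.
    return [bisect_right(thresholds, p) for p in preds]


def qwk(y_true, y_pred, n_classes=4):
    """Quadratic weighted kappa for integer labels in 0..n_classes-1."""
    n = len(y_true)
    hist_t = [y_true.count(k) for k in range(n_classes)]
    hist_p = [y_pred.count(k) for k in range(n_classes)]
    observed = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    expected = sum((i - j) ** 2 * hist_t[i] * hist_p[j]
                   for i in range(n_classes) for j in range(n_classes)) / n
    return 1.0 - observed / expected if expected else 0.0


def tune_thresholds(y_true, preds, start=(0.5, 1.5, 2.5), step=0.05, sweeps=3):
    """Coordinate-wise grid search: move one cut at a time, keep strict improvements."""
    best = list(start)
    best_score = qwk(y_true, apply_thresholds(preds, best))
    for _ in range(sweeps):
        for k in range(3):
            # Keep cut k strictly between its neighbors so the cuts stay ascending.
            lo = best[k - 1] if k > 0 else 0.0
            hi = best[k + 1] if k < 2 else 3.0
            candidate = lo + step
            while candidate < hi:
                trial = best[:k] + [candidate] + best[k + 1:]
                score = qwk(y_true, apply_thresholds(preds, trial))
                if score > best_score:
                    best, best_score = trial, score
                candidate += step
    return best, best_score


# Toy out-of-fold predictions that are compressed toward the middle of the range.
y_true = [0, 0, 0, 1, 1, 2, 2, 3]
preds = [0.2, 0.4, 0.7, 0.8, 1.1, 1.3, 1.9, 2.2]
default_score = qwk(y_true, apply_thresholds(preds, [0.5, 1.5, 2.5]))
tuned_cuts, tuned_score = tune_thresholds(y_true, preds)
print(default_score, tuned_score, tuned_cuts)
```

On this toy data the tuned cut points score higher than plain rounding at 0.5/1.5/2.5, which is the motivation for fitting thresholds on out-of-fold predictions and reusing them on the averaged test predictions.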
32 | 
33 | 
34 | ## How to Reproduce
35 | 
36 | ### Environment Setup
37 | 
38 | - Python 3.10+ recommended
39 | - Install dependencies:
40 | 
41 | ```bash
42 | pip install -r requirements.txt
43 | ```
44 | 
45 | Key libraries: `numpy`, `pandas`, `scikit-learn`, `lightgbm`, `xgboost`, `catboost`, `scipy`, `pyarrow`, `tqdm`.
46 | 
47 | ### Data Layout
48 | 
49 | Place the competition data under `dataset/`:
50 | 
51 | ```
52 | dataset/
53 | ├─ train.csv
54 | ├─ test.csv
55 | ├─ sample_submission.csv
56 | ├─ series_train.parquet/
57 | │  └─ id=<participant_id>/part-0.parquet
58 | └─ series_test.parquet/
59 |    └─ id=<participant_id>/part-0.parquet
60 | ```
61 | 
62 | ### Data Source and Citation
63 | 
64 | - Dataset and rules: see `dataset/data_source.md` and the Kaggle data page: https://www.kaggle.com/competitions/child-mind-institute-problematic-internet-use/data
65 | 
66 | ```bibtex
67 | @misc{child-mind-institute-problematic-internet-use,
68 |     author = {Adam Santorelli and Arianna Zuanazzi and Michael Leyden and Logan Lawler and Maggie Devkin and Yuki Kotani and Gregory Kiar},
69 |     title = {Child Mind Institute — Problematic Internet Use},
70 |     year = {2024},
71 |     howpublished = {\url{https://kaggle.com/competitions/child-mind-institute-problematic-internet-use}},
72 |     note = {Kaggle}
73 | }
74 | ```
75 | 
76 | ### Training and Inference
77 | 
78 | - Launch Jupyter and open `cmi-piu-silver-medal-solution.ipynb`.
79 | 
80 | - Run all cells to train, validate, and generate `submission.csv` in the repository root.
81 | 
82 | ## Author
83 | 
84 | Maintainer: Chenfeng Huang - [Kaggle](https://www.kaggle.com/alrickh)
85 | 
86 | For questions, please open an issue or discussion in this repository.
87 | 
88 | ## License
89 | 
90 | Distributed under the terms of the license in `LICENSE`.
91 |  
92 | 


--------------------------------------------------------------------------------
/cmi-piu-silver-medal-solution.ipynb:
--------------------------------------------------------------------------------
   1 | {
   2 |  "cells": [
   3 |   {
   4 |    "cell_type": "code",
   5 |    "execution_count": null,
   6 |    "id": "492a992d",
   7 |    "metadata": {
   8 |     "_cell_guid": "b1076dfc-b9ad-4769-8c92-a6c4dae69d19",
   9 |     "_uuid": "8f2839f25d086af736a60e9eeb907d3b93b6e0e5",
  10 |     "execution": {
  11 |      "iopub.execute_input": "2025-01-01T15:25:36.247831Z",
  12 |      "iopub.status.busy": "2025-01-01T15:25:36.247465Z",
  13 |      "iopub.status.idle": "2025-01-01T15:25:42.175421Z",
  14 |      "shell.execute_reply": "2025-01-01T15:25:42.174665Z"
  15 |     },
  16 |     "papermill": {
  17 |      "duration": 5.935686,
  18 |      "end_time": "2025-01-01T15:25:42.177081",
  19 |      "exception": false,
  20 |      "start_time": "2025-01-01T15:25:36.241395",
  21 |      "status": "completed"
  22 |     },
  23 |     "tags": []
  24 |    },
  25 |    "outputs": [],
  26 |    "source": [
  27 |     "import os\n",
  28 |     "import warnings\n",
  29 |     "from concurrent.futures import ThreadPoolExecutor\n",
  30 |     "\n",
  31 |     "import numpy as np\n",
  32 |     "import pandas as pd\n",
  33 |     "from sklearn.model_selection import StratifiedKFold\n",
  34 |     "from sklearn.impute import SimpleImputer, KNNImputer\n",
  35 |     "from sklearn.pipeline import Pipeline\n",
  36 |     "\n",
  37 |     "from sklearn.base import clone\n",
  38 |     "from sklearn.metrics import cohen_kappa_score\n",
  39 |     "from lightgbm import LGBMRegressor\n",
  40 |     "from xgboost import XGBRegressor\n",
  41 |     "from catboost import CatBoostRegressor\n",
  42 |     "from sklearn.ensemble import VotingRegressor, RandomForestRegressor, GradientBoostingRegressor\n",
  43 |     "\n",
  44 |     "from scipy.optimize import minimize\n",
  45 |     "\n",
  46 |     "\n",
  47 |     "from tqdm import tqdm\n",
  48 |     "from IPython.display import clear_output\n",
  49 |     "\n",
  50 |     "warnings.filterwarnings('ignore')  \n",
  51 |     "pd.options.display.max_columns = None  \n",
  52 |     "\n"
  53 |    ]
  54 |   },
  55 |   {
  56 |    "cell_type": "markdown",
  57 |    "id": "abb90b3b",
  58 |    "metadata": {
  59 |     "papermill": {
  60 |      "duration": 0.003826,
  61 |      "end_time": "2025-01-01T15:25:42.185401",
  62 |      "exception": false,
  63 |      "start_time": "2025-01-01T15:25:42.181575",
  64 |      "status": "completed"
  65 |     },
  66 |     "tags": []
  67 |    },
  68 |    "source": [
  69 |     "## Method 1"
  70 |    ]
  71 |   },
  72 |   {
  73 |    "cell_type": "code",
  74 |    "execution_count": 2,
  75 |    "id": "232dca9d",
  76 |    "metadata": {
  77 |     "execution": {
  78 |      "iopub.execute_input": "2025-01-01T15:25:42.194061Z",
  79 |      "iopub.status.busy": "2025-01-01T15:25:42.193611Z",
  80 |      "iopub.status.idle": "2025-01-01T15:25:42.264763Z",
  81 |      "shell.execute_reply": "2025-01-01T15:25:42.263859Z"
  82 |     },
  83 |     "papermill": {
  84 |      "duration": 0.077324,
  85 |      "end_time": "2025-01-01T15:25:42.266462",
  86 |      "exception": false,
  87 |      "start_time": "2025-01-01T15:25:42.189138",
  88 |      "status": "completed"
  89 |     },
  90 |     "tags": []
  91 |    },
  92 |    "outputs": [],
  93 |    "source": [
  94 |     "train = pd.read_csv('dataset/train.csv')  \n",
  95 |     "test = pd.read_csv('dataset/test.csv') \n",
  96 |     "sample = pd.read_csv('dataset/sample_submission.csv')  \n",
  97 |     "def process_file(filename, dirname):\n",
  98 |     "    df = pd.read_parquet(os.path.join(dirname, filename, 'part-0.parquet'))  # one participant's series\n",
  99 |     "    df.drop('step', axis=1, inplace=True)  # drop the timestep index; it is not a signal\n",
 100 |     "    return df.describe().values.reshape(-1), filename.split('=')[1]  # flattened stats, participant id\n",
 101 |     "def load_time_series(dirname) -> pd.DataFrame:\n",
 102 |     "    ids = os.listdir(dirname)  # folder names of the form id=<participant_id>\n",
 103 |     "\n",
 104 |     "    with ThreadPoolExecutor() as executor:  # read participant folders concurrently\n",
 105 |     "        results = list(tqdm(executor.map(lambda fname: process_file(fname, dirname), ids), total=len(ids)))  \n",
 106 |     "\n",
 107 |     "    stats, indexes = zip(*results)  # split into stat vectors and participant ids\n",
 108 |     "\n",
 109 |     "    df = pd.DataFrame(stats, columns=[f\"stat_{i}\" for i in range(len(stats[0]))])  # one column per flattened statistic\n",
 110 |     "    df['id'] = indexes  # key used to merge with the tabular data\n",
 111 |     "    return df"
 112 |    ]
 113 |   },
 114 |   {
 115 |    "cell_type": "code",
 116 |    "execution_count": 3,
 117 |    "id": "49aeba17",
 118 |    "metadata": {
 119 |     "execution": {
 120 |      "iopub.execute_input": "2025-01-01T15:25:42.275593Z",
 121 |      "iopub.status.busy": "2025-01-01T15:25:42.275346Z",
 122 |      "iopub.status.idle": "2025-01-01T15:26:51.501872Z",
 123 |      "shell.execute_reply": "2025-01-01T15:26:51.500865Z"
 124 |     },
 125 |     "papermill": {
 126 |      "duration": 69.232586,
 127 |      "end_time": "2025-01-01T15:26:51.503352",
 128 |      "exception": false,
 129 |      "start_time": "2025-01-01T15:25:42.270766",
 130 |      "status": "completed"
 131 |     },
 132 |     "tags": []
 133 |    },
 134 |    "outputs": [
 135 |     {
 136 |      "name": "stderr",
 137 |      "output_type": "stream",
 138 |      "text": [
 139 |       "100%|██████████| 111/111 [00:02<00:00, 54.32it/s]\n",
 140 |       "100%|██████████| 2/2 [00:00<00:00, 22.82it/s]\n"
 141 |      ]
 142 |     }
 143 |    ],
 144 |    "source": [
 145 |     "train_ts = load_time_series(\"dataset/series_train.parquet\")  \n",
 146 |     "test_ts = load_time_series(\"dataset/series_test.parquet\") \n",
 147 |     "\n",
 148 |     "time_series_cols = train_ts.columns.tolist()\n",
 149 |     "time_series_cols.remove(\"id\")  \n",
 150 |     "\n",
 151 |     "train = pd.merge(train, train_ts, how=\"left\", on='id')  \n",
 152 |     "test = pd.merge(test, test_ts, how=\"left\", on='id')  \n",
 153 |     "\n",
 154 |     "train = train.drop('id', axis=1) \n",
 155 |     "test = test.drop('id', axis=1)   "
 156 |    ]
 157 |   },
 158 |   {
 159 |    "cell_type": "code",
 160 |    "execution_count": 4,
 161 |    "id": "100a40dc",
 162 |    "metadata": {
 163 |     "execution": {
 164 |      "iopub.execute_input": "2025-01-01T15:26:51.540481Z",
 165 |      "iopub.status.busy": "2025-01-01T15:26:51.540155Z",
 166 |      "iopub.status.idle": "2025-01-01T15:26:51.567867Z",
 167 |      "shell.execute_reply": "2025-01-01T15:26:51.566962Z"
 168 |     },
 169 |     "papermill": {
 170 |      "duration": 0.047922,
 171 |      "end_time": "2025-01-01T15:26:51.569613",
 172 |      "exception": false,
 173 |      "start_time": "2025-01-01T15:26:51.521691",
 174 |      "status": "completed"
 175 |     },
 176 |     "tags": []
 177 |    },
 178 |    "outputs": [],
 179 |    "source": [
 180 |     "# Select Relevant Features and Handle Missing Values\n",
 181 |     "featuresCols = ['Basic_Demos-Enroll_Season', 'Basic_Demos-Age', 'Basic_Demos-Sex',\n",
 182 |     "                'CGAS-Season', 'CGAS-CGAS_Score', 'Physical-Season', 'Physical-BMI',\n",
 183 |     "                'Physical-Height', 'Physical-Weight', 'Physical-Waist_Circumference',\n",
 184 |     "                'Physical-Diastolic_BP', 'Physical-HeartRate', 'Physical-Systolic_BP',\n",
 185 |     "                'Fitness_Endurance-Season', 'Fitness_Endurance-Max_Stage',\n",
 186 |     "                'Fitness_Endurance-Time_Mins', 'Fitness_Endurance-Time_Sec',\n",
 187 |     "                'FGC-Season', 'FGC-FGC_CU', 'FGC-FGC_CU_Zone', 'FGC-FGC_GSND',\n",
 188 |     "                'FGC-FGC_GSND_Zone', 'FGC-FGC_GSD', 'FGC-FGC_GSD_Zone', 'FGC-FGC_PU',\n",
 189 |     "                'FGC-FGC_PU_Zone', 'FGC-FGC_SRL', 'FGC-FGC_SRL_Zone', 'FGC-FGC_SRR',\n",
 190 |     "                'FGC-FGC_SRR_Zone', 'FGC-FGC_TL', 'FGC-FGC_TL_Zone', 'BIA-Season',\n",
 191 |     "                'BIA-BIA_Activity_Level_num', 'BIA-BIA_BMC', 'BIA-BIA_BMI',\n",
 192 |     "                'BIA-BIA_BMR', 'BIA-BIA_DEE', 'BIA-BIA_ECW', 'BIA-BIA_FFM',\n",
 193 |     "                'BIA-BIA_FFMI', 'BIA-BIA_FMI', 'BIA-BIA_Fat', 'BIA-BIA_Frame_num',\n",
 194 |     "                'BIA-BIA_ICW', 'BIA-BIA_LDM', 'BIA-BIA_LST', 'BIA-BIA_SMM',\n",
 195 |     "                'BIA-BIA_TBW', 'PAQ_A-Season', 'PAQ_A-PAQ_A_Total', 'PAQ_C-Season',\n",
 196 |     "                'PAQ_C-PAQ_C_Total', 'SDS-Season', 'SDS-SDS_Total_Raw',\n",
 197 |     "                'SDS-SDS_Total_T', 'PreInt_EduHx-Season',\n",
 198 |     "                'PreInt_EduHx-computerinternet_hoursday', 'sii']\n",
 199 |     "\n",
 200 |     "\n",
 201 |     "featuresCols += time_series_cols \n",
 202 |     "\n",
 203 |     "train = train[featuresCols] \n",
 204 |     "train = train.dropna(subset='sii')  \n",
 205 |     "\n",
 206 |     "\n",
 207 |     "cat_c = ['Basic_Demos-Enroll_Season', 'CGAS-Season', 'Physical-Season', \n",
 208 |     "         'Fitness_Endurance-Season', 'FGC-Season', 'BIA-Season', \n",
 209 |     "         'PAQ_A-Season', 'PAQ_C-Season', 'SDS-Season', 'PreInt_EduHx-Season'] \n",
 210 |     "\n",
 211 |     "def update(df):\n",
 212 |     "    global cat_c\n",
 213 |     "    for c in cat_c: \n",
 214 |     "        df[c] = df[c].fillna('Missing')  \n",
 215 |     "        df[c] = df[c].astype('category') \n",
 216 |     "    return df\n",
 217 |     "        \n",
 218 |     "train = update(train)  \n",
 219 |     "test = update(test)  "
 220 |    ]
 221 |   },
 222 |   {
 223 |    "cell_type": "markdown",
 224 |    "id": "3373ab21",
 225 |    "metadata": {
 226 |     "papermill": {
 227 |      "duration": 0.017233,
 228 |      "end_time": "2025-01-01T15:26:51.604962",
 229 |      "exception": false,
 230 |      "start_time": "2025-01-01T15:26:51.587729",
 231 |      "status": "completed"
 232 |     },
 233 |     "tags": []
 234 |    },
 235 |    "source": [
 236 |     "### Feature Extraction "
 237 |    ]
 238 |   },
 239 |   {
 240 |    "cell_type": "code",
 241 |    "execution_count": 5,
 242 |    "id": "9ef1228e",
 243 |    "metadata": {
 244 |     "execution": {
 245 |      "iopub.execute_input": "2025-01-01T15:26:51.640809Z",
 246 |      "iopub.status.busy": "2025-01-01T15:26:51.640510Z",
 247 |      "iopub.status.idle": "2025-01-01T15:26:51.673254Z",
 248 |      "shell.execute_reply": "2025-01-01T15:26:51.672562Z"
 249 |     },
 250 |     "papermill": {
 251 |      "duration": 0.052186,
 252 |      "end_time": "2025-01-01T15:26:51.674572",
 253 |      "exception": false,
 254 |      "start_time": "2025-01-01T15:26:51.622386",
 255 |      "status": "completed"
 256 |     },
 257 |     "tags": []
 258 |    },
 259 |    "outputs": [],
 260 |    "source": [
 261 |     "def create_mapping(column, *datasets):\n",
 262 |     "    # Collect values across all given datasets so they share one encoding.\n",
 263 |     "    unique_values = pd.concat([d[column].astype(str) for d in datasets]).unique()\n",
 264 |     "    return {value: idx for idx, value in enumerate(unique_values)}\n",
 265 |     "\n",
 266 |     "\n",
 267 |     "for col in cat_c:\n",
 268 |     "    # One mapping for both frames keeps each season's code identical in train and test.\n",
 269 |     "    mapping = create_mapping(col, train, test)\n",
 270 |     "\n",
 271 |     "    train[col] = train[col].astype(str).map(mapping).astype(int)\n",
 272 |     "    test[col] = test[col].astype(str).map(mapping).astype(int)"
 273 |    ]
 274 |   },
 275 |   {
 276 |    "cell_type": "markdown",
 277 |    "id": "0dafe0b5",
 278 |    "metadata": {
 279 |     "papermill": {
 280 |      "duration": 0.017297,
 281 |      "end_time": "2025-01-01T15:26:51.709290",
 282 |      "exception": false,
 283 |      "start_time": "2025-01-01T15:26:51.691993",
 284 |      "status": "completed"
 285 |     },
 286 |     "tags": []
 287 |    },
 288 |    "source": [
 289 |     "### Training model"
 290 |    ]
 291 |   },
 292 |   {
 293 |    "cell_type": "code",
 294 |    "execution_count": 6,
 295 |    "id": "995fbf40",
 296 |    "metadata": {
 297 |     "execution": {
 298 |      "iopub.execute_input": "2025-01-01T15:26:51.745352Z",
 299 |      "iopub.status.busy": "2025-01-01T15:26:51.744956Z",
 300 |      "iopub.status.idle": "2025-01-01T15:26:51.749853Z",
 301 |      "shell.execute_reply": "2025-01-01T15:26:51.748964Z"
 302 |     },
 303 |     "papermill": {
 304 |      "duration": 0.024415,
 305 |      "end_time": "2025-01-01T15:26:51.751107",
 306 |      "exception": false,
 307 |      "start_time": "2025-01-01T15:26:51.726692",
 308 |      "status": "completed"
 309 |     },
 310 |     "tags": []
 311 |    },
 312 |    "outputs": [],
 313 |    "source": [
 314 |     "def quadratic_weighted_kappa(y_true, y_pred):\n",
 315 |     "    return cohen_kappa_score(y_true, y_pred, weights='quadratic') \n",
 316 |     "\n",
 317 |     "def threshold_Rounder(oof_non_rounded, thresholds):\n",
 318 |     "    return np.where(oof_non_rounded < thresholds[0], 0,\n",
 319 |     "                    np.where(oof_non_rounded < thresholds[1], 1, \n",
 320 |     "                             np.where(oof_non_rounded < thresholds[2], 2, 3))) \n",
 321 |     "\n",
 322 |     "def evaluate_predictions(thresholds, y_true, oof_non_rounded):\n",
 323 |     "    rounded_p = threshold_Rounder(oof_non_rounded, thresholds)\n",
 324 |     "    return -quadratic_weighted_kappa(y_true, rounded_p)"
 325 |    ]
 326 |   },
 327 |   {
 328 |    "cell_type": "code",
 329 |    "execution_count": 7,
 330 |    "id": "73cdb3e9",
 331 |    "metadata": {
 332 |     "execution": {
 333 |      "iopub.execute_input": "2025-01-01T15:26:51.786688Z",
 334 |      "iopub.status.busy": "2025-01-01T15:26:51.786250Z",
 335 |      "iopub.status.idle": "2025-01-01T15:26:51.789479Z",
 336 |      "shell.execute_reply": "2025-01-01T15:26:51.788687Z"
 337 |     },
 338 |     "papermill": {
 339 |      "duration": 0.022369,
 340 |      "end_time": "2025-01-01T15:26:51.790748",
 341 |      "exception": false,
 342 |      "start_time": "2025-01-01T15:26:51.768379",
 343 |      "status": "completed"
 344 |     },
 345 |     "tags": []
 346 |    },
 347 |    "outputs": [],
 348 |    "source": [
 349 |     "SEED = 42\n",
 350 |     "n_splits = 5"
 351 |    ]
 352 |   },
 353 |   {
 354 |    "cell_type": "code",
 355 |    "execution_count": 8,
 356 |    "id": "2fa26bbb",
 357 |    "metadata": {
 358 |     "execution": {
 359 |      "iopub.execute_input": "2025-01-01T15:26:51.825995Z",
 360 |      "iopub.status.busy": "2025-01-01T15:26:51.825783Z",
 361 |      "iopub.status.idle": "2025-01-01T15:26:51.833161Z",
 362 |      "shell.execute_reply": "2025-01-01T15:26:51.832516Z"
 363 |     },
 364 |     "papermill": {
 365 |      "duration": 0.026285,
 366 |      "end_time": "2025-01-01T15:26:51.834277",
 367 |      "exception": false,
 368 |      "start_time": "2025-01-01T15:26:51.807992",
 369 |      "status": "completed"
 370 |     },
 371 |     "tags": []
 372 |    },
 373 |    "outputs": [],
 374 |    "source": [
 375 |     "def TrainML(model_class, test_data):\n",
 376 |     "    X = train.drop(['sii'], axis=1)  \n",
 377 |     "    y = train['sii']  \n",
 378 |     "\n",
 379 |     "    SKF = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=SEED)  \n",
 380 |     "    \n",
 381 |     "    train_S = []\n",
 382 |     "    test_S = []\n",
 383 |     "    \n",
 384 |     "    oof_non_rounded = np.zeros(len(y), dtype=float)  \n",
 385 |     "    oof_rounded = np.zeros(len(y), dtype=int)  \n",
 386 |     "    test_preds = np.zeros((len(test_data), n_splits)) \n",
 387 |     "\n",
 388 |     "    for fold, (train_idx, test_idx) in enumerate(tqdm(SKF.split(X, y), desc=\"Training Folds\", total=n_splits)):\n",
 389 |     "        X_train, X_val = X.iloc[train_idx], X.iloc[test_idx]  \n",
 390 |     "        y_train, y_val = y.iloc[train_idx], y.iloc[test_idx]  \n",
 391 |     "        model = clone(model_class)  \n",
 392 |     "        model.fit(X_train, y_train)\n",
 393 |     "\n",
 394 |     "        y_train_pred = model.predict(X_train)  \n",
 395 |     "        y_val_pred = model.predict(X_val)  \n",
 396 |     "\n",
 397 |     "        oof_non_rounded[test_idx] = y_val_pred  \n",
 398 |     "        y_val_pred_rounded = y_val_pred.round(0).astype(int) \n",
 399 |     "        oof_rounded[test_idx] = y_val_pred_rounded  \n",
 400 |     "\n",
 401 |     "        train_kappa = quadratic_weighted_kappa(y_train, y_train_pred.round(0).astype(int))  \n",
 402 |     "        val_kappa = quadratic_weighted_kappa(y_val, y_val_pred_rounded)  \n",
 403 |     "\n",
 404 |     "        train_S.append(train_kappa) \n",
 405 |     "        test_S.append(val_kappa)  \n",
 406 |     "        \n",
 407 |     "        test_preds[:, fold] = model.predict(test_data)  \n",
 408 |     "        \n",
 409 |     "        clear_output(wait=True)  \n",
 410 |     "\n",
 411 |     "    \n",
 412 |     "    KappaOPtimizer = minimize(evaluate_predictions,\n",
 413 |     "                              x0=[0.5, 1.5, 2.5], args=(y, oof_non_rounded), \n",
 414 |     "                              method='Nelder-Mead')\n",
 415 |     "    assert KappaOPtimizer.success, \"Optimization did not converge.\" \n",
 416 |     "    \n",
 417 |     "    oof_tuned = threshold_Rounder(oof_non_rounded, KappaOPtimizer.x)  \n",
 418 |     "    tKappa = quadratic_weighted_kappa(y, oof_tuned)  \n",
 419 |     "\n",
 420 |     "    tpm = test_preds.mean(axis=1)  \n",
 421 |     "    tpTuned = threshold_Rounder(tpm, KappaOPtimizer.x)  \n",
 422 |     "    \n",
 423 |     "    submission = pd.DataFrame({\n",
 424 |     "        'id': sample['id'],\n",
 425 |     "        'sii': tpTuned\n",
 426 |     "    }) \n",
 427 |     "    return submission"
 428 |    ]
 429 |   },
 430 |   {
 431 |    "cell_type": "code",
 432 |    "execution_count": 9,
 433 |    "id": "7f9f6e8c",
 434 |    "metadata": {
 435 |     "execution": {
 436 |      "iopub.execute_input": "2025-01-01T15:26:51.869751Z",
 437 |      "iopub.status.busy": "2025-01-01T15:26:51.869538Z",
 438 |      "iopub.status.idle": "2025-01-01T15:27:50.771711Z",
 439 |      "shell.execute_reply": "2025-01-01T15:27:50.770893Z"
 440 |     },
 441 |     "papermill": {
 442 |      "duration": 58.921369,
 443 |      "end_time": "2025-01-01T15:27:50.773131",
 444 |      "exception": false,
 445 |      "start_time": "2025-01-01T15:26:51.851762",
 446 |      "status": "completed"
 447 |     },
 448 |     "tags": []
 449 |    },
 450 |    "outputs": [
 451 |     {
 452 |      "name": "stderr",
 453 |      "output_type": "stream",
 454 |      "text": [
 455 |       "Training Folds: 100%|██████████| 5/5 [00:27<00:00,  5.49s/it]\n"
 456 |      ]
 457 |     },
 458 |     {
 459 |      "data": {
 460 |       "text/html": [
 461 |        "<div>\n",
 462 |        "<style scoped>\n",
 463 |        "    .dataframe tbody tr th:only-of-type {\n",
 464 |        "        vertical-align: middle;\n",
 465 |        "    }\n",
 466 |        "\n",
 467 |        "    .dataframe tbody tr th {\n",
 468 |        "        vertical-align: top;\n",
 469 |        "    }\n",
 470 |        "\n",
 471 |        "    .dataframe thead th {\n",
 472 |        "        text-align: right;\n",
 473 |        "    }\n",
 474 |        "</style>\n",
 475 |        "<table border=\"1\" class=\"dataframe\">\n",
 476 |        "  <thead>\n",
 477 |        "    <tr style=\"text-align: right;\">\n",
 478 |        "      <th></th>\n",
 479 |        "      <th>id</th>\n",
 480 |        "      <th>sii</th>\n",
 481 |        "    </tr>\n",
 482 |        "  </thead>\n",
 483 |        "  <tbody>\n",
 484 |        "    <tr>\n",
 485 |        "      <th>0</th>\n",
 486 |        "      <td>00008ff9</td>\n",
 487 |        "      <td>2</td>\n",
 488 |        "    </tr>\n",
 489 |        "    <tr>\n",
 490 |        "      <th>1</th>\n",
 491 |        "      <td>000fd460</td>\n",
 492 |        "      <td>0</td>\n",
 493 |        "    </tr>\n",
 494 |        "    <tr>\n",
 495 |        "      <th>2</th>\n",
 496 |        "      <td>00105258</td>\n",
 497 |        "      <td>0</td>\n",
 498 |        "    </tr>\n",
 499 |        "    <tr>\n",
 500 |        "      <th>3</th>\n",
 501 |        "      <td>00115b9f</td>\n",
 502 |        "      <td>1</td>\n",
 503 |        "    </tr>\n",
 504 |        "    <tr>\n",
 505 |        "      <th>4</th>\n",
 506 |        "      <td>0016bb22</td>\n",
 507 |        "      <td>1</td>\n",
 508 |        "    </tr>\n",
 509 |        "    <tr>\n",
 510 |        "      <th>5</th>\n",
 511 |        "      <td>001f3379</td>\n",
 512 |        "      <td>1</td>\n",
 513 |        "    </tr>\n",
 514 |        "    <tr>\n",
 515 |        "      <th>6</th>\n",
 516 |        "      <td>0038ba98</td>\n",
 517 |        "      <td>0</td>\n",
 518 |        "    </tr>\n",
 519 |        "    <tr>\n",
 520 |        "      <th>7</th>\n",
 521 |        "      <td>0068a485</td>\n",
 522 |        "      <td>0</td>\n",
 523 |        "    </tr>\n",
 524 |        "    <tr>\n",
 525 |        "      <th>8</th>\n",
 526 |        "      <td>0069fbed</td>\n",
 527 |        "      <td>1</td>\n",
 528 |        "    </tr>\n",
 529 |        "    <tr>\n",
 530 |        "      <th>9</th>\n",
 531 |        "      <td>0083e397</td>\n",
 532 |        "      <td>1</td>\n",
 533 |        "    </tr>\n",
 534 |        "    <tr>\n",
 535 |        "      <th>10</th>\n",
 536 |        "      <td>0087dd65</td>\n",
 537 |        "      <td>0</td>\n",
 538 |        "    </tr>\n",
 539 |        "    <tr>\n",
 540 |        "      <th>11</th>\n",
 541 |        "      <td>00abe655</td>\n",
 542 |        "      <td>0</td>\n",
 543 |        "    </tr>\n",
 544 |        "    <tr>\n",
 545 |        "      <th>12</th>\n",
 546 |        "      <td>00ae59c9</td>\n",
 547 |        "      <td>1</td>\n",
 548 |        "    </tr>\n",
 549 |        "    <tr>\n",
 550 |        "      <th>13</th>\n",
 551 |        "      <td>00af6387</td>\n",
 552 |        "      <td>1</td>\n",
 553 |        "    </tr>\n",
 554 |        "    <tr>\n",
 555 |        "      <th>14</th>\n",
 556 |        "      <td>00bd4359</td>\n",
 557 |        "      <td>1</td>\n",
 558 |        "    </tr>\n",
 559 |        "    <tr>\n",
 560 |        "      <th>15</th>\n",
 561 |        "      <td>00c0cd71</td>\n",
 562 |        "      <td>2</td>\n",
 563 |        "    </tr>\n",
 564 |        "    <tr>\n",
 565 |        "      <th>16</th>\n",
 566 |        "      <td>00d56d4b</td>\n",
 567 |        "      <td>0</td>\n",
 568 |        "    </tr>\n",
 569 |        "    <tr>\n",
 570 |        "      <th>17</th>\n",
 571 |        "      <td>00d9913d</td>\n",
 572 |        "      <td>0</td>\n",
 573 |        "    </tr>\n",
 574 |        "    <tr>\n",
 575 |        "      <th>18</th>\n",
 576 |        "      <td>00e6167c</td>\n",
 577 |        "      <td>0</td>\n",
 578 |        "    </tr>\n",
 579 |        "    <tr>\n",
 580 |        "      <th>19</th>\n",
 581 |        "      <td>00ebc35d</td>\n",
 582 |        "      <td>1</td>\n",
 583 |        "    </tr>\n",
 584 |        "  </tbody>\n",
 585 |        "</table>\n",
 586 |        "</div>"
 587 |       ],
 588 |       "text/plain": [
 589 |        "          id  sii\n",
 590 |        "0   00008ff9    2\n",
 591 |        "1   000fd460    0\n",
 592 |        "2   00105258    0\n",
 593 |        "3   00115b9f    1\n",
 594 |        "4   0016bb22    1\n",
 595 |        "5   001f3379    1\n",
 596 |        "6   0038ba98    0\n",
 597 |        "7   0068a485    0\n",
 598 |        "8   0069fbed    1\n",
 599 |        "9   0083e397    1\n",
 600 |        "10  0087dd65    0\n",
 601 |        "11  00abe655    0\n",
 602 |        "12  00ae59c9    1\n",
 603 |        "13  00af6387    1\n",
 604 |        "14  00bd4359    1\n",
 605 |        "15  00c0cd71    2\n",
 606 |        "16  00d56d4b    0\n",
 607 |        "17  00d9913d    0\n",
 608 |        "18  00e6167c    0\n",
 609 |        "19  00ebc35d    1"
 610 |       ]
 611 |      },
 612 |      "execution_count": 9,
 613 |      "metadata": {},
 614 |      "output_type": "execute_result"
 615 |     }
 616 |    ],
 617 |    "source": [
 618 |     "# LightGBM hyperparameters\n",
 619 |     "Params = {\n",
 620 |     "    'learning_rate': 0.046, \n",
 621 |     "    'max_depth': 12, \n",
 622 |     "    'num_leaves': 478, \n",
 623 |     "    'min_data_in_leaf': 13, \n",
 624 |     "    'feature_fraction': 0.893, \n",
 625 |     "    'bagging_fraction': 0.784, \n",
 626 |     "    'bagging_freq': 4, \n",
 627 |     "    'lambda_l1': 10,\n",
 628 |     "    'lambda_l2': 0.01, \n",
 629 |     "}\n",
 630 |     "\n",
 631 |     "XGB_Params = {\n",
 632 |     "    'learning_rate': 0.05, \n",
 633 |     "    'max_depth': 6, \n",
 634 |     "    'n_estimators': 200, \n",
 635 |     "    'subsample': 0.8, \n",
 636 |     "    'colsample_bytree': 0.8, \n",
 637 |     "    'reg_alpha': 1, \n",
 638 |     "    'reg_lambda': 5, \n",
 639 |     "    'random_state': SEED, \n",
 640 |     "}\n",
 641 |     "\n",
 642 |     "CatBoost_Params = {\n",
 643 |     "    'learning_rate': 0.05, \n",
 644 |     "    'depth': 6, \n",
 645 |     "    'iterations': 200, \n",
 646 |     "    'random_seed': SEED, \n",
 647 |     "    'cat_features': cat_c, \n",
 648 |     "    'verbose': 0, \n",
 649 |     "    'l2_leaf_reg': 10, \n",
 650 |     "}\n",
 651 |     "\n",
 652 |     "\n",
 653 |     "from collections import Counter\n",
 654 |     "\n",
 655 |     "class_counts = Counter(train['sii'])  \n",
 656 |     "total_samples = len(train)  \n",
 657 |     "# Inverse-frequency class weights: w = total_samples / class_count\n",
 658 |     "class_weights = {cls: total_samples / count for cls, count in class_counts.items()} \n",
 659 |     "\n",
 660 |     "\n",
 661 |     "Params_with_weights = {\n",
 662 |     "    **Params,\n",
 663 |     "    'class_weight': class_weights\n",
 664 |     "}\n",
 665 |     "\n",
 666 |     "\n",
 667 |     "Light = LGBMRegressor(**Params_with_weights, random_state=SEED, verbose=-1, n_estimators=300)\n",
 668 |     "XGB_Model = XGBRegressor(**XGB_Params) \n",
 669 |     "CatBoost_Model = CatBoostRegressor(**CatBoost_Params)  \n",
 670 |     "\n",
 671 |     "voting_model = VotingRegressor(estimators=[\n",
 672 |     "    ('lightgbm', Light),\n",
 673 |     "    ('xgboost', XGB_Model),\n",
 674 |     "    ('catboost', CatBoost_Model)\n",
 675 |     "])\n",
 676 |     "\n",
 677 |     "# Train the ensemble model\n",
 678 |     "Submission1 = TrainML(voting_model, test) \n",
 679 |     "\n",
 680 |     "Submission1"
 681 |    ]
 682 |   },
 683 |   {
 684 |    "cell_type": "markdown",
 685 |    "id": "76614c97",
 686 |    "metadata": {
 687 |     "papermill": {
 688 |      "duration": 0.017445,
 689 |      "end_time": "2025-01-01T15:27:50.809561",
 690 |      "exception": false,
 691 |      "start_time": "2025-01-01T15:27:50.792116",
 692 |      "status": "completed"
 693 |     },
 694 |     "tags": []
 695 |    },
 696 |    "source": [
 697 |     "## Method 2"
 698 |    ]
 699 |   },
 700 |   {
 701 |    "cell_type": "code",
 702 |    "execution_count": 10,
 703 |    "id": "008ee316",
 704 |    "metadata": {
 705 |     "execution": {
 706 |      "iopub.execute_input": "2025-01-01T15:27:50.846386Z",
 707 |      "iopub.status.busy": "2025-01-01T15:27:50.846013Z",
 708 |      "iopub.status.idle": "2025-01-01T15:27:50.885071Z",
 709 |      "shell.execute_reply": "2025-01-01T15:27:50.884286Z"
 710 |     },
 711 |     "papermill": {
 712 |      "duration": 0.059238,
 713 |      "end_time": "2025-01-01T15:27:50.886643",
 714 |      "exception": false,
 715 |      "start_time": "2025-01-01T15:27:50.827405",
 716 |      "status": "completed"
 717 |     },
 718 |     "tags": []
 719 |    },
 720 |    "outputs": [],
 721 |    "source": [
 722 |     "# Load data\n",
 723 |     "train = pd.read_csv('dataset/train.csv') \n",
 724 |     "test = pd.read_csv('dataset/test.csv') \n",
 725 |     "sample = pd.read_csv('dataset/sample_submission.csv') "
 726 |    ]
 727 |   },
 728 |   {
 729 |    "cell_type": "code",
 730 |    "execution_count": 11,
 731 |    "id": "3af3d073",
 732 |    "metadata": {
 733 |     "execution": {
 734 |      "iopub.execute_input": "2025-01-01T15:27:50.923999Z",
 735 |      "iopub.status.busy": "2025-01-01T15:27:50.923707Z",
 736 |      "iopub.status.idle": "2025-01-01T15:28:59.977058Z",
 737 |      "shell.execute_reply": "2025-01-01T15:28:59.976147Z"
 738 |     },
 739 |     "papermill": {
 740 |      "duration": 69.073418,
 741 |      "end_time": "2025-01-01T15:28:59.978281",
 742 |      "exception": false,
 743 |      "start_time": "2025-01-01T15:27:50.904863",
 744 |      "status": "completed"
 745 |     },
 746 |     "tags": []
 747 |    },
 748 |    "outputs": [
 749 |     {
 750 |      "name": "stderr",
 751 |      "output_type": "stream",
 752 |      "text": [
 753 |       "100%|██████████| 111/111 [00:02<00:00, 53.38it/s]\n",
 754 |       "100%|██████████| 2/2 [00:00<00:00, 20.19it/s]\n"
 755 |      ]
 756 |     }
 757 |    ],
 758 |    "source": [
 759 |     "# Load the time-series data, merge it on 'id', then drop the 'id' column\n",
 760 |     "train_ts = load_time_series(\"dataset/series_train.parquet\") \n",
 761 |     "test_ts = load_time_series(\"dataset/series_test.parquet\")  \n",
 762 |     "\n",
 763 |     "time_series_cols = train_ts.columns.tolist()\n",
 764 |     "time_series_cols.remove(\"id\")  \n",
 765 |     "\n",
 766 |     "train = pd.merge(train, train_ts, how=\"left\", on='id')  \n",
 767 |     "test = pd.merge(test, test_ts, how=\"left\", on='id')  \n",
 768 |     "train = train.drop('id', axis=1)  \n",
 769 |     "test = test.drop('id', axis=1)  "
 770 |    ]
 771 |   },
 772 |   {
 773 |    "cell_type": "code",
 774 |    "execution_count": 12,
 775 |    "id": "19390bef",
 776 |    "metadata": {
 777 |     "execution": {
 778 |      "iopub.execute_input": "2025-01-01T15:29:00.045938Z",
 779 |      "iopub.status.busy": "2025-01-01T15:29:00.045693Z",
 780 |      "iopub.status.idle": "2025-01-01T15:29:14.637956Z",
 781 |      "shell.execute_reply": "2025-01-01T15:29:14.637202Z"
 782 |     },
 783 |     "papermill": {
 784 |      "duration": 14.62708,
 785 |      "end_time": "2025-01-01T15:29:14.639577",
 786 |      "exception": false,
 787 |      "start_time": "2025-01-01T15:29:00.012497",
 788 |      "status": "completed"
 789 |     },
 790 |     "tags": []
 791 |    },
 792 |    "outputs": [],
 793 |    "source": [
 794 |     "imputer = KNNImputer(n_neighbors=5) \n",
 795 |     "\n",
 796 |     "numeric_cols = train.select_dtypes(include=['int32', 'int64', 'float64']).columns \n",
 797 |     "imputed_data = imputer.fit_transform(train[numeric_cols])  \n",
 798 |     "train_imputed = pd.DataFrame(imputed_data, columns=numeric_cols)  \n",
 799 |     "train_imputed['sii'] = train_imputed['sii'].round().astype(int)  \n",
 800 |     "for col in train.columns:\n",
 801 |     "    if col not in numeric_cols:\n",
 802 |     "        train_imputed[col] = train[col]  \n",
 803 |     "        \n",
 804 |     "train = train_imputed  "
 805 |    ]
 806 |   },
 807 |   {
 808 |    "cell_type": "code",
 809 |    "execution_count": 13,
 810 |    "id": "301ecede",
 811 |    "metadata": {
 812 |     "execution": {
 813 |      "iopub.execute_input": "2025-01-01T15:29:14.707069Z",
 814 |      "iopub.status.busy": "2025-01-01T15:29:14.706770Z",
 815 |      "iopub.status.idle": "2025-01-01T15:29:14.733969Z",
 816 |      "shell.execute_reply": "2025-01-01T15:29:14.733302Z"
 817 |     },
 818 |     "papermill": {
 819 |      "duration": 0.062428,
 820 |      "end_time": "2025-01-01T15:29:14.735454",
 821 |      "exception": false,
 822 |      "start_time": "2025-01-01T15:29:14.673026",
 823 |      "status": "completed"
 824 |     },
 825 |     "tags": []
 826 |    },
 827 |    "outputs": [],
 828 |    "source": [
 829 |     "def feature_engineering(df):\n",
 830 |     "\n",
 831 |     "    season_cols = [col for col in df.columns if 'Season' in col]  \n",
 832 |     "    df = df.drop(season_cols, axis=1)  # Drop Season (too many missing values)\n",
 833 |     "    df['BMI_Age'] = df['Physical-BMI'] * df['Basic_Demos-Age']  # BMI and age interactions \n",
 834 |     "    df['Internet_Hours_Age'] = df['PreInt_EduHx-computerinternet_hoursday'] * df['Basic_Demos-Age']  # Internet hours and age interactions\n",
 835 |     "    df['BMI_Internet_Hours'] = df['Physical-BMI'] * df['PreInt_EduHx-computerinternet_hoursday']  # BMI and Internet hours interactions\n",
 836 |     "    df['BFP_BMI'] = df['BIA-BIA_Fat'] / df['BIA-BIA_BMI']  # Fat and BMI ratio\n",
 837 |     "    df['FFMI_BFP'] = df['BIA-BIA_FFMI'] / df['BIA-BIA_Fat']  # FFMI and Fat ratio\n",
 838 |     "    df['FMI_BFP'] = df['BIA-BIA_FMI'] / df['BIA-BIA_Fat']  # FMI and Fat ratio\n",
 839 |     "    df['LST_TBW'] = df['BIA-BIA_LST'] / df['BIA-BIA_TBW']  # LST and TBW ratio\n",
 840 |     "    df['BFP_BMR'] = df['BIA-BIA_Fat'] * df['BIA-BIA_BMR']  # Fat and BMR interactions\n",
 841 |     "    df['BFP_DEE'] = df['BIA-BIA_Fat'] * df['BIA-BIA_DEE']  # Fat and DEE interactions\n",
 842 |     "    df['BMR_Weight'] = df['BIA-BIA_BMR'] / df['Physical-Weight']  # BMR and Weight ratio\n",
 843 |     "    df['DEE_Weight'] = df['BIA-BIA_DEE'] / df['Physical-Weight']  # DEE and Weight ratio\n",
 844 |     "    df['SMM_Height'] = df['BIA-BIA_SMM'] / df['Physical-Height']  # SMM and Height ratio\n",
 845 |     "    df['Muscle_to_Fat'] = df['BIA-BIA_SMM'] / df['BIA-BIA_FMI']  # Muscle and Fat ratio\n",
 846 |     "    df['Hydration_Status'] = df['BIA-BIA_TBW'] / df['Physical-Weight']  # TBW and Weight ratio\n",
 847 |     "    df['ICW_TBW'] = df['BIA-BIA_ICW'] / df['BIA-BIA_TBW']  # ICW and TBW ratio\n",
 848 |     "    df['BMI_PHR'] = df['Physical-BMI'] * df['Physical-HeartRate']  # BMI and Heart rate interaction\n",
 849 |     "    return df\n",
 850 |     "\n",
 851 |     "train = feature_engineering(train)  \n",
 852 |     "train = train.dropna(thresh=10, axis=0)  # Keep rows with at least 10 non-missing values\n",
 853 |     "test = feature_engineering(test) "
 854 |    ]
 855 |   },
 856 |   {
 857 |    "cell_type": "code",
 858 |    "execution_count": 14,
 859 |    "id": "5f33cd5d",
 860 |    "metadata": {
 861 |     "execution": {
 862 |      "iopub.execute_input": "2025-01-01T15:29:14.801735Z",
 863 |      "iopub.status.busy": "2025-01-01T15:29:14.801456Z",
 864 |      "iopub.status.idle": "2025-01-01T15:29:14.810767Z",
 865 |      "shell.execute_reply": "2025-01-01T15:29:14.810060Z"
 866 |     },
 867 |     "papermill": {
 868 |      "duration": 0.043658,
 869 |      "end_time": "2025-01-01T15:29:14.812076",
 870 |      "exception": false,
 871 |      "start_time": "2025-01-01T15:29:14.768418",
 872 |      "status": "completed"
 873 |     },
 874 |     "tags": []
 875 |    },
 876 |    "outputs": [],
 877 |    "source": [
 878 |     "featuresCols = ['Basic_Demos-Age', 'Basic_Demos-Sex',\n",
 879 |     "                'CGAS-CGAS_Score', 'Physical-BMI',\n",
 880 |     "                'Physical-Height', 'Physical-Weight', 'Physical-Waist_Circumference',\n",
 881 |     "                'Physical-Diastolic_BP', 'Physical-HeartRate', 'Physical-Systolic_BP',\n",
 882 |     "                'Fitness_Endurance-Max_Stage',\n",
 883 |     "                'Fitness_Endurance-Time_Mins', 'Fitness_Endurance-Time_Sec',\n",
 884 |     "                'FGC-FGC_CU', 'FGC-FGC_CU_Zone', 'FGC-FGC_GSND',\n",
 885 |     "                'FGC-FGC_GSND_Zone', 'FGC-FGC_GSD', 'FGC-FGC_GSD_Zone', 'FGC-FGC_PU',\n",
 886 |     "                'FGC-FGC_PU_Zone', 'FGC-FGC_SRL', 'FGC-FGC_SRL_Zone', 'FGC-FGC_SRR',\n",
 887 |     "                'FGC-FGC_SRR_Zone', 'FGC-FGC_TL', 'FGC-FGC_TL_Zone',\n",
 888 |     "                'BIA-BIA_Activity_Level_num', 'BIA-BIA_BMC', 'BIA-BIA_BMI',\n",
 889 |     "                'BIA-BIA_BMR', 'BIA-BIA_DEE', 'BIA-BIA_ECW', 'BIA-BIA_FFM',\n",
 890 |     "                'BIA-BIA_FFMI', 'BIA-BIA_FMI', 'BIA-BIA_Fat', 'BIA-BIA_Frame_num',\n",
 891 |     "                'BIA-BIA_ICW', 'BIA-BIA_LDM', 'BIA-BIA_LST', 'BIA-BIA_SMM',\n",
 892 |     "                'BIA-BIA_TBW', 'PAQ_A-PAQ_A_Total',\n",
 893 |     "                'PAQ_C-PAQ_C_Total', 'SDS-SDS_Total_Raw',\n",
 894 |     "                'SDS-SDS_Total_T',\n",
 895 |     "                'PreInt_EduHx-computerinternet_hoursday', 'BMI_Age', 'Internet_Hours_Age', 'BMI_Internet_Hours',\n",
 896 |     "                'BFP_BMI', 'FFMI_BFP', 'FMI_BFP', 'LST_TBW', 'BFP_BMR', 'BFP_DEE', 'BMR_Weight', 'DEE_Weight', 'SMM_Height', 'Muscle_to_Fat', 'Hydration_Status', 'ICW_TBW', 'BMI_PHR',\n",
 897 |     "                ]\n",
 898 |     "\n",
 899 |     "train = train[featuresCols + time_series_cols + ['sii']]\n",
 900 |     "train = train.dropna(subset='sii') \n",
 901 |     "\n",
 902 |     "\n",
 903 |     "test = test[featuresCols + time_series_cols]"
 904 |    ]
 905 |   },
 906 |   {
 907 |    "cell_type": "code",
 908 |    "execution_count": 15,
 909 |    "id": "1c3192d2",
 910 |    "metadata": {
 911 |     "execution": {
 912 |      "iopub.execute_input": "2025-01-01T15:29:14.877874Z",
 913 |      "iopub.status.busy": "2025-01-01T15:29:14.877598Z",
 914 |      "iopub.status.idle": "2025-01-01T15:29:14.887045Z",
 915 |      "shell.execute_reply": "2025-01-01T15:29:14.886388Z"
 916 |     },
 917 |     "papermill": {
 918 |      "duration": 0.044146,
 919 |      "end_time": "2025-01-01T15:29:14.888509",
 920 |      "exception": false,
 921 |      "start_time": "2025-01-01T15:29:14.844363",
 922 |      "status": "completed"
 923 |     },
 924 |     "tags": []
 925 |    },
 926 |    "outputs": [],
 927 |    "source": [
 928 |     "if np.any(np.isinf(train)):\n",
 929 |     "    train = train.replace([np.inf, -np.inf], np.nan) "
 930 |    ]
 931 |   },
 932 |   {
 933 |    "cell_type": "markdown",
 934 |    "id": "b311fd0f",
 935 |    "metadata": {
 936 |     "papermill": {
 937 |      "duration": 0.034398,
 938 |      "end_time": "2025-01-01T15:29:14.957111",
 939 |      "exception": false,
 940 |      "start_time": "2025-01-01T15:29:14.922713",
 941 |      "status": "completed"
 942 |     },
 943 |     "tags": []
 944 |    },
 945 |    "source": [
 946 |     "### Training model"
 947 |    ]
 948 |   },
 949 |   {
 950 |    "cell_type": "code",
 951 |    "execution_count": 16,
 952 |    "id": "4514fd9d",
 953 |    "metadata": {
 954 |     "execution": {
 955 |      "iopub.execute_input": "2025-01-01T15:29:15.066031Z",
 956 |      "iopub.status.busy": "2025-01-01T15:29:15.065714Z",
 957 |      "iopub.status.idle": "2025-01-01T15:29:15.073359Z",
 958 |      "shell.execute_reply": "2025-01-01T15:29:15.072524Z"
 959 |     },
 960 |     "papermill": {
 961 |      "duration": 0.04206,
 962 |      "end_time": "2025-01-01T15:29:15.074631",
 963 |      "exception": false,
 964 |      "start_time": "2025-01-01T15:29:15.032571",
 965 |      "status": "completed"
 966 |     },
 967 |     "tags": []
 968 |    },
 969 |    "outputs": [],
 970 |    "source": [
 971 |     "def TrainML(model_class, test_data):\n",
 972 |     "    X = train.drop(['sii'], axis=1)  \n",
 973 |     "    y = train['sii']  \n",
 974 |     "\n",
 975 |     "    SKF = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=SEED)  \n",
 976 |     "    \n",
 977 |     "    train_S = []\n",
 978 |     "    test_S = []\n",
 979 |     "    \n",
 980 |     "    oof_non_rounded = np.zeros(len(y), dtype=float)  \n",
 981 |     "    oof_rounded = np.zeros(len(y), dtype=int) \n",
 982 |     "    test_preds = np.zeros((len(test_data), n_splits)) \n",
 983 |     "\n",
 984 |     "    for fold, (train_idx, test_idx) in enumerate(tqdm(SKF.split(X, y), desc=\"Training Folds\", total=n_splits)):\n",
 985 |     "        X_train, X_val = X.iloc[train_idx], X.iloc[test_idx]  \n",
 986 |     "        y_train, y_val = y.iloc[train_idx], y.iloc[test_idx]  \n",
 987 |     "\n",
 988 |     "        model = clone(model_class)  \n",
 989 |     "        model.fit(X_train, y_train) \n",
 990 |     "\n",
 991 |     "        y_train_pred = model.predict(X_train)  \n",
 992 |     "        y_val_pred = model.predict(X_val)  \n",
 993 |     "\n",
 994 |     "        oof_non_rounded[test_idx] = y_val_pred  \n",
 995 |     "        y_val_pred_rounded = y_val_pred.round(0).astype(int)  \n",
 996 |     "        oof_rounded[test_idx] = y_val_pred_rounded  \n",
 997 |     "\n",
 998 |     "        train_kappa = quadratic_weighted_kappa(y_train, y_train_pred.round(0).astype(int))  \n",
 999 |     "        val_kappa = quadratic_weighted_kappa(y_val, y_val_pred_rounded)  \n",
1000 |     "\n",
1001 |     "        train_S.append(train_kappa)  \n",
1002 |     "        test_S.append(val_kappa)  \n",
1003 |     "        \n",
1004 |     "        test_preds[:, fold] = model.predict(test_data)\n",
1005 |     "        \n",
1006 |     "        clear_output(wait=True) \n",
1007 |     "\n",
1008 |     "    # Maximize the Kappa score by optimizing the rounding thresholds\n",
1009 |     "    KappaOPtimizer = minimize(evaluate_predictions,\n",
1010 |     "                              x0=[0.5, 1.5, 2.5], args=(y, oof_non_rounded), \n",
1011 |     "                              method='Nelder-Mead')\n",
1012 |     "    assert KappaOPtimizer.success, \"Optimization did not converge.\"  \n",
1013 |     "    \n",
1014 |     "    oof_tuned = threshold_Rounder(oof_non_rounded, KappaOPtimizer.x)  \n",
1015 |     "    tKappa = quadratic_weighted_kappa(y, oof_tuned) \n",
1016 |     "\n",
1017 |     "    tpm = test_preds.mean(axis=1)  \n",
1018 |     "    tp_rounded = threshold_Rounder(tpm, KappaOPtimizer.x)  \n",
1019 |     "\n",
1020 |     "    return tp_rounded  "
1021 |    ]
1022 |   },
1023 |   {
1024 |    "cell_type": "code",
1025 |    "execution_count": 17,
1026 |    "id": "2f8d68aa",
1027 |    "metadata": {
1028 |     "execution": {
1029 |      "iopub.execute_input": "2025-01-01T15:29:15.140584Z",
1030 |      "iopub.status.busy": "2025-01-01T15:29:15.140266Z",
1031 |      "iopub.status.idle": "2025-01-01T15:32:12.170806Z",
1032 |      "shell.execute_reply": "2025-01-01T15:32:12.169857Z"
1033 |     },
1034 |     "papermill": {
1035 |      "duration": 177.065371,
1036 |      "end_time": "2025-01-01T15:32:12.172492",
1037 |      "exception": false,
1038 |      "start_time": "2025-01-01T15:29:15.107121",
1039 |      "status": "completed"
1040 |     },
1041 |     "tags": []
1042 |    },
1043 |    "outputs": [
1044 |     {
1045 |      "name": "stderr",
1046 |      "output_type": "stream",
1047 |      "text": [
1048 |       "Training Folds: 100%|██████████| 5/5 [00:59<00:00, 11.88s/it]\n"
1049 |      ]
1050 |     }
1051 |    ],
1052 |    "source": [
1053 |     "# Ensemble Model\n",
1054 |     "imputer = SimpleImputer(strategy='median')  \n",
1055 |     "\n",
1056 |     "ensemble = VotingRegressor(estimators=[\n",
1057 |     "    ('lgb', Pipeline(steps=[('imputer', imputer), ('regressor', LGBMRegressor(random_state=SEED))])),  # LightGBM\n",
1058 |     "    ('xgb', Pipeline(steps=[('imputer', imputer), ('regressor', XGBRegressor(random_state=SEED))])),  # XGBoost\n",
1059 |     "    ('cat', Pipeline(steps=[('imputer', imputer), ('regressor', CatBoostRegressor(random_state=SEED, silent=True))])),  # CatBoost\n",
1060 |     "    ('rf', Pipeline(steps=[('imputer', imputer), ('regressor', RandomForestRegressor(random_state=SEED))])),  # Random Forest\n",
1061 |     "    ('gb', Pipeline(steps=[('imputer', imputer), ('regressor', GradientBoostingRegressor(random_state=SEED))]))  # Gradient Boosting\n",
1062 |     "])\n",
1063 |     "\n",
1064 |     "Submission2 = TrainML(ensemble, test)  "
1065 |    ]
1066 |   },
1067 |   {
1068 |    "cell_type": "code",
1069 |    "execution_count": 18,
1070 |    "id": "9f90de79",
1071 |    "metadata": {
1072 |     "execution": {
1073 |      "iopub.execute_input": "2025-01-01T15:32:12.248388Z",
1074 |      "iopub.status.busy": "2025-01-01T15:32:12.247952Z",
1075 |      "iopub.status.idle": "2025-01-01T15:32:12.258275Z",
1076 |      "shell.execute_reply": "2025-01-01T15:32:12.257317Z"
1077 |     },
1078 |     "papermill": {
1079 |      "duration": 0.050096,
1080 |      "end_time": "2025-01-01T15:32:12.259945",
1081 |      "exception": false,
1082 |      "start_time": "2025-01-01T15:32:12.209849",
1083 |      "status": "completed"
1084 |     },
1085 |     "tags": []
1086 |    },
1087 |    "outputs": [
1088 |     {
1089 |      "data": {
1090 |       "text/html": [
1091 |        "<div>\n",
1092 |        "<style scoped>\n",
1093 |        "    .dataframe tbody tr th:only-of-type {\n",
1094 |        "        vertical-align: middle;\n",
1095 |        "    }\n",
1096 |        "\n",
1097 |        "    .dataframe tbody tr th {\n",
1098 |        "        vertical-align: top;\n",
1099 |        "    }\n",
1100 |        "\n",
1101 |        "    .dataframe thead th {\n",
1102 |        "        text-align: right;\n",
1103 |        "    }\n",
1104 |        "</style>\n",
1105 |        "<table border=\"1\" class=\"dataframe\">\n",
1106 |        "  <thead>\n",
1107 |        "    <tr style=\"text-align: right;\">\n",
1108 |        "      <th></th>\n",
1109 |        "      <th>id</th>\n",
1110 |        "      <th>sii</th>\n",
1111 |        "    </tr>\n",
1112 |        "  </thead>\n",
1113 |        "  <tbody>\n",
1114 |        "    <tr>\n",
1115 |        "      <th>0</th>\n",
1116 |        "      <td>00008ff9</td>\n",
1117 |        "      <td>2</td>\n",
1118 |        "    </tr>\n",
1119 |        "    <tr>\n",
1120 |        "      <th>1</th>\n",
1121 |        "      <td>000fd460</td>\n",
1122 |        "      <td>0</td>\n",
1123 |        "    </tr>\n",
1124 |        "    <tr>\n",
1125 |        "      <th>2</th>\n",
1126 |        "      <td>00105258</td>\n",
1127 |        "      <td>0</td>\n",
1128 |        "    </tr>\n",
1129 |        "    <tr>\n",
1130 |        "      <th>3</th>\n",
1131 |        "      <td>00115b9f</td>\n",
1132 |        "      <td>1</td>\n",
1133 |        "    </tr>\n",
1134 |        "    <tr>\n",
1135 |        "      <th>4</th>\n",
1136 |        "      <td>0016bb22</td>\n",
1137 |        "      <td>0</td>\n",
1138 |        "    </tr>\n",
1139 |        "    <tr>\n",
1140 |        "      <th>5</th>\n",
1141 |        "      <td>001f3379</td>\n",
1142 |        "      <td>1</td>\n",
1143 |        "    </tr>\n",
1144 |        "    <tr>\n",
1145 |        "      <th>6</th>\n",
1146 |        "      <td>0038ba98</td>\n",
1147 |        "      <td>0</td>\n",
1148 |        "    </tr>\n",
1149 |        "    <tr>\n",
1150 |        "      <th>7</th>\n",
1151 |        "      <td>0068a485</td>\n",
1152 |        "      <td>0</td>\n",
1153 |        "    </tr>\n",
1154 |        "    <tr>\n",
1155 |        "      <th>8</th>\n",
1156 |        "      <td>0069fbed</td>\n",
1157 |        "      <td>1</td>\n",
1158 |        "    </tr>\n",
1159 |        "    <tr>\n",
1160 |        "      <th>9</th>\n",
1161 |        "      <td>0083e397</td>\n",
1162 |        "      <td>0</td>\n",
1163 |        "    </tr>\n",
1164 |        "    <tr>\n",
1165 |        "      <th>10</th>\n",
1166 |        "      <td>0087dd65</td>\n",
1167 |        "      <td>0</td>\n",
1168 |        "    </tr>\n",
1169 |        "    <tr>\n",
1170 |        "      <th>11</th>\n",
1171 |        "      <td>00abe655</td>\n",
1172 |        "      <td>0</td>\n",
1173 |        "    </tr>\n",
1174 |        "    <tr>\n",
1175 |        "      <th>12</th>\n",
1176 |        "      <td>00ae59c9</td>\n",
1177 |        "      <td>1</td>\n",
1178 |        "    </tr>\n",
1179 |        "    <tr>\n",
1180 |        "      <th>13</th>\n",
1181 |        "      <td>00af6387</td>\n",
1182 |        "      <td>0</td>\n",
1183 |        "    </tr>\n",
1184 |        "    <tr>\n",
1185 |        "      <th>14</th>\n",
1186 |        "      <td>00bd4359</td>\n",
1187 |        "      <td>1</td>\n",
1188 |        "    </tr>\n",
1189 |        "    <tr>\n",
1190 |        "      <th>15</th>\n",
1191 |        "      <td>00c0cd71</td>\n",
1192 |        "      <td>1</td>\n",
1193 |        "    </tr>\n",
1194 |        "    <tr>\n",
1195 |        "      <th>16</th>\n",
1196 |        "      <td>00d56d4b</td>\n",
1197 |        "      <td>0</td>\n",
1198 |        "    </tr>\n",
1199 |        "    <tr>\n",
1200 |        "      <th>17</th>\n",
1201 |        "      <td>00d9913d</td>\n",
1202 |        "      <td>0</td>\n",
1203 |        "    </tr>\n",
1204 |        "    <tr>\n",
1205 |        "      <th>18</th>\n",
1206 |        "      <td>00e6167c</td>\n",
1207 |        "      <td>0</td>\n",
1208 |        "    </tr>\n",
1209 |        "    <tr>\n",
1210 |        "      <th>19</th>\n",
1211 |        "      <td>00ebc35d</td>\n",
1212 |        "      <td>0</td>\n",
1213 |        "    </tr>\n",
1214 |        "  </tbody>\n",
1215 |        "</table>\n",
1216 |        "</div>"
1217 |       ],
1218 |       "text/plain": [
1219 |        "          id  sii\n",
1220 |        "0   00008ff9    2\n",
1221 |        "1   000fd460    0\n",
1222 |        "2   00105258    0\n",
1223 |        "3   00115b9f    1\n",
1224 |        "4   0016bb22    0\n",
1225 |        "5   001f3379    1\n",
1226 |        "6   0038ba98    0\n",
1227 |        "7   0068a485    0\n",
1228 |        "8   0069fbed    1\n",
1229 |        "9   0083e397    0\n",
1230 |        "10  0087dd65    0\n",
1231 |        "11  00abe655    0\n",
1232 |        "12  00ae59c9    1\n",
1233 |        "13  00af6387    0\n",
1234 |        "14  00bd4359    1\n",
1235 |        "15  00c0cd71    1\n",
1236 |        "16  00d56d4b    0\n",
1237 |        "17  00d9913d    0\n",
1238 |        "18  00e6167c    0\n",
1239 |        "19  00ebc35d    0"
1240 |       ]
1241 |      },
1242 |      "execution_count": 18,
1243 |      "metadata": {},
1244 |      "output_type": "execute_result"
1245 |     }
1246 |    ],
1247 |    "source": [
1248 |     "Submission2 = pd.DataFrame({\n",
1249 |     "    'id': sample['id'],\n",
1250 |     "    'sii': Submission2\n",
1251 |     "})  \n",
1252 |     "\n",
1253 |     "Submission2"
1254 |    ]
1255 |   },
1256 |   {
1257 |    "cell_type": "code",
1258 |    "execution_count": 19,
1259 |    "id": "9b523b1b",
1260 |    "metadata": {},
1261 |    "outputs": [],
1262 |    "source": [
1263 |     "sub1 = Submission1  \n",
1264 |     "sub2 = Submission2  \n",
1265 |     "sub1 = sub1.sort_values(by='id').reset_index(drop=True)  \n",
1266 |     "sub2 = sub2.sort_values(by='id').reset_index(drop=True)  \n",
1267 |     "\n",
1268 |     "combined = pd.DataFrame({\n",
1269 |     "    'id': sub1['id'],\n",
1270 |     "    'sii_1': sub1['sii'],\n",
1271 |     "    'sii_2': sub2['sii']\n",
1272 |     "})  \n",
1273 |     "\n",
1274 |     "def majority_vote(row):\n",
1275 |     "    \"\"\"\n",
1276 |     "    For each row of predictions, perform majority voting.\n",
1277 |     "    If the modes tie, take the row mean and round it (NumPy rounds halves to even, so 0.5 -> 0 and 1.5 -> 2).\n",
1278 |     "\n",
1279 |     "    Parameters:\n",
1280 |     "    - row: A row of prediction values\n",
1281 |     "\n",
1282 |     "    Returns:\n",
1283 |     "    - The final predicted 'sii' value\n",
1284 |     "    \"\"\"\n",
1285 |     "    return row.mode()[0] if len(row.mode()) == 1 else row.mean().round().astype(int)\n",
1286 |     "\n",
1287 |     "combined['final_sii'] = combined[['sii_1', 'sii_2']].apply(majority_vote, axis=1)  \n",
1288 |     "\n",
1289 |     "final_submission = combined[['id', 'final_sii']].rename(columns={'final_sii': 'sii'})  \n",
1290 |     "final_submission.to_csv('submission.csv', index=False)  "
1291 |    ]
1292 |   }
1293 |  ],
1294 |  "metadata": {
1295 |   "kaggle": {
1296 |    "accelerator": "none",
1297 |    "dataSources": [
1298 |     {
1299 |      "databundleVersionId": 9643020,
1300 |      "sourceId": 81933,
1301 |      "sourceType": "competition"
1302 |     }
1303 |    ],
1304 |    "isGpuEnabled": false,
1305 |    "isInternetEnabled": false,
1306 |    "language": "python",
1307 |    "sourceType": "notebook"
1308 |   },
1309 |   "kernelspec": {
1310 |    "display_name": "CMI-PIU",
1311 |    "language": "python",
1312 |    "name": "python3"
1313 |   },
1314 |   "language_info": {
1315 |    "codemirror_mode": {
1316 |     "name": "ipython",
1317 |     "version": 3
1318 |    },
1319 |    "file_extension": ".py",
1320 |    "mimetype": "text/x-python",
1321 |    "name": "python",
1322 |    "nbconvert_exporter": "python",
1323 |    "pygments_lexer": "ipython3",
1324 |    "version": "3.8.20"
1325 |   },
1326 |   "papermill": {
1327 |    "default_parameters": {},
1328 |    "duration": 399.490884,
1329 |    "end_time": "2025-01-01T15:32:13.524661",
1330 |    "environment_variables": {},
1331 |    "exception": null,
1332 |    "input_path": "__notebook__.ipynb",
1333 |    "output_path": "__notebook__.ipynb",
1334 |    "parameters": {},
1335 |    "start_time": "2025-01-01T15:25:34.033777",
1336 |    "version": "2.6.0"
1337 |   }
1338 |  },
1339 |  "nbformat": 4,
1340 |  "nbformat_minor": 5
1341 | }
1342 | 


--------------------------------------------------------------------------------