├── LICENSE
├── README.md
├── configs
│   ├── autoint.yaml
│   ├── dcnv2.yaml
│   ├── default
│   │   ├── autoint.yaml
│   │   ├── dcnv2.yaml
│   │   ├── ft-transformer.yaml
│   │   ├── mlp.yaml
│   │   └── node.yaml
│   ├── ft-transformer.yaml
│   ├── mlp.yaml
│   └── node.yaml
├── data
│   ├── __init__.py
│   ├── custom_datasets
│   │   └── infos.json
│   ├── env.py
│   ├── processor.py
│   └── utils.py
├── examples
│   ├── add_custom_dataset.py
│   ├── finetune_baseline.py
│   └── tune_baseline.py
├── image
│   └── auto_skdl-logo.png
├── models
│   ├── __init__.py
│   ├── abstract.py
│   ├── autoint.py
│   ├── dcnv2.py
│   ├── ft_transformer.py
│   ├── mlp.py
│   ├── node
│   │   ├── __init__.py
│   │   ├── arch.py
│   │   ├── nn_utils.py
│   │   ├── odst.py
│   │   └── utils.py
│   └── node_model.py
├── requirements.txt
└── utils
    ├── deep.py
    ├── metrics.py
    └── model.py

/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 | 
3 | Copyright (c) 2023 Tabular AI Research
4 | 
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | 
2 | 
3 | # Auto-Scikit-DL: An automatic deep tabular learning package
4 | 
5 | *Auto-Scikit-DL* is a deep tabular learning package that serves as a complement to scikit-learn. It will contain classical and advanced deep model baselines for tabular (machine) learning, automatic feature engineering and model selection methods, and flexible customization of the training paradigm. The project aims to provide a unified baseline interface and benchmark usage for the academic community, convenient pipeline construction for machine learning competitions, and rapid experimentation for machine learning projects, helping people focus on their specific algorithm design.
6 | 
7 | It is currently under construction by [LionSenSei](https://github.com/jyansir). More baselines are coming soon, and the project will be packaged for public use in the future. If you run into problems or have suggestions, feel free to contact [jyansir@zju.edu.cn]().
8 | 
9 | 
10 | ## Baselines
11 | 
12 | Here is the list of baselines we plan to include in this package (continuously updated):
13 | 
14 | | Paper | Baseline | Year | Link |
15 | | :------------------------------------------------------------- | :------- | :---: | :--- |
16 | | AutoInt: Automatic Feature Interaction Learning via<br>
Self-Attentive Neural Networks | AutoInt | 2019 | [arXiv](https://arxiv.org/abs/1810.11921) | 17 | | Neural Oblivious Decision Ensembles for Deep Learning on Tabular Data | NODE | 2019 | [arXiv](https://arxiv.org/abs/1909.06312) | 18 | | DCN V2: Improved Deep & Cross Network and Practical Lessons
for Web-scale Learning to Rank Systems | DCNv2 | 2020 | [arXiv](https://arxiv.org/abs/2008.13535) | 19 | | TabNet: Attentive Interpretable Tabular Learning | TabNet | 2020 | [arXiv](https://arxiv.org/abs/1908.07442) | 20 | | Contrastive Mixup: Self- and Semi-Supervised learning for Tabular Domain | VIME | 2021 | [arXiv](https://arxiv.org/abs/2108.12296) | 21 | | Revisiting Deep Learning Models for Tabular Data | FT-Transformer | 2021 | [arXiv](https://arxiv.org/abs/2106.11959) | 22 | | Saint: Improved neural networks for tabular data via
row attention and contrastive pre-training | SAINT | 2021 | [arXiv](https://arxiv.org/abs/2106.01342) | 23 | | T2G-Former: Organizing Tabular Features into Relation Graphs
Promotes Heterogeneous Feature Interaction | T2G-Former | 2022 | [arXiv](https://arxiv.org/abs/2211.16887) | 24 | | TabPFN: A Transformer That Solves Small Tabular Classification
Problems in a Second | TabPFN | 2022 | [arXiv](https://arxiv.org/abs/2207.01848) |
25 | | ExcelFormer: A Neural Network Surpassing GBDTs on Tabular Data | ExcelFormer | 2023 | [arXiv](https://arxiv.org/abs/2301.02819) |
26 | 
27 | 
28 | ## Basic Framework
29 | 
30 | The project is organized into several parts:
31 | 
32 | - `data`: in-built dataset and benchmark files, global dataset settings and information, and common data preprocessing scripts.
33 | 
34 | - `models`: baseline implementations, plus an abstract class `TabModel` that defines the uniform deep tabular model interface and training paradigm.
35 | 
36 | - `configs`: default hyper-parameters and the hyper-parameter search spaces of the baselines as reported in their original papers.
37 | 
38 | - `utils`: basic functionalities: `model` for building and tuning baselines; `deep` for common deep learning functions and optimizers; `metrics` for metric calculation.
39 | 
40 | ## Examples
41 | 
42 | Some basic usage examples are provided in the `examples` directory; you can run them with `python examples/script_name.py`. Before running the examples, download our prepared in-built datasets used in the [T2G-Former](https://arxiv.org/abs/2211.16887) experiments from this [link](https://drive.google.com/uc?export=download&id=1dIp78bZo0I0TJATmZKzBhrbZxFVzJYLR), then extract them into the `data/datasets` folder.
43 | 
44 | ```
45 | mkdir ./data/datasets # create the directory if it does not exist
46 | tar -zxvf t2g-data0.tar.gz -C ./data/datasets
47 | ```
48 | 
49 | - **Add a custom dataset from a single csv file**: If you want to load a csv file like the in-built datasets, we provide an interface that automatically processes the raw csv file and stores it in the package, after which you can load it easily.
50 | 
51 | - **Finetune a baseline**: You can easily finetune a model with our `fit` and `predict` APIs (a condensed sketch of this workflow is given at the end of this README).
52 | 
53 | - **Tune a baseline**: We provide an end-to-end `tune` function that performs hyper-parameter search in the spaces defined in `configs`. You can also define your own search spaces (refer to our config files).
54 | 
55 | ## Add your models
56 | 
57 | Currently, you can only do this by manually copying your model code into the `models` folder (refer to `models/mlp.py` for API alignment; we suggest copying it and adding your model code directly). Then modify `MODEL_CARDS` in `utils/model.py` to add and import your model. We will support adding user models with simple scripts in the future.
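
## Quick usage sketch

The finetuning workflow boils down to a short sequence of calls. Below is a condensed, non-authoritative sketch of that flow, assuming the in-built `adult` dataset and the `mlp` baseline; the function names (`load_preproc_default`, `load_config_from_file`, `make_baseline`, `prepare`, `fit`, `predict`) are taken from `examples/finetune_baseline.py` included below, and the paths are placeholders to adapt to your setup.

```
import torch
from data.processor import DataProcessor
from utils.model import seed_everything, make_baseline, load_config_from_file

seed_everything(42)
device = torch.device('cuda')

# load and preprocess an available dataset for the chosen model type
output_dir = 'results/mlp/adult'  # placeholder result directory
dataset = DataProcessor.load_preproc_default(output_dir, 'mlp', 'adult', seed=0)

# build the model from its default config
configs = load_config_from_file('configs/default/mlp.yaml')
configs['training'].update({'max_epochs': 100, 'batch_size': 128})
configs['meta'] = {'save_path': output_dir}
model = make_baseline(
    'mlp', configs['model'],
    n_num=dataset.n_num_features,
    cat_card=dataset.get_category_sizes('train') or None,
    n_labels=dataset.n_classes or 1,  # regression: n_classes is None
    device=device,
)

# train with early stopping, then evaluate on the test split
datas = DataProcessor.prepare(dataset, model)
model.fit(
    X_num=datas['train'][0], X_cat=datas['train'][1], ys=datas['train'][2],
    y_std=dataset.y_info.get('std'), eval_set=(datas['val'],), patience=8,
    task=dataset.task_type.value,
    training_args=configs['training'], meta_args=configs['meta'],
)
predictions, results = model.predict(
    X_num=datas['test'][0], X_cat=datas['test'][1], ys=datas['test'][2],
    y_std=dataset.y_info.get('std'), task=dataset.task_type.value,
    return_probs=True, return_metric=True, return_loss=True,
)
```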
58 | 59 | -------------------------------------------------------------------------------- /configs/autoint.yaml: -------------------------------------------------------------------------------- 1 | model: 2 | activation: 3 | value: "relu" 4 | type: "const" 5 | initialization: 6 | value: "kaiming" 7 | type: "const" 8 | n_heads: 9 | value: 2 10 | type: "const" 11 | prenormalization: 12 | value: false 13 | type: "const" 14 | attention_dropout: 15 | min: 0 16 | max: 0.5 17 | type: "uniform" 18 | d_token: 19 | min: 8 20 | max: 64 21 | type: "int" 22 | n_layers: 23 | min: 1 24 | max: 6 25 | type: "int" 26 | residual_dropout: 27 | min: 0 28 | max: 0.5 29 | type: "uniform" 30 | training: 31 | lr: 32 | min: 1.0e-5 33 | max: 1.0e-3 34 | type: "loguniform" 35 | min2: 3.0e-5 36 | max2: 3.0e-4 37 | type2: "loguniform" 38 | weight_decay: 39 | min: 1.0e-6 40 | max: 1.0e-3 41 | type: "loguniform" 42 | optimizer: 43 | value: "adamw" 44 | type: "const" -------------------------------------------------------------------------------- /configs/dcnv2.yaml: -------------------------------------------------------------------------------- 1 | model: 2 | cross_dropout: 3 | min: 0 4 | max: 0.5 5 | type: "uniform" 6 | d: 7 | min: 64 8 | max: 512 9 | type: "int" 10 | min2: 64 11 | max2: 1024 12 | type2: "int" 13 | hidden_dropout: 14 | min: 0 15 | max: 0.5 16 | type: "uniform" 17 | n_cross_layers: 18 | min: 1 19 | max: 8 20 | type: "int" 21 | min2: 1 22 | max2: 16 23 | type2: "int" 24 | n_hidden_layers: 25 | min: 1 26 | max: 8 27 | type: "int" 28 | min2: 1 29 | max2: 16 30 | type2: "int" 31 | stacked: 32 | value: false 33 | type: "const" 34 | d_embedding: 35 | min: 64 36 | max: 512 37 | type: "int" 38 | training: 39 | lr: 40 | min: 1.0e-5 41 | max: 1.0e-2 42 | type: "loguniform" 43 | weight_decay: 44 | min: 1.0e-6 45 | max: 1.0e-3 46 | type: "loguniform" 47 | optimizer: 48 | value: "adamw" 49 | type: "const" -------------------------------------------------------------------------------- /configs/default/autoint.yaml: -------------------------------------------------------------------------------- 1 | model: 2 | activation: "relu" 3 | initialization: "kaiming" 4 | n_heads: 2 5 | prenormalization: false 6 | attention_dropout: 0.2 7 | d_token: 32 8 | n_layers: 3 9 | residual_dropout: 0.2 10 | training: 11 | lr: 5.0e-4 12 | weight_decay: 2.0e-5 13 | optimizer: "adamw" 14 | -------------------------------------------------------------------------------- /configs/default/dcnv2.yaml: -------------------------------------------------------------------------------- 1 | model: 2 | cross_dropout: 0.2 3 | d: 64 4 | hidden_dropout: 0.2 5 | n_cross_layers: 3 6 | n_hidden_layers: 3 7 | stacked: false 8 | d_embedding: 64 9 | training: 10 | lr: 5.0e-4 11 | weight_decay: 1.0e-6 12 | optimizer: "adamw" 13 | -------------------------------------------------------------------------------- /configs/default/ft-transformer.yaml: -------------------------------------------------------------------------------- 1 | model: 2 | d_token: 64 3 | n_blocks: 3 4 | attention_dropout: 0.2 5 | ffn_d_factor: 1.33 6 | ffn_dropout: 0.3 7 | residual_dropout: 0.1 8 | training: 9 | lr: 5.0e-4 10 | weight_decay: 2.0e-5 11 | optimizer: "adamw" 12 | -------------------------------------------------------------------------------- /configs/default/mlp.yaml: -------------------------------------------------------------------------------- 1 | model: 2 | n_layers: 3 3 | first_dim: 64 4 | mid_dim: 32 5 | last_dim: 64 6 | dropout: 0.2 7 | d_embedding: 128 8 | 9 | 
training: 10 | lr: 1.0e-3 11 | weight_decay: 0.0 12 | optimizer: "adamw" 13 | -------------------------------------------------------------------------------- /configs/default/node.yaml: -------------------------------------------------------------------------------- 1 | model: 2 | bin_function: "entmoid15" 3 | choice_function: "entmax15" 4 | depth: 6 5 | layer_dim: 256 6 | num_layers: 4 7 | d_embedding: 256 8 | training: 9 | lr: 1.0e-3 10 | weight_decay: 0.0 11 | optimizer: "adam" 12 | -------------------------------------------------------------------------------- /configs/ft-transformer.yaml: -------------------------------------------------------------------------------- 1 | model: 2 | d_token: 3 | min: 64 4 | max: 512 5 | type: "int" 6 | 7 | n_blocks: 8 | min: 1 9 | max: 4 10 | type: "int" 11 | # for large datasets 12 | min2: 1 13 | max2: 6 14 | type2: "int" 15 | 16 | attention_dropout: 17 | min: 0 18 | max: 0.5 19 | type: "uniform" 20 | 21 | ffn_d_factor: 22 | min: 0.66 23 | max: 2.66 24 | type: "uniform" 25 | value2: 1.33 26 | type2: "const" 27 | 28 | ffn_dropout: 29 | min: 0 30 | max: 0.5 31 | type: "uniform" 32 | 33 | residual_dropout: 34 | min: 0 35 | max: 0.2 36 | type: "uniform" 37 | value2: 0 38 | type2: "const" 39 | 40 | training: 41 | lr: 42 | min: 1.0e-5 43 | max: 1.0e-3 44 | type: "loguniform" 45 | min2: 3.0e-5 46 | max2: 3.0e-4 47 | type2: "loguniform" 48 | 49 | weight_decay: 50 | min: 1.0e-6 51 | max: 1.0e-3 52 | type: "loguniform" 53 | 54 | optimizer: 55 | value: "adamw" 56 | type: "const" 57 | -------------------------------------------------------------------------------- /configs/mlp.yaml: -------------------------------------------------------------------------------- 1 | model: 2 | n_layers: 3 | min: 1 4 | max: 8 5 | type: "int" 6 | # for large datasets 7 | min2: 1 8 | max2: 16 9 | type2: "int" 10 | 11 | first_dim: 12 | min: 1 13 | max: 512 14 | type: "int" 15 | # for large datasets 16 | min2: 1 17 | max2: 1024 18 | type2: "int" 19 | 20 | mid_dim: 21 | min: 1 22 | max: 512 23 | type: "int" 24 | # for large datasets 25 | min2: 1 26 | max2: 1024 27 | type2: "int" 28 | 29 | last_dim: 30 | min: 1 31 | max: 512 32 | type: "int" 33 | # for large datasets 34 | min2: 1 35 | max2: 1024 36 | type2: "int" 37 | 38 | dropout: 39 | min: 0 40 | max: 0.5 41 | type: "uniform" 42 | 43 | d_embedding: 44 | min: 64 45 | max: 512 46 | type: "int" 47 | 48 | training: 49 | lr: 50 | min: 1.0e-5 51 | max: 1.0e-2 52 | type: "loguniform" 53 | 54 | weight_decay: 55 | value: 0.0 56 | type: "const" 57 | 58 | optimizer: 59 | value: "adamw" 60 | type: "const" -------------------------------------------------------------------------------- /configs/node.yaml: -------------------------------------------------------------------------------- 1 | model: 2 | bin_function: 3 | value: "entmoid15" 4 | type: "const" 5 | choice_function: 6 | value: "entmax15" 7 | type: "const" 8 | depth: 9 | min: 6 10 | max: 8 11 | type: "int" 12 | layer_dim: 13 | choices: [128, 256, 512] 14 | type: "categorical" 15 | num_layers: 16 | choices: [2, 4, 8] 17 | type: "categorical" 18 | d_embedding: 19 | value: 256 20 | type: "const" 21 | training: 22 | lr: 23 | value: 1.0e-3 24 | type: "const" 25 | weight_decay: 26 | value: 0.0 27 | type: "const" 28 | optimizer: 29 | value: "adam" 30 | type: "const" -------------------------------------------------------------------------------- /data/__init__.py: -------------------------------------------------------------------------------- 1 | from .env import * 
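# the star-import re-exports the dataset registries and helpers defined in data/env.py
# (DATA, CUSTOM_DATA, BENCHMARKS, DATASETS, CUSTOM_DATASETS, available_datasets, ...)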
-------------------------------------------------------------------------------- /data/custom_datasets/infos.json: -------------------------------------------------------------------------------- 1 | { 2 | "n_datasets": 0, 3 | "binclass": 0, 4 | "multiclass": 0, 5 | "regression": 0, 6 | "data_list": [] 7 | } -------------------------------------------------------------------------------- /data/env.py: -------------------------------------------------------------------------------- 1 | import os 2 | import json 3 | import shutil 4 | from pathlib import Path 5 | import numpy as np 6 | 7 | DATA = Path('data/datasets') 8 | CUSTOM_DATA = Path('data/custom_datasets') # user custom datasets 9 | 10 | BENCHMARKS = { 11 | 'ft-transformer': { 12 | 'path': 'data/benchmarks/ft-transformer', 13 | # The dataset list of the benchmark 14 | # if not appears in DATASETS or CUSTOM_DATASETS 15 | # it should be added to the 'path' 16 | 'datasets': [], # priority: DATASETS > CUSTOM_DATASETS > 'path' 17 | }, 18 | 't2g-former': { 19 | 'path': 'data/benchmarks/t2g-former', 20 | 'datasets': ['california', 'eye', 'helena', 'otto'] 21 | } 22 | } 23 | 24 | # available single datasets and specific DNN processing methods 25 | # default: `normalization: quantile` 26 | DATASETS = { 27 | 'adult': {'path': DATA / 'adult'}, 28 | 'california': {'path': DATA / 'california'}, 29 | 'churn': {'path': DATA / 'churn'}, 30 | 'eye': {'path': DATA / 'eye', 'normalization': 'standard'}, 31 | 'gesture': {'path': DATA / 'gesture'}, 32 | 'helena': {'path': DATA / 'helena', 'normalization': 'standard'}, 33 | 'higgs-small': {'path': DATA / 'higgs-small'}, 34 | 'house': {'path': DATA / 'house'}, 35 | 'jannis': {'path': DATA / 'jannis'}, 36 | 'otto': {'path': DATA / 'otto', 'normalization': None}, 37 | 'fb-comments': {'path': DATA / 'fb-comments'}, 38 | 'year': {'path': DATA / 'year'}, 39 | } 40 | 41 | CUSTOM_DATASETS = {} 42 | 43 | def read_custom_infos(): 44 | with open(CUSTOM_DATA / 'infos.json', 'r') as f: 45 | custom_infos = json.load(f) 46 | return custom_infos 47 | # read the `infos.json` to load 48 | def reload_custom_infos(): 49 | custom_infos = read_custom_infos() 50 | global CUSTOM_DATASETS 51 | CUSTOM_DATASETS = { 52 | info['name']: { 53 | 'path': CUSTOM_DATA / info['name'], 54 | 'task_type': info['task_type'], 55 | 'normalization': info.get('normalization', 'quantile') 56 | } for info in custom_infos['data_list'] 57 | } 58 | reload_custom_infos() 59 | 60 | def write_custom_infos(infos): 61 | with open(CUSTOM_DATA / 'infos.json', 'w') as f: 62 | json.dump(infos, f, indent=4) 63 | reload_custom_infos() 64 | 65 | def push_custom_datasets( 66 | X_num, X_cat, ys, idx, 67 | info # TODO: add normalization field to info 68 | ): 69 | data_dir = CUSTOM_DATA / info['name'] 70 | os.makedirs(data_dir) 71 | try: 72 | for spl in ['train', 'val', 'test']: 73 | np.save(data_dir / f'idx_{spl}.npy', idx[spl]) 74 | if X_num is not None: 75 | np.save(data_dir / f'X_num_{spl}.npy', X_num[spl]) 76 | if X_cat is not None: 77 | np.save(data_dir / f'X_cat_{spl}.npy', X_cat[spl]) 78 | np.save(data_dir / f'y_{spl}.npy', ys[spl]) 79 | with open(data_dir / 'info.json', 'w') as f: 80 | json.dump(info, f, indent=4) 81 | except: 82 | print('failed to add custom dataset: ', info['name']) 83 | shutil.rmtree(data_dir) 84 | return 85 | custom_infos = read_custom_infos() 86 | custom_infos['data_list'].append({'name': info['name'], 'task_type': info['task_type']}) 87 | custom_infos[info['task_type']] += 1 88 | custom_infos['n_datasets'] += 1 89 | 
write_custom_infos(custom_infos) 90 | print(f"push dataset: '{info['name']}' done") 91 | 92 | def available_datasets(): 93 | return sorted(list(DATASETS.keys()) + list(CUSTOM_DATASETS.keys())) -------------------------------------------------------------------------------- /data/processor.py: -------------------------------------------------------------------------------- 1 | 2 | from typing import List, Optional, Union, Literal 3 | from pathlib import Path 4 | import os 5 | import yaml 6 | import json 7 | import shutil 8 | import warnings 9 | 10 | import numpy as np 11 | import pandas as pd 12 | from collections import Counter 13 | from sklearn.preprocessing import OrdinalEncoder 14 | from sklearn.model_selection import train_test_split 15 | 16 | import torch.nn as nn 17 | from .env import ( 18 | BENCHMARKS, DATASETS, CUSTOM_DATASETS, 19 | push_custom_datasets, read_custom_infos, write_custom_infos 20 | ) 21 | from .utils import ( 22 | Normalization, NumNanPolicy, CatNanPolicy, CatEncoding, YPolicy, 23 | CAT_MISSING_VALUE, ArrayDict, TensorDict, TaskType, 24 | Dataset, Transformations, prepare_tensors, build_dataset, transform_dataset 25 | ) 26 | from models.abstract import TabModel 27 | 28 | DataFileType = Literal['csv', 'excel', 'npy', 'arff'] 29 | 30 | class DataProcessor: 31 | """Base class to process a single dataset""" 32 | def __init__( 33 | self, 34 | normalization: Optional[Normalization] = None, 35 | num_nan_policy: Optional[NumNanPolicy] = None, 36 | cat_nan_policy: Optional[CatNanPolicy] = None, 37 | cat_min_frequency: Optional[float] = None, 38 | cat_encoding: Optional[CatEncoding] = None, 39 | y_policy: Optional[YPolicy] = 'default', 40 | seed: int = 42, 41 | cache_dir: Optional[str] = None, 42 | ): 43 | self.transformation = Transformations( 44 | seed=seed, 45 | normalization=normalization, 46 | num_nan_policy=num_nan_policy, 47 | cat_nan_policy=cat_nan_policy, 48 | cat_min_frequency=cat_min_frequency, 49 | cat_encoding=cat_encoding, 50 | y_policy=y_policy 51 | ) 52 | self.cache_dir = cache_dir 53 | 54 | def apply(self, dataset: Dataset): 55 | return transform_dataset(dataset, self.transformation, self.cache_dir) 56 | 57 | def save(self, file, **kwargs): 58 | data_config = { 59 | 'transformation': vars(self.transformation), 60 | 'cache_dir': str(self.cache_dir), 61 | 'meta': kwargs, 62 | } 63 | with open(file, 'w') as f: 64 | yaml.dump(data_config, f, indent=2) 65 | 66 | @staticmethod 67 | def check_splits(dataset: Dataset): 68 | valid_splits = True 69 | if 'train' in dataset.y: 70 | if 'test' not in dataset.y: 71 | warnings.warn("Missing test split, unable to prediction") 72 | valid_splits = False 73 | if 'val' not in dataset.y: 74 | warnings.warn("Missing dev split, unable to early stop, or ignore this message if no early stop needed.") 75 | valid_splits = False 76 | if valid_splits: 77 | print("ready for training!") 78 | else: 79 | raise ValueError("Missing training split in the dataset") 80 | 81 | @staticmethod 82 | def prepare(dataset: Dataset, model: Optional[TabModel] = None, device: str = 'cuda'): 83 | assert model is not None or device is not None 84 | def get_spl(X: Optional[Union[ArrayDict, TensorDict]], spl): 85 | return None if X is None else X[spl] 86 | if device is not None or isinstance(model.model, nn.Module): 87 | device = device or model.model.device 88 | X_num, X_cat, ys = prepare_tensors(dataset, device) 89 | return {spl: ( 90 | get_spl(X_num, spl), 91 | get_spl(X_cat, spl), 92 | get_spl(ys, spl) 93 | ) for spl in ys} 94 | else: 95 | return {spl: ( 96 | 
get_spl(dataset.X_num, spl), 97 | get_spl(dataset.X_cat, spl), 98 | get_spl(dataset.y, spl) 99 | ) for spl in dataset.y} 100 | 101 | @staticmethod 102 | def load_preproc_default( 103 | output_dir, # output preprocessing infos 104 | model_name, 105 | dataset_name, 106 | benchmark_name: Optional[str] = None, 107 | seed: int = 42, 108 | cache_dir: Optional[str] = None 109 | ): 110 | global DATASETS, CUSTOM_DATASETS 111 | """default data preprocessing pipeline""" 112 | if dataset_name in DATASETS or dataset_name in CUSTOM_DATASETS: 113 | data_src = DATASETS if dataset_name in DATASETS else CUSTOM_DATASETS 114 | data_config = data_src[dataset_name] 115 | data_path = Path(data_config['path']) 116 | data_config.setdefault('normalization', 'quantile') 117 | normalization = data_config['normalization'] 118 | elif benchmark_name is not None: 119 | assert benchmark_name in BENCHMARKS, f"Benchmark '{benchmark_name}' is not included, \ 120 | please choose one of '{list(BENCHMARKS.keys())}', for include your benchmark manually." 121 | benchmark_info = BENCHMARKS[benchmark_name] 122 | assert dataset_name in benchmark_info['datasets'], f"dataset '{dataset_name}' not in benchmark '{benchmark_name}'" 123 | data_path = Path(benchmark_info['path']) / dataset_name 124 | normalization = 'quantile' 125 | else: 126 | raise ValueError(f"No dataset '{dataset_name}' is available, \ 127 | if you want to use a custom dataset (from csv file), using `add_custom_dataset`") 128 | 129 | dataset = Dataset.from_dir(data_path) 130 | # default preprocess settings 131 | num_nan_policy = 'mean' if dataset.X_num is not None and \ 132 | any(np.isnan(dataset.X_num[spl]).any() for spl in dataset.X_num) else None 133 | cat_nan_policy = None 134 | if model_name in ['xgboost', 'catboost', 'lightgbm']: # for tree models or other sklearn algorithms 135 | normalization = None 136 | cat_min_frequency = None 137 | cat_encoding = 'one-hot' 138 | if model_name in ['catboost']: 139 | cat_encoding = None 140 | else: # for dnns 141 | # BUG: (dataset.X_cat[spl] == CAT_MISSING_VALUE).any() has different action 142 | # dtype: int -> bool, dtype: string -> array[bool], dtype: object -> np.load error 143 | # CURRENT: uniformly using string type to store catgorical features 144 | if dataset.X_cat is not None and \ 145 | any((dataset.X_cat[spl] == CAT_MISSING_VALUE).any() for spl in dataset.X_cat): 146 | cat_nan_policy = 'most_frequent' 147 | cat_min_frequency = None 148 | cat_encoding = None 149 | cache_dir = cache_dir or data_path 150 | processor = DataProcessor( 151 | normalization=normalization, 152 | num_nan_policy=num_nan_policy, 153 | cat_nan_policy=cat_nan_policy, 154 | cat_min_frequency=cat_min_frequency, 155 | cat_encoding=cat_encoding, 156 | seed=seed, 157 | cache_dir=Path(cache_dir), 158 | ) 159 | dataset = processor.apply(dataset) 160 | # check train, val, test splits 161 | DataProcessor.check_splits(dataset) 162 | # save preprocessing infos 163 | if not os.path.exists(output_dir): 164 | os.makedirs(output_dir) 165 | processor.save( 166 | Path(output_dir) / 'data_config.yaml', 167 | benchmark=str(benchmark_name), 168 | dataset=dataset_name 169 | ) 170 | return dataset 171 | 172 | @staticmethod 173 | def split( 174 | X_num: Optional[np.ndarray] = None, 175 | X_cat: Optional[np.ndarray] = None, 176 | ys: np.ndarray = None, 177 | train_ratio: float = 0.8, 178 | stratify: bool = True, 179 | seed: int = 42, 180 | ): 181 | assert 0 < train_ratio < 1 182 | assert ys is not None 183 | sample_idx = np.arange(len(ys)) 184 | test_ratio = 1 - 
train_ratio 185 | _stratify = None if not stratify else ys 186 | train_idx, test_idx = train_test_split(sample_idx, test_size=test_ratio, random_state=seed, stratify=_stratify) 187 | _stratify = None if not stratify else ys[train_idx] 188 | train_idx, val_idx = train_test_split(train_idx, test_size=test_ratio, random_state=seed, stratify=_stratify) 189 | if X_num is not None: 190 | X_num = {'train': X_num[train_idx], 'val': X_num[val_idx], 'test': X_num[test_idx]} 191 | if X_cat is not None: 192 | X_cat = {'train': X_cat[train_idx], 'val': X_cat[val_idx], 'test': X_cat[test_idx]} 193 | ys = {'train': ys[train_idx], 'val': ys[val_idx], 'test': ys[test_idx]} 194 | idx = {'train': train_idx, 'val': val_idx, 'test': test_idx} 195 | return X_num, X_cat, ys, idx 196 | 197 | @staticmethod 198 | def del_custom_dataset( 199 | dataset_names: Union[str, List[str]] 200 | ): 201 | global DATASETS, CUSTOM_DATASETS 202 | all_infos = read_custom_infos() 203 | if isinstance(dataset_names, str): 204 | dataset_names = [dataset_names] 205 | for dataset_name in dataset_names: 206 | if dataset_name not in CUSTOM_DATASETS: 207 | print(f"custom dataset: {dataset_name} not exist") 208 | continue 209 | elif dataset_name in DATASETS: 210 | print(f"can not delete an in-built dataset: {dataset_name}") 211 | continue 212 | data_info = CUSTOM_DATASETS[dataset_name] 213 | task = data_info['task_type'] 214 | data_path = data_info['path'] 215 | data_idx = [info['name'] for info in all_infos['data_list']].index(dataset_name) 216 | all_infos['data_list'].pop(data_idx) 217 | all_infos['n_datasets'] -= 1 218 | all_infos[task] -= 1 219 | shutil.rmtree(data_path) 220 | print(f"delete dataset: {dataset_name} successfully") 221 | write_custom_infos(all_infos) 222 | from .env import CUSTOM_DATASETS # BUG: refresh the global variable 223 | 224 | @staticmethod 225 | def add_custom_dataset( 226 | file: Union[str, Path], 227 | format: DataFileType = 'csv', 228 | dataset_name: Optional[str] = None, 229 | task: Optional[str] = None, 230 | num_cols: Optional[List[int]] = None, 231 | cat_cols: Optional[List[int]] = None, 232 | label_index: int = -1, # label column index 233 | header: Optional[int] = 0, # header row 234 | max_cat_num: int = 16, 235 | train_ratio: float = 0.8, # split train / test, train / val 236 | seed: float = 42, # random split seed 237 | ): 238 | """ 239 | Support for adding a custom dataset from a single data file 240 | --- 241 | read a raw csv file, process into 3 splits (train, val, test), and add to custom_datasets 242 | 243 | TODO: adding a dataset from prepared data split files 244 | TODO: support no validation split 245 | """ 246 | global DATASETS, CUSTOM_DATASETS 247 | file_name = Path(file).name 248 | assert file_name.endswith(format), f'please check if the file \ 249 | is in {format} format, or add the suffix manually' 250 | dataset_name = dataset_name or file_name[:-len(format)-1] 251 | assert dataset_name not in DATASETS, f'same dataset name as an in-built dataset: {dataset_name}' 252 | assert dataset_name not in CUSTOM_DATASETS, f"existing custom dataset '{dataset_name}' found" 253 | 254 | if format == 'csv': 255 | datas: pd.DataFrame = pd.read_csv(file, header=header) 256 | columns = datas.columns if header is not None else None 257 | elif format == 'npy': 258 | header = None # numpy file has no headers 259 | columns = None 260 | datas = np.load(file) 261 | raise NotImplementedError("only support load csv file now") 262 | else: 263 | raise ValueError("other support format to be add further") 264 | 265 | 
X_idx = list(range(datas.shape[1])) 266 | y_idx = X_idx.pop(label_index) 267 | label_name = columns[y_idx] if columns is not None else None 268 | # numerical and categorical feature detection 269 | if num_cols is None or cat_cols is None: 270 | print('automatically detect column type...') 271 | print('max category amount: ', max_cat_num) 272 | num_cols, cat_cols = [], [] 273 | num_names, cat_names = [], [] 274 | for i in X_idx: 275 | if datas.iloc[:, i].values.dtype == float: 276 | num_cols.append(i) 277 | if columns is not None: 278 | num_names.append(columns[i]) 279 | else: # int or object (str) 280 | if len(set(datas.iloc[:, i].values)) <= max_cat_num: 281 | cat_cols.append(i) 282 | if columns is not None: 283 | cat_names.append(columns[i]) 284 | elif datas.iloc[:, i].values.dtype == int: 285 | num_cols.append(i) 286 | if columns is not None: 287 | num_names.append(columns[i]) 288 | if not num_names and not cat_names: 289 | num_names, cat_names = None, None 290 | elif columns: 291 | num_names = [columns[i] for i in num_cols] 292 | cat_names = [columns[i] for i in cat_cols] 293 | else: 294 | num_names, cat_names = None, None 295 | n_num_features = len(num_cols) 296 | n_cat_features = len(cat_cols) 297 | # build X_num and X_cat 298 | X_num, ys = None, datas.iloc[:, y_idx].values 299 | if len(num_cols) > 0: 300 | X_num = datas.iloc[:, num_cols].values.astype(np.float32) 301 | # check data type 302 | X_cat = [] 303 | for i in cat_cols: 304 | if datas.iloc[:, i].values.dtype == int: 305 | x = datas.iloc[:, i].values.astype(np.int64) 306 | # ordered by value 307 | # x = OrdinalEncoder(categories=[sorted(list(set(x)))]).fit_transform(x.reshape(-1, 1)) 308 | else: # string object 309 | x = datas.iloc[:, i].values.astype(object) 310 | # most_common = [item[0] for item in Counter(x).most_common()] 311 | # ordered by frequency 312 | # x = OrdinalEncoder(categories=[most_common]).fit_transform(x.reshape(-1, 1)) 313 | X_cat.append(x.astype(np.str0)) # Encoder Later, compatible with Line 140 314 | X_cat = np.stack(X_cat, axis=1) if len(X_cat) > 0 else None # if using OrdinalEncoder, np.concatenate 315 | # detect task type 316 | def process_non_regression_labels(ys: np.ndarray, task): 317 | if ys.dtype in [int, float]: 318 | ys = OrdinalEncoder(categories=[sorted(list(set(ys)))]).fit_transform(ys.reshape(-1, 1)) 319 | else: 320 | most_common = [item[0] for item in Counter(ys).most_common()] 321 | ys = OrdinalEncoder(categories=most_common).fit_transform(ys.reshape(-1, 1)) 322 | ys = ys[:, 0] 323 | return ys.astype(np.float32) if task == 'binclass' else ys.astype(np.int64) 324 | 325 | if task is None: 326 | if ys.dtype in [int, object]: 327 | task = 'binclass' if len(set(ys)) == 2 else 'multiclass' 328 | ys = process_non_regression_labels(ys, task) 329 | elif ys.dtype == float: 330 | if len(set(ys)) == 2: 331 | task = 'binclass' 332 | ys = process_non_regression_labels(ys, task) 333 | else: 334 | task = 'regression' 335 | ys = ys.astype(np.float32) 336 | else: 337 | if task == 'regression': 338 | ys = ys.astype(np.float32) 339 | else: 340 | ys = process_non_regression_labels(ys, task) 341 | 342 | # split datasets 343 | stratify = task != 'regression' 344 | X_num, X_cat, ys, idx = DataProcessor.split(X_num, X_cat, ys, train_ratio, stratify, seed) 345 | # push to CUSTOM_DATASETS 346 | data_info = { 347 | 'name': dataset_name, 348 | 'id': f'{dataset_name.lower()}--custom', 349 | 'task_type': task, 350 | 'label_name': label_name, 351 | 'n_num_features': n_num_features, 352 | 'num_feature_names': num_names, 
353 | 'n_cat_features': n_cat_features, 354 | 'cat_feature_names': cat_names, 355 | 'test_size': len(ys['test']), 356 | 'train_size': len(ys['train']), 357 | 'val_size': len(ys['val'])} 358 | push_custom_datasets(X_num, X_cat, ys, idx, data_info) 359 | from .env import CUSTOM_DATASETS # refresh global variable 360 | print(f'finish, now you can load your dataset with `load_preproc_default({dataset_name})`') 361 | 362 | class BenchmarkProcessor: 363 | """Prepare datasets in the Literatures""" 364 | def __init__(self) -> None: 365 | pass -------------------------------------------------------------------------------- /data/utils.py: -------------------------------------------------------------------------------- 1 | """ 2 | References: 3 | - https://github.com/yandex-research/tabular-dl-num-embeddings/blob/main/lib/data.py 4 | - https://github.com/yandex-research/tabular-dl-num-embeddings/blob/main/lib/util.py 5 | """ 6 | import json 7 | import enum 8 | import pickle 9 | import hashlib 10 | from collections import Counter 11 | from copy import deepcopy 12 | from dataclasses import astuple, dataclass, replace 13 | from pathlib import Path 14 | from typing import Any, Optional, Union, cast, Dict, List, Tuple 15 | try: 16 | from typing import Literal 17 | except ImportError: 18 | from typing_extensions import Literal 19 | 20 | import numpy as np 21 | import pandas as pd 22 | import sklearn.preprocessing 23 | import torch 24 | from category_encoders import LeaveOneOutEncoder 25 | from sklearn.impute import SimpleImputer 26 | from sklearn.preprocessing import StandardScaler 27 | 28 | 29 | ArrayDict = Dict[str, np.ndarray] 30 | TensorDict = Dict[str, torch.Tensor] 31 | 32 | 33 | CAT_MISSING_VALUE = '__nan__' 34 | CAT_RARE_VALUE = '__rare__' 35 | Normalization = Literal['standard', 'quantile'] 36 | NumNanPolicy = Literal['drop-rows', 'mean'] 37 | CatNanPolicy = Literal['most_frequent'] 38 | CatEncoding = Literal['one-hot', 'counter'] 39 | YPolicy = Literal['default'] 40 | 41 | class TaskType(enum.Enum): 42 | BINCLASS = 'binclass' 43 | MULTICLASS = 'multiclass' 44 | REGRESSION = 'regression' 45 | 46 | def __str__(self) -> str: 47 | return self.value 48 | 49 | 50 | def raise_unknown(unknown_what: str, unknown_value: Any): 51 | raise ValueError(f'Unknown {unknown_what}: {unknown_value}') 52 | 53 | def load_json(path: Union[Path, str], **kwargs) -> Any: 54 | return json.loads(Path(path).read_text(), **kwargs) 55 | 56 | def dump_json(x: Any, path: Union[Path, str], **kwargs) -> None: 57 | kwargs.setdefault('indent', 4) 58 | Path(path).write_text(json.dumps(x, **kwargs) + '\n') 59 | 60 | def load_pickle(path: Union[Path, str], **kwargs) -> Any: 61 | return pickle.loads(Path(path).read_bytes(), **kwargs) 62 | 63 | def dump_pickle(x: Any, path: Union[Path, str], **kwargs) -> None: 64 | Path(path).write_bytes(pickle.dumps(x, **kwargs)) 65 | 66 | 67 | class StandardScaler1d(StandardScaler): 68 | def partial_fit(self, X, *args, **kwargs): 69 | assert X.ndim == 1 70 | return super().partial_fit(X[:, None], *args, **kwargs) 71 | 72 | def transform(self, X, *args, **kwargs): 73 | assert X.ndim == 1 74 | return super().transform(X[:, None], *args, **kwargs).squeeze(1) 75 | 76 | def inverse_transform(self, X, *args, **kwargs): 77 | assert X.ndim == 1 78 | return super().inverse_transform(X[:, None], *args, **kwargs).squeeze(1) 79 | 80 | 81 | def get_category_sizes(X: Union[torch.Tensor, np.ndarray]) -> List[int]: 82 | XT = X.T.cpu().tolist() if isinstance(X, torch.Tensor) else X.T.tolist() 83 | return [len(set(x)) 
for x in XT] 84 | 85 | 86 | @dataclass(frozen=True) 87 | class Dataset: 88 | X_num: Optional[ArrayDict] 89 | X_cat: Optional[ArrayDict] 90 | y: ArrayDict 91 | y_info: Dict[str, Any] 92 | task_type: TaskType 93 | n_classes: Optional[int] 94 | name: Optional[str] 95 | 96 | @classmethod 97 | def from_dir(cls, dir_: Union[Path, str]) -> 'Dataset': 98 | dir_ = Path(dir_) 99 | 100 | def load(item) -> ArrayDict: 101 | def _load(file: Path): 102 | return cast(np.ndarray, np.load(file)) if file.exists() else None 103 | return { 104 | x: _load(dir_ / f'{item}_{x}.npy') 105 | for x in ['train', 'val', 'test'] 106 | } 107 | 108 | info = load_json(dir_ / 'info.json') 109 | 110 | return Dataset( 111 | load('X_num') if dir_.joinpath('X_num_train.npy').exists() else None, 112 | load('X_cat') if dir_.joinpath('X_cat_train.npy').exists() else None, 113 | load('y'), 114 | {}, 115 | TaskType(info['task_type']), 116 | info.get('n_classes'), 117 | info.get('name'), 118 | ) 119 | 120 | @property 121 | def is_binclass(self) -> bool: 122 | return self.task_type == TaskType.BINCLASS 123 | 124 | @property 125 | def is_multiclass(self) -> bool: 126 | return self.task_type == TaskType.MULTICLASS 127 | 128 | @property 129 | def is_regression(self) -> bool: 130 | return self.task_type == TaskType.REGRESSION 131 | 132 | @property 133 | def n_num_features(self) -> int: 134 | return 0 if self.X_num is None else self.X_num['train'].shape[1] 135 | 136 | @property 137 | def n_cat_features(self) -> int: 138 | return 0 if self.X_cat is None else self.X_cat['train'].shape[1] 139 | 140 | @property 141 | def n_features(self) -> int: 142 | return self.n_num_features + self.n_cat_features 143 | 144 | def size(self, part: Optional[str]) -> int: 145 | return sum(map(len, self.y.values())) if part is None else len(self.y[part]) 146 | 147 | @property 148 | def nn_output_dim(self) -> int: 149 | if self.is_multiclass: 150 | assert self.n_classes is not None 151 | return self.n_classes 152 | else: 153 | return 1 154 | 155 | def get_category_sizes(self, part: str) -> List[int]: 156 | return [] if self.X_cat is None else get_category_sizes(self.X_cat[part]) 157 | 158 | 159 | def num_process_nans(dataset: Dataset, policy: Optional[NumNanPolicy]) -> Dataset: 160 | assert dataset.X_num is not None 161 | nan_masks = {k: np.isnan(v) for k, v in dataset.X_num.items()} 162 | if not any(x.any() for x in nan_masks.values()): # type: ignore[code] 163 | assert policy is None 164 | return dataset 165 | 166 | assert policy is not None 167 | if policy == 'drop-rows': 168 | valid_masks = {k: ~v.any(1) for k, v in nan_masks.items()} 169 | assert valid_masks[ 170 | 'test' 171 | ].all(), 'Cannot drop test rows, since this will affect the final metrics.' 
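# drop the flagged NaN rows from X_num, X_cat and y using the same per-split masks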
172 | new_data = {} 173 | for data_name in ['X_num', 'X_cat', 'y']: 174 | data_dict = getattr(dataset, data_name) 175 | if data_dict is not None: 176 | new_data[data_name] = { 177 | k: v[valid_masks[k]] for k, v in data_dict.items() 178 | } 179 | dataset = replace(dataset, **new_data) 180 | elif policy == 'mean': 181 | new_values = np.nanmean(dataset.X_num['train'], axis=0) 182 | X_num = deepcopy(dataset.X_num) 183 | for k, v in X_num.items(): 184 | num_nan_indices = np.where(nan_masks[k]) 185 | v[num_nan_indices] = np.take(new_values, num_nan_indices[1]) 186 | dataset = replace(dataset, X_num=X_num) 187 | else: 188 | assert raise_unknown('policy', policy) 189 | return dataset 190 | 191 | 192 | # Inspired by: https://github.com/Yura52/rtdl/blob/a4c93a32b334ef55d2a0559a4407c8306ffeeaee/lib/data.py#L20 193 | def normalize( 194 | X: ArrayDict, normalization: Normalization, seed: Optional[int] 195 | ) -> ArrayDict: 196 | X_train = X['train'] 197 | if normalization == 'standard': 198 | normalizer = sklearn.preprocessing.StandardScaler() 199 | elif normalization == 'quantile': 200 | normalizer = sklearn.preprocessing.QuantileTransformer( 201 | output_distribution='normal', 202 | n_quantiles=max(min(X['train'].shape[0] // 30, 1000), 10), 203 | subsample=1e9, 204 | random_state=seed, 205 | ) 206 | noise = 1e-3 207 | if noise > 0: 208 | assert seed is not None 209 | stds = np.std(X_train, axis=0, keepdims=True) 210 | noise_std = noise / np.maximum(stds, noise) # type: ignore[code] 211 | X_train = X_train + noise_std * np.random.default_rng(seed).standard_normal( 212 | X_train.shape 213 | ) 214 | else: 215 | raise_unknown('normalization', normalization) 216 | normalizer.fit(X_train) 217 | return {k: normalizer.transform(v) for k, v in X.items()} # type: ignore[code] 218 | 219 | 220 | def cat_process_nans(X: ArrayDict, policy: Optional[CatNanPolicy]) -> ArrayDict: 221 | assert X is not None 222 | nan_masks = {k: v == CAT_MISSING_VALUE for k, v in X.items()} 223 | if any(x.any() for x in nan_masks.values()): # type: ignore[code] 224 | if policy is None: 225 | X_new = X 226 | elif policy == 'most_frequent': 227 | imputer = SimpleImputer(missing_values=CAT_MISSING_VALUE, strategy=policy) # type: ignore[code] 228 | imputer.fit(X['train']) 229 | X_new = {k: cast(np.ndarray, imputer.transform(v)) for k, v in X.items()} 230 | else: 231 | raise_unknown('categorical NaN policy', policy) 232 | else: 233 | assert policy is None 234 | X_new = X 235 | return X_new 236 | 237 | 238 | def cat_drop_rare(X: ArrayDict, min_frequency: float) -> ArrayDict: 239 | assert 0.0 < min_frequency < 1.0 240 | min_count = round(len(X['train']) * min_frequency) 241 | X_new = {x: [] for x in X} 242 | for column_idx in range(X['train'].shape[1]): 243 | counter = Counter(X['train'][:, column_idx].tolist()) 244 | popular_categories = {k for k, v in counter.items() if v >= min_count} 245 | for part in X_new: 246 | X_new[part].append( 247 | [ 248 | (x if x in popular_categories else CAT_RARE_VALUE) 249 | for x in X[part][:, column_idx].tolist() 250 | ] 251 | ) 252 | return {k: np.array(v).T for k, v in X_new.items()} 253 | 254 | 255 | def cat_encode( 256 | X: ArrayDict, 257 | encoding: Optional[CatEncoding], 258 | y_train: Optional[np.ndarray], 259 | seed: Optional[int], 260 | ) -> Tuple[ArrayDict, bool]: # (X, is_converted_to_numerical) 261 | if encoding != 'counter': 262 | y_train = None 263 | 264 | # Step 1. 
Map strings to 0-based ranges 265 | unknown_value = np.iinfo('int64').max - 3 266 | encoder = sklearn.preprocessing.OrdinalEncoder( 267 | handle_unknown='use_encoded_value', # type: ignore[code] 268 | unknown_value=unknown_value, # type: ignore[code] 269 | dtype='int64', # type: ignore[code] 270 | ).fit(X['train']) 271 | X = {k: encoder.transform(v) for k, v in X.items()} 272 | max_values = X['train'].max(axis=0) 273 | for part in ['val', 'test']: 274 | for column_idx in range(X[part].shape[1]): 275 | X[part][X[part][:, column_idx] == unknown_value, column_idx] = ( 276 | max_values[column_idx] + 1 277 | ) 278 | 279 | # Step 2. Encode. 280 | if encoding is None: 281 | return (X, False) 282 | elif encoding == 'one-hot': 283 | encoder = sklearn.preprocessing.OneHotEncoder( 284 | handle_unknown='ignore', sparse=False, dtype=np.float32 # type: ignore[code] 285 | ) 286 | encoder.fit(X['train']) 287 | return ({k: encoder.transform(v) for k, v in X.items()}, True) # type: ignore[code] 288 | elif encoding == 'counter': 289 | assert y_train is not None 290 | assert seed is not None 291 | encoder = LeaveOneOutEncoder(sigma=0.1, random_state=seed, return_df=False) 292 | encoder.fit(X['train'], y_train) 293 | X = {k: encoder.transform(v).astype('float32') for k, v in X.items()} # type: ignore[code] 294 | if not isinstance(X['train'], pd.DataFrame): 295 | X = {k: v.values for k, v in X.items()} # type: ignore[code] 296 | return (X, True) # type: ignore[code] 297 | else: 298 | raise_unknown('encoding', encoding) 299 | 300 | 301 | def build_target( 302 | y: ArrayDict, policy: Optional[YPolicy], task_type: TaskType 303 | ) -> Tuple[ArrayDict, Dict[str, Any]]: 304 | info: Dict[str, Any] = {'policy': policy} 305 | if policy is None: 306 | pass 307 | elif policy == 'default': 308 | if task_type == TaskType.REGRESSION: 309 | mean, std = float(y['train'].mean()), float(y['train'].std()) 310 | y = {k: (v - mean) / std for k, v in y.items()} 311 | info['mean'] = mean 312 | info['std'] = std 313 | else: 314 | raise_unknown('policy', policy) 315 | return y, info 316 | 317 | 318 | @dataclass(frozen=True) 319 | class Transformations: 320 | seed: int = 0 321 | normalization: Optional[Normalization] = None 322 | num_nan_policy: Optional[NumNanPolicy] = None 323 | cat_nan_policy: Optional[CatNanPolicy] = None 324 | cat_min_frequency: Optional[float] = None 325 | cat_encoding: Optional[CatEncoding] = None 326 | y_policy: Optional[YPolicy] = 'default' 327 | 328 | 329 | def transform_dataset( 330 | dataset: Dataset, 331 | transformations: Transformations, 332 | cache_dir: Optional[Path], 333 | ) -> Dataset: 334 | # WARNING: the order of transformations matters. Moreover, the current 335 | # implementation is not ideal in that sense. 
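# when a cache_dir is given, try to reuse a previously transformed dataset,
# keyed by a hash of the Transformations settings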
336 | if cache_dir is not None: 337 | transformations_md5 = hashlib.md5( 338 | str(transformations).encode('utf-8') 339 | ).hexdigest() 340 | transformations_str = '__'.join(map(str, astuple(transformations))) 341 | cache_path = ( 342 | cache_dir / f'cache__{transformations_str}__{transformations_md5}.pickle' 343 | ) 344 | if cache_path.exists(): 345 | cache_transformations, value = load_pickle(cache_path) 346 | if transformations == cache_transformations: 347 | print( 348 | f"Using cached features: {cache_dir.name + '/' + cache_path.name}" 349 | ) 350 | return value 351 | else: 352 | raise RuntimeError(f'Hash collision for {cache_path}') 353 | else: 354 | cache_path = None 355 | 356 | if dataset.X_num is not None: 357 | dataset = num_process_nans(dataset, transformations.num_nan_policy) 358 | 359 | X_num = dataset.X_num 360 | if dataset.X_cat is None: 361 | replace(transformations, cat_nan_policy=None, cat_min_frequency=None, cat_encoding=None) 362 | # assert transformations.cat_nan_policy is None 363 | # assert transformations.cat_min_frequency is None 364 | # assert transformations.cat_encoding is None 365 | X_cat = None 366 | else: 367 | X_cat = cat_process_nans(dataset.X_cat, transformations.cat_nan_policy) 368 | if transformations.cat_min_frequency is not None: 369 | X_cat = cat_drop_rare(X_cat, transformations.cat_min_frequency) 370 | X_cat, is_num = cat_encode( 371 | X_cat, 372 | transformations.cat_encoding, 373 | dataset.y['train'], 374 | transformations.seed, 375 | ) 376 | if is_num: 377 | X_num = ( 378 | X_cat 379 | if X_num is None 380 | else {x: np.hstack([X_num[x], X_cat[x]]) for x in X_num} 381 | ) 382 | X_cat = None 383 | 384 | if X_num is not None and transformations.normalization is not None: 385 | X_num = normalize(X_num, transformations.normalization, transformations.seed) 386 | 387 | y, y_info = build_target(dataset.y, transformations.y_policy, dataset.task_type) 388 | 389 | dataset = replace(dataset, X_num=X_num, X_cat=X_cat, y=y, y_info=y_info) 390 | if cache_path is not None: 391 | dump_pickle((transformations, dataset), cache_path) 392 | return dataset 393 | 394 | 395 | def build_dataset( 396 | path: Union[str, Path], transformations: Transformations, cache: bool 397 | ) -> Dataset: 398 | path = Path(path) 399 | dataset = Dataset.from_dir(path) 400 | return transform_dataset(dataset, transformations, path if cache else None) 401 | 402 | 403 | def prepare_tensors( 404 | dataset: Dataset, device: Union[str, torch.device] 405 | ) -> Tuple[Optional[TensorDict], Optional[TensorDict], TensorDict]: 406 | if isinstance(device, str): 407 | device = torch.device(device) 408 | X_num, X_cat, Y = ( 409 | None if x is None else {k: torch.as_tensor(v) for k, v in x.items()} 410 | for x in [dataset.X_num, dataset.X_cat, dataset.y] 411 | ) 412 | if device.type != 'cpu': 413 | X_num, X_cat, Y = ( 414 | None if x is None else {k: v.to(device) for k, v in x.items()} 415 | for x in [X_num, X_cat, Y] 416 | ) 417 | assert X_num is not None 418 | assert Y is not None 419 | if not dataset.is_multiclass: 420 | Y = {k: v.float() for k, v in Y.items()} 421 | return X_num, X_cat, Y 422 | -------------------------------------------------------------------------------- /examples/add_custom_dataset.py: -------------------------------------------------------------------------------- 1 | # add your custom datasets from csv files 2 | import os 3 | import sys 4 | sys.path.append(os.getcwd()) 5 | from data import available_datasets 6 | from data.processor import DataProcessor 7 | 8 | if __name__ == 
'__main__': 9 | # my_csv_file = 'examples/[kaggle]Assay of serum free light chain.csv' # binclass 10 | # print('available datasets: ', available_datasets()) 11 | # DataProcessor.add_custom_dataset(my_csv_file) # add a dataset 12 | # print('available datasets: ', available_datasets()) 13 | # dataset = DataProcessor.load_preproc_default('result/test', 'ft-transformer', '[kaggle]Assay of serum free light chai') 14 | # DataProcessor.del_custom_dataset("[kaggle]Assay of serum free light chain") # remove a dataset 15 | print('available datasets: ', available_datasets()) 16 | my_csv_file = 'examples/[openml]bodyfat.csv' # regression 17 | DataProcessor.add_custom_dataset(my_csv_file) # add 18 | print('available datasets: ', available_datasets()) 19 | dataset = DataProcessor.load_preproc_default('result/test', 'ft-transformer', '[openml]bodyfat') # load 20 | pass -------------------------------------------------------------------------------- /examples/finetune_baseline.py: -------------------------------------------------------------------------------- 1 | # finetune a baseline with given configs 2 | import os 3 | import sys 4 | sys.path.append(os.getcwd()) 5 | 6 | import torch 7 | from data import available_datasets 8 | from data.processor import DataProcessor 9 | from utils.model import seed_everything, get_model_cards, make_baseline, load_config_from_file 10 | 11 | 12 | if __name__ == '__main__': 13 | seed_everything(42) 14 | device = torch.device('cuda') 15 | # model infos 16 | print('model cards: ', get_model_cards()) 17 | base_model = 'mlp' 18 | # dataset infos 19 | print('available datasets: ', available_datasets()) 20 | dataset_name = 'adult' 21 | # config files 22 | default_config_file = f'configs/default/{base_model}.yaml' # path to your config file 23 | output_dir = f"results/{base_model}/{dataset_name}" # path to save results 24 | # load configs 25 | configs = load_config_from_file(default_config_file) # or you can direcly pass config file to `make_baseline` 26 | # some necessary configs 27 | configs['training']['max_epochs'] = 100 # training args: max training epochs 28 | configs['training']['batch_size'] = 128 # training args: batch_size 29 | configs['meta'] = {'save_path': output_dir} # meta args: result dir 30 | 31 | # load dataset (processing upon model type) 32 | dataset = DataProcessor.load_preproc_default(output_dir, base_model, dataset_name, seed=0) 33 | # build model 34 | n_num_features = dataset.n_num_features 35 | categories = dataset.get_category_sizes('train') 36 | if len(categories) == 0: 37 | categories = None 38 | n_labels = dataset.n_classes or 1 # regression n_classes is None 39 | y_std = dataset.y_info.get('std') # for regression 40 | 41 | model = make_baseline( 42 | base_model, configs['model'], 43 | n_num=n_num_features, 44 | cat_card=categories, 45 | n_labels=n_labels, 46 | device=device 47 | ) 48 | # convert to tensor 49 | datas = DataProcessor.prepare(dataset, model) 50 | 51 | # training (automatically load best model at the end) 52 | model.fit( 53 | X_num=datas['train'][0], X_cat=datas['train'][1], ys=datas['train'][2], y_std=y_std, 54 | eval_set=(datas['val'],), # similar as sk-learn 55 | patience=8, # for early stop, <= 0 no early stop 56 | task=dataset.task_type.value, 57 | training_args=configs['training'], # training args 58 | meta_args=configs['meta'], # meta args: other infos, e.g. 
result dir, experiment name / id 59 | ) 60 | 61 | # prediction (best metric checkpoint) 62 | # model.load_best_dnn(output_dir, file='best') # or you can load manually 63 | predictions, results = model.predict( 64 | X_num=datas['test'][0], X_cat=datas['test'][1], ys=datas['test'][2], y_std=y_std, 65 | task=dataset.task_type.value, 66 | return_probs=True, return_metric=True, return_loss=True, 67 | ) 68 | model.save_prediction(output_dir, results) # save results 69 | print("=== Prediction (best metric) ===") 70 | print(results) 71 | 72 | # prediction (best logloss checkpoint) 73 | if dataset.task_type.value != 'regression': 74 | model.load_best_dnn(output_dir, file='best-logloss') 75 | predictions, results = model.predict( 76 | X_num=datas['test'][0], X_cat=datas['test'][1], ys=datas['test'][2], y_std=y_std, 77 | task=dataset.task_type.value, 78 | return_probs=True, return_metric=True, return_loss=True, 79 | ) 80 | model.save_prediction(output_dir, results, file='prediction_logloss') 81 | print("=== Prediction (best logloss) ===") 82 | print(results) -------------------------------------------------------------------------------- /examples/tune_baseline.py: -------------------------------------------------------------------------------- 1 | # tune then finetune in one function 2 | import os 3 | import sys 4 | sys.path.append(os.getcwd()) 5 | 6 | import torch 7 | from data.processor import DataProcessor 8 | from utils.model import load_config_from_file, seed_everything, get_model_cards, tune, make_baseline 9 | 10 | if __name__ == '__main__': 11 | seed_everything(42) 12 | device = torch.device('cuda') 13 | print('available model infos: ', get_model_cards()) 14 | base_model = 'mlp' 15 | dataset_name = 'adult' 16 | # model, training args 17 | search_space_file = f'configs/{base_model}.yaml' # refer to sample search space config file and build yours 18 | output_dir = f"results-tuned/{base_model}/{dataset_name}" # output dir for tuned configs & checkpoints 19 | # load dataset 20 | dataset = DataProcessor.load_preproc_default(output_dir, base_model, dataset_name, seed=0) 21 | 22 | # tune (will load the checkpoint of the best config to predict at the end) 23 | model = tune( 24 | model_name=base_model, 25 | search_config=search_space_file, 26 | dataset=dataset, 27 | batch_size=128, 28 | patience=3, # a small patience for fast tune 29 | n_iterations=5, # tune interations 30 | device=device, 31 | output_dir=output_dir) 32 | print('done') 33 | 34 | # if you want to use the best tuned config 35 | # but a different training args (e.g. 
patience, batch size) 36 | # you should manually load the best config and finetune 37 | best_config_file = f'{output_dir}/tuned/configs.yaml' 38 | best_configs = load_config_from_file(best_config_file) 39 | # data args 40 | n_num_features = dataset.n_num_features 41 | categories = dataset.get_category_sizes('train') 42 | if len(categories) == 0: 43 | categories = None 44 | n_labels = dataset.n_classes or 1 # regression n_classes is None 45 | y_std = dataset.y_info.get('std') # for regression 46 | # build model from the given config 47 | model = make_baseline( 48 | model_name=base_model, 49 | # you can directly pass config file, but this can not modify training args explicitly 50 | # model_config=best_config_file, 51 | model_config=best_configs['model'], 52 | n_num=n_num_features, 53 | cat_card=categories, 54 | n_labels=n_labels, 55 | device=device 56 | ) 57 | # here you can modify the training args (if read config file above) 58 | best_configs['training']['batch_size'] = 256 59 | best_configs['training']['lr'] = 5e-5 60 | output_dir2 = 'final_output_dir' 61 | best_configs['meta']['save_path'] = output_dir2 # save new results with tuned configs 62 | # prepare tensor data 63 | datas = DataProcessor.prepare(dataset, model) 64 | # finetune 65 | model.fit( 66 | X_num=datas['train'][0], X_cat=datas['train'][1], ys=datas['train'][2], y_std=y_std, 67 | eval_set=(datas['val'],), 68 | patience=8, # can use a differnet patience 69 | task=dataset.task_type.value, 70 | training_args=best_configs['training'], # training args 71 | meta_args=best_configs['meta'], # meta args: other infos, e.g. result dir, experiment name / id 72 | ) 73 | # prediction 74 | predictions, results = model.predict( 75 | X_num=datas['test'][0], X_cat=datas['test'][1], ys=datas['test'][2], y_std=y_std, 76 | task=dataset.task_type.value, 77 | return_probs=True, return_metric=True, return_loss=True, 78 | ) 79 | model.save_prediction(output_dir2, results) # save results 80 | print("=== Prediction (best metric) ===") 81 | print(results) -------------------------------------------------------------------------------- /image/auto_skdl-logo.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pytabular-ai/auto-scikit-dl/39477cd89832b9da99b9eaa898274439a08b13fb/image/auto_skdl-logo.png -------------------------------------------------------------------------------- /models/__init__.py: -------------------------------------------------------------------------------- 1 | from .mlp import MLP 2 | from .ft_transformer import FTTransformer 3 | from .autoint import AutoInt 4 | from .dcnv2 import DCNv2 5 | from .node_model import NODE 6 | -------------------------------------------------------------------------------- /models/abstract.py: -------------------------------------------------------------------------------- 1 | # abstract class for all tabular models 2 | from abc import ABC, abstractmethod 3 | from typing import Optional, Tuple, Union, Dict, Any, Callable 4 | from pathlib import Path 5 | import warnings 6 | warnings.filterwarnings("ignore") 7 | import os 8 | import json 9 | import yaml 10 | import time 11 | import numpy as np 12 | 13 | import torch 14 | import torch.nn as nn 15 | import torch.nn.functional as F 16 | import torch.optim as optim 17 | from torch.utils.data import TensorDataset, DataLoader 18 | 19 | from utils.metrics import calculate_metrics 20 | from sklearn.metrics import log_loss, mean_squared_error 21 | 22 | DNN_FIT_API = Callable[ 23 | [nn.Module, 
torch.Tensor, torch.Tensor, torch.Tensor], 24 | Tuple[torch.Tensor, float] 25 | ] # input X, y and return logits, used time 26 | DNN_PREDICT_API = Callable[ 27 | [nn.Module, torch.Tensor, torch.Tensor], 28 | Tuple[torch.Tensor, float] 29 | ] # input X and return logits, used time 30 | 31 | def default_dnn_fit(model, x_num, x_cat, y): 32 | """ 33 | Training Process 34 | """ 35 | start_time = time.time() 36 | logits = model(x_num, x_cat) 37 | used_time = time.time() - start_time # omit backward time, add in outer loop 38 | return logits, used_time 39 | def default_dnn_predict(model, x_num, x_cat): 40 | """ 41 | Inference Process 42 | `no_grad` will be applied in `dnn_predict' 43 | """ 44 | start_time = time.time() 45 | logits = model(x_num, x_cat) 46 | used_time = time.time() - start_time 47 | return logits, used_time 48 | def check_dir(dir): 49 | if not os.path.exists(dir): 50 | os.makedirs(dir) 51 | 52 | def make_optimizer( 53 | optimizer: str, 54 | parameter_groups, 55 | lr: float, 56 | weight_decay: float, 57 | ) -> optim.Optimizer: 58 | Optimizer = { 59 | 'adam': optim.Adam, 60 | 'adamw': optim.AdamW, 61 | 'sgd': optim.SGD, 62 | }[optimizer] 63 | momentum = (0.9,) if Optimizer is optim.SGD else () 64 | return Optimizer(parameter_groups, lr, *momentum, weight_decay=weight_decay) 65 | 66 | def make_lr_schedule( 67 | optimizer: optim.Optimizer, 68 | lr: float, 69 | epoch_size: int, 70 | lr_schedule: Optional[Dict[str, Any]], 71 | ) -> Tuple[ 72 | Optional[optim.lr_scheduler._LRScheduler], 73 | Dict[str, Any], 74 | Optional[int], 75 | ]: 76 | if lr_schedule is None: 77 | lr_schedule = {'type': 'constant'} 78 | lr_scheduler = None 79 | n_warmup_steps = None 80 | if lr_schedule['type'] in ['transformer', 'linear_warmup']: 81 | n_warmup_steps = ( 82 | lr_schedule['n_warmup_steps'] 83 | if 'n_warmup_steps' in lr_schedule 84 | else lr_schedule['n_warmup_epochs'] * epoch_size 85 | ) 86 | elif lr_schedule['type'] == 'cyclic': 87 | lr_scheduler = optim.lr_scheduler.CyclicLR( 88 | optimizer, 89 | base_lr=lr, 90 | max_lr=lr_schedule['max_lr'], 91 | step_size_up=lr_schedule['n_epochs_up'] * epoch_size, 92 | step_size_down=lr_schedule['n_epochs_down'] * epoch_size, 93 | mode=lr_schedule['mode'], 94 | gamma=lr_schedule.get('gamma', 1.0), 95 | cycle_momentum=False, 96 | ) 97 | return lr_scheduler, lr_schedule, n_warmup_steps 98 | 99 | class TabModel(ABC): 100 | def __init__(self): 101 | self.model: Optional[nn.Module] = None # true model 102 | self.base_name = None # model type name 103 | self.device = None 104 | self.saved_model_config = None 105 | self.training_config = None 106 | self.meta_config = None 107 | self.post_init() 108 | 109 | def post_init(self): 110 | self.history = { 111 | 'train': {'loss': [], 'tot_time': 0, 'avg_step_time': 0, 'avg_epoch_time': 0}, 112 | 'val': { 113 | 'metric_name': None, 'metric': [], 'best_metric': None, 114 | 'log_loss': [], 'best_log_loss': None, 115 | 'best_epoch': None, 'best_step': None, 116 | 'tot_time': 0, 'avg_step_time': 0, 'avg_epoch_time': 0 117 | }, 118 | # 'test': {'loss': [], 'metric': [], 'final_metric': None}, 119 | 'device': torch.cuda.get_device_name(), 120 | } # save metrics 121 | self.no_improvement = 0 # for dnn early stop 122 | 123 | def preproc_config(self, model_config: dict): 124 | """default preprocessing for model configurations""" 125 | self.saved_model_config = model_config 126 | return model_config 127 | 128 | @abstractmethod 129 | def fit( 130 | self, 131 | X_num: Union[torch.Tensor, np.ndarray], 132 | X_cat: Union[torch.Tensor, 
np.ndarray], 133 | ys: Union[torch.Tensor, np.ndarray], 134 | y_std: Optional[float], 135 | eval_set: Optional[Tuple[Union[torch.Tensor, np.ndarray]]], 136 | patience: int, 137 | task: str, 138 | training_args: dict, 139 | meta_args: Optional[dict], 140 | ): 141 | """ 142 | Training Model with Early Stop(optional) 143 | load best weights at the end 144 | """ 145 | pass 146 | 147 | def dnn_fit( 148 | self, 149 | *, 150 | dnn_fit_func: Optional[DNN_FIT_API] = None, 151 | # API for specical sampler like curriculum learning 152 | train_loader: Optional[Tuple[DataLoader, int]] = None, # (loader, missing_idx) 153 | # using normal dataloader sampler if is None 154 | X_num: Optional[torch.Tensor] = None, 155 | X_cat: Optional[torch.Tensor] = None, 156 | ys: Optional[torch.Tensor] = None, 157 | y_std: Optional[float] = None, # for RMSE 158 | eval_set: Tuple[torch.Tensor, np.ndarray] = None, # similar API as sk-learn 159 | patience: int = 0, # <= 0 without early stop 160 | task: str, 161 | training_args: dict, 162 | meta_args: Optional[dict] = None, 163 | ): 164 | # DONE: move to abstract class (dnn_fit) 165 | if dnn_fit_func is None: 166 | dnn_fit_func = default_dnn_fit 167 | # meta args 168 | if meta_args is None: 169 | meta_args = {} 170 | meta_args.setdefault('save_path', f'results/{self.base_name}') 171 | if not os.path.exists(meta_args['save_path']): 172 | print('create new results dir: ', meta_args['save_path']) 173 | os.makedirs(meta_args['save_path']) 174 | self.meta_config = meta_args 175 | # optimzier and scheduler 176 | training_args.setdefault('optimizer', 'adamw') 177 | optimizer, scheduler = TabModel.make_optimizer(self.model, training_args) 178 | # data loader 179 | training_args.setdefault('batch_size', 64) 180 | training_args.setdefault('ghost_batch_size', None) 181 | if train_loader is not None: 182 | train_loader, missing_idx = train_loader 183 | training_args['batch_size'] = train_loader.batch_size 184 | else: 185 | train_loader, missing_idx = TabModel.prepare_tensor_loader( 186 | X_num=X_num, X_cat=X_cat, ys=ys, 187 | batch_size=training_args['batch_size'], 188 | shuffle=True, 189 | ) 190 | if eval_set is not None: 191 | eval_set = eval_set[0] # only use the first dev set 192 | dev_loader = TabModel.prepare_tensor_loader( 193 | X_num=eval_set[0], X_cat=eval_set[1], ys=eval_set[2], 194 | batch_size=training_args['batch_size'], 195 | ) 196 | else: 197 | dev_loader = None 198 | # training loops 199 | training_args.setdefault('max_epochs', 1000) 200 | # training_args.setdefault('report_frequency', 100) # same as save_freq 201 | # training_args.setdefault('save_frequency', 100) # save per 100 steps 202 | training_args.setdefault('patience', patience) 203 | training_args.setdefault('save_frequency', 'epoch') # save per epoch 204 | self.training_config = training_args 205 | 206 | steps_per_backward = 1 if training_args['ghost_batch_size'] is None \ 207 | else training_args['batch_size'] // training_args['ghost_batch_size'] 208 | steps_per_epoch = len(train_loader) 209 | tot_step, tot_time = 0, 0 210 | for e in range(training_args['max_epochs']): 211 | self.model.train() 212 | tot_loss = 0 213 | for step, batch in enumerate(train_loader): 214 | optimizer.zero_grad() 215 | x_num, x_cat, y = TabModel.parse_batch(batch, missing_idx, self.device) 216 | logits, forward_time = dnn_fit_func(self.model, x_num, x_cat, y) 217 | loss = TabModel.compute_loss(logits, y, task) 218 | # backward 219 | start_time = time.time() 220 | loss.backward() 221 | backward_time = time.time() - start_time 222 | 
self.gradient_policy() 223 | tot_time += forward_time + backward_time 224 | optimizer.step() 225 | if scheduler is not None: 226 | scheduler.step() 227 | # print or save infos 228 | tot_step += 1 229 | tot_loss += loss.cpu().item() 230 | if isinstance(training_args['save_frequency'], int) \ 231 | and tot_step % training_args['save_frequency'] == 0: 232 | is_early_stop = self.save_evaluate_dnn( 233 | tot_step, steps_per_epoch, 234 | tot_loss, tot_time, 235 | task, training_args['patience'], meta_args['save_path'], 236 | dev_loader, y_std, 237 | ) 238 | if is_early_stop: 239 | self.save(meta_args['save_path']) 240 | self.load_best_dnn(meta_args['save_path']) 241 | return 242 | if training_args['save_frequency'] == 'epoch': 243 | if hasattr(self.model, 'layer_masks'): 244 | print('layer_mask: ', self.model.layer_masks > 0) 245 | is_early_stop = self.save_evaluate_dnn( 246 | tot_step, steps_per_epoch, 247 | tot_loss, tot_time, 248 | task, training_args['patience'], meta_args['save_path'], 249 | dev_loader, y_std, 250 | ) 251 | if is_early_stop: 252 | self.save(meta_args['save_path']) 253 | self.load_best_dnn(meta_args['save_path']) 254 | return 255 | self.save(meta_args['save_path']) 256 | self.load_best_dnn(meta_args['save_path']) 257 | 258 | @abstractmethod 259 | def predict( 260 | self, 261 | dev_loader: Optional[DataLoader], 262 | X_num: Union[torch.Tensor, np.ndarray], 263 | X_cat: Union[torch.Tensor, np.ndarray], 264 | ys: Union[torch.Tensor, np.ndarray], 265 | y_std: Optional[float], 266 | task: str, 267 | return_probs: bool = True, 268 | return_metric: bool = True, 269 | return_loss: bool = True, 270 | meta_args: Optional[dict] = None, 271 | ): 272 | """ 273 | Prediction 274 | """ 275 | pass 276 | 277 | def dnn_predict( 278 | self, 279 | *, 280 | dnn_predict_func: Optional[DNN_PREDICT_API] = None, 281 | dev_loader: Optional[Tuple[DataLoader, int]] = None, # reuse, (loader, missing_idx) 282 | X_num: Optional[torch.Tensor] = None, 283 | X_cat: Optional[torch.Tensor] = None, 284 | ys: Optional[torch.Tensor] = None, 285 | y_std: Optional[float] = None, # for RMSE 286 | task: str, 287 | return_probs: bool = True, 288 | return_metric: bool = False, 289 | return_loss: bool = False, 290 | meta_args: Optional[dict] = None, 291 | ): 292 | # DONE: move to abstract class (dnn_predict) 293 | if dnn_predict_func is None: 294 | dnn_predict_func = default_dnn_predict 295 | if dev_loader is None: 296 | dev_loader, missing_idx = TabModel.prepare_tensor_loader( 297 | X_num=X_num, X_cat=X_cat, ys=ys, 298 | batch_size=128, 299 | ) 300 | else: 301 | dev_loader, missing_idx = dev_loader 302 | # print("Evaluate...") 303 | predictions, golds = [], [] 304 | tot_time = 0 305 | self.model.eval() 306 | for batch in dev_loader: 307 | x_num, x_cat, y = TabModel.parse_batch(batch, missing_idx, self.device) 308 | with torch.no_grad(): 309 | logits, used_time = dnn_predict_func(self.model, x_num, x_cat) 310 | tot_time += used_time 311 | predictions.append(logits) 312 | golds.append(y) 313 | self.model.train() 314 | predictions = torch.cat(predictions).squeeze(-1) 315 | golds = torch.cat(golds) 316 | if return_loss: 317 | loss = TabModel.compute_loss(predictions, golds, task).cpu().item() 318 | else: 319 | loss = None 320 | if return_probs and task != 'regression': 321 | predictions = ( 322 | predictions.sigmoid() 323 | if task == 'binclass' 324 | else predictions.softmax(-1) 325 | ) 326 | prediction_type = 'probs' 327 | elif task == 'regression': 328 | prediction_type = None 329 | else: 330 | prediction_type = 'logits' 
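# note: the block below moves predictions and labels to numpy, then (when `return_metric` is set)
# computes the task metric via `calculate_metric` (rmse / roc_auc / accuracy for regression /
# binclass / multiclass) plus a sklearn `log_loss` for classification tasks, and finally packs
# everything into `results = {'loss', 'metric', 'time', 'log_loss'}`.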
331 | predictions = predictions.cpu().numpy() 332 | golds = golds.cpu().numpy() 333 | if return_metric: 334 | metric = TabModel.calculate_metric( 335 | golds, predictions, 336 | task, prediction_type, y_std 337 | ) 338 | logloss = ( 339 | log_loss(golds, np.stack([1-predictions, predictions], axis=1), labels=[0,1]) 340 | if task == 'binclass' 341 | else log_loss(golds, predictions, labels=list(range(len(set(golds))))) 342 | if task == 'multiclass' 343 | else None 344 | ) 345 | else: 346 | metric, logloss = None, None 347 | results = {'loss': loss, 'metric': metric, 'time': tot_time, 'log_loss': logloss} 348 | if meta_args is not None: 349 | self.save_prediction(meta_args['save_path'], results) 350 | return predictions, results 351 | 352 | def gradient_policy(self): 353 | """For post porcess model gradient""" 354 | pass 355 | 356 | @abstractmethod 357 | def save(self, output_dir): 358 | """ 359 | Save model weights and configs, 360 | the following default save functions 361 | can be combined to override this function 362 | """ 363 | pass 364 | 365 | def save_pt_model(self, output_dir): 366 | print('saving pt model weights...') 367 | # save model params 368 | torch.save(self.model.state_dict(), Path(output_dir) / 'final.bin') 369 | 370 | def save_tree_model(self, output_dir): 371 | print('saving tree model...') 372 | pass 373 | 374 | def save_history(self, output_dir): 375 | # save metrics 376 | with open(Path(output_dir) / 'results.json', 'w') as f: 377 | json.dump(self.history, f, indent=4) 378 | 379 | def save_prediction(self, output_dir, results, file='prediction'): 380 | check_dir(output_dir) 381 | # save test results 382 | print("saving prediction results") 383 | saved_results = { 384 | 'loss': results['loss'], 385 | 'metric_name': results['metric'][1], 386 | 'metric': results['metric'][0], 387 | 'time': results['time'], 388 | 'log_loss': results['log_loss'], 389 | } 390 | with open(Path(output_dir) / f'{file}.json', 'w') as f: 391 | json.dump(saved_results, f, indent=4) 392 | 393 | def save_config(self, output_dir): 394 | def serialize(config: dict): 395 | for key in config: 396 | # serialized object to store yaml or json files 397 | if any(isinstance(config[key], obj) for obj in [Path, ]): 398 | config[key] = str(config[key]) 399 | return config 400 | # save all configs 401 | with open(Path(output_dir) / 'configs.yaml', 'w') as f: 402 | configs = { 403 | 'model': self.saved_model_config, 404 | 'training': self.training_config, 405 | 'meta': serialize(self.meta_config) 406 | } 407 | yaml.dump(configs, f, indent=2) 408 | 409 | @staticmethod 410 | def make_optimizer( 411 | model: nn.Module, 412 | training_args: dict, 413 | ) -> Tuple[optim.Optimizer, optim.lr_scheduler._LRScheduler]: 414 | training_args.setdefault('optimizer', 'adamw') 415 | training_args.setdefault('no_wd_group', None) 416 | training_args.setdefault('scheduler', None) 417 | # optimizer 418 | if training_args['no_wd_group'] is not None: 419 | assert isinstance(training_args['no_wd_group'], list) 420 | def needs_wd(name): 421 | return all(x not in name for x in training_args['no_wd_group']) 422 | parameters_with_wd = [v for k, v in model.named_parameters() if needs_wd(k)] 423 | parameters_without_wd = [v for k, v in model.named_parameters() if not needs_wd(k)] 424 | model_params = [ 425 | {'params': parameters_with_wd}, 426 | {'params': parameters_without_wd, 'weight_decay': 0.0}, 427 | ] 428 | else: 429 | model_params = model.parameters() 430 | optimizer = make_optimizer( 431 | training_args['optimizer'], 432 | 
model_params, 433 | training_args['lr'], 434 | training_args['weight_decay'], 435 | ) 436 | # scheduler 437 | if training_args['scheduler'] is not None: 438 | scheduler = None 439 | else: 440 | scheduler = None 441 | 442 | return optimizer, scheduler 443 | 444 | @staticmethod 445 | def prepare_tensor_loader( 446 | X_num: Optional[torch.Tensor], 447 | X_cat: Optional[torch.Tensor], 448 | ys: torch.Tensor, 449 | batch_size: int = 64, 450 | shuffle: bool = False, 451 | ): 452 | assert not all(x is None for x in [X_num, X_cat]) 453 | missing_placeholder = 0 if X_num is None else 1 if X_cat is None else -1 454 | datas = [x for x in [X_num, X_cat, ys] if x is not None] 455 | tensor_dataset = TensorDataset(*datas) 456 | tensor_loader = DataLoader( 457 | tensor_dataset, 458 | batch_size=batch_size, 459 | shuffle=shuffle, 460 | ) 461 | return tensor_loader, missing_placeholder 462 | 463 | @staticmethod 464 | def parse_batch(batch: Tuple[torch.Tensor], missing_idx, device: torch.device): 465 | if batch[0].device.type != device.type: 466 | # if batch[0].device != device: # initialize self.device with model.device rather than torch.device() 467 | # batch = (x.to(device) for x in batch) # generator 468 | batch = tuple([x.to(device) for x in batch]) # list 469 | if missing_idx == -1: 470 | return batch 471 | else: 472 | return batch[:missing_idx] + [None,] + batch[missing_idx:] 473 | 474 | @staticmethod 475 | def compute_loss(logits: torch.Tensor, targets: torch.Tensor, task: str, reduction: str = 'mean'): 476 | loss_fn = { 477 | 'binclass': F.binary_cross_entropy_with_logits, 478 | 'multiclass': F.cross_entropy, 479 | 'regression': F.mse_loss, 480 | }[task] 481 | return loss_fn(logits.squeeze(-1), targets, reduction=reduction) 482 | 483 | @staticmethod 484 | def calculate_metric( 485 | golds, 486 | predictions, 487 | task: str, 488 | prediction_type: Optional[str] = None, 489 | y_std: Optional[float] = None, 490 | ): 491 | """Calculate metrics""" 492 | metric = { 493 | 'regression': 'rmse', 494 | 'binclass': 'roc_auc', 495 | 'multiclass': 'accuracy' 496 | }[task] 497 | 498 | return calculate_metrics( 499 | golds, predictions, 500 | task, prediction_type, y_std 501 | )[metric], metric 502 | 503 | def better_result(self, dev_metric, task, is_loss=False): 504 | if is_loss: # logloss 505 | best_dev_metric = self.history['val']['best_log_loss'] 506 | if best_dev_metric is None or best_dev_metric > dev_metric: 507 | self.history['val']['best_log_loss'] = dev_metric 508 | return True 509 | else: 510 | return False 511 | best_dev_metric = self.history['val']['best_metric'] 512 | if best_dev_metric is None: 513 | self.history['val']['best_metric'] = dev_metric 514 | return True 515 | elif task == 'regression': # rmse 516 | if best_dev_metric > dev_metric: 517 | self.history['val']['best_metric'] = dev_metric 518 | return True 519 | else: 520 | return False 521 | else: 522 | if best_dev_metric < dev_metric: 523 | self.history['val']['best_metric'] = dev_metric 524 | return True 525 | else: 526 | return False 527 | 528 | def early_stop_handler(self, epoch, tot_step, dev_metric, task, patience, save_path): 529 | if task != 'regression' and self.better_result(dev_metric['log_loss'], task, is_loss=True): 530 | # record best logloss 531 | torch.save(self.model.state_dict(), Path(save_path) / 'best-logloss.bin') 532 | if self.better_result(dev_metric['metric'], task): 533 | print('<<< Best Dev Result', end='') 534 | torch.save(self.model.state_dict(), Path(save_path) / 'best.bin') 535 | self.no_improvement = 0 536 | 
self.history['val']['best_epoch'] = epoch 537 | self.history['val']['best_step'] = tot_step 538 | else: 539 | self.no_improvement += 1 540 | print(f'| [no improvement] {self.no_improvement}', end='') 541 | if patience <= 0: 542 | return False 543 | else: 544 | return self.no_improvement >= patience 545 | 546 | def save_evaluate_dnn( 547 | self, 548 | # print and saved infos 549 | tot_step, steps_per_epoch, 550 | tot_loss, tot_time, 551 | # evaluate infos 552 | task, patience, save_path, 553 | dev_loader, y_std 554 | ): 555 | """For DNN models""" 556 | epoch, step = tot_step // steps_per_epoch, (tot_step - 1) % steps_per_epoch + 1 557 | avg_loss = tot_loss / step 558 | self.history['train']['loss'].append(avg_loss) 559 | self.history['train']['tot_time'] = tot_time 560 | self.history['train']['avg_step_time'] = tot_time / tot_step 561 | self.history['train']['avg_epoch_time'] = self.history['train']['avg_step_time'] * steps_per_epoch 562 | print(f"[epoch] {epoch} | [step] {step} | [tot_step] {tot_step} | [used time] {tot_time:.4g} | [train_loss] {avg_loss:.4g} ", end='') 563 | if dev_loader is not None: 564 | _, results = self.predict(dev_loader=dev_loader, y_std=y_std, task=task, return_metric=True) 565 | dev_metric, metric_name = results['metric'] 566 | print(f"| [{metric_name}] {dev_metric:.4g} ", end='') 567 | if task != 'regression': 568 | print(f"| [log-loss] {results['log_loss']:.4g} ", end='') 569 | self.history['val']['log_loss'].append(results['log_loss']) 570 | self.history['val']['metric_name'] = metric_name 571 | self.history['val']['metric'].append(dev_metric) 572 | self.history['val']['tot_time'] += results['time'] 573 | self.history['val']['avg_step_time'] = self.history['val']['tot_time'] / tot_step 574 | self.history['val']['avg_epoch_time'] = self.history['val']['avg_step_time'] * steps_per_epoch 575 | dev_metric = {'metric': dev_metric, 'log_loss': results['log_loss']} 576 | if self.early_stop_handler(epoch, tot_step, dev_metric, task, patience, save_path): 577 | print(' <<< Early Stop') 578 | return True 579 | print() 580 | return False 581 | 582 | def load_best_dnn(self, save_path, file='best'): 583 | model_file = Path(save_path) / f"{file}.bin" 584 | if not os.path.exists(model_file): 585 | print(f'There is no {file} checkpoint, loading the last one...') 586 | model_file = Path(save_path) / 'final.bin' 587 | else: 588 | print(f'Loading {file} model...') 589 | self.model.load_state_dict(torch.load(model_file)) 590 | print('successfully') 591 | -------------------------------------------------------------------------------- /models/autoint.py: -------------------------------------------------------------------------------- 1 | # Implementation of "AutoInt: Automatic Feature Interaction Learning via Self-Attentive Neural Networks" 2 | # Some differences from a more "conventional" transformer: 3 | # - no FFN module, but one linear layer before adding the result of attention 4 | # - no bias for numerical embeddings 5 | # - no CLS token, the final embedding is formed by concatenation of all the tokens 6 | # - n_heads = 2 is recommended in the paper 7 | # - d_token is supposed to be small 8 | # - the placement of normalizations and activations is different 9 | 10 | # %% 11 | import math 12 | import time 13 | import typing as ty 14 | from pathlib import Path 15 | 16 | import numpy as np 17 | import torch 18 | import torch.nn as nn 19 | import torch.nn.functional as F 20 | import torch.nn.init as nn_init 21 | from torch.utils.data import DataLoader 22 | from torch import 
Tensor 23 | 24 | from utils.deep import get_activation_fn 25 | from models.abstract import TabModel, check_dir 26 | 27 | # %% 28 | class Tokenizer(nn.Module): 29 | category_offsets: ty.Optional[Tensor] 30 | 31 | def __init__( 32 | self, 33 | d_numerical: int, 34 | categories: ty.Optional[ty.List[int]], 35 | n_latent_tokens: int, 36 | d_token: int, 37 | ) -> None: 38 | super().__init__() 39 | assert n_latent_tokens == 0 40 | self.n_latent_tokens = n_latent_tokens 41 | if d_numerical: 42 | self.weight = nn.Parameter(Tensor(d_numerical + n_latent_tokens, d_token)) 43 | # The initialization is inspired by nn.Linear 44 | nn_init.kaiming_uniform_(self.weight, a=math.sqrt(5)) 45 | else: 46 | self.weight = None 47 | assert categories is not None 48 | if categories is None: 49 | self.category_offsets = None 50 | self.category_embeddings = None 51 | else: 52 | category_offsets = torch.tensor([0] + categories[:-1]).cumsum(0) 53 | self.register_buffer('category_offsets', category_offsets) 54 | self.category_embeddings = nn.Embedding(sum(categories), d_token) 55 | nn_init.kaiming_uniform_(self.category_embeddings.weight, a=math.sqrt(5)) 56 | print(f'{self.category_embeddings.weight.shape}') 57 | 58 | @property 59 | def n_tokens(self) -> int: 60 | return (0 if self.weight is None else len(self.weight)) + ( 61 | 0 if self.category_offsets is None else len(self.category_offsets) 62 | ) 63 | 64 | def forward(self, x_num: ty.Optional[Tensor], x_cat: ty.Optional[Tensor]) -> Tensor: 65 | if x_num is None: 66 | return self.category_embeddings(x_cat + self.category_offsets[None]) # type: ignore[code] 67 | x_num = torch.cat( 68 | [ 69 | torch.ones(len(x_num), self.n_latent_tokens, device=x_num.device), 70 | x_num, 71 | ], 72 | dim=1, 73 | ) 74 | x = self.weight[None] * x_num[:, :, None] # type: ignore[code] 75 | if x_cat is not None: 76 | x = torch.cat( 77 | [x, self.category_embeddings(x_cat + self.category_offsets[None])], # type: ignore[code] 78 | dim=1, 79 | ) 80 | return x 81 | 82 | 83 | class MultiheadAttention(nn.Module): 84 | def __init__( 85 | self, d: int, n_heads: int, dropout: float, initialization: str 86 | ) -> None: 87 | if n_heads > 1: 88 | assert d % n_heads == 0 89 | assert initialization in ['xavier', 'kaiming'] 90 | 91 | super().__init__() 92 | self.W_q = nn.Linear(d, d) 93 | self.W_k = nn.Linear(d, d) 94 | self.W_v = nn.Linear(d, d) 95 | self.W_out = None 96 | self.n_heads = n_heads 97 | self.dropout = nn.Dropout(dropout) if dropout else None 98 | 99 | for m in [self.W_q, self.W_k, self.W_v]: 100 | if initialization == 'xavier' and (n_heads > 1 or m is not self.W_v): 101 | # gain is needed since W_qkv is represented with 3 separate layers 102 | nn_init.xavier_uniform_(m.weight, gain=1 / math.sqrt(2)) 103 | nn_init.zeros_(m.bias) 104 | if self.W_out is not None: 105 | nn_init.zeros_(self.W_out.bias) 106 | 107 | def _reshape(self, x: Tensor) -> Tensor: 108 | batch_size, n_tokens, d = x.shape 109 | d_head = d // self.n_heads 110 | return ( 111 | x.reshape(batch_size, n_tokens, self.n_heads, d_head) 112 | .transpose(1, 2) 113 | .reshape(batch_size * self.n_heads, n_tokens, d_head) 114 | ) 115 | 116 | def forward( 117 | self, 118 | x_q: Tensor, 119 | x_kv: Tensor, 120 | key_compression: ty.Optional[nn.Linear], 121 | value_compression: ty.Optional[nn.Linear], 122 | ) -> Tensor: 123 | q, k, v = self.W_q(x_q), self.W_k(x_kv), self.W_v(x_kv) 124 | for tensor in [q, k, v]: 125 | assert tensor.shape[-1] % self.n_heads == 0 126 | if key_compression is not None: 127 | assert value_compression is not None 
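# compress keys/values along the token dimension with the learned linear projections
# (n_tokens -> roughly n_tokens * kv_compression), shrinking the attention matrix accordingly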
128 | k = key_compression(k.transpose(1, 2)).transpose(1, 2) 129 | v = value_compression(v.transpose(1, 2)).transpose(1, 2) 130 | else: 131 | assert value_compression is None 132 | 133 | batch_size = len(q) 134 | d_head_key = k.shape[-1] // self.n_heads 135 | d_head_value = v.shape[-1] // self.n_heads 136 | n_q_tokens = q.shape[1] 137 | 138 | q = self._reshape(q) 139 | k = self._reshape(k) 140 | attention = F.softmax(q @ k.transpose(1, 2) / math.sqrt(d_head_key), dim=-1) 141 | if self.dropout is not None: 142 | attention = self.dropout(attention) 143 | x = attention @ self._reshape(v) 144 | x = ( 145 | x.reshape(batch_size, self.n_heads, n_q_tokens, d_head_value) 146 | .transpose(1, 2) 147 | .reshape(batch_size, n_q_tokens, self.n_heads * d_head_value) 148 | ) 149 | if self.W_out is not None: 150 | x = self.W_out(x) 151 | return x 152 | 153 | 154 | class _AutoInt(nn.Module): 155 | def __init__( 156 | self, 157 | *, 158 | d_numerical: int, 159 | categories: ty.Optional[ty.List[int]], 160 | n_layers: int, 161 | d_token: int, 162 | n_heads: int, 163 | attention_dropout: float, 164 | residual_dropout: float, 165 | activation: str, 166 | prenormalization: bool = False, 167 | initialization: str = 'kaiming', 168 | kv_compression: ty.Optional[float] = None, 169 | kv_compression_sharing: ty.Optional[str] = None, 170 | d_out: int, 171 | ) -> None: 172 | assert not prenormalization 173 | assert activation == 'relu' 174 | assert (kv_compression is None) ^ (kv_compression_sharing is not None) 175 | 176 | super().__init__() 177 | self.tokenizer = Tokenizer(d_numerical, categories, 0, d_token) 178 | n_tokens = self.tokenizer.n_tokens 179 | 180 | def make_kv_compression(): 181 | assert kv_compression 182 | compression = nn.Linear( 183 | n_tokens, int(n_tokens * kv_compression), bias=False 184 | ) 185 | if initialization == 'xavier': 186 | nn_init.xavier_uniform_(compression.weight) 187 | return compression 188 | 189 | self.shared_kv_compression = ( 190 | make_kv_compression() 191 | if kv_compression and kv_compression_sharing == 'layerwise' 192 | else None 193 | ) 194 | 195 | def make_normalization(): 196 | return nn.LayerNorm(d_token) 197 | 198 | self.layers = nn.ModuleList([]) 199 | for layer_idx in range(n_layers): 200 | layer = nn.ModuleDict( 201 | { 202 | 'attention': MultiheadAttention( 203 | d_token, n_heads, attention_dropout, initialization 204 | ), 205 | 'linear': nn.Linear(d_token, d_token, bias=False), 206 | } 207 | ) 208 | if not prenormalization or layer_idx: 209 | layer['norm0'] = make_normalization() 210 | if kv_compression and self.shared_kv_compression is None: 211 | layer['key_compression'] = make_kv_compression() 212 | if kv_compression_sharing == 'headwise': 213 | layer['value_compression'] = make_kv_compression() 214 | else: 215 | assert kv_compression_sharing == 'key-value' 216 | self.layers.append(layer) 217 | 218 | self.activation = get_activation_fn(activation) 219 | self.prenormalization = prenormalization 220 | self.last_normalization = make_normalization() if prenormalization else None 221 | self.residual_dropout = residual_dropout 222 | self.head = nn.Linear(d_token * n_tokens, d_out) 223 | 224 | def _get_kv_compressions(self, layer): 225 | return ( 226 | (self.shared_kv_compression, self.shared_kv_compression) 227 | if self.shared_kv_compression is not None 228 | else (layer['key_compression'], layer['value_compression']) 229 | if 'key_compression' in layer and 'value_compression' in layer 230 | else (layer['key_compression'], layer['key_compression']) 231 | if 
'key_compression' in layer 232 | else (None, None) 233 | ) 234 | 235 | def _start_residual(self, x, layer, norm_idx): 236 | x_residual = x 237 | if self.prenormalization: 238 | norm_key = f'norm{norm_idx}' 239 | if norm_key in layer: 240 | x_residual = layer[norm_key](x_residual) 241 | return x_residual 242 | 243 | def _end_residual(self, x, x_residual, layer, norm_idx): 244 | if self.residual_dropout: 245 | x_residual = F.dropout(x_residual, self.residual_dropout, self.training) 246 | x = x + x_residual 247 | if not self.prenormalization: 248 | x = layer[f'norm{norm_idx}'](x) 249 | return x 250 | 251 | def forward(self, x_num: ty.Optional[Tensor], x_cat: ty.Optional[Tensor]) -> Tensor: 252 | x = self.tokenizer(x_num, x_cat) 253 | 254 | for layer in self.layers: 255 | layer = ty.cast(ty.Dict[str, nn.Module], layer) 256 | 257 | x_residual = self._start_residual(x, layer, 0) 258 | x_residual = layer['attention']( 259 | x_residual, 260 | x_residual, 261 | *self._get_kv_compressions(layer), 262 | ) 263 | x = layer['linear'](x) 264 | x = self._end_residual(x, x_residual, layer, 0) 265 | x = self.activation(x) 266 | 267 | x = x.flatten(1, 2) 268 | x = self.head(x) 269 | x = x.squeeze(-1) 270 | return x 271 | 272 | 273 | # %% 274 | class AutoInt(TabModel): 275 | def __init__( 276 | self, 277 | model_config: dict, 278 | n_num_features: int, 279 | categories: ty.Optional[ty.List[int]], 280 | n_labels: int, 281 | device: ty.Union[str, torch.device] = 'cuda', 282 | ): 283 | super().__init__() 284 | model_config = self.preproc_config(model_config) 285 | self.model = _AutoInt( 286 | d_numerical=n_num_features, 287 | categories=categories, 288 | d_out=n_labels, 289 | **model_config 290 | ).to(device) 291 | self.base_name = 'autoint' 292 | self.device = torch.device(device) 293 | 294 | def preproc_config(self, model_config: dict): 295 | # process autoint configs 296 | self.saved_model_config = model_config.copy() 297 | return model_config 298 | 299 | def fit( 300 | self, 301 | # API for specical sampler like curriculum learning 302 | train_loader: ty.Optional[ty.Tuple[DataLoader, int]] = None, # (loader, missing_idx) 303 | # using normal sampler if is None 304 | X_num: ty.Optional[torch.Tensor] = None, 305 | X_cat: ty.Optional[torch.Tensor] = None, 306 | ys: ty.Optional[torch.Tensor] = None, 307 | y_std: ty.Optional[float] = None, # for RMSE 308 | eval_set: ty.Tuple[torch.Tensor, np.ndarray] = None, 309 | patience: int = 0, 310 | task: str = None, 311 | training_args: dict = None, 312 | meta_args: ty.Optional[dict] = None, 313 | ): 314 | def train_step(model, x_num, x_cat, y): # input is X and y 315 | # process input (model-specific) 316 | # define your model API 317 | start_time = time.time() 318 | # define your model API 319 | logits = model(x_num, x_cat) 320 | used_time = time.time() - start_time 321 | return logits, used_time 322 | 323 | # to custom other training paradigm 324 | # 1. add self.dnn_fit2(...) in abstract class for special training process 325 | # 2. 
(recommended) override self.dnn_fit in abstract class 326 | self.dnn_fit( # uniform training paradigm 327 | dnn_fit_func=train_step, 328 | # training data 329 | train_loader=train_loader, 330 | X_num=X_num, X_cat=X_cat, ys=ys, y_std=y_std, 331 | # dev data 332 | eval_set=eval_set, patience=patience, task=task, 333 | # args 334 | training_args=training_args, 335 | meta_args=meta_args, 336 | ) 337 | 338 | def predict( 339 | self, 340 | dev_loader: ty.Optional[ty.Tuple[DataLoader, int]] = None, # reuse, (loader, missing_idx) 341 | X_num: ty.Optional[torch.Tensor] = None, 342 | X_cat: ty.Optional[torch.Tensor] = None, 343 | ys: ty.Optional[torch.Tensor] = None, 344 | y_std: ty.Optional[float] = None, # for RMSE 345 | task: str = None, 346 | return_probs: bool = True, 347 | return_metric: bool = False, 348 | return_loss: bool = False, 349 | meta_args: ty.Optional[dict] = None, 350 | ): 351 | def inference_step(model, x_num, x_cat): # input only X (y inaccessible) 352 | """ 353 | Inference Process 354 | `no_grad` will be applied in `dnn_predict' 355 | """ 356 | # process input (model-specific) 357 | # define your model API 358 | start_time = time.time() 359 | # define your model API 360 | logits = model(x_num, x_cat) 361 | used_time = time.time() - start_time 362 | return logits, used_time 363 | 364 | # to custom other inference paradigm 365 | # 1. add self.dnn_predict2(...) in abstract class for special training process 366 | # 2. (recommended) override self.dnn_predict in abstract class 367 | return self.dnn_predict( # uniform training paradigm 368 | dnn_predict_func=inference_step, 369 | dev_loader=dev_loader, 370 | X_num=X_num, X_cat=X_cat, ys=ys, y_std=y_std, task=task, 371 | return_probs=return_probs, return_metric=return_metric, return_loss=return_loss, 372 | meta_args=meta_args, 373 | ) 374 | 375 | def save(self, output_dir): 376 | check_dir(output_dir) 377 | self.save_pt_model(output_dir) 378 | self.save_history(output_dir) 379 | self.save_config(output_dir) -------------------------------------------------------------------------------- /models/dcnv2.py: -------------------------------------------------------------------------------- 1 | # %% 2 | import math 3 | import time 4 | import typing as ty 5 | from pathlib import Path 6 | 7 | import numpy as np 8 | import torch 9 | import torch.nn as nn 10 | import torch.nn.functional as F 11 | from torch.utils.data import DataLoader 12 | 13 | from models.abstract import TabModel, check_dir 14 | # %% 15 | class CrossLayer(nn.Module): 16 | def __init__(self, d, dropout): 17 | super().__init__() 18 | self.linear = nn.Linear(d, d) 19 | self.dropout = nn.Dropout(dropout) 20 | 21 | def forward(self, x0, x): 22 | return self.dropout(x0 * self.linear(x)) + x 23 | 24 | 25 | class _DCNv2(nn.Module): 26 | def __init__( 27 | self, 28 | *, 29 | d_in: int, 30 | d: int, 31 | n_hidden_layers: int, 32 | n_cross_layers: int, 33 | hidden_dropout: float, 34 | cross_dropout: float, 35 | d_out: int, 36 | stacked: bool, 37 | categories: ty.Optional[ty.List[int]], 38 | d_embedding: int = None, 39 | ) -> None: 40 | super().__init__() 41 | 42 | if categories is not None: 43 | assert d_embedding is not None 44 | d_in += len(categories) * d_embedding 45 | category_offsets = torch.tensor([0] + categories[:-1]).cumsum(0) 46 | self.register_buffer('category_offsets', category_offsets) 47 | self.category_embeddings = nn.Embedding(sum(categories), d_embedding) 48 | nn.init.kaiming_uniform_(self.category_embeddings.weight, a=math.sqrt(5)) 49 | 
print(f'{self.category_embeddings.weight.shape}') 50 | 51 | self.first_linear = nn.Linear(d_in, d) 52 | self.last_linear = nn.Linear(d if stacked else 2 * d, d_out) 53 | 54 | deep_layers = sum( 55 | [ 56 | [nn.Linear(d, d), nn.ReLU(True), nn.Dropout(hidden_dropout)] 57 | for _ in range(n_hidden_layers) 58 | ], 59 | [], 60 | ) 61 | cross_layers = [CrossLayer(d, cross_dropout) for _ in range(n_cross_layers)] 62 | 63 | self.deep_layers = nn.Sequential(*deep_layers) 64 | self.cross_layers = nn.ModuleList(cross_layers) 65 | self.stacked = stacked 66 | 67 | def forward(self, x_num, x_cat): 68 | x = [] 69 | if x_num is not None: 70 | x.append(x_num) 71 | if x_cat is not None: 72 | x.append( 73 | self.category_embeddings(x_cat + self.category_offsets[None]).view( 74 | x_cat.size(0), -1 75 | ) 76 | ) 77 | x = torch.cat(x, dim=-1) 78 | 79 | x = self.first_linear(x) 80 | 81 | x_cross = x 82 | for cross_layer in self.cross_layers: 83 | x_cross = cross_layer(x, x_cross) 84 | 85 | if self.stacked: 86 | return self.last_linear(self.deep_layers(x_cross)).squeeze(1) 87 | else: 88 | return self.last_linear( 89 | torch.cat([x_cross, self.deep_layers(x)], dim=1) 90 | ).squeeze(1) 91 | 92 | 93 | # %% 94 | class DCNv2(TabModel): 95 | def __init__( 96 | self, 97 | model_config: dict, 98 | n_num_features: int, 99 | categories: ty.Optional[ty.List[int]], 100 | n_labels: int, 101 | device: ty.Union[str, torch.device] = 'cuda', 102 | ): 103 | super().__init__() 104 | model_config = self.preproc_config(model_config) 105 | self.model = _DCNv2( 106 | d_in=n_num_features, 107 | categories=categories, 108 | d_out=n_labels, 109 | **model_config 110 | ).to(device) 111 | self.base_name = 'dcnv2' 112 | self.device = torch.device(device) 113 | 114 | def preproc_config(self, model_config: dict): 115 | # process autoint configs 116 | self.saved_model_config = model_config.copy() 117 | return model_config 118 | 119 | def fit( 120 | self, 121 | # API for specical sampler like curriculum learning 122 | train_loader: ty.Optional[ty.Tuple[DataLoader, int]] = None, # (loader, missing_idx) 123 | # using normal sampler if is None 124 | X_num: ty.Optional[torch.Tensor] = None, 125 | X_cat: ty.Optional[torch.Tensor] = None, 126 | ys: ty.Optional[torch.Tensor] = None, 127 | y_std: ty.Optional[float] = None, # for RMSE 128 | eval_set: ty.Tuple[torch.Tensor, np.ndarray] = None, 129 | patience: int = 0, 130 | task: str = None, 131 | training_args: dict = None, 132 | meta_args: ty.Optional[dict] = None, 133 | ): 134 | def train_step(model, x_num, x_cat, y): # input is X and y 135 | # process input (model-specific) 136 | # define your model API 137 | start_time = time.time() 138 | # define your model API 139 | logits = model(x_num, x_cat) 140 | used_time = time.time() - start_time 141 | return logits, used_time 142 | 143 | # to custom other training paradigm 144 | # 1. add self.dnn_fit2(...) in abstract class for special training process 145 | # 2. 
(recommended) override self.dnn_fit in abstract class 146 | self.dnn_fit( # uniform training paradigm 147 | dnn_fit_func=train_step, 148 | # training data 149 | train_loader=train_loader, 150 | X_num=X_num, X_cat=X_cat, ys=ys, y_std=y_std, 151 | # dev data 152 | eval_set=eval_set, patience=patience, task=task, 153 | # args 154 | training_args=training_args, 155 | meta_args=meta_args, 156 | ) 157 | 158 | def predict( 159 | self, 160 | dev_loader: ty.Optional[ty.Tuple[DataLoader, int]] = None, # reuse, (loader, missing_idx) 161 | X_num: ty.Optional[torch.Tensor] = None, 162 | X_cat: ty.Optional[torch.Tensor] = None, 163 | ys: ty.Optional[torch.Tensor] = None, 164 | y_std: ty.Optional[float] = None, # for RMSE 165 | task: str = None, 166 | return_probs: bool = True, 167 | return_metric: bool = False, 168 | return_loss: bool = False, 169 | meta_args: ty.Optional[dict] = None, 170 | ): 171 | def inference_step(model, x_num, x_cat): # input only X (y inaccessible) 172 | """ 173 | Inference Process 174 | `no_grad` will be applied in `dnn_predict' 175 | """ 176 | # process input (model-specific) 177 | # define your model API 178 | start_time = time.time() 179 | # define your model API 180 | logits = model(x_num, x_cat) 181 | used_time = time.time() - start_time 182 | return logits, used_time 183 | 184 | # to custom other inference paradigm 185 | # 1. add self.dnn_predict2(...) in abstract class for special training process 186 | # 2. (recommended) override self.dnn_predict in abstract class 187 | return self.dnn_predict( # uniform training paradigm 188 | dnn_predict_func=inference_step, 189 | dev_loader=dev_loader, 190 | X_num=X_num, X_cat=X_cat, ys=ys, y_std=y_std, task=task, 191 | return_probs=return_probs, return_metric=return_metric, return_loss=return_loss, 192 | meta_args=meta_args 193 | ) 194 | 195 | def save(self, output_dir): 196 | check_dir(output_dir) 197 | self.save_pt_model(output_dir) 198 | self.save_history(output_dir) 199 | self.save_config(output_dir) -------------------------------------------------------------------------------- /models/ft_transformer.py: -------------------------------------------------------------------------------- 1 | import rtdl 2 | import time 3 | import typing as ty 4 | import numpy as np 5 | 6 | import torch 7 | from torch.utils.data import DataLoader 8 | from models.abstract import TabModel, check_dir 9 | 10 | class FTTransformer(TabModel): 11 | def __init__( 12 | self, 13 | model_config: dict, 14 | n_num_features: int, 15 | categories: ty.Optional[ty.List[int]], 16 | n_labels: int, 17 | device: ty.Union[str, torch.device] = 'cuda', 18 | ): 19 | super().__init__() 20 | model_config = self.preproc_config(model_config) 21 | self.model = rtdl.FTTransformer.make_baseline( 22 | n_num_features=n_num_features, 23 | cat_cardinalities=categories, 24 | d_out=n_labels, 25 | **model_config 26 | ).to(device) 27 | self.base_name = 'ft-transformer' 28 | self.device = torch.device(device) 29 | 30 | def preproc_config(self, model_config: dict): 31 | self.saved_model_config = model_config.copy() 32 | # process ftt configs 33 | if 'ffn_d_factor' in model_config: 34 | model_config['ffn_d_hidden'] = \ 35 | int(model_config['d_token'] * model_config.pop('ffn_d_factor')) 36 | return model_config 37 | 38 | def fit( 39 | self, 40 | # API for specical sampler like curriculum learning 41 | train_loader: ty.Optional[ty.Tuple[DataLoader, int]] = None, # (loader, missing_idx) 42 | # using normal sampler if is None 43 | X_num: ty.Optional[torch.Tensor] = None, 44 | X_cat: 
ty.Optional[torch.Tensor] = None, 45 | ys: ty.Optional[torch.Tensor] = None, 46 | y_std: ty.Optional[float] = None, # for RMSE 47 | eval_set: ty.Tuple[torch.Tensor, np.ndarray] = None, 48 | patience: int = 0, 49 | task: str = None, 50 | training_args: dict = None, 51 | meta_args: ty.Optional[dict] = None, 52 | ): 53 | def train_step(model, x_num, x_cat, y): # input is X and y 54 | # process input (model-specific) 55 | # define your running time calculation 56 | start_time = time.time() 57 | # define your model API 58 | logits = model(x_num, x_cat) 59 | used_time = time.time() - start_time # don't forget backward time, calculate in outer loop 60 | return logits, used_time 61 | 62 | # to custom other training paradigm 63 | # 1. add self.dnn_fit2(...) in abstract class for special training process 64 | # 2. (recommended) override self.dnn_fit in abstract class 65 | self.dnn_fit( # uniform training paradigm 66 | dnn_fit_func=train_step, 67 | # training data 68 | train_loader=train_loader, 69 | X_num=X_num, X_cat=X_cat, ys=ys, y_std=y_std, 70 | # dev data 71 | eval_set=eval_set, patience=patience, task=task, 72 | # args 73 | training_args=training_args, 74 | meta_args=meta_args, 75 | ) 76 | 77 | def predict( 78 | self, 79 | dev_loader: ty.Optional[ty.Tuple[DataLoader, int]] = None, # reuse, (loader, missing_idx) 80 | X_num: ty.Optional[torch.Tensor] = None, 81 | X_cat: ty.Optional[torch.Tensor] = None, 82 | ys: ty.Optional[torch.Tensor] = None, 83 | y_std: ty.Optional[float] = None, # for RMSE 84 | task: str = None, 85 | return_probs: bool = True, 86 | return_metric: bool = False, 87 | return_loss: bool = False, 88 | meta_args: ty.Optional[dict] = None, 89 | ): 90 | def inference_step(model, x_num, x_cat): # input only X (y inaccessible) 91 | """ 92 | Inference Process 93 | `no_grad` will be applied in `dnn_predict' 94 | """ 95 | # process input (model-specific) 96 | # define your running time calculation 97 | start_time = time.time() 98 | # define your model API 99 | logits = model(x_num, x_cat) 100 | used_time = time.time() - start_time 101 | return logits, used_time 102 | 103 | # to custom other inference paradigm 104 | # 1. add self.dnn_predict2(...) in abstract class for special training process 105 | # 2. 
(recommended) override self.dnn_predict in abstract class 106 | return self.dnn_predict( # uniform training paradigm 107 | dnn_predict_func=inference_step, 108 | dev_loader=dev_loader, 109 | X_num=X_num, X_cat=X_cat, ys=ys, y_std=y_std, task=task, 110 | return_probs=return_probs, return_metric=return_metric, return_loss=return_loss, 111 | meta_args=meta_args 112 | ) 113 | 114 | def save(self, output_dir): 115 | check_dir(output_dir) 116 | self.save_pt_model(output_dir) 117 | self.save_history(output_dir) 118 | self.save_config(output_dir) -------------------------------------------------------------------------------- /models/mlp.py: -------------------------------------------------------------------------------- 1 | # %% 2 | import os 3 | import json 4 | import math 5 | import time 6 | import typing as ty 7 | from pathlib import Path 8 | 9 | import numpy as np 10 | import torch 11 | import torch.nn as nn 12 | import torch.nn.functional as F 13 | from torch.utils.data import DataLoader 14 | 15 | from models.abstract import TabModel, check_dir 16 | # %% 17 | class _MLP(nn.Module): 18 | def __init__( 19 | self, 20 | d_in: int, 21 | d_layers: ty.List[int], 22 | dropout: float, 23 | d_out: int, 24 | categories: ty.Optional[ty.List[int]], 25 | d_embedding: int, 26 | ) -> None: 27 | super().__init__() 28 | 29 | self.n_categories = 0 if categories is None else len(categories) 30 | if categories is not None: 31 | d_in += len(categories) * d_embedding 32 | category_offsets = torch.tensor([0] + categories[:-1]).cumsum(0) 33 | self.register_buffer('category_offsets', category_offsets) 34 | self.category_embeddings = nn.Embedding(sum(categories), d_embedding) 35 | nn.init.kaiming_uniform_(self.category_embeddings.weight, a=math.sqrt(5)) 36 | print(f'{self.category_embeddings.weight.shape}') 37 | 38 | self.layers = nn.ModuleList( 39 | [ 40 | nn.Linear(d_layers[i - 1] if i else d_in, x) 41 | for i, x in enumerate(d_layers) 42 | ] 43 | ) 44 | self.dropout = dropout 45 | self.head = nn.Linear(d_layers[-1] if d_layers else d_in, d_out) 46 | 47 | def forward(self, x_num, x_cat): 48 | x = [] 49 | if x_num is not None: 50 | x.append(x_num) 51 | if x_cat is not None: 52 | x.append( 53 | self.category_embeddings(x_cat + self.category_offsets[None]).view( 54 | x_cat.size(0), -1 55 | ) 56 | ) 57 | x = torch.cat(x, dim=-1) 58 | 59 | for layer in self.layers: 60 | x = layer(x) 61 | x = F.relu(x) 62 | if self.dropout: 63 | x = F.dropout(x, self.dropout, self.training) 64 | x = self.head(x) 65 | x = x.squeeze(-1) 66 | 67 | return x 68 | 69 | 70 | # %% 71 | class MLP(TabModel): 72 | def __init__( 73 | self, 74 | model_config: dict, 75 | n_num_features: int, 76 | categories: ty.Optional[ty.List[int]], 77 | n_labels: int, 78 | device: ty.Union[str, torch.device] = 'cuda', 79 | ): 80 | super().__init__() 81 | model_config = self.preproc_config(model_config) 82 | self.model = _MLP( 83 | d_in=n_num_features, 84 | categories=categories, 85 | d_out=n_labels, 86 | **model_config 87 | ).to(device) 88 | self.base_name = 'mlp' 89 | self.device = torch.device(device) 90 | 91 | def preproc_config(self, model_config: dict): 92 | """MLP config preprocessing""" 93 | # process mlp configs 94 | self.saved_model_config = model_config.copy() 95 | d_layers = [] 96 | n_layers, first_dim, mid_dim, last_dim = \ 97 | ( 98 | model_config.pop('n_layers'), model_config.pop('first_dim'), 99 | model_config.pop('mid_dim'), model_config.pop('last_dim') 100 | ) 101 | for i in range(n_layers): 102 | if i == 0: 103 | d_layers.append(first_dim) 104 | 
elif i == n_layers - 1 and n_layers > 1: 105 | d_layers.append(last_dim) 106 | else: 107 | d_layers.append(mid_dim) 108 | model_config['d_layers'] = d_layers 109 | return model_config 110 | 111 | def fit( 112 | self, 113 | # API for specical sampler like curriculum learning 114 | train_loader: ty.Optional[ty.Tuple[DataLoader, int]] = None, # (loader, missing_idx) 115 | # using normal sampler if is None 116 | X_num: ty.Optional[torch.Tensor] = None, 117 | X_cat: ty.Optional[torch.Tensor] = None, 118 | ys: ty.Optional[torch.Tensor] = None, 119 | y_std: ty.Optional[float] = None, # for RMSE 120 | eval_set: ty.Tuple[torch.Tensor, np.ndarray] = None, 121 | patience: int = 0, 122 | task: str = None, 123 | training_args: dict = None, 124 | meta_args: ty.Optional[dict] = None, 125 | ): 126 | def train_step(model, x_num, x_cat, y): # input is X and y 127 | # process input (model-specific) 128 | # define your running time calculation 129 | start_time = time.time() 130 | # define your model API 131 | logits = model(x_num, x_cat) 132 | used_time = time.time() - start_time # don't forget backward time, calculate in outer loop 133 | return logits, used_time 134 | 135 | # to custom other training paradigm 136 | # 1. add self.dnn_fit2(...) in abstract class for special training process 137 | # 2. (recommended) override self.dnn_fit in abstract class 138 | self.dnn_fit( # uniform training paradigm 139 | dnn_fit_func=train_step, 140 | # training data 141 | train_loader=train_loader, 142 | X_num=X_num, X_cat=X_cat, ys=ys, y_std=y_std, 143 | # dev data 144 | eval_set=eval_set, patience=patience, task=task, 145 | # args 146 | training_args=training_args, 147 | meta_args=meta_args, 148 | ) 149 | 150 | def predict( 151 | self, 152 | dev_loader: ty.Optional[ty.Tuple[DataLoader, int]] = None, # reuse, (loader, missing_idx) 153 | X_num: ty.Optional[torch.Tensor] = None, 154 | X_cat: ty.Optional[torch.Tensor] = None, 155 | ys: ty.Optional[torch.Tensor] = None, 156 | y_std: ty.Optional[float] = None, # for RMSE 157 | task: str = None, 158 | return_probs: bool = True, 159 | return_metric: bool = False, 160 | return_loss: bool = False, 161 | meta_args: ty.Optional[dict] = None, 162 | ): 163 | def inference_step(model, x_num, x_cat): # input only X (y inaccessible) 164 | """ 165 | Inference Process 166 | `no_grad` will be applied in `dnn_predict' 167 | """ 168 | # process input (model-specific) 169 | # define your running time calculation 170 | start_time = time.time() 171 | # define your model API 172 | logits = model(x_num, x_cat) 173 | used_time = time.time() - start_time 174 | return logits, used_time 175 | 176 | # to custom other inference paradigm 177 | # 1. add self.dnn_predict2(...) in abstract class for special training process 178 | # 2. 
(recommended) override self.dnn_predict in abstract class 179 | return self.dnn_predict( # uniform training paradigm 180 | dnn_predict_func=inference_step, 181 | dev_loader=dev_loader, 182 | X_num=X_num, X_cat=X_cat, ys=ys, y_std=y_std, task=task, 183 | return_probs=return_probs, return_metric=return_metric, return_loss=return_loss, 184 | meta_args=meta_args 185 | ) 186 | 187 | def save(self, output_dir): 188 | check_dir(output_dir) 189 | self.save_pt_model(output_dir) 190 | self.save_history(output_dir) 191 | self.save_config(output_dir) -------------------------------------------------------------------------------- /models/node/__init__.py: -------------------------------------------------------------------------------- 1 | # Source: https://github.com/Qwicen/node 2 | from .arch import * # noqa 3 | from .nn_utils import * # noqa 4 | from .odst import * # noqa 5 | from .utils import * # noqa 6 | -------------------------------------------------------------------------------- /models/node/arch.py: -------------------------------------------------------------------------------- 1 | # Source: https://github.com/Qwicen/node 2 | import torch 3 | import torch.nn as nn 4 | import torch.nn.functional as F 5 | from torch.utils.checkpoint import checkpoint as torch_checkpoint 6 | 7 | from .odst import ODST 8 | 9 | 10 | class DenseBlock(nn.Sequential): 11 | def __init__(self, input_dim, layer_dim, num_layers, tree_dim=1, max_features=None, 12 | input_dropout=0.0, flatten_output=True, Module=ODST, **kwargs): 13 | layers = [] 14 | for i in range(num_layers): 15 | oddt = Module(input_dim, layer_dim, tree_dim=tree_dim, flatten_output=True, **kwargs) 16 | input_dim = min(input_dim + layer_dim * tree_dim, max_features or float('inf')) 17 | layers.append(oddt) 18 | 19 | super().__init__(*layers) 20 | self.num_layers, self.layer_dim, self.tree_dim = num_layers, layer_dim, tree_dim 21 | self.max_features, self.flatten_output = max_features, flatten_output 22 | self.input_dropout = input_dropout 23 | 24 | def forward(self, x): 25 | initial_features = x.shape[-1] 26 | for layer in self: 27 | layer_inp = x 28 | if self.max_features is not None: 29 | tail_features = min(self.max_features, layer_inp.shape[-1]) - initial_features 30 | if tail_features != 0: 31 | layer_inp = torch.cat([layer_inp[..., :initial_features], layer_inp[..., -tail_features:]], dim=-1) 32 | if self.training and self.input_dropout: 33 | layer_inp = F.dropout(layer_inp, self.input_dropout) 34 | h = layer(layer_inp) 35 | x = torch.cat([x, h], dim=-1) 36 | 37 | outputs = x[..., initial_features:] 38 | if not self.flatten_output: 39 | outputs = outputs.view(*outputs.shape[:-1], self.num_layers * self.layer_dim, self.tree_dim) 40 | return outputs 41 | -------------------------------------------------------------------------------- /models/node/nn_utils.py: -------------------------------------------------------------------------------- 1 | # Source: https://github.com/Qwicen/node 2 | import contextlib 3 | from collections import OrderedDict 4 | 5 | import numpy as np 6 | import torch 7 | import torch.nn as nn 8 | import torch.nn.functional as F 9 | from torch.autograd import Function 10 | from torch.jit import script 11 | 12 | 13 | def to_one_hot(y, depth=None): 14 | r""" 15 | Takes integer with n dims and converts it to 1-hot representation with n + 1 dims. 16 | The n+1'st dimension will have zeros everywhere but at y'th index, where it will be equal to 1. 
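For example, to_one_hot(torch.tensor([0, 2]), depth=3) returns [[1., 0., 0.], [0., 0., 1.]].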
17 | Args: 18 | y: input integer (IntTensor, LongTensor or Variable) of any shape 19 | depth (int): the size of the one hot dimension 20 | """ 21 | y_flat = y.to(torch.int64).view(-1, 1) 22 | depth = depth if depth is not None else int(torch.max(y_flat)) + 1 23 | y_one_hot = torch.zeros(y_flat.size()[0], depth, device=y.device).scatter_(1, y_flat, 1) 24 | y_one_hot = y_one_hot.view(*(tuple(y.shape) + (-1,))) 25 | return y_one_hot 26 | 27 | 28 | def _make_ix_like(input, dim=0): 29 | d = input.size(dim) 30 | rho = torch.arange(1, d + 1, device=input.device, dtype=input.dtype) 31 | view = [1] * input.dim() 32 | view[0] = -1 33 | return rho.view(view).transpose(0, dim) 34 | 35 | 36 | class SparsemaxFunction(Function): 37 | """ 38 | An implementation of sparsemax (Martins & Astudillo, 2016). See 39 | :cite:`DBLP:journals/corr/MartinsA16` for detailed description. 40 | 41 | By Ben Peters and Vlad Niculae 42 | """ 43 | 44 | @staticmethod 45 | def forward(ctx, input, dim=-1): 46 | """sparsemax: normalizing sparse transform (a la softmax) 47 | 48 | Parameters: 49 | input (Tensor): any shape 50 | dim: dimension along which to apply sparsemax 51 | 52 | Returns: 53 | output (Tensor): same shape as input 54 | """ 55 | ctx.dim = dim 56 | max_val, _ = input.max(dim=dim, keepdim=True) 57 | input -= max_val # same numerical stability trick as for softmax 58 | tau, supp_size = SparsemaxFunction._threshold_and_support(input, dim=dim) 59 | output = torch.clamp(input - tau, min=0) 60 | ctx.save_for_backward(supp_size, output) 61 | return output 62 | 63 | @staticmethod 64 | def backward(ctx, grad_output): 65 | supp_size, output = ctx.saved_tensors 66 | dim = ctx.dim 67 | grad_input = grad_output.clone() 68 | grad_input[output == 0] = 0 69 | 70 | v_hat = grad_input.sum(dim=dim) / supp_size.to(output.dtype).squeeze() 71 | v_hat = v_hat.unsqueeze(dim) 72 | grad_input = torch.where(output != 0, grad_input - v_hat, grad_input) 73 | return grad_input, None 74 | 75 | 76 | @staticmethod 77 | def _threshold_and_support(input, dim=-1): 78 | """Sparsemax building block: compute the threshold 79 | 80 | Args: 81 | input: any dimension 82 | dim: dimension along which to apply the sparsemax 83 | 84 | Returns: 85 | the threshold value 86 | """ 87 | 88 | input_srt, _ = torch.sort(input, descending=True, dim=dim) 89 | input_cumsum = input_srt.cumsum(dim) - 1 90 | rhos = _make_ix_like(input, dim) 91 | support = rhos * input_srt > input_cumsum 92 | 93 | support_size = support.sum(dim=dim).unsqueeze(dim) 94 | tau = input_cumsum.gather(dim, support_size - 1) 95 | tau /= support_size.to(input.dtype) 96 | return tau, support_size 97 | 98 | 99 | sparsemax = lambda input, dim=-1: SparsemaxFunction.apply(input, dim) 100 | sparsemoid = lambda input: (0.5 * input + 0.5).clamp_(0, 1) 101 | 102 | 103 | class Entmax15Function(Function): 104 | """ 105 | An implementation of exact Entmax with alpha=1.5 (B. Peters, V. Niculae, A. Martins). See 106 | :cite:`https://arxiv.org/abs/1905.05702 for detailed description. 
107 | Source: https://github.com/deep-spin/entmax 108 | """ 109 | 110 | @staticmethod 111 | def forward(ctx, input, dim=-1): 112 | ctx.dim = dim 113 | 114 | max_val, _ = input.max(dim=dim, keepdim=True) 115 | input = input - max_val # same numerical stability trick as for softmax 116 | input = input / 2 # divide by 2 to solve actual Entmax 117 | 118 | tau_star, _ = Entmax15Function._threshold_and_support(input, dim) 119 | output = torch.clamp(input - tau_star, min=0) ** 2 120 | ctx.save_for_backward(output) 121 | return output 122 | 123 | @staticmethod 124 | def backward(ctx, grad_output): 125 | Y, = ctx.saved_tensors 126 | gppr = Y.sqrt() # = 1 / g'' (Y) 127 | dX = grad_output * gppr 128 | q = dX.sum(ctx.dim) / gppr.sum(ctx.dim) 129 | q = q.unsqueeze(ctx.dim) 130 | dX -= q * gppr 131 | return dX, None 132 | 133 | @staticmethod 134 | def _threshold_and_support(input, dim=-1): 135 | Xsrt, _ = torch.sort(input, descending=True, dim=dim) 136 | 137 | rho = _make_ix_like(input, dim) 138 | mean = Xsrt.cumsum(dim) / rho 139 | mean_sq = (Xsrt ** 2).cumsum(dim) / rho 140 | ss = rho * (mean_sq - mean ** 2) 141 | delta = (1 - ss) / rho 142 | 143 | # NOTE this is not exactly the same as in reference algo 144 | # Fortunately it seems the clamped values never wrongly 145 | # get selected by tau <= sorted_z. Prove this! 146 | delta_nz = torch.clamp(delta, 0) 147 | tau = mean - torch.sqrt(delta_nz) 148 | 149 | support_size = (tau <= Xsrt).sum(dim).unsqueeze(dim) 150 | tau_star = tau.gather(dim, support_size - 1) 151 | return tau_star, support_size 152 | 153 | 154 | class Entmoid15(Function): 155 | """ A highly optimized equivalent of labda x: Entmax15([x, 0]) """ 156 | 157 | @staticmethod 158 | def forward(ctx, input): 159 | output = Entmoid15._forward(input) 160 | ctx.save_for_backward(output) 161 | return output 162 | 163 | @staticmethod 164 | @script 165 | def _forward(input): 166 | input, is_pos = abs(input), input >= 0 167 | tau = (input + torch.sqrt(F.relu(8 - input ** 2))) / 2 168 | tau.masked_fill_(tau <= input, 2.0) 169 | y_neg = 0.25 * F.relu(tau - input, inplace=True) ** 2 170 | return torch.where(is_pos, 1 - y_neg, y_neg) 171 | 172 | @staticmethod 173 | def backward(ctx, grad_output): 174 | return Entmoid15._backward(ctx.saved_tensors[0], grad_output) 175 | 176 | @staticmethod 177 | @script 178 | def _backward(output, grad_output): 179 | gppr0, gppr1 = output.sqrt(), (1 - output).sqrt() 180 | grad_input = grad_output * gppr0 181 | q = grad_input / (gppr0 + gppr1) 182 | grad_input -= q * gppr0 183 | return grad_input 184 | 185 | 186 | entmax15 = lambda input, dim=-1: Entmax15Function.apply(input, dim) 187 | entmoid15 = Entmoid15.apply 188 | 189 | 190 | class Lambda(nn.Module): 191 | def __init__(self, func): 192 | super().__init__() 193 | self.func = func 194 | 195 | def forward(self, *args, **kwargs): 196 | return self.func(*args, **kwargs) 197 | 198 | 199 | class ModuleWithInit(nn.Module): 200 | """ Base class for pytorch module with data-aware initializer on first batch """ 201 | def __init__(self): 202 | super().__init__() 203 | self._is_initialized_tensor = nn.Parameter(torch.tensor(0, dtype=torch.uint8), requires_grad=False) 204 | self._is_initialized_bool = None 205 | # Note: this module uses a separate flag self._is_initialized so as to achieve both 206 | # * persistence: is_initialized is saved alongside model in state_dict 207 | # * speed: model doesn't need to cache 208 | # please DO NOT use these flags in child modules 209 | 210 | def initialize(self, *args, **kwargs): 211 | """ 
initialize module tensors using first batch of data """
212 |         raise NotImplementedError("Please implement data-aware initialize() in the child module")
213 | 
214 |     def __call__(self, *args, **kwargs):
215 |         if self._is_initialized_bool is None:
216 |             self._is_initialized_bool = bool(self._is_initialized_tensor.item())
217 |         if not self._is_initialized_bool:
218 |             self.initialize(*args, **kwargs)
219 |             self._is_initialized_tensor.data[...] = 1
220 |             self._is_initialized_bool = True
221 |         return super().__call__(*args, **kwargs)
222 | 
--------------------------------------------------------------------------------
/models/node/odst.py:
--------------------------------------------------------------------------------
1 | # Source: https://github.com/Qwicen/node
2 | from warnings import warn
3 | 
4 | import numpy as np
5 | import torch
6 | import torch.nn as nn
7 | import torch.nn.functional as F
8 | 
9 | from .nn_utils import ModuleWithInit, sparsemax, sparsemoid
10 | from .utils import check_numpy
11 | 
12 | 
13 | class ODST(ModuleWithInit):
14 |     def __init__(self, in_features, num_trees, depth=6, tree_dim=1, flatten_output=True,
15 |                  choice_function=sparsemax, bin_function=sparsemoid,
16 |                  initialize_response_=nn.init.normal_, initialize_selection_logits_=nn.init.uniform_,
17 |                  threshold_init_beta=1.0, threshold_init_cutoff=1.0,
18 |                  ):
19 |         """
20 |         Oblivious Differentiable Sparsemax Trees. http://tinyurl.com/odst-readmore
21 |         One can drop (sic!) this module anywhere instead of nn.Linear
22 |         :param in_features: number of features in the input tensor
23 |         :param num_trees: number of trees in this layer
24 |         :param tree_dim: number of response channels in the response of individual tree
25 |         :param depth: number of splits in every tree
26 |         :param flatten_output: if False, returns [..., num_trees, tree_dim],
27 |             by default returns [..., num_trees * tree_dim]
28 |         :param choice_function: f(tensor, dim) -> R_simplex computes feature weights s.t. f(tensor, dim).sum(dim) == 1
29 |         :param bin_function: f(tensor) -> R[0, 1], computes tree leaf weights
30 | 
31 |         :param initialize_response_: in-place initializer for tree output tensor
32 |         :param initialize_selection_logits_: in-place initializer for logits that select features for the tree
33 |         both thresholds and scales are initialized with data-aware init (or .load_state_dict)
34 |         :param threshold_init_beta: initializes threshold to a q-th quantile of data points
35 |             where q ~ Beta(:threshold_init_beta:, :threshold_init_beta:)
36 |             If this param is set to 1, initial thresholds will have the same distribution as data points
37 |             If greater than 1 (e.g. 10), thresholds will be closer to median data value
38 |             If less than 1 (e.g. 0.1), thresholds will approach min/max data values.
39 | 
40 |         :param threshold_init_cutoff: threshold log-temperatures initializer, \in (0, inf)
41 |             By default (1.0), log-temperatures are initialized in such a way that all bin selectors
42 |             end up in the linear region of sparse-sigmoid. The temperatures are then scaled by this parameter.
43 | Setting this value > 1.0 will result in some margin between data points and sparse-sigmoid cutoff value 44 | Setting this value < 1.0 will cause (1 - value) part of data points to end up in flat sparse-sigmoid region 45 | For instance, threshold_init_cutoff = 0.9 will set 10% points equal to 0.0 or 1.0 46 | Setting this value > 1.0 will result in a margin between data points and sparse-sigmoid cutoff value 47 | All points will be between (0.5 - 0.5 / threshold_init_cutoff) and (0.5 + 0.5 / threshold_init_cutoff) 48 | """ 49 | super().__init__() 50 | self.depth, self.num_trees, self.tree_dim, self.flatten_output = depth, num_trees, tree_dim, flatten_output 51 | self.choice_function, self.bin_function = choice_function, bin_function 52 | self.threshold_init_beta, self.threshold_init_cutoff = threshold_init_beta, threshold_init_cutoff 53 | 54 | self.response = nn.Parameter(torch.zeros([num_trees, tree_dim, 2 ** depth]), requires_grad=True) 55 | initialize_response_(self.response) 56 | 57 | self.feature_selection_logits = nn.Parameter( 58 | torch.zeros([in_features, num_trees, depth]), requires_grad=True 59 | ) 60 | initialize_selection_logits_(self.feature_selection_logits) 61 | 62 | self.feature_thresholds = nn.Parameter( 63 | torch.full([num_trees, depth], float('nan'), dtype=torch.float32), requires_grad=True 64 | ) # nan values will be initialized on first batch (data-aware init) 65 | 66 | self.log_temperatures = nn.Parameter( 67 | torch.full([num_trees, depth], float('nan'), dtype=torch.float32), requires_grad=True 68 | ) 69 | 70 | # binary codes for mapping between 1-hot vectors and bin indices 71 | with torch.no_grad(): 72 | indices = torch.arange(2 ** self.depth) 73 | offsets = 2 ** torch.arange(self.depth) 74 | bin_codes = (indices.view(1, -1) // offsets.view(-1, 1) % 2).to(torch.float32) 75 | bin_codes_1hot = torch.stack([bin_codes, 1.0 - bin_codes], dim=-1) 76 | self.bin_codes_1hot = nn.Parameter(bin_codes_1hot, requires_grad=False) 77 | # ^-- [depth, 2 ** depth, 2] 78 | 79 | def forward(self, input): 80 | assert len(input.shape) >= 2 81 | if len(input.shape) > 2: 82 | return self.forward(input.view(-1, input.shape[-1])).view(*input.shape[:-1], -1) 83 | # new input shape: [batch_size, in_features] 84 | 85 | feature_logits = self.feature_selection_logits 86 | feature_selectors = self.choice_function(feature_logits, dim=0) 87 | # ^--[in_features, num_trees, depth] 88 | 89 | feature_values = torch.einsum('bi,ind->bnd', input, feature_selectors) 90 | # ^--[batch_size, num_trees, depth] 91 | 92 | threshold_logits = (feature_values - self.feature_thresholds) * torch.exp(-self.log_temperatures) 93 | 94 | threshold_logits = torch.stack([-threshold_logits, threshold_logits], dim=-1) 95 | # ^--[batch_size, num_trees, depth, 2] 96 | 97 | bins = self.bin_function(threshold_logits) 98 | # ^--[batch_size, num_trees, depth, 2], approximately binary 99 | 100 | bin_matches = torch.einsum('btds,dcs->btdc', bins, self.bin_codes_1hot) 101 | # ^--[batch_size, num_trees, depth, 2 ** depth] 102 | 103 | response_weights = torch.prod(bin_matches, dim=-2) 104 | # ^-- [batch_size, num_trees, 2 ** depth] 105 | 106 | response = torch.einsum('bnd,ncd->bnc', response_weights, self.response) 107 | # ^-- [batch_size, num_trees, tree_dim] 108 | 109 | return response.flatten(1, 2) if self.flatten_output else response 110 | 111 | def initialize(self, input, eps=1e-6): 112 | # data-aware initializer 113 | assert len(input.shape) == 2 114 | if input.shape[0] < 1000: 115 | warn("Data-aware initialization is 
performed on less than 1000 data points. This may cause instability." 116 | "To avoid potential problems, run this model on a data batch with at least 1000 data samples." 117 | "You can do so manually before training. Use with torch.no_grad() for memory efficiency.") 118 | with torch.no_grad(): 119 | feature_selectors = self.choice_function(self.feature_selection_logits, dim=0) 120 | # ^--[in_features, num_trees, depth] 121 | 122 | feature_values = torch.einsum('bi,ind->bnd', input, feature_selectors) 123 | # ^--[batch_size, num_trees, depth] 124 | 125 | # initialize thresholds: sample random percentiles of data 126 | percentiles_q = 100 * np.random.beta(self.threshold_init_beta, self.threshold_init_beta, 127 | size=[self.num_trees, self.depth]) 128 | self.feature_thresholds.data[...] = torch.as_tensor( 129 | list(map(np.percentile, check_numpy(feature_values.flatten(1, 2).t()), percentiles_q.flatten())), 130 | dtype=feature_values.dtype, device=feature_values.device 131 | ).view(self.num_trees, self.depth) 132 | 133 | # init temperatures: make sure enough data points are in the linear region of sparse-sigmoid 134 | temperatures = np.percentile(check_numpy(abs(feature_values - self.feature_thresholds)), 135 | q=100 * min(1.0, self.threshold_init_cutoff), axis=0) 136 | 137 | # if threshold_init_cutoff > 1, scale everything down by it 138 | temperatures /= max(1.0, self.threshold_init_cutoff) 139 | self.log_temperatures.data[...] = torch.log(torch.as_tensor(temperatures) + eps) 140 | 141 | def __repr__(self): 142 | return "{}(in_features={}, num_trees={}, depth={}, tree_dim={}, flatten_output={})".format( 143 | self.__class__.__name__, self.feature_selection_logits.shape[0], 144 | self.num_trees, self.depth, self.tree_dim, self.flatten_output 145 | ) 146 | 147 | -------------------------------------------------------------------------------- /models/node/utils.py: -------------------------------------------------------------------------------- 1 | # Source: https://github.com/Qwicen/node 2 | import contextlib 3 | import gc 4 | import glob 5 | import hashlib 6 | import os 7 | import time 8 | 9 | import numpy as np 10 | import requests 11 | import torch 12 | from tqdm import tqdm 13 | 14 | 15 | def download(url, filename, delete_if_interrupted=True, chunk_size=4096): 16 | """ saves file from url to filename with a fancy progressbar """ 17 | try: 18 | with open(filename, "wb") as f: 19 | print("Downloading {} > {}".format(url, filename)) 20 | response = requests.get(url, stream=True) 21 | total_length = response.headers.get('content-length') 22 | 23 | if total_length is None: # no content length header 24 | f.write(response.content) 25 | else: 26 | total_length = int(total_length) 27 | with tqdm(total=total_length) as progressbar: 28 | for data in response.iter_content(chunk_size=chunk_size): 29 | if data: # filter-out keep-alive chunks 30 | f.write(data) 31 | progressbar.update(len(data)) 32 | except Exception as e: 33 | if delete_if_interrupted: 34 | print("Removing incomplete download {}.".format(filename)) 35 | os.remove(filename) 36 | raise e 37 | return filename 38 | 39 | 40 | def iterate_minibatches(*tensors, batch_size, shuffle=True, epochs=1, 41 | allow_incomplete=True, callback=lambda x:x): 42 | indices = np.arange(len(tensors[0])) 43 | upper_bound = int((np.ceil if allow_incomplete else np.floor) (len(indices) / batch_size)) * batch_size 44 | epoch = 0 45 | while True: 46 | if shuffle: 47 | np.random.shuffle(indices) 48 | for batch_start in callback(range(0, upper_bound, 
batch_size)): 49 | batch_ix = indices[batch_start: batch_start + batch_size] 50 | batch = [tensor[batch_ix] for tensor in tensors] 51 | yield batch if len(tensors) > 1 else batch[0] 52 | epoch += 1 53 | if epoch >= epochs: 54 | break 55 | 56 | 57 | def process_in_chunks(function, *args, batch_size, out=None, **kwargs): 58 | """ 59 | Computes output by applying batch-parallel function to large data tensor in chunks 60 | :param function: a function(*[x[indices, ...] for x in args]) -> out[indices, ...] 61 | :param args: one or many tensors, each [num_instances, ...] 62 | :param batch_size: maximum chunk size processed in one go 63 | :param out: memory buffer for out, defaults to torch.zeros of appropriate size and type 64 | :returns: function(data), computed in a memory-efficient way 65 | """ 66 | total_size = args[0].shape[0] 67 | first_output = function(*[x[0: batch_size] for x in args]) 68 | output_shape = (total_size,) + tuple(first_output.shape[1:]) 69 | if out is None: 70 | out = torch.zeros(*output_shape, dtype=first_output.dtype, device=first_output.device, 71 | layout=first_output.layout, **kwargs) 72 | 73 | out[0: batch_size] = first_output 74 | for i in range(batch_size, total_size, batch_size): 75 | batch_ix = slice(i, min(i + batch_size, total_size)) 76 | out[batch_ix] = function(*[x[batch_ix] for x in args]) 77 | return out 78 | 79 | 80 | def check_numpy(x): 81 | """ Makes sure x is a numpy array """ 82 | if isinstance(x, torch.Tensor): 83 | x = x.detach().cpu().numpy() 84 | x = np.asarray(x) 85 | assert isinstance(x, np.ndarray) 86 | return x 87 | 88 | 89 | @contextlib.contextmanager 90 | def nop_ctx(): 91 | yield None 92 | 93 | 94 | def get_latest_file(pattern): 95 | list_of_files = glob.glob(pattern) # * means all if need specific format then *.csv 96 | assert len(list_of_files) > 0, "No files found: " + pattern 97 | return max(list_of_files, key=os.path.getctime) 98 | 99 | 100 | def md5sum(fname): 101 | """ Computes mdp checksum of a file """ 102 | hash_md5 = hashlib.md5() 103 | with open(fname, "rb") as f: 104 | for chunk in iter(lambda: f.read(4096), b""): 105 | hash_md5.update(chunk) 106 | return hash_md5.hexdigest() 107 | 108 | 109 | def free_memory(sleep_time=0.1): 110 | """ Black magic function to free torch memory and some jupyter whims """ 111 | gc.collect() 112 | torch.cuda.synchronize() 113 | gc.collect() 114 | torch.cuda.empty_cache() 115 | time.sleep(sleep_time) 116 | 117 | def to_float_str(element): 118 | try: 119 | return str(float(element)) 120 | except ValueError: 121 | return element 122 | -------------------------------------------------------------------------------- /models/node_model.py: -------------------------------------------------------------------------------- 1 | # %% 2 | import time 3 | import math 4 | import typing as ty 5 | from pathlib import Path 6 | 7 | import numpy as np 8 | import torch 9 | import torch.nn as nn 10 | import torch.nn.functional as F 11 | from torch.utils.data import DataLoader 12 | from torch import Tensor 13 | 14 | import models.node as node 15 | from models.abstract import TabModel, check_dir 16 | 17 | # %% 18 | class _NODE(nn.Module): 19 | def __init__( 20 | self, 21 | *, 22 | d_in: int, 23 | num_layers: int, 24 | layer_dim: int, 25 | depth: int, 26 | tree_dim: int, 27 | choice_function: str, 28 | bin_function: str, 29 | d_out: int, 30 | categories: ty.Optional[ty.List[int]], 31 | d_embedding: int, 32 | ) -> None: 33 | super().__init__() 34 | 35 | if categories is not None: 36 | d_in += len(categories) * d_embedding 37 
|             category_offsets = torch.tensor([0] + categories[:-1]).cumsum(0)
38 |             self.register_buffer('category_offsets', category_offsets)
39 |             self.category_embeddings = nn.Embedding(sum(categories), d_embedding)
40 |             nn.init.kaiming_uniform_(self.category_embeddings.weight, a=math.sqrt(5))
41 |             print(f'{self.category_embeddings.weight.shape}')
42 | 
43 |         self.d_out = d_out
44 |         self.block = node.DenseBlock(
45 |             input_dim=d_in,
46 |             num_layers=num_layers,
47 |             layer_dim=layer_dim,
48 |             depth=depth,
49 |             tree_dim=tree_dim,
50 |             bin_function=getattr(node, bin_function),
51 |             choice_function=getattr(node, choice_function),
52 |             flatten_output=False,
53 |         )
54 | 
55 |     def forward(self, x_num: Tensor, x_cat: Tensor) -> Tensor:
56 |         if x_cat is not None:
57 |             x_cat = self.category_embeddings(x_cat + self.category_offsets[None])
58 |             x = torch.cat([x_num, x_cat.view(x_cat.size(0), -1)], dim=-1)
59 |         else:
60 |             x = x_num
61 | 
62 |         x = self.block(x)
63 |         x = x[..., : self.d_out].mean(dim=-2)
64 |         x = x.squeeze(-1)
65 |         return x
66 | 
67 | 
68 | # %%
69 | class NODE(TabModel):
70 |     def __init__(
71 |         self,
72 |         model_config: dict,
73 |         n_num_features: int,
74 |         categories: ty.Optional[ty.List[int]],
75 |         n_labels: int,
76 |         device: ty.Union[str, torch.device] = 'cuda',
77 |     ):
78 |         super().__init__()
79 |         model_config = self.preproc_config(model_config)
80 |         self.model = _NODE(
81 |             d_in=n_num_features,
82 |             categories=categories,
83 |             d_out=n_labels,
84 |             tree_dim=n_labels,
85 |             **model_config
86 |         ).to(device)
87 |         self.base_name = 'node'
88 |         self.device = torch.device(device)
89 | 
90 |     def preproc_config(self, model_config: dict):
91 |         # process node configs
92 |         self.saved_model_config = model_config.copy()
93 |         return model_config
94 | 
95 |     def fit(
96 |         self,
97 |         # API for special sampler like curriculum learning
98 |         train_loader: ty.Optional[ty.Tuple[DataLoader, int]] = None, # (loader, missing_idx)
99 |         # using normal sampler if is None
100 |         X_num: ty.Optional[torch.Tensor] = None,
101 |         X_cat: ty.Optional[torch.Tensor] = None,
102 |         ys: ty.Optional[torch.Tensor] = None,
103 |         y_std: ty.Optional[float] = None, # for RMSE
104 |         eval_set: ty.Tuple[torch.Tensor, np.ndarray] = None,
105 |         patience: int = 0,
106 |         task: str = None,
107 |         training_args: dict = None,
108 |         meta_args: ty.Optional[dict] = None,
109 |     ):
110 |         def train_step(model, x_num, x_cat, y): # input is X and y
111 |             # process input (model-specific)
112 |             # define your running time calculation
113 |             start_time = time.time()
114 |             # define your model API
115 |             logits = model(x_num, x_cat)
116 |             used_time = time.time() - start_time
117 |             return logits, used_time
118 | 
119 |         # to customize the training paradigm:
120 |         # 1. add self.dnn_fit2(...) in abstract class for special training process
121 |         # 2. 
(recommended) override self.dnn_fit in abstract class 122 | self.dnn_fit( # uniform training paradigm 123 | dnn_fit_func=train_step, 124 | # training data 125 | train_loader=train_loader, 126 | X_num=X_num, X_cat=X_cat, ys=ys, y_std=y_std, 127 | # dev data 128 | eval_set=eval_set, patience=patience, task=task, 129 | # args 130 | training_args=training_args, 131 | meta_args=meta_args, 132 | ) 133 | 134 | def predict( 135 | self, 136 | dev_loader: ty.Optional[ty.Tuple[DataLoader, int]] = None, # reuse, (loader, missing_idx) 137 | X_num: ty.Optional[torch.Tensor] = None, 138 | X_cat: ty.Optional[torch.Tensor] = None, 139 | ys: ty.Optional[torch.Tensor] = None, 140 | y_std: ty.Optional[float] = None, # for RMSE 141 | task: str = None, 142 | return_probs: bool = True, 143 | return_metric: bool = False, 144 | return_loss: bool = False, 145 | meta_args: ty.Optional[dict] = None, 146 | ): 147 | def inference_step(model, x_num, x_cat): # input only X (y inaccessible) 148 | """ 149 | Inference Process 150 | `no_grad` will be applied in `dnn_predict' 151 | """ 152 | # process input (model-specific) 153 | # define your running time calculation 154 | start_time = time.time() 155 | # define your model API 156 | logits = model(x_num, x_cat) 157 | used_time = time.time() - start_time 158 | return logits, used_time 159 | 160 | # to custom other inference paradigm 161 | # 1. add self.dnn_predict2(...) in abstract class for special training process 162 | # 2. (recommended) override self.dnn_predict in abstract class 163 | return self.dnn_predict( # uniform training paradigm 164 | dnn_predict_func=inference_step, 165 | dev_loader=dev_loader, 166 | X_num=X_num, X_cat=X_cat, ys=ys, y_std=y_std, task=task, 167 | return_probs=return_probs, return_metric=return_metric, return_loss=return_loss, 168 | meta_args=meta_args 169 | ) 170 | 171 | def save(self, output_dir): 172 | check_dir(output_dir) 173 | self.save_pt_model(output_dir) 174 | self.save_history(output_dir) 175 | self.save_config(output_dir) -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | category_encoders==2.5.1.post0 2 | numpy==1.23.5 3 | optuna==3.0.5 4 | pandas==1.3.5 5 | PyYAML==6.0 6 | PyYAML==6.0.1 7 | Requests==2.31.0 8 | rtdl==0.0.13 9 | scikit_learn==1.0.2 10 | scipy==1.8.1 11 | torch==1.9.0+cu111 12 | tqdm==4.64.1 13 | typing_extensions==4.8.0 14 | -------------------------------------------------------------------------------- /utils/deep.py: -------------------------------------------------------------------------------- 1 | """ 2 | References: 3 | - https://github.com/yandex-research/tabular-dl-revisiting-models/blob/main/lib/deep.py 4 | """ 5 | from __future__ import absolute_import, division, print_function 6 | 7 | import math 8 | import os 9 | import typing as ty 10 | from copy import deepcopy 11 | 12 | import torch 13 | import torch.nn as nn 14 | import torch.nn.functional as F 15 | import torch.optim as optim 16 | from torch import Tensor 17 | 18 | 19 | class Lambda(nn.Module): 20 | def __init__(self, f: ty.Callable) -> None: 21 | super().__init__() 22 | self.f = f 23 | 24 | def forward(self, x): 25 | return self.f(x) 26 | 27 | 28 | # Source: https://github.com/bzhangGo/rmsnorm 29 | # NOTE: eps is changed to 1e-5 30 | class RMSNorm(nn.Module): 31 | def __init__(self, d, p=-1.0, eps=1e-5, bias=False): 32 | """Root Mean Square Layer Normalization 33 | 34 | :param d: model size 35 | :param p: partial 
RMSNorm, valid value [0, 1], default -1.0 (disabled) 36 | :param eps: epsilon value, default 1e-8 37 | :param bias: whether use bias term for RMSNorm, disabled by 38 | default because RMSNorm doesn't enforce re-centering invariance. 39 | """ 40 | super(RMSNorm, self).__init__() 41 | 42 | self.eps = eps 43 | self.d = d 44 | self.p = p 45 | self.bias = bias 46 | 47 | self.scale = nn.Parameter(torch.ones(d)) 48 | self.register_parameter("scale", self.scale) 49 | 50 | if self.bias: 51 | self.offset = nn.Parameter(torch.zeros(d)) 52 | self.register_parameter("offset", self.offset) 53 | 54 | def forward(self, x): 55 | if self.p < 0.0 or self.p > 1.0: 56 | norm_x = x.norm(2, dim=-1, keepdim=True) 57 | d_x = self.d 58 | else: 59 | partial_size = int(self.d * self.p) 60 | partial_x, _ = torch.split(x, [partial_size, self.d - partial_size], dim=-1) 61 | 62 | norm_x = partial_x.norm(2, dim=-1, keepdim=True) 63 | d_x = partial_size 64 | 65 | rms_x = norm_x * d_x ** (-1.0 / 2) 66 | x_normed = x / (rms_x + self.eps) 67 | 68 | if self.bias: 69 | return self.scale * x_normed + self.offset 70 | 71 | return self.scale * x_normed 72 | 73 | 74 | class ScaleNorm(nn.Module): 75 | """ 76 | Sources: 77 | - https://github.com/tnq177/transformers_without_tears/blob/25026061979916afb193274438f7097945acf9bc/layers.py#L132 78 | - https://github.com/tnq177/transformers_without_tears/blob/6b2726cd9e6e642d976ae73b9f696d9d7ff4b395/layers.py#L157 79 | """ 80 | 81 | def __init__(self, d: int, eps: float = 1e-5, clamp: bool = False) -> None: 82 | super(ScaleNorm, self).__init__() 83 | self.scale = nn.Parameter(torch.tensor(d ** 0.5)) 84 | self.eps = eps 85 | self.clamp = clamp 86 | 87 | def forward(self, x): 88 | norms = torch.norm(x, dim=-1, keepdim=True) 89 | norms = norms.clamp(min=self.eps) if self.clamp else norms + self.eps 90 | return self.scale * x / norms 91 | 92 | 93 | def reglu(x: Tensor) -> Tensor: 94 | a, b = x.chunk(2, dim=-1) 95 | return a * F.relu(b) 96 | 97 | 98 | def geglu(x: Tensor) -> Tensor: 99 | a, b = x.chunk(2, dim=-1) 100 | return a * F.gelu(b) 101 | 102 | def tanglu(x: Tensor) -> Tensor: 103 | a, b = x.chunk(2, dim=-1) 104 | return a * torch.tanh(b) 105 | 106 | 107 | class ReGLU(nn.Module): 108 | def forward(self, x: Tensor) -> Tensor: 109 | return reglu(x) 110 | 111 | 112 | class GEGLU(nn.Module): 113 | def forward(self, x: Tensor) -> Tensor: 114 | return geglu(x) 115 | 116 | 117 | def make_optimizer( 118 | optimizer: str, 119 | parameter_groups, 120 | lr: float, 121 | weight_decay: float, 122 | ) -> optim.Optimizer: 123 | Optimizer = { 124 | 'adabelief': AdaBelief, 125 | 'adam': optim.Adam, 126 | 'adamw': optim.AdamW, 127 | 'radam': RAdam, 128 | 'sgd': optim.SGD, 129 | }[optimizer] 130 | momentum = (0.9,) if Optimizer is optim.SGD else () 131 | return Optimizer(parameter_groups, lr, *momentum, weight_decay=weight_decay) 132 | 133 | 134 | def make_lr_schedule( 135 | optimizer: optim.Optimizer, 136 | lr: float, 137 | epoch_size: int, 138 | lr_schedule: ty.Optional[ty.Dict[str, ty.Any]], 139 | ) -> ty.Tuple[ 140 | ty.Optional[optim.lr_scheduler._LRScheduler], 141 | ty.Dict[str, ty.Any], 142 | ty.Optional[int], 143 | ]: 144 | if lr_schedule is None: 145 | lr_schedule = {'type': 'constant'} 146 | lr_scheduler = None 147 | n_warmup_steps = None 148 | if lr_schedule['type'] in ['transformer', 'linear_warmup']: 149 | n_warmup_steps = ( 150 | lr_schedule['n_warmup_steps'] 151 | if 'n_warmup_steps' in lr_schedule 152 | else lr_schedule['n_warmup_epochs'] * epoch_size 153 | ) 154 | elif lr_schedule['type'] 
== 'cyclic': 155 | lr_scheduler = optim.lr_scheduler.CyclicLR( 156 | optimizer, 157 | base_lr=lr, 158 | max_lr=lr_schedule['max_lr'], 159 | step_size_up=lr_schedule['n_epochs_up'] * epoch_size, 160 | step_size_down=lr_schedule['n_epochs_down'] * epoch_size, 161 | mode=lr_schedule['mode'], 162 | gamma=lr_schedule.get('gamma', 1.0), 163 | cycle_momentum=False, 164 | ) 165 | return lr_scheduler, lr_schedule, n_warmup_steps 166 | 167 | 168 | def get_activation_fn(name: str) -> ty.Callable[[Tensor], Tensor]: 169 | return ( 170 | reglu 171 | if name == 'reglu' 172 | else geglu 173 | if name == 'geglu' 174 | else torch.sigmoid 175 | if name == 'sigmoid' 176 | else tanglu 177 | if name == 'tanglu' 178 | else getattr(F, name) 179 | ) 180 | 181 | 182 | def get_nonglu_activation_fn(name: str) -> ty.Callable[[Tensor], Tensor]: 183 | return ( 184 | F.relu 185 | if name == 'reglu' 186 | else F.gelu 187 | if name == 'geglu' 188 | else get_activation_fn(name) 189 | ) 190 | 191 | 192 | def load_swa_state_dict(model: nn.Module, swa_model: optim.swa_utils.AveragedModel): 193 | state_dict = deepcopy(swa_model.state_dict()) 194 | del state_dict['n_averaged'] 195 | model.load_state_dict({k[len('module.') :]: v for k, v in state_dict.items()}) 196 | 197 | 198 | def get_epoch_parameters( 199 | train_size: int, batch_size: ty.Union[int, str] 200 | ) -> ty.Tuple[int, int]: 201 | if isinstance(batch_size, str): 202 | if batch_size == 'v3': 203 | batch_size = ( 204 | 256 if train_size < 50000 else 512 if train_size < 100000 else 1024 205 | ) 206 | elif batch_size == 'v1': 207 | batch_size = ( 208 | 16 209 | if train_size < 1000 210 | else 32 211 | if train_size < 10000 212 | else 64 213 | if train_size < 50000 214 | else 128 215 | if train_size < 100000 216 | else 256 217 | if train_size < 200000 218 | else 512 219 | if train_size < 500000 220 | else 1024 221 | ) 222 | elif batch_size == 'v2': 223 | batch_size = ( 224 | 512 if train_size < 100000 else 1024 if train_size < 500000 else 2048 225 | ) 226 | return batch_size, math.ceil(train_size / batch_size) # type: ignore[code] 227 | 228 | 229 | def get_linear_warmup_lr(lr: float, n_warmup_steps: int, step: int) -> float: 230 | assert step > 0, "1-based enumeration of steps is expected" 231 | return min(lr, step / (n_warmup_steps + 1) * lr) 232 | 233 | 234 | def get_manual_lr(schedule: ty.List[float], epoch: int) -> float: 235 | assert epoch > 0, "1-based enumeration of epochs is expected" 236 | return schedule[min(epoch, len(schedule)) - 1] 237 | 238 | 239 | def get_transformer_lr(scale: float, d: int, n_warmup_steps: int, step: int) -> float: 240 | return scale * d ** -0.5 * min(step ** -0.5, step * n_warmup_steps ** -1.5) 241 | 242 | 243 | def learn(model, optimizer, loss_fn, step, batch, star) -> ty.Tuple[Tensor, ty.Any]: 244 | model.train() 245 | optimizer.zero_grad() 246 | out = step(batch) 247 | loss = loss_fn(*out) if star else loss_fn(out) 248 | loss.backward() 249 | optimizer.step() 250 | return loss, out 251 | 252 | 253 | def tensor(x) -> torch.Tensor: 254 | assert isinstance(x, torch.Tensor) 255 | return ty.cast(torch.Tensor, x) 256 | 257 | 258 | def get_n_parameters(m: nn.Module): 259 | return sum(x.numel() for x in m.parameters() if x.requires_grad) 260 | 261 | 262 | def get_mlp_n_parameters(units: ty.List[int]): 263 | x = 0 264 | for a, b in zip(units, units[1:]): 265 | x += a * b + b 266 | return x 267 | 268 | 269 | def get_lr(optimizer: optim.Optimizer) -> float: 270 | return next(iter(optimizer.param_groups))['lr'] 271 | 272 | 273 | def 
set_lr(optimizer: optim.Optimizer, lr: float) -> None: 274 | for x in optimizer.param_groups: 275 | x['lr'] = lr 276 | 277 | 278 | def get_device() -> torch.device: 279 | return torch.device('cuda:0' if os.environ.get('CUDA_VISIBLE_DEVICES') else 'cpu') 280 | 281 | 282 | @torch.no_grad() 283 | def get_gradient_norm_ratios(m: nn.Module): 284 | return { 285 | k: v.grad.norm() / v.norm() 286 | for k, v in m.named_parameters() 287 | if v.grad is not None 288 | } 289 | 290 | 291 | def is_oom_exception(err: RuntimeError) -> bool: 292 | return any( 293 | x in str(err) 294 | for x in [ 295 | 'CUDA out of memory', 296 | 'CUBLAS_STATUS_ALLOC_FAILED', 297 | 'CUDA error: out of memory', 298 | ] 299 | ) 300 | 301 | 302 | # Source: https://github.com/LiyuanLucasLiu/RAdam 303 | class RAdam(optim.Optimizer): 304 | def __init__( 305 | self, 306 | params, 307 | lr=1e-3, 308 | betas=(0.9, 0.999), 309 | eps=1e-8, 310 | weight_decay=0, 311 | degenerated_to_sgd=True, 312 | ): 313 | if not 0.0 <= lr: 314 | raise ValueError("Invalid learning rate: {}".format(lr)) 315 | if not 0.0 <= eps: 316 | raise ValueError("Invalid epsilon value: {}".format(eps)) 317 | if not 0.0 <= betas[0] < 1.0: 318 | raise ValueError("Invalid beta parameter at index 0: {}".format(betas[0])) 319 | if not 0.0 <= betas[1] < 1.0: 320 | raise ValueError("Invalid beta parameter at index 1: {}".format(betas[1])) 321 | 322 | self.degenerated_to_sgd = degenerated_to_sgd 323 | if ( 324 | isinstance(params, (list, tuple)) 325 | and len(params) > 0 326 | and isinstance(params[0], dict) 327 | ): 328 | for param in params: 329 | if 'betas' in param and ( 330 | param['betas'][0] != betas[0] or param['betas'][1] != betas[1] 331 | ): 332 | param['buffer'] = [[None, None, None] for _ in range(10)] 333 | defaults = dict( 334 | lr=lr, 335 | betas=betas, 336 | eps=eps, 337 | weight_decay=weight_decay, 338 | buffer=[[None, None, None] for _ in range(10)], 339 | ) 340 | super(RAdam, self).__init__(params, defaults) 341 | 342 | def __setstate__(self, state): 343 | super(RAdam, self).__setstate__(state) 344 | 345 | def step(self, closure=None): 346 | 347 | loss = None 348 | if closure is not None: 349 | loss = closure() 350 | 351 | for group in self.param_groups: 352 | 353 | for p in group['params']: 354 | if p.grad is None: 355 | continue 356 | grad = p.grad.data.float() 357 | if grad.is_sparse: 358 | raise RuntimeError('RAdam does not support sparse gradients') 359 | 360 | p_data_fp32 = p.data.float() 361 | 362 | state = self.state[p] 363 | 364 | if len(state) == 0: 365 | state['step'] = 0 366 | state['exp_avg'] = torch.zeros_like(p_data_fp32) 367 | state['exp_avg_sq'] = torch.zeros_like(p_data_fp32) 368 | else: 369 | state['exp_avg'] = state['exp_avg'].type_as(p_data_fp32) 370 | state['exp_avg_sq'] = state['exp_avg_sq'].type_as(p_data_fp32) 371 | 372 | exp_avg, exp_avg_sq = state['exp_avg'], state['exp_avg_sq'] 373 | beta1, beta2 = group['betas'] 374 | 375 | exp_avg_sq.mul_(beta2).addcmul_(1 - beta2, grad, grad) 376 | exp_avg.mul_(beta1).add_(1 - beta1, grad) 377 | 378 | state['step'] += 1 379 | buffered = group['buffer'][int(state['step'] % 10)] 380 | if state['step'] == buffered[0]: 381 | N_sma, step_size = buffered[1], buffered[2] 382 | else: 383 | buffered[0] = state['step'] 384 | beta2_t = beta2 ** state['step'] 385 | N_sma_max = 2 / (1 - beta2) - 1 386 | N_sma = N_sma_max - 2 * state['step'] * beta2_t / (1 - beta2_t) 387 | buffered[1] = N_sma 388 | 389 | # more conservative since it's an approximated value 390 | if N_sma >= 5: 391 | step_size = 
math.sqrt( 392 | (1 - beta2_t) 393 | * (N_sma - 4) 394 | / (N_sma_max - 4) 395 | * (N_sma - 2) 396 | / N_sma 397 | * N_sma_max 398 | / (N_sma_max - 2) 399 | ) / (1 - beta1 ** state['step']) 400 | elif self.degenerated_to_sgd: 401 | step_size = 1.0 / (1 - beta1 ** state['step']) 402 | else: 403 | step_size = -1 404 | buffered[2] = step_size 405 | 406 | # more conservative since it's an approximated value 407 | if N_sma >= 5: 408 | if group['weight_decay'] != 0: 409 | p_data_fp32.add_( 410 | -group['weight_decay'] * group['lr'], p_data_fp32 411 | ) 412 | denom = exp_avg_sq.sqrt().add_(group['eps']) 413 | p_data_fp32.addcdiv_(-step_size * group['lr'], exp_avg, denom) 414 | p.data.copy_(p_data_fp32) 415 | elif step_size > 0: 416 | if group['weight_decay'] != 0: 417 | p_data_fp32.add_( 418 | -group['weight_decay'] * group['lr'], p_data_fp32 419 | ) 420 | p_data_fp32.add_(-step_size * group['lr'], exp_avg) 421 | p.data.copy_(p_data_fp32) 422 | 423 | return loss 424 | 425 | 426 | version_higher = torch.__version__ >= "1.5.0" 427 | 428 | 429 | # Source: https://github.com/juntang-zhuang/Adabelief-Optimizer 430 | class AdaBelief(optim.Optimizer): 431 | r"""Implements AdaBelief algorithm. Modified from Adam in PyTorch 432 | Arguments: 433 | params (iterable): iterable of parameters to optimize or dicts defining 434 | parameter groups 435 | lr (float, optional): learning rate (default: 1e-3) 436 | betas (Tuple[float, float], optional): coefficients used for computing 437 | running averages of gradient and its square (default: (0.9, 0.999)) 438 | eps (float, optional): term added to the denominator to improve 439 | numerical stability (default: 1e-16) 440 | weight_decay (float, optional): weight decay (L2 penalty) (default: 0) 441 | amsgrad (boolean, optional): whether to use the AMSGrad variant of this 442 | algorithm from the paper `On the Convergence of Adam and Beyond`_ 443 | (default: False) 444 | weight_decouple (boolean, optional): ( default: True) If set as True, then 445 | the optimizer uses decoupled weight decay as in AdamW 446 | fixed_decay (boolean, optional): (default: False) This is used when weight_decouple 447 | is set as True. 448 | When fixed_decay == True, the weight decay is performed as 449 | $W_{new} = W_{old} - W_{old} \times decay$. 450 | When fixed_decay == False, the weight decay is performed as 451 | $W_{new} = W_{old} - W_{old} \times decay \times lr$. Note that in this case, the 452 | weight decay ratio decreases with learning rate (lr). 
453 | rectify (boolean, optional): (default: True) If set as True, then perform the rectified 454 | update similar to RAdam 455 | degenerated_to_sgd (boolean, optional) (default:True) If set as True, then perform SGD update 456 | when variance of gradient is high 457 | print_change_log (boolean, optional) (default: True) If set as True, print the modifcation to 458 | default hyper-parameters 459 | reference: AdaBelief Optimizer, adapting stepsizes by the belief in observed gradients, NeurIPS 2020 460 | """ 461 | 462 | def __init__( 463 | self, 464 | params, 465 | lr=1e-3, 466 | betas=(0.9, 0.999), 467 | eps=1e-16, 468 | weight_decay=0, 469 | amsgrad=False, 470 | weight_decouple=True, 471 | fixed_decay=False, 472 | rectify=True, 473 | degenerated_to_sgd=True, 474 | print_change_log=True, 475 | ): 476 | 477 | # ------------------------------------------------------------------------------ 478 | # Print modifications to default arguments 479 | if print_change_log: 480 | print( 481 | 'Please check your arguments if you have upgraded adabelief-pytorch from version 0.0.5.' 482 | ) 483 | print('Modifications to default arguments:') 484 | default_table = [ 485 | ['eps', 'weight_decouple', 'rectify'], 486 | ['adabelief-pytorch=0.0.5', '1e-8', 'False', 'False'], 487 | ['>=0.1.0 (Current 0.2.0)', '1e-16', 'True', 'True'], 488 | ] 489 | print(default_table) 490 | 491 | recommend_table = [ 492 | [ 493 | 'SGD better than Adam (e.g. CNN for Image Classification)', 494 | 'Adam better than SGD (e.g. Transformer, GAN)', 495 | ], 496 | ['Recommended eps = 1e-8', 'Recommended eps = 1e-16'], 497 | ] 498 | print(recommend_table) 499 | 500 | print('For a complete table of recommended hyperparameters, see') 501 | print('https://github.com/juntang-zhuang/Adabelief-Optimizer') 502 | 503 | print( 504 | 'You can disable the log message by setting "print_change_log = False", though it is recommended to keep as a reminder.' 
505 | ) 506 | # ------------------------------------------------------------------------------ 507 | 508 | if not 0.0 <= lr: 509 | raise ValueError("Invalid learning rate: {}".format(lr)) 510 | if not 0.0 <= eps: 511 | raise ValueError("Invalid epsilon value: {}".format(eps)) 512 | if not 0.0 <= betas[0] < 1.0: 513 | raise ValueError("Invalid beta parameter at index 0: {}".format(betas[0])) 514 | if not 0.0 <= betas[1] < 1.0: 515 | raise ValueError("Invalid beta parameter at index 1: {}".format(betas[1])) 516 | 517 | self.degenerated_to_sgd = degenerated_to_sgd 518 | if ( 519 | isinstance(params, (list, tuple)) 520 | and len(params) > 0 521 | and isinstance(params[0], dict) 522 | ): 523 | for param in params: 524 | if 'betas' in param and ( 525 | param['betas'][0] != betas[0] or param['betas'][1] != betas[1] 526 | ): 527 | param['buffer'] = [[None, None, None] for _ in range(10)] 528 | 529 | defaults = dict( 530 | lr=lr, 531 | betas=betas, 532 | eps=eps, 533 | weight_decay=weight_decay, 534 | amsgrad=amsgrad, 535 | buffer=[[None, None, None] for _ in range(10)], 536 | ) 537 | super(AdaBelief, self).__init__(params, defaults) 538 | 539 | self.degenerated_to_sgd = degenerated_to_sgd 540 | self.weight_decouple = weight_decouple 541 | self.rectify = rectify 542 | self.fixed_decay = fixed_decay 543 | if self.weight_decouple: 544 | print('Weight decoupling enabled in AdaBelief') 545 | if self.fixed_decay: 546 | print('Weight decay fixed') 547 | if self.rectify: 548 | print('Rectification enabled in AdaBelief') 549 | if amsgrad: 550 | print('AMSGrad enabled in AdaBelief') 551 | 552 | def __setstate__(self, state): 553 | super(AdaBelief, self).__setstate__(state) 554 | for group in self.param_groups: 555 | group.setdefault('amsgrad', False) 556 | 557 | def reset(self): 558 | for group in self.param_groups: 559 | for p in group['params']: 560 | state = self.state[p] 561 | amsgrad = group['amsgrad'] 562 | 563 | # State initialization 564 | state['step'] = 0 565 | # Exponential moving average of gradient values 566 | state['exp_avg'] = ( 567 | torch.zeros_like(p.data, memory_format=torch.preserve_format) 568 | if version_higher 569 | else torch.zeros_like(p.data) 570 | ) 571 | 572 | # Exponential moving average of squared gradient values 573 | state['exp_avg_var'] = ( 574 | torch.zeros_like(p.data, memory_format=torch.preserve_format) 575 | if version_higher 576 | else torch.zeros_like(p.data) 577 | ) 578 | 579 | if amsgrad: 580 | # Maintains max of all exp. moving avg. of sq. grad. values 581 | state['max_exp_avg_var'] = ( 582 | torch.zeros_like(p.data, memory_format=torch.preserve_format) 583 | if version_higher 584 | else torch.zeros_like(p.data) 585 | ) 586 | 587 | def step(self, closure=None): 588 | """Performs a single optimization step. 589 | Arguments: 590 | closure (callable, optional): A closure that reevaluates the model 591 | and returns the loss. 
592 | """ 593 | loss = None 594 | if closure is not None: 595 | loss = closure() 596 | 597 | for group in self.param_groups: 598 | for p in group['params']: 599 | if p.grad is None: 600 | continue 601 | 602 | # cast data type 603 | half_precision = False 604 | if p.data.dtype == torch.float16: 605 | half_precision = True 606 | p.data = p.data.float() 607 | p.grad = p.grad.float() 608 | 609 | grad = p.grad.data 610 | if grad.is_sparse: 611 | raise RuntimeError( 612 | 'AdaBelief does not support sparse gradients, please consider SparseAdam instead' 613 | ) 614 | amsgrad = group['amsgrad'] 615 | 616 | state = self.state[p] 617 | 618 | beta1, beta2 = group['betas'] 619 | 620 | # State initialization 621 | if len(state) == 0: 622 | state['step'] = 0 623 | # Exponential moving average of gradient values 624 | state['exp_avg'] = ( 625 | torch.zeros_like(p.data, memory_format=torch.preserve_format) 626 | if version_higher 627 | else torch.zeros_like(p.data) 628 | ) 629 | # Exponential moving average of squared gradient values 630 | state['exp_avg_var'] = ( 631 | torch.zeros_like(p.data, memory_format=torch.preserve_format) 632 | if version_higher 633 | else torch.zeros_like(p.data) 634 | ) 635 | if amsgrad: 636 | # Maintains max of all exp. moving avg. of sq. grad. values 637 | state['max_exp_avg_var'] = ( 638 | torch.zeros_like( 639 | p.data, memory_format=torch.preserve_format 640 | ) 641 | if version_higher 642 | else torch.zeros_like(p.data) 643 | ) 644 | 645 | # perform weight decay, check if decoupled weight decay 646 | if self.weight_decouple: 647 | if not self.fixed_decay: 648 | p.data.mul_(1.0 - group['lr'] * group['weight_decay']) 649 | else: 650 | p.data.mul_(1.0 - group['weight_decay']) 651 | else: 652 | if group['weight_decay'] != 0: 653 | grad.add_(p.data, alpha=group['weight_decay']) 654 | 655 | # get current state variable 656 | exp_avg, exp_avg_var = state['exp_avg'], state['exp_avg_var'] 657 | 658 | state['step'] += 1 659 | bias_correction1 = 1 - beta1 ** state['step'] 660 | bias_correction2 = 1 - beta2 ** state['step'] 661 | 662 | # Update first and second moment running average 663 | exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1) 664 | grad_residual = grad - exp_avg 665 | exp_avg_var.mul_(beta2).addcmul_( 666 | grad_residual, grad_residual, value=1 - beta2 667 | ) 668 | 669 | if amsgrad: 670 | max_exp_avg_var = state['max_exp_avg_var'] 671 | # Maintains the maximum of all 2nd moment running avg. till now 672 | torch.max( 673 | max_exp_avg_var, 674 | exp_avg_var.add_(group['eps']), 675 | out=max_exp_avg_var, 676 | ) 677 | 678 | # Use the max. for normalizing running avg. 
of gradient 679 | denom = (max_exp_avg_var.sqrt() / math.sqrt(bias_correction2)).add_( 680 | group['eps'] 681 | ) 682 | else: 683 | denom = ( 684 | exp_avg_var.add_(group['eps']).sqrt() 685 | / math.sqrt(bias_correction2) 686 | ).add_(group['eps']) 687 | 688 | # update 689 | if not self.rectify: 690 | # Default update 691 | step_size = group['lr'] / bias_correction1 692 | p.data.addcdiv_(exp_avg, denom, value=-step_size) 693 | 694 | else: # Rectified update, forked from RAdam 695 | buffered = group['buffer'][int(state['step'] % 10)] 696 | if state['step'] == buffered[0]: 697 | N_sma, step_size = buffered[1], buffered[2] 698 | else: 699 | buffered[0] = state['step'] 700 | beta2_t = beta2 ** state['step'] 701 | N_sma_max = 2 / (1 - beta2) - 1 702 | N_sma = N_sma_max - 2 * state['step'] * beta2_t / (1 - beta2_t) 703 | buffered[1] = N_sma 704 | 705 | # more conservative since it's an approximated value 706 | if N_sma >= 5: 707 | step_size = math.sqrt( 708 | (1 - beta2_t) 709 | * (N_sma - 4) 710 | / (N_sma_max - 4) 711 | * (N_sma - 2) 712 | / N_sma 713 | * N_sma_max 714 | / (N_sma_max - 2) 715 | ) / (1 - beta1 ** state['step']) 716 | elif self.degenerated_to_sgd: 717 | step_size = 1.0 / (1 - beta1 ** state['step']) 718 | else: 719 | step_size = -1 720 | buffered[2] = step_size 721 | 722 | if N_sma >= 5: 723 | denom = exp_avg_var.sqrt().add_(group['eps']) 724 | p.data.addcdiv_(exp_avg, denom, value=-step_size * group['lr']) 725 | elif step_size > 0: 726 | p.data.add_(exp_avg, alpha=-step_size * group['lr']) 727 | 728 | if half_precision: 729 | p.data = p.data.half() 730 | p.grad = p.grad.half() 731 | 732 | return loss 733 | -------------------------------------------------------------------------------- /utils/metrics.py: -------------------------------------------------------------------------------- 1 | """ 2 | References: 3 | - https://github.com/yandex-research/tabular-dl-num-embeddings/blob/main/lib/metrics.py 4 | """ 5 | import enum 6 | from typing import Any, Optional, Union, cast, Tuple, Dict 7 | 8 | import numpy as np 9 | import scipy.special 10 | import sklearn.metrics as skm 11 | 12 | 13 | class PredictionType(enum.Enum): 14 | LOGITS = 'logits' 15 | PROBS = 'probs' 16 | 17 | 18 | def calculate_rmse( 19 | y_true: np.ndarray, y_pred: np.ndarray, std: Optional[float] 20 | ) -> float: 21 | rmse = skm.mean_squared_error(y_true, y_pred) ** 0.5 22 | if std is not None: 23 | rmse *= std 24 | return rmse 25 | 26 | 27 | def _get_labels_and_probs( 28 | y_pred: np.ndarray, task_type, prediction_type: Optional[PredictionType] 29 | ) -> Tuple[np.ndarray, Optional[np.ndarray]]: 30 | assert task_type in ('binclass', 'multiclass') 31 | 32 | if prediction_type is None: 33 | return y_pred, None 34 | 35 | if prediction_type == PredictionType.LOGITS: 36 | probs = ( 37 | scipy.special.expit(y_pred) 38 | if task_type == 'binclass' 39 | else scipy.special.softmax(y_pred, axis=1) 40 | ) 41 | elif prediction_type == PredictionType.PROBS: 42 | probs = y_pred 43 | else: 44 | raise AssertionError('Unknown prediction type') 45 | 46 | assert probs is not None 47 | labels = np.round(probs) if task_type == 'binclass' else probs.argmax(axis=1) 48 | return labels.astype('int64'), probs 49 | 50 | 51 | def calculate_metrics( 52 | y_true: np.ndarray, 53 | y_pred: np.ndarray, 54 | task_type: str, 55 | prediction_type: Optional[Union[str, PredictionType]], 56 | y_std: Optional[float] = None, 57 | ) -> Dict[str, Any]: 58 | # Example: calculate_metrics(y_true, y_pred, 'binclass', 'logits', {}) 59 | if prediction_type is not 
None:
60 |         prediction_type = PredictionType(prediction_type)
61 | 
62 |     if task_type == 'regression':
63 |         assert prediction_type is None
64 |         assert y_std is not None
65 |         rmse = calculate_rmse(y_true, y_pred, y_std)
66 |         result = {'rmse': rmse}
67 |     else:
68 |         labels, probs = _get_labels_and_probs(y_pred, task_type, prediction_type)
69 |         result = cast(
70 |             Dict[str, Any], skm.classification_report(y_true, labels, output_dict=True)
71 |         )
72 |         if task_type == 'binclass':
73 |             result['roc_auc'] = skm.roc_auc_score(y_true, probs)
74 |     return result
75 | 
--------------------------------------------------------------------------------
/utils/model.py:
--------------------------------------------------------------------------------
1 | import os
2 | import time
3 | import json
4 | import yaml
5 | import shutil
6 | import random
7 | import datetime
8 | from pathlib import Path
9 | 
10 | import numpy as np
11 | 
12 | from typing import Dict, List, Tuple, Union, Optional, Literal
13 | 
14 | import torch
15 | import optuna
16 | 
17 | from models import MLP, FTTransformer, AutoInt, DCNv2, NODE
18 | from models.abstract import TabModel, check_dir
19 | from data.utils import Dataset
20 | from data.processor import DataProcessor
21 | 
22 | MODEL_CARDS = {
23 |     'xgboost': None, 'catboost': None, 'lightgbm': None,
24 |     'mlp': MLP, 'autoint': AutoInt, 'dcnv2': DCNv2, 'node': NODE,
25 |     'ft-transformer': FTTransformer, 'saint': None,
26 |     't2g-former': None, 'excel-former': None,
27 | }
28 | HPOLib = Literal['optuna', 'hyperopt'] # TODO: add 'hyperopt' support
29 | 
30 | def get_model_cards():
31 |     return {
32 |         'ready': sorted(list([key for key, value in MODEL_CARDS.items() if value])),
33 |         'coming soon': sorted(list([key for key, value in MODEL_CARDS.items() if not value]))
34 |     }
35 | 
36 | def seed_everything(seed=42):
37 |     '''
38 |     Sets all random seeds so results are the same every time we run.
39 |     This is for REPRODUCIBILITY.
40 | ''' 41 | random.seed(seed) 42 | # Set a fixed value for the hash seed 43 | os.environ['PYTHONHASHSEED'] = str(seed) 44 | np.random.seed(seed) 45 | torch.manual_seed(seed) 46 | 47 | if torch.cuda.is_available(): 48 | torch.cuda.manual_seed(seed) 49 | torch.cuda.manual_seed_all(seed) 50 | # When running on the CuDNN backend, two further options must be set 51 | torch.backends.cudnn.deterministic = True 52 | torch.backends.cudnn.benchmark = False 53 | 54 | def load_config_from_file(file): 55 | file = str(file) 56 | if file.endswith('.yaml'): 57 | with open(file, 'r') as f: 58 | cfg = yaml.safe_load(f) 59 | elif file.endswith('.json'): 60 | with open(file, 'r') as f: 61 | cfg = json.load(f) 62 | else: 63 | raise AssertionError('Config files only support yaml or json format now.') 64 | return cfg 65 | 66 | def extract_config(model_config: dict, is_large_data: bool = False): 67 | """selection of different search spaces""" 68 | used_cfgs = {"model": {}, "training": {}, 'meta': model_config.get('meta', {})} 69 | for field in ['model', 'training']: 70 | for k in model_config[field]: 71 | cfgs = model_config[field][k] 72 | if 'type2' not in cfgs: 73 | used_cfg = cfgs 74 | else: 75 | if not is_large_data: 76 | used_cfg = {k: v for k, v in cfgs.items() if not k.endswith('2')} 77 | else: 78 | used_cfg = {k[:-1]: v for k, v in cfgs.items() if k.endswith('2')} 79 | used_cfgs[field][k] = used_cfg 80 | return used_cfgs 81 | 82 | def make_baseline( 83 | model_name, 84 | model_config: Union[dict, str], 85 | n_num: int, 86 | cat_card: Optional[List[int]], 87 | n_labels: int, 88 | sparsity_scheme: Optional[str] = None, 89 | device: Union[str, torch.device] = 'cuda', 90 | ) -> TabModel: 91 | """Process Model Configs and Call Specific Model APIs""" 92 | assert model_name in MODEL_CARDS, f"unrecognized `{model_name}` model name, choose one of valid models in {MODEL_CARDS}" 93 | if isinstance(model_config, str): 94 | model_config = load_config_from_file(model_config)['model'] 95 | if MODEL_CARDS[model_name] is None: 96 | raise NotImplementedError("Please add corresponding model implementation to `models` module") 97 | if sparsity_scheme is not None: 98 | assert 'mlp' in model_name 99 | return MODEL_CARDS[model_name]( 100 | model_config=model_config, 101 | n_num_features=n_num, categories=cat_card, n_labels=n_labels, 102 | sparsity_scheme=sparsity_scheme) 103 | return MODEL_CARDS[model_name]( 104 | model_config=model_config, 105 | n_num_features=n_num, categories=cat_card, n_labels=n_labels) 106 | 107 | def tune( 108 | model_name: str = None, 109 | search_config: Union[dict, str] = None, 110 | dataset: Dataset = None, 111 | batch_size: int = 64, 112 | patience: int = 8, # a small patience for quick tune 113 | n_iterations: int = 50, 114 | framework: HPOLib = 'optuna', 115 | device: Union[str, torch.device] = 'cuda', 116 | output_dir: Optional[str] = None, 117 | ) -> 'TabModel': 118 | # assert framework in HPOLib, f"hyper tune only support the following frameworks '{HPOLib}'" 119 | 120 | # device 121 | device = torch.device(device) 122 | 123 | # task params 124 | n_num_features = dataset.n_num_features 125 | categories = dataset.get_category_sizes('train') 126 | if len(categories) == 0: 127 | categories = None 128 | n_labels = dataset.n_classes or 1 129 | y_std = dataset.y_info.get('std') # for regression 130 | # preprocess 131 | datas = DataProcessor.prepare(dataset, device=device) 132 | # hpo search space 133 | if isinstance(search_config, str): 134 | search_spaces = load_config_from_file(search_config) 135 | 
else: 136 | search_spaces = search_config 137 | search_spaces = extract_config(search_spaces) # for multi-choice spaces 138 | # meta args 139 | if output_dir is None: 140 | now = datetime.datetime.now().strftime("%Y%m%d%H%M%S") 141 | output_dir = f"results/{model_name}-{dataset.name}-{now}" 142 | search_spaces['meta'] = {'save_path': Path(output_dir) / 'tunning'} # save tuning results 143 | tuned_dir = Path(output_dir) / 'tuned' 144 | # global variable 145 | running_time = 0. 146 | 147 | def get_configs(trial: optuna.Trial): # sample configs 148 | config = {} 149 | for field in ['model', 'training']: 150 | config[field] = {} 151 | for k, space in search_spaces[field].items(): 152 | if space['type'] in ['int', 'float', 'uniform', 'loguniform']: 153 | config[field][k] = eval(f"trial.suggest_{space['type']}")(k, low=space['min'], high=space['max']) 154 | elif space['type'] == 'categorical': 155 | config[field][k] = trial.suggest_categorical(k, choices=space['choices']) 156 | elif space['type'] == 'const': 157 | config[field][k] = space['value'] 158 | else: 159 | raise TypeError(f"Unsupport suggest type {space['type']} for framework: {framework}") 160 | config['meta'] = search_spaces['meta'] 161 | config['training'].setdefault('batch_size', batch_size) 162 | return config 163 | 164 | def objective(trial: optuna.Trial): 165 | configs = get_configs(trial) 166 | model = make_baseline( 167 | model_name, configs['model'], 168 | n_num=n_num_features, 169 | cat_card=categories, 170 | n_labels=n_labels, 171 | device=device) 172 | nonlocal running_time 173 | start = time.time() 174 | model.fit( 175 | X_num=datas['train'][0], X_cat=datas['train'][1], ys=datas['train'][2], y_std=y_std, 176 | eval_set=(datas['val'],), 177 | patience=patience, 178 | task=dataset.task_type.value, 179 | training_args=configs['training'], 180 | meta_args=configs['meta']) # save best model and configs 181 | running_time += time.time() - start 182 | val_metric = ( 183 | model.history['val']['best_metric'] 184 | if dataset.task_type.value != 'regression' 185 | else -model.history['val']['best_metric'] 186 | ) 187 | return val_metric 188 | 189 | def save_per_iter(study: optuna.Study, trail: optuna.Trial): 190 | # current tuning infos 191 | tunning_infos = { 192 | 'model_name': model_name, 193 | 'dataset': dataset.name, 194 | 'cur_trail': trail.number, 195 | 'best_trial': study.best_trial.number, 196 | 'best_val_metric': study.best_value, 197 | 'scores': [t.value for t in study.trials], 198 | 'used_time (s)': running_time, 199 | } 200 | with open(Path(search_spaces['meta']['save_path']) / 'tunning.json', 'w') as f: 201 | json.dump(tunning_infos, f, indent=4) 202 | # only copy the best tuning result 203 | if study.best_trial.number == trail.number: 204 | src_dir = search_spaces['meta']['save_path'] 205 | dst_dir = tuned_dir 206 | print(f'copy best configs and results: {str(src_dir)} -> {str(dst_dir)}') 207 | print(f'best val metric: ', np.round(study.best_value, 4)) 208 | if os.path.exists(dst_dir): 209 | shutil.rmtree(dst_dir) 210 | shutil.copytree(src_dir, dst_dir) 211 | 212 | study = optuna.create_study(direction='maximize') 213 | study.optimize(func=objective, n_trials=n_iterations, callbacks=[save_per_iter]) 214 | 215 | # load best model 216 | config_file = Path(tuned_dir) / 'configs.yaml' 217 | configs = load_config_from_file(config_file) 218 | model = make_baseline( 219 | model_name, configs['model'], 220 | n_num=n_num_features, cat_card=categories, n_labels=n_labels, 221 | device=device) 222 | model.load_best_dnn(tuned_dir) 
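    # NOTE: `save_per_iter` copied the best trial's configs and checkpoint into `tuned_dir`,
    # so the model rebuilt above carries the best-found hyper-parameters; the validation
    # pass below is run with those reloaded weights before the model is returned.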
223 | # prediction 224 | predictions, results = model.predict( 225 | X_num=datas['val'][0], X_cat=datas['val'][1], ys=datas['val'][2], y_std=y_std, 226 | task=dataset.task_type.value, 227 | return_probs=True, return_metric=True, 228 | meta_args={'save_path': output_dir}) 229 | print(results) 230 | 231 | return model --------------------------------------------------------------------------------
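For orientation, a minimal usage sketch of the helpers defined in utils/model.py (`seed_everything`, `make_baseline`, `tune`) follows. It assumes the repository root is on the Python path, that `configs/default/<model>.yaml` holds a fixed model configuration with a `model` section while `configs/<model>.yaml` holds the Optuna search space, and that `load_dataset` stands in for however a `data.utils.Dataset` is actually constructed (see `data/` and `examples/` for the project's own pipelines); these names and paths are illustrative assumptions, not a prescribed workflow.

import torch

from utils.model import seed_everything, make_baseline, tune

seed_everything(42)
dataset = load_dataset('my_dataset')  # hypothetical loader returning a data.utils.Dataset
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Option 1: build a single baseline from a fixed config file and train it yourself.
model = make_baseline(
    'mlp', 'configs/default/mlp.yaml',
    n_num=dataset.n_num_features,
    cat_card=dataset.get_category_sizes('train') or None,  # None when there are no categorical features
    n_labels=dataset.n_classes or 1,                        # 1 output for regression
    device=device,
)

# Option 2: let Optuna search the space in configs/mlp.yaml; `tune` reloads and
# returns the model from the best trial after evaluating it on the validation split.
best_model = tune(
    model_name='mlp',
    search_config='configs/mlp.yaml',
    dataset=dataset,
    batch_size=128,
    patience=8,
    n_iterations=20,
    device=device,
    output_dir='results/mlp-demo',
)
best_model.save('results/mlp-demo/final')  # persists weights, history, and configs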