├── LICENSE
├── README.md
├── configs
│   ├── autoint.yaml
│   ├── dcnv2.yaml
│   ├── default
│   │   ├── autoint.yaml
│   │   ├── dcnv2.yaml
│   │   ├── ft-transformer.yaml
│   │   ├── mlp.yaml
│   │   └── node.yaml
│   ├── ft-transformer.yaml
│   ├── mlp.yaml
│   └── node.yaml
├── data
│   ├── __init__.py
│   ├── custom_datasets
│   │   └── infos.json
│   ├── env.py
│   ├── processor.py
│   └── utils.py
├── examples
│   ├── add_custom_dataset.py
│   ├── finetune_baseline.py
│   └── tune_baseline.py
├── image
│   └── auto_skdl-logo.png
├── models
│   ├── __init__.py
│   ├── abstract.py
│   ├── autoint.py
│   ├── dcnv2.py
│   ├── ft_transformer.py
│   ├── mlp.py
│   ├── node
│   │   ├── __init__.py
│   │   ├── arch.py
│   │   ├── nn_utils.py
│   │   ├── odst.py
│   │   └── utils.py
│   └── node_model.py
├── requirements.txt
└── utils
    ├── deep.py
    ├── metrics.py
    └── model.py
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2023 Tabular AI Research
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 |

2 |
3 | # Auto-Scikit-DL: An automatic deep tabular learning package
4 |
5 | *Auto-Scikit-DL* is a deep tabular learning package that serves as a complement to scikit-learn. It will contain classical and advanced deep model baselines for tabular (machine) learning, automatic feature engineering and model selection methods, and flexible training paradigm customization. This project aims to provide a unified baseline interface and benchmark usage for the academic community, convenient pipeline construction for machine learning competitions, and rapid engineering experiments for machine learning projects, helping people focus on specific algorithm design.
6 |
7 | It is currently under construction by [LionSenSei](https://github.com/jyansir). More baselines are coming soon. The project will be packaged for public use in the future. If there are any problems or suggestions, feel free to contact [jyansir@zju.edu.cn]().
8 |
9 |
10 | ## Baselines
11 |
12 | Here is the list of baselines we are going to include in this package (continually updated):
13 |
14 | | Paper | Baseline | Year | Link |
15 | | :------------------------------------------------------------- | :------- | :---: | :--- |
16 | | AutoInt: Automatic Feature Interaction Learning via Self-Attentive Neural Networks | AutoInt | 2019 | [arXiv](https://arxiv.org/abs/1810.11921) |
17 | | Neural Oblivious Decision Ensembles for Deep Learning on Tabular Data | NODE | 2019 | [arXiv](https://arxiv.org/abs/1909.06312) |
18 | | DCN V2: Improved Deep & Cross Network and Practical Lessons for Web-scale Learning to Rank Systems | DCNv2 | 2020 | [arXiv](https://arxiv.org/abs/2008.13535) |
19 | | TabNet: Attentive Interpretable Tabular Learning | TabNet | 2020 | [arXiv](https://arxiv.org/abs/1908.07442) |
20 | | Contrastive Mixup: Self- and Semi-Supervised learning for Tabular Domain | VIME | 2021 | [arXiv](https://arxiv.org/abs/2108.12296) |
21 | | Revisiting Deep Learning Models for Tabular Data | FT-Transformer | 2021 | [arXiv](https://arxiv.org/abs/2106.11959) |
22 | | Saint: Improved neural networks for tabular data via row attention and contrastive pre-training | SAINT | 2021 | [arXiv](https://arxiv.org/abs/2106.01342) |
23 | | T2G-Former: Organizing Tabular Features into Relation Graphs Promotes Heterogeneous Feature Interaction | T2G-Former | 2022 | [arXiv](https://arxiv.org/abs/2211.16887) |
24 | | TabPFN: A Transformer That Solves Small Tabular Classification Problems in a Second | TabPFN | 2022 | [arXiv](https://arxiv.org/abs/2207.01848) |
25 | | ExcelFormer: A Neural Network Surpassing GBDTs on Tabular Data | ExcelFormer | 2023 | [arXiv](https://arxiv.org/abs/2301.02819) |
26 |
27 |
28 | ## Basic Framework
29 |
30 | The project is organized into several parts:
31 |
32 | - `data`: to include in-built dataset and benchmark files, dataset global settings and information, and common data preprocessing scripts.
33 |
34 | - `models`: to include baseline implementations; it contains an abstract class `TabModel` that organizes the uniform deep tabular model interface and training paradigm.
35 |
36 | - `configs`: to include the default hyper-parameters and hyper-parameter search spaces of the baselines from their original papers (the search-space format is sketched below).
37 |
38 | - `utils`: to include basic functionalities: `model` for building and tuning baselines, `deep` for common deep learning functions and optimizers, and `metrics` for metric calculation.
39 |
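Each entry in a search-space config (`configs/*.yaml`) declares a `type` (`const`, `int`, `uniform`, `loguniform`, or `categorical`) together with its `value`, `min`/`max`, or `choices`, and may carry a secondary range (`min2`/`max2`/`type2`) intended for large datasets. The snippet below is only an illustrative sketch of how such an entry could be sampled; the actual sampling logic lives in `tune` (`utils/model.py`) and may differ.

```
# illustrative sketch only: sampling one search-space entry from configs/*.yaml
import math
import random

def sample_entry(entry: dict, large_dataset: bool = False):
    s = '2' if large_dataset and 'type2' in entry else ''  # secondary range "for large datasets"
    kind = entry['type' + s]
    if kind == 'const':
        return entry['value' + s]
    if kind == 'int':
        return random.randint(entry['min' + s], entry['max' + s])
    if kind == 'uniform':
        return random.uniform(entry['min' + s], entry['max' + s])
    if kind == 'loguniform':
        lo, hi = entry['min' + s], entry['max' + s]
        return 10 ** random.uniform(math.log10(lo), math.log10(hi))
    if kind == 'categorical':
        return random.choice(entry['choices'])
    raise ValueError(f'unknown search type: {kind}')

print(sample_entry({'min': 1.0e-5, 'max': 1.0e-3, 'type': 'loguniform'}))  # e.g. a learning rate
```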
40 | ## Examples
41 |
42 | Some basic usage examples are provided in the `examples` directory; you can run the scripts with `python examples/script_name.py`. Before running the examples, you can download our prepared in-built datasets used in the [T2G-Former](https://arxiv.org/abs/2211.16887) experiments from this [link](https://drive.google.com/uc?export=download&id=1dIp78bZo0I0TJATmZKzBhrbZxFVzJYLR), then extract them to the `data/datasets` folder.
43 |
44 | ```
45 | mkdir -p ./data/datasets # create the directory if it does not exist
46 | tar -zxvf t2g-data0.tar.gz -C ./data/datasets
47 | ```
48 |
49 | - **Add a custom dataset from a single csv file**: If you want to load a csv file like the in-built datasets, we provide an interface that automatically processes the raw csv file and stores it in the package. Then you can load it easily.
50 |
51 | - **Finetune a baseline**: You can easily finetune a model with our `fit` and `predict` APIs (see the sketch after this list).
52 |
53 | - **Tune a baseline**: We provide an end-to-end `tune` function to perform hyper-parameter search in spaces defined in `configs`. You can also define your own search spaces (refer to our config files).
54 |
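The condensed sketch below (adapted from `examples/finetune_baseline.py`) shows the intended `fit`/`predict` flow; it assumes the in-built datasets above have been extracted to `data/datasets` and that the script is run from the repository root.

```
import torch
from data.processor import DataProcessor
from utils.model import seed_everything, make_baseline, load_config_from_file

seed_everything(42)
output_dir = 'results/mlp/adult'
configs = load_config_from_file('configs/default/mlp.yaml')
configs['training']['max_epochs'] = 100
configs['training']['batch_size'] = 128
configs['meta'] = {'save_path': output_dir}

dataset = DataProcessor.load_preproc_default(output_dir, 'mlp', 'adult', seed=0)
model = make_baseline(
    'mlp', configs['model'],
    n_num=dataset.n_num_features,
    cat_card=dataset.get_category_sizes('train') or None,  # None if no categorical features
    n_labels=dataset.n_classes or 1,                        # regression: n_classes is None
    device=torch.device('cuda'),
)
datas = DataProcessor.prepare(dataset, model)  # tensors per split: (X_num, X_cat, y)
y_std = dataset.y_info.get('std')              # for regression metrics

model.fit(
    X_num=datas['train'][0], X_cat=datas['train'][1], ys=datas['train'][2], y_std=y_std,
    eval_set=(datas['val'],), patience=8, task=dataset.task_type.value,
    training_args=configs['training'], meta_args=configs['meta'],
)
predictions, results = model.predict(
    X_num=datas['test'][0], X_cat=datas['test'][1], ys=datas['test'][2], y_std=y_std,
    task=dataset.task_type.value, return_probs=True, return_metric=True, return_loss=True,
)
```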
55 | ## Add your models
56 |
57 | Currently, you can only achieve this by manually copying your model code and integrating it into the `models` folder (refer to `models/mlp.py` for API alignment; we suggest copying it and adding your model code directly). Then modify `MODEL_CARDS` in `utils/model.py` to add and import your model. We will support adding user models with simple scripts in the future.
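Until then, the main requirement visible from `models/abstract.py` is that the wrapped `nn.Module` accepts the `(x_num, x_cat)` pair used by `default_dnn_fit`/`default_dnn_predict`. The skeleton below is a rough, hypothetical illustration of that signature (`MyTinyNet` is not part of the package; the authoritative reference remains `models/mlp.py`):

```
import torch
import torch.nn as nn

class MyTinyNet(nn.Module):
    """Hypothetical skeleton: forward takes (x_num, x_cat), as expected by
    default_dnn_fit / default_dnn_predict in models/abstract.py."""
    def __init__(self, n_num: int, n_labels: int, d_hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_num, d_hidden), nn.ReLU(), nn.Linear(d_hidden, n_labels)
        )

    def forward(self, x_num: torch.Tensor, x_cat=None):
        # this toy network ignores categorical features; a real model would embed x_cat
        return self.net(x_num).squeeze(-1)
```

The `TabModel` wrapper around such a network (construction via `make_baseline`, registration in `MODEL_CARDS`) should follow `models/mlp.py`.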
58 |
59 |
--------------------------------------------------------------------------------
/configs/autoint.yaml:
--------------------------------------------------------------------------------
1 | model:
2 | activation:
3 | value: "relu"
4 | type: "const"
5 | initialization:
6 | value: "kaiming"
7 | type: "const"
8 | n_heads:
9 | value: 2
10 | type: "const"
11 | prenormalization:
12 | value: false
13 | type: "const"
14 | attention_dropout:
15 | min: 0
16 | max: 0.5
17 | type: "uniform"
18 | d_token:
19 | min: 8
20 | max: 64
21 | type: "int"
22 | n_layers:
23 | min: 1
24 | max: 6
25 | type: "int"
26 | residual_dropout:
27 | min: 0
28 | max: 0.5
29 | type: "uniform"
30 | training:
31 | lr:
32 | min: 1.0e-5
33 | max: 1.0e-3
34 | type: "loguniform"
35 | min2: 3.0e-5
36 | max2: 3.0e-4
37 | type2: "loguniform"
38 | weight_decay:
39 | min: 1.0e-6
40 | max: 1.0e-3
41 | type: "loguniform"
42 | optimizer:
43 | value: "adamw"
44 | type: "const"
--------------------------------------------------------------------------------
/configs/dcnv2.yaml:
--------------------------------------------------------------------------------
1 | model:
2 | cross_dropout:
3 | min: 0
4 | max: 0.5
5 | type: "uniform"
6 | d:
7 | min: 64
8 | max: 512
9 | type: "int"
10 | min2: 64
11 | max2: 1024
12 | type2: "int"
13 | hidden_dropout:
14 | min: 0
15 | max: 0.5
16 | type: "uniform"
17 | n_cross_layers:
18 | min: 1
19 | max: 8
20 | type: "int"
21 | min2: 1
22 | max2: 16
23 | type2: "int"
24 | n_hidden_layers:
25 | min: 1
26 | max: 8
27 | type: "int"
28 | min2: 1
29 | max2: 16
30 | type2: "int"
31 | stacked:
32 | value: false
33 | type: "const"
34 | d_embedding:
35 | min: 64
36 | max: 512
37 | type: "int"
38 | training:
39 | lr:
40 | min: 1.0e-5
41 | max: 1.0e-2
42 | type: "loguniform"
43 | weight_decay:
44 | min: 1.0e-6
45 | max: 1.0e-3
46 | type: "loguniform"
47 | optimizer:
48 | value: "adamw"
49 | type: "const"
--------------------------------------------------------------------------------
/configs/default/autoint.yaml:
--------------------------------------------------------------------------------
1 | model:
2 | activation: "relu"
3 | initialization: "kaiming"
4 | n_heads: 2
5 | prenormalization: false
6 | attention_dropout: 0.2
7 | d_token: 32
8 | n_layers: 3
9 | residual_dropout: 0.2
10 | training:
11 | lr: 5.0e-4
12 | weight_decay: 2.0e-5
13 | optimizer: "adamw"
14 |
--------------------------------------------------------------------------------
/configs/default/dcnv2.yaml:
--------------------------------------------------------------------------------
1 | model:
2 | cross_dropout: 0.2
3 | d: 64
4 | hidden_dropout: 0.2
5 | n_cross_layers: 3
6 | n_hidden_layers: 3
7 | stacked: false
8 | d_embedding: 64
9 | training:
10 | lr: 5.0e-4
11 | weight_decay: 1.0e-6
12 | optimizer: "adamw"
13 |
--------------------------------------------------------------------------------
/configs/default/ft-transformer.yaml:
--------------------------------------------------------------------------------
1 | model:
2 | d_token: 64
3 | n_blocks: 3
4 | attention_dropout: 0.2
5 | ffn_d_factor: 1.33
6 | ffn_dropout: 0.3
7 | residual_dropout: 0.1
8 | training:
9 | lr: 5.0e-4
10 | weight_decay: 2.0e-5
11 | optimizer: "adamw"
12 |
--------------------------------------------------------------------------------
/configs/default/mlp.yaml:
--------------------------------------------------------------------------------
1 | model:
2 | n_layers: 3
3 | first_dim: 64
4 | mid_dim: 32
5 | last_dim: 64
6 | dropout: 0.2
7 | d_embedding: 128
8 |
9 | training:
10 | lr: 1.0e-3
11 | weight_decay: 0.0
12 | optimizer: "adamw"
13 |
--------------------------------------------------------------------------------
/configs/default/node.yaml:
--------------------------------------------------------------------------------
1 | model:
2 | bin_function: "entmoid15"
3 | choice_function: "entmax15"
4 | depth: 6
5 | layer_dim: 256
6 | num_layers: 4
7 | d_embedding: 256
8 | training:
9 | lr: 1.0e-3
10 | weight_decay: 0.0
11 | optimizer: "adam"
12 |
--------------------------------------------------------------------------------
/configs/ft-transformer.yaml:
--------------------------------------------------------------------------------
1 | model:
2 | d_token:
3 | min: 64
4 | max: 512
5 | type: "int"
6 |
7 | n_blocks:
8 | min: 1
9 | max: 4
10 | type: "int"
11 | # for large datasets
12 | min2: 1
13 | max2: 6
14 | type2: "int"
15 |
16 | attention_dropout:
17 | min: 0
18 | max: 0.5
19 | type: "uniform"
20 |
21 | ffn_d_factor:
22 | min: 0.66
23 | max: 2.66
24 | type: "uniform"
25 | value2: 1.33
26 | type2: "const"
27 |
28 | ffn_dropout:
29 | min: 0
30 | max: 0.5
31 | type: "uniform"
32 |
33 | residual_dropout:
34 | min: 0
35 | max: 0.2
36 | type: "uniform"
37 | value2: 0
38 | type2: "const"
39 |
40 | training:
41 | lr:
42 | min: 1.0e-5
43 | max: 1.0e-3
44 | type: "loguniform"
45 | min2: 3.0e-5
46 | max2: 3.0e-4
47 | type2: "loguniform"
48 |
49 | weight_decay:
50 | min: 1.0e-6
51 | max: 1.0e-3
52 | type: "loguniform"
53 |
54 | optimizer:
55 | value: "adamw"
56 | type: "const"
57 |
--------------------------------------------------------------------------------
/configs/mlp.yaml:
--------------------------------------------------------------------------------
1 | model:
2 | n_layers:
3 | min: 1
4 | max: 8
5 | type: "int"
6 | # for large datasets
7 | min2: 1
8 | max2: 16
9 | type2: "int"
10 |
11 | first_dim:
12 | min: 1
13 | max: 512
14 | type: "int"
15 | # for large datasets
16 | min2: 1
17 | max2: 1024
18 | type2: "int"
19 |
20 | mid_dim:
21 | min: 1
22 | max: 512
23 | type: "int"
24 | # for large datasets
25 | min2: 1
26 | max2: 1024
27 | type2: "int"
28 |
29 | last_dim:
30 | min: 1
31 | max: 512
32 | type: "int"
33 | # for large datasets
34 | min2: 1
35 | max2: 1024
36 | type2: "int"
37 |
38 | dropout:
39 | min: 0
40 | max: 0.5
41 | type: "uniform"
42 |
43 | d_embedding:
44 | min: 64
45 | max: 512
46 | type: "int"
47 |
48 | training:
49 | lr:
50 | min: 1.0e-5
51 | max: 1.0e-2
52 | type: "loguniform"
53 |
54 | weight_decay:
55 | value: 0.0
56 | type: "const"
57 |
58 | optimizer:
59 | value: "adamw"
60 | type: "const"
--------------------------------------------------------------------------------
/configs/node.yaml:
--------------------------------------------------------------------------------
1 | model:
2 | bin_function:
3 | value: "entmoid15"
4 | type: "const"
5 | choice_function:
6 | value: "entmax15"
7 | type: "const"
8 | depth:
9 | min: 6
10 | max: 8
11 | type: "int"
12 | layer_dim:
13 | choices: [128, 256, 512]
14 | type: "categorical"
15 | num_layers:
16 | choices: [2, 4, 8]
17 | type: "categorical"
18 | d_embedding:
19 | value: 256
20 | type: "const"
21 | training:
22 | lr:
23 | value: 1.0e-3
24 | type: "const"
25 | weight_decay:
26 | value: 0.0
27 | type: "const"
28 | optimizer:
29 | value: "adam"
30 | type: "const"
--------------------------------------------------------------------------------
/data/__init__.py:
--------------------------------------------------------------------------------
1 | from .env import *
--------------------------------------------------------------------------------
/data/custom_datasets/infos.json:
--------------------------------------------------------------------------------
1 | {
2 | "n_datasets": 0,
3 | "binclass": 0,
4 | "multiclass": 0,
5 | "regression": 0,
6 | "data_list": []
7 | }
--------------------------------------------------------------------------------
/data/env.py:
--------------------------------------------------------------------------------
1 | import os
2 | import json
3 | import shutil
4 | from pathlib import Path
5 | import numpy as np
6 |
7 | DATA = Path('data/datasets')
8 | CUSTOM_DATA = Path('data/custom_datasets') # user custom datasets
9 |
10 | BENCHMARKS = {
11 | 'ft-transformer': {
12 | 'path': 'data/benchmarks/ft-transformer',
13 | # The dataset list of the benchmark
14 | # if not appears in DATASETS or CUSTOM_DATASETS
15 | # it should be added to the 'path'
16 | 'datasets': [], # priority: DATASETS > CUSTOM_DATASETS > 'path'
17 | },
18 | 't2g-former': {
19 | 'path': 'data/benchmarks/t2g-former',
20 | 'datasets': ['california', 'eye', 'helena', 'otto']
21 | }
22 | }
23 |
24 | # available single datasets and specific DNN processing methods
25 | # default: `normalization: quantile`
26 | DATASETS = {
27 | 'adult': {'path': DATA / 'adult'},
28 | 'california': {'path': DATA / 'california'},
29 | 'churn': {'path': DATA / 'churn'},
30 | 'eye': {'path': DATA / 'eye', 'normalization': 'standard'},
31 | 'gesture': {'path': DATA / 'gesture'},
32 | 'helena': {'path': DATA / 'helena', 'normalization': 'standard'},
33 | 'higgs-small': {'path': DATA / 'higgs-small'},
34 | 'house': {'path': DATA / 'house'},
35 | 'jannis': {'path': DATA / 'jannis'},
36 | 'otto': {'path': DATA / 'otto', 'normalization': None},
37 | 'fb-comments': {'path': DATA / 'fb-comments'},
38 | 'year': {'path': DATA / 'year'},
39 | }
40 |
41 | CUSTOM_DATASETS = {}
42 |
43 | def read_custom_infos():
44 | with open(CUSTOM_DATA / 'infos.json', 'r') as f:
45 | custom_infos = json.load(f)
46 | return custom_infos
47 | # read `infos.json` and reload CUSTOM_DATASETS
48 | def reload_custom_infos():
49 | custom_infos = read_custom_infos()
50 | global CUSTOM_DATASETS
51 | CUSTOM_DATASETS = {
52 | info['name']: {
53 | 'path': CUSTOM_DATA / info['name'],
54 | 'task_type': info['task_type'],
55 | 'normalization': info.get('normalization', 'quantile')
56 | } for info in custom_infos['data_list']
57 | }
58 | reload_custom_infos()
59 |
60 | def write_custom_infos(infos):
61 | with open(CUSTOM_DATA / 'infos.json', 'w') as f:
62 | json.dump(infos, f, indent=4)
63 | reload_custom_infos()
64 |
65 | def push_custom_datasets(
66 | X_num, X_cat, ys, idx,
67 | info # TODO: add normalization field to info
68 | ):
69 | data_dir = CUSTOM_DATA / info['name']
70 | os.makedirs(data_dir)
71 | try:
72 | for spl in ['train', 'val', 'test']:
73 | np.save(data_dir / f'idx_{spl}.npy', idx[spl])
74 | if X_num is not None:
75 | np.save(data_dir / f'X_num_{spl}.npy', X_num[spl])
76 | if X_cat is not None:
77 | np.save(data_dir / f'X_cat_{spl}.npy', X_cat[spl])
78 | np.save(data_dir / f'y_{spl}.npy', ys[spl])
79 | with open(data_dir / 'info.json', 'w') as f:
80 | json.dump(info, f, indent=4)
81 | except Exception:
82 | print('failed to add custom dataset: ', info['name'])
83 | shutil.rmtree(data_dir)
84 | return
85 | custom_infos = read_custom_infos()
86 | custom_infos['data_list'].append({'name': info['name'], 'task_type': info['task_type']})
87 | custom_infos[info['task_type']] += 1
88 | custom_infos['n_datasets'] += 1
89 | write_custom_infos(custom_infos)
90 | print(f"push dataset: '{info['name']}' done")
91 |
92 | def available_datasets():
93 | return sorted(list(DATASETS.keys()) + list(CUSTOM_DATASETS.keys()))
--------------------------------------------------------------------------------
/data/processor.py:
--------------------------------------------------------------------------------
1 |
2 | from typing import List, Optional, Union, Literal
3 | from pathlib import Path
4 | import os
5 | import yaml
6 | import json
7 | import shutil
8 | import warnings
9 |
10 | import numpy as np
11 | import pandas as pd
12 | from collections import Counter
13 | from sklearn.preprocessing import OrdinalEncoder
14 | from sklearn.model_selection import train_test_split
15 |
16 | import torch.nn as nn
17 | from .env import (
18 | BENCHMARKS, DATASETS, CUSTOM_DATASETS,
19 | push_custom_datasets, read_custom_infos, write_custom_infos
20 | )
21 | from .utils import (
22 | Normalization, NumNanPolicy, CatNanPolicy, CatEncoding, YPolicy,
23 | CAT_MISSING_VALUE, ArrayDict, TensorDict, TaskType,
24 | Dataset, Transformations, prepare_tensors, build_dataset, transform_dataset
25 | )
26 | from models.abstract import TabModel
27 |
28 | DataFileType = Literal['csv', 'excel', 'npy', 'arff']
29 |
30 | class DataProcessor:
31 | """Base class to process a single dataset"""
32 | def __init__(
33 | self,
34 | normalization: Optional[Normalization] = None,
35 | num_nan_policy: Optional[NumNanPolicy] = None,
36 | cat_nan_policy: Optional[CatNanPolicy] = None,
37 | cat_min_frequency: Optional[float] = None,
38 | cat_encoding: Optional[CatEncoding] = None,
39 | y_policy: Optional[YPolicy] = 'default',
40 | seed: int = 42,
41 | cache_dir: Optional[str] = None,
42 | ):
43 | self.transformation = Transformations(
44 | seed=seed,
45 | normalization=normalization,
46 | num_nan_policy=num_nan_policy,
47 | cat_nan_policy=cat_nan_policy,
48 | cat_min_frequency=cat_min_frequency,
49 | cat_encoding=cat_encoding,
50 | y_policy=y_policy
51 | )
52 | self.cache_dir = cache_dir
53 |
54 | def apply(self, dataset: Dataset):
55 | return transform_dataset(dataset, self.transformation, self.cache_dir)
56 |
57 | def save(self, file, **kwargs):
58 | data_config = {
59 | 'transformation': vars(self.transformation),
60 | 'cache_dir': str(self.cache_dir),
61 | 'meta': kwargs,
62 | }
63 | with open(file, 'w') as f:
64 | yaml.dump(data_config, f, indent=2)
65 |
66 | @staticmethod
67 | def check_splits(dataset: Dataset):
68 | valid_splits = True
69 | if 'train' in dataset.y:
70 | if 'test' not in dataset.y:
71 | warnings.warn("Missing test split, unable to prediction")
72 | valid_splits = False
73 | if 'val' not in dataset.y:
74 | warnings.warn("Missing dev split, unable to early stop, or ignore this message if no early stop needed.")
75 | valid_splits = False
76 | if valid_splits:
77 | print("ready for training!")
78 | else:
79 | raise ValueError("Missing training split in the dataset")
80 |
81 | @staticmethod
82 | def prepare(dataset: Dataset, model: Optional[TabModel] = None, device: str = 'cuda'):
83 | assert model is not None or device is not None
84 | def get_spl(X: Optional[Union[ArrayDict, TensorDict]], spl):
85 | return None if X is None else X[spl]
86 | if device is not None or isinstance(model.model, nn.Module):
87 | device = device or model.model.device
88 | X_num, X_cat, ys = prepare_tensors(dataset, device)
89 | return {spl: (
90 | get_spl(X_num, spl),
91 | get_spl(X_cat, spl),
92 | get_spl(ys, spl)
93 | ) for spl in ys}
94 | else:
95 | return {spl: (
96 | get_spl(dataset.X_num, spl),
97 | get_spl(dataset.X_cat, spl),
98 | get_spl(dataset.y, spl)
99 | ) for spl in dataset.y}
100 |
101 | @staticmethod
102 | def load_preproc_default(
103 | output_dir, # output preprocessing infos
104 | model_name,
105 | dataset_name,
106 | benchmark_name: Optional[str] = None,
107 | seed: int = 42,
108 | cache_dir: Optional[str] = None
109 | ):
110 | global DATASETS, CUSTOM_DATASETS
111 | """default data preprocessing pipeline"""
112 | if dataset_name in DATASETS or dataset_name in CUSTOM_DATASETS:
113 | data_src = DATASETS if dataset_name in DATASETS else CUSTOM_DATASETS
114 | data_config = data_src[dataset_name]
115 | data_path = Path(data_config['path'])
116 | data_config.setdefault('normalization', 'quantile')
117 | normalization = data_config['normalization']
118 | elif benchmark_name is not None:
119 | assert benchmark_name in BENCHMARKS, f"Benchmark '{benchmark_name}' is not included, \
120 | please choose one of '{list(BENCHMARKS.keys())}', or include your benchmark manually."
121 | benchmark_info = BENCHMARKS[benchmark_name]
122 | assert dataset_name in benchmark_info['datasets'], f"dataset '{dataset_name}' not in benchmark '{benchmark_name}'"
123 | data_path = Path(benchmark_info['path']) / dataset_name
124 | normalization = 'quantile'
125 | else:
126 | raise ValueError(f"No dataset '{dataset_name}' is available, \
127 | if you want to use a custom dataset (from a csv file), use `add_custom_dataset`")
128 |
129 | dataset = Dataset.from_dir(data_path)
130 | # default preprocess settings
131 | num_nan_policy = 'mean' if dataset.X_num is not None and \
132 | any(np.isnan(dataset.X_num[spl]).any() for spl in dataset.X_num) else None
133 | cat_nan_policy = None
134 | if model_name in ['xgboost', 'catboost', 'lightgbm']: # for tree models or other sklearn algorithms
135 | normalization = None
136 | cat_min_frequency = None
137 | cat_encoding = 'one-hot'
138 | if model_name in ['catboost']:
139 | cat_encoding = None
140 | else: # for dnns
141 | # BUG: (dataset.X_cat[spl] == CAT_MISSING_VALUE).any() has different action
142 | # dtype: int -> bool, dtype: string -> array[bool], dtype: object -> np.load error
143 | # CURRENT: uniformly using string type to store categorical features
144 | if dataset.X_cat is not None and \
145 | any((dataset.X_cat[spl] == CAT_MISSING_VALUE).any() for spl in dataset.X_cat):
146 | cat_nan_policy = 'most_frequent'
147 | cat_min_frequency = None
148 | cat_encoding = None
149 | cache_dir = cache_dir or data_path
150 | processor = DataProcessor(
151 | normalization=normalization,
152 | num_nan_policy=num_nan_policy,
153 | cat_nan_policy=cat_nan_policy,
154 | cat_min_frequency=cat_min_frequency,
155 | cat_encoding=cat_encoding,
156 | seed=seed,
157 | cache_dir=Path(cache_dir),
158 | )
159 | dataset = processor.apply(dataset)
160 | # check train, val, test splits
161 | DataProcessor.check_splits(dataset)
162 | # save preprocessing infos
163 | if not os.path.exists(output_dir):
164 | os.makedirs(output_dir)
165 | processor.save(
166 | Path(output_dir) / 'data_config.yaml',
167 | benchmark=str(benchmark_name),
168 | dataset=dataset_name
169 | )
170 | return dataset
171 |
172 | @staticmethod
173 | def split(
174 | X_num: Optional[np.ndarray] = None,
175 | X_cat: Optional[np.ndarray] = None,
176 | ys: np.ndarray = None,
177 | train_ratio: float = 0.8,
178 | stratify: bool = True,
179 | seed: int = 42,
180 | ):
181 | assert 0 < train_ratio < 1
182 | assert ys is not None
183 | sample_idx = np.arange(len(ys))
184 | test_ratio = 1 - train_ratio
185 | _stratify = None if not stratify else ys
186 | train_idx, test_idx = train_test_split(sample_idx, test_size=test_ratio, random_state=seed, stratify=_stratify)
187 | _stratify = None if not stratify else ys[train_idx]
188 | train_idx, val_idx = train_test_split(train_idx, test_size=test_ratio, random_state=seed, stratify=_stratify)
189 | if X_num is not None:
190 | X_num = {'train': X_num[train_idx], 'val': X_num[val_idx], 'test': X_num[test_idx]}
191 | if X_cat is not None:
192 | X_cat = {'train': X_cat[train_idx], 'val': X_cat[val_idx], 'test': X_cat[test_idx]}
193 | ys = {'train': ys[train_idx], 'val': ys[val_idx], 'test': ys[test_idx]}
194 | idx = {'train': train_idx, 'val': val_idx, 'test': test_idx}
195 | return X_num, X_cat, ys, idx
196 |
197 | @staticmethod
198 | def del_custom_dataset(
199 | dataset_names: Union[str, List[str]]
200 | ):
201 | global DATASETS, CUSTOM_DATASETS
202 | all_infos = read_custom_infos()
203 | if isinstance(dataset_names, str):
204 | dataset_names = [dataset_names]
205 | for dataset_name in dataset_names:
206 | if dataset_name not in CUSTOM_DATASETS:
207 | print(f"custom dataset: {dataset_name} not exist")
208 | continue
209 | elif dataset_name in DATASETS:
210 | print(f"can not delete an in-built dataset: {dataset_name}")
211 | continue
212 | data_info = CUSTOM_DATASETS[dataset_name]
213 | task = data_info['task_type']
214 | data_path = data_info['path']
215 | data_idx = [info['name'] for info in all_infos['data_list']].index(dataset_name)
216 | all_infos['data_list'].pop(data_idx)
217 | all_infos['n_datasets'] -= 1
218 | all_infos[task] -= 1
219 | shutil.rmtree(data_path)
220 | print(f"delete dataset: {dataset_name} successfully")
221 | write_custom_infos(all_infos)
222 | from .env import CUSTOM_DATASETS # BUG: refresh the global variable
223 |
224 | @staticmethod
225 | def add_custom_dataset(
226 | file: Union[str, Path],
227 | format: DataFileType = 'csv',
228 | dataset_name: Optional[str] = None,
229 | task: Optional[str] = None,
230 | num_cols: Optional[List[int]] = None,
231 | cat_cols: Optional[List[int]] = None,
232 | label_index: int = -1, # label column index
233 | header: Optional[int] = 0, # header row
234 | max_cat_num: int = 16,
235 | train_ratio: float = 0.8, # split train / test, train / val
236 | seed: float = 42, # random split seed
237 | ):
238 | """
239 | Support for adding a custom dataset from a single data file
240 | ---
241 | read a raw csv file, process into 3 splits (train, val, test), and add to custom_datasets
242 |
243 | TODO: adding a dataset from prepared data split files
244 | TODO: support no validation split
245 | """
246 | global DATASETS, CUSTOM_DATASETS
247 | file_name = Path(file).name
248 | assert file_name.endswith(format), f'please check if the file \
249 | is in {format} format, or add the suffix manually'
250 | dataset_name = dataset_name or file_name[:-len(format)-1]
251 | assert dataset_name not in DATASETS, f'same dataset name as an in-built dataset: {dataset_name}'
252 | assert dataset_name not in CUSTOM_DATASETS, f"existing custom dataset '{dataset_name}' found"
253 |
254 | if format == 'csv':
255 | datas: pd.DataFrame = pd.read_csv(file, header=header)
256 | columns = datas.columns if header is not None else None
257 | elif format == 'npy':
258 | header = None # numpy file has no headers
259 | columns = None
260 | datas = np.load(file)
261 | raise NotImplementedError("only csv loading is supported now")
262 | else:
263 | raise ValueError("other formats will be supported later")
264 |
265 | X_idx = list(range(datas.shape[1]))
266 | y_idx = X_idx.pop(label_index)
267 | label_name = columns[y_idx] if columns is not None else None
268 | # numerical and categorical feature detection
269 | if num_cols is None or cat_cols is None:
270 | print('automatically detect column type...')
271 | print('max category amount: ', max_cat_num)
272 | num_cols, cat_cols = [], []
273 | num_names, cat_names = [], []
274 | for i in X_idx:
275 | if datas.iloc[:, i].values.dtype == float:
276 | num_cols.append(i)
277 | if columns is not None:
278 | num_names.append(columns[i])
279 | else: # int or object (str)
280 | if len(set(datas.iloc[:, i].values)) <= max_cat_num:
281 | cat_cols.append(i)
282 | if columns is not None:
283 | cat_names.append(columns[i])
284 | elif datas.iloc[:, i].values.dtype == int:
285 | num_cols.append(i)
286 | if columns is not None:
287 | num_names.append(columns[i])
288 | if not num_names and not cat_names:
289 | num_names, cat_names = None, None
290 | elif columns:
291 | num_names = [columns[i] for i in num_cols]
292 | cat_names = [columns[i] for i in cat_cols]
293 | else:
294 | num_names, cat_names = None, None
295 | n_num_features = len(num_cols)
296 | n_cat_features = len(cat_cols)
297 | # build X_num and X_cat
298 | X_num, ys = None, datas.iloc[:, y_idx].values
299 | if len(num_cols) > 0:
300 | X_num = datas.iloc[:, num_cols].values.astype(np.float32)
301 | # check data type
302 | X_cat = []
303 | for i in cat_cols:
304 | if datas.iloc[:, i].values.dtype == int:
305 | x = datas.iloc[:, i].values.astype(np.int64)
306 | # ordered by value
307 | # x = OrdinalEncoder(categories=[sorted(list(set(x)))]).fit_transform(x.reshape(-1, 1))
308 | else: # string object
309 | x = datas.iloc[:, i].values.astype(object)
310 | # most_common = [item[0] for item in Counter(x).most_common()]
311 | # ordered by frequency
312 | # x = OrdinalEncoder(categories=[most_common]).fit_transform(x.reshape(-1, 1))
313 | X_cat.append(x.astype(np.str_)) # encode later, compatible with Line 140
314 | X_cat = np.stack(X_cat, axis=1) if len(X_cat) > 0 else None # if using OrdinalEncoder, np.concatenate
315 | # detect task type
316 | def process_non_regression_labels(ys: np.ndarray, task):
317 | if ys.dtype in [int, float]:
318 | ys = OrdinalEncoder(categories=[sorted(list(set(ys)))]).fit_transform(ys.reshape(-1, 1))
319 | else:
320 | most_common = [item[0] for item in Counter(ys).most_common()]
321 | ys = OrdinalEncoder(categories=[most_common]).fit_transform(ys.reshape(-1, 1))
322 | ys = ys[:, 0]
323 | return ys.astype(np.float32) if task == 'binclass' else ys.astype(np.int64)
324 |
325 | if task is None:
326 | if ys.dtype in [int, object]:
327 | task = 'binclass' if len(set(ys)) == 2 else 'multiclass'
328 | ys = process_non_regression_labels(ys, task)
329 | elif ys.dtype == float:
330 | if len(set(ys)) == 2:
331 | task = 'binclass'
332 | ys = process_non_regression_labels(ys, task)
333 | else:
334 | task = 'regression'
335 | ys = ys.astype(np.float32)
336 | else:
337 | if task == 'regression':
338 | ys = ys.astype(np.float32)
339 | else:
340 | ys = process_non_regression_labels(ys, task)
341 |
342 | # split datasets
343 | stratify = task != 'regression'
344 | X_num, X_cat, ys, idx = DataProcessor.split(X_num, X_cat, ys, train_ratio, stratify, seed)
345 | # push to CUSTOM_DATASETS
346 | data_info = {
347 | 'name': dataset_name,
348 | 'id': f'{dataset_name.lower()}--custom',
349 | 'task_type': task,
350 | 'label_name': label_name,
351 | 'n_num_features': n_num_features,
352 | 'num_feature_names': num_names,
353 | 'n_cat_features': n_cat_features,
354 | 'cat_feature_names': cat_names,
355 | 'test_size': len(ys['test']),
356 | 'train_size': len(ys['train']),
357 | 'val_size': len(ys['val'])}
358 | push_custom_datasets(X_num, X_cat, ys, idx, data_info)
359 | from .env import CUSTOM_DATASETS # refresh global variable
360 | print(f"finished, now you can load your dataset with `load_preproc_default(output_dir, model_name, '{dataset_name}')`")
361 |
362 | class BenchmarkProcessor:
363 | """Prepare datasets in the Literatures"""
364 | def __init__(self) -> None:
365 | pass
--------------------------------------------------------------------------------
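For quick orientation, the custom-dataset registry above is typically exercised as follows (mirroring `examples/add_custom_dataset.py` further below); `my_table.csv` is a placeholder for a csv file of your own:

```
from data import available_datasets
from data.processor import DataProcessor

print(available_datasets())                       # in-built + custom datasets
DataProcessor.add_custom_dataset('my_table.csv')  # split, save under data/custom_datasets/, register
print(available_datasets())                       # now includes 'my_table'
DataProcessor.del_custom_dataset('my_table')      # unregister and delete the stored files
```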
/data/utils.py:
--------------------------------------------------------------------------------
1 | """
2 | References:
3 | - https://github.com/yandex-research/tabular-dl-num-embeddings/blob/main/lib/data.py
4 | - https://github.com/yandex-research/tabular-dl-num-embeddings/blob/main/lib/util.py
5 | """
6 | import json
7 | import enum
8 | import pickle
9 | import hashlib
10 | from collections import Counter
11 | from copy import deepcopy
12 | from dataclasses import astuple, dataclass, replace
13 | from pathlib import Path
14 | from typing import Any, Optional, Union, cast, Dict, List, Tuple
15 | try:
16 | from typing import Literal
17 | except ImportError:
18 | from typing_extensions import Literal
19 |
20 | import numpy as np
21 | import pandas as pd
22 | import sklearn.preprocessing
23 | import torch
24 | from category_encoders import LeaveOneOutEncoder
25 | from sklearn.impute import SimpleImputer
26 | from sklearn.preprocessing import StandardScaler
27 |
28 |
29 | ArrayDict = Dict[str, np.ndarray]
30 | TensorDict = Dict[str, torch.Tensor]
31 |
32 |
33 | CAT_MISSING_VALUE = '__nan__'
34 | CAT_RARE_VALUE = '__rare__'
35 | Normalization = Literal['standard', 'quantile']
36 | NumNanPolicy = Literal['drop-rows', 'mean']
37 | CatNanPolicy = Literal['most_frequent']
38 | CatEncoding = Literal['one-hot', 'counter']
39 | YPolicy = Literal['default']
40 |
41 | class TaskType(enum.Enum):
42 | BINCLASS = 'binclass'
43 | MULTICLASS = 'multiclass'
44 | REGRESSION = 'regression'
45 |
46 | def __str__(self) -> str:
47 | return self.value
48 |
49 |
50 | def raise_unknown(unknown_what: str, unknown_value: Any):
51 | raise ValueError(f'Unknown {unknown_what}: {unknown_value}')
52 |
53 | def load_json(path: Union[Path, str], **kwargs) -> Any:
54 | return json.loads(Path(path).read_text(), **kwargs)
55 |
56 | def dump_json(x: Any, path: Union[Path, str], **kwargs) -> None:
57 | kwargs.setdefault('indent', 4)
58 | Path(path).write_text(json.dumps(x, **kwargs) + '\n')
59 |
60 | def load_pickle(path: Union[Path, str], **kwargs) -> Any:
61 | return pickle.loads(Path(path).read_bytes(), **kwargs)
62 |
63 | def dump_pickle(x: Any, path: Union[Path, str], **kwargs) -> None:
64 | Path(path).write_bytes(pickle.dumps(x, **kwargs))
65 |
66 |
67 | class StandardScaler1d(StandardScaler):
68 | def partial_fit(self, X, *args, **kwargs):
69 | assert X.ndim == 1
70 | return super().partial_fit(X[:, None], *args, **kwargs)
71 |
72 | def transform(self, X, *args, **kwargs):
73 | assert X.ndim == 1
74 | return super().transform(X[:, None], *args, **kwargs).squeeze(1)
75 |
76 | def inverse_transform(self, X, *args, **kwargs):
77 | assert X.ndim == 1
78 | return super().inverse_transform(X[:, None], *args, **kwargs).squeeze(1)
79 |
80 |
81 | def get_category_sizes(X: Union[torch.Tensor, np.ndarray]) -> List[int]:
82 | XT = X.T.cpu().tolist() if isinstance(X, torch.Tensor) else X.T.tolist()
83 | return [len(set(x)) for x in XT]
84 |
85 |
86 | @dataclass(frozen=True)
87 | class Dataset:
88 | X_num: Optional[ArrayDict]
89 | X_cat: Optional[ArrayDict]
90 | y: ArrayDict
91 | y_info: Dict[str, Any]
92 | task_type: TaskType
93 | n_classes: Optional[int]
94 | name: Optional[str]
95 |
96 | @classmethod
97 | def from_dir(cls, dir_: Union[Path, str]) -> 'Dataset':
98 | dir_ = Path(dir_)
99 |
100 | def load(item) -> ArrayDict:
101 | def _load(file: Path):
102 | return cast(np.ndarray, np.load(file)) if file.exists() else None
103 | return {
104 | x: _load(dir_ / f'{item}_{x}.npy')
105 | for x in ['train', 'val', 'test']
106 | }
107 |
108 | info = load_json(dir_ / 'info.json')
109 |
110 | return Dataset(
111 | load('X_num') if dir_.joinpath('X_num_train.npy').exists() else None,
112 | load('X_cat') if dir_.joinpath('X_cat_train.npy').exists() else None,
113 | load('y'),
114 | {},
115 | TaskType(info['task_type']),
116 | info.get('n_classes'),
117 | info.get('name'),
118 | )
119 |
120 | @property
121 | def is_binclass(self) -> bool:
122 | return self.task_type == TaskType.BINCLASS
123 |
124 | @property
125 | def is_multiclass(self) -> bool:
126 | return self.task_type == TaskType.MULTICLASS
127 |
128 | @property
129 | def is_regression(self) -> bool:
130 | return self.task_type == TaskType.REGRESSION
131 |
132 | @property
133 | def n_num_features(self) -> int:
134 | return 0 if self.X_num is None else self.X_num['train'].shape[1]
135 |
136 | @property
137 | def n_cat_features(self) -> int:
138 | return 0 if self.X_cat is None else self.X_cat['train'].shape[1]
139 |
140 | @property
141 | def n_features(self) -> int:
142 | return self.n_num_features + self.n_cat_features
143 |
144 | def size(self, part: Optional[str]) -> int:
145 | return sum(map(len, self.y.values())) if part is None else len(self.y[part])
146 |
147 | @property
148 | def nn_output_dim(self) -> int:
149 | if self.is_multiclass:
150 | assert self.n_classes is not None
151 | return self.n_classes
152 | else:
153 | return 1
154 |
155 | def get_category_sizes(self, part: str) -> List[int]:
156 | return [] if self.X_cat is None else get_category_sizes(self.X_cat[part])
157 |
158 |
159 | def num_process_nans(dataset: Dataset, policy: Optional[NumNanPolicy]) -> Dataset:
160 | assert dataset.X_num is not None
161 | nan_masks = {k: np.isnan(v) for k, v in dataset.X_num.items()}
162 | if not any(x.any() for x in nan_masks.values()): # type: ignore[code]
163 | assert policy is None
164 | return dataset
165 |
166 | assert policy is not None
167 | if policy == 'drop-rows':
168 | valid_masks = {k: ~v.any(1) for k, v in nan_masks.items()}
169 | assert valid_masks[
170 | 'test'
171 | ].all(), 'Cannot drop test rows, since this will affect the final metrics.'
172 | new_data = {}
173 | for data_name in ['X_num', 'X_cat', 'y']:
174 | data_dict = getattr(dataset, data_name)
175 | if data_dict is not None:
176 | new_data[data_name] = {
177 | k: v[valid_masks[k]] for k, v in data_dict.items()
178 | }
179 | dataset = replace(dataset, **new_data)
180 | elif policy == 'mean':
181 | new_values = np.nanmean(dataset.X_num['train'], axis=0)
182 | X_num = deepcopy(dataset.X_num)
183 | for k, v in X_num.items():
184 | num_nan_indices = np.where(nan_masks[k])
185 | v[num_nan_indices] = np.take(new_values, num_nan_indices[1])
186 | dataset = replace(dataset, X_num=X_num)
187 | else:
188 | assert raise_unknown('policy', policy)
189 | return dataset
190 |
191 |
192 | # Inspired by: https://github.com/Yura52/rtdl/blob/a4c93a32b334ef55d2a0559a4407c8306ffeeaee/lib/data.py#L20
193 | def normalize(
194 | X: ArrayDict, normalization: Normalization, seed: Optional[int]
195 | ) -> ArrayDict:
196 | X_train = X['train']
197 | if normalization == 'standard':
198 | normalizer = sklearn.preprocessing.StandardScaler()
199 | elif normalization == 'quantile':
200 | normalizer = sklearn.preprocessing.QuantileTransformer(
201 | output_distribution='normal',
202 | n_quantiles=max(min(X['train'].shape[0] // 30, 1000), 10),
203 | subsample=1e9,
204 | random_state=seed,
205 | )
206 | noise = 1e-3
207 | if noise > 0:
208 | assert seed is not None
209 | stds = np.std(X_train, axis=0, keepdims=True)
210 | noise_std = noise / np.maximum(stds, noise) # type: ignore[code]
211 | X_train = X_train + noise_std * np.random.default_rng(seed).standard_normal(
212 | X_train.shape
213 | )
214 | else:
215 | raise_unknown('normalization', normalization)
216 | normalizer.fit(X_train)
217 | return {k: normalizer.transform(v) for k, v in X.items()} # type: ignore[code]
218 |
219 |
220 | def cat_process_nans(X: ArrayDict, policy: Optional[CatNanPolicy]) -> ArrayDict:
221 | assert X is not None
222 | nan_masks = {k: v == CAT_MISSING_VALUE for k, v in X.items()}
223 | if any(x.any() for x in nan_masks.values()): # type: ignore[code]
224 | if policy is None:
225 | X_new = X
226 | elif policy == 'most_frequent':
227 | imputer = SimpleImputer(missing_values=CAT_MISSING_VALUE, strategy=policy) # type: ignore[code]
228 | imputer.fit(X['train'])
229 | X_new = {k: cast(np.ndarray, imputer.transform(v)) for k, v in X.items()}
230 | else:
231 | raise_unknown('categorical NaN policy', policy)
232 | else:
233 | assert policy is None
234 | X_new = X
235 | return X_new
236 |
237 |
238 | def cat_drop_rare(X: ArrayDict, min_frequency: float) -> ArrayDict:
239 | assert 0.0 < min_frequency < 1.0
240 | min_count = round(len(X['train']) * min_frequency)
241 | X_new = {x: [] for x in X}
242 | for column_idx in range(X['train'].shape[1]):
243 | counter = Counter(X['train'][:, column_idx].tolist())
244 | popular_categories = {k for k, v in counter.items() if v >= min_count}
245 | for part in X_new:
246 | X_new[part].append(
247 | [
248 | (x if x in popular_categories else CAT_RARE_VALUE)
249 | for x in X[part][:, column_idx].tolist()
250 | ]
251 | )
252 | return {k: np.array(v).T for k, v in X_new.items()}
253 |
254 |
255 | def cat_encode(
256 | X: ArrayDict,
257 | encoding: Optional[CatEncoding],
258 | y_train: Optional[np.ndarray],
259 | seed: Optional[int],
260 | ) -> Tuple[ArrayDict, bool]: # (X, is_converted_to_numerical)
261 | if encoding != 'counter':
262 | y_train = None
263 |
264 | # Step 1. Map strings to 0-based ranges
265 | unknown_value = np.iinfo('int64').max - 3
266 | encoder = sklearn.preprocessing.OrdinalEncoder(
267 | handle_unknown='use_encoded_value', # type: ignore[code]
268 | unknown_value=unknown_value, # type: ignore[code]
269 | dtype='int64', # type: ignore[code]
270 | ).fit(X['train'])
271 | X = {k: encoder.transform(v) for k, v in X.items()}
272 | max_values = X['train'].max(axis=0)
273 | for part in ['val', 'test']:
274 | for column_idx in range(X[part].shape[1]):
275 | X[part][X[part][:, column_idx] == unknown_value, column_idx] = (
276 | max_values[column_idx] + 1
277 | )
278 |
279 | # Step 2. Encode.
280 | if encoding is None:
281 | return (X, False)
282 | elif encoding == 'one-hot':
283 | encoder = sklearn.preprocessing.OneHotEncoder(
284 | handle_unknown='ignore', sparse=False, dtype=np.float32 # type: ignore[code]
285 | )
286 | encoder.fit(X['train'])
287 | return ({k: encoder.transform(v) for k, v in X.items()}, True) # type: ignore[code]
288 | elif encoding == 'counter':
289 | assert y_train is not None
290 | assert seed is not None
291 | encoder = LeaveOneOutEncoder(sigma=0.1, random_state=seed, return_df=False)
292 | encoder.fit(X['train'], y_train)
293 | X = {k: encoder.transform(v).astype('float32') for k, v in X.items()} # type: ignore[code]
294 | if not isinstance(X['train'], pd.DataFrame):
295 | X = {k: v.values for k, v in X.items()} # type: ignore[code]
296 | return (X, True) # type: ignore[code]
297 | else:
298 | raise_unknown('encoding', encoding)
299 |
300 |
301 | def build_target(
302 | y: ArrayDict, policy: Optional[YPolicy], task_type: TaskType
303 | ) -> Tuple[ArrayDict, Dict[str, Any]]:
304 | info: Dict[str, Any] = {'policy': policy}
305 | if policy is None:
306 | pass
307 | elif policy == 'default':
308 | if task_type == TaskType.REGRESSION:
309 | mean, std = float(y['train'].mean()), float(y['train'].std())
310 | y = {k: (v - mean) / std for k, v in y.items()}
311 | info['mean'] = mean
312 | info['std'] = std
313 | else:
314 | raise_unknown('policy', policy)
315 | return y, info
316 |
317 |
318 | @dataclass(frozen=True)
319 | class Transformations:
320 | seed: int = 0
321 | normalization: Optional[Normalization] = None
322 | num_nan_policy: Optional[NumNanPolicy] = None
323 | cat_nan_policy: Optional[CatNanPolicy] = None
324 | cat_min_frequency: Optional[float] = None
325 | cat_encoding: Optional[CatEncoding] = None
326 | y_policy: Optional[YPolicy] = 'default'
327 |
328 |
329 | def transform_dataset(
330 | dataset: Dataset,
331 | transformations: Transformations,
332 | cache_dir: Optional[Path],
333 | ) -> Dataset:
334 | # WARNING: the order of transformations matters. Moreover, the current
335 | # implementation is not ideal in that sense.
336 | if cache_dir is not None:
337 | transformations_md5 = hashlib.md5(
338 | str(transformations).encode('utf-8')
339 | ).hexdigest()
340 | transformations_str = '__'.join(map(str, astuple(transformations)))
341 | cache_path = (
342 | cache_dir / f'cache__{transformations_str}__{transformations_md5}.pickle'
343 | )
344 | if cache_path.exists():
345 | cache_transformations, value = load_pickle(cache_path)
346 | if transformations == cache_transformations:
347 | print(
348 | f"Using cached features: {cache_dir.name + '/' + cache_path.name}"
349 | )
350 | return value
351 | else:
352 | raise RuntimeError(f'Hash collision for {cache_path}')
353 | else:
354 | cache_path = None
355 |
356 | if dataset.X_num is not None:
357 | dataset = num_process_nans(dataset, transformations.num_nan_policy)
358 |
359 | X_num = dataset.X_num
360 | if dataset.X_cat is None:
361 | transformations = replace(transformations, cat_nan_policy=None, cat_min_frequency=None, cat_encoding=None)
362 | # assert transformations.cat_nan_policy is None
363 | # assert transformations.cat_min_frequency is None
364 | # assert transformations.cat_encoding is None
365 | X_cat = None
366 | else:
367 | X_cat = cat_process_nans(dataset.X_cat, transformations.cat_nan_policy)
368 | if transformations.cat_min_frequency is not None:
369 | X_cat = cat_drop_rare(X_cat, transformations.cat_min_frequency)
370 | X_cat, is_num = cat_encode(
371 | X_cat,
372 | transformations.cat_encoding,
373 | dataset.y['train'],
374 | transformations.seed,
375 | )
376 | if is_num:
377 | X_num = (
378 | X_cat
379 | if X_num is None
380 | else {x: np.hstack([X_num[x], X_cat[x]]) for x in X_num}
381 | )
382 | X_cat = None
383 |
384 | if X_num is not None and transformations.normalization is not None:
385 | X_num = normalize(X_num, transformations.normalization, transformations.seed)
386 |
387 | y, y_info = build_target(dataset.y, transformations.y_policy, dataset.task_type)
388 |
389 | dataset = replace(dataset, X_num=X_num, X_cat=X_cat, y=y, y_info=y_info)
390 | if cache_path is not None:
391 | dump_pickle((transformations, dataset), cache_path)
392 | return dataset
393 |
394 |
395 | def build_dataset(
396 | path: Union[str, Path], transformations: Transformations, cache: bool
397 | ) -> Dataset:
398 | path = Path(path)
399 | dataset = Dataset.from_dir(path)
400 | return transform_dataset(dataset, transformations, path if cache else None)
401 |
402 |
403 | def prepare_tensors(
404 | dataset: Dataset, device: Union[str, torch.device]
405 | ) -> Tuple[Optional[TensorDict], Optional[TensorDict], TensorDict]:
406 | if isinstance(device, str):
407 | device = torch.device(device)
408 | X_num, X_cat, Y = (
409 | None if x is None else {k: torch.as_tensor(v) for k, v in x.items()}
410 | for x in [dataset.X_num, dataset.X_cat, dataset.y]
411 | )
412 | if device.type != 'cpu':
413 | X_num, X_cat, Y = (
414 | None if x is None else {k: v.to(device) for k, v in x.items()}
415 | for x in [X_num, X_cat, Y]
416 | )
417 | assert X_num is not None
418 | assert Y is not None
419 | if not dataset.is_multiclass:
420 | Y = {k: v.float() for k, v in Y.items()}
421 | return X_num, X_cat, Y
422 |
--------------------------------------------------------------------------------
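To make `Dataset.from_dir` / `build_dataset` concrete: a dataset directory is expected to contain per-split numpy files plus an `info.json` (a sketch inferred from the loaders above; `X_num_*` and `X_cat_*` are optional, and the snippet assumes the in-built datasets have been extracted as described in the README):

```
# data/datasets/<name>/
#   X_num_train.npy  X_num_val.npy  X_num_test.npy   # optional numerical features
#   X_cat_train.npy  X_cat_val.npy  X_cat_test.npy   # optional categorical features (stored as strings)
#   y_train.npy      y_val.npy      y_test.npy
#   info.json                                        # at least {"task_type": ...}, plus n_classes / name
from data.utils import Dataset, Transformations, build_dataset

raw = Dataset.from_dir('data/datasets/california')   # raw arrays, no preprocessing
prepared = build_dataset(                            # preprocessed (and cached) version
    'data/datasets/california',
    Transformations(normalization='quantile'),
    cache=True,
)
```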
/examples/add_custom_dataset.py:
--------------------------------------------------------------------------------
1 | # add your custom datasets from csv files
2 | import os
3 | import sys
4 | sys.path.append(os.getcwd())
5 | from data import available_datasets
6 | from data.processor import DataProcessor
7 |
8 | if __name__ == '__main__':
9 | # my_csv_file = 'examples/[kaggle]Assay of serum free light chain.csv' # binclass
10 | # print('available datasets: ', available_datasets())
11 | # DataProcessor.add_custom_dataset(my_csv_file) # add a dataset
12 | # print('available datasets: ', available_datasets())
13 | # dataset = DataProcessor.load_preproc_default('result/test', 'ft-transformer', '[kaggle]Assay of serum free light chain')
14 | # DataProcessor.del_custom_dataset("[kaggle]Assay of serum free light chain") # remove a dataset
15 | print('available datasets: ', available_datasets())
16 | my_csv_file = 'examples/[openml]bodyfat.csv' # regression
17 | DataProcessor.add_custom_dataset(my_csv_file) # add
18 | print('available datasets: ', available_datasets())
19 | dataset = DataProcessor.load_preproc_default('result/test', 'ft-transformer', '[openml]bodyfat') # load
20 | pass
--------------------------------------------------------------------------------
/examples/finetune_baseline.py:
--------------------------------------------------------------------------------
1 | # finetune a baseline with given configs
2 | import os
3 | import sys
4 | sys.path.append(os.getcwd())
5 |
6 | import torch
7 | from data import available_datasets
8 | from data.processor import DataProcessor
9 | from utils.model import seed_everything, get_model_cards, make_baseline, load_config_from_file
10 |
11 |
12 | if __name__ == '__main__':
13 | seed_everything(42)
14 | device = torch.device('cuda')
15 | # model infos
16 | print('model cards: ', get_model_cards())
17 | base_model = 'mlp'
18 | # dataset infos
19 | print('available datasets: ', available_datasets())
20 | dataset_name = 'adult'
21 | # config files
22 | default_config_file = f'configs/default/{base_model}.yaml' # path to your config file
23 | output_dir = f"results/{base_model}/{dataset_name}" # path to save results
24 | # load configs
25 | configs = load_config_from_file(default_config_file) # or you can directly pass the config file to `make_baseline`
26 | # some necessary configs
27 | configs['training']['max_epochs'] = 100 # training args: max training epochs
28 | configs['training']['batch_size'] = 128 # training args: batch_size
29 | configs['meta'] = {'save_path': output_dir} # meta args: result dir
30 |
31 | # load dataset (processing upon model type)
32 | dataset = DataProcessor.load_preproc_default(output_dir, base_model, dataset_name, seed=0)
33 | # build model
34 | n_num_features = dataset.n_num_features
35 | categories = dataset.get_category_sizes('train')
36 | if len(categories) == 0:
37 | categories = None
38 | n_labels = dataset.n_classes or 1 # regression n_classes is None
39 | y_std = dataset.y_info.get('std') # for regression
40 |
41 | model = make_baseline(
42 | base_model, configs['model'],
43 | n_num=n_num_features,
44 | cat_card=categories,
45 | n_labels=n_labels,
46 | device=device
47 | )
48 | # convert to tensor
49 | datas = DataProcessor.prepare(dataset, model)
50 |
51 | # training (automatically load best model at the end)
52 | model.fit(
53 | X_num=datas['train'][0], X_cat=datas['train'][1], ys=datas['train'][2], y_std=y_std,
54 | eval_set=(datas['val'],), # similar as sk-learn
55 | patience=8, # for early stop, <= 0 no early stop
56 | task=dataset.task_type.value,
57 | training_args=configs['training'], # training args
58 | meta_args=configs['meta'], # meta args: other infos, e.g. result dir, experiment name / id
59 | )
60 |
61 | # prediction (best metric checkpoint)
62 | # model.load_best_dnn(output_dir, file='best') # or you can load manually
63 | predictions, results = model.predict(
64 | X_num=datas['test'][0], X_cat=datas['test'][1], ys=datas['test'][2], y_std=y_std,
65 | task=dataset.task_type.value,
66 | return_probs=True, return_metric=True, return_loss=True,
67 | )
68 | model.save_prediction(output_dir, results) # save results
69 | print("=== Prediction (best metric) ===")
70 | print(results)
71 |
72 | # prediction (best logloss checkpoint)
73 | if dataset.task_type.value != 'regression':
74 | model.load_best_dnn(output_dir, file='best-logloss')
75 | predictions, results = model.predict(
76 | X_num=datas['test'][0], X_cat=datas['test'][1], ys=datas['test'][2], y_std=y_std,
77 | task=dataset.task_type.value,
78 | return_probs=True, return_metric=True, return_loss=True,
79 | )
80 | model.save_prediction(output_dir, results, file='prediction_logloss')
81 | print("=== Prediction (best logloss) ===")
82 | print(results)
--------------------------------------------------------------------------------
/examples/tune_baseline.py:
--------------------------------------------------------------------------------
1 | # tune then finetune in one function
2 | import os
3 | import sys
4 | sys.path.append(os.getcwd())
5 |
6 | import torch
7 | from data.processor import DataProcessor
8 | from utils.model import load_config_from_file, seed_everything, get_model_cards, tune, make_baseline
9 |
10 | if __name__ == '__main__':
11 | seed_everything(42)
12 | device = torch.device('cuda')
13 | print('available model infos: ', get_model_cards())
14 | base_model = 'mlp'
15 | dataset_name = 'adult'
16 | # model, training args
17 | search_space_file = f'configs/{base_model}.yaml' # refer to sample search space config file and build yours
18 | output_dir = f"results-tuned/{base_model}/{dataset_name}" # output dir for tuned configs & checkpoints
19 | # load dataset
20 | dataset = DataProcessor.load_preproc_default(output_dir, base_model, dataset_name, seed=0)
21 |
22 | # tune (will load the checkpoint of the best config to predict at the end)
23 | model = tune(
24 | model_name=base_model,
25 | search_config=search_space_file,
26 | dataset=dataset,
27 | batch_size=128,
28 | patience=3, # a small patience for fast tune
29 | n_iterations=5, # tuning iterations
30 | device=device,
31 | output_dir=output_dir)
32 | print('done')
33 |
34 | # if you want to use the best tuned config
35 | # but a different training args (e.g. patience, batch size)
36 | # you should manually load the best config and finetune
37 | best_config_file = f'{output_dir}/tuned/configs.yaml'
38 | best_configs = load_config_from_file(best_config_file)
39 | # data args
40 | n_num_features = dataset.n_num_features
41 | categories = dataset.get_category_sizes('train')
42 | if len(categories) == 0:
43 | categories = None
44 | n_labels = dataset.n_classes or 1 # regression n_classes is None
45 | y_std = dataset.y_info.get('std') # for regression
46 | # build model from the given config
47 | model = make_baseline(
48 | model_name=base_model,
49 | # you can directly pass the config file, but then you cannot modify training args explicitly
50 | # model_config=best_config_file,
51 | model_config=best_configs['model'],
52 | n_num=n_num_features,
53 | cat_card=categories,
54 | n_labels=n_labels,
55 | device=device
56 | )
57 | # here you can modify the training args (if read config file above)
58 | best_configs['training']['batch_size'] = 256
59 | best_configs['training']['lr'] = 5e-5
60 | output_dir2 = 'final_output_dir'
61 | best_configs['meta']['save_path'] = output_dir2 # save new results with tuned configs
62 | # prepare tensor data
63 | datas = DataProcessor.prepare(dataset, model)
64 | # finetune
65 | model.fit(
66 | X_num=datas['train'][0], X_cat=datas['train'][1], ys=datas['train'][2], y_std=y_std,
67 | eval_set=(datas['val'],),
68 | patience=8, # you can use a different patience here
69 | task=dataset.task_type.value,
70 | training_args=best_configs['training'], # training args
71 | meta_args=best_configs['meta'], # meta args: other infos, e.g. result dir, experiment name / id
72 | )
73 | # prediction
74 | predictions, results = model.predict(
75 | X_num=datas['test'][0], X_cat=datas['test'][1], ys=datas['test'][2], y_std=y_std,
76 | task=dataset.task_type.value,
77 | return_probs=True, return_metric=True, return_loss=True,
78 | )
79 | model.save_prediction(output_dir2, results) # save results
80 | print("=== Prediction (best metric) ===")
81 | print(results)
--------------------------------------------------------------------------------
/image/auto_skdl-logo.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/pytabular-ai/auto-scikit-dl/39477cd89832b9da99b9eaa898274439a08b13fb/image/auto_skdl-logo.png
--------------------------------------------------------------------------------
/models/__init__.py:
--------------------------------------------------------------------------------
1 | from .mlp import MLP
2 | from .ft_transformer import FTTransformer
3 | from .autoint import AutoInt
4 | from .dcnv2 import DCNv2
5 | from .node_model import NODE
6 |
--------------------------------------------------------------------------------
/models/abstract.py:
--------------------------------------------------------------------------------
1 | # abstract class for all tabular models
2 | from abc import ABC, abstractmethod
3 | from typing import Optional, Tuple, Union, Dict, Any, Callable
4 | from pathlib import Path
5 | import warnings
6 | warnings.filterwarnings("ignore")
7 | import os
8 | import json
9 | import yaml
10 | import time
11 | import numpy as np
12 |
13 | import torch
14 | import torch.nn as nn
15 | import torch.nn.functional as F
16 | import torch.optim as optim
17 | from torch.utils.data import TensorDataset, DataLoader
18 |
19 | from utils.metrics import calculate_metrics
20 | from sklearn.metrics import log_loss, mean_squared_error
21 |
22 | DNN_FIT_API = Callable[
23 | [nn.Module, torch.Tensor, torch.Tensor, torch.Tensor],
24 | Tuple[torch.Tensor, float]
25 | ] # input X, y; returns logits and elapsed time
26 | DNN_PREDICT_API = Callable[
27 | [nn.Module, torch.Tensor, torch.Tensor],
28 | Tuple[torch.Tensor, float]
29 | ] # input X; returns logits and elapsed time
30 |
31 | def default_dnn_fit(model, x_num, x_cat, y):
32 | """
33 | Training Process
34 | """
35 | start_time = time.time()
36 | logits = model(x_num, x_cat)
37 |     used_time = time.time() - start_time # backward time is omitted here and added in the outer loop
38 | return logits, used_time
39 | def default_dnn_predict(model, x_num, x_cat):
40 | """
41 | Inference Process
42 |     `no_grad` will be applied in `dnn_predict`
43 | """
44 | start_time = time.time()
45 | logits = model(x_num, x_cat)
46 | used_time = time.time() - start_time
47 | return logits, used_time
48 | def check_dir(dir):
49 | if not os.path.exists(dir):
50 | os.makedirs(dir)
51 |
52 | def make_optimizer(
53 | optimizer: str,
54 | parameter_groups,
55 | lr: float,
56 | weight_decay: float,
57 | ) -> optim.Optimizer:
58 | Optimizer = {
59 | 'adam': optim.Adam,
60 | 'adamw': optim.AdamW,
61 | 'sgd': optim.SGD,
62 | }[optimizer]
63 | momentum = (0.9,) if Optimizer is optim.SGD else ()
64 | return Optimizer(parameter_groups, lr, *momentum, weight_decay=weight_decay)
65 |
66 | def make_lr_schedule(
67 | optimizer: optim.Optimizer,
68 | lr: float,
69 | epoch_size: int,
70 | lr_schedule: Optional[Dict[str, Any]],
71 | ) -> Tuple[
72 | Optional[optim.lr_scheduler._LRScheduler],
73 | Dict[str, Any],
74 | Optional[int],
75 | ]:
76 | if lr_schedule is None:
77 | lr_schedule = {'type': 'constant'}
78 | lr_scheduler = None
79 | n_warmup_steps = None
80 | if lr_schedule['type'] in ['transformer', 'linear_warmup']:
81 | n_warmup_steps = (
82 | lr_schedule['n_warmup_steps']
83 | if 'n_warmup_steps' in lr_schedule
84 | else lr_schedule['n_warmup_epochs'] * epoch_size
85 | )
86 | elif lr_schedule['type'] == 'cyclic':
87 | lr_scheduler = optim.lr_scheduler.CyclicLR(
88 | optimizer,
89 | base_lr=lr,
90 | max_lr=lr_schedule['max_lr'],
91 | step_size_up=lr_schedule['n_epochs_up'] * epoch_size,
92 | step_size_down=lr_schedule['n_epochs_down'] * epoch_size,
93 | mode=lr_schedule['mode'],
94 | gamma=lr_schedule.get('gamma', 1.0),
95 | cycle_momentum=False,
96 | )
97 | return lr_scheduler, lr_schedule, n_warmup_steps
98 |
99 | class TabModel(ABC):
100 | def __init__(self):
101 | self.model: Optional[nn.Module] = None # true model
102 | self.base_name = None # model type name
103 | self.device = None
104 | self.saved_model_config = None
105 | self.training_config = None
106 | self.meta_config = None
107 | self.post_init()
108 |
109 | def post_init(self):
110 | self.history = {
111 | 'train': {'loss': [], 'tot_time': 0, 'avg_step_time': 0, 'avg_epoch_time': 0},
112 | 'val': {
113 | 'metric_name': None, 'metric': [], 'best_metric': None,
114 | 'log_loss': [], 'best_log_loss': None,
115 | 'best_epoch': None, 'best_step': None,
116 | 'tot_time': 0, 'avg_step_time': 0, 'avg_epoch_time': 0
117 | },
118 | # 'test': {'loss': [], 'metric': [], 'final_metric': None},
119 | 'device': torch.cuda.get_device_name(),
120 | } # save metrics
121 | self.no_improvement = 0 # for dnn early stop
122 |
123 | def preproc_config(self, model_config: dict):
124 | """default preprocessing for model configurations"""
125 | self.saved_model_config = model_config
126 | return model_config
127 |
128 | @abstractmethod
129 | def fit(
130 | self,
131 | X_num: Union[torch.Tensor, np.ndarray],
132 | X_cat: Union[torch.Tensor, np.ndarray],
133 | ys: Union[torch.Tensor, np.ndarray],
134 | y_std: Optional[float],
135 | eval_set: Optional[Tuple[Union[torch.Tensor, np.ndarray]]],
136 | patience: int,
137 | task: str,
138 | training_args: dict,
139 | meta_args: Optional[dict],
140 | ):
141 | """
142 |         Train the model with early stopping (optional);
143 |         load the best weights at the end
144 | """
145 | pass
146 |
147 | def dnn_fit(
148 | self,
149 | *,
150 | dnn_fit_func: Optional[DNN_FIT_API] = None,
151 |         # API for special samplers, e.g. curriculum learning
152 |         train_loader: Optional[Tuple[DataLoader, int]] = None, # (loader, missing_idx)
153 |         # uses a normal dataloader sampler if None
154 | X_num: Optional[torch.Tensor] = None,
155 | X_cat: Optional[torch.Tensor] = None,
156 | ys: Optional[torch.Tensor] = None,
157 | y_std: Optional[float] = None, # for RMSE
158 |         eval_set: Tuple[torch.Tensor, np.ndarray] = None, # similar API to scikit-learn
159 | patience: int = 0, # <= 0 without early stop
160 | task: str,
161 | training_args: dict,
162 | meta_args: Optional[dict] = None,
163 | ):
164 | # DONE: move to abstract class (dnn_fit)
165 | if dnn_fit_func is None:
166 | dnn_fit_func = default_dnn_fit
167 | # meta args
168 | if meta_args is None:
169 | meta_args = {}
170 | meta_args.setdefault('save_path', f'results/{self.base_name}')
171 | if not os.path.exists(meta_args['save_path']):
172 | print('create new results dir: ', meta_args['save_path'])
173 | os.makedirs(meta_args['save_path'])
174 | self.meta_config = meta_args
175 |         # optimizer and scheduler
176 | training_args.setdefault('optimizer', 'adamw')
177 | optimizer, scheduler = TabModel.make_optimizer(self.model, training_args)
178 | # data loader
179 | training_args.setdefault('batch_size', 64)
180 | training_args.setdefault('ghost_batch_size', None)
181 | if train_loader is not None:
182 | train_loader, missing_idx = train_loader
183 | training_args['batch_size'] = train_loader.batch_size
184 | else:
185 | train_loader, missing_idx = TabModel.prepare_tensor_loader(
186 | X_num=X_num, X_cat=X_cat, ys=ys,
187 | batch_size=training_args['batch_size'],
188 | shuffle=True,
189 | )
190 | if eval_set is not None:
191 | eval_set = eval_set[0] # only use the first dev set
192 | dev_loader = TabModel.prepare_tensor_loader(
193 | X_num=eval_set[0], X_cat=eval_set[1], ys=eval_set[2],
194 | batch_size=training_args['batch_size'],
195 | )
196 | else:
197 | dev_loader = None
198 | # training loops
199 | training_args.setdefault('max_epochs', 1000)
200 | # training_args.setdefault('report_frequency', 100) # same as save_freq
201 | # training_args.setdefault('save_frequency', 100) # save per 100 steps
202 | training_args.setdefault('patience', patience)
203 | training_args.setdefault('save_frequency', 'epoch') # save per epoch
204 | self.training_config = training_args
205 |
206 | steps_per_backward = 1 if training_args['ghost_batch_size'] is None \
207 | else training_args['batch_size'] // training_args['ghost_batch_size']
208 | steps_per_epoch = len(train_loader)
209 | tot_step, tot_time = 0, 0
210 | for e in range(training_args['max_epochs']):
211 | self.model.train()
212 | tot_loss = 0
213 | for step, batch in enumerate(train_loader):
214 | optimizer.zero_grad()
215 | x_num, x_cat, y = TabModel.parse_batch(batch, missing_idx, self.device)
216 | logits, forward_time = dnn_fit_func(self.model, x_num, x_cat, y)
217 | loss = TabModel.compute_loss(logits, y, task)
218 | # backward
219 | start_time = time.time()
220 | loss.backward()
221 | backward_time = time.time() - start_time
222 | self.gradient_policy()
223 | tot_time += forward_time + backward_time
224 | optimizer.step()
225 | if scheduler is not None:
226 | scheduler.step()
227 | # print or save infos
228 | tot_step += 1
229 | tot_loss += loss.cpu().item()
230 | if isinstance(training_args['save_frequency'], int) \
231 | and tot_step % training_args['save_frequency'] == 0:
232 | is_early_stop = self.save_evaluate_dnn(
233 | tot_step, steps_per_epoch,
234 | tot_loss, tot_time,
235 | task, training_args['patience'], meta_args['save_path'],
236 | dev_loader, y_std,
237 | )
238 | if is_early_stop:
239 | self.save(meta_args['save_path'])
240 | self.load_best_dnn(meta_args['save_path'])
241 | return
242 | if training_args['save_frequency'] == 'epoch':
243 | if hasattr(self.model, 'layer_masks'):
244 | print('layer_mask: ', self.model.layer_masks > 0)
245 | is_early_stop = self.save_evaluate_dnn(
246 | tot_step, steps_per_epoch,
247 | tot_loss, tot_time,
248 | task, training_args['patience'], meta_args['save_path'],
249 | dev_loader, y_std,
250 | )
251 | if is_early_stop:
252 | self.save(meta_args['save_path'])
253 | self.load_best_dnn(meta_args['save_path'])
254 | return
255 | self.save(meta_args['save_path'])
256 | self.load_best_dnn(meta_args['save_path'])
257 |
258 | @abstractmethod
259 | def predict(
260 | self,
261 | dev_loader: Optional[DataLoader],
262 | X_num: Union[torch.Tensor, np.ndarray],
263 | X_cat: Union[torch.Tensor, np.ndarray],
264 | ys: Union[torch.Tensor, np.ndarray],
265 | y_std: Optional[float],
266 | task: str,
267 | return_probs: bool = True,
268 | return_metric: bool = True,
269 | return_loss: bool = True,
270 | meta_args: Optional[dict] = None,
271 | ):
272 | """
273 | Prediction
274 | """
275 | pass
276 |
277 | def dnn_predict(
278 | self,
279 | *,
280 | dnn_predict_func: Optional[DNN_PREDICT_API] = None,
281 | dev_loader: Optional[Tuple[DataLoader, int]] = None, # reuse, (loader, missing_idx)
282 | X_num: Optional[torch.Tensor] = None,
283 | X_cat: Optional[torch.Tensor] = None,
284 | ys: Optional[torch.Tensor] = None,
285 | y_std: Optional[float] = None, # for RMSE
286 | task: str,
287 | return_probs: bool = True,
288 | return_metric: bool = False,
289 | return_loss: bool = False,
290 | meta_args: Optional[dict] = None,
291 | ):
292 | # DONE: move to abstract class (dnn_predict)
293 | if dnn_predict_func is None:
294 | dnn_predict_func = default_dnn_predict
295 | if dev_loader is None:
296 | dev_loader, missing_idx = TabModel.prepare_tensor_loader(
297 | X_num=X_num, X_cat=X_cat, ys=ys,
298 | batch_size=128,
299 | )
300 | else:
301 | dev_loader, missing_idx = dev_loader
302 | # print("Evaluate...")
303 | predictions, golds = [], []
304 | tot_time = 0
305 | self.model.eval()
306 | for batch in dev_loader:
307 | x_num, x_cat, y = TabModel.parse_batch(batch, missing_idx, self.device)
308 | with torch.no_grad():
309 | logits, used_time = dnn_predict_func(self.model, x_num, x_cat)
310 | tot_time += used_time
311 | predictions.append(logits)
312 | golds.append(y)
313 | self.model.train()
314 | predictions = torch.cat(predictions).squeeze(-1)
315 | golds = torch.cat(golds)
316 | if return_loss:
317 | loss = TabModel.compute_loss(predictions, golds, task).cpu().item()
318 | else:
319 | loss = None
320 | if return_probs and task != 'regression':
321 | predictions = (
322 | predictions.sigmoid()
323 | if task == 'binclass'
324 | else predictions.softmax(-1)
325 | )
326 | prediction_type = 'probs'
327 | elif task == 'regression':
328 | prediction_type = None
329 | else:
330 | prediction_type = 'logits'
331 | predictions = predictions.cpu().numpy()
332 | golds = golds.cpu().numpy()
333 | if return_metric:
334 | metric = TabModel.calculate_metric(
335 | golds, predictions,
336 | task, prediction_type, y_std
337 | )
338 | logloss = (
339 | log_loss(golds, np.stack([1-predictions, predictions], axis=1), labels=[0,1])
340 | if task == 'binclass'
341 | else log_loss(golds, predictions, labels=list(range(len(set(golds)))))
342 | if task == 'multiclass'
343 | else None
344 | )
345 | else:
346 | metric, logloss = None, None
347 | results = {'loss': loss, 'metric': metric, 'time': tot_time, 'log_loss': logloss}
348 | if meta_args is not None:
349 | self.save_prediction(meta_args['save_path'], results)
350 | return predictions, results
351 |
352 | def gradient_policy(self):
353 |         """Post-process model gradients (e.g. clipping); no-op by default"""
354 | pass
355 |
356 | @abstractmethod
357 | def save(self, output_dir):
358 | """
359 |         Save model weights and configs;
360 |         the default save helpers below
361 |         can be combined to override this method
362 | """
363 | pass
364 |
365 | def save_pt_model(self, output_dir):
366 | print('saving pt model weights...')
367 | # save model params
368 | torch.save(self.model.state_dict(), Path(output_dir) / 'final.bin')
369 |
370 | def save_tree_model(self, output_dir):
371 | print('saving tree model...')
372 | pass
373 |
374 | def save_history(self, output_dir):
375 | # save metrics
376 | with open(Path(output_dir) / 'results.json', 'w') as f:
377 | json.dump(self.history, f, indent=4)
378 |
379 | def save_prediction(self, output_dir, results, file='prediction'):
380 | check_dir(output_dir)
381 | # save test results
382 | print("saving prediction results")
383 | saved_results = {
384 | 'loss': results['loss'],
385 | 'metric_name': results['metric'][1],
386 | 'metric': results['metric'][0],
387 | 'time': results['time'],
388 | 'log_loss': results['log_loss'],
389 | }
390 | with open(Path(output_dir) / f'{file}.json', 'w') as f:
391 | json.dump(saved_results, f, indent=4)
392 |
393 | def save_config(self, output_dir):
394 | def serialize(config: dict):
395 | for key in config:
396 | # serialized object to store yaml or json files
397 | if any(isinstance(config[key], obj) for obj in [Path, ]):
398 | config[key] = str(config[key])
399 | return config
400 | # save all configs
401 | with open(Path(output_dir) / 'configs.yaml', 'w') as f:
402 | configs = {
403 | 'model': self.saved_model_config,
404 | 'training': self.training_config,
405 | 'meta': serialize(self.meta_config)
406 | }
407 | yaml.dump(configs, f, indent=2)
408 |
409 | @staticmethod
410 | def make_optimizer(
411 | model: nn.Module,
412 | training_args: dict,
413 | ) -> Tuple[optim.Optimizer, optim.lr_scheduler._LRScheduler]:
414 | training_args.setdefault('optimizer', 'adamw')
415 | training_args.setdefault('no_wd_group', None)
416 | training_args.setdefault('scheduler', None)
417 | # optimizer
418 | if training_args['no_wd_group'] is not None:
419 | assert isinstance(training_args['no_wd_group'], list)
420 | def needs_wd(name):
421 | return all(x not in name for x in training_args['no_wd_group'])
422 | parameters_with_wd = [v for k, v in model.named_parameters() if needs_wd(k)]
423 | parameters_without_wd = [v for k, v in model.named_parameters() if not needs_wd(k)]
424 | model_params = [
425 | {'params': parameters_with_wd},
426 | {'params': parameters_without_wd, 'weight_decay': 0.0},
427 | ]
428 | else:
429 | model_params = model.parameters()
430 | optimizer = make_optimizer(
431 | training_args['optimizer'],
432 | model_params,
433 | training_args['lr'],
434 | training_args['weight_decay'],
435 | )
436 | # scheduler
437 | if training_args['scheduler'] is not None:
438 |             scheduler = None # custom schedulers are not implemented yet; always None
439 | else:
440 | scheduler = None
441 |
442 | return optimizer, scheduler
443 |
444 | @staticmethod
445 | def prepare_tensor_loader(
446 | X_num: Optional[torch.Tensor],
447 | X_cat: Optional[torch.Tensor],
448 | ys: torch.Tensor,
449 | batch_size: int = 64,
450 | shuffle: bool = False,
451 | ):
452 | assert not all(x is None for x in [X_num, X_cat])
453 | missing_placeholder = 0 if X_num is None else 1 if X_cat is None else -1
454 | datas = [x for x in [X_num, X_cat, ys] if x is not None]
455 | tensor_dataset = TensorDataset(*datas)
456 | tensor_loader = DataLoader(
457 | tensor_dataset,
458 | batch_size=batch_size,
459 | shuffle=shuffle,
460 | )
461 | return tensor_loader, missing_placeholder
462 |
463 | @staticmethod
464 | def parse_batch(batch: Tuple[torch.Tensor], missing_idx, device: torch.device):
465 | if batch[0].device.type != device.type:
466 | # if batch[0].device != device: # initialize self.device with model.device rather than torch.device()
467 | # batch = (x.to(device) for x in batch) # generator
468 |             batch = [x.to(device) for x in batch] # keep a list so it can be concatenated below
469 | if missing_idx == -1:
470 | return batch
471 | else:
472 | return batch[:missing_idx] + [None,] + batch[missing_idx:]
473 |
474 | @staticmethod
475 | def compute_loss(logits: torch.Tensor, targets: torch.Tensor, task: str, reduction: str = 'mean'):
476 | loss_fn = {
477 | 'binclass': F.binary_cross_entropy_with_logits,
478 | 'multiclass': F.cross_entropy,
479 | 'regression': F.mse_loss,
480 | }[task]
481 | return loss_fn(logits.squeeze(-1), targets, reduction=reduction)
482 |
483 | @staticmethod
484 | def calculate_metric(
485 | golds,
486 | predictions,
487 | task: str,
488 | prediction_type: Optional[str] = None,
489 | y_std: Optional[float] = None,
490 | ):
491 | """Calculate metrics"""
492 | metric = {
493 | 'regression': 'rmse',
494 | 'binclass': 'roc_auc',
495 | 'multiclass': 'accuracy'
496 | }[task]
497 |
498 | return calculate_metrics(
499 | golds, predictions,
500 | task, prediction_type, y_std
501 | )[metric], metric
502 |
503 | def better_result(self, dev_metric, task, is_loss=False):
504 | if is_loss: # logloss
505 | best_dev_metric = self.history['val']['best_log_loss']
506 | if best_dev_metric is None or best_dev_metric > dev_metric:
507 | self.history['val']['best_log_loss'] = dev_metric
508 | return True
509 | else:
510 | return False
511 | best_dev_metric = self.history['val']['best_metric']
512 | if best_dev_metric is None:
513 | self.history['val']['best_metric'] = dev_metric
514 | return True
515 | elif task == 'regression': # rmse
516 | if best_dev_metric > dev_metric:
517 | self.history['val']['best_metric'] = dev_metric
518 | return True
519 | else:
520 | return False
521 | else:
522 | if best_dev_metric < dev_metric:
523 | self.history['val']['best_metric'] = dev_metric
524 | return True
525 | else:
526 | return False
527 |
528 | def early_stop_handler(self, epoch, tot_step, dev_metric, task, patience, save_path):
529 | if task != 'regression' and self.better_result(dev_metric['log_loss'], task, is_loss=True):
530 | # record best logloss
531 | torch.save(self.model.state_dict(), Path(save_path) / 'best-logloss.bin')
532 | if self.better_result(dev_metric['metric'], task):
533 | print('<<< Best Dev Result', end='')
534 | torch.save(self.model.state_dict(), Path(save_path) / 'best.bin')
535 | self.no_improvement = 0
536 | self.history['val']['best_epoch'] = epoch
537 | self.history['val']['best_step'] = tot_step
538 | else:
539 | self.no_improvement += 1
540 | print(f'| [no improvement] {self.no_improvement}', end='')
541 | if patience <= 0:
542 | return False
543 | else:
544 | return self.no_improvement >= patience
545 |
546 | def save_evaluate_dnn(
547 | self,
548 | # print and saved infos
549 | tot_step, steps_per_epoch,
550 | tot_loss, tot_time,
551 | # evaluate infos
552 | task, patience, save_path,
553 | dev_loader, y_std
554 | ):
555 | """For DNN models"""
556 | epoch, step = tot_step // steps_per_epoch, (tot_step - 1) % steps_per_epoch + 1
557 | avg_loss = tot_loss / step
558 | self.history['train']['loss'].append(avg_loss)
559 | self.history['train']['tot_time'] = tot_time
560 | self.history['train']['avg_step_time'] = tot_time / tot_step
561 | self.history['train']['avg_epoch_time'] = self.history['train']['avg_step_time'] * steps_per_epoch
562 | print(f"[epoch] {epoch} | [step] {step} | [tot_step] {tot_step} | [used time] {tot_time:.4g} | [train_loss] {avg_loss:.4g} ", end='')
563 | if dev_loader is not None:
564 | _, results = self.predict(dev_loader=dev_loader, y_std=y_std, task=task, return_metric=True)
565 | dev_metric, metric_name = results['metric']
566 | print(f"| [{metric_name}] {dev_metric:.4g} ", end='')
567 | if task != 'regression':
568 | print(f"| [log-loss] {results['log_loss']:.4g} ", end='')
569 | self.history['val']['log_loss'].append(results['log_loss'])
570 | self.history['val']['metric_name'] = metric_name
571 | self.history['val']['metric'].append(dev_metric)
572 | self.history['val']['tot_time'] += results['time']
573 | self.history['val']['avg_step_time'] = self.history['val']['tot_time'] / tot_step
574 | self.history['val']['avg_epoch_time'] = self.history['val']['avg_step_time'] * steps_per_epoch
575 | dev_metric = {'metric': dev_metric, 'log_loss': results['log_loss']}
576 | if self.early_stop_handler(epoch, tot_step, dev_metric, task, patience, save_path):
577 | print(' <<< Early Stop')
578 | return True
579 | print()
580 | return False
581 |
582 | def load_best_dnn(self, save_path, file='best'):
583 | model_file = Path(save_path) / f"{file}.bin"
584 | if not os.path.exists(model_file):
585 | print(f'There is no {file} checkpoint, loading the last one...')
586 | model_file = Path(save_path) / 'final.bin'
587 | else:
588 | print(f'Loading {file} model...')
589 | self.model.load_state_dict(torch.load(model_file))
590 |         print('successfully loaded')
591 |
--------------------------------------------------------------------------------
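The concrete models that follow (AutoInt, DCNv2, FT-Transformer, MLP) all wire into `TabModel` the same way: wrap an `nn.Module` whose forward takes `(x_num, x_cat)`, then route `fit`/`predict` through `dnn_fit`/`dnn_predict` with small timing closures. Below is a minimal sketch of that pattern for a hypothetical backbone `_MyNet` (illustrative only, not part of the package):

```python
import time
import typing as ty

import torch
import torch.nn as nn

from models.abstract import TabModel, check_dir


class _MyNet(nn.Module):
    """Hypothetical backbone: anything with forward(x_num, x_cat) -> logits works."""
    def __init__(self, d_in: int, d_out: int):
        super().__init__()
        self.linear = nn.Linear(d_in, d_out)

    def forward(self, x_num, x_cat):
        return self.linear(x_num)


class MyModel(TabModel):
    def __init__(self, model_config: dict, n_num_features: int,
                 categories: ty.Optional[ty.List[int]], n_labels: int,
                 device: ty.Union[str, torch.device] = 'cuda'):
        super().__init__()  # note: post_init queries the CUDA device name
        model_config = self.preproc_config(model_config)
        self.model = _MyNet(d_in=n_num_features, d_out=n_labels, **model_config).to(device)
        self.base_name = 'my-model'
        self.device = torch.device(device)

    def fit(self, train_loader=None, X_num=None, X_cat=None, ys=None, y_std=None,
            eval_set=None, patience=0, task=None, training_args=None, meta_args=None):
        def train_step(model, x_num, x_cat, y):  # returns (logits, forward time)
            start = time.time()
            logits = model(x_num, x_cat)
            return logits, time.time() - start
        self.dnn_fit(dnn_fit_func=train_step, train_loader=train_loader,
                     X_num=X_num, X_cat=X_cat, ys=ys, y_std=y_std,
                     eval_set=eval_set, patience=patience, task=task,
                     training_args=training_args, meta_args=meta_args)

    def predict(self, dev_loader=None, X_num=None, X_cat=None, ys=None, y_std=None,
                task=None, return_probs=True, return_metric=False, return_loss=False,
                meta_args=None):
        def inference_step(model, x_num, x_cat):  # `no_grad` is applied inside dnn_predict
            start = time.time()
            logits = model(x_num, x_cat)
            return logits, time.time() - start
        return self.dnn_predict(dnn_predict_func=inference_step, dev_loader=dev_loader,
                                X_num=X_num, X_cat=X_cat, ys=ys, y_std=y_std, task=task,
                                return_probs=return_probs, return_metric=return_metric,
                                return_loss=return_loss, meta_args=meta_args)

    def save(self, output_dir):
        check_dir(output_dir)
        self.save_pt_model(output_dir)
        self.save_history(output_dir)
        self.save_config(output_dir)
```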
/models/autoint.py:
--------------------------------------------------------------------------------
1 | # Implementation of "AutoInt: Automatic Feature Interaction Learning via Self-Attentive Neural Networks"
2 | # Some differences from a more "conventional" transformer:
3 | # - no FFN module, but one linear layer before adding the result of attention
4 | # - no bias for numerical embeddings
5 | # - no CLS token, the final embedding is formed by concatenation of all the tokens
6 | # - n_heads = 2 is recommended in the paper
7 | # - d_token is supposed to be small
8 | # - the placement of normalizations and activations is different
9 |
10 | # %%
11 | import math
12 | import time
13 | import typing as ty
14 | from pathlib import Path
15 |
16 | import numpy as np
17 | import torch
18 | import torch.nn as nn
19 | import torch.nn.functional as F
20 | import torch.nn.init as nn_init
21 | from torch.utils.data import DataLoader
22 | from torch import Tensor
23 |
24 | from utils.deep import get_activation_fn
25 | from models.abstract import TabModel, check_dir
26 |
27 | # %%
28 | class Tokenizer(nn.Module):
29 | category_offsets: ty.Optional[Tensor]
30 |
31 | def __init__(
32 | self,
33 | d_numerical: int,
34 | categories: ty.Optional[ty.List[int]],
35 | n_latent_tokens: int,
36 | d_token: int,
37 | ) -> None:
38 | super().__init__()
39 | assert n_latent_tokens == 0
40 | self.n_latent_tokens = n_latent_tokens
41 | if d_numerical:
42 | self.weight = nn.Parameter(Tensor(d_numerical + n_latent_tokens, d_token))
43 | # The initialization is inspired by nn.Linear
44 | nn_init.kaiming_uniform_(self.weight, a=math.sqrt(5))
45 | else:
46 | self.weight = None
47 | assert categories is not None
48 | if categories is None:
49 | self.category_offsets = None
50 | self.category_embeddings = None
51 | else:
52 | category_offsets = torch.tensor([0] + categories[:-1]).cumsum(0)
53 | self.register_buffer('category_offsets', category_offsets)
54 | self.category_embeddings = nn.Embedding(sum(categories), d_token)
55 | nn_init.kaiming_uniform_(self.category_embeddings.weight, a=math.sqrt(5))
56 | print(f'{self.category_embeddings.weight.shape}')
57 |
58 | @property
59 | def n_tokens(self) -> int:
60 | return (0 if self.weight is None else len(self.weight)) + (
61 | 0 if self.category_offsets is None else len(self.category_offsets)
62 | )
63 |
64 | def forward(self, x_num: ty.Optional[Tensor], x_cat: ty.Optional[Tensor]) -> Tensor:
65 | if x_num is None:
66 | return self.category_embeddings(x_cat + self.category_offsets[None]) # type: ignore[code]
67 | x_num = torch.cat(
68 | [
69 | torch.ones(len(x_num), self.n_latent_tokens, device=x_num.device),
70 | x_num,
71 | ],
72 | dim=1,
73 | )
74 | x = self.weight[None] * x_num[:, :, None] # type: ignore[code]
75 | if x_cat is not None:
76 | x = torch.cat(
77 | [x, self.category_embeddings(x_cat + self.category_offsets[None])], # type: ignore[code]
78 | dim=1,
79 | )
80 | return x
81 |
82 |
83 | class MultiheadAttention(nn.Module):
84 | def __init__(
85 | self, d: int, n_heads: int, dropout: float, initialization: str
86 | ) -> None:
87 | if n_heads > 1:
88 | assert d % n_heads == 0
89 | assert initialization in ['xavier', 'kaiming']
90 |
91 | super().__init__()
92 | self.W_q = nn.Linear(d, d)
93 | self.W_k = nn.Linear(d, d)
94 | self.W_v = nn.Linear(d, d)
95 | self.W_out = None
96 | self.n_heads = n_heads
97 | self.dropout = nn.Dropout(dropout) if dropout else None
98 |
99 | for m in [self.W_q, self.W_k, self.W_v]:
100 | if initialization == 'xavier' and (n_heads > 1 or m is not self.W_v):
101 | # gain is needed since W_qkv is represented with 3 separate layers
102 | nn_init.xavier_uniform_(m.weight, gain=1 / math.sqrt(2))
103 | nn_init.zeros_(m.bias)
104 | if self.W_out is not None:
105 | nn_init.zeros_(self.W_out.bias)
106 |
107 | def _reshape(self, x: Tensor) -> Tensor:
108 | batch_size, n_tokens, d = x.shape
109 | d_head = d // self.n_heads
110 | return (
111 | x.reshape(batch_size, n_tokens, self.n_heads, d_head)
112 | .transpose(1, 2)
113 | .reshape(batch_size * self.n_heads, n_tokens, d_head)
114 | )
115 |
116 | def forward(
117 | self,
118 | x_q: Tensor,
119 | x_kv: Tensor,
120 | key_compression: ty.Optional[nn.Linear],
121 | value_compression: ty.Optional[nn.Linear],
122 | ) -> Tensor:
123 | q, k, v = self.W_q(x_q), self.W_k(x_kv), self.W_v(x_kv)
124 | for tensor in [q, k, v]:
125 | assert tensor.shape[-1] % self.n_heads == 0
126 | if key_compression is not None:
127 | assert value_compression is not None
128 | k = key_compression(k.transpose(1, 2)).transpose(1, 2)
129 | v = value_compression(v.transpose(1, 2)).transpose(1, 2)
130 | else:
131 | assert value_compression is None
132 |
133 | batch_size = len(q)
134 | d_head_key = k.shape[-1] // self.n_heads
135 | d_head_value = v.shape[-1] // self.n_heads
136 | n_q_tokens = q.shape[1]
137 |
138 | q = self._reshape(q)
139 | k = self._reshape(k)
140 | attention = F.softmax(q @ k.transpose(1, 2) / math.sqrt(d_head_key), dim=-1)
141 | if self.dropout is not None:
142 | attention = self.dropout(attention)
143 | x = attention @ self._reshape(v)
144 | x = (
145 | x.reshape(batch_size, self.n_heads, n_q_tokens, d_head_value)
146 | .transpose(1, 2)
147 | .reshape(batch_size, n_q_tokens, self.n_heads * d_head_value)
148 | )
149 | if self.W_out is not None:
150 | x = self.W_out(x)
151 | return x
152 |
153 |
154 | class _AutoInt(nn.Module):
155 | def __init__(
156 | self,
157 | *,
158 | d_numerical: int,
159 | categories: ty.Optional[ty.List[int]],
160 | n_layers: int,
161 | d_token: int,
162 | n_heads: int,
163 | attention_dropout: float,
164 | residual_dropout: float,
165 | activation: str,
166 | prenormalization: bool = False,
167 | initialization: str = 'kaiming',
168 | kv_compression: ty.Optional[float] = None,
169 | kv_compression_sharing: ty.Optional[str] = None,
170 | d_out: int,
171 | ) -> None:
172 | assert not prenormalization
173 | assert activation == 'relu'
174 | assert (kv_compression is None) ^ (kv_compression_sharing is not None)
175 |
176 | super().__init__()
177 | self.tokenizer = Tokenizer(d_numerical, categories, 0, d_token)
178 | n_tokens = self.tokenizer.n_tokens
179 |
180 | def make_kv_compression():
181 | assert kv_compression
182 | compression = nn.Linear(
183 | n_tokens, int(n_tokens * kv_compression), bias=False
184 | )
185 | if initialization == 'xavier':
186 | nn_init.xavier_uniform_(compression.weight)
187 | return compression
188 |
189 | self.shared_kv_compression = (
190 | make_kv_compression()
191 | if kv_compression and kv_compression_sharing == 'layerwise'
192 | else None
193 | )
194 |
195 | def make_normalization():
196 | return nn.LayerNorm(d_token)
197 |
198 | self.layers = nn.ModuleList([])
199 | for layer_idx in range(n_layers):
200 | layer = nn.ModuleDict(
201 | {
202 | 'attention': MultiheadAttention(
203 | d_token, n_heads, attention_dropout, initialization
204 | ),
205 | 'linear': nn.Linear(d_token, d_token, bias=False),
206 | }
207 | )
208 | if not prenormalization or layer_idx:
209 | layer['norm0'] = make_normalization()
210 | if kv_compression and self.shared_kv_compression is None:
211 | layer['key_compression'] = make_kv_compression()
212 | if kv_compression_sharing == 'headwise':
213 | layer['value_compression'] = make_kv_compression()
214 | else:
215 | assert kv_compression_sharing == 'key-value'
216 | self.layers.append(layer)
217 |
218 | self.activation = get_activation_fn(activation)
219 | self.prenormalization = prenormalization
220 | self.last_normalization = make_normalization() if prenormalization else None
221 | self.residual_dropout = residual_dropout
222 | self.head = nn.Linear(d_token * n_tokens, d_out)
223 |
224 | def _get_kv_compressions(self, layer):
225 | return (
226 | (self.shared_kv_compression, self.shared_kv_compression)
227 | if self.shared_kv_compression is not None
228 | else (layer['key_compression'], layer['value_compression'])
229 | if 'key_compression' in layer and 'value_compression' in layer
230 | else (layer['key_compression'], layer['key_compression'])
231 | if 'key_compression' in layer
232 | else (None, None)
233 | )
234 |
235 | def _start_residual(self, x, layer, norm_idx):
236 | x_residual = x
237 | if self.prenormalization:
238 | norm_key = f'norm{norm_idx}'
239 | if norm_key in layer:
240 | x_residual = layer[norm_key](x_residual)
241 | return x_residual
242 |
243 | def _end_residual(self, x, x_residual, layer, norm_idx):
244 | if self.residual_dropout:
245 | x_residual = F.dropout(x_residual, self.residual_dropout, self.training)
246 | x = x + x_residual
247 | if not self.prenormalization:
248 | x = layer[f'norm{norm_idx}'](x)
249 | return x
250 |
251 | def forward(self, x_num: ty.Optional[Tensor], x_cat: ty.Optional[Tensor]) -> Tensor:
252 | x = self.tokenizer(x_num, x_cat)
253 |
254 | for layer in self.layers:
255 | layer = ty.cast(ty.Dict[str, nn.Module], layer)
256 |
257 | x_residual = self._start_residual(x, layer, 0)
258 | x_residual = layer['attention'](
259 | x_residual,
260 | x_residual,
261 | *self._get_kv_compressions(layer),
262 | )
263 | x = layer['linear'](x)
264 | x = self._end_residual(x, x_residual, layer, 0)
265 | x = self.activation(x)
266 |
267 | x = x.flatten(1, 2)
268 | x = self.head(x)
269 | x = x.squeeze(-1)
270 | return x
271 |
272 |
273 | # %%
274 | class AutoInt(TabModel):
275 | def __init__(
276 | self,
277 | model_config: dict,
278 | n_num_features: int,
279 | categories: ty.Optional[ty.List[int]],
280 | n_labels: int,
281 | device: ty.Union[str, torch.device] = 'cuda',
282 | ):
283 | super().__init__()
284 | model_config = self.preproc_config(model_config)
285 | self.model = _AutoInt(
286 | d_numerical=n_num_features,
287 | categories=categories,
288 | d_out=n_labels,
289 | **model_config
290 | ).to(device)
291 | self.base_name = 'autoint'
292 | self.device = torch.device(device)
293 |
294 | def preproc_config(self, model_config: dict):
295 | # process autoint configs
296 | self.saved_model_config = model_config.copy()
297 | return model_config
298 |
299 | def fit(
300 | self,
301 |         # API for special samplers, e.g. curriculum learning
302 |         train_loader: ty.Optional[ty.Tuple[DataLoader, int]] = None, # (loader, missing_idx)
303 |         # uses a normal sampler if None
304 | X_num: ty.Optional[torch.Tensor] = None,
305 | X_cat: ty.Optional[torch.Tensor] = None,
306 | ys: ty.Optional[torch.Tensor] = None,
307 | y_std: ty.Optional[float] = None, # for RMSE
308 | eval_set: ty.Tuple[torch.Tensor, np.ndarray] = None,
309 | patience: int = 0,
310 | task: str = None,
311 | training_args: dict = None,
312 | meta_args: ty.Optional[dict] = None,
313 | ):
314 | def train_step(model, x_num, x_cat, y): # input is X and y
315 | # process input (model-specific)
316 |             # define your running time calculation
317 | start_time = time.time()
318 | # define your model API
319 | logits = model(x_num, x_cat)
320 | used_time = time.time() - start_time
321 | return logits, used_time
322 |
323 |         # to customize another training paradigm:
324 |         # 1. add self.dnn_fit2(...) in the abstract class for a special training process
325 |         # 2. (recommended) override self.dnn_fit in the abstract class
326 | self.dnn_fit( # uniform training paradigm
327 | dnn_fit_func=train_step,
328 | # training data
329 | train_loader=train_loader,
330 | X_num=X_num, X_cat=X_cat, ys=ys, y_std=y_std,
331 | # dev data
332 | eval_set=eval_set, patience=patience, task=task,
333 | # args
334 | training_args=training_args,
335 | meta_args=meta_args,
336 | )
337 |
338 | def predict(
339 | self,
340 | dev_loader: ty.Optional[ty.Tuple[DataLoader, int]] = None, # reuse, (loader, missing_idx)
341 | X_num: ty.Optional[torch.Tensor] = None,
342 | X_cat: ty.Optional[torch.Tensor] = None,
343 | ys: ty.Optional[torch.Tensor] = None,
344 | y_std: ty.Optional[float] = None, # for RMSE
345 | task: str = None,
346 | return_probs: bool = True,
347 | return_metric: bool = False,
348 | return_loss: bool = False,
349 | meta_args: ty.Optional[dict] = None,
350 | ):
351 | def inference_step(model, x_num, x_cat): # input only X (y inaccessible)
352 | """
353 | Inference Process
354 |             `no_grad` will be applied in `dnn_predict`
355 | """
356 | # process input (model-specific)
357 |             # define your running time calculation
358 | start_time = time.time()
359 | # define your model API
360 | logits = model(x_num, x_cat)
361 | used_time = time.time() - start_time
362 | return logits, used_time
363 |
364 |         # to customize another inference paradigm:
365 |         # 1. add self.dnn_predict2(...) in the abstract class for a special inference process
366 |         # 2. (recommended) override self.dnn_predict in the abstract class
367 |         return self.dnn_predict( # uniform inference paradigm
368 | dnn_predict_func=inference_step,
369 | dev_loader=dev_loader,
370 | X_num=X_num, X_cat=X_cat, ys=ys, y_std=y_std, task=task,
371 | return_probs=return_probs, return_metric=return_metric, return_loss=return_loss,
372 | meta_args=meta_args,
373 | )
374 |
375 | def save(self, output_dir):
376 | check_dir(output_dir)
377 | self.save_pt_model(output_dir)
378 | self.save_history(output_dir)
379 | self.save_config(output_dir)
--------------------------------------------------------------------------------
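For reference, a `model_config` for the `AutoInt` wrapper above only needs the keyword arguments of `_AutoInt` that have no defaults; the values below are illustrative assumptions, not tuned settings:

```python
# an illustrative config for the AutoInt wrapper above (values are assumptions)
model_config = {
    'n_layers': 2,
    'd_token': 32,            # the paper suggests a small token dimension
    'n_heads': 2,             # n_heads = 2 is recommended in the paper
    'attention_dropout': 0.1,
    'residual_dropout': 0.0,
    'activation': 'relu',     # _AutoInt asserts activation == 'relu'
    # prenormalization / initialization / kv_compression keep their defaults
}
# model = AutoInt(model_config, n_num_features=10, categories=None, n_labels=1)
# (instantiation expects a CUDA-capable environment, since TabModel.post_init queries the GPU name)
```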
/models/dcnv2.py:
--------------------------------------------------------------------------------
1 | # %%
2 | import math
3 | import time
4 | import typing as ty
5 | from pathlib import Path
6 |
7 | import numpy as np
8 | import torch
9 | import torch.nn as nn
10 | import torch.nn.functional as F
11 | from torch.utils.data import DataLoader
12 |
13 | from models.abstract import TabModel, check_dir
14 | # %%
15 | class CrossLayer(nn.Module):
16 | def __init__(self, d, dropout):
17 | super().__init__()
18 | self.linear = nn.Linear(d, d)
19 | self.dropout = nn.Dropout(dropout)
20 |
21 | def forward(self, x0, x):
22 | return self.dropout(x0 * self.linear(x)) + x
23 |
24 |
25 | class _DCNv2(nn.Module):
26 | def __init__(
27 | self,
28 | *,
29 | d_in: int,
30 | d: int,
31 | n_hidden_layers: int,
32 | n_cross_layers: int,
33 | hidden_dropout: float,
34 | cross_dropout: float,
35 | d_out: int,
36 | stacked: bool,
37 | categories: ty.Optional[ty.List[int]],
38 | d_embedding: int = None,
39 | ) -> None:
40 | super().__init__()
41 |
42 | if categories is not None:
43 | assert d_embedding is not None
44 | d_in += len(categories) * d_embedding
45 | category_offsets = torch.tensor([0] + categories[:-1]).cumsum(0)
46 | self.register_buffer('category_offsets', category_offsets)
47 | self.category_embeddings = nn.Embedding(sum(categories), d_embedding)
48 | nn.init.kaiming_uniform_(self.category_embeddings.weight, a=math.sqrt(5))
49 | print(f'{self.category_embeddings.weight.shape}')
50 |
51 | self.first_linear = nn.Linear(d_in, d)
52 | self.last_linear = nn.Linear(d if stacked else 2 * d, d_out)
53 |
54 | deep_layers = sum(
55 | [
56 | [nn.Linear(d, d), nn.ReLU(True), nn.Dropout(hidden_dropout)]
57 | for _ in range(n_hidden_layers)
58 | ],
59 | [],
60 | )
61 | cross_layers = [CrossLayer(d, cross_dropout) for _ in range(n_cross_layers)]
62 |
63 | self.deep_layers = nn.Sequential(*deep_layers)
64 | self.cross_layers = nn.ModuleList(cross_layers)
65 | self.stacked = stacked
66 |
67 | def forward(self, x_num, x_cat):
68 | x = []
69 | if x_num is not None:
70 | x.append(x_num)
71 | if x_cat is not None:
72 | x.append(
73 | self.category_embeddings(x_cat + self.category_offsets[None]).view(
74 | x_cat.size(0), -1
75 | )
76 | )
77 | x = torch.cat(x, dim=-1)
78 |
79 | x = self.first_linear(x)
80 |
81 | x_cross = x
82 | for cross_layer in self.cross_layers:
83 | x_cross = cross_layer(x, x_cross)
84 |
85 | if self.stacked:
86 | return self.last_linear(self.deep_layers(x_cross)).squeeze(1)
87 | else:
88 | return self.last_linear(
89 | torch.cat([x_cross, self.deep_layers(x)], dim=1)
90 | ).squeeze(1)
91 |
92 |
93 | # %%
94 | class DCNv2(TabModel):
95 | def __init__(
96 | self,
97 | model_config: dict,
98 | n_num_features: int,
99 | categories: ty.Optional[ty.List[int]],
100 | n_labels: int,
101 | device: ty.Union[str, torch.device] = 'cuda',
102 | ):
103 | super().__init__()
104 | model_config = self.preproc_config(model_config)
105 | self.model = _DCNv2(
106 | d_in=n_num_features,
107 | categories=categories,
108 | d_out=n_labels,
109 | **model_config
110 | ).to(device)
111 | self.base_name = 'dcnv2'
112 | self.device = torch.device(device)
113 |
114 | def preproc_config(self, model_config: dict):
115 |         # process dcnv2 configs
116 | self.saved_model_config = model_config.copy()
117 | return model_config
118 |
119 | def fit(
120 | self,
121 |         # API for special samplers, e.g. curriculum learning
122 |         train_loader: ty.Optional[ty.Tuple[DataLoader, int]] = None, # (loader, missing_idx)
123 |         # uses a normal sampler if None
124 | X_num: ty.Optional[torch.Tensor] = None,
125 | X_cat: ty.Optional[torch.Tensor] = None,
126 | ys: ty.Optional[torch.Tensor] = None,
127 | y_std: ty.Optional[float] = None, # for RMSE
128 | eval_set: ty.Tuple[torch.Tensor, np.ndarray] = None,
129 | patience: int = 0,
130 | task: str = None,
131 | training_args: dict = None,
132 | meta_args: ty.Optional[dict] = None,
133 | ):
134 | def train_step(model, x_num, x_cat, y): # input is X and y
135 | # process input (model-specific)
136 |             # define your running time calculation
137 | start_time = time.time()
138 | # define your model API
139 | logits = model(x_num, x_cat)
140 | used_time = time.time() - start_time
141 | return logits, used_time
142 |
143 |         # to customize another training paradigm:
144 |         # 1. add self.dnn_fit2(...) in the abstract class for a special training process
145 |         # 2. (recommended) override self.dnn_fit in the abstract class
146 | self.dnn_fit( # uniform training paradigm
147 | dnn_fit_func=train_step,
148 | # training data
149 | train_loader=train_loader,
150 | X_num=X_num, X_cat=X_cat, ys=ys, y_std=y_std,
151 | # dev data
152 | eval_set=eval_set, patience=patience, task=task,
153 | # args
154 | training_args=training_args,
155 | meta_args=meta_args,
156 | )
157 |
158 | def predict(
159 | self,
160 | dev_loader: ty.Optional[ty.Tuple[DataLoader, int]] = None, # reuse, (loader, missing_idx)
161 | X_num: ty.Optional[torch.Tensor] = None,
162 | X_cat: ty.Optional[torch.Tensor] = None,
163 | ys: ty.Optional[torch.Tensor] = None,
164 | y_std: ty.Optional[float] = None, # for RMSE
165 | task: str = None,
166 | return_probs: bool = True,
167 | return_metric: bool = False,
168 | return_loss: bool = False,
169 | meta_args: ty.Optional[dict] = None,
170 | ):
171 | def inference_step(model, x_num, x_cat): # input only X (y inaccessible)
172 | """
173 | Inference Process
174 |             `no_grad` will be applied in `dnn_predict`
175 | """
176 | # process input (model-specific)
177 |             # define your running time calculation
178 | start_time = time.time()
179 | # define your model API
180 | logits = model(x_num, x_cat)
181 | used_time = time.time() - start_time
182 | return logits, used_time
183 |
184 |         # to customize another inference paradigm:
185 |         # 1. add self.dnn_predict2(...) in the abstract class for a special inference process
186 |         # 2. (recommended) override self.dnn_predict in the abstract class
187 |         return self.dnn_predict( # uniform inference paradigm
188 | dnn_predict_func=inference_step,
189 | dev_loader=dev_loader,
190 | X_num=X_num, X_cat=X_cat, ys=ys, y_std=y_std, task=task,
191 | return_probs=return_probs, return_metric=return_metric, return_loss=return_loss,
192 | meta_args=meta_args
193 | )
194 |
195 | def save(self, output_dir):
196 | check_dir(output_dir)
197 | self.save_pt_model(output_dir)
198 | self.save_history(output_dir)
199 | self.save_config(output_dir)
--------------------------------------------------------------------------------
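A quick sanity check of `CrossLayer` above: it implements the DCN-V2 cross interaction `x_{l+1} = x_0 * (W x_l + b) + x_l` (with dropout on the multiplicative term), so every cross step keeps the feature dimension. A minimal sketch:

```python
import torch
from models.dcnv2 import CrossLayer

x0 = torch.randn(32, 16)   # output of the first linear projection in _DCNv2
cross = CrossLayer(d=16, dropout=0.1)
x1 = cross(x0, x0)         # x1 = x0 * (W x0 + b) + x0 (dropout applied to the product term)
x2 = cross(x0, x1)         # deeper steps keep re-mixing with the original x0
assert x0.shape == x1.shape == x2.shape
```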
/models/ft_transformer.py:
--------------------------------------------------------------------------------
1 | import rtdl
2 | import time
3 | import typing as ty
4 | import numpy as np
5 |
6 | import torch
7 | from torch.utils.data import DataLoader
8 | from models.abstract import TabModel, check_dir
9 |
10 | class FTTransformer(TabModel):
11 | def __init__(
12 | self,
13 | model_config: dict,
14 | n_num_features: int,
15 | categories: ty.Optional[ty.List[int]],
16 | n_labels: int,
17 | device: ty.Union[str, torch.device] = 'cuda',
18 | ):
19 | super().__init__()
20 | model_config = self.preproc_config(model_config)
21 | self.model = rtdl.FTTransformer.make_baseline(
22 | n_num_features=n_num_features,
23 | cat_cardinalities=categories,
24 | d_out=n_labels,
25 | **model_config
26 | ).to(device)
27 | self.base_name = 'ft-transformer'
28 | self.device = torch.device(device)
29 |
30 | def preproc_config(self, model_config: dict):
31 | self.saved_model_config = model_config.copy()
32 | # process ftt configs
33 | if 'ffn_d_factor' in model_config:
34 | model_config['ffn_d_hidden'] = \
35 | int(model_config['d_token'] * model_config.pop('ffn_d_factor'))
36 | return model_config
37 |
38 | def fit(
39 | self,
40 |         # API for special samplers, e.g. curriculum learning
41 |         train_loader: ty.Optional[ty.Tuple[DataLoader, int]] = None, # (loader, missing_idx)
42 |         # uses a normal sampler if None
43 | X_num: ty.Optional[torch.Tensor] = None,
44 | X_cat: ty.Optional[torch.Tensor] = None,
45 | ys: ty.Optional[torch.Tensor] = None,
46 | y_std: ty.Optional[float] = None, # for RMSE
47 | eval_set: ty.Tuple[torch.Tensor, np.ndarray] = None,
48 | patience: int = 0,
49 | task: str = None,
50 | training_args: dict = None,
51 | meta_args: ty.Optional[dict] = None,
52 | ):
53 | def train_step(model, x_num, x_cat, y): # input is X and y
54 | # process input (model-specific)
55 | # define your running time calculation
56 | start_time = time.time()
57 | # define your model API
58 | logits = model(x_num, x_cat)
59 |             used_time = time.time() - start_time # backward time is not included here; it is added in the outer loop
60 | return logits, used_time
61 |
62 |         # to customize another training paradigm:
63 |         # 1. add self.dnn_fit2(...) in the abstract class for a special training process
64 |         # 2. (recommended) override self.dnn_fit in the abstract class
65 | self.dnn_fit( # uniform training paradigm
66 | dnn_fit_func=train_step,
67 | # training data
68 | train_loader=train_loader,
69 | X_num=X_num, X_cat=X_cat, ys=ys, y_std=y_std,
70 | # dev data
71 | eval_set=eval_set, patience=patience, task=task,
72 | # args
73 | training_args=training_args,
74 | meta_args=meta_args,
75 | )
76 |
77 | def predict(
78 | self,
79 | dev_loader: ty.Optional[ty.Tuple[DataLoader, int]] = None, # reuse, (loader, missing_idx)
80 | X_num: ty.Optional[torch.Tensor] = None,
81 | X_cat: ty.Optional[torch.Tensor] = None,
82 | ys: ty.Optional[torch.Tensor] = None,
83 | y_std: ty.Optional[float] = None, # for RMSE
84 | task: str = None,
85 | return_probs: bool = True,
86 | return_metric: bool = False,
87 | return_loss: bool = False,
88 | meta_args: ty.Optional[dict] = None,
89 | ):
90 | def inference_step(model, x_num, x_cat): # input only X (y inaccessible)
91 | """
92 | Inference Process
93 |             `no_grad` will be applied in `dnn_predict`
94 | """
95 | # process input (model-specific)
96 | # define your running time calculation
97 | start_time = time.time()
98 | # define your model API
99 | logits = model(x_num, x_cat)
100 | used_time = time.time() - start_time
101 | return logits, used_time
102 |
103 |         # to customize another inference paradigm:
104 |         # 1. add self.dnn_predict2(...) in the abstract class for a special inference process
105 |         # 2. (recommended) override self.dnn_predict in the abstract class
106 |         return self.dnn_predict( # uniform inference paradigm
107 | dnn_predict_func=inference_step,
108 | dev_loader=dev_loader,
109 | X_num=X_num, X_cat=X_cat, ys=ys, y_std=y_std, task=task,
110 | return_probs=return_probs, return_metric=return_metric, return_loss=return_loss,
111 | meta_args=meta_args
112 | )
113 |
114 | def save(self, output_dir):
115 | check_dir(output_dir)
116 | self.save_pt_model(output_dir)
117 | self.save_history(output_dir)
118 | self.save_config(output_dir)
--------------------------------------------------------------------------------
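As a worked example of `preproc_config` above: it converts a relative `ffn_d_factor` into the absolute `ffn_d_hidden` expected by `rtdl.FTTransformer.make_baseline`. The key names other than `d_token` and `ffn_d_factor` are assumptions based on the usual rtdl baseline arguments and are only illustrative:

```python
# illustrative config (keys other than d_token / ffn_d_factor are assumptions
# based on rtdl.FTTransformer.make_baseline)
model_config = {
    'd_token': 192,
    'n_blocks': 3,
    'attention_dropout': 0.2,
    'ffn_d_factor': 4 / 3,    # preproc_config: ffn_d_hidden = int(192 * 4 / 3) = 256
    'ffn_dropout': 0.1,
    'residual_dropout': 0.0,
}
# after preproc_config, 'ffn_d_factor' is popped and 'ffn_d_hidden' == 256
```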
/models/mlp.py:
--------------------------------------------------------------------------------
1 | # %%
2 | import os
3 | import json
4 | import math
5 | import time
6 | import typing as ty
7 | from pathlib import Path
8 |
9 | import numpy as np
10 | import torch
11 | import torch.nn as nn
12 | import torch.nn.functional as F
13 | from torch.utils.data import DataLoader
14 |
15 | from models.abstract import TabModel, check_dir
16 | # %%
17 | class _MLP(nn.Module):
18 | def __init__(
19 | self,
20 | d_in: int,
21 | d_layers: ty.List[int],
22 | dropout: float,
23 | d_out: int,
24 | categories: ty.Optional[ty.List[int]],
25 | d_embedding: int,
26 | ) -> None:
27 | super().__init__()
28 |
29 | self.n_categories = 0 if categories is None else len(categories)
30 | if categories is not None:
31 | d_in += len(categories) * d_embedding
32 | category_offsets = torch.tensor([0] + categories[:-1]).cumsum(0)
33 | self.register_buffer('category_offsets', category_offsets)
34 | self.category_embeddings = nn.Embedding(sum(categories), d_embedding)
35 | nn.init.kaiming_uniform_(self.category_embeddings.weight, a=math.sqrt(5))
36 | print(f'{self.category_embeddings.weight.shape}')
37 |
38 | self.layers = nn.ModuleList(
39 | [
40 | nn.Linear(d_layers[i - 1] if i else d_in, x)
41 | for i, x in enumerate(d_layers)
42 | ]
43 | )
44 | self.dropout = dropout
45 | self.head = nn.Linear(d_layers[-1] if d_layers else d_in, d_out)
46 |
47 | def forward(self, x_num, x_cat):
48 | x = []
49 | if x_num is not None:
50 | x.append(x_num)
51 | if x_cat is not None:
52 | x.append(
53 | self.category_embeddings(x_cat + self.category_offsets[None]).view(
54 | x_cat.size(0), -1
55 | )
56 | )
57 | x = torch.cat(x, dim=-1)
58 |
59 | for layer in self.layers:
60 | x = layer(x)
61 | x = F.relu(x)
62 | if self.dropout:
63 | x = F.dropout(x, self.dropout, self.training)
64 | x = self.head(x)
65 | x = x.squeeze(-1)
66 |
67 | return x
68 |
69 |
70 | # %%
71 | class MLP(TabModel):
72 | def __init__(
73 | self,
74 | model_config: dict,
75 | n_num_features: int,
76 | categories: ty.Optional[ty.List[int]],
77 | n_labels: int,
78 | device: ty.Union[str, torch.device] = 'cuda',
79 | ):
80 | super().__init__()
81 | model_config = self.preproc_config(model_config)
82 | self.model = _MLP(
83 | d_in=n_num_features,
84 | categories=categories,
85 | d_out=n_labels,
86 | **model_config
87 | ).to(device)
88 | self.base_name = 'mlp'
89 | self.device = torch.device(device)
90 |
91 | def preproc_config(self, model_config: dict):
92 | """MLP config preprocessing"""
93 | # process mlp configs
94 | self.saved_model_config = model_config.copy()
95 | d_layers = []
96 | n_layers, first_dim, mid_dim, last_dim = \
97 | (
98 | model_config.pop('n_layers'), model_config.pop('first_dim'),
99 | model_config.pop('mid_dim'), model_config.pop('last_dim')
100 | )
101 | for i in range(n_layers):
102 | if i == 0:
103 | d_layers.append(first_dim)
104 | elif i == n_layers - 1 and n_layers > 1:
105 | d_layers.append(last_dim)
106 | else:
107 | d_layers.append(mid_dim)
108 | model_config['d_layers'] = d_layers
109 | return model_config
110 |
111 | def fit(
112 | self,
113 |         # API for special samplers, e.g. curriculum learning
114 |         train_loader: ty.Optional[ty.Tuple[DataLoader, int]] = None, # (loader, missing_idx)
115 |         # uses a normal sampler if None
116 | X_num: ty.Optional[torch.Tensor] = None,
117 | X_cat: ty.Optional[torch.Tensor] = None,
118 | ys: ty.Optional[torch.Tensor] = None,
119 | y_std: ty.Optional[float] = None, # for RMSE
120 | eval_set: ty.Tuple[torch.Tensor, np.ndarray] = None,
121 | patience: int = 0,
122 | task: str = None,
123 | training_args: dict = None,
124 | meta_args: ty.Optional[dict] = None,
125 | ):
126 | def train_step(model, x_num, x_cat, y): # input is X and y
127 | # process input (model-specific)
128 | # define your running time calculation
129 | start_time = time.time()
130 | # define your model API
131 | logits = model(x_num, x_cat)
132 |             used_time = time.time() - start_time # backward time is not included here; it is added in the outer loop
133 | return logits, used_time
134 |
135 |         # to customize another training paradigm:
136 |         # 1. add self.dnn_fit2(...) in the abstract class for a special training process
137 |         # 2. (recommended) override self.dnn_fit in the abstract class
138 | self.dnn_fit( # uniform training paradigm
139 | dnn_fit_func=train_step,
140 | # training data
141 | train_loader=train_loader,
142 | X_num=X_num, X_cat=X_cat, ys=ys, y_std=y_std,
143 | # dev data
144 | eval_set=eval_set, patience=patience, task=task,
145 | # args
146 | training_args=training_args,
147 | meta_args=meta_args,
148 | )
149 |
150 | def predict(
151 | self,
152 | dev_loader: ty.Optional[ty.Tuple[DataLoader, int]] = None, # reuse, (loader, missing_idx)
153 | X_num: ty.Optional[torch.Tensor] = None,
154 | X_cat: ty.Optional[torch.Tensor] = None,
155 | ys: ty.Optional[torch.Tensor] = None,
156 | y_std: ty.Optional[float] = None, # for RMSE
157 | task: str = None,
158 | return_probs: bool = True,
159 | return_metric: bool = False,
160 | return_loss: bool = False,
161 | meta_args: ty.Optional[dict] = None,
162 | ):
163 | def inference_step(model, x_num, x_cat): # input only X (y inaccessible)
164 | """
165 | Inference Process
166 |             `no_grad` will be applied in `dnn_predict`
167 | """
168 | # process input (model-specific)
169 | # define your running time calculation
170 | start_time = time.time()
171 | # define your model API
172 | logits = model(x_num, x_cat)
173 | used_time = time.time() - start_time
174 | return logits, used_time
175 |
176 |         # to customize another inference paradigm:
177 |         # 1. add self.dnn_predict2(...) in the abstract class for a special inference process
178 |         # 2. (recommended) override self.dnn_predict in the abstract class
179 |         return self.dnn_predict( # uniform inference paradigm
180 | dnn_predict_func=inference_step,
181 | dev_loader=dev_loader,
182 | X_num=X_num, X_cat=X_cat, ys=ys, y_std=y_std, task=task,
183 | return_probs=return_probs, return_metric=return_metric, return_loss=return_loss,
184 | meta_args=meta_args
185 | )
186 |
187 | def save(self, output_dir):
188 | check_dir(output_dir)
189 | self.save_pt_model(output_dir)
190 | self.save_history(output_dir)
191 | self.save_config(output_dir)
--------------------------------------------------------------------------------
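To make the `preproc_config` mapping above concrete, here is how the four scalar layer knobs expand into `d_layers` (a small illustration that mirrors the loop in the file):

```python
# mirrors MLP.preproc_config: first layer -> first_dim, last layer -> last_dim
# (when n_layers > 1), everything in between -> mid_dim
config = {'n_layers': 4, 'first_dim': 256, 'mid_dim': 128, 'last_dim': 64}
# resulting d_layers == [256, 128, 128, 64]
# with n_layers == 1 only first_dim is used: d_layers == [256]
```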
/models/node/__init__.py:
--------------------------------------------------------------------------------
1 | # Source: https://github.com/Qwicen/node
2 | from .arch import * # noqa
3 | from .nn_utils import * # noqa
4 | from .odst import * # noqa
5 | from .utils import * # noqa
6 |
--------------------------------------------------------------------------------
/models/node/arch.py:
--------------------------------------------------------------------------------
1 | # Source: https://github.com/Qwicen/node
2 | import torch
3 | import torch.nn as nn
4 | import torch.nn.functional as F
5 | from torch.utils.checkpoint import checkpoint as torch_checkpoint
6 |
7 | from .odst import ODST
8 |
9 |
10 | class DenseBlock(nn.Sequential):
11 | def __init__(self, input_dim, layer_dim, num_layers, tree_dim=1, max_features=None,
12 | input_dropout=0.0, flatten_output=True, Module=ODST, **kwargs):
13 | layers = []
14 | for i in range(num_layers):
15 | oddt = Module(input_dim, layer_dim, tree_dim=tree_dim, flatten_output=True, **kwargs)
16 | input_dim = min(input_dim + layer_dim * tree_dim, max_features or float('inf'))
17 | layers.append(oddt)
18 |
19 | super().__init__(*layers)
20 | self.num_layers, self.layer_dim, self.tree_dim = num_layers, layer_dim, tree_dim
21 | self.max_features, self.flatten_output = max_features, flatten_output
22 | self.input_dropout = input_dropout
23 |
24 | def forward(self, x):
25 | initial_features = x.shape[-1]
26 | for layer in self:
27 | layer_inp = x
28 | if self.max_features is not None:
29 | tail_features = min(self.max_features, layer_inp.shape[-1]) - initial_features
30 | if tail_features != 0:
31 | layer_inp = torch.cat([layer_inp[..., :initial_features], layer_inp[..., -tail_features:]], dim=-1)
32 | if self.training and self.input_dropout:
33 | layer_inp = F.dropout(layer_inp, self.input_dropout)
34 | h = layer(layer_inp)
35 | x = torch.cat([x, h], dim=-1)
36 |
37 | outputs = x[..., initial_features:]
38 | if not self.flatten_output:
39 | outputs = outputs.view(*outputs.shape[:-1], self.num_layers * self.layer_dim, self.tree_dim)
40 | return outputs
41 |
--------------------------------------------------------------------------------
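`DenseBlock` above concatenates each layer's output to its input (DenseNet-style) and, with `flatten_output=True`, returns `num_layers * layer_dim * tree_dim` features. A small shape-check sketch using a stand-in for `ODST` (the stand-in is an assumption for illustration; the real block uses `ODST` from `models/node/odst.py`):

```python
import torch
import torch.nn as nn
from models.node.arch import DenseBlock


class DummyTree(nn.Module):
    """Stand-in with ODST's call shape: Module(in_features, num_trees, tree_dim=..., flatten_output=...)."""
    def __init__(self, in_features, num_trees, tree_dim=1, flatten_output=True, **kwargs):
        super().__init__()
        self.linear = nn.Linear(in_features, num_trees * tree_dim)

    def forward(self, x):
        return self.linear(x)


block = DenseBlock(input_dim=10, layer_dim=8, num_layers=3, tree_dim=2, Module=DummyTree)
out = block(torch.randn(4, 10))
assert out.shape == (4, 3 * 8 * 2)  # num_layers * layer_dim * tree_dim
```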
/models/node/nn_utils.py:
--------------------------------------------------------------------------------
1 | # Source: https://github.com/Qwicen/node
2 | import contextlib
3 | from collections import OrderedDict
4 |
5 | import numpy as np
6 | import torch
7 | import torch.nn as nn
8 | import torch.nn.functional as F
9 | from torch.autograd import Function
10 | from torch.jit import script
11 |
12 |
13 | def to_one_hot(y, depth=None):
14 | r"""
15 | Takes integer with n dims and converts it to 1-hot representation with n + 1 dims.
16 | The n+1'st dimension will have zeros everywhere but at y'th index, where it will be equal to 1.
17 | Args:
18 | y: input integer (IntTensor, LongTensor or Variable) of any shape
19 | depth (int): the size of the one hot dimension
20 | """
21 | y_flat = y.to(torch.int64).view(-1, 1)
22 | depth = depth if depth is not None else int(torch.max(y_flat)) + 1
23 | y_one_hot = torch.zeros(y_flat.size()[0], depth, device=y.device).scatter_(1, y_flat, 1)
24 | y_one_hot = y_one_hot.view(*(tuple(y.shape) + (-1,)))
25 | return y_one_hot
26 |
27 |
28 | def _make_ix_like(input, dim=0):
29 | d = input.size(dim)
30 | rho = torch.arange(1, d + 1, device=input.device, dtype=input.dtype)
31 | view = [1] * input.dim()
32 | view[0] = -1
33 | return rho.view(view).transpose(0, dim)
34 |
35 |
36 | class SparsemaxFunction(Function):
37 | """
38 | An implementation of sparsemax (Martins & Astudillo, 2016). See
39 | :cite:`DBLP:journals/corr/MartinsA16` for detailed description.
40 |
41 | By Ben Peters and Vlad Niculae
42 | """
43 |
44 | @staticmethod
45 | def forward(ctx, input, dim=-1):
46 | """sparsemax: normalizing sparse transform (a la softmax)
47 |
48 | Parameters:
49 | input (Tensor): any shape
50 | dim: dimension along which to apply sparsemax
51 |
52 | Returns:
53 | output (Tensor): same shape as input
54 | """
55 | ctx.dim = dim
56 | max_val, _ = input.max(dim=dim, keepdim=True)
57 | input -= max_val # same numerical stability trick as for softmax
58 | tau, supp_size = SparsemaxFunction._threshold_and_support(input, dim=dim)
59 | output = torch.clamp(input - tau, min=0)
60 | ctx.save_for_backward(supp_size, output)
61 | return output
62 |
63 | @staticmethod
64 | def backward(ctx, grad_output):
65 | supp_size, output = ctx.saved_tensors
66 | dim = ctx.dim
67 | grad_input = grad_output.clone()
68 | grad_input[output == 0] = 0
69 |
70 | v_hat = grad_input.sum(dim=dim) / supp_size.to(output.dtype).squeeze()
71 | v_hat = v_hat.unsqueeze(dim)
72 | grad_input = torch.where(output != 0, grad_input - v_hat, grad_input)
73 | return grad_input, None
74 |
75 |
76 | @staticmethod
77 | def _threshold_and_support(input, dim=-1):
78 | """Sparsemax building block: compute the threshold
79 |
80 | Args:
81 | input: any dimension
82 | dim: dimension along which to apply the sparsemax
83 |
84 | Returns:
85 |             the threshold value and the support size
86 | """
87 |
88 | input_srt, _ = torch.sort(input, descending=True, dim=dim)
89 | input_cumsum = input_srt.cumsum(dim) - 1
90 | rhos = _make_ix_like(input, dim)
91 | support = rhos * input_srt > input_cumsum
92 |
93 | support_size = support.sum(dim=dim).unsqueeze(dim)
94 | tau = input_cumsum.gather(dim, support_size - 1)
95 | tau /= support_size.to(input.dtype)
96 | return tau, support_size
97 |
98 |
99 | sparsemax = lambda input, dim=-1: SparsemaxFunction.apply(input, dim)
100 | sparsemoid = lambda input: (0.5 * input + 0.5).clamp_(0, 1)
101 |
102 |
103 | class Entmax15Function(Function):
104 | """
105 | An implementation of exact Entmax with alpha=1.5 (B. Peters, V. Niculae, A. Martins). See
106 |     :cite:`https://arxiv.org/abs/1905.05702` for a detailed description.
107 | Source: https://github.com/deep-spin/entmax
108 | """
109 |
110 | @staticmethod
111 | def forward(ctx, input, dim=-1):
112 | ctx.dim = dim
113 |
114 | max_val, _ = input.max(dim=dim, keepdim=True)
115 | input = input - max_val # same numerical stability trick as for softmax
116 | input = input / 2 # divide by 2 to solve actual Entmax
117 |
118 | tau_star, _ = Entmax15Function._threshold_and_support(input, dim)
119 | output = torch.clamp(input - tau_star, min=0) ** 2
120 | ctx.save_for_backward(output)
121 | return output
122 |
123 | @staticmethod
124 | def backward(ctx, grad_output):
125 | Y, = ctx.saved_tensors
126 | gppr = Y.sqrt() # = 1 / g'' (Y)
127 | dX = grad_output * gppr
128 | q = dX.sum(ctx.dim) / gppr.sum(ctx.dim)
129 | q = q.unsqueeze(ctx.dim)
130 | dX -= q * gppr
131 | return dX, None
132 |
133 | @staticmethod
134 | def _threshold_and_support(input, dim=-1):
135 | Xsrt, _ = torch.sort(input, descending=True, dim=dim)
136 |
137 | rho = _make_ix_like(input, dim)
138 | mean = Xsrt.cumsum(dim) / rho
139 | mean_sq = (Xsrt ** 2).cumsum(dim) / rho
140 | ss = rho * (mean_sq - mean ** 2)
141 | delta = (1 - ss) / rho
142 |
143 | # NOTE this is not exactly the same as in reference algo
144 | # Fortunately it seems the clamped values never wrongly
145 | # get selected by tau <= sorted_z. Prove this!
146 | delta_nz = torch.clamp(delta, 0)
147 | tau = mean - torch.sqrt(delta_nz)
148 |
149 | support_size = (tau <= Xsrt).sum(dim).unsqueeze(dim)
150 | tau_star = tau.gather(dim, support_size - 1)
151 | return tau_star, support_size
152 |
153 |
154 | class Entmoid15(Function):
155 | """ A highly optimized equivalent of labda x: Entmax15([x, 0]) """
156 |
157 | @staticmethod
158 | def forward(ctx, input):
159 | output = Entmoid15._forward(input)
160 | ctx.save_for_backward(output)
161 | return output
162 |
163 | @staticmethod
164 | @script
165 | def _forward(input):
166 | input, is_pos = abs(input), input >= 0
167 | tau = (input + torch.sqrt(F.relu(8 - input ** 2))) / 2
168 | tau.masked_fill_(tau <= input, 2.0)
169 | y_neg = 0.25 * F.relu(tau - input, inplace=True) ** 2
170 | return torch.where(is_pos, 1 - y_neg, y_neg)
171 |
172 | @staticmethod
173 | def backward(ctx, grad_output):
174 | return Entmoid15._backward(ctx.saved_tensors[0], grad_output)
175 |
176 | @staticmethod
177 | @script
178 | def _backward(output, grad_output):
179 | gppr0, gppr1 = output.sqrt(), (1 - output).sqrt()
180 | grad_input = grad_output * gppr0
181 | q = grad_input / (gppr0 + gppr1)
182 | grad_input -= q * gppr0
183 | return grad_input
184 |
185 |
186 | entmax15 = lambda input, dim=-1: Entmax15Function.apply(input, dim)
187 | entmoid15 = Entmoid15.apply
188 |
189 |
190 | class Lambda(nn.Module):
191 | def __init__(self, func):
192 | super().__init__()
193 | self.func = func
194 |
195 | def forward(self, *args, **kwargs):
196 | return self.func(*args, **kwargs)
197 |
198 |
199 | class ModuleWithInit(nn.Module):
200 | """ Base class for pytorch module with data-aware initializer on first batch """
201 | def __init__(self):
202 | super().__init__()
203 | self._is_initialized_tensor = nn.Parameter(torch.tensor(0, dtype=torch.uint8), requires_grad=False)
204 | self._is_initialized_bool = None
205 | # Note: this module uses a separate flag self._is_initialized so as to achieve both
206 | # * persistence: is_initialized is saved alongside model in state_dict
207 |         # * speed: the cached bool flag avoids reading the tensor (and a device sync) on every forward
208 | # please DO NOT use these flags in child modules
209 |
210 | def initialize(self, *args, **kwargs):
211 | """ initialize module tensors using first batch of data """
212 |         raise NotImplementedError("Please implement initialize(*args, **kwargs) in the child module")
213 |
214 | def __call__(self, *args, **kwargs):
215 | if self._is_initialized_bool is None:
216 | self._is_initialized_bool = bool(self._is_initialized_tensor.item())
217 | if not self._is_initialized_bool:
218 | self.initialize(*args, **kwargs)
219 | self._is_initialized_tensor.data[...] = 1
220 | self._is_initialized_bool = True
221 | return super().__call__(*args, **kwargs)
222 |
--------------------------------------------------------------------------------
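
Usage note (editorial, not a file in the repository): the sparse activations defined above are drop-in alternatives to `softmax`/`sigmoid` that can output exact zeros, which is what lets NODE select a small subset of features per split. A minimal sketch, assuming the package is importable from the repository root; shapes and values are illustrative only.

```python
import torch

from models.node.nn_utils import entmax15, entmoid15, sparsemax, sparsemoid

logits = torch.randn(4, 10, requires_grad=True)

p_sparse = sparsemax(logits, dim=-1)   # rows sum to 1, many entries are exactly 0
p_ent15 = entmax15(logits, dim=-1)     # interpolates between softmax and sparsemax
assert torch.allclose(p_sparse.sum(dim=-1), torch.ones(4), atol=1e-5)
assert torch.allclose(p_ent15.sum(dim=-1), torch.ones(4), atol=1e-5)

# the "-moid" variants are element-wise two-class versions used as bin functions in ODST
gates = entmoid15(logits)              # values in [0, 1]
soft_bins = sparsemoid(logits)         # piecewise-linear, clamped to [0, 1]

# both transforms define custom backward passes, so they can sit inside a training graph
p_sparse.sum().backward()
print(logits.grad.shape)               # torch.Size([4, 10])
```
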
/models/node/odst.py:
--------------------------------------------------------------------------------
1 | # Source: https://github.com/Qwicen/node
2 | from warnings import warn
3 |
4 | import numpy as np
5 | import torch
6 | import torch.nn as nn
7 | import torch.nn.functional as F
8 |
9 | from .nn_utils import ModuleWithInit, sparsemax, sparsemoid
10 | from .utils import check_numpy
11 |
12 |
13 | class ODST(ModuleWithInit):
14 | def __init__(self, in_features, num_trees, depth=6, tree_dim=1, flatten_output=True,
15 | choice_function=sparsemax, bin_function=sparsemoid,
16 | initialize_response_=nn.init.normal_, initialize_selection_logits_=nn.init.uniform_,
17 | threshold_init_beta=1.0, threshold_init_cutoff=1.0,
18 | ):
19 | """
20 | Oblivious Differentiable Sparsemax Trees. http://tinyurl.com/odst-readmore
21 | One can drop (sic!) this module anywhere instead of nn.Linear
22 | :param in_features: number of features in the input tensor
23 | :param num_trees: number of trees in this layer
24 | :param tree_dim: number of response channels in the response of individual tree
25 | :param depth: number of splits in every tree
26 | :param flatten_output: if False, returns [..., num_trees, tree_dim],
27 | by default returns [..., num_trees * tree_dim]
28 | :param choice_function: f(tensor, dim) -> R_simplex computes feature weights s.t. f(tensor, dim).sum(dim) == 1
29 | :param bin_function: f(tensor) -> R[0, 1], computes tree leaf weights
30 |
31 | :param initialize_response_: in-place initializer for tree output tensor
32 | :param initialize_selection_logits_: in-place initializer for logits that select features for the tree
33 | both thresholds and scales are initialized with data-aware init (or .load_state_dict)
34 | :param threshold_init_beta: initializes threshold to a q-th quantile of data points
35 | where q ~ Beta(:threshold_init_beta:, :threshold_init_beta:)
36 | If this param is set to 1, initial thresholds will have the same distribution as data points
37 | If greater than 1 (e.g. 10), thresholds will be closer to median data value
38 | If less than 1 (e.g. 0.1), thresholds will approach min/max data values.
39 |
40 | :param threshold_init_cutoff: threshold log-temperatures initializer, \in (0, inf)
41 |             By default (1.0), log-temperatures are initialized in such a way that all bin selectors
42 | end up in the linear region of sparse-sigmoid. The temperatures are then scaled by this parameter.
43 | Setting this value > 1.0 will result in some margin between data points and sparse-sigmoid cutoff value
44 | Setting this value < 1.0 will cause (1 - value) part of data points to end up in flat sparse-sigmoid region
45 | For instance, threshold_init_cutoff = 0.9 will set 10% points equal to 0.0 or 1.0
46 | Setting this value > 1.0 will result in a margin between data points and sparse-sigmoid cutoff value
47 | All points will be between (0.5 - 0.5 / threshold_init_cutoff) and (0.5 + 0.5 / threshold_init_cutoff)
48 | """
49 | super().__init__()
50 | self.depth, self.num_trees, self.tree_dim, self.flatten_output = depth, num_trees, tree_dim, flatten_output
51 | self.choice_function, self.bin_function = choice_function, bin_function
52 | self.threshold_init_beta, self.threshold_init_cutoff = threshold_init_beta, threshold_init_cutoff
53 |
54 | self.response = nn.Parameter(torch.zeros([num_trees, tree_dim, 2 ** depth]), requires_grad=True)
55 | initialize_response_(self.response)
56 |
57 | self.feature_selection_logits = nn.Parameter(
58 | torch.zeros([in_features, num_trees, depth]), requires_grad=True
59 | )
60 | initialize_selection_logits_(self.feature_selection_logits)
61 |
62 | self.feature_thresholds = nn.Parameter(
63 | torch.full([num_trees, depth], float('nan'), dtype=torch.float32), requires_grad=True
64 | ) # nan values will be initialized on first batch (data-aware init)
65 |
66 | self.log_temperatures = nn.Parameter(
67 | torch.full([num_trees, depth], float('nan'), dtype=torch.float32), requires_grad=True
68 | )
69 |
70 | # binary codes for mapping between 1-hot vectors and bin indices
71 | with torch.no_grad():
72 | indices = torch.arange(2 ** self.depth)
73 | offsets = 2 ** torch.arange(self.depth)
74 | bin_codes = (indices.view(1, -1) // offsets.view(-1, 1) % 2).to(torch.float32)
75 | bin_codes_1hot = torch.stack([bin_codes, 1.0 - bin_codes], dim=-1)
76 | self.bin_codes_1hot = nn.Parameter(bin_codes_1hot, requires_grad=False)
77 | # ^-- [depth, 2 ** depth, 2]
78 |
79 | def forward(self, input):
80 | assert len(input.shape) >= 2
81 | if len(input.shape) > 2:
82 | return self.forward(input.view(-1, input.shape[-1])).view(*input.shape[:-1], -1)
83 | # new input shape: [batch_size, in_features]
84 |
85 | feature_logits = self.feature_selection_logits
86 | feature_selectors = self.choice_function(feature_logits, dim=0)
87 | # ^--[in_features, num_trees, depth]
88 |
89 | feature_values = torch.einsum('bi,ind->bnd', input, feature_selectors)
90 | # ^--[batch_size, num_trees, depth]
91 |
92 | threshold_logits = (feature_values - self.feature_thresholds) * torch.exp(-self.log_temperatures)
93 |
94 | threshold_logits = torch.stack([-threshold_logits, threshold_logits], dim=-1)
95 | # ^--[batch_size, num_trees, depth, 2]
96 |
97 | bins = self.bin_function(threshold_logits)
98 | # ^--[batch_size, num_trees, depth, 2], approximately binary
99 |
100 | bin_matches = torch.einsum('btds,dcs->btdc', bins, self.bin_codes_1hot)
101 | # ^--[batch_size, num_trees, depth, 2 ** depth]
102 |
103 | response_weights = torch.prod(bin_matches, dim=-2)
104 | # ^-- [batch_size, num_trees, 2 ** depth]
105 |
106 | response = torch.einsum('bnd,ncd->bnc', response_weights, self.response)
107 | # ^-- [batch_size, num_trees, tree_dim]
108 |
109 | return response.flatten(1, 2) if self.flatten_output else response
110 |
111 | def initialize(self, input, eps=1e-6):
112 | # data-aware initializer
113 | assert len(input.shape) == 2
114 | if input.shape[0] < 1000:
115 | warn("Data-aware initialization is performed on less than 1000 data points. This may cause instability."
116 | "To avoid potential problems, run this model on a data batch with at least 1000 data samples."
117 | "You can do so manually before training. Use with torch.no_grad() for memory efficiency.")
118 | with torch.no_grad():
119 | feature_selectors = self.choice_function(self.feature_selection_logits, dim=0)
120 | # ^--[in_features, num_trees, depth]
121 |
122 | feature_values = torch.einsum('bi,ind->bnd', input, feature_selectors)
123 | # ^--[batch_size, num_trees, depth]
124 |
125 | # initialize thresholds: sample random percentiles of data
126 | percentiles_q = 100 * np.random.beta(self.threshold_init_beta, self.threshold_init_beta,
127 | size=[self.num_trees, self.depth])
128 | self.feature_thresholds.data[...] = torch.as_tensor(
129 | list(map(np.percentile, check_numpy(feature_values.flatten(1, 2).t()), percentiles_q.flatten())),
130 | dtype=feature_values.dtype, device=feature_values.device
131 | ).view(self.num_trees, self.depth)
132 |
133 | # init temperatures: make sure enough data points are in the linear region of sparse-sigmoid
134 | temperatures = np.percentile(check_numpy(abs(feature_values - self.feature_thresholds)),
135 | q=100 * min(1.0, self.threshold_init_cutoff), axis=0)
136 |
137 | # if threshold_init_cutoff > 1, scale everything down by it
138 | temperatures /= max(1.0, self.threshold_init_cutoff)
139 | self.log_temperatures.data[...] = torch.log(torch.as_tensor(temperatures) + eps)
140 |
141 | def __repr__(self):
142 | return "{}(in_features={}, num_trees={}, depth={}, tree_dim={}, flatten_output={})".format(
143 | self.__class__.__name__, self.feature_selection_logits.shape[0],
144 | self.num_trees, self.depth, self.tree_dim, self.flatten_output
145 | )
146 |
147 |
--------------------------------------------------------------------------------
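
Usage note (editorial): `ODST` is intended as a drop-in replacement for `nn.Linear`, with thresholds and temperatures filled in by the data-aware initializer on the first forward pass. A minimal sketch on synthetic data, assuming default hyperparameters otherwise.

```python
import torch

from models.node.odst import ODST

layer = ODST(in_features=20, num_trees=16, depth=4, tree_dim=3, flatten_output=True)

# the first (sufficiently large) batch triggers ODST.initialize(...) via ModuleWithInit;
# >= 1000 rows keeps the data-aware initialization warning quiet
with torch.no_grad():
    layer(torch.randn(2000, 20))

out = layer(torch.randn(32, 20))
print(out.shape)  # torch.Size([32, 48]) == [batch, num_trees * tree_dim]
```
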
/models/node/utils.py:
--------------------------------------------------------------------------------
1 | # Source: https://github.com/Qwicen/node
2 | import contextlib
3 | import gc
4 | import glob
5 | import hashlib
6 | import os
7 | import time
8 |
9 | import numpy as np
10 | import requests
11 | import torch
12 | from tqdm import tqdm
13 |
14 |
15 | def download(url, filename, delete_if_interrupted=True, chunk_size=4096):
16 | """ saves file from url to filename with a fancy progressbar """
17 | try:
18 | with open(filename, "wb") as f:
19 | print("Downloading {} > {}".format(url, filename))
20 | response = requests.get(url, stream=True)
21 | total_length = response.headers.get('content-length')
22 |
23 | if total_length is None: # no content length header
24 | f.write(response.content)
25 | else:
26 | total_length = int(total_length)
27 | with tqdm(total=total_length) as progressbar:
28 | for data in response.iter_content(chunk_size=chunk_size):
29 | if data: # filter-out keep-alive chunks
30 | f.write(data)
31 | progressbar.update(len(data))
32 | except Exception as e:
33 | if delete_if_interrupted:
34 | print("Removing incomplete download {}.".format(filename))
35 | os.remove(filename)
36 | raise e
37 | return filename
38 |
39 |
40 | def iterate_minibatches(*tensors, batch_size, shuffle=True, epochs=1,
41 | allow_incomplete=True, callback=lambda x:x):
42 | indices = np.arange(len(tensors[0]))
43 | upper_bound = int((np.ceil if allow_incomplete else np.floor) (len(indices) / batch_size)) * batch_size
44 | epoch = 0
45 | while True:
46 | if shuffle:
47 | np.random.shuffle(indices)
48 | for batch_start in callback(range(0, upper_bound, batch_size)):
49 | batch_ix = indices[batch_start: batch_start + batch_size]
50 | batch = [tensor[batch_ix] for tensor in tensors]
51 | yield batch if len(tensors) > 1 else batch[0]
52 | epoch += 1
53 | if epoch >= epochs:
54 | break
55 |
56 |
57 | def process_in_chunks(function, *args, batch_size, out=None, **kwargs):
58 | """
59 | Computes output by applying batch-parallel function to large data tensor in chunks
60 | :param function: a function(*[x[indices, ...] for x in args]) -> out[indices, ...]
61 | :param args: one or many tensors, each [num_instances, ...]
62 | :param batch_size: maximum chunk size processed in one go
63 | :param out: memory buffer for out, defaults to torch.zeros of appropriate size and type
64 | :returns: function(data), computed in a memory-efficient way
65 | """
66 | total_size = args[0].shape[0]
67 | first_output = function(*[x[0: batch_size] for x in args])
68 | output_shape = (total_size,) + tuple(first_output.shape[1:])
69 | if out is None:
70 | out = torch.zeros(*output_shape, dtype=first_output.dtype, device=first_output.device,
71 | layout=first_output.layout, **kwargs)
72 |
73 | out[0: batch_size] = first_output
74 | for i in range(batch_size, total_size, batch_size):
75 | batch_ix = slice(i, min(i + batch_size, total_size))
76 | out[batch_ix] = function(*[x[batch_ix] for x in args])
77 | return out
78 |
79 |
80 | def check_numpy(x):
81 | """ Makes sure x is a numpy array """
82 | if isinstance(x, torch.Tensor):
83 | x = x.detach().cpu().numpy()
84 | x = np.asarray(x)
85 | assert isinstance(x, np.ndarray)
86 | return x
87 |
88 |
89 | @contextlib.contextmanager
90 | def nop_ctx():
91 | yield None
92 |
93 |
94 | def get_latest_file(pattern):
95 | list_of_files = glob.glob(pattern) # * means all if need specific format then *.csv
96 | assert len(list_of_files) > 0, "No files found: " + pattern
97 | return max(list_of_files, key=os.path.getctime)
98 |
99 |
100 | def md5sum(fname):
101 | """ Computes mdp checksum of a file """
102 | hash_md5 = hashlib.md5()
103 | with open(fname, "rb") as f:
104 | for chunk in iter(lambda: f.read(4096), b""):
105 | hash_md5.update(chunk)
106 | return hash_md5.hexdigest()
107 |
108 |
109 | def free_memory(sleep_time=0.1):
110 | """ Black magic function to free torch memory and some jupyter whims """
111 | gc.collect()
112 | torch.cuda.synchronize()
113 | gc.collect()
114 | torch.cuda.empty_cache()
115 | time.sleep(sleep_time)
116 |
117 | def to_float_str(element):
118 | try:
119 | return str(float(element))
120 | except ValueError:
121 | return element
122 |
--------------------------------------------------------------------------------
/models/node_model.py:
--------------------------------------------------------------------------------
1 | # %%
2 | import time
3 | import math
4 | import typing as ty
5 | from pathlib import Path
6 |
7 | import numpy as np
8 | import torch
9 | import torch.nn as nn
10 | import torch.nn.functional as F
11 | from torch.utils.data import DataLoader
12 | from torch import Tensor
13 |
14 | import models.node as node
15 | from models.abstract import TabModel, check_dir
16 |
17 | # %%
18 | class _NODE(nn.Module):
19 | def __init__(
20 | self,
21 | *,
22 | d_in: int,
23 | num_layers: int,
24 | layer_dim: int,
25 | depth: int,
26 | tree_dim: int,
27 | choice_function: str,
28 | bin_function: str,
29 | d_out: int,
30 | categories: ty.Optional[ty.List[int]],
31 | d_embedding: int,
32 | ) -> None:
33 | super().__init__()
34 |
35 | if categories is not None:
36 | d_in += len(categories) * d_embedding
37 | category_offsets = torch.tensor([0] + categories[:-1]).cumsum(0)
38 | self.register_buffer('category_offsets', category_offsets)
39 | self.category_embeddings = nn.Embedding(sum(categories), d_embedding)
40 | nn.init.kaiming_uniform_(self.category_embeddings.weight, a=math.sqrt(5))
41 | print(f'{self.category_embeddings.weight.shape}')
42 |
43 | self.d_out = d_out
44 | self.block = node.DenseBlock(
45 | input_dim=d_in,
46 | num_layers=num_layers,
47 | layer_dim=layer_dim,
48 | depth=depth,
49 | tree_dim=tree_dim,
50 | bin_function=getattr(node, bin_function),
51 | choice_function=getattr(node, choice_function),
52 | flatten_output=False,
53 | )
54 |
55 |     def forward(self, x_num: Tensor, x_cat: ty.Optional[Tensor]) -> Tensor:
56 | if x_cat is not None:
57 | x_cat = self.category_embeddings(x_cat + self.category_offsets[None])
58 | x = torch.cat([x_num, x_cat.view(x_cat.size(0), -1)], dim=-1)
59 | else:
60 | x = x_num
61 |
62 | x = self.block(x)
63 | x = x[..., : self.d_out].mean(dim=-2)
64 | x = x.squeeze(-1)
65 | return x
66 |
67 |
68 | # %%
69 | class NODE(TabModel):
70 | def __init__(
71 | self,
72 | model_config: dict,
73 | n_num_features: int,
74 | categories: ty.Optional[ty.List[int]],
75 | n_labels: int,
76 | device: ty.Union[str, torch.device] = 'cuda',
77 | ):
78 | super().__init__()
79 | model_config = self.preproc_config(model_config)
80 | self.model = _NODE(
81 | d_in=n_num_features,
82 | categories=categories,
83 | d_out=n_labels,
84 | tree_dim=n_labels,
85 | **model_config
86 | ).to(device)
87 | self.base_name = 'node'
88 | self.device = torch.device(device)
89 |
90 | def preproc_config(self, model_config: dict):
91 |         # process node configs
92 | self.saved_model_config = model_config.copy()
93 | return model_config
94 |
95 | def fit(
96 | self,
97 |         # API for special samplers like curriculum learning
98 | train_loader: ty.Optional[ty.Tuple[DataLoader, int]] = None, # (loader, missing_idx)
99 | # using normal sampler if is None
100 | X_num: ty.Optional[torch.Tensor] = None,
101 | X_cat: ty.Optional[torch.Tensor] = None,
102 | ys: ty.Optional[torch.Tensor] = None,
103 | y_std: ty.Optional[float] = None, # for RMSE
104 | eval_set: ty.Tuple[torch.Tensor, np.ndarray] = None,
105 | patience: int = 0,
106 | task: str = None,
107 | training_args: dict = None,
108 | meta_args: ty.Optional[dict] = None,
109 | ):
110 | def train_step(model, x_num, x_cat, y): # input is X and y
111 | # process input (model-specific)
112 |             # define your running time calculation
113 | start_time = time.time()
114 | # define your model API
115 | logits = model(x_num, x_cat)
116 | used_time = time.time() - start_time
117 | return logits, used_time
118 |
119 |         # to customize the training paradigm:
120 |         # 1. add self.dnn_fit2(...) in the abstract class for a special training process
121 |         # 2. (recommended) override self.dnn_fit in the abstract class
122 | self.dnn_fit( # uniform training paradigm
123 | dnn_fit_func=train_step,
124 | # training data
125 | train_loader=train_loader,
126 | X_num=X_num, X_cat=X_cat, ys=ys, y_std=y_std,
127 | # dev data
128 | eval_set=eval_set, patience=patience, task=task,
129 | # args
130 | training_args=training_args,
131 | meta_args=meta_args,
132 | )
133 |
134 | def predict(
135 | self,
136 | dev_loader: ty.Optional[ty.Tuple[DataLoader, int]] = None, # reuse, (loader, missing_idx)
137 | X_num: ty.Optional[torch.Tensor] = None,
138 | X_cat: ty.Optional[torch.Tensor] = None,
139 | ys: ty.Optional[torch.Tensor] = None,
140 | y_std: ty.Optional[float] = None, # for RMSE
141 | task: str = None,
142 | return_probs: bool = True,
143 | return_metric: bool = False,
144 | return_loss: bool = False,
145 | meta_args: ty.Optional[dict] = None,
146 | ):
147 | def inference_step(model, x_num, x_cat): # input only X (y inaccessible)
148 | """
149 | Inference Process
150 |             `no_grad` will be applied in `dnn_predict`
151 | """
152 | # process input (model-specific)
153 | # define your running time calculation
154 | start_time = time.time()
155 | # define your model API
156 | logits = model(x_num, x_cat)
157 | used_time = time.time() - start_time
158 | return logits, used_time
159 |
160 |         # to customize the inference paradigm:
161 |         # 1. add self.dnn_predict2(...) in the abstract class for a special inference process
162 |         # 2. (recommended) override self.dnn_predict in the abstract class
163 |         return self.dnn_predict( # uniform inference paradigm
164 | dnn_predict_func=inference_step,
165 | dev_loader=dev_loader,
166 | X_num=X_num, X_cat=X_cat, ys=ys, y_std=y_std, task=task,
167 | return_probs=return_probs, return_metric=return_metric, return_loss=return_loss,
168 | meta_args=meta_args
169 | )
170 |
171 | def save(self, output_dir):
172 | check_dir(output_dir)
173 | self.save_pt_model(output_dir)
174 | self.save_history(output_dir)
175 | self.save_config(output_dir)
--------------------------------------------------------------------------------
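
Construction sketch (editorial): the keys of `model_config` mirror the keyword arguments of `_NODE` (`num_layers`, `layer_dim`, `depth`, `d_embedding`, `choice_function`, `bin_function`); the concrete values below are placeholders, and in practice they come from `configs/node.yaml`. It assumes `entmax15`/`entmoid15` are exported by `models.node`, as in the upstream NODE package.

```python
import torch

from models.node_model import NODE

model_config = {
    'num_layers': 2,
    'layer_dim': 128,             # trees per ODST layer
    'depth': 6,                   # splits per tree
    'd_embedding': 8,             # only used when categorical features are present
    'choice_function': 'entmax15',
    'bin_function': 'entmoid15',
}

model = NODE(
    model_config=model_config,
    n_num_features=10,
    categories=[3, 5],            # cardinalities of two categorical features
    n_labels=2,                   # number of output channels
    device='cpu',
)

# the wrapped module can be called directly; fit()/predict() follow the TabModel interface
x_num = torch.randn(4, 10)
x_cat = torch.stack([torch.randint(0, 3, (4,)), torch.randint(0, 5, (4,))], dim=1)
print(model.model(x_num, x_cat).shape)  # torch.Size([4, 2])
```
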
/requirements.txt:
--------------------------------------------------------------------------------
1 | category_encoders==2.5.1.post0
2 | numpy==1.23.5
3 | optuna==3.0.5
4 | pandas==1.3.5
5 | PyYAML==6.0.1
6 | Requests==2.31.0
7 | rtdl==0.0.13
8 | scikit_learn==1.0.2
9 | scipy==1.8.1
10 | torch==1.9.0+cu111
11 | tqdm==4.64.1
12 | typing_extensions==4.8.0
13 |
--------------------------------------------------------------------------------
/utils/deep.py:
--------------------------------------------------------------------------------
1 | """
2 | References:
3 | - https://github.com/yandex-research/tabular-dl-revisiting-models/blob/main/lib/deep.py
4 | """
5 | from __future__ import absolute_import, division, print_function
6 |
7 | import math
8 | import os
9 | import typing as ty
10 | from copy import deepcopy
11 |
12 | import torch
13 | import torch.nn as nn
14 | import torch.nn.functional as F
15 | import torch.optim as optim
16 | from torch import Tensor
17 |
18 |
19 | class Lambda(nn.Module):
20 | def __init__(self, f: ty.Callable) -> None:
21 | super().__init__()
22 | self.f = f
23 |
24 | def forward(self, x):
25 | return self.f(x)
26 |
27 |
28 | # Source: https://github.com/bzhangGo/rmsnorm
29 | # NOTE: eps is changed to 1e-5
30 | class RMSNorm(nn.Module):
31 | def __init__(self, d, p=-1.0, eps=1e-5, bias=False):
32 | """Root Mean Square Layer Normalization
33 |
34 | :param d: model size
35 | :param p: partial RMSNorm, valid value [0, 1], default -1.0 (disabled)
36 |         :param eps: epsilon value, default 1e-5
37 | :param bias: whether use bias term for RMSNorm, disabled by
38 | default because RMSNorm doesn't enforce re-centering invariance.
39 | """
40 | super(RMSNorm, self).__init__()
41 |
42 | self.eps = eps
43 | self.d = d
44 | self.p = p
45 | self.bias = bias
46 |
47 | self.scale = nn.Parameter(torch.ones(d))
48 | self.register_parameter("scale", self.scale)
49 |
50 | if self.bias:
51 | self.offset = nn.Parameter(torch.zeros(d))
52 | self.register_parameter("offset", self.offset)
53 |
54 | def forward(self, x):
55 | if self.p < 0.0 or self.p > 1.0:
56 | norm_x = x.norm(2, dim=-1, keepdim=True)
57 | d_x = self.d
58 | else:
59 | partial_size = int(self.d * self.p)
60 | partial_x, _ = torch.split(x, [partial_size, self.d - partial_size], dim=-1)
61 |
62 | norm_x = partial_x.norm(2, dim=-1, keepdim=True)
63 | d_x = partial_size
64 |
65 | rms_x = norm_x * d_x ** (-1.0 / 2)
66 | x_normed = x / (rms_x + self.eps)
67 |
68 | if self.bias:
69 | return self.scale * x_normed + self.offset
70 |
71 | return self.scale * x_normed
72 |
73 |
74 | class ScaleNorm(nn.Module):
75 | """
76 | Sources:
77 | - https://github.com/tnq177/transformers_without_tears/blob/25026061979916afb193274438f7097945acf9bc/layers.py#L132
78 | - https://github.com/tnq177/transformers_without_tears/blob/6b2726cd9e6e642d976ae73b9f696d9d7ff4b395/layers.py#L157
79 | """
80 |
81 | def __init__(self, d: int, eps: float = 1e-5, clamp: bool = False) -> None:
82 | super(ScaleNorm, self).__init__()
83 | self.scale = nn.Parameter(torch.tensor(d ** 0.5))
84 | self.eps = eps
85 | self.clamp = clamp
86 |
87 | def forward(self, x):
88 | norms = torch.norm(x, dim=-1, keepdim=True)
89 | norms = norms.clamp(min=self.eps) if self.clamp else norms + self.eps
90 | return self.scale * x / norms
91 |
92 |
93 | def reglu(x: Tensor) -> Tensor:
94 | a, b = x.chunk(2, dim=-1)
95 | return a * F.relu(b)
96 |
97 |
98 | def geglu(x: Tensor) -> Tensor:
99 | a, b = x.chunk(2, dim=-1)
100 | return a * F.gelu(b)
101 |
102 | def tanglu(x: Tensor) -> Tensor:
103 | a, b = x.chunk(2, dim=-1)
104 | return a * torch.tanh(b)
105 |
106 |
107 | class ReGLU(nn.Module):
108 | def forward(self, x: Tensor) -> Tensor:
109 | return reglu(x)
110 |
111 |
112 | class GEGLU(nn.Module):
113 | def forward(self, x: Tensor) -> Tensor:
114 | return geglu(x)
115 |
116 |
117 | def make_optimizer(
118 | optimizer: str,
119 | parameter_groups,
120 | lr: float,
121 | weight_decay: float,
122 | ) -> optim.Optimizer:
123 | Optimizer = {
124 | 'adabelief': AdaBelief,
125 | 'adam': optim.Adam,
126 | 'adamw': optim.AdamW,
127 | 'radam': RAdam,
128 | 'sgd': optim.SGD,
129 | }[optimizer]
130 | momentum = (0.9,) if Optimizer is optim.SGD else ()
131 | return Optimizer(parameter_groups, lr, *momentum, weight_decay=weight_decay)
132 |
133 |
134 | def make_lr_schedule(
135 | optimizer: optim.Optimizer,
136 | lr: float,
137 | epoch_size: int,
138 | lr_schedule: ty.Optional[ty.Dict[str, ty.Any]],
139 | ) -> ty.Tuple[
140 | ty.Optional[optim.lr_scheduler._LRScheduler],
141 | ty.Dict[str, ty.Any],
142 | ty.Optional[int],
143 | ]:
144 | if lr_schedule is None:
145 | lr_schedule = {'type': 'constant'}
146 | lr_scheduler = None
147 | n_warmup_steps = None
148 | if lr_schedule['type'] in ['transformer', 'linear_warmup']:
149 | n_warmup_steps = (
150 | lr_schedule['n_warmup_steps']
151 | if 'n_warmup_steps' in lr_schedule
152 | else lr_schedule['n_warmup_epochs'] * epoch_size
153 | )
154 | elif lr_schedule['type'] == 'cyclic':
155 | lr_scheduler = optim.lr_scheduler.CyclicLR(
156 | optimizer,
157 | base_lr=lr,
158 | max_lr=lr_schedule['max_lr'],
159 | step_size_up=lr_schedule['n_epochs_up'] * epoch_size,
160 | step_size_down=lr_schedule['n_epochs_down'] * epoch_size,
161 | mode=lr_schedule['mode'],
162 | gamma=lr_schedule.get('gamma', 1.0),
163 | cycle_momentum=False,
164 | )
165 | return lr_scheduler, lr_schedule, n_warmup_steps
166 |
167 |
168 | def get_activation_fn(name: str) -> ty.Callable[[Tensor], Tensor]:
169 | return (
170 | reglu
171 | if name == 'reglu'
172 | else geglu
173 | if name == 'geglu'
174 | else torch.sigmoid
175 | if name == 'sigmoid'
176 | else tanglu
177 | if name == 'tanglu'
178 | else getattr(F, name)
179 | )
180 |
181 |
182 | def get_nonglu_activation_fn(name: str) -> ty.Callable[[Tensor], Tensor]:
183 | return (
184 | F.relu
185 | if name == 'reglu'
186 | else F.gelu
187 | if name == 'geglu'
188 | else get_activation_fn(name)
189 | )
190 |
191 |
192 | def load_swa_state_dict(model: nn.Module, swa_model: optim.swa_utils.AveragedModel):
193 | state_dict = deepcopy(swa_model.state_dict())
194 | del state_dict['n_averaged']
195 | model.load_state_dict({k[len('module.') :]: v for k, v in state_dict.items()})
196 |
197 |
198 | def get_epoch_parameters(
199 | train_size: int, batch_size: ty.Union[int, str]
200 | ) -> ty.Tuple[int, int]:
201 | if isinstance(batch_size, str):
202 | if batch_size == 'v3':
203 | batch_size = (
204 | 256 if train_size < 50000 else 512 if train_size < 100000 else 1024
205 | )
206 | elif batch_size == 'v1':
207 | batch_size = (
208 | 16
209 | if train_size < 1000
210 | else 32
211 | if train_size < 10000
212 | else 64
213 | if train_size < 50000
214 | else 128
215 | if train_size < 100000
216 | else 256
217 | if train_size < 200000
218 | else 512
219 | if train_size < 500000
220 | else 1024
221 | )
222 | elif batch_size == 'v2':
223 | batch_size = (
224 | 512 if train_size < 100000 else 1024 if train_size < 500000 else 2048
225 | )
226 | return batch_size, math.ceil(train_size / batch_size) # type: ignore[code]
227 |
228 |
229 | def get_linear_warmup_lr(lr: float, n_warmup_steps: int, step: int) -> float:
230 | assert step > 0, "1-based enumeration of steps is expected"
231 | return min(lr, step / (n_warmup_steps + 1) * lr)
232 |
233 |
234 | def get_manual_lr(schedule: ty.List[float], epoch: int) -> float:
235 | assert epoch > 0, "1-based enumeration of epochs is expected"
236 | return schedule[min(epoch, len(schedule)) - 1]
237 |
238 |
239 | def get_transformer_lr(scale: float, d: int, n_warmup_steps: int, step: int) -> float:
240 | return scale * d ** -0.5 * min(step ** -0.5, step * n_warmup_steps ** -1.5)
241 |
242 |
243 | def learn(model, optimizer, loss_fn, step, batch, star) -> ty.Tuple[Tensor, ty.Any]:  # star: if True, unpack the step output when calling loss_fn
244 | model.train()
245 | optimizer.zero_grad()
246 | out = step(batch)
247 | loss = loss_fn(*out) if star else loss_fn(out)
248 | loss.backward()
249 | optimizer.step()
250 | return loss, out
251 |
252 |
253 | def tensor(x) -> torch.Tensor:
254 | assert isinstance(x, torch.Tensor)
255 | return ty.cast(torch.Tensor, x)
256 |
257 |
258 | def get_n_parameters(m: nn.Module):
259 | return sum(x.numel() for x in m.parameters() if x.requires_grad)
260 |
261 |
262 | def get_mlp_n_parameters(units: ty.List[int]):
263 | x = 0
264 | for a, b in zip(units, units[1:]):
265 | x += a * b + b
266 | return x
267 |
268 |
269 | def get_lr(optimizer: optim.Optimizer) -> float:
270 | return next(iter(optimizer.param_groups))['lr']
271 |
272 |
273 | def set_lr(optimizer: optim.Optimizer, lr: float) -> None:
274 | for x in optimizer.param_groups:
275 | x['lr'] = lr
276 |
277 |
278 | def get_device() -> torch.device:
279 | return torch.device('cuda:0' if os.environ.get('CUDA_VISIBLE_DEVICES') else 'cpu')
280 |
281 |
282 | @torch.no_grad()
283 | def get_gradient_norm_ratios(m: nn.Module):
284 | return {
285 | k: v.grad.norm() / v.norm()
286 | for k, v in m.named_parameters()
287 | if v.grad is not None
288 | }
289 |
290 |
291 | def is_oom_exception(err: RuntimeError) -> bool:
292 | return any(
293 | x in str(err)
294 | for x in [
295 | 'CUDA out of memory',
296 | 'CUBLAS_STATUS_ALLOC_FAILED',
297 | 'CUDA error: out of memory',
298 | ]
299 | )
300 |
301 |
302 | # Source: https://github.com/LiyuanLucasLiu/RAdam
303 | class RAdam(optim.Optimizer):
304 | def __init__(
305 | self,
306 | params,
307 | lr=1e-3,
308 | betas=(0.9, 0.999),
309 | eps=1e-8,
310 | weight_decay=0,
311 | degenerated_to_sgd=True,
312 | ):
313 | if not 0.0 <= lr:
314 | raise ValueError("Invalid learning rate: {}".format(lr))
315 | if not 0.0 <= eps:
316 | raise ValueError("Invalid epsilon value: {}".format(eps))
317 | if not 0.0 <= betas[0] < 1.0:
318 | raise ValueError("Invalid beta parameter at index 0: {}".format(betas[0]))
319 | if not 0.0 <= betas[1] < 1.0:
320 | raise ValueError("Invalid beta parameter at index 1: {}".format(betas[1]))
321 |
322 | self.degenerated_to_sgd = degenerated_to_sgd
323 | if (
324 | isinstance(params, (list, tuple))
325 | and len(params) > 0
326 | and isinstance(params[0], dict)
327 | ):
328 | for param in params:
329 | if 'betas' in param and (
330 | param['betas'][0] != betas[0] or param['betas'][1] != betas[1]
331 | ):
332 | param['buffer'] = [[None, None, None] for _ in range(10)]
333 | defaults = dict(
334 | lr=lr,
335 | betas=betas,
336 | eps=eps,
337 | weight_decay=weight_decay,
338 | buffer=[[None, None, None] for _ in range(10)],
339 | )
340 | super(RAdam, self).__init__(params, defaults)
341 |
342 | def __setstate__(self, state):
343 | super(RAdam, self).__setstate__(state)
344 |
345 | def step(self, closure=None):
346 |
347 | loss = None
348 | if closure is not None:
349 | loss = closure()
350 |
351 | for group in self.param_groups:
352 |
353 | for p in group['params']:
354 | if p.grad is None:
355 | continue
356 | grad = p.grad.data.float()
357 | if grad.is_sparse:
358 | raise RuntimeError('RAdam does not support sparse gradients')
359 |
360 | p_data_fp32 = p.data.float()
361 |
362 | state = self.state[p]
363 |
364 | if len(state) == 0:
365 | state['step'] = 0
366 | state['exp_avg'] = torch.zeros_like(p_data_fp32)
367 | state['exp_avg_sq'] = torch.zeros_like(p_data_fp32)
368 | else:
369 | state['exp_avg'] = state['exp_avg'].type_as(p_data_fp32)
370 | state['exp_avg_sq'] = state['exp_avg_sq'].type_as(p_data_fp32)
371 |
372 | exp_avg, exp_avg_sq = state['exp_avg'], state['exp_avg_sq']
373 | beta1, beta2 = group['betas']
374 |
375 |                 exp_avg_sq.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)
376 |                 exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)
377 |
378 | state['step'] += 1
379 | buffered = group['buffer'][int(state['step'] % 10)]
380 | if state['step'] == buffered[0]:
381 | N_sma, step_size = buffered[1], buffered[2]
382 | else:
383 | buffered[0] = state['step']
384 | beta2_t = beta2 ** state['step']
385 | N_sma_max = 2 / (1 - beta2) - 1
386 | N_sma = N_sma_max - 2 * state['step'] * beta2_t / (1 - beta2_t)
387 | buffered[1] = N_sma
388 |
389 | # more conservative since it's an approximated value
390 | if N_sma >= 5:
391 | step_size = math.sqrt(
392 | (1 - beta2_t)
393 | * (N_sma - 4)
394 | / (N_sma_max - 4)
395 | * (N_sma - 2)
396 | / N_sma
397 | * N_sma_max
398 | / (N_sma_max - 2)
399 | ) / (1 - beta1 ** state['step'])
400 | elif self.degenerated_to_sgd:
401 | step_size = 1.0 / (1 - beta1 ** state['step'])
402 | else:
403 | step_size = -1
404 | buffered[2] = step_size
405 |
406 | # more conservative since it's an approximated value
407 | if N_sma >= 5:
408 | if group['weight_decay'] != 0:
409 |                         p_data_fp32.add_(
410 |                             p_data_fp32, alpha=-group['weight_decay'] * group['lr']
411 |                         )
412 | denom = exp_avg_sq.sqrt().add_(group['eps'])
413 |                         p_data_fp32.addcdiv_(exp_avg, denom, value=-step_size * group['lr'])
414 | p.data.copy_(p_data_fp32)
415 | elif step_size > 0:
416 | if group['weight_decay'] != 0:
417 |                         p_data_fp32.add_(
418 |                             p_data_fp32, alpha=-group['weight_decay'] * group['lr']
419 |                         )
420 |                         p_data_fp32.add_(exp_avg, alpha=-step_size * group['lr'])
421 | p.data.copy_(p_data_fp32)
422 |
423 | return loss
424 |
425 |
426 | version_higher = tuple(int(v) for v in torch.__version__.split('+')[0].split('.')[:2]) >= (1, 5)  # avoid lexicographic string comparison (e.g. "1.10" < "1.5")
427 |
428 |
429 | # Source: https://github.com/juntang-zhuang/Adabelief-Optimizer
430 | class AdaBelief(optim.Optimizer):
431 | r"""Implements AdaBelief algorithm. Modified from Adam in PyTorch
432 | Arguments:
433 | params (iterable): iterable of parameters to optimize or dicts defining
434 | parameter groups
435 | lr (float, optional): learning rate (default: 1e-3)
436 | betas (Tuple[float, float], optional): coefficients used for computing
437 | running averages of gradient and its square (default: (0.9, 0.999))
438 | eps (float, optional): term added to the denominator to improve
439 | numerical stability (default: 1e-16)
440 | weight_decay (float, optional): weight decay (L2 penalty) (default: 0)
441 | amsgrad (boolean, optional): whether to use the AMSGrad variant of this
442 | algorithm from the paper `On the Convergence of Adam and Beyond`_
443 | (default: False)
444 | weight_decouple (boolean, optional): ( default: True) If set as True, then
445 | the optimizer uses decoupled weight decay as in AdamW
446 | fixed_decay (boolean, optional): (default: False) This is used when weight_decouple
447 | is set as True.
448 | When fixed_decay == True, the weight decay is performed as
449 | $W_{new} = W_{old} - W_{old} \times decay$.
450 | When fixed_decay == False, the weight decay is performed as
451 | $W_{new} = W_{old} - W_{old} \times decay \times lr$. Note that in this case, the
452 | weight decay ratio decreases with learning rate (lr).
453 | rectify (boolean, optional): (default: True) If set as True, then perform the rectified
454 | update similar to RAdam
455 | degenerated_to_sgd (boolean, optional) (default:True) If set as True, then perform SGD update
456 | when variance of gradient is high
457 |         print_change_log (boolean, optional) (default: True) If set as True, print the modification to
458 | default hyper-parameters
459 | reference: AdaBelief Optimizer, adapting stepsizes by the belief in observed gradients, NeurIPS 2020
460 | """
461 |
462 | def __init__(
463 | self,
464 | params,
465 | lr=1e-3,
466 | betas=(0.9, 0.999),
467 | eps=1e-16,
468 | weight_decay=0,
469 | amsgrad=False,
470 | weight_decouple=True,
471 | fixed_decay=False,
472 | rectify=True,
473 | degenerated_to_sgd=True,
474 | print_change_log=True,
475 | ):
476 |
477 | # ------------------------------------------------------------------------------
478 | # Print modifications to default arguments
479 | if print_change_log:
480 | print(
481 | 'Please check your arguments if you have upgraded adabelief-pytorch from version 0.0.5.'
482 | )
483 | print('Modifications to default arguments:')
484 | default_table = [
485 | ['eps', 'weight_decouple', 'rectify'],
486 | ['adabelief-pytorch=0.0.5', '1e-8', 'False', 'False'],
487 | ['>=0.1.0 (Current 0.2.0)', '1e-16', 'True', 'True'],
488 | ]
489 | print(default_table)
490 |
491 | recommend_table = [
492 | [
493 | 'SGD better than Adam (e.g. CNN for Image Classification)',
494 | 'Adam better than SGD (e.g. Transformer, GAN)',
495 | ],
496 | ['Recommended eps = 1e-8', 'Recommended eps = 1e-16'],
497 | ]
498 | print(recommend_table)
499 |
500 | print('For a complete table of recommended hyperparameters, see')
501 | print('https://github.com/juntang-zhuang/Adabelief-Optimizer')
502 |
503 | print(
504 | 'You can disable the log message by setting "print_change_log = False", though it is recommended to keep as a reminder.'
505 | )
506 | # ------------------------------------------------------------------------------
507 |
508 | if not 0.0 <= lr:
509 | raise ValueError("Invalid learning rate: {}".format(lr))
510 | if not 0.0 <= eps:
511 | raise ValueError("Invalid epsilon value: {}".format(eps))
512 | if not 0.0 <= betas[0] < 1.0:
513 | raise ValueError("Invalid beta parameter at index 0: {}".format(betas[0]))
514 | if not 0.0 <= betas[1] < 1.0:
515 | raise ValueError("Invalid beta parameter at index 1: {}".format(betas[1]))
516 |
517 | self.degenerated_to_sgd = degenerated_to_sgd
518 | if (
519 | isinstance(params, (list, tuple))
520 | and len(params) > 0
521 | and isinstance(params[0], dict)
522 | ):
523 | for param in params:
524 | if 'betas' in param and (
525 | param['betas'][0] != betas[0] or param['betas'][1] != betas[1]
526 | ):
527 | param['buffer'] = [[None, None, None] for _ in range(10)]
528 |
529 | defaults = dict(
530 | lr=lr,
531 | betas=betas,
532 | eps=eps,
533 | weight_decay=weight_decay,
534 | amsgrad=amsgrad,
535 | buffer=[[None, None, None] for _ in range(10)],
536 | )
537 | super(AdaBelief, self).__init__(params, defaults)
538 |
539 | self.degenerated_to_sgd = degenerated_to_sgd
540 | self.weight_decouple = weight_decouple
541 | self.rectify = rectify
542 | self.fixed_decay = fixed_decay
543 | if self.weight_decouple:
544 | print('Weight decoupling enabled in AdaBelief')
545 | if self.fixed_decay:
546 | print('Weight decay fixed')
547 | if self.rectify:
548 | print('Rectification enabled in AdaBelief')
549 | if amsgrad:
550 | print('AMSGrad enabled in AdaBelief')
551 |
552 | def __setstate__(self, state):
553 | super(AdaBelief, self).__setstate__(state)
554 | for group in self.param_groups:
555 | group.setdefault('amsgrad', False)
556 |
557 | def reset(self):
558 | for group in self.param_groups:
559 | for p in group['params']:
560 | state = self.state[p]
561 | amsgrad = group['amsgrad']
562 |
563 | # State initialization
564 | state['step'] = 0
565 | # Exponential moving average of gradient values
566 | state['exp_avg'] = (
567 | torch.zeros_like(p.data, memory_format=torch.preserve_format)
568 | if version_higher
569 | else torch.zeros_like(p.data)
570 | )
571 |
572 | # Exponential moving average of squared gradient values
573 | state['exp_avg_var'] = (
574 | torch.zeros_like(p.data, memory_format=torch.preserve_format)
575 | if version_higher
576 | else torch.zeros_like(p.data)
577 | )
578 |
579 | if amsgrad:
580 | # Maintains max of all exp. moving avg. of sq. grad. values
581 | state['max_exp_avg_var'] = (
582 | torch.zeros_like(p.data, memory_format=torch.preserve_format)
583 | if version_higher
584 | else torch.zeros_like(p.data)
585 | )
586 |
587 | def step(self, closure=None):
588 | """Performs a single optimization step.
589 | Arguments:
590 | closure (callable, optional): A closure that reevaluates the model
591 | and returns the loss.
592 | """
593 | loss = None
594 | if closure is not None:
595 | loss = closure()
596 |
597 | for group in self.param_groups:
598 | for p in group['params']:
599 | if p.grad is None:
600 | continue
601 |
602 | # cast data type
603 | half_precision = False
604 | if p.data.dtype == torch.float16:
605 | half_precision = True
606 | p.data = p.data.float()
607 | p.grad = p.grad.float()
608 |
609 | grad = p.grad.data
610 | if grad.is_sparse:
611 | raise RuntimeError(
612 | 'AdaBelief does not support sparse gradients, please consider SparseAdam instead'
613 | )
614 | amsgrad = group['amsgrad']
615 |
616 | state = self.state[p]
617 |
618 | beta1, beta2 = group['betas']
619 |
620 | # State initialization
621 | if len(state) == 0:
622 | state['step'] = 0
623 | # Exponential moving average of gradient values
624 | state['exp_avg'] = (
625 | torch.zeros_like(p.data, memory_format=torch.preserve_format)
626 | if version_higher
627 | else torch.zeros_like(p.data)
628 | )
629 | # Exponential moving average of squared gradient values
630 | state['exp_avg_var'] = (
631 | torch.zeros_like(p.data, memory_format=torch.preserve_format)
632 | if version_higher
633 | else torch.zeros_like(p.data)
634 | )
635 | if amsgrad:
636 | # Maintains max of all exp. moving avg. of sq. grad. values
637 | state['max_exp_avg_var'] = (
638 | torch.zeros_like(
639 | p.data, memory_format=torch.preserve_format
640 | )
641 | if version_higher
642 | else torch.zeros_like(p.data)
643 | )
644 |
645 | # perform weight decay, check if decoupled weight decay
646 | if self.weight_decouple:
647 | if not self.fixed_decay:
648 | p.data.mul_(1.0 - group['lr'] * group['weight_decay'])
649 | else:
650 | p.data.mul_(1.0 - group['weight_decay'])
651 | else:
652 | if group['weight_decay'] != 0:
653 | grad.add_(p.data, alpha=group['weight_decay'])
654 |
655 | # get current state variable
656 | exp_avg, exp_avg_var = state['exp_avg'], state['exp_avg_var']
657 |
658 | state['step'] += 1
659 | bias_correction1 = 1 - beta1 ** state['step']
660 | bias_correction2 = 1 - beta2 ** state['step']
661 |
662 | # Update first and second moment running average
663 | exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)
664 | grad_residual = grad - exp_avg
665 | exp_avg_var.mul_(beta2).addcmul_(
666 | grad_residual, grad_residual, value=1 - beta2
667 | )
668 |
669 | if amsgrad:
670 | max_exp_avg_var = state['max_exp_avg_var']
671 | # Maintains the maximum of all 2nd moment running avg. till now
672 | torch.max(
673 | max_exp_avg_var,
674 | exp_avg_var.add_(group['eps']),
675 | out=max_exp_avg_var,
676 | )
677 |
678 | # Use the max. for normalizing running avg. of gradient
679 | denom = (max_exp_avg_var.sqrt() / math.sqrt(bias_correction2)).add_(
680 | group['eps']
681 | )
682 | else:
683 | denom = (
684 | exp_avg_var.add_(group['eps']).sqrt()
685 | / math.sqrt(bias_correction2)
686 | ).add_(group['eps'])
687 |
688 | # update
689 | if not self.rectify:
690 | # Default update
691 | step_size = group['lr'] / bias_correction1
692 | p.data.addcdiv_(exp_avg, denom, value=-step_size)
693 |
694 | else: # Rectified update, forked from RAdam
695 | buffered = group['buffer'][int(state['step'] % 10)]
696 | if state['step'] == buffered[0]:
697 | N_sma, step_size = buffered[1], buffered[2]
698 | else:
699 | buffered[0] = state['step']
700 | beta2_t = beta2 ** state['step']
701 | N_sma_max = 2 / (1 - beta2) - 1
702 | N_sma = N_sma_max - 2 * state['step'] * beta2_t / (1 - beta2_t)
703 | buffered[1] = N_sma
704 |
705 | # more conservative since it's an approximated value
706 | if N_sma >= 5:
707 | step_size = math.sqrt(
708 | (1 - beta2_t)
709 | * (N_sma - 4)
710 | / (N_sma_max - 4)
711 | * (N_sma - 2)
712 | / N_sma
713 | * N_sma_max
714 | / (N_sma_max - 2)
715 | ) / (1 - beta1 ** state['step'])
716 | elif self.degenerated_to_sgd:
717 | step_size = 1.0 / (1 - beta1 ** state['step'])
718 | else:
719 | step_size = -1
720 | buffered[2] = step_size
721 |
722 | if N_sma >= 5:
723 | denom = exp_avg_var.sqrt().add_(group['eps'])
724 | p.data.addcdiv_(exp_avg, denom, value=-step_size * group['lr'])
725 | elif step_size > 0:
726 | p.data.add_(exp_avg, alpha=-step_size * group['lr'])
727 |
728 | if half_precision:
729 | p.data = p.data.half()
730 | p.grad = p.grad.half()
731 |
732 | return loss
733 |
--------------------------------------------------------------------------------
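
Usage sketch (editorial): the helpers above cover optimizer construction, learning-rate scheduling, and activation lookup. The hyperparameters below are placeholders; for the `'linear_warmup'` schedule no torch scheduler object is returned, so the learning rate is set explicitly per step.

```python
import torch.nn as nn

from utils.deep import get_linear_warmup_lr, make_lr_schedule, make_optimizer, set_lr

model = nn.Linear(16, 2)
lr, weight_decay, epoch_size = 1e-3, 1e-5, 100   # epoch_size = optimization steps per epoch

optimizer = make_optimizer('adamw', model.parameters(), lr=lr, weight_decay=weight_decay)
scheduler, schedule, n_warmup_steps = make_lr_schedule(
    optimizer, lr, epoch_size, {'type': 'linear_warmup', 'n_warmup_epochs': 2}
)

for step in range(1, 5):                          # 1-based step counter
    set_lr(optimizer, get_linear_warmup_lr(lr, n_warmup_steps, step))
```
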
/utils/metrics.py:
--------------------------------------------------------------------------------
1 | """
2 | References:
3 | - https://github.com/yandex-research/tabular-dl-num-embeddings/blob/main/lib/metrics.py
4 | """
5 | import enum
6 | from typing import Any, Optional, Union, cast, Tuple, Dict
7 |
8 | import numpy as np
9 | import scipy.special
10 | import sklearn.metrics as skm
11 |
12 |
13 | class PredictionType(enum.Enum):
14 | LOGITS = 'logits'
15 | PROBS = 'probs'
16 |
17 |
18 | def calculate_rmse(
19 | y_true: np.ndarray, y_pred: np.ndarray, std: Optional[float]
20 | ) -> float:
21 | rmse = skm.mean_squared_error(y_true, y_pred) ** 0.5
22 | if std is not None:
23 | rmse *= std
24 | return rmse
25 |
26 |
27 | def _get_labels_and_probs(
28 | y_pred: np.ndarray, task_type, prediction_type: Optional[PredictionType]
29 | ) -> Tuple[np.ndarray, Optional[np.ndarray]]:
30 | assert task_type in ('binclass', 'multiclass')
31 |
32 | if prediction_type is None:
33 | return y_pred, None
34 |
35 | if prediction_type == PredictionType.LOGITS:
36 | probs = (
37 | scipy.special.expit(y_pred)
38 | if task_type == 'binclass'
39 | else scipy.special.softmax(y_pred, axis=1)
40 | )
41 | elif prediction_type == PredictionType.PROBS:
42 | probs = y_pred
43 | else:
44 | raise AssertionError('Unknown prediction type')
45 |
46 | assert probs is not None
47 | labels = np.round(probs) if task_type == 'binclass' else probs.argmax(axis=1)
48 | return labels.astype('int64'), probs
49 |
50 |
51 | def calculate_metrics(
52 | y_true: np.ndarray,
53 | y_pred: np.ndarray,
54 | task_type: str,
55 | prediction_type: Optional[Union[str, PredictionType]],
56 | y_std: Optional[float] = None,
57 | ) -> Dict[str, Any]:
58 |     # Example: calculate_metrics(y_true, y_pred, 'binclass', 'logits')
59 | if prediction_type is not None:
60 | prediction_type = PredictionType(prediction_type)
61 |
62 | if task_type == 'regression':
63 | assert prediction_type is None
64 | assert y_std is not None
65 | rmse = calculate_rmse(y_true, y_pred, y_std)
66 | result = {'rmse': rmse}
67 | else:
68 | labels, probs = _get_labels_and_probs(y_pred, task_type, prediction_type)
69 | result = cast(
70 | Dict[str, Any], skm.classification_report(y_true, labels, output_dict=True)
71 | )
72 | if task_type == 'binclass':
73 | result['roc_auc'] = skm.roc_auc_score(y_true, probs)
74 | return result
75 |
--------------------------------------------------------------------------------
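
Usage sketch (editorial): `calculate_metrics` wraps the scikit-learn metrics with a small task-type switch; for classification it expects logits or probabilities plus the matching `prediction_type`, for regression it expects raw predictions and the target standard deviation used for de-normalizing RMSE. Random data below, for illustration only.

```python
import numpy as np

from utils.metrics import calculate_metrics

rng = np.random.default_rng(0)

# binary classification from raw logits
y_true = rng.integers(0, 2, size=200)
y_logits = rng.normal(size=200)
result = calculate_metrics(y_true, y_logits, task_type='binclass', prediction_type='logits')
print(result['accuracy'], result['roc_auc'])

# regression: rmse is rescaled by the target std used during normalization
y_reg_true, y_reg_pred = rng.normal(size=100), rng.normal(size=100)
print(calculate_metrics(y_reg_true, y_reg_pred, 'regression', None, y_std=1.0))
```
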
/utils/model.py:
--------------------------------------------------------------------------------
1 | import os
2 | import time
3 | import json
4 | import yaml
5 | import shutil
6 | import random
7 | import datetime
8 | from pathlib import Path
9 |
10 | import numpy as np
11 |
12 | from typing import Dict, List, Tuple, Union, Optional, Literal
13 |
14 | import torch
15 | import optuna
16 |
17 | from models import MLP, FTTransformer, AutoInt, DCNv2, NODE
18 | from models.abstract import TabModel, check_dir
19 | from data.utils import Dataset
20 | from data.processor import DataProcessor
21 |
22 | MODEL_CARDS = {
23 | 'xgboost': None, 'catboost': None, 'lightgbm': None,
24 | 'mlp': MLP, 'autoint': AutoInt, 'dcnv2': DCNv2, 'node': NODE,
25 | 'ft-transformer': FTTransformer, 'saint': None,
26 | 't2g-former': None, 'excel-former': None,
27 | }
28 | HPOLib = Literal['optuna', 'hyperopt'] # TODO: add 'hyperopt' support
29 |
30 | def get_model_cards():
31 | return {
32 |         'ready': sorted(key for key, value in MODEL_CARDS.items() if value),
33 |         'coming soon': sorted(key for key, value in MODEL_CARDS.items() if not value)
34 | }
35 |
36 | def seed_everything(seed=42):
37 | '''
38 | Sets the seed of the entire notebook so results are the same every time we run.
39 | This is for REPRODUCIBILITY.
40 | '''
41 | random.seed(seed)
42 | # Set a fixed value for the hash seed
43 | os.environ['PYTHONHASHSEED'] = str(seed)
44 | np.random.seed(seed)
45 | torch.manual_seed(seed)
46 |
47 | if torch.cuda.is_available():
48 | torch.cuda.manual_seed(seed)
49 | torch.cuda.manual_seed_all(seed)
50 | # When running on the CuDNN backend, two further options must be set
51 | torch.backends.cudnn.deterministic = True
52 | torch.backends.cudnn.benchmark = False
53 |
54 | def load_config_from_file(file):
55 | file = str(file)
56 | if file.endswith('.yaml'):
57 | with open(file, 'r') as f:
58 | cfg = yaml.safe_load(f)
59 | elif file.endswith('.json'):
60 | with open(file, 'r') as f:
61 | cfg = json.load(f)
62 | else:
63 | raise AssertionError('Config files only support yaml or json format now.')
64 | return cfg
65 |
66 | def extract_config(model_config: dict, is_large_data: bool = False):
67 | """selection of different search spaces"""
68 | used_cfgs = {"model": {}, "training": {}, 'meta': model_config.get('meta', {})}
69 | for field in ['model', 'training']:
70 | for k in model_config[field]:
71 | cfgs = model_config[field][k]
72 | if 'type2' not in cfgs:
73 | used_cfg = cfgs
74 | else:
75 | if not is_large_data:
76 | used_cfg = {k: v for k, v in cfgs.items() if not k.endswith('2')}
77 | else:
78 | used_cfg = {k[:-1]: v for k, v in cfgs.items() if k.endswith('2')}
79 | used_cfgs[field][k] = used_cfg
80 | return used_cfgs
81 |
82 | def make_baseline(
83 | model_name,
84 | model_config: Union[dict, str],
85 | n_num: int,
86 | cat_card: Optional[List[int]],
87 | n_labels: int,
88 | sparsity_scheme: Optional[str] = None,
89 | device: Union[str, torch.device] = 'cuda',
90 | ) -> TabModel:
91 | """Process Model Configs and Call Specific Model APIs"""
92 | assert model_name in MODEL_CARDS, f"unrecognized `{model_name}` model name, choose one of valid models in {MODEL_CARDS}"
93 | if isinstance(model_config, str):
94 | model_config = load_config_from_file(model_config)['model']
95 | if MODEL_CARDS[model_name] is None:
96 | raise NotImplementedError("Please add corresponding model implementation to `models` module")
97 | if sparsity_scheme is not None:
98 | assert 'mlp' in model_name
99 | return MODEL_CARDS[model_name](
100 | model_config=model_config,
101 | n_num_features=n_num, categories=cat_card, n_labels=n_labels,
102 |             sparsity_scheme=sparsity_scheme, device=device)
103 | return MODEL_CARDS[model_name](
104 | model_config=model_config,
105 |         n_num_features=n_num, categories=cat_card, n_labels=n_labels, device=device)
106 |
107 | def tune(
108 | model_name: str = None,
109 | search_config: Union[dict, str] = None,
110 | dataset: Dataset = None,
111 | batch_size: int = 64,
112 | patience: int = 8, # a small patience for quick tune
113 | n_iterations: int = 50,
114 | framework: HPOLib = 'optuna',
115 | device: Union[str, torch.device] = 'cuda',
116 | output_dir: Optional[str] = None,
117 | ) -> 'TabModel':
118 | # assert framework in HPOLib, f"hyper tune only support the following frameworks '{HPOLib}'"
119 |
120 | # device
121 | device = torch.device(device)
122 |
123 | # task params
124 | n_num_features = dataset.n_num_features
125 | categories = dataset.get_category_sizes('train')
126 | if len(categories) == 0:
127 | categories = None
128 | n_labels = dataset.n_classes or 1
129 | y_std = dataset.y_info.get('std') # for regression
130 | # preprocess
131 | datas = DataProcessor.prepare(dataset, device=device)
132 | # hpo search space
133 | if isinstance(search_config, str):
134 | search_spaces = load_config_from_file(search_config)
135 | else:
136 | search_spaces = search_config
137 | search_spaces = extract_config(search_spaces) # for multi-choice spaces
138 | # meta args
139 | if output_dir is None:
140 | now = datetime.datetime.now().strftime("%Y%m%d%H%M%S")
141 | output_dir = f"results/{model_name}-{dataset.name}-{now}"
142 |     search_spaces['meta'] = {'save_path': Path(output_dir) / 'tuning'} # save tuning results
143 | tuned_dir = Path(output_dir) / 'tuned'
144 | # global variable
145 | running_time = 0.
146 |
147 | def get_configs(trial: optuna.Trial): # sample configs
148 | config = {}
149 | for field in ['model', 'training']:
150 | config[field] = {}
151 | for k, space in search_spaces[field].items():
152 | if space['type'] in ['int', 'float', 'uniform', 'loguniform']:
153 |                 config[field][k] = getattr(trial, f"suggest_{space['type']}")(k, low=space['min'], high=space['max'])
154 | elif space['type'] == 'categorical':
155 | config[field][k] = trial.suggest_categorical(k, choices=space['choices'])
156 | elif space['type'] == 'const':
157 | config[field][k] = space['value']
158 | else:
159 |                 raise TypeError(f"Unsupported suggest type {space['type']} for framework: {framework}")
160 | config['meta'] = search_spaces['meta']
161 | config['training'].setdefault('batch_size', batch_size)
162 | return config
163 |
164 | def objective(trial: optuna.Trial):
165 | configs = get_configs(trial)
166 | model = make_baseline(
167 | model_name, configs['model'],
168 | n_num=n_num_features,
169 | cat_card=categories,
170 | n_labels=n_labels,
171 | device=device)
172 | nonlocal running_time
173 | start = time.time()
174 | model.fit(
175 | X_num=datas['train'][0], X_cat=datas['train'][1], ys=datas['train'][2], y_std=y_std,
176 | eval_set=(datas['val'],),
177 | patience=patience,
178 | task=dataset.task_type.value,
179 | training_args=configs['training'],
180 | meta_args=configs['meta']) # save best model and configs
181 | running_time += time.time() - start
182 | val_metric = (
183 | model.history['val']['best_metric']
184 | if dataset.task_type.value != 'regression'
185 | else -model.history['val']['best_metric']
186 | )
187 | return val_metric
188 |
189 |     def save_per_iter(study: optuna.Study, trial: optuna.Trial):
190 |         # current tuning infos
191 |         tuning_infos = {
192 |             'model_name': model_name,
193 |             'dataset': dataset.name,
194 |             'cur_trial': trial.number,
195 |             'best_trial': study.best_trial.number,
196 |             'best_val_metric': study.best_value,
197 |             'scores': [t.value for t in study.trials],
198 |             'used_time (s)': running_time,
199 |         }
200 |         with open(Path(search_spaces['meta']['save_path']) / 'tuning.json', 'w') as f:
201 |             json.dump(tuning_infos, f, indent=4)
202 |         # only copy the best tuning result
203 |         if study.best_trial.number == trial.number:
204 |             src_dir = search_spaces['meta']['save_path']
205 |             dst_dir = tuned_dir
206 |             print(f'copy best configs and results: {str(src_dir)} -> {str(dst_dir)}')
207 |             print('best val metric:', np.round(study.best_value, 4))
208 |             if os.path.exists(dst_dir):
209 |                 shutil.rmtree(dst_dir)
210 |             shutil.copytree(src_dir, dst_dir)
211 |
212 | study = optuna.create_study(direction='maximize')
213 | study.optimize(func=objective, n_trials=n_iterations, callbacks=[save_per_iter])
214 |
215 | # load best model
216 | config_file = Path(tuned_dir) / 'configs.yaml'
217 | configs = load_config_from_file(config_file)
218 | model = make_baseline(
219 | model_name, configs['model'],
220 | n_num=n_num_features, cat_card=categories, n_labels=n_labels,
221 | device=device)
222 | model.load_best_dnn(tuned_dir)
223 | # prediction
224 | predictions, results = model.predict(
225 | X_num=datas['val'][0], X_cat=datas['val'][1], ys=datas['val'][2], y_std=y_std,
226 | task=dataset.task_type.value,
227 | return_probs=True, return_metric=True,
228 | meta_args={'save_path': output_dir})
229 | print(results)
230 |
231 | return model
--------------------------------------------------------------------------------
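
Usage sketch (editorial): the typical entry points of this module are `get_model_cards` to list baselines, `seed_everything` for reproducibility, and `make_baseline` to build a model from a config file; `tune` additionally needs a `Dataset` prepared by `data.processor.DataProcessor`. The feature counts and cardinalities below are placeholders for a real dataset, and the config path refers to the shipped `configs/default/node.yaml`.

```python
from utils.model import get_model_cards, make_baseline, seed_everything

print(get_model_cards())       # {'ready': [...], 'coming soon': [...]}
seed_everything(42)

model = make_baseline(
    model_name='node',
    model_config='configs/default/node.yaml',   # only the 'model' section of the YAML is used
    n_num=10,                                   # number of numerical features
    cat_card=[3, 5],                            # cardinalities of the categorical features
    n_labels=2,
    device='cpu',
)
# model.fit(...) / model.predict(...) then follow the TabModel interface
# (see models/node_model.py), with training_args taken from the same config file.
```
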