├── .gitignore ├── LICENSE ├── README.md ├── imgs ├── leaf_coefficients.png ├── linear_boost_importances.png ├── linear_forest_predictions.png ├── linear_tree_class.png ├── linear_tree_reg.png └── plot_tree.png ├── lineartree ├── __init__.py ├── _classes.py ├── _criterion.py └── lineartree.py ├── notebooks ├── README.md ├── plots.ipynb ├── usage-LinearBoost.ipynb ├── usage-LinearForest.ipynb └── usage-LinearTree.ipynb ├── requirements.txt └── setup.py /.gitignore: -------------------------------------------------------------------------------- 1 | .DS_Store 2 | 3 | # Created by https://www.gitignore.io/api/python 4 | 5 | ### Python ### 6 | # Byte-compiled / optimized / DLL files 7 | __pycache__/ 8 | *.py[cod] 9 | *$py.class 10 | 11 | # C extensions 12 | *.so 13 | 14 | # Distribution / packaging 15 | .Python 16 | build/ 17 | develop-eggs/ 18 | dist/ 19 | downloads/ 20 | eggs/ 21 | .eggs/ 22 | lib/ 23 | lib64/ 24 | parts/ 25 | sdist/ 26 | var/ 27 | wheels/ 28 | *.egg-info/ 29 | .installed.cfg 30 | *.egg 31 | 32 | # PyInstaller 33 | # Usually these files are written by a python script from a template 34 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 35 | *.manifest 36 | *.spec 37 | 38 | # Installer logs 39 | pip-log.txt 40 | pip-delete-this-directory.txt 41 | 42 | # Unit test / coverage reports 43 | htmlcov/ 44 | .tox/ 45 | .coverage 46 | .coverage.* 47 | .cache 48 | nosetests.xml 49 | coverage.xml 50 | *.cover 51 | .hypothesis/ 52 | 53 | # Translations 54 | *.mo 55 | *.pot 56 | 57 | # Django stuff: 58 | *.log 59 | local_settings.py 60 | 61 | # Flask stuff: 62 | instance/ 63 | .webassets-cache 64 | 65 | # Scrapy stuff: 66 | .scrapy 67 | 68 | # Sphinx documentation 69 | docs/_build/ 70 | 71 | # PyBuilder 72 | target/ 73 | 74 | # Jupyter Notebook 75 | .ipynb_checkpoints 76 | 77 | # pyenv 78 | .python-version 79 | 80 | # celery beat schedule file 81 | celerybeat-schedule 82 | 83 | # SageMath parsed files 84 | *.sage.py 85 | 86 | # Environments 87 | .env 88 | .venv 89 | env/ 90 | venv/ 91 | ENV/ 92 | env.bak/ 93 | venv.bak/ 94 | 95 | # Spyder project settings 96 | .spyderproject 97 | .spyproject 98 | 99 | # Rope project settings 100 | .ropeproject 101 | 102 | # mkdocs documentation 103 | /site 104 | 105 | # mypy 106 | .mypy_cache/ 107 | 108 | # End of https://www.gitignore.io/api/python 109 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2021 Marco Cerliani 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # linear-tree 2 | A Python library to build Model Trees with Linear Models at the leaves. 3 | 4 | linear-tree also provides implementations of _LinearForest_ and _LinearBoost_, inspired by [these works](https://github.com/cerlymarco/linear-tree#references). 5 | 6 | ## Overview 7 | **Linear Trees** combine the learning ability of Decision Trees with the predictive and explanatory power of Linear Models. 8 | As in tree-based algorithms, the data are split according to simple decision rules. The goodness of splits is evaluated by fitting Linear Models in the nodes and measuring the resulting gain. This implies that the models in the leaves are linear instead of constant approximations as in classical Decision Trees. 9 | 10 | **Linear Forests** generalize the well-known Random Forests by combining them with Linear Models. The key idea is to use the strength of Linear Models to improve the nonparametric learning ability of tree-based algorithms. First, a Linear Model is fitted on the whole dataset; then a Random Forest is trained on the same dataset, using the residuals of the previous step as the target. The final predictions are the sum of the raw linear predictions and the residuals modeled by the Random Forest (see the sketch after the Installation section below). 11 | 12 | **Linear Boosting** is a two-stage learning process. First, a linear model is trained on the initial dataset to obtain predictions. Second, the residuals of the previous step are modeled with a decision tree using all the available features. The tree identifies the path leading to the highest error (i.e. the worst leaf), and that leaf is used to generate a new binary feature fed back into the first stage. The iterations continue until a stopping criterion is met. 13 | 14 | **linear-tree is developed to be fully integrable with scikit-learn**. ```LinearTreeRegressor``` and ```LinearTreeClassifier``` are provided as scikit-learn _BaseEstimator_ implementations to build a decision tree using linear estimators. ```LinearForestRegressor``` and ```LinearForestClassifier``` use the _RandomForest_ from sklearn to model residuals. ```LinearBoostRegressor``` and ```LinearBoostClassifier``` are also available as _TransformerMixin_, so they can be integrated into any pipeline for automated feature engineering. All the models available in [sklearn.linear_model](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.linear_model) can be used as the base learner. 15 | 16 | ## Installation 17 | ```shell 18 | pip install --upgrade linear-tree 19 | ``` 20 | The module depends on NumPy, SciPy and Scikit-Learn (>=0.24.2). Python 3.6 or above is supported.
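To make the two-stage ideas in the Overview concrete, here is a minimal, self-contained sketch of the Linear Forest and Linear Boosting concepts written with plain scikit-learn estimators. It is a conceptual illustration only, not the library's internal implementation, and the variable names are purely illustrative; for real use, see the ```LinearForestRegressor``` and ```LinearBoostRegressor``` examples in the Usage section.
```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=200, n_features=4,
                       n_informative=2, random_state=0)

# --- Linear Forest idea ----------------------------------------------------
# stage 1: fit a linear model on the raw target
linear = LinearRegression().fit(X, y)
residuals = y - linear.predict(X)
# stage 2: fit a random forest on the residuals of the linear model
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, residuals)
# final prediction = raw linear prediction + forest-modeled residuals
forest_pred = linear.predict(X) + forest.predict(X)

# --- Linear Boosting idea (a single iteration) -----------------------------
# a shallow tree is fitted on the residuals; the samples falling in the leaf
# with the largest absolute prediction (the "worst" leaf) define a new binary
# feature, and the linear model is refitted on the augmented dataset
tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, residuals)
leaf_pred = np.abs(tree.predict(X))
new_feature = (leaf_pred == leaf_pred.max()).astype(float).reshape(-1, 1)
boosted = LinearRegression().fit(np.hstack([X, new_feature]), y)
```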
21 | 22 | ## Media 23 | - [Linear Tree: the perfect mix of Linear Model and Decision Tree](https://towardsdatascience.com/linear-tree-the-perfect-mix-of-linear-model-and-decision-tree-2eaed21936b7) 24 | - [Model Tree: handle Data Shifts mixing Linear Model and Decision Tree](https://towardsdatascience.com/model-tree-handle-data-shifts-mixing-linear-model-and-decision-tree-facfd642e42b) 25 | - [Explainable AI with Linear Trees](https://towardsdatascience.com/explainable-ai-with-linear-trees-7e30a6f067d7) 26 | - [Improve Linear Regression for Time Series Forecasting](https://towardsdatascience.com/improve-linear-regression-for-time-series-forecasting-e36f3c3e3534#a80b-b6010ccb1c21) 27 | - [Linear Boosting with Automated Features Engineering](https://towardsdatascience.com/linear-boosting-with-automated-features-engineering-894962c3ba84) 28 | - [Improve Random Forest with Linear Models](https://towardsdatascience.com/improve-random-forest-with-linear-models-1fa789691e18) 29 | 30 | ## Usage 31 | ##### Linear Tree Regression 32 | ```python 33 | from sklearn.linear_model import LinearRegression 34 | from lineartree import LinearTreeRegressor 35 | from sklearn.datasets import make_regression 36 | X, y = make_regression(n_samples=100, n_features=4, 37 | n_informative=2, n_targets=1, 38 | random_state=0, shuffle=False) 39 | regr = LinearTreeRegressor(base_estimator=LinearRegression()) 40 | regr.fit(X, y) 41 | ``` 42 | ##### Linear Tree Classification 43 | ```python 44 | from sklearn.linear_model import RidgeClassifier 45 | from lineartree import LinearTreeClassifier 46 | from sklearn.datasets import make_classification 47 | X, y = make_classification(n_samples=100, n_features=4, 48 | n_informative=2, n_redundant=0, 49 | random_state=0, shuffle=False) 50 | clf = LinearTreeClassifier(base_estimator=RidgeClassifier()) 51 | clf.fit(X, y) 52 | ``` 53 | ##### Linear Forest Regression 54 | ```python 55 | from sklearn.linear_model import LinearRegression 56 | from lineartree import LinearForestRegressor 57 | from sklearn.datasets import make_regression 58 | X, y = make_regression(n_samples=100, n_features=4, 59 | n_informative=2, n_targets=1, 60 | random_state=0, shuffle=False) 61 | regr = LinearForestRegressor(base_estimator=LinearRegression()) 62 | regr.fit(X, y) 63 | ``` 64 | ##### Linear Forest Classification 65 | ```python 66 | from sklearn.linear_model import LinearRegression 67 | from lineartree import LinearForestClassifier 68 | from sklearn.datasets import make_classification 69 | X, y = make_classification(n_samples=100, n_features=4, 70 | n_informative=2, n_redundant=0, 71 | random_state=0, shuffle=False) 72 | clf = LinearForestClassifier(base_estimator=LinearRegression()) 73 | clf.fit(X, y) 74 | ``` 75 | ##### Linear Boosting Regression 76 | ```python 77 | from sklearn.linear_model import LinearRegression 78 | from lineartree import LinearBoostRegressor 79 | from sklearn.datasets import make_regression 80 | X, y = make_regression(n_samples=100, n_features=4, 81 | n_informative=2, n_targets=1, 82 | random_state=0, shuffle=False) 83 | regr = LinearBoostRegressor(base_estimator=LinearRegression()) 84 | regr.fit(X, y) 85 | ``` 86 | ##### Linear Boosting Classification 87 | ```python 88 | from sklearn.linear_model import RidgeClassifier 89 | from lineartree import LinearBoostClassifier 90 | from sklearn.datasets import make_classification 91 | X, y = make_classification(n_samples=100, n_features=4, 92 | n_informative=2, n_redundant=0, 93 | random_state=0, shuffle=False) 94 | clf = 
LinearBoostClassifier(base_estimator=RidgeClassifier()) 95 | clf.fit(X, y) 96 | ``` 97 | 98 | More examples in the [notebooks folder](https://github.com/cerlymarco/linear-tree/tree/main/notebooks). 99 | 100 | Check the [API Reference](https://github.com/cerlymarco/linear-tree/blob/main/notebooks/README.md) to see the parameter configurations and the available methods. 101 | 102 | ## Examples 103 | Show the linear tree learning path: 104 | 105 | ![plot tree](https://raw.githubusercontent.com/cerlymarco/linear-tree/master/imgs/plot_tree.png) 106 | 107 | Linear Tree Regressor at work: 108 | 109 | ![linear tree regressor](https://raw.githubusercontent.com/cerlymarco/linear-tree/master/imgs/linear_tree_reg.png) 110 | 111 | Linear Tree Classifier at work: 112 | 113 | ![linear tree classifier](https://raw.githubusercontent.com/cerlymarco/linear-tree/master/imgs/linear_tree_class.png) 114 | 115 | Extract and examine coefficients at the leaves: 116 | 117 | ![leaf coefficients](https://raw.githubusercontent.com/cerlymarco/linear-tree/master/imgs/leaf_coefficients.png) 118 | 119 | Impact of the features automatically generated with Linear Boosting: 120 | 121 | ![linear_boost_importances](https://raw.githubusercontent.com/cerlymarco/linear-tree/master/imgs/linear_boost_importances.png) 122 | 123 | Comparing predictions of Linear Forest and Random Forest: 124 | 125 | ![linear_forest_predictions](https://raw.githubusercontent.com/cerlymarco/linear-tree/master/imgs/linear_forest_predictions.png) 126 | 127 | ## References 128 | - Regression-Enhanced Random Forests. Haozhe Zhang, Dan Nettleton, Zhengyuan Zhu. 129 | - Explainable boosted linear regression for time series forecasting. Igor Ilic, Berk Gorgulu, Mucahit Cevik, Mustafa Gokce Baydogan. 130 | -------------------------------------------------------------------------------- /imgs/leaf_coefficients.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/cerlymarco/linear-tree/2982edc050206521fa9cde7df1b1f88ab7b2183d/imgs/leaf_coefficients.png -------------------------------------------------------------------------------- /imgs/linear_boost_importances.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/cerlymarco/linear-tree/2982edc050206521fa9cde7df1b1f88ab7b2183d/imgs/linear_boost_importances.png -------------------------------------------------------------------------------- /imgs/linear_forest_predictions.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/cerlymarco/linear-tree/2982edc050206521fa9cde7df1b1f88ab7b2183d/imgs/linear_forest_predictions.png -------------------------------------------------------------------------------- /imgs/linear_tree_class.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/cerlymarco/linear-tree/2982edc050206521fa9cde7df1b1f88ab7b2183d/imgs/linear_tree_class.png -------------------------------------------------------------------------------- /imgs/linear_tree_reg.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/cerlymarco/linear-tree/2982edc050206521fa9cde7df1b1f88ab7b2183d/imgs/linear_tree_reg.png -------------------------------------------------------------------------------- /imgs/plot_tree.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/cerlymarco/linear-tree/2982edc050206521fa9cde7df1b1f88ab7b2183d/imgs/plot_tree.png -------------------------------------------------------------------------------- /lineartree/__init__.py: -------------------------------------------------------------------------------- 1 | from ._classes import * 2 | from ._criterion import * 3 | from .lineartree import * -------------------------------------------------------------------------------- /lineartree/_classes.py: -------------------------------------------------------------------------------- 1 | import numbers 2 | import numpy as np 3 | import scipy.sparse as sp 4 | 5 | from copy import deepcopy 6 | from joblib import Parallel, effective_n_jobs # , delayed 7 | 8 | from sklearn.dummy import DummyClassifier 9 | from sklearn.tree import DecisionTreeRegressor 10 | from sklearn.ensemble import RandomForestRegressor 11 | 12 | from sklearn.base import is_regressor 13 | from sklearn.base import BaseEstimator, TransformerMixin 14 | from sklearn.utils.validation import has_fit_parameter, check_is_fitted 15 | 16 | from ._criterion import SCORING 17 | from ._criterion import mse, rmse, mae, poisson 18 | from ._criterion import hamming, crossentropy 19 | 20 | import sklearn 21 | _sklearn_v1 = eval(sklearn.__version__.split('.')[0]) > 0 22 | 23 | 24 | CRITERIA = {"mse": mse, 25 | "rmse": rmse, 26 | "mae": mae, 27 | "poisson": poisson, 28 | "hamming": hamming, 29 | "crossentropy": crossentropy} 30 | 31 | 32 | ######################################################################### 33 | ### remove when https://github.com/joblib/joblib/issues/1071 is fixed ### 34 | ######################################################################### 35 | from sklearn import get_config, config_context 36 | from functools import update_wrapper 37 | import functools 38 | 39 | # from sklearn.utils.fixes 40 | def delayed(function): 41 | """Decorator used to capture the arguments of a function.""" 42 | @functools.wraps(function) 43 | def delayed_function(*args, **kwargs): 44 | return _FuncWrapper(function), args, kwargs 45 | return delayed_function 46 | 47 | # from sklearn.utils.fixes 48 | class _FuncWrapper: 49 | """"Load the global configuration before calling the function.""" 50 | def __init__(self, function): 51 | self.function = function 52 | self.config = get_config() 53 | update_wrapper(self, self.function) 54 | 55 | def __call__(self, *args, **kwargs): 56 | with config_context(**self.config): 57 | return self.function(*args, **kwargs) 58 | ######################################################################### 59 | ######################################################################### 60 | ######################################################################### 61 | 62 | 63 | def _partition_columns(columns, n_jobs): 64 | """Private function to partition columns splitting between jobs.""" 65 | # Compute the number of jobs 66 | n_columns = len(columns) 67 | n_jobs = min(effective_n_jobs(n_jobs), n_columns) 68 | 69 | # Partition columns between jobs 70 | n_columns_per_job = np.full(n_jobs, n_columns // n_jobs, dtype=int) 71 | n_columns_per_job[:n_columns % n_jobs] += 1 72 | columns_per_job = np.cumsum(n_columns_per_job) 73 | columns_per_job = np.split(columns, columns_per_job) 74 | columns_per_job = columns_per_job[:-1] 75 | 76 | return n_jobs, columns_per_job 77 | 78 | 79 | def _parallel_binning_fit(split_feat, _self, X, y, 80 | weights, support_sample_weight, 81 | bins, loss): 82 | """Private function to find the best column 
splittings within a job.""" 83 | n_sample, n_feat = X.shape 84 | feval = CRITERIA[_self.criterion] 85 | 86 | split_t = None 87 | split_col = None 88 | left_node = (None, None, None, None) 89 | right_node = (None, None, None, None) 90 | largs_left = {'classes': None} 91 | largs_right = {'classes': None} 92 | 93 | if n_sample < _self._min_samples_split: 94 | return loss, split_t, split_col, left_node, right_node 95 | 96 | for col, _bin in zip(split_feat, bins): 97 | 98 | for q in _bin: 99 | 100 | # create 1D bool mask for right/left children 101 | mask = (X[:, col] > q) 102 | 103 | n_left, n_right = (~mask).sum(), mask.sum() 104 | 105 | if n_left < _self._min_samples_leaf or n_right < _self._min_samples_leaf: 106 | continue 107 | 108 | # create 2D bool mask for right/left children 109 | left_mesh = np.ix_(~mask, _self._linear_features) 110 | right_mesh = np.ix_(mask, _self._linear_features) 111 | 112 | model_left = deepcopy(_self.base_estimator) 113 | model_right = deepcopy(_self.base_estimator) 114 | 115 | if hasattr(_self, 'classes_'): 116 | largs_left['classes'] = np.unique(y[~mask]) 117 | largs_right['classes'] = np.unique(y[mask]) 118 | if len(largs_left['classes']) == 1: 119 | model_left = DummyClassifier(strategy="most_frequent") 120 | if len(largs_right['classes']) == 1: 121 | model_right = DummyClassifier(strategy="most_frequent") 122 | 123 | if weights is None: 124 | model_left.fit(X[left_mesh], y[~mask]) 125 | loss_left = feval(model_left, X[left_mesh], y[~mask], 126 | **largs_left) 127 | wloss_left = loss_left * (n_left / n_sample) 128 | 129 | model_right.fit(X[right_mesh], y[mask]) 130 | loss_right = feval(model_right, X[right_mesh], y[mask], 131 | **largs_right) 132 | wloss_right = loss_right * (n_right / n_sample) 133 | 134 | else: 135 | if support_sample_weight: 136 | model_left.fit(X[left_mesh], y[~mask], 137 | sample_weight=weights[~mask]) 138 | 139 | model_right.fit(X[right_mesh], y[mask], 140 | sample_weight=weights[mask]) 141 | 142 | else: 143 | model_left.fit(X[left_mesh], y[~mask]) 144 | 145 | model_right.fit(X[right_mesh], y[mask]) 146 | 147 | loss_left = feval(model_left, X[left_mesh], y[~mask], 148 | weights=weights[~mask], **largs_left) 149 | wloss_left = loss_left * (weights[~mask].sum() / weights.sum()) 150 | 151 | loss_right = feval(model_right, X[right_mesh], y[mask], 152 | weights=weights[mask], **largs_right) 153 | wloss_right = loss_right * (weights[mask].sum() / weights.sum()) 154 | 155 | total_loss = round(wloss_left + wloss_right, 5) 156 | 157 | # store if best 158 | if total_loss < loss: 159 | split_t = q 160 | split_col = col 161 | loss = total_loss 162 | left_node = (model_left, loss_left, wloss_left, 163 | n_left, largs_left['classes']) 164 | right_node = (model_right, loss_right, wloss_right, 165 | n_right, largs_right['classes']) 166 | 167 | return loss, split_t, split_col, left_node, right_node 168 | 169 | 170 | def _map_node(X, feat, direction, split): 171 | """Utility to map samples to nodes""" 172 | if direction == 'L': 173 | mask = (X[:, feat] <= split) 174 | else: 175 | mask = (X[:, feat] > split) 176 | 177 | return mask 178 | 179 | 180 | def _predict_branch(X, branch_history, mask=None): 181 | """Utility to map samples to branches""" 182 | 183 | if mask is None: 184 | mask = np.repeat(True, X.shape[0]) 185 | 186 | for node in branch_history: 187 | mask = np.logical_and(_map_node(X, *node), mask) 188 | 189 | return mask 190 | 191 | 192 | class Node: 193 | 194 | def __init__(self, id=None, threshold=[], 195 | parent=None, children=None, 196 | 
n_samples=None, w_loss=None, 197 | loss=None, model=None, classes=None): 198 | self.id = id 199 | self.threshold = threshold 200 | self.parent = parent 201 | self.children = children 202 | self.n_samples = n_samples 203 | self.w_loss = w_loss 204 | self.loss = loss 205 | self.model = model 206 | self.classes = classes 207 | 208 | 209 | class _LinearTree(BaseEstimator): 210 | """Base class for Linear Tree meta-estimator. 211 | 212 | Warning: This class should not be used directly. Use derived classes 213 | instead. 214 | """ 215 | def __init__(self, base_estimator, *, criterion, max_depth, 216 | min_samples_split, min_samples_leaf, max_bins, 217 | min_impurity_decrease, categorical_features, 218 | split_features, linear_features, n_jobs): 219 | 220 | self.base_estimator = base_estimator 221 | self.criterion = criterion 222 | self.max_depth = max_depth 223 | self.min_samples_split = min_samples_split 224 | self.min_samples_leaf = min_samples_leaf 225 | self.max_bins = max_bins 226 | self.min_impurity_decrease = min_impurity_decrease 227 | self.categorical_features = categorical_features 228 | self.split_features = split_features 229 | self.linear_features = linear_features 230 | self.n_jobs = n_jobs 231 | 232 | def _parallel_args(self): 233 | return {} 234 | 235 | def _split(self, X, y, bins, 236 | support_sample_weight, 237 | weights=None, 238 | loss=None): 239 | """Evaluate optimal splits in a given node (in a specific partition of 240 | X and y). 241 | 242 | Parameters 243 | ---------- 244 | X : array-like of shape (n_samples, n_features) 245 | The training input samples. 246 | 247 | y : array-like of shape (n_samples, ) 248 | The target values (class labels in classification, real numbers in 249 | regression). 250 | 251 | bins : array-like of shape (max_bins - 2, ) 252 | The bins to use to find an optimal split. Expressed as percentiles. 253 | 254 | support_sample_weight : bool 255 | Whether the estimator's fit method supports sample_weight. 256 | 257 | weights : array-like of shape (n_samples, ), default=None 258 | Sample weights. If None, then samples are equally weighted. 259 | Note that if the base estimator does not support sample weighting, 260 | the sample weights are still used to evaluate the splits. 261 | 262 | loss : float, default=None 263 | The loss of the parent node. A split is computed if the weighted 264 | loss sum of the two children is lower than the loss of the parent. 265 | A None value implies the first fit on all the data to evaluate 266 | the benefits of possible future splits. 
267 | 268 | Returns 269 | ------- 270 | self : object 271 | """ 272 | # Parallel loops 273 | n_jobs, split_feat = _partition_columns(self._split_features, self.n_jobs) 274 | 275 | # partition columns splittings between jobs 276 | all_results = Parallel(n_jobs=n_jobs, verbose=0, 277 | **self._parallel_args())( 278 | delayed(_parallel_binning_fit)( 279 | feat, 280 | self, X, y, 281 | weights, support_sample_weight, 282 | [bins[i] for i in feat], 283 | loss 284 | ) 285 | for feat in split_feat) 286 | 287 | # extract results from parallel loops 288 | _losses, split_t, split_col = [], [], [] 289 | left_node, right_node = [], [] 290 | for job_res in all_results: 291 | _losses.append(job_res[0]) 292 | split_t.append(job_res[1]) 293 | split_col.append(job_res[2]) 294 | left_node.append(job_res[3]) 295 | right_node.append(job_res[4]) 296 | 297 | # select best results 298 | _id_best = np.argmin(_losses) 299 | if loss - _losses[_id_best] > self.min_impurity_decrease: 300 | split_t = split_t[_id_best] 301 | split_col = split_col[_id_best] 302 | left_node = left_node[_id_best] 303 | right_node = right_node[_id_best] 304 | else: 305 | split_t = None 306 | split_col = None 307 | left_node = (None, None, None, None, None) 308 | right_node = (None, None, None, None, None) 309 | 310 | return split_t, split_col, left_node, right_node 311 | 312 | def _grow(self, X, y, weights=None): 313 | """Grow and prune a Linear Tree from the training set (X, y). 314 | 315 | Parameters 316 | ---------- 317 | X : array-like of shape (n_samples, n_features) 318 | The training input samples. 319 | 320 | y : array-like of shape (n_samples, ) 321 | The target values (class labels in classification, real numbers in 322 | regression). 323 | 324 | weights : array-like of shape (n_samples, ), default=None 325 | Sample weights. If None, then samples are equally weighted. 326 | Note that if the base estimator does not support sample weighting, 327 | the sample weights are still used to evaluate the splits. 
328 | 329 | Returns 330 | ------- 331 | self : object 332 | """ 333 | n_sample, self.n_features_in_ = X.shape 334 | self.feature_importances_ = np.zeros((self.n_features_in_,)) 335 | 336 | # extract quantiles 337 | bins = np.linspace(0, 1, self.max_bins)[1:-1] 338 | bins = np.quantile(X, bins, axis=0) 339 | bins = list(bins.T) 340 | bins = [np.unique(X[:, c]) if c in self._categorical_features 341 | else np.unique(q) for c, q in enumerate(bins)] 342 | 343 | # check if base_estimator supports fitting with sample_weights 344 | support_sample_weight = has_fit_parameter(self.base_estimator, 345 | "sample_weight") 346 | 347 | queue = [''] # queue of the nodes to evaluate for splitting 348 | # store the results of each node in dicts 349 | self._nodes = {} 350 | self._leaves = {} 351 | 352 | # initialize first fit 353 | largs = {'classes': None} 354 | model = deepcopy(self.base_estimator) 355 | if weights is None or not support_sample_weight: 356 | model.fit(X[:, self._linear_features], y) 357 | else: 358 | model.fit(X[:, self._linear_features], y, sample_weight=weights) 359 | 360 | if hasattr(self, 'classes_'): 361 | largs['classes'] = self.classes_ 362 | 363 | loss = CRITERIA[self.criterion]( 364 | model, X[:, self._linear_features], y, 365 | weights=weights, **largs) 366 | loss = round(loss, 5) 367 | 368 | self._nodes[''] = Node( 369 | id=0, 370 | n_samples=n_sample, 371 | model=model, 372 | loss=loss, 373 | classes=largs['classes'] 374 | ) 375 | 376 | # in the beginning consider all the samples 377 | start = np.repeat(True, n_sample) 378 | mask = start.copy() 379 | 380 | i = 1 381 | while len(queue) > 0: 382 | 383 | if weights is None: 384 | split_t, split_col, left_node, right_node = self._split( 385 | X[mask], y[mask], bins, 386 | support_sample_weight, 387 | loss=loss) 388 | else: 389 | split_t, split_col, left_node, right_node = self._split( 390 | X[mask], y[mask], bins, 391 | support_sample_weight, weights[mask], 392 | loss=loss) 393 | 394 | # no utility in splitting 395 | if split_col is None or len(queue[-1]) >= self.max_depth: 396 | self._leaves[queue[-1]] = self._nodes[queue[-1]] 397 | del self._nodes[queue[-1]] 398 | queue.pop() 399 | else: 400 | model_left, loss_left, wloss_left, n_left, class_left = \ 401 | left_node 402 | model_right, loss_right, wloss_right, n_right, class_right = \ 403 | right_node 404 | self.feature_importances_[split_col] += \ 405 | loss - wloss_left - wloss_right 406 | 407 | self._nodes[queue[-1] + 'L'] = Node( 408 | id=i, parent=queue[-1], 409 | model=model_left, 410 | loss=loss_left, 411 | w_loss=wloss_left, 412 | n_samples=n_left, 413 | threshold=self._nodes[queue[-1]].threshold[:] + [ 414 | (split_col, 'L', split_t) 415 | ] 416 | ) 417 | 418 | self._nodes[queue[-1] + 'R'] = Node( 419 | id=i + 1, parent=queue[-1], 420 | model=model_right, 421 | loss=loss_right, 422 | w_loss=wloss_right, 423 | n_samples=n_right, 424 | threshold=self._nodes[queue[-1]].threshold[:] + [ 425 | (split_col, 'R', split_t) 426 | ] 427 | ) 428 | 429 | if hasattr(self, 'classes_'): 430 | self._nodes[queue[-1] + 'L'].classes = class_left 431 | self._nodes[queue[-1] + 'R'].classes = class_right 432 | 433 | self._nodes[queue[-1]].children = (queue[-1] + 'L', queue[-1] + 'R') 434 | 435 | i += 2 436 | q = queue[-1] 437 | queue.pop() 438 | queue.extend([q + 'R', q + 'L']) 439 | 440 | if len(queue) > 0: 441 | loss = self._nodes[queue[-1]].loss 442 | mask = _predict_branch( 443 | X, self._nodes[queue[-1]].threshold, start.copy()) 444 | 445 | self.node_count = i 446 | 447 | return self 448 | 
449 | def _fit(self, X, y, sample_weight=None): 450 | """Build a Linear Tree of a linear estimator from the training 451 | set (X, y). 452 | 453 | Parameters 454 | ---------- 455 | X : array-like of shape (n_samples, n_features) 456 | The training input samples. 457 | 458 | y : array-like of shape (n_samples, ) or also (n_samples, n_targets) for 459 | multitarget regression. 460 | The target values (class labels in classification, real numbers in 461 | regression). 462 | 463 | sample_weight : array-like of shape (n_samples, ), default=None 464 | Sample weights. If None, then samples are equally weighted. 465 | Note that if the base estimator does not support sample weighting, 466 | the sample weights are still used to evaluate the splits. 467 | 468 | Returns 469 | ------- 470 | self : object 471 | """ 472 | n_sample, n_feat = X.shape 473 | 474 | if isinstance(self.min_samples_split, numbers.Integral): 475 | if self.min_samples_split < 6: 476 | raise ValueError( 477 | "min_samples_split must be an integer greater than 5 or " 478 | "a float in (0.0, 1.0); got the integer {}".format( 479 | self.min_samples_split)) 480 | self._min_samples_split = self.min_samples_split 481 | else: 482 | if not 0. < self.min_samples_split < 1.: 483 | raise ValueError( 484 | "min_samples_split must be an integer greater than 5 or " 485 | "a float in (0.0, 1.0); got the float {}".format( 486 | self.min_samples_split)) 487 | 488 | self._min_samples_split = int(np.ceil(self.min_samples_split * n_sample)) 489 | self._min_samples_split = max(6, self._min_samples_split) 490 | 491 | if isinstance(self.min_samples_leaf, numbers.Integral): 492 | if self.min_samples_leaf < 3: 493 | raise ValueError( 494 | "min_samples_leaf must be an integer greater than 2 or " 495 | "a float in (0.0, 1.0); got the integer {}".format( 496 | self.min_samples_leaf)) 497 | self._min_samples_leaf = self.min_samples_leaf 498 | else: 499 | if not 0. < self.min_samples_leaf < 1.: 500 | raise ValueError( 501 | "min_samples_leaf must be an integer greater than 2 or " 502 | "a float in (0.0, 1.0); got the float {}".format( 503 | self.min_samples_leaf)) 504 | 505 | self._min_samples_leaf = int(np.ceil(self.min_samples_leaf * n_sample)) 506 | self._min_samples_leaf = max(3, self._min_samples_leaf) 507 | 508 | if not 1 <= self.max_depth <= 20: 509 | raise ValueError("max_depth must be an integer in [1, 20].") 510 | 511 | if not 10 <= self.max_bins <= 120: 512 | raise ValueError("max_bins must be an integer in [10, 120].") 513 | 514 | if not hasattr(self.base_estimator, 'fit_intercept'): 515 | raise ValueError( 516 | "Only linear models are accepted as base_estimator. " 517 | "Select one from linear_model class of scikit-learn.") 518 | 519 | if self.categorical_features is not None: 520 | cat_features = np.unique(self.categorical_features) 521 | 522 | if not issubclass(cat_features.dtype.type, numbers.Integral): 523 | raise ValueError( 524 | "No valid specification of categorical columns. " 525 | "Only a scalar, list or array-like of integers is allowed.") 526 | 527 | if (cat_features < 0).any() or (cat_features >= n_feat).any(): 528 | raise ValueError( 529 | 'Categorical features must be in [0, {}].'.format( 530 | n_feat - 1)) 531 | 532 | if len(cat_features) == n_feat: 533 | raise ValueError( 534 | "Only categorical features detected. 
" 535 | "No features available for fitting.") 536 | else: 537 | cat_features = [] 538 | self._categorical_features = cat_features 539 | 540 | if self.split_features is not None: 541 | split_features = np.unique(self.split_features) 542 | 543 | if not issubclass(split_features.dtype.type, numbers.Integral): 544 | raise ValueError( 545 | "No valid specification of split_features. " 546 | "Only a scalar, list or array-like of integers is allowed.") 547 | 548 | if (split_features < 0).any() or (split_features >= n_feat).any(): 549 | raise ValueError( 550 | 'Splitting features must be in [0, {}].'.format( 551 | n_feat - 1)) 552 | else: 553 | split_features = np.arange(n_feat) 554 | self._split_features = split_features 555 | 556 | if self.linear_features is not None: 557 | linear_features = np.unique(self.linear_features) 558 | 559 | if not issubclass(linear_features.dtype.type, numbers.Integral): 560 | raise ValueError( 561 | "No valid specification of linear_features. " 562 | "Only a scalar, list or array-like of integers is allowed.") 563 | 564 | if (linear_features < 0).any() or (linear_features >= n_feat).any(): 565 | raise ValueError( 566 | 'Linear features must be in [0, {}].'.format( 567 | n_feat - 1)) 568 | 569 | if np.isin(linear_features, cat_features).any(): 570 | raise ValueError( 571 | "Linear features cannot be categorical features.") 572 | else: 573 | linear_features = np.setdiff1d(np.arange(n_feat), cat_features) 574 | self._linear_features = linear_features 575 | 576 | self._grow(X, y, sample_weight) 577 | 578 | normalizer = np.sum(self.feature_importances_) 579 | if normalizer > 0: 580 | self.feature_importances_ /= normalizer 581 | 582 | return self 583 | 584 | def summary(self, feature_names=None, only_leaves=False, max_depth=None): 585 | """Return a summary of nodes created from model fitting. 586 | 587 | Parameters 588 | ---------- 589 | feature_names : array-like of shape (n_features, ), default=None 590 | Names of each of the features. If None, generic names 591 | will be used (“X[0]”, “X[1]”, …). 592 | 593 | only_leaves : bool, default=False 594 | Store only information of leaf nodes. 595 | 596 | max_depth : int, default=None 597 | The maximum depth of the representation. If None, the tree 598 | is fully generated. 599 | 600 | Returns 601 | ------- 602 | summary : nested dict 603 | The keys are the integer map of each node. 604 | The values are dicts containing information for that node: 605 | 606 | - 'col' (^): column used for splitting; 607 | - 'th' (^): threshold value used for splitting in the 608 | selected column; 609 | - 'loss': loss computed at node level. Weighted sum of 610 | children' losses if it is a splitting node; 611 | - 'samples': number of samples in the node. Sum of children' 612 | samples if it is a split node; 613 | - 'children' (^): integer mapping of possible children nodes; 614 | - 'models': fitted linear models built in each split. 615 | Single model if it is leaf node; 616 | - 'classes' (^^): target classes detected in the split. 617 | Available only for LinearTreeClassifier. 618 | 619 | (^): Only for split nodes. 620 | (^^): Only for leaf nodes. 
621 | """ 622 | check_is_fitted(self, attributes='_nodes') 623 | 624 | if max_depth is None: 625 | max_depth = 20 626 | if max_depth < 1: 627 | raise ValueError( 628 | "max_depth must be > 0, got {}".format(max_depth)) 629 | 630 | summary = {} 631 | 632 | if len(self._nodes) > 0 and not only_leaves: 633 | 634 | if (feature_names is not None and 635 | len(feature_names) != self.n_features_in_): 636 | raise ValueError( 637 | "feature_names must contain {} elements, got {}".format( 638 | self.n_features_in_, len(feature_names))) 639 | 640 | if feature_names is None: 641 | feature_names = np.arange(self.n_features_in_) 642 | 643 | for n, N in self._nodes.items(): 644 | 645 | if len(n) >= max_depth: 646 | continue 647 | 648 | cl, cr = N.children 649 | Cl = (self._nodes[cl] if cl in self._nodes 650 | else self._leaves[cl]) 651 | Cr = (self._nodes[cr] if cr in self._nodes 652 | else self._leaves[cr]) 653 | 654 | summary[N.id] = { 655 | 'col': feature_names[Cl.threshold[-1][0]], 656 | 'th': round(Cl.threshold[-1][-1], 5), 657 | 'loss': round(Cl.w_loss + Cr.w_loss, 5), 658 | 'samples': Cl.n_samples + Cr.n_samples, 659 | 'children': (Cl.id, Cr.id), 660 | 'models': (Cl.model, Cr.model) 661 | } 662 | 663 | for l, L in self._leaves.items(): 664 | 665 | if len(l) > max_depth: 666 | continue 667 | 668 | summary[L.id] = { 669 | 'loss': round(L.loss, 5), 670 | 'samples': L.n_samples, 671 | 'models': L.model 672 | } 673 | 674 | if hasattr(self, 'classes_'): 675 | summary[L.id]['classes'] = L.classes 676 | 677 | return summary 678 | 679 | def apply(self, X): 680 | """Return the index of the leaf that each sample is predicted as. 681 | 682 | Parameters 683 | ---------- 684 | X : array-like of shape (n_samples, n_features) 685 | Samples. 686 | 687 | Returns 688 | ------- 689 | X_leaves : array-like of shape (n_samples, ) 690 | For each datapoint x in X, return the index of the leaf x 691 | ends up in. Leaves are numbered within 692 | ``[0; n_nodes)``, possibly with gaps in the 693 | numbering. 694 | """ 695 | check_is_fitted(self, attributes='_nodes') 696 | 697 | X = self._validate_data( 698 | X, 699 | reset=False, 700 | accept_sparse=False, 701 | dtype='float32', 702 | force_all_finite=True, 703 | ensure_2d=True, 704 | allow_nd=False, 705 | ensure_min_features=self.n_features_in_ 706 | ) 707 | 708 | X_leaves = np.zeros(X.shape[0], dtype='int64') 709 | 710 | for L in self._leaves.values(): 711 | 712 | mask = _predict_branch(X, L.threshold) 713 | if (~mask).all(): 714 | continue 715 | 716 | X_leaves[mask] = L.id 717 | 718 | return X_leaves 719 | 720 | def decision_path(self, X): 721 | """Return the decision path in the tree. 722 | 723 | Parameters 724 | ---------- 725 | X : array-like of shape (n_samples, n_features) 726 | Samples. 727 | 728 | Returns 729 | ------- 730 | indicator : sparse matrix of shape (n_samples, n_nodes) 731 | Return a node indicator CSR matrix where non zero elements 732 | indicates that the samples goes through the nodes. 
733 | """ 734 | check_is_fitted(self, attributes='_nodes') 735 | 736 | X = self._validate_data( 737 | X, 738 | reset=False, 739 | accept_sparse=False, 740 | dtype='float32', 741 | force_all_finite=True, 742 | ensure_2d=True, 743 | allow_nd=False, 744 | ensure_min_features=self.n_features_in_ 745 | ) 746 | 747 | indicator = np.zeros((X.shape[0], self.node_count), dtype='int64') 748 | 749 | for L in self._leaves.values(): 750 | 751 | mask = _predict_branch(X, L.threshold) 752 | if (~mask).all(): 753 | continue 754 | 755 | n = L.id 756 | p = L.parent 757 | paths_id = [n] 758 | 759 | while p is not None: 760 | n = self._nodes[p].id 761 | p = self._nodes[p].parent 762 | paths_id.append(n) 763 | 764 | indicator[np.ix_(mask, paths_id)] = 1 765 | 766 | return sp.csr_matrix(indicator) 767 | 768 | def model_to_dot(self, feature_names=None, max_depth=None): 769 | """Convert a fitted Linear Tree model to dot format. 770 | It results in ModuleNotFoundError if graphviz or pydot are not available. 771 | When installing graphviz make sure to add it to the system path. 772 | 773 | Parameters 774 | ---------- 775 | feature_names : array-like of shape (n_features, ), default=None 776 | Names of each of the features. If None, generic names 777 | will be used (“X[0]”, “X[1]”, …). 778 | 779 | max_depth : int, default=None 780 | The maximum depth of the representation. If None, the tree 781 | is fully generated. 782 | 783 | Returns 784 | ------- 785 | graph : pydot.Dot instance 786 | Return an instance representing the Linear Tree. Splitting nodes have 787 | a rectangular shape while leaf nodes have a circular one. 788 | """ 789 | import pydot 790 | 791 | summary = self.summary(feature_names=feature_names, max_depth=max_depth) 792 | graph = pydot.Dot('linear_tree', graph_type='graph') 793 | 794 | # create nodes 795 | for n in summary: 796 | if 'col' in summary[n]: 797 | if isinstance(summary[n]['col'], str): 798 | msg = "id_node: {}\n{} <= {}\nloss: {:.4f}\nsamples: {}" 799 | else: 800 | msg = "id_node: {}\nX[{}] <= {}\nloss: {:.4f}\nsamples: {}" 801 | 802 | msg = msg.format( 803 | n, summary[n]['col'], summary[n]['th'], 804 | summary[n]['loss'], summary[n]['samples'] 805 | ) 806 | graph.add_node(pydot.Node(n, label=msg, shape='rectangle')) 807 | 808 | for c in summary[n]['children']: 809 | if c not in summary: 810 | graph.add_node(pydot.Node(c, label="...", 811 | shape='rectangle')) 812 | 813 | else: 814 | msg = "id_node: {}\nloss: {:.4f}\nsamples: {}".format( 815 | n, summary[n]['loss'], summary[n]['samples']) 816 | graph.add_node(pydot.Node(n, label=msg)) 817 | 818 | # add edges 819 | for n in summary: 820 | if 'children' in summary[n]: 821 | for c in summary[n]['children']: 822 | graph.add_edge(pydot.Edge(n, c)) 823 | 824 | return graph 825 | 826 | def plot_model(self, feature_names=None, max_depth=None): 827 | """Convert a fitted Linear Tree model to dot format and display it. 828 | It results in ModuleNotFoundError if graphviz or pydot are not available. 829 | When installing graphviz make sure to add it to the system path. 830 | 831 | Parameters 832 | ---------- 833 | feature_names : array-like of shape (n_features, ), default=None 834 | Names of each of the features. If None, generic names 835 | will be used (“X[0]”, “X[1]”, …). 836 | 837 | max_depth : int, default=None 838 | The maximum depth of the representation. If None, the tree 839 | is fully generated. 840 | 841 | Returns 842 | ------- 843 | A Jupyter notebook Image object if Jupyter is installed. 
844 | This enables in-line display of the model plots in notebooks. 845 | Splitting nodes have a rectangular shape while leaf nodes 846 | have a circular one. 847 | """ 848 | from IPython.display import Image 849 | 850 | graph = self.model_to_dot(feature_names=feature_names, max_depth=max_depth) 851 | 852 | return Image(graph.create_png()) 853 | 854 | 855 | class _LinearBoosting(TransformerMixin, BaseEstimator): 856 | """Base class for Linear Boosting meta-estimator. 857 | 858 | Warning: This class should not be used directly. Use derived classes 859 | instead. 860 | """ 861 | def __init__(self, base_estimator, *, loss, n_estimators, 862 | max_depth, min_samples_split, min_samples_leaf, 863 | min_weight_fraction_leaf, max_features, 864 | random_state, max_leaf_nodes, 865 | min_impurity_decrease, ccp_alpha): 866 | 867 | self.base_estimator = base_estimator 868 | self.loss = loss 869 | self.n_estimators = n_estimators 870 | self.max_depth = max_depth 871 | self.min_samples_split = min_samples_split 872 | self.min_samples_leaf = min_samples_leaf 873 | self.min_weight_fraction_leaf = min_weight_fraction_leaf 874 | self.max_features = max_features 875 | self.random_state = random_state 876 | self.max_leaf_nodes = max_leaf_nodes 877 | self.min_impurity_decrease = min_impurity_decrease 878 | self.ccp_alpha = ccp_alpha 879 | 880 | def _fit(self, X, y, sample_weight=None): 881 | """Build a Linear Boosting from the training set (X, y). 882 | 883 | Parameters 884 | ---------- 885 | X : array-like of shape (n_samples, n_features) 886 | The training input samples. 887 | 888 | y : array-like of shape (n_samples, ) or also (n_samples, n_targets) for 889 | multitarget regression. 890 | The target values (class labels in classification, real numbers in 891 | regression). 892 | 893 | sample_weight : array-like of shape (n_samples, ), default=None 894 | Sample weights. 895 | 896 | Returns 897 | ------- 898 | self : object 899 | """ 900 | if not hasattr(self.base_estimator, 'fit_intercept'): 901 | raise ValueError("Only linear models are accepted as base_estimator. 
" 902 | "Select one from linear_model class of scikit-learn.") 903 | 904 | if self.n_estimators <= 0: 905 | raise ValueError("n_estimators must be an integer greater than 0 but " 906 | "got {}".format(self.n_estimators)) 907 | 908 | n_sample, self.n_features_in_ = X.shape 909 | 910 | self._trees = [] 911 | self._leaves = [] 912 | 913 | for i in range(self.n_estimators): 914 | 915 | estimator = deepcopy(self.base_estimator) 916 | estimator.fit(X, y, sample_weight=sample_weight) 917 | 918 | if self.loss == 'entropy': 919 | pred = estimator.predict_proba(X) 920 | else: 921 | pred = estimator.predict(X) 922 | 923 | if hasattr(self, 'classes_'): 924 | resid = SCORING[self.loss](y, pred, self.classes_) 925 | else: 926 | resid = SCORING[self.loss](y, pred) 927 | 928 | if resid.ndim > 1: 929 | resid = resid.mean(1) 930 | 931 | criterion = 'squared_error' if _sklearn_v1 else 'mse' 932 | 933 | tree = DecisionTreeRegressor( 934 | criterion=criterion, max_depth=self.max_depth, 935 | min_samples_split=self.min_samples_split, 936 | min_samples_leaf=self.min_samples_leaf, 937 | min_weight_fraction_leaf=self.min_weight_fraction_leaf, 938 | max_features=self.max_features, 939 | random_state=self.random_state, 940 | max_leaf_nodes=self.max_leaf_nodes, 941 | min_impurity_decrease=self.min_impurity_decrease, 942 | ccp_alpha=self.ccp_alpha 943 | ) 944 | 945 | tree.fit(X, resid, sample_weight=sample_weight, check_input=False) 946 | self._trees.append(tree) 947 | 948 | pred_tree = np.abs(tree.predict(X, check_input=False)) 949 | worst_pred = np.max(pred_tree) 950 | self._leaves.append(worst_pred) 951 | 952 | pred_tree = (pred_tree == worst_pred).astype(np.float32) 953 | pred_tree = pred_tree.reshape(-1, 1) 954 | X = np.concatenate([X, pred_tree], axis=1) 955 | 956 | self.base_estimator_ = deepcopy(self.base_estimator) 957 | self.base_estimator_.fit(X, y, sample_weight=sample_weight) 958 | 959 | if hasattr(self.base_estimator_, 'coef_'): 960 | self.coef_ = self.base_estimator_.coef_ 961 | 962 | if hasattr(self.base_estimator_, 'intercept_'): 963 | self.intercept_ = self.base_estimator_.intercept_ 964 | 965 | self.n_features_out_ = X.shape[1] 966 | 967 | return self 968 | 969 | def transform(self, X): 970 | """Transform dataset. 971 | 972 | Parameters 973 | ---------- 974 | X : array-like of shape (n_samples, n_features) 975 | Input data to be transformed. Use ``dtype=np.float32`` for maximum 976 | efficiency. 977 | 978 | Returns 979 | ------- 980 | X_transformed : ndarray of shape (n_samples, n_out) 981 | Transformed dataset. 982 | `n_out` is equal to `n_features` + `n_estimators` 983 | """ 984 | check_is_fitted(self, attributes='base_estimator_') 985 | 986 | X = self._validate_data( 987 | X, 988 | reset=False, 989 | accept_sparse=False, 990 | dtype='float32', 991 | force_all_finite=True, 992 | ensure_2d=True, 993 | allow_nd=False, 994 | ensure_min_features=self.n_features_in_ 995 | ) 996 | 997 | for tree, leaf in zip(self._trees, self._leaves): 998 | pred_tree = np.abs(tree.predict(X, check_input=False)) 999 | pred_tree = (pred_tree == leaf).astype(np.float32) 1000 | pred_tree = pred_tree.reshape(-1, 1) 1001 | X = np.concatenate([X, pred_tree], axis=1) 1002 | 1003 | return X 1004 | 1005 | 1006 | class _LinearForest(BaseEstimator): 1007 | """Base class for Linear Forest meta-estimator. 1008 | 1009 | Warning: This class should not be used directly. Use derived classes 1010 | instead. 
1011 | """ 1012 | def __init__(self, base_estimator, *, n_estimators, max_depth, 1013 | min_samples_split, min_samples_leaf, min_weight_fraction_leaf, 1014 | max_features, max_leaf_nodes, min_impurity_decrease, 1015 | bootstrap, oob_score, n_jobs, random_state, 1016 | ccp_alpha, max_samples): 1017 | 1018 | self.base_estimator = base_estimator 1019 | self.n_estimators = n_estimators 1020 | self.max_depth = max_depth 1021 | self.min_samples_split = min_samples_split 1022 | self.min_samples_leaf = min_samples_leaf 1023 | self.min_weight_fraction_leaf = min_weight_fraction_leaf 1024 | self.max_features = max_features 1025 | self.max_leaf_nodes = max_leaf_nodes 1026 | self.min_impurity_decrease = min_impurity_decrease 1027 | self.bootstrap = bootstrap 1028 | self.oob_score = oob_score 1029 | self.n_jobs = n_jobs 1030 | self.random_state = random_state 1031 | self.ccp_alpha = ccp_alpha 1032 | self.max_samples = max_samples 1033 | 1034 | def _sigmoid(self, y): 1035 | """Expit function (a.k.a. logistic sigmoid). 1036 | 1037 | Parameters 1038 | ---------- 1039 | y : array-like of shape (n_samples, ) 1040 | The array to apply expit to element-wise. 1041 | 1042 | Returns 1043 | ------- 1044 | y : array-like of shape (n_samples, ) 1045 | Expits. 1046 | """ 1047 | return np.exp(y) / (1 + np.exp(y)) 1048 | 1049 | def _inv_sigmoid(self, y): 1050 | """Logit function. 1051 | 1052 | Parameters 1053 | ---------- 1054 | y : array-like of shape (n_samples, ) 1055 | The array to apply logit to element-wise. 1056 | 1057 | Returns 1058 | ------- 1059 | y : array-like of shape (n_samples, ) 1060 | Logits. 1061 | """ 1062 | y = y.clip(1e-3, 1 - 1e-3) 1063 | 1064 | return np.log(y / (1 - y)) 1065 | 1066 | def _fit(self, X, y, sample_weight=None): 1067 | """Build a Linear Boosting from the training set (X, y). 1068 | 1069 | Parameters 1070 | ---------- 1071 | X : array-like of shape (n_samples, n_features) 1072 | The training input samples. 1073 | 1074 | y : array-like of shape (n_samples, ) or also (n_samples, n_targets) for 1075 | multitarget regression. 1076 | The target values (class labels in classification, real numbers in 1077 | regression). 1078 | 1079 | sample_weight : array-like of shape (n_samples, ), default=None 1080 | Sample weights. 1081 | 1082 | Returns 1083 | ------- 1084 | self : object 1085 | """ 1086 | if not hasattr(self.base_estimator, 'fit_intercept'): 1087 | raise ValueError("Only linear models are accepted as base_estimator. 
" 1088 | "Select one from linear_model class of scikit-learn.") 1089 | 1090 | if not is_regressor(self.base_estimator): 1091 | raise ValueError("Select a regressor linear model as base_estimator.") 1092 | 1093 | n_sample, self.n_features_in_ = X.shape 1094 | 1095 | if hasattr(self, 'classes_'): 1096 | class_to_int = dict(map(reversed, enumerate(self.classes_))) 1097 | y = np.array([class_to_int[i] for i in y]) 1098 | y = self._inv_sigmoid(y) 1099 | 1100 | self.base_estimator_ = deepcopy(self.base_estimator) 1101 | self.base_estimator_.fit(X, y, sample_weight) 1102 | resid = y - self.base_estimator_.predict(X) 1103 | 1104 | criterion = 'squared_error' if _sklearn_v1 else 'mse' 1105 | 1106 | self.forest_estimator_ = RandomForestRegressor( 1107 | n_estimators=self.n_estimators, 1108 | criterion=criterion, 1109 | max_depth=self.max_depth, 1110 | min_samples_split=self.min_samples_split, 1111 | min_samples_leaf=self.min_samples_leaf, 1112 | min_weight_fraction_leaf=self.min_weight_fraction_leaf, 1113 | max_features=self.max_features, 1114 | max_leaf_nodes=self.max_leaf_nodes, 1115 | min_impurity_decrease=self.min_impurity_decrease, 1116 | bootstrap=self.bootstrap, 1117 | oob_score=self.oob_score, 1118 | n_jobs=self.n_jobs, 1119 | random_state=self.random_state, 1120 | ccp_alpha=self.ccp_alpha, 1121 | max_samples=self.max_samples 1122 | ) 1123 | self.forest_estimator_.fit(X, resid, sample_weight) 1124 | 1125 | if hasattr(self.base_estimator_, 'coef_'): 1126 | self.coef_ = self.base_estimator_.coef_ 1127 | 1128 | if hasattr(self.base_estimator_, 'intercept_'): 1129 | self.intercept_ = self.base_estimator_.intercept_ 1130 | 1131 | self.feature_importances_ = self.forest_estimator_.feature_importances_ 1132 | 1133 | return self 1134 | 1135 | def apply(self, X): 1136 | """Apply trees in the forest to X, return leaf indices. 1137 | 1138 | Parameters 1139 | ---------- 1140 | X : array-like of shape (n_samples, n_features) 1141 | The input samples. 1142 | 1143 | Returns 1144 | ------- 1145 | X_leaves : ndarray of shape (n_samples, n_estimators) 1146 | For each datapoint x in X and for each tree in the forest, 1147 | return the index of the leaf x ends up in. 1148 | """ 1149 | check_is_fitted(self, attributes='base_estimator_') 1150 | 1151 | return self.forest_estimator_.apply(X) 1152 | 1153 | def decision_path(self, X): 1154 | """Return the decision path in the forest. 1155 | 1156 | Parameters 1157 | ---------- 1158 | X : array-like of shape (n_samples, n_features) 1159 | The input samples. 1160 | 1161 | Returns 1162 | ------- 1163 | indicator : sparse matrix of shape (n_samples, n_nodes) 1164 | Return a node indicator matrix where non zero elements indicates 1165 | that the samples goes through the nodes. The matrix is of CSR 1166 | format. 1167 | 1168 | n_nodes_ptr : ndarray of shape (n_estimators + 1, ) 1169 | The columns from indicator[n_nodes_ptr[i]:n_nodes_ptr[i+1]] 1170 | gives the indicator value for the i-th estimator. 
1171 | """ 1172 | check_is_fitted(self, attributes='base_estimator_') 1173 | 1174 | return self.forest_estimator_.decision_path(X) -------------------------------------------------------------------------------- /lineartree/_criterion.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | 4 | SCORING = { 5 | 'linear': lambda y, yh: y - yh, 6 | 'square': lambda y, yh: np.square(y - yh), 7 | 'absolute': lambda y, yh: np.abs(y - yh), 8 | 'exponential': lambda y, yh: 1 - np.exp(-np.abs(y - yh)), 9 | 'poisson': lambda y, yh: yh.clip(1e-6) - y * np.log(yh.clip(1e-6)), 10 | 'hamming': lambda y, yh, classes: (y != yh).astype(int), 11 | 'entropy': lambda y, yh, classes: np.sum(list(map( 12 | lambda c: -(y == c[1]).astype(int) * np.log(yh[:, c[0]]), 13 | enumerate(classes))), axis=0) 14 | } 15 | 16 | 17 | def _normalize_score(scores, weights=None): 18 | """Normalize scores according to weights""" 19 | 20 | if weights is None: 21 | return scores.mean() 22 | else: 23 | return np.mean(np.dot(scores.T, weights) / weights.sum()) 24 | 25 | 26 | def mse(model, X, y, weights=None, **largs): 27 | """Mean Squared Error""" 28 | 29 | pred = model.predict(X) 30 | scores = SCORING['square'](y, pred) 31 | 32 | return _normalize_score(scores, weights) 33 | 34 | 35 | def rmse(model, X, y, weights=None, **largs): 36 | """Root Mean Squared Error""" 37 | 38 | return np.sqrt(mse(model, X, y, weights, **largs)) 39 | 40 | 41 | def mae(model, X, y, weights=None, **largs): 42 | """Mean Absolute Error""" 43 | 44 | pred = model.predict(X) 45 | scores = SCORING['absolute'](y, pred) 46 | 47 | return _normalize_score(scores, weights) 48 | 49 | 50 | def poisson(model, X, y, weights=None, **largs): 51 | """Poisson Loss""" 52 | 53 | if np.any(y < 0): 54 | raise ValueError("Some value(s) of y are negative which is" 55 | " not allowed for Poisson regression.") 56 | 57 | pred = model.predict(X) 58 | scores = SCORING['poisson'](y, pred) 59 | 60 | return _normalize_score(scores, weights) 61 | 62 | 63 | def hamming(model, X, y, weights=None, **largs): 64 | """Hamming Loss""" 65 | 66 | pred = model.predict(X) 67 | scores = SCORING['hamming'](y, pred, None) 68 | 69 | return _normalize_score(scores, weights) 70 | 71 | 72 | def crossentropy(model, X, y, classes, weights=None, **largs): 73 | """Cross Entropy Loss""" 74 | 75 | pred = model.predict_proba(X).clip(1e-5, 1 - 1e-5) 76 | scores = SCORING['entropy'](y, pred, classes) 77 | 78 | return _normalize_score(scores, weights) -------------------------------------------------------------------------------- /lineartree/lineartree.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | from sklearn.base import ClassifierMixin, RegressorMixin 4 | from sklearn.utils.validation import check_is_fitted, _check_sample_weight 5 | 6 | from ._classes import _predict_branch 7 | from ._classes import _LinearTree, _LinearBoosting, _LinearForest 8 | 9 | 10 | class LinearTreeRegressor(_LinearTree, RegressorMixin): 11 | """A Linear Tree Regressor. 12 | 13 | A Linear Tree Regressor is a meta-estimator that combine the learning 14 | ability of Decision Tree and the predictive power of Linear Models. 15 | Like in tree-based algorithms, the received data are splitted according 16 | simple decision rules. The goodness of slits is evaluated in gain terms 17 | fitting linear models in each node. 
This implies that the models in the 18 | leaves are linear instead of constant approximations like in classical 19 | Decision Tree. 20 | 21 | Parameters 22 | ---------- 23 | base_estimator : object 24 | The base estimator to fit on dataset splits. 25 | The base estimator must be a sklearn.linear_model. 26 | 27 | criterion : {"mse", "rmse", "mae", "poisson"}, default="mse" 28 | The function to measure the quality of a split. "poisson" 29 | requires ``y >= 0``. 30 | 31 | max_depth : int, default=5 32 | The maximum depth of the tree considering only the splitting nodes. 33 | A higher value implies a higher training time. 34 | 35 | min_samples_split : int or float, default=6 36 | The minimum number of samples required to split an internal node. 37 | The minimum valid number of samples in each node is 6. 38 | A lower value implies a higher training time. 39 | - If int, then consider `min_samples_split` as the minimum number. 40 | - If float, then `min_samples_split` is a fraction and 41 | `ceil(min_samples_split * n_samples)` are the minimum 42 | number of samples for each split. 43 | 44 | min_samples_leaf : int or float, default=0.1 45 | The minimum number of samples required to be at a leaf node. 46 | A split point at any depth will only be considered if it leaves at 47 | least `min_samples_leaf` training samples in each of the left and 48 | right branches. 49 | The minimum valid number of samples in each leaf is 3. 50 | A lower value implies a higher training time. 51 | - If int, then consider `min_samples_leaf` as the minimum number. 52 | - If float, then `min_samples_leaf` is a fraction and 53 | `ceil(min_samples_leaf * n_samples)` are the minimum 54 | number of samples for each node. 55 | 56 | max_bins : int, default=25 57 | The maximum number of bins to use to search the optimal split in each 58 | feature. Features with a small number of unique values may use less than 59 | ``max_bins`` bins. Must be lower than 120 and larger than 10. 60 | A higher value implies a higher training time. 61 | 62 | min_impurity_decrease : float, default=0.0 63 | A node will be split if this split induces a decrease of the impurity 64 | greater than or equal to this value. 65 | 66 | categorical_features : int or array-like of int, default=None 67 | Indicates the categorical features. 68 | All categorical indices must be in `[0, n_features)`. 69 | Categorical features are used for splits but are not used in 70 | model fitting. 71 | More categorical features imply a higher training time. 72 | - None : no feature will be considered categorical. 73 | - integer array-like : integer indices indicating categorical 74 | features. 75 | - integer : integer index indicating a categorical 76 | feature. 77 | 78 | split_features : int or array-like of int, default=None 79 | Defines which features can be used to split on. 80 | All split feature indices must be in `[0, n_features)`. 81 | - None : All features will be used for splitting. 82 | - integer array-like : integer indices indicating splitting features. 83 | - integer : integer index indicating a single splitting feature. 84 | 85 | linear_features : int or array-like of int, default=None 86 | Defines which features are used for the linear model in the leaves. 87 | All linear feature indices must be in `[0, n_features)`. 88 | - None : All features except those in `categorical_features` 89 | will be used in the leaf models. 90 | - integer array-like : integer indices indicating features to 91 | be used in the leaf models. 
92 | - integer : integer index indicating a single feature to be used 93 | in the leaf models. 94 | 95 | n_jobs : int, default=None 96 | The number of jobs to run in parallel for model fitting. 97 | ``None`` means 1 using one processor. ``-1`` means using all 98 | processors. 99 | 100 | Attributes 101 | ---------- 102 | n_features_in_ : int 103 | The number of features when :meth:`fit` is performed. 104 | 105 | feature_importances_ : ndarray of shape (n_features, ) 106 | Normalized total reduction of criteria by splitting features. 107 | 108 | n_targets_ : int 109 | The number of targets when :meth:`fit` is performed. 110 | 111 | Examples 112 | -------- 113 | >>> from sklearn.linear_model import LinearRegression 114 | >>> from lineartree import LinearTreeRegressor 115 | >>> from sklearn.datasets import make_regression 116 | >>> X, y = make_regression(n_samples=100, n_features=4, 117 | ... n_informative=2, n_targets=1, 118 | ... random_state=0, shuffle=False) 119 | >>> regr = LinearTreeRegressor(base_estimator=LinearRegression()) 120 | >>> regr.fit(X, y) 121 | >>> regr.predict([[0, 0, 0, 0]]) 122 | array([8.8817842e-16]) 123 | """ 124 | def __init__(self, base_estimator, *, criterion='mse', max_depth=5, 125 | min_samples_split=6, min_samples_leaf=0.1, max_bins=25, 126 | min_impurity_decrease=0.0, categorical_features=None, 127 | split_features=None, linear_features=None, n_jobs=None): 128 | 129 | self.base_estimator = base_estimator 130 | self.criterion = criterion 131 | self.max_depth = max_depth 132 | self.min_samples_split = min_samples_split 133 | self.min_samples_leaf = min_samples_leaf 134 | self.max_bins = max_bins 135 | self.min_impurity_decrease = min_impurity_decrease 136 | self.categorical_features = categorical_features 137 | self.split_features = split_features 138 | self.linear_features = linear_features 139 | self.n_jobs = n_jobs 140 | 141 | def fit(self, X, y, sample_weight=None): 142 | """Build a Linear Tree of a linear estimator from the training 143 | set (X, y). 144 | 145 | Parameters 146 | ---------- 147 | X : array-like of shape (n_samples, n_features) 148 | The training input samples. 149 | 150 | y : array-like of shape (n_samples, ) or (n_samples, n_targets) 151 | Target values. 152 | 153 | sample_weight : array-like of shape (n_samples, ), default=None 154 | Sample weights. If None, then samples are equally weighted. 155 | Note that if the base estimator does not support sample weighting, 156 | the sample weights are still used to evaluate the splits. 157 | 158 | Returns 159 | ------- 160 | self : object 161 | """ 162 | reg_criterions = ('mse', 'rmse', 'mae', 'poisson') 163 | 164 | if self.criterion not in reg_criterions: 165 | raise ValueError("Regression tasks support only criterion in {}, " 166 | "got '{}'.".format(reg_criterions, self.criterion)) 167 | 168 | # Convert data (X is required to be 2d and indexable) 169 | X, y = self._validate_data( 170 | X, y, 171 | reset=True, 172 | accept_sparse=False, 173 | dtype='float32', 174 | force_all_finite=True, 175 | ensure_2d=True, 176 | allow_nd=False, 177 | multi_output=True, 178 | y_numeric=True, 179 | ) 180 | if sample_weight is not None: 181 | sample_weight = _check_sample_weight(sample_weight, X) 182 | 183 | y_shape = np.shape(y) 184 | self.n_targets_ = y_shape[1] if len(y_shape) > 1 else 1 185 | if self.n_targets_ < 2: 186 | y = y.ravel() 187 | self._fit(X, y, sample_weight) 188 | 189 | return self 190 | 191 | def predict(self, X): 192 | """Predict regression target for X. 
193 | 194 | Parameters 195 | ---------- 196 | X : array-like of shape (n_samples, n_features) 197 | Samples. 198 | 199 | Returns 200 | ------- 201 | pred : ndarray of shape (n_samples, ) or (n_samples, n_targets) for 202 | multitarget regression. 203 | The predicted values. 204 | """ 205 | check_is_fitted(self, attributes='_nodes') 206 | 207 | X = self._validate_data( 208 | X, 209 | reset=False, 210 | accept_sparse=False, 211 | dtype='float32', 212 | force_all_finite=True, 213 | ensure_2d=True, 214 | allow_nd=False, 215 | ensure_min_features=self.n_features_in_ 216 | ) 217 | 218 | if self.n_targets_ > 1: 219 | pred = np.zeros((X.shape[0], self.n_targets_)) 220 | else: 221 | pred = np.zeros(X.shape[0]) 222 | 223 | for L in self._leaves.values(): 224 | 225 | mask = _predict_branch(X, L.threshold) 226 | if (~mask).all(): 227 | continue 228 | 229 | pred[mask] = L.model.predict(X[np.ix_(mask, self._linear_features)]) 230 | 231 | return pred 232 | 233 | 234 | class LinearTreeClassifier(_LinearTree, ClassifierMixin): 235 | """A Linear Tree Classifier. 236 | 237 | A Linear Tree Classifier is a meta-estimator that combines the learning 238 | ability of Decision Trees and the predictive power of Linear Models. 239 | Like in tree-based algorithms, the received data are split according to 240 | simple decision rules. The goodness of splits is evaluated in terms of the 241 | gain obtained by fitting linear models in each node. This implies that the models in the 242 | leaves are linear instead of constant approximations like in classical 243 | Decision Trees. 244 | 245 | Parameters 246 | ---------- 247 | base_estimator : object 248 | The base estimator to fit on dataset splits. 249 | The base estimator must be a sklearn.linear_model. 250 | The selected base estimator is automatically substituted by a 251 | `~sklearn.dummy.DummyClassifier` when a dataset split 252 | contains a single class label. 253 | 254 | criterion : {"hamming", "crossentropy"}, default="hamming" 255 | The function to measure the quality of a split. `"crossentropy"` 256 | can be used only if `base_estimator` has a `predict_proba` method. 257 | 258 | max_depth : int, default=5 259 | The maximum depth of the tree considering only the splitting nodes. 260 | A higher value implies a higher training time. 261 | 262 | min_samples_split : int or float, default=6 263 | The minimum number of samples required to split an internal node. 264 | The minimum valid number of samples in each node is 6. 265 | A lower value implies a higher training time. 266 | - If int, then consider `min_samples_split` as the minimum number. 267 | - If float, then `min_samples_split` is a fraction and 268 | `ceil(min_samples_split * n_samples)` are the minimum 269 | number of samples for each split. 270 | 271 | min_samples_leaf : int or float, default=0.1 272 | The minimum number of samples required to be at a leaf node. 273 | A split point at any depth will only be considered if it leaves at 274 | least `min_samples_leaf` training samples in each of the left and 275 | right branches. 276 | The minimum valid number of samples in each leaf is 3. 277 | A lower value implies a higher training time. 278 | - If int, then consider `min_samples_leaf` as the minimum number. 279 | - If float, then `min_samples_leaf` is a fraction and 280 | `ceil(min_samples_leaf * n_samples)` are the minimum 281 | number of samples for each node. 282 | 283 | max_bins : int, default=25 284 | The maximum number of bins to use to search the optimal split in each 285 | feature.
Features with a small number of unique values may use less than 286 | ``max_bins`` bins. Must be lower than 120 and larger than 10. 287 | A higher value implies a higher training time. 288 | 289 | min_impurity_decrease : float, default=0.0 290 | A node will be split if this split induces a decrease of the impurity 291 | greater than or equal to this value. 292 | 293 | categorical_features : int or array-like of int, default=None 294 | Indicates the categorical features. 295 | All categorical indices must be in `[0, n_features)`. 296 | Categorical features are used for splits but are not used in 297 | model fitting. 298 | More categorical features imply a higher training time. 299 | - None : no feature will be considered categorical. 300 | - integer array-like : integer indices indicating categorical 301 | features. 302 | - integer : integer index indicating a categorical 303 | feature. 304 | 305 | split_features : int or array-like of int, default=None 306 | Defines which features can be used to split on. 307 | All split feature indices must be in `[0, n_features)`. 308 | - None : All features will be used for splitting. 309 | - integer array-like : integer indices indicating splitting features. 310 | - integer : integer index indicating a single splitting feature. 311 | 312 | linear_features : int or array-like of int, default=None 313 | Defines which features are used for the linear model in the leaves. 314 | All linear feature indices must be in `[0, n_features)`. 315 | - None : All features except those in `categorical_features` 316 | will be used in the leaf models. 317 | - integer array-like : integer indices indicating features to 318 | be used in the leaf models. 319 | - integer : integer index indicating a single feature to be used 320 | in the leaf models. 321 | 322 | n_jobs : int, default=None 323 | The number of jobs to run in parallel for model fitting. 324 | ``None`` means 1 using one processor. ``-1`` means using all 325 | processors. 326 | 327 | Attributes 328 | ---------- 329 | n_features_in_ : int 330 | The number of features when :meth:`fit` is performed. 331 | 332 | feature_importances_ : ndarray of shape (n_features, ) 333 | Normalized total reduction of criteria by splitting features. 334 | 335 | classes_ : ndarray of shape (n_classes, ) 336 | A list of class labels known to the classifier. 337 | 338 | Examples 339 | -------- 340 | >>> from sklearn.linear_model import RidgeClassifier 341 | >>> from lineartree import LinearTreeClassifier 342 | >>> from sklearn.datasets import make_classification 343 | >>> X, y = make_classification(n_samples=100, n_features=4, 344 | ... n_informative=2, n_redundant=0, 345 | ... 
random_state=0, shuffle=False) 346 | >>> clf = LinearTreeClassifier(base_estimator=RidgeClassifier()) 347 | >>> clf.fit(X, y) 348 | >>> clf.predict([[0, 0, 0, 0]]) 349 | array([1]) 350 | """ 351 | def __init__(self, base_estimator, *, criterion='hamming', max_depth=5, 352 | min_samples_split=6, min_samples_leaf=0.1, max_bins=25, 353 | min_impurity_decrease=0.0, categorical_features=None, 354 | split_features=None, linear_features=None, n_jobs=None): 355 | 356 | self.base_estimator = base_estimator 357 | self.criterion = criterion 358 | self.max_depth = max_depth 359 | self.min_samples_split = min_samples_split 360 | self.min_samples_leaf = min_samples_leaf 361 | self.max_bins = max_bins 362 | self.min_impurity_decrease = min_impurity_decrease 363 | self.categorical_features = categorical_features 364 | self.split_features = split_features 365 | self.linear_features = linear_features 366 | self.n_jobs = n_jobs 367 | 368 | def fit(self, X, y, sample_weight=None): 369 | """Build a Linear Tree of a linear estimator from the training 370 | set (X, y). 371 | 372 | Parameters 373 | ---------- 374 | X : array-like of shape (n_samples, n_features) 375 | The training input samples. 376 | 377 | y : array-like of shape (n_samples, ) 378 | Target values. 379 | 380 | sample_weight : array-like of shape (n_samples, ), default=None 381 | Sample weights. If None, then samples are equally weighted. 382 | Note that if the base estimator does not support sample weighting, 383 | the sample weights are still used to evaluate the splits. 384 | 385 | Returns 386 | ------- 387 | self : object 388 | """ 389 | clas_criterions = ('hamming', 'crossentropy') 390 | 391 | if self.criterion not in clas_criterions: 392 | raise ValueError("Classification tasks support only criterion in {}, " 393 | "got '{}'.".format(clas_criterions, self.criterion)) 394 | 395 | if (not hasattr(self.base_estimator, 'predict_proba') and 396 | self.criterion == 'crossentropy'): 397 | raise ValueError("The 'crossentropy' criterion requires a base_estimator " 398 | "with predict_proba method.") 399 | 400 | # Convert data (X is required to be 2d and indexable) 401 | X, y = self._validate_data( 402 | X, y, 403 | reset=True, 404 | accept_sparse=False, 405 | dtype='float32', 406 | force_all_finite=True, 407 | ensure_2d=True, 408 | allow_nd=False, 409 | multi_output=False, 410 | ) 411 | if sample_weight is not None: 412 | sample_weight = _check_sample_weight(sample_weight, X) 413 | 414 | self.classes_ = np.unique(y) 415 | self._fit(X, y, sample_weight) 416 | 417 | return self 418 | 419 | def predict(self, X): 420 | """Predict class for X. 421 | 422 | Parameters 423 | ---------- 424 | X : array-like of shape (n_samples, n_features) 425 | Samples. 426 | 427 | Returns 428 | ------- 429 | pred : ndarray of shape (n_samples, ) 430 | The predicted classes. 431 | """ 432 | check_is_fitted(self, attributes='_nodes') 433 | 434 | X = self._validate_data( 435 | X, 436 | reset=False, 437 | accept_sparse=False, 438 | dtype='float32', 439 | force_all_finite=True, 440 | ensure_2d=True, 441 | allow_nd=False, 442 | ensure_min_features=self.n_features_in_ 443 | ) 444 | 445 | pred = np.empty(X.shape[0], dtype=self.classes_.dtype) 446 | 447 | for L in self._leaves.values(): 448 | 449 | mask = _predict_branch(X, L.threshold) 450 | if (~mask).all(): 451 | continue 452 | 453 | pred[mask] = L.model.predict(X[np.ix_(mask, self._linear_features)]) 454 | 455 | return pred 456 | 457 | def predict_proba(self, X): 458 | """Predict class probabilities for X. 
459 | 460 | If base estimators do not implement a ``predict_proba`` method, 461 | then the one-hot encoding of the predicted class is returned. 462 | 463 | Parameters 464 | ---------- 465 | X : array-like of shape (n_samples, n_features) 466 | Samples. 467 | 468 | Returns 469 | ------- 470 | pred : ndarray of shape (n_samples, n_classes) 471 | The class probabilities of the input samples. The order of the 472 | classes corresponds to that in the attribute :term:`classes_`. 473 | """ 474 | check_is_fitted(self, attributes='_nodes') 475 | 476 | X = self._validate_data( 477 | X, 478 | reset=False, 479 | accept_sparse=False, 480 | dtype='float32', 481 | force_all_finite=True, 482 | ensure_2d=True, 483 | allow_nd=False, 484 | ensure_min_features=self.n_features_in_ 485 | ) 486 | 487 | pred = np.zeros((X.shape[0], len(self.classes_))) 488 | 489 | if hasattr(self.base_estimator, 'predict_proba'): 490 | for L in self._leaves.values(): 491 | 492 | mask = _predict_branch(X, L.threshold) 493 | if (~mask).all(): 494 | continue 495 | 496 | pred[np.ix_(mask, np.isin(self.classes_, L.classes))] = \ 497 | L.model.predict_proba(X[np.ix_(mask, self._linear_features)]) 498 | 499 | else: 500 | pred_class = self.predict(X) 501 | class_to_int = dict(map(reversed, enumerate(self.classes_))) 502 | pred_class = np.array([class_to_int[i] for i in pred_class]) 503 | pred[np.arange(X.shape[0]), pred_class] = 1 504 | 505 | return pred 506 | 507 | def predict_log_proba(self, X): 508 | """Predict class log-probabilities for X. 509 | 510 | If base estimators do not implement a ``predict_log_proba`` method, 511 | then the logarithm of the one-hot encoded predicted class is returned. 512 | 513 | Parameters 514 | ---------- 515 | X : array-like of shape (n_samples, n_features) 516 | Samples. 517 | 518 | Returns 519 | ------- 520 | pred : ndarray of shape (n_samples, n_classes) 521 | The class log-probabilities of the input samples. The order of the 522 | classes corresponds to that in the attribute :term:`classes_`. 523 | """ 524 | return np.log(self.predict_proba(X)) 525 | 526 | 527 | class LinearBoostRegressor(_LinearBoosting, RegressorMixin): 528 | """A Linear Boosting Regressor. 529 | 530 | A Linear Boosting Regressor is an iterative meta-estimator that starts 531 | with a linear regressor, and model the residuals through decision trees. 532 | At each iteration, the path leading to highest error (i.e. the worst leaf) 533 | is added as a new binary variable to the base model. This kind of Linear 534 | Boosting can be considered as an improvement over general linear models 535 | since it enables incorporating non-linear features by residuals modeling. 536 | 537 | Parameters 538 | ---------- 539 | base_estimator : object 540 | The base estimator iteratively fitted. 541 | The base estimator must be a sklearn.linear_model. 542 | 543 | loss : {"linear", "square", "absolute", "exponential"}, default="linear" 544 | The function used to calculate the residuals of each sample. 545 | 546 | n_estimators : int, default=10 547 | The number of boosting stages to perform. It corresponds to the number 548 | of the new features generated. 549 | 550 | max_depth : int, default=3 551 | The maximum depth of the tree. If None, then nodes are expanded until 552 | all leaves are pure or until all leaves contain less than 553 | min_samples_split samples. 
554 | 555 | min_samples_split : int or float, default=2 556 | The minimum number of samples required to split an internal node: 557 | 558 | - If int, then consider `min_samples_split` as the minimum number. 559 | - If float, then `min_samples_split` is a fraction and 560 | `ceil(min_samples_split * n_samples)` are the minimum 561 | number of samples for each split. 562 | 563 | min_samples_leaf : int or float, default=1 564 | The minimum number of samples required to be at a leaf node. 565 | A split point at any depth will only be considered if it leaves at 566 | least ``min_samples_leaf`` training samples in each of the left and 567 | right branches. This may have the effect of smoothing the model, 568 | especially in regression. 569 | 570 | - If int, then consider `min_samples_leaf` as the minimum number. 571 | - If float, then `min_samples_leaf` is a fraction and 572 | `ceil(min_samples_leaf * n_samples)` are the minimum 573 | number of samples for each node. 574 | 575 | min_weight_fraction_leaf : float, default=0.0 576 | The minimum weighted fraction of the sum total of weights (of all 577 | the input samples) required to be at a leaf node. Samples have 578 | equal weight when sample_weight is not provided. 579 | 580 | max_features : int, float or {"auto", "sqrt", "log2"}, default=None 581 | The number of features to consider when looking for the best split: 582 | 583 | - If int, then consider `max_features` features at each split. 584 | - If float, then `max_features` is a fraction and 585 | `int(max_features * n_features)` features are considered at each 586 | split. 587 | - If "auto", then `max_features=n_features`. 588 | - If "sqrt", then `max_features=sqrt(n_features)`. 589 | - If "log2", then `max_features=log2(n_features)`. 590 | - If None, then `max_features=n_features`. 591 | 592 | Note: the search for a split does not stop until at least one 593 | valid partition of the node samples is found, even if it requires to 594 | effectively inspect more than ``max_features`` features. 595 | 596 | random_state : int, RandomState instance or None, default=None 597 | Controls the randomness of the estimator. 598 | 599 | max_leaf_nodes : int, default=None 600 | Grow a tree with ``max_leaf_nodes`` in best-first fashion. 601 | Best nodes are defined as relative reduction in impurity. 602 | If None then unlimited number of leaf nodes. 603 | 604 | min_impurity_decrease : float, default=0.0 605 | A node will be split if this split induces a decrease of the impurity 606 | greater than or equal to this value. 607 | 608 | ccp_alpha : non-negative float, default=0.0 609 | Complexity parameter used for Minimal Cost-Complexity Pruning. The 610 | subtree with the largest cost complexity that is smaller than 611 | ``ccp_alpha`` will be chosen. By default, no pruning is performed. See 612 | :ref:`minimal_cost_complexity_pruning` for details. 613 | 614 | Attributes 615 | ---------- 616 | n_features_in_ : int 617 | The number of features when :meth:`fit` is performed. 618 | 619 | n_features_out_ : int 620 | The total number of features used to fit the base estimator in the 621 | last iteration. The number of output features is equal to the sum 622 | of n_features_in_ and n_estimators. 623 | 624 | coef_ : array of shape (n_features_out_, ) or (n_targets, n_features_out_) 625 | Estimated coefficients for the linear regression problem. 
626 | If multiple targets are passed during the fit (y 2D), this is a 627 | 2D array of shape (n_targets, n_features_out_), while if only one target 628 | is passed, this is a 1D array of length n_features_out_. 629 | 630 | intercept_ : float or array of shape (n_targets, ) 631 | Independent term in the linear model. Set to 0 if `fit_intercept = False` 632 | in `base_estimator` 633 | 634 | Examples 635 | -------- 636 | >>> from sklearn.linear_model import LinearRegression 637 | >>> from lineartree import LinearBoostRegressor 638 | >>> from sklearn.datasets import make_regression 639 | >>> X, y = make_regression(n_samples=100, n_features=4, 640 | ... n_informative=2, n_targets=1, 641 | ... random_state=0, shuffle=False) 642 | >>> regr = LinearBoostRegressor(base_estimator=LinearRegression()) 643 | >>> regr.fit(X, y) 644 | >>> regr.predict([[0, 0, 0, 0]]) 645 | array([8.8817842e-16]) 646 | 647 | References 648 | ---------- 649 | Explainable boosted linear regression for time series forecasting. 650 | Authors: Igor Ilic, Berk Gorgulu, Mucahit Cevik, Mustafa Gokce Baydogan. 651 | (https://arxiv.org/abs/2009.09110) 652 | """ 653 | def __init__(self, base_estimator, *, loss='linear', n_estimators=10, 654 | max_depth=3, min_samples_split=2, min_samples_leaf=1, 655 | min_weight_fraction_leaf=0.0, max_features=None, 656 | random_state=None, max_leaf_nodes=None, 657 | min_impurity_decrease=0.0, ccp_alpha=0.0): 658 | 659 | self.base_estimator = base_estimator 660 | self.loss = loss 661 | self.n_estimators = n_estimators 662 | self.max_depth = max_depth 663 | self.min_samples_split = min_samples_split 664 | self.min_samples_leaf = min_samples_leaf 665 | self.min_weight_fraction_leaf = min_weight_fraction_leaf 666 | self.max_features = max_features 667 | self.random_state = random_state 668 | self.max_leaf_nodes = max_leaf_nodes 669 | self.min_impurity_decrease = min_impurity_decrease 670 | self.ccp_alpha = ccp_alpha 671 | 672 | def fit(self, X, y, sample_weight=None): 673 | """Build a Linear Boosting from the training set (X, y). 674 | 675 | Parameters 676 | ---------- 677 | X : array-like of shape (n_samples, n_features) 678 | The training input samples. 679 | 680 | y : array-like of shape (n_samples, ) or (n_samples, n_targets) 681 | Target values. 682 | 683 | sample_weight : array-like of shape (n_samples, ), default=None 684 | Sample weights. 685 | 686 | Returns 687 | ------- 688 | self : object 689 | """ 690 | reg_losses = ('linear', 'square', 'absolute', 'exponential') 691 | 692 | if self.loss not in reg_losses: 693 | raise ValueError("Regression tasks support only loss in {}, " 694 | "got '{}'.".format(reg_losses, self.loss)) 695 | 696 | # Convert data (X is required to be 2d and indexable) 697 | X, y = self._validate_data( 698 | X, y, 699 | reset=True, 700 | accept_sparse=False, 701 | dtype='float32', 702 | force_all_finite=True, 703 | ensure_2d=True, 704 | allow_nd=False, 705 | multi_output=True, 706 | y_numeric=True, 707 | ) 708 | if sample_weight is not None: 709 | sample_weight = _check_sample_weight(sample_weight, X) 710 | 711 | y_shape = np.shape(y) 712 | n_targets = y_shape[1] if len(y_shape) > 1 else 1 713 | if n_targets < 2: 714 | y = y.ravel() 715 | self._fit(X, y, sample_weight) 716 | 717 | return self 718 | 719 | def predict(self, X): 720 | """Predict regression target for X. 721 | 722 | Parameters 723 | ---------- 724 | X : array-like of shape (n_samples, n_features) 725 | Samples. 
726 | 727 | Returns 728 | ------- 729 | pred : ndarray of shape (n_samples, ) or (n_samples, n_targets) for 730 | multitarget regression. 731 | The predicted values. 732 | """ 733 | check_is_fitted(self, attributes='base_estimator_') 734 | 735 | return self.base_estimator_.predict(self.transform(X)) 736 | 737 | 738 | class LinearBoostClassifier(_LinearBoosting, ClassifierMixin): 739 | """A Linear Boosting Classifier. 740 | 741 | A Linear Boosting Classifier is an iterative meta-estimator that starts 742 | with a linear classifier, and models the residuals through decision trees. 743 | At each iteration, the path leading to the highest error (i.e. the worst leaf) 744 | is added as a new binary variable to the base model. This kind of Linear 745 | Boosting can be considered an improvement over general linear models 746 | since it enables incorporating non-linear features through residual modeling. 747 | 748 | Parameters 749 | ---------- 750 | base_estimator : object 751 | The base estimator iteratively fitted. 752 | The base estimator must be a sklearn.linear_model. 753 | 754 | loss : {"hamming", "entropy"}, default="hamming" 755 | The function used to calculate the residuals of each sample. 756 | `"entropy"` can be used only if `base_estimator` has a `predict_proba` 757 | method. 758 | 759 | n_estimators : int, default=10 760 | The number of boosting stages to perform. It corresponds to the number 761 | of the new features generated. 762 | 763 | max_depth : int, default=3 764 | The maximum depth of the tree. If None, then nodes are expanded until 765 | all leaves are pure or until all leaves contain less than 766 | min_samples_split samples. 767 | 768 | min_samples_split : int or float, default=2 769 | The minimum number of samples required to split an internal node: 770 | 771 | - If int, then consider `min_samples_split` as the minimum number. 772 | - If float, then `min_samples_split` is a fraction and 773 | `ceil(min_samples_split * n_samples)` are the minimum 774 | number of samples for each split. 775 | 776 | min_samples_leaf : int or float, default=1 777 | The minimum number of samples required to be at a leaf node. 778 | A split point at any depth will only be considered if it leaves at 779 | least ``min_samples_leaf`` training samples in each of the left and 780 | right branches. This may have the effect of smoothing the model, 781 | especially in regression. 782 | 783 | - If int, then consider `min_samples_leaf` as the minimum number. 784 | - If float, then `min_samples_leaf` is a fraction and 785 | `ceil(min_samples_leaf * n_samples)` are the minimum 786 | number of samples for each node. 787 | 788 | min_weight_fraction_leaf : float, default=0.0 789 | The minimum weighted fraction of the sum total of weights (of all 790 | the input samples) required to be at a leaf node. Samples have 791 | equal weight when sample_weight is not provided. 792 | 793 | max_features : int, float or {"auto", "sqrt", "log2"}, default=None 794 | The number of features to consider when looking for the best split: 795 | 796 | - If int, then consider `max_features` features at each split. 797 | - If float, then `max_features` is a fraction and 798 | `int(max_features * n_features)` features are considered at each 799 | split. 800 | - If "auto", then `max_features=n_features`. 801 | - If "sqrt", then `max_features=sqrt(n_features)`. 802 | - If "log2", then `max_features=log2(n_features)`. 803 | - If None, then `max_features=n_features`.
804 | 805 | Note: the search for a split does not stop until at least one 806 | valid partition of the node samples is found, even if it requires to 807 | effectively inspect more than ``max_features`` features. 808 | 809 | random_state : int, RandomState instance or None, default=None 810 | Controls the randomness of the estimator. 811 | 812 | max_leaf_nodes : int, default=None 813 | Grow a tree with ``max_leaf_nodes`` in best-first fashion. 814 | Best nodes are defined as relative reduction in impurity. 815 | If None then unlimited number of leaf nodes. 816 | 817 | min_impurity_decrease : float, default=0.0 818 | A node will be split if this split induces a decrease of the impurity 819 | greater than or equal to this value. 820 | 821 | ccp_alpha : non-negative float, default=0.0 822 | Complexity parameter used for Minimal Cost-Complexity Pruning. The 823 | subtree with the largest cost complexity that is smaller than 824 | ``ccp_alpha`` will be chosen. By default, no pruning is performed. See 825 | :ref:`minimal_cost_complexity_pruning` for details. 826 | 827 | Attributes 828 | ---------- 829 | n_features_in_ : int 830 | The number of features when :meth:`fit` is performed. 831 | 832 | n_features_out_ : int 833 | The total number of features used to fit the base estimator in the 834 | last iteration. The number of output features is equal to the sum 835 | of n_features_in_ and n_estimators. 836 | 837 | coef_ : ndarray of shape (1, n_features_out_) or (n_classes, n_features_out_) 838 | Coefficient of the features in the decision function. 839 | 840 | intercept_ : float or array of shape (n_classes, ) 841 | Independent term in the linear model. Set to 0 if `fit_intercept = False` 842 | in `base_estimator` 843 | 844 | classes_ : ndarray of shape (n_classes, ) 845 | A list of class labels known to the classifier. 846 | 847 | Examples 848 | -------- 849 | >>> from sklearn.linear_model import RidgeClassifier 850 | >>> from lineartree import LinearBoostClassifier 851 | >>> from sklearn.datasets import make_classification 852 | >>> X, y = make_classification(n_samples=100, n_features=4, 853 | ... n_informative=2, n_redundant=0, 854 | ... random_state=0, shuffle=False) 855 | >>> clf = LinearBoostClassifier(base_estimator=RidgeClassifier()) 856 | >>> clf.fit(X, y) 857 | >>> clf.predict([[0, 0, 0, 0]]) 858 | array([1]) 859 | 860 | References 861 | ---------- 862 | Explainable boosted linear regression for time series forecasting. 863 | Authors: Igor Ilic, Berk Gorgulu, Mucahit Cevik, Mustafa Gokce Baydogan. 864 | (https://arxiv.org/abs/2009.09110) 865 | """ 866 | def __init__(self, base_estimator, *, loss='hamming', n_estimators=10, 867 | max_depth=3, min_samples_split=2, min_samples_leaf=1, 868 | min_weight_fraction_leaf=0.0, max_features=None, 869 | random_state=None, max_leaf_nodes=None, 870 | min_impurity_decrease=0.0, ccp_alpha=0.0): 871 | 872 | self.base_estimator = base_estimator 873 | self.loss = loss 874 | self.n_estimators = n_estimators 875 | self.max_depth = max_depth 876 | self.min_samples_split = min_samples_split 877 | self.min_samples_leaf = min_samples_leaf 878 | self.min_weight_fraction_leaf = min_weight_fraction_leaf 879 | self.max_features = max_features 880 | self.random_state = random_state 881 | self.max_leaf_nodes = max_leaf_nodes 882 | self.min_impurity_decrease = min_impurity_decrease 883 | self.ccp_alpha = ccp_alpha 884 | 885 | def fit(self, X, y, sample_weight=None): 886 | """Build a Linear Boosting from the training set (X, y). 
887 | 888 | Parameters 889 | ---------- 890 | X : array-like of shape (n_samples, n_features) 891 | The training input samples. 892 | 893 | y : array-like of shape (n_samples, ) 894 | Target values. 895 | 896 | sample_weight : array-like of shape (n_samples, ), default=None 897 | Sample weights. 898 | 899 | Returns 900 | ------- 901 | self : object 902 | """ 903 | clas_losses = ('hamming', 'entropy') 904 | 905 | if self.loss not in clas_losses: 906 | raise ValueError("Classification tasks support only loss in {}, " 907 | "got '{}'.".format(clas_losses, self.loss)) 908 | 909 | if (not hasattr(self.base_estimator, 'predict_proba') and 910 | self.loss == 'entropy'): 911 | raise ValueError("The 'entropy' loss requires a base_estimator " 912 | "with predict_proba method.") 913 | 914 | # Convert data (X is required to be 2d and indexable) 915 | X, y = self._validate_data( 916 | X, y, 917 | reset=True, 918 | accept_sparse=False, 919 | dtype='float32', 920 | force_all_finite=True, 921 | ensure_2d=True, 922 | allow_nd=False, 923 | multi_output=False, 924 | ) 925 | if sample_weight is not None: 926 | sample_weight = _check_sample_weight(sample_weight, X) 927 | 928 | self.classes_ = np.unique(y) 929 | self._fit(X, y, sample_weight) 930 | 931 | return self 932 | 933 | def predict(self, X): 934 | """Predict class for X. 935 | 936 | Parameters 937 | ---------- 938 | X : array-like of shape (n_samples, n_features) 939 | Samples. 940 | 941 | Returns 942 | ------- 943 | pred : ndarray of shape (n_samples, ) 944 | The predicted classes. 945 | """ 946 | check_is_fitted(self, attributes='base_estimator_') 947 | 948 | return self.base_estimator_.predict(self.transform(X)) 949 | 950 | def predict_proba(self, X): 951 | """Predict class probabilities for X. 952 | 953 | If base estimators do not implement a ``predict_proba`` method, 954 | then the one-hot encoding of the predicted class is returned. 955 | 956 | Parameters 957 | ---------- 958 | X : array-like of shape (n_samples, n_features) 959 | Samples. 960 | 961 | Returns 962 | ------- 963 | pred : ndarray of shape (n_samples, n_classes) 964 | The class probabilities of the input samples. The order of the 965 | classes corresponds to that in the attribute :term:`classes_`. 966 | """ 967 | if hasattr(self.base_estimator, 'predict_proba'): 968 | check_is_fitted(self, attributes='base_estimator_') 969 | pred = self.base_estimator_.predict_proba(self.transform(X)) 970 | 971 | else: 972 | pred_class = self.predict(X) 973 | pred = np.zeros((pred_class.shape[0], len(self.classes_))) 974 | class_to_int = dict(map(reversed, enumerate(self.classes_))) 975 | pred_class = np.array([class_to_int[v] for v in pred_class]) 976 | pred[np.arange(pred_class.shape[0]), pred_class] = 1 977 | 978 | return pred 979 | 980 | def predict_log_proba(self, X): 981 | """Predict class log-probabilities for X. 982 | 983 | If base estimators do not implement a ``predict_log_proba`` method, 984 | then the logarithm of the one-hot encoded predicted class is returned. 985 | 986 | Parameters 987 | ---------- 988 | X : array-like of shape (n_samples, n_features) 989 | Samples. 990 | 991 | Returns 992 | ------- 993 | pred : ndarray of shape (n_samples, n_classes) 994 | The class log-probabilities of the input samples. The order of the 995 | classes corresponds to that in the attribute :term:`classes_`. 996 | """ 997 | return np.log(self.predict_proba(X)) 998 | 999 | 1000 | class LinearForestRegressor(_LinearForest, RegressorMixin): 1001 | """"A Linear Forest Regressor. 
1002 | 1003 | Linear forests generalizes the well known random forests by combining 1004 | linear models with the same random forests. 1005 | The key idea of linear forests is to use the strength of linear models 1006 | to improve the nonparametric learning ability of tree-based algorithms. 1007 | Firstly, a linear model is fitted on the whole dataset, then a random 1008 | forest is trained on the same dataset but using the residuals of the 1009 | previous steps as target. The final predictions are the sum of the raw 1010 | linear predictions and the residuals modeled by the random forest. 1011 | 1012 | Parameters 1013 | ---------- 1014 | base_estimator : object 1015 | The linear estimator fitted on the raw target. 1016 | The linear estimator must be a regressor from sklearn.linear_model. 1017 | 1018 | n_estimators : int, default=100 1019 | The number of trees in the forest. 1020 | 1021 | max_depth : int, default=None 1022 | The maximum depth of the tree. If None, then nodes are expanded until 1023 | all leaves are pure or until all leaves contain less than 1024 | min_samples_split samples. 1025 | 1026 | min_samples_split : int or float, default=2 1027 | The minimum number of samples required to split an internal node: 1028 | 1029 | - If int, then consider `min_samples_split` as the minimum number. 1030 | - If float, then `min_samples_split` is a fraction and 1031 | `ceil(min_samples_split * n_samples)` are the minimum 1032 | number of samples for each split. 1033 | 1034 | min_samples_leaf : int or float, default=1 1035 | The minimum number of samples required to be at a leaf node. 1036 | A split point at any depth will only be considered if it leaves at 1037 | least ``min_samples_leaf`` training samples in each of the left and 1038 | right branches. This may have the effect of smoothing the model, 1039 | especially in regression. 1040 | 1041 | - If int, then consider `min_samples_leaf` as the minimum number. 1042 | - If float, then `min_samples_leaf` is a fraction and 1043 | `ceil(min_samples_leaf * n_samples)` are the minimum 1044 | number of samples for each node. 1045 | 1046 | min_weight_fraction_leaf : float, default=0.0 1047 | The minimum weighted fraction of the sum total of weights (of all 1048 | the input samples) required to be at a leaf node. Samples have 1049 | equal weight when sample_weight is not provided. 1050 | 1051 | max_features : {"auto", "sqrt", "log2"}, int or float, default="auto" 1052 | The number of features to consider when looking for the best split: 1053 | 1054 | - If int, then consider `max_features` features at each split. 1055 | - If float, then `max_features` is a fraction and 1056 | `round(max_features * n_features)` features are considered at each 1057 | split. 1058 | - If "auto", then `max_features=n_features`. 1059 | - If "sqrt", then `max_features=sqrt(n_features)`. 1060 | - If "log2", then `max_features=log2(n_features)`. 1061 | - If None, then `max_features=n_features`. 1062 | 1063 | Note: the search for a split does not stop until at least one 1064 | valid partition of the node samples is found, even if it requires to 1065 | effectively inspect more than ``max_features`` features. 1066 | 1067 | max_leaf_nodes : int, default=None 1068 | Grow trees with ``max_leaf_nodes`` in best-first fashion. 1069 | Best nodes are defined as relative reduction in impurity. 1070 | If None then unlimited number of leaf nodes. 
1071 | 1072 | min_impurity_decrease : float, default=0.0 1073 | A node will be split if this split induces a decrease of the impurity 1074 | greater than or equal to this value. 1075 | 1076 | bootstrap : bool, default=True 1077 | Whether bootstrap samples are used when building trees. If False, the 1078 | whole dataset is used to build each tree. 1079 | 1080 | oob_score : bool, default=False 1081 | Whether to use out-of-bag samples to estimate the generalization score. 1082 | Only available if bootstrap=True. 1083 | 1084 | n_jobs : int, default=None 1085 | The number of jobs to run in parallel. :meth:`fit`, :meth:`predict`, 1086 | :meth:`decision_path` and :meth:`apply` are all parallelized over the 1087 | trees. ``None`` means 1 unless in a :obj:`joblib.parallel_backend` 1088 | context. ``-1`` means using all processors. 1089 | 1090 | random_state : int, RandomState instance or None, default=None 1091 | Controls both the randomness of the bootstrapping of the samples used 1092 | when building trees (if ``bootstrap=True``) and the sampling of the 1093 | features to consider when looking for the best split at each node 1094 | (if ``max_features < n_features``). 1095 | 1096 | ccp_alpha : non-negative float, default=0.0 1097 | Complexity parameter used for Minimal Cost-Complexity Pruning. The 1098 | subtree with the largest cost complexity that is smaller than 1099 | ``ccp_alpha`` will be chosen. By default, no pruning is performed. See 1100 | :ref:`minimal_cost_complexity_pruning` for details. 1101 | 1102 | max_samples : int or float, default=None 1103 | If bootstrap is True, the number of samples to draw from X 1104 | to train each base estimator. 1105 | 1106 | - If None (default), then draw `X.shape[0]` samples. 1107 | - If int, then draw `max_samples` samples. 1108 | - If float, then draw `max_samples * X.shape[0]` samples. Thus, 1109 | `max_samples` should be in the interval `(0, 1]`. 1110 | 1111 | Attributes 1112 | ---------- 1113 | n_features_in_ : int 1114 | The number of features when :meth:`fit` is performed. 1115 | 1116 | feature_importances_ : ndarray of shape (n_features, ) 1117 | The impurity-based feature importances. 1118 | The higher, the more important the feature. 1119 | The importance of a feature is computed as the (normalized) 1120 | total reduction of the criterion brought by that feature. It is also 1121 | known as the Gini importance. 1122 | 1123 | coef_ : array of shape (n_features, ) or (n_targets, n_features) 1124 | Estimated coefficients for the linear regression problem. 1125 | If multiple targets are passed during the fit (y 2D), this is a 1126 | 2D array of shape (n_targets, n_features), while if only one target 1127 | is passed, this is a 1D array of length n_features. 1128 | 1129 | intercept_ : float or array of shape (n_targets,) 1130 | Independent term in the linear model. Set to 0 if `fit_intercept = False` 1131 | in `base_estimator`. 1132 | 1133 | base_estimator_ : object 1134 | A fitted linear model instance. 1135 | 1136 | forest_estimator_ : object 1137 | A fitted random forest instance. 1138 | 1139 | Examples 1140 | -------- 1141 | >>> from sklearn.linear_model import LinearRegression 1142 | >>> from lineartree import LinearForestRegressor 1143 | >>> from sklearn.datasets import make_regression 1144 | >>> X, y = make_regression(n_samples=100, n_features=4, 1145 | ... n_informative=2, n_targets=1, 1146 | ... 
random_state=0, shuffle=False) 1147 | >>> regr = LinearForestRegressor(base_estimator=LinearRegression()) 1148 | >>> regr.fit(X, y) 1149 | >>> regr.predict([[0, 0, 0, 0]]) 1150 | array([8.8817842e-16]) 1151 | 1152 | References 1153 | ---------- 1154 | Regression-Enhanced Random Forests. 1155 | Authors: Haozhe Zhang, Dan Nettleton, Zhengyuan Zhu. 1156 | (https://arxiv.org/abs/1904.10416) 1157 | """ 1158 | def __init__(self, base_estimator, *, n_estimators=100, 1159 | max_depth=None, min_samples_split=2, min_samples_leaf=1, 1160 | min_weight_fraction_leaf=0., max_features="auto", 1161 | max_leaf_nodes=None, min_impurity_decrease=0., 1162 | bootstrap=True, oob_score=False, n_jobs=None, 1163 | random_state=None, ccp_alpha=0.0, max_samples=None): 1164 | 1165 | self.base_estimator = base_estimator 1166 | self.n_estimators = n_estimators 1167 | self.max_depth = max_depth 1168 | self.min_samples_split = min_samples_split 1169 | self.min_samples_leaf = min_samples_leaf 1170 | self.min_weight_fraction_leaf = min_weight_fraction_leaf 1171 | self.max_features = max_features 1172 | self.max_leaf_nodes = max_leaf_nodes 1173 | self.min_impurity_decrease = min_impurity_decrease 1174 | self.bootstrap = bootstrap 1175 | self.oob_score = oob_score 1176 | self.n_jobs = n_jobs 1177 | self.random_state = random_state 1178 | self.ccp_alpha = ccp_alpha 1179 | self.max_samples = max_samples 1180 | 1181 | def fit(self, X, y, sample_weight=None): 1182 | """Build a Linear Forest from the training set (X, y). 1183 | 1184 | Parameters 1185 | ---------- 1186 | X : array-like of shape (n_samples, n_features) 1187 | The training input samples. 1188 | 1189 | y : array-like of shape (n_samples, ) or (n_samples, n_targets) 1190 | Target values. 1191 | 1192 | sample_weight : array-like of shape (n_samples, ), default=None 1193 | Sample weights. 1194 | 1195 | Returns 1196 | ------- 1197 | self : object 1198 | """ 1199 | # Convert data (X is required to be 2d and indexable) 1200 | X, y = self._validate_data( 1201 | X, y, 1202 | reset=True, 1203 | accept_sparse=True, 1204 | dtype='float32', 1205 | force_all_finite=True, 1206 | ensure_2d=True, 1207 | allow_nd=False, 1208 | multi_output=True, 1209 | y_numeric=True, 1210 | ) 1211 | if sample_weight is not None: 1212 | sample_weight = _check_sample_weight(sample_weight, X) 1213 | 1214 | y_shape = np.shape(y) 1215 | n_targets = y_shape[1] if len(y_shape) > 1 else 1 1216 | if n_targets < 2: 1217 | y = y.ravel() 1218 | self._fit(X, y, sample_weight) 1219 | 1220 | return self 1221 | 1222 | def predict(self, X): 1223 | """Predict regression target for X. 1224 | 1225 | Parameters 1226 | ---------- 1227 | X : array-like of shape (n_samples, n_features) 1228 | Samples. 1229 | 1230 | Returns 1231 | ------- 1232 | pred : ndarray of shape (n_samples, ) or also (n_samples, n_targets) if 1233 | multitarget regression. 1234 | The predicted values. 1235 | """ 1236 | check_is_fitted(self, attributes='base_estimator_') 1237 | 1238 | X = self._validate_data( 1239 | X, 1240 | reset=False, 1241 | accept_sparse=True, 1242 | dtype='float32', 1243 | force_all_finite=True, 1244 | ensure_2d=True, 1245 | allow_nd=False, 1246 | ensure_min_features=self.n_features_in_ 1247 | ) 1248 | 1249 | linear_pred = self.base_estimator_.predict(X) 1250 | forest_pred = self.forest_estimator_.predict(X) 1251 | 1252 | return linear_pred + forest_pred 1253 | 1254 | 1255 | class LinearForestClassifier(_LinearForest, ClassifierMixin): 1256 | """"A Linear Forest Classifier. 
1257 | 1258 | Linear forests generalize the well known random forests by combining 1259 | linear models with the same random forests. 1260 | The key idea of linear forests is to use the strength of linear models 1261 | to improve the nonparametric learning ability of tree-based algorithms. 1262 | Firstly, a linear model is fitted on the whole dataset, then a random 1263 | forest is trained on the same dataset but using the residuals of the 1264 | previous steps as target. The final predictions are the sum of the raw 1265 | linear predictions and the residuals modeled by the random forest. 1266 | 1267 | For classification tasks the same approach used in the regression context 1268 | is adopted. The binary targets are transformed into logits using the 1269 | inverse sigmoid function. A linear regression is fitted. A random forest 1270 | regressor is trained to approximate the residuals between the logits and 1271 | the linear predictions. Finally the sigmoid of the combined predictions is 1272 | taken to obtain probabilities. 1273 | The multi-label scenario is carried out using OneVsRestClassifier. 1274 | 1275 | Parameters 1276 | ---------- 1277 | base_estimator : object 1278 | The linear estimator fitted on the raw target. 1279 | The linear estimator must be a regressor from sklearn.linear_model. 1280 | 1281 | n_estimators : int, default=100 1282 | The number of trees in the forest. 1283 | 1284 | max_depth : int, default=None 1285 | The maximum depth of the tree. If None, then nodes are expanded until 1286 | all leaves are pure or until all leaves contain less than 1287 | min_samples_split samples. 1288 | 1289 | min_samples_split : int or float, default=2 1290 | The minimum number of samples required to split an internal node: 1291 | 1292 | - If int, then consider `min_samples_split` as the minimum number. 1293 | - If float, then `min_samples_split` is a fraction and 1294 | `ceil(min_samples_split * n_samples)` are the minimum 1295 | number of samples for each split. 1296 | 1297 | min_samples_leaf : int or float, default=1 1298 | The minimum number of samples required to be at a leaf node. 1299 | A split point at any depth will only be considered if it leaves at 1300 | least ``min_samples_leaf`` training samples in each of the left and 1301 | right branches. This may have the effect of smoothing the model, 1302 | especially in regression. 1303 | 1304 | - If int, then consider `min_samples_leaf` as the minimum number. 1305 | - If float, then `min_samples_leaf` is a fraction and 1306 | `ceil(min_samples_leaf * n_samples)` are the minimum 1307 | number of samples for each node. 1308 | 1309 | min_weight_fraction_leaf : float, default=0.0 1310 | The minimum weighted fraction of the sum total of weights (of all 1311 | the input samples) required to be at a leaf node. Samples have 1312 | equal weight when sample_weight is not provided. 1313 | 1314 | max_features : {"auto", "sqrt", "log2"}, int or float, default="auto" 1315 | The number of features to consider when looking for the best split: 1316 | 1317 | - If int, then consider `max_features` features at each split. 1318 | - If float, then `max_features` is a fraction and 1319 | `round(max_features * n_features)` features are considered at each 1320 | split. 1321 | - If "auto", then `max_features=n_features`. 1322 | - If "sqrt", then `max_features=sqrt(n_features)`. 1323 | - If "log2", then `max_features=log2(n_features)`. 1324 | - If None, then `max_features=n_features`.
1325 | 1326 | Note: the search for a split does not stop until at least one 1327 | valid partition of the node samples is found, even if it requires to 1328 | effectively inspect more than ``max_features`` features. 1329 | 1330 | max_leaf_nodes : int, default=None 1331 | Grow trees with ``max_leaf_nodes`` in best-first fashion. 1332 | Best nodes are defined as relative reduction in impurity. 1333 | If None then unlimited number of leaf nodes. 1334 | 1335 | min_impurity_decrease : float, default=0.0 1336 | A node will be split if this split induces a decrease of the impurity 1337 | greater than or equal to this value. 1338 | 1339 | bootstrap : bool, default=True 1340 | Whether bootstrap samples are used when building trees. If False, the 1341 | whole dataset is used to build each tree. 1342 | 1343 | oob_score : bool, default=False 1344 | Whether to use out-of-bag samples to estimate the generalization score. 1345 | Only available if bootstrap=True. 1346 | 1347 | n_jobs : int, default=None 1348 | The number of jobs to run in parallel. :meth:`fit`, :meth:`predict`, 1349 | :meth:`decision_path` and :meth:`apply` are all parallelized over the 1350 | trees. ``None`` means 1 unless in a :obj:`joblib.parallel_backend` 1351 | context. ``-1`` means using all processors. 1352 | 1353 | random_state : int, RandomState instance or None, default=None 1354 | Controls both the randomness of the bootstrapping of the samples used 1355 | when building trees (if ``bootstrap=True``) and the sampling of the 1356 | features to consider when looking for the best split at each node 1357 | (if ``max_features < n_features``). 1358 | 1359 | ccp_alpha : non-negative float, default=0.0 1360 | Complexity parameter used for Minimal Cost-Complexity Pruning. The 1361 | subtree with the largest cost complexity that is smaller than 1362 | ``ccp_alpha`` will be chosen. By default, no pruning is performed. See 1363 | :ref:`minimal_cost_complexity_pruning` for details. 1364 | 1365 | max_samples : int or float, default=None 1366 | If bootstrap is True, the number of samples to draw from X 1367 | to train each base estimator. 1368 | 1369 | - If None (default), then draw `X.shape[0]` samples. 1370 | - If int, then draw `max_samples` samples. 1371 | - If float, then draw `max_samples * X.shape[0]` samples. Thus, 1372 | `max_samples` should be in the interval `(0, 1]`. 1373 | 1374 | Attributes 1375 | ---------- 1376 | n_features_in_ : int 1377 | The number of features when :meth:`fit` is performed. 1378 | 1379 | feature_importances_ : ndarray of shape (n_features, ) 1380 | The impurity-based feature importances. 1381 | The higher, the more important the feature. 1382 | The importance of a feature is computed as the (normalized) 1383 | total reduction of the criterion brought by that feature. It is also 1384 | known as the Gini importance. 1385 | 1386 | coef_ : ndarray of shape (1, n_features_out_) 1387 | Coefficient of the features in the decision function. 1388 | 1389 | intercept_ : float 1390 | Independent term in the linear model. Set to 0 if `fit_intercept = False` 1391 | in `base_estimator`. 1392 | 1393 | classes_ : ndarray of shape (n_classes, ) 1394 | A list of class labels known to the classifier. 1395 | 1396 | base_estimator_ : object 1397 | A fitted linear model instance. 1398 | 1399 | forest_estimator_ : object 1400 | A fitted random forest instance. 
1401 | 1402 | Examples 1403 | -------- 1404 | >>> from sklearn.linear_model import LinearRegression 1405 | >>> from lineartree import LinearForestClassifier 1406 | >>> from sklearn.datasets import make_classification 1407 | >>> X, y = make_classification(n_samples=100, n_classes=2, n_features=4, 1408 | ... n_informative=2, n_redundant=0, 1409 | ... random_state=0, shuffle=False) 1410 | >>> clf = LinearForestClassifier(base_estimator=LinearRegression()) 1411 | >>> clf.fit(X, y) 1412 | >>> clf.predict([[0, 0, 0, 0]]) 1413 | array([1]) 1414 | 1415 | References 1416 | ---------- 1417 | Regression-Enhanced Random Forests. 1418 | Authors: Haozhe Zhang, Dan Nettleton, Zhengyuan Zhu. 1419 | (https://arxiv.org/abs/1904.10416) 1420 | """ 1421 | def __init__(self, base_estimator, *, n_estimators=100, 1422 | max_depth=None, min_samples_split=2, min_samples_leaf=1, 1423 | min_weight_fraction_leaf=0., max_features="auto", 1424 | max_leaf_nodes=None, min_impurity_decrease=0., 1425 | bootstrap=True, oob_score=False, n_jobs=None, 1426 | random_state=None, ccp_alpha=0.0, max_samples=None): 1427 | 1428 | self.base_estimator = base_estimator 1429 | self.n_estimators = n_estimators 1430 | self.max_depth = max_depth 1431 | self.min_samples_split = min_samples_split 1432 | self.min_samples_leaf = min_samples_leaf 1433 | self.min_weight_fraction_leaf = min_weight_fraction_leaf 1434 | self.max_features = max_features 1435 | self.max_leaf_nodes = max_leaf_nodes 1436 | self.min_impurity_decrease = min_impurity_decrease 1437 | self.bootstrap = bootstrap 1438 | self.oob_score = oob_score 1439 | self.n_jobs = n_jobs 1440 | self.random_state = random_state 1441 | self.ccp_alpha = ccp_alpha 1442 | self.max_samples = max_samples 1443 | 1444 | def fit(self, X, y, sample_weight=None): 1445 | """Build a Linear Forest from the training set (X, y). 1446 | 1447 | Parameters 1448 | ---------- 1449 | X : array-like of shape (n_samples, n_features) 1450 | The training input samples. 1451 | 1452 | y : array-like of shape (n_samples, ) or (n_samples, n_targets) 1453 | Target values. 1454 | 1455 | sample_weight : array-like of shape (n_samples, ), default=None 1456 | Sample weights. 1457 | 1458 | Returns 1459 | ------- 1460 | self : object 1461 | """ 1462 | # Convert data (X is required to be 2d and indexable) 1463 | X, y = self._validate_data( 1464 | X, y, 1465 | reset=True, 1466 | accept_sparse=True, 1467 | dtype='float32', 1468 | force_all_finite=True, 1469 | ensure_2d=True, 1470 | allow_nd=False, 1471 | multi_output=False, 1472 | ) 1473 | if sample_weight is not None: 1474 | sample_weight = _check_sample_weight(sample_weight, X) 1475 | 1476 | self.classes_ = np.unique(y) 1477 | if len(self.classes_) > 2: 1478 | raise ValueError( 1479 | "LinearForestClassifier supports only binary classification task. " 1480 | "To solve a multi-lable classification task use " 1481 | "LinearForestClassifier with OneVsRestClassifier from sklearn.") 1482 | 1483 | self._fit(X, y, sample_weight) 1484 | 1485 | return self 1486 | 1487 | def decision_function(self, X): 1488 | """Predict confidence scores for samples. 1489 | 1490 | The confidence score for a sample is proportional to the signed 1491 | distance of that sample to the hyperplane. 1492 | 1493 | Parameters 1494 | ---------- 1495 | X : array-like of shape (n_samples, n_features) 1496 | Samples. 1497 | 1498 | Returns 1499 | ------- 1500 | pred : ndarray of shape (n_samples, ) 1501 | Confidence scores. 
1502 | Confidence score for self.classes_[1] where >0 means this 1503 | class would be predicted. 1504 | """ 1505 | check_is_fitted(self, attributes='base_estimator_') 1506 | 1507 | X = self._validate_data( 1508 | X, 1509 | reset=False, 1510 | accept_sparse=True, 1511 | dtype='float32', 1512 | force_all_finite=True, 1513 | ensure_2d=True, 1514 | allow_nd=False, 1515 | ensure_min_features=self.n_features_in_ 1516 | ) 1517 | 1518 | linear_pred = self.base_estimator_.predict(X) 1519 | forest_pred = self.forest_estimator_.predict(X) 1520 | 1521 | return linear_pred + forest_pred 1522 | 1523 | def predict(self, X): 1524 | """Predict class for X. 1525 | 1526 | Parameters 1527 | ---------- 1528 | X : array-like of shape (n_samples, n_features) 1529 | Samples. 1530 | 1531 | Returns 1532 | ------- 1533 | pred : ndarray of shape (n_samples, ) 1534 | The predicted classes. 1535 | """ 1536 | pred = self.decision_function(X) 1537 | pred_class = (self._sigmoid(pred) > 0.5).astype(int) 1538 | int_to_class = dict(enumerate(self.classes_)) 1539 | pred_class = np.array([int_to_class[i] for i in pred_class]) 1540 | 1541 | return pred_class 1542 | 1543 | def predict_proba(self, X): 1544 | """Predict class probabilities for X. 1545 | 1546 | Parameters 1547 | ---------- 1548 | X : array-like of shape (n_samples, n_features) 1549 | Samples. 1550 | 1551 | Returns 1552 | ------- 1553 | proba : ndarray of shape (n_samples, n_classes) 1554 | The class probabilities of the input samples. The order of the 1555 | classes corresponds to that in the attribute :term:`classes_`. 1556 | """ 1557 | 1558 | pred = self._sigmoid(self.decision_function(X)) 1559 | proba = np.zeros((X.shape[0], 2)) 1560 | proba[:, 0] = 1 - pred 1561 | proba[:, 1] = pred 1562 | 1563 | return proba 1564 | 1565 | def predict_log_proba(self, X): 1566 | """Predict class log-probabilities for X. 1567 | 1568 | Parameters 1569 | ---------- 1570 | X : array-like of shape (n_samples, n_features) 1571 | Samples. 1572 | 1573 | Returns 1574 | ------- 1575 | pred : ndarray of shape (n_samples, n_classes) 1576 | The class log-probabilities of the input samples. The order of the 1577 | classes corresponds to that in the attribute :term:`classes_`. 1578 | """ 1579 | return np.log(self.predict_proba(X)) 1580 | -------------------------------------------------------------------------------- /notebooks/README.md: -------------------------------------------------------------------------------- 1 | # API Reference 2 | 3 | ## LinearTreeRegressor 4 | ``` 5 | class lineartree.LinearTreeRegressor(base_estimator, *, criterion = 'mse', max_depth = 5, min_samples_split = 6, min_samples_leaf = 0.1, max_bins = 25, categorical_features = None, split_features = None, linear_features = None, n_jobs = None) 6 | ``` 7 | 8 | #### Parameters: 9 | 10 | - ```base_estimator : object``` 11 | 12 | The base estimator to fit on dataset splits. 13 | The base estimator must be a sklearn.linear_model. 14 | 15 | - ```criterion : {"mse", "rmse", "mae", "poisson"}, default="mse"``` 16 | 17 | The function to measure the quality of a split. `"poisson"` requires `y >= 0`. 18 | 19 | - ```max_depth : int, default=5``` 20 | 21 | The maximum depth of the tree considering only the splitting nodes. 22 | A higher value implies a higher training time. 23 | 24 | - ```min_samples_split : int or float, default=6``` 25 | 26 | The minimum number of samples required to split an internal node. 27 | The minimum valid number of samples in each node is 6. 28 | A lower value implies a higher training time. 
29 | - If int, then consider `min_samples_split` as the minimum number. 30 | - If float, then `min_samples_split` is a fraction and `ceil(min_samples_split * n_samples)` are the minimum number of samples for each split. 31 | 32 | - ```min_samples_leaf : int or float, default=0.1``` 33 | 34 | The minimum number of samples required to be at a leaf node. 35 | A split point at any depth will only be considered if it leaves at least `min_samples_leaf` training samples in each of the left and right branches. 36 | The minimum valid number of samples in each leaf is 3. 37 | A lower value implies a higher training time. 38 | - If int, then consider `min_samples_leaf` as the minimum number. 39 | - If float, then `min_samples_leaf` is a fraction and 40 | `ceil(min_samples_leaf * n_samples)` are the minimum number of samples for each node. 41 | 42 | - ```max_bins : int, default=25``` 43 | 44 | The maximum number of bins to use to search the optimal split in each feature. Features with a small number of unique values may use less than ``max_bins`` bins. Must be lower than 120 and larger than 10. 45 | A higher value implies a higher training time. 46 | 47 | - ```min_impurity_decrease : float, default=0.0``` 48 | 49 | A node will be split if this split induces a decrease of the impurity greater than or equal to this value. 50 | 51 | - ```categorical_features : int or array-like of int, default=None``` 52 | 53 | Indicates the categorical features. 54 | All categorical indices must be in `[0, n_features)`. 55 | Categorical features are used for splits but are not used in model fitting. 56 | More categorical features imply a higher training time. 57 | - None : no feature will be considered categorical. 58 | - integer array-like : integer indices indicating categorical features. 59 | - integer : integer index indicating a categorical feature. 60 | 61 | - ```split_features : int or array-like of int, default=None``` 62 | 63 | Defines which features can be used to split on. 64 | All split feature indices must be in `[0, n_features)`. 65 | - None : All features will be used for splitting. 66 | - integer array-like : integer indices indicating splitting features. 67 | - integer : integer index indicating a single splitting feature. 68 | 69 | - ```linear_features : int or array-like of int, default=None``` 70 | 71 | Defines which features are used for the linear model in the leaves. 72 | All linear feature indices must be in `[0, n_features)`. 73 | - None : All features except those in `categorical_features` will be used in the leaf models. 74 | - integer array-like : integer indices indicating features to be used in the leaf models. 75 | - integer : integer index indicating a single feature to be used in the leaf models. 76 | 77 | - ```n_jobs : int, default=None``` 78 | 79 | The number of jobs to run in parallel for model fitting. ``None`` means 1 using one processor. ``-1`` means using all processors. 80 | 81 | #### Attributes: 82 | 83 | - ```n_features_in_ : int``` 84 | 85 | The number of features when :meth:`fit` is performed. 86 | 87 | - ```feature_importances_ : ndarray of shape (n_features, )``` 88 | 89 | Normalized total reduction of criteria by splitting features. 90 | 91 | - ```n_targets_ : int``` 92 | 93 | The number of targets when :meth:`fit` is performed. 94 | 95 | #### Methods: 96 | 97 | - ```fit(X, y, sample_weight=None)``` 98 | 99 | Build a Linear Tree of a linear estimator from the training set (X, y). 
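For orientation, here is a minimal end-to-end sketch of the regressor (synthetic data; the dataset sizes and `max_depth=3` are arbitrary choices for illustration, and any regressor from `sklearn.linear_model` should work as `base_estimator`):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from lineartree import LinearTreeRegressor

# Synthetic data, for illustration only
X, y = make_regression(n_samples=500, n_features=4, n_informative=2, random_state=0)

# A decision tree whose leaves hold linear models instead of constant values
regr = LinearTreeRegressor(base_estimator=LinearRegression(), max_depth=3)
regr.fit(X, y)

print(regr.predict(X[:5]))             # predictions from the leaf models
print(regr.apply(X[:5]))               # leaf index reached by each sample
print(regr.summary(only_leaves=True))  # per-leaf loss, sample count and fitted model
```

The parameters accepted by `fit` are detailed below.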
100 | 101 | **Parameters:** 102 | 103 | - `X` : array-like of shape (n_samples, n_features) 104 | The training input samples. 105 | - `y` : array-like of shape (n_samples, ) or (n_samples, n_targets) 106 | Target values. 107 | - `sample_weight` : array-like of shape (n_samples, ), default=None 108 | Sample weights. If None, then samples are equally weighted. Note that if the base estimator does not support sample weighting, the sample weights are still used to evaluate the splits. 109 | 110 | **Returns:** 111 | 112 | - `self` : object 113 | 114 | - ```predict(X)``` 115 | 116 | Predict regression target for X. 117 | 118 | **Parameters:** 119 | 120 | - `X` : array-like of shape (n_samples, n_features) 121 | 122 | Samples. 123 | 124 | **Returns:** 125 | 126 | - `pred` : ndarray of shape (n_samples, ) or also (n_samples, n_targets) if multitarget regression 127 | 128 | The predicted values. 129 | 130 | - ```apply(X)``` 131 | 132 | Return the index of the leaf that each sample is predicted as. 133 | 134 | **Parameters:** 135 | 136 | - `X` : array-like of shape (n_samples, n_features) 137 | 138 | Samples. 139 | 140 | **Returns:** 141 | 142 | - `X_leaves` : array-like of shape (n_samples, ) 143 | 144 | For each datapoint x in X, return the index of the leaf x ends up in. Leaves are numbered within ``[0; n_nodes)``, possibly with gaps in the numbering. 145 | 146 | - ```decision_path(X)``` 147 | 148 | Return the decision path in the tree. 149 | 150 | **Parameters:** 151 | 152 | - `X` : array-like of shape (n_samples, n_features) 153 | 154 | Samples. 155 | 156 | **Returns:** 157 | 158 | - `indicator` : sparse matrix of shape (n_samples, n_nodes) 159 | 160 | Return a node indicator CSR matrix where non zero elements indicates that the samples goes through the nodes. 161 | 162 | - ```summary(feature_names=None, only_leaves=False, max_depth=None)``` 163 | 164 | Return a summary of nodes created from model fitting. 165 | 166 | **Parameters:** 167 | 168 | - `feature_names` : array-like of shape (n_features, ), default=None 169 | 170 | Names of each of the features. If None, generic names will be used (“X[0]”, “X[1]”, …). 171 | 172 | - `only_leaves` : bool, default=False 173 | 174 | Store only information of leaf nodes. 175 | 176 | - `max_depth` : int, default=None 177 | 178 | The maximum depth of the representation. If None, the tree is fully generated. 179 | 180 | **Returns:** 181 | 182 | - `summary` : nested dict 183 | 184 | The keys are the integer map of each node. 185 | The values are dicts containing information for that node: 186 | - 'col' (^): column used for splitting; 187 | - 'th' (^): threshold value used for splitting in the selected column; 188 | - 'loss': loss computed at node level. Weighted sum of children' losses if it is a splitting node; 189 | - 'samples': number of samples in the node. Sum of children' samples if it is a split node; 190 | - 'children' (^): integer mapping of possible children nodes; 191 | - 'models': fitted linear models built in each split. Single model if it is leaf node; 192 | - 'classes' (^^): target classes detected in the split. Available only for LinearTreeClassifier. 193 | 194 | (^): Only for split nodes. 195 | (^^): Only for leaf nodes. 196 | 197 | - ```model_to_dot(feature_names=None, max_depth=None)``` 198 | 199 | Convert a fitted Linear Tree model to dot format. 200 | It results in ModuleNotFoundError if graphviz or pydot are not available. 201 | When installing graphviz make sure to add it to the system path. 
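Assuming `regr` is a fitted `LinearTreeRegressor` (as in the sketch under `fit` above), a hedged export example might look like this; it requires the optional `pydot` and `graphviz` dependencies, and the feature names and output file name are arbitrary:

```python
# model_to_dot returns a pydot.Dot instance describing the fitted tree
graph = regr.model_to_dot(feature_names=["x0", "x1", "x2", "x3"], max_depth=2)
graph.write_png("linear_tree.png")  # standard pydot export to an image file

# Alternatively, render the tree in-line in a Jupyter notebook
regr.plot_model(feature_names=["x0", "x1", "x2", "x3"], max_depth=2)
```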
202 | 203 | **Parameters:** 204 | 205 | - `feature_names` : array-like of shape (n_features, ), default=None 206 | 207 | Names of each of the features. If None, generic names will be used (“X[0]”, “X[1]”, …). 208 | 209 | - `max_depth` : int, default=None 210 | 211 | The maximum depth of the representation. If None, the tree is fully generated. 212 | 213 | **Returns:** 214 | 215 | - `graph` : pydot.Dot instance 216 | 217 | Return an instance representing the Linear Tree. Splitting nodes have a rectangular shape while leaf nodes have a circular one. 218 | 219 | - ```plot_model(feature_names=None, max_depth=None)``` 220 | 221 | Convert a fitted Linear Tree model to dot format and display it. 222 | It results in ModuleNotFoundError if graphviz or pydot are not available. 223 | When installing graphviz make sure to add it to the system path. 224 | 225 | **Parameters:** 226 | 227 | - `feature_names` : array-like of shape (n_features, ), default=None 228 | 229 | Names of each of the features. If None, generic names will be used (“X[0]”, “X[1]”, …). 230 | 231 | - `max_depth` : int, default=None 232 | 233 | The maximum depth of the representation. If None, the tree is fully generated. 234 | 235 | **Returns:** 236 | 237 | - A Jupyter notebook Image object if Jupyter is installed. 238 | 239 | This enables in-line display of the model plots in notebooks. Splitting nodes have a rectangular shape while leaf nodes have a circular one. 240 | 241 | 242 | ## LinearTreeClassifier 243 | ``` 244 | class lineartree.LinearTreeClassifier(base_estimator, *, criterion = 'hamming', max_depth = 5, min_samples_split = 6, min_samples_leaf = 0.1, max_bins = 25, categorical_features = None, split_features = None, linear_features = None, n_jobs = None) 245 | ``` 246 | 247 | #### Parameters: 248 | 249 | - ```base_estimator : object``` 250 | 251 | The base estimator to fit on dataset splits. 252 | The base estimator must be a sklearn.linear_model. 253 | The selected base estimator is automatically substituted by a `~sklearn.dummy.DummyClassifier` when a dataset split is composed of unique labels. 254 | 255 | - ```criterion : {"hamming", "crossentropy"}, default="hamming"``` 256 | 257 | The function to measure the quality of a split. `"crossentropy"` can be used only if `base_estimator` has `predict_proba` method 258 | 259 | - ```max_depth : int, default=5``` 260 | 261 | The maximum depth of the tree considering only the splitting nodes. 262 | A higher value implies a higher training time. 263 | 264 | - ```min_samples_split : int or float, default=6``` 265 | 266 | The minimum number of samples required to split an internal node. 267 | The minimum valid number of samples in each node is 6. 268 | A lower value implies a higher training time. 269 | - If int, then consider `min_samples_split` as the minimum number. 270 | - If float, then `min_samples_split` is a fraction and `ceil(min_samples_split * n_samples)` are the minimum number of samples for each split. 271 | 272 | - ```min_samples_leaf : int or float, default=0.1``` 273 | 274 | The minimum number of samples required to be at a leaf node. 275 | A split point at any depth will only be considered if it leaves at least `min_samples_leaf` training samples in each of the left and right branches. 276 | The minimum valid number of samples in each leaf is 3. 277 | A lower value implies a higher training time. 278 | - If int, then consider `min_samples_leaf` as the minimum number. 
279 | - If float, then `min_samples_leaf` is a fraction and 280 | `ceil(min_samples_leaf * n_samples)` are the minimum number of samples for each node. 281 | 282 | - ```max_bins : int, default=25``` 283 | 284 | The maximum number of bins to use to search the optimal split in each feature. Features with a small number of unique values may use less than ``max_bins`` bins. Must be lower than 120 and larger than 10. 285 | A higher value implies a higher training time. 286 | 287 | - ```min_impurity_decrease : float, default=0.0``` 288 | 289 | A node will be split if this split induces a decrease of the impurity greater than or equal to this value. 290 | 291 | - ```categorical_features : int or array-like of int, default=None``` 292 | 293 | Indicates the categorical features. 294 | All categorical indices must be in `[0, n_features)`. 295 | Categorical features are used for splits but are not used in model fitting. 296 | More categorical features imply a higher training time. 297 | - None : no feature will be considered categorical. 298 | - integer array-like : integer indices indicating categorical features. 299 | - integer : integer index indicating a categorical feature. 300 | 301 | - ```split_features : int or array-like of int, default=None``` 302 | 303 | Defines which features can be used to split on. 304 | All split feature indices must be in `[0, n_features)`. 305 | - None : All features will be used for splitting. 306 | - integer array-like : integer indices indicating splitting features. 307 | - integer : integer index indicating a single splitting feature. 308 | 309 | - ```linear_features : int or array-like of int, default=None``` 310 | 311 | Defines which features are used for the linear model in the leaves. 312 | All linear feature indices must be in `[0, n_features)`. 313 | - None : All features except those in `categorical_features` will be used in the leaf models. 314 | - integer array-like : integer indices indicating features to be used in the leaf models. 315 | - integer : integer index indicating a single feature to be used in the leaf models. 316 | 317 | - ```n_jobs : int, default=None``` 318 | 319 | The number of jobs to run in parallel for model fitting. ``None`` means 1 using one processor. ``-1`` means using all processors. 320 | 321 | #### Attributes: 322 | 323 | - ```n_features_in_ : int``` 324 | 325 | The number of features when :meth:`fit` is performed. 326 | 327 | - ```feature_importances_ : ndarray of shape (n_features, )``` 328 | 329 | Normalized total reduction of criteria by splitting features. 330 | 331 | - ```classes_ : ndarray of shape (n_classes, )``` 332 | 333 | A list of class labels known to the classifier. 334 | 335 | #### Methods: 336 | 337 | - ```fit(X, y, sample_weight=None)``` 338 | 339 | Build a Linear Tree of a linear estimator from the training set (X, y). 340 | 341 | **Parameters:** 342 | 343 | - `X` : array-like of shape (n_samples, n_features) 344 | 345 | The training input samples. 346 | 347 | - `y` : array-like of shape (n_samples, ) or (n_samples, n_targets) 348 | 349 | Target values. 350 | 351 | - `sample_weight` : array-like of shape (n_samples, ), default=None 352 | 353 | Sample weights. If None, then samples are equally weighted. Note that if the base estimator does not support sample weighting, the sample weights are still used to evaluate the splits. 354 | 355 | **Returns:** 356 | 357 | - `self` : object 358 | 359 | - ```predict(X)``` 360 | 361 | Predict class for X. 
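A minimal classification sketch (synthetic binary data; `RidgeClassifier` and `max_depth=3` are arbitrary example choices):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import RidgeClassifier
from lineartree import LinearTreeClassifier

# Synthetic binary problem, for illustration only
X, y = make_classification(n_samples=500, n_features=6, n_informative=3, random_state=0)

clf = LinearTreeClassifier(base_estimator=RidgeClassifier(), max_depth=3)
clf.fit(X, y)

print(clf.predict(X[:5]))        # predicted class labels
print(clf.predict_proba(X[:5]))  # one-hot fallback here, since RidgeClassifier has no predict_proba
```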
362 | 363 | **Parameters:** 364 | 365 | - `X` : array-like of shape (n_samples, n_features) 366 | 367 | Samples. 368 | 369 | **Returns:** 370 | 371 | - `pred` : ndarray of shape (n_samples, ) 372 | 373 | The predicted classes. 374 | 375 | - ```predict_proba(X)``` 376 | 377 | Predict class probabilities for X. 378 | If base estimators do not implement a ``predict_proba`` method, then the one-hot encoding of the predicted class is returned 379 | 380 | **Parameters:** 381 | 382 | - `X` : array-like of shape (n_samples, n_features) 383 | 384 | Samples. 385 | 386 | **Returns:** 387 | 388 | - `pred` : ndarray of shape (n_samples, n_classes) 389 | 390 | The class probabilities of the input samples. The order of the classes corresponds to that in the attribute :term:`classes_`. 391 | 392 | - ```predict_log_proba(X)``` 393 | 394 | Predict class log-probabilities for X. 395 | If base estimators do not implement a ``predict_log_proba`` method, then the logarithm of the one-hot encoded predicted class is returned. 396 | 397 | **Parameters:** 398 | 399 | - `X` : array-like of shape (n_samples, n_features) 400 | 401 | Samples. 402 | 403 | **Returns:** 404 | 405 | - `pred` : ndarray of shape (n_samples, n_classes) 406 | 407 | The class log-probabilities of the input samples. The order of the classes corresponds to that in the attribute :term:`classes_`. 408 | 409 | - ```apply(X)``` 410 | 411 | Return the index of the leaf that each sample is predicted as. 412 | 413 | **Parameters:** 414 | 415 | - `X` : array-like of shape (n_samples, n_features) 416 | 417 | Samples. 418 | 419 | **Returns:** 420 | 421 | - `X_leaves` : array-like of shape (n_samples, ) 422 | 423 | For each datapoint x in X, return the index of the leaf x ends up in. Leaves are numbered within ``[0; n_nodes)``, possibly with gaps in the numbering. 424 | 425 | - ```decision_path(X)``` 426 | 427 | Return the decision path in the tree. 428 | 429 | **Parameters:** 430 | 431 | - `X` : array-like of shape (n_samples, n_features) 432 | 433 | Samples. 434 | 435 | **Returns:** 436 | 437 | - `indicator` : sparse matrix of shape (n_samples, n_nodes) 438 | 439 | Return a node indicator CSR matrix where non zero elements indicates that the samples goes through the nodes. 440 | 441 | - ```summary(feature_names=None, only_leaves=False, max_depth=None)``` 442 | 443 | Return a summary of nodes created from model fitting. 444 | 445 | **Parameters:** 446 | 447 | - `feature_names` : array-like of shape (n_features, ), default=None 448 | 449 | Names of each of the features. If None, generic names will be used (“X[0]”, “X[1]”, …). 450 | 451 | - `only_leaves` : bool, default=False 452 | 453 | Store only information of leaf nodes. 454 | 455 | - `max_depth` : int, default=None 456 | 457 | The maximum depth of the representation. If None, the tree is fully generated. 458 | 459 | **Returns:** 460 | 461 | - `summary` : nested dict 462 | 463 | The keys are the integer map of each node. 464 | The values are dicts containing information for that node: 465 | - 'col' (^): column used for splitting; 466 | - 'th' (^): threshold value used for splitting in the selected column; 467 | - 'loss': loss computed at node level. Weighted sum of children' losses if it is a splitting node; 468 | - 'samples': number of samples in the node. Sum of children' samples if it is a split node; 469 | - 'children' (^): integer mapping of possible children nodes; 470 | - 'models': fitted linear models built in each split. 
Single model if it is leaf node; 471 | - 'classes' (^^): target classes detected in the split. Available only for LinearTreeClassifier. 472 | 473 | (^): Only for split nodes. 474 | (^^): Only for leaf nodes. 475 | 476 | - ```model_to_dot(feature_names=None, max_depth=None)``` 477 | 478 | Convert a fitted Linear Tree model to dot format. 479 | It results in ModuleNotFoundError if graphviz or pydot are not available. 480 | When installing graphviz make sure to add it to the system path. 481 | 482 | **Parameters:** 483 | 484 | - `feature_names` : array-like of shape (n_features, ), default=None 485 | 486 | Names of each of the features. If None, generic names will be used (“X[0]”, “X[1]”, …). 487 | 488 | - `max_depth` : int, default=None 489 | 490 | The maximum depth of the representation. If None, the tree is fully generated. 491 | 492 | **Returns:** 493 | 494 | - `graph` : pydot.Dot instance 495 | 496 | Return an instance representing the Linear Tree. Splitting nodes have a rectangular shape while leaf nodes have a circular one. 497 | 498 | - ```plot_model(feature_names=None, max_depth=None)``` 499 | 500 | Convert a fitted Linear Tree model to dot format and display it. 501 | It results in ModuleNotFoundError if graphviz or pydot are not available. 502 | When installing graphviz make sure to add it to the system path. 503 | 504 | **Parameters:** 505 | 506 | - `feature_names` : array-like of shape (n_features, ), default=None 507 | 508 | Names of each of the features. If None, generic names will be used (“X[0]”, “X[1]”, …). 509 | 510 | - `max_depth` : int, default=None 511 | 512 | The maximum depth of the representation. If None, the tree is fully generated. 513 | 514 | **Returns:** 515 | 516 | - A Jupyter notebook Image object if Jupyter is installed. 517 | 518 | This enables in-line display of the model plots in notebooks. Splitting nodes have a rectangular shape while leaf nodes have a circular one. 519 | 520 | 521 | ## LinearBoostRegressor 522 | ``` 523 | class lineartree.LinearBoostRegressor(base_estimator, *, loss = 'linear', n_estimators = 10, max_depth = 3, min_samples_split = 2, min_samples_leaf = 1, min_weight_fraction_leaf = 0.0, max_features = None, random_state = None, max_leaf_nodes = None, min_impurity_decrease = 0.0, ccp_alpha = 0.0) 524 | ``` 525 | 526 | #### Parameters: 527 | 528 | - ```base_estimator : object``` 529 | 530 | The base estimator iteratively fitted. 531 | The base estimator must be a sklearn.linear_model. 532 | 533 | - ```loss : {"linear", "square", "absolute", "exponential"}, default="linear"``` 534 | 535 | The function used to calculate the residuals of each sample. 536 | 537 | - ```n_estimators : int, default=10``` 538 | 539 | The number of boosting stages to perform. It corresponds to the number of the new features generated. 540 | 541 | - ```max_depth : int, default=3``` 542 | 543 | The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain less than min_samples_split samples. 544 | 545 | - ```min_samples_split : int or float, default=2``` 546 | 547 | The minimum number of samples required to split an internal node: 548 | - If int, then consider `min_samples_split` as the minimum number. 549 | - If float, then `min_samples_split` is a fraction and `ceil(min_samples_split * n_samples)` are the minimum number of samples for each split. 550 | 551 | - ```min_samples_leaf : int or float, default=1``` 552 | 553 | The minimum number of samples required to be at a leaf node. 
554 | A split point at any depth will only be considered if it leaves at least ``min_samples_leaf`` training samples in each of the left and right branches. This may have the effect of smoothing the model, especially in regression. 555 | - If int, then consider `min_samples_leaf` as the minimum number. 556 | - If float, then `min_samples_leaf` is a fraction and `ceil(min_samples_leaf * n_samples)` are the minimum number of samples for each node. 557 | 558 | - ```min_weight_fraction_leaf : float, default=0.0``` 559 | 560 | The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Samples have equal weight when sample_weight is not provided. 561 | 562 | - ```max_features : int, float or {"auto", "sqrt", "log2"}, default=None``` 563 | 564 | The number of features to consider when looking for the best split: 565 | - If int, then consider `max_features` features at each split. 566 | - If float, then `max_features` is a fraction and `int(max_features * n_features)` features are considered at each split. 567 | - If "auto", then `max_features=n_features`. 568 | - If "sqrt", then `max_features=sqrt(n_features)`. 569 | - If "log2", then `max_features=log2(n_features)`. 570 | - If None, then `max_features=n_features`. 571 | Note: the search for a split does not stop until at least one valid partition of the node samples is found, even if it requires to effectively inspect more than ``max_features`` features. 572 | 573 | - ```max_leaf_nodes : int, default=None``` 574 | 575 | Grow a tree with ``max_leaf_nodes`` in best-first fashion. 576 | Best nodes are defined as relative reduction in impurity. 577 | If None then unlimited number of leaf nodes. 578 | 579 | - ```min_impurity_decrease : float, default=0.0``` 580 | 581 | A node will be split if this split induces a decrease of the impurity greater than or equal to this value. 582 | 583 | - ```ccp_alpha : non-negative float, default=0.0``` 584 | 585 | Complexity parameter used for Minimal Cost-Complexity Pruning. The subtree with the largest cost complexity that is smaller than ``ccp_alpha`` will be chosen. By default, no pruning is performed. See :ref:`minimal_cost_complexity_pruning` for details. 586 | 587 | #### Attributes: 588 | 589 | - ```n_features_in_ : int``` 590 | 591 | The number of features when :meth:`fit` is performed. 592 | 593 | - ```n_features_out_ : int``` 594 | 595 | The total number of features used to fit the base estimator in the last iteration. The number of output features is equal to the sum of n_features_in_ and n_estimators. 596 | 597 | - ```coef_ : array of shape (n_features_out_, ) or (n_targets, n_features_out_)``` 598 | 599 | Estimated coefficients for the linear regression problem. 600 | If multiple targets are passed during the fit (y 2D), this is a 2D array of shape (n_targets, n_features_out_), while if only one target is passed, this is a 1D array of length n_features. 601 | 602 | - ```intercept_ : float or array of shape (n_targets, )``` 603 | 604 | Independent term in the linear model. Set to 0 if `fit_intercept = False` in `base_estimator` 605 | 606 | #### Methods: 607 | 608 | - ```fit(X, y, sample_weight=None)``` 609 | 610 | Build a Linear Boosting from the training set (X, y). 611 | 612 | **Parameters:** 613 | 614 | - `X` : array-like of shape (n_samples, n_features) 615 | 616 | The training input samples. 617 | 618 | - `y` : array-like of shape (n_samples, ) or (n_samples, n_targets) 619 | 620 | Target values. 
621 | 622 | - `sample_weight` : array-like of shape (n_samples, ), default=None 623 | 624 | Sample weights. 625 | 626 | **Returns:** 627 | 628 | - `self` : object 629 | 630 | - ```predict(X)``` 631 | 632 | Predict regression target for X. 633 | 634 | **Parameters:** 635 | 636 | - `X` : array-like of shape (n_samples, n_features) 637 | 638 | Samples. 639 | 640 | **Returns:** 641 | 642 | - `pred` : ndarray of shape (n_samples, ) or also (n_samples, n_targets) if multitarget regression. 643 | 644 | The predicted values. 645 | 646 | - ```transform(X)``` 647 | 648 | Transform dataset. 649 | 650 | **Parameters:** 651 | 652 | - `X` : array-like of shape (n_samples, n_features) 653 | 654 | Input data to be transformed. Use ``dtype=np.float32`` for maximum efficiency. 655 | 656 | **Returns:** 657 | 658 | - `X_transformed` : ndarray of shape (n_samples, n_out). 659 | 660 | Transformed dataset. 661 | `n_out` is equal to `n_features` + `n_estimators`. 662 | 663 | ## LinearBoostClassifier 664 | ``` 665 | class lineartree.LinearBoostClassifier(base_estimator, loss = 'hamming', n_estimators = 10, max_depth = 3, min_samples_split = 2, min_samples_leaf = 1, min_weight_fraction_leaf = 0.0, max_features = None, random_state = None, max_leaf_nodes = None, min_impurity_decrease = 0.0, ccp_alpha = 0.0) 666 | ``` 667 | 668 | #### Parameters: 669 | 670 | - ```base_estimator : object``` 671 | 672 | The base estimator iteratively fitted. 673 | The base estimator must be a sklearn.linear_model. 674 | 675 | - ```loss : {"hamming", "entropy"}, default="hamming"``` 676 | 677 | The function used to calculate the residuals of each sample. 678 | 679 | - ```n_estimators : int, default=10``` 680 | 681 | The number of boosting stages to perform. It corresponds to the number of the new features generated. 682 | 683 | - ```max_depth : int, default=3``` 684 | 685 | The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain less than min_samples_split samples. 686 | 687 | - ```min_samples_split : int or float, default=2``` 688 | 689 | The minimum number of samples required to split an internal node: 690 | - If int, then consider `min_samples_split` as the minimum number. 691 | - If float, then `min_samples_split` is a fraction and `ceil(min_samples_split * n_samples)` are the minimum number of samples for each split. 692 | 693 | - ```min_samples_leaf : int or float, default=1``` 694 | 695 | The minimum number of samples required to be at a leaf node. 696 | A split point at any depth will only be considered if it leaves at least ``min_samples_leaf`` training samples in each of the left and right branches. This may have the effect of smoothing the model, especially in regression. 697 | - If int, then consider `min_samples_leaf` as the minimum number. 698 | - If float, then `min_samples_leaf` is a fraction and `ceil(min_samples_leaf * n_samples)` are the minimum number of samples for each node. 699 | 700 | - ```min_weight_fraction_leaf : float, default=0.0``` 701 | 702 | The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Samples have equal weight when sample_weight is not provided. 703 | 704 | - ```max_features : int, float or {"auto", "sqrt", "log2"}, default=None``` 705 | 706 | The number of features to consider when looking for the best split: 707 | - If int, then consider `max_features` features at each split. 
708 | - If float, then `max_features` is a fraction and `int(max_features * n_features)` features are considered at each split. 709 | - If "auto", then `max_features=n_features`. 710 | - If "sqrt", then `max_features=sqrt(n_features)`. 711 | - If "log2", then `max_features=log2(n_features)`. 712 | - If None, then `max_features=n_features`. 713 | 714 | Note: the search for a split does not stop until at least one valid partition of the node samples is found, even if it requires to effectively inspect more than ``max_features`` features. 715 | 716 | - ```max_leaf_nodes : int, default=None``` 717 | 718 | Grow a tree with ``max_leaf_nodes`` in best-first fashion. 719 | Best nodes are defined as relative reduction in impurity. 720 | If None then unlimited number of leaf nodes. 721 | 722 | - ```min_impurity_decrease : float, default=0.0``` 723 | 724 | A node will be split if this split induces a decrease of the impurity greater than or equal to this value. 725 | 726 | - ```ccp_alpha : non-negative float, default=0.0``` 727 | 728 | Complexity parameter used for Minimal Cost-Complexity Pruning. The subtree with the largest cost complexity that is smaller than ``ccp_alpha`` will be chosen. By default, no pruning is performed. See :ref:`minimal_cost_complexity_pruning` for details. 729 | 730 | #### Attributes: 731 | 732 | - ```n_features_in_ : int``` 733 | 734 | The number of features when :meth:`fit` is performed. 735 | 736 | - ```n_features_out_ : int``` 737 | 738 | The total number of features used to fit the base estimator in the last iteration. The number of output features is equal to the sum of n_features_in_ and n_estimators. 739 | 740 | - ```coef_ : array of shape (n_features_out_, ) or (n_targets, n_features_out_)``` 741 | 742 | Estimated coefficients for the linear regression problem. 743 | If multiple targets are passed during the fit (y 2D), this is a 2D array of shape (n_targets, n_features_out_), while if only one target is passed, this is a 1D array of length n_features_out_. 744 | 745 | - ```intercept_ : float or array of shape (n_targets, )``` 746 | 747 | Independent term in the linear model. Set to 0 if `fit_intercept = False` in `base_estimator` 748 | 749 | - ```classes_ : ndarray of shape (n_classes, )``` 750 | 751 | A list of class labels known to the classifier. 752 | 753 | #### Methods: 754 | 755 | - ```fit(X, y, sample_weight=None)``` 756 | 757 | Build a Linear Boosting from the training set (X, y). 758 | 759 | **Parameters:** 760 | 761 | - `X` : array-like of shape (n_samples, n_features) 762 | 763 | The training input samples. 764 | 765 | - `y` : array-like of shape (n_samples, ) or (n_samples, n_targets) 766 | 767 | Target values. 768 | 769 | - `sample_weight` : array-like of shape (n_samples, ), default=None 770 | 771 | Sample weights. 772 | 773 | **Returns:** 774 | 775 | - `self` : object 776 | 777 | - ```predict(X)``` 778 | 779 | Predict class for X. 780 | 781 | **Parameters:** 782 | 783 | - `X` : array-like of shape (n_samples, n_features) 784 | 785 | Samples. 786 | 787 | **Returns:** 788 | 789 | - `pred` : ndarray of shape (n_samples, ) or also (n_samples, n_targets) if multitarget regression. 790 | 791 | The predicted classes. 792 | 793 | - ```transform(X)``` 794 | 795 | Transform dataset. 796 | 797 | **Parameters:** 798 | 799 | - `X` : array-like of shape (n_samples, n_features) 800 | 801 | Input data to be transformed. Use ``dtype=np.float32`` for maximum efficiency. 
802 | 803 | **Returns:** 804 | 805 | - `X_transformed` : ndarray of shape (n_samples, n_out) 806 | 807 | Transformed dataset. 808 | `n_out` is equal to `n_features` + `n_estimators`. 809 | 810 | ## LinearForestRegressor 811 | ``` 812 | class lineartree.LinearForestRegressor(base_estimator, *, n_estimators=100, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0., max_features="auto", max_leaf_nodes=None, min_impurity_decrease=0., bootstrap=True, oob_score=False, n_jobs=None, random_state=None, ccp_alpha=0.0, max_samples=None) 813 | ``` 814 | 815 | #### Parameters: 816 | 817 | - ```base_estimator : object``` 818 | 819 | The linear estimator fitted on the raw target. 820 | The linear estimator must be a regressor from sklearn.linear_model. 821 | 822 | - ```n_estimators : int, default=100``` 823 | 824 | The number of trees in the forest. 825 | 826 | - ```max_depth : int, default=None``` 827 | 828 | The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain less than min_samples_split samples. 829 | 830 | - ```min_samples_split : int or float, default=2``` 831 | 832 | The minimum number of samples required to split an internal node: 833 | - If int, then consider `min_samples_split` as the minimum number. 834 | - If float, then `min_samples_split` is a fraction and `ceil(min_samples_split * n_samples)` are the minimum number of samples for each split. 835 | 836 | - ```min_samples_leaf : int or float, default=1``` 837 | 838 | The minimum number of samples required to be at a leaf node. 839 | A split point at any depth will only be considered if it leaves at least ``min_samples_leaf`` training samples in each of the left and right branches. This may have the effect of smoothing the model, especially in regression. 840 | - If int, then consider `min_samples_leaf` as the minimum number. 841 | - If float, then `min_samples_leaf` is a fraction and `ceil(min_samples_leaf * n_samples)` are the minimum number of samples for each node. 842 | 843 | - ```min_weight_fraction_leaf : float, default=0.0``` 844 | 845 | The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Samples have equal weight when sample_weight is not provided. 846 | 847 | - ```max_features : {"auto", "sqrt", "log2"}, int or float, default="auto"``` 848 | 849 | The number of features to consider when looking for the best split: 850 | - If int, then consider `max_features` features at each split. 851 | - If float, then `max_features` is a fraction and `round(max_features * n_features)` features are considered at each split. 852 | - If "auto", then `max_features=n_features`. 853 | - If "sqrt", then `max_features=sqrt(n_features)`. 854 | - If "log2", then `max_features=log2(n_features)`. 855 | - If None, then `max_features=n_features` 856 | 857 | Note: the search for a split does not stop until at least one valid partition of the node samples is found, even if it requires to effectively inspect more than ``max_features`` features. 858 | 859 | - ```max_leaf_nodes : int, default=None``` 860 | 861 | Grow trees with ``max_leaf_nodes`` in best-first fashion. Best nodes are defined as relative reduction in impurity. If None then unlimited number of leaf nodes. 862 | 863 | - ```min_impurity_decrease : float, default=0.0``` 864 | 865 | A node will be split if this split induces a decrease of the impurity greater than or equal to this value. 
866 | 867 | - ```bootstrap : bool, default=True``` 868 | 869 | Whether bootstrap samples are used when building trees. If False, the whole dataset is used to build each tree. 870 | 871 | - ```oob_score : bool, default=False``` 872 | 873 | Whether to use out-of-bag samples to estimate the generalization score. Only available if bootstrap=True. 874 | 875 | - ```n_jobs : int, default=None``` 876 | 877 | The number of jobs to run in parallel. :meth:`fit`, :meth:`predict`, :meth:`decision_path` and :meth:`apply` are all parallelized over the trees. ``None`` means 1 unless in a :obj:`joblib.parallel_backend` context. ``-1`` means using all processors. 878 | 879 | - ```random_state : int, RandomState instance or None, default=None``` 880 | 881 | Controls both the randomness of the bootstrapping of the samples used when building trees (if ``bootstrap=True``) and the sampling of the features to consider when looking for the best split at each node (if ``max_features < n_features``). 882 | 883 | - ```ccp_alpha : non-negative float, default=0.0``` 884 | 885 | Complexity parameter used for Minimal Cost-Complexity Pruning. The subtree with the largest cost complexity that is smaller than ``ccp_alpha`` will be chosen. By default, no pruning is performed. See :ref:`minimal_cost_complexity_pruning` for details. 886 | 887 | - ```max_samples : int or float, default=None``` 888 | 889 | If bootstrap is True, the number of samples to draw from X to train each base estimator. 890 | - If None (default), then draw `X.shape[0]` samples. 891 | - If int, then draw `max_samples` samples. 892 | - If float, then draw `max_samples * X.shape[0]` samples. Thus, 893 | `max_samples` should be in the interval `(0, 1]`. 894 | 895 | #### Attributes: 896 | 897 | - ```n_features_in_ : int``` 898 | 899 | The number of features when :meth:`fit` is performed. 900 | 901 | - ```feature_importances_ : ndarray of shape (n_features, )``` 902 | 903 | The impurity-based feature importances. 904 | The higher, the more important the feature. 905 | The importance of a feature is computed as the (normalized) total reduction of the criterion brought by that feature. It is also known as the Gini importance. 906 | 907 | - ```coef_ : array of shape (n_features, ) or (n_targets, n_features)``` 908 | 909 | Estimated coefficients for the linear regression problem. If multiple targets are passed during the fit (y 2D), this is a 2D array of shape (n_targets, n_features), while if only one target is passed, this is a 1D array of length n_features. 910 | 911 | - ```intercept_ : float or array of shape (n_targets,)``` 912 | 913 | Independent term in the linear model. Set to 0 if `fit_intercept = False` in `base_estimator`. 914 | 915 | - ```base_estimator_ : object``` 916 | 917 | A fitted linear model instance. 918 | 919 | - ```forest_estimator_ : object``` 920 | 921 | A fitted random forest instance. 922 | 923 | #### Methods: 924 | 925 | - ```fit(X, y, sample_weight=None)``` 926 | 927 | Build a Linear Forest from the training set (X, y). 928 | 929 | **Parameters:** 930 | 931 | - `X` : array-like of shape (n_samples, n_features) 932 | 933 | The training input samples. 934 | 935 | - `y` : array-like of shape (n_samples, ) or (n_samples, n_targets) 936 | 937 | Target values. 938 | 939 | - `sample_weight` : array-like of shape (n_samples, ), default=None 940 | 941 | Sample weights. 942 | 943 | **Returns:** 944 | 945 | - `self` : object 946 | 947 | - ```predict(X)``` 948 | 949 | Predict regression target for X. 
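As a quick orientation, a minimal Linear Forest sketch (synthetic data; `Ridge`, `n_estimators=100` and `random_state=0` are arbitrary example choices):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from lineartree import LinearForestRegressor

# Synthetic data, for illustration only
X, y = make_regression(n_samples=1000, n_features=10, n_informative=4, random_state=0)

# Ridge fits the raw target; a random forest then models its residuals
regr = LinearForestRegressor(base_estimator=Ridge(), n_estimators=100, random_state=0)
regr.fit(X, y)

print(regr.predict(X[:5]))        # linear prediction plus the forest's residual correction
print(regr.coef_[:3])             # coefficients of the fitted linear part
print(regr.feature_importances_)  # impurity-based importances from the forest part
```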
950 | 951 | **Parameters:** 952 | 953 | - `X` : array-like of shape (n_samples, n_features) 954 | 955 | Samples. 956 | 957 | **Returns:** 958 | 959 | - `pred` : ndarray of shape (n_samples, ) or also (n_samples, n_targets) if multitarget regression. 960 | 961 | The predicted values. 962 | 963 | - ```apply(X)``` 964 | 965 | Apply trees in the forest to X, return leaf indices. 966 | 967 | **Parameters:** 968 | 969 | - `X` : array-like of shape (n_samples, n_features) 970 | 971 | The input samples. 972 | 973 | **Returns:** 974 | 975 | - `X_leaves` : array-like of shape (n_samples, n_estimators). 976 | 977 | For each datapoint x in X and for each tree in the forest, return the index of the leaf x ends up in. 978 | 979 | - ```decision_path(X)``` 980 | 981 | Return the decision path in the forest. 982 | 983 | **Parameters:** 984 | 985 | - `X` : array-like of shape (n_samples, n_features) 986 | 987 | The input samples. 988 | 989 | **Returns:** 990 | 991 | - `indicator` : sparse matrix of shape (n_samples, n_nodes) 992 | 993 | Return a node indicator matrix where non zero elements indicates that the samples goes through the nodes. The matrix is of CSR format. 994 | 995 | - `n_nodes_ptr` : ndarray of shape (n_estimators + 1, ) 996 | 997 | The columns from indicator[n_nodes_ptr[i]:n_nodes_ptr[i+1]] gives the indicator value for the i-th estimator. 998 | 999 | ## LinearForestClassifier 1000 | ``` 1001 | class lineartree.LinearForestClassifier(base_estimator, *, n_estimators=100, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0., max_features="auto", max_leaf_nodes=None, min_impurity_decrease=0., bootstrap=True, oob_score=False, n_jobs=None, random_state=None, ccp_alpha=0.0, max_samples=None) 1002 | ``` 1003 | 1004 | #### Parameters: 1005 | 1006 | - ```base_estimator : object``` 1007 | 1008 | The linear estimator fitted on the raw target. 1009 | The linear estimator must be a regressor from sklearn.linear_model. 1010 | 1011 | - ```n_estimators : int, default=100``` 1012 | 1013 | The number of trees in the forest. 1014 | 1015 | - ```max_depth : int, default=None``` 1016 | 1017 | The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain less than min_samples_split samples. 1018 | 1019 | - ```min_samples_split : int or float, default=2``` 1020 | 1021 | The minimum number of samples required to split an internal node: 1022 | - If int, then consider `min_samples_split` as the minimum number. 1023 | - If float, then `min_samples_split` is a fraction and `ceil(min_samples_split * n_samples)` are the minimum number of samples for each split. 1024 | 1025 | - ```min_samples_leaf : int or float, default=1``` 1026 | 1027 | The minimum number of samples required to be at a leaf node. 1028 | A split point at any depth will only be considered if it leaves at least ``min_samples_leaf`` training samples in each of the left and right branches. This may have the effect of smoothing the model, especially in regression. 1029 | - If int, then consider `min_samples_leaf` as the minimum number. 1030 | - If float, then `min_samples_leaf` is a fraction and `ceil(min_samples_leaf * n_samples)` are the minimum number of samples for each node. 1031 | 1032 | - ```min_weight_fraction_leaf : float, default=0.0``` 1033 | 1034 | The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Samples have equal weight when sample_weight is not provided. 
1035 | 1036 | - ```max_features : {"auto", "sqrt", "log2"}, int or float, default="auto"``` 1037 | 1038 | The number of features to consider when looking for the best split: 1039 | - If int, then consider `max_features` features at each split. 1040 | - If float, then `max_features` is a fraction and `round(max_features * n_features)` features are considered at each split. 1041 | - If "auto", then `max_features=n_features`. 1042 | - If "sqrt", then `max_features=sqrt(n_features)`. 1043 | - If "log2", then `max_features=log2(n_features)`. 1044 | - If None, then `max_features=n_features`. 1045 | Note: the search for a split does not stop until at least one 1046 | valid partition of the node samples is found, even if it requires to 1047 | effectively inspect more than ``max_features`` features. 1048 | 1049 | - ```max_leaf_nodes : int, default=None``` 1050 | 1051 | Grow trees with ``max_leaf_nodes`` in best-first fashion. Best nodes are defined as relative reduction in impurity. If None then unlimited number of leaf nodes. 1052 | 1053 | - ```min_impurity_decrease : float, default=0.0``` 1054 | 1055 | A node will be split if this split induces a decrease of the impurity greater than or equal to this value. 1056 | 1057 | - ```bootstrap : bool, default=True``` 1058 | 1059 | Whether bootstrap samples are used when building trees. If False, the whole dataset is used to build each tree. 1060 | 1061 | - ```oob_score : bool, default=False``` 1062 | 1063 | Whether to use out-of-bag samples to estimate the generalization score. Only available if bootstrap=True. 1064 | 1065 | - ```n_jobs : int, default=None``` 1066 | 1067 | The number of jobs to run in parallel. :meth:`fit`, :meth:`predict`, :meth:`decision_path` and :meth:`apply` are all parallelized over the trees. ``None`` means 1 unless in a :obj:`joblib.parallel_backend` context. ``-1`` means using all processors. 1068 | 1069 | - ```random_state : int, RandomState instance or None, default=None``` 1070 | 1071 | Controls both the randomness of the bootstrapping of the samples used when building trees (if ``bootstrap=True``) and the sampling of the features to consider when looking for the best split at each node (if ``max_features < n_features``). 1072 | 1073 | - ```ccp_alpha : non-negative float, default=0.0``` 1074 | 1075 | Complexity parameter used for Minimal Cost-Complexity Pruning. The subtree with the largest cost complexity that is smaller than ``ccp_alpha`` will be chosen. By default, no pruning is performed. See :ref:`minimal_cost_complexity_pruning` for details. 1076 | 1077 | - ```max_samples : int or float, default=None``` 1078 | 1079 | If bootstrap is True, the number of samples to draw from X to train each base estimator. 1080 | - If None (default), then draw `X.shape[0]` samples. 1081 | - If int, then draw `max_samples` samples. 1082 | - If float, then draw `max_samples * X.shape[0]` samples. Thus, 1083 | `max_samples` should be in the interval `(0, 1]`. 1084 | 1085 | #### Attributes: 1086 | 1087 | - ```n_features_in_ : int``` 1088 | 1089 | The number of features when :meth:`fit` is performed. 1090 | 1091 | - ```feature_importances_ : ndarray of shape (n_features, )``` 1092 | 1093 | The impurity-based feature importances. 1094 | The higher, the more important the feature. 1095 | The importance of a feature is computed as the (normalized) total reduction of the criterion brought by that feature. It is also known as the Gini importance. 
1096 | 1097 | - ```coef_ : array of shape (n_features, ) or (n_targets, n_features)``` 1098 | 1099 | Estimated coefficients for the linear regression problem. If multiple targets are passed during the fit (y 2D), this is a 2D array of shape (n_targets, n_features), while if only one target is passed, this is a 1D array of length n_features. 1100 | 1101 | - ```intercept_ : float or array of shape (n_targets,)``` 1102 | 1103 | Independent term in the linear model. Set to 0 if `fit_intercept = False` in `base_estimator`. 1104 | 1105 | - ```classes_ : ndarray of shape (n_classes, )``` 1106 | 1107 | A list of class labels known to the classifier. 1108 | 1109 | - ```base_estimator_ : object``` 1110 | 1111 | A fitted linear model instance. 1112 | 1113 | - ```forest_estimator_ : object``` 1114 | 1115 | A fitted random forest instance. 1116 | 1117 | #### Methods: 1118 | 1119 | - ```fit(X, y, sample_weight=None)``` 1120 | 1121 | Build a Linear Forest from the training set (X, y). 1122 | 1123 | **Parameters:** 1124 | 1125 | - `X` : array-like of shape (n_samples, n_features) 1126 | 1127 | The training input samples. 1128 | 1129 | - `y` : array-like of shape (n_samples, ) or (n_samples, n_targets) 1130 | 1131 | Target values. 1132 | 1133 | - `sample_weight` : array-like of shape (n_samples, ), default=None 1134 | 1135 | Sample weights. 1136 | 1137 | **Returns:** 1138 | 1139 | - `self` : 1140 | 1141 | - ```decision_function(X)``` 1142 | 1143 | Predict confidence scores for samples. 1144 | The confidence score for a sample is proportional to the signed distance of that sample to the hyperplane. 1145 | 1146 | **Parameters:** 1147 | 1148 | - `X` : array-like of shape (n_samples, n_features) 1149 | 1150 | Samples. 1151 | 1152 | **Returns:** 1153 | 1154 | - `pred` : ndarray of shape (n_samples, ). 1155 | 1156 | Confidence scores. 1157 | Confidence score for self.classes_[1] where >0 means this class would be predicted. 1158 | 1159 | - ```predict(X)``` 1160 | 1161 | Predict class for X. 1162 | 1163 | **Parameters:** 1164 | 1165 | - `X` : array-like of shape (n_samples, n_features) 1166 | 1167 | Samples. 1168 | 1169 | **Returns:** 1170 | 1171 | - `pred` : ndarray of shape (n_samples, ). 1172 | 1173 | The predicted classes. 1174 | 1175 | - ```predict_proba(X)``` 1176 | 1177 | Predict class probabilities for X. 1178 | 1179 | **Parameters:** 1180 | 1181 | - `X` : array-like of shape (n_samples, n_features) 1182 | 1183 | Samples. 1184 | 1185 | **Returns:** 1186 | 1187 | - `proba` : ndarray of shape (n_samples, n_classes). 1188 | 1189 | The class probabilities of the input samples. The order of the classes corresponds to that in the attribute :term:`classes_`. 1190 | 1191 | - ```predict_log_proba(X)``` 1192 | 1193 | Predict class log-probabilities for X. 1194 | 1195 | **Parameters:** 1196 | 1197 | - `X` : array-like of shape (n_samples, n_features) 1198 | 1199 | Samples. 1200 | 1201 | **Returns:** 1202 | 1203 | - `pred` : ndarray of shape (n_samples, n_classes). 1204 | 1205 | The class log-probabilities of the input samples. The order of the classes corresponds to that in the attribute :term:`classes_`. 1206 | 1207 | - ```apply(X)``` 1208 | 1209 | Apply trees in the forest to X, return leaf indices. 1210 | 1211 | **Parameters:** 1212 | 1213 | - `X` : array-like of shape (n_samples, n_features) 1214 | 1215 | The input samples. 1216 | 1217 | **Returns:** 1218 | 1219 | - `X_leaves` : array-like of shape (n_samples, n_estimators). 
1220 | 1221 | For each datapoint x in X and for each tree in the forest, return the index of the leaf x ends up in. 1222 | 1223 | - ```decision_path(X)``` 1224 | 1225 | Return the decision path in the forest. 1226 | 1227 | **Parameters:** 1228 | 1229 | - `X` : array-like of shape (n_samples, n_features) 1230 | 1231 | The input samples. 1232 | 1233 | **Returns:** 1234 | 1235 | - `indicator` : sparse matrix of shape (n_samples, n_nodes) 1236 | 1237 | Return a node indicator matrix where non zero elements indicates that the samples goes through the nodes. The matrix is of CSR format. 1238 | 1239 | - `n_nodes_ptr` : ndarray of shape (n_estimators + 1, ) 1240 | 1241 | The columns from indicator[n_nodes_ptr[i]:n_nodes_ptr[i+1]] gives the indicator value for the i-th estimator. -------------------------------------------------------------------------------- /notebooks/usage-LinearBoost.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 1, 6 | "metadata": {}, 7 | "outputs": [], 8 | "source": [ 9 | "import numpy as np\n", 10 | "\n", 11 | "from sklearn.linear_model import *\n", 12 | "from lineartree import LinearBoostClassifier, LinearBoostRegressor\n", 13 | "\n", 14 | "from sklearn.datasets import make_classification, make_regression\n", 15 | "\n", 16 | "import warnings\n", 17 | "warnings.simplefilter('ignore')" 18 | ] 19 | }, 20 | { 21 | "cell_type": "markdown", 22 | "metadata": {}, 23 | "source": [ 24 | "# REGRESSION" 25 | ] 26 | }, 27 | { 28 | "cell_type": "code", 29 | "execution_count": 2, 30 | "metadata": {}, 31 | "outputs": [ 32 | { 33 | "data": { 34 | "text/plain": [ 35 | "((8000, 15), (8000,))" 36 | ] 37 | }, 38 | "execution_count": 2, 39 | "metadata": {}, 40 | "output_type": "execute_result" 41 | } 42 | ], 43 | "source": [ 44 | "n_sample, n_features = 8000, 15\n", 45 | "X, y = make_regression(n_samples=n_sample, n_features=n_features, n_targets=1, \n", 46 | " n_informative=5, shuffle=True, random_state=33)\n", 47 | "\n", 48 | "X.shape, y.shape" 49 | ] 50 | }, 51 | { 52 | "cell_type": "markdown", 53 | "metadata": {}, 54 | "source": [ 55 | "### default configuration" 56 | ] 57 | }, 58 | { 59 | "cell_type": "code", 60 | "execution_count": 3, 61 | "metadata": {}, 62 | "outputs": [ 63 | { 64 | "data": { 65 | "text/plain": [ 66 | "LinearBoostRegressor(base_estimator=Ridge())" 67 | ] 68 | }, 69 | "execution_count": 3, 70 | "metadata": {}, 71 | "output_type": "execute_result" 72 | } 73 | ], 74 | "source": [ 75 | "regr = LinearBoostRegressor(Ridge(), loss='linear')\n", 76 | "regr.fit(X, y)" 77 | ] 78 | }, 79 | { 80 | "cell_type": "code", 81 | "execution_count": 4, 82 | "metadata": {}, 83 | "outputs": [ 84 | { 85 | "data": { 86 | "text/plain": [ 87 | "((8000, 25), (8000,), 0.9999998985375346)" 88 | ] 89 | }, 90 | "execution_count": 4, 91 | "metadata": {}, 92 | "output_type": "execute_result" 93 | } 94 | ], 95 | "source": [ 96 | "regr.transform(X).shape, regr.predict(X).shape, regr.score(X, y)" 97 | ] 98 | }, 99 | { 100 | "cell_type": "markdown", 101 | "metadata": {}, 102 | "source": [ 103 | "### square loss" 104 | ] 105 | }, 106 | { 107 | "cell_type": "code", 108 | "execution_count": 5, 109 | "metadata": {}, 110 | "outputs": [ 111 | { 112 | "data": { 113 | "text/plain": [ 114 | "LinearBoostRegressor(base_estimator=Ridge(), loss='square', n_estimators=50)" 115 | ] 116 | }, 117 | "execution_count": 5, 118 | "metadata": {}, 119 | "output_type": "execute_result" 120 | } 121 | ], 122 | 
"source": [ 123 | "regr = LinearBoostRegressor(Ridge(), loss='square', n_estimators=50)\n", 124 | "regr.fit(X, y)" 125 | ] 126 | }, 127 | { 128 | "cell_type": "code", 129 | "execution_count": 6, 130 | "metadata": {}, 131 | "outputs": [ 132 | { 133 | "data": { 134 | "text/plain": [ 135 | "((8000, 65), (8000,), 0.9999998980115891)" 136 | ] 137 | }, 138 | "execution_count": 6, 139 | "metadata": {}, 140 | "output_type": "execute_result" 141 | } 142 | ], 143 | "source": [ 144 | "regr.transform(X).shape, regr.predict(X).shape, regr.score(X, y)" 145 | ] 146 | }, 147 | { 148 | "cell_type": "markdown", 149 | "metadata": {}, 150 | "source": [ 151 | "### absolute loss" 152 | ] 153 | }, 154 | { 155 | "cell_type": "code", 156 | "execution_count": 7, 157 | "metadata": {}, 158 | "outputs": [ 159 | { 160 | "data": { 161 | "text/plain": [ 162 | "LinearBoostRegressor(base_estimator=Ridge(), loss='absolute', n_estimators=50)" 163 | ] 164 | }, 165 | "execution_count": 7, 166 | "metadata": {}, 167 | "output_type": "execute_result" 168 | } 169 | ], 170 | "source": [ 171 | "regr = LinearBoostRegressor(Ridge(), loss='absolute', n_estimators=50)\n", 172 | "regr.fit(X, y)" 173 | ] 174 | }, 175 | { 176 | "cell_type": "code", 177 | "execution_count": 8, 178 | "metadata": {}, 179 | "outputs": [ 180 | { 181 | "data": { 182 | "text/plain": [ 183 | "((8000, 65), (8000,), 0.9999998009507592)" 184 | ] 185 | }, 186 | "execution_count": 8, 187 | "metadata": {}, 188 | "output_type": "execute_result" 189 | } 190 | ], 191 | "source": [ 192 | "regr.transform(X).shape, regr.predict(X).shape, regr.score(X, y)" 193 | ] 194 | }, 195 | { 196 | "cell_type": "markdown", 197 | "metadata": {}, 198 | "source": [ 199 | "### exponential loss" 200 | ] 201 | }, 202 | { 203 | "cell_type": "code", 204 | "execution_count": 9, 205 | "metadata": {}, 206 | "outputs": [ 207 | { 208 | "data": { 209 | "text/plain": [ 210 | "LinearBoostRegressor(base_estimator=Ridge(), loss='exponential',\n", 211 | " n_estimators=50)" 212 | ] 213 | }, 214 | "execution_count": 9, 215 | "metadata": {}, 216 | "output_type": "execute_result" 217 | } 218 | ], 219 | "source": [ 220 | "regr = LinearBoostRegressor(Ridge(), loss='exponential', n_estimators=50)\n", 221 | "regr.fit(X, y)" 222 | ] 223 | }, 224 | { 225 | "cell_type": "code", 226 | "execution_count": 10, 227 | "metadata": {}, 228 | "outputs": [ 229 | { 230 | "data": { 231 | "text/plain": [ 232 | "((8000, 65), (8000,), 0.9999998597027936)" 233 | ] 234 | }, 235 | "execution_count": 10, 236 | "metadata": {}, 237 | "output_type": "execute_result" 238 | } 239 | ], 240 | "source": [ 241 | "regr.transform(X).shape, regr.predict(X).shape, regr.score(X, y)" 242 | ] 243 | }, 244 | { 245 | "cell_type": "markdown", 246 | "metadata": {}, 247 | "source": [ 248 | "### multi-target regression with weights " 249 | ] 250 | }, 251 | { 252 | "cell_type": "code", 253 | "execution_count": 11, 254 | "metadata": {}, 255 | "outputs": [ 256 | { 257 | "data": { 258 | "text/plain": [ 259 | "((8000, 15), (8000, 2))" 260 | ] 261 | }, 262 | "execution_count": 11, 263 | "metadata": {}, 264 | "output_type": "execute_result" 265 | } 266 | ], 267 | "source": [ 268 | "n_sample, n_features = 8000, 15\n", 269 | "X, y = make_regression(n_samples=n_sample, n_features=n_features, n_targets=2, \n", 270 | " n_informative=5, shuffle=True, random_state=33)\n", 271 | "W = np.random.uniform(1,3, (n_sample,))\n", 272 | "\n", 273 | "X.shape, y.shape" 274 | ] 275 | }, 276 | { 277 | "cell_type": "code", 278 | "execution_count": 12, 279 | "metadata": {}, 280 | 
"outputs": [ 281 | { 282 | "data": { 283 | "text/plain": [ 284 | "LinearBoostRegressor(base_estimator=Ridge(), n_estimators=50)" 285 | ] 286 | }, 287 | "execution_count": 12, 288 | "metadata": {}, 289 | "output_type": "execute_result" 290 | } 291 | ], 292 | "source": [ 293 | "regr = LinearBoostRegressor(Ridge(), loss='linear', n_estimators=50)\n", 294 | "regr.fit(X, y, W)" 295 | ] 296 | }, 297 | { 298 | "cell_type": "code", 299 | "execution_count": 13, 300 | "metadata": {}, 301 | "outputs": [ 302 | { 303 | "data": { 304 | "text/plain": [ 305 | "((8000, 65), (8000, 2), 0.999999867971615)" 306 | ] 307 | }, 308 | "execution_count": 13, 309 | "metadata": {}, 310 | "output_type": "execute_result" 311 | } 312 | ], 313 | "source": [ 314 | "regr.transform(X).shape, regr.predict(X).shape, regr.score(X, y)" 315 | ] 316 | }, 317 | { 318 | "cell_type": "markdown", 319 | "metadata": {}, 320 | "source": [ 321 | "# CLASSIFICATION" 322 | ] 323 | }, 324 | { 325 | "cell_type": "code", 326 | "execution_count": 14, 327 | "metadata": {}, 328 | "outputs": [ 329 | { 330 | "data": { 331 | "text/plain": [ 332 | "((8000, 15), (8000,))" 333 | ] 334 | }, 335 | "execution_count": 14, 336 | "metadata": {}, 337 | "output_type": "execute_result" 338 | } 339 | ], 340 | "source": [ 341 | "n_sample, n_features = 8000, 15\n", 342 | "X, y = make_classification(n_samples=n_sample, n_features=n_features, n_classes=3, \n", 343 | " n_redundant=4, n_informative=5,\n", 344 | " n_clusters_per_class=1,\n", 345 | " shuffle=True, random_state=33)\n", 346 | "\n", 347 | "X.shape, y.shape" 348 | ] 349 | }, 350 | { 351 | "cell_type": "markdown", 352 | "metadata": {}, 353 | "source": [ 354 | "### default configuration " 355 | ] 356 | }, 357 | { 358 | "cell_type": "code", 359 | "execution_count": 15, 360 | "metadata": {}, 361 | "outputs": [ 362 | { 363 | "data": { 364 | "text/plain": [ 365 | "LinearBoostClassifier(base_estimator=RidgeClassifier())" 366 | ] 367 | }, 368 | "execution_count": 15, 369 | "metadata": {}, 370 | "output_type": "execute_result" 371 | } 372 | ], 373 | "source": [ 374 | "clf = LinearBoostClassifier(RidgeClassifier(), loss='hamming')\n", 375 | "clf.fit(X, y)" 376 | ] 377 | }, 378 | { 379 | "cell_type": "code", 380 | "execution_count": 16, 381 | "metadata": {}, 382 | "outputs": [ 383 | { 384 | "data": { 385 | "text/plain": [ 386 | "((8000, 25), (8000,), (8000, 3), 0.81775)" 387 | ] 388 | }, 389 | "execution_count": 16, 390 | "metadata": {}, 391 | "output_type": "execute_result" 392 | } 393 | ], 394 | "source": [ 395 | "clf.transform(X).shape, clf.predict(X).shape, clf.predict_proba(X).shape, clf.score(X, y)" 396 | ] 397 | }, 398 | { 399 | "cell_type": "markdown", 400 | "metadata": {}, 401 | "source": [ 402 | "### entropy loss " 403 | ] 404 | }, 405 | { 406 | "cell_type": "code", 407 | "execution_count": 17, 408 | "metadata": {}, 409 | "outputs": [ 410 | { 411 | "data": { 412 | "text/plain": [ 413 | "LinearBoostClassifier(base_estimator=LogisticRegression(), loss='entropy',\n", 414 | " n_estimators=50)" 415 | ] 416 | }, 417 | "execution_count": 17, 418 | "metadata": {}, 419 | "output_type": "execute_result" 420 | } 421 | ], 422 | "source": [ 423 | "clf = LinearBoostClassifier(LogisticRegression(), loss='entropy', n_estimators=50)\n", 424 | "clf.fit(X, y)" 425 | ] 426 | }, 427 | { 428 | "cell_type": "code", 429 | "execution_count": 18, 430 | "metadata": {}, 431 | "outputs": [ 432 | { 433 | "data": { 434 | "text/plain": [ 435 | "((8000, 65), (8000,), (8000, 3), 0.844)" 436 | ] 437 | }, 438 | "execution_count": 18, 439 | 
"metadata": {}, 440 | "output_type": "execute_result" 441 | } 442 | ], 443 | "source": [ 444 | "clf.transform(X).shape, clf.predict(X).shape, clf.predict_proba(X).shape, clf.score(X, y)" 445 | ] 446 | } 447 | ], 448 | "metadata": { 449 | "kernelspec": { 450 | "display_name": "Python [conda env:prova]", 451 | "language": "python", 452 | "name": "conda-env-prova-py" 453 | }, 454 | "language_info": { 455 | "codemirror_mode": { 456 | "name": "ipython", 457 | "version": 3 458 | }, 459 | "file_extension": ".py", 460 | "mimetype": "text/x-python", 461 | "name": "python", 462 | "nbconvert_exporter": "python", 463 | "pygments_lexer": "ipython3", 464 | "version": "3.7.7" 465 | } 466 | }, 467 | "nbformat": 4, 468 | "nbformat_minor": 2 469 | } 470 | -------------------------------------------------------------------------------- /notebooks/usage-LinearForest.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 1, 6 | "metadata": {}, 7 | "outputs": [], 8 | "source": [ 9 | "import numpy as np\n", 10 | "\n", 11 | "from sklearn.linear_model import *\n", 12 | "from lineartree import LinearForestClassifier, LinearForestRegressor\n", 13 | "\n", 14 | "from sklearn.datasets import make_classification, make_regression\n", 15 | "\n", 16 | "import warnings\n", 17 | "warnings.simplefilter('ignore')" 18 | ] 19 | }, 20 | { 21 | "cell_type": "markdown", 22 | "metadata": {}, 23 | "source": [ 24 | "# REGRESSION" 25 | ] 26 | }, 27 | { 28 | "cell_type": "code", 29 | "execution_count": 2, 30 | "metadata": {}, 31 | "outputs": [ 32 | { 33 | "data": { 34 | "text/plain": [ 35 | "((8000, 15), (8000,))" 36 | ] 37 | }, 38 | "execution_count": 2, 39 | "metadata": {}, 40 | "output_type": "execute_result" 41 | } 42 | ], 43 | "source": [ 44 | "n_sample, n_features = 8000, 15\n", 45 | "X, y = make_regression(n_samples=n_sample, n_features=n_features, n_targets=1, \n", 46 | " n_informative=5, shuffle=True, random_state=33)\n", 47 | "\n", 48 | "X.shape, y.shape" 49 | ] 50 | }, 51 | { 52 | "cell_type": "code", 53 | "execution_count": 3, 54 | "metadata": {}, 55 | "outputs": [ 56 | { 57 | "data": { 58 | "text/plain": [ 59 | "LinearForestRegressor(base_estimator=Ridge())" 60 | ] 61 | }, 62 | "execution_count": 3, 63 | "metadata": {}, 64 | "output_type": "execute_result" 65 | } 66 | ], 67 | "source": [ 68 | "regr = LinearForestRegressor(Ridge())\n", 69 | "regr.fit(X, y)" 70 | ] 71 | }, 72 | { 73 | "cell_type": "code", 74 | "execution_count": 4, 75 | "metadata": {}, 76 | "outputs": [ 77 | { 78 | "data": { 79 | "text/plain": [ 80 | "((8000,), (8000, 100), (101,), 0.9999999999365181)" 81 | ] 82 | }, 83 | "execution_count": 4, 84 | "metadata": {}, 85 | "output_type": "execute_result" 86 | } 87 | ], 88 | "source": [ 89 | "regr.predict(X).shape, regr.apply(X).shape, regr.decision_path(X)[-1].shape, regr.score(X,y)" 90 | ] 91 | }, 92 | { 93 | "cell_type": "markdown", 94 | "metadata": {}, 95 | "source": [ 96 | "### multi-target regression with weights " 97 | ] 98 | }, 99 | { 100 | "cell_type": "code", 101 | "execution_count": 5, 102 | "metadata": {}, 103 | "outputs": [ 104 | { 105 | "data": { 106 | "text/plain": [ 107 | "((8000, 15), (8000, 2))" 108 | ] 109 | }, 110 | "execution_count": 5, 111 | "metadata": {}, 112 | "output_type": "execute_result" 113 | } 114 | ], 115 | "source": [ 116 | "n_sample, n_features = 8000, 15\n", 117 | "X, y = make_regression(n_samples=n_sample, n_features=n_features, n_targets=2, \n", 118 | " n_informative=5, 
shuffle=True, random_state=33)\n", 119 | "W = np.random.uniform(1,3, (n_sample,))\n", 120 | "\n", 121 | "X.shape, y.shape" 122 | ] 123 | }, 124 | { 125 | "cell_type": "code", 126 | "execution_count": 6, 127 | "metadata": {}, 128 | "outputs": [ 129 | { 130 | "data": { 131 | "text/plain": [ 132 | "LinearForestRegressor(base_estimator=Ridge())" 133 | ] 134 | }, 135 | "execution_count": 6, 136 | "metadata": {}, 137 | "output_type": "execute_result" 138 | } 139 | ], 140 | "source": [ 141 | "regr = LinearForestRegressor(Ridge())\n", 142 | "regr.fit(X, y, W)" 143 | ] 144 | }, 145 | { 146 | "cell_type": "code", 147 | "execution_count": 7, 148 | "metadata": {}, 149 | "outputs": [ 150 | { 151 | "data": { 152 | "text/plain": [ 153 | "((8000, 2), (8000, 100), (101,), 0.999999999979502)" 154 | ] 155 | }, 156 | "execution_count": 7, 157 | "metadata": {}, 158 | "output_type": "execute_result" 159 | } 160 | ], 161 | "source": [ 162 | "regr.predict(X).shape, regr.apply(X).shape, regr.decision_path(X)[-1].shape, regr.score(X,y)" 163 | ] 164 | }, 165 | { 166 | "cell_type": "markdown", 167 | "metadata": {}, 168 | "source": [ 169 | "# BINARY CLASSIFICATION" 170 | ] 171 | }, 172 | { 173 | "cell_type": "code", 174 | "execution_count": 8, 175 | "metadata": {}, 176 | "outputs": [ 177 | { 178 | "data": { 179 | "text/plain": [ 180 | "((8000, 15), (8000,))" 181 | ] 182 | }, 183 | "execution_count": 8, 184 | "metadata": {}, 185 | "output_type": "execute_result" 186 | } 187 | ], 188 | "source": [ 189 | "n_sample, n_features = 8000, 15\n", 190 | "X, y = make_classification(n_samples=n_sample, n_features=n_features, n_classes=2, \n", 191 | " n_redundant=4, n_informative=5,\n", 192 | " n_clusters_per_class=1,\n", 193 | " shuffle=True, random_state=33)\n", 194 | "\n", 195 | "X.shape, y.shape" 196 | ] 197 | }, 198 | { 199 | "cell_type": "markdown", 200 | "metadata": {}, 201 | "source": [ 202 | "### default configuration" 203 | ] 204 | }, 205 | { 206 | "cell_type": "code", 207 | "execution_count": 9, 208 | "metadata": {}, 209 | "outputs": [ 210 | { 211 | "data": { 212 | "text/plain": [ 213 | "LinearForestClassifier(base_estimator=Ridge())" 214 | ] 215 | }, 216 | "execution_count": 9, 217 | "metadata": {}, 218 | "output_type": "execute_result" 219 | } 220 | ], 221 | "source": [ 222 | "clf = LinearForestClassifier(Ridge())\n", 223 | "clf.fit(X, y)" 224 | ] 225 | }, 226 | { 227 | "cell_type": "code", 228 | "execution_count": 10, 229 | "metadata": {}, 230 | "outputs": [ 231 | { 232 | "data": { 233 | "text/plain": [ 234 | "((8000,), (8000, 2), (8000, 100), (101,), 1.0)" 235 | ] 236 | }, 237 | "execution_count": 10, 238 | "metadata": {}, 239 | "output_type": "execute_result" 240 | } 241 | ], 242 | "source": [ 243 | "clf.predict(X).shape, clf.predict_proba(X).shape, clf.apply(X).shape, clf.decision_path(X)[-1].shape, clf.score(X,y)" 244 | ] 245 | }, 246 | { 247 | "cell_type": "markdown", 248 | "metadata": {}, 249 | "source": [ 250 | "# MULTI-CLASS CLASSIFICATION" 251 | ] 252 | }, 253 | { 254 | "cell_type": "code", 255 | "execution_count": 11, 256 | "metadata": {}, 257 | "outputs": [ 258 | { 259 | "data": { 260 | "text/plain": [ 261 | "((8000, 15), (8000,))" 262 | ] 263 | }, 264 | "execution_count": 11, 265 | "metadata": {}, 266 | "output_type": "execute_result" 267 | } 268 | ], 269 | "source": [ 270 | "n_sample, n_features = 8000, 15\n", 271 | "X, y = make_classification(n_samples=n_sample, n_features=n_features, n_classes=3, \n", 272 | " n_redundant=4, n_informative=5,\n", 273 | " n_clusters_per_class=1,\n", 274 | " shuffle=True, 
random_state=33)\n", 275 | "\n", 276 | "X.shape, y.shape" 277 | ] 278 | }, 279 | { 280 | "cell_type": "markdown", 281 | "metadata": {}, 282 | "source": [ 283 | "### default configuration" 284 | ] 285 | }, 286 | { 287 | "cell_type": "code", 288 | "execution_count": 12, 289 | "metadata": {}, 290 | "outputs": [ 291 | { 292 | "data": { 293 | "text/plain": [ 294 | "OneVsRestClassifier(estimator=LinearForestClassifier(base_estimator=Ridge()))" 295 | ] 296 | }, 297 | "execution_count": 12, 298 | "metadata": {}, 299 | "output_type": "execute_result" 300 | } 301 | ], 302 | "source": [ 303 | "from sklearn.multiclass import OneVsRestClassifier\n", 304 | "\n", 305 | "clf = OneVsRestClassifier(LinearForestClassifier(Ridge()))\n", 306 | "clf.fit(X, y)" 307 | ] 308 | }, 309 | { 310 | "cell_type": "code", 311 | "execution_count": 13, 312 | "metadata": {}, 313 | "outputs": [ 314 | { 315 | "data": { 316 | "text/plain": [ 317 | "((8000,), (8000, 3), 1.0)" 318 | ] 319 | }, 320 | "execution_count": 13, 321 | "metadata": {}, 322 | "output_type": "execute_result" 323 | } 324 | ], 325 | "source": [ 326 | "clf.predict(X).shape, clf.predict_proba(X).shape, clf.score(X,y)" 327 | ] 328 | } 329 | ], 330 | "metadata": { 331 | "kernelspec": { 332 | "display_name": "Python [conda env:prova]", 333 | "language": "python", 334 | "name": "conda-env-prova-py" 335 | }, 336 | "language_info": { 337 | "codemirror_mode": { 338 | "name": "ipython", 339 | "version": 3 340 | }, 341 | "file_extension": ".py", 342 | "mimetype": "text/x-python", 343 | "name": "python", 344 | "nbconvert_exporter": "python", 345 | "pygments_lexer": "ipython3", 346 | "version": "3.7.7" 347 | } 348 | }, 349 | "nbformat": 4, 350 | "nbformat_minor": 2 351 | } 352 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | numpy 2 | scipy 3 | scikit-learn>=0.24.2 4 | -------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- 1 | import pathlib 2 | from setuptools import setup, find_packages 3 | 4 | HERE = pathlib.Path(__file__).parent 5 | 6 | VERSION = '0.3.5' 7 | PACKAGE_NAME = 'linear-tree' 8 | AUTHOR = 'Marco Cerliani' 9 | AUTHOR_EMAIL = 'cerlymarco@gmail.com' 10 | URL = 'https://github.com/cerlymarco/linear-tree' 11 | 12 | LICENSE = 'MIT' 13 | DESCRIPTION = 'A python library to build Model Trees with Linear Models at the leaves.' 14 | LONG_DESCRIPTION = (HERE / "README.md").read_text() 15 | LONG_DESC_TYPE = "text/markdown" 16 | 17 | INSTALL_REQUIRES = [ 18 | 'scikit-learn>=0.24.2', 19 | 'numpy', 20 | 'scipy' 21 | ] 22 | 23 | setup(name=PACKAGE_NAME, 24 | version=VERSION, 25 | description=DESCRIPTION, 26 | long_description=LONG_DESCRIPTION, 27 | long_description_content_type=LONG_DESC_TYPE, 28 | author=AUTHOR, 29 | license=LICENSE, 30 | author_email=AUTHOR_EMAIL, 31 | url=URL, 32 | install_requires=INSTALL_REQUIRES, 33 | python_requires='>=3', 34 | packages=find_packages() 35 | ) --------------------------------------------------------------------------------