├── .gitignore ├── LICENSE ├── README.md ├── imgs ├── leaf_coefficients.png ├── linear_boost_importances.png ├── linear_forest_predictions.png ├── linear_tree_class.png ├── linear_tree_reg.png └── plot_tree.png ├── lineartree ├── __init__.py ├── _classes.py ├── _criterion.py └── lineartree.py ├── notebooks ├── README.md ├── plots.ipynb ├── usage-LinearBoost.ipynb ├── usage-LinearForest.ipynb └── usage-LinearTree.ipynb ├── requirements.txt └── setup.py /.gitignore: -------------------------------------------------------------------------------- 1 | .DS_Store 2 | 3 | # Created by https://www.gitignore.io/api/python 4 | 5 | ### Python ### 6 | # Byte-compiled / optimized / DLL files 7 | __pycache__/ 8 | *.py[cod] 9 | *$py.class 10 | 11 | # C extensions 12 | *.so 13 | 14 | # Distribution / packaging 15 | .Python 16 | build/ 17 | develop-eggs/ 18 | dist/ 19 | downloads/ 20 | eggs/ 21 | .eggs/ 22 | lib/ 23 | lib64/ 24 | parts/ 25 | sdist/ 26 | var/ 27 | wheels/ 28 | *.egg-info/ 29 | .installed.cfg 30 | *.egg 31 | 32 | # PyInstaller 33 | # Usually these files are written by a python script from a template 34 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 35 | *.manifest 36 | *.spec 37 | 38 | # Installer logs 39 | pip-log.txt 40 | pip-delete-this-directory.txt 41 | 42 | # Unit test / coverage reports 43 | htmlcov/ 44 | .tox/ 45 | .coverage 46 | .coverage.* 47 | .cache 48 | nosetests.xml 49 | coverage.xml 50 | *.cover 51 | .hypothesis/ 52 | 53 | # Translations 54 | *.mo 55 | *.pot 56 | 57 | # Django stuff: 58 | *.log 59 | local_settings.py 60 | 61 | # Flask stuff: 62 | instance/ 63 | .webassets-cache 64 | 65 | # Scrapy stuff: 66 | .scrapy 67 | 68 | # Sphinx documentation 69 | docs/_build/ 70 | 71 | # PyBuilder 72 | target/ 73 | 74 | # Jupyter Notebook 75 | .ipynb_checkpoints 76 | 77 | # pyenv 78 | .python-version 79 | 80 | # celery beat schedule file 81 | celerybeat-schedule 82 | 83 | # SageMath parsed files 84 | *.sage.py 85 | 86 | # Environments 87 | .env 88 | .venv 89 | env/ 90 | venv/ 91 | ENV/ 92 | env.bak/ 93 | venv.bak/ 94 | 95 | # Spyder project settings 96 | .spyderproject 97 | .spyproject 98 | 99 | # Rope project settings 100 | .ropeproject 101 | 102 | # mkdocs documentation 103 | /site 104 | 105 | # mypy 106 | .mypy_cache/ 107 | 108 | # End of https://www.gitignore.io/api/python 109 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2021 Marco Cerliani 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # linear-tree 2 | A Python library to build Model Trees with Linear Models at the leaves. 3 | 4 | linear-tree also provides implementations of _LinearForest_ and _LinearBoost_, inspired by [these works](https://github.com/cerlymarco/linear-tree#references). 5 | 6 | ## Overview 7 | **Linear Trees** combine the learning ability of Decision Trees with the predictive and explanatory power of Linear Models. 8 | As in tree-based algorithms, the data are split according to simple decision rules. The goodness of splits is evaluated by fitting Linear Models in the nodes and measuring the resulting gain. This implies that the models in the leaves are linear instead of constant approximations as in classical Decision Trees. 9 | 10 | **Linear Forests** generalize the well-known Random Forests by combining them with Linear Models. The key idea is to use the strength of Linear Models to improve the nonparametric learning ability of tree-based algorithms. First, a Linear Model is fitted on the whole dataset; then a Random Forest is trained on the same dataset, using the residuals of the previous step as the target. The final predictions are the sum of the raw linear predictions and the residuals modeled by the Random Forest (see the sketch after the Installation section below). 11 | 12 | **Linear Boosting** is a two-stage learning process. First, a linear model is trained on the initial dataset to obtain predictions. Second, the residuals of the previous step are modeled with a decision tree using all the available features. The tree identifies the path leading to the highest error (i.e. the worst leaf), and that leaf is used to generate a new binary feature fed back into the first stage. The iterations continue until a stopping criterion is met. 13 | 14 | **linear-tree is developed to be fully integrable with scikit-learn**. ```LinearTreeRegressor``` and ```LinearTreeClassifier``` are provided as scikit-learn _BaseEstimator_ implementations to build a decision tree using linear estimators. ```LinearForestRegressor``` and ```LinearForestClassifier``` use the _RandomForest_ from sklearn to model residuals. ```LinearBoostRegressor``` and ```LinearBoostClassifier``` are also available as _TransformerMixin_, so they can be integrated into any pipeline for automated feature engineering. All the models available in [sklearn.linear_model](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.linear_model) can be used as the base learner. 15 | 16 | ## Installation 17 | ```shell 18 | pip install --upgrade linear-tree 19 | ``` 20 | The module depends on NumPy, SciPy and Scikit-Learn (>=0.24.2). Python 3.6 or above is supported.
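To make the two-stage ideas in the Overview concrete, here is a minimal, self-contained sketch of the Linear Forest and Linear Boosting concepts written with plain scikit-learn estimators. It is a conceptual illustration only, not the library's internal implementation, and the variable names are purely illustrative; for real use, see the ```LinearForestRegressor``` and ```LinearBoostRegressor``` examples in the Usage section.
```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=200, n_features=4,
                       n_informative=2, random_state=0)

# --- Linear Forest idea ----------------------------------------------------
# stage 1: fit a linear model on the raw target
linear = LinearRegression().fit(X, y)
residuals = y - linear.predict(X)
# stage 2: fit a random forest on the residuals of the linear model
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, residuals)
# final prediction = raw linear prediction + forest-modeled residuals
forest_pred = linear.predict(X) + forest.predict(X)

# --- Linear Boosting idea (a single iteration) -----------------------------
# a shallow tree is fitted on the residuals; the samples falling in the leaf
# with the largest absolute prediction (the "worst" leaf) define a new binary
# feature, and the linear model is refitted on the augmented dataset
tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, residuals)
leaf_pred = np.abs(tree.predict(X))
new_feature = (leaf_pred == leaf_pred.max()).astype(float).reshape(-1, 1)
boosted = LinearRegression().fit(np.hstack([X, new_feature]), y)
```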
21 | 22 | ## Media 23 | - [Linear Tree: the perfect mix of Linear Model and Decision Tree](https://towardsdatascience.com/linear-tree-the-perfect-mix-of-linear-model-and-decision-tree-2eaed21936b7) 24 | - [Model Tree: handle Data Shifts mixing Linear Model and Decision Tree](https://towardsdatascience.com/model-tree-handle-data-shifts-mixing-linear-model-and-decision-tree-facfd642e42b) 25 | - [Explainable AI with Linear Trees](https://towardsdatascience.com/explainable-ai-with-linear-trees-7e30a6f067d7) 26 | - [Improve Linear Regression for Time Series Forecasting](https://towardsdatascience.com/improve-linear-regression-for-time-series-forecasting-e36f3c3e3534#a80b-b6010ccb1c21) 27 | - [Linear Boosting with Automated Features Engineering](https://towardsdatascience.com/linear-boosting-with-automated-features-engineering-894962c3ba84) 28 | - [Improve Random Forest with Linear Models](https://towardsdatascience.com/improve-random-forest-with-linear-models-1fa789691e18) 29 | 30 | ## Usage 31 | ##### Linear Tree Regression 32 | ```python 33 | from sklearn.linear_model import LinearRegression 34 | from lineartree import LinearTreeRegressor 35 | from sklearn.datasets import make_regression 36 | X, y = make_regression(n_samples=100, n_features=4, 37 | n_informative=2, n_targets=1, 38 | random_state=0, shuffle=False) 39 | regr = LinearTreeRegressor(base_estimator=LinearRegression()) 40 | regr.fit(X, y) 41 | ``` 42 | ##### Linear Tree Classification 43 | ```python 44 | from sklearn.linear_model import RidgeClassifier 45 | from lineartree import LinearTreeClassifier 46 | from sklearn.datasets import make_classification 47 | X, y = make_classification(n_samples=100, n_features=4, 48 | n_informative=2, n_redundant=0, 49 | random_state=0, shuffle=False) 50 | clf = LinearTreeClassifier(base_estimator=RidgeClassifier()) 51 | clf.fit(X, y) 52 | ``` 53 | ##### Linear Forest Regression 54 | ```python 55 | from sklearn.linear_model import LinearRegression 56 | from lineartree import LinearForestRegressor 57 | from sklearn.datasets import make_regression 58 | X, y = make_regression(n_samples=100, n_features=4, 59 | n_informative=2, n_targets=1, 60 | random_state=0, shuffle=False) 61 | regr = LinearForestRegressor(base_estimator=LinearRegression()) 62 | regr.fit(X, y) 63 | ``` 64 | ##### Linear Forest Classification 65 | ```python 66 | from sklearn.linear_model import LinearRegression 67 | from lineartree import LinearForestClassifier 68 | from sklearn.datasets import make_classification 69 | X, y = make_classification(n_samples=100, n_features=4, 70 | n_informative=2, n_redundant=0, 71 | random_state=0, shuffle=False) 72 | clf = LinearForestClassifier(base_estimator=LinearRegression()) 73 | clf.fit(X, y) 74 | ``` 75 | ##### Linear Boosting Regression 76 | ```python 77 | from sklearn.linear_model import LinearRegression 78 | from lineartree import LinearBoostRegressor 79 | from sklearn.datasets import make_regression 80 | X, y = make_regression(n_samples=100, n_features=4, 81 | n_informative=2, n_targets=1, 82 | random_state=0, shuffle=False) 83 | regr = LinearBoostRegressor(base_estimator=LinearRegression()) 84 | regr.fit(X, y) 85 | ``` 86 | ##### Linear Boosting Classification 87 | ```python 88 | from sklearn.linear_model import RidgeClassifier 89 | from lineartree import LinearBoostClassifier 90 | from sklearn.datasets import make_classification 91 | X, y = make_classification(n_samples=100, n_features=4, 92 | n_informative=2, n_redundant=0, 93 | random_state=0, shuffle=False) 94 | clf = 
LinearBoostClassifier(base_estimator=RidgeClassifier()) 95 | clf.fit(X, y) 96 | ``` 97 | 98 | More examples in the [notebooks folder](https://github.com/cerlymarco/linear-tree/tree/main/notebooks). 99 | 100 | Check the [API Reference](https://github.com/cerlymarco/linear-tree/blob/main/notebooks/README.md) to see the parameter configurations and the available methods. 101 | 102 | ## Examples 103 | Show the linear tree learning path: 104 | 105 | ![plot tree](https://raw.githubusercontent.com/cerlymarco/linear-tree/master/imgs/plot_tree.png) 106 | 107 | Linear Tree Regressor at work: 108 | 109 | ![linear tree regressor](https://raw.githubusercontent.com/cerlymarco/linear-tree/master/imgs/linear_tree_reg.png) 110 | 111 | Linear Tree Classifier at work: 112 | 113 | ![linear tree classifier](https://raw.githubusercontent.com/cerlymarco/linear-tree/master/imgs/linear_tree_class.png) 114 | 115 | Extract and examine coefficients at the leaves: 116 | 117 | ![leaf coefficients](https://raw.githubusercontent.com/cerlymarco/linear-tree/master/imgs/leaf_coefficients.png) 118 | 119 | Impact of the features automatically generated with Linear Boosting: 120 | 121 | ![linear_boost_importances](https://raw.githubusercontent.com/cerlymarco/linear-tree/master/imgs/linear_boost_importances.png) 122 | 123 | Comparing predictions of Linear Forest and Random Forest: 124 | 125 | ![linear_forest_predictions](https://raw.githubusercontent.com/cerlymarco/linear-tree/master/imgs/linear_forest_predictions.png) 126 | 127 | ## References 128 | - Regression-Enhanced Random Forests. Haozhe Zhang, Dan Nettleton, Zhengyuan Zhu. 129 | - Explainable boosted linear regression for time series forecasting. Igor Ilic, Berk Gorgulu, Mucahit Cevik, Mustafa Gokce Baydogan. 130 | -------------------------------------------------------------------------------- /imgs/leaf_coefficients.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/cerlymarco/linear-tree/2982edc050206521fa9cde7df1b1f88ab7b2183d/imgs/leaf_coefficients.png -------------------------------------------------------------------------------- /imgs/linear_boost_importances.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/cerlymarco/linear-tree/2982edc050206521fa9cde7df1b1f88ab7b2183d/imgs/linear_boost_importances.png -------------------------------------------------------------------------------- /imgs/linear_forest_predictions.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/cerlymarco/linear-tree/2982edc050206521fa9cde7df1b1f88ab7b2183d/imgs/linear_forest_predictions.png -------------------------------------------------------------------------------- /imgs/linear_tree_class.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/cerlymarco/linear-tree/2982edc050206521fa9cde7df1b1f88ab7b2183d/imgs/linear_tree_class.png -------------------------------------------------------------------------------- /imgs/linear_tree_reg.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/cerlymarco/linear-tree/2982edc050206521fa9cde7df1b1f88ab7b2183d/imgs/linear_tree_reg.png -------------------------------------------------------------------------------- /imgs/plot_tree.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/cerlymarco/linear-tree/2982edc050206521fa9cde7df1b1f88ab7b2183d/imgs/plot_tree.png -------------------------------------------------------------------------------- /lineartree/__init__.py: -------------------------------------------------------------------------------- 1 | from ._classes import * 2 | from ._criterion import * 3 | from .lineartree import * -------------------------------------------------------------------------------- /lineartree/_classes.py: -------------------------------------------------------------------------------- 1 | import numbers 2 | import numpy as np 3 | import scipy.sparse as sp 4 | 5 | from copy import deepcopy 6 | from joblib import Parallel, effective_n_jobs # , delayed 7 | 8 | from sklearn.dummy import DummyClassifier 9 | from sklearn.tree import DecisionTreeRegressor 10 | from sklearn.ensemble import RandomForestRegressor 11 | 12 | from sklearn.base import is_regressor 13 | from sklearn.base import BaseEstimator, TransformerMixin 14 | from sklearn.utils.validation import has_fit_parameter, check_is_fitted 15 | 16 | from ._criterion import SCORING 17 | from ._criterion import mse, rmse, mae, poisson 18 | from ._criterion import hamming, crossentropy 19 | 20 | import sklearn 21 | _sklearn_v1 = eval(sklearn.__version__.split('.')[0]) > 0 22 | 23 | 24 | CRITERIA = {"mse": mse, 25 | "rmse": rmse, 26 | "mae": mae, 27 | "poisson": poisson, 28 | "hamming": hamming, 29 | "crossentropy": crossentropy} 30 | 31 | 32 | ######################################################################### 33 | ### remove when https://github.com/joblib/joblib/issues/1071 is fixed ### 34 | ######################################################################### 35 | from sklearn import get_config, config_context 36 | from functools import update_wrapper 37 | import functools 38 | 39 | # from sklearn.utils.fixes 40 | def delayed(function): 41 | """Decorator used to capture the arguments of a function.""" 42 | @functools.wraps(function) 43 | def delayed_function(*args, **kwargs): 44 | return _FuncWrapper(function), args, kwargs 45 | return delayed_function 46 | 47 | # from sklearn.utils.fixes 48 | class _FuncWrapper: 49 | """"Load the global configuration before calling the function.""" 50 | def __init__(self, function): 51 | self.function = function 52 | self.config = get_config() 53 | update_wrapper(self, self.function) 54 | 55 | def __call__(self, *args, **kwargs): 56 | with config_context(**self.config): 57 | return self.function(*args, **kwargs) 58 | ######################################################################### 59 | ######################################################################### 60 | ######################################################################### 61 | 62 | 63 | def _partition_columns(columns, n_jobs): 64 | """Private function to partition columns splitting between jobs.""" 65 | # Compute the number of jobs 66 | n_columns = len(columns) 67 | n_jobs = min(effective_n_jobs(n_jobs), n_columns) 68 | 69 | # Partition columns between jobs 70 | n_columns_per_job = np.full(n_jobs, n_columns // n_jobs, dtype=int) 71 | n_columns_per_job[:n_columns % n_jobs] += 1 72 | columns_per_job = np.cumsum(n_columns_per_job) 73 | columns_per_job = np.split(columns, columns_per_job) 74 | columns_per_job = columns_per_job[:-1] 75 | 76 | return n_jobs, columns_per_job 77 | 78 | 79 | def _parallel_binning_fit(split_feat, _self, X, y, 80 | weights, support_sample_weight, 81 | bins, loss): 82 | """Private function to find the best column 
splittings within a job.""" 83 | n_sample, n_feat = X.shape 84 | feval = CRITERIA[_self.criterion] 85 | 86 | split_t = None 87 | split_col = None 88 | left_node = (None, None, None, None) 89 | right_node = (None, None, None, None) 90 | largs_left = {'classes': None} 91 | largs_right = {'classes': None} 92 | 93 | if n_sample < _self._min_samples_split: 94 | return loss, split_t, split_col, left_node, right_node 95 | 96 | for col, _bin in zip(split_feat, bins): 97 | 98 | for q in _bin: 99 | 100 | # create 1D bool mask for right/left children 101 | mask = (X[:, col] > q) 102 | 103 | n_left, n_right = (~mask).sum(), mask.sum() 104 | 105 | if n_left < _self._min_samples_leaf or n_right < _self._min_samples_leaf: 106 | continue 107 | 108 | # create 2D bool mask for right/left children 109 | left_mesh = np.ix_(~mask, _self._linear_features) 110 | right_mesh = np.ix_(mask, _self._linear_features) 111 | 112 | model_left = deepcopy(_self.base_estimator) 113 | model_right = deepcopy(_self.base_estimator) 114 | 115 | if hasattr(_self, 'classes_'): 116 | largs_left['classes'] = np.unique(y[~mask]) 117 | largs_right['classes'] = np.unique(y[mask]) 118 | if len(largs_left['classes']) == 1: 119 | model_left = DummyClassifier(strategy="most_frequent") 120 | if len(largs_right['classes']) == 1: 121 | model_right = DummyClassifier(strategy="most_frequent") 122 | 123 | if weights is None: 124 | model_left.fit(X[left_mesh], y[~mask]) 125 | loss_left = feval(model_left, X[left_mesh], y[~mask], 126 | **largs_left) 127 | wloss_left = loss_left * (n_left / n_sample) 128 | 129 | model_right.fit(X[right_mesh], y[mask]) 130 | loss_right = feval(model_right, X[right_mesh], y[mask], 131 | **largs_right) 132 | wloss_right = loss_right * (n_right / n_sample) 133 | 134 | else: 135 | if support_sample_weight: 136 | model_left.fit(X[left_mesh], y[~mask], 137 | sample_weight=weights[~mask]) 138 | 139 | model_right.fit(X[right_mesh], y[mask], 140 | sample_weight=weights[mask]) 141 | 142 | else: 143 | model_left.fit(X[left_mesh], y[~mask]) 144 | 145 | model_right.fit(X[right_mesh], y[mask]) 146 | 147 | loss_left = feval(model_left, X[left_mesh], y[~mask], 148 | weights=weights[~mask], **largs_left) 149 | wloss_left = loss_left * (weights[~mask].sum() / weights.sum()) 150 | 151 | loss_right = feval(model_right, X[right_mesh], y[mask], 152 | weights=weights[mask], **largs_right) 153 | wloss_right = loss_right * (weights[mask].sum() / weights.sum()) 154 | 155 | total_loss = round(wloss_left + wloss_right, 5) 156 | 157 | # store if best 158 | if total_loss < loss: 159 | split_t = q 160 | split_col = col 161 | loss = total_loss 162 | left_node = (model_left, loss_left, wloss_left, 163 | n_left, largs_left['classes']) 164 | right_node = (model_right, loss_right, wloss_right, 165 | n_right, largs_right['classes']) 166 | 167 | return loss, split_t, split_col, left_node, right_node 168 | 169 | 170 | def _map_node(X, feat, direction, split): 171 | """Utility to map samples to nodes""" 172 | if direction == 'L': 173 | mask = (X[:, feat] <= split) 174 | else: 175 | mask = (X[:, feat] > split) 176 | 177 | return mask 178 | 179 | 180 | def _predict_branch(X, branch_history, mask=None): 181 | """Utility to map samples to branches""" 182 | 183 | if mask is None: 184 | mask = np.repeat(True, X.shape[0]) 185 | 186 | for node in branch_history: 187 | mask = np.logical_and(_map_node(X, *node), mask) 188 | 189 | return mask 190 | 191 | 192 | class Node: 193 | 194 | def __init__(self, id=None, threshold=[], 195 | parent=None, children=None, 196 | 
n_samples=None, w_loss=None, 197 | loss=None, model=None, classes=None): 198 | self.id = id 199 | self.threshold = threshold 200 | self.parent = parent 201 | self.children = children 202 | self.n_samples = n_samples 203 | self.w_loss = w_loss 204 | self.loss = loss 205 | self.model = model 206 | self.classes = classes 207 | 208 | 209 | class _LinearTree(BaseEstimator): 210 | """Base class for Linear Tree meta-estimator. 211 | 212 | Warning: This class should not be used directly. Use derived classes 213 | instead. 214 | """ 215 | def __init__(self, base_estimator, *, criterion, max_depth, 216 | min_samples_split, min_samples_leaf, max_bins, 217 | min_impurity_decrease, categorical_features, 218 | split_features, linear_features, n_jobs): 219 | 220 | self.base_estimator = base_estimator 221 | self.criterion = criterion 222 | self.max_depth = max_depth 223 | self.min_samples_split = min_samples_split 224 | self.min_samples_leaf = min_samples_leaf 225 | self.max_bins = max_bins 226 | self.min_impurity_decrease = min_impurity_decrease 227 | self.categorical_features = categorical_features 228 | self.split_features = split_features 229 | self.linear_features = linear_features 230 | self.n_jobs = n_jobs 231 | 232 | def _parallel_args(self): 233 | return {} 234 | 235 | def _split(self, X, y, bins, 236 | support_sample_weight, 237 | weights=None, 238 | loss=None): 239 | """Evaluate optimal splits in a given node (in a specific partition of 240 | X and y). 241 | 242 | Parameters 243 | ---------- 244 | X : array-like of shape (n_samples, n_features) 245 | The training input samples. 246 | 247 | y : array-like of shape (n_samples, ) 248 | The target values (class labels in classification, real numbers in 249 | regression). 250 | 251 | bins : array-like of shape (max_bins - 2, ) 252 | The bins to use to find an optimal split. Expressed as percentiles. 253 | 254 | support_sample_weight : bool 255 | Whether the estimator's fit method supports sample_weight. 256 | 257 | weights : array-like of shape (n_samples, ), default=None 258 | Sample weights. If None, then samples are equally weighted. 259 | Note that if the base estimator does not support sample weighting, 260 | the sample weights are still used to evaluate the splits. 261 | 262 | loss : float, default=None 263 | The loss of the parent node. A split is computed if the weighted 264 | loss sum of the two children is lower than the loss of the parent. 265 | A None value implies the first fit on all the data to evaluate 266 | the benefits of possible future splits. 
267 | 268 | Returns 269 | ------- 270 | self : object 271 | """ 272 | # Parallel loops 273 | n_jobs, split_feat = _partition_columns(self._split_features, self.n_jobs) 274 | 275 | # partition columns splittings between jobs 276 | all_results = Parallel(n_jobs=n_jobs, verbose=0, 277 | **self._parallel_args())( 278 | delayed(_parallel_binning_fit)( 279 | feat, 280 | self, X, y, 281 | weights, support_sample_weight, 282 | [bins[i] for i in feat], 283 | loss 284 | ) 285 | for feat in split_feat) 286 | 287 | # extract results from parallel loops 288 | _losses, split_t, split_col = [], [], [] 289 | left_node, right_node = [], [] 290 | for job_res in all_results: 291 | _losses.append(job_res[0]) 292 | split_t.append(job_res[1]) 293 | split_col.append(job_res[2]) 294 | left_node.append(job_res[3]) 295 | right_node.append(job_res[4]) 296 | 297 | # select best results 298 | _id_best = np.argmin(_losses) 299 | if loss - _losses[_id_best] > self.min_impurity_decrease: 300 | split_t = split_t[_id_best] 301 | split_col = split_col[_id_best] 302 | left_node = left_node[_id_best] 303 | right_node = right_node[_id_best] 304 | else: 305 | split_t = None 306 | split_col = None 307 | left_node = (None, None, None, None, None) 308 | right_node = (None, None, None, None, None) 309 | 310 | return split_t, split_col, left_node, right_node 311 | 312 | def _grow(self, X, y, weights=None): 313 | """Grow and prune a Linear Tree from the training set (X, y). 314 | 315 | Parameters 316 | ---------- 317 | X : array-like of shape (n_samples, n_features) 318 | The training input samples. 319 | 320 | y : array-like of shape (n_samples, ) 321 | The target values (class labels in classification, real numbers in 322 | regression). 323 | 324 | weights : array-like of shape (n_samples, ), default=None 325 | Sample weights. If None, then samples are equally weighted. 326 | Note that if the base estimator does not support sample weighting, 327 | the sample weights are still used to evaluate the splits. 
328 | 329 | Returns 330 | ------- 331 | self : object 332 | """ 333 | n_sample, self.n_features_in_ = X.shape 334 | self.feature_importances_ = np.zeros((self.n_features_in_,)) 335 | 336 | # extract quantiles 337 | bins = np.linspace(0, 1, self.max_bins)[1:-1] 338 | bins = np.quantile(X, bins, axis=0) 339 | bins = list(bins.T) 340 | bins = [np.unique(X[:, c]) if c in self._categorical_features 341 | else np.unique(q) for c, q in enumerate(bins)] 342 | 343 | # check if base_estimator supports fitting with sample_weights 344 | support_sample_weight = has_fit_parameter(self.base_estimator, 345 | "sample_weight") 346 | 347 | queue = [''] # queue of the nodes to evaluate for splitting 348 | # store the results of each node in dicts 349 | self._nodes = {} 350 | self._leaves = {} 351 | 352 | # initialize first fit 353 | largs = {'classes': None} 354 | model = deepcopy(self.base_estimator) 355 | if weights is None or not support_sample_weight: 356 | model.fit(X[:, self._linear_features], y) 357 | else: 358 | model.fit(X[:, self._linear_features], y, sample_weight=weights) 359 | 360 | if hasattr(self, 'classes_'): 361 | largs['classes'] = self.classes_ 362 | 363 | loss = CRITERIA[self.criterion]( 364 | model, X[:, self._linear_features], y, 365 | weights=weights, **largs) 366 | loss = round(loss, 5) 367 | 368 | self._nodes[''] = Node( 369 | id=0, 370 | n_samples=n_sample, 371 | model=model, 372 | loss=loss, 373 | classes=largs['classes'] 374 | ) 375 | 376 | # in the beginning consider all the samples 377 | start = np.repeat(True, n_sample) 378 | mask = start.copy() 379 | 380 | i = 1 381 | while len(queue) > 0: 382 | 383 | if weights is None: 384 | split_t, split_col, left_node, right_node = self._split( 385 | X[mask], y[mask], bins, 386 | support_sample_weight, 387 | loss=loss) 388 | else: 389 | split_t, split_col, left_node, right_node = self._split( 390 | X[mask], y[mask], bins, 391 | support_sample_weight, weights[mask], 392 | loss=loss) 393 | 394 | # no utility in splitting 395 | if split_col is None or len(queue[-1]) >= self.max_depth: 396 | self._leaves[queue[-1]] = self._nodes[queue[-1]] 397 | del self._nodes[queue[-1]] 398 | queue.pop() 399 | else: 400 | model_left, loss_left, wloss_left, n_left, class_left = \ 401 | left_node 402 | model_right, loss_right, wloss_right, n_right, class_right = \ 403 | right_node 404 | self.feature_importances_[split_col] += \ 405 | loss - wloss_left - wloss_right 406 | 407 | self._nodes[queue[-1] + 'L'] = Node( 408 | id=i, parent=queue[-1], 409 | model=model_left, 410 | loss=loss_left, 411 | w_loss=wloss_left, 412 | n_samples=n_left, 413 | threshold=self._nodes[queue[-1]].threshold[:] + [ 414 | (split_col, 'L', split_t) 415 | ] 416 | ) 417 | 418 | self._nodes[queue[-1] + 'R'] = Node( 419 | id=i + 1, parent=queue[-1], 420 | model=model_right, 421 | loss=loss_right, 422 | w_loss=wloss_right, 423 | n_samples=n_right, 424 | threshold=self._nodes[queue[-1]].threshold[:] + [ 425 | (split_col, 'R', split_t) 426 | ] 427 | ) 428 | 429 | if hasattr(self, 'classes_'): 430 | self._nodes[queue[-1] + 'L'].classes = class_left 431 | self._nodes[queue[-1] + 'R'].classes = class_right 432 | 433 | self._nodes[queue[-1]].children = (queue[-1] + 'L', queue[-1] + 'R') 434 | 435 | i += 2 436 | q = queue[-1] 437 | queue.pop() 438 | queue.extend([q + 'R', q + 'L']) 439 | 440 | if len(queue) > 0: 441 | loss = self._nodes[queue[-1]].loss 442 | mask = _predict_branch( 443 | X, self._nodes[queue[-1]].threshold, start.copy()) 444 | 445 | self.node_count = i 446 | 447 | return self 448 | 
449 | def _fit(self, X, y, sample_weight=None): 450 | """Build a Linear Tree of a linear estimator from the training 451 | set (X, y). 452 | 453 | Parameters 454 | ---------- 455 | X : array-like of shape (n_samples, n_features) 456 | The training input samples. 457 | 458 | y : array-like of shape (n_samples, ) or also (n_samples, n_targets) for 459 | multitarget regression. 460 | The target values (class labels in classification, real numbers in 461 | regression). 462 | 463 | sample_weight : array-like of shape (n_samples, ), default=None 464 | Sample weights. If None, then samples are equally weighted. 465 | Note that if the base estimator does not support sample weighting, 466 | the sample weights are still used to evaluate the splits. 467 | 468 | Returns 469 | ------- 470 | self : object 471 | """ 472 | n_sample, n_feat = X.shape 473 | 474 | if isinstance(self.min_samples_split, numbers.Integral): 475 | if self.min_samples_split < 6: 476 | raise ValueError( 477 | "min_samples_split must be an integer greater than 5 or " 478 | "a float in (0.0, 1.0); got the integer {}".format( 479 | self.min_samples_split)) 480 | self._min_samples_split = self.min_samples_split 481 | else: 482 | if not 0. < self.min_samples_split < 1.: 483 | raise ValueError( 484 | "min_samples_split must be an integer greater than 5 or " 485 | "a float in (0.0, 1.0); got the float {}".format( 486 | self.min_samples_split)) 487 | 488 | self._min_samples_split = int(np.ceil(self.min_samples_split * n_sample)) 489 | self._min_samples_split = max(6, self._min_samples_split) 490 | 491 | if isinstance(self.min_samples_leaf, numbers.Integral): 492 | if self.min_samples_leaf < 3: 493 | raise ValueError( 494 | "min_samples_leaf must be an integer greater than 2 or " 495 | "a float in (0.0, 1.0); got the integer {}".format( 496 | self.min_samples_leaf)) 497 | self._min_samples_leaf = self.min_samples_leaf 498 | else: 499 | if not 0. < self.min_samples_leaf < 1.: 500 | raise ValueError( 501 | "min_samples_leaf must be an integer greater than 2 or " 502 | "a float in (0.0, 1.0); got the float {}".format( 503 | self.min_samples_leaf)) 504 | 505 | self._min_samples_leaf = int(np.ceil(self.min_samples_leaf * n_sample)) 506 | self._min_samples_leaf = max(3, self._min_samples_leaf) 507 | 508 | if not 1 <= self.max_depth <= 20: 509 | raise ValueError("max_depth must be an integer in [1, 20].") 510 | 511 | if not 10 <= self.max_bins <= 120: 512 | raise ValueError("max_bins must be an integer in [10, 120].") 513 | 514 | if not hasattr(self.base_estimator, 'fit_intercept'): 515 | raise ValueError( 516 | "Only linear models are accepted as base_estimator. " 517 | "Select one from linear_model class of scikit-learn.") 518 | 519 | if self.categorical_features is not None: 520 | cat_features = np.unique(self.categorical_features) 521 | 522 | if not issubclass(cat_features.dtype.type, numbers.Integral): 523 | raise ValueError( 524 | "No valid specification of categorical columns. " 525 | "Only a scalar, list or array-like of integers is allowed.") 526 | 527 | if (cat_features < 0).any() or (cat_features >= n_feat).any(): 528 | raise ValueError( 529 | 'Categorical features must be in [0, {}].'.format( 530 | n_feat - 1)) 531 | 532 | if len(cat_features) == n_feat: 533 | raise ValueError( 534 | "Only categorical features detected. 
" 535 | "No features available for fitting.") 536 | else: 537 | cat_features = [] 538 | self._categorical_features = cat_features 539 | 540 | if self.split_features is not None: 541 | split_features = np.unique(self.split_features) 542 | 543 | if not issubclass(split_features.dtype.type, numbers.Integral): 544 | raise ValueError( 545 | "No valid specification of split_features. " 546 | "Only a scalar, list or array-like of integers is allowed.") 547 | 548 | if (split_features < 0).any() or (split_features >= n_feat).any(): 549 | raise ValueError( 550 | 'Splitting features must be in [0, {}].'.format( 551 | n_feat - 1)) 552 | else: 553 | split_features = np.arange(n_feat) 554 | self._split_features = split_features 555 | 556 | if self.linear_features is not None: 557 | linear_features = np.unique(self.linear_features) 558 | 559 | if not issubclass(linear_features.dtype.type, numbers.Integral): 560 | raise ValueError( 561 | "No valid specification of linear_features. " 562 | "Only a scalar, list or array-like of integers is allowed.") 563 | 564 | if (linear_features < 0).any() or (linear_features >= n_feat).any(): 565 | raise ValueError( 566 | 'Linear features must be in [0, {}].'.format( 567 | n_feat - 1)) 568 | 569 | if np.isin(linear_features, cat_features).any(): 570 | raise ValueError( 571 | "Linear features cannot be categorical features.") 572 | else: 573 | linear_features = np.setdiff1d(np.arange(n_feat), cat_features) 574 | self._linear_features = linear_features 575 | 576 | self._grow(X, y, sample_weight) 577 | 578 | normalizer = np.sum(self.feature_importances_) 579 | if normalizer > 0: 580 | self.feature_importances_ /= normalizer 581 | 582 | return self 583 | 584 | def summary(self, feature_names=None, only_leaves=False, max_depth=None): 585 | """Return a summary of nodes created from model fitting. 586 | 587 | Parameters 588 | ---------- 589 | feature_names : array-like of shape (n_features, ), default=None 590 | Names of each of the features. If None, generic names 591 | will be used (“X[0]”, “X[1]”, …). 592 | 593 | only_leaves : bool, default=False 594 | Store only information of leaf nodes. 595 | 596 | max_depth : int, default=None 597 | The maximum depth of the representation. If None, the tree 598 | is fully generated. 599 | 600 | Returns 601 | ------- 602 | summary : nested dict 603 | The keys are the integer map of each node. 604 | The values are dicts containing information for that node: 605 | 606 | - 'col' (^): column used for splitting; 607 | - 'th' (^): threshold value used for splitting in the 608 | selected column; 609 | - 'loss': loss computed at node level. Weighted sum of 610 | children' losses if it is a splitting node; 611 | - 'samples': number of samples in the node. Sum of children' 612 | samples if it is a split node; 613 | - 'children' (^): integer mapping of possible children nodes; 614 | - 'models': fitted linear models built in each split. 615 | Single model if it is leaf node; 616 | - 'classes' (^^): target classes detected in the split. 617 | Available only for LinearTreeClassifier. 618 | 619 | (^): Only for split nodes. 620 | (^^): Only for leaf nodes. 
621 | """ 622 | check_is_fitted(self, attributes='_nodes') 623 | 624 | if max_depth is None: 625 | max_depth = 20 626 | if max_depth < 1: 627 | raise ValueError( 628 | "max_depth must be > 0, got {}".format(max_depth)) 629 | 630 | summary = {} 631 | 632 | if len(self._nodes) > 0 and not only_leaves: 633 | 634 | if (feature_names is not None and 635 | len(feature_names) != self.n_features_in_): 636 | raise ValueError( 637 | "feature_names must contain {} elements, got {}".format( 638 | self.n_features_in_, len(feature_names))) 639 | 640 | if feature_names is None: 641 | feature_names = np.arange(self.n_features_in_) 642 | 643 | for n, N in self._nodes.items(): 644 | 645 | if len(n) >= max_depth: 646 | continue 647 | 648 | cl, cr = N.children 649 | Cl = (self._nodes[cl] if cl in self._nodes 650 | else self._leaves[cl]) 651 | Cr = (self._nodes[cr] if cr in self._nodes 652 | else self._leaves[cr]) 653 | 654 | summary[N.id] = { 655 | 'col': feature_names[Cl.threshold[-1][0]], 656 | 'th': round(Cl.threshold[-1][-1], 5), 657 | 'loss': round(Cl.w_loss + Cr.w_loss, 5), 658 | 'samples': Cl.n_samples + Cr.n_samples, 659 | 'children': (Cl.id, Cr.id), 660 | 'models': (Cl.model, Cr.model) 661 | } 662 | 663 | for l, L in self._leaves.items(): 664 | 665 | if len(l) > max_depth: 666 | continue 667 | 668 | summary[L.id] = { 669 | 'loss': round(L.loss, 5), 670 | 'samples': L.n_samples, 671 | 'models': L.model 672 | } 673 | 674 | if hasattr(self, 'classes_'): 675 | summary[L.id]['classes'] = L.classes 676 | 677 | return summary 678 | 679 | def apply(self, X): 680 | """Return the index of the leaf that each sample is predicted as. 681 | 682 | Parameters 683 | ---------- 684 | X : array-like of shape (n_samples, n_features) 685 | Samples. 686 | 687 | Returns 688 | ------- 689 | X_leaves : array-like of shape (n_samples, ) 690 | For each datapoint x in X, return the index of the leaf x 691 | ends up in. Leaves are numbered within 692 | ``[0; n_nodes)``, possibly with gaps in the 693 | numbering. 694 | """ 695 | check_is_fitted(self, attributes='_nodes') 696 | 697 | X = self._validate_data( 698 | X, 699 | reset=False, 700 | accept_sparse=False, 701 | dtype='float32', 702 | force_all_finite=True, 703 | ensure_2d=True, 704 | allow_nd=False, 705 | ensure_min_features=self.n_features_in_ 706 | ) 707 | 708 | X_leaves = np.zeros(X.shape[0], dtype='int64') 709 | 710 | for L in self._leaves.values(): 711 | 712 | mask = _predict_branch(X, L.threshold) 713 | if (~mask).all(): 714 | continue 715 | 716 | X_leaves[mask] = L.id 717 | 718 | return X_leaves 719 | 720 | def decision_path(self, X): 721 | """Return the decision path in the tree. 722 | 723 | Parameters 724 | ---------- 725 | X : array-like of shape (n_samples, n_features) 726 | Samples. 727 | 728 | Returns 729 | ------- 730 | indicator : sparse matrix of shape (n_samples, n_nodes) 731 | Return a node indicator CSR matrix where non zero elements 732 | indicates that the samples goes through the nodes. 
733 | """ 734 | check_is_fitted(self, attributes='_nodes') 735 | 736 | X = self._validate_data( 737 | X, 738 | reset=False, 739 | accept_sparse=False, 740 | dtype='float32', 741 | force_all_finite=True, 742 | ensure_2d=True, 743 | allow_nd=False, 744 | ensure_min_features=self.n_features_in_ 745 | ) 746 | 747 | indicator = np.zeros((X.shape[0], self.node_count), dtype='int64') 748 | 749 | for L in self._leaves.values(): 750 | 751 | mask = _predict_branch(X, L.threshold) 752 | if (~mask).all(): 753 | continue 754 | 755 | n = L.id 756 | p = L.parent 757 | paths_id = [n] 758 | 759 | while p is not None: 760 | n = self._nodes[p].id 761 | p = self._nodes[p].parent 762 | paths_id.append(n) 763 | 764 | indicator[np.ix_(mask, paths_id)] = 1 765 | 766 | return sp.csr_matrix(indicator) 767 | 768 | def model_to_dot(self, feature_names=None, max_depth=None): 769 | """Convert a fitted Linear Tree model to dot format. 770 | It results in ModuleNotFoundError if graphviz or pydot are not available. 771 | When installing graphviz make sure to add it to the system path. 772 | 773 | Parameters 774 | ---------- 775 | feature_names : array-like of shape (n_features, ), default=None 776 | Names of each of the features. If None, generic names 777 | will be used (“X[0]”, “X[1]”, …). 778 | 779 | max_depth : int, default=None 780 | The maximum depth of the representation. If None, the tree 781 | is fully generated. 782 | 783 | Returns 784 | ------- 785 | graph : pydot.Dot instance 786 | Return an instance representing the Linear Tree. Splitting nodes have 787 | a rectangular shape while leaf nodes have a circular one. 788 | """ 789 | import pydot 790 | 791 | summary = self.summary(feature_names=feature_names, max_depth=max_depth) 792 | graph = pydot.Dot('linear_tree', graph_type='graph') 793 | 794 | # create nodes 795 | for n in summary: 796 | if 'col' in summary[n]: 797 | if isinstance(summary[n]['col'], str): 798 | msg = "id_node: {}\n{} <= {}\nloss: {:.4f}\nsamples: {}" 799 | else: 800 | msg = "id_node: {}\nX[{}] <= {}\nloss: {:.4f}\nsamples: {}" 801 | 802 | msg = msg.format( 803 | n, summary[n]['col'], summary[n]['th'], 804 | summary[n]['loss'], summary[n]['samples'] 805 | ) 806 | graph.add_node(pydot.Node(n, label=msg, shape='rectangle')) 807 | 808 | for c in summary[n]['children']: 809 | if c not in summary: 810 | graph.add_node(pydot.Node(c, label="...", 811 | shape='rectangle')) 812 | 813 | else: 814 | msg = "id_node: {}\nloss: {:.4f}\nsamples: {}".format( 815 | n, summary[n]['loss'], summary[n]['samples']) 816 | graph.add_node(pydot.Node(n, label=msg)) 817 | 818 | # add edges 819 | for n in summary: 820 | if 'children' in summary[n]: 821 | for c in summary[n]['children']: 822 | graph.add_edge(pydot.Edge(n, c)) 823 | 824 | return graph 825 | 826 | def plot_model(self, feature_names=None, max_depth=None): 827 | """Convert a fitted Linear Tree model to dot format and display it. 828 | It results in ModuleNotFoundError if graphviz or pydot are not available. 829 | When installing graphviz make sure to add it to the system path. 830 | 831 | Parameters 832 | ---------- 833 | feature_names : array-like of shape (n_features, ), default=None 834 | Names of each of the features. If None, generic names 835 | will be used (“X[0]”, “X[1]”, …). 836 | 837 | max_depth : int, default=None 838 | The maximum depth of the representation. If None, the tree 839 | is fully generated. 840 | 841 | Returns 842 | ------- 843 | A Jupyter notebook Image object if Jupyter is installed. 
844 | This enables in-line display of the model plots in notebooks. 845 | Splitting nodes have a rectangular shape while leaf nodes 846 | have a circular one. 847 | """ 848 | from IPython.display import Image 849 | 850 | graph = self.model_to_dot(feature_names=feature_names, max_depth=max_depth) 851 | 852 | return Image(graph.create_png()) 853 | 854 | 855 | class _LinearBoosting(TransformerMixin, BaseEstimator): 856 | """Base class for Linear Boosting meta-estimator. 857 | 858 | Warning: This class should not be used directly. Use derived classes 859 | instead. 860 | """ 861 | def __init__(self, base_estimator, *, loss, n_estimators, 862 | max_depth, min_samples_split, min_samples_leaf, 863 | min_weight_fraction_leaf, max_features, 864 | random_state, max_leaf_nodes, 865 | min_impurity_decrease, ccp_alpha): 866 | 867 | self.base_estimator = base_estimator 868 | self.loss = loss 869 | self.n_estimators = n_estimators 870 | self.max_depth = max_depth 871 | self.min_samples_split = min_samples_split 872 | self.min_samples_leaf = min_samples_leaf 873 | self.min_weight_fraction_leaf = min_weight_fraction_leaf 874 | self.max_features = max_features 875 | self.random_state = random_state 876 | self.max_leaf_nodes = max_leaf_nodes 877 | self.min_impurity_decrease = min_impurity_decrease 878 | self.ccp_alpha = ccp_alpha 879 | 880 | def _fit(self, X, y, sample_weight=None): 881 | """Build a Linear Boosting from the training set (X, y). 882 | 883 | Parameters 884 | ---------- 885 | X : array-like of shape (n_samples, n_features) 886 | The training input samples. 887 | 888 | y : array-like of shape (n_samples, ) or also (n_samples, n_targets) for 889 | multitarget regression. 890 | The target values (class labels in classification, real numbers in 891 | regression). 892 | 893 | sample_weight : array-like of shape (n_samples, ), default=None 894 | Sample weights. 895 | 896 | Returns 897 | ------- 898 | self : object 899 | """ 900 | if not hasattr(self.base_estimator, 'fit_intercept'): 901 | raise ValueError("Only linear models are accepted as base_estimator. 
" 902 | "Select one from linear_model class of scikit-learn.") 903 | 904 | if self.n_estimators <= 0: 905 | raise ValueError("n_estimators must be an integer greater than 0 but " 906 | "got {}".format(self.n_estimators)) 907 | 908 | n_sample, self.n_features_in_ = X.shape 909 | 910 | self._trees = [] 911 | self._leaves = [] 912 | 913 | for i in range(self.n_estimators): 914 | 915 | estimator = deepcopy(self.base_estimator) 916 | estimator.fit(X, y, sample_weight=sample_weight) 917 | 918 | if self.loss == 'entropy': 919 | pred = estimator.predict_proba(X) 920 | else: 921 | pred = estimator.predict(X) 922 | 923 | if hasattr(self, 'classes_'): 924 | resid = SCORING[self.loss](y, pred, self.classes_) 925 | else: 926 | resid = SCORING[self.loss](y, pred) 927 | 928 | if resid.ndim > 1: 929 | resid = resid.mean(1) 930 | 931 | criterion = 'squared_error' if _sklearn_v1 else 'mse' 932 | 933 | tree = DecisionTreeRegressor( 934 | criterion=criterion, max_depth=self.max_depth, 935 | min_samples_split=self.min_samples_split, 936 | min_samples_leaf=self.min_samples_leaf, 937 | min_weight_fraction_leaf=self.min_weight_fraction_leaf, 938 | max_features=self.max_features, 939 | random_state=self.random_state, 940 | max_leaf_nodes=self.max_leaf_nodes, 941 | min_impurity_decrease=self.min_impurity_decrease, 942 | ccp_alpha=self.ccp_alpha 943 | ) 944 | 945 | tree.fit(X, resid, sample_weight=sample_weight, check_input=False) 946 | self._trees.append(tree) 947 | 948 | pred_tree = np.abs(tree.predict(X, check_input=False)) 949 | worst_pred = np.max(pred_tree) 950 | self._leaves.append(worst_pred) 951 | 952 | pred_tree = (pred_tree == worst_pred).astype(np.float32) 953 | pred_tree = pred_tree.reshape(-1, 1) 954 | X = np.concatenate([X, pred_tree], axis=1) 955 | 956 | self.base_estimator_ = deepcopy(self.base_estimator) 957 | self.base_estimator_.fit(X, y, sample_weight=sample_weight) 958 | 959 | if hasattr(self.base_estimator_, 'coef_'): 960 | self.coef_ = self.base_estimator_.coef_ 961 | 962 | if hasattr(self.base_estimator_, 'intercept_'): 963 | self.intercept_ = self.base_estimator_.intercept_ 964 | 965 | self.n_features_out_ = X.shape[1] 966 | 967 | return self 968 | 969 | def transform(self, X): 970 | """Transform dataset. 971 | 972 | Parameters 973 | ---------- 974 | X : array-like of shape (n_samples, n_features) 975 | Input data to be transformed. Use ``dtype=np.float32`` for maximum 976 | efficiency. 977 | 978 | Returns 979 | ------- 980 | X_transformed : ndarray of shape (n_samples, n_out) 981 | Transformed dataset. 982 | `n_out` is equal to `n_features` + `n_estimators` 983 | """ 984 | check_is_fitted(self, attributes='base_estimator_') 985 | 986 | X = self._validate_data( 987 | X, 988 | reset=False, 989 | accept_sparse=False, 990 | dtype='float32', 991 | force_all_finite=True, 992 | ensure_2d=True, 993 | allow_nd=False, 994 | ensure_min_features=self.n_features_in_ 995 | ) 996 | 997 | for tree, leaf in zip(self._trees, self._leaves): 998 | pred_tree = np.abs(tree.predict(X, check_input=False)) 999 | pred_tree = (pred_tree == leaf).astype(np.float32) 1000 | pred_tree = pred_tree.reshape(-1, 1) 1001 | X = np.concatenate([X, pred_tree], axis=1) 1002 | 1003 | return X 1004 | 1005 | 1006 | class _LinearForest(BaseEstimator): 1007 | """Base class for Linear Forest meta-estimator. 1008 | 1009 | Warning: This class should not be used directly. Use derived classes 1010 | instead. 
1011 | """ 1012 | def __init__(self, base_estimator, *, n_estimators, max_depth, 1013 | min_samples_split, min_samples_leaf, min_weight_fraction_leaf, 1014 | max_features, max_leaf_nodes, min_impurity_decrease, 1015 | bootstrap, oob_score, n_jobs, random_state, 1016 | ccp_alpha, max_samples): 1017 | 1018 | self.base_estimator = base_estimator 1019 | self.n_estimators = n_estimators 1020 | self.max_depth = max_depth 1021 | self.min_samples_split = min_samples_split 1022 | self.min_samples_leaf = min_samples_leaf 1023 | self.min_weight_fraction_leaf = min_weight_fraction_leaf 1024 | self.max_features = max_features 1025 | self.max_leaf_nodes = max_leaf_nodes 1026 | self.min_impurity_decrease = min_impurity_decrease 1027 | self.bootstrap = bootstrap 1028 | self.oob_score = oob_score 1029 | self.n_jobs = n_jobs 1030 | self.random_state = random_state 1031 | self.ccp_alpha = ccp_alpha 1032 | self.max_samples = max_samples 1033 | 1034 | def _sigmoid(self, y): 1035 | """Expit function (a.k.a. logistic sigmoid). 1036 | 1037 | Parameters 1038 | ---------- 1039 | y : array-like of shape (n_samples, ) 1040 | The array to apply expit to element-wise. 1041 | 1042 | Returns 1043 | ------- 1044 | y : array-like of shape (n_samples, ) 1045 | Expits. 1046 | """ 1047 | return np.exp(y) / (1 + np.exp(y)) 1048 | 1049 | def _inv_sigmoid(self, y): 1050 | """Logit function. 1051 | 1052 | Parameters 1053 | ---------- 1054 | y : array-like of shape (n_samples, ) 1055 | The array to apply logit to element-wise. 1056 | 1057 | Returns 1058 | ------- 1059 | y : array-like of shape (n_samples, ) 1060 | Logits. 1061 | """ 1062 | y = y.clip(1e-3, 1 - 1e-3) 1063 | 1064 | return np.log(y / (1 - y)) 1065 | 1066 | def _fit(self, X, y, sample_weight=None): 1067 | """Build a Linear Boosting from the training set (X, y). 1068 | 1069 | Parameters 1070 | ---------- 1071 | X : array-like of shape (n_samples, n_features) 1072 | The training input samples. 1073 | 1074 | y : array-like of shape (n_samples, ) or also (n_samples, n_targets) for 1075 | multitarget regression. 1076 | The target values (class labels in classification, real numbers in 1077 | regression). 1078 | 1079 | sample_weight : array-like of shape (n_samples, ), default=None 1080 | Sample weights. 1081 | 1082 | Returns 1083 | ------- 1084 | self : object 1085 | """ 1086 | if not hasattr(self.base_estimator, 'fit_intercept'): 1087 | raise ValueError("Only linear models are accepted as base_estimator. 
" 1088 | "Select one from linear_model class of scikit-learn.") 1089 | 1090 | if not is_regressor(self.base_estimator): 1091 | raise ValueError("Select a regressor linear model as base_estimator.") 1092 | 1093 | n_sample, self.n_features_in_ = X.shape 1094 | 1095 | if hasattr(self, 'classes_'): 1096 | class_to_int = dict(map(reversed, enumerate(self.classes_))) 1097 | y = np.array([class_to_int[i] for i in y]) 1098 | y = self._inv_sigmoid(y) 1099 | 1100 | self.base_estimator_ = deepcopy(self.base_estimator) 1101 | self.base_estimator_.fit(X, y, sample_weight) 1102 | resid = y - self.base_estimator_.predict(X) 1103 | 1104 | criterion = 'squared_error' if _sklearn_v1 else 'mse' 1105 | 1106 | self.forest_estimator_ = RandomForestRegressor( 1107 | n_estimators=self.n_estimators, 1108 | criterion=criterion, 1109 | max_depth=self.max_depth, 1110 | min_samples_split=self.min_samples_split, 1111 | min_samples_leaf=self.min_samples_leaf, 1112 | min_weight_fraction_leaf=self.min_weight_fraction_leaf, 1113 | max_features=self.max_features, 1114 | max_leaf_nodes=self.max_leaf_nodes, 1115 | min_impurity_decrease=self.min_impurity_decrease, 1116 | bootstrap=self.bootstrap, 1117 | oob_score=self.oob_score, 1118 | n_jobs=self.n_jobs, 1119 | random_state=self.random_state, 1120 | ccp_alpha=self.ccp_alpha, 1121 | max_samples=self.max_samples 1122 | ) 1123 | self.forest_estimator_.fit(X, resid, sample_weight) 1124 | 1125 | if hasattr(self.base_estimator_, 'coef_'): 1126 | self.coef_ = self.base_estimator_.coef_ 1127 | 1128 | if hasattr(self.base_estimator_, 'intercept_'): 1129 | self.intercept_ = self.base_estimator_.intercept_ 1130 | 1131 | self.feature_importances_ = self.forest_estimator_.feature_importances_ 1132 | 1133 | return self 1134 | 1135 | def apply(self, X): 1136 | """Apply trees in the forest to X, return leaf indices. 1137 | 1138 | Parameters 1139 | ---------- 1140 | X : array-like of shape (n_samples, n_features) 1141 | The input samples. 1142 | 1143 | Returns 1144 | ------- 1145 | X_leaves : ndarray of shape (n_samples, n_estimators) 1146 | For each datapoint x in X and for each tree in the forest, 1147 | return the index of the leaf x ends up in. 1148 | """ 1149 | check_is_fitted(self, attributes='base_estimator_') 1150 | 1151 | return self.forest_estimator_.apply(X) 1152 | 1153 | def decision_path(self, X): 1154 | """Return the decision path in the forest. 1155 | 1156 | Parameters 1157 | ---------- 1158 | X : array-like of shape (n_samples, n_features) 1159 | The input samples. 1160 | 1161 | Returns 1162 | ------- 1163 | indicator : sparse matrix of shape (n_samples, n_nodes) 1164 | Return a node indicator matrix where non zero elements indicates 1165 | that the samples goes through the nodes. The matrix is of CSR 1166 | format. 1167 | 1168 | n_nodes_ptr : ndarray of shape (n_estimators + 1, ) 1169 | The columns from indicator[n_nodes_ptr[i]:n_nodes_ptr[i+1]] 1170 | gives the indicator value for the i-th estimator. 
1171 | """ 1172 | check_is_fitted(self, attributes='base_estimator_') 1173 | 1174 | return self.forest_estimator_.decision_path(X) -------------------------------------------------------------------------------- /lineartree/_criterion.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | 4 | SCORING = { 5 | 'linear': lambda y, yh: y - yh, 6 | 'square': lambda y, yh: np.square(y - yh), 7 | 'absolute': lambda y, yh: np.abs(y - yh), 8 | 'exponential': lambda y, yh: 1 - np.exp(-np.abs(y - yh)), 9 | 'poisson': lambda y, yh: yh.clip(1e-6) - y * np.log(yh.clip(1e-6)), 10 | 'hamming': lambda y, yh, classes: (y != yh).astype(int), 11 | 'entropy': lambda y, yh, classes: np.sum(list(map( 12 | lambda c: -(y == c[1]).astype(int) * np.log(yh[:, c[0]]), 13 | enumerate(classes))), axis=0) 14 | } 15 | 16 | 17 | def _normalize_score(scores, weights=None): 18 | """Normalize scores according to weights""" 19 | 20 | if weights is None: 21 | return scores.mean() 22 | else: 23 | return np.mean(np.dot(scores.T, weights) / weights.sum()) 24 | 25 | 26 | def mse(model, X, y, weights=None, **largs): 27 | """Mean Squared Error""" 28 | 29 | pred = model.predict(X) 30 | scores = SCORING['square'](y, pred) 31 | 32 | return _normalize_score(scores, weights) 33 | 34 | 35 | def rmse(model, X, y, weights=None, **largs): 36 | """Root Mean Squared Error""" 37 | 38 | return np.sqrt(mse(model, X, y, weights, **largs)) 39 | 40 | 41 | def mae(model, X, y, weights=None, **largs): 42 | """Mean Absolute Error""" 43 | 44 | pred = model.predict(X) 45 | scores = SCORING['absolute'](y, pred) 46 | 47 | return _normalize_score(scores, weights) 48 | 49 | 50 | def poisson(model, X, y, weights=None, **largs): 51 | """Poisson Loss""" 52 | 53 | if np.any(y < 0): 54 | raise ValueError("Some value(s) of y are negative which is" 55 | " not allowed for Poisson regression.") 56 | 57 | pred = model.predict(X) 58 | scores = SCORING['poisson'](y, pred) 59 | 60 | return _normalize_score(scores, weights) 61 | 62 | 63 | def hamming(model, X, y, weights=None, **largs): 64 | """Hamming Loss""" 65 | 66 | pred = model.predict(X) 67 | scores = SCORING['hamming'](y, pred, None) 68 | 69 | return _normalize_score(scores, weights) 70 | 71 | 72 | def crossentropy(model, X, y, classes, weights=None, **largs): 73 | """Cross Entropy Loss""" 74 | 75 | pred = model.predict_proba(X).clip(1e-5, 1 - 1e-5) 76 | scores = SCORING['entropy'](y, pred, classes) 77 | 78 | return _normalize_score(scores, weights) -------------------------------------------------------------------------------- /lineartree/lineartree.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | from sklearn.base import ClassifierMixin, RegressorMixin 4 | from sklearn.utils.validation import check_is_fitted, _check_sample_weight 5 | 6 | from ._classes import _predict_branch 7 | from ._classes import _LinearTree, _LinearBoosting, _LinearForest 8 | 9 | 10 | class LinearTreeRegressor(_LinearTree, RegressorMixin): 11 | """A Linear Tree Regressor. 12 | 13 | A Linear Tree Regressor is a meta-estimator that combine the learning 14 | ability of Decision Tree and the predictive power of Linear Models. 15 | Like in tree-based algorithms, the received data are splitted according 16 | simple decision rules. The goodness of slits is evaluated in gain terms 17 | fitting linear models in each node. 
This implies that the models in the 18 | leaves are linear instead of constant approximations like in classical 19 | Decision Tree. 20 | 21 | Parameters 22 | ---------- 23 | base_estimator : object 24 | The base estimator to fit on dataset splits. 25 | The base estimator must be a sklearn.linear_model. 26 | 27 | criterion : {"mse", "rmse", "mae", "poisson"}, default="mse" 28 | The function to measure the quality of a split. "poisson" 29 | requires ``y >= 0``. 30 | 31 | max_depth : int, default=5 32 | The maximum depth of the tree considering only the splitting nodes. 33 | A higher value implies a higher training time. 34 | 35 | min_samples_split : int or float, default=6 36 | The minimum number of samples required to split an internal node. 37 | The minimum valid number of samples in each node is 6. 38 | A lower value implies a higher training time. 39 | - If int, then consider `min_samples_split` as the minimum number. 40 | - If float, then `min_samples_split` is a fraction and 41 | `ceil(min_samples_split * n_samples)` are the minimum 42 | number of samples for each split. 43 | 44 | min_samples_leaf : int or float, default=0.1 45 | The minimum number of samples required to be at a leaf node. 46 | A split point at any depth will only be considered if it leaves at 47 | least `min_samples_leaf` training samples in each of the left and 48 | right branches. 49 | The minimum valid number of samples in each leaf is 3. 50 | A lower value implies a higher training time. 51 | - If int, then consider `min_samples_leaf` as the minimum number. 52 | - If float, then `min_samples_leaf` is a fraction and 53 | `ceil(min_samples_leaf * n_samples)` are the minimum 54 | number of samples for each node. 55 | 56 | max_bins : int, default=25 57 | The maximum number of bins to use to search the optimal split in each 58 | feature. Features with a small number of unique values may use less than 59 | ``max_bins`` bins. Must be lower than 120 and larger than 10. 60 | A higher value implies a higher training time. 61 | 62 | min_impurity_decrease : float, default=0.0 63 | A node will be split if this split induces a decrease of the impurity 64 | greater than or equal to this value. 65 | 66 | categorical_features : int or array-like of int, default=None 67 | Indicates the categorical features. 68 | All categorical indices must be in `[0, n_features)`. 69 | Categorical features are used for splits but are not used in 70 | model fitting. 71 | More categorical features imply a higher training time. 72 | - None : no feature will be considered categorical. 73 | - integer array-like : integer indices indicating categorical 74 | features. 75 | - integer : integer index indicating a categorical 76 | feature. 77 | 78 | split_features : int or array-like of int, default=None 79 | Defines which features can be used to split on. 80 | All split feature indices must be in `[0, n_features)`. 81 | - None : All features will be used for splitting. 82 | - integer array-like : integer indices indicating splitting features. 83 | - integer : integer index indicating a single splitting feature. 84 | 85 | linear_features : int or array-like of int, default=None 86 | Defines which features are used for the linear model in the leaves. 87 | All linear feature indices must be in `[0, n_features)`. 88 | - None : All features except those in `categorical_features` 89 | will be used in the leaf models. 90 | - integer array-like : integer indices indicating features to 91 | be used in the leaf models. 
92 | - integer : integer index indicating a single feature to be used 93 | in the leaf models. 94 | 95 | n_jobs : int, default=None 96 | The number of jobs to run in parallel for model fitting. 97 | ``None`` means 1 using one processor. ``-1`` means using all 98 | processors. 99 | 100 | Attributes 101 | ---------- 102 | n_features_in_ : int 103 | The number of features when :meth:`fit` is performed. 104 | 105 | feature_importances_ : ndarray of shape (n_features, ) 106 | Normalized total reduction of criteria by splitting features. 107 | 108 | n_targets_ : int 109 | The number of targets when :meth:`fit` is performed. 110 | 111 | Examples 112 | -------- 113 | >>> from sklearn.linear_model import LinearRegression 114 | >>> from lineartree import LinearTreeRegressor 115 | >>> from sklearn.datasets import make_regression 116 | >>> X, y = make_regression(n_samples=100, n_features=4, 117 | ... n_informative=2, n_targets=1, 118 | ... random_state=0, shuffle=False) 119 | >>> regr = LinearTreeRegressor(base_estimator=LinearRegression()) 120 | >>> regr.fit(X, y) 121 | >>> regr.predict([[0, 0, 0, 0]]) 122 | array([8.8817842e-16]) 123 | """ 124 | def __init__(self, base_estimator, *, criterion='mse', max_depth=5, 125 | min_samples_split=6, min_samples_leaf=0.1, max_bins=25, 126 | min_impurity_decrease=0.0, categorical_features=None, 127 | split_features=None, linear_features=None, n_jobs=None): 128 | 129 | self.base_estimator = base_estimator 130 | self.criterion = criterion 131 | self.max_depth = max_depth 132 | self.min_samples_split = min_samples_split 133 | self.min_samples_leaf = min_samples_leaf 134 | self.max_bins = max_bins 135 | self.min_impurity_decrease = min_impurity_decrease 136 | self.categorical_features = categorical_features 137 | self.split_features = split_features 138 | self.linear_features = linear_features 139 | self.n_jobs = n_jobs 140 | 141 | def fit(self, X, y, sample_weight=None): 142 | """Build a Linear Tree of a linear estimator from the training 143 | set (X, y). 144 | 145 | Parameters 146 | ---------- 147 | X : array-like of shape (n_samples, n_features) 148 | The training input samples. 149 | 150 | y : array-like of shape (n_samples, ) or (n_samples, n_targets) 151 | Target values. 152 | 153 | sample_weight : array-like of shape (n_samples, ), default=None 154 | Sample weights. If None, then samples are equally weighted. 155 | Note that if the base estimator does not support sample weighting, 156 | the sample weights are still used to evaluate the splits. 157 | 158 | Returns 159 | ------- 160 | self : object 161 | """ 162 | reg_criterions = ('mse', 'rmse', 'mae', 'poisson') 163 | 164 | if self.criterion not in reg_criterions: 165 | raise ValueError("Regression tasks support only criterion in {}, " 166 | "got '{}'.".format(reg_criterions, self.criterion)) 167 | 168 | # Convert data (X is required to be 2d and indexable) 169 | X, y = self._validate_data( 170 | X, y, 171 | reset=True, 172 | accept_sparse=False, 173 | dtype='float32', 174 | force_all_finite=True, 175 | ensure_2d=True, 176 | allow_nd=False, 177 | multi_output=True, 178 | y_numeric=True, 179 | ) 180 | if sample_weight is not None: 181 | sample_weight = _check_sample_weight(sample_weight, X) 182 | 183 | y_shape = np.shape(y) 184 | self.n_targets_ = y_shape[1] if len(y_shape) > 1 else 1 185 | if self.n_targets_ < 2: 186 | y = y.ravel() 187 | self._fit(X, y, sample_weight) 188 | 189 | return self 190 | 191 | def predict(self, X): 192 | """Predict regression target for X. 
193 | 194 | Parameters 195 | ---------- 196 | X : array-like of shape (n_samples, n_features) 197 | Samples. 198 | 199 | Returns 200 | ------- 201 | pred : ndarray of shape (n_samples, ) or (n_samples, n_targets) for 202 | multitarget regression. 203 | The predicted values. 204 | """ 205 | check_is_fitted(self, attributes='_nodes') 206 | 207 | X = self._validate_data( 208 | X, 209 | reset=False, 210 | accept_sparse=False, 211 | dtype='float32', 212 | force_all_finite=True, 213 | ensure_2d=True, 214 | allow_nd=False, 215 | ensure_min_features=self.n_features_in_ 216 | ) 217 | 218 | if self.n_targets_ > 1: 219 | pred = np.zeros((X.shape[0], self.n_targets_)) 220 | else: 221 | pred = np.zeros(X.shape[0]) 222 | 223 | for L in self._leaves.values(): 224 | 225 | mask = _predict_branch(X, L.threshold) 226 | if (~mask).all(): 227 | continue 228 | 229 | pred[mask] = L.model.predict(X[np.ix_(mask, self._linear_features)]) 230 | 231 | return pred 232 | 233 | 234 | class LinearTreeClassifier(_LinearTree, ClassifierMixin): 235 | """A Linear Tree Classifier. 236 | 237 | A Linear Tree Classifier is a meta-estimator that combines the learning 238 | ability of Decision Trees and the predictive power of Linear Models. 239 | Like in tree-based algorithms, the received data are split according to 240 | simple decision rules. The goodness of splits is evaluated in terms of the 241 | gain obtained by fitting linear models in each node. This implies that the models in the 242 | leaves are linear instead of constant approximations like in classical 243 | Decision Trees. 244 | 245 | Parameters 246 | ---------- 247 | base_estimator : object 248 | The base estimator to fit on dataset splits. 249 | The base estimator must be a sklearn.linear_model. 250 | The selected base estimator is automatically substituted by a 251 | `~sklearn.dummy.DummyClassifier` when a dataset split 252 | contains a single class label. 253 | 254 | criterion : {"hamming", "crossentropy"}, default="hamming" 255 | The function to measure the quality of a split. `"crossentropy"` 256 | can be used only if `base_estimator` has a `predict_proba` method. 257 | 258 | max_depth : int, default=5 259 | The maximum depth of the tree considering only the splitting nodes. 260 | A higher value implies a higher training time. 261 | 262 | min_samples_split : int or float, default=6 263 | The minimum number of samples required to split an internal node. 264 | The minimum valid number of samples in each node is 6. 265 | A lower value implies a higher training time. 266 | - If int, then consider `min_samples_split` as the minimum number. 267 | - If float, then `min_samples_split` is a fraction and 268 | `ceil(min_samples_split * n_samples)` are the minimum 269 | number of samples for each split. 270 | 271 | min_samples_leaf : int or float, default=0.1 272 | The minimum number of samples required to be at a leaf node. 273 | A split point at any depth will only be considered if it leaves at 274 | least `min_samples_leaf` training samples in each of the left and 275 | right branches. 276 | The minimum valid number of samples in each leaf is 3. 277 | A lower value implies a higher training time. 278 | - If int, then consider `min_samples_leaf` as the minimum number. 279 | - If float, then `min_samples_leaf` is a fraction and 280 | `ceil(min_samples_leaf * n_samples)` are the minimum 281 | number of samples for each node. 282 | 283 | max_bins : int, default=25 284 | The maximum number of bins to use to search the optimal split in each 285 | feature.
Features with a small number of unique values may use less than 286 | ``max_bins`` bins. Must be lower than 120 and larger than 10. 287 | A higher value implies a higher training time. 288 | 289 | min_impurity_decrease : float, default=0.0 290 | A node will be split if this split induces a decrease of the impurity 291 | greater than or equal to this value. 292 | 293 | categorical_features : int or array-like of int, default=None 294 | Indicates the categorical features. 295 | All categorical indices must be in `[0, n_features)`. 296 | Categorical features are used for splits but are not used in 297 | model fitting. 298 | More categorical features imply a higher training time. 299 | - None : no feature will be considered categorical. 300 | - integer array-like : integer indices indicating categorical 301 | features. 302 | - integer : integer index indicating a categorical 303 | feature. 304 | 305 | split_features : int or array-like of int, default=None 306 | Defines which features can be used to split on. 307 | All split feature indices must be in `[0, n_features)`. 308 | - None : All features will be used for splitting. 309 | - integer array-like : integer indices indicating splitting features. 310 | - integer : integer index indicating a single splitting feature. 311 | 312 | linear_features : int or array-like of int, default=None 313 | Defines which features are used for the linear model in the leaves. 314 | All linear feature indices must be in `[0, n_features)`. 315 | - None : All features except those in `categorical_features` 316 | will be used in the leaf models. 317 | - integer array-like : integer indices indicating features to 318 | be used in the leaf models. 319 | - integer : integer index indicating a single feature to be used 320 | in the leaf models. 321 | 322 | n_jobs : int, default=None 323 | The number of jobs to run in parallel for model fitting. 324 | ``None`` means 1 using one processor. ``-1`` means using all 325 | processors. 326 | 327 | Attributes 328 | ---------- 329 | n_features_in_ : int 330 | The number of features when :meth:`fit` is performed. 331 | 332 | feature_importances_ : ndarray of shape (n_features, ) 333 | Normalized total reduction of criteria by splitting features. 334 | 335 | classes_ : ndarray of shape (n_classes, ) 336 | A list of class labels known to the classifier. 337 | 338 | Examples 339 | -------- 340 | >>> from sklearn.linear_model import RidgeClassifier 341 | >>> from lineartree import LinearTreeClassifier 342 | >>> from sklearn.datasets import make_classification 343 | >>> X, y = make_classification(n_samples=100, n_features=4, 344 | ... n_informative=2, n_redundant=0, 345 | ... 
random_state=0, shuffle=False) 346 | >>> clf = LinearTreeClassifier(base_estimator=RidgeClassifier()) 347 | >>> clf.fit(X, y) 348 | >>> clf.predict([[0, 0, 0, 0]]) 349 | array([1]) 350 | """ 351 | def __init__(self, base_estimator, *, criterion='hamming', max_depth=5, 352 | min_samples_split=6, min_samples_leaf=0.1, max_bins=25, 353 | min_impurity_decrease=0.0, categorical_features=None, 354 | split_features=None, linear_features=None, n_jobs=None): 355 | 356 | self.base_estimator = base_estimator 357 | self.criterion = criterion 358 | self.max_depth = max_depth 359 | self.min_samples_split = min_samples_split 360 | self.min_samples_leaf = min_samples_leaf 361 | self.max_bins = max_bins 362 | self.min_impurity_decrease = min_impurity_decrease 363 | self.categorical_features = categorical_features 364 | self.split_features = split_features 365 | self.linear_features = linear_features 366 | self.n_jobs = n_jobs 367 | 368 | def fit(self, X, y, sample_weight=None): 369 | """Build a Linear Tree of a linear estimator from the training 370 | set (X, y). 371 | 372 | Parameters 373 | ---------- 374 | X : array-like of shape (n_samples, n_features) 375 | The training input samples. 376 | 377 | y : array-like of shape (n_samples, ) 378 | Target values. 379 | 380 | sample_weight : array-like of shape (n_samples, ), default=None 381 | Sample weights. If None, then samples are equally weighted. 382 | Note that if the base estimator does not support sample weighting, 383 | the sample weights are still used to evaluate the splits. 384 | 385 | Returns 386 | ------- 387 | self : object 388 | """ 389 | clas_criterions = ('hamming', 'crossentropy') 390 | 391 | if self.criterion not in clas_criterions: 392 | raise ValueError("Classification tasks support only criterion in {}, " 393 | "got '{}'.".format(clas_criterions, self.criterion)) 394 | 395 | if (not hasattr(self.base_estimator, 'predict_proba') and 396 | self.criterion == 'crossentropy'): 397 | raise ValueError("The 'crossentropy' criterion requires a base_estimator " 398 | "with predict_proba method.") 399 | 400 | # Convert data (X is required to be 2d and indexable) 401 | X, y = self._validate_data( 402 | X, y, 403 | reset=True, 404 | accept_sparse=False, 405 | dtype='float32', 406 | force_all_finite=True, 407 | ensure_2d=True, 408 | allow_nd=False, 409 | multi_output=False, 410 | ) 411 | if sample_weight is not None: 412 | sample_weight = _check_sample_weight(sample_weight, X) 413 | 414 | self.classes_ = np.unique(y) 415 | self._fit(X, y, sample_weight) 416 | 417 | return self 418 | 419 | def predict(self, X): 420 | """Predict class for X. 421 | 422 | Parameters 423 | ---------- 424 | X : array-like of shape (n_samples, n_features) 425 | Samples. 426 | 427 | Returns 428 | ------- 429 | pred : ndarray of shape (n_samples, ) 430 | The predicted classes. 431 | """ 432 | check_is_fitted(self, attributes='_nodes') 433 | 434 | X = self._validate_data( 435 | X, 436 | reset=False, 437 | accept_sparse=False, 438 | dtype='float32', 439 | force_all_finite=True, 440 | ensure_2d=True, 441 | allow_nd=False, 442 | ensure_min_features=self.n_features_in_ 443 | ) 444 | 445 | pred = np.empty(X.shape[0], dtype=self.classes_.dtype) 446 | 447 | for L in self._leaves.values(): 448 | 449 | mask = _predict_branch(X, L.threshold) 450 | if (~mask).all(): 451 | continue 452 | 453 | pred[mask] = L.model.predict(X[np.ix_(mask, self._linear_features)]) 454 | 455 | return pred 456 | 457 | def predict_proba(self, X): 458 | """Predict class probabilities for X. 
459 | 460 | If base estimators do not implement a ``predict_proba`` method, 461 | then the one-hot encoding of the predicted class is returned. 462 | 463 | Parameters 464 | ---------- 465 | X : array-like of shape (n_samples, n_features) 466 | Samples. 467 | 468 | Returns 469 | ------- 470 | pred : ndarray of shape (n_samples, n_classes) 471 | The class probabilities of the input samples. The order of the 472 | classes corresponds to that in the attribute :term:`classes_`. 473 | """ 474 | check_is_fitted(self, attributes='_nodes') 475 | 476 | X = self._validate_data( 477 | X, 478 | reset=False, 479 | accept_sparse=False, 480 | dtype='float32', 481 | force_all_finite=True, 482 | ensure_2d=True, 483 | allow_nd=False, 484 | ensure_min_features=self.n_features_in_ 485 | ) 486 | 487 | pred = np.zeros((X.shape[0], len(self.classes_))) 488 | 489 | if hasattr(self.base_estimator, 'predict_proba'): 490 | for L in self._leaves.values(): 491 | 492 | mask = _predict_branch(X, L.threshold) 493 | if (~mask).all(): 494 | continue 495 | 496 | pred[np.ix_(mask, np.isin(self.classes_, L.classes))] = \ 497 | L.model.predict_proba(X[np.ix_(mask, self._linear_features)]) 498 | 499 | else: 500 | pred_class = self.predict(X) 501 | class_to_int = dict(map(reversed, enumerate(self.classes_))) 502 | pred_class = np.array([class_to_int[i] for i in pred_class]) 503 | pred[np.arange(X.shape[0]), pred_class] = 1 504 | 505 | return pred 506 | 507 | def predict_log_proba(self, X): 508 | """Predict class log-probabilities for X. 509 | 510 | If base estimators do not implement a ``predict_log_proba`` method, 511 | then the logarithm of the one-hot encoded predicted class is returned. 512 | 513 | Parameters 514 | ---------- 515 | X : array-like of shape (n_samples, n_features) 516 | Samples. 517 | 518 | Returns 519 | ------- 520 | pred : ndarray of shape (n_samples, n_classes) 521 | The class log-probabilities of the input samples. The order of the 522 | classes corresponds to that in the attribute :term:`classes_`. 523 | """ 524 | return np.log(self.predict_proba(X)) 525 | 526 | 527 | class LinearBoostRegressor(_LinearBoosting, RegressorMixin): 528 | """A Linear Boosting Regressor. 529 | 530 | A Linear Boosting Regressor is an iterative meta-estimator that starts 531 | with a linear regressor, and model the residuals through decision trees. 532 | At each iteration, the path leading to highest error (i.e. the worst leaf) 533 | is added as a new binary variable to the base model. This kind of Linear 534 | Boosting can be considered as an improvement over general linear models 535 | since it enables incorporating non-linear features by residuals modeling. 536 | 537 | Parameters 538 | ---------- 539 | base_estimator : object 540 | The base estimator iteratively fitted. 541 | The base estimator must be a sklearn.linear_model. 542 | 543 | loss : {"linear", "square", "absolute", "exponential"}, default="linear" 544 | The function used to calculate the residuals of each sample. 545 | 546 | n_estimators : int, default=10 547 | The number of boosting stages to perform. It corresponds to the number 548 | of the new features generated. 549 | 550 | max_depth : int, default=3 551 | The maximum depth of the tree. If None, then nodes are expanded until 552 | all leaves are pure or until all leaves contain less than 553 | min_samples_split samples. 
554 | 555 | min_samples_split : int or float, default=2 556 | The minimum number of samples required to split an internal node: 557 | 558 | - If int, then consider `min_samples_split` as the minimum number. 559 | - If float, then `min_samples_split` is a fraction and 560 | `ceil(min_samples_split * n_samples)` are the minimum 561 | number of samples for each split. 562 | 563 | min_samples_leaf : int or float, default=1 564 | The minimum number of samples required to be at a leaf node. 565 | A split point at any depth will only be considered if it leaves at 566 | least ``min_samples_leaf`` training samples in each of the left and 567 | right branches. This may have the effect of smoothing the model, 568 | especially in regression. 569 | 570 | - If int, then consider `min_samples_leaf` as the minimum number. 571 | - If float, then `min_samples_leaf` is a fraction and 572 | `ceil(min_samples_leaf * n_samples)` are the minimum 573 | number of samples for each node. 574 | 575 | min_weight_fraction_leaf : float, default=0.0 576 | The minimum weighted fraction of the sum total of weights (of all 577 | the input samples) required to be at a leaf node. Samples have 578 | equal weight when sample_weight is not provided. 579 | 580 | max_features : int, float or {"auto", "sqrt", "log2"}, default=None 581 | The number of features to consider when looking for the best split: 582 | 583 | - If int, then consider `max_features` features at each split. 584 | - If float, then `max_features` is a fraction and 585 | `int(max_features * n_features)` features are considered at each 586 | split. 587 | - If "auto", then `max_features=n_features`. 588 | - If "sqrt", then `max_features=sqrt(n_features)`. 589 | - If "log2", then `max_features=log2(n_features)`. 590 | - If None, then `max_features=n_features`. 591 | 592 | Note: the search for a split does not stop until at least one 593 | valid partition of the node samples is found, even if it requires to 594 | effectively inspect more than ``max_features`` features. 595 | 596 | random_state : int, RandomState instance or None, default=None 597 | Controls the randomness of the estimator. 598 | 599 | max_leaf_nodes : int, default=None 600 | Grow a tree with ``max_leaf_nodes`` in best-first fashion. 601 | Best nodes are defined as relative reduction in impurity. 602 | If None then unlimited number of leaf nodes. 603 | 604 | min_impurity_decrease : float, default=0.0 605 | A node will be split if this split induces a decrease of the impurity 606 | greater than or equal to this value. 607 | 608 | ccp_alpha : non-negative float, default=0.0 609 | Complexity parameter used for Minimal Cost-Complexity Pruning. The 610 | subtree with the largest cost complexity that is smaller than 611 | ``ccp_alpha`` will be chosen. By default, no pruning is performed. See 612 | :ref:`minimal_cost_complexity_pruning` for details. 613 | 614 | Attributes 615 | ---------- 616 | n_features_in_ : int 617 | The number of features when :meth:`fit` is performed. 618 | 619 | n_features_out_ : int 620 | The total number of features used to fit the base estimator in the 621 | last iteration. The number of output features is equal to the sum 622 | of n_features_in_ and n_estimators. 623 | 624 | coef_ : array of shape (n_features_out_, ) or (n_targets, n_features_out_) 625 | Estimated coefficients for the linear regression problem. 
626 | If multiple targets are passed during the fit (y 2D), this is a 627 | 2D array of shape (n_targets, n_features_out_), while if only one target 628 | is passed, this is a 1D array of length n_features_out_. 629 | 630 | intercept_ : float or array of shape (n_targets, ) 631 | Independent term in the linear model. Set to 0 if `fit_intercept = False` 632 | in `base_estimator` 633 | 634 | Examples 635 | -------- 636 | >>> from sklearn.linear_model import LinearRegression 637 | >>> from lineartree import LinearBoostRegressor 638 | >>> from sklearn.datasets import make_regression 639 | >>> X, y = make_regression(n_samples=100, n_features=4, 640 | ... n_informative=2, n_targets=1, 641 | ... random_state=0, shuffle=False) 642 | >>> regr = LinearBoostRegressor(base_estimator=LinearRegression()) 643 | >>> regr.fit(X, y) 644 | >>> regr.predict([[0, 0, 0, 0]]) 645 | array([8.8817842e-16]) 646 | 647 | References 648 | ---------- 649 | Explainable boosted linear regression for time series forecasting. 650 | Authors: Igor Ilic, Berk Gorgulu, Mucahit Cevik, Mustafa Gokce Baydogan. 651 | (https://arxiv.org/abs/2009.09110) 652 | """ 653 | def __init__(self, base_estimator, *, loss='linear', n_estimators=10, 654 | max_depth=3, min_samples_split=2, min_samples_leaf=1, 655 | min_weight_fraction_leaf=0.0, max_features=None, 656 | random_state=None, max_leaf_nodes=None, 657 | min_impurity_decrease=0.0, ccp_alpha=0.0): 658 | 659 | self.base_estimator = base_estimator 660 | self.loss = loss 661 | self.n_estimators = n_estimators 662 | self.max_depth = max_depth 663 | self.min_samples_split = min_samples_split 664 | self.min_samples_leaf = min_samples_leaf 665 | self.min_weight_fraction_leaf = min_weight_fraction_leaf 666 | self.max_features = max_features 667 | self.random_state = random_state 668 | self.max_leaf_nodes = max_leaf_nodes 669 | self.min_impurity_decrease = min_impurity_decrease 670 | self.ccp_alpha = ccp_alpha 671 | 672 | def fit(self, X, y, sample_weight=None): 673 | """Build a Linear Boosting from the training set (X, y). 674 | 675 | Parameters 676 | ---------- 677 | X : array-like of shape (n_samples, n_features) 678 | The training input samples. 679 | 680 | y : array-like of shape (n_samples, ) or (n_samples, n_targets) 681 | Target values. 682 | 683 | sample_weight : array-like of shape (n_samples, ), default=None 684 | Sample weights. 685 | 686 | Returns 687 | ------- 688 | self : object 689 | """ 690 | reg_losses = ('linear', 'square', 'absolute', 'exponential') 691 | 692 | if self.loss not in reg_losses: 693 | raise ValueError("Regression tasks support only loss in {}, " 694 | "got '{}'.".format(reg_losses, self.loss)) 695 | 696 | # Convert data (X is required to be 2d and indexable) 697 | X, y = self._validate_data( 698 | X, y, 699 | reset=True, 700 | accept_sparse=False, 701 | dtype='float32', 702 | force_all_finite=True, 703 | ensure_2d=True, 704 | allow_nd=False, 705 | multi_output=True, 706 | y_numeric=True, 707 | ) 708 | if sample_weight is not None: 709 | sample_weight = _check_sample_weight(sample_weight, X) 710 | 711 | y_shape = np.shape(y) 712 | n_targets = y_shape[1] if len(y_shape) > 1 else 1 713 | if n_targets < 2: 714 | y = y.ravel() 715 | self._fit(X, y, sample_weight) 716 | 717 | return self 718 | 719 | def predict(self, X): 720 | """Predict regression target for X. 721 | 722 | Parameters 723 | ---------- 724 | X : array-like of shape (n_samples, n_features) 725 | Samples. 
726 | 727 | Returns 728 | ------- 729 | pred : ndarray of shape (n_samples, ) or (n_samples, n_targets) for 730 | multitarget regression. 731 | The predicted values. 732 | """ 733 | check_is_fitted(self, attributes='base_estimator_') 734 | 735 | return self.base_estimator_.predict(self.transform(X)) 736 | 737 | 738 | class LinearBoostClassifier(_LinearBoosting, ClassifierMixin): 739 | """A Linear Boosting Classifier. 740 | 741 | A Linear Boosting Classifier is an iterative meta-estimator that starts 742 | with a linear classifier, and models the residuals through decision trees. 743 | At each iteration, the path leading to the highest error (i.e. the worst leaf) 744 | is added as a new binary variable to the base model. This kind of Linear 745 | Boosting can be considered an improvement over general linear models 746 | since it enables incorporating non-linear features through residual modeling. 747 | 748 | Parameters 749 | ---------- 750 | base_estimator : object 751 | The base estimator iteratively fitted. 752 | The base estimator must be a sklearn.linear_model. 753 | 754 | loss : {"hamming", "entropy"}, default="hamming" 755 | The function used to calculate the residuals of each sample. 756 | `"entropy"` can be used only if `base_estimator` has a `predict_proba` 757 | method. 758 | 759 | n_estimators : int, default=10 760 | The number of boosting stages to perform. It corresponds to the number 761 | of the new features generated. 762 | 763 | max_depth : int, default=3 764 | The maximum depth of the tree. If None, then nodes are expanded until 765 | all leaves are pure or until all leaves contain less than 766 | min_samples_split samples. 767 | 768 | min_samples_split : int or float, default=2 769 | The minimum number of samples required to split an internal node: 770 | 771 | - If int, then consider `min_samples_split` as the minimum number. 772 | - If float, then `min_samples_split` is a fraction and 773 | `ceil(min_samples_split * n_samples)` are the minimum 774 | number of samples for each split. 775 | 776 | min_samples_leaf : int or float, default=1 777 | The minimum number of samples required to be at a leaf node. 778 | A split point at any depth will only be considered if it leaves at 779 | least ``min_samples_leaf`` training samples in each of the left and 780 | right branches. This may have the effect of smoothing the model, 781 | especially in regression. 782 | 783 | - If int, then consider `min_samples_leaf` as the minimum number. 784 | - If float, then `min_samples_leaf` is a fraction and 785 | `ceil(min_samples_leaf * n_samples)` are the minimum 786 | number of samples for each node. 787 | 788 | min_weight_fraction_leaf : float, default=0.0 789 | The minimum weighted fraction of the sum total of weights (of all 790 | the input samples) required to be at a leaf node. Samples have 791 | equal weight when sample_weight is not provided. 792 | 793 | max_features : int, float or {"auto", "sqrt", "log2"}, default=None 794 | The number of features to consider when looking for the best split: 795 | 796 | - If int, then consider `max_features` features at each split. 797 | - If float, then `max_features` is a fraction and 798 | `int(max_features * n_features)` features are considered at each 799 | split. 800 | - If "auto", then `max_features=n_features`. 801 | - If "sqrt", then `max_features=sqrt(n_features)`. 802 | - If "log2", then `max_features=log2(n_features)`. 803 | - If None, then `max_features=n_features`.
804 | 805 | Note: the search for a split does not stop until at least one 806 | valid partition of the node samples is found, even if it requires to 807 | effectively inspect more than ``max_features`` features. 808 | 809 | random_state : int, RandomState instance or None, default=None 810 | Controls the randomness of the estimator. 811 | 812 | max_leaf_nodes : int, default=None 813 | Grow a tree with ``max_leaf_nodes`` in best-first fashion. 814 | Best nodes are defined as relative reduction in impurity. 815 | If None then unlimited number of leaf nodes. 816 | 817 | min_impurity_decrease : float, default=0.0 818 | A node will be split if this split induces a decrease of the impurity 819 | greater than or equal to this value. 820 | 821 | ccp_alpha : non-negative float, default=0.0 822 | Complexity parameter used for Minimal Cost-Complexity Pruning. The 823 | subtree with the largest cost complexity that is smaller than 824 | ``ccp_alpha`` will be chosen. By default, no pruning is performed. See 825 | :ref:`minimal_cost_complexity_pruning` for details. 826 | 827 | Attributes 828 | ---------- 829 | n_features_in_ : int 830 | The number of features when :meth:`fit` is performed. 831 | 832 | n_features_out_ : int 833 | The total number of features used to fit the base estimator in the 834 | last iteration. The number of output features is equal to the sum 835 | of n_features_in_ and n_estimators. 836 | 837 | coef_ : ndarray of shape (1, n_features_out_) or (n_classes, n_features_out_) 838 | Coefficient of the features in the decision function. 839 | 840 | intercept_ : float or array of shape (n_classes, ) 841 | Independent term in the linear model. Set to 0 if `fit_intercept = False` 842 | in `base_estimator` 843 | 844 | classes_ : ndarray of shape (n_classes, ) 845 | A list of class labels known to the classifier. 846 | 847 | Examples 848 | -------- 849 | >>> from sklearn.linear_model import RidgeClassifier 850 | >>> from lineartree import LinearBoostClassifier 851 | >>> from sklearn.datasets import make_classification 852 | >>> X, y = make_classification(n_samples=100, n_features=4, 853 | ... n_informative=2, n_redundant=0, 854 | ... random_state=0, shuffle=False) 855 | >>> clf = LinearBoostClassifier(base_estimator=RidgeClassifier()) 856 | >>> clf.fit(X, y) 857 | >>> clf.predict([[0, 0, 0, 0]]) 858 | array([1]) 859 | 860 | References 861 | ---------- 862 | Explainable boosted linear regression for time series forecasting. 863 | Authors: Igor Ilic, Berk Gorgulu, Mucahit Cevik, Mustafa Gokce Baydogan. 864 | (https://arxiv.org/abs/2009.09110) 865 | """ 866 | def __init__(self, base_estimator, *, loss='hamming', n_estimators=10, 867 | max_depth=3, min_samples_split=2, min_samples_leaf=1, 868 | min_weight_fraction_leaf=0.0, max_features=None, 869 | random_state=None, max_leaf_nodes=None, 870 | min_impurity_decrease=0.0, ccp_alpha=0.0): 871 | 872 | self.base_estimator = base_estimator 873 | self.loss = loss 874 | self.n_estimators = n_estimators 875 | self.max_depth = max_depth 876 | self.min_samples_split = min_samples_split 877 | self.min_samples_leaf = min_samples_leaf 878 | self.min_weight_fraction_leaf = min_weight_fraction_leaf 879 | self.max_features = max_features 880 | self.random_state = random_state 881 | self.max_leaf_nodes = max_leaf_nodes 882 | self.min_impurity_decrease = min_impurity_decrease 883 | self.ccp_alpha = ccp_alpha 884 | 885 | def fit(self, X, y, sample_weight=None): 886 | """Build a Linear Boosting from the training set (X, y). 
887 | 888 | Parameters 889 | ---------- 890 | X : array-like of shape (n_samples, n_features) 891 | The training input samples. 892 | 893 | y : array-like of shape (n_samples, ) 894 | Target values. 895 | 896 | sample_weight : array-like of shape (n_samples, ), default=None 897 | Sample weights. 898 | 899 | Returns 900 | ------- 901 | self : object 902 | """ 903 | clas_losses = ('hamming', 'entropy') 904 | 905 | if self.loss not in clas_losses: 906 | raise ValueError("Classification tasks support only loss in {}, " 907 | "got '{}'.".format(clas_losses, self.loss)) 908 | 909 | if (not hasattr(self.base_estimator, 'predict_proba') and 910 | self.loss == 'entropy'): 911 | raise ValueError("The 'entropy' loss requires a base_estimator " 912 | "with predict_proba method.") 913 | 914 | # Convert data (X is required to be 2d and indexable) 915 | X, y = self._validate_data( 916 | X, y, 917 | reset=True, 918 | accept_sparse=False, 919 | dtype='float32', 920 | force_all_finite=True, 921 | ensure_2d=True, 922 | allow_nd=False, 923 | multi_output=False, 924 | ) 925 | if sample_weight is not None: 926 | sample_weight = _check_sample_weight(sample_weight, X) 927 | 928 | self.classes_ = np.unique(y) 929 | self._fit(X, y, sample_weight) 930 | 931 | return self 932 | 933 | def predict(self, X): 934 | """Predict class for X. 935 | 936 | Parameters 937 | ---------- 938 | X : array-like of shape (n_samples, n_features) 939 | Samples. 940 | 941 | Returns 942 | ------- 943 | pred : ndarray of shape (n_samples, ) 944 | The predicted classes. 945 | """ 946 | check_is_fitted(self, attributes='base_estimator_') 947 | 948 | return self.base_estimator_.predict(self.transform(X)) 949 | 950 | def predict_proba(self, X): 951 | """Predict class probabilities for X. 952 | 953 | If base estimators do not implement a ``predict_proba`` method, 954 | then the one-hot encoding of the predicted class is returned. 955 | 956 | Parameters 957 | ---------- 958 | X : array-like of shape (n_samples, n_features) 959 | Samples. 960 | 961 | Returns 962 | ------- 963 | pred : ndarray of shape (n_samples, n_classes) 964 | The class probabilities of the input samples. The order of the 965 | classes corresponds to that in the attribute :term:`classes_`. 966 | """ 967 | if hasattr(self.base_estimator, 'predict_proba'): 968 | check_is_fitted(self, attributes='base_estimator_') 969 | pred = self.base_estimator_.predict_proba(self.transform(X)) 970 | 971 | else: 972 | pred_class = self.predict(X) 973 | pred = np.zeros((pred_class.shape[0], len(self.classes_))) 974 | class_to_int = dict(map(reversed, enumerate(self.classes_))) 975 | pred_class = np.array([class_to_int[v] for v in pred_class]) 976 | pred[np.arange(pred_class.shape[0]), pred_class] = 1 977 | 978 | return pred 979 | 980 | def predict_log_proba(self, X): 981 | """Predict class log-probabilities for X. 982 | 983 | If base estimators do not implement a ``predict_log_proba`` method, 984 | then the logarithm of the one-hot encoded predicted class is returned. 985 | 986 | Parameters 987 | ---------- 988 | X : array-like of shape (n_samples, n_features) 989 | Samples. 990 | 991 | Returns 992 | ------- 993 | pred : ndarray of shape (n_samples, n_classes) 994 | The class log-probabilities of the input samples. The order of the 995 | classes corresponds to that in the attribute :term:`classes_`. 996 | """ 997 | return np.log(self.predict_proba(X)) 998 | 999 | 1000 | class LinearForestRegressor(_LinearForest, RegressorMixin): 1001 | """"A Linear Forest Regressor. 
1002 | 1003 | Linear forests generalizes the well known random forests by combining 1004 | linear models with the same random forests. 1005 | The key idea of linear forests is to use the strength of linear models 1006 | to improve the nonparametric learning ability of tree-based algorithms. 1007 | Firstly, a linear model is fitted on the whole dataset, then a random 1008 | forest is trained on the same dataset but using the residuals of the 1009 | previous steps as target. The final predictions are the sum of the raw 1010 | linear predictions and the residuals modeled by the random forest. 1011 | 1012 | Parameters 1013 | ---------- 1014 | base_estimator : object 1015 | The linear estimator fitted on the raw target. 1016 | The linear estimator must be a regressor from sklearn.linear_model. 1017 | 1018 | n_estimators : int, default=100 1019 | The number of trees in the forest. 1020 | 1021 | max_depth : int, default=None 1022 | The maximum depth of the tree. If None, then nodes are expanded until 1023 | all leaves are pure or until all leaves contain less than 1024 | min_samples_split samples. 1025 | 1026 | min_samples_split : int or float, default=2 1027 | The minimum number of samples required to split an internal node: 1028 | 1029 | - If int, then consider `min_samples_split` as the minimum number. 1030 | - If float, then `min_samples_split` is a fraction and 1031 | `ceil(min_samples_split * n_samples)` are the minimum 1032 | number of samples for each split. 1033 | 1034 | min_samples_leaf : int or float, default=1 1035 | The minimum number of samples required to be at a leaf node. 1036 | A split point at any depth will only be considered if it leaves at 1037 | least ``min_samples_leaf`` training samples in each of the left and 1038 | right branches. This may have the effect of smoothing the model, 1039 | especially in regression. 1040 | 1041 | - If int, then consider `min_samples_leaf` as the minimum number. 1042 | - If float, then `min_samples_leaf` is a fraction and 1043 | `ceil(min_samples_leaf * n_samples)` are the minimum 1044 | number of samples for each node. 1045 | 1046 | min_weight_fraction_leaf : float, default=0.0 1047 | The minimum weighted fraction of the sum total of weights (of all 1048 | the input samples) required to be at a leaf node. Samples have 1049 | equal weight when sample_weight is not provided. 1050 | 1051 | max_features : {"auto", "sqrt", "log2"}, int or float, default="auto" 1052 | The number of features to consider when looking for the best split: 1053 | 1054 | - If int, then consider `max_features` features at each split. 1055 | - If float, then `max_features` is a fraction and 1056 | `round(max_features * n_features)` features are considered at each 1057 | split. 1058 | - If "auto", then `max_features=n_features`. 1059 | - If "sqrt", then `max_features=sqrt(n_features)`. 1060 | - If "log2", then `max_features=log2(n_features)`. 1061 | - If None, then `max_features=n_features`. 1062 | 1063 | Note: the search for a split does not stop until at least one 1064 | valid partition of the node samples is found, even if it requires to 1065 | effectively inspect more than ``max_features`` features. 1066 | 1067 | max_leaf_nodes : int, default=None 1068 | Grow trees with ``max_leaf_nodes`` in best-first fashion. 1069 | Best nodes are defined as relative reduction in impurity. 1070 | If None then unlimited number of leaf nodes. 
1071 | 1072 | min_impurity_decrease : float, default=0.0 1073 | A node will be split if this split induces a decrease of the impurity 1074 | greater than or equal to this value. 1075 | 1076 | bootstrap : bool, default=True 1077 | Whether bootstrap samples are used when building trees. If False, the 1078 | whole dataset is used to build each tree. 1079 | 1080 | oob_score : bool, default=False 1081 | Whether to use out-of-bag samples to estimate the generalization score. 1082 | Only available if bootstrap=True. 1083 | 1084 | n_jobs : int, default=None 1085 | The number of jobs to run in parallel. :meth:`fit`, :meth:`predict`, 1086 | :meth:`decision_path` and :meth:`apply` are all parallelized over the 1087 | trees. ``None`` means 1 unless in a :obj:`joblib.parallel_backend` 1088 | context. ``-1`` means using all processors. 1089 | 1090 | random_state : int, RandomState instance or None, default=None 1091 | Controls both the randomness of the bootstrapping of the samples used 1092 | when building trees (if ``bootstrap=True``) and the sampling of the 1093 | features to consider when looking for the best split at each node 1094 | (if ``max_features < n_features``). 1095 | 1096 | ccp_alpha : non-negative float, default=0.0 1097 | Complexity parameter used for Minimal Cost-Complexity Pruning. The 1098 | subtree with the largest cost complexity that is smaller than 1099 | ``ccp_alpha`` will be chosen. By default, no pruning is performed. See 1100 | :ref:`minimal_cost_complexity_pruning` for details. 1101 | 1102 | max_samples : int or float, default=None 1103 | If bootstrap is True, the number of samples to draw from X 1104 | to train each base estimator. 1105 | 1106 | - If None (default), then draw `X.shape[0]` samples. 1107 | - If int, then draw `max_samples` samples. 1108 | - If float, then draw `max_samples * X.shape[0]` samples. Thus, 1109 | `max_samples` should be in the interval `(0, 1]`. 1110 | 1111 | Attributes 1112 | ---------- 1113 | n_features_in_ : int 1114 | The number of features when :meth:`fit` is performed. 1115 | 1116 | feature_importances_ : ndarray of shape (n_features, ) 1117 | The impurity-based feature importances. 1118 | The higher, the more important the feature. 1119 | The importance of a feature is computed as the (normalized) 1120 | total reduction of the criterion brought by that feature. It is also 1121 | known as the Gini importance. 1122 | 1123 | coef_ : array of shape (n_features, ) or (n_targets, n_features) 1124 | Estimated coefficients for the linear regression problem. 1125 | If multiple targets are passed during the fit (y 2D), this is a 1126 | 2D array of shape (n_targets, n_features), while if only one target 1127 | is passed, this is a 1D array of length n_features. 1128 | 1129 | intercept_ : float or array of shape (n_targets,) 1130 | Independent term in the linear model. Set to 0 if `fit_intercept = False` 1131 | in `base_estimator`. 1132 | 1133 | base_estimator_ : object 1134 | A fitted linear model instance. 1135 | 1136 | forest_estimator_ : object 1137 | A fitted random forest instance. 1138 | 1139 | Examples 1140 | -------- 1141 | >>> from sklearn.linear_model import LinearRegression 1142 | >>> from lineartree import LinearForestRegressor 1143 | >>> from sklearn.datasets import make_regression 1144 | >>> X, y = make_regression(n_samples=100, n_features=4, 1145 | ... n_informative=2, n_targets=1, 1146 | ... 
random_state=0, shuffle=False) 1147 | >>> regr = LinearForestRegressor(base_estimator=LinearRegression()) 1148 | >>> regr.fit(X, y) 1149 | >>> regr.predict([[0, 0, 0, 0]]) 1150 | array([8.8817842e-16]) 1151 | 1152 | References 1153 | ---------- 1154 | Regression-Enhanced Random Forests. 1155 | Authors: Haozhe Zhang, Dan Nettleton, Zhengyuan Zhu. 1156 | (https://arxiv.org/abs/1904.10416) 1157 | """ 1158 | def __init__(self, base_estimator, *, n_estimators=100, 1159 | max_depth=None, min_samples_split=2, min_samples_leaf=1, 1160 | min_weight_fraction_leaf=0., max_features="auto", 1161 | max_leaf_nodes=None, min_impurity_decrease=0., 1162 | bootstrap=True, oob_score=False, n_jobs=None, 1163 | random_state=None, ccp_alpha=0.0, max_samples=None): 1164 | 1165 | self.base_estimator = base_estimator 1166 | self.n_estimators = n_estimators 1167 | self.max_depth = max_depth 1168 | self.min_samples_split = min_samples_split 1169 | self.min_samples_leaf = min_samples_leaf 1170 | self.min_weight_fraction_leaf = min_weight_fraction_leaf 1171 | self.max_features = max_features 1172 | self.max_leaf_nodes = max_leaf_nodes 1173 | self.min_impurity_decrease = min_impurity_decrease 1174 | self.bootstrap = bootstrap 1175 | self.oob_score = oob_score 1176 | self.n_jobs = n_jobs 1177 | self.random_state = random_state 1178 | self.ccp_alpha = ccp_alpha 1179 | self.max_samples = max_samples 1180 | 1181 | def fit(self, X, y, sample_weight=None): 1182 | """Build a Linear Forest from the training set (X, y). 1183 | 1184 | Parameters 1185 | ---------- 1186 | X : array-like of shape (n_samples, n_features) 1187 | The training input samples. 1188 | 1189 | y : array-like of shape (n_samples, ) or (n_samples, n_targets) 1190 | Target values. 1191 | 1192 | sample_weight : array-like of shape (n_samples, ), default=None 1193 | Sample weights. 1194 | 1195 | Returns 1196 | ------- 1197 | self : object 1198 | """ 1199 | # Convert data (X is required to be 2d and indexable) 1200 | X, y = self._validate_data( 1201 | X, y, 1202 | reset=True, 1203 | accept_sparse=True, 1204 | dtype='float32', 1205 | force_all_finite=True, 1206 | ensure_2d=True, 1207 | allow_nd=False, 1208 | multi_output=True, 1209 | y_numeric=True, 1210 | ) 1211 | if sample_weight is not None: 1212 | sample_weight = _check_sample_weight(sample_weight, X) 1213 | 1214 | y_shape = np.shape(y) 1215 | n_targets = y_shape[1] if len(y_shape) > 1 else 1 1216 | if n_targets < 2: 1217 | y = y.ravel() 1218 | self._fit(X, y, sample_weight) 1219 | 1220 | return self 1221 | 1222 | def predict(self, X): 1223 | """Predict regression target for X. 1224 | 1225 | Parameters 1226 | ---------- 1227 | X : array-like of shape (n_samples, n_features) 1228 | Samples. 1229 | 1230 | Returns 1231 | ------- 1232 | pred : ndarray of shape (n_samples, ) or also (n_samples, n_targets) if 1233 | multitarget regression. 1234 | The predicted values. 1235 | """ 1236 | check_is_fitted(self, attributes='base_estimator_') 1237 | 1238 | X = self._validate_data( 1239 | X, 1240 | reset=False, 1241 | accept_sparse=True, 1242 | dtype='float32', 1243 | force_all_finite=True, 1244 | ensure_2d=True, 1245 | allow_nd=False, 1246 | ensure_min_features=self.n_features_in_ 1247 | ) 1248 | 1249 | linear_pred = self.base_estimator_.predict(X) 1250 | forest_pred = self.forest_estimator_.predict(X) 1251 | 1252 | return linear_pred + forest_pred 1253 | 1254 | 1255 | class LinearForestClassifier(_LinearForest, ClassifierMixin): 1256 | """"A Linear Forest Classifier. 
1257 | 1258 | Linear forests generalize the well known random forests by combining 1259 | linear models with the same random forests. 1260 | The key idea of linear forests is to use the strength of linear models 1261 | to improve the nonparametric learning ability of tree-based algorithms. 1262 | Firstly, a linear model is fitted on the whole dataset, then a random 1263 | forest is trained on the same dataset but using the residuals of the 1264 | previous steps as target. The final predictions are the sum of the raw 1265 | linear predictions and the residuals modeled by the random forest. 1266 | 1267 | For classification tasks the same approach used in the regression context 1268 | is adopted. The binary targets are transformed into logits using the 1269 | inverse sigmoid function. A linear regression is fitted. A random forest 1270 | regressor is trained to approximate the residuals between the logits and 1271 | the linear predictions. Finally the sigmoid of the combined predictions is 1272 | taken to obtain probabilities. 1273 | The multi-label scenario is carried out using OneVsRestClassifier. 1274 | 1275 | Parameters 1276 | ---------- 1277 | base_estimator : object 1278 | The linear estimator fitted on the raw target. 1279 | The linear estimator must be a regressor from sklearn.linear_model. 1280 | 1281 | n_estimators : int, default=100 1282 | The number of trees in the forest. 1283 | 1284 | max_depth : int, default=None 1285 | The maximum depth of the tree. If None, then nodes are expanded until 1286 | all leaves are pure or until all leaves contain less than 1287 | min_samples_split samples. 1288 | 1289 | min_samples_split : int or float, default=2 1290 | The minimum number of samples required to split an internal node: 1291 | 1292 | - If int, then consider `min_samples_split` as the minimum number. 1293 | - If float, then `min_samples_split` is a fraction and 1294 | `ceil(min_samples_split * n_samples)` are the minimum 1295 | number of samples for each split. 1296 | 1297 | min_samples_leaf : int or float, default=1 1298 | The minimum number of samples required to be at a leaf node. 1299 | A split point at any depth will only be considered if it leaves at 1300 | least ``min_samples_leaf`` training samples in each of the left and 1301 | right branches. This may have the effect of smoothing the model, 1302 | especially in regression. 1303 | 1304 | - If int, then consider `min_samples_leaf` as the minimum number. 1305 | - If float, then `min_samples_leaf` is a fraction and 1306 | `ceil(min_samples_leaf * n_samples)` are the minimum 1307 | number of samples for each node. 1308 | 1309 | min_weight_fraction_leaf : float, default=0.0 1310 | The minimum weighted fraction of the sum total of weights (of all 1311 | the input samples) required to be at a leaf node. Samples have 1312 | equal weight when sample_weight is not provided. 1313 | 1314 | max_features : {"auto", "sqrt", "log2"}, int or float, default="auto" 1315 | The number of features to consider when looking for the best split: 1316 | 1317 | - If int, then consider `max_features` features at each split. 1318 | - If float, then `max_features` is a fraction and 1319 | `round(max_features * n_features)` features are considered at each 1320 | split. 1321 | - If "auto", then `max_features=n_features`. 1322 | - If "sqrt", then `max_features=sqrt(n_features)`. 1323 | - If "log2", then `max_features=log2(n_features)`. 1324 | - If None, then `max_features=n_features`.
1325 | 1326 | Note: the search for a split does not stop until at least one 1327 | valid partition of the node samples is found, even if it requires to 1328 | effectively inspect more than ``max_features`` features. 1329 | 1330 | max_leaf_nodes : int, default=None 1331 | Grow trees with ``max_leaf_nodes`` in best-first fashion. 1332 | Best nodes are defined as relative reduction in impurity. 1333 | If None then unlimited number of leaf nodes. 1334 | 1335 | min_impurity_decrease : float, default=0.0 1336 | A node will be split if this split induces a decrease of the impurity 1337 | greater than or equal to this value. 1338 | 1339 | bootstrap : bool, default=True 1340 | Whether bootstrap samples are used when building trees. If False, the 1341 | whole dataset is used to build each tree. 1342 | 1343 | oob_score : bool, default=False 1344 | Whether to use out-of-bag samples to estimate the generalization score. 1345 | Only available if bootstrap=True. 1346 | 1347 | n_jobs : int, default=None 1348 | The number of jobs to run in parallel. :meth:`fit`, :meth:`predict`, 1349 | :meth:`decision_path` and :meth:`apply` are all parallelized over the 1350 | trees. ``None`` means 1 unless in a :obj:`joblib.parallel_backend` 1351 | context. ``-1`` means using all processors. 1352 | 1353 | random_state : int, RandomState instance or None, default=None 1354 | Controls both the randomness of the bootstrapping of the samples used 1355 | when building trees (if ``bootstrap=True``) and the sampling of the 1356 | features to consider when looking for the best split at each node 1357 | (if ``max_features < n_features``). 1358 | 1359 | ccp_alpha : non-negative float, default=0.0 1360 | Complexity parameter used for Minimal Cost-Complexity Pruning. The 1361 | subtree with the largest cost complexity that is smaller than 1362 | ``ccp_alpha`` will be chosen. By default, no pruning is performed. See 1363 | :ref:`minimal_cost_complexity_pruning` for details. 1364 | 1365 | max_samples : int or float, default=None 1366 | If bootstrap is True, the number of samples to draw from X 1367 | to train each base estimator. 1368 | 1369 | - If None (default), then draw `X.shape[0]` samples. 1370 | - If int, then draw `max_samples` samples. 1371 | - If float, then draw `max_samples * X.shape[0]` samples. Thus, 1372 | `max_samples` should be in the interval `(0, 1]`. 1373 | 1374 | Attributes 1375 | ---------- 1376 | n_features_in_ : int 1377 | The number of features when :meth:`fit` is performed. 1378 | 1379 | feature_importances_ : ndarray of shape (n_features, ) 1380 | The impurity-based feature importances. 1381 | The higher, the more important the feature. 1382 | The importance of a feature is computed as the (normalized) 1383 | total reduction of the criterion brought by that feature. It is also 1384 | known as the Gini importance. 1385 | 1386 | coef_ : ndarray of shape (1, n_features_out_) 1387 | Coefficient of the features in the decision function. 1388 | 1389 | intercept_ : float 1390 | Independent term in the linear model. Set to 0 if `fit_intercept = False` 1391 | in `base_estimator`. 1392 | 1393 | classes_ : ndarray of shape (n_classes, ) 1394 | A list of class labels known to the classifier. 1395 | 1396 | base_estimator_ : object 1397 | A fitted linear model instance. 1398 | 1399 | forest_estimator_ : object 1400 | A fitted random forest instance. 
1401 | 1402 | Examples 1403 | -------- 1404 | >>> from sklearn.linear_model import LinearRegression 1405 | >>> from lineartree import LinearForestClassifier 1406 | >>> from sklearn.datasets import make_classification 1407 | >>> X, y = make_classification(n_samples=100, n_classes=2, n_features=4, 1408 | ... n_informative=2, n_redundant=0, 1409 | ... random_state=0, shuffle=False) 1410 | >>> clf = LinearForestClassifier(base_estimator=LinearRegression()) 1411 | >>> clf.fit(X, y) 1412 | >>> clf.predict([[0, 0, 0, 0]]) 1413 | array([1]) 1414 | 1415 | References 1416 | ---------- 1417 | Regression-Enhanced Random Forests. 1418 | Authors: Haozhe Zhang, Dan Nettleton, Zhengyuan Zhu. 1419 | (https://arxiv.org/abs/1904.10416) 1420 | """ 1421 | def __init__(self, base_estimator, *, n_estimators=100, 1422 | max_depth=None, min_samples_split=2, min_samples_leaf=1, 1423 | min_weight_fraction_leaf=0., max_features="auto", 1424 | max_leaf_nodes=None, min_impurity_decrease=0., 1425 | bootstrap=True, oob_score=False, n_jobs=None, 1426 | random_state=None, ccp_alpha=0.0, max_samples=None): 1427 | 1428 | self.base_estimator = base_estimator 1429 | self.n_estimators = n_estimators 1430 | self.max_depth = max_depth 1431 | self.min_samples_split = min_samples_split 1432 | self.min_samples_leaf = min_samples_leaf 1433 | self.min_weight_fraction_leaf = min_weight_fraction_leaf 1434 | self.max_features = max_features 1435 | self.max_leaf_nodes = max_leaf_nodes 1436 | self.min_impurity_decrease = min_impurity_decrease 1437 | self.bootstrap = bootstrap 1438 | self.oob_score = oob_score 1439 | self.n_jobs = n_jobs 1440 | self.random_state = random_state 1441 | self.ccp_alpha = ccp_alpha 1442 | self.max_samples = max_samples 1443 | 1444 | def fit(self, X, y, sample_weight=None): 1445 | """Build a Linear Forest from the training set (X, y). 1446 | 1447 | Parameters 1448 | ---------- 1449 | X : array-like of shape (n_samples, n_features) 1450 | The training input samples. 1451 | 1452 | y : array-like of shape (n_samples, ) or (n_samples, n_targets) 1453 | Target values. 1454 | 1455 | sample_weight : array-like of shape (n_samples, ), default=None 1456 | Sample weights. 1457 | 1458 | Returns 1459 | ------- 1460 | self : object 1461 | """ 1462 | # Convert data (X is required to be 2d and indexable) 1463 | X, y = self._validate_data( 1464 | X, y, 1465 | reset=True, 1466 | accept_sparse=True, 1467 | dtype='float32', 1468 | force_all_finite=True, 1469 | ensure_2d=True, 1470 | allow_nd=False, 1471 | multi_output=False, 1472 | ) 1473 | if sample_weight is not None: 1474 | sample_weight = _check_sample_weight(sample_weight, X) 1475 | 1476 | self.classes_ = np.unique(y) 1477 | if len(self.classes_) > 2: 1478 | raise ValueError( 1479 | "LinearForestClassifier supports only binary classification task. " 1480 | "To solve a multi-lable classification task use " 1481 | "LinearForestClassifier with OneVsRestClassifier from sklearn.") 1482 | 1483 | self._fit(X, y, sample_weight) 1484 | 1485 | return self 1486 | 1487 | def decision_function(self, X): 1488 | """Predict confidence scores for samples. 1489 | 1490 | The confidence score for a sample is proportional to the signed 1491 | distance of that sample to the hyperplane. 1492 | 1493 | Parameters 1494 | ---------- 1495 | X : array-like of shape (n_samples, n_features) 1496 | Samples. 1497 | 1498 | Returns 1499 | ------- 1500 | pred : ndarray of shape (n_samples, ) 1501 | Confidence scores. 
1502 | Confidence score for self.classes_[1] where >0 means this 1503 | class would be predicted. 1504 | """ 1505 | check_is_fitted(self, attributes='base_estimator_') 1506 | 1507 | X = self._validate_data( 1508 | X, 1509 | reset=False, 1510 | accept_sparse=True, 1511 | dtype='float32', 1512 | force_all_finite=True, 1513 | ensure_2d=True, 1514 | allow_nd=False, 1515 | ensure_min_features=self.n_features_in_ 1516 | ) 1517 | 1518 | linear_pred = self.base_estimator_.predict(X) 1519 | forest_pred = self.forest_estimator_.predict(X) 1520 | 1521 | return linear_pred + forest_pred 1522 | 1523 | def predict(self, X): 1524 | """Predict class for X. 1525 | 1526 | Parameters 1527 | ---------- 1528 | X : array-like of shape (n_samples, n_features) 1529 | Samples. 1530 | 1531 | Returns 1532 | ------- 1533 | pred : ndarray of shape (n_samples, ) 1534 | The predicted classes. 1535 | """ 1536 | pred = self.decision_function(X) 1537 | pred_class = (self._sigmoid(pred) > 0.5).astype(int) 1538 | int_to_class = dict(enumerate(self.classes_)) 1539 | pred_class = np.array([int_to_class[i] for i in pred_class]) 1540 | 1541 | return pred_class 1542 | 1543 | def predict_proba(self, X): 1544 | """Predict class probabilities for X. 1545 | 1546 | Parameters 1547 | ---------- 1548 | X : array-like of shape (n_samples, n_features) 1549 | Samples. 1550 | 1551 | Returns 1552 | ------- 1553 | proba : ndarray of shape (n_samples, n_classes) 1554 | The class probabilities of the input samples. The order of the 1555 | classes corresponds to that in the attribute :term:`classes_`. 1556 | """ 1557 | 1558 | pred = self._sigmoid(self.decision_function(X)) 1559 | proba = np.zeros((X.shape[0], 2)) 1560 | proba[:, 0] = 1 - pred 1561 | proba[:, 1] = pred 1562 | 1563 | return proba 1564 | 1565 | def predict_log_proba(self, X): 1566 | """Predict class log-probabilities for X. 1567 | 1568 | Parameters 1569 | ---------- 1570 | X : array-like of shape (n_samples, n_features) 1571 | Samples. 1572 | 1573 | Returns 1574 | ------- 1575 | pred : ndarray of shape (n_samples, n_classes) 1576 | The class log-probabilities of the input samples. The order of the 1577 | classes corresponds to that in the attribute :term:`classes_`. 1578 | """ 1579 | return np.log(self.predict_proba(X)) 1580 | -------------------------------------------------------------------------------- /notebooks/README.md: -------------------------------------------------------------------------------- 1 | # API Reference 2 | 3 | ## LinearTreeRegressor 4 | ``` 5 | class lineartree.LinearTreeRegressor(base_estimator, *, criterion = 'mse', max_depth = 5, min_samples_split = 6, min_samples_leaf = 0.1, max_bins = 25, categorical_features = None, split_features = None, linear_features = None, n_jobs = None) 6 | ``` 7 | 8 | #### Parameters: 9 | 10 | - ```base_estimator : object``` 11 | 12 | The base estimator to fit on dataset splits. 13 | The base estimator must be a sklearn.linear_model. 14 | 15 | - ```criterion : {"mse", "rmse", "mae", "poisson"}, default="mse"``` 16 | 17 | The function to measure the quality of a split. `"poisson"` requires `y >= 0`. 18 | 19 | - ```max_depth : int, default=5``` 20 | 21 | The maximum depth of the tree considering only the splitting nodes. 22 | A higher value implies a higher training time. 23 | 24 | - ```min_samples_split : int or float, default=6``` 25 | 26 | The minimum number of samples required to split an internal node. 27 | The minimum valid number of samples in each node is 6. 28 | A lower value implies a higher training time. 
29 | - If int, then consider `min_samples_split` as the minimum number. 30 | - If float, then `min_samples_split` is a fraction and `ceil(min_samples_split * n_samples)` are the minimum number of samples for each split. 31 | 32 | - ```min_samples_leaf : int or float, default=0.1``` 33 | 34 | The minimum number of samples required to be at a leaf node. 35 | A split point at any depth will only be considered if it leaves at least `min_samples_leaf` training samples in each of the left and right branches. 36 | The minimum valid number of samples in each leaf is 3. 37 | A lower value implies a higher training time. 38 | - If int, then consider `min_samples_leaf` as the minimum number. 39 | - If float, then `min_samples_leaf` is a fraction and 40 | `ceil(min_samples_leaf * n_samples)` are the minimum number of samples for each node. 41 | 42 | - ```max_bins : int, default=25``` 43 | 44 | The maximum number of bins to use to search the optimal split in each feature. Features with a small number of unique values may use less than ``max_bins`` bins. Must be lower than 120 and larger than 10. 45 | A higher value implies a higher training time. 46 | 47 | - ```min_impurity_decrease : float, default=0.0``` 48 | 49 | A node will be split if this split induces a decrease of the impurity greater than or equal to this value. 50 | 51 | - ```categorical_features : int or array-like of int, default=None``` 52 | 53 | Indicates the categorical features. 54 | All categorical indices must be in `[0, n_features)`. 55 | Categorical features are used for splits but are not used in model fitting. 56 | More categorical features imply a higher training time. 57 | - None : no feature will be considered categorical. 58 | - integer array-like : integer indices indicating categorical features. 59 | - integer : integer index indicating a categorical feature. 60 | 61 | - ```split_features : int or array-like of int, default=None``` 62 | 63 | Defines which features can be used to split on. 64 | All split feature indices must be in `[0, n_features)`. 65 | - None : All features will be used for splitting. 66 | - integer array-like : integer indices indicating splitting features. 67 | - integer : integer index indicating a single splitting feature. 68 | 69 | - ```linear_features : int or array-like of int, default=None``` 70 | 71 | Defines which features are used for the linear model in the leaves. 72 | All linear feature indices must be in `[0, n_features)`. 73 | - None : All features except those in `categorical_features` will be used in the leaf models. 74 | - integer array-like : integer indices indicating features to be used in the leaf models. 75 | - integer : integer index indicating a single feature to be used in the leaf models. 76 | 77 | - ```n_jobs : int, default=None``` 78 | 79 | The number of jobs to run in parallel for model fitting. ``None`` means 1 using one processor. ``-1`` means using all processors. 80 | 81 | #### Attributes: 82 | 83 | - ```n_features_in_ : int``` 84 | 85 | The number of features when :meth:`fit` is performed. 86 | 87 | - ```feature_importances_ : ndarray of shape (n_features, )``` 88 | 89 | Normalized total reduction of criteria by splitting features. 90 | 91 | - ```n_targets_ : int``` 92 | 93 | The number of targets when :meth:`fit` is performed. 94 | 95 | #### Methods: 96 | 97 | - ```fit(X, y, sample_weight=None)``` 98 | 99 | Build a Linear Tree of a linear estimator from the training set (X, y). 
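For orientation, here is a minimal end-to-end sketch of the regressor (synthetic data; the dataset sizes and `max_depth=3` are arbitrary choices for illustration, and any regressor from `sklearn.linear_model` should work as `base_estimator`):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from lineartree import LinearTreeRegressor

# Synthetic data, for illustration only
X, y = make_regression(n_samples=500, n_features=4, n_informative=2, random_state=0)

# A decision tree whose leaves hold linear models instead of constant values
regr = LinearTreeRegressor(base_estimator=LinearRegression(), max_depth=3)
regr.fit(X, y)

print(regr.predict(X[:5]))             # predictions from the leaf models
print(regr.apply(X[:5]))               # leaf index reached by each sample
print(regr.summary(only_leaves=True))  # per-leaf loss, sample count and fitted model
```

The parameters accepted by `fit` are detailed below.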
100 | 101 | **Parameters:** 102 | 103 | - `X` : array-like of shape (n_samples, n_features) 104 | The training input samples. 105 | - `y` : array-like of shape (n_samples, ) or (n_samples, n_targets) 106 | Target values. 107 | - `sample_weight` : array-like of shape (n_samples, ), default=None 108 | Sample weights. If None, then samples are equally weighted. Note that if the base estimator does not support sample weighting, the sample weights are still used to evaluate the splits. 109 | 110 | **Returns:** 111 | 112 | - `self` : object 113 | 114 | - ```predict(X)``` 115 | 116 | Predict regression target for X. 117 | 118 | **Parameters:** 119 | 120 | - `X` : array-like of shape (n_samples, n_features) 121 | 122 | Samples. 123 | 124 | **Returns:** 125 | 126 | - `pred` : ndarray of shape (n_samples, ) or also (n_samples, n_targets) if multitarget regression 127 | 128 | The predicted values. 129 | 130 | - ```apply(X)``` 131 | 132 | Return the index of the leaf that each sample is predicted as. 133 | 134 | **Parameters:** 135 | 136 | - `X` : array-like of shape (n_samples, n_features) 137 | 138 | Samples. 139 | 140 | **Returns:** 141 | 142 | - `X_leaves` : array-like of shape (n_samples, ) 143 | 144 | For each datapoint x in X, return the index of the leaf x ends up in. Leaves are numbered within ``[0; n_nodes)``, possibly with gaps in the numbering. 145 | 146 | - ```decision_path(X)``` 147 | 148 | Return the decision path in the tree. 149 | 150 | **Parameters:** 151 | 152 | - `X` : array-like of shape (n_samples, n_features) 153 | 154 | Samples. 155 | 156 | **Returns:** 157 | 158 | - `indicator` : sparse matrix of shape (n_samples, n_nodes) 159 | 160 | Return a node indicator CSR matrix where non zero elements indicates that the samples goes through the nodes. 161 | 162 | - ```summary(feature_names=None, only_leaves=False, max_depth=None)``` 163 | 164 | Return a summary of nodes created from model fitting. 165 | 166 | **Parameters:** 167 | 168 | - `feature_names` : array-like of shape (n_features, ), default=None 169 | 170 | Names of each of the features. If None, generic names will be used (“X[0]”, “X[1]”, …). 171 | 172 | - `only_leaves` : bool, default=False 173 | 174 | Store only information of leaf nodes. 175 | 176 | - `max_depth` : int, default=None 177 | 178 | The maximum depth of the representation. If None, the tree is fully generated. 179 | 180 | **Returns:** 181 | 182 | - `summary` : nested dict 183 | 184 | The keys are the integer map of each node. 185 | The values are dicts containing information for that node: 186 | - 'col' (^): column used for splitting; 187 | - 'th' (^): threshold value used for splitting in the selected column; 188 | - 'loss': loss computed at node level. Weighted sum of children' losses if it is a splitting node; 189 | - 'samples': number of samples in the node. Sum of children' samples if it is a split node; 190 | - 'children' (^): integer mapping of possible children nodes; 191 | - 'models': fitted linear models built in each split. Single model if it is leaf node; 192 | - 'classes' (^^): target classes detected in the split. Available only for LinearTreeClassifier. 193 | 194 | (^): Only for split nodes. 195 | (^^): Only for leaf nodes. 196 | 197 | - ```model_to_dot(feature_names=None, max_depth=None)``` 198 | 199 | Convert a fitted Linear Tree model to dot format. 200 | It results in ModuleNotFoundError if graphviz or pydot are not available. 201 | When installing graphviz make sure to add it to the system path. 
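Assuming `regr` is a fitted `LinearTreeRegressor` (as in the sketch under `fit` above), a hedged export example might look like this; it requires the optional `pydot` and `graphviz` dependencies, and the feature names and output file name are arbitrary:

```python
# model_to_dot returns a pydot.Dot instance describing the fitted tree
graph = regr.model_to_dot(feature_names=["x0", "x1", "x2", "x3"], max_depth=2)
graph.write_png("linear_tree.png")  # standard pydot export to an image file

# Alternatively, render the tree in-line in a Jupyter notebook
regr.plot_model(feature_names=["x0", "x1", "x2", "x3"], max_depth=2)
```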
202 | 203 | **Parameters:** 204 | 205 | - `feature_names` : array-like of shape (n_features, ), default=None 206 | 207 | Names of each of the features. If None, generic names will be used (“X[0]”, “X[1]”, …). 208 | 209 | - `max_depth` : int, default=None 210 | 211 | The maximum depth of the representation. If None, the tree is fully generated. 212 | 213 | **Returns:** 214 | 215 | - `graph` : pydot.Dot instance 216 | 217 | Return an instance representing the Linear Tree. Splitting nodes have a rectangular shape while leaf nodes have a circular one. 218 | 219 | - ```plot_model(feature_names=None, max_depth=None)``` 220 | 221 | Convert a fitted Linear Tree model to dot format and display it. 222 | It results in ModuleNotFoundError if graphviz or pydot are not available. 223 | When installing graphviz make sure to add it to the system path. 224 | 225 | **Parameters:** 226 | 227 | - `feature_names` : array-like of shape (n_features, ), default=None 228 | 229 | Names of each of the features. If None, generic names will be used (“X[0]”, “X[1]”, …). 230 | 231 | - `max_depth` : int, default=None 232 | 233 | The maximum depth of the representation. If None, the tree is fully generated. 234 | 235 | **Returns:** 236 | 237 | - A Jupyter notebook Image object if Jupyter is installed. 238 | 239 | This enables in-line display of the model plots in notebooks. Splitting nodes have a rectangular shape while leaf nodes have a circular one. 240 | 241 | 242 | ## LinearTreeClassifier 243 | ``` 244 | class lineartree.LinearTreeClassifier(base_estimator, *, criterion = 'hamming', max_depth = 5, min_samples_split = 6, min_samples_leaf = 0.1, max_bins = 25, categorical_features = None, split_features = None, linear_features = None, n_jobs = None) 245 | ``` 246 | 247 | #### Parameters: 248 | 249 | - ```base_estimator : object``` 250 | 251 | The base estimator to fit on dataset splits. 252 | The base estimator must be a sklearn.linear_model. 253 | The selected base estimator is automatically substituted by a `~sklearn.dummy.DummyClassifier` when a dataset split is composed of unique labels. 254 | 255 | - ```criterion : {"hamming", "crossentropy"}, default="hamming"``` 256 | 257 | The function to measure the quality of a split. `"crossentropy"` can be used only if `base_estimator` has `predict_proba` method 258 | 259 | - ```max_depth : int, default=5``` 260 | 261 | The maximum depth of the tree considering only the splitting nodes. 262 | A higher value implies a higher training time. 263 | 264 | - ```min_samples_split : int or float, default=6``` 265 | 266 | The minimum number of samples required to split an internal node. 267 | The minimum valid number of samples in each node is 6. 268 | A lower value implies a higher training time. 269 | - If int, then consider `min_samples_split` as the minimum number. 270 | - If float, then `min_samples_split` is a fraction and `ceil(min_samples_split * n_samples)` are the minimum number of samples for each split. 271 | 272 | - ```min_samples_leaf : int or float, default=0.1``` 273 | 274 | The minimum number of samples required to be at a leaf node. 275 | A split point at any depth will only be considered if it leaves at least `min_samples_leaf` training samples in each of the left and right branches. 276 | The minimum valid number of samples in each leaf is 3. 277 | A lower value implies a higher training time. 278 | - If int, then consider `min_samples_leaf` as the minimum number. 
279 | - If float, then `min_samples_leaf` is a fraction and 280 | `ceil(min_samples_leaf * n_samples)` are the minimum number of samples for each node. 281 | 282 | - ```max_bins : int, default=25``` 283 | 284 | The maximum number of bins to use to search the optimal split in each feature. Features with a small number of unique values may use less than ``max_bins`` bins. Must be lower than 120 and larger than 10. 285 | A higher value implies a higher training time. 286 | 287 | - ```min_impurity_decrease : float, default=0.0``` 288 | 289 | A node will be split if this split induces a decrease of the impurity greater than or equal to this value. 290 | 291 | - ```categorical_features : int or array-like of int, default=None``` 292 | 293 | Indicates the categorical features. 294 | All categorical indices must be in `[0, n_features)`. 295 | Categorical features are used for splits but are not used in model fitting. 296 | More categorical features imply a higher training time. 297 | - None : no feature will be considered categorical. 298 | - integer array-like : integer indices indicating categorical features. 299 | - integer : integer index indicating a categorical feature. 300 | 301 | - ```split_features : int or array-like of int, default=None``` 302 | 303 | Defines which features can be used to split on. 304 | All split feature indices must be in `[0, n_features)`. 305 | - None : All features will be used for splitting. 306 | - integer array-like : integer indices indicating splitting features. 307 | - integer : integer index indicating a single splitting feature. 308 | 309 | - ```linear_features : int or array-like of int, default=None``` 310 | 311 | Defines which features are used for the linear model in the leaves. 312 | All linear feature indices must be in `[0, n_features)`. 313 | - None : All features except those in `categorical_features` will be used in the leaf models. 314 | - integer array-like : integer indices indicating features to be used in the leaf models. 315 | - integer : integer index indicating a single feature to be used in the leaf models. 316 | 317 | - ```n_jobs : int, default=None``` 318 | 319 | The number of jobs to run in parallel for model fitting. ``None`` means 1 using one processor. ``-1`` means using all processors. 320 | 321 | #### Attributes: 322 | 323 | - ```n_features_in_ : int``` 324 | 325 | The number of features when :meth:`fit` is performed. 326 | 327 | - ```feature_importances_ : ndarray of shape (n_features, )``` 328 | 329 | Normalized total reduction of criteria by splitting features. 330 | 331 | - ```classes_ : ndarray of shape (n_classes, )``` 332 | 333 | A list of class labels known to the classifier. 334 | 335 | #### Methods: 336 | 337 | - ```fit(X, y, sample_weight=None)``` 338 | 339 | Build a Linear Tree of a linear estimator from the training set (X, y). 340 | 341 | **Parameters:** 342 | 343 | - `X` : array-like of shape (n_samples, n_features) 344 | 345 | The training input samples. 346 | 347 | - `y` : array-like of shape (n_samples, ) or (n_samples, n_targets) 348 | 349 | Target values. 350 | 351 | - `sample_weight` : array-like of shape (n_samples, ), default=None 352 | 353 | Sample weights. If None, then samples are equally weighted. Note that if the base estimator does not support sample weighting, the sample weights are still used to evaluate the splits. 354 | 355 | **Returns:** 356 | 357 | - `self` : object 358 | 359 | - ```predict(X)``` 360 | 361 | Predict class for X. 
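A minimal classification sketch (synthetic binary data; `RidgeClassifier` and `max_depth=3` are arbitrary example choices):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import RidgeClassifier
from lineartree import LinearTreeClassifier

# Synthetic binary problem, for illustration only
X, y = make_classification(n_samples=500, n_features=6, n_informative=3, random_state=0)

clf = LinearTreeClassifier(base_estimator=RidgeClassifier(), max_depth=3)
clf.fit(X, y)

print(clf.predict(X[:5]))        # predicted class labels
print(clf.predict_proba(X[:5]))  # one-hot fallback here, since RidgeClassifier has no predict_proba
```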
362 | 363 | **Parameters:** 364 | 365 | - `X` : array-like of shape (n_samples, n_features) 366 | 367 | Samples. 368 | 369 | **Returns:** 370 | 371 | - `pred` : ndarray of shape (n_samples, ) 372 | 373 | The predicted classes. 374 | 375 | - ```predict_proba(X)``` 376 | 377 | Predict class probabilities for X. 378 | If base estimators do not implement a ``predict_proba`` method, then the one-hot encoding of the predicted class is returned 379 | 380 | **Parameters:** 381 | 382 | - `X` : array-like of shape (n_samples, n_features) 383 | 384 | Samples. 385 | 386 | **Returns:** 387 | 388 | - `pred` : ndarray of shape (n_samples, n_classes) 389 | 390 | The class probabilities of the input samples. The order of the classes corresponds to that in the attribute :term:`classes_`. 391 | 392 | - ```predict_log_proba(X)``` 393 | 394 | Predict class log-probabilities for X. 395 | If base estimators do not implement a ``predict_log_proba`` method, then the logarithm of the one-hot encoded predicted class is returned. 396 | 397 | **Parameters:** 398 | 399 | - `X` : array-like of shape (n_samples, n_features) 400 | 401 | Samples. 402 | 403 | **Returns:** 404 | 405 | - `pred` : ndarray of shape (n_samples, n_classes) 406 | 407 | The class log-probabilities of the input samples. The order of the classes corresponds to that in the attribute :term:`classes_`. 408 | 409 | - ```apply(X)``` 410 | 411 | Return the index of the leaf that each sample is predicted as. 412 | 413 | **Parameters:** 414 | 415 | - `X` : array-like of shape (n_samples, n_features) 416 | 417 | Samples. 418 | 419 | **Returns:** 420 | 421 | - `X_leaves` : array-like of shape (n_samples, ) 422 | 423 | For each datapoint x in X, return the index of the leaf x ends up in. Leaves are numbered within ``[0; n_nodes)``, possibly with gaps in the numbering. 424 | 425 | - ```decision_path(X)``` 426 | 427 | Return the decision path in the tree. 428 | 429 | **Parameters:** 430 | 431 | - `X` : array-like of shape (n_samples, n_features) 432 | 433 | Samples. 434 | 435 | **Returns:** 436 | 437 | - `indicator` : sparse matrix of shape (n_samples, n_nodes) 438 | 439 | Return a node indicator CSR matrix where non zero elements indicates that the samples goes through the nodes. 440 | 441 | - ```summary(feature_names=None, only_leaves=False, max_depth=None)``` 442 | 443 | Return a summary of nodes created from model fitting. 444 | 445 | **Parameters:** 446 | 447 | - `feature_names` : array-like of shape (n_features, ), default=None 448 | 449 | Names of each of the features. If None, generic names will be used (“X[0]”, “X[1]”, …). 450 | 451 | - `only_leaves` : bool, default=False 452 | 453 | Store only information of leaf nodes. 454 | 455 | - `max_depth` : int, default=None 456 | 457 | The maximum depth of the representation. If None, the tree is fully generated. 458 | 459 | **Returns:** 460 | 461 | - `summary` : nested dict 462 | 463 | The keys are the integer map of each node. 464 | The values are dicts containing information for that node: 465 | - 'col' (^): column used for splitting; 466 | - 'th' (^): threshold value used for splitting in the selected column; 467 | - 'loss': loss computed at node level. Weighted sum of children' losses if it is a splitting node; 468 | - 'samples': number of samples in the node. Sum of children' samples if it is a split node; 469 | - 'children' (^): integer mapping of possible children nodes; 470 | - 'models': fitted linear models built in each split. 
Single model if it is leaf node; 471 | - 'classes' (^^): target classes detected in the split. Available only for LinearTreeClassifier. 472 | 473 | (^): Only for split nodes. 474 | (^^): Only for leaf nodes. 475 | 476 | - ```model_to_dot(feature_names=None, max_depth=None)``` 477 | 478 | Convert a fitted Linear Tree model to dot format. 479 | It results in ModuleNotFoundError if graphviz or pydot are not available. 480 | When installing graphviz make sure to add it to the system path. 481 | 482 | **Parameters:** 483 | 484 | - `feature_names` : array-like of shape (n_features, ), default=None 485 | 486 | Names of each of the features. If None, generic names will be used (“X[0]”, “X[1]”, …). 487 | 488 | - `max_depth` : int, default=None 489 | 490 | The maximum depth of the representation. If None, the tree is fully generated. 491 | 492 | **Returns:** 493 | 494 | - `graph` : pydot.Dot instance 495 | 496 | Return an instance representing the Linear Tree. Splitting nodes have a rectangular shape while leaf nodes have a circular one. 497 | 498 | - ```plot_model(feature_names=None, max_depth=None)``` 499 | 500 | Convert a fitted Linear Tree model to dot format and display it. 501 | It results in ModuleNotFoundError if graphviz or pydot are not available. 502 | When installing graphviz make sure to add it to the system path. 503 | 504 | **Parameters:** 505 | 506 | - `feature_names` : array-like of shape (n_features, ), default=None 507 | 508 | Names of each of the features. If None, generic names will be used (“X[0]”, “X[1]”, …). 509 | 510 | - `max_depth` : int, default=None 511 | 512 | The maximum depth of the representation. If None, the tree is fully generated. 513 | 514 | **Returns:** 515 | 516 | - A Jupyter notebook Image object if Jupyter is installed. 517 | 518 | This enables in-line display of the model plots in notebooks. Splitting nodes have a rectangular shape while leaf nodes have a circular one. 519 | 520 | 521 | ## LinearBoostRegressor 522 | ``` 523 | class lineartree.LinearBoostRegressor(base_estimator, *, loss = 'linear', n_estimators = 10, max_depth = 3, min_samples_split = 2, min_samples_leaf = 1, min_weight_fraction_leaf = 0.0, max_features = None, random_state = None, max_leaf_nodes = None, min_impurity_decrease = 0.0, ccp_alpha = 0.0) 524 | ``` 525 | 526 | #### Parameters: 527 | 528 | - ```base_estimator : object``` 529 | 530 | The base estimator iteratively fitted. 531 | The base estimator must be a sklearn.linear_model. 532 | 533 | - ```loss : {"linear", "square", "absolute", "exponential"}, default="linear"``` 534 | 535 | The function used to calculate the residuals of each sample. 536 | 537 | - ```n_estimators : int, default=10``` 538 | 539 | The number of boosting stages to perform. It corresponds to the number of the new features generated. 540 | 541 | - ```max_depth : int, default=3``` 542 | 543 | The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain less than min_samples_split samples. 544 | 545 | - ```min_samples_split : int or float, default=2``` 546 | 547 | The minimum number of samples required to split an internal node: 548 | - If int, then consider `min_samples_split` as the minimum number. 549 | - If float, then `min_samples_split` is a fraction and `ceil(min_samples_split * n_samples)` are the minimum number of samples for each split. 550 | 551 | - ```min_samples_leaf : int or float, default=1``` 552 | 553 | The minimum number of samples required to be at a leaf node. 
554 | A split point at any depth will only be considered if it leaves at least ``min_samples_leaf`` training samples in each of the left and right branches. This may have the effect of smoothing the model, especially in regression. 555 | - If int, then consider `min_samples_leaf` as the minimum number. 556 | - If float, then `min_samples_leaf` is a fraction and `ceil(min_samples_leaf * n_samples)` are the minimum number of samples for each node. 557 | 558 | - ```min_weight_fraction_leaf : float, default=0.0``` 559 | 560 | The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Samples have equal weight when sample_weight is not provided. 561 | 562 | - ```max_features : int, float or {"auto", "sqrt", "log2"}, default=None``` 563 | 564 | The number of features to consider when looking for the best split: 565 | - If int, then consider `max_features` features at each split. 566 | - If float, then `max_features` is a fraction and `int(max_features * n_features)` features are considered at each split. 567 | - If "auto", then `max_features=n_features`. 568 | - If "sqrt", then `max_features=sqrt(n_features)`. 569 | - If "log2", then `max_features=log2(n_features)`. 570 | - If None, then `max_features=n_features`. 571 | Note: the search for a split does not stop until at least one valid partition of the node samples is found, even if it requires to effectively inspect more than ``max_features`` features. 572 | 573 | - ```max_leaf_nodes : int, default=None``` 574 | 575 | Grow a tree with ``max_leaf_nodes`` in best-first fashion. 576 | Best nodes are defined as relative reduction in impurity. 577 | If None then unlimited number of leaf nodes. 578 | 579 | - ```min_impurity_decrease : float, default=0.0``` 580 | 581 | A node will be split if this split induces a decrease of the impurity greater than or equal to this value. 582 | 583 | - ```ccp_alpha : non-negative float, default=0.0``` 584 | 585 | Complexity parameter used for Minimal Cost-Complexity Pruning. The subtree with the largest cost complexity that is smaller than ``ccp_alpha`` will be chosen. By default, no pruning is performed. See :ref:`minimal_cost_complexity_pruning` for details. 586 | 587 | #### Attributes: 588 | 589 | - ```n_features_in_ : int``` 590 | 591 | The number of features when :meth:`fit` is performed. 592 | 593 | - ```n_features_out_ : int``` 594 | 595 | The total number of features used to fit the base estimator in the last iteration. The number of output features is equal to the sum of n_features_in_ and n_estimators. 596 | 597 | - ```coef_ : array of shape (n_features_out_, ) or (n_targets, n_features_out_)``` 598 | 599 | Estimated coefficients for the linear regression problem. 600 | If multiple targets are passed during the fit (y 2D), this is a 2D array of shape (n_targets, n_features_out_), while if only one target is passed, this is a 1D array of length n_features. 601 | 602 | - ```intercept_ : float or array of shape (n_targets, )``` 603 | 604 | Independent term in the linear model. Set to 0 if `fit_intercept = False` in `base_estimator` 605 | 606 | #### Methods: 607 | 608 | - ```fit(X, y, sample_weight=None)``` 609 | 610 | Build a Linear Boosting from the training set (X, y). 611 | 612 | **Parameters:** 613 | 614 | - `X` : array-like of shape (n_samples, n_features) 615 | 616 | The training input samples. 617 | 618 | - `y` : array-like of shape (n_samples, ) or (n_samples, n_targets) 619 | 620 | Target values. 
621 | 622 | - `sample_weight` : array-like of shape (n_samples, ), default=None 623 | 624 | Sample weights. 625 | 626 | **Returns:** 627 | 628 | - `self` : object 629 | 630 | - ```predict(X)``` 631 | 632 | Predict regression target for X. 633 | 634 | **Parameters:** 635 | 636 | - `X` : array-like of shape (n_samples, n_features) 637 | 638 | Samples. 639 | 640 | **Returns:** 641 | 642 | - `pred` : ndarray of shape (n_samples, ) or also (n_samples, n_targets) if multitarget regression. 643 | 644 | The predicted values. 645 | 646 | - ```transform(X)``` 647 | 648 | Transform dataset. 649 | 650 | **Parameters:** 651 | 652 | - `X` : array-like of shape (n_samples, n_features) 653 | 654 | Input data to be transformed. Use ``dtype=np.float32`` for maximum efficiency. 655 | 656 | **Returns:** 657 | 658 | - `X_transformed` : ndarray of shape (n_samples, n_out). 659 | 660 | Transformed dataset. 661 | `n_out` is equal to `n_features` + `n_estimators`. 662 | 663 | ## LinearBoostClassifier 664 | ``` 665 | class lineartree.LinearBoostClassifier(base_estimator, loss = 'hamming', n_estimators = 10, max_depth = 3, min_samples_split = 2, min_samples_leaf = 1, min_weight_fraction_leaf = 0.0, max_features = None, random_state = None, max_leaf_nodes = None, min_impurity_decrease = 0.0, ccp_alpha = 0.0) 666 | ``` 667 | 668 | #### Parameters: 669 | 670 | - ```base_estimator : object``` 671 | 672 | The base estimator iteratively fitted. 673 | The base estimator must be a sklearn.linear_model. 674 | 675 | - ```loss : {"hamming", "entropy"}, default="hamming"``` 676 | 677 | The function used to calculate the residuals of each sample. 678 | 679 | - ```n_estimators : int, default=10``` 680 | 681 | The number of boosting stages to perform. It corresponds to the number of the new features generated. 682 | 683 | - ```max_depth : int, default=3``` 684 | 685 | The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain less than min_samples_split samples. 686 | 687 | - ```min_samples_split : int or float, default=2``` 688 | 689 | The minimum number of samples required to split an internal node: 690 | - If int, then consider `min_samples_split` as the minimum number. 691 | - If float, then `min_samples_split` is a fraction and `ceil(min_samples_split * n_samples)` are the minimum number of samples for each split. 692 | 693 | - ```min_samples_leaf : int or float, default=1``` 694 | 695 | The minimum number of samples required to be at a leaf node. 696 | A split point at any depth will only be considered if it leaves at least ``min_samples_leaf`` training samples in each of the left and right branches. This may have the effect of smoothing the model, especially in regression. 697 | - If int, then consider `min_samples_leaf` as the minimum number. 698 | - If float, then `min_samples_leaf` is a fraction and `ceil(min_samples_leaf * n_samples)` are the minimum number of samples for each node. 699 | 700 | - ```min_weight_fraction_leaf : float, default=0.0``` 701 | 702 | The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Samples have equal weight when sample_weight is not provided. 703 | 704 | - ```max_features : int, float or {"auto", "sqrt", "log2"}, default=None``` 705 | 706 | The number of features to consider when looking for the best split: 707 | - If int, then consider `max_features` features at each split. 
708 | - If float, then `max_features` is a fraction and `int(max_features * n_features)` features are considered at each split. 709 | - If "auto", then `max_features=n_features`. 710 | - If "sqrt", then `max_features=sqrt(n_features)`. 711 | - If "log2", then `max_features=log2(n_features)`. 712 | - If None, then `max_features=n_features`. 713 | 714 | Note: the search for a split does not stop until at least one valid partition of the node samples is found, even if it requires to effectively inspect more than ``max_features`` features. 715 | 716 | - ```max_leaf_nodes : int, default=None``` 717 | 718 | Grow a tree with ``max_leaf_nodes`` in best-first fashion. 719 | Best nodes are defined as relative reduction in impurity. 720 | If None then unlimited number of leaf nodes. 721 | 722 | - ```min_impurity_decrease : float, default=0.0``` 723 | 724 | A node will be split if this split induces a decrease of the impurity greater than or equal to this value. 725 | 726 | - ```ccp_alpha : non-negative float, default=0.0``` 727 | 728 | Complexity parameter used for Minimal Cost-Complexity Pruning. The subtree with the largest cost complexity that is smaller than ``ccp_alpha`` will be chosen. By default, no pruning is performed. See :ref:`minimal_cost_complexity_pruning` for details. 729 | 730 | #### Attributes: 731 | 732 | - ```n_features_in_ : int``` 733 | 734 | The number of features when :meth:`fit` is performed. 735 | 736 | - ```n_features_out_ : int``` 737 | 738 | The total number of features used to fit the base estimator in the last iteration. The number of output features is equal to the sum of n_features_in_ and n_estimators. 739 | 740 | - ```coef_ : array of shape (n_features_out_, ) or (n_targets, n_features_out_)``` 741 | 742 | Estimated coefficients for the linear regression problem. 743 | If multiple targets are passed during the fit (y 2D), this is a 2D array of shape (n_targets, n_features_out_), while if only one target is passed, this is a 1D array of length n_features_out_. 744 | 745 | - ```intercept_ : float or array of shape (n_targets, )``` 746 | 747 | Independent term in the linear model. Set to 0 if `fit_intercept = False` in `base_estimator` 748 | 749 | - ```classes_ : ndarray of shape (n_classes, )``` 750 | 751 | A list of class labels known to the classifier. 752 | 753 | #### Methods: 754 | 755 | - ```fit(X, y, sample_weight=None)``` 756 | 757 | Build a Linear Boosting from the training set (X, y). 758 | 759 | **Parameters:** 760 | 761 | - `X` : array-like of shape (n_samples, n_features) 762 | 763 | The training input samples. 764 | 765 | - `y` : array-like of shape (n_samples, ) or (n_samples, n_targets) 766 | 767 | Target values. 768 | 769 | - `sample_weight` : array-like of shape (n_samples, ), default=None 770 | 771 | Sample weights. 772 | 773 | **Returns:** 774 | 775 | - `self` : object 776 | 777 | - ```predict(X)``` 778 | 779 | Predict class for X. 780 | 781 | **Parameters:** 782 | 783 | - `X` : array-like of shape (n_samples, n_features) 784 | 785 | Samples. 786 | 787 | **Returns:** 788 | 789 | - `pred` : ndarray of shape (n_samples, ) or also (n_samples, n_targets) if multitarget regression. 790 | 791 | The predicted classes. 792 | 793 | - ```transform(X)``` 794 | 795 | Transform dataset. 796 | 797 | **Parameters:** 798 | 799 | - `X` : array-like of shape (n_samples, n_features) 800 | 801 | Input data to be transformed. Use ``dtype=np.float32`` for maximum efficiency. 
802 | 803 | **Returns:** 804 | 805 | - `X_transformed` : ndarray of shape (n_samples, n_out) 806 | 807 | Transformed dataset. 808 | `n_out` is equal to `n_features` + `n_estimators`. 809 | 810 | ## LinearForestRegressor 811 | ``` 812 | class lineartree.LinearForestRegressor(base_estimator, *, n_estimators=100, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0., max_features="auto", max_leaf_nodes=None, min_impurity_decrease=0., bootstrap=True, oob_score=False, n_jobs=None, random_state=None, ccp_alpha=0.0, max_samples=None) 813 | ``` 814 | 815 | #### Parameters: 816 | 817 | - ```base_estimator : object``` 818 | 819 | The linear estimator fitted on the raw target. 820 | The linear estimator must be a regressor from sklearn.linear_model. 821 | 822 | - ```n_estimators : int, default=100``` 823 | 824 | The number of trees in the forest. 825 | 826 | - ```max_depth : int, default=None``` 827 | 828 | The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain less than min_samples_split samples. 829 | 830 | - ```min_samples_split : int or float, default=2``` 831 | 832 | The minimum number of samples required to split an internal node: 833 | - If int, then consider `min_samples_split` as the minimum number. 834 | - If float, then `min_samples_split` is a fraction and `ceil(min_samples_split * n_samples)` are the minimum number of samples for each split. 835 | 836 | - ```min_samples_leaf : int or float, default=1``` 837 | 838 | The minimum number of samples required to be at a leaf node. 839 | A split point at any depth will only be considered if it leaves at least ``min_samples_leaf`` training samples in each of the left and right branches. This may have the effect of smoothing the model, especially in regression. 840 | - If int, then consider `min_samples_leaf` as the minimum number. 841 | - If float, then `min_samples_leaf` is a fraction and `ceil(min_samples_leaf * n_samples)` are the minimum number of samples for each node. 842 | 843 | - ```min_weight_fraction_leaf : float, default=0.0``` 844 | 845 | The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Samples have equal weight when sample_weight is not provided. 846 | 847 | - ```max_features : {"auto", "sqrt", "log2"}, int or float, default="auto"``` 848 | 849 | The number of features to consider when looking for the best split: 850 | - If int, then consider `max_features` features at each split. 851 | - If float, then `max_features` is a fraction and `round(max_features * n_features)` features are considered at each split. 852 | - If "auto", then `max_features=n_features`. 853 | - If "sqrt", then `max_features=sqrt(n_features)`. 854 | - If "log2", then `max_features=log2(n_features)`. 855 | - If None, then `max_features=n_features` 856 | 857 | Note: the search for a split does not stop until at least one valid partition of the node samples is found, even if it requires to effectively inspect more than ``max_features`` features. 858 | 859 | - ```max_leaf_nodes : int, default=None``` 860 | 861 | Grow trees with ``max_leaf_nodes`` in best-first fashion. Best nodes are defined as relative reduction in impurity. If None then unlimited number of leaf nodes. 862 | 863 | - ```min_impurity_decrease : float, default=0.0``` 864 | 865 | A node will be split if this split induces a decrease of the impurity greater than or equal to this value. 
866 | 867 | - ```bootstrap : bool, default=True``` 868 | 869 | Whether bootstrap samples are used when building trees. If False, the whole dataset is used to build each tree. 870 | 871 | - ```oob_score : bool, default=False``` 872 | 873 | Whether to use out-of-bag samples to estimate the generalization score. Only available if bootstrap=True. 874 | 875 | - ```n_jobs : int, default=None``` 876 | 877 | The number of jobs to run in parallel. :meth:`fit`, :meth:`predict`, :meth:`decision_path` and :meth:`apply` are all parallelized over the trees. ``None`` means 1 unless in a :obj:`joblib.parallel_backend` context. ``-1`` means using all processors. 878 | 879 | - ```random_state : int, RandomState instance or None, default=None``` 880 | 881 | Controls both the randomness of the bootstrapping of the samples used when building trees (if ``bootstrap=True``) and the sampling of the features to consider when looking for the best split at each node (if ``max_features < n_features``). 882 | 883 | - ```ccp_alpha : non-negative float, default=0.0``` 884 | 885 | Complexity parameter used for Minimal Cost-Complexity Pruning. The subtree with the largest cost complexity that is smaller than ``ccp_alpha`` will be chosen. By default, no pruning is performed. See :ref:`minimal_cost_complexity_pruning` for details. 886 | 887 | - ```max_samples : int or float, default=None``` 888 | 889 | If bootstrap is True, the number of samples to draw from X to train each base estimator. 890 | - If None (default), then draw `X.shape[0]` samples. 891 | - If int, then draw `max_samples` samples. 892 | - If float, then draw `max_samples * X.shape[0]` samples. Thus, 893 | `max_samples` should be in the interval `(0, 1]`. 894 | 895 | #### Attributes: 896 | 897 | - ```n_features_in_ : int``` 898 | 899 | The number of features when :meth:`fit` is performed. 900 | 901 | - ```feature_importances_ : ndarray of shape (n_features, )``` 902 | 903 | The impurity-based feature importances. 904 | The higher, the more important the feature. 905 | The importance of a feature is computed as the (normalized) total reduction of the criterion brought by that feature. It is also known as the Gini importance. 906 | 907 | - ```coef_ : array of shape (n_features, ) or (n_targets, n_features)``` 908 | 909 | Estimated coefficients for the linear regression problem. If multiple targets are passed during the fit (y 2D), this is a 2D array of shape (n_targets, n_features), while if only one target is passed, this is a 1D array of length n_features. 910 | 911 | - ```intercept_ : float or array of shape (n_targets,)``` 912 | 913 | Independent term in the linear model. Set to 0 if `fit_intercept = False` in `base_estimator`. 914 | 915 | - ```base_estimator_ : object``` 916 | 917 | A fitted linear model instance. 918 | 919 | - ```forest_estimator_ : object``` 920 | 921 | A fitted random forest instance. 922 | 923 | #### Methods: 924 | 925 | - ```fit(X, y, sample_weight=None)``` 926 | 927 | Build a Linear Forest from the training set (X, y). 928 | 929 | **Parameters:** 930 | 931 | - `X` : array-like of shape (n_samples, n_features) 932 | 933 | The training input samples. 934 | 935 | - `y` : array-like of shape (n_samples, ) or (n_samples, n_targets) 936 | 937 | Target values. 938 | 939 | - `sample_weight` : array-like of shape (n_samples, ), default=None 940 | 941 | Sample weights. 942 | 943 | **Returns:** 944 | 945 | - `self` : object 946 | 947 | - ```predict(X)``` 948 | 949 | Predict regression target for X. 
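As a quick orientation, a minimal Linear Forest sketch (synthetic data; `Ridge`, `n_estimators=100` and `random_state=0` are arbitrary example choices):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from lineartree import LinearForestRegressor

# Synthetic data, for illustration only
X, y = make_regression(n_samples=1000, n_features=10, n_informative=4, random_state=0)

# Ridge fits the raw target; a random forest then models its residuals
regr = LinearForestRegressor(base_estimator=Ridge(), n_estimators=100, random_state=0)
regr.fit(X, y)

print(regr.predict(X[:5]))        # linear prediction plus the forest's residual correction
print(regr.coef_[:3])             # coefficients of the fitted linear part
print(regr.feature_importances_)  # impurity-based importances from the forest part
```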
950 | 951 | **Parameters:** 952 | 953 | - `X` : array-like of shape (n_samples, n_features) 954 | 955 | Samples. 956 | 957 | **Returns:** 958 | 959 | - `pred` : ndarray of shape (n_samples, ) or also (n_samples, n_targets) if multitarget regression. 960 | 961 | The predicted values. 962 | 963 | - ```apply(X)``` 964 | 965 | Apply trees in the forest to X, return leaf indices. 966 | 967 | **Parameters:** 968 | 969 | - `X` : array-like of shape (n_samples, n_features) 970 | 971 | The input samples. 972 | 973 | **Returns:** 974 | 975 | - `X_leaves` : array-like of shape (n_samples, n_estimators). 976 | 977 | For each datapoint x in X and for each tree in the forest, return the index of the leaf x ends up in. 978 | 979 | - ```decision_path(X)``` 980 | 981 | Return the decision path in the forest. 982 | 983 | **Parameters:** 984 | 985 | - `X` : array-like of shape (n_samples, n_features) 986 | 987 | The input samples. 988 | 989 | **Returns:** 990 | 991 | - `indicator` : sparse matrix of shape (n_samples, n_nodes) 992 | 993 | Return a node indicator matrix where non zero elements indicates that the samples goes through the nodes. The matrix is of CSR format. 994 | 995 | - `n_nodes_ptr` : ndarray of shape (n_estimators + 1, ) 996 | 997 | The columns from indicator[n_nodes_ptr[i]:n_nodes_ptr[i+1]] gives the indicator value for the i-th estimator. 998 | 999 | ## LinearForestClassifier 1000 | ``` 1001 | class lineartree.LinearForestClassifier(base_estimator, *, n_estimators=100, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0., max_features="auto", max_leaf_nodes=None, min_impurity_decrease=0., bootstrap=True, oob_score=False, n_jobs=None, random_state=None, ccp_alpha=0.0, max_samples=None) 1002 | ``` 1003 | 1004 | #### Parameters: 1005 | 1006 | - ```base_estimator : object``` 1007 | 1008 | The linear estimator fitted on the raw target. 1009 | The linear estimator must be a regressor from sklearn.linear_model. 1010 | 1011 | - ```n_estimators : int, default=100``` 1012 | 1013 | The number of trees in the forest. 1014 | 1015 | - ```max_depth : int, default=None``` 1016 | 1017 | The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain less than min_samples_split samples. 1018 | 1019 | - ```min_samples_split : int or float, default=2``` 1020 | 1021 | The minimum number of samples required to split an internal node: 1022 | - If int, then consider `min_samples_split` as the minimum number. 1023 | - If float, then `min_samples_split` is a fraction and `ceil(min_samples_split * n_samples)` are the minimum number of samples for each split. 1024 | 1025 | - ```min_samples_leaf : int or float, default=1``` 1026 | 1027 | The minimum number of samples required to be at a leaf node. 1028 | A split point at any depth will only be considered if it leaves at least ``min_samples_leaf`` training samples in each of the left and right branches. This may have the effect of smoothing the model, especially in regression. 1029 | - If int, then consider `min_samples_leaf` as the minimum number. 1030 | - If float, then `min_samples_leaf` is a fraction and `ceil(min_samples_leaf * n_samples)` are the minimum number of samples for each node. 1031 | 1032 | - ```min_weight_fraction_leaf : float, default=0.0``` 1033 | 1034 | The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Samples have equal weight when sample_weight is not provided. 
1035 | 1036 | - ```max_features : {"auto", "sqrt", "log2"}, int or float, default="auto"``` 1037 | 1038 | The number of features to consider when looking for the best split: 1039 | - If int, then consider `max_features` features at each split. 1040 | - If float, then `max_features` is a fraction and `round(max_features * n_features)` features are considered at each split. 1041 | - If "auto", then `max_features=n_features`. 1042 | - If "sqrt", then `max_features=sqrt(n_features)`. 1043 | - If "log2", then `max_features=log2(n_features)`. 1044 | - If None, then `max_features=n_features`. 1045 | Note: the search for a split does not stop until at least one 1046 | valid partition of the node samples is found, even if it requires to 1047 | effectively inspect more than ``max_features`` features. 1048 | 1049 | - ```max_leaf_nodes : int, default=None``` 1050 | 1051 | Grow trees with ``max_leaf_nodes`` in best-first fashion. Best nodes are defined as relative reduction in impurity. If None then unlimited number of leaf nodes. 1052 | 1053 | - ```min_impurity_decrease : float, default=0.0``` 1054 | 1055 | A node will be split if this split induces a decrease of the impurity greater than or equal to this value. 1056 | 1057 | - ```bootstrap : bool, default=True``` 1058 | 1059 | Whether bootstrap samples are used when building trees. If False, the whole dataset is used to build each tree. 1060 | 1061 | - ```oob_score : bool, default=False``` 1062 | 1063 | Whether to use out-of-bag samples to estimate the generalization score. Only available if bootstrap=True. 1064 | 1065 | - ```n_jobs : int, default=None``` 1066 | 1067 | The number of jobs to run in parallel. :meth:`fit`, :meth:`predict`, :meth:`decision_path` and :meth:`apply` are all parallelized over the trees. ``None`` means 1 unless in a :obj:`joblib.parallel_backend` context. ``-1`` means using all processors. 1068 | 1069 | - ```random_state : int, RandomState instance or None, default=None``` 1070 | 1071 | Controls both the randomness of the bootstrapping of the samples used when building trees (if ``bootstrap=True``) and the sampling of the features to consider when looking for the best split at each node (if ``max_features < n_features``). 1072 | 1073 | - ```ccp_alpha : non-negative float, default=0.0``` 1074 | 1075 | Complexity parameter used for Minimal Cost-Complexity Pruning. The subtree with the largest cost complexity that is smaller than ``ccp_alpha`` will be chosen. By default, no pruning is performed. See :ref:`minimal_cost_complexity_pruning` for details. 1076 | 1077 | - ```max_samples : int or float, default=None``` 1078 | 1079 | If bootstrap is True, the number of samples to draw from X to train each base estimator. 1080 | - If None (default), then draw `X.shape[0]` samples. 1081 | - If int, then draw `max_samples` samples. 1082 | - If float, then draw `max_samples * X.shape[0]` samples. Thus, 1083 | `max_samples` should be in the interval `(0, 1]`. 1084 | 1085 | #### Attributes: 1086 | 1087 | - ```n_features_in_ : int``` 1088 | 1089 | The number of features when :meth:`fit` is performed. 1090 | 1091 | - ```feature_importances_ : ndarray of shape (n_features, )``` 1092 | 1093 | The impurity-based feature importances. 1094 | The higher, the more important the feature. 1095 | The importance of a feature is computed as the (normalized) total reduction of the criterion brought by that feature. It is also known as the Gini importance. 
1096 | 1097 | - ```coef_ : array of shape (n_features, ) or (n_targets, n_features)``` 1098 | 1099 | Estimated coefficients for the linear regression problem. If multiple targets are passed during the fit (y 2D), this is a 2D array of shape (n_targets, n_features), while if only one target is passed, this is a 1D array of length n_features. 1100 | 1101 | - ```intercept_ : float or array of shape (n_targets,)``` 1102 | 1103 | Independent term in the linear model. Set to 0 if `fit_intercept = False` in `base_estimator`. 1104 | 1105 | - ```classes_ : ndarray of shape (n_classes, )``` 1106 | 1107 | A list of class labels known to the classifier. 1108 | 1109 | - ```base_estimator_ : object``` 1110 | 1111 | A fitted linear model instance. 1112 | 1113 | - ```forest_estimator_ : object``` 1114 | 1115 | A fitted random forest instance. 1116 | 1117 | #### Methods: 1118 | 1119 | - ```fit(X, y, sample_weight=None)``` 1120 | 1121 | Build a Linear Forest from the training set (X, y). 1122 | 1123 | **Parameters:** 1124 | 1125 | - `X` : array-like of shape (n_samples, n_features) 1126 | 1127 | The training input samples. 1128 | 1129 | - `y` : array-like of shape (n_samples, ) or (n_samples, n_targets) 1130 | 1131 | Target values. 1132 | 1133 | - `sample_weight` : array-like of shape (n_samples, ), default=None 1134 | 1135 | Sample weights. 1136 | 1137 | **Returns:** 1138 | 1139 | - `self` : 1140 | 1141 | - ```decision_function(X)``` 1142 | 1143 | Predict confidence scores for samples. 1144 | The confidence score for a sample is proportional to the signed distance of that sample to the hyperplane. 1145 | 1146 | **Parameters:** 1147 | 1148 | - `X` : array-like of shape (n_samples, n_features) 1149 | 1150 | Samples. 1151 | 1152 | **Returns:** 1153 | 1154 | - `pred` : ndarray of shape (n_samples, ). 1155 | 1156 | Confidence scores. 1157 | Confidence score for self.classes_[1] where >0 means this class would be predicted. 1158 | 1159 | - ```predict(X)``` 1160 | 1161 | Predict class for X. 1162 | 1163 | **Parameters:** 1164 | 1165 | - `X` : array-like of shape (n_samples, n_features) 1166 | 1167 | Samples. 1168 | 1169 | **Returns:** 1170 | 1171 | - `pred` : ndarray of shape (n_samples, ). 1172 | 1173 | The predicted classes. 1174 | 1175 | - ```predict_proba(X)``` 1176 | 1177 | Predict class probabilities for X. 1178 | 1179 | **Parameters:** 1180 | 1181 | - `X` : array-like of shape (n_samples, n_features) 1182 | 1183 | Samples. 1184 | 1185 | **Returns:** 1186 | 1187 | - `proba` : ndarray of shape (n_samples, n_classes). 1188 | 1189 | The class probabilities of the input samples. The order of the classes corresponds to that in the attribute :term:`classes_`. 1190 | 1191 | - ```predict_log_proba(X)``` 1192 | 1193 | Predict class log-probabilities for X. 1194 | 1195 | **Parameters:** 1196 | 1197 | - `X` : array-like of shape (n_samples, n_features) 1198 | 1199 | Samples. 1200 | 1201 | **Returns:** 1202 | 1203 | - `pred` : ndarray of shape (n_samples, n_classes). 1204 | 1205 | The class log-probabilities of the input samples. The order of the classes corresponds to that in the attribute :term:`classes_`. 1206 | 1207 | - ```apply(X)``` 1208 | 1209 | Apply trees in the forest to X, return leaf indices. 1210 | 1211 | **Parameters:** 1212 | 1213 | - `X` : array-like of shape (n_samples, n_features) 1214 | 1215 | The input samples. 1216 | 1217 | **Returns:** 1218 | 1219 | - `X_leaves` : array-like of shape (n_samples, n_estimators). 
1220 | 1221 | For each datapoint x in X and for each tree in the forest, return the index of the leaf x ends up in. 1222 | 1223 | - ```decision_path(X)``` 1224 | 1225 | Return the decision path in the forest. 1226 | 1227 | **Parameters:** 1228 | 1229 | - `X` : array-like of shape (n_samples, n_features) 1230 | 1231 | The input samples. 1232 | 1233 | **Returns:** 1234 | 1235 | - `indicator` : sparse matrix of shape (n_samples, n_nodes) 1236 | 1237 | Return a node indicator matrix where non zero elements indicates that the samples goes through the nodes. The matrix is of CSR format. 1238 | 1239 | - `n_nodes_ptr` : ndarray of shape (n_estimators + 1, ) 1240 | 1241 | The columns from indicator[n_nodes_ptr[i]:n_nodes_ptr[i+1]] gives the indicator value for the i-th estimator. -------------------------------------------------------------------------------- /notebooks/usage-LinearBoost.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 1, 6 | "metadata": {}, 7 | "outputs": [], 8 | "source": [ 9 | "import numpy as np\n", 10 | "\n", 11 | "from sklearn.linear_model import *\n", 12 | "from lineartree import LinearBoostClassifier, LinearBoostRegressor\n", 13 | "\n", 14 | "from sklearn.datasets import make_classification, make_regression\n", 15 | "\n", 16 | "import warnings\n", 17 | "warnings.simplefilter('ignore')" 18 | ] 19 | }, 20 | { 21 | "cell_type": "markdown", 22 | "metadata": {}, 23 | "source": [ 24 | "# REGRESSION" 25 | ] 26 | }, 27 | { 28 | "cell_type": "code", 29 | "execution_count": 2, 30 | "metadata": {}, 31 | "outputs": [ 32 | { 33 | "data": { 34 | "text/plain": [ 35 | "((8000, 15), (8000,))" 36 | ] 37 | }, 38 | "execution_count": 2, 39 | "metadata": {}, 40 | "output_type": "execute_result" 41 | } 42 | ], 43 | "source": [ 44 | "n_sample, n_features = 8000, 15\n", 45 | "X, y = make_regression(n_samples=n_sample, n_features=n_features, n_targets=1, \n", 46 | " n_informative=5, shuffle=True, random_state=33)\n", 47 | "\n", 48 | "X.shape, y.shape" 49 | ] 50 | }, 51 | { 52 | "cell_type": "markdown", 53 | "metadata": {}, 54 | "source": [ 55 | "### default configuration" 56 | ] 57 | }, 58 | { 59 | "cell_type": "code", 60 | "execution_count": 3, 61 | "metadata": {}, 62 | "outputs": [ 63 | { 64 | "data": { 65 | "text/plain": [ 66 | "LinearBoostRegressor(base_estimator=Ridge())" 67 | ] 68 | }, 69 | "execution_count": 3, 70 | "metadata": {}, 71 | "output_type": "execute_result" 72 | } 73 | ], 74 | "source": [ 75 | "regr = LinearBoostRegressor(Ridge(), loss='linear')\n", 76 | "regr.fit(X, y)" 77 | ] 78 | }, 79 | { 80 | "cell_type": "code", 81 | "execution_count": 4, 82 | "metadata": {}, 83 | "outputs": [ 84 | { 85 | "data": { 86 | "text/plain": [ 87 | "((8000, 25), (8000,), 0.9999998985375346)" 88 | ] 89 | }, 90 | "execution_count": 4, 91 | "metadata": {}, 92 | "output_type": "execute_result" 93 | } 94 | ], 95 | "source": [ 96 | "regr.transform(X).shape, regr.predict(X).shape, regr.score(X, y)" 97 | ] 98 | }, 99 | { 100 | "cell_type": "markdown", 101 | "metadata": {}, 102 | "source": [ 103 | "### square loss" 104 | ] 105 | }, 106 | { 107 | "cell_type": "code", 108 | "execution_count": 5, 109 | "metadata": {}, 110 | "outputs": [ 111 | { 112 | "data": { 113 | "text/plain": [ 114 | "LinearBoostRegressor(base_estimator=Ridge(), loss='square', n_estimators=50)" 115 | ] 116 | }, 117 | "execution_count": 5, 118 | "metadata": {}, 119 | "output_type": "execute_result" 120 | } 121 | ], 122 | 
"source": [ 123 | "regr = LinearBoostRegressor(Ridge(), loss='square', n_estimators=50)\n", 124 | "regr.fit(X, y)" 125 | ] 126 | }, 127 | { 128 | "cell_type": "code", 129 | "execution_count": 6, 130 | "metadata": {}, 131 | "outputs": [ 132 | { 133 | "data": { 134 | "text/plain": [ 135 | "((8000, 65), (8000,), 0.9999998980115891)" 136 | ] 137 | }, 138 | "execution_count": 6, 139 | "metadata": {}, 140 | "output_type": "execute_result" 141 | } 142 | ], 143 | "source": [ 144 | "regr.transform(X).shape, regr.predict(X).shape, regr.score(X, y)" 145 | ] 146 | }, 147 | { 148 | "cell_type": "markdown", 149 | "metadata": {}, 150 | "source": [ 151 | "### absolute loss" 152 | ] 153 | }, 154 | { 155 | "cell_type": "code", 156 | "execution_count": 7, 157 | "metadata": {}, 158 | "outputs": [ 159 | { 160 | "data": { 161 | "text/plain": [ 162 | "LinearBoostRegressor(base_estimator=Ridge(), loss='absolute', n_estimators=50)" 163 | ] 164 | }, 165 | "execution_count": 7, 166 | "metadata": {}, 167 | "output_type": "execute_result" 168 | } 169 | ], 170 | "source": [ 171 | "regr = LinearBoostRegressor(Ridge(), loss='absolute', n_estimators=50)\n", 172 | "regr.fit(X, y)" 173 | ] 174 | }, 175 | { 176 | "cell_type": "code", 177 | "execution_count": 8, 178 | "metadata": {}, 179 | "outputs": [ 180 | { 181 | "data": { 182 | "text/plain": [ 183 | "((8000, 65), (8000,), 0.9999998009507592)" 184 | ] 185 | }, 186 | "execution_count": 8, 187 | "metadata": {}, 188 | "output_type": "execute_result" 189 | } 190 | ], 191 | "source": [ 192 | "regr.transform(X).shape, regr.predict(X).shape, regr.score(X, y)" 193 | ] 194 | }, 195 | { 196 | "cell_type": "markdown", 197 | "metadata": {}, 198 | "source": [ 199 | "### exponential loss" 200 | ] 201 | }, 202 | { 203 | "cell_type": "code", 204 | "execution_count": 9, 205 | "metadata": {}, 206 | "outputs": [ 207 | { 208 | "data": { 209 | "text/plain": [ 210 | "LinearBoostRegressor(base_estimator=Ridge(), loss='exponential',\n", 211 | " n_estimators=50)" 212 | ] 213 | }, 214 | "execution_count": 9, 215 | "metadata": {}, 216 | "output_type": "execute_result" 217 | } 218 | ], 219 | "source": [ 220 | "regr = LinearBoostRegressor(Ridge(), loss='exponential', n_estimators=50)\n", 221 | "regr.fit(X, y)" 222 | ] 223 | }, 224 | { 225 | "cell_type": "code", 226 | "execution_count": 10, 227 | "metadata": {}, 228 | "outputs": [ 229 | { 230 | "data": { 231 | "text/plain": [ 232 | "((8000, 65), (8000,), 0.9999998597027936)" 233 | ] 234 | }, 235 | "execution_count": 10, 236 | "metadata": {}, 237 | "output_type": "execute_result" 238 | } 239 | ], 240 | "source": [ 241 | "regr.transform(X).shape, regr.predict(X).shape, regr.score(X, y)" 242 | ] 243 | }, 244 | { 245 | "cell_type": "markdown", 246 | "metadata": {}, 247 | "source": [ 248 | "### multi-target regression with weights " 249 | ] 250 | }, 251 | { 252 | "cell_type": "code", 253 | "execution_count": 11, 254 | "metadata": {}, 255 | "outputs": [ 256 | { 257 | "data": { 258 | "text/plain": [ 259 | "((8000, 15), (8000, 2))" 260 | ] 261 | }, 262 | "execution_count": 11, 263 | "metadata": {}, 264 | "output_type": "execute_result" 265 | } 266 | ], 267 | "source": [ 268 | "n_sample, n_features = 8000, 15\n", 269 | "X, y = make_regression(n_samples=n_sample, n_features=n_features, n_targets=2, \n", 270 | " n_informative=5, shuffle=True, random_state=33)\n", 271 | "W = np.random.uniform(1,3, (n_sample,))\n", 272 | "\n", 273 | "X.shape, y.shape" 274 | ] 275 | }, 276 | { 277 | "cell_type": "code", 278 | "execution_count": 12, 279 | "metadata": {}, 280 | 
"outputs": [ 281 | { 282 | "data": { 283 | "text/plain": [ 284 | "LinearBoostRegressor(base_estimator=Ridge(), n_estimators=50)" 285 | ] 286 | }, 287 | "execution_count": 12, 288 | "metadata": {}, 289 | "output_type": "execute_result" 290 | } 291 | ], 292 | "source": [ 293 | "regr = LinearBoostRegressor(Ridge(), loss='linear', n_estimators=50)\n", 294 | "regr.fit(X, y, W)" 295 | ] 296 | }, 297 | { 298 | "cell_type": "code", 299 | "execution_count": 13, 300 | "metadata": {}, 301 | "outputs": [ 302 | { 303 | "data": { 304 | "text/plain": [ 305 | "((8000, 65), (8000, 2), 0.999999867971615)" 306 | ] 307 | }, 308 | "execution_count": 13, 309 | "metadata": {}, 310 | "output_type": "execute_result" 311 | } 312 | ], 313 | "source": [ 314 | "regr.transform(X).shape, regr.predict(X).shape, regr.score(X, y)" 315 | ] 316 | }, 317 | { 318 | "cell_type": "markdown", 319 | "metadata": {}, 320 | "source": [ 321 | "# CLASSIFICATION" 322 | ] 323 | }, 324 | { 325 | "cell_type": "code", 326 | "execution_count": 14, 327 | "metadata": {}, 328 | "outputs": [ 329 | { 330 | "data": { 331 | "text/plain": [ 332 | "((8000, 15), (8000,))" 333 | ] 334 | }, 335 | "execution_count": 14, 336 | "metadata": {}, 337 | "output_type": "execute_result" 338 | } 339 | ], 340 | "source": [ 341 | "n_sample, n_features = 8000, 15\n", 342 | "X, y = make_classification(n_samples=n_sample, n_features=n_features, n_classes=3, \n", 343 | " n_redundant=4, n_informative=5,\n", 344 | " n_clusters_per_class=1,\n", 345 | " shuffle=True, random_state=33)\n", 346 | "\n", 347 | "X.shape, y.shape" 348 | ] 349 | }, 350 | { 351 | "cell_type": "markdown", 352 | "metadata": {}, 353 | "source": [ 354 | "### default configuration " 355 | ] 356 | }, 357 | { 358 | "cell_type": "code", 359 | "execution_count": 15, 360 | "metadata": {}, 361 | "outputs": [ 362 | { 363 | "data": { 364 | "text/plain": [ 365 | "LinearBoostClassifier(base_estimator=RidgeClassifier())" 366 | ] 367 | }, 368 | "execution_count": 15, 369 | "metadata": {}, 370 | "output_type": "execute_result" 371 | } 372 | ], 373 | "source": [ 374 | "clf = LinearBoostClassifier(RidgeClassifier(), loss='hamming')\n", 375 | "clf.fit(X, y)" 376 | ] 377 | }, 378 | { 379 | "cell_type": "code", 380 | "execution_count": 16, 381 | "metadata": {}, 382 | "outputs": [ 383 | { 384 | "data": { 385 | "text/plain": [ 386 | "((8000, 25), (8000,), (8000, 3), 0.81775)" 387 | ] 388 | }, 389 | "execution_count": 16, 390 | "metadata": {}, 391 | "output_type": "execute_result" 392 | } 393 | ], 394 | "source": [ 395 | "clf.transform(X).shape, clf.predict(X).shape, clf.predict_proba(X).shape, clf.score(X, y)" 396 | ] 397 | }, 398 | { 399 | "cell_type": "markdown", 400 | "metadata": {}, 401 | "source": [ 402 | "### entropy loss " 403 | ] 404 | }, 405 | { 406 | "cell_type": "code", 407 | "execution_count": 17, 408 | "metadata": {}, 409 | "outputs": [ 410 | { 411 | "data": { 412 | "text/plain": [ 413 | "LinearBoostClassifier(base_estimator=LogisticRegression(), loss='entropy',\n", 414 | " n_estimators=50)" 415 | ] 416 | }, 417 | "execution_count": 17, 418 | "metadata": {}, 419 | "output_type": "execute_result" 420 | } 421 | ], 422 | "source": [ 423 | "clf = LinearBoostClassifier(LogisticRegression(), loss='entropy', n_estimators=50)\n", 424 | "clf.fit(X, y)" 425 | ] 426 | }, 427 | { 428 | "cell_type": "code", 429 | "execution_count": 18, 430 | "metadata": {}, 431 | "outputs": [ 432 | { 433 | "data": { 434 | "text/plain": [ 435 | "((8000, 65), (8000,), (8000, 3), 0.844)" 436 | ] 437 | }, 438 | "execution_count": 18, 439 | 
"metadata": {}, 440 | "output_type": "execute_result" 441 | } 442 | ], 443 | "source": [ 444 | "clf.transform(X).shape, clf.predict(X).shape, clf.predict_proba(X).shape, clf.score(X, y)" 445 | ] 446 | } 447 | ], 448 | "metadata": { 449 | "kernelspec": { 450 | "display_name": "Python [conda env:prova]", 451 | "language": "python", 452 | "name": "conda-env-prova-py" 453 | }, 454 | "language_info": { 455 | "codemirror_mode": { 456 | "name": "ipython", 457 | "version": 3 458 | }, 459 | "file_extension": ".py", 460 | "mimetype": "text/x-python", 461 | "name": "python", 462 | "nbconvert_exporter": "python", 463 | "pygments_lexer": "ipython3", 464 | "version": "3.7.7" 465 | } 466 | }, 467 | "nbformat": 4, 468 | "nbformat_minor": 2 469 | } 470 | -------------------------------------------------------------------------------- /notebooks/usage-LinearForest.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 1, 6 | "metadata": {}, 7 | "outputs": [], 8 | "source": [ 9 | "import numpy as np\n", 10 | "\n", 11 | "from sklearn.linear_model import *\n", 12 | "from lineartree import LinearForestClassifier, LinearForestRegressor\n", 13 | "\n", 14 | "from sklearn.datasets import make_classification, make_regression\n", 15 | "\n", 16 | "import warnings\n", 17 | "warnings.simplefilter('ignore')" 18 | ] 19 | }, 20 | { 21 | "cell_type": "markdown", 22 | "metadata": {}, 23 | "source": [ 24 | "# REGRESSION" 25 | ] 26 | }, 27 | { 28 | "cell_type": "code", 29 | "execution_count": 2, 30 | "metadata": {}, 31 | "outputs": [ 32 | { 33 | "data": { 34 | "text/plain": [ 35 | "((8000, 15), (8000,))" 36 | ] 37 | }, 38 | "execution_count": 2, 39 | "metadata": {}, 40 | "output_type": "execute_result" 41 | } 42 | ], 43 | "source": [ 44 | "n_sample, n_features = 8000, 15\n", 45 | "X, y = make_regression(n_samples=n_sample, n_features=n_features, n_targets=1, \n", 46 | " n_informative=5, shuffle=True, random_state=33)\n", 47 | "\n", 48 | "X.shape, y.shape" 49 | ] 50 | }, 51 | { 52 | "cell_type": "code", 53 | "execution_count": 3, 54 | "metadata": {}, 55 | "outputs": [ 56 | { 57 | "data": { 58 | "text/plain": [ 59 | "LinearForestRegressor(base_estimator=Ridge())" 60 | ] 61 | }, 62 | "execution_count": 3, 63 | "metadata": {}, 64 | "output_type": "execute_result" 65 | } 66 | ], 67 | "source": [ 68 | "regr = LinearForestRegressor(Ridge())\n", 69 | "regr.fit(X, y)" 70 | ] 71 | }, 72 | { 73 | "cell_type": "code", 74 | "execution_count": 4, 75 | "metadata": {}, 76 | "outputs": [ 77 | { 78 | "data": { 79 | "text/plain": [ 80 | "((8000,), (8000, 100), (101,), 0.9999999999365181)" 81 | ] 82 | }, 83 | "execution_count": 4, 84 | "metadata": {}, 85 | "output_type": "execute_result" 86 | } 87 | ], 88 | "source": [ 89 | "regr.predict(X).shape, regr.apply(X).shape, regr.decision_path(X)[-1].shape, regr.score(X,y)" 90 | ] 91 | }, 92 | { 93 | "cell_type": "markdown", 94 | "metadata": {}, 95 | "source": [ 96 | "### multi-target regression with weights " 97 | ] 98 | }, 99 | { 100 | "cell_type": "code", 101 | "execution_count": 5, 102 | "metadata": {}, 103 | "outputs": [ 104 | { 105 | "data": { 106 | "text/plain": [ 107 | "((8000, 15), (8000, 2))" 108 | ] 109 | }, 110 | "execution_count": 5, 111 | "metadata": {}, 112 | "output_type": "execute_result" 113 | } 114 | ], 115 | "source": [ 116 | "n_sample, n_features = 8000, 15\n", 117 | "X, y = make_regression(n_samples=n_sample, n_features=n_features, n_targets=2, \n", 118 | " n_informative=5, 
shuffle=True, random_state=33)\n", 119 | "W = np.random.uniform(1,3, (n_sample,))\n", 120 | "\n", 121 | "X.shape, y.shape" 122 | ] 123 | }, 124 | { 125 | "cell_type": "code", 126 | "execution_count": 6, 127 | "metadata": {}, 128 | "outputs": [ 129 | { 130 | "data": { 131 | "text/plain": [ 132 | "LinearForestRegressor(base_estimator=Ridge())" 133 | ] 134 | }, 135 | "execution_count": 6, 136 | "metadata": {}, 137 | "output_type": "execute_result" 138 | } 139 | ], 140 | "source": [ 141 | "regr = LinearForestRegressor(Ridge())\n", 142 | "regr.fit(X, y, W)" 143 | ] 144 | }, 145 | { 146 | "cell_type": "code", 147 | "execution_count": 7, 148 | "metadata": {}, 149 | "outputs": [ 150 | { 151 | "data": { 152 | "text/plain": [ 153 | "((8000, 2), (8000, 100), (101,), 0.999999999979502)" 154 | ] 155 | }, 156 | "execution_count": 7, 157 | "metadata": {}, 158 | "output_type": "execute_result" 159 | } 160 | ], 161 | "source": [ 162 | "regr.predict(X).shape, regr.apply(X).shape, regr.decision_path(X)[-1].shape, regr.score(X,y)" 163 | ] 164 | }, 165 | { 166 | "cell_type": "markdown", 167 | "metadata": {}, 168 | "source": [ 169 | "# BINARY CLASSIFICATION" 170 | ] 171 | }, 172 | { 173 | "cell_type": "code", 174 | "execution_count": 8, 175 | "metadata": {}, 176 | "outputs": [ 177 | { 178 | "data": { 179 | "text/plain": [ 180 | "((8000, 15), (8000,))" 181 | ] 182 | }, 183 | "execution_count": 8, 184 | "metadata": {}, 185 | "output_type": "execute_result" 186 | } 187 | ], 188 | "source": [ 189 | "n_sample, n_features = 8000, 15\n", 190 | "X, y = make_classification(n_samples=n_sample, n_features=n_features, n_classes=2, \n", 191 | " n_redundant=4, n_informative=5,\n", 192 | " n_clusters_per_class=1,\n", 193 | " shuffle=True, random_state=33)\n", 194 | "\n", 195 | "X.shape, y.shape" 196 | ] 197 | }, 198 | { 199 | "cell_type": "markdown", 200 | "metadata": {}, 201 | "source": [ 202 | "### default configuration" 203 | ] 204 | }, 205 | { 206 | "cell_type": "code", 207 | "execution_count": 9, 208 | "metadata": {}, 209 | "outputs": [ 210 | { 211 | "data": { 212 | "text/plain": [ 213 | "LinearForestClassifier(base_estimator=Ridge())" 214 | ] 215 | }, 216 | "execution_count": 9, 217 | "metadata": {}, 218 | "output_type": "execute_result" 219 | } 220 | ], 221 | "source": [ 222 | "clf = LinearForestClassifier(Ridge())\n", 223 | "clf.fit(X, y)" 224 | ] 225 | }, 226 | { 227 | "cell_type": "code", 228 | "execution_count": 10, 229 | "metadata": {}, 230 | "outputs": [ 231 | { 232 | "data": { 233 | "text/plain": [ 234 | "((8000,), (8000, 2), (8000, 100), (101,), 1.0)" 235 | ] 236 | }, 237 | "execution_count": 10, 238 | "metadata": {}, 239 | "output_type": "execute_result" 240 | } 241 | ], 242 | "source": [ 243 | "clf.predict(X).shape, clf.predict_proba(X).shape, clf.apply(X).shape, clf.decision_path(X)[-1].shape, clf.score(X,y)" 244 | ] 245 | }, 246 | { 247 | "cell_type": "markdown", 248 | "metadata": {}, 249 | "source": [ 250 | "# MULTI-CLASS CLASSIFICATION" 251 | ] 252 | }, 253 | { 254 | "cell_type": "code", 255 | "execution_count": 11, 256 | "metadata": {}, 257 | "outputs": [ 258 | { 259 | "data": { 260 | "text/plain": [ 261 | "((8000, 15), (8000,))" 262 | ] 263 | }, 264 | "execution_count": 11, 265 | "metadata": {}, 266 | "output_type": "execute_result" 267 | } 268 | ], 269 | "source": [ 270 | "n_sample, n_features = 8000, 15\n", 271 | "X, y = make_classification(n_samples=n_sample, n_features=n_features, n_classes=3, \n", 272 | " n_redundant=4, n_informative=5,\n", 273 | " n_clusters_per_class=1,\n", 274 | " shuffle=True, 
random_state=33)\n", 275 | "\n", 276 | "X.shape, y.shape" 277 | ] 278 | }, 279 | { 280 | "cell_type": "markdown", 281 | "metadata": {}, 282 | "source": [ 283 | "### default configuration" 284 | ] 285 | }, 286 | { 287 | "cell_type": "code", 288 | "execution_count": 12, 289 | "metadata": {}, 290 | "outputs": [ 291 | { 292 | "data": { 293 | "text/plain": [ 294 | "OneVsRestClassifier(estimator=LinearForestClassifier(base_estimator=Ridge()))" 295 | ] 296 | }, 297 | "execution_count": 12, 298 | "metadata": {}, 299 | "output_type": "execute_result" 300 | } 301 | ], 302 | "source": [ 303 | "from sklearn.multiclass import OneVsRestClassifier\n", 304 | "\n", 305 | "clf = OneVsRestClassifier(LinearForestClassifier(Ridge()))\n", 306 | "clf.fit(X, y)" 307 | ] 308 | }, 309 | { 310 | "cell_type": "code", 311 | "execution_count": 13, 312 | "metadata": {}, 313 | "outputs": [ 314 | { 315 | "data": { 316 | "text/plain": [ 317 | "((8000,), (8000, 3), 1.0)" 318 | ] 319 | }, 320 | "execution_count": 13, 321 | "metadata": {}, 322 | "output_type": "execute_result" 323 | } 324 | ], 325 | "source": [ 326 | "clf.predict(X).shape, clf.predict_proba(X).shape, clf.score(X,y)" 327 | ] 328 | } 329 | ], 330 | "metadata": { 331 | "kernelspec": { 332 | "display_name": "Python [conda env:prova]", 333 | "language": "python", 334 | "name": "conda-env-prova-py" 335 | }, 336 | "language_info": { 337 | "codemirror_mode": { 338 | "name": "ipython", 339 | "version": 3 340 | }, 341 | "file_extension": ".py", 342 | "mimetype": "text/x-python", 343 | "name": "python", 344 | "nbconvert_exporter": "python", 345 | "pygments_lexer": "ipython3", 346 | "version": "3.7.7" 347 | } 348 | }, 349 | "nbformat": 4, 350 | "nbformat_minor": 2 351 | } 352 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | numpy 2 | scipy 3 | scikit-learn>=0.24.2 4 | -------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- 1 | import pathlib 2 | from setuptools import setup, find_packages 3 | 4 | HERE = pathlib.Path(__file__).parent 5 | 6 | VERSION = '0.3.5' 7 | PACKAGE_NAME = 'linear-tree' 8 | AUTHOR = 'Marco Cerliani' 9 | AUTHOR_EMAIL = 'cerlymarco@gmail.com' 10 | URL = 'https://github.com/cerlymarco/linear-tree' 11 | 12 | LICENSE = 'MIT' 13 | DESCRIPTION = 'A python library to build Model Trees with Linear Models at the leaves.' 14 | LONG_DESCRIPTION = (HERE / "README.md").read_text() 15 | LONG_DESC_TYPE = "text/markdown" 16 | 17 | INSTALL_REQUIRES = [ 18 | 'scikit-learn>=0.24.2', 19 | 'numpy', 20 | 'scipy' 21 | ] 22 | 23 | setup(name=PACKAGE_NAME, 24 | version=VERSION, 25 | description=DESCRIPTION, 26 | long_description=LONG_DESCRIPTION, 27 | long_description_content_type=LONG_DESC_TYPE, 28 | author=AUTHOR, 29 | license=LICENSE, 30 | author_email=AUTHOR_EMAIL, 31 | url=URL, 32 | install_requires=INSTALL_REQUIRES, 33 | python_requires='>=3', 34 | packages=find_packages() 35 | ) --------------------------------------------------------------------------------