├── MachineLearning.md
└── README.md

/MachineLearning.md:

---
name: MachineLearning
topic: Machine Learning & Statistical Learning
maintainer: Torsten Hothorn, Hannah Frick, Lucas Kook
email: Torsten.Hothorn@R-project.org
version: 2025-06-03
source: https://github.com/cran-task-views/MachineLearning/
---

Several add-on packages implement ideas and methods developed at the
interface between computer science and statistics; this field of
research is usually referred to as machine learning. The packages can be
roughly structured into the following topics:

- *Neural Networks and Deep Learning* : Single-hidden-layer neural
  networks are implemented in package
  `r pkg("nnet", priority = "core")` (shipped with base R).
  Package `r pkg("RSNNS")` offers an interface to the
  Stuttgart Neural Network Simulator (SNNS). Packages implementing
  deep learning flavours of neural networks include
  `r pkg("deepnet")` (feed-forward neural networks,
  restricted Boltzmann machines, deep belief networks, stacked
  autoencoders) and `r pkg("h2o")` (feed-forward neural
  networks, deep autoencoders). An interface to
  [tensorflow](http://www.tensorflow.org) is available in
  `r pkg("tensorflow")`. The `r pkg("torch")`
  package implements an interface to the [libtorch
  library](https://pytorch.org/), and package `r pkg("mlr3torch")`
  integrates torch into the `r pkg("mlr3")` ecosystem.
  Prediction uncertainty can be quantified
  by the ENNreg evidential regression neural network model implemented
  in `r pkg("evreg")`.
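  A minimal sketch of fitting such a network with `nnet` (the size,
  decay and iteration settings are purely illustrative, not
  recommendations):

  ```r
  ## Single-hidden-layer network for classification with nnet;
  ## the decay penalty regularizes the weights.
  library(nnet)

  set.seed(1)
  fit <- nnet(Species ~ ., data = iris, size = 4, decay = 1e-3,
              maxit = 200, trace = FALSE)

  head(predict(fit, iris, type = "class"))  # predicted class labels
  ```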
- *Recursive Partitioning* : Tree-structured models for regression,
  classification and survival analysis, following the ideas in the
  CART book, are implemented in
  `r pkg("rpart", priority = "core")` (shipped with base R)
  and `r pkg("tree")`. Package `r pkg("rpart")` is recommended for
  computing CART-like trees. A rich toolbox of partitioning algorithms
  is available in [Weka](http://www.cs.waikato.ac.nz/~ml/weka/);
  package `r pkg("RWeka")` provides an interface to this
  implementation, including the J4.8 variant of C4.5 and M5. The
  `r pkg("Cubist")` package fits rule-based models
  (similar to trees) with linear regression models in the terminal
  leaves, instance-based corrections and boosting. The
  `r pkg("C50")` package can fit C5.0 classification
  trees, rule-based models, and boosted versions of these.
  `r pkg("pre")` can fit rule-based models for a wider range of
  response variable types.\
  Two recursive partitioning algorithms with unbiased variable
  selection and a statistical stopping criterion are implemented in
  packages `r pkg("party")` and `r pkg("partykit")`. Function `ctree()`
  is based on non-parametric conditional inference procedures for
  testing independence between the response and each input variable,
  whereas `mob()` can be used to partition parametric models.
  Extensible tools for visualizing binary trees and node distributions
  of the response are available in packages `r pkg("party")` and
  `r pkg("partykit")` as well. Partitioning of mixed-effects models
  (GLMMs) can be performed with package `r pkg("glmertree")`;
  partitioning of structural equation models (SEMs) can be performed
  with package `r pkg("semtree")`.\
  Graphical tools for the visualization of trees are available in
  package `r pkg("maptree")`.\
  Partitioning of mixture models is performed by
  `r pkg("RPMM")`.\
  Computational infrastructure for representing trees and unified
  methods for prediction and visualization is implemented in
  `r pkg("partykit")`. This infrastructure is used by
  package `r pkg("evtree")` to implement evolutionary
  learning of globally optimal trees. Survival trees are available in
  various packages.

  Trees for subgroup identification with respect to heterogeneous
  treatment effects are available in packages `r pkg("partykit")`,
  `r pkg("model4you")`, `r pkg("dipm")`, `r pkg("quint")`,
  `r pkg("SIDES")`, and `r pkg("psica")` (and probably many more).

- *Random Forests* : The reference implementation of the random forest
  algorithm for regression and classification is available in package
  `r pkg("randomForest", priority = "core")`. Package
  `r pkg("ipred")` offers bagging for regression,
  classification and survival analysis as well as bundling, a
  combination of multiple models via ensemble learning. In addition, a
  random forest variant for response variables measured at arbitrary
  scales, based on conditional inference trees, is implemented in
  package `r pkg("party")`.
  `r pkg("randomForestSRC")` implements a unified
  treatment of Breiman's random forests for survival, regression and
  classification problems. Quantile regression forests, implemented in
  `r pkg("quantregForest")`, allow quantiles of a numeric response to
  be regressed on explanatory variables via a random forest
  approach. For binary data, the `r pkg("varSelRF")` and
  `r pkg("Boruta")` packages focus on variable selection
  by means of random forest algorithms. In addition, packages
  `r pkg("ranger")` and `r pkg("Rborist")`
  offer R interfaces to fast C++ implementations of random forests.
  Reinforcement learning trees, featuring splits on variables that
  will be important further down the tree, are implemented in package
  `r pkg("RLT")`. `r pkg("wsrf")` implements
  an alternative variable weighting method for variable subspace
  selection in place of the traditional random variable sampling.
  Package `r pkg("RGF")` is an interface to a Python
  implementation of a procedure called regularized greedy forests.
  Random forests for parametric models, including forests for the
  estimation of predictive distributions, are available in packages
  `r pkg("trtf")` (predictive transformation forests,
  possibly under censoring and truncation) and
  `r pkg("grf")` (an implementation of generalised random
  forests).
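  A minimal sketch of the reference implementation (the number of trees
  and the use of permutation importance are illustrative choices):

  ```r
  ## Random forest with out-of-bag (OOB) error and variable importance.
  library(randomForest)

  set.seed(1)
  rf <- randomForest(Species ~ ., data = iris, ntree = 500,
                     importance = TRUE)

  print(rf)        # OOB error estimate and confusion matrix
  importance(rf)   # permutation and Gini importance per predictor
  ```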
- *Regularized and Shrinkage Methods* : Regression models with some
  constraint on the parameter estimates can be fitted with the
  `r pkg("lars")` package. The lasso with simultaneous updates for
  groups of parameters (groupwise lasso) is available in package
  `r pkg("grplasso")`; the `r pkg("grpreg")`
  package implements a number of other group penalization models, such
  as group MCP and group SCAD. The L1 regularization path for
  generalized linear models and Cox models can be obtained from
  functions available in package `r pkg("glmpath")`; the
  entire lasso or elastic-net regularization path (also in
  `r pkg("elasticnet")`) for linear, logistic
  and multinomial regression models can be obtained from package
  `r pkg("glmnet")`. Package `r pkg("easy.glmnet")` is a companion
  package simplifying the use of glmnet. The `r pkg("penalized")`
  package provides an alternative implementation of lasso (L1) and
  ridge (L2) penalized regression models (both GLM and Cox models).
  Package `r pkg("RXshrink")` can be used to generate TRACE displays
  that identify the extent of shrinkage with maximum likelihood of
  minimum MSE risk when errors are IID normal.
  Semiparametric additive hazards
  models under lasso penalties are offered by package
  `r pkg("ahaz")`. The shrunken centroids
  classifier and utilities for gene expression analyses are
  implemented in package `r pkg("pamr")`. An
  implementation of multivariate adaptive regression splines is
  available in package `r pkg("earth")`. Various forms of
  penalized discriminant analysis are implemented in packages
  `r pkg("hda")` and `r pkg("sda")`. Package
  `r pkg("LiblineaR")` offers an interface to the
  LIBLINEAR library. The `r pkg("ncvreg")` package fits
  linear and logistic regression models under the SCAD and MCP
  regression penalties using a coordinate descent algorithm.
  The lasso under non-Gaussian and heteroscedastic errors is estimated
  by `r pkg("hdm")`, which also contains inference on low-dimensional
  components of lasso regression and on estimated treatment effects in
  a high-dimensional setting. Package
  `r pkg("SIS")` implements sure independence screening in
  generalised linear and Cox models. Elastic nets for correlated
  outcomes are available from package `r pkg("joinet")`.
  Robust penalized generalized linear models and robust support vector
  machines are fitted by package `r pkg("mpath")` using
  composite optimization by conjugation operator. The
  `r pkg("islasso")` package provides an implementation of
  the lasso based on the induced smoothing idea, which yields
  reliable p-values for all model parameters. Best-subset selection
  for linear, logistic, Cox and other regression models, based on a
  fast polynomial-time algorithm, is available from package
  `r pkg("abess", priority = "core")`.
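  A minimal sketch of the glmnet workflow (the simulated data and the
  choice `alpha = 1` are purely illustrative):

  ```r
  ## Lasso path plus cross-validated selection of the penalty lambda.
  library(glmnet)

  set.seed(1)
  x <- matrix(rnorm(100 * 20), nrow = 100)            # 100 obs, 20 predictors
  y <- drop(x[, 1:3] %*% c(2, -1, 0.5)) + rnorm(100)  # 3 true signals

  fit   <- glmnet(x, y, alpha = 1)  # alpha = 1: lasso; alpha = 0: ridge
  cvfit <- cv.glmnet(x, y)          # 10-fold CV over the lambda path
  coef(cvfit, s = "lambda.min")     # coefficients at the CV-optimal lambda
  ```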
- *Boosting and Gradient Descent* : Various forms of gradient boosting
  are implemented in package
  `r pkg("gbm", priority = "core")` (tree-based functional
  gradient descent boosting). Packages `r pkg("lightgbm")` and
  `r pkg("xgboost")` implement tree-based boosting using efficient
  trees as base learners for several predefined as well as
  user-defined objective functions. The
  hinge loss is optimized by the boosting implementation in package
  `r pkg("bst")`. An extensible boosting framework for
  generalized linear, additive and nonparametric models is available
  in package `r pkg("mboost", priority = "core")`.
  Likelihood-based boosting for mixed models is implemented in
  `r pkg("GMMBoost")`. GAMLSS models can be fitted using
  boosting by `r pkg("gamboostLSS")`. `r pkg("adabag")` implements the
  classical AdaBoost algorithm with added functionality, such as
  variable importances.
- *Support Vector Machines and Kernel Methods* : The function `svm()`
  from `r pkg("e1071", priority = "core")` offers an
  interface to the LIBSVM library, and package
  `r pkg("kernlab", priority = "core")` implements a
  flexible framework for kernel learning (including SVMs, RVMs and
  other kernel learning algorithms). An interface to the SVMlight
  implementation (only for one-against-all classification) is provided
  in package `r pkg("klaR")`. Package `r pkg("gKRLS")` features
  Generalized Kernel Regularized Least Squares, applicable to
  non-Gaussian data alongside random effects, splines, and
  unregularized fixed effects.
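  A minimal sketch of the LIBSVM interface (kernel, cost and gamma are
  illustrative values; `tune()` from the same package can search over
  them):

  ```r
  ## Radial-basis SVM with built-in k-fold cross-validation accuracy.
  library(e1071)

  set.seed(1)
  fit <- svm(Species ~ ., data = iris, kernel = "radial",
             cost = 1, gamma = 0.25, cross = 5)

  summary(fit)                              # includes 5-fold CV accuracy
  table(predict(fit, iris), iris$Species)   # training confusion matrix
  ```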
- *Bayesian Methods* : Bayesian Additive Regression Trees (BART),
  where the final model is defined in terms of the sum over many weak
  learners (not unlike ensemble methods), are implemented in packages
  `r pkg("BayesTree")`, `r pkg("BART")`, and
  `r pkg("bartMachine")`. Bayesian nonstationary,
  semiparametric nonlinear regression and design by treed Gaussian
  processes, including Bayesian CART and treed linear models, are made
  available by package `r pkg("tgp")`. Bayesian structure
  learning in undirected graphical models for multivariate continuous,
  discrete, and mixed data is implemented in package
  `r pkg("BDgraph")`; corresponding methods relying on
  spike-and-slab priors are available from package
  `r pkg("ssgraph")`. Naive Bayes classifiers are
  available in `r pkg("naivebayes")`.
- *Optimization using Genetic Algorithms* : Package
  `r pkg("rgenoud")` offers optimization routines based on
  genetic algorithms. The package `r pkg("Rmalschains")`
  implements memetic algorithms with local search chains, which are a
  special type of evolutionary algorithm, combining a steady-state
  genetic algorithm with local search for real-valued parameter
  optimization.
- *Association Rules* : Package `r pkg("arules")` provides
  both data structures for efficient handling of sparse binary data
  and interfaces to implementations of Apriori and Eclat for
  mining frequent itemsets, maximal frequent itemsets, closed frequent
  itemsets and association rules. Package
  `r pkg("opusminer")` provides an interface to the OPUS
  Miner algorithm (implemented in C++) for efficiently finding the key
  associations in transaction data, in the form of
  self-sufficient itemsets, using either leverage or lift.
- *Fuzzy Rule-based Systems* : Package `r pkg("frbs")`
  implements a host of standard methods for learning fuzzy rule-based
  systems from data for regression and classification. Package
  `r pkg("RoughSets")` provides comprehensive
  implementations of rough set theory (RST) and fuzzy rough
  set theory (FRST) in a single package.
- *Model selection and validation* : Package
  `r pkg("e1071")` has function `tune()` for hyperparameter tuning,
  and function `errorest()`
  (`r pkg("ipred")`) can be used for error rate
  estimation. The cost parameter C for support vector machines can be
  chosen using the functionality of package
  `r pkg("svmpath")`. Data splitting for cross-validation
  and other resampling schemes is available in the
  `r pkg("splitTools")` package. Package
  `r pkg("nestedcv")` provides nested cross-validation for
  `r pkg("glmnet")` and `r pkg("caret")` models. Functions for ROC
  analysis and other visualisation techniques for comparing candidate
  classifiers are available from package `r pkg("ROCR")`.
  Packages `r pkg("hdi")` and `r pkg("stabs")`
  implement stability selection for a range of models;
  `r pkg("hdi")` also offers other inference procedures for
  high-dimensional models.
- *Causal Machine Learning* : The package
  `r pkg("DoubleML")` is an object-oriented implementation
  of the double machine learning framework for a variety of causal
  models. Building upon the `r pkg("mlr3")` ecosystem,
  estimation of causal effects can be based on an extensive collection
  of machine learning methods.
- *Other procedures* : Package `r pkg("evclass")` implements
  evidential classifiers, which quantify the uncertainty
  about the class of a test pattern using a Dempster-Shafer mass
  function. The `r pkg("OneR")` (One Rule) package offers a
  classification algorithm with enhancements for sophisticated
  handling of missing values and numeric data together with extensive
  diagnostic functions. Package `r pkg("mlr3inferr")` allows
  confidence intervals for the generalization error to be constructed
  using resampling-based inference methods.
- *Meta packages* : Package `r pkg("tidymodels")` provides
  miscellaneous functions for building predictive models, including
  parameter tuning and variable importance measures.
  In a similar spirit, package `r pkg("mlr3")` offers high-level
  interfaces to various statistical and machine learning packages, and
  package `r pkg("SuperLearner")` implements a similar toolbox.
  The `r pkg("h2o")` package implements a general-purpose
  machine learning platform that has scalable implementations of many
  popular algorithms such as random forests, GBMs, GLMs (with elastic
  net regularization), and deep learning (feed-forward multilayer
  networks), among others. An interface to the mlpack C++ library is
  available from package `r pkg("mlpack")`.
  `r pkg("CORElearn")` implements a rather broad class of
  machine learning algorithms, such as nearest neighbors, trees,
  random forests, and several feature selection methods. Similarly,
  package `r pkg("rminer")` interfaces several learning
  algorithms implemented in other packages and computes several
  performance measures. Package `r pkg("qeML")` provides wrappers for
  numerous machine learning R packages with a simple, convenient, and
  uniform interface, covering both beginner-level use and advanced
  operations from packages such as `r pkg("FOCI")` and
  `r pkg("ncvreg")`.
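  A minimal sketch of the mlr3 workflow (the chosen task, learner and
  measure are illustrative):

  ```r
  ## Task -> learner -> resampling: 5-fold CV of a decision tree.
  library(mlr3)

  set.seed(1)
  task    <- tsk("iris")              # built-in example task
  learner <- lrn("classif.rpart")     # rpart-based tree learner
  rr      <- resample(task, learner, rsmp("cv", folds = 5))
  rr$aggregate(msr("classif.acc"))    # cross-validated accuracy
  ```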
- *Visualisation (initially contributed by Brandon Greenwell)* : The
  `stats::termplot()` function can be used to plot the terms
  in a model whose predict method supports `type = "terms"`. The
  `r pkg("effects")` package provides graphical and
  tabular effect displays for models with a linear predictor (e.g.,
  linear and generalized linear models). Friedman's partial dependence
  plots (PDPs), which are low-dimensional graphical renderings of the
  prediction function, are implemented in a few packages.
  `r pkg("gbm")`, `r pkg("randomForest")` and
  `r pkg("randomForestSRC")` provide their own functions
  for displaying PDPs, but are limited to the models fit with those
  packages (the function `partialPlot` from
  `r pkg("randomForest")` is more limited since it only
  allows for one predictor at a time). Packages
  `r pkg("pdp")`, `r pkg("plotmo")`, and
  `r pkg("ICEbox")` are more general and allow for the
  creation of PDPs for a wide variety of machine learning models
  (e.g., random forests, support vector machines, etc.); both
  `r pkg("pdp")` and `r pkg("plotmo")` support
  multivariate displays (`r pkg("plotmo")` is limited to
  two predictors while `r pkg("pdp")` uses trellis
  graphics to display PDPs involving three predictors). By default,
  `r pkg("plotmo")` fixes the background variables at
  their medians (or first level for factors), which is faster than
  constructing PDPs but incorporates less information.
  `r pkg("ICEbox")` focuses on constructing individual
  conditional expectation (ICE) curves, a refinement of Friedman's
  PDPs. ICE curves, as well as centered ICE curves, can also be
  constructed with the `partial()` function from the
  `r pkg("pdp")` package.
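  A minimal sketch of PDP and ICE displays with `pdp` (the model, data
  and chosen predictor are illustrative):

  ```r
  ## Partial dependence and centered ICE curves for a random forest.
  library(randomForest)
  library(pdp)

  set.seed(1)
  fit <- randomForest(mpg ~ ., data = mtcars)

  partial(fit, pred.var = "wt", plot = TRUE)   # PDP for one predictor
  partial(fit, pred.var = "wt", ice = TRUE,    # individual (centered)
          center = TRUE, plot = TRUE)          # conditional expectations
  ```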
- *XAI* : Most packages and functions from the previous section on
  visualisation belong to the field of explainable artificial
  intelligence (XAI).
  The meta packages `r pkg("DALEX")` and `r pkg("iml")` offer different
  methods to interpret any model, including partial dependence,
  accumulated local effects, and permutation importance. Accumulated
  local effects plots are also directly available in
  `r pkg("ALEPlot")`.
  SHAP (from *SH*apley *A*dditive ex*P*lanations) is one of the most
  frequently used techniques to interpret ML models.
  It decomposes predictions, in a fair way, into additive contributions
  of the predictors. For tree-based models, the very fast TreeSHAP
  algorithm exists; it is shipped directly with `r pkg("h2o")`,
  `r pkg("xgboost")`, and `r pkg("lightgbm")`. Model-agnostic
  implementations of SHAP are available in additional packages:
  `r pkg("fastshap")` mainly uses
  Monte Carlo sampling to approximate SHAP values, while
  `r pkg("shapr")` and `r pkg("kernelshap")` provide implementations
  of KernelSHAP. SHAP values from any of these packages can be plotted
  with the package `r pkg("shapviz")`.
  An interface to Python's "shap" package is provided in
  `r pkg("shapper")`.
  Alternative decompositions of predictions are implemented in
  `r pkg("lime")` and `r pkg("iBreakDown")`.

### Links
- [MLOSS: Machine Learning Open Source Software](http://www.MLOSS.org/)

/README.md:

## CRAN Task View: Machine Learning & Statistical Learning

**URL:** <https://CRAN.R-project.org/view=MachineLearning>

**Source file:** [MachineLearning.md](MachineLearning.md)

**Contributions:** Suggestions and improvements for this task view are very
welcome and can be made through issues or pull requests here on GitHub or
via e-mail to the maintainer address. For further details see the
[Contributing](https://github.com/cran-task-views/ctv/blob/main/Contributing.md)
guide. All contributions must adhere to the
[code of conduct](https://github.com/cran-task-views/ctv/blob/main/CodeOfConduct.md).