├── MachineLearning.md
└── README.md

/MachineLearning.md:

---
name: MachineLearning
topic: Machine Learning & Statistical Learning
maintainer: Torsten Hothorn, Hannah Frick, Lucas Kook
email: Torsten.Hothorn@R-project.org
version: 2025-06-03
source: https://github.com/cran-task-views/MachineLearning/
---

Several add-on packages implement ideas and methods developed at the
interface between computer science and statistics; this field of
research is usually referred to as machine learning. The packages can be
roughly structured into the following topics:

- *Neural Networks and Deep Learning* : Single-hidden-layer neural
  networks are implemented in package
  `r pkg("nnet", priority = "core")` (shipped with base R).
  Package `r pkg("RSNNS")` offers an interface to the
  Stuttgart Neural Network Simulator (SNNS). Packages implementing
  deep learning flavours of neural networks include
  `r pkg("deepnet")` (feed-forward neural networks,
  restricted Boltzmann machines, deep belief networks, stacked
  autoencoders) and `r pkg("h2o")` (feed-forward neural
  networks, deep autoencoders). An interface to
  [tensorflow](http://www.tensorflow.org) is available in
  `r pkg("tensorflow")`. The `r pkg("torch")`
  package implements an interface to the [libtorch
  library](https://pytorch.org/), and package `r pkg("mlr3torch")`
  integrates torch into the `r pkg("mlr3")` ecosystem.
  Prediction uncertainty can be quantified
  by the ENNreg evidential regression neural network model implemented
  in `r pkg("evreg")`.
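  A minimal sketch of fitting such a network with `nnet` (the size,
  decay and iteration settings are purely illustrative, not
  recommendations):

  ```r
  ## Single-hidden-layer network for classification with nnet;
  ## the decay penalty regularizes the weights.
  library(nnet)

  set.seed(1)
  fit <- nnet(Species ~ ., data = iris, size = 4, decay = 1e-3,
              maxit = 200, trace = FALSE)

  head(predict(fit, iris, type = "class"))  # predicted class labels
  ```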
- *Recursive Partitioning* : Tree-structured models for regression,
  classification and survival analysis, following the ideas in the
  CART book, are implemented in
  `r pkg("rpart", priority = "core")` (shipped with base R)
  and `r pkg("tree")`. Package `r pkg("rpart")` is recommended for
  computing CART-like trees. A rich toolbox of partitioning algorithms
  is available in [Weka](http://www.cs.waikato.ac.nz/~ml/weka/);
  package `r pkg("RWeka")` provides an interface to this
  implementation, including the J4.8 variant of C4.5 and M5. The
  `r pkg("Cubist")` package fits rule-based models
  (similar to trees) with linear regression models in the terminal
  leaves, instance-based corrections and boosting. The
  `r pkg("C50")` package can fit C5.0 classification
  trees, rule-based models, and boosted versions of these.
  `r pkg("pre")` can fit rule-based models for a wider range of
  response variable types.\
  Two recursive partitioning algorithms with unbiased variable
  selection and a statistical stopping criterion are implemented in
  packages `r pkg("party")` and `r pkg("partykit")`. Function `ctree()`
  is based on non-parametric conditional inference procedures for
  testing independence between the response and each input variable,
  whereas `mob()` can be used to partition parametric models.
  Extensible tools for visualizing binary trees and node distributions
  of the response are available in packages `r pkg("party")` and
  `r pkg("partykit")` as well. Partitioning of mixed-effects models
  (GLMMs) can be performed with package `r pkg("glmertree")`;
  partitioning of structural equation models (SEMs) can be performed
  with package `r pkg("semtree")`.\
  Graphical tools for the visualization of trees are available in
  package `r pkg("maptree")`.\
  Partitioning of mixture models is performed by
  `r pkg("RPMM")`.\
  Computational infrastructure for representing trees and unified
  methods for prediction and visualization is implemented in
  `r pkg("partykit")`. This infrastructure is used by
  package `r pkg("evtree")` to implement evolutionary
  learning of globally optimal trees. Survival trees are available in
  various packages.

  Trees for subgroup identification with respect to heterogeneous
  treatment effects are available in packages `r pkg("partykit")`,
  `r pkg("model4you")`, `r pkg("dipm")`, `r pkg("quint")`,
  `r pkg("SIDES")`, and `r pkg("psica")` (and probably many more).

- *Random Forests* : The reference implementation of the random forest
  algorithm for regression and classification is available in package
  `r pkg("randomForest", priority = "core")`. Package
  `r pkg("ipred")` offers bagging for regression,
  classification and survival analysis as well as bundling, a
  combination of multiple models via ensemble learning. In addition, a
  random forest variant for response variables measured at arbitrary
  scales, based on conditional inference trees, is implemented in
  package `r pkg("party")`.
  `r pkg("randomForestSRC")` implements a unified
  treatment of Breiman's random forests for survival, regression and
  classification problems. Quantile regression forests, implemented in
  `r pkg("quantregForest")`, allow quantiles of a numeric response to
  be regressed on explanatory variables via a random forest
  approach. For binary data, the `r pkg("varSelRF")` and
  `r pkg("Boruta")` packages focus on variable selection
  by means of random forest algorithms. In addition, packages
  `r pkg("ranger")` and `r pkg("Rborist")`
  offer R interfaces to fast C++ implementations of random forests.
  Reinforcement learning trees, featuring splits on variables that
  will be important further down the tree, are implemented in package
  `r pkg("RLT")`. `r pkg("wsrf")` implements
  an alternative variable weighting method for variable subspace
  selection in place of the traditional random variable sampling.
  Package `r pkg("RGF")` is an interface to a Python
  implementation of a procedure called regularized greedy forests.
  Random forests for parametric models, including forests for the
  estimation of predictive distributions, are available in packages
  `r pkg("trtf")` (predictive transformation forests,
  possibly under censoring and truncation) and
  `r pkg("grf")` (an implementation of generalised random
  forests).
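  A minimal sketch of the reference implementation (the number of trees
  and the use of permutation importance are illustrative choices):

  ```r
  ## Random forest with out-of-bag (OOB) error and variable importance.
  library(randomForest)

  set.seed(1)
  rf <- randomForest(Species ~ ., data = iris, ntree = 500,
                     importance = TRUE)

  print(rf)        # OOB error estimate and confusion matrix
  importance(rf)   # permutation and Gini importance per predictor
  ```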
- *Regularized and Shrinkage Methods* : Regression models with some
  constraint on the parameter estimates can be fitted with the
  `r pkg("lars")` package. The lasso with simultaneous updates for
  groups of parameters (groupwise lasso) is available in package
  `r pkg("grplasso")`; the `r pkg("grpreg")`
  package implements a number of other group penalization models, such
  as group MCP and group SCAD. The L1 regularization path for
  generalized linear models and Cox models can be obtained from
  functions available in package `r pkg("glmpath")`; the
  entire lasso or elastic-net regularization path (also in
  `r pkg("elasticnet")`) for linear, logistic
  and multinomial regression models can be obtained from package
  `r pkg("glmnet")`. Package `r pkg("easy.glmnet")` is a companion
  package simplifying the use of glmnet. The `r pkg("penalized")`
  package provides an alternative implementation of lasso (L1) and
  ridge (L2) penalized regression models (both GLM and Cox models).
  Package `r pkg("RXshrink")` can be used to generate TRACE displays
  that identify the extent of shrinkage with maximum likelihood of
  minimum MSE risk when errors are IID normal.
  Semiparametric additive hazards
  models under lasso penalties are offered by package
  `r pkg("ahaz")`. The shrunken centroids
  classifier and utilities for gene expression analyses are
  implemented in package `r pkg("pamr")`. An
  implementation of multivariate adaptive regression splines is
  available in package `r pkg("earth")`. Various forms of
  penalized discriminant analysis are implemented in packages
  `r pkg("hda")` and `r pkg("sda")`. Package
  `r pkg("LiblineaR")` offers an interface to the
  LIBLINEAR library. The `r pkg("ncvreg")` package fits
  linear and logistic regression models under the SCAD and MCP
  regression penalties using a coordinate descent algorithm.
  The lasso under non-Gaussian and heteroscedastic errors is estimated
  by `r pkg("hdm")`, which also contains inference on low-dimensional
  components of lasso regression and on estimated treatment effects in
  a high-dimensional setting. Package
  `r pkg("SIS")` implements sure independence screening in
  generalised linear and Cox models. Elastic nets for correlated
  outcomes are available from package `r pkg("joinet")`.
  Robust penalized generalized linear models and robust support vector
  machines are fitted by package `r pkg("mpath")` using
  composite optimization by conjugation operator. The
  `r pkg("islasso")` package provides an implementation of
  the lasso based on the induced smoothing idea, which yields
  reliable p-values for all model parameters. Best-subset selection
  for linear, logistic, Cox and other regression models, based on a
  fast polynomial-time algorithm, is available from package
  `r pkg("abess", priority = "core")`.
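  A minimal sketch of the glmnet workflow (the simulated data and the
  choice `alpha = 1` are purely illustrative):

  ```r
  ## Lasso path plus cross-validated selection of the penalty lambda.
  library(glmnet)

  set.seed(1)
  x <- matrix(rnorm(100 * 20), nrow = 100)            # 100 obs, 20 predictors
  y <- drop(x[, 1:3] %*% c(2, -1, 0.5)) + rnorm(100)  # 3 true signals

  fit   <- glmnet(x, y, alpha = 1)  # alpha = 1: lasso; alpha = 0: ridge
  cvfit <- cv.glmnet(x, y)          # 10-fold CV over the lambda path
  coef(cvfit, s = "lambda.min")     # coefficients at the CV-optimal lambda
  ```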
- *Boosting and Gradient Descent* : Various forms of gradient boosting
  are implemented in package
  `r pkg("gbm", priority = "core")` (tree-based functional
  gradient descent boosting). Packages `r pkg("lightgbm")` and
  `r pkg("xgboost")` implement tree-based boosting using efficient
  trees as base learners for several predefined as well as
  user-defined objective functions. The
  hinge loss is optimized by the boosting implementation in package
  `r pkg("bst")`. An extensible boosting framework for
  generalized linear, additive and nonparametric models is available
  in package `r pkg("mboost", priority = "core")`.
  Likelihood-based boosting for mixed models is implemented in
  `r pkg("GMMBoost")`. GAMLSS models can be fitted using
  boosting by `r pkg("gamboostLSS")`. `r pkg("adabag")` implements the
  classical AdaBoost algorithm with added functionality, such as
  variable importances.
- *Support Vector Machines and Kernel Methods* : The function `svm()`
  from `r pkg("e1071", priority = "core")` offers an
  interface to the LIBSVM library, and package
  `r pkg("kernlab", priority = "core")` implements a
  flexible framework for kernel learning (including SVMs, RVMs and
  other kernel learning algorithms). An interface to the SVMlight
  implementation (only for one-against-all classification) is provided
  in package `r pkg("klaR")`. Package `r pkg("gKRLS")` features
  Generalized Kernel Regularized Least Squares, applicable to
  non-Gaussian data alongside random effects, splines, and
  unregularized fixed effects.
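  A minimal sketch of the LIBSVM interface (kernel, cost and gamma are
  illustrative values; `tune()` from the same package can search over
  them):

  ```r
  ## Radial-basis SVM with built-in k-fold cross-validation accuracy.
  library(e1071)

  set.seed(1)
  fit <- svm(Species ~ ., data = iris, kernel = "radial",
             cost = 1, gamma = 0.25, cross = 5)

  summary(fit)                              # includes 5-fold CV accuracy
  table(predict(fit, iris), iris$Species)   # training confusion matrix
  ```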
- *Bayesian Methods* : Bayesian Additive Regression Trees (BART),
  where the final model is defined in terms of the sum over many weak
  learners (not unlike ensemble methods), are implemented in packages
  `r pkg("BayesTree")`, `r pkg("BART")`, and
  `r pkg("bartMachine")`. Bayesian nonstationary,
  semiparametric nonlinear regression and design by treed Gaussian
  processes, including Bayesian CART and treed linear models, are made
  available by package `r pkg("tgp")`. Bayesian structure
  learning in undirected graphical models for multivariate continuous,
  discrete, and mixed data is implemented in package
  `r pkg("BDgraph")`; corresponding methods relying on
  spike-and-slab priors are available from package
  `r pkg("ssgraph")`. Naive Bayes classifiers are
  available in `r pkg("naivebayes")`.
- *Optimization using Genetic Algorithms* : Package
  `r pkg("rgenoud")` offers optimization routines based on
  genetic algorithms. The package `r pkg("Rmalschains")`
  implements memetic algorithms with local search chains, which are a
  special type of evolutionary algorithm, combining a steady-state
  genetic algorithm with local search for real-valued parameter
  optimization.
- *Association Rules* : Package `r pkg("arules")` provides
  both data structures for efficient handling of sparse binary data
  and interfaces to implementations of Apriori and Eclat for
  mining frequent itemsets, maximal frequent itemsets, closed frequent
  itemsets and association rules. Package
  `r pkg("opusminer")` provides an interface to the OPUS
  Miner algorithm (implemented in C++) for efficiently finding the key
  associations in transaction data, in the form of
  self-sufficient itemsets, using either leverage or lift.
- *Fuzzy Rule-based Systems* : Package `r pkg("frbs")`
  implements a host of standard methods for learning fuzzy rule-based
  systems from data for regression and classification. Package
  `r pkg("RoughSets")` provides comprehensive
  implementations of rough set theory (RST) and fuzzy rough
  set theory (FRST) in a single package.
- *Model selection and validation* : Package
  `r pkg("e1071")` has function `tune()` for hyperparameter tuning,
  and function `errorest()`
  (`r pkg("ipred")`) can be used for error rate
  estimation. The cost parameter C for support vector machines can be
  chosen using the functionality of package
  `r pkg("svmpath")`. Data splitting for cross-validation
  and other resampling schemes is available in the
  `r pkg("splitTools")` package. Package
  `r pkg("nestedcv")` provides nested cross-validation for
  `r pkg("glmnet")` and `r pkg("caret")` models. Functions for ROC
  analysis and other visualisation techniques for comparing candidate
  classifiers are available from package `r pkg("ROCR")`.
  Packages `r pkg("hdi")` and `r pkg("stabs")`
  implement stability selection for a range of models;
  `r pkg("hdi")` also offers other inference procedures for
  high-dimensional models.
- *Causal Machine Learning* : The package
  `r pkg("DoubleML")` is an object-oriented implementation
  of the double machine learning framework for a variety of causal
  models. Building upon the `r pkg("mlr3")` ecosystem,
  estimation of causal effects can be based on an extensive collection
  of machine learning methods.
- *Other procedures* : Package `r pkg("evclass")` implements
  evidential classifiers, which quantify the uncertainty
  about the class of a test pattern using a Dempster-Shafer mass
  function. The `r pkg("OneR")` (One Rule) package offers a
  classification algorithm with enhancements for sophisticated
  handling of missing values and numeric data together with extensive
  diagnostic functions. Package `r pkg("mlr3inferr")` allows
  confidence intervals for the generalization error to be constructed
  using resampling-based inference methods.
- *Meta packages* : Package `r pkg("tidymodels")` provides
  miscellaneous functions for building predictive models, including
  parameter tuning and variable importance measures.
  In a similar spirit, package `r pkg("mlr3")` offers high-level
  interfaces to various statistical and machine learning packages, and
  package `r pkg("SuperLearner")` implements a similar toolbox.
  The `r pkg("h2o")` package implements a general-purpose
  machine learning platform that has scalable implementations of many
  popular algorithms such as random forests, GBMs, GLMs (with elastic
  net regularization), and deep learning (feed-forward multilayer
  networks), among others. An interface to the mlpack C++ library is
  available from package `r pkg("mlpack")`.
  `r pkg("CORElearn")` implements a rather broad class of
  machine learning algorithms, such as nearest neighbors, trees,
  random forests, and several feature selection methods. Similarly,
  package `r pkg("rminer")` interfaces several learning
  algorithms implemented in other packages and computes several
  performance measures. Package `r pkg("qeML")` provides wrappers for
  numerous machine learning R packages with a simple, convenient, and
  uniform interface, covering both beginner-level use and advanced
  operations from packages such as `r pkg("FOCI")` and
  `r pkg("ncvreg")`.
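  A minimal sketch of the mlr3 workflow (the chosen task, learner and
  measure are illustrative):

  ```r
  ## Task -> learner -> resampling: 5-fold CV of a decision tree.
  library(mlr3)

  set.seed(1)
  task    <- tsk("iris")              # built-in example task
  learner <- lrn("classif.rpart")     # rpart-based tree learner
  rr      <- resample(task, learner, rsmp("cv", folds = 5))
  rr$aggregate(msr("classif.acc"))    # cross-validated accuracy
  ```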
- *Visualisation (initially contributed by Brandon Greenwell)* : The
  `stats::termplot()` function can be used to plot the terms
  in a model whose predict method supports `type = "terms"`. The
  `r pkg("effects")` package provides graphical and
  tabular effect displays for models with a linear predictor (e.g.,
  linear and generalized linear models). Friedman's partial dependence
  plots (PDPs), which are low-dimensional graphical renderings of the
  prediction function, are implemented in a few packages.
  `r pkg("gbm")`, `r pkg("randomForest")` and
  `r pkg("randomForestSRC")` provide their own functions
  for displaying PDPs, but are limited to the models fit with those
  packages (the function `partialPlot` from
  `r pkg("randomForest")` is more limited since it only
  allows for one predictor at a time). Packages
  `r pkg("pdp")`, `r pkg("plotmo")`, and
  `r pkg("ICEbox")` are more general and allow for the
  creation of PDPs for a wide variety of machine learning models
  (e.g., random forests, support vector machines, etc.); both
  `r pkg("pdp")` and `r pkg("plotmo")` support
  multivariate displays (`r pkg("plotmo")` is limited to
  two predictors while `r pkg("pdp")` uses trellis
  graphics to display PDPs involving three predictors). By default,
  `r pkg("plotmo")` fixes the background variables at
  their medians (or first level for factors), which is faster than
  constructing PDPs but incorporates less information.
  `r pkg("ICEbox")` focuses on constructing individual
  conditional expectation (ICE) curves, a refinement of Friedman's
  PDPs. ICE curves, as well as centered ICE curves, can also be
  constructed with the `partial()` function from the
  `r pkg("pdp")` package.
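  A minimal sketch of PDP and ICE displays with `pdp` (the model, data
  and chosen predictor are illustrative):

  ```r
  ## Partial dependence and centered ICE curves for a random forest.
  library(randomForest)
  library(pdp)

  set.seed(1)
  fit <- randomForest(mpg ~ ., data = mtcars)

  partial(fit, pred.var = "wt", plot = TRUE)   # PDP for one predictor
  partial(fit, pred.var = "wt", ice = TRUE,    # individual (centered)
          center = TRUE, plot = TRUE)          # conditional expectations
  ```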
- *XAI* : Most packages and functions from the previous section on
  visualisation belong to the field of explainable artificial
  intelligence (XAI).
  The meta packages `r pkg("DALEX")` and `r pkg("iml")` offer different
  methods to interpret any model, including partial dependence,
  accumulated local effects, and permutation importance. Accumulated
  local effects plots are also directly available in
  `r pkg("ALEPlot")`.
  SHAP (from *SH*apley *A*dditive ex*P*lanations) is one of the most
  frequently used techniques to interpret ML models.
  It decomposes predictions, in a fair way, into additive contributions
  of the predictors. For tree-based models, the very fast TreeSHAP
  algorithm exists; it is shipped directly with `r pkg("h2o")`,
  `r pkg("xgboost")`, and `r pkg("lightgbm")`. Model-agnostic
  implementations of SHAP are available in additional packages:
  `r pkg("fastshap")` mainly uses
  Monte Carlo sampling to approximate SHAP values, while
  `r pkg("shapr")` and `r pkg("kernelshap")` provide implementations
  of KernelSHAP. SHAP values from any of these packages can be plotted
  with the package `r pkg("shapviz")`.
  An interface to Python's "shap" package is provided in
  `r pkg("shapper")`.
  Alternative decompositions of predictions are implemented in
  `r pkg("lime")` and `r pkg("iBreakDown")`.

### Links
- [MLOSS: Machine Learning Open Source Software](http://www.MLOSS.org/)

/README.md:

## CRAN Task View: Machine Learning & Statistical Learning

**URL:** <https://CRAN.R-project.org/view=MachineLearning>

**Source file:** [MachineLearning.md](MachineLearning.md)

**Contributions:** Suggestions and improvements for this task view are very
welcome and can be made through issues or pull requests here on GitHub or
via e-mail to the maintainer address. For further details see the
[Contributing](https://github.com/cran-task-views/ctv/blob/main/Contributing.md)
guide. All contributions must adhere to the
[code of conduct](https://github.com/cran-task-views/ctv/blob/main/CodeOfConduct.md).