# Feature selection (variable selection)
> Feature selection is the process of selecting a subset of relevant features (variables, predictors) for use in model construction ([Wikipedia](https://en.wikipedia.org/wiki/Feature_selection))

Why feature selection?
1. Data exploration
2. Curse of dimensionality
3. Fewer features, faster models
4. Better metrics

- **Overview**
  - [An Introduction to Variable and Feature Selection](http://jmlr.csail.mit.edu/papers/volume3/guyon03a/guyon03a.pdf) (2003) *Isabelle Guyon, Andre Elisseeff*
  - [A Survey on Feature Selection](https://www.sciencedirect.com/science/article/pii/S1877050916313047) (2016) *Jianyu Miao, Lingfeng Niu*
  - [Feature Selection: A Data Perspective](https://arxiv.org/pdf/1601.07996.pdf) (2016) *Jundong Li, Kewei Cheng, Suhang Wang, Fred Morstatter, Robert P. Trevino, Jiliang Tang, Huan Liu*
  - [Feature Selection and Feature Extraction in Pattern Analysis: A Literature Review](https://arxiv.org/pdf/1905.02845.pdf) (2019) *Benyamin Ghojogh, Maria N. Samad, Sayema Asif Mashhadi, Tania Kapoor, Wahab Ali, Fakhri Karray, Mark Crowley*
- **All-relevant vs minimal-optimal feature selection**
  - [Consistent Feature Selection for Pattern Recognition in Polynomial Time](http://jmlr.csail.mit.edu/papers/volume8/nilsson07a/nilsson07a.pdf) (2007) *R. Nilsson, J. M. Peña, J. Björkegren, J. Tegnér*

### Filter methods
Filter methods use model-free ranking to filter out less relevant features (see the sklearn sketch after this list).
- **Missing Values Ratio**
  - Removing features whose ratio of missing values exceeds some threshold
- **Low Variance Filter** ([sklearn](https://scikit-learn.org/stable/modules/feature_selection.html#removing-features-with-low-variance))
  - Removing features with a variance lower than some threshold
- **Correlation** ([Wiki](https://en.wikipedia.org/wiki/Correlation_and_dependence))
- **χ²** Chi-squared statistic for categorical features ([Wiki](https://en.wikipedia.org/wiki/Chi-squared_test), [sklearn](https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.chi2.html))
- **ANOVA** F-value for quantitative features ([Wiki](https://en.wikipedia.org/wiki/Analysis_of_variance), [sklearn](https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.f_classif.html))
- **Mutual information** ([Wiki](https://en.wikipedia.org/wiki/Mutual_information))
  - [Conditional Likelihood Maximisation: A Unifying Framework for Information Theoretic Feature Selection](http://jmlr.csail.mit.edu/papers/volume13/brown12a/brown12a.pdf) (2012) *Gavin Brown, Adam Pocock, Ming-Jie Zhao, Mikel Lujan*
  - [Feature Selection Based on Joint Mutual Information](https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.41.4424) (1999) *Howard Hua Yang, John Moody*
  - [Estimating mutual information](https://arxiv.org/pdf/cond-mat/0305641.pdf) (2003) *Alexander Kraskov, Harald Stoegbauer, Peter Grassberger*
- **mRMR** Minimum redundancy, maximum relevance ([Link](http://home.penglab.com/proj/mRMR/), [Wiki](https://en.wikipedia.org/wiki/Minimum_redundancy_feature_selection))
  - [Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy](http://home.penglab.com/papersall/docpdf/2005_TPAMI_FeaSel.pdf) (2005) *Hanchuan Peng, Fuhui Long, Chris Ding*
- **Relief** ([Wiki](https://en.wikipedia.org/wiki/Relief_(feature_selection)))
  - [The Feature Selection Problem: Traditional Methods and a New Algorithm](https://www.aaai.org/Papers/AAAI/1992/AAAI92-020.pdf) (1992) *Kenji Kira, Larry Rendell*
  - [Relief-Based Feature Selection: Introduction and Review](https://arxiv.org/pdf/1711.08421.pdf) (2018) *Ryan J. Urbanowicz, Melissa Meeker, William LaCava, Randal S. Olson, Jason H. Moore*
- **Markov Blanket** ([Wiki](https://en.wikipedia.org/wiki/Markov_blanket))
  - [Markov Blanket based Feature Selection: A Review of Past Decade](http://www.iaeng.org/publication/WCE2010/WCE2010_pp321-328.pdf) (2010) *Shunkai Fu, Michel C. Desmarais*
  - Incremental Association Markov Blanket: [Algorithms for Large Scale Markov Blanket Discovery](https://www.aaai.org/Papers/FLAIRS/2003/Flairs03-073.pdf) (2003) *Ioannis Tsamardinos, Constantin F. Aliferis, Alexander Statnikov*
  - Grow-Shrink algorithm: [Bayesian Network Induction via Local Neighborhoods](http://robots.stanford.edu/papers/Margaritis99a.pdf) (2000) *Dimitris Margaritis, Sebastian Thrun*
  - Koller-Sahami method: [Toward Optimal Feature Selection](http://ilpubs.stanford.edu:8090/208/1/1996-77.pdf) (1996) *Daphne Koller, Mehran Sahami*
  - Max-Min Markov Blanket: [Time and Sample Efficient Discovery of Markov Blankets and Direct Causal Relations](https://dl.acm.org/doi/10.1145/956750.956838) (2003) *Ioannis Tsamardinos, Constantin F. Aliferis, Alexander Statnikov*
- **Fast Correlation-based Filter**
  - [Feature Selection for High-Dimensional Data: A Fast Correlation-Based Filter Solution](https://www.public.asu.edu/~huanliu/papers/icml03.pdf) (2003) *Lei Yu, Huan Liu*
- **CBF** Consistency-Based Filters
  - [Consistency-based search in feature selection](https://www.public.asu.edu/~huanliu/papers/aij03.pdf) (2003) *Manoranjan Dash, Huan Liu*
- **Interact**
  - [Searching for Interacting Features](https://www.public.asu.edu/~huanliu/papers/ijcai07.pdf) (2007) *Zheng Zhao, Huan Liu*
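
Most of the univariate filters above are available in scikit-learn. The sketch below is illustrative only; the synthetic dataset, the variance threshold, and `k` are arbitrary assumptions, not recommendations.

```python
# Minimal filter-method sketch with scikit-learn (synthetic data, arbitrary thresholds).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import (VarianceThreshold, SelectKBest,
                                        f_classif, mutual_info_classif)

# Hypothetical dataset: 500 samples, 20 features, 5 of them informative.
X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           n_redundant=2, random_state=0)

# 1. Low Variance Filter: drop near-constant features.
X_var = VarianceThreshold(threshold=0.01).fit_transform(X)

# 2. ANOVA F-value ranking (chi2 would require non-negative features, e.g. counts).
f_scores, _ = f_classif(X_var, y)

# 3. Mutual information ranking (model-free, also captures non-linear dependence).
mi_scores = mutual_info_classif(X_var, y, random_state=0)

# Keep the k best features according to mutual information.
X_selected = SelectKBest(score_func=mutual_info_classif, k=5).fit_transform(X_var, y)
print("ANOVA F top features:", np.argsort(f_scores)[::-1][:5])
print("MI top features:     ", np.argsort(mi_scores)[::-1][:5])
print("Selected shape:", X_selected.shape)
```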
### Wrapper methods
Wrapper methods use a model and its performance to find the best feature subset (see the sketch after this list).
- **SFS** Sequential Feature Selection
- **SFFS** Sequential Floating Forward Selection
  - [Floating search methods in feature selection](https://www.academia.edu/15425286/Floating_search_methods_in_feature_selection) (1994) *Pavel Pudil, Josef Kittler, Jana Novovicová*
  - [Adaptive floating search methods in feature selection](https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.11.5032) (1999) *P. Somol, Pavel Pudil, Jana Novovicova, P. Paclik*
- **Genetic algorithm** ([Wiki](https://en.wikipedia.org/wiki/Genetic_algorithm))
- **PSO** Particle Swarm Optimization ([Wiki](https://en.wikipedia.org/wiki/Particle_swarm_optimization))
  - [Particle Swarm Optimization](https://www.cs.tufts.edu/comp/150GA/homeworks/hw3/_reading6%201995%20particle%20swarming.pdf) (1995) *James Kennedy, Russell Eberhart*
  - [Feature Selection using PSO-SVM](http://www.iaeng.org/IJCS/issues_v33/issue_1/IJCS_33_1_18.pdf) (2007) *Chung-Jui Tu, Li-Yeh Chuang, Jun-Yang Chang, Cheng-Hong Yang*
- **Boruta** All-relevant feature selection ([CRAN](https://cran.r-project.org/web/packages/Boruta/), [PyPI](https://pypi.org/project/Boruta/))
  - [Boruta – A System for Feature Selection](https://www.mimuw.edu.pl/~ajank/papers/Kursa2010.pdf) (2010) *Miron B. Kursa, Aleksander Jankowski, Witold R. Rudnicki*
  - BoostARoota - Boruta with XGBoost as a base model ([Code](https://github.com/chasedehan/BoostARoota))
- **MUVR** ([GitLab](https://gitlab.com/CarlBrunius/MUVR))
  - [Variable selection and validation in multivariate modelling](https://academic.oup.com/bioinformatics/article/35/6/972/5085367) (2018) *Lin Shi, Johan A Westerhuis, Johan Rosén, Rikard Landberg, Carl Brunius*
- Wrapper methods and overfitting:
  - [Wrappers for feature subset selection](http://machine-learning.martinsewell.com/feature-selection/KohaviJohn1997.pdf) (1997) *Ron Kohavi, George H. John*
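
A greedy sequential forward selection wrapper can be sketched with scikit-learn's `SequentialFeatureSelector`; the estimator, the synthetic dataset, and the number of selected features below are illustrative assumptions only.

```python
# Wrapper-method sketch: greedy forward selection scored by cross-validation.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

# Hypothetical dataset: 400 samples, 15 features, 4 informative.
X, y = make_classification(n_samples=400, n_features=15, n_informative=4,
                           random_state=0)

# The wrapper refits the model for every candidate subset, so it is far
# costlier than a filter but tailored to the chosen estimator.
sfs = SequentialFeatureSelector(
    LogisticRegression(max_iter=1000),
    n_features_to_select=4,      # assumed target size; could also be tuned
    direction="forward",         # "backward" gives sequential backward selection
    scoring="accuracy",
    cv=5,
)
sfs.fit(X, y)
print("Selected feature indices:", sfs.get_support(indices=True))
print("Reduced data shape:", sfs.transform(X).shape)
```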
### Embedded methods
Embedded methods perform feature selection as part of fitting the model itself (see the sketch after this list).
- **LASSO**
  - [Regression Shrinkage and Selection via the lasso](https://statweb.stanford.edu/~tibs/lasso/lasso.pdf) (1996) *Robert Tibshirani*
- **Elastic net**
  - [Regularization and variable selection via the elastic net](https://web.stanford.edu/~hastie/Papers/B67.2%20(2005)%20301-320%20Zou%20&%20Hastie.pdf) (2005) *Hui Zou, Trevor Hastie*
- **Spike and Slab regression** ([Wiki](https://en.wikipedia.org/wiki/Spike-and-slab_regression))
  - Bayesian variable selection in linear regression (1988) *T.J. Mitchell, J.J. Beauchamp*
  - [Approaches for Bayesian variable selection](http://www-stat.wharton.upenn.edu/~edgeorge/Research_papers/GeorgeMcCulloch97.pdf) (1997) *Edward I. George, Robert E. McCulloch*
- **Decision Tree** ([Wiki](https://en.wikipedia.org/wiki/Decision_tree))
- **Random Forest** ([Wiki](https://en.wikipedia.org/wiki/Random_forest))
  - [Random Forests](https://link.springer.com/article/10.1023/A:1010933404324) (2001) *Leo Breiman*
  - [Overview of Random Forest Methodology and Practical Guidance with Emphasis on Computational Biology and Bioinformatics](https://epub.ub.uni-muenchen.de/13766/1/TR.pdf) (2012) *Anne-Laure Boulesteix, Silke Janitza, Jochen Kruppa, Inke R. König*
  - [Variable selection using random forests](https://hal.archives-ouvertes.fr/hal-00755489/file/PRLv4.pdf) (2010) *Robin Genuer, Jean-Michel Poggi, Christine Tuleau-Malot*
  - [Bias in random forest variable importance measures: Illustrations, sources and a solution](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1796903/pdf/1471-2105-8-25.pdf) (2007) *Carolin Strobl, Anne-Laure Boulesteix, Achim Zeileis, Torsten Hothorn*
  - [Conditional Variable Importance for Random Forests](https://epub.ub.uni-muenchen.de/2821/1/deck.pdf) (2008) *Carolin Strobl, Anne-Laure Boulesteix, Thomas Kneib, Thomas Augustin, Achim Zeileis*
  - [Correlation and variable importance in random forests](https://arxiv.org/pdf/1310.5726.pdf) (2016) *Baptiste Gregorutti, Bertrand Michel, Philippe Saint-Pierre*
- **Gradient Boosting** ([Wiki](https://en.wikipedia.org/wiki/Gradient_boosting))
  - [Greedy Function Approximation: A Gradient Boosting Machine](https://statweb.stanford.edu/~jhf/ftp/trebst.pdf) (1999) *Jerome H. Friedman*
  - [Boosting Algorithms as Gradient Descent](http://papers.nips.cc/paper/1766-boosting-algorithms-as-gradient-descent.pdf) (1999) *Llew Mason, Jonathan Baxter, Peter Bartlett, Marcus Frean*
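
A minimal sketch of embedded selection with scikit-learn, using an L1-penalized linear model and random-forest importances on a synthetic regression task; the dataset size, noise level, and coefficient threshold are illustrative assumptions.

```python
# Embedded-method sketch: LASSO coefficients and random-forest importances.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LassoCV

# Hypothetical regression data: 300 samples, 30 features, 8 informative.
X, y = make_regression(n_samples=300, n_features=30, n_informative=8,
                       noise=5.0, random_state=0)

# LASSO: features with non-zero coefficients are kept (alpha chosen by CV).
lasso = LassoCV(cv=5, random_state=0).fit(X, y)
lasso_mask = np.abs(lasso.coef_) > 1e-8
print("LASSO keeps", lasso_mask.sum(), "features")

# Random forest: rank features by impurity-based importances
# (see Strobl et al. above for caveats about biased importance measures).
forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
print("Top forest features:", np.argsort(forest.feature_importances_)[::-1][:8])

# SelectFromModel wraps either estimator as a reusable transformer step.
X_reduced = SelectFromModel(LassoCV(cv=5, random_state=0)).fit_transform(X, y)
print("Reduced shape:", X_reduced.shape)
```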
### Unsupervised and semi-supervised feature selection
- **FSSEM** Feature Subset Selection using Expectation-Maximization
  - [Feature Selection for Unsupervised Learning](http://www.jmlr.org/papers/volume5/dy04a/dy04a.pdf) (2004) *Jennifer G. Dy, Carla E. Brodley*
- **Laplacian Score**
  - Scores features by how well they preserve local structure in a nearest-neighbor graph
  - [Laplacian Score for Feature Selection](https://papers.nips.cc/paper/2909-laplacian-score-for-feature-selection.pdf) (2005) *Xiaofei He, Deng Cai, Partha Niyogi*
- **Principal Feature Analysis**
  - [Feature Selection Using Principal Feature Analysis](http://venom.cs.utsa.edu/dmz/techrep/2007/CS-TR-2007-011.pdf) (2007) *Yijuan Lu, Ira Cohen, Xiang Sean Zhou, Qi Tian*
- **Spectral Feature Selection**
  - Separates samples into clusters using the spectrum of a pairwise similarity graph
  - [Spectral Feature Selection for Supervised and Unsupervised Learning](https://www.public.asu.edu/~huanliu/papers/icml07.pdf) (2007) *Zheng Zhao, Huan Liu*
- **MCFS** Multi-cluster Feature Selection
  - [Unsupervised Feature Selection for Multi-Cluster Data](https://wwwx.cs.unc.edu/Courses/comp790-090-s11/Presentations/p333-cai.pdf) (2010) *Deng Cai, Chiyuan Zhang, Xiaofei He*
- **Autoencoders** ([Wiki](https://en.wikipedia.org/wiki/Autoencoder))
  - [Autoencoders, Unsupervised Learning, and Deep Architectures](http://proceedings.mlr.press/v27/baldi12a/baldi12a.pdf) (2012) *Pierre Baldi*
  - [An Introduction to Variational Autoencoders](https://arxiv.org/pdf/1906.02691.pdf) (2019) *Diederik P. Kingma, Max Welling*
  - [Concrete Autoencoders for Differentiable Feature Selection and Reconstruction](https://arxiv.org/pdf/1901.09346.pdf) (2019) *Abubakar Abid, Muhammad Fatih Balin, James Zou*

### Stable feature selection
- [Stability of Feature Selection Algorithms: a study on high dimensional spaces](http://cui.unige.ch/~kalousis/papers/stability/KalousisPradosHilarioKIS2007.pdf) (2007) *Alexandros Kalousis, Julien Prados, Melanie Hilario*
- [Robust Feature Selection Using Ensemble Feature Selection Techniques](http://bioinformatics.psb.ugent.be/pdf/publications/978-3-540-87481-2.pdf) (2008) *Yvan Saeys, Thomas Abeel, Yves Van de Peer*
- [Stability Selection](https://stat.ethz.ch/~nicolai/stability.pdf) (2009) *Nicolai Meinshausen, Peter Bühlmann*
- [A Novel Weighted Combination Method for Feature Selection using Fuzzy Sets](https://arxiv.org/pdf/2005.05003.pdf) (2020) *Zixiao Shen, Xin Chen, Jonathan M. Garibaldi*
- Stability of MDA, LIME and SHAP: [The best way to select features](https://arxiv.org/pdf/2005.12483.pdf) (2020) *Xin Man, Ernest P. Chan*

### Domain-specific
- **Uplift models**
  - [Feature Selection Methods for Uplift Modeling](https://arxiv.org/pdf/2005.03447.pdf) (2020) *Zhenyu Zhao, Yumin Zhang, Totte Harinen, Mike Yung*

### Meta feature selection
- [A Feature Subset Selection Algorithm Automatic Recommendation Method](https://arxiv.org/pdf/1402.0570.pdf) (2013) *Guangtao Wang, Qinbao Song, Heli Sun, Xueying Zhang, Baowen Xu, Yuming Zhou*
- [Metalearning for Choosing Feature Selection Algorithms in Data Mining: Proposal of a New Framework](https://www.researchgate.net/profile/Antonio_Parmezan/publication/312482443_Metalearning_for_Choosing_Feature_Selection_Algorithms_in_Data_Mining_Proposal_of_a_New_Framework/links/5c3f0e7f92851c22a378a5a6/Metalearning-for-Choosing-Feature-Selection-Algorithms-in-Data-Mining-Proposal-of-a-New-Framework.pdf) (2017) *Antonio Rafael Sabino Parmezan, Huei Diana Lee*
- [A Novel Meta Learning Framework for Feature Selection using Data Synthesis and Fuzzy Similarity](https://arxiv.org/pdf/2005.09856.pdf) (2020) *Zixiao Shen, Xin Chen, Jonathan M. Garibaldi*

### Packages
- **R**
  - Package: fscaret ([CRAN](https://cran.r-project.org/web/packages/fscaret/)) *Jakub Szlek*
  - Package: praznik ([Code](https://gitlab.com/mbq/praznik)) *Miron Kursa*
  - Package: FSinR ([CRAN](https://cran.r-project.org/web/packages/FSinR/), [Paper](https://arxiv.org/pdf/2002.10330.pdf)) *Francisco Aragón-Royón, Alfonso Jiménez-Vílchez, Antonio Arauzo-Azofra, José Manuel Benítez*
  - Package: VSURF ([CRAN](https://cran.r-project.org/web/packages/VSURF/), [Paper](https://journal.r-project.org/archive/2015-2/genuer-poggi-tuleaumalot.pdf))
  - Package: spikeSlabGAM ([Code](https://github.com/fabian-s/spikeSlabGAM), [CRAN](https://cran.r-project.org/web/packages/spikeSlabGAM/), [Paper](https://www.jstatsoft.org/article/view/v043i14))
  - Package: copent ([CRAN](https://cran.r-project.org/web/packages/copent/), [Code](https://github.com/majianthu/copent), [Paper](https://arxiv.org/pdf/2005.14025.pdf))
- **Python**
  - Package: sklearn.feature_selection ([Homepage](https://scikit-learn.org/stable/), [Code](https://github.com/scikit-learn/scikit-learn), [PyPI](https://pypi.org/project/scikit-learn/))
  - Package: scikit-feature ([Homepage](http://featureselection.asu.edu/), [Code](https://github.com/jundongl/scikit-feature))
  - Package: feature-selector ([Code](https://github.com/WillKoehrsen/feature-selector), [PyPI](https://pypi.org/project/feature-selector/))
- **Julia**
  - The main packages for ML in Julia are [MLJ](https://github.com/alan-turing-institute/MLJ.jl) and [Flux](https://fluxml.ai/Flux.jl/stable/)
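
The scikit-learn tools listed above compose into a single pipeline, which keeps feature selection inside each cross-validation fold and so avoids the selection-induced overfitting discussed by Kohavi and John. A minimal sketch; the synthetic dataset, `k`, and regularization strength are illustrative assumptions.

```python
# Sketch: filter + embedded selection chained in one pipeline, evaluated with CV.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, VarianceThreshold, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

# Hypothetical dataset: 600 samples, 50 features, 10 informative.
X, y = make_classification(n_samples=600, n_features=50, n_informative=10,
                           random_state=0)

pipe = Pipeline([
    ("variance", VarianceThreshold(threshold=0.0)),   # drop constant columns
    ("univariate", SelectKBest(f_classif, k=20)),     # filter step
    ("model", LogisticRegression(penalty="l1", solver="liblinear", C=0.5)),  # embedded L1 step
])

# Because selection happens inside each CV fold, the score is not inflated
# by features chosen on the full dataset.
scores = cross_val_score(pipe, X, y, cv=5, scoring="accuracy")
print("CV accuracy: %.3f ± %.3f" % (scores.mean(), scores.std()))
```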