# :rocket: Where to Start Practicing?
If you cannot wait to start practicing, you can do so at:
* [Titanic: Machine Learning from Disaster](https://www.kaggle.com/c/titanic) (Kaggle). An excellent free challenge for practicing classification in machine learning.
* [Gym: Reinforcement Learning Toolkit](https://gym.openai.com/) (OpenAI). A nice toolkit for getting started with and practicing reinforcement learning.
* [Neural Network Playground](https://playground.tensorflow.org/) (TensorFlow). A visualization of how neural networks work.

# :video_camera: Courses
* [Machine Learning Crash Course](https://developers.google.com/machine-learning/crash-course/ml-intro) (by Google)
* [Machine Learning Problem Framing](https://developers.google.com/machine-learning/problem-framing/) (by Google)
* [Coursera Course: Machine Learning](https://www.coursera.org/learn/machine-learning) (by Dr. Andrew Ng)
* [Start Training on Machine Learning with AWS](https://aws.amazon.com/training/learning-paths/machine-learning/) (by AWS)
* [AI For Everyone](https://www.deeplearning.ai/ai-for-everyone/) (by Dr. Andrew Ng)
* [Statistics and Probability](https://www.khanacademy.org/math/statistics-probability) (Khan Academy). An excellent free resource for getting started with statistics and probability.
* [Linear Algebra](https://www.khanacademy.org/math/linear-algebra) (Khan Academy)

# ![YouTube](https://cdn.emojidex.com/emoji/px32/YouTube.png?1512927079 "YouTube") YouTube Channels
* [StatQuest with Josh Starmer](https://www.youtube.com/channel/UCtYLUTtgS3k1Fg4y5tAhLbw).
StatQuest breaks down complicated statistics and machine learning methods into small, bite-sized pieces that are easy to understand.
* [Statistics 110](https://www.youtube.com/watch?v=KbB0FjPg0mw&index=1&list=PL2SOU6wwxB0uwwH80KTQ6ht66KWxbzTIo) (Course by [Harvard University](https://www.quora.com/Why-is-Stat-110-so-popular-at-Harvard))
* [3Blue1Brown](https://www.youtube.com/channel/UCYO_jab_esuFRV4b17AJtAw). 3Blue1Brown, by Grant Sanderson, is some combination of math and entertainment, depending on your disposition.

# :books: Books
## Introductory level
* [A First Course in Machine Learning](https://www.amazon.com/Simon-Rogers/dp/1498738486/) (Chapman and Hall, by Simon Rogers and Mark Girolami)
* [Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems](https://www.amazon.com/Hands-Machine-Learning-Scikit-Learn-TensorFlow/dp/1492032646/) (O'Reilly, by Aurélien Géron)
* [Machine Learning For Absolute Beginners: A Plain English Introduction](https://www.amazon.com/Machine-Learning-Absolute-Beginners-Introduction/dp/152095140X) (by Oliver Theobald)

## Intermediate/Advanced level
* [Pattern Recognition and Machine Learning](https://www.springer.com/gp/book/9780387310732) (Springer, by Christopher Bishop)
* [Artificial Intelligence: A Modern Approach](http://aima.cs.berkeley.edu/) (Pearson, by Stuart Russell and Peter Norvig)
* [The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition](https://www.amazon.com/Elements-Statistical-Learning-Prediction-Statistics/dp/0387848576) (Springer, by Trevor Hastie, Robert Tibshirani and Jerome Friedman)
* [Advanced R](https://adv-r.hadley.nz/) (Chapman and Hall, by Hadley Wickham)

# :school: Math
* [Standard Deviation Visual](https://www.youtube.com/watch?v=pW8GZujRcFI) (Video)
* Random Variables and Distributions
  * [Find Percentiles for a General Continuous Random Variable](https://www.youtube.com/watch?v=qo4Zj1n3Gak) (Video)
  * [Probability Density Functions / Continuous Random Variables](https://www.youtube.com/watch?v=szjL60gAweE) (Video)
  * [Applied Multivariate Statistical Analysis](https://online.stat.psu.edu/stat505/) (Course by PennState Eberly College of Science)
  * [Bivariate Normal Distribution: Conditional Distributions](https://www.youtube.com/watch?v=fb8uE4NM2fc) (Video)
  * [Poisson Distribution](http://www.stats.ox.ac.uk/~marchini/teaching/L5/L5.notes.pdf) (PDF)
  * [Approximating a Binomial Probability Distribution Using a Normal Distribution](https://www.youtube.com/watch?v=rPOSpI7qMl0) (Video, Part 1) [(Part 2)](https://www.youtube.com/watch?v=LYjKrMDdWKA)
  * [Continuity Corrections](https://www.youtube.com/watch?v=mjV8okVG1sc) (Video)
  * [How to Do Normal Distribution Calculations](https://statistics.laerd.com/statistical-guides/normal-distribution-calculations.php) (Article)
* [Fourier Transformation for a Data Scientist](https://towardsdatascience.com/fourier-transformation-for-a-data-scientist-1f3731115097) (Article)

# :computer: Programming
* [Learn R, in R](https://swirlstats.com/) (swirl)
* R Tutorial - A Beginner's Guide: [Link 1](https://www.edureka.co/blog/r-tutorial/), [Link 2](https://www.edureka.co/blog/r-programming-language)
* [11 Beginner Tips for Learning Python Programming](https://realpython.com/python-beginner-tips/) (Article)
* [Python Libraries for Interpretable Machine Learning](https://towardsdatascience.com/python-libraries-for-interpretable-machine-learning-c476a08ed2c7) (Article)

## :computer: Tutorials
* [How to Create State and County Maps Easily in R](https://medium.com/@urban_institute/how-to-create-state-and-county-maps-easily-in-r-577d29300bb2)
* [Setup Random Seeds on the caret Package](http://jaehyeon-kim.github.io/2015/05/Setup-Random-Seeds-on-Caret-Package.html)
* [Logistic Regression in R Tutorial](https://www.datacamp.com/community/tutorials/logistic-regression-R)
* [Cross-Validation for Predictive Analytics Using R](http://www.milanor.net/blog/cross-validation-for-predictive-analytics-using-r/)
* [K-Means Clustering in R Tutorial](https://www.datacamp.com/community/tutorials/k-means-clustering-r)
* [Improve Your Model Performance Using Cross-Validation (in Python and R)](https://www.analyticsvidhya.com/blog/2018/05/improve-model-performance-cross-validation-in-python-r/)
* [Accuracy and Errors for Models](https://rcompanion.org/handbook/G_14.html)
* [Penalized Logistic Regression Essentials in R: Ridge, Lasso and Elastic Net](http://www.sthda.com/english/articles/36-classification-methods-essentials/149-penalized-logistic-regression-essentials-in-r-ridge-lasso-and-elastic-net/)

## :computer: Libraries
* TensorFlow
* Theano
* Scikit-learn

# :school: Schools
## México
* [Inteligencia Futura](http://inteligenciafutura.mx/blog/inteligencia-artificial-aprendizaje-de-maquina-ciencia-de-datos-carreras-del-futuro)
* [Maestría en Cómputo Estadístico](https://mce.cimat.mx/)

# Data Privacy
## México
* [Registro Público de Usuarios (REUS)](https://www.gob.mx/tramites/ficha/registro-publico-de-usuarios-reus-para-personas-fisicas/CONDUSEF2536)

## Europe
* [GDPR](https://gdpr-info.eu/)

## United States
* [CCPA: California Consumer Privacy Act](https://oag.ca.gov/privacy/ccpa)

# Blogs and Articles
* [Lexalytics Blog](https://www.lexalytics.com/lexablog/category/machine-learning)

# Publications

## Classification
* Norton, M. & Uryasev, S. (2019). [Maximization of AUC and buffered AUC in binary classification. Mathematical Programming.](https://dl.acm.org/doi/abs/10.1007/s10107-018-1312-2)
* Boser, B. E., Guyon, I. M., & Vapnik, V. N. (1992). [A training algorithm for optimal margin classifiers. In Proceedings of the Fifth Annual Workshop on Computational Learning Theory.](https://dl.acm.org/doi/10.1145/130385.130401)
* Tharwat, A. (2018). [Classification assessment methods. Applied Computing and Informatics.](https://doi.org/10.1016/j.aci.2018.08.003)
* De Siqueira Santos, S., Takahashi, D. Y., Nakata, A., & Fujita, A. (2013). [A comparative study of statistical methods used to identify dependencies between gene expression signals. Briefings in Bioinformatics.](https://doi.org/10.1093/bib/bbt051)
* Dietterich, T. G. (1998). [Approximate statistical tests for comparing supervised classification learning algorithms.](https://doi.org/10.1162/089976698300017197)
* Kotsiantis, S. B. (2007). [Supervised machine learning: A review of classification techniques.](https://dl.acm.org/doi/10.5555/1566770.1566773)
* Kuncheva, L. I. (2004). [Combining pattern classifiers: Methods and algorithms. New York, NY, USA: Wiley-Interscience.](https://doi.org/10.1002/0471660264)

### Logistic Regression
* Stoltzfus, J. C. (2011). [Logistic regression: A brief primer.](https://doi.org/10.1111/j.1553-2712.2011.01185.x)
* Lee, S.-I., Lee, H., Abbeel, P., & Ng, A. Y. (2006). [Efficient L1 regularized logistic regression.](https://www.aaai.org/Papers/AAAI/2006/AAAI06-064.pdf)
* Peduzzi, P., Concato, J. P., Kemper, E., Holford, T., & Feinstein, A. R. (1996). [A simulation study of the number of events per variable in logistic regression analysis.](https://doi.org/10.1016/S0895-4356(96)00236-3)

### Support Vector Machines
* Cortes, C. & Vapnik, V. (1995). [Support-vector networks.](https://doi.org/10.1023/A:1022627411411)
* R. T. (2016). [Control-group feature normalization for multivariate pattern analysis of structural MRI data using the support vector machine.](https://doi.org/10.1016/j.neuroimage.2016.02.044)
* Hsu, C.-W., Chang, C.-C., & Lin, C.-J. (2003). [A practical guide to support vector classification.](https://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf)

### Decision Trees and Random Forests
* Murthy, S. K. (1998). [Automatic construction of decision trees from data: A multi-disciplinary survey.](https://doi.org/10.1023/A:1009744630224)
* Rokach, L. & Maimon, O. (2005). [Top-down induction of decision trees classifiers - a survey.](https://doi.org/10.1109/TSMCC.2004.843247)
* Breiman, L. (2001). [Random forests.](https://doi.org/10.1023/A:1010933404324)
* Cutler, D. R., Edwards Jr., T. C., Beard, K. H., Cutler, A., Hess, K. T., Gibson, J., & Lawler, J. J. (2007). [Random forests for classification in ecology.](https://doi.org/10.1890/07-0539.1)
* Louppe, G., Wehenkel, L., Sutera, A., & Geurts, P. (2013).
[Understanding variable importances in forests of randomized trees.](http://papers.nips.cc/paper/4928-understanding-variable-importances-in-forests-of-randomized-trees.pdf)
* Probst, P., Wright, M. N., & Boulesteix, A.-L. (2019). [Hyperparameters and tuning strategies for random forest.](https://doi.org/10.1002/widm.1301)

### Data Imbalance
* Chawla, N. V. (2005). [Data mining for imbalanced datasets: An overview. In O. Maimon & L. Rokach (Eds.), Data mining and knowledge discovery handbook.](https://doi.org/10.1007/0-387-25465-X_40)
* Ganganwar, V. (2012). An overview of classification algorithms for imbalanced datasets. Retrieved from www.ijetae.com

### Collinearity
* Dormann, C. F., Elith, J., Bacher, S., Buchmann, C., Carl, G., Carré, G., & Lautenbach, S. (2013). [Collinearity: A review of methods to deal with it and a simulation study evaluating their performance.](https://doi.org/10.1111/j.1600-0587.2012.07348.x)

### Evaluation
* Raschka, S. (2018). [Model evaluation, model selection, and algorithm selection in machine learning.](https://arxiv.org/abs/1811.12808)
* Kohavi, R. (1995). [A study of cross-validation and bootstrap for accuracy estimation and model selection.](https://dl.acm.org/doi/10.5555/1643031.1643047)
* Spearman, C. (1904). ["General intelligence," objectively determined and measured.](https://doi.org/10.2307/1412107)

### Spatial Statistics
* Pebesma, E. (2018). [Simple Features for R: Standardized support for spatial vector data.](https://doi.org/10.32614/RJ-2018-009)

### Other
* Cochran, W. G. (1950). [The comparison of percentages in matched samples.](https://doi.org/10.1093/biomet/37.3-4.256)
* Hoeffding, W. (1948). [A non-parametric test of independence.](https://doi.org/10.1214/aoms/1177730150)
* Samuels, M. L. (1993).
[Simpson’s paradox and related phenomena.](http://www.jstor.org/stable/2290700)
* Edwards, A. L. (1948). [Note on the “correction for continuity” in testing the significance of the difference between correlated proportions.](https://doi.org/10.1007/BF02289261)
* Downar, L. & Duivesteijn, W. (2017). [Exceptionally monotone models - the rank correlation model class for exceptional model mining.](https://doi.org/10.1007/s10115-016-0979-z)
* McNemar, Q. (1947). [Note on the sampling error of the difference between correlated proportions or percentages.](https://doi.org/10.1007/BF02295996)
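Several of the papers above (Dietterich 1998; McNemar 1947; Edwards 1948) revolve around McNemar's test, a standard way to compare two classifiers evaluated on the same test set. A minimal sketch in Python, using only the standard language (the function name `mcnemar_statistic` and the example counts are our own, not from any of the cited papers):

```python
def mcnemar_statistic(b: int, c: int) -> float:
    """Chi-square statistic for McNemar's test with Edwards' continuity
    correction: (|b - c| - 1)^2 / (b + c).

    Only the discordant pairs matter:
      b = test cases that only model A classified correctly,
      c = test cases that only model B classified correctly.
    """
    if b + c == 0:
        return 0.0  # no disagreements: the models are indistinguishable
    return (abs(b - c) - 1) ** 2 / (b + c)


# Hypothetical example: model A alone is correct on 15 cases,
# model B alone on 5.
stat = mcnemar_statistic(15, 5)  # (10 - 1)^2 / 20 = 4.05

# Compare against the chi-square critical value with 1 degree of
# freedom at alpha = 0.05, which is about 3.841.
reject_null = stat > 3.841  # True: the accuracies differ significantly
```

In practice, libraries such as `statsmodels` provide ready-made (and exact, small-sample) versions of this test; the sketch is only meant to show how little the test actually requires.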