# :rocket: Where to Start Practicing?
If you cannot wait to start practicing, you can do so at:
* [Titanic: Machine Learning from Disaster](https://www.kaggle.com/c/titanic) (Kaggle). An excellent free challenge for practicing classification in machine learning.
* [Gym: Reinforcement Learning Toolkit](https://gym.openai.com/) (OpenAI). A nice toolkit for getting started with and practicing reinforcement learning.
* [Neural Network Playground](https://playground.tensorflow.org/) (TensorFlow). A visualization of how neural networks work.

# :video_camera: Courses
* [Machine Learning Crash Course](https://developers.google.com/machine-learning/crash-course/ml-intro) (by Google)
* [Machine Learning Problem Framing](https://developers.google.com/machine-learning/problem-framing/) (by Google)
* [Coursera Course: Machine Learning](https://www.coursera.org/learn/machine-learning) (by Dr. Andrew Ng)
* [Start Training on Machine Learning with AWS](https://aws.amazon.com/training/learning-paths/machine-learning/) (by AWS)
* [AI For Everyone](https://www.deeplearning.ai/ai-for-everyone/) (by Dr. Andrew Ng)
* [Statistics and Probability](https://www.khanacademy.org/math/statistics-probability) (Khan Academy). An excellent free resource for getting started with statistics and probability.
* [Linear Algebra](https://www.khanacademy.org/math/linear-algebra) (Khan Academy)

# ![YouTube](https://cdn.emojidex.com/emoji/px32/YouTube.png?1512927079 "YouTube") YouTube Channels
* [StatQuest with Josh Starmer](https://www.youtube.com/channel/UCtYLUTtgS3k1Fg4y5tAhLbw).
StatQuest breaks down complicated statistics and machine learning methods into small, bite-sized pieces that are easy to understand.
* [Statistics 110](https://www.youtube.com/watch?v=KbB0FjPg0mw&index=1&list=PL2SOU6wwxB0uwwH80KTQ6ht66KWxbzTIo) (Course by [Harvard University](https://www.quora.com/Why-is-Stat-110-so-popular-at-Harvard))
* [3Blue1Brown](https://www.youtube.com/channel/UCYO_jab_esuFRV4b17AJtAw). 3Blue1Brown, by Grant Sanderson, is some combination of math and entertainment, depending on your disposition.

# :books: Books
## Introductory level
* [A First Course in Machine Learning](https://www.amazon.com/Simon-Rogers/dp/1498738486/) (Chapman and Hall, by Simon Rogers and Mark Girolami)
* [Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems](https://www.amazon.com/Hands-Machine-Learning-Scikit-Learn-TensorFlow/dp/1492032646/) (O'Reilly, by Aurélien Géron)
* [Machine Learning For Absolute Beginners: A Plain English Introduction](https://www.amazon.com/Machine-Learning-Absolute-Beginners-Introduction/dp/152095140X) (by Oliver Theobald)

## Intermediate/Advanced level
* [Pattern Recognition and Machine Learning](https://www.springer.com/gp/book/9780387310732) (Springer, by Christopher Bishop)
* [Artificial Intelligence: A Modern Approach](http://aima.cs.berkeley.edu/) (Pearson, by Stuart Russell and Peter Norvig)
* [The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition](https://www.amazon.com/Elements-Statistical-Learning-Prediction-Statistics/dp/0387848576) (Springer, by Trevor Hastie, Robert Tibshirani and Jerome Friedman)
* [Advanced R](https://adv-r.hadley.nz/) (Chapman and Hall, by Hadley Wickham)

# :school: Math
* [Standard Deviation Visual](https://www.youtube.com/watch?v=pW8GZujRcFI) (Video)
* Random Variables and Distributions
  * [Find Percentiles for a General Continuous Random Variable](https://www.youtube.com/watch?v=qo4Zj1n3Gak) (Video)
  * [Probability Density Functions / Continuous Random Variables](https://www.youtube.com/watch?v=szjL60gAweE) (Video)
  * [Applied Multivariate Statistical Analysis](https://online.stat.psu.edu/stat505/) (Course by PennState Eberly College of Science)
  * [Bivariate Normal Distribution: Conditional Distributions](https://www.youtube.com/watch?v=fb8uE4NM2fc) (Video)
  * [Poisson Distribution](http://www.stats.ox.ac.uk/~marchini/teaching/L5/L5.notes.pdf) (PDF)
  * [Approximating a Binomial Probability Distribution Using a Normal Distribution](https://www.youtube.com/watch?v=rPOSpI7qMl0) (Video, Part 1) [(Part 2)](https://www.youtube.com/watch?v=LYjKrMDdWKA)
  * [Continuity Corrections](https://www.youtube.com/watch?v=mjV8okVG1sc) (Video)
  * [How to Do Normal Distribution Calculations](https://statistics.laerd.com/statistical-guides/normal-distribution-calculations.php) (Article)
* [Fourier Transformation for a Data Scientist](https://towardsdatascience.com/fourier-transformation-for-a-data-scientist-1f3731115097) (Article)

# :computer: Programming
* [Learn R, in R](https://swirlstats.com/) (swirl)
* R Tutorial - A Beginner's Guide: [Link 1](https://www.edureka.co/blog/r-tutorial/), [Link 2](https://www.edureka.co/blog/r-programming-language)
* [11 Beginner Tips for Learning Python Programming](https://realpython.com/python-beginner-tips/) (Article)
* [Python Libraries for Interpretable Machine Learning](https://towardsdatascience.com/python-libraries-for-interpretable-machine-learning-c476a08ed2c7) (Article)

## :computer: Tutorials
* [How to Create State and County Maps Easily in R](https://medium.com/@urban_institute/how-to-create-state-and-county-maps-easily-in-r-577d29300bb2)
* [Setup Random Seeds on the caret Package](http://jaehyeon-kim.github.io/2015/05/Setup-Random-Seeds-on-Caret-Package.html)
* [Logistic Regression in R Tutorial](https://www.datacamp.com/community/tutorials/logistic-regression-R)
* [Cross-Validation for Predictive Analytics Using R](http://www.milanor.net/blog/cross-validation-for-predictive-analytics-using-r/)
* [K-Means Clustering in R Tutorial](https://www.datacamp.com/community/tutorials/k-means-clustering-r)
* [Improve Your Model Performance Using Cross-Validation (in Python and R)](https://www.analyticsvidhya.com/blog/2018/05/improve-model-performance-cross-validation-in-python-r/)
* [Accuracy and Errors for Models](https://rcompanion.org/handbook/G_14.html)
* [Penalized Logistic Regression Essentials in R: Ridge, Lasso and Elastic Net](http://www.sthda.com/english/articles/36-classification-methods-essentials/149-penalized-logistic-regression-essentials-in-r-ridge-lasso-and-elastic-net/)

## :computer: Libraries
* TensorFlow
* Theano
* Scikit-learn

# :school: Schools
## México
* [Inteligencia Futura](http://inteligenciafutura.mx/blog/inteligencia-artificial-aprendizaje-de-maquina-ciencia-de-datos-carreras-del-futuro)
* [Maestría en Cómputo Estadístico](https://mce.cimat.mx/)

# Data Privacy
## México
* [Registro Público de Usuarios (REUS)](https://www.gob.mx/tramites/ficha/registro-publico-de-usuarios-reus-para-personas-fisicas/CONDUSEF2536)

## Europe
* [GDPR](https://gdpr-info.eu/)

## United States
* [CCPA: California Consumer Privacy Act](https://oag.ca.gov/privacy/ccpa)

# Blogs and Articles
* [Lexalytics Blog](https://www.lexalytics.com/lexablog/category/machine-learning)

# Publications

## Classification
* Norton, M. & Uryasev, S. (2019). [Maximization of AUC and buffered AUC in binary classification. Mathematical Programming.](https://dl.acm.org/doi/abs/10.1007/s10107-018-1312-2)
* Boser, B. E., Guyon, I. M., & Vapnik, V. N. (1992). [A training algorithm for optimal margin classifiers. In Proceedings of the Fifth Annual Workshop on Computational Learning Theory.](https://dl.acm.org/doi/10.1145/130385.130401)
* Tharwat, A. (2018). [Classification assessment methods. Applied Computing and Informatics.](https://doi.org/10.1016/j.aci.2018.08.003)
* De Siqueira Santos, S., Takahashi, D. Y., Nakata, A., & Fujita, A. (2013). [A comparative study of statistical methods used to identify dependencies between gene expression signals. Briefings in Bioinformatics.](https://doi.org/10.1093/bib/bbt051)
* Dietterich, T. G. (1998). [Approximate statistical tests for comparing supervised classification learning algorithms.](https://doi.org/10.1162/089976698300017197)
* Kotsiantis, S. B. (2007). [Supervised machine learning: A review of classification techniques.](https://dl.acm.org/doi/10.5555/1566770.1566773)
* Kuncheva, L. I. (2004). [Combining pattern classifiers: Methods and algorithms. New York, NY, USA: Wiley-Interscience.](https://doi.org/10.1002/0471660264)

### Logistic Regression
* Stoltzfus, J. C. (2011). [Logistic regression: A brief primer.](https://doi.org/10.1111/j.1553-2712.2011.01185.x)
* Lee, S.-I., Lee, H., Abbeel, P., & Ng, A. Y. (2006). [Efficient L1 regularized logistic regression.](https://www.aaai.org/Papers/AAAI/2006/AAAI06-064.pdf)
* Peduzzi, P., Concato, J. P., Kemper, E., Holford, T., & Feinstein, A. R. (1996). [A simulation study of the number of events per variable in logistic regression analysis.](https://doi.org/10.1016/S0895-4356(96)00236-3)

### Support Vector Machines
* Cortes, C. & Vapnik, V. (1995). [Support-vector networks.](https://doi.org/10.1023/A:1022627411411)
* R. T. (2016). [Control-group feature normalization for multivariate pattern analysis of structural MRI data using the support vector machine.](https://doi.org/10.1016/j.neuroimage.2016.02.044)
* Hsu, C.-W., Chang, C.-C., & Lin, C.-J. (2003). [A practical guide to support vector classification.](https://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf)

### Decision Trees and Random Forests
* Murthy, S. K. (1998). [Automatic construction of decision trees from data: A multi-disciplinary survey.](https://doi.org/10.1023/A:1009744630224)
* Rokach, L. & Maimon, O. (2005). [Top-down induction of decision trees classifiers - a survey.](https://doi.org/10.1109/TSMCC.2004.843247)
* Breiman, L. (2001). [Random forests.](https://doi.org/10.1023/A:1010933404324)
* Cutler, D. R., Edwards Jr., T. C., Beard, K. H., Cutler, A., Hess, K. T., Gibson, J., & Lawler, J. J. (2007). [Random forests for classification in ecology.](https://doi.org/10.1890/07-0539.1)
* Louppe, G., Wehenkel, L., Sutera, A., & Geurts, P. (2013).
[Understanding variable importances in forests of randomized trees.](http://papers.nips.cc/paper/4928-understanding-variable-importances-in-forests-of-randomized-trees.pdf)
* Probst, P., Wright, M. N., & Boulesteix, A.-L. (2019). [Hyperparameters and tuning strategies for random forest.](https://doi.org/10.1002/widm.1301)

### Data Imbalance
* Chawla, N. V. (2005). [Data mining for imbalanced datasets: An overview. In O. Maimon & L. Rokach (Eds.), Data mining and knowledge discovery handbook.](https://doi.org/10.1007/0-387-25465-X_40)
* Ganganwar, V. (2012). An overview of classification algorithms for imbalanced datasets. Retrieved from www.ijetae.com

### Collinearity
* Dormann, C. F., Elith, J., Bacher, S., Buchmann, C., Carl, G., Carré, G., & Lautenbach, S. (2013). [Collinearity: A review of methods to deal with it and a simulation study evaluating their performance.](https://doi.org/10.1111/j.1600-0587.2012.07348.x)

### Evaluation
* Raschka, S. (2018). [Model evaluation, model selection, and algorithm selection in machine learning.](https://arxiv.org/abs/1811.12808)
* Kohavi, R. (1995). [A study of cross-validation and bootstrap for accuracy estimation and model selection.](https://dl.acm.org/doi/10.5555/1643031.1643047)
* Spearman, C. (1904). ["General intelligence," objectively determined and measured.](https://doi.org/10.2307/1412107)

### Spatial Statistics
* Pebesma, E. (2018). [Simple Features for R: Standardized support for spatial vector data.](https://doi.org/10.32614/RJ-2018-009)

### Other
* Cochran, W. G. (1950). [The comparison of percentages in matched samples.](https://doi.org/10.1093/biomet/37.3-4.256)
* Hoeffding, W. (1948). [A non-parametric test of independence.](https://doi.org/10.1214/aoms/1177730150)
* Samuels, M. L. (1993).
[Simpson’s paradox and related phenomena.](http://www.jstor.org/stable/2290700)
* Edwards, A. L. (1948). [Note on the “correction for continuity” in testing the significance of the difference between correlated proportions.](https://doi.org/10.1007/BF02289261)
* Downar, L. & Duivesteijn, W. (2017). [Exceptionally monotone models - the rank correlation model class for exceptional model mining.](https://doi.org/10.1007/s10115-016-0979-z)
* McNemar, Q. (1947). [Note on the sampling error of the difference between correlated proportions or percentages.](https://doi.org/10.1007/BF02295996)
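Several of the papers above (Dietterich 1998; McNemar 1947; Edwards 1948) revolve around McNemar's test, a standard way to compare two classifiers evaluated on the same test set. A minimal sketch in Python, using only the standard language (the function name `mcnemar_statistic` and the example counts are our own, not from any of the cited papers):

```python
def mcnemar_statistic(b: int, c: int) -> float:
    """Chi-square statistic for McNemar's test with Edwards' continuity
    correction: (|b - c| - 1)^2 / (b + c).

    Only the discordant pairs matter:
      b = test cases that only model A classified correctly,
      c = test cases that only model B classified correctly.
    """
    if b + c == 0:
        return 0.0  # no disagreements: the models are indistinguishable
    return (abs(b - c) - 1) ** 2 / (b + c)


# Hypothetical example: model A alone is correct on 15 cases,
# model B alone on 5.
stat = mcnemar_statistic(15, 5)  # (10 - 1)^2 / 20 = 4.05

# Compare against the chi-square critical value with 1 degree of
# freedom at alpha = 0.05, which is about 3.841.
reject_null = stat > 3.841  # True: the accuracies differ significantly
```

In practice, libraries such as `statsmodels` provide ready-made (and exact, small-sample) versions of this test; the sketch is only meant to show how little the test actually requires.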