├── .github └── FUNDING.yml ├── Interview Guide ├── 100 Python Interview Questions and Answers [Updated 2019].pdf ├── 109 Data Science Interview Questions and Answers _ Springboard Blog.pdf ├── 15 Free eBooks to Learn Python - codeburst.pdf ├── 50 Most Common Interview Questions and Answers.pdf ├── 50 of the Best Data Science Blogs _ Springboard Blog.pdf ├── Data Science Interview Guide.xlsx ├── Data Science Interview Questions.pdf ├── Getting Your First Job in DS.pdf ├── ML Questions.pdf ├── Top 10 Machine LEarning Algorithms.pdf ├── Top 100 Python Interview Questions & Answers For 2019 _ Edureka.pdf ├── Top 45 Data Science Interview Questions and Answers For 2019 _ Edureka.pdf └── Top Interview Questions you must know.pdf ├── Jonathan Bower.md ├── LICENSE ├── Machine Learning ├── INTRODUCTION TO MACHINE LEARNING.pdf ├── Linear Discriminant Analysis.pdf ├── ML Cheat Sheet.pdf ├── ML Formulas.pdf ├── ML PROS N CONS.pdf ├── ML-02-linear-regression.pdf ├── Machine Learning Cheatsheet.pdf ├── Machine Learning Cheatsheet_NEW.pdf ├── Machine Learning Modelling in R.pdf ├── Pytorch.pdf ├── Supervised Machine Learning.pdf └── mlr.pdf ├── README.md ├── SQL ├── SQL for DS.pdf ├── SQL-Cheat-Sheet SQL-Tutorial.pdf ├── SQL-cheat-sheet SQLTutorials.pdf ├── SQL-cheat-sheet.pdf ├── sql-cheat-sheet (1).pdf ├── sql-cheat-sheet (2).pdf ├── sql-cheat-sheet-for-data-scientists-by-tomi-mester.pdf └── sql_cheat_sheet2.pdf └── Statistics and Probability ├── Probability and Naive Bayes.pdf ├── Probability.pdf ├── Stat 100 Final Cheat Sheets - Google Docs (2).pdf ├── Statistics Cheat Sheet.pdf └── cheatsheet-statistics.pdf /.github/FUNDING.yml: -------------------------------------------------------------------------------- 1 | 2 | # These are supported funding model platforms 3 | 4 | github: # Replace with up to 4 GitHub Sponsors-enabled usernames e.g., [user1, user2] 5 | patreon: # Replace with a single Patreon username 6 | open_collective: # Replace with a single Open Collective 
username 7 | ko_fi: # Replace with a single Ko-fi username 8 | tidelift: # Replace with a single Tidelift platform-name/package-name e.g., npm/babel 9 | community_bridge: # Replace with a single Community Bridge project-name e.g., cloud-foundry 10 | liberapay: # Replace with a single Liberapay username 11 | issuehunt: # Replace with a single IssueHunt username 12 | otechie: # Replace with a single Otechie username 13 | custom: ['https://www.buymeacoffee.com/iamsivab'] 14 | -------------------------------------------------------------------------------- /Interview Guide/100 Python Interview Questions and Answers [Updated 2019].pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/storieswithsiva/Data-Science-Beginners-Guide/1ed74b07394eec3affe2a532f59b50fa17c55a54/Interview Guide/100 Python Interview Questions and Answers [Updated 2019].pdf -------------------------------------------------------------------------------- /Interview Guide/109 Data Science Interview Questions and Answers _ Springboard Blog.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/storieswithsiva/Data-Science-Beginners-Guide/1ed74b07394eec3affe2a532f59b50fa17c55a54/Interview Guide/109 Data Science Interview Questions and Answers _ Springboard Blog.pdf -------------------------------------------------------------------------------- /Interview Guide/15 Free eBooks to Learn Python - codeburst.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/storieswithsiva/Data-Science-Beginners-Guide/1ed74b07394eec3affe2a532f59b50fa17c55a54/Interview Guide/15 Free eBooks to Learn Python - codeburst.pdf -------------------------------------------------------------------------------- /Interview Guide/50 Most Common Interview Questions and Answers.pdf: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/storieswithsiva/Data-Science-Beginners-Guide/1ed74b07394eec3affe2a532f59b50fa17c55a54/Interview Guide/50 Most Common Interview Questions and Answers.pdf -------------------------------------------------------------------------------- /Interview Guide/50 of the Best Data Science Blogs _ Springboard Blog.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/storieswithsiva/Data-Science-Beginners-Guide/1ed74b07394eec3affe2a532f59b50fa17c55a54/Interview Guide/50 of the Best Data Science Blogs _ Springboard Blog.pdf -------------------------------------------------------------------------------- /Interview Guide/Data Science Interview Guide.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/storieswithsiva/Data-Science-Beginners-Guide/1ed74b07394eec3affe2a532f59b50fa17c55a54/Interview Guide/Data Science Interview Guide.xlsx -------------------------------------------------------------------------------- /Interview Guide/Data Science Interview Questions.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/storieswithsiva/Data-Science-Beginners-Guide/1ed74b07394eec3affe2a532f59b50fa17c55a54/Interview Guide/Data Science Interview Questions.pdf -------------------------------------------------------------------------------- /Interview Guide/Getting Your First Job in DS.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/storieswithsiva/Data-Science-Beginners-Guide/1ed74b07394eec3affe2a532f59b50fa17c55a54/Interview Guide/Getting Your First Job in DS.pdf -------------------------------------------------------------------------------- /Interview Guide/ML Questions.pdf: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/storieswithsiva/Data-Science-Beginners-Guide/1ed74b07394eec3affe2a532f59b50fa17c55a54/Interview Guide/ML Questions.pdf -------------------------------------------------------------------------------- /Interview Guide/Top 10 Machine LEarning Algorithms.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/storieswithsiva/Data-Science-Beginners-Guide/1ed74b07394eec3affe2a532f59b50fa17c55a54/Interview Guide/Top 10 Machine LEarning Algorithms.pdf -------------------------------------------------------------------------------- /Interview Guide/Top 100 Python Interview Questions & Answers For 2019 _ Edureka.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/storieswithsiva/Data-Science-Beginners-Guide/1ed74b07394eec3affe2a532f59b50fa17c55a54/Interview Guide/Top 100 Python Interview Questions & Answers For 2019 _ Edureka.pdf -------------------------------------------------------------------------------- /Interview Guide/Top 45 Data Science Interview Questions and Answers For 2019 _ Edureka.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/storieswithsiva/Data-Science-Beginners-Guide/1ed74b07394eec3affe2a532f59b50fa17c55a54/Interview Guide/Top 45 Data Science Interview Questions and Answers For 2019 _ Edureka.pdf -------------------------------------------------------------------------------- /Interview Guide/Top Interview Questions you must know.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/storieswithsiva/Data-Science-Beginners-Guide/1ed74b07394eec3affe2a532f59b50fa17c55a54/Interview Guide/Top Interview Questions you must know.pdf 
-------------------------------------------------------------------------------- /Jonathan Bower.md: -------------------------------------------------------------------------------- 1 | ####Data Science Resources 2 | 3 | ####This Repository was Inspired From (https://www.github.com/jonathan-bower/DataScienceResources) 4 | 5 | Hello and welcome to the Data Science Resources repo. I originally built this repo so that I could have a location to host resources that are helpful to me. Through building the repo I realized that other people might also be interested in this content - so I have tried to curate content on data science topics, high-quality resources to learn from, and relevant blog posts. 6 | 7 | The intended goal was to cover more than just the technical component of data science. Data Science as a discipline is still relatively fresh, and many businesses are still learning how to integrate and structure data science teams and how to understand the value proposition that data science can provide. 8 | 9 | As a result I also tried to find topics that cover building data science teams, business practices, use-cases, product metrics and data science career paths. 10 | 11 | This is a constant work in progress and I hope to refactor and update in some kind of meaningful time frame. 12 | 13 | If you find this resource helpful - please send it around to other people or you can [upvote it on datatau](http://www.datatau.com/item?id=4593), share it on LinkedIn, Twitter, Facebook, add it to Quora or just send me a note. Good luck, I hope this helps you find what you are looking for now, or in the future. 14 | 15 | Remember - If you’re not prepared to be wrong, you’ll never come up with anything original. 16 | 17 | #Table Of Contents 18 | 1. [Data Science Getting Started](#data-science-getting-started) 19 | * [Start](#start) 20 | * [Data Science Courses](#data-science-courses) 21 | 22 | 1. 
[Data Pipeline & Tools](#data-pipeline--tools) 23 | * [Python](#python) 24 | * [Data Structures & CS topics](#data-structures--cs-topics) 25 | * [Statistics](#statistics) 26 | * [Stats/Engineering Libraries](#statsengineering-libraries) 27 | * [Databases/Frameworks](#databasesframeworks) 28 | * [Data Acquisition](#data-acquisition) 29 | * [Processing & EDA](#processing--exploratory-data-analysis) 30 | * [Machine Learning](#machine-learning) 31 | * [Machine Learning Theory](#machine-learning-theory) 32 | * [Deep Learning](#deep-learning) 33 | * [Model Selection](#model-selection) 34 | * [Model Evaluation](#model-evaluation) 35 | * [Feature Engineering](#feature-engineering) 36 | * [Additional Tools or Processes](#additional-tools-or-processes) 37 | * [Data Visualization](#data-visualization) 38 | * [ipython Notebook Tutorials](#ipython-notebook-tutorials) 39 | * [Data Sources](#data-sources) 40 | * [New Data Tools](#new-data-tools) 41 | 42 | 1. [Product](#product) 43 | * [Product Metrics](#product-metrics) 44 | * [Team Communication & Business Tools](#team-communication--business-tools) 45 | * [Best Practices](#best-practices) 46 | 47 | 1. [Career Resources](#career-resources) 48 | * [Data Science Career Path](#data-science-career-path) 49 | * [Types of Data Scientists](#types-of-data-scientists) 50 | * [Data Science Applications/Use Cases](#data-science-applicationsuse-cases) 51 | * [Data Science Websites/Books](#data-science-websitesbooks) 52 | * [Data Science Meetups in the Bay Area](#data-science-meetups-in-the-bay-area) 53 | * [Data Science Blogs](#data-science-blogs) 54 | * [Data Science Conferences](#data-science-conferences) 55 | * [Data Science Presentations](#data-science-presentations) 56 | * [Relevant Business Processes](#relevent-business-processes) 57 | 58 | 1. 
[Open Source Data Science Resources](#open-source-data-science-resources) 59 | * [Additional Open Source Content](#other-open-source-data-science-content) 60 | * [Auxiliary Content & Apps](#auxiliary-content--apps) 61 | 62 | 1. [About Me](#about-me) 63 | 64 | ## Data Science Getting Started 65 | Data Science is a multidisciplinary field covering, at the very minimum, statistics, programming and machine learning (see [Drew Conway's Venn diagram](http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram) or [Cheat Sheet of a Modern Data Scientist](http://www.marketingdistillery.com/2014/08/30/data-science-skill-set-explained/)). These topics are covered throughout this repo. I personally find the best way to learn a topic is to get my hands dirty quickly - with that in mind I would get to work in python and then add different tools or theory to my toolkit as I come to understand them. If you haven't used python before I would strongly urge you to use the codecademy course to familiarize yourself with the language and with how to program. Good luck and have fun. 66 | 67 | A note about order - I arranged the contents of the Pipeline & Tools section to follow the order of the data pipeline: acquisition, exploratory data analysis, cleaning data, model selection & evaluation and then visualization. 68 | 69 | ### Start 70 | * [Data Science Pipeline](http://machinelearningmastery.com/wp-content/uploads/2014/05/Overview-of-the-Applied-Machine-Learning-Process.png) - Detailed overview of data pipeline from MachineLearningMastery.com 71 | * [Intro to ipython](https://github.com/ipython/ipython/wiki/A-gallery-of-interesting-IPython-Notebooks/_edit#entire-books-or-other-large-collections-of-notebooks-on-a-topic) - A curation of Ipython Notebooks great for an introductory-level look at python, programming, comp sci, data science and other topics. 72 | * [How do I Become a Data Scientist?](http://www.quora.com/How-do-I-become-a-data-scientist) - Some more great starting points from William Chen. 
73 | 74 | ### Data Science Courses: 75 | * [Coursera](https://www.coursera.org/specialization/jhudatascience/1) - Data Science Specialization at Coursera - many other courses available as well. 76 | * [Udacity](https://www.udacity.com/courses#!/data-science) - Online MOOCs related to Data Science. 77 | * [Data Science Bootcamps](http://yet-another-data-blog.blogspot.com/2014/04/data-science-bootcamp-landscape-full.html) - A collection of all bootcamps currently on the market as of April 5, 2014 by Ikechukwu Okonkwo. 78 | * [Coursera Machine Learning Course](https://www.coursera.org/course/ml) - Andrew Ng's pinnacle Machine Learning course. 79 | * [Edx](https://www.edx.org/course/mitx/mitx-6-00-2x-introduction-computational-2836#.VEANx9TF-tw) - EDX courses related to data science. 80 | 81 | ## Data Pipeline & Tools 82 | 83 | ###Python 84 | Python is my workhorse language because it has many data science and statistics libraries, works in production environments, and is useful for problems outside of data science. There are many other languages that could be useful but are not covered here: Julia, R, Cython, Pig, Scala, Java, etc. 85 | 86 | * [Python @ Codecademy](http://www.codecademy.com/en/tracks/python) - If you have never used Python, right this way.. 87 | * [The Python Wiki](https://wiki.python.org/moin/FrontPage) - Good resource with lots of info about Python. 88 | * [Python for Data Science Tutorial - Kaggle](https://www.kaggle.com/wiki/GettingStartedWithPythonForDataScience) - Stepping into Data Science with Kaggle and installing some libraries. 89 | * [Introduction to Data Processing with Python](http://opentechschool.github.io/python-data-intro/) - Just as the name says - some introductory level information and exercises. 90 | * [Git tutorial](https://try.github.io/levels/1/challenges/1) - Git for Version Control. Simple tutorial for Git from Github. 
91 | * [Git Tips](http://www.alexkras.com/19-git-tips-for-everyday-use/) - 19 git tips for everyday use. 92 | * [Anyone Can Code](http://dhruvbird.com/61.html) - Languages, tutorials, cheat sheets, algorithms and data structures 93 | 94 | #### Data Structures & CS Topics 95 | * [Algorithms & Data Structures](http://www.bogotobogo.com/Algorithms/algorithms.php) - Binary trees, hash tables, linked lists, big(O) notation and more. 96 | * [Algorithm & Data Structures](http://interactivepython.org/courselib/static/pythonds/index.html) - Well organized detailed and digestible site full of content covering data structures, algorithms, recursion and assignments! 97 | * [Big O Notation](http://interactivepython.org/courselib/static/pythonds/AlgorithmAnalysis/BigONotation.html) - Great details and visual of big-O notation. 98 | * [Visualizations of Data Structures](http://www.cs.usfca.edu/~galles/visualization/Algorithms.html) - Collection of different algorithms (graph problems) and data structures (queues, heaps, hashes) that walks through the visualization to get a better intuitive understanding. 99 | * [Data Structures CheatSheet & Big Oh Notation](http://bigocheatsheet.com/) 100 | * [Data Structures CheatSheet -smaller more readable](https://www.clear.rice.edu/comp160/data_cheat.html) 101 | * [Coursera: Stanford Algorithms Design & Analysis ](https://class.coursera.org/algo-006) - Course on algorithm design & analysis 102 | 103 | ####Statistics 104 | Some primers on understanding statistics and other resources to get a deeper understanding. 105 | * [Statistics Without the Agonizing Pain](https://www.youtube.com/watch?v=5Dnw46eC-0o) - John Rauser's really great video on statistics - funny and engaging with a good message. 
106 | * [Probability Programming and Bayesian Methods for Hackers](http://nbviewer.ipython.org/github/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/blob/master/Prologue/Prologue.ipynb) - full book all online through ipython notebooks. 107 | * [Probabilistic Programming and Bayesian Methods for Hackers](https://github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers) - Github Repo for the book above. 108 | * [Statistics Cheat Sheet in Ipython Notebook](http://nbviewer.ipython.org/url/trust.sce.ntu.edu.sg/~gguo1/blogs/Statistics/Statistics.ipynb) 109 | * [The only probability Cheatsheet you'll ever need](https://bayesrule.files.wordpress.com/2014/07/probability_cheatsheet_140718.pdf) - Self explanatory - (thanks William Chen @ http://datastories.quora.com/) for pointing me this great cheat sheet out - wish I had that back at college. 110 | * [Khan Academy: Statistics](https://www.khanacademy.org/#statistics) - Tons of videos to help learn statistics concepts. 111 | * [Statistical Distributions in iPython Notebook](http://nbviewer.ipython.org/urls/gist.github.com/mattions/6113437/raw/c5468ea930d6960225d83e112d7f3d00d9c13398/Exploring+different+distribution.ipynb) - Discrete, Bernoulli, Poisson, Binomial, Alpha, Beta etc. The descriptions are mathematical - will find another resource to explain. 112 | 113 | ####Stats/Engineering Libraries 114 | A collection of workhorse libraries that are elemental for any python data scientist. 115 | * [Pandas](http://pandas.pydata.org/) Wes McKinney's pandas library for EDA on small to medium sized data sets when you don't want to put the infrastructure for SQL or when it isn't necessary. It has many other great applications other than just better than SQL on small to medium data sets. 
116 | * [Numpy/Pandas/Scipy Cheatsheet](https://s3.amazonaws.com/quandl-static-content/Documents/Quandl+-+Pandas,+SciPy,+NumPy+Cheat+Sheet.pdf) - self explanatory 117 | * [SciPy](http://www.scipy.org/) - Open-source software for mathematics, science and engineering. 118 | * [NumPy](http://www.numpy.org/) - Fundamental package for scientific computing with Python. 119 | * [StatsModels](http://statsmodels.sourceforge.net/) - Module that allows users to explore data, estimate statistical models and perform statistical tests. 120 | * [PyMC](https://pypi.python.org/pypi/pymc) - Bayesian estimation useful for Markov chain Monte Carlo analysis (among other things). 121 | 122 | 123 | ####Data Acquisition 124 | Libraries that are very helpful for abstracting away some of the complications of scraping or working with HTTP. 125 | * [BeautifulSoup](http://www.crummy.com/software/BeautifulSoup/) - A python library to make web-scraping HTML easier. 126 | * [Beautiful Soup Cheat Sheet](http://youkilljohnny.blogspot.com/2014/03/beautifulsoup-cheat-sheet-parse-html-by.html) 127 | * [Requests](http://docs.python-requests.org/en/latest/) - HTTP for Humans - python library that makes working with http and api's more effortless 128 | 129 | ####Processing & Exploratory Data Analysis 130 | A collection of documents explaining some of the ways to do processing & EDA. 131 | * [Unix for Processing](http://www.theunixschool.com/p/awk-sed.html) - sed & awk for data processing. 132 | * [Pandas](http://pandas.pydata.org/) - Already mentioned is great for data processing - cleaning, filtering and getting rid of nan's, normalizing, scaling, replacing values, etc. 133 | * [SciKit Learn for Preprocessing](http://scikit-learn.org/stable/modules/preprocessing.html#preprocessing) - Doc on sklearn's preprocessing methods. 134 | * [Regular Expressions](http://www.zytrax.com/tech/web/regex.htm) - Regex explained. 
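As a small illustration of the regular-expression item above, here is a minimal sketch that pulls structured fields out of raw text lines using only Python's standard `re` module. The log format, field names and the `parse_lines` helper are assumptions made up for this example; they do not come from any of the linked resources.

```python
import re

# Hypothetical raw lines; this format is an assumption for illustration only.
raw_lines = [
    "2014-04-05 user=alice amount=12.50",
    "2014-04-06 user=bob amount=7",
    "malformed line",
]

# Named groups capture the date, the user name and a numeric amount.
pattern = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2}) user=(?P<user>\w+) amount=(?P<amount>\d+(?:\.\d+)?)"
)


def parse_lines(lines):
    """Return one dict per line that matches the pattern; skip the rest."""
    records = []
    for line in lines:
        match = pattern.search(line)
        if match is None:
            continue  # ignore lines that do not fit the expected shape
        record = match.groupdict()
        record["amount"] = float(record["amount"])  # convert text to a number
        records.append(record)
    return records


records = parse_lines(raw_lines)
```

Once the lines are parsed into dictionaries like this, they can be handed to a library such as pandas for the cleaning and filtering steps described in the list.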
135 | 136 | ###Databases/Frameworks 137 | A collection of databases & frameworks that are helpful for data management and are the industry standard. 138 | * [SQL](http://www.postgresql.org/) - SQL Database - I linked to Postgres since that is the version I use. 139 | * [Psycopg](http://initd.org/psycopg/) - Python <> Postgres. A PostgreSQL adapter for the python environment. 140 | * [SQL Cheat Sheet](http://www.sql-tutorial.net/sql-cheat-sheet.pdf) 141 | * [SQLZoo](http://sqlzoo.net/wiki/Main_Page) - Develop your skills 142 | * [SQLSchool](http://sqlschool.modeanalytics.com/) - Develop your skills 143 | * [MongoDB](http://www.mongodb.org/) - NoSQL database 144 | * [PyMongo](http://api.mongodb.org/python/current/tutorial.html) - Python Mongo Driver. 145 | * [MongoDB - cheatsheet](https://blog.codecentric.de/files/2012/12/MongoDB-CheatSheet-v1_0.pdf) - Cheat sheet for MongoDB 146 | * [Apache Hive](https://hive.apache.org/) - Uses Hive Query Language (HQL) - similar to SQL for data at scale. 147 | * [Hive Cheatsheet](http://hortonworks.com/wp-content/uploads/downloads/2013/08/Hortonworks.CheatSheet.SQLtoHive.pdf) - Self Explanatory. 148 | * [ElasticSearch](http://www.elasticsearch.org/) - For scalable, fast text search/analysis. 149 | * [Neo4j](http://www.neo4j.org/) - Leading graph database. 150 | * [Redis](http://redis.io/) - Key-value open source data structure server. 151 | * [Redshift](http://aws.amazon.com/redshift/) - AWS petabyte-scale data warehouse solution. 152 | * [Hadoop - the definitive guide](http://ce.sysu.edu.cn/hope/UploadFiles/Education/2011/10/201110221516245419.pdf) - Hadoop ecosystem. 153 | * [Spark](https://spark.apache.org/) - Lightning-fast cluster computing. 154 | * [MRjob](https://github.com/Yelp/mrjob) - Run MapReduce jobs on Hadoop or AWS. 155 | 156 | ###Machine Learning 157 | There is a lot of information available online about the theory, mathematical intuition and tuning for this discipline. 
Here are some tools that are currently available. 158 | * [A visual introduction to Machine Learning](http://www.r2d3.us/visual-intro-to-machine-learning-part-1/) - Awesome d3 visualization to help understand machine learning. 159 | * [SciKit-Learn](http://scikit-learn.org/stable/) - Simple and efficient machine learning tools for data mining and data analysis 160 | * [NLTK](http://www.nltk.org/) - Natural Language Toolkit to work with human languages data. 161 | * [Tour of Machine Learning Algorithms](http://machinelearningmastery.com/a-tour-of-machine-learning-algorithms/) - Blog post about some of the high level ML methods 162 | * [VIDEO - How to get started w/mL](https://www.youtube.com/watch?v=uBorfxosVYs) - Melanie Warrick @ PyCon 2014. 163 | * [Some ML methods classified](http://nyghtowlblog.files.wordpress.com/2014/04/ml_algorithms.png?w=535&h=311) - Classification for some sample ML algorithms by Melanie Warrick. 164 | * [SciKit-image](http://scikit-image.org/) - Algorithms for image processing. 165 | * [Machine Learning CheatSheet](https://github.com/soulmachine/machine-learning-cheat-sheet) - I would actually say this is more than just a cheat sheet given that there are > 100 pages of notes. 166 | * [Awesome Machine Learning](https://github.com/josephmisiti/awesome-machine-learning) - List of machine learning libraries in all languages and also Kaggle competition source code by Joseph Misiti. 167 | 168 | ###Machine Learning Theory 169 | * [MathematicalMonk ML videos](https://www.youtube.com/playlist?list=PLD0F06AA0D2E8FFBA) - Amazingly concise and digestible videos detailing how different machine learning algorithms function (e.g. logistical, sums, knn, Bayes, etc.) 170 | * [Logistic Regression Explained](http://www.appstate.edu/~whiteheadjc/service/logit/intro.htm#hypothesis) - Detailed explanation of how logistic regression works. 
171 | * [Video explaining how Random Forests Algorithm works](https://www.youtube.com/watch?v=o7iDkcpOr_g) - Random Forests Algorithm explained. 172 | * [Random Forest Explained](http://citizennet.com/blog/2012/11/10/random-forests-ensembles-and-performance-metrics/) - Write up about Random Forest in layman's terms. 173 | * [Machine Learning 101](http://www.erogol.com/large-set-machine-learning-resources-beginners-mavens/) - Large set of ML resources for beginners. 174 | 175 | 176 | ###Deep Learning 177 | Getting a lot of media traction is deep learning - get your feet wet with some of these resources: 178 | * [HackerNews for Deep Learning](http://news.startup.ml/) - As the name says - a hacker news for Deep Learning 179 | * [Deeplearning4j](http://deeplearning4j.org/) - Deep Learning in Java. 180 | * [Neural Networks Explained - Video](https://www.youtube.com/watch?v=bxe2T-V8XRs) - High level and intuitive explanation how Neural Networks (deep learning) works. 181 | * [Deep Learning Tutorial](http://deeplearning.net/tutorial/deeplearning.pdf) 182 | * [What is Deep Learning](http://blog.shakirm.com/2015/06/a-statistical-view-of-deep-learning-vi-what-is-deep/?utm_content=bufferae750&utm_medium=social&utm_source=linkedin.com&utm_campaign=buffer) 183 | * [Free Online Deep Learning Book](http://neuralnetworksanddeeplearning.com/) - in-depth book about NN & deep learning 184 | * [The Brain vs Deep Learning - Blog Post](https://timdettmers.wordpress.com/2015/07/27/brain-vs-deep-learning-singularity/) 185 | 186 | ######Time-Series 187 | * [ANN & Computational Intelligence Forecasting Competition](http://www.neural-forecasting-competition.com/index.htm) 188 | * [Neural Networks for Time Series Slidedeck](http://www.cs.cmu.edu/afs/cs/academic/class/15782-f06/slides/timeseries.pdf) 189 | 190 | ####Model Selection 191 | Resources about how to decide on your model. 
192 | * [SciKit Learn Flow Chart for Model Selection](http://scikit-learn.org/stable/tutorial/machine_learning_map/index.html) - A helpful starting point for selecting sklearn algorithms. 193 | 194 | ####Model Evaluation 195 | Resources to help with understanding model evaluation. 196 | * [Evaluating ML Algorithms](http://machinelearningmastery.com/how-to-evaluate-machine-learning-algorithms/) - Blog Post from MachineLearningMastery about how to evaluate your performance. 197 | * [Cross-Validation](http://robjhyndman.com/hyndsight/crossvalidation/) - Critical concept to evaluate the performance of your models. 198 | * [K-fold & Grid Search in Scikitlearn](http://randomforests.wordpress.com/2014/02/02/basics-of-k-fold-cross-validation-and-gridsearchcv-in-scikit-learn/) - Demo on how to implement k-fold cross validation and grid-search using scikit-learn. 199 | * [Scikit-learn Cross Validation doc](http://scikit-learn.org/stable/modules/cross_validation.html) - Self explanatory title. 200 | * [Cross Validation - how to select your final Kaggle Model](http://www.chioka.in/how-to-select-your-final-models-in-a-kaggle-competitio/) - Importance of cross-validation described specifically in how it affects Kaggle competition scores. 201 | 202 | ####Feature Engineering 203 | A critical element of Data Science to improve your performance but minimally talked about. 204 | * [Ipython Notebook for Feature engineering](http://nbviewer.ipython.org/urls/raw2.github.com/yhat/DataGotham2013/master/notebooks/7%20-%20Feature%20Engineering.ipynb?create=1) - Some discussion about Feature Engineering. 205 | * [CS Princeton Course](http://www.cs.princeton.edu/courses/archive/spring10/cos424/slides/18-feat.pdf) - Course content on Feature Engineering. 206 | * [Blog Post about Feature Engineering / Data Exploration](http://deblivingdata.net/machine-learning-tricks/) - Blog post about topic. 
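The k-fold cross-validation idea listed under Model Evaluation above can be sketched in a few lines of plain Python. This is only an illustration of how the data is split into folds; the `k_fold_indices` function is a hypothetical helper written for this example, it does not shuffle the data, and in practice scikit-learn's `KFold` class covers the same ground.

```python
def k_fold_indices(n_samples, n_folds):
    """Yield (train_indices, test_indices) for each of n_folds contiguous folds.

    Hypothetical helper for illustration: no shuffling, indices only.
    """
    if not 2 <= n_folds <= n_samples:
        raise ValueError("n_folds must be between 2 and n_samples")

    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // n_folds] * n_folds
    for i in range(n_samples % n_folds):
        fold_sizes[i] += 1

    start = 0
    for size in fold_sizes:
        stop = start + size
        test = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, test
        start = stop


# Each sample lands in the test set exactly once across the folds.
folds = list(k_fold_indices(10, 3))
```

A model would be fit on each `train` split and scored on the matching `test` split, with the scores averaged to estimate performance on unseen data.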
207 | 208 | ### Additional Tools or Processes 209 | Resources on other topics that are very helpful for data scientists and product. 210 | * [A/B Testing](http://conversionxl.com/how-to-build-a-strong-ab-testing-plan-that-gets-results/#.) - Blog about A/B testing. 211 | * [A/B Testing](http://unbounce.com/a-b-testing/5-ways-youre-screwing-up-your-a-b-testing/) - And how you are screwing it up. 212 | * [Bloom Filters](http://nbviewer.ipython.org/github/ctb/2013-pycon-awesome-big-data-algorithms/blob/master/04-bloom-filters.ipynb) - Python notebook about bloom filters. 213 | * [Bloom filters](http://billmill.org/bloomfilter-tutorial/) - Bloom Filters. 214 | * [Reservoir Sampling](http://blog.cloudera.com/blog/2013/04/hadoop-stratified-randosampling-algorithm/) - A primer on Reservoir Sampling. 215 | * [Reservoir Sampling Again](http://www.geeksforgeeks.org/reservoir-sampling/) 216 | * [Monte Carlo for the Monty Hall Problem](http://slantedwindows.com/monty-hall-meet-mr-monte-carlo/) - Hyon Chu gives a good explanation of MC for the Monty Hall Problem. 217 | * [Markov Chain Monte Carlo](http://nbviewer.ipython.org/github/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/blob/master/Chapter3_MCMC/IntroMCMC.ipynb) - Opening the black box of MCMC. 218 | * [Multithreading and Queues](http://pymotw.com/2/Queue/) - How to build multithreading and queues. 219 | * [Basics of Multithreading and queues](http://www.troyfawkes.com/learn-python-multithreading-queues-basics/) - More about multithreading. 220 | * [Multithreading & Queuing](http://www.shanelynn.ie/using-python-threading-for-multiple-results-queue/) - Another great resource for multithreading & queuing. 221 | * [Building a Recommender System](http://www.quora.com/How-can-I-start-building-a-recommendation-engine) - Quora answer to this question. Helpful starting point. 
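Reservoir sampling, linked in the list above, picks a fixed-size uniform random sample from a stream whose length is not known in advance. The sketch below follows the classic single-pass approach (often called Algorithm R). The `reservoir_sample` function name and its `rng` parameter are assumptions for this example and are not taken from the linked posts.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Return up to k items chosen uniformly at random from an iterable.

    Hypothetical helper for illustration; makes a single pass over `stream`.
    """
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep the new item with probability k / (i + 1) by picking a
            # slot in [0, i] and replacing only if it falls in the reservoir.
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


# Seeded generator so the run is repeatable.
sample = reservoir_sample(range(1000), 10, random.Random(0))
```

Because only `k` items are held in memory at any time, the same function works on a generator that reads a file or a network stream one record at a time.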
222 | 223 | ### Data Visualization 224 | Collection of the best libraries that I know for easy and powerful data visualizations. 225 | * [ggplot](http://ggplot.yhathq.com/) - ggplot for python ported by the team at yhat. 226 | * [matplotlib](http://matplotlib.org/) - Awesome plotting library for python. 227 | * [d3](http://d3js.org/) - Mike Bostock's viz library - the de facto gold standard for polished visualization - in js, steep learning curve but beautiful outcomes. 228 | * [bokeh](http://bokeh.pydata.org/) - Interactive visualization library. 229 | * [d3py](https://github.com/mikedewar/d3py) - Another library for data viz. 230 | * [vincent](http://vincent.readthedocs.org/en/latest/) - Help with python for d3. 231 | * [seaborn](http://web.stanford.edu/~mwaskom/software/seaborn/) - Clean statistical data visualization library. 232 | 233 | Other available Visualization Resources. 234 | * [Scott Murray's D3 Tutorials](alignedleft.com/tutorials/d3/) Tutorials from *Interactive Data Visualization for the Web* 235 | * [tributary.io](http://tributary.io) - live code visualization platform designed specifically for D3.js 236 | * [plot.ly](http://plot.ly) - A web visualization and data processing platform 237 | * [blockspring](http://blockspring.com) - Share code and visualizations through a single platform 238 | * [dot.append](http://enjalot.github.io/dot-append/) - Ian Johnson (enjalot) goes through several live-coding examples using D3 239 | * [Text Visualization Plots](#http://textvis.lnu.se/) - Interactive site with different types of text visualization for different problems. 240 | 241 | ### Design Theory 242 | The importance of design theory in data visualization, storytelling and presentations cannot be overstated. Poor design can take great content and make it confusing or virtually unusable, while good design can make content sing and connect with the audience. 
Through a better understanding of design theory and UI principles, a data scientist (or anyone) can convey information more clearly to the intended audience and give their content a strong story. 243 | * [Slidedeck on Data Storytelling & Visualization](http://higherlogicdownload.s3.amazonaws.com/AMSTAT/62ac7e8c-c7ec-4e98-b58b-dd540bf9e0d9/UploadedImages/dvc2014/Veena%20MendirattaASA%20Storytelling%20with%20Data%20Visualization.pdf) - Overview of different story structures and how to tell a story with data. 244 | * [Accelerating Understanding Through Data Visualization](http://www.accenture.com/SiteCollectionDocuments/PDF/Accenture-Accellerating-Understanding-Through-Data-Visualization.pdf) - Accenture white paper on data visualization. 245 | 246 | ### Ipython Notebook Tutorials 247 | Collection of ipython notebooks that are helpful as examples, either for using tools or for explaining certain topics. 248 | * [Pandas Tutorial](http://nbviewer.ipython.org/github/twiecki/financial-analysis-python-tutorial/blob/master/1.%20Pandas%20Basics.ipynb) - Basic intro to Pandas in notebook form. 249 | * [Pandas / Stats Tutorial](https://github.com/fonnesbeck/pytenn2014_tutorial) - Intermediate tutorial by Christopher Fonnesbeck Feb 2014. 250 | * [Scipy Tutorial](http://nbviewer.ipython.org/github/jrjohansson/scientific-python-lectures/blob/master/Lecture-3-Scipy.ipynb) - Basic Scipy Tutorial. 251 | * [Numpy Tutorial](http://nbviewer.ipython.org/github/jrjohansson/scientific-python-lectures/blob/master/Lecture-2-Numpy.ipynb) - Basic Numpy Tutorial. 252 | * [Multiple Regressions using Statsmodels](http://nbviewer.ipython.org/urls/s3.amazonaws.com/datarobotblog/notebooks/multiple_regression_in_python.ipynb) - Using statsmodels for regression. 253 | * [Intro to PyMC](http://nbviewer.ipython.org/github/fonnesbeck/Bios366/blob/master/notebooks/Section4_3-Introduction-to-PyMC.ipynb) - Intro to PyMC.
254 | * [More on PyMC](http://nbviewer.ipython.org/github/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/blob/master/Chapter2_MorePyMC/MorePyMC.ipynb) - More PyMC. 255 | * [Kaggle Titanic Comp Tutorial](http://nbviewer.ipython.org/github/agconti/kaggle-titanic/blob/master/Titanic.ipynb) - Kaggle Titanic Tutorial using RandomForests. 256 | * [Psycopg2 tutorial in Python](https://wiki.postgresql.org/wiki/Psycopg2_Tutorial) - How to use Psycopg2. 257 | * [SQL in iPython](http://nbviewer.ipython.org/gist/catherinedevlin/6588378) - SQL in Python. 258 | * [Mongo in Python](http://api.mongodb.org/python/current/tutorial.html) - Mongo in Python. 259 | * [Beautiful Soup Tutorial](http://nbviewer.ipython.org/github/kcranston/2013-08-ku/blob/master/beautifulsoup/notebooks/00-BeautifulSoup.ipynb) - Beautiful Soup! 260 | * [Sci-Kit Learn Basics](http://nbviewer.ipython.org/urls/raw2.github.com/yhat/DataGotham2013/master/notebooks/4%20-%20scikit-learn%20basics.ipynb?create=1) - Machine Learning Basics with scikit-learn. 261 | * [MatPlotLib](http://nbviewer.ipython.org/github/jrjohansson/scientific-python-lectures/blob/master/Lecture-4-Matplotlib.ipynb) - Some of the possibilities of data-viz with MatPlotLib. 262 | * [Choosing the right priors - Bayesian](http://nbviewer.ipython.org/github/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/blob/master/Chapter6_Priorities/Priors.ipynb) - Bayesian statistics and prior selection. 263 | * [Some Basic Data Analysis in Python](http://nbviewer.ipython.org/github/jvns/talks/blob/master/pyconca2013/pistes-cyclables.ipynb) - Basic data analysis with python. 264 | * [Crash Course in Python for Scientists](http://nbviewer.ipython.org/gist/rpmuller/5920182) - Ipython Notebook for Scientists! 265 | * [Regular Expressions](http://nbviewer.ipython.org/gist/rjweiss/7577022) - Regex to match patterns in strings - very powerful. 
266 | * [MapReduce](http://nbviewer.ipython.org/github/cs109/content/blob/master/labs/lab8/lab8_mapreduce.ipynb) - Classes, inheritance and map-reduce exercises. 267 | * [Recursion](http://nbviewer.ipython.org/github/gumption/Motivating_and_Visualizing_Recursion_in_Python/blob/master/Motivating_and_Visualizing_Recursion_in_Python.ipynb) Notebook visualization recursion "The single most powerful idea in algorithms". 268 | * [Recursion](http://anandology.com/python-practice-book/functional-programming.html) More about Recursion and Functional Programming 269 | * [Hash Table, Bloom Filter, HyperLogLog](http://nbviewer.ipython.org/github/mlaprise/pydata2013-pds-talk/blob/master/pydata2013.ipynb) - Explaining and demoing some of these concepts. 270 | * [Hash tables, Binary Trees](http://nbviewer.ipython.org/github/iit-cs429/main/blob/master/lectures/lec04/Dictionaries.ipy) 271 | * [Time Series- Arima & Arma](http://nbviewer.ipython.org/github/qwu-hab/geogg121/blob/master/tsa_temp.ipynb) 272 | 273 | ### Data Sources 274 | Collection of sites to access data if you want to build out a project or just use some of the tools for EDA. 275 | * [Data.Gov](https://www.data.gov/) - The US government portal to open data. 276 | * [California Water Resources](http://www.water.ca.gov/data_home.cfm) - California's water resource data. 277 | * [Data for Cool DS projects](http://101.datascience.community/2014/10/17/data-sources-for-cool-data-science-projects-part-1-guest-post/) 278 | * [Academic Torrents](http://academictorrents.com/) - Sharing Data is hard, torrents make it easier for academics. 279 | * [Data Basin](http://databasin.org/) - Science based mapping and analytics platform. 280 | * [Open Energy Data Initiative](http://en.openei.org/wiki/Main_Page) - Over 800 data sets covering energy issues. 281 | * [UCI Machine Learning Datasets](https://archive.ics.uci.edu/ml/datasets.html) - Data for machine learning - lots of labeled data and description of the problem types. 
282 | * [London Data Store](http://data.london.gov.uk/) - Lots of datasets on London, UK 283 | 284 | ### New Data Tools 285 | Aim to keep track of developing trends and new tech that is helpful for the practicing Data Scientist. New might be a misnomer. 286 | * [BigML](https://bigml.com/) - machine learning for the everyday user, also useful for EDA. 287 | * [GraphLab](http://graphlab.com/) - graph-based, high performance, distributed computation framework. They just implemented deep learning onto their platform. 288 | * [ModeAnalytics](https://modeanalytics.com/) - platform to share analysis/data science. 289 | * [Apache Mahout](https://mahout.apache.org/) - Scalable machine learning library. Not in python. 290 | * [Apache Hadoop](http://hadoop.apache.org/) - Open-source software for reliable, scalable, distributed computing. Not really new (10 years old at this point) 291 | 292 | ### Other Useful Scripts 293 | * [Spinning up EC2 instances](https://github.com/drewconway/data_science_box) - Drew Conway's scripts to easily spin up AWS EC2 instances. 294 | 295 | ## Product 296 | 297 | ### Product Metrics 298 | Understanding product, user behavior, and product metrics is helpful for data scientists in industry. Being able to help your product manager and team execute on strategies by understanding the problem, metrics and what they understand facilitates a more fruitful relationship. 299 | * [Actionable Metrics](http://practicetrumpstheory.com/3-rules-to-actionable-metrics/) - Funnel reports, cohort analysis, actionable metrics. 300 | * [Analytics for Product Managers](http://www.mindtheproduct.com/2013/02/everything-a-product-manager-needs-to-know-about-analytics/) - Everything a PM needs to know about analytics - or the minimum amount your PM should know about analytics as a Data Scientist.
301 | * [Startups, you are doing data science wrong!](https://gigaom.com/2013/09/28/notice-to-startups-you-are-doing-data-science-wrong/) - High level explanation about how to use data science in a start-up company. 302 | * [Product Psychology](http://www.productpsychology.com/category/user-behavior/) - Understanding user behavior. 303 | * [Understanding Cohort Analysis](http://amarnath14.tumblr.com/post/69790103060/understanding-cohort-analysis) - Blog about cohort analysis, conversions, customer lifetime value, etc. Great starting point for understanding product metrics. 304 | * [Tech Product Management](http://techproductmanagement.com/) - More product focused than Data Science but can provide a good sense of how to view product management. 305 | * [Mind The Product](http://www.mindtheproduct.com/) - Another solid PM blog. 306 | 307 | ### Team Communication & Business Tools 308 | There are some very innovative new companies that are producing very effective tools to minimize and abstract away inefficient processes at companies. While they aren't strictly data science related, these products could be very helpful to integrate with your teams to improve overall productivity. 309 | * [Aha!](http://www.aha.io/) - Clean product roadmapping software for PMs. 310 | * [Slack](https://slack.com) - Amazing team communication tool - abstracting away unnecessary e-mails. 311 | * [Harvest](https://www.getharvest.com) - Effortless time tracking for business. 312 | * [Trello](https://trello.com) - Helping organize everything - great for project management. 313 | * [Zapier](https://zapier.com/zapbook/harvest/slack/) - Bringing together Harvest + Slack + Trello and a lot more... 314 | * [Thoughtbot Playbook](http://playbook.thoughtbot.com/) - A detailed account of how thoughtbot runs its software consulting company, covering guiding principles, design sprints, code reviews, sales and operations. A content-packed post.
315 | * [IFTTT](http://www.ifttt.com) - 'Putting the internet to work for you'. Great for small companies to automate social media, marketing or to have your own personal recipes set up. 316 | * [Github](https://github.com) - Clearly a great product - 'Build software better, together'. 317 | * Web Analytics & Reporting Software: 318 | * [Google Analytics](http://www.google.com/analytics/) - In depth real-time analytics. 319 | * [Mixpanel](https://mixpanel.com/) - provides real-time analytics and solid cohort analysis. 320 | * [Clicky](http://clicky.com/) - Pride themselves on ease of use. 321 | * [Evernote](https://evernote.com/) - Great for keeping notes 322 | 323 | ### Best Practices 324 | Source control and accurate documentation, so that you and your colleagues can follow and reproduce your work, are very important. I will add some best coding practices & data science practices. 325 | * [Python Code Style](http://docs.python-guide.org/en/latest/writing/style/) - Allows for better understanding for everyone involved on the project. 326 | * [Slide Deck for BMPs](https://python.g-node.org/python-summerschool-2011/_media/materials/best_practices/haenel-best-practices-2011-09-standrews.pdf) - Slide deck about best practices for coding, or see the [repo](https://github.com/esc/best-practices-talk). 327 | * [Engineering Practices in Data Science](http://blog.kaggle.com/2012/10/04/engineering-practices-in-data-science/) - A blog post about the lack of source control in Data Science. It's a challenging topic - I believe mode analytics is trying to solve it. 328 | 329 | ## Career Resources 330 | 331 | ### Data Science Career Path 332 | * [Data Science @ Google](http://www.quora.com/What-is-the-Quant-Data-Science-Career-ladder-at-Google) - Quora answer about Data Science career trajectory @ google.
333 | 334 | ### Types of Data Scientists 335 | Not all Data Scientists are the same and it's critical for organizations to understand what it is they need, and how best to fill those roles and/or complement the skills of their team. Finding the organizational structure that enables the data scientists/data engineers within the organization and generates better results is also crucial. It should be given thorough consideration. 336 | * [Kinds of Data Scientist](http://radar.oreilly.com/2013/06/theres-more-than-one-kind-of-data-scientist.html) - O'Reilly's classification of 4 different kinds of data scientists. 337 | * [Data Science For Startups](http://tomtunguz.com/data-science-types/) - Which of the Five Types of DS does your startup need? Different classification from O'Reilly. 338 | * [Building Data Science Teams](http://radar.oreilly.com/2011/09/building-data-science-teams.html) - Post from 2011 about how to build data science teams. 339 | * [Data Science Team Building - The Power of Collaborative Analytics](http://www.experfy.com/blog/data-science-team-building-power-collaborative-analytics/) - Post about different team org structures, difference between DS & BI. 340 | 341 | ### Data Science Applications/Use Cases 342 | Data Science has so many different applications and use cases within industry - many are continuously discovered. These resources provide some potential ideas. 343 | * [Kaggle Data Science Use Cases](https://www.kaggle.com/wiki/DataScienceUseCases) - Helpful to generate ideas for new uses in different industries 344 | * [Data Science for each Industry](http://www.mastersindatascience.org/industry/) - Description of uses for different industries. 345 | * [Big Data Analytics News - use Cases](http://bigdataanalyticsnews.com/big-data-use-cases/) - For Big Data but that's almost synonymous with Data Science. 346 | 347 | ### Data Science Websites/Books 348 | More resources for community based information or hard copy books.
349 | * [Data Science Handbook](https://medium.com/@pericarus/introducing-the-data-science-handbook-b2bfa216bf4b) - Not yet released but should be interesting providing stories from academia and industry about data science - go read the post for a better description! 350 | * [CrossValidated](http://stats.stackexchange.com/) - A question and answer site for people interested in statistics, machine learning, data analysis, data mining, and data visualization. 351 | * [StackOverflow](http://stackoverflow.com) - Language-independent collaboratively edited question and answer site for programmers. 352 | * [Kaggle](http://www.kaggle.com) - Model building competition and great resources for training and data. 353 | * [O'Reilly Media](http://shop.oreilly.com/category/get/data-science-kit.do) - A lot of content rich books available and tutorials on using the tools. 354 | * [Quora](http://www.quora.com/) - Question and answer site - lots of data science content and career content. 355 | * [Data Science @ StackExchange](http://datascience.stackexchange.com/) - Still in beta. 356 | 357 | ### Data Science Meetups in the Bay Area 358 | A great way to meet other Data Scientists and keep up to date with best practices.
359 | * [SF Data Science](http://www.meetup.com/SF-Data-Science/) 360 | * [Data Science for Sustainability](http://www.meetup.com/Data-Science-for-Sustainability/) 361 | * [Python Meetup Group](http://www.meetup.com/sfpython/) 362 | * [USF Seminar Series in Analytics](http://www.meetup.com/USF-Seminar-Series-in-Analytics/) 363 | * [DataKindSF](http://www.meetup.com/DataKind-SF-Bay-Area/) 364 | * [SF Bayarea Machine Learning](http://www.meetup.com/SF-Bayarea-Machine-Learning/) 365 | * [AirBnB Tech Talks](http://nerds.airbnb.com/tech-talks/) 366 | 367 | ### Data Science Blogs 368 | * [Data Stories @ Quora](http://datastories.quora.com/) - William Chen's (DS@Quora) blog about data science. 369 | * [FastML](http://fastml.com/) 370 | * [FiveThirtyEight Blog](http://fivethirtyeight.com/) - Nate Silver's blog. 371 | * [Data Science Handbook](http://www.datasciencehandbook.me/) - Data Science Handbook Project (not quite a blog but it fits here). 372 | * [Simply Statistics Blog](http://simplystatistics.org/) 373 | * [All The Things Tech](http://nyghtowl.io/) 374 | * [Musings in Data Science](http://deblivingdata.net/) 375 | * [Zipfian Data Science Blog](http://www.zipfianacademy.com/blog/) - Zipfian Academy DS Blog. 376 | * [Machine Learning Mastery](http://machinelearningmastery.com/) 377 | * [DataTau](http://datatau.com) - Hackernews for Data Science. 378 | * [HackerNews](https://news.ycombinator.com/) 379 | * [Quora](http://quora.com) - Q&A site with lots of information about Data Science. 380 | * [ThreeStoryBlog](http://blog.threestory.com/) - Design blog 381 | 382 | ### Data Science Conferences 383 | * [Strata](http://strataconf.com/) - Conference and a lot of videos from previous conferences - great resource. 384 | * [GraphLab](http://graphlab.com/events/conference14.html) - Another great conference.
385 | * [PyData](http://pydata.org) 386 | 387 | ### Data Science Presentations 388 | * [Strata Collection of Presentations](http://strataconf.com/strata2014/public/schedule/proceedings) - Most of their conference presentations available online. 389 | * [KDD Keynotes](http://videolectures.net/kdd2014_newyork/) - collection of keynote presentations from the NYC conference 390 | * [All of PyData Conference Talks](https://github.com/DataTau/datascience-anthology-pydata) 391 | 392 | ### Relevant Business Processes 393 | * [Lean Startup](http://theleanstartup.com/principles) - A method to develop product and businesses. 394 | * [Agile Development](http://en.wikipedia.org/wiki/Agile_software_development) - group of software development methods to optimize for self-organizational and cross-functional teams. 395 | * [Scrum](http://en.wikipedia.org/wiki/Scrum_(software_development)) - an iterative and incremental agile software development framework for managing product development. 396 | 397 | ### Start-Up Resources 398 | * [How to Start a Start-up](http://startupclass.samaltman.com/) - Series of lectures from successful entrepreneurs (e.g. Y comb, SV angels, etc.) on how to start a start up. 399 | 400 | ## Open Source Data Science Resources 401 | While the name might sound redundant, this section represents other sites or repos that have aggregated information covering similar topics. Tons of great content on these sites - definitely go check them out. 402 | 403 | ### Other Open Source Data Science Content 404 | There are some really great resources linked within this section covering all of Data Science, the entire data pipeline, machine-learning, statistics, python, etc. Go check them out. 405 | * [Open Data Science Masters](http://datasciencemasters.org/) - Clare Corthell's Open Source online blog/github with lots of resources available for data science.
406 | * [A Practical Intro to Data Science](http://www.zipfianacademy.com/blog/post/46864003608/a-practical-intro-to-data-science) - Zipfian Academy's collection of excellent resources available. 407 | * [LearnDataScience](https://github.com/nborwankar/LearnDataScience) - Nitin Borwankar's collection of IpythonNotebooks for Linear Regression, Logistic Regression, Random Forests, K-Means Clustering 408 | * [FreeDataScienceBooks](https://github.com/chaconnewu/free-data-science-books/blob/master/free-data-science-books.md) - Yu Wu's free open sourced online data science books. 409 | * [Gallery of Ipython Notebooks](https://github.com/ipython/ipython/wiki/A-gallery-of-interesting-IPython-Notebooks) - iPython's introduction to Python, Data Science, Economics, Comp Sci, Linguistics, and much more. 410 | * [Data Science 45 Min Intros](https://github.com/DrSkippy/Data-Science-45min-Intros) - The team @ Gnip have a collection of repos to introduce data science topics in roughly 45 minutes per topic. 411 | * [Awesome Data Science](https://github.com/okulbilisim/awesome-datascience) - Collection of bloggers, twitter accounts, facebook accounts, MOOC's, datasets, tools. 412 | * [Awesome Big Data](https://github.com/onurakpolat/awesome-bigdata) - Onur Akpolat's curated list of awesome big data frameworks, resources and papers. 413 | * [Mining the Social Web](https://github.com/ptwobrussell/Mining-the-Social-Web-2nd-Edition) - Matthew Russell's repo related to his book that focuses on working with the Twitter, Facebook, etc. 414 | * [Harvard CS109 Github Repo](https://github.com/cs109/) 415 | * [Pete Warden's Data Science Toolkit](https://github.com/petewarden/dstk) - Collection of open data sets and open-source tools for data science in ruby but has python. 416 | * [Course Materials for Data Science Specialization](https://github.com/DataScienceSpecialization/courses) - Coursera course materials. 
417 | * [iPython Cookbook Materials](https://github.com/ipython-books/cookbook-code) - Excellent resources for high performance scientific computing and data science in python. 418 | 419 | ### Auxiliary Content & Apps 420 | * [Markable](http://markable.in/editor/) - Lets me visualize Markdown 421 | * [Markdown Cheatsheet](https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet) - Self explanatory. 422 | * [LightPaper](http://www.ashokgelal.com/lightpaper-for-mac/) - Markdown editor that I use. 423 | * [iterm2](http://iterm2.com/) - Terminal application for Mac. 424 | * [Oh My Zsh](http://ohmyz.sh/) - Framework for managing your ZSH config. Awesome. 425 | * [Sublime Text Editor](http://www.sublimetext.com/) - For all your scripting needs. 426 | 427 | ## ABOUT ME 428 | 429 | I am currently working at an advanced energy storage start-up called Stem which is at the heart of revolutionizing how the grid integrates energy storage from a consumer and utility perspective. Our team works on a variety of engineering challenges, in particular a lot of time-series problems. 430 | 431 | I am a chemical engineer and economist by formal education and have worked in the energy, water and carbon industries ever since college. I acquired my data science coding skills through programming on the job and then taking three months off to hone my data science skills @ Zipfian Academy (since acquired by Galvanize). For me, taking that time off to learn, run the daily/weekly sprints, and be in a collective learning environment at Zipfian was irreplaceable. Even if all of Zipfian's resources were open source, without taking the time off work it would have been next to impossible to learn all that content. Not to mention the great people I met through the program. 432 | 433 | I am always interested to hear what other data scientists are up to, especially those in the clean energy industry.
If you have some project ideas or other resources that would be great to add here - feel free to reach out on Twitter [@sf_oak](http://bit.ly/1FefepA), [LinkedIn](http://linkd.in/1vp57dk) or [AngelList](https://angel.co/jonathan-bower). 434 | 435 | 436 | 437 | 438 | 439 | 440 | [![Analytics](https://ga-beacon.appspot.com/UA-50532302-5/jonathan-bower/DataScienceResources?pixel)](https://github.com/igrigorik/ga-beacon) 441 | 442 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | GNU LESSER GENERAL PUBLIC LICENSE 2 | Version 3, 29 June 2007 3 | 4 | Copyright (C) 2007 Free Software Foundation, Inc. 5 | Everyone is permitted to copy and distribute verbatim copies 6 | of this license document, but changing it is not allowed. 7 | 8 | 9 | This version of the GNU Lesser General Public License incorporates 10 | the terms and conditions of version 3 of the GNU General Public 11 | License, supplemented by the additional permissions listed below. 12 | 13 | 0. Additional Definitions. 14 | 15 | As used herein, "this License" refers to version 3 of the GNU Lesser 16 | General Public License, and the "GNU GPL" refers to version 3 of the GNU 17 | General Public License. 18 | 19 | "The Library" refers to a covered work governed by this License, 20 | other than an Application or a Combined Work as defined below. 21 | 22 | An "Application" is any work that makes use of an interface provided 23 | by the Library, but which is not otherwise based on the Library. 24 | Defining a subclass of a class defined by the Library is deemed a mode 25 | of using an interface provided by the Library. 26 | 27 | A "Combined Work" is a work produced by combining or linking an 28 | Application with the Library. The particular version of the Library 29 | with which the Combined Work was made is also called the "Linked 30 | Version". 
31 | 32 | The "Minimal Corresponding Source" for a Combined Work means the 33 | Corresponding Source for the Combined Work, excluding any source code 34 | for portions of the Combined Work that, considered in isolation, are 35 | based on the Application, and not on the Linked Version. 36 | 37 | The "Corresponding Application Code" for a Combined Work means the 38 | object code and/or source code for the Application, including any data 39 | and utility programs needed for reproducing the Combined Work from the 40 | Application, but excluding the System Libraries of the Combined Work. 41 | 42 | 1. Exception to Section 3 of the GNU GPL. 43 | 44 | You may convey a covered work under sections 3 and 4 of this License 45 | without being bound by section 3 of the GNU GPL. 46 | 47 | 2. Conveying Modified Versions. 48 | 49 | If you modify a copy of the Library, and, in your modifications, a 50 | facility refers to a function or data to be supplied by an Application 51 | that uses the facility (other than as an argument passed when the 52 | facility is invoked), then you may convey a copy of the modified 53 | version: 54 | 55 | a) under this License, provided that you make a good faith effort to 56 | ensure that, in the event an Application does not supply the 57 | function or data, the facility still operates, and performs 58 | whatever part of its purpose remains meaningful, or 59 | 60 | b) under the GNU GPL, with none of the additional permissions of 61 | this License applicable to that copy. 62 | 63 | 3. Object Code Incorporating Material from Library Header Files. 64 | 65 | The object code form of an Application may incorporate material from 66 | a header file that is part of the Library. 
You may convey such object 67 | code under terms of your choice, provided that, if the incorporated 68 | material is not limited to numerical parameters, data structure 69 | layouts and accessors, or small macros, inline functions and templates 70 | (ten or fewer lines in length), you do both of the following: 71 | 72 | a) Give prominent notice with each copy of the object code that the 73 | Library is used in it and that the Library and its use are 74 | covered by this License. 75 | 76 | b) Accompany the object code with a copy of the GNU GPL and this license 77 | document. 78 | 79 | 4. Combined Works. 80 | 81 | You may convey a Combined Work under terms of your choice that, 82 | taken together, effectively do not restrict modification of the 83 | portions of the Library contained in the Combined Work and reverse 84 | engineering for debugging such modifications, if you also do each of 85 | the following: 86 | 87 | a) Give prominent notice with each copy of the Combined Work that 88 | the Library is used in it and that the Library and its use are 89 | covered by this License. 90 | 91 | b) Accompany the Combined Work with a copy of the GNU GPL and this license 92 | document. 93 | 94 | c) For a Combined Work that displays copyright notices during 95 | execution, include the copyright notice for the Library among 96 | these notices, as well as a reference directing the user to the 97 | copies of the GNU GPL and this license document. 98 | 99 | d) Do one of the following: 100 | 101 | 0) Convey the Minimal Corresponding Source under the terms of this 102 | License, and the Corresponding Application Code in a form 103 | suitable for, and under terms that permit, the user to 104 | recombine or relink the Application with a modified version of 105 | the Linked Version to produce a modified Combined Work, in the 106 | manner specified by section 6 of the GNU GPL for conveying 107 | Corresponding Source. 
108 | 109 | 1) Use a suitable shared library mechanism for linking with the 110 | Library. A suitable mechanism is one that (a) uses at run time 111 | a copy of the Library already present on the user's computer 112 | system, and (b) will operate properly with a modified version 113 | of the Library that is interface-compatible with the Linked 114 | Version. 115 | 116 | e) Provide Installation Information, but only if you would otherwise 117 | be required to provide such information under section 6 of the 118 | GNU GPL, and only to the extent that such information is 119 | necessary to install and execute a modified version of the 120 | Combined Work produced by recombining or relinking the 121 | Application with a modified version of the Linked Version. (If 122 | you use option 4d0, the Installation Information must accompany 123 | the Minimal Corresponding Source and Corresponding Application 124 | Code. If you use option 4d1, you must provide the Installation 125 | Information in the manner specified by section 6 of the GNU GPL 126 | for conveying Corresponding Source.) 127 | 128 | 5. Combined Libraries. 129 | 130 | You may place library facilities that are a work based on the 131 | Library side by side in a single library together with other library 132 | facilities that are not Applications and are not covered by this 133 | License, and convey such a combined library under terms of your 134 | choice, if you do both of the following: 135 | 136 | a) Accompany the combined library with a copy of the same work based 137 | on the Library, uncombined with any other library facilities, 138 | conveyed under the terms of this License. 139 | 140 | b) Give prominent notice with the combined library that part of it 141 | is a work based on the Library, and explaining where to find the 142 | accompanying uncombined form of the same work. 143 | 144 | 6. Revised Versions of the GNU Lesser General Public License. 
145 | 146 | The Free Software Foundation may publish revised and/or new versions 147 | of the GNU Lesser General Public License from time to time. Such new 148 | versions will be similar in spirit to the present version, but may 149 | differ in detail to address new problems or concerns. 150 | 151 | Each version is given a distinguishing version number. If the 152 | Library as you received it specifies that a certain numbered version 153 | of the GNU Lesser General Public License "or any later version" 154 | applies to it, you have the option of following the terms and 155 | conditions either of that published version or of any later version 156 | published by the Free Software Foundation. If the Library as you 157 | received it does not specify a version number of the GNU Lesser 158 | General Public License, you may choose any version of the GNU Lesser 159 | General Public License ever published by the Free Software Foundation. 160 | 161 | If the Library as you received it specifies that a proxy can decide 162 | whether future versions of the GNU Lesser General Public License shall 163 | apply, that proxy's public statement of acceptance of any version is 164 | permanent authorization for you to choose that version for the 165 | Library. 
166 | -------------------------------------------------------------------------------- /Machine Learning/INTRODUCTION TO MACHINE LEARNING.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/storieswithsiva/Data-Science-Beginners-Guide/1ed74b07394eec3affe2a532f59b50fa17c55a54/Machine Learning/INTRODUCTION TO MACHINE LEARNING.pdf -------------------------------------------------------------------------------- /Machine Learning/Linear Discriminant Analysis.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/storieswithsiva/Data-Science-Beginners-Guide/1ed74b07394eec3affe2a532f59b50fa17c55a54/Machine Learning/Linear Discriminant Analysis.pdf -------------------------------------------------------------------------------- /Machine Learning/ML Cheat Sheet.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/storieswithsiva/Data-Science-Beginners-Guide/1ed74b07394eec3affe2a532f59b50fa17c55a54/Machine Learning/ML Cheat Sheet.pdf -------------------------------------------------------------------------------- /Machine Learning/ML Formulas.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/storieswithsiva/Data-Science-Beginners-Guide/1ed74b07394eec3affe2a532f59b50fa17c55a54/Machine Learning/ML Formulas.pdf -------------------------------------------------------------------------------- /Machine Learning/ML PROS N CONS.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/storieswithsiva/Data-Science-Beginners-Guide/1ed74b07394eec3affe2a532f59b50fa17c55a54/Machine Learning/ML PROS N CONS.pdf -------------------------------------------------------------------------------- /Machine Learning/ML-02-linear-regression.pdf: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/storieswithsiva/Data-Science-Beginners-Guide/1ed74b07394eec3affe2a532f59b50fa17c55a54/Machine Learning/ML-02-linear-regression.pdf -------------------------------------------------------------------------------- /Machine Learning/Machine Learning Cheatsheet.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/storieswithsiva/Data-Science-Beginners-Guide/1ed74b07394eec3affe2a532f59b50fa17c55a54/Machine Learning/Machine Learning Cheatsheet.pdf -------------------------------------------------------------------------------- /Machine Learning/Machine Learning Cheatsheet_NEW.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/storieswithsiva/Data-Science-Beginners-Guide/1ed74b07394eec3affe2a532f59b50fa17c55a54/Machine Learning/Machine Learning Cheatsheet_NEW.pdf -------------------------------------------------------------------------------- /Machine Learning/Machine Learning Modelling in R.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/storieswithsiva/Data-Science-Beginners-Guide/1ed74b07394eec3affe2a532f59b50fa17c55a54/Machine Learning/Machine Learning Modelling in R.pdf -------------------------------------------------------------------------------- /Machine Learning/Pytorch.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/storieswithsiva/Data-Science-Beginners-Guide/1ed74b07394eec3affe2a532f59b50fa17c55a54/Machine Learning/Pytorch.pdf -------------------------------------------------------------------------------- /Machine Learning/Supervised Machine Learning.pdf: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/storieswithsiva/Data-Science-Beginners-Guide/1ed74b07394eec3affe2a532f59b50fa17c55a54/Machine Learning/Supervised Machine Learning.pdf -------------------------------------------------------------------------------- /Machine Learning/mlr.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/storieswithsiva/Data-Science-Beginners-Guide/1ed74b07394eec3affe2a532f59b50fa17c55a54/Machine Learning/mlr.pdf -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Data-Science-Beginners-Guide 2 | 3 | [![Makes people smile](https://forthebadge.com/images/badges/makes-people-smile.svg)](https://github.com/iamsivab) 4 | 5 | [![HitCount](http://hits.dwyl.com/iamsivab/Data-Science-Beginners-Guide.svg)](http://hits.dwyl.com/iamsivab/Data-Science-Beginners-Guide) 6 | 7 | ## Data-Science-Beginners-Guide 8 | 9 | [![Generic badge](https://img.shields.io/badge/Datascience-Beginners-Red.svg?style=for-the-badge)](https://github.com/iamsivab/Data-Science-Beginners-Guide) 10 | [![Generic badge](https://img.shields.io/badge/LinkedIn-Connect-blue.svg?style=for-the-badge&logo=linkedin&logoColor=white)](https://www.linkedin.com/in/iamsivab/) [![Generic badge](https://img.shields.io/badge/Python-Language-blue.svg?style=for-the-badge)](https://github.com/iamsivab/Data-Science-Beginners-Guide) [![ForTheBadge uses-git](http://ForTheBadge.com/images/badges/uses-git.svg)](https://GitHub.com/) 11 | 12 | #### The goal of this project is to teach [#DataScience](https://github.com/iamsivab/Data-Science-Beginners-Guide) to beginners from various domains.
13 | 14 | [![GitHub repo size](https://img.shields.io/github/repo-size/iamsivab/Data-Science-Beginners-Guide.svg?logo=github&style=social)](https://github.com/iamsivab) [![GitHub code size in bytes](https://img.shields.io/github/languages/code-size/iamsivab/Data-Science-Beginners-Guide.svg?logo=git&style=social)](https://github.com/iamsivab/)[![GitHub top language](https://img.shields.io/github/languages/top/iamsivab/Data-Science-Beginners-Guide.svg?logo=python&style=social)](https://github.com/iamsivab) 15 | 16 | #### A few popular hashtags - 17 | ### `#Self-Learning` `#Guide for Beginners` `#Self Learning` 18 | ### `#Python` `#LearnDataScience` `#Machine Learning` 19 | 20 | ### Motivation 21 | ### Step 0. What Is What 22 | 23 | Generally speaking, Data Science is not a single field; it is a combination of disciplines focused on analyzing data and finding the best solutions based on it. Initially, these tasks were handled by mathematicians and statisticians, but then data experts began to use machine learning and artificial intelligence, which added optimization and computer science to the methods for analyzing data. This new approach turned out to be much faster and more effective, and so became extremely popular. 24 | 25 | All in all, the popularity of Data Science lies in the fact that it encompasses the collection of large arrays of structured and unstructured data and their conversion into a human-readable format, including visualization, work with statistics, and analytical methods: machine and deep learning, probability analysis and predictive models, and neural networks and their application to real problems. 26 | 27 | Artificial Intelligence, Machine Learning, Deep Learning, and Data Science are among the most popular terms today. Although they are related, they are not the same. So, before jumping into any of those fields, it is important to understand the difference.
28 | 29 | [![DataScience](https://miro.medium.com/max/1104/0*8BYyKQB9Jjzbc9yD)](https://github.com/iamsivab/Data-Science-Beginners-Guide) 30 | 31 | 32 | ### About the Project 33 | 34 | Artificial Intelligence is the field focused on creating intelligent machines that work and react like humans. AI as a field of study traces back to the work of Alan Turing: his 1936 paper introduced a theoretical model of a general-purpose computing machine, and his 1950 paper asked whether machines can think. Despite this long history, AI in most areas is not yet able to completely replace a human. The competition between AI and humans in chess and in data encryption are two sides of the same coin. 35 | 36 | ```Machine learning is a tool for extracting knowledge from data. In ML, models can be trained in different ways: with a teacher (supervised learning), using human-prepared labeled data, or without a teacher (unsupervised learning), working with raw, noisy data.``` 37 | 38 | Deep learning is the creation of multi-layer neural networks for areas where more advanced or faster analysis is needed and traditional machine learning cannot cope. “Depth” means the network has more than one hidden layer of neurons performing mathematical calculations. 39 | 40 | ```Big Data is work with huge amounts of often unstructured data. What sets the field apart is its tools and systems, which are built to withstand high loads.``` 41 | 42 | Data Science is about adding meaning to arrays of data: visualizing them, collecting insights, and making decisions based on them. Specialists in the field use methods from machine learning and Big Data, such as cloud computing, tools for creating a virtual development environment, and much more.
Data Science’s tasks are summed up well by this Venn diagram created by Drew Conway: 43 | 44 | 45 | #### Steps involved in this project 46 | 47 | [![Made with Python](https://forthebadge.com/images/badges/made-with-python.svg)](https://github.com/iamsivab/Data-Science-Beginners-Guide) [![Made with love](https://forthebadge.com/images/badges/built-with-love.svg)](https://www.linkedin.com/in/iamsivab/) [![ForTheBadge built-with-swag](http://ForTheBadge.com/images/badges/built-with-swag.svg)](https://www.linkedin.com/in/iamsivab/) 48 | 49 | [![DataScience](https://miro.medium.com/max/1056/0*OwWBkLK-bd_aeG8y)](https://github.com/iamsivab/Data-Science-Beginners-Guide) 50 | 51 | So what does a Data Scientist do? 52 | 53 | Here is all you need to know about it: 54 | 55 | ``` 56 | - Detection of anomalies, for example, abnormal customer behavior or fraud; 57 | - Personalized marketing: personal e-mail newsletters, retargeting, recommendation systems; 58 | - Metric forecasts: performance indicators, quality of advertising campaigns and other activities; 59 | - Scoring systems: processing large amounts of data to help make a decision, for example, on granting a loan; 60 | - Basic interaction with the client: standard answers in chat rooms, voice assistants, sorting letters into folders. 61 | 62 | ``` 63 | 64 | To do any of the above tasks you need to follow certain steps: 65 | 66 | ``` 67 | - Collection. Search for channels where you can collect data, and work out how to get it. 68 | - Check. Validate the data and prune anomalies that do not affect the result and would confuse further analysis. 69 | - Analysis. Study the data, confirm assumptions, draw conclusions. 70 | - Visualization. Present the results in a form that is simple and understandable for a person: graphs, diagrams. 71 | - Act. Make decisions based on the analyzed data, for example, about changing the marketing strategy or increasing the budget for some activity of the company.
72 | 73 | ``` 74 | 75 | #### Let's Get into the Guide 76 | 77 | Now it is time to move on to more complicated things. All of the steps below will probably seem too hard and too time and energy consuming. Well, yes, this path is hard if you see it as something you can learn in a month or even in a year. You should accept that learning never stops, take small steps every day, be ready to make mistakes and try again, and count on a long period of mastering this field. 78 | 79 | So, are you really ready for this stuff? If so, let’s roll. 80 | 81 | ### Step 1. Statistics, Math, Linear Algebra 82 | 83 | `“Data Scientist is a person who is better at statistics than any programmer and better at programming than any statistician.” 84 | Josh Wills` 85 | 86 | If we talk about Data Science in general, then serious understanding and work require a fundamental course in probability theory (and therefore mathematical analysis, as a necessary tool in probability theory), linear algebra and, of course, mathematical statistics. Fundamental mathematical knowledge is important for analyzing the results of applying data processing algorithms. There are examples of relatively strong machine learning engineers without such a background, but they are rather the exception. 87 | 88 | If your university education has left many gaps, I recommend the book The Elements of Statistical Learning by Hastie, Tibshirani, and Friedman. In this book, the classic areas of machine learning are presented in terms of mathematical statistics with rigorous mathematical derivations. Despite the abundance of mathematical formulations and proofs, all methods are accompanied by practical examples and exercises. 89 | 90 | The best book at the moment for understanding the mathematical principles underlying neural networks is Deep Learning by Ian Goodfellow, Yoshua Bengio, and Aaron Courville.
Early in the book, there is a whole part covering the math that is needed for a good understanding of neural networks. One more good reference is Neural Networks and Deep Learning by Michael Nielsen. It may not be a fundamental work, but it is very useful for understanding the basic principles. 91 | 92 | Additional resources: 93 | 94 | [A Complete Guide To Math And Statistics For Data Science: cool and not boring walkthrough to help you become well-oriented in the realms of math and statistics](https://www.edureka.co/blog/math-and-statistics-for-data-science/) 95 | 96 | [Introduction to Statistics for Data Science: This tutorial helps explain the central limit theorem, covering populations and samples, sampling distribution, intuition, and contains a useful video so you can continue your learning.](https://www.kdnuggets.com/2018/12/introduction-statistics-data-science.html) 97 | 98 | [A comprehensive beginners guide to Linear Algebra for Data Scientists: Everything you need to know about Linear Algebra](https://www.analyticsvidhya.com/blog/2017/05/comprehensive-guide-to-linear-algebra/) 99 | 100 | [Linear Algebra for Data Scientists: Amazing article to dive into a quick run through of the basics.](analyticbridge.datasciencecentral.com/profiles/blogs/linear-algebra-for-data-scientists) 101 | 102 | ### Step 2. Programming (Python) 103 | 104 | It is a great advantage to get acquainted with the basics of programming right away. Since this is a very time-consuming process, you can simplify the task a bit. How? It’s simple: start with one language and learn the nuances of programming through the syntax of that language. 105 | 106 | `But still, it is difficult to do without some kind of general guide.
For this reason, I recommend paying attention to this article: Software Development Skills for Data Scientists: Amazing article about important soft skills for 107 | programming practice.` 108 | 109 | In particular, I would advise you to pay attention to Python. Firstly, it is perfect for beginners, since it has a relatively simple syntax. Secondly, Python specialists are in high demand, and the language is multi-purpose. 110 | 111 | ```But if these statements don’t tell you anything, read more about it here: Python vs R. Choosing the Best Tool for AI, ML & Data Science. Time is a precious resource, so it’s better not to spread yourself too thin and waste it.``` 112 | 113 | So how do you learn Python? 114 | 115 | If you have no programming background, I recommend reading Automate the Boring Stuff With Python. The book explains practical programming for total beginners and teaches from scratch. Read Chapter 6, “String Manipulation,” and complete the practical tasks for this lesson. That will be enough. 116 | 117 | Here are some other great resources to explore: 118 | 119 | [Codecademy: teaches good general syntax](https://www.codecademy.com/) 120 | 121 | [Learn Python the Hard Way: a brilliant manual-like book that explains both basics and more complex applications.](https://learnpythonthehardway.org/) 122 | 123 | [Dataquest: this resource teaches syntax while also teaching data science](https://www.dataquest.io/) 124 | 125 | [The Python Tutorial: official documentation](https://docs.python.org/3/tutorial/) 126 | 127 | After you learn the basics of Python, you need to spend time getting to know the main libraries. 128 | 129 | ### Step 3. Machine Learning 130 | 131 | `Machine learning allows you to train computers to act independently so that we do not have to write detailed instructions for performing certain tasks.
For this reason, machine learning is of great value for almost any area, but first of all, of course, it works well wherever there is Data Science.` 132 | 133 | The first step in learning ML is to understand its three main groups: 134 | 135 | 1) Supervised learning is currently the most developed form of ML. The idea is that you have historical data that includes the output variable: each row pairs several input variables with the corresponding output value. From these examples you try to come up with a function that can predict the output for any given input. So, the key idea is that the historical data is labeled, meaning there is a specific output value for every row of data. 136 | PS. If the output variable is discrete, the task is called CLASSIFICATION. If it is continuous, it is called REGRESSION. 137 | 138 | ```2) Unsupervised learning doesn’t have the luxury of labeled historical input-output data. Instead, it has only a whole bunch of RAW INPUT DATA. It lets us identify patterns in the historical input data and draw interesting insights from the overall picture. The output is absent, and the question is whether any pattern is visible in the unlabeled inputs. Unsupervised learning lends itself to numerous combinations of patterns, and that is why unsupervised algorithms are harder.``` 139 | 140 | 3) Reinforcement learning occurs when you present the algorithm with examples that lack labels, as in unsupervised learning. However, each example can be accompanied by positive or negative feedback, depending on the solution the algorithm proposes. RL suits applications in which the algorithm must make decisions and the decisions bear consequences.
It is just like learning by trial and error. An interesting example of RL is computers learning to play video games by themselves. 141 | So now you know the basics of ML. After this, you will need to learn more. Here are great resources to explore: 142 | 143 | Supervised and Unsupervised Machine Learning Algorithms: Clear, concise explanations of the types of machine learning algorithms. 144 | Visualization of Machine Learning: Excellent visualization that walks you through exactly how machine learning is used. 145 | 146 | ### Step 4. Data Mining and Data Visualization 147 | 148 | Data Mining is an important analytic process designed to explore data. It analyzes data from different perspectives to uncover hidden patterns and categorize them into useful information. That information is collected in common areas, such as data warehouses, where data mining algorithms can analyze it efficiently to support business decision making and other information requirements, ultimately cutting costs and increasing revenue.
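To make "analyzing hidden patterns of data" a little more concrete, here is a minimal sketch of one classic data mining task: counting which pairs of items tend to appear together. The function name `frequent_pairs`, the `min_support` parameter, and the shopping baskets are all made up for this illustration, and real projects would normally use a dedicated library rather than this hand-rolled version.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_support):
    """Count how often each pair of items appears together.

    Returns the pairs seen in at least `min_support` transactions,
    mapped to the number of transactions containing both items.
    """
    pair_counts = Counter()
    for basket in transactions:
        # De-duplicate and sort so ("bread", "milk") and ("milk", "bread")
        # are counted as the same pair.
        unique_items = sorted(set(basket))
        pair_counts.update(combinations(unique_items, 2))
    return {pair: n for pair, n in pair_counts.items() if n >= min_support}


# Hypothetical shopping baskets, invented for this example.
baskets = [
    ["bread", "milk", "eggs"],
    ["bread", "milk"],
    ["milk", "eggs"],
    ["bread", "milk", "butter"],
]

print(frequent_pairs(baskets, min_support=2))
# → {('bread', 'milk'): 3, ('eggs', 'milk'): 2}
```

Even on four baskets, the output shows the kind of pattern a business could act on: bread and milk are bought together most often. Counting every pair like this becomes slow on large data, which is one reason specialized algorithms exist.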
149 | 150 | Resources to master Data Mining: 151 | 152 | [How data mining works: great video with the best explanation I found so far](https://www.youtube.com/watch?v=W44q6qszdqY) 153 | 154 | [‘Janitor Work’ is Key Hurdle to Insights: Interesting article that goes into detail regarding the importance of data mining practices in the field of data science.](https://www.nytimes.com/2014/08/18/technology/for-big-data-scientists-hurdle-to-insights-is-janitor-work.html?_r=0) 155 | 156 | Data Visualization is a general term that describes an effort to help people understand the significance of data by placing it in a visual context. Resources to master Data Visualization: 157 | 158 | [Data visualization beginner’s guide](https://www.tableau.com/learn/articles/data-visualization) 159 | 160 | [What Makes a Good Data Visualization](https://www.informationisbeautiful.net/visualizations/what-makes-a-good-data-visualization/) 161 | 162 | ### Step 5. Practical Experience 163 | Studying only the theory is not very interesting; you need to try your hand at practice. A beginner Data Scientist has a few good options for this: 164 | 165 | ```Use Kaggle, a website dedicated to Data Science. It constantly hosts data analysis competitions in which you can take part. There are also many open data sets that you can analyze, publishing your results. In addition, you can read scripts published by other participants (on Kaggle, such scripts are called Kernels) and learn from their successful experience.``` 166 | 167 | ### Step 6. Qualification Confirmation 168 | 169 | After you have studied everything you need to analyze data and have tried your hand at open tasks and contests, start looking for a job. Of course, you will say only good things about yourself, but employers have the right to doubt your words. That is when you can show independent confirmation, for example: 170 | 171 | ```Advanced profile on Kaggle. Kaggle has a ranking system: you can progress from beginner to grandmaster.
For successful participation in competitions and for publishing scripts and discussions, you get points that raise your rating. In addition, the site shows which competitions you took part in and what your results were.``` 172 | 173 | Data analysis programs can be published on GitHub or other open repositories, so that anyone interested can get acquainted with them, including representatives of the employer who will interview you. 174 | 175 | ``` Final Advice: Don’t Be a Copy of a Copy, Find Your Own Way``` 176 | 177 | Now anyone can become a Data Scientist. Everything you need for this is publicly available: online courses, books, competitions for gaining practical experience and so on. That looks good at first glance, but you shouldn’t learn it just because of the hype. All we hear about Data Science is that it is unbelievably cool and that it is the sexiest job of the 21st century. If these things are your main motivation, nothing will ever work. It is a sad truth, and maybe I’m exaggerating a little bit, but that’s how I feel about it. 178 | What I want to say is that becoming a self-taught Data Scientist is possible. However, the key to your success is strong motivation to regularly find time to study data analysis and its practical application. Most importantly, you have to learn to find satisfaction in the process of learning and working. 179 | 180 | Think about it. 181 | 182 | Good luck! 183 | 184 | Feel free to share your ideas and thoughts. 185 | 186 | ### Project Reports 187 | 188 | [![report](https://img.shields.io/static/v1.svg?label=Project&message=Report&logo=microsoft-word&style=social)](https://towardsdatascience.com/a-beginners-guide-to-data-science-55edd0288973) 189 | 190 | - [Download](https://github.com/iamsivab/Data-Science-Beginners-Guide/pulls) for the report. 191 | 192 | ### Useful Links 193 | 194 | 1. 
[Towards Data Science](https://towardsdatascience.com/a-beginners-guide-to-data-science-55edd0288973) 195 | 196 | ### Related Work 197 | 198 | [![Data Science Repository](https://img.shields.io/static/v1.svg?label=DataScience&message=Repo&color=lightgray&logo=github&style=social&colorA=critical)](https://github.com/iamsivab/Data-Science-Repository) [![GitHub top language](https://img.shields.io/github/languages/top/iamsivab/Data-Science-Beginners-Guide.svg?logo=php&style=social)](https://github.com/iamsivab/) 199 | 200 | [Data Science Repo](https://github.com/iamsivab/Data-Science-Beginners-Guide) - A Detailed Report on the Analysis 201 | 202 | 203 | ### Contributing 204 | 205 | [![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg?logo=github)](https://github.com/iamsivab/Data-Science-Beginners-Guide/pulls) [![GitHub issues](https://img.shields.io/github/issues/iamsivab/Data-Science-Beginners-Guide?logo=github)](https://github.com/iamsivab/Data-Science-Beginners-Guide/issues) ![GitHub pull requests](https://img.shields.io/github/issues-pr/iamsivab/Data-Science-Beginners-Guide?color=blue&logo=github) 206 | [![GitHub commit activity](https://img.shields.io/github/commit-activity/y/iamsivab/Data-Science-Beginners-Guide?logo=github)](https://github.com/iamsivab/Data-Science-Beginners-Guide/) 207 | 208 | - Clone [this](https://github.com/iamsivab/Data-Science-Beginners-Guide/) repository: 209 | 210 | ```bash 211 | git clone https://github.com/iamsivab/Data-Science-Beginners-Guide.git 212 | ``` 213 | 214 | - Check out any issue from [here](https://github.com/iamsivab/Data-Science-Beginners-Guide/issues). 215 | 216 | - Make changes and send a [Pull Request](https://github.com/iamsivab/Data-Science-Beginners-Guide/pull). 217 | 218 | ### Need help? 
219 | 220 | [![Facebook](https://img.shields.io/static/v1.svg?label=follow&message=@iamsivab&color=9cf&logo=facebook&style=flat&logoColor=white&colorA=informational)](https://www.facebook.com/iamsivab) [![Instagram](https://img.shields.io/static/v1.svg?label=follow&message=@iamsivab&color=grey&logo=instagram&style=flat&logoColor=white&colorA=critical)](https://www.instagram.com/iamsivab/) [![LinkedIn](https://img.shields.io/static/v1.svg?label=connect&message=@iamsivab&color=success&logo=linkedin&style=flat&logoColor=white&colorA=blue)](https://www.linkedin.com/in/iamsivab/) 221 | 222 | :email: Feel free to contact me @ [balasiva001@gmail.com](https://mail.google.com/mail/) 223 | 224 | [![GMAIL](https://img.shields.io/static/v1.svg?label=send&message=balasiva001@gmail.com&color=red&logo=gmail&style=social)](https://www.github.com/iamsivab) [![Twitter Follow](https://img.shields.io/twitter/follow/iamsivab?style=social)](https://twitter.com/iamsivab) 225 | 226 | 227 | ### License 228 | 229 | MIT © [Sivasubramanian](https://github.com/iamsivab/Data-Science-Beginners-Guide/blob/master/LICENSE) 230 | 231 | 
[![](https://sourcerer.io/fame/iamsivab/iamsivab/Data-Science-Beginners-Guide/images/0)](https://sourcerer.io/fame/iamsivab/iamsivab/Data-Science-Beginners-Guide/links/0)[![](https://sourcerer.io/fame/iamsivab/iamsivab/Data-Science-Beginners-Guide/images/1)](https://sourcerer.io/fame/iamsivab/iamsivab/Data-Science-Beginners-Guide/links/1)[![](https://sourcerer.io/fame/iamsivab/iamsivab/Data-Science-Beginners-Guide/images/2)](https://sourcerer.io/fame/iamsivab/iamsivab/Data-Science-Beginners-Guide/links/2)[![](https://sourcerer.io/fame/iamsivab/iamsivab/Data-Science-Beginners-Guide/images/3)](https://sourcerer.io/fame/iamsivab/iamsivab/Data-Science-Beginners-Guide/links/3)[![](https://sourcerer.io/fame/iamsivab/iamsivab/Data-Science-Beginners-Guide/images/4)](https://sourcerer.io/fame/iamsivab/iamsivab/Data-Science-Beginners-Guide/links/4)[![](https://sourcerer.io/fame/iamsivab/iamsivab/Data-Science-Beginners-Guide/images/5)](https://sourcerer.io/fame/iamsivab/iamsivab/Data-Science-Beginners-Guide/links/5)[![](https://sourcerer.io/fame/iamsivab/iamsivab/Data-Science-Beginners-Guide/images/6)](https://sourcerer.io/fame/iamsivab/iamsivab/Data-Science-Beginners-Guide/links/6)[![](https://sourcerer.io/fame/iamsivab/iamsivab/Data-Science-Beginners-Guide/images/7)](https://sourcerer.io/fame/iamsivab/iamsivab/Data-Science-Beginners-Guide/links/7) 232 | 233 | 234 | [![GitHub license](https://img.shields.io/github/license/iamsivab/Data-Science-Beginners-Guide.svg?style=social&logo=github)](https://github.com/iamsivab/Data-Science-Beginners-Guide/blob/master/LICENSE) 235 | [![GitHub forks](https://img.shields.io/github/forks/iamsivab/Data-Science-Beginners-Guide.svg?style=social)](https://github.com/iamsivab/Data-Science-Beginners-Guide/network) [![GitHub stars](https://img.shields.io/github/stars/iamsivab/Data-Science-Beginners-Guide.svg?style=social)](https://github.com/iamsivab/Data-Science-Beginners-Guide/stargazers) [![GitHub 
followers](https://img.shields.io/github/followers/iamsivab.svg?label=Follow&style=social)](https://github.com/iamsivab/)[![Ask Me Anything !](https://img.shields.io/badge/Ask%20me-anything-1abc9c.svg)](https://GitHub.com/iamsivab/ama) 236 | -------------------------------------------------------------------------------- /SQL/SQL for DS.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/storieswithsiva/Data-Science-Beginners-Guide/1ed74b07394eec3affe2a532f59b50fa17c55a54/SQL/SQL for DS.pdf -------------------------------------------------------------------------------- /SQL/SQL-Cheat-Sheet SQL-Tutorial.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/storieswithsiva/Data-Science-Beginners-Guide/1ed74b07394eec3affe2a532f59b50fa17c55a54/SQL/SQL-Cheat-Sheet SQL-Tutorial.pdf -------------------------------------------------------------------------------- /SQL/SQL-cheat-sheet SQLTutorials.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/storieswithsiva/Data-Science-Beginners-Guide/1ed74b07394eec3affe2a532f59b50fa17c55a54/SQL/SQL-cheat-sheet SQLTutorials.pdf -------------------------------------------------------------------------------- /SQL/SQL-cheat-sheet.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/storieswithsiva/Data-Science-Beginners-Guide/1ed74b07394eec3affe2a532f59b50fa17c55a54/SQL/SQL-cheat-sheet.pdf -------------------------------------------------------------------------------- /SQL/sql-cheat-sheet (1).pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/storieswithsiva/Data-Science-Beginners-Guide/1ed74b07394eec3affe2a532f59b50fa17c55a54/SQL/sql-cheat-sheet (1).pdf 
-------------------------------------------------------------------------------- /SQL/sql-cheat-sheet (2).pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/storieswithsiva/Data-Science-Beginners-Guide/1ed74b07394eec3affe2a532f59b50fa17c55a54/SQL/sql-cheat-sheet (2).pdf -------------------------------------------------------------------------------- /SQL/sql-cheat-sheet-for-data-scientists-by-tomi-mester.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/storieswithsiva/Data-Science-Beginners-Guide/1ed74b07394eec3affe2a532f59b50fa17c55a54/SQL/sql-cheat-sheet-for-data-scientists-by-tomi-mester.pdf -------------------------------------------------------------------------------- /SQL/sql_cheat_sheet2.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/storieswithsiva/Data-Science-Beginners-Guide/1ed74b07394eec3affe2a532f59b50fa17c55a54/SQL/sql_cheat_sheet2.pdf -------------------------------------------------------------------------------- /Statistics and Probability/Probability and Naive Bayes.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/storieswithsiva/Data-Science-Beginners-Guide/1ed74b07394eec3affe2a532f59b50fa17c55a54/Statistics and Probability/Probability and Naive Bayes.pdf -------------------------------------------------------------------------------- /Statistics and Probability/Probability.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/storieswithsiva/Data-Science-Beginners-Guide/1ed74b07394eec3affe2a532f59b50fa17c55a54/Statistics and Probability/Probability.pdf -------------------------------------------------------------------------------- /Statistics and Probability/Stat 100 Final Cheat Sheets - Google Docs 
(2).pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/storieswithsiva/Data-Science-Beginners-Guide/1ed74b07394eec3affe2a532f59b50fa17c55a54/Statistics and Probability/Stat 100 Final Cheat Sheets - Google Docs (2).pdf -------------------------------------------------------------------------------- /Statistics and Probability/Statistics Cheat Sheet.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/storieswithsiva/Data-Science-Beginners-Guide/1ed74b07394eec3affe2a532f59b50fa17c55a54/Statistics and Probability/Statistics Cheat Sheet.pdf -------------------------------------------------------------------------------- /Statistics and Probability/cheatsheet-statistics.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/storieswithsiva/Data-Science-Beginners-Guide/1ed74b07394eec3affe2a532f59b50fa17c55a54/Statistics and Probability/cheatsheet-statistics.pdf --------------------------------------------------------------------------------