├── examples
│   ├── wordCloud.png
│   └── script.py
├── pythonHackathon
│   ├── datePreference.png
│   └── pythonHackathon.md
├── script.py
├── StepsofaDataScienceProject.md
├── StepstoCreatingDataScienceProject.md
└── README.md
/examples/wordCloud.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/raviolli77/dataScience-UCSBProjectGroup-Syllabus/HEAD/examples/wordCloud.png
--------------------------------------------------------------------------------
/pythonHackathon/datePreference.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/raviolli77/dataScience-UCSBProjectGroup-Syllabus/HEAD/pythonHackathon/datePreference.png
--------------------------------------------------------------------------------
/script.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python
2 | """
3 | Minimal Example
4 | ===============
5 | Generating a square wordcloud from Star Wars: A New Hope using default arguments.
6 |
7 | All code written by the authors of the word_cloud module (https://github.com/amueller/word_cloud)
8 | Used for educational purposes
9 | """
10 |
11 | from os import path
12 | from wordcloud import WordCloud
13 |
14 | d = path.dirname(__file__)
15 |
16 | # Read the whole text.
17 | text = open(path.join(d, 'a_new_hope.txt')).read()
18 |
19 | # Generate a word cloud image
20 | wordcloud = WordCloud().generate(text)
21 |
22 | # Display the generated image:
23 | # the matplotlib way:
24 | import matplotlib.pyplot as plt
25 | #plt.imshow(wordcloud)
26 | #plt.axis("off")
27 |
28 | # lower max_font_size
29 | wordcloud = WordCloud(max_font_size=40).generate(text)
30 | plt.figure()
31 | plt.imshow(wordcloud)
32 | plt.axis("off")
33 | plt.show()
--------------------------------------------------------------------------------
/examples/script.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python
2 | """
3 | Minimal Example
4 | ===============
5 | Generating a square wordcloud from Star Wars: A New Hope using default arguments.
6 |
7 | All code written by the authors of the word_cloud module (https://github.com/amueller/word_cloud)
8 | Used for educational purposes
9 | """
10 |
11 | from os import path
12 | from wordcloud import WordCloud
13 |
14 | d = path.dirname(__file__)
15 |
16 | # Read the whole text.
17 | text = open(path.join(d, 'a_new_hope.txt')).read()
18 |
19 | # Generate a word cloud image
20 | wordcloud = WordCloud().generate(text)
21 |
22 | # Display the generated image:
23 | # the matplotlib way:
24 | import matplotlib.pyplot as plt
25 | #plt.imshow(wordcloud)
26 | #plt.axis("off")
27 |
28 | # lower max_font_size
29 | wordcloud = WordCloud(max_font_size=40).generate(text)
30 | plt.figure()
31 | plt.imshow(wordcloud)
32 | plt.axis("off")
33 | plt.show()
--------------------------------------------------------------------------------
/pythonHackathon/pythonHackathon.md:
--------------------------------------------------------------------------------
1 | # Python Hackathon
2 | ## Function of Data Science at UCSB Project Group
3 |
4 | **Contributors**:
5 | + Raul Eulogio
6 |
7 | Necessary to discuss with everyone:
8 | + Date(s) (most likely two dates, since there is a strong preference for two of them)
9 | + Location - On Campus
10 | + Time
11 | + Earliest works best, but how early is the issue
12 | + Sign-In
13 | + Pitch in?
14 | + Coffee
15 | + Snacks?
16 | + Anything else add here:
17 |
18 | ## **Issues to Address**:
19 |
20 | The purpose of this file is to address issues with respect to setting up **Python** *before* attending the **Hackathon**. If too many people are troubleshooting at the **hackathon**, it will defeat the **hackathon**'s intent, which is to collaborate on learning/teaching **Python** for data analysis, not setting it up.
21 |
22 | So we are giving a brief overview of what should be done before the **hackathon** so that we can maximize our time learning **Python**.
23 |
24 | ## Downloading Python
25 | Install **Python3.X** onto your computer.
26 | ### Mac OS
27 | **Mac OS** has **Python2.7** pre-installed, so download **Python3.X**. Here's a quick run-through, although we recommend reading this [Guide](http://docs.python-guide.org/en/latest/starting/install/osx/) as well as the [Python Docs](https://www.python.org/).
28 |
29 | Installing **Python3.X** from the command line using Homebrew (a popular package manager for **Mac OS**) is simply:
30 |
31 | brew install python3
32 |
33 | That installs **Python3.X**. To test it, run the following in the terminal:
34 |
35 | python3
36 |
37 | This should open **Python3.X** (if you get an error, Google it and try to diagnose the issue, or contact us). Now run the command:
38 |
39 | quit()
40 |
41 | This should close **Python**. Let's test out `pip3` (the method we will use extensively for downloading 3rd-party modules) by downloading a popular module used for data analysis, pandas:
42 |
43 | pip3 install pandas
44 |
45 | You should see the installation take place, and you're set!
46 |
47 | ### Windows
48 | I unfortunately don't have a lot of experience installing **Python** on Windows, but here's a tutorial from [The Hitchhiker's Guide to Python](http://docs.python-guide.org/en/latest/starting/install/win/) for doing so.
49 |
50 | **Important**: this guide's instructions are for **Python2.7**. This resource will be mentioned later as well (we think it's a good resource; if you find any others, let us know so we can add them here!)
51 |
52 | ## Virtual Environments
53 | As you venture into the world of programming and data science, it is necessary to use and understand *Virtual Environments*. There are two *Virtual Environment* tools that we are familiar with; the choice is really personal taste.
54 |
55 | + [**Anaconda**](https://www.continuum.io/downloads) - Popular and advertised as a *Virtual Environment* for data science teams. Last quarter, Jason told people to download this for their *Virtual Environments*, so if you have it, just go ahead and use it.
56 | + [**Virtualenv**](http://docs.python-guide.org/en/latest/dev/virtualenvs/) - This tool is more versatile in that it is not advertised for a specific niche, so `virtualenv` works as the *Virtual Environment* for any **Python** project. If you follow the tutorial provided, it should be pretty straightforward to apply, and if you have any questions contact me.
57 |
58 | **Recall**: Download `virtualenv` for **Python3.X** using `pip3`, not `pip`.
59 |
60 | **Important to Note**: The reason we are stressing the importance of *Virtual Environments* is that we want your code to be reproducible. When people want to replicate your projects, they need to know what **Python** version and what module versions you used. Oftentimes, when these aren't specified, someone trying to replicate your project with an older **Python** version will quickly run into a lot of errors, so you'll be doing people a huge favor by specifying the versions of **Python** and the modules you used.
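
For example, a minimal sketch of that workflow (assuming `virtualenv` is already installed and your project lives in a hypothetical `myProject` folder; adjust the names to your setup):

    # create a Python3 virtual environment inside the project folder
    virtualenv -p python3 myProject/venv

    # activate it (Mac OS / Linux)
    source myProject/venv/bin/activate

    # install the modules your project needs inside the environment
    pip install pandas

    # record the exact module versions so others can reproduce your work
    pip freeze > requirements.txt

Anyone who clones your project can then recreate the same environment and run `pip install -r requirements.txt` to get the exact module versions you used.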
61 |
62 | ## Conclusion
63 | Please ensure, in order to have the most productive **Hackathon**, that you handle these issues **before** the day of:
64 |
65 | + Have **Python3.X** on your computer
66 | + Have a *Virtual Environment* set up on your computer (we might be lenient and go about teaching how to work with *Virtual Environments*, since at first they can be a daunting concept)
67 |
--------------------------------------------------------------------------------
/StepsofaDataScienceProject.md:
--------------------------------------------------------------------------------
1 | # How to do a Data Science Project
2 |
3 | Throughout the history of this organization we have emphasized the importance of creating projects. The biggest issue that we've seen people face is that they want to do a project but they don't know how to start a project. This roadblock prevents people from moving forward and will often make or break a team.
4 |
5 | To put it bluntly we can't let this slide anymore. We hope with this newly formatted **Project Group** to bridge the gap and make doing a project an attainable goal.
6 |
7 | ## Why do a Data Science Project?
8 |
9 | Before we go into the technical details of building a Data Science Project, we want you to ask yourself: Why do you want to do a Data Science Project?
10 |
11 | People's motives will vary. Answers can range from: '*it will look good on my resume*', '*I want to be able to brag about something that's attainable by other people online*', '*I want to learn beyond what's taught to me in my classes*'.
12 |
13 | Whatever the reason for your attendance in this group, you will realize that you are now connected to a community that aims to enter a newly emerging field in need of highly driven and analytical people with a love for all things data.
14 |
15 | The short answer is that in a Data Science Project you want to predict an outcome based on certain attributes. Doing a Data Science Project will make sense of your data, yielding a plethora of insights that you can then share with people who need them or are curious as to how the *data speaks to you*.
16 |
17 | ### Open Source
18 | Once you start the process of a Data Science project you will quickly be introduced to the concept of *open-source* (if you haven't already). This concept was introduced to me through **R**/**RStudio**, and I quickly fell in love with *open-source*. Once you have learned the tools and have published a project, using **RStudio** or **Python**, you can publish to [inertia7](http://www.inertia7.com/) or your GitHub repo, and you will realize that you have made your imprint on this growing community where anyone can learn from you if they so choose!
19 |
20 | Here I've provided a (non-exhaustive) list of *open-source* communities that provide resources at any skill level for people wanting to enter this field:
21 |
22 | - [RBloggers](https://www.r-bloggers.com/)
23 | - [Stack Overflow](http://stackoverflow.com/)
24 | - [Cross Validated](http://stats.stackexchange.com/)
25 | - [Kaggle](https://www.kaggle.com/)
26 |
27 | We can go on and on about resources, but regardless of how many resources we provide, people will still be left with an air of mystery as to what exactly a Data Science Project is. So the next section aims to dissect the process of doing a Data Science Project, and provide more resources at every step.
28 |
29 | ## Getting the Data
30 |
31 | We are assuming that for you to have gotten this far you have a basic understanding of some statistical tools/methods and are fairly knowledgeable with either **R**/**Python** as your tool for project building (granted we don't expect you to be experts on either but knowing how to manipulate data frames, conditional formatting, and data wrangling are a must to do projects).
32 |
33 | This part of the project building can be daunting due to the over-abundance of data sets/raw data.
34 |
35 | This step requires more thought than one might think, and will ultimately be decided by what your team is interested in. This can range from sports, video games, music, etc. If it exists, there is data available for it. And if there isn't, then you can be the first to collect data (more on this later!)
36 |
37 | Now if you're a beginner, we recommend using data sets that are *open-source* and have been used before for Data Science projects.
38 |
39 | This can range from:
40 | - [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/datasets.html)
41 | - Data sets found in the **R** base packages, found here: http://stat.ethz.ch/R-manual/R-devel/library/datasets/html/00Index.html.
42 | - If you are a bit more experienced, then we would definitely recommend the [Kaggle Data sets](https://www.kaggle.com/datasets)
43 | - API Calls to certain websites like **Twitter**, **Reddit**, **Google Maps** to name a few.
44 |
45 | A piece of advice when choosing a data set, when starting off, is to choose one that isn't too over the top. The best way to learn is to go through a simple data set and then apply what you learned to more complicated data sets once you become familiar with the process. For many of our projects on inertia7.com we chose data sets that were often used in our Statistics classes and in the online resources we mentioned, to slowly build up our repertoire and skill set.
46 |
47 | Start small and build your way up
48 |
49 | ## Data Wrangling
50 |
51 | This part varies from data set to data set. Once you have chosen your data set/collected your data, the next step is the process of cleaning the data, known as *Data Wrangling*. If you have chosen a data set that has been abundantly used in statistical courses/online Data Science resources, you will find that you can easily reference other people's work when it comes to cleaning your data set (**remember to cite every source you use**), as well as find justifications readily available for the transformations that were made.
52 |
53 | This step is crucial; often we take for granted the data sets we use, but one will find that data sets are often very messy and it is up to you to go through the process known as *Data wrangling*.
54 |
55 | ### Example: Normalization of our predictor variables
56 | Often in machine learning, we will realize that the different variables are measured in different scales.
57 |
58 | For example say we have a data set that has these two columns:
59 |
60 | - `Spending` - measured in percentage (i.e. 13.56% is written as 13.56)
61 | - `CostPerCapita` - measured in $1 millions (i.e. $1.3 million is written as 1,300,000)
62 |
63 | If we tried to fit models on this data set without transforming our data, we would quickly run into critical errors that hinder our results. Thus a possible solution is to scale our data so that the **mean** for both variables is 0 and the **standard deviation/variance** is 1.
64 |
65 | This process is known as **Normalization**.
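
As a quick illustration, here is a minimal sketch in **Python** (assuming the data sits in a pandas DataFrame with the two hypothetical columns above):

    import pandas as pd

    # Hypothetical data measured on very different scales, as described above
    df = pd.DataFrame({
        "Spending": [13.56, 8.20, 21.40],
        "CostPerCapita": [1300000, 850000, 2100000]
    })

    # Scale each column to mean 0 and standard deviation 1 (z-scores)
    normalized = (df - df.mean()) / df.std()
    print(normalized)

The same idea is implemented by `sklearn.preprocessing.StandardScaler` in **Python** and the built-in `scale()` function in **R**.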
66 |
67 | This is one of many processes that one should take into consideration before delving into a Data Science Project.
68 |
69 | Say you've gotten this far and quickly realized that you don't know when and when not to do transformations to your data set. This process does require a bit of analytical intuition, and there are many resources online that can help with the process.
70 |
71 | Let's pause the process here and say you don't think you have the skill set to make these important decisions. This is important to admit because if you are stuck here and you don't know how to continue a project, this can put a project on pause indefinitely.
72 |
73 | Our advice to you is not to be discouraged; let's just take a step back and not necessarily redesign your project, but essentially work with what you have.
74 |
75 | ## Data Visualization
76 | Something many people take for granted when going about Data Science Projects is *Data Visualization*. Also known as **Exploratory Analysis**, visuals are an important part of a project because they help gain readers who are interested in what the data has to say, usually people who don't have a strong data-based background. As you progress to more complicated projects the visualizations will take a back seat to the analytical aspect of your project, but if you're a beginner, visualizations are a powerful tool to draw other beginners and a wide array of people into reading your project, no matter how simple or trivial you might think it is.
77 |
78 | An example of this is a subreddit called [Dataisbeautiful](https://www.reddit.com/r/dataisbeautiful/), currently followed by 9,616,736 redditors. This subreddit focuses on simple visualizations of data collected by redditors. Many of these projects do not feature machine learning or advanced statistical inference, but there's a reason it has so many followers.
79 |
80 | Many people who do not possess a data-driven background still love to learn about data presented to them in a way they can understand, and a visualization is just that.
81 |
82 | ### Example: Exploratory Analysis on Text
83 | An example of an effective use of visualizations: say your team decided to do text analysis on the movie script for **Star Wars: A New Hope** through the process known as **Natural Language Processing** (NLP), but the algorithms you found online don't make sense to your team and you're having trouble moving forward. Instead of letting the project come to a complete halt, think of some easy descriptive statistical methods you can employ on your data set.
84 |
85 | Word clouds are a simple and effective way of showcasing words that are often said in a corpus, in our case the movie script. Employing the [word_cloud](https://github.com/amueller/word_cloud) module in **Python**, you can quickly reignite your project as shown below (this process took me all of 5 minutes to recreate using their sample scripts on GitHub):
86 |
87 |
88 |
89 | From here many other ideas come to mind that don't involve complicated machine learning algorithms (one of which is sketched after this list):
90 |
91 | - Bar chart showcasing the amount of dialogue each character has
92 | - Word Clouds for each main character
93 | - Bar Charts showcasing frequency of a certain word like **the Force**
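
For instance, a rough sketch of the word-frequency idea (assuming the same `a_new_hope.txt` script file used by the word cloud example above):

    from collections import Counter

    # Read the movie script and split it into lowercase words
    with open('a_new_hope.txt') as script:
        words = script.read().lower().split()

    # Print the ten most common words as a starting point for a bar chart
    for word, count in Counter(words).most_common(10):
        print(word, count)

From there, plotting the counts with matplotlib (as in the word cloud script) turns this into the bar charts described above.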
94 |
95 | This shows that Data Science projects don't have to be overly complicated; starting with exploratory analysis is okay and can be used as a springboard to more complicated and in-depth analysis.
--------------------------------------------------------------------------------
/StepstoCreatingDataScienceProject.md:
--------------------------------------------------------------------------------
1 | # How to do a Data Science Project
2 |
3 | Throughout the history of this organization we have emphasized the importance of creating projects. The biggest issue that we've seen people face is that they want to do a project but they don't know how to start a project. This roadblock prevents people from moving forward and will often make or break a team.
4 |
5 | To put it bluntly we can't let this slide anymore. We hope with this newly formatted **Project Group** to bridge the gap and make doing a project an attainable goal.
6 |
7 | ## Why do a Data Science Project?
8 |
9 | Before we go into the technical details of building a Data Science Project, we want you to ask yourself: Why do you want to do a Data Science Project?
10 |
11 | People's motives will vary. Answers can range from: *it will look good on my resume*, *I want to be able to brag about something that's attainable by other people online*, *I want to learn beyond what's taught to me in my classes*.
12 |
13 | Whatever the reason for your attendance in this group, you will realize that you are now connected to a community that aims to enter a newly emerging field in need of highly driven and analytical people with a love for all things data.
14 |
15 | The short answer is that in a Data Science Project you want to predict an outcome based on certain attributes. Doing a Data Science Project will make sense of your data, yielding a plethora of insights that you can then share with people who need them or are curious as to how the *data speaks to you*.
16 |
17 | ### Open Source
18 | Once you start the process of a Data Science project you will quickly be introduced to the concept of *open-source* (if you haven't already). This concept was introduced to me through **R**/**RStudio**, and I quickly fell in love with *open-source*. Once you have learned the tools and have published a project, using **RStudio** or **Python**, you can publish to [inertia7](http://www.inertia7.com/) or your GitHub repo, and you will realize that you have made your imprint on this growing community where anyone can learn from you if they so choose!
19 |
20 | Here I've provided a (non-exhaustive) list of *open-source* communities that provide resources at any skill level for people wanting to enter this field:
21 |
22 | - [RBloggers](https://www.r-bloggers.com/)
23 | - [Stack Overflow](http://stackoverflow.com/)
24 | - [Cross Validated](http://stats.stackexchange.com/)
25 | - [Kaggle](https://www.kaggle.com/)
26 |
27 | We can go on and on about resources, but regardless of how many resources we provide, people will still be left with an air of mystery as to what exactly a Data Science Project is. So the next section aims to dissect the process of doing a Data Science Project, and provide more resources at every step.
28 |
29 | ## Getting the Data
30 |
31 | We are assuming that for you to have gotten this far you have a basic understanding of some statistical tools/methods and are fairly knowledgeable with either **R**/**Python** as your tool for project building (granted we don't expect you to be experts on either but knowing how to manipulate data frames, conditional formatting, and data wrangling are a must to do projects).
32 |
33 | This part of the project building can be daunting due to the over-abundance of data sets/raw data.
34 |
35 | This step requires more thought than one might think, and will ultimately be decided by what your team is interested in. This can range from sports, video games, music, etc. If it exists, there is data available for it. And if there isn't, then you can be the first to collect data (more on this later!)
36 |
37 | Now if you're a beginner, we recommend using data sets that are *open-source* and have been used before for Data Science projects.
38 |
39 | This can range from:
40 | - [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/datasets.html)
41 | - Data sets found in the **R** base packages, found here: http://stat.ethz.ch/R-manual/R-devel/library/datasets/html/00Index.html.
42 | - If you are a bit more experienced, then we would definitely recommend the [Kaggle Data sets](https://www.kaggle.com/datasets)
43 | - API Calls to certain websites like **Twitter**, **Reddit**, **Google Maps** to name a few.
44 |
45 | A piece of advice when choosing a data set, when starting off, is to choose one that isn't too over the top. The best way to learn is to go through a simple data set and then apply what you learned to more complicated data sets once you become familiar with the process. For many of our projects on inertia7.com we chose data sets that were often used in our Statistics classes and in the online resources we mentioned, to slowly build up our repertoire and skill set.
46 |
47 | Start small and build your way up
48 |
49 | ## Data Wrangling
50 |
51 | This part varies from data set to data set. Once you have chosen your data set/collected your data, the next step is the process of cleaning the data, known as *Data Wrangling*. If you have chosen a data set that has been abundantly used in statistical courses/online Data Science resources, you will find that you can easily reference other people's work when it comes to cleaning your data set (**remember to cite every source you use**), as well as find justifications readily available for the transformations that were made.
52 |
53 | This step is crucial; often we take for granted the data sets we use, but one will find that data sets are often very messy and it is up to you to go through the process known as *Data wrangling*.
54 |
55 | ### Example: Normalization of our predictor variables
56 | Often in machine learning, we will realize that the different variables are measured in different scales.
57 |
58 | For example say we have a data set that has these two columns:
59 |
60 | - `Spending` - measured in percentage (i.e. 13.56% is written as 13.56)
61 | - `CostPerCapita` - measured in $1 millions (i.e. $1.3 million is written as 1,300,000)
62 |
63 | If we tried to fit models on this data set without transforming our data, we would quickly run into critical errors that hinder our results. Thus a possible solution is to scale our data so that the **mean** for both variables is 0 and the **standard deviation/variance** is 1.
64 |
65 | This process is known as **Normalization**.
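
As a quick illustration, here is a minimal sketch in **Python** (assuming the data sits in a pandas DataFrame with the two hypothetical columns above):

    import pandas as pd

    # Hypothetical data measured on very different scales, as described above
    df = pd.DataFrame({
        "Spending": [13.56, 8.20, 21.40],
        "CostPerCapita": [1300000, 850000, 2100000]
    })

    # Scale each column to mean 0 and standard deviation 1 (z-scores)
    normalized = (df - df.mean()) / df.std()
    print(normalized)

The same idea is implemented by `sklearn.preprocessing.StandardScaler` in **Python** and the built-in `scale()` function in **R**.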
66 |
67 | This is one of many processes that one should take into consideration before delving into a Data Science Project.
68 |
69 | Say you've gotten this far and quickly realized that you don't know when and when not to do transformations to your data set. This process does require a bit of analytical intuition, and there are many resources online that can help with the process.
70 |
71 | Let's pause the process here and say you don't think you have the skill set to make these important decisions. This is important to admit because if you are stuck here and you don't know how to continue a project, this can put a project on pause indefinitely.
72 |
73 | Our advice to you is not to be discouraged; let's just take a step back and not necessarily redesign your project, but essentially work with what you have.
74 |
75 | ## Data Visualization
76 | Something many people take for granted when going about Data Science Projects is *Data Visualization*. Also known as **Exploratory Analysis**, visuals are an important part of a project because they help gain readers who are interested in what the data has to say, usually people who don't have a strong data-based background. As you progress to more complicated projects the visualizations will take a back seat to the analytical aspect of your project, but if you're a beginner, visualizations are a powerful tool to draw other beginners and a wide array of people into reading your project, no matter how simple or trivial you might think it is.
77 |
78 | An example of this is a subreddit called [Dataisbeautiful](https://www.reddit.com/r/dataisbeautiful/), currently followed by 9,616,736 redditors. This subreddit focuses on simple visualizations of data collected by redditors. Many of these projects do not feature machine learning or advanced statistical inference, but there's a reason it has so many followers.
79 |
80 | Many people who do not possess a data-driven background still love to learn about data presented to them in a way they can understand, and a visualization is just that.
81 |
82 | ### Example: Exploratory Analysis on Text
83 | An example of an effective use of visualizations: say your team decided to do text analysis on the movie script for **Star Wars: A New Hope** through the process known as **Natural Language Processing** (NLP), but the algorithms you found online don't make sense to your team and you're having trouble moving forward. Instead of letting the project come to a complete halt, think of some easy descriptive statistical methods you can employ on your data set.
84 |
85 | Word clouds are a simple and effective way of showcasing words that are often said in a corpus, in our case the movie script. Employing the [word_cloud](https://github.com/amueller/word_cloud) module in **Python**, you can quickly reignite your project as shown below (this process took me all of 5 minutes to recreate using their sample scripts on GitHub):
86 |
87 |
88 |
89 | From here many other ideas come to mind that don't involve complicated machine learning algorithms (one of which is sketched after this list):
90 |
91 | - Bar chart showcasing the amount of dialogue each character has
92 | - Word Clouds for each main character
93 | - Bar Charts showcasing frequency of a certain word like **the Force**
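
For instance, a rough sketch of the word-frequency idea (assuming the same `a_new_hope.txt` script file used by the word cloud example above):

    from collections import Counter

    # Read the movie script and split it into lowercase words
    with open('a_new_hope.txt') as script:
        words = script.read().lower().split()

    # Print the ten most common words as a starting point for a bar chart
    for word, count in Counter(words).most_common(10):
        print(word, count)

From there, plotting the counts with matplotlib (as in the word cloud script) turns this into the bar charts described above.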
94 |
95 | This shows that Data Science projects don't have to be overly complicated; starting with exploratory analysis is okay and can be used as a springboard to more complicated and in-depth analysis.
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Winter Quarter Project Group - Data Science at UCSB
2 | ### Contributors: Raul Eulogio, David A. Campos, Jason Freeberg, Nathan Fritter
3 |
4 |
5 | ## In Memory of..
6 | The efforts of this quarter and the work done is dedicated to the memory of:
7 |
8 | + **Fernando Regino** (1993-2013)
9 | + **Bernardino De Jesus** (1993-2016)
10 | + **Ivan Garcia Vergara** (1991-2018)
11 | + **Erik Alonso** (1991-2009)
12 | + **Jorge Zarate** (1990-2008)
13 |
14 | "*When the lights shut off
15 |
16 | And it's my turn to settle down
17 |
18 | My main concern
19 |
20 | Promise that you will sing about me*" - Kendrick Lamar
21 |
22 | Thank you to everyone who participated this quarter
23 |
24 | ## Abstract
25 |
26 | This repository serves as an itinerary for the Project Groups for Winter Quarter for the **Data Science at UCSB** organization, providing a weekly overview as well as the resources used within the weekly meetings.
27 |
28 | **Contributors**:
29 | + [Raul Eulogio](https://www.linkedin.com/in/raul-eulogio-217069123) -> rauleulogio3 [at] gmail.com
30 | + GitHub: https://github.com/raviolli77/
31 | + [David Campos](https://www.linkedin.com/in/dcamposliz) -> dcampos.liz [at] gmail.com
32 | + GitHub: https://github.com/dcamposliz
33 | + Personal Site: http://davidacampos.com/
34 | + [Jason Freeberg](https://www.linkedin.com/in/jfreeberg) -> freeberg [at] umail.ucsb.edu
35 | + GitHub: https://github.com/JasonFreeberg
36 | + Personal Site: JasonFreeberg.github.io
37 | + [Nathan Fritter](https://www.linkedin.com/in/nathan-fritter) -> nathan.fritter [at] gmail.com
38 | + GitHub: https://github.com/Njfritter
39 |
40 | # Table of Contents
41 | * [Week 2](#week-2-introductions)
42 | * [Week 3](#week-3-why-do-a-data-science-project)
43 | * [Week 4](#week-4-project-iterationgithub)
44 | * [Week 5](#week-5-project-iteration)
45 | * [Week 6](#week-6-project-iterationblockers)
46 | * [Week 7](#week-7-i-didnt-prep-this-week)
47 | * [Week 8](#week-8-presentationflex-day)
48 | * [Week 9](#week-9-quarter-wrap-up)
49 |
50 | # Lesson Plan
51 | ## Week 2: Introductions
52 | + Who are you?
53 | + Name
54 | + Major
55 | + Year
56 | + Where are you from?
57 | + Why are you here?
58 | + What are you trying to accomplish in life?
59 | + What are you trying to accomplish here?
60 | + What are you trying to learn?
61 | + What project(s) are you working on today?
62 | + What recent failure have you had?
63 | + Strengths & weaknesses as it relates to data science or in general?
64 | **Storm**:
65 | Goal of this group is to ultimately get projects finished and published
66 | + **WHY**
67 | + We found that it is by working on projects that you actually get to learn and begin to understand how to do data science
68 | + Brainstorm on data science ideas
69 | + Write them on a piece of paper
70 | + Go to the front of the group and present it
71 | + Have people walk up to you/you walk up to people, persuade people to be in your group
72 |
73 | **Collide**:
74 | + Form teams
75 | + Mix up grade levels/experience
76 | + Discuss **weaknesses**, **technologies**, **expertise**, **talent**
77 | + Pick **R** or **Python**
78 | + Establish Communication channels
79 | + Facebook
80 | + GroupMe
81 | + Slack
82 | + GitHub
83 | + Phone
84 | + Gmail/Email
85 |
86 | **Homework**:
87 | + Find an interesting project online/from inertia7.com
88 | + Read through contents
89 | + Catch up on your **R**/**Python** skills with DataCamp
90 | + Get to know each other
91 | + Become Familiar with GitHub/create account (for more beginner level/those who weren't here, we'll go into more detail in a later meeting)
92 |
93 | **Links to Resources** discussed in meeting:
94 | + **R**/**RStudio**: https://www.rstudio.com/
95 | + **Python**: https://www.python.org/
96 | + **Inertia7**: http://www.inertia7.com/
97 | + **GroupMe**: https://groupme.com/en-US/
98 | + **GitHub**: https://github.com/
99 | + **Slack**: https://slack.com/
100 | + **DataCamp**: https://www.datacamp.com/
101 |
102 | ## Week 3: Why do a Data Science Project?
103 |
104 | **Some preliminaries**
105 | + Does everyone in your team have:
106 | + **Slack** account/channel within the *dsprojectgroup* **Slack**?
107 | + **GitHub** account?
108 | + **R**, **Python**, **SQL** set up on their machine? (Whatever y'all plan on using)
109 | + Speak about versions for language and packages/modules. Especially in **Python**:
110 | + Speak to me after if you need more clarification
111 | + If you can answer these questions then you're fine: Do you know what a virtual environment is? And do you know its use?
112 | + If you don't know, have your team speak to me after.
113 | + Which interface will your team be using, i.e. **RStudio** or **Jupyter Notebook** for **R**?
114 | + Introduce the concept of **Stand Ups**
115 | + Structure of an effective **Stand Up**:
116 | + What did I accomplish last meeting?
117 | + What will I do today?
118 | + What obstacles are impeding my progress? (Blockers)
119 |
120 | + Document **everything** in your **Slack** channel
121 | + If you used a site to review **R**, **Python**, **html**, etc. post it within your group's channel
122 | + Read a cool article relating to your project; document it on **Slack**
123 | + This will become important when citing sources, creating documentation for project, and just a good habit to develop since people deserve credit for helping you!
124 |
125 | + **Trello**
126 | + Nathan will introduce the interface and how to integrate it into your workflow
127 | + We might create a markdown file explaining in more detail if people do not understand how to use it right away (but is pretty easy to use).
128 | + Resources:
129 | + [Trello Tutorial](https://trello.com/b/I7TjiplA/trello-tutorial)
130 | + [Trello Youtube Tutorial](https://www.youtube.com/watch?v=7najSDZcn-U)
131 |
132 | ## What is a **Data Science Project**?
133 |
134 | + How to do a **Data Science Project**?
135 |
136 | + Steps of a **Data Science** project:
137 | + Getting Data
138 | + **UCI Machine Learning Repository**
139 | + **Kaggle** datasets
140 | + Cleaning data/sanity checks
141 | + Exploratory Analysis
142 | + Trends in response and predictor variables
143 | + Modeling (Choosing Supervised Vs. Unsupervised Learning)
144 | + Model Validation
145 | + Sharing Results
146 | + Inertia7.com
147 | + GitHub repo with a nice README.md
148 | + Jupyter/RMarkdown Notebook
149 |
150 | If you don't think you can do a project on your own right off the bat, try doing a project from **Inertia7**!
151 |
152 | + [Scrape a Webpage - Python](http://www.inertia7.com/projects/scrape-webpage-python)
153 | + [Iris Flower Classification](http://www.inertia7.com/projects/iris-classification-r)
154 | + [Modeling Home Prices](http://www.inertia7.com/projects/regression-boston-housing-r)
155 | + [Forecasting the Stock Market](http://www.inertia7.com/projects/time-series-stock-market-r)
156 | + [Sentiment Analysis on Twitter](http://www.inertia7.com/projects/sentiment-analysis-clinton-trump-2016)
157 |
158 |
159 | Here are some of my own repos where I have projects that aren't published on **Inertia7**:
160 | + https://github.com/raviolli77/pythonTutorialsVinceLa
161 | + https://github.com/raviolli77/machineLearning_Flags_Python
162 | + https://github.com/raviolli77/classification_iris
163 | + https://github.com/raviolli77/machineLearning_breastCancer_Python
164 | + https://github.com/raviolli77/ggplot2_Tutorial_R
165 |
166 | Discuss what their project can look like given the structure of what they just hacked
167 | + Fill in the **Steps of a Data Science Project**
168 |
169 | **Homework**:
170 | For this section, we can be lenient as to when this gets done. For the more advanced groups, we expect you to be able to do this on your own. For the newer groups, you can wait until the next meeting and have me or other members help with the process.
171 | + Build a proposal for your own project
172 | + Get comfortable using **Markdown** notation
173 | + Create a repo in the [Data Science Project Groups GitHub Account](https://github.com/UCSB-dataScience-ProjectGroup) including these steps:
174 | + Abstracts
175 | + Finish filling the **Steps of a Data Science Project**
176 | + Data Sources? Examples include, but are not limited to:
177 | + Kaggle
178 | + UCI
179 | + Data sets found in **R**
180 | + Quandl
181 | + API calls:
182 | + Wikipedia
183 | + Twitter
184 | + Google Maps
185 | + Saint Louis Federal Reserve
186 | + Google Analytics
187 | + If not, then select a project from the suggested list or talk to me for project ideas
188 | **Links to Resources** discussed in meeting:
189 | + **R**/**RStudio**: https://www.rstudio.com/
190 | + **Python**: https://www.python.org/
191 | + **Inertia7**: http://www.inertia7.com/
192 | + **GitHub**: https://github.com/raviolli77
193 | + **Trello**: https://trello.com/
194 | + **UCI ML Database**: https://archive.ics.uci.edu/ml/datasets.html
195 | + **Kaggle Datasets**: https://www.kaggle.com/datasets
196 | + **R Data sets**: http://stat.ethz.ch/R-manual/R-devel/library/datasets/html/00Index.html
197 | + **Quandl**: https://www.quandl.com/
198 | + **Wikipedia API**: https://www.mediawiki.org/wiki/API:Main_page
199 | + **Twitter API**: https://dev.twitter.com/docs
200 | + **Saint Louis Federal Reserve**: https://fred.stlouisfed.org/
201 | + **Google Analytics**: https://www.google.com/analytics/#?modal_active=none
202 | + **Jupyter Notebook**: http://jupyter.org/
203 | + **R Markdown**: http://rmarkdown.rstudio.com/
204 |
205 | ## Week 4: Project Iteration/GitHub
206 | **Some Preliminaries**:
207 | + Are people interested in a **Python Hackathon**?
208 | + If so, when and where work best?
209 |
210 | + Has your team created a **GitHub** Repo for your project within the organizational **GitHub** (Source: https://github.com/UCSB-dataScience-ProjectGroup)?
211 | + Does it have a **ReadMe** explaining the Steps of a **Data Science** Project?
212 | + Did you all agree which versions/interface for the language you will be using?
213 | + Did you reach a conclusion of what models/approach you will take?
214 | + If not, give us an overview of what you plan to do; by the end of this meeting the project should be decided more or less
215 |
216 | **Team Resources**
217 | + Has your team...
218 | + Been in contact through **Slack**?
219 | + Been doing **Stand Ups**?
220 | + Been addressing issues in going about your project or any preliminary practice for your project?
221 | + Asked for help?
222 |
223 | ## **GitHub Crash Course**
224 |
225 | Here we're giving a quick overview of how **GitHub** works. The purpose is for this to be used as a rudimentary guide for those of you who are new to **GitHub**. We could spend an entire day going over the **GitHub** workflow, but for now we're concerned with just getting your feet wet, and soon creating a repo for your project if you haven't already.
226 |
227 | **NOTE**: One can spend an entire day learning **git**, so we'll leave that out for this iteration. We will provide resources for **git** below!
228 |
229 | + **Step 1:**
230 | + Create a **GitHub** account (Should go without saying, but you'd be surprised.)
231 | + **Step 2:**
232 | + You should create a *myProject* folder where you keep all your projects. This will help with organization later on, when you'll be doing a shit load of projects, and when publishing them!
233 | + Create a folder for your project where you will include things like, but not limited to:
234 | + **README** file - This file will be other people's introduction to your project, so make it pretty and easy to follow! (in .md format).
235 | I use [Sublime Text](https://www.sublimetext.com/) to create and edit **README** files (there's a plethora of text editors like **Notepad++**, **Atom**, etc.; really it's all personal preference)
236 | + Script files - These files will be in the format of the language you are doing your project in, so either an **R** file or a **Python** file (in **.R**, **.py**, or **.sql** format)
237 | + Data file (not sure what the proper name for this is, will edit later) - This is where your data is stored if you are using a static data source; typically it can be:
238 | + **.csv** file
239 | + **.txt** file
240 | + **.JSON** file
241 | + **.db** file
242 | + Image folder - For organizational purposes we usually create an image folder which is where we store all images produced in the project if we plan on hosting them or making them viewable without having to run/save the code. Inside this folder you will find static image files like:
243 | + **.png** files (favored for producing statistical images)
244 | + **.jpeg**
245 | + **.gif**
246 | + Once you get more acquainted with **GitHub** there will be more files that you will add, but for this example these will do
247 | + **Step 3**:
248 | + Once you have the folder for your project and all the respective files you wish to include in the repo, go to the main page of **GitHub** and click the green button that says *New repository*
249 | + Add the Repo name: we usually name our repos as such
250 | + *statisticalModel_DataSetDescription*
251 | Ex.
252 | + *classification_IrisFlowersR*
253 | + *regression_bostonHousingR*
254 | + Add a description: give a brief overview of what your project will be about to help give people context.
255 | Ex.
256 | + *A collection of alternate R markdown templates*
257 | + *Repo for a quick ggplot2 tutorial for Exploratory Analysis using Jupyter Notebook and R script*
258 | + Leave it as public: Make it accessible to everyone
259 | + **Initialize with a README** - ALWAYS **initialize with a README**: this acts as an instructional overview for your project
260 | + You typically include steps that were required that you can't express in your code (i.e. Creating a plotly account, steps needed if there are multiple scripts in your project)
261 | + A brief overview of your data set and statistical models used in the project
262 | + This will help later on if you plan to publish on inertia7!
263 | + Updates made to your project since its last iteration
264 | + Look at the inertia7 README's for some concrete examples
265 | + **Step 4**:
266 | Since you will be working in a team you have to be familiar with **branches**. **Branches** are different versions of the project, and a good way for your group to work on the project without fucking up the **master branch**.
267 |
268 | + (**Master Branch**: This is the version the world will see and use, so make sure that this **branch** is the best iteration/is deployable)
269 | + Create a **branch** and call it something like **ravi_branch**
270 | + You and each person in your team should have a branch that shows your own iteration of the project, in case you happen to go ahead or test something out that you haven't spoken with your teammates about yet.
271 | + **Step 5:**
272 | Say you and your group are in agreement that your **branch** is the version you want on the **master branch**, the next step is creating a **Pull Request**.
273 |
274 | + (**Pull Request**: Allows people to review any changes made in a project, make modifications before the **master branch** changes, and overall help a team work efficiently)
275 | + Go into the **branch** you want to merge so **ravi_branch**
276 | + Click **New Pull Request**
277 | + Here you will see the two **branches** being compared: the **base** will typically be the **master branch** and the compared branch will be **ravi_branch** in our example.
278 | + Add a description of some of the changes you made!
279 | + **GitHub** will give you an overview of the changes made in files
280 | + Once you have reviewed everything click **Create pull request**
281 | + This is where other teammates will be notified of you wanting to merge your **branch** and the **master branch**
282 | + If everyone is in agreement you click **Merge pull request**
283 | + Then, click **Confirm merge** and the **master branch** will now have the same contents as **ravi_branch**
284 |
285 | That's a quick and rough tutorial on working in **GitHub**. It doesn't go over everything, but it should give context as to how to work as a team using **GitHub** and **branches**. I have provided sources that go into more detail and definitely explain it better, so I would suggest reading up on them!
286 |
287 | **Homework**:
288 | + Will depend on conversations we have on Wednesday to see where your team is at
289 | + Have a repo within the organizational repo by the end of today!
290 | + Create **branches** for each teammate
291 | + Set up a meeting time outside of Wednesday
292 |
293 | **Links to Resources** discussed in meeting (**NOTE** (2/14): Moved **GitHub**-related resources to *Recommended Resources for entire quarter*):
294 |
295 | ## Week 5: Project Iteration
296 |
297 | **Some Preliminaries**:
298 | + **Python Hackathon** (Workshop)
299 | + Steps needed to be taken before we can start/set up the **hackathon**:
300 | + Install **Python3.X**
301 | + Use a *Virtual Environment* for your project if it will be in **Python**
302 | + Fill out the Google survey sent last night:
303 | + We need to gauge date, time, and funds to make sure it will run smoothly
304 |
305 | + Rewards!!!
306 | + **HG Data Hackathon**
307 | + Date proposition: April 21st from 2pm to 10pm
308 | + Most likely broken into 5-6 teams, pairing an HG Data Engineer with the respective teams
309 | + Spoke with Jason
310 | + Informal presentation of projects with *congratulatory refreshments*
311 | + Reward for **Best Data Visualization**
312 | + Reward for **Best insight/best modeling**
313 | + Reward for **Best presentation**
314 | + Jun Seo can speak about the presentation of projects for library staff!
315 |
316 | + Major issues to address for today:
317 | + Does every team have a *requirements.txt* for their project?
318 | + Some README's need more detail (I will go about doing informal interviews today to each group)
319 | + By today your team should have decided which algorithms, methods, and **Python** version you will use.
320 | + Branches for team members
321 | Depending on attendance, we want today to really show us the early iteration of your project, so:
322 | + Have a script with modules you will be using
323 | + Data set attached to your repo
324 | + Algorithms you will use
325 |
326 | ## Week 6: Project Iteration/Blockers
327 |
328 | **Some Preliminaries**:
329 | + **Python Hackathon** (Workshop)
330 | + Confirmed Date: **2/25/2017** at **10 a.m**.
331 | + Buy shirts to rep!
332 | + Contact me after to get them from other officer. I can take Venmo!
333 | + Rewards (Reiterate because a lot of people were MIA)!!!
334 | + **HG Data Hackathon**
335 | + Date proposition: April 21st from 2pm to 10pm
336 | + Most likely broken into 5-6 teams, pairing an HG Data Engineer with the respective teams
337 | + Informal presentation of projects with *congratulatory refreshments* near end of this quarter
338 | + Reward for **Best Data Visualization**
339 | + Reward for **Best insight/best modeling**
340 | + Reward for **Best presentation**
341 | + The informal presentation can be a prep for the presentation to the Library faculty
342 | + Most likely scheduled at the start of next quarter (Ask Jun-Seo if you have any questions)
343 | + Project will be posted in the newest iteration of **int7x** (inertia7)!
344 | + Team Management
345 | + Word from me regarding team
346 | + We need teams to start applying **Stand Ups** now (Mandatory)
347 | + Must be done before starting your sessions and immediately when your team finishes the meet-up.
348 | + Will demonstrate again with more feedback given to teams
349 | Today will serve as an important catch-up day for many teams since midterm season was (is) around.
350 | + I will go around to teams and ask about your project relating to:
351 | + repository
352 | + code
353 | + README
354 | Today will be focused mostly on iterating projects.
355 |
356 | ## Week 7: I didn't prep this week
357 |
358 | Carry on. Nothing to see here.
359 |
360 | ## Week 8: Presentation/Flex Day
361 |
362 | For this week I decided we are going to do a surprise project presentation.
363 |
364 | **Announcements**:
365 | Thank you to everyone who participated in the **Python Workshop**
366 |
367 | I will need every team to do the following:
368 | + Update all scripts on their **GitHub** repo in the **ProjectGroupWinter2017**.
369 | + README.md
370 | + script.py
371 | + All appropriate data files (i.e. csv files, txt files, etc.)
372 | + Images (inside images folder) that were produced for this project
373 | + Be prepared to pitch your idea to me.
374 | + Sell that shit.
375 | + Why is your project relevant to *Data Science* and the data community as a whole?
376 | + (Not 100%) I would like to see some scripts/notebooks being run during the presentation, but due to time constraints we might just use what's on **GitHub**.
377 |
378 | Each group presentation should be no longer than **15 minutes**
379 |
380 | ## Week 9: Quarter Wrap-Up
381 |
382 | ### Final thoughts on quarter
383 | + Thank You's
384 | + Dedications
385 | + Food for thought for next quarter
386 |
387 | **Some Preliminaries**:
388 |
389 | ### FACTOR PI sale
390 |
391 |
392 |
393 | Only $1 apiece! Go show some support to our friends at the **Female Actuarial Association**. Find the event link [Here](https://www.facebook.com/events/1615213815453503/)
394 |
395 | + Location: **SRB**
396 | + Date: March 14, 2017
397 | + Time: 11AM - 3PM
398 |
399 | ### Farmer's Data Talk
400 |
401 |
402 |
403 | The Org. wants a packed house for the **Farmer's Insurance Data Talk** so let's all make it out! Facebook event link [Here](https://www.facebook.com/events/589388027918002/)
404 |
405 | + Location: UCen SB Harbor Room
406 | + Date: March 9, 2017 (So tomorrow)
407 | + Time: 6PM - 8PM
408 | + Will **NOT BE FOCUSED** on **actuarial**-based stuff (will focus on **Natural Language Processing**, so highly relevant to our group)
409 |
410 | ### HG Data Hackathon
411 |
412 | + Location: HG Data Offices
413 | + Time: April 21st
414 | + More on this later
415 | + Will most likely work on a tutorial with Calvin during Spring Break to help prep
416 |
417 |
418 | ### Chapman Data Fest
419 |
420 | + Location: Chapman University
421 | + Time: April 21st as well
422 | + Team of 5 to attend
423 | + **NOTE**: Jason wants the people who attend the **Chapman Data Fest** to be of different class levels (i.e. freshman, sophomore, junior, senior, and super senior)
424 | + Let me know if you're interested in this event! Link for Event [Here](https://events.chapman.edu/28206)
425 |
426 | ### Library presentations
427 | We have a confirmed date!
428 |
429 | + Location: Same location, so here
430 | + Time: April 26th at 7pm
431 | + Need y'all to use today to prep and keep track of progress!
432 | + Make **Github** repos pretty
433 | + Code readable
434 | + Write nice docs
435 | + Make plots pretty with titles, axis labels, and legends
436 |
437 | Let's really flex for this. Everyone worked hard!
438 |
439 |
440 | We would like your team to use inertia7 to present your projects, so this is a good segue into the next section
441 |
442 | ### inertia7 User Testing
443 |
444 | We know dead week and finals are fast approaching, but we were wondering if anyone would be interested in user-testing the new iteration of inertia7 to give constructive criticism.
445 |
446 | + Doesn't have to be publishing a project. Can just play with the app
447 | + If interested, talk to me or David
448 | + Follow [Link](https://docs.google.com/forms/d/e/1FAIpQLScX8KK6z3ji6OLKlMZ0GS64dbsAJAGmmQLGbihEd5d3wA8o6g/viewform?c=0&w=1) to apply for credentials
449 |
450 | ## Wrap-Up
451 | Things needed by the end of this meeting:
452 | + Updated Scripts
453 | + Updated README's
454 | + Add any appropriate images
455 | + Create plotly account to publish plotly graphs (if applicable)
456 | + To-do list detailing what is still needed for your project
457 | + Keep in contact with partners over break.
458 | + If you're bored during break work on the project!
459 |
460 | **IMPORTANT TO NOTE**: Since finals are approaching, your group needs to set this up in its repo, since there will be a gap period of 3 weeks. I need to know where your team is at and the context of this. You **CAN'T** leave until your team shows me the repo and the outline of what is done and what isn't done.
461 |
462 | Three weeks is a long time, and if there's no structure as to where you're at, you will forget/it will be hard to pick back up.
463 |
464 | For those of you who feel you are ready to iterate on the presentation part of your project talk to me by the end of today's meeting.
465 |
466 | Again thank you for a wonderful quarter and hope to see you all again next quarter!
467 |
468 | ## **Recommended Resources for entire quarter**:
469 |
470 | + **README** Resources:
471 | + [README wiki](https://en.wikipedia.org/wiki/README)
472 | + [Markdown Cheatsheet](https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet)
473 | + [inertia7 Examples](http://www.inertia7.com/)
474 | + [Time Series Analysis README](https://github.com/inertia7/timeSeries_sp500_R/blob/master/README.md)
475 | + [Regression Analysis README](https://github.com/inertia7/regression_bostonHousing_R/blob/master/README.md)
476 | + [noffle's Art of README article](https://github.com/noffle/art-of-readme)
477 | + [More resources about README's](https://github.com/matiassingers/awesome-readme)
478 | + **GitHub** Resources:
479 | + [Hello World Tutorial](https://guides.github.com/activities/hello-world/)
480 | + [GitHub Youtube Channel](https://www.youtube.com/githubguides)
481 | + [Understanding the GitHub Flow](https://guides.github.com/introduction/flow/)
482 | + [Creating and Deleting Branches](https://help.github.com/articles/creating-and-deleting-branches-within-your-repository/)
483 | + **Git** Resources:
484 | + [Set Up Git Article](https://help.github.com/articles/set-up-git/)
485 | + [Create a Repo Article](https://help.github.com/articles/create-a-repo/)
486 | + [Fork A Repo](https://help.github.com/articles/fork-a-repo/) Not discussed in this meeting but important part of **GitHub** workflow
487 | + [Be social](https://help.github.com/articles/be-social/) (Great place to discover cool shit on **GitHub**)
488 | + [David's Git Repo](https://github.com/dcamposliz/learnGit)
489 | + **Text Editors** Resources:
490 | + [Sublime Text](https://www.sublimetext.com/)
491 | + [Notepad++](https://notepad-plus-plus.org/)
492 | + [Atom](https://atom.io/)
493 | + [vim](http://www.vim.org/download.php)
494 |
495 | + **Python** Resources:
496 | + [Python for Data Analysis](http://shop.oreilly.com/product/0636920023784.do) (Brush up on **NumPy** and learn **Pandas** from the man who created it!)
497 | + [Vincent La's Personal Website](http://vincela.com/) (Raul's Note: Great place to review/learn **Python** if you're really *rusty*)
498 | + [Python Documentation](https://docs.python.org/3/) (For more advanced users, the documentation for the programming language is a clutch resource)
499 | + [Learn Python the Hardway](https://learnpythonthehardway.org/book/) (Haven't gone through it yet, will soon, but a dank resource for learning Python)
500 | + [Yhat](https://www.yhat.com/) (Great resource for machine learning application with **Python**)
501 | + [David's Repo: learnPython](https://github.com/dcamposliz/learnPython)
502 | + [Hitchhiker's Guide to Python](http://docs.python-guide.org/en/latest/)
503 | + [Sklearn Docs](http://scikit-learn.org/stable/)
504 | + [Plotly examples in Python](https://plot.ly/python/)
505 | + **R** Resources:
506 | + [R-bloggers](https://www.r-bloggers.com/) (Great place to see people contributing projects and tutorials by real **R** users)
507 | + [ggplot2 docs](http://docs.ggplot2.org/current/)
508 | + [ggplot2 Cheat Sheet](https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf) (For visualizations)
509 | + [Quick-R](http://www.statmethods.net/)
510 | + [Plotly examples in R](https://plot.ly/r/)
511 | + [R for Data Science](http://r4ds.had.co.nz/index.html) (Learn from some of the **R** greats including Hadley Wickham, creator of many famous **R** packages)
512 | + [An Introduction to Statistical Learning with R](http://www-bcf.usc.edu/~gareth/ISL/) (Great book used in many UCSB PSTAT Classes)
513 |
514 | + Misc.
515 | + [Kaggle](https://www.kaggle.com/) (Great resource for all things data science)
516 | + [DataCamp](https://www.datacamp.com/)
517 | + [Analytics Vidhya](https://www.analyticsvidhya.com) (Lot of great tutorials relating to machine learning)
518 | + [Stack Overflow](http://stackoverflow.com/) (Stack overflow is love, Stack Overflow is life)
519 | + [w3schools tutorials](https://www.w3schools.com/) (Great place to learn other important tools like, but not limited to: html, SQL (I used this one a lot), website development)
520 |
521 |
--------------------------------------------------------------------------------