├── examples ├── wordCloud.png └── script.py ├── pythonHackathon ├── datePreference.png └── pythonHackathon.md ├── script.py ├── StepsofaDataScienceProject.md ├── StepstoCreatingDataScienceProject.md └── README.md /examples/wordCloud.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/raviolli77/dataScience-UCSBProjectGroup-Syllabus/HEAD/examples/wordCloud.png -------------------------------------------------------------------------------- /pythonHackathon/datePreference.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/raviolli77/dataScience-UCSBProjectGroup-Syllabus/HEAD/pythonHackathon/datePreference.png -------------------------------------------------------------------------------- /script.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | """ 3 | Minimal Example 4 | =============== 5 | Generating a square wordcloud from the Star Wars: A New Hope using default arguments. 6 | 7 | All code generated by writers of word_cloud module (https://github.com/amueller/word_cloud) 8 | Used for educational purposes 9 | """ 10 | 11 | from os import path 12 | from wordcloud import WordCloud 13 | 14 | d = path.dirname(__file__) 15 | 16 | # Read the whole text. 17 | text = open(path.join(d, 'a_new_hope.txt')).read() 18 | 19 | # Generate a word cloud image 20 | wordcloud = WordCloud().generate(text) 21 | 22 | # Display the generated image: 23 | # the matplotlib way: 24 | import matplotlib.pyplot as plt 25 | #plt.imshow(wordcloud) 26 | #plt.axis("off") 27 | 28 | # lower max_font_size 29 | wordcloud = WordCloud(max_font_size=40).generate(text) 30 | plt.figure() 31 | plt.imshow(wordcloud) 32 | plt.axis("off") 33 | plt.show() -------------------------------------------------------------------------------- /examples/script.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | """ 3 | Minimal Example 4 | =============== 5 | Generating a square wordcloud from the Star Wars: A New Hope using default arguments. 6 | 7 | All code generated by writers of word_cloud module (https://github.com/amueller/word_cloud) 8 | Used for educational purposes 9 | """ 10 | 11 | from os import path 12 | from wordcloud import WordCloud 13 | 14 | d = path.dirname(__file__) 15 | 16 | # Read the whole text. 17 | text = open(path.join(d, 'a_new_hope.txt')).read() 18 | 19 | # Generate a word cloud image 20 | wordcloud = WordCloud().generate(text) 21 | 22 | # Display the generated image: 23 | # the matplotlib way: 24 | import matplotlib.pyplot as plt 25 | #plt.imshow(wordcloud) 26 | #plt.axis("off") 27 | 28 | # lower max_font_size 29 | wordcloud = WordCloud(max_font_size=40).generate(text) 30 | plt.figure() 31 | plt.imshow(wordcloud) 32 | plt.axis("off") 33 | plt.show() -------------------------------------------------------------------------------- /pythonHackathon/pythonHackathon.md: -------------------------------------------------------------------------------- 1 | # Python Hackathon 2 | ## Function of Data Science at UCSB Project Group 3 | 4 | **Contributors**: 5 | + Raul Eulogio 6 | 7 | Necessary to discuss with everyone: 8 | + Date(s) (Most likely dates since we have a large preference for two dates) 9 | + Location - On Campus 10 | + Time 11 | + Earliest works best, but how early is the issue 12 | + Sign-In 13 | + Pitch in? 14 | + Coffee 15 | + Snacks? 
16 | + Anything else to add here: 17 | 18 | ## **Issues to Address**: 19 | 20 | The purpose of this file is to address issues with setting up **Python** *before* attending the **Hackathon**. If too many people are troubleshooting installations at the **hackathon**, it defeats the **hackathon**'s intent, which is to collaborate on learning and teaching **Python** for data analysis, not on setting it up. 21 | 22 | So here is a brief overview of what should be done before the **hackathon** so that we can maximize our time learning **Python**. 23 | 24 | ## Downloading Python 25 | Install **Python 3.X** onto your computer. 26 | ### Mac OS 27 | **Mac OS** comes with **Python 2.7** pre-installed, so you need to download **Python 3.X**. Here's a quick run-through, although we recommend reading this [Guide](http://docs.python-guide.org/en/latest/starting/install/osx/) as well as the [Python Docs](https://www.python.org/). 28 | 29 | Installing **Python 3.X** from the command line using Homebrew (a popular package manager for **Mac OS**) is simply 30 | 31 | brew install python3 32 | 33 | That installs **Python 3.X**. To test it, run the following in the terminal: 34 | 35 | python3 36 | 37 | This should open a **Python 3.X** interpreter (if you get an error, Google it and try to diagnose the issue, or contact us). Now run the command 38 | 39 | quit() 40 | 41 | This should close **Python**. Next, let's test `pip3` (the method we will use extensively for downloading third-party modules) by installing a popular module used for data analysis, pandas: 42 | 43 | pip3 install pandas 44 | 45 | You should see the installation take place, and you're set! 46 | 47 | ### Windows 48 | I unfortunately don't have much experience installing **Python** on Windows, but here's a tutorial from [The Hitchhiker's Guide to Python](http://docs.python-guide.org/en/latest/starting/install/win/) for doing so. 49 | 50 | **Important**: that guide's instructions are for **Python 2.7**. The resource will be mentioned later as well (we think it's a good resource; if you find any others, let us know so we can add them here!). 51 | 52 | ## Virtual Environments 53 | As you venture into the world of programming and data science, it is necessary to use and understand *Virtual Environments*. There are two *Virtual Environment* tools we are familiar with; the choice between them is really personal taste. 54 | 55 | + [**Anaconda**](https://www.continuum.io/downloads) - Popular and advertised as a *Virtual Environment* manager for data science teams. Last quarter, Jason told people to download this for their *Virtual Environments*, so if you already have it, go ahead and use it. 56 | + [**Virtualenv**](http://docs.python-guide.org/en/latest/dev/virtualenvs/) - This tool is more general-purpose in that it is not tied to a specific niche, so `virtualenv` works with any **Python** project. If you follow the tutorial provided it should be pretty straightforward to apply, and if you have any questions contact me. 57 | 58 | **Recall**: Install `virtualenv` for **Python 3.X** using `pip3`, not `pip`. 59 | 60 | **Important to Note**: The reason we are stressing the importance of *Virtual Environments* is that we want your code to be reproducible. When people want to replicate your project, they need to know what **Python** version and what module versions you used.
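A minimal way to record those versions, assuming you are working inside your project's *Virtual Environment* and using `pip3` as above, is:

```bash
# Record the Python version you are using
python3 --version

# Write every installed module and its version to a file you can commit
pip3 freeze > requirements.txt

# Anyone replicating your project can then install the exact same versions
pip3 install -r requirements.txt
```

Committing the resulting `requirements.txt` to your repo lets teammates rebuild the same environment with a single command.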
Oftentimes, when versions aren't specified, someone trying to replicate your project with an older **Python** version will quickly run into a wall of errors, so you'll be doing people a huge favor by specifying which versions of **Python** and of each module you used. 61 | 62 | ## Conclusion 63 | To have the most productive **Hackathon** possible, please handle these issues **before** the day of: 64 | 65 | + Have **Python 3.X** on your computer 66 | + Have a *Virtual Environment* set up on your computer (we may be lenient and spend some time teaching how to work with *Virtual Environments*, since at first they can be a daunting concept) 67 | -------------------------------------------------------------------------------- /StepsofaDataScienceProject.md: -------------------------------------------------------------------------------- 1 | # How to do a Data Science Project 2 | 3 | Throughout the history of this organization we have emphasized the importance of creating projects. The biggest issue we've seen people face is that they want to do a project but don't know how to start one. This roadblock prevents people from moving forward and will often make or break a team. 4 | 5 | To put it bluntly, we can't let this slide anymore. We hope that this newly formatted **Project Group** bridges the gap and makes doing a project an attainable goal. 6 | 7 | ## Why do a Data Science Project? 8 | 9 | Before we go into the technical details of building a Data Science Project, we want you to ask yourself: why do you want to do a Data Science Project? 10 | 11 | People's motives vary. Answers can range from '*it will look good on my resume*' and '*I want something I can show off that other people can find online*' to '*I want to learn beyond what's taught to me in my classes*'. 12 | 13 | Whatever the reason for your attendance in this group, you will realize that you are now connected to a community that aims to enter a newly emerging field in need of highly driven, analytical people with a love for all things data. 14 | 15 | The short answer to what a Data Science Project is for: you want to predict an outcome based on certain attributes. Doing a Data Science Project helps you make sense of your data, which yields a plethora of insights you can then pass on to people who need them or who are simply curious about what the *data says to you*. 16 | 17 | ### Open Source 18 | Once you start the process of a Data Science project, you will quickly be introduced to the concept of *open source* (if you haven't been already). This concept was introduced to me through **R/RStudio**, and I quickly fell in love with it. Once you have learned the tools and published a project, using **RStudio** or **Python**, you can publish it to [inertia7](http://www.inertia7.com/) or your GitHub repo; either way, you will have made your imprint on this growing community, where anyone can learn from you if they choose to!
19 | 20 | Here is a (non-exhaustive) list of *open-source* communities that provide resources at every skill level for people wanting to enter this field: 21 | 22 | - [RBloggers](https://www.r-bloggers.com/) 23 | - [Stack Overflow](http://stackoverflow.com/) 24 | - [Cross Validated](http://stats.stackexchange.com/) 25 | - [Kaggle](https://www.kaggle.com/) 26 | 27 | We could go on and on about resources, but no matter how many we provide, people will still be left with an air of mystery as to what exactly a Data Science Project is. So the next sections aim to dissect the process of doing a Data Science Project, and to provide more resources at every step. 28 | 29 | ## Getting the Data 30 | 31 | We are assuming that, to have gotten this far, you have a basic understanding of some statistical tools and methods and are fairly knowledgeable in either **R** or **Python** as your project-building tool (we don't expect you to be an expert in either, but knowing how to manipulate data frames, use conditional logic, and wrangle data is a must for doing projects). 32 | 33 | This part of project building can be daunting due to the over-abundance of data sets and raw data. 34 | 35 | This step requires more thought than one might think, and will ultimately be decided by what your team is interested in. That can range from sports to video games to music; if it exists, there is data available for it. And if there isn't, you can be the first to collect the data (more on this later!) 36 | 37 | If you're a beginner, we recommend using data sets that are *open-source* and have been used before in Data Science projects. 38 | 39 | Options include: 40 | - The [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/datasets.html) 41 | - Data sets found in the **R** base packages, listed here: http://stat.ethz.ch/R-manual/R-devel/library/datasets/html/00Index.html 42 | - If you have a bit more experience, we definitely recommend the [Kaggle Data sets](https://www.kaggle.com/datasets) 43 | - API calls to certain websites like **Twitter**, **Reddit**, and **Google Maps**, to name a few 44 | 45 | A piece of advice when choosing a data set, when starting off, is to choose one that isn't too over the top. The best way to learn is to work through a simple data set and then apply what you learned to more complicated data sets once you become familiar with the process. For many of our projects on inertia7.com we chose data sets that were often used in our statistics classes and in the online resources we mentioned, to slowly build up our repertoire and skill set. 46 | 47 | Start small and build your way up. 48 | 49 | ## Data Wrangling 50 | 51 | This part varies from data set to data set. Once you have chosen your data set or collected your data, the next step is the process of cleaning the data, known as *Data Wrangling*. If you have chosen a data set that has been used abundantly in statistical courses and online Data Science resources, you will find that you can easily reference other people's work when cleaning your data set (**remember to cite every source you use**), as well as find readily available justifications for the transformations that were made. 52 | 53 | This step is crucial; we often take the data sets we use for granted, but data sets are often very messy and it is up to you to go through the process known as *Data Wrangling*.
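To make this step concrete, here is a minimal sketch of the kind of sanity checks and cleaning it usually involves. It assumes **pandas** is installed and uses a hypothetical file `my_data.csv` and column name `Spending`; adapt it to your own data set:

```python
import pandas as pd

# Load the raw data (hypothetical file name, substitute your own)
df = pd.read_csv("my_data.csv")

# Basic sanity checks: dimensions, column types, and missing values
print(df.shape)
print(df.dtypes)
print(df.isnull().sum())

# Two common (but not universal) cleaning choices
df = df.drop_duplicates()            # remove exact duplicate rows
df = df.dropna(subset=["Spending"])  # drop rows missing a key column

# Save the cleaned version so the raw file stays untouched
df.to_csv("my_data_clean.csv", index=False)
```

Which of these steps is appropriate depends entirely on your data set, so document and justify every transformation you make.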
54 | 55 | ### Example: Normalization of our predictor variables 56 | Often in machine learning we will find that different variables are measured on different scales. 57 | 58 | For example, say we have a data set with these two columns: 59 | 60 | - `Spending` - measured as a percentage (i.e. 13.56% is written as 13.56) 61 | - `CostPerCapita` - measured in dollars (i.e. $1.3 million is written as 1,300,000) 62 | 63 | If we tried to fit models to this data set without transforming the data, we would quickly run into critical issues that hinder our results. One possible solution is to scale the data so that the **mean** of each variable is 0 and the **standard deviation/variance** is 1. 64 | 65 | This process is known as **Normalization** (you will also see it called standardization). 66 | 67 | This is one of many processes that one should take into consideration before delving into a Data Science Project. 68 | 69 | Say you've gotten this far and quickly realized that you don't know when to apply transformations to your data set and when not to. This does require a bit of analytical intuition, and there are many resources online that can help with the process. 70 | 71 | Let's pause here and say you don't think you have the skill set to make these important decisions. This is important to admit, because if you are stuck here and don't know how to continue, it can put a project on pause indefinitely. 72 | 73 | Our advice is not to be discouraged: take a step back and, rather than redesigning your project, work with what you have. 74 | 75 | ## Data Visualization 76 | Something many people take for granted in Data Science Projects is *data visualization*. Known as **Exploratory Analysis**, visuals are an important part of a project because they draw in readers who are interested in what the data has to say, usually people who don't have a strong data background. As you progress to more complicated projects, visualizations will take a back seat to the analytical aspects of your work, but if you're a beginner, visualizations are a powerful way to draw other beginners and a wide array of people into reading your project, no matter how simple or trivial you might think it is. 77 | 78 | An example of this is the subreddit [Dataisbeautiful](https://www.reddit.com/r/dataisbeautiful/), currently followed by 9,616,736 redditors. This subreddit focuses on simple visualizations of data collected by redditors. Many of these posts do not feature machine learning or advanced statistical inference, but there's a reason it has so many followers. 79 | 80 | Many people who do not possess a data-driven background still love to learn about data presented to them in a way they can understand, and a visualization is exactly that. 81 | 82 | ## Example: Exploratory Analysis on Text 83 | As an example of an effective use of visualizations, say your team decided to do text analysis on the movie script for **Star Wars: A New Hope** through **Natural Language Processing** (NLP), but the algorithms you found online don't make sense to your team and you're having trouble moving forward. Instead of letting the project come to a complete halt, think of some easy descriptive statistical methods you can apply to your data set. 84 | 85 | Word clouds are a simple and effective way of showcasing words that appear often in a corpus, in our case the movie script.
Employing the [Word_Cloud](https://github.com/amueller/word_cloud) module in **Python** you can quickly reignite fire to your project as shown below (This process took me all but 5 minutes to recreate using their sample scripts on GitHub): 86 | 87 | 88 | 89 | From here many other ideas come to mind that don't involve complicated machine learning algorithms: 90 | 91 | - Bar chart showcasing the amount of dialogue each character has 92 | - Word Clouds for each main character 93 | - Bar Charts showcasing frequency of a certain word like **the Force** 94 | 95 | Therefore we showcased that Data Science projects don't have to be overtly complicated, starting with exloratory analysis is okay and can be used as a jumping board to more complicated and indepth analysis. -------------------------------------------------------------------------------- /StepstoCreatingDataScienceProject.md: -------------------------------------------------------------------------------- 1 | # How to do a Data Science Project 2 | 3 | Throughout the history of this organization we have emphasized the importance of creating projects. The biggest issue that we've seen people face is that they want to do a project but they don't know how to start a project. This roadblock prevents people from moving forward and will often make or break a team. 4 | 5 | To put it bluntly we can't let this slide anymore. We hope with this newly formatted **Project Group** to bridge the gap and make doing a project an attainable goal. 6 | 7 | ## Why do a Data Science Project? 8 | 9 | Before we go into the technical details of building a Data Science Project, we want you to ask yourself: Why do you want to do a Data Science Project? 10 | 11 | People's motives will vary. Answers can range from: *it will look good on my resume*, *I want to be able to brag about something that's attainable by other people online*, *I want to learn beyond what's taught to me in my classess*. 12 | 13 | Whatever the reason for your attendance in this group, you will realize that you are now connected to a community that aims to enter a newly emerging field in need of highly driven and analytical people with a love for all things data. 14 | 15 | The short answer for the desired results for a Data Science Project is that you want to predict an outcome based on certain attributes. Doing a Data Science Project will make sense of your data which you can then use for a plethora of insights relating to your understanding of the data which you can then use to give insight to people who need it or are curious as to how the *data speaks to you*. 16 | 17 | ### Open Source 18 | Once you start the process of a Data Science project you will quickly be introduced to the concept of *open-source* (if you haven't already). This concept was introduced to me through **R/Rstudio** and I quickly fell in love with *open-source*. Once you have learned the tools and have published a project, using **RStudio** or **Python**, you can publish to [inertia7](http://www.inertia7.com/) or your GitHub Repo, but you will realize that you have made your imprint unto this growing community where anyone can learn from you if they so choose to! 
19 | 20 | Here I've provided a (non-exhaustive) list of *open-source* communities that provide resources at any skill level for people wanting to enter this field: 21 | 22 | - [RBloggers](https://www.r-bloggers.com/) 23 | - [Stack Overflow](http://stackoverflow.com/) 24 | - [Cross Validated](http://stats.stackexchange.com/) 25 | - [Kaggle](https://www.kaggle.com/) 26 | 27 | We can go on and on about resources, but regardless of how many resources we provide, people will still be left with an air of mystery as to what exactly a Data Science Project is. So the next section aims to dissect the process of doing a Data Science Project, and provide more resources at every step. 28 | 29 | ## Getting the Data 30 | 31 | We are assuming that for you to have gotten this far you have a basic understanding of some statistical tools/methods and are fairly knowledgeable with either **R**/**Python** as your tool for project building (granted we don't expect you to be experts on either but knowing how to manipulate data frames, conditional formatting, and data wrangling are a must to do projects). 32 | 33 | This part of the project building can be daunting due to the over-abundance of data sets/raw data. 34 | 35 | This step requires more thought than one might think, and will ultimately be decided as to what your team is interested in. This can range from sports, video games, music, etc. If it exists there is data available for it. And if there isn't then you can be the first to collect data (more on this later!) 36 | 37 | Now if you're a beginner we recommend using data sets that have are *open-source* and have been used before for Data Science projects. 38 | 39 | This can range from: 40 | - [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/datasets.html) 41 | - Data sets found in the **RStudio** base packages found here: http://stat.ethz.ch/R-manual/R-devel/library/datasets/html/00Index.html. 42 | - If you have a bit more experienced then we would definitely recommend the [Kaggle Data sets](https://www.kaggle.com/datasets) 43 | - API Calls to certain websites like **Twitter**, **Reddit**, **Google Maps** to name a few. 44 | 45 | A piece of advice when choosing a data set, when starting off, is to choose one that isn't to over the top. The best way to learn is to go through a simple data set and then expand what you learned unto more complicated data sets once you become familar with the process. For many of our projects in inertia7.com we chose data sets that were often used in our Statstical Classes and online resources, that we mentioned, to slowly build up our repertoire and skill set. 46 | 47 | Start small and build your way up 48 | 49 | ## Data Wrangling 50 | 51 | This part varies from data set to data set. Once you have chosen your data set/collected your data, the next step is the process of cleaning the data known as *Data Wrangling*. If you have chosen a data set that has been abundantly used in statistical courses/Online Data Science resources you will find that you can easily reference other people's work when it comes to cleaning your data set(**Remember to Cite every source you use**). As well as finding justifications readily available for the transformations that were made. 52 | 53 | This step is crucial; often we take for granted the data sets we use, but one will find that data sets are often very messy and it is up to you to go through the process known as *Data wrangling*. 
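The normalization step described in the next subsection can be written in a few lines. Here is a minimal sketch, assuming **pandas** and **scikit-learn** are installed and using the hypothetical `Spending` and `CostPerCapita` columns from that example:

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical data frame with two columns on very different scales
df = pd.DataFrame({
    "Spending": [13.56, 9.20, 21.75],            # percentages
    "CostPerCapita": [1300000, 250000, 980000],  # dollars
})

# StandardScaler rescales each column to mean 0 and unit variance
scaler = StandardScaler()
df[["Spending", "CostPerCapita"]] = scaler.fit_transform(df[["Spending", "CostPerCapita"]])

print(df.mean())       # approximately 0 for both columns
print(df.std(ddof=0))  # 1 for both columns
```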
54 | 55 | ### Example: Normalization of our predictor variables 56 | Often in machine learning, we will realize that the different variables are measured in different scales. 57 | 58 | For example say we have a data set that has these two columns: 59 | 60 | - `Spending` - measured in percentage (i.e. 13.56% is written as 13.56) 61 | - `CostPerCapita` - measured in $1 millions (i.e. $1.3 million is written as 1,300,000) 62 | 63 | If we tried to do some models on this data set without transforming our data we will quickly run into some critical errors that will hinder our results. Thus possibe solution is to scale our data so that the **mean** for both our variables is 0 and the **standard deviation/variance** is 1. 64 | 65 | This process is known as **Normalization**. 66 | 67 | This is one of many processes that one should take into consideration before delving into a Data Science Project. 68 | 69 | Say you've gotten this far and quickly realized that you don't know when and when not to do transformations to your data set. This process does require a bit of analytical intuition, and there are many resources online that can help with the process. 70 | 71 | Let's pause the process here and say you don't think you have the skill set to make these important decisions. This is important to admit because if you are stuck here and you don't know how to continue a project, this can put a project on pause indefinitely. 72 | 73 | Our advice to you is not to be discouuraged, let's just take a step back and not necessarily redesign your project, but essentially work with what you have. 74 | 75 | ## Data Visualization 76 | Something many people take for granted is *Data visualization* when going about Data Science Projects. Known as **Exploratory Analysis**, visuals are an important factor of projects because it helps gain readers that are interested in what the data has to say in, usually people who don't have a strong data based background. As you progress on to more complicated projects the visualizations will take a back seat to the analystical aspect of your project, but if you're a beginner visualizations are a powerful tool to lure other beginners and a wide array of people into reading your project no matter how simple or trivial you might think it is. 77 | 78 | An example of this being a subreddit called [Dataisbeautiful](https://www.reddit.com/r/dataisbeautiful/) currenlty followed by 9,616,736 redditors. This subreddit focuses on simple visualizations of data collected by redditors. Many of these projects do not feature machine learning or advanced statistical inferences, but there's a reason it has so many followers. 79 | 80 | Many people who do not posess the data-driven background still love to learn about data presented to them in a way they can understand, and a visualization is just that. 81 | 82 | ## Example: Exploratory Analysis on Text 83 | An example of an effective use of visualizations would be say your team decided to do text analysis on the movie script for **Star Wars: A New Hope** through the process known as **Natural Language Process** (NLP), but the algorithms you found online don't make sense you're team and you're having trouble moving forward. Instead of letting the project come to a complete halt, instead think of some easy descriptive statistical methods you can employ on your data set. 84 | 85 | WordClouds are a simple and effective way of showcasing words that are often said in a corpus, in our case the movie script. 
Employing the [Word_Cloud](https://github.com/amueller/word_cloud) module in **Python** you can quickly reignite fire to your project as shown below (This process took me all but 5 minutes to recreate using their sample scripts on GitHub): 86 | 87 | 88 | 89 | From here many other ideas come to mind that don't involve complicated machine learning algorithms 90 | 91 | - Bar chart showcasing the amount of dialogue each character has 92 | - Word Clouds for each main character 93 | - Bar Charts showcasing frequency of a certain word like **the Force** 94 | 95 | Therefore we showcased that Data Science projects don't have to be overtly complicated, starting with exloratory analysis is okay and can be used as a jumping board to more complicated and indepth analysis. -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Winter Quarter Project Group - Data Science at UCSB 2 | ### Contributors: Raul Eulogio, David A. Campos, Jason Freeberg, Nathan Fritter 3 | 4 | 5 | ## In Memory of.. 6 | The efforts of this quarter and the work done is dedicated to the memory of: 7 | 8 | + **Fernando Regino** (1993-2013) 9 | + **Bernardino De Jesus** (1993-2016) 10 | + **Ivan Garcia Vergara** (1991-2018) 11 | + **Erik Alonso** (1991-2009) 12 | + **Jorge Zarate** (1990-2008) 13 | 14 | "*When the lights shut off 15 |
16 | And it's my turn to settle down 17 |
18 | My main concern 19 |
20 | Promise that you will sing about me*" - Kendrick Lamar 21 | 22 | Thank you to everyone who participated this quarter 23 | 24 | ## Abstract 25 | 26 | This repository serves as an itinerary for the Project Groups for Winter Quarter for the **Data Science at UCSB** organization. Providing a weekly overview as well as resources used within the weekly meetings. 27 | 28 | **Contributors**: 29 | + [Raul Eulogio](https://www.linkedin.com/in/raul-eulogio-217069123) -> rauleulogio3 [at] gmail.com 30 | + GitHub: https://github.com/raviolli77/ 31 | + [David Campos](https://www.linkedin.com/in/dcamposliz) - dcampos.liz [at] gmail.com 32 | + GitHub: https://github.com/dcamposliz 33 | + Personal Site: http://davidacampos.com/ 34 | + [Jason Freeberg](https://www.linkedin.com/in/jfreeberg) -> freeberg [at] umail.ucsb.edu 35 | + GitHub: https://github.com/JasonFreeberg 36 | + Personal Site: JasonFreeberg.github.io 37 | + [Nathan Fritter](https://www.linkedin.com/in/nathan-fritter) -> nathan.fritter [at] gmail.com 38 | + GitHub: https://github.com/Njfritter 39 | 40 | # Table of Contents 41 | * [Week 2](#weektwo) 42 | * [Week 3](#weekthree) 43 | * [Week 4](#weekfour) 44 | * [Week 5](#weekfive) 45 | * [Week 6](#weeksix) 46 | * [Week 7](#weekseven) 47 | * [Week 8](#weekeight) 48 | * [Week 9](#weeknine) 49 | 50 | # Lesson Plan 51 | ## Week 2: Introductions 52 | + Who are you? 53 | + Name 54 | + Major 55 | + Year 56 | + Where are you from? 57 | + Why are you here? 58 | + What are you trying to accomplish in life? 59 | + what are you trying to accomplish here? 60 | + What are you trying to learn? 61 | + What project(s) are you working on today? 62 | + What recent failure have you had? 63 | + Strengths & weaknesses as it relates to data science or in general? 64 | **Storm** 65 | Goal of this group is to ultimately get projects finished and published 66 | + **WHY** 67 | + We found that it is by working on projects that you actually get to learn and being to understand how to do data science 68 | + Brainstorm on data science ideas 69 | + Write them on a piece of paper 70 | + Go to the front of the group and present it 71 | + Have people walk up to you/you walk up to people, persuade people to be in your group 72 | 73 | **Collide**: 74 | + Form teams 75 | + Mix up grade levels/experience 76 | + Discuss **weaknesses**, **technologies**, **expertise**, **talent** 77 | + Pick **R** or **Python** 78 | + Establish Communication channels 79 | + Facebook 80 | + GroupMe 81 | + Slack 82 | + GitHub 83 | + Phone 84 | + Gmail/Email 85 | 86 | **Homework**: 87 | + Find an interesting project online/from inertia7.com 88 | + Read through contents 89 | + Catch up on your **R**/**Python** skills with DataCamp 90 | + Get to know each other 91 | + Become Familiar with GitHub/create account (for more beginner level/those who weren't here, we'll go into more detail in a later meeting) 92 | 93 | **Links to Resources** to resources discussed in meeting: 94 | + **R**/**RStudio**: https://www.rstudio.com/ 95 | + **Python**: https://www.python.org/ 96 | + **Inertia7**: http://www.inertia7.com/ 97 | + **GroupMe**: https://groupme.com/en+US/ 98 | + **GitHub**: https://github.com/ 99 | + **Slack**: https://slack.com/ 100 | + **DataCamp**: https://www.datacamp.com/ 101 | 102 | ## Week 3: Why do a Data Science Project? 103 | 104 | **Some preliminaries** 105 | + Does everyone in your team have: 106 | + **Slack** account/channel within the *dsprojectgroup* **Slack**? 107 | + **GitHub** account? 
108 | + **R**, **Python**, **SQL** set up on their machine? (Whatever y'all plan on using) 109 | + Speak about versions for language and packages/modules. Especially in **Python**: 110 | + Speak to me after if you need more clarification 111 | +If you can answer this questions then you're fine: Do you know what a virtual environment is? And do you know its use? 112 | + If you don't know have your team speak to me after. 113 | + Which interface will your team be using i.e. **Rstudio** or **Jupyter Notebook** for **R** 114 | + Introduce the concepts of **Stand Ups** 115 | + Structure of an effective **Stand Up**: 116 | + What did I accomplish last meeting? 117 | + What will I do today? 118 | + What obstacles are impeding my progress? (Blockers) 119 | 120 | + Document **everything** in your **Slack** channel 121 | + If you used a site to review **R**, **Python**, **html**, etc. post it within your group's channel 122 | + Read a cool article relating to your project; document it on **Slack** 123 | + This will become important when citing sources, creating documentation for project, and just a good habit to develop since people deserve credit for helping you! 124 | 125 | + **Trello** 126 | + Nathan will introduce the interface and how to integrate it into your workforce 127 | + We might create a markdown file explaining in more detail if people do not understand how to use it right away (but is pretty easy to use). 128 | + Resources: 129 | + [Trello Tutorial](https://trello.com/b/I7TjiplA/trello+tutorial) 130 | + [Trello Youtube Tutorial](www.youtube.com/watch?v=7najSDZcn+U) 131 | 132 | ## What is a **Data Science Project**? 133 | 134 | + How to do a **Data Science Project**? 135 | 136 | + Steps of a **Data Science** project: 137 | + Getting Data 138 | + **UCI Machine Learning Repository** 139 | + **Kaggle** datasets 140 | + Cleaning data/sanity checks 141 | + Exploratory Analysis 142 | + Trends in reponse and predictor variales 143 | + Modeling (Choosing Supervised Vs. Unsupervised Learning) 144 | + Model Validation 145 | + Sharing Results 146 | + Inertia7.com 147 | + GitHub repo with nice READNE.md 148 | + Jupyter/RMarkdown Notebook 149 | 150 | If you don't think you can do a project on your own right of the bat. Try doing a project from **Inertia7**! 151 | 152 | + [Scrape a Webpage - Python](www.inertia7.com/projects/scrape-webpage-python) 153 | + [Iris Flower Classification](http://www.inertia7.com/projects/iris-classification-r) 154 | + [Modeling Home Prices](http://www.inertia7.com/projects/regression-boston-housing-r) 155 | + [Forecasting the Stock Market](http://www.inertia7.com/projects/time-series-stock-market-r) 156 | + [Sentiment Analysis on Twitter](http://www.inertia7.com/projects/sentiment-analysis-clinton-trump-2016) 157 | 158 | 159 | Here are some of my own repos where I have projects that aren't published on **Inertia7**: 160 | + https://github.com/raviolli77/pythonTutorialsVinceLa 161 | + https://github.com/raviolli77/machineLearning_Flags_Python 162 | + https://github.com/raviolli77/classification_iris 163 | + https://github.com/raviolli77/machineLearning_breastCancer_Python 164 | + https://github.com/raviolli77/ggplot2_Tutorial_R 165 | 166 | Discuss what their project can look like given the structure of what they just hacked 167 | + Fill in the **Steps of a Data Science Project** 168 | 169 | **Homework**: 170 | For this section, we can be lenient as to when this gets done. For more advanced groups we expect for you to be able to do this on your own. 
Now for the newer groups you can wait until the next meeting to have me or other members help with the process. 171 | + Build a proposal for your own project 172 | + Get comfortable using **Markdown** notation 173 | + Create a repo in the [Data Science Project Groups GitHub Account](https://github.com/UCSB-dataScience-ProjectGroup) including these steps: 174 | + Abstracts 175 | + Finish filling the **Steps of a Data Science Project** 176 | + Data Sources? Examples include, but are not limited to: 177 | + Kaggle 178 | + UCI 179 | + Data sets found in **R** 180 | + Quandl 181 | + API calls: 182 | + Wikipedia 183 | + Twitter 184 | + Google Maps 185 | + Saint Louis Federal Reserve 186 | + Google Analytics 187 | + If not, then select a project from the suggested list or talk to me for project ideas 188 | **Links to Resources** to resources discussed in meeting: 189 | + **R**/**RStudio**: https://www.rstudio.com/ 190 | + **Python**: https://www.python.org/ 191 | + **Inertia7**: http://www.inertia7.com/ 192 | + **GitHub**: https://github.com/raviolli77 193 | + **Trello**: https://trello.com/ 194 | + **UCI ML Database**: https://archive.ics.uci.edu/ml/datasets.html 195 | + **Kaggle Datasets**: https://www.kaggle.com/datasets 196 | + **R Data sets**: http://stat.ethz.ch/R-manual/R-devel/library/datasets/html/00Index.html 197 | + **Quandl**: https://www.quandl.com/ 198 | + **Wikipedia API**: https://www.mediawiki.org/wiki/API:Main_page 199 | + **Twitter API**: https://dev.twitter.com/docs 200 | + **Saint Louis Federal Reserve**: https://fred.stlouisfed.org/ 201 | + **Google Analytics**: https://www.google.com/analytics/#?modal_active=none 202 | + **Jupyter Notebook**: http://jupyter.org/ 203 | + **R Markdown**: http://rmarkdown.rstudio.com/ 204 | 205 | ## Week 4: Project Iteration/GitHub 206 | **Some Preliminaries**: 207 | + Are people interested in a **Python Hackathon**? 208 | + If so when and where works best 209 | 210 | + Has your team created a **GitHub** Repo for your project within the organizational **GitHub** (Source: https://github.com/UCSB-dataScience-ProjectGroup)? 211 | + Does it have a **ReadMe** explaining the Steps of a **Data Science** Project? 212 | + Did you all agree which versions/interface for the language you will be using? 213 | + Did you reach a conclusion of what models/approach you will take? 214 | + If not give us an overview what you plan to do, by the end of this meeting the project should be decided more or less 215 | 216 | **Team Resources** 217 | + Has your team... 218 | + Been in contact through **Slack**? 219 | + Been doing **Stand Ups**? 220 | + Been addressing issues in going about your project or any preliminary practice for your project 221 | + Asked for help? 222 | 223 | ## **GitHub Crash Course** 224 | 225 | Here we're giving a quick overview of how **GitHub** works. Purpose is to be used as a rudimentary guide for those of you who are new to **GitHub**. We can spend an entire day going over the workflow of **GitHub**, but for now we're concerned with just getting your feet wet, and soon creating a repo for your project if you haven't already. 226 | 227 | **NOTE**: One can spend an entire day learning **git**, so we'll leave that out for this iteration. We will provide resources for **git** below! 228 | 229 | + **Step 1:** 230 | + Create a **GitHub** account (Should go without saying, but you'd be surprised.) 231 | + **Step 2:** 232 | + You should create a *myProject* folder where you keep all your projects. 
This will help with organization for later on when you'll be doing a shit load of projects and prior when publishing projects! 233 | + Create a folder for your project where you will include things like, but not limited to: 234 | + **README** file - This file will be other people's introduction to your project so make it pretty and easy to follow! (in .md format). 235 | I use [Sublime Text](https://www.sublimetext.com/) to create and edit **README** files (there's a plethora of text editors like **Notepad++**, **atom**, etc. really its all personal preference) 236 | + Script files - These files will be in the format of the language you are doing your project on so either an **R** file or **Python** file (in **.R** or **.py** or **.sql** ) 237 | + Data file(Not sure what the proper name for this is will edit later) - This file is where your data is stored if you are using a static data source typically it can be: 238 | + **.csv** file 239 | + **.txt** file 240 | + **.JSON** file 241 | + **.db** file 242 | + Image folder - For organizational purposes we usually create an image folder which is where we store all images produced in the project if we plan on hosting them or making them viewable without having to run/save the code. Inside this folder you will find static image files like: 243 | + **.png** files (favorited in producing statistical images) 244 | + **.jpeg** 245 | + **.gif** 246 | + Once you get more acquainted with **GitHub** there will be more files that you will add, but for this example these will do 247 | + **Step 3**: 248 | + Once you have the folder for your project and all the respective files you wish to include in the repo on the main page of **GitHub**, click the green button that says *New repository* 249 | + Add the Repo name: we usually name our repos as such 250 | + *statisticalModel_DataSetDescription* 251 | Ex. 252 | + *classification_IrisFlowersR* 253 | + *regression_bostonHousingR* 254 | + Add a description: give a brief overview of what your project will be about to help give people context. 255 | Ex. 256 | + *A collection of alternate R markdown templates* 257 | + *Repo for a quick ggplot2 tutorial for Exploratory Analysis using Jupyter Notebook and R script* 258 | + Leave it as public: Make it accessible to everyone 259 | + **Initialize with a README** - ALWAYS **initialize with a README**: this acts as an instructional overview for your project 260 | + You typically include steps that were required that you can't express in your code (i.e. Creating a plotly account, steps needed if there are multiple scripts in your project) 261 | + A brief overview of your data set and statistical models used in the project 262 | + This will help later on if you plan to publish on inertia7! 263 | + Updates made to your project since its last iteration 264 | + Look at the inertia7 README's for some concrete examples 265 | + **Step 4**: 266 | Since you will be working in a team you have to be familiar with **branches**. **Branches** are different versions for the project, so a good way for your group to work on the project without fucking up the **master branch** 267 | 268 | + (**Master Branch**: This is the version the world will see and use, so make sure that this **branch** is the best iteration/is deployable) 269 | + Create a **branch** and call it like **ravi_branch** 270 | + You and each person in your team should have a branch that shows your iteration of the project if you happen to go ahead or test something out you haven't spoken with your teammates yet. 
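If you prefer the command line to the GitHub website for this step, here is a minimal sketch; it assumes **git** is installed and the repo already exists on **GitHub**, and the repository URL and branch name are just placeholders based on the example above:

```bash
# Get a local copy of your team's repo (placeholder URL)
git clone https://github.com/UCSB-dataScience-ProjectGroup/yourProjectRepo.git
cd yourProjectRepo

# Create and switch to your own branch (same idea as ravi_branch above)
git checkout -b ravi_branch

# Work, then record and publish your changes on that branch
git add script.py README.md
git commit -m "First pass at exploratory analysis"
git push -u origin ravi_branch
```

Once your branch is pushed, **Step 5** below covers turning it into a **Pull Request**.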
271 | + **Step 5:** 272 | Say you and your group are in agreement that your **branch** is the version you want on the **master branch**, the next step is creating a **Pull Request**. 273 | 274 | + (**Pull Request**: Allows people to review any changes made in a project, make modifications before the **master branch** changes, and overall help a team work efficiently) 275 | + Go into the **branch** you want to merge so **ravi_branch** 276 | + Click **New Pull Request** 277 | + Here you will see the two **branches** being compared:the **base** will typically be the **master branch** and the compared file will be **ravi_branch** in our example. 278 | + Add a description of some of the changes you made! 279 | + **GitHub** will give you an overview of the changes made in files 280 | + Once you have reviewed everything click **Create pull request** 281 | + This is where other teammates will be notified of you wanting to merge your **branch** and the **master branch** 282 | + If everyone is in agreement you click **Merge pull request** 283 | + Then, click **Confirm merge** and the **master branch** will now have the same contents as **ravi_branch** 284 | 285 | That's a quick and rough tutorial to working in **GitHub**. Doesn't go over everything but should give context as to how to work as a team using **GitHub** and **branches**. I have provided sources that go in more detail and definitely explain better so I would suggest reading up on them! 286 | 287 | **Homework**: 288 | + Will depend on conversations we have on Wednesday to see where your team is at 289 | + Have a repo within the organizational repo by the end of today! 290 | + Create **branches** for each teammate 291 | + Set up a meeting time outside of Wednesday 292 | 293 | **Links to Resources** to resources discussed in meeting(**NOTE**(2/14): Moved **GitHub** related resources to *Recommended Resources for entire quarter*): 294 | 295 | ## Week 5: Project Iteration 296 | 297 | **Some Preliminaries**: 298 | + **Python Hackathon** (Workshop) 299 | + Steps needed to be taken before we can start/set up the **hackathon**: 300 | + Install **Python3.X** 301 | + Use a *Virtual Environment* for your project if it will be in **Python** 302 | + Fill out the google survey sent yesterday night: 303 | + We need to gauge date, time, and funds to make sure it will run smoothly 304 | 305 | + Rewards!!! 306 | + **HG Data Hackathon** 307 | + Date proposition: April 21st from 2pm to 10pm 308 | + Most likely broken into 5-6 teams and pair an HG Data Engineer with the respect teams 309 | + Spoke with Jason 310 | + Informal presentation of projects with *congratulatory refreshments* 311 | + Reward for **Best Data Visualization** 312 | + Reward for **Best insight/best modeling** 313 | + Reward for **Best presentation** 314 | + Jun Seo can speak of presentation of projects for library staff! 315 | 316 | + Major issues to address for today: 317 | + Does every team have a *requirement.txt* for their project? 318 | + Some README's need more detail (I will go about doing informal interviews today to each group) 319 | + By today your team should have what algorithms, methods and **Python** versioning. 
320 | + Branches for team members 321 | Depending on attendance we want today really show us the early iteration of your project so 322 | + Have a script with modules you will be using 323 | + Data set attached to your repo 324 | + Algorithms you will use 325 | 326 | ## Week 6: Project Iteration/Blockers 327 | 328 | **Some Preliminaries**: 329 | + **Python Hackathon** (Workshop) 330 | + Confirmed Date: **2/25/2017** at **10 a.m**. 331 | + Buy shirts to rep! 332 | + Contact me after to get them from other officer. I can take Venmo! 333 | + Rewards (Reiterate because a lot of people were MIA)!!! 334 | + **HG Data Hackathon** 335 | + Date proposition: April 21st from 2pm to 10pm 336 | + Most likely broken into 5-6 teams and pair an HG Data Engineer with the respect teams 337 | + Informal presentation of projects with *congratulatory refreshments* near end of this quarter 338 | + Reward for **Best Data Visualization** 339 | + Reward for **Best insight/best modeling** 340 | + Reward for **Best presentation** 341 | + The informal presentation can be a prep for the presentation to the Library faculty 342 | + Most likely scheduled at the start of next quarter (Ask Jun-Seo if you have any questions) 343 | + Project will be posted in the newest iteration of **int7x** (inertia7)! 344 | + Team Management 345 | + Word from me regarding team 346 | + We need teams to start applying **Stand Ups** now (Mandatory) 347 | + Must be done before starting your sessions and immediately when your team finishes the meet-up. 348 | + Will demonstrate again with more feedback given to teams 349 | Today will play as an important catch up day for many teams since midterm season was(is) around 350 | + I will go around to teams and ask about project relating to 351 | + repository 352 | + code 353 | + README 354 | Today will be focused mostly on iterating projects. 355 | 356 | ## Week 7: I didn't prep this week 357 | 358 | Carry on. Nothing to see here. 359 | 360 | ## Week 8: Presentation/Flex Day 361 | 362 | For this week I decided we are going to do a surprise project presentation. 363 | 364 | **Announcements**: 365 | Thank you for everyone who participated in the **Python Workshop** 366 | 367 | I will need every team to do the following: 368 | + Update all scripts on their **GitHub** repo in the **ProjectGroupWinter2017**. 369 | + README.md 370 | + scipt.py 371 | + All appropriate data files (i.e. csv files, txt files, etc.) 372 | + Images (inside images folder) that were produced for this project 373 | + Be prepared to pitch your idea to me. 374 | + Sell that shit. 375 | + Why is your project relevant to *Data Science* and the data community as a whole. 376 | + (Not 100%) I would like to see some scripts/notebooks being ran during presentation but due to time constraints, we might just only use what's on **GitHub**. 377 | 378 | Each group presentation should be no longer than **15 minutes** 379 | 380 | ## Week 9: Quarter Wrap-Up 381 | 382 | ### Final thoughts on quarter 383 | + Thank You's 384 | + Dedications 385 | + Food for thought for next quarter 386 | 387 | **Some Preliminaries**: 388 | 389 | ### FACTOR PI sale 390 |
391 | 392 | 393 | Only $1 apiece! Go show some support to our friends at the **Female Actuarial Association**. Find the event link [Here](https://www.facebook.com/events/1615213815453503/) 394 | 395 | + Location: **SRB** 396 | + Date: March 14, 2017 397 | + Time: 11AM - 3PM 398 | 399 | ### Farmer's Data Talk 400 |
401 | 402 | 403 | The Org. wants a packed house for the **Farmer's Insurance Data Talk** so let's all make it out! Facebook event link [Here](https://www.facebook.com/events/589388027918002/) 404 | 405 | + Location: UCen SB Harbor Room 406 | + Date: March 9, 2017 (So tomorrow) 407 | + Time: 6PM - 8PM 408 | + Will **NOT BE FOCUSED** on **actuary** based stuff (Will focus on **Natural Language Processing** so highly relevant to our group) 409 | 410 | ### HG Data Hackathon 411 | 412 | + Location: HG Data Offices 413 | + Time: April 21st 414 | + More on this later 415 | + Will most likely work on a tutorial with Calvin during Spring Break to help prep 416 | 417 | 418 | ### Chapman Data Fest 419 | 420 | + Location: Chapman University 421 | + Time: April 21st as well 422 | + Team of 5 to attend 423 | + **NOTE**: Json wants the people to attend the **Chapman Data Fest** to be of different class levels (i.e. freshman, sophomore, Junior, Senior and Super Senior) 424 | + Let me know if you're interested in this event! Link for Event [Here](https://events.chapman.edu/28206) 425 | 426 | ### Library presentations 427 | We have confirmed date! 428 | 429 | + Location: Same location so here 430 | + Time: April 26th at 7pm 431 | + Need y'all to use today to prep and keep track of progress! 432 | + Make **Github** repos pretty 433 | + Code readable 434 | + Write nice docs 435 | + Make plots pretty with titles, axis labels, and legends 436 | 437 | Let's really flex for this. Everyone worked hard! 438 | 439 | 440 | We would like your team to use inertia7 to present your projects so this is a good segue for the next section 441 | 442 | ### inertia7 User Testing 443 | 444 | We know dead week and finals are fast approaching but we were wondering if anyone would be interested in User-testing the new iteration of inertia7 to give constructive criticism. 445 | 446 | + Doesn't have to be publishing a project. Can just play with the app 447 | + If interested to talk to me or David 448 | + Follow [Link](https://docs.google.com/forms/d/e/1FAIpQLScX8KK6z3ji6OLKlMZ0GS64dbsAJAGmmQLGbihEd5d3wA8o6g/viewform?c=0&w=1) to apply for credentials 449 | 450 | ## Wrap-Up 451 | Things needed by the end of this meeting: 452 | + Updated Scripts 453 | + Updated README's 454 | + Add any appropriate images 455 | + Create plotly account to publish plotly graphs (if applicable) 456 | + To-do list detailing what is still needed for your project 457 | + Keep in contact with partners over break. 458 | + If you're bored during break work on the project! 459 | 460 | **IMPORTANT TO NOTE**: Since finals is approaching your group needs set this up in their repo since there will be a gap period of 3 weeks. I need to know where your team is at and context of this. You **CAN'T** leave until your team shows me the repo and the outline of what is done and what isn't done. 461 | 462 | Three weeks is a long time and if there's no structure as to where your at you will forget/will be hard to pick back up. 463 | 464 | For those of you who feel you are ready to iterate on the presentation part of your project talk to me by the end of today's meeting. 465 | 466 | Again thank you for a wonderful quarter and hope to see you all again next quarter! 
467 | 468 | ## **Recommended Resources for entire quarter**: 469 | 470 | + **README** Resources: 471 | + [README wiki](https://en.wikipedia.org/wiki/README) 472 | + [Markdown Cheatsheet](https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet) 473 | + [inertia7 Examples](http://www.inertia7.com/) 474 | + [Time Series Analysis README](https://github.com/inertia7/timeSeries_sp500_R/blob/master/README.md) 475 | + [Regression Analysis README](https://github.com/inertia7/regression_bostonHousing_R/blob/master/README.md) 476 | + [noffle's Art of README article](https://github.com/noffle/art-of-readme) 477 | + [More resources about README's](https://github.com/matiassingers/awesome-readme) 478 | + **GitHub** Resources: 479 | + [Hello World Tutorial](https://guides.github.com/activities/hello-world/) 480 | + [GitHub Youtube Channel](https://www.youtube.com/githubguides) 481 | + [Understanding the GitHub Flow](https://guides.github.com/introduction/flow/) 482 | + [Creating and Deleting Branches](https://help.github.com/articles/creating-and-deleting-branches-within-your-repository/) 483 | + **Git** Resources: 484 | + [Set Up Git Article](https://help.github.com/articles/set-up-git/) 485 | + [Create a Repo Article](https://help.github.com/articles/create-a-repo/) 486 | + [Fork A Repo](https://help.github.com/articles/fork-a-repo/) Not discussed in this meeting but important part of **GitHub** workflow 487 | + [Be social](https://help.github.com/articles/be-social/) (Great place to discover cool shit on **GitHub**) 488 | + [David's Git Repo](https://github.com/dcamposliz/learnGit) 489 | + **Text Editors** Resources: 490 | + [Sublime Text](https://www.sublimetext.com/) 491 | + [Notepad++](https://notepad-plus-plus.org/) 492 | + [Atom](https://atom.io/) 493 | + [vim](http://www.vim.org/download.php) 494 | 495 | + **Python** Resources: 496 | + [Python for Data Analysis](http://shop.oreilly.com/product/0636920023784.do) (Brush up on **NumPy** and learn **Pandas** from the man who created it!) 
497 | + [Vincent La's Personal Website](http://vincela.com/) (Raul's Note: Great place to review/learn **Python** if you're really *rusty*) 498 | + [Python Documentation](https://docs.python.org/3/) (For more advanced users, the documentation for the programming language are clutch resources) 499 | + [Learn Python the Hardway](https://learnpythonthehardway.org/book/) (Haven't gone through it will soon, but dank resource for learning Python) 500 | + [Yhat](https://www.yhat.com/) (Great resource for machine learning application with **Python**) 501 | + [David's Repo: learnPython](https://github.com/dcamposliz/learnPython) 502 | + [Hitchhiker's Guide to Python](http://docs.python-guide.org/en/latest/) 503 | + [Sklearn Docs](http://scikit-learn.org/stable/) 504 | + [Plotly examples in Python](https://plot.ly/python/) 505 | + **R** Resources: 506 | + [R-bloggers](https://www.r-bloggers.com/) (Great place to see people contributing projects and tutorials by real **R** users) 507 | + [ggplot2 docs](http://docs.ggplot2.org/current/) 508 | + [ggplot2 Cheat Sheet](https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf) (For visualizations) 509 | + [Quick-R](http://www.statmethods.net/) 510 | + [Plotly examples in R](https://plot.ly/r/) 511 | + [R for Data Science](http://r4ds.had.co.nz/index.html) (Learn from some of the **R** greats including Hadley Wickham, creator of many famous **R** packages) 512 | + [An Introduction to Statistical Learning with R](http://www-bcf.usc.edu/~gareth/ISL/) (Great book used in many UCSB PSTAT Classes) 513 | 514 | + Misc. 515 | + [Kaggle](https://www.kaggle.com/) (Great resource for all things data science) 516 | + [DataCamp](https://www.datacamp.com/) 517 | + [Analytics Vidhya](https://www.analyticsvidhya.com) (Lot of great tutorials relating to machine learning) 518 | + [Stack Overflow](http://stackoverflow.com/) (Stack overflow is love, Stack Overflow is life) 519 | + [w3schools tutorials](https://www.w3schools.com/) (Great place to learn other important tools like, but not limited too: html, SQL (I used this one a lot), website development) 520 | 521 | --------------------------------------------------------------------------------