├── fast.ai courses ├── README.md └── machineLearning │ ├── Lesson2 │ └── README.md │ ├── README.md │ ├── Lesson3 │ └── README.md │ ├── Lesson1 │ └── README.md │ └── Lesson0 │ └── Setup_tips.md ├── CommandLine ├── resources │ ├── cat.JPG │ ├── cd.JPG │ ├── cp.JPG │ ├── ls.JPG │ ├── rm.JPG │ ├── echo1.JPG │ ├── echo2.JPG │ ├── man cat.JPG │ ├── new-item.JPG │ ├── new-item2.JPG │ └── additional_reading.txt ├── .ipynb_checkpoints │ └── Command_Line_Concepts-checkpoint.ipynb └── Command_Line_Concepts.ipynb ├── DataScienceR ├── Papers │ ├── math4ml.pdf │ ├── overfitting.pdf │ ├── use_notes_datascience.pdf │ └── README.md ├── README.md ├── DataScienceResources.md └── DataScienceProj.md ├── R Basics └── README.md ├── README.md └── Git Resources └── Resources.md /fast.ai courses/README.md: -------------------------------------------------------------------------------- 1 | # Materials for fast.ai courses 2 | 3 | Material to be split by course 4 | -------------------------------------------------------------------------------- /CommandLine/resources/cat.JPG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/boudrejp/BeginningDataScience/HEAD/CommandLine/resources/cat.JPG -------------------------------------------------------------------------------- /CommandLine/resources/cd.JPG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/boudrejp/BeginningDataScience/HEAD/CommandLine/resources/cd.JPG -------------------------------------------------------------------------------- /CommandLine/resources/cp.JPG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/boudrejp/BeginningDataScience/HEAD/CommandLine/resources/cp.JPG -------------------------------------------------------------------------------- /CommandLine/resources/ls.JPG: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/boudrejp/BeginningDataScience/HEAD/CommandLine/resources/ls.JPG -------------------------------------------------------------------------------- /CommandLine/resources/rm.JPG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/boudrejp/BeginningDataScience/HEAD/CommandLine/resources/rm.JPG -------------------------------------------------------------------------------- /CommandLine/resources/echo1.JPG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/boudrejp/BeginningDataScience/HEAD/CommandLine/resources/echo1.JPG -------------------------------------------------------------------------------- /CommandLine/resources/echo2.JPG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/boudrejp/BeginningDataScience/HEAD/CommandLine/resources/echo2.JPG -------------------------------------------------------------------------------- /DataScienceR/Papers/math4ml.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/boudrejp/BeginningDataScience/HEAD/DataScienceR/Papers/math4ml.pdf -------------------------------------------------------------------------------- /CommandLine/resources/man cat.JPG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/boudrejp/BeginningDataScience/HEAD/CommandLine/resources/man cat.JPG -------------------------------------------------------------------------------- /CommandLine/resources/new-item.JPG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/boudrejp/BeginningDataScience/HEAD/CommandLine/resources/new-item.JPG 
-------------------------------------------------------------------------------- /CommandLine/resources/new-item2.JPG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/boudrejp/BeginningDataScience/HEAD/CommandLine/resources/new-item2.JPG -------------------------------------------------------------------------------- /DataScienceR/Papers/overfitting.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/boudrejp/BeginningDataScience/HEAD/DataScienceR/Papers/overfitting.pdf -------------------------------------------------------------------------------- /DataScienceR/Papers/use_notes_datascience.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/boudrejp/BeginningDataScience/HEAD/DataScienceR/Papers/use_notes_datascience.pdf -------------------------------------------------------------------------------- /fast.ai courses/machineLearning/Lesson2/README.md: -------------------------------------------------------------------------------- 1 | # Lesson 2 2 | * Watch the video for lesson 2 (https://course.fast.ai/lessonsml1/lesson2.html) 3 | * Note the link to the wiki and the other helpful links 4 | * Go through the in-class notebook in your Crestle account, and write comments to explain to yourself what is going on. 5 | * Replicate the analysis with the data you chose in lesson 1 6 | -------------------------------------------------------------------------------- /fast.ai courses/machineLearning/README.md: -------------------------------------------------------------------------------- 1 | # Machine Learning for Coders 2 | 3 | main site: https://course.fast.ai/ml.html 4 | 5 | Other material to be split by lesson 6 | 7 | Lesson materials to be added as I go through them...
8 | 9 | ### Directory of existing materials 10 | * Lesson 0: Getting started 11 | * Materials to brush up on, setup to do before beginning, and helpful links 12 | * Lesson 1: Basics and training random forests 13 | * Supporting material 14 | * Additional data sets 15 | -------------------------------------------------------------------------------- /fast.ai courses/machineLearning/Lesson3/README.md: -------------------------------------------------------------------------------- 1 | # Lesson 3 2 | * Go through the lecture at https://course.fast.ai/lessonsml1/lesson3.html 3 | * Note the additional links 4 | * Go through the associated notebook wherever you are doing your follow-up work (Crestle, a local machine, or elsewhere) 5 | * Put in comments to explain what each block of code is doing, and why you are doing it 6 | * Using the additional data set that you worked with for lessons 1 and 2, follow the analysis from lecture 3. 7 | -------------------------------------------------------------------------------- /CommandLine/resources/additional_reading.txt: -------------------------------------------------------------------------------- 1 | Windows command tutorial: https://www.cs.princeton.edu/courses/archive/spr05/cos126/cmd-prompt.html 2 | More advanced command line: https://www.digitaltrends.com/computing/how-to-use-command-prompt/ 3 | Better understanding bash vs powershell: https://searchitoperations.techtarget.com/tip/On-Windows-PowerShell-vs-Bash-comparison-gets-interesting 4 | bash commands: https://courses.cs.washington.edu/courses/cse390a/14au/bash.html 5 | shell commands: https://www.liquidweb.com/kb/new-user-tutorial-basic-shell-commands/ 6 | -------------------------------------------------------------------------------- /DataScienceR/README.md: -------------------------------------------------------------------------------- 1 | # Data Science in R 2 | 3 | The purpose of this part of the repository is to get you doing some projects to get you used to doing
data science projects in R. 4 | 5 | ### Contents 6 | * __DataScienceProj__: A description of a sample data science project. Doing a project like this will help you immensely in understanding data science from a practical perspective. 7 | * __DataScienceResources__: Additional web and text sources that could prove useful in performing your data science project. 8 | * __Papers__: A directory of academic papers on data science projects, general machine learning projects, etc. 9 | -------------------------------------------------------------------------------- /DataScienceR/Papers/README.md: -------------------------------------------------------------------------------- 1 | # Papers 2 | 3 | ### Contents 4 | 5 | * __Math4ML__: A paper that goes through the mathematical foundations of machine learning, with topics ranging from probability to calculus to optimization. If you want a deeper look under the hood of what is going on in machine learning, this is a very good overview. For the purposes of becoming an entry-level data analyst, this isn't required reading, but you will need to know these topics to move on to creating your own algorithms. 6 | * __Use Notes- DataScience__: This paper is very high-level and goes over some practical considerations for performing data science analyses. It is quite accessible for someone with only high-level knowledge. 7 | -------------------------------------------------------------------------------- /fast.ai courses/machineLearning/Lesson1/README.md: -------------------------------------------------------------------------------- 1 | # Lesson 1 2 | * Watch the video lesson (https://course.fast.ai/lessonsml1/lesson1.html) 3 | * Note the lesson wiki and additional resources at the bottom 4 | * Go through the associated notebook on Crestle. Run each cell of code, and write comments to yourself to say what each code block is doing.
You don't need to understand all of it; just try to get a general idea. 5 | * Choose a dataset for a regression problem with which you can replicate the analysis 6 | * https://www.kaggle.com/swathiachath/kc-housesales-data 7 | * https://www.kaggle.com/pavanraj159/concrete-compressive-strength-data-set 8 | * https://www.kaggle.com/orgesleka/hessen-house-prices-dataset 9 | * Any other data set that you might be interested in 10 | * Replicate the analysis from class with your additional data set. Copy and paste the code from the lecture notebook into a new notebook and change only the code you need to. Write comments to explain what is happening. 11 | -------------------------------------------------------------------------------- /DataScienceR/DataScienceResources.md: -------------------------------------------------------------------------------- 1 | # Data Science Resources 2 | Additional videos, documentation, articles, etc. that may help you in conducting your data science project. If any of these links are broken, please open an issue for this repository.
3 | 4 | ### Helpful R packages with their documentation 5 | * dplyr 6 | * [Documentation](https://cran.r-project.org/web/packages/dplyr/dplyr.pdf) 7 | * [A nice introduction to using dplyr](https://cran.r-project.org/web/packages/dplyr/vignettes/dplyr.html) 8 | * ggplot2 9 | * [Cheatsheet](https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf) 10 | * [Suuuuper in depth tutorial](https://stats.idre.ucla.edu/stat/data/intro_ggplot2/ggplot2_intro_slidy.html) 11 | * [50 Useful Visualizations](http://r-statistics.co/Top50-Ggplot2-Visualizations-MasterList-R-Code.html) 12 | * [Briefer Tutorial](http://r-statistics.co/ggplot2-Tutorial-With-R.html) 13 | * caret 14 | * [Tutorial](https://www.analyticsvidhya.com/blog/2016/12/practical-guide-to-implement-machine-learning-with-caret-package-in-r-with-practice-problem/) 15 | * [A somewhat higher-level tutorial](http://www.rebeccabarter.com/blog/2017-11-17-caret_tutorial/) 16 | * [caret documentation](https://cran.r-project.org/web/packages/caret/caret.pdf) 17 | 18 | ### Helpful resources for learning data science 19 | * A cheatsheet for thinking about how to choose ML algorithms (a guide, not a definitive resource!) 20 | ![Cheat Sheet](http://1.bp.blogspot.com/-ME24ePzpzIM/UQLWTwurfXI/AAAAAAAAANw/W3EETIroA80/s1600/drop_shadows_background.png) 21 | * [Introduction to Supervised learning](https://medium.com/machine-learning-for-humans/supervised-learning-740383a2feab) 22 | -------------------------------------------------------------------------------- /R Basics/README.md: -------------------------------------------------------------------------------- 1 | # R Basics 2 | 3 | A tutorial for those familiar with basic coding concepts such as loops and conditional statements. It does not cover algorithmic thinking; it focuses on how to do things in R. 4 | 5 | ## Ways to learn R 6 | There are two main ways I would recommend learning R, neither of which requires buying any books.
Both cover the same fundamental concepts, in more or less the same order. Choosing one over the other just comes down to how you'd prefer to learn the material. The two ways are: 7 | * __Swirl__: learning R, in R. 8 | * To use this, open an R session in RStudio. Run the following commands in the console (type after the `>` and hit enter): 9 | * `install.packages("swirl")` (this installs the package that contains the modules for learning R) 10 | * `library(swirl)` (this loads the swirl package into the current session) 11 | * `swirl()` (this starts swirl) 12 | * After these steps, swirl will prompt you through all the relevant material. 13 | * [__PreludeInR__](http://preludeinr.com): Learn R with bad jokes 14 | * To use this, just go to the website and work through the content in order. 15 | * If you're going to use this, __PLEASE__ have a session of R open and run the commands yourself. It may seem redundant, but you will get a lot more out of it if you type the code yourself. 16 | 17 | In my book, both of these are good ways to learn R basics and a little bit of how to do statistics in R. For the most part, they cover the same material, so I think that doing both would be redundant and not useful. So pick one and stick with it, and you'll have the R basics covered. 18 | 19 | Additional text resources: *The Art of R Programming* by Norman Matloff 20 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # BeginningDataScience 2 | A repo for materials for engineering PDPs to learn general data science and computer science concepts 3 | 4 | ## __Breakdown of materials__ 5 | 1. Command line knowledge 6 | 2. Understanding git 7 | 3. Basic R knowledge (swirl package, general questions) 8 | 4.
Basic R projects (use of dplyr, caret, ggplot2) 9 | * Use of Kaggle datasets would be good here 10 | 5. Simple R project to demonstrate reactivity concepts. 11 | * Needs to have a statistical model that will generate predictions based on user input 12 | 6. Basic SQL knowledge, relational database knowledge 13 | 7. *Bonus* Parallel computing concepts 14 | 8. *Bonus* HTTP, proxies, firewalls, APIs... build a project to obtain data from an API and display it 15 | 9. *Bonus* How to create R packages 16 | 17 | ## __Prereqs__ 18 | * You will need R and RStudio installed on your device. *Note: If doing this on your work device, you will need local admin rights* 19 | * R: https://cran.r-project.org/ 20 | * RStudio: https://www.rstudio.com/ (You want RStudio Desktop, the free version) 21 | * Jupyter Notebooks: http://jupyter.org/ 22 | * Python >= 3.5: https://www.python.org/ 23 | * Knowledge of basic statistics 24 | * Linear regression, standard deviations, normal distributions, and hypothesis testing should be pretty intuitive to you 25 | * Knowledge of calculus concepts 26 | * You should be familiar with gradient descent at a minimum 27 | * Experience in coding and thinking algorithmically 28 | * The particular language is not important 29 | * You should be able to pseudocode something like a basic sorting algorithm 30 | * Know when and why you would want to use the following: if/else statements, for loops, while loops 31 | * A desire to learn more about data science and expand your data analysis skills! 32 | -------------------------------------------------------------------------------- /Git Resources/Resources.md: -------------------------------------------------------------------------------- 1 | # __Learning Git__ 2 | There are a lot of good resources out there for learning git. You'll want to learn git after you know your way around the command line.
There are many ways to interface with git, but I believe the best way to learn it and build solid fundamentals is from the command line. 3 | 4 | ## Step 0: Download Git 5 | * Download Git for your system here: https://git-scm.com/downloads 6 | 7 | Once it is installed, go ahead and open `Git Bash`. You'll notice that Git Bash looks like a standard Command Prompt or PowerShell window. However, it uses __bash__ commands rather than Windows shell commands, so if some commands from the command line lesson don't work here, that's why. Most commands are the same; only a few are different. 8 | 9 | From here, I will turn it over to external resources, since there are plenty of good ones out there that explain why you'd want to use git and how to use it effectively. No sense in reinventing the wheel... 10 | 11 | ## Git Context 12 | Why use git in the first place? What is it? 13 | 14 | * [A nice article on Medium](https://medium.com/swlh/git-as-the-newbies-learning-steroid-963a2146220b) 15 | * [The Atlassian Git tutorial is VERY good](https://www.atlassian.com/git/tutorials/what-is-version-control) 16 | * [A nice, short video on why you want it and broadly how it works](https://www.youtube.com/watch?v=DqtZUvmPmo4) 17 | 18 | ## Using Git 19 | How can I use git? 20 | 21 | I would recommend the `Getting Started` and `Collaborating` sections of the [Atlassian Git tutorial](https://www.atlassian.com/git/tutorials/setting-up-a-repository). These give you all the commands that you'll need on a regular basis. 22 | 23 | In general, a git workflow is the following: 24 | 1. Someone sets up a central git repository in Bitbucket, GitHub, GitLab, or by some other means 25 | 2. The developers clone the repository 26 | 3. Each developer creates a branch for each feature they work on. Commits are made locally for small blocks of code. 27 | 4.
Once a feature is completed, the developer pushes the feature branch to the central repository and requests that it be merged into the master branch (for example, through a pull request). Other developers review the changes to make sure things look good, and everyone works together to resolve any merge conflicts. 28 | 5. Developers then merge or rebase their in-progress feature branches to pick up the new changes in the master branch of the central repository. 29 | 30 | By doing this, all the developers can work efficiently on their own parts of the project without stepping on anyone else's toes. It allows for easier collaboration and more productivity! 31 | 32 | You'll notice that this is a git repository in its own right, so you could clone it to your local machine and make changes to it and I would never know. However, if you had a fix or a suggestion that you'd like to make, you could make the edit, commit it, and then submit it to the central repository as a pull request. That's how a lot of open source software projects work: people develop useful features for their own purposes and then create pull requests so other people can use them as well. 33 | 34 | Happy Git-ing! 35 | -------------------------------------------------------------------------------- /fast.ai courses/machineLearning/Lesson0/Setup_tips.md: -------------------------------------------------------------------------------- 1 | # Tips before starting 2 | 3 | ### General notes 4 | * We are trying to fit a course that normally takes 12 weeks into about 3 weeks. Because of that, we're not going to go into as much detail with work outside of lectures, but I still think we can learn a lot here. 5 | * The idea here is not to make you an expert in one course, but to give you hands-on experience so that you know how to learn more if you choose to, or, for those of us in Berkeley, to get used to performing machine learning analyses in Python ahead of W207.
6 | * Overall, the goal for each lesson here is going to be: 7 | * Watch the lectures and don't worry too much if you don't understand everything on the first go 8 | * Pick your own dataset to replicate the analysis done in class (adapt the code where needed for this, but maintain the same overall structure) 9 | * If you don't already have a data set in mind, I will provide a few sample data sets that should behave similarly to those presented in class 10 | * We will hold virtual group sessions to share the progress we've made on our independently chosen data sets, learn from each other's approaches, reinforce what's been covered, ask questions, etc. 11 | * We have RocketChat as a collaboration forum for questions, banter, updates, etc. 12 | 13 | ### Python 14 | * This course will be using Python 3. Python has "base" functionality that is built into the language, and there are pre-written Python libraries that can be imported and used. This class will make heavy use of pre-written libraries, which make the various machine learning techniques easy to implement. 15 | * Common libraries for data science include numpy (optimized array computations), pandas (built on top of numpy; provides data tables that work much like Excel tables or R data frames), and sklearn (implementations of machine learning algorithms and associated functionality). 16 | * In Python, it's common to see long lines of code with many dots. Because Python is an object-oriented language, functions can be written as methods that belong to types of objects. If this is confusing, don't worry about it too much: it basically just means that each method after a dot is applied to the result of the code before that dot. For example...
17 | * `df["sales"].groupby(df.productcode).mean()` (take the `sales` column of `df`, group it by `productcode`, then compute the mean of each group) 18 | * A crash course on Python for data science can be found here: https://www.listendata.com/2017/05/python-data-science.html 19 | * A crash course on more general Python can be found here: http://pythonentresdias.blogspot.com/p/who-serves-this-blog-learn-python-in.html 20 | * __Overall, don't worry so much about being able to write your own code from scratch. We'll use the class examples with minor tweaks to understand the material better__ 21 | 22 | ### Jupyter 23 | * Jupyter notebooks are a convenient way to run Python code. The code is written into cells that can be run one at a time and in any order, although all cells share the same running Python session. 24 | * They can also be configured to run R and Julia code, but we will stick with Python for this class. 25 | 26 | ### Crestle 27 | * The course uses a pre-configured cloud service called Crestle for running Jupyter notebooks with a GPU as the compute engine. 28 | * I recommend that everyone create an account and use it. It comes pre-configured with the data needed for the course, and should already have the class notebooks on it. This will also save us the headache of replicating environments. 29 | * Note that Crestle will charge you $0.30 USD per hour of machine use. I doubt that you will spend over $20 over the duration of this course. 30 | * Crestle: https://www.crestle.ai/ 31 | 32 | ### GitHub 33 | * There is a GitHub repo with all the course materials. It may be useful to browse them outside of Crestle so you don't get charged.
34 | * repo: https://github.com/boudrejp/fastai/tree/master/courses/ml1 35 | 36 | -------------------------------------------------------------------------------- /DataScienceR/DataScienceProj.md: -------------------------------------------------------------------------------- 1 | # Your first data science project 2 | 3 | ## The Goal 4 | Use your knowledge of R to analyze a dataset, and apply some machine learning concepts to model a feature of the dataset. This is an example of *supervised machine learning*, because we have labeled data with our answers and we are trying to find a model that explains how the inputs can be used to come up with the output. Additional supporting material for machine learning concepts can be found in the DataScienceResources.md file. 5 | 6 | ### The dataset 7 | Good potential data sources: 8 | * [House price prediction, a regression example](https://www.kaggle.com/harlfoxem/housesalesprediction) 9 | * [Predicting if a mushroom is poisonous, a classification example](https://www.kaggle.com/uciml/mushroom-classification) 10 | * [Predicting defects in silicon wafer manufacturing](https://archive.ics.uci.edu/ml/datasets/SECOM) 11 | 12 | There are plenty of other sources if you want to get creative, but these are pretty straightforward in terms of what your target is. If you find one that you're interested in, just make sure it has one variable that is clearly the output. For now, we want to stay away from image recognition and natural language processing-type problems. 13 | 14 | ### How to do the project 15 | * Start by doing an exploratory data analysis. 16 | * Investigate your data. What kind of distributions are inherent in your data? Are there outliers, missing values, or nonsensical values? If so, how are you going to deal with them? Do any variables seem to be highly correlated with each other? Do we need to think about reducing the number of variables we are looking at?
There are a lot of questions to ask here, but the important ones may become more obvious as you work with the data. 17 | * Make use of visualizations! They can help a lot in understanding your data. 18 | * Helpful libraries: `dplyr` and `ggplot2` (please find a way to demonstrate your knowledge of both of these!) 19 | * __Deliverable__: Create a well-commented R script that shows everything you did for your exploratory data analysis, and why you did it. Use the `dplyr` and `ggplot2` libraries, along with any others you may find useful. 20 | * Next, create a script that will take your original data and preprocess it. 21 | * The preprocessing steps should follow from what you discovered during your initial data exploration. Does the data need to be scaled? Do some variables need to be treated as categorical instead of numerical? Do you need to "bin" some continuous variables (e.g. turning a continuous variable into categories for 1-5, 6-10, etc.)? There are lots of things that may make sense to do. 22 | * __Deliverable__: Create a well-commented R script that takes your original data and processes it into a form with useful features for modeling your output variable. 23 | * Finally, do some machine learning! 24 | * Now that you have a nice preprocessed data frame, test out a few different machine learning algorithms and see how well you can model your target variable! 25 | * Separate your data into training and testing sets. 26 | * Test out at least 3 different types of machine learning algorithms. Do some hyperparameter tuning to find optimal settings. For now, stay away from more complex neural networks (no CNNs or RNNs; a standard NN with fewer than 10 layers is OK if you can explain with comments how it works) 27 | * Suggestions (some may be applicable only to classification or only to regression, so read up!): random forest, decision tree, cubist, linear regression, Naive Bayes, kNN, SVM 28 | * Which one performed the best? What measure did you use to evaluate that (there are multiple!)?
Does there appear to be overfitting? How might overfitting change which algorithm you choose? 29 | * Helpful libraries: `caret` 30 | * __Deliverable__: An R script that uses the `caret` library to test out at least 3 machine learning algorithms with hyperparameter tuning. 31 | * __Deliverable__: Some sort of text document (Markdown preferred, but a Word doc, txt file, or PDF is fine) explaining why you chose the ML algorithms that you did, broadly how they work, what your results were, and how much you should trust your results. Include visuals such as ROC curves (with AUC), plots of y vs. y_predicted, etc. to help your explanation. 32 | -------------------------------------------------------------------------------- /CommandLine/.ipynb_checkpoints/Command_Line_Concepts-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Using the Command Line\n", 8 | "\n", 9 | "When working in the realm of data science, you need to be comfortable using your computer in ways that a typical user generally wouldn't. You might even use virtual machines or other devices that have no graphical user interface; the only way to use them will be through a command-line interface like Windows Powershell, Command Prompt, or Bash.\n", 10 | "\n", 11 | "In general, there are two types of commands you will need to know: __Windows__ commands, which are used in the Windows Command Prompt and Powershell, and __Bash__ commands, which are used on macOS and Linux devices." 12 | ] 13 | }, 14 | { 15 | "cell_type": "markdown", 16 | "metadata": {}, 17 | "source": [ 18 | "## Windows Commands\n", 19 | "\n", 20 | "Windows commands follow the same general framework as bash commands, but often use different names. Fundamentally, they provide the same functionality. Since Windows machines are more common in the enterprise, I will start with Windows.
If you are working solely on a Mac, this section won't apply to you." 21 | ] 22 | }, 23 | { 24 | "cell_type": "markdown", 25 | "metadata": {}, 26 | "source": [ 27 | "Step 1: Open a Powershell window (the commands below are Powershell commands; several of them, such as `ls`, `cat`, and `New-Item`, do not work in Cmd.exe)" 28 | ] 29 | }, 30 | { 31 | "cell_type": "markdown", 32 | "metadata": { 33 | "collapsed": true 34 | }, 35 | "source": [ 36 | "### Navigating directories + clear commands\n", 37 | "\n", 38 | "You will see the current active directory on the left, with a `>` character followed by a blinking cursor. \n", 39 | "\n", 40 | "To see what files and directories are in the current directory, use the command `ls`. This will display all relevant files and directories.\n", 41 | "\n", 42 | "![ls example](./resources/ls.JPG)\n", 43 | "\n", 44 | "To change the directory you are working in, use the `cd` command. Note that you can give a relative file path or an absolute one. For relative file paths, use `./` to reference the current directory. To go to a parent directory by a relative reference, use `../`. For absolute file paths, use the full file path name. See below for some examples of moving around directories.\n", 45 | "\n", 46 | "![cd example](./resources/cd.JPG)\n", 47 | "\n", 48 | "You may be in a directory and want to see the contents of a file. To do this, use the `cat` command, which prints the contents of a file in the command prompt.\n", 49 | "\n", 50 | "![cat example](./resources/cat.JPG)\n", 51 | "\n", 52 | "At this point, you may have a lot of text on your screen. To clear your screen, use the `clear` command. You will stay in your current directory; only the text output is cleared off the screen.\n", 53 | "\n" 54 | ] 55 | }, 56 | { 57 | "cell_type": "markdown", 58 | "metadata": {}, 59 | "source": [ 60 | "### More advanced commands\n", 61 | "\n", 62 | "If you're ever confused about what a command does, use the `man` command, followed by the command in question.
For example, if I wanted the documentation on `cat`, I would use `man cat`.\n", 63 | "\n", 64 | "![man cat](./resources/man%20cat.JPG)" 65 | ] 66 | }, 67 | { 68 | "cell_type": "markdown", 69 | "metadata": {}, 70 | "source": [ 71 | "If I had some string that I wanted to put into a text file, I could do that with the `echo` command. `echo` normally just prints the text back at you in the command prompt. However, if you have a particular file you want the text to go to, you can use `echo \"some text to add\" > file_to_contain_text.txt`. Note that this works even if `file_to_contain_text.txt` doesn't exist yet. \n", 72 | "\n", 73 | "If `file_to_contain_text.txt` already exists, using the `>` operator will overwrite all text in the file and you will end up with only the new text. If you want the new text appended after the old text, use the `>>` operator.\n", 74 | "\n", 75 | "![echo ex 1](./resources/echo1.JPG)\n", 76 | "![echo ex 2](./resources/echo2.JPG)\n" 77 | ] 78 | }, 79 | { 80 | "cell_type": "markdown", 81 | "metadata": {}, 82 | "source": [ 83 | "I also have the option of making an empty file first and then funneling text into it. To create an empty file, I would use the `New-Item` command in Powershell. *Note: this is the first example of a command that is different in Windows compared to bash. The corresponding bash command is `touch`.*\n", 84 | "\n", 85 | "![new-item](./resources/new-item.JPG)\n", 86 | "Notice that when I use `cat` here, there is no output. That's because we created an empty file!\n", 87 | "![new-item2](./resources/new-item2.JPG)\n", 88 | "\n", 89 | "If I want to copy a file, I can use the `cp` command.
It takes two arguments: the file to copy and the destination to copy it to.\n", 90 | "\n", 91 | "![cp](./resources/cp.JPG)\n" 92 | ] 93 | }, 94 | { 95 | "cell_type": "markdown", 96 | "metadata": {}, 97 | "source": [ 98 | "At this point, we've created several files that we don't need. Let's delete them using the `rm` command.\n", 99 | "\n", 100 | "![rm](./resources/rm.JPG)\n", 101 | "\n", 102 | "Be careful with `rm`: files it deletes do not go to the Recycle Bin, and you can accidentally delete important files and ruin your whole PC!\n", 103 | "\n", 104 | "### Congrats!!!\n", 105 | "\n", 106 | "At this point, you probably know enough command line to be dangerous for most of the things you'll be doing. " 107 | ] 108 | }, 109 | { 110 | "cell_type": "code", 111 | "execution_count": null, 112 | "metadata": { 113 | "collapsed": true 114 | }, 115 | "outputs": [], 116 | "source": [] 117 | } 118 | ], 119 | "metadata": { 120 | "kernelspec": { 121 | "display_name": "Python 3", 122 | "language": "python", 123 | "name": "python3" 124 | }, 125 | "language_info": { 126 | "codemirror_mode": { 127 | "name": "ipython", 128 | "version": 3 129 | }, 130 | "file_extension": ".py", 131 | "mimetype": "text/x-python", 132 | "name": "python", 133 | "nbconvert_exporter": "python", 134 | "pygments_lexer": "ipython3", 135 | "version": "3.6.3" 136 | } 137 | }, 138 | "nbformat": 4, 139 | "nbformat_minor": 2 140 | } 141 | -------------------------------------------------------------------------------- /CommandLine/Command_Line_Concepts.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Using the Command Line\n", 8 | "\n", 9 | "In data science, you need to be comfortable using your computer in ways that a typical user might not. 
You might even use virtual machines or other devices that have no graphical user interface; the only way to use them is through a command-line interface such as Windows PowerShell, Command Prompt, or Bash.\n", 10 | "\n", 11 | "In general, there are two types of commands you will need to know: __Windows__ commands, which are used in the Windows Command Prompt and PowerShell, and __Bash__ commands, which are used on Mac and Linux machines." 12 | ] 13 | }, 14 | { 15 | "cell_type": "markdown", 16 | "metadata": {}, 17 | "source": [ 18 | "## Windows Commands\n", 19 | "\n", 20 | "Windows commands follow the same general framework as bash commands but often use different names. Fundamentally, they provide the same functionality. Since Windows machines are more common in enterprise settings, I will start with Windows. If you work only on a Mac, this section won't apply to you." 21 | ] 22 | }, 23 | { 24 | "cell_type": "markdown", 25 | "metadata": {}, 26 | "source": [ 27 | "Step 1: Open a PowerShell window. The commands below rely on PowerShell's built-in aliases such as `ls`, `cat`, and `cp`; Cmd.exe uses different commands for these tasks (for example `dir`, `type`, and `copy`)." 28 | ] 29 | }, 30 | { 31 | "cell_type": "markdown", 32 | "metadata": { 33 | "collapsed": true 34 | }, 35 | "source": [ 36 | "### Navigating directories + clear commands\n", 37 | "\n", 38 | "You will see the current active directory on the left, with a `>` character followed by a blinking cursor. \n", 39 | "\n", 40 | "To list the files and directories in the current directory, use the `ls` command.\n", 41 | "\n", 42 | "![ls example](https://github.com/boudrejp/BeginningDataScience/blob/master/CommandLine/resources/ls.JPG?raw=true)\n", 43 | "\n", 44 | "To change the directory you are working in, use the `cd` command. You can give either a relative or an absolute file path. In a relative path, `./` refers to the current directory and `../` refers to its parent directory. For an absolute path, give the full path name. 
See below for some examples of moving around directories.\n", 45 | "\n", 46 | "![cd example](https://github.com/boudrejp/BeginningDataScience/blob/master/CommandLine/resources/cd.JPG?raw=true)\n", 47 | "\n", 48 | "To see the contents of a file, use the `cat` command, which prints the file's contents in the command prompt.\n", 49 | "\n", 50 | "![cat example](https://github.com/boudrejp/BeginningDataScience/blob/master/CommandLine/resources/cat.JPG?raw=true)\n", 51 | "\n", 52 | "At this point, you may have a lot of text on your screen. To clear it, use the `clear` command. You will stay in your current directory; only the text output is cleared from the screen.\n", 53 | "\n" 54 | ] 55 | }, 56 | { 57 | "cell_type": "markdown", 58 | "metadata": {}, 59 | "source": [ 60 | "### More advanced commands\n", 61 | "\n", 62 | "If you're ever unsure what a command does, use the `man` command followed by the name of the command. For example, to see the documentation for `cat`, use `man cat`.\n", 63 | "\n", 64 | "![man cat](https://github.com/boudrejp/BeginningDataScience/blob/master/CommandLine/resources/man%20cat.JPG?raw=true)" 65 | ] 66 | }, 67 | { 68 | "cell_type": "markdown", 69 | "metadata": {}, 70 | "source": [ 71 | "To put a string into a text file, use the `echo` command. On its own, `echo` just prints the text back to you in the command prompt. To send the text to a file instead, redirect it with `>`, as in `echo \"some text to add\" > file_to_contain_text.txt`. This works even if `file_to_contain_text.txt` does not exist yet; the file is created for you. \n", 72 | "\n", 73 | "If `file_to_contain_text.txt` already exists, the `>` operator overwrites its contents, leaving only the new text. 
To append the new text after the existing text instead, use the `>>` operator.\n", 74 | "\n", 75 | "![echo ex 1](https://github.com/boudrejp/BeginningDataScience/blob/master/CommandLine/resources/echo1.JPG?raw=true)\n", 76 | "![echo ex 2](https://github.com/boudrejp/BeginningDataScience/blob/master/CommandLine/resources/echo2.JPG?raw=true)\n" 77 | ] 78 | }, 79 | { 80 | "cell_type": "markdown", 81 | "metadata": {}, 82 | "source": [ 83 | "I can also create an empty file first and then send text into it. To create an empty file in PowerShell, use the `New-Item` command. *Note: this is the first example of a command that differs between Windows and bash; the corresponding bash command is `touch`.*\n", 84 | "\n", 85 | "![new-item](https://github.com/boudrejp/BeginningDataScience/blob/master/CommandLine/resources/new-item.JPG?raw=true)\n", 86 | "Notice that when I use `cat` on the new file, there is no output. That's because the file is empty!\n", 87 | "![new-item2](https://github.com/boudrejp/BeginningDataScience/blob/master/CommandLine/resources/new-item2.JPG?raw=true)\n", 88 | "\n", 89 | "If I want to copy a file, I can use the `cp` command. It takes two arguments: the file to copy and the destination to copy it to.\n", 90 | "\n", 91 | "![cp](https://github.com/boudrejp/BeginningDataScience/blob/master/CommandLine/resources/cp.JPG?raw=true)\n" 92 | ] 93 | }, 94 | { 95 | "cell_type": "markdown", 96 | "metadata": {}, 97 | "source": [ 98 | "At this point, we've created several files that we don't need. Let's delete them using the `rm` command.\n", 99 | "\n", 100 | "![rm](https://github.com/boudrejp/BeginningDataScience/blob/master/CommandLine/resources/rm.JPG?raw=true)\n", 101 | "\n", 
102 | "Be careful with `rm`: files it deletes do not go to the Recycle Bin, and you can accidentally delete important files and ruin your whole PC!\n", 103 | "\n", 104 | "### Congrats!!!\n", 105 | "\n", 106 | "At this point, you probably know enough command line to be dangerous for most of the things you'll be doing. " 107 | ] 108 | }, 109 | { 110 | "cell_type": "code", 111 | "execution_count": null, 112 | "metadata": { 113 | "collapsed": true 114 | }, 115 | "outputs": [], 116 | "source": [] 117 | } 118 | ], 119 | "metadata": { 120 | "kernelspec": { 121 | "display_name": "Python 3", 122 | "language": "python", 123 | "name": "python3" 124 | }, 125 | "language_info": { 126 | "codemirror_mode": { 127 | "name": "ipython", 128 | "version": 3 129 | }, 130 | "file_extension": ".py", 131 | "mimetype": "text/x-python", 132 | "name": "python", 133 | "nbconvert_exporter": "python", 134 | "pygments_lexer": "ipython3", 135 | "version": "3.6.3" 136 | } 137 | }, 138 | "nbformat": 4, 139 | "nbformat_minor": 2 140 | } 141 | --------------------------------------------------------------------------------