├── .gitignore ├── 01-intro └── 01-introduction.pdf ├── 02-basic-python ├── 02-basic-python.ipynb ├── 02-exercises.ipynb ├── 02-version-control.ipynb ├── anaconda_navigator.png ├── datasciencecat.jpg ├── exercise.py ├── first_steps.py └── newrepo.png ├── 03-basic-python-II ├── lecture-3-basic-python-II.ipynb ├── lecture-3-exercises.ipynb └── patric_recursive.gif ├── 04-intro-desc-stat ├── 04-DescriptiveStatistics.ipynb ├── 04-DescriptiveStatistics_Activity.ipynb ├── Conf1.png ├── Conf2.png ├── Correlation_examples2.svg ├── SmallLargeStandDev.png ├── correlation.png ├── global_warming.csv ├── purity.png └── test_data.csv ├── 05-dictionaries-pandas-series ├── 05-dictionaries-pandas-series.ipynb └── 05-exercises.ipynb ├── 06-loading-data-dataframes ├── 06-exercises.ipynb ├── 06-loading-data-dataframes.ipynb ├── grades.csv ├── hit_albums.csv └── my_file.txt ├── 07-hypothesis-testing ├── 07-hypothesis-testing.ipynb ├── Cohen1994.pdf ├── InferenceErrors.png ├── Nuzzo2014.pdf └── determinePvals.png ├── 08-data-science-ethics └── Data-Science-Ethics.pdf ├── 09-LinearRegression1 ├── 08-LinearRegression1.ipynb ├── 08-LinearRegression1_Activity.ipynb ├── 438px-Linear_regression.png ├── Advertising.csv └── SLR.pdf ├── 10-LinearRegression2 ├── 438px-Linear_regression.png ├── 9-LinearRegression2.ipynb ├── 9-LinearRegression2_Activity.ipynb ├── Advertising.csv ├── Auto.csv ├── Credit.csv └── Overfitted_Data.png ├── 11-practical-data-visualization ├── 10-exercise.ipynb ├── 10-practical_visualization.ipynb ├── movies.csv ├── stacked_hist.png └── standards.png ├── 12-vis-principles └── 11-vis-principles.pdf ├── 13-web-scraping ├── 13-exercise-scraping.ipynb ├── 13-web-scraping.ipynb ├── class.png ├── class_list.html ├── class_schedule.html ├── inspector.png ├── lyrics.html ├── requests.png └── sampledevtools.png ├── 14-apis ├── 14-apis-iss-and.ipynb ├── 14-exercise-apis.ipynb ├── credentials.py ├── pokeapiscreenshot.png ├── pokemonendpoint.png └── requests.png ├── 15-Classification1 ├── 15-Classification-1-Decision-Trees.ipynb ├── 15-Classification-1-kNN.ipynb ├── BiasVarianceTradeoff.png ├── BinaryConfusinoMatrix.png ├── ConfusionMatrix.png ├── iris.png ├── oc-tree.jpeg ├── p_sets.png ├── scikit-learn-logo.png ├── temp.dot ├── temp.png ├── titanic.csv └── titanic_tree.png ├── 16-Classification2 ├── 16-Classification2-SVM.ipynb ├── 4fold_CV.png ├── SVM-Tutorial.pdf └── iris.png ├── 17-NLP-RegEx ├── lecture-21-NLP.ipynb ├── lecture-21-exercise.ipynb ├── lecture-21-regex.ipynb └── mod_squad.png ├── 18-Clustering1 ├── 18-Clustering1-Exercise.ipynb ├── 18-Clustering1.ipynb ├── k-means-fig.png └── lloyd.png ├── 19-Clustering2 ├── 19-Clustering2-Exercise.ipynb ├── 19-Clustering2.ipynb ├── ComparisonOfClusteringMethods.png ├── DBScan.png ├── connectivity_plot1.png ├── connectivity_plot2.png ├── dendrogram.png ├── hc_1_homogeneous_complete.png ├── hc_2_homogeneous_not_complete.png ├── hc_3_complete_not_homogeneous.png ├── hierarchical_clustering_1.png ├── hierarchical_clustering_2.png └── lloyd.png ├── 20-DimReduction ├── 20-DimReduction-Activity.ipynb ├── 20-DimReduction.ipynb ├── heptathlon.csv ├── rnamix1_SCT.csv └── rnamix1_labels.csv ├── 21-NeuralNetwork1 ├── 21-NeuralNetworks1.ipynb ├── Colored_neural_network.svg ├── ImageNetPlot.png ├── mnist-original.mat.zip ├── neuralnetworks.png └── perceptron.png ├── 22-NeuralNetworks2 ├── 22-NeuralNetworks2-activity.ipynb ├── 22-NeuralNetworks2.ipynb ├── Colored_neural_network.svg ├── activationFct.png ├── beginner.ipynb ├── graph.png ├── images │ ├── brodie.jpeg │ ├── 
layla1.jpeg │ ├── scout1.jpeg │ ├── scout2.jpeg │ ├── scout3.jpeg │ ├── scout4.jpeg │ └── scout5.jpeg └── nature14539.pdf ├── 23-databases ├── 23-databases-exercises.ipynb ├── 23-databases.ipynb ├── albums_tracks_tables.jpg ├── backup_chinook.db ├── chinook.db ├── database_schema.png └── exploits_of_a_mom.png ├── 24-networks ├── 24-network-exercise-activity.ipynb ├── 24-networks-slides.pdf ├── 24-networks.ipynb ├── 24-path-search.ipynb ├── Astar_progress_animation.gif ├── Dijkstra_Animation.gif ├── Dijkstras_progress_animation.gif ├── bread.png └── lesmis.gml ├── LICENSE └── README.md /.gitignore: -------------------------------------------------------------------------------- 1 | # Byte-compiled / optimized / DLL files 2 | __pycache__/ 3 | *.py[cod] 4 | *$py.class 5 | 6 | # C extensions 7 | *.so 8 | 9 | # Distribution / packaging 10 | .Python 11 | env/ 12 | build/ 13 | develop-eggs/ 14 | dist/ 15 | downloads/ 16 | eggs/ 17 | .eggs/ 18 | lib/ 19 | lib64/ 20 | parts/ 21 | sdist/ 22 | var/ 23 | wheels/ 24 | *.egg-info/ 25 | .installed.cfg 26 | *.egg 27 | 28 | # PyInstaller 29 | # Usually these files are written by a python script from a template 30 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 31 | *.manifest 32 | *.spec 33 | 34 | # Installer logs 35 | pip-log.txt 36 | pip-delete-this-directory.txt 37 | 38 | # Unit test / coverage reports 39 | htmlcov/ 40 | .tox/ 41 | .coverage 42 | .coverage.* 43 | .cache 44 | nosetests.xml 45 | coverage.xml 46 | *.cover 47 | .hypothesis/ 48 | 49 | # Translations 50 | *.mo 51 | *.pot 52 | 53 | # Django stuff: 54 | *.log 55 | local_settings.py 56 | 57 | # Flask stuff: 58 | instance/ 59 | .webassets-cache 60 | 61 | # Scrapy stuff: 62 | .scrapy 63 | 64 | # Sphinx documentation 65 | docs/_build/ 66 | 67 | # PyBuilder 68 | target/ 69 | 70 | # Jupyter Notebook 71 | .ipynb_checkpoints 72 | 73 | # pyenv 74 | .python-version 75 | 76 | # celery beat schedule file 77 | celerybeat-schedule 78 | 79 | # SageMath parsed files 80 | *.sage.py 81 | 82 | # dotenv 83 | .env 84 | 85 | # virtualenv 86 | .venv 87 | venv/ 88 | ENV/ 89 | 90 | # Spyder project settings 91 | .spyderproject 92 | .spyproject 93 | 94 | # Rope project settings 95 | .ropeproject 96 | 97 | # mkdocs documentation 98 | /site 99 | 100 | # mypy 101 | .mypy_cache/ 102 | 103 | # MacOS saving finder state 104 | .DS_Store 105 | 106 | # vim related 107 | *~ 108 | *.sw* 109 | -------------------------------------------------------------------------------- /01-intro/01-introduction.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/01-intro/01-introduction.pdf -------------------------------------------------------------------------------- /02-basic-python/02-exercises.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Excercises for Lecture 2\n", 8 | "\n", 9 | "Your Name: \n", 10 | "Your UID: \n", 11 | "Your E-Mail: " 12 | ] 13 | }, 14 | { 15 | "cell_type": "markdown", 16 | "metadata": {}, 17 | "source": [ 18 | "## Exercise 1: Data types and operations\n", 19 | "Play around with data types and operations. Try the following things:\n", 20 | "\n", 21 | "1. Define two variables and assign an integer to the first and a float to the second. 
Define a new variable and assign the sum of the previous two variables. What's the data type of the third variable?\n", 22 | "2. Reassign a variable with a different data type, e.g., take one of your numerical variables and assign a string to it. What's the new data type?\n", 23 | "3. See what happens if you try to add a string to a string.\n", 24 | "4. See what happens if you add a string to a float or an integer." 25 | ] 26 | }, 27 | { 28 | "cell_type": "code", 29 | "execution_count": null, 30 | "metadata": {}, 31 | "outputs": [], 32 | "source": [ 33 | "# your code" 34 | ] 35 | }, 36 | { 37 | "cell_type": "markdown", 38 | "metadata": {}, 39 | "source": [ 40 | "## Exercise 2: Running Programs\n", 41 | "\n", 42 | " * Create a new Python file and use the `double_number` function as a template. Modify the code to add two numbers instead of doubling a single number.\n", 43 | " * Can you guess what would happen if you change the indentation? Try it out.\n", 44 | " * See what happens if you print `a` at the very end of the program. Can you explain what's going on?\n", 45 | " \n", 46 | "If you want to submit this activity, paste the content of your Python file here: " 47 | ] 48 | }, 49 | { 50 | "cell_type": "code", 51 | "execution_count": 1, 52 | "metadata": {}, 53 | "outputs": [], 54 | "source": [ 55 | "# your code" 56 | ] 57 | }, 58 | { 59 | "cell_type": "markdown", 60 | "metadata": {}, 61 | "source": [ 62 | "## Exercise 3: Creating Cells, Executing Code\n", 63 | "\n", 64 | "1. Create a new code cell below where you define variables containing your name, your age in years, and your major.\n", 65 | "2. Create another cell that uses these variables and prints a concatenated string stating your name, major, and your age in years, months, and days (assuming today is your birthday). The output should look like this:\n", 66 | "\n", 67 | "```\n", 68 | "Name: Science Cat, Major: Computer Science, Age: 94 years, or 1128 months, or 34310 days. \n", 69 | "```" 70 | ] 71 | }, 72 | { 73 | "cell_type": "markdown", 74 | "metadata": {}, 75 | "source": [ 76 | "## Exercise 4: Functions\n", 77 | "Write a function that \n", 78 | " * takes two numerical variables\n", 79 | " * multiplies them together\n", 80 | " * divides the product by a numerical variable defined in the scope outside the function\n", 81 | " * and returns the result. \n", 82 | " \n", 83 | "Print the result of the function for three different sets of input variables. 
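One possible sketch (the function and variable names here are illustrative, not prescribed by the exercise):

```python
divisor = 2  # numerical variable defined in the scope outside the function

def multiply_and_divide(x, y):
    # multiply the two arguments, then divide by the outer-scope variable
    return x * y / divisor

print(multiply_and_divide(3, 4))    # 6.0
print(multiply_and_divide(10, 5))   # 25.0
print(multiply_and_divide(7, 2.5))  # 8.75
```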
" 84 | ] 85 | }, 86 | { 87 | "cell_type": "code", 88 | "execution_count": null, 89 | "metadata": {}, 90 | "outputs": [], 91 | "source": [] 92 | } 93 | ], 94 | "metadata": { 95 | "kernelspec": { 96 | "display_name": "Python 3 (ipykernel)", 97 | "language": "python", 98 | "name": "python3" 99 | }, 100 | "language_info": { 101 | "codemirror_mode": { 102 | "name": "ipython", 103 | "version": 3 104 | }, 105 | "file_extension": ".py", 106 | "mimetype": "text/x-python", 107 | "name": "python", 108 | "nbconvert_exporter": "python", 109 | "pygments_lexer": "ipython3", 110 | "version": "3.9.13" 111 | } 112 | }, 113 | "nbformat": 4, 114 | "nbformat_minor": 4 115 | } 116 | -------------------------------------------------------------------------------- /02-basic-python/02-version-control.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Introduction to Data Science – Lecture 2 – Git and GitHub\n", 8 | "*COMP 5360 / MATH 4100, University of Utah, http://datasciencecourse.net/*\n", 9 | "\n", 10 | "In this lecture, we will learn about version control. We'll look at a couple of general principles and then go into the specifics of git and also GitHub. We'll also look at features of GitHub such as issue tracking. We strongly recommend that you use proper version control for your final project. " 11 | ] 12 | }, 13 | { 14 | "cell_type": "markdown", 15 | "metadata": {}, 16 | "source": [ 17 | "## Why version Control?\n", 18 | " \n", 19 | " * **Keep copies of multiple states of files** \n", 20 | " By committing you record a state of the file to which you can go back any time.\n", 21 | " * **Create alternative states** \n", 22 | " Imagine you just want to try out something, but you realize you have to modify multiple files. You're not sure whether it works or is worth it. With version control you can just create a **branch** where you can experiment or develop new features without changing the main or other branches.\n", 23 | " * **Collaborate in teams** \n", 24 | " Nobody wants to send code via e-mail or share via Dropbox. If two people work on a file at the same time it's unclear how to merge the code. Version control lets you keep your code in a shared central location and has dedicated ways to merge and deal with conflicts. \n", 25 | " * **Keep your work safe** \n", 26 | " Your hard drive breaks. Your computer is stolen. But your code is safe because you store it not only on your computer but also on a remote server. \n", 27 | " * **Share** \n", 28 | " You developed something awesome and want to share it. But not only do you want to make it available, you're also happy about contributions from others! \n", 29 | "\n", 30 | "\n", 31 | "## git\n", 32 | "\n", 33 | " * Created by Linus Torvalds, 2005\n", 34 | " * Meaning: British English slang roughly equivalent to \"unpleasant person\". \n", 35 | " * git – the stupid content tracker.\n", 36 | "\n", 37 | "*I'm an egotistical bastard, and I name all my projects after myself. First 'Linux', now 'git'. -- Linus Torvalds*\n", 38 | "\n", 39 | "## Why git?\n", 40 | "\n", 41 | " * Popular ([~60-90% of open source projects](https://rhodecode.com/insights/version-control-systems-2016))\n", 42 | " * Truly distributed\n", 43 | " * Very fast\n", 44 | " * Everything is local\n", 45 | " * Free\n", 46 | " * Safe against corruptions\n", 47 | " * GitHub!" 
48 | ] 49 | }, 50 | { 51 | "cell_type": "markdown", 52 | "metadata": {}, 53 | "source": [ 54 | "## Installation\n", 55 | "\n", 56 | "See the [official documentation](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git) on how to install git on your operating system. Or see the [GitHub Documentation]( \n", 57 | " https://help.github.com/en/github/getting-started-with-github/set-up-git).\n", 58 | "\n", 59 | "On Mac, install the XCode package from the app store. \n", 60 | "\n", 61 | "On Windows, see the above link, or install [GitHub Desktop](https://desktop.github.com/) which includes a git shell." 62 | ] 63 | }, 64 | { 65 | "cell_type": "markdown", 66 | "metadata": {}, 67 | "source": [ 68 | "# Working with GitHub\n", 69 | "\n", 70 | "First, we'll create a new repository on github by going to [https://github.com/new](https://github.com/new). \n", 71 | "\n", 72 | "![New repo interface on GitHub](newrepo.png). \n", 73 | "\n", 74 | "We'll also create a README.md and LICENSE file. " 75 | ] 76 | }, 77 | { 78 | "cell_type": "markdown", 79 | "metadata": {}, 80 | "source": [ 81 | "## GUI Clients\n", 82 | "\n", 83 | "* **GitHub Desktop** \n", 84 | " Good option if you want a GUI client. [Download here](https://desktop.github.com/)\n", 85 | "* **Integrated in IDEs** \n", 86 | " Many operations can be done out of a IDE such as WebStorm \n", 87 | " " 88 | ] 89 | }, 90 | { 91 | "cell_type": "markdown", 92 | "metadata": {}, 93 | "source": [ 94 | "## Command Line Interface\n", 95 | "\n", 96 | "Now let's clone the repository from GitHub.\n", 97 | " \n", 98 | "```bash\n", 99 | "$ git clone https://github.com/Fericiano-Fluorite/demo.git\n", 100 | "```\n", 101 | "\n", 102 | "This creates a local copy of the GitHub repository. We will just start working with that and commit and push the code to the server. \n", 103 | "\n", 104 | "\n", 105 | "```bash\n", 106 | "# What's currently in the repository?\n", 107 | "$ cd Demo\n", 108 | "$ ls\n", 109 | "LICENSE README.md\n", 110 | "```\n", 111 | "Write something to demo.txt.\n", 112 | "\n", 113 | "```bash\n", 114 | "$ echo \"Hello world!\" > demo.txt\n", 115 | "echo \"Hello world\" > demo.txt\n", 116 | "```\n", 117 | "Add demo.txt to the repository.\n", 118 | "```bash\n", 119 | "$ git add demo.txt\n", 120 | "```\n", 121 | "Commit the file to the repository.\n", 122 | "\n", 123 | "```bash\n", 124 | "$ git commit -a -m \"added demo file\" \n", 125 | "[master 2e1918d] added demo file\n", 126 | " 1 file changed, 1 insertion(+)\n", 127 | " create mode 100644 demo.txt\n", 128 | "```\n", 129 | "\n", 130 | "**Pushing it to the server!**\n", 131 | "\n", 132 | "```bash\n", 133 | "$ git push \n", 134 | "Counting objects: 3, done.\n", 135 | "Delta compression using up to 8 threads.\n", 136 | "Compressing objects: 100% (2/2), done.\n", 137 | "Writing objects: 100% (3/3), 324 bytes | 0 bytes/s, done.\n", 138 | "Total 3 (delta 0), reused 0 (delta 0)\n", 139 | "To https://github.com/alexsb/demo.git\n", 140 | " 8e1ecd1..2e1918d master -> master\n", 141 | "```\n", 142 | "\n", 143 | "We have now committed a file locally and pushed it to the server, i.e., our local copy is in sync with the server copy. \n", 144 | "\n", 145 | "Note that the `git push` command uses the origin defined in the config file. You can also push to other repositories!" 146 | ] 147 | }, 148 | { 149 | "cell_type": "markdown", 150 | "metadata": {}, 151 | "source": [ 152 | "Next, we will make changes at another place. We'll use the **GitHub web interface** to do that. 
\n", 153 | "\n", 154 | "Once these changes are done, our local repository is out of sync with the remote repository. To get these changes locally, we have to pull from the repository:\n", 155 | "\n", 156 | "```bash\n", 157 | "$ git pull\n", 158 | "remote: Counting objects: 3, done.\n", 159 | "remote: Compressing objects: 100% (2/2), done.\n", 160 | "remote: Total 3 (delta 1), reused 0 (delta 0), pack-reused 0\n", 161 | "Unpacking objects: 100% (3/3), done.\n", 162 | "From https://github.com/alexsb/demo\n", 163 | " 2e1918d..5dd3090 master -> origin/master\n", 164 | "Updating 2e1918d..5dd3090\n", 165 | "Fast-forward\n", 166 | " demo.txt | 1 +\n", 167 | " 1 file changed, 1 insertion(+)\n", 168 | "``` \n", 169 | " \n", 170 | "Let's see whether the changes are here \n", 171 | "```bash\n", 172 | "$ cat demo.txt \n", 173 | "Hello world\n", 174 | "Are you still spinning?\n", 175 | "```" 176 | ] 177 | }, 178 | { 179 | "cell_type": "markdown", 180 | "metadata": {}, 181 | "source": [ 182 | "## Handling Conflicts \n", 183 | "\n", 184 | "If we make local and remote changes simultaneously, we will get into conflicted state. \n", 185 | "\n", 186 | "Let's go to the web interface and make a change to the `demo.txt`. \n", 187 | "\n", 188 | "At the same time, we add a line to the demo.txt locally:\n", 189 | "\n", 190 | "```bash\n", 191 | "$ echo \"One more line\" >> demo.txt\n", 192 | "```\n", 193 | "\n", 194 | "If we now pull from GitHub, we're in trouble: \n", 195 | "\n", 196 | "```bash\n", 197 | "$ git pull\n", 198 | "remote: Enumerating objects: 3, done.\n", 199 | "remote: Counting objects: 100% (3/3), done.\n", 200 | "remote: Total 3 (delta 0), reused 0 (delta 0), pack-reused 0\n", 201 | "Unpacking objects: 100% (3/3), done.\n", 202 | "From https://github.com/alexsb/demo\n", 203 | " 55da8f8..b03d9b1 master -> origin/master\n", 204 | "Updating 55da8f8..b03d9b1\n", 205 | "error: Your local changes to the following files would be overwritten by merge:\n", 206 | "\tdemo.txt\n", 207 | "Please commit your changes or stash them before you merge.\n", 208 | "Aborting\n", 209 | "```\n", 210 | "\n", 211 | "The right way to handle this conflict is to first commit locally: \n", 212 | "\n", 213 | "```bash\n", 214 | "$ git commit -a -m \"added a line\" \n", 215 | "[master 5589619] added a line\n", 216 | "1 file changed, 1 insertion(+)\n", 217 | "```\n", 218 | "\n", 219 | "Then to pull: \n", 220 | "\n", 221 | "```bash\n", 222 | "$ git pull\n", 223 | "Auto-merging demo.txt\n", 224 | "CONFLICT (content): Merge conflict in demo.txt\n", 225 | "Automatic merge failed; fix conflicts and then commit the result.\n", 226 | "```\n", 227 | "\n", 228 | "And fix the conflict in the file. Then commit and push. \n", 229 | "\n", 230 | "However, this is tricky with Jupyter Notebooks and our setup of lectures. \n", 231 | "\n", 232 | "For example, if you make local changes and we update the lecture later, resolving the conflict will not be easy.\n", 233 | "\n", 234 | "Another approach is to override your local changes and accept all changes from the source. This will **delete everything you've done in your local file**. \n", 235 | "\n", 236 | "```bash\n", 237 | "$ git checkout demo.txt\n", 238 | "```\n", 239 | "\n", 240 | "This overrides all changes to demo.txt. " 241 | ] 242 | }, 243 | { 244 | "cell_type": "markdown", 245 | "metadata": {}, 246 | "source": [ 247 | "## Jupyter Notebooks and Git\n", 248 | "\n", 249 | "Unfortunately, Jupyter Notebooks aren't handled well by git, as they mix code and output in the jupyter notebook file. 
\n", 250 | "\n", 251 | "Let's take a quick look at a notebook file [this is edited and cut]: \n", 252 | "\n", 253 | "```json\n", 254 | "{\n", 255 | " \"cells\": [\n", 256 | " {\n", 257 | " \"cell_type\": \"markdown\",\n", 258 | " \"metadata\": {},\n", 259 | " \"source\": [\n", 260 | " \"# Introduction to Data Science, CS 5963 / Math 3900\\n\",\n", 261 | " \"*CS 5963 / MATH 3900, University of Utah, http://datasciencecourse.net/* \\n\",\n", 262 | " \"\\n\",\n", 263 | " \"## Lab 10: Classification\\n\",\n", 264 | " \"\\n\",\n", 265 | " \"In this lab, we will use the [scikit-learn](http://scikit-learn.org/) library to revisit the three classification methods we introduced: K-nearest neighbor, decision trees, and support vector machines. We will use a [dataset on contraceptive methods in Indonesia](https://archive.ics.uci.edu/ml/datasets/Contraceptive+Method+Choice).\\n\"\n", 266 | " ]\n", 267 | " },\n", 268 | " {\n", 269 | " \"cell_type\": \"markdown\",\n", 270 | " \"metadata\": {},\n", 271 | " \"source\": [\n", 272 | " \"## The Data\\n\",\n", 273 | " \"\\n\",\n", 274 | " \"We will explore a dataset about the use of contraception in Indonesia. The dataset has 1473 records and the following attributes:\\n\",\n", 275 | " \"\\n\",\n", 276 | " \"1. Woman's age (numerical) \\n\",\n", 277 | " \"2. Woman's education (categorical) 1=low, 2, 3, 4=high \\n\",\n", 278 | " \"3. Husband's education (categorical) 1=low, 2, 3, 4=high \\n\",\n", 279 | " \"4. Number of children ever born (numerical) \\n\",\n", 280 | " \"5. Woman's religion (binary) 0=Non-Islam, 1=Islam \\n\",\n", 281 | " \"6. Employed? (binary) 0=Yes, 1=No \\n\",\n", 282 | " \"7. Husband's occupation (categorical) 1, 2, 3, 4 \\n\",\n", 283 | " \"8. Standard-of-living index (categorical) 1=low, 2, 3, 4=high \\n\",\n", 284 | " \"9. Media exposure (binary) 0=Good, 1=Not good \\n\",\n", 285 | " \"10. Contraceptive method used (class attribute) 1=No-use, 2=Long-term, 3=Short-term\"\n", 286 | " ]\n", 287 | " },\n", 288 | " {\n", 289 | " \"cell_type\": \"markdown\",\n", 290 | " \"metadata\": {},\n", 291 | " \"source\": [\n", 292 | " \"### Hypothesis\\n\",\n", 293 | " \"\\n\",\n", 294 | " \"Write down which features do you think have the most impact on the use of contraception.\"\n", 295 | " ]\n", 296 | " },\n", 297 | " {\n", 298 | " \"cell_type\": \"code\",\n", 299 | " \"execution_count\": 2,\n", 300 | " \"metadata\": {\n", 301 | " \"collapsed\": false\n", 302 | " },\n", 303 | " \"outputs\": [\n", 304 | " {\n", 305 | " \"data\": {\n", 306 | " \"text/html\": [\n", 307 | " \"
\\n\",\n", 308 | \"\\n\",\n", 309 | \" \\n\",\n", 310 | \" \\n\",\n", 311 | \" \\n\",\n", 312 | \" \\n\",\n", 313 | \" \\n\",\n", 314 | \" \\n\",\n", 315 | \" \\n\",\n", 316 | \" \\n\",\n", 317 | "```\n", 318 | "\n", 319 | "Things like \"outputs\" and \"execution_count\" can change without any change to the notebook's functionality. \n", 320 | "\n", 321 | "So, what can you do? \n", 322 | "\n", 323 | " * Only commit clean notebooks, i.e., run \"Restart and Clear Output\" before committing and pushing. This gets tedious, of course, if your script takes a long time to run. \n", 324 | " * Deal with conflicts (it's not too hard).\n", 325 | " * Work in pure Python (not encouraged for your final project). \n", 326 | " * Synchronize with your collaborators over chat (...). \n", 327 | " * More sophisticated solutions [such as this one](https://gist.github.com/pbugnion/ea2797393033b54674af) (untested). \n", 328 | " * Hope and wait that Jupyter notebook will at some point separate input from output. (It's [looking good](https://github.com/jupyter/roadmap/blob/master/companion-files.md)).\n" 329 | ] 330 | }, 331 | { 332 | "cell_type": "markdown", 333 | "metadata": {}, 334 | "source": [ 335 | "## Ignore Files\n", 336 | "\n", 337 | "When developing software, it's quite common that there are a lot of temporary files, e.g., created by Jupyter notebook to save temporary states. We shouldn't track temporary files: there is no reason to store them, and they can create conflicts.\n", 338 | "\n", 339 | "When you work with git on the command line, you have to manually add files you want to commit. But some GUI tools just add everything, so it's easy to add files you don't want. \n", 340 | " \n", 341 | "A good approach to avoid that is to use a `.gitignore` file. A `.gitignore` file is a hidden file that lists file patterns that shouldn't be added to a git repository. For Jupyter notebooks, this is a minimal .gitignore file: \n", 342 | "\n", 343 | "```bash\n", 344 | "# IPython Notebook\n", 345 | ".ipynb_checkpoints\n", 346 | "```\n", 347 | "\n", 348 | "You can find a more comprehensive `.gitignore` file in the [lecture repository]. We recommend that you copy this into your project repository. " 349 | ] 350 | }, 351 | { 352 | "cell_type": "markdown", 353 | "metadata": {}, 354 | "source": [ 355 | "## Other Files\n", 356 | "\n", 357 | "You should always add a `README.md` file that describes what the code in the repository does and how to run it. \n", 358 | "\n", 359 | "You should always add a license to your code. We recommend the BSD or MIT license, which are non-viral open source licenses. 
" 360 | ] 361 | } 362 | ], 363 | "metadata": { 364 | "anaconda-cloud": {}, 365 | "kernelspec": { 366 | "display_name": "Python 3 (ipykernel)", 367 | "language": "python", 368 | "name": "python3" 369 | }, 370 | "language_info": { 371 | "codemirror_mode": { 372 | "name": "ipython", 373 | "version": 3 374 | }, 375 | "file_extension": ".py", 376 | "mimetype": "text/x-python", 377 | "name": "python", 378 | "nbconvert_exporter": "python", 379 | "pygments_lexer": "ipython3", 380 | "version": "3.9.6" 381 | }, 382 | "pycharm": { 383 | "stem_cell": { 384 | "cell_type": "raw", 385 | "metadata": { 386 | "collapsed": false 387 | }, 388 | "source": [] 389 | } 390 | } 391 | }, 392 | "nbformat": 4, 393 | "nbformat_minor": 1 394 | } 395 | -------------------------------------------------------------------------------- /02-basic-python/anaconda_navigator.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/02-basic-python/anaconda_navigator.png -------------------------------------------------------------------------------- /02-basic-python/datasciencecat.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/02-basic-python/datasciencecat.jpg -------------------------------------------------------------------------------- /02-basic-python/exercise.py: -------------------------------------------------------------------------------- 1 | def add_numbers(a, b): 2 | return a + b 3 | 4 | print(add_numbers(3, 7)) 5 | print(add_numbers(14.22, 19)) 6 | # printing a won't work because of scope 7 | print(a) 8 | -------------------------------------------------------------------------------- /02-basic-python/first_steps.py: -------------------------------------------------------------------------------- 1 | def double_number(a): 2 | # btw, here is a comment! Use the # symbol to add comments or temporarily remove code 3 | # shorthand operator for 'a = a * 2' 4 | a *= 2 5 | return a 6 | 7 | print(double_number(3)) 8 | print(double_number(14.22)) 9 | -------------------------------------------------------------------------------- /02-basic-python/newrepo.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/02-basic-python/newrepo.png -------------------------------------------------------------------------------- /03-basic-python-II/lecture-3-exercises.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Lecture 3 Exercises" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "## Exercise 1: Data Types and Operators" 15 | ] 16 | }, 17 | { 18 | "cell_type": "markdown", 19 | "metadata": {}, 20 | "source": [ 21 | "**Task 1.1:** Try how capitalization affects string comparison, e.g., compare \"datascience\" to \"Datascience\"." 
22 | ] 23 | }, 24 | { 25 | "cell_type": "code", 26 | "execution_count": null, 27 | "metadata": {}, 28 | "outputs": [], 29 | "source": [] 30 | }, 31 | { 32 | "cell_type": "markdown", 33 | "metadata": {}, 34 | "source": [ 35 | "**Task 1.2:** Try comparing floats that are defined as expressions of integers using the `==` operator, e.g., check whether 1/3 is equal to 2/6. Does that work?" 36 | ] 37 | }, 38 | { 39 | "cell_type": "code", 40 | "execution_count": null, 41 | "metadata": {}, 42 | "outputs": [], 43 | "source": [] 44 | }, 45 | { 46 | "cell_type": "markdown", 47 | "metadata": {}, 48 | "source": [ 49 | "**Task 1.3:** Write an expression that compares the \"floor\" value of a float to an integer, e.g., compare the floor of 1/3 to 0. There are two ways to calculate a floor value: using `int()` and using `math.floor()`. Are they equal? What is the data type of the returned values?" 50 | ] 51 | }, 52 | { 53 | "cell_type": "code", 54 | "execution_count": null, 55 | "metadata": {}, 56 | "outputs": [], 57 | "source": [] 58 | }, 59 | { 60 | "cell_type": "code", 61 | "execution_count": null, 62 | "metadata": {}, 63 | "outputs": [], 64 | "source": [] 65 | }, 66 | { 67 | "cell_type": "markdown", 68 | "metadata": {}, 69 | "source": [ 70 | "## Exercise 3: Functions and If" 71 | ] 72 | }, 73 | { 74 | "cell_type": "markdown", 75 | "metadata": {}, 76 | "source": [ 77 | "Write a function that takes two integers. If either of the numbers can be divided by the other without a remainder, print the result of the division. If neither of the numbers divides the other one, print an error message." 78 | ] 79 | }, 80 | { 81 | "cell_type": "code", 82 | "execution_count": null, 83 | "metadata": {}, 84 | "outputs": [], 85 | "source": [] 86 | }, 87 | { 88 | "cell_type": "markdown", 89 | "metadata": {}, 90 | "source": [ 91 | "## Exercise 4: Lists\n", 92 | "\n", 93 | " * Create a list for the Rolling Stones: Mick, Keith, Charlie, Ronnie.\n", 94 | " * Create a slice of that list that contains only members of the original lineup (Mick, Keith, Charlie).\n", 95 | " * Add the stones list to the bands list." 96 | ] 97 | }, 98 | { 99 | "cell_type": "code", 100 | "execution_count": 1, 101 | "metadata": {}, 102 | "outputs": [], 103 | "source": [ 104 | "# initializations\n", 105 | "beatles = [\"Paul\", \"John\", \"George\", \"Ringo\"]\n", 106 | "zeppelin = [\"Jimmy\", \"Robert\", \"John\", \"John\"]\n", 107 | "bands = [beatles, zeppelin]" 108 | ] 109 | }, 110 | { 111 | "cell_type": "code", 112 | "execution_count": null, 113 | "metadata": {}, 114 | "outputs": [], 115 | "source": [] 116 | }, 117 | { 118 | "cell_type": "markdown", 119 | "metadata": {}, 120 | "source": [ 121 | "## Exercise 5.1: While\n", 122 | "\n", 123 | "Write a while loop that computes the sum of the first 100 positive integers. 
I.e., calculate\n", 124 | "\n", 125 | "$1+2+3+4+5+...+100$ " 126 | ] 127 | }, 128 | { 129 | "cell_type": "code", 130 | "execution_count": null, 131 | "metadata": {}, 132 | "outputs": [], 133 | "source": [] 134 | }, 135 | { 136 | "cell_type": "markdown", 137 | "metadata": {}, 138 | "source": [ 139 | "## Exercise 5.2: For\n", 140 | "\n", 141 | "Use a for loop to create an array that contains all even numbers in the range 0-50, i.e., an array: [2, 4, 6, ..., 48, 50] " 142 | ] 143 | }, 144 | { 145 | "cell_type": "code", 146 | "execution_count": null, 147 | "metadata": {}, 148 | "outputs": [], 149 | "source": [] 150 | }, 151 | { 152 | "cell_type": "markdown", 153 | "metadata": {}, 154 | "source": [ 155 | "Create a new array for the Beatles' main instruments: Ringo played drums, George played lead guitar, John played rhythm guitar and Paul played bass. Assume that the array position associates each musician with his instrument. Use a for loop to print:\n", 156 | "\n", 157 | "```\n", 158 | "Paul: Bass\n", 159 | "John: Rhythm Guitar\n", 160 | "George: Lead Guitar\n", 161 | "Ringo: Drums\n", 162 | "```" 163 | ] 164 | }, 165 | { 166 | "cell_type": "code", 167 | "execution_count": null, 168 | "metadata": {}, 169 | "outputs": [], 170 | "source": [] 171 | }, 172 | { 173 | "cell_type": "markdown", 174 | "metadata": {}, 175 | "source": [ 176 | "## Exercise 6: Recursion\n", 177 | "\n", 178 | "Write a recursive function that calculates the factorial of a number. " 179 | ] 180 | }, 181 | { 182 | "cell_type": "code", 183 | "execution_count": null, 184 | "metadata": {}, 185 | "outputs": [], 186 | "source": [] 187 | }, 188 | { 189 | "cell_type": "markdown", 190 | "metadata": {}, 191 | "source": [ 192 | "## Exercise 7: List Comprehension" 193 | ] 194 | }, 195 | { 196 | "cell_type": "markdown", 197 | "metadata": { 198 | "collapsed": true 199 | }, 200 | "source": [ 201 | "Write a list comprehension that creates an array with the length of each word in the following sentence:\n", 202 | "\n", 203 | "\"the quick brown fox jumps over the lazy dog\"\n", 204 | "\n", 205 | "The result should be a list: \n", 206 | "\n", 207 | "```python\n", 208 | "[3,5,...,3]\n", 209 | "```" 210 | ] 211 | }, 212 | { 213 | "cell_type": "code", 214 | "execution_count": 2, 215 | "metadata": {}, 216 | "outputs": [ 217 | { 218 | "data": { 219 | "text/plain": [ 220 | "['the', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']" 221 | ] 222 | }, 223 | "execution_count": 2, 224 | "metadata": {}, 225 | "output_type": "execute_result" 226 | } 227 | ], 228 | "source": [ 229 | "# setting up the array\n", 230 | "sentence = \"the quick brown fox jumps over the lazy dog\"\n", 231 | "word_list = sentence.split()\n", 232 | "word_list" 233 | ] 234 | }, 235 | { 236 | "cell_type": "code", 237 | "execution_count": null, 238 | "metadata": {}, 239 | "outputs": [], 240 | "source": [] 241 | } 242 | ], 243 | "metadata": { 244 | "anaconda-cloud": {}, 245 | "kernelspec": { 246 | "display_name": "Python 3 (ipykernel)", 247 | "language": "python", 248 | "name": "python3" 249 | }, 250 | "language_info": { 251 | "codemirror_mode": { 252 | "name": "ipython", 253 | "version": 3 254 | }, 255 | "file_extension": ".py", 256 | "mimetype": "text/x-python", 257 | "name": "python", 258 | "nbconvert_exporter": "python", 259 | "pygments_lexer": "ipython3", 260 | "version": "3.9.6" 261 | } 262 | }, 263 | "nbformat": 4, 264 | "nbformat_minor": 1 265 | } 266 | --------------------------------------------------------------------------------
/03-basic-python-II/patric_recursive.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/03-basic-python-II/patric_recursive.gif -------------------------------------------------------------------------------- /04-intro-desc-stat/04-DescriptiveStatistics_Activity.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "## Lecture 4: In Class Activity\n", 8 | "\n", 9 | "\n", 10 | "Your Name: \n", 11 | "Your UID: \n", 12 | "Your E-Mail: \n", 13 | "\n", 14 | "For this activity, first import a small data set containing 10 measurements of CO$_2$ levels and global temperatures. You can do this with the read_csv command from the pandas library as follows: " 15 | ] 16 | }, 17 | { 18 | "cell_type": "code", 19 | "execution_count": 1, 20 | "metadata": { 21 | "tags": [] 22 | }, 23 | "outputs": [], 24 | "source": [ 25 | "import pandas as pd\n", 26 | "import matplotlib.pyplot as plt\n", 27 | "import numpy as np\n", 28 | "my_data = pd.read_csv(\"global_warming.csv\")" 29 | ] 30 | }, 31 | { 32 | "cell_type": "markdown", 33 | "metadata": {}, 34 | "source": [ 35 | "# Item 1\n", 36 | "\n", 37 | "Print the first few rows of the my_data data frame. Then create lists storing the CO_2 and temperature values. Compute the mean and median of the temperature measurements." 38 | ] 39 | }, 40 | { 41 | "cell_type": "code", 42 | "execution_count": null, 43 | "metadata": {}, 44 | "outputs": [], 45 | "source": [ 46 | "# Your code here" 47 | ] 48 | }, 49 | { 50 | "cell_type": "markdown", 51 | "metadata": {}, 52 | "source": [ 53 | "# Item 2\n", 54 | "\n", 55 | "Make a scatterplot of CO_2 versus temperature. Is there a strong relationship between these two variables? What is the correlation coefficient? Can we infer that increasing carbon dioxide levels will increase global temperature?" 56 | ] 57 | }, 58 | { 59 | "cell_type": "code", 60 | "execution_count": null, 61 | "metadata": {}, 62 | "outputs": [], 63 | "source": [ 64 | "# Your code here" 65 | ] 66 | }, 67 | { 68 | "cell_type": "markdown", 69 | "metadata": {}, 70 | "source": [ 71 | "*Your answers here.*" 72 | ] 73 | }, 74 | { 75 | "cell_type": "markdown", 76 | "metadata": {}, 77 | "source": [ 78 | "# Item 3\n", 79 | "\n", 80 | "Change the last temperature measurement from 14.4 degrees to 144 degrees, which could have happened if there was a small error in manual data entry. \n", 81 | "\n", 82 | "How are the mean and median affected?\n", 83 | "\n", 84 | "How are the scatterplot and the correlation coefficient affected?\n", 85 | "\n", 86 | "Are the mean, median, and correlation coefficient robust to outliers and data entry errors?"
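A minimal sketch of this robustness check (temperature values copied from global_warming.csv above; the list name is illustrative):

```python
import numpy as np

temps = [13.9, 14.0, 13.9, 14.1, 14.0, 14.3, 14.1, 14.5, 14.5, 14.4]
print(np.mean(temps), np.median(temps))  # about 14.17 and 14.1

temps[-1] = 144  # simulate the data-entry error
print(np.mean(temps), np.median(temps))  # mean jumps to about 27.13; median stays 14.1
```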
87 | ] 88 | }, 89 | { 90 | "cell_type": "code", 91 | "execution_count": null, 92 | "metadata": {}, 93 | "outputs": [], 94 | "source": [ 95 | "# Your code here" 96 | ] 97 | }, 98 | { 99 | "cell_type": "markdown", 100 | "metadata": {}, 101 | "source": [ 102 | "*Your answers here.*" 103 | ] 104 | } 105 | ], 106 | "metadata": { 107 | "anaconda-cloud": {}, 108 | "celltoolbar": "Slideshow", 109 | "kernelspec": { 110 | "display_name": "Python 3 (ipykernel)", 111 | "language": "python", 112 | "name": "python3" 113 | }, 114 | "language_info": { 115 | "codemirror_mode": { 116 | "name": "ipython", 117 | "version": 3 118 | }, 119 | "file_extension": ".py", 120 | "mimetype": "text/x-python", 121 | "name": "python", 122 | "nbconvert_exporter": "python", 123 | "pygments_lexer": "ipython3", 124 | "version": "3.11.5" 125 | }, 126 | "nbpresent": { 127 | "slides": { 128 | "006f01ca-e160-4faa-ad02-2f873362ca99": { 129 | "id": "006f01ca-e160-4faa-ad02-2f873362ca99", 130 | "prev": "e60ea09b-1474-49b0-9ea6-2e803b335693", 131 | "regions": { 132 | "88222835-28de-4a0f-895e-303024baf060": { 133 | "attrs": { 134 | "height": 0.8, 135 | "width": 0.8, 136 | "x": 0.1, 137 | "y": 0.1 138 | }, 139 | "content": { 140 | "cell": "e6a51e7a-d63e-4187-8899-bfbf03f8a4b6", 141 | "part": "whole" 142 | }, 143 | "id": "88222835-28de-4a0f-895e-303024baf060" 144 | } 145 | } 146 | }, 147 | "2ba6955d-8be2-4ce3-ae98-f3b3695e4832": { 148 | "id": "2ba6955d-8be2-4ce3-ae98-f3b3695e4832", 149 | "prev": "35a7a5a6-f0c3-4b68-9579-e5840160a87d", 150 | "regions": { 151 | "63b4b5f3-c348-418c-aabf-932d5fdbcc1c": { 152 | "attrs": { 153 | "height": 0.8, 154 | "width": 0.8, 155 | "x": 0.1, 156 | "y": 0.1 157 | }, 158 | "content": { 159 | "cell": "883076a7-1c6e-492f-b9d2-0b8550b5c31f", 160 | "part": "whole" 161 | }, 162 | "id": "63b4b5f3-c348-418c-aabf-932d5fdbcc1c" 163 | } 164 | } 165 | }, 166 | "2d5e1e8f-2e26-415e-8d8d-9229f2dc1244": { 167 | "id": "2d5e1e8f-2e26-415e-8d8d-9229f2dc1244", 168 | "prev": "9ba8c8c8-59a7-4776-84cb-084e5b0a2317", 169 | "regions": { 170 | "44582fec-116b-475d-9793-ad29580b7fc2": { 171 | "attrs": { 172 | "height": 0.8, 173 | "width": 0.8, 174 | "x": 0.1, 175 | "y": 0.1 176 | }, 177 | "content": { 178 | "cell": "4992f285-654f-485e-81ef-8a6ae18cad34", 179 | "part": "whole" 180 | }, 181 | "id": "44582fec-116b-475d-9793-ad29580b7fc2" 182 | } 183 | } 184 | }, 185 | "35a7a5a6-f0c3-4b68-9579-e5840160a87d": { 186 | "id": "35a7a5a6-f0c3-4b68-9579-e5840160a87d", 187 | "prev": "c7291188-b014-4fcb-83bc-f1ea035ee4c9", 188 | "regions": { 189 | "cbc80f26-e933-4d90-9dfd-e55a3dc339ba": { 190 | "attrs": { 191 | "height": 0.8, 192 | "width": 0.8, 193 | "x": 0.1, 194 | "y": 0.1 195 | }, 196 | "content": { 197 | "cell": "558af430-f4c0-4be9-b1ef-afce5fccd0fa", 198 | "part": "whole" 199 | }, 200 | "id": "cbc80f26-e933-4d90-9dfd-e55a3dc339ba" 201 | } 202 | } 203 | }, 204 | "3ecd0fe6-e75f-4362-a6e7-e5273e12058e": { 205 | "id": "3ecd0fe6-e75f-4362-a6e7-e5273e12058e", 206 | "prev": "2d5e1e8f-2e26-415e-8d8d-9229f2dc1244", 207 | "regions": { 208 | "eac20970-b45c-4073-a596-cdb2a974fe23": { 209 | "attrs": { 210 | "height": 0.8, 211 | "width": 0.8, 212 | "x": 0.1, 213 | "y": 0.1 214 | }, 215 | "content": { 216 | "cell": "de60c848-d1fb-478d-a736-0ebe21762a24", 217 | "part": "whole" 218 | }, 219 | "id": "eac20970-b45c-4073-a596-cdb2a974fe23" 220 | } 221 | } 222 | }, 223 | "4e939c85-e2b3-48b3-b2df-389bc0ca7dd0": { 224 | "id": "4e939c85-e2b3-48b3-b2df-389bc0ca7dd0", 225 | "prev": "9807c9b8-54cd-4a78-b50d-1a6e9f7ff066", 226 | "regions": { 227 | 
"ddc97209-2f2f-409d-953c-a9a83dec6738": { 228 | "attrs": { 229 | "height": 0.8, 230 | "width": 0.8, 231 | "x": 0.1, 232 | "y": 0.1 233 | }, 234 | "content": { 235 | "cell": "674ee724-0165-40c5-9296-83db8305fa4c", 236 | "part": "whole" 237 | }, 238 | "id": "ddc97209-2f2f-409d-953c-a9a83dec6738" 239 | } 240 | } 241 | }, 242 | "95bf00f9-fc2a-4478-bd48-fb66078e061f": { 243 | "id": "95bf00f9-fc2a-4478-bd48-fb66078e061f", 244 | "prev": "b9fa7815-a205-4ea6-9076-1289b96670cf", 245 | "regions": { 246 | "dd7aef1b-3f05-47a5-bc5f-274809d2c21d": { 247 | "attrs": { 248 | "height": 0.8, 249 | "width": 0.8, 250 | "x": 0.1, 251 | "y": 0.1 252 | }, 253 | "content": { 254 | "cell": "61e1167e-99ef-4b5d-b717-07a46077a091", 255 | "part": "whole" 256 | }, 257 | "id": "dd7aef1b-3f05-47a5-bc5f-274809d2c21d" 258 | } 259 | } 260 | }, 261 | "966d5c12-49ef-4129-aecb-b183804ecd19": { 262 | "id": "966d5c12-49ef-4129-aecb-b183804ecd19", 263 | "prev": "3ecd0fe6-e75f-4362-a6e7-e5273e12058e", 264 | "regions": { 265 | "ff0704a2-f662-4a03-9874-5613f4634956": { 266 | "attrs": { 267 | "height": 0.8, 268 | "width": 0.8, 269 | "x": 0.1, 270 | "y": 0.1 271 | }, 272 | "content": { 273 | "cell": "a6fd92a3-b57e-45c5-b216-f9f475baf8ce", 274 | "part": "whole" 275 | }, 276 | "id": "ff0704a2-f662-4a03-9874-5613f4634956" 277 | } 278 | } 279 | }, 280 | "9807c9b8-54cd-4a78-b50d-1a6e9f7ff066": { 281 | "id": "9807c9b8-54cd-4a78-b50d-1a6e9f7ff066", 282 | "prev": "966d5c12-49ef-4129-aecb-b183804ecd19", 283 | "regions": { 284 | "9e037b47-3fb5-4a60-b69c-ad3b282807a1": { 285 | "attrs": { 286 | "height": 0.8, 287 | "width": 0.8, 288 | "x": 0.1, 289 | "y": 0.1 290 | }, 291 | "content": { 292 | "cell": "b79fa570-8c08-4820-a035-2a00bfae1a9b", 293 | "part": "whole" 294 | }, 295 | "id": "9e037b47-3fb5-4a60-b69c-ad3b282807a1" 296 | } 297 | } 298 | }, 299 | "9ba8c8c8-59a7-4776-84cb-084e5b0a2317": { 300 | "id": "9ba8c8c8-59a7-4776-84cb-084e5b0a2317", 301 | "prev": "9e2b6ffd-bec2-4027-93e8-64b63e770378", 302 | "regions": { 303 | "45bb0df9-937a-48a6-882b-346a1253250c": { 304 | "attrs": { 305 | "height": 0.8, 306 | "width": 0.8, 307 | "x": 0.1, 308 | "y": 0.1 309 | }, 310 | "content": { 311 | "cell": "be5bedf1-b9ed-4caa-bc3e-6c390df97946", 312 | "part": "whole" 313 | }, 314 | "id": "45bb0df9-937a-48a6-882b-346a1253250c" 315 | } 316 | } 317 | }, 318 | "9e2b6ffd-bec2-4027-93e8-64b63e770378": { 319 | "id": "9e2b6ffd-bec2-4027-93e8-64b63e770378", 320 | "prev": "e9d31a55-0862-44ec-bd48-d74167655985", 321 | "regions": { 322 | "7a7c6996-8117-42ce-8093-318b84bd052b": { 323 | "attrs": { 324 | "height": 0.8, 325 | "width": 0.8, 326 | "x": 0.1, 327 | "y": 0.1 328 | }, 329 | "content": { 330 | "cell": "95b34ac1-b36d-492d-84f9-adafb2d57ace", 331 | "part": "whole" 332 | }, 333 | "id": "7a7c6996-8117-42ce-8093-318b84bd052b" 334 | } 335 | } 336 | }, 337 | "ae354c9e-1384-4f31-8e03-5c96f3988bf4": { 338 | "id": "ae354c9e-1384-4f31-8e03-5c96f3988bf4", 339 | "prev": null, 340 | "regions": { 341 | "9e57ef10-c941-41de-93da-812357c7ec21": { 342 | "attrs": { 343 | "height": 0.8, 344 | "width": 0.8, 345 | "x": 0.1, 346 | "y": 0.1 347 | }, 348 | "content": { 349 | "cell": "dac6427e-b8df-46f9-bfd3-b24427a73993", 350 | "part": "whole" 351 | }, 352 | "id": "9e57ef10-c941-41de-93da-812357c7ec21" 353 | } 354 | } 355 | }, 356 | "b9fa7815-a205-4ea6-9076-1289b96670cf": { 357 | "id": "b9fa7815-a205-4ea6-9076-1289b96670cf", 358 | "prev": "ae354c9e-1384-4f31-8e03-5c96f3988bf4", 359 | "regions": { 360 | "cc19ec36-3caa-4666-86e7-2bd1d9cb03b8": { 361 | "attrs": { 362 | "height": 0.8, 363 | "width": 
0.8, 364 | "x": 0.1, 365 | "y": 0.1 366 | }, 367 | "content": { 368 | "cell": "c7392535-4666-41a5-a68a-7306dccd6cd8", 369 | "part": "whole" 370 | }, 371 | "id": "cc19ec36-3caa-4666-86e7-2bd1d9cb03b8" 372 | } 373 | } 374 | }, 375 | "c7291188-b014-4fcb-83bc-f1ea035ee4c9": { 376 | "id": "c7291188-b014-4fcb-83bc-f1ea035ee4c9", 377 | "prev": "006f01ca-e160-4faa-ad02-2f873362ca99", 378 | "regions": { 379 | "3fa8900b-1ee2-4625-8e0b-a8a52d95390d": { 380 | "attrs": { 381 | "height": 0.8, 382 | "width": 0.8, 383 | "x": 0.1, 384 | "y": 0.1 385 | }, 386 | "content": { 387 | "cell": "a912604c-786a-448e-a908-397f28b46a13", 388 | "part": "whole" 389 | }, 390 | "id": "3fa8900b-1ee2-4625-8e0b-a8a52d95390d" 391 | } 392 | } 393 | }, 394 | "e60ea09b-1474-49b0-9ea6-2e803b335693": { 395 | "id": "e60ea09b-1474-49b0-9ea6-2e803b335693", 396 | "prev": "4e939c85-e2b3-48b3-b2df-389bc0ca7dd0", 397 | "regions": { 398 | "3df2ec23-ba8a-4b76-b6af-2a4aa219da46": { 399 | "attrs": { 400 | "height": 0.8, 401 | "width": 0.8, 402 | "x": 0.1, 403 | "y": 0.1 404 | }, 405 | "content": { 406 | "cell": "06d04c6d-90a4-441d-9d6e-4f719490e12e", 407 | "part": "whole" 408 | }, 409 | "id": "3df2ec23-ba8a-4b76-b6af-2a4aa219da46" 410 | } 411 | } 412 | }, 413 | "e9d31a55-0862-44ec-bd48-d74167655985": { 414 | "id": "e9d31a55-0862-44ec-bd48-d74167655985", 415 | "prev": "95bf00f9-fc2a-4478-bd48-fb66078e061f", 416 | "regions": { 417 | "cc6bd695-e238-4250-8822-ffea3f82f544": { 418 | "attrs": { 419 | "height": 0.8, 420 | "width": 0.8, 421 | "x": 0.1, 422 | "y": 0.1 423 | }, 424 | "content": { 425 | "cell": "86c3f014-9535-48f0-95a2-df74d16eaa69", 426 | "part": "whole" 427 | }, 428 | "id": "cc6bd695-e238-4250-8822-ffea3f82f544" 429 | } 430 | } 431 | } 432 | }, 433 | "themes": {} 434 | } 435 | }, 436 | "nbformat": 4, 437 | "nbformat_minor": 4 438 | } 439 | -------------------------------------------------------------------------------- /04-intro-desc-stat/Conf1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/04-intro-desc-stat/Conf1.png -------------------------------------------------------------------------------- /04-intro-desc-stat/Conf2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/04-intro-desc-stat/Conf2.png -------------------------------------------------------------------------------- /04-intro-desc-stat/SmallLargeStandDev.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/04-intro-desc-stat/SmallLargeStandDev.png -------------------------------------------------------------------------------- /04-intro-desc-stat/correlation.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/04-intro-desc-stat/correlation.png -------------------------------------------------------------------------------- /04-intro-desc-stat/global_warming.csv: -------------------------------------------------------------------------------- 1 | CO_2,Temp 2 | 314,13.9 3 | 317,14 4 | 320,13.9 5 | 326,14.1 6 | 331,14 7 | 339,14.3 8 | 346,14.1 9 | 354,14.5 10 | 
361,14.5 11 | 369,14.4 -------------------------------------------------------------------------------- /04-intro-desc-stat/purity.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/04-intro-desc-stat/purity.png -------------------------------------------------------------------------------- /04-intro-desc-stat/test_data.csv: -------------------------------------------------------------------------------- 1 | Age,Weight,Gender 2 | 23,123,M 3 | 523,345,F 4 | 45,234,M 5 | 67,21,F -------------------------------------------------------------------------------- /05-dictionaries-pandas-series/05-exercises.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Lecture 5: Exercise Solutions" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": { 13 | "collapsed": true 14 | }, 15 | "source": [ 16 | "## Exercise 2: Sets\n", 17 | "\n", 18 | "Write a function that finds the overlap of two sets and prints them.\n", 19 | "Initialize two sets, e.g., with values {13, 25, 37, 45, 13} and {14, 25, 38, 8, 45} and call this function with them." 20 | ] 21 | }, 22 | { 23 | "cell_type": "code", 24 | "execution_count": null, 25 | "metadata": {}, 26 | "outputs": [], 27 | "source": [] 28 | }, 29 | { 30 | "cell_type": "markdown", 31 | "metadata": {}, 32 | "source": [ 33 | "## Exercise 3: Dictionaries\n", 34 | "\n", 35 | " * Create a dictionary with two-letter codes of two US states and the full names, e.g., UT: Utah, NY: New York\n", 36 | " * After initially creating the dictionary, add two more states to the dictionary.\n", 37 | " * Create a second dictionary that maps the state codes to an array of cities in that state, e.g., UT: [Salt Lake City, Ogden, Provo, St.George]. \n", 38 | " * Write a function that takes a state code and prints the full name of the state and lists the cities in that state." 39 | ] 40 | }, 41 | { 42 | "cell_type": "code", 43 | "execution_count": null, 44 | "metadata": {}, 45 | "outputs": [], 46 | "source": [] 47 | }, 48 | { 49 | "cell_type": "markdown", 50 | "metadata": {}, 51 | "source": [ 52 | "## Exercise 4: Objects\n", 53 | "\n", 54 | "Create a class `Pet` with members `name, pronoun, animal,` and `pet_response`. \n", 55 | "\n", 56 | "Add a method `pet()` which prints a response that is composed of the members, like that: \n", 57 | "\n", 58 | "`Layla is a cat. If you pet her, Layla purrs.` \n", 59 | "`Bond is a dog. 
If you pet him, Bond wags his tail.`\n" 60 | ] 61 | }, 62 | { 63 | "cell_type": "code", 64 | "execution_count": null, 65 | "metadata": {}, 66 | "outputs": [], 67 | "source": [] 68 | }, 69 | { 70 | "cell_type": "markdown", 71 | "metadata": {}, 72 | "source": [ 73 | "Here are some example calls: " 74 | ] 75 | }, 76 | { 77 | "cell_type": "code", 78 | "execution_count": null, 79 | "metadata": {}, 80 | "outputs": [], 81 | "source": [ 82 | "layla = Pet(\"Layla\", \"her\", \"cat\", \"purrs\")\n", 83 | "scout = Pet(\"Bond\", \"him\", \"dog\", \"wags his tail\")\n", 84 | "\n", 85 | "layla.pet()\n", 86 | "scout.pet()" 87 | ] 88 | }, 89 | { 90 | "cell_type": "markdown", 91 | "metadata": { 92 | "collapsed": true 93 | }, 94 | "source": [ 95 | "## Exercise 6: Pandas Series" 96 | ] 97 | }, 98 | { 99 | "cell_type": "markdown", 100 | "metadata": {}, 101 | "source": [ 102 | "Create a new pandas series with the lists given below that contain NFL team names and the number of Super Bowl titles they won. Use the names as indices, the wins as the data.\n", 103 | "\n", 104 | " * Once the list is created, sort the series alphabetically by index. \n", 105 | " * Print an overview of the statistical properties of the series. What's the mean number of wins?\n", 106 | " * Filter out all teams that have won fewer than four Super Bowl titles.\n", 107 | " * A football team has 45 players. Update the series so that instead of the number of titles, it reflects the number of Super Bowl rings given to the players. \n", 108 | " * Assume that each ring costs USD 32,000. Update the series so that it contains a string of the dollar amount including the \\$ sign. For the Steelers, for example, this would correspond to: \n", 109 | " ```\n", 110 | " Pittsburgh Steelers $ 8640000\n", 111 | " ```\n", 112 | "\n" 113 | ] 114 | }, 115 | { 116 | "cell_type": "code", 117 | "execution_count": null, 118 | "metadata": {}, 119 | "outputs": [], 120 | "source": [ 121 | "teams = [\"New England Patriots\",\n", 122 | " \"Pittsburgh Steelers\",\n", 123 | " \"Dallas Cowboys\",\n", 124 | " \"San Francisco 49ers\",\n", 125 | " \"Green Bay Packers\",\n", 126 | " \"New York Giants\",\n", 127 | " \"Denver Broncos\",\n", 128 | " \"Oakland/Los Angeles/Las Vegas Raiders\",\n", 129 | " \"Washington Commanders\",\n", 130 | " \"Miami Dolphins\",\n", 131 | " \"Baltimore/Indianapolis Colts\",\n", 132 | " \"Baltimore Ravens\",\n", 133 | " \"Los Angeles/St. 
Louis Rams\",\n", 134 | " \"Tampa Bay Buccaneers\"]\n", 135 | "wins = [6,6,5,5,4,4,3,3,3,2,2,2,2,2]" 136 | ] 137 | }, 138 | { 139 | "cell_type": "code", 140 | "execution_count": null, 141 | "metadata": {}, 142 | "outputs": [], 143 | "source": [ 144 | "import pandas as pd" 145 | ] 146 | } 147 | ], 148 | "metadata": { 149 | "anaconda-cloud": {}, 150 | "kernelspec": { 151 | "display_name": "Python 3 (ipykernel)", 152 | "language": "python", 153 | "name": "python3" 154 | }, 155 | "language_info": { 156 | "codemirror_mode": { 157 | "name": "ipython", 158 | "version": 3 159 | }, 160 | "file_extension": ".py", 161 | "mimetype": "text/x-python", 162 | "name": "python", 163 | "nbconvert_exporter": "python", 164 | "pygments_lexer": "ipython3", 165 | "version": "3.9.6" 166 | } 167 | }, 168 | "nbformat": 4, 169 | "nbformat_minor": 1 170 | } 171 | -------------------------------------------------------------------------------- /06-loading-data-dataframes/06-exercises.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "nbpresent": { 7 | "id": "50a40f10-f4b6-4dd3-b7aa-c63ed3ca3244" 8 | }, 9 | "slideshow": { 10 | "slide_type": "slide" 11 | } 12 | }, 13 | "source": [ 14 | "# Introduction to Data Science – Lecture 6 – Exercises\n", 15 | "\n", 16 | "*COMP 5360 / MATH 4100, University of Utah, http://datasciencecourse.net/*" 17 | ] 18 | }, 19 | { 20 | "cell_type": "markdown", 21 | "metadata": {}, 22 | "source": [ 23 | "## Exercise 1: Reading and Writing Data\n", 24 | "\n", 25 | "The file [grades.csv](grades.csv) is a file with student names and letter grades:\n", 26 | "\n", 27 | "```\n", 28 | "Alice; A\n", 29 | "Bob; B\n", 30 | "Robert; A\n", 31 | "Richard; C\n", 32 | "```\n", 33 | "\n", 34 | "Read the file into an array. Add a GPA to the student's row (A=4,B=3,C=2,D=1). \n", 35 | "\n", 36 | "Hint: the function [strip()](https://docs.python.org/3/library/stdtypes.html#str.strip) removes trailing whitespace from a string.\n", 37 | "\n", 38 | "Write that file into a new file `grades_gpa.csv`" 39 | ] 40 | }, 41 | { 42 | "cell_type": "code", 43 | "execution_count": 1, 44 | "metadata": {}, 45 | "outputs": [], 46 | "source": [ 47 | "gpas = {\"A\":4, \"B\":3, \"C\":2, \"D\":1}\n", 48 | "\n" 49 | ] 50 | }, 51 | { 52 | "cell_type": "markdown", 53 | "metadata": {}, 54 | "source": [ 55 | "## Exercise 2: Data Frames\n", 56 | "\n", 57 | "* Calculate the mean certified sales for all albums." 58 | ] 59 | }, 60 | { 61 | "cell_type": "code", 62 | "execution_count": 3, 63 | "metadata": {}, 64 | "outputs": [], 65 | "source": [ 66 | "import pandas as pd\n", 67 | "hit_albums = pd.read_csv(\"hit_albums.csv\")" 68 | ] 69 | }, 70 | { 71 | "cell_type": "code", 72 | "execution_count": null, 73 | "metadata": {}, 74 | "outputs": [], 75 | "source": [] 76 | }, 77 | { 78 | "cell_type": "markdown", 79 | "metadata": {}, 80 | "source": [ 81 | " * Create a new dataframe that only contains albums with more than 20 million certified sales.\n" 82 | ] 83 | }, 84 | { 85 | "cell_type": "code", 86 | "execution_count": null, 87 | "metadata": { 88 | "scrolled": true 89 | }, 90 | "outputs": [], 91 | "source": [] 92 | }, 93 | { 94 | "cell_type": "markdown", 95 | "metadata": {}, 96 | "source": [ 97 | "\n", 98 | " * Create a new dataframe based on the hit_albums dataset that only contains the artists that have at least two albums in the list." 
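One possible approach for the two-albums task above (helper names are illustrative; the Artist column name matches hit_albums.csv):

```python
album_counts = hit_albums["Artist"].value_counts()
repeat_artists = album_counts[album_counts >= 2].index
hit_albums_repeat = hit_albums[hit_albums["Artist"].isin(repeat_artists)]
```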
99 | ] 100 | }, 101 | { 102 | "cell_type": "code", 103 | "execution_count": null, 104 | "metadata": {}, 105 | "outputs": [], 106 | "source": [] 107 | }, 108 | { 109 | "cell_type": "markdown", 110 | "metadata": {}, 111 | "source": [ 112 | "* Create a new dataframe that contains the aggregates sum of all certified sales for each year." 113 | ] 114 | }, 115 | { 116 | "cell_type": "code", 117 | "execution_count": null, 118 | "metadata": {}, 119 | "outputs": [], 120 | "source": [] 121 | } 122 | ], 123 | "metadata": { 124 | "anaconda-cloud": {}, 125 | "kernelspec": { 126 | "display_name": "Python 3 (ipykernel)", 127 | "language": "python", 128 | "name": "python3" 129 | }, 130 | "language_info": { 131 | "codemirror_mode": { 132 | "name": "ipython", 133 | "version": 3 134 | }, 135 | "file_extension": ".py", 136 | "mimetype": "text/x-python", 137 | "name": "python", 138 | "nbconvert_exporter": "python", 139 | "pygments_lexer": "ipython3", 140 | "version": "3.9.16" 141 | }, 142 | "nbpresent": { 143 | "slides": { 144 | "19a6495f-8346-4b23-a98a-c7115941e8f0": { 145 | "id": "19a6495f-8346-4b23-a98a-c7115941e8f0", 146 | "prev": null, 147 | "regions": { 148 | "d6523b36-7204-4001-8a6c-37c431b18d26": { 149 | "attrs": { 150 | "height": 1, 151 | "width": 1, 152 | "x": 0, 153 | "y": 0 154 | }, 155 | "id": "d6523b36-7204-4001-8a6c-37c431b18d26" 156 | } 157 | } 158 | } 159 | }, 160 | "themes": {} 161 | } 162 | }, 163 | "nbformat": 4, 164 | "nbformat_minor": 1 165 | } 166 | -------------------------------------------------------------------------------- /06-loading-data-dataframes/grades.csv: -------------------------------------------------------------------------------- 1 | Alice; A 2 | Bob; B 3 | Robert; A 4 | Richard; C 5 | -------------------------------------------------------------------------------- /06-loading-data-dataframes/hit_albums.csv: -------------------------------------------------------------------------------- 1 | Artist,Album,Released,Genre,"Certified sales (millions)",Claimed sales (millions) 2 | Michael Jackson,Thriller,1982,"Pop, rock, R&B",45.4,65 3 | AC/DC,Back in Black,1980,Hard rock,25.9,50 4 | Pink Floyd,The Dark Side of the Moon,1973,Progressive rock,22.7,45 5 | Whitney Houston / Various artists,The Bodyguard,1992,"Soundtrack/R&B, soul, pop",27.4,44 6 | Meat Loaf,Bat Out of Hell,1977,"Hard rock, progressive rock",20.6,43 7 | Eagles,Their Greatest Hits (1971–1975),1976,"Rock, soft rock, folk rock",32.2,42 8 | Bee Gees / Various artists,Saturday Night Fever,1977,Disco,19,40 9 | Fleetwood Mac,Rumours,1977,Soft rock,27.9,40 10 | Shania Twain,Come On Over,1997,"Country, pop",29.6,39 11 | Led Zeppelin,Led Zeppelin IV,1971,"Hard rock, heavy metal",29,37 12 | Michael Jackson,Bad,1987,"Pop, funk, rock",20.3,34 13 | Alanis Morissette,Jagged Little Pill,1995,Alternative rock,24.8,33 14 | Celine Dion,Falling into You,1996,"Pop, Soft rock",20.2,32 15 | The Beatles,Sgt. 
Pepper's Lonely Hearts Club Band,1967,Rock,13.1,32 16 | Eagles,Hotel California,1976,"Rock, soft rock, folk rock",21.5,32 17 | Mariah Carey,Music Box,1993,"Pop, R&B, Rock",19,32 18 | Michael Jackson,Dangerous,1991,"Rock, Funk, Pop",17.6,32 19 | Various artists,Dirty Dancing,1987,"Pop, rock, R&B",17.9,32 20 | Celine Dion,Let's Talk About Love,1997,"Pop, Soft rock",19.3,31 21 | The Beatles,1,2000,Rock,21.6,31 22 | Adele,21,2011,"Pop, soul",22.3,30 23 | The Beatles,Abbey Road,1969,Rock,14.4,30 24 | Bruce Springsteen,Born in the U.S.A.,1984,Rock,19.6,30 25 | Dire Straits,Brothers in Arms,1985,Rock,17.7,30 26 | James Horner,Titanic: Music from the Motion Picture,1997,Soundtrack,18.1,30 27 | Madonna,The Immaculate Collection,1990,"Pop, Dance",19.4,30 28 | Metallica,Metallica,1991,"Thrash metal, heavy metal",19.9,30 29 | Nirvana,Nevermind,1991,"Grunge, alternative rock",16.7,30 30 | Pink Floyd,The Wall,1979,Progressive rock,17.6,30 31 | Santana,Supernatural,1999,Rock,20.5,30 32 | Guns N' Roses,Appetite for Destruction,1987,"Heavy metal, hard rock",21.3,30 33 | ABBA,Gold: Greatest Hits,1992,Pop,29, 34 | Bon Jovi,Slippery When Wet,1986,Hard rock,28, 35 | Spice Girls,Spice,1996,Pop,28, 36 | Various artists,Grease: The Original Soundtrack from the Motion Picture,1978,Soundtrack,28, 37 | Britney Spears,...Baby One More Time,1999,Pop,28, 38 | Linkin Park,Hybrid Theory,2000,"Nu metal, rap metal, alternative metal",27, 39 | Bob Marley & The Wailers,Legend: The Best of Bob Marley & The Wailers,1984,Reggae,25, 40 | Carole King,Tapestry,1971,Pop,25, 41 | Madonna,Like a Virgin,1984,"Pop, dance",25, 42 | Madonna,True Blue,1986,Pop,25, 43 | Mariah Carey,Daydream,1995,"Pop, R&B",25, 44 | Norah Jones,Come Away with Me,2002,Jazz,25, 45 | Phil Collins,No Jacket Required,1985,"Pop, Rock",25, 46 | Queen,Greatest Hits,1981,Rock,25, 47 | Simon & Garfunkel,Bridge over Troubled Water,1970,Folk rock,25, 48 | U2,The Joshua Tree,1987,Rock,25, 49 | Whitney Houston,Whitney Houston,1985,"Pop, R&B",25, 50 | Backstreet Boys,Backstreet's Back / Backstreet Boys,1997,Pop,24, 51 | Backstreet Boys,Millennium,1999,Pop,24, 52 | Ace of Base,Happy Nation/The Sign,1993,Pop,23, 53 | TLC,CrazySexyCool,1994,"R&B, hip hop",23, 54 | Cyndi Lauper,She's So Unusual,1983,"New wave, pop rock, synthpop",22, 55 | Oasis,(What's the Story) Morning Glory?,1995,"Britpop, rock",22, 56 | Bon Jovi,Cross Road,1994,Hard rock,21, 57 | Eminem,The Marshall Mathers LP,2000,"Rap, hip-hop",21, 58 | Adele,25,2015,"Soul, pop, R&B",20, 59 | Avril Lavigne,Let Go,2002,"Pop rock, alternative rock, post-grunge",20, 60 | Boston,Boston,1976,Hard rock,20, 61 | Britney Spears,Oops!... 
I Did It Again,2000,Pop,20, 62 | Eric Clapton,Unplugged,1992,"Acoustic blues, folk rock",20, 63 | Def Leppard,Hysteria,1987,"Pop, Hard rock",20, 64 | George Michael,Faith,1987,"Pop, R&B",20, 65 | Green Day,Dookie,1994,"Pop punk, punk rock, alternative rock",20, 66 | Lionel Richie,Can't Slow Down,1983,"Pop, R&B, soul",20, 67 | Michael Jackson,"HIStory: Past, Present and Future, Book I",1995,"Pop, rock, R&B",20, 68 | Michael Jackson,Off the Wall,1979,"Soul, disco, R&B",20, 69 | Prince & the Revolution,Purple Rain,1984,"Pop, rock, R&B",20, 70 | Shania Twain,The Woman in Me,1995,"Country, pop",20, 71 | Shania Twain,Up!,2002,"Country, pop, world music",20, 72 | Supertramp,Breakfast in America,1979,"Progressive rock, art rock",20, 73 | Tina Turner,Private Dancer,1984,"Pop, rock, R&B",20, 74 | Tracy Chapman,Tracy Chapman,1988,Folk rock,20, 75 | Usher,Confessions,2004,R&B,20, 76 | Various artists,Flashdance: Original Soundtrack from the Motion Picture,1983,Soundtrack,20, 77 | Whitney Houston,Whitney,1987,"Pop, R&B",20, 78 | Shakira,Laundry Service,2001,"Pop, Rock",20, 79 | -------------------------------------------------------------------------------- /06-loading-data-dataframes/my_file.txt: -------------------------------------------------------------------------------- 1 | Hello World 2 | Are you still spinning? 3 | -------------------------------------------------------------------------------- /07-hypothesis-testing/Cohen1994.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/07-hypothesis-testing/Cohen1994.pdf -------------------------------------------------------------------------------- /07-hypothesis-testing/InferenceErrors.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/07-hypothesis-testing/InferenceErrors.png -------------------------------------------------------------------------------- /07-hypothesis-testing/Nuzzo2014.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/07-hypothesis-testing/Nuzzo2014.pdf -------------------------------------------------------------------------------- /07-hypothesis-testing/determinePvals.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/07-hypothesis-testing/determinePvals.png -------------------------------------------------------------------------------- /08-data-science-ethics/Data-Science-Ethics.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/08-data-science-ethics/Data-Science-Ethics.pdf -------------------------------------------------------------------------------- /09-LinearRegression1/08-LinearRegression1_Activity.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "slideshow": { 7 | "slide_type": "slide" 8 | } 9 | }, 10 | "source": [ 11 | "# Introduction to 
Data Science \n", 12 | "# Activity for Lecture 8: Linear Regression 1\n", 13 | "*COMP 5360 / MATH 4100, University of Utah, http://datasciencecourse.net/*\n", 14 | "\n", 15 | "Name:\n", 16 | "\n", 17 | "Email:\n", 18 | "\n", 19 | "UID:\n" 20 | ] 21 | }, 22 | { 23 | "cell_type": "markdown", 24 | "metadata": { 25 | "slideshow": { 26 | "slide_type": "slide" 27 | } 28 | }, 29 | "source": [ 30 | "## Class exercise: amphetamine and appetite\n", 31 | "\n", 32 | "Amphetamine is a drug that suppresses appetite. In a study of this effect, a pharmocologist randomly allocated 24 rats to three treatment groups to receive an injection of amphetamine at one of two dosage levels (2.5 mg/kg or 5.0 mg/kg), or an injection of saline solution (0 mg/kg). She measured the amount of food consumed by each animal (in gm/kg) in the 3-hour period following injection. The results (gm of food consumed per kg of body weight) are shown below.\n" 33 | ] 34 | }, 35 | { 36 | "cell_type": "code", 37 | "execution_count": 1, 38 | "metadata": {}, 39 | "outputs": [], 40 | "source": [ 41 | "# imports and setup\n", 42 | "\n", 43 | "import scipy as sc\n", 44 | "import numpy as np\n", 45 | "\n", 46 | "import pandas as pd\n", 47 | "import statsmodels.formula.api as sm \n", 48 | "from sklearn import linear_model \n", 49 | "\n", 50 | "import matplotlib.pyplot as plt\n", 51 | "%matplotlib inline \n", 52 | "plt.rcParams['figure.figsize'] = (10, 6)\n", 53 | "\n", 54 | "from mpl_toolkits.mplot3d import Axes3D\n", 55 | "from matplotlib import cm\n", 56 | "\n", 57 | "# Experiment results:\n", 58 | "\n", 59 | "food_consump_dose0 = [112.6, 102.1, 90.2, 81.5, 105.6, 93.0, 106.6, 108.3]\n", 60 | "food_consump_dose2p5 = [73.3, 84.8, 67.3, 55.3, 80.7, 90.0, 75.5, 77.1]\n", 61 | "food_consump_dose5 = [38.5, 81.3, 57.1, 62.3, 51.5, 48.3, 42.7, 57.9]" 62 | ] 63 | }, 64 | { 65 | "cell_type": "markdown", 66 | "metadata": {}, 67 | "source": [ 68 | "## Activity 1: Scatterplot and Linear Regression\n", 69 | "\n", 70 | "**Exercise:** Make a scatter plot with dose as the $x$-variable and food consumption as the $y$ variable. Then run a linear regression on the data using the 'ols' function from the statsmodels python library to relate the variables by \n", 71 | "\n", 72 | "$$\n", 73 | "\\text{Food Consumption} = \\beta_0 + \\beta_1 \\text{Dose}. \n", 74 | "$$\n", 75 | "\n", 76 | "What is the resulting linear equation? What is the $R^2$ value? Do you think the variables have a strong linear relationship? Add the line to your scatter plot.\n" 77 | ] 78 | }, 79 | { 80 | "cell_type": "code", 81 | "execution_count": 18, 82 | "metadata": {}, 83 | "outputs": [], 84 | "source": [ 85 | "# your code goes here\n" 86 | ] 87 | }, 88 | { 89 | "cell_type": "markdown", 90 | "metadata": {}, 91 | "source": [ 92 | "**Your answer goes here:**" 93 | ] 94 | }, 95 | { 96 | "cell_type": "markdown", 97 | "metadata": { 98 | "slideshow": { 99 | "slide_type": "slide" 100 | } 101 | }, 102 | "source": [ 103 | "## Activity 2: Residuals\n", 104 | "\n", 105 | "The regression in Activity 1 is in fact valid even though the predictor $x$ only has 3 distinct values; for each fixed value of $x$, the researcher collected a random sample of $y$ values.\n", 106 | "\n", 107 | "However, one assumption which is made by simple linear regression is that the residuals have an approximately normal distribution.\n", 108 | "\n", 109 | "**Exercise:** Compute the residuals for the above regression and make a normal probability plot of the residuals. Do you think they are approximately normally distributed? 
\n", 110 | "\n" 111 | ] 112 | }, 113 | { 114 | "cell_type": "code", 115 | "execution_count": 19, 116 | "metadata": { 117 | "slideshow": { 118 | "slide_type": "-" 119 | } 120 | }, 121 | "outputs": [], 122 | "source": [ 123 | "# your code goes here \n" 124 | ] 125 | }, 126 | { 127 | "cell_type": "markdown", 128 | "metadata": { 129 | "slideshow": { 130 | "slide_type": "-" 131 | } 132 | }, 133 | "source": [ 134 | "**Your answer goes here:**\n" 135 | ] 136 | }, 137 | { 138 | "cell_type": "code", 139 | "execution_count": null, 140 | "metadata": {}, 141 | "outputs": [], 142 | "source": [] 143 | } 144 | ], 145 | "metadata": { 146 | "anaconda-cloud": {}, 147 | "celltoolbar": "Slideshow", 148 | "kernelspec": { 149 | "display_name": "Python 3 (ipykernel)", 150 | "language": "python", 151 | "name": "python3" 152 | }, 153 | "language_info": { 154 | "codemirror_mode": { 155 | "name": "ipython", 156 | "version": 3 157 | }, 158 | "file_extension": ".py", 159 | "mimetype": "text/x-python", 160 | "name": "python", 161 | "nbconvert_exporter": "python", 162 | "pygments_lexer": "ipython3", 163 | "version": "3.11.5" 164 | } 165 | }, 166 | "nbformat": 4, 167 | "nbformat_minor": 4 168 | } 169 | -------------------------------------------------------------------------------- /09-LinearRegression1/438px-Linear_regression.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/09-LinearRegression1/438px-Linear_regression.png -------------------------------------------------------------------------------- /09-LinearRegression1/Advertising.csv: -------------------------------------------------------------------------------- 1 | "","TV","Radio","Newspaper","Sales" 2 | "1",230.1,37.8,69.2,22.1 3 | "2",44.5,39.3,45.1,10.4 4 | "3",17.2,45.9,69.3,9.3 5 | "4",151.5,41.3,58.5,18.5 6 | "5",180.8,10.8,58.4,12.9 7 | "6",8.7,48.9,75,7.2 8 | "7",57.5,32.8,23.5,11.8 9 | "8",120.2,19.6,11.6,13.2 10 | "9",8.6,2.1,1,4.8 11 | "10",199.8,2.6,21.2,10.6 12 | "11",66.1,5.8,24.2,8.6 13 | "12",214.7,24,4,17.4 14 | "13",23.8,35.1,65.9,9.2 15 | "14",97.5,7.6,7.2,9.7 16 | "15",204.1,32.9,46,19 17 | "16",195.4,47.7,52.9,22.4 18 | "17",67.8,36.6,114,12.5 19 | "18",281.4,39.6,55.8,24.4 20 | "19",69.2,20.5,18.3,11.3 21 | "20",147.3,23.9,19.1,14.6 22 | "21",218.4,27.7,53.4,18 23 | "22",237.4,5.1,23.5,12.5 24 | "23",13.2,15.9,49.6,5.6 25 | "24",228.3,16.9,26.2,15.5 26 | "25",62.3,12.6,18.3,9.7 27 | "26",262.9,3.5,19.5,12 28 | "27",142.9,29.3,12.6,15 29 | "28",240.1,16.7,22.9,15.9 30 | "29",248.8,27.1,22.9,18.9 31 | "30",70.6,16,40.8,10.5 32 | "31",292.9,28.3,43.2,21.4 33 | "32",112.9,17.4,38.6,11.9 34 | "33",97.2,1.5,30,9.6 35 | "34",265.6,20,0.3,17.4 36 | "35",95.7,1.4,7.4,9.5 37 | "36",290.7,4.1,8.5,12.8 38 | "37",266.9,43.8,5,25.4 39 | "38",74.7,49.4,45.7,14.7 40 | "39",43.1,26.7,35.1,10.1 41 | "40",228,37.7,32,21.5 42 | "41",202.5,22.3,31.6,16.6 43 | "42",177,33.4,38.7,17.1 44 | "43",293.6,27.7,1.8,20.7 45 | "44",206.9,8.4,26.4,12.9 46 | "45",25.1,25.7,43.3,8.5 47 | "46",175.1,22.5,31.5,14.9 48 | "47",89.7,9.9,35.7,10.6 49 | "48",239.9,41.5,18.5,23.2 50 | "49",227.2,15.8,49.9,14.8 51 | "50",66.9,11.7,36.8,9.7 52 | "51",199.8,3.1,34.6,11.4 53 | "52",100.4,9.6,3.6,10.7 54 | "53",216.4,41.7,39.6,22.6 55 | "54",182.6,46.2,58.7,21.2 56 | "55",262.7,28.8,15.9,20.2 57 | "56",198.9,49.4,60,23.7 58 | "57",7.3,28.1,41.4,5.5 59 | "58",136.2,19.2,16.6,13.2 60 | "59",210.8,49.6,37.7,23.8 61 | 
"60",210.7,29.5,9.3,18.4 62 | "61",53.5,2,21.4,8.1 63 | "62",261.3,42.7,54.7,24.2 64 | "63",239.3,15.5,27.3,15.7 65 | "64",102.7,29.6,8.4,14 66 | "65",131.1,42.8,28.9,18 67 | "66",69,9.3,0.9,9.3 68 | "67",31.5,24.6,2.2,9.5 69 | "68",139.3,14.5,10.2,13.4 70 | "69",237.4,27.5,11,18.9 71 | "70",216.8,43.9,27.2,22.3 72 | "71",199.1,30.6,38.7,18.3 73 | "72",109.8,14.3,31.7,12.4 74 | "73",26.8,33,19.3,8.8 75 | "74",129.4,5.7,31.3,11 76 | "75",213.4,24.6,13.1,17 77 | "76",16.9,43.7,89.4,8.7 78 | "77",27.5,1.6,20.7,6.9 79 | "78",120.5,28.5,14.2,14.2 80 | "79",5.4,29.9,9.4,5.3 81 | "80",116,7.7,23.1,11 82 | "81",76.4,26.7,22.3,11.8 83 | "82",239.8,4.1,36.9,12.3 84 | "83",75.3,20.3,32.5,11.3 85 | "84",68.4,44.5,35.6,13.6 86 | "85",213.5,43,33.8,21.7 87 | "86",193.2,18.4,65.7,15.2 88 | "87",76.3,27.5,16,12 89 | "88",110.7,40.6,63.2,16 90 | "89",88.3,25.5,73.4,12.9 91 | "90",109.8,47.8,51.4,16.7 92 | "91",134.3,4.9,9.3,11.2 93 | "92",28.6,1.5,33,7.3 94 | "93",217.7,33.5,59,19.4 95 | "94",250.9,36.5,72.3,22.2 96 | "95",107.4,14,10.9,11.5 97 | "96",163.3,31.6,52.9,16.9 98 | "97",197.6,3.5,5.9,11.7 99 | "98",184.9,21,22,15.5 100 | "99",289.7,42.3,51.2,25.4 101 | "100",135.2,41.7,45.9,17.2 102 | "101",222.4,4.3,49.8,11.7 103 | "102",296.4,36.3,100.9,23.8 104 | "103",280.2,10.1,21.4,14.8 105 | "104",187.9,17.2,17.9,14.7 106 | "105",238.2,34.3,5.3,20.7 107 | "106",137.9,46.4,59,19.2 108 | "107",25,11,29.7,7.2 109 | "108",90.4,0.3,23.2,8.7 110 | "109",13.1,0.4,25.6,5.3 111 | "110",255.4,26.9,5.5,19.8 112 | "111",225.8,8.2,56.5,13.4 113 | "112",241.7,38,23.2,21.8 114 | "113",175.7,15.4,2.4,14.1 115 | "114",209.6,20.6,10.7,15.9 116 | "115",78.2,46.8,34.5,14.6 117 | "116",75.1,35,52.7,12.6 118 | "117",139.2,14.3,25.6,12.2 119 | "118",76.4,0.8,14.8,9.4 120 | "119",125.7,36.9,79.2,15.9 121 | "120",19.4,16,22.3,6.6 122 | "121",141.3,26.8,46.2,15.5 123 | "122",18.8,21.7,50.4,7 124 | "123",224,2.4,15.6,11.6 125 | "124",123.1,34.6,12.4,15.2 126 | "125",229.5,32.3,74.2,19.7 127 | "126",87.2,11.8,25.9,10.6 128 | "127",7.8,38.9,50.6,6.6 129 | "128",80.2,0,9.2,8.8 130 | "129",220.3,49,3.2,24.7 131 | "130",59.6,12,43.1,9.7 132 | "131",0.7,39.6,8.7,1.6 133 | "132",265.2,2.9,43,12.7 134 | "133",8.4,27.2,2.1,5.7 135 | "134",219.8,33.5,45.1,19.6 136 | "135",36.9,38.6,65.6,10.8 137 | "136",48.3,47,8.5,11.6 138 | "137",25.6,39,9.3,9.5 139 | "138",273.7,28.9,59.7,20.8 140 | "139",43,25.9,20.5,9.6 141 | "140",184.9,43.9,1.7,20.7 142 | "141",73.4,17,12.9,10.9 143 | "142",193.7,35.4,75.6,19.2 144 | "143",220.5,33.2,37.9,20.1 145 | "144",104.6,5.7,34.4,10.4 146 | "145",96.2,14.8,38.9,11.4 147 | "146",140.3,1.9,9,10.3 148 | "147",240.1,7.3,8.7,13.2 149 | "148",243.2,49,44.3,25.4 150 | "149",38,40.3,11.9,10.9 151 | "150",44.7,25.8,20.6,10.1 152 | "151",280.7,13.9,37,16.1 153 | "152",121,8.4,48.7,11.6 154 | "153",197.6,23.3,14.2,16.6 155 | "154",171.3,39.7,37.7,19 156 | "155",187.8,21.1,9.5,15.6 157 | "156",4.1,11.6,5.7,3.2 158 | "157",93.9,43.5,50.5,15.3 159 | "158",149.8,1.3,24.3,10.1 160 | "159",11.7,36.9,45.2,7.3 161 | "160",131.7,18.4,34.6,12.9 162 | "161",172.5,18.1,30.7,14.4 163 | "162",85.7,35.8,49.3,13.3 164 | "163",188.4,18.1,25.6,14.9 165 | "164",163.5,36.8,7.4,18 166 | "165",117.2,14.7,5.4,11.9 167 | "166",234.5,3.4,84.8,11.9 168 | "167",17.9,37.6,21.6,8 169 | "168",206.8,5.2,19.4,12.2 170 | "169",215.4,23.6,57.6,17.1 171 | "170",284.3,10.6,6.4,15 172 | "171",50,11.6,18.4,8.4 173 | "172",164.5,20.9,47.4,14.5 174 | "173",19.6,20.1,17,7.6 175 | "174",168.4,7.1,12.8,11.7 176 | "175",222.4,3.4,13.1,11.5 177 | 
"176",276.9,48.9,41.8,27 178 | "177",248.4,30.2,20.3,20.2 179 | "178",170.2,7.8,35.2,11.7 180 | "179",276.7,2.3,23.7,11.8 181 | "180",165.6,10,17.6,12.6 182 | "181",156.6,2.6,8.3,10.5 183 | "182",218.5,5.4,27.4,12.2 184 | "183",56.2,5.7,29.7,8.7 185 | "184",287.6,43,71.8,26.2 186 | "185",253.8,21.3,30,17.6 187 | "186",205,45.1,19.6,22.6 188 | "187",139.5,2.1,26.6,10.3 189 | "188",191.1,28.7,18.2,17.3 190 | "189",286,13.9,3.7,15.9 191 | "190",18.7,12.1,23.4,6.7 192 | "191",39.5,41.1,5.8,10.8 193 | "192",75.5,10.8,6,9.9 194 | "193",17.2,4.1,31.6,5.9 195 | "194",166.8,42,3.6,19.6 196 | "195",149.7,35.6,6,17.3 197 | "196",38.2,3.7,13.8,7.6 198 | "197",94.2,4.9,8.1,9.7 199 | "198",177,9.3,6.4,12.8 200 | "199",283.6,42,66.2,25.5 201 | "200",232.1,8.6,8.7,13.4 202 | -------------------------------------------------------------------------------- /09-LinearRegression1/SLR.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/09-LinearRegression1/SLR.pdf -------------------------------------------------------------------------------- /10-LinearRegression2/438px-Linear_regression.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/10-LinearRegression2/438px-Linear_regression.png -------------------------------------------------------------------------------- /10-LinearRegression2/9-LinearRegression2_Activity.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "slideshow": { 7 | "slide_type": "slide" 8 | } 9 | }, 10 | "source": [ 11 | "# Introduction to Data Science \n", 12 | "# Activity for Lecture 9: Linear Regression 2\n", 13 | "*COMP 5360 / MATH 4100, University of Utah, http://datasciencecourse.net/*\n", 14 | "\n", 15 | "Name:\n", 16 | "\n", 17 | "Email:\n", 18 | "\n", 19 | "UID:\n" 20 | ] 21 | }, 22 | { 23 | "cell_type": "markdown", 24 | "metadata": { 25 | "slideshow": { 26 | "slide_type": "slide" 27 | } 28 | }, 29 | "source": [ 30 | "## Class exercise: analysis of the credit dataset \n", 31 | "\n", 32 | "Recall the 'Credit' dataset introduced in class and available [here](http://www-bcf.usc.edu/~gareth/ISL/data.html). \n", 33 | "This dataset consists of some credit card information for 400 people. \n", 34 | "\n", 35 | "First import the data and convert income to thousands.\n" 36 | ] 37 | }, 38 | { 39 | "cell_type": "code", 40 | "execution_count": 1, 41 | "metadata": {}, 42 | "outputs": [ 43 | { 44 | "data": { 45 | "text/html": [ 46 | "
\n", 47 | "\n", 60 | "
\n", 61 | " \n", 62 | " \n", 63 | " \n", 64 | " \n", 65 | " \n", 66 | " \n", 67 | " \n", 68 | " \n", 69 | " \n", 70 | " \n", 71 | " \n", 72 | " \n", 73 | " \n", 74 | " \n", 75 | " \n", 76 | " \n", 77 | " \n", 78 | " \n", 79 | " \n", 80 | " \n", 81 | " \n", 82 | " \n", 83 | " \n", 84 | " \n", 85 | " \n", 86 | " \n", 87 | " \n", 88 | " \n", 89 | " \n", 90 | " \n", 91 | " \n", 92 | " \n", 93 | " \n", 94 | " \n", 95 | " \n", 96 | " \n", 97 | " \n", 98 | " \n", 99 | " \n", 100 | " \n", 101 | " \n", 102 | " \n", 103 | " \n", 104 | " \n", 105 | " \n", 106 | " \n", 107 | " \n", 108 | " \n", 109 | " \n", 110 | " \n", 111 | " \n", 112 | " \n", 113 | " \n", 114 | " \n", 115 | " \n", 116 | " \n", 117 | " \n", 118 | " \n", 119 | " \n", 120 | " \n", 121 | " \n", 122 | " \n", 123 | " \n", 124 | " \n", 125 | " \n", 126 | " \n", 127 | " \n", 128 | " \n", 129 | " \n", 130 | " \n", 131 | " \n", 132 | " \n", 133 | " \n", 134 | " \n", 135 | " \n", 136 | " \n", 137 | " \n", 138 | " \n", 139 | " \n", 140 | " \n", 141 | " \n", 142 | " \n", 143 | " \n", 144 | " \n", 145 | " \n", 146 | " \n", 147 | " \n", 148 | " \n", 149 | " \n", 150 | " \n", 151 | " \n", 152 | " \n", 153 | " \n", 154 | " \n", 155 | " \n", 156 | " \n", 157 | " \n", 158 | " \n", 159 | " \n", 160 | " \n", 161 | " \n", 162 | " \n", 163 | " \n", 164 | " \n", 165 | " \n", 166 | " \n", 167 | " \n", 168 | " \n", 169 | " \n", 170 | " \n", 171 | " \n", 172 | " \n", 173 | " \n", 174 | " \n", 175 | " \n", 176 | " \n", 177 | " \n", 178 | " \n", 179 | " \n", 180 | " \n", 181 | " \n", 182 | " \n", 183 | " \n", 184 | " \n", 185 | " \n", 186 | " \n", 187 | " \n", 188 | " \n", 189 | " \n", 190 | " \n", 191 | " \n", 192 | " \n", 193 | " \n", 194 | " \n", 195 | " \n", 196 | " \n", 197 | " \n", 198 | " \n", 199 | " \n", 200 | " \n", 201 | " \n", 202 | " \n", 203 | " \n", 204 | " \n", 205 | " \n", 206 | " \n", 207 | " \n", 208 | " \n", 209 | " \n", 210 | " \n", 211 | " \n", 212 | " \n", 213 | " \n", 214 | " \n", 215 | " \n", 216 | " \n", 217 | " \n", 218 | " \n", 219 | " \n", 220 | " \n", 221 | " \n", 222 | " \n", 223 | " \n", 224 | " \n", 225 | " \n", 226 | " \n", 227 | " \n", 228 | " \n", 229 | " \n", 230 | " \n", 231 | " \n", 232 | " \n", 233 | "
IncomeLimitRatingCardsAgeEducationGenderStudentMarriedEthnicityBalance
114891.0360628323411MaleNoYesCaucasian333
2106025.0664548338215FemaleYesYesAsian903
3104593.0707551447111MaleNoNoAsian580
4148924.0950468133611FemaleNoNoAsian964
555882.0489735726816MaleNoYesCaucasian331
....................................
39612096.0410030733213MaleNoYesCaucasian560
39713364.0383829656517MaleNoNoAfrican American480
39857872.0417132156712FemaleNoYesCaucasian138
39937728.0252519214413MaleNoYesCaucasian0
40018701.055244155647FemaleNoNoAsian966
\n", 234 | "

400 rows × 11 columns

\n", 235 | "
" 236 | ], 237 | "text/plain": [ 238 | " Income Limit Rating Cards Age Education Gender Student Married \\\n", 239 | "1 14891.0 3606 283 2 34 11 Male No Yes \n", 240 | "2 106025.0 6645 483 3 82 15 Female Yes Yes \n", 241 | "3 104593.0 7075 514 4 71 11 Male No No \n", 242 | "4 148924.0 9504 681 3 36 11 Female No No \n", 243 | "5 55882.0 4897 357 2 68 16 Male No Yes \n", 244 | ".. ... ... ... ... ... ... ... ... ... \n", 245 | "396 12096.0 4100 307 3 32 13 Male No Yes \n", 246 | "397 13364.0 3838 296 5 65 17 Male No No \n", 247 | "398 57872.0 4171 321 5 67 12 Female No Yes \n", 248 | "399 37728.0 2525 192 1 44 13 Male No Yes \n", 249 | "400 18701.0 5524 415 5 64 7 Female No No \n", 250 | "\n", 251 | " Ethnicity Balance \n", 252 | "1 Caucasian 333 \n", 253 | "2 Asian 903 \n", 254 | "3 Asian 580 \n", 255 | "4 Asian 964 \n", 256 | "5 Caucasian 331 \n", 257 | ".. ... ... \n", 258 | "396 Caucasian 560 \n", 259 | "397 African American 480 \n", 260 | "398 Caucasian 138 \n", 261 | "399 Caucasian 0 \n", 262 | "400 Asian 966 \n", 263 | "\n", 264 | "[400 rows x 11 columns]" 265 | ] 266 | }, 267 | "execution_count": 1, 268 | "metadata": {}, 269 | "output_type": "execute_result" 270 | } 271 | ], 272 | "source": [ 273 | "# imports and setup\n", 274 | "\n", 275 | "import scipy as sc\n", 276 | "import numpy as np\n", 277 | "\n", 278 | "import pandas as pd\n", 279 | "import statsmodels.formula.api as sm #Last lecture: used statsmodels.formula.api.ols() for OLS\n", 280 | "from sklearn import linear_model #Last lecture: used sklearn.linear_model.LinearRegression() for OLS\n", 281 | "\n", 282 | "import matplotlib.pyplot as plt\n", 283 | "%matplotlib inline \n", 284 | "plt.rcParams['figure.figsize'] = (10, 6)\n", 285 | "\n", 286 | "from mpl_toolkits.mplot3d import Axes3D\n", 287 | "from matplotlib import cm\n", 288 | "\n", 289 | "# Import data from Credit.csv file\n", 290 | "credit = pd.read_csv('Credit.csv',index_col=0) #load data\n", 291 | "credit[\"Income\"] = credit[\"Income\"].map(lambda x: 1000*x)\n", 292 | "credit" 293 | ] 294 | }, 295 | { 296 | "cell_type": "markdown", 297 | "metadata": {}, 298 | "source": [ 299 | "## Activity 1: A First Regression Model\n", 300 | "\n", 301 | "**Exercise:** First regress Limit on Rating: \n", 302 | "$$\n", 303 | "\\text{Limit} = \\beta_0 + \\beta_1 \\text{Rating}. \n", 304 | "$$\n", 305 | "Since credit ratings are primarily used by banks to determine credit limits, we expect that Rating is very predictive for Limit, so this regression should be very good. \n", 306 | "\n", 307 | "Use the 'ols' function from the statsmodels python library. What is the $R^2$ value? What are $H_0$ and $H_A$ for the associated hypothesis test and what is the $p$-value? \n" 308 | ] 309 | }, 310 | { 311 | "cell_type": "code", 312 | "execution_count": 18, 313 | "metadata": {}, 314 | "outputs": [], 315 | "source": [ 316 | "# your code goes here\n" 317 | ] 318 | }, 319 | { 320 | "cell_type": "markdown", 321 | "metadata": {}, 322 | "source": [ 323 | "**Your answer goes here:**" 324 | ] 325 | }, 326 | { 327 | "cell_type": "markdown", 328 | "metadata": { 329 | "slideshow": { 330 | "slide_type": "slide" 331 | } 332 | }, 333 | "source": [ 334 | "## Activity 2: Predicting Limit without Rating \n", 335 | "\n", 336 | "Since Rating and Limit are almost the same variable, next we'll forget about Rating and just try to predict Limit from the real-valued variables (non-categorical variables): Income, Cards, Age, Education, Balance. 
\n", 337 | "\n", 338 | "**Exercise:** Develop a multilinear regression model to predict Rating. Interpret the results. \n", 339 | "\n", 340 | "For now, just focus on the real-valued variables (Income, Cards, Age, Education, Balance)\n", 341 | "and ignore the categorical variables (Gender, Student, Married, Ethnicity). \n", 342 | "\n" 343 | ] 344 | }, 345 | { 346 | "cell_type": "code", 347 | "execution_count": 19, 348 | "metadata": { 349 | "slideshow": { 350 | "slide_type": "-" 351 | } 352 | }, 353 | "outputs": [], 354 | "source": [ 355 | "# your code goes here \n" 356 | ] 357 | }, 358 | { 359 | "cell_type": "markdown", 360 | "metadata": { 361 | "slideshow": { 362 | "slide_type": "-" 363 | } 364 | }, 365 | "source": [ 366 | "Which independent variables are good/bad predictors? What is the best overall model?\n", 367 | "\n", 368 | "**Your observations:**\n" 369 | ] 370 | }, 371 | { 372 | "cell_type": "markdown", 373 | "metadata": { 374 | "slideshow": { 375 | "slide_type": "slide" 376 | } 377 | }, 378 | "source": [ 379 | "## Activity 3: Incorporating Categorical Variables Into Regression Models\n", 380 | "\n", 381 | "Now consider the binary categorical variables which we mapped to integer 0, 1 values in class." 382 | ] 383 | }, 384 | { 385 | "cell_type": "code", 386 | "execution_count": 2, 387 | "metadata": { 388 | "slideshow": { 389 | "slide_type": "-" 390 | } 391 | }, 392 | "outputs": [], 393 | "source": [ 394 | "credit[\"Gender_num\"] = credit[\"Gender\"].map({' Male':0, 'Female':1})\n", 395 | "credit[\"Student_num\"] = credit[\"Student\"].map({'Yes':1, 'No':0})\n", 396 | "credit[\"Married_num\"] = credit[\"Married\"].map({'Yes':1, 'No':0})" 397 | ] 398 | }, 399 | { 400 | "cell_type": "markdown", 401 | "metadata": { 402 | "slideshow": { 403 | "slide_type": "-" 404 | } 405 | }, 406 | "source": [ 407 | "Can you improve the model you developed in Activity 2 by incorporating one or more of these variables?\n" 408 | ] 409 | }, 410 | { 411 | "cell_type": "code", 412 | "execution_count": 3, 413 | "metadata": { 414 | "slideshow": { 415 | "slide_type": "-" 416 | } 417 | }, 418 | "outputs": [], 419 | "source": [ 420 | "# your code here \n" 421 | ] 422 | }, 423 | { 424 | "cell_type": "markdown", 425 | "metadata": {}, 426 | "source": [ 427 | "**Your answer goes here:**" 428 | ] 429 | } 430 | ], 431 | "metadata": { 432 | "anaconda-cloud": {}, 433 | "celltoolbar": "Slideshow", 434 | "kernelspec": { 435 | "display_name": "Python 3 (ipykernel)", 436 | "language": "python", 437 | "name": "python3" 438 | }, 439 | "language_info": { 440 | "codemirror_mode": { 441 | "name": "ipython", 442 | "version": 3 443 | }, 444 | "file_extension": ".py", 445 | "mimetype": "text/x-python", 446 | "name": "python", 447 | "nbconvert_exporter": "python", 448 | "pygments_lexer": "ipython3", 449 | "version": "3.11.5" 450 | } 451 | }, 452 | "nbformat": 4, 453 | "nbformat_minor": 4 454 | } 455 | -------------------------------------------------------------------------------- /10-LinearRegression2/Advertising.csv: -------------------------------------------------------------------------------- 1 | "","TV","Radio","Newspaper","Sales" 2 | "1",230.1,37.8,69.2,22.1 3 | "2",44.5,39.3,45.1,10.4 4 | "3",17.2,45.9,69.3,9.3 5 | "4",151.5,41.3,58.5,18.5 6 | "5",180.8,10.8,58.4,12.9 7 | "6",8.7,48.9,75,7.2 8 | "7",57.5,32.8,23.5,11.8 9 | "8",120.2,19.6,11.6,13.2 10 | "9",8.6,2.1,1,4.8 11 | "10",199.8,2.6,21.2,10.6 12 | "11",66.1,5.8,24.2,8.6 13 | "12",214.7,24,4,17.4 14 | "13",23.8,35.1,65.9,9.2 15 | "14",97.5,7.6,7.2,9.7 16 | 
"15",204.1,32.9,46,19 17 | "16",195.4,47.7,52.9,22.4 18 | "17",67.8,36.6,114,12.5 19 | "18",281.4,39.6,55.8,24.4 20 | "19",69.2,20.5,18.3,11.3 21 | "20",147.3,23.9,19.1,14.6 22 | "21",218.4,27.7,53.4,18 23 | "22",237.4,5.1,23.5,12.5 24 | "23",13.2,15.9,49.6,5.6 25 | "24",228.3,16.9,26.2,15.5 26 | "25",62.3,12.6,18.3,9.7 27 | "26",262.9,3.5,19.5,12 28 | "27",142.9,29.3,12.6,15 29 | "28",240.1,16.7,22.9,15.9 30 | "29",248.8,27.1,22.9,18.9 31 | "30",70.6,16,40.8,10.5 32 | "31",292.9,28.3,43.2,21.4 33 | "32",112.9,17.4,38.6,11.9 34 | "33",97.2,1.5,30,9.6 35 | "34",265.6,20,0.3,17.4 36 | "35",95.7,1.4,7.4,9.5 37 | "36",290.7,4.1,8.5,12.8 38 | "37",266.9,43.8,5,25.4 39 | "38",74.7,49.4,45.7,14.7 40 | "39",43.1,26.7,35.1,10.1 41 | "40",228,37.7,32,21.5 42 | "41",202.5,22.3,31.6,16.6 43 | "42",177,33.4,38.7,17.1 44 | "43",293.6,27.7,1.8,20.7 45 | "44",206.9,8.4,26.4,12.9 46 | "45",25.1,25.7,43.3,8.5 47 | "46",175.1,22.5,31.5,14.9 48 | "47",89.7,9.9,35.7,10.6 49 | "48",239.9,41.5,18.5,23.2 50 | "49",227.2,15.8,49.9,14.8 51 | "50",66.9,11.7,36.8,9.7 52 | "51",199.8,3.1,34.6,11.4 53 | "52",100.4,9.6,3.6,10.7 54 | "53",216.4,41.7,39.6,22.6 55 | "54",182.6,46.2,58.7,21.2 56 | "55",262.7,28.8,15.9,20.2 57 | "56",198.9,49.4,60,23.7 58 | "57",7.3,28.1,41.4,5.5 59 | "58",136.2,19.2,16.6,13.2 60 | "59",210.8,49.6,37.7,23.8 61 | "60",210.7,29.5,9.3,18.4 62 | "61",53.5,2,21.4,8.1 63 | "62",261.3,42.7,54.7,24.2 64 | "63",239.3,15.5,27.3,15.7 65 | "64",102.7,29.6,8.4,14 66 | "65",131.1,42.8,28.9,18 67 | "66",69,9.3,0.9,9.3 68 | "67",31.5,24.6,2.2,9.5 69 | "68",139.3,14.5,10.2,13.4 70 | "69",237.4,27.5,11,18.9 71 | "70",216.8,43.9,27.2,22.3 72 | "71",199.1,30.6,38.7,18.3 73 | "72",109.8,14.3,31.7,12.4 74 | "73",26.8,33,19.3,8.8 75 | "74",129.4,5.7,31.3,11 76 | "75",213.4,24.6,13.1,17 77 | "76",16.9,43.7,89.4,8.7 78 | "77",27.5,1.6,20.7,6.9 79 | "78",120.5,28.5,14.2,14.2 80 | "79",5.4,29.9,9.4,5.3 81 | "80",116,7.7,23.1,11 82 | "81",76.4,26.7,22.3,11.8 83 | "82",239.8,4.1,36.9,12.3 84 | "83",75.3,20.3,32.5,11.3 85 | "84",68.4,44.5,35.6,13.6 86 | "85",213.5,43,33.8,21.7 87 | "86",193.2,18.4,65.7,15.2 88 | "87",76.3,27.5,16,12 89 | "88",110.7,40.6,63.2,16 90 | "89",88.3,25.5,73.4,12.9 91 | "90",109.8,47.8,51.4,16.7 92 | "91",134.3,4.9,9.3,11.2 93 | "92",28.6,1.5,33,7.3 94 | "93",217.7,33.5,59,19.4 95 | "94",250.9,36.5,72.3,22.2 96 | "95",107.4,14,10.9,11.5 97 | "96",163.3,31.6,52.9,16.9 98 | "97",197.6,3.5,5.9,11.7 99 | "98",184.9,21,22,15.5 100 | "99",289.7,42.3,51.2,25.4 101 | "100",135.2,41.7,45.9,17.2 102 | "101",222.4,4.3,49.8,11.7 103 | "102",296.4,36.3,100.9,23.8 104 | "103",280.2,10.1,21.4,14.8 105 | "104",187.9,17.2,17.9,14.7 106 | "105",238.2,34.3,5.3,20.7 107 | "106",137.9,46.4,59,19.2 108 | "107",25,11,29.7,7.2 109 | "108",90.4,0.3,23.2,8.7 110 | "109",13.1,0.4,25.6,5.3 111 | "110",255.4,26.9,5.5,19.8 112 | "111",225.8,8.2,56.5,13.4 113 | "112",241.7,38,23.2,21.8 114 | "113",175.7,15.4,2.4,14.1 115 | "114",209.6,20.6,10.7,15.9 116 | "115",78.2,46.8,34.5,14.6 117 | "116",75.1,35,52.7,12.6 118 | "117",139.2,14.3,25.6,12.2 119 | "118",76.4,0.8,14.8,9.4 120 | "119",125.7,36.9,79.2,15.9 121 | "120",19.4,16,22.3,6.6 122 | "121",141.3,26.8,46.2,15.5 123 | "122",18.8,21.7,50.4,7 124 | "123",224,2.4,15.6,11.6 125 | "124",123.1,34.6,12.4,15.2 126 | "125",229.5,32.3,74.2,19.7 127 | "126",87.2,11.8,25.9,10.6 128 | "127",7.8,38.9,50.6,6.6 129 | "128",80.2,0,9.2,8.8 130 | "129",220.3,49,3.2,24.7 131 | "130",59.6,12,43.1,9.7 132 | "131",0.7,39.6,8.7,1.6 133 | "132",265.2,2.9,43,12.7 134 | "133",8.4,27.2,2.1,5.7 135 | 
"134",219.8,33.5,45.1,19.6 136 | "135",36.9,38.6,65.6,10.8 137 | "136",48.3,47,8.5,11.6 138 | "137",25.6,39,9.3,9.5 139 | "138",273.7,28.9,59.7,20.8 140 | "139",43,25.9,20.5,9.6 141 | "140",184.9,43.9,1.7,20.7 142 | "141",73.4,17,12.9,10.9 143 | "142",193.7,35.4,75.6,19.2 144 | "143",220.5,33.2,37.9,20.1 145 | "144",104.6,5.7,34.4,10.4 146 | "145",96.2,14.8,38.9,11.4 147 | "146",140.3,1.9,9,10.3 148 | "147",240.1,7.3,8.7,13.2 149 | "148",243.2,49,44.3,25.4 150 | "149",38,40.3,11.9,10.9 151 | "150",44.7,25.8,20.6,10.1 152 | "151",280.7,13.9,37,16.1 153 | "152",121,8.4,48.7,11.6 154 | "153",197.6,23.3,14.2,16.6 155 | "154",171.3,39.7,37.7,19 156 | "155",187.8,21.1,9.5,15.6 157 | "156",4.1,11.6,5.7,3.2 158 | "157",93.9,43.5,50.5,15.3 159 | "158",149.8,1.3,24.3,10.1 160 | "159",11.7,36.9,45.2,7.3 161 | "160",131.7,18.4,34.6,12.9 162 | "161",172.5,18.1,30.7,14.4 163 | "162",85.7,35.8,49.3,13.3 164 | "163",188.4,18.1,25.6,14.9 165 | "164",163.5,36.8,7.4,18 166 | "165",117.2,14.7,5.4,11.9 167 | "166",234.5,3.4,84.8,11.9 168 | "167",17.9,37.6,21.6,8 169 | "168",206.8,5.2,19.4,12.2 170 | "169",215.4,23.6,57.6,17.1 171 | "170",284.3,10.6,6.4,15 172 | "171",50,11.6,18.4,8.4 173 | "172",164.5,20.9,47.4,14.5 174 | "173",19.6,20.1,17,7.6 175 | "174",168.4,7.1,12.8,11.7 176 | "175",222.4,3.4,13.1,11.5 177 | "176",276.9,48.9,41.8,27 178 | "177",248.4,30.2,20.3,20.2 179 | "178",170.2,7.8,35.2,11.7 180 | "179",276.7,2.3,23.7,11.8 181 | "180",165.6,10,17.6,12.6 182 | "181",156.6,2.6,8.3,10.5 183 | "182",218.5,5.4,27.4,12.2 184 | "183",56.2,5.7,29.7,8.7 185 | "184",287.6,43,71.8,26.2 186 | "185",253.8,21.3,30,17.6 187 | "186",205,45.1,19.6,22.6 188 | "187",139.5,2.1,26.6,10.3 189 | "188",191.1,28.7,18.2,17.3 190 | "189",286,13.9,3.7,15.9 191 | "190",18.7,12.1,23.4,6.7 192 | "191",39.5,41.1,5.8,10.8 193 | "192",75.5,10.8,6,9.9 194 | "193",17.2,4.1,31.6,5.9 195 | "194",166.8,42,3.6,19.6 196 | "195",149.7,35.6,6,17.3 197 | "196",38.2,3.7,13.8,7.6 198 | "197",94.2,4.9,8.1,9.7 199 | "198",177,9.3,6.4,12.8 200 | "199",283.6,42,66.2,25.5 201 | "200",232.1,8.6,8.7,13.4 202 | -------------------------------------------------------------------------------- /10-LinearRegression2/Auto.csv: -------------------------------------------------------------------------------- 1 | mpg,cylinders,displacement,horsepower,weight,acceleration,year,origin,name 2 | 18,8,307,130,3504,12,70,1,chevrolet chevelle malibu 3 | 15,8,350,165,3693,11.5,70,1,buick skylark 320 4 | 18,8,318,150,3436,11,70,1,plymouth satellite 5 | 16,8,304,150,3433,12,70,1,amc rebel sst 6 | 17,8,302,140,3449,10.5,70,1,ford torino 7 | 15,8,429,198,4341,10,70,1,ford galaxie 500 8 | 14,8,454,220,4354,9,70,1,chevrolet impala 9 | 14,8,440,215,4312,8.5,70,1,plymouth fury iii 10 | 14,8,455,225,4425,10,70,1,pontiac catalina 11 | 15,8,390,190,3850,8.5,70,1,amc ambassador dpl 12 | 15,8,383,170,3563,10,70,1,dodge challenger se 13 | 14,8,340,160,3609,8,70,1,plymouth 'cuda 340 14 | 15,8,400,150,3761,9.5,70,1,chevrolet monte carlo 15 | 14,8,455,225,3086,10,70,1,buick estate wagon (sw) 16 | 24,4,113,95,2372,15,70,3,toyota corona mark ii 17 | 22,6,198,95,2833,15.5,70,1,plymouth duster 18 | 18,6,199,97,2774,15.5,70,1,amc hornet 19 | 21,6,200,85,2587,16,70,1,ford maverick 20 | 27,4,97,88,2130,14.5,70,3,datsun pl510 21 | 26,4,97,46,1835,20.5,70,2,volkswagen 1131 deluxe sedan 22 | 25,4,110,87,2672,17.5,70,2,peugeot 504 23 | 24,4,107,90,2430,14.5,70,2,audi 100 ls 24 | 25,4,104,95,2375,17.5,70,2,saab 99e 25 | 26,4,121,113,2234,12.5,70,2,bmw 2002 26 | 21,6,199,90,2648,15,70,1,amc gremlin 
27 | 10,8,360,215,4615,14,70,1,ford f250 28 | 10,8,307,200,4376,15,70,1,chevy c20 29 | 11,8,318,210,4382,13.5,70,1,dodge d200 30 | 9,8,304,193,4732,18.5,70,1,hi 1200d 31 | 27,4,97,88,2130,14.5,71,3,datsun pl510 32 | 28,4,140,90,2264,15.5,71,1,chevrolet vega 2300 33 | 25,4,113,95,2228,14,71,3,toyota corona 34 | 25,4,98,?,2046,19,71,1,ford pinto 35 | 19,6,232,100,2634,13,71,1,amc gremlin 36 | 16,6,225,105,3439,15.5,71,1,plymouth satellite custom 37 | 17,6,250,100,3329,15.5,71,1,chevrolet chevelle malibu 38 | 19,6,250,88,3302,15.5,71,1,ford torino 500 39 | 18,6,232,100,3288,15.5,71,1,amc matador 40 | 14,8,350,165,4209,12,71,1,chevrolet impala 41 | 14,8,400,175,4464,11.5,71,1,pontiac catalina brougham 42 | 14,8,351,153,4154,13.5,71,1,ford galaxie 500 43 | 14,8,318,150,4096,13,71,1,plymouth fury iii 44 | 12,8,383,180,4955,11.5,71,1,dodge monaco (sw) 45 | 13,8,400,170,4746,12,71,1,ford country squire (sw) 46 | 13,8,400,175,5140,12,71,1,pontiac safari (sw) 47 | 18,6,258,110,2962,13.5,71,1,amc hornet sportabout (sw) 48 | 22,4,140,72,2408,19,71,1,chevrolet vega (sw) 49 | 19,6,250,100,3282,15,71,1,pontiac firebird 50 | 18,6,250,88,3139,14.5,71,1,ford mustang 51 | 23,4,122,86,2220,14,71,1,mercury capri 2000 52 | 28,4,116,90,2123,14,71,2,opel 1900 53 | 30,4,79,70,2074,19.5,71,2,peugeot 304 54 | 30,4,88,76,2065,14.5,71,2,fiat 124b 55 | 31,4,71,65,1773,19,71,3,toyota corolla 1200 56 | 35,4,72,69,1613,18,71,3,datsun 1200 57 | 27,4,97,60,1834,19,71,2,volkswagen model 111 58 | 26,4,91,70,1955,20.5,71,1,plymouth cricket 59 | 24,4,113,95,2278,15.5,72,3,toyota corona hardtop 60 | 25,4,97.5,80,2126,17,72,1,dodge colt hardtop 61 | 23,4,97,54,2254,23.5,72,2,volkswagen type 3 62 | 20,4,140,90,2408,19.5,72,1,chevrolet vega 63 | 21,4,122,86,2226,16.5,72,1,ford pinto runabout 64 | 13,8,350,165,4274,12,72,1,chevrolet impala 65 | 14,8,400,175,4385,12,72,1,pontiac catalina 66 | 15,8,318,150,4135,13.5,72,1,plymouth fury iii 67 | 14,8,351,153,4129,13,72,1,ford galaxie 500 68 | 17,8,304,150,3672,11.5,72,1,amc ambassador sst 69 | 11,8,429,208,4633,11,72,1,mercury marquis 70 | 13,8,350,155,4502,13.5,72,1,buick lesabre custom 71 | 12,8,350,160,4456,13.5,72,1,oldsmobile delta 88 royale 72 | 13,8,400,190,4422,12.5,72,1,chrysler newport royal 73 | 19,3,70,97,2330,13.5,72,3,mazda rx2 coupe 74 | 15,8,304,150,3892,12.5,72,1,amc matador (sw) 75 | 13,8,307,130,4098,14,72,1,chevrolet chevelle concours (sw) 76 | 13,8,302,140,4294,16,72,1,ford gran torino (sw) 77 | 14,8,318,150,4077,14,72,1,plymouth satellite custom (sw) 78 | 18,4,121,112,2933,14.5,72,2,volvo 145e (sw) 79 | 22,4,121,76,2511,18,72,2,volkswagen 411 (sw) 80 | 21,4,120,87,2979,19.5,72,2,peugeot 504 (sw) 81 | 26,4,96,69,2189,18,72,2,renault 12 (sw) 82 | 22,4,122,86,2395,16,72,1,ford pinto (sw) 83 | 28,4,97,92,2288,17,72,3,datsun 510 (sw) 84 | 23,4,120,97,2506,14.5,72,3,toyouta corona mark ii (sw) 85 | 28,4,98,80,2164,15,72,1,dodge colt (sw) 86 | 27,4,97,88,2100,16.5,72,3,toyota corolla 1600 (sw) 87 | 13,8,350,175,4100,13,73,1,buick century 350 88 | 14,8,304,150,3672,11.5,73,1,amc matador 89 | 13,8,350,145,3988,13,73,1,chevrolet malibu 90 | 14,8,302,137,4042,14.5,73,1,ford gran torino 91 | 15,8,318,150,3777,12.5,73,1,dodge coronet custom 92 | 12,8,429,198,4952,11.5,73,1,mercury marquis brougham 93 | 13,8,400,150,4464,12,73,1,chevrolet caprice classic 94 | 13,8,351,158,4363,13,73,1,ford ltd 95 | 14,8,318,150,4237,14.5,73,1,plymouth fury gran sedan 96 | 13,8,440,215,4735,11,73,1,chrysler new yorker brougham 97 | 12,8,455,225,4951,11,73,1,buick electra 225 custom 98 | 
13,8,360,175,3821,11,73,1,amc ambassador brougham 99 | 18,6,225,105,3121,16.5,73,1,plymouth valiant 100 | 16,6,250,100,3278,18,73,1,chevrolet nova custom 101 | 18,6,232,100,2945,16,73,1,amc hornet 102 | 18,6,250,88,3021,16.5,73,1,ford maverick 103 | 23,6,198,95,2904,16,73,1,plymouth duster 104 | 26,4,97,46,1950,21,73,2,volkswagen super beetle 105 | 11,8,400,150,4997,14,73,1,chevrolet impala 106 | 12,8,400,167,4906,12.5,73,1,ford country 107 | 13,8,360,170,4654,13,73,1,plymouth custom suburb 108 | 12,8,350,180,4499,12.5,73,1,oldsmobile vista cruiser 109 | 18,6,232,100,2789,15,73,1,amc gremlin 110 | 20,4,97,88,2279,19,73,3,toyota carina 111 | 21,4,140,72,2401,19.5,73,1,chevrolet vega 112 | 22,4,108,94,2379,16.5,73,3,datsun 610 113 | 18,3,70,90,2124,13.5,73,3,maxda rx3 114 | 19,4,122,85,2310,18.5,73,1,ford pinto 115 | 21,6,155,107,2472,14,73,1,mercury capri v6 116 | 26,4,98,90,2265,15.5,73,2,fiat 124 sport coupe 117 | 15,8,350,145,4082,13,73,1,chevrolet monte carlo s 118 | 16,8,400,230,4278,9.5,73,1,pontiac grand prix 119 | 29,4,68,49,1867,19.5,73,2,fiat 128 120 | 24,4,116,75,2158,15.5,73,2,opel manta 121 | 20,4,114,91,2582,14,73,2,audi 100ls 122 | 19,4,121,112,2868,15.5,73,2,volvo 144ea 123 | 15,8,318,150,3399,11,73,1,dodge dart custom 124 | 24,4,121,110,2660,14,73,2,saab 99le 125 | 20,6,156,122,2807,13.5,73,3,toyota mark ii 126 | 11,8,350,180,3664,11,73,1,oldsmobile omega 127 | 20,6,198,95,3102,16.5,74,1,plymouth duster 128 | 21,6,200,?,2875,17,74,1,ford maverick 129 | 19,6,232,100,2901,16,74,1,amc hornet 130 | 15,6,250,100,3336,17,74,1,chevrolet nova 131 | 31,4,79,67,1950,19,74,3,datsun b210 132 | 26,4,122,80,2451,16.5,74,1,ford pinto 133 | 32,4,71,65,1836,21,74,3,toyota corolla 1200 134 | 25,4,140,75,2542,17,74,1,chevrolet vega 135 | 16,6,250,100,3781,17,74,1,chevrolet chevelle malibu classic 136 | 16,6,258,110,3632,18,74,1,amc matador 137 | 18,6,225,105,3613,16.5,74,1,plymouth satellite sebring 138 | 16,8,302,140,4141,14,74,1,ford gran torino 139 | 13,8,350,150,4699,14.5,74,1,buick century luxus (sw) 140 | 14,8,318,150,4457,13.5,74,1,dodge coronet custom (sw) 141 | 14,8,302,140,4638,16,74,1,ford gran torino (sw) 142 | 14,8,304,150,4257,15.5,74,1,amc matador (sw) 143 | 29,4,98,83,2219,16.5,74,2,audi fox 144 | 26,4,79,67,1963,15.5,74,2,volkswagen dasher 145 | 26,4,97,78,2300,14.5,74,2,opel manta 146 | 31,4,76,52,1649,16.5,74,3,toyota corona 147 | 32,4,83,61,2003,19,74,3,datsun 710 148 | 28,4,90,75,2125,14.5,74,1,dodge colt 149 | 24,4,90,75,2108,15.5,74,2,fiat 128 150 | 26,4,116,75,2246,14,74,2,fiat 124 tc 151 | 24,4,120,97,2489,15,74,3,honda civic 152 | 26,4,108,93,2391,15.5,74,3,subaru 153 | 31,4,79,67,2000,16,74,2,fiat x1.9 154 | 19,6,225,95,3264,16,75,1,plymouth valiant custom 155 | 18,6,250,105,3459,16,75,1,chevrolet nova 156 | 15,6,250,72,3432,21,75,1,mercury monarch 157 | 15,6,250,72,3158,19.5,75,1,ford maverick 158 | 16,8,400,170,4668,11.5,75,1,pontiac catalina 159 | 15,8,350,145,4440,14,75,1,chevrolet bel air 160 | 16,8,318,150,4498,14.5,75,1,plymouth grand fury 161 | 14,8,351,148,4657,13.5,75,1,ford ltd 162 | 17,6,231,110,3907,21,75,1,buick century 163 | 16,6,250,105,3897,18.5,75,1,chevroelt chevelle malibu 164 | 15,6,258,110,3730,19,75,1,amc matador 165 | 18,6,225,95,3785,19,75,1,plymouth fury 166 | 21,6,231,110,3039,15,75,1,buick skyhawk 167 | 20,8,262,110,3221,13.5,75,1,chevrolet monza 2+2 168 | 13,8,302,129,3169,12,75,1,ford mustang ii 169 | 29,4,97,75,2171,16,75,3,toyota corolla 170 | 23,4,140,83,2639,17,75,1,ford pinto 171 | 20,6,232,100,2914,16,75,1,amc gremlin 172 | 
23,4,140,78,2592,18.5,75,1,pontiac astro 173 | 24,4,134,96,2702,13.5,75,3,toyota corona 174 | 25,4,90,71,2223,16.5,75,2,volkswagen dasher 175 | 24,4,119,97,2545,17,75,3,datsun 710 176 | 18,6,171,97,2984,14.5,75,1,ford pinto 177 | 29,4,90,70,1937,14,75,2,volkswagen rabbit 178 | 19,6,232,90,3211,17,75,1,amc pacer 179 | 23,4,115,95,2694,15,75,2,audi 100ls 180 | 23,4,120,88,2957,17,75,2,peugeot 504 181 | 22,4,121,98,2945,14.5,75,2,volvo 244dl 182 | 25,4,121,115,2671,13.5,75,2,saab 99le 183 | 33,4,91,53,1795,17.5,75,3,honda civic cvcc 184 | 28,4,107,86,2464,15.5,76,2,fiat 131 185 | 25,4,116,81,2220,16.9,76,2,opel 1900 186 | 25,4,140,92,2572,14.9,76,1,capri ii 187 | 26,4,98,79,2255,17.7,76,1,dodge colt 188 | 27,4,101,83,2202,15.3,76,2,renault 12tl 189 | 17.5,8,305,140,4215,13,76,1,chevrolet chevelle malibu classic 190 | 16,8,318,150,4190,13,76,1,dodge coronet brougham 191 | 15.5,8,304,120,3962,13.9,76,1,amc matador 192 | 14.5,8,351,152,4215,12.8,76,1,ford gran torino 193 | 22,6,225,100,3233,15.4,76,1,plymouth valiant 194 | 22,6,250,105,3353,14.5,76,1,chevrolet nova 195 | 24,6,200,81,3012,17.6,76,1,ford maverick 196 | 22.5,6,232,90,3085,17.6,76,1,amc hornet 197 | 29,4,85,52,2035,22.2,76,1,chevrolet chevette 198 | 24.5,4,98,60,2164,22.1,76,1,chevrolet woody 199 | 29,4,90,70,1937,14.2,76,2,vw rabbit 200 | 33,4,91,53,1795,17.4,76,3,honda civic 201 | 20,6,225,100,3651,17.7,76,1,dodge aspen se 202 | 18,6,250,78,3574,21,76,1,ford granada ghia 203 | 18.5,6,250,110,3645,16.2,76,1,pontiac ventura sj 204 | 17.5,6,258,95,3193,17.8,76,1,amc pacer d/l 205 | 29.5,4,97,71,1825,12.2,76,2,volkswagen rabbit 206 | 32,4,85,70,1990,17,76,3,datsun b-210 207 | 28,4,97,75,2155,16.4,76,3,toyota corolla 208 | 26.5,4,140,72,2565,13.6,76,1,ford pinto 209 | 20,4,130,102,3150,15.7,76,2,volvo 245 210 | 13,8,318,150,3940,13.2,76,1,plymouth volare premier v8 211 | 19,4,120,88,3270,21.9,76,2,peugeot 504 212 | 19,6,156,108,2930,15.5,76,3,toyota mark ii 213 | 16.5,6,168,120,3820,16.7,76,2,mercedes-benz 280s 214 | 16.5,8,350,180,4380,12.1,76,1,cadillac seville 215 | 13,8,350,145,4055,12,76,1,chevy c10 216 | 13,8,302,130,3870,15,76,1,ford f108 217 | 13,8,318,150,3755,14,76,1,dodge d100 218 | 31.5,4,98,68,2045,18.5,77,3,honda accord cvcc 219 | 30,4,111,80,2155,14.8,77,1,buick opel isuzu deluxe 220 | 36,4,79,58,1825,18.6,77,2,renault 5 gtl 221 | 25.5,4,122,96,2300,15.5,77,1,plymouth arrow gs 222 | 33.5,4,85,70,1945,16.8,77,3,datsun f-10 hatchback 223 | 17.5,8,305,145,3880,12.5,77,1,chevrolet caprice classic 224 | 17,8,260,110,4060,19,77,1,oldsmobile cutlass supreme 225 | 15.5,8,318,145,4140,13.7,77,1,dodge monaco brougham 226 | 15,8,302,130,4295,14.9,77,1,mercury cougar brougham 227 | 17.5,6,250,110,3520,16.4,77,1,chevrolet concours 228 | 20.5,6,231,105,3425,16.9,77,1,buick skylark 229 | 19,6,225,100,3630,17.7,77,1,plymouth volare custom 230 | 18.5,6,250,98,3525,19,77,1,ford granada 231 | 16,8,400,180,4220,11.1,77,1,pontiac grand prix lj 232 | 15.5,8,350,170,4165,11.4,77,1,chevrolet monte carlo landau 233 | 15.5,8,400,190,4325,12.2,77,1,chrysler cordoba 234 | 16,8,351,149,4335,14.5,77,1,ford thunderbird 235 | 29,4,97,78,1940,14.5,77,2,volkswagen rabbit custom 236 | 24.5,4,151,88,2740,16,77,1,pontiac sunbird coupe 237 | 26,4,97,75,2265,18.2,77,3,toyota corolla liftback 238 | 25.5,4,140,89,2755,15.8,77,1,ford mustang ii 2+2 239 | 30.5,4,98,63,2051,17,77,1,chevrolet chevette 240 | 33.5,4,98,83,2075,15.9,77,1,dodge colt m/m 241 | 30,4,97,67,1985,16.4,77,3,subaru dl 242 | 30.5,4,97,78,2190,14.1,77,2,volkswagen dasher 243 | 
22,6,146,97,2815,14.5,77,3,datsun 810 244 | 21.5,4,121,110,2600,12.8,77,2,bmw 320i 245 | 21.5,3,80,110,2720,13.5,77,3,mazda rx-4 246 | 43.1,4,90,48,1985,21.5,78,2,volkswagen rabbit custom diesel 247 | 36.1,4,98,66,1800,14.4,78,1,ford fiesta 248 | 32.8,4,78,52,1985,19.4,78,3,mazda glc deluxe 249 | 39.4,4,85,70,2070,18.6,78,3,datsun b210 gx 250 | 36.1,4,91,60,1800,16.4,78,3,honda civic cvcc 251 | 19.9,8,260,110,3365,15.5,78,1,oldsmobile cutlass salon brougham 252 | 19.4,8,318,140,3735,13.2,78,1,dodge diplomat 253 | 20.2,8,302,139,3570,12.8,78,1,mercury monarch ghia 254 | 19.2,6,231,105,3535,19.2,78,1,pontiac phoenix lj 255 | 20.5,6,200,95,3155,18.2,78,1,chevrolet malibu 256 | 20.2,6,200,85,2965,15.8,78,1,ford fairmont (auto) 257 | 25.1,4,140,88,2720,15.4,78,1,ford fairmont (man) 258 | 20.5,6,225,100,3430,17.2,78,1,plymouth volare 259 | 19.4,6,232,90,3210,17.2,78,1,amc concord 260 | 20.6,6,231,105,3380,15.8,78,1,buick century special 261 | 20.8,6,200,85,3070,16.7,78,1,mercury zephyr 262 | 18.6,6,225,110,3620,18.7,78,1,dodge aspen 263 | 18.1,6,258,120,3410,15.1,78,1,amc concord d/l 264 | 19.2,8,305,145,3425,13.2,78,1,chevrolet monte carlo landau 265 | 17.7,6,231,165,3445,13.4,78,1,buick regal sport coupe (turbo) 266 | 18.1,8,302,139,3205,11.2,78,1,ford futura 267 | 17.5,8,318,140,4080,13.7,78,1,dodge magnum xe 268 | 30,4,98,68,2155,16.5,78,1,chevrolet chevette 269 | 27.5,4,134,95,2560,14.2,78,3,toyota corona 270 | 27.2,4,119,97,2300,14.7,78,3,datsun 510 271 | 30.9,4,105,75,2230,14.5,78,1,dodge omni 272 | 21.1,4,134,95,2515,14.8,78,3,toyota celica gt liftback 273 | 23.2,4,156,105,2745,16.7,78,1,plymouth sapporo 274 | 23.8,4,151,85,2855,17.6,78,1,oldsmobile starfire sx 275 | 23.9,4,119,97,2405,14.9,78,3,datsun 200-sx 276 | 20.3,5,131,103,2830,15.9,78,2,audi 5000 277 | 17,6,163,125,3140,13.6,78,2,volvo 264gl 278 | 21.6,4,121,115,2795,15.7,78,2,saab 99gle 279 | 16.2,6,163,133,3410,15.8,78,2,peugeot 604sl 280 | 31.5,4,89,71,1990,14.9,78,2,volkswagen scirocco 281 | 29.5,4,98,68,2135,16.6,78,3,honda accord lx 282 | 21.5,6,231,115,3245,15.4,79,1,pontiac lemans v6 283 | 19.8,6,200,85,2990,18.2,79,1,mercury zephyr 6 284 | 22.3,4,140,88,2890,17.3,79,1,ford fairmont 4 285 | 20.2,6,232,90,3265,18.2,79,1,amc concord dl 6 286 | 20.6,6,225,110,3360,16.6,79,1,dodge aspen 6 287 | 17,8,305,130,3840,15.4,79,1,chevrolet caprice classic 288 | 17.6,8,302,129,3725,13.4,79,1,ford ltd landau 289 | 16.5,8,351,138,3955,13.2,79,1,mercury grand marquis 290 | 18.2,8,318,135,3830,15.2,79,1,dodge st. 
regis 291 | 16.9,8,350,155,4360,14.9,79,1,buick estate wagon (sw) 292 | 15.5,8,351,142,4054,14.3,79,1,ford country squire (sw) 293 | 19.2,8,267,125,3605,15,79,1,chevrolet malibu classic (sw) 294 | 18.5,8,360,150,3940,13,79,1,chrysler lebaron town @ country (sw) 295 | 31.9,4,89,71,1925,14,79,2,vw rabbit custom 296 | 34.1,4,86,65,1975,15.2,79,3,maxda glc deluxe 297 | 35.7,4,98,80,1915,14.4,79,1,dodge colt hatchback custom 298 | 27.4,4,121,80,2670,15,79,1,amc spirit dl 299 | 25.4,5,183,77,3530,20.1,79,2,mercedes benz 300d 300 | 23,8,350,125,3900,17.4,79,1,cadillac eldorado 301 | 27.2,4,141,71,3190,24.8,79,2,peugeot 504 302 | 23.9,8,260,90,3420,22.2,79,1,oldsmobile cutlass salon brougham 303 | 34.2,4,105,70,2200,13.2,79,1,plymouth horizon 304 | 34.5,4,105,70,2150,14.9,79,1,plymouth horizon tc3 305 | 31.8,4,85,65,2020,19.2,79,3,datsun 210 306 | 37.3,4,91,69,2130,14.7,79,2,fiat strada custom 307 | 28.4,4,151,90,2670,16,79,1,buick skylark limited 308 | 28.8,6,173,115,2595,11.3,79,1,chevrolet citation 309 | 26.8,6,173,115,2700,12.9,79,1,oldsmobile omega brougham 310 | 33.5,4,151,90,2556,13.2,79,1,pontiac phoenix 311 | 41.5,4,98,76,2144,14.7,80,2,vw rabbit 312 | 38.1,4,89,60,1968,18.8,80,3,toyota corolla tercel 313 | 32.1,4,98,70,2120,15.5,80,1,chevrolet chevette 314 | 37.2,4,86,65,2019,16.4,80,3,datsun 310 315 | 28,4,151,90,2678,16.5,80,1,chevrolet citation 316 | 26.4,4,140,88,2870,18.1,80,1,ford fairmont 317 | 24.3,4,151,90,3003,20.1,80,1,amc concord 318 | 19.1,6,225,90,3381,18.7,80,1,dodge aspen 319 | 34.3,4,97,78,2188,15.8,80,2,audi 4000 320 | 29.8,4,134,90,2711,15.5,80,3,toyota corona liftback 321 | 31.3,4,120,75,2542,17.5,80,3,mazda 626 322 | 37,4,119,92,2434,15,80,3,datsun 510 hatchback 323 | 32.2,4,108,75,2265,15.2,80,3,toyota corolla 324 | 46.6,4,86,65,2110,17.9,80,3,mazda glc 325 | 27.9,4,156,105,2800,14.4,80,1,dodge colt 326 | 40.8,4,85,65,2110,19.2,80,3,datsun 210 327 | 44.3,4,90,48,2085,21.7,80,2,vw rabbit c (diesel) 328 | 43.4,4,90,48,2335,23.7,80,2,vw dasher (diesel) 329 | 36.4,5,121,67,2950,19.9,80,2,audi 5000s (diesel) 330 | 30,4,146,67,3250,21.8,80,2,mercedes-benz 240d 331 | 44.6,4,91,67,1850,13.8,80,3,honda civic 1500 gl 332 | 40.9,4,85,?,1835,17.3,80,2,renault lecar deluxe 333 | 33.8,4,97,67,2145,18,80,3,subaru dl 334 | 29.8,4,89,62,1845,15.3,80,2,vokswagen rabbit 335 | 32.7,6,168,132,2910,11.4,80,3,datsun 280-zx 336 | 23.7,3,70,100,2420,12.5,80,3,mazda rx-7 gs 337 | 35,4,122,88,2500,15.1,80,2,triumph tr7 coupe 338 | 23.6,4,140,?,2905,14.3,80,1,ford mustang cobra 339 | 32.4,4,107,72,2290,17,80,3,honda accord 340 | 27.2,4,135,84,2490,15.7,81,1,plymouth reliant 341 | 26.6,4,151,84,2635,16.4,81,1,buick skylark 342 | 25.8,4,156,92,2620,14.4,81,1,dodge aries wagon (sw) 343 | 23.5,6,173,110,2725,12.6,81,1,chevrolet citation 344 | 30,4,135,84,2385,12.9,81,1,plymouth reliant 345 | 39.1,4,79,58,1755,16.9,81,3,toyota starlet 346 | 39,4,86,64,1875,16.4,81,1,plymouth champ 347 | 35.1,4,81,60,1760,16.1,81,3,honda civic 1300 348 | 32.3,4,97,67,2065,17.8,81,3,subaru 349 | 37,4,85,65,1975,19.4,81,3,datsun 210 mpg 350 | 37.7,4,89,62,2050,17.3,81,3,toyota tercel 351 | 34.1,4,91,68,1985,16,81,3,mazda glc 4 352 | 34.7,4,105,63,2215,14.9,81,1,plymouth horizon 4 353 | 34.4,4,98,65,2045,16.2,81,1,ford escort 4w 354 | 29.9,4,98,65,2380,20.7,81,1,ford escort 2h 355 | 33,4,105,74,2190,14.2,81,2,volkswagen jetta 356 | 34.5,4,100,?,2320,15.8,81,2,renault 18i 357 | 33.7,4,107,75,2210,14.4,81,3,honda prelude 358 | 32.4,4,108,75,2350,16.8,81,3,toyota corolla 359 | 32.9,4,119,100,2615,14.8,81,3,datsun 200sx 
360 | 31.6,4,120,74,2635,18.3,81,3,mazda 626 361 | 28.1,4,141,80,3230,20.4,81,2,peugeot 505s turbo diesel 362 | 30.7,6,145,76,3160,19.6,81,2,volvo diesel 363 | 25.4,6,168,116,2900,12.6,81,3,toyota cressida 364 | 24.2,6,146,120,2930,13.8,81,3,datsun 810 maxima 365 | 22.4,6,231,110,3415,15.8,81,1,buick century 366 | 26.6,8,350,105,3725,19,81,1,oldsmobile cutlass ls 367 | 20.2,6,200,88,3060,17.1,81,1,ford granada gl 368 | 17.6,6,225,85,3465,16.6,81,1,chrysler lebaron salon 369 | 28,4,112,88,2605,19.6,82,1,chevrolet cavalier 370 | 27,4,112,88,2640,18.6,82,1,chevrolet cavalier wagon 371 | 34,4,112,88,2395,18,82,1,chevrolet cavalier 2-door 372 | 31,4,112,85,2575,16.2,82,1,pontiac j2000 se hatchback 373 | 29,4,135,84,2525,16,82,1,dodge aries se 374 | 27,4,151,90,2735,18,82,1,pontiac phoenix 375 | 24,4,140,92,2865,16.4,82,1,ford fairmont futura 376 | 36,4,105,74,1980,15.3,82,2,volkswagen rabbit l 377 | 37,4,91,68,2025,18.2,82,3,mazda glc custom l 378 | 31,4,91,68,1970,17.6,82,3,mazda glc custom 379 | 38,4,105,63,2125,14.7,82,1,plymouth horizon miser 380 | 36,4,98,70,2125,17.3,82,1,mercury lynx l 381 | 36,4,120,88,2160,14.5,82,3,nissan stanza xe 382 | 36,4,107,75,2205,14.5,82,3,honda accord 383 | 34,4,108,70,2245,16.9,82,3,toyota corolla 384 | 38,4,91,67,1965,15,82,3,honda civic 385 | 32,4,91,67,1965,15.7,82,3,honda civic (auto) 386 | 38,4,91,67,1995,16.2,82,3,datsun 310 gx 387 | 25,6,181,110,2945,16.4,82,1,buick century limited 388 | 38,6,262,85,3015,17,82,1,oldsmobile cutlass ciera (diesel) 389 | 26,4,156,92,2585,14.5,82,1,chrysler lebaron medallion 390 | 22,6,232,112,2835,14.7,82,1,ford granada l 391 | 32,4,144,96,2665,13.9,82,3,toyota celica gt 392 | 36,4,135,84,2370,13,82,1,dodge charger 2.2 393 | 27,4,151,90,2950,17.3,82,1,chevrolet camaro 394 | 27,4,140,86,2790,15.6,82,1,ford mustang gl 395 | 44,4,97,52,2130,24.6,82,2,vw pickup 396 | 32,4,135,84,2295,11.6,82,1,dodge rampage 397 | 28,4,120,79,2625,18.6,82,1,ford ranger 398 | 31,4,119,82,2720,19.4,82,1,chevy s-10 399 | -------------------------------------------------------------------------------- /10-LinearRegression2/Overfitted_Data.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/10-LinearRegression2/Overfitted_Data.png -------------------------------------------------------------------------------- /11-practical-data-visualization/10-exercise.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Exercise: Interactive Chart in Altair\n", 8 | "\n", 9 | "Create a scatterplot with dimensions of your choosing of the movies dataset that you can brush, and a stacked histogram that filters according to the brush. 
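A minimal sketch, assuming Altair 5 (`add_params`; Altair 4 used `add_selection`) and guessing column names such as IMDB_Rating, Rotten_Tomatoes_Rating, and Major_Genre from movies.csv; substitute dimensions of your choosing:

```python
import altair as alt
import pandas as pd

movies = pd.read_csv("movies.csv")

brush = alt.selection_interval()  # rectangular brush on the scatterplot

scatter = alt.Chart(movies).mark_point().encode(
    x="Rotten_Tomatoes_Rating:Q",  # assumed column name
    y="IMDB_Rating:Q",             # assumed column name
).add_params(brush)

# Stacked histogram (binned counts, colored by genre) that only
# counts the points currently inside the brush.
hist = alt.Chart(movies).mark_bar().encode(
    x=alt.X("IMDB_Rating:Q", bin=True),
    y="count()",
    color="Major_Genre:N",         # assumed column name
).transform_filter(brush)

scatter & hist
```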
\n", 10 | "\n", 11 | "This is what your stacked histogram should look like: \n", 12 | "\n", 13 | "\n", 14 | "![Stacked Histogram](stacked_hist.png)" 15 | ] 16 | }, 17 | { 18 | "cell_type": "code", 19 | "execution_count": null, 20 | "metadata": {}, 21 | "outputs": [], 22 | "source": [] 23 | } 24 | ], 25 | "metadata": { 26 | "kernelspec": { 27 | "display_name": "Python 3 (ipykernel)", 28 | "language": "python", 29 | "name": "python3" 30 | }, 31 | "language_info": { 32 | "codemirror_mode": { 33 | "name": "ipython", 34 | "version": 3 35 | }, 36 | "file_extension": ".py", 37 | "mimetype": "text/x-python", 38 | "name": "python", 39 | "nbconvert_exporter": "python", 40 | "pygments_lexer": "ipython3", 41 | "version": "3.9.7" 42 | } 43 | }, 44 | "nbformat": 4, 45 | "nbformat_minor": 4 46 | } 47 | -------------------------------------------------------------------------------- /11-practical-data-visualization/stacked_hist.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/11-practical-data-visualization/stacked_hist.png -------------------------------------------------------------------------------- /11-practical-data-visualization/standards.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/11-practical-data-visualization/standards.png -------------------------------------------------------------------------------- /12-vis-principles/11-vis-principles.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/12-vis-principles/11-vis-principles.pdf -------------------------------------------------------------------------------- /13-web-scraping/class.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/13-web-scraping/class.png -------------------------------------------------------------------------------- /13-web-scraping/inspector.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/13-web-scraping/inspector.png -------------------------------------------------------------------------------- /13-web-scraping/lyrics.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | Lyrics 6 | 7 | 8 |
9 | Published: 1969-10-22 10 | Led Zeppelin 11 | Ramble On 12 |
13 | Leaves are falling all around, It's time I was on my way. 14 | Thanks to you, I'm much obliged for such a pleasant stay. 15 | But now it's time for me to go. The autumn moon lights my way. 16 | For now I smell the rain, and with it pain, and it's headed my way. 17 |
18 |
19 |
20 | Published: 2016-05-03 21 | Radiohead 22 | Burn the Witch 23 |
24 | Stay in the shadows 25 | Cheer at the gallows 26 | This is a round up 27 | This is a low flying panic attack 28 | Sing a song on the jukebox that goes 29 | Burn the witch 30 | Burn the witch 31 | We know where you live 32 |
33 |
34 | 35 | 36 | -------------------------------------------------------------------------------- /13-web-scraping/requests.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/13-web-scraping/requests.png -------------------------------------------------------------------------------- /13-web-scraping/sampledevtools.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/13-web-scraping/sampledevtools.png -------------------------------------------------------------------------------- /14-apis/14-exercise-apis.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "## Exercise 1 – APIs\n", 8 | "\n", 9 | "Use this public [Quotes API](https://github.com/lukePeavey/quotable) to find all quotes in their database by Oscar Wilde.\n", 10 | "\n", 11 | "1. How many quotes does this API have?\n", 12 | "2. List each quote they have available on a separate line." 13 | ] 14 | }, 15 | { 16 | "cell_type": "code", 17 | "execution_count": null, 18 | "metadata": {}, 19 | "outputs": [], 20 | "source": [ 21 | "import requests " 22 | ] 23 | } 24 | ], 25 | "metadata": { 26 | "kernelspec": { 27 | "display_name": "Python 3 (ipykernel)", 28 | "language": "python", 29 | "name": "python3" 30 | }, 31 | "language_info": { 32 | "codemirror_mode": { 33 | "name": "ipython", 34 | "version": 3 35 | }, 36 | "file_extension": ".py", 37 | "mimetype": "text/x-python", 38 | "name": "python", 39 | "nbconvert_exporter": "python", 40 | "pygments_lexer": "ipython3", 41 | "version": "3.9.6" 42 | } 43 | }, 44 | "nbformat": 4, 45 | "nbformat_minor": 2 46 | } 47 | -------------------------------------------------------------------------------- /14-apis/credentials.py: -------------------------------------------------------------------------------- 1 | # Fill these in with your keys. You may not need all of them. 
2 | API_KEY = "" 3 | API_KEY_SECRET = "" 4 | BEARER_TOKEN = "" 5 | ACCESS_TOKEN = "" 6 | ACCESS_TOKEN_SECRET = "" 7 | CLIENT_ID = "" 8 | CLIENT_SECRET = "" 9 | -------------------------------------------------------------------------------- /14-apis/pokeapiscreenshot.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/14-apis/pokeapiscreenshot.png -------------------------------------------------------------------------------- /14-apis/pokemonendpoint.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/14-apis/pokemonendpoint.png -------------------------------------------------------------------------------- /14-apis/requests.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/14-apis/requests.png -------------------------------------------------------------------------------- /15-Classification1/BiasVarianceTradeoff.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/15-Classification1/BiasVarianceTradeoff.png -------------------------------------------------------------------------------- /15-Classification1/BinaryConfusinoMatrix.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/15-Classification1/BinaryConfusinoMatrix.png -------------------------------------------------------------------------------- /15-Classification1/ConfusionMatrix.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/15-Classification1/ConfusionMatrix.png -------------------------------------------------------------------------------- /15-Classification1/iris.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/15-Classification1/iris.png -------------------------------------------------------------------------------- /15-Classification1/oc-tree.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/15-Classification1/oc-tree.jpeg -------------------------------------------------------------------------------- /15-Classification1/p_sets.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/15-Classification1/p_sets.png -------------------------------------------------------------------------------- /15-Classification1/scikit-learn-logo.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/15-Classification1/scikit-learn-logo.png -------------------------------------------------------------------------------- /15-Classification1/temp.dot: -------------------------------------------------------------------------------- 1 | digraph Tree { 2 | node [shape=box, style="filled, rounded", color="black", fontname="helvetica"] ; 3 | edge [fontname="helvetica"] ; 4 | 0 [label=gini = 0.459
samples = 445
value = [286, 159]
class = Perished>, fillcolor="#f3c7a7"] ; 5 | 1 [label=gini = 0.254
samples = 288
value = [245, 43]
class = Perished>, fillcolor="#ea975c"] ; 6 | 0 -> 1 [labeldistance=2.5, labelangle=45, headlabel="True"] ; 7 | 2 [label=samples = 12
value = [4, 8]
class = Survived>, fillcolor="#9ccef2"] ; 8 | 1 -> 2 ; 9 | 3 [label=gini = 0.221
samples = 276
value = [241, 35]
class = Perished>, fillcolor="#e99356"] ; 10 | 1 -> 3 ; 11 | 4 [label=gini = 0.413
samples = 55
value = [39, 16]
class = Perished>, fillcolor="#f0b58a"] ; 12 | 3 -> 4 ; 13 | 5 [label=samples = 6
value = [6, 0]
class = Perished>, fillcolor="#e58139"] ; 14 | 4 -> 5 ; 15 | 6 [label=gini = 0.44
samples = 49
value = [33, 16]
class = Perished>, fillcolor="#f2be99"] ; 16 | 4 -> 6 ; 17 | 7 [label=samples = 2
value = [0, 2]
class = Survived>, fillcolor="#399de5"] ; 18 | 6 -> 7 ; 19 | 8 [label=gini = 0.418
samples = 47
value = [33, 14]
class = Perished>, fillcolor="#f0b68d"] ; 20 | 6 -> 8 ; 21 | 9 [label=gini = 0.368
samples = 37
value = [28, 9]
class = Perished>, fillcolor="#edaa79"] ; 22 | 8 -> 9 ; 23 | 10 [label=gini = 0.346
samples = 36
value = [28, 8]
class = Perished>, fillcolor="#eca572"] ; 24 | 9 -> 10 ; 25 | 11 [label=samples = 6
value = [3, 3]
class = Perished>, fillcolor="#ffffff"] ; 26 | 10 -> 11 ; 27 | 12 [label=gini = 0.278
samples = 30
value = [25, 5]
class = Perished>, fillcolor="#ea9a61"] ; 28 | 10 -> 12 ; 29 | 13 [label=gini = 0.363
samples = 21
value = [16, 5]
class = Perished>, fillcolor="#eda877"] ; 30 | 12 -> 13 ; 31 | 14 [label=gini = 0.266
samples = 19
value = [16, 3]
class = Perished>, fillcolor="#ea995e"] ; 32 | 13 -> 14 ; 33 | 15 [label=samples = 12
value = [9, 3]
class = Perished>, fillcolor="#eeab7b"] ; 34 | 14 -> 15 ; 35 | 16 [label=samples = 7
value = [7, 0]
class = Perished>, fillcolor="#e58139"] ; 36 | 14 -> 16 ; 37 | 17 [label=samples = 2
value = [0, 2]
class = Survived>, fillcolor="#399de5"] ; 38 | 13 -> 17 ; 39 | 18 [label=samples = 9
value = [9, 0]
class = Perished>, fillcolor="#e58139"] ; 40 | 12 -> 18 ; 41 | 19 [label=samples = 1
value = [0, 1]
class = Survived>, fillcolor="#399de5"] ; 42 | 9 -> 19 ; 43 | 20 [label=samples = 10
value = [5, 5]
class = Perished>, fillcolor="#ffffff"] ; 44 | 8 -> 20 ; 45 | 21 [label=gini = 0.157
samples = 221
value = [202, 19]
class = Perished>, fillcolor="#e78d4c"] ; 46 | 3 -> 21 ; 47 | 22 [label=gini = 0.174
samples = 198
value = [179, 19]
class = Perished>, fillcolor="#e88e4e"] ; 48 | 21 -> 22 ; 49 | 23 [label=gini = 0.135
samples = 151
value = [140, 11]
class = Perished>, fillcolor="#e78b49"] ; 50 | 22 -> 23 ; 51 | 24 [label=gini = 0.302
samples = 27
value = [22, 5]
class = Perished>, fillcolor="#eb9e66"] ; 52 | 23 -> 24 ; 53 | 25 [label=gini = 0.391
samples = 15
value = [11, 4]
class = Perished>, fillcolor="#eeaf81"] ; 54 | 24 -> 25 ; 55 | 26 [label=samples = 14
value = [11, 3]
class = Perished>, fillcolor="#eca36f"] ; 56 | 25 -> 26 ; 57 | 27 [label=samples = 1
value = [0, 1]
class = Survived>, fillcolor="#399de5"] ; 58 | 25 -> 27 ; 59 | 28 [label=samples = 12
value = [11, 1]
class = Perished>, fillcolor="#e78c4b"] ; 60 | 24 -> 28 ; 61 | 29 [label=gini = 0.092
samples = 124
value = [118, 6]
class = Perished>, fillcolor="#e68743"] ; 62 | 23 -> 29 ; 63 | 30 [label=samples = 10
value = [8, 2]
class = Perished>, fillcolor="#eca06a"] ; 64 | 29 -> 30 ; 65 | 31 [label=gini = 0.068
samples = 114
value = [110, 4]
class = Perished>, fillcolor="#e68640"] ; 66 | 29 -> 31 ; 67 | 32 [label=gini = 0.153
samples = 24
value = [22, 2]
class = Perished>, fillcolor="#e78c4b"] ; 68 | 31 -> 32 ; 69 | 33 [label=samples = 11
value = [9, 2]
class = Perished>, fillcolor="#eb9d65"] ; 70 | 32 -> 33 ; 71 | 34 [label=samples = 13
value = [13, 0]
class = Perished>, fillcolor="#e58139"] ; 72 | 32 -> 34 ; 73 | 35 [label=gini = 0.043
samples = 90
value = [88, 2]
class = Perished>, fillcolor="#e6843d"] ; 74 | 31 -> 35 ; 75 | 36 [label=samples = 50
value = [50, 0]
class = Perished>, fillcolor="#e58139"] ; 76 | 35 -> 36 ; 77 | 37 [label=gini = 0.095
samples = 40
value = [38, 2]
class = Perished>, fillcolor="#e68843"] ; 78 | 35 -> 37 ; 79 | 38 [label=samples = 1
value = [0, 1]
class = Survived>, fillcolor="#399de5"] ; 80 | 37 -> 38 ; 81 | 39 [label=gini = 0.05
samples = 39
value = [38, 1]
class = Perished>, fillcolor="#e6843e"] ; 82 | 37 -> 39 ; 83 | 40 [label=samples = 33
value = [33, 0]
class = Perished>, fillcolor="#e58139"] ; 84 | 39 -> 40 ; 85 | 41 [label=samples = 6
value = [5, 1]
class = Perished>, fillcolor="#ea9a61"] ; 86 | 39 -> 41 ; 87 | 42 [label=gini = 0.282
samples = 47
value = [39, 8]
class = Perished>, fillcolor="#ea9b62"] ; 88 | 22 -> 42 ; 89 | 43 [label=samples = 10
value = [6, 4]
class = Perished>, fillcolor="#f6d5bd"] ; 90 | 42 -> 43 ; 91 | 44 [label=gini = 0.193
samples = 37
value = [33, 4]
class = Perished>, fillcolor="#e89051"] ; 92 | 42 -> 44 ; 93 | 45 [label=gini = 0.157
samples = 35
value = [32, 3]
class = Perished>, fillcolor="#e78d4c"] ; 94 | 44 -> 45 ; 95 | 46 [label=gini = 0.227
samples = 23
value = [20, 3]
class = Perished>, fillcolor="#e99457"] ; 96 | 45 -> 46 ; 97 | 47 [label=gini = 0.105
samples = 18
value = [17, 1]
class = Perished>, fillcolor="#e78845"] ; 98 | 46 -> 47 ; 99 | 48 [label=samples = 13
value = [13, 0]
class = Perished>, fillcolor="#e58139"] ; 100 | 47 -> 48 ; 101 | 49 [label=samples = 5
value = [4, 1]
class = Perished>, fillcolor="#eca06a"] ; 102 | 47 -> 49 ; 103 | 50 [label=samples = 5
value = [3, 2]
class = Perished>, fillcolor="#f6d5bd"] ; 104 | 46 -> 50 ; 105 | 51 [label=samples = 12
value = [12, 0]
class = Perished>, fillcolor="#e58139"] ; 106 | 45 -> 51 ; 107 | 52 [label=samples = 2
value = [1, 1]
class = Perished>, fillcolor="#ffffff"] ; 108 | 44 -> 52 ; 109 | 53 [label=samples = 23
value = [23, 0]
class = Perished>, fillcolor="#e58139"] ; 110 | 21 -> 53 ; 111 | 54 [label=gini = 0.386
samples = 157
value = [41, 116]
class = Survived>, fillcolor="#7fc0ee"] ; 112 | 0 -> 54 [labeldistance=2.5, labelangle=-45, headlabel="False"] ; 113 | 55 [label=gini = 0.131
samples = 85
value = [6, 79]
class = Survived>, fillcolor="#48a4e7"] ; 114 | 54 -> 55 ; 115 | 56 [label=samples = 1
value = [1, 0]
class = Perished>, fillcolor="#e58139"] ; 116 | 55 -> 56 ; 117 | 57 [label=gini = 0.112
samples = 84
value = [5, 79]
class = Survived>, fillcolor="#46a3e7"] ; 118 | 55 -> 57 ; 119 | 58 [label=gini = 0.251
samples = 34
value = [5, 29]
class = Survived>, fillcolor="#5baee9"] ; 120 | 57 -> 58 ; 121 | 59 [label=gini = 0.213
samples = 33
value = [4, 29]
class = Survived>, fillcolor="#54abe9"] ; 122 | 58 -> 59 ; 123 | 60 [label=samples = 6
value = [0, 6]
class = Survived>, fillcolor="#399de5"] ; 124 | 59 -> 60 ; 125 | 61 [label=gini = 0.252
samples = 27
value = [4, 23]
class = Survived>, fillcolor="#5baeea"] ; 126 | 59 -> 61 ; 127 | 62 [label=samples = 4
value = [2, 2]
class = Perished>, fillcolor="#ffffff"] ; 128 | 61 -> 62 ; 129 | 63 [label=gini = 0.159
samples = 23
value = [2, 21]
class = Survived>, fillcolor="#4ca6e7"] ; 130 | 61 -> 63 ; 131 | 64 [label=samples = 13
value = [0, 13]
class = Survived>, fillcolor="#399de5"] ; 132 | 63 -> 64 ; 133 | 65 [label=samples = 10
value = [2, 8]
class = Survived>, fillcolor="#6ab6ec"] ; 134 | 63 -> 65 ; 135 | 66 [label=samples = 1
value = [1, 0]
class = Perished>, fillcolor="#e58139"] ; 136 | 58 -> 66 ; 137 | 67 [label=samples = 50
value = [0, 50]
class = Survived>, fillcolor="#399de5"] ; 138 | 57 -> 67 ; 139 | 68 [label=gini = 0.5
samples = 72
value = [35, 37]
class = Survived>, fillcolor="#f4fafe"] ; 140 | 54 -> 68 ; 141 | 69 [label=gini = 0.454
samples = 46
value = [30, 16]
class = Perished>, fillcolor="#f3c4a3"] ; 142 | 68 -> 69 ; 143 | 70 [label=gini = 0.499
samples = 29
value = [14, 15]
class = Survived>, fillcolor="#f2f8fd"] ; 144 | 69 -> 70 ; 145 | 71 [label=gini = 0.444
samples = 15
value = [10, 5]
class = Perished>, fillcolor="#f2c09c"] ; 146 | 70 -> 71 ; 147 | 72 [label=samples = 8
value = [3, 5]
class = Survived>, fillcolor="#b0d8f5"] ; 148 | 71 -> 72 ; 149 | 73 [label=samples = 7
value = [7, 0]
class = Perished>, fillcolor="#e58139"] ; 150 | 71 -> 73 ; 151 | 74 [label=samples = 14
value = [4, 10]
class = Survived>, fillcolor="#88c4ef"] ; 152 | 70 -> 74 ; 153 | 75 [label=gini = 0.111
samples = 17
value = [16, 1]
class = Perished>, fillcolor="#e78945"] ; 154 | 69 -> 75 ; 155 | 76 [label=samples = 15
value = [15, 0]
class = Perished>, fillcolor="#e58139"] ; 156 | 75 -> 76 ; 157 | 77 [label=samples = 2
value = [1, 1]
class = Perished>, fillcolor="#ffffff"] ; 158 | 75 -> 77 ; 159 | 78 [label=gini = 0.311
samples = 26
value = [5, 21]
class = Survived>, fillcolor="#68b4eb"] ; 160 | 68 -> 78 ; 161 | 79 [label=gini = 0.219
samples = 24
value = [3, 21]
class = Survived>, fillcolor="#55abe9"] ; 162 | 78 -> 79 ; 163 | 80 [label=samples = 9
value = [0, 9]
class = Survived>, fillcolor="#399de5"] ; 164 | 79 -> 80 ; 165 | 81 [label=gini = 0.32
samples = 15
value = [3, 12]
class = Survived>, fillcolor="#6ab6ec"] ; 166 | 79 -> 81 ; 167 | 82 [label=samples = 5
value = [2, 3]
class = Survived>, fillcolor="#bddef6"] ; 168 | 81 -> 82 ; 169 | 83 [label=samples = 10
value = [1, 9]
class = Survived>, fillcolor="#4fa8e8"] ; 170 | 81 -> 83 ; 171 | 84 [label=samples = 2
value = [2, 0]
class = Perished>, fillcolor="#e58139"] ; 172 | 78 -> 84 ; 173 | } -------------------------------------------------------------------------------- /15-Classification1/temp.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/15-Classification1/temp.png -------------------------------------------------------------------------------- /15-Classification1/titanic_tree.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/15-Classification1/titanic_tree.png -------------------------------------------------------------------------------- /16-Classification2/4fold_CV.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/16-Classification2/4fold_CV.png -------------------------------------------------------------------------------- /16-Classification2/SVM-Tutorial.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/16-Classification2/SVM-Tutorial.pdf -------------------------------------------------------------------------------- /16-Classification2/iris.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/16-Classification2/iris.png -------------------------------------------------------------------------------- /17-NLP-RegEx/lecture-21-exercise.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Introduction to Data Science – Text Munging Exercises\n", 8 | "*COMP 5360 / MATH 4100, University of Utah, http://datasciencecourse.net/* " 9 | ] 10 | }, 11 | { 12 | "cell_type": "markdown", 13 | "metadata": {}, 14 | "source": [ 15 | "## NLP\n", 16 | "\n", 17 | "### Exercise 1.1: Frequent Words\n", 18 | "Find the most frequently used words in Moby Dick which are not stopwords and not punctuation. Hint: [`str.isalpha()`](https://docs.python.org/3/library/stdtypes.html#str.isalpha) could be useful here." 19 | ] 20 | }, 21 | { 22 | "cell_type": "code", 23 | "execution_count": null, 24 | "metadata": {}, 25 | "outputs": [], 26 | "source": [ 27 | "import nltk\n", 28 | "from nltk.corpus import stopwords\n", 29 | "stopwords = nltk.corpus.stopwords.words('english')\n", 30 | "from nltk.book import *" 31 | ] 32 | }, 33 | { 34 | "cell_type": "code", 35 | "execution_count": null, 36 | "metadata": {}, 37 | "outputs": [], 38 | "source": [] 39 | }, 40 | { 41 | "cell_type": "markdown", 42 | "metadata": {}, 43 | "source": [ 44 | "## Exercise 2.1\n", 45 | "\n", 46 | "You're an evil Spammer who's observed that many people try to obfuscate their e-mail using this notation: \"`alex at utah dot edu`\". Below are three examples of such e-mails text. Try to extract \"alex at utah dot edu\", etc. Start with the first string. Then extend your regular expression to work on all of them at the same time. 
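One regex that happens to handle all three strings (a sketch, not the only answer) treats the address as words joined by ` dot `, with a single ` at ` separating name and domain; greedy matching plus backtracking lets the trailing ` dot edu` still anchor the end:

```python
import re

# a word, optional ' dot <word>' repeats, ' at ', domain words, ending in ' dot edu'
mail_regex = r"\w+(?: dot \w+)* at \w+(?: dot \w+)* dot edu"

# e.g. re.search(mail_regex, "You can reach me: alex dot lex at sci dot utah dot edu").group()
# -> 'alex dot lex at sci dot utah dot edu'
```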
Note that the second and third are slightly harder to do! " 47 | ] 48 | }, 49 | { 50 | "cell_type": "code", 51 | "execution_count": null, 52 | "metadata": {}, 53 | "outputs": [], 54 | "source": [ 55 | "import re\n", 56 | "html_smart = \"You can reach me: alex at utah dot edu\"\n", 57 | "html_smart2 = \"You can reach me: alex dot lex at utah dot edu\"\n", 58 | "html_smart3 = \"You can reach me: alex dot lex at sci dot utah dot edu\"" 59 | ] 60 | }, 61 | { 62 | "cell_type": "code", 63 | "execution_count": null, 64 | "metadata": {}, 65 | "outputs": [], 66 | "source": [ 67 | "def testRegex(regex):\n", 68 | " for html in (html_smart, html_smart2, html_smart3):\n", 69 | " print(re.search(regex, html).group())" 70 | ] 71 | }, 72 | { 73 | "cell_type": "code", 74 | "execution_count": null, 75 | "metadata": {}, 76 | "outputs": [], 77 | "source": [ 78 | "# TODO write your regex here\n", 79 | "mail_regex = \"\"" 80 | ] 81 | }, 82 | { 83 | "cell_type": "code", 84 | "execution_count": null, 85 | "metadata": {}, 86 | "outputs": [], 87 | "source": [ 88 | "testRegex(mail_regex)" 89 | ] 90 | }, 91 | { 92 | "cell_type": "markdown", 93 | "metadata": {}, 94 | "source": [ 95 | "## Exercise 2.2: Find Adverbs\n", 96 | "\n", 97 | "Write a regular expression that finds all adverbs in a sentence. Adverbs are characterized by ending in \"ly\"." 98 | ] 99 | }, 100 | { 101 | "cell_type": "code", 102 | "execution_count": null, 103 | "metadata": {}, 104 | "outputs": [], 105 | "source": [ 106 | "text = \"He was carefully disguised but captured quickly by police.\"" 107 | ] 108 | }, 109 | { 110 | "cell_type": "code", 111 | "execution_count": null, 112 | "metadata": {}, 113 | "outputs": [], 114 | "source": [] 115 | }, 116 | { 117 | "cell_type": "markdown", 118 | "metadata": {}, 119 | "source": [ 120 | "### Exercise 2.3: Phone Numbers\n", 121 | "\n", 122 | "Extract the phone numbers that follow a (xxx) xxx-xxxx pattern from the text:" 123 | ] 124 | }, 125 | { 126 | "cell_type": "code", 127 | "execution_count": null, 128 | "metadata": {}, 129 | "outputs": [], 130 | "source": [ 131 | "phone_numbers = \"\"\"(857) 131-2235, (801) 134-2215, but this one (12) 13044441 shouldnt match. 
\\\n", 132 | "Also, this is common in twelve (12) countries and one (1) state\"\"\"" 133 | ] 134 | }, 135 | { 136 | "cell_type": "code", 137 | "execution_count": null, 138 | "metadata": {}, 139 | "outputs": [], 140 | "source": [] 141 | }, 142 | { 143 | "cell_type": "markdown", 144 | "metadata": {}, 145 | "source": [ 146 | "### Exercise 2.4: HTML Content\n", 147 | "\n", 148 | "Extract the content between the `` and `` tags but not the other tags:" 149 | ] 150 | }, 151 | { 152 | "cell_type": "code", 153 | "execution_count": null, 154 | "metadata": {}, 155 | "outputs": [], 156 | "source": [ 157 | "html_tags = \"This is important and verytimely\"" 158 | ] 159 | }, 160 | { 161 | "cell_type": "code", 162 | "execution_count": null, 163 | "metadata": {}, 164 | "outputs": [], 165 | "source": [] 166 | } 167 | ], 168 | "metadata": { 169 | "kernelspec": { 170 | "display_name": "Python 3 (ipykernel)", 171 | "language": "python", 172 | "name": "python3" 173 | }, 174 | "language_info": { 175 | "codemirror_mode": { 176 | "name": "ipython", 177 | "version": 3 178 | }, 179 | "file_extension": ".py", 180 | "mimetype": "text/x-python", 181 | "name": "python", 182 | "nbconvert_exporter": "python", 183 | "pygments_lexer": "ipython3", 184 | "version": "3.9.6" 185 | } 186 | }, 187 | "nbformat": 4, 188 | "nbformat_minor": 2 189 | } 190 | -------------------------------------------------------------------------------- /17-NLP-RegEx/mod_squad.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/17-NLP-RegEx/mod_squad.png -------------------------------------------------------------------------------- /18-Clustering1/k-means-fig.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/18-Clustering1/k-means-fig.png -------------------------------------------------------------------------------- /18-Clustering1/lloyd.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/18-Clustering1/lloyd.png -------------------------------------------------------------------------------- /19-Clustering2/ComparisonOfClusteringMethods.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/19-Clustering2/ComparisonOfClusteringMethods.png -------------------------------------------------------------------------------- /19-Clustering2/DBScan.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/19-Clustering2/DBScan.png -------------------------------------------------------------------------------- /19-Clustering2/connectivity_plot1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/19-Clustering2/connectivity_plot1.png -------------------------------------------------------------------------------- /19-Clustering2/connectivity_plot2.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/19-Clustering2/connectivity_plot2.png -------------------------------------------------------------------------------- /19-Clustering2/dendrogram.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/19-Clustering2/dendrogram.png -------------------------------------------------------------------------------- /19-Clustering2/hc_1_homogeneous_complete.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/19-Clustering2/hc_1_homogeneous_complete.png -------------------------------------------------------------------------------- /19-Clustering2/hc_2_homogeneous_not_complete.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/19-Clustering2/hc_2_homogeneous_not_complete.png -------------------------------------------------------------------------------- /19-Clustering2/hc_3_complete_not_homogeneous.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/19-Clustering2/hc_3_complete_not_homogeneous.png -------------------------------------------------------------------------------- /19-Clustering2/hierarchical_clustering_1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/19-Clustering2/hierarchical_clustering_1.png -------------------------------------------------------------------------------- /19-Clustering2/hierarchical_clustering_2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/19-Clustering2/hierarchical_clustering_2.png -------------------------------------------------------------------------------- /19-Clustering2/lloyd.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/19-Clustering2/lloyd.png -------------------------------------------------------------------------------- /20-DimReduction/20-DimReduction-Activity.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "slideshow": { 7 | "slide_type": "slide" 8 | } 9 | }, 10 | "source": [ 11 | "# Introduction to Data Science \n", 12 | "# Lecture 20: Dimension Reduction - Activity\n", 13 | "*COMP 5360 / MATH 4100, University of Utah, http://datasciencecourse.net/*" 14 | ] 15 | }, 16 | { 17 | "cell_type": "code", 18 | "execution_count": 1, 19 | "metadata": { 20 | "slideshow": { 21 | "slide_type": "slide" 22 | } 23 | }, 24 | "outputs": [], 25 | "source": [ 26 | "# imports and 
setup \n", 27 | "\n", 28 | "import numpy as np\n", 29 | "\n", 30 | "import pandas as pd\n", 31 | "pd.set_option('display.notebook_repr_html', False)\n", 32 | "\n", 33 | "from sklearn.datasets import load_iris, load_digits\n", 34 | "from sklearn.preprocessing import scale\n", 35 | "from sklearn.decomposition import PCA \n", 36 | "from sklearn.cluster import KMeans, AgglomerativeClustering\n", 37 | "from sklearn import metrics\n", 38 | "from sklearn.metrics import homogeneity_score, v_measure_score\n", 39 | "\n", 40 | "import matplotlib.pyplot as plt\n", 41 | "from matplotlib.colors import ListedColormap\n", 42 | "from mpl_toolkits.mplot3d import Axes3D\n", 43 | "%matplotlib inline\n", 44 | "plt.rcParams['figure.figsize'] = (10, 6)\n", 45 | "plt.style.use('ggplot')\n", 46 | "\n", 47 | "import seaborn as sns" 48 | ] 49 | }, 50 | { 51 | "cell_type": "markdown", 52 | "metadata": {}, 53 | "source": [ 54 | "In this activity we will consider an RNA data set taken from [here](https://www.nature.com/articles/s41592-019-0425-8). This data set contains genetic information on 296 different cells, recording 3000 distinct gene counts/features for each cell. The cells were synthetically generated in various mixtures (7 different cell types) so that ground truth cell type information is in fact available. Note this data has already been imputed and scaled, so you don't need to rescale." 55 | ] 56 | }, 57 | { 58 | "cell_type": "code", 59 | "execution_count": 68, 60 | "metadata": {}, 61 | "outputs": [ 62 | { 63 | "name": "stdout", 64 | "output_type": "stream", 65 | "text": [ 66 | "\n", 67 | "RangeIndex: 296 entries, 0 to 295\n", 68 | "Columns: 3000 entries, ENSG00000019582 to ENSG00000116560\n", 69 | "dtypes: float64(3000)\n", 70 | "memory usage: 6.8 MB\n", 71 | "None\n", 72 | " ENSG00000019582 ENSG00000132432 ENSG00000234745 ENSG00000146648 \\\n", 73 | "0 4.317488 3.135494 4.382027 2.397895 \n", 74 | "1 3.258097 4.382027 3.135494 3.688879 \n", 75 | "2 3.465736 4.844187 3.496508 4.007333 \n", 76 | "3 3.332205 5.017280 3.465736 4.025352 \n", 77 | "4 3.332205 5.111988 3.526361 4.060443 \n", 78 | "\n", 79 | " ENSG00000108602 ERCC-00130 ERCC-00096 ERCC-00002 ERCC-00046 \\\n", 80 | "0 3.806662 3.713572 3.496508 3.610918 2.995732 \n", 81 | "1 2.564949 5.043425 4.836282 4.875197 4.077537 \n", 82 | "2 2.890372 4.442651 4.110874 4.262680 3.433987 \n", 83 | "3 2.890372 4.369448 4.094345 4.262680 3.465736 \n", 84 | "4 2.833213 4.406719 4.143135 4.234107 3.496508 \n", 85 | "\n", 86 | " ERCC-00074 ... ENSG00000197619 ENSG00000114450 ENSG00000147592 \\\n", 87 | "0 3.663562 ... 0.0 1.386294 1.386294 \n", 88 | "1 4.844187 ... 0.0 0.693147 1.386294 \n", 89 | "2 4.219508 ... 0.0 0.693147 1.609438 \n", 90 | "3 4.204693 ... 0.0 1.098612 1.945910 \n", 91 | "4 4.234107 ... 
0.0 1.098612 1.098612 \n", 92 | "\n", 93 | " ENSG00000184897 ENSG00000157765 ENSG00000116273 ENSG00000000003 \\\n", 94 | "0 2.302585 0.693147 0.000000 1.098612 \n", 95 | "1 2.079442 0.000000 0.000000 1.098612 \n", 96 | "2 2.197225 0.000000 0.000000 1.386294 \n", 97 | "3 2.302585 0.000000 0.000000 1.098612 \n", 98 | "4 2.484907 0.000000 0.693147 1.098612 \n", 99 | "\n", 100 | " ENSG00000250120 ENSG00000069122 ENSG00000116560 \n", 101 | "0 0.000000 0.693147 2.890372 \n", 102 | "1 0.000000 0.693147 2.772589 \n", 103 | "2 0.000000 0.000000 2.639057 \n", 104 | "3 0.000000 0.000000 2.772589 \n", 105 | "4 0.693147 0.693147 2.833213 \n", 106 | "\n", 107 | "[5 rows x 3000 columns]\n" 108 | ] 109 | }, 110 | { 111 | "data": { 112 | "text/plain": [ 113 | " CellType\n", 114 | "CellNumber \n", 115 | "1 1\n", 116 | "2 2\n", 117 | "3 2\n", 118 | "4 2\n", 119 | "5 2\n", 120 | "... ...\n", 121 | "292 7\n", 122 | "293 7\n", 123 | "294 7\n", 124 | "295 7\n", 125 | "296 7\n", 126 | "\n", 127 | "[296 rows x 1 columns]" 128 | ] 129 | }, 130 | "execution_count": 68, 131 | "metadata": {}, 132 | "output_type": "execute_result" 133 | } 134 | ], 135 | "source": [ 136 | "# Read in the data\n", 137 | "\n", 138 | "rna_data = pd.read_csv(\"rnamix1_SCT.csv\")\n", 139 | "rna_labels = pd.read_csv(\"rnamix1_labels.csv\",index_col=0)\n", 140 | "print(rna_data.info())\n", 141 | "print(rna_data.head())\n", 142 | "rna_labels" 143 | ] 144 | }, 145 | { 146 | "cell_type": "markdown", 147 | "metadata": {}, 148 | "source": [ 149 | "(1) Make a 2-dimensional PCA plot of the data, and color it by the cell types. " 150 | ] 151 | }, 152 | { 153 | "cell_type": "code", 154 | "execution_count": 69, 155 | "metadata": {}, 156 | "outputs": [], 157 | "source": [ 158 | "# Your code goes here" 159 | ] 160 | }, 161 | { 162 | "cell_type": "markdown", 163 | "metadata": {}, 164 | "source": [ 165 | "(2) What percentage of the variance is captured by the first 2 PC's? Make a plot showing the decay of the variance explained by the first 100 PC's. How many PC's would you need to capture 90% of the variance?" 166 | ] 167 | }, 168 | { 169 | "cell_type": "code", 170 | "execution_count": 70, 171 | "metadata": {}, 172 | "outputs": [], 173 | "source": [ 174 | "# Your code goes here" 175 | ] 176 | }, 177 | { 178 | "cell_type": "markdown", 179 | "metadata": {}, 180 | "source": [ 181 | "(3) Calculate the v_measure_score obtained by running kmeans with k = 7 on the 2-dimensional PCA plot. Can you achieve a higher score by using more PCs?" 
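A compact sketch of one way through (1)–(3), reusing the imports and data loaded above (the exact scores will vary a little with the k-means initialization):

```python
# (1) project onto the first PCs and color the 2-D scatter by cell type
pca = PCA(n_components=100)
pcs = pca.fit_transform(rna_data)
plt.scatter(pcs[:, 0], pcs[:, 1], c=rna_labels["CellType"], cmap="tab10", s=25)
plt.xlabel("PC1"); plt.ylabel("PC2"); plt.show()

# (2) variance explained by the first 2 PCs and its decay over the first 100
print("first two PCs explain:", pca.explained_variance_ratio_[:2].sum())
cumvar = np.cumsum(pca.explained_variance_ratio_)
plt.plot(cumvar); plt.xlabel("number of PCs"); plt.ylabel("cumulative variance explained"); plt.show()
if cumvar[-1] >= 0.9:  # only meaningful if 90% is reached within these PCs
    print("PCs needed for 90%:", np.argmax(cumvar >= 0.9) + 1)

# (3) k-means with k = 7 on the 2-D projection, scored against the true labels
km = KMeans(n_clusters=7, n_init=10).fit(pcs[:, :2])
print("v-measure:", v_measure_score(rna_labels["CellType"], km.labels_))
```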
182 | ] 183 | }, 184 | { 185 | "cell_type": "code", 186 | "execution_count": 71, 187 | "metadata": {}, 188 | "outputs": [], 189 | "source": [ 190 | "# Your code goes here" 191 | ] 192 | } 193 | ], 194 | "metadata": { 195 | "anaconda-cloud": {}, 196 | "kernelspec": { 197 | "display_name": "Python 3 (ipykernel)", 198 | "language": "python", 199 | "name": "python3" 200 | }, 201 | "language_info": { 202 | "codemirror_mode": { 203 | "name": "ipython", 204 | "version": 3 205 | }, 206 | "file_extension": ".py", 207 | "mimetype": "text/x-python", 208 | "name": "python", 209 | "nbconvert_exporter": "python", 210 | "pygments_lexer": "ipython3", 211 | "version": "3.11.5" 212 | } 213 | }, 214 | "nbformat": 4, 215 | "nbformat_minor": 4 216 | } 217 | -------------------------------------------------------------------------------- /20-DimReduction/heptathlon.csv: -------------------------------------------------------------------------------- 1 | name , hurdles, highjump, shot, run200m, longjump, javelin, run800m, score 2 | Joyner-Kersee (USA) , 12.69, 1.86, 15.80, 22.56, 7.27, 45.66, 128.51, 7291 3 | John (GDR) , 12.85, 1.80, 16.23, 23.65, 6.71, 42.56, 126.12, 6897 4 | Behmer (GDR) , 13.20, 1.83, 14.20, 23.10, 6.68, 44.54, 124.20, 6858 5 | Sablovskaite (URS) , 13.61, 1.80, 15.23, 23.92, 6.25, 42.78, 132.24, 6540 6 | Choubenkova (URS) , 13.51, 1.74, 14.76, 23.93, 6.32, 47.46, 127.90, 6540 7 | Schulz (GDR) , 13.75, 1.83, 13.50, 24.65, 6.33, 42.82, 125.79, 6411 8 | Fleming (AUS) , 13.38, 1.80, 12.88, 23.59, 6.37, 40.28, 132.54, 6351 9 | Greiner (USA) , 13.55, 1.80, 14.13, 24.48, 6.47, 38.00, 133.65, 6297 10 | Lajbnerova (CZE) , 13.63, 1.83, 14.28, 24.86, 6.11, 42.20, 136.05, 6252 11 | Bouraga (URS) , 13.25, 1.77, 12.62, 23.59, 6.28, 39.06, 134.74, 6252 12 | Wijnsma (HOL) , 13.75, 1.86, 13.01, 25.03, 6.34, 37.86, 131.49, 6205 13 | Dimitrova (BUL) , 13.24, 1.80, 12.88, 23.59, 6.37, 40.28, 132.54, 6171 14 | Scheider (SWI) , 13.85, 1.86, 11.58, 24.87, 6.05, 47.50, 134.93, 6137 15 | Braun (FRG) , 13.71, 1.83, 13.16, 24.78, 6.12, 44.58, 142.82, 6109 16 | Ruotsalainen (FIN) , 13.79, 1.80, 12.32, 24.61, 6.08, 45.44, 137.06, 6101 17 | Yuping (CHN) , 13.93, 1.86, 14.21, 25.00, 6.40, 38.60, 146.67, 6087 18 | Hagger (GB) , 13.47, 1.80, 12.75, 25.47, 6.34, 35.76, 138.48, 5975 19 | Brown (USA) , 14.07, 1.83, 12.69, 24.83, 6.13, 44.34, 146.43, 5972 20 | Mulliner (GB) , 14.39, 1.71, 12.68, 24.92, 6.10, 37.76, 138.02, 5746 21 | Hautenauve (BEL) , 14.04, 1.77, 11.81, 25.61, 5.99, 35.68, 133.90, 5734 22 | Kytola (FIN) , 14.31, 1.77, 11.66, 25.69, 5.75, 39.48, 133.35, 5686 23 | Geremias (BRA) , 14.23, 1.71, 12.95, 25.50, 5.50, 39.64, 144.02, 5508 24 | Hui-Ing (TAI) , 14.85, 1.68, 10.00, 25.23, 5.47, 39.14, 137.30, 5290 25 | Jeong-Mi (KOR) , 14.53, 1.71, 10.83, 26.61, 5.50, 39.26, 139.17, 5289 26 | Launa (PNG) , 16.42, 1.50, 11.78, 26.16, 4.88, 46.38, 163.43, 4566 -------------------------------------------------------------------------------- /20-DimReduction/rnamix1_labels.csv: -------------------------------------------------------------------------------- 1 | CellNumber,CellType 2 | 1,1 3 | 2,2 4 | 3,2 5 | 4,2 6 | 5,2 7 | 6,2 8 | 7,2 9 | 8,2 10 | 9,2 11 | 10,2 12 | 11,3 13 | 12,2 14 | 13,2 15 | 14,2 16 | 15,2 17 | 16,2 18 | 17,2 19 | 18,2 20 | 19,2 21 | 20,2 22 | 21,2 23 | 22,2 24 | 23,2 25 | 24,2 26 | 25,2 27 | 26,2 28 | 27,2 29 | 28,2 30 | 29,2 31 | 30,2 32 | 31,2 33 | 32,2 34 | 33,2 35 | 34,2 36 | 35,2 37 | 36,2 38 | 37,2 39 | 38,2 40 | 39,2 41 | 40,2 42 | 41,2 43 | 42,2 44 | 43,2 45 | 44,2 46 | 45,2 47 | 46,2 48 | 47,2 
49 | 48,2 50 | 49,2 51 | 50,2 52 | 51,2 53 | 52,2 54 | 53,2 55 | 54,2 56 | 55,2 57 | 56,2 58 | 57,2 59 | 58,2 60 | 59,2 61 | 60,2 62 | 61,2 63 | 62,2 64 | 63,2 65 | 64,2 66 | 65,2 67 | 66,2 68 | 67,2 69 | 68,2 70 | 69,2 71 | 70,2 72 | 71,2 73 | 72,2 74 | 73,2 75 | 74,2 76 | 75,2 77 | 76,2 78 | 77,4 79 | 78,4 80 | 79,4 81 | 80,4 82 | 81,4 83 | 82,4 84 | 83,4 85 | 84,4 86 | 85,4 87 | 86,4 88 | 87,4 89 | 88,4 90 | 89,4 91 | 90,4 92 | 91,4 93 | 92,4 94 | 93,4 95 | 94,4 96 | 95,4 97 | 96,4 98 | 97,4 99 | 98,4 100 | 99,4 101 | 100,4 102 | 101,4 103 | 102,4 104 | 103,4 105 | 104,4 106 | 105,4 107 | 106,4 108 | 107,4 109 | 108,4 110 | 109,4 111 | 110,4 112 | 111,4 113 | 112,4 114 | 113,4 115 | 114,5 116 | 115,5 117 | 116,5 118 | 117,5 119 | 118,5 120 | 119,5 121 | 120,5 122 | 121,5 123 | 122,5 124 | 123,5 125 | 124,5 126 | 125,5 127 | 126,5 128 | 127,5 129 | 128,5 130 | 129,5 131 | 130,5 132 | 131,5 133 | 132,5 134 | 133,5 135 | 134,6 136 | 135,6 137 | 136,6 138 | 137,6 139 | 138,6 140 | 139,6 141 | 140,6 142 | 141,6 143 | 142,6 144 | 143,6 145 | 144,6 146 | 145,6 147 | 146,6 148 | 147,6 149 | 148,6 150 | 149,6 151 | 150,6 152 | 151,6 153 | 152,6 154 | 153,6 155 | 154,6 156 | 155,6 157 | 156,6 158 | 157,6 159 | 158,6 160 | 159,6 161 | 160,6 162 | 161,6 163 | 162,6 164 | 163,6 165 | 164,6 166 | 165,6 167 | 166,6 168 | 167,6 169 | 168,6 170 | 169,6 171 | 170,6 172 | 171,6 173 | 172,6 174 | 173,6 175 | 174,6 176 | 175,1 177 | 176,1 178 | 177,1 179 | 178,1 180 | 179,1 181 | 180,1 182 | 181,1 183 | 182,1 184 | 183,1 185 | 184,1 186 | 185,1 187 | 186,1 188 | 187,1 189 | 188,1 190 | 189,1 191 | 190,1 192 | 191,1 193 | 192,1 194 | 193,1 195 | 194,1 196 | 195,1 197 | 196,1 198 | 197,1 199 | 198,1 200 | 199,1 201 | 200,1 202 | 201,1 203 | 202,1 204 | 203,1 205 | 204,1 206 | 205,1 207 | 206,1 208 | 207,1 209 | 208,1 210 | 209,1 211 | 210,1 212 | 211,1 213 | 212,1 214 | 213,1 215 | 214,3 216 | 215,3 217 | 216,3 218 | 217,3 219 | 218,3 220 | 219,3 221 | 220,3 222 | 221,3 223 | 222,3 224 | 223,3 225 | 224,3 226 | 225,3 227 | 226,3 228 | 227,3 229 | 228,3 230 | 229,3 231 | 230,3 232 | 231,3 233 | 232,3 234 | 233,3 235 | 234,3 236 | 235,3 237 | 236,3 238 | 237,3 239 | 238,3 240 | 239,3 241 | 240,3 242 | 241,3 243 | 242,3 244 | 243,3 245 | 244,3 246 | 245,3 247 | 246,3 248 | 247,3 249 | 248,3 250 | 249,3 251 | 250,3 252 | 251,3 253 | 252,3 254 | 253,3 255 | 254,3 256 | 255,3 257 | 256,3 258 | 257,3 259 | 258,7 260 | 259,7 261 | 260,7 262 | 261,7 263 | 262,7 264 | 263,7 265 | 264,7 266 | 265,7 267 | 266,7 268 | 267,7 269 | 268,7 270 | 269,7 271 | 270,7 272 | 271,7 273 | 272,7 274 | 273,7 275 | 274,7 276 | 275,7 277 | 276,7 278 | 277,7 279 | 278,7 280 | 279,7 281 | 280,7 282 | 281,7 283 | 282,7 284 | 283,7 285 | 284,7 286 | 285,7 287 | 286,7 288 | 287,7 289 | 288,7 290 | 289,7 291 | 290,7 292 | 291,7 293 | 292,7 294 | 293,7 295 | 294,7 296 | 295,7 297 | 296,7 -------------------------------------------------------------------------------- /21-NeuralNetwork1/ImageNetPlot.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/21-NeuralNetwork1/ImageNetPlot.png -------------------------------------------------------------------------------- /21-NeuralNetwork1/mnist-original.mat.zip: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/21-NeuralNetwork1/mnist-original.mat.zip -------------------------------------------------------------------------------- /21-NeuralNetwork1/neuralnetworks.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/21-NeuralNetwork1/neuralnetworks.png -------------------------------------------------------------------------------- /21-NeuralNetwork1/perceptron.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/21-NeuralNetwork1/perceptron.png -------------------------------------------------------------------------------- /22-NeuralNetworks2/22-NeuralNetworks2-activity.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "slideshow": { 7 | "slide_type": "slide" 8 | } 9 | }, 10 | "source": [ 11 | "# Introduction to Data Science \n", 12 | "# Inclass Exercises for Lecture 22: Neural Networks II\n", 13 | "*COMP 5360 / MATH 4100, University of Utah, http://datasciencecourse.net/*\n" 14 | ] 15 | }, 16 | { 17 | "cell_type": "markdown", 18 | "metadata": { 19 | "slideshow": { 20 | "slide_type": "-" 21 | } 22 | }, 23 | "source": [ 24 | "### Installing TensorFlow\n", 25 | "\n", 26 | "Instructions for installing TensorFlow are available at [the tensorflow install page](https://www.tensorflow.org/versions/r1.0/install/).\n", 27 | "\n", 28 | "It is recommended that you use the command: \n", 29 | "```\n", 30 | "pip install tensorflow\n", 31 | "```\n" 32 | ] 33 | }, 34 | { 35 | "cell_type": "code", 36 | "execution_count": 1, 37 | "metadata": {}, 38 | "outputs": [ 39 | { 40 | "name": "stdout", 41 | "output_type": "stream", 42 | "text": [ 43 | "2.4.1\n" 44 | ] 45 | } 46 | ], 47 | "source": [ 48 | "import tensorflow as tf\n", 49 | "print(tf.__version__)\n" 50 | ] 51 | }, 52 | { 53 | "cell_type": "markdown", 54 | "metadata": { 55 | "slideshow": { 56 | "slide_type": "-" 57 | } 58 | }, 59 | "source": [ 60 | "**Exercise 1:** Use TensorFlow to compute the derivative of $f(x) = e^x$ at $x=2$." 61 | ] 62 | }, 63 | { 64 | "cell_type": "code", 65 | "execution_count": 2, 66 | "metadata": { 67 | "slideshow": { 68 | "slide_type": "-" 69 | } 70 | }, 71 | "outputs": [], 72 | "source": [ 73 | "# your code here\n" 74 | ] 75 | }, 76 | { 77 | "cell_type": "markdown", 78 | "metadata": {}, 79 | "source": [ 80 | "**Exercise 2:** Use TensorFlow to find the minimum of the [Rosenbrock function](https://en.wikipedia.org/wiki/Rosenbrock_function): \n", 81 | "$$\n", 82 | "f(x,y) = (x-1)^2 + 100*(y-x^2)^2.\n", 83 | "$$\n" 84 | ] 85 | }, 86 | { 87 | "cell_type": "code", 88 | "execution_count": 4, 89 | "metadata": {}, 90 | "outputs": [], 91 | "source": [ 92 | "# your code here\n" 93 | ] 94 | }, 95 | { 96 | "cell_type": "markdown", 97 | "metadata": { 98 | "slideshow": { 99 | "slide_type": "slide" 100 | } 101 | }, 102 | "source": [ 103 | "## Using a pre-trained network\n", 104 | "\n", 105 | "There are many examples of pre-trained NN that can be accessed [here](https://www.tensorflow.org/api_docs/python/tf/keras/applications). 
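Looking back at Exercises 1 and 2 above, here is a minimal sketch using `tf.GradientTape` (assumes TF 2.x eager execution; plain gradient descent crawls along the Rosenbrock valley, hence the deliberately large step count):

```python
import tensorflow as tf

# Exercise 1: derivative of f(x) = e^x at x = 2
x = tf.Variable(2.0)
with tf.GradientTape() as tape:
    y = tf.exp(x)
print(tape.gradient(y, x).numpy())   # about 7.389, i.e. e^2

# Exercise 2: minimize f(x, y) = (x - 1)^2 + 100 (y - x^2)^2 by gradient descent
x = tf.Variable(0.0)
y = tf.Variable(0.0)
opt = tf.keras.optimizers.SGD(learning_rate=1e-3)
for _ in range(20000):
    with tf.GradientTape() as tape:
        f = (x - 1) ** 2 + 100 * (y - x ** 2) ** 2
    opt.apply_gradients(zip(tape.gradient(f, [x, y]), [x, y]))
print(x.numpy(), y.numpy())          # should creep toward the minimum at (1, 1)
```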
\n", 106 | "These NN are very large, having been trained on giant computers using massive datasets. \n", 107 | "\n", 108 | "It can be very useful to initialize a NN using one of these. This is called [transfer learning](https://en.wikipedia.org/wiki/Transfer_learning). \n", 109 | "\n", 110 | "\n", 111 | "We'll use a NN that was pretrained for image recognition. This NN was trained on the [ImageNet](http://www.image-net.org/) project, which contains > 14 million images belonging to > 20,000 classes (synsets). " 112 | ] 113 | }, 114 | { 115 | "cell_type": "code", 116 | "execution_count": 6, 117 | "metadata": {}, 118 | "outputs": [], 119 | "source": [ 120 | "import tensorflow as tf\n", 121 | "import numpy as np\n", 122 | "from tensorflow.keras.preprocessing import image\n", 123 | "from tensorflow.keras.applications import vgg16" 124 | ] 125 | }, 126 | { 127 | "cell_type": "markdown", 128 | "metadata": { 129 | "slideshow": { 130 | "slide_type": "-" 131 | } 132 | }, 133 | "source": [ 134 | "**Exercise 3:** Use tf.keras.applications.VGG16 (the NN pre-trained on ImageNet) to classify at least two images not done in lecture. These can be images from the lecture folder or your own images. Report on the top five predicted classes and their corresponding probabilities. " 135 | ] 136 | }, 137 | { 138 | "cell_type": "code", 139 | "execution_count": 7, 140 | "metadata": { 141 | "slideshow": { 142 | "slide_type": "-" 143 | } 144 | }, 145 | "outputs": [], 146 | "source": [ 147 | "# your code here\n" 148 | ] 149 | }, 150 | { 151 | "cell_type": "markdown", 152 | "metadata": {}, 153 | "source": [ 154 | "**Exercise 4 (optional):** There are several [other pre-trained networks in Keras](https://github.com/keras-team/keras-applications). Try these!" 155 | ] 156 | } 157 | ], 158 | "metadata": { 159 | "anaconda-cloud": {}, 160 | "celltoolbar": "Slideshow", 161 | "kernelspec": { 162 | "display_name": "Python 3 (ipykernel)", 163 | "language": "python", 164 | "name": "python3" 165 | }, 166 | "language_info": { 167 | "codemirror_mode": { 168 | "name": "ipython", 169 | "version": 3 170 | }, 171 | "file_extension": ".py", 172 | "mimetype": "text/x-python", 173 | "name": "python", 174 | "nbconvert_exporter": "python", 175 | "pygments_lexer": "ipython3", 176 | "version": "3.11.5" 177 | } 178 | }, 179 | "nbformat": 4, 180 | "nbformat_minor": 4 181 | } 182 | -------------------------------------------------------------------------------- /22-NeuralNetworks2/activationFct.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/22-NeuralNetworks2/activationFct.png -------------------------------------------------------------------------------- /22-NeuralNetworks2/beginner.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "id": "rX8mhOLljYeM" 7 | }, 8 | "source": [ 9 | "##### Copyright 2019 The TensorFlow Authors." 
10 | ] 11 | }, 12 | { 13 | "cell_type": "code", 14 | "execution_count": 1, 15 | "metadata": { 16 | "cellView": "form", 17 | "execution": { 18 | "iopub.execute_input": "2021-01-27T02:22:31.155179Z", 19 | "iopub.status.busy": "2021-01-27T02:22:31.154464Z", 20 | "iopub.status.idle": "2021-01-27T02:22:31.156965Z", 21 | "shell.execute_reply": "2021-01-27T02:22:31.156425Z" 22 | }, 23 | "id": "BZSlp3DAjdYf" 24 | }, 25 | "outputs": [], 26 | "source": [ 27 | "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n", 28 | "# you may not use this file except in compliance with the License.\n", 29 | "# You may obtain a copy of the License at\n", 30 | "#\n", 31 | "# https://www.apache.org/licenses/LICENSE-2.0\n", 32 | "#\n", 33 | "# Unless required by applicable law or agreed to in writing, software\n", 34 | "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", 35 | "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", 36 | "# See the License for the specific language governing permissions and\n", 37 | "# limitations under the License." 38 | ] 39 | }, 40 | { 41 | "cell_type": "markdown", 42 | "metadata": { 43 | "id": "3wF5wszaj97Y" 44 | }, 45 | "source": [ 46 | "# TensorFlow 2 quickstart for beginners" 47 | ] 48 | }, 49 | { 50 | "cell_type": "markdown", 51 | "metadata": { 52 | "id": "DUNzJc4jTj6G" 53 | }, 54 | "source": [ 55 | "\n", 56 | " \n", 59 | " \n", 62 | " \n", 65 | " \n", 68 | "
\n", 57 | " View on TensorFlow.org\n", 58 | " \n", 60 | " Run in Google Colab\n", 61 | " \n", 63 | " View source on GitHub\n", 64 | " \n", 66 | " Download notebook\n", 67 | "
" 69 | ] 70 | }, 71 | { 72 | "cell_type": "markdown", 73 | "metadata": { 74 | "id": "04QgGZc9bF5D" 75 | }, 76 | "source": [ 77 | "This short introduction uses [Keras](https://www.tensorflow.org/guide/keras/overview) to:\n", 78 | "\n", 79 | "1. Build a neural network that classifies images.\n", 80 | "2. Train this neural network.\n", 81 | "3. And, finally, evaluate the accuracy of the model." 82 | ] 83 | }, 84 | { 85 | "cell_type": "markdown", 86 | "metadata": { 87 | "id": "hiH7AC-NTniF" 88 | }, 89 | "source": [ 90 | "This is a [Google Colaboratory](https://colab.research.google.com/notebooks/welcome.ipynb) notebook file. Python programs are run directly in the browser—a great way to learn and use TensorFlow. To follow this tutorial, run the notebook in Google Colab by clicking the button at the top of this page.\n", 91 | "\n", 92 | "1. In Colab, connect to a Python runtime: At the top-right of the menu bar, select *CONNECT*.\n", 93 | "2. Run all the notebook code cells: Select *Runtime* > *Run all*." 94 | ] 95 | }, 96 | { 97 | "cell_type": "markdown", 98 | "metadata": { 99 | "id": "nnrWf3PCEzXL" 100 | }, 101 | "source": [ 102 | "Download and install TensorFlow 2. Import TensorFlow into your program:\n", 103 | "\n", 104 | "Note: Upgrade `pip` to install the TensorFlow 2 package. See the [install guide](https://www.tensorflow.org/install) for details." 105 | ] 106 | }, 107 | { 108 | "cell_type": "code", 109 | "execution_count": 1, 110 | "metadata": { 111 | "execution": { 112 | "iopub.execute_input": "2021-01-27T02:22:31.165375Z", 113 | "iopub.status.busy": "2021-01-27T02:22:31.164739Z", 114 | "iopub.status.idle": "2021-01-27T02:22:37.373933Z", 115 | "shell.execute_reply": "2021-01-27T02:22:37.373322Z" 116 | }, 117 | "id": "0trJmd6DjqBZ" 118 | }, 119 | "outputs": [], 120 | "source": [ 121 | "import tensorflow as tf" 122 | ] 123 | }, 124 | { 125 | "cell_type": "markdown", 126 | "metadata": { 127 | "id": "7NAbSZiaoJ4z" 128 | }, 129 | "source": [ 130 | "Load and prepare the [MNIST dataset](http://yann.lecun.com/exdb/mnist/). Convert the samples from integers to floating-point numbers:" 131 | ] 132 | }, 133 | { 134 | "cell_type": "code", 135 | "execution_count": 9, 136 | "metadata": { 137 | "execution": { 138 | "iopub.execute_input": "2021-01-27T02:22:37.379364Z", 139 | "iopub.status.busy": "2021-01-27T02:22:37.378426Z", 140 | "iopub.status.idle": "2021-01-27T02:22:37.838749Z", 141 | "shell.execute_reply": "2021-01-27T02:22:37.838096Z" 142 | }, 143 | "id": "7FP5258xjs-v" 144 | }, 145 | "outputs": [ 146 | { 147 | "name": "stdout", 148 | "output_type": "stream", 149 | "text": [ 150 | "\n", 151 | "(1, 28, 28)\n" 152 | ] 153 | } 154 | ], 155 | "source": [ 156 | "mnist = tf.keras.datasets.mnist\n", 157 | "\n", 158 | "(x_train, y_train), (x_test, y_test) = mnist.load_data()\n", 159 | "x_train, x_test = x_train / 255.0, x_test / 255.0\n", 160 | "print(type(x_train[:1]))\n", 161 | "temp = x_train[:1]\n", 162 | "print(temp.shape)" 163 | ] 164 | }, 165 | { 166 | "cell_type": "markdown", 167 | "metadata": { 168 | "id": "BPZ68wASog_I" 169 | }, 170 | "source": [ 171 | "Build the `tf.keras.Sequential` model by stacking layers. 
Choose an optimizer and loss function for training:" 172 | ] 173 | }, 174 | { 175 | "cell_type": "code", 176 | "execution_count": 10, 177 | "metadata": { 178 | "execution": { 179 | "iopub.execute_input": "2021-01-27T02:22:37.844781Z", 180 | "iopub.status.busy": "2021-01-27T02:22:37.843714Z", 181 | "iopub.status.idle": "2021-01-27T02:22:46.880604Z", 182 | "shell.execute_reply": "2021-01-27T02:22:46.881017Z" 183 | }, 184 | "id": "h3IKyzTCDNGo" 185 | }, 186 | "outputs": [], 187 | "source": [ 188 | "model = tf.keras.models.Sequential([\n", 189 | " tf.keras.layers.Flatten(input_shape=(28, 28)),\n", 190 | " tf.keras.layers.Dense(128, activation='relu'),\n", 191 | " tf.keras.layers.Dropout(0.2),\n", 192 | " tf.keras.layers.Dense(10)\n", 193 | "])" 194 | ] 195 | }, 196 | { 197 | "cell_type": "markdown", 198 | "metadata": { 199 | "id": "l2hiez2eIUz8" 200 | }, 201 | "source": [ 202 | "For each example the model returns a vector of \"[logits](https://developers.google.com/machine-learning/glossary#logits)\" or \"[log-odds](https://developers.google.com/machine-learning/glossary#log-odds)\" scores, one for each class." 203 | ] 204 | }, 205 | { 206 | "cell_type": "code", 207 | "execution_count": 11, 208 | "metadata": { 209 | "execution": { 210 | "iopub.execute_input": "2021-01-27T02:22:46.889938Z", 211 | "iopub.status.busy": "2021-01-27T02:22:46.888924Z", 212 | "iopub.status.idle": "2021-01-27T02:22:47.311748Z", 213 | "shell.execute_reply": "2021-01-27T02:22:47.312324Z" 214 | }, 215 | "id": "OeOrNdnkEEcR" 216 | }, 217 | "outputs": [ 218 | { 219 | "data": { 220 | "text/plain": [ 221 | "array([[ 0.56925064, 0.07378727, 0.02948097, 0.3486092 , -0.06868142,\n", 222 | " 0.41106564, 0.07833061, -0.67182094, 1.172125 , -0.11269745]],\n", 223 | " dtype=float32)" 224 | ] 225 | }, 226 | "execution_count": 11, 227 | "metadata": {}, 228 | "output_type": "execute_result" 229 | } 230 | ], 231 | "source": [ 232 | "predictions = model(x_train[:1]).numpy()\n", 233 | "predictions" 234 | ] 235 | }, 236 | { 237 | "cell_type": "markdown", 238 | "metadata": { 239 | "id": "tgjhDQGcIniO" 240 | }, 241 | "source": [ 242 | "The `tf.nn.softmax` function converts these logits to \"probabilities\" for each class: " 243 | ] 244 | }, 245 | { 246 | "cell_type": "code", 247 | "execution_count": 12, 248 | "metadata": { 249 | "execution": { 250 | "iopub.execute_input": "2021-01-27T02:22:47.317672Z", 251 | "iopub.status.busy": "2021-01-27T02:22:47.316571Z", 252 | "iopub.status.idle": "2021-01-27T02:22:47.320613Z", 253 | "shell.execute_reply": "2021-01-27T02:22:47.321064Z" 254 | }, 255 | "id": "zWSRnQ0WI5eq" 256 | }, 257 | "outputs": [ 258 | { 259 | "data": { 260 | "text/plain": [ 261 | "array([[0.13139944, 0.08006018, 0.07659043, 0.1053829 , 0.06942936,\n", 262 | " 0.11217463, 0.08042473, 0.0379842 , 0.24011457, 0.06643963]],\n", 263 | " dtype=float32)" 264 | ] 265 | }, 266 | "execution_count": 12, 267 | "metadata": {}, 268 | "output_type": "execute_result" 269 | } 270 | ], 271 | "source": [ 272 | "tf.nn.softmax(predictions).numpy()" 273 | ] 274 | }, 275 | { 276 | "cell_type": "markdown", 277 | "metadata": { 278 | "id": "he5u_okAYS4a" 279 | }, 280 | "source": [ 281 | "Note: It is possible to bake this `tf.nn.softmax` in as the activation function for the last layer of the network. While this can make the model output more directly interpretable, this approach is discouraged as it's impossible to\n", 282 | "provide an exact and numerically stable loss calculation for all models when using a softmax output. 
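As a small illustration of that warning (a sketch — the printed values are approximate, and it assumes Keras's usual epsilon-clipping of explicit probabilities):

```python
# the true class (index 2) gets a vanishingly small probability here
logits = tf.constant([[20.0, 0.0, -20.0]])
label = tf.constant([2])

loss_logits = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
loss_probs = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False)

print(loss_logits(label, logits).numpy())                # ~40.0, exact via log-softmax
print(loss_probs(label, tf.nn.softmax(logits)).numpy())  # smaller: the tiny probability
                                                         # was clipped before the log
```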
" 283 | ] 284 | }, 285 | { 286 | "cell_type": "markdown", 287 | "metadata": { 288 | "id": "hQyugpgRIyrA" 289 | }, 290 | "source": [ 291 | "The `losses.SparseCategoricalCrossentropy` loss takes a vector of logits and a `True` index and returns a scalar loss for each example." 292 | ] 293 | }, 294 | { 295 | "cell_type": "code", 296 | "execution_count": 13, 297 | "metadata": { 298 | "execution": { 299 | "iopub.execute_input": "2021-01-27T02:22:47.326443Z", 300 | "iopub.status.busy": "2021-01-27T02:22:47.325323Z", 301 | "iopub.status.idle": "2021-01-27T02:22:47.328155Z", 302 | "shell.execute_reply": "2021-01-27T02:22:47.327614Z" 303 | }, 304 | "id": "RSkzdv8MD0tT" 305 | }, 306 | "outputs": [], 307 | "source": [ 308 | "loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)" 309 | ] 310 | }, 311 | { 312 | "cell_type": "markdown", 313 | "metadata": { 314 | "id": "SfR4MsSDU880" 315 | }, 316 | "source": [ 317 | "This loss is equal to the negative log probability of the true class:\n", 318 | "It is zero if the model is sure of the correct class.\n", 319 | "\n", 320 | "This untrained model gives probabilities close to random (1/10 for each class), so the initial loss should be close to `-tf.math.log(1/10) ~= 2.3`." 321 | ] 322 | }, 323 | { 324 | "cell_type": "code", 325 | "execution_count": 14, 326 | "metadata": { 327 | "execution": { 328 | "iopub.execute_input": "2021-01-27T02:22:47.334018Z", 329 | "iopub.status.busy": "2021-01-27T02:22:47.332880Z", 330 | "iopub.status.idle": "2021-01-27T02:22:47.339363Z", 331 | "shell.execute_reply": "2021-01-27T02:22:47.339843Z" 332 | }, 333 | "id": "NJWqEVrrJ7ZB" 334 | }, 335 | "outputs": [ 336 | { 337 | "name": "stdout", 338 | "output_type": "stream", 339 | "text": [ 340 | "[5]\n", 341 | "[[ 0.56925064 0.07378727 0.02948097 0.3486092 -0.06868142 0.41106564\n", 342 | " 0.07833061 -0.67182094 1.172125 -0.11269745]]\n" 343 | ] 344 | }, 345 | { 346 | "data": { 347 | "text/plain": [ 348 | "2.1876984" 349 | ] 350 | }, 351 | "execution_count": 14, 352 | "metadata": {}, 353 | "output_type": "execute_result" 354 | } 355 | ], 356 | "source": [ 357 | "print(y_train[:1])\n", 358 | "print(predictions)\n", 359 | "loss_fn(y_train[:1], predictions).numpy()" 360 | ] 361 | }, 362 | { 363 | "cell_type": "code", 364 | "execution_count": 9, 365 | "metadata": { 366 | "execution": { 367 | "iopub.execute_input": "2021-01-27T02:22:47.349430Z", 368 | "iopub.status.busy": "2021-01-27T02:22:47.348502Z", 369 | "iopub.status.idle": "2021-01-27T02:22:47.364666Z", 370 | "shell.execute_reply": "2021-01-27T02:22:47.364157Z" 371 | }, 372 | "id": "9foNKHzTD2Vo" 373 | }, 374 | "outputs": [], 375 | "source": [ 376 | "model.compile(optimizer='adam',\n", 377 | " loss=loss_fn,\n", 378 | " metrics=['accuracy'])" 379 | ] 380 | }, 381 | { 382 | "cell_type": "markdown", 383 | "metadata": { 384 | "id": "ix4mEL65on-w" 385 | }, 386 | "source": [ 387 | "The `Model.fit` method adjusts the model parameters to minimize the loss: " 388 | ] 389 | }, 390 | { 391 | "cell_type": "code", 392 | "execution_count": 10, 393 | "metadata": { 394 | "execution": { 395 | "iopub.execute_input": "2021-01-27T02:22:47.369888Z", 396 | "iopub.status.busy": "2021-01-27T02:22:47.368635Z", 397 | "iopub.status.idle": "2021-01-27T02:23:02.490451Z", 398 | "shell.execute_reply": "2021-01-27T02:23:02.490856Z" 399 | }, 400 | "id": "y7suUbJXVLqP" 401 | }, 402 | "outputs": [ 403 | { 404 | "name": "stdout", 405 | "output_type": "stream", 406 | "text": [ 407 | "Epoch 1/5\n", 408 | "1875/1875 [==============================] - 
3s 2ms/step - loss: 0.4813 - accuracy: 0.8565\n", 409 | "Epoch 2/5\n", 410 | "1875/1875 [==============================] - 3s 2ms/step - loss: 0.1533 - accuracy: 0.9553\n", 411 | "Epoch 3/5\n", 412 | "1875/1875 [==============================] - 3s 2ms/step - loss: 0.1057 - accuracy: 0.9686\n", 413 | "Epoch 4/5\n", 414 | "1875/1875 [==============================] - 3s 2ms/step - loss: 0.0908 - accuracy: 0.9721\n", 415 | "Epoch 5/5\n", 416 | "1875/1875 [==============================] - 3s 2ms/step - loss: 0.0700 - accuracy: 0.9788\n" 417 | ] 418 | }, 419 | { 420 | "data": { 421 | "text/plain": [ 422 | "" 423 | ] 424 | }, 425 | "execution_count": 1, 426 | "metadata": {}, 427 | "output_type": "execute_result" 428 | } 429 | ], 430 | "source": [ 431 | "model.fit(x_train, y_train, epochs=5)" 432 | ] 433 | }, 434 | { 435 | "cell_type": "markdown", 436 | "metadata": { 437 | "id": "4mDAAPFqVVgn" 438 | }, 439 | "source": [ 440 | "The `Model.evaluate` method checks the model's performance, usually on a \"[Validation-set](https://developers.google.com/machine-learning/glossary#validation-set)\" or \"[Test-set](https://developers.google.com/machine-learning/glossary#test-set)\"." 441 | ] 442 | }, 443 | { 444 | "cell_type": "code", 445 | "execution_count": 11, 446 | "metadata": { 447 | "execution": { 448 | "iopub.execute_input": "2021-01-27T02:23:02.496255Z", 449 | "iopub.status.busy": "2021-01-27T02:23:02.495021Z", 450 | "iopub.status.idle": "2021-01-27T02:23:03.037377Z", 451 | "shell.execute_reply": "2021-01-27T02:23:03.036860Z" 452 | }, 453 | "id": "F7dTAzgHDUh7" 454 | }, 455 | "outputs": [ 456 | { 457 | "name": "stdout", 458 | "output_type": "stream", 459 | "text": [ 460 | "313/313 - 0s - loss: 0.0748 - accuracy: 0.9758\n" 461 | ] 462 | }, 463 | { 464 | "data": { 465 | "text/plain": [ 466 | "[0.07476752996444702, 0.9757999777793884]" 467 | ] 468 | }, 469 | "execution_count": 1, 470 | "metadata": {}, 471 | "output_type": "execute_result" 472 | } 473 | ], 474 | "source": [ 475 | "model.evaluate(x_test, y_test, verbose=2)" 476 | ] 477 | }, 478 | { 479 | "cell_type": "markdown", 480 | "metadata": { 481 | "id": "T4JfEh7kvx6m" 482 | }, 483 | "source": [ 484 | "The image classifier is now trained to ~98% accuracy on this dataset. To learn more, read the [TensorFlow tutorials](https://www.tensorflow.org/tutorials/)."
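As a quick sanity check (an added sketch, not part of the original tutorial; it assumes `model`, `x_test`, and `y_test` from the cells above are in scope):

```python
import numpy as np

# The class with the largest logit is the model's prediction.
logits = model(x_test[:1]).numpy()
print("predicted:", np.argmax(logits, axis=1)[0], "actual:", y_test[0])
```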
485 | ] 486 | }, 487 | { 488 | "cell_type": "markdown", 489 | "metadata": { 490 | "id": "Aj8NrlzlJqDG" 491 | }, 492 | "source": [ 493 | "If you want your model to return a probability, you can wrap the trained model, and attach the softmax to it:" 494 | ] 495 | }, 496 | { 497 | "cell_type": "code", 498 | "execution_count": 12, 499 | "metadata": { 500 | "execution": { 501 | "iopub.execute_input": "2021-01-27T02:23:03.044680Z", 502 | "iopub.status.busy": "2021-01-27T02:23:03.044017Z", 503 | "iopub.status.idle": "2021-01-27T02:23:03.062428Z", 504 | "shell.execute_reply": "2021-01-27T02:23:03.061917Z" 505 | }, 506 | "id": "rYb6DrEH0GMv" 507 | }, 508 | "outputs": [], 509 | "source": [ 510 | "probability_model = tf.keras.Sequential([\n", 511 | " model,\n", 512 | " tf.keras.layers.Softmax()\n", 513 | "])" 514 | ] 515 | }, 516 | { 517 | "cell_type": "code", 518 | "execution_count": 13, 519 | "metadata": { 520 | "execution": { 521 | "iopub.execute_input": "2021-01-27T02:23:03.067513Z", 522 | "iopub.status.busy": "2021-01-27T02:23:03.066434Z", 523 | "iopub.status.idle": "2021-01-27T02:23:03.072553Z", 524 | "shell.execute_reply": "2021-01-27T02:23:03.073021Z" 525 | }, 526 | "id": "cnqOZtUp1YR_" 527 | }, 528 | "outputs": [ 529 | { 530 | "data": { 531 | "text/plain": [ 532 | "" 548 | ] 549 | }, 550 | "execution_count": 1, 551 | "metadata": {}, 552 | "output_type": "execute_result" 553 | } 554 | ], 555 | "source": [ 556 | "probability_model(x_test[:5])" 557 | ] 558 | } 559 | ], 560 | "metadata": { 561 | "colab": { 562 | "collapsed_sections": [ 563 | "rX8mhOLljYeM" 564 | ], 565 | "name": "beginner.ipynb", 566 | "toc_visible": true 567 | }, 568 | "kernelspec": { 569 | "display_name": "Python 3", 570 | "language": "python", 571 | "name": "python3" 572 | }, 573 | "language_info": { 574 | "codemirror_mode": { 575 | "name": "ipython", 576 | "version": 3 577 | }, 578 | "file_extension": ".py", 579 | "mimetype": "text/x-python", 580 | "name": "python", 581 | "nbconvert_exporter": "python", 582 | "pygments_lexer": "ipython3", 583 | "version": "3.8.5" 584 | } 585 | }, 586 | "nbformat": 4, 587 | "nbformat_minor": 1 588 | } 589 | -------------------------------------------------------------------------------- /22-NeuralNetworks2/graph.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/22-NeuralNetworks2/graph.png -------------------------------------------------------------------------------- /22-NeuralNetworks2/images/brodie.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/22-NeuralNetworks2/images/brodie.jpeg -------------------------------------------------------------------------------- /22-NeuralNetworks2/images/layla1.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/22-NeuralNetworks2/images/layla1.jpeg -------------------------------------------------------------------------------- /22-NeuralNetworks2/images/scout1.jpeg: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/22-NeuralNetworks2/images/scout1.jpeg -------------------------------------------------------------------------------- /22-NeuralNetworks2/images/scout2.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/22-NeuralNetworks2/images/scout2.jpeg -------------------------------------------------------------------------------- /22-NeuralNetworks2/images/scout3.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/22-NeuralNetworks2/images/scout3.jpeg -------------------------------------------------------------------------------- /22-NeuralNetworks2/images/scout4.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/22-NeuralNetworks2/images/scout4.jpeg -------------------------------------------------------------------------------- /22-NeuralNetworks2/images/scout5.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/22-NeuralNetworks2/images/scout5.jpeg -------------------------------------------------------------------------------- /22-NeuralNetworks2/nature14539.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/22-NeuralNetworks2/nature14539.pdf -------------------------------------------------------------------------------- /23-databases/23-databases-exercises.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Introduction to Data Science – Relational Databases Exercise\n", 8 | "*COMP 5360 / MATH 4100, University of Utah, http://datasciencecourse.net/* \n" 9 | ] 10 | }, 11 | { 12 | "cell_type": "code", 13 | "execution_count": null, 14 | "metadata": {}, 15 | "outputs": [], 16 | "source": [ 17 | "import pandas as pd\n", 18 | "import sqlite3 as sq\n", 19 | "\n", 20 | "# we connect to the database, which - in the case of sqlite - is a local file\n", 21 | "conn = sq.connect(\"./chinook.db\")" 22 | ] 23 | }, 24 | { 25 | "cell_type": "markdown", 26 | "metadata": {}, 27 | "source": [ 28 | "### Exercise 1: Simple Queries\n", 29 | "\n", 30 | "1. List all the rows in the genres table.\n", 31 | "2. List only the genre names (nothing else) in the table.\n", 32 | "3. List the genre names ordered by name.\n", 33 | "4. List the genre entries with IDs between 11 and 16.\n", 34 | "5. List the genre entries that start with an R.\n", 35 | "6. List the GenreIds of Latin, Easy Listening, and Opera (in one query)." 36 | ] 37 | }, 38 | { 39 | "cell_type": "markdown", 40 | "metadata": {}, 41 | "source": [ 42 | "1. List all the rows in the genres table." 
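A hedged sketch of the basic query pattern used throughout this exercise (it assumes the `conn` object from the setup cell; the concrete SQL string is illustrative and happens to cover this first task):

```python
# Send a SQL statement to the open SQLite connection; pandas returns a DataFrame.
genres = pd.read_sql_query("SELECT * FROM genres", conn)
genres
```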
43 | ] 44 | }, 45 | { 46 | "cell_type": "code", 47 | "execution_count": null, 48 | "metadata": {}, 49 | "outputs": [], 50 | "source": [] 51 | }, 52 | { 53 | "cell_type": "markdown", 54 | "metadata": {}, 55 | "source": [ 56 | "2. List only the genre names (nothing else) in the table." 57 | ] 58 | }, 59 | { 60 | "cell_type": "code", 61 | "execution_count": null, 62 | "metadata": {}, 63 | "outputs": [], 64 | "source": [] 65 | }, 66 | { 67 | "cell_type": "markdown", 68 | "metadata": {}, 69 | "source": [ 70 | "3. List the genre names ordered by name." 71 | ] 72 | }, 73 | { 74 | "cell_type": "code", 75 | "execution_count": null, 76 | "metadata": {}, 77 | "outputs": [], 78 | "source": [] 79 | }, 80 | { 81 | "cell_type": "markdown", 82 | "metadata": {}, 83 | "source": [ 84 | "4. List the genre entries with IDs between 11 and 16." 85 | ] 86 | }, 87 | { 88 | "cell_type": "code", 89 | "execution_count": null, 90 | "metadata": {}, 91 | "outputs": [], 92 | "source": [] 93 | }, 94 | { 95 | "cell_type": "markdown", 96 | "metadata": {}, 97 | "source": [ 98 | "5. List the genre entries that start with an R." 99 | ] 100 | }, 101 | { 102 | "cell_type": "code", 103 | "execution_count": null, 104 | "metadata": {}, 105 | "outputs": [], 106 | "source": [] 107 | }, 108 | { 109 | "cell_type": "markdown", 110 | "metadata": {}, 111 | "source": [ 112 | "6. List the GenreIds of Latin, Easy Listening, and Opera (in one query)." 113 | ] 114 | }, 115 | { 116 | "cell_type": "code", 117 | "execution_count": null, 118 | "metadata": {}, 119 | "outputs": [], 120 | "source": [] 121 | }, 122 | { 123 | "cell_type": "markdown", 124 | "metadata": {}, 125 | "source": [ 126 | "## Exercise 2: Joining\n", 127 | "\n", 128 | "1. Create a table that contains track names, genre name and genre ID for each track. Hint: the table is sorted by genres, look at the tail of the dataframe to make sure it works correctly.\n", 129 | "2. Create a table that contains the counts of tracks in a genre by using the GenreID.\n", 130 | "3. Create a table that contains the genre name and the count of tracks in that genre.\n", 131 | "4. Sort the previous table by the count. Which are the biggest genres? Hint: the DESC keyword can be added at the end of the sorting expression.\n" 132 | ] 133 | }, 134 | { 135 | "cell_type": "code", 136 | "execution_count": null, 137 | "metadata": {}, 138 | "outputs": [], 139 | "source": [] 140 | }, 141 | { 142 | "cell_type": "markdown", 143 | "metadata": {}, 144 | "source": [ 145 | "2. Create a table that contains the counts of tracks in a genre by using the GenreID." 146 | ] 147 | }, 148 | { 149 | "cell_type": "code", 150 | "execution_count": null, 151 | "metadata": {}, 152 | "outputs": [], 153 | "source": [] 154 | }, 155 | { 156 | "cell_type": "markdown", 157 | "metadata": {}, 158 | "source": [ 159 | "3. Create a table that contains the genre name and the count of tracks in that genre." 160 | ] 161 | }, 162 | { 163 | "cell_type": "code", 164 | "execution_count": null, 165 | "metadata": {}, 166 | "outputs": [], 167 | "source": [] 168 | }, 169 | { 170 | "cell_type": "markdown", 171 | "metadata": { 172 | "collapsed": true 173 | }, 174 | "source": [ 175 | "4. Sort the previous table by the count. Which are the biggest genres?"
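To illustrate the DESC hint, here is the generic shape such a query can take (an added sketch; `SomeTable` and `SomeKey` are placeholder names, not the exercise answer):

```python
# GROUP BY aggregates rows per key; ORDER BY ... DESC sorts largest-first.
pd.read_sql_query(
    "SELECT SomeKey, COUNT(*) AS n FROM SomeTable GROUP BY SomeKey ORDER BY n DESC",
    conn)
```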
176 | ] 177 | }, 178 | { 179 | "cell_type": "code", 180 | "execution_count": null, 181 | "metadata": {}, 182 | "outputs": [], 183 | "source": [] 184 | } 185 | ], 186 | "metadata": { 187 | "anaconda-cloud": {}, 188 | "kernelspec": { 189 | "display_name": "Python 3 (ipykernel)", 190 | "language": "python", 191 | "name": "python3" 192 | }, 193 | "language_info": { 194 | "codemirror_mode": { 195 | "name": "ipython", 196 | "version": 3 197 | }, 198 | "file_extension": ".py", 199 | "mimetype": "text/x-python", 200 | "name": "python", 201 | "nbconvert_exporter": "python", 202 | "pygments_lexer": "ipython3", 203 | "version": "3.9.6" 204 | } 205 | }, 206 | "nbformat": 4, 207 | "nbformat_minor": 1 208 | } 209 | -------------------------------------------------------------------------------- /23-databases/albums_tracks_tables.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/23-databases/albums_tracks_tables.jpg -------------------------------------------------------------------------------- /23-databases/backup_chinook.db: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/23-databases/backup_chinook.db -------------------------------------------------------------------------------- /23-databases/chinook.db: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/23-databases/chinook.db -------------------------------------------------------------------------------- /23-databases/database_schema.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/23-databases/database_schema.png -------------------------------------------------------------------------------- /23-databases/exploits_of_a_mom.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/23-databases/exploits_of_a_mom.png -------------------------------------------------------------------------------- /24-networks/24-network-exercise-activity.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "id": "170612c4", 6 | "metadata": {}, 7 | "source": [ 8 | "## Exercise\n", 9 | "\n", 10 | "Explore the [Karate Club](https://networkx.readthedocs.io/en/stable/reference/generated/networkx.generators.social.karate_club_graph.html#networkx.generators.social.karate_club_graph) network:\n", 11 | "\n", 12 | " * How many nodes, how many edges are in the network? \n", 13 | " * Are there nodes of high betweenness centrality? Visualize the network.\n", 14 | " * Remove the node with the highest centrality. How many components do you have?" 
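One possible starting point (an added sketch that deliberately stops short of a full solution; it assumes the `karate_club` graph created in the setup cell below):

```python
# Size of the network:
print(karate_club.number_of_nodes(), karate_club.number_of_edges())

# Betweenness centrality per node; inspect the largest values yourself.
bc = nx.betweenness_centrality(karate_club)

# For the last question, nx.number_connected_components may come in handy
# after copying the graph and removing the top node.
```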
15 | ] 16 | }, 17 | { 18 | "cell_type": "code", 19 | "execution_count": 2, 20 | "id": "b4ab1f50", 21 | "metadata": {}, 22 | "outputs": [], 23 | "source": [ 24 | "import networkx as nx\n", 25 | "import matplotlib.pyplot as plt\n", 26 | "%matplotlib inline\n", 27 | "plt.rcParams['figure.figsize'] = (10, 6)\n", 28 | "plt.style.use('ggplot')\n", 29 | "karate_club = nx.karate_club_graph()" 30 | ] 31 | }, 32 | { 33 | "cell_type": "code", 34 | "execution_count": null, 35 | "id": "f6b1c3ef", 36 | "metadata": {}, 37 | "outputs": [], 38 | "source": [] 39 | } 40 | ], 41 | "metadata": { 42 | "kernelspec": { 43 | "display_name": "Python 3 (ipykernel)", 44 | "language": "python", 45 | "name": "python3" 46 | }, 47 | "language_info": { 48 | "codemirror_mode": { 49 | "name": "ipython", 50 | "version": 3 51 | }, 52 | "file_extension": ".py", 53 | "mimetype": "text/x-python", 54 | "name": "python", 55 | "nbconvert_exporter": "python", 56 | "pygments_lexer": "ipython3", 57 | "version": "3.9.6" 58 | } 59 | }, 60 | "nbformat": 4, 61 | "nbformat_minor": 5 62 | } 63 | -------------------------------------------------------------------------------- /24-networks/24-networks-slides.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/24-networks/24-networks-slides.pdf -------------------------------------------------------------------------------- /24-networks/24-path-search.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Introduction to Data Science – Networks (Path Search)\n", 8 | "*COMP 5360 / MATH 4100, University of Utah, http://datasciencecourse.net/* \n", 9 | "\n", 10 | "This is a continuation of how to work with graphs in Python using the [NetworkX](networkx.github.io) library. Here we focus on understanding Path Search Algorithms." 11 | ] 12 | }, 13 | { 14 | "cell_type": "code", 15 | "execution_count": null, 16 | "metadata": {}, 17 | "outputs": [], 18 | "source": [ 19 | "import networkx as nx\n", 20 | "import matplotlib.pyplot as plt\n", 21 | "%matplotlib inline\n", 22 | "plt.rcParams['figure.figsize'] = (10, 6)\n", 23 | "plt.style.use('ggplot')" 24 | ] 25 | }, 26 | { 27 | "cell_type": "markdown", 28 | "metadata": {}, 29 | "source": [ 30 | "We'll also import the Les Miserables network again" 31 | ] 32 | }, 33 | { 34 | "cell_type": "code", 35 | "execution_count": null, 36 | "metadata": {}, 37 | "outputs": [], 38 | "source": [ 39 | "# Read the graph file\n", 40 | "lesmis = nx.read_gml('lesmis.gml')\n", 41 | "# List the nodes\n", 42 | "lesmis.nodes()" 43 | ] 44 | }, 45 | { 46 | "cell_type": "markdown", 47 | "metadata": {}, 48 | "source": [ 49 | "## Path Search\n", 50 | "\n", 51 | "Path search, and in particular shortest path search, is an important problem. It answers questions such as \n", 52 | " * how do I get as quickly as possible from A to B in a road network\n", 53 | " * how to best route a data packet that delivers the next second of your Netflix movie\n", 54 | " * who can I talk to to get an introduction to Person B\n", 55 | " * etc.\n", 56 | " \n", 57 | "There are two major types of path search algorithms: \n", 58 | "\n", 59 | "1. Algorithms that operate only on the topology, i.e., only the \"distance\" is relevant\n", 60 | "2. 
Algorithms that also consider edge weights, i.e., they minimize a \"cost\"\n", 61 | "\n", 62 | "For the above scenarios, edge weights make a lot of sense: I might give a different weight to an edge that is an Interstate, for example, as I will be able to travel faster. " 63 | ] 64 | }, 65 | { 66 | "cell_type": "markdown", 67 | "metadata": {}, 68 | "source": [ 69 | "![](bread.png)" 70 | ] 71 | }, 72 | { 73 | "cell_type": "markdown", 74 | "metadata": {}, 75 | "source": [ 76 | "### Breadth First Search\n", 77 | "\n", 78 | "Breadth first search is a simple algorithm that solves the single-source shortest path problem, i.e., it calculates the shortest path from one source to all other nodes in the network. \n", 79 | "\n", 80 | "The algorithm works as follows:\n", 81 | "\n", 82 | "1. Label source node 0\n", 83 | "2. Find neighbors, label 1, put in queue\n", 84 | "3. Take node labeled n (1 for first step) out of queue. Find its unlabeled neighbors. Label them n+1 and put in queue\n", 85 | "4. Repeat 3 until the target node is found (if only that path is relevant) or no unlabeled nodes are left (when looking for all shortest paths)\n", 86 | "5. The distance between start and end node is the label of the end node.\n", 87 | "\n", 88 | "Let's look at the path from Boulatruelle to Marius:" 89 | ] 90 | }, 91 | { 92 | "cell_type": "code", 93 | "execution_count": null, 94 | "metadata": {}, 95 | "outputs": [], 96 | "source": [ 97 | "path = nx.shortest_path(lesmis,source=\"Boulatruelle\",target=\"Marius\")\n", 98 | "path" 99 | ] 100 | }, 101 | { 102 | "cell_type": "markdown", 103 | "metadata": {}, 104 | "source": [ 105 | "And the path from Perpetue to Napoleon:" 106 | ] 107 | }, 108 | { 109 | "cell_type": "code", 110 | "execution_count": null, 111 | "metadata": {}, 112 | "outputs": [], 113 | "source": [ 114 | "path = nx.shortest_path(lesmis,source=\"Perpetue\",target=\"Napoleon\")\n", 115 | "path" 116 | ] 117 | }, 118 | { 119 | "cell_type": "markdown", 120 | "metadata": {}, 121 | "source": [ 122 | "### Dijkstra's Algorithm\n", 123 | "\n", 124 | "[Dijkstra's algorithm](https://en.wikipedia.org/wiki/Dijkstra%27s_algorithm) is the go-to algorithm for finding paths in a weighted graph.\n", 125 | "\n", 126 | "Let the node at which we are starting be called the initial node. Let the distance of node Y be the distance from the initial node to Y. Dijkstra's algorithm will assign some initial distance values and will try to improve them step by step.\n", 127 | "1. Assign to every node a tentative distance value: set it to zero for our initial node and to infinity for all other nodes.\n", 128 | "2. Set the initial node as current. Mark all other nodes unvisited. Create a set of all the unvisited nodes called the unvisited set.\n", 129 | "3. For the current node, consider all of its unvisited neighbors and calculate their tentative distances. Compare the newly calculated tentative distance to the current assigned value and assign the smaller one. For example, if the current node A is marked with a distance of 6, and the edge connecting it with a neighbor B has length 2, then the distance to B (through A) will be 6 + 2 = 8. If B was previously marked with a distance greater than 8 then change it to 8. Otherwise, keep the current value.\n", 130 | "4. When we are done considering all of the neighbors of the current node, mark the current node as visited and remove it from the unvisited set. A visited node will never be checked again.\n", 131 | "5. 
If the destination node has been marked visited (when planning a route between two specific nodes) or if the smallest tentative distance among the nodes in the unvisited set is infinity (when planning a complete traversal; occurs when there is no connection between the initial node and remaining unvisited nodes), then stop. The algorithm has finished.\n", 132 | "6. Otherwise, select the unvisited node that is marked with the smallest tentative distance, set it as the new \"current node\", and go back to step 3.\n", 133 | "\n", 134 | "Here is an animation for Dijkstra's Algorithm from Wikipedia (we'll go through this in class):\n", 135 | "\n", 136 | "![](Dijkstra_Animation.gif)\n", 137 | "\n", 138 | "Here is an illustration of Dijkstra's Algorithm for a motion planning task:\n", 139 | "\n", 140 | "![](Dijkstras_progress_animation.gif)" 141 | ] 142 | }, 143 | { 144 | "cell_type": "markdown", 145 | "metadata": {}, 146 | "source": [ 147 | "Our Les Miserables dataset actually comes with edge weights. The weight describes the number of co-occurrences of the characters. Now, let's look at the values:" 148 | ] 149 | }, 150 | { 151 | "cell_type": "code", 152 | "execution_count": null, 153 | "metadata": { 154 | "scrolled": true 155 | }, 156 | "outputs": [], 157 | "source": [ 158 | "lesmis.edges(data=True)" 159 | ] 160 | }, 161 | { 162 | "cell_type": "markdown", 163 | "metadata": {}, 164 | "source": [ 165 | "We can draw the graph with these weights." 166 | ] 167 | }, 168 | { 169 | "cell_type": "code", 170 | "execution_count": null, 171 | "metadata": {}, 172 | "outputs": [], 173 | "source": [ 174 | "plt.rcParams['figure.figsize'] = (10, 15)\n", 175 | "\n", 176 | "pos = nx.spring_layout(lesmis)\n", 177 | "\n", 178 | "# Use edge weights as line widths\n", 179 | "edge_widths = [1.0 * x[2]['value'] for x in lesmis.edges(data=True)]\n", 180 | "\n", 181 | "nx.draw(lesmis, pos=pos)\n", 182 | "nx.draw_networkx(lesmis, pos=pos, width=edge_widths)\n", 183 | "plt.show()" 184 | ] 185 | }, 186 | { 187 | "cell_type": "markdown", 188 | "metadata": {}, 189 | "source": [ 190 | "That was nasty, let's try color." 
191 | ] 192 | }, 193 | { 194 | "cell_type": "code", 195 | "execution_count": null, 196 | "metadata": {}, 197 | "outputs": [], 198 | "source": [ 199 | "plt.rcParams['figure.figsize'] = (10, 15)\n", 200 | "\n", 201 | "pos = nx.spring_layout(lesmis)\n", 202 | "\n", 203 | "# Map edge weights to edge colors\n", 204 | "edge_colors = [ x[2]['value'] / 31.0 for x in lesmis.edges(data=True)]\n", 205 | "\n", 206 | "nx.draw(lesmis, pos=pos)\n", 207 | "nx.draw_networkx(lesmis, pos=pos, edge_color=edge_colors, width=2.0, edge_cmap=plt.cm.YlOrRd)\n", 208 | "plt.show()" 209 | ] 210 | }, 211 | { 212 | "cell_type": "markdown", 213 | "metadata": {}, 214 | "source": [ 215 | "First we run the algorithm without weights:" 216 | ] 217 | }, 218 | { 219 | "cell_type": "code", 220 | "execution_count": null, 221 | "metadata": {}, 222 | "outputs": [], 223 | "source": [ 224 | "path = nx.dijkstra_path(lesmis, source=\"Perpetue\", target=\"Napoleon\")\n", 225 | "path" 226 | ] 227 | }, 228 | { 229 | "cell_type": "markdown", 230 | "metadata": {}, 231 | "source": [ 232 | "And then we run it with the weights, to have a comparison:" 233 | ] 234 | }, 235 | { 236 | "cell_type": "code", 237 | "execution_count": null, 238 | "metadata": {}, 239 | "outputs": [], 240 | "source": [ 241 | "weighted_path = nx.dijkstra_path(lesmis, source=\"Perpetue\", target=\"Napoleon\", weight=\"value\")\n", 242 | "weighted_path" 243 | ] 244 | }, 245 | { 246 | "cell_type": "markdown", 247 | "metadata": {}, 248 | "source": [ 249 | "We can calculate the relative weights of these paths:" 250 | ] 251 | }, 252 | { 253 | "cell_type": "code", 254 | "execution_count": null, 255 | "metadata": {}, 256 | "outputs": [], 257 | "source": [ 258 | "def getPathCost(path):\n", 259 | " length = len(path)\n", 260 | " weight = 0\n", 261 | " for i in range(length-1):\n", 262 | " attributes = lesmis[path[i]][path[i+1]]\n", 263 | " weight += attributes[\"value\"]\n", 264 | " print(path[i], path[i+1], attributes)\n", 265 | " print(\"Weight:\", weight)\n", 266 | " \n", 267 | "print(\"Shortest Path\")\n", 268 | "getPathCost(path)\n", 269 | "\n", 270 | "print(\"\\n ==== \\n\")\n", 271 | "\n", 272 | "print(\"Weighted Path\") \n", 273 | "getPathCost(weighted_path)\n" 274 | ] 275 | }, 276 | { 277 | "cell_type": "markdown", 278 | "metadata": {}, 279 | "source": [ 280 | "### The A* Algorithm - Path Finding using Heuristics" 281 | ] 282 | }, 283 | { 284 | "cell_type": "markdown", 285 | "metadata": {}, 286 | "source": [ 287 | "Dijkstra is a great general algorithm, but it can be slow. \n", 288 | "\n", 289 | "If we know more about the network we're working with, we can use a more efficient algorithm that takes this information into account. For example, in motion planning and in route planning on a map, we know where the target point is located spatially, relative to the source point. We can take this information into account by using a heuristic function to refine the search. \n", 290 | "\n", 291 | "The [A* algorithm](https://en.wikipedia.org/wiki/A*_search_algorithm) is such an algorithm. It's based on Dijkstra's algorithm, but uses a heuristic function to guide its search in the right direction. A* is an informed search algorithm, or a best-first search, meaning that it solves problems by searching among all possible paths to the solution (goal) for the one that incurs the smallest cost (least distance traveled, shortest time, etc.), and among these paths it first considers the ones that appear to lead most quickly to the solution. 
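To make the role of the heuristic concrete, here is a small added sketch (my own illustrative example, not part of the lecture data): on a grid graph whose nodes are `(x, y)` coordinates, Manhattan distance never overestimates the true distance, so it is a valid heuristic for A*:

```python
import networkx as nx

G = nx.grid_2d_graph(10, 10)  # nodes are (x, y) tuples with unit-weight edges

def manhattan(a, b):
    # Admissible heuristic: never larger than the real grid distance.
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

path = nx.astar_path(G, (0, 0), (9, 9), heuristic=manhattan)
```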
\n", 292 | "\n", 293 | "At each step of the algorithm, A* evaluates which is the best paths to follow\n", 294 | "\n", 295 | "See the following example:\n", 296 | "\n", 297 | "![](Astar_progress_animation.gif)\n", 298 | "\n", 299 | "While [NetworkX](https://networkx.readthedocs.io/en/stable/reference/algorithms.shortest_paths.html#module-networkx.algorithms.shortest_paths.astar) provides an implementation of the A* algorithm, we are not able to define a meaningful heuristic function for the Les Miserables graph, so we can't use it on this graph." 300 | ] 301 | } 302 | ], 303 | "metadata": { 304 | "anaconda-cloud": {}, 305 | "kernelspec": { 306 | "display_name": "Python 3 (ipykernel)", 307 | "language": "python", 308 | "name": "python3" 309 | }, 310 | "language_info": { 311 | "codemirror_mode": { 312 | "name": "ipython", 313 | "version": 3 314 | }, 315 | "file_extension": ".py", 316 | "mimetype": "text/x-python", 317 | "name": "python", 318 | "nbconvert_exporter": "python", 319 | "pygments_lexer": "ipython3", 320 | "version": "3.9.6" 321 | } 322 | }, 323 | "nbformat": 4, 324 | "nbformat_minor": 1 325 | } 326 | -------------------------------------------------------------------------------- /24-networks/Astar_progress_animation.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/24-networks/Astar_progress_animation.gif -------------------------------------------------------------------------------- /24-networks/Dijkstra_Animation.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/24-networks/Dijkstra_Animation.gif -------------------------------------------------------------------------------- /24-networks/Dijkstras_progress_animation.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/24-networks/Dijkstras_progress_animation.gif -------------------------------------------------------------------------------- /24-networks/bread.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/24-networks/bread.png -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2019, the University of Utah 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 
14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Introduction to Data Science – Lecture Material 2 | Course website: [http://datasciencecourse.net](http://datasciencecourse.net) 3 | 4 | This repository contains the material used in the lectures and grows as new lectures are published. You can manually download the files for each lecture, but we recommend that you use git to clone and update this repository. 5 | 6 | You can use [GitHub Desktop](https://desktop.github.com/) to update this repository as new lectures are published, or you can use the following commands (recommended): 7 | 8 | ## Initial Step: Cloning 9 | 10 | When you clone a repository, you set up a copy on your computer. Run: 11 | 12 | ```bash 13 | git clone https://github.com/datascience-course/2024-datascience-lectures 14 | ``` 15 | 16 | This will create a folder `2024-datascience-lectures` on your computer, with the individual lectures in subdirectories. 17 | 18 | ## Updating 19 | 20 | As we release new lectures or update lectures, you'll have to update your repository. You can do this by changing into the `2024-datascience-lectures` directory and executing: 21 | 22 | ```bash
23 | git pull 24 | ``` 25 | 26 | That's it – you'll have the latest version of the lectures. 27 | --------------------------------------------------------------------------------