├── .gitignore
├── 01-intro
│   └── 01-introduction.pdf
├── 02-basic-python
│   ├── 02-basic-python.ipynb
│   ├── 02-exercises.ipynb
│   ├── 02-version-control.ipynb
│   ├── anaconda_navigator.png
│   ├── datasciencecat.jpg
│   ├── exercise.py
│   ├── first_steps.py
│   └── newrepo.png
├── 03-basic-python-II
│   ├── lecture-3-basic-python-II.ipynb
│   ├── lecture-3-exercises.ipynb
│   └── patric_recursive.gif
├── 04-intro-desc-stat
│   ├── 04-DescriptiveStatistics.ipynb
│   ├── 04-DescriptiveStatistics_Activity.ipynb
│   ├── Conf1.png
│   ├── Conf2.png
│   ├── Correlation_examples2.svg
│   ├── SmallLargeStandDev.png
│   ├── correlation.png
│   ├── global_warming.csv
│   ├── purity.png
│   └── test_data.csv
├── 05-dictionaries-pandas-series
│   ├── 05-dictionaries-pandas-series.ipynb
│   └── 05-exercises.ipynb
├── 06-loading-data-dataframes
│   ├── 06-exercises.ipynb
│   ├── 06-loading-data-dataframes.ipynb
│   ├── grades.csv
│   ├── hit_albums.csv
│   └── my_file.txt
├── 07-hypothesis-testing
│   ├── 07-hypothesis-testing.ipynb
│   ├── Cohen1994.pdf
│   ├── InferenceErrors.png
│   ├── Nuzzo2014.pdf
│   └── determinePvals.png
├── 08-data-science-ethics
│   └── Data-Science-Ethics.pdf
├── 09-LinearRegression1
│   ├── 08-LinearRegression1.ipynb
│   ├── 08-LinearRegression1_Activity.ipynb
│   ├── 438px-Linear_regression.png
│   ├── Advertising.csv
│   └── SLR.pdf
├── 10-LinearRegression2
│   ├── 438px-Linear_regression.png
│   ├── 9-LinearRegression2.ipynb
│   ├── 9-LinearRegression2_Activity.ipynb
│   ├── Advertising.csv
│   ├── Auto.csv
│   ├── Credit.csv
│   └── Overfitted_Data.png
├── 11-practical-data-visualization
│   ├── 10-exercise.ipynb
│   ├── 10-practical_visualization.ipynb
│   ├── movies.csv
│   ├── stacked_hist.png
│   └── standards.png
├── 12-vis-principles
│   └── 11-vis-principles.pdf
├── 13-web-scraping
│   ├── 13-exercise-scraping.ipynb
│   ├── 13-web-scraping.ipynb
│   ├── class.png
│   ├── class_list.html
│   ├── class_schedule.html
│   ├── inspector.png
│   ├── lyrics.html
│   ├── requests.png
│   └── sampledevtools.png
├── 14-apis
│   ├── 14-apis-iss-and.ipynb
│   ├── 14-exercise-apis.ipynb
│   ├── credentials.py
│   ├── pokeapiscreenshot.png
│   ├── pokemonendpoint.png
│   └── requests.png
├── 15-Classification1
│   ├── 15-Classification-1-Decision-Trees.ipynb
│   ├── 15-Classification-1-kNN.ipynb
│   ├── BiasVarianceTradeoff.png
│   ├── BinaryConfusinoMatrix.png
│   ├── ConfusionMatrix.png
│   ├── iris.png
│   ├── oc-tree.jpeg
│   ├── p_sets.png
│   ├── scikit-learn-logo.png
│   ├── temp.dot
│   ├── temp.png
│   ├── titanic.csv
│   └── titanic_tree.png
├── 16-Classification2
│   ├── 16-Classification2-SVM.ipynb
│   ├── 4fold_CV.png
│   ├── SVM-Tutorial.pdf
│   └── iris.png
├── 17-NLP-RegEx
│   ├── lecture-21-NLP.ipynb
│   ├── lecture-21-exercise.ipynb
│   ├── lecture-21-regex.ipynb
│   └── mod_squad.png
├── 18-Clustering1
│   ├── 18-Clustering1-Exercise.ipynb
│   ├── 18-Clustering1.ipynb
│   ├── k-means-fig.png
│   └── lloyd.png
├── 19-Clustering2
│   ├── 19-Clustering2-Exercise.ipynb
│   ├── 19-Clustering2.ipynb
│   ├── ComparisonOfClusteringMethods.png
│   ├── DBScan.png
│   ├── connectivity_plot1.png
│   ├── connectivity_plot2.png
│   ├── dendrogram.png
│   ├── hc_1_homogeneous_complete.png
│   ├── hc_2_homogeneous_not_complete.png
│   ├── hc_3_complete_not_homogeneous.png
│   ├── hierarchical_clustering_1.png
│   ├── hierarchical_clustering_2.png
│   └── lloyd.png
├── 20-DimReduction
│   ├── 20-DimReduction-Activity.ipynb
│   ├── 20-DimReduction.ipynb
│   ├── heptathlon.csv
│   ├── rnamix1_SCT.csv
│   └── rnamix1_labels.csv
├── 21-NeuralNetwork1
│   ├── 21-NeuralNetworks1.ipynb
│   ├── Colored_neural_network.svg
│   ├── ImageNetPlot.png
│   ├── mnist-original.mat.zip
│   ├── neuralnetworks.png
│   └── perceptron.png
├── 22-NeuralNetworks2
│   ├── 22-NeuralNetworks2-activity.ipynb
│   ├── 22-NeuralNetworks2.ipynb
│   ├── Colored_neural_network.svg
│   ├── activationFct.png
│   ├── beginner.ipynb
│   ├── graph.png
│   ├── images
│   │   ├── brodie.jpeg
│   │   ├── layla1.jpeg
│   │   ├── scout1.jpeg
│   │   ├── scout2.jpeg
│   │   ├── scout3.jpeg
│   │   ├── scout4.jpeg
│   │   └── scout5.jpeg
│   └── nature14539.pdf
├── 23-databases
│   ├── 23-databases-exercises.ipynb
│   ├── 23-databases.ipynb
│   ├── albums_tracks_tables.jpg
│   ├── backup_chinook.db
│   ├── chinook.db
│   ├── database_schema.png
│   └── exploits_of_a_mom.png
├── 24-networks
│   ├── 24-network-exercise-activity.ipynb
│   ├── 24-networks-slides.pdf
│   ├── 24-networks.ipynb
│   ├── 24-path-search.ipynb
│   ├── Astar_progress_animation.gif
│   ├── Dijkstra_Animation.gif
│   ├── Dijkstras_progress_animation.gif
│   ├── bread.png
│   └── lesmis.gml
├── LICENSE
└── README.md
/.gitignore:
--------------------------------------------------------------------------------
1 | # Byte-compiled / optimized / DLL files
2 | __pycache__/
3 | *.py[cod]
4 | *$py.class
5 |
6 | # C extensions
7 | *.so
8 |
9 | # Distribution / packaging
10 | .Python
11 | env/
12 | build/
13 | develop-eggs/
14 | dist/
15 | downloads/
16 | eggs/
17 | .eggs/
18 | lib/
19 | lib64/
20 | parts/
21 | sdist/
22 | var/
23 | wheels/
24 | *.egg-info/
25 | .installed.cfg
26 | *.egg
27 |
28 | # PyInstaller
29 | # Usually these files are written by a python script from a template
30 | # before PyInstaller builds the exe, so as to inject date/other infos into it.
31 | *.manifest
32 | *.spec
33 |
34 | # Installer logs
35 | pip-log.txt
36 | pip-delete-this-directory.txt
37 |
38 | # Unit test / coverage reports
39 | htmlcov/
40 | .tox/
41 | .coverage
42 | .coverage.*
43 | .cache
44 | nosetests.xml
45 | coverage.xml
46 | *.cover
47 | .hypothesis/
48 |
49 | # Translations
50 | *.mo
51 | *.pot
52 |
53 | # Django stuff:
54 | *.log
55 | local_settings.py
56 |
57 | # Flask stuff:
58 | instance/
59 | .webassets-cache
60 |
61 | # Scrapy stuff:
62 | .scrapy
63 |
64 | # Sphinx documentation
65 | docs/_build/
66 |
67 | # PyBuilder
68 | target/
69 |
70 | # Jupyter Notebook
71 | .ipynb_checkpoints
72 |
73 | # pyenv
74 | .python-version
75 |
76 | # celery beat schedule file
77 | celerybeat-schedule
78 |
79 | # SageMath parsed files
80 | *.sage.py
81 |
82 | # dotenv
83 | .env
84 |
85 | # virtualenv
86 | .venv
87 | venv/
88 | ENV/
89 |
90 | # Spyder project settings
91 | .spyderproject
92 | .spyproject
93 |
94 | # Rope project settings
95 | .ropeproject
96 |
97 | # mkdocs documentation
98 | /site
99 |
100 | # mypy
101 | .mypy_cache/
102 |
103 | # MacOS saving finder state
104 | .DS_Store
105 |
106 | # vim related
107 | *~
108 | *.sw*
109 |
--------------------------------------------------------------------------------
/01-intro/01-introduction.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/01-intro/01-introduction.pdf
--------------------------------------------------------------------------------
/02-basic-python/02-exercises.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
 7 |     "# Exercises for Lecture 2\n",
8 | "\n",
9 | "Your Name: \n",
10 | "Your UID: \n",
11 | "Your E-Mail: "
12 | ]
13 | },
14 | {
15 | "cell_type": "markdown",
16 | "metadata": {},
17 | "source": [
18 | "## Exercise 1: Data types and operations\n",
19 | "Play around with data types and operations. Try the following things:\n",
20 | "\n",
21 | "1. Define two variables and assign an integer to the first and a float to the second. Define a new variable and assign the sum of the previous two variables. What's the data type of the third variable?\n",
22 | "2. Reassign a variable with a different data type, e.g., take one of your numerical variables and assign a string to it. What's the new data type?\n",
23 | "3. See what happens if you try to add a string to a string.\n",
24 | "4. See what happens if you add a string to a float or an integer."
25 | ]
26 | },
27 | {
28 | "cell_type": "code",
29 | "execution_count": null,
30 | "metadata": {},
31 | "outputs": [],
32 | "source": [
33 | "# your code"
34 | ]
35 | },
36 | {
37 | "cell_type": "markdown",
38 | "metadata": {},
39 | "source": [
40 | "## Exercise 2: Running Programs\n",
41 | "\n",
 42 |     " * Create a new Python file and use the `double_number` function as a template. Modify the code to add two numbers instead of doubling a single number.\n",
43 | " * Can you guess what would happen if you change the indentation? Try it out.\n",
 44 |     " * Try printing `a` at the very end of the program and see what happens. Can you explain what's going on?\n",
45 | " \n",
46 | "If you want to submit this activity, paste the content of your python file here: "
47 | ]
48 | },
49 | {
50 | "cell_type": "code",
51 | "execution_count": 1,
52 | "metadata": {},
53 | "outputs": [],
54 | "source": [
55 | "# your code"
56 | ]
57 | },
58 | {
59 | "cell_type": "markdown",
60 | "metadata": {},
61 | "source": [
62 | "## Exercise 3: Creating Cells, Executing Code\n",
63 | "\n",
64 | "1. Create a new code cell below where you define variables containing your name, your age in years, and your major.\n",
 65 |     "2. Create another cell that uses these variables and prints a concatenated string stating your name, major, and your age in years, months, and days (assuming today is your birthday). The output should look like this:\n",
66 | "\n",
67 | "```\n",
68 | "Name: Science Cat, Major: Computer Science, Age: 94 years, or 1128 months, or 34310 days. \n",
69 | "```"
70 | ]
71 | },
72 | {
73 | "cell_type": "markdown",
74 | "metadata": {},
75 | "source": [
76 | "## Exercise 4: Functions\n",
77 | "Write a function that \n",
78 | " * takes two numerical variables\n",
79 | " * multiplies them with each other\n",
80 | " * divides them by a numerical variable defined in the scope outside the function\n",
81 | " * and returns the result. \n",
82 | " \n",
83 | "Print the result of the function for three different sets of input variables. "
84 | ]
85 | },
86 | {
87 | "cell_type": "code",
88 | "execution_count": null,
89 | "metadata": {},
90 | "outputs": [],
91 | "source": []
92 | }
93 | ],
94 | "metadata": {
95 | "kernelspec": {
96 | "display_name": "Python 3 (ipykernel)",
97 | "language": "python",
98 | "name": "python3"
99 | },
100 | "language_info": {
101 | "codemirror_mode": {
102 | "name": "ipython",
103 | "version": 3
104 | },
105 | "file_extension": ".py",
106 | "mimetype": "text/x-python",
107 | "name": "python",
108 | "nbconvert_exporter": "python",
109 | "pygments_lexer": "ipython3",
110 | "version": "3.9.13"
111 | }
112 | },
113 | "nbformat": 4,
114 | "nbformat_minor": 4
115 | }
116 |
--------------------------------------------------------------------------------
/02-basic-python/02-version-control.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Introduction to Data Science – Lecture 2 – Git and GitHub\n",
8 | "*COMP 5360 / MATH 4100, University of Utah, http://datasciencecourse.net/*\n",
9 | "\n",
10 | "In this lecture, we will learn about version control. We'll look at a couple of general principles and then go into the specifics of git and also GitHub. We'll also look at features of GitHub such as issue tracking. We strongly recommend that you use proper version control for your final project. "
11 | ]
12 | },
13 | {
14 | "cell_type": "markdown",
15 | "metadata": {},
16 | "source": [
 17 |     "## Why Version Control?\n",
18 | " \n",
19 | " * **Keep copies of multiple states of files** \n",
20 | " By committing you record a state of the file to which you can go back any time.\n",
21 | " * **Create alternative states** \n",
 22 |     " Imagine you just want to try out something, but you realize you have to modify multiple files. You're not sure whether it works or is worth it. With version control you can just create a **branch** where you can experiment or develop new features without changing the main or other branches (see the sketch after this list).\n",
23 | " * **Collaborate in teams** \n",
24 | " Nobody wants to send code via e-mail or share via Dropbox. If two people work on a file at the same time it's unclear how to merge the code. Version control lets you keep your code in a shared central location and has dedicated ways to merge and deal with conflicts. \n",
25 | " * **Keep your work safe** \n",
26 | " Your hard drive breaks. Your computer is stolen. But your code is safe because you store it not only on your computer but also on a remote server. \n",
27 | " * **Share** \n",
28 | " You developed something awesome and want to share it. But not only do you want to make it available, you're also happy about contributions from others! \n",
29 | "\n",
30 | "\n",
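 31 |     "As a minimal sketch, that branch workflow could look like this (the branch name is made up for illustration):\n",
 32 |     "\n",
 33 |     "```bash\n",
 34 |     "# create an experimental branch and switch to it\n",
 35 |     "$ git checkout -b my-experiment\n",
 36 |     "# edit files, then record the changes on the branch\n",
 37 |     "$ git commit -a -m \"try out an idea\"\n",
 38 |     "# switch back to the untouched master branch at any time\n",
 39 |     "$ git checkout master\n",
 40 |     "```\n",
 41 |     "\n",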
31 | "## git\n",
32 | "\n",
33 | " * Created by Linus Torvalds, 2005\n",
34 | " * Meaning: British English slang roughly equivalent to \"unpleasant person\". \n",
35 | " * git – the stupid content tracker.\n",
36 | "\n",
37 | "*I'm an egotistical bastard, and I name all my projects after myself. First 'Linux', now 'git'. -- Linus Torvalds*\n",
38 | "\n",
39 | "## Why git?\n",
40 | "\n",
41 | " * Popular ([~60-90% of open source projects](https://rhodecode.com/insights/version-control-systems-2016))\n",
42 | " * Truly distributed\n",
43 | " * Very fast\n",
44 | " * Everything is local\n",
45 | " * Free\n",
46 | " * Safe against corruptions\n",
47 | " * GitHub!"
48 | ]
49 | },
50 | {
51 | "cell_type": "markdown",
52 | "metadata": {},
53 | "source": [
54 | "## Installation\n",
55 | "\n",
 56 |     "See the [official documentation](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git) on how to install git on your operating system,\n",
 57 |     "or see the [GitHub documentation](https://help.github.com/en/github/getting-started-with-github/set-up-git).\n",
58 | "\n",
 59 |     "On Mac, install the Xcode package from the App Store. \n",
60 | "\n",
61 | "On Windows, see the above link, or install [GitHub Desktop](https://desktop.github.com/) which includes a git shell."
62 | ]
63 | },
64 | {
65 | "cell_type": "markdown",
66 | "metadata": {},
67 | "source": [
68 | "# Working with GitHub\n",
69 | "\n",
 70 |     "First, we'll create a new repository on GitHub by going to [https://github.com/new](https://github.com/new). \n",
 71 |     "\n",
 72 |     "![Creating a new repository](newrepo.png) \n",
73 | "\n",
74 | "We'll also create a README.md and LICENSE file. "
75 | ]
76 | },
77 | {
78 | "cell_type": "markdown",
79 | "metadata": {},
80 | "source": [
81 | "## GUI Clients\n",
82 | "\n",
83 | "* **GitHub Desktop** \n",
84 | " Good option if you want a GUI client. [Download here](https://desktop.github.com/)\n",
85 | "* **Integrated in IDEs** \n",
 86 |     " Many operations can be done from within an IDE, such as WebStorm. \n",
87 | " "
88 | ]
89 | },
90 | {
91 | "cell_type": "markdown",
92 | "metadata": {},
93 | "source": [
94 | "## Command Line Interface\n",
95 | "\n",
96 | "Now let's clone the repository from GitHub.\n",
97 | " \n",
98 | "```bash\n",
 99 |     "$ git clone https://github.com/alexsb/demo.git\n",
100 | "```\n",
101 | "\n",
102 | "This creates a local copy of the GitHub repository. We will just start working with that and commit and push the code to the server. \n",
103 | "\n",
104 | "\n",
105 | "```bash\n",
106 | "# What's currently in the repository?\n",
 107 |     "$ cd demo\n",
108 | "$ ls\n",
109 | "LICENSE README.md\n",
110 | "```\n",
111 | "Write something to demo.txt.\n",
112 | "\n",
113 | "```bash\n",
 114 |     "$ echo \"Hello world\" > demo.txt\n",
116 | "```\n",
117 | "Add demo.txt to the repository.\n",
118 | "```bash\n",
119 | "$ git add demo.txt\n",
120 | "```\n",
121 | "Commit the file to the repository.\n",
122 | "\n",
123 | "```bash\n",
124 | "$ git commit -a -m \"added demo file\" \n",
125 | "[master 2e1918d] added demo file\n",
126 | " 1 file changed, 1 insertion(+)\n",
127 | " create mode 100644 demo.txt\n",
128 | "```\n",
129 | "\n",
130 | "**Pushing it to the server!**\n",
131 | "\n",
132 | "```bash\n",
133 | "$ git push \n",
134 | "Counting objects: 3, done.\n",
135 | "Delta compression using up to 8 threads.\n",
136 | "Compressing objects: 100% (2/2), done.\n",
137 | "Writing objects: 100% (3/3), 324 bytes | 0 bytes/s, done.\n",
138 | "Total 3 (delta 0), reused 0 (delta 0)\n",
139 | "To https://github.com/alexsb/demo.git\n",
140 | " 8e1ecd1..2e1918d master -> master\n",
141 | "```\n",
142 | "\n",
143 | "We have now committed a file locally and pushed it to the server, i.e., our local copy is in sync with the server copy. \n",
144 | "\n",
 145 |     "Note that the `git push` command uses the `origin` remote defined in the repository's config. You can also push to other remotes!\n",
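 146 |     "\n",
 147 |     "As a minimal sketch, pushing to a second remote could look like this (the remote name `backup` and its URL are made up for illustration):\n",
 148 |     "\n",
 149 |     "```bash\n",
 150 |     "# register a second remote repository under the name 'backup'\n",
 151 |     "$ git remote add backup https://github.com/alexsb/demo-backup.git\n",
 152 |     "# push the master branch to that remote instead of origin\n",
 153 |     "$ git push backup master\n",
 154 |     "```"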
146 | ]
147 | },
148 | {
149 | "cell_type": "markdown",
150 | "metadata": {},
151 | "source": [
152 | "Next, we will make changes at another place. We'll use the **GitHub web interface** to do that. \n",
153 | "\n",
154 | "Once these changes are done, our local repository is out of sync with the remote repository. To get these changes locally, we have to pull from the repository:\n",
155 | "\n",
156 | "```bash\n",
157 | "$ git pull\n",
158 | "remote: Counting objects: 3, done.\n",
159 | "remote: Compressing objects: 100% (2/2), done.\n",
160 | "remote: Total 3 (delta 1), reused 0 (delta 0), pack-reused 0\n",
161 | "Unpacking objects: 100% (3/3), done.\n",
162 | "From https://github.com/alexsb/demo\n",
163 | " 2e1918d..5dd3090 master -> origin/master\n",
164 | "Updating 2e1918d..5dd3090\n",
165 | "Fast-forward\n",
166 | " demo.txt | 1 +\n",
167 | " 1 file changed, 1 insertion(+)\n",
168 | "``` \n",
169 | " \n",
170 | "Let's see whether the changes are here \n",
171 | "```bash\n",
172 | "$ cat demo.txt \n",
173 | "Hello world\n",
174 | "Are you still spinning?\n",
175 | "```"
176 | ]
177 | },
178 | {
179 | "cell_type": "markdown",
180 | "metadata": {},
181 | "source": [
182 | "## Handling Conflicts \n",
183 | "\n",
 184 |     "If we make local and remote changes simultaneously, we will get into a conflicted state. \n",
185 | "\n",
186 | "Let's go to the web interface and make a change to the `demo.txt`. \n",
187 | "\n",
188 | "At the same time, we add a line to the demo.txt locally:\n",
189 | "\n",
190 | "```bash\n",
191 | "$ echo \"One more line\" >> demo.txt\n",
192 | "```\n",
193 | "\n",
194 | "If we now pull from GitHub, we're in trouble: \n",
195 | "\n",
196 | "```bash\n",
197 | "$ git pull\n",
198 | "remote: Enumerating objects: 3, done.\n",
199 | "remote: Counting objects: 100% (3/3), done.\n",
200 | "remote: Total 3 (delta 0), reused 0 (delta 0), pack-reused 0\n",
201 | "Unpacking objects: 100% (3/3), done.\n",
202 | "From https://github.com/alexsb/demo\n",
203 | " 55da8f8..b03d9b1 master -> origin/master\n",
204 | "Updating 55da8f8..b03d9b1\n",
205 | "error: Your local changes to the following files would be overwritten by merge:\n",
206 | "\tdemo.txt\n",
207 | "Please commit your changes or stash them before you merge.\n",
208 | "Aborting\n",
209 | "```\n",
210 | "\n",
211 | "The right way to handle this conflict is to first commit locally: \n",
212 | "\n",
213 | "```bash\n",
214 | "$ git commit -a -m \"added a line\" \n",
215 | "[master 5589619] added a line\n",
216 | "1 file changed, 1 insertion(+)\n",
217 | "```\n",
218 | "\n",
219 | "Then to pull: \n",
220 | "\n",
221 | "```bash\n",
222 | "$ git pull\n",
223 | "Auto-merging demo.txt\n",
224 | "CONFLICT (content): Merge conflict in demo.txt\n",
225 | "Automatic merge failed; fix conflicts and then commit the result.\n",
226 | "```\n",
227 | "\n",
228 | "And fix the conflict in the file. Then commit and push. \n",
229 | "\n",
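 230 |     "Inside the file, git marks the conflicting region with conflict markers, and fixing the conflict means editing the file so that only the text you want to keep remains and the markers are deleted. Schematically (the exact content depends on your edits), the conflicted demo.txt could look like this:\n",
 231 |     "\n",
 232 |     "```\n",
 233 |     "<<<<<<< HEAD\n",
 234 |     "One more line\n",
 235 |     "=======\n",
 236 |     "The line added on the web interface\n",
 237 |     ">>>>>>> b03d9b1...\n",
 238 |     "```\n",
 239 |     "\n",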
230 | "However, this is tricky with Jupyter Notebooks and our setup of lectures. \n",
231 | "\n",
232 | "For example, if you make local changes and we update the lecture later, resolving the conflict will not be easy.\n",
233 | "\n",
234 | "Another approach is to override your local changes and accept all changes from the source. This will **delete everything you've done in your local file**. \n",
235 | "\n",
236 | "```bash\n",
237 | "$ git checkout demo.txt\n",
238 | "```\n",
239 | "\n",
240 | "This overrides all changes to demo.txt. "
241 | ]
242 | },
243 | {
244 | "cell_type": "markdown",
245 | "metadata": {},
246 | "source": [
247 | "## Jupyter Notebooks and Git\n",
248 | "\n",
 249 |     "Unfortunately, Jupyter Notebooks aren't handled well by git, as they mix code and output in the notebook file. \n",
250 | "\n",
251 | "Let's take a quick look at a notebook file [this is edited and cut]: \n",
252 | "\n",
253 | "```json\n",
254 | "{\n",
255 | " \"cells\": [\n",
256 | " {\n",
257 | " \"cell_type\": \"markdown\",\n",
258 | " \"metadata\": {},\n",
259 | " \"source\": [\n",
260 | " \"# Introduction to Data Science, CS 5963 / Math 3900\\n\",\n",
261 | " \"*CS 5963 / MATH 3900, University of Utah, http://datasciencecourse.net/* \\n\",\n",
262 | " \"\\n\",\n",
263 | " \"## Lab 10: Classification\\n\",\n",
264 | " \"\\n\",\n",
265 | " \"In this lab, we will use the [scikit-learn](http://scikit-learn.org/) library to revisit the three classification methods we introduced: K-nearest neighbor, decision trees, and support vector machines. We will use a [dataset on contraceptive methods in Indonesia](https://archive.ics.uci.edu/ml/datasets/Contraceptive+Method+Choice).\\n\"\n",
266 | " ]\n",
267 | " },\n",
268 | " {\n",
269 | " \"cell_type\": \"markdown\",\n",
270 | " \"metadata\": {},\n",
271 | " \"source\": [\n",
272 | " \"## The Data\\n\",\n",
273 | " \"\\n\",\n",
274 | " \"We will explore a dataset about the use of contraception in Indonesia. The dataset has 1473 records and the following attributes:\\n\",\n",
275 | " \"\\n\",\n",
276 | " \"1. Woman's age (numerical) \\n\",\n",
277 | " \"2. Woman's education (categorical) 1=low, 2, 3, 4=high \\n\",\n",
278 | " \"3. Husband's education (categorical) 1=low, 2, 3, 4=high \\n\",\n",
279 | " \"4. Number of children ever born (numerical) \\n\",\n",
280 | " \"5. Woman's religion (binary) 0=Non-Islam, 1=Islam \\n\",\n",
281 | " \"6. Employed? (binary) 0=Yes, 1=No \\n\",\n",
282 | " \"7. Husband's occupation (categorical) 1, 2, 3, 4 \\n\",\n",
283 | " \"8. Standard-of-living index (categorical) 1=low, 2, 3, 4=high \\n\",\n",
284 | " \"9. Media exposure (binary) 0=Good, 1=Not good \\n\",\n",
285 | " \"10. Contraceptive method used (class attribute) 1=No-use, 2=Long-term, 3=Short-term\"\n",
286 | " ]\n",
287 | " },\n",
288 | " {\n",
289 | " \"cell_type\": \"markdown\",\n",
290 | " \"metadata\": {},\n",
291 | " \"source\": [\n",
292 | " \"### Hypothesis\\n\",\n",
293 | " \"\\n\",\n",
294 | " \"Write down which features do you think have the most impact on the use of contraception.\"\n",
295 | " ]\n",
296 | " },\n",
297 | " {\n",
298 | " \"cell_type\": \"code\",\n",
299 | " \"execution_count\": 2,\n",
300 | " \"metadata\": {\n",
301 | " \"collapsed\": false\n",
302 | " },\n",
303 | " \"outputs\": [\n",
304 | " {\n",
305 | " \"data\": {\n",
306 | " \"text/html\": [\n",
 307 |      \"<div>\\n\",\n",
 308 |      \"<table border=\\\"1\\\" class=\\\"dataframe\\\">\\n\",\n",
 309 |      \"  <thead>\\n\",\n",
 310 |      \"    <tr>\\n\",\n",
 311 |      \"      <th></th>\\n\",\n",
 312 |      \"      <th>Age</th>\\n\",\n",
 313 |      \"      <th>Education</th>\\n\",\n",
 314 |      \"      <th>Husband-Education</th>\\n\",\n",
 315 |      \"      <th>Children</th>\\n\",\n",
 316 |      \"      <th>Religion</th>\\n\",\n",
317 | "```\n",
318 | "\n",
 319 |     "Things like \"outputs\" and \"execution_count\" can change without any change to the notebook's functionality. \n",
320 | "\n",
321 | "So, what can you do? \n",
322 | "\n",
 323 |     " * Only commit clean notebooks, i.e., run \"Restart and Clear Output\" before committing and pushing. This gets tedious, of course, if your script takes a long time to run. \n",
324 | " * Deal with conflicts (it's not too hard).\n",
325 | " * Work in pure python (not encouraged for your final project). \n",
326 | " * Synchronize with your collaborators over chat (...). \n",
327 | " * More sophisticated solutions [such as this one](https://gist.github.com/pbugnion/ea2797393033b54674af) (untested). \n",
328 | " * Hope and wait that Jupyter notebook will at some point separate input from output. (It's [looking good](https://github.com/jupyter/roadmap/blob/master/companion-files.md)).\n"
329 | ]
330 | },
331 | {
332 | "cell_type": "markdown",
333 | "metadata": {},
334 | "source": [
335 | "## Ignore Files\n",
336 | "\n",
 337 |     "When developing software, it's quite common that there are a lot of temporary files, e.g., created by Jupyter notebook to save temporary states. We shouldn't track temporary files: there is no reason to store them, and they can create conflicts.\n",
338 | "\n",
339 | "When you work with git on the command line, you have to manually add files you want to commit. But some GUI tools just add everything, so it's easy to add files you don't want. \n",
340 | "\n",
 341 |     "A good approach to avoid that is to use a `.gitignore` file. A gitignore file is a hidden file that lists patterns for files that shouldn't be added to a git repository. For Jupyter notebooks, this is a minimal .gitignore file: \n",
342 | "\n",
343 | "```bash\n",
344 | "# IPython Notebook\n",
345 | ".ipynb_checkpoints\n",
346 | "```\n",
347 | "\n",
348 | "You can find a more comprehensive `.gitignore` file in the [lecture repository]. We recommend that you copy this into your project repository. "
349 | ]
350 | },
351 | {
352 | "cell_type": "markdown",
353 | "metadata": {},
354 | "source": [
355 | "## Other Files\n",
356 | "\n",
357 | "You should always add a `README.md` file that describes what the code in the repository does and how to run it. \n",
358 | "\n",
359 | "You should always add a license to your code. We recommend the BSD or MIT license, which are non-viral open source licenses. "
360 | ]
361 | }
362 | ],
363 | "metadata": {
364 | "anaconda-cloud": {},
365 | "kernelspec": {
366 | "display_name": "Python 3 (ipykernel)",
367 | "language": "python",
368 | "name": "python3"
369 | },
370 | "language_info": {
371 | "codemirror_mode": {
372 | "name": "ipython",
373 | "version": 3
374 | },
375 | "file_extension": ".py",
376 | "mimetype": "text/x-python",
377 | "name": "python",
378 | "nbconvert_exporter": "python",
379 | "pygments_lexer": "ipython3",
380 | "version": "3.9.6"
381 | },
382 | "pycharm": {
383 | "stem_cell": {
384 | "cell_type": "raw",
385 | "metadata": {
386 | "collapsed": false
387 | },
388 | "source": []
389 | }
390 | }
391 | },
392 | "nbformat": 4,
393 | "nbformat_minor": 1
394 | }
395 |
--------------------------------------------------------------------------------
/02-basic-python/anaconda_navigator.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/02-basic-python/anaconda_navigator.png
--------------------------------------------------------------------------------
/02-basic-python/datasciencecat.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/02-basic-python/datasciencecat.jpg
--------------------------------------------------------------------------------
/02-basic-python/exercise.py:
--------------------------------------------------------------------------------
1 | def add_numbers(a, b):
2 | return a + b
3 |
4 | print(add_numbers(3, 7))
5 | print(add_numbers(14.22, 19))
6 | # printing a won't work because of scope
7 | print(a)
8 |
--------------------------------------------------------------------------------
/02-basic-python/first_steps.py:
--------------------------------------------------------------------------------
1 | def double_number(a):
2 | # btw, here is a comment! Use the # symbol to add comments or temporarily remove code
3 | # shorthand operator for 'a = a * 2'
4 | a *= 2
5 | return a
6 |
7 | print(double_number(3))
8 | print(double_number(14.22))
9 |
--------------------------------------------------------------------------------
/02-basic-python/newrepo.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/02-basic-python/newrepo.png
--------------------------------------------------------------------------------
/03-basic-python-II/lecture-3-exercises.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Lecture 3 Exercises"
8 | ]
9 | },
10 | {
11 | "cell_type": "markdown",
12 | "metadata": {},
13 | "source": [
14 | "## Exercise 1: Data Types and Operators"
15 | ]
16 | },
17 | {
18 | "cell_type": "markdown",
19 | "metadata": {},
20 | "source": [
 21 |     "**Task 1.1:** Explore how capitalization affects string comparison, e.g., compare \"datascience\" to \"Datascience\"."
22 | ]
23 | },
24 | {
25 | "cell_type": "code",
26 | "execution_count": null,
27 | "metadata": {},
28 | "outputs": [],
29 | "source": []
30 | },
31 | {
32 | "cell_type": "markdown",
33 | "metadata": {},
34 | "source": [
 35 |     "**Task 1.2:** Use the `==` operator to compare floats defined as expressions of integers, e.g., check whether 1/3 is equal to 2/6. Does that work?"
36 | ]
37 | },
38 | {
39 | "cell_type": "code",
40 | "execution_count": null,
41 | "metadata": {},
42 | "outputs": [],
43 | "source": []
44 | },
45 | {
46 | "cell_type": "markdown",
47 | "metadata": {},
48 | "source": [
49 | "**Task 1.3:** Write an expression that compares the \"floor\" value of a float to an integer, e.g., compare the floor of 1/3 to 0. There are two ways to calculate a floor value: using `int()` and using `math.floor()`. Are they equal? What is the data type of the returned values?"
50 | ]
51 | },
52 | {
53 | "cell_type": "code",
54 | "execution_count": null,
55 | "metadata": {},
56 | "outputs": [],
57 | "source": []
58 | },
59 | {
60 | "cell_type": "code",
61 | "execution_count": null,
62 | "metadata": {},
63 | "outputs": [],
64 | "source": []
65 | },
66 | {
67 | "cell_type": "markdown",
68 | "metadata": {},
69 | "source": [
70 | "## Exercise 3: Functions and If"
71 | ]
72 | },
73 | {
74 | "cell_type": "markdown",
75 | "metadata": {},
76 | "source": [
 77 |     "Write a function that takes two integers. If either of the numbers can be divided by the other without a remainder, print the result of the division. If neither number divides the other, print an error message."
78 | ]
79 | },
80 | {
81 | "cell_type": "code",
82 | "execution_count": null,
83 | "metadata": {},
84 | "outputs": [],
85 | "source": []
86 | },
87 | {
88 | "cell_type": "markdown",
89 | "metadata": {},
90 | "source": [
91 | "## Exercise 4: Lists\n",
92 | "\n",
93 | " * Create a list for the Rolling Stones: Mick, Keith, Charlie, Ronnie.\n",
94 | " * Create a slice of that list that contains only members of the original lineup (Mick, Keith, Charlie).\n",
 95 |     " * Add the Stones list to the bands list."
96 | ]
97 | },
98 | {
99 | "cell_type": "code",
100 | "execution_count": 1,
101 | "metadata": {},
102 | "outputs": [],
103 | "source": [
 104 |     "# initializations\n",
105 | "beatles = [\"Paul\", \"John\", \"George\", \"Ringo\"]\n",
106 | "zeppelin = [\"Jimmy\", \"Robert\", \"John\", \"John\"]\n",
107 | "bands = [beatles, zeppelin]"
108 | ]
109 | },
110 | {
111 | "cell_type": "code",
112 | "execution_count": null,
113 | "metadata": {},
114 | "outputs": [],
115 | "source": []
116 | },
117 | {
118 | "cell_type": "markdown",
119 | "metadata": {},
120 | "source": [
121 | "## Exercise 5.1: While\n",
122 | "\n",
 123 |     "Write a while loop that computes the sum of the first 100 positive integers, i.e., calculate\n",
124 | "\n",
125 | "$1+2+3+4+5+...+100$ "
126 | ]
127 | },
128 | {
129 | "cell_type": "code",
130 | "execution_count": null,
131 | "metadata": {},
132 | "outputs": [],
133 | "source": []
134 | },
135 | {
136 | "cell_type": "markdown",
137 | "metadata": {},
138 | "source": [
139 | "## Exercise 5.2: For\n",
140 | "\n",
 141 |     "Use a for loop to create an array that contains all even numbers from 2 to 50, i.e., an array: [2, 4, 6, ..., 48, 50] "
142 | ]
143 | },
144 | {
145 | "cell_type": "code",
146 | "execution_count": null,
147 | "metadata": {},
148 | "outputs": [],
149 | "source": []
150 | },
151 | {
152 | "cell_type": "markdown",
153 | "metadata": {},
154 | "source": [
 155 |     "Create a new array for the Beatles' main instruments: Ringo played drums, George played lead guitar, John played rhythm guitar, and Paul played bass. Assume that the array position associates each musician with his instrument. Use a for loop to print:\n",
156 | "\n",
157 | "```\n",
158 | "Paul: Bass\n",
 159 |     "John: Rhythm Guitar\n",
160 | "George: Lead Guitar\n",
161 | "Ringo: Drums\n",
162 | "```"
163 | ]
164 | },
165 | {
166 | "cell_type": "code",
167 | "execution_count": null,
168 | "metadata": {},
169 | "outputs": [],
170 | "source": []
171 | },
172 | {
173 | "cell_type": "markdown",
174 | "metadata": {},
175 | "source": [
176 | "## Exercise 6: Recursion\n",
177 | "\n",
178 | "Write a recursive function that calculates the factorial of a number. "
179 | ]
180 | },
181 | {
182 | "cell_type": "code",
183 | "execution_count": null,
184 | "metadata": {},
185 | "outputs": [],
186 | "source": []
187 | },
188 | {
189 | "cell_type": "markdown",
190 | "metadata": {},
191 | "source": [
192 | "## Exercise 7: List Comprehension"
193 | ]
194 | },
195 | {
196 | "cell_type": "markdown",
197 | "metadata": {
198 | "collapsed": true
199 | },
200 | "source": [
 201 |     "Write a list comprehension that creates a list with the length of each word in the following sentence:\n",
202 | "\n",
203 | "\"the quick brown fox jumps over the lazy dog\"\n",
204 | "\n",
205 | "The result should be a list: \n",
206 | "\n",
207 | "```python\n",
208 | "[3,5,...,3]\n",
209 | "```"
210 | ]
211 | },
212 | {
213 | "cell_type": "code",
214 | "execution_count": 2,
215 | "metadata": {},
216 | "outputs": [
217 | {
218 | "data": {
219 | "text/plain": [
220 | "['the', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']"
221 | ]
222 | },
223 | "execution_count": 2,
224 | "metadata": {},
225 | "output_type": "execute_result"
226 | }
227 | ],
228 | "source": [
229 | "# setting up the array\n",
230 | "sentence = \"the quick brown fox jumps over the lazy dog\"\n",
231 | "word_list = sentence.split()\n",
232 | "word_list"
233 | ]
234 | },
235 | {
236 | "cell_type": "code",
237 | "execution_count": null,
238 | "metadata": {},
239 | "outputs": [],
240 | "source": []
241 | }
242 | ],
243 | "metadata": {
244 | "anaconda-cloud": {},
245 | "kernelspec": {
246 | "display_name": "Python 3 (ipykernel)",
247 | "language": "python",
248 | "name": "python3"
249 | },
250 | "language_info": {
251 | "codemirror_mode": {
252 | "name": "ipython",
253 | "version": 3
254 | },
255 | "file_extension": ".py",
256 | "mimetype": "text/x-python",
257 | "name": "python",
258 | "nbconvert_exporter": "python",
259 | "pygments_lexer": "ipython3",
260 | "version": "3.9.6"
261 | }
262 | },
263 | "nbformat": 4,
264 | "nbformat_minor": 1
265 | }
266 |
--------------------------------------------------------------------------------
/03-basic-python-II/patric_recursive.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/03-basic-python-II/patric_recursive.gif
--------------------------------------------------------------------------------
/04-intro-desc-stat/04-DescriptiveStatistics_Activity.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "## Lecture 4: In Class Activity\n",
8 | "\n",
9 | "\n",
10 | "Your Name: \n",
11 | "Your UID: \n",
12 | "Your E-Mail: \n",
13 | "\n",
 14 |     "For this activity, first import a small data set containing 10 measurements of CO$_2$ levels and global temperatures. You can do this with the `read_csv` function from the pandas library as follows: "
15 | ]
16 | },
17 | {
18 | "cell_type": "code",
19 | "execution_count": 1,
20 | "metadata": {
21 | "tags": []
22 | },
23 | "outputs": [],
24 | "source": [
25 | "import pandas as pd\n",
26 | "import matplotlib.pyplot as plt\n",
27 | "import numpy as np\n",
28 | "my_data = pd.read_csv(\"global_warming.csv\")"
29 | ]
30 | },
31 | {
32 | "cell_type": "markdown",
33 | "metadata": {},
34 | "source": [
35 | "# Item 1\n",
36 | "\n",
 37 |     "Print the first few rows of the `my_data` data frame. Then create lists storing the CO$_2$ and temperature values. Compute the mean and median of the temperature measurements."
38 | ]
39 | },
40 | {
41 | "cell_type": "code",
42 | "execution_count": null,
43 | "metadata": {},
44 | "outputs": [],
45 | "source": [
46 | "# Your code here"
47 | ]
48 | },
49 | {
50 | "cell_type": "markdown",
51 | "metadata": {},
52 | "source": [
53 | "# Item 2\n",
54 | "\n",
 55 |     "Make a scatterplot of CO$_2$ versus temperature. Is there a strong relationship between these two variables? What is the correlation coefficient? Can we infer that increasing carbon dioxide levels will increase global temperature?"
56 | ]
57 | },
58 | {
59 | "cell_type": "code",
60 | "execution_count": null,
61 | "metadata": {},
62 | "outputs": [],
63 | "source": [
64 | "# Your code here"
65 | ]
66 | },
67 | {
68 | "cell_type": "markdown",
69 | "metadata": {},
70 | "source": [
71 | "*Your answers here.*"
72 | ]
73 | },
74 | {
75 | "cell_type": "markdown",
76 | "metadata": {},
77 | "source": [
78 | "# Item 3\n",
79 | "\n",
 80 |     "Change the last temperature measurement from 14.4 degrees to 144 degrees, which could have happened if there was a small error in manual data entry. \n",
81 | "\n",
82 | "How are the mean and median affected?\n",
83 | "\n",
84 | "How are the scatterplot and the correlation coefficient affected?\n",
85 | "\n",
86 | "Are the mean, median, and correlation coefficient robust to outliers and data entry errors?"
87 | ]
88 | },
89 | {
90 | "cell_type": "code",
91 | "execution_count": null,
92 | "metadata": {},
93 | "outputs": [],
94 | "source": [
95 | "# Your code here"
96 | ]
97 | },
98 | {
99 | "cell_type": "markdown",
100 | "metadata": {},
101 | "source": [
102 | "*Your answers here.*"
103 | ]
104 | }
105 | ],
106 | "metadata": {
107 | "anaconda-cloud": {},
108 | "celltoolbar": "Slideshow",
109 | "kernelspec": {
110 | "display_name": "Python 3 (ipykernel)",
111 | "language": "python",
112 | "name": "python3"
113 | },
114 | "language_info": {
115 | "codemirror_mode": {
116 | "name": "ipython",
117 | "version": 3
118 | },
119 | "file_extension": ".py",
120 | "mimetype": "text/x-python",
121 | "name": "python",
122 | "nbconvert_exporter": "python",
123 | "pygments_lexer": "ipython3",
124 | "version": "3.11.5"
125 | },
126 | "nbpresent": {
127 | "slides": {
128 | "006f01ca-e160-4faa-ad02-2f873362ca99": {
129 | "id": "006f01ca-e160-4faa-ad02-2f873362ca99",
130 | "prev": "e60ea09b-1474-49b0-9ea6-2e803b335693",
131 | "regions": {
132 | "88222835-28de-4a0f-895e-303024baf060": {
133 | "attrs": {
134 | "height": 0.8,
135 | "width": 0.8,
136 | "x": 0.1,
137 | "y": 0.1
138 | },
139 | "content": {
140 | "cell": "e6a51e7a-d63e-4187-8899-bfbf03f8a4b6",
141 | "part": "whole"
142 | },
143 | "id": "88222835-28de-4a0f-895e-303024baf060"
144 | }
145 | }
146 | },
147 | "2ba6955d-8be2-4ce3-ae98-f3b3695e4832": {
148 | "id": "2ba6955d-8be2-4ce3-ae98-f3b3695e4832",
149 | "prev": "35a7a5a6-f0c3-4b68-9579-e5840160a87d",
150 | "regions": {
151 | "63b4b5f3-c348-418c-aabf-932d5fdbcc1c": {
152 | "attrs": {
153 | "height": 0.8,
154 | "width": 0.8,
155 | "x": 0.1,
156 | "y": 0.1
157 | },
158 | "content": {
159 | "cell": "883076a7-1c6e-492f-b9d2-0b8550b5c31f",
160 | "part": "whole"
161 | },
162 | "id": "63b4b5f3-c348-418c-aabf-932d5fdbcc1c"
163 | }
164 | }
165 | },
166 | "2d5e1e8f-2e26-415e-8d8d-9229f2dc1244": {
167 | "id": "2d5e1e8f-2e26-415e-8d8d-9229f2dc1244",
168 | "prev": "9ba8c8c8-59a7-4776-84cb-084e5b0a2317",
169 | "regions": {
170 | "44582fec-116b-475d-9793-ad29580b7fc2": {
171 | "attrs": {
172 | "height": 0.8,
173 | "width": 0.8,
174 | "x": 0.1,
175 | "y": 0.1
176 | },
177 | "content": {
178 | "cell": "4992f285-654f-485e-81ef-8a6ae18cad34",
179 | "part": "whole"
180 | },
181 | "id": "44582fec-116b-475d-9793-ad29580b7fc2"
182 | }
183 | }
184 | },
185 | "35a7a5a6-f0c3-4b68-9579-e5840160a87d": {
186 | "id": "35a7a5a6-f0c3-4b68-9579-e5840160a87d",
187 | "prev": "c7291188-b014-4fcb-83bc-f1ea035ee4c9",
188 | "regions": {
189 | "cbc80f26-e933-4d90-9dfd-e55a3dc339ba": {
190 | "attrs": {
191 | "height": 0.8,
192 | "width": 0.8,
193 | "x": 0.1,
194 | "y": 0.1
195 | },
196 | "content": {
197 | "cell": "558af430-f4c0-4be9-b1ef-afce5fccd0fa",
198 | "part": "whole"
199 | },
200 | "id": "cbc80f26-e933-4d90-9dfd-e55a3dc339ba"
201 | }
202 | }
203 | },
204 | "3ecd0fe6-e75f-4362-a6e7-e5273e12058e": {
205 | "id": "3ecd0fe6-e75f-4362-a6e7-e5273e12058e",
206 | "prev": "2d5e1e8f-2e26-415e-8d8d-9229f2dc1244",
207 | "regions": {
208 | "eac20970-b45c-4073-a596-cdb2a974fe23": {
209 | "attrs": {
210 | "height": 0.8,
211 | "width": 0.8,
212 | "x": 0.1,
213 | "y": 0.1
214 | },
215 | "content": {
216 | "cell": "de60c848-d1fb-478d-a736-0ebe21762a24",
217 | "part": "whole"
218 | },
219 | "id": "eac20970-b45c-4073-a596-cdb2a974fe23"
220 | }
221 | }
222 | },
223 | "4e939c85-e2b3-48b3-b2df-389bc0ca7dd0": {
224 | "id": "4e939c85-e2b3-48b3-b2df-389bc0ca7dd0",
225 | "prev": "9807c9b8-54cd-4a78-b50d-1a6e9f7ff066",
226 | "regions": {
227 | "ddc97209-2f2f-409d-953c-a9a83dec6738": {
228 | "attrs": {
229 | "height": 0.8,
230 | "width": 0.8,
231 | "x": 0.1,
232 | "y": 0.1
233 | },
234 | "content": {
235 | "cell": "674ee724-0165-40c5-9296-83db8305fa4c",
236 | "part": "whole"
237 | },
238 | "id": "ddc97209-2f2f-409d-953c-a9a83dec6738"
239 | }
240 | }
241 | },
242 | "95bf00f9-fc2a-4478-bd48-fb66078e061f": {
243 | "id": "95bf00f9-fc2a-4478-bd48-fb66078e061f",
244 | "prev": "b9fa7815-a205-4ea6-9076-1289b96670cf",
245 | "regions": {
246 | "dd7aef1b-3f05-47a5-bc5f-274809d2c21d": {
247 | "attrs": {
248 | "height": 0.8,
249 | "width": 0.8,
250 | "x": 0.1,
251 | "y": 0.1
252 | },
253 | "content": {
254 | "cell": "61e1167e-99ef-4b5d-b717-07a46077a091",
255 | "part": "whole"
256 | },
257 | "id": "dd7aef1b-3f05-47a5-bc5f-274809d2c21d"
258 | }
259 | }
260 | },
261 | "966d5c12-49ef-4129-aecb-b183804ecd19": {
262 | "id": "966d5c12-49ef-4129-aecb-b183804ecd19",
263 | "prev": "3ecd0fe6-e75f-4362-a6e7-e5273e12058e",
264 | "regions": {
265 | "ff0704a2-f662-4a03-9874-5613f4634956": {
266 | "attrs": {
267 | "height": 0.8,
268 | "width": 0.8,
269 | "x": 0.1,
270 | "y": 0.1
271 | },
272 | "content": {
273 | "cell": "a6fd92a3-b57e-45c5-b216-f9f475baf8ce",
274 | "part": "whole"
275 | },
276 | "id": "ff0704a2-f662-4a03-9874-5613f4634956"
277 | }
278 | }
279 | },
280 | "9807c9b8-54cd-4a78-b50d-1a6e9f7ff066": {
281 | "id": "9807c9b8-54cd-4a78-b50d-1a6e9f7ff066",
282 | "prev": "966d5c12-49ef-4129-aecb-b183804ecd19",
283 | "regions": {
284 | "9e037b47-3fb5-4a60-b69c-ad3b282807a1": {
285 | "attrs": {
286 | "height": 0.8,
287 | "width": 0.8,
288 | "x": 0.1,
289 | "y": 0.1
290 | },
291 | "content": {
292 | "cell": "b79fa570-8c08-4820-a035-2a00bfae1a9b",
293 | "part": "whole"
294 | },
295 | "id": "9e037b47-3fb5-4a60-b69c-ad3b282807a1"
296 | }
297 | }
298 | },
299 | "9ba8c8c8-59a7-4776-84cb-084e5b0a2317": {
300 | "id": "9ba8c8c8-59a7-4776-84cb-084e5b0a2317",
301 | "prev": "9e2b6ffd-bec2-4027-93e8-64b63e770378",
302 | "regions": {
303 | "45bb0df9-937a-48a6-882b-346a1253250c": {
304 | "attrs": {
305 | "height": 0.8,
306 | "width": 0.8,
307 | "x": 0.1,
308 | "y": 0.1
309 | },
310 | "content": {
311 | "cell": "be5bedf1-b9ed-4caa-bc3e-6c390df97946",
312 | "part": "whole"
313 | },
314 | "id": "45bb0df9-937a-48a6-882b-346a1253250c"
315 | }
316 | }
317 | },
318 | "9e2b6ffd-bec2-4027-93e8-64b63e770378": {
319 | "id": "9e2b6ffd-bec2-4027-93e8-64b63e770378",
320 | "prev": "e9d31a55-0862-44ec-bd48-d74167655985",
321 | "regions": {
322 | "7a7c6996-8117-42ce-8093-318b84bd052b": {
323 | "attrs": {
324 | "height": 0.8,
325 | "width": 0.8,
326 | "x": 0.1,
327 | "y": 0.1
328 | },
329 | "content": {
330 | "cell": "95b34ac1-b36d-492d-84f9-adafb2d57ace",
331 | "part": "whole"
332 | },
333 | "id": "7a7c6996-8117-42ce-8093-318b84bd052b"
334 | }
335 | }
336 | },
337 | "ae354c9e-1384-4f31-8e03-5c96f3988bf4": {
338 | "id": "ae354c9e-1384-4f31-8e03-5c96f3988bf4",
339 | "prev": null,
340 | "regions": {
341 | "9e57ef10-c941-41de-93da-812357c7ec21": {
342 | "attrs": {
343 | "height": 0.8,
344 | "width": 0.8,
345 | "x": 0.1,
346 | "y": 0.1
347 | },
348 | "content": {
349 | "cell": "dac6427e-b8df-46f9-bfd3-b24427a73993",
350 | "part": "whole"
351 | },
352 | "id": "9e57ef10-c941-41de-93da-812357c7ec21"
353 | }
354 | }
355 | },
356 | "b9fa7815-a205-4ea6-9076-1289b96670cf": {
357 | "id": "b9fa7815-a205-4ea6-9076-1289b96670cf",
358 | "prev": "ae354c9e-1384-4f31-8e03-5c96f3988bf4",
359 | "regions": {
360 | "cc19ec36-3caa-4666-86e7-2bd1d9cb03b8": {
361 | "attrs": {
362 | "height": 0.8,
363 | "width": 0.8,
364 | "x": 0.1,
365 | "y": 0.1
366 | },
367 | "content": {
368 | "cell": "c7392535-4666-41a5-a68a-7306dccd6cd8",
369 | "part": "whole"
370 | },
371 | "id": "cc19ec36-3caa-4666-86e7-2bd1d9cb03b8"
372 | }
373 | }
374 | },
375 | "c7291188-b014-4fcb-83bc-f1ea035ee4c9": {
376 | "id": "c7291188-b014-4fcb-83bc-f1ea035ee4c9",
377 | "prev": "006f01ca-e160-4faa-ad02-2f873362ca99",
378 | "regions": {
379 | "3fa8900b-1ee2-4625-8e0b-a8a52d95390d": {
380 | "attrs": {
381 | "height": 0.8,
382 | "width": 0.8,
383 | "x": 0.1,
384 | "y": 0.1
385 | },
386 | "content": {
387 | "cell": "a912604c-786a-448e-a908-397f28b46a13",
388 | "part": "whole"
389 | },
390 | "id": "3fa8900b-1ee2-4625-8e0b-a8a52d95390d"
391 | }
392 | }
393 | },
394 | "e60ea09b-1474-49b0-9ea6-2e803b335693": {
395 | "id": "e60ea09b-1474-49b0-9ea6-2e803b335693",
396 | "prev": "4e939c85-e2b3-48b3-b2df-389bc0ca7dd0",
397 | "regions": {
398 | "3df2ec23-ba8a-4b76-b6af-2a4aa219da46": {
399 | "attrs": {
400 | "height": 0.8,
401 | "width": 0.8,
402 | "x": 0.1,
403 | "y": 0.1
404 | },
405 | "content": {
406 | "cell": "06d04c6d-90a4-441d-9d6e-4f719490e12e",
407 | "part": "whole"
408 | },
409 | "id": "3df2ec23-ba8a-4b76-b6af-2a4aa219da46"
410 | }
411 | }
412 | },
413 | "e9d31a55-0862-44ec-bd48-d74167655985": {
414 | "id": "e9d31a55-0862-44ec-bd48-d74167655985",
415 | "prev": "95bf00f9-fc2a-4478-bd48-fb66078e061f",
416 | "regions": {
417 | "cc6bd695-e238-4250-8822-ffea3f82f544": {
418 | "attrs": {
419 | "height": 0.8,
420 | "width": 0.8,
421 | "x": 0.1,
422 | "y": 0.1
423 | },
424 | "content": {
425 | "cell": "86c3f014-9535-48f0-95a2-df74d16eaa69",
426 | "part": "whole"
427 | },
428 | "id": "cc6bd695-e238-4250-8822-ffea3f82f544"
429 | }
430 | }
431 | }
432 | },
433 | "themes": {}
434 | }
435 | },
436 | "nbformat": 4,
437 | "nbformat_minor": 4
438 | }
439 |
--------------------------------------------------------------------------------
/04-intro-desc-stat/Conf1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/04-intro-desc-stat/Conf1.png
--------------------------------------------------------------------------------
/04-intro-desc-stat/Conf2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/04-intro-desc-stat/Conf2.png
--------------------------------------------------------------------------------
/04-intro-desc-stat/SmallLargeStandDev.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/04-intro-desc-stat/SmallLargeStandDev.png
--------------------------------------------------------------------------------
/04-intro-desc-stat/correlation.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/04-intro-desc-stat/correlation.png
--------------------------------------------------------------------------------
/04-intro-desc-stat/global_warming.csv:
--------------------------------------------------------------------------------
1 | CO_2,Temp
2 | 314,13.9
3 | 317,14
4 | 320,13.9
5 | 326,14.1
6 | 331,14
7 | 339,14.3
8 | 346,14.1
9 | 354,14.5
10 | 361,14.5
11 | 369,14.4
--------------------------------------------------------------------------------
/04-intro-desc-stat/purity.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/04-intro-desc-stat/purity.png
--------------------------------------------------------------------------------
/04-intro-desc-stat/test_data.csv:
--------------------------------------------------------------------------------
1 | Age,Weight,Gender
2 | 23,123,M
3 | 523,345,F
4 | 45,234,M
5 | 67,21,F
--------------------------------------------------------------------------------
/05-dictionaries-pandas-series/05-exercises.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Lecture 5: Exercise Solutions"
8 | ]
9 | },
10 | {
11 | "cell_type": "markdown",
12 | "metadata": {
13 | "collapsed": true
14 | },
15 | "source": [
16 | "## Exercise 2: Sets\n",
17 | "\n",
 18 |     "Write a function that finds the overlap of two sets and prints it.\n",
19 | "Initialize two sets, e.g., with values {13, 25, 37, 45, 13} and {14, 25, 38, 8, 45} and call this function with them."
20 | ]
21 | },
22 | {
23 | "cell_type": "code",
24 | "execution_count": null,
25 | "metadata": {},
26 | "outputs": [],
27 | "source": []
28 | },
29 | {
30 | "cell_type": "markdown",
31 | "metadata": {},
32 | "source": [
33 | "## Exercise 3: Dictionaries\n",
34 | "\n",
35 | " * Create a dictionary with two-letter codes of two US states and the full names, e.g., UT: Utah, NY: New York\n",
36 | " * After initially creating the dictionary, add two more states to the dictionary.\n",
37 | " * Create a second dictionary that maps the state codes to an array of cities in that state, e.g., UT: [Salt Lake City, Ogden, Provo, St.George]. \n",
38 | " * Write a function that takes a state code and prints the full name of the state and lists the cities in that state."
39 | ]
40 | },
41 | {
42 | "cell_type": "code",
43 | "execution_count": null,
44 | "metadata": {},
45 | "outputs": [],
46 | "source": []
47 | },
48 | {
49 | "cell_type": "markdown",
50 | "metadata": {},
51 | "source": [
52 | "## Exercise 4: Objects\n",
53 | "\n",
54 | "Create a class `Pet` with members `name, pronoun, animal,` and `pet_response`. \n",
55 | "\n",
 56 |     "Add a method `pet()` which prints a response that is composed of the members, like this: \n",
57 | "\n",
58 | "`Layla is a cat. If you pet her, Layla purrs.` \n",
 59 |     "`Bond is a dog. If you pet him, Bond wags his tail.`\n"
60 | ]
61 | },
62 | {
63 | "cell_type": "code",
64 | "execution_count": null,
65 | "metadata": {},
66 | "outputs": [],
67 | "source": []
68 | },
69 | {
70 | "cell_type": "markdown",
71 | "metadata": {},
72 | "source": [
73 | "Here are some example calls: "
74 | ]
75 | },
76 | {
77 | "cell_type": "code",
78 | "execution_count": null,
79 | "metadata": {},
80 | "outputs": [],
81 | "source": [
82 | "layla = Pet(\"Layla\", \"her\", \"cat\", \"purrs\")\n",
83 | "scout = Pet(\"Bond\", \"him\", \"dog\", \"wags his tail\")\n",
84 | "\n",
85 | "layla.pet()\n",
86 | "scout.pet()"
87 | ]
88 | },
89 | {
90 | "cell_type": "markdown",
91 | "metadata": {
92 | "collapsed": true
93 | },
94 | "source": [
95 | "## Exercise 6: Pandas Series"
96 | ]
97 | },
98 | {
99 | "cell_type": "markdown",
100 | "metadata": {},
101 | "source": [
 102 |     "Create a new pandas series with the lists given below that contain NFL team names and the number of Super Bowl titles they won. Use the names as indices and the wins as the data; a toy sketch of the relevant Series operations follows the list.\n",
103 | "\n",
104 | " * Once the list is created, sort the series alphabetically by index. \n",
105 | " * Print an overview of the statistical properties of the series. What's the mean number of wins?\n",
 106 |     " * Filter out all teams that have won fewer than four Super Bowl titles\n",
107 | " * A football team has 45 players. Update the series so that instead of the number of titles, it reflects the number of Super Bowl rings given to the players. \n",
108 | " * Assume that each ring costs USD 32,000. Update the series so that it contains a string of the dollar amount including the \\$ sign. For the Steelers, for example, this would correspond to: \n",
109 | " ```\n",
110 | " Pittsburgh Steelers $ 8640000\n",
111 | " ```\n",
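 112 |     "\n",
 113 |     "As a non-solution reminder, here is a toy sketch of the Series operations listed above, with made-up data:\n",
 114 |     "\n",
 115 |     "```python\n",
 116 |     "import pandas as pd\n",
 117 |     "\n",
 118 |     "s = pd.Series([3, 1, 2], index=[\"c\", \"a\", \"b\"])  # labels as index, numbers as data\n",
 119 |     "s = s.sort_index()                    # sort alphabetically by index\n",
 120 |     "s.describe()                          # statistical overview, including the mean\n",
 121 |     "s = s[s >= 2]                         # filter by value\n",
 122 |     "s = s * 45                            # scale every entry\n",
 123 |     "s = s.apply(lambda x: \"$ \" + str(x))  # turn each entry into a dollar string\n",
 124 |     "```\n",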
112 | "\n"
113 | ]
114 | },
115 | {
116 | "cell_type": "code",
117 | "execution_count": null,
118 | "metadata": {},
119 | "outputs": [],
120 | "source": [
121 | "teams = [\"New England Patriots\",\n",
122 | " \"Pittsburgh Steelers\",\n",
123 | " \"Dallas Cowboys\",\n",
124 | " \"San Francisco 49ers\",\n",
125 | " \"Green Bay Packers\",\n",
126 | " \"New York Giants\",\n",
127 | " \"Denver Broncos\",\n",
128 | " \"Oakland/Los Angeles/Las Vegas Raiders\",\n",
 129 |     " \"Washington Commanders\",\n",
130 | " \"Miami Dolphins\",\n",
131 | " \"Baltimore/Indianapolis Colts\",\n",
132 | " \"Baltimore Ravens\",\n",
133 | " \"Los Angeles/St. Louis Rams\",\n",
134 | " \"Tampa Bay Buccaneers\"]\n",
135 | "wins = [6,6,5,5,4,4,3,3,3,2,2,2,2,2]"
136 | ]
137 | },
138 | {
139 | "cell_type": "code",
140 | "execution_count": null,
141 | "metadata": {},
142 | "outputs": [],
143 | "source": [
144 | "import pandas as pd"
145 | ]
146 | }
147 | ],
148 | "metadata": {
149 | "anaconda-cloud": {},
150 | "kernelspec": {
151 | "display_name": "Python 3 (ipykernel)",
152 | "language": "python",
153 | "name": "python3"
154 | },
155 | "language_info": {
156 | "codemirror_mode": {
157 | "name": "ipython",
158 | "version": 3
159 | },
160 | "file_extension": ".py",
161 | "mimetype": "text/x-python",
162 | "name": "python",
163 | "nbconvert_exporter": "python",
164 | "pygments_lexer": "ipython3",
165 | "version": "3.9.6"
166 | }
167 | },
168 | "nbformat": 4,
169 | "nbformat_minor": 1
170 | }
171 |
--------------------------------------------------------------------------------
/06-loading-data-dataframes/06-exercises.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {
6 | "nbpresent": {
7 | "id": "50a40f10-f4b6-4dd3-b7aa-c63ed3ca3244"
8 | },
9 | "slideshow": {
10 | "slide_type": "slide"
11 | }
12 | },
13 | "source": [
14 | "# Introduction to Data Science – Lecture 6 – Exercises\n",
15 | "\n",
16 | "*COMP 5360 / MATH 4100, University of Utah, http://datasciencecourse.net/*"
17 | ]
18 | },
19 | {
20 | "cell_type": "markdown",
21 | "metadata": {},
22 | "source": [
23 | "## Exercise 1: Reading and Writing Data\n",
24 | "\n",
25 | "The file [grades.csv](grades.csv) is a file with student names and letter grades:\n",
26 | "\n",
27 | "```\n",
28 | "Alice; A\n",
29 | "Bob; B\n",
30 | "Robert; A\n",
31 | "Richard; C\n",
32 | "```\n",
33 | "\n",
 34 |     "Read the file into an array. Add a GPA to each student's row (A=4, B=3, C=2, D=1). \n",
35 | "\n",
 36 |     "Hint: the function [strip()](https://docs.python.org/3/library/stdtypes.html#str.strip) removes leading and trailing whitespace from a string.\n",
37 | "\n",
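 38 |     "For example, a quick sketch of what `strip()` does:\n",
 39 |     "\n",
 40 |     "```python\n",
 41 |     "\"Bob; B\\n\".strip()   # returns 'Bob; B' (trailing newline removed)\n",
 42 |     "\" A \".strip()        # returns 'A' (surrounding spaces removed)\n",
 43 |     "```\n",
 44 |     "\n",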
 38 |     "Write that file into a new file `grades_gpa.csv`."
39 | ]
40 | },
41 | {
42 | "cell_type": "code",
43 | "execution_count": 1,
44 | "metadata": {},
45 | "outputs": [],
46 | "source": [
47 | "gpas = {\"A\":4, \"B\":3, \"C\":2, \"D\":1}\n",
48 | "\n"
49 | ]
50 | },
51 | {
52 | "cell_type": "markdown",
53 | "metadata": {},
54 | "source": [
55 | "## Exercise 2: Data Frames\n",
56 | "\n",
57 | "* Calculate the mean certified sales for all albums."
58 | ]
59 | },
60 | {
61 | "cell_type": "code",
62 | "execution_count": 3,
63 | "metadata": {},
64 | "outputs": [],
65 | "source": [
66 | "import pandas as pd\n",
67 | "hit_albums = pd.read_csv(\"hit_albums.csv\")"
68 | ]
69 | },
70 | {
71 | "cell_type": "code",
72 | "execution_count": null,
73 | "metadata": {},
74 | "outputs": [],
75 | "source": []
76 | },
77 | {
78 | "cell_type": "markdown",
79 | "metadata": {},
80 | "source": [
81 | " * Create a new dataframe that only contains albums with more than 20 million certified sales.\n"
82 | ]
83 | },
84 | {
85 | "cell_type": "code",
86 | "execution_count": null,
87 | "metadata": {
88 | "scrolled": true
89 | },
90 | "outputs": [],
91 | "source": []
92 | },
93 | {
94 | "cell_type": "markdown",
95 | "metadata": {},
96 | "source": [
97 | "\n",
98 | " * Create a new dataframe based on the hit_albums dataset that only contains the artists that have at least two albums in the list."
99 | ]
100 | },
101 | {
102 | "cell_type": "code",
103 | "execution_count": null,
104 | "metadata": {},
105 | "outputs": [],
106 |    "source": [
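107 |     "# A minimal sketch: groupby().filter() keeps the original rows of every\n",
108 |     "# group that satisfies the predicate -- here, artists with >= 2 albums.\n",
109 |     "hit_albums.groupby(\"Artist\").filter(lambda g: len(g) >= 2)"
110 |    ]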
107 | },
108 | {
109 | "cell_type": "markdown",
110 | "metadata": {},
111 | "source": [
112 |     "* Create a new dataframe that contains the aggregate sum of certified sales for each year."
113 | ]
114 | },
115 | {
116 | "cell_type": "code",
117 | "execution_count": null,
118 | "metadata": {},
119 | "outputs": [],
120 |    "source": [
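121 |     "# A minimal sketch: group by release year, then sum the certified sales.\n",
122 |     "hit_albums.groupby(\"Released\")[\"Certified sales (millions)\"].sum().to_frame()"
123 |    ]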
121 | }
122 | ],
123 | "metadata": {
124 | "anaconda-cloud": {},
125 | "kernelspec": {
126 | "display_name": "Python 3 (ipykernel)",
127 | "language": "python",
128 | "name": "python3"
129 | },
130 | "language_info": {
131 | "codemirror_mode": {
132 | "name": "ipython",
133 | "version": 3
134 | },
135 | "file_extension": ".py",
136 | "mimetype": "text/x-python",
137 | "name": "python",
138 | "nbconvert_exporter": "python",
139 | "pygments_lexer": "ipython3",
140 | "version": "3.9.16"
141 | },
142 | "nbpresent": {
143 | "slides": {
144 | "19a6495f-8346-4b23-a98a-c7115941e8f0": {
145 | "id": "19a6495f-8346-4b23-a98a-c7115941e8f0",
146 | "prev": null,
147 | "regions": {
148 | "d6523b36-7204-4001-8a6c-37c431b18d26": {
149 | "attrs": {
150 | "height": 1,
151 | "width": 1,
152 | "x": 0,
153 | "y": 0
154 | },
155 | "id": "d6523b36-7204-4001-8a6c-37c431b18d26"
156 | }
157 | }
158 | }
159 | },
160 | "themes": {}
161 | }
162 | },
163 | "nbformat": 4,
164 | "nbformat_minor": 1
165 | }
166 |
--------------------------------------------------------------------------------
/06-loading-data-dataframes/grades.csv:
--------------------------------------------------------------------------------
1 | Alice; A
2 | Bob; B
3 | Robert; A
4 | Richard; C
5 |
--------------------------------------------------------------------------------
/06-loading-data-dataframes/hit_albums.csv:
--------------------------------------------------------------------------------
1 | Artist,Album,Released,Genre,"Certified sales (millions)",Claimed sales (millions)
2 | Michael Jackson,Thriller,1982,"Pop, rock, R&B",45.4,65
3 | AC/DC,Back in Black,1980,Hard rock,25.9,50
4 | Pink Floyd,The Dark Side of the Moon,1973,Progressive rock,22.7,45
5 | Whitney Houston / Various artists,The Bodyguard,1992,"Soundtrack/R&B, soul, pop",27.4,44
6 | Meat Loaf,Bat Out of Hell,1977,"Hard rock, progressive rock",20.6,43
7 | Eagles,Their Greatest Hits (1971–1975),1976,"Rock, soft rock, folk rock",32.2,42
8 | Bee Gees / Various artists,Saturday Night Fever,1977,Disco,19,40
9 | Fleetwood Mac,Rumours,1977,Soft rock,27.9,40
10 | Shania Twain,Come On Over,1997,"Country, pop",29.6,39
11 | Led Zeppelin,Led Zeppelin IV,1971,"Hard rock, heavy metal",29,37
12 | Michael Jackson,Bad,1987,"Pop, funk, rock",20.3,34
13 | Alanis Morissette,Jagged Little Pill,1995,Alternative rock,24.8,33
14 | Celine Dion,Falling into You,1996,"Pop, Soft rock",20.2,32
15 | The Beatles,Sgt. Pepper's Lonely Hearts Club Band,1967,Rock,13.1,32
16 | Eagles,Hotel California,1976,"Rock, soft rock, folk rock",21.5,32
17 | Mariah Carey,Music Box,1993,"Pop, R&B, Rock",19,32
18 | Michael Jackson,Dangerous,1991,"Rock, Funk, Pop",17.6,32
19 | Various artists,Dirty Dancing,1987,"Pop, rock, R&B",17.9,32
20 | Celine Dion,Let's Talk About Love,1997,"Pop, Soft rock",19.3,31
21 | The Beatles,1,2000,Rock,21.6,31
22 | Adele,21,2011,"Pop, soul",22.3,30
23 | The Beatles,Abbey Road,1969,Rock,14.4,30
24 | Bruce Springsteen,Born in the U.S.A.,1984,Rock,19.6,30
25 | Dire Straits,Brothers in Arms,1985,Rock,17.7,30
26 | James Horner,Titanic: Music from the Motion Picture,1997,Soundtrack,18.1,30
27 | Madonna,The Immaculate Collection,1990,"Pop, Dance",19.4,30
28 | Metallica,Metallica,1991,"Thrash metal, heavy metal",19.9,30
29 | Nirvana,Nevermind,1991,"Grunge, alternative rock",16.7,30
30 | Pink Floyd,The Wall,1979,Progressive rock,17.6,30
31 | Santana,Supernatural,1999,Rock,20.5,30
32 | Guns N' Roses,Appetite for Destruction,1987,"Heavy metal, hard rock",21.3,30
33 | ABBA,Gold: Greatest Hits,1992,Pop,29,
34 | Bon Jovi,Slippery When Wet,1986,Hard rock,28,
35 | Spice Girls,Spice,1996,Pop,28,
36 | Various artists,Grease: The Original Soundtrack from the Motion Picture,1978,Soundtrack,28,
37 | Britney Spears,...Baby One More Time,1999,Pop,28,
38 | Linkin Park,Hybrid Theory,2000,"Nu metal, rap metal, alternative metal",27,
39 | Bob Marley & The Wailers,Legend: The Best of Bob Marley & The Wailers,1984,Reggae,25,
40 | Carole King,Tapestry,1971,Pop,25,
41 | Madonna,Like a Virgin,1984,"Pop, dance",25,
42 | Madonna,True Blue,1986,Pop,25,
43 | Mariah Carey,Daydream,1995,"Pop, R&B",25,
44 | Norah Jones,Come Away with Me,2002,Jazz,25,
45 | Phil Collins,No Jacket Required,1985,"Pop, Rock",25,
46 | Queen,Greatest Hits,1981,Rock,25,
47 | Simon & Garfunkel,Bridge over Troubled Water,1970,Folk rock,25,
48 | U2,The Joshua Tree,1987,Rock,25,
49 | Whitney Houston,Whitney Houston,1985,"Pop, R&B",25,
50 | Backstreet Boys,Backstreet's Back / Backstreet Boys,1997,Pop,24,
51 | Backstreet Boys,Millennium,1999,Pop,24,
52 | Ace of Base,Happy Nation/The Sign,1993,Pop,23,
53 | TLC,CrazySexyCool,1994,"R&B, hip hop",23,
54 | Cyndi Lauper,She's So Unusual,1983,"New wave, pop rock, synthpop",22,
55 | Oasis,(What's the Story) Morning Glory?,1995,"Britpop, rock",22,
56 | Bon Jovi,Cross Road,1994,Hard rock,21,
57 | Eminem,The Marshall Mathers LP,2000,"Rap, hip-hop",21,
58 | Adele,25,2015,"Soul, pop, R&B",20,
59 | Avril Lavigne,Let Go,2002,"Pop rock, alternative rock, post-grunge",20,
60 | Boston,Boston,1976,Hard rock,20,
61 | Britney Spears,Oops!... I Did It Again,2000,Pop,20,
62 | Eric Clapton,Unplugged,1992,"Acoustic blues, folk rock",20,
63 | Def Leppard,Hysteria,1987,"Pop, Hard rock",20,
64 | George Michael,Faith,1987,"Pop, R&B",20,
65 | Green Day,Dookie,1994,"Pop punk, punk rock, alternative rock",20,
66 | Lionel Richie,Can't Slow Down,1983,"Pop, R&B, soul",20,
67 | Michael Jackson,"HIStory: Past, Present and Future, Book I",1995,"Pop, rock, R&B",20,
68 | Michael Jackson,Off the Wall,1979,"Soul, disco, R&B",20,
69 | Prince & the Revolution,Purple Rain,1984,"Pop, rock, R&B",20,
70 | Shania Twain,The Woman in Me,1995,"Country, pop",20,
71 | Shania Twain,Up!,2002,"Country, pop, world music",20,
72 | Supertramp,Breakfast in America,1979,"Progressive rock, art rock",20,
73 | Tina Turner,Private Dancer,1984,"Pop, rock, R&B",20,
74 | Tracy Chapman,Tracy Chapman,1988,Folk rock,20,
75 | Usher,Confessions,2004,R&B,20,
76 | Various artists,Flashdance: Original Soundtrack from the Motion Picture,1983,Soundtrack,20,
77 | Whitney Houston,Whitney,1987,"Pop, R&B",20,
78 | Shakira,Laundry Service,2001,"Pop, Rock",20,
79 |
--------------------------------------------------------------------------------
/06-loading-data-dataframes/my_file.txt:
--------------------------------------------------------------------------------
1 | Hello World
2 | Are you still spinning?
3 |
--------------------------------------------------------------------------------
/07-hypothesis-testing/Cohen1994.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/07-hypothesis-testing/Cohen1994.pdf
--------------------------------------------------------------------------------
/07-hypothesis-testing/InferenceErrors.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/07-hypothesis-testing/InferenceErrors.png
--------------------------------------------------------------------------------
/07-hypothesis-testing/Nuzzo2014.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/07-hypothesis-testing/Nuzzo2014.pdf
--------------------------------------------------------------------------------
/07-hypothesis-testing/determinePvals.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/07-hypothesis-testing/determinePvals.png
--------------------------------------------------------------------------------
/08-data-science-ethics/Data-Science-Ethics.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/08-data-science-ethics/Data-Science-Ethics.pdf
--------------------------------------------------------------------------------
/09-LinearRegression1/08-LinearRegression1_Activity.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {
6 | "slideshow": {
7 | "slide_type": "slide"
8 | }
9 | },
10 | "source": [
11 | "# Introduction to Data Science \n",
12 | "# Activity for Lecture 8: Linear Regression 1\n",
13 | "*COMP 5360 / MATH 4100, University of Utah, http://datasciencecourse.net/*\n",
14 | "\n",
15 | "Name:\n",
16 | "\n",
17 | "Email:\n",
18 | "\n",
19 | "UID:\n"
20 | ]
21 | },
22 | {
23 | "cell_type": "markdown",
24 | "metadata": {
25 | "slideshow": {
26 | "slide_type": "slide"
27 | }
28 | },
29 | "source": [
30 | "## Class exercise: amphetamine and appetite\n",
31 | "\n",
32 |     "Amphetamine is a drug that suppresses appetite. In a study of this effect, a pharmacologist randomly allocated 24 rats to three treatment groups to receive an injection of amphetamine at one of two dosage levels (2.5 mg/kg or 5.0 mg/kg), or an injection of saline solution (0 mg/kg). She measured the amount of food consumed by each animal (in gm per kg of body weight) in the 3-hour period following injection. The results are shown below.\n"
33 | ]
34 | },
35 | {
36 | "cell_type": "code",
37 | "execution_count": 1,
38 | "metadata": {},
39 | "outputs": [],
40 | "source": [
41 | "# imports and setup\n",
42 | "\n",
43 | "import scipy as sc\n",
44 | "import numpy as np\n",
45 | "\n",
46 | "import pandas as pd\n",
47 | "import statsmodels.formula.api as sm \n",
48 | "from sklearn import linear_model \n",
49 | "\n",
50 | "import matplotlib.pyplot as plt\n",
51 | "%matplotlib inline \n",
52 | "plt.rcParams['figure.figsize'] = (10, 6)\n",
53 | "\n",
54 | "from mpl_toolkits.mplot3d import Axes3D\n",
55 | "from matplotlib import cm\n",
56 | "\n",
57 | "# Experiment results:\n",
58 | "\n",
59 | "food_consump_dose0 = [112.6, 102.1, 90.2, 81.5, 105.6, 93.0, 106.6, 108.3]\n",
60 | "food_consump_dose2p5 = [73.3, 84.8, 67.3, 55.3, 80.7, 90.0, 75.5, 77.1]\n",
61 | "food_consump_dose5 = [38.5, 81.3, 57.1, 62.3, 51.5, 48.3, 42.7, 57.9]"
62 | ]
63 | },
64 | {
65 | "cell_type": "markdown",
66 | "metadata": {},
67 | "source": [
68 | "## Activity 1: Scatterplot and Linear Regression\n",
69 | "\n",
70 |     "**Exercise:** Make a scatter plot with dose as the $x$-variable and food consumption as the $y$-variable. Then run a linear regression on the data using the 'ols' function from the statsmodels Python library to relate the variables by \n",
71 | "\n",
72 | "$$\n",
73 | "\\text{Food Consumption} = \\beta_0 + \\beta_1 \\text{Dose}. \n",
74 | "$$\n",
75 | "\n",
76 | "What is the resulting linear equation? What is the $R^2$ value? Do you think the variables have a strong linear relationship? Add the line to your scatter plot.\n"
77 | ]
78 | },
79 | {
80 | "cell_type": "code",
81 | "execution_count": 18,
82 | "metadata": {},
83 | "outputs": [],
84 | "source": [
85 |     "# your code goes here\n",
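86 |     "\n",
87 |     "# A minimal sketch using the formula API imported above as 'sm'; the\n",
88 |     "# column names Dose and Consumption are my own choice:\n",
89 |     "doses = [0]*8 + [2.5]*8 + [5.0]*8\n",
90 |     "food = food_consump_dose0 + food_consump_dose2p5 + food_consump_dose5\n",
91 |     "df = pd.DataFrame({\"Dose\": doses, \"Consumption\": food})\n",
92 |     "\n",
93 |     "model = sm.ols(\"Consumption ~ Dose\", data=df).fit()\n",
94 |     "print(model.summary())   # coefficients and R^2\n",
95 |     "\n",
96 |     "plt.scatter(df[\"Dose\"], df[\"Consumption\"])\n",
97 |     "plt.plot(df[\"Dose\"], model.predict(df), color=\"red\")\n",
98 |     "plt.xlabel(\"Dose (mg/kg)\")\n",
99 |     "plt.ylabel(\"Food consumption (gm/kg)\")"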
86 | ]
87 | },
88 | {
89 | "cell_type": "markdown",
90 | "metadata": {},
91 | "source": [
92 | "**Your answer goes here:**"
93 | ]
94 | },
95 | {
96 | "cell_type": "markdown",
97 | "metadata": {
98 | "slideshow": {
99 | "slide_type": "slide"
100 | }
101 | },
102 | "source": [
103 | "## Activity 2: Residuals\n",
104 | "\n",
105 | "The regression in Activity 1 is in fact valid even though the predictor $x$ only has 3 distinct values; for each fixed value of $x$, the researcher collected a random sample of $y$ values.\n",
106 | "\n",
107 |     "However, one assumption made by simple linear regression is that the residuals are approximately normally distributed.\n",
108 | "\n",
109 | "**Exercise:** Compute the residuals for the above regression and make a normal probability plot of the residuals. Do you think they are approximately normally distributed? \n",
110 | "\n"
111 | ]
112 | },
113 | {
114 | "cell_type": "code",
115 | "execution_count": 19,
116 | "metadata": {
117 | "slideshow": {
118 | "slide_type": "-"
119 | }
120 | },
121 | "outputs": [],
122 | "source": [
123 |     "# your code goes here \n",
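124 |     "\n",
125 |     "# A minimal sketch, assuming 'model' from Activity 1; scipy.stats.probplot\n",
126 |     "# draws a normal probability (Q-Q) plot of the residuals.\n",
127 |     "import scipy.stats as stats\n",
128 |     "stats.probplot(model.resid, plot=plt)\n",
129 |     "plt.title(\"Normal probability plot of residuals\")\n",
130 |     "plt.show()"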
124 | ]
125 | },
126 | {
127 | "cell_type": "markdown",
128 | "metadata": {
129 | "slideshow": {
130 | "slide_type": "-"
131 | }
132 | },
133 | "source": [
134 | "**Your answer goes here:**\n"
135 | ]
136 | },
137 | {
138 | "cell_type": "code",
139 | "execution_count": null,
140 | "metadata": {},
141 | "outputs": [],
142 | "source": []
143 | }
144 | ],
145 | "metadata": {
146 | "anaconda-cloud": {},
147 | "celltoolbar": "Slideshow",
148 | "kernelspec": {
149 | "display_name": "Python 3 (ipykernel)",
150 | "language": "python",
151 | "name": "python3"
152 | },
153 | "language_info": {
154 | "codemirror_mode": {
155 | "name": "ipython",
156 | "version": 3
157 | },
158 | "file_extension": ".py",
159 | "mimetype": "text/x-python",
160 | "name": "python",
161 | "nbconvert_exporter": "python",
162 | "pygments_lexer": "ipython3",
163 | "version": "3.11.5"
164 | }
165 | },
166 | "nbformat": 4,
167 | "nbformat_minor": 4
168 | }
169 |
--------------------------------------------------------------------------------
/09-LinearRegression1/438px-Linear_regression.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/09-LinearRegression1/438px-Linear_regression.png
--------------------------------------------------------------------------------
/09-LinearRegression1/Advertising.csv:
--------------------------------------------------------------------------------
1 | "","TV","Radio","Newspaper","Sales"
2 | "1",230.1,37.8,69.2,22.1
3 | "2",44.5,39.3,45.1,10.4
4 | "3",17.2,45.9,69.3,9.3
5 | "4",151.5,41.3,58.5,18.5
6 | "5",180.8,10.8,58.4,12.9
7 | "6",8.7,48.9,75,7.2
8 | "7",57.5,32.8,23.5,11.8
9 | "8",120.2,19.6,11.6,13.2
10 | "9",8.6,2.1,1,4.8
11 | "10",199.8,2.6,21.2,10.6
12 | "11",66.1,5.8,24.2,8.6
13 | "12",214.7,24,4,17.4
14 | "13",23.8,35.1,65.9,9.2
15 | "14",97.5,7.6,7.2,9.7
16 | "15",204.1,32.9,46,19
17 | "16",195.4,47.7,52.9,22.4
18 | "17",67.8,36.6,114,12.5
19 | "18",281.4,39.6,55.8,24.4
20 | "19",69.2,20.5,18.3,11.3
21 | "20",147.3,23.9,19.1,14.6
22 | "21",218.4,27.7,53.4,18
23 | "22",237.4,5.1,23.5,12.5
24 | "23",13.2,15.9,49.6,5.6
25 | "24",228.3,16.9,26.2,15.5
26 | "25",62.3,12.6,18.3,9.7
27 | "26",262.9,3.5,19.5,12
28 | "27",142.9,29.3,12.6,15
29 | "28",240.1,16.7,22.9,15.9
30 | "29",248.8,27.1,22.9,18.9
31 | "30",70.6,16,40.8,10.5
32 | "31",292.9,28.3,43.2,21.4
33 | "32",112.9,17.4,38.6,11.9
34 | "33",97.2,1.5,30,9.6
35 | "34",265.6,20,0.3,17.4
36 | "35",95.7,1.4,7.4,9.5
37 | "36",290.7,4.1,8.5,12.8
38 | "37",266.9,43.8,5,25.4
39 | "38",74.7,49.4,45.7,14.7
40 | "39",43.1,26.7,35.1,10.1
41 | "40",228,37.7,32,21.5
42 | "41",202.5,22.3,31.6,16.6
43 | "42",177,33.4,38.7,17.1
44 | "43",293.6,27.7,1.8,20.7
45 | "44",206.9,8.4,26.4,12.9
46 | "45",25.1,25.7,43.3,8.5
47 | "46",175.1,22.5,31.5,14.9
48 | "47",89.7,9.9,35.7,10.6
49 | "48",239.9,41.5,18.5,23.2
50 | "49",227.2,15.8,49.9,14.8
51 | "50",66.9,11.7,36.8,9.7
52 | "51",199.8,3.1,34.6,11.4
53 | "52",100.4,9.6,3.6,10.7
54 | "53",216.4,41.7,39.6,22.6
55 | "54",182.6,46.2,58.7,21.2
56 | "55",262.7,28.8,15.9,20.2
57 | "56",198.9,49.4,60,23.7
58 | "57",7.3,28.1,41.4,5.5
59 | "58",136.2,19.2,16.6,13.2
60 | "59",210.8,49.6,37.7,23.8
61 | "60",210.7,29.5,9.3,18.4
62 | "61",53.5,2,21.4,8.1
63 | "62",261.3,42.7,54.7,24.2
64 | "63",239.3,15.5,27.3,15.7
65 | "64",102.7,29.6,8.4,14
66 | "65",131.1,42.8,28.9,18
67 | "66",69,9.3,0.9,9.3
68 | "67",31.5,24.6,2.2,9.5
69 | "68",139.3,14.5,10.2,13.4
70 | "69",237.4,27.5,11,18.9
71 | "70",216.8,43.9,27.2,22.3
72 | "71",199.1,30.6,38.7,18.3
73 | "72",109.8,14.3,31.7,12.4
74 | "73",26.8,33,19.3,8.8
75 | "74",129.4,5.7,31.3,11
76 | "75",213.4,24.6,13.1,17
77 | "76",16.9,43.7,89.4,8.7
78 | "77",27.5,1.6,20.7,6.9
79 | "78",120.5,28.5,14.2,14.2
80 | "79",5.4,29.9,9.4,5.3
81 | "80",116,7.7,23.1,11
82 | "81",76.4,26.7,22.3,11.8
83 | "82",239.8,4.1,36.9,12.3
84 | "83",75.3,20.3,32.5,11.3
85 | "84",68.4,44.5,35.6,13.6
86 | "85",213.5,43,33.8,21.7
87 | "86",193.2,18.4,65.7,15.2
88 | "87",76.3,27.5,16,12
89 | "88",110.7,40.6,63.2,16
90 | "89",88.3,25.5,73.4,12.9
91 | "90",109.8,47.8,51.4,16.7
92 | "91",134.3,4.9,9.3,11.2
93 | "92",28.6,1.5,33,7.3
94 | "93",217.7,33.5,59,19.4
95 | "94",250.9,36.5,72.3,22.2
96 | "95",107.4,14,10.9,11.5
97 | "96",163.3,31.6,52.9,16.9
98 | "97",197.6,3.5,5.9,11.7
99 | "98",184.9,21,22,15.5
100 | "99",289.7,42.3,51.2,25.4
101 | "100",135.2,41.7,45.9,17.2
102 | "101",222.4,4.3,49.8,11.7
103 | "102",296.4,36.3,100.9,23.8
104 | "103",280.2,10.1,21.4,14.8
105 | "104",187.9,17.2,17.9,14.7
106 | "105",238.2,34.3,5.3,20.7
107 | "106",137.9,46.4,59,19.2
108 | "107",25,11,29.7,7.2
109 | "108",90.4,0.3,23.2,8.7
110 | "109",13.1,0.4,25.6,5.3
111 | "110",255.4,26.9,5.5,19.8
112 | "111",225.8,8.2,56.5,13.4
113 | "112",241.7,38,23.2,21.8
114 | "113",175.7,15.4,2.4,14.1
115 | "114",209.6,20.6,10.7,15.9
116 | "115",78.2,46.8,34.5,14.6
117 | "116",75.1,35,52.7,12.6
118 | "117",139.2,14.3,25.6,12.2
119 | "118",76.4,0.8,14.8,9.4
120 | "119",125.7,36.9,79.2,15.9
121 | "120",19.4,16,22.3,6.6
122 | "121",141.3,26.8,46.2,15.5
123 | "122",18.8,21.7,50.4,7
124 | "123",224,2.4,15.6,11.6
125 | "124",123.1,34.6,12.4,15.2
126 | "125",229.5,32.3,74.2,19.7
127 | "126",87.2,11.8,25.9,10.6
128 | "127",7.8,38.9,50.6,6.6
129 | "128",80.2,0,9.2,8.8
130 | "129",220.3,49,3.2,24.7
131 | "130",59.6,12,43.1,9.7
132 | "131",0.7,39.6,8.7,1.6
133 | "132",265.2,2.9,43,12.7
134 | "133",8.4,27.2,2.1,5.7
135 | "134",219.8,33.5,45.1,19.6
136 | "135",36.9,38.6,65.6,10.8
137 | "136",48.3,47,8.5,11.6
138 | "137",25.6,39,9.3,9.5
139 | "138",273.7,28.9,59.7,20.8
140 | "139",43,25.9,20.5,9.6
141 | "140",184.9,43.9,1.7,20.7
142 | "141",73.4,17,12.9,10.9
143 | "142",193.7,35.4,75.6,19.2
144 | "143",220.5,33.2,37.9,20.1
145 | "144",104.6,5.7,34.4,10.4
146 | "145",96.2,14.8,38.9,11.4
147 | "146",140.3,1.9,9,10.3
148 | "147",240.1,7.3,8.7,13.2
149 | "148",243.2,49,44.3,25.4
150 | "149",38,40.3,11.9,10.9
151 | "150",44.7,25.8,20.6,10.1
152 | "151",280.7,13.9,37,16.1
153 | "152",121,8.4,48.7,11.6
154 | "153",197.6,23.3,14.2,16.6
155 | "154",171.3,39.7,37.7,19
156 | "155",187.8,21.1,9.5,15.6
157 | "156",4.1,11.6,5.7,3.2
158 | "157",93.9,43.5,50.5,15.3
159 | "158",149.8,1.3,24.3,10.1
160 | "159",11.7,36.9,45.2,7.3
161 | "160",131.7,18.4,34.6,12.9
162 | "161",172.5,18.1,30.7,14.4
163 | "162",85.7,35.8,49.3,13.3
164 | "163",188.4,18.1,25.6,14.9
165 | "164",163.5,36.8,7.4,18
166 | "165",117.2,14.7,5.4,11.9
167 | "166",234.5,3.4,84.8,11.9
168 | "167",17.9,37.6,21.6,8
169 | "168",206.8,5.2,19.4,12.2
170 | "169",215.4,23.6,57.6,17.1
171 | "170",284.3,10.6,6.4,15
172 | "171",50,11.6,18.4,8.4
173 | "172",164.5,20.9,47.4,14.5
174 | "173",19.6,20.1,17,7.6
175 | "174",168.4,7.1,12.8,11.7
176 | "175",222.4,3.4,13.1,11.5
177 | "176",276.9,48.9,41.8,27
178 | "177",248.4,30.2,20.3,20.2
179 | "178",170.2,7.8,35.2,11.7
180 | "179",276.7,2.3,23.7,11.8
181 | "180",165.6,10,17.6,12.6
182 | "181",156.6,2.6,8.3,10.5
183 | "182",218.5,5.4,27.4,12.2
184 | "183",56.2,5.7,29.7,8.7
185 | "184",287.6,43,71.8,26.2
186 | "185",253.8,21.3,30,17.6
187 | "186",205,45.1,19.6,22.6
188 | "187",139.5,2.1,26.6,10.3
189 | "188",191.1,28.7,18.2,17.3
190 | "189",286,13.9,3.7,15.9
191 | "190",18.7,12.1,23.4,6.7
192 | "191",39.5,41.1,5.8,10.8
193 | "192",75.5,10.8,6,9.9
194 | "193",17.2,4.1,31.6,5.9
195 | "194",166.8,42,3.6,19.6
196 | "195",149.7,35.6,6,17.3
197 | "196",38.2,3.7,13.8,7.6
198 | "197",94.2,4.9,8.1,9.7
199 | "198",177,9.3,6.4,12.8
200 | "199",283.6,42,66.2,25.5
201 | "200",232.1,8.6,8.7,13.4
202 |
--------------------------------------------------------------------------------
/09-LinearRegression1/SLR.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/09-LinearRegression1/SLR.pdf
--------------------------------------------------------------------------------
/10-LinearRegression2/438px-Linear_regression.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/10-LinearRegression2/438px-Linear_regression.png
--------------------------------------------------------------------------------
/10-LinearRegression2/9-LinearRegression2_Activity.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {
6 | "slideshow": {
7 | "slide_type": "slide"
8 | }
9 | },
10 | "source": [
11 | "# Introduction to Data Science \n",
12 | "# Activity for Lecture 9: Linear Regression 2\n",
13 | "*COMP 5360 / MATH 4100, University of Utah, http://datasciencecourse.net/*\n",
14 | "\n",
15 | "Name:\n",
16 | "\n",
17 | "Email:\n",
18 | "\n",
19 | "UID:\n"
20 | ]
21 | },
22 | {
23 | "cell_type": "markdown",
24 | "metadata": {
25 | "slideshow": {
26 | "slide_type": "slide"
27 | }
28 | },
29 | "source": [
30 | "## Class exercise: analysis of the credit dataset \n",
31 | "\n",
32 | "Recall the 'Credit' dataset introduced in class and available [here](http://www-bcf.usc.edu/~gareth/ISL/data.html). \n",
33 |     "This dataset contains credit card information for 400 people. \n",
34 |     "\n",
35 |     "First import the data and convert Income (reported in thousands of dollars) to dollars.\n"
36 | ]
37 | },
38 | {
39 | "cell_type": "code",
40 | "execution_count": 1,
41 | "metadata": {},
42 | "outputs": [
43 | {
44 | "data": {
45 |       "text/html": [
46 |        "[HTML rendering of the 400-row × 11-column credit dataframe omitted; see the text/plain output below]"
47 |       ],
237 | "text/plain": [
238 | " Income Limit Rating Cards Age Education Gender Student Married \\\n",
239 | "1 14891.0 3606 283 2 34 11 Male No Yes \n",
240 | "2 106025.0 6645 483 3 82 15 Female Yes Yes \n",
241 | "3 104593.0 7075 514 4 71 11 Male No No \n",
242 | "4 148924.0 9504 681 3 36 11 Female No No \n",
243 | "5 55882.0 4897 357 2 68 16 Male No Yes \n",
244 | ".. ... ... ... ... ... ... ... ... ... \n",
245 | "396 12096.0 4100 307 3 32 13 Male No Yes \n",
246 | "397 13364.0 3838 296 5 65 17 Male No No \n",
247 | "398 57872.0 4171 321 5 67 12 Female No Yes \n",
248 | "399 37728.0 2525 192 1 44 13 Male No Yes \n",
249 | "400 18701.0 5524 415 5 64 7 Female No No \n",
250 | "\n",
251 | " Ethnicity Balance \n",
252 | "1 Caucasian 333 \n",
253 | "2 Asian 903 \n",
254 | "3 Asian 580 \n",
255 | "4 Asian 964 \n",
256 | "5 Caucasian 331 \n",
257 | ".. ... ... \n",
258 | "396 Caucasian 560 \n",
259 | "397 African American 480 \n",
260 | "398 Caucasian 138 \n",
261 | "399 Caucasian 0 \n",
262 | "400 Asian 966 \n",
263 | "\n",
264 | "[400 rows x 11 columns]"
265 | ]
266 | },
267 | "execution_count": 1,
268 | "metadata": {},
269 | "output_type": "execute_result"
270 | }
271 | ],
272 | "source": [
273 | "# imports and setup\n",
274 | "\n",
275 | "import scipy as sc\n",
276 | "import numpy as np\n",
277 | "\n",
278 | "import pandas as pd\n",
279 | "import statsmodels.formula.api as sm #Last lecture: used statsmodels.formula.api.ols() for OLS\n",
280 | "from sklearn import linear_model #Last lecture: used sklearn.linear_model.LinearRegression() for OLS\n",
281 | "\n",
282 | "import matplotlib.pyplot as plt\n",
283 | "%matplotlib inline \n",
284 | "plt.rcParams['figure.figsize'] = (10, 6)\n",
285 | "\n",
286 | "from mpl_toolkits.mplot3d import Axes3D\n",
287 | "from matplotlib import cm\n",
288 | "\n",
289 | "# Import data from Credit.csv file\n",
290 | "credit = pd.read_csv('Credit.csv',index_col=0) #load data\n",
291 | "credit[\"Income\"] = credit[\"Income\"].map(lambda x: 1000*x)\n",
292 | "credit"
293 | ]
294 | },
295 | {
296 | "cell_type": "markdown",
297 | "metadata": {},
298 | "source": [
299 | "## Activity 1: A First Regression Model\n",
300 | "\n",
301 | "**Exercise:** First regress Limit on Rating: \n",
302 | "$$\n",
303 | "\\text{Limit} = \\beta_0 + \\beta_1 \\text{Rating}. \n",
304 | "$$\n",
305 |     "Since credit ratings are primarily used by banks to determine credit limits, we expect Rating to be highly predictive of Limit, so this regression should fit very well. \n",
306 | "\n",
307 | "Use the 'ols' function from the statsmodels python library. What is the $R^2$ value? What are $H_0$ and $H_A$ for the associated hypothesis test and what is the $p$-value? \n"
308 | ]
309 | },
310 | {
311 | "cell_type": "code",
312 | "execution_count": 18,
313 | "metadata": {},
314 | "outputs": [],
315 | "source": [
316 |     "# your code goes here\n",
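317 |     "\n",
318 |     "# A minimal sketch with the formula API imported above as 'sm':\n",
319 |     "fit1 = sm.ols(\"Limit ~ Rating\", data=credit).fit()\n",
320 |     "print(fit1.summary())   # R^2 and the p-value for the Rating coefficient"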
317 | ]
318 | },
319 | {
320 | "cell_type": "markdown",
321 | "metadata": {},
322 | "source": [
323 | "**Your answer goes here:**"
324 | ]
325 | },
326 | {
327 | "cell_type": "markdown",
328 | "metadata": {
329 | "slideshow": {
330 | "slide_type": "slide"
331 | }
332 | },
333 | "source": [
334 | "## Activity 2: Predicting Limit without Rating \n",
335 | "\n",
336 | "Since Rating and Limit are almost the same variable, next we'll forget about Rating and just try to predict Limit from the real-valued variables (non-categorical variables): Income, Cards, Age, Education, Balance. \n",
337 | "\n",
338 |     "**Exercise:** Develop a multiple linear regression model to predict Limit. Interpret the results. \n",
339 | "\n",
340 | "For now, just focus on the real-valued variables (Income, Cards, Age, Education, Balance)\n",
341 | "and ignore the categorical variables (Gender, Student, Married, Ethnicity). \n",
342 | "\n"
343 | ]
344 | },
345 | {
346 | "cell_type": "code",
347 | "execution_count": 19,
348 | "metadata": {
349 | "slideshow": {
350 | "slide_type": "-"
351 | }
352 | },
353 | "outputs": [],
354 | "source": [
355 |     "# your code goes here \n",
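356 |     "\n",
357 |     "# A minimal sketch: regress Limit on all the real-valued predictors at once.\n",
358 |     "fit2 = sm.ols(\"Limit ~ Income + Cards + Age + Education + Balance\",\n",
359 |     "              data=credit).fit()\n",
360 |     "print(fit2.summary())"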
356 | ]
357 | },
358 | {
359 | "cell_type": "markdown",
360 | "metadata": {
361 | "slideshow": {
362 | "slide_type": "-"
363 | }
364 | },
365 | "source": [
366 | "Which independent variables are good/bad predictors? What is the best overall model?\n",
367 | "\n",
368 | "**Your observations:**\n"
369 | ]
370 | },
371 | {
372 | "cell_type": "markdown",
373 | "metadata": {
374 | "slideshow": {
375 | "slide_type": "slide"
376 | }
377 | },
378 | "source": [
379 | "## Activity 3: Incorporating Categorical Variables Into Regression Models\n",
380 | "\n",
381 | "Now consider the binary categorical variables which we mapped to integer 0, 1 values in class."
382 | ]
383 | },
384 | {
385 | "cell_type": "code",
386 | "execution_count": 2,
387 | "metadata": {
388 | "slideshow": {
389 | "slide_type": "-"
390 | }
391 | },
392 | "outputs": [],
393 | "source": [
394 |     "credit[\"Gender_num\"] = credit[\"Gender\"].map({' Male':0, 'Female':1})  # ' Male' keeps the leading space present in the raw data\n",
395 | "credit[\"Student_num\"] = credit[\"Student\"].map({'Yes':1, 'No':0})\n",
396 | "credit[\"Married_num\"] = credit[\"Married\"].map({'Yes':1, 'No':0})"
397 | ]
398 | },
399 | {
400 | "cell_type": "markdown",
401 | "metadata": {
402 | "slideshow": {
403 | "slide_type": "-"
404 | }
405 | },
406 | "source": [
407 | "Can you improve the model you developed in Activity 2 by incorporating one or more of these variables?\n"
408 | ]
409 | },
410 | {
411 | "cell_type": "code",
412 | "execution_count": 3,
413 | "metadata": {
414 | "slideshow": {
415 | "slide_type": "-"
416 | }
417 | },
418 | "outputs": [],
419 | "source": [
420 |     "# your code here \n",
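421 |     "\n",
422 |     "# A minimal sketch: add one binary indicator (here Student_num) and compare\n",
423 |     "# adjusted R^2 against the Activity 2 model.\n",
424 |     "fit3 = sm.ols(\"Limit ~ Income + Cards + Age + Education + Balance + Student_num\",\n",
425 |     "              data=credit).fit()\n",
426 |     "print(fit3.summary())"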
421 | ]
422 | },
423 | {
424 | "cell_type": "markdown",
425 | "metadata": {},
426 | "source": [
427 | "**Your answer goes here:**"
428 | ]
429 | }
430 | ],
431 | "metadata": {
432 | "anaconda-cloud": {},
433 | "celltoolbar": "Slideshow",
434 | "kernelspec": {
435 | "display_name": "Python 3 (ipykernel)",
436 | "language": "python",
437 | "name": "python3"
438 | },
439 | "language_info": {
440 | "codemirror_mode": {
441 | "name": "ipython",
442 | "version": 3
443 | },
444 | "file_extension": ".py",
445 | "mimetype": "text/x-python",
446 | "name": "python",
447 | "nbconvert_exporter": "python",
448 | "pygments_lexer": "ipython3",
449 | "version": "3.11.5"
450 | }
451 | },
452 | "nbformat": 4,
453 | "nbformat_minor": 4
454 | }
455 |
--------------------------------------------------------------------------------
/10-LinearRegression2/Advertising.csv:
--------------------------------------------------------------------------------
1 | "","TV","Radio","Newspaper","Sales"
2 | "1",230.1,37.8,69.2,22.1
3 | "2",44.5,39.3,45.1,10.4
4 | "3",17.2,45.9,69.3,9.3
5 | "4",151.5,41.3,58.5,18.5
6 | "5",180.8,10.8,58.4,12.9
7 | "6",8.7,48.9,75,7.2
8 | "7",57.5,32.8,23.5,11.8
9 | "8",120.2,19.6,11.6,13.2
10 | "9",8.6,2.1,1,4.8
11 | "10",199.8,2.6,21.2,10.6
12 | "11",66.1,5.8,24.2,8.6
13 | "12",214.7,24,4,17.4
14 | "13",23.8,35.1,65.9,9.2
15 | "14",97.5,7.6,7.2,9.7
16 | "15",204.1,32.9,46,19
17 | "16",195.4,47.7,52.9,22.4
18 | "17",67.8,36.6,114,12.5
19 | "18",281.4,39.6,55.8,24.4
20 | "19",69.2,20.5,18.3,11.3
21 | "20",147.3,23.9,19.1,14.6
22 | "21",218.4,27.7,53.4,18
23 | "22",237.4,5.1,23.5,12.5
24 | "23",13.2,15.9,49.6,5.6
25 | "24",228.3,16.9,26.2,15.5
26 | "25",62.3,12.6,18.3,9.7
27 | "26",262.9,3.5,19.5,12
28 | "27",142.9,29.3,12.6,15
29 | "28",240.1,16.7,22.9,15.9
30 | "29",248.8,27.1,22.9,18.9
31 | "30",70.6,16,40.8,10.5
32 | "31",292.9,28.3,43.2,21.4
33 | "32",112.9,17.4,38.6,11.9
34 | "33",97.2,1.5,30,9.6
35 | "34",265.6,20,0.3,17.4
36 | "35",95.7,1.4,7.4,9.5
37 | "36",290.7,4.1,8.5,12.8
38 | "37",266.9,43.8,5,25.4
39 | "38",74.7,49.4,45.7,14.7
40 | "39",43.1,26.7,35.1,10.1
41 | "40",228,37.7,32,21.5
42 | "41",202.5,22.3,31.6,16.6
43 | "42",177,33.4,38.7,17.1
44 | "43",293.6,27.7,1.8,20.7
45 | "44",206.9,8.4,26.4,12.9
46 | "45",25.1,25.7,43.3,8.5
47 | "46",175.1,22.5,31.5,14.9
48 | "47",89.7,9.9,35.7,10.6
49 | "48",239.9,41.5,18.5,23.2
50 | "49",227.2,15.8,49.9,14.8
51 | "50",66.9,11.7,36.8,9.7
52 | "51",199.8,3.1,34.6,11.4
53 | "52",100.4,9.6,3.6,10.7
54 | "53",216.4,41.7,39.6,22.6
55 | "54",182.6,46.2,58.7,21.2
56 | "55",262.7,28.8,15.9,20.2
57 | "56",198.9,49.4,60,23.7
58 | "57",7.3,28.1,41.4,5.5
59 | "58",136.2,19.2,16.6,13.2
60 | "59",210.8,49.6,37.7,23.8
61 | "60",210.7,29.5,9.3,18.4
62 | "61",53.5,2,21.4,8.1
63 | "62",261.3,42.7,54.7,24.2
64 | "63",239.3,15.5,27.3,15.7
65 | "64",102.7,29.6,8.4,14
66 | "65",131.1,42.8,28.9,18
67 | "66",69,9.3,0.9,9.3
68 | "67",31.5,24.6,2.2,9.5
69 | "68",139.3,14.5,10.2,13.4
70 | "69",237.4,27.5,11,18.9
71 | "70",216.8,43.9,27.2,22.3
72 | "71",199.1,30.6,38.7,18.3
73 | "72",109.8,14.3,31.7,12.4
74 | "73",26.8,33,19.3,8.8
75 | "74",129.4,5.7,31.3,11
76 | "75",213.4,24.6,13.1,17
77 | "76",16.9,43.7,89.4,8.7
78 | "77",27.5,1.6,20.7,6.9
79 | "78",120.5,28.5,14.2,14.2
80 | "79",5.4,29.9,9.4,5.3
81 | "80",116,7.7,23.1,11
82 | "81",76.4,26.7,22.3,11.8
83 | "82",239.8,4.1,36.9,12.3
84 | "83",75.3,20.3,32.5,11.3
85 | "84",68.4,44.5,35.6,13.6
86 | "85",213.5,43,33.8,21.7
87 | "86",193.2,18.4,65.7,15.2
88 | "87",76.3,27.5,16,12
89 | "88",110.7,40.6,63.2,16
90 | "89",88.3,25.5,73.4,12.9
91 | "90",109.8,47.8,51.4,16.7
92 | "91",134.3,4.9,9.3,11.2
93 | "92",28.6,1.5,33,7.3
94 | "93",217.7,33.5,59,19.4
95 | "94",250.9,36.5,72.3,22.2
96 | "95",107.4,14,10.9,11.5
97 | "96",163.3,31.6,52.9,16.9
98 | "97",197.6,3.5,5.9,11.7
99 | "98",184.9,21,22,15.5
100 | "99",289.7,42.3,51.2,25.4
101 | "100",135.2,41.7,45.9,17.2
102 | "101",222.4,4.3,49.8,11.7
103 | "102",296.4,36.3,100.9,23.8
104 | "103",280.2,10.1,21.4,14.8
105 | "104",187.9,17.2,17.9,14.7
106 | "105",238.2,34.3,5.3,20.7
107 | "106",137.9,46.4,59,19.2
108 | "107",25,11,29.7,7.2
109 | "108",90.4,0.3,23.2,8.7
110 | "109",13.1,0.4,25.6,5.3
111 | "110",255.4,26.9,5.5,19.8
112 | "111",225.8,8.2,56.5,13.4
113 | "112",241.7,38,23.2,21.8
114 | "113",175.7,15.4,2.4,14.1
115 | "114",209.6,20.6,10.7,15.9
116 | "115",78.2,46.8,34.5,14.6
117 | "116",75.1,35,52.7,12.6
118 | "117",139.2,14.3,25.6,12.2
119 | "118",76.4,0.8,14.8,9.4
120 | "119",125.7,36.9,79.2,15.9
121 | "120",19.4,16,22.3,6.6
122 | "121",141.3,26.8,46.2,15.5
123 | "122",18.8,21.7,50.4,7
124 | "123",224,2.4,15.6,11.6
125 | "124",123.1,34.6,12.4,15.2
126 | "125",229.5,32.3,74.2,19.7
127 | "126",87.2,11.8,25.9,10.6
128 | "127",7.8,38.9,50.6,6.6
129 | "128",80.2,0,9.2,8.8
130 | "129",220.3,49,3.2,24.7
131 | "130",59.6,12,43.1,9.7
132 | "131",0.7,39.6,8.7,1.6
133 | "132",265.2,2.9,43,12.7
134 | "133",8.4,27.2,2.1,5.7
135 | "134",219.8,33.5,45.1,19.6
136 | "135",36.9,38.6,65.6,10.8
137 | "136",48.3,47,8.5,11.6
138 | "137",25.6,39,9.3,9.5
139 | "138",273.7,28.9,59.7,20.8
140 | "139",43,25.9,20.5,9.6
141 | "140",184.9,43.9,1.7,20.7
142 | "141",73.4,17,12.9,10.9
143 | "142",193.7,35.4,75.6,19.2
144 | "143",220.5,33.2,37.9,20.1
145 | "144",104.6,5.7,34.4,10.4
146 | "145",96.2,14.8,38.9,11.4
147 | "146",140.3,1.9,9,10.3
148 | "147",240.1,7.3,8.7,13.2
149 | "148",243.2,49,44.3,25.4
150 | "149",38,40.3,11.9,10.9
151 | "150",44.7,25.8,20.6,10.1
152 | "151",280.7,13.9,37,16.1
153 | "152",121,8.4,48.7,11.6
154 | "153",197.6,23.3,14.2,16.6
155 | "154",171.3,39.7,37.7,19
156 | "155",187.8,21.1,9.5,15.6
157 | "156",4.1,11.6,5.7,3.2
158 | "157",93.9,43.5,50.5,15.3
159 | "158",149.8,1.3,24.3,10.1
160 | "159",11.7,36.9,45.2,7.3
161 | "160",131.7,18.4,34.6,12.9
162 | "161",172.5,18.1,30.7,14.4
163 | "162",85.7,35.8,49.3,13.3
164 | "163",188.4,18.1,25.6,14.9
165 | "164",163.5,36.8,7.4,18
166 | "165",117.2,14.7,5.4,11.9
167 | "166",234.5,3.4,84.8,11.9
168 | "167",17.9,37.6,21.6,8
169 | "168",206.8,5.2,19.4,12.2
170 | "169",215.4,23.6,57.6,17.1
171 | "170",284.3,10.6,6.4,15
172 | "171",50,11.6,18.4,8.4
173 | "172",164.5,20.9,47.4,14.5
174 | "173",19.6,20.1,17,7.6
175 | "174",168.4,7.1,12.8,11.7
176 | "175",222.4,3.4,13.1,11.5
177 | "176",276.9,48.9,41.8,27
178 | "177",248.4,30.2,20.3,20.2
179 | "178",170.2,7.8,35.2,11.7
180 | "179",276.7,2.3,23.7,11.8
181 | "180",165.6,10,17.6,12.6
182 | "181",156.6,2.6,8.3,10.5
183 | "182",218.5,5.4,27.4,12.2
184 | "183",56.2,5.7,29.7,8.7
185 | "184",287.6,43,71.8,26.2
186 | "185",253.8,21.3,30,17.6
187 | "186",205,45.1,19.6,22.6
188 | "187",139.5,2.1,26.6,10.3
189 | "188",191.1,28.7,18.2,17.3
190 | "189",286,13.9,3.7,15.9
191 | "190",18.7,12.1,23.4,6.7
192 | "191",39.5,41.1,5.8,10.8
193 | "192",75.5,10.8,6,9.9
194 | "193",17.2,4.1,31.6,5.9
195 | "194",166.8,42,3.6,19.6
196 | "195",149.7,35.6,6,17.3
197 | "196",38.2,3.7,13.8,7.6
198 | "197",94.2,4.9,8.1,9.7
199 | "198",177,9.3,6.4,12.8
200 | "199",283.6,42,66.2,25.5
201 | "200",232.1,8.6,8.7,13.4
202 |
--------------------------------------------------------------------------------
/10-LinearRegression2/Auto.csv:
--------------------------------------------------------------------------------
1 | mpg,cylinders,displacement,horsepower,weight,acceleration,year,origin,name
2 | 18,8,307,130,3504,12,70,1,chevrolet chevelle malibu
3 | 15,8,350,165,3693,11.5,70,1,buick skylark 320
4 | 18,8,318,150,3436,11,70,1,plymouth satellite
5 | 16,8,304,150,3433,12,70,1,amc rebel sst
6 | 17,8,302,140,3449,10.5,70,1,ford torino
7 | 15,8,429,198,4341,10,70,1,ford galaxie 500
8 | 14,8,454,220,4354,9,70,1,chevrolet impala
9 | 14,8,440,215,4312,8.5,70,1,plymouth fury iii
10 | 14,8,455,225,4425,10,70,1,pontiac catalina
11 | 15,8,390,190,3850,8.5,70,1,amc ambassador dpl
12 | 15,8,383,170,3563,10,70,1,dodge challenger se
13 | 14,8,340,160,3609,8,70,1,plymouth 'cuda 340
14 | 15,8,400,150,3761,9.5,70,1,chevrolet monte carlo
15 | 14,8,455,225,3086,10,70,1,buick estate wagon (sw)
16 | 24,4,113,95,2372,15,70,3,toyota corona mark ii
17 | 22,6,198,95,2833,15.5,70,1,plymouth duster
18 | 18,6,199,97,2774,15.5,70,1,amc hornet
19 | 21,6,200,85,2587,16,70,1,ford maverick
20 | 27,4,97,88,2130,14.5,70,3,datsun pl510
21 | 26,4,97,46,1835,20.5,70,2,volkswagen 1131 deluxe sedan
22 | 25,4,110,87,2672,17.5,70,2,peugeot 504
23 | 24,4,107,90,2430,14.5,70,2,audi 100 ls
24 | 25,4,104,95,2375,17.5,70,2,saab 99e
25 | 26,4,121,113,2234,12.5,70,2,bmw 2002
26 | 21,6,199,90,2648,15,70,1,amc gremlin
27 | 10,8,360,215,4615,14,70,1,ford f250
28 | 10,8,307,200,4376,15,70,1,chevy c20
29 | 11,8,318,210,4382,13.5,70,1,dodge d200
30 | 9,8,304,193,4732,18.5,70,1,hi 1200d
31 | 27,4,97,88,2130,14.5,71,3,datsun pl510
32 | 28,4,140,90,2264,15.5,71,1,chevrolet vega 2300
33 | 25,4,113,95,2228,14,71,3,toyota corona
34 | 25,4,98,?,2046,19,71,1,ford pinto
35 | 19,6,232,100,2634,13,71,1,amc gremlin
36 | 16,6,225,105,3439,15.5,71,1,plymouth satellite custom
37 | 17,6,250,100,3329,15.5,71,1,chevrolet chevelle malibu
38 | 19,6,250,88,3302,15.5,71,1,ford torino 500
39 | 18,6,232,100,3288,15.5,71,1,amc matador
40 | 14,8,350,165,4209,12,71,1,chevrolet impala
41 | 14,8,400,175,4464,11.5,71,1,pontiac catalina brougham
42 | 14,8,351,153,4154,13.5,71,1,ford galaxie 500
43 | 14,8,318,150,4096,13,71,1,plymouth fury iii
44 | 12,8,383,180,4955,11.5,71,1,dodge monaco (sw)
45 | 13,8,400,170,4746,12,71,1,ford country squire (sw)
46 | 13,8,400,175,5140,12,71,1,pontiac safari (sw)
47 | 18,6,258,110,2962,13.5,71,1,amc hornet sportabout (sw)
48 | 22,4,140,72,2408,19,71,1,chevrolet vega (sw)
49 | 19,6,250,100,3282,15,71,1,pontiac firebird
50 | 18,6,250,88,3139,14.5,71,1,ford mustang
51 | 23,4,122,86,2220,14,71,1,mercury capri 2000
52 | 28,4,116,90,2123,14,71,2,opel 1900
53 | 30,4,79,70,2074,19.5,71,2,peugeot 304
54 | 30,4,88,76,2065,14.5,71,2,fiat 124b
55 | 31,4,71,65,1773,19,71,3,toyota corolla 1200
56 | 35,4,72,69,1613,18,71,3,datsun 1200
57 | 27,4,97,60,1834,19,71,2,volkswagen model 111
58 | 26,4,91,70,1955,20.5,71,1,plymouth cricket
59 | 24,4,113,95,2278,15.5,72,3,toyota corona hardtop
60 | 25,4,97.5,80,2126,17,72,1,dodge colt hardtop
61 | 23,4,97,54,2254,23.5,72,2,volkswagen type 3
62 | 20,4,140,90,2408,19.5,72,1,chevrolet vega
63 | 21,4,122,86,2226,16.5,72,1,ford pinto runabout
64 | 13,8,350,165,4274,12,72,1,chevrolet impala
65 | 14,8,400,175,4385,12,72,1,pontiac catalina
66 | 15,8,318,150,4135,13.5,72,1,plymouth fury iii
67 | 14,8,351,153,4129,13,72,1,ford galaxie 500
68 | 17,8,304,150,3672,11.5,72,1,amc ambassador sst
69 | 11,8,429,208,4633,11,72,1,mercury marquis
70 | 13,8,350,155,4502,13.5,72,1,buick lesabre custom
71 | 12,8,350,160,4456,13.5,72,1,oldsmobile delta 88 royale
72 | 13,8,400,190,4422,12.5,72,1,chrysler newport royal
73 | 19,3,70,97,2330,13.5,72,3,mazda rx2 coupe
74 | 15,8,304,150,3892,12.5,72,1,amc matador (sw)
75 | 13,8,307,130,4098,14,72,1,chevrolet chevelle concours (sw)
76 | 13,8,302,140,4294,16,72,1,ford gran torino (sw)
77 | 14,8,318,150,4077,14,72,1,plymouth satellite custom (sw)
78 | 18,4,121,112,2933,14.5,72,2,volvo 145e (sw)
79 | 22,4,121,76,2511,18,72,2,volkswagen 411 (sw)
80 | 21,4,120,87,2979,19.5,72,2,peugeot 504 (sw)
81 | 26,4,96,69,2189,18,72,2,renault 12 (sw)
82 | 22,4,122,86,2395,16,72,1,ford pinto (sw)
83 | 28,4,97,92,2288,17,72,3,datsun 510 (sw)
84 | 23,4,120,97,2506,14.5,72,3,toyouta corona mark ii (sw)
85 | 28,4,98,80,2164,15,72,1,dodge colt (sw)
86 | 27,4,97,88,2100,16.5,72,3,toyota corolla 1600 (sw)
87 | 13,8,350,175,4100,13,73,1,buick century 350
88 | 14,8,304,150,3672,11.5,73,1,amc matador
89 | 13,8,350,145,3988,13,73,1,chevrolet malibu
90 | 14,8,302,137,4042,14.5,73,1,ford gran torino
91 | 15,8,318,150,3777,12.5,73,1,dodge coronet custom
92 | 12,8,429,198,4952,11.5,73,1,mercury marquis brougham
93 | 13,8,400,150,4464,12,73,1,chevrolet caprice classic
94 | 13,8,351,158,4363,13,73,1,ford ltd
95 | 14,8,318,150,4237,14.5,73,1,plymouth fury gran sedan
96 | 13,8,440,215,4735,11,73,1,chrysler new yorker brougham
97 | 12,8,455,225,4951,11,73,1,buick electra 225 custom
98 | 13,8,360,175,3821,11,73,1,amc ambassador brougham
99 | 18,6,225,105,3121,16.5,73,1,plymouth valiant
100 | 16,6,250,100,3278,18,73,1,chevrolet nova custom
101 | 18,6,232,100,2945,16,73,1,amc hornet
102 | 18,6,250,88,3021,16.5,73,1,ford maverick
103 | 23,6,198,95,2904,16,73,1,plymouth duster
104 | 26,4,97,46,1950,21,73,2,volkswagen super beetle
105 | 11,8,400,150,4997,14,73,1,chevrolet impala
106 | 12,8,400,167,4906,12.5,73,1,ford country
107 | 13,8,360,170,4654,13,73,1,plymouth custom suburb
108 | 12,8,350,180,4499,12.5,73,1,oldsmobile vista cruiser
109 | 18,6,232,100,2789,15,73,1,amc gremlin
110 | 20,4,97,88,2279,19,73,3,toyota carina
111 | 21,4,140,72,2401,19.5,73,1,chevrolet vega
112 | 22,4,108,94,2379,16.5,73,3,datsun 610
113 | 18,3,70,90,2124,13.5,73,3,maxda rx3
114 | 19,4,122,85,2310,18.5,73,1,ford pinto
115 | 21,6,155,107,2472,14,73,1,mercury capri v6
116 | 26,4,98,90,2265,15.5,73,2,fiat 124 sport coupe
117 | 15,8,350,145,4082,13,73,1,chevrolet monte carlo s
118 | 16,8,400,230,4278,9.5,73,1,pontiac grand prix
119 | 29,4,68,49,1867,19.5,73,2,fiat 128
120 | 24,4,116,75,2158,15.5,73,2,opel manta
121 | 20,4,114,91,2582,14,73,2,audi 100ls
122 | 19,4,121,112,2868,15.5,73,2,volvo 144ea
123 | 15,8,318,150,3399,11,73,1,dodge dart custom
124 | 24,4,121,110,2660,14,73,2,saab 99le
125 | 20,6,156,122,2807,13.5,73,3,toyota mark ii
126 | 11,8,350,180,3664,11,73,1,oldsmobile omega
127 | 20,6,198,95,3102,16.5,74,1,plymouth duster
128 | 21,6,200,?,2875,17,74,1,ford maverick
129 | 19,6,232,100,2901,16,74,1,amc hornet
130 | 15,6,250,100,3336,17,74,1,chevrolet nova
131 | 31,4,79,67,1950,19,74,3,datsun b210
132 | 26,4,122,80,2451,16.5,74,1,ford pinto
133 | 32,4,71,65,1836,21,74,3,toyota corolla 1200
134 | 25,4,140,75,2542,17,74,1,chevrolet vega
135 | 16,6,250,100,3781,17,74,1,chevrolet chevelle malibu classic
136 | 16,6,258,110,3632,18,74,1,amc matador
137 | 18,6,225,105,3613,16.5,74,1,plymouth satellite sebring
138 | 16,8,302,140,4141,14,74,1,ford gran torino
139 | 13,8,350,150,4699,14.5,74,1,buick century luxus (sw)
140 | 14,8,318,150,4457,13.5,74,1,dodge coronet custom (sw)
141 | 14,8,302,140,4638,16,74,1,ford gran torino (sw)
142 | 14,8,304,150,4257,15.5,74,1,amc matador (sw)
143 | 29,4,98,83,2219,16.5,74,2,audi fox
144 | 26,4,79,67,1963,15.5,74,2,volkswagen dasher
145 | 26,4,97,78,2300,14.5,74,2,opel manta
146 | 31,4,76,52,1649,16.5,74,3,toyota corona
147 | 32,4,83,61,2003,19,74,3,datsun 710
148 | 28,4,90,75,2125,14.5,74,1,dodge colt
149 | 24,4,90,75,2108,15.5,74,2,fiat 128
150 | 26,4,116,75,2246,14,74,2,fiat 124 tc
151 | 24,4,120,97,2489,15,74,3,honda civic
152 | 26,4,108,93,2391,15.5,74,3,subaru
153 | 31,4,79,67,2000,16,74,2,fiat x1.9
154 | 19,6,225,95,3264,16,75,1,plymouth valiant custom
155 | 18,6,250,105,3459,16,75,1,chevrolet nova
156 | 15,6,250,72,3432,21,75,1,mercury monarch
157 | 15,6,250,72,3158,19.5,75,1,ford maverick
158 | 16,8,400,170,4668,11.5,75,1,pontiac catalina
159 | 15,8,350,145,4440,14,75,1,chevrolet bel air
160 | 16,8,318,150,4498,14.5,75,1,plymouth grand fury
161 | 14,8,351,148,4657,13.5,75,1,ford ltd
162 | 17,6,231,110,3907,21,75,1,buick century
163 | 16,6,250,105,3897,18.5,75,1,chevroelt chevelle malibu
164 | 15,6,258,110,3730,19,75,1,amc matador
165 | 18,6,225,95,3785,19,75,1,plymouth fury
166 | 21,6,231,110,3039,15,75,1,buick skyhawk
167 | 20,8,262,110,3221,13.5,75,1,chevrolet monza 2+2
168 | 13,8,302,129,3169,12,75,1,ford mustang ii
169 | 29,4,97,75,2171,16,75,3,toyota corolla
170 | 23,4,140,83,2639,17,75,1,ford pinto
171 | 20,6,232,100,2914,16,75,1,amc gremlin
172 | 23,4,140,78,2592,18.5,75,1,pontiac astro
173 | 24,4,134,96,2702,13.5,75,3,toyota corona
174 | 25,4,90,71,2223,16.5,75,2,volkswagen dasher
175 | 24,4,119,97,2545,17,75,3,datsun 710
176 | 18,6,171,97,2984,14.5,75,1,ford pinto
177 | 29,4,90,70,1937,14,75,2,volkswagen rabbit
178 | 19,6,232,90,3211,17,75,1,amc pacer
179 | 23,4,115,95,2694,15,75,2,audi 100ls
180 | 23,4,120,88,2957,17,75,2,peugeot 504
181 | 22,4,121,98,2945,14.5,75,2,volvo 244dl
182 | 25,4,121,115,2671,13.5,75,2,saab 99le
183 | 33,4,91,53,1795,17.5,75,3,honda civic cvcc
184 | 28,4,107,86,2464,15.5,76,2,fiat 131
185 | 25,4,116,81,2220,16.9,76,2,opel 1900
186 | 25,4,140,92,2572,14.9,76,1,capri ii
187 | 26,4,98,79,2255,17.7,76,1,dodge colt
188 | 27,4,101,83,2202,15.3,76,2,renault 12tl
189 | 17.5,8,305,140,4215,13,76,1,chevrolet chevelle malibu classic
190 | 16,8,318,150,4190,13,76,1,dodge coronet brougham
191 | 15.5,8,304,120,3962,13.9,76,1,amc matador
192 | 14.5,8,351,152,4215,12.8,76,1,ford gran torino
193 | 22,6,225,100,3233,15.4,76,1,plymouth valiant
194 | 22,6,250,105,3353,14.5,76,1,chevrolet nova
195 | 24,6,200,81,3012,17.6,76,1,ford maverick
196 | 22.5,6,232,90,3085,17.6,76,1,amc hornet
197 | 29,4,85,52,2035,22.2,76,1,chevrolet chevette
198 | 24.5,4,98,60,2164,22.1,76,1,chevrolet woody
199 | 29,4,90,70,1937,14.2,76,2,vw rabbit
200 | 33,4,91,53,1795,17.4,76,3,honda civic
201 | 20,6,225,100,3651,17.7,76,1,dodge aspen se
202 | 18,6,250,78,3574,21,76,1,ford granada ghia
203 | 18.5,6,250,110,3645,16.2,76,1,pontiac ventura sj
204 | 17.5,6,258,95,3193,17.8,76,1,amc pacer d/l
205 | 29.5,4,97,71,1825,12.2,76,2,volkswagen rabbit
206 | 32,4,85,70,1990,17,76,3,datsun b-210
207 | 28,4,97,75,2155,16.4,76,3,toyota corolla
208 | 26.5,4,140,72,2565,13.6,76,1,ford pinto
209 | 20,4,130,102,3150,15.7,76,2,volvo 245
210 | 13,8,318,150,3940,13.2,76,1,plymouth volare premier v8
211 | 19,4,120,88,3270,21.9,76,2,peugeot 504
212 | 19,6,156,108,2930,15.5,76,3,toyota mark ii
213 | 16.5,6,168,120,3820,16.7,76,2,mercedes-benz 280s
214 | 16.5,8,350,180,4380,12.1,76,1,cadillac seville
215 | 13,8,350,145,4055,12,76,1,chevy c10
216 | 13,8,302,130,3870,15,76,1,ford f108
217 | 13,8,318,150,3755,14,76,1,dodge d100
218 | 31.5,4,98,68,2045,18.5,77,3,honda accord cvcc
219 | 30,4,111,80,2155,14.8,77,1,buick opel isuzu deluxe
220 | 36,4,79,58,1825,18.6,77,2,renault 5 gtl
221 | 25.5,4,122,96,2300,15.5,77,1,plymouth arrow gs
222 | 33.5,4,85,70,1945,16.8,77,3,datsun f-10 hatchback
223 | 17.5,8,305,145,3880,12.5,77,1,chevrolet caprice classic
224 | 17,8,260,110,4060,19,77,1,oldsmobile cutlass supreme
225 | 15.5,8,318,145,4140,13.7,77,1,dodge monaco brougham
226 | 15,8,302,130,4295,14.9,77,1,mercury cougar brougham
227 | 17.5,6,250,110,3520,16.4,77,1,chevrolet concours
228 | 20.5,6,231,105,3425,16.9,77,1,buick skylark
229 | 19,6,225,100,3630,17.7,77,1,plymouth volare custom
230 | 18.5,6,250,98,3525,19,77,1,ford granada
231 | 16,8,400,180,4220,11.1,77,1,pontiac grand prix lj
232 | 15.5,8,350,170,4165,11.4,77,1,chevrolet monte carlo landau
233 | 15.5,8,400,190,4325,12.2,77,1,chrysler cordoba
234 | 16,8,351,149,4335,14.5,77,1,ford thunderbird
235 | 29,4,97,78,1940,14.5,77,2,volkswagen rabbit custom
236 | 24.5,4,151,88,2740,16,77,1,pontiac sunbird coupe
237 | 26,4,97,75,2265,18.2,77,3,toyota corolla liftback
238 | 25.5,4,140,89,2755,15.8,77,1,ford mustang ii 2+2
239 | 30.5,4,98,63,2051,17,77,1,chevrolet chevette
240 | 33.5,4,98,83,2075,15.9,77,1,dodge colt m/m
241 | 30,4,97,67,1985,16.4,77,3,subaru dl
242 | 30.5,4,97,78,2190,14.1,77,2,volkswagen dasher
243 | 22,6,146,97,2815,14.5,77,3,datsun 810
244 | 21.5,4,121,110,2600,12.8,77,2,bmw 320i
245 | 21.5,3,80,110,2720,13.5,77,3,mazda rx-4
246 | 43.1,4,90,48,1985,21.5,78,2,volkswagen rabbit custom diesel
247 | 36.1,4,98,66,1800,14.4,78,1,ford fiesta
248 | 32.8,4,78,52,1985,19.4,78,3,mazda glc deluxe
249 | 39.4,4,85,70,2070,18.6,78,3,datsun b210 gx
250 | 36.1,4,91,60,1800,16.4,78,3,honda civic cvcc
251 | 19.9,8,260,110,3365,15.5,78,1,oldsmobile cutlass salon brougham
252 | 19.4,8,318,140,3735,13.2,78,1,dodge diplomat
253 | 20.2,8,302,139,3570,12.8,78,1,mercury monarch ghia
254 | 19.2,6,231,105,3535,19.2,78,1,pontiac phoenix lj
255 | 20.5,6,200,95,3155,18.2,78,1,chevrolet malibu
256 | 20.2,6,200,85,2965,15.8,78,1,ford fairmont (auto)
257 | 25.1,4,140,88,2720,15.4,78,1,ford fairmont (man)
258 | 20.5,6,225,100,3430,17.2,78,1,plymouth volare
259 | 19.4,6,232,90,3210,17.2,78,1,amc concord
260 | 20.6,6,231,105,3380,15.8,78,1,buick century special
261 | 20.8,6,200,85,3070,16.7,78,1,mercury zephyr
262 | 18.6,6,225,110,3620,18.7,78,1,dodge aspen
263 | 18.1,6,258,120,3410,15.1,78,1,amc concord d/l
264 | 19.2,8,305,145,3425,13.2,78,1,chevrolet monte carlo landau
265 | 17.7,6,231,165,3445,13.4,78,1,buick regal sport coupe (turbo)
266 | 18.1,8,302,139,3205,11.2,78,1,ford futura
267 | 17.5,8,318,140,4080,13.7,78,1,dodge magnum xe
268 | 30,4,98,68,2155,16.5,78,1,chevrolet chevette
269 | 27.5,4,134,95,2560,14.2,78,3,toyota corona
270 | 27.2,4,119,97,2300,14.7,78,3,datsun 510
271 | 30.9,4,105,75,2230,14.5,78,1,dodge omni
272 | 21.1,4,134,95,2515,14.8,78,3,toyota celica gt liftback
273 | 23.2,4,156,105,2745,16.7,78,1,plymouth sapporo
274 | 23.8,4,151,85,2855,17.6,78,1,oldsmobile starfire sx
275 | 23.9,4,119,97,2405,14.9,78,3,datsun 200-sx
276 | 20.3,5,131,103,2830,15.9,78,2,audi 5000
277 | 17,6,163,125,3140,13.6,78,2,volvo 264gl
278 | 21.6,4,121,115,2795,15.7,78,2,saab 99gle
279 | 16.2,6,163,133,3410,15.8,78,2,peugeot 604sl
280 | 31.5,4,89,71,1990,14.9,78,2,volkswagen scirocco
281 | 29.5,4,98,68,2135,16.6,78,3,honda accord lx
282 | 21.5,6,231,115,3245,15.4,79,1,pontiac lemans v6
283 | 19.8,6,200,85,2990,18.2,79,1,mercury zephyr 6
284 | 22.3,4,140,88,2890,17.3,79,1,ford fairmont 4
285 | 20.2,6,232,90,3265,18.2,79,1,amc concord dl 6
286 | 20.6,6,225,110,3360,16.6,79,1,dodge aspen 6
287 | 17,8,305,130,3840,15.4,79,1,chevrolet caprice classic
288 | 17.6,8,302,129,3725,13.4,79,1,ford ltd landau
289 | 16.5,8,351,138,3955,13.2,79,1,mercury grand marquis
290 | 18.2,8,318,135,3830,15.2,79,1,dodge st. regis
291 | 16.9,8,350,155,4360,14.9,79,1,buick estate wagon (sw)
292 | 15.5,8,351,142,4054,14.3,79,1,ford country squire (sw)
293 | 19.2,8,267,125,3605,15,79,1,chevrolet malibu classic (sw)
294 | 18.5,8,360,150,3940,13,79,1,chrysler lebaron town @ country (sw)
295 | 31.9,4,89,71,1925,14,79,2,vw rabbit custom
296 | 34.1,4,86,65,1975,15.2,79,3,maxda glc deluxe
297 | 35.7,4,98,80,1915,14.4,79,1,dodge colt hatchback custom
298 | 27.4,4,121,80,2670,15,79,1,amc spirit dl
299 | 25.4,5,183,77,3530,20.1,79,2,mercedes benz 300d
300 | 23,8,350,125,3900,17.4,79,1,cadillac eldorado
301 | 27.2,4,141,71,3190,24.8,79,2,peugeot 504
302 | 23.9,8,260,90,3420,22.2,79,1,oldsmobile cutlass salon brougham
303 | 34.2,4,105,70,2200,13.2,79,1,plymouth horizon
304 | 34.5,4,105,70,2150,14.9,79,1,plymouth horizon tc3
305 | 31.8,4,85,65,2020,19.2,79,3,datsun 210
306 | 37.3,4,91,69,2130,14.7,79,2,fiat strada custom
307 | 28.4,4,151,90,2670,16,79,1,buick skylark limited
308 | 28.8,6,173,115,2595,11.3,79,1,chevrolet citation
309 | 26.8,6,173,115,2700,12.9,79,1,oldsmobile omega brougham
310 | 33.5,4,151,90,2556,13.2,79,1,pontiac phoenix
311 | 41.5,4,98,76,2144,14.7,80,2,vw rabbit
312 | 38.1,4,89,60,1968,18.8,80,3,toyota corolla tercel
313 | 32.1,4,98,70,2120,15.5,80,1,chevrolet chevette
314 | 37.2,4,86,65,2019,16.4,80,3,datsun 310
315 | 28,4,151,90,2678,16.5,80,1,chevrolet citation
316 | 26.4,4,140,88,2870,18.1,80,1,ford fairmont
317 | 24.3,4,151,90,3003,20.1,80,1,amc concord
318 | 19.1,6,225,90,3381,18.7,80,1,dodge aspen
319 | 34.3,4,97,78,2188,15.8,80,2,audi 4000
320 | 29.8,4,134,90,2711,15.5,80,3,toyota corona liftback
321 | 31.3,4,120,75,2542,17.5,80,3,mazda 626
322 | 37,4,119,92,2434,15,80,3,datsun 510 hatchback
323 | 32.2,4,108,75,2265,15.2,80,3,toyota corolla
324 | 46.6,4,86,65,2110,17.9,80,3,mazda glc
325 | 27.9,4,156,105,2800,14.4,80,1,dodge colt
326 | 40.8,4,85,65,2110,19.2,80,3,datsun 210
327 | 44.3,4,90,48,2085,21.7,80,2,vw rabbit c (diesel)
328 | 43.4,4,90,48,2335,23.7,80,2,vw dasher (diesel)
329 | 36.4,5,121,67,2950,19.9,80,2,audi 5000s (diesel)
330 | 30,4,146,67,3250,21.8,80,2,mercedes-benz 240d
331 | 44.6,4,91,67,1850,13.8,80,3,honda civic 1500 gl
332 | 40.9,4,85,?,1835,17.3,80,2,renault lecar deluxe
333 | 33.8,4,97,67,2145,18,80,3,subaru dl
334 | 29.8,4,89,62,1845,15.3,80,2,vokswagen rabbit
335 | 32.7,6,168,132,2910,11.4,80,3,datsun 280-zx
336 | 23.7,3,70,100,2420,12.5,80,3,mazda rx-7 gs
337 | 35,4,122,88,2500,15.1,80,2,triumph tr7 coupe
338 | 23.6,4,140,?,2905,14.3,80,1,ford mustang cobra
339 | 32.4,4,107,72,2290,17,80,3,honda accord
340 | 27.2,4,135,84,2490,15.7,81,1,plymouth reliant
341 | 26.6,4,151,84,2635,16.4,81,1,buick skylark
342 | 25.8,4,156,92,2620,14.4,81,1,dodge aries wagon (sw)
343 | 23.5,6,173,110,2725,12.6,81,1,chevrolet citation
344 | 30,4,135,84,2385,12.9,81,1,plymouth reliant
345 | 39.1,4,79,58,1755,16.9,81,3,toyota starlet
346 | 39,4,86,64,1875,16.4,81,1,plymouth champ
347 | 35.1,4,81,60,1760,16.1,81,3,honda civic 1300
348 | 32.3,4,97,67,2065,17.8,81,3,subaru
349 | 37,4,85,65,1975,19.4,81,3,datsun 210 mpg
350 | 37.7,4,89,62,2050,17.3,81,3,toyota tercel
351 | 34.1,4,91,68,1985,16,81,3,mazda glc 4
352 | 34.7,4,105,63,2215,14.9,81,1,plymouth horizon 4
353 | 34.4,4,98,65,2045,16.2,81,1,ford escort 4w
354 | 29.9,4,98,65,2380,20.7,81,1,ford escort 2h
355 | 33,4,105,74,2190,14.2,81,2,volkswagen jetta
356 | 34.5,4,100,?,2320,15.8,81,2,renault 18i
357 | 33.7,4,107,75,2210,14.4,81,3,honda prelude
358 | 32.4,4,108,75,2350,16.8,81,3,toyota corolla
359 | 32.9,4,119,100,2615,14.8,81,3,datsun 200sx
360 | 31.6,4,120,74,2635,18.3,81,3,mazda 626
361 | 28.1,4,141,80,3230,20.4,81,2,peugeot 505s turbo diesel
362 | 30.7,6,145,76,3160,19.6,81,2,volvo diesel
363 | 25.4,6,168,116,2900,12.6,81,3,toyota cressida
364 | 24.2,6,146,120,2930,13.8,81,3,datsun 810 maxima
365 | 22.4,6,231,110,3415,15.8,81,1,buick century
366 | 26.6,8,350,105,3725,19,81,1,oldsmobile cutlass ls
367 | 20.2,6,200,88,3060,17.1,81,1,ford granada gl
368 | 17.6,6,225,85,3465,16.6,81,1,chrysler lebaron salon
369 | 28,4,112,88,2605,19.6,82,1,chevrolet cavalier
370 | 27,4,112,88,2640,18.6,82,1,chevrolet cavalier wagon
371 | 34,4,112,88,2395,18,82,1,chevrolet cavalier 2-door
372 | 31,4,112,85,2575,16.2,82,1,pontiac j2000 se hatchback
373 | 29,4,135,84,2525,16,82,1,dodge aries se
374 | 27,4,151,90,2735,18,82,1,pontiac phoenix
375 | 24,4,140,92,2865,16.4,82,1,ford fairmont futura
376 | 36,4,105,74,1980,15.3,82,2,volkswagen rabbit l
377 | 37,4,91,68,2025,18.2,82,3,mazda glc custom l
378 | 31,4,91,68,1970,17.6,82,3,mazda glc custom
379 | 38,4,105,63,2125,14.7,82,1,plymouth horizon miser
380 | 36,4,98,70,2125,17.3,82,1,mercury lynx l
381 | 36,4,120,88,2160,14.5,82,3,nissan stanza xe
382 | 36,4,107,75,2205,14.5,82,3,honda accord
383 | 34,4,108,70,2245,16.9,82,3,toyota corolla
384 | 38,4,91,67,1965,15,82,3,honda civic
385 | 32,4,91,67,1965,15.7,82,3,honda civic (auto)
386 | 38,4,91,67,1995,16.2,82,3,datsun 310 gx
387 | 25,6,181,110,2945,16.4,82,1,buick century limited
388 | 38,6,262,85,3015,17,82,1,oldsmobile cutlass ciera (diesel)
389 | 26,4,156,92,2585,14.5,82,1,chrysler lebaron medallion
390 | 22,6,232,112,2835,14.7,82,1,ford granada l
391 | 32,4,144,96,2665,13.9,82,3,toyota celica gt
392 | 36,4,135,84,2370,13,82,1,dodge charger 2.2
393 | 27,4,151,90,2950,17.3,82,1,chevrolet camaro
394 | 27,4,140,86,2790,15.6,82,1,ford mustang gl
395 | 44,4,97,52,2130,24.6,82,2,vw pickup
396 | 32,4,135,84,2295,11.6,82,1,dodge rampage
397 | 28,4,120,79,2625,18.6,82,1,ford ranger
398 | 31,4,119,82,2720,19.4,82,1,chevy s-10
399 |
--------------------------------------------------------------------------------
/10-LinearRegression2/Overfitted_Data.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/10-LinearRegression2/Overfitted_Data.png
--------------------------------------------------------------------------------
/11-practical-data-visualization/10-exercise.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Exercise: Interactive Chart in Altair\n",
8 | "\n",
9 | "Create a scatterplot with dimensions of your choosing of the movies dataset that you can brush, and a stacked histogram that filters according to the brush. \n",
10 | "\n",
11 | "This is what your stacked histogram should look like: \n",
12 | "\n",
13 | "\n",
14 | ""
15 | ]
16 | },
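17 | {
18 | "cell_type": "markdown",
19 | "metadata": {},
20 | "source": [
21 | "*A possible starting point (not the official solution):* a minimal Altair sketch of a brushable scatterplot linked to a stacked histogram. It assumes Altair 5 (`add_params`) and column names from the Vega movies dataset (`IMDB_Rating`, `Rotten_Tomatoes_Rating`, `Major_Genre`); check `movies.csv` and pick your own dimensions."
22 | ]
23 | },
24 | {
25 | "cell_type": "code",
26 | "execution_count": null,
27 | "metadata": {},
28 | "outputs": [],
29 | "source": [
30 | "import altair as alt\n",
31 | "import pandas as pd\n",
32 | "\n",
33 | "movies = pd.read_csv(\"movies.csv\")\n",
34 | "\n",
35 | "# interval selection used as the brush on the scatterplot\n",
36 | "brush = alt.selection_interval()\n",
37 | "\n",
38 | "scatter = alt.Chart(movies).mark_point().encode(\n",
39 | "    x=\"Rotten_Tomatoes_Rating:Q\",  # assumed column name\n",
40 | "    y=\"IMDB_Rating:Q\"\n",
41 | ").add_params(brush)\n",
42 | "\n",
43 | "# stacked histogram: binned bars stacked by genre, filtered by the brush\n",
44 | "hist = alt.Chart(movies).mark_bar().encode(\n",
45 | "    x=alt.X(\"IMDB_Rating:Q\", bin=True),\n",
46 | "    y=\"count()\",\n",
47 | "    color=\"Major_Genre:N\"\n",
48 | ").transform_filter(brush)\n",
49 | "\n",
50 | "scatter & hist"
51 | ]
52 | },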
17 | {
18 | "cell_type": "code",
19 | "execution_count": null,
20 | "metadata": {},
21 | "outputs": [],
22 | "source": []
23 | }
24 | ],
25 | "metadata": {
26 | "kernelspec": {
27 | "display_name": "Python 3 (ipykernel)",
28 | "language": "python",
29 | "name": "python3"
30 | },
31 | "language_info": {
32 | "codemirror_mode": {
33 | "name": "ipython",
34 | "version": 3
35 | },
36 | "file_extension": ".py",
37 | "mimetype": "text/x-python",
38 | "name": "python",
39 | "nbconvert_exporter": "python",
40 | "pygments_lexer": "ipython3",
41 | "version": "3.9.7"
42 | }
43 | },
44 | "nbformat": 4,
45 | "nbformat_minor": 4
46 | }
47 |
--------------------------------------------------------------------------------
/11-practical-data-visualization/stacked_hist.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/11-practical-data-visualization/stacked_hist.png
--------------------------------------------------------------------------------
/11-practical-data-visualization/standards.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/11-practical-data-visualization/standards.png
--------------------------------------------------------------------------------
/12-vis-principles/11-vis-principles.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/12-vis-principles/11-vis-principles.pdf
--------------------------------------------------------------------------------
/13-web-scraping/class.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/13-web-scraping/class.png
--------------------------------------------------------------------------------
/13-web-scraping/inspector.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/13-web-scraping/inspector.png
--------------------------------------------------------------------------------
/13-web-scraping/lyrics.html:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
5 | Lyrics
6 |
7 |
8 |
9 | Published: 1969-10-22
10 | Led Zeppelin
11 | Ramble On
12 |
13 | Leaves are falling all around, It's time I was on my way.
14 | Thanks to you, I'm much obliged for such a pleasant stay.
15 | But now it's time for me to go. The autumn moon lights my way.
16 | For now I smell the rain, and with it pain, and it's headed my way.
17 |
18 |
19 |
20 | Published: 2016-05-03
21 | Radiohead
22 | Burn the Witch
23 |
24 | Stay in the shadows
25 | Cheer at the gallows
26 | This is a round up
27 | This is a low flying panic attack
28 | Sing a song on the jukebox that goes
29 | Burn the witch
30 | Burn the witch
31 | We know where you live
32 |
33 |
34 |
35 |
36 |
--------------------------------------------------------------------------------
/13-web-scraping/requests.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/13-web-scraping/requests.png
--------------------------------------------------------------------------------
/13-web-scraping/sampledevtools.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/13-web-scraping/sampledevtools.png
--------------------------------------------------------------------------------
/14-apis/14-exercise-apis.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "## Exercise 1 – APIs\n",
8 | "\n",
9 | "Use this public [Quotes API](https://github.com/lukePeavey/quotable) to find all quotes in their database by Oscar Wilde.\n",
10 | "\n",
11 | "1. How many quotes does this API have?\n",
12 | "2. List each quote they have available on a separate line."
13 | ]
14 | },
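15 | {
16 | "cell_type": "markdown",
17 | "metadata": {},
18 | "source": [
19 | "*A possible approach (not the official solution):* the sketch below pages through the API's `/quotes` endpoint. It assumes the base URL `https://api.quotable.io`, an `author` query parameter, and `results`/`totalPages` fields in the response, per the project README; check the docs if the API has changed. The starter cell below is for your own version."
20 | ]
21 | },
22 | {
23 | "cell_type": "code",
24 | "execution_count": null,
25 | "metadata": {},
26 | "outputs": [],
27 | "source": [
28 | "import requests\n",
29 | "\n",
30 | "# page through all results for one author (the API paginates)\n",
31 | "url = \"https://api.quotable.io/quotes\"\n",
32 | "page, quotes = 1, []\n",
33 | "while True:\n",
34 | "    data = requests.get(url, params={\"author\": \"Oscar Wilde\", \"page\": page}).json()\n",
35 | "    quotes += [q[\"content\"] for q in data[\"results\"]]\n",
36 | "    if page >= data[\"totalPages\"]:\n",
37 | "        break\n",
38 | "    page += 1\n",
39 | "\n",
40 | "print(len(quotes), \"quotes\")\n",
41 | "for q in quotes:\n",
42 | "    print(q)"
43 | ]
44 | },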
15 | {
16 | "cell_type": "code",
17 | "execution_count": null,
18 | "metadata": {},
19 | "outputs": [],
20 | "source": [
21 | "import requests "
22 | ]
23 | }
24 | ],
25 | "metadata": {
26 | "kernelspec": {
27 | "display_name": "Python 3 (ipykernel)",
28 | "language": "python",
29 | "name": "python3"
30 | },
31 | "language_info": {
32 | "codemirror_mode": {
33 | "name": "ipython",
34 | "version": 3
35 | },
36 | "file_extension": ".py",
37 | "mimetype": "text/x-python",
38 | "name": "python",
39 | "nbconvert_exporter": "python",
40 | "pygments_lexer": "ipython3",
41 | "version": "3.9.6"
42 | }
43 | },
44 | "nbformat": 4,
45 | "nbformat_minor": 2
46 | }
47 |
--------------------------------------------------------------------------------
/14-apis/credentials.py:
--------------------------------------------------------------------------------
1 | # Fill these in with your keys. You may not need all of them.
2 | API_KEY = ""
3 | API_KEY_SECRET = ""
4 | BEARER_TOKEN = ""
5 | ACCESS_TOKEN = ""
6 | ACCESS_TOKEN_SECRET = ""
7 | CLIENT_ID = ""
8 | CLIENT_SECRET = ""
9 |
--------------------------------------------------------------------------------
/14-apis/pokeapiscreenshot.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/14-apis/pokeapiscreenshot.png
--------------------------------------------------------------------------------
/14-apis/pokemonendpoint.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/14-apis/pokemonendpoint.png
--------------------------------------------------------------------------------
/14-apis/requests.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/14-apis/requests.png
--------------------------------------------------------------------------------
/15-Classification1/BiasVarianceTradeoff.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/15-Classification1/BiasVarianceTradeoff.png
--------------------------------------------------------------------------------
/15-Classification1/BinaryConfusinoMatrix.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/15-Classification1/BinaryConfusinoMatrix.png
--------------------------------------------------------------------------------
/15-Classification1/ConfusionMatrix.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/15-Classification1/ConfusionMatrix.png
--------------------------------------------------------------------------------
/15-Classification1/iris.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/15-Classification1/iris.png
--------------------------------------------------------------------------------
/15-Classification1/oc-tree.jpeg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/15-Classification1/oc-tree.jpeg
--------------------------------------------------------------------------------
/15-Classification1/p_sets.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/15-Classification1/p_sets.png
--------------------------------------------------------------------------------
/15-Classification1/scikit-learn-logo.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/15-Classification1/scikit-learn-logo.png
--------------------------------------------------------------------------------
/15-Classification1/temp.dot:
--------------------------------------------------------------------------------
1 | digraph Tree {
2 | node [shape=box, style="filled, rounded", color="black", fontname="helvetica"] ;
3 | edge [fontname="helvetica"] ;
4 | 0 [label=gini = 0.459
samples = 445
value = [286, 159]
class = Perished>, fillcolor="#f3c7a7"] ;
5 | 1 [label=gini = 0.254
samples = 288
value = [245, 43]
class = Perished>, fillcolor="#ea975c"] ;
6 | 0 -> 1 [labeldistance=2.5, labelangle=45, headlabel="True"] ;
7 | 2 [label=samples = 12
value = [4, 8]
class = Survived>, fillcolor="#9ccef2"] ;
8 | 1 -> 2 ;
9 | 3 [label=gini = 0.221
samples = 276
value = [241, 35]
class = Perished>, fillcolor="#e99356"] ;
10 | 1 -> 3 ;
11 | 4 [label=gini = 0.413
samples = 55
value = [39, 16]
class = Perished>, fillcolor="#f0b58a"] ;
12 | 3 -> 4 ;
13 | 5 [label=samples = 6
value = [6, 0]
class = Perished>, fillcolor="#e58139"] ;
14 | 4 -> 5 ;
15 | 6 [label=gini = 0.44
samples = 49
value = [33, 16]
class = Perished>, fillcolor="#f2be99"] ;
16 | 4 -> 6 ;
17 | 7 [label=samples = 2
value = [0, 2]
class = Survived>, fillcolor="#399de5"] ;
18 | 6 -> 7 ;
19 | 8 [label=gini = 0.418
samples = 47
value = [33, 14]
class = Perished>, fillcolor="#f0b68d"] ;
20 | 6 -> 8 ;
21 | 9 [label=gini = 0.368
samples = 37
value = [28, 9]
class = Perished>, fillcolor="#edaa79"] ;
22 | 8 -> 9 ;
23 | 10 [label=gini = 0.346
samples = 36
value = [28, 8]
class = Perished>, fillcolor="#eca572"] ;
24 | 9 -> 10 ;
25 | 11 [label=samples = 6
value = [3, 3]
class = Perished>, fillcolor="#ffffff"] ;
26 | 10 -> 11 ;
27 | 12 [label=gini = 0.278
samples = 30
value = [25, 5]
class = Perished>, fillcolor="#ea9a61"] ;
28 | 10 -> 12 ;
29 | 13 [label=gini = 0.363
samples = 21
value = [16, 5]
class = Perished>, fillcolor="#eda877"] ;
30 | 12 -> 13 ;
31 | 14 [label=gini = 0.266
samples = 19
value = [16, 3]
class = Perished>, fillcolor="#ea995e"] ;
32 | 13 -> 14 ;
33 | 15 [label=samples = 12
value = [9, 3]
class = Perished>, fillcolor="#eeab7b"] ;
34 | 14 -> 15 ;
35 | 16 [label=samples = 7
value = [7, 0]
class = Perished>, fillcolor="#e58139"] ;
36 | 14 -> 16 ;
37 | 17 [label=samples = 2
value = [0, 2]
class = Survived>, fillcolor="#399de5"] ;
38 | 13 -> 17 ;
39 | 18 [label=samples = 9
value = [9, 0]
class = Perished>, fillcolor="#e58139"] ;
40 | 12 -> 18 ;
41 | 19 [label=samples = 1
value = [0, 1]
class = Survived>, fillcolor="#399de5"] ;
42 | 9 -> 19 ;
43 | 20 [label=samples = 10
value = [5, 5]
class = Perished>, fillcolor="#ffffff"] ;
44 | 8 -> 20 ;
45 | 21 [label=gini = 0.157
samples = 221
value = [202, 19]
class = Perished>, fillcolor="#e78d4c"] ;
46 | 3 -> 21 ;
47 | 22 [label=gini = 0.174
samples = 198
value = [179, 19]
class = Perished>, fillcolor="#e88e4e"] ;
48 | 21 -> 22 ;
49 | 23 [label=gini = 0.135
samples = 151
value = [140, 11]
class = Perished>, fillcolor="#e78b49"] ;
50 | 22 -> 23 ;
51 | 24 [label=gini = 0.302
samples = 27
value = [22, 5]
class = Perished>, fillcolor="#eb9e66"] ;
52 | 23 -> 24 ;
53 | 25 [label=gini = 0.391
samples = 15
value = [11, 4]
class = Perished>, fillcolor="#eeaf81"] ;
54 | 24 -> 25 ;
55 | 26 [label=samples = 14
value = [11, 3]
class = Perished>, fillcolor="#eca36f"] ;
56 | 25 -> 26 ;
57 | 27 [label=samples = 1
value = [0, 1]
class = Survived>, fillcolor="#399de5"] ;
58 | 25 -> 27 ;
59 | 28 [label=samples = 12
value = [11, 1]
class = Perished>, fillcolor="#e78c4b"] ;
60 | 24 -> 28 ;
61 | 29 [label=gini = 0.092
samples = 124
value = [118, 6]
class = Perished>, fillcolor="#e68743"] ;
62 | 23 -> 29 ;
63 | 30 [label=samples = 10
value = [8, 2]
class = Perished>, fillcolor="#eca06a"] ;
64 | 29 -> 30 ;
65 | 31 [label=gini = 0.068
samples = 114
value = [110, 4]
class = Perished>, fillcolor="#e68640"] ;
66 | 29 -> 31 ;
67 | 32 [label=gini = 0.153
samples = 24
value = [22, 2]
class = Perished>, fillcolor="#e78c4b"] ;
68 | 31 -> 32 ;
69 | 33 [label=samples = 11
value = [9, 2]
class = Perished>, fillcolor="#eb9d65"] ;
70 | 32 -> 33 ;
71 | 34 [label=samples = 13
value = [13, 0]
class = Perished>, fillcolor="#e58139"] ;
72 | 32 -> 34 ;
73 | 35 [label=gini = 0.043
samples = 90
value = [88, 2]
class = Perished>, fillcolor="#e6843d"] ;
74 | 31 -> 35 ;
75 | 36 [label=samples = 50
value = [50, 0]
class = Perished>, fillcolor="#e58139"] ;
76 | 35 -> 36 ;
77 | 37 [label=gini = 0.095
samples = 40
value = [38, 2]
class = Perished>, fillcolor="#e68843"] ;
78 | 35 -> 37 ;
79 | 38 [label=samples = 1
value = [0, 1]
class = Survived>, fillcolor="#399de5"] ;
80 | 37 -> 38 ;
81 | 39 [label=gini = 0.05
samples = 39
value = [38, 1]
class = Perished>, fillcolor="#e6843e"] ;
82 | 37 -> 39 ;
83 | 40 [label=samples = 33
value = [33, 0]
class = Perished>, fillcolor="#e58139"] ;
84 | 39 -> 40 ;
85 | 41 [label=samples = 6
value = [5, 1]
class = Perished>, fillcolor="#ea9a61"] ;
86 | 39 -> 41 ;
87 | 42 [label=gini = 0.282
samples = 47
value = [39, 8]
class = Perished>, fillcolor="#ea9b62"] ;
88 | 22 -> 42 ;
89 | 43 [label=samples = 10
value = [6, 4]
class = Perished>, fillcolor="#f6d5bd"] ;
90 | 42 -> 43 ;
91 | 44 [label=gini = 0.193
samples = 37
value = [33, 4]
class = Perished>, fillcolor="#e89051"] ;
92 | 42 -> 44 ;
93 | 45 [label=gini = 0.157
samples = 35
value = [32, 3]
class = Perished>, fillcolor="#e78d4c"] ;
94 | 44 -> 45 ;
95 | 46 [label=gini = 0.227
samples = 23
value = [20, 3]
class = Perished>, fillcolor="#e99457"] ;
96 | 45 -> 46 ;
97 | 47 [label=gini = 0.105
samples = 18
value = [17, 1]
class = Perished>, fillcolor="#e78845"] ;
98 | 46 -> 47 ;
99 | 48 [label=samples = 13
value = [13, 0]
class = Perished>, fillcolor="#e58139"] ;
100 | 47 -> 48 ;
101 | 49 [label=samples = 5
value = [4, 1]
class = Perished>, fillcolor="#eca06a"] ;
102 | 47 -> 49 ;
103 | 50 [label=samples = 5
value = [3, 2]
class = Perished>, fillcolor="#f6d5bd"] ;
104 | 46 -> 50 ;
105 | 51 [label=samples = 12
value = [12, 0]
class = Perished>, fillcolor="#e58139"] ;
106 | 45 -> 51 ;
107 | 52 [label=samples = 2
value = [1, 1]
class = Perished>, fillcolor="#ffffff"] ;
108 | 44 -> 52 ;
109 | 53 [label=samples = 23
value = [23, 0]
class = Perished>, fillcolor="#e58139"] ;
110 | 21 -> 53 ;
111 | 54 [label=gini = 0.386
samples = 157
value = [41, 116]
class = Survived>, fillcolor="#7fc0ee"] ;
112 | 0 -> 54 [labeldistance=2.5, labelangle=-45, headlabel="False"] ;
113 | 55 [label=gini = 0.131
samples = 85
value = [6, 79]
class = Survived>, fillcolor="#48a4e7"] ;
114 | 54 -> 55 ;
115 | 56 [label=samples = 1
value = [1, 0]
class = Perished>, fillcolor="#e58139"] ;
116 | 55 -> 56 ;
117 | 57 [label=gini = 0.112
samples = 84
value = [5, 79]
class = Survived>, fillcolor="#46a3e7"] ;
118 | 55 -> 57 ;
119 | 58 [label=gini = 0.251
samples = 34
value = [5, 29]
class = Survived>, fillcolor="#5baee9"] ;
120 | 57 -> 58 ;
121 | 59 [label=gini = 0.213
samples = 33
value = [4, 29]
class = Survived>, fillcolor="#54abe9"] ;
122 | 58 -> 59 ;
123 | 60 [label=samples = 6
value = [0, 6]
class = Survived>, fillcolor="#399de5"] ;
124 | 59 -> 60 ;
125 | 61 [label=gini = 0.252
samples = 27
value = [4, 23]
class = Survived>, fillcolor="#5baeea"] ;
126 | 59 -> 61 ;
127 | 62 [label=samples = 4
value = [2, 2]
class = Perished>, fillcolor="#ffffff"] ;
128 | 61 -> 62 ;
129 | 63 [label=gini = 0.159
samples = 23
value = [2, 21]
class = Survived>, fillcolor="#4ca6e7"] ;
130 | 61 -> 63 ;
131 | 64 [label=samples = 13
value = [0, 13]
class = Survived>, fillcolor="#399de5"] ;
132 | 63 -> 64 ;
133 | 65 [label=samples = 10
value = [2, 8]
class = Survived>, fillcolor="#6ab6ec"] ;
134 | 63 -> 65 ;
135 | 66 [label=samples = 1
value = [1, 0]
class = Perished>, fillcolor="#e58139"] ;
136 | 58 -> 66 ;
137 | 67 [label=samples = 50
value = [0, 50]
class = Survived>, fillcolor="#399de5"] ;
138 | 57 -> 67 ;
139 | 68 [label=gini = 0.5
samples = 72
value = [35, 37]
class = Survived>, fillcolor="#f4fafe"] ;
140 | 54 -> 68 ;
141 | 69 [label=gini = 0.454
samples = 46
value = [30, 16]
class = Perished>, fillcolor="#f3c4a3"] ;
142 | 68 -> 69 ;
143 | 70 [label=gini = 0.499
samples = 29
value = [14, 15]
class = Survived>, fillcolor="#f2f8fd"] ;
144 | 69 -> 70 ;
145 | 71 [label=gini = 0.444
samples = 15
value = [10, 5]
class = Perished>, fillcolor="#f2c09c"] ;
146 | 70 -> 71 ;
147 | 72 [label=samples = 8
value = [3, 5]
class = Survived>, fillcolor="#b0d8f5"] ;
148 | 71 -> 72 ;
149 | 73 [label=samples = 7
value = [7, 0]
class = Perished>, fillcolor="#e58139"] ;
150 | 71 -> 73 ;
151 | 74 [label=samples = 14
value = [4, 10]
class = Survived>, fillcolor="#88c4ef"] ;
152 | 70 -> 74 ;
153 | 75 [label=gini = 0.111
samples = 17
value = [16, 1]
class = Perished>, fillcolor="#e78945"] ;
154 | 69 -> 75 ;
155 | 76 [label=samples = 15
value = [15, 0]
class = Perished>, fillcolor="#e58139"] ;
156 | 75 -> 76 ;
157 | 77 [label=samples = 2
value = [1, 1]
class = Perished>, fillcolor="#ffffff"] ;
158 | 75 -> 77 ;
159 | 78 [label=gini = 0.311
samples = 26
value = [5, 21]
class = Survived>, fillcolor="#68b4eb"] ;
160 | 68 -> 78 ;
161 | 79 [label=gini = 0.219
samples = 24
value = [3, 21]
class = Survived>, fillcolor="#55abe9"] ;
162 | 78 -> 79 ;
163 | 80 [label=samples = 9
value = [0, 9]
class = Survived>, fillcolor="#399de5"] ;
164 | 79 -> 80 ;
165 | 81 [label=gini = 0.32
samples = 15
value = [3, 12]
class = Survived>, fillcolor="#6ab6ec"] ;
166 | 79 -> 81 ;
167 | 82 [label=samples = 5
value = [2, 3]
class = Survived>, fillcolor="#bddef6"] ;
168 | 81 -> 82 ;
169 | 83 [label=samples = 10
value = [1, 9]
class = Survived>, fillcolor="#4fa8e8"] ;
170 | 81 -> 83 ;
171 | 84 [label=samples = 2
value = [2, 0]
class = Perished>, fillcolor="#e58139"] ;
172 | 78 -> 84 ;
173 | }
--------------------------------------------------------------------------------
/15-Classification1/temp.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/15-Classification1/temp.png
--------------------------------------------------------------------------------
/15-Classification1/titanic_tree.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/15-Classification1/titanic_tree.png
--------------------------------------------------------------------------------
/16-Classification2/4fold_CV.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/16-Classification2/4fold_CV.png
--------------------------------------------------------------------------------
/16-Classification2/SVM-Tutorial.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/16-Classification2/SVM-Tutorial.pdf
--------------------------------------------------------------------------------
/16-Classification2/iris.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/16-Classification2/iris.png
--------------------------------------------------------------------------------
/17-NLP-RegEx/lecture-21-exercise.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Introduction to Data Science – Text Munging Exercises\n",
8 | "*COMP 5360 / MATH 4100, University of Utah, http://datasciencecourse.net/* "
9 | ]
10 | },
11 | {
12 | "cell_type": "markdown",
13 | "metadata": {},
14 | "source": [
15 | "## NLP\n",
16 | "\n",
17 | "### Exercise 1.1: Frequent Words\n",
18 | "Find the most frequently used words in Moby Dick which are not stopwords and not punctuation. Hint: [`str.isalpha()`](https://docs.python.org/3/library/stdtypes.html#str.isalpha) could be useful here."
19 | ]
20 | },
21 | {
22 | "cell_type": "code",
23 | "execution_count": null,
24 | "metadata": {},
25 | "outputs": [],
26 | "source": [
27 | "import nltk\n",
28 | "from nltk.corpus import stopwords\n",
29 | "stopwords = nltk.corpus.stopwords.words('english')\n",
30 | "from nltk.book import *"
31 | ]
32 | },
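33 | {
34 | "cell_type": "markdown",
35 | "metadata": {},
36 | "source": [
37 | "*A possible approach (a sketch, not the only answer):* lowercase the tokens, keep only alphabetic words, drop stopwords, and count with `FreqDist`. Here `text1` is Moby Dick, loaded by `from nltk.book import *` above."
38 | ]
39 | },
40 | {
41 | "cell_type": "code",
42 | "execution_count": null,
43 | "metadata": {},
44 | "outputs": [],
45 | "source": [
46 | "from nltk import FreqDist\n",
47 | "\n",
48 | "# keep alphabetic, non-stopword tokens, then count them\n",
49 | "words = [w.lower() for w in text1 if w.isalpha() and w.lower() not in stopwords]\n",
50 | "FreqDist(words).most_common(20)"
51 | ]
52 | },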
33 | {
34 | "cell_type": "code",
35 | "execution_count": null,
36 | "metadata": {},
37 | "outputs": [],
38 | "source": []
39 | },
40 | {
41 | "cell_type": "markdown",
42 | "metadata": {},
43 | "source": [
44 | "## Exercise 2.1\n",
45 | "\n",
46 | "You're an evil Spammer who's observed that many people try to obfuscate their e-mail using this notation: \"`alex at utah dot edu`\". Below are three examples of such e-mails text. Try to extract \"alex at utah dot edu\", etc. Start with the first string. Then extend your regular expression to work on all of them at the same time. Note that the second and third are slightly harder to do! "
47 | ]
48 | },
49 | {
50 | "cell_type": "code",
51 | "execution_count": null,
52 | "metadata": {},
53 | "outputs": [],
54 | "source": [
55 | "import re\n",
56 | "html_smart = \"You can reach me: alex at utah dot edu\"\n",
57 | "html_smart2 = \"You can reach me: alex dot lex at utah dot edu\"\n",
58 | "html_smart3 = \"You can reach me: alex dot lex at sci dot utah dot edu\""
59 | ]
60 | },
61 | {
62 | "cell_type": "code",
63 | "execution_count": null,
64 | "metadata": {},
65 | "outputs": [],
66 | "source": [
67 | "def testRegex(regex):\n",
68 | " for html in (html_smart, html_smart2, html_smart3):\n",
69 | " print(re.search(regex, html).group())"
70 | ]
71 | },
72 | {
73 | "cell_type": "code",
74 | "execution_count": null,
75 | "metadata": {},
76 | "outputs": [],
77 | "source": [
78 | "# TODO write your regex here\n",
79 | "mail_regex = \"\""
80 | ]
81 | },
82 | {
83 | "cell_type": "code",
84 | "execution_count": null,
85 | "metadata": {},
86 | "outputs": [],
87 | "source": [
88 | "testRegex(mail_regex)"
89 | ]
90 | },
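91 | {
92 | "cell_type": "markdown",
93 | "metadata": {},
94 | "source": [
95 | "*One regex that handles all three strings (a sketch; other solutions exist):* a word, optionally followed by \" dot word\" parts, then \" at \", then a domain with at least one \" dot \" part."
96 | ]
97 | },
98 | {
99 | "cell_type": "code",
100 | "execution_count": null,
101 | "metadata": {},
102 | "outputs": [],
103 | "source": [
104 | "mail_regex = r\"\\w+(?: dot \\w+)* at \\w+(?: dot \\w+)+\"\n",
105 | "testRegex(mail_regex)"
106 | ]
107 | },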
91 | {
92 | "cell_type": "markdown",
93 | "metadata": {},
94 | "source": [
95 | "## Exercise 2.2: Find Adverbs\n",
96 | "\n",
97 | "Write a regular expression that finds all adverbs in a sentence. Adverbs are characterized by ending in \"ly\"."
98 | ]
99 | },
100 | {
101 | "cell_type": "code",
102 | "execution_count": null,
103 | "metadata": {},
104 | "outputs": [],
105 | "source": [
106 | "text = \"He was carefully disguised but captured quickly by police.\""
107 | ]
108 | },
109 | {
110 | "cell_type": "code",
111 | "execution_count": null,
112 | "metadata": {},
113 | "outputs": [],
114 | "source": []
115 | },
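116 | {
117 | "cell_type": "markdown",
118 | "metadata": {},
119 | "source": [
120 | "*A minimal sketch,* using the \"ends in ly\" simplification stated above:"
121 | ]
122 | },
123 | {
124 | "cell_type": "code",
125 | "execution_count": null,
126 | "metadata": {},
127 | "outputs": [],
128 | "source": [
129 | "import re\n",
130 | "\n",
131 | "# any word ending in \"ly\"\n",
132 | "re.findall(r\"\\w+ly\\b\", text)"
133 | ]
134 | },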
116 | {
117 | "cell_type": "markdown",
118 | "metadata": {},
119 | "source": [
120 | "### Exercise 2.3: Phone Numbers\n",
121 | "\n",
122 | "Extract the phone numbers that follow a (xxx) xxx-xxxx pattern from the text:"
123 | ]
124 | },
125 | {
126 | "cell_type": "code",
127 | "execution_count": null,
128 | "metadata": {},
129 | "outputs": [],
130 | "source": [
131 | "phone_numbers = \"\"\"(857) 131-2235, (801) 134-2215, but this one (12) 13044441 shouldnt match. \\\n",
132 | "Also, this is common in twelve (12) countries and one (1) state\"\"\""
133 | ]
134 | },
135 | {
136 | "cell_type": "code",
137 | "execution_count": null,
138 | "metadata": {},
139 | "outputs": [],
140 | "source": []
141 | },
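142 | {
143 | "cell_type": "markdown",
144 | "metadata": {},
145 | "source": [
146 | "*A minimal sketch:* match exactly three digits in parentheses, a space, three digits, a hyphen, and four digits, so the \"(12)\" examples cannot match."
147 | ]
148 | },
149 | {
150 | "cell_type": "code",
151 | "execution_count": null,
152 | "metadata": {},
153 | "outputs": [],
154 | "source": [
155 | "import re\n",
156 | "\n",
157 | "# (xxx) xxx-xxxx with literal parentheses escaped\n",
158 | "re.findall(r\"\\(\\d{3}\\) \\d{3}-\\d{4}\", phone_numbers)"
159 | ]
160 | },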
142 | {
143 | "cell_type": "markdown",
144 | "metadata": {},
145 | "source": [
146 | "### Exercise 2.4: HTML Content\n",
147 | "\n",
148 | "Extract the content between the `` and `` tags but not the other tags:"
149 | ]
150 | },
151 | {
152 | "cell_type": "code",
153 | "execution_count": null,
154 | "metadata": {},
155 | "outputs": [],
156 | "source": [
157 | "html_tags = \"This is important and verytimely\""
158 | ]
159 | },
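160 | {
161 | "cell_type": "markdown",
162 | "metadata": {},
163 | "source": [
164 | "*A minimal sketch:* a lazy match between `<b>` and `</b>` captures only the bold content, so the `<i>` content is left out."
165 | ]
166 | },
167 | {
168 | "cell_type": "code",
169 | "execution_count": null,
170 | "metadata": {},
171 | "outputs": [],
172 | "source": [
173 | "import re\n",
174 | "\n",
175 | "# lazy .*? stops at the first closing </b>\n",
176 | "re.findall(r\"<b>(.*?)</b>\", html_tags)"
177 | ]
178 | },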
160 | {
161 | "cell_type": "code",
162 | "execution_count": null,
163 | "metadata": {},
164 | "outputs": [],
165 | "source": []
166 | }
167 | ],
168 | "metadata": {
169 | "kernelspec": {
170 | "display_name": "Python 3 (ipykernel)",
171 | "language": "python",
172 | "name": "python3"
173 | },
174 | "language_info": {
175 | "codemirror_mode": {
176 | "name": "ipython",
177 | "version": 3
178 | },
179 | "file_extension": ".py",
180 | "mimetype": "text/x-python",
181 | "name": "python",
182 | "nbconvert_exporter": "python",
183 | "pygments_lexer": "ipython3",
184 | "version": "3.9.6"
185 | }
186 | },
187 | "nbformat": 4,
188 | "nbformat_minor": 2
189 | }
190 |
--------------------------------------------------------------------------------
/17-NLP-RegEx/mod_squad.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/17-NLP-RegEx/mod_squad.png
--------------------------------------------------------------------------------
/18-Clustering1/k-means-fig.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/18-Clustering1/k-means-fig.png
--------------------------------------------------------------------------------
/18-Clustering1/lloyd.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/18-Clustering1/lloyd.png
--------------------------------------------------------------------------------
/19-Clustering2/ComparisonOfClusteringMethods.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/19-Clustering2/ComparisonOfClusteringMethods.png
--------------------------------------------------------------------------------
/19-Clustering2/DBScan.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/19-Clustering2/DBScan.png
--------------------------------------------------------------------------------
/19-Clustering2/connectivity_plot1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/19-Clustering2/connectivity_plot1.png
--------------------------------------------------------------------------------
/19-Clustering2/connectivity_plot2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/19-Clustering2/connectivity_plot2.png
--------------------------------------------------------------------------------
/19-Clustering2/dendrogram.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/19-Clustering2/dendrogram.png
--------------------------------------------------------------------------------
/19-Clustering2/hc_1_homogeneous_complete.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/19-Clustering2/hc_1_homogeneous_complete.png
--------------------------------------------------------------------------------
/19-Clustering2/hc_2_homogeneous_not_complete.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/19-Clustering2/hc_2_homogeneous_not_complete.png
--------------------------------------------------------------------------------
/19-Clustering2/hc_3_complete_not_homogeneous.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/19-Clustering2/hc_3_complete_not_homogeneous.png
--------------------------------------------------------------------------------
/19-Clustering2/hierarchical_clustering_1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/19-Clustering2/hierarchical_clustering_1.png
--------------------------------------------------------------------------------
/19-Clustering2/hierarchical_clustering_2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/19-Clustering2/hierarchical_clustering_2.png
--------------------------------------------------------------------------------
/19-Clustering2/lloyd.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/19-Clustering2/lloyd.png
--------------------------------------------------------------------------------
/20-DimReduction/20-DimReduction-Activity.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {
6 | "slideshow": {
7 | "slide_type": "slide"
8 | }
9 | },
10 | "source": [
11 | "# Introduction to Data Science \n",
12 | "# Lecture 20: Dimension Reduction - Activity\n",
13 | "*COMP 5360 / MATH 4100, University of Utah, http://datasciencecourse.net/*"
14 | ]
15 | },
16 | {
17 | "cell_type": "code",
18 | "execution_count": 1,
19 | "metadata": {
20 | "slideshow": {
21 | "slide_type": "slide"
22 | }
23 | },
24 | "outputs": [],
25 | "source": [
26 | "# imports and setup \n",
27 | "\n",
28 | "import numpy as np\n",
29 | "\n",
30 | "import pandas as pd\n",
31 | "pd.set_option('display.notebook_repr_html', False)\n",
32 | "\n",
33 | "from sklearn.datasets import load_iris, load_digits\n",
34 | "from sklearn.preprocessing import scale\n",
35 | "from sklearn.decomposition import PCA \n",
36 | "from sklearn.cluster import KMeans, AgglomerativeClustering\n",
37 | "from sklearn import metrics\n",
38 | "from sklearn.metrics import homogeneity_score, v_measure_score\n",
39 | "\n",
40 | "import matplotlib.pyplot as plt\n",
41 | "from matplotlib.colors import ListedColormap\n",
42 | "from mpl_toolkits.mplot3d import Axes3D\n",
43 | "%matplotlib inline\n",
44 | "plt.rcParams['figure.figsize'] = (10, 6)\n",
45 | "plt.style.use('ggplot')\n",
46 | "\n",
47 | "import seaborn as sns"
48 | ]
49 | },
50 | {
51 | "cell_type": "markdown",
52 | "metadata": {},
53 | "source": [
54 | "In this activity we will consider an RNA data set taken from [here](https://www.nature.com/articles/s41592-019-0425-8). This data set contains genetic information on 296 different cells, recording 3000 distinct gene counts/features for each cell. The cells were synthetically generated in various mixtures (7 different cell types) so that ground truth cell type information is in fact available. Note this data has already been imputed and scaled, so you don't need to rescale."
55 | ]
56 | },
57 | {
58 | "cell_type": "code",
59 | "execution_count": 68,
60 | "metadata": {},
61 | "outputs": [
62 | {
63 | "name": "stdout",
64 | "output_type": "stream",
65 | "text": [
66 | "\n",
67 | "RangeIndex: 296 entries, 0 to 295\n",
68 | "Columns: 3000 entries, ENSG00000019582 to ENSG00000116560\n",
69 | "dtypes: float64(3000)\n",
70 | "memory usage: 6.8 MB\n",
71 | "None\n",
72 | " ENSG00000019582 ENSG00000132432 ENSG00000234745 ENSG00000146648 \\\n",
73 | "0 4.317488 3.135494 4.382027 2.397895 \n",
74 | "1 3.258097 4.382027 3.135494 3.688879 \n",
75 | "2 3.465736 4.844187 3.496508 4.007333 \n",
76 | "3 3.332205 5.017280 3.465736 4.025352 \n",
77 | "4 3.332205 5.111988 3.526361 4.060443 \n",
78 | "\n",
79 | " ENSG00000108602 ERCC-00130 ERCC-00096 ERCC-00002 ERCC-00046 \\\n",
80 | "0 3.806662 3.713572 3.496508 3.610918 2.995732 \n",
81 | "1 2.564949 5.043425 4.836282 4.875197 4.077537 \n",
82 | "2 2.890372 4.442651 4.110874 4.262680 3.433987 \n",
83 | "3 2.890372 4.369448 4.094345 4.262680 3.465736 \n",
84 | "4 2.833213 4.406719 4.143135 4.234107 3.496508 \n",
85 | "\n",
86 | " ERCC-00074 ... ENSG00000197619 ENSG00000114450 ENSG00000147592 \\\n",
87 | "0 3.663562 ... 0.0 1.386294 1.386294 \n",
88 | "1 4.844187 ... 0.0 0.693147 1.386294 \n",
89 | "2 4.219508 ... 0.0 0.693147 1.609438 \n",
90 | "3 4.204693 ... 0.0 1.098612 1.945910 \n",
91 | "4 4.234107 ... 0.0 1.098612 1.098612 \n",
92 | "\n",
93 | " ENSG00000184897 ENSG00000157765 ENSG00000116273 ENSG00000000003 \\\n",
94 | "0 2.302585 0.693147 0.000000 1.098612 \n",
95 | "1 2.079442 0.000000 0.000000 1.098612 \n",
96 | "2 2.197225 0.000000 0.000000 1.386294 \n",
97 | "3 2.302585 0.000000 0.000000 1.098612 \n",
98 | "4 2.484907 0.000000 0.693147 1.098612 \n",
99 | "\n",
100 | " ENSG00000250120 ENSG00000069122 ENSG00000116560 \n",
101 | "0 0.000000 0.693147 2.890372 \n",
102 | "1 0.000000 0.693147 2.772589 \n",
103 | "2 0.000000 0.000000 2.639057 \n",
104 | "3 0.000000 0.000000 2.772589 \n",
105 | "4 0.693147 0.693147 2.833213 \n",
106 | "\n",
107 | "[5 rows x 3000 columns]\n"
108 | ]
109 | },
110 | {
111 | "data": {
112 | "text/plain": [
113 | " CellType\n",
114 | "CellNumber \n",
115 | "1 1\n",
116 | "2 2\n",
117 | "3 2\n",
118 | "4 2\n",
119 | "5 2\n",
120 | "... ...\n",
121 | "292 7\n",
122 | "293 7\n",
123 | "294 7\n",
124 | "295 7\n",
125 | "296 7\n",
126 | "\n",
127 | "[296 rows x 1 columns]"
128 | ]
129 | },
130 | "execution_count": 68,
131 | "metadata": {},
132 | "output_type": "execute_result"
133 | }
134 | ],
135 | "source": [
136 | "# Read in the data\n",
137 | "\n",
138 | "rna_data = pd.read_csv(\"rnamix1_SCT.csv\")\n",
139 | "rna_labels = pd.read_csv(\"rnamix1_labels.csv\",index_col=0)\n",
140 | "print(rna_data.info())\n",
141 | "print(rna_data.head())\n",
142 | "rna_labels"
143 | ]
144 | },
145 | {
146 | "cell_type": "markdown",
147 | "metadata": {},
148 | "source": [
149 | "(1) Make a 2-dimensional PCA plot of the data, and color it by the cell types. "
150 | ]
151 | },
152 | {
153 | "cell_type": "code",
154 | "execution_count": 69,
155 | "metadata": {},
156 | "outputs": [],
157 | "source": [
158 | "# Your code goes here"
159 | ]
160 | },
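161 | {
162 | "cell_type": "markdown",
163 | "metadata": {},
164 | "source": [
165 | "*A possible solution sketch for (1),* using the `PCA` and `matplotlib` imports from the setup cell; the `pcs` variable introduced here is reused in the later sketches."
166 | ]
167 | },
168 | {
169 | "cell_type": "code",
170 | "execution_count": null,
171 | "metadata": {},
172 | "outputs": [],
173 | "source": [
174 | "pca = PCA(n_components=2)\n",
175 | "pcs = pca.fit_transform(rna_data)  # data is already imputed and scaled\n",
176 | "\n",
177 | "plt.scatter(pcs[:, 0], pcs[:, 1], c=rna_labels[\"CellType\"], cmap=\"tab10\", s=20)\n",
178 | "plt.xlabel(\"PC1\")\n",
179 | "plt.ylabel(\"PC2\")\n",
180 | "plt.title(\"RNA mixture data, first two PCs, colored by cell type\")\n",
181 | "plt.show()"
182 | ]
183 | },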
161 | {
162 | "cell_type": "markdown",
163 | "metadata": {},
164 | "source": [
165 | "(2) What percentage of the variance is captured by the first 2 PC's? Make a plot showing the decay of the variance explained by the first 100 PC's. How many PC's would you need to capture 90% of the variance?"
166 | ]
167 | },
168 | {
169 | "cell_type": "code",
170 | "execution_count": 70,
171 | "metadata": {},
172 | "outputs": [],
173 | "source": [
174 | "# Your code goes here"
175 | ]
176 | },
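177 | {
178 | "cell_type": "markdown",
179 | "metadata": {},
180 | "source": [
181 | "*A possible solution sketch for (2):* fit a full PCA (296 components, the number of cells) and read the answers off `explained_variance_ratio_`."
182 | ]
183 | },
184 | {
185 | "cell_type": "code",
186 | "execution_count": null,
187 | "metadata": {},
188 | "outputs": [],
189 | "source": [
190 | "pca_full = PCA().fit(rna_data)\n",
191 | "evr = pca_full.explained_variance_ratio_\n",
192 | "print(\"First 2 PCs: {:.1%} of the variance\".format(evr[:2].sum()))\n",
193 | "\n",
194 | "plt.plot(np.cumsum(evr[:100]))\n",
195 | "plt.xlabel(\"number of PCs\")\n",
196 | "plt.ylabel(\"cumulative variance explained\")\n",
197 | "plt.show()\n",
198 | "\n",
199 | "# first index where the cumulative sum crosses 90%\n",
200 | "n90 = np.argmax(np.cumsum(evr) >= 0.9) + 1\n",
201 | "print(\"PCs needed for 90% of the variance:\", n90)"
202 | ]
203 | },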
177 | {
178 | "cell_type": "markdown",
179 | "metadata": {},
180 | "source": [
181 | "(3) Calculate the v_measure_score obtained by running kmeans with k = 7 on the 2-dimensional PCA plot. Can you achieve a higher score by using more PCs?"
182 | ]
183 | },
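184 | {
185 | "cell_type": "markdown",
186 | "metadata": {},
187 | "source": [
188 | "*A possible solution sketch for (3),* reusing `pcs` from the sketch for (1); the empty cell below is for your own version."
189 | ]
190 | },
191 | {
192 | "cell_type": "code",
193 | "execution_count": null,
194 | "metadata": {},
195 | "outputs": [],
196 | "source": [
197 | "km = KMeans(n_clusters=7, n_init=10, random_state=0).fit(pcs)\n",
198 | "print(\"2 PCs:\", v_measure_score(rna_labels[\"CellType\"], km.labels_))\n",
199 | "\n",
200 | "# try a higher-dimensional projection, e.g. 10 PCs, and compare\n",
201 | "pcs10 = PCA(n_components=10).fit_transform(rna_data)\n",
202 | "km10 = KMeans(n_clusters=7, n_init=10, random_state=0).fit(pcs10)\n",
203 | "print(\"10 PCs:\", v_measure_score(rna_labels[\"CellType\"], km10.labels_))"
204 | ]
205 | },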
184 | {
185 | "cell_type": "code",
186 | "execution_count": 71,
187 | "metadata": {},
188 | "outputs": [],
189 | "source": [
190 | "# Your code goes here"
191 | ]
192 | }
193 | ],
194 | "metadata": {
195 | "anaconda-cloud": {},
196 | "kernelspec": {
197 | "display_name": "Python 3 (ipykernel)",
198 | "language": "python",
199 | "name": "python3"
200 | },
201 | "language_info": {
202 | "codemirror_mode": {
203 | "name": "ipython",
204 | "version": 3
205 | },
206 | "file_extension": ".py",
207 | "mimetype": "text/x-python",
208 | "name": "python",
209 | "nbconvert_exporter": "python",
210 | "pygments_lexer": "ipython3",
211 | "version": "3.11.5"
212 | }
213 | },
214 | "nbformat": 4,
215 | "nbformat_minor": 4
216 | }
217 |
--------------------------------------------------------------------------------
/20-DimReduction/heptathlon.csv:
--------------------------------------------------------------------------------
1 | name , hurdles, highjump, shot, run200m, longjump, javelin, run800m, score
2 | Joyner-Kersee (USA) , 12.69, 1.86, 15.80, 22.56, 7.27, 45.66, 128.51, 7291
3 | John (GDR) , 12.85, 1.80, 16.23, 23.65, 6.71, 42.56, 126.12, 6897
4 | Behmer (GDR) , 13.20, 1.83, 14.20, 23.10, 6.68, 44.54, 124.20, 6858
5 | Sablovskaite (URS) , 13.61, 1.80, 15.23, 23.92, 6.25, 42.78, 132.24, 6540
6 | Choubenkova (URS) , 13.51, 1.74, 14.76, 23.93, 6.32, 47.46, 127.90, 6540
7 | Schulz (GDR) , 13.75, 1.83, 13.50, 24.65, 6.33, 42.82, 125.79, 6411
8 | Fleming (AUS) , 13.38, 1.80, 12.88, 23.59, 6.37, 40.28, 132.54, 6351
9 | Greiner (USA) , 13.55, 1.80, 14.13, 24.48, 6.47, 38.00, 133.65, 6297
10 | Lajbnerova (CZE) , 13.63, 1.83, 14.28, 24.86, 6.11, 42.20, 136.05, 6252
11 | Bouraga (URS) , 13.25, 1.77, 12.62, 23.59, 6.28, 39.06, 134.74, 6252
12 | Wijnsma (HOL) , 13.75, 1.86, 13.01, 25.03, 6.34, 37.86, 131.49, 6205
13 | Dimitrova (BUL) , 13.24, 1.80, 12.88, 23.59, 6.37, 40.28, 132.54, 6171
14 | Scheider (SWI) , 13.85, 1.86, 11.58, 24.87, 6.05, 47.50, 134.93, 6137
15 | Braun (FRG) , 13.71, 1.83, 13.16, 24.78, 6.12, 44.58, 142.82, 6109
16 | Ruotsalainen (FIN) , 13.79, 1.80, 12.32, 24.61, 6.08, 45.44, 137.06, 6101
17 | Yuping (CHN) , 13.93, 1.86, 14.21, 25.00, 6.40, 38.60, 146.67, 6087
18 | Hagger (GB) , 13.47, 1.80, 12.75, 25.47, 6.34, 35.76, 138.48, 5975
19 | Brown (USA) , 14.07, 1.83, 12.69, 24.83, 6.13, 44.34, 146.43, 5972
20 | Mulliner (GB) , 14.39, 1.71, 12.68, 24.92, 6.10, 37.76, 138.02, 5746
21 | Hautenauve (BEL) , 14.04, 1.77, 11.81, 25.61, 5.99, 35.68, 133.90, 5734
22 | Kytola (FIN) , 14.31, 1.77, 11.66, 25.69, 5.75, 39.48, 133.35, 5686
23 | Geremias (BRA) , 14.23, 1.71, 12.95, 25.50, 5.50, 39.64, 144.02, 5508
24 | Hui-Ing (TAI) , 14.85, 1.68, 10.00, 25.23, 5.47, 39.14, 137.30, 5290
25 | Jeong-Mi (KOR) , 14.53, 1.71, 10.83, 26.61, 5.50, 39.26, 139.17, 5289
26 | Launa (PNG) , 16.42, 1.50, 11.78, 26.16, 4.88, 46.38, 163.43, 4566
--------------------------------------------------------------------------------
/20-DimReduction/rnamix1_labels.csv:
--------------------------------------------------------------------------------
1 | CellNumber,CellType
2 | 1,1
3 | 2,2
4 | 3,2
5 | 4,2
6 | 5,2
7 | 6,2
8 | 7,2
9 | 8,2
10 | 9,2
11 | 10,2
12 | 11,3
13 | 12,2
14 | 13,2
15 | 14,2
16 | 15,2
17 | 16,2
18 | 17,2
19 | 18,2
20 | 19,2
21 | 20,2
22 | 21,2
23 | 22,2
24 | 23,2
25 | 24,2
26 | 25,2
27 | 26,2
28 | 27,2
29 | 28,2
30 | 29,2
31 | 30,2
32 | 31,2
33 | 32,2
34 | 33,2
35 | 34,2
36 | 35,2
37 | 36,2
38 | 37,2
39 | 38,2
40 | 39,2
41 | 40,2
42 | 41,2
43 | 42,2
44 | 43,2
45 | 44,2
46 | 45,2
47 | 46,2
48 | 47,2
49 | 48,2
50 | 49,2
51 | 50,2
52 | 51,2
53 | 52,2
54 | 53,2
55 | 54,2
56 | 55,2
57 | 56,2
58 | 57,2
59 | 58,2
60 | 59,2
61 | 60,2
62 | 61,2
63 | 62,2
64 | 63,2
65 | 64,2
66 | 65,2
67 | 66,2
68 | 67,2
69 | 68,2
70 | 69,2
71 | 70,2
72 | 71,2
73 | 72,2
74 | 73,2
75 | 74,2
76 | 75,2
77 | 76,2
78 | 77,4
79 | 78,4
80 | 79,4
81 | 80,4
82 | 81,4
83 | 82,4
84 | 83,4
85 | 84,4
86 | 85,4
87 | 86,4
88 | 87,4
89 | 88,4
90 | 89,4
91 | 90,4
92 | 91,4
93 | 92,4
94 | 93,4
95 | 94,4
96 | 95,4
97 | 96,4
98 | 97,4
99 | 98,4
100 | 99,4
101 | 100,4
102 | 101,4
103 | 102,4
104 | 103,4
105 | 104,4
106 | 105,4
107 | 106,4
108 | 107,4
109 | 108,4
110 | 109,4
111 | 110,4
112 | 111,4
113 | 112,4
114 | 113,4
115 | 114,5
116 | 115,5
117 | 116,5
118 | 117,5
119 | 118,5
120 | 119,5
121 | 120,5
122 | 121,5
123 | 122,5
124 | 123,5
125 | 124,5
126 | 125,5
127 | 126,5
128 | 127,5
129 | 128,5
130 | 129,5
131 | 130,5
132 | 131,5
133 | 132,5
134 | 133,5
135 | 134,6
136 | 135,6
137 | 136,6
138 | 137,6
139 | 138,6
140 | 139,6
141 | 140,6
142 | 141,6
143 | 142,6
144 | 143,6
145 | 144,6
146 | 145,6
147 | 146,6
148 | 147,6
149 | 148,6
150 | 149,6
151 | 150,6
152 | 151,6
153 | 152,6
154 | 153,6
155 | 154,6
156 | 155,6
157 | 156,6
158 | 157,6
159 | 158,6
160 | 159,6
161 | 160,6
162 | 161,6
163 | 162,6
164 | 163,6
165 | 164,6
166 | 165,6
167 | 166,6
168 | 167,6
169 | 168,6
170 | 169,6
171 | 170,6
172 | 171,6
173 | 172,6
174 | 173,6
175 | 174,6
176 | 175,1
177 | 176,1
178 | 177,1
179 | 178,1
180 | 179,1
181 | 180,1
182 | 181,1
183 | 182,1
184 | 183,1
185 | 184,1
186 | 185,1
187 | 186,1
188 | 187,1
189 | 188,1
190 | 189,1
191 | 190,1
192 | 191,1
193 | 192,1
194 | 193,1
195 | 194,1
196 | 195,1
197 | 196,1
198 | 197,1
199 | 198,1
200 | 199,1
201 | 200,1
202 | 201,1
203 | 202,1
204 | 203,1
205 | 204,1
206 | 205,1
207 | 206,1
208 | 207,1
209 | 208,1
210 | 209,1
211 | 210,1
212 | 211,1
213 | 212,1
214 | 213,1
215 | 214,3
216 | 215,3
217 | 216,3
218 | 217,3
219 | 218,3
220 | 219,3
221 | 220,3
222 | 221,3
223 | 222,3
224 | 223,3
225 | 224,3
226 | 225,3
227 | 226,3
228 | 227,3
229 | 228,3
230 | 229,3
231 | 230,3
232 | 231,3
233 | 232,3
234 | 233,3
235 | 234,3
236 | 235,3
237 | 236,3
238 | 237,3
239 | 238,3
240 | 239,3
241 | 240,3
242 | 241,3
243 | 242,3
244 | 243,3
245 | 244,3
246 | 245,3
247 | 246,3
248 | 247,3
249 | 248,3
250 | 249,3
251 | 250,3
252 | 251,3
253 | 252,3
254 | 253,3
255 | 254,3
256 | 255,3
257 | 256,3
258 | 257,3
259 | 258,7
260 | 259,7
261 | 260,7
262 | 261,7
263 | 262,7
264 | 263,7
265 | 264,7
266 | 265,7
267 | 266,7
268 | 267,7
269 | 268,7
270 | 269,7
271 | 270,7
272 | 271,7
273 | 272,7
274 | 273,7
275 | 274,7
276 | 275,7
277 | 276,7
278 | 277,7
279 | 278,7
280 | 279,7
281 | 280,7
282 | 281,7
283 | 282,7
284 | 283,7
285 | 284,7
286 | 285,7
287 | 286,7
288 | 287,7
289 | 288,7
290 | 289,7
291 | 290,7
292 | 291,7
293 | 292,7
294 | 293,7
295 | 294,7
296 | 295,7
297 | 296,7
--------------------------------------------------------------------------------
/21-NeuralNetwork1/ImageNetPlot.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/21-NeuralNetwork1/ImageNetPlot.png
--------------------------------------------------------------------------------
/21-NeuralNetwork1/mnist-original.mat.zip:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/21-NeuralNetwork1/mnist-original.mat.zip
--------------------------------------------------------------------------------
/21-NeuralNetwork1/neuralnetworks.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/21-NeuralNetwork1/neuralnetworks.png
--------------------------------------------------------------------------------
/21-NeuralNetwork1/perceptron.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/21-NeuralNetwork1/perceptron.png
--------------------------------------------------------------------------------
/22-NeuralNetworks2/22-NeuralNetworks2-activity.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {
6 | "slideshow": {
7 | "slide_type": "slide"
8 | }
9 | },
10 | "source": [
11 | "# Introduction to Data Science \n",
12 | "# Inclass Exercises for Lecture 22: Neural Networks II\n",
13 | "*COMP 5360 / MATH 4100, University of Utah, http://datasciencecourse.net/*\n"
14 | ]
15 | },
16 | {
17 | "cell_type": "markdown",
18 | "metadata": {
19 | "slideshow": {
20 | "slide_type": "-"
21 | }
22 | },
23 | "source": [
24 | "### Installing TensorFlow\n",
25 | "\n",
26 | "Instructions for installing TensorFlow are available at [the tensorflow install page](https://www.tensorflow.org/versions/r1.0/install/).\n",
27 | "\n",
28 | "It is recommended that you use the command: \n",
29 | "```\n",
30 | "pip install tensorflow\n",
31 | "```\n"
32 | ]
33 | },
34 | {
35 | "cell_type": "code",
36 | "execution_count": 1,
37 | "metadata": {},
38 | "outputs": [
39 | {
40 | "name": "stdout",
41 | "output_type": "stream",
42 | "text": [
43 | "2.4.1\n"
44 | ]
45 | }
46 | ],
47 | "source": [
48 | "import tensorflow as tf\n",
49 | "print(tf.__version__)\n"
50 | ]
51 | },
52 | {
53 | "cell_type": "markdown",
54 | "metadata": {
55 | "slideshow": {
56 | "slide_type": "-"
57 | }
58 | },
59 | "source": [
60 | "**Exercise 1:** Use TensorFlow to compute the derivative of $f(x) = e^x$ at $x=2$."
61 | ]
62 | },
63 | {
64 | "cell_type": "code",
65 | "execution_count": 2,
66 | "metadata": {
67 | "slideshow": {
68 | "slide_type": "-"
69 | }
70 | },
71 | "outputs": [],
72 | "source": [
73 | "# your code here\n"
74 | ]
75 | },
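76 | {
77 | "cell_type": "markdown",
78 | "metadata": {},
79 | "source": [
80 | "*A possible solution sketch* using `tf.GradientTape`; the derivative of $e^x$ at $x=2$ is $e^2 \\approx 7.389$."
81 | ]
82 | },
83 | {
84 | "cell_type": "code",
85 | "execution_count": null,
86 | "metadata": {},
87 | "outputs": [],
88 | "source": [
89 | "x = tf.Variable(2.0)\n",
90 | "with tf.GradientTape() as tape:\n",
91 | "    y = tf.exp(x)\n",
92 | "print(tape.gradient(y, x).numpy())  # should be e**2 = 7.389..."
93 | ]
94 | },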
76 | {
77 | "cell_type": "markdown",
78 | "metadata": {},
79 | "source": [
80 | "**Exercise 2:** Use TensorFlow to find the minimum of the [Rosenbrock function](https://en.wikipedia.org/wiki/Rosenbrock_function): \n",
81 | "$$\n",
82 | "f(x,y) = (x-1)^2 + 100*(y-x^2)^2.\n",
83 | "$$\n"
84 | ]
85 | },
86 | {
87 | "cell_type": "code",
88 | "execution_count": 4,
89 | "metadata": {},
90 | "outputs": [],
91 | "source": [
92 | "# your code here\n"
93 | ]
94 | },
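95 | {
96 | "cell_type": "markdown",
97 | "metadata": {},
98 | "source": [
99 | "*A possible solution sketch:* minimize by gradient descent with `tf.GradientTape` and the Adam optimizer (the step size and iteration count here are just one choice that converges). The true minimum is at $(x,y) = (1,1)$."
100 | ]
101 | },
102 | {
103 | "cell_type": "code",
104 | "execution_count": null,
105 | "metadata": {},
106 | "outputs": [],
107 | "source": [
108 | "x = tf.Variable(0.0)\n",
109 | "y = tf.Variable(0.0)\n",
110 | "opt = tf.keras.optimizers.Adam(learning_rate=0.02)\n",
111 | "\n",
112 | "for _ in range(5000):\n",
113 | "    with tf.GradientTape() as tape:\n",
114 | "        f = (x - 1)**2 + 100*(y - x**2)**2\n",
115 | "    opt.apply_gradients(zip(tape.gradient(f, [x, y]), [x, y]))\n",
116 | "\n",
117 | "print(x.numpy(), y.numpy())  # should be close to (1, 1)"
118 | ]
119 | },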
95 | {
96 | "cell_type": "markdown",
97 | "metadata": {
98 | "slideshow": {
99 | "slide_type": "slide"
100 | }
101 | },
102 | "source": [
103 | "## Using a pre-trained network\n",
104 | "\n",
105 | "There are many examples of pre-trained NN that can be accessed [here](https://www.tensorflow.org/api_docs/python/tf/keras/applications). \n",
106 | "These NN are very large, having been trained on giant computers using massive datasets. \n",
107 | "\n",
108 | "It can be very useful to initialize a NN using one of these. This is called [transfer learning](https://en.wikipedia.org/wiki/Transfer_learning). \n",
109 | "\n",
110 | "\n",
111 | "We'll use a NN that was pretrained for image recognition. This NN was trained on the [ImageNet](http://www.image-net.org/) project, which contains > 14 million images belonging to > 20,000 classes (synsets). "
112 | ]
113 | },
114 | {
115 | "cell_type": "code",
116 | "execution_count": 6,
117 | "metadata": {},
118 | "outputs": [],
119 | "source": [
120 | "import tensorflow as tf\n",
121 | "import numpy as np\n",
122 | "from tensorflow.keras.preprocessing import image\n",
123 | "from tensorflow.keras.applications import vgg16"
124 | ]
125 | },
126 | {
127 | "cell_type": "markdown",
128 | "metadata": {
129 | "slideshow": {
130 | "slide_type": "-"
131 | }
132 | },
133 | "source": [
134 | "**Exercise 3:** Use tf.keras.applications.VGG16 (the NN pre-trained on ImageNet) to classify at least two images not done in lecture. These can be images from the lecture folder or your own images. Report on the top five predicted classes and their corresponding probabilities. "
135 | ]
136 | },
137 | {
138 | "cell_type": "code",
139 | "execution_count": 7,
140 | "metadata": {
141 | "slideshow": {
142 | "slide_type": "-"
143 | }
144 | },
145 | "outputs": [],
146 | "source": [
147 | "# your code here\n"
148 | ]
149 | },
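150 | {
151 | "cell_type": "markdown",
152 | "metadata": {},
153 | "source": [
154 | "*A sketch of the classification pipeline* (`my_image.jpg` is a hypothetical path; substitute your own images and run it once per image):"
155 | ]
156 | },
157 | {
158 | "cell_type": "code",
159 | "execution_count": null,
160 | "metadata": {},
161 | "outputs": [],
162 | "source": [
163 | "model = vgg16.VGG16(weights=\"imagenet\")\n",
164 | "\n",
165 | "# load, resize, and preprocess one image for VGG16\n",
166 | "img = image.load_img(\"my_image.jpg\", target_size=(224, 224))  # hypothetical path\n",
167 | "x = vgg16.preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))\n",
168 | "\n",
169 | "preds = model.predict(x)\n",
170 | "for _, name, prob in vgg16.decode_predictions(preds, top=5)[0]:\n",
171 | "    print(f\"{name}: {prob:.4f}\")"
172 | ]
173 | },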
150 | {
151 | "cell_type": "markdown",
152 | "metadata": {},
153 | "source": [
154 | "**Exercise 4 (optional):** There are several [other pre-trained networks in Keras](https://github.com/keras-team/keras-applications). Try these!"
155 | ]
156 | }
157 | ],
158 | "metadata": {
159 | "anaconda-cloud": {},
160 | "celltoolbar": "Slideshow",
161 | "kernelspec": {
162 | "display_name": "Python 3 (ipykernel)",
163 | "language": "python",
164 | "name": "python3"
165 | },
166 | "language_info": {
167 | "codemirror_mode": {
168 | "name": "ipython",
169 | "version": 3
170 | },
171 | "file_extension": ".py",
172 | "mimetype": "text/x-python",
173 | "name": "python",
174 | "nbconvert_exporter": "python",
175 | "pygments_lexer": "ipython3",
176 | "version": "3.11.5"
177 | }
178 | },
179 | "nbformat": 4,
180 | "nbformat_minor": 4
181 | }
182 |
--------------------------------------------------------------------------------
/22-NeuralNetworks2/activationFct.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/22-NeuralNetworks2/activationFct.png
--------------------------------------------------------------------------------
/22-NeuralNetworks2/beginner.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {
6 | "id": "rX8mhOLljYeM"
7 | },
8 | "source": [
9 | "##### Copyright 2019 The TensorFlow Authors."
10 | ]
11 | },
12 | {
13 | "cell_type": "code",
14 | "execution_count": 1,
15 | "metadata": {
16 | "cellView": "form",
17 | "execution": {
18 | "iopub.execute_input": "2021-01-27T02:22:31.155179Z",
19 | "iopub.status.busy": "2021-01-27T02:22:31.154464Z",
20 | "iopub.status.idle": "2021-01-27T02:22:31.156965Z",
21 | "shell.execute_reply": "2021-01-27T02:22:31.156425Z"
22 | },
23 | "id": "BZSlp3DAjdYf"
24 | },
25 | "outputs": [],
26 | "source": [
27 | "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n",
28 | "# you may not use this file except in compliance with the License.\n",
29 | "# You may obtain a copy of the License at\n",
30 | "#\n",
31 | "# https://www.apache.org/licenses/LICENSE-2.0\n",
32 | "#\n",
33 | "# Unless required by applicable law or agreed to in writing, software\n",
34 | "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
35 | "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
36 | "# See the License for the specific language governing permissions and\n",
37 | "# limitations under the License."
38 | ]
39 | },
40 | {
41 | "cell_type": "markdown",
42 | "metadata": {
43 | "id": "3wF5wszaj97Y"
44 | },
45 | "source": [
46 | "# TensorFlow 2 quickstart for beginners"
47 | ]
48 | },
71 | {
72 | "cell_type": "markdown",
73 | "metadata": {
74 | "id": "04QgGZc9bF5D"
75 | },
76 | "source": [
77 | "This short introduction uses [Keras](https://www.tensorflow.org/guide/keras/overview) to:\n",
78 | "\n",
79 | "1. Build a neural network that classifies images.\n",
80 | "2. Train this neural network.\n",
81 | "3. And, finally, evaluate the accuracy of the model."
82 | ]
83 | },
84 | {
85 | "cell_type": "markdown",
86 | "metadata": {
87 | "id": "hiH7AC-NTniF"
88 | },
89 | "source": [
90 | "This is a [Google Colaboratory](https://colab.research.google.com/notebooks/welcome.ipynb) notebook file. Python programs are run directly in the browser—a great way to learn and use TensorFlow. To follow this tutorial, run the notebook in Google Colab by clicking the button at the top of this page.\n",
91 | "\n",
92 | "1. In Colab, connect to a Python runtime: At the top-right of the menu bar, select *CONNECT*.\n",
93 | "2. Run all the notebook code cells: Select *Runtime* > *Run all*."
94 | ]
95 | },
96 | {
97 | "cell_type": "markdown",
98 | "metadata": {
99 | "id": "nnrWf3PCEzXL"
100 | },
101 | "source": [
102 | "Download and install TensorFlow 2. Import TensorFlow into your program:\n",
103 | "\n",
104 | "Note: Upgrade `pip` to install the TensorFlow 2 package. See the [install guide](https://www.tensorflow.org/install) for details."
105 | ]
106 | },
107 | {
108 | "cell_type": "code",
109 | "execution_count": 1,
110 | "metadata": {
111 | "execution": {
112 | "iopub.execute_input": "2021-01-27T02:22:31.165375Z",
113 | "iopub.status.busy": "2021-01-27T02:22:31.164739Z",
114 | "iopub.status.idle": "2021-01-27T02:22:37.373933Z",
115 | "shell.execute_reply": "2021-01-27T02:22:37.373322Z"
116 | },
117 | "id": "0trJmd6DjqBZ"
118 | },
119 | "outputs": [],
120 | "source": [
121 | "import tensorflow as tf"
122 | ]
123 | },
124 | {
125 | "cell_type": "markdown",
126 | "metadata": {
127 | "id": "7NAbSZiaoJ4z"
128 | },
129 | "source": [
130 | "Load and prepare the [MNIST dataset](http://yann.lecun.com/exdb/mnist/). Convert the samples from integers to floating-point numbers:"
131 | ]
132 | },
133 | {
134 | "cell_type": "code",
135 | "execution_count": 9,
136 | "metadata": {
137 | "execution": {
138 | "iopub.execute_input": "2021-01-27T02:22:37.379364Z",
139 | "iopub.status.busy": "2021-01-27T02:22:37.378426Z",
140 | "iopub.status.idle": "2021-01-27T02:22:37.838749Z",
141 | "shell.execute_reply": "2021-01-27T02:22:37.838096Z"
142 | },
143 | "id": "7FP5258xjs-v"
144 | },
145 | "outputs": [
146 | {
147 | "name": "stdout",
148 | "output_type": "stream",
149 | "text": [
150 | "\n",
151 | "(1, 28, 28)\n"
152 | ]
153 | }
154 | ],
155 | "source": [
156 | "mnist = tf.keras.datasets.mnist\n",
157 | "\n",
158 | "(x_train, y_train), (x_test, y_test) = mnist.load_data()\n",
159 | "x_train, x_test = x_train / 255.0, x_test / 255.0\n",
160 | "print(type(x_train[:1]))\n",
161 | "temp = x_train[:1]\n",
162 | "print(temp.shape)"
163 | ]
164 | },
165 | {
166 | "cell_type": "markdown",
167 | "metadata": {
168 | "id": "BPZ68wASog_I"
169 | },
170 | "source": [
171 | "Build the `tf.keras.Sequential` model by stacking layers. Choose an optimizer and loss function for training:"
172 | ]
173 | },
174 | {
175 | "cell_type": "code",
176 | "execution_count": 10,
177 | "metadata": {
178 | "execution": {
179 | "iopub.execute_input": "2021-01-27T02:22:37.844781Z",
180 | "iopub.status.busy": "2021-01-27T02:22:37.843714Z",
181 | "iopub.status.idle": "2021-01-27T02:22:46.880604Z",
182 | "shell.execute_reply": "2021-01-27T02:22:46.881017Z"
183 | },
184 | "id": "h3IKyzTCDNGo"
185 | },
186 | "outputs": [],
187 | "source": [
188 | "model = tf.keras.models.Sequential([\n",
189 | " tf.keras.layers.Flatten(input_shape=(28, 28)),\n",
190 | " tf.keras.layers.Dense(128, activation='relu'),\n",
191 | " tf.keras.layers.Dropout(0.2),\n",
192 | " tf.keras.layers.Dense(10)\n",
193 | "])"
194 | ]
195 | },
196 | {
197 | "cell_type": "markdown",
198 | "metadata": {
199 | "id": "l2hiez2eIUz8"
200 | },
201 | "source": [
202 | "For each example the model returns a vector of \"[logits](https://developers.google.com/machine-learning/glossary#logits)\" or \"[log-odds](https://developers.google.com/machine-learning/glossary#log-odds)\" scores, one for each class."
203 | ]
204 | },
205 | {
206 | "cell_type": "code",
207 | "execution_count": 11,
208 | "metadata": {
209 | "execution": {
210 | "iopub.execute_input": "2021-01-27T02:22:46.889938Z",
211 | "iopub.status.busy": "2021-01-27T02:22:46.888924Z",
212 | "iopub.status.idle": "2021-01-27T02:22:47.311748Z",
213 | "shell.execute_reply": "2021-01-27T02:22:47.312324Z"
214 | },
215 | "id": "OeOrNdnkEEcR"
216 | },
217 | "outputs": [
218 | {
219 | "data": {
220 | "text/plain": [
221 | "array([[ 0.56925064, 0.07378727, 0.02948097, 0.3486092 , -0.06868142,\n",
222 | " 0.41106564, 0.07833061, -0.67182094, 1.172125 , -0.11269745]],\n",
223 | " dtype=float32)"
224 | ]
225 | },
226 | "execution_count": 11,
227 | "metadata": {},
228 | "output_type": "execute_result"
229 | }
230 | ],
231 | "source": [
232 | "predictions = model(x_train[:1]).numpy()\n",
233 | "predictions"
234 | ]
235 | },
236 | {
237 | "cell_type": "markdown",
238 | "metadata": {
239 | "id": "tgjhDQGcIniO"
240 | },
241 | "source": [
242 | "The `tf.nn.softmax` function converts these logits to \"probabilities\" for each class: "
243 | ]
244 | },
245 | {
246 | "cell_type": "code",
247 | "execution_count": 12,
248 | "metadata": {
249 | "execution": {
250 | "iopub.execute_input": "2021-01-27T02:22:47.317672Z",
251 | "iopub.status.busy": "2021-01-27T02:22:47.316571Z",
252 | "iopub.status.idle": "2021-01-27T02:22:47.320613Z",
253 | "shell.execute_reply": "2021-01-27T02:22:47.321064Z"
254 | },
255 | "id": "zWSRnQ0WI5eq"
256 | },
257 | "outputs": [
258 | {
259 | "data": {
260 | "text/plain": [
261 | "array([[0.13139944, 0.08006018, 0.07659043, 0.1053829 , 0.06942936,\n",
262 | " 0.11217463, 0.08042473, 0.0379842 , 0.24011457, 0.06643963]],\n",
263 | " dtype=float32)"
264 | ]
265 | },
266 | "execution_count": 12,
267 | "metadata": {},
268 | "output_type": "execute_result"
269 | }
270 | ],
271 | "source": [
272 | "tf.nn.softmax(predictions).numpy()"
273 | ]
274 | },
275 | {
276 | "cell_type": "markdown",
277 | "metadata": {
278 | "id": "he5u_okAYS4a"
279 | },
280 | "source": [
281 | "Note: It is possible to bake this `tf.nn.softmax` in as the activation function for the last layer of the network. While this can make the model output more directly interpretable, this approach is discouraged as it's impossible to\n",
282 | "provide an exact and numerically stable loss calculation for all models when using a softmax output. "
283 | ]
284 | },
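The trade-off in the note above can be made concrete. Below is a minimal sketch contrasting the two options, reusing the layer sizes of the model built earlier (an illustrative aside, not one of the original cells):

```python
import tensorflow as tf

# Option A (discouraged): bake the softmax into the last layer. The loss
# then operates on probabilities, which can be numerically unstable.
model_softmax = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax'),
])
model_softmax.compile(
    optimizer='adam',
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False))

# Option B (what this notebook does): output raw logits and let the loss
# apply the softmax internally, in a numerically stable way.
model_logits = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10),
])
model_logits.compile(
    optimizer='adam',
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
```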
285 | {
286 | "cell_type": "markdown",
287 | "metadata": {
288 | "id": "hQyugpgRIyrA"
289 | },
290 | "source": [
291 | "The `losses.SparseCategoricalCrossentropy` loss takes a vector of logits and a `True` index and returns a scalar loss for each example."
292 | ]
293 | },
294 | {
295 | "cell_type": "code",
296 | "execution_count": 13,
297 | "metadata": {
298 | "execution": {
299 | "iopub.execute_input": "2021-01-27T02:22:47.326443Z",
300 | "iopub.status.busy": "2021-01-27T02:22:47.325323Z",
301 | "iopub.status.idle": "2021-01-27T02:22:47.328155Z",
302 | "shell.execute_reply": "2021-01-27T02:22:47.327614Z"
303 | },
304 | "id": "RSkzdv8MD0tT"
305 | },
306 | "outputs": [],
307 | "source": [
308 | "loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)"
309 | ]
310 | },
311 | {
312 | "cell_type": "markdown",
313 | "metadata": {
314 | "id": "SfR4MsSDU880"
315 | },
316 | "source": [
317 | "This loss is equal to the negative log probability of the true class:\n",
318 | "It is zero if the model is sure of the correct class.\n",
319 | "\n",
320 | "This untrained model gives probabilities close to random (1/10 for each class), so the initial loss should be close to `-tf.math.log(1/10) ~= 2.3`."
321 | ]
322 | },
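As a quick sanity check on that number (a small illustrative snippet, not one of the original cells):

```python
import math

# An untrained model spreads probability roughly uniformly over the 10
# classes, so the expected cross-entropy is -log(1/10)
print(-math.log(1 / 10))  # 2.302585092994046
```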
323 | {
324 | "cell_type": "code",
325 | "execution_count": 14,
326 | "metadata": {
327 | "execution": {
328 | "iopub.execute_input": "2021-01-27T02:22:47.334018Z",
329 | "iopub.status.busy": "2021-01-27T02:22:47.332880Z",
330 | "iopub.status.idle": "2021-01-27T02:22:47.339363Z",
331 | "shell.execute_reply": "2021-01-27T02:22:47.339843Z"
332 | },
333 | "id": "NJWqEVrrJ7ZB"
334 | },
335 | "outputs": [
336 | {
337 | "name": "stdout",
338 | "output_type": "stream",
339 | "text": [
340 | "[5]\n",
341 | "[[ 0.56925064 0.07378727 0.02948097 0.3486092 -0.06868142 0.41106564\n",
342 | " 0.07833061 -0.67182094 1.172125 -0.11269745]]\n"
343 | ]
344 | },
345 | {
346 | "data": {
347 | "text/plain": [
348 | "2.1876984"
349 | ]
350 | },
351 | "execution_count": 14,
352 | "metadata": {},
353 | "output_type": "execute_result"
354 | }
355 | ],
356 | "source": [
357 | "print(y_train[:1])\n",
358 | "print(predictions)\n",
359 | "loss_fn(y_train[:1], predictions).numpy()"
360 | ]
361 | },
362 | {
363 | "cell_type": "code",
364 | "execution_count": 9,
365 | "metadata": {
366 | "execution": {
367 | "iopub.execute_input": "2021-01-27T02:22:47.349430Z",
368 | "iopub.status.busy": "2021-01-27T02:22:47.348502Z",
369 | "iopub.status.idle": "2021-01-27T02:22:47.364666Z",
370 | "shell.execute_reply": "2021-01-27T02:22:47.364157Z"
371 | },
372 | "id": "9foNKHzTD2Vo"
373 | },
374 | "outputs": [],
375 | "source": [
376 | "model.compile(optimizer='adam',\n",
377 | " loss=loss_fn,\n",
378 | " metrics=['accuracy'])"
379 | ]
380 | },
381 | {
382 | "cell_type": "markdown",
383 | "metadata": {
384 | "id": "ix4mEL65on-w"
385 | },
386 | "source": [
387 | "The `Model.fit` method adjusts the model parameters to minimize the loss: "
388 | ]
389 | },
390 | {
391 | "cell_type": "code",
392 | "execution_count": 10,
393 | "metadata": {
394 | "execution": {
395 | "iopub.execute_input": "2021-01-27T02:22:47.369888Z",
396 | "iopub.status.busy": "2021-01-27T02:22:47.368635Z",
397 | "iopub.status.idle": "2021-01-27T02:23:02.490451Z",
398 | "shell.execute_reply": "2021-01-27T02:23:02.490856Z"
399 | },
400 | "id": "y7suUbJXVLqP"
401 | },
402 | "outputs": [
403 | {
404 | "name": "stdout",
405 | "output_type": "stream",
406 | "text": [
407 | "Epoch 1/5\n",
408 | "1875/1875 [==============================] - 3s 2ms/step - loss: 0.4813 - accuracy: 0.8565\n",
409 | "Epoch 2/5\n",
410 | "1875/1875 [==============================] - 3s 2ms/step - loss: 0.1533 - accuracy: 0.9553\n",
411 | "Epoch 3/5\n",
412 | "1875/1875 [==============================] - 3s 2ms/step - loss: 0.1057 - accuracy: 0.9686\n",
413 | "Epoch 4/5\n",
414 | "1875/1875 [==============================] - 3s 2ms/step - loss: 0.0908 - accuracy: 0.9721\n",
415 | "Epoch 5/5\n",
416 | "1875/1875 [==============================] - 3s 2ms/step - loss: 0.0700 - accuracy: 0.9788\n"
417 | ]
418 | },
419 | {
420 | "data": {
421 | "text/plain": [
422 | ""
423 | ]
424 | },
425 | "execution_count": 1,
426 | "metadata": {},
427 | "output_type": "execute_result"
428 | }
429 | ],
430 | "source": [
431 | "model.fit(x_train, y_train, epochs=5)"
432 | ]
433 | },
434 | {
435 | "cell_type": "markdown",
436 | "metadata": {
437 | "id": "4mDAAPFqVVgn"
438 | },
439 | "source": [
440 | "The `Model.evaluate` method checks the models performance, usually on a \"[Validation-set](https://developers.google.com/machine-learning/glossary#validation-set)\" or \"[Test-set](https://developers.google.com/machine-learning/glossary#test-set)\"."
441 | ]
442 | },
443 | {
444 | "cell_type": "code",
445 | "execution_count": 11,
446 | "metadata": {
447 | "execution": {
448 | "iopub.execute_input": "2021-01-27T02:23:02.496255Z",
449 | "iopub.status.busy": "2021-01-27T02:23:02.495021Z",
450 | "iopub.status.idle": "2021-01-27T02:23:03.037377Z",
451 | "shell.execute_reply": "2021-01-27T02:23:03.036860Z"
452 | },
453 | "id": "F7dTAzgHDUh7"
454 | },
455 | "outputs": [
456 | {
457 | "name": "stdout",
458 | "output_type": "stream",
459 | "text": [
460 | "313/313 - 0s - loss: 0.0748 - accuracy: 0.9758\n"
461 | ]
462 | },
463 | {
464 | "data": {
465 | "text/plain": [
466 | "[0.07476752996444702, 0.9757999777793884]"
467 | ]
468 | },
469 | "execution_count": 1,
470 | "metadata": {},
471 | "output_type": "execute_result"
472 | }
473 | ],
474 | "source": [
475 | "model.evaluate(x_test, y_test, verbose=2)"
476 | ]
477 | },
478 | {
479 | "cell_type": "markdown",
480 | "metadata": {
481 | "id": "T4JfEh7kvx6m"
482 | },
483 | "source": [
484 | "The image classifier is now trained to ~98% accuracy on this dataset. To learn more, read the [TensorFlow tutorials](https://www.tensorflow.org/tutorials/)."
485 | ]
486 | },
487 | {
488 | "cell_type": "markdown",
489 | "metadata": {
490 | "id": "Aj8NrlzlJqDG"
491 | },
492 | "source": [
493 | "If you want your model to return a probability, you can wrap the trained model, and attach the softmax to it:"
494 | ]
495 | },
496 | {
497 | "cell_type": "code",
498 | "execution_count": 12,
499 | "metadata": {
500 | "execution": {
501 | "iopub.execute_input": "2021-01-27T02:23:03.044680Z",
502 | "iopub.status.busy": "2021-01-27T02:23:03.044017Z",
503 | "iopub.status.idle": "2021-01-27T02:23:03.062428Z",
504 | "shell.execute_reply": "2021-01-27T02:23:03.061917Z"
505 | },
506 | "id": "rYb6DrEH0GMv"
507 | },
508 | "outputs": [],
509 | "source": [
510 | "probability_model = tf.keras.Sequential([\n",
511 | " model,\n",
512 | " tf.keras.layers.Softmax()\n",
513 | "])"
514 | ]
515 | },
516 | {
517 | "cell_type": "code",
518 | "execution_count": 13,
519 | "metadata": {
520 | "execution": {
521 | "iopub.execute_input": "2021-01-27T02:23:03.067513Z",
522 | "iopub.status.busy": "2021-01-27T02:23:03.066434Z",
523 | "iopub.status.idle": "2021-01-27T02:23:03.072553Z",
524 | "shell.execute_reply": "2021-01-27T02:23:03.073021Z"
525 | },
526 | "id": "cnqOZtUp1YR_"
527 | },
528 | "outputs": [
529 | {
530 | "data": {
531 | "text/plain": [
532 | ""
548 | ]
549 | },
550 | "execution_count": 1,
551 | "metadata": {},
552 | "output_type": "execute_result"
553 | }
554 | ],
555 | "source": [
556 | "probability_model(x_test[:5])"
557 | ]
558 | }
559 | ],
560 | "metadata": {
561 | "colab": {
562 | "collapsed_sections": [
563 | "rX8mhOLljYeM"
564 | ],
565 | "name": "beginner.ipynb",
566 | "toc_visible": true
567 | },
568 | "kernelspec": {
569 | "display_name": "Python 3",
570 | "language": "python",
571 | "name": "python3"
572 | },
573 | "language_info": {
574 | "codemirror_mode": {
575 | "name": "ipython",
576 | "version": 3
577 | },
578 | "file_extension": ".py",
579 | "mimetype": "text/x-python",
580 | "name": "python",
581 | "nbconvert_exporter": "python",
582 | "pygments_lexer": "ipython3",
583 | "version": "3.8.5"
584 | }
585 | },
586 | "nbformat": 4,
587 | "nbformat_minor": 1
588 | }
589 |
--------------------------------------------------------------------------------
/22-NeuralNetworks2/graph.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/22-NeuralNetworks2/graph.png
--------------------------------------------------------------------------------
/22-NeuralNetworks2/images/brodie.jpeg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/22-NeuralNetworks2/images/brodie.jpeg
--------------------------------------------------------------------------------
/22-NeuralNetworks2/images/layla1.jpeg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/22-NeuralNetworks2/images/layla1.jpeg
--------------------------------------------------------------------------------
/22-NeuralNetworks2/images/scout1.jpeg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/22-NeuralNetworks2/images/scout1.jpeg
--------------------------------------------------------------------------------
/22-NeuralNetworks2/images/scout2.jpeg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/22-NeuralNetworks2/images/scout2.jpeg
--------------------------------------------------------------------------------
/22-NeuralNetworks2/images/scout3.jpeg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/22-NeuralNetworks2/images/scout3.jpeg
--------------------------------------------------------------------------------
/22-NeuralNetworks2/images/scout4.jpeg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/22-NeuralNetworks2/images/scout4.jpeg
--------------------------------------------------------------------------------
/22-NeuralNetworks2/images/scout5.jpeg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/22-NeuralNetworks2/images/scout5.jpeg
--------------------------------------------------------------------------------
/22-NeuralNetworks2/nature14539.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/22-NeuralNetworks2/nature14539.pdf
--------------------------------------------------------------------------------
/23-databases/23-databases-exercises.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Introduction to Data Science – Relational Databases Exercise\n",
8 | "*COMP 5360 / MATH 4100, University of Utah, http://datasciencecourse.net/* \n"
9 | ]
10 | },
11 | {
12 | "cell_type": "code",
13 | "execution_count": null,
14 | "metadata": {},
15 | "outputs": [],
16 | "source": [
17 | "import pandas as pd\n",
18 | "import sqlite3 as sq\n",
19 | "\n",
20 | "# we connect to the database, which - in the case of sqlite - is a local file\n",
21 | "conn = sq.connect(\"./chinook.db\")"
22 | ]
23 | },
24 | {
25 | "cell_type": "markdown",
26 | "metadata": {},
27 | "source": [
28 | "### Exercise 1: Simple Queries\n",
29 | "\n",
30 | "1. List all the rows in the genres table.\n",
31 | "2. List only the genre names (nothing else) in the table.\n",
32 | "3. List the genre names ordered by name.\n",
33 | "4. List the genre entries with IDs between 11 and 16.\n",
34 | "5. List the genre entries that start with an R.\n",
35 | "6. List the GenreIds of Latin, Easy Listening, and Opera (in one query)."
36 | ]
37 | },
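As a sketch of the mechanics only (the query shown is an example, not one of the exercise solutions; it assumes the standard chinook schema, where the `genres` table has the columns `GenreId` and `Name`):

```python
import pandas as pd
import sqlite3 as sq

conn = sq.connect("./chinook.db")

# pd.read_sql_query runs a SQL statement and returns the result as a DataFrame
df = pd.read_sql_query("SELECT Name FROM genres ORDER BY Name", conn)
print(df.head())
```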
38 | {
39 | "cell_type": "markdown",
40 | "metadata": {},
41 | "source": [
42 | "1. List all the rows in the genres table."
43 | ]
44 | },
45 | {
46 | "cell_type": "code",
47 | "execution_count": null,
48 | "metadata": {},
49 | "outputs": [],
50 | "source": []
51 | },
52 | {
53 | "cell_type": "markdown",
54 | "metadata": {},
55 | "source": [
56 | "2. List only the genre names (nothing else) in the table."
57 | ]
58 | },
59 | {
60 | "cell_type": "code",
61 | "execution_count": null,
62 | "metadata": {},
63 | "outputs": [],
64 | "source": []
65 | },
66 | {
67 | "cell_type": "markdown",
68 | "metadata": {},
69 | "source": [
70 | "3. List the genre names ordered by name."
71 | ]
72 | },
73 | {
74 | "cell_type": "code",
75 | "execution_count": null,
76 | "metadata": {},
77 | "outputs": [],
78 | "source": []
79 | },
80 | {
81 | "cell_type": "markdown",
82 | "metadata": {},
83 | "source": [
84 | "List the genre entries with IDs between 11 and 16."
85 | ]
86 | },
87 | {
88 | "cell_type": "code",
89 | "execution_count": null,
90 | "metadata": {},
91 | "outputs": [],
92 | "source": []
93 | },
94 | {
95 | "cell_type": "markdown",
96 | "metadata": {},
97 | "source": [
98 | "5. List the genre entries that start with an R."
99 | ]
100 | },
101 | {
102 | "cell_type": "code",
103 | "execution_count": null,
104 | "metadata": {},
105 | "outputs": [],
106 | "source": []
107 | },
108 | {
109 | "cell_type": "markdown",
110 | "metadata": {},
111 | "source": [
112 | "6. List the entries of Latin, Easy Listening, and Opera (in one query)."
113 | ]
114 | },
115 | {
116 | "cell_type": "code",
117 | "execution_count": null,
118 | "metadata": {},
119 | "outputs": [],
120 | "source": []
121 | },
122 | {
123 | "cell_type": "markdown",
124 | "metadata": {},
125 | "source": [
126 | "## Exercise 2: Joining\n",
127 | "\n",
128 | "1. Create a table that contains track names, genre name and genre ID for each track. Hint: the table is sorted by genres, look at the tail of the dataframe to make sure it works correctly.\n",
129 | "2. Create a table that contains the counts of tracks in a genre by using the GenreID.\n",
130 | "3. Create a table that contains the genre name and the count of tracks in that genre.\n",
131 | "4. Sort the previous table by the count. Which are the biggest genres? Hint: the DESC keyword can be added at the end of the sorting expression.\n"
132 | ]
133 | },
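A sketch of what such a join could look like, assuming the standard chinook schema where `tracks.GenreId` references `genres.GenreId` (one possible approach, not the canonical solution):

```python
import pandas as pd
import sqlite3 as sq

conn = sq.connect("./chinook.db")

# Join tracks to genres, count tracks per genre name, largest genres first
query = """
    SELECT genres.Name AS Genre, COUNT(*) AS TrackCount
    FROM tracks
    JOIN genres ON tracks.GenreId = genres.GenreId
    GROUP BY genres.Name
    ORDER BY TrackCount DESC
"""
print(pd.read_sql_query(query, conn))
```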
134 | {
135 | "cell_type": "code",
136 | "execution_count": null,
137 | "metadata": {},
138 | "outputs": [],
139 | "source": []
140 | },
141 | {
142 | "cell_type": "markdown",
143 | "metadata": {},
144 | "source": [
145 | "2. Create a table that contains the counts of tracks in a genre by using the GenreID."
146 | ]
147 | },
148 | {
149 | "cell_type": "code",
150 | "execution_count": null,
151 | "metadata": {},
152 | "outputs": [],
153 | "source": []
154 | },
155 | {
156 | "cell_type": "markdown",
157 | "metadata": {},
158 | "source": [
159 | "3. Create a table that contains the genre name and the count of tracks in that genre"
160 | ]
161 | },
162 | {
163 | "cell_type": "code",
164 | "execution_count": null,
165 | "metadata": {},
166 | "outputs": [],
167 | "source": []
168 | },
169 | {
170 | "cell_type": "markdown",
171 | "metadata": {
172 | "collapsed": true
173 | },
174 | "source": [
175 | "4. Sort the previous table by the count. Which are the biggest genres?"
176 | ]
177 | },
178 | {
179 | "cell_type": "code",
180 | "execution_count": null,
181 | "metadata": {},
182 | "outputs": [],
183 | "source": []
184 | }
185 | ],
186 | "metadata": {
187 | "anaconda-cloud": {},
188 | "kernelspec": {
189 | "display_name": "Python 3 (ipykernel)",
190 | "language": "python",
191 | "name": "python3"
192 | },
193 | "language_info": {
194 | "codemirror_mode": {
195 | "name": "ipython",
196 | "version": 3
197 | },
198 | "file_extension": ".py",
199 | "mimetype": "text/x-python",
200 | "name": "python",
201 | "nbconvert_exporter": "python",
202 | "pygments_lexer": "ipython3",
203 | "version": "3.9.6"
204 | }
205 | },
206 | "nbformat": 4,
207 | "nbformat_minor": 1
208 | }
209 |
--------------------------------------------------------------------------------
/23-databases/albums_tracks_tables.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/23-databases/albums_tracks_tables.jpg
--------------------------------------------------------------------------------
/23-databases/backup_chinook.db:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/23-databases/backup_chinook.db
--------------------------------------------------------------------------------
/23-databases/chinook.db:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/23-databases/chinook.db
--------------------------------------------------------------------------------
/23-databases/database_schema.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/23-databases/database_schema.png
--------------------------------------------------------------------------------
/23-databases/exploits_of_a_mom.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/23-databases/exploits_of_a_mom.png
--------------------------------------------------------------------------------
/24-networks/24-network-exercise-activity.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "id": "170612c4",
6 | "metadata": {},
7 | "source": [
8 | "## Exercise\n",
9 | "\n",
10 | "Explore the [Karate Club](https://networkx.readthedocs.io/en/stable/reference/generated/networkx.generators.social.karate_club_graph.html#networkx.generators.social.karate_club_graph) network:\n",
11 | "\n",
12 | " * How many nodes, how many edges are in the network? \n",
13 | " * Are there nodes of high betweenness centrality? Visualize the network.\n",
14 | " * Remove the node with the highest centrality. How many components do you have?"
15 | ]
16 | },
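One possible way to approach these questions with standard NetworkX calls (a starting sketch, not the official solution):

```python
import networkx as nx

karate_club = nx.karate_club_graph()

# Size of the network
print(karate_club.number_of_nodes(), "nodes,", karate_club.number_of_edges(), "edges")

# Betweenness centrality; find the node with the highest value
centrality = nx.betweenness_centrality(karate_club)
top_node = max(centrality, key=centrality.get)
print("highest betweenness:", top_node, round(centrality[top_node], 3))

# Remove that node and count the remaining connected components
karate_club.remove_node(top_node)
print(nx.number_connected_components(karate_club), "components")
```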
17 | {
18 | "cell_type": "code",
19 | "execution_count": 2,
20 | "id": "b4ab1f50",
21 | "metadata": {},
22 | "outputs": [],
23 | "source": [
24 | "import networkx as nx\n",
25 | "import matplotlib.pyplot as plt\n",
26 | "%matplotlib inline\n",
27 | "plt.rcParams['figure.figsize'] = (10, 6)\n",
28 | "plt.style.use('ggplot')\n",
29 | "karate_club = nx.karate_club_graph()"
30 | ]
31 | },
32 | {
33 | "cell_type": "code",
34 | "execution_count": null,
35 | "id": "f6b1c3ef",
36 | "metadata": {},
37 | "outputs": [],
38 | "source": []
39 | }
40 | ],
41 | "metadata": {
42 | "kernelspec": {
43 | "display_name": "Python 3 (ipykernel)",
44 | "language": "python",
45 | "name": "python3"
46 | },
47 | "language_info": {
48 | "codemirror_mode": {
49 | "name": "ipython",
50 | "version": 3
51 | },
52 | "file_extension": ".py",
53 | "mimetype": "text/x-python",
54 | "name": "python",
55 | "nbconvert_exporter": "python",
56 | "pygments_lexer": "ipython3",
57 | "version": "3.9.6"
58 | }
59 | },
60 | "nbformat": 4,
61 | "nbformat_minor": 5
62 | }
63 |
--------------------------------------------------------------------------------
/24-networks/24-networks-slides.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/24-networks/24-networks-slides.pdf
--------------------------------------------------------------------------------
/24-networks/24-path-search.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Introduction to Data Science – Networks (Path Search)\n",
8 | "*COMP 5360 / MATH 4100, University of Utah, http://datasciencecourse.net/* \n",
9 | "\n",
10 | "This is a continuation of how to work with graphs in Python using the [NetworkX](networkx.github.io) library. Here we focus on understand Path Search Algorithms."
11 | ]
12 | },
13 | {
14 | "cell_type": "code",
15 | "execution_count": null,
16 | "metadata": {},
17 | "outputs": [],
18 | "source": [
19 | "import networkx as nx\n",
20 | "import matplotlib.pyplot as plt\n",
21 | "%matplotlib inline\n",
22 | "plt.rcParams['figure.figsize'] = (10, 6)\n",
23 | "plt.style.use('ggplot')"
24 | ]
25 | },
26 | {
27 | "cell_type": "markdown",
28 | "metadata": {},
29 | "source": [
30 | "We'll also import the Les Miserable network again"
31 | ]
32 | },
33 | {
34 | "cell_type": "code",
35 | "execution_count": null,
36 | "metadata": {},
37 | "outputs": [],
38 | "source": [
39 | "# Read the graph file\n",
40 | "lesmis = nx.read_gml('lesmis.gml')\n",
41 | "# Plot the nodes\n",
42 | "lesmis.nodes()"
43 | ]
44 | },
45 | {
46 | "cell_type": "markdown",
47 | "metadata": {},
48 | "source": [
49 | "## Path Search\n",
50 | "\n",
51 | "Path search, and in particular shortest path search is an important problem. It answers questions such as \n",
52 | " * how do I get as quickly as possible from A to B in a road network\n",
53 | " * how to best rout a data package that delivers the next second of your Netflix movie\n",
54 | " * who can I talk to to get an introduction to Person B\n",
55 | " * etc.\n",
56 | " \n",
57 | "There are two major types of path search algorithms: \n",
58 | "\n",
59 | "1. Algorithms that operate only on the topology, i.e., only the \"distance\" is relevant\n",
60 | "2. Algorithms that also consider edge weights, i.e., they minimize a \"cost\"\n",
61 | "\n",
62 | "For the above scenarios, edge weights make a lot of sense: I might give a different weight to an edge that is an Interstate, for example, as I will be able to travel faster. "
63 | ]
64 | },
65 | {
66 | "cell_type": "markdown",
67 | "metadata": {},
68 | "source": [
69 | ""
70 | ]
71 | },
72 | {
73 | "cell_type": "markdown",
74 | "metadata": {},
75 | "source": [
76 | "### Breadth First Seach\n",
77 | "\n",
78 | "Breadth first search is a simple algorithm that solves the single-source shortest path problem, i.e., it calculates the shortest path from one source to all other nodes in the network. \n",
79 | "\n",
80 | "The algorithm works as follows:\n",
81 | "\n",
82 | "1. Label source node 0\n",
83 | "2. Find neighbors, label 1, put in queue\n",
84 | "3. Take node labeled n (1 for first step) out of queue. Find its unlabeled neighbors. Label them n+1 and put in queue\n",
85 | "4. Repeat 3 until found node (if only the exact path is relevant) or no nodes left (when looking for all shortest paths)\n",
86 | "5. The distance between start and end node is the label of the end node.\n",
87 | "\n",
88 | "Let's look at the path from Boulatruelle to Napoleon:"
89 | ]
90 | },
91 | {
92 | "cell_type": "code",
93 | "execution_count": null,
94 | "metadata": {},
95 | "outputs": [],
96 | "source": [
97 | "path = nx.shortest_path(lesmis,source=\"Boulatruelle\",target=\"Marius\")\n",
98 | "path"
99 | ]
100 | },
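For reference, here is a from-scratch sketch of the labeling procedure described above for an unweighted NetworkX graph; in practice `nx.shortest_path` does this (and more) for us:

```python
from collections import deque

import networkx as nx

lesmis = nx.read_gml('lesmis.gml')  # the same file loaded at the top of the notebook

def bfs_distances(graph, source):
    labels = {source: 0}                # step 1: label the source 0
    queue = deque([source])
    while queue:                        # steps 2-3: expand ring by ring
        node = queue.popleft()
        for neighbor in graph.neighbors(node):
            if neighbor not in labels:  # only unlabeled neighbors get a label
                labels[neighbor] = labels[node] + 1
                queue.append(neighbor)
    return labels                       # step 5: the distance is the label

# The label of Marius is the length of the shortest path from Boulatruelle
print(bfs_distances(lesmis, "Boulatruelle")["Marius"])
```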
101 | {
102 | "cell_type": "markdown",
103 | "metadata": {},
104 | "source": [
105 | "And the path from Perpetue to Napoleon:"
106 | ]
107 | },
108 | {
109 | "cell_type": "code",
110 | "execution_count": null,
111 | "metadata": {},
112 | "outputs": [],
113 | "source": [
114 | "path = nx.shortest_path(lesmis,source=\"Perpetue\",target=\"Napoleon\")\n",
115 | "path"
116 | ]
117 | },
118 | {
119 | "cell_type": "markdown",
120 | "metadata": {},
121 | "source": [
122 | "### Dijkstra's Algorithm\n",
123 | "\n",
124 | "[Dijkstra's algoritm](https://en.wikipedia.org/wiki/Dijkstra%27s_algorithm) is the go-to algorithm for finding paths in a weigthed graph.\n",
125 | "\n",
126 | "Let the node at which we are starting be called the initial node. Let the distance of node Y be the distance from the initial node to Y. Dijkstra's algorithm will assign some initial distance values and will try to improve them step by step.\n",
127 | "1. Assign to every node a tentative distance value: set it to zero for our initial node and to infinity for all other nodes.\n",
128 | "2. Set the initial node as current. Mark all other nodes unvisited. Create a set of all the unvisited nodes called the unvisited set.\n",
129 | "3. For the current node, consider all of its unvisited neighbors and calculate their tentative distances. Compare the newly calculated tentative distance to the current assigned value and assign the smaller one. For example, if the current node A is marked with a distance of 6, and the edge connecting it with a neighbor B has length 2, then the distance to B (through A) will be 6 + 2 = 8. If B was previously marked with a distance greater than 8 then change it to 8. Otherwise, keep the current value.\n",
130 | "4. When we are done considering all of the neighbors of the current node, mark the current node as visited and remove it from the unvisited set. A visited node will never be checked again.\n",
131 | "5. If the destination node has been marked visited (when planning a route between two specific nodes) or if the smallest tentative distance among the nodes in the unvisited set is infinity (when planning a complete traversal; occurs when there is no connection between the initial node and remaining unvisited nodes), then stop. The algorithm has finished.\n",
132 | "6. Otherwise, select the unvisited node that is marked with the smallest tentative distance, set it as the new \"current node\", and go back to step 3.\n",
133 | "\n",
134 | "Here' is an animation for Dijkstra's Algorithm from Wikipedia (we'll go through this in class):\n",
135 | "\n",
136 | "\n",
137 | "\n",
138 | "Here is an illustration of Dijkstra's Algorithm for a motion planning task:\n",
139 | "\n",
140 | ""
141 | ]
142 | },
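Here is the promised compact sketch of the steps above, for a NetworkX graph whose edges carry a numeric attribute (`'value'` in the Les Miserables graph); it is illustrative only, and `nx.dijkstra_path` below remains the practical choice:

```python
import heapq

import networkx as nx

lesmis = nx.read_gml('lesmis.gml')

def dijkstra_distances(graph, source, weight="value"):
    dist = {node: float("inf") for node in graph.nodes()}  # step 1
    dist[source] = 0
    visited = set()
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)   # step 6: smallest tentative distance
        if node in visited:
            continue
        visited.add(node)               # step 4: a visited node is final
        for neighbor in graph.neighbors(node):
            candidate = d + graph[node][neighbor][weight]  # step 3: relax
            if candidate < dist[neighbor]:
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist

print(dijkstra_distances(lesmis, "Perpetue")["Napoleon"])
```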
143 | {
144 | "cell_type": "markdown",
145 | "metadata": {},
146 | "source": [
147 | "Our Les Miserables dataset actually comes with edge weights. The weight describes the number of co-occurrences of the characters. Now, let's look at the values:"
148 | ]
149 | },
150 | {
151 | "cell_type": "code",
152 | "execution_count": null,
153 | "metadata": {
154 | "scrolled": true
155 | },
156 | "outputs": [],
157 | "source": [
158 | "lesmis.edges(data=True)"
159 | ]
160 | },
161 | {
162 | "cell_type": "markdown",
163 | "metadata": {},
164 | "source": [
165 | "We can draw the graph with these weights."
166 | ]
167 | },
168 | {
169 | "cell_type": "code",
170 | "execution_count": null,
171 | "metadata": {},
172 | "outputs": [],
173 | "source": [
174 | "plt.rcParams['figure.figsize'] = (10, 15)\n",
175 | "\n",
176 | "pos = nx.spring_layout(lesmis)\n",
177 | "\n",
178 | "# Use edge weights in line drawing\n",
179 | "edge_widths = [1.0 * x[2]['value'] for x in lesmis.edges(data=True)]\n",
180 | "\n",
181 | "nx.draw(lesmis, pos=pos)\n",
182 | "nx.draw_networkx(lesmis, pos=pos, width=edge_widths)\n",
183 | "plt.show()"
184 | ]
185 | },
186 | {
187 | "cell_type": "markdown",
188 | "metadata": {},
189 | "source": [
190 | "That was nasty, let's try color."
191 | ]
192 | },
193 | {
194 | "cell_type": "code",
195 | "execution_count": null,
196 | "metadata": {},
197 | "outputs": [],
198 | "source": [
199 | "plt.rcParams['figure.figsize'] = (10, 15)\n",
200 | "\n",
201 | "pos = nx.spring_layout(lesmis)\n",
202 | "\n",
203 | "# Use edge weights in line drawing\n",
204 | "edge_colors = [ x[2]['value'] / 31.0 for x in lesmis.edges(data=True)]\n",
205 | "\n",
206 | "nx.draw(lesmis, pos=pos)\n",
207 | "nx.draw_networkx(lesmis, pos=pos, edge_color=edge_colors, width=2.0, edge_cmap=plt.cm.YlOrRd)\n",
208 | "plt.show()"
209 | ]
210 | },
211 | {
212 | "cell_type": "markdown",
213 | "metadata": {},
214 | "source": [
215 | "First we run the algorithm without weights:"
216 | ]
217 | },
218 | {
219 | "cell_type": "code",
220 | "execution_count": null,
221 | "metadata": {},
222 | "outputs": [],
223 | "source": [
224 | "path = nx.dijkstra_path(lesmis, source=\"Perpetue\", target=\"Napoleon\")\n",
225 | "path"
226 | ]
227 | },
228 | {
229 | "cell_type": "markdown",
230 | "metadata": {},
231 | "source": [
232 | "And then we run it with the weights, to have a comparison:"
233 | ]
234 | },
235 | {
236 | "cell_type": "code",
237 | "execution_count": null,
238 | "metadata": {},
239 | "outputs": [],
240 | "source": [
241 | "weighted_path = nx.dijkstra_path(lesmis, source=\"Perpetue\", target=\"Napoleon\", weight=\"value\")\n",
242 | "weighted_path"
243 | ]
244 | },
245 | {
246 | "cell_type": "markdown",
247 | "metadata": {},
248 | "source": [
249 | "We can calculate the relative weights of these paths:"
250 | ]
251 | },
252 | {
253 | "cell_type": "code",
254 | "execution_count": null,
255 | "metadata": {},
256 | "outputs": [],
257 | "source": [
258 | "def getPathCost(path):\n",
259 | " length = len(path)\n",
260 | " weight = 0\n",
261 | " for i in range(length-1):\n",
262 | " attributes = lesmis[path[i]][path[i+1]]\n",
263 | " weight += attributes[\"value\"]\n",
264 | " print(path[i], path[i+1], attributes)\n",
265 | " print(\"Weight:\", weight)\n",
266 | " \n",
267 | "print(\"Shortest Path\")\n",
268 | "getPathCost(path)\n",
269 | "\n",
270 | "print(\"\\n ==== \\n\")\n",
271 | "\n",
272 | "print(\"Weighted Path\") \n",
273 | "getPathCost(weighted_path)\n"
274 | ]
275 | },
276 | {
277 | "cell_type": "markdown",
278 | "metadata": {},
279 | "source": [
280 | "### The A* Algorithm - Path Finding using Heuristics"
281 | ]
282 | },
283 | {
284 | "cell_type": "markdown",
285 | "metadata": {},
286 | "source": [
287 | "Dijkstra is a great general algorithm, but it can be slow. \n",
288 | "\n",
289 | "If we know more about the network we're working with, we can use a more efficient algorithm that takes this information into account. For example, in motion planning and in route planning on a map, we know where the target point is located spatially, relative to the source point. We can take this information into account by using a heuristic function to refine the search. \n",
290 | "\n",
291 | "The [A* algorithm](https://en.wikipedia.org/wiki/A*_search_algorithm) is such an algorithm. It's based on Djikstra's algorithm, but uses a heuristic function to guide it's search into the right direction. A* is an informed search algorithm, or a best-first search, meaning that it solves problems by searching among all possible paths to the solution (goal) for the one that incurs the smallest cost (least distance traveled, shortest time, etc.), and among these paths it first considers the ones that appear to lead most quickly to the solution. \n",
292 | "\n",
293 | "At each step of the algorithm, A* evaluates which is the best paths to follow\n",
294 | "\n",
295 | "See the following example:\n",
296 | "\n",
297 | "\n",
298 | "\n",
299 | "While [NetworkX](https://networkx.readthedocs.io/en/stable/reference/algorithms.shortest_paths.html#module-networkx.algorithms.shortest_paths.astar) provides an implementation of the A* algorithm, we are not able to define a meaningful heuristic function for the Les Miserables graph, so we can't use it on this graph."
300 | ]
301 | }
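To show what a heuristic looks like when one is available, here is a small sketch on a 2D grid graph, where straight-line distance to the target is a valid heuristic (an illustrative addition using NetworkX's `astar_path`):

```python
import networkx as nx

# Nodes of a grid graph are (x, y) tuples, so the Euclidean distance to the
# target is a sensible (admissible) heuristic.
grid = nx.grid_2d_graph(10, 10)

def euclidean(a, b):
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5

path = nx.astar_path(grid, (0, 0), (9, 9), heuristic=euclidean)
print(path)
```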
302 | ],
303 | "metadata": {
304 | "anaconda-cloud": {},
305 | "kernelspec": {
306 | "display_name": "Python 3 (ipykernel)",
307 | "language": "python",
308 | "name": "python3"
309 | },
310 | "language_info": {
311 | "codemirror_mode": {
312 | "name": "ipython",
313 | "version": 3
314 | },
315 | "file_extension": ".py",
316 | "mimetype": "text/x-python",
317 | "name": "python",
318 | "nbconvert_exporter": "python",
319 | "pygments_lexer": "ipython3",
320 | "version": "3.9.6"
321 | }
322 | },
323 | "nbformat": 4,
324 | "nbformat_minor": 1
325 | }
326 |
--------------------------------------------------------------------------------
/24-networks/Astar_progress_animation.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/24-networks/Astar_progress_animation.gif
--------------------------------------------------------------------------------
/24-networks/Dijkstra_Animation.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/24-networks/Dijkstra_Animation.gif
--------------------------------------------------------------------------------
/24-networks/Dijkstras_progress_animation.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/24-networks/Dijkstras_progress_animation.gif
--------------------------------------------------------------------------------
/24-networks/bread.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datascience-course/2024-datascience-lectures/0f16bd94a2f4de1b1ba4b4c65b7f6e40fd21b962/24-networks/bread.png
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2019, the University of Utah
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Introduction to Data Science – Lecture Material
2 | Course website: [http://datasciencecourse.net](http://datasciencecourse.net)
3 |
4 | This repository contains (and will be updated with) the material used in the lectures. You can manually download the files for each lecture, but we recommend that you use git to clone and update this repository.
5 |
6 | You can use [GitHub Desktop](https://desktop.github.com/) to update this repository as new lectures are published, or you can use the following commands (recommended):
7 |
8 | ## Initial Step: Cloning
9 |
10 | When you clone a repository you set up a copy on your computer. Run:
11 |
12 | ```bash
13 | git clone https://github.com/datascience-course/2024-datascience-lectures
14 | ```
15 |
16 | This will create a folder `2024-datascience-lectures` on your computer, with the individual lectures in subdirectories.
17 |
18 | ## Updating
19 |
20 | As we release new lectures or update lectures, you'll have to update your repository. You can do this by changing into the `2024-datascience-lectures` directory and executing:
21 |
22 | ```bash
23 | git pull
24 | ```
25 |
26 | That's it – you'll have the latest version of the lectures.
27 |
--------------------------------------------------------------------------------