"
1658 | ],
1659 | "text/plain": [
1660 | " composer birth death city age age_def\n",
1661 | "0 Mahler 1860 1911 Kaliste 51 young\n",
1662 | "3 Shostakovich 1906 1975 Saint-Petersburg 69 old"
1663 | ]
1664 | },
1665 | "execution_count": 36,
1666 | "metadata": {},
1667 | "output_type": "execute_result"
1668 | }
1669 | ],
1670 | "source": [
1671 | "compos_sub"
1672 | ]
1673 | },
1674 | {
1675 | "cell_type": "markdown",
1676 | "metadata": {},
1677 | "source": [
1678 | "We can then modify the new array:"
1679 | ]
1680 | },
1681 | {
1682 | "cell_type": "code",
1683 | "execution_count": 37,
1684 | "metadata": {},
1685 | "outputs": [
1686 | {
1687 | "name": "stderr",
1688 | "output_type": "stream",
1689 | "text": [
1690 | "/Users/gw18g940/miniconda3/envs/danalytics/lib/python3.8/site-packages/pandas/core/indexing.py:966: SettingWithCopyWarning: \n",
1691 | "A value is trying to be set on a copy of a slice from a DataFrame.\n",
1692 | "Try using .loc[row_indexer,col_indexer] = value instead\n",
1693 | "\n",
1694 | "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n",
1695 | " self.obj[item] = s\n"
1696 | ]
1697 | }
1698 | ],
1699 | "source": [
1700 | "compos_sub.loc[0,'birth'] = 3000"
1701 | ]
1702 | },
1703 | {
1704 | "cell_type": "markdown",
1705 | "metadata": {},
1706 | "source": [
1707 | "Note that we get this SettingWithCopyWarning warning. This is a very common problem hand has to do with how new arrays are created when making subselections. Simply stated, did we create an entirely new array or a \"view\" of the old one? This will be very case-dependent and to avoid this, if we want to create a new array we can just enforce it using the ```copy()``` method (for more information on the topic see for example this [explanation](https://www.dataquest.io/blog/settingwithcopywarning/):"
1708 | ]
1709 | },
1710 | {
1711 | "cell_type": "code",
1712 | "execution_count": 38,
1713 | "metadata": {},
1714 | "outputs": [],
1715 | "source": [
1716 | "compos_sub2 = compo_pd[compo_pd['birth'] > 1859].copy()\n",
1717 | "compos_sub2.loc[0,'birth'] = 3000"
1718 | ]
1719 | },
1720 | {
1721 | "cell_type": "code",
1722 | "execution_count": 39,
1723 | "metadata": {},
1724 | "outputs": [
1725 | {
1726 | "data": {
1727 | "text/html": [
1728 | "
\n",
1729 | "\n",
1742 | "
\n",
1743 | " \n",
1744 | "
\n",
1745 | "
\n",
1746 | "
composer
\n",
1747 | "
birth
\n",
1748 | "
death
\n",
1749 | "
city
\n",
1750 | "
age
\n",
1751 | "
age_def
\n",
1752 | "
\n",
1753 | " \n",
1754 | " \n",
1755 | "
\n",
1756 | "
0
\n",
1757 | "
Mahler
\n",
1758 | "
3000
\n",
1759 | "
1911
\n",
1760 | "
Kaliste
\n",
1761 | "
51
\n",
1762 | "
young
\n",
1763 | "
\n",
1764 | "
\n",
1765 | "
3
\n",
1766 | "
Shostakovich
\n",
1767 | "
1906
\n",
1768 | "
1975
\n",
1769 | "
Saint-Petersburg
\n",
1770 | "
69
\n",
1771 | "
old
\n",
1772 | "
\n",
1773 | " \n",
1774 | "
\n",
1775 | "
"
1776 | ],
1777 | "text/plain": [
1778 | " composer birth death city age age_def\n",
1779 | "0 Mahler 3000 1911 Kaliste 51 young\n",
1780 | "3 Shostakovich 1906 1975 Saint-Petersburg 69 old"
1781 | ]
1782 | },
1783 | "execution_count": 39,
1784 | "metadata": {},
1785 | "output_type": "execute_result"
1786 | }
1787 | ],
1788 | "source": [
1789 | "compos_sub2"
1790 | ]
1791 | }
1792 | ],
1793 | "metadata": {
1794 | "kernelspec": {
1795 | "display_name": "Python 3",
1796 | "language": "python",
1797 | "name": "python3"
1798 | },
1799 | "language_info": {
1800 | "codemirror_mode": {
1801 | "name": "ipython",
1802 | "version": 3
1803 | },
1804 | "file_extension": ".py",
1805 | "mimetype": "text/x-python",
1806 | "name": "python",
1807 | "nbconvert_exporter": "python",
1808 | "pygments_lexer": "ipython3",
1809 | "version": "3.8.2"
1810 | }
1811 | },
1812 | "nbformat": 4,
1813 | "nbformat_minor": 4
1814 | }
1815 |
--------------------------------------------------------------------------------
/98-DA_Numpy_Exercises.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "code",
5 | "execution_count": 2,
6 | "metadata": {},
7 | "outputs": [],
8 | "source": [
9 | "import numpy as np\n",
10 | "import matplotlib.pyplot as plt"
11 | ]
12 | },
13 | {
14 | "cell_type": "markdown",
15 | "metadata": {},
16 | "source": [
17 | "# Exercice Numpy"
18 | ]
19 | },
20 | {
21 | "cell_type": "markdown",
22 | "metadata": {},
23 | "source": [
24 | "## 1. Array creation"
25 | ]
26 | },
27 | {
28 | "cell_type": "markdown",
29 | "metadata": {},
30 | "source": [
31 | "- Create a 1D array called ```xarray``` with values from 0 to 10 and in steps of 0.1. Check the shape of the array:"
32 | ]
33 | },
34 | {
35 | "cell_type": "code",
36 | "execution_count": null,
37 | "metadata": {},
38 | "outputs": [],
39 | "source": []
40 | },
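{
"cell_type": "markdown",
"metadata": {},
"source": [
"*A possible sketch (one of several valid solutions; ```np.linspace(0, 10, 101)``` would work as well):*"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# stop at 10.1 so that the value 10 itself is included\n",
"xarray = np.arange(0, 10.1, 0.1)\n",
"xarray.shape"
]
},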
41 | {
42 | "cell_type": "markdown",
43 | "metadata": {},
44 | "source": [
45 | "- Create an array of normally distributed numbers with mean $\\mu=0$ and standard deviation $\\sigma=0.5$. It should have 20 rows and as many columns as there are elements in ```xarray```. Call it ```normal_array```:"
46 | ]
47 | },
48 | {
49 | "cell_type": "code",
50 | "execution_count": null,
51 | "metadata": {},
52 | "outputs": [],
53 | "source": []
54 | },
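{
"cell_type": "markdown",
"metadata": {},
"source": [
"*A possible sketch using ```np.random.normal```:*"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# 20 rows, one column per element of xarray\n",
"normal_array = np.random.normal(loc=0, scale=0.5, size=(20, xarray.shape[0]))\n",
"normal_array.shape"
]
},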
55 | {
56 | "cell_type": "markdown",
57 | "metadata": {},
58 | "source": [
59 | "- Check the type of ```normal_array```:"
60 | ]
61 | },
62 | {
63 | "cell_type": "code",
64 | "execution_count": null,
65 | "metadata": {},
66 | "outputs": [],
67 | "source": []
68 | },
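{
"cell_type": "markdown",
"metadata": {},
"source": [
"*A possible sketch; ```type``` gives the container type, ```dtype``` the type of the elements:*"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"type(normal_array), normal_array.dtype"
]
},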
69 | {
70 | "cell_type": "markdown",
71 | "metadata": {},
72 | "source": [
73 | "## 2. Array mathematics"
74 | ]
75 | },
76 | {
77 | "cell_type": "markdown",
78 | "metadata": {},
79 | "source": [
80 | "- Using ```xarray``` as x-variable, create a new array ```yarray``` as y-variable using the function $y = 10* cos(x) * e^{-0.1x}$:"
81 | ]
82 | },
83 | {
84 | "cell_type": "code",
85 | "execution_count": null,
86 | "metadata": {},
87 | "outputs": [],
88 | "source": []
89 | },
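{
"cell_type": "markdown",
"metadata": {},
"source": [
"*A possible sketch; Numpy functions are applied element-wise:*"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"yarray = 10 * np.cos(xarray) * np.exp(-0.1 * xarray)"
]
},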
90 | {
91 | "cell_type": "markdown",
92 | "metadata": {},
93 | "source": [
94 | "- Create ```array_abs``` by taking the absolute value of ```yarray```:"
95 | ]
96 | },
97 | {
98 | "cell_type": "code",
99 | "execution_count": null,
100 | "metadata": {},
101 | "outputs": [],
102 | "source": []
103 | },
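{
"cell_type": "markdown",
"metadata": {},
"source": [
"*A possible sketch:*"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"array_abs = np.abs(yarray)"
]
},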
104 | {
105 | "cell_type": "markdown",
106 | "metadata": {},
107 | "source": [
108 | "- Create a boolan array (logical array) where all positions $>0.3$ in ```array_abs``` are ```True``` and the others ```False```"
109 | ]
110 | },
111 | {
112 | "cell_type": "code",
113 | "execution_count": null,
114 | "metadata": {},
115 | "outputs": [],
116 | "source": []
117 | },
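{
"cell_type": "markdown",
"metadata": {},
"source": [
"*A possible sketch; a comparison applied to an array returns a boolean array:*"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"bool_array = array_abs > 0.3\n",
"bool_array"
]
},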
118 | {
119 | "cell_type": "markdown",
120 | "metadata": {},
121 | "source": [
122 | "- Create a standard deviation projection along the second dimension (columns) of ```normal_array```. Check that the dimensions are the ones you expected. Also are the values around the value you expect?"
123 | ]
124 | },
125 | {
126 | "cell_type": "code",
127 | "execution_count": null,
128 | "metadata": {},
129 | "outputs": [],
130 | "source": []
131 | },
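{
"cell_type": "markdown",
"metadata": {},
"source": [
"*A possible sketch; projecting along the columns (```axis=1```) leaves one value per row, and the values should be close to $\\sigma=0.5$:*"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"std_projection = normal_array.std(axis=1)\n",
"std_projection.shape, std_projection.mean()"
]
},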
132 | {
133 | "cell_type": "markdown",
134 | "metadata": {},
135 | "source": [
136 | "## 3. Plotting\n",
137 | "\n",
138 | "- Use a line plot to plot ```yarray``` vs ```xarray```:"
139 | ]
140 | },
141 | {
142 | "cell_type": "code",
143 | "execution_count": null,
144 | "metadata": {},
145 | "outputs": [],
146 | "source": []
147 | },
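{
"cell_type": "markdown",
"metadata": {},
"source": [
"*A possible sketch:*"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"plt.plot(xarray, yarray)\n",
"plt.show()"
]
},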
148 | {
149 | "cell_type": "markdown",
150 | "metadata": {},
151 | "source": [
152 | "- Try to change the color of the plot to red and to have markers on top of the line as squares:"
153 | ]
154 | },
155 | {
156 | "cell_type": "code",
157 | "execution_count": null,
158 | "metadata": {},
159 | "outputs": [],
160 | "source": []
161 | },
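{
"cell_type": "markdown",
"metadata": {},
"source": [
"*A possible sketch; ```'s'``` is the square marker:*"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"plt.plot(xarray, yarray, color='red', marker='s')\n",
"plt.show()"
]
},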
162 | {
163 | "cell_type": "markdown",
164 | "metadata": {},
165 | "source": [
166 | "- Plot the ```normal_array``` as an imagage and change the colormap to 'gray':"
167 | ]
168 | },
169 | {
170 | "cell_type": "code",
171 | "execution_count": null,
172 | "metadata": {},
173 | "outputs": [],
174 | "source": []
175 | },
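{
"cell_type": "markdown",
"metadata": {},
"source": [
"*A possible sketch:*"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"plt.imshow(normal_array, cmap='gray')\n",
"plt.show()"
]
},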
176 | {
177 | "cell_type": "markdown",
178 | "metadata": {},
179 | "source": [
180 | "- Assemble the two above plots in a figure with one row and two columns grid:"
181 | ]
182 | },
183 | {
184 | "cell_type": "code",
185 | "execution_count": null,
186 | "metadata": {},
187 | "outputs": [],
188 | "source": []
189 | },
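{
"cell_type": "markdown",
"metadata": {},
"source": [
"*A possible sketch using ```plt.subplots```:*"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"fig, axes = plt.subplots(1, 2, figsize=(10, 4))\n",
"axes[0].plot(xarray, yarray, color='red', marker='s')\n",
"axes[1].imshow(normal_array, cmap='gray')\n",
"plt.show()"
]
},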
190 | {
191 | "cell_type": "markdown",
192 | "metadata": {},
193 | "source": [
194 | "## 4. Indexing\n",
195 | "\n",
196 | "- Create new arrays where you select every second element from xarray and yarray. Plot them on top of ```xarray``` and ```yarray```."
197 | ]
198 | },
199 | {
200 | "cell_type": "code",
201 | "execution_count": null,
202 | "metadata": {},
203 | "outputs": [],
204 | "source": []
205 | },
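{
"cell_type": "markdown",
"metadata": {},
"source": [
"*A possible sketch; the slice ```::2``` takes every second element:*"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"x_sub = xarray[::2]\n",
"y_sub = yarray[::2]\n",
"plt.plot(xarray, yarray)\n",
"plt.plot(x_sub, y_sub, 'o')\n",
"plt.show()"
]
},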
206 | {
207 | "cell_type": "markdown",
208 | "metadata": {},
209 | "source": [
210 | "- Select all values of ```yarray``` that are larger than 0. Plot those on top of the regular ```xarray``` and ```yarray```plot."
211 | ]
212 | },
213 | {
214 | "cell_type": "code",
215 | "execution_count": null,
216 | "metadata": {},
217 | "outputs": [],
218 | "source": []
219 | },
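{
"cell_type": "markdown",
"metadata": {},
"source": [
"*A possible sketch using a boolean mask on both arrays:*"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"mask = yarray > 0\n",
"plt.plot(xarray, yarray)\n",
"plt.plot(xarray[mask], yarray[mask], 'o')\n",
"plt.show()"
]
},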
220 | {
221 | "cell_type": "markdown",
222 | "metadata": {},
223 | "source": [
224 | "- Flip the order of ```xarray``` use it to plot ```yarray```:"
225 | ]
226 | },
227 | {
228 | "cell_type": "code",
229 | "execution_count": null,
230 | "metadata": {},
231 | "outputs": [],
232 | "source": []
233 | },
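{
"cell_type": "markdown",
"metadata": {},
"source": [
"*A possible sketch; ```np.flip(xarray)``` does the same as the reversing slice:*"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"x_flipped = xarray[::-1]\n",
"plt.plot(x_flipped, yarray)\n",
"plt.show()"
]
},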
234 | {
235 | "cell_type": "markdown",
236 | "metadata": {},
237 | "source": [
238 | "## 5. Combining arrays\n",
239 | "\n",
240 | "- Create an array filled with ones with the same shape as ```normal_array```. Concatenate it to ```normal_array``` along the first dimensions and plot the result:"
241 | ]
242 | },
243 | {
244 | "cell_type": "code",
245 | "execution_count": null,
246 | "metadata": {},
247 | "outputs": [],
248 | "source": []
249 | },
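{
"cell_type": "markdown",
"metadata": {},
"source": [
"*A possible sketch; with ```axis=0``` the row counts add up:*"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ones = np.ones(normal_array.shape)\n",
"combined = np.concatenate([normal_array, ones], axis=0)\n",
"plt.imshow(combined, cmap='gray')\n",
"plt.show()"
]
},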
250 | {
251 | "cell_type": "markdown",
252 | "metadata": {},
253 | "source": [
254 | "- ```yarray``` represents a signal. Each line of ```normal_array``` represents a possible random noise for that signal. Using broadcasting, try to create an array of noisy versions of ```yarray``` using ```normal_array```. Finally, plot it:"
255 | ]
256 | },
257 | {
258 | "cell_type": "code",
259 | "execution_count": null,
260 | "metadata": {},
261 | "outputs": [],
262 | "source": []
263 | }
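,
{
"cell_type": "markdown",
"metadata": {},
"source": [
"*A possible sketch; the signal of shape ```(n,)``` broadcasts against the noise array of shape ```(20, n)```:*"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"noisy = yarray + normal_array\n",
"plt.plot(xarray, noisy.T)\n",
"plt.show()"
]
}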
264 | ],
265 | "metadata": {
266 | "kernelspec": {
267 | "display_name": "Python 3",
268 | "language": "python",
269 | "name": "python3"
270 | },
271 | "language_info": {
272 | "codemirror_mode": {
273 | "name": "ipython",
274 | "version": 3
275 | },
276 | "file_extension": ".py",
277 | "mimetype": "text/x-python",
278 | "name": "python",
279 | "nbconvert_exporter": "python",
280 | "pygments_lexer": "ipython3",
281 | "version": "3.8.5"
282 | },
283 | "toc": {
284 | "base_numbering": 1,
285 | "nav_menu": {},
286 | "number_sections": false,
287 | "sideBar": true,
288 | "skip_h1_title": false,
289 | "title_cell": "Table of Contents",
290 | "title_sidebar": "Contents",
291 | "toc_cell": false,
292 | "toc_position": {},
293 | "toc_section_display": true,
294 | "toc_window_display": true
295 | },
296 | "varInspector": {
297 | "cols": {
298 | "lenName": 16,
299 | "lenType": 16,
300 | "lenVar": 40
301 | },
302 | "kernels_config": {
303 | "python": {
304 | "delete_cmd_postfix": "",
305 | "delete_cmd_prefix": "del ",
306 | "library": "var_list.py",
307 | "varRefreshCmd": "print(var_dic_list())"
308 | },
309 | "r": {
310 | "delete_cmd_postfix": ") ",
311 | "delete_cmd_prefix": "rm(",
312 | "library": "var_list.r",
313 | "varRefreshCmd": "cat(var_dic_list()) "
314 | }
315 | },
316 | "types_to_exclude": [
317 | "module",
318 | "function",
319 | "builtin_function_or_method",
320 | "instance",
321 | "_Feature"
322 | ],
323 | "window_display": false
324 | }
325 | },
326 | "nbformat": 4,
327 | "nbformat_minor": 4
328 | }
329 |
--------------------------------------------------------------------------------
/99-DA_Pandas_Exercises.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "code",
5 | "execution_count": 21,
6 | "metadata": {},
7 | "outputs": [],
8 | "source": [
9 | "import pandas as pd\n",
10 | "import numpy as np\n",
11 | "import matplotlib.pyplot as plt\n",
12 | "import seaborn as sns"
13 | ]
14 | },
15 | {
16 | "cell_type": "markdown",
17 | "metadata": {},
18 | "source": [
19 | "# Exercise Pandas"
20 | ]
21 | },
22 | {
23 | "cell_type": "markdown",
24 | "metadata": {},
25 | "source": [
26 | "For these exercices we are using a [dataset](https://www.kaggle.com/dgomonov/new-york-city-airbnb-open-data/kernels) provided by Airbnb for a Kaggle competition. It describes its offer for New York City in 2019, including types of appartments, price, location etc."
27 | ]
28 | },
29 | {
30 | "cell_type": "markdown",
31 | "metadata": {},
32 | "source": [
33 | "## 1. Create a dataframe \n",
34 | "Create a dataframe of a few lines with objects and their poperties (e.g fruits, their weight and colour).\n",
35 | "Calculate the mean of your Dataframe."
36 | ]
37 | },
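{
"cell_type": "markdown",
"metadata": {},
"source": [
"*A possible sketch (the objects and properties are of course free to choose):*"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"fruits = pd.DataFrame({'fruit': ['apple', 'banana', 'cherry'],\n",
"                       'weight': [150.0, 120.0, 8.0],\n",
"                       'colour': ['red', 'yellow', 'red']})\n",
"# mean only makes sense for the numeric columns\n",
"fruits.mean(numeric_only=True)"
]
},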
38 | {
39 | "cell_type": "markdown",
40 | "metadata": {},
41 | "source": [
42 | "## 2. Import\n",
43 | "- Import the table called ```AB_NYC_2019.csv``` as a dataframe. It is located in the Datasets folder. Have a look at the beginning of the table (head).\n",
44 | "\n",
45 | "- Create a histogram of prices"
46 | ]
47 | },
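{
"cell_type": "markdown",
"metadata": {},
"source": [
"*A possible sketch (adjust the path to wherever the Datasets folder lives on your system):*"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"airbnb = pd.read_csv('Datasets/AB_NYC_2019.csv')\n",
"print(airbnb.head())\n",
"airbnb['price'].plot(kind='hist', bins=100)\n",
"plt.show()"
]
},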
48 | {
49 | "cell_type": "markdown",
50 | "metadata": {},
51 | "source": [
52 | "## 3. Operations"
53 | ]
54 | },
55 | {
56 | "cell_type": "markdown",
57 | "metadata": {},
58 | "source": [
59 | "Create a new column in the dataframe by multiplying the \"price\" and \"availability_365\" columns to get an estimate of the maximum yearly income."
60 | ]
61 | },
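{
"cell_type": "markdown",
"metadata": {},
"source": [
"*A possible sketch; arithmetic on columns works element-wise:*"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"airbnb['yearly_income'] = airbnb['price'] * airbnb['availability_365']"
]
},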
62 | {
63 | "cell_type": "markdown",
64 | "metadata": {},
65 | "source": [
66 | "## 3b. Subselection and plotting\n",
67 | "Create a new Dataframe by first subselecting yearly incomes between 1 and 100'000. Then make a scatter plot of yearly income versus number of reviews "
68 | ]
69 | },
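{
"cell_type": "markdown",
"metadata": {},
"source": [
"*A possible sketch; note the parentheses around each condition (assuming the ```yearly_income``` column created above):*"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"income_sub = airbnb[(airbnb['yearly_income'] >= 1) & (airbnb['yearly_income'] <= 100000)]\n",
"income_sub.plot(x='number_of_reviews', y='yearly_income', kind='scatter')\n",
"plt.show()"
]
},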
70 | {
71 | "cell_type": "markdown",
72 | "metadata": {},
73 | "source": [
74 | "## 4. Combine"
75 | ]
76 | },
77 | {
78 | "cell_type": "markdown",
79 | "metadata": {},
80 | "source": [
81 | "We provide below and additional table that contains the number of inhabitants of each of New York's boroughs (\"neighbourhood_group\" in the table). Use ```merge``` to add this population information to each element in the original dataframe."
82 | ]
83 | },
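{
"cell_type": "markdown",
"metadata": {},
"source": [
"*A possible sketch, assuming the provided table is ```Data/ny_boroughs.xlsx``` with columns ```borough``` and ```population``` (adjust the path and column names to the actual table):*"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"boroughs = pd.read_excel('Data/ny_boroughs.xlsx')\n",
"merged = pd.merge(airbnb, boroughs, left_on='neighbourhood_group', right_on='borough')\n",
"merged.head()"
]
},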
84 | {
85 | "cell_type": "markdown",
86 | "metadata": {},
87 | "source": [
88 | "## 5. Groups"
89 | ]
90 | },
91 | {
92 | "cell_type": "markdown",
93 | "metadata": {},
94 | "source": [
95 | "- Using ```groupby``` calculate the average price for each type of room (room_type) in each neighbourhood_group. What is the average price for an entire home in Brooklyn ?\n",
96 | "- Unstack the multi-level Dataframe into a regular Dataframe with ```unstack()``` and create a bar plot with the resulting table\n"
97 | ]
98 | },
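{
"cell_type": "markdown",
"metadata": {},
"source": [
"*A possible sketch; the exact room type label (assumed here to be 'Entire home/apt') should be checked in the data:*"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"avg_price = airbnb.groupby(['neighbourhood_group', 'room_type'])['price'].mean()\n",
"print(avg_price.loc[('Brooklyn', 'Entire home/apt')])\n",
"avg_price.unstack().plot(kind='bar')\n",
"plt.show()"
]
},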
99 | {
100 | "cell_type": "markdown",
101 | "metadata": {},
102 | "source": [
103 | "## 6. Advanced plotting"
104 | ]
105 | },
106 | {
107 | "cell_type": "markdown",
108 | "metadata": {},
109 | "source": [
110 | "Using Seaborn, create a scatter plot where x and y positions are longitude and lattitude, the color reflects price and the shape of the marker the borough (neighbourhood_group). Can you recognize parts of new york ? Does the map make sense ?"
111 | ]
112 | }
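,
{
"cell_type": "markdown",
"metadata": {},
"source": [
"*A possible sketch with ```sns.scatterplot```; ```hue``` maps the color, ```style``` the marker shape:*"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sns.scatterplot(data=airbnb, x='longitude', y='latitude',\n",
"                hue='price', style='neighbourhood_group')\n",
"plt.show()"
]
}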
113 | ],
114 | "metadata": {
115 | "kernelspec": {
116 | "display_name": "Python 3",
117 | "language": "python",
118 | "name": "python3"
119 | },
120 | "language_info": {
121 | "codemirror_mode": {
122 | "name": "ipython",
123 | "version": 3
124 | },
125 | "file_extension": ".py",
126 | "mimetype": "text/x-python",
127 | "name": "python",
128 | "nbconvert_exporter": "python",
129 | "pygments_lexer": "ipython3",
130 | "version": "3.8.2"
131 | }
132 | },
133 | "nbformat": 4,
134 | "nbformat_minor": 4
135 | }
136 |
--------------------------------------------------------------------------------
/Data/composers.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/guiwitz/NumpyPandas_course/63506c8e1229483512786323539fbcf853ae8495/Data/composers.xlsx
--------------------------------------------------------------------------------
/Data/ny_boroughs.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/guiwitz/NumpyPandas_course/63506c8e1229483512786323539fbcf853ae8495/Data/ny_boroughs.xlsx
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | BSD 3-Clause License
2 |
3 | Copyright (c) 2020, Guillaume Witz
4 | All rights reserved.
5 |
6 | Redistribution and use in source and binary forms, with or without
7 | modification, are permitted provided that the following conditions are met:
8 |
9 | 1. Redistributions of source code must retain the above copyright notice, this
10 | list of conditions and the following disclaimer.
11 |
12 | 2. Redistributions in binary form must reproduce the above copyright notice,
13 | this list of conditions and the following disclaimer in the documentation
14 | and/or other materials provided with the distribution.
15 |
16 | 3. Neither the name of the copyright holder nor the names of its
17 | contributors may be used to endorse or promote products derived from
18 | this software without specific prior written permission.
19 |
20 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
21 | AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
22 | IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
23 | DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
24 | FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
25 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
26 | SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
27 | CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
28 | OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
29 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
30 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/guiwitz/NumpyPandas_course/54488164b462644baf601875be69cc911eda9615?urlpath=lab)
2 | [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/guiwitz/NumpyPandas_course/blob/colab)
3 |
4 |
5 | # Introduction to Numpy and Pandas
6 |
7 | This repository contains Jupyter notebooks introducing beginners to the Python packages Numpy and Pandas. The material has been designed for people already familiar with Python but not with its "scientific stack".
8 |
9 | This material has been created by Guillaume Witz (Science IT Support, Microscopy Imaging Center, Bern University) as part of the [courses offered by ScITS](https://www.scits.unibe.ch/).
10 |
11 | ## Content
12 | The course has the following content:
13 |
14 | ### Numpy
15 | - [Numpy arrays](01-DA_Numpy_arrays_creation.ipynb): what they are and how to create, import and save them
16 | - [Maths with Numpy arrays](02-DA_Numpy_array_maths.ipynb): applying functions to arrays, doing basic statistics with arrays
17 | - [Numpy and Matplotlib](03-DA_Numpy_matplotlib.ipynb): Basics of plotting Numpy arrays with Matplotlib
18 | - [Recovering parts of arrays](04-DA_Numpy_indexing.ipynb): Using array coordinates to extract information (indexing, slicing)
19 | - [Combining arrays](05-DA_Numpy_combining_arrays.ipynb): Assembling arrays by concatenation, stacking etc. Combining arrays of different sizes (broadcasting)
20 |
21 | ### Pandas
22 | - [Introduction to Pandas](06-DA_Pandas_introduction.ipynb): What does Pandas offer?
23 | - [Pandas data structures](07-DA_Pandas_structures.ipynb): Series and dataframes
24 | - [Importing data to Pandas](08-DA_Pandas_import_plotting.ipynb): Importing data tables into Pandas (from Excel, CSV) and plotting them
25 | - [Pandas operations](09-DA_Pandas_operations.ipynb): Applying functions to the contents of Pandas dataframes (classical statistics, ```apply``` function etc.)
26 | - [Combining Pandas dataframes](10-DA_Pandas_combine.ipynb): Using concatenation or join operations to combine dataframes
27 | - [Analyzing Pandas dataframes](11-DA_Pandas_splitting.ipynb): Split dataframes into groups (```groupby```) for category-based analysis
28 | - [A real-world example](12-DA_Pandas_realworld.ipynb): Complete pipeline including data import, cleaning, analysis and plotting and showing the nitty-gritty issues one often faces with real data
29 |
30 | ## Running the course
31 |
32 | ### Live sessions
33 |
34 | During live sessions of the course, you are given access to a private Jupyter session and don't need to install anything on your computer.
35 |
36 | ### Without installation
37 | Outside live sessions, this entire course can still be run interactively without any local installation thanks to the [mybinder](https://mybinder.org) service. For that just click on the mybinder badge at the top of this Readme. This will open a Jupyter session for you with all packages, notebooks and data available to run.
38 |
39 | Alternatively you can also run the course on Google Colab. For that just click on the Colab badge at the top of this file.
40 |
41 | ### Local installation
42 | For a local installation, we recommend using conda to create a specific environment to run the code. If you don't yet have conda, you can e.g. install miniconda, see [here](https://docs.conda.io/en/latest/miniconda.html) for instructions. Then:
43 |
44 | 1. Clone the repository to your computer using [this link](https://github.com/guiwitz/NumpyPandas_course/archive/master.zip) and unzip it
45 | 2. Open a terminal and move to the ```NumpyPandas_course-master/binder``` folder
46 | 3. Here you find an ```environment.yml``` file that you can use to create a conda environment. Choose an environment name e.g. ```numpypandas``` and type:
47 | ```
48 | conda env create -n numpypandas -f environment.yml
49 | ```
50 | 4. When you want to run the material, activate the environment and start jupyter:
51 | ```
52 | conda activate numpypandas
53 | jupyter lab
54 | ```
55 | Note that the top folder of your directory in Jupyter is the folder from where you started Jupyter. So if you are e.g. in the ```binder``` folder, move one level up to have access to the notebooks.
56 |
57 | ## Note on the data used
58 |
59 | In the Pandas part, we use some data provided publicly by the Swiss National Science foundation at this link: http://p3.snf.ch/Pages/DataAndDocumentation.aspx#DataDownload. The examples of analysis on these data **are in no way confirmed or validated by the SNSF and are entirely the work of Guillaume Witz, Science IT Support, Bern University**.
60 |
61 |
--------------------------------------------------------------------------------
/binder/environment.yml:
--------------------------------------------------------------------------------
1 | channels:
2 | - conda-forge
3 | dependencies:
4 | - numpy
5 | - matplotlib
6 | - scikit-learn
7 | - scikit-image
8 | - pandas
9 | - jupyter
10 | - jupyterlab=1.2.*
11 | - jupyter_contrib_nbextensions
12 | - tqdm
13 | - seaborn
14 | - pip
15 | - nodejs
16 | - ipywidgets
17 | - pip:
18 | - plotnine
19 | - xlrd
--------------------------------------------------------------------------------
/binder/postBuild:
--------------------------------------------------------------------------------
1 | jupyter labextension install @jupyterlab/toc --no-build
2 | jupyter labextension install @jupyter-widgets/jupyterlab-manager --no-build
3 | jupyter labextension install @lckr/jupyterlab_variableinspector --no-build
4 |
5 | jupyter lab build
--------------------------------------------------------------------------------
/colab/automate_colab_editing.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "code",
5 | "execution_count": 1,
6 | "metadata": {},
7 | "outputs": [],
8 | "source": [
9 | "import os, re, glob"
10 | ]
11 | },
12 | {
13 | "cell_type": "markdown",
14 | "metadata": {},
15 | "source": [
16 | "## Collect notebooks from regular branch"
17 | ]
18 | },
19 | {
20 | "cell_type": "code",
21 | "execution_count": 4,
22 | "metadata": {},
23 | "outputs": [],
24 | "source": [
25 | "notebooks_or = glob.glob('/Users/gw18g940/OneDrive - Universitaet Bern/Courses/DataAnalytics_course/DataAnalytics_course/*.ipynb')\n"
26 | ]
27 | },
28 | {
29 | "cell_type": "markdown",
30 | "metadata": {},
31 | "source": [
32 | "## Find which packages to add in each notebook by looking for \"special\" packages"
33 | ]
34 | },
35 | {
36 | "cell_type": "code",
37 | "execution_count": 6,
38 | "metadata": {},
39 | "outputs": [],
40 | "source": [
41 | "external_packages = ['aicsimageio','ipyvolume','mrc','trackpy','stardist','cellpose']\n",
42 | "new_packages = []\n",
43 | "for noteb in notebooks_or:\n",
44 | " with open(noteb) as n:\n",
45 | " all_lines = n.readlines()\n",
46 | " to_add = []\n",
47 | " for a in all_lines:\n",
48 | " if len(a) < 1000:\n",
49 | " for e in external_packages:\n",
50 | " if a.find(e) > 0:\n",
51 | " if e not in to_add:\n",
52 | " to_add.append(e)\n",
53 | " new_packages.append(to_add)"
54 | ]
55 | },
56 | {
57 | "cell_type": "code",
58 | "execution_count": 7,
59 | "metadata": {},
60 | "outputs": [
61 | {
62 | "data": {
63 | "text/plain": [
64 | "[[], [], [], [], [], [], [], [], [], [], [], [], [], [], [], []]"
65 | ]
66 | },
67 | "execution_count": 7,
68 | "metadata": {},
69 | "output_type": "execute_result"
70 | }
71 | ],
72 | "source": [
73 | "new_packages"
74 | ]
75 | },
76 | {
77 | "cell_type": "markdown",
78 | "metadata": {},
79 | "source": [
80 | "## Define basic cells to add to notebook"
81 | ]
82 | },
83 | {
84 | "cell_type": "code",
85 | "execution_count": 18,
86 | "metadata": {},
87 | "outputs": [],
88 | "source": [
89 | "data_import = \"\"\" {\n",
90 | " \"cell_type\": \"code\",\n",
91 | " \"execution_count\": null,\n",
92 | " \"metadata\": {},\n",
93 | " \"outputs\": [],\n",
94 | " \"source\": [\n",
95 | " \"import sys, os\\\\n\",\n",
96 | " \"if 'google.colab' in sys.modules:\\\\n\",\n",
97 | " \" if not os.path.isdir('Data'):\\\\n\",\n",
98 | " \" !curl https://raw.githubusercontent.com/guiwitz/NumpyPandas_course/master/colab/colab_data.sh -o colab_data.sh\\\\n\",\n",
99 | " \" !curl https://raw.githubusercontent.com/guiwitz/NumpyPandas_course/master/svg.py -o svg.py\\\\n\",\n",
100 | " \" !sh colab_data.sh\"\n",
101 | " ]\n",
102 | " },\\n\"\"\""
103 | ]
104 | },
105 | {
106 | "cell_type": "markdown",
107 | "metadata": {},
108 | "source": [
109 | "## Define where to save new notebooks (colab branch)"
110 | ]
111 | },
112 | {
113 | "cell_type": "code",
114 | "execution_count": 19,
115 | "metadata": {},
116 | "outputs": [],
117 | "source": [
118 | "newpath = '/Users/gw18g940/OneDrive - Universitaet Bern/Courses/DataAnalytics_course/DataAnalytics_course_colab\\\n",
119 | "/NumpyPandas_course/'\n"
120 | ]
121 | },
122 | {
123 | "cell_type": "markdown",
124 | "metadata": {},
125 | "source": [
126 | "## Add Google drive import and package installation to each notebook"
127 | ]
128 | },
129 | {
130 | "cell_type": "code",
131 | "execution_count": 23,
132 | "metadata": {},
133 | "outputs": [
134 | {
135 | "name": "stdout",
136 | "output_type": "stream",
137 | "text": [
138 | "/Users/gw18g940/OneDrive - Universitaet Bern/Courses/DataAnalytics_course/DataAnalytics_course/05-DA_Numpy_combining_arrays.ipynb\n",
139 | "/Users/gw18g940/OneDrive - Universitaet Bern/Courses/DataAnalytics_course/DataAnalytics_course/99-DA_Pandas_Exercises.ipynb\n",
140 | "/Users/gw18g940/OneDrive - Universitaet Bern/Courses/DataAnalytics_course/DataAnalytics_course/98-DA_Numpy_Exercises.ipynb\n",
141 | "/Users/gw18g940/OneDrive - Universitaet Bern/Courses/DataAnalytics_course/DataAnalytics_course/01-DA_Numpy_arrays_creation.ipynb\n",
142 | "/Users/gw18g940/OneDrive - Universitaet Bern/Courses/DataAnalytics_course/DataAnalytics_course/09-DA_Pandas_operations.ipynb\n",
143 | "/Users/gw18g940/OneDrive - Universitaet Bern/Courses/DataAnalytics_course/DataAnalytics_course/11-DA_Pandas_splitting.ipynb\n",
144 | "/Users/gw18g940/OneDrive - Universitaet Bern/Courses/DataAnalytics_course/DataAnalytics_course/08-DA_Pandas_import_plotting.ipynb\n",
145 | "/Users/gw18g940/OneDrive - Universitaet Bern/Courses/DataAnalytics_course/DataAnalytics_course/06-DA_Pandas_introduction.ipynb\n",
146 | "/Users/gw18g940/OneDrive - Universitaet Bern/Courses/DataAnalytics_course/DataAnalytics_course/02-DA_Numpy_array_maths.ipynb\n",
147 | "/Users/gw18g940/OneDrive - Universitaet Bern/Courses/DataAnalytics_course/DataAnalytics_course/03-DA_Numpy_matplotlib.ipynb\n",
148 | "/Users/gw18g940/OneDrive - Universitaet Bern/Courses/DataAnalytics_course/DataAnalytics_course/10-DA_Pandas_combine.ipynb\n",
149 | "/Users/gw18g940/OneDrive - Universitaet Bern/Courses/DataAnalytics_course/DataAnalytics_course/12-DA_Pandas_realworld.ipynb\n",
150 | "/Users/gw18g940/OneDrive - Universitaet Bern/Courses/DataAnalytics_course/DataAnalytics_course/99-DA_Pandas_Solutions.ipynb\n",
151 | "/Users/gw18g940/OneDrive - Universitaet Bern/Courses/DataAnalytics_course/DataAnalytics_course/04-DA_Numpy_indexing.ipynb\n",
152 | "/Users/gw18g940/OneDrive - Universitaet Bern/Courses/DataAnalytics_course/DataAnalytics_course/07-DA_Pandas_structures.ipynb\n",
153 | "/Users/gw18g940/OneDrive - Universitaet Bern/Courses/DataAnalytics_course/DataAnalytics_course/98-DA_Numpy_Solutions.ipynb\n"
154 | ]
155 | }
156 | ],
157 | "source": [
158 | "for ind, n in enumerate(notebooks_or):\n",
159 | " print(n)\n",
160 | " fh = newpath + n.split('/')[-1]\n",
161 | " counter = 0\n",
162 | "\n",
163 | "\n",
164 | " with open(fh,'w') as new_file:\n",
165 | " with open(n) as old_file:\n",
166 | " for line in old_file:\n",
167 | " if counter == 2:\n",
168 | " new_file.write(data_import)\n",
169 | " new_file.write(line)\n",
170 | " else:\n",
171 | " new_file.write(line)\n",
172 | " counter +=1\n"
173 | ]
174 | },
175 | {
176 | "cell_type": "code",
177 | "execution_count": null,
178 | "metadata": {},
179 | "outputs": [],
180 | "source": []
181 | }
182 | ],
183 | "metadata": {
184 | "kernelspec": {
185 | "display_name": "Python 3",
186 | "language": "python",
187 | "name": "python3"
188 | },
189 | "language_info": {
190 | "codemirror_mode": {
191 | "name": "ipython",
192 | "version": 3
193 | },
194 | "file_extension": ".py",
195 | "mimetype": "text/x-python",
196 | "name": "python",
197 | "nbconvert_exporter": "python",
198 | "pygments_lexer": "ipython3",
199 | "version": "3.8.2"
200 | },
201 | "toc": {
202 | "base_numbering": 1,
203 | "nav_menu": {},
204 | "number_sections": false,
205 | "sideBar": true,
206 | "skip_h1_title": false,
207 | "title_cell": "Table of Contents",
208 | "title_sidebar": "Contents",
209 | "toc_cell": false,
210 | "toc_position": {},
211 | "toc_section_display": true,
212 | "toc_window_display": true
213 | },
214 | "varInspector": {
215 | "cols": {
216 | "lenName": 16,
217 | "lenType": 16,
218 | "lenVar": 40
219 | },
220 | "kernels_config": {
221 | "python": {
222 | "delete_cmd_postfix": "",
223 | "delete_cmd_prefix": "del ",
224 | "library": "var_list.py",
225 | "varRefreshCmd": "print(var_dic_list())"
226 | },
227 | "r": {
228 | "delete_cmd_postfix": ") ",
229 | "delete_cmd_prefix": "rm(",
230 | "library": "var_list.r",
231 | "varRefreshCmd": "cat(var_dic_list()) "
232 | }
233 | },
234 | "types_to_exclude": [
235 | "module",
236 | "function",
237 | "builtin_function_or_method",
238 | "instance",
239 | "_Feature"
240 | ],
241 | "window_display": false
242 | }
243 | },
244 | "nbformat": 4,
245 | "nbformat_minor": 4
246 | }
247 |
--------------------------------------------------------------------------------
/colab/colab_data.sh:
--------------------------------------------------------------------------------
1 | git clone https://github.com/guiwitz/NumpyPandas_course.git
2 | cp -r NumpyPandas_course/Data /content
3 | rm -r NumpyPandas_course/
--------------------------------------------------------------------------------
/svg.py:
--------------------------------------------------------------------------------
1 | #This module is taken from the Dask project and can be found here:
2 | #https://github.com/dask/dask/blob/master/dask/array/svg.py
3 | #It has been slightly modified to allow for the representation of numpy arrays.
4 | #Here is the accompanying license:
5 |
6 | '''
7 | Copyright (c) 2014-2018, Anaconda, Inc. and contributors
8 | All rights reserved.
9 |
10 | Redistribution and use in source and binary forms, with or without modification,
11 | are permitted provided that the following conditions are met:
12 |
13 | Redistributions of source code must retain the above copyright notice,
14 | this list of conditions and the following disclaimer.
15 |
16 | Redistributions in binary form must reproduce the above copyright notice,
17 | this list of conditions and the following disclaimer in the documentation
18 | and/or other materials provided with the distribution.
19 |
20 | Neither the name of Anaconda nor the names of any contributors may be used to
21 | endorse or promote products derived from this software without specific prior
22 | written permission.
23 |
24 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
25 | AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
26 | IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
27 | ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
28 | LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
29 | CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
30 | SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
31 | INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
32 | CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
33 | ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
34 | THE POSSIBILITY OF SUCH DAMAGE.
35 | '''
36 |
37 | import math
38 | import re
39 |
40 | import numpy as np
41 | from IPython.display import HTML
42 |
43 | def svg(chunks, size=200, **kwargs):
44 | """ Convert chunks from Dask Array into an SVG Image
45 |
46 | Parameters
47 | ----------
48 | chunks: tuple
49 | size: int
50 | Rough size of the image
51 |
52 | Returns
53 | -------
54 | text: An svg string depicting the array as a grid of chunks
55 | """
56 | shape = tuple(map(sum, chunks))
57 | if np.isnan(shape).any(): # don't support unknown sizes
58 | raise NotImplementedError(
59 | "Can't generate SVG with unknown chunk sizes.\n\n"
60 | " A possible solution is with x.compute_chunk_sizes()"
61 | )
62 | if not all(shape):
63 | raise NotImplementedError("Can't generate SVG with 0-length dimensions")
64 | if len(chunks) == 0:
65 | raise NotImplementedError("Can't generate SVG with 0 dimensions")
66 | if len(chunks) == 1:
67 | return svg_1d(chunks, size=size, **kwargs)
68 | elif len(chunks) == 2:
69 | return svg_2d(chunks, size=size, **kwargs)
70 | elif len(chunks) == 3:
71 | return svg_3d(chunks, size=size, **kwargs)
72 | else:
73 | return svg_nd(chunks, size=size, **kwargs)
74 |
75 |
76 | text_style = 'font-size="1.0rem" font-weight="100" text-anchor="middle"'
77 |
78 |
79 | def svg_2d(chunks, offset=(0, 0), skew=(0, 0), size=200, sizes=None):
80 | shape = tuple(map(sum, chunks))
81 | sizes = sizes or draw_sizes(shape, size=size)
82 | y, x = grid_points(chunks, sizes)
83 |
84 | lines, (min_x, max_x, min_y, max_y) = svg_grid(x, y, offset=offset, skew=skew)
85 |
86 | header = (
87 | '"
91 |
92 | if shape[0] >= 100:
93 | rotate = -90
94 | else:
95 | rotate = 0
96 |
97 | text = [
98 | "",
99 | " ",
100 | ' %d'
101 | % (max_x / 2, max_y + 20, text_style, shape[1]),
102 | ' %d'
103 | % (max_x + 20, max_y / 2, text_style, rotate, max_x + 20, max_y / 2, shape[0]),
104 | ]
105 |
106 | return header + "\n".join(lines + text) + footer
107 |
108 |
109 | def svg_3d(chunks, size=200, sizes=None, offset=(0, 0)):
110 | shape = tuple(map(sum, chunks))
111 | sizes = sizes or draw_sizes(shape, size=size)
112 | x, y, z = grid_points(chunks, sizes)
113 | ox, oy = offset
114 |
115 | xy, (mnx, mxx, mny, mxy) = svg_grid(
116 | x / 1.7, y, offset=(ox + 10, oy + 0), skew=(1, 0)
117 | )
118 |
119 | zx, (_, _, _, max_x) = svg_grid(z, x / 1.7, offset=(ox + 10, oy + 0), skew=(0, 1))
120 | zy, (min_z, max_z, min_y, max_y) = svg_grid(
121 | z, y, offset=(ox + max_x + 10, oy + max_x), skew=(0, 0)
122 | )
123 |
124 | header = (
125 | '<svg width="%d" height="%d" style="stroke:rgb(0,0,0);stroke-width:1" >\n'
126 | % (max_z + 50, max_y + 50)
127 | )
128 | footer = "</svg>"
129 |
130 | if shape[1] >= 100:
131 | rotate = -90
132 | else:
133 | rotate = 0
134 |
135 | text = [
136 | "",
137 | " ",
138 | ' %d'
139 | % ((min_z + max_z) / 2, max_y + 20, text_style, shape[2]),
140 | '  <text x="%f" y="%f" %s transform="rotate(%d,%f,%f)">%d</text>'
141 | % (
142 | max_z + 20,
143 | (min_y + max_y) / 2,
144 | text_style,
145 | rotate,
146 | max_z + 20,
147 | (min_y + max_y) / 2,
148 | shape[1],
149 | ),
150 | '  <text x="%f" y="%f" %s transform="rotate(45,%f,%f)">%d</text>'
151 | % (
152 | (mnx + mxx) / 2 - 10,
153 | mxy - (mxx - mnx) / 2 + 20,
154 | text_style,
155 | (mnx + mxx) / 2 - 10,
156 | mxy - (mxx - mnx) / 2 + 20,
157 | shape[0],
158 | ),
159 | ]
160 |
161 | return header + "\n".join(xy + zx + zy + text) + footer
162 |
163 |
164 | def svg_nd(chunks, size=200):
165 | if len(chunks) % 3 == 1:
166 | chunks = ((1,),) + chunks
167 | shape = tuple(map(sum, chunks))
168 | sizes = draw_sizes(shape, size=size)
169 |
170 | chunks2 = chunks
171 | sizes2 = sizes
172 | out = []
173 | left = 0
174 | total_height = 0
175 | while chunks2:
176 | n = len(chunks2) % 3 or 3
177 | o = svg(chunks2[:n], sizes=sizes2[:n], offset=(left, 0))
178 | chunks2 = chunks2[n:]
179 | sizes2 = sizes2[n:]
180 |
181 | lines = o.split("\n")
182 | header = lines[0]
183 | height = float(re.search(r'height="(\d*\.?\d*)"', header).groups()[0])
184 | total_height = max(total_height, height)
185 | width = float(re.search(r'width="(\d*\.?\d*)"', header).groups()[0])
186 | left += width + 10
187 | o = "\n".join(lines[1:-1]) # remove header and footer
188 |
189 | out.append(o)
190 |
191 | header = (
192 | '"
196 | return header + "\n\n".join(out) + footer
197 |
198 |
199 | def svg_lines(x1, y1, x2, y2):
200 | """ Convert points into lines of text for an SVG plot
201 |
202 | Examples
203 | --------
204 | >>> svg_lines([0, 1], [0, 0], [10, 11], [1, 1]) # doctest: +NORMALIZE_WHITESPACE
205 | ['  <line x1="0" y1="0" x2="10" y2="1" style="stroke-width:2" />',
206 | '  <line x1="1" y1="0" x2="11" y2="1" style="stroke-width:2" />']
207 | """
208 | n = len(x1)
209 | lines = [
210 | '  <line x1="%d" y1="%d" x2="%d" y2="%d" />' % (x1[i], y1[i], x2[i], y2[i])
211 | for i in range(n)
212 | ]
213 |
214 | lines[0] = lines[0].replace(" /", ' style="stroke-width:2" /')
215 | lines[-1] = lines[-1].replace(" /", ' style="stroke-width:2" /')
216 | return lines
217 |
218 |
219 | def svg_grid(x, y, offset=(0, 0), skew=(0, 0)):
220 | """ Create lines of SVG text that show a grid
221 |
222 | Parameters
223 | ----------
224 | x: numpy.ndarray
225 | y: numpy.ndarray
226 | offset: tuple
227 | translational displacement of the grid in SVG coordinates
228 | skew: tuple
229 | """
230 | # Horizontal lines
231 | x1 = np.zeros_like(y) + offset[0]
232 | y1 = y + offset[1]
233 | x2 = np.full_like(y, x[-1]) + offset[0]
234 | y2 = y + offset[1]
235 |
236 | if skew[0]:
237 | y2 += x.max() * skew[0]
238 | if skew[1]:
239 | x1 += skew[1] * y
240 | x2 += skew[1] * y
241 |
242 | min_x = min(x1.min(), x2.min())
243 | min_y = min(y1.min(), y2.min())
244 | max_x = max(x1.max(), x2.max())
245 | max_y = max(y1.max(), y2.max())
246 |
247 | h_lines = ["", " "] + svg_lines(x1, y1, x2, y2)
248 |
249 | # Vertical lines
250 | x1 = x + offset[0]
251 | y1 = np.zeros_like(x) + offset[1]
252 | x2 = x + offset[0]
253 | y2 = np.full_like(x, y[-1]) + offset[1]
254 |
255 | if skew[0]:
256 | y1 += skew[0] * x
257 | y2 += skew[0] * x
258 | if skew[1]:
259 | x2 += skew[1] * y.max()
260 |
261 | v_lines = ["", " "] + svg_lines(x1, y1, x2, y2)
262 |
263 | rect = [
264 | "",
265 | " ",
266 | ' '
267 | % (x1[0], y1[0], x1[-1], y1[-1], x2[-1], y2[-1], x2[0], y2[0]),
268 | ]
269 |
270 | return h_lines + v_lines + rect, (min_x, max_x, min_y, max_y)
271 |
272 |
273 | def svg_1d(chunks, sizes=None, **kwargs):
274 | return svg_2d(((1,),) + chunks, **kwargs)
275 |
276 |
277 | def grid_points(chunks, sizes):
278 | cumchunks = [np.cumsum((0,) + c) for c in chunks]
279 | points = [x * size / x[-1] for x, size in zip(cumchunks, sizes)]
280 | return points
281 |
282 |
283 | def draw_sizes(shape, size=200):
284 | """ Get size in pixels for all dimensions """
285 | mx = max(shape)
286 | ratios = [mx / max(0.1, d) for d in shape]
287 | ratios = [ratio_response(r) for r in ratios]
288 | return tuple(size / r for r in ratios)
289 |
290 |
291 | def ratio_response(x):
292 | """ How we display actual size ratios
293 |
294 | Common ratios in sizes span several orders of magnitude,
295 | which is hard for us to perceive.
296 |
297 | We keep ratios in the 1-3 range accurate, and then apply a logarithm to
298 | values up until about 100 or so, at which point we stop scaling.
299 | """
300 | if x < math.e:
301 | return x
302 | elif x <= 100:
303 | return math.log(x + 12.4) # f(e) == e
304 | else:
305 | return math.log(100 + 12.4)
306 |
307 | def numpy_to_svg(array):
308 |
309 | return HTML(svg(tuple((tuple(np.ones(x)) for x in array.shape))))
310 |
311 |
312 |
313 |
--------------------------------------------------------------------------------