├── .github
└── FUNDING.yml
├── 1. Environment setup.ipynb
├── 2. First steps with data frames.ipynb
├── 3. Working with text files.ipynb
├── 4. Grouping data frames.ipynb
├── 5. Collecting experiments data in a data frame.ipynb
├── 6. Next steps.ipynb
├── LICENSE
├── Manifest.toml
├── Project.toml
├── README.md
└── rainfall_forecast.csv
/.github/FUNDING.yml:
--------------------------------------------------------------------------------
1 | # These are supported funding model platforms
2 |
3 | github: [JuliaLang]
4 | open_collective: julialang
5 |
--------------------------------------------------------------------------------
/1. Environment setup.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Environment setup for data frames tutorial\n",
8 | "\n",
9 | "## Bogumił Kamiński"
10 | ]
11 | },
12 | {
13 | "cell_type": "markdown",
14 | "metadata": {},
15 | "source": [
16 | "Welcome to the DataFrames.jl introduction!\n",
17 | "\n",
18 | "This set of Jupyter notebooks is intended to give you an overview of the functionality DataFrames.jl offers, based on practical examples.\n",
19 | "\n",
20 | "You can find task-oriented overviews of DataFrames.jl functionality (rather than exercise-based ones like this tutorial) in the following locations:\n",
21 | "* an official manual at https://juliadata.github.io/DataFrames.jl/stable/\n",
22 | "* a tutorial going through all functionalities of DataFrames.jl at https://github.com/bkamins/Julia-DataFrames-Tutorial\n",
23 | "\n",
24 | "We also assume that you have a basic knowledge of the Julia language and the Julia ecosystem. There are great tutorials on this topic in [JuliaAcademy](https://juliaacademy.com/), so I encourage you to check them out.\n",
25 | "\n",
26 | "As this is a hands-on tutorial, you can expect the examples to be implemented the way I would write them in an actual project."
27 | ]
28 | },
29 | {
30 | "cell_type": "markdown",
31 | "metadata": {},
32 | "source": [
33 | "The current version of the notebooks requires Julia 1.6+ and was run under Julia 1.9.0. If you have a different version of Julia installed, change the kernel using the *Kernel/Change kernel* menu option (all examples should work on any Julia 1.6+ version)."
34 | ]
35 | },
36 | {
37 | "cell_type": "code",
38 | "execution_count": 1,
39 | "metadata": {},
40 | "outputs": [
41 | {
42 | "data": {
43 | "text/plain": [
44 | "v\"1.9.0\""
45 | ]
46 | },
47 | "execution_count": 1,
48 | "metadata": {},
49 | "output_type": "execute_result"
50 | }
51 | ],
52 | "source": [
53 | "VERSION"
54 | ]
55 | },
56 | {
57 | "cell_type": "markdown",
58 | "metadata": {},
59 | "source": [
60 | "Jupyter Notebook automatically activates the project environment if it is found in the working directory.\n",
61 | "\n",
62 | "So first let us check if we have Project.toml and Manifest.toml files present (they should be present if you cloned the repository of this tutorial)."
63 | ]
64 | },
65 | {
66 | "cell_type": "code",
67 | "execution_count": 2,
68 | "metadata": {},
69 | "outputs": [
70 | {
71 | "data": {
72 | "text/plain": [
73 | "2-element BitVector:\n",
74 | " 1\n",
75 | " 1"
76 | ]
77 | },
78 | "execution_count": 2,
79 | "metadata": {},
80 | "output_type": "execute_result"
81 | }
82 | ],
83 | "source": [
84 | "isfile.([\"Project.toml\", \"Manifest.toml\"])"
85 | ]
86 | },
87 | {
88 | "cell_type": "markdown",
89 | "metadata": {},
90 | "source": [
91 | "You should get `1` printed (meaning `true`) in both entries of the vector.\n",
92 | "\n",
93 | "Now we are sure that you are going to use exactly the same versions of the packages that I use when running this tutorial.\n",
94 | "\n",
95 | "Let us check what packages (and in what versions) we will use."
96 | ]
97 | },
98 | {
99 | "cell_type": "code",
100 | "execution_count": 3,
101 | "metadata": {},
102 | "outputs": [
103 | {
104 | "name": "stdout",
105 | "output_type": "stream",
106 | "text": [
107 | "\u001b[32m\u001b[1mStatus\u001b[22m\u001b[39m `~/JuliaAcademy-DataFrames/Project.toml`\n",
108 | " \u001b[90m[69666777] \u001b[39mArrow v2.4.3\n",
109 | " \u001b[90m[336ed68f] \u001b[39mCSV v0.10.9\n",
110 | " \u001b[90m[a93c6f00] \u001b[39mDataFrames v1.5.0\n",
111 | " \u001b[90m[da1fdf0e] \u001b[39mFreqTables v0.4.5\n",
112 | " \u001b[90m[38e38edf] \u001b[39mGLM v1.8.1\n",
113 | " \u001b[90m[b98c9c47] \u001b[39mPipe v1.3.0\n",
114 | " \u001b[90m[d330b81b] \u001b[39mPyPlot v2.11.0\n",
115 | " \u001b[90m[1986cc42] \u001b[39mUnitful v1.12.3\n"
116 | ]
117 | }
118 | ],
119 | "source": [
120 | "] status"
121 | ]
122 | },
123 | {
124 | "cell_type": "markdown",
125 | "metadata": {},
126 | "source": [
127 | "These notebooks should work with DataFrames.jl versions 0.22, 1.2, 1.3, 1.4 and 1.5 (but note that current version of Manifest.toml assumes version 1.5)."
128 | ]
129 | },
130 | {
131 | "cell_type": "markdown",
132 | "metadata": {},
133 | "source": [
134 | "If you are running the notebooks in Jupyter then the project environment specified by the Project.toml and Manifest.toml files is activated automatically.\n",
135 | "\n",
136 | "If you are using another way to run these notebooks (e.g. in VS Code), the project environment might not get activated automatically. To be sure, start Julia within this project folder and run:"
137 | ]
138 | },
139 | {
140 | "cell_type": "code",
141 | "execution_count": 4,
142 | "metadata": {},
143 | "outputs": [
144 | {
145 | "name": "stderr",
146 | "output_type": "stream",
147 | "text": [
148 | "\u001b[32m\u001b[1m Activating\u001b[22m\u001b[39m project at `~/JuliaAcademy-DataFrames`\n"
149 | ]
150 | }
151 | ],
152 | "source": [
153 | "] activate ."
154 | ]
155 | },
156 | {
157 | "cell_type": "markdown",
158 | "metadata": {},
159 | "source": [
160 | "If the correct environment is not activated you might face some unexpected issues with the packages. More details about Julia environments can be found [here](https://pkgdocs.julialang.org/v1/environments)."
161 | ]
162 | },
163 | {
164 | "cell_type": "markdown",
165 | "metadata": {},
166 | "source": [
167 | "If checking the status of the packages gives a warning that some of the packages are not downloaded run the `instantiate` instruction from the following line."
168 | ]
169 | },
170 | {
171 | "cell_type": "code",
172 | "execution_count": 5,
173 | "metadata": {
174 | "scrolled": true
175 | },
176 | "outputs": [],
177 | "source": [
178 | "] instantiate"
179 | ]
180 | },
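{
"cell_type": "markdown",
"metadata": {},
"source": [
"The package-manager commands used above (`] status`, `] activate .`, `] instantiate`) also have function equivalents in the `Pkg` standard library. Below is a minimal sketch, assuming you prefer to run them from a script; the helper name `setup_environment` is hypothetical and not part of this tutorial:\n",
"\n",
"```julia\n",
"using Pkg\n",
"\n",
"# Hypothetical helper: activate and install the environment stored in `dir`.\n",
"function setup_environment(dir::AbstractString=pwd())\n",
"    Pkg.activate(dir)             # same effect as `] activate .` run in `dir`\n",
"    Pkg.instantiate()             # same effect as `] instantiate`\n",
"    Pkg.status()                  # same effect as `] status`\n",
"    return Base.active_project()  # path of the Project.toml now in use\n",
"end\n",
"```\n",
"\n",
"Calling `setup_environment()` from the folder of this tutorial should report the same package versions as listed above."
]
},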
181 | {
182 | "cell_type": "markdown",
183 | "metadata": {},
184 | "source": [
185 | "\n",
186 | "**PyPlot.jl configuration:**\n",
187 | "In some environments automatic installation of PyPlot.jl might fail. If you encounter this issue, please refer to the PyPlot.jl installation instructions.\n",
188 | "\n",
189 | "\n",
190 | "In particular typically executing the following commands:\n",
191 | "\n",
192 | "```\n",
193 | "using Pkg\n",
194 | "ENV[\"PYTHON\"]=\"\"\n",
195 | "Pkg.build(\"PyCall\")\n",
196 | "```\n",
197 | "\n",
198 | "should resolve the PyPlot.jl installation issues. However, on OS X sometimes more configuration steps are required. You can find the detailed instructions [here](https://github.com/JuliaPy/PyPlot.jl#os-x)."
199 | ]
200 | },
201 | {
202 | "cell_type": "markdown",
203 | "metadata": {},
204 | "source": [
205 | "As you see we will use the following packages:\n",
206 | "\n",
207 | "Package | Description\n",
208 | ":-|:-\n",
209 | "DataFrames.jl | the core package that is the subject of this tutorial; it is used for data manipulation; we use version 1.5.0 of this package\n",
210 | "CSV.jl | a package for reading/writing of CSV files\n",
211 | "FreqTables.jl | a very useful package for creating frequency tables\n",
212 | "GLM.jl | a package for fitting Generalized Linear Models (as no data science tutorial would be complete without building some predictive model)\n",
213 | "PyPlot.jl | a package for plotting; there are many options in the Julia ecosystem to choose from; in this tutorial we use PyPlot.jl as it is based on Matplotlib so if you have experience with the Python data science technology stack it should be familiar\n",
214 | "Pipe.jl | a package that makes chaining of operations super powerful (which is something you probably know from `%>%` in R)\n",
215 | "Arrow.jl | a package for working with data in Apache Arrow format\n",
216 | "Unitful.jl | a package for working with physical units (like kg, cm, ...)"
217 | ]
218 | }
219 | ],
220 | "metadata": {
221 | "@webio": {
222 | "lastCommId": null,
223 | "lastKernelId": null
224 | },
225 | "kernelspec": {
226 | "display_name": "Julia 1.9.0",
227 | "language": "julia",
228 | "name": "julia-1.9"
229 | },
230 | "language_info": {
231 | "file_extension": ".jl",
232 | "mimetype": "application/julia",
233 | "name": "julia",
234 | "version": "1.9.0"
235 | }
236 | },
237 | "nbformat": 4,
238 | "nbformat_minor": 4
239 | }
240 |
--------------------------------------------------------------------------------
/5. Collecting experiments data in a data frame.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Collecting experiments data in a data frame\n",
8 | "\n",
9 | "### Bogumił Kamiński"
10 | ]
11 | },
12 | {
13 | "cell_type": "code",
14 | "execution_count": 1,
15 | "metadata": {},
16 | "outputs": [
17 | {
18 | "name": "stderr",
19 | "output_type": "stream",
20 | "text": [
21 | "\u001b[32m\u001b[1m Activating\u001b[22m\u001b[39m project at `~/JuliaAcademy-DataFrames`\n"
22 | ]
23 | }
24 | ],
25 | "source": [
26 | "] activate ."
27 | ]
28 | },
29 | {
30 | "cell_type": "code",
31 | "execution_count": 2,
32 | "metadata": {},
33 | "outputs": [],
34 | "source": [
35 | "using DataFrames"
36 | ]
37 | },
38 | {
39 | "cell_type": "code",
40 | "execution_count": 3,
41 | "metadata": {},
42 | "outputs": [],
43 | "source": [
44 | "using Statistics"
45 | ]
46 | },
47 | {
48 | "cell_type": "code",
49 | "execution_count": 4,
50 | "metadata": {},
51 | "outputs": [],
52 | "source": [
53 | "using PyPlot"
54 | ]
55 | },
56 | {
57 | "cell_type": "markdown",
58 | "metadata": {},
59 | "source": [
60 | "\n",
61 | "**PyPlot.jl configuration:**\n",
62 | "In some environments automatic installation of PyPlot.jl might fail. If you encounter this issue, please refer to the PyPlot.jl installation instructions.\n",
63 | "\n",
64 | "\n",
65 | "In particular typically executing the following commands:\n",
66 | "\n",
67 | "```\n",
68 | "using Pkg\n",
69 | "ENV[\"PYTHON\"]=\"\"\n",
70 | "Pkg.build(\"PyCall\")\n",
71 | "```\n",
72 | "\n",
73 | "should resolve the PyPlot.jl installation issues. However, on OS X sometimes more configuration steps are required. You can find the detailed instructions [here](https://github.com/JuliaPy/PyPlot.jl#os-x)."
74 | ]
75 | },
76 | {
77 | "cell_type": "code",
78 | "execution_count": 5,
79 | "metadata": {},
80 | "outputs": [],
81 | "source": [
82 | "using Random"
83 | ]
84 | },
85 | {
86 | "cell_type": "code",
87 | "execution_count": 6,
88 | "metadata": {},
89 | "outputs": [],
90 | "source": [
91 | "using Pipe"
92 | ]
93 | },
94 | {
95 | "cell_type": "markdown",
96 | "metadata": {},
97 | "source": [
98 | "In this part we will run a simple Monte Carlo simulation to show examples of how one can work with data frames."
99 | ]
100 | },
101 | {
102 | "cell_type": "markdown",
103 | "metadata": {},
104 | "source": [
105 | "Consider the following puzzle.\n",
106 | "\n",
107 | "We draw independent random numbers from the $U(0,1)$ distribution. On average, how many draws do we need until the sum of these numbers exceeds $1$?"
108 | ]
109 | },
110 | {
111 | "cell_type": "markdown",
112 | "metadata": {},
113 | "source": [
114 | "Here is the code that runs this experiment once. For tutorial reasons we keep all the generated random numbers and recalculate their sum in each iteration (you can try to improve the efficiency of this code as an exercise)."
115 | ]
116 | },
117 | {
118 | "cell_type": "code",
119 | "execution_count": 7,
120 | "metadata": {},
121 | "outputs": [
122 | {
123 | "data": {
124 | "text/plain": [
125 | "sim_e (generic function with 1 method)"
126 | ]
127 | },
128 | "execution_count": 7,
129 | "metadata": {},
130 | "output_type": "execute_result"
131 | }
132 | ],
133 | "source": [
134 | "function sim_e()\n",
135 | " draw = Float64[]\n",
136 | " while true\n",
137 | " push!(draw, rand())\n",
138 | " sum(draw) > 1.0 && return draw\n",
139 | " end\n",
140 | "end"
141 | ]
142 | },
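{
"cell_type": "markdown",
"metadata": {},
"source": [
"One possible answer to the efficiency exercise mentioned above is to keep a running total instead of re-summing the vector in every iteration. This is only a sketch; the name `sim_e_running` is hypothetical and the rest of the notebook keeps using `sim_e`:\n",
"\n",
"```julia\n",
"using Random\n",
"\n",
"# Hypothetical variant of sim_e: same returned vector of draws,\n",
"# but the sum is updated incrementally instead of recomputed.\n",
"function sim_e_running()\n",
"    draw = Float64[]\n",
"    total = 0.0\n",
"    while true\n",
"        x = rand()\n",
"        push!(draw, x)\n",
"        total += x\n",
"        total > 1.0 && return draw\n",
"    end\n",
"end\n",
"\n",
"Random.seed!(1234)\n",
"d = sim_e_running()\n",
"```\n",
"\n",
"The returned vector has the same property as before: its sum exceeds $1$, while the sum without the last draw does not."
]
},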
143 | {
144 | "cell_type": "code",
145 | "execution_count": 8,
146 | "metadata": {},
147 | "outputs": [],
148 | "source": [
149 | "Random.seed!(1234); # just to make sure we get the same results if we are on the same version of Julia"
150 | ]
151 | },
152 | {
153 | "cell_type": "markdown",
154 | "metadata": {},
155 | "source": [
156 | "Let us run our simulation several times:"
157 | ]
158 | },
159 | {
160 | "cell_type": "code",
161 | "execution_count": 9,
162 | "metadata": {},
163 | "outputs": [
164 | {
165 | "data": {
166 | "text/plain": [
167 | "5-element Vector{Vector{Float64}}:\n",
168 | " [0.32597672886359486, 0.5490511363155669, 0.21858665481883066]\n",
169 | " [0.8942454282009883, 0.35311164439921205]\n",
170 | " [0.39425536741585077, 0.9531246272848422]\n",
171 | " [0.7955469475347194, 0.4942498668904206]\n",
172 | " [0.7484150218874741, 0.5782319465613976]"
173 | ]
174 | },
175 | "execution_count": 9,
176 | "metadata": {},
177 | "output_type": "execute_result"
178 | }
179 | ],
180 | "source": [
181 | "res = [sim_e() for _ in 1:5]"
182 | ]
183 | },
184 | {
185 | "cell_type": "markdown",
186 | "metadata": {},
187 | "source": [
188 | "and check that each time we finished just when we exceeded $1$:"
189 | ]
190 | },
191 | {
192 | "cell_type": "code",
193 | "execution_count": 10,
194 | "metadata": {},
195 | "outputs": [
196 | {
197 | "data": {
198 | "text/plain": [
199 | "5-element Vector{Float64}:\n",
200 | " 1.0936145199979923\n",
201 | " 1.2473570726002003\n",
202 | " 1.347379994700693\n",
203 | " 1.2897968144251402\n",
204 | " 1.3266469684488718"
205 | ]
206 | },
207 | "execution_count": 10,
208 | "metadata": {},
209 | "output_type": "execute_result"
210 | }
211 | ],
212 | "source": [
213 | "sum.(res)"
214 | ]
215 | },
216 | {
217 | "cell_type": "code",
218 | "execution_count": 11,
219 | "metadata": {},
220 | "outputs": [
221 | {
222 | "data": {
223 | "text/plain": [
224 | "5-element Vector{Float64}:\n",
225 | " 0.8750278651791616\n",
226 | " 0.8942454282009883\n",
227 | " 0.3942553674158509\n",
228 | " 0.7955469475347196\n",
229 | " 0.7484150218874741"
230 | ]
231 | },
232 | "execution_count": 11,
233 | "metadata": {},
234 | "output_type": "execute_result"
235 | }
236 | ],
237 | "source": [
238 | "@. sum(res) - last(res)"
239 | ]
240 | },
241 | {
242 | "cell_type": "markdown",
243 | "metadata": {},
244 | "source": [
245 | "All looks good so far! (and as a bonus we have just made a small exercise in broadcasting)"
246 | ]
247 | },
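{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `@.` macro used above adds a dot to every function call and operator in the expression. Here is a small sketch on made-up data (the vectors in `res_toy` are illustrative, not simulation output) showing three equivalent spellings:\n",
"\n",
"```julia\n",
"res_toy = [[0.6, 0.7], [0.2, 0.3, 0.9]]   # made-up data\n",
"\n",
"a = @. sum(res_toy) - last(res_toy)        # macro form used in the notebook\n",
"b = sum.(res_toy) .- last.(res_toy)        # explicit dots\n",
"c = [sum(r) - last(r) for r in res_toy]    # comprehension\n",
"```\n",
"\n",
"All three produce the same vector: the sum of each experiment before its last draw."
]
},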
248 | {
249 | "cell_type": "markdown",
250 | "metadata": {},
251 | "source": [
252 | "Now let us populate a data frame with the results of our experiments:"
253 | ]
254 | },
255 | {
256 | "cell_type": "code",
257 | "execution_count": 12,
258 | "metadata": {},
259 | "outputs": [
260 | {
261 | "name": "stdout",
262 | "output_type": "stream",
263 | "text": [
264 | " 13.492488 seconds (120.34 M allocations: 3.966 GiB, 50.07% gc time, 2.38% compilation time)\n"
265 | ]
266 | }
267 | ],
268 | "source": [
269 | "df = DataFrame()\n",
270 | "\n",
271 | "@time for i in 1:10^7\n",
272 | " push!(df, (id=i, pos=sim_e()))\n",
273 | "end"
274 | ]
275 | },
276 | {
277 | "cell_type": "markdown",
278 | "metadata": {},
279 | "source": [
280 | "As you can see, the process was quite fast: `push!`-ing $10^7$ rows took about 13.5 seconds, roughly half of which was garbage collection, so `push!`-ing data to a `DataFrame` is efficient."
281 | ]
282 | },
283 | {
284 | "cell_type": "code",
285 | "execution_count": 13,
286 | "metadata": {},
287 | "outputs": [
288 | {
289 | "data": {
290 | "text/html": [
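291 | "<div><p>10000000×2 DataFrame</p><p>9999975 rows omitted</p><table><thead><tr><th>Row</th><th>id</th><th>pos</th></tr></thead><tbody><tr><td>1</td><td>1</td><td>[0.727935, 0.00744801, 0.199377, 0.439243]</td></tr><tr><td>2</td><td>2</td><td>[0.682533, 0.956741]</td></tr><tr><td>3</td><td>3</td><td>[0.647855, 0.996665]</td></tr><tr><td>4</td><td>4</td><td>[0.749194, 0.110084, 0.491383]</td></tr><tr><td>5</td><td>5</td><td>[0.565145, 0.253812, 0.626794]</td></tr><tr><td>6</td><td>6</td><td>[0.234105, 0.124792, 0.609875, 0.672793]</td></tr><tr><td>7</td><td>7</td><td>[0.761916, 0.588872]</td></tr><tr><td>8</td><td>8</td><td>[0.365854, 0.131026, 0.946453]</td></tr><tr><td>9</td><td>9</td><td>[0.574323, 0.67765]</td></tr><tr><td>10</td><td>10</td><td>[0.571586, 0.0727161, 0.701116]</td></tr><tr><td>11</td><td>11</td><td>[0.0952175, 0.845515, 0.348995]</td></tr><tr><td>12</td><td>12</td><td>[0.768308, 0.26906]</td></tr><tr><td>13</td><td>13</td><td>[0.539631, 0.293905, 0.242195]</td></tr><tr><td>⋮</td><td>⋮</td><td>⋮</td></tr><tr><td>9999989</td><td>9999989</td><td>[0.383955, 0.70686]</td></tr><tr><td>9999990</td><td>9999990</td><td>[0.193813, 0.649544, 0.539589]</td></tr><tr><td>9999991</td><td>9999991</td><td>[0.382258, 0.901623]</td></tr><tr><td>9999992</td><td>9999992</td><td>[0.640595, 0.916399]</td></tr><tr><td>9999993</td><td>9999993</td><td>[0.963508, 0.301685]</td></tr><tr><td>9999994</td><td>9999994</td><td>[0.846778, 0.328575]</td></tr><tr><td>9999995</td><td>9999995</td><td>[0.251749, 0.28344, 0.16416, 0.618446]</td></tr><tr><td>9999996</td><td>9999996</td><td>[0.383316, 0.352557, 0.470778]</td></tr><tr><td>9999997</td><td>9999997</td><td>[0.362197, 0.639499]</td></tr><tr><td>9999998</td><td>9999998</td><td>[0.046897, 0.552855, 0.172669, 0.156109, 0.799254]</td></tr><tr><td>9999999</td><td>9999999</td><td>[0.815608, 0.663247]</td></tr><tr><td>10000000</td><td>10000000</td><td>[0.61177, 0.5798]</td></tr></tbody></table></div>"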
292 | ],
293 | "text/latex": [
294 | "\\begin{tabular}{r|cc}\n",
295 | "\t& id & pos\\\\\n",
296 | "\t\\hline\n",
297 | "\t& Int64 & Array…\\\\\n",
298 | "\t\\hline\n",
299 | "\t1 & 1 & [0.727935, 0.00744801, 0.199377, 0.439243] \\\\\n",
300 | "\t2 & 2 & [0.682533, 0.956741] \\\\\n",
301 | "\t3 & 3 & [0.647855, 0.996665] \\\\\n",
302 | "\t4 & 4 & [0.749194, 0.110084, 0.491383] \\\\\n",
303 | "\t5 & 5 & [0.565145, 0.253812, 0.626794] \\\\\n",
304 | "\t6 & 6 & [0.234105, 0.124792, 0.609875, 0.672793] \\\\\n",
305 | "\t7 & 7 & [0.761916, 0.588872] \\\\\n",
306 | "\t8 & 8 & [0.365854, 0.131026, 0.946453] \\\\\n",
307 | "\t9 & 9 & [0.574323, 0.67765] \\\\\n",
308 | "\t10 & 10 & [0.571586, 0.0727161, 0.701116] \\\\\n",
309 | "\t11 & 11 & [0.0952175, 0.845515, 0.348995] \\\\\n",
310 | "\t12 & 12 & [0.768308, 0.26906] \\\\\n",
311 | "\t13 & 13 & [0.539631, 0.293905, 0.242195] \\\\\n",
312 | "\t14 & 14 & [0.97807, 0.853242] \\\\\n",
313 | "\t15 & 15 & [0.706065, 0.442139] \\\\\n",
314 | "\t16 & 16 & [0.884577, 0.520741] \\\\\n",
315 | "\t17 & 17 & [0.941831, 0.810699] \\\\\n",
316 | "\t18 & 18 & [0.464849, 0.977012] \\\\\n",
317 | "\t19 & 19 & [0.500161, 0.715846] \\\\\n",
318 | "\t20 & 20 & [0.18985, 0.376441, 0.565864] \\\\\n",
319 | "\t21 & 21 & [0.172236, 0.327351, 0.632108] \\\\\n",
320 | "\t22 & 22 & [0.374141, 0.728543] \\\\\n",
321 | "\t23 & 23 & [0.00277646, 0.134685, 0.531899, 0.0731709, 0.501756] \\\\\n",
322 | "\t24 & 24 & [0.906939, 0.116769] \\\\\n",
323 | "\t25 & 25 & [0.367198, 0.881163] \\\\\n",
324 | "\t26 & 26 & [0.67189, 0.479862] \\\\\n",
325 | "\t27 & 27 & [0.949968, 0.0262331, 0.347189] \\\\\n",
326 | "\t28 & 28 & [0.819254, 0.792831] \\\\\n",
327 | "\t29 & 29 & [0.96723, 0.471452] \\\\\n",
328 | "\t30 & 30 & [0.19378, 0.941781] \\\\\n",
329 | "\t$\\dots$ & $\\dots$ & $\\dots$ \\\\\n",
330 | "\\end{tabular}\n"
331 | ],
332 | "text/plain": [
333 | "\u001b[1m10000000×2 DataFrame\u001b[0m\n",
334 | "\u001b[1m Row \u001b[0m│\u001b[1m id \u001b[0m\u001b[1m pos \u001b[0m\n",
335 | " │\u001b[90m Int64 \u001b[0m\u001b[90m Array… \u001b[0m\n",
336 | "──────────┼─────────────────────────────────────────────\n",
337 | " 1 │ 1 [0.727935, 0.00744801, 0.199377,…\n",
338 | " 2 │ 2 [0.682533, 0.956741]\n",
339 | " 3 │ 3 [0.647855, 0.996665]\n",
340 | " 4 │ 4 [0.749194, 0.110084, 0.491383]\n",
341 | " 5 │ 5 [0.565145, 0.253812, 0.626794]\n",
342 | " 6 │ 6 [0.234105, 0.124792, 0.609875, 0…\n",
343 | " 7 │ 7 [0.761916, 0.588872]\n",
344 | " 8 │ 8 [0.365854, 0.131026, 0.946453]\n",
345 | " 9 │ 9 [0.574323, 0.67765]\n",
346 | " 10 │ 10 [0.571586, 0.0727161, 0.701116]\n",
347 | " 11 │ 11 [0.0952175, 0.845515, 0.348995]\n",
348 | " ⋮ │ ⋮ ⋮\n",
349 | " 9999991 │ 9999991 [0.382258, 0.901623]\n",
350 | " 9999992 │ 9999992 [0.640595, 0.916399]\n",
351 | " 9999993 │ 9999993 [0.963508, 0.301685]\n",
352 | " 9999994 │ 9999994 [0.846778, 0.328575]\n",
353 | " 9999995 │ 9999995 [0.251749, 0.28344, 0.16416, 0.6…\n",
354 | " 9999996 │ 9999996 [0.383316, 0.352557, 0.470778]\n",
355 | " 9999997 │ 9999997 [0.362197, 0.639499]\n",
356 | " 9999998 │ 9999998 [0.046897, 0.552855, 0.172669, 0…\n",
357 | " 9999999 │ 9999999 [0.815608, 0.663247]\n",
358 | " 10000000 │ 10000000 [0.61177, 0.5798]\n",
359 | "\u001b[36m 9999979 rows omitted\u001b[0m"
360 | ]
361 | },
362 | "execution_count": 13,
363 | "metadata": {},
364 | "output_type": "execute_result"
365 | }
366 | ],
367 | "source": [
368 | "df"
369 | ]
370 | },
371 | {
372 | "cell_type": "markdown",
373 | "metadata": {},
374 | "source": [
375 | "Let us count the number of jumps we have made in each experiment using the `transform!` function:"
376 | ]
377 | },
378 | {
379 | "cell_type": "code",
380 | "execution_count": 14,
381 | "metadata": {
382 | "scrolled": false
383 | },
384 | "outputs": [
385 | {
386 | "data": {
387 | "text/html": [
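388 | "<div><p>10000000×3 DataFrame</p><p>9999975 rows omitted</p><table><thead><tr><th>Row</th><th>id</th><th>pos</th><th>jumps</th></tr></thead><tbody><tr><td>1</td><td>1</td><td>[0.727935, 0.00744801, 0.199377, 0.439243]</td><td>4</td></tr><tr><td>2</td><td>2</td><td>[0.682533, 0.956741]</td><td>2</td></tr><tr><td>3</td><td>3</td><td>[0.647855, 0.996665]</td><td>2</td></tr><tr><td>4</td><td>4</td><td>[0.749194, 0.110084, 0.491383]</td><td>3</td></tr><tr><td>5</td><td>5</td><td>[0.565145, 0.253812, 0.626794]</td><td>3</td></tr><tr><td>6</td><td>6</td><td>[0.234105, 0.124792, 0.609875, 0.672793]</td><td>4</td></tr><tr><td>7</td><td>7</td><td>[0.761916, 0.588872]</td><td>2</td></tr><tr><td>8</td><td>8</td><td>[0.365854, 0.131026, 0.946453]</td><td>3</td></tr><tr><td>9</td><td>9</td><td>[0.574323, 0.67765]</td><td>2</td></tr><tr><td>10</td><td>10</td><td>[0.571586, 0.0727161, 0.701116]</td><td>3</td></tr><tr><td>11</td><td>11</td><td>[0.0952175, 0.845515, 0.348995]</td><td>3</td></tr><tr><td>12</td><td>12</td><td>[0.768308, 0.26906]</td><td>2</td></tr><tr><td>13</td><td>13</td><td>[0.539631, 0.293905, 0.242195]</td><td>3</td></tr><tr><td>⋮</td><td>⋮</td><td>⋮</td><td>⋮</td></tr><tr><td>9999989</td><td>9999989</td><td>[0.383955, 0.70686]</td><td>2</td></tr><tr><td>9999990</td><td>9999990</td><td>[0.193813, 0.649544, 0.539589]</td><td>3</td></tr><tr><td>9999991</td><td>9999991</td><td>[0.382258, 0.901623]</td><td>2</td></tr><tr><td>9999992</td><td>9999992</td><td>[0.640595, 0.916399]</td><td>2</td></tr><tr><td>9999993</td><td>9999993</td><td>[0.963508, 0.301685]</td><td>2</td></tr><tr><td>9999994</td><td>9999994</td><td>[0.846778, 0.328575]</td><td>2</td></tr><tr><td>9999995</td><td>9999995</td><td>[0.251749, 0.28344, 0.16416, 0.618446]</td><td>4</td></tr><tr><td>9999996</td><td>9999996</td><td>[0.383316, 0.352557, 0.470778]</td><td>3</td></tr><tr><td>9999997</td><td>9999997</td><td>[0.362197, 0.639499]</td><td>2</td></tr><tr><td>9999998</td><td>9999998</td><td>[0.046897, 0.552855, 0.172669, 0.156109, 0.799254]</td><td>5</td></tr><tr><td>9999999</td><td>9999999</td><td>[0.815608, 0.663247]</td><td>2</td></tr><tr><td>10000000</td><td>10000000</td><td>[0.61177, 0.5798]</td><td>2</td></tr></tbody></table></div>"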
389 | ],
390 | "text/latex": [
391 | "\\begin{tabular}{r|ccc}\n",
392 | "\t& id & pos & jumps\\\\\n",
393 | "\t\\hline\n",
394 | "\t& Int64 & Array… & Int64\\\\\n",
395 | "\t\\hline\n",
396 | "\t1 & 1 & [0.727935, 0.00744801, 0.199377, 0.439243] & 4 \\\\\n",
397 | "\t2 & 2 & [0.682533, 0.956741] & 2 \\\\\n",
398 | "\t3 & 3 & [0.647855, 0.996665] & 2 \\\\\n",
399 | "\t4 & 4 & [0.749194, 0.110084, 0.491383] & 3 \\\\\n",
400 | "\t5 & 5 & [0.565145, 0.253812, 0.626794] & 3 \\\\\n",
401 | "\t6 & 6 & [0.234105, 0.124792, 0.609875, 0.672793] & 4 \\\\\n",
402 | "\t7 & 7 & [0.761916, 0.588872] & 2 \\\\\n",
403 | "\t8 & 8 & [0.365854, 0.131026, 0.946453] & 3 \\\\\n",
404 | "\t9 & 9 & [0.574323, 0.67765] & 2 \\\\\n",
405 | "\t10 & 10 & [0.571586, 0.0727161, 0.701116] & 3 \\\\\n",
406 | "\t11 & 11 & [0.0952175, 0.845515, 0.348995] & 3 \\\\\n",
407 | "\t12 & 12 & [0.768308, 0.26906] & 2 \\\\\n",
408 | "\t13 & 13 & [0.539631, 0.293905, 0.242195] & 3 \\\\\n",
409 | "\t14 & 14 & [0.97807, 0.853242] & 2 \\\\\n",
410 | "\t15 & 15 & [0.706065, 0.442139] & 2 \\\\\n",
411 | "\t16 & 16 & [0.884577, 0.520741] & 2 \\\\\n",
412 | "\t17 & 17 & [0.941831, 0.810699] & 2 \\\\\n",
413 | "\t18 & 18 & [0.464849, 0.977012] & 2 \\\\\n",
414 | "\t19 & 19 & [0.500161, 0.715846] & 2 \\\\\n",
415 | "\t20 & 20 & [0.18985, 0.376441, 0.565864] & 3 \\\\\n",
416 | "\t21 & 21 & [0.172236, 0.327351, 0.632108] & 3 \\\\\n",
417 | "\t22 & 22 & [0.374141, 0.728543] & 2 \\\\\n",
418 | "\t23 & 23 & [0.00277646, 0.134685, 0.531899, 0.0731709, 0.501756] & 5 \\\\\n",
419 | "\t24 & 24 & [0.906939, 0.116769] & 2 \\\\\n",
420 | "\t25 & 25 & [0.367198, 0.881163] & 2 \\\\\n",
421 | "\t26 & 26 & [0.67189, 0.479862] & 2 \\\\\n",
422 | "\t27 & 27 & [0.949968, 0.0262331, 0.347189] & 3 \\\\\n",
423 | "\t28 & 28 & [0.819254, 0.792831] & 2 \\\\\n",
424 | "\t29 & 29 & [0.96723, 0.471452] & 2 \\\\\n",
425 | "\t30 & 30 & [0.19378, 0.941781] & 2 \\\\\n",
426 | "\t$\\dots$ & $\\dots$ & $\\dots$ & $\\dots$ \\\\\n",
427 | "\\end{tabular}\n"
428 | ],
429 | "text/plain": [
430 | "\u001b[1m10000000×3 DataFrame\u001b[0m\n",
431 | "\u001b[1m Row \u001b[0m│\u001b[1m id \u001b[0m\u001b[1m pos \u001b[0m\u001b[1m jumps \u001b[0m\n",
432 | " │\u001b[90m Int64 \u001b[0m\u001b[90m Array… \u001b[0m\u001b[90m Int64 \u001b[0m\n",
433 | "──────────┼────────────────────────────────────────────────────\n",
434 | " 1 │ 1 [0.727935, 0.00744801, 0.199377,… 4\n",
435 | " 2 │ 2 [0.682533, 0.956741] 2\n",
436 | " 3 │ 3 [0.647855, 0.996665] 2\n",
437 | " 4 │ 4 [0.749194, 0.110084, 0.491383] 3\n",
438 | " 5 │ 5 [0.565145, 0.253812, 0.626794] 3\n",
439 | " 6 │ 6 [0.234105, 0.124792, 0.609875, 0… 4\n",
440 | " 7 │ 7 [0.761916, 0.588872] 2\n",
441 | " 8 │ 8 [0.365854, 0.131026, 0.946453] 3\n",
442 | " 9 │ 9 [0.574323, 0.67765] 2\n",
443 | " 10 │ 10 [0.571586, 0.0727161, 0.701116] 3\n",
444 | " 11 │ 11 [0.0952175, 0.845515, 0.348995] 3\n",
445 | " ⋮ │ ⋮ ⋮ ⋮\n",
446 | " 9999991 │ 9999991 [0.382258, 0.901623] 2\n",
447 | " 9999992 │ 9999992 [0.640595, 0.916399] 2\n",
448 | " 9999993 │ 9999993 [0.963508, 0.301685] 2\n",
449 | " 9999994 │ 9999994 [0.846778, 0.328575] 2\n",
450 | " 9999995 │ 9999995 [0.251749, 0.28344, 0.16416, 0.6… 4\n",
451 | " 9999996 │ 9999996 [0.383316, 0.352557, 0.470778] 3\n",
452 | " 9999997 │ 9999997 [0.362197, 0.639499] 2\n",
453 | " 9999998 │ 9999998 [0.046897, 0.552855, 0.172669, 0… 5\n",
454 | " 9999999 │ 9999999 [0.815608, 0.663247] 2\n",
455 | " 10000000 │ 10000000 [0.61177, 0.5798] 2\n",
456 | "\u001b[36m 9999979 rows omitted\u001b[0m"
457 | ]
458 | },
459 | "execution_count": 14,
460 | "metadata": {},
461 | "output_type": "execute_result"
462 | }
463 | ],
464 | "source": [
465 | "transform!(df, :pos => ByRow(length) => :jumps)"
466 | ]
467 | },
468 | {
469 | "cell_type": "markdown",
470 | "metadata": {},
471 | "source": [
472 | "Let us dissect what we have written above:\n",
473 | "* `transform!` adds columns to a data frame in-place\n",
474 | "* `:pos` is the source column\n",
475 | "* `ByRow(length)` says that we want to apply the `length` function to each element of the `:pos` column (without it `length` would be applied to the whole column - can you guess what would be the result?)\n",
476 | "* `:jumps` is the name of the column that should be created"
477 | ]
478 | },
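{
"cell_type": "markdown",
"metadata": {},
"source": [
"The difference between using `ByRow` and omitting it is easiest to see on a tiny data frame. This is a sketch on made-up data; the names `small` and `column_length` are illustrative only:\n",
"\n",
"```julia\n",
"using DataFrames\n",
"\n",
"# Made-up data: three experiments with 2, 3 and 2 draws.\n",
"small = DataFrame(id=1:3, pos=[[0.6, 0.7], [0.2, 0.3, 0.9], [0.5, 0.8]])\n",
"\n",
"# With ByRow, length is applied to each vector stored in :pos.\n",
"transform!(small, :pos => ByRow(length) => :jumps)\n",
"\n",
"# Without ByRow, length is applied to the whole column (3 elements)\n",
"# and the scalar result is repeated in every row.\n",
"transform!(small, :pos => length => :column_length)\n",
"```\n",
"\n",
"Here `small.jumps` holds the per-row lengths, while `small.column_length` holds the number of rows repeated in each row."
]
},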
479 | {
480 | "cell_type": "markdown",
481 | "metadata": {},
482 | "source": [
483 | "Now we are ready to find the average number of jumps that are made:"
484 | ]
485 | },
486 | {
487 | "cell_type": "code",
488 | "execution_count": 15,
489 | "metadata": {},
490 | "outputs": [
491 | {
492 | "data": {
493 | "text/plain": [
494 | "2.7183826"
495 | ]
496 | },
497 | "execution_count": 15,
498 | "metadata": {},
499 | "output_type": "execute_result"
500 | }
501 | ],
502 | "source": [
503 | "mean(df.jumps)"
504 | ]
505 | },
506 | {
507 | "cell_type": "markdown",
508 | "metadata": {},
509 | "source": [
510 | "or"
511 | ]
512 | },
513 | {
514 | "cell_type": "code",
515 | "execution_count": 16,
516 | "metadata": {},
517 | "outputs": [
518 | {
519 | "data": {
520 | "text/html": [
521 | ""
522 | ],
523 | "text/latex": [
524 | "\\begin{tabular}{r|c}\n",
525 | "\t& jumps\\_mean\\\\\n",
526 | "\t\\hline\n",
527 | "\t& Float64\\\\\n",
528 | "\t\\hline\n",
529 | "\t1 & 2.71838 \\\\\n",
530 | "\\end{tabular}\n"
531 | ],
532 | "text/plain": [
533 | "\u001b[1m1×1 DataFrame\u001b[0m\n",
534 | "\u001b[1m Row \u001b[0m│\u001b[1m jumps_mean \u001b[0m\n",
535 | " │\u001b[90m Float64 \u001b[0m\n",
536 | "─────┼────────────\n",
537 | " 1 │ 2.71838"
538 | ]
539 | },
540 | "execution_count": 16,
541 | "metadata": {},
542 | "output_type": "execute_result"
543 | }
544 | ],
545 | "source": [
546 | "combine(df, :jumps => mean)"
547 | ]
548 | },
549 | {
550 | "cell_type": "markdown",
551 | "metadata": {},
552 | "source": [
553 | "which happens to be very close to:"
554 | ]
555 | },
556 | {
557 | "cell_type": "code",
558 | "execution_count": 17,
559 | "metadata": {},
560 | "outputs": [
561 | {
562 | "data": {
563 | "text/plain": [
564 | "ℯ = 2.7182818284590..."
565 | ]
566 | },
567 | "execution_count": 17,
568 | "metadata": {},
569 | "output_type": "execute_result"
570 | }
571 | ],
572 | "source": [
573 | "MathConstants.e"
574 | ]
575 | },
576 | {
577 | "cell_type": "markdown",
578 | "metadata": {},
579 | "source": [
580 | "Let us now find the distribution of the number of jumps:"
581 | ]
582 | },
583 | {
584 | "cell_type": "code",
585 | "execution_count": 18,
586 | "metadata": {},
587 | "outputs": [
588 | {
589 | "data": {
590 | "text/html": [
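591 | "<div><table><thead><tr><th>Row</th><th>jumps</th><th>jumps_length</th></tr></thead><tbody><tr><td>1</td><td>2</td><td>5000265</td></tr><tr><td>2</td><td>3</td><td>3332702</td></tr><tr><td>3</td><td>4</td><td>1249879</td></tr><tr><td>4</td><td>5</td><td>333412</td></tr><tr><td>5</td><td>6</td><td>69869</td></tr><tr><td>6</td><td>7</td><td>11866</td></tr><tr><td>7</td><td>8</td><td>1754</td></tr><tr><td>8</td><td>9</td><td>227</td></tr><tr><td>9</td><td>10</td><td>23</td></tr><tr><td>10</td><td>11</td><td>3</td></tr></tbody></table></div>"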
592 | ],
593 | "text/latex": [
594 | "\\begin{tabular}{r|cc}\n",
595 | "\t& jumps & jumps\\_length\\\\\n",
596 | "\t\\hline\n",
597 | "\t& Int64 & Int64\\\\\n",
598 | "\t\\hline\n",
599 | "\t1 & 2 & 5000265 \\\\\n",
600 | "\t2 & 3 & 3332702 \\\\\n",
601 | "\t3 & 4 & 1249879 \\\\\n",
602 | "\t4 & 5 & 333412 \\\\\n",
603 | "\t5 & 6 & 69869 \\\\\n",
604 | "\t6 & 7 & 11866 \\\\\n",
605 | "\t7 & 8 & 1754 \\\\\n",
606 | "\t8 & 9 & 227 \\\\\n",
607 | "\t9 & 10 & 23 \\\\\n",
608 | "\t10 & 11 & 3 \\\\\n",
609 | "\\end{tabular}\n"
610 | ],
611 | "text/plain": [
612 | "\u001b[1m10×2 DataFrame\u001b[0m\n",
613 | "\u001b[1m Row \u001b[0m│\u001b[1m jumps \u001b[0m\u001b[1m jumps_length \u001b[0m\n",
614 | " │\u001b[90m Int64 \u001b[0m\u001b[90m Int64 \u001b[0m\n",
615 | "─────┼─────────────────────\n",
616 | " 1 │ 2 5000265\n",
617 | " 2 │ 3 3332702\n",
618 | " 3 │ 4 1249879\n",
619 | " 4 │ 5 333412\n",
620 | " 5 │ 6 69869\n",
621 | " 6 │ 7 11866\n",
622 | " 7 │ 8 1754\n",
623 | " 8 │ 9 227\n",
624 | " 9 │ 10 23\n",
625 | " 10 │ 11 3"
626 | ]
627 | },
628 | "execution_count": 18,
629 | "metadata": {},
630 | "output_type": "execute_result"
631 | }
632 | ],
633 | "source": [
634 | "jumps_agg = @pipe df |>\n",
635 | " groupby(_, :jumps, sort=true) |>\n",
636 | " combine(_, :jumps => length)"
637 | ]
638 | },
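{
"cell_type": "markdown",
"metadata": {},
"source": [
"The same split-apply-combine pattern can be tried on a toy data frame without `@pipe`. The sketch below uses made-up data; the names `toy`, `counts` and `counts_nrow` are illustrative, and `nrow => :count` is shown as an alternative way of counting rows per group:\n",
"\n",
"```julia\n",
"using DataFrames\n",
"\n",
"toy = DataFrame(jumps=[2, 3, 2, 4, 2, 3])    # made-up data\n",
"gdf = groupby(toy, :jumps, sort=true)         # one group per distinct value\n",
"\n",
"# Same operation as in the notebook: the result column is auto-named jumps_length.\n",
"counts = combine(gdf, :jumps => length)\n",
"\n",
"# Alternative: count rows per group and choose the column name explicitly.\n",
"counts_nrow = combine(gdf, nrow => :count)\n",
"```\n",
"\n",
"Both results have one row per distinct value of `:jumps`, sorted in ascending order."
]
},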
639 | {
640 | "cell_type": "markdown",
641 | "metadata": {},
642 | "source": [
643 | "and normalize it as a fraction (and at the same time calculate some theoretical result that we have *guessed* :)):"
644 | ]
645 | },
646 | {
647 | "cell_type": "code",
648 | "execution_count": 19,
649 | "metadata": {},
650 | "outputs": [
651 | {
652 | "data": {
653 | "text/html": [
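654 | "<div><table><thead><tr><th>Row</th><th>jumps</th><th>jumps_length</th><th>simulation</th><th>theory</th></tr></thead><tbody><tr><td>1</td><td>2</td><td>5000265</td><td>0.500027</td><td>0.5</td></tr><tr><td>2</td><td>3</td><td>3332702</td><td>0.33327</td><td>0.333333</td></tr><tr><td>3</td><td>4</td><td>1249879</td><td>0.124988</td><td>0.125</td></tr><tr><td>4</td><td>5</td><td>333412</td><td>0.0333412</td><td>0.0333333</td></tr><tr><td>5</td><td>6</td><td>69869</td><td>0.0069869</td><td>0.00694444</td></tr><tr><td>6</td><td>7</td><td>11866</td><td>0.0011866</td><td>0.00119048</td></tr><tr><td>7</td><td>8</td><td>1754</td><td>0.0001754</td><td>0.000173611</td></tr><tr><td>8</td><td>9</td><td>227</td><td>2.27e-5</td><td>2.20459e-5</td></tr><tr><td>9</td><td>10</td><td>23</td><td>2.3e-6</td><td>2.48016e-6</td></tr><tr><td>10</td><td>11</td><td>3</td><td>3.0e-7</td><td>2.50521e-7</td></tr></tbody></table></div>"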
655 | ],
656 | "text/latex": [
657 | "\\begin{tabular}{r|cccc}\n",
658 | "\t& jumps & jumps\\_length & simulation & theory\\\\\n",
659 | "\t\\hline\n",
660 | "\t& Int64 & Int64 & Float64 & Float64\\\\\n",
661 | "\t\\hline\n",
662 | "\t1 & 2 & 5000265 & 0.500027 & 0.5 \\\\\n",
663 | "\t2 & 3 & 3332702 & 0.33327 & 0.333333 \\\\\n",
664 | "\t3 & 4 & 1249879 & 0.124988 & 0.125 \\\\\n",
665 | "\t4 & 5 & 333412 & 0.0333412 & 0.0333333 \\\\\n",
666 | "\t5 & 6 & 69869 & 0.0069869 & 0.00694444 \\\\\n",
667 | "\t6 & 7 & 11866 & 0.0011866 & 0.00119048 \\\\\n",
668 | "\t7 & 8 & 1754 & 0.0001754 & 0.000173611 \\\\\n",
669 | "\t8 & 9 & 227 & 2.27e-5 & 2.20459e-5 \\\\\n",
670 | "\t9 & 10 & 23 & 2.3e-6 & 2.48016e-6 \\\\\n",
671 | "\t10 & 11 & 3 & 3.0e-7 & 2.50521e-7 \\\\\n",
672 | "\\end{tabular}\n"
673 | ],
674 | "text/plain": [
675 | "\u001b[1m10×4 DataFrame\u001b[0m\n",
676 | "\u001b[1m Row \u001b[0m│\u001b[1m jumps \u001b[0m\u001b[1m jumps_length \u001b[0m\u001b[1m simulation \u001b[0m\u001b[1m theory \u001b[0m\n",
677 | " │\u001b[90m Int64 \u001b[0m\u001b[90m Int64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\n",
678 | "─────┼──────────────────────────────────────────────\n",
679 | " 1 │ 2 5000265 0.500027 0.5\n",
680 | " 2 │ 3 3332702 0.33327 0.333333\n",
681 | " 3 │ 4 1249879 0.124988 0.125\n",
682 | " 4 │ 5 333412 0.0333412 0.0333333\n",
683 | " 5 │ 6 69869 0.0069869 0.00694444\n",
684 | " 6 │ 7 11866 0.0011866 0.00119048\n",
685 | " 7 │ 8 1754 0.0001754 0.000173611\n",
686 | " 8 │ 9 227 2.27e-5 2.20459e-5\n",
687 | " 9 │ 10 23 2.3e-6 2.48016e-6\n",
688 | " 10 │ 11 3 3.0e-7 2.50521e-7"
689 | ]
690 | },
691 | "execution_count": 19,
692 | "metadata": {},
693 | "output_type": "execute_result"
694 | }
695 | ],
696 | "source": [
697 | "transform!(jumps_agg,\n",
698 | " :jumps_length => (x -> x ./ sum(x)) => :simulation,\n",
699 | " :jumps => ByRow(x -> (x-1) / factorial(x)) => :theory)"
700 | ]
701 | },
702 | {
703 | "cell_type": "markdown",
704 | "metadata": {},
705 | "source": [
706 | "Let us briefly justify how we have guessed it (you can safely skip the derivation):\n",
707 | "\n",
708 | "The formula\n",
709 | "$$\n",
710 | "p_n = \\frac{n-1}{n!}\n",
711 | "$$\n",
712 | "defines a probability distribution on $n \\geq 2$, because the probabilities sum to one:\n",
713 | "$$\n",
714 | "\\sum_{n=2}^{+\\infty}p_n=\\sum_{n=2}^{+\\infty} \\frac{n-1}{n!} = \\sum_{n=1}^{+\\infty} \\frac{1}{n!} - \\sum_{n=2}^{+\\infty} \\frac{1}{n!} = 1\n",
715 | "$$\n",
716 | "and its expected value is $e$:\n",
717 | "$$\n",
718 | "\\sum_{n=2}^{+\\infty}n\\cdot p_n=\\sum_{n=2}^{+\\infty} n\\frac{n-1}{n!} = \\sum_{n=2}^{+\\infty} \\frac{1}{(n-2)!} = e\n",
719 | "$$\n",
720 | "\n",
721 | "Now we note that:\n",
722 | "\n",
723 | "$$\n",
724 | "1-\\sum_{n=2}^k p_n = \\frac{1}{k!}\n",
725 | "$$\n",
726 | "which can be most easily justified by a geometric argument."
727 | ]
728 | },
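{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick numerical sanity check of the identities above, here is a minimal sketch (not part of the original analysis; the helper name `p` and the cutoff `20` are our own choices, the cutoff being the largest argument for which `factorial` does not overflow `Int64`):\n",
"\n",
"```julia\n",
"# probability of exactly n jumps according to the guessed formula\n",
"p(n) = (n - 1) / factorial(n)\n",
"\n",
"ns = 2:20                                # factorial(21) overflows Int64\n",
"total = sum(p, ns)                       # should be very close to 1\n",
"mean_jumps = sum(n * p(n) for n in ns)   # should be very close to e\n",
"tail5 = 1 - sum(p, 2:5)                  # should be close to 1 / factorial(5)\n",
"```\n",
"\n",
"With the truncated sums, `total ≈ 1`, `mean_jumps ≈ exp(1)`, and `tail5 ≈ 1 / factorial(5)` should all hold up to floating point error."
]
},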
729 | {
730 | "cell_type": "markdown",
731 | "metadata": {},
732 | "source": [
733 | "To finish this section of the tutorial let us check if random numbers generated using `rand()` were indeed $U(0,1)$."
734 | ]
735 | },
736 | {
737 | "cell_type": "markdown",
738 | "metadata": {},
739 | "source": [
740 | "To do this we will extract the first and the last random number of each sequence stored in the `pos` column of the `df` data frame."
741 | ]
742 | },
743 | {
744 | "cell_type": "code",
745 | "execution_count": 20,
746 | "metadata": {
747 | "scrolled": false
748 | },
749 | "outputs": [
750 | {
751 | "data": {
752 | "text/html": [
753 | "10000000×3 DataFrame
9999975 rows omitted
1 1 [0.727935, 0.00744801, 0.199377, 0.439243] 4 2 2 [0.682533, 0.956741] 2 3 3 [0.647855, 0.996665] 2 4 4 [0.749194, 0.110084, 0.491383] 3 5 5 [0.565145, 0.253812, 0.626794] 3 6 6 [0.234105, 0.124792, 0.609875, 0.672793] 4 7 7 [0.761916, 0.588872] 2 8 8 [0.365854, 0.131026, 0.946453] 3 9 9 [0.574323, 0.67765] 2 10 10 [0.571586, 0.0727161, 0.701116] 3 11 11 [0.0952175, 0.845515, 0.348995] 3 12 12 [0.768308, 0.26906] 2 13 13 [0.539631, 0.293905, 0.242195] 3 ⋮ ⋮ ⋮ ⋮ 9999989 9999989 [0.383955, 0.70686] 2 9999990 9999990 [0.193813, 0.649544, 0.539589] 3 9999991 9999991 [0.382258, 0.901623] 2 9999992 9999992 [0.640595, 0.916399] 2 9999993 9999993 [0.963508, 0.301685] 2 9999994 9999994 [0.846778, 0.328575] 2 9999995 9999995 [0.251749, 0.28344, 0.16416, 0.618446] 4 9999996 9999996 [0.383316, 0.352557, 0.470778] 3 9999997 9999997 [0.362197, 0.639499] 2 9999998 9999998 [0.046897, 0.552855, 0.172669, 0.156109, 0.799254] 5 9999999 9999999 [0.815608, 0.663247] 2 10000000 10000000 [0.61177, 0.5798] 2
"
754 | ],
755 | "text/latex": [
756 | "\\begin{tabular}{r|ccc}\n",
757 | "\t& id & pos & jumps\\\\\n",
758 | "\t\\hline\n",
759 | "\t& Int64 & Array… & Int64\\\\\n",
760 | "\t\\hline\n",
761 | "\t1 & 1 & [0.727935, 0.00744801, 0.199377, 0.439243] & 4 \\\\\n",
762 | "\t2 & 2 & [0.682533, 0.956741] & 2 \\\\\n",
763 | "\t3 & 3 & [0.647855, 0.996665] & 2 \\\\\n",
764 | "\t4 & 4 & [0.749194, 0.110084, 0.491383] & 3 \\\\\n",
765 | "\t5 & 5 & [0.565145, 0.253812, 0.626794] & 3 \\\\\n",
766 | "\t6 & 6 & [0.234105, 0.124792, 0.609875, 0.672793] & 4 \\\\\n",
767 | "\t7 & 7 & [0.761916, 0.588872] & 2 \\\\\n",
768 | "\t8 & 8 & [0.365854, 0.131026, 0.946453] & 3 \\\\\n",
769 | "\t9 & 9 & [0.574323, 0.67765] & 2 \\\\\n",
770 | "\t10 & 10 & [0.571586, 0.0727161, 0.701116] & 3 \\\\\n",
771 | "\t11 & 11 & [0.0952175, 0.845515, 0.348995] & 3 \\\\\n",
772 | "\t12 & 12 & [0.768308, 0.26906] & 2 \\\\\n",
773 | "\t13 & 13 & [0.539631, 0.293905, 0.242195] & 3 \\\\\n",
774 | "\t14 & 14 & [0.97807, 0.853242] & 2 \\\\\n",
775 | "\t15 & 15 & [0.706065, 0.442139] & 2 \\\\\n",
776 | "\t16 & 16 & [0.884577, 0.520741] & 2 \\\\\n",
777 | "\t17 & 17 & [0.941831, 0.810699] & 2 \\\\\n",
778 | "\t18 & 18 & [0.464849, 0.977012] & 2 \\\\\n",
779 | "\t19 & 19 & [0.500161, 0.715846] & 2 \\\\\n",
780 | "\t20 & 20 & [0.18985, 0.376441, 0.565864] & 3 \\\\\n",
781 | "\t21 & 21 & [0.172236, 0.327351, 0.632108] & 3 \\\\\n",
782 | "\t22 & 22 & [0.374141, 0.728543] & 2 \\\\\n",
783 | "\t23 & 23 & [0.00277646, 0.134685, 0.531899, 0.0731709, 0.501756] & 5 \\\\\n",
784 | "\t24 & 24 & [0.906939, 0.116769] & 2 \\\\\n",
785 | "\t25 & 25 & [0.367198, 0.881163] & 2 \\\\\n",
786 | "\t26 & 26 & [0.67189, 0.479862] & 2 \\\\\n",
787 | "\t27 & 27 & [0.949968, 0.0262331, 0.347189] & 3 \\\\\n",
788 | "\t28 & 28 & [0.819254, 0.792831] & 2 \\\\\n",
789 | "\t29 & 29 & [0.96723, 0.471452] & 2 \\\\\n",
790 | "\t30 & 30 & [0.19378, 0.941781] & 2 \\\\\n",
791 | "\t$\\dots$ & $\\dots$ & $\\dots$ & $\\dots$ \\\\\n",
792 | "\\end{tabular}\n"
793 | ],
794 | "text/plain": [
795 | "\u001b[1m10000000×3 DataFrame\u001b[0m\n",
796 | "\u001b[1m Row \u001b[0m│\u001b[1m id \u001b[0m\u001b[1m pos \u001b[0m\u001b[1m jumps \u001b[0m\n",
797 | " │\u001b[90m Int64 \u001b[0m\u001b[90m Array… \u001b[0m\u001b[90m Int64 \u001b[0m\n",
798 | "──────────┼────────────────────────────────────────────────────\n",
799 | " 1 │ 1 [0.727935, 0.00744801, 0.199377,… 4\n",
800 | " 2 │ 2 [0.682533, 0.956741] 2\n",
801 | " 3 │ 3 [0.647855, 0.996665] 2\n",
802 | " 4 │ 4 [0.749194, 0.110084, 0.491383] 3\n",
803 | " 5 │ 5 [0.565145, 0.253812, 0.626794] 3\n",
804 | " 6 │ 6 [0.234105, 0.124792, 0.609875, 0… 4\n",
805 | " 7 │ 7 [0.761916, 0.588872] 2\n",
806 | " 8 │ 8 [0.365854, 0.131026, 0.946453] 3\n",
807 | " 9 │ 9 [0.574323, 0.67765] 2\n",
808 | " 10 │ 10 [0.571586, 0.0727161, 0.701116] 3\n",
809 | " 11 │ 11 [0.0952175, 0.845515, 0.348995] 3\n",
810 | " ⋮ │ ⋮ ⋮ ⋮\n",
811 | " 9999991 │ 9999991 [0.382258, 0.901623] 2\n",
812 | " 9999992 │ 9999992 [0.640595, 0.916399] 2\n",
813 | " 9999993 │ 9999993 [0.963508, 0.301685] 2\n",
814 | " 9999994 │ 9999994 [0.846778, 0.328575] 2\n",
815 | " 9999995 │ 9999995 [0.251749, 0.28344, 0.16416, 0.6… 4\n",
816 | " 9999996 │ 9999996 [0.383316, 0.352557, 0.470778] 3\n",
817 | " 9999997 │ 9999997 [0.362197, 0.639499] 2\n",
818 | " 9999998 │ 9999998 [0.046897, 0.552855, 0.172669, 0… 5\n",
819 | " 9999999 │ 9999999 [0.815608, 0.663247] 2\n",
820 | " 10000000 │ 10000000 [0.61177, 0.5798] 2\n",
821 | "\u001b[36m 9999979 rows omitted\u001b[0m"
822 | ]
823 | },
824 | "execution_count": 20,
825 | "metadata": {},
826 | "output_type": "execute_result"
827 | }
828 | ],
829 | "source": [
830 | "df"
831 | ]
832 | },
833 | {
834 | "cell_type": "code",
835 | "execution_count": 21,
836 | "metadata": {},
837 | "outputs": [
838 | {
839 | "data": {
840 | "text/html": [
841 | "10000000×2 DataFrame
9999975 rows omitted
1 0.727935 0.439243 2 0.682533 0.956741 3 0.647855 0.996665 4 0.749194 0.491383 5 0.565145 0.626794 6 0.234105 0.672793 7 0.761916 0.588872 8 0.365854 0.946453 9 0.574323 0.67765 10 0.571586 0.701116 11 0.0952175 0.348995 12 0.768308 0.26906 13 0.539631 0.242195 ⋮ ⋮ ⋮ 9999989 0.383955 0.70686 9999990 0.193813 0.539589 9999991 0.382258 0.901623 9999992 0.640595 0.916399 9999993 0.963508 0.301685 9999994 0.846778 0.328575 9999995 0.251749 0.618446 9999996 0.383316 0.470778 9999997 0.362197 0.639499 9999998 0.046897 0.799254 9999999 0.815608 0.663247 10000000 0.61177 0.5798
"
842 | ],
843 | "text/latex": [
844 | "\\begin{tabular}{r|cc}\n",
845 | "\t& first & last\\\\\n",
846 | "\t\\hline\n",
847 | "\t& Float64 & Float64\\\\\n",
848 | "\t\\hline\n",
849 | "\t1 & 0.727935 & 0.439243 \\\\\n",
850 | "\t2 & 0.682533 & 0.956741 \\\\\n",
851 | "\t3 & 0.647855 & 0.996665 \\\\\n",
852 | "\t4 & 0.749194 & 0.491383 \\\\\n",
853 | "\t5 & 0.565145 & 0.626794 \\\\\n",
854 | "\t6 & 0.234105 & 0.672793 \\\\\n",
855 | "\t7 & 0.761916 & 0.588872 \\\\\n",
856 | "\t8 & 0.365854 & 0.946453 \\\\\n",
857 | "\t9 & 0.574323 & 0.67765 \\\\\n",
858 | "\t10 & 0.571586 & 0.701116 \\\\\n",
859 | "\t11 & 0.0952175 & 0.348995 \\\\\n",
860 | "\t12 & 0.768308 & 0.26906 \\\\\n",
861 | "\t13 & 0.539631 & 0.242195 \\\\\n",
862 | "\t14 & 0.97807 & 0.853242 \\\\\n",
863 | "\t15 & 0.706065 & 0.442139 \\\\\n",
864 | "\t16 & 0.884577 & 0.520741 \\\\\n",
865 | "\t17 & 0.941831 & 0.810699 \\\\\n",
866 | "\t18 & 0.464849 & 0.977012 \\\\\n",
867 | "\t19 & 0.500161 & 0.715846 \\\\\n",
868 | "\t20 & 0.18985 & 0.565864 \\\\\n",
869 | "\t21 & 0.172236 & 0.632108 \\\\\n",
870 | "\t22 & 0.374141 & 0.728543 \\\\\n",
871 | "\t23 & 0.00277646 & 0.501756 \\\\\n",
872 | "\t24 & 0.906939 & 0.116769 \\\\\n",
873 | "\t25 & 0.367198 & 0.881163 \\\\\n",
874 | "\t26 & 0.67189 & 0.479862 \\\\\n",
875 | "\t27 & 0.949968 & 0.347189 \\\\\n",
876 | "\t28 & 0.819254 & 0.792831 \\\\\n",
877 | "\t29 & 0.96723 & 0.471452 \\\\\n",
878 | "\t30 & 0.19378 & 0.941781 \\\\\n",
879 | "\t$\\dots$ & $\\dots$ & $\\dots$ \\\\\n",
880 | "\\end{tabular}\n"
881 | ],
882 | "text/plain": [
883 | "\u001b[1m10000000×2 DataFrame\u001b[0m\n",
884 | "\u001b[1m Row \u001b[0m│\u001b[1m first \u001b[0m\u001b[1m last \u001b[0m\n",
885 | " │\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\n",
886 | "──────────┼─────────────────────\n",
887 | " 1 │ 0.727935 0.439243\n",
888 | " 2 │ 0.682533 0.956741\n",
889 | " 3 │ 0.647855 0.996665\n",
890 | " 4 │ 0.749194 0.491383\n",
891 | " 5 │ 0.565145 0.626794\n",
892 | " 6 │ 0.234105 0.672793\n",
893 | " 7 │ 0.761916 0.588872\n",
894 | " 8 │ 0.365854 0.946453\n",
895 | " 9 │ 0.574323 0.67765\n",
896 | " 10 │ 0.571586 0.701116\n",
897 | " 11 │ 0.0952175 0.348995\n",
898 | " ⋮ │ ⋮ ⋮\n",
899 | " 9999991 │ 0.382258 0.901623\n",
900 | " 9999992 │ 0.640595 0.916399\n",
901 | " 9999993 │ 0.963508 0.301685\n",
902 | " 9999994 │ 0.846778 0.328575\n",
903 | " 9999995 │ 0.251749 0.618446\n",
904 | " 9999996 │ 0.383316 0.470778\n",
905 | " 9999997 │ 0.362197 0.639499\n",
906 | " 9999998 │ 0.046897 0.799254\n",
907 | " 9999999 │ 0.815608 0.663247\n",
908 | " 10000000 │ 0.61177 0.5798\n",
909 | "\u001b[36m 9999979 rows omitted\u001b[0m"
910 | ]
911 | },
912 | "execution_count": 21,
913 | "metadata": {},
914 | "output_type": "execute_result"
915 | }
916 | ],
917 | "source": [
918 | "df_test = select(df, :pos => ByRow(first) => :first, :pos => ByRow(last) => :last)"
919 | ]
920 | },
921 | {
922 | "cell_type": "code",
923 | "execution_count": 22,
924 | "metadata": {},
925 | "outputs": [
926 | {
927 | "data": {
928 | "image/png": "",
929 | "text/plain": [
930 | "Figure(PyObject )"
931 | ]
932 | },
933 | "metadata": {},
934 | "output_type": "display_data"
935 | }
936 | ],
937 | "source": [
938 | "hist(df_test.first, 100);"
939 | ]
940 | },
941 | {
942 | "cell_type": "markdown",
943 | "metadata": {},
944 | "source": [
945 | "So far all looks good. But let us look at the distribution of the last drawn random number:"
946 | ]
947 | },
948 | {
949 | "cell_type": "code",
950 | "execution_count": 23,
951 | "metadata": {},
952 | "outputs": [
953 | {
954 | "data": {
955 | "image/png": "",
956 | "text/plain": [
957 | "Figure(PyObject )"
958 | ]
959 | },
960 | "metadata": {},
961 | "output_type": "display_data"
962 | }
963 | ],
964 | "source": [
965 | "hist(df_test.last, 100);"
966 | ]
967 | },
968 | {
969 | "cell_type": "markdown",
970 | "metadata": {},
971 | "source": [
972 | "So, is the `rand()` function broken for the last random number generated in each sequence, or has something else made the distribution stop being uniform?"
973 | ]
974 | }
975 | ],
976 | "metadata": {
977 | "kernelspec": {
978 | "display_name": "Julia 1.9.0",
979 | "language": "julia",
980 | "name": "julia-1.9"
981 | },
982 | "language_info": {
983 | "file_extension": ".jl",
984 | "mimetype": "application/julia",
985 | "name": "julia",
986 | "version": "1.9.0"
987 | }
988 | },
989 | "nbformat": 4,
990 | "nbformat_minor": 4
991 | }
992 |
--------------------------------------------------------------------------------
/6. Next steps.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Final examples\n",
8 | "\n",
9 | "### Bogumił Kamiński"
10 | ]
11 | },
12 | {
13 | "cell_type": "markdown",
14 | "metadata": {},
15 | "source": [
16 | "Let us wrap up our tutorial with examples of joining and reshaping data."
17 | ]
18 | },
19 | {
20 | "cell_type": "markdown",
21 | "metadata": {},
22 | "source": [
23 | "### Joining and reshaping data frames"
24 | ]
25 | },
26 | {
27 | "cell_type": "code",
28 | "execution_count": 1,
29 | "metadata": {},
30 | "outputs": [
31 | {
32 | "name": "stderr",
33 | "output_type": "stream",
34 | "text": [
35 | "\u001b[32m\u001b[1m Activating\u001b[22m\u001b[39m project at `~/JuliaAcademy-DataFrames`\n"
36 | ]
37 | }
38 | ],
39 | "source": [
40 | "] activate ."
41 | ]
42 | },
43 | {
44 | "cell_type": "code",
45 | "execution_count": 2,
46 | "metadata": {},
47 | "outputs": [],
48 | "source": [
49 | "using DataFrames"
50 | ]
51 | },
52 | {
53 | "cell_type": "code",
54 | "execution_count": 3,
55 | "metadata": {},
56 | "outputs": [],
57 | "source": [
58 | "using CSV"
59 | ]
60 | },
61 | {
62 | "cell_type": "code",
63 | "execution_count": 4,
64 | "metadata": {},
65 | "outputs": [],
66 | "source": [
67 | "using Pipe"
68 | ]
69 | },
70 | {
71 | "cell_type": "code",
72 | "execution_count": 5,
73 | "metadata": {},
74 | "outputs": [],
75 | "source": [
76 | "using Unitful"
77 | ]
78 | },
79 | {
80 | "cell_type": "code",
81 | "execution_count": 6,
82 | "metadata": {},
83 | "outputs": [],
84 | "source": [
85 | "using Dates"
86 | ]
87 | },
88 | {
89 | "cell_type": "markdown",
90 | "metadata": {},
91 | "source": [
92 | "Load the rainfall forecast data for two cities in Poland."
93 | ]
94 | },
95 | {
96 | "cell_type": "code",
97 | "execution_count": 7,
98 | "metadata": {},
99 | "outputs": [
100 | {
101 | "data": {
102 | "text/html": [
103 | "1 Olecko 2020-11-16 2.9 2 Olecko 2020-11-17 4.1 3 Olecko 2020-11-19 4.3 4 Olecko 2020-11-20 2.0 5 Olecko 2020-11-21 0.6 6 Olecko 2020-11-22 1.0 7 Ełk 2020-11-16 3.9 8 Ełk 2020-11-19 1.2 9 Ełk 2020-11-20 2.0 10 Ełk 2020-11-22 2.0
"
104 | ],
105 | "text/latex": [
106 | "\\begin{tabular}{r|ccc}\n",
107 | "\t& city & date & rainfall\\\\\n",
108 | "\t\\hline\n",
109 | "\t& String7 & Date & Float64\\\\\n",
110 | "\t\\hline\n",
111 | "\t1 & Olecko & 2020-11-16 & 2.9 \\\\\n",
112 | "\t2 & Olecko & 2020-11-17 & 4.1 \\\\\n",
113 | "\t3 & Olecko & 2020-11-19 & 4.3 \\\\\n",
114 | "\t4 & Olecko & 2020-11-20 & 2.0 \\\\\n",
115 | "\t5 & Olecko & 2020-11-21 & 0.6 \\\\\n",
116 | "\t6 & Olecko & 2020-11-22 & 1.0 \\\\\n",
117 | "\t7 & Ełk & 2020-11-16 & 3.9 \\\\\n",
118 | "\t8 & Ełk & 2020-11-19 & 1.2 \\\\\n",
119 | "\t9 & Ełk & 2020-11-20 & 2.0 \\\\\n",
120 | "\t10 & Ełk & 2020-11-22 & 2.0 \\\\\n",
121 | "\\end{tabular}\n"
122 | ],
123 | "text/plain": [
124 | "\u001b[1m10×3 DataFrame\u001b[0m\n",
125 | "\u001b[1m Row \u001b[0m│\u001b[1m city \u001b[0m\u001b[1m date \u001b[0m\u001b[1m rainfall \u001b[0m\n",
126 | " │\u001b[90m String7 \u001b[0m\u001b[90m Date \u001b[0m\u001b[90m Float64 \u001b[0m\n",
127 | "─────┼───────────────────────────────\n",
128 | " 1 │ Olecko 2020-11-16 2.9\n",
129 | " 2 │ Olecko 2020-11-17 4.1\n",
130 | " 3 │ Olecko 2020-11-19 4.3\n",
131 | " 4 │ Olecko 2020-11-20 2.0\n",
132 | " 5 │ Olecko 2020-11-21 0.6\n",
133 | " 6 │ Olecko 2020-11-22 1.0\n",
134 | " 7 │ Ełk 2020-11-16 3.9\n",
135 | " 8 │ Ełk 2020-11-19 1.2\n",
136 | " 9 │ Ełk 2020-11-20 2.0\n",
137 | " 10 │ Ełk 2020-11-22 2.0"
138 | ]
139 | },
140 | "execution_count": 7,
141 | "metadata": {},
142 | "output_type": "execute_result"
143 | }
144 | ],
145 | "source": [
146 | "rainfall_long = CSV.File(\"rainfall_forecast.csv\") |> DataFrame"
147 | ]
148 | },
149 | {
150 | "cell_type": "markdown",
151 | "metadata": {},
152 | "source": [
153 | "Note that we collect rainfall information, so it would be nice to add units to the measured values. This is not a problem with Unitful.jl. We take advantage of the fact that `DataFrame` can store vectors of any Julia objects."
154 | ]
155 | },
156 | {
157 | "cell_type": "code",
158 | "execution_count": 8,
159 | "metadata": {},
160 | "outputs": [
161 | {
162 | "data": {
163 | "text/html": [
164 | "1 Olecko 2020-11-16 2.9 mm 2 Olecko 2020-11-17 4.1 mm 3 Olecko 2020-11-19 4.3 mm 4 Olecko 2020-11-20 2.0 mm 5 Olecko 2020-11-21 0.6 mm 6 Olecko 2020-11-22 1.0 mm 7 Ełk 2020-11-16 3.9 mm 8 Ełk 2020-11-19 1.2 mm 9 Ełk 2020-11-20 2.0 mm 10 Ełk 2020-11-22 2.0 mm
"
165 | ],
166 | "text/latex": [
167 | "\\begin{tabular}{r|ccc}\n",
168 | "\t& city & date & rainfall\\\\\n",
169 | "\t\\hline\n",
170 | "\t& String7 & Date & Quantity…\\\\\n",
171 | "\t\\hline\n",
172 | "\t1 & Olecko & 2020-11-16 & 2.9 mm \\\\\n",
173 | "\t2 & Olecko & 2020-11-17 & 4.1 mm \\\\\n",
174 | "\t3 & Olecko & 2020-11-19 & 4.3 mm \\\\\n",
175 | "\t4 & Olecko & 2020-11-20 & 2.0 mm \\\\\n",
176 | "\t5 & Olecko & 2020-11-21 & 0.6 mm \\\\\n",
177 | "\t6 & Olecko & 2020-11-22 & 1.0 mm \\\\\n",
178 | "\t7 & Ełk & 2020-11-16 & 3.9 mm \\\\\n",
179 | "\t8 & Ełk & 2020-11-19 & 1.2 mm \\\\\n",
180 | "\t9 & Ełk & 2020-11-20 & 2.0 mm \\\\\n",
181 | "\t10 & Ełk & 2020-11-22 & 2.0 mm \\\\\n",
182 | "\\end{tabular}\n"
183 | ],
184 | "text/plain": [
185 | "\u001b[1m10×3 DataFrame\u001b[0m\n",
186 | "\u001b[1m Row \u001b[0m│\u001b[1m city \u001b[0m\u001b[1m date \u001b[0m\u001b[1m rainfall \u001b[0m\n",
187 | " │\u001b[90m String7 \u001b[0m\u001b[90m Date \u001b[0m\u001b[90m Quantity… \u001b[0m\n",
188 | "─────┼────────────────────────────────\n",
189 | " 1 │ Olecko 2020-11-16 2.9 mm\n",
190 | " 2 │ Olecko 2020-11-17 4.1 mm\n",
191 | " 3 │ Olecko 2020-11-19 4.3 mm\n",
192 | " 4 │ Olecko 2020-11-20 2.0 mm\n",
193 | " 5 │ Olecko 2020-11-21 0.6 mm\n",
194 | " 6 │ Olecko 2020-11-22 1.0 mm\n",
195 | " 7 │ Ełk 2020-11-16 3.9 mm\n",
196 | " 8 │ Ełk 2020-11-19 1.2 mm\n",
197 | " 9 │ Ełk 2020-11-20 2.0 mm\n",
198 | " 10 │ Ełk 2020-11-22 2.0 mm"
199 | ]
200 | },
201 | "execution_count": 8,
202 | "metadata": {},
203 | "output_type": "execute_result"
204 | }
205 | ],
206 | "source": [
207 | "transform!(rainfall_long, :rainfall => x -> x .* u\"mm\", renamecols=false)"
208 | ]
209 | },
210 | {
211 | "cell_type": "markdown",
212 | "metadata": {},
213 | "source": [
214 | "With `renamecols=false` we left the name of the transformed column unchanged when we did an in-place update of the data frame using the `transform!` function."
215 | ]
216 | },
217 | {
218 | "cell_type": "markdown",
219 | "metadata": {},
220 | "source": [
221 | "It would be nice to see the data in a wide format, so that each city is represented by a single column. We can achieve this using the `unstack` function:"
222 | ]
223 | },
224 | {
225 | "cell_type": "code",
226 | "execution_count": 9,
227 | "metadata": {},
228 | "outputs": [
229 | {
230 | "data": {
231 | "text/html": [
232 | "1 2020-11-16 2.9 mm 3.9 mm 2 2020-11-17 4.1 mm missing 3 2020-11-19 4.3 mm 1.2 mm 4 2020-11-20 2.0 mm 2.0 mm 5 2020-11-21 0.6 mm missing 6 2020-11-22 1.0 mm 2.0 mm
"
233 | ],
234 | "text/latex": [
235 | "\\begin{tabular}{r|ccc}\n",
236 | "\t& date & Olecko & Ełk\\\\\n",
237 | "\t\\hline\n",
238 | "\t& Date & Quantity…? & Quantity…?\\\\\n",
239 | "\t\\hline\n",
240 | "\t1 & 2020-11-16 & 2.9 mm & 3.9 mm \\\\\n",
241 | "\t2 & 2020-11-17 & 4.1 mm & \\emph{missing} \\\\\n",
242 | "\t3 & 2020-11-19 & 4.3 mm & 1.2 mm \\\\\n",
243 | "\t4 & 2020-11-20 & 2.0 mm & 2.0 mm \\\\\n",
244 | "\t5 & 2020-11-21 & 0.6 mm & \\emph{missing} \\\\\n",
245 | "\t6 & 2020-11-22 & 1.0 mm & 2.0 mm \\\\\n",
246 | "\\end{tabular}\n"
247 | ],
248 | "text/plain": [
249 | "\u001b[1m6×3 DataFrame\u001b[0m\n",
250 | "\u001b[1m Row \u001b[0m│\u001b[1m date \u001b[0m\u001b[1m Olecko \u001b[0m\u001b[1m Ełk \u001b[0m\n",
251 | " │\u001b[90m Date \u001b[0m\u001b[90m Quantity…? \u001b[0m\u001b[90m Quantity…? \u001b[0m\n",
252 | "─────┼────────────────────────────────────\n",
253 | " 1 │ 2020-11-16 2.9 mm 3.9 mm\n",
254 | " 2 │ 2020-11-17 4.1 mm \u001b[90m missing \u001b[0m\n",
255 | " 3 │ 2020-11-19 4.3 mm 1.2 mm\n",
256 | " 4 │ 2020-11-20 2.0 mm 2.0 mm\n",
257 | " 5 │ 2020-11-21 0.6 mm \u001b[90m missing \u001b[0m\n",
258 | " 6 │ 2020-11-22 1.0 mm 2.0 mm"
259 | ]
260 | },
261 | "execution_count": 9,
262 | "metadata": {},
263 | "output_type": "execute_result"
264 | }
265 | ],
266 | "source": [
267 | "rainfall_wide = unstack(rainfall_long, :date, :city, :rainfall)"
268 | ]
269 | },
270 | {
271 | "cell_type": "markdown",
272 | "metadata": {},
273 | "source": [
274 | "We can see that the \"gaps\" in the rainfall information for the `\"Ełk\"` column were automatically filled with `missing`."
275 | ]
276 | },
277 | {
278 | "cell_type": "markdown",
279 | "metadata": {},
280 | "source": [
281 | "There is also a `stack` function that does the reverse: transforms a data frame from wide to long format."
282 | ]
283 | },
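{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of that round trip on a made-up data frame (the names `long`, `wide`, and `back` and the toy values are ours, not part of the tutorial data):\n",
"\n",
"```julia\n",
"using DataFrames\n",
"\n",
"long = DataFrame(city=[\"A\", \"A\", \"B\"], day=[1, 2, 1], rain=[2.9, 4.1, 3.9])\n",
"\n",
"# long -> wide: one column per city, gaps filled with missing\n",
"wide = unstack(long, :day, :city, :rain)\n",
"\n",
"# wide -> long: stack all columns except :day, then drop the rows\n",
"# that only exist because of the gaps\n",
"back = dropmissing(stack(wide, Not(:day), variable_name=:city, value_name=:rain))\n",
"```\n",
"\n",
"After `dropmissing`, `back` should hold the same three observations as `long`, although the column order differs."
]
},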
284 | {
285 | "cell_type": "markdown",
286 | "metadata": {},
287 | "source": [
288 | "Also note that one of the cities is `\"Ełk\"`, which has a non-ASCII character `ł` in its name. This is not a problem in DataFrames.jl. As an exercise, let us extract this column:"
289 | ]
290 | },
291 | {
292 | "cell_type": "code",
293 | "execution_count": 10,
294 | "metadata": {},
295 | "outputs": [
296 | {
297 | "data": {
298 | "text/plain": [
299 | "6-element Vector{Union{Missing, Quantity{Float64, 𝐋, Unitful.FreeUnits{(mm,), 𝐋, nothing}}}}:\n",
300 | " 3.9 mm\n",
301 | " missing\n",
302 | " 1.2 mm\n",
303 | " 2.0 mm\n",
304 | " missing\n",
305 | " 2.0 mm"
306 | ]
307 | },
308 | "execution_count": 10,
309 | "metadata": {},
310 | "output_type": "execute_result"
311 | }
312 | ],
313 | "source": [
314 | "rainfall_wide.Ełk"
315 | ]
316 | },
317 | {
318 | "cell_type": "code",
319 | "execution_count": 11,
320 | "metadata": {},
321 | "outputs": [
322 | {
323 | "data": {
324 | "text/plain": [
325 | "6-element Vector{Union{Missing, Quantity{Float64, 𝐋, Unitful.FreeUnits{(mm,), 𝐋, nothing}}}}:\n",
326 | " 3.9 mm\n",
327 | " missing\n",
328 | " 1.2 mm\n",
329 | " 2.0 mm\n",
330 | " missing\n",
331 | " 2.0 mm"
332 | ]
333 | },
334 | "execution_count": 11,
335 | "metadata": {},
336 | "output_type": "execute_result"
337 | }
338 | ],
339 | "source": [
340 | "rainfall_wide.\"Ełk\""
341 | ]
342 | },
343 | {
344 | "cell_type": "markdown",
345 | "metadata": {},
346 | "source": [
347 | "Looking at the data, we note that there are still gaps in it: one of the days (2020-11-18) is missing entirely, as no rainfall is forecasted for it in either city.\n",
348 | "\n",
349 | "It would be nice to have information for all days in the considered period. Here is the way to do it:"
350 | ]
351 | },
352 | {
353 | "cell_type": "code",
354 | "execution_count": 12,
355 | "metadata": {},
356 | "outputs": [
357 | {
358 | "data": {
359 | "text/html": [
360 | "1 2020-11-16 2 2020-11-17 3 2020-11-18 4 2020-11-19 5 2020-11-20 6 2020-11-21 7 2020-11-22
"
361 | ],
362 | "text/latex": [
363 | "\\begin{tabular}{r|c}\n",
364 | "\t& date\\\\\n",
365 | "\t\\hline\n",
366 | "\t& Date\\\\\n",
367 | "\t\\hline\n",
368 | "\t1 & 2020-11-16 \\\\\n",
369 | "\t2 & 2020-11-17 \\\\\n",
370 | "\t3 & 2020-11-18 \\\\\n",
371 | "\t4 & 2020-11-19 \\\\\n",
372 | "\t5 & 2020-11-20 \\\\\n",
373 | "\t6 & 2020-11-21 \\\\\n",
374 | "\t7 & 2020-11-22 \\\\\n",
375 | "\\end{tabular}\n"
376 | ],
377 | "text/plain": [
378 | "\u001b[1m7×1 DataFrame\u001b[0m\n",
379 | "\u001b[1m Row \u001b[0m│\u001b[1m date \u001b[0m\n",
380 | " │\u001b[90m Date \u001b[0m\n",
381 | "─────┼────────────\n",
382 | " 1 │ 2020-11-16\n",
383 | " 2 │ 2020-11-17\n",
384 | " 3 │ 2020-11-18\n",
385 | " 4 │ 2020-11-19\n",
386 | " 5 │ 2020-11-20\n",
387 | " 6 │ 2020-11-21\n",
388 | " 7 │ 2020-11-22"
389 | ]
390 | },
391 | "execution_count": 12,
392 | "metadata": {},
393 | "output_type": "execute_result"
394 | }
395 | ],
396 | "source": [
397 | "all_days = DataFrame(date=Date.(2020,11, 16:22))"
398 | ]
399 | },
400 | {
401 | "cell_type": "code",
402 | "execution_count": 13,
403 | "metadata": {},
404 | "outputs": [
405 | {
406 | "data": {
407 | "text/html": [
408 | "1 2020-11-16 2.9 mm 3.9 mm 2 2020-11-17 4.1 mm 0.0 mm 3 2020-11-19 4.3 mm 1.2 mm 4 2020-11-20 2.0 mm 2.0 mm 5 2020-11-21 0.6 mm 0.0 mm 6 2020-11-22 1.0 mm 2.0 mm 7 2020-11-18 0.0 mm 0.0 mm
"
409 | ],
410 | "text/latex": [
411 | "\\begin{tabular}{r|ccc}\n",
412 | "\t& date & Olecko & Ełk\\\\\n",
413 | "\t\\hline\n",
414 | "\t& Date & Quantity… & Quantity…\\\\\n",
415 | "\t\\hline\n",
416 | "\t1 & 2020-11-16 & 2.9 mm & 3.9 mm \\\\\n",
417 | "\t2 & 2020-11-17 & 4.1 mm & 0.0 mm \\\\\n",
418 | "\t3 & 2020-11-19 & 4.3 mm & 1.2 mm \\\\\n",
419 | "\t4 & 2020-11-20 & 2.0 mm & 2.0 mm \\\\\n",
420 | "\t5 & 2020-11-21 & 0.6 mm & 0.0 mm \\\\\n",
421 | "\t6 & 2020-11-22 & 1.0 mm & 2.0 mm \\\\\n",
422 | "\t7 & 2020-11-18 & 0.0 mm & 0.0 mm \\\\\n",
423 | "\\end{tabular}\n"
424 | ],
425 | "text/plain": [
426 | "\u001b[1m7×3 DataFrame\u001b[0m\n",
427 | "\u001b[1m Row \u001b[0m│\u001b[1m date \u001b[0m\u001b[1m Olecko \u001b[0m\u001b[1m Ełk \u001b[0m\n",
428 | " │\u001b[90m Date \u001b[0m\u001b[90m Quantity… \u001b[0m\u001b[90m Quantity… \u001b[0m\n",
429 | "─────┼──────────────────────────────────\n",
430 | " 1 │ 2020-11-16 2.9 mm 3.9 mm\n",
431 | " 2 │ 2020-11-17 4.1 mm 0.0 mm\n",
432 | " 3 │ 2020-11-19 4.3 mm 1.2 mm\n",
433 | " 4 │ 2020-11-20 2.0 mm 2.0 mm\n",
434 | " 5 │ 2020-11-21 0.6 mm 0.0 mm\n",
435 | " 6 │ 2020-11-22 1.0 mm 2.0 mm\n",
436 | " 7 │ 2020-11-18 0.0 mm 0.0 mm"
437 | ]
438 | },
439 | "execution_count": 13,
440 | "metadata": {},
441 | "output_type": "execute_result"
442 | }
443 | ],
444 | "source": [
445 | "@pipe leftjoin(all_days, rainfall_wide, on=:date) |>\n",
446 | " coalesce.(_, 0.0u\"mm\")"
447 | ]
448 | },
449 | {
450 | "cell_type": "markdown",
451 | "metadata": {},
452 | "source": [
453 | "Note that we additionally used a broadcasted `coalesce` operation on the whole data frame returned from `leftjoin` to replace all `missing` values by `0.0u\"mm\"` in it, as in this case `missing` meant that there is no rain forecasted for that day.\n",
454 | "\n",
455 | "This was safe to do here, as we knew that the `:date` column does not contain missing values. In particular, note that `leftjoin` would error by default if we tried to perform a join on a column that contains `missing` values (use the `matchmissing` keyword argument in joins to change this behavior)."
456 | ]
457 | },
458 | {
459 | "cell_type": "markdown",
460 | "metadata": {},
461 | "source": [
462 | "### Conclusions"
463 | ]
464 | },
465 | {
466 | "cell_type": "markdown",
467 | "metadata": {},
468 | "source": [
469 | "Before we finish let us summarize the major functions that DataFrames.jl provides:\n",
470 | "1. A data frame is a matrix-like data structure. You can index it just like a matrix. The differences are:\n",
471 | " - you can use strings or `Symbol`s to select columns\n",
472 | " - if you use `!` as the row selector, you get the whole column of the data frame without copying\n",
473 | "2. You can quickly summarize the contents of a data frame using the `describe` function\n",
474 | "3. You can add rows to a data frame in-place using `push!` (similarly `append!` allows you to add multiple rows at the same time) (also `repeat`/`repeat!`, `hcat` and `vcat` are provided)\n",
475 | "4. You can work on a grouped data frame that is created using the `groupby` function. It is a view and works as-if you have created a lookup index to a data frame.\n",
476 | "5. There are `select`/`select!`/`transform`/`transform!`/`combine` functions that allow you to quickly transform/aggregate columns of a data frame or grouped data frame; there are also `mapcols`/`mapcols!` functions for quick aggregation of columns of a data frame\n",
477 | "6. You can filter rows of a data frame using `filter` and `filter!` functions (also `subset` and `subset!` starting from version 1.0)\n",
478 | "7. Use `sort` and `sort!` functions to sort data frames\n",
479 | "8. You can join multiple data frames using `innerjoin`, `outerjoin`, `leftjoin`, `rightjoin`, `semijoin`, `antijoin`, and `crossjoin` functions (they work as you would expect them if you know SQL)\n",
480 | "9. If you want to iterate rows or columns of a data frame use `eachrow` and `eachcol` functions (we have not discussed them, but they work exactly like in Julia Base)\n",
481 | "10. You can change names of columns in a data frame using `rename` and `rename!` functions; to get names of columns of a data frame use `names` (strings) or `propertynames` (`Symbol`s)\n",
482 | "11. To get number of rows and columns of a data frame use `nrow` and `ncol` functions\n",
483 | "12. To flatten nested columns of a data frame use `flatten`\n",
484 | "13. You can easily allow/disallow missing values in columns of a data frame using `allowmissing`/`allowmissing!`/`disallowmissing`/`disallowmissing!` functions\n",
485 | "14. You can drop rows with missing data with `dropmissing`/`dropmissing!` functions\n",
486 | "15. You can switch between [long and wide](https://en.wikipedia.org/wiki/Wide_and_narrow_data) representation of a data frame using `stack` and `unstack`"
487 | ]
488 | },
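{
"cell_type": "markdown",
"metadata": {},
"source": [
"For example, the difference between `!` and `:` as a row selector mentioned in point 1 can be checked directly (a sketch on a made-up data frame `toy`):\n",
"\n",
"```julia\n",
"using DataFrames\n",
"\n",
"toy = DataFrame(a=[1, 2, 3])\n",
"\n",
"nocopy = toy[!, :a]   # the column vector stored in the data frame\n",
"copied = toy[:, :a]   # a freshly allocated copy\n",
"\n",
"nocopy === toy.a      # true: the very same object\n",
"copied === toy.a      # false: equal values, but a different object\n",
"```\n",
"\n",
"Mutating `nocopy` therefore changes `toy`, while mutating `copied` does not."
]
},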
489 | {
490 | "cell_type": "markdown",
491 | "metadata": {},
492 | "source": [
493 | "Additionally we have covered `freqtable` from FreqTables.jl, `@pipe` from Pipe.jl, and `lm` from GLM.jl packages that are often useful when wrangling data.\n",
494 | "\n",
495 | "You can use many formats to store and read data frames, we have discussed CSV.jl and Arrow.jl packages that provide such functionality.\n",
496 | "\n",
497 | "Finally we have shown how to integrate DataFrames.jl with plotting using PyPlot.jl and with physical units using Unitful.jl."
498 | ]
499 | },
500 | {
501 | "cell_type": "markdown",
502 | "metadata": {},
503 | "source": [
504 | "Of course this course was just an introduction.\n",
505 | "\n",
506 | "You can find reviews of functionality of DataFrames.jl in:\n",
507 | "* an official manual at https://juliadata.github.io/DataFrames.jl/stable/\n",
508 | "* a tutorial going through all functionalities of DataFrames.jl at https://github.com/bkamins/Julia-DataFrames-Tutorial\n",
509 | "* documentation strings of the respective functions"
510 | ]
511 | }
512 | ],
513 | "metadata": {
514 | "kernelspec": {
515 | "display_name": "Julia 1.9.0",
516 | "language": "julia",
517 | "name": "julia-1.9"
518 | },
519 | "language_info": {
520 | "file_extension": ".jl",
521 | "mimetype": "application/julia",
522 | "name": "julia",
523 | "version": "1.9.0"
524 | }
525 | },
526 | "nbformat": 4,
527 | "nbformat_minor": 4
528 | }
529 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2020 JuliaAcademy
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/Manifest.toml:
--------------------------------------------------------------------------------
1 | # This file is machine-generated - editing it directly is not advised
2 |
3 | julia_version = "1.9.0"
4 | manifest_format = "2.0"
5 | project_hash = "f880972f100046709b5d5afa29c4329f1d88c4f4"
6 |
7 | [[deps.ArgTools]]
8 | uuid = "0dad84c5-d112-42e6-8d28-ef12dabb789f"
9 | version = "1.1.1"
10 |
11 | [[deps.Arrow]]
12 | deps = ["ArrowTypes", "BitIntegers", "CodecLz4", "CodecZstd", "DataAPI", "Dates", "LoggingExtras", "Mmap", "PooledArrays", "SentinelArrays", "Tables", "TimeZones", "UUIDs", "WorkerUtilities"]
13 | git-tree-sha1 = "4e40f4868281b7fd702c605c764ab82a52ac3f4b"
14 | uuid = "69666777-d1a9-59fb-9406-91d4454c9d45"
15 | version = "2.4.3"
16 |
17 | [[deps.ArrowTypes]]
18 | deps = ["UUIDs"]
19 | git-tree-sha1 = "563d60f89fcb730668bd568ba3e752ee71dde023"
20 | uuid = "31f734f8-188a-4ce0-8406-c8a06bd891cd"
21 | version = "2.0.2"
22 |
23 | [[deps.Artifacts]]
24 | uuid = "56f22d72-fd6d-98f1-02f0-08ddc0907c33"
25 |
26 | [[deps.Base64]]
27 | uuid = "2a0f44e3-6c83-55bd-87e4-b1978d98bd5f"
28 |
29 | [[deps.BitIntegers]]
30 | deps = ["Random"]
31 | git-tree-sha1 = "fc54d5837033a170f3bad307f993e156eefc345f"
32 | uuid = "c3b6d118-76ef-56ca-8cc7-ebb389d030a1"
33 | version = "0.2.7"
34 |
35 | [[deps.CEnum]]
36 | git-tree-sha1 = "eb4cb44a499229b3b8426dcfb5dd85333951ff90"
37 | uuid = "fa961155-64e5-5f13-b03f-caf6b980ea82"
38 | version = "0.4.2"
39 |
40 | [[deps.CSV]]
41 | deps = ["CodecZlib", "Dates", "FilePathsBase", "InlineStrings", "Mmap", "Parsers", "PooledArrays", "SentinelArrays", "SnoopPrecompile", "Tables", "Unicode", "WeakRefStrings", "WorkerUtilities"]
42 | git-tree-sha1 = "c700cce799b51c9045473de751e9319bdd1c6e94"
43 | uuid = "336ed68f-0bac-5ca0-87d4-7b16caf5d00b"
44 | version = "0.10.9"
45 |
46 | [[deps.Calculus]]
47 | deps = ["LinearAlgebra"]
48 | git-tree-sha1 = "f641eb0a4f00c343bbc32346e1217b86f3ce9dad"
49 | uuid = "49dc2e85-a5d0-5ad3-a950-438e2897f1b9"
50 | version = "0.5.1"
51 |
52 | [[deps.CategoricalArrays]]
53 | deps = ["DataAPI", "Future", "Missings", "Printf", "Requires", "Statistics", "Unicode"]
54 | git-tree-sha1 = "5084cc1a28976dd1642c9f337b28a3cb03e0f7d2"
55 | uuid = "324d7699-5711-5eae-9e2f-1d82baa6b597"
56 | version = "0.10.7"
57 |
58 | [[deps.ChainRulesCore]]
59 | deps = ["Compat", "LinearAlgebra", "SparseArrays"]
60 | git-tree-sha1 = "c6d890a52d2c4d55d326439580c3b8d0875a77d9"
61 | uuid = "d360d2e6-b24c-11e9-a2a3-2a2ae2dbcce4"
62 | version = "1.15.7"
63 |
64 | [[deps.CodecLz4]]
65 | deps = ["Lz4_jll", "TranscodingStreams"]
66 | git-tree-sha1 = "59fe0cb37784288d6b9f1baebddbf75457395d40"
67 | uuid = "5ba52731-8f18-5e0d-9241-30f10d1ec561"
68 | version = "0.4.0"
69 |
70 | [[deps.CodecZlib]]
71 | deps = ["TranscodingStreams", "Zlib_jll"]
72 | git-tree-sha1 = "9c209fb7536406834aa938fb149964b985de6c83"
73 | uuid = "944b1d66-785c-5afd-91f1-9de20f533193"
74 | version = "0.7.1"
75 |
76 | [[deps.CodecZstd]]
77 | deps = ["CEnum", "TranscodingStreams", "Zstd_jll"]
78 | git-tree-sha1 = "849470b337d0fa8449c21061de922386f32949d9"
79 | uuid = "6b39b394-51ab-5f42-8807-6242bab2b4c2"
80 | version = "0.7.2"
81 |
82 | [[deps.ColorTypes]]
83 | deps = ["FixedPointNumbers", "Random"]
84 | git-tree-sha1 = "eb7f0f8307f71fac7c606984ea5fb2817275d6e4"
85 | uuid = "3da002f7-5984-5a60-b8a6-cbb66c0b333f"
86 | version = "0.11.4"
87 |
88 | [[deps.Colors]]
89 | deps = ["ColorTypes", "FixedPointNumbers", "Reexport"]
90 | git-tree-sha1 = "fc08e5930ee9a4e03f84bfb5211cb54e7769758a"
91 | uuid = "5ae59095-9a9b-59fe-a467-6f913c188581"
92 | version = "0.12.10"
93 |
94 | [[deps.Combinatorics]]
95 | git-tree-sha1 = "08c8b6831dc00bfea825826be0bc8336fc369860"
96 | uuid = "861a8166-3701-5b0c-9a16-15d98fcdc6aa"
97 | version = "1.0.2"
98 |
99 | [[deps.Compat]]
100 | deps = ["Dates", "LinearAlgebra", "UUIDs"]
101 | git-tree-sha1 = "61fdd77467a5c3ad071ef8277ac6bd6af7dd4c04"
102 | uuid = "34da2185-b29b-5c13-b0c7-acf172513d20"
103 | version = "4.6.0"
104 |
105 | [[deps.CompilerSupportLibraries_jll]]
106 | deps = ["Artifacts", "Libdl"]
107 | uuid = "e66e0078-7015-5450-92f7-15fbd957f2ae"
108 | version = "1.0.2+0"
109 |
110 | [[deps.Conda]]
111 | deps = ["Downloads", "JSON", "VersionParsing"]
112 | git-tree-sha1 = "e32a90da027ca45d84678b826fffd3110bb3fc90"
113 | uuid = "8f4d0f93-b110-5947-807f-2305c1781a2d"
114 | version = "1.8.0"
115 |
116 | [[deps.ConstructionBase]]
117 | deps = ["LinearAlgebra"]
118 | git-tree-sha1 = "fb21ddd70a051d882a1686a5a550990bbe371a95"
119 | uuid = "187b0558-2788-49d3-abe0-74a17ed4e7c9"
120 | version = "1.4.1"
121 |
122 | [[deps.Crayons]]
123 | git-tree-sha1 = "249fe38abf76d48563e2f4556bebd215aa317e15"
124 | uuid = "a8cc5b0e-0ffa-5ad4-8c14-923d3ee1735f"
125 | version = "4.1.1"
126 |
127 | [[deps.DataAPI]]
128 | git-tree-sha1 = "e8119c1a33d267e16108be441a287a6981ba1630"
129 | uuid = "9a962f9c-6df0-11e9-0e5d-c546b8b5ee8a"
130 | version = "1.14.0"
131 |
132 | [[deps.DataFrames]]
133 | deps = ["Compat", "DataAPI", "Future", "InlineStrings", "InvertedIndices", "IteratorInterfaceExtensions", "LinearAlgebra", "Markdown", "Missings", "PooledArrays", "PrettyTables", "Printf", "REPL", "Random", "Reexport", "SentinelArrays", "SnoopPrecompile", "SortingAlgorithms", "Statistics", "TableTraits", "Tables", "Unicode"]
134 | git-tree-sha1 = "aa51303df86f8626a962fccb878430cdb0a97eee"
135 | uuid = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
136 | version = "1.5.0"
137 |
138 | [[deps.DataStructures]]
139 | deps = ["Compat", "InteractiveUtils", "OrderedCollections"]
140 | git-tree-sha1 = "d1fff3a548102f48987a52a2e0d114fa97d730f0"
141 | uuid = "864edb3b-99cc-5e75-8d2d-829cb0a9cfe8"
142 | version = "0.18.13"
143 |
144 | [[deps.DataValueInterfaces]]
145 | git-tree-sha1 = "bfc1187b79289637fa0ef6d4436ebdfe6905cbd6"
146 | uuid = "e2d170a0-9d28-54be-80f0-106bbe20a464"
147 | version = "1.0.0"
148 |
149 | [[deps.Dates]]
150 | deps = ["Printf"]
151 | uuid = "ade2ca70-3891-5945-98fb-dc099432e06a"
152 |
153 | [[deps.DelimitedFiles]]
154 | deps = ["Mmap"]
155 | git-tree-sha1 = "9e2f36d3c96a820c678f2f1f1782582fcf685bae"
156 | uuid = "8bb1440f-4735-579b-a4ab-409b98df4dab"
157 | version = "1.9.1"
158 |
159 | [[deps.DensityInterface]]
160 | deps = ["InverseFunctions", "Test"]
161 | git-tree-sha1 = "80c3e8639e3353e5d2912fb3a1916b8455e2494b"
162 | uuid = "b429d917-457f-4dbc-8f4c-0cc954292b1d"
163 | version = "0.4.0"
164 |
165 | [[deps.Distributions]]
166 | deps = ["ChainRulesCore", "DensityInterface", "FillArrays", "LinearAlgebra", "PDMats", "Printf", "QuadGK", "Random", "SparseArrays", "SpecialFunctions", "Statistics", "StatsBase", "StatsFuns", "Test"]
167 | git-tree-sha1 = "74911ad88921455c6afcad1eefa12bd7b1724631"
168 | uuid = "31c24e10-a181-5473-b8eb-7969acd0382f"
169 | version = "0.25.80"
170 |
171 | [[deps.DocStringExtensions]]
172 | deps = ["LibGit2"]
173 | git-tree-sha1 = "2fb1e02f2b635d0845df5d7c167fec4dd739b00d"
174 | uuid = "ffbed154-4ef7-542d-bbb7-c09d3a79fcae"
175 | version = "0.9.3"
176 |
177 | [[deps.Downloads]]
178 | deps = ["ArgTools", "FileWatching", "LibCURL", "NetworkOptions"]
179 | uuid = "f43a241f-c20a-4ad4-852c-f6b1247861c6"
180 | version = "1.6.0"
181 |
182 | [[deps.DualNumbers]]
183 | deps = ["Calculus", "NaNMath", "SpecialFunctions"]
184 | git-tree-sha1 = "5837a837389fccf076445fce071c8ddaea35a566"
185 | uuid = "fa6b7ba4-c1ee-5f82-b5fc-ecf0adba8f74"
186 | version = "0.6.8"
187 |
188 | [[deps.ExprTools]]
189 | git-tree-sha1 = "56559bbef6ca5ea0c0818fa5c90320398a6fbf8d"
190 | uuid = "e2ba6199-217a-4e67-a87a-7c52f15ade04"
191 | version = "0.1.8"
192 |
193 | [[deps.FilePathsBase]]
194 | deps = ["Compat", "Dates", "Mmap", "Printf", "Test", "UUIDs"]
195 | git-tree-sha1 = "e27c4ebe80e8699540f2d6c805cc12203b614f12"
196 | uuid = "48062228-2e41-5def-b9a4-89aafe57970f"
197 | version = "0.9.20"
198 |
199 | [[deps.FileWatching]]
200 | uuid = "7b1f6079-737a-58dc-b8bc-7a2ca5c1b5ee"
201 |
202 | [[deps.FillArrays]]
203 | deps = ["LinearAlgebra", "Random", "SparseArrays", "Statistics"]
204 | git-tree-sha1 = "d3ba08ab64bdfd27234d3f61956c966266757fe6"
205 | uuid = "1a297f60-69ca-5386-bcde-b61e274b549b"
206 | version = "0.13.7"
207 |
208 | [[deps.FixedPointNumbers]]
209 | deps = ["Statistics"]
210 | git-tree-sha1 = "335bfdceacc84c5cdf16aadc768aa5ddfc5383cc"
211 | uuid = "53c48c17-4a7d-5ca2-90c5-79b7896eea93"
212 | version = "0.8.4"
213 |
214 | [[deps.Formatting]]
215 | deps = ["Printf"]
216 | git-tree-sha1 = "8339d61043228fdd3eb658d86c926cb282ae72a8"
217 | uuid = "59287772-0a20-5a39-b81b-1366585eb4c0"
218 | version = "0.4.2"
219 |
220 | [[deps.FreqTables]]
221 | deps = ["CategoricalArrays", "Missings", "NamedArrays", "Tables"]
222 | git-tree-sha1 = "488ad2dab30fd2727ee65451f790c81ed454666d"
223 | uuid = "da1fdf0e-e0ff-5433-a45f-9bb5ff651cb1"
224 | version = "0.4.5"
225 |
226 | [[deps.Future]]
227 | deps = ["Random"]
228 | uuid = "9fa8497b-333b-5362-9e8d-4d0656e87820"
229 |
230 | [[deps.GLM]]
231 | deps = ["Distributions", "LinearAlgebra", "Printf", "Reexport", "SparseArrays", "SpecialFunctions", "Statistics", "StatsAPI", "StatsBase", "StatsFuns", "StatsModels"]
232 | git-tree-sha1 = "884477b9886a52a84378275737e2823a5c98e349"
233 | uuid = "38e38edf-8417-5370-95a0-9cbb8c7f171a"
234 | version = "1.8.1"
235 |
236 | [[deps.HypergeometricFunctions]]
237 | deps = ["DualNumbers", "LinearAlgebra", "OpenLibm_jll", "SpecialFunctions", "Test"]
238 | git-tree-sha1 = "709d864e3ed6e3545230601f94e11ebc65994641"
239 | uuid = "34004b35-14d8-5ef3-9330-4cdb6864b03a"
240 | version = "0.3.11"
241 |
242 | [[deps.InlineStrings]]
243 | deps = ["Parsers"]
244 | git-tree-sha1 = "9cc2baf75c6d09f9da536ddf58eb2f29dedaf461"
245 | uuid = "842dd82b-1e85-43dc-bf29-5d0ee9dffc48"
246 | version = "1.4.0"
247 |
248 | [[deps.InteractiveUtils]]
249 | deps = ["Markdown"]
250 | uuid = "b77e0a4c-d291-57a0-90e8-8db25a27a240"
251 |
252 | [[deps.InverseFunctions]]
253 | deps = ["Test"]
254 | git-tree-sha1 = "49510dfcb407e572524ba94aeae2fced1f3feb0f"
255 | uuid = "3587e190-3f89-42d0-90ee-14403ec27112"
256 | version = "0.1.8"
257 |
258 | [[deps.InvertedIndices]]
259 | git-tree-sha1 = "82aec7a3dd64f4d9584659dc0b62ef7db2ef3e19"
260 | uuid = "41ab1584-1d38-5bbf-9106-f11c6c58b48f"
261 | version = "1.2.0"
262 |
263 | [[deps.IrrationalConstants]]
264 | git-tree-sha1 = "7fd44fd4ff43fc60815f8e764c0f352b83c49151"
265 | uuid = "92d709cd-6900-40b7-9082-c6be49f344b6"
266 | version = "0.1.1"
267 |
268 | [[deps.IteratorInterfaceExtensions]]
269 | git-tree-sha1 = "a3f24677c21f5bbe9d2a714f95dcd58337fb2856"
270 | uuid = "82899510-4779-5014-852e-03e436cf321d"
271 | version = "1.0.0"
272 |
273 | [[deps.JLLWrappers]]
274 | deps = ["Preferences"]
275 | git-tree-sha1 = "abc9885a7ca2052a736a600f7fa66209f96506e1"
276 | uuid = "692b3bcd-3c85-4b1f-b108-f13ce0eb3210"
277 | version = "1.4.1"
278 |
279 | [[deps.JSON]]
280 | deps = ["Dates", "Mmap", "Parsers", "Unicode"]
281 | git-tree-sha1 = "3c837543ddb02250ef42f4738347454f95079d4e"
282 | uuid = "682c06a0-de6a-54ab-a142-c8b1cf79cde6"
283 | version = "0.21.3"
284 |
285 | [[deps.LaTeXStrings]]
286 | git-tree-sha1 = "f2355693d6778a178ade15952b7ac47a4ff97996"
287 | uuid = "b964fa9f-0449-5b57-a5c2-d3ea65f4040f"
288 | version = "1.3.0"
289 |
290 | [[deps.LazyArtifacts]]
291 | deps = ["Artifacts", "Pkg"]
292 | uuid = "4af54fe1-eca0-43a8-85a7-787d91b784e3"
293 |
294 | [[deps.LibCURL]]
295 | deps = ["LibCURL_jll", "MozillaCACerts_jll"]
296 | uuid = "b27032c2-a3e7-50c8-80cd-2d36dbcbfd21"
297 | version = "0.6.3"
298 |
299 | [[deps.LibCURL_jll]]
300 | deps = ["Artifacts", "LibSSH2_jll", "Libdl", "MbedTLS_jll", "Zlib_jll", "nghttp2_jll"]
301 | uuid = "deac9b47-8bc7-5906-a0fe-35ac56dc84c0"
302 | version = "7.84.0+0"
303 |
304 | [[deps.LibGit2]]
305 | deps = ["Base64", "NetworkOptions", "Printf", "SHA"]
306 | uuid = "76f85450-5226-5b5a-8eaa-529ad045b433"
307 |
308 | [[deps.LibSSH2_jll]]
309 | deps = ["Artifacts", "Libdl", "MbedTLS_jll"]
310 | uuid = "29816b5a-b9ab-546f-933c-edad1886dfa8"
311 | version = "1.10.2+0"
312 |
313 | [[deps.Libdl]]
314 | uuid = "8f399da3-3557-5675-b5ff-fb832c97cbdb"
315 |
316 | [[deps.LinearAlgebra]]
317 | deps = ["Libdl", "OpenBLAS_jll", "libblastrampoline_jll"]
318 | uuid = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
319 |
320 | [[deps.LogExpFunctions]]
321 | deps = ["DocStringExtensions", "IrrationalConstants", "LinearAlgebra"]
322 | git-tree-sha1 = "680e733c3a0a9cea9e935c8c2184aea6a63fa0b5"
323 | uuid = "2ab3a3ac-af41-5b50-aa03-7779005ae688"
324 | version = "0.3.21"
325 |
326 | [deps.LogExpFunctions.extensions]
327 | ChainRulesCoreExt = "ChainRulesCore"
328 | ChangesOfVariablesExt = "ChangesOfVariables"
329 | InverseFunctionsExt = "InverseFunctions"
330 |
331 | [deps.LogExpFunctions.weakdeps]
332 | ChainRulesCore = "d360d2e6-b24c-11e9-a2a3-2a2ae2dbcce4"
333 | ChangesOfVariables = "9e997f8a-9a97-42d5-a9f1-ce6bfc15e2c0"
334 | InverseFunctions = "3587e190-3f89-42d0-90ee-14403ec27112"
335 |
336 | [[deps.Logging]]
337 | uuid = "56ddb016-857b-54e1-b83d-db4d58db5568"
338 |
339 | [[deps.LoggingExtras]]
340 | deps = ["Dates", "Logging"]
341 | git-tree-sha1 = "cedb76b37bc5a6c702ade66be44f831fa23c681e"
342 | uuid = "e6f89c97-d47a-5376-807f-9c37f3926c36"
343 | version = "1.0.0"
344 |
345 | [[deps.Lz4_jll]]
346 | deps = ["Artifacts", "JLLWrappers", "Libdl", "Pkg"]
347 | git-tree-sha1 = "5d494bc6e85c4c9b626ee0cab05daa4085486ab1"
348 | uuid = "5ced341a-0733-55b8-9ab6-a4889d929147"
349 | version = "1.9.3+0"
350 |
351 | [[deps.MacroTools]]
352 | deps = ["Markdown", "Random"]
353 | git-tree-sha1 = "42324d08725e200c23d4dfb549e0d5d89dede2d2"
354 | uuid = "1914dd2f-81c6-5fcd-8719-6d5c9610ff09"
355 | version = "0.5.10"
356 |
357 | [[deps.Markdown]]
358 | deps = ["Base64"]
359 | uuid = "d6f4376e-aef5-505a-96c1-9c027394607a"
360 |
361 | [[deps.MbedTLS_jll]]
362 | deps = ["Artifacts", "Libdl"]
363 | uuid = "c8ffd9c3-330d-5841-b78e-0817d7145fa1"
364 | version = "2.28.0+0"
365 |
366 | [[deps.Missings]]
367 | deps = ["DataAPI"]
368 | git-tree-sha1 = "f66bdc5de519e8f8ae43bdc598782d35a25b1272"
369 | uuid = "e1d29d7a-bbdc-5cf2-9ac0-f12de2c33e28"
370 | version = "1.1.0"
371 |
372 | [[deps.Mmap]]
373 | uuid = "a63ad114-7e13-5084-954f-fe012c677804"
374 |
375 | [[deps.Mocking]]
376 | deps = ["Compat", "ExprTools"]
377 | git-tree-sha1 = "c272302b22479a24d1cf48c114ad702933414f80"
378 | uuid = "78c3b35d-d492-501b-9361-3d52fe80e533"
379 | version = "0.7.5"
380 |
381 | [[deps.MozillaCACerts_jll]]
382 | uuid = "14a3606d-f60d-562e-9121-12d972cd8159"
383 | version = "2022.10.11"
384 |
385 | [[deps.NaNMath]]
386 | deps = ["OpenLibm_jll"]
387 | git-tree-sha1 = "a7c3d1da1189a1c2fe843a3bfa04d18d20eb3211"
388 | uuid = "77ba4419-2d1f-58cd-9bb1-8ffee604a2e3"
389 | version = "1.0.1"
390 |
391 | [[deps.NamedArrays]]
392 | deps = ["Combinatorics", "DataStructures", "DelimitedFiles", "InvertedIndices", "LinearAlgebra", "Random", "Requires", "SparseArrays", "Statistics"]
393 | git-tree-sha1 = "2fd5787125d1a93fbe30961bd841707b8a80d75b"
394 | uuid = "86f7a689-2022-50b4-a561-43c23ac3c673"
395 | version = "0.9.6"
396 |
397 | [[deps.NetworkOptions]]
398 | uuid = "ca575930-c2e3-43a9-ace4-1e988b2c1908"
399 | version = "1.2.0"
400 |
401 | [[deps.OpenBLAS_jll]]
402 | deps = ["Artifacts", "CompilerSupportLibraries_jll", "Libdl"]
403 | uuid = "4536629a-c528-5b80-bd46-f80d51c5b363"
404 | version = "0.3.21+0"
405 |
406 | [[deps.OpenLibm_jll]]
407 | deps = ["Artifacts", "Libdl"]
408 | uuid = "05823500-19ac-5b8b-9628-191a04bc5112"
409 | version = "0.8.1+0"
410 |
411 | [[deps.OpenSpecFun_jll]]
412 | deps = ["Artifacts", "CompilerSupportLibraries_jll", "JLLWrappers", "Libdl", "Pkg"]
413 | git-tree-sha1 = "13652491f6856acfd2db29360e1bbcd4565d04f1"
414 | uuid = "efe28fd5-8261-553b-a9e1-b2916fc3738e"
415 | version = "0.5.5+0"
416 |
417 | [[deps.OrderedCollections]]
418 | git-tree-sha1 = "85f8e6578bf1f9ee0d11e7bb1b1456435479d47c"
419 | uuid = "bac558e1-5e72-5ebc-8fee-abe8a469f55d"
420 | version = "1.4.1"
421 |
422 | [[deps.PDMats]]
423 | deps = ["LinearAlgebra", "SparseArrays", "SuiteSparse"]
424 | git-tree-sha1 = "cf494dca75a69712a72b80bc48f59dcf3dea63ec"
425 | uuid = "90014a1f-27ba-587c-ab20-58faa44d9150"
426 | version = "0.11.16"
427 |
428 | [[deps.Parsers]]
429 | deps = ["Dates", "SnoopPrecompile"]
430 | git-tree-sha1 = "946b56b2135c6c10bbb93efad8a78b699b6383ab"
431 | uuid = "69de0a69-1ddd-5017-9359-2bf0b02dc9f0"
432 | version = "2.5.6"
433 |
434 | [[deps.Pipe]]
435 | git-tree-sha1 = "6842804e7867b115ca9de748a0cf6b364523c16d"
436 | uuid = "b98c9c47-44ae-5843-9183-064241ee97a0"
437 | version = "1.3.0"
438 |
439 | [[deps.Pkg]]
440 | deps = ["Artifacts", "Dates", "Downloads", "FileWatching", "LibGit2", "Libdl", "Logging", "Markdown", "Printf", "REPL", "Random", "SHA", "Serialization", "TOML", "Tar", "UUIDs", "p7zip_jll"]
441 | uuid = "44cfe95a-1eb2-52ea-b672-e2afdf69b78f"
442 | version = "1.9.0"
443 |
444 | [[deps.PooledArrays]]
445 | deps = ["DataAPI", "Future"]
446 | git-tree-sha1 = "a6062fe4063cdafe78f4a0a81cfffb89721b30e7"
447 | uuid = "2dfb63ee-cc39-5dd5-95bd-886bf059d720"
448 | version = "1.4.2"
449 |
450 | [[deps.Preferences]]
451 | deps = ["TOML"]
452 | git-tree-sha1 = "47e5f437cc0e7ef2ce8406ce1e7e24d44915f88d"
453 | uuid = "21216c6a-2e73-6563-6e65-726566657250"
454 | version = "1.3.0"
455 |
456 | [[deps.PrettyTables]]
457 | deps = ["Crayons", "Formatting", "LaTeXStrings", "Markdown", "Reexport", "StringManipulation", "Tables"]
458 | git-tree-sha1 = "96f6db03ab535bdb901300f88335257b0018689d"
459 | uuid = "08abe8d2-0d0c-5749-adfa-8a2ac140af0d"
460 | version = "2.2.2"
461 |
462 | [[deps.Printf]]
463 | deps = ["Unicode"]
464 | uuid = "de0858da-6303-5e67-8744-51eddeeeb8d7"
465 |
466 | [[deps.PyCall]]
467 | deps = ["Conda", "Dates", "Libdl", "LinearAlgebra", "MacroTools", "Serialization", "VersionParsing"]
468 | git-tree-sha1 = "62f417f6ad727987c755549e9cd88c46578da562"
469 | uuid = "438e738f-606a-5dbb-bf0a-cddfbfd45ab0"
470 | version = "1.95.1"
471 |
472 | [[deps.PyPlot]]
473 | deps = ["Colors", "LaTeXStrings", "PyCall", "Sockets", "Test", "VersionParsing"]
474 | git-tree-sha1 = "f9d953684d4d21e947cb6d642db18853d43cb027"
475 | uuid = "d330b81b-6aea-500a-939a-2ce795aea3ee"
476 | version = "2.11.0"
477 |
478 | [[deps.QuadGK]]
479 | deps = ["DataStructures", "LinearAlgebra"]
480 | git-tree-sha1 = "786efa36b7eff813723c4849c90456609cf06661"
481 | uuid = "1fd47b50-473d-5c70-9696-f719f8f3bcdc"
482 | version = "2.8.1"
483 |
484 | [[deps.REPL]]
485 | deps = ["InteractiveUtils", "Markdown", "Sockets", "Unicode"]
486 | uuid = "3fa0cd96-eef1-5676-8a61-b3b8758bbffb"
487 |
488 | [[deps.Random]]
489 | deps = ["SHA", "Serialization"]
490 | uuid = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
491 |
492 | [[deps.RecipesBase]]
493 | deps = ["SnoopPrecompile"]
494 | git-tree-sha1 = "261dddd3b862bd2c940cf6ca4d1c8fe593e457c8"
495 | uuid = "3cdcf5f2-1ef4-517c-9805-6587b60abb01"
496 | version = "1.3.3"
497 |
498 | [[deps.Reexport]]
499 | git-tree-sha1 = "45e428421666073eab6f2da5c9d310d99bb12f9b"
500 | uuid = "189a3867-3050-52da-a836-e630ba90ab69"
501 | version = "1.2.2"
502 |
503 | [[deps.Requires]]
504 | deps = ["UUIDs"]
505 | git-tree-sha1 = "838a3a4188e2ded87a4f9f184b4b0d78a1e91cb7"
506 | uuid = "ae029012-a4dd-5104-9daa-d747884805df"
507 | version = "1.3.0"
508 |
509 | [[deps.Rmath]]
510 | deps = ["Random", "Rmath_jll"]
511 | git-tree-sha1 = "f65dcb5fa46aee0cf9ed6274ccbd597adc49aa7b"
512 | uuid = "79098fc4-a85e-5d69-aa6a-4863f24498fa"
513 | version = "0.7.1"
514 |
515 | [[deps.Rmath_jll]]
516 | deps = ["Artifacts", "JLLWrappers", "Libdl", "Pkg"]
517 | git-tree-sha1 = "6ed52fdd3382cf21947b15e8870ac0ddbff736da"
518 | uuid = "f50d1b31-88e8-58de-be2c-1cc44531875f"
519 | version = "0.4.0+0"
520 |
521 | [[deps.SHA]]
522 | uuid = "ea8e919c-243c-51af-8825-aaa63cd721ce"
523 | version = "0.7.0"
524 |
525 | [[deps.Scratch]]
526 | deps = ["Dates"]
527 | git-tree-sha1 = "f94f779c94e58bf9ea243e77a37e16d9de9126bd"
528 | uuid = "6c6a2e73-6563-6170-7368-637461726353"
529 | version = "1.1.1"
530 |
531 | [[deps.SentinelArrays]]
532 | deps = ["Dates", "Random"]
533 | git-tree-sha1 = "c02bd3c9c3fc8463d3591a62a378f90d2d8ab0f3"
534 | uuid = "91c51154-3ec4-41a3-a24f-3f23e20d615c"
535 | version = "1.3.17"
536 |
537 | [[deps.Serialization]]
538 | uuid = "9e88b42a-f829-5b0c-bbe9-9e923198166b"
539 |
540 | [[deps.ShiftedArrays]]
541 | git-tree-sha1 = "503688b59397b3307443af35cd953a13e8005c16"
542 | uuid = "1277b4bf-5013-50f5-be3d-901d8477a67a"
543 | version = "2.0.0"
544 |
545 | [[deps.SnoopPrecompile]]
546 | deps = ["Preferences"]
547 | git-tree-sha1 = "e760a70afdcd461cf01a575947738d359234665c"
548 | uuid = "66db9d55-30c0-4569-8b51-7e840670fc0c"
549 | version = "1.0.3"
550 |
551 | [[deps.Sockets]]
552 | uuid = "6462fe0b-24de-5631-8697-dd941f90decc"
553 |
554 | [[deps.SortingAlgorithms]]
555 | deps = ["DataStructures"]
556 | git-tree-sha1 = "a4ada03f999bd01b3a25dcaa30b2d929fe537e00"
557 | uuid = "a2af1166-a08f-5f64-846c-94a0d3cef48c"
558 | version = "1.1.0"
559 |
560 | [[deps.SparseArrays]]
561 | deps = ["Libdl", "LinearAlgebra", "Random", "Serialization", "SuiteSparse_jll"]
562 | uuid = "2f01184e-e22b-5df5-ae63-d93ebab69eaf"
563 |
564 | [[deps.SpecialFunctions]]
565 | deps = ["ChainRulesCore", "IrrationalConstants", "LogExpFunctions", "OpenLibm_jll", "OpenSpecFun_jll"]
566 | git-tree-sha1 = "d75bda01f8c31ebb72df80a46c88b25d1c79c56d"
567 | uuid = "276daf66-3868-5448-9aa4-cd146d93841b"
568 | version = "2.1.7"
569 |
570 | [[deps.Statistics]]
571 | deps = ["LinearAlgebra", "SparseArrays"]
572 | uuid = "10745b16-79ce-11e8-11f9-7d13ad32a3b2"
573 | version = "1.9.0"
574 |
575 | [[deps.StatsAPI]]
576 | deps = ["LinearAlgebra"]
577 | git-tree-sha1 = "f9af7f195fb13589dd2e2d57fdb401717d2eb1f6"
578 | uuid = "82ae8749-77ed-4fe6-ae5f-f523153014b0"
579 | version = "1.5.0"
580 |
581 | [[deps.StatsBase]]
582 | deps = ["DataAPI", "DataStructures", "LinearAlgebra", "LogExpFunctions", "Missings", "Printf", "Random", "SortingAlgorithms", "SparseArrays", "Statistics", "StatsAPI"]
583 | git-tree-sha1 = "d1bf48bfcc554a3761a133fe3a9bb01488e06916"
584 | uuid = "2913bbd2-ae8a-5f71-8c99-4fb6c76f3a91"
585 | version = "0.33.21"
586 |
587 | [[deps.StatsFuns]]
588 | deps = ["ChainRulesCore", "HypergeometricFunctions", "InverseFunctions", "IrrationalConstants", "LogExpFunctions", "Reexport", "Rmath", "SpecialFunctions"]
589 | git-tree-sha1 = "ab6083f09b3e617e34a956b43e9d51b824206932"
590 | uuid = "4c63d2b9-4356-54db-8cca-17b64c39e42c"
591 | version = "1.1.1"
592 |
593 | [[deps.StatsModels]]
594 | deps = ["DataAPI", "DataStructures", "LinearAlgebra", "Printf", "REPL", "ShiftedArrays", "SparseArrays", "StatsBase", "StatsFuns", "Tables"]
595 | git-tree-sha1 = "a5e15f27abd2692ccb61a99e0854dfb7d48017db"
596 | uuid = "3eaba693-59b7-5ba5-a881-562e759f1c8d"
597 | version = "0.6.33"
598 |
599 | [[deps.StringManipulation]]
600 | git-tree-sha1 = "46da2434b41f41ac3594ee9816ce5541c6096123"
601 | uuid = "892a3eda-7b42-436c-8928-eab12a02cf0e"
602 | version = "0.3.0"
603 |
604 | [[deps.SuiteSparse]]
605 | deps = ["Libdl", "LinearAlgebra", "Serialization", "SparseArrays"]
606 | uuid = "4607b0f0-06f3-5cda-b6b1-a6196a1729e9"
607 |
608 | [[deps.SuiteSparse_jll]]
609 | deps = ["Artifacts", "Libdl", "Pkg", "libblastrampoline_jll"]
610 | uuid = "bea87d4a-7f5b-5778-9afe-8cc45184846c"
611 | version = "5.10.1+6"
612 |
613 | [[deps.TOML]]
614 | deps = ["Dates"]
615 | uuid = "fa267f1f-6049-4f14-aa54-33bafae1ed76"
616 | version = "1.0.3"
617 |
618 | [[deps.TableTraits]]
619 | deps = ["IteratorInterfaceExtensions"]
620 | git-tree-sha1 = "c06b2f539df1c6efa794486abfb6ed2022561a39"
621 | uuid = "3783bdb8-4a98-5b6b-af9a-565f29a5fe9c"
622 | version = "1.0.1"
623 |
624 | [[deps.Tables]]
625 | deps = ["DataAPI", "DataValueInterfaces", "IteratorInterfaceExtensions", "LinearAlgebra", "OrderedCollections", "TableTraits", "Test"]
626 | git-tree-sha1 = "c79322d36826aa2f4fd8ecfa96ddb47b174ac78d"
627 | uuid = "bd369af6-aec1-5ad0-b16a-f7cc5008161c"
628 | version = "1.10.0"
629 |
630 | [[deps.Tar]]
631 | deps = ["ArgTools", "SHA"]
632 | uuid = "a4e569a6-e804-4fa4-b0f3-eef7a1d5b13e"
633 | version = "1.10.0"
634 |
635 | [[deps.Test]]
636 | deps = ["InteractiveUtils", "Logging", "Random", "Serialization"]
637 | uuid = "8dfed614-e22c-5e08-85e1-65c5234f0b40"
638 |
639 | [[deps.TimeZones]]
640 | deps = ["Dates", "Downloads", "InlineStrings", "LazyArtifacts", "Mocking", "Printf", "RecipesBase", "Scratch", "Unicode"]
641 | git-tree-sha1 = "a92ec4466fc6e3dd704e2668b5e7f24add36d242"
642 | uuid = "f269a46b-ccf7-5d73-abea-4c690281aa53"
643 | version = "1.9.1"
644 |
645 | [[deps.TranscodingStreams]]
646 | deps = ["Random", "Test"]
647 | git-tree-sha1 = "94f38103c984f89cf77c402f2a68dbd870f8165f"
648 | uuid = "3bb67fe8-82b1-5028-8e26-92a6c54297fa"
649 | version = "0.9.11"
650 |
651 | [[deps.UUIDs]]
652 | deps = ["Random", "SHA"]
653 | uuid = "cf7118a7-6976-5b1a-9a39-7adc72f591a4"
654 |
655 | [[deps.Unicode]]
656 | uuid = "4ec0a83e-493e-50e2-b9ac-8f72acf5a8f5"
657 |
658 | [[deps.Unitful]]
659 | deps = ["ConstructionBase", "Dates", "LinearAlgebra", "Random"]
660 | git-tree-sha1 = "d3f95a76c89777990d3d968ded5ecf12f9a0ad72"
661 | uuid = "1986cc42-f94f-5a68-af5c-568840ba703d"
662 | version = "1.12.3"
663 |
664 | [[deps.VersionParsing]]
665 | git-tree-sha1 = "58d6e80b4ee071f5efd07fda82cb9fbe17200868"
666 | uuid = "81def892-9a0e-5fdd-b105-ffc91e053289"
667 | version = "1.3.0"
668 |
669 | [[deps.WeakRefStrings]]
670 | deps = ["DataAPI", "InlineStrings", "Parsers"]
671 | git-tree-sha1 = "b1be2855ed9ed8eac54e5caff2afcdb442d52c23"
672 | uuid = "ea10d353-3f73-51f8-a26c-33c1cb351aa5"
673 | version = "1.4.2"
674 |
675 | [[deps.WorkerUtilities]]
676 | git-tree-sha1 = "cd1659ba0d57b71a464a29e64dbc67cfe83d54e7"
677 | uuid = "76eceee3-57b5-4d4a-8e66-0e911cebbf60"
678 | version = "1.6.1"
679 |
680 | [[deps.Zlib_jll]]
681 | deps = ["Libdl"]
682 | uuid = "83775a58-1f1d-513f-b197-d71354ab007a"
683 | version = "1.2.13+0"
684 |
685 | [[deps.Zstd_jll]]
686 | deps = ["Artifacts", "JLLWrappers", "Libdl", "Pkg"]
687 | git-tree-sha1 = "e45044cd873ded54b6a5bac0eb5c971392cf1927"
688 | uuid = "3161d3a3-bdf6-5164-811a-617609db77b4"
689 | version = "1.5.2+0"
690 |
691 | [[deps.libblastrampoline_jll]]
692 | deps = ["Artifacts", "Libdl"]
693 | uuid = "8e850b90-86db-534c-a0d3-1478176c7d93"
694 | version = "5.4.0+0"
695 |
696 | [[deps.nghttp2_jll]]
697 | deps = ["Artifacts", "Libdl"]
698 | uuid = "8e850ede-7688-5339-a07c-302acd2aaf8d"
699 | version = "1.48.0+0"
700 |
701 | [[deps.p7zip_jll]]
702 | deps = ["Artifacts", "Libdl"]
703 | uuid = "3f19e933-33d8-53b3-aaab-bd5110c3b7a0"
704 | version = "17.4.0+0"
705 |
--------------------------------------------------------------------------------
/Project.toml:
--------------------------------------------------------------------------------
1 | [deps]
2 | Arrow = "69666777-d1a9-59fb-9406-91d4454c9d45"
3 | CSV = "336ed68f-0bac-5ca0-87d4-7b16caf5d00b"
4 | DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
5 | FreqTables = "da1fdf0e-e0ff-5433-a45f-9bb5ff651cb1"
6 | GLM = "38e38edf-8417-5370-95a0-9cbb8c7f171a"
7 | Pipe = "b98c9c47-44ae-5843-9183-064241ee97a0"
8 | PyPlot = "d330b81b-6aea-500a-939a-2ce795aea3ee"
9 | Unitful = "1986cc42-f94f-5a68-af5c-568840ba703d"
10 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # DataFrames.jl
2 |
3 | Welcome to DataFrames.jl with Bogumił Kamiński
4 |
5 | The material is organized in six notebooks, which are best studied in the
6 | specified sequence. To make sure that the notebooks work correctly,
7 | please download the whole repository and start Jupyter Notebook in the
8 | directory where the downloaded files are located.
9 |
10 | This tutorial is intended to be an introduction to the DataFrames.jl package and
11 | the related ecosystem. It is task-oriented, so it aims to show how available
12 | functionality can be used to solve typical data wrangling problems.
13 |
14 | If you are interested in in-depth coverage of all functions and their options
15 | provided by the DataFrames.jl package, please consult
16 | [the documentation](https://juliadata.github.io/DataFrames.jl/stable/) or a more
17 | advanced tutorial available
18 | [here](https://github.com/bkamins/Julia-DataFrames-Tutorial).
19 |
20 | This version of the tutorial was tested with Julia 1.9 and DataFrames.jl 1.5.
21 | However, you can check older commits and tags in this repository to get the
22 | versions for:
23 | * DataFrames.jl 0.22.0, 1.2.0, 1.3.0, 1.4.0.
24 |
--------------------------------------------------------------------------------
/rainfall_forecast.csv:
--------------------------------------------------------------------------------
1 | city,date,rainfall
2 | Olecko,2020-11-16,2.9
3 | Olecko,2020-11-17,4.1
4 | Olecko,2020-11-19,4.3
5 | Olecko,2020-11-20,2.0
6 | Olecko,2020-11-21,0.6
7 | Olecko,2020-11-22,1.0
8 | Ełk,2020-11-16,3.9
9 | Ełk,2020-11-19,1.2
10 | Ełk,2020-11-20,2.0
11 | Ełk,2020-11-22,2.0
12 |
--------------------------------------------------------------------------------