├── 1004588303.png ├── 1351541081.png ├── 2218687897.png ├── 2587122783.png ├── 2590211320.png ├── 2797619445.png ├── 3923620403.png ├── 4133943752.png ├── 4167443430.png ├── 517000181.png ├── 759701435.png ├── LICENCE.md ├── MLJLogo2.svg ├── Manifest.toml ├── Project.toml ├── README.md ├── Untitled.ipynb ├── apt.txt ├── assets ├── scitypes.drawio └── scitypes.png ├── data ├── horse.csv ├── house.csv ├── small.csv └── src │ ├── Manifest.toml │ ├── Project.toml │ ├── ames.csv │ ├── convert_ames.jl │ ├── convert_ames │ ├── Manifest.toml │ └── Project.toml │ ├── generate_horse.jl │ ├── get_king_county.jl │ └── reduced_ames.csv ├── environment.yml ├── exercise_6ci.png ├── exercise_7c.png ├── exercise_7c_2.png ├── exercise_7c_3.png ├── exercise_8c.png ├── gamma_sampler.png ├── iris_learning_curve.png ├── learning_curve.png ├── learning_curve2.png ├── methods.md ├── outline.md ├── setup.jl ├── stacking.png ├── tuning.png ├── tutorials.ipynb ├── tutorials.jl ├── tutorials.md ├── vecstack.png ├── wow.ipynb ├── wow.jl └── wow.md /1004588303.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ablaom/MachineLearningInJulia2020/552f98fbf012475d67cd29a72448ac7c476ea2c7/1004588303.png -------------------------------------------------------------------------------- /1351541081.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ablaom/MachineLearningInJulia2020/552f98fbf012475d67cd29a72448ac7c476ea2c7/1351541081.png -------------------------------------------------------------------------------- /2218687897.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ablaom/MachineLearningInJulia2020/552f98fbf012475d67cd29a72448ac7c476ea2c7/2218687897.png -------------------------------------------------------------------------------- /2587122783.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ablaom/MachineLearningInJulia2020/552f98fbf012475d67cd29a72448ac7c476ea2c7/2587122783.png -------------------------------------------------------------------------------- /2590211320.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ablaom/MachineLearningInJulia2020/552f98fbf012475d67cd29a72448ac7c476ea2c7/2590211320.png -------------------------------------------------------------------------------- /2797619445.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ablaom/MachineLearningInJulia2020/552f98fbf012475d67cd29a72448ac7c476ea2c7/2797619445.png -------------------------------------------------------------------------------- /3923620403.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ablaom/MachineLearningInJulia2020/552f98fbf012475d67cd29a72448ac7c476ea2c7/3923620403.png -------------------------------------------------------------------------------- /4133943752.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ablaom/MachineLearningInJulia2020/552f98fbf012475d67cd29a72448ac7c476ea2c7/4133943752.png -------------------------------------------------------------------------------- /4167443430.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/ablaom/MachineLearningInJulia2020/552f98fbf012475d67cd29a72448ac7c476ea2c7/4167443430.png -------------------------------------------------------------------------------- /517000181.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ablaom/MachineLearningInJulia2020/552f98fbf012475d67cd29a72448ac7c476ea2c7/517000181.png -------------------------------------------------------------------------------- /759701435.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ablaom/MachineLearningInJulia2020/552f98fbf012475d67cd29a72448ac7c476ea2c7/759701435.png -------------------------------------------------------------------------------- /LICENCE.md: -------------------------------------------------------------------------------- 1 | The MLJ.jl package is licensed under the MIT "Expat" License: 2 | 3 | > Copyright (c) 2020: Anthony Blaom 4 | 5 | > Permission is hereby granted, free of charge, to any person obtaining a copy 6 | > of this software and associated documentation files (the "Software"), to deal 7 | > in the Software without restriction, including without limitation the rights 8 | > to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | > copies of the Software, and to permit persons to whom the Software is 10 | > furnished to do so, subject to the following conditions: 11 | > 12 | > The above copyright notice and this permission notice shall be included in all 13 | > copies or substantial portions of the Software. 14 | > 15 | > THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | > IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | > FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | > AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | > LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | > OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | > SOFTWARE. 
22 | > 23 | -------------------------------------------------------------------------------- /MLJLogo2.svg: -------------------------------------------------------------------------------- 1 | 2 | 3 | 5 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 | 32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 | 41 | 42 | 43 | 44 | 45 | 46 | 47 | 48 | 51 | 52 | 53 | 54 | 55 | 56 | 57 | 58 | 59 | 60 | 61 | 62 | 63 | 64 | 65 | 66 | 67 | 68 | 69 | 70 | 71 | 72 | 73 | 74 | 75 | 76 | 77 | 78 | 79 | 80 | 81 | 82 | 83 | -------------------------------------------------------------------------------- /Project.toml: -------------------------------------------------------------------------------- 1 | [deps] 2 | CSV = "336ed68f-0bac-5ca0-87d4-7b16caf5d00b" 3 | CategoricalArrays = "324d7699-5711-5eae-9e2f-1d82baa6b597" 4 | ComputationalResources = "ed09eef8-17a6-5b46-8889-db040fac31e3" 5 | DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0" 6 | DecisionTree = "7806a523-6efd-50cb-b5f6-3fa6f1930dbb" 7 | Distributions = "31c24e10-a181-5473-b8eb-7969acd0382f" 8 | EvoTrees = "f6006082-12f8-11e9-0c9c-0d5d367ab1e5" 9 | Literate = "98b081ad-f1c9-55d3-8b20-4c87d4299306" 10 | MLJ = "add582a8-e3ab-11e8-2d5e-e98b27df1bc7" 11 | MLJBase = "a7f614a8-145f-11e9-1d2a-a57a1082229d" 12 | MLJClusteringInterface = "d354fa79-ed1c-40d4-88ef-b8c7bd1568af" 13 | MLJDecisionTreeInterface = "c6f25543-311c-4c74-83dc-3ea6d1015661" 14 | MLJFlux = "094fc8d1-fd35-5302-93ea-dabda2abf845" 15 | MLJLinearModels = "6ee0df7b-362f-4a72-a706-9e79364fb692" 16 | MLJModels = "d491faf4-2d78-11e9-2867-c94bc002c0b7" 17 | MLJMultivariateStatsInterface = "1b6a4a23-ba22-4f51-9698-8599985d3728" 18 | MLJScikitLearnInterface = "5ae90465-5518-4432-b9d2-8a1def2f0cab" 19 | NearestNeighborModels = "636a865e-7cf4-491e-846c-de09b730eb36" 20 | NearestNeighbors = "b8a86587-4115-5ab1-83bc-aa920d37bbce" 21 | Plots = "91a5bcdd-55d7-5caf-9e0b-520d859cae80" 22 | Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c" 23 | ScientificTypes = "321657f4-b219-11e9-178b-2701a2544e81" 24 | StatsBase = "2913bbd2-ae8a-5f71-8c99-4fb6c76f3a91" 25 | Tables = "bd369af6-aec1-5ad0-b16a-f7cc5008161c" 26 | UnicodePlots = "b8865327-cd53-5732-bb35-84acbb429228" 27 | 28 | [compat] 29 | julia = ">=1.6, <1.7" 30 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Machine Learning in Julia using MLJ, JuliaCon2020 2 | 3 | **Now updated for MLJ version 0.16 and Julia 1.6** 4 | 5 | But binder notebook will not work until [this binder issue](https://github.com/jupyterhub/binderhub/issues/1424) is resolved. 6 | 7 | Interactive tutorials for a workshop introducing the machine learning 8 | toolbox [MLJ](https://alan-turing-institute.github.io/MLJ.jl/stable/) (v0.14.4) 9 | 10 |
11 | [MLJ logo image stripped in this dump] 12 |
13 | 14 | These tutorials were prepared for use in a 3 1/2 hour online workshop 15 | at JuliaCon2020, recorded 16 | [here](https://www.youtube.com/watch?time_continue=27&v=qSWbCn170HU&feature=emb_title). Their 17 | main aim is to introduce the 18 | [MLJ](https://alan-turing-institute.github.io/MLJ.jl/stable/) machine 19 | learning toolbox to data scientists. 20 | 21 | Differences from the original resources are minor (main difference: 22 | `@load` now returns a type instead of an instance). However, if you 23 | wish to access resources precisely matching those used in the video, 24 | switch to the `JuliaCon2020` branch by clicking 25 | [here](https://github.com/ablaom/MachineLearningInJulia2020/tree/for-MLJ-version-0.16). 26 | 27 | **Future revisions** of these tutorials will appear [here](https://github.com/ablaom/MLJTutorial.jl). 28 | 29 | 30 | ### [Options for running the tutorials](#options-for-running-the-tutorials) 31 | 32 | ### [Non-interactive version](tutorials.md) 33 | 34 | ### Topics covered 35 | 36 | #### Basic 37 | 38 | - Part 1 - **Data Representation** 39 | 40 | - Part 2 - **Selecting, Training and Evaluating Models** 41 | 42 | - Part 3 - **Transformers and Pipelines** 43 | 44 | #### Advanced 45 | 46 | - Part 4 - **Tuning hyper-parameters** 47 | 48 | - Part 5 - **Advanced model composition** (as time permits) 49 | 50 | The tutorials include links to external resources and exercises with 51 | solutions. 52 | 53 | 54 | ## Options for running the tutorials 55 | 56 | ### 1. Plug-and-play 57 | 58 | Only recommended for users with little Julia experience or users having 59 | problems with the other options. 60 | 61 | Use this option if you have neither run a Julia/Jupyter notebook on your 62 | local machine before, nor used a Julia IDE to run a Julia script. 63 | 64 | 65 | #### Pros 66 | 67 | One 68 | [click](https://mybinder.org/v2/gh/ablaom/MachineLearningInJulia2020/master?filepath=tutorials.ipynb). No 69 | need to install anything on your local machine. 70 | 71 | 72 | #### Cons 73 | 74 | - The (automatic) setup can take a little while, sometimes over 15 75 | minutes (but you do get a static version of the notebook while it 76 | loads). 77 | 78 | - **You will have to start over** if: 79 | 80 | - The notebook drops your connection for some reason. 81 | - You are **inactive for ten minutes**. 82 | 83 | 84 | #### Instructions 85 | 86 | Click this button: [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/ablaom/MachineLearningInJulia2020/master?filepath=tutorials.ipynb) 87 | 88 | 89 | ### 2. Clone the repo and choose your preferred interface 90 | 91 | Assumes that you have a working installation of 92 | [Julia](https://julialang.org/downloads/) 1.3 or higher and that 93 | either: 94 | 95 | - You can run Julia/Jupyter notebooks on your local machine without problems; or 96 | 97 | - You are comfortable running Julia scripts from an IDE, such as [Juno](https://junolab.org) or [Emacs](https://github.com/JuliaEditorSupport/julia-emacs) (see [here](https://julialang.org) for a complete list). 
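If you would like to check your local setup against the bundled package environment before following the instructions below, here is a minimal sketch (illustrative only, not the tutorials' own setup code; the path assumes you cloned the repository into `MachineLearningInJulia2020/`):

```julia
# Activate the project environment shipped with the repository and install
# the dependencies pinned in its Project.toml/Manifest.toml.
using Pkg
Pkg.activate("MachineLearningInJulia2020")  # path to your local clone
Pkg.instantiate()                           # first run may take several minutes
```

Running this once from the Julia REPL means the notebooks and scripts resolve the same package versions the tutorials were prepared with.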
98 | 99 | 100 | #### Pros 101 | 102 | A more stable option. 103 | 104 | #### Cons 105 | 106 | You need to meet the above requirements. 107 | 108 | 109 | #### Instructions 110 | 111 | - Clone [this repository](https://github.com/ablaom/MachineLearningInJulia2020) 112 | 113 | - Change to your local repo directory "MachineLearningInJulia2020/" 114 | 115 | - Either run the Jupyter notebook called "tutorials.ipynb" from that 116 | directory (corresponding to [this file](tutorials.ipynb) on GitHub) 117 | or open "tutorials.jl" from that directory in your favourite IDE 118 | (corresponding to [this file](tutorials.jl) on GitHub). You cannot 119 | download these files individually - you need the whole directory. 120 | 121 | - **Immediately** evaluate the first two lines of code to activate the 122 | package environment and pre-load the packages, as this can take a 123 | few minutes. 124 | 125 | 126 | ## More about the tutorials 127 | 128 | - The tutorials focus on the *machine learning* part of the data 129 | science workflow, and less on exploratory data analysis and other 130 | conventional "data analytics" methodology 131 | 132 | - Here "machine learning" is meant in a broad sense, and is not 133 | restricted to so-called *deep learning* (neural networks) 134 | 135 | - The tutorials are crafted to rapidly familiarize the user with what 136 | MLJ can do and how to do it, and are not a substitute for a course 137 | on machine learning fundamentals. Examples do not necessarily 138 | represent best practice or the best solution to a problem. 139 | 140 | ## Binder notebook for stacking demo used in video 141 | 142 | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/ablaom/MachineLearningInJulia2020/386ce06766dc1d9d9a0197ec57738b732c1c5d23?filepath=wow.ipynb) 143 | 144 | -------------------------------------------------------------------------------- /Untitled.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 4, 6 | "metadata": {}, 7 | "outputs": [], 8 | "source": [ 9 | "using DataFrames, CategoricalArrays" 10 | ] 11 | }, 12 | { 13 | "cell_type": "code", 14 | "execution_count": 2, 15 | "metadata": {}, 16 | "outputs": [ 17 | { 18 | "data": { 19 | "text/plain": [ 20 | "12-element Array{Float64,1}:\n", 21 | " 1.0\n", 22 | " 2.0\n", 23 | " 3.0\n", 24 | " 4.0\n", 25 | " 5.0\n", 26 | " 6.0\n", 27 | " 7.0\n", 28 | " 8.0\n", 29 | " 9.0\n", 30 | " 10.0\n", 31 | " 11.0\n", 32 | " 12.0" 33 | ] 34 | }, 35 | "execution_count": 2, 36 | "metadata": {}, 37 | "output_type": "execute_result" 38 | } 39 | ], 40 | "source": [ 41 | "time = float.(1:12 )" 42 | ] 43 | }, 44 | { 45 | "cell_type": "code", 46 | "execution_count": 5, 47 | "metadata": {}, 48 | "outputs": [ 49 | { 50 | "data": { 51 | "text/plain": [ 52 | "4-element CategoricalArray{String,1,UInt32}:\n", 53 | " \"kitchen\"\n", 54 | " \"bathroom\"\n", 55 | " \"bedroom_1\"\n", 56 | " \"living_room\"" 57 | ] 58 | }, 59 | "execution_count": 5, 60 | "metadata": {}, 61 | "output_type": "execute_result" 62 | } 63 | ], 64 | "source": [ 65 | "room = categorical([\"kitchen\", \"bathroom\", \"bedroom_1\", \"living_room\"])" 66 | ] 67 | }, 68 | { 69 | "cell_type": "code", 70 | "execution_count": 6, 71 | "metadata": {}, 72 | "outputs": [ 73 | { 74 | "data": { 75 | "text/plain": [ 76 | "12-element CategoricalArray{String,1,UInt32}:\n", 77 | " \"kitchen\"\n", 78 | " \"bathroom\"\n", 79 | " \"bedroom_1\"\n", 80 | " \"living_room\"\n", 81 | " 
\"kitchen\"\n", 82 | " \"bathroom\"\n", 83 | " \"bedroom_1\"\n", 84 | " \"living_room\"\n", 85 | " \"kitchen\"\n", 86 | " \"bathroom\"\n", 87 | " \"bedroom_1\"\n", 88 | " \"living_room\"" 89 | ] 90 | }, 91 | "execution_count": 6, 92 | "metadata": {}, 93 | "output_type": "execute_result" 94 | } 95 | ], 96 | "source": [ 97 | "room = vcat(room, room, room)" 98 | ] 99 | }, 100 | { 101 | "cell_type": "code", 102 | "execution_count": 7, 103 | "metadata": {}, 104 | "outputs": [ 105 | { 106 | "data": { 107 | "text/plain": [ 108 | "1×12 Array{Int64,2}:\n", 109 | " 5 5 5 5 6 6 6 6 7 7 7 7" 110 | ] 111 | }, 112 | "execution_count": 7, 113 | "metadata": {}, 114 | "output_type": "execute_result" 115 | } 116 | ], 117 | "source": [ 118 | "time = [5 5 5 5 6 6 6 6 7 7 7 7]" 119 | ] 120 | }, 121 | { 122 | "cell_type": "code", 123 | "execution_count": 11, 124 | "metadata": {}, 125 | "outputs": [ 126 | { 127 | "data": { 128 | "text/plain": [ 129 | "12-element Array{Int64,1}:\n", 130 | " 5\n", 131 | " 5\n", 132 | " 5\n", 133 | " 5\n", 134 | " 6\n", 135 | " 6\n", 136 | " 6\n", 137 | " 6\n", 138 | " 7\n", 139 | " 7\n", 140 | " 7\n", 141 | " 7" 142 | ] 143 | }, 144 | "execution_count": 11, 145 | "metadata": {}, 146 | "output_type": "execute_result" 147 | } 148 | ], 149 | "source": [ 150 | "time =reshape(time, (12,))" 151 | ] 152 | }, 153 | { 154 | "cell_type": "code", 155 | "execution_count": 12, 156 | "metadata": {}, 157 | "outputs": [ 158 | { 159 | "data": { 160 | "text/html": [ 161 | "

[stripped HTML table rendering of the 12×2 DataFrame (columns: time, room); the same table appears in the text/plain output below]
" 162 | ], 163 | "text/latex": [ 164 | "\\begin{tabular}{r|cc}\n", 165 | "\t& time & room\\\\\n", 166 | "\t\\hline\n", 167 | "\t& Int64 & Cat…\\\\\n", 168 | "\t\\hline\n", 169 | "\t1 & 5 & kitchen \\\\\n", 170 | "\t2 & 5 & bathroom \\\\\n", 171 | "\t3 & 5 & bedroom\\_1 \\\\\n", 172 | "\t4 & 5 & living\\_room \\\\\n", 173 | "\t5 & 6 & kitchen \\\\\n", 174 | "\t6 & 6 & bathroom \\\\\n", 175 | "\t7 & 6 & bedroom\\_1 \\\\\n", 176 | "\t8 & 6 & living\\_room \\\\\n", 177 | "\t9 & 7 & kitchen \\\\\n", 178 | "\t10 & 7 & bathroom \\\\\n", 179 | "\t11 & 7 & bedroom\\_1 \\\\\n", 180 | "\t12 & 7 & living\\_room \\\\\n", 181 | "\\end{tabular}\n" 182 | ], 183 | "text/plain": [ 184 | "12×2 DataFrame\n", 185 | "│ Row │ time │ room │\n", 186 | "│ │ \u001b[90mInt64\u001b[39m │ \u001b[90mCat…\u001b[39m │\n", 187 | "├─────┼───────┼─────────────┤\n", 188 | "│ 1 │ 5 │ kitchen │\n", 189 | "│ 2 │ 5 │ bathroom │\n", 190 | "│ 3 │ 5 │ bedroom_1 │\n", 191 | "│ 4 │ 5 │ living_room │\n", 192 | "│ 5 │ 6 │ kitchen │\n", 193 | "│ 6 │ 6 │ bathroom │\n", 194 | "│ 7 │ 6 │ bedroom_1 │\n", 195 | "│ 8 │ 6 │ living_room │\n", 196 | "│ 9 │ 7 │ kitchen │\n", 197 | "│ 10 │ 7 │ bathroom │\n", 198 | "│ 11 │ 7 │ bedroom_1 │\n", 199 | "│ 12 │ 7 │ living_room │" 200 | ] 201 | }, 202 | "execution_count": 12, 203 | "metadata": {}, 204 | "output_type": "execute_result" 205 | } 206 | ], 207 | "source": [ 208 | "X = DataFrame(time=time, room=room)" 209 | ] 210 | }, 211 | { 212 | "cell_type": "code", 213 | "execution_count": 13, 214 | "metadata": {}, 215 | "outputs": [ 216 | { 217 | "name": "stderr", 218 | "output_type": "stream", 219 | "text": [ 220 | "┌ Info: Precompiling MLJ [add582a8-e3ab-11e8-2d5e-e98b27df1bc7]\n", 221 | "└ @ Base loading.jl:1260\n", 222 | "[ Info: Model metadata loaded from registry. 
\n" 223 | ] 224 | } 225 | ], 226 | "source": [ 227 | "using MLJ" 228 | ] 229 | }, 230 | { 231 | "cell_type": "code", 232 | "execution_count": 31, 233 | "metadata": { 234 | "scrolled": true 235 | }, 236 | "outputs": [ 237 | { 238 | "name": "stdout", 239 | "output_type": "stream", 240 | "text": [ 241 | "\n", 242 | "\n", 243 | "┌\u001b[0m───────\u001b[0m┬\u001b[0m─────────────────────────────────\u001b[0m┐\u001b[0m\n", 244 | "│\u001b[0m\u001b[1m time \u001b[0m│\u001b[0m\u001b[1m room \u001b[0m│\u001b[0m\n", 245 | "│\u001b[0m\u001b[90m Int64 \u001b[0m│\u001b[0m\u001b[90m CategoricalValue{String,UInt32} \u001b[0m│\u001b[0m\n", 246 | "├\u001b[0m───────\u001b[0m┼\u001b[0m─────────────────────────────────\u001b[0m┤\u001b[0m\n", 247 | "│\u001b[0m 5 \u001b[0m│\u001b[0m kitchen \u001b[0m│\u001b[0m\n", 248 | "│\u001b[0m 5 \u001b[0m│\u001b[0m bathroom \u001b[0m│\u001b[0m\n", 249 | "│\u001b[0m 5 \u001b[0m│\u001b[0m bedroom_1 \u001b[0m│\u001b[0m\n", 250 | "│\u001b[0m 5 \u001b[0m│\u001b[0m living_room \u001b[0m│\u001b[0m\n", 251 | "│\u001b[0m 6 \u001b[0m│\u001b[0m kitchen \u001b[0m│\u001b[0m\n", 252 | "│\u001b[0m 6 \u001b[0m│\u001b[0m bathroom \u001b[0m│\u001b[0m\n", 253 | "│\u001b[0m 6 \u001b[0m│\u001b[0m bedroom_1 \u001b[0m│\u001b[0m\n", 254 | "│\u001b[0m 6 \u001b[0m│\u001b[0m living_room \u001b[0m│\u001b[0m\n", 255 | "│\u001b[0m 7 \u001b[0m│\u001b[0m kitchen \u001b[0m│\u001b[0m\n", 256 | "│\u001b[0m 7 \u001b[0m│\u001b[0m bathroom \u001b[0m│\u001b[0m\n", 257 | "│\u001b[0m 7 \u001b[0m│\u001b[0m bedroom_1 \u001b[0m│\u001b[0m\n", 258 | "│\u001b[0m 7 \u001b[0m│\u001b[0m living_room \u001b[0m│\u001b[0m\n", 259 | "└\u001b[0m───────\u001b[0m┴\u001b[0m─────────────────────────────────\u001b[0m┘\u001b[0m\n" 260 | ] 261 | } 262 | ], 263 | "source": [ 264 | "println()\n", 265 | "println()\n", 266 | "MLJ.MLJBase.PrettyTables.pretty_table(X)" 267 | ] 268 | }, 269 | { 270 | "cell_type": "code", 271 | "execution_count": 15, 272 | "metadata": {}, 273 | "outputs": [ 274 | { 275 | "ename": "UndefVarError", 276 | "evalue": "UndefVarError: y not defined", 277 | "output_type": "error", 278 | "traceback": [ 279 | "UndefVarError: y not defined", 280 | "", 281 | "Stacktrace:", 282 | " [1] top-level scope at In[15]:1" 283 | ] 284 | } 285 | ], 286 | "source": [ 287 | "pretty(y)" 288 | ] 289 | }, 290 | { 291 | "cell_type": "code", 292 | "execution_count": 16, 293 | "metadata": {}, 294 | "outputs": [ 295 | { 296 | "data": { 297 | "text/plain": [ 298 | "12-element Array{Float64,1}:\n", 299 | " 18.490955359526012\n", 300 | " 18.304060288673128\n", 301 | " 18.25954037947709\n", 302 | " 17.419481829632957\n", 303 | " 16.589235329028348\n", 304 | " 20.66317138311018\n", 305 | " 18.945861996750985\n", 306 | " 20.158722013970333\n", 307 | " 20.361567584624957\n", 308 | " 19.85771377870428\n", 309 | " 16.180836445944205\n", 310 | " 17.330000922162835" 311 | ] 312 | }, 313 | "execution_count": 16, 314 | "metadata": {}, 315 | "output_type": "execute_result" 316 | } 317 | ], 318 | "source": [ 319 | "temp = 16 .+ 5*rand(12)" 320 | ] 321 | }, 322 | { 323 | "cell_type": "code", 324 | "execution_count": 19, 325 | "metadata": {}, 326 | "outputs": [ 327 | { 328 | "data": { 329 | "text/plain": [ 330 | "12-element Array{Float64,1}:\n", 331 | " 18.5\n", 332 | " 18.3\n", 333 | " 18.3\n", 334 | " 17.4\n", 335 | " 16.6\n", 336 | " 20.7\n", 337 | " 18.9\n", 338 | " 20.2\n", 339 | " 20.4\n", 340 | " 19.9\n", 341 | " 16.2\n", 342 | " 17.3" 343 | ] 344 | }, 345 | "execution_count": 19, 346 | "metadata": {}, 347 | "output_type": "execute_result" 348 | } 349 
| ], 350 | "source": [ 351 | "temperature = map(temp) do x round(x, sigdigits=3) end" 352 | ] 353 | }, 354 | { 355 | "cell_type": "code", 356 | "execution_count": 20, 357 | "metadata": {}, 358 | "outputs": [ 359 | { 360 | "ename": "UndefVarError", 361 | "evalue": "UndefVarError: y not defined", 362 | "output_type": "error", 363 | "traceback": [ 364 | "UndefVarError: y not defined", 365 | "", 366 | "Stacktrace:", 367 | " [1] top-level scope at In[20]:1" 368 | ] 369 | } 370 | ], 371 | "source": [ 372 | "y = DataFrame(y)" 373 | ] 374 | }, 375 | { 376 | "cell_type": "code", 377 | "execution_count": 23, 378 | "metadata": {}, 379 | "outputs": [ 380 | { 381 | "data": { 382 | "text/html": [ 383 | "

[stripped HTML table rendering of the 12×1 DataFrame (column: temperature); the same table appears in the text/plain output below]
" 384 | ], 385 | "text/latex": [ 386 | "\\begin{tabular}{r|c}\n", 387 | "\t& temperature\\\\\n", 388 | "\t\\hline\n", 389 | "\t& Float64\\\\\n", 390 | "\t\\hline\n", 391 | "\t1 & 18.5 \\\\\n", 392 | "\t2 & 18.3 \\\\\n", 393 | "\t3 & 18.3 \\\\\n", 394 | "\t4 & 17.4 \\\\\n", 395 | "\t5 & 16.6 \\\\\n", 396 | "\t6 & 20.7 \\\\\n", 397 | "\t7 & 18.9 \\\\\n", 398 | "\t8 & 20.2 \\\\\n", 399 | "\t9 & 20.4 \\\\\n", 400 | "\t10 & 19.9 \\\\\n", 401 | "\t11 & 16.2 \\\\\n", 402 | "\t12 & 17.3 \\\\\n", 403 | "\\end{tabular}\n" 404 | ], 405 | "text/plain": [ 406 | "12×1 DataFrame\n", 407 | "│ Row │ temperature │\n", 408 | "│ │ \u001b[90mFloat64\u001b[39m │\n", 409 | "├─────┼─────────────┤\n", 410 | "│ 1 │ 18.5 │\n", 411 | "│ 2 │ 18.3 │\n", 412 | "│ 3 │ 18.3 │\n", 413 | "│ 4 │ 17.4 │\n", 414 | "│ 5 │ 16.6 │\n", 415 | "│ 6 │ 20.7 │\n", 416 | "│ 7 │ 18.9 │\n", 417 | "│ 8 │ 20.2 │\n", 418 | "│ 9 │ 20.4 │\n", 419 | "│ 10 │ 19.9 │\n", 420 | "│ 11 │ 16.2 │\n", 421 | "│ 12 │ 17.3 │" 422 | ] 423 | }, 424 | "execution_count": 23, 425 | "metadata": {}, 426 | "output_type": "execute_result" 427 | } 428 | ], 429 | "source": [ 430 | "y=DataFrame(temperature=temperature)" 431 | ] 432 | }, 433 | { 434 | "cell_type": "code", 435 | "execution_count": 32, 436 | "metadata": {}, 437 | "outputs": [ 438 | { 439 | "name": "stdout", 440 | "output_type": "stream", 441 | "text": [ 442 | "\n", 443 | "\n", 444 | "┌\u001b[0m─────────────\u001b[0m┐\u001b[0m\n", 445 | "│\u001b[0m\u001b[1m temperature \u001b[0m│\u001b[0m\n", 446 | "│\u001b[0m\u001b[90m Float64 \u001b[0m│\u001b[0m\n", 447 | "├\u001b[0m─────────────\u001b[0m┤\u001b[0m\n", 448 | "│\u001b[0m 18.5 \u001b[0m│\u001b[0m\n", 449 | "│\u001b[0m 18.3 \u001b[0m│\u001b[0m\n", 450 | "│\u001b[0m 18.3 \u001b[0m│\u001b[0m\n", 451 | "│\u001b[0m 17.4 \u001b[0m│\u001b[0m\n", 452 | "│\u001b[0m 16.6 \u001b[0m│\u001b[0m\n", 453 | "│\u001b[0m 20.7 \u001b[0m│\u001b[0m\n", 454 | "│\u001b[0m 18.9 \u001b[0m│\u001b[0m\n", 455 | "│\u001b[0m 20.2 \u001b[0m│\u001b[0m\n", 456 | "│\u001b[0m 20.4 \u001b[0m│\u001b[0m\n", 457 | "│\u001b[0m 19.9 \u001b[0m│\u001b[0m\n", 458 | "│\u001b[0m 16.2 \u001b[0m│\u001b[0m\n", 459 | "│\u001b[0m 17.3 \u001b[0m│\u001b[0m\n", 460 | "└\u001b[0m─────────────\u001b[0m┘\u001b[0m\n" 461 | ] 462 | } 463 | ], 464 | "source": [ 465 | "println()\n", 466 | "println()\n", 467 | "MLJ.MLJBase.PrettyTables.pretty_table(y)" 468 | ] 469 | }, 470 | { 471 | "cell_type": "code", 472 | "execution_count": 25, 473 | "metadata": {}, 474 | "outputs": [ 475 | { 476 | "data": { 477 | "text/html": [ 478 | "

[stripped HTML table rendering of the 12×1 DataFrame (column: temperature); the same table appears in the text/plain output below]
" 479 | ], 480 | "text/latex": [ 481 | "\\begin{tabular}{r|c}\n", 482 | "\t& temperature\\\\\n", 483 | "\t\\hline\n", 484 | "\t& Float64\\\\\n", 485 | "\t\\hline\n", 486 | "\t1 & 18.5 \\\\\n", 487 | "\t2 & 18.3 \\\\\n", 488 | "\t3 & 18.3 \\\\\n", 489 | "\t4 & 17.4 \\\\\n", 490 | "\t5 & 16.6 \\\\\n", 491 | "\t6 & 20.7 \\\\\n", 492 | "\t7 & 18.9 \\\\\n", 493 | "\t8 & 20.2 \\\\\n", 494 | "\t9 & 20.4 \\\\\n", 495 | "\t10 & 19.9 \\\\\n", 496 | "\t11 & 16.2 \\\\\n", 497 | "\t12 & 17.3 \\\\\n", 498 | "\\end{tabular}\n" 499 | ], 500 | "text/plain": [ 501 | "12×1 DataFrame\n", 502 | "│ Row │ temperature │\n", 503 | "│ │ \u001b[90mFloat64\u001b[39m │\n", 504 | "├─────┼─────────────┤\n", 505 | "│ 1 │ 18.5 │\n", 506 | "│ 2 │ 18.3 │\n", 507 | "│ 3 │ 18.3 │\n", 508 | "│ 4 │ 17.4 │\n", 509 | "│ 5 │ 16.6 │\n", 510 | "│ 6 │ 20.7 │\n", 511 | "│ 7 │ 18.9 │\n", 512 | "│ 8 │ 20.2 │\n", 513 | "│ 9 │ 20.4 │\n", 514 | "│ 10 │ 19.9 │\n", 515 | "│ 11 │ 16.2 │\n", 516 | "│ 12 │ 17.3 │" 517 | ] 518 | }, 519 | "execution_count": 25, 520 | "metadata": {}, 521 | "output_type": "execute_result" 522 | } 523 | ], 524 | "source": [ 525 | "y" 526 | ] 527 | }, 528 | { 529 | "cell_type": "code", 530 | "execution_count": null, 531 | "metadata": {}, 532 | "outputs": [], 533 | "source": [] 534 | } 535 | ], 536 | "metadata": { 537 | "kernelspec": { 538 | "display_name": "Julia 1.4.2", 539 | "language": "julia", 540 | "name": "julia-1.4" 541 | }, 542 | "language_info": { 543 | "file_extension": ".jl", 544 | "mimetype": "application/julia", 545 | "name": "julia", 546 | "version": "1.4.2" 547 | } 548 | }, 549 | "nbformat": 4, 550 | "nbformat_minor": 4 551 | } 552 | -------------------------------------------------------------------------------- /apt.txt: -------------------------------------------------------------------------------- 1 | tzdata -------------------------------------------------------------------------------- /assets/scitypes.drawio: -------------------------------------------------------------------------------- 1 | 7ZnbbptAEIafhstU5pxcJiROKiWt2lRK07sVjGHbZddZBhvn6btrlmCKGhPJdq2UKzP/zJ5mPjNgW26UV9eSzLM7kQCznElSWe6l5Ti2PQnUh1ZWtXLqTmohlTQxQa1wT5/BiE1YSRMoOoEoBEM674qx4Bxi7GhESrHshs0E6646Jyn0hPuYsL76QBPMzCmcsNVvgKZZs7IdnNWenDTB5iRFRhKx3JDcK8uNpBBYX+VVBEwnr8lLPW76F+/LxiRwHDJg+cUn0Vn2Y/bwNcrOr5/pE41OzGYLXDUHhkSd35hCYiZSwQm7atWLuJQL0JPaypCi5MnamiirHXArxNyE/ATElaksKVEoKcOcGS9UFL/r4R9C35iPG67Lyky9NlaNwVGu6lF+Yz5u+tpha6sZV8xJTHl6CzPsKsrSE80ExynJKdMDboAtAGlMlKOfbJP/QpQyhlcy3EBLZAr4Spz5nuj0byxgSnkNIgd1EBUggRGkiy6exFCevsS1IKgLw8IbuLC9IwLD+a/B8I8LjHreBWGlWclyAoYmRR1igqdSNI6TYl3mcxVgB/OqdaqrVH9OKacIVnjxyQovmynVDutZ65gekggVdqEpUIpfEAkmpFK44JrLGWXsD4kwmnJlxqpsoPSLBUhdUXZuHDlNkjXUy0zt617BoNdcqibXY3v37OjNQPVqtY3X90x3Me21aTbLtlcFRso22lQTtns87F6Rxoay2/uGM/C+4R3XfcM5IjDeZ0MZCkZ4VGA4e2ooH/msbiljLxnYS8Jgay/xDtlL3D2RcUeLQmd0BGMgGM5pF4zTfwyGtycwIpVxHLEYisXLTzlHgoW/Jyw+ywQkJFMSoyrj+IbyNko8bysltntITIJ9tZWSqawxUhQjI299iw39rYzYh2Qk3FuH4Uh5KcpihGMoHMH2p4/DwmH3G803VaWSsPdWPe0wr9N2sJtqugd7llRm++fK2rfxF5V79Rs= -------------------------------------------------------------------------------- /assets/scitypes.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ablaom/MachineLearningInJulia2020/552f98fbf012475d67cd29a72448ac7c476ea2c7/assets/scitypes.png -------------------------------------------------------------------------------- /data/horse.csv: -------------------------------------------------------------------------------- 1 | 
surgery,age,rectal_temperature,pulse,respiratory_rate,temperature_extremities,mucous_membranes,capillary_refill_time,pain,peristalsis,abdominal_distension,packed_cell_volume,total_protein,outcome,surgical_lesion,cp_data 2 | 2,1,38.5,66,66,3,1,2,5,4,4,45.0,8.4,2,2,2 3 | 1,1,39.2,88,88,3,4,1,3,4,2,50.0,85.0,3,2,2 4 | 2,1,38.3,40,40,1,3,1,3,3,1,33.0,6.7,1,2,1 5 | 1,9,39.1,164,164,4,6,2,2,4,4,48.0,7.2,2,1,1 6 | 2,1,37.3,104,104,3,6,2,3,3,1,74.0,7.4,2,2,2 7 | 2,1,38.1,60,60,2,3,1,2,3,2,44.0,7.5,1,2,2 8 | 1,1,37.9,48,48,1,1,1,3,3,3,37.0,7.0,1,1,2 9 | 1,1,38.1,60,60,3,1,1,3,4,2,44.0,8.3,2,1,2 10 | 2,1,38.1,80,80,3,3,1,4,4,4,38.0,6.2,3,1,2 11 | 2,9,38.3,90,90,1,1,1,5,3,1,40.0,6.2,1,2,1 12 | 1,1,38.1,66,66,3,5,1,3,3,1,44.0,6.0,1,1,1 13 | 2,1,39.1,72,72,2,2,1,2,1,2,50.0,7.8,1,1,2 14 | 1,1,37.2,42,42,2,1,1,3,3,3,44.0,7.0,1,2,2 15 | 2,9,38.0,92,92,1,2,1,1,3,2,37.0,6.1,2,2,1 16 | 1,1,38.2,76,76,3,1,1,3,4,1,46.0,81.0,1,1,2 17 | 1,1,37.6,96,96,3,4,1,5,3,3,45.0,6.8,2,1,2 18 | 1,9,38.1,128,128,3,4,2,4,4,3,53.0,7.8,2,2,1 19 | 2,1,37.5,48,48,3,1,1,3,3,1,44.0,7.5,1,2,2 20 | 1,1,37.6,64,64,1,2,1,2,3,1,40.0,7.0,1,1,1 21 | 2,1,39.4,110,110,4,6,1,3,3,3,55.0,8.7,1,2,2 22 | 1,1,39.9,72,72,1,5,2,5,4,4,46.0,6.1,1,1,2 23 | 2,1,38.4,48,48,1,1,1,1,3,1,49.0,6.8,1,2,2 24 | 1,1,38.6,42,42,2,4,1,2,3,1,48.0,7.2,1,1,2 25 | 1,9,38.3,130,130,3,1,1,2,4,1,50.0,70.0,1,1,2 26 | 1,1,38.1,60,60,3,3,1,3,4,3,51.0,65.0,1,1,2 27 | 2,1,37.8,60,60,3,1,1,3,3,1,44.0,7.5,1,2,2 28 | 1,1,38.3,72,72,4,3,2,3,3,3,43.0,7.0,1,1,1 29 | 1,1,37.8,48,48,3,1,1,3,3,2,37.0,5.5,1,2,1 30 | 1,1,38.1,60,60,3,1,1,3,3,1,44.0,7.5,2,2,2 31 | 2,1,37.7,48,48,2,1,1,1,1,1,45.0,76.0,1,2,2 32 | 2,1,37.7,96,96,3,4,2,5,4,4,66.0,7.5,2,1,2 33 | 2,1,37.2,108,108,3,4,2,2,4,2,52.0,8.2,3,1,1 34 | 1,1,37.2,60,60,2,1,1,3,3,3,43.0,6.6,1,1,2 35 | 1,1,38.2,64,64,1,1,1,3,1,1,49.0,8.6,1,1,1 36 | 1,1,38.1,100,100,3,4,2,5,4,4,52.0,6.6,1,1,2 37 | 2,1,38.1,104,104,4,3,2,4,4,3,73.0,8.4,3,1,2 38 | 2,1,38.3,112,112,3,5,2,3,3,1,51.0,6.0,3,2,1 39 | 1,1,37.8,72,72,3,1,1,5,3,1,56.0,80.0,1,1,2 40 | 2,1,38.6,52,52,1,1,1,3,3,2,32.0,6.6,1,2,1 41 | 1,9,39.2,146,146,3,1,1,3,3,1,44.0,7.5,2,1,2 42 | 1,1,38.1,88,88,3,6,2,5,3,3,63.0,6.5,2,1,2 43 | 2,9,39.0,150,150,3,1,1,3,3,1,47.0,8.5,1,1,1 44 | 2,1,38.0,60,60,3,3,1,3,3,1,47.0,7.0,1,2,2 45 | 1,1,38.1,120,120,3,4,1,4,4,4,52.0,67.0,3,1,2 46 | 1,1,35.4,140,140,3,4,2,4,4,1,57.0,69.0,3,1,2 47 | 2,1,38.1,120,120,4,4,2,5,4,4,60.0,6.5,2,1,2 48 | 1,1,37.9,60,60,3,4,2,5,4,4,65.0,7.5,1,1,1 49 | 2,1,37.5,48,48,1,1,1,1,1,1,37.0,6.5,1,2,2 50 | 1,1,38.9,80,80,3,3,2,2,3,3,54.0,6.5,2,1,2 51 | 2,1,37.2,84,84,3,5,2,4,1,2,73.0,5.5,2,2,1 52 | 2,1,38.6,46,46,1,2,1,1,3,2,49.0,9.1,1,2,1 53 | 1,1,37.4,84,84,1,3,2,3,3,2,44.0,7.5,2,1,1 54 | 2,1,38.1,60,60,1,3,1,1,3,1,43.0,7.7,1,2,2 55 | 2,1,38.6,40,40,3,1,1,3,3,1,41.0,6.4,1,2,1 56 | 2,1,40.3,114,114,3,1,2,2,3,3,57.0,8.1,3,1,1 57 | 1,9,38.6,160,160,3,5,1,3,3,4,38.0,7.5,2,1,1 58 | 1,1,38.1,60,60,3,1,1,3,3,1,24.0,6.7,1,1,2 59 | 1,1,38.1,64,64,2,2,1,5,3,3,42.0,7.7,2,1,2 60 | 1,1,38.1,60,60,4,3,1,5,4,3,53.0,5.9,2,1,1 61 | 2,1,38.1,96,96,3,3,2,5,4,4,60.0,7.5,2,1,2 62 | 2,1,37.8,48,48,1,3,1,2,1,1,37.0,6.7,1,2,2 63 | 2,1,38.5,60,60,2,1,1,1,2,2,44.0,7.7,1,2,2 64 | 1,1,37.8,88,88,2,2,1,3,3,1,64.0,8.0,2,1,1 65 | 2,1,38.2,130,130,4,4,2,2,4,4,65.0,82.0,3,2,2 66 | 1,1,39.0,64,64,3,4,2,3,3,2,44.0,7.5,1,1,1 67 | 1,1,38.1,60,60,3,3,1,3,3,2,26.0,72.0,1,1,2 68 | 2,1,37.9,72,72,1,5,2,3,3,1,58.0,74.0,1,1,2 69 | 2,1,38.4,54,54,1,1,1,1,3,1,49.0,7.2,1,2,1 70 | 2,1,38.1,52,52,1,3,1,3,3,1,55.0,7.2,1,2,2 71 | 2,1,38.0,48,48,1,1,1,1,3,1,42.0,6.3,1,2,1 72 | 
2,1,37.0,60,60,3,1,1,3,3,3,43.0,7.6,3,1,1 73 | 1,1,37.8,48,48,1,1,1,1,2,1,46.0,5.9,1,2,1 74 | 1,1,37.7,56,56,3,1,1,3,3,1,44.0,7.5,2,1,2 75 | 1,1,38.1,52,52,1,5,1,4,3,1,54.0,7.5,2,1,1 76 | 1,9,38.1,60,60,3,1,1,3,3,1,37.0,4.9,2,1,2 77 | 1,9,39.7,100,100,3,5,2,2,3,1,48.0,57.0,3,1,2 78 | 1,1,37.6,38,38,3,1,1,3,3,2,37.0,68.0,1,1,2 79 | 2,1,38.7,52,52,2,1,1,1,1,1,33.0,77.0,1,2,2 80 | 1,1,38.1,60,60,3,3,3,5,3,3,46.0,5.9,2,1,2 81 | 1,1,37.5,96,96,1,6,2,3,4,2,69.0,8.9,1,1,1 82 | 1,1,36.4,98,98,3,4,1,4,3,2,47.0,6.4,2,1,1 83 | 1,1,37.3,40,40,3,1,1,2,3,2,36.0,7.5,1,1,2 84 | 1,9,38.1,100,100,3,2,1,3,4,1,36.0,5.7,1,1,2 85 | 1,1,38.0,60,60,3,6,2,5,3,4,68.0,7.8,2,1,2 86 | 1,1,37.8,60,60,1,2,2,2,3,3,40.0,4.5,1,1,1 87 | 2,1,38.0,54,54,2,3,3,3,1,2,45.0,6.2,1,2,2 88 | 1,1,38.1,88,88,3,4,2,5,4,3,50.0,7.7,2,1,1 89 | 2,1,38.1,40,40,3,1,1,3,3,1,50.0,7.0,3,1,1 90 | 2,1,39.0,64,64,1,5,1,3,3,2,42.0,7.5,1,2,1 91 | 2,1,38.3,42,42,1,1,1,1,1,1,38.0,61.0,1,2,2 92 | 2,1,38.0,52,52,3,1,1,2,3,1,53.0,86.0,1,1,2 93 | 2,1,40.3,114,114,3,1,2,2,3,3,57.0,8.1,2,1,1 94 | 2,1,38.8,50,50,3,1,1,1,1,1,42.0,6.2,1,2,2 95 | 2,1,38.1,60,60,3,1,1,5,3,3,38.0,6.5,2,1,2 96 | 2,1,37.5,48,48,4,3,1,3,2,1,48.0,8.6,1,2,2 97 | 1,1,37.3,48,48,3,2,1,3,3,3,41.0,69.0,1,1,2 98 | 2,1,38.1,84,84,3,3,1,3,3,1,44.0,8.5,1,1,2 99 | 1,1,38.1,88,88,3,4,1,2,3,3,55.0,60.0,3,2,2 100 | 2,1,37.7,44,44,2,3,1,1,3,2,41.0,60.0,1,2,2 101 | 2,1,39.6,108,108,3,6,2,2,4,3,59.0,8.0,1,2,1 102 | 1,1,38.2,40,40,3,1,1,1,3,1,34.0,66.0,1,2,2 103 | 1,1,38.1,60,60,4,4,2,5,4,1,44.0,7.5,3,1,2 104 | 2,1,38.3,40,40,3,1,1,2,3,1,37.0,57.0,1,2,2 105 | 1,9,38.0,140,140,1,1,1,3,3,2,39.0,5.3,1,1,2 106 | 1,1,37.8,52,52,1,3,1,4,4,1,48.0,6.6,2,1,2 107 | 1,1,38.1,70,70,1,3,2,2,3,2,36.0,7.3,1,1,2 108 | 1,1,38.3,52,52,3,3,1,3,3,1,43.0,6.1,1,1,1 109 | 2,1,37.3,50,50,1,3,1,1,3,2,44.0,7.0,1,2,2 110 | 1,1,38.7,60,60,4,2,2,4,4,4,53.0,64.0,3,1,2 111 | 1,9,38.4,84,84,3,2,1,3,3,3,36.0,6.6,2,1,1 112 | 1,1,38.1,70,70,3,5,2,2,3,2,60.0,7.5,2,1,2 113 | 1,1,38.3,40,40,3,1,1,1,3,2,38.0,58.0,1,1,2 114 | 1,1,38.1,40,40,2,1,1,1,3,1,39.0,56.0,1,1,2 115 | 1,1,36.8,60,60,3,1,1,3,3,1,44.0,7.5,2,1,1 116 | 1,1,38.4,44,44,3,4,1,5,4,3,50.0,77.0,1,1,2 117 | 2,1,38.1,60,60,3,1,1,3,3,2,45.0,70.0,1,2,2 118 | 1,1,38.0,44,44,1,1,1,3,3,3,42.0,65.0,1,1,2 119 | 2,1,39.5,60,60,3,4,2,3,4,3,44.0,6.7,3,1,2 120 | 1,1,36.5,78,78,1,1,1,5,3,1,34.0,75.0,1,1,2 121 | 2,1,38.1,56,56,2,2,1,1,3,1,46.0,70.0,1,2,2 122 | 1,1,39.4,54,54,1,2,1,2,3,2,39.0,6.0,1,1,1 123 | 1,1,38.3,80,80,3,6,2,4,3,1,67.0,10.2,3,1,2 124 | 2,1,38.7,40,40,2,1,1,3,1,1,39.0,62.0,1,2,2 125 | 1,1,38.2,64,64,1,3,1,4,4,3,45.0,7.5,2,1,1 126 | 2,1,37.6,48,48,3,4,1,1,1,3,37.0,5.5,3,1,2 127 | 1,1,38.0,42,42,4,1,1,3,3,2,41.0,7.6,1,1,2 128 | 1,1,38.7,60,60,3,3,1,5,4,2,33.0,6.5,1,1,2 129 | 1,1,37.4,50,50,3,1,1,4,4,1,45.0,7.9,1,1,1 130 | 1,1,37.4,84,84,3,3,1,2,3,3,31.0,61.0,3,2,2 131 | 1,1,38.4,49,49,3,1,1,3,3,1,44.0,7.6,1,1,2 132 | 1,1,37.8,30,30,3,1,1,3,3,1,44.0,7.5,2,1,2 133 | 2,1,37.6,88,88,3,1,1,3,3,2,44.0,6.0,2,1,2 134 | 2,1,37.9,40,40,1,1,1,2,3,1,40.0,5.7,1,1,2 135 | 1,1,38.1,100,100,3,4,2,5,4,1,59.0,6.3,2,1,2 136 | 1,9,38.1,136,136,3,3,1,5,1,3,33.0,4.9,2,1,1 137 | 1,1,38.1,60,60,3,3,2,5,3,3,46.0,5.9,2,1,2 138 | 1,1,38.0,48,48,1,1,1,1,2,4,44.0,7.5,1,1,2 139 | 2,1,38.0,56,56,1,3,1,1,1,1,42.0,71.0,1,2,2 140 | 2,1,38.0,60,60,1,1,1,3,3,1,50.0,7.0,1,2,2 141 | 1,1,38.1,44,44,3,1,1,2,2,1,31.0,7.3,1,2,2 142 | 2,1,36.0,42,42,3,5,1,3,3,1,64.0,6.8,2,2,2 143 | 1,1,38.1,120,120,4,6,2,5,4,4,57.0,4.5,2,1,1 144 | 1,1,37.8,48,48,1,1,2,1,2,1,46.0,5.9,1,2,1 145 | 
1,1,37.1,84,84,3,6,1,2,4,4,75.0,81.0,3,2,2 146 | 2,1,38.1,80,80,3,2,1,2,3,3,50.0,80.0,1,1,2 147 | 1,1,38.2,48,48,1,3,1,3,4,4,42.0,71.0,1,1,2 148 | 2,1,38.0,44,44,2,3,1,3,4,3,33.0,6.5,2,1,2 149 | 1,1,38.3,132,132,3,6,2,2,4,2,57.0,8.0,1,1,1 150 | 2,1,38.7,48,48,3,1,1,1,1,1,34.0,63.0,1,2,2 151 | 2,1,38.9,44,44,3,1,1,2,3,2,33.0,64.0,1,2,2 152 | 1,1,39.3,60,60,4,6,2,4,4,2,75.0,7.5,2,1,1 153 | 1,1,38.1,100,100,3,4,2,3,4,4,68.0,64.0,1,1,2 154 | 2,1,38.6,48,48,3,1,1,1,3,2,50.0,7.3,1,2,1 155 | 2,1,38.8,48,48,1,3,1,3,3,4,41.0,65.0,1,1,2 156 | 2,1,38.0,48,48,3,4,1,1,4,2,49.0,8.3,1,2,1 157 | 2,1,38.6,52,52,1,1,1,3,3,2,36.0,6.6,1,2,1 158 | 1,1,37.8,60,60,1,3,2,3,4,4,52.0,75.0,3,1,2 159 | 2,1,38.0,42,42,3,1,1,3,3,1,44.0,7.5,1,2,2 160 | 2,1,38.1,60,60,1,2,1,2,1,2,44.0,7.5,1,2,1 161 | 1,1,38.1,60,60,3,1,1,4,3,1,35.0,58.0,1,1,2 162 | 1,1,38.3,42,42,3,1,1,3,3,1,40.0,8.5,2,1,2 163 | 2,1,39.5,60,60,3,1,2,3,3,2,38.0,56.0,1,2,2 164 | 1,1,38.0,66,66,1,3,1,5,3,1,46.0,46.0,3,1,2 165 | 1,1,38.7,76,76,1,5,2,3,3,2,50.0,8.0,1,1,1 166 | 1,1,39.4,120,120,3,5,1,3,3,3,56.0,64.0,3,2,2 167 | 1,1,38.3,40,40,1,1,1,3,1,1,43.0,5.9,1,2,1 168 | 2,1,38.1,44,44,1,1,1,3,3,1,44.0,6.3,1,2,2 169 | 1,1,38.4,104,104,1,3,1,2,4,2,55.0,8.5,1,1,2 170 | 1,1,38.1,65,65,3,1,2,5,3,4,44.0,7.5,3,1,2 171 | 2,1,37.5,44,44,1,3,1,3,1,1,35.0,7.2,1,2,2 172 | 2,1,39.0,86,86,3,5,1,3,3,3,68.0,5.8,2,1,1 173 | 1,1,38.5,129,129,3,3,1,2,4,3,57.0,66.0,1,1,2 174 | 1,1,38.1,104,104,3,5,2,2,4,3,69.0,8.6,2,1,1 175 | 2,1,38.1,60,60,3,6,1,4,3,4,44.0,7.5,2,1,1 176 | 1,1,38.1,60,60,3,1,1,3,3,1,44.0,7.5,1,1,2 177 | 1,1,38.2,60,60,1,3,1,3,3,1,48.0,66.0,1,1,2 178 | 1,1,38.1,68,68,3,4,1,4,3,1,44.0,7.5,2,1,1 179 | 1,1,38.1,60,60,3,4,2,5,4,4,45.0,70.0,1,1,2 180 | 2,1,38.5,100,100,3,5,2,4,3,4,44.0,7.5,3,2,1 181 | 1,1,38.4,84,84,3,5,2,4,3,3,47.0,7.5,2,1,2 182 | 2,1,37.8,48,48,3,1,1,3,3,2,35.0,7.5,1,2,1 183 | 1,1,38.0,60,60,3,6,2,5,3,4,68.0,7.8,2,1,2 184 | 2,1,37.8,56,56,1,2,1,2,1,1,44.0,68.0,1,2,2 185 | 2,1,38.2,68,68,2,2,1,1,1,1,43.0,65.0,1,2,2 186 | 1,1,38.5,120,120,4,6,2,3,3,1,54.0,7.5,1,1,2 187 | 1,1,39.3,64,64,2,1,1,3,3,1,39.0,6.7,1,1,2 188 | 1,1,38.4,80,80,4,1,1,3,3,3,32.0,6.1,1,1,1 189 | 1,1,38.5,60,60,1,1,1,3,1,1,33.0,53.0,1,1,2 190 | 1,1,38.3,60,60,3,1,1,2,1,1,30.0,6.0,1,1,2 191 | 1,1,37.1,40,40,3,4,1,3,3,1,23.0,6.7,1,1,1 192 | 2,9,38.1,100,100,2,1,1,4,1,1,37.0,4.7,1,2,2 193 | 1,1,38.2,48,48,1,1,1,3,3,3,48.0,74.0,1,1,2 194 | 1,1,38.1,60,60,3,4,2,4,3,4,58.0,7.6,2,1,2 195 | 2,1,37.9,88,88,1,2,1,2,2,1,37.0,56.0,1,2,2 196 | 2,1,38.0,44,44,3,1,1,3,1,2,42.0,64.0,1,2,2 197 | 2,1,38.5,60,60,1,5,2,2,2,1,63.0,7.5,3,2,1 198 | 2,1,38.5,96,96,3,1,2,2,4,2,70.0,8.5,2,1,1 199 | 2,1,38.3,60,60,1,1,2,1,3,1,34.0,66.0,1,2,2 200 | 2,1,38.5,60,60,3,2,1,2,1,2,49.0,59.0,1,2,2 201 | 1,1,37.3,48,48,1,3,1,3,1,3,40.0,6.6,1,1,1 202 | 1,1,38.5,86,86,1,3,1,4,4,3,45.0,7.4,2,1,1 203 | 1,1,37.5,48,48,3,1,1,3,3,1,41.0,55.0,3,1,2 204 | 2,1,37.2,36,36,1,1,1,2,3,1,35.0,5.7,1,2,2 205 | 1,1,39.2,60,60,3,3,1,4,4,2,36.0,6.6,1,1,1 206 | 2,1,38.5,100,100,3,5,2,4,3,4,44.0,7.5,3,2,2 207 | 1,1,38.5,96,96,2,4,2,4,4,3,50.0,65.0,1,1,2 208 | 1,1,38.1,60,60,3,1,1,3,3,1,45.0,8.7,2,1,2 209 | 1,1,37.8,88,88,3,5,2,3,3,3,64.0,89.0,3,1,2 210 | 2,1,37.5,44,44,3,1,1,3,1,2,43.0,51.0,1,2,2 211 | 1,1,37.9,68,68,3,2,1,2,4,2,45.0,4.0,2,1,1 212 | 1,1,38.0,86,86,4,4,1,2,4,4,45.0,5.5,2,1,1 213 | 1,9,38.9,120,120,1,2,2,3,3,3,47.0,6.3,1,2,2 214 | 1,1,37.6,45,45,3,3,1,3,2,2,39.0,7.0,1,1,1 215 | 2,1,38.6,56,56,2,1,1,1,1,1,40.0,7.0,1,2,1 216 | 1,1,37.8,40,40,1,1,1,1,2,1,38.0,7.0,1,1,2 217 | 2,1,38.1,60,60,3,1,1,3,3,1,44.0,7.5,1,2,2 218 | 
1,1,38.0,76,76,3,1,2,3,3,1,71.0,11.0,1,1,1 219 | 1,1,38.1,40,40,1,2,1,2,2,1,44.0,7.5,3,1,2 220 | 1,1,38.1,52,52,3,4,1,3,4,3,37.0,8.1,1,1,2 221 | 1,1,39.2,88,88,4,1,2,5,4,1,44.0,7.5,3,2,2 222 | 1,1,38.5,92,92,4,1,1,2,4,3,46.0,67.0,1,1,2 223 | 1,1,38.1,112,112,4,4,1,2,3,1,60.0,6.3,1,1,1 224 | 1,1,37.7,66,66,1,3,1,3,3,2,31.5,6.2,1,1,1 225 | 1,1,38.8,50,50,1,1,1,3,1,1,38.0,58.0,1,1,2 226 | 2,1,38.4,54,54,1,1,1,1,3,1,49.0,7.2,1,2,1 227 | 1,1,39.2,120,120,4,5,2,2,3,3,60.0,8.8,2,1,2 228 | 1,9,38.1,60,60,3,1,1,3,3,1,45.0,6.5,1,1,1 229 | 1,1,37.3,90,90,3,6,2,5,4,3,65.0,50.0,3,1,2 230 | 1,9,38.5,120,120,3,1,1,3,1,1,35.0,54.0,1,1,2 231 | 1,1,38.5,104,104,3,1,1,4,3,4,44.0,7.5,1,1,2 232 | 2,1,39.5,92,92,3,6,1,5,4,1,72.0,6.4,2,2,2 233 | 1,1,38.5,30,30,3,1,1,3,3,1,40.0,7.7,1,1,2 234 | 1,1,38.3,72,72,4,3,2,3,3,3,43.0,7.0,1,1,1 235 | 2,1,37.5,48,48,4,3,1,3,2,1,48.0,8.6,1,2,2 236 | 1,1,38.1,52,52,1,5,1,4,3,1,54.0,7.5,2,1,1 237 | 2,1,38.2,42,42,1,1,1,3,1,2,36.0,6.9,1,2,2 238 | 2,1,37.9,54,54,2,5,1,3,1,1,47.0,54.0,1,2,2 239 | 2,1,36.1,88,88,3,3,1,3,3,2,45.0,7.0,3,1,1 240 | 1,1,38.1,70,70,3,1,1,5,3,1,36.0,65.0,3,1,2 241 | 1,1,38.0,90,90,4,4,2,5,4,4,55.0,6.1,2,1,2 242 | 1,1,38.2,52,52,1,2,1,1,2,1,43.0,8.1,1,2,1 243 | 1,1,38.1,36,36,1,4,1,5,3,3,41.0,5.9,2,1,2 244 | 1,1,38.4,92,92,1,1,2,3,3,3,44.0,7.5,1,1,1 245 | 1,9,38.2,124,124,1,2,1,2,3,4,47.0,8.0,1,1,1 246 | 2,1,38.1,96,96,3,3,2,5,4,4,60.0,7.5,2,1,2 247 | 1,1,37.6,68,68,3,3,1,4,2,4,47.0,7.2,1,1,2 248 | 1,1,38.1,88,88,3,4,1,5,4,3,41.0,4.6,2,1,2 249 | 1,1,38.0,108,108,2,4,1,4,3,3,44.0,7.5,1,1,2 250 | 2,1,38.2,48,48,2,1,2,3,3,1,34.0,6.6,1,2,2 251 | 1,1,39.3,100,100,4,6,1,2,4,1,66.0,13.0,3,1,2 252 | 2,1,36.6,42,42,3,2,1,1,4,1,52.0,7.1,2,1,2 253 | 1,9,38.8,124,124,3,2,1,2,3,4,50.0,7.6,2,1,1 254 | 2,1,38.1,112,112,3,4,2,5,4,2,40.0,5.3,1,2,1 255 | 1,1,38.1,80,80,3,3,1,4,4,4,43.0,70.0,1,1,2 256 | 1,9,38.8,184,184,1,1,1,4,1,3,33.0,3.3,2,1,2 257 | 1,1,37.5,72,72,2,1,1,2,1,1,35.0,65.0,3,1,2 258 | 1,1,38.7,96,96,3,4,1,3,4,1,64.0,9.0,2,1,1 259 | 2,1,37.5,52,52,1,1,1,2,3,2,36.0,61.0,1,2,2 260 | 1,1,40.8,72,72,3,1,1,2,3,1,54.0,7.4,2,1,1 261 | 2,1,38.0,40,40,3,1,1,4,3,2,37.0,69.0,1,2,2 262 | 2,1,38.4,48,48,2,1,1,1,3,2,39.0,6.5,1,2,1 263 | 2,9,38.6,88,88,3,1,1,3,3,1,35.0,5.9,1,2,2 264 | 1,1,37.1,75,75,3,3,2,4,4,2,48.0,7.4,2,1,1 265 | 1,1,38.3,44,44,3,2,1,3,3,3,44.0,6.5,1,1,1 266 | 2,1,38.1,56,56,3,1,1,3,3,1,40.0,6.0,3,1,2 267 | 2,1,38.6,68,68,2,3,1,3,3,2,38.0,6.5,1,2,1 268 | 2,1,38.3,54,54,3,2,1,2,3,2,44.0,7.2,1,2,1 269 | 1,1,38.2,42,42,3,1,1,3,3,1,47.0,60.0,1,2,2 270 | 1,1,39.3,64,64,2,1,1,3,3,1,39.0,6.7,1,1,2 271 | 1,1,37.5,60,60,3,1,1,3,3,2,35.0,6.5,2,1,2 272 | 1,1,37.7,80,80,3,6,1,5,4,1,50.0,55.0,1,1,2 273 | 1,1,38.1,100,100,3,4,2,5,4,4,52.0,6.6,1,1,2 274 | 1,1,37.7,120,120,3,3,1,5,3,3,65.0,7.0,2,1,1 275 | 1,1,38.1,76,76,3,1,1,3,4,4,44.0,7.5,3,1,2 276 | 1,9,38.8,150,150,1,6,2,5,3,2,50.0,6.2,2,1,2 277 | 1,1,38.0,36,36,3,1,1,4,2,2,37.0,75.0,3,2,2 278 | 2,1,36.9,50,50,2,3,1,1,3,2,37.5,6.5,1,2,2 279 | 2,1,37.8,40,40,1,1,1,1,1,1,37.0,6.8,1,2,2 280 | 2,1,38.2,56,56,4,1,1,2,4,3,47.0,7.2,1,2,1 281 | 1,1,38.6,48,48,3,1,1,1,1,1,36.0,67.0,1,2,2 282 | 2,1,40.0,78,78,3,5,1,2,3,1,66.0,6.5,2,1,1 283 | 1,1,38.1,70,70,3,5,2,2,3,2,60.0,7.5,2,1,2 284 | 1,1,38.2,72,72,3,1,1,3,3,1,35.0,6.4,1,1,2 285 | 2,1,38.5,54,54,1,1,1,3,1,1,40.0,6.8,1,2,1 286 | 1,1,38.5,66,66,1,1,1,3,3,1,40.0,6.7,1,1,1 287 | 2,1,37.8,82,82,3,1,2,4,3,3,50.0,7.0,3,1,2 288 | 2,9,39.5,84,84,3,1,1,3,3,1,28.0,5.0,1,2,2 289 | 1,1,38.1,60,60,3,1,1,3,3,1,44.0,7.5,1,1,2 290 | 1,1,38.0,50,50,3,1,1,3,2,2,39.0,6.6,1,1,1 291 | 
2,1,38.6,45,45,2,2,1,1,1,1,43.0,58.0,1,2,2 292 | 1,1,38.9,80,80,3,3,1,2,3,3,54.0,6.5,2,1,2 293 | 1,1,37.0,66,66,1,2,1,4,3,3,35.0,6.9,2,1,2 294 | 1,1,38.1,78,78,3,3,1,3,3,1,43.0,62.0,3,2,2 295 | 2,1,38.5,40,40,1,1,1,2,1,1,37.0,67.0,1,2,2 296 | 1,1,38.1,120,120,4,4,2,2,4,1,55.0,65.0,3,2,2 297 | 2,1,37.2,72,72,3,4,2,4,3,3,44.0,7.5,3,1,1 298 | 1,1,37.5,72,72,4,4,1,4,4,3,60.0,6.8,2,1,2 299 | 1,1,36.5,100,100,3,3,1,3,3,3,50.0,6.0,1,1,1 300 | 1,1,37.2,40,40,3,1,1,3,3,1,36.0,62.0,3,2,2 301 | 2,1,38.5,54,54,3,2,2,3,4,1,42.0,6.3,1,2,1 302 | 2,1,37.6,48,48,3,1,1,3,3,1,44.0,6.3,1,2,1 303 | 1,1,37.7,44,44,3,3,2,5,4,4,45.0,70.0,1,1,2 304 | 1,1,37.0,56,56,3,4,2,4,4,3,35.0,61.0,3,2,2 305 | 2,1,38.0,42,42,3,3,1,1,3,1,37.0,5.8,1,2,2 306 | 1,1,38.1,60,60,3,1,1,3,4,1,42.0,72.0,1,1,2 307 | 2,1,38.4,80,80,3,2,1,3,2,1,54.0,6.9,1,2,2 308 | 2,1,37.8,48,48,2,2,1,3,3,1,48.0,7.3,1,2,1 309 | 2,1,37.9,45,45,3,3,2,2,3,1,33.0,5.7,1,1,1 310 | 2,1,39.0,84,84,3,5,1,2,4,2,62.0,5.9,2,1,1 311 | 2,1,38.2,60,60,3,3,2,3,3,2,53.0,7.5,1,2,1 312 | 1,1,38.1,140,140,3,4,2,5,4,4,30.0,69.0,2,2,2 313 | 1,1,37.9,120,120,3,3,1,5,4,4,52.0,6.6,2,1,1 314 | 2,1,38.0,72,72,1,3,1,3,3,2,38.0,6.8,1,2,1 315 | 2,9,38.0,92,92,1,2,1,1,3,2,37.0,6.1,1,2,1 316 | 1,1,38.3,66,66,2,1,1,2,4,3,37.0,6.0,1,1,2 317 | 2,1,37.5,48,48,3,1,1,2,1,1,43.0,6.0,1,2,1 318 | 1,1,37.5,88,88,2,3,1,4,3,3,35.0,6.4,2,1,2 319 | 2,9,38.1,150,150,4,4,2,5,4,4,44.0,7.5,2,1,2 320 | 1,1,39.7,100,100,3,6,2,4,4,3,65.0,75.0,3,1,2 321 | 1,1,38.3,80,80,3,4,2,5,4,3,45.0,7.5,1,1,1 322 | 2,1,37.5,40,40,3,3,1,3,2,3,32.0,6.4,1,1,1 323 | 1,1,38.4,84,84,3,5,2,4,3,3,47.0,7.5,2,1,2 324 | 1,1,38.1,84,84,4,4,2,5,3,1,60.0,6.8,2,1,1 325 | 2,1,38.7,52,52,1,1,1,1,3,1,4.0,74.0,1,2,2 326 | 2,1,38.1,44,44,2,3,1,3,3,1,35.0,6.8,1,2,2 327 | 2,1,38.4,52,52,2,3,1,1,3,2,41.0,63.0,1,2,2 328 | 1,1,38.2,60,60,1,3,1,2,1,1,43.0,6.2,1,1,1 329 | 2,1,37.7,40,40,1,1,1,3,2,1,36.0,3.5,1,2,2 330 | 1,1,39.1,60,60,3,1,1,2,3,1,44.0,7.5,1,1,2 331 | 2,1,37.8,48,48,1,1,1,3,1,1,43.0,7.5,1,2,2 332 | 1,1,39.0,120,120,4,5,2,2,4,3,65.0,8.2,1,2,2 333 | 1,1,38.2,76,76,2,2,1,5,3,3,35.0,6.5,1,1,1 334 | 2,1,38.3,88,88,3,6,1,3,3,1,44.0,7.5,2,2,2 335 | 1,1,38.0,80,80,3,3,1,3,3,1,48.0,8.3,1,1,2 336 | 1,1,38.1,60,60,3,1,1,2,3,3,44.0,7.5,2,1,1 337 | 1,1,37.6,40,40,1,1,1,1,1,1,44.0,7.5,1,1,1 338 | 2,1,37.5,44,44,1,1,1,3,3,2,45.0,5.8,1,2,1 339 | 2,1,38.2,42,42,1,3,1,1,3,1,35.0,60.0,1,2,2 340 | 2,1,38.0,56,56,3,3,1,3,1,1,47.0,70.0,1,2,2 341 | 2,1,38.3,45,45,3,2,2,2,4,1,44.0,7.5,1,2,2 342 | 1,1,38.1,48,48,1,3,1,3,4,1,42.0,8.0,1,1,2 343 | 1,1,37.7,55,55,2,2,1,2,3,3,44.0,7.5,1,1,2 344 | 2,1,36.0,100,100,4,6,2,2,4,3,74.0,5.7,3,1,1 345 | 1,1,37.1,60,60,2,4,1,3,3,3,64.0,8.5,1,1,1 346 | 2,1,37.1,114,114,3,3,2,2,2,1,32.0,7.5,1,2,2 347 | 1,1,38.1,72,72,3,3,1,4,4,3,37.0,56.0,1,1,2 348 | 1,1,37.0,44,44,3,1,2,1,1,1,40.0,6.7,1,1,2 349 | 1,1,38.6,48,48,3,1,1,4,3,1,37.0,75.0,1,1,2 350 | 1,1,38.1,82,82,3,4,1,2,3,3,53.0,65.0,3,1,2 351 | 1,9,38.2,78,78,4,6,1,3,3,3,59.0,5.8,2,1,1 352 | 2,1,37.8,60,60,1,3,1,2,3,2,41.0,73.0,3,2,2 353 | 1,1,38.7,34,34,2,3,1,2,3,1,33.0,69.0,3,1,2 354 | 1,1,38.1,36,36,1,1,1,1,2,1,44.0,7.5,1,1,1 355 | 2,1,38.3,44,44,3,1,1,3,3,1,6.4,36.0,1,1,2 356 | 2,1,37.4,54,54,3,1,1,3,4,3,30.0,7.1,1,1,1 357 | 1,1,38.1,60,60,4,1,2,2,4,1,54.0,76.0,1,1,2 358 | 1,1,36.6,48,48,3,3,1,4,1,1,27.0,56.0,3,1,2 359 | 1,1,38.5,90,90,1,3,1,3,3,3,47.0,79.0,1,1,2 360 | 1,1,38.1,75,75,1,4,1,5,3,3,58.0,8.5,1,1,1 361 | 2,1,38.2,42,42,3,1,1,1,1,2,35.0,5.9,1,2,2 362 | 1,9,38.2,78,78,4,6,1,3,3,3,59.0,5.8,2,1,1 363 | 2,1,38.6,60,60,1,3,1,4,2,2,40.0,6.0,1,1,2 364 | 
2,1,37.8,42,42,1,1,1,1,3,1,36.0,6.2,1,2,2 365 | 1,1,38.0,60,60,1,2,1,2,1,1,44.0,65.0,3,1,2 366 | 2,1,38.0,42,42,3,3,1,1,1,1,37.0,5.8,1,2,2 367 | 2,1,37.6,88,88,3,1,1,3,3,2,44.0,6.0,2,1,2 368 | -------------------------------------------------------------------------------- /data/small.csv: -------------------------------------------------------------------------------- 1 | h,e,t 2 | 185.0,rotten,2.3 3 | 153.0,great,4.5 4 | 163.0,bla,4.2 5 | 114.0,great,1.8 6 | 180.0,bla,7.1 7 | -------------------------------------------------------------------------------- /data/src/Manifest.toml: -------------------------------------------------------------------------------- 1 | # This file is machine-generated - editing it directly is not advised 2 | 3 | [[AbstractFFTs]] 4 | deps = ["LinearAlgebra"] 5 | git-tree-sha1 = "051c95d6836228d120f5f4b984dd5aba1624f716" 6 | uuid = "621f4979-c628-5d54-868e-fcf4e3e8185c" 7 | version = "0.5.0" 8 | 9 | [[Adapt]] 10 | deps = ["LinearAlgebra"] 11 | git-tree-sha1 = "0fac443759fa829ed8066db6cf1077d888bb6573" 12 | uuid = "79e6a3ab-5dfb-504d-930d-738a2a938a0e" 13 | version = "2.0.2" 14 | 15 | [[Arpack]] 16 | deps = ["Arpack_jll", "Libdl", "LinearAlgebra"] 17 | git-tree-sha1 = "2ff92b71ba1747c5fdd541f8fc87736d82f40ec9" 18 | uuid = "7d9fca2a-8960-54d3-9f78-7d1dccf2cb97" 19 | version = "0.4.0" 20 | 21 | [[Arpack_jll]] 22 | deps = ["Libdl", "OpenBLAS_jll", "Pkg"] 23 | git-tree-sha1 = "e214a9b9bd1b4e1b4f15b22c0994862b66af7ff7" 24 | uuid = "68821587-b530-5797-8361-c406ea357684" 25 | version = "3.5.0+3" 26 | 27 | [[ArrayInterface]] 28 | deps = ["LinearAlgebra", "Requires", "SparseArrays"] 29 | git-tree-sha1 = "0eccdcbe27fd6bd9cba3be31c67bdd435a21e865" 30 | uuid = "4fba245c-0d91-5ea0-9b3e-6abc04ee57a9" 31 | version = "2.9.1" 32 | 33 | [[AxisAlgorithms]] 34 | deps = ["LinearAlgebra", "Random", "SparseArrays", "WoodburyMatrices"] 35 | git-tree-sha1 = "a4d07a1c313392a77042855df46c5f534076fab9" 36 | uuid = "13072b0f-2c55-5437-9ae7-d433b7a33950" 37 | version = "1.0.0" 38 | 39 | [[BSON]] 40 | git-tree-sha1 = "dd36d7cf3d185eeaaf64db902c15174b22f5dafb" 41 | uuid = "fbb218c0-5317-5bc6-957e-2ee96dd4b1f0" 42 | version = "0.2.6" 43 | 44 | [[Base64]] 45 | uuid = "2a0f44e3-6c83-55bd-87e4-b1978d98bd5f" 46 | 47 | [[Bzip2_jll]] 48 | deps = ["Libdl", "Pkg"] 49 | git-tree-sha1 = "3663bfffede2ef41358b6fc2e1d8a6d50b3c3904" 50 | uuid = "6e34b625-4abd-537c-b88f-471c36dfa7a0" 51 | version = "1.0.6+2" 52 | 53 | [[CSV]] 54 | deps = ["CategoricalArrays", "DataFrames", "Dates", "FilePathsBase", "Mmap", "Parsers", "PooledArrays", "Tables", "Unicode", "WeakRefStrings"] 55 | git-tree-sha1 = "52a8e60c7822f53d57e4403b7f2811e7e1bdd32b" 56 | uuid = "336ed68f-0bac-5ca0-87d4-7b16caf5d00b" 57 | version = "0.6.2" 58 | 59 | [[CategoricalArrays]] 60 | deps = ["DataAPI", "Future", "JSON", "Missings", "Printf", "Statistics", "Unicode"] 61 | git-tree-sha1 = "a6c17353ee38ddab30e73dcfaa1107752de724ec" 62 | uuid = "324d7699-5711-5eae-9e2f-1d82baa6b597" 63 | version = "0.8.1" 64 | 65 | [[Clustering]] 66 | deps = ["Distances", "LinearAlgebra", "NearestNeighbors", "Printf", "SparseArrays", "Statistics", "StatsBase"] 67 | git-tree-sha1 = "b11c8d607af357776a046889a7c32567d05f1319" 68 | uuid = "aaaa29a8-35af-508c-8bc3-b662a17a0fe5" 69 | version = "0.14.1" 70 | 71 | [[CodecZlib]] 72 | deps = ["TranscodingStreams", "Zlib_jll"] 73 | git-tree-sha1 = "ded953804d019afa9a3f98981d99b33e3db7b6da" 74 | uuid = "944b1d66-785c-5afd-91f1-9de20f533193" 75 | version = "0.7.0" 76 | 77 | [[ColorSchemes]] 78 | deps = ["ColorTypes", "Colors", 
"FixedPointNumbers", "Random", "StaticArrays"] 79 | git-tree-sha1 = "7a15e3690529fd1042f0ab954dff7445b1efc8a5" 80 | uuid = "35d6a980-a343-548e-a6ea-1d62b119f2f4" 81 | version = "3.9.0" 82 | 83 | [[ColorTypes]] 84 | deps = ["FixedPointNumbers", "Random"] 85 | git-tree-sha1 = "6e7aa35d0294f647bb9c985ccc34d4f5d371a533" 86 | uuid = "3da002f7-5984-5a60-b8a6-cbb66c0b333f" 87 | version = "0.10.6" 88 | 89 | [[Colors]] 90 | deps = ["ColorTypes", "FixedPointNumbers", "InteractiveUtils", "Reexport"] 91 | git-tree-sha1 = "5639e44833cfcf78c6a73fbceb4da75611d312cd" 92 | uuid = "5ae59095-9a9b-59fe-a467-6f913c188581" 93 | version = "0.12.3" 94 | 95 | [[CommonSubexpressions]] 96 | deps = ["MacroTools", "Test"] 97 | git-tree-sha1 = "7b8a93dba8af7e3b42fecabf646260105ac373f7" 98 | uuid = "bbf7d656-a473-5ed7-a52c-81e309532950" 99 | version = "0.3.0" 100 | 101 | [[Compat]] 102 | deps = ["Base64", "Dates", "DelimitedFiles", "Distributed", "InteractiveUtils", "LibGit2", "Libdl", "LinearAlgebra", "Markdown", "Mmap", "Pkg", "Printf", "REPL", "Random", "SHA", "Serialization", "SharedArrays", "Sockets", "SparseArrays", "Statistics", "Test", "UUIDs", "Unicode"] 103 | git-tree-sha1 = "a6a8197ae253f2c1a22b2ae17c2dfaf5812c03aa" 104 | uuid = "34da2185-b29b-5c13-b0c7-acf172513d20" 105 | version = "3.13.0" 106 | 107 | [[CompilerSupportLibraries_jll]] 108 | deps = ["Libdl", "Pkg"] 109 | git-tree-sha1 = "7c4f882c41faa72118841185afc58a2eb00ef612" 110 | uuid = "e66e0078-7015-5450-92f7-15fbd957f2ae" 111 | version = "0.3.3+0" 112 | 113 | [[ComputationalResources]] 114 | git-tree-sha1 = "52cb3ec90e8a8bea0e62e275ba577ad0f74821f7" 115 | uuid = "ed09eef8-17a6-5b46-8889-db040fac31e3" 116 | version = "0.3.2" 117 | 118 | [[Conda]] 119 | deps = ["JSON", "VersionParsing"] 120 | git-tree-sha1 = "7a58bb32ce5d85f8bf7559aa7c2842f9aecf52fc" 121 | uuid = "8f4d0f93-b110-5947-807f-2305c1781a2d" 122 | version = "1.4.1" 123 | 124 | [[Contour]] 125 | deps = ["StaticArrays"] 126 | git-tree-sha1 = "81685fee51fc5168898e3cbd8b0f01506cd9148e" 127 | uuid = "d38c429a-6771-53c6-b99e-75d170b6e991" 128 | version = "0.5.4" 129 | 130 | [[Crayons]] 131 | git-tree-sha1 = "c437a9c2114c7ba19322712e58942b383ffbd6c0" 132 | uuid = "a8cc5b0e-0ffa-5ad4-8c14-923d3ee1735f" 133 | version = "4.0.3" 134 | 135 | [[DataAPI]] 136 | git-tree-sha1 = "176e23402d80e7743fc26c19c681bfb11246af32" 137 | uuid = "9a962f9c-6df0-11e9-0e5d-c546b8b5ee8a" 138 | version = "1.3.0" 139 | 140 | [[DataFrames]] 141 | deps = ["CategoricalArrays", "Compat", "DataAPI", "Future", "InvertedIndices", "IteratorInterfaceExtensions", "Missings", "PooledArrays", "Printf", "REPL", "Reexport", "SortingAlgorithms", "Statistics", "TableTraits", "Tables", "Unicode"] 142 | git-tree-sha1 = "d4436b646615928b634b37e99a3288588072f851" 143 | uuid = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0" 144 | version = "0.21.4" 145 | 146 | [[DataStructures]] 147 | deps = ["InteractiveUtils", "OrderedCollections"] 148 | git-tree-sha1 = "edad9434967fdc0a2631a65d902228400642120c" 149 | uuid = "864edb3b-99cc-5e75-8d2d-829cb0a9cfe8" 150 | version = "0.17.19" 151 | 152 | [[DataValueInterfaces]] 153 | git-tree-sha1 = "bfc1187b79289637fa0ef6d4436ebdfe6905cbd6" 154 | uuid = "e2d170a0-9d28-54be-80f0-106bbe20a464" 155 | version = "1.0.0" 156 | 157 | [[DataValues]] 158 | deps = ["DataValueInterfaces", "Dates"] 159 | git-tree-sha1 = "d88a19299eba280a6d062e135a43f00323ae70bf" 160 | uuid = "e7dc6d0d-1eca-5fa6-8ad6-5aecde8b7ea5" 161 | version = "0.4.13" 162 | 163 | [[Dates]] 164 | deps = ["Printf"] 165 | uuid = "ade2ca70-3891-5945-98fb-dc099432e06a" 
166 | 167 | [[DecisionTree]] 168 | deps = ["DelimitedFiles", "Distributed", "LinearAlgebra", "Random", "ScikitLearnBase", "Statistics", "Test"] 169 | git-tree-sha1 = "9faa81d6e611cf00d16d4dabbd60a325ada72a83" 170 | uuid = "7806a523-6efd-50cb-b5f6-3fa6f1930dbb" 171 | version = "0.10.7" 172 | 173 | [[DelimitedFiles]] 174 | deps = ["Mmap"] 175 | uuid = "8bb1440f-4735-579b-a4ab-409b98df4dab" 176 | 177 | [[DiffResults]] 178 | deps = ["StaticArrays"] 179 | git-tree-sha1 = "da24935df8e0c6cf28de340b958f6aac88eaa0cc" 180 | uuid = "163ba53b-c6d8-5494-b064-1a9d43ac40c5" 181 | version = "1.0.2" 182 | 183 | [[DiffRules]] 184 | deps = ["NaNMath", "Random", "SpecialFunctions"] 185 | git-tree-sha1 = "eb0c34204c8410888844ada5359ac8b96292cfd1" 186 | uuid = "b552c78f-8df3-52c6-915a-8e097449b14b" 187 | version = "1.0.1" 188 | 189 | [[Distances]] 190 | deps = ["LinearAlgebra", "Statistics"] 191 | git-tree-sha1 = "23717536c81b63e250f682b0e0933769eecd1411" 192 | uuid = "b4f34e82-e78d-54a5-968a-f98e89d6e8f7" 193 | version = "0.8.2" 194 | 195 | [[Distributed]] 196 | deps = ["Random", "Serialization", "Sockets"] 197 | uuid = "8ba89e20-285c-5b6f-9357-94700520ee1b" 198 | 199 | [[Distributions]] 200 | deps = ["FillArrays", "LinearAlgebra", "PDMats", "Printf", "QuadGK", "Random", "SpecialFunctions", "Statistics", "StatsBase", "StatsFuns"] 201 | git-tree-sha1 = "78c4c32a2357a00a0a7d614880f02c2c6e1ec73c" 202 | uuid = "31c24e10-a181-5473-b8eb-7969acd0382f" 203 | version = "0.23.4" 204 | 205 | [[DocStringExtensions]] 206 | deps = ["LibGit2", "Markdown", "Pkg", "Test"] 207 | git-tree-sha1 = "c5714d9bcdba66389612dc4c47ed827c64112997" 208 | uuid = "ffbed154-4ef7-542d-bbb7-c09d3a79fcae" 209 | version = "0.8.2" 210 | 211 | [[Documenter]] 212 | deps = ["Base64", "Dates", "DocStringExtensions", "InteractiveUtils", "JSON", "LibGit2", "Logging", "Markdown", "REPL", "Test", "Unicode"] 213 | git-tree-sha1 = "395fa1554c69735802bba37d9e7d9586fd44326c" 214 | uuid = "e30172f5-a6a5-5a46-863b-614d45cd2de4" 215 | version = "0.24.11" 216 | 217 | [[EvoTrees]] 218 | deps = ["CategoricalArrays", "Distributions", "MLJModelInterface", "Random", "StaticArrays", "Statistics", "StatsBase"] 219 | git-tree-sha1 = "2608d6cd10db187b7ef96c2197f809c04a1ac735" 220 | uuid = "f6006082-12f8-11e9-0c9c-0d5d367ab1e5" 221 | version = "0.4.9" 222 | 223 | [[ExprTools]] 224 | git-tree-sha1 = "6f0517056812fd6aa3af23d4b70d5325a2ae4e95" 225 | uuid = "e2ba6199-217a-4e67-a87a-7c52f15ade04" 226 | version = "0.1.1" 227 | 228 | [[EzXML]] 229 | deps = ["Printf", "XML2_jll"] 230 | git-tree-sha1 = "0fa3b52a04a4e210aeb1626def9c90df3ae65268" 231 | uuid = "8f5d6c58-4d21-5cfd-889c-e3ad7ee6a615" 232 | version = "1.1.0" 233 | 234 | [[FFMPEG]] 235 | deps = ["FFMPEG_jll"] 236 | git-tree-sha1 = "c82bef6fc01e30d500f588cd01d29bdd44f1924e" 237 | uuid = "c87230d0-a227-11e9-1b43-d7ebe4e7570a" 238 | version = "0.3.0" 239 | 240 | [[FFMPEG_jll]] 241 | deps = ["Bzip2_jll", "FreeType2_jll", "FriBidi_jll", "LAME_jll", "LibVPX_jll", "Libdl", "Ogg_jll", "OpenSSL_jll", "Opus_jll", "Pkg", "Zlib_jll", "libass_jll", "libfdk_aac_jll", "libvorbis_jll", "x264_jll", "x265_jll"] 242 | git-tree-sha1 = "0fa07f43e5609ea54848b82b4bb330b250e9645b" 243 | uuid = "b22a6f82-2f65-5046-a5b2-351ab43fb4e5" 244 | version = "4.1.0+3" 245 | 246 | [[FFTW]] 247 | deps = ["AbstractFFTs", "FFTW_jll", "IntelOpenMP_jll", "Libdl", "LinearAlgebra", "MKL_jll", "Reexport"] 248 | git-tree-sha1 = "14536c95939aadcee44014728a459d2fe3ca9acf" 249 | uuid = "7a1cc6ca-52ef-59f5-83cd-3a7055c09341" 250 | version = "1.2.2" 251 | 252 | 
[[FFTW_jll]] 253 | deps = ["Libdl", "Pkg"] 254 | git-tree-sha1 = "6c975cd606128d45d1df432fb812d6eb10fee00b" 255 | uuid = "f5851436-0d7a-5f13-b9de-f02708fd171a" 256 | version = "3.3.9+5" 257 | 258 | [[FileIO]] 259 | deps = ["Pkg"] 260 | git-tree-sha1 = "202335fd24c2776493e198d6c66a6d910400a895" 261 | uuid = "5789e2e9-d7fb-5bc7-8068-2c6fae9b9549" 262 | version = "1.3.0" 263 | 264 | [[FilePathsBase]] 265 | deps = ["Dates", "LinearAlgebra", "Printf", "Test", "UUIDs"] 266 | git-tree-sha1 = "923fd3b942a11712435682eaa95cc8518c428b2c" 267 | uuid = "48062228-2e41-5def-b9a4-89aafe57970f" 268 | version = "0.8.0" 269 | 270 | [[FileWatching]] 271 | uuid = "7b1f6079-737a-58dc-b8bc-7a2ca5c1b5ee" 272 | 273 | [[FillArrays]] 274 | deps = ["LinearAlgebra", "Random", "SparseArrays"] 275 | git-tree-sha1 = "be4180bdb27a11188d694ee3773122f4921f1a62" 276 | uuid = "1a297f60-69ca-5386-bcde-b61e274b549b" 277 | version = "0.8.13" 278 | 279 | [[FiniteDiff]] 280 | deps = ["ArrayInterface", "LinearAlgebra", "Requires", "SparseArrays", "StaticArrays"] 281 | git-tree-sha1 = "b02b6f6ea2c33f86a444f9cf132c1d1180a66cfd" 282 | uuid = "6a86dc24-6348-571c-b903-95158fe2bd41" 283 | version = "2.4.1" 284 | 285 | [[FixedPointNumbers]] 286 | deps = ["Statistics"] 287 | git-tree-sha1 = "266baee2e9d875cb7a3bfdcc6cab553c543ff8ab" 288 | uuid = "53c48c17-4a7d-5ca2-90c5-79b7896eea93" 289 | version = "0.8.2" 290 | 291 | [[Formatting]] 292 | deps = ["Printf"] 293 | git-tree-sha1 = "a0c901c29c0e7c763342751c0a94211d56c0de5c" 294 | uuid = "59287772-0a20-5a39-b81b-1366585eb4c0" 295 | version = "0.4.1" 296 | 297 | [[ForwardDiff]] 298 | deps = ["CommonSubexpressions", "DiffResults", "DiffRules", "NaNMath", "Random", "SpecialFunctions", "StaticArrays"] 299 | git-tree-sha1 = "1d090099fb82223abc48f7ce176d3f7696ede36d" 300 | uuid = "f6369f11-7733-5829-9624-2563aa707210" 301 | version = "0.10.12" 302 | 303 | [[Franklin]] 304 | deps = ["Crayons", "Dates", "DelimitedFiles", "DocStringExtensions", "FranklinTemplates", "HTTP", "Literate", "LiveServer", "Logging", "Markdown", "NodeJS", "OrderedCollections", "Pkg", "Random"] 305 | git-tree-sha1 = "c79cc974f019c23e8e5841772070b60c42cdef1f" 306 | uuid = "713c75ef-9fc9-4b05-94a9-213340da978e" 307 | version = "0.8.6" 308 | 309 | [[FranklinTemplates]] 310 | git-tree-sha1 = "dc509923f200b7385ffe699d82aca084aede014b" 311 | uuid = "3a985190-f512-4703-8d38-2a7944ed5916" 312 | version = "0.7.2" 313 | 314 | [[FreeType2_jll]] 315 | deps = ["Bzip2_jll", "Libdl", "Pkg", "Zlib_jll"] 316 | git-tree-sha1 = "7d900f32a3788d4eacac2bfa3bf5c770179c8afd" 317 | uuid = "d7e528f0-a631-5988-bf34-fe36492bcfd7" 318 | version = "2.10.1+2" 319 | 320 | [[FriBidi_jll]] 321 | deps = ["Libdl", "Pkg"] 322 | git-tree-sha1 = "2f56bee16bd0151de7b6a1eeea2ced190a2ad8d4" 323 | uuid = "559328eb-81f9-559d-9380-de523a88c83c" 324 | version = "1.0.5+3" 325 | 326 | [[Future]] 327 | deps = ["Random"] 328 | uuid = "9fa8497b-333b-5362-9e8d-4d0656e87820" 329 | 330 | [[GLM]] 331 | deps = ["Distributions", "LinearAlgebra", "Printf", "Random", "Reexport", "SparseArrays", "SpecialFunctions", "Statistics", "StatsBase", "StatsFuns", "StatsModels"] 332 | git-tree-sha1 = "db0ace36f9dbe7b6a7a08434c5921377e9df2c72" 333 | uuid = "38e38edf-8417-5370-95a0-9cbb8c7f171a" 334 | version = "1.3.9" 335 | 336 | [[GR]] 337 | deps = ["Base64", "DelimitedFiles", "HTTP", "JSON", "LinearAlgebra", "Printf", "Random", "Serialization", "Sockets", "Test", "UUIDs"] 338 | git-tree-sha1 = "e26c513329675092535de20cc4bb9c579c8f85a0" 339 | uuid = "28b8d3ca-fb5f-59d9-8090-bfdbd6d07a71" 340 | 
version = "0.51.0" 341 | 342 | [[GeometryBasics]] 343 | deps = ["IterTools", "LinearAlgebra", "StaticArrays", "StructArrays", "Tables"] 344 | git-tree-sha1 = "119f32f9c2b497b49cd3f7f513b358b82660294c" 345 | uuid = "5c1252a2-5f33-56bf-86c9-59e7332b4326" 346 | version = "0.2.15" 347 | 348 | [[GeometryTypes]] 349 | deps = ["ColorTypes", "FixedPointNumbers", "LinearAlgebra", "StaticArrays"] 350 | git-tree-sha1 = "34bfa994967e893ab2f17b864eec221b3521ba4d" 351 | uuid = "4d00f742-c7ba-57c2-abde-4428a4b178cb" 352 | version = "0.8.3" 353 | 354 | [[HTTP]] 355 | deps = ["Base64", "Dates", "IniFile", "MbedTLS", "Sockets"] 356 | git-tree-sha1 = "eca61b35cdd8cd2fcc5eec1eda766424a995b02f" 357 | uuid = "cd3eb016-35fb-5094-929b-558a96fad6f3" 358 | version = "0.8.16" 359 | 360 | [[IniFile]] 361 | deps = ["Test"] 362 | git-tree-sha1 = "098e4d2c533924c921f9f9847274f2ad89e018b8" 363 | uuid = "83e8ac13-25f8-5344-8a64-a9f2b223428f" 364 | version = "0.5.0" 365 | 366 | [[IntelOpenMP_jll]] 367 | deps = ["Libdl", "Pkg"] 368 | git-tree-sha1 = "fb8e1c7a5594ba56f9011310790e03b5384998d6" 369 | uuid = "1d5cc7b8-4909-519e-a0f8-d0f5ad9712d0" 370 | version = "2018.0.3+0" 371 | 372 | [[InteractiveUtils]] 373 | deps = ["Markdown"] 374 | uuid = "b77e0a4c-d291-57a0-90e8-8db25a27a240" 375 | 376 | [[Interpolations]] 377 | deps = ["AxisAlgorithms", "LinearAlgebra", "OffsetArrays", "Random", "Ratios", "SharedArrays", "SparseArrays", "StaticArrays", "WoodburyMatrices"] 378 | git-tree-sha1 = "2b7d4e9be8b74f03115e64cf36ed2f48ae83d946" 379 | uuid = "a98d9a8b-a2ab-59e6-89dd-64a1c18fca59" 380 | version = "0.12.10" 381 | 382 | [[InvertedIndices]] 383 | deps = ["Test"] 384 | git-tree-sha1 = "15732c475062348b0165684ffe28e85ea8396afc" 385 | uuid = "41ab1584-1d38-5bbf-9106-f11c6c58b48f" 386 | version = "1.0.0" 387 | 388 | [[IterTools]] 389 | git-tree-sha1 = "05110a2ab1fc5f932622ffea2a003221f4782c18" 390 | uuid = "c8e1da08-722c-5040-9ed9-7db0dc04731e" 391 | version = "1.3.0" 392 | 393 | [[IterativeSolvers]] 394 | deps = ["LinearAlgebra", "Printf", "Random", "RecipesBase", "SparseArrays"] 395 | git-tree-sha1 = "3b7e2aac8c94444947facea7cc7ca91c49169be0" 396 | uuid = "42fd0dbc-a981-5370-80f2-aaf504508153" 397 | version = "0.8.4" 398 | 399 | [[IteratorInterfaceExtensions]] 400 | git-tree-sha1 = "a3f24677c21f5bbe9d2a714f95dcd58337fb2856" 401 | uuid = "82899510-4779-5014-852e-03e436cf321d" 402 | version = "1.0.0" 403 | 404 | [[JLSO]] 405 | deps = ["BSON", "CodecZlib", "FilePathsBase", "Memento", "Pkg", "Serialization"] 406 | git-tree-sha1 = "9dc0c7a4b7527806e53f524ccd66be0cd9e75e2e" 407 | uuid = "9da8a3cd-07a3-59c0-a743-3fdc52c30d11" 408 | version = "2.3.2" 409 | 410 | [[JSON]] 411 | deps = ["Dates", "Mmap", "Parsers", "Unicode"] 412 | git-tree-sha1 = "b34d7cef7b337321e97d22242c3c2b91f476748e" 413 | uuid = "682c06a0-de6a-54ab-a142-c8b1cf79cde6" 414 | version = "0.21.0" 415 | 416 | [[KernelDensity]] 417 | deps = ["Distributions", "FFTW", "Interpolations", "Optim", "StatsBase", "Test"] 418 | git-tree-sha1 = "c1048817fe5711f699abc8fabd47b1ac6ba4db04" 419 | uuid = "5ab0869b-81aa-558d-bb23-cbf5423bbe9b" 420 | version = "0.5.1" 421 | 422 | [[LAME_jll]] 423 | deps = ["Libdl", "Pkg"] 424 | git-tree-sha1 = "221cc8998b9060677448cbb6375f00032554c4fd" 425 | uuid = "c1c5ebd0-6772-5130-a774-d5fcae4a789d" 426 | version = "3.100.0+1" 427 | 428 | [[LIBLINEAR]] 429 | deps = ["DelimitedFiles", "Libdl", "SparseArrays", "Test"] 430 | git-tree-sha1 = "42cacc29d9b4ae77b6702c181bbfa58f14d8ef7a" 431 | uuid = "2d691ee1-e668-5016-a719-b2531b85e0f5" 432 | version = "0.5.1" 433 
| 434 | [[LIBSVM]] 435 | deps = ["Compat", "LIBLINEAR", "Libdl", "ScikitLearnBase", "SparseArrays"] 436 | git-tree-sha1 = "05d574c6598bce023ba6f2d2aa99ffd4f8e00789" 437 | uuid = "b1bec4e5-fd48-53fe-b0cb-9723c09d164b" 438 | version = "0.4.0" 439 | 440 | [[LaTeXStrings]] 441 | git-tree-sha1 = "de44b395389b84fd681394d4e8d39ef14e3a2ea8" 442 | uuid = "b964fa9f-0449-5b57-a5c2-d3ea65f4040f" 443 | version = "1.1.0" 444 | 445 | [[LearnBase]] 446 | git-tree-sha1 = "a0d90569edd490b82fdc4dc078ea54a5a800d30a" 447 | uuid = "7f8f8fb0-2700-5f03-b4bd-41f8cfc144b6" 448 | version = "0.4.1" 449 | 450 | [[LibGit2]] 451 | deps = ["Printf"] 452 | uuid = "76f85450-5226-5b5a-8eaa-529ad045b433" 453 | 454 | [[LibVPX_jll]] 455 | deps = ["Libdl", "Pkg"] 456 | git-tree-sha1 = "e3549ca9bf35feb9d9d954f4c6a9032e92f46e7c" 457 | uuid = "dd192d2f-8180-539f-9fb4-cc70b1dcf69a" 458 | version = "1.8.1+1" 459 | 460 | [[Libdl]] 461 | uuid = "8f399da3-3557-5675-b5ff-fb832c97cbdb" 462 | 463 | [[Libiconv_jll]] 464 | deps = ["Libdl", "Pkg"] 465 | git-tree-sha1 = "c9d4035d7481bcdff2babf5a55525a818ef8ed8f" 466 | uuid = "94ce4f54-9a6c-5748-9c1c-f9c7231a4531" 467 | version = "1.16.0+5" 468 | 469 | [[LightGBM]] 470 | deps = ["Dates", "Libdl", "MLJModelInterface", "StatsBase"] 471 | git-tree-sha1 = "cae192532a16a84190935389dae1a3a9cdc92ce4" 472 | uuid = "7acf609c-83a4-11e9-1ffb-b912bcd3b04a" 473 | version = "0.3.1" 474 | 475 | [[LineSearches]] 476 | deps = ["LinearAlgebra", "NLSolversBase", "NaNMath", "Parameters", "Printf", "Test"] 477 | git-tree-sha1 = "54eb90e8dbe745d617c78dee1d6ae95c7f6f5779" 478 | uuid = "d3d80556-e9d4-5f37-9878-2ab0fcc64255" 479 | version = "7.0.1" 480 | 481 | [[LinearAlgebra]] 482 | deps = ["Libdl"] 483 | uuid = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e" 484 | 485 | [[LinearMaps]] 486 | deps = ["LinearAlgebra", "SparseArrays"] 487 | git-tree-sha1 = "e204a96dbb8d49fbca24086c586734435d7bf5b5" 488 | uuid = "7a12625a-238d-50fd-b39a-03d52299707e" 489 | version = "2.6.1" 490 | 491 | [[Literate]] 492 | deps = ["Base64", "JSON", "REPL"] 493 | git-tree-sha1 = "422133037d6dc5df9f9b97c2cb81fcd9e35ddffe" 494 | uuid = "98b081ad-f1c9-55d3-8b20-4c87d4299306" 495 | version = "2.5.0" 496 | 497 | [[LiveServer]] 498 | deps = ["Crayons", "Documenter", "FileWatching", "HTTP", "Pkg", "Sockets", "Test"] 499 | git-tree-sha1 = "452307c337d1f625e7475d3e1a028cc5f1ca2fcb" 500 | uuid = "16fef848-5104-11e9-1b77-fb7a48bbb589" 501 | version = "0.5.0" 502 | 503 | [[Logging]] 504 | uuid = "56ddb016-857b-54e1-b83d-db4d58db5568" 505 | 506 | [[LossFunctions]] 507 | deps = ["LearnBase", "Markdown", "RecipesBase", "SparseArrays", "StatsBase"] 508 | git-tree-sha1 = "3cd347266e394a066ca7f17bd8ff589ff5ce1d35" 509 | uuid = "30fc2ffe-d236-52d8-8643-a9d8f7c094a7" 510 | version = "0.6.2" 511 | 512 | [[MKL_jll]] 513 | deps = ["IntelOpenMP_jll", "Libdl", "Pkg"] 514 | git-tree-sha1 = "0ce9a7fa68c70cf83c49d05d2c04d91b47404b08" 515 | uuid = "856f044c-d86e-5d09-b602-aeab76dc8ba7" 516 | version = "2020.1.216+0" 517 | 518 | [[MLJ]] 519 | deps = ["CategoricalArrays", "ComputationalResources", "Distributed", "Distributions", "LinearAlgebra", "MLJBase", "MLJModels", "MLJScientificTypes", "MLJTuning", "Pkg", "ProgressMeter", "Random", "Statistics", "StatsBase", "Tables"] 520 | git-tree-sha1 = "724663b1628522d83cb58189e57819f82d41063f" 521 | uuid = "add582a8-e3ab-11e8-2d5e-e98b27df1bc7" 522 | version = "0.11.6" 523 | 524 | [[MLJBase]] 525 | deps = ["CategoricalArrays", "ComputationalResources", "Dates", "DelimitedFiles", "Distributed", "Distributions", "HTTP", 
"InteractiveUtils", "InvertedIndices", "JLSO", "JSON", "LinearAlgebra", "LossFunctions", "MLJModelInterface", "MLJScientificTypes", "Missings", "OrderedCollections", "Parameters", "PrettyTables", "ProgressMeter", "Random", "ScientificTypes", "Statistics", "StatsBase", "Tables"] 526 | git-tree-sha1 = "d8ba2063ffaaa7f0fe91ea5455a7bf838c1424ac" 527 | uuid = "a7f614a8-145f-11e9-1d2a-a57a1082229d" 528 | version = "0.13.10" 529 | 530 | [[MLJLinearModels]] 531 | deps = ["DocStringExtensions", "IterativeSolvers", "LinearAlgebra", "LinearMaps", "MLJModelInterface", "Optim", "Parameters"] 532 | git-tree-sha1 = "01e7a3dc5c07982315c9163bbc3ad9d08811ea8e" 533 | uuid = "6ee0df7b-362f-4a72-a706-9e79364fb692" 534 | version = "0.5.0" 535 | 536 | [[MLJModelInterface]] 537 | deps = ["Random", "ScientificTypes"] 538 | git-tree-sha1 = "b02b13fde7b0dc301adc070d650405aa4909e657" 539 | uuid = "e80e1ace-859a-464e-9ed9-23947d8ae3ea" 540 | version = "0.3.0" 541 | 542 | [[MLJModels]] 543 | deps = ["CategoricalArrays", "Dates", "Distances", "Distributions", "InteractiveUtils", "LinearAlgebra", "MLJBase", "MLJModelInterface", "MultivariateStats", "OrderedCollections", "Parameters", "Pkg", "Random", "Requires", "ScientificTypes", "Statistics", "StatsBase", "Tables"] 544 | git-tree-sha1 = "3a434db580e736e23643867cd7c7e3ccaeafb31d" 545 | uuid = "d491faf4-2d78-11e9-2867-c94bc002c0b7" 546 | version = "0.10.1" 547 | 548 | [[MLJScientificTypes]] 549 | deps = ["CategoricalArrays", "ColorTypes", "Dates", "PrettyTables", "ScientificTypes", "Tables"] 550 | git-tree-sha1 = "c85856fca1302f7fd7d46dd72db7cf43d93777d9" 551 | uuid = "2e2323e0-db8b-457b-ae0d-bdfb3bc63afd" 552 | version = "0.2.8" 553 | 554 | [[MLJScikitLearnInterface]] 555 | deps = ["MLJModelInterface", "PyCall", "ScikitLearn"] 556 | git-tree-sha1 = "9202b249509ec05fd8a5e71b278f42b491f4f324" 557 | uuid = "5ae90465-5518-4432-b9d2-8a1def2f0cab" 558 | version = "0.1.5" 559 | 560 | [[MLJTuning]] 561 | deps = ["ComputationalResources", "Distributed", "Distributions", "MLJBase", "MLJModelInterface", "ProgressMeter", "Random", "RecipesBase"] 562 | git-tree-sha1 = "f9aa8dafd3dc4b8d195aa1b5518188cfd3e181e1" 563 | uuid = "03970b2e-30c4-11ea-3135-d1576263f10f" 564 | version = "0.3.6" 565 | 566 | [[MacroTools]] 567 | deps = ["Markdown", "Random"] 568 | git-tree-sha1 = "f7d2e3f654af75f01ec49be82c231c382214223a" 569 | uuid = "1914dd2f-81c6-5fcd-8719-6d5c9610ff09" 570 | version = "0.5.5" 571 | 572 | [[Markdown]] 573 | deps = ["Base64"] 574 | uuid = "d6f4376e-aef5-505a-96c1-9c027394607a" 575 | 576 | [[MbedTLS]] 577 | deps = ["Dates", "MbedTLS_jll", "Random", "Sockets"] 578 | git-tree-sha1 = "426a6978b03a97ceb7ead77775a1da066343ec6e" 579 | uuid = "739be429-bea8-5141-9913-cc70e7f3736d" 580 | version = "1.0.2" 581 | 582 | [[MbedTLS_jll]] 583 | deps = ["Libdl", "Pkg"] 584 | git-tree-sha1 = "a0cb0d489819fa7ea5f9fa84c7e7eba19d8073af" 585 | uuid = "c8ffd9c3-330d-5841-b78e-0817d7145fa1" 586 | version = "2.16.6+1" 587 | 588 | [[Measures]] 589 | git-tree-sha1 = "e498ddeee6f9fdb4551ce855a46f54dbd900245f" 590 | uuid = "442fdcdd-2543-5da2-b0f3-8c86c306513e" 591 | version = "0.3.1" 592 | 593 | [[Memento]] 594 | deps = ["Dates", "Distributed", "JSON", "Serialization", "Sockets", "Syslogs", "Test", "TimeZones", "UUIDs"] 595 | git-tree-sha1 = "31921ad09307dd9ad693da3213a218152fadb8f2" 596 | uuid = "f28f55f0-a522-5efc-85c2-fe41dfb9b2d9" 597 | version = "1.1.0" 598 | 599 | [[Missings]] 600 | deps = ["DataAPI"] 601 | git-tree-sha1 = "de0a5ce9e5289f27df672ffabef4d1e5861247d5" 602 | uuid = 
"e1d29d7a-bbdc-5cf2-9ac0-f12de2c33e28" 603 | version = "0.4.3" 604 | 605 | [[Mmap]] 606 | uuid = "a63ad114-7e13-5084-954f-fe012c677804" 607 | 608 | [[Mocking]] 609 | deps = ["ExprTools"] 610 | git-tree-sha1 = "916b850daad0d46b8c71f65f719c49957e9513ed" 611 | uuid = "78c3b35d-d492-501b-9361-3d52fe80e533" 612 | version = "0.7.1" 613 | 614 | [[MultivariateStats]] 615 | deps = ["Arpack", "LinearAlgebra", "SparseArrays", "Statistics", "StatsBase"] 616 | git-tree-sha1 = "352fae519b447bf52e6de627b89f448bcd469e4e" 617 | uuid = "6f286f6a-111f-5878-ab1e-185364afe411" 618 | version = "0.7.0" 619 | 620 | [[NLSolversBase]] 621 | deps = ["DiffResults", "Distributed", "FiniteDiff", "ForwardDiff"] 622 | git-tree-sha1 = "7c4e66c47848562003250f28b579c584e55becc0" 623 | uuid = "d41bc354-129a-5804-8e4c-c37616107c6c" 624 | version = "7.6.1" 625 | 626 | [[NaNMath]] 627 | git-tree-sha1 = "c84c576296d0e2fbb3fc134d3e09086b3ea617cd" 628 | uuid = "77ba4419-2d1f-58cd-9bb1-8ffee604a2e3" 629 | version = "0.3.4" 630 | 631 | [[NearestNeighbors]] 632 | deps = ["Distances", "StaticArrays"] 633 | git-tree-sha1 = "8bc6180f328f3c0ea2663935db880d34c57d6eae" 634 | uuid = "b8a86587-4115-5ab1-83bc-aa920d37bbce" 635 | version = "0.4.4" 636 | 637 | [[NodeJS]] 638 | deps = ["Pkg"] 639 | git-tree-sha1 = "350ac618f41958e6e0f6b0d2005ae4547eb1b503" 640 | uuid = "2bd173c7-0d6d-553b-b6af-13a54713934c" 641 | version = "1.1.1" 642 | 643 | [[Observables]] 644 | git-tree-sha1 = "11832878355305984235a2e90d0e3737383c634c" 645 | uuid = "510215fc-4207-5dde-b226-833fc4488ee2" 646 | version = "0.3.1" 647 | 648 | [[OffsetArrays]] 649 | git-tree-sha1 = "4ba4cd84c88df8340da1c3e2d8dcb9d18dd1b53b" 650 | uuid = "6fe1bfb0-de20-5000-8ca7-80f57d26f881" 651 | version = "1.1.1" 652 | 653 | [[Ogg_jll]] 654 | deps = ["Libdl", "Pkg"] 655 | git-tree-sha1 = "59cf7a95bf5ac39feac80b796e0f39f9d69dc887" 656 | uuid = "e7412a2a-1a6e-54c0-be00-318e2571c051" 657 | version = "1.3.4+0" 658 | 659 | [[OpenBLAS_jll]] 660 | deps = ["CompilerSupportLibraries_jll", "Libdl", "Pkg"] 661 | git-tree-sha1 = "0c922fd9634e358622e333fc58de61f05a048492" 662 | uuid = "4536629a-c528-5b80-bd46-f80d51c5b363" 663 | version = "0.3.9+5" 664 | 665 | [[OpenSSL_jll]] 666 | deps = ["Libdl", "Pkg"] 667 | git-tree-sha1 = "7aaaded15bf393b5f34c2aad5b765c18d26cb495" 668 | uuid = "458c3c95-2e84-50aa-8efc-19380b2a3a95" 669 | version = "1.1.1+4" 670 | 671 | [[OpenSpecFun_jll]] 672 | deps = ["CompilerSupportLibraries_jll", "Libdl", "Pkg"] 673 | git-tree-sha1 = "d51c416559217d974a1113522d5919235ae67a87" 674 | uuid = "efe28fd5-8261-553b-a9e1-b2916fc3738e" 675 | version = "0.5.3+3" 676 | 677 | [[Optim]] 678 | deps = ["Compat", "FillArrays", "LineSearches", "LinearAlgebra", "NLSolversBase", "NaNMath", "Parameters", "PositiveFactorizations", "Printf", "SparseArrays", "StatsBase"] 679 | git-tree-sha1 = "33af70b64e8ce2f2b857e3d5de7b71f67715c121" 680 | uuid = "429524aa-4258-5aef-a3af-852621145aeb" 681 | version = "0.21.0" 682 | 683 | [[Opus_jll]] 684 | deps = ["Libdl", "Pkg"] 685 | git-tree-sha1 = "002c18f222a542907e16c83c64a1338992da7e2c" 686 | uuid = "91d4177d-7536-5919-b921-800302f37372" 687 | version = "1.3.1+1" 688 | 689 | [[OrderedCollections]] 690 | git-tree-sha1 = "293b70ac1780f9584c89268a6e2a560d938a7065" 691 | uuid = "bac558e1-5e72-5ebc-8fee-abe8a469f55d" 692 | version = "1.3.0" 693 | 694 | [[PDMats]] 695 | deps = ["Arpack", "LinearAlgebra", "SparseArrays", "SuiteSparse", "Test"] 696 | git-tree-sha1 = "2fc6f50ddd959e462f0a2dbc802ddf2a539c6e35" 697 | uuid = "90014a1f-27ba-587c-ab20-58faa44d9150" 698 | 
version = "0.9.12" 699 | 700 | [[Parameters]] 701 | deps = ["OrderedCollections", "UnPack"] 702 | git-tree-sha1 = "38b2e970043613c187bd56a995fe2e551821eb4a" 703 | uuid = "d96e819e-fc66-5662-9728-84c9c7592b0a" 704 | version = "0.12.1" 705 | 706 | [[Parsers]] 707 | deps = ["Dates", "Test"] 708 | git-tree-sha1 = "10134f2ee0b1978ae7752c41306e131a684e1f06" 709 | uuid = "69de0a69-1ddd-5017-9359-2bf0b02dc9f0" 710 | version = "1.0.7" 711 | 712 | [[Pkg]] 713 | deps = ["Dates", "LibGit2", "Libdl", "Logging", "Markdown", "Printf", "REPL", "Random", "SHA", "UUIDs"] 714 | uuid = "44cfe95a-1eb2-52ea-b672-e2afdf69b78f" 715 | 716 | [[PlotThemes]] 717 | deps = ["PlotUtils", "Requires", "Statistics"] 718 | git-tree-sha1 = "c6f5ea535551b3b16835134697f0c65d06c94b91" 719 | uuid = "ccf2f8ad-2431-5c83-bf29-c5338b663b6a" 720 | version = "2.0.0" 721 | 722 | [[PlotUtils]] 723 | deps = ["ColorSchemes", "Colors", "Dates", "Printf", "Random", "Reexport", "Statistics"] 724 | git-tree-sha1 = "e18e0e51ff07bf92bb7e06dcb9c082a4e125e20c" 725 | uuid = "995b91a9-d308-5afd-9ec6-746e21dbc043" 726 | version = "1.0.5" 727 | 728 | [[Plots]] 729 | deps = ["Base64", "Contour", "Dates", "FFMPEG", "FixedPointNumbers", "GR", "GeometryBasics", "GeometryTypes", "JSON", "LinearAlgebra", "Measures", "NaNMath", "PlotThemes", "PlotUtils", "Printf", "REPL", "Random", "RecipesBase", "RecipesPipeline", "Reexport", "Requires", "Showoff", "SparseArrays", "Statistics", "StatsBase", "UUIDs"] 730 | git-tree-sha1 = "ba747739872a67bc1a8078aec3313bde075b3fb0" 731 | uuid = "91a5bcdd-55d7-5caf-9e0b-520d859cae80" 732 | version = "1.5.5" 733 | 734 | [[PooledArrays]] 735 | deps = ["DataAPI"] 736 | git-tree-sha1 = "b1333d4eced1826e15adbdf01a4ecaccca9d353c" 737 | uuid = "2dfb63ee-cc39-5dd5-95bd-886bf059d720" 738 | version = "0.5.3" 739 | 740 | [[PositiveFactorizations]] 741 | deps = ["LinearAlgebra", "Test"] 742 | git-tree-sha1 = "127c47b91990c101ee3752291c4f45640eeb03d1" 743 | uuid = "85a6dd25-e78a-55b7-8502-1745935b8125" 744 | version = "0.2.3" 745 | 746 | [[PrettyPrinting]] 747 | git-tree-sha1 = "cb3bd68c8e0fabf6e13c10bdf11713068e748a79" 748 | uuid = "54e16d92-306c-5ea0-a30b-337be88ac337" 749 | version = "0.2.0" 750 | 751 | [[PrettyTables]] 752 | deps = ["Crayons", "Formatting", "Parameters", "Reexport", "Tables"] 753 | git-tree-sha1 = "8458dc04a493ae5c2fed3796c1d3117972c69694" 754 | uuid = "08abe8d2-0d0c-5749-adfa-8a2ac140af0d" 755 | version = "0.9.1" 756 | 757 | [[Printf]] 758 | deps = ["Unicode"] 759 | uuid = "de0858da-6303-5e67-8744-51eddeeeb8d7" 760 | 761 | [[ProgressMeter]] 762 | deps = ["Distributed", "Printf"] 763 | git-tree-sha1 = "2de4cddc0ceeddafb6b143b5b6cd9c659b64507c" 764 | uuid = "92933f4c-e287-5a05-a399-4b506db050ca" 765 | version = "1.3.2" 766 | 767 | [[PyCall]] 768 | deps = ["Conda", "Dates", "Libdl", "LinearAlgebra", "MacroTools", "Serialization", "VersionParsing"] 769 | git-tree-sha1 = "3a3fdb9000d35958c9ba2323ca7c4958901f115d" 770 | uuid = "438e738f-606a-5dbb-bf0a-cddfbfd45ab0" 771 | version = "1.91.4" 772 | 773 | [[PyPlot]] 774 | deps = ["Colors", "LaTeXStrings", "PyCall", "Sockets", "Test", "VersionParsing"] 775 | git-tree-sha1 = "67dde2482fe1a72ef62ed93f8c239f947638e5a2" 776 | uuid = "d330b81b-6aea-500a-939a-2ce795aea3ee" 777 | version = "2.9.0" 778 | 779 | [[QuadGK]] 780 | deps = ["DataStructures", "LinearAlgebra"] 781 | git-tree-sha1 = "0ab8a09d4478ebeb99a706ecbf8634a65077ccdc" 782 | uuid = "1fd47b50-473d-5c70-9696-f719f8f3bcdc" 783 | version = "2.4.0" 784 | 785 | [[RData]] 786 | deps = ["CategoricalArrays", "CodecZlib", 
"DataFrames", "Dates", "FileIO", "Requires", "TimeZones", "Unicode"] 787 | git-tree-sha1 = "10693c581956334a368c26b7c544e406c4c94385" 788 | uuid = "df47a6cb-8c03-5eed-afd8-b6050d6c41da" 789 | version = "0.7.2" 790 | 791 | [[RDatasets]] 792 | deps = ["CSV", "CodecZlib", "DataFrames", "FileIO", "Printf", "RData", "Reexport"] 793 | git-tree-sha1 = "511854268c47438216a7640341ad4ce14b3463bb" 794 | uuid = "ce6b1742-4840-55fa-b093-852dadbb1d8b" 795 | version = "0.6.9" 796 | 797 | [[REPL]] 798 | deps = ["InteractiveUtils", "Markdown", "Sockets"] 799 | uuid = "3fa0cd96-eef1-5676-8a61-b3b8758bbffb" 800 | 801 | [[Random]] 802 | deps = ["Serialization"] 803 | uuid = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c" 804 | 805 | [[Ratios]] 806 | git-tree-sha1 = "37d210f612d70f3f7d57d488cb3b6eff56ad4e41" 807 | uuid = "c84ed2f1-dad5-54f0-aa8e-dbefe2724439" 808 | version = "0.4.0" 809 | 810 | [[RecipesBase]] 811 | git-tree-sha1 = "54f8ceb165a0f6d083f0d12cb4996f5367c6edbc" 812 | uuid = "3cdcf5f2-1ef4-517c-9805-6587b60abb01" 813 | version = "1.0.1" 814 | 815 | [[RecipesPipeline]] 816 | deps = ["Dates", "PlotUtils", "RecipesBase"] 817 | git-tree-sha1 = "d2a58b8291d1c0abae6a91489973f8a92bf5c04a" 818 | uuid = "01d81517-befc-4cb6-b9ec-a95719d0359c" 819 | version = "0.1.11" 820 | 821 | [[Reexport]] 822 | deps = ["Pkg"] 823 | git-tree-sha1 = "7b1d07f411bc8ddb7977ec7f377b97b158514fe0" 824 | uuid = "189a3867-3050-52da-a836-e630ba90ab69" 825 | version = "0.2.0" 826 | 827 | [[Requires]] 828 | deps = ["UUIDs"] 829 | git-tree-sha1 = "d37400976e98018ee840e0ca4f9d20baa231dc6b" 830 | uuid = "ae029012-a4dd-5104-9daa-d747884805df" 831 | version = "1.0.1" 832 | 833 | [[Rmath]] 834 | deps = ["Random", "Rmath_jll"] 835 | git-tree-sha1 = "86c5647b565873641538d8f812c04e4c9dbeb370" 836 | uuid = "79098fc4-a85e-5d69-aa6a-4863f24498fa" 837 | version = "0.6.1" 838 | 839 | [[Rmath_jll]] 840 | deps = ["Libdl", "Pkg"] 841 | git-tree-sha1 = "d76185aa1f421306dec73c057aa384bad74188f0" 842 | uuid = "f50d1b31-88e8-58de-be2c-1cc44531875f" 843 | version = "0.2.2+1" 844 | 845 | [[SHA]] 846 | uuid = "ea8e919c-243c-51af-8825-aaa63cd721ce" 847 | 848 | [[ScientificTypes]] 849 | git-tree-sha1 = "1a9f881c800ea009fb7f8b5274f04e4e8a5faef8" 850 | uuid = "321657f4-b219-11e9-178b-2701a2544e81" 851 | version = "0.8.0" 852 | 853 | [[ScikitLearn]] 854 | deps = ["Compat", "Conda", "DataFrames", "Distributed", "IterTools", "LinearAlgebra", "MacroTools", "Parameters", "Printf", "PyCall", "Random", "ScikitLearnBase", "SparseArrays", "StatsBase", "VersionParsing"] 855 | git-tree-sha1 = "b2dbb141575879beb3ad771fb0314a22617586d3" 856 | uuid = "3646fa90-6ef7-5e7e-9f22-8aca16db6324" 857 | version = "0.6.2" 858 | 859 | [[ScikitLearnBase]] 860 | deps = ["LinearAlgebra", "Random", "Statistics"] 861 | git-tree-sha1 = "7877e55c1523a4b336b433da39c8e8c08d2f221f" 862 | uuid = "6e75b9c4-186b-50bd-896f-2d2496a4843e" 863 | version = "0.5.0" 864 | 865 | [[Serialization]] 866 | uuid = "9e88b42a-f829-5b0c-bbe9-9e923198166b" 867 | 868 | [[SharedArrays]] 869 | deps = ["Distributed", "Mmap", "Random", "Serialization"] 870 | uuid = "1a1011a3-84de-559e-8e89-a11a2f7dc383" 871 | 872 | [[ShiftedArrays]] 873 | git-tree-sha1 = "22395afdcf37d6709a5a0766cc4a5ca52cb85ea0" 874 | uuid = "1277b4bf-5013-50f5-be3d-901d8477a67a" 875 | version = "1.0.0" 876 | 877 | [[Showoff]] 878 | deps = ["Dates"] 879 | git-tree-sha1 = "e032c9df551fb23c9f98ae1064de074111b7bc39" 880 | uuid = "992d4aef-0814-514b-bc4d-f2e9a6c4116f" 881 | version = "0.3.1" 882 | 883 | [[Sockets]] 884 | uuid = "6462fe0b-24de-5631-8697-dd941f90decc" 
885 | 886 | [[SortingAlgorithms]] 887 | deps = ["DataStructures", "Random", "Test"] 888 | git-tree-sha1 = "03f5898c9959f8115e30bc7226ada7d0df554ddd" 889 | uuid = "a2af1166-a08f-5f64-846c-94a0d3cef48c" 890 | version = "0.3.1" 891 | 892 | [[SparseArrays]] 893 | deps = ["LinearAlgebra", "Random"] 894 | uuid = "2f01184e-e22b-5df5-ae63-d93ebab69eaf" 895 | 896 | [[SpecialFunctions]] 897 | deps = ["OpenSpecFun_jll"] 898 | git-tree-sha1 = "d8d8b8a9f4119829410ecd706da4cc8594a1e020" 899 | uuid = "276daf66-3868-5448-9aa4-cd146d93841b" 900 | version = "0.10.3" 901 | 902 | [[StableRNGs]] 903 | deps = ["Random", "Test"] 904 | git-tree-sha1 = "705f8782b1d532c6db75e0a986fb848a629f971a" 905 | uuid = "860ef19b-820b-49d6-a774-d7a799459cd3" 906 | version = "0.1.1" 907 | 908 | [[StaticArrays]] 909 | deps = ["LinearAlgebra", "Random", "Statistics"] 910 | git-tree-sha1 = "016d1e1a00fabc556473b07161da3d39726ded35" 911 | uuid = "90137ffa-7385-5640-81b9-e52037218182" 912 | version = "0.12.4" 913 | 914 | [[Statistics]] 915 | deps = ["LinearAlgebra", "SparseArrays"] 916 | uuid = "10745b16-79ce-11e8-11f9-7d13ad32a3b2" 917 | 918 | [[StatsBase]] 919 | deps = ["DataAPI", "DataStructures", "LinearAlgebra", "Missings", "Printf", "Random", "SortingAlgorithms", "SparseArrays", "Statistics"] 920 | git-tree-sha1 = "a6102b1f364befdb05746f386b67c6b7e3262c45" 921 | uuid = "2913bbd2-ae8a-5f71-8c99-4fb6c76f3a91" 922 | version = "0.33.0" 923 | 924 | [[StatsFuns]] 925 | deps = ["Rmath", "SpecialFunctions"] 926 | git-tree-sha1 = "04a5a8e6ab87966b43f247920eab053fd5fdc925" 927 | uuid = "4c63d2b9-4356-54db-8cca-17b64c39e42c" 928 | version = "0.9.5" 929 | 930 | [[StatsModels]] 931 | deps = ["DataAPI", "DataStructures", "Distributions", "LinearAlgebra", "Printf", "ShiftedArrays", "SparseArrays", "StatsBase", "Tables"] 932 | git-tree-sha1 = "b79969dac368d8a61515b861b15d0e691e0bff96" 933 | uuid = "3eaba693-59b7-5ba5-a881-562e759f1c8d" 934 | version = "0.6.12" 935 | 936 | [[StatsPlots]] 937 | deps = ["Clustering", "DataStructures", "DataValues", "Distributions", "Interpolations", "KernelDensity", "MultivariateStats", "Observables", "Plots", "RecipesBase", "RecipesPipeline", "Reexport", "StatsBase", "TableOperations", "Tables", "Widgets"] 938 | git-tree-sha1 = "b9b7fff81f573465fcac4685df1497d968537a9e" 939 | uuid = "f3b207a7-027a-5e70-b257-86293d7955fd" 940 | version = "0.14.6" 941 | 942 | [[StructArrays]] 943 | deps = ["Adapt", "DataAPI", "Tables"] 944 | git-tree-sha1 = "8099ed9fb90b6e754d6ba8c6ed8670f010eadca0" 945 | uuid = "09ab397b-f2b6-538f-b94a-2f83cf4a842a" 946 | version = "0.4.4" 947 | 948 | [[SuiteSparse]] 949 | deps = ["Libdl", "LinearAlgebra", "Serialization", "SparseArrays"] 950 | uuid = "4607b0f0-06f3-5cda-b6b1-a6196a1729e9" 951 | 952 | [[Syslogs]] 953 | deps = ["Printf", "Sockets"] 954 | git-tree-sha1 = "46badfcc7c6e74535cc7d833a91f4ac4f805f86d" 955 | uuid = "cea106d9-e007-5e6c-ad93-58fe2094e9c4" 956 | version = "0.3.0" 957 | 958 | [[TableOperations]] 959 | deps = ["Tables", "Test"] 960 | git-tree-sha1 = "208630a14884abd110a8f8008b0882f0d0f5632c" 961 | uuid = "ab02a1b2-a7df-11e8-156e-fb1833f50b87" 962 | version = "0.2.1" 963 | 964 | [[TableTraits]] 965 | deps = ["IteratorInterfaceExtensions"] 966 | git-tree-sha1 = "b1ad568ba658d8cbb3b892ed5380a6f3e781a81e" 967 | uuid = "3783bdb8-4a98-5b6b-af9a-565f29a5fe9c" 968 | version = "1.0.0" 969 | 970 | [[Tables]] 971 | deps = ["DataAPI", "DataValueInterfaces", "IteratorInterfaceExtensions", "LinearAlgebra", "TableTraits", "Test"] 972 | git-tree-sha1 = 
"c45dcc27331febabc20d86cb3974ef095257dcf3" 973 | uuid = "bd369af6-aec1-5ad0-b16a-f7cc5008161c" 974 | version = "1.0.4" 975 | 976 | [[Test]] 977 | deps = ["Distributed", "InteractiveUtils", "Logging", "Random"] 978 | uuid = "8dfed614-e22c-5e08-85e1-65c5234f0b40" 979 | 980 | [[TimeZones]] 981 | deps = ["Dates", "EzXML", "Mocking", "Pkg", "Printf", "RecipesBase", "Serialization", "Unicode"] 982 | git-tree-sha1 = "fc9deaf6636c12c564a9eb7c110eff469eec2efa" 983 | uuid = "f269a46b-ccf7-5d73-abea-4c690281aa53" 984 | version = "1.3.0" 985 | 986 | [[TranscodingStreams]] 987 | deps = ["Random", "Test"] 988 | git-tree-sha1 = "7c53c35547de1c5b9d46a4797cf6d8253807108c" 989 | uuid = "3bb67fe8-82b1-5028-8e26-92a6c54297fa" 990 | version = "0.9.5" 991 | 992 | [[UUIDs]] 993 | deps = ["Random", "SHA"] 994 | uuid = "cf7118a7-6976-5b1a-9a39-7adc72f591a4" 995 | 996 | [[UnPack]] 997 | git-tree-sha1 = "d4bfa022cd30df012700cf380af2141961bb3bfb" 998 | uuid = "3a884ed6-31ef-47d7-9d2a-63182c4928ed" 999 | version = "1.0.1" 1000 | 1001 | [[Unicode]] 1002 | uuid = "4ec0a83e-493e-50e2-b9ac-8f72acf5a8f5" 1003 | 1004 | [[UrlDownload]] 1005 | deps = ["HTTP", "ProgressMeter"] 1006 | git-tree-sha1 = "5f4a56e15ed7c4e37d35cd30b82ecc2fb28a0f5d" 1007 | uuid = "856ac37a-3032-4c1c-9122-f86d88358c8b" 1008 | version = "0.3.0" 1009 | 1010 | [[VersionParsing]] 1011 | git-tree-sha1 = "80229be1f670524750d905f8fc8148e5a8c4537f" 1012 | uuid = "81def892-9a0e-5fdd-b105-ffc91e053289" 1013 | version = "1.2.0" 1014 | 1015 | [[WeakRefStrings]] 1016 | deps = ["DataAPI", "Random", "Test"] 1017 | git-tree-sha1 = "28807f85197eaad3cbd2330386fac1dcb9e7e11d" 1018 | uuid = "ea10d353-3f73-51f8-a26c-33c1cb351aa5" 1019 | version = "0.6.2" 1020 | 1021 | [[Widgets]] 1022 | deps = ["Colors", "Dates", "Observables", "OrderedCollections"] 1023 | git-tree-sha1 = "fc0feda91b3fef7fe6948ee09bb628f882b49ca4" 1024 | uuid = "cc8bc4a8-27d6-5769-a93b-9d913e69aa62" 1025 | version = "0.6.2" 1026 | 1027 | [[WoodburyMatrices]] 1028 | deps = ["LinearAlgebra", "SparseArrays"] 1029 | git-tree-sha1 = "28ffe06d28b1ba8fdb2f36ec7bb079fac81bac0d" 1030 | uuid = "efce3f68-66dc-5838-9240-27a6d6f5f9b6" 1031 | version = "0.5.2" 1032 | 1033 | [[XGBoost]] 1034 | deps = ["Libdl", "Printf", "Random", "SparseArrays", "Statistics", "Test", "XGBoost_jll"] 1035 | git-tree-sha1 = "8a692f817f1a6c15ef4913a0ffefa6163117f43d" 1036 | uuid = "009559a3-9522-5dbb-924b-0b6ed2b22bb9" 1037 | version = "1.1.1" 1038 | 1039 | [[XGBoost_jll]] 1040 | deps = ["CompilerSupportLibraries_jll", "Libdl", "Pkg"] 1041 | git-tree-sha1 = "72c0d8bfbb56856c5f25668b72247ec18bbf5579" 1042 | uuid = "a5c6f535-4255-5ca2-a466-0e519f119c46" 1043 | version = "1.1.1+0" 1044 | 1045 | [[XML2_jll]] 1046 | deps = ["Libdl", "Libiconv_jll", "Pkg", "Zlib_jll"] 1047 | git-tree-sha1 = "432d91f45e950f2f2bda5c0f4e2b938c14493af9" 1048 | uuid = "02c8fc9c-b97f-50b9-bbe4-9be30ff0a78a" 1049 | version = "2.9.10+1" 1050 | 1051 | [[Zlib_jll]] 1052 | deps = ["Libdl", "Pkg"] 1053 | git-tree-sha1 = "622d8b6dc0c7e8029f17127703de9819134d1b71" 1054 | uuid = "83775a58-1f1d-513f-b197-d71354ab007a" 1055 | version = "1.2.11+14" 1056 | 1057 | [[libass_jll]] 1058 | deps = ["Bzip2_jll", "FreeType2_jll", "FriBidi_jll", "Libdl", "Pkg", "Zlib_jll"] 1059 | git-tree-sha1 = "027a304b2a90de84f690949a21f94e5ae0f92c73" 1060 | uuid = "0ac62f75-1d6f-5e53-bd7c-93b484bb37c0" 1061 | version = "0.14.0+2" 1062 | 1063 | [[libfdk_aac_jll]] 1064 | deps = ["Libdl", "Pkg"] 1065 | git-tree-sha1 = "480c7ed04f68ea3edd4c757f5db5b6a0a4e0bd99" 1066 | uuid = 
"f638f0a6-7fb0-5443-88ba-1cc74229b280" 1067 | version = "0.1.6+2" 1068 | 1069 | [[libvorbis_jll]] 1070 | deps = ["Libdl", "Ogg_jll", "Pkg"] 1071 | git-tree-sha1 = "6a66f65b5275dfa799036c8a3a26616a0a271c4a" 1072 | uuid = "f27f6e37-5d2b-51aa-960f-b287f2bc3b7a" 1073 | version = "1.3.6+4" 1074 | 1075 | [[x264_jll]] 1076 | deps = ["Libdl", "Pkg"] 1077 | git-tree-sha1 = "d89346fe63a6465a9f44e958ac0e3d366af90b74" 1078 | uuid = "1270edf5-f2f9-52d2-97e9-ab00b5d0237a" 1079 | version = "2019.5.25+2" 1080 | 1081 | [[x265_jll]] 1082 | deps = ["Libdl", "Pkg"] 1083 | git-tree-sha1 = "61324ad346b00a6e541896b94201c9426591e43a" 1084 | uuid = "dfaa095f-4041-5dcd-9319-2fabd8486b76" 1085 | version = "3.0.0+1" 1086 | -------------------------------------------------------------------------------- /data/src/Project.toml: -------------------------------------------------------------------------------- 1 | name = "DataScienceTutorials" 2 | uuid = "b22f6415-6e67-485c-b34d-7995e604d9c9" 3 | version = "0.4.1" 4 | 5 | [deps] 6 | CSV = "336ed68f-0bac-5ca0-87d4-7b16caf5d00b" 7 | CategoricalArrays = "324d7699-5711-5eae-9e2f-1d82baa6b597" 8 | Clustering = "aaaa29a8-35af-508c-8bc3-b662a17a0fe5" 9 | DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0" 10 | Dates = "ade2ca70-3891-5945-98fb-dc099432e06a" 11 | DecisionTree = "7806a523-6efd-50cb-b5f6-3fa6f1930dbb" 12 | Distances = "b4f34e82-e78d-54a5-968a-f98e89d6e8f7" 13 | Distributions = "31c24e10-a181-5473-b8eb-7969acd0382f" 14 | EvoTrees = "f6006082-12f8-11e9-0c9c-0d5d367ab1e5" 15 | Franklin = "713c75ef-9fc9-4b05-94a9-213340da978e" 16 | GLM = "38e38edf-8417-5370-95a0-9cbb8c7f171a" 17 | HTTP = "cd3eb016-35fb-5094-929b-558a96fad6f3" 18 | LIBSVM = "b1bec4e5-fd48-53fe-b0cb-9723c09d164b" 19 | LightGBM = "7acf609c-83a4-11e9-1ffb-b912bcd3b04a" 20 | LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e" 21 | LossFunctions = "30fc2ffe-d236-52d8-8643-a9d8f7c094a7" 22 | MLJ = "add582a8-e3ab-11e8-2d5e-e98b27df1bc7" 23 | MLJBase = "a7f614a8-145f-11e9-1d2a-a57a1082229d" 24 | MLJLinearModels = "6ee0df7b-362f-4a72-a706-9e79364fb692" 25 | MLJModelInterface = "e80e1ace-859a-464e-9ed9-23947d8ae3ea" 26 | MLJModels = "d491faf4-2d78-11e9-2867-c94bc002c0b7" 27 | MLJScientificTypes = "2e2323e0-db8b-457b-ae0d-bdfb3bc63afd" 28 | MLJScikitLearnInterface = "5ae90465-5518-4432-b9d2-8a1def2f0cab" 29 | MultivariateStats = "6f286f6a-111f-5878-ab1e-185364afe411" 30 | NearestNeighbors = "b8a86587-4115-5ab1-83bc-aa920d37bbce" 31 | PrettyPrinting = "54e16d92-306c-5ea0-a30b-337be88ac337" 32 | PyPlot = "d330b81b-6aea-500a-939a-2ce795aea3ee" 33 | RDatasets = "ce6b1742-4840-55fa-b093-852dadbb1d8b" 34 | Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c" 35 | ScikitLearn = "3646fa90-6ef7-5e7e-9f22-8aca16db6324" 36 | StableRNGs = "860ef19b-820b-49d6-a774-d7a799459cd3" 37 | Statistics = "10745b16-79ce-11e8-11f9-7d13ad32a3b2" 38 | StatsBase = "2913bbd2-ae8a-5f71-8c99-4fb6c76f3a91" 39 | StatsPlots = "f3b207a7-027a-5e70-b257-86293d7955fd" 40 | Tables = "bd369af6-aec1-5ad0-b16a-f7cc5008161c" 41 | UrlDownload = "856ac37a-3032-4c1c-9122-f86d88358c8b" 42 | XGBoost = "009559a3-9522-5dbb-924b-0b6ed2b22bb9" 43 | 44 | [compat] 45 | MLJ = "0.11" 46 | MLJBase = "0.13" 47 | MLJLinearModels = "0.5" 48 | MLJModelInterface = "0.3" 49 | MLJModels = "0.10" 50 | MLJScientificTypes = "0.2" 51 | 52 | [extras] 53 | Logging = "56ddb016-857b-54e1-b83d-db4d58db5568" 54 | Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40" 55 | 56 | [targets] 57 | test = ["Test", "Logging"] 58 | 
-------------------------------------------------------------------------------- /data/src/convert_ames.jl: -------------------------------------------------------------------------------- 1 | using Pkg 2 | Pkg.activate(joinpath(@__DIR__, "convert_ames")) 3 | Pkg.instantiate() 4 | 5 | using DataFrames, CSV, MLJBase, CategoricalArrays 6 | 7 | df = CSV.read(joinpath(@__DIR__, "reduced_ames.csv")) 8 | 9 | schema(df) 10 | 11 | price = df.target 12 | quality = df.OverallQual 13 | area1 = map(df.GrLivArea) do a round(Int, a) end 14 | area2 = map(df.x1stFlrSF) do a round(Int, a) end 15 | area3 = map(df.TotalBsmtSF) do a round(Int, a) end 16 | area4 = map(df.BsmtFinSF1) do a round(Int, a) end 17 | area5 = map(df.GarageArea) do a round(Int, a) end 18 | lot_area = map(df.LotArea) do a round(Int, a) end 19 | garage_cars = map(df.GarageCars) do a round(Int, a) end 20 | suburb = df.Neighborhood 21 | council_code = map(df.MSSubClass) do a parse(Int, a[2:end]) end 22 | year_built = map(df.YearBuilt) do a round(Int, a) end 23 | year_upgraded = map(df.YearRemodAdd) do a round(Int, a) end 24 | zone = df.MSSubClass 25 | 26 | df2 = DataFrame(price=price, 27 | area1=area1, 28 | area2=area2, 29 | area3=area3, 30 | area4=area4, 31 | area5=area5, 32 | lot_area=lot_area, 33 | year_built=year_built, 34 | year_upgraded=year_upgraded, 35 | quality=quality, 36 | garage_cars=garage_cars, 37 | suburb=suburb, 38 | council_code=council_code, 39 | zone=zone) 40 | 41 | CSV.write(joinpath(@__DIR__, "ames.csv"), df2) 42 | 43 | -------------------------------------------------------------------------------- /data/src/convert_ames/Manifest.toml: -------------------------------------------------------------------------------- 1 | # This file is machine-generated - editing it directly is not advised 2 | 3 | [[Arpack]] 4 | deps = ["Arpack_jll", "Libdl", "LinearAlgebra"] 5 | git-tree-sha1 = "2ff92b71ba1747c5fdd541f8fc87736d82f40ec9" 6 | uuid = "7d9fca2a-8960-54d3-9f78-7d1dccf2cb97" 7 | version = "0.4.0" 8 | 9 | [[Arpack_jll]] 10 | deps = ["Libdl", "OpenBLAS_jll", "Pkg"] 11 | git-tree-sha1 = "e214a9b9bd1b4e1b4f15b22c0994862b66af7ff7" 12 | uuid = "68821587-b530-5797-8361-c406ea357684" 13 | version = "3.5.0+3" 14 | 15 | [[BSON]] 16 | git-tree-sha1 = "dd36d7cf3d185eeaaf64db902c15174b22f5dafb" 17 | uuid = "fbb218c0-5317-5bc6-957e-2ee96dd4b1f0" 18 | version = "0.2.6" 19 | 20 | [[Base64]] 21 | uuid = "2a0f44e3-6c83-55bd-87e4-b1978d98bd5f" 22 | 23 | [[CSV]] 24 | deps = ["CategoricalArrays", "DataFrames", "Dates", "FilePathsBase", "Mmap", "Parsers", "PooledArrays", "Tables", "Unicode", "WeakRefStrings"] 25 | git-tree-sha1 = "52a8e60c7822f53d57e4403b7f2811e7e1bdd32b" 26 | uuid = "336ed68f-0bac-5ca0-87d4-7b16caf5d00b" 27 | version = "0.6.2" 28 | 29 | [[CategoricalArrays]] 30 | deps = ["DataAPI", "Future", "JSON", "Missings", "Printf", "Statistics", "Unicode"] 31 | git-tree-sha1 = "a6c17353ee38ddab30e73dcfaa1107752de724ec" 32 | uuid = "324d7699-5711-5eae-9e2f-1d82baa6b597" 33 | version = "0.8.1" 34 | 35 | [[CodecZlib]] 36 | deps = ["TranscodingStreams", "Zlib_jll"] 37 | git-tree-sha1 = "ded953804d019afa9a3f98981d99b33e3db7b6da" 38 | uuid = "944b1d66-785c-5afd-91f1-9de20f533193" 39 | version = "0.7.0" 40 | 41 | [[ColorTypes]] 42 | deps = ["FixedPointNumbers", "Random"] 43 | git-tree-sha1 = "c73d9cfc2a9d8433dc77f5bff4bddf46b1d78c20" 44 | uuid = "3da002f7-5984-5a60-b8a6-cbb66c0b333f" 45 | version = "0.10.3" 46 | 47 | [[Compat]] 48 | deps = ["Base64", "Dates", "DelimitedFiles", "Distributed", "InteractiveUtils", "LibGit2", "Libdl", "LinearAlgebra", 
"Markdown", "Mmap", "Pkg", "Printf", "REPL", "Random", "SHA", "Serialization", "SharedArrays", "Sockets", "SparseArrays", "Statistics", "Test", "UUIDs", "Unicode"] 49 | git-tree-sha1 = "054993b6611376ddb40203e973e954fd9d1d1902" 50 | uuid = "34da2185-b29b-5c13-b0c7-acf172513d20" 51 | version = "3.12.0" 52 | 53 | [[CompilerSupportLibraries_jll]] 54 | deps = ["Libdl", "Pkg"] 55 | git-tree-sha1 = "7c4f882c41faa72118841185afc58a2eb00ef612" 56 | uuid = "e66e0078-7015-5450-92f7-15fbd957f2ae" 57 | version = "0.3.3+0" 58 | 59 | [[ComputationalResources]] 60 | git-tree-sha1 = "52cb3ec90e8a8bea0e62e275ba577ad0f74821f7" 61 | uuid = "ed09eef8-17a6-5b46-8889-db040fac31e3" 62 | version = "0.3.2" 63 | 64 | [[Crayons]] 65 | git-tree-sha1 = "9f3adcb26c79d6270eb678f3c61bf44cc6b7077e" 66 | uuid = "a8cc5b0e-0ffa-5ad4-8c14-923d3ee1735f" 67 | version = "4.0.2" 68 | 69 | [[DataAPI]] 70 | git-tree-sha1 = "176e23402d80e7743fc26c19c681bfb11246af32" 71 | uuid = "9a962f9c-6df0-11e9-0e5d-c546b8b5ee8a" 72 | version = "1.3.0" 73 | 74 | [[DataFrames]] 75 | deps = ["CategoricalArrays", "Compat", "DataAPI", "Future", "InvertedIndices", "IteratorInterfaceExtensions", "Missings", "PooledArrays", "Printf", "REPL", "Reexport", "SortingAlgorithms", "Statistics", "TableTraits", "Tables", "Unicode"] 76 | git-tree-sha1 = "02f08ae77249b7f6d4186b081a016fb7454c616f" 77 | uuid = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0" 78 | version = "0.21.2" 79 | 80 | [[DataStructures]] 81 | deps = ["InteractiveUtils", "OrderedCollections"] 82 | git-tree-sha1 = "be680f1ad03c0a03796aa3fda5a2180df7f83b46" 83 | uuid = "864edb3b-99cc-5e75-8d2d-829cb0a9cfe8" 84 | version = "0.17.18" 85 | 86 | [[DataValueInterfaces]] 87 | git-tree-sha1 = "bfc1187b79289637fa0ef6d4436ebdfe6905cbd6" 88 | uuid = "e2d170a0-9d28-54be-80f0-106bbe20a464" 89 | version = "1.0.0" 90 | 91 | [[Dates]] 92 | deps = ["Printf"] 93 | uuid = "ade2ca70-3891-5945-98fb-dc099432e06a" 94 | 95 | [[DelimitedFiles]] 96 | deps = ["Mmap"] 97 | uuid = "8bb1440f-4735-579b-a4ab-409b98df4dab" 98 | 99 | [[Distributed]] 100 | deps = ["Random", "Serialization", "Sockets"] 101 | uuid = "8ba89e20-285c-5b6f-9357-94700520ee1b" 102 | 103 | [[Distributions]] 104 | deps = ["FillArrays", "LinearAlgebra", "PDMats", "Printf", "QuadGK", "Random", "SpecialFunctions", "Statistics", "StatsBase", "StatsFuns"] 105 | git-tree-sha1 = "78c4c32a2357a00a0a7d614880f02c2c6e1ec73c" 106 | uuid = "31c24e10-a181-5473-b8eb-7969acd0382f" 107 | version = "0.23.4" 108 | 109 | [[ExprTools]] 110 | git-tree-sha1 = "6f0517056812fd6aa3af23d4b70d5325a2ae4e95" 111 | uuid = "e2ba6199-217a-4e67-a87a-7c52f15ade04" 112 | version = "0.1.1" 113 | 114 | [[EzXML]] 115 | deps = ["Printf", "XML2_jll"] 116 | git-tree-sha1 = "0fa3b52a04a4e210aeb1626def9c90df3ae65268" 117 | uuid = "8f5d6c58-4d21-5cfd-889c-e3ad7ee6a615" 118 | version = "1.1.0" 119 | 120 | [[FilePathsBase]] 121 | deps = ["Dates", "LinearAlgebra", "Printf", "Test", "UUIDs"] 122 | git-tree-sha1 = "923fd3b942a11712435682eaa95cc8518c428b2c" 123 | uuid = "48062228-2e41-5def-b9a4-89aafe57970f" 124 | version = "0.8.0" 125 | 126 | [[FillArrays]] 127 | deps = ["LinearAlgebra", "Random", "SparseArrays"] 128 | git-tree-sha1 = "44f561e293987ffc84272cd3d2b14b0b93123d63" 129 | uuid = "1a297f60-69ca-5386-bcde-b61e274b549b" 130 | version = "0.8.10" 131 | 132 | [[FixedPointNumbers]] 133 | git-tree-sha1 = "3ba9ea634d4c8b289d590403b4a06f8e227a6238" 134 | uuid = "53c48c17-4a7d-5ca2-90c5-79b7896eea93" 135 | version = "0.8.0" 136 | 137 | [[Formatting]] 138 | deps = ["Printf"] 139 | git-tree-sha1 = 
"a0c901c29c0e7c763342751c0a94211d56c0de5c" 140 | uuid = "59287772-0a20-5a39-b81b-1366585eb4c0" 141 | version = "0.4.1" 142 | 143 | [[Future]] 144 | deps = ["Random"] 145 | uuid = "9fa8497b-333b-5362-9e8d-4d0656e87820" 146 | 147 | [[HTTP]] 148 | deps = ["Base64", "Dates", "IniFile", "MbedTLS", "Sockets"] 149 | git-tree-sha1 = "ec87d5e2acbe1693789efbbe14f5ea7525758f71" 150 | uuid = "cd3eb016-35fb-5094-929b-558a96fad6f3" 151 | version = "0.8.15" 152 | 153 | [[IniFile]] 154 | deps = ["Test"] 155 | git-tree-sha1 = "098e4d2c533924c921f9f9847274f2ad89e018b8" 156 | uuid = "83e8ac13-25f8-5344-8a64-a9f2b223428f" 157 | version = "0.5.0" 158 | 159 | [[InteractiveUtils]] 160 | deps = ["Markdown"] 161 | uuid = "b77e0a4c-d291-57a0-90e8-8db25a27a240" 162 | 163 | [[InvertedIndices]] 164 | deps = ["Test"] 165 | git-tree-sha1 = "15732c475062348b0165684ffe28e85ea8396afc" 166 | uuid = "41ab1584-1d38-5bbf-9106-f11c6c58b48f" 167 | version = "1.0.0" 168 | 169 | [[IteratorInterfaceExtensions]] 170 | git-tree-sha1 = "a3f24677c21f5bbe9d2a714f95dcd58337fb2856" 171 | uuid = "82899510-4779-5014-852e-03e436cf321d" 172 | version = "1.0.0" 173 | 174 | [[JLSO]] 175 | deps = ["BSON", "CodecZlib", "FilePathsBase", "Memento", "Pkg", "Serialization"] 176 | git-tree-sha1 = "9dc0c7a4b7527806e53f524ccd66be0cd9e75e2e" 177 | uuid = "9da8a3cd-07a3-59c0-a743-3fdc52c30d11" 178 | version = "2.3.2" 179 | 180 | [[JSON]] 181 | deps = ["Dates", "Mmap", "Parsers", "Unicode"] 182 | git-tree-sha1 = "b34d7cef7b337321e97d22242c3c2b91f476748e" 183 | uuid = "682c06a0-de6a-54ab-a142-c8b1cf79cde6" 184 | version = "0.21.0" 185 | 186 | [[LearnBase]] 187 | git-tree-sha1 = "a0d90569edd490b82fdc4dc078ea54a5a800d30a" 188 | uuid = "7f8f8fb0-2700-5f03-b4bd-41f8cfc144b6" 189 | version = "0.4.1" 190 | 191 | [[LibGit2]] 192 | uuid = "76f85450-5226-5b5a-8eaa-529ad045b433" 193 | 194 | [[Libdl]] 195 | uuid = "8f399da3-3557-5675-b5ff-fb832c97cbdb" 196 | 197 | [[Libiconv_jll]] 198 | deps = ["Libdl", "Pkg"] 199 | git-tree-sha1 = "e5256a3b0ebc710dbd6da0c0b212164a3681037f" 200 | uuid = "94ce4f54-9a6c-5748-9c1c-f9c7231a4531" 201 | version = "1.16.0+2" 202 | 203 | [[LinearAlgebra]] 204 | deps = ["Libdl"] 205 | uuid = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e" 206 | 207 | [[Logging]] 208 | uuid = "56ddb016-857b-54e1-b83d-db4d58db5568" 209 | 210 | [[LossFunctions]] 211 | deps = ["LearnBase", "Markdown", "RecipesBase", "SparseArrays", "StatsBase"] 212 | git-tree-sha1 = "3cd347266e394a066ca7f17bd8ff589ff5ce1d35" 213 | uuid = "30fc2ffe-d236-52d8-8643-a9d8f7c094a7" 214 | version = "0.6.2" 215 | 216 | [[MLJBase]] 217 | deps = ["CategoricalArrays", "ComputationalResources", "Dates", "DelimitedFiles", "Distributed", "Distributions", "HTTP", "InteractiveUtils", "InvertedIndices", "JLSO", "JSON", "LinearAlgebra", "LossFunctions", "MLJModelInterface", "MLJScientificTypes", "Missings", "OrderedCollections", "Parameters", "PrettyTables", "ProgressMeter", "Random", "ScientificTypes", "Statistics", "StatsBase", "Tables"] 218 | git-tree-sha1 = "d8ba2063ffaaa7f0fe91ea5455a7bf838c1424ac" 219 | uuid = "a7f614a8-145f-11e9-1d2a-a57a1082229d" 220 | version = "0.13.10" 221 | 222 | [[MLJModelInterface]] 223 | deps = ["Random", "ScientificTypes"] 224 | git-tree-sha1 = "b02b13fde7b0dc301adc070d650405aa4909e657" 225 | uuid = "e80e1ace-859a-464e-9ed9-23947d8ae3ea" 226 | version = "0.3.0" 227 | 228 | [[MLJScientificTypes]] 229 | deps = ["CategoricalArrays", "ColorTypes", "Dates", "PrettyTables", "ScientificTypes", "Tables"] 230 | git-tree-sha1 = "5296df0ffd2ff7c667260c027d03a465b59dcff5" 231 | uuid = 
"2e2323e0-db8b-457b-ae0d-bdfb3bc63afd" 232 | version = "0.2.7" 233 | 234 | [[Markdown]] 235 | deps = ["Base64"] 236 | uuid = "d6f4376e-aef5-505a-96c1-9c027394607a" 237 | 238 | [[MbedTLS]] 239 | deps = ["Dates", "MbedTLS_jll", "Random", "Sockets"] 240 | git-tree-sha1 = "426a6978b03a97ceb7ead77775a1da066343ec6e" 241 | uuid = "739be429-bea8-5141-9913-cc70e7f3736d" 242 | version = "1.0.2" 243 | 244 | [[MbedTLS_jll]] 245 | deps = ["Libdl", "Pkg"] 246 | git-tree-sha1 = "c83f5a1d038f034ad0549f9ee4d5fac3fb429e33" 247 | uuid = "c8ffd9c3-330d-5841-b78e-0817d7145fa1" 248 | version = "2.16.0+2" 249 | 250 | [[Memento]] 251 | deps = ["Dates", "Distributed", "JSON", "Serialization", "Sockets", "Syslogs", "Test", "TimeZones", "UUIDs"] 252 | git-tree-sha1 = "31921ad09307dd9ad693da3213a218152fadb8f2" 253 | uuid = "f28f55f0-a522-5efc-85c2-fe41dfb9b2d9" 254 | version = "1.1.0" 255 | 256 | [[Missings]] 257 | deps = ["DataAPI"] 258 | git-tree-sha1 = "de0a5ce9e5289f27df672ffabef4d1e5861247d5" 259 | uuid = "e1d29d7a-bbdc-5cf2-9ac0-f12de2c33e28" 260 | version = "0.4.3" 261 | 262 | [[Mmap]] 263 | uuid = "a63ad114-7e13-5084-954f-fe012c677804" 264 | 265 | [[Mocking]] 266 | deps = ["ExprTools"] 267 | git-tree-sha1 = "916b850daad0d46b8c71f65f719c49957e9513ed" 268 | uuid = "78c3b35d-d492-501b-9361-3d52fe80e533" 269 | version = "0.7.1" 270 | 271 | [[OpenBLAS_jll]] 272 | deps = ["CompilerSupportLibraries_jll", "Libdl", "Pkg"] 273 | git-tree-sha1 = "1887096f6897306a4662f7c5af936da7d5d1a062" 274 | uuid = "4536629a-c528-5b80-bd46-f80d51c5b363" 275 | version = "0.3.9+4" 276 | 277 | [[OpenSpecFun_jll]] 278 | deps = ["CompilerSupportLibraries_jll", "Libdl", "Pkg"] 279 | git-tree-sha1 = "d51c416559217d974a1113522d5919235ae67a87" 280 | uuid = "efe28fd5-8261-553b-a9e1-b2916fc3738e" 281 | version = "0.5.3+3" 282 | 283 | [[OrderedCollections]] 284 | git-tree-sha1 = "12ce190210d278e12644bcadf5b21cbdcf225cd3" 285 | uuid = "bac558e1-5e72-5ebc-8fee-abe8a469f55d" 286 | version = "1.2.0" 287 | 288 | [[PDMats]] 289 | deps = ["Arpack", "LinearAlgebra", "SparseArrays", "SuiteSparse", "Test"] 290 | git-tree-sha1 = "2fc6f50ddd959e462f0a2dbc802ddf2a539c6e35" 291 | uuid = "90014a1f-27ba-587c-ab20-58faa44d9150" 292 | version = "0.9.12" 293 | 294 | [[Parameters]] 295 | deps = ["OrderedCollections", "UnPack"] 296 | git-tree-sha1 = "38b2e970043613c187bd56a995fe2e551821eb4a" 297 | uuid = "d96e819e-fc66-5662-9728-84c9c7592b0a" 298 | version = "0.12.1" 299 | 300 | [[Parsers]] 301 | deps = ["Dates", "Test"] 302 | git-tree-sha1 = "eb3e09940c0d7ae01b01d9291ebad7b081c844d3" 303 | uuid = "69de0a69-1ddd-5017-9359-2bf0b02dc9f0" 304 | version = "1.0.5" 305 | 306 | [[Pkg]] 307 | deps = ["Dates", "LibGit2", "Libdl", "Logging", "Markdown", "Printf", "REPL", "Random", "SHA", "UUIDs"] 308 | uuid = "44cfe95a-1eb2-52ea-b672-e2afdf69b78f" 309 | 310 | [[PooledArrays]] 311 | deps = ["DataAPI"] 312 | git-tree-sha1 = "b1333d4eced1826e15adbdf01a4ecaccca9d353c" 313 | uuid = "2dfb63ee-cc39-5dd5-95bd-886bf059d720" 314 | version = "0.5.3" 315 | 316 | [[PrettyTables]] 317 | deps = ["Crayons", "Formatting", "Parameters", "Reexport", "Tables"] 318 | git-tree-sha1 = "ac3cecc7254adfffb8fdbd2c83eaa247e14b02da" 319 | uuid = "08abe8d2-0d0c-5749-adfa-8a2ac140af0d" 320 | version = "0.9.0" 321 | 322 | [[Printf]] 323 | deps = ["Unicode"] 324 | uuid = "de0858da-6303-5e67-8744-51eddeeeb8d7" 325 | 326 | [[ProgressMeter]] 327 | deps = ["Distributed", "Printf"] 328 | git-tree-sha1 = "3e1784c27847bba115815d4d4e668b99873985e5" 329 | uuid = "92933f4c-e287-5a05-a399-4b506db050ca" 330 | version = 
"1.3.1" 331 | 332 | [[QuadGK]] 333 | deps = ["DataStructures", "LinearAlgebra"] 334 | git-tree-sha1 = "dc84e810393cfc6294248c9032a9cdacc14a3db4" 335 | uuid = "1fd47b50-473d-5c70-9696-f719f8f3bcdc" 336 | version = "2.3.1" 337 | 338 | [[REPL]] 339 | deps = ["InteractiveUtils", "Markdown", "Sockets"] 340 | uuid = "3fa0cd96-eef1-5676-8a61-b3b8758bbffb" 341 | 342 | [[Random]] 343 | deps = ["Serialization"] 344 | uuid = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c" 345 | 346 | [[RecipesBase]] 347 | git-tree-sha1 = "54f8ceb165a0f6d083f0d12cb4996f5367c6edbc" 348 | uuid = "3cdcf5f2-1ef4-517c-9805-6587b60abb01" 349 | version = "1.0.1" 350 | 351 | [[Reexport]] 352 | deps = ["Pkg"] 353 | git-tree-sha1 = "7b1d07f411bc8ddb7977ec7f377b97b158514fe0" 354 | uuid = "189a3867-3050-52da-a836-e630ba90ab69" 355 | version = "0.2.0" 356 | 357 | [[Rmath]] 358 | deps = ["Random", "Rmath_jll"] 359 | git-tree-sha1 = "86c5647b565873641538d8f812c04e4c9dbeb370" 360 | uuid = "79098fc4-a85e-5d69-aa6a-4863f24498fa" 361 | version = "0.6.1" 362 | 363 | [[Rmath_jll]] 364 | deps = ["Libdl", "Pkg"] 365 | git-tree-sha1 = "d76185aa1f421306dec73c057aa384bad74188f0" 366 | uuid = "f50d1b31-88e8-58de-be2c-1cc44531875f" 367 | version = "0.2.2+1" 368 | 369 | [[SHA]] 370 | uuid = "ea8e919c-243c-51af-8825-aaa63cd721ce" 371 | 372 | [[ScientificTypes]] 373 | git-tree-sha1 = "1a9f881c800ea009fb7f8b5274f04e4e8a5faef8" 374 | uuid = "321657f4-b219-11e9-178b-2701a2544e81" 375 | version = "0.8.0" 376 | 377 | [[Serialization]] 378 | uuid = "9e88b42a-f829-5b0c-bbe9-9e923198166b" 379 | 380 | [[SharedArrays]] 381 | deps = ["Distributed", "Mmap", "Random", "Serialization"] 382 | uuid = "1a1011a3-84de-559e-8e89-a11a2f7dc383" 383 | 384 | [[Sockets]] 385 | uuid = "6462fe0b-24de-5631-8697-dd941f90decc" 386 | 387 | [[SortingAlgorithms]] 388 | deps = ["DataStructures", "Random", "Test"] 389 | git-tree-sha1 = "03f5898c9959f8115e30bc7226ada7d0df554ddd" 390 | uuid = "a2af1166-a08f-5f64-846c-94a0d3cef48c" 391 | version = "0.3.1" 392 | 393 | [[SparseArrays]] 394 | deps = ["LinearAlgebra", "Random"] 395 | uuid = "2f01184e-e22b-5df5-ae63-d93ebab69eaf" 396 | 397 | [[SpecialFunctions]] 398 | deps = ["OpenSpecFun_jll"] 399 | git-tree-sha1 = "d8d8b8a9f4119829410ecd706da4cc8594a1e020" 400 | uuid = "276daf66-3868-5448-9aa4-cd146d93841b" 401 | version = "0.10.3" 402 | 403 | [[Statistics]] 404 | deps = ["LinearAlgebra", "SparseArrays"] 405 | uuid = "10745b16-79ce-11e8-11f9-7d13ad32a3b2" 406 | 407 | [[StatsBase]] 408 | deps = ["DataAPI", "DataStructures", "LinearAlgebra", "Missings", "Printf", "Random", "SortingAlgorithms", "SparseArrays", "Statistics"] 409 | git-tree-sha1 = "a6102b1f364befdb05746f386b67c6b7e3262c45" 410 | uuid = "2913bbd2-ae8a-5f71-8c99-4fb6c76f3a91" 411 | version = "0.33.0" 412 | 413 | [[StatsFuns]] 414 | deps = ["Rmath", "SpecialFunctions"] 415 | git-tree-sha1 = "04a5a8e6ab87966b43f247920eab053fd5fdc925" 416 | uuid = "4c63d2b9-4356-54db-8cca-17b64c39e42c" 417 | version = "0.9.5" 418 | 419 | [[SuiteSparse]] 420 | deps = ["Libdl", "LinearAlgebra", "Serialization", "SparseArrays"] 421 | uuid = "4607b0f0-06f3-5cda-b6b1-a6196a1729e9" 422 | 423 | [[Syslogs]] 424 | deps = ["Printf", "Sockets"] 425 | git-tree-sha1 = "46badfcc7c6e74535cc7d833a91f4ac4f805f86d" 426 | uuid = "cea106d9-e007-5e6c-ad93-58fe2094e9c4" 427 | version = "0.3.0" 428 | 429 | [[TableTraits]] 430 | deps = ["IteratorInterfaceExtensions"] 431 | git-tree-sha1 = "b1ad568ba658d8cbb3b892ed5380a6f3e781a81e" 432 | uuid = "3783bdb8-4a98-5b6b-af9a-565f29a5fe9c" 433 | version = "1.0.0" 434 | 435 | [[Tables]] 
436 | deps = ["DataAPI", "DataValueInterfaces", "IteratorInterfaceExtensions", "LinearAlgebra", "TableTraits", "Test"] 437 | git-tree-sha1 = "c45dcc27331febabc20d86cb3974ef095257dcf3" 438 | uuid = "bd369af6-aec1-5ad0-b16a-f7cc5008161c" 439 | version = "1.0.4" 440 | 441 | [[Test]] 442 | deps = ["Distributed", "InteractiveUtils", "Logging", "Random"] 443 | uuid = "8dfed614-e22c-5e08-85e1-65c5234f0b40" 444 | 445 | [[TimeZones]] 446 | deps = ["Dates", "EzXML", "Mocking", "Printf", "RecipesBase", "Serialization", "Unicode"] 447 | git-tree-sha1 = "db7bc2051d4c2e5f336409224df81485c00de6cb" 448 | uuid = "f269a46b-ccf7-5d73-abea-4c690281aa53" 449 | version = "1.2.0" 450 | 451 | [[TranscodingStreams]] 452 | deps = ["Random", "Test"] 453 | git-tree-sha1 = "7c53c35547de1c5b9d46a4797cf6d8253807108c" 454 | uuid = "3bb67fe8-82b1-5028-8e26-92a6c54297fa" 455 | version = "0.9.5" 456 | 457 | [[UUIDs]] 458 | deps = ["Random", "SHA"] 459 | uuid = "cf7118a7-6976-5b1a-9a39-7adc72f591a4" 460 | 461 | [[UnPack]] 462 | git-tree-sha1 = "d4bfa022cd30df012700cf380af2141961bb3bfb" 463 | uuid = "3a884ed6-31ef-47d7-9d2a-63182c4928ed" 464 | version = "1.0.1" 465 | 466 | [[Unicode]] 467 | uuid = "4ec0a83e-493e-50e2-b9ac-8f72acf5a8f5" 468 | 469 | [[WeakRefStrings]] 470 | deps = ["DataAPI", "Random", "Test"] 471 | git-tree-sha1 = "28807f85197eaad3cbd2330386fac1dcb9e7e11d" 472 | uuid = "ea10d353-3f73-51f8-a26c-33c1cb351aa5" 473 | version = "0.6.2" 474 | 475 | [[XML2_jll]] 476 | deps = ["Libdl", "Libiconv_jll", "Pkg", "Zlib_jll"] 477 | git-tree-sha1 = "987c02a43fa10a491a5f0f7c46a6d3559ed6a8e2" 478 | uuid = "02c8fc9c-b97f-50b9-bbe4-9be30ff0a78a" 479 | version = "2.9.9+4" 480 | 481 | [[Zlib_jll]] 482 | deps = ["Libdl", "Pkg"] 483 | git-tree-sha1 = "a2e0d558f6031002e380a90613b199e37a8565bf" 484 | uuid = "83775a58-1f1d-513f-b197-d71354ab007a" 485 | version = "1.2.11+10" 486 | -------------------------------------------------------------------------------- /data/src/convert_ames/Project.toml: -------------------------------------------------------------------------------- 1 | [deps] 2 | CSV = "336ed68f-0bac-5ca0-87d4-7b16caf5d00b" 3 | CategoricalArrays = "324d7699-5711-5eae-9e2f-1d82baa6b597" 4 | DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0" 5 | MLJBase = "a7f614a8-145f-11e9-1d2a-a57a1082229d" 6 | -------------------------------------------------------------------------------- /data/src/generate_horse.jl: -------------------------------------------------------------------------------- 1 | using Pkg; 2 | Pkg.activate(@__DIR__) 3 | Pkg.instantiate() 4 | 5 | using MLJ 6 | 7 | using HTTP 8 | using CSV 9 | import DataFrames: DataFrame, select!, Not 10 | req1 = HTTP.get("http://archive.ics.uci.edu/ml/machine-learning-databases/horse-colic/horse-colic.data") 11 | req2 = HTTP.get("http://archive.ics.uci.edu/ml/machine-learning-databases/horse-colic/horse-colic.test") 12 | header = ["surgery", "age", "hospital_number", 13 | "rectal_temperature", "pulse", 14 | "respiratory_rate", "temperature_extremities", 15 | "peripheral_pulse", "mucous_membranes", 16 | "capillary_refill_time", "pain", 17 | "peristalsis", "abdominal_distension", 18 | "nasogastric_tube", "nasogastric_reflux", 19 | "nasogastric_reflux_ph", "feces", "abdomen", 20 | "packed_cell_volume", "total_protein", 21 | "abdomcentesis_appearance", "abdomcentesis_total_protein", 22 | "outcome", "surgical_lesion", "lesion_1", "lesion_2", "lesion_3", 23 | "cp_data"] 24 | csv_opts = (header=header, delim=' ', missingstring="?", 25 | ignorerepeated=true) 26 | data_train = 
CSV.read(req1.body; csv_opts...) 27 | data_test = CSV.read(req2.body; csv_opts...) 28 | @show size(data_train) 29 | @show size(data_test) 30 | 31 | unwanted = [:lesion_1, :lesion_2, :lesion_3] 32 | data = vcat(data_train, data_test) 33 | select!(data, Not(unwanted)); 34 | 35 | train = 1:nrows(data_train) 36 | test = last(train) .+ (1:nrows(data_test)); 37 | 38 | datac = coerce(data, autotype(data)); 39 | 40 | sch0 = schema(data) 41 | sch = schema(datac) 42 | 43 | old_scitype_given_name = Dict( 44 | sch0.names[j] => sch0.scitypes[j] for j in eachindex(sch0.names)) 45 | 46 | length(unique(datac.hospital_number)) 47 | 48 | datac = select!(datac, Not(:hospital_number)); 49 | 50 | datac = coerce(datac, autotype(datac, rules=(:discrete_to_continuous,))); 51 | 52 | missing_outcome = ismissing.(datac.outcome) 53 | idx_missing_outcome = missing_outcome |> findall 54 | 55 | train = setdiff!(train |> collect, idx_missing_outcome) 56 | test = setdiff!(test |> collect, idx_missing_outcome) 57 | datac = datac[.!missing_outcome, :]; 58 | 59 | for name in names(datac) 60 | col = datac[:, name] 61 | ratio_missing = sum(ismissing.(col)) / nrows(datac) * 100 62 | println(rpad(name, 30), round(ratio_missing, sigdigits=3)) 63 | end 64 | 65 | unwanted = [:peripheral_pulse, :nasogastric_tube, :nasogastric_reflux, 66 | :nasogastric_reflux_ph, :feces, :abdomen, :abdomcentesis_appearance, :abdomcentesis_total_protein] 67 | select!(datac, Not(unwanted)); 68 | 69 | @load FillImputer 70 | filler = machine(FillImputer(), datac) 71 | fit!(filler) 72 | datac = transform(filler, datac) 73 | 74 | cat_fields = filter(schema(datac).names) do field 75 | datac[:, field] isa CategoricalArray 76 | end 77 | 78 | for f in cat_fields 79 | datac[!, f] = get.(datac[:, f]) 80 | end 81 | 82 | datac.pulse = coerce(datac.pulse, Count) 83 | datac.respiratory_rate = coerce(datac.respiratory_rate, Count) 84 | 85 | sch1 = schema(datac) 86 | 87 | CSV.write("horse.csv", datac) 88 | -------------------------------------------------------------------------------- /data/src/get_king_county.jl: -------------------------------------------------------------------------------- 1 | using Pkg; 2 | Pkg.activate(@__DIR__) 3 | Pkg.instantiate() 4 | 5 | using MLJ 6 | using PrettyPrinting 7 | import DataFrames: DataFrame, select!, Not, describe 8 | import Statistics 9 | using Dates 10 | using UrlDownload 11 | using CSV 12 | 13 | 14 | df = DataFrame(urldownload("https://raw.githubusercontent.com/tlienart/DataScienceTutorialsData.jl/master/data/kc_housing.csv", true)) 15 | describe(df) 16 | 17 | df.is_renovated = df.yr_renovated .== 0 18 | 19 | select!(df, Not([:id, :date, :yr_renovated])) 20 | CSV.write(joinpath(@__DIR__, "..", "house.csv"), df) 21 | -------------------------------------------------------------------------------- /environment.yml: -------------------------------------------------------------------------------- 1 | dependencies: 2 | - matplotlib 3 | - numpy 4 | - pip 5 | - pip: 6 | - julia 7 | -------------------------------------------------------------------------------- /exercise_6ci.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ablaom/MachineLearningInJulia2020/552f98fbf012475d67cd29a72448ac7c476ea2c7/exercise_6ci.png -------------------------------------------------------------------------------- /exercise_7c.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/ablaom/MachineLearningInJulia2020/552f98fbf012475d67cd29a72448ac7c476ea2c7/exercise_7c.png -------------------------------------------------------------------------------- /exercise_7c_2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ablaom/MachineLearningInJulia2020/552f98fbf012475d67cd29a72448ac7c476ea2c7/exercise_7c_2.png -------------------------------------------------------------------------------- /exercise_7c_3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ablaom/MachineLearningInJulia2020/552f98fbf012475d67cd29a72448ac7c476ea2c7/exercise_7c_3.png -------------------------------------------------------------------------------- /exercise_8c.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ablaom/MachineLearningInJulia2020/552f98fbf012475d67cd29a72448ac7c476ea2c7/exercise_8c.png -------------------------------------------------------------------------------- /gamma_sampler.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ablaom/MachineLearningInJulia2020/552f98fbf012475d67cd29a72448ac7c476ea2c7/gamma_sampler.png -------------------------------------------------------------------------------- /iris_learning_curve.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ablaom/MachineLearningInJulia2020/552f98fbf012475d67cd29a72448ac7c476ea2c7/iris_learning_curve.png -------------------------------------------------------------------------------- /learning_curve.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ablaom/MachineLearningInJulia2020/552f98fbf012475d67cd29a72448ac7c476ea2c7/learning_curve.png -------------------------------------------------------------------------------- /learning_curve2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ablaom/MachineLearningInJulia2020/552f98fbf012475d67cd29a72448ac7c476ea2c7/learning_curve2.png -------------------------------------------------------------------------------- /methods.md: -------------------------------------------------------------------------------- 1 | # List of methods introduced in the tutorials 2 | 3 | ## Part 1 4 | 5 | `scitype(object)`, `coerce(vector, SomeSciType)`, 6 | `levels(categorical_vector)`, `levels!(categorical_vector)`, 7 | `schema(table)`, `MLJ.table(matrix)`, `autotype(table)`, 8 | `coerce(table, ...)`, `coerce!(dataframe, ...)`, `elscitype(vector)` 9 | 10 | ## Part 2 11 | 12 | `OpenML.load(id)`, `unpack(table, ...)`, `models()`, `models(filter)`, 13 | `models(string)`, `@load ModelType pkg=PackageName`, `info(model)`, 14 | `machine(model, X, y)`, `partition(row_indices, ...)`, `fit!(mach, 15 | rows=...)`, `predict(mach, rows=...)`, `predict(mach, Xnew)`, 16 | `fitted_params(mach)`, `report(mach)`, `MLJ.save`, 17 | `machine(filename)`, `machine(filename, X, y)`, 18 | `pdf(single_prediction, class)`, `predict_mode(mach, Xnew)`, 19 | `predict_mean(mach, Xnew)`, `predict_median(mach, Xnew)`, 20 | `measures()`, `evaluate!`, `range(model, :(param.nested_param), ...)`, 21 | `learning_curve(mach, ...)` 22 | 23 | ## Part 3 24 | 25 | `Standardizer`, `transform`, `inverse_transform`, 
`ContinuousEncoder`, `@pipeline` 26 | 27 | ## Part 4 28 | 29 | `iterator(r, resolution)`, `sampler(r, distribution)`, `RandomSearch`, 30 | `TunedModel` 31 | 32 | ## Part 5 33 | 34 | `source(data)`, `source()`, `Probabilistic()`, `Deterministic()`, 35 | `Unsupervised()`, `@from_network` 36 | -------------------------------------------------------------------------------- /outline.md: -------------------------------------------------------------------------------- 1 | # Machine Learning in Julia using MLJ 2 | 3 | ## Housekeeping 4 | 5 | ### Getting help during the workshop 6 | 7 | ### Resources to help you 8 | 9 | From the MLJ ecosystem: 10 | 11 | - The docs 12 | 13 | - DataScienceTutorials 14 | 15 | From elsewhere: 16 | 17 | - Julia specific: 18 | 19 | - ScikitLearn 20 | 21 | - General: 22 | 23 | - 24 | 25 | - 26 | 27 | ## Programme 28 | 29 | - An overview of machine learning and MLJ (lecture) 30 | 31 | - Workshop scope 32 | 33 | - Installing MLJ and the tutorials 34 | 35 | - Part 1. Data representations 36 | 37 | Break 38 | 39 | - Part 2: Selecting, training and evaluating models 40 | 41 | - Part 3: Tuning model hyper-parameters 42 | 43 | Break 44 | 45 | - Part 4: Model pipelines 46 | 47 | - Part 5: Advanced features (lecture) 48 | 49 | Each Parts 2-6 begins with demonstration on the "teacher's dataset", with 50 | time for participants to carry out a similar exercise on a "student's 51 | datasets" and interact with the instructors in the chat forum. 52 | 53 | 54 | ## What this workshop won't cover 55 | 56 | This workshop assumes at some experience with data and, ideally, some 57 | understanding of machine learning principles. 58 | 59 | Lightly covered or not covered 60 | 61 | - data wrangling and data cleaning 62 | 63 | - feature engineering 64 | 65 | - options for parallelism or using GPU's 66 | 67 | 68 | ## Part 1: Data ingestion and pre-processing 69 | 70 | ### What is machine learning? 71 | 72 | Supervised learning - show with examples and pictures what the basic 73 | idea and processes are: fitting, evaluating, tuning. 74 | 75 | Unsupervised learning - no labels; main use-case is dimension reduction; explain PCA with a picture 76 | 77 | Re-enforcement learning - out of scope 78 | 79 | 80 | ### Different machine learning models and paradigms 81 | 82 | - machine learning ≠ deep learning 83 | 84 | - there are hundreds of machine learning models. All of the following 85 | are in common use: 86 | 87 | - linear models, especially Ridge regression, elastic net, pca (unsupervised) 88 | 89 | - Naive Bayes 90 | 91 | - K-nearest neighbours 92 | 93 | - K-means clustering (unsupervised) 94 | 95 | - random forests 96 | 97 | - gradient boosted tree models (e.g., XGBoost) 98 | 99 | - support vector machines 100 | 101 | - probablistic programming models 102 | 103 | - neural networks 104 | 105 | 106 | ### What is a (good) machine learning toolbox? 
107 | 108 | - provides uniform interface to zoo of models scattered everywhere 109 | (different packages, different languages) 110 | 111 | - provides a searchable model registry 112 | 113 | - meta-algorithms: 114 | 115 | - evaluating performance using different performance measures (aka 116 | metrics, scores, loss functions) 117 | 118 | - tuning (optimizing hyperparmaters) 119 | 120 | - facilitates model *composition* (e.g., pipelines) 121 | 122 | - customizable (getting under the hood) 123 | 124 | ### MLJ features 125 | 126 | 127 | ### A short tour of MLJ 128 | 129 | 130 | ## Part 1: Data ingestion and pre-processing 131 | 132 | ### Scientific types and type coercion 133 | 134 | - inspecting scitypes and coercing them 135 | 136 | - working with categorical data 137 | 138 | 139 | ### Tabular data 140 | 141 | - Lots of things can be considered as tabular data; examples: native 142 | tables, matrices, DataFrames, CSV files 143 | 144 | - Lots of ways to grab data; examples: 145 | 146 | - load a canned dataset 147 | - load from local file (e.g., csv) 148 | - create a synthetic data set 149 | - use OpenML 150 | - use RDatasets 151 | - use UrlDownload (or is there something better?) 152 | 153 | ### Demo 154 | 155 | ### Exercise 156 | 157 | ## 158 | 159 | -------------------------------------------------------------------------------- /setup.jl: -------------------------------------------------------------------------------- 1 | # Setup: 2 | 3 | isbinder() = "jovyan" in split(pwd(), "/") 4 | 5 | const REPO = "https://github.com/ablaom/MachineLearningInJulia2020" 6 | using Pkg 7 | 8 | if !isbinder() 9 | Pkg.activate(DIR) 10 | Pkg.instantiate() 11 | using CategoricalArrays 12 | import MLJLinearModels 13 | import DataFrames 14 | import CSV 15 | import MLJDecisionTreeInterface 16 | using MLJ 17 | import MLJClusteringInterface 18 | import MLJMultivariateStatsInterface 19 | import MLJScikitLearnInterface 20 | import MLJLinearModels 21 | import MLJMultivariateStatsInterface 22 | import MLJFlux 23 | import Plots 24 | else 25 | @info "Skipping package instantiation as binder notebook. " 26 | end 27 | @info "Done loading" 28 | -------------------------------------------------------------------------------- /stacking.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ablaom/MachineLearningInJulia2020/552f98fbf012475d67cd29a72448ac7c476ea2c7/stacking.png -------------------------------------------------------------------------------- /tuning.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ablaom/MachineLearningInJulia2020/552f98fbf012475d67cd29a72448ac7c476ea2c7/tuning.png -------------------------------------------------------------------------------- /tutorials.jl: -------------------------------------------------------------------------------- 1 | # # Machine Learning in Julia, JuliaCon2020 2 | 3 | # A workshop introducing the machine learning toolbox 4 | # [MLJ](https://alan-turing-institute.github.io/MLJ.jl/stable/). 5 | 6 | 7 | # ### Set-up 8 | 9 | # The following instantiates a package environment and pre-loads some 10 | # packages, to avoid delays later on. 11 | 12 | # The package environment has been created using **Julia 1.6** and may not 13 | # instantiate properly for other Julia versions. 
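# As an aside (a small sketch added here, not part of the original workshop
# script): if you want an explicit early warning when running under a
# different Julia version, a guard along these lines can be added before
# instantiating the environment:

VERSION >= v"1.6" && VERSION < v"1.7" ||
    @warn "This environment was built with Julia 1.6; " *
          "instantiation may fail under Julia $VERSION."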
14 | 15 | VERSION 16 | 17 | #- 18 | 19 | DIR = @__DIR__ 20 | include(joinpath(DIR, "setup.jl")) 21 | color_off() 22 | 23 | # ## General resources 24 | 25 | # - [List of methods introduced in this tutorial](methods.md) 26 | # - [MLJ Cheatsheet](https://alan-turing-institute.github.io/MLJ.jl/dev/mlj_cheatsheet/) 27 | # - [Common MLJ Workflows](https://alan-turing-institute.github.io/MLJ.jl/dev/common_mlj_workflows/) 28 | # - [MLJ manual](https://alan-turing-institute.github.io/MLJ.jl/dev/) 29 | # - [Data Science Tutorials in Julia](https://juliaai.github.io/DataScienceTutorials.jl/) 30 | 31 | 32 | # ## Contents 33 | 34 | # ### Basic 35 | 36 | # - [Part 1 - Data Representation](#part-1-data-representation) 37 | # - [Part 2 - Selecting, Training and Evaluating Models](#part-2-selecting-training-and-evaluating-models) 38 | # - [Part 3 - Transformers and Pipelines](#part-3-transformers-and-pipelines) 39 | # - [Part 4 - Tuning Hyper-parameters](#part-4-tuning-hyper-parameters) 40 | # - [Part 5 - Advanced model composition](#part-5-advanced-model-composition) 41 | # - [Solutions to Exercises](#solutions-to-exercises) 42 | 43 | 44 | # 45 | 46 | 47 | # ## Part 1 - Data Representation 48 | 49 | # > **Goals:** 50 | # > 1. Learn how MLJ specifies it's data requirements using "scientific" types 51 | # > 2. Understand the options for representing tabular data 52 | # > 3. Learn how to inspect and fix the representation of data to meet MLJ requirements 53 | 54 | 55 | # ### Scientific types 56 | 57 | # To help you focus on the intended *purpose* or *interpretation* of 58 | # data, MLJ models specify data requirements using *scientific types*, 59 | # instead of machine types. An example of a scientific type is 60 | # `OrderedFactor`. The other basic "scalar" scientific types are 61 | # illustrated below: 62 | 63 | # ![](assets/scitypes.png) 64 | 65 | # A scientific type is an ordinary Julia type (so it can be used for 66 | # method dispatch, for example) but it usually has no instances. The 67 | # `scitype` function is used to articulate MLJ's convention about how 68 | # different machine types will be interpreted by MLJ models: 69 | 70 | using MLJ 71 | scitype(3.141) 72 | 73 | #- 74 | 75 | time = [2.3, 4.5, 4.2, 1.8, 7.1] 76 | scitype(time) 77 | 78 | # To fix data which MLJ is interpreting incorrectly, we use the 79 | # `coerce` method: 80 | 81 | height = [185, 153, 163, 114, 180] 82 | scitype(height) 83 | 84 | #- 85 | 86 | height = coerce(height, Continuous) 87 | 88 | # Here's an example of data we would want interpreted as 89 | # `OrderedFactor` but isn't: 90 | 91 | exam_mark = ["rotten", "great", "bla", missing, "great"] 92 | scitype(exam_mark) 93 | 94 | #- 95 | 96 | exam_mark = coerce(exam_mark, OrderedFactor) 97 | 98 | #- 99 | 100 | levels(exam_mark) 101 | 102 | # Use `levels!` to put the classes in the right order: 103 | 104 | levels!(exam_mark, ["rotten", "bla", "great"]) 105 | exam_mark[1] < exam_mark[2] 106 | 107 | # When sub-sampling, no levels are lost: 108 | 109 | levels(exam_mark[1:2]) 110 | 111 | # **Note on binary data.** There is no separate scientific type for 112 | # binary data. Binary data is `OrderedFactor{2}` or 113 | # `Multiclass{2}`. If a binary measure like `truepositive` is a 114 | # applied to `OrderedFactor{2}` then the "positive" class is assumed 115 | # to appear *second* in the ordering. If such a measure is applied to 116 | # `Multiclass{2}` data, a warning is issued. 
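# For instance (a small illustration added here, not in the original
# script), a two-class vector coerced to `OrderedFactor` has element
# scitype `OrderedFactor{2}`, and the level listed *second* by `levels`
# is the one treated as "positive":

sick = coerce(["no", "yes", "yes", "no"], OrderedFactor)
levels(sick)     # ["no", "yes"], so "yes" plays the role of "positive"
elscitype(sick)  # OrderedFactor{2}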
A single `OrderedFactor` 117 | # can be coerced to a single `Continuous` variable, for models that 118 | # require this, while a `Multiclass` variable can only be one-hot 119 | # encoded. 120 | 121 | 122 | # ### Two-dimensional data 123 | 124 | # Whenever it makes sense, MLJ Models generally expect two-dimensional 125 | # data to be *tabular*. All the tabular formats implementing the 126 | # [Tables.jl API](https://juliadata.github.io/Tables.jl/stable/) (see 127 | # this 128 | # [list](https://github.com/JuliaData/Tables.jl/blob/master/INTEGRATIONS.md)) 129 | # have a scientific type of `Table` and can be used with such models. 130 | 131 | # Probably the simplest example of a table is the julia native *column 132 | # table*, which is just a named tuple of equal-length vectors: 133 | 134 | column_table = (h=height, e=exam_mark, t=time) 135 | 136 | #- 137 | 138 | scitype(column_table) 139 | 140 | #- 141 | 142 | # Notice the `Table{K}` type parameter `K` encodes the scientific 143 | # types of the columns. (This is useful when comparing table scitypes 144 | # with `<:`). To inspect the individual column scitypes, we use the 145 | # `schema` method instead: 146 | 147 | schema(column_table) 148 | 149 | # Here are five other examples of tables: 150 | 151 | dict_table = Dict(:h => height, :e => exam_mark, :t => time) 152 | schema(dict_table) 153 | 154 | # (To control column order here, instead use `LittleDict` from 155 | # OrderedCollections.jl.) 156 | 157 | row_table = [(a=1, b=3.4), 158 | (a=2, b=4.5), 159 | (a=3, b=5.6)] 160 | schema(row_table) 161 | 162 | #- 163 | 164 | import DataFrames 165 | df = DataFrames.DataFrame(column_table) 166 | 167 | #- 168 | 169 | schema(df) == schema(column_table) 170 | 171 | #- 172 | 173 | using CSV 174 | file = CSV.File(joinpath(DIR, "data", "horse.csv")); 175 | schema(file) # (triggers a file read) 176 | 177 | 178 | # Most MLJ models do not accept matrix in lieu of a table, but you can 179 | # wrap a matrix as a table: 180 | 181 | matrix_table = MLJ.table(rand(2,3)) 182 | schema(matrix_table) 183 | 184 | # The matrix is *not* copied, only wrapped. Some models may perform 185 | # better if one wraps the adjoint of the transpose - see 186 | # [here](https://alan-turing-institute.github.io/MLJ.jl/dev/getting_started/#Observations-correspond-to-rows,-not-columns). 187 | 188 | 189 | # **Manipulating tabular data.** In this workshop we assume 190 | # familiarity with some kind of tabular data container (although it is 191 | # possible, in principle, to carry out the exercises without this.) 192 | # For a quick start introduction to `DataFrames`, see [this 193 | # tutorial](https://juliaai.github.io/DataScienceTutorials.jl/data/dataframe/). 194 | 195 | # ### Fixing scientific types in tabular data 196 | 197 | # To show how we can correct the scientific types of data in tables, 198 | # we introduce a cleaned up version of the UCI Horse Colic Data Set 199 | # (the cleaning work-flow is described 200 | # [here](https://juliaai.github.io/DataScienceTutorials.jl/end-to-end/horse/#dealing_with_missing_values)). 
201 | 202 | using CSV 203 | file = CSV.File(joinpath(DIR, "data", "horse.csv")); 204 | horse = DataFrames.DataFrame(file); # convert to data frame without copying columns 205 | first(horse, 4) 206 | 207 | #- 208 | 209 | # From [the UCI 210 | # docs](http://archive.ics.uci.edu/ml/datasets/Horse+Colic) we can 211 | # surmise how each variable ought to be interpreted (a step in our 212 | # work-flow that cannot reliably be left to the computer): 213 | 214 | # variable | scientific type (interpretation) 215 | # ----------------------------|----------------------------------- 216 | # `:surgery` | Multiclass 217 | # `:age` | Multiclass 218 | # `:rectal_temperature` | Continuous 219 | # `:pulse` | Continuous 220 | # `:respiratory_rate` | Continuous 221 | # `:temperature_extremities` | OrderedFactor 222 | # `:mucous_membranes` | Multiclass 223 | # `:capillary_refill_time` | Multiclass 224 | # `:pain` | OrderedFactor 225 | # `:peristalsis` | OrderedFactor 226 | # `:abdominal_distension` | OrderedFactor 227 | # `:packed_cell_volume` | Continuous 228 | # `:total_protein` | Continuous 229 | # `:outcome` | Multiclass 230 | # `:surgical_lesion` | OrderedFactor 231 | # `:cp_data` | Multiclass 232 | 233 | # Let's see how MLJ will actually interpret the data, as it is 234 | # currently encoded: 235 | 236 | schema(horse) 237 | 238 | # As a first correction step, we can get MLJ to "guess" the 239 | # appropriate fix, using the `autotype` method: 240 | 241 | autotype(horse) 242 | 243 | #- 244 | 245 | # Okay, this is not perfect, but a step in the right direction, which 246 | # we implement like this: 247 | 248 | coerce!(horse, autotype(horse)); 249 | schema(horse) 250 | 251 | # All remaining `Count` data should be `Continuous`: 252 | 253 | coerce!(horse, Count => Continuous); 254 | schema(horse) 255 | 256 | # We'll correct the remaining truant entries manually: 257 | 258 | coerce!(horse, 259 | :surgery => Multiclass, 260 | :age => Multiclass, 261 | :mucous_membranes => Multiclass, 262 | :capillary_refill_time => Multiclass, 263 | :outcome => Multiclass, 264 | :cp_data => Multiclass); 265 | schema(horse) 266 | 267 | 268 | # ### Resources for Part 1 269 | # 270 | # - From the MLJ manual: 271 | # - [A preview of data type specification in 272 | # MLJ](https://alan-turing-institute.github.io/MLJ.jl/dev/getting_started/#A-preview-of-data-type-specification-in-MLJ-1) 273 | # - [Data containers and scientific types](https://alan-turing-institute.github.io/MLJ.jl/dev/getting_started/#Data-containers-and-scientific-types-1) 274 | # - [Working with Categorical Data](https://alan-turing-institute.github.io/MLJ.jl/dev/working_with_categorical_data/) 275 | # - [Summary](https://juliaai.github.io/ScientificTypes.jl/dev/#Summary-of-the-default-convention) of the MLJ convention for representing scientific types 276 | # - [ScientificTypes.jl](https://juliaai.github.io/ScientificTypes.jl/dev/) 277 | # - From Data Science Tutorials: 278 | # - [Data interpretation: Scientific Types](https://juliaai.github.io/DataScienceTutorials.jl/data/scitype/) 279 | # - [Horse colic data](https://juliaai.github.io/DataScienceTutorials.jl/end-to-end/horse/) 280 | # - [UCI Horse Colic Data Set](http://archive.ics.uci.edu/ml/datasets/Horse+Colic) 281 | 282 | 283 | # ### Exercises for Part 1 284 | 285 | 286 | # #### Exercise 1 287 | 288 | # Try to guess how each code snippet below will evaluate: 289 | 290 | scitype(42) 291 | 292 | #- 293 | 294 | questions = ["who", "why", "what", "when"] 295 | scitype(questions) 296 | 297 | #- 298 | 299 | 
elscitype(questions) 300 | 301 | #- 302 | 303 | t = (3.141, 42, "how") 304 | scitype(t) 305 | 306 | #- 307 | 308 | A = rand(2, 3) 309 | 310 | # - 311 | 312 | scitype(A) 313 | 314 | #- 315 | 316 | elscitype(A) 317 | 318 | #- 319 | 320 | using SparseArrays 321 | Asparse = sparse(A) 322 | 323 | #- 324 | 325 | scitype(Asparse) 326 | 327 | #- 328 | 329 | using CategoricalArrays 330 | C1 = categorical(A) 331 | 332 | #- 333 | 334 | scitype(C1) 335 | 336 | #- 337 | 338 | elscitype(C1) 339 | 340 | #- 341 | 342 | C2 = categorical(A, ordered=true) 343 | scitype(C2) 344 | 345 | #- 346 | 347 | v = [1, 2, missing, 4] 348 | scitype(v) 349 | 350 | #- 351 | 352 | elscitype(v) 353 | 354 | #- 355 | 356 | scitype(v[1:2]) 357 | 358 | # Can you guess at the general behavior of 359 | # `scitype` with respect to tuples, abstract arrays and missing 360 | # values? The answers are 361 | # [here](https://github.com/juliaai/ScientificTypesBase.jl#2-the-scitype-and-scitype-methods) 362 | # (ignore "Property 1"). 363 | 364 | 365 | # #### Exercise 2 366 | 367 | # Coerce the following vector to make MLJ recognize it as a vector of 368 | # ordered factors (with an appropriate ordering): 369 | 370 | quality = ["good", "poor", "poor", "excellent", missing, "good", "excellent"] 371 | 372 | #- 373 | 374 | 375 | # #### Exercise 3 (fixing scitypes in a table) 376 | 377 | # Fix the scitypes for the [House Prices in King 378 | # County](https://mlr3gallery.mlr-org.com/posts/2020-01-30-house-prices-in-king-county/) 379 | # dataset: 380 | 381 | file = CSV.File(joinpath(DIR, "data", "house.csv")); 382 | house = DataFrames.DataFrame(file); # convert to data frame without copying columns 383 | first(house, 4) 384 | 385 | # (Two features in the original data set have been deemed uninformative 386 | # and dropped, namely `:id` and `:date`. The original feature 387 | # `:yr_renovated` has been replaced by the `Bool` feature `is_renovated`.) 388 | 389 | # 390 | 391 | 392 | # ## Part 2 - Selecting, Training and Evaluating Models 393 | 394 | # > **Goals:** 395 | # > 1. Search MLJ's database of model metadata to identify model candidates for a supervised learning task. 396 | # > 2. Evaluate the performance of a model on a holdout set using basic `fit!`/`predict` work-flow. 397 | # > 3. Inspect the outcomes of training and save these to a file. 398 | # > 3. Evaluate performance using other resampling strategies, such as cross-validation, in one line, using `evaluate!` 399 | # > 4. Plot a "learning curve", to inspect performance as a function of some model hyper-parameter, such as an iteration parameter 400 | 401 | # The "Hello World!" of machine learning is to classify Fisher's 402 | # famous iris data set. This time, we'll grab the data from 403 | # [OpenML](https://www.openml.org): 404 | 405 | OpenML.describe_dataset(61) 406 | 407 | #- 408 | 409 | iris = OpenML.load(61); # a row table 410 | iris = DataFrames.DataFrame(iris); 411 | first(iris, 4) 412 | 413 | # **Main goal.** To build and evaluate models for predicting the 414 | # `:class` variable, given the four remaining measurement variables. 415 | 416 | 417 | # ### Step 1. Inspect and fix scientific types 418 | 419 | schema(iris) 420 | 421 | # Unfortunately, `Missing` is appearing in the element type, despite 422 | # the fact there are no missing values (see this 423 | # [issue](https://github.com/JuliaAI/OpenML.jl/issues/10)). 
To do this 424 | # we have to explicilty tighten the types: 425 | 426 | #- 427 | 428 | coerce!(iris, 429 | Union{Missing,Continuous}=>Continuous, 430 | Union{Missing,Multiclass}=>Multiclass, 431 | tight=true) 432 | schema(iris) 433 | 434 | 435 | # ### Step 2. Split data into input and target parts 436 | 437 | # Here's how we split the data into target and input features, which 438 | # is needed for MLJ supervised models. We randomize the data at the 439 | # same time: 440 | 441 | y, X = unpack(iris, ==(:class), name->true; rng=123); 442 | scitype(y) 443 | 444 | # Here's one way to access the documentation (at the REPL, `?unpack` 445 | # also works): 446 | 447 | @doc unpack #!md 448 | 449 | # #md 450 | 451 | 452 | # ### On searching for a model 453 | 454 | # Here's how to see *all* models (not immediately useful): 455 | 456 | all_models = models() 457 | 458 | # Each entry contains metadata for a model whose defining code is not yet loaded: 459 | 460 | meta = all_models[3] 461 | 462 | #- 463 | 464 | targetscitype = meta.target_scitype 465 | 466 | #- 467 | 468 | scitype(y) <: targetscitype 469 | 470 | # So this model won't do. Let's find all pure julia classifiers: 471 | 472 | filter_julia_classifiers(meta) = 473 | AbstractVector{Finite} <: meta.target_scitype && 474 | meta.is_pure_julia 475 | 476 | models(filter_julia_classifiers) 477 | 478 | # Find all models with "Classifier" in `name` (or `docstring`): 479 | 480 | models("Classifier") 481 | 482 | 483 | # Find all (supervised) models that match my data! 484 | 485 | models(matching(X, y)) 486 | 487 | 488 | 489 | # ### Step 3. Select and instantiate a model 490 | 491 | # To load the code defining a new model type we use the `@load` macro: 492 | 493 | NeuralNetworkClassifier = @load NeuralNetworkClassifier 494 | 495 | # Other ways to load model code are described 496 | # [here](https://alan-turing-institute.github.io/MLJ.jl/dev/loading_model_code/#Loading-Model-Code). 497 | 498 | # We'll instantiate this type with default values for the 499 | # hyperparameters: 500 | 501 | model = NeuralNetworkClassifier() 502 | 503 | #- 504 | 505 | info(model) 506 | 507 | # In MLJ a *model* is just a struct containing hyper-parameters, and 508 | # that's all. A model does not store *learned* parameters. Models are 509 | # mutable: 510 | 511 | model.epochs = 12 512 | 513 | # And all models have a key-word constructor that works once `@load` 514 | # has been performed: 515 | 516 | NeuralNetworkClassifier(epochs=12) == model 517 | 518 | 519 | # ### On fitting, predicting, and inspecting models 520 | 521 | # In MLJ a model and training/validation data are typically bound 522 | # together in a machine: 523 | 524 | mach = machine(model, X, y) 525 | 526 | # A machine stores *learned* parameters, among other things. We'll 527 | # train this machine on 70% of the data and evaluate on a 30% holdout 528 | # set. Let's start by dividing all row indices into `train` and `test` 529 | # subsets: 530 | 531 | train, test = partition(eachindex(y), 0.7) 532 | 533 | # Now we can `fit!`... 534 | 535 | fit!(mach, rows=train, verbosity=2) 536 | 537 | # ... and `predict`: 538 | 539 | yhat = predict(mach, rows=test); # or `predict(mach, Xnew)` 540 | yhat[1:3] 541 | 542 | # We'll have more to say on the form of this prediction shortly. 
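# An aside on the 70:30 split above (a sketch, not from the original
# script): `partition` also accepts keyword arguments, so the split can be
# shuffled reproducibly (and, in recent MLJ versions, stratified by class
# with `stratify=y`). Here this is optional, because the observations were
# already randomized in the earlier `unpack` call. We keep the original
# `train`/`test` defined above; the variables below are for illustration
# only and are not used later:

train_shuffled, test_shuffled =
    partition(eachindex(y), 0.7, shuffle=true, rng=123);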
543 | 544 | # After training, one can inspect the learned parameters: 545 | 546 | fitted_params(mach) 547 | 548 | #- 549 | 550 | # Everything else the user might be interested in is accessed from the 551 | # training *report*: 552 | 553 | report(mach) 554 | 555 | # You save a machine like this: 556 | 557 | MLJ.save("neural_net.jlso", mach) 558 | 559 | # And retrieve it like this: 560 | 561 | mach2 = machine("neural_net.jlso") 562 | yhat = predict(mach2, X); 563 | yhat[1:3] 564 | 565 | # If you want to fit a retrieved model, you will need to bind some data to it: 566 | 567 | mach3 = machine("neural_net.jlso", X, y) 568 | fit!(mach3) 569 | 570 | # Machines remember the last set of hyper-parameters used during fit, 571 | # which, in the case of iterative models, allows for a warm restart of 572 | # computations in the case that only the iteration parameter is 573 | # increased: 574 | 575 | model.epochs = model.epochs + 4 576 | fit!(mach, rows=train, verbosity=2) 577 | 578 | # For this particular model we can also increase `:learning_rate` 579 | # without triggering a cold restart: 580 | 581 | model.epochs = model.epochs + 4 582 | model.optimiser.eta = 10*model.optimiser.eta 583 | fit!(mach, rows=train, verbosity=2) 584 | 585 | # However, change any other parameter and training will restart from 586 | # scratch: 587 | 588 | model.lambda = 0.001 589 | fit!(mach, rows=train, verbosity=2) 590 | 591 | # Iterative models that implement warm-restart for training can be 592 | # controlled externally (eg, using an out-of-sample stopping 593 | # criterion). See 594 | # [here](https://alan-turing-institute.github.io/MLJ.jl/dev/controlling_iterative_models/) 595 | # for details. 596 | 597 | 598 | # Let's train silently for a total of 50 epochs, and look at a 599 | # prediction: 600 | 601 | model.epochs = 50 602 | fit!(mach, rows=train) 603 | yhat = predict(mach, X[test,:]); # or predict(mach, rows=test) 604 | yhat[1] 605 | 606 | # What's going on here? 607 | 608 | info(model).prediction_type 609 | 610 | # **Important**: 611 | # - In MLJ, a model that can predict probabilities (and not just point values) will do so by default. 612 | # - For most probabilistic predictors, the predicted object is a `Distributions.Distribution` object, supporting the `Distributions.jl` [API](https://juliastats.org/Distributions.jl/latest/extends/#Create-a-Distribution-1) for such objects. In particular, the methods `rand`, `pdf`, `logpdf`, `mode`, `median` and `mean` will apply, where appropriate. 613 | 614 | # So, to obtain the probability of "Iris-virginica" in the first test 615 | # prediction, we do 616 | 617 | pdf(yhat[1], "Iris-virginica") 618 | 619 | # To get the most likely observation, we do 620 | 621 | mode(yhat[1]) 622 | 623 | # These can be broadcast over multiple predictions in the usual way: 624 | 625 | broadcast(pdf, yhat[1:4], "Iris-versicolor") 626 | 627 | #- 628 | 629 | mode.(yhat[1:4]) 630 | 631 | # Or, alternatively, you can use the `predict_mode` operation instead 632 | # of `predict`: 633 | 634 | predict_mode(mach, X[test,:])[1:4] # or predict_mode(mach, rows=test)[1:4] 635 | 636 | # For a more conventional matrix of probabilities you can do this: 637 | 638 | L = levels(y) 639 | pdf(yhat, L)[1:4, :] 640 | 641 | # However, in a typical MLJ work-flow, this is not as useful as you 642 | # might imagine. 
In particular, all probabilistic performance measures 643 | # in MLJ expect distribution objects in their first slot: 644 | 645 | cross_entropy(yhat, y[test]) |> mean 646 | 647 | # To apply a deterministic measure, we first need to obtain point-estimates: 648 | 649 | misclassification_rate(mode.(yhat), y[test]) 650 | 651 | # We note in passing that there is also a search tool for measures 652 | # analogous to `models`: 653 | 654 | measures() 655 | 656 | 657 | # ### Step 4. Evaluate the model performance 658 | 659 | # Naturally, MLJ provides boilerplate code for carrying out a model 660 | # evaluation with a lot less fuss. Let's repeat the performance 661 | # evaluation above and add an extra measure, `brier_score`: 662 | 663 | evaluate!(mach, resampling=Holdout(fraction_train=0.7), 664 | measures=[cross_entropy, brier_score]) 665 | 666 | # Or applying cross-validation instead: 667 | 668 | evaluate!(mach, resampling=CV(nfolds=6), 669 | measures=[cross_entropy, brier_score]) 670 | 671 | # Or, Monte Carlo cross-validation (cross-validation repeated 672 | # randomized folds) 673 | 674 | e = evaluate!(mach, resampling=CV(nfolds=6, rng=123), 675 | repeats=3, 676 | measures=[cross_entropy, brier_score]) 677 | 678 | # One can access the following properties of the output `e` of an 679 | # evaluation: `measure`, `measurement`, `per_fold` (measurement for 680 | # each fold) and `per_observation` (measurement per observation, if 681 | # reported). 682 | 683 | # We finally note that you can restrict the rows of observations from 684 | # which train and test folds are drawn, by specifying `rows=...`. For 685 | # example, imagining the last 30% of target observations are `missing` 686 | # you might have a work-flow like this: 687 | 688 | train, test = partition(eachindex(y), 0.7) 689 | mach = machine(model, X, y) 690 | evaluate!(mach, resampling=CV(nfolds=6), 691 | measures=[cross_entropy, brier_score], 692 | rows=train) # cv estimate, resampling from `train` 693 | fit!(mach, rows=train) # re-train using all of `train` observations 694 | predict(mach, rows=test); # and predict missing targets 695 | 696 | 697 | # ### On learning curves 698 | 699 | # Since our model is an iterative one, we might want to inspect the 700 | # out-of-sample performance as a function of the iteration 701 | # parameter. For this we can use the `learning_curve` function (which, 702 | # incidentally can be applied to any model hyper-parameter). This 703 | # starts by defining a one-dimensional range object for the parameter 704 | # (more on this when we discuss tuning in Part 4): 705 | 706 | r = range(model, :epochs, lower=1, upper=50, scale=:log) 707 | 708 | #- 709 | 710 | curve = learning_curve(mach, 711 | range=r, 712 | resampling=Holdout(fraction_train=0.7), # (default) 713 | measure=cross_entropy) 714 | 715 | using Plots 716 | gr(size=(490,300)) 717 | plt=plot(curve.parameter_values, curve.measurements) 718 | xlabel!(plt, "epochs") 719 | ylabel!(plt, "cross entropy on holdout set") 720 | savefig("learning_curve.png") 721 | plt #!md 722 | # ![](learning_curve.png) #md 723 | 724 | # We will return to learning curves when we look at tuning in Part 4. 
725 | 726 | 727 | # ### Resources for Part 2 728 | 729 | # - From the MLJ manual: 730 | # - [Getting Started](https://alan-turing-institute.github.io/MLJ.jl/dev/getting_started/) 731 | # - [Model Search](https://alan-turing-institute.github.io/MLJ.jl/dev/model_search/) 732 | # - [Evaluating Performance](https://alan-turing-institute.github.io/MLJ.jl/dev/evaluating_model_performance/) (using `evaluate!`) 733 | # - [Learning Curves](https://alan-turing-institute.github.io/MLJ.jl/dev/learning_curves/) 734 | # - [Performance Measures](https://alan-turing-institute.github.io/MLJ.jl/dev/performance_measures/) (loss functions, scores, etc) 735 | # - From Data Science Tutorials: 736 | # - [Choosing and evaluating a model](https://juliaai.github.io/DataScienceTutorials.jl/getting-started/choosing-a-model/) 737 | # - [Fit, predict, transform](https://juliaai.github.io/DataScienceTutorials.jl/getting-started/fit-and-predict/) 738 | 739 | 740 | # ### Exercises for Part 2 741 | 742 | 743 | # #### Exercise 4 744 | 745 | # (a) Identify all supervised MLJ models that can be applied (without 746 | # type coercion or one-hot encoding) to a supervised learning problem 747 | # with input features `X4` and target `y4` defined below: 748 | 749 | import Distributions 750 | poisson = Distributions.Poisson 751 | 752 | age = 18 .+ 60*rand(10); 753 | salary = coerce(rand(["small", "big", "huge"], 10), OrderedFactor); 754 | levels!(salary, ["small", "big", "huge"]); 755 | small = CategoricalValue("small", salary) 756 | 757 | #- 758 | 759 | X4 = DataFrames.DataFrame(age=age, salary=salary) 760 | 761 | n_devices(salary) = salary > small ? rand(poisson(1.3)) : rand(poisson(2.9)) 762 | y4 = [n_devices(row.salary) for row in eachrow(X4)] 763 | 764 | # (b) What models can be applied if you coerce the salary to a 765 | # `Continuous` scitype? 766 | 767 | 768 | # #### Exercise 5 (unpack) 769 | 770 | # After evaluating the following ... 771 | 772 | data = (a = [1, 2, 3, 4], 773 | b = rand(4), 774 | c = rand(4), 775 | d = coerce(["male", "female", "female", "male"], OrderedFactor)); 776 | pretty(data) 777 | 778 | #- 779 | 780 | using Tables 781 | y, X, w = unpack(data, 782 | ==(:a), 783 | name -> elscitype(Tables.getcolumn(data, name)) == Continuous, 784 | name -> true); 785 | 786 | # ...attempt to guess the evaluations of the following: 787 | 788 | y 789 | 790 | #- 791 | 792 | pretty(X) 793 | 794 | #- 795 | 796 | w 797 | 798 | # #### Exercise 6 (first steps in modeling Horse Colic) 799 | 800 | # (a) Suppose we want to use predict the `:outcome` variable in the 801 | # Horse Colic study introduced in Part 1, based on the remaining 802 | # variables that are `Continuous` (one-hot encoding categorical 803 | # variables is discussed later in Part 3) *while ignoring the others*. 804 | # Extract from the `horse` data set (defined in Part 1) appropriate 805 | # input features `X` and target variable `y`. (Do not, however, 806 | # randomize the observations.) 807 | 808 | # (b) Create a 70:30 `train`/`test` split of the data and train a 809 | # `LogisticClassifier` model, from the `MLJLinearModels` package, on 810 | # the `train` rows. Use `lambda=100` and default values for the 811 | # other hyper-parameters. (Although one would normally standardize 812 | # (whiten) the continuous features for this model, do not do so here.) 
813 | # After training: 814 | 815 | # - (i) Recalling that a logistic classifier (aka logistic regressor) is 816 | # a linear-based model learning a *vector* of coefficients for each 817 | # feature (one coefficient for each target class), use the 818 | # `fitted_params` method to find this vector of coefficients in the 819 | # case of the `:pulse` feature. (You can convert a vector of pairs `v = 820 | # [x1 => y1, x2 => y2, ...]` into a dictionary with `Dict(v)`.) 821 | 822 | # - (ii) Evaluate the `cross_entropy` performance on the `test` 823 | # observations. 824 | 825 | # - ☆(iii) In how many `test` observations does the predicted 826 | # probability of the observed class exceed 50%? 827 | 828 | # - (iv) Find the `misclassification_rate` in the `test` 829 | # set. (*Hint.* As this measure is deterministic, you will either 830 | # need to broadcast `mode` or use `predict_mode` instead of 831 | # `predict`.) 832 | 833 | # (c) Instead use a `RandomForestClassifier` model from the 834 | # `DecisionTree` package and: 835 | # 836 | # - (i) Generate an appropriate learning curve to convince yourself 837 | # that out-of-sample estimates of the `cross_entropy` loss do not 838 | # substantially improve for `n_trees > 50`. Use default values for 839 | # all other hyper-parameters, and feel free to use all available 840 | # data to generate the curve. 841 | 842 | # - (ii) Fix `n_trees=90` and use `evaluate!` to obtain a 9-fold 843 | # cross-validation estimate of the `cross_entropy`, restricting 844 | # sub-sampling to the `train` observations. 845 | 846 | # - (iii) Now use *all* available data but set 847 | # `resampling=Holdout(fraction_train=0.7)` to obtain a score you can 848 | # compare with the `KNNClassifier` in part (b)(iii). Which model is 849 | # better? 850 | 851 | # 852 | 853 | 854 | # ## Part 3 - Transformers and Pipelines 855 | 856 | # ### Transformers 857 | 858 | # Unsupervised models, which receive no target `y` during training, 859 | # always have a `transform` operation. They sometimes also support an 860 | # `inverse_transform` operation, with obvious meaning, and sometimes 861 | # support a `predict` operation (see the clustering example discussed 862 | # [here](https://alan-turing-institute.github.io/MLJ.jl/dev/transformers/#Transformers-that-also-predict-1)). 863 | # Otherwise, they are handled much like supervised models. 864 | 865 | # Here's a simple standardization example: 866 | 867 | x = rand(100); 868 | @show mean(x) std(x); 869 | 870 | #- 871 | 872 | model = Standardizer() # a built-in model 873 | mach = machine(model, x) 874 | fit!(mach) 875 | xhat = transform(mach, x); 876 | @show mean(xhat) std(xhat); 877 | 878 | # This particular model has an `inverse_transform`: 879 | 880 | inverse_transform(mach, xhat) ≈ x 881 | 882 | 883 | # ### Re-encoding the King County House data as continuous 884 | 885 | # For further illustrations of transformers, let's re-encode *all* of the 886 | # King County House input features (see [Ex 887 | # 3](#exercise-3-fixing-scitypes-in-a-table)) into a set of `Continuous` 888 | # features. 
We do this with the `ContinuousEncoder` model, which, by 889 | # default, will: 890 | 891 | # - one-hot encode all `Multiclass` features 892 | # - coerce all `OrderedFactor` features to `Continuous` ones 893 | # - coerce all `Count` features to `Continuous` ones (there aren't any) 894 | # - drop any remaining non-Continuous features (none of these either) 895 | 896 | # First, we reload the data and fix the scitypes (Exercise 3): 897 | 898 | file = CSV.File(joinpath(DIR, "data", "house.csv")); 899 | house = DataFrames.DataFrame(file); 900 | coerce!(house, autotype(file)); 901 | coerce!(house, Count => Continuous, :zipcode => Multiclass); 902 | schema(house) 903 | 904 | #- 905 | 906 | y, X = unpack(house, ==(:price), name -> true, rng=123); 907 | 908 | # Instantiate the unsupervised model (transformer): 909 | 910 | encoder = ContinuousEncoder() # a built-in model; no need to @load it 911 | 912 | # Bind the model to the data and fit! 913 | 914 | mach = machine(encoder, X) |> fit!; 915 | 916 | # Transform and inspect the result: 917 | 918 | Xcont = transform(mach, X); 919 | schema(Xcont) 920 | 921 | 922 | # ### More transformers 923 | 924 | # Here's how to list all of MLJ's unsupervised models: 925 | 926 | models(m->!m.is_supervised) 927 | 928 | # Some commonly used ones are built-in (do not require `@load`ing): 929 | 930 | # model type | does what? 931 | # ----------------------------|---------------------------------------------- 932 | # ContinuousEncoder | transform input table to a table of `Continuous` features (see above) 933 | # FeatureSelector | retain or dump selected features 934 | # FillImputer | impute missing values 935 | # OneHotEncoder | one-hot encoder `Multiclass` (and optionally `OrderedFactor`) features 936 | # Standardizer | standardize (whiten) a vector or all `Continuous` features of a table 937 | # UnivariateBoxCoxTransformer | apply a learned Box-Cox transformation to a vector 938 | # UnivariateDiscretizer | discretize a `Continuous` vector, and hence render its elscitypw `OrderedFactor` 939 | 940 | 941 | # In addition to "dynamic" transformers (ones that learn something 942 | # from the data and must be `fit!`) users can wrap ordinary functions 943 | # as transformers, and such *static* transformers can depend on 944 | # parameters, like the dynamic ones. See 945 | # [here](https://alan-turing-institute.github.io/MLJ.jl/dev/transformers/#Static-transformers-1) 946 | # for how to define your own static transformers. 947 | 948 | 949 | # ### Pipelines 950 | 951 | length(schema(Xcont).names) 952 | 953 | # Let's suppose that additionally we'd like to reduce the dimension of 954 | # our data. A model that will do this is `PCA` from 955 | # `MultivariateStats`: 956 | 957 | PCA = @load PCA 958 | reducer = PCA() 959 | 960 | # Now, rather simply repeating the work-flow above, applying the new 961 | # transformation to `Xcont`, we can combine both the encoding and the 962 | # dimension-reducing models into a single model, known as a 963 | # *pipeline*. While MLJ offers a powerful interface for composing 964 | # models in a variety of ways, we'll stick to these simplest class of 965 | # composite models for now. The easiest way to construct them is using 966 | # the `@pipeline` macro: 967 | 968 | pipe = @pipeline encoder reducer 969 | 970 | # Notice that `pipe` is an *instance* of an automatically generated 971 | # type (called `Pipeline`). 
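# As an aside (an extra illustration, not part of the original script): the
# component models become hyper-parameters of the automatically generated
# type, with snake-case field names derived from the component model types,
# so nested parameters can be reached with dot syntax (a fact used again
# below when mutating pipeline hyper-parameters):

pipe.continuous_encoder    # the wrapped ContinuousEncoder
pipe.pca.pratio            # `pratio` of the wrapped PCA model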
972 | 973 | # The new model behaves like any other transformer: 974 | 975 | mach = machine(pipe, X) 976 | fit!(mach) 977 | Xsmall = transform(mach, X) 978 | schema(Xsmall) 979 | 980 | # Want to combine this pre-processing with ridge regression? 981 | 982 | RidgeRegressor = @load RidgeRegressor pkg=MLJLinearModels 983 | rgs = RidgeRegressor() 984 | pipe2 = @pipeline encoder reducer rgs 985 | 986 | # Now our pipeline is a supervised model, instead of a transformer, 987 | # whose performance we can evaluate: 988 | 989 | mach = machine(pipe2, X, y) 990 | evaluate!(mach, measure=mae, resampling=Holdout()) # CV(nfolds=6) is default 991 | 992 | 993 | # ### Training of composite models is "smart" 994 | 995 | # Now notice what happens if we train on all the data, then change a 996 | # regressor hyper-parameter and retrain: 997 | 998 | fit!(mach) 999 | 1000 | #- 1001 | 1002 | pipe2.ridge_regressor.lambda = 0.1 1003 | fit!(mach) 1004 | 1005 | # Second time only the ridge regressor is retrained! 1006 | 1007 | # Mutate a hyper-parameter of the `PCA` model and every model except 1008 | # the `ContinuousEncoder` (which comes before it will be retrained): 1009 | 1010 | pipe2.pca.pratio = 0.9999 1011 | fit!(mach) 1012 | 1013 | 1014 | # ### Inspecting composite models 1015 | 1016 | # The dot syntax used above to change the values of *nested* 1017 | # hyper-parameters is also useful when inspecting the learned 1018 | # parameters and report generated when training a composite model: 1019 | 1020 | fitted_params(mach).ridge_regressor 1021 | 1022 | #- 1023 | 1024 | report(mach).pca 1025 | 1026 | 1027 | # ### Incorporating target transformations 1028 | 1029 | # Next, suppose that instead of using the raw `:price` as the 1030 | # training target, we want to use the log-price (a common practice in 1031 | # dealing with house price data). However, suppose that we still want 1032 | # to report final *predictions* on the original linear scale (and use 1033 | # these for evaluation purposes). Then we supply appropriate functions 1034 | # to key-word arguments `target` and `inverse`. 1035 | 1036 | # First we'll overload `log` and `exp` for broadcasting: 1037 | Base.log(v::AbstractArray) = log.(v) 1038 | Base.exp(v::AbstractArray) = exp.(v) 1039 | 1040 | # Now for the new pipeline: 1041 | 1042 | pipe3 = @pipeline encoder reducer rgs target=log inverse=exp 1043 | mach = machine(pipe3, X, y) 1044 | evaluate!(mach, measure=mae) 1045 | 1046 | # MLJ will also allow you to insert *learned* target 1047 | # transformations. For example, we might want to apply 1048 | # `Standardizer()` to the target, to standardize it, or 1049 | # `UnivariateBoxCoxTransformer()` to make it look Gaussian. Then 1050 | # instead of specifying a *function* for `target`, we specify a 1051 | # unsupervised *model* (or model type). One does not specify `inverse` 1052 | # because only models implementing `inverse_transform` are 1053 | # allowed. 
1054 | 1055 | # Let's see which of these two options results in a better outcome: 1056 | 1057 | box = UnivariateBoxCoxTransformer(n=20) 1058 | stand = Standardizer() 1059 | 1060 | pipe4 = @pipeline encoder reducer rgs target=box 1061 | mach = machine(pipe4, X, y) 1062 | evaluate!(mach, measure=mae) 1063 | 1064 | #- 1065 | 1066 | pipe4.target = stand 1067 | evaluate!(mach, measure=mae) 1068 | 1069 | 1070 | # ### Resources for Part 3 1071 | 1072 | # - From the MLJ manual: 1073 | # - [Transformers and other unsupervised models](https://alan-turing-institute.github.io/MLJ.jl/dev/transformers/) 1074 | # - [Linear pipelines](https://alan-turing-institute.github.io/MLJ.jl/dev/linear_pipelines/#Linear-Pipelines) 1075 | # - From Data Science Tutorials: 1076 | # - [Composing models](https://juliaai.github.io/DataScienceTutorials.jl/getting-started/composing-models/) 1077 | 1078 | 1079 | # ### Exercises for Part 3 1080 | 1081 | # #### Exercise 7 1082 | 1083 | # Consider again the Horse Colic classification problem considered in 1084 | # Exercise 6, but with all features, `Finite` and `Infinite`: 1085 | 1086 | y, X = unpack(horse, ==(:outcome), name -> true); 1087 | schema(X) 1088 | 1089 | # (a) Define a pipeline that: 1090 | # - uses `Standardizer` to ensure that features that are already 1091 | # continuous are centered at zero and have unit variance 1092 | # - re-encodes the full set of features as `Continuous`, using 1093 | # `ContinuousEncoder` 1094 | # - uses the `KMeans` clustering model from `Clustering.jl` 1095 | # to reduce the dimension of the feature space to `k=10`. 1096 | # - trains a `EvoTreeClassifier` (a gradient tree boosting 1097 | # algorithm in `EvoTrees.jl`) on the reduced data, using 1098 | # `nrounds=50` and default values for the other 1099 | # hyper-parameters 1100 | 1101 | # (b) Evaluate the pipeline on all data, using 6-fold cross-validation 1102 | # and `cross_entropy` loss. 1103 | 1104 | # ☆(c) Plot a learning curve which examines the effect on this loss 1105 | # as the tree booster parameter `max_depth` varies from 2 to 10. 1106 | 1107 | # 1108 | 1109 | 1110 | # ## Part 4 - Tuning Hyper-parameters 1111 | 1112 | # ### Naive tuning of a single parameter 1113 | 1114 | # The most naive way to tune a single hyper-parameter is to use 1115 | # `learning_curve`, which we already saw in Part 2. 
Let's see this in 1116 | # the Horse Colic classification problem, in a case where the parameter 1117 | # to be tuned is *nested* (because the model is a pipeline): 1118 | 1119 | y, X = unpack(horse, ==(:outcome), name -> true); 1120 | 1121 | LogisticClassifier = @load LogisticClassifier pkg=MLJLinearModels 1122 | model = @pipeline Standardizer ContinuousEncoder LogisticClassifier 1123 | mach = machine(model, X, y) 1124 | 1125 | #- 1126 | 1127 | r = range(model, :(logistic_classifier.lambda), lower = 1e-2, upper=100, scale=:log10) 1128 | 1129 | # If you're curious, you can see what `lambda` values this range will 1130 | # generate for a given resolution: 1131 | 1132 | iterator(r, 5) 1133 | 1134 | #- 1135 | 1136 | _, _, lambdas, losses = learning_curve(mach, 1137 | range=r, 1138 | resampling=CV(nfolds=6), 1139 | resolution=30, # default 1140 | measure=cross_entropy) 1141 | plt=plot(lambdas, losses, xscale=:log10) 1142 | xlabel!(plt, "lambda") 1143 | ylabel!(plt, "cross entropy using 6-fold CV") 1144 | savefig("learning_curve2.png") 1145 | plt #!md 1146 | 1147 | # ![](learning_curve2.png) #md 1148 | 1149 | best_lambda = lambdas[argmin(losses)] 1150 | 1151 | 1152 | # ### Self tuning models 1153 | 1154 | # A more sophisticated way to view hyper-parameter tuning (inspired by 1155 | # MLR) is as a model *wrapper*. The wrapped model is a new model in 1156 | # its own right and when you fit it, it tunes specified 1157 | # hyper-parameters of the model being wrapped, before training on all 1158 | # supplied data. Calling `predict` on the wrapped model is like 1159 | # calling `predict` on the original model, but with the 1160 | # hyper-parameters already optimized. 1161 | 1162 | # In other words, we can think of the wrapped model as a "self-tuning" 1163 | # version of the original. 1164 | 1165 | # We now create a self-tuning version of the pipeline above, adding a 1166 | # parameter from the `ContinuousEncoder` to the parameters we want 1167 | # optimized. 1168 | 1169 | # First, let's choose a tuning strategy (from [these 1170 | # options](https://github.com/juliaai/MLJTuning.jl#what-is-provided-here)). MLJ 1171 | # supports ordinary `Grid` search (query `?Grid` for 1172 | # details). However, as the utility of `Grid` search is limited to a 1173 | # small number of parameters, and as `Grid` searches are demonstrated 1174 | # elsewhere (see the [resources below](#resources-for-part-4)) we'll 1175 | # demonstrate `RandomSearch` here: 1176 | 1177 | tuning = RandomSearch(rng=123) 1178 | 1179 | # In this strategy each parameter is sampled according to a 1180 | # pre-specified prior distribution that is fit to the one-dimensional 1181 | # range object constructed using `range` as before. While one has a 1182 | # lot of control over the specification of the priors (run 1183 | # `?RandomSearch` for details) we'll let the algorithm generate these 1184 | # priors automatically. 1185 | 1186 | 1187 | # #### Unbounded ranges and sampling 1188 | 1189 | # In MLJ a range does not have to be bounded. In a `RandomSearch` a 1190 | # positive unbounded range is sampled using a `Gamma` distribution, by 1191 | # default: 1192 | 1193 | r = range(model, 1194 | :(logistic_classifier.lambda), 1195 | lower=0, 1196 | origin=6, 1197 | unit=5, 1198 | scale=:log10) 1199 | 1200 | # The `scale` in a range makes no in a `RandomSearch` (unless it is a 1201 | # function) but this will effect later plots but it does effect the 1202 | # later plots. 
1203 | 1204 | # Let's see what sampling using a Gamma distribution is going to mean 1205 | # for this range: 1206 | 1207 | import Distributions 1208 | sampler_r = sampler(r, Distributions.Gamma) 1209 | plt = histogram(rand(sampler_r, 10000), nbins=50) 1210 | savefig("gamma_sampler.png") 1211 | plt #!md 1212 | 1213 | # ![](gamma_sampler.png) 1214 | 1215 | # The second parameter that we'll add to this is *nominal* (finite) and, by 1216 | # default, will be sampled uniformly. Since it is nominal, we specify 1217 | # `values` instead of `upper` and `lower` bounds: 1218 | 1219 | s = range(model, :(continuous_encoder.one_hot_ordered_factors), 1220 | values = [true, false]) 1221 | 1222 | 1223 | # #### The tuning wrapper 1224 | 1225 | # Now for the wrapper, which is an instance of `TunedModel`: 1226 | 1227 | tuned_model = TunedModel(model=model, 1228 | ranges=[r, s], 1229 | resampling=CV(nfolds=6), 1230 | measures=cross_entropy, 1231 | tuning=tuning, 1232 | n=15) 1233 | 1234 | # We can apply the `fit!/predict` work-flow to `tuned_model` just as 1235 | # for any other model: 1236 | 1237 | tuned_mach = machine(tuned_model, X, y); 1238 | fit!(tuned_mach); 1239 | predict(tuned_mach, rows=1:3) 1240 | 1241 | # The outcomes of the tuning can be inspected from a detailed 1242 | # report. For example, we have: 1243 | 1244 | rep = report(tuned_mach); 1245 | rep.best_model 1246 | 1247 | # By default, sampling of a bounded range is uniform. Lets 1248 | 1249 | # In the special case of two-parameters, you can also plot the results: 1250 | 1251 | plt = plot(tuned_mach) 1252 | savefig("tuning.png") 1253 | plt #!md 1254 | 1255 | # ![](tuning.png) #md 1256 | 1257 | # Finally, let's compare cross-validation estimate of the performance 1258 | # of the self-tuning model with that of the original model (an example 1259 | # of [*nested 1260 | # resampling*]((https://mlr.mlr-org.com/articles/tutorial/nested_resampling.html) 1261 | # here): 1262 | 1263 | err = evaluate!(mach, resampling=CV(nfolds=3), measure=cross_entropy) 1264 | 1265 | #- 1266 | 1267 | tuned_err = evaluate!(tuned_mach, resampling=CV(nfolds=3), measure=cross_entropy) 1268 | 1269 | # 1270 | 1271 | 1272 | # ### Resources for Part 4 1273 | # 1274 | # - From the MLJ manual: 1275 | # - [Learning Curves](https://alan-turing-institute.github.io/MLJ.jl/dev/learning_curves/) 1276 | # - [Tuning Models](https://alan-turing-institute.github.io/MLJ.jl/dev/tuning_models/) 1277 | # - The [MLJTuning repo](https://github.com/juliaai/MLJTuning.jl#who-is-this-repo-for) - mostly for developers 1278 | # 1279 | # - From Data Science Tutorials: 1280 | # - [Tuning a model](https://juliaai.github.io/DataScienceTutorials.jl/getting-started/model-tuning/) 1281 | # - [Crabs with XGBoost](https://juliaai.github.io/DataScienceTutorials.jl/end-to-end/crabs-xgb/) `Grid` tuning in stages for a tree-boosting model with many parameters 1282 | # - [Boston with LightGBM](https://juliaai.github.io/DataScienceTutorials.jl/end-to-end/boston-lgbm/) - `Grid` tuning for another popular tree-booster 1283 | # - [Boston with Flux](https://juliaai.github.io/DataScienceTutorials.jl/end-to-end/boston-flux/) - optimizing batch size in a simple neural network regressor 1284 | # - [UCI Horse Colic Data Set](http://archive.ics.uci.edu/ml/datasets/Horse+Colic) 1285 | 1286 | 1287 | # ### Exercises for Part 4 1288 | 1289 | # #### Exercise 8 1290 | 1291 | # This exercise continues our analysis of the King County House price 1292 | # prediction problem: 1293 | 1294 | y, X = unpack(house, ==(:price), name -> 
true, rng=123); 1295 | 1296 | # Your task will be to tune the following pipeline regression model, 1297 | # which includes a gradient tree boosting component: 1298 | 1299 | EvoTreeRegressor = @load EvoTreeRegressor 1300 | tree_booster = EvoTreeRegressor(nrounds = 70) 1301 | model = @pipeline ContinuousEncoder tree_booster 1302 | 1303 | # (a) Construct a bounded range `r1` for the `evo_tree_booster` 1304 | # parameter `max_depth`, varying between 1 and 12. 1305 | 1306 | # \star&(b) For the `nbins` parameter of the `EvoTreeRegressor`, define the range 1307 | 1308 | r2 = range(model, 1309 | :(evo_tree_regressor.nbins), 1310 | lower = 2.5, 1311 | upper= 7.5, scale=x->2^round(Int, x)) 1312 | 1313 | # Notice that in this case we've specified a *function* instead of a 1314 | # canned scale, like `:log10`. In this case the `scale` function is 1315 | # applied after sampling (uniformly) between the limits of `lower` and 1316 | # `upper`. Perhaps you can guess the outputs of the following lines of 1317 | # code? 1318 | 1319 | r2_sampler = sampler(r2, Distributions.Uniform) 1320 | samples = rand(r2_sampler, 1000); 1321 | plt = histogram(samples, nbins=50) 1322 | savefig("uniform_sampler.png") 1323 | 1324 | plt #!md 1325 | 1326 | # ![](uniform_sampler.png) 1327 | 1328 | sort(unique(samples)) 1329 | 1330 | # (c) Optimize `model` over these the parameter ranges `r1` and `r2` 1331 | # using a random search with uniform priors (the default). Use 1332 | # `Holdout()` resampling, and implement your search by first 1333 | # constructing a "self-tuning" wrap of `model`, as described 1334 | # above. Make `mae` (mean absolute error) the loss function that you 1335 | # optimize, and search over a total of 40 combinations of 1336 | # hyper-parameters. If you have time, plot the results of your 1337 | # search. Feel free to use all available data. 1338 | 1339 | # (d) Evaluate the best model found in the search using 3-fold 1340 | # cross-validation and compare with that of the self-tuning model 1341 | # (which is different!). Setting data hygiene concerns aside, feel 1342 | # free to use all available data. 1343 | 1344 | # 1345 | 1346 | 1347 | # ## Part 5 - Advanced Model Composition 1348 | 1349 | # > **Goals:** 1350 | # > 1. Learn how to build a prototypes of a composite model, called a *learning network* 1351 | # > 2. Learn how to use the `@from_network` macro to export a learning network as a new stand-alone model type 1352 | 1353 | # While `@pipeline` is great for composing models in an unbranching 1354 | # sequence, for more complicated model composition you'll want to use 1355 | # MLJ's generic model composition syntax. There are two main steps: 1356 | 1357 | # - **Prototype** the composite model by building a *learning 1358 | # network*, which can be tested on some (dummy) data as you build 1359 | # it. 1360 | 1361 | # - **Export** the learning network as a new stand-alone model type. 1362 | 1363 | # Like pipeline models, instances of the exported model type behave 1364 | # like any other model (and are not bound to any data, until you wrap 1365 | # them in a machine). 1366 | 1367 | 1368 | # ### Building a pipeline using the generic composition syntax 1369 | 1370 | # To warm up, we'll do the equivalent of 1371 | 1372 | pipe = @pipeline Standardizer LogisticClassifier; 1373 | 1374 | # using the generic syntax. 
1375 | 1376 | # Here's some dummy data we'll be using to test our learning network: 1377 | 1378 | X, y = make_blobs(5, 3) 1379 | pretty(X) 1380 | 1381 | # **Step 0** - Proceed as if you were combining the models "by hand", 1382 | # using all the data available for training, transforming and 1383 | # prediction: 1384 | 1385 | stand = Standardizer(); 1386 | linear = LogisticClassifier(); 1387 | 1388 | mach1 = machine(stand, X); 1389 | fit!(mach1); 1390 | Xstand = transform(mach1, X); 1391 | 1392 | mach2 = machine(linear, Xstand, y); 1393 | fit!(mach2); 1394 | yhat = predict(mach2, Xstand) 1395 | 1396 | # **Step 1** - Edit your code as follows: 1397 | 1398 | # - pre-wrap the data in `Source` nodes 1399 | 1400 | # - delete the `fit!` calls 1401 | 1402 | X = source(X) # or X = source() if not testing 1403 | y = source(y) # or y = source() 1404 | 1405 | stand = Standardizer(); 1406 | linear = LogisticClassifier(); 1407 | 1408 | mach1 = machine(stand, X); 1409 | Xstand = transform(mach1, X); 1410 | 1411 | mach2 = machine(linear, Xstand, y); 1412 | yhat = predict(mach2, Xstand) 1413 | 1414 | # Now `X`, `y`, `Xstand` and `yhat` are *nodes* ("variables" or 1415 | # "dynamic data") instead of data. All training, predicting and 1416 | # transforming is now executed lazily, whenever we `fit!` one of these 1417 | # nodes. We *call* a node to retrieve the data it represents in the 1418 | # original manual workflow. 1419 | 1420 | fit!(Xstand) 1421 | Xstand() |> pretty 1422 | 1423 | #- 1424 | 1425 | fit!(yhat); 1426 | yhat() 1427 | 1428 | # The node `yhat` is the "descendant" (in an associated DAG we have 1429 | # defined) of a unique source node: 1430 | 1431 | sources(yhat) 1432 | 1433 | #- 1434 | 1435 | # The data at the source node is replaced by `Xnew` to obtain a 1436 | # new prediction when we call `yhat` like this: 1437 | 1438 | Xnew, _ = make_blobs(2, 3); 1439 | yhat(Xnew) 1440 | 1441 | 1442 | # **Step 2** - Export the learning network as a new stand-alone model type 1443 | 1444 | # Now, somewhat paradoxically, we can wrap the whole network in a 1445 | # special machine - called a *learning network machine* - before we have 1446 | # defined the new model type. Indeed, doing so is a necessary step in 1447 | # the export process, for this machine will tell the export macro: 1448 | 1449 | # - what kind of model the composite will be (`Deterministic`, 1450 | # `Probabilistic` or `Unsupervised`) 1451 | 1452 | # - which source nodes are input nodes and which are for the target 1453 | 1454 | # - which nodes correspond to each operation (`predict`, `transform`, 1455 | # etc) that we might want to define 1456 | 1457 | surrogate = Probabilistic() # a model with no fields! 
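# (Added aside, a quick sketch not in the original: we can probe the claim that
# the surrogate really has no fields - it is just an empty container whose only
# job is to declare what kind of composite we are exporting:)

fieldnames(typeof(surrogate))   # expected to be (), i.e. no hyper-parameters at all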
1458 | mach = machine(surrogate, X, y; predict=yhat) 1459 | 1460 | # Although we have no real need to use it, this machine behaves like 1461 | # you'd expect it to: 1462 | 1463 | Xnew, _ = make_blobs(2, 3) 1464 | fit!(mach) 1465 | predict(mach, Xnew) 1466 | 1467 | #- 1468 | 1469 | # Now we create a new model type using a Julia `struct` definition 1470 | # appropriately decorated: 1471 | 1472 | @from_network mach begin 1473 | mutable struct YourPipe 1474 | standardizer = stand 1475 | classifier = linear::Probabilistic 1476 | end 1477 | end 1478 | 1479 | # Instantiating and evaluating on some new data: 1480 | 1481 | pipe = YourPipe() 1482 | X, y = @load_iris; # built-in data set 1483 | mach = machine(pipe, X, y) 1484 | evaluate!(mach, measure=misclassification_rate, operation=predict_mode) 1485 | 1486 | 1487 | # ### A composite model to average two regressor predictors 1488 | 1489 | # The following is condensed version of 1490 | # [this](https://github.com/alan-turing-institute/MLJ.jl/blob/master/binder/MLJ_demo.ipynb) 1491 | # tutorial. We will define a composite model that: 1492 | 1493 | # - standardizes the input data 1494 | 1495 | # - learns and applies a Box-Cox transformation to the target variable 1496 | 1497 | # - blends the predictions of two supervised learning models - a ridge 1498 | # regressor and a random forest regressor; we'll blend using a simple 1499 | # average (for a more sophisticated stacking example, see 1500 | # [here](https://juliaai.github.io/DataScienceTutorials.jl/getting-started/stacking/)) 1501 | 1502 | # - applies the *inverse* Box-Cox transformation to this blended prediction 1503 | 1504 | RandomForestRegressor = @load RandomForestRegressor pkg=DecisionTree 1505 | 1506 | # **Input layer** 1507 | 1508 | X = source() 1509 | y = source() 1510 | 1511 | # **First layer and target transformation** 1512 | 1513 | std_model = Standardizer() 1514 | stand = machine(std_model, X) 1515 | W = MLJ.transform(stand, X) 1516 | 1517 | box_model = UnivariateBoxCoxTransformer() 1518 | box = machine(box_model, y) 1519 | z = MLJ.transform(box, y) 1520 | 1521 | # **Second layer** 1522 | 1523 | ridge_model = RidgeRegressor(lambda=0.1) 1524 | ridge = machine(ridge_model, W, z) 1525 | 1526 | forest_model = RandomForestRegressor(n_trees=50) 1527 | forest = machine(forest_model, W, z) 1528 | 1529 | ẑ = 0.5*predict(ridge, W) + 0.5*predict(forest, W) 1530 | 1531 | # **Output** 1532 | 1533 | ŷ = inverse_transform(box, ẑ) 1534 | 1535 | # With the learning network defined, we're ready to export: 1536 | 1537 | @from_network machine(Deterministic(), X, y, predict=ŷ) begin 1538 | mutable struct CompositeModel 1539 | rgs1 = ridge_model 1540 | rgs2 = forest_model 1541 | end 1542 | end 1543 | 1544 | # Let's instantiate the new model type and try it out on some data: 1545 | 1546 | composite = CompositeModel() 1547 | 1548 | #- 1549 | 1550 | X, y = @load_boston; 1551 | mach = machine(composite, X, y); 1552 | evaluate!(mach, 1553 | resampling=CV(nfolds=6, shuffle=true), 1554 | measures=[rms, mae]) 1555 | 1556 | 1557 | # ### Resources for Part 5 1558 | # 1559 | # - From the MLJ manual: 1560 | # - [Learning Networks](https://alan-turing-institute.github.io/MLJ.jl/stable/composing_models/#Learning-Networks-1) 1561 | # - From Data Science Tutorials: 1562 | # - [Learning Networks](https://juliaai.github.io/DataScienceTutorials.jl/getting-started/learning-networks/) 1563 | # - [Learning Networks 2](https://juliaai.github.io/DataScienceTutorials.jl/getting-started/learning-networks-2/) 1564 | 1565 | # - 
[Stacking](https://juliaai.github.io/DataScienceTutorials.jl/getting-started/stacking/): an advanced example of model composition 1566 | 1567 | # - [Finer Control](https://alan-turing-institute.github.io/MLJ.jl/dev/composing_models/#Method-II:-Finer-control-(advanced)-1): 1568 | # exporting learning networks without a macro for finer control 1569 | 1570 | # 1571 | 1572 | 1573 | # ## Solutions to exercises 1574 | 1575 | # #### Exercise 2 solution 1576 | 1577 | quality = coerce(quality, OrderedFactor); 1578 | levels!(quality, ["poor", "good", "excellent"]); 1579 | elscitype(quality) 1580 | 1581 | 1582 | # #### Exercise 3 solution 1583 | 1584 | # First pass: 1585 | 1586 | coerce!(house, autotype(house)); 1587 | schema(house) 1588 | 1589 | #- 1590 | 1591 | # All the "sqft" fields refer to "square feet" so are 1592 | # really `Continuous`. We'll regard `:yr_built` (the other `Count` 1593 | # variable above) as `Continuous` as well. So: 1594 | 1595 | coerce!(house, Count => Continuous); 1596 | 1597 | # And `:zipcode` should not be ordered: 1598 | 1599 | coerce!(house, :zipcode => Multiclass); 1600 | schema(house) 1601 | 1602 | # `:bathrooms` looks like it has a lot of levels, but on further 1603 | # inspection we see why, and `OrderedFactor` remains appropriate: 1604 | 1605 | import StatsBase.countmap 1606 | countmap(house.bathrooms) 1607 | 1608 | 1609 | # #### Exercise 4 solution 1610 | 1611 | # 4(a) 1612 | 1613 | # There are *no* models that apply immediately: 1614 | 1615 | models(matching(X4, y4)) 1616 | 1617 | # 4(b) 1618 | 1619 | y4 = coerce(y4, Continuous); 1620 | models(matching(X4, y4)) 1621 | 1622 | 1623 | # #### Exercise 6 solution 1624 | 1625 | # 6(a) 1626 | 1627 | y, X = unpack(horse, 1628 | ==(:outcome), 1629 | name -> elscitype(Tables.getcolumn(horse, name)) == Continuous); 1630 | 1631 | # 6(b)(i) 1632 | 1633 | model = (@load LogisticClassifier pkg=MLJLinearModels)(); 1634 | model.lambda = 100 1635 | mach = machine(model, X, y) 1636 | fit!(mach, rows=train) 1637 | fitted_params(mach) 1638 | 1639 | #- 1640 | 1641 | coefs_given_feature = Dict(fitted_params(mach).coefs) 1642 | coefs_given_feature[:pulse] 1643 | 1644 | #6(b)(ii) 1645 | 1646 | yhat = predict(mach, rows=test); # or predict(mach, X[test,:]) 1647 | err = cross_entropy(yhat, y[test]) |> mean 1648 | 1649 | # 6(b)(iii) 1650 | 1651 | # The predicted probabilities of the actual observations in the test 1652 | # are given by 1653 | 1654 | p = broadcast(pdf, yhat, y[test]); 1655 | 1656 | # The number of times this probability exceeds 50% is: 1657 | n50 = filter(x -> x > 0.5, p) |> length 1658 | 1659 | # Or, as a proportion: 1660 | 1661 | n50/length(test) 1662 | 1663 | # 6(b)(iv) 1664 | 1665 | misclassification_rate(mode.(yhat), y[test]) 1666 | 1667 | # 6(c)(i) 1668 | 1669 | model = (@load RandomForestClassifier pkg=DecisionTree)() 1670 | mach = machine(model, X, y) 1671 | evaluate!(mach, resampling=CV(nfolds=6), measure=cross_entropy) 1672 | 1673 | r = range(model, :n_trees, lower=10, upper=70, scale=:log10) 1674 | 1675 | # Since random forests are inherently randomized, we generate multiple 1676 | # curves: 1677 | 1678 | plt = plot() 1679 | for i in 1:4 1680 | one_curve = learning_curve(mach, 1681 | range=r, 1682 | resampling=Holdout(), 1683 | measure=cross_entropy) 1684 | plot!(one_curve.parameter_values, one_curve.measurements) 1685 | end 1686 | xlabel!(plt, "n_trees") 1687 | ylabel!(plt, "cross entropy") 1688 | savefig("exercise_6ci.png") 1689 | plt #!md 1690 | 1691 | # ![](exercise_6ci.png) #md 1692 | 1693 | 1694 | # 
6(c)(ii) 1695 | 1696 | evaluate!(mach, resampling=CV(nfolds=9), 1697 | measure=cross_entropy, 1698 | rows=train).measurement[1] 1699 | 1700 | model.n_trees = 90 1701 | 1702 | # 6(c)(iii) 1703 | 1704 | err_forest = evaluate!(mach, resampling=Holdout(), 1705 | measure=cross_entropy).measurement[1] 1706 | 1707 | # #### Exercise 7 solution 1708 | 1709 | # (a) 1710 | 1711 | KMeans = @load KMeans pkg=Clustering 1712 | EvoTreeClassifier = @load EvoTreeClassifier 1713 | pipe = @pipeline(Standardizer, 1714 | ContinuousEncoder, 1715 | KMeans(k=10), 1716 | EvoTreeClassifier(nrounds=50)) 1717 | 1718 | # (b) 1719 | 1720 | mach = machine(pipe, X, y) 1721 | evaluate!(mach, resampling=CV(nfolds=6), measure=cross_entropy) 1722 | 1723 | # (c) 1724 | 1725 | r = range(pipe, :(evo_tree_classifier.max_depth), lower=1, upper=10) 1726 | 1727 | curve = learning_curve(mach, 1728 | range=r, 1729 | resampling=CV(nfolds=6), 1730 | measure=cross_entropy) 1731 | 1732 | plt = plot(curve.parameter_values, curve.measurements) 1733 | xlabel!(plt, "max_depth") 1734 | ylabel!(plt, "CV estimate of cross entropy") 1735 | savefig("exercise_7c.png") 1736 | plt #!md 1737 | 1738 | # ![](exercise_7c.png) #md 1739 | 1740 | # Here's a second curve using a different random seed for the booster: 1741 | 1742 | using Random 1743 | pipe.evo_tree_classifier.rng = MersenneTwister(123) 1744 | curve = learning_curve(mach, 1745 | range=r, 1746 | resampling=CV(nfolds=6), 1747 | measure=cross_entropy) 1748 | plot!(curve.parameter_values, curve.measurements) 1749 | savefig("exercise_7c_2.png") 1750 | plt #!md 1751 | 1752 | # ![](exercise_7c_2.png) #md 1753 | 1754 | # One can automate the production of multiple curves with different 1755 | # seeds in the following way: 1756 | curves = learning_curve(mach, 1757 | range=r, 1758 | resampling=CV(nfolds=6), 1759 | measure=cross_entropy, 1760 | rng_name=:(evo_tree_classifier.rng), 1761 | rngs=6) # a list of RNGs, or a number to auto-generate 1762 | plt = plot(curves.parameter_values, curves.measurements) 1763 | savefig("exercise_7c_3.png") 1764 | plt #!md 1765 | 1766 | # ![](exercise_7c_3.png) #md 1767 | 1768 | # If you have multiple threads available in your Julia session, you 1769 | # can add the option `acceleration=CPUThreads()` to speed up this 1770 | # computation. 1771 | 1772 | # #### Exercise 8 solution 1773 | 1774 | y, X = unpack(house, ==(:price), name -> true, rng=123); 1775 | 1776 | EvoTreeRegressor = @load EvoTreeRegressor 1777 | tree_booster = EvoTreeRegressor(nrounds = 70) 1778 | model = @pipeline ContinuousEncoder tree_booster 1779 | 1780 | # (a) 1781 | 1782 | r1 = range(model, :(evo_tree_regressor.max_depth), lower=1, upper=12) 1783 | 1784 | # (c) 1785 | 1786 | tuned_model = TunedModel(model=model, 1787 | ranges=[r1, r2], 1788 | resampling=Holdout(), 1789 | measures=mae, 1790 | tuning=RandomSearch(rng=123), 1791 | n=40) 1792 | 1793 | tuned_mach = machine(tuned_model, X, y) |> fit! 
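# (Added aside, not part of the original solution: assuming the report of a
# `TunedModel` machine exposes `best_model` and `history` fields, as in recent
# MLJTuning releases, the search results can also be inspected directly before
# plotting:)

rep8 = report(tuned_mach);
rep8.best_model.evo_tree_regressor.max_depth   # best depth found (hypothetical inspection)
rep8.history[1:2]                              # first two evaluated hyper-parameter combinations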
1794 | plt = plot(tuned_mach) 1795 | savefig("exercise_8c.png") 1796 | plt #!md 1797 | 1798 | # ![](exercise_8c.png) #md 1799 | 1800 | # (d) 1801 | 1802 | best_model = report(tuned_mach).best_model; 1803 | best_mach = machine(best_model, X, y); 1804 | best_err = evaluate!(best_mach, resampling=CV(nfolds=3), measure=mae) 1805 | 1806 | #- 1807 | 1808 | tuned_err = evaluate!(tuned_mach, resampling=CV(nfolds=3), measure=mae) 1809 | 1810 | 1811 | using Literate #src 1812 | Literate.markdown(@__FILE__, DIR, execute=true) #src 1813 | Literate.notebook(@__FILE__, DIR, execute=false) #src 1814 | -------------------------------------------------------------------------------- /vecstack.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ablaom/MachineLearningInJulia2020/552f98fbf012475d67cd29a72448ac7c476ea2c7/vecstack.png -------------------------------------------------------------------------------- /wow.jl: -------------------------------------------------------------------------------- 1 | # # State-of-the-art model composition in MLJ (Machine Learning in Julia) 2 | 3 | # In this script we use model stacking to demonstrate the ease with 4 | # which machine learning models can be combined in sophisticated ways 5 | # using MLJ. In practice, one would use MLJ's [canned stacking model 6 | # constructor](https://alan-turing-institute.github.io/MLJ.jl/dev/model_stacking/#Model-Stacking) 7 | # `Stack`. Here, however, we give a quick demonstation how you would 8 | # build a stack yourself, using MLJ's generic model composition 9 | # syntax, which is an extension of the normal fit/predict syntax. 10 | 11 | # For a more leisurely notebook on the same material, see 12 | # [this](https://juliaai.github.io/DataScienceTutorials.jl/getting-started/stacking/) 13 | # tutorial. 14 | 15 | 16 | DIR = @__DIR__ 17 | include(joinpath(DIR, "setup.jl")) 18 | 19 | # ## Stacking is hard 20 | 21 | # [Model 22 | # stacking](https://alan-turing-institute.github.io/DataScienceTutorials.jl/getting-started/stacking/), 23 | # popular in Kaggle data science competitions, is a sophisticated way 24 | # to blend the predictions of multiple models. 25 | 26 | # With the python toolbox 27 | # [scikit-learn](https://scikit-learn.org/stable/) (or its [julia 28 | # wrap](https://github.com/cstjean/ScikitLearn.jl)) you can use 29 | # pipelines to combine composite models in simple ways but (automated) 30 | # stacking is beyond its capabilities. 31 | 32 | # One python alternative is to use 33 | # [vecstack](https://github.com/vecxoz/vecstack). The [core 34 | # algorithm](https://github.com/vecxoz/vecstack/blob/master/vecstack/core.py) 35 | # is about eight pages (without the scikit-learn interface): 36 | 37 | # ![](vecstack.png). 38 | 39 | # ## Stacking is easy (in MLJ) 40 | 41 | # Using MLJ's [generic model composition 42 | # API](https://alan-turing-institute.github.io/MLJ.jl/dev/composing_models/) 43 | # you can build a stack in about a page. 44 | 45 | # Here's the complete code needed to define a new model type that 46 | # stacks two base regressors and one adjudicator in MLJ. Here we use 47 | # three folds to create the base-learner [out-of-sample 48 | # predictions](https://alan-turing-institute.github.io/DataScienceTutorials.jl/getting-started/stacking/) 49 | # to make it easier to read. You can make this generic with little fuss. 
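# (Added aside, a rough sketch not in the original script, using the names
# defined in the code below: "making it generic" essentially means replacing the
# hand-written per-fold machines and out-of-sample predictions with a
# comprehension over `1:nfolds`, roughly
#
#     machs  = [machine(model1, corestrict(X, f, i), corestrict(y, f, i)) for i in 1:nfolds]
#     y1_oos = vcat([predict(machs[i], restrict(X, f, i)) for i in 1:nfolds]...)
#
# with the two base models, the judge and `nfolds` passed in as parameters.)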
50 | 51 | using MLJ 52 | 53 | folds(data, nfolds) = 54 | partition(1:nrows(data), (1/nfolds for i in 1:(nfolds-1))...); 55 | 56 | # these models are only going to be default choices for the stack: 57 | 58 | LinearRegressor = @load LinearRegressor pkg=MLJLinearModels 59 | model1 = LinearRegressor() 60 | model2 = LinearRegressor() 61 | judge = LinearRegressor() 62 | 63 | X = source() 64 | y = source() 65 | 66 | folds(X::AbstractNode, nfolds) = node(XX->folds(XX, nfolds), X) 67 | MLJ.restrict(X::AbstractNode, f::AbstractNode, i) = 68 | node((XX, ff) -> restrict(XX, ff, i), X, f); 69 | MLJ.corestrict(X::AbstractNode, f::AbstractNode, i) = 70 | node((XX, ff) -> corestrict(XX, ff, i), X, f); 71 | 72 | f = folds(X, 3) 73 | 74 | m11 = machine(model1, corestrict(X, f, 1), corestrict(y, f, 1)) 75 | m12 = machine(model1, corestrict(X, f, 2), corestrict(y, f, 2)) 76 | m13 = machine(model1, corestrict(X, f, 3), corestrict(y, f, 3)) 77 | 78 | y11 = predict(m11, restrict(X, f, 1)); 79 | y12 = predict(m12, restrict(X, f, 2)); 80 | y13 = predict(m13, restrict(X, f, 3)); 81 | 82 | m21 = machine(model2, corestrict(X, f, 1), corestrict(y, f, 1)) 83 | m22 = machine(model2, corestrict(X, f, 2), corestrict(y, f, 2)) 84 | m23 = machine(model2, corestrict(X, f, 3), corestrict(y, f, 3)) 85 | 86 | y21 = predict(m21, restrict(X, f, 1)); 87 | y22 = predict(m22, restrict(X, f, 2)); 88 | y23 = predict(m23, restrict(X, f, 3)); 89 | 90 | y1_oos = vcat(y11, y12, y13); 91 | y2_oos = vcat(y21, y22, y23); 92 | 93 | X_oos = MLJ.table(hcat(y1_oos, y2_oos)) 94 | 95 | m_judge = machine(judge, X_oos, y) 96 | 97 | m1 = machine(model1, X, y) 98 | m2 = machine(model2, X, y) 99 | 100 | y1 = predict(m1, X); 101 | y2 = predict(m2, X); 102 | 103 | X_judge = MLJ.table(hcat(y1, y2)) 104 | yhat = predict(m_judge, X_judge) 105 | 106 | @from_network machine(Deterministic(), X, y; predict=yhat) begin 107 | mutable struct MyStack 108 | regressor1=model1 109 | regressor2=model2 110 | judge=judge 111 | end 112 | end 113 | 114 | my_stack = MyStack() 115 | 116 | # For the curious: Only the last block defines the new model type. The 117 | # rest defines a *[learning network]()* - a kind of working prototype 118 | # or blueprint for the type. If the source nodes `X` and `y` wrap some 119 | # data (instead of nothing) then the network can be trained and tested 120 | # as you build it. 121 | 122 | 123 | # ## Composition plays well with other work-flows 124 | 125 | # We did not include standardization of inputs and target (with 126 | # post-prediction inversion) in our stack. However, we can add these 127 | # now, using MLJ's canned pipeline composition: 128 | 129 | pipe = @pipeline Standardizer my_stack target=Standardizer 130 | 131 | # Want to change a base learner and adjudicator? 132 | 133 | DecisionTreeRegressor = @load DecisionTreeRegressor pkg=DecisionTree; 134 | KNNRegressor = @load KNNRegressor; 135 | pipe.my_stack.regressor2 = DecisionTreeRegressor() 136 | pipe.my_stack.judge = KNNRegressor(); 137 | 138 | # Want a CV estimate of performance of the complete model on some data? 139 | 140 | X, y = @load_boston; 141 | mach = machine(pipe, X, y) 142 | evaluate!(mach, resampling=CV(), measure=mae) 143 | 144 | # Want to inspect the learned parameters of the adjudicator? 145 | 146 | fp = fitted_params(mach); 147 | fp.my_stack.judge 148 | 149 | # What about the first base-learner of the stack? There are four sets 150 | # of learned parameters! 
One for each fold to make an out-of-sample 151 | # prediction, and one trained on all the data: 152 | 153 | fp.my_stack.regressor1 154 | 155 | #- 156 | 157 | fp.my_stack.regressor1[1].coefs 158 | 159 | # Want to tune multiple (nested) hyperparameters in the stack? Tuning is a 160 | # model wrapper (for better composition!): 161 | 162 | r1 = range(pipe, :(my_stack.regressor2.max_depth), lower = 1, upper = 25, scale=:linear) 163 | r2 = range(pipe, :(my_stack.judge.K), lower=1, origin=10, unit=10, scale=:log10) 164 | 165 | import Distributions.Poisson 166 | 167 | tuned_pipe = TunedModel(model=pipe, 168 | ranges=[r1, (r2, Poisson)], 169 | tuning=RandomSearch(), 170 | resampling=CV(), 171 | measure=rms, 172 | n=100) 173 | mach = machine(tuned_pipe, X, y) |> fit! 174 | best_model = fitted_params(mach).best_model 175 | K = fitted_params(mach).best_model.my_stack.judge.K; 176 | max_depth = fitted_params(mach).best_model.my_stack.regressor2.max_depth 177 | @show K max_depth; 178 | 179 | # Visualize tuning results: 180 | 181 | using Plots 182 | gr(size=(700,700*(sqrt(5) - 1)/2)) 183 | plt = plot(mach) 184 | savefig("stacking.png") 185 | plt #!md 186 | 187 | # ![](stacking.png) 188 | 189 | using Literate #src 190 | Literate.markdown(@__FILE__, @__DIR__, execute=false) #src 191 | Literate.notebook(@__FILE__, @__DIR__, execute=true) #src 192 | -------------------------------------------------------------------------------- /wow.md: -------------------------------------------------------------------------------- 1 | ```@meta 2 | EditURL = "/wow.jl" 3 | ``` 4 | 5 | # State-of-the-art model composition in MLJ (Machine Learning in Julia) 6 | 7 | In this script we use model stacking to demonstrate the ease with 8 | which machine learning models can be combined in sophisticated ways 9 | using MLJ. In practice, one would use MLJ's [canned stacking model 10 | constructor](https://alan-turing-institute.github.io/MLJ.jl/dev/model_stacking/#Model-Stacking) 11 | `Stack`. Here, however, we give a quick demonstation how you would 12 | build a stack yourself, using MLJ's generic model composition 13 | syntax, which is an extension of the normal fit/predict syntax. 14 | 15 | For a more leisurely notebook on the same material, see 16 | [this](https://juliaai.github.io/DataScienceTutorials.jl/getting-started/stacking/) 17 | tutorial. 18 | 19 | ````@example wow 20 | DIR = @__DIR__ 21 | include(joinpath(DIR, "setup.jl")) 22 | ```` 23 | 24 | ## Stacking is hard 25 | 26 | [Model 27 | stacking](https://alan-turing-institute.github.io/DataScienceTutorials.jl/getting-started/stacking/), 28 | popular in Kaggle data science competitions, is a sophisticated way 29 | to blend the predictions of multiple models. 30 | 31 | With the python toolbox 32 | [scikit-learn](https://scikit-learn.org/stable/) (or its [julia 33 | wrap](https://github.com/cstjean/ScikitLearn.jl)) you can use 34 | pipelines to combine composite models in simple ways but (automated) 35 | stacking is beyond its capabilities. 36 | 37 | One python alternative is to use 38 | [vecstack](https://github.com/vecxoz/vecstack). The [core 39 | algorithm](https://github.com/vecxoz/vecstack/blob/master/vecstack/core.py) 40 | is about eight pages (without the scikit-learn interface): 41 | 42 | ![](vecstack.png). 43 | 44 | ## Stacking is easy (in MLJ) 45 | 46 | Using MLJ's [generic model composition 47 | API](https://alan-turing-institute.github.io/MLJ.jl/dev/composing_models/) 48 | you can build a stack in about a page. 
49 | 50 | Here's the complete code needed to define a new model type that 51 | stacks two base regressors and one adjudicator in MLJ. Here we use 52 | three folds to create the base-learner [out-of-sample 53 | predictions](https://alan-turing-institute.github.io/DataScienceTutorials.jl/getting-started/stacking/) 54 | to make it easier to read. You can make this generic with little fuss. 55 | 56 | ````@example wow 57 | using MLJ 58 | 59 | folds(data, nfolds) = 60 | partition(1:nrows(data), (1/nfolds for i in 1:(nfolds-1))...); 61 | nothing #hide 62 | ```` 63 | 64 | these models are only going to be default choices for the stack: 65 | 66 | ````@example wow 67 | LinearRegressor = @load LinearRegressor pkg=MLJLinearModels 68 | model1 = LinearRegressor() 69 | model2 = LinearRegressor() 70 | judge = LinearRegressor() 71 | 72 | X = source() 73 | y = source() 74 | 75 | folds(X::AbstractNode, nfolds) = node(XX->folds(XX, nfolds), X) 76 | MLJ.restrict(X::AbstractNode, f::AbstractNode, i) = 77 | node((XX, ff) -> restrict(XX, ff, i), X, f); 78 | MLJ.corestrict(X::AbstractNode, f::AbstractNode, i) = 79 | node((XX, ff) -> corestrict(XX, ff, i), X, f); 80 | 81 | f = folds(X, 3) 82 | 83 | m11 = machine(model1, corestrict(X, f, 1), corestrict(y, f, 1)) 84 | m12 = machine(model1, corestrict(X, f, 2), corestrict(y, f, 2)) 85 | m13 = machine(model1, corestrict(X, f, 3), corestrict(y, f, 3)) 86 | 87 | y11 = predict(m11, restrict(X, f, 1)); 88 | y12 = predict(m12, restrict(X, f, 2)); 89 | y13 = predict(m13, restrict(X, f, 3)); 90 | 91 | m21 = machine(model2, corestrict(X, f, 1), corestrict(y, f, 1)) 92 | m22 = machine(model2, corestrict(X, f, 2), corestrict(y, f, 2)) 93 | m23 = machine(model2, corestrict(X, f, 3), corestrict(y, f, 3)) 94 | 95 | y21 = predict(m21, restrict(X, f, 1)); 96 | y22 = predict(m22, restrict(X, f, 2)); 97 | y23 = predict(m23, restrict(X, f, 3)); 98 | 99 | y1_oos = vcat(y11, y12, y13); 100 | y2_oos = vcat(y21, y22, y23); 101 | 102 | X_oos = MLJ.table(hcat(y1_oos, y2_oos)) 103 | 104 | m_judge = machine(judge, X_oos, y) 105 | 106 | m1 = machine(model1, X, y) 107 | m2 = machine(model2, X, y) 108 | 109 | y1 = predict(m1, X); 110 | y2 = predict(m2, X); 111 | 112 | X_judge = MLJ.table(hcat(y1, y2)) 113 | yhat = predict(m_judge, X_judge) 114 | 115 | @from_network machine(Deterministic(), X, y; predict=yhat) begin 116 | mutable struct MyStack 117 | regressor1=model1 118 | regressor2=model2 119 | judge=judge 120 | end 121 | end 122 | 123 | my_stack = MyStack() 124 | ```` 125 | 126 | For the curious: Only the last block defines the new model type. The 127 | rest defines a *[learning network]()* - a kind of working prototype 128 | or blueprint for the type. If the source nodes `X` and `y` wrap some 129 | data (instead of nothing) then the network can be trained and tested 130 | as you build it. 131 | 132 | ## Composition plays well with other work-flows 133 | 134 | We did not include standardization of inputs and target (with 135 | post-prediction inversion) in our stack. However, we can add these 136 | now, using MLJ's canned pipeline composition: 137 | 138 | ````@example wow 139 | pipe = @pipeline Standardizer my_stack target=Standardizer 140 | ```` 141 | 142 | Want to change a base learner and adjudicator? 
143 | 144 | ````@example wow 145 | DecisionTreeRegressor = @load DecisionTreeRegressor pkg=DecisionTree; 146 | KNNRegressor = @load KNNRegressor; 147 | pipe.my_stack.regressor2 = DecisionTreeRegressor() 148 | pipe.my_stack.judge = KNNRegressor(); 149 | nothing #hide 150 | ```` 151 | 152 | Want a CV estimate of performance of the complete model on some data? 153 | 154 | ````@example wow 155 | X, y = @load_boston; 156 | mach = machine(pipe, X, y) 157 | evaluate!(mach, resampling=CV(), measure=mae) 158 | ```` 159 | 160 | Want to inspect the learned parameters of the adjudicator? 161 | 162 | ````@example wow 163 | fp = fitted_params(mach); 164 | fp.my_stack.judge 165 | ```` 166 | 167 | What about the first base-learner of the stack? There are four sets 168 | of learned parameters! One for each fold to make an out-of-sample 169 | prediction, and one trained on all the data: 170 | 171 | ````@example wow 172 | fp.my_stack.regressor1 173 | ```` 174 | 175 | ````@example wow 176 | fp.my_stack.regressor1[1].coefs 177 | ```` 178 | 179 | Want to tune multiple (nested) hyperparameters in the stack? Tuning is a 180 | model wrapper (for better composition!): 181 | 182 | ````@example wow 183 | r1 = range(pipe, :(my_stack.regressor2.max_depth), lower = 1, upper = 25, scale=:linear) 184 | r2 = range(pipe, :(my_stack.judge.K), lower=1, origin=10, unit=10, scale=:log10) 185 | 186 | import Distributions.Poisson 187 | 188 | tuned_pipe = TunedModel(model=pipe, 189 | ranges=[r1, (r2, Poisson)], 190 | tuning=RandomSearch(), 191 | resampling=CV(), 192 | measure=rms, 193 | n=100) 194 | mach = machine(tuned_pipe, X, y) |> fit! 195 | best_model = fitted_params(mach).best_model 196 | K = fitted_params(mach).best_model.my_stack.judge.K; 197 | max_depth = fitted_params(mach).best_model.my_stack.regressor2.max_depth 198 | @show K max_depth; 199 | nothing #hide 200 | ```` 201 | 202 | Visualize tuning results: 203 | 204 | ````@example wow 205 | using Plots 206 | gr(size=(700,700*(sqrt(5) - 1)/2)) 207 | plt = plot(mach) 208 | savefig("stacking.png") 209 | ```` 210 | 211 | ![](stacking.png) 212 | 213 | --- 214 | 215 | *This page was generated using [Literate.jl](https://github.com/fredrikekre/Literate.jl).* 216 | 217 | --------------------------------------------------------------------------------