├── .gitignore
├── LICENSE
├── README.md
├── ani_kmeans.py
├── environment.yml
├── images
│   └── computable_numbers.jpg
├── matplotlibrc
├── ml
│   ├── __init__.py
│   ├── images
│   │   ├── alice.jpg
│   │   ├── cnn.gif
│   │   ├── confusion_matrix.png
│   │   ├── cv.png
│   │   ├── dominik.png
│   │   ├── graph_vis_animation.gif
│   │   ├── gw.jpg
│   │   ├── icecube.jpg
│   │   ├── jupyter.png
│   │   ├── keras-logo-2018-large-1200.png
│   │   ├── logo.png
│   │   ├── nn.png
│   │   ├── nn_two.png
│   │   ├── nn_wording.png
│   │   ├── nyt_titanic.jpg
│   │   ├── setosa.jpg
│   │   ├── sklearn_citations.png
│   │   ├── titanic-movie.jpg
│   │   ├── versicolor.jpg
│   │   └── virginica.jpg
│   ├── plots.py
│   └── solutions
│       ├── exercise_1.py
│       ├── exercise_2.py
│       ├── exercise_3.py
│       ├── exercise_5.py
│       └── exercise_6.py
├── resources
│   ├── event.mp4
│   ├── lstsubarray_stereo.mp4
│   ├── muon_data.txt
│   └── spalt.csv
├── smd_boosting.ipynb
├── smd_handson_fitting.ipynb
├── smd_neural_networks.ipynb
├── smd_neural_networks_keras.ipynb
├── smd_neural_networks_torch.ipynb
├── smd_pca.ipynb
├── smd_supervised.ipynb
├── smd_unsupervised.ipynb
├── titanic.xls
└── titanic_train.csv
/.gitignore:
--------------------------------------------------------------------------------
1 | # Byte-compiled / optimized / DLL files
2 | data/
3 | __pycache__/
4 | *.py[cod]
5 | *$py.class
6 |
7 | # C extensions
8 | *.so
9 |
10 | # Distribution / packaging
11 | .Python
12 | env/
13 | build/
14 | develop-eggs/
15 | dist/
16 | downloads/
17 | eggs/
18 | .eggs/
19 | lib/
20 | lib64/
21 | parts/
22 | sdist/
23 | var/
24 | wheels/
25 | *.egg-info/
26 | .installed.cfg
27 | *.egg
28 |
29 | # PyInstaller
30 | # Usually these files are written by a python script from a template
31 | # before PyInstaller builds the exe, so as to inject date/other infos into it.
32 | *.manifest
33 | *.spec
34 |
35 | # Installer logs
36 | pip-log.txt
37 | pip-delete-this-directory.txt
38 |
39 | # Unit test / coverage reports
40 | htmlcov/
41 | .tox/
42 | .coverage
43 | .coverage.*
44 | .cache
45 | nosetests.xml
46 | coverage.xml
47 | *.cover
48 | .hypothesis/
49 |
50 | # Translations
51 | *.mo
52 | *.pot
53 |
54 | # Django stuff:
55 | *.log
56 | local_settings.py
57 |
58 | # Flask stuff:
59 | instance/
60 | .webassets-cache
61 |
62 | # Scrapy stuff:
63 | .scrapy
64 |
65 | # Sphinx documentation
66 | docs/_build/
67 |
68 | # PyBuilder
69 | target/
70 |
71 | # Jupyter Notebook
72 | .ipynb_checkpoints
73 |
74 | # pyenv
75 | .python-version
76 |
77 | # celery beat schedule file
78 | celerybeat-schedule
79 |
80 | # SageMath parsed files
81 | *.sage.py
82 |
83 | # dotenv
84 | .env
85 |
86 | # virtualenv
87 | .venv
88 | venv/
89 | ENV/
90 |
91 | # Spyder project settings
92 | .spyderproject
93 | .spyproject
94 |
95 | # Rope project settings
96 | .ropeproject
97 |
98 | # mkdocs documentation
99 | /site
100 |
101 | # mypy
102 | .mypy_cache/
103 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # A machine learning lecture [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/tudo-astroparticlephysics/machine-learning-lecture/main)
2 |
3 | This collection of notebooks was started for a lecture on machine learning at the Universitat Autònoma de Barcelona.
4 | It has since grown into a large part of the statistical methods lecture (SMD) at the Physics department at TU Dortmund University.
5 | It contains some mathematical derivations and small exercises to play with.
6 |
7 | As of now, you need to execute these notebooks from within the project folder, since they import some plotting functions from the `ml` module.
8 |
9 |
10 | 
11 |
12 | ## License
13 |
14 | The programming code examples in this material are shared under the GNU GPLv3 license.
15 | The lecture material (e.g. the Jupyter notebooks) is shared under the Creative Commons Attribution-NonCommercial License: https://creativecommons.org/licenses/by-nc/4.0/legalcode.txt, so it cannot be used for commercial training / tutorials / lectures.
16 |
17 |
18 | ## Lectures
19 |
20 | 1. Data-Preprocessing and feature selection (smd_pca.ipynb)
21 | 2. Introduction to supervised machine learning (smd_supervised.ipynb, part 1)
22 | 3. Validation, bias-variance tradeoff, ensemble methods (smd_supervised.ipynb, part 2)
23 | 4. Unsupervised learning, clustering (smd_unsupervised.ipynb)
24 | 5. Example on CTAO data and boosting (smd_boosting.ipynb)
25 | 6. Neural Networks (smd_neural_networks.ipynb)
26 |
27 |
28 | # Running the notebooks
29 |
30 |
31 | ## Install `conda`
32 | To make sure all needed packages are installed in a dedicated environment for these lectures, we use
33 | `conda`.
34 |
35 | Download and install [Anaconda](https://www.anaconda.com/products/individual#Downloads) for a large collection of packages or [Miniconda](https://docs.conda.io/en/latest/miniconda.html) for a minimal starting point.
36 |
37 | ## Setup the environment
38 |
39 |
40 | After installing conda, run
41 |
42 | ```
43 | $ conda env create -f environment.yml
44 | ```
45 |
46 | This will create a new conda environment named `ml` that contains all packages
47 | needed for these lectures.
48 |
49 | To use this environment, run
50 | ```
51 | $ conda activate ml
52 | ```
53 | every time before you start working on these lectures.
54 |
55 | From time to time, we will update the `environment.yml` with new versions or
56 | additional packages. To update your environment afterwards, run:
57 | ```
58 | $ conda env update -f environment.yml
59 | ```
60 |
61 |
62 | ## Running the notebooks
63 |
64 | Just run
65 |
66 | ```
67 | $ jupyter notebook
68 | ```
69 | This will open your default browser at the overview page, where you can select each of
70 | the notebooks.
71 |
--------------------------------------------------------------------------------
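A minimal sketch of the local import mentioned in the README above (assuming Python or Jupyter is started from the repository root, so that the local `ml` package is importable; `set_plot_style` is defined in `ml/plots.py`):

```python
# run from the repository root, otherwise the local `ml` package is not found
from ml import plots

plots.set_plot_style()  # matplotlib defaults used throughout the notebooks
```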
/ani_kmeans.py:
--------------------------------------------------------------------------------
1 | from matplotlib.colors import ListedColormap
2 | from sklearn.cluster import KMeans
3 | from sklearn.datasets import make_blobs
4 | import matplotlib.pyplot as plt
5 | from matplotlib.animation import FuncAnimation
6 | from tqdm.auto import tqdm
7 | import numpy as np
8 |
9 |
10 | k = 4
11 | n_iters = 25
12 | discrete_cmap = ListedColormap([f'C{i}' for i in range(k)])
13 | fps = 25
14 | interval = 1000 / fps
15 | time_per_iter = 1
16 | frames = n_iters * time_per_iter * fps
17 |
18 | # generate example data to cluster: k gaussian blobs
19 | X, y = make_blobs(
20 | n_samples=500, centers=k, center_box=(-2, 2),
21 | cluster_std=0.5, random_state=1,
22 | )
23 |
24 | fig = plt.figure(figsize=(12.8, 7.2), dpi=100)
25 | ax = fig.add_subplot(1, 1, 1)
26 | ax.set_aspect(1)
27 | ax.set_axis_off()
28 | ax.set_xlim(-4, 4)
29 | ax.set_ylim(-4, 4)
30 |
31 |
32 | init_centers = np.random.uniform(-1, 1, size=[k, 2])
33 | center_history = np.zeros((n_iters, k, 2))
34 |
35 |
36 | center_lines = [ax.plot([], [])[0] for _ in range(k)]
37 | points = ax.scatter(X[:, 0], X[:, 1], c='k', cmap=discrete_cmap, alpha=0.5)
38 | center_plot = ax.scatter(
39 | init_centers[:, 0],
40 | init_centers[:, 1],
41 | c=np.arange(k),
42 | cmap=discrete_cmap,
43 | marker='h',
44 | edgecolor='k',
45 | s=400,
46 | label='cluster center',
47 | )
48 |
49 | ax.legend(loc='upper right')
50 |
51 |
52 | def init():
53 | t = ax.set_title('iteration 0')
54 | return *center_lines, points, t
55 |
56 | def update(frame, bar=None):
57 | if bar is not None:
58 | bar.update(1)
59 |
60 | i = frame // (fps * time_per_iter)
61 | if i > 0:
62 | kmeans = KMeans(n_clusters=k, init=init_centers, max_iter=i + 1, n_init=1)
63 | prediction = kmeans.fit_predict(X)
64 | center_history[i] = kmeans.cluster_centers_
65 | center_plot.set_offsets(kmeans.cluster_centers_)
66 | points.set_array(prediction)
67 | else:
68 | center_history[i] = init_centers
69 |
70 | for j, line in enumerate(center_lines):
71 | line.set_data(center_history[:i + 1, j, 0], center_history[:i + 1, j, 1])
72 |
73 | points.set_cmap(discrete_cmap)
74 | t = ax.set_title('iteration {}'.format(i + 1))
75 |
76 | return *center_lines, points, t
77 |
78 | bar = tqdm(total=frames)
79 | ani = FuncAnimation(fig, update, blit=True, init_func=init, frames=frames, fargs=(bar,), interval=interval)
80 | ani.save("kmeans_clustering.mp4")
81 | ani.pause()
82 | plt.close(fig)
83 |
--------------------------------------------------------------------------------
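The animation script above delegates the clustering itself to `sklearn.cluster.KMeans`. As a reference, here is a hedged NumPy-only sketch of the two alternating steps it visualises (assign each point to the nearest centre, then move each centre to the mean of its points); the function name and toy data below are illustrative and not part of the repository:

```python
import numpy as np


def kmeans_step(X, centers):
    """One Lloyd iteration: assign each point to its nearest center,
    then move every center to the mean of its assigned points."""
    # pairwise distances between points and centers, shape (n_samples, k)
    distances = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
    labels = distances.argmin(axis=1)

    new_centers = np.array([
        X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
        for j in range(len(centers))
    ])
    return labels, new_centers


# toy usage mirroring the script: 4 clusters, random initial centers
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2))
centers = rng.uniform(-1, 1, size=(4, 2))
for _ in range(25):
    labels, centers = kmeans_step(X, centers)
```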
/environment.yml:
--------------------------------------------------------------------------------
1 | name: ml
2 |
3 | channels:
4 | - conda-forge
5 |
6 | dependencies:
7 | - python=3.12
8 | - click=8
9 | - ctapipe=0.21
10 | - graphviz=2
11 | - h5py
12 | - iminuit
13 | - ipympl
14 | - ipython
15 | - ipywidgets
16 | - jupyter
17 | - jupyter_contrib_nbextensions
18 | - jupyterlab
19 | - keras
20 | - matplotlib=3.8
21 | - notebook
22 | - numba
23 | - numdifftools
24 | - numpy
25 | - pandas
26 | - pillow
27 | - pip
28 | - pre-commit
29 | - pydotplus
30 | - pytables
31 | - pytest
32 | - python-graphviz
33 | - pytorch=2.1
34 | - pytorch-cpu=2.1
35 | - ruff
36 | - scikit-image
37 | - scikit-learn=1.5
38 | - scipy=1.12
39 | - seaborn
40 | - torchvision
41 | - tqdm
42 | - xlrd
43 | - mrmr_selection
44 |
--------------------------------------------------------------------------------
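A small, hedged sanity check that could be run after `conda activate ml` to confirm the pins above were picked up (the expected versions in the comments follow the pins in this file):

```python
import sys

import matplotlib
import sklearn
import torch

print("python      ", sys.version.split()[0])  # pinned to 3.12
print("matplotlib  ", matplotlib.__version__)  # pinned to 3.8
print("scikit-learn", sklearn.__version__)     # pinned to 1.5
print("pytorch     ", torch.__version__)       # pinned to 2.1
```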
/images/computable_numbers.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tudo-astroparticlephysics/machine-learning-lecture/38bec9a945b85b0d32e472a98a770affb616dfe8/images/computable_numbers.jpg
--------------------------------------------------------------------------------
/matplotlibrc:
--------------------------------------------------------------------------------
1 | image.cmap: inferno
2 | figure.constrained_layout.use: True
3 |
--------------------------------------------------------------------------------
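For reference, the same two settings can also be applied at runtime through `matplotlib.rcParams`; a sketch equivalent to the file above:

```python
import matplotlib.pyplot as plt

# runtime equivalent of the matplotlibrc above
plt.rcParams["image.cmap"] = "inferno"
plt.rcParams["figure.constrained_layout.use"] = True
```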
/ml/__init__.py:
--------------------------------------------------------------------------------
1 |
2 |
--------------------------------------------------------------------------------
/ml/images/alice.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tudo-astroparticlephysics/machine-learning-lecture/38bec9a945b85b0d32e472a98a770affb616dfe8/ml/images/alice.jpg
--------------------------------------------------------------------------------
/ml/images/cnn.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tudo-astroparticlephysics/machine-learning-lecture/38bec9a945b85b0d32e472a98a770affb616dfe8/ml/images/cnn.gif
--------------------------------------------------------------------------------
/ml/images/confusion_matrix.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tudo-astroparticlephysics/machine-learning-lecture/38bec9a945b85b0d32e472a98a770affb616dfe8/ml/images/confusion_matrix.png
--------------------------------------------------------------------------------
/ml/images/cv.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tudo-astroparticlephysics/machine-learning-lecture/38bec9a945b85b0d32e472a98a770affb616dfe8/ml/images/cv.png
--------------------------------------------------------------------------------
/ml/images/dominik.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tudo-astroparticlephysics/machine-learning-lecture/38bec9a945b85b0d32e472a98a770affb616dfe8/ml/images/dominik.png
--------------------------------------------------------------------------------
/ml/images/graph_vis_animation.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tudo-astroparticlephysics/machine-learning-lecture/38bec9a945b85b0d32e472a98a770affb616dfe8/ml/images/graph_vis_animation.gif
--------------------------------------------------------------------------------
/ml/images/gw.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tudo-astroparticlephysics/machine-learning-lecture/38bec9a945b85b0d32e472a98a770affb616dfe8/ml/images/gw.jpg
--------------------------------------------------------------------------------
/ml/images/icecube.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tudo-astroparticlephysics/machine-learning-lecture/38bec9a945b85b0d32e472a98a770affb616dfe8/ml/images/icecube.jpg
--------------------------------------------------------------------------------
/ml/images/jupyter.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tudo-astroparticlephysics/machine-learning-lecture/38bec9a945b85b0d32e472a98a770affb616dfe8/ml/images/jupyter.png
--------------------------------------------------------------------------------
/ml/images/keras-logo-2018-large-1200.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tudo-astroparticlephysics/machine-learning-lecture/38bec9a945b85b0d32e472a98a770affb616dfe8/ml/images/keras-logo-2018-large-1200.png
--------------------------------------------------------------------------------
/ml/images/logo.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tudo-astroparticlephysics/machine-learning-lecture/38bec9a945b85b0d32e472a98a770affb616dfe8/ml/images/logo.png
--------------------------------------------------------------------------------
/ml/images/nn.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tudo-astroparticlephysics/machine-learning-lecture/38bec9a945b85b0d32e472a98a770affb616dfe8/ml/images/nn.png
--------------------------------------------------------------------------------
/ml/images/nn_two.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tudo-astroparticlephysics/machine-learning-lecture/38bec9a945b85b0d32e472a98a770affb616dfe8/ml/images/nn_two.png
--------------------------------------------------------------------------------
/ml/images/nn_wording.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tudo-astroparticlephysics/machine-learning-lecture/38bec9a945b85b0d32e472a98a770affb616dfe8/ml/images/nn_wording.png
--------------------------------------------------------------------------------
/ml/images/nyt_titanic.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tudo-astroparticlephysics/machine-learning-lecture/38bec9a945b85b0d32e472a98a770affb616dfe8/ml/images/nyt_titanic.jpg
--------------------------------------------------------------------------------
/ml/images/setosa.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tudo-astroparticlephysics/machine-learning-lecture/38bec9a945b85b0d32e472a98a770affb616dfe8/ml/images/setosa.jpg
--------------------------------------------------------------------------------
/ml/images/sklearn_citations.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tudo-astroparticlephysics/machine-learning-lecture/38bec9a945b85b0d32e472a98a770affb616dfe8/ml/images/sklearn_citations.png
--------------------------------------------------------------------------------
/ml/images/titanic-movie.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tudo-astroparticlephysics/machine-learning-lecture/38bec9a945b85b0d32e472a98a770affb616dfe8/ml/images/titanic-movie.jpg
--------------------------------------------------------------------------------
/ml/images/versicolor.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tudo-astroparticlephysics/machine-learning-lecture/38bec9a945b85b0d32e472a98a770affb616dfe8/ml/images/versicolor.jpg
--------------------------------------------------------------------------------
/ml/images/virginica.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tudo-astroparticlephysics/machine-learning-lecture/38bec9a945b85b0d32e472a98a770affb616dfe8/ml/images/virginica.jpg
--------------------------------------------------------------------------------
/ml/plots.py:
--------------------------------------------------------------------------------
1 | import matplotlib.pyplot as plt
2 | from sklearn.metrics import accuracy_score
3 | from sklearn.metrics import confusion_matrix
4 | from sklearn import tree
5 | import collections
6 | import seaborn as sns
7 | import numpy as np
8 | import pandas as pd
9 | from matplotlib.colors import ListedColormap, to_hex
10 |
11 |
12 | rng = np.random.default_rng(0)
13 |
14 |
15 | colors = ['xkcd:sky', 'xkcd:grass']
16 | cmap = ListedColormap(colors)
17 |
18 | def create_discrete_colormap(n_classes):
19 | if n_classes == 2:
20 | return cmap.copy()
21 | return ListedColormap([f'C{i}' for i in range(n_classes)])
22 |
23 |
24 | def set_plot_style():
25 | sns.reset_orig()
26 | plt.rcParams["figure.figsize"] = (9.23, 9.23 / 3 * 2)
27 | plt.rcParams["figure.dpi"] = 100
28 | plt.rcParams["figure.max_open_warning"] = 50
29 | plt.rcParams["font.size"] = 14
30 | plt.rcParams["lines.linewidth"] = 2
31 | plt.rcParams["axes.spines.top"] = False
32 | plt.rcParams["axes.spines.right"] = False
33 |
34 |
35 | def twospirals(n_samples, noise=0.5, rng=rng):
36 | """
37 | Returns the two spirals dataset.
38 | """
39 | n = np.sqrt(rng.uniform(size=(n_samples, 1))) * 360 * (2 * np.pi) / 360
40 | d1x = -np.cos(n) * n + rng.uniform(size=(n_samples, 1)) * noise
41 | d1y = np.sin(n) * n + rng.uniform(size=(n_samples, 1)) * noise
42 | return (
43 | np.vstack((np.hstack((d1x, d1y)), np.hstack((-d1x, -d1y)))),
44 | np.hstack((np.zeros(n_samples), np.ones(n_samples))),
45 | )
46 |
47 |
48 | def draw_linear_regression_function(reg, ax=None, **kwargs):
49 | if not ax:
50 | ax = plt.gca()
51 |
52 | if reg.coef_.ndim > 1:
53 | b_1, b_2 = reg.coef_[0, :]
54 | else:
55 | b_1, b_2 = reg.coef_
56 |
57 | b_0 = reg.intercept_
58 |
59 | # solve the function y = b_0 + b_1*X_1 + b_2 * X_2 for X2
60 | x_low, x_high = ax.get_xlim()
61 | x1s = np.linspace(x_low, x_high)
62 | x2s = (0.5 - b_0 - b_1 * x1s) / b_2
63 |
64 | ax.plot(x1s, x2s, **kwargs)
65 |
66 |
67 | def plot_3d_views(X, y, cmap=cmap):
68 | from mpl_toolkits.mplot3d import Axes3D # noqa
69 |
70 | fig, axs = plt.subplots(2, 2, subplot_kw={'projection': '3d'}, constrained_layout=False)
71 |
72 | for ax in axs.ravel():
73 | ax.scatter(X[:, 0], X[:, 1], X[:, 2], c=y, cmap=cmap, lw=0)
74 | ax.set_xlabel("X1")
75 | ax.set_ylabel("X2")
76 | ax.set_zlabel("X3")
77 | ax.set_xticklabels([])
78 | ax.set_yticklabels([])
79 | ax.set_zticklabels([])
80 |
81 | axs[0, 1].view_init(0, 0)
82 | axs[1, 0].view_init(0, 90)
83 | axs[1, 1].view_init(90, 0)
84 | fig.subplots_adjust(wspace=0.005, hspace=0.005)
85 |
86 | def draw_tree(clf):
87 | import pydotplus
88 |
89 | d = tree.export_graphviz(clf, out_file=None, filled=True)
90 | graph = pydotplus.graph_from_dot_data(d)
91 |
92 | edges = collections.defaultdict(list)
93 |
94 | for edge in graph.get_edge_list():
95 | edges[edge.get_source()].append(int(edge.get_destination()))
96 |
97 | for edge in edges:
98 | edges[edge].sort()
99 | for i in range(2):
100 | dest = graph.get_node(str(edges[edge][i]))[0]
101 | dest.set_fillcolor(to_hex(colors[i]))
102 |
103 | return graph.create(format="png")
104 |
105 |
106 | def draw_svm_decision_function(clf, ax=None, **kwargs):
107 | if not ax:
108 | ax = plt.gca()
109 |
110 | x_low, x_high = ax.get_xlim()
111 | y_low, y_high = ax.get_ylim()
112 | x1 = np.linspace(x_low, x_high, 40)
113 | x2 = np.linspace(y_low, y_high, 40)
114 |
115 | X1, X2 = np.meshgrid(x1, x2)
116 | xy = np.vstack([X1.ravel(), X2.ravel()]).T
117 | # get the separating hyperplane
118 | Z = clf.decision_function(xy).reshape(X1.shape)
119 |
120 | # plot decision boundary and margins
121 | label = kwargs.pop("label", "Decision Boundary")
122 | cs = ax.contour(
123 | X1, X2, Z, levels=[-1.0, 0, 1.0], linestyles=["--", "-", "--"], **kwargs
124 | )
125 | cs.collections[0].set_label(label)
126 | plt.axis("off")
127 |
128 |
129 | def draw_decision_boundaries(knn, ax=None, cmap="winter", alpha=0.07, **kwargs):
130 | if not ax:
131 | ax = plt.gca()
132 |
133 | x_low, x_high = ax.get_xlim()
134 | y_low, y_high = ax.get_ylim()
135 | x1 = np.linspace(x_low, x_high, 100)
136 | x2 = np.linspace(y_low, y_high, 100)
137 |
138 | X1, X2 = np.meshgrid(x1, x2)
139 | xy = np.vstack([X1.ravel(), X2.ravel()]).T
140 | Z = knn.predict(xy).reshape(X1.shape)
141 |
142 | label = kwargs.pop("label", "Decision Boundary")
143 | # plot decision boundary and margins
144 | cs = ax.contourf(X1, X2, Z, **kwargs, cmap=cmap, alpha=alpha)
145 | cs.collections[0].set_label(label)
146 | plt.axis("off")
147 |
148 |
149 | def draw_decision_surface(clf, predictions, label=None):
150 | ax = plt.gca()
151 | x_low, x_high = ax.get_xlim()
152 | y_low, y_high = ax.get_ylim()
153 | x1 = np.linspace(x_low, x_high, 100)
154 | x2 = np.linspace(y_low, y_high, 100)
155 |
156 | X1, X2 = np.meshgrid(x1, x2)
157 | xy = np.vstack([X1.ravel(), X2.ravel()]).T
158 | Z = clf.predict_proba(xy)[:, 1].reshape(X1.shape)
159 |
160 | plt.imshow(
161 | Z,
162 | extent=[x_low, x_high, y_low, y_high],
163 | cmap="GnBu",
164 | origin="lower",
165 | vmin=0,
166 | vmax=1,
167 | )
168 | plt.grid()
169 | plt.colorbar(label=label)
170 | plt.axis("off")
171 |
172 |
173 | def plot_bars_and_confusion(
174 | truth,
175 | prediction,
176 | axes=None,
177 | vmin=None,
178 | vmax=None,
179 | cmap='inferno',
180 | title=None,
181 | bar_color=None,
182 | ):
183 | accuracy = accuracy_score(truth, prediction)
184 | cm = confusion_matrix(truth, prediction)
185 |
186 | if not isinstance(truth, pd.Series):
187 | truth = pd.Series(truth)
188 |
189 | if not isinstance(prediction, pd.Series):
190 | prediction = pd.Series(prediction)
191 |
192 | correct = pd.Series(np.where(truth.values == prediction.values, 'Correct', 'Wrong'))
193 |
194 | truth.sort_index(inplace=True)
195 | prediction.sort_index(inplace=True)
196 |
197 | if not axes:
198 | fig, axes = plt.subplots(1, 2)
199 |
200 | if vmin is None:
201 | vmin = cm.min()
202 | 
203 | if vmax is None:
204 | vmax = cm.max()
205 |
206 | if not bar_color:
207 | correct.value_counts().plot.barh(ax=axes[0])
208 | else:
209 | correct.value_counts().plot.barh(ax=axes[0], color=bar_color)
210 |
211 | axes[0].text(150, 0.5, "Accuracy {:0.3f}".format(accuracy))
212 |
213 | sns.heatmap(
214 | cm,
215 | annot=True,
216 | fmt="d",
217 | cmap=cmap,
218 | xticklabels=["No", "Yes"],
219 | yticklabels=["No", "Yes"],
220 | ax=axes[1],
221 | vmin=vmin,
222 | vmax=vmax,
223 | )
224 | axes[1].set_ylabel("Actual")
225 | axes[1].set_xlabel("Predicted")
226 | if title:
227 | plt.suptitle(title)
228 |
--------------------------------------------------------------------------------
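A hedged usage sketch for the helpers above on a toy dataset (only `set_plot_style`, `cmap`, `draw_decision_boundaries` and `plot_bars_and_confusion` come from the module; the dataset and classifier are illustrative):

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

from ml import plots

plots.set_plot_style()

# toy two-class problem
X, y = make_blobs(n_samples=400, centers=2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)

# scatter the points first so the helper can pick up the axis limits
plt.figure()
plt.scatter(X_train[:, 0], X_train[:, 1], c=y_train, cmap=plots.cmap)
plots.draw_decision_boundaries(knn)

# accuracy bars next to the confusion matrix
plots.plot_bars_and_confusion(y_test, knn.predict(X_test))
plt.show()
```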
/ml/solutions/exercise_1.py:
--------------------------------------------------------------------------------
1 | import matplotlib.pyplot as plt
2 | import numpy as np
3 | from sklearn import linear_model
4 | np.random.seed(1234)
5 | # create two gaussians
6 | A = np.random.multivariate_normal(mean=[1, 1], cov=[[2, 1], [1, 2]], size=200)
7 | B = np.random.multivariate_normal(mean=[-2, -2], cov=[[2, 0], [0, 2]], size=200)
8 |
9 | # get them into proper matrix form
10 | X = np.vstack([A, B])
11 | Y = np.hstack([np.zeros(len(A)), np.ones(len(B))])
12 |
13 | # train the linear regressor and save the coefficents
14 | reg = linear_model.LinearRegression()
15 | reg.fit(X, Y)
16 | b_1, b_2 = reg.coef_
17 | b_0 = reg.intercept_
18 |
19 | # solve the function y = b_0 + b_1*X_1 + b_2 * X_2 for X2
20 | x1s = np.linspace(-8, 8)
21 | x2s = (0.5 - b_0 - b_1 * x1s) / b_2
22 |
23 |
24 | plt.scatter(A[:, 0], A[:, 1], s=25, color='dodgerblue', label='True class A')
25 | plt.scatter(B[:, 0], B[:, 1], s=25, color='limegreen', label='True class B')
26 |
27 | plt.plot(x1s, x2s, color='gray', linestyle='--')
28 |
29 | plt.fill_between(x1s, x2s, 10, color='dodgerblue', alpha=0.07)
30 | plt.fill_between(x1s, x2s, -10, color='limegreen', alpha=0.07)
31 | plt.grid()
32 | plt.xlabel('X1')
33 | plt.ylabel('X2')
34 | plt.margins(x=0, y=0)
35 | plt.xlim([-8, 8])
36 | plt.ylim([-8, 8])
37 | plt.legend()
38 | None
39 |
--------------------------------------------------------------------------------
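For reference, the line drawn in this solution follows from thresholding the regression output at 0.5 and solving for $X_2$, matching the comment in the code above:

$$
b_0 + b_1 X_1 + b_2 X_2 = 0.5
\quad\Longrightarrow\quad
X_2 = \frac{0.5 - b_0 - b_1 X_1}{b_2}
$$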
/ml/solutions/exercise_2.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import matplotlib.pyplot as plt
3 | from sklearn.svm import SVC
4 | from sklearn.model_selection import train_test_split
5 | from ml import plots  # provides plots.plot_bars_and_confusion used below
6 | np.random.seed(1234)
7 | data = read_titanic()  # read_titanic() is defined in the lecture notebook this solution is loaded into
8 |
9 | X = data[['Sex_Code', 'Pclass_Code', 'Fare', 'Age']]
10 | y = data['Survived_Code']
11 |
12 | X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5)
13 |
14 |
15 | # Use linear kernel
16 | reg = SVC(kernel='linear')
17 | reg.fit(X_train, y_train)
18 | prediction_linear = reg.predict(X_test)
19 |
20 | # Use the rbf kernel
21 | reg_rbf = SVC(kernel='rbf')
22 | reg_rbf.fit(X_train, y_train)
23 | prediction_rbf = reg_rbf.predict(X_test)
24 |
25 | fig, ([ax1, ax2], [ax3, ax4]) = plt.subplots(2, 2, figsize=(10, 10))
26 | plots.plot_bars_and_confusion(truth=y_test, prediction=prediction_linear, axes=[ax1, ax2], vmin=0, vmax=182)
27 | plots.plot_bars_and_confusion(truth=y_test, prediction=prediction_rbf, axes=[ax3, ax4], vmin=0, vmax=182)
28 | ax1.set_title('Linear Kernel')
29 | ax3.set_title('Radial Kernel')
30 | ax1.set_xlim([0, 300])
31 | ax3.set_xlim([0, 300])
32 | None
33 |
--------------------------------------------------------------------------------
/ml/solutions/exercise_3.py:
--------------------------------------------------------------------------------
1 | from sklearn.model_selection import ParameterGrid, train_test_split
2 | from sklearn.metrics import accuracy_score
3 | from sklearn.tree import DecisionTreeClassifier
4 | import seaborn as sns
5 | import pandas as pd
6 | import numpy as np
7 | np.random.seed(1235)
8 |
9 | data = read_titanic()
10 |
11 | X = data[['Sex_Code', 'Pclass_Code', 'Fare', 'Age']]
12 | y = data['Survived']
13 |
14 | X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5)
15 |
16 | results = []
17 | ps = ParameterGrid({'max_depth': range(1, 20), 'criterion': ['entropy', 'gini']})
18 | for d in ps:
19 | clf = DecisionTreeClassifier(max_depth=d['max_depth'], criterion=d['criterion'])
20 | clf.fit(X_train, y_train)
21 | acc = accuracy_score(y_test, clf.predict(X_test))
22 | results.append({'max_depth': d['max_depth'], 'criterion': d['criterion'], 'accuracy': acc})
23 | 
24 | df = pd.DataFrame(results).pivot(index='max_depth', columns='criterion', values='accuracy')
25 | sns.heatmap(df, cmap='YlOrRd', annot=True, fmt='.3f')
26 | None
27 |
--------------------------------------------------------------------------------
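The same parameter scan could also be written with scikit-learn's `GridSearchCV`; a hedged alternative sketch (it cross-validates instead of using the single train/test split above, and assumes `X` and `y` are the titanic features and labels prepared in the notebook):

```python
import pandas as pd
import seaborn as sns
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# X, y: feature matrix and labels as prepared in the lecture notebook
param_grid = {'max_depth': range(1, 20), 'criterion': ['entropy', 'gini']}
search = GridSearchCV(DecisionTreeClassifier(), param_grid, scoring='accuracy', cv=5)
search.fit(X, y)

results = pd.DataFrame(search.cv_results_)
pivot = results.pivot(index='param_max_depth', columns='param_criterion', values='mean_test_score')
sns.heatmap(pivot, cmap='YlOrRd', annot=True, fmt='.3f')
```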
/ml/solutions/exercise_5.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import pandas as pd
3 | import matplotlib.pyplot as plt
4 | import seaborn as sns
5 | from sklearn.model_selection import cross_validate
6 | from sklearn.neighbors import KNeighborsClassifier
7 | from sklearn.tree import DecisionTreeClassifier
8 | from sklearn.svm import SVC
9 | 
10 | np.random.seed(1234)
11 | data = read_titanic()  # read_titanic() is defined in the lecture notebook this solution is loaded into
8 |
9 | X = data[['Sex_Code', 'Pclass_Code', 'Fare', 'Age']]
10 | y = data['Survived_Code']
11 |
12 | svc = SVC(kernel='linear')
13 | knn = KNeighborsClassifier(n_neighbors=5)
14 | tree = DecisionTreeClassifier(max_depth=5)
15 |
16 | results = []
17 | for clf, name in zip([svc, knn, tree], ['SVM', 'kNN', 'tree']):
18 | r = cross_validate(clf, X=X, y=y, cv=5, scoring=['accuracy', 'precision', 'recall', 'f1'])
19 | df = pd.DataFrame().from_dict(r)
20 | df['classifier'] = name
21 | results.append(df)
22 |
23 | df = pd.concat(results).drop(['fit_time', 'score_time'], axis='columns')
24 |
25 | means = df.groupby('classifier').mean()
26 | deviations = df.groupby('classifier').std()
27 |
28 | fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 4))
29 | sns.heatmap(means, cmap='viridis', annot=True, ax=ax1, vmin=0, vmax=1)
30 | sns.heatmap(deviations, cmap='viridis', annot=True, ax=ax2, vmin=0, vmax=1)
31 |
--------------------------------------------------------------------------------
/ml/solutions/exercise_6.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import matplotlib.pyplot as plt
3 | from sklearn.metrics import precision_recall_curve, roc_curve, roc_auc_score, average_precision_score
4 | from sklearn.model_selection import StratifiedKFold
5 | from sklearn.tree import DecisionTreeClassifier
6 | from sklearn.datasets import make_moons
7 | from matplotlib import patches
6 |
7 | X, y = make_moons(n_samples=5000, noise=0.9)
8 |
9 | clf = DecisionTreeClassifier(min_samples_leaf=50)
10 | cv = StratifiedKFold(n_splits=5)
11 |
12 | fig, ([ax1, ax2], [ax3, ax4]) = plt.subplots(2, 2, figsize=(12, 12))
13 |
14 | roc_auc = []
15 | pr_auc = []
16 |
17 | for train, test in cv.split(X, y):
18 | X_train, X_test, y_train, y_test = X[train], X[test], y[train], y[test]
19 |
20 | clf.fit(X_train, y_train)
21 |
22 | prediction = clf.predict_proba(X_test)[:, 1]
23 |
24 | p, r, thresholds_pr = precision_recall_curve(y_test, prediction)
25 | fpr, tpr, thresholds_roc = roc_curve(y_test, prediction)
26 |
27 | roc_auc.append(roc_auc_score(y_test, prediction))
28 | pr_auc.append(average_precision_score(y_test, prediction))
29 |
30 | ax1.step(thresholds_pr, r[: -1], color='gray', where='post')
31 | ax1.step(thresholds_pr, p[: -1], color='darkgray', where='post')
32 |
33 | ax2.step(r, p, color='darkmagenta', where='post')
34 |
35 | ax3.step(thresholds_roc, tpr, color='gray', where='post')
36 | ax3.step(thresholds_roc, fpr, color='darkgray', where='post')
37 |
38 | ax4.step(fpr, tpr, color='mediumvioletred', where='post')
39 |
40 |
41 |
42 | p1 = patches.Patch(color='gray', label='Recall')
43 | p2 = patches.Patch(color='darkgray', label='Precision')
44 | ax1.legend(handles=[p1, p2])
45 | ax1.set_xlabel('Decision Threshold')
46 | ax1.set_xlim([0, 1])
47 | ax1.set_ylim([0, 1])
48 |
49 | ax2.set_xlim([0, 1])
50 | ax2.set_ylim([0, 1])
51 | ax2.set_ylabel('Precision')
52 | ax2.set_xlabel('Recall')
53 | s = 'AUC {:0.3f} +/- {:0.3f}'.format(np.array(pr_auc).mean(), np.array(pr_auc).std())
54 | ax2.text(0.2, 0.2, s)
55 |
56 | p1 = patches.Patch(color='gray', label='True Positive Rate')
57 | p2 = patches.Patch(color='darkgray', label='False Positive Rate')
58 | ax3.legend(handles=[p1, p2])
59 | ax3.set_xlabel('Decision Threshold')
60 | ax3.set_xlim([0, 1])
61 | ax3.set_ylim([0, 1])
62 |
63 | ax4.set_xlim([0, 1])
64 | ax4.set_ylim([0, 1])
65 | ax4.set_ylabel('True Positive Rate')
66 | ax4.set_xlabel('False Positive Rate')
67 | s = 'AUC {:0.3f} +/- {:0.3f}'.format(np.array(roc_auc).mean(), np.array(roc_auc).std())
68 | ax4.text(0.2, 0.2, s)
69 |
70 | None
71 |
--------------------------------------------------------------------------------
/resources/event.mp4:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tudo-astroparticlephysics/machine-learning-lecture/38bec9a945b85b0d32e472a98a770affb616dfe8/resources/event.mp4
--------------------------------------------------------------------------------
/resources/lstsubarray_stereo.mp4:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tudo-astroparticlephysics/machine-learning-lecture/38bec9a945b85b0d32e472a98a770affb616dfe8/resources/lstsubarray_stereo.mp4
--------------------------------------------------------------------------------
/resources/muon_data.txt:
--------------------------------------------------------------------------------
1 | #$SPEC_ID:
2 | #No sample description was entered.
3 | #$SPEC_REM:
4 | #DET# 1
5 | #DETDESC# MYON MCB 337
6 | #AP# Maestro Version 6.06
7 | #$DATE_MEA:
8 | #02/21/2014 14:06:36
9 | #$MEAS_TIM:
10 | #241454 241455
11 | #$DATA:
12 | #0 511
13 | 0
14 | 0
15 | 55
16 | 305
17 | 381
18 | 303
19 | 320
20 | 316
21 | 324
22 | 317
23 | 337
24 | 302
25 | 308
26 | 318
27 | 293
28 | 267
29 | 260
30 | 293
31 | 206
32 | 251
33 | 226
34 | 220
35 | 261
36 | 216
37 | 236
38 | 215
39 | 228
40 | 209
41 | 221
42 | 215
43 | 187
44 | 187
45 | 173
46 | 200
47 | 187
48 | 175
49 | 200
50 | 178
51 | 146
52 | 142
53 | 160
54 | 150
55 | 139
56 | 142
57 | 136
58 | 143
59 | 110
60 | 118
61 | 125
62 | 123
63 | 123
64 | 115
65 | 119
66 | 112
67 | 98
68 | 122
69 | 117
70 | 123
71 | 111
72 | 98
73 | 109
74 | 111
75 | 97
76 | 103
77 | 97
78 | 92
79 | 90
80 | 102
81 | 104
82 | 90
83 | 77
84 | 88
85 | 79
86 | 60
87 | 75
88 | 87
89 | 70
90 | 65
91 | 73
92 | 83
93 | 77
94 | 70
95 | 67
96 | 54
97 | 67
98 | 55
99 | 56
100 | 69
101 | 61
102 | 39
103 | 57
104 | 55
105 | 57
106 | 46
107 | 54
108 | 54
109 | 43
110 | 54
111 | 46
112 | 51
113 | 44
114 | 54
115 | 35
116 | 59
117 | 33
118 | 40
119 | 39
120 | 30
121 | 38
122 | 31
123 | 44
124 | 47
125 | 37
126 | 38
127 | 33
128 | 31
129 | 30
130 | 24
131 | 31
132 | 26
133 | 28
134 | 22
135 | 34
136 | 34
137 | 34
138 | 31
139 | 30
140 | 27
141 | 24
142 | 24
143 | 18
144 | 29
145 | 33
146 | 18
147 | 24
148 | 22
149 | 26
150 | 22
151 | 18
152 | 18
153 | 18
154 | 17
155 | 22
156 | 17
157 | 16
158 | 20
159 | 20
160 | 22
161 | 16
162 | 22
163 | 21
164 | 21
165 | 21
166 | 13
167 | 18
168 | 11
169 | 14
170 | 16
171 | 17
172 | 15
173 | 17
174 | 21
175 | 13
176 | 15
177 | 15
178 | 12
179 | 19
180 | 15
181 | 13
182 | 13
183 | 11
184 | 16
185 | 19
186 | 11
187 | 14
188 | 8
189 | 10
190 | 10
191 | 8
192 | 13
193 | 16
194 | 13
195 | 5
196 | 10
197 | 10
198 | 12
199 | 12
200 | 9
201 | 16
202 | 7
203 | 8
204 | 5
205 | 9
206 | 11
207 | 6
208 | 11
209 | 8
210 | 8
211 | 9
212 | 11
213 | 9
214 | 14
215 | 8
216 | 12
217 | 10
218 | 8
219 | 6
220 | 5
221 | 6
222 | 10
223 | 3
224 | 10
225 | 5
226 | 15
227 | 7
228 | 8
229 | 6
230 | 9
231 | 11
232 | 9
233 | 5
234 | 3
235 | 4
236 | 7
237 | 7
238 | 5
239 | 6
240 | 6
241 | 8
242 | 7
243 |
--------------------------------------------------------------------------------
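The file above is an MCA export (the header mentions Maestro): lines starting with `#` are metadata, the remaining lines are one count per channel. A hedged loading sketch, with the path assumed relative to the repository root:

```python
import numpy as np
import matplotlib.pyplot as plt

# '#' lines are metadata; the rest is one count per channel
counts = np.loadtxt('resources/muon_data.txt', comments='#')
channels = np.arange(len(counts))

plt.step(channels, counts, where='mid')
plt.xlabel('channel')
plt.ylabel('counts')
plt.show()
```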
/resources/spalt.csv:
--------------------------------------------------------------------------------
1 | phi,I
2 | -0.02033898305084746,7.000000000000001e-09
3 | -0.019491525423728815,8e-09
4 | -0.01864406779661017,8.199999999999999e-09
5 | -0.017796610169491526,6.8000000000000005e-09
6 | -0.01694915254237288,4.6e-09
7 | -0.016101694915254237,2.6e-09
8 | -0.015254237288135596,1.5000000000000002e-09
9 | -0.014406779661016951,2.4e-09
10 | -0.013559322033898306,5e-09
11 | -0.012711864406779662,8.9e-09
12 | -0.011864406779661017,1.1000000000000001e-08
13 | -0.011016949152542374,1.2500000000000001e-08
14 | -0.01016949152542373,1.12e-08
15 | -0.009322033898305085,1.08e-08
16 | -0.00847457627118644,4.6e-09
17 | -0.007627118644067798,2.7e-09
18 | -0.006779661016949153,7.400000000000001e-09
19 | -0.005932203389830508,1.2150000000000001e-08
20 | -0.005084745762711863,4.9e-08
21 | -0.004237288135593221,8.700000000000001e-08
22 | -0.0033898305084745766,1.2500000000000002e-07
23 | -0.0025423728813559316,1.5000000000000002e-07
24 | -0.0016949152542372898,2.4500000000000004e-07
25 | -0.0008474576271186449,2.55e-07
26 | 0.0,2.65e-07
27 | 0.0008474576271186449,2.55e-07
28 | 0.0016949152542372898,2.3500000000000003e-07
29 | 0.0025423728813559316,1.95e-07
30 | 0.0033898305084745766,1.4500000000000001e-07
31 | 0.004237288135593221,9.5e-08
32 | 0.005084745762711863,6.000000000000001e-08
33 | 0.005932203389830508,3.8e-08
34 | 0.006779661016949153,1.6e-08
35 | 0.007627118644067798,8.4e-09
36 | 0.008474576271186442,6.400000000000001e-09
37 | 0.009322033898305087,8.600000000000001e-09
38 | 0.010169491525423733,1.0500000000000001e-08
39 | 0.011016949152542371,1.2500000000000001e-08
40 | 0.011864406779661016,1.2500000000000001e-08
41 | 0.012711864406779662,1.0500000000000001e-08
42 | 0.013559322033898306,7.500000000000001e-09
43 | 0.014406779661016951,5.2e-09
44 | 0.015254237288135596,2.8e-09
45 | 0.01610169491525424,1.6000000000000003e-09
46 | 0.01694915254237288,1.4e-09
47 | 0.017796610169491526,1.8000000000000002e-09
48 | 0.01864406779661017,2.4e-09
49 | 0.019491525423728815,2.6e-09
50 | 0.02033898305084746,2.2000000000000003e-09
51 | 0.021186440677966104,1.6000000000000003e-09
52 | 0.02203389830508475,1e-09
53 |
--------------------------------------------------------------------------------
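A short, hedged sketch for loading the data above (`phi` is the angle and `I` the measured intensity, judging by the column names; the path is assumed relative to the repository root):

```python
import pandas as pd
import matplotlib.pyplot as plt

# columns: phi (angle) and I (measured intensity)
spalt = pd.read_csv('resources/spalt.csv')

plt.plot(spalt['phi'], spalt['I'], 'o-')
plt.xlabel('phi')
plt.ylabel('I')
plt.show()
```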
/smd_boosting.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Application to CTAO Data and Boosting\n"
8 | ]
9 | },
10 | {
11 | "cell_type": "markdown",
12 | "metadata": {},
13 | "source": [
14 | "\n",
15 | "The story so far:\n",
16 | "\n",
17 | "- Linear Discriminant Analysis (LDA) and Fisher's linear discriminant\n",
18 | "- Principal Component Analysis (PCA)\n",
19 | "- Feature Selection\n",
20 | "- Supervised Learning\n",
21 | "- Clustering"
22 | ]
23 | },
24 | {
25 | "cell_type": "code",
26 | "execution_count": null,
27 | "metadata": {
28 | "ExecuteTime": {
29 | "end_time": "2018-11-27T15:05:27.643365Z",
30 | "start_time": "2018-11-27T15:05:26.177785Z"
31 | }
32 | },
33 | "outputs": [],
34 | "source": [
35 | "from ml import plots\n",
36 | "import matplotlib\n",
37 | "import matplotlib.pyplot as plt\n",
38 | "import pandas as pd\n",
39 | "import numpy as np\n",
40 | "from matplotlib.colors import ListedColormap"
41 | ]
42 | },
43 | {
44 | "cell_type": "code",
45 | "execution_count": null,
46 | "metadata": {
47 | "ExecuteTime": {
48 | "end_time": "2018-11-27T15:05:27.643365Z",
49 | "start_time": "2018-11-27T15:05:26.177785Z"
50 | }
51 | },
52 | "outputs": [],
53 | "source": [
54 | "pd.options.display.max_rows = 10\n",
55 | "plots.set_plot_style()"
56 | ]
57 | },
58 | {
59 | "cell_type": "code",
60 | "execution_count": null,
61 | "metadata": {},
62 | "outputs": [],
63 | "source": [
64 | "%matplotlib widget"
65 | ]
66 | },
67 | {
68 | "cell_type": "markdown",
69 | "metadata": {},
70 | "source": [
71 | "# A Complete Example\n",
72 | "\n",
73 | "Below we load a dataset containing data from simulated CTA Observations.\n",
74 | "\n",
75 | "
\n",
76 | "\n",
77 | "We will perform the typical steps to build and evaluate a classifier.\n",
78 | "\n",
79 | "0. Understand where your data comes from\n",
80 | "\n",
81 | "1. Preprocessing\n",
82 | " * Drop Constant Values,\n",
83 | " * Handle Missing Data \n",
84 | " * Feature Generation\n",
85 | "\n",
86 | "2. Splitting\n",
87 | " \n",
88 | " * Split your data into training and evaluation sets\n",
89 | " \n",
90 | "3. Training \n",
91 | " \n",
92 | " * Train your classifier of choice.\n",
93 | " \n",
94 | "4. Evaluation\n",
95 | " \n",
96 | " * Evaluate the performance on the test data set.\n",
97 | " * If not good enough, go back to step 1 \n",
98 | " \n",
99 | "5. Physics\n",
100 | " \n",
101 | " * Check whether your data support your hypothesis\n",
102 | " "
103 | ]
104 | },
105 | {
106 | "cell_type": "markdown",
107 | "metadata": {},
108 | "source": [
109 | "## 1. Get to know your data\n",
110 | "\n",
111 | "Cherenkov telescopes record short flashes of light produced by very high energy cosmic rays and photons hitting earths atmosphere.\n",
112 | "\n",
113 | ""
114 | ]
115 | },
116 | {
117 | "cell_type": "code",
118 | "execution_count": null,
119 | "metadata": {
120 | "ExecuteTime": {
121 | "end_time": "2018-11-27T15:07:00.589316Z",
122 | "start_time": "2018-11-27T15:07:00.584438Z"
123 | }
124 | },
125 | "outputs": [],
126 | "source": [
127 | "%%HTML\n",
128 | "\n",
129 | ""
132 | ]
133 | },
134 | {
135 | "cell_type": "markdown",
136 | "metadata": {},
137 | "source": [
138 | "We will use machine learning for two tasks in this example. \n",
139 | "\n",
140 | " * Train a classifier to distinguish events induced by gamma rays form events induced by cosmic rays\n",
141 | " * Train a regressor to estimate the energy of the incoming primary particle."
142 | ]
143 | },
144 | {
145 | "cell_type": "markdown",
146 | "metadata": {},
147 | "source": [
148 | "## 2. Preprocess data\n",
149 | "\n",
150 | "A _**lot**_ of preprocessing has _already_ happened at this point.\n",
151 | "\n",
152 | "* Calibration of Raw Data\n",
153 | "* Data Reduction from voltage timeseries per pixel to number of photons and mean time for each pixel\n",
154 | "* Calculation of image features\n",
155 | "* Geometrical Reconstruction of the Shower Geometry\n",
156 | "\n",
157 | "\n",
158 | "Load data and remove unwanted columns and store the true labels separately."
159 | ]
160 | },
161 | {
162 | "cell_type": "code",
163 | "execution_count": null,
164 | "metadata": {
165 | "ExecuteTime": {
166 | "end_time": "2018-11-27T15:07:01.600004Z",
167 | "start_time": "2018-11-27T15:07:00.592824Z"
168 | }
169 | },
170 | "outputs": [],
171 | "source": [
172 | "import pandas as pd\n",
173 | "from ctapipe.io import TableLoader\n",
174 | "from ctapipe.utils import get_dataset_path"
175 | ]
176 | },
177 | {
178 | "cell_type": "markdown",
179 | "metadata": {},
180 | "source": [
181 | "The dataset here is very similar but much smaller than the full dataset released publicly here:\n",
182 | "\n",
183 | "[](https://doi.org/10.5281/zenodo.7298569)"
184 | ]
185 | },
186 | {
187 | "cell_type": "code",
188 | "execution_count": null,
189 | "metadata": {},
190 | "outputs": [],
191 | "source": [
192 | "gamma_path = get_dataset_path('gamma_diffuse_dl2_train_small.dl2.h5')\n",
193 | "proton_path = get_dataset_path('proton_dl2_train_small.dl2.h5')"
194 | ]
195 | },
196 | {
197 | "cell_type": "code",
198 | "execution_count": null,
199 | "metadata": {},
200 | "outputs": [],
201 | "source": [
202 | "with TableLoader(gamma_path) as loader:\n",
203 | " subarray = loader.subarray\n",
204 | "\n",
205 | "subarray.peek()"
206 | ]
207 | },
208 | {
209 | "cell_type": "code",
210 | "execution_count": null,
211 | "metadata": {
212 | "ExecuteTime": {
213 | "end_time": "2018-11-27T15:07:02.004702Z",
214 | "start_time": "2018-11-27T15:07:01.603010Z"
215 | }
216 | },
217 | "outputs": [],
218 | "source": [
219 | "def read_events(path):\n",
220 | "\n",
221 | " loader = TableLoader(\n",
222 | " path,\n",
223 | " dl2=True,\n",
224 | " instrument=True,\n",
225 | " simulated=True,\n",
226 | " )\n",
227 | "\n",
228 | " table = loader.read_telescope_events()\n",
229 | "\n",
230 | " # these two columns are arrays in each row, which is not supported by pandas\n",
231 | " table.remove_columns(['tels_with_trigger', 'HillasReconstructor_telescopes'])\n",
232 | "\n",
233 | " # convert astropy.table.Table to pd.DataFrame\n",
234 | " return table.to_pandas()"
235 | ]
236 | },
237 | {
238 | "cell_type": "code",
239 | "execution_count": null,
240 | "metadata": {},
241 | "outputs": [],
242 | "source": [
243 | "gammas = read_events(gamma_path)"
244 | ]
245 | },
246 | {
247 | "cell_type": "code",
248 | "execution_count": null,
249 | "metadata": {},
250 | "outputs": [],
251 | "source": [
252 | "len(gammas.columns)"
253 | ]
254 | },
255 | {
256 | "cell_type": "markdown",
257 | "metadata": {},
258 | "source": [
259 | "Now delete all simulated values which can not be observed during measurement in the physical world. We know which columns to remove because they have a special prefix."
260 | ]
261 | },
262 | {
263 | "cell_type": "code",
264 | "execution_count": null,
265 | "metadata": {
266 | "ExecuteTime": {
267 | "end_time": "2018-11-27T15:07:02.055025Z",
268 | "start_time": "2018-11-27T15:07:02.007914Z"
269 | }
270 | },
271 | "outputs": [],
272 | "source": [
273 | "forbidden_columns = 'true_|obs_id|event_id'\n",
274 | "gammas = gammas.filter(regex=f'^(?!{forbidden_columns}).*$')\n",
275 | "\n",
276 | "len(gammas.columns)"
277 | ]
278 | },
279 | {
280 | "cell_type": "markdown",
281 | "metadata": {},
282 | "source": [
283 | "Check the data types of the columns. We can select non-numeric types and encode them. But in this case we might as well drop them as the attribute is not important."
284 | ]
285 | },
286 | {
287 | "cell_type": "code",
288 | "execution_count": null,
289 | "metadata": {
290 | "ExecuteTime": {
291 | "end_time": "2018-11-27T15:07:02.075296Z",
292 | "start_time": "2018-11-27T15:07:02.057255Z"
293 | }
294 | },
295 | "outputs": [],
296 | "source": [
297 | "c = gammas.select_dtypes(exclude=['number', 'bool']).columns\n",
298 | "print('Removed columns:', c.values)\n",
299 | "\n",
300 | "gammas = gammas.drop(c, axis='columns')"
301 | ]
302 | },
303 | {
304 | "cell_type": "markdown",
305 | "metadata": {},
306 | "source": [
307 | "We can spot the columns with constant values by looking at the count and/or standard deviation."
308 | ]
309 | },
310 | {
311 | "cell_type": "code",
312 | "execution_count": null,
313 | "metadata": {
314 | "ExecuteTime": {
315 | "end_time": "2018-11-27T15:07:02.529060Z",
316 | "start_time": "2018-11-27T15:07:02.078743Z"
317 | }
318 | },
319 | "outputs": [],
320 | "source": [
321 | "desc = gammas.describe()\n",
322 | "desc"
323 | ]
324 | },
325 | {
326 | "cell_type": "code",
327 | "execution_count": null,
328 | "metadata": {
329 | "ExecuteTime": {
330 | "end_time": "2018-11-27T15:07:02.547778Z",
331 | "start_time": "2018-11-27T15:07:02.532415Z"
332 | }
333 | },
334 | "outputs": [],
335 | "source": [
336 | "c = desc.columns[desc.loc['std'] == 0]\n",
337 | "print('Removed columns:', c.values)\n",
338 | "gammas = gammas.drop(c, axis='columns')"
339 | ]
340 | },
341 | {
342 | "cell_type": "markdown",
343 | "metadata": {},
344 | "source": [
345 | "drop columns where all rows are nan"
346 | ]
347 | },
348 | {
349 | "cell_type": "code",
350 | "execution_count": null,
351 | "metadata": {},
352 | "outputs": [],
353 | "source": [
354 | "c = gammas.columns[gammas.count() == 0]\n",
355 | "print('Removed columns:', c.values)\n",
356 | "gammas = gammas.drop(c, axis='columns')"
357 | ]
358 | },
359 | {
360 | "cell_type": "markdown",
361 | "metadata": {},
362 | "source": [
363 | "here we do a specific pre-selection, again using \"expert knowledge\""
364 | ]
365 | },
366 | {
367 | "cell_type": "code",
368 | "execution_count": null,
369 | "metadata": {},
370 | "outputs": [],
371 | "source": [
372 | "print(len(gammas))\n",
373 | "gammas = gammas[gammas['hillas_width'] > 0]\n",
374 | "print(len(gammas))"
375 | ]
376 | },
377 | {
378 | "cell_type": "markdown",
379 | "metadata": {},
380 | "source": [
381 | "Check for missing data. (Just delete it in this case)"
382 | ]
383 | },
384 | {
385 | "cell_type": "code",
386 | "execution_count": null,
387 | "metadata": {
388 | "ExecuteTime": {
389 | "end_time": "2018-11-27T15:07:02.594941Z",
390 | "start_time": "2018-11-27T15:07:02.551401Z"
391 | }
392 | },
393 | "outputs": [],
394 | "source": [
395 | "print(len(gammas))\n",
396 | "gammas = gammas.dropna()\n",
397 | "print(len(gammas))"
398 | ]
399 | },
400 | {
401 | "cell_type": "markdown",
402 | "metadata": {},
403 | "source": [
404 | "So far we only loaded simulated gamma-ray showers. Now we do the same for the cosmic ray events. We create a method to perform all preprocessing in one step. We need this several times."
405 | ]
406 | },
407 | {
408 | "cell_type": "code",
409 | "execution_count": null,
410 | "metadata": {
411 | "ExecuteTime": {
412 | "end_time": "2018-11-27T15:07:02.608429Z",
413 | "start_time": "2018-11-27T15:07:02.597366Z"
414 | }
415 | },
416 | "outputs": [],
417 | "source": [
418 | "def preprocess(df):\n",
419 | " df = df.filter(regex=f'^(?!{forbidden_columns}).*$')\n",
420 | " \n",
421 | " c = df.select_dtypes(exclude=['number', 'bool']).columns\n",
422 | " df = df.drop(c, axis='columns')\n",
423 | " \n",
424 | " c = df.columns[df.count() == 0]\n",
425 | " df = df.drop(c, axis='columns')\n",
426 | " \n",
427 | " desc = df.describe()\n",
428 | " \n",
429 | " c = desc.columns[desc.loc['std'] == 0]\n",
430 | " df = df.drop(c, axis='columns')\n",
431 | " \n",
432 | " df = df[df['hillas_width'] > 0]\n",
433 | " \n",
434 | " df = df.dropna()\n",
435 | " \n",
436 | " return df"
437 | ]
438 | },
439 | {
440 | "cell_type": "code",
441 | "execution_count": null,
442 | "metadata": {
443 | "ExecuteTime": {
444 | "end_time": "2018-11-27T15:07:03.467654Z",
445 | "start_time": "2018-11-27T15:07:02.611649Z"
446 | }
447 | },
448 | "outputs": [],
449 | "source": [
450 | "gammas = read_events(gamma_path)\n",
451 | "gammas = preprocess(gammas)\n",
452 | "\n",
453 | "protons = read_events(proton_path)\n",
454 | "protons = preprocess(protons)"
455 | ]
456 | },
457 | {
458 | "cell_type": "markdown",
459 | "metadata": {},
460 | "source": [
461 | "Now we can perform feature generation. We use our expert knowledge or intuition to create a new feature by combining existing columns into a new variable."
462 | ]
463 | },
464 | {
465 | "cell_type": "code",
466 | "execution_count": null,
467 | "metadata": {
468 | "ExecuteTime": {
469 | "end_time": "2018-11-27T15:07:03.481076Z",
470 | "start_time": "2018-11-27T15:07:03.471131Z"
471 | }
472 | },
473 | "outputs": [],
474 | "source": [
475 | "def feature_generation(df):\n",
476 | " df['awesome_feature'] = df.eval('hillas_intensity / (hillas_width * hillas_length)')\n",
477 | " \n",
478 | " # distance of impact point to the telescope\n",
479 | " df['impact'] = np.sqrt(\n",
480 | " (df['HillasReconstructor_core_x'] - df['pos_x'])**2\n",
481 | " + (df['HillasReconstructor_core_y'] - df['pos_y'])**2\n",
482 | " )\n",
483 | "\n",
484 | " return df\n",
485 | "\n",
486 | "gammas = feature_generation(gammas)\n",
487 | "protons = feature_generation(protons)\n",
488 | "\n",
489 | "gammas[['awesome_feature', 'impact']]"
490 | ]
491 | },
492 | {
493 | "cell_type": "markdown",
494 | "metadata": {},
495 | "source": [
496 | "A quick look at the data so far"
497 | ]
498 | },
499 | {
500 | "cell_type": "code",
501 | "execution_count": null,
502 | "metadata": {
503 | "ExecuteTime": {
504 | "end_time": "2018-11-27T15:07:03.829801Z",
505 | "start_time": "2018-11-27T15:07:03.484278Z"
506 | }
507 | },
508 | "outputs": [],
509 | "source": [
510 | "# bins = np.geomspace(0.01, 1, 101)\n",
511 | "# bins = np.logspace(0, 1, 100)\n",
512 | "# bins = 100\n",
513 | "# bins = np.arange(0, 10) - 0.5\n",
514 | "bins = np.geomspace(1e3, 1e5, 51)\n",
515 | "\n",
516 | "col = 'awesome_feature'\n",
517 | "\n",
518 | "plt.figure()\n",
519 | "plt.hist(gammas[col], bins=bins, histtype='step', lw=2, label='Gammas', density=True)\n",
520 | "plt.hist(protons[col], bins=bins, histtype='step', lw=2, label='Protons', density=True)\n",
521 | "\n",
522 | "plt.xscale('log')\n",
523 | "\n",
524 | "plt.xlabel(col)\n",
525 | "plt.legend()\n",
526 | "None"
527 | ]
528 | },
529 | {
530 | "cell_type": "markdown",
531 | "metadata": {},
532 | "source": [
533 | "At this point we combine the two datasets into one big matrix and build a label vector $y$"
534 | ]
535 | },
536 | {
537 | "cell_type": "code",
538 | "execution_count": null,
539 | "metadata": {
540 | "ExecuteTime": {
541 | "end_time": "2018-11-27T15:07:03.895879Z",
542 | "start_time": "2018-11-27T15:07:03.832960Z"
543 | }
544 | },
545 | "outputs": [],
546 | "source": [
547 | "X = pd.concat([gammas, protons])\n",
548 | "y = np.concatenate([np.ones(len(gammas)), np.zeros(len(protons))])"
549 | ]
550 | },
551 | {
552 | "cell_type": "markdown",
553 | "metadata": {},
554 | "source": [
555 | "## 3. Split Data\n",
556 | "\n",
557 | "Now we can split the data into test and training sets. Scikit-Learn provides some neat methods to do just that."
558 | ]
559 | },
560 | {
561 | "cell_type": "code",
562 | "execution_count": null,
563 | "metadata": {
564 | "ExecuteTime": {
565 | "end_time": "2018-11-27T15:07:03.983514Z",
566 | "start_time": "2018-11-27T15:07:03.898708Z"
567 | }
568 | },
569 | "outputs": [],
570 | "source": [
571 | "from sklearn.model_selection import train_test_split\n",
572 | "\n",
573 | "X_test, X_train, y_test, y_train = train_test_split(X, y)"
574 | ]
575 | },
576 | {
577 | "cell_type": "markdown",
578 | "metadata": {},
579 | "source": [
580 | "## 4. Train the classifier\n",
581 | "\n",
582 | "Now we can train any classifier we want on the prepared data."
583 | ]
584 | },
585 | {
586 | "cell_type": "code",
587 | "execution_count": null,
588 | "metadata": {
589 | "ExecuteTime": {
590 | "end_time": "2018-11-27T15:13:04.181685Z",
591 | "start_time": "2018-11-27T15:13:02.875532Z"
592 | }
593 | },
594 | "outputs": [],
595 | "source": [
596 | "from sklearn.tree import DecisionTreeClassifier\n",
597 | "\n",
598 | "rf = DecisionTreeClassifier(max_depth=15, criterion='entropy')\n",
599 | "rf.fit(X_train, y_train)\n",
600 | "\n",
601 | "y_prediction = rf.predict(X_test)\n",
602 | "y_prediction_proba = rf.predict_proba(X_test)"
603 | ]
604 | },
605 | {
606 | "cell_type": "markdown",
607 | "metadata": {},
608 | "source": [
609 | "## 5. Evaluation \n",
610 | "\n",
611 | "Check accuracy of the models and other metrics "
612 | ]
613 | },
614 | {
615 | "cell_type": "code",
616 | "execution_count": null,
617 | "metadata": {},
618 | "outputs": [],
619 | "source": [
620 | "importance = pd.Series(rf.feature_importances_, index=gammas.columns)\n",
621 | "\n",
622 | "plt.figure()\n",
623 | "importance.sort_values().tail(20).plot.barh()"
624 | ]
625 | },
626 | {
627 | "cell_type": "code",
628 | "execution_count": null,
629 | "metadata": {
630 | "ExecuteTime": {
631 | "end_time": "2018-11-27T15:13:04.199205Z",
632 | "start_time": "2018-11-27T15:13:04.183884Z"
633 | }
634 | },
635 | "outputs": [],
636 | "source": [
637 | "from sklearn.metrics import accuracy_score, roc_curve, roc_auc_score\n",
638 | "\n",
639 | "acc = accuracy_score(y_test, y_prediction)\n",
640 | "auc = roc_auc_score(y_test, y_prediction_proba[:, 1])\n",
641 | "fpr, tpr, thresholds = roc_curve(y_test, y_prediction_proba[:, 1])"
642 | ]
643 | },
644 | {
645 | "cell_type": "code",
646 | "execution_count": null,
647 | "metadata": {
648 | "ExecuteTime": {
649 | "end_time": "2018-11-27T15:13:46.845086Z",
650 | "start_time": "2018-11-27T15:13:46.501506Z"
651 | }
652 | },
653 | "outputs": [],
654 | "source": [
655 | "def plot_roc(fpr, tpr, thresholds):\n",
656 | " fig, ax = plt.subplots()\n",
657 | "\n",
658 | " ax.plot(fpr, tpr, '--', color='gray', alpha=0.5)\n",
659 | " plot = ax.scatter(fpr, tpr, c=thresholds, vmax=1)\n",
660 | " fig.colorbar(plot)\n",
661 | " ax.text(0.5, 0.5, f'AuC ROC: {auc:0.03f} \\nAccuracy: {acc:0.03f}')\n",
662 | "\n",
663 | "\n",
664 | " ax.set_xlabel('FPR')\n",
665 | " ax.set_ylabel('TPR')\n",
666 | " ax.set_aspect(1)\n",
667 | "\n",
668 | " \n",
669 | "plot_roc(fpr, tpr, thresholds)\n",
670 | "None"
671 | ]
672 | },
673 | {
674 | "cell_type": "markdown",
675 | "metadata": {},
676 | "source": [
677 | "Perform steps 3, 4, and 5 in one step using cross validation"
678 | ]
679 | },
680 | {
681 | "cell_type": "code",
682 | "execution_count": null,
683 | "metadata": {
684 | "ExecuteTime": {
685 | "end_time": "2018-11-27T15:08:28.709336Z",
686 | "start_time": "2018-11-27T15:08:03.738680Z"
687 | }
688 | },
689 | "outputs": [],
690 | "source": [
691 | "from sklearn.model_selection import cross_validate\n",
692 | "\n",
693 | "rf = DecisionTreeClassifier(max_depth=12, criterion='entropy')\n",
694 | "\n",
695 | "scoring = {'acc': 'accuracy',\n",
696 | " 'auc': 'roc_auc',\n",
697 | " 'recall': 'recall'}\n",
698 | "\n",
699 | "results = cross_validate(rf, X, y, cv=5, scoring=scoring, return_train_score=True)\n",
700 | "results"
701 | ]
702 | },
703 | {
704 | "cell_type": "code",
705 | "execution_count": null,
706 | "metadata": {
707 | "ExecuteTime": {
708 | "end_time": "2018-11-27T15:12:14.425262Z",
709 | "start_time": "2018-11-27T15:12:14.404548Z"
710 | }
711 | },
712 | "outputs": [],
713 | "source": [
714 | "auc = results['test_auc']\n",
715 | "recall = results['test_recall']\n",
716 | "acc = results['test_acc']\n",
717 | "\n",
718 | "print(f'Area under RoC curve: {auc.mean():0.04f} ± {auc.std():0.04f}')\n",
719 | "print(f'Accuracy: {acc.mean():0.04f} ± {acc.std():0.04f}')\n",
720 | "print(f'Recall: {recall.mean():0.04f} ± {recall.std():0.04f}')"
721 | ]
722 | },
723 | {
724 | "cell_type": "markdown",
725 | "metadata": {},
726 | "source": [
727 | "## 6. Physics\n",
728 | "\n",
729 | "Now we could test our model and our hypothesis on real observed data. This part of the analysis is the most time \n",
730 | "consuming in general. It also requires more data than than this notebook can handle. \n",
731 | "After careful analysis one can produce an image of the gamma-ray sky\n",
732 | "\n",
733 | "(figure: image of the gamma-ray sky)"
734 | ]
735 | },
736 | {
737 | "cell_type": "markdown",
738 | "metadata": {
739 | "collapsed": true,
740 | "jupyter": {
741 | "outputs_hidden": true
742 | }
743 | },
744 | "source": [
745 | "## Improving Classification\n",
746 | "\n",
747 | "\n",
748 | "### Boosting and AdaBoost\n",
749 | "\n",
750 | "Similar to the idea of combining many classifiers through bagging (like we did for the RandomForests) we now \n",
751 | "train many estimators in a sequential manner. In each iteration the data gets modified slightly using weights $w$\n",
752 | "for each sample in the training data. In the first iteration the weights are all set to $w=1$.\n",
753 | "\n",
754 | "In each successive iteration the weights are updated: samples that were incorrectly classified in the previous \n",
755 | "iteration get a higher weight, while the weights of correctly classified samples are decreased. \n",
756 | "In other words: We increase the influence/importance of samples that are difficult to classify.\n",
757 | "\n",
758 | "Predictions are performed by taking a weighted average of the single predictors.\n",
759 | "\n",
760 | "The popular AdaBoost algorithm takes this a step further by optimizing the weight of each individual classifier \n",
761 | "in the ensemble.\n",
762 | "The AdaBoost ensemble combines many learners in an iterative way. The learner at iteration $m$ is:\n",
763 | "\n",
764 | "$$\n",
765 | " F_{m}(x)=F_{m-1}(x)+\\gamma _{m}h_{m}(x)\n",
766 | "$$\n",
767 | "\n",
768 | "The choice of $F_0$ is problem specific.\n",
769 | "\n",
770 | "Each weak learner produces a prediction $h_m(x_{i})$ for each sample in the training set. At each iteration $m$ a \n",
771 | "weak learner is fitted and assigned a coefficient $\\gamma_{m}$ which is found by minimizing:\n",
772 | "\n",
773 | "$$\n",
774 | "\\gamma_m = {\\underset {\\gamma }{\\arg \\min }} \\sum_{i}^{N}E\\bigl(F_{m-1}(x_{i})+\\gamma h(x_{i})\\bigr)\n",
775 | "$$\n",
776 | "\n",
777 | "where $E(F)$ is some error function and $x_i$ is the reweighted data sample.\n",
778 | "\n",
779 | "In general this method can work with any type of classifier. Traditionally it is used with very small \n",
780 | "decision trees. \n",
781 | "The sample weights enter the selection of the split points when optimizing the splitting criterion (here: maximizing the information gain) in each node:\n",
782 | "\n",
783 | "$$\n",
784 | " \\underset{(X, s) \\in \\, \\mathbf{X} \\times {S}}{\\arg \\max} IG(X,Y) = \\underset{(X, s) \\in \\, \\mathbf{X} \\times {S}}{\\arg \\max} ( H(Y) - H(Y |\\, X) ).\n",
785 | "$$\n",
786 | "\n",
787 | "Below we try AdaBoost on the CTA data.\n"
788 | ]
789 | },
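790 | {
791 | "cell_type": "markdown",
792 | "metadata": {},
793 | "source": [
794 | "To make the reweighting idea more concrete, here is a minimal hand-rolled sketch (illustration only): it assumes the labels are encoded as 0/1 (as suggested by the use of `roc_auc_score` above), uses decision stumps and a simplified discrete-AdaBoost weight update, and is not exactly the algorithm scikit-learn implements. The scikit-learn `AdaBoostClassifier` is used right after.\n"
795 | ]
796 | },
797 | {
798 | "cell_type": "code",
799 | "execution_count": null,
800 | "metadata": {},
801 | "outputs": [],
802 | "source": [
803 | "# Minimal AdaBoost sketch (illustration only, assumes binary labels 0/1)\n",
804 | "from sklearn.tree import DecisionTreeClassifier\n",
805 | "from sklearn.metrics import accuracy_score\n",
806 | "\n",
807 | "n_rounds = 20\n",
808 | "sample_weights = np.full(len(X_train), 1 / len(X_train))\n",
809 | "y_signed = np.where(y_train == 1, 1, -1)\n",
810 | "\n",
811 | "stumps, alphas = [], []\n",
812 | "for _ in range(n_rounds):\n",
813 | "    stump = DecisionTreeClassifier(max_depth=1)\n",
814 | "    stump.fit(X_train, y_train, sample_weight=sample_weights)\n",
815 | "\n",
816 | "    pred = np.where(stump.predict(X_train) == 1, 1, -1)\n",
817 | "    err = sample_weights[pred != y_signed].sum() / sample_weights.sum()\n",
818 | "    alpha = 0.5 * np.log((1 - err) / max(err, 1e-12))\n",
819 | "\n",
820 | "    # misclassified samples get a larger weight in the next round\n",
821 | "    sample_weights *= np.exp(-alpha * y_signed * pred)\n",
822 | "    sample_weights /= sample_weights.sum()\n",
823 | "\n",
824 | "    stumps.append(stump)\n",
825 | "    alphas.append(alpha)\n",
826 | "\n",
827 | "# prediction: weighted majority vote of all stumps\n",
828 | "score = sum(\n",
829 | "    alpha * np.where(stump.predict(X_test) == 1, 1, -1)\n",
830 | "    for alpha, stump in zip(alphas, stumps)\n",
831 | ")\n",
832 | "accuracy_score(y_test, (score > 0).astype(int))"
833 | ]
834 | },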
790 | {
791 | "cell_type": "code",
792 | "execution_count": null,
793 | "metadata": {
794 | "ExecuteTime": {
795 | "end_time": "2018-11-27T15:49:09.247110Z",
796 | "start_time": "2018-11-27T15:48:58.872812Z"
797 | }
798 | },
799 | "outputs": [],
800 | "source": [
801 | "from sklearn.ensemble import AdaBoostClassifier\n",
802 | "\n",
803 | "ada = AdaBoostClassifier(\n",
804 | " estimator=DecisionTreeClassifier(max_depth=2),\n",
805 | " n_estimators=100,\n",
806 | " learning_rate=0.5,\n",
807 | ")\n",
808 | "ada.fit(X_train, y_train)\n",
809 | "\n",
810 | "y_prediction = ada.predict(X_test)\n",
811 | "y_prediction_proba = ada.predict_proba(X_test)"
812 | ]
813 | },
814 | {
815 | "cell_type": "code",
816 | "execution_count": null,
817 | "metadata": {
818 | "ExecuteTime": {
819 | "end_time": "2018-11-27T15:49:10.072785Z",
820 | "start_time": "2018-11-27T15:49:09.249195Z"
821 | }
822 | },
823 | "outputs": [],
824 | "source": [
825 | "scores = np.array(list(ada.staged_score(X_test, y_test)))\n",
826 | "\n",
827 | "plt.figure()\n",
828 | "plt.plot(scores, '.')\n",
829 | "plt.ylabel('Accuracy')\n",
830 | "plt.xlabel('Iteration')\n",
831 | "None"
832 | ]
833 | },
834 | {
835 | "cell_type": "code",
836 | "execution_count": null,
837 | "metadata": {
838 | "ExecuteTime": {
839 | "end_time": "2018-11-27T15:49:10.727863Z",
840 | "start_time": "2018-11-27T15:49:10.075262Z"
841 | }
842 | },
843 | "outputs": [],
844 | "source": [
845 | "acc = accuracy_score(y_test, y_prediction)\n",
846 | "auc = roc_auc_score(y_test, y_prediction_proba[:, 1])\n",
847 | "fpr, tpr, thresholds = roc_curve(y_test, y_prediction_proba[:, 1])\n",
848 | "\n",
849 | "plot_roc(fpr, tpr, thresholds)"
850 | ]
851 | },
852 | {
853 | "cell_type": "markdown",
854 | "metadata": {},
855 | "source": [
856 | "### Gradient Boosting \n",
857 | "\n",
858 | "Gradient boosting is very similar to AdaBoost, but instead of reweighting the samples, each new estimator is trained on a modified target: the residuals (negative gradients of the loss) of the current ensemble.\n",
859 | "\n",
860 | "The general problem can be formulated as follows (see Wikipedia):\n",
861 | "\n",
862 | "Start with a constant function $F_{0}(x)$ and some differentiable loss function $L$, then incrementally expand the model in a greedy fashion:\n",
863 | "\n",
864 | "$$\n",
865 | "F_{0}(x)={\\underset {\\gamma }{\\arg \\min }}{\\sum _{i=1}^{n}{L(y_{i},\\gamma )}}\n",
866 | "$$\n",
867 | "\n",
868 | "$$\n",
869 | "F_{m}(x)=F_{m-1}(x)+{\\underset {h_{m}\\in {\\mathcal {H}}}{\\operatorname {arg\\,min} }}\\left[{\\sum _{i=1}^{n}{L(y_{i},F_{m-1}(x_{i})+h_{m}(x_{i}))}}\\right]\n",
870 | "$$\n",
871 | "\n",
872 | "Finding the best $ h_{m}\\in {\\mathcal {H}}$ is computationally infeasible in general.\n",
873 | "If we could find the perfect $h$ however, we know that \n",
874 | "\n",
875 | "$$\n",
876 | "F_{m+1}(x_i)=F_{m}(x_i)+h(x_i)=y_i\n",
877 | "$$\n",
878 | "\n",
879 | "or, equivalently, \n",
880 | "\n",
881 | "$$\n",
882 | " h(x_i)= y_i - F_{m}(x_i)\n",
883 | "$$\n",
884 | "\n",
885 | "Note that for the mean squared error loss $\\frac{1}{2}(y_i - F(x_i))^2$ this is equivalent to the negative \n",
886 | "gradient with respect to $F(x_i)$.\n",
887 | "\n",
888 | "For a general loss function we fit $h_{m}(x)$ to the residuals, or negative gradients \n",
889 | "$$\n",
890 | " r_{i, m}=-\\left[{\\frac {\\partial L(y_{i},F(x_{i}))}{\\partial F(x_{i})}}\\right]_{F(x)=F_{m-1}(x)}\\quad {\\mbox{for }}i=1,\\ldots ,n.\n",
891 | "$$\n",
892 | "\n",
893 | "\n",
894 | "\n",
895 | "Below we try it on CTA data again.\n"
896 | ]
897 | },
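898 | {
899 | "cell_type": "markdown",
900 | "metadata": {},
901 | "source": [
902 | "To illustrate the idea of fitting each new estimator to the negative gradients, here is a minimal hand-rolled sketch (illustration only): it uses the binary log-loss, $F(x)$ is the log-odds, and each small regression tree is fitted to the pseudo-residuals $y - \\sigma(F(x))$. Real implementations such as scikit-learn's additionally optimize the leaf values, which we skip here.\n"
903 | ]
904 | },
905 | {
906 | "cell_type": "code",
907 | "execution_count": null,
908 | "metadata": {},
909 | "outputs": [],
910 | "source": [
911 | "# Minimal gradient boosting sketch (illustration only, assumes binary labels 0/1)\n",
912 | "from sklearn.tree import DecisionTreeRegressor\n",
913 | "\n",
914 | "\n",
915 | "def sigmoid(z):\n",
916 | "    return 1 / (1 + np.exp(-z))\n",
917 | "\n",
918 | "\n",
919 | "learning_rate = 0.3\n",
920 | "n_rounds = 50\n",
921 | "\n",
922 | "y01 = (y_train == 1).astype(float)\n",
923 | "F_train = np.zeros(len(X_train))  # F_0 = 0, i.e. a predicted probability of 0.5\n",
924 | "F_test = np.zeros(len(X_test))\n",
925 | "\n",
926 | "for _ in range(n_rounds):\n",
927 | "    # negative gradient of the log-loss with respect to F\n",
928 | "    residuals = y01 - sigmoid(F_train)\n",
929 | "    tree = DecisionTreeRegressor(max_depth=2)\n",
930 | "    tree.fit(X_train, residuals)\n",
931 | "    F_train += learning_rate * tree.predict(X_train)\n",
932 | "    F_test += learning_rate * tree.predict(X_test)\n",
933 | "\n",
934 | "accuracy_score(y_test, (sigmoid(F_test) > 0.5).astype(int))"
935 | ]
936 | },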
898 | {
899 | "cell_type": "code",
900 | "execution_count": null,
901 | "metadata": {
902 | "ExecuteTime": {
903 | "end_time": "2018-11-27T16:56:38.715276Z",
904 | "start_time": "2018-11-27T16:56:29.657159Z"
905 | }
906 | },
907 | "outputs": [],
908 | "source": [
909 | "from sklearn.ensemble import GradientBoostingClassifier\n",
910 | "\n",
911 | "grb = GradientBoostingClassifier(\n",
912 | " verbose=True,\n",
913 | " n_estimators=300,\n",
914 | ")\n",
915 | "grb.fit(X_train, y_train)\n",
916 | "\n",
917 | "y_prediction = grb.predict(X_test)\n",
918 | "y_prediction_proba = grb.predict_proba(X_test)"
919 | ]
920 | },
921 | {
922 | "cell_type": "code",
923 | "execution_count": null,
924 | "metadata": {
925 | "ExecuteTime": {
926 | "end_time": "2018-11-27T16:56:39.343691Z",
927 | "start_time": "2018-11-27T16:56:38.718176Z"
928 | }
929 | },
930 | "outputs": [],
931 | "source": [
932 | "accuracies = [accuracy_score(y_test, y_pred) for y_pred in grb.staged_predict(X_test)]\n",
933 | "\n",
934 | "plt.figure()\n",
935 | "plt.plot(accuracies, '.')\n",
936 | "plt.ylabel('Accuracy')\n",
937 | "plt.xlabel('Iteration')\n",
938 | "None"
939 | ]
940 | },
941 | {
942 | "cell_type": "code",
943 | "execution_count": null,
944 | "metadata": {
945 | "ExecuteTime": {
946 | "end_time": "2018-11-27T16:56:40.215880Z",
947 | "start_time": "2018-11-27T16:56:39.346460Z"
948 | }
949 | },
950 | "outputs": [],
951 | "source": [
952 | "acc = accuracy_score(y_test, y_prediction)\n",
953 | "auc = roc_auc_score(y_test, y_prediction_proba[:, 1])\n",
954 | "fpr, tpr, thresholds = roc_curve(y_test, y_prediction_proba[:, 1])\n",
955 | "\n",
956 | "plot_roc(fpr, tpr, thresholds)\n",
957 | "\n",
958 | "plt.text(0.5, 0.5, f'AuC ROC: {auc:0.03f} \\nAccuracy: {acc:0.03f}')\n",
959 | "None"
960 | ]
961 | },
962 | {
963 | "cell_type": "markdown",
964 | "metadata": {},
965 | "source": [
966 | "More on gradient descent algorithms can be found in the Neural Network lecture.\n",
967 | "\n",
968 | "Let's now test our all-time favorite classifier."
969 | ]
970 | },
971 | {
972 | "cell_type": "code",
973 | "execution_count": null,
974 | "metadata": {
975 | "ExecuteTime": {
976 | "end_time": "2018-11-27T16:58:57.873659Z",
977 | "start_time": "2018-11-27T16:58:45.510336Z"
978 | }
979 | },
980 | "outputs": [],
981 | "source": [
982 | "from sklearn.ensemble import RandomForestClassifier\n",
983 | "\n",
984 | "rf = RandomForestClassifier(n_estimators=150, max_depth=18, criterion='entropy')\n",
985 | "rf.fit(X_train, y_train)\n",
986 | "\n",
987 | "y_prediction = rf.predict(X_test)\n",
988 | "y_prediction_proba = rf.predict_proba(X_test)"
989 | ]
990 | },
991 | {
992 | "cell_type": "code",
993 | "execution_count": null,
994 | "metadata": {
995 | "ExecuteTime": {
996 | "end_time": "2018-11-27T16:58:58.442111Z",
997 | "start_time": "2018-11-27T16:58:57.875736Z"
998 | }
999 | },
1000 | "outputs": [],
1001 | "source": [
1002 | "acc = accuracy_score(y_test, y_prediction)\n",
1003 | "auc = roc_auc_score(y_test, y_prediction_proba[:, 1])\n",
1004 | "fpr, tpr, thresholds = roc_curve(y_test, y_prediction_proba[:, 1])\n",
1005 | "\n",
1006 | "plot_roc(fpr, tpr, thresholds)\n",
1007 | "plt.text(0.5, 0.5, f'AuC ROC: {auc:0.03f} \\nAccuracy: {acc:0.03f}')\n",
1008 | "None"
1009 | ]
1010 | }
1011 | ],
1012 | "metadata": {
1013 | "kernelspec": {
1014 | "display_name": "Python 3 (ipykernel)",
1015 | "language": "python",
1016 | "name": "python3"
1017 | },
1018 | "language_info": {
1019 | "codemirror_mode": {
1020 | "name": "ipython",
1021 | "version": 3
1022 | },
1023 | "file_extension": ".py",
1024 | "mimetype": "text/x-python",
1025 | "name": "python",
1026 | "nbconvert_exporter": "python",
1027 | "pygments_lexer": "ipython3",
1028 | "version": "3.12.3"
1029 | }
1030 | },
1031 | "nbformat": 4,
1032 | "nbformat_minor": 4
1033 | }
1034 |
--------------------------------------------------------------------------------
/smd_handson_fitting.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# SMD Hands-On: Estimators / Fitting"
8 | ]
9 | },
10 | {
11 | "cell_type": "code",
12 | "execution_count": null,
13 | "metadata": {},
14 | "outputs": [],
15 | "source": [
16 | "import pandas as pd\n",
17 | "import matplotlib.pyplot as plt\n",
18 | "import numpy as np"
19 | ]
20 | },
21 | {
22 | "cell_type": "code",
23 | "execution_count": null,
24 | "metadata": {},
25 | "outputs": [],
26 | "source": [
27 | "%matplotlib widget\n",
28 | "\n",
29 | "from IPython.display import display, HTML\n",
30 | "display(HTML(\"\"))\n",
31 | "\n",
32 | "plt.rcParams['figure.figsize'] = (7.5, 5)\n",
33 | "plt.rcParams['figure.constrained_layout.use'] = True"
34 | ]
35 | },
36 | {
37 | "cell_type": "code",
38 | "execution_count": null,
39 | "metadata": {},
40 | "outputs": [],
41 | "source": [
42 | "def cov_to_corr(cov):\n",
43 | " '''Convert covariance to correlation matrix\n",
44 | " \n",
45 | " Taken from: https://math.stackexchange.com/a/300775/892886\n",
46 | " '''\n",
47 | " D = np.diag(1 / np.sqrt(np.diag(cov)))\n",
48 | " return D @ cov @ D"
49 | ]
50 | },
51 | {
52 | "cell_type": "markdown",
53 | "metadata": {},
54 | "source": [
55 | "# Least Squares\n",
56 | "\n",
57 | "## Analytic solution for linear combination of functions\n",
58 | "\n",
59 | "(Analog to the exercise on the last sheet)\n",
60 | "\n",
61 | "\n",
62 | "Here, we will fit a function of the form\n",
63 | "\n",
64 | "$$\n",
65 | "f(x) = p_0 + p_1 \\cdot \\sin(x) + p_2 \\cdot \\cos(x)\n",
66 | "$$\n",
67 | "\n",
68 | "to our data.\n",
69 | "\n",
70 | "This function is a linear combination of basis functions:\n",
71 | "$$\n",
72 | "f(x) = \\sum_{i=0}^2 p_i f_i(x)\n",
73 | "$$\n",
74 | "\n",
75 | "with\n",
76 | "\n",
77 | "$$\n",
78 | "f_0(x) = 1, \\quad f_1(x) = \\sin(x), \\quad f_2(x) = \\cos(x)\n",
79 | "$$\n",
80 | "\n",
81 | "In this case, we can use the analytic solution to the least squares optimization problem."
82 | ]
83 | },
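84 | {
85 | "cell_type": "markdown",
86 | "metadata": {},
87 | "source": [
88 | "For reference, the standard weighted least squares result that the functions below implement is\n",
89 | "\n",
90 | "$$\n",
91 | "\\hat{\\boldsymbol{p}} = (\\boldsymbol{A}^T \\boldsymbol{W} \\boldsymbol{A})^{-1} \\boldsymbol{A}^T \\boldsymbol{W} \\boldsymbol{y}\\,, \\qquad \\mathrm{Cov}(\\hat{\\boldsymbol{p}}) = (\\boldsymbol{A}^T \\boldsymbol{W} \\boldsymbol{A})^{-1}\n",
92 | "$$\n",
93 | "\n",
94 | "with the design matrix $A_{ij} = f_j(x_i)$ and the weight matrix $\\boldsymbol{W} = \\mathrm{Cov}^{-1}(\\boldsymbol{y})$, both defined below.\n"
95 | ]
96 | },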
84 | {
85 | "cell_type": "code",
86 | "execution_count": null,
87 | "metadata": {},
88 | "outputs": [],
89 | "source": [
90 | "def linear_combination(x, funcs, parameters):\n",
91 | " '''Evaluate a linear combination of basis functions\n",
92 | " \n",
93 | " Parameters\n",
94 | " ----------\n",
95 | " x: number or np.ndarray\n",
96 | " The point or points at which to evaluate\n",
97 | " funcs: iterable of callables\n",
98 | " The basis functions\n",
99 | " parameters: iterable of numbers\n",
100 | " The coefficients\n",
101 | " ''' \n",
102 | " return np.sum([p * f(x) for p, f in zip(parameters, funcs, strict=True)], axis=0)"
103 | ]
104 | },
105 | {
106 | "cell_type": "code",
107 | "execution_count": null,
108 | "metadata": {},
109 | "outputs": [],
110 | "source": [
111 | "# define our linear model\n",
112 | "funcs = [np.ones_like, np.sin, np.cos]"
113 | ]
114 | },
115 | {
116 | "cell_type": "code",
117 | "execution_count": null,
118 | "metadata": {},
119 | "outputs": [],
120 | "source": [
121 | "# create some randomized example data points\n",
122 | "rng = np.random.default_rng(1337)\n",
123 | "\n",
124 | "N = 100\n",
125 | "true_parameters = np.array([2, 1, 0.5])\n",
126 | "x = np.linspace(0, 4 * np.pi, N)\n",
127 | "y = linear_combination(x, funcs, true_parameters)\n",
128 | "\n",
129 | "# add some noise to simulate measurement uncertainty\n",
130 | "y_unc = rng.uniform(0.1, 0.4, N)\n",
131 | "y += rng.normal(0, y_unc)\n",
132 | "\n",
133 | "plt.figure()\n",
134 | "plt.errorbar(x, y, yerr=y_unc, ls='', marker='.', color='k')\n",
135 | "plt.xlabel('x')\n",
136 | "plt.ylabel('y');"
137 | ]
138 | },
139 | {
140 | "cell_type": "markdown",
141 | "metadata": {},
142 | "source": [
143 | "Create the design matrix $\\boldsymbol{A}$"
144 | ]
145 | },
146 | {
147 | "cell_type": "code",
148 | "execution_count": null,
149 | "metadata": {},
150 | "outputs": [],
151 | "source": [
152 | "def design_matrix(funcs, x):\n",
153 | " '''Create the design matrix for a linear least squares problem'''\n",
154 | " return np.column_stack([f(x) for f in funcs])"
155 | ]
156 | },
157 | {
158 | "cell_type": "code",
159 | "execution_count": null,
160 | "metadata": {},
161 | "outputs": [],
162 | "source": [
163 | "A = design_matrix(funcs, x)\n",
164 | "A[:5]"
165 | ]
166 | },
167 | {
168 | "cell_type": "markdown",
169 | "metadata": {},
170 | "source": [
171 | "Define the weight matrix $\\boldsymbol{W} = \\mathrm{Cov}^{-1}(\\boldsymbol{y})$ of the measurements.\n",
172 | "\n",
173 | "Here we assume that all measured points do not have an uncertainty in $x$ and that the $y$ values are statistically independent (no off-diagonal entries in $\\mathrm{Cov}(\\boldsymbol{y})$).\n",
174 | "\n",
175 | "This is a very strong assumption.\n",
176 | "\n",
177 | "The linear least squares method yields biased results if the covariance is estimated from the data points themselves. \n",
178 | "\n",
179 | "More on that further down in the muon example."
180 | ]
181 | },
182 | {
183 | "cell_type": "code",
184 | "execution_count": null,
185 | "metadata": {},
186 | "outputs": [],
187 | "source": [
188 | "# All measurements have a known uncertainty and no correlations\n",
189 | "cov_y = np.diag(y_unc**2)\n",
190 | " \n",
191 | "W = np.linalg.inv(cov_y)\n",
192 | "\n",
193 | "fig, ax = plt.subplots()\n",
194 | "plot = ax.matshow(W)\n",
195 | "fig.colorbar(plot);"
196 | ]
197 | },
198 | {
199 | "cell_type": "code",
200 | "execution_count": null,
201 | "metadata": {},
202 | "outputs": [],
203 | "source": [
204 | "W[:5, :5]"
205 | ]
206 | },
207 | {
208 | "cell_type": "markdown",
209 | "metadata": {},
210 | "source": [
211 | "Solve the linear least squares problem"
212 | ]
213 | },
214 | {
215 | "cell_type": "code",
216 | "execution_count": null,
217 | "metadata": {},
218 | "outputs": [],
219 | "source": [
220 | "def solve_linear_least_squares(A, W, y):\n",
221 | " \"\"\"Solve the linear least squares problem\n",
222 | " \n",
223 | " Parameters\n",
224 | " ----------\n",
225 | " A : np.ndarray\n",
226 | " Design matrix\n",
227 | " W : np.ndarray\n",
228 | " Weight matrix\n",
229 | " y : np.ndarray\n",
230 | " Vector of y values\n",
231 | " \"\"\"\n",
232 | " cov = np.linalg.inv(A.T @ W @ A)\n",
233 | " parameters = cov @ A.T @ W @ y\n",
234 | " return parameters, cov\n",
235 | "\n",
236 | "\n",
237 | "parameters, cov = solve_linear_least_squares(A, W, y)\n",
238 | "parameters, cov"
239 | ]
240 | },
241 | {
242 | "cell_type": "markdown",
243 | "metadata": {},
244 | "source": [
245 | "Calculate the $\\chi^2$ over the number of degrees of freedom:"
246 | ]
247 | },
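248 | {
249 | "cell_type": "markdown",
250 | "metadata": {},
251 | "source": [
252 | "Explicitly (this is the standard definition, matching the code below):\n",
253 | "\n",
254 | "$$\n",
255 | "\\chi^2 = (\\boldsymbol{y} - \\boldsymbol{A}\\hat{\\boldsymbol{p}})^T \\, \\boldsymbol{W} \\, (\\boldsymbol{y} - \\boldsymbol{A}\\hat{\\boldsymbol{p}})\\,, \\qquad \\mathrm{ndf} = N - k\n",
256 | "$$\n",
257 | "\n",
258 | "with $N$ data points and $k$ fitted parameters. Values of $\\chi^2 / \\mathrm{ndf}$ close to 1 indicate a fit that is consistent with the stated uncertainties.\n"
259 | ]
260 | },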
248 | {
249 | "cell_type": "code",
250 | "execution_count": null,
251 | "metadata": {},
252 | "outputs": [],
253 | "source": [
254 | "def chisquare_over_ndf(y, A, W, parameters):\n",
255 | "    residuals = (y - A @ parameters)\n",
256 | "    sum_residuals = (residuals.T @ W @ residuals)\n",
257 | "\n",
258 | " ndf = len(y) - len(parameters)\n",
259 | " return sum_residuals / ndf\n",
260 | "\n",
261 | "\n",
262 | "chisquare_over_ndf(y, A, W, parameters), chisquare_over_ndf(y, A, W, true_parameters)"
263 | ]
264 | },
265 | {
266 | "cell_type": "markdown",
267 | "metadata": {},
268 | "source": [
269 | "putting it all together:"
270 | ]
271 | },
272 | {
273 | "cell_type": "code",
274 | "execution_count": null,
275 | "metadata": {},
276 | "outputs": [],
277 | "source": [
278 | "def linear_least_squares(x, y, funcs, cov_y=None):\n",
279 | " \"\"\"\n",
280 | " Perform a linear least squares fit.\n",
281 | " \n",
282 | " Parameters\n",
283 | " ----------\n",
284 | " x : np.ndarray[ndim=1]\n",
285 | " Vector of x values\n",
286 | " y : np.ndarray[ndim=1]\n",
287 | " Vector of y values\n",
288 | " funcs : Sequence[Callable]\n",
289 | " The basis functions\n",
290 | " cov_y : Optional[np.ndarray[ndim=2]]\n",
291 | " The covariance matrix of the y values\n",
292 | " \n",
293 | " Returns :\n",
294 | " parameters : np.ndarray[ndim=1]\n",
295 | " The estimated parameters\n",
296 | " cov : np.ndarray[ndim=2]\n",
297 | " Covariance matrix of the parameters\n",
298 | " chisq_ndf : float\n",
299 | " The chisquare over the number of degrees of freedom\n",
300 | " of the fit result.\n",
301 | " \"\"\"\n",
302 | " A = design_matrix(funcs, x)\n",
303 | " \n",
304 | " if cov_y is not None:\n",
305 | " W = np.linalg.inv(cov_y)\n",
306 | " else:\n",
307 | " W = np.eye(len(y))\n",
308 | " \n",
309 | " parameters, cov = solve_linear_least_squares(A, W, y)\n",
310 | " \n",
311 | "    chisquare_ndf = chisquare_over_ndf(y, A, W, parameters)\n",
312 | " \n",
313 | " return parameters, cov, chisquare_ndf"
314 | ]
315 | },
316 | {
317 | "cell_type": "markdown",
318 | "metadata": {},
319 | "source": [
320 | "Result:"
321 | ]
322 | },
323 | {
324 | "cell_type": "code",
325 | "execution_count": null,
326 | "metadata": {},
327 | "outputs": [],
328 | "source": [
329 | "parameters, cov, chisquare_ndf = linear_least_squares(x, y, funcs, cov_y)"
330 | ]
331 | },
332 | {
333 | "cell_type": "code",
334 | "execution_count": null,
335 | "metadata": {},
336 | "outputs": [],
337 | "source": [
338 | "x_fit = np.linspace(0, 4 * np.pi, 1000)\n",
339 | "y_fit = linear_combination(x_fit, funcs, parameters)\n",
340 | "y_truth = linear_combination(x_fit, funcs, true_parameters)\n",
341 | "\n",
342 | "\n",
343 | "fig, ax = plt.subplots()\n",
344 | "\n",
345 | "ax.errorbar(x, y, yerr=y_unc, ls='', marker='.', label='Data', color='k')\n",
346 | "\n",
347 | "ax.plot(x_fit, y_fit, label='Fit-Result')\n",
348 | "ax.plot(x_fit, y_truth, label='Truth')\n",
349 | "\n",
350 | "ax.set(\n",
351 | " title=(\n",
352 | " rf'$f(x) = {parameters[0]:.2f} + {parameters[1]:.2f} \\cdot \\sin(x) + {parameters[2]:.2f} \\cdot \\cos(x)'\n",
353 | " rf', \\quad \\chi^2_\\mathrm{{ndf}} = {chisquare_ndf:.2f}$'\n",
354 | " ),\n",
355 | " xlabel='x',\n",
356 | " ylabel='y',\n",
357 | ")\n",
358 | "\n",
359 | "\n",
360 | "ax.legend();"
361 | ]
362 | },
363 | {
364 | "cell_type": "code",
365 | "execution_count": null,
366 | "metadata": {},
367 | "outputs": [],
368 | "source": [
369 | "corr = cov_to_corr(cov)\n",
370 | "\n",
371 | "fig, ax = plt.subplots()\n",
372 | "m = ax.matshow(corr, cmap='RdBu_r', vmin=-1, vmax=1)\n",
373 | "fig.colorbar(m);"
374 | ]
375 | },
376 | {
377 | "cell_type": "markdown",
378 | "metadata": {},
379 | "source": [
380 | "## Numerical solution for non-linear functions\n",
381 | "\n",
382 | "\n",
383 | "If the function is not a linear combination of basis functions, the solution\n",
384 | "can only be found numerically.\n",
385 | "\n",
386 | "From the lab courses, you probably know the `scipy.optimize.curve_fit` function, which does exactly this.\n",
387 | "\n",
388 | "By default, `curve_fit` uses a local optimizer (Levenberg-Marquardt for unconstrained problems). For well-behaved\n",
389 | "linear problems this typically converges to the analytic solution, but in general there is no guarantee of finding the global optimum.\n",
390 | "\n",
391 | "\n",
392 | "- https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html\n",
393 | "\n",
394 | "### Example application to our linear problem"
395 | ]
396 | },
397 | {
398 | "cell_type": "code",
399 | "execution_count": null,
400 | "metadata": {},
401 | "outputs": [],
402 | "source": [
403 | "from scipy.optimize import curve_fit\n",
404 | "\n",
405 | "\n",
406 | "def func(x, p1, p2, p3):\n",
407 | " return p1 + p2 * np.sin(x) + p3 * np.cos(x)\n",
408 | "\n",
409 | "\n",
410 | "# absolute_sigma prevents scaling of errors to match χ²/ndf=1\n",
411 | "parameters_numeric, cov_numeric = curve_fit(\n",
412 | "    func, x, y,\n",
413 | "    sigma=y_unc,\n",
414 | " absolute_sigma=True,\n",
415 | ")\n",
416 | "\n",
417 | "print(parameters, parameters_numeric, cov, cov_numeric, sep='\\n')"
418 | ]
419 | },
420 | {
421 | "cell_type": "code",
422 | "execution_count": null,
423 | "metadata": {},
424 | "outputs": [],
425 | "source": [
426 | "x_fit = np.linspace(0, 4 * np.pi, 1000)\n",
427 | "y_fit = func(x_fit, *parameters)\n",
428 | "y_num = func(x_fit, *parameters_numeric)\n",
429 | "y_truth = func(x_fit, *true_parameters)\n",
430 | " \n",
431 | "\n",
432 | "fig, ax = plt.subplots()\n",
433 | "\n",
434 | "ax.errorbar(x, y, yerr=y_unc, ls='', marker='.', label='Data', color='k')\n",
435 | "\n",
436 | "ax.plot(x_fit, y_fit, label='Fit-Result (Analytic)')\n",
437 | "ax.plot(x_fit, y_num, label='Fit-Result (Numeric)')\n",
438 | "ax.plot(x_fit, y_truth, label='Truth')\n",
439 | "\n",
440 | "ax.set(\n",
441 | " title=(\n",
442 | " rf'$f(x) = {parameters[0]:.2f} + {parameters[1]:.2f} \\cdot \\sin(x) + {parameters[2]:.2f} \\cdot \\cos(x)'\n",
443 | " rf', \\quad \\chi^2_\\mathrm{{ndf}} = {chisquare_ndf:.2f}$'\n",
444 | " ),\n",
445 | " xlabel='x',\n",
446 | " ylabel='y',\n",
447 | ")\n",
448 | "\n",
449 | "\n",
450 | "ax.legend();"
451 | ]
452 | },
453 | {
454 | "cell_type": "markdown",
455 | "metadata": {},
456 | "source": [
457 | "### Why not simply always choose the numerical solution then?"
458 | ]
459 | },
460 | {
461 | "cell_type": "code",
462 | "execution_count": null,
463 | "metadata": {},
464 | "outputs": [],
465 | "source": [
466 | "%%timeit\n",
467 | "linear_least_squares(x, y, funcs)"
468 | ]
469 | },
470 | {
471 | "cell_type": "code",
472 | "execution_count": null,
473 | "metadata": {},
474 | "outputs": [],
475 | "source": [
476 | "%%timeit\n",
477 | "curve_fit(func, x, y)"
478 | ]
479 | },
480 | {
481 | "cell_type": "markdown",
482 | "metadata": {},
483 | "source": [
484 | "### Example with non-linear function \n",
485 | "\n",
486 | "\n",
487 | "From the lab courses: single slit diffraction"
488 | ]
489 | },
490 | {
491 | "cell_type": "code",
492 | "execution_count": null,
493 | "metadata": {},
494 | "outputs": [],
495 | "source": [
496 | "df = pd.read_csv('resources/spalt.csv')\n",
497 | "\n",
498 | "\n",
499 | "# laser wavelength in meters (632.8 nm)\n",
500 | "LASER_WAVELENGTH = 632.8e-9\n",
501 | "\n",
502 | "def theory(phi, A0, b):\n",
503 | "    return (A0 * b * np.sinc(b * np.sin(phi) / LASER_WAVELENGTH))**2\n",
503 | "\n",
504 | "\n",
505 | "# first try with default initial guess (1 for every parameter)\n",
506 | "p0 = None\n",
507 | "\n",
508 | "# now with an \"educated guess\" based on the data and knowledge of the\n",
509 | "# order of magnitude of the slit size\n",
510 | "# p0 = [np.sqrt(df['I'].max()) / 1e-4, 1e-4]\n",
511 | "\n",
512 | "params, cov = curve_fit(theory, df['phi'], df['I'], p0=p0)\n",
513 | "\n",
514 | "\n",
515 | "x = np.linspace(-0.03, 0.03, 501)\n",
516 | "\n",
517 | "fig, ax = plt.subplots()\n",
518 | "ax.plot(x, theory(x, *params), label='Fit')\n",
519 | "\n",
520 | "ax.plot(df['phi'], df['I'], '.', label='Data')\n",
521 | "\n",
522 | "ax.set(\n",
523 | " xlabel=r'$\\varphi \\,\\, / \\,\\, \\mathrm{rad}$',\n",
524 | " ylabel=r'$I \\,\\, / \\,\\, \\mathrm{A}$',\n",
525 | ")\n",
526 | "ax.legend(loc='best');"
527 | ]
528 | },
529 | {
530 | "cell_type": "markdown",
531 | "metadata": {},
532 | "source": [
533 | "# Maximum-Likelihood Method\n",
534 | "\n",
535 | "\n",
536 | "## Unbinned Fit of Probability Densities\n",
537 | "\n",
538 | "A strongly simplified example of a CERN-like analysis.\n",
539 | "\n",
540 | "\n",
541 | "We are looking for the mass peak of a particle, modeled as a normally distributed signal.\n",
542 | "\n",
543 | "We also observe a background, which in this simplified example is assumed to be exponentially distributed.\n",
544 | "\n",
545 | "We create a simple \"Toy\"-Dataset using Monte Carlo methods:"
546 | ]
547 | },
548 | {
549 | "cell_type": "code",
550 | "execution_count": null,
551 | "metadata": {},
552 | "outputs": [],
553 | "source": [
554 | "rng = np.random.default_rng(42)\n",
555 | "\n",
556 | "E_MIN = 75\n",
557 | "E_MAX = 175\n",
558 | "\n",
559 | "# normally distributed signal\n",
560 | "higgs_signal = rng.normal(126, 5, 500)\n",
561 | "\n",
562 | "# exponentially distributed background\n",
563 | "background = rng.exponential(50, size=20000)\n",
564 | "\n",
565 | "# combine signal and background\n",
566 | "measured = np.append(higgs_signal, background)\n",
567 | "\n",
568 | "# remove events outside of \"detector range\"\n",
569 | "in_range = (E_MIN <= measured) & (measured <= E_MAX)\n",
570 | "measured = measured[in_range]\n",
571 | "\n",
572 | "\n",
573 | "fig, ax = plt.subplots()\n",
574 | "ax.hist(measured, bins=100)\n",
575 | "ax.set_xlabel(r'$m \\,/\\, \\mathrm{GeV}$')\n",
576 | "ax.margins(x=0)\n",
577 | "None"
578 | ]
579 | },
580 | {
581 | "cell_type": "markdown",
582 | "metadata": {},
583 | "source": [
584 | "### Definition of the negative Log-Likelihood\n",
585 | "\n",
586 | "We create a superposition of two probability densities, with proportions $p$ and $1 - p$, respectively.\n",
587 | "\n",
588 | "\n",
589 | "We also need to normalize the densities to the observed interval.\n",
590 | "\n",
591 | "In this special case, we ignore this for the normal distribution, assuming that it is fully contained in the measurement interval.\n",
592 | "\n",
593 | "So:\n",
594 | "\n",
595 | "\\begin{align}\n",
596 | "P_1 &= N(\\mu, \\sigma) \\\\[1ex]\n",
597 | "P_2 &= \\frac{1}{\\exp(-E_\\mathrm{min} / \\tau) - \\exp(-E_\\mathrm{max} / \\tau)} \\exp(- E / \\tau) \\\\[1ex]\n",
598 | "P(E | p, \\mu, \\sigma, \\tau) &= p \\cdot P_1(E | \\mu, \\sigma) + (1 - p) P_2(E | \\tau) \\\\[1ex]\n",
599 | "\\mathcal{L}(p, \\mu, \\sigma, \\tau) &= \\prod_i P(E_i | p, \\mu, \\sigma, \\tau) \\\\[1ex]\n",
600 | "-\\log\\mathcal{L}(p, \\mu, \\sigma, \\tau) &= -\\sum_i \\log(P(E_i | p, \\mu, \\sigma, \\tau))\n",
601 | "\\end{align}\n",
602 | "\n",
603 | "In code, using the distribution classes from `scipy.stats`, it looks like this:"
604 | ]
605 | },
606 | {
607 | "cell_type": "code",
608 | "execution_count": null,
609 | "metadata": {},
610 | "outputs": [],
611 | "source": [
612 | "from scipy.stats import norm, expon\n",
613 | "\n",
614 | "\n",
615 | "def pdf(x, mean, std, tau, p):\n",
616 | " mass_peak = p * norm.pdf(x, mean, std)\n",
617 | " \n",
618 | " expon_norm = np.exp(-E_MIN / tau) - np.exp(-E_MAX / tau)\n",
619 | " background = (1 - p) / expon_norm * expon.pdf(x, scale=tau)\n",
620 | " \n",
621 | " return mass_peak + background"
622 | ]
623 | },
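624 | {
625 | "cell_type": "markdown",
626 | "metadata": {},
627 | "source": [
628 | "As a quick sanity check (added here for illustration), we can integrate the mixture density numerically over the detector range; for example parameter values it should come out close to 1:\n"
629 | ]
630 | },
631 | {
632 | "cell_type": "code",
633 | "execution_count": null,
634 | "metadata": {},
635 | "outputs": [],
636 | "source": [
637 | "# crude Riemann sum, just to check the normalization of the mixture pdf\n",
638 | "e_grid = np.linspace(E_MIN, E_MAX, 10_000)\n",
639 | "np.sum(pdf(e_grid, 126, 5, 50, 0.05)) * (e_grid[1] - e_grid[0])"
640 | ]
641 | },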
624 | {
625 | "cell_type": "markdown",
626 | "metadata": {},
627 | "source": [
628 | "### Solution using `scipy.optimize.minimize` and `numdifftools.Hessian`\n",
629 | "\n",
630 | "\n",
631 | "Using `scipy.optimize.minimize`, we can minimize arbitrary functions.\n",
632 | "\n",
633 | "\n",
634 | "The functions need to have an array of the fit parameters as first argument.\n",
635 | "\n",
636 | "Further arguments can be passed using the `args` argument of `minimize`.\n",
637 | "\n",
638 | "\n",
639 | "Naïvely, our negative Log-Likelihood for use with scipy looks like this:"
640 | ]
641 | },
642 | {
643 | "cell_type": "code",
644 | "execution_count": null,
645 | "metadata": {},
646 | "outputs": [],
647 | "source": [
648 | "def neg_log_likelihood(parameters, data):\n",
649 | " # we add an epsilon to avoid inf in case of p=0\n",
650 | " return -np.sum(np.log(pdf(data, *parameters) + 1e-30))"
651 | ]
652 | },
653 | {
654 | "cell_type": "markdown",
655 | "metadata": {},
656 | "source": [
657 | "Naïve, because analytic simplifications of the log-likelihood are often possible!\n",
658 | "\n",
659 | "Analytic simplifications benefit both numerical precision and evaluation speed.\n",
660 | "\n",
661 | "It is almost always a good idea to write down the likelihood on paper and simplify it analytically as far as possible."
662 | ]
663 | },
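664 | {
665 | "cell_type": "markdown",
666 | "metadata": {},
667 | "source": [
668 | "As a small example of such a simplification (not needed for the fit below): for a pure exponential distribution with density $\\frac{1}{\\tau} \\exp(-x / \\tau)$, the negative log-likelihood reduces to\n",
669 | "\n",
670 | "$$\n",
671 | "-\\log\\mathcal{L}(\\tau) = N \\log\\tau + \\frac{1}{\\tau}\\sum_i x_i\\,,\n",
672 | "$$\n",
673 | "\n",
674 | "which depends on the data only through $\\sum_i x_i$ and even has the analytic minimum $\\hat{\\tau} = \\frac{1}{N}\\sum_i x_i$.\n"
675 | ]
676 | },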
664 | {
665 | "cell_type": "markdown",
666 | "metadata": {},
667 | "source": [
668 | "We find a minimum (and hopefully the global one) using `scipy.optimize.minimize`. \n",
669 | "\n",
670 | "As known from the previous lecture, we can estimate the covariance matrix of the parameters from the inverse of the Hessian matrix of the negative log-likelihood,\n",
671 | "evaluated at the found minimum.\n",
672 | "\n",
673 | "\n",
674 | "The Hessian matrix needs to be determined numerically.\n",
675 | "`scipy.optimize.minimize` returns a rough estimate, but this is borderline unusable.\n",
676 | "\n",
677 | "\n",
678 | "We use `numdifftools.Hessian` to get the Hessian numerically.\n",
679 | "\n",
680 | "\n",
681 | "`scipy.optimize.minimize` can use multiple algorithms under the hood, each with its own benefits and drawbacks.\n",
682 | "For more information, see\n",
683 | " \n",
684 | "* https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.minimize.html\n",
685 | "* https://docs.scipy.org/doc/scipy/reference/optimize.minimize-lbfgsb.html"
686 | ]
687 | },
688 | {
689 | "cell_type": "code",
690 | "execution_count": null,
691 | "metadata": {
692 | "scrolled": true
693 | },
694 | "outputs": [],
695 | "source": [
696 | "from scipy.optimize import minimize\n",
697 | "\n",
698 | "# a very small number to achieve > 0 instead of >= 0\n",
699 | "eps = np.finfo(np.float64).eps\n",
700 | "\n",
701 | "result = minimize(\n",
702 | " neg_log_likelihood,\n",
703 | " args=(measured, ),\n",
704 | " x0=[130, 2, 30, 0.2], # here, the initial guess is required\n",
705 | " bounds=[\n",
706 | " (0, None), # mean >= 0\n",
707 | " (eps, None), # std > 0\n",
708 | " (eps, None), # tau > 0\n",
709 | " (0, 1), # 0 <= p <= 1\n",
710 | " ],\n",
711 | ")\n",
712 | "\n",
713 | "result"
714 | ]
715 | },
716 | {
717 | "cell_type": "code",
718 | "execution_count": null,
719 | "metadata": {},
720 | "outputs": [],
721 | "source": [
722 | "from numdifftools import Hessian\n",
723 | "\n",
724 | "hesse = Hessian(neg_log_likelihood)\n",
725 | "cov = np.linalg.inv(hesse(result.x, measured))"
726 | ]
727 | },
728 | {
729 | "cell_type": "code",
730 | "execution_count": null,
731 | "metadata": {},
732 | "outputs": [],
733 | "source": [
734 | "higgs_mass = result.x[0]\n",
735 | "higgs_mass_unc = np.sqrt(cov[0, 0])\n",
736 | "\n",
737 | "\n",
738 | "print(f'Higgs mass is {higgs_mass:.2f} ± {higgs_mass_unc:.2} GeV')"
739 | ]
740 | },
741 | {
742 | "cell_type": "code",
743 | "execution_count": null,
744 | "metadata": {},
745 | "outputs": [],
746 | "source": [
747 | "e = np.linspace(E_MIN, E_MAX, 1000)\n",
748 | "\n",
749 | "\n",
750 | "fig, ax = plt.subplots()\n",
751 | "\n",
752 | "ax.hist(measured, bins=100, density=True)\n",
753 | "ax.plot(e, pdf(e, *result.x))\n",
754 | "ax.axvline(result.x[0], color='C2', lw=2)\n",
755 | "\n",
756 | "ax.set_xlabel(r'$m \\,/\\, \\mathrm{GeV}$')\n",
757 | "ax.margins(x=0)"
758 | ]
759 | },
760 | {
761 | "cell_type": "markdown",
762 | "metadata": {},
763 | "source": [
764 | "### Solution using iminuit\n",
765 | "\n",
766 | "\n",
767 | "Iminuit provides python bindings for the `Minuit` minimization package from the `ROOT` framework.\n",
768 | "\n",
769 | "It does not require a full `ROOT` installation.\n",
770 | "\n",
771 | "\"Minuit\" is regarded – at least among particle physicists – as the non plus ultra of minimizers.\n",
772 | "\n",
773 | "\n",
774 | "`iminuit` provides the minimizers and several helper classes for loss functions.\n",
775 | "This makes it much simpler to perform fits.\n",
776 | "\n",
777 | "\n",
778 | "It can also use likelihood ratio tests via the `minos` interface to estimate parameter uncertainties.\n",
779 | "(More on that in the next lectures). \n",
780 | "\n",
781 | "\n",
782 | "\n",
783 | "\n",
784 | "For starters, let's solve the same problem as above, this time using `iminuit`:"
785 | ]
786 | },
787 | {
788 | "cell_type": "code",
789 | "execution_count": null,
790 | "metadata": {},
791 | "outputs": [],
792 | "source": [
793 | "from iminuit import Minuit\n",
794 | "from iminuit.cost import UnbinnedNLL\n",
795 | "\n",
796 | "# minuit's UnbinnedNLL takes directly the pdf and the observed data\n",
797 | "loss = UnbinnedNLL(measured, pdf)\n",
798 | "\n",
799 | "m = Minuit(loss, mean=130, std=2, tau=30, p=0.2)\n",
800 | "\n",
801 | "# set bounds\n",
802 | "m.limits['mean'] = (0, None) # >= 0\n",
803 | "m.limits['std'] = (eps, None) # > 0\n",
804 | "m.limits['tau'] = (eps, None) # > 0\n",
805 | "m.limits['p'] = (0, 1)\n",
806 | "\n",
807 | "# perform minimization\n",
808 | "m.migrad()\n",
809 | "\n",
810 | "# perform likelihood scan for confidence intervals\n",
811 | "m.minos()"
812 | ]
813 | },
814 | {
815 | "cell_type": "code",
816 | "execution_count": null,
817 | "metadata": {},
818 | "outputs": [],
819 | "source": [
820 | "higgs_mass = m.values['mean']\n",
821 | "higgs_mass_unc = m.errors['mean']\n",
822 | "\n",
823 | "print(f'Higgs mass is {higgs_mass:.2f} ± {higgs_mass_unc:.2} GeV')"
824 | ]
825 | },
826 | {
827 | "cell_type": "markdown",
828 | "metadata": {},
829 | "source": [
830 | "## Poisson-Likelihood-Fit using a binned Event Distribution\n",
831 | "\n",
832 | "If the individual observations are not accessible, or if the runtime of the analysis is critical for large amounts of data,\n",
833 | "a so-called \"binned\" fit is also possible.\n",
834 | "\n",
835 | "In a binned fit, we estimate the parameters of the distributions from a histogram.\n",
836 | "\n",
837 | "\n",
838 | "Since a histogram is a counting experiment, the event counts in the bins each follow a Poisson distribution.\n",
839 | "\n",
840 | "\n",
841 | "Together with the total number of observed events, the cumulative distribution function yields the expected count in each histogram bin,\n",
842 | "depending on the fit parameters $\\boldsymbol{\\theta}$:\n",
843 | "\n",
844 | "$$\n",
845 | "\\mathcal{L} = \\prod_{i=1}^N \\mathcal{P}(k=H_i, \\lambda=\\lambda_i(\\boldsymbol{\\theta}))\n",
846 | "$$\n",
847 | "\n",
848 | "with \n",
849 | "\n",
850 | "$$\n",
851 | "\\lambda_i = N_\\mathrm{total} \\cdot (\\mathrm{CDF}(b_i, \\boldsymbol{\\theta}) - \\mathrm{CDF}(a_i, \\boldsymbol{\\theta}))\n",
852 | "$$\n",
853 | "\n",
854 | "Here, $a_i$ and $b_i$ are the bin edges of the $i$-th bin.\n",
855 | "We thus integrate the PDF over each bin and multiply by the total number of events.\n",
856 | "\n",
857 | "\n",
858 | "### Example from \"Lifetime of cosmic Muons\" (Lab Course Experiment V01)\n",
859 | "\n",
860 | "The experimental setup of this lab course experiment directly produces a histogram of observed decay times in hardware.\n",
861 | "\n",
862 | "We thus cannot perform an unbinned fit; the individual values are simply not available.\n",
863 | "\n",
864 | "The lab course instructions ask for a least squares fit to the bin heights.\n",
865 | "\n",
866 | "This method yields an unbiased estimator, albeit with non-optimal variance, as long as an unweighted fit is performed.\n",
867 | "\n",
868 | "In the past, it was recommended to perform a weighted fit assuming $\\sigma_i = \\sqrt{H_i}$.\n",
869 | "\n",
870 | "This is wrong! This method consistently yields biased results, due to under-fluctuations being weighted more strongly than over-fluctuations of the same amount.\n",
871 | "\n",
872 | "The correct method is the binned Poisson-likelihood fit as discussed here or the iterative least squares method discussed in SMD-2.\n",
873 | "\n",
874 | "A comparison of the different methods is available here: https://gist.github.com/maxnoe/41730e6ca1fac01fc06f0feab5c3566d\n",
875 | "\n",
876 | "In the experiment, there is an additional uniformly distributed background from non-decaying, coincident muons."
877 | ]
878 | },
879 | {
880 | "cell_type": "code",
881 | "execution_count": null,
882 | "metadata": {},
883 | "outputs": [],
884 | "source": [
885 | "N = np.genfromtxt(\"resources/muon_data.txt\")\n",
886 | "t = np.arange(len(N) + 1) / 21.48\n",
887 | "\n",
888 | "t = t[5:]\n",
889 | "N = N[5:]\n",
890 | "\n",
891 | "plt.figure()\n",
892 | "plt.stairs(values=N, edges=t)\n",
893 | "plt.xlabel('t / µs')\n",
894 | "plt.ylabel('Number of Events')\n",
895 | "None"
896 | ]
897 | },
898 | {
899 | "cell_type": "markdown",
900 | "metadata": {},
901 | "source": [
902 | "`Iminuit` provides the `BinnedNLL` loss function for this use case."
903 | ]
904 | },
905 | {
906 | "cell_type": "code",
907 | "execution_count": null,
908 | "metadata": {},
909 | "outputs": [],
910 | "source": [
911 | "from iminuit.cost import BinnedNLL\n",
912 | "from scipy.stats import uniform\n",
913 | "\n",
914 | "T_MIN, T_MAX = t[0], t[-1]\n",
915 | "\n",
916 | "def cdf(x, tau, p):\n",
917 | "    # normalize the exponential to 1 in the histogram range\n",
918 | "    cdf_min, cdf_max = expon.cdf([T_MIN, T_MAX], scale=tau)\n",
919 | "    norm = 1 / (cdf_max - cdf_min)\n",
920 | "    \n",
921 | "    signal = p * (expon.cdf(x, scale=tau) - cdf_min) * norm\n",
922 | "    # uniform background on the histogram range [T_MIN, T_MAX]\n",
923 | "    background = (1 - p) * uniform.cdf(x, loc=T_MIN, scale=T_MAX - T_MIN)\n",
924 | "    # combine exponential signal with uniform background\n",
925 | "    return signal + background"
925 | "\n",
926 | "\n",
927 | "# histogram counts, histogram edges and cumulative distribution function\n",
928 | "loss = BinnedNLL(N, t, cdf)\n",
929 | "\n",
930 | "m = Minuit(loss, tau=2, p=0.99)\n",
931 | "m.limits['tau'] = (eps, None)\n",
932 | "m.limits['p'] = (0, 1)\n",
933 | "m.migrad()\n",
934 | "m.minos()"
935 | ]
936 | },
937 | {
938 | "cell_type": "code",
939 | "execution_count": null,
940 | "metadata": {},
941 | "outputs": [],
942 | "source": [
943 | "muon_lifetime = m.values['tau']\n",
944 | "muon_lifetime_unc = m.errors['tau']\n",
945 | "pdg_reference = 2.1969811\n",
946 | "pdg_reference_unc = 0.0000022\n",
947 | "\n",
948 | "print(f'Fit: τ = {muon_lifetime:.3f} ± {muon_lifetime_unc:.3f} µs')\n",
949 | "print(f'Lit: τ = {pdg_reference:.7f} ± {pdg_reference_unc:.7f} µs')"
950 | ]
951 | },
952 | {
953 | "cell_type": "markdown",
954 | "metadata": {},
955 | "source": [
956 | "Likelihood-Scan for uncertainty intervals:"
957 | ]
958 | },
959 | {
960 | "cell_type": "code",
961 | "execution_count": null,
962 | "metadata": {},
963 | "outputs": [],
964 | "source": [
965 | "tau, ts, valid = m.mnprofile('tau', size=100)"
966 | ]
967 | },
968 | {
969 | "cell_type": "code",
970 | "execution_count": null,
971 | "metadata": {},
972 | "outputs": [],
973 | "source": [
974 | "plt.figure()\n",
975 | "\n",
976 | "plt.axvline(m.values['tau'], color='C1', label=\"Fit-Result\")\n",
977 | "plt.plot(m.values['tau'], m.fval, color='C1', marker='o', zorder=3)\n",
978 | "\n",
979 | "plt.axvline(m.values['tau'] + m.merrors['tau'].lower, color='C2', label=\"Lower Error\")\n",
980 | "plt.axvline(m.values['tau'] + m.merrors['tau'].upper, color='C2', label=\"Upper Error\")\n",
981 | "plt.axhline(m.fval + 1, color='C3', label=\"NLL + 1\")\n",
982 | "plt.axvline(pdg_reference, color='k', label='PDG')\n",
983 | "plt.plot(tau, ts)\n",
984 | "\n",
985 | "plt.xlabel('τ / µs')\n",
986 | "plt.ylabel('NLL')\n",
987 | "plt.legend()\n",
988 | "None"
989 | ]
990 | },
991 | {
992 | "cell_type": "code",
993 | "execution_count": null,
994 | "metadata": {},
995 | "outputs": [],
996 | "source": [
997 | "plt.figure()\n",
998 | "m.draw_mncontour('tau', 'p', size=250);"
999 | ]
1000 | },
1001 | {
1002 | "cell_type": "code",
1003 | "execution_count": null,
1004 | "metadata": {},
1005 | "outputs": [],
1006 | "source": []
1007 | }
1008 | ],
1009 | "metadata": {
1010 | "kernelspec": {
1011 | "display_name": "Python 3 (ipykernel)",
1012 | "language": "python",
1013 | "name": "python3"
1014 | },
1015 | "language_info": {
1016 | "codemirror_mode": {
1017 | "name": "ipython",
1018 | "version": 3
1019 | },
1020 | "file_extension": ".py",
1021 | "mimetype": "text/x-python",
1022 | "name": "python",
1023 | "nbconvert_exporter": "python",
1024 | "pygments_lexer": "ipython3",
1025 | "version": "3.12.7"
1026 | }
1027 | },
1028 | "nbformat": 4,
1029 | "nbformat_minor": 4
1030 | }
1031 |
--------------------------------------------------------------------------------
/smd_neural_networks.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "code",
5 | "execution_count": null,
6 | "metadata": {
7 | "ExecuteTime": {
8 | "end_time": "2017-11-30T18:20:49.589466Z",
9 | "start_time": "2017-11-30T18:20:47.850534Z"
10 | }
11 | },
12 | "outputs": [],
13 | "source": [
14 | "from ml import plots\n",
15 | "import matplotlib.pyplot as plt\n",
16 | "import seaborn as sns\n",
17 | "import pandas as pd\n",
18 | "import numpy as np\n",
19 | "from matplotlib.colors import ListedColormap\n",
20 | "from tqdm.auto import tqdm"
21 | ]
22 | },
23 | {
24 | "cell_type": "code",
25 | "execution_count": null,
26 | "metadata": {},
27 | "outputs": [],
28 | "source": [
29 | "%matplotlib widget\n",
30 | "\n",
31 | "plots.set_plot_style()"
32 | ]
33 | },
34 | {
35 | "cell_type": "markdown",
36 | "metadata": {},
37 | "source": [
38 | "# Neural Networks and Deep Learning\n",
39 | "\n",
40 | "Most of this material was heavily inspired by http://cs231n.github.io/, Stanford's lecture on Convolutional Neural Networks for Visual Recognition.\n",
41 | "\n",
42 | "\n",
43 | "## Linear Classification II\n",
44 | "\n",
45 | "Previously we talked about classification using a linear function $f(x): \\mathbb{R}^{p} \\to \\mathbb{R}$ and least-square regression.\n",
46 | "\n",
47 | "Recall the linear function dependent on the parameter vector $\\beta$\n",
48 | "\n",
49 | "\n",
50 | "$$\n",
51 | "f(x)= \\hat{y} = \\hat{\\beta}_0 + \\sum_{j=1}^p x_j \\hat{\\beta}_j\n",
52 | "$$\n",
53 | "\n",
54 | "\n",
55 | "$$\n",
56 | "f(x)= \\hat{y} = x^T \\mathbf{\\beta}\n",
57 | "$$\n",
58 | "\n",
59 | "where $\\beta = (\\beta_0, \\beta_1, \\beta_2, \\ldots, \\beta_p)$ and $x = (1, x_1, x_2, \\ldots, x_p)$.\n",
60 | "\n",
61 | "We minimized the residual sum of squares using matrix multiplication.\n",
62 | "\n",
63 | "$$\n",
64 | "RSS(\\beta) = (\\mathbf{y} - \\mathbf{X} \\beta)^T (\\mathbf{y} - \\mathbf{X} \\beta )\n",
65 | "\\Rightarrow \\hat{\\beta} = (\\mathbf{X}^T \\mathbf{X})^{-1} \\mathbf{X}^T \\mathbf{y}\n",
66 | "$$\n",
67 | "\n",
68 | "\n",
69 | "In order to perform the classification we had to define the classification function \n",
70 | "\n",
71 | "$$\n",
72 | "\\hat{y} = \\begin{cases}\n",
73 | "\\text{Yes}, & \\text{if $ f(x) \\gt 0.5$} \\\\\n",
74 | "\\text{No}, & \\text{if $ f(x) \\le 0.5$}\n",
75 | "\\end{cases}\n",
76 | "$$ \n",
77 | "\n",
78 | "Now we encode the class label directly into the function. Given a classification problem with $K$ different classes we want $f(x): \\mathbb{R}^{p} \\to \\mathbb{R}^K$ to return a vector of length $K$. Instead of finding the parameters in a vector $\\beta$ we use a $(p + 1) \\times K$ matrix $\\mathbf{W}$, whose extra row absorbs the bias term\n",
79 | "\n",
80 | "$$\n",
81 | "f(x) = \\hat{y} = x \\cdot \\mathbf{W}\n",
82 | "$$\n",
83 | "\n",
84 | "At this point we could again minimize a modified version of the RSS or the SVM loss. This time however we want to be able to have a more *probabilistic* interpretation of the loss function.\n",
85 | "\n",
86 | "We use a new loss function called the __cross-entropy loss__. Also known as logistic loss. Given two probability density functions $p$ and $q$ over the same probabilistic space, cross entropy is defined as\n",
87 | "\n",
88 | "$$\n",
89 | "H_{\\times}(p, q) = -\\sum _{x} p(x)\\,\\log q(x).\\!\n",
90 | "$$\n",
91 | "\n",
92 | "The cross entropy becomes smaller the more similar the distributions $q$ and $p$ are.\n",
93 | "For classification problems, $p$ is the true distribution of classes for a given $x$ and $q(x)$ is the estimated distribution produced by our machine learner. \n",
94 | "\n",
95 | "In the example below we look at $p$ for the digits dataset. Since each example in the dataset is exactly one digit, the $p$ vector has only a single non-zero entry. \n",
96 | "\n",
97 | "If a classifier reproduces these distributions exactly, we have perfect classification."
98 | ]
99 | },
100 | {
101 | "cell_type": "code",
102 | "execution_count": null,
103 | "metadata": {
104 | "ExecuteTime": {
105 | "end_time": "2017-11-30T18:20:50.702317Z",
106 | "start_time": "2017-11-30T18:20:49.591277Z"
107 | }
108 | },
109 | "outputs": [],
110 | "source": [
111 | "from sklearn.datasets import load_digits\n",
112 | "from sklearn import preprocessing\n",
113 | "\n",
114 | "X, y = load_digits(return_X_y=True)\n",
115 | "\n",
116 | "y = preprocessing.LabelBinarizer().fit_transform(y)\n",
117 | "\n",
118 | "print('True (discrete) probability distributions.')\n",
119 | "\n",
120 | "\n",
121 | "fig, axs = plt.subplots(2, 5, figsize=(8, 4))\n",
122 | "for ax, x_i, y_i in zip(axs.flat, X[10:20], y[10:20]):\n",
123 | " img = x_i.reshape(8, 8)\n",
124 | " ax.imshow(img, cmap='gray_r', interpolation='nearest')\n",
125 | " ax.axis('off')\n",
126 | "\n",
127 | " print(y_i)"
128 | ]
129 | },
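130 | {
131 | "cell_type": "markdown",
132 | "metadata": {},
133 | "source": [
134 | "A tiny numerical illustration of the cross entropy itself (added here, with made-up probabilities): it is smaller when the estimated distribution $q$ is closer to the true distribution $p$.\n"
135 | ]
136 | },
137 | {
138 | "cell_type": "code",
139 | "execution_count": null,
140 | "metadata": {},
141 | "outputs": [],
142 | "source": [
143 | "p_true = np.array([0, 0, 1, 0])  # the true class is class 2\n",
144 | "q_good = np.array([0.05, 0.05, 0.85, 0.05])\n",
145 | "q_bad = np.array([0.4, 0.3, 0.2, 0.1])\n",
146 | "\n",
147 | "\n",
148 | "def cross_entropy(p, q):\n",
149 | "    return -np.sum(p * np.log(q))\n",
150 | "\n",
151 | "\n",
152 | "cross_entropy(p_true, q_good), cross_entropy(p_true, q_bad)"
153 | ]
154 | },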
130 | {
131 | "cell_type": "markdown",
132 | "metadata": {},
133 | "source": [
134 | "We want our estimated probabilities for the distribution of class labels to be as close to the true distribution as possible. Our function $f(x)$ does not produce probability estimates directly. We need to *transform the results to probabilities* using the so-called __softmax__ function\n",
135 | "\n",
136 | "$$\n",
137 | "q_k(x) = \\frac{e^{f_{k}(x)}} {∑_j e^{f_j(x)}}\n",
138 | "$$\n",
139 | "\n",
140 | "where the index $j \\in \\{1, \\ldots, K \\}$ loops over all classes and $f_k$ is the prediction for class $k$.\n",
141 | "\n",
142 | "In short: a classifier using the cross entropy loss with softmax produces probabilities for class membership.\n",
143 | "\n",
144 | "Below we implement the softmax function and the loss function, which uses a weight matrix $\\mathbf{W}$ to compute the loss over the entire dataset:"
145 | ]
146 | },
147 | {
148 | "cell_type": "code",
149 | "execution_count": null,
150 | "metadata": {},
151 | "outputs": [],
152 | "source": [
153 | "from sklearn.preprocessing import label_binarize as label_binarize_sklearn\n",
154 | "\n",
155 | "def label_binarize(labels, n_classes):\n",
156 | " # sklearn does not expand binary labels but we need this for our \n",
157 | " # formulation of the training\n",
158 | " binary_labels = label_binarize_sklearn(labels, classes=range(max(n_classes, 3)))\n",
159 | " return binary_labels[:, :n_classes]"
160 | ]
161 | },
162 | {
163 | "cell_type": "code",
164 | "execution_count": null,
165 | "metadata": {},
166 | "outputs": [],
167 | "source": [
168 | "label_binarize([0, 1, 1, 0], 2)"
169 | ]
170 | },
171 | {
172 | "cell_type": "code",
173 | "execution_count": null,
174 | "metadata": {
175 | "ExecuteTime": {
176 | "end_time": "2017-11-30T18:20:50.977342Z",
177 | "start_time": "2017-11-30T18:20:50.705058Z"
178 | }
179 | },
180 | "outputs": [],
181 | "source": [
182 | "from sklearn.datasets import load_digits\n",
183 | "\n",
184 | "rng = np.random.default_rng(0)\n",
185 | "\n",
186 | "X, y = load_digits(return_X_y=True)\n",
187 | "\n",
188 | "ones = np.ones(shape=(len(X), 1))\n",
189 | "X = np.hstack([ones, X])\n",
190 | "\n",
191 | "# we have 10 classes and p attributes, digits from 0 to 9 and 64 pixels\n",
192 | "K = 10\n",
193 | "p = 64 \n",
194 | "\n",
195 | "# include the bias into the weight matrix\n",
196 | "W = rng.normal(size=(p+1, K))\n",
197 | "\n",
198 | "y = label_binarize(y, 10)\n",
199 | "y"
200 | ]
201 | },
202 | {
203 | "cell_type": "code",
204 | "execution_count": null,
205 | "metadata": {
206 | "ExecuteTime": {
207 | "end_time": "2017-11-30T18:20:51.017438Z",
208 | "start_time": "2017-11-30T18:20:50.984394Z"
209 | }
210 | },
211 | "outputs": [],
212 | "source": [
213 | "# softmax function, shifted row-wise by the maximum so that exp cannot overflow\n",
214 | "def softmax(f):\n",
215 | "    f_shifted = f - f.max(axis=1, keepdims=True)\n",
216 | "    exp_f = np.exp(f_shifted)\n",
217 | "    return exp_f / exp_f.sum(axis=1, keepdims=True)\n",
218 | "\n",
219 | "\n",
220 | "# loss for the entire training set\n",
221 | "def loss_cross_ent(X, y, W):\n",
222 | " f = X @ W\n",
223 | " q = softmax(f)\n",
224 | " l = -np.sum(y * np.log2(q), axis=1)\n",
225 | " return l.mean() # the mean over all samples in the batch/dataset\n",
226 | "\n",
227 | "loss_cross_ent(X, y, W)"
228 | ]
229 | },
230 | {
231 | "cell_type": "markdown",
232 | "metadata": {},
233 | "source": [
234 | "Now we need to find those weights in $\\mathbf{W}$ which minimize the loss function. One way to do this is to randomly search for matrix entries until we get a better loss. But that's obviously not very smart. "
235 | ]
236 | },
237 | {
238 | "cell_type": "code",
239 | "execution_count": null,
240 | "metadata": {
241 | "ExecuteTime": {
242 | "end_time": "2017-11-30T18:20:59.341147Z",
243 | "start_time": "2017-11-30T18:20:51.019859Z"
244 | }
245 | },
246 | "outputs": [],
247 | "source": [
248 | "def random_search(X, y, loss_function, max_iter=10000, rng=None):\n",
249 | " rng = rng or np.random.default_rng(0)\n",
250 | " \n",
251 | " # we have 10 classes and p attributes, digits from 0 to 9 and 64 pixels\n",
252 | " K = y.shape[1]\n",
253 | " p = X.shape[1]\n",
254 | " \n",
255 | " #save all losses\n",
256 | " losses = []\n",
257 | " \n",
258 | " #start with a random weight matrix\n",
259 | " W = rng.normal(0, 0.01, size=(p, K))\n",
260 | " bestloss = np.inf\n",
261 | " for i in tqdm(range(max_iter)):\n",
262 | " \n",
263 | " # choose a random direction\n",
264 | " W_new = rng.normal(0, 0.01, size=(p, K))\n",
265 | " loss = loss_function(X, y, W_new)\n",
266 | " losses.append(loss)\n",
267 | " if loss < bestloss:\n",
268 | " W = W_new\n",
269 | " bestloss = loss\n",
270 | " return losses\n",
271 | " \n",
272 | "max_iter = 5000\n",
273 | "losses_random = random_search(X, y, loss_cross_ent, max_iter=max_iter)"
274 | ]
275 | },
276 | {
277 | "cell_type": "code",
278 | "execution_count": null,
279 | "metadata": {
280 | "ExecuteTime": {
281 | "end_time": "2017-11-30T18:20:59.824006Z",
282 | "start_time": "2017-11-30T18:20:59.346338Z"
283 | }
284 | },
285 | "outputs": [],
286 | "source": [
287 | "plt.figure()\n",
288 | "plt.plot(range(max_iter), losses_random, '.')\n",
289 | "plt.xlabel('Iterations')\n",
290 | "plt.ylabel('Loss Function')\n",
291 | "None"
292 | ]
293 | },
294 | {
295 | "cell_type": "markdown",
296 | "metadata": {},
297 | "source": [
298 | "Like in all problems we discussed before, it is computationally infeasible to find the global optimum. We can again try a local search to find a local optimum. Again, this is not a show-stopper in real-life situations; often a local optimum suffices. Sometimes the problem can be reformulated as a convex one, in which case a local optimizer is guaranteed to find the global optimum. \n",
299 | "\n",
300 | "Below we try an incremental approach, which optimizes in a local neighbourhood of $\\mathbf{W}$: we keep the best $\\mathbf{W}$ found in the previous iterations and only add a random perturbation to it.\n"
301 | ]
302 | },
303 | {
304 | "cell_type": "code",
305 | "execution_count": null,
306 | "metadata": {
307 | "ExecuteTime": {
308 | "end_time": "2017-11-30T18:21:04.075353Z",
309 | "start_time": "2017-11-30T18:20:59.826158Z"
310 | }
311 | },
312 | "outputs": [],
313 | "source": [
314 | "def random_descent(X, y, loss_function, max_iter=10000, step_size=0.01, rng=None):\n",
315 | " rng = rng or np.random.default_rng(0)\n",
316 | " bestloss = np.inf\n",
317 | " \n",
318 | " # we have 10 classes and p attributes, digits from 0 to 9 and 64 pixels\n",
319 | " K = y.shape[1]\n",
320 | " p = X.shape[1]\n",
321 | " \n",
322 | " #save all losses\n",
323 | " losses = []\n",
324 | " \n",
325 | " #start with a random weight matrix\n",
326 | "    W = rng.normal(0, step_size, size=(p, K))\n",
327 | " for i in tqdm(range(max_iter)):\n",
328 | " \n",
329 | " # choose a random direction\n",
330 | " W_new = rng.normal(W, step_size)\n",
331 | " \n",
332 | " loss = loss_function(X, y, W_new)\n",
333 | " losses.append(loss)\n",
334 | " if (loss < bestloss):\n",
335 | " W = W_new\n",
336 | " bestloss = loss\n",
337 | " return losses\n",
338 | " \n",
339 | "max_iter = 2500\n",
340 | "losses_random = random_descent(X, y, loss_cross_ent, max_iter=max_iter)"
341 | ]
342 | },
343 | {
344 | "cell_type": "code",
345 | "execution_count": null,
346 | "metadata": {
347 | "ExecuteTime": {
348 | "end_time": "2017-11-30T18:21:04.527350Z",
349 | "start_time": "2017-11-30T18:21:04.079232Z"
350 | }
351 | },
352 | "outputs": [],
353 | "source": [
354 | "plt.figure()\n",
355 | "plt.plot(losses_random, '.')\n",
356 | "plt.xlabel('Iterations')\n",
357 | "plt.ylabel('Loss Function')\n",
358 | "None"
359 | ]
360 | },
361 | {
362 | "cell_type": "markdown",
363 | "metadata": {},
364 | "source": [
365 | "In each step we choose a random direction in which to move. We can be smarter about it by following the gradient of the loss function in each iteration; the loss will then converge much faster. \n",
366 | "\n",
367 | "The gradient has to be computed with respect to our weight matrix $\\mathbf{W}$\n",
368 | "$$\n",
369 | "L = H_{\\times}(p, q) = -\\sum _{k} p_k\\,\\log q_k = -\\sum _{k} p_k\\,\\log \\left( \\frac{e^{f_{k}(x)}} {∑_j e^{f_j(x)}} \\right)\n",
370 | "$$\n",
371 | "\n",
372 | "where \n",
373 | "\n",
374 | "$$\n",
375 | "f = \\mathbf{W} \\cdot x\\, .\n",
376 | "$$\n",
377 | "\n",
378 | "We can build the derivative using the chain rule.\n",
379 | "\n",
380 | "$$\n",
381 | "\\frac{\\rm{\\partial} L }{\\rm{\\partial} w} = \\frac{\\rm{\\partial} L }{\\rm{\\partial} q} \\frac{\\rm{\\partial} q }{\\rm{\\partial} f}\\frac{\\rm{\\partial} f }{\\rm{\\partial}w}\n",
382 | "$$\n",
383 | "\n",
384 | "Note that when taking derivatives with respect to vectors, you always have to keep track of the indices, \n",
385 | "or alternatively build the full Jacobian. \n",
386 | "\n",
387 | "\n",
388 | "\\begin{align}\n",
389 | "\\frac{\\rm{\\partial} L }{\\rm{\\partial} f_i} =& - \\sum_k y_k \\frac{\\rm{\\partial} \\log q_k }{\\rm{\\partial} f_i} \\\\\n",
390 | "=&- \\sum_k y_k \\frac{1}{q_k}\\frac{\\rm{\\partial}q_k }{\\rm{\\partial} f_i} \\\\\n",
391 | "&\\vdots \\; \\; \\text{exercise} \\\\\n",
392 | "=& p_i - y_i\n",
393 | "\\end{align}\n",
394 | "\n",
395 | "We use the computed gradient in the code to compute the update to the weight matrix $\\mathbf{W}$\n",
396 | "\n",
397 | "\n",
398 | " W = W - dW * step_size\n",
399 | " \n",
400 | "The step size, or learning rate, controls how far we go down the gradient in each step.\n"
401 | ]
402 | },
403 | {
404 | "cell_type": "code",
405 | "execution_count": null,
406 | "metadata": {
407 | "ExecuteTime": {
408 | "end_time": "2017-11-30T18:21:04.584922Z",
409 | "start_time": "2017-11-30T18:21:04.529539Z"
410 | }
411 | },
412 | "outputs": [],
413 | "source": [
414 | "def gradient(W, X, y):\n",
415 | " f = X @ W\n",
416 | " p = softmax(f)\n",
417 | " dh = (p - y)\n",
418 | " dW = (X.T @ dh) / dh.shape[0]\n",
419 | " return dW\n",
420 | "\n",
421 | "\n",
422 | "def gradient_descent(X, y, loss_function, gradient, max_iter=10000, step_size=0.01, rng=None):\n",
423 | " rng = rng or np.random.default_rng(0)\n",
424 | " \n",
425 | " bestloss = np.inf\n",
426 | " \n",
427 | " # we have 10 classes and p attributes, digits from 0 to 9 and 64 pixels\n",
428 | " K = y.shape[1]\n",
429 | " p = X.shape[1]\n",
430 | " \n",
431 | " #save all losses\n",
432 | " losses = []\n",
433 | " #start with a random weight matrix\n",
434 | " W = rng.normal(0, step_size, size=(p, K))\n",
435 | " for i in tqdm(range(max_iter)):\n",
436 | " W = W - step_size * gradient(W, X, y)\n",
437 | " loss = loss_function(X, y, W)\n",
438 | " losses.append(loss)\n",
439 | " \n",
440 | " return losses, W"
441 | ]
442 | },
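{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sanity check (not part of the original code), we can compare the analytic gradient $\\frac{1}{N} X^T (p - y)$ against a central-difference approximation on a tiny random problem. This sketch assumes the `loss_cross_ent` and `gradient` functions defined above; depending on whether `loss_cross_ent` averages or sums over the samples, the two results may differ by an overall factor of $N$."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# sanity check: analytic gradient vs. central differences on a tiny random problem\n",
"rng_check = np.random.default_rng(1)\n",
"X_check = rng_check.normal(size=(20, 5))\n",
"y_check = np.eye(3)[rng_check.integers(0, 3, size=20)]  # one-hot labels\n",
"W_check = rng_check.normal(size=(5, 3))\n",
"\n",
"analytic = gradient(W_check, X_check, y_check)\n",
"\n",
"eps = 1e-6\n",
"numeric = np.zeros_like(W_check)\n",
"for i in range(W_check.shape[0]):\n",
"    for j in range(W_check.shape[1]):\n",
"        W_plus, W_minus = W_check.copy(), W_check.copy()\n",
"        W_plus[i, j] += eps\n",
"        W_minus[i, j] -= eps\n",
"        numeric[i, j] = (loss_cross_ent(X_check, y_check, W_plus) - loss_cross_ent(X_check, y_check, W_minus)) / (2 * eps)\n",
"\n",
"np.max(np.abs(analytic - numeric))  # should be close to zero (up to the normalisation mentioned above)"
]
},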
443 | {
444 | "cell_type": "code",
445 | "execution_count": null,
446 | "metadata": {
447 | "ExecuteTime": {
448 | "end_time": "2017-11-30T18:21:13.940471Z",
449 | "start_time": "2017-11-30T18:21:04.630436Z"
450 | }
451 | },
452 | "outputs": [],
453 | "source": [
454 | "X, y = load_digits(return_X_y=True)\n",
455 | "\n",
456 | "X = np.hstack([ones, X])\n",
457 | "y = label_binarize(y, 10)\n",
458 | "\n",
459 | "losses, W = gradient_descent(X, y, loss_cross_ent, gradient, max_iter=max_iter)"
460 | ]
461 | },
462 | {
463 | "cell_type": "code",
464 | "execution_count": null,
465 | "metadata": {
466 | "ExecuteTime": {
467 | "end_time": "2017-11-30T18:21:14.380174Z",
468 | "start_time": "2017-11-30T18:21:13.943244Z"
469 | }
470 | },
471 | "outputs": [],
472 | "source": [
473 | "plt.figure()\n",
474 | "plt.plot(losses_random, '.', label='random descent')\n",
475 | "plt.plot(losses, '.', label='gradient descent')\n",
476 | "plt.xlabel('Iterations')\n",
477 | "plt.ylabel('Loss Function')\n",
478 | "plt.legend()\n",
479 | "plt.title('Cross Entropy Loss')\n",
480 | "None"
481 | ]
482 | },
483 | {
484 | "cell_type": "code",
485 | "execution_count": null,
486 | "metadata": {
487 | "ExecuteTime": {
488 | "end_time": "2017-11-30T18:21:15.652635Z",
489 | "start_time": "2017-11-30T18:21:14.382763Z"
490 | }
491 | },
492 | "outputs": [],
493 | "source": [
494 | "plt.figure()\n",
495 | "plt.plot(losses[:200], '.', label='gradient descent', color='orange')\n",
496 | "plt.xlabel('Iterations')\n",
497 | "plt.ylabel('Loss Function')\n",
498 | "None"
499 | ]
500 | },
501 | {
502 | "cell_type": "markdown",
503 | "metadata": {},
504 | "source": [
505 | "We just implemented the so called SoftMax classifier. This classifier is non-linear.\n",
506 | "While the simple matrix multiplication $f = \\boldsymbol{X} \\boldsymbol{W}$ is linear , the application of the softmax function certainly is not. Our prediction vector is given by applying a non-linear function to the output of $f$\n",
507 | "\n",
508 | "$$\n",
509 | "p = q(f(x)) = q(\\boldsymbol{x} \\cdot \\boldsymbol{W})\n",
510 | "$$\n",
511 | "\n",
512 | "Below we apply this classifier to some test data. Clearly this on non-linear step does not suffice to classify non-linear data like the two spirals."
513 | ]
514 | },
515 | {
516 | "cell_type": "code",
517 | "execution_count": null,
518 | "metadata": {
519 | "ExecuteTime": {
520 | "end_time": "2017-11-30T18:21:15.802124Z",
521 | "start_time": "2017-11-30T18:21:15.656491Z"
522 | }
523 | },
524 | "outputs": [],
525 | "source": [
526 | "from sklearn.datasets import make_blobs\n",
527 | "\n",
528 | "n_classes = 3\n",
529 | "\n",
530 | "X, y = make_blobs(n_samples=1500, n_features=2, center_box=(-5, 5), centers=n_classes, cluster_std=0.3, random_state=0)\n",
531 | "\n",
532 | "# add biases to matrix\n",
533 | "ones = np.ones(shape=(len(X), 1))\n",
534 | "X_b = np.hstack([ones, X])\n",
535 | "y_b = label_binarize(y, n_classes)\n",
536 | "\n",
537 | "losses, W = gradient_descent(X_b, y_b, loss_cross_ent, gradient, max_iter=150)"
538 | ]
539 | },
540 | {
541 | "cell_type": "code",
542 | "execution_count": null,
543 | "metadata": {
544 | "ExecuteTime": {
545 | "end_time": "2017-11-30T18:21:16.542873Z",
546 | "start_time": "2017-11-30T18:21:15.805585Z"
547 | }
548 | },
549 | "outputs": [],
550 | "source": [
551 | "prediction = np.argmax(softmax(X_b @ W), axis=1)\n",
552 | "\n",
553 | "\n",
554 | "cmap = plots.create_discrete_colormap(n_classes)\n",
555 | "norm = plt.Normalize(-0.5, n_classes - 0.5)\n",
556 | "\n",
557 | "\n",
558 | "plt.figure()\n",
559 | "plt.axes(aspect='equal')\n",
560 | "plt.scatter(X[:,0], X[:, 1], facecolor='none',s=40, edgecolors=cmap(norm(prediction)))\n",
561 | "plt.scatter(X[:,0], X[:, 1], c=y, s=10, cmap=cmap, norm=norm)"
562 | ]
563 | },
564 | {
565 | "cell_type": "code",
566 | "execution_count": null,
567 | "metadata": {
568 | "ExecuteTime": {
569 | "end_time": "2017-11-30T18:21:16.710362Z",
570 | "start_time": "2017-11-30T18:21:16.546422Z"
571 | }
572 | },
573 | "outputs": [],
574 | "source": [
575 | "X, y = plots.twospirals(n_samples=300)\n",
576 | "n_classes = len(np.unique(y))\n",
577 | "\n",
578 | "\n",
579 | "#add biases to matrix\n",
580 | "ones = np.ones(shape=(len(X), 1))\n",
581 | "X_b = np.hstack([ones, X])\n",
582 | "y_b = label_binarize(y, n_classes)\n",
583 | "\n",
584 | "losses, W = gradient_descent(X_b, y_b, loss_cross_ent, gradient, max_iter=150)"
585 | ]
586 | },
587 | {
588 | "cell_type": "code",
589 | "execution_count": null,
590 | "metadata": {
591 | "ExecuteTime": {
592 | "end_time": "2017-11-30T18:21:17.424789Z",
593 | "start_time": "2017-11-30T18:21:16.715133Z"
594 | }
595 | },
596 | "outputs": [],
597 | "source": [
598 | "X, y = plots.twospirals(n_samples=300)\n",
599 | "\n",
600 | "X_b = np.hstack([ones, X])\n",
601 | "y_b = label_binarize(y, n_classes)\n",
602 | "\n",
603 | "prediction = np.argmax(softmax(X_b @ W), axis=1)\n",
604 | "\n",
605 | "cmap = plots.create_discrete_colormap(n_classes)\n",
606 | "norm = plt.Normalize(-0.5, n_classes - 0.5)\n",
607 | "\n",
608 | "plt.figure()\n",
609 | "plt.axes(aspect=1)\n",
610 | "plt.scatter(X[:,0], X[:, 1], c=y, cmap=cmap, s=10, label='truth')\n",
611 | "plt.scatter(X[:,0], X[:, 1], facecolor='none', s=50, edgecolors=cmap(norm(prediction)), label='prediction')\n",
612 | "plt.legend()\n",
613 | "\n",
614 | "None"
615 | ]
616 | },
617 | {
618 | "cell_type": "markdown",
619 | "metadata": {},
620 | "source": [
621 | "### Neural Networks (Multi Layer Perceptron)\n",
622 | "\n",
623 | "We use another abstraction to describe the linear classification we used above. \n",
624 | "We started with an input image of $8 \\times 8$ pixels. We flatten it to get a single vector $x_i$ with $64$ entries.\n",
625 | "In the next step we multiply the vector with a matrix $W$ with dimensions $64 \\times 10$ and get a result vector of length 10. The following image shows this process in a network like manner. \n",
626 | "\n",
627 | "
\n",
628 | "\n",
629 | "Neural networks use some special slang to describe their architecture. The column of blue dots, i.e. the matrix multiplication, is called a layer. Depending on context this will be called a *hidden layer* or *fully connected layer*. The input and outputs are called the *input* and *output layer* respectively. The white box labeled as Softmax is called *activation function*\n",
630 | "\n",
631 | "\n",
632 | "
\n",
633 | "\n",
634 | "\n",
635 | "We can extend the previous idea further by applying more than one layer of matrix multiplications.\n",
636 | "\n",
637 | "$$\n",
638 | "p = q_2(q_1(x \\cdot W_1), W_2)\n",
639 | "$$\n",
640 | "\n",
641 | "Each entry in the weight matrices $W_1$ and $W_2$ are *learnable* parameters. The matrices have dimension $n_{\\text{in}} \\times n_{\\text{out}}$ (neglecting biases). In the single layer network we trained above we had a single layer with 640 free parameters to optimize. In the case of two full layers we would have twice as many. \n",
642 | "\n",
643 | "\n",
644 | "
\n",
645 | "\n",
646 | "\n",
647 | "The method of finding the gradient for such a complex function is actually not different from what we did before. As long as the Loss function is a linear function with respect to the number of samples. This holds for loss functions like the mean square error, the cross entropy loss or the SVM loss. \n",
648 | "\n",
649 | "$$\n",
650 | "L = \\frac 1 N \\sum_n^N L_i\n",
651 | "$$\n",
652 | "\n",
653 | "As long as the activation functions and the loss function are differentiable the gradient can be calculated either numerically or analytically using __backpropagation__. Which essentially requires nothing more than recursive application of the chain rule of derivatives. In the example above we used backpropagation on a single layer to calculate the gradient. \n",
654 | "\n",
655 | "Below we use the sklearn __multi-layer perceptron__ to classify the spiral data. "
656 | ]
657 | },
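{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before turning to the sklearn implementation, here is a minimal NumPy sketch (not the code sklearn runs internally) of the two-layer forward pass $p = q_2(q_1(x \\cdot W_1) \\cdot W_2)$, with ReLU as the hidden activation and softmax as the output activation. The weights are random, so nothing is trained here; the sketch only illustrates the shapes involved."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# minimal sketch of a two-layer forward pass (biases neglected, as in the text above)\n",
"rng_fw = np.random.default_rng(0)\n",
"\n",
"n_in, n_hidden, n_out = 2, 10, 2\n",
"W1 = rng_fw.normal(size=(n_in, n_hidden))\n",
"W2 = rng_fw.normal(size=(n_hidden, n_out))\n",
"\n",
"\n",
"def forward(x, W1, W2):\n",
"    hidden = np.maximum(0, x @ W1)           # first layer followed by ReLU\n",
"    logits = hidden @ W2                     # second layer\n",
"    e = np.exp(logits - logits.max(axis=1, keepdims=True))\n",
"    return e / e.sum(axis=1, keepdims=True)  # softmax -> class probabilities\n",
"\n",
"\n",
"x_demo = rng_fw.normal(size=(5, n_in))       # 5 toy samples with 2 features\n",
"forward(x_demo, W1, W2)"
]
},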
658 | {
659 | "cell_type": "code",
660 | "execution_count": null,
661 | "metadata": {
662 | "ExecuteTime": {
663 | "end_time": "2017-11-30T19:02:33.948933Z",
664 | "start_time": "2017-11-30T19:02:31.762703Z"
665 | }
666 | },
667 | "outputs": [],
668 | "source": [
669 | "from sklearn.neural_network import MLPClassifier\n",
670 | "\n",
671 | "X, y = plots.twospirals(n_samples=400)\n",
672 | "\n",
673 | "clf = MLPClassifier(hidden_layer_sizes=(10, 10), max_iter=500)\n",
674 | "clf.fit(X, y)\n",
675 | "\n",
676 | "X, y = plots.twospirals(n_samples=400)\n",
677 | "prediction = clf.predict(X)\n",
678 | "\n",
679 | "n_classes = len(np.unique(y))\n",
680 | "cmap = plots.create_discrete_colormap(n_classes)\n",
681 | "norm = plt.Normalize(-0.5, n_classes - 0.5)\n",
682 | "\n",
683 | "\n",
684 | "plt.figure()\n",
685 | "plt.axes(aspect=1)\n",
686 | "plt.scatter(X[:,0], X[:, 1], c=y, cmap=cmap, s=10, label='truth')\n",
687 | "plt.scatter(X[:,0], X[:, 1], facecolor='none', s=50, edgecolors=cmap(norm(prediction)), label='prediction')\n",
688 | "plt.legend()"
689 | ]
690 | },
691 | {
692 | "cell_type": "markdown",
693 | "metadata": {},
694 | "source": [
695 | "We initialized the MLP with two hidden layers of size 10. As our dataset is two dimensional, the input layer has size 2. The two layers are represented as matrices of size $2 \\times 10$ which means we have 40 matrix entries (plus biases) to optimize in total. \n",
696 | "\n",
697 | "The more layers and nodes a network has, the higher the representation power. In this case a simple 2 layer network does not suffice to represent the spiral data. \n",
698 | "\n",
699 | "\n",
700 | "http://cs.stanford.edu/people/karpathy/convnetjs/demo/classify2d.html"
701 | ]
702 | },
703 | {
704 | "cell_type": "code",
705 | "execution_count": null,
706 | "metadata": {
707 | "ExecuteTime": {
708 | "end_time": "2017-11-30T18:34:33.067516Z",
709 | "start_time": "2017-11-30T18:34:25.444842Z"
710 | }
711 | },
712 | "outputs": [],
713 | "source": [
714 | "from IPython import display\n",
715 | "from sklearn.metrics import confusion_matrix\n",
716 | "import time\n",
717 | "import warnings\n",
718 | "warnings.filterwarnings('ignore')\n",
719 | "\n",
720 | "fig, [ax1, ax2, ax3, ax4] = plt.subplots(1, 4)\n",
721 | "\n",
722 | "X, y = load_digits(return_X_y=True)\n",
723 | "\n",
724 | "last = 0\n",
725 | "for i in range(10):\n",
726 | " mlp = MLPClassifier(hidden_layer_sizes=(64,64,64), max_iter=i + 1, random_state=43, solver='sgd',)\n",
727 | " mlp.fit(X, y)\n",
728 | " \n",
729 | " if i > 0 :\n",
730 | " plt.suptitle('Iteration {}'.format(i + 1))\n",
731 | " \n",
732 | " ax1.set_title('Layer 1')\n",
733 | " ax2.set_title('Layer 2')\n",
734 | " ax3.set_title('Layer 3')\n",
735 | " ax4.set_title('Loss Function')\n",
736 | " ax4.set_xlabel('Iteration')\n",
737 | "\n",
738 | " diff_0 = np.asarray(mlp.coefs_[0]) \n",
739 | " diff_1 = np.asarray(mlp.coefs_[1]) \n",
740 | " diff_2 = np.asarray(mlp.coefs_[2]) \n",
741 | " diff_3 = np.asarray(mlp.coefs_[3]) \n",
742 | " \n",
743 | "\n",
744 | " ax1.imshow(diff_0, )\n",
745 | " ax2.imshow(diff_1, )\n",
746 | " ax3.imshow(diff_2, )\n",
747 | " ax4.plot(range(1, i + 2), mlp.loss_curve_, color='purple')\n",
748 | " \n",
749 | " \n",
750 | " display.clear_output(wait=True)\n",
751 | " display.display(plt.gcf())\n",
752 | " \n",
753 | " last_0 = np.asarray(mlp.coefs_[0])\n",
754 | " last_1 = np.asarray(mlp.coefs_[1])\n",
755 | " last_2 = np.asarray(mlp.coefs_[2])\n",
756 | " last_3 = np.asarray(mlp.coefs_[3])\n",
757 | "\n",
758 | "plt.close()"
759 | ]
760 | },
761 | {
762 | "cell_type": "markdown",
763 | "metadata": {},
764 | "source": [
765 | "\n",
766 | "### Activation Functions\n",
767 | "\n",
768 | "There are a range of activation functions to choose from. The Softmax functions we used for linear classification is often used in the final/output layer of a network to produce probability estimates.\n",
769 | "An activation function has to be non-linear. Otherwise layers in the network could be merged together into one single matrix operation. Sometimes activation functions are also called non-linearities\n",
770 | "\n",
771 | "##### Sigmoid \n",
772 | "\n",
773 | "$$\n",
774 | "g(x) = \\frac{1}{1 + e^{-x}}\n",
775 | "$$\n",
776 | "\n",
777 | "The sigmoid function squashes values into range between 0 and 1.\n",
778 | "It has been very popular for neural networks in the past but has recently fallen out of favor because it does not behave well when using gradient descent. \n",
779 | "\n",
780 | "$$\n",
781 | "\\rm{d}g(x) = \\frac{e^x}{\\left(1 + e^{x}\\right)^2}\n",
782 | "$$\n",
783 | "\n",
784 | "As visible in the plot below the gradient becomes zero for large values of $x$. Which leads to saturations in the gradient descent steps.\n",
785 | "\n",
786 | " \n",
787 | "\n"
788 | ]
789 | },
790 | {
791 | "cell_type": "code",
792 | "execution_count": null,
793 | "metadata": {
794 | "ExecuteTime": {
795 | "end_time": "2017-11-30T18:21:19.142818Z",
796 | "start_time": "2017-11-30T18:20:47.943Z"
797 | }
798 | },
799 | "outputs": [],
800 | "source": [
801 | "def sigmoid(x):\n",
802 | " return 1 / (1 + np.exp(-x))\n",
803 | "\n",
804 | "def sigmoid_gradient(x):\n",
805 | " return np.exp(x) / (1 + np.exp(x))**2\n",
806 | "\n",
807 | "\n",
808 | "x = np.linspace(-10, 10, 200)\n",
809 | "\n",
810 | "fig, ax = plt.subplots()\n",
811 | "ax.plot(x, sigmoid(x), color='purple')\n",
812 | "ax.plot(x, sigmoid_gradient(x), color='purple', ls='--')\n",
813 | "ax.spines['left'].set_position(('data', 0.0))\n",
814 | "ax.spines['bottom'].set_position(('data', 0.0))\n",
815 | "\n",
816 | "ax.set_title('Sigmoid')"
817 | ]
818 | },
819 | {
820 | "cell_type": "markdown",
821 | "metadata": {},
822 | "source": [
823 | "##### Tangens hyperbolicus \n",
824 | "\n",
825 | "$$\n",
826 | "g(x) = \\tanh{x}\n",
827 | "$$\n",
828 | "\n",
829 | "The tangens hyperbolicus function behaves very similar to the sigmoid function. Its just a scaled version of it.\n",
830 | "\n",
831 | "\n",
832 | "$$\n",
833 | "\\rm{d}g(x) = 1 - \\tanh^2{x}\n",
834 | "$$\n",
835 | "\n",
836 | "The gradient also becomes zero for large values of $x$. However the values of the $\\tanh$ function are centered around zero which has some advantagous properties when compared to the sigmoid function.\n",
837 | " \n"
838 | ]
839 | },
840 | {
841 | "cell_type": "code",
842 | "execution_count": null,
843 | "metadata": {
844 | "ExecuteTime": {
845 | "end_time": "2017-11-30T18:21:19.144083Z",
846 | "start_time": "2017-11-30T18:20:47.946Z"
847 | }
848 | },
849 | "outputs": [],
850 | "source": [
851 | "def tanh(x):\n",
852 | " return np.tanh(x)\n",
853 | "\n",
854 | "def tanh_gradient(x):\n",
855 | " return 1 - tanh(x)**2\n",
856 | "\n",
857 | "\n",
858 | "x = np.linspace(-10, 10, 200) \n",
859 | "\n",
860 | "fig, ax = plt.subplots()\n",
861 | "ax.plot(x, tanh(x), color='purple')\n",
862 | "ax.plot(x, tanh_gradient(x), color='purple', ls='--')\n",
863 | "ax.spines['left'].set_position(('data', 0.0))\n",
864 | "ax.spines['bottom'].set_position(('data', 0.0))\n",
865 | "\n",
866 | "ax.set_title('tanh')"
867 | ]
868 | },
869 | {
870 | "cell_type": "markdown",
871 | "metadata": {},
872 | "source": [
873 | "##### Rectified Linear Unit (ReLU)\n",
874 | "\n",
875 | "$$\n",
876 | "g(x) = \\max{0, x}\n",
877 | "$$\n",
878 | "\n",
879 | "The ReLU function became very popular recently. It was used in some very image recognition networks. Evaluating the function is cheap and it often increases the speed of the gradient descent. See [Krizhevsky et al.](http://www.cs.toronto.edu/~fritz/absps/imagenet.pdf)\n",
880 | "\n",
881 | "\n",
882 | "$$\n",
883 | "\\rm{d}g(x) = \\begin{cases}\n",
884 | " 1, & \\text{if $x > 0$} \\\\\n",
885 | " \\text{undefined}, & \\text{if $x = 0$} \\\\\n",
886 | " \\text{0}, & \\text{if $x < 0$} \\\\\n",
887 | "\\end{cases}\n",
888 | "$$\n",
889 | "\n",
890 | "The gradient is zero for value smaller than 0 and is discontinous. This can lead to a problem called 'dying neurons'.\n",
891 | "The softplus function is a common alternative.\n"
892 | ]
893 | },
894 | {
895 | "cell_type": "code",
896 | "execution_count": null,
897 | "metadata": {
898 | "ExecuteTime": {
899 | "end_time": "2017-11-30T18:21:19.149667Z",
900 | "start_time": "2017-11-30T18:20:47.950Z"
901 | }
902 | },
903 | "outputs": [],
904 | "source": [
905 | "def relu(x):\n",
906 | " return np.maximum(0, x)\n",
907 | "\n",
908 | "def relu_gradient(x):\n",
909 | " return np.clip(np.sign(x), 0, 1)\n",
910 | "\n",
911 | "def softplus(x):\n",
912 | " return np.log(1 + np.exp(x))\n",
913 | "\n",
914 | "def softplus_gradient(x):\n",
915 | " return 1 / (1 + np.exp(-x))\n",
916 | "\n",
917 | "x = np.linspace(-3, 3, 1000) \n",
918 | "\n",
919 | "\n",
920 | "fig, ax = plt.subplots()\n",
921 | "ax.plot(x, relu(x), label='relu', color='purple')\n",
922 | "ax.plot(x, relu_gradient(x), label='relu gradient', color='darkred', ls='--')\n",
923 | "ax.plot(x, softplus(x), label='softplus', color='darkorange')\n",
924 | "ax.plot(x, softplus_gradient(x), label='softplus gradient', color='darkorange', ls='--')\n",
925 | "ax.spines['left'].set_position(('data', 0.0))\n",
926 | "ax.spines['bottom'].set_position(('data', 0.0))\n",
927 | "ax.legend()"
928 | ]
929 | },
930 | {
931 | "cell_type": "markdown",
932 | "metadata": {},
933 | "source": [
934 | "#### Regularization\n",
935 | "\n",
936 | "We've seen how easily neural networks can overfit. We can try to avoid that using two techniques.\n",
937 | "\n",
938 | "- Add a penalty term to the loss function.\n",
939 | "- Add special regularization layers.\n",
940 | "\n",
941 | "Adapting the loss function is a common method to avoid overfitting not only for neural networks. We have seen a similar idea for the SVM loss function.\n",
942 | "\n",
943 | "A common approach is the __L2__ regularization.\n",
944 | "\n",
945 | "$$\n",
946 | "L_r = L + \\frac{1}{2} \\lambda w^2\n",
947 | "$$\n",
948 | "\n",
949 | "It avoids strong changes in $w$ and creates more diffuse weight matrices.\n",
950 | "\n",
951 | "So called __dropout layers__ randomly select neurons to activate during the training. Each iteration only optimizes a subset of neurons.\n",
952 | "\n",
953 | "#### Convolutional Neural Nets for Image Recognition\n",
954 | "\n",
955 | "The basic idea behind CNNs are simplifications made for handlin high dimensional inputs like high resolution images. \n",
956 | "Before the first fully connected layers are trained on the input data a so called convolution kernel is applied to the image. The parameters of this kernel are free to be optimized during training. The weights for each region in the image are shared however. This reduces the amount of parameters need to train a network and also leads to __translational invariance__.\n",
957 | "\n",
958 | "\n"
959 | ]
960 | },
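{
"cell_type": "markdown",
"metadata": {},
"source": [
"A small sketch (not used in the training runs above) of how an L2 penalty could be added to the cross-entropy loss and its gradient from earlier in this notebook. It assumes the `loss_cross_ent` and `gradient` functions defined above; `lam` is the regularization strength $\\lambda$, and for simplicity the bias column is not excluded from the penalty."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# sketch: L2-regularized loss and gradient, reusing loss_cross_ent and gradient from above\n",
"def loss_l2(X, y, W, lam=1e-3):\n",
"    return loss_cross_ent(X, y, W) + 0.5 * lam * np.sum(W**2)\n",
"\n",
"\n",
"def gradient_l2(W, X, y, lam=1e-3):\n",
"    return gradient(W, X, y) + lam * W\n",
"\n",
"\n",
"# both keep the same signatures as before, so they could be passed\n",
"# directly to gradient_descent, e.g.:\n",
"# losses_l2, W_l2 = gradient_descent(X_b, y_b, loss_l2, gradient_l2, max_iter=150)"
]
},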
961 | {
962 | "cell_type": "markdown",
963 | "metadata": {},
964 | "source": [
965 | ""
966 | ]
967 | },
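{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the idea of a shared convolution kernel concrete, here is a minimal NumPy sketch (not an efficient or complete CNN implementation). The same $3 \\times 3$ kernel is slid over the whole image, so it contributes only 9 learnable parameters regardless of the image size."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# sketch: a single 3x3 convolution applied to one grayscale image\n",
"# the same kernel weights are reused at every position (weight sharing)\n",
"def conv2d_valid(image, kernel):\n",
"    kh, kw = kernel.shape\n",
"    h, w = image.shape\n",
"    out = np.zeros((h - kh + 1, w - kw + 1))\n",
"    for i in range(out.shape[0]):\n",
"        for j in range(out.shape[1]):\n",
"            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)\n",
"    return out\n",
"\n",
"\n",
"rng_conv = np.random.default_rng(0)\n",
"image = rng_conv.normal(size=(8, 8))    # e.g. one of the 8x8 digit images\n",
"kernel = rng_conv.normal(size=(3, 3))   # 9 learnable parameters, shared everywhere\n",
"conv2d_valid(image, kernel).shape       # -> (6, 6)"
]
},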
968 | {
969 | "cell_type": "markdown",
970 | "metadata": {
971 | "collapsed": true
972 | },
973 | "source": [
974 | "##### Neural Networks in practice\n",
975 | "A few key points to take away:\n",
976 | "\n",
977 | " - The more layers the network the more powerful it becomes. However training might not converge or overfitting might occur.\n",
978 | " \n",
979 | " - The term neural network is a misnomer. That's not how real brains work.\n",
980 | " \n",
981 | " - You need regularization to avoid overfitting.\n",
982 | " \n",
983 | " - The word *Deep* in *Deep Learning* refers to the number of hidden layers. No magic involved but a lot of hype.\n",
984 | " \n",
985 | " - Do not use the sigmoid activation. Use ReLU if you are unsure.\n"
986 | ]
987 | },
988 | {
989 | "cell_type": "markdown",
990 | "metadata": {},
991 | "source": [
992 | "#### Software\n",
993 | "\n",
994 | "Large networks require considerable computing power to optimize. Matrix computations are well suited for GPUs. The scikit-learn library does not support GPU computation out of the box. It also does not support convoluted neural networks.\n",
995 | "\n",
996 | "There is a whole zoo of software libraries to train large neural networks. Here are some important ones.\n",
997 | "\n",
998 | "##### Tensorflow\n",
999 | "\n",
1000 | "Developed by Google Inc. Maybe the most popular solution.\n",
1001 | "\n",
1002 | "\n",
1003 | "\n",
1004 | "\n",
1005 | "##### Keras\n",
1006 | "\n",
1007 | "High-Level API build ontop of TF\n",
1008 | "\n",
1009 | "
\n",
1010 | "\n",
1011 | "###### pytorsch\n",
1012 | "\n",
1013 | "Facebook's solution \n",
1014 | "https://github.com/pytorch/pytorch"
1015 | ]
1016 | },
1017 | {
1018 | "cell_type": "code",
1019 | "execution_count": null,
1020 | "metadata": {
1021 | "ExecuteTime": {
1022 | "end_time": "2017-12-04T12:23:41.772422Z",
1023 | "start_time": "2017-12-04T12:23:41.756376Z"
1024 | }
1025 | },
1026 | "outputs": [],
1027 | "source": [
1028 | "from IPython.display import HTML\n",
1029 | "\n",
1030 | "HTML('')"
1031 | ]
1032 | }
1033 | ],
1034 | "metadata": {
1035 | "kernelspec": {
1036 | "display_name": "Python 3",
1037 | "language": "python",
1038 | "name": "python3"
1039 | },
1040 | "language_info": {
1041 | "codemirror_mode": {
1042 | "name": "ipython",
1043 | "version": 3
1044 | },
1045 | "file_extension": ".py",
1046 | "mimetype": "text/x-python",
1047 | "name": "python",
1048 | "nbconvert_exporter": "python",
1049 | "pygments_lexer": "ipython3",
1050 | "version": "3.8.12"
1051 | }
1052 | },
1053 | "nbformat": 4,
1054 | "nbformat_minor": 2
1055 | }
1056 |
--------------------------------------------------------------------------------
/smd_neural_networks_keras.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "id": "2a42f6dc",
6 | "metadata": {},
7 | "source": [
8 | "# Short Introduction to Neural Networks and Deep Learning with Pytorch"
9 | ]
10 | },
11 | {
12 | "cell_type": "code",
13 | "execution_count": null,
14 | "id": "2d86167c",
15 | "metadata": {},
16 | "outputs": [],
17 | "source": [
18 | "import os\n",
19 | "\n",
20 | "\n",
21 | "import matplotlib.pyplot as plt\n",
22 | "import numpy as np\n",
23 | "\n",
24 | "from tqdm.auto import tqdm\n",
25 | "\n",
26 | "# keras supports tensorflow, torch and jax\n",
27 | "os.environ[\"KERAS_BACKEND\"] = \"torch\"\n",
28 | "\n",
29 | "import keras\n",
30 | "from keras import layers\n",
31 | "from keras import ops"
32 | ]
33 | },
34 | {
35 | "cell_type": "code",
36 | "execution_count": null,
37 | "id": "1ead697a",
38 | "metadata": {},
39 | "outputs": [],
40 | "source": [
41 | "%matplotlib widget\n",
42 | "\n",
43 | "plt.rcParams[\"figure.constrained_layout.use\"] = True"
44 | ]
45 | },
46 | {
47 | "cell_type": "markdown",
48 | "id": "53b553b9",
49 | "metadata": {},
50 | "source": [
51 | "# How to define a Neural Network Architecture in Keras\n",
52 | "\n",
53 | "To declare a new Network architecture, we create an instance of [`keras.Sequential`](https://keras.io/guides/sequential_model/)\n",
54 | "\n",
55 | "We can define the layers that are applied *in sequence*. \n",
56 | "\n",
57 | "Keras completely takes care about gradient computation using back propagation for us."
58 | ]
59 | },
60 | {
61 | "cell_type": "code",
62 | "execution_count": null,
63 | "id": "ff06fc7b",
64 | "metadata": {},
65 | "outputs": [],
66 | "source": [
67 | "model = keras.Sequential(\n",
68 | " name=\"fully-connected\",\n",
69 | " layers=[\n",
70 | " layers.Dense(128, activation=\"relu\", name='hidden-1'),\n",
71 | " layers.Dense(128, activation=\"relu\", name='hidden-2'),\n",
72 | " layers.Dense(10, activation=\"softmax\", name='output'),\n",
73 | " ],\n",
74 | ")\n",
75 | "\n",
76 | "model.summary()"
77 | ]
78 | },
79 | {
80 | "cell_type": "markdown",
81 | "id": "b4ad8146-9098-4efb-a40e-b9e60935fbed",
82 | "metadata": {},
83 | "source": [
84 | "Observe that the input shapes are not yet fixed. It will be determined once applied to data for the first time:"
85 | ]
86 | },
87 | {
88 | "cell_type": "code",
89 | "execution_count": null,
90 | "id": "fa3d4f8d-80de-4ba3-b7ee-28f79c01fdee",
91 | "metadata": {},
92 | "outputs": [],
93 | "source": [
94 | "dummy = np.zeros((1, 8 * 8))\n",
95 | "model(dummy)\n",
96 | "model.summary()"
97 | ]
98 | },
99 | {
100 | "cell_type": "markdown",
101 | "id": "f842e09a",
102 | "metadata": {},
103 | "source": [
104 | "Now we are building a more flexible model, where we can pass some options:"
105 | ]
106 | },
107 | {
108 | "cell_type": "code",
109 | "execution_count": null,
110 | "id": "479c7554",
111 | "metadata": {},
112 | "outputs": [],
113 | "source": [
114 | "def create_model(n_classes, n_hidden, dropout=0.25, activation=\"leaky_relu\"):\n",
115 | " return keras.Sequential(\n",
116 | " layers=[\n",
117 | " # flatten and normalize input\n",
118 | " layers.Flatten(),\n",
119 | " layers.BatchNormalization(),\n",
120 | " # first hidden layer\n",
121 | " layers.Dense(n_hidden, activation=activation),\n",
122 | " layers.BatchNormalization(),\n",
123 | " layers.Dropout(dropout),\n",
124 | " \n",
125 | " # second hidden layer\n",
126 | " layers.Dense(n_hidden, activation=activation),\n",
127 | " layers.BatchNormalization(),\n",
128 | " layers.Dropout(dropout),\n",
129 | " \n",
130 | " # output layer\n",
131 | " layers.Dense(n_classes, activation=\"softmax\")\n",
132 | " ]\n",
133 | " )\n",
134 | "\n",
135 | "model = create_model(n_classes=10, n_hidden=128)\n",
136 | "model(np.zeros((1, 28, 28)))\n",
137 | "model.summary()"
138 | ]
139 | },
140 | {
141 | "cell_type": "markdown",
142 | "id": "5f24277b",
143 | "metadata": {},
144 | "source": [
145 | "# Training\n",
146 | "\n",
147 | "Keras comes with default fit / evaluate functions.\n",
148 | "\n",
149 | "We could roll our own, but for this simple examples, we are going to use the defaults."
150 | ]
151 | },
152 | {
153 | "cell_type": "markdown",
154 | "id": "91cfebec",
155 | "metadata": {},
156 | "source": [
157 | "# MNIST"
158 | ]
159 | },
160 | {
161 | "cell_type": "code",
162 | "execution_count": null,
163 | "id": "4dca0824",
164 | "metadata": {},
165 | "outputs": [],
166 | "source": [
167 | "(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()\n",
168 | "\n",
169 | "x_train.shape, y_train.shape"
170 | ]
171 | },
172 | {
173 | "cell_type": "code",
174 | "execution_count": null,
175 | "id": "c5bed474",
176 | "metadata": {},
177 | "outputs": [],
178 | "source": [
179 | "fig, axs = plt.subplots(2, 5, figsize=(9, 3), constrained_layout=True)\n",
180 | "\n",
181 | "for i, ax in enumerate(axs.flat):\n",
182 | " ax.imshow(x_train[i], cmap='gray_r')"
183 | ]
184 | },
185 | {
186 | "cell_type": "markdown",
187 | "id": "c78b9d9f-be42-4b19-a130-81a85c33763d",
188 | "metadata": {},
189 | "source": [
190 | "Now we need to compile the model, which also defines loss function, optimizer and metrics we want to evaluate"
191 | ]
192 | },
193 | {
194 | "cell_type": "code",
195 | "execution_count": null,
196 | "id": "87f11b66",
197 | "metadata": {},
198 | "outputs": [],
199 | "source": [
200 | "model = create_model(n_classes=10, n_hidden=128)\n",
201 | "\n",
202 | "model.compile(\n",
203 | " loss=keras.losses.SparseCategoricalCrossentropy(),\n",
204 | " optimizer=keras.optimizers.Adam(learning_rate=1e-3),\n",
205 | " metrics=[\n",
206 | " keras.metrics.SparseCategoricalAccuracy(name=\"accuracy\"),\n",
207 | " ],\n",
208 | ")"
209 | ]
210 | },
211 | {
212 | "cell_type": "code",
213 | "execution_count": null,
214 | "id": "8c850d30-75e3-4028-a280-efc0e9c07d71",
215 | "metadata": {},
216 | "outputs": [],
217 | "source": [
218 | "batch_size = 128\n",
219 | "epochs = 20\n",
220 | "\n",
221 | "history = model.fit(\n",
222 | " x_train,\n",
223 | " y_train,\n",
224 | " batch_size=batch_size,\n",
225 | " epochs=epochs,\n",
226 | " validation_split=0.15,\n",
227 | ")\n",
228 | "score = model.evaluate(x_test, y_test, verbose=0)"
229 | ]
230 | },
231 | {
232 | "cell_type": "code",
233 | "execution_count": null,
234 | "id": "4f50fc2e",
235 | "metadata": {},
236 | "outputs": [],
237 | "source": [
238 | "from matplotlib.ticker import IndexLocator\n",
239 | "\n",
240 | "def plot_losses(history):\n",
241 | " fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True)\n",
242 | " \n",
243 | " ax1.plot(history.history[\"loss\"], label=\"train\")\n",
244 | " ax1.plot(history.history[\"val_loss\"], label=\"validation\")\n",
245 | "\n",
246 | " ax2.plot(history.history[\"accuracy\"], label=\"train\")\n",
247 | " ax2.plot(history.history[\"val_accuracy\"], label=\"validation\")\n",
248 | "\n",
249 | " ax1.set(\n",
250 | " ylabel=\"loss\",\n",
251 | " )\n",
252 | " ax2.set(\n",
253 | " xlabel=\"epoch\",\n",
254 | " ylabel=\"accuracy\",\n",
255 | " )\n",
256 | "\n",
257 | " ax1.legend()\n",
258 | " ax2.legend()\n",
259 | " ax2.xaxis.set_major_locator(IndexLocator(2, 0))"
260 | ]
261 | },
262 | {
263 | "cell_type": "code",
264 | "execution_count": null,
265 | "id": "5bf4e9ae",
266 | "metadata": {},
267 | "outputs": [],
268 | "source": [
269 | "plot_losses(history)"
270 | ]
271 | },
272 | {
273 | "cell_type": "markdown",
274 | "id": "9fc3a125-0d46-4779-b62c-581f61505ae6",
275 | "metadata": {},
276 | "source": [
277 | "The validation loss is lower in the beginning, mainly due to two reasons:\n",
278 | "\n",
279 | "* The model is learning fast and the train loss is the mean over the epoch while the validation loss is evaluated at the end of the epoch.\n",
280 | "* Dropout is active for the training evaluation, but the validation uses the full network"
281 | ]
282 | },
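{
"cell_type": "markdown",
"id": "dropout-demo-markdown",
"metadata": {},
"source": [
"To illustrate the second point: a dropout layer behaves differently in training and in inference mode. This is just a small demonstration and is not needed for the results above."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "dropout-demo-code",
"metadata": {},
"outputs": [],
"source": [
"# dropout zeroes a random subset of activations during training (and rescales the rest),\n",
"# but acts as the identity during inference\n",
"dropout = layers.Dropout(0.5)\n",
"x = ops.ones((1, 8))\n",
"\n",
"print(dropout(x, training=True))\n",
"print(dropout(x, training=False))"
]
},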
283 | {
284 | "cell_type": "markdown",
285 | "id": "6a6fee71",
286 | "metadata": {},
287 | "source": [
288 | "# CIFAR-10"
289 | ]
290 | },
291 | {
292 | "cell_type": "code",
293 | "execution_count": null,
294 | "id": "91821c11",
295 | "metadata": {},
296 | "outputs": [],
297 | "source": [
298 | "(x_train, y_train), (x_test, y_test) = keras.datasets.cifar10.load_data()"
299 | ]
300 | },
301 | {
302 | "cell_type": "code",
303 | "execution_count": null,
304 | "id": "8c290b31-5d0c-4442-aa2b-ff81edd80488",
305 | "metadata": {},
306 | "outputs": [],
307 | "source": [
308 | "x_train.shape, y_train.shape"
309 | ]
310 | },
311 | {
312 | "cell_type": "code",
313 | "execution_count": null,
314 | "id": "b9006984-eca8-4366-aa95-bcfe8851f76e",
315 | "metadata": {},
316 | "outputs": [],
317 | "source": [
318 | "cifar10_classes = dict(enumerate([\"airplane\", \"automobile\", \"bird\", \"cat\", \"deer\", \"dog\", \"frog\", \"horse\", \"ship\", \"truck\"]))"
319 | ]
320 | },
321 | {
322 | "cell_type": "code",
323 | "execution_count": null,
324 | "id": "fc54c5a2",
325 | "metadata": {},
326 | "outputs": [],
327 | "source": [
328 | "fig, axs = plt.subplots(4, 4, figsize=(9, 9), constrained_layout=True)\n",
329 | "\n",
330 | "rng = np.random.default_rng(0)\n",
331 | "indices = rng.choice(len(y_train), size=axs.size)\n",
332 | "\n",
333 | "for idx, ax in zip(indices, axs.flat):\n",
334 | "\n",
335 | " img = x_train[idx] \n",
336 | "\n",
337 | " ax.set_title(cifar10_classes[y_train[idx, 0]])\n",
338 | " ax.imshow(img)\n",
339 | " ax.set_axis_off()"
340 | ]
341 | },
342 | {
343 | "cell_type": "code",
344 | "execution_count": null,
345 | "id": "e28c57b5-5a7d-463c-a619-90db458eb264",
346 | "metadata": {},
347 | "outputs": [],
348 | "source": [
349 | "model = create_model(n_classes=len(cifar10_classes), n_hidden=128)\n",
350 | "\n",
351 | "\n",
352 | "model.compile(\n",
353 | " loss=keras.losses.SparseCategoricalCrossentropy(),\n",
354 | " optimizer=keras.optimizers.Adam(learning_rate=1e-3),\n",
355 | " metrics=[\n",
356 | " keras.metrics.SparseCategoricalAccuracy(name=\"accuracy\"),\n",
357 | " ],\n",
358 | ")"
359 | ]
360 | },
361 | {
362 | "cell_type": "code",
363 | "execution_count": null,
364 | "id": "77e19540-45dc-492b-b929-3169e4234bd2",
365 | "metadata": {},
366 | "outputs": [],
367 | "source": [
368 | "batch_size = 128\n",
369 | "epochs = 30\n",
370 | "\n",
371 | "history = model.fit(\n",
372 | " x_train,\n",
373 | " y_train,\n",
374 | " batch_size=batch_size,\n",
375 | " epochs=epochs,\n",
376 | " validation_split=0.15,\n",
377 | ")\n",
378 | "score = model.evaluate(x_test, y_test, verbose=0)\n",
379 | "score"
380 | ]
381 | },
382 | {
383 | "cell_type": "code",
384 | "execution_count": null,
385 | "id": "271bd140-2030-4463-a29e-1a08f4fff668",
386 | "metadata": {},
387 | "outputs": [],
388 | "source": [
389 | "plot_losses(history)"
390 | ]
391 | },
392 | {
393 | "cell_type": "markdown",
394 | "id": "d5481c6d-0e64-4232-a66f-060cb62b019f",
395 | "metadata": {},
396 | "source": [
397 | "We do not get much better than 50 % with a fully connected network.\n",
398 | "\n",
399 | "Let's try a convolutional network. First a relatively simple one, based on the Keras examples:"
400 | ]
401 | },
402 | {
403 | "cell_type": "code",
404 | "execution_count": null,
405 | "id": "9f8a2b81-c535-4040-bfd6-dfa07b18bd4a",
406 | "metadata": {},
407 | "outputs": [],
408 | "source": [
409 | "input_shape = x_train[0].shape\n",
410 | "n_classes = len(cifar10_classes)\n",
411 | "\n",
412 | "model = keras.Sequential(\n",
413 | " [\n",
414 | " keras.Input(shape=input_shape),\n",
415 | " layers.BatchNormalization(),\n",
416 | " layers.Conv2D(32, kernel_size=(3, 3), activation=\"relu\"),\n",
417 | " layers.MaxPooling2D(pool_size=(2, 2)),\n",
418 | " layers.Conv2D(64, kernel_size=(3, 3), activation=\"relu\"),\n",
419 | " layers.MaxPooling2D(pool_size=(2, 2)),\n",
420 | " layers.Flatten(),\n",
421 | " layers.Dropout(0.5),\n",
422 | " layers.BatchNormalization(),\n",
423 | " layers.Dense(n_classes, activation=\"softmax\"),\n",
424 | " ]\n",
425 | ")\n",
426 | "\n",
427 | "model.compile(\n",
428 | " loss=keras.losses.SparseCategoricalCrossentropy(),\n",
429 | " optimizer=keras.optimizers.Adam(learning_rate=1e-3),\n",
430 | " metrics=[\n",
431 | " keras.metrics.SparseCategoricalAccuracy(name=\"accuracy\"),\n",
432 | " ],\n",
433 | ")\n",
434 | "\n",
435 | "model.summary()"
436 | ]
437 | },
438 | {
439 | "cell_type": "code",
440 | "execution_count": null,
441 | "id": "763c4182-81a0-4cda-aa91-c7edadd48862",
442 | "metadata": {},
443 | "outputs": [],
444 | "source": [
445 | "batch_size = 128\n",
446 | "epochs = 20\n",
447 | "\n",
448 | "history = model.fit(\n",
449 | " x_train,\n",
450 | " y_train,\n",
451 | " batch_size=batch_size,\n",
452 | " epochs=epochs,\n",
453 | " validation_split=0.15,\n",
454 | ")\n",
455 | "score = model.evaluate(x_test, y_test, verbose=0)\n",
456 | "score"
457 | ]
458 | },
459 | {
460 | "cell_type": "code",
461 | "execution_count": null,
462 | "id": "1ae836f7-4ee4-42a7-9b41-e2b0f87abd3b",
463 | "metadata": {},
464 | "outputs": [],
465 | "source": [
466 | "plot_losses(history)"
467 | ]
468 | },
469 | {
470 | "cell_type": "markdown",
471 | "id": "ac5df220-abb7-47e3-824c-0828c3c347d1",
472 | "metadata": {},
473 | "source": [
474 | "Let's try another architecture, from https://arxiv.org/abs/1409.1556\n",
475 | "\n",
476 | "> Very Deep Convolutional Networks for Large-Scale Image Recognition \n",
477 | "> Karen Simonyan, Andrew Zisserman\n",
478 | "\n",
479 | "> In this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting. Our main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3x3) convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers. These findings were the basis of our ImageNet Challenge 2014 submission, where our team secured the first and the second places in the localisation and classification tracks respectively. We also show that our representations generalise well to other datasets, where they achieve state-of-the-art results. We have made our two best-performing ConvNet models publicly available to facilitate further research on the use of deep visual representations in computer vision. "
480 | ]
481 | },
482 | {
483 | "cell_type": "code",
484 | "execution_count": null,
485 | "id": "b3f1c80a-85e7-404e-90b5-65a5dc9e6334",
486 | "metadata": {},
487 | "outputs": [],
488 | "source": [
489 | "model = keras.Sequential(\n",
490 | " [\n",
491 | " keras.Input(shape=input_shape),\n",
492 | "\n",
493 | " # first convolutional stack\n",
494 | " layers.Conv2D(32, kernel_size=(3, 3), activation=\"leaky_relu\", padding=\"same\"),\n",
495 | " layers.Conv2D(32, kernel_size=(3, 3), activation=\"leaky_relu\", padding=\"same\"),\n",
496 | " layers.BatchNormalization(),\n",
497 | " layers.MaxPooling2D(pool_size=(2, 2)),\n",
498 | " layers.Dropout(0.25),\n",
499 | "\n",
500 | " # second convolutional stack\n",
501 | " layers.Conv2D(64, kernel_size=(3, 3), activation=\"leaky_relu\", padding=\"same\"),\n",
502 | " layers.Conv2D(64, kernel_size=(3, 3), activation=\"leaky_relu\", padding=\"same\"),\n",
503 | " layers.BatchNormalization(),\n",
504 | " layers.MaxPooling2D(pool_size=(2, 2)),\n",
505 | " layers.Dropout(0.25),\n",
506 | "\n",
507 | " # third convolutional stack\n",
508 | " layers.Conv2D(128, kernel_size=(3, 3), activation=\"leaky_relu\", padding=\"same\"),\n",
509 | " layers.Conv2D(128, kernel_size=(3, 3), activation=\"leaky_relu\", padding=\"same\"),\n",
510 | " layers.BatchNormalization(),\n",
511 | " layers.MaxPooling2D(pool_size=(2, 2)),\n",
512 | " layers.Dropout(0.25),\n",
513 | "\n",
514 | " # fully-connected part\n",
515 | " layers.Flatten(),\n",
516 | " layers.Dense(128, activation=\"leaky_relu\"),\n",
517 | " layers.Dropout(0.25),\n",
518 | " layers.Dense(n_classes, activation=\"softmax\"),\n",
519 | " ]\n",
520 | ")\n",
521 | "\n",
522 | "model.compile(\n",
523 | " loss=keras.losses.SparseCategoricalCrossentropy(),\n",
524 | " optimizer=keras.optimizers.Adam(learning_rate=1e-3),\n",
525 | " metrics=[\n",
526 | " keras.metrics.SparseCategoricalAccuracy(name=\"accuracy\"),\n",
527 | " ],\n",
528 | ")\n",
529 | "\n",
530 | "model.summary()"
531 | ]
532 | },
533 | {
534 | "cell_type": "code",
535 | "execution_count": null,
536 | "id": "0a505340-eb78-43af-9b01-17c1fab7323c",
537 | "metadata": {},
538 | "outputs": [],
539 | "source": [
540 | "batch_size = 64\n",
541 | "epochs = 30\n",
542 | "\n",
543 | "history = model.fit(\n",
544 | " x_train,\n",
545 | " y_train,\n",
546 | " batch_size=batch_size,\n",
547 | " epochs=epochs,\n",
548 | " validation_split=0.15,\n",
549 | ")"
550 | ]
551 | },
552 | {
553 | "cell_type": "code",
554 | "execution_count": null,
555 | "id": "020ce7e7-5fa0-4e29-bc78-3d485fea0040",
556 | "metadata": {},
557 | "outputs": [],
558 | "source": [
559 | "test_loss, test_accuracy = model.evaluate(x_test, y_test, verbose=0)"
560 | ]
561 | },
562 | {
563 | "cell_type": "code",
564 | "execution_count": null,
565 | "id": "09c3967e-c577-4718-b14e-abc358f71e5c",
566 | "metadata": {},
567 | "outputs": [],
568 | "source": [
569 | "plot_losses(history)"
570 | ]
571 | },
572 | {
573 | "cell_type": "code",
574 | "execution_count": null,
575 | "id": "56b3bbc9-d16d-442e-817b-76d18c9f25d6",
576 | "metadata": {},
577 | "outputs": [],
578 | "source": [
579 | "predicted_score = model.predict(x_test)\n",
580 | "prediction = np.argmax(predicted_score, axis=1)"
581 | ]
582 | },
583 | {
584 | "cell_type": "code",
585 | "execution_count": null,
586 | "id": "e8b4cd99-2d4c-459d-b399-5c9df01dd326",
587 | "metadata": {},
588 | "outputs": [],
589 | "source": [
590 | "prediction"
591 | ]
592 | },
593 | {
594 | "cell_type": "code",
595 | "execution_count": null,
596 | "id": "c330ae0d-d02c-49e7-b4d3-64802ab5aca3",
597 | "metadata": {},
598 | "outputs": [],
599 | "source": [
600 | "test_accuracy"
601 | ]
602 | },
603 | {
604 | "cell_type": "code",
605 | "execution_count": null,
606 | "id": "0f1da51d",
607 | "metadata": {},
608 | "outputs": [],
609 | "source": [
610 | "from sklearn.metrics import confusion_matrix\n",
611 | "\n",
612 | "\n",
613 | "classes = list(cifar10_classes.values())\n",
614 | "\n",
615 | "\n",
616 | "matrix = confusion_matrix(y_test, prediction)\n",
617 | "matrix = np.divide(matrix, matrix.sum(axis = 1))\n",
618 | "\n",
619 | "fig, ax = plt.subplots()\n",
620 | "\n",
621 | "mat = ax.matshow(matrix)\n",
622 | "ax.set_xticks(np.arange(len(classes)))\n",
623 | "ax.set_xticklabels(classes, rotation=90)\n",
624 | "\n",
625 | "ax.set_yticks(np.arange(len(classes)))\n",
626 | "ax.set_yticklabels(classes)\n",
627 | "\n",
628 | "fig.colorbar(mat)\n",
629 | " \n",
630 | "None"
631 | ]
632 | },
633 | {
634 | "cell_type": "markdown",
635 | "id": "9956dd0a-5939-496f-9146-3abbe38e7e2a",
636 | "metadata": {},
637 | "source": [
638 | "## Fashion-MNIST"
639 | ]
640 | },
641 | {
642 | "cell_type": "code",
643 | "execution_count": null,
644 | "id": "c5c93955-9266-403f-8e93-16f37a9f3a65",
645 | "metadata": {},
646 | "outputs": [],
647 | "source": [
648 | "(x_train, y_train), (x_test, y_test) = keras.datasets.fashion_mnist.load_data()\n"
649 | ]
650 | },
651 | {
652 | "cell_type": "code",
653 | "execution_count": null,
654 | "id": "7a1b8cb2-b70f-4500-96d0-58dcfcfde639",
655 | "metadata": {},
656 | "outputs": [],
657 | "source": [
658 | "x_train.shape, y_train.shape"
659 | ]
660 | },
661 | {
662 | "cell_type": "code",
663 | "execution_count": null,
664 | "id": "8091ddd7-20c2-4b2e-9dc6-84773572a0fd",
665 | "metadata": {},
666 | "outputs": [],
667 | "source": [
668 | "fashion_mnist_classes = {\n",
669 | " 0: \"T-shirt/top\",\n",
670 | " 1: \"Trouser\",\n",
671 | " 2: \"Pullover\",\n",
672 | " 3: \"Dress\",\n",
673 | " 4: \"Coat\",\n",
674 | " 5: \"Sandal\",\n",
675 | " 6: \"Shirt\",\n",
676 | " 7: \"Sneaker\",\n",
677 | " 8: \"Bag\",\n",
678 | " 9: \"Ankle boot\",\n",
679 | "}"
680 | ]
681 | },
682 | {
683 | "cell_type": "code",
684 | "execution_count": null,
685 | "id": "54307794-2363-494c-ae20-81bc144376ca",
686 | "metadata": {},
687 | "outputs": [],
688 | "source": [
689 | "fig, axs = plt.subplots(4, 4, figsize=(9, 9), constrained_layout=True)\n",
690 | "\n",
691 | "rng = np.random.default_rng(0)\n",
692 | "indices = rng.choice(len(y_train), size=axs.size)\n",
693 | "\n",
694 | "for idx, ax in zip(indices, axs.flat):\n",
695 | "\n",
696 | " img = x_train[idx] \n",
697 | "\n",
698 | " ax.set_title(fashion_mnist_classes[y_train[idx]])\n",
699 | " ax.imshow(img, cmap='gray_r')\n",
700 | " ax.set_axis_off()"
701 | ]
702 | },
703 | {
704 | "cell_type": "code",
705 | "execution_count": null,
706 | "id": "749669b3-0119-4d77-8f05-498d82c11b91",
707 | "metadata": {},
708 | "outputs": [],
709 | "source": [
710 | "# add one dimension for the \"color channels\"\n",
711 | "\n",
712 | "x_train = x_train[..., np.newaxis]\n",
713 | "x_test = x_test[..., np.newaxis]"
714 | ]
715 | },
716 | {
717 | "cell_type": "code",
718 | "execution_count": null,
719 | "id": "a210838e-74d8-43c4-b959-9420bd03ed72",
720 | "metadata": {},
721 | "outputs": [],
722 | "source": [
723 | "input_shape = x_train[0].shape\n",
724 | "n_classes = len(fashion_mnist_classes)\n",
725 | "\n",
726 | "\n",
727 | "model = keras.Sequential(\n",
728 | " [\n",
729 | " keras.Input(shape=input_shape),\n",
730 | "\n",
731 | " # first convolutional stack\n",
732 | " layers.Conv2D(32, kernel_size=(3, 3), activation=\"leaky_relu\", padding=\"same\"),\n",
733 | " layers.Conv2D(32, kernel_size=(3, 3), activation=\"leaky_relu\", padding=\"same\"),\n",
734 | " layers.BatchNormalization(),\n",
735 | " layers.MaxPooling2D(pool_size=(2, 2)),\n",
736 | " layers.Dropout(0.25),\n",
737 | "\n",
738 | " # second convolutional stack\n",
739 | " layers.Conv2D(64, kernel_size=(3, 3), activation=\"leaky_relu\", padding=\"same\"),\n",
740 | " layers.Conv2D(64, kernel_size=(3, 3), activation=\"leaky_relu\", padding=\"same\"),\n",
741 | " layers.BatchNormalization(),\n",
742 | " layers.MaxPooling2D(pool_size=(2, 2)),\n",
743 | " layers.Dropout(0.25),\n",
744 | "\n",
745 | " # third convolutional stack\n",
746 | " layers.Conv2D(128, kernel_size=(3, 3), activation=\"leaky_relu\", padding=\"same\"),\n",
747 | " layers.Conv2D(128, kernel_size=(3, 3), activation=\"leaky_relu\", padding=\"same\"),\n",
748 | " layers.BatchNormalization(),\n",
749 | " layers.MaxPooling2D(pool_size=(2, 2)),\n",
750 | " layers.Dropout(0.25),\n",
751 | "\n",
752 | " # fully-connected part\n",
753 | " layers.Flatten(),\n",
754 | " layers.Dense(128, activation=\"leaky_relu\"),\n",
755 | " layers.Dropout(0.25),\n",
756 | " layers.Dense(n_classes, activation=\"softmax\"),\n",
757 | " ]\n",
758 | ")\n",
759 | "\n",
760 | "model.compile(\n",
761 | " loss=keras.losses.SparseCategoricalCrossentropy(),\n",
762 | " optimizer=keras.optimizers.Adam(learning_rate=1e-3),\n",
763 | " metrics=[\n",
764 | " keras.metrics.SparseCategoricalAccuracy(name=\"accuracy\"),\n",
765 | " ],\n",
766 | ")\n",
767 | "\n",
768 | "model.summary()"
769 | ]
770 | },
771 | {
772 | "cell_type": "code",
773 | "execution_count": null,
774 | "id": "e888251d-fb8d-4349-9801-fb8886597fa3",
775 | "metadata": {},
776 | "outputs": [],
777 | "source": [
778 | "batch_size = 64\n",
779 | "epochs = 30\n",
780 | "\n",
781 | "history = model.fit(\n",
782 | " x_train,\n",
783 | " y_train,\n",
784 | " batch_size=batch_size,\n",
785 | " epochs=epochs,\n",
786 | " validation_split=0.15,\n",
787 | ")"
788 | ]
789 | },
790 | {
791 | "cell_type": "markdown",
792 | "id": "69c468e7-28c2-4330-a93e-b51cf35ddc0d",
793 | "metadata": {},
794 | "source": [
795 | "Further links:\n",
796 | "* [Keras Documentation](https://keras.io)\n",
797 | "* [Keras Quickstart Tutorial](https://keras.io/getting_started/intro_to_keras_for_engineers/)\n",
798 | "* [Keras Examples](https://keras.io/examples/)"
799 | ]
800 | },
801 | {
802 | "cell_type": "markdown",
803 | "id": "52044197-fa81-4186-afe6-84418c1fb990",
804 | "metadata": {},
805 | "source": [
806 | "The best current performance claimed on CIFAR-10 is 99.5 % accuracy:\n",
807 | "\n",
808 | "https://en.wikipedia.org/wiki/CIFAR-10#Research_papers_claiming_state-of-the-art_results_on_CIFAR-10"
809 | ]
810 | }
811 | ],
812 | "metadata": {
813 | "kernelspec": {
814 | "display_name": "Python 3 (ipykernel)",
815 | "language": "python",
816 | "name": "python3"
817 | },
818 | "language_info": {
819 | "codemirror_mode": {
820 | "name": "ipython",
821 | "version": 3
822 | },
823 | "file_extension": ".py",
824 | "mimetype": "text/x-python",
825 | "name": "python",
826 | "nbconvert_exporter": "python",
827 | "pygments_lexer": "ipython3",
828 | "version": "3.12.4"
829 | }
830 | },
831 | "nbformat": 4,
832 | "nbformat_minor": 5
833 | }
834 |
--------------------------------------------------------------------------------
/smd_neural_networks_torch.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "id": "2a42f6dc",
6 | "metadata": {},
7 | "source": [
8 | "# Short Introduction to Neural Networks and Deep Learning with Pytorch"
9 | ]
10 | },
11 | {
12 | "cell_type": "code",
13 | "execution_count": null,
14 | "id": "2d86167c",
15 | "metadata": {},
16 | "outputs": [],
17 | "source": [
18 | "import matplotlib.pyplot as plt\n",
19 | "import numpy as np\n",
20 | "\n",
21 | "from tqdm.auto import tqdm\n",
22 | "\n",
23 | "import torch\n",
24 | "from torch import nn\n",
25 | "from torch.utils.data import DataLoader\n",
26 | "from torchvision import datasets\n",
27 | "from torchvision.transforms import ToTensor, Resize, Compose"
28 | ]
29 | },
30 | {
31 | "cell_type": "code",
32 | "execution_count": null,
33 | "id": "1ead697a",
34 | "metadata": {},
35 | "outputs": [],
36 | "source": [
37 | "%matplotlib widget"
38 | ]
39 | },
40 | {
41 | "cell_type": "markdown",
42 | "id": "53b553b9",
43 | "metadata": {},
44 | "source": [
45 | "# How to define a Neural Network Architecture in Torch\n",
46 | "\n",
47 | "To declare a new Network architecture, we create a new class inheriting from `torch.nn.Model`.\n",
48 | "\n",
49 | "The simplest way to declare a Network architecture is to declare the sequence of layers using `torch.nn.Sequential`\n",
50 | "in `__init__` and we have to implement the `forward` pass. The rest is taken care of by torch (gradients, backword propagation, ...) automagically.\n",
51 | "\n",
52 | "Torch builds a computational graph, that can be executed (on different devices) and transformed (e.g. calculate the gradient)."
53 | ]
54 | },
55 | {
56 | "cell_type": "code",
57 | "execution_count": null,
58 | "id": "ff06fc7b",
59 | "metadata": {},
60 | "outputs": [],
61 | "source": [
62 | "class FullyConnected(nn.Module):\n",
63 | " \n",
64 | " def __init__(self):\n",
65 | " super().__init__()\n",
66 | " \n",
67 | " self.fc = nn.Sequential(\n",
68 | " nn.Linear(8 * 8 * 1, 128),\n",
69 | " nn.ReLU(),\n",
70 | " nn.Linear(128, 128),\n",
71 | " nn.ReLU(),\n",
72 | " nn.Linear(128, 10),\n",
73 | " nn.Softmax(dim=1)\n",
74 | " )\n",
75 | " self.flatten = nn.Flatten()\n",
76 | " \n",
77 | " def forward(self, x):\n",
78 | " # x.shape = (batchsize, 1, 8, 8)\n",
79 | " x = self.flatten(x)\n",
80 | " x = self.fc(x)\n",
81 | " return x"
82 | ]
83 | },
84 | {
85 | "cell_type": "code",
86 | "execution_count": null,
87 | "id": "17f6b9e2",
88 | "metadata": {},
89 | "outputs": [],
90 | "source": [
91 | "net = FullyConnected()\n",
92 | "net"
93 | ]
94 | },
95 | {
96 | "cell_type": "markdown",
97 | "id": "f842e09a",
98 | "metadata": {},
99 | "source": [
100 | "Now we are building a more flexible model, where we can pass some options:"
101 | ]
102 | },
103 | {
104 | "cell_type": "code",
105 | "execution_count": null,
106 | "id": "479c7554",
107 | "metadata": {},
108 | "outputs": [],
109 | "source": [
110 | "class FullyConnected(nn.Module):\n",
111 | " def __init__(self, input_size, n_classes, dropout=0.25, n_hidden=256):\n",
112 | " super().__init__()\n",
113 | " self.flatten = nn.Flatten()\n",
114 | " \n",
115 | " self.fc_stack = nn.Sequential(\n",
116 | " nn.BatchNorm1d(input_size),\n",
117 | " \n",
118 | " # First Hidden Layer\n",
119 | " nn.Linear(input_size, n_hidden),\n",
120 | " nn.BatchNorm1d(n_hidden),\n",
121 | " nn.Dropout(dropout),\n",
122 | " nn.LeakyReLU(),\n",
123 | " \n",
124 | " # Second Hidden Layer\n",
125 | " nn.Linear(n_hidden, n_hidden),\n",
126 | " nn.BatchNorm1d(n_hidden),\n",
127 | " nn.Dropout(dropout),\n",
128 | " nn.LeakyReLU(),\n",
129 | " \n",
130 | " # Output Layer\n",
131 | " nn.Linear(n_hidden, n_classes),\n",
132 | " nn.Softmax(dim=1),\n",
133 | " )\n",
134 | "\n",
135 | " def forward(self, x):\n",
136 | " x = self.flatten(x)\n",
137 | " x = self.fc_stack(x)\n",
138 | " return x\n",
139 | " \n",
140 | "net = FullyConnected(input_size=3 * 50 * 50, n_classes=2)\n",
141 | "net"
142 | ]
143 | },
144 | {
145 | "cell_type": "markdown",
146 | "id": "5f24277b",
147 | "metadata": {},
148 | "source": [
149 | "# Training\n",
150 | "\n",
151 | "Unfortunately, training the network is not as simple as calling `fit` like in sklearn.\n",
152 | "Torch is a very flexible framework, and we have to decide for the data loader, loss function, the optimizer, the model, device and how we evaluate the performance on the test data set.\n",
153 | "\n",
154 | "In the end, we are going to write our own `fit` function, to make it simpler."
155 | ]
156 | },
157 | {
158 | "cell_type": "code",
159 | "execution_count": null,
160 | "id": "435aef2a",
161 | "metadata": {},
162 | "outputs": [],
163 | "source": [
164 | "# device = \"cpu\"\n",
165 | "# uncomment to use GPU if available\n",
166 | "# CPU offers better debugging\n",
167 | "DEVICE = \"cuda\" if torch.cuda.is_available() else \"cpu\"\n",
168 | "print(\"Using {} device\".format(DEVICE))"
169 | ]
170 | },
171 | {
172 | "cell_type": "code",
173 | "execution_count": null,
174 | "id": "1a43b6fa",
175 | "metadata": {},
176 | "outputs": [],
177 | "source": [
178 | "def train(dataloader, model, loss_fn, optimizer, device=DEVICE): \n",
179 | " model = model.to(device)\n",
180 | " model.train()\n",
181 | " \n",
182 | " # losses = []\n",
183 | " for X, y in dataloader:\n",
184 | " X, y = X.to(device), y.to(device)\n",
185 | " \n",
186 | " # Compute prediction error\n",
187 | " pred = model(X)\n",
188 | " loss = loss_fn(pred, y)\n",
189 | " \n",
190 | " # Backpropagation\n",
191 | " optimizer.zero_grad()\n",
192 | " loss.backward()\n",
193 | " optimizer.step()\n",
194 | " \n",
195 | " # # store loss for plotting\n",
196 | " # losses.append(loss.item())\n",
197 | " # return losses\n",
198 | "\n",
199 | " \n",
200 | "def test(dataloader, model, loss_fn, device=DEVICE):\n",
201 | " model = model.to(device)\n",
202 | " test_losses = []\n",
203 | " with torch.no_grad():\n",
204 | " model.eval()\n",
205 | " for X, y in dataloader:\n",
206 | " X, y = X.to(device), y.to(device)\n",
207 | " pred = model(X)\n",
208 | " test_losses.append(loss_fn(pred, y).item())\n",
209 | "\n",
210 | " return test_losses\n",
211 | "\n",
212 | "\n",
213 | "def fit_one_epoch(train_dataloader, test_dataloader, model, loss_fn, optimizer, device=DEVICE):\n",
214 | " train(train_dataloader, model, loss_fn, optimizer, device)\n",
215 | "\n",
216 | " train_losses = test(train_dataloader, model, loss_fn, device)\n",
217 | " test_losses = test(test_dataloader, model, loss_fn, device)\n",
218 | " return train_losses, test_losses\n",
219 | "\n",
220 | "\n",
221 | "def accuracy(dataloader, model, device=DEVICE):\n",
222 | " correct = 0\n",
223 | " total = 0\n",
224 | " model = model.to(device)\n",
225 | " with torch.no_grad():\n",
226 | " model.eval()\n",
227 | " for X, y in dataloader:\n",
228 | " X, y = X.to(device), y.to(device)\n",
229 | " pred = model(X)\n",
230 | " total += len(y)\n",
231 | " correct += (pred.argmax(1) == y).type(torch.float).sum().item()\n",
232 | " return correct / total\n",
233 | "\n",
234 | "\n",
235 | "def predictions(dataloader, model, device=DEVICE):\n",
236 | " predictions = []\n",
237 | " truth = []\n",
238 | " model = model.to(device)\n",
239 | " with torch.no_grad():\n",
240 | " model.eval()\n",
241 | " for X, y in dataloader:\n",
242 | " X, y = X.to(device), y.to(device)\n",
243 | " predictions.append(model(X).argmax(1))\n",
244 | " truth.append(y)\n",
245 | " return torch.cat(predictions), torch.cat(truth)\n",
246 | "\n",
247 | "\n",
248 | "def report_accuracy(test_dataloader, train_dataloader, model):\n",
249 | " accuracy_test = accuracy(test_dataloader, model)\n",
250 | " accuracy_train = accuracy(train_dataloader, model)\n",
251 | " print(f'Accuracy: train={accuracy_train:5.1%}, test={accuracy_test:5.1%}')"
252 | ]
253 | },
254 | {
255 | "cell_type": "markdown",
256 | "id": "91cfebec",
257 | "metadata": {},
258 | "source": [
259 | "# MNIST"
260 | ]
261 | },
262 | {
263 | "cell_type": "code",
264 | "execution_count": null,
265 | "id": "4dca0824",
266 | "metadata": {},
267 | "outputs": [],
268 | "source": [
269 | "mnist_train = datasets.MNIST(\n",
270 | " root=\"data\",\n",
271 | " train=True,\n",
272 | " transform=Compose([Resize((16, 16)), ToTensor()]),\n",
273 | " download=True,\n",
274 | ")\n",
275 | "\n",
276 | "mnist_test = datasets.MNIST(\n",
277 | " root=\"data\",\n",
278 | " train=False,\n",
279 | " transform=Compose([Resize((16, 16)), ToTensor()]),\n",
280 | " download=True,\n",
281 | "\n",
282 | ")"
283 | ]
284 | },
285 | {
286 | "cell_type": "code",
287 | "execution_count": null,
288 | "id": "b5f71853",
289 | "metadata": {},
290 | "outputs": [],
291 | "source": [
292 | "batch_size = 64\n",
293 | "\n",
294 | "train_dataloader = DataLoader(mnist_train, batch_size=batch_size, shuffle=True)\n",
295 | "test_dataloader = DataLoader(mnist_test, batch_size=batch_size, shuffle=True)\n",
296 | "\n",
297 | "\n",
298 | "# get first batch\n",
299 | "X, y = next(iter(test_dataloader))\n",
300 | "\n",
301 | "print(\"Shape of X: \", X.shape)\n",
302 | "print(\"Shape of y: \", y.shape)"
303 | ]
304 | },
305 | {
306 | "cell_type": "code",
307 | "execution_count": null,
308 | "id": "c5bed474",
309 | "metadata": {},
310 | "outputs": [],
311 | "source": [
312 | "fig, axs = plt.subplots(2, 5, figsize=(9, 3), constrained_layout=True)\n",
313 | "\n",
314 | "for i, ax in enumerate(axs.flat):\n",
315 | " ax.imshow(X[i, 0], cmap='gray_r')"
316 | ]
317 | },
318 | {
319 | "cell_type": "code",
320 | "execution_count": null,
321 | "id": "87f11b66",
322 | "metadata": {},
323 | "outputs": [],
324 | "source": [
325 | "model = FullyConnected(\n",
326 | " input_size=X[0].shape.numel(),\n",
327 | " n_classes=len(mnist_train.classes),\n",
328 | " n_hidden=256,\n",
329 | " dropout=0.25,\n",
330 | ")\n",
331 | "\n",
332 | "loss_fn = nn.CrossEntropyLoss()\n",
333 | "optimizer = torch.optim.AdamW(model.parameters())\n",
334 | "\n",
335 | "\n",
336 | "epochs = 20\n",
337 | "test_losses = []\n",
338 | "train_losses = []\n",
339 | "report_accuracy(test_dataloader, train_dataloader, model)\n",
340 | "for t in tqdm(range(epochs)):\n",
341 | " epoch_loss_train, epoch_loss_test = fit_one_epoch(\n",
342 | " train_dataloader, test_dataloader, model, loss_fn, optimizer\n",
343 | " )\n",
344 | " train_losses.append(epoch_loss_train)\n",
345 | " test_losses.append(epoch_loss_test)\n",
346 | " report_accuracy(test_dataloader, train_dataloader, model)\n",
347 | " \n",
348 | "print(\"Done!\")"
349 | ]
350 | },
351 | {
352 | "cell_type": "code",
353 | "execution_count": null,
354 | "id": "4f50fc2e",
355 | "metadata": {},
356 | "outputs": [],
357 | "source": [
358 | "def plot_losses(train_losses, test_losses):\n",
359 | " plt.figure()\n",
360 | " \n",
361 | " for i, (label, losses) in enumerate(zip((\"Train\", \"Test\"), (train_losses, test_losses))):\n",
362 | " losses = np.array(losses)\n",
363 | " \n",
364 | " x = np.linspace(0, len(losses), losses.size)\n",
365 | " plt.plot(x, losses.ravel(), label=f'Loss {label}', color=f'C{i}', alpha=0.5)\n",
366 | " \n",
367 | " mean_loss = losses.mean(axis=1)\n",
368 | " x = np.arange(1, len(mean_loss) + 1)\n",
369 | " plt.plot(x, mean_loss, label=f'Mean Epoch Loss {label}', color=f'C{i}', zorder=3, marker='.')\n",
370 | " \n",
371 | " plt.xlabel('Epoch')\n",
372 | " plt.legend()"
373 | ]
374 | },
375 | {
376 | "cell_type": "code",
377 | "execution_count": null,
378 | "id": "5bf4e9ae",
379 | "metadata": {},
380 | "outputs": [],
381 | "source": [
382 | "plot_losses(train_losses, test_losses)\n",
383 | "# plt.yscale('log')\n",
384 | "plt.grid()"
385 | ]
386 | },
387 | {
388 | "cell_type": "markdown",
389 | "id": "6a6fee71",
390 | "metadata": {},
391 | "source": [
392 | "# CIFAR-10"
393 | ]
394 | },
395 | {
396 | "cell_type": "code",
397 | "execution_count": null,
398 | "id": "91821c11",
399 | "metadata": {},
400 | "outputs": [],
401 | "source": [
402 | "cifar10_train = datasets.CIFAR10(\n",
403 | " root=\"data\",\n",
404 | " train=True,\n",
405 | " transform=ToTensor(),\n",
406 | " download=True,\n",
407 | ")\n",
408 | "\n",
409 | "cifar10_test = datasets.CIFAR10(\n",
410 | " root=\"data\",\n",
411 | " train=False,\n",
412 | " transform=ToTensor(),\n",
413 | " download=True,\n",
414 | ")"
415 | ]
416 | },
417 | {
418 | "cell_type": "code",
419 | "execution_count": null,
420 | "id": "763c9611",
421 | "metadata": {},
422 | "outputs": [],
423 | "source": [
424 | "batch_size = 64\n",
425 | "\n",
426 | "train_dataloader = DataLoader(cifar10_train, batch_size=batch_size)\n",
427 | "test_dataloader = DataLoader(cifar10_test, batch_size=batch_size)\n",
428 | "\n",
429 | "# get first batch\n",
430 | "X, y = next(iter(test_dataloader))\n",
431 | "\n",
432 | "print(\"Shape of X: \", X.shape)\n",
433 | "print(\"Shape of y: \", y.shape)"
434 | ]
435 | },
436 | {
437 | "cell_type": "code",
438 | "execution_count": null,
439 | "id": "fc54c5a2",
440 | "metadata": {},
441 | "outputs": [],
442 | "source": [
443 | "fig, axs = plt.subplots(4, 4, figsize=(9, 9), constrained_layout=True)\n",
444 | "\n",
445 | "for idx, ax in enumerate(axs.flat):\n",
446 | " img = np.swapaxes(np.asarray(X[idx + 16]), 1, 2).T\n",
447 | " \n",
448 | "\n",
449 | " ax.set_title(cifar10_train.classes[y[idx + 16]])\n",
450 | " ax.imshow(img)\n",
451 | " ax.set_axis_off()"
452 | ]
453 | },
454 | {
455 | "cell_type": "code",
456 | "execution_count": null,
457 | "id": "42d95256",
458 | "metadata": {},
459 | "outputs": [],
460 | "source": [
461 | "model = FullyConnected(\n",
462 | " input_size=X[0].shape.numel(),\n",
463 | " n_classes=len(cifar10_train.classes),\n",
464 | ")"
465 | ]
466 | },
467 | {
468 | "cell_type": "code",
469 | "execution_count": null,
470 | "id": "e30310f4",
471 | "metadata": {},
472 | "outputs": [],
473 | "source": [
474 | "loss_fn = nn.CrossEntropyLoss()\n",
475 | "optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)\n",
476 | "\n",
477 | "\n",
478 | "test_losses = []\n",
479 | "train_losses = []\n",
480 | "\n",
481 | "\n",
482 | "report_accuracy(test_dataloader, train_dataloader, model)\n",
483 | "\n",
484 | "epochs = 15\n",
485 | "for t in tqdm(range(epochs)):\n",
486 | " epoch_loss_train, epoch_loss_test = fit_one_epoch(\n",
487 | " train_dataloader, test_dataloader, model, loss_fn, optimizer\n",
488 | " )\n",
489 | " train_losses.append(epoch_loss_train)\n",
490 | " test_losses.append(epoch_loss_test)\n",
491 | " report_accuracy(test_dataloader, train_dataloader, model)\n",
492 | " \n",
493 | "print(\"Done!\")"
494 | ]
495 | },
496 | {
497 | "cell_type": "code",
498 | "execution_count": null,
499 | "id": "5310d293",
500 | "metadata": {},
501 | "outputs": [],
502 | "source": [
503 | "plot_losses(train_losses, test_losses)\n",
504 | "plt.yscale('log')"
505 | ]
506 | },
507 | {
508 | "cell_type": "markdown",
509 | "id": "7d0f93bc",
510 | "metadata": {},
511 | "source": [
512 | "We do not get much better than 50 % with a fully connected network.\n",
513 | "\n",
514 | "Let's try a deep learning network with convolutional layers. The architecture follows the one proposed here:\n",
515 | "https://arxiv.org/abs/1409.1556\n",
516 | "\n",
517 | "> Very Deep Convolutional Networks for Large-Scale Image Recognition \n",
518 | "> Karen Simonyan, Andrew Zisserman\n",
519 | "\n",
520 | "> In this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting. Our main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3x3) convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers. These findings were the basis of our ImageNet Challenge 2014 submission, where our team secured the first and the second places in the localisation and classification tracks respectively. We also show that our representations generalise well to other datasets, where they achieve state-of-the-art results. We have made our two best-performing ConvNet models publicly available to facilitate further research on the use of deep visual representations in computer vision. "
521 | ]
522 | },
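523 | {
524 | "cell_type": "markdown",
525 | "id": "a3b1c2d4",
526 | "metadata": {},
527 | "source": [
528 | "A small aside (a minimal sketch, not taken from the paper or the original lecture code): the central VGG idea is that two stacked 3x3 convolutions cover the same 5x5 receptive field as a single 5x5 convolution, but need fewer parameters and add an extra non-linearity in between. The channel count of 64 below is an arbitrary choice for illustration; `nn` is the `torch.nn` import already used above."
529 | ]
530 | },
531 | {
532 | "cell_type": "code",
533 | "execution_count": null,
534 | "id": "b4c2d3e5",
535 | "metadata": {},
536 | "outputs": [],
537 | "source": [
538 | "# compare parameter counts: two stacked 3x3 convolutions vs. one 5x5 convolution\n",
539 | "# (both cover a 5x5 receptive field for the same number of channels)\n",
540 | "c = 64\n",
541 | "\n",
542 | "two_3x3 = nn.Sequential(\n",
543 | "    nn.Conv2d(c, c, kernel_size=3, padding='same'),\n",
544 | "    nn.Conv2d(c, c, kernel_size=3, padding='same'),\n",
545 | ")\n",
546 | "one_5x5 = nn.Conv2d(c, c, kernel_size=5, padding='same')\n",
547 | "\n",
548 | "\n",
549 | "def n_params(module):\n",
550 | "    return sum(p.numel() for p in module.parameters())\n",
551 | "\n",
552 | "\n",
553 | "print('two stacked 3x3 convolutions:', n_params(two_3x3))\n",
554 | "print('one 5x5 convolution:         ', n_params(one_5x5))"
555 | ]
556 | },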
523 | {
524 | "cell_type": "code",
525 | "execution_count": null,
526 | "id": "cc77cdf1",
527 | "metadata": {},
528 | "outputs": [],
529 | "source": [
530 | "class ConvolutionalNetwork(nn.Module):\n",
531 | " def __init__(self):\n",
532 | " super().__init__()\n",
533 | " \n",
534 | " self.conv_stack = nn.Sequential(\n",
535 | " # 1st stack of conv layers\n",
536 | " nn.Conv2d(3, 32, kernel_size=(3, 3), padding='same'),\n",
537 | " nn.Conv2d(32, 32, kernel_size=(3, 3), padding='same'),\n",
538 | " nn.BatchNorm2d(32),\n",
539 | " nn.LeakyReLU(),\n",
540 | " nn.MaxPool2d(kernel_size=2, stride=2),\n",
541 | " nn.Dropout(0.25),\n",
542 | " \n",
543 | " # 2nd stack\n",
544 | " nn.Conv2d(32, 64, kernel_size=(3, 3), padding='same'),\n",
545 | " nn.Conv2d(64, 64, kernel_size=(3, 3), padding='same'),\n",
546 | " nn.BatchNorm2d(64),\n",
547 | " nn.LeakyReLU(),\n",
548 | " nn.MaxPool2d(kernel_size=2, stride=2),\n",
549 | " nn.Dropout(0.25),\n",
550 | " \n",
551 | " # 3rd stack\n",
552 | " nn.Conv2d(64, 128, kernel_size=(3, 3), padding='same'),\n",
553 | " nn.Conv2d(128, 128, kernel_size=(3, 3), padding='same'),\n",
554 | " nn.BatchNorm2d(128),\n",
555 | " nn.LeakyReLU(),\n",
556 | " nn.MaxPool2d(kernel_size=2, stride=2),\n",
557 | " nn.Dropout(0.25),\n",
558 | " )\n",
559 | " \n",
560 | " self.flatten = nn.Flatten()\n",
561 | " \n",
562 | " self.linear_relu_stack = nn.Sequential(\n",
563 | " nn.Linear(128 * 4 * 4, 128),\n",
564 | " nn.Dropout(0.25),\n",
565 | " nn.LeakyReLU(),\n",
566 | " nn.Linear(128, 10),\n",
567 | " nn.Softmax(dim=1),\n",
568 | " )\n",
569 | "\n",
570 | " def forward(self, x):\n",
571 | " x = self.conv_stack(x)\n",
572 | " x = self.flatten(x)\n",
573 | " x = self.linear_relu_stack(x)\n",
574 | " return x"
575 | ]
576 | },
577 | {
578 | "cell_type": "code",
579 | "execution_count": null,
580 | "id": "6cedbcc4",
581 | "metadata": {},
582 | "outputs": [],
583 | "source": [
584 | "model = ConvolutionalNetwork()"
585 | ]
586 | },
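587 | {
588 | "cell_type": "markdown",
589 | "id": "c5d3e4f6",
590 | "metadata": {},
591 | "source": [
592 | "A quick sanity check before training (a minimal sketch, not part of the original training loop): passing a dummy CIFAR-10-shaped batch through `conv_stack` confirms that the three 2x2 max-pools reduce the 32x32 images to 4x4, which is where the `128 * 4 * 4` input size of the first linear layer comes from. We also count the trainable parameters."
593 | ]
594 | },
595 | {
596 | "cell_type": "code",
597 | "execution_count": null,
598 | "id": "d6e4f5a7",
599 | "metadata": {},
600 | "outputs": [],
601 | "source": [
602 | "# pass one dummy CIFAR-10-shaped image through the convolutional part to check\n",
603 | "# that the spatial size is reduced from 32x32 to 4x4 with 128 channels\n",
604 | "model.eval()\n",
605 | "with torch.no_grad():\n",
606 | "    dummy = torch.zeros(1, 3, 32, 32)\n",
607 | "    print('conv stack output shape:', model.conv_stack(dummy).shape)\n",
608 | "model.train()  # back to training mode for the loop below\n",
609 | "\n",
610 | "n_parameters = sum(p.numel() for p in model.parameters() if p.requires_grad)\n",
611 | "print(f'trainable parameters: {n_parameters:,}')"
612 | ]
613 | },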
587 | {
588 | "cell_type": "code",
589 | "execution_count": null,
590 | "id": "4388223d",
591 | "metadata": {},
592 | "outputs": [],
593 | "source": [
594 | "loss_fn = nn.CrossEntropyLoss()\n",
595 | "optimizer = torch.optim.AdamW(model.parameters())\n",
596 | "\n",
597 | "test_losses = []\n",
598 | "train_losses = []"
599 | ]
600 | },
601 | {
602 | "cell_type": "code",
603 | "execution_count": null,
604 | "id": "787fde22",
605 | "metadata": {},
606 | "outputs": [],
607 | "source": [
608 | "epochs = 25\n",
609 | "\n",
610 | "\n",
611 | "report_accuracy(test_dataloader, train_dataloader, model)\n",
612 | "\n",
613 | "for t in tqdm(range(epochs)):\n",
614 | " epoch_loss_train, epoch_loss_test = fit_one_epoch(\n",
615 | " train_dataloader, test_dataloader, model, loss_fn, optimizer\n",
616 | " )\n",
617 | " train_losses.append(epoch_loss_train)\n",
618 | " test_losses.append(epoch_loss_test)\n",
619 | " report_accuracy(test_dataloader, train_dataloader, model)\n",
620 | " \n",
621 | "print(\"Done!\")"
622 | ]
623 | },
624 | {
625 | "cell_type": "code",
626 | "execution_count": null,
627 | "id": "c820f23d",
628 | "metadata": {},
629 | "outputs": [],
630 | "source": [
631 | "plot_losses(train_losses=train_losses, test_losses=test_losses)"
632 | ]
633 | },
634 | {
635 | "cell_type": "code",
636 | "execution_count": null,
637 | "id": "b221de1f",
638 | "metadata": {},
639 | "outputs": [],
640 | "source": [
641 | "prediction, y = predictions(test_dataloader, model)"
642 | ]
643 | },
644 | {
645 | "cell_type": "code",
646 | "execution_count": null,
647 | "id": "0f1da51d",
648 | "metadata": {},
649 | "outputs": [],
650 | "source": [
651 | "from sklearn.metrics import confusion_matrix\n",
652 | "\n",
653 | "\n",
654 | "classes = ('plane', 'car', 'bird', 'cat',\n",
655 | " 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')\n",
656 | "\n",
657 | "matrix = confusion_matrix(y.cpu().numpy(), prediction.cpu().numpy())\n",
658 | "matrix = np.divide(matrix, matrix.sum(axis = 1))\n",
659 | "\n",
660 | "with plt.rc_context({\"figure.constrained_layout.use\": False}):\n",
661 | " plt.matshow(matrix)\n",
662 | " plt.xticks(np.arange(len(classes)), classes, rotation=90)\n",
663 | " plt.yticks(np.arange(len(classes)), classes)\n",
664 | " plt.colorbar()\n",
665 | " \n",
666 | "None"
667 | ]
668 | }
669 | ],
670 | "metadata": {
671 | "kernelspec": {
672 | "display_name": "Python 3 (ipykernel)",
673 | "language": "python",
674 | "name": "python3"
675 | },
676 | "language_info": {
677 | "codemirror_mode": {
678 | "name": "ipython",
679 | "version": 3
680 | },
681 | "file_extension": ".py",
682 | "mimetype": "text/x-python",
683 | "name": "python",
684 | "nbconvert_exporter": "python",
685 | "pygments_lexer": "ipython3",
686 | "version": "3.12.3"
687 | }
688 | },
689 | "nbformat": 4,
690 | "nbformat_minor": 5
691 | }
692 |
--------------------------------------------------------------------------------
/titanic.xls:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tudo-astroparticlephysics/machine-learning-lecture/38bec9a945b85b0d32e472a98a770affb616dfe8/titanic.xls
--------------------------------------------------------------------------------