├── 0_prelude
│   ├── puppy.jpg
│   └── imgs
│       ├── ml_model.png
│       ├── ml_full_picture.png
│       └── ml_full_supervised.png
├── 1_ANN
│   └── imgs
│       ├── MLP.png
│       ├── sigmoid.png
│       ├── Perceptron.png
│       ├── ad_example.png
│       ├── backprop.png
│       ├── forward_ad.png
│       ├── fwd_step.png
│       ├── backward_ad.png
│       ├── fwd_step_net.png
│       ├── model_optim.png
│       ├── single_layer.png
│       ├── taylor_joke.jpg
│       ├── bkwd_step_net.png
│       ├── multi-layers-1.png
│       ├── multi-layers-2.png
│       ├── learning_process.png
│       └── automatic_diff_methods.png
├── logos
│   ├── gmail_small.png
│   ├── linkedin_small.png
│   ├── twitter_small.png
│   ├── uob_logo_small.png
│   ├── dynamic_genetics.png
│   └── pytorch_logo_small.png
├── requirements.txt
├── dl-datascience.yml
├── .gitignore
├── README.md
├── setup.md
├── 2_ML_Data
│   ├── fer.py
│   └── imgs
│       ├── train_validation_test2.svg
│       ├── train_test_split.svg
│       └── cross_validation.svg
└── LICENSE

/0_prelude/puppy.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/leriomaggio/deep-learning-for-data-science/main/0_prelude/puppy.jpg
--------------------------------------------------------------------------------
/1_ANN/imgs/MLP.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/leriomaggio/deep-learning-for-data-science/main/1_ANN/imgs/MLP.png
--------------------------------------------------------------------------------
/1_ANN/imgs/sigmoid.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/leriomaggio/deep-learning-for-data-science/main/1_ANN/imgs/sigmoid.png
--------------------------------------------------------------------------------
/logos/gmail_small.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/leriomaggio/deep-learning-for-data-science/main/logos/gmail_small.png
--------------------------------------------------------------------------------
/1_ANN/imgs/Perceptron.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/leriomaggio/deep-learning-for-data-science/main/1_ANN/imgs/Perceptron.png
--------------------------------------------------------------------------------
/1_ANN/imgs/ad_example.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/leriomaggio/deep-learning-for-data-science/main/1_ANN/imgs/ad_example.png
--------------------------------------------------------------------------------
/1_ANN/imgs/backprop.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/leriomaggio/deep-learning-for-data-science/main/1_ANN/imgs/backprop.png
--------------------------------------------------------------------------------
/1_ANN/imgs/forward_ad.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/leriomaggio/deep-learning-for-data-science/main/1_ANN/imgs/forward_ad.png
--------------------------------------------------------------------------------
/1_ANN/imgs/fwd_step.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/leriomaggio/deep-learning-for-data-science/main/1_ANN/imgs/fwd_step.png
--------------------------------------------------------------------------------
/logos/linkedin_small.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/leriomaggio/deep-learning-for-data-science/main/logos/linkedin_small.png
--------------------------------------------------------------------------------
/logos/twitter_small.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/leriomaggio/deep-learning-for-data-science/main/logos/twitter_small.png
--------------------------------------------------------------------------------
/logos/uob_logo_small.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/leriomaggio/deep-learning-for-data-science/main/logos/uob_logo_small.png
--------------------------------------------------------------------------------
/0_prelude/imgs/ml_model.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/leriomaggio/deep-learning-for-data-science/main/0_prelude/imgs/ml_model.png
--------------------------------------------------------------------------------
/1_ANN/imgs/backward_ad.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/leriomaggio/deep-learning-for-data-science/main/1_ANN/imgs/backward_ad.png
--------------------------------------------------------------------------------
/1_ANN/imgs/fwd_step_net.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/leriomaggio/deep-learning-for-data-science/main/1_ANN/imgs/fwd_step_net.png
--------------------------------------------------------------------------------
/1_ANN/imgs/model_optim.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/leriomaggio/deep-learning-for-data-science/main/1_ANN/imgs/model_optim.png
--------------------------------------------------------------------------------
/1_ANN/imgs/single_layer.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/leriomaggio/deep-learning-for-data-science/main/1_ANN/imgs/single_layer.png
--------------------------------------------------------------------------------
/1_ANN/imgs/taylor_joke.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/leriomaggio/deep-learning-for-data-science/main/1_ANN/imgs/taylor_joke.jpg
--------------------------------------------------------------------------------
/logos/dynamic_genetics.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/leriomaggio/deep-learning-for-data-science/main/logos/dynamic_genetics.png
--------------------------------------------------------------------------------
/1_ANN/imgs/bkwd_step_net.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/leriomaggio/deep-learning-for-data-science/main/1_ANN/imgs/bkwd_step_net.png
--------------------------------------------------------------------------------
/1_ANN/imgs/multi-layers-1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/leriomaggio/deep-learning-for-data-science/main/1_ANN/imgs/multi-layers-1.png
--------------------------------------------------------------------------------
/1_ANN/imgs/multi-layers-2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/leriomaggio/deep-learning-for-data-science/main/1_ANN/imgs/multi-layers-2.png
--------------------------------------------------------------------------------
/logos/pytorch_logo_small.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/leriomaggio/deep-learning-for-data-science/main/logos/pytorch_logo_small.png
--------------------------------------------------------------------------------
/1_ANN/imgs/learning_process.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/leriomaggio/deep-learning-for-data-science/main/1_ANN/imgs/learning_process.png
--------------------------------------------------------------------------------
/1_ANN/imgs/logistic_function.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/leriomaggio/deep-learning-for-data-science/main/1_ANN/imgs/logistic_function.png
--------------------------------------------------------------------------------
/0_prelude/imgs/ml_full_picture.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/leriomaggio/deep-learning-for-data-science/main/0_prelude/imgs/ml_full_picture.png
--------------------------------------------------------------------------------
/0_prelude/imgs/ml_full_supervised.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/leriomaggio/deep-learning-for-data-science/main/0_prelude/imgs/ml_full_supervised.png
--------------------------------------------------------------------------------
/1_ANN/imgs/automatic_diff_methods.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/leriomaggio/deep-learning-for-data-science/main/1_ANN/imgs/automatic_diff_methods.png
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | black>=20.8b1
2 | ipykernel>=5.5
3 | ipython>=7.21
4 | jupyter>=1.0
5 | matplotlib>=3.3
6 | notebook>=6.2
7 | nptyping>=1.4
8 | numpy>=1.19
9 | pandas>=1.2
10 | pillow>=8.1.1
11 | pip>=21.0
12 | torch>=1.8
13 | scikit-learn>=0.24
14 | scipy>=1.6
15 | torchvision>=0.9
16 | notexbook-theme>=2.0.0
--------------------------------------------------------------------------------
/dl-datascience.yml:
--------------------------------------------------------------------------------
1 | name: dl-datascience
2 | channels:
3 |   - pytorch
4 |   - conda-forge
5 |   - defaults
6 | dependencies:
7 |   - black>=20.8b1
8 |   - ipykernel>=5.5
9 |   - ipython>=7.21
10 |   - jupyter>=1.0
11 |   - matplotlib>=3.3
12 |   - notebook>=6.2
13 |   - nptyping>=1.4
14 |   - numpy>=1.19
15 |   - pandas>=1.2
16 |   - pillow>=8.1
17 |   - pip>=21.0
18 |   - python>=3.8
19 |   - pytorch>=1.8
20 |   - scikit-learn>=0.24
21 |   - scipy>=1.6
22 |   - torchvision>=0.9
23 |   - pip:
24 |     - notexbook-theme>=2.0.0
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | # Configs and OS rubbish
2 | .DS_Store
3 | .vscode
4 | .idea
5 | 
6 | # Byte-compiled / optimized / DLL files
7 | __pycache__/
8 | *.py[cod]
9 | *$py.class
10 | 
11 | # C extensions
12 | *.so
13 | 
14 | # Distribution / packaging
15 | .Python
16 | build/
17 | develop-eggs/
18 | dist/
19 | downloads/
20 | eggs/
21 | .eggs/
22 | lib/
23 | lib64/
24 | parts/
25 | sdist/
26 | var/
27 | wheels/
28 | pip-wheel-metadata/
29 | share/python-wheels/
30 | *.egg-info/
31 | .installed.cfg
32 | *.egg
33 | MANIFEST
34 | 
35 | # PyInstaller
36 | # Usually these files are written by a python script from a template
37 | # before PyInstaller builds the exe, so as to inject date/other infos into it.
38 | *.manifest
39 | *.spec
40 | 
41 | # Installer logs
42 | pip-log.txt
43 | pip-delete-this-directory.txt
44 | 
45 | # Unit test / coverage reports
46 | htmlcov/
47 | .tox/
48 | .nox/
49 | .coverage
50 | .coverage.*
51 | .cache
52 | nosetests.xml
53 | coverage.xml
54 | *.cover
55 | *.py,cover
56 | .hypothesis/
57 | .pytest_cache/
58 | 
59 | # Translations
60 | *.mo
61 | *.pot
62 | 
63 | # Django stuff:
64 | *.log
65 | local_settings.py
66 | db.sqlite3
67 | db.sqlite3-journal
68 | 
69 | # Flask stuff:
70 | instance/
71 | .webassets-cache
72 | 
73 | # Scrapy stuff:
74 | .scrapy
75 | 
76 | # Sphinx documentation
77 | docs/_build/
78 | 
79 | # PyBuilder
80 | target/
81 | 
82 | # Jupyter Notebook
83 | .ipynb_checkpoints
84 | 
85 | # IPython
86 | profile_default/
87 | ipython_config.py
88 | 
89 | # pyenv
90 | .python-version
91 | 
92 | # pipenv
93 | # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
94 | # However, in case of collaboration, if having platform-specific dependencies or dependencies
95 | # having no cross-platform support, pipenv may install dependencies that don't work, or not
96 | # install all needed dependencies.
97 | #Pipfile.lock
98 | 
99 | # PEP 582; used by e.g. github.com/David-OConnor/pyflow
100 | __pypackages__/
101 | 
102 | # Celery stuff
103 | celerybeat-schedule
104 | celerybeat.pid
105 | 
106 | # SageMath parsed files
107 | *.sage.py
108 | 
109 | # Environments
110 | .env
111 | .venv
112 | env/
113 | venv/
114 | ENV/
115 | env.bak/
116 | venv.bak/
117 | 
118 | # Spyder project settings
119 | .spyderproject
120 | .spyproject
121 | 
122 | # Rope project settings
123 | .ropeproject
124 | 
125 | # mkdocs documentation
126 | /site
127 | 
128 | # mypy
129 | .mypy_cache/
130 | .dmypy.json
131 | dmypy.json
132 | 
133 | # Pyre type checker
134 | .pyre/
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Deep Learning for Data Science
2 | 
3 | Tutorials on Deep Learning for Data Science with
4 | ![pytorch logo](./logos/pytorch_logo_small.png)
5 | 
6 | ⚠️ **This repository is still WIP**, so content is in _draft mode_ and references to external resources might be missing ⚠️
7 | 
8 | ## Content at a glance
9 | 
10 | **I**: **ANN and Automatic Differentiation**
11 | 
12 | 1. Intro to Artificial Neural Networks
13 | 
14 |    - Short intro: Supervised vs Unsupervised Learning
15 |    - Perceptron: the linear Neuron model
16 | 
17 |    - Short on Vectorisation
18 |    - ADAptive LInear NEuron (ADALINE)
19 | 
20 |    - Multi-Layer Perceptron
21 | 
22 |      - `numpy`-based implementation
23 |      - `torch.Tensor`-based implementation
24 | 
25 |    - From ANN to DNN
26 | 
27 |      - Introduction to `torch.nn`
28 |      - PyTorch Model Persistence
29 |      - Classification and Regression Revisited
30 |      - Short on Universal Approximation Theorem
31 |      - from Logistic to Softmax
32 |      - Multi-class Classification and `CrossEntropyLoss`
33 | 
34 | 2. Automatic Differentiation and `autograd`:
35 | 
36 |    - Intro to Automatic Differentiation
37 | 
38 |      - forward mode AD
39 |      - backward mode AD
40 |      - `tangent` and `autograd`
41 | 
42 |    - Towards `torch.nn`: [`micrograd`](https://github.com/karpathy/micrograd)
43 | 
44 |    - `torch.Tensor` and `autograd`
45 | 
46 | **II**: **Data and Dataset**
47 | 
48 | 3. Data for Machine and Deep Learning
49 | 
50 |    - Data _for_ Machine (Deep) Learning
51 | 
52 |      - `torchvision`
53 |      - `torchtext`
54 |      - `torchaudio`
55 | 
56 |    - Deep learning _for_ Data
57 | 
58 |      - Choose your Estimator
59 |      - Choose your DL model
60 | 
61 |    - Data the `torch` way - Introducing `torch.utils.data`, `Dataset`, and `DataLoader`
62 |    - Preparing Data for Experiments - _Training_, _Test_ and _Cross Validation_
63 | 
64 | ### Requirements
65 | 
66 | This tutorial runs on **Python 3** (Py3.8+, matching the provided conda environment), and requires the following main packages:
67 | 
68 | - `numpy`
69 | - `scipy`
70 | - `matplotlib`
71 | - `scikit-learn`
72 | - `pandas`
73 | - `torch` (of course 😄)
74 | - `torchvision`
75 | 
76 | The complete list of requirements is available in [`requirements.txt`](requirements.txt)
77 | 
78 | Detailed (_step-by-step_) instructions to set up the Python virtual environment on your local machine are also available [here](./setup.md).
79 | 
80 | ### License Summary
81 | 
82 | The material in this repository is released under two different licences: one for the lecture notes, and one for the source code.
83 | 
84 | The Lecture notes (and corresponding source notebooks) are available under the **Creative Commons Attribution-ShareAlike 4.0 International License**.
85 | Creative Commons License
86 | 
87 | The samples and reference code within this repository are made available under the **Apache License 2.0**. See the `LICENSE` file.
88 | 
89 | ### References
90 | 
91 | **Author**: Valerio Maggio, **Senior Research Associate** `@` Dynamic Genetics Lab
92 | 
93 | University of Bristol
94 | 
95 | 
96 | 
97 | Dynamic Genetics
98 | 
99 | | Contact |
100 | | -------------------------------------------------------------------------------------------------------------------------------------- |
101 | | Twitter [`@leriomaggio`](http://twitter.com/leriomaggio) |
102 | | LinkedIn [`ValerioMaggio`](http://it.linkedin.com/in/valeriomaggio) |
103 | | Mail [`valerio.maggio@bristol.ac.uk`](mailto:valerio.maggio@bristol.ac.uk) |
104 | 
--------------------------------------------------------------------------------
/setup.md:
--------------------------------------------------------------------------------
1 | # Development Environment Setup
2 | 
3 | ## Setup Virtual Environment
4 | 
5 | Instructions to set up the (local) Python development environment.
6 | 
7 | - [Virtual Environment using `venv`](#venv) for Standard Python Distribution
8 | - [Conda Virtual Environment](#conda) for Anaconda Python Distribution
9 | 
10 | ---
11 | 
12 | <a name="venv"></a>
13 | 
14 | #### Virtual Environment using `venv`
15 | 
16 | The `venv` module provides support for creating lightweight “virtual environments”
17 | with their own site directories,
18 | optionally isolated from system site directories.
19 | Each virtual environment has its own Python binary
20 | (which matches the version of the binary that was used to create this environment)
21 | and can have its own independent set of installed Python packages in
22 | its site directories.
23 | 
24 | **Note**: The `venv` module is part of the **Python Standard Library**, so no further
25 | installation is required.
26 | 
27 | The following **`3`** steps are required to set up a new virtual environment
28 | using `venv` (a fourth, optional step registers a Jupyter kernel):
29 | 
30 | 1. Create the environment:
31 | 
32 |    ```shell script
33 |    python -m venv /path/to/new/virtual/environment
34 |    ```
35 | 
36 | 2. Activate the environment:
37 | 
38 |    ```shell script
39 |    source /path/to/new/virtual/environment/bin/activate
40 |    ```
41 | 
42 | 3. Install the required packages (using the `requirements.txt` file):
43 | 
44 |    ```shell script
45 |    pip install -r requirements.txt
46 |    ```
47 | 
48 | 4. (**Optional**) Create a new Jupyter notebook Kernel
49 | 
50 |    To avoid re-installing the entire Jupyter stack into the new environment,
51 |    it is possible to add a new **Jupyter Kernel** to be used in notebooks with
52 |    the "default" Jupyter.
53 | 
54 |    To do so, please make sure that the `ipykernel` package is installed in the **new**
55 |    Python environment:
56 | 
57 |    ```shell script
58 |    pip install ipykernel  ## with the environment active, this is the environment's pip
59 |    ```
60 | 
61 |    Then, with the new environment still active, execute the following command:
62 | 
63 |    ```shell script
64 |    python -m ipykernel install --user --name dl-datascience --display-name "Python 3 (Deep Learning for Data Science)"
65 |    ```
66 | 
67 |    Further information [here](https://ipython.readthedocs.io/en/stable/install/kernel_install.html)
68 | 
69 | 
70 | <a name="conda"></a>
71 | 
72 | #### Setting up the `conda` environment
73 | 
74 | If you are using the Anaconda Python distribution, it is possible to re-create the
75 | virtual (conda) environment using the exported `.yml` (`YAML`) file:
76 | 
77 | ```shell script
78 | conda env create -f dl-datascience.yml
79 | ```
80 | 
81 | This will create a new Conda environment named `dl-datascience` with all the
82 | required packages.
83 | 
84 | To **activate** the environment:
85 | 
86 | ```shell script
87 | conda activate dl-datascience
88 | ```
89 | 
90 | ##### (**Optional**) Create a new Jupyter notebook Kernel
91 | 
92 | To avoid re-installing the entire Jupyter stack into the new environment,
93 | it is possible to
94 | add a new **Jupyter Kernel** to be used in notebooks with the "default" Jupyter.
95 | 
96 | To do so, please make sure that the `ipykernel` package is installed in the **new**
97 | conda environment:
98 | 
99 | ```shell script
100 | conda install ipykernel
101 | ```
102 | 
103 | Then, still remaining in the **new** conda environment, execute the following command:
104 | 
105 | ```shell script
106 | python -m ipykernel install --user --name dl-torch --display-name "Python 3 (Deep Learning for Data Science)"
107 | ```
108 | 
109 | Further information [here](https://ipython.readthedocs.io/en/stable/install/kernel_install.html)
110 | 
111 | ### Code formatting
112 | 
113 | If possible, use the [black code formatter](https://github.com/python/black) (e.g.
114 | `pip install black`) and run it before submitting your code to the repository.
115 | This helps maintain consistency.
116 | 
117 | Formatting is as simple as running:
118 | 
119 | ```bash
120 | black .
121 | ```
122 | 
123 | in the root of the project.
124 | 
125 | ### Pre-commit (Optional)
126 | 
127 | In order to automate the `black` and `flake8` code formatting, integration with
128 | the Python `pre-commit` package has been added to this repo.
129 | 
130 | To do so, you just need to:
131 | 
132 | 1. Install the `pre-commit` Python package
133 | (already **included** in the `requirements.txt` file):
134 | ```shell script
135 | pip install pre-commit
136 | ```
137 | 
138 | 2. Install the git hooks in your `.git/hooks` directory:
139 | ```shell script
140 | pre-commit install
141 | ```
142 | 
143 | For a walk-through guide and further details, please read the **outstanding**
144 | [post](https://bit.ly/py-pre-commit)
145 | from [Lj Miranda](https://ljvmiranda921.github.io).
--------------------------------------------------------------------------------
/2_ML_Data/fer.py:
--------------------------------------------------------------------------------
1 | """
2 | This module provides access to the FER (Facial Emotion Recognition) dataset,
3 | encapsulated as a `torch.utils.data.Dataset` subclass.
4 | 
5 | Notes
6 | -----
7 | The FER dataset [1]_ consists of `48x48` pixel grayscale images of faces.
8 | The faces have been automatically registered so that the face is more or less
9 | centered and occupies about the same amount of space in each image.
10 | 
11 | The task is to categorize each face based on the emotion shown in the facial
12 | expression into one of seven categories
13 | `(0=Angry, 1=Disgust, 2=Fear, 3=Happy, 4=Sad, 5=Surprise, 6=Neutral)`.
14 | 
15 | These are the overall statistics of the Dataset, per emotion.
16 | 
17 | .. list-table:: Dataset Overall Statistics
18 |    :widths: 25 25 50
19 |    :header-rows: 1
20 | 
21 |    * - Emotion
22 |      - Emotion Label
23 |      - Count
24 |    * - 0
25 |      - Angry
26 |      - 4,953
27 |    * - 1
28 |      - Disgust
29 |      - 547
30 |    * - 2
31 |      - Fear
32 |      - 5,121
33 |    * - 3
34 |      - Happy
35 |      - 8,989
36 |    * - 4
37 |      - Sad
38 |      - 6,077
39 |    * - 5
40 |      - Surprise
41 |      - 4,002
42 |    * - 6
43 |      - Neutral
44 |      - 6,198
45 | 
46 | Sample distributions per data partition, and per emotion, are reported below.
47 | 
48 | .. list-table:: Data partitions statistics
49 |    :widths: 33 33 34
50 |    :header-rows: 1
51 | 
52 |    * - Training
53 |      - Validation
54 |      - Test
55 |    * - 28,709
56 |      - 3,589
57 |      - 3,589
58 | 
59 | 
60 | The distribution of the samples per single emotion for each of the three
61 | considered data partitions is shown in the barplot below:
62 | 
63 | .. image:: images/fer_data_partitions_distributions.png
64 |    :width: 400
65 |    :alt: Samples distribution per-emotion among the three data partitions
66 | 
67 | 
68 | References
69 | -----------
70 | .. [1] I. J. Goodfellow, D. Erhan, P. L. Carrier, A. Courville, M. Mirza, B. Hamner,
71 |    W. Cukierski, Y. Tang, D. Thaler, D.-H. Lee, Y. Zhou, C. Ramaiah, F. Feng,
72 |    R. Li, X. Wang, D. Athanasakis, J. Shawe-Taylor, M. Milakov, J. Park, R. Ionescu,
73 |    M. Popescu, C. Grozea, J. Bergstra, J. Xie, L. Romaszko, B. Xu,
74 |    Z. Chuang, and Y. Bengio. "Challenges in representation learning: A report on three
75 |    machine learning contests". Neural Networks, 64:59--63, 2015.
76 |    Special Issue on Deep Learning of Representations
77 | """
78 | 
79 | 
80 | import os
81 | from typing import Tuple
82 | from pathlib import Path
83 | from math import sqrt
84 | 
85 | import numpy as np
86 | import pandas as pd
87 | import torch
88 | 
89 | from torch.utils.data import Dataset
90 | from torchvision.datasets.utils import download_and_extract_archive
91 | 
92 | 
93 | class FER(Dataset):
94 |     """`FER` (Facial Emotion Recognition) Dataset
95 | 
96 |     Attributes
97 |     ----------
98 |     root : str
99 |         Root directory where the local copy of the dataset is stored.
100 |     split : {"train", "validation", "test"} (default: "train")
101 |         Target data partition. Three partitions are available, namely
102 |         "train", "validation", and "test". The training partition is
103 |         used by default.
104 |     download : bool, optional (False)
105 |         If True, the dataset will be downloaded from the internet and saved in the root
106 |         directory. If the dataset is already downloaded, it is not downloaded again.
107 |     """
108 | 
109 |     RAW_DATA_FILE = "fer2013.csv"
110 |     RAW_DATA_FOLDER = "fer2013"
111 | 
112 |     resources = [
113 |         (
114 |             "https://www.dropbox.com/s/2rehtpc6b5mj9y3/fer2013.tar.gz?dl=1",
115 |             "ca95d94fe42f6ce65aaae694d18c628a",
116 |         )
117 |     ]
118 | 
119 |     # torch Tensor filenames
120 |     data_files = {
121 |         "train": "training.pt",
122 |         "validation": "validation.pt",
123 |         "test": "test.pt",
124 |     }
125 | 
126 |     classes = [
127 |         "angry",
128 |         "disgust",
129 |         "fear",
130 |         "happy",
131 |         "sad",
132 |         "surprise",
133 |         "neutral",
134 |     ]
135 | 
136 |     def __init__(self, root: str, split: str = "train", download: bool = False):
137 |         self.root = root
138 |         split = split.strip().lower()
139 |         if split not in self.data_files:
140 |             raise ValueError(
141 |                 "Data Partition not recognised. Accepted values are 'train', 'validation', 'test'."
142 |             )
143 |         if download:
144 |             self.download()  # download, preprocess, and store FER data
145 |         if not self._check_exists():
146 |             raise RuntimeError(
147 |                 "Dataset not found." + " You can use download=True to download it"
148 |             )
149 |         self.split = split
150 |         data_file = self.data_files[self.split]
151 |         data_filepath = self.processed_folder / data_file
152 |         # load serialisation of dataset as torch.Tensors
153 |         self.data, self.targets = torch.load(data_filepath)
154 | 
155 |     def __len__(self):
156 |         return len(self.data)
157 | 
158 |     def __getitem__(self, index: int) -> Tuple[torch.Tensor, int]:
159 |         """
160 | 
161 |         Parameters
162 |         ----------
163 |         index : int
164 |             Index of the sample
165 | 
166 |         Returns
167 |         -------
168 |         tuple
169 |             (image, target) where target is the index of the target class.
170 |         """
171 |         img, target = self.data[index], int(self.targets[index])
172 |         img = torch.unsqueeze(img, 0)
173 |         return img, target
174 | 
175 |     def _check_exists(self):
176 |         for data_fname in self.data_files.values():
177 |             data_file = self.processed_folder / data_fname
178 |             if not data_file.exists():
179 |                 return False
180 |         return True
181 | 
182 |     def download(self):
183 |         """Download the FER data if it doesn't already exist in the processed folder"""
184 | 
185 |         if self._check_exists():
186 |             return
187 | 
188 |         os.makedirs(self.raw_folder, exist_ok=True)
189 |         os.makedirs(self.processed_folder, exist_ok=True)
190 | 
191 |         # download files (verifying the archive against its MD5 checksum)
192 |         for url, md5 in self.resources:
193 |             filename = url.rpartition("/")[-1].split("?")[0]
194 |             download_and_extract_archive(
195 |                 url, download_root=str(self.raw_folder), filename=filename, md5=md5,
196 |             )
197 | 
198 |         # process and save as torch files
199 |         def _set_partition(label: str) -> str:
200 |             if label == "Training":
201 |                 return "train"
202 |             if label == "PrivateTest":
203 |                 return "validation"
204 |             return "test"
205 | 
206 |         print("Processing...", end="")
207 |         raw_data_filepath = self.raw_folder / self.RAW_DATA_FOLDER / self.RAW_DATA_FILE
208 |         fer_df = pd.read_csv(
209 |             raw_data_filepath, header=0, names=["emotion", "pixels", "partition"]
210 |         )
211 |         fer_df["partition"] = fer_df.partition.apply(_set_partition)
212 |         fer_df.emotion = pd.Categorical(fer_df.emotion)
213 | 
214 |         for partition in ("train", "validation", "test"):
215 |             dataset = fer_df[fer_df["partition"] == partition]
216 |             images = self._images_as_torch_tensors(dataset)
217 |             labels = self._labels_as_torch_tensors(dataset)
218 |             data_file = self.processed_folder / self.data_files[partition]
219 |             with open(data_file, "wb") as f:
220 |                 torch.save((images, labels), f)
221 |         print("Done!")
222 | 
223 |     def _images_as_torch_tensors(self, dataset: pd.DataFrame) -> torch.Tensor:
224 |         """
225 |         Extract all the pixels from the input dataframe, and convert the images
226 |         into a [samples x features] torch.Tensor
227 | 
228 |         Parameters
229 |         ----------
230 |         dataset : pd.DataFrame
231 |             The target data partition (i.e. training, validation, or test)
232 |             as extracted from the original dataset
233 |         Returns
234 |         -------
235 |         torch.Tensor
236 |             [samples x pixels] tensor representing the whole data partition as
237 |             a torch Tensor.
238 |         """
239 |         imgs_np = (dataset.pixels.map(self._to_numpy)).values
240 |         imgs_np = np.array(np.concatenate(imgs_np, axis=0))
241 |         samples_no, pixels = imgs_np.shape
242 |         new_shape = (samples_no, int(sqrt(pixels)), int(sqrt(pixels)))
243 |         return torch.from_numpy(imgs_np).view(new_shape)
244 | 
245 |     @staticmethod
246 |     def _labels_as_torch_tensors(dataset: pd.DataFrame) -> torch.Tensor:
247 |         """Extract labels from pd.Series and convert into torch.Tensor"""
248 |         labels_np = dataset.emotion.values.astype(np.int64)  # np.int is deprecated
249 |         return torch.from_numpy(labels_np)
250 | 
251 |     @staticmethod
252 |     def _to_numpy(pixels: str) -> np.ndarray:
253 |         """Convert one-line string pixels into NumPy array, adding the first
254 |         extra axis (sample dimension) later used as the concatenation axis"""
255 |         img_array = np.array(pixels.split(), dtype="uint8")[np.newaxis, ...]  # np.fromstring is deprecated
256 | return img_array 257 | 258 | # Properties 259 | # for public API 260 | # -------------- 261 | 262 | @property 263 | def processed_folder(self): 264 | return Path(self.root) / self.__class__.__name__ / "processed" 265 | 266 | @property 267 | def raw_folder(self): 268 | return Path(self.root) / self.__class__.__name__ / "raw" 269 | 270 | @property 271 | def partition(self): 272 | return self.split 273 | 274 | @property 275 | def class_to_idx(self): 276 | return {_class: i for i, _class in enumerate(self.classes)} 277 | 278 | @property 279 | def idx_to_class(self): 280 | return {v: k for k, v in self.class_to_idx.items()} 281 | 282 | @staticmethod 283 | def classes_map(): 284 | return {i: c for i, c in enumerate(FER.classes)} 285 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Apache License 2 | Version 2.0, January 2004 3 | http://www.apache.org/licenses/ 4 | 5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 6 | 7 | 1. Definitions. 8 | 9 | "License" shall mean the terms and conditions for use, reproduction, 10 | and distribution as defined by Sections 1 through 9 of this document. 11 | 12 | "Licensor" shall mean the copyright owner or entity authorized by 13 | the copyright owner that is granting the License. 14 | 15 | "Legal Entity" shall mean the union of the acting entity and all 16 | other entities that control, are controlled by, or are under common 17 | control with that entity. For the purposes of this definition, 18 | "control" means (i) the power, direct or indirect, to cause the 19 | direction or management of such entity, whether by contract or 20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 21 | outstanding shares, or (iii) beneficial ownership of such entity. 22 | 23 | "You" (or "Your") shall mean an individual or Legal Entity 24 | exercising permissions granted by this License. 25 | 26 | "Source" form shall mean the preferred form for making modifications, 27 | including but not limited to software source code, documentation 28 | source, and configuration files. 29 | 30 | "Object" form shall mean any form resulting from mechanical 31 | transformation or translation of a Source form, including but 32 | not limited to compiled object code, generated documentation, 33 | and conversions to other media types. 34 | 35 | "Work" shall mean the work of authorship, whether in Source or 36 | Object form, made available under the License, as indicated by a 37 | copyright notice that is included in or attached to the work 38 | (an example is provided in the Appendix below). 39 | 40 | "Derivative Works" shall mean any work, whether in Source or Object 41 | form, that is based on (or derived from) the Work and for which the 42 | editorial revisions, annotations, elaborations, or other modifications 43 | represent, as a whole, an original work of authorship. For the purposes 44 | of this License, Derivative Works shall not include works that remain 45 | separable from, or merely link (or bind by name) to the interfaces of, 46 | the Work and Derivative Works thereof. 
47 | 48 | "Contribution" shall mean any work of authorship, including 49 | the original version of the Work and any modifications or additions 50 | to that Work or Derivative Works thereof, that is intentionally 51 | submitted to Licensor for inclusion in the Work by the copyright owner 52 | or by an individual or Legal Entity authorized to submit on behalf of 53 | the copyright owner. For the purposes of this definition, "submitted" 54 | means any form of electronic, verbal, or written communication sent 55 | to the Licensor or its representatives, including but not limited to 56 | communication on electronic mailing lists, source code control systems, 57 | and issue tracking systems that are managed by, or on behalf of, the 58 | Licensor for the purpose of discussing and improving the Work, but 59 | excluding communication that is conspicuously marked or otherwise 60 | designated in writing by the copyright owner as "Not a Contribution." 61 | 62 | "Contributor" shall mean Licensor and any individual or Legal Entity 63 | on behalf of whom a Contribution has been received by Licensor and 64 | subsequently incorporated within the Work. 65 | 66 | 2. Grant of Copyright License. Subject to the terms and conditions of 67 | this License, each Contributor hereby grants to You a perpetual, 68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 69 | copyright license to reproduce, prepare Derivative Works of, 70 | publicly display, publicly perform, sublicense, and distribute the 71 | Work and such Derivative Works in Source or Object form. 72 | 73 | 3. Grant of Patent License. Subject to the terms and conditions of 74 | this License, each Contributor hereby grants to You a perpetual, 75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 76 | (except as stated in this section) patent license to make, have made, 77 | use, offer to sell, sell, import, and otherwise transfer the Work, 78 | where such license applies only to those patent claims licensable 79 | by such Contributor that are necessarily infringed by their 80 | Contribution(s) alone or by combination of their Contribution(s) 81 | with the Work to which such Contribution(s) was submitted. If You 82 | institute patent litigation against any entity (including a 83 | cross-claim or counterclaim in a lawsuit) alleging that the Work 84 | or a Contribution incorporated within the Work constitutes direct 85 | or contributory patent infringement, then any patent licenses 86 | granted to You under this License for that Work shall terminate 87 | as of the date such litigation is filed. 88 | 89 | 4. Redistribution. 
You may reproduce and distribute copies of the 90 | Work or Derivative Works thereof in any medium, with or without 91 | modifications, and in Source or Object form, provided that You 92 | meet the following conditions: 93 | 94 | (a) You must give any other recipients of the Work or 95 | Derivative Works a copy of this License; and 96 | 97 | (b) You must cause any modified files to carry prominent notices 98 | stating that You changed the files; and 99 | 100 | (c) You must retain, in the Source form of any Derivative Works 101 | that You distribute, all copyright, patent, trademark, and 102 | attribution notices from the Source form of the Work, 103 | excluding those notices that do not pertain to any part of 104 | the Derivative Works; and 105 | 106 | (d) If the Work includes a "NOTICE" text file as part of its 107 | distribution, then any Derivative Works that You distribute must 108 | include a readable copy of the attribution notices contained 109 | within such NOTICE file, excluding those notices that do not 110 | pertain to any part of the Derivative Works, in at least one 111 | of the following places: within a NOTICE text file distributed 112 | as part of the Derivative Works; within the Source form or 113 | documentation, if provided along with the Derivative Works; or, 114 | within a display generated by the Derivative Works, if and 115 | wherever such third-party notices normally appear. The contents 116 | of the NOTICE file are for informational purposes only and 117 | do not modify the License. You may add Your own attribution 118 | notices within Derivative Works that You distribute, alongside 119 | or as an addendum to the NOTICE text from the Work, provided 120 | that such additional attribution notices cannot be construed 121 | as modifying the License. 122 | 123 | You may add Your own copyright statement to Your modifications and 124 | may provide additional or different license terms and conditions 125 | for use, reproduction, or distribution of Your modifications, or 126 | for any such Derivative Works as a whole, provided Your use, 127 | reproduction, and distribution of the Work otherwise complies with 128 | the conditions stated in this License. 129 | 130 | 5. Submission of Contributions. Unless You explicitly state otherwise, 131 | any Contribution intentionally submitted for inclusion in the Work 132 | by You to the Licensor shall be under the terms and conditions of 133 | this License, without any additional terms or conditions. 134 | Notwithstanding the above, nothing herein shall supersede or modify 135 | the terms of any separate license agreement you may have executed 136 | with Licensor regarding such Contributions. 137 | 138 | 6. Trademarks. This License does not grant permission to use the trade 139 | names, trademarks, service marks, or product names of the Licensor, 140 | except as required for reasonable and customary use in describing the 141 | origin of the Work and reproducing the content of the NOTICE file. 142 | 143 | 7. Disclaimer of Warranty. Unless required by applicable law or 144 | agreed to in writing, Licensor provides the Work (and each 145 | Contributor provides its Contributions) on an "AS IS" BASIS, 146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 147 | implied, including, without limitation, any warranties or conditions 148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 149 | PARTICULAR PURPOSE. 
You are solely responsible for determining the 150 | appropriateness of using or redistributing the Work and assume any 151 | risks associated with Your exercise of permissions under this License. 152 | 153 | 8. Limitation of Liability. In no event and under no legal theory, 154 | whether in tort (including negligence), contract, or otherwise, 155 | unless required by applicable law (such as deliberate and grossly 156 | negligent acts) or agreed to in writing, shall any Contributor be 157 | liable to You for damages, including any direct, indirect, special, 158 | incidental, or consequential damages of any character arising as a 159 | result of this License or out of the use or inability to use the 160 | Work (including but not limited to damages for loss of goodwill, 161 | work stoppage, computer failure or malfunction, or any and all 162 | other commercial damages or losses), even if such Contributor 163 | has been advised of the possibility of such damages. 164 | 165 | 9. Accepting Warranty or Additional Liability. While redistributing 166 | the Work or Derivative Works thereof, You may choose to offer, 167 | and charge a fee for, acceptance of support, warranty, indemnity, 168 | or other liability obligations and/or rights consistent with this 169 | License. However, in accepting such obligations, You may act only 170 | on Your own behalf and on Your sole responsibility, not on behalf 171 | of any other Contributor, and only if You agree to indemnify, 172 | defend, and hold each Contributor harmless for any liability 173 | incurred by, or claims asserted against, such Contributor by reason 174 | of your accepting any such warranty or additional liability. 175 | 176 | END OF TERMS AND CONDITIONS 177 | 178 | APPENDIX: How to apply the Apache License to your work. 179 | 180 | To apply the Apache License to your work, attach the following 181 | boilerplate notice, with the fields enclosed by brackets "[]" 182 | replaced with your own identifying information. (Don't include 183 | the brackets!) The text should be enclosed in the appropriate 184 | comment syntax for the file format. We also recommend that a 185 | file or class name and description of purpose be included on the 186 | same "printed page" as the copyright notice for easier 187 | identification within third-party archives. 188 | 189 | Copyright [yyyy] [name of copyright owner] 190 | 191 | Licensed under the Apache License, Version 2.0 (the "License"); 192 | you may not use this file except in compliance with the License. 193 | You may obtain a copy of the License at 194 | 195 | http://www.apache.org/licenses/LICENSE-2.0 196 | 197 | Unless required by applicable law or agreed to in writing, software 198 | distributed under the License is distributed on an "AS IS" BASIS, 199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 200 | See the License for the specific language governing permissions and 201 | limitations under the License. 
202 | 
--------------------------------------------------------------------------------
/2_ML_Data/imgs/train_validation_test2.svg:
--------------------------------------------------------------------------------
[SVG figure: "All Data" partitioned into "Training", "Test", and "Validation" sets]
--------------------------------------------------------------------------------
/2_ML_Data/imgs/train_test_split.svg:
--------------------------------------------------------------------------------
[SVG figure: "All Data" partitioned into "Training data" and "Test data"]
--------------------------------------------------------------------------------
/2_ML_Data/imgs/cross_validation.svg:
--------------------------------------------------------------------------------
[SVG figure: 5-fold cross-validation; "Split 1" to "Split 5", each holding out a different one of "Fold 1" to "Fold 5"]
--------------------------------------------------------------------------------
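For reference, the following is a minimal usage sketch for the `FER` dataset class defined in `2_ML_Data/fer.py`, wired into a `torch.utils.data.DataLoader` as outlined in the README. It is not part of the original repository: the `./data` root directory and the batch size of 64 are illustrative choices, and it assumes it is run from inside `2_ML_Data/` so that `fer` is importable.

```python
# Minimal sketch: load the three FER partitions and iterate over mini-batches.
# Assumptions (not part of the repo): run from inside 2_ML_Data/ so that
# `fer` is importable, and "./data" is a writable directory.
from torch.utils.data import DataLoader

from fer import FER

# Download (first run only), preprocess, and load each partition.
train_set = FER(root="./data", split="train", download=True)
val_set = FER(root="./data", split="validation")
test_set = FER(root="./data", split="test")
print(len(train_set), len(val_set), len(test_set))  # 28709 3589 3589

# Each sample is a (1 x 48 x 48) grayscale image tensor plus an emotion index.
img, target = train_set[0]
print(img.shape, train_set.idx_to_class[target])

# Mini-batch iteration, shuffling the training partition only.
train_loader = DataLoader(train_set, batch_size=64, shuffle=True)
images, targets = next(iter(train_loader))
print(images.shape, targets.shape)  # [64, 1, 48, 48] and [64]
```

Note that the images are stored as `uint8` tensors; for actual training you would typically cast them to `float` and normalise them before feeding them to a model.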