├── __init__.py ├── .gitmodules ├── makefiles └── .gitkeep ├── notebooks └── .gitkeep ├── src ├── __init__.py ├── functional │ ├── numpy.py │ ├── __init__.py │ ├── pandas.py │ └── common.py ├── models │ ├── __init__.py │ ├── ensembles.py │ └── base.py ├── motifs │ ├── __init__.py │ ├── features.py │ └── models.py ├── utils │ ├── __init__.py │ ├── exceptions.py │ ├── label_helpers.py │ ├── misc.py │ ├── loaders.py │ └── decorators.py ├── evaluation │ ├── __init__.py │ └── classification.py ├── features │ ├── __init__.py │ ├── ecdf_features.py │ ├── statistical_features.py │ └── statistical_features_impl.py ├── transformers │ ├── __init__.py │ ├── source_selector.py │ ├── body_grav_filter.py │ ├── resample.py │ └── window.py ├── visualisations │ ├── __init__.py │ └── umap_embedding.py ├── datasets │ ├── __init__.py │ ├── uschad.py │ ├── anguita2013.py │ ├── pamap2.py │ └── base.py ├── keys.py ├── meta.py └── base.py ├── tables ├── modalities.md ├── pipelines.md ├── locations.md ├── visualisations.md ├── transformers.md ├── features.md ├── models.md ├── activities.md └── datasets.md ├── metadata ├── indices │ ├── index.yaml │ ├── label.yaml │ ├── split.yaml │ └── target.yaml ├── data_partitions │ ├── loso.yaml │ ├── deployable.yaml │ └── predefined.yaml ├── modality.yaml ├── tasks │ ├── har.yaml │ └── localisation.yaml └── datasets │ ├── sphere_challenge.yaml_ │ ├── uschad.yaml │ ├── anguita2013.yaml │ └── pamap2.yaml ├── .flake8 ├── requirements.txt ├── docs ├── getting-started.rst ├── commands.rst ├── index.rst ├── make.bat ├── Makefile └── conf.py ├── pyproject.toml ├── Pipfile ├── .pre-commit-config.yaml ├── LICENSE ├── .gitignore ├── make_download.py ├── har_basic.py ├── har_chain.py ├── har_ensemble_avg.py ├── make_tables.py ├── har_zero.py ├── Makefile └── README.md /__init__.py: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /.gitmodules: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /makefiles/.gitkeep: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /notebooks/.gitkeep: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /src/__init__.py: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /src/functional/numpy.py: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /src/models/__init__.py: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /src/motifs/__init__.py: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /src/utils/__init__.py: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /tables/modalities.md: 
-------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /metadata/indices/index.yaml: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /metadata/indices/label.yaml: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /metadata/indices/split.yaml: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /metadata/indices/target.yaml: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /src/evaluation/__init__.py: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /src/features/__init__.py: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /src/functional/__init__.py: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /src/functional/pandas.py: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /src/transformers/__init__.py: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /src/visualisations/__init__.py: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /metadata/data_partitions/loso.yaml: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /metadata/data_partitions/deployable.yaml: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /metadata/data_partitions/predefined.yaml: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /src/utils/exceptions.py: -------------------------------------------------------------------------------- 1 | class ModalityNotPresentError(KeyError): 2 | pass 3 | -------------------------------------------------------------------------------- /.flake8: -------------------------------------------------------------------------------- 1 | [flake8] 2 | ignore = E203, E266, E501, W503, F403, F401, W293 3 | max-line-length = 120 4 | max-complexity = 18 5 | select = B,C,E,F,W,T4,B9 6 | -------------------------------------------------------------------------------- /tables/pipelines.md: -------------------------------------------------------------------------------- 1 | | Index | Pipelines | value | 2 | | ----- | ----- | ----- | 3 | | 0 | stat_feat | None | 4 | | 1 | ecdf_11 | None | 5 | | 2 | ecdf_21 | 
None | 6 | -------------------------------------------------------------------------------- /tables/locations.md: -------------------------------------------------------------------------------- 1 | | Index | Locations | value | 2 | | ----- | ----- | ----- | 3 | | 0 | wrist | 0 | 4 | | 1 | chest | 1 | 5 | | 2 | ankle | 2 | 6 | | 3 | waist | 3 | 7 | -------------------------------------------------------------------------------- /src/datasets/__init__.py: -------------------------------------------------------------------------------- 1 | from src.datasets.anguita2013 import anguita2013 2 | from src.datasets.base import Dataset 3 | from src.datasets.pamap2 import pamap2 4 | from src.datasets.uschad import uschad 5 | -------------------------------------------------------------------------------- /tables/visualisations.md: -------------------------------------------------------------------------------- 1 | | Index | Visualisations | value | 2 | | ----- | ----- | ----- | 3 | | 0 | umap_embedding | [Link 1](https://arxiv.org/abs/1802.03426), [Link 2](https://github.com/lmcinnes/umap) | 4 | -------------------------------------------------------------------------------- /tables/transformers.md: -------------------------------------------------------------------------------- 1 | | Index | Transformers | value | 2 | | ----- | ----- | ----- | 3 | | 0 | body_grav_filter | {'resamples': False} | 4 | | 1 | resample | None | 5 | | 2 | window | {'resamples': True} | 6 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | requests 2 | numpy 3 | pandas 4 | numpy 5 | matplotlib 6 | seaborn 7 | python-dotenv 8 | scikit-learn 9 | joblib 10 | ipython 11 | jupyter 12 | pyyaml 13 | loguru 14 | tqdm 15 | git+https://github.com/njtwomey/mldb.git#egg=mldb 16 | spectrum 17 | click 18 | umap-learn 19 | -------------------------------------------------------------------------------- /docs/getting-started.rst: -------------------------------------------------------------------------------- 1 | Getting started 2 | =============== 3 | 4 | This is where you describe how to get set up on a clean install, including the 5 | commands necessary to get the raw data (using the `sync_data_from_s3` command, 6 | for example), and then how to make the cleaned, final data sets. 
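A minimal sketch of what that could look like for this repository, assuming the
`requirements.txt`, `make_download.py` and `har_basic.py` files at the project
root are used as the entry points::

    pip install -r requirements.txt    # install the Python dependencies
    python make_download.py            # download and unzip the raw datasets
    python har_basic.py                # run the basic HAR pipeline end to end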
7 | -------------------------------------------------------------------------------- /metadata/modality.yaml: -------------------------------------------------------------------------------- 1 | # Wearable modalities 2 | - accel 3 | - gyro 4 | - mag 5 | 6 | # Ambient sensor modalities 7 | - humidity 8 | - temperature 9 | - pir 10 | - light 11 | - electricity 12 | - rssi 13 | 14 | # RGBD modalities 15 | - rgbd_kitchen 16 | - rgbd_hall 17 | - rgbd_lr 18 | -------------------------------------------------------------------------------- /src/functional/common.py: -------------------------------------------------------------------------------- 1 | def sorted_node_values(nodes): 2 | return [nodes[key] for key in sorted(nodes.keys())] 3 | 4 | 5 | def node_itemgetter(item): 6 | def itemgetter_func(df): 7 | return df[item] 8 | 9 | itemgetter_func.__name__ = f"get_{item}" 10 | 11 | return itemgetter_func 12 | -------------------------------------------------------------------------------- /tables/features.md: -------------------------------------------------------------------------------- 1 | | Index | Features | value | 2 | | ----- | ----- | ----- | 3 | | 0 | statistical_features | [Link 1](https://pdfs.semanticscholar.org/83de/43bc849ad3d9579ccf540e6fe566ef90a58e.pdf) | 4 | | 1 | ecdf | [Link 1](https://dl.acm.org/citation.cfm?id=2494353), [Link 2](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.725.908&rep=rep1&type=pdf) | 5 | -------------------------------------------------------------------------------- /tables/models.md: -------------------------------------------------------------------------------- 1 | | Index | Models | value | 2 | | ----- | ----- | ----- | 3 | | 0 | scale_log_reg | [Link 1](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html), [Link 2](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegressionCV.html) | 4 | | 1 | random_forest | [Link 1](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html) | 5 | -------------------------------------------------------------------------------- /docs/commands.rst: -------------------------------------------------------------------------------- 1 | Commands 2 | ======== 3 | 4 | The Makefile contains the central entry points for common tasks related to this project. 5 | 6 | Syncing data to S3 7 | ^^^^^^^^^^^^^^^^^^ 8 | 9 | * `make sync_data_to_s3` will use `aws s3 sync` to recursively sync files in `data/` up to `s3://[OPTIONAL] your-bucket-for-syncing-data (do not include 's3://')/data/`. 10 | * `make sync_data_from_s3` will use `aws s3 sync` to recursively sync files from `s3://[OPTIONAL] your-bucket-for-syncing-data (do not include 's3://')/data/` to `data/`. 11 | -------------------------------------------------------------------------------- /docs/index.rst: -------------------------------------------------------------------------------- 1 | .. har_datasets documentation master file, created by 2 | sphinx-quickstart. 3 | You can adapt this file completely to your liking, but it should at least 4 | contain the root `toctree` directive. 5 | 6 | har_datasets documentation! 7 | ============================================== 8 | 9 | Contents: 10 | 11 | .. 
toctree:: 12 | :maxdepth: 2 13 | 14 | getting-started 15 | commands 16 | 17 | 18 | 19 | Indices and tables 20 | ================== 21 | 22 | * :ref:`genindex` 23 | * :ref:`modindex` 24 | * :ref:`search` 25 | -------------------------------------------------------------------------------- /pyproject.toml: -------------------------------------------------------------------------------- 1 | [tool.black] 2 | line-length = 120 3 | target-version = ['py38'] 4 | include = '\.pyi?$' 5 | exclude = ''' 6 | 7 | ( 8 | /( 9 | \.eggs # exclude a few common directories in the 10 | | \.git # root of the project 11 | | \.hg 12 | | \.mypy_cache 13 | | \.tox 14 | | \.venv 15 | | _build 16 | | buck-out 17 | | build 18 | | dist 19 | )/ 20 | | foo.py # also separately exclude a file named foo.py in 21 | # the root of the project 22 | ) 23 | ''' 24 | -------------------------------------------------------------------------------- /src/utils/label_helpers.py: -------------------------------------------------------------------------------- 1 | __all__ = ["normalise_labels"] 2 | 3 | 4 | def normalise_labels(ll): 5 | """ 6 | 7 | Args: 8 | ll: 9 | 10 | Returns: 11 | 12 | """ 13 | if "walk" in ll: 14 | return "walk" 15 | elif "elevator" in ll: 16 | return "stand" 17 | elif ll in {"lie", "sleep"}: 18 | return "lie" 19 | elif ll in {"vacuum", "iron", "laundry", "clean"}: 20 | return "chores" 21 | elif ll in {"run", "jump", "rope_jump", "soccer", "cycle"}: 22 | return "sport" 23 | return ll 24 | -------------------------------------------------------------------------------- /metadata/tasks/har.yaml: -------------------------------------------------------------------------------- 1 | # Sedentary activities 2 | - sit 3 | - stand 4 | - lie 5 | - watch_tv 6 | - elevator_up 7 | - elevator_down 8 | - sleep 9 | 10 | # Walking activities 11 | - walk 12 | - walk_up 13 | - walk_down 14 | - walk_left 15 | - walk_right 16 | - walk_nordic 17 | 18 | # Household 19 | - iron 20 | - laundry 21 | - clean 22 | - vacuum 23 | 24 | # Sport/exercise 25 | - run 26 | - cycle 27 | - soccer 28 | - rope_jump 29 | - jump 30 | 31 | # Other 32 | - work_computer 33 | - drive_car 34 | 35 | # Catchall 36 | - other 37 | - none 38 | -------------------------------------------------------------------------------- /Pipfile: -------------------------------------------------------------------------------- 1 | [[source]] 2 | url = "https://pypi.org/simple" 3 | verify_ssl = true 4 | name = "pypi" 5 | 6 | [packages] 7 | requests = "*" 8 | numpy = "*" 9 | pandas = "*" 10 | matplotlib = "*" 11 | seaborn = "*" 12 | python-dotenv = "*" 13 | scikit-learn = "*" 14 | joblib = "*" 15 | ipython = "*" 16 | jupyter = "*" 17 | loguru = "*" 18 | tqdm = "*" 19 | mldb = {git = "https://github.com/njtwomey/mldb.git"} 20 | spectrum = "*" 21 | click = "*" 22 | umap-learn = "*" 23 | PyYAML = "*" 24 | pygraphviz = "*" 25 | 26 | [dev-packages] 27 | 28 | [requires] 29 | python_version = "3.8" 30 | -------------------------------------------------------------------------------- /src/utils/misc.py: -------------------------------------------------------------------------------- 1 | import json 2 | import random 3 | 4 | import numpy as np 5 | 6 | 7 | __all__ = ["randomised_order", "NumpyEncoder"] 8 | 9 | 10 | def randomised_order(iterable): 11 | iterable = list(iterable) 12 | random.shuffle(iterable) 13 | yield from iterable 14 | 15 | 16 | class NumpyEncoder(json.JSONEncoder): 17 | def default(self, obj): 18 | if isinstance(obj, np.integer): 19 | return int(obj) 20 | elif 
isinstance(obj, np.floating): 21 | return float(obj) 22 | elif isinstance(obj, np.ndarray): 23 | return obj.tolist() 24 | else: 25 | return super(NumpyEncoder, self).default(obj) 26 | -------------------------------------------------------------------------------- /tables/activities.md: -------------------------------------------------------------------------------- 1 | | Index | Activities | value | 2 | | ----- | ----- | ----- | 3 | | 0 | walk | 0 | 4 | | 1 | walk_up | 1 | 5 | | 2 | walk_down | 2 | 6 | | 3 | sit | 3 | 7 | | 4 | stand | 4 | 8 | | 5 | lie | 5 | 9 | | 6 | run | 6 | 10 | | 7 | cycle | 7 | 11 | | 8 | walk_nordic | 8 | 12 | | 9 | watch_tv | 9 | 13 | | 10 | work_computer | 10 | 14 | | 11 | drive_car | 11 | 15 | | 12 | vacuum | 12 | 16 | | 13 | iron | 13 | 17 | | 14 | laundry | 14 | 18 | | 15 | clean | 15 | 19 | | 16 | soccer | 16 | 20 | | 17 | rope_jump | 17 | 21 | | 18 | other | 18 | 22 | | 19 | walk_left | 19 | 23 | | 20 | walk_right | 20 | 24 | | 21 | jump | 21 | 25 | | 22 | sleep | 22 | 26 | | 23 | elevator_up | 23 | 27 | | 24 | elevator_down | 24 | 28 | -------------------------------------------------------------------------------- /.pre-commit-config.yaml: -------------------------------------------------------------------------------- 1 | repos: 2 | - repo: https://github.com/ambv/black 3 | rev: stable 4 | hooks: 5 | - id: black 6 | language_version: python3.8 7 | - repo: https://gitlab.com/pycqa/flake8 8 | rev: 3.7.9 9 | hooks: 10 | - id: flake8 11 | - repo: https://github.com/asottile/reorder_python_imports 12 | rev: v2.4.0 13 | hooks: 14 | - id: reorder-python-imports 15 | - repo: https://github.com/pre-commit/pre-commit-hooks 16 | rev: v3.4.0 17 | hooks: 18 | - id: trailing-whitespace 19 | - id: end-of-file-fixer 20 | - id: check-docstring-first 21 | - id: check-json 22 | - id: check-added-large-files 23 | - id: check-yaml 24 | - id: debug-statements 25 | - id: requirements-txt-fixer 26 | -------------------------------------------------------------------------------- /metadata/tasks/localisation.yaml: -------------------------------------------------------------------------------- 1 | - floor_5 2 | - floor_4 3 | - floor_3 4 | - floor_2 5 | - floor_1 6 | - floor_0 7 | - basement_1 8 | - basement_2 9 | - basement_3 10 | - basement_4 11 | - basement_5 12 | 13 | - hallway_1 14 | - hallway_2 15 | - hallway_3 16 | - hallway_4 17 | - hallway_5 18 | 19 | - bathroom_1 20 | - bathroom_2 21 | - bathroom_3 22 | - bathroom_4 23 | - bathroom_5 24 | 25 | - toilet_1 26 | - toilet_2 27 | - toilet_3 28 | - toilet_4 29 | - toilet_5 30 | 31 | - living_1 32 | - living_2 33 | - living_3 34 | - living_4 35 | - living_5 36 | 37 | - bedroom_1 38 | - bedroom_2 39 | - bedroom_3 40 | - bedroom_4 41 | - bedroom_5 42 | 43 | - dining_room_1 44 | - dining_room_2 45 | - dining_room_3 46 | - dining_room_4 47 | - dining_room_5 48 | 49 | - stairs_1 50 | - stairs_2 51 | - stairs_3 52 | - stairs_4 53 | - stairs_5 54 | 55 | - kitchen 56 | - study 57 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | 2 | The MIT License (MIT) 3 | Copyright (c) 2019, Niall Twomey 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy of this software and 6 | associated documentation files (the "Software"), to deal in the Software without restriction, including 7 | without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 8 
| copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the 9 | following conditions: 10 | 11 | The above copyright notice and this permission notice shall be included in all copies or substantial 12 | portions of the Software. 13 | 14 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT 15 | LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN 16 | NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, 17 | WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE 18 | SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 19 | 20 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | # Byte-compiled / optimized / DLL files 2 | __pycache__/ 3 | *.py[cod] 4 | .DS_Store 5 | # C extensions 6 | *.so 7 | 8 | *.lock 9 | 10 | devel.py 11 | 12 | # Distribution / packaging 13 | .Python 14 | env/ 15 | build/ 16 | develop-eggs/ 17 | dist/ 18 | downloads/ 19 | eggs/ 20 | .eggs/ 21 | lib/ 22 | lib64/ 23 | parts/ 24 | sdist/ 25 | var/ 26 | *.egg-info/ 27 | .installed.cfg 28 | *.egg 29 | *.lock 30 | 31 | niall_*.py 32 | 33 | # PyInstaller 34 | # Usually these files are written by a python script from a template 35 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 36 | *.manifest 37 | *.spec 38 | 39 | # Installer logs 40 | pip-log.txt 41 | pip-delete-this-directory.txt 42 | 43 | # Unit test / coverage reports 44 | htmlcov/ 45 | .tox/ 46 | .coverage 47 | .coverage.* 48 | .cache 49 | nosetests.xml 50 | coverage.xml 51 | *,cover 52 | 53 | # Translations 54 | *.mo 55 | *.pot 56 | 57 | # Django stuff: 58 | *.log 59 | 60 | # Sphinx documentation 61 | docs/_build/ 62 | 63 | # PyBuilder 64 | target/ 65 | 66 | # DotEnv configuration 67 | .env 68 | 69 | # Database 70 | *.db 71 | *.rdb 72 | 73 | # Pycharm 74 | .idea 75 | 76 | # Jupyter NB Checkpoints 77 | .ipynb_checkpoints/ 78 | 79 | # exclude data from source control by default 80 | /data/ 81 | -------------------------------------------------------------------------------- /src/features/ecdf_features.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | from src.functional.common import sorted_node_values 4 | 5 | 6 | __all__ = [ 7 | "ecdf", 8 | ] 9 | 10 | 11 | def ecdf(parent, n_components): 12 | root = parent / f"feat='ecdf'-k={n_components}" 13 | 14 | for key, node in parent.outputs.items(): 15 | root.instantiate_node( 16 | key=f"{key}-ecdf", backend="numpy", func=calc_ecdf, kwargs=dict(n_components=n_components, data=node), 17 | ) 18 | 19 | return root.instantiate_node( 20 | key="features", func=np.concatenate, args=[sorted_node_values(root.outputs)], kwargs=dict(axis=1) 21 | ) 22 | 23 | 24 | def calc_ecdf(data, n_components): 25 | return np.asarray([ecdf_rep(datum, n_components) for datum in data]) 26 | 27 | 28 | def ecdf_rep(data, components): 29 | """ 30 | Taken from: https://github.com/nhammerla/ecdfRepresentation/blob/master/python/ecdfRep.py 31 | 32 | Parameters 33 | ---------- 34 | data 35 | components 36 | 37 | Returns 38 | ------- 39 | 40 | """ 41 | 42 | m = data.mean(0) 43 | data = np.sort(data, axis=0) 44 | data = data[np.int32(np.around(np.linspace(0, data.shape[0] - 1, num=components))), :] 45 | data 
= data.flatten() 46 | return np.hstack((data, m)) 47 | -------------------------------------------------------------------------------- /src/motifs/features.py: -------------------------------------------------------------------------------- 1 | from mldb import NodeWrapper 2 | 3 | from src.features.ecdf_features import ecdf 4 | from src.features.statistical_features import statistical_features 5 | from src.transformers.body_grav_filter import body_grav_filter 6 | from src.transformers.resample import resample 7 | from src.transformers.source_selector import source_selector 8 | from src.transformers.window import window 9 | 10 | 11 | def get_windowed_wearables( 12 | dataset: NodeWrapper, modality: str, location: str, fs_new: float, win_len: float, win_inc: float 13 | ): 14 | selected_sources = source_selector(parent=dataset, modality=modality, location=location) 15 | wear_resampled = resample(parent=selected_sources, fs_new=fs_new) 16 | wear_filtered = body_grav_filter(parent=wear_resampled) 17 | wear_windowed = window(parent=wear_filtered, win_len=win_len, win_inc=win_inc) 18 | return wear_windowed 19 | 20 | 21 | def get_features(feat_name: str, windowed_data: NodeWrapper): 22 | if feat_name == "statistical": 23 | features = statistical_features(parent=windowed_data) 24 | elif feat_name == "ecdf": 25 | features = ecdf(parent=windowed_data, n_components=21) 26 | else: 27 | raise ValueError 28 | 29 | assert isinstance(features, NodeWrapper) 30 | 31 | return features 32 | -------------------------------------------------------------------------------- /src/keys.py: -------------------------------------------------------------------------------- 1 | __all__ = ["Key"] 2 | 3 | from pathlib import Path 4 | from typing import Union 5 | 6 | 7 | class Key(object): 8 | def __init__(self, key): 9 | self.key = validate_key(key) 10 | 11 | def __len__(self) -> int: 12 | return len(self.key) 13 | 14 | def __str__(self) -> str: 15 | return self.key 16 | 17 | def __repr__(self) -> str: 18 | key = self.key 19 | return f"Key({key=})" 20 | 21 | def __eq__(self, other: Union["Key", Path, str]) -> bool: 22 | if isinstance(other, Path): 23 | return self.key == other.name 24 | if isinstance(other, Key): 25 | return self.key == other.key 26 | if isinstance(other, str): 27 | return self.key == other 28 | raise NotImplementedError 29 | 30 | def __hash__(self) -> int: 31 | return hash(self.key) 32 | 33 | def __contains__(self, key) -> bool: 34 | validate_key(key) 35 | return key in self.key 36 | 37 | def __lt__(self, other) -> bool: 38 | return self.key < other.key 39 | 40 | 41 | def validate_key(key: Union[Path, Key, str]): 42 | if isinstance(key, Path): 43 | return key.name 44 | if isinstance(key, Key): 45 | return key.key 46 | if isinstance(key, str): 47 | return key 48 | raise ValueError(f"Unsupported key type: expected str or Key, got {type(key)} (value: {key})") 49 | -------------------------------------------------------------------------------- /metadata/datasets/sphere_challenge.yaml_: -------------------------------------------------------------------------------- 1 | #name: "sphere_challenge" 2 | #author: "Twomey" 3 | #paper_name: "The SPHERE challenge: Activity recognition with multimodal sensor data" 4 | #venue: "arXiv preprint arXiv:1603.00797" 5 | #bibtex: 6 | # - "@article{twomey2016sphere,title={The SPHERE challenge: Activity recognition with multimodal sensor data},author={Twomey, Niall and Diethe, Tom and Kull, Meelis and Song, Hao and Camplani, Massimo and Hannuna, Sion and Fafoutis, Xenofon and 
Zhu, Ni and Woznowski, Pete and Flach, Peter and Craddock, Ian},journal={arXiv preprint arXiv:1603.00797},year={2016}}" 7 | #paper_urls: 8 | # - "https://arxiv.org/pdf/1603.00797" 9 | #year: 2016 10 | #description_urls: 11 | # - "https://www.irc-sphere.ac.uk/sphere-challenge/home" 12 | # - "https://arxiv.org/abs/1603.00797" 13 | #download_urls: 14 | # - "https://data.bris.ac.uk/datasets/8gccwpx47rav19vk8x4xapcog/8gccwpx47rav19vk8x4xapcog.zip" 15 | #fs: 20 16 | #num_subjects: 9 17 | #num_activities: 12 18 | #missing: true 19 | #tasks: 20 | # har: 21 | # target_transform: 22 | # lie: 1 23 | # sit: 2 24 | # stand: 3 25 | # walk: 4 26 | # run: 5 27 | # cycle: 6 28 | # walk_nordic: 7 29 | # watch_tv: 9 30 | # work_computer: 10 31 | # drive_car: 11 32 | # walk_up: 12 33 | # walk_down: 13 34 | # vacuum: 16 35 | # iron: 17 36 | # laundry: 18 37 | # clean: 19 38 | # soccer: 20 39 | # rope_jump: 24 40 | # other: 0 41 | # evaluation: 42 | # - "probabilistic_targets" 43 | -------------------------------------------------------------------------------- /tables/datasets.md: -------------------------------------------------------------------------------- 1 | | First Author | Paper Name | Dataset Name | Description | Missing data | Download Links | Year | Sampling Rate | Device Locations | Device Modalities | Num Subjects | Num Activities | Activities | 2 | | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | 3 | | Anguita | A Public Domain Dataset for Human Activity Recognition Using Smartphones | anguita2013 | [Link 1](http://archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+Using+Smartphones) | None | [Link 1](https://pdfs.semanticscholar.org/83de/43bc849ad3d9579ccf540e6fe566ef90a58e.pdf) | 2013 | 50 | waist | accel, gyro, mag | 30 | 6 | walk, walk_up, walk_down, sit, stand, lie | 4 | | Reiss | Introducing a new benchmarked dataset for activity monitoring | pamap2 | [Link 1](http://archive.ics.uci.edu/ml/datasets/PAMAP2+Physical+Activity+Monitoring), [Link 2](http://archive.ics.uci.edu/ml/machine-learning-databases/00231/readme.pdf) | None | [Link 1](https://ieeexplore.ieee.org/document/6246152/), [Link 2](https://www.researchgate.net/publication/235348485_Introducing_a_New_Benchmarked_Dataset_for_Activity_Monitoring) | 2012 | 100 | wrist, chest, ankle | accel, gyro, mag | 9 | 12 | lie, sit, stand, walk, run, cycle, walk_nordic, watch_tv, work_computer, drive_car, walk_up, walk_down, vacuum, iron, laundry, clean, soccer, rope_jump, other | 5 | | Zhang | USC-HAD: A Daily Activity Dataset for Ubiquitous Activity Recognition Using Wearable Sensors | uschad | [Link 1](http://sipi.usc.edu/had/) | None | [Link 1](http://sipi.usc.edu/had/mi_ubicomp_sagaware12.pdf) | 2012 | 100 | waist | accel, gyro | 14 | 12 | walk, walk_left, walk_right, walk_up, walk_down, run, jump, sit, stand, sleep, elevator_up, elevator_down | 6 | -------------------------------------------------------------------------------- /src/transformers/source_selector.py: -------------------------------------------------------------------------------- 1 | from loguru import logger 2 | from numpy import concatenate 3 | 4 | from src.base import get_ancestral_metadata 5 | from src.keys import Key 6 | 7 | __all__ = [ 8 | "source_selector", 9 | ] 10 | 11 | 12 | def do_select_feats(**nodes): 13 | keys = sorted(nodes.keys()) 14 | return concatenate([nodes[key] for key in keys], axis=1) 15 | 16 | 17 | def source_selector(parent, modality="all", location="all"): 18 | locations_set = 
set(get_ancestral_metadata(parent, "locations")) 19 | assert location in locations_set or location == "all", f"Location {location} not in {locations_set}" 20 | 21 | modality_set = set(get_ancestral_metadata(parent, "modalities")) 22 | assert modality in modality_set or modality == "all", f"Modality {modality} not in {modality_set}" 23 | 24 | loc, mod = location, modality 25 | root = parent / f"{loc=}-{mod=}" 26 | 27 | # Prepare a set of viable outputs 28 | valid_locations = set() 29 | for pair in parent.meta["sources"]: 30 | loc, mod = pair["loc"], pair["mod"] 31 | good_location = location == "all" or pair["loc"] == location 32 | good_modality = modality == "all" or pair["mod"] == modality 33 | if good_location and good_modality: 34 | valid_locations.update({Key(f"{loc=}-{mod=}")}) 35 | 36 | # Aggregate all relevant sources 37 | selected = 0 38 | for key, node in parent.outputs.items(): 39 | if key in valid_locations: 40 | selected += 1 41 | root.acquire_node(key=key, node=node) 42 | 43 | if not selected: 44 | logger.exception(f"No wearable keys found in {sorted(parent.outputs.keys())}") 45 | raise KeyError 46 | 47 | return root 48 | -------------------------------------------------------------------------------- /make_download.py: -------------------------------------------------------------------------------- 1 | import zipfile 2 | from os import makedirs 3 | from os.path import basename 4 | from os.path import exists 5 | from os.path import join 6 | from os.path import split 7 | from os.path import splitext 8 | 9 | import requests 10 | from loguru import logger 11 | from tqdm import tqdm 12 | 13 | from src.meta import DatasetMeta 14 | from src.utils.loaders import iter_dataset_paths 15 | 16 | 17 | def unzip_data(zip_path, in_name, out_name): 18 | if exists(join(zip_path, out_name)): 19 | return 20 | with zipfile.ZipFile(join(zip_path, in_name), "r") as fil: 21 | fil.extractall(zip_path) 22 | 23 | 24 | def download_and_save(url, path, force=False, chunk_size=2 ** 12): 25 | response = requests.get(url, stream=True) 26 | fname = join(path, split(url)[1]) 27 | desc = f"Downloading {fname}..." 
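    # Skip the download when the target file already exists, unless force=True.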
28 | if exists(fname): 29 | if not force: 30 | return 31 | chunks = tqdm(response.iter_content(chunk_size=chunk_size), desc=basename(desc)) 32 | with open(fname, "wb") as fil: 33 | for chunk in chunks: 34 | fil.write(chunk) 35 | 36 | 37 | def download_dataset(dataset_meta_path): 38 | dataset = DatasetMeta(dataset_meta_path) 39 | if not exists(dataset.zip_path): 40 | makedirs(dataset.zip_path) 41 | for ii, url in enumerate(dataset.meta["download_urls"]): 42 | logger.info("\t{}/{} {}".format(ii + 1, len(dataset.meta["download_urls"]), url)) 43 | download_and_save(url=url, path=dataset.zip_path) 44 | zip_name = basename(dataset.meta["download_urls"][0]) 45 | unzip_path = join(dataset.zip_path, splitext(zip_name)[0]) 46 | unzip_data(zip_path=dataset.zip_path, in_name=zip_name, out_name=unzip_path) 47 | 48 | 49 | def main(): 50 | for dataset_meta_path in iter_dataset_paths(): 51 | logger.info(f"Downloading {dataset_meta_path}") 52 | download_dataset(dataset_meta_path) 53 | 54 | 55 | if __name__ == "__main__": 56 | main() 57 | -------------------------------------------------------------------------------- /src/visualisations/umap_embedding.py: -------------------------------------------------------------------------------- 1 | import matplotlib.pyplot as pl 2 | import pandas as pd 3 | import seaborn as sns 4 | from mldb import NodeWrapper 5 | from sklearn.pipeline import Pipeline 6 | from sklearn.preprocessing import StandardScaler 7 | from umap import UMAP 8 | 9 | from src.base import ExecutionGraph 10 | 11 | # from src.utils.label_helpers import normalise_labels 12 | 13 | sns.set_style("darkgrid") 14 | sns.set_context("paper") 15 | 16 | 17 | def learn_umap(data): 18 | umap = Pipeline([("scale", StandardScaler()), ("embed", UMAP(n_neighbors=50, verbose=True))]) 19 | umap.fit(data) 20 | return umap 21 | 22 | 23 | def embed_umap(label, data, model): 24 | embedding = model.transform(data) 25 | 26 | # Need to re-label with a new dataframe since the categories in the normalised label 27 | # set are different to those in the full set. 
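    # (The normalise_labels re-mapping on the commented-out line below is currently
    # disabled, so the raw task targets are plotted unchanged.)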
28 | # label = pd.DataFrame(label.target.apply(normalise_labels)).astype("category") 29 | label = pd.DataFrame(label["target"]).astype("category") 30 | 31 | fig, ax = pl.subplots(1, 1, figsize=(10, 10)) 32 | 33 | labels = label.target.values 34 | unique_labels = label.target.unique() 35 | colours = sns.color_palette(n_colors=unique_labels.shape[0]) 36 | for ll, cc in zip(unique_labels, colours): 37 | if ll == "other": 38 | continue 39 | inds = labels == ll 40 | ax.scatter(embedding[inds, 0], embedding[inds, 1], color=cc, label=ll, s=5, alpha=0.75) 41 | pl.legend(fontsize="x-large", markerscale=3) 42 | pl.tight_layout() 43 | 44 | return fig 45 | 46 | 47 | def umap_embedding(features: NodeWrapper, task_name): 48 | parent: ExecutionGraph = features.graph 49 | umap_model = parent.instantiate_orphan_node(func=learn_umap, kwargs=dict(data=features),) 50 | viz = parent.instantiate_node( 51 | key=f"umap-visualisation", 52 | func=embed_umap, 53 | backend="png", 54 | kwargs=dict(data=features, model=umap_model, label=parent[task_name]), 55 | ) 56 | return viz 57 | -------------------------------------------------------------------------------- /metadata/datasets/uschad.yaml: -------------------------------------------------------------------------------- 1 | name: "uschad" 2 | author: "Zhang" 3 | paper_name: "USC-HAD: A Daily Activity Dataset for Ubiquitous Activity Recognition Using Wearable Sensors" 4 | venue: "Proceedings of the 2012 ACM Conference on Ubiquitous Computing" 5 | bibtex: 6 | - "@inproceedings{zhang2012usc,title={USC-HAD: a daily activity dataset for ubiquitous activity recognition using wearable sensors},author={Zhang, Mi and Sawchuk, Alexander A},booktitle={Proceedings of the 2012 ACM Conference on Ubiquitous Computing},pages={1036--1043},year={2012},organization={ACM}}" 7 | paper_urls: 8 | - "http://sipi.usc.edu/had/mi_ubicomp_sagaware12.pdf" 9 | year: 2012 10 | description_urls: 11 | - "http://sipi.usc.edu/had/" 12 | download_urls: 13 | - "http://sipi.usc.edu/had/USC-HAD.zip" 14 | fs: 100 15 | num_subjects: 14 16 | num_activities: 12 17 | missing: null 18 | modalities: 19 | - accel 20 | - gyro 21 | locations: 22 | - waist 23 | sources: 24 | - loc: waist 25 | mod: accel 26 | - loc: waist 27 | mod: gyro 28 | tasks: 29 | har: 30 | evaluation: classification 31 | target_transform: 32 | walk: 1 33 | walk_left: 2 34 | walk_right: 3 35 | walk_up: 4 36 | walk_down: 5 37 | run: 6 38 | jump: 7 39 | sit: 8 40 | stand: 9 41 | sleep: 10 42 | elevator_up: 11 43 | elevator_down: 12 44 | data_partitions: 45 | predefined: 46 | - fold_0 47 | loso: 48 | - fold_0 49 | - fold_1 50 | - fold_2 51 | - fold_3 52 | - fold_4 53 | - fold_5 54 | - fold_6 55 | - fold_7 56 | - fold_8 57 | - fold_9 58 | - fold_10 59 | - fold_11 60 | - fold_12 61 | - fold_13 62 | - fold_14 63 | - fold_15 64 | - fold_16 65 | - fold_17 66 | - fold_18 67 | - fold_19 68 | - fold_20 69 | - fold_21 70 | - fold_22 71 | - fold_23 72 | - fold_24 73 | - fold_25 74 | - fold_26 75 | - fold_27 76 | - fold_28 77 | - fold_29 78 | deployable: 79 | - fold_0 80 | -------------------------------------------------------------------------------- /har_basic.py: -------------------------------------------------------------------------------- 1 | from src.base import get_ancestral_metadata 2 | from src.motifs.features import get_features 3 | from src.motifs.features import get_windowed_wearables 4 | from src.motifs.models import get_classifier 5 | from src.utils.loaders import dataset_importer 6 | from src.utils.misc import randomised_order 7 | from 
src.visualisations.umap_embedding import umap_embedding 8 | 9 | 10 | def basic_har( 11 | # 12 | # Dataset 13 | dataset_name="pamap2", 14 | # 15 | # Representation sources 16 | modality="all", 17 | location="all", 18 | # 19 | # Task/split 20 | task_name="har", 21 | data_partition="predefined", 22 | # 23 | # Windowification 24 | fs_new=33, 25 | win_len=3, 26 | win_inc=1, 27 | # 28 | # Features 29 | feat_name="ecdf", 30 | clf_name="rf", 31 | # 32 | # Embedding visualisation 33 | viz=False, 34 | evaluate=False, 35 | ): 36 | dataset = dataset_importer(dataset_name) 37 | 38 | # Resample, filter and window the raw sensor data 39 | wear_windowed = get_windowed_wearables( 40 | dataset=dataset, modality=modality, location=location, fs_new=fs_new, win_len=win_len, win_inc=win_inc 41 | ) 42 | 43 | # Extract features 44 | features = get_features(feat_name=feat_name, windowed_data=wear_windowed) 45 | 46 | # Visualise the feature embeddings 47 | if viz: 48 | umap_embedding(features, task_name=task_name).evaluate() 49 | 50 | # Get classifier params 51 | models = dict() 52 | train_test_splits = get_ancestral_metadata(features, "data_partitions")[data_partition] 53 | for train_test_split in randomised_order(train_test_splits): 54 | models[train_test_split] = get_classifier( 55 | clf_name=clf_name, 56 | feature_node=features, 57 | task_name=task_name, 58 | data_partition=data_partition, 59 | evaluate=evaluate, 60 | train_test_split=train_test_split, 61 | ) 62 | 63 | return features, models 64 | 65 | 66 | if __name__ == "__main__": 67 | basic_har(viz=True, evaluate=True) 68 | -------------------------------------------------------------------------------- /metadata/datasets/anguita2013.yaml: -------------------------------------------------------------------------------- 1 | name: "anguita2013" 2 | author: "Anguita" 3 | paper_name: "A Public Domain Dataset for Human Activity Recognition Using Smartphones" 4 | venue: "European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning" 5 | bibtex: 6 | - "@inproceedings{anguita2013public,title={A public domain dataset for human activity recognition using smartphones.},author={Anguita, Davide and Ghio, Alessandro and Oneto, Luca and Parra, Xavier and Reyes-Ortiz, Jorge Luis},booktitle={Esann},year={2013}}" 7 | paper_urls: 8 | - "https://pdfs.semanticscholar.org/83de/43bc849ad3d9579ccf540e6fe566ef90a58e.pdf" 9 | year: 2013 10 | description_urls: 11 | - "http://archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+Using+Smartphones" 12 | download_urls: 13 | - "http://archive.ics.uci.edu/ml/machine-learning-databases/00240/UCI%20HAR%20Dataset.zip" 14 | - "http://archive.ics.uci.edu/ml/machine-learning-databases/00240/UCI%20HAR%20Dataset.names" 15 | fs: 50 16 | missing: null 17 | num_subjects: 30 18 | num_activities: 6 19 | modalities: 20 | - accel 21 | - gyro 22 | locations: 23 | - waist 24 | sources: 25 | - loc: waist 26 | mod: accel 27 | - loc: waist 28 | mod: gyro 29 | tasks: 30 | har: 31 | evaluation: 32 | - classification 33 | target_transform: 34 | walk: 1 35 | walk_up: 2 36 | walk_down: 3 37 | sit: 4 38 | stand: 5 39 | lie: 6 40 | data_partitions: 41 | predefined: 42 | - fold_0 43 | loso: 44 | - fold_0 45 | - fold_1 46 | - fold_2 47 | - fold_3 48 | - fold_4 49 | - fold_5 50 | - fold_6 51 | - fold_7 52 | - fold_8 53 | - fold_9 54 | - fold_10 55 | - fold_11 56 | - fold_12 57 | - fold_13 58 | - fold_14 59 | - fold_15 60 | - fold_16 61 | - fold_17 62 | - fold_18 63 | - fold_19 64 | - fold_20 65 | - fold_21 66 | - fold_22 67 | - 
fold_23 68 | - fold_24 69 | - fold_25 70 | - fold_26 71 | - fold_27 72 | - fold_28 73 | - fold_29 74 | deployable: 75 | - fold_0 76 | -------------------------------------------------------------------------------- /src/transformers/body_grav_filter.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from scipy import signal 3 | 4 | from src.base import get_ancestral_metadata 5 | from src.utils.decorators import PartitionByTrial 6 | 7 | __all__ = [ 8 | "body_grav_filter", 9 | ] 10 | 11 | 12 | def filter_signal(data, filter_order, cutoff, fs, btype, axis=0): 13 | ba = signal.butter(filter_order, cutoff / fs / 2, btype=btype) 14 | 15 | mu = np.mean(data, axis=0, keepdims=True) 16 | 17 | dd = signal.filtfilt(ba[0], ba[1], data - mu, axis=axis) + mu 18 | 19 | return dd 20 | 21 | 22 | def body_filt(index, data, **kwargs): 23 | filt = filter_signal(data, btype="high", **kwargs) 24 | assert np.isfinite(filt).all() 25 | return filt 26 | 27 | 28 | def grav_filt(index, data, **kwargs): 29 | filt = filter_signal(data, btype="low", **kwargs) 30 | assert np.isfinite(filt).all() 31 | return filt 32 | 33 | 34 | def body_jerk_filt(index, data, **kwargs): 35 | filt = body_filt(index, data, **kwargs) 36 | jerk = np.empty(filt.shape, dtype=filt.dtype) 37 | jerk[0] = 0 38 | jerk[1:] = filt[1:] - filt[:-1] 39 | assert np.isfinite(filt).all() 40 | return jerk 41 | 42 | 43 | def body_grav_filter(parent): 44 | root = parent / "body_grav_filter" 45 | 46 | kwargs = dict(fs=get_ancestral_metadata(root, "fs"), filter_order=3, cutoff=0.3) 47 | 48 | for key, node in parent.outputs.items(): 49 | filt = "body" 50 | root.instantiate_node( 51 | key=f"{key}-{filt=}", 52 | func=PartitionByTrial(func=body_filt), 53 | backend="none", 54 | kwargs=dict(data=node, index=parent.index["index"], **kwargs), 55 | ) 56 | 57 | filt = "body_jerk" 58 | root.instantiate_node( 59 | key=f"{key}-{filt=}", 60 | func=PartitionByTrial(func=body_jerk_filt), 61 | backend="none", 62 | kwargs=dict(data=node, index=parent.index["index"], **kwargs), 63 | ) 64 | 65 | if "accel" in key: 66 | filt = "grav" 67 | root.instantiate_node( 68 | key=f"{key}-{filt=}", 69 | func=PartitionByTrial(func=grav_filt), 70 | backend="none", 71 | kwargs=dict(data=node, index=parent.index["index"], **kwargs), 72 | ) 73 | 74 | return root 75 | -------------------------------------------------------------------------------- /metadata/datasets/pamap2.yaml: -------------------------------------------------------------------------------- 1 | name: "pamap2" 2 | author: "Reiss" 3 | paper_name: "Introducing a new benchmarked dataset for activity monitoring" 4 | venue: "IEEE 16th International Symposium on Wearable Computers" 5 | bibtex: 6 | - "@inproceedings{reiss2012introducing,title={Introducing a new benchmarked dataset for activity monitoring},author={Reiss, Attila and Stricker, Didier},booktitle={2012 16th International Symposium on Wearable Computers},pages={108--109},year={2012},organization={IEEE}}" 7 | paper_urls: 8 | - "https://ieeexplore.ieee.org/document/6246152/" 9 | - "https://www.researchgate.net/publication/235348485_Introducing_a_New_Benchmarked_Dataset_for_Activity_Monitoring" 10 | year: 2012 11 | description_urls: 12 | - "http://archive.ics.uci.edu/ml/datasets/PAMAP2+Physical+Activity+Monitoring" 13 | - "http://archive.ics.uci.edu/ml/machine-learning-databases/00231/readme.pdf" 14 | download_urls: 15 | - "http://archive.ics.uci.edu/ml/machine-learning-databases/00231/PAMAP2_Dataset.zip" 16 | - 
"http://archive.ics.uci.edu/ml/machine-learning-databases/00231/readme.pdf" 17 | fs: 100 18 | num_subjects: 9 19 | num_activities: 12 20 | missing: null 21 | modalities: 22 | - accel 23 | - gyro 24 | - mag 25 | locations: 26 | - ankle 27 | - chest 28 | - wrist 29 | sources: 30 | - loc: ankle 31 | mod: accel 32 | - loc: ankle 33 | mod: gyro 34 | - loc: ankle 35 | mod: mag 36 | - loc: chest 37 | mod: accel 38 | - loc: chest 39 | mod: gyro 40 | - loc: chest 41 | mod: mag 42 | - loc: wrist 43 | mod: accel 44 | - loc: wrist 45 | mod: gyro 46 | - loc: wrist 47 | mod: mag 48 | tasks: 49 | har: 50 | evaluation: classification 51 | target_transform: 52 | lie: 1 53 | sit: 2 54 | stand: 3 55 | walk: 4 56 | run: 5 57 | cycle: 6 58 | walk_nordic: 7 59 | watch_tv: 9 60 | work_computer: 10 61 | drive_car: 11 62 | walk_up: 12 63 | walk_down: 13 64 | vacuum: 16 65 | iron: 17 66 | laundry: 18 67 | clean: 19 68 | soccer: 20 69 | rope_jump: 24 70 | other: 0 71 | data_partitions: 72 | predefined: 73 | - fold_0 74 | loso: 75 | - fold_0 76 | - fold_1 77 | - fold_2 78 | - fold_3 79 | - fold_4 80 | - fold_5 81 | - fold_6 82 | - fold_7 83 | - fold_8 84 | deployable: 85 | - fold_0 86 | -------------------------------------------------------------------------------- /har_chain.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | from har_basic import basic_har 4 | from src.base import get_ancestral_metadata 5 | from src.motifs.models import get_classifier 6 | from src.utils.misc import randomised_order 7 | 8 | 9 | def har_chain( 10 | test_dataset="anguita2013", 11 | fs_new=33, 12 | win_len=3, 13 | win_inc=1, 14 | task_name="har", 15 | data_partition="predefined", 16 | feat_name="ecdf", 17 | clf_name="sgd", 18 | evaluate=False, 19 | ): 20 | # Make metadata for the experiment 21 | kwargs = dict( 22 | fs_new=fs_new, win_len=win_len, win_inc=win_inc, task_name=task_name, feat_name=feat_name, clf_name=clf_name 23 | ) 24 | 25 | dataset_alignment = dict( 26 | anguita2013=dict(dataset_name="anguita2013", location="waist", modality="accel"), 27 | pamap2=dict(dataset_name="pamap2", location="chest", modality="accel"), 28 | uschad=dict(dataset_name="uschad", location="waist", modality="accel"), 29 | ) 30 | 31 | # Extract the representation for the test dataset 32 | test_dataset = dataset_alignment.pop(test_dataset) 33 | features, test_models = basic_har(data_partition="predefined", **test_dataset, **kwargs) 34 | 35 | # Instantiate the root directory 36 | root = features.graph / f"chained-from-{sorted(dataset_alignment.keys())}" 37 | 38 | # Build up the list of models from aux datasets 39 | auxiliary_models = {train_test_split: [model] for train_test_split, model in test_models.items()} 40 | for model_name, model_kwargs in dataset_alignment.items(): 41 | aux_features, aux_models = basic_har(data_partition="deployable", **model_kwargs, **kwargs) 42 | for fi, mi in aux_models.items(): 43 | auxiliary_models[fi].append(mi) 44 | 45 | models = dict() 46 | 47 | # Perform the chaining 48 | train_test_splits = get_ancestral_metadata(features, "data_partitions")[data_partition] 49 | for train_test_split in randomised_order(train_test_splits): 50 | aux_probs = [features] + [model.predict_proba(features) for model in auxiliary_models[train_test_split]] 51 | prob_features = root.instantiate_orphan_node(func=np.concatenate, args=[aux_probs], kwargs=dict(axis=1)) 52 | 53 | models[train_test_split] = get_classifier( 54 | clf_name=clf_name, 55 | feature_node=prob_features, 56 | 
task_name=task_name, 57 | data_partition=data_partition, 58 | train_test_split=train_test_split, 59 | evaluate=evaluate, 60 | ) 61 | 62 | return features, models 63 | 64 | 65 | if __name__ == "__main__": 66 | har_chain(evaluate=True) 67 | -------------------------------------------------------------------------------- /src/features/statistical_features.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | from src.base import get_ancestral_metadata 4 | from src.features.statistical_features_impl import f_feat 5 | from src.features.statistical_features_impl import t_feat 6 | from src.functional.common import sorted_node_values 7 | 8 | __all__ = ["statistical_features"] 9 | 10 | 11 | def statistical_features(parent): 12 | """ 13 | There are two feature categories defined here: 14 | 1. Time domain 15 | 2. Frequency domain 16 | 17 | And these get mapped from transformed data from two sources: 18 | 1. Acceleration 19 | 2. Gyroscope 20 | 21 | Assuming these two sources have gone through some body/gravity transformations 22 | (eg from src.transformations.body_grav_filt) there will actually be several 23 | more sources, eg: 24 | 1. accel-body 25 | 2. accel-body-jerk 26 | 3. accel-body-jerk 27 | 4. accel-grav 28 | 5. gyro-body 29 | 6. gyro-body-jerk 30 | 7. gyro-body-jerk 31 | 32 | With more data sources this list will grows quickly. 33 | 34 | The feature types (time and frequency domain) are mapped to the transformed 35 | sources in a particular way. For example, the frequency domain features are 36 | *not* calculated on the gravity data sources. The loop below iterates through 37 | all of the outputs of the previous node in the graph, and the logic within 38 | the loop manages the correct mapping of functions to sources. 39 | 40 | Consult with the dataset table (tables/datasets.md) and see anguita2013 for 41 | details. 
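    Concretely, every windowed source key below gains a `-feat='td'` node, and a
    `-feat='fd'` node is added for all streams except the accelerometer gravity
    component (gyroscope and magnetometer streams receive both feature types).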
42 | """ 43 | 44 | root = parent / "statistical_features" 45 | 46 | fs = get_ancestral_metadata(root, "fs") 47 | 48 | accel_key = "mod='accel'" 49 | gyro_key = "mod='gyro'" 50 | mag_key = "mod='mag'" 51 | 52 | for key, node in parent.outputs.items(): 53 | key_td = f"{key}-feat='td'" 54 | key_fd = f"{key}-feat='fd'" 55 | 56 | t_kwargs = dict(data=node) 57 | f_kwargs = dict(data=node, fs=fs) 58 | 59 | if accel_key in key: 60 | root.instantiate_node(key=key_td, func=t_feat, kwargs=t_kwargs, backend="numpy") 61 | if "grav" not in key: 62 | root.instantiate_node(key=key_fd, func=f_feat, kwargs=f_kwargs, backend="numpy") 63 | if gyro_key in key or mag_key in key: 64 | root.instantiate_node(key=key_td, func=t_feat, kwargs=t_kwargs, backend="numpy") 65 | root.instantiate_node(key=key_fd, func=f_feat, kwargs=f_kwargs, backend="numpy") 66 | 67 | return root.instantiate_node( 68 | key="features", func=np.concatenate, args=[sorted_node_values(root.outputs)], kwargs=dict(axis=1) 69 | ) 70 | -------------------------------------------------------------------------------- /src/transformers/resample.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from scipy import signal 3 | 4 | from src.base import get_ancestral_metadata 5 | from src.utils.decorators import PartitionByTrial 6 | 7 | __all__ = [ 8 | "resample", 9 | ] 10 | 11 | 12 | def resample_data(index, data, fs_old, fs_new): 13 | if fs_old == fs_new: 14 | return data 15 | 16 | n_samples = int(data.shape[0] * fs_new / fs_old) 17 | 18 | return signal.resample(data, n_samples, axis=0) 19 | 20 | 21 | def align_metadata(t_old, t_new): 22 | assert np.all((t_new[1:] - t_new[:-1]) > 0) 23 | assert np.all((t_old[1:] - t_old[:-1]) > 0) 24 | assert t_old[-1] == t_new[-1] 25 | assert t_old[0] == t_new[0] 26 | 27 | inds = [0] 28 | 29 | for ti, target in enumerate(t_new[1:], start=1): 30 | best, best_ind, last_ind = np.inf, None, inds[-1] 31 | for ii in range(last_ind, len(t_old)): 32 | diff_ii = abs(target - t_old[ii]) 33 | is_best = diff_ii < best 34 | if is_best: 35 | best, best_ind = diff_ii, ii 36 | if (not is_best) or (ii == (len(t_old) - 1)): 37 | inds.append(best_ind) 38 | break 39 | 40 | inds = np.asarray(inds) 41 | 42 | return inds 43 | 44 | 45 | def resample_metadata(index, data, fs_old, fs_new, is_index): 46 | if fs_old == fs_new: 47 | return data 48 | 49 | n_samples = int(data.shape[0] * fs_new / fs_old) 50 | t_old = index.time.values 51 | t_new = np.linspace(t_old[0], t_old[-1], n_samples) 52 | 53 | inds = align_metadata(t_old, t_new) 54 | df = data.iloc[inds].copy() 55 | if is_index: 56 | df.loc[:, "time"] = t_new 57 | df = df.astype(data.dtypes) 58 | return df 59 | 60 | 61 | def resample(parent, fs_new): 62 | fs_old = get_ancestral_metadata(parent, "fs") 63 | 64 | root = parent / f"{fs_new}Hz" 65 | root.meta.insert("fs", fs_new) 66 | 67 | kwargs = dict(fs_old=fs_old, fs_new=fs_new) 68 | 69 | if fs_old != fs_new: 70 | # Only compute indexes and outputs if the sample rate has changed 71 | for key, node in parent.index.items(): 72 | root.instantiate_node( 73 | key=key, 74 | backend="pandas", 75 | func=PartitionByTrial(resample_metadata), 76 | kwargs=dict(index=parent.index["index"], data=node, is_index="index" in str(key), **kwargs), 77 | ) 78 | 79 | for key, node in parent.outputs.items(): 80 | root.instantiate_node( 81 | key=key, 82 | func=PartitionByTrial(resample_data), 83 | kwargs=dict(index=parent.index["index"], data=node, **kwargs), 84 | ) 85 | 86 | return root 87 | 
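# A worked illustration (values chosen purely for illustration) of the
# nearest-sample alignment performed by `align_metadata` above:
#
#   t_old = np.array([0.0, 0.1, 0.2, 0.3, 0.4])   # original timestamps
#   t_new = np.linspace(0.0, 0.4, 3)               # resampled grid: [0.0, 0.2, 0.4]
#   align_metadata(t_old, t_new)                   # -> array([0, 2, 4])
#
# `resample_metadata` uses these indices to carry the index and label metadata
# through a change of sampling rate and, for the index itself, rewrites its
# `time` column onto the new grid.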
-------------------------------------------------------------------------------- /src/models/ensembles.py: -------------------------------------------------------------------------------- 1 | from typing import Dict 2 | from typing import List 3 | from typing import Optional 4 | from typing import Sized 5 | from typing import Union 6 | 7 | import numpy as np 8 | from sklearn.base import BaseEstimator 9 | from sklearn.preprocessing import LabelEncoder 10 | 11 | 12 | class PrefittedVotingClassifier(BaseEstimator): 13 | def __init__( 14 | self, 15 | estimators: List[Union[BaseEstimator]], 16 | voting: str = "soft", 17 | weights: Optional[Sized] = None, 18 | verbose: bool = False, 19 | strict: bool = True, 20 | ): 21 | assert weights is None or len(weights) == len(estimators) 22 | 23 | self.estimators = estimators 24 | self.voting = voting 25 | self.weights = weights 26 | self.verbose = verbose 27 | self.strict = strict 28 | self.le_ = None 29 | self.classes_ = None 30 | 31 | def transform(self, X): 32 | weights = self.weights 33 | if weights is None: 34 | weights = np.ones(len(self.estimators)) / len(self.estimators) 35 | return [est.predict_proba(X) * ww for ww, (_, est) in zip(weights, self.estimators)] 36 | 37 | def predict_proba(self, X): 38 | return sum(self.transform(X)) 39 | 40 | def predict(self, X): 41 | probs = self.predict_proba(X) 42 | inds = np.argmax(probs, axis=1) 43 | return self.classes_[inds] 44 | 45 | def fit(self, X, y, sample_weight=None): 46 | self.le_ = LabelEncoder().fit(y) 47 | self.classes_ = self.le_.classes_ 48 | for name, est in self.estimators: 49 | if self.strict: 50 | assert np.all( 51 | est.classes_ == self.classes_ 52 | ), f"Model classes ({self.classes_}) not aligned with {name}: {est.classes_=}" 53 | return self 54 | 55 | def score(self, X, y): 56 | return np.mean(self.predict(X) == y) 57 | 58 | 59 | class ZeroShotVotingClassifier(PrefittedVotingClassifier): 60 | def __init__( 61 | self, 62 | estimators: List[Union[BaseEstimator]], 63 | label_alignment: Dict[str, str], 64 | voting: str = "soft", 65 | weights: Optional[Sized] = None, 66 | verbose: bool = False, 67 | ): 68 | super().__init__(estimators=estimators, voting=voting, weights=weights, verbose=verbose, strict=False) 69 | self.label_alignment = label_alignment 70 | 71 | def predict_proba(self, X): 72 | out = np.zeros((X.shape[0], self.classes_.shape[0])) 73 | self_lookup = dict(zip(self.classes_, range(len(self.classes_)))) 74 | for (_, estimator), transformed in zip(self.estimators, self.transform(X)): 75 | for fi, (name, col) in enumerate(zip(estimator.classes_, transformed.T)): 76 | out[:, self_lookup[self.label_alignment[name]]] += col 77 | return out 78 | 79 | def predict(self, X): 80 | probs = self.predict_proba(X) 81 | inds = np.argmax(probs, axis=1) 82 | return self.classes_[inds] 83 | -------------------------------------------------------------------------------- /src/datasets/uschad.py: -------------------------------------------------------------------------------- 1 | from os.path import join 2 | 3 | import numpy as np 4 | import pandas as pd 5 | from scipy.io import loadmat 6 | from tqdm import trange 7 | 8 | from src.datasets.base import Dataset 9 | from src.utils.decorators import fold_decorator 10 | from src.utils.decorators import index_decorator 11 | from src.utils.decorators import label_decorator 12 | 13 | __all__ = ["uschad"] 14 | 15 | 16 | class uschad(Dataset): 17 | def __init__(self): 18 | super(uschad, self).__init__(name=self.__class__.__name__) 19 | 20 | @label_decorator 
21 | def build_label(self, task, *args, **kwargs): 22 | def callback(ii, sub_id, act_id, trial_id, data): 23 | return np.zeros((data.shape[0], 1)) + act_id 24 | 25 | return ( 26 | self.meta.inv_lookup[task], 27 | uschad_iterator(self.unzip_path, callback=callback, desc=f"{self.identifier} Labels"), 28 | ) 29 | 30 | @fold_decorator 31 | def build_predefined(self, *args, **kwargs): 32 | def callback(ii, sub_id, act_id, trial_id, data): 33 | return np.tile(["train", "test"][sub_id > 10], (data.shape[0], 1)) 34 | 35 | return uschad_iterator(self.unzip_path, callback=callback, desc=f"{self.identifier} Folds") 36 | 37 | @index_decorator 38 | def build_index(self, *args, **kwargs): 39 | def callback(ii, sub_id, act_id, trial_id, data): 40 | return np.c_[ 41 | np.zeros((data.shape[0], 1)) + sub_id, 42 | np.zeros((data.shape[0], 1)) + ii, 43 | np.arange(data.shape[0]) / self.meta["fs"], 44 | ] 45 | 46 | return uschad_iterator( 47 | self.unzip_path, callback=callback, columns=["subject", "trial", "time"], desc=f"{self.identifier} Index", 48 | ) 49 | 50 | def build_data(self, loc, mod, *args, **kwargs): 51 | cols = dict(accel=[0, 1, 2], gyro=[3, 4, 5])[mod] 52 | 53 | def callback(ii, sub_id, act_id, trial_id, data): 54 | return data[:, cols] 55 | 56 | scale = dict(accel=1.0, gyro=2 * np.pi / 360) 57 | 58 | data = uschad_iterator(self.unzip_path, callback=callback, desc=f"Data ({loc}-{mod})") 59 | 60 | return data.values * scale[mod] 61 | 62 | 63 | def uschad_iterator(path, columns=None, cols=None, callback=None, desc=None): 64 | data_list = [] 65 | 66 | ii = 0 67 | 68 | for sub_id in trange(1, 14 + 1, desc=desc): 69 | for act_id in range(1, 12 + 1): 70 | for trail_id in range(1, 5 + 1): 71 | fname = join(path, f"Subject{sub_id}", f"a{act_id}t{trail_id}.mat") 72 | 73 | data = loadmat(fname)["sensor_readings"] 74 | 75 | if callback: 76 | data = callback(ii, sub_id, act_id, trail_id, data) 77 | elif cols: 78 | data = data[cols] 79 | else: 80 | raise ValueError 81 | 82 | data_list.extend(data) 83 | ii += 1 84 | 85 | df = pd.DataFrame(data_list) 86 | if columns: 87 | df.columns = columns 88 | return df 89 | -------------------------------------------------------------------------------- /har_ensemble_avg.py: -------------------------------------------------------------------------------- 1 | from mldb import NodeWrapper 2 | 3 | from src.base import ExecutionGraph 4 | from src.base import get_ancestral_metadata 5 | from src.models.base import BasicScorer 6 | from src.models.base import ClassifierWrapper 7 | from src.models.ensembles import PrefittedVotingClassifier 8 | from src.motifs.features import get_features 9 | from src.motifs.features import get_windowed_wearables 10 | from src.motifs.models import get_classifier 11 | from src.utils.loaders import dataset_importer 12 | from src.utils.misc import randomised_order 13 | 14 | 15 | def ensemble_classifier( 16 | task_name: str, 17 | feat_name: str, 18 | data_partition: str, 19 | train_test_split: str, 20 | windowed_data: NodeWrapper, 21 | clf_names, 22 | evaluate=False, 23 | ): 24 | features = get_features(feat_name=feat_name, windowed_data=windowed_data) 25 | 26 | graph: ExecutionGraph = features.graph / f"ensemble-over={sorted(clf_names)}" / task_name / train_test_split 27 | 28 | estimators = list() 29 | for clf_name in randomised_order(clf_names): 30 | estimators.append( 31 | [ 32 | f"{clf_name=}", 33 | get_classifier( 34 | feature_node=features, 35 | clf_name=clf_name, 36 | task_name=task_name, 37 | data_partition=data_partition, 38 | 
train_test_split=train_test_split, 39 | ).model, 40 | ] 41 | ) 42 | 43 | estimator = graph.instantiate_node( 44 | key=f"PrefittedVotingClassifier-{feat_name}".lower(), 45 | func=PrefittedVotingClassifier, 46 | kwargs=dict(estimators=estimators, voting="soft", verbose=10), 47 | ) 48 | 49 | model = ClassifierWrapper( 50 | parent=graph, 51 | estimator=estimator, 52 | features=features, 53 | task=features.graph[task_name], 54 | split=graph.get_split_series(data_partition=data_partition, train_test_split=train_test_split), 55 | scorer=BasicScorer(), 56 | evaluate=evaluate, 57 | ) 58 | 59 | return model 60 | 61 | 62 | def basic_ensemble( 63 | dataset_name="anguita2013", 64 | modality="all", 65 | location="all", 66 | task_name="har", 67 | feat_name="ecdf", 68 | data_partition="predefined", 69 | fs_new=33, 70 | win_len=3, 71 | win_inc=1, 72 | ): 73 | dataset = dataset_importer(dataset_name) 74 | 75 | windowed_data = get_windowed_wearables( 76 | dataset=dataset, modality=modality, location=location, fs_new=fs_new, win_len=win_len, win_inc=win_inc 77 | ) 78 | 79 | models = dict() 80 | 81 | train_test_splits = get_ancestral_metadata(windowed_data, "data_partitions")[data_partition] 82 | for train_test_split in randomised_order(train_test_splits): 83 | models[train_test_split] = ensemble_classifier( 84 | feat_name=feat_name, 85 | task_name=task_name, 86 | data_partition=data_partition, 87 | windowed_data=windowed_data, 88 | train_test_split=train_test_split, 89 | clf_names=["sgd", "lr", "rf"], 90 | evaluate=True, 91 | ) 92 | 93 | return models 94 | 95 | 96 | if __name__ == "__main__": 97 | basic_ensemble() 98 | -------------------------------------------------------------------------------- /src/datasets/anguita2013.py: -------------------------------------------------------------------------------- 1 | from os.path import join 2 | 3 | import numpy as np 4 | import pandas as pd 5 | 6 | from src.datasets.base import Dataset 7 | from src.utils.decorators import fold_decorator 8 | from src.utils.decorators import index_decorator 9 | from src.utils.decorators import label_decorator 10 | from src.utils.loaders import load_csv_data 11 | 12 | __all__ = ["anguita2013"] 13 | 14 | WIN_LEN = 128 15 | 16 | 17 | class anguita2013(Dataset): 18 | def __init__(self): 19 | super(anguita2013, self).__init__(name=self.__class__.__name__, unzip_path=lambda s: s.replace("%20", " ")) 20 | 21 | @label_decorator 22 | def build_label(self, task, *args, **kwargs): 23 | labels = [] 24 | for fold in ("train", "test"): 25 | fold_labels = load_csv_data(join(self.unzip_path, fold, f"y_{fold}.txt")) 26 | labels.extend([l for l in fold_labels for _ in range(WIN_LEN)]) 27 | return self.meta.inv_lookup[task], pd.DataFrame(dict(labels=labels)) 28 | 29 | @fold_decorator 30 | def build_predefined(self, *args, **kwargs): 31 | fold = [] 32 | fold.extend( 33 | ["train" for _ in load_csv_data(join(self.unzip_path, "train", "y_train.txt")) for _ in range(WIN_LEN)] 34 | ) 35 | fold.extend( 36 | ["test" for _ in load_csv_data(join(self.unzip_path, "test", "y_test.txt")) for _ in range(WIN_LEN)] 37 | ) 38 | return pd.DataFrame(fold) 39 | 40 | @index_decorator 41 | def build_index(self, *args, **kwargs): 42 | sub = [] 43 | for fold in ("train", "test"): 44 | sub.extend(load_csv_data(join(self.unzip_path, fold, f"subject_{fold}.txt"))) 45 | index = pd.DataFrame( 46 | dict( 47 | subject=[si for si in sub for _ in range(WIN_LEN)], 48 | trial=build_seq_list(subs=sub, win_len=WIN_LEN), 49 | time=build_time(subs=sub, win_len=WIN_LEN, 
fs=float(self.meta.meta["fs"])), 50 | ) 51 | ) 52 | return index 53 | 54 | def build_data(self, loc, mod, *args, **kwargs): 55 | x_data = [] 56 | y_data = [] 57 | z_data = [] 58 | modality = dict(accel="acc", gyro="gyro")[mod] 59 | for fold in ("train", "test"): 60 | for l, d in zip((x_data, y_data, z_data), ("x", "y", "z")): 61 | ty = ["body", "total"][modality in {"accel", "acc"}] 62 | acc = load_csv_data( 63 | join(self.unzip_path, fold, "Inertial Signals", f"{ty}_{modality}_{d}_{fold}.txt"), astype="np", 64 | ) 65 | l.extend(acc.ravel().tolist()) 66 | return np.c_[x_data, y_data, z_data] 67 | 68 | 69 | def build_time(subs, win_len, fs): 70 | win = np.arange(win_len, dtype=float) / fs 71 | inc = win_len / fs 72 | t = [] 73 | prev_sub = subs[0] 74 | for curr_sub in subs: 75 | if curr_sub != prev_sub: 76 | win = np.arange(win_len, dtype=float) / fs 77 | t.extend(win) 78 | win += inc 79 | prev_sub = curr_sub 80 | return t 81 | 82 | 83 | def build_seq_list(subs, win_len): 84 | seq = [] 85 | si = 0 86 | last_sub = subs[0] 87 | for prev_sub in subs: 88 | if prev_sub != last_sub: 89 | si += 1 90 | seq.extend([si] * win_len) 91 | last_sub = prev_sub 92 | return seq 93 | -------------------------------------------------------------------------------- /make_tables.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | import os 3 | 4 | from src.meta import DatasetMeta 5 | from src.utils.loaders import build_path 6 | from src.utils.loaders import get_yaml_file_list 7 | from src.utils.loaders import load_metadata 8 | from src.utils.loaders import load_yaml 9 | 10 | 11 | def make_links(links, desc="Link"): 12 | return ", ".join("[{} {}]({})".format(desc, ii, url) for ii, url in enumerate(links, start=1)) 13 | 14 | 15 | def make_dataset_row(dataset): 16 | # modalities = sorted(set([mn for ln, lm in self.meta['locations'].items() for mn, mv in lm.items() if mv])) 17 | 18 | data = [ 19 | dataset.meta["author"], 20 | dataset.meta["paper_name"], 21 | dataset.name, 22 | make_links(links=dataset.meta["description_urls"], desc="Link"), 23 | dataset.meta.get("missing", ""), 24 | make_links(links=dataset.meta["paper_urls"], desc="Link"), 25 | dataset.meta["year"], 26 | dataset.meta["fs"], 27 | ", ".join(dataset.meta["locations"].keys()), 28 | ", ".join(dataset.meta["modalities"]), 29 | dataset.meta["num_subjects"], 30 | dataset.meta["num_activities"], 31 | ", ".join(dataset.meta["activities"].keys()), 32 | ] 33 | 34 | return ( 35 | ( 36 | f"| First Author | Paper Name | Dataset Name | Description | Missing data " 37 | f"| Download Links | Year | Sampling Rate | Device Locations | Device Modalities " 38 | f"| Num Subjects | Num Activities | Activities | " 39 | ), 40 | "| {} |".format(" | ".join(["-----"] * len(data))), 41 | "| {} |".format(" | ".join(map(str, data))), 42 | ) 43 | 44 | 45 | def load_datasets_metadata(): 46 | return [load_yaml(path) for path in get_yaml_file_list("datasets", stem=False)] 47 | 48 | 49 | def main(): 50 | # Ensure the paths exist 51 | root = build_path("tables") 52 | if not os.path.exists(root): 53 | os.makedirs(root) 54 | 55 | # Current list of datasets 56 | lines = [] 57 | datasets = load_datasets_metadata() 58 | for dataset in datasets: 59 | dataset = DatasetMeta(dataset) 60 | head, space, line = make_dataset_row(dataset) 61 | lines.append(line) 62 | with open(build_path("tables", "datasets.md"), "w") as fil: 63 | fil.write("{}\n".format(head)) 64 | fil.write("{}\n".format(space)) 65 | for line in lines: 66 | 
fil.write("{}\n".format(line)) 67 | 68 | # Iterate over the other data tables 69 | dims = [ 70 | "activities", 71 | "features", 72 | "locations", 73 | "models", 74 | "pipelines", 75 | "transformers", 76 | "visualisations", 77 | ] 78 | 79 | for dim in dims: 80 | with open(build_path("tables", f"{dim}.md"), "w") as fil: 81 | data = load_metadata(f"{dim}.yaml") 82 | fil.write(f"| Index | {dim[0].upper()}{dim[1:].lower()} | value | \n") 83 | fil.write(f"| ----- | ----- | ----- | \n") 84 | if isinstance(data, dict): 85 | for ki, (key, value) in enumerate(data.items()): 86 | if isinstance(value, dict) and "description" in value: 87 | value = make_links(value["description"]) 88 | fil.write(f"| {ki} | {key} | {value} | \n") 89 | 90 | 91 | if __name__ == "__main__": 92 | main() 93 | -------------------------------------------------------------------------------- /src/datasets/pamap2.py: -------------------------------------------------------------------------------- 1 | from collections import defaultdict 2 | from os.path import join 3 | 4 | import numpy as np 5 | import pandas as pd 6 | from tqdm import tqdm 7 | 8 | from src.datasets.base import Dataset 9 | from src.utils.decorators import fold_decorator 10 | from src.utils.decorators import index_decorator 11 | from src.utils.decorators import label_decorator 12 | 13 | __all__ = [ 14 | "pamap2", 15 | ] 16 | 17 | 18 | class pamap2(Dataset): 19 | def __init__(self): 20 | super(pamap2, self).__init__(name=self.__class__.__name__, unzip_path=lambda p: join(p, "Protocol")) 21 | 22 | @label_decorator 23 | def build_label(self, task, *args, **kwargs): 24 | df = pd.DataFrame(iter_pamap2_subs(path=self.unzip_path, cols=[1], desc=f"{self.identifier} Labels")) 25 | 26 | return self.meta.inv_lookup[task], df 27 | 28 | @fold_decorator 29 | def build_predefined(self, *args, **kwargs): 30 | def folder(sid, data): 31 | return np.zeros(data.shape[0]) + sid 32 | 33 | df = iter_pamap2_subs( 34 | path=self.unzip_path, cols=[1], desc=f"{self.identifier} Folds", callback=folder, columns=["fold"], 35 | ).astype(int) 36 | 37 | lookup = { 38 | 1: "train", 39 | 2: "train", 40 | 3: "test", 41 | 4: "train", 42 | 5: "train", 43 | 6: "test", 44 | 7: "train", 45 | 8: "train", 46 | 9: "test", 47 | } 48 | 49 | return df.assign(fold_0=df["fold"].apply(lookup.__getitem__))[["fold_0"]].astype("category") 50 | 51 | @index_decorator 52 | def build_index(self, *args, **kwargs): 53 | def indexer(sid, data): 54 | subject = np.zeros(data.shape[0])[:, None] + sid 55 | trial = np.zeros(data.shape[0])[:, None] + sid 56 | return np.concatenate((subject, trial, data), axis=1) 57 | 58 | df = iter_pamap2_subs( 59 | path=self.unzip_path, 60 | cols=[0], 61 | desc=f"{self.identifier} Index", 62 | callback=indexer, 63 | columns=["subject", "trial", "time"], 64 | ).astype(dict(subject=int, trial=int, time=float)) 65 | 66 | return df 67 | 68 | def build_data(self, loc, mod, *args, **kwargs): 69 | offset = dict(wrist=3, chest=20, ankle=37)[loc] + dict(accel=1, gyro=7, mag=10)[mod] 70 | 71 | df = iter_pamap2_subs( 72 | path=self.unzip_path, 73 | cols=list(range(offset, offset + 3)), 74 | desc=f"Parsing {mod} at {loc}", 75 | columns=["x", "y", "z"], 76 | ).astype(float) 77 | 78 | scale = dict(accel=9.80665, gyro=np.pi * 2.0, mag=1.0)[mod] 79 | 80 | return df.values / scale 81 | 82 | 83 | def iter_pamap2_subs(path, cols, desc, columns=None, callback=None, n_subjects=9): 84 | data = [] 85 | 86 | for sid in tqdm(range(1, n_subjects + 1), desc=desc): 87 | datum = pd.read_csv(join(path, 
f"subject10{sid}.dat"), delim_whitespace=True, header=None, usecols=cols).fillna( 88 | method="ffill" 89 | ) 90 | assert np.isfinite(datum.values).all() 91 | if callback: 92 | data.extend(callback(sid, datum.values)) 93 | else: 94 | data.extend(datum.values) 95 | df = pd.DataFrame(data) 96 | if columns: 97 | df.columns = columns 98 | return df 99 | -------------------------------------------------------------------------------- /src/motifs/models.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from mldb import NodeWrapper 3 | from sklearn.decomposition import PCA 4 | from sklearn.ensemble import RandomForestClassifier 5 | from sklearn.linear_model import LogisticRegressionCV 6 | from sklearn.linear_model import SGDClassifier 7 | from sklearn.pipeline import Pipeline 8 | from sklearn.preprocessing import MinMaxScaler 9 | from sklearn.preprocessing import Normalizer 10 | from sklearn.preprocessing import StandardScaler 11 | from sklearn.svm import SVC 12 | 13 | from src.base import ExecutionGraph 14 | from src.base import get_ancestral_metadata 15 | from src.models.base import BasicScorer 16 | from src.models.base import ClassifierWrapper 17 | 18 | 19 | def make_classifier_node(root: ExecutionGraph, features: NodeWrapper, clf_name: str): 20 | if clf_name == "sgd": 21 | steps = [ 22 | # ('imputation', SimpleImputer()), 23 | ("scaling", StandardScaler()), 24 | ("pca", PCA(n_components=0.95)), 25 | ("clf", SGDClassifier(loss="log")), 26 | ] 27 | 28 | param_grid = dict(clf__alpha=np.logspace(-5, 5, 11)) 29 | 30 | elif clf_name == "lr": 31 | steps = [ 32 | # ('imputation', SimpleImputer()), 33 | ("scaling", StandardScaler()), 34 | ("pca", PCA(n_components=0.95)), 35 | ("clf", LogisticRegressionCV(max_iter=1000)), 36 | ] 37 | 38 | param_grid = dict(clf__penalty=["l2"], clf__max_iter=[100]) 39 | 40 | elif clf_name == "rf": 41 | steps = [ 42 | # ('imputation', SimpleImputer()), 43 | ("scaling", StandardScaler()), 44 | ("pca", PCA(n_components=0.95)), 45 | ("clf", RandomForestClassifier()), 46 | ] 47 | 48 | param_grid = dict(clf__n_estimators=[10, 30, 100]) 49 | 50 | elif clf_name == "svc": 51 | steps = [ 52 | # ('imputation', SimpleImputer()), 53 | ("scaling", Normalizer()), 54 | ("pca", PCA(n_components=0.95)), 55 | ("minmax", MinMaxScaler(feature_range=(-1, 1))), 56 | ("clf", SVC(kernel="rbf", probability=True)), 57 | ] 58 | 59 | param_grid = dict(clf__gamma=["scale", "auto"], clf__C=np.logspace(-3, 3, 7),) 60 | 61 | else: 62 | raise ValueError(f"Classifier {clf_name} selected, but this is not supported.") 63 | 64 | key = ",".join([str(nn) for _, nn in steps]).lower() 65 | estimator = root.instantiate_node(key=key, func=Pipeline, kwargs=dict(steps=steps)) 66 | 67 | return estimator, param_grid 68 | 69 | 70 | def get_classifier( 71 | feature_node: NodeWrapper, 72 | clf_name: str, 73 | task_name: str, 74 | data_partition: str, 75 | train_test_split: str, 76 | evaluate: bool = False, 77 | ) -> ClassifierWrapper: 78 | # Value checks 79 | assert task_name in get_ancestral_metadata(feature_node, "tasks") 80 | assert data_partition in get_ancestral_metadata(feature_node, "data_partitions") 81 | assert train_test_split in get_ancestral_metadata(feature_node, "data_partitions")[data_partition] 82 | 83 | root: ExecutionGraph = feature_node.graph / task_name / data_partition / train_test_split 84 | 85 | # Instantiate the classifier 86 | estimator, param_grid = make_classifier_node(root=root, features=feature_node, clf_name=clf_name) 87 | 88 | # 
Instantiate the classifier 89 | model = ClassifierWrapper( 90 | parent=root, 91 | estimator=estimator, 92 | param_grid=param_grid, 93 | features=feature_node, 94 | task=feature_node.graph[task_name], 95 | split=root.get_split_series(data_partition=data_partition, train_test_split=train_test_split), 96 | scorer=BasicScorer(), 97 | evaluate=evaluate, 98 | ) 99 | 100 | return model 101 | -------------------------------------------------------------------------------- /src/evaluation/classification.py: -------------------------------------------------------------------------------- 1 | from collections import Counter 2 | 3 | import numpy as np 4 | from loguru import logger 5 | from scipy.special import logsumexp 6 | from sklearn import metrics 7 | 8 | 9 | __all__ = ["evaluate_data_split"] 10 | 11 | 12 | def evaluate_data_split(split, targets, estimator, prob_predictions): 13 | res = dict() 14 | predictions = estimator.classes_[prob_predictions.argmax(axis=1)] 15 | for tr_val_te in split.unique(): 16 | inds = split == tr_val_te 17 | yy, pp, ss = targets[inds], predictions[inds], prob_predictions[inds] 18 | res[tr_val_te] = dict() 19 | res[tr_val_te] = _classification_perf_metrics( 20 | model=estimator, labels=np.asarray(yy).ravel(), predictions=np.asarray(pp), scores=np.asarray(ss), 21 | ) 22 | if hasattr(estimator, "cv_results_"): 23 | res["xval"] = estimator.cv_results_ 24 | return res 25 | 26 | 27 | def _get_class_names(model): 28 | if hasattr(model, "classes_"): 29 | return model.classes_ 30 | logger.exception(TypeError(f"The classes member cannot be extracted from this object: {model}")) 31 | 32 | 33 | def _classification_perf_metrics(labels, model, predictions, scores): 34 | cols = _get_class_names(model) 35 | 36 | def score_metrics(name, func, labels_, scores_, **kwargs): 37 | unique_labels = np.unique(labels_) 38 | lookup = dict(zip(unique_labels, range(unique_labels.shape[0]))) 39 | scores_ = scores_[:, unique_labels] 40 | if scores_.shape[1] == 1: 41 | return 1.0 42 | scores_ /= scores_.sum(axis=1, keepdims=True) 43 | if scores_.shape[1] == 2: 44 | scores_ = scores_[:, 1] 45 | return { 46 | f"{name}_{average}": func( 47 | y_true=np.asarray([lookup[label] for label in labels_]), y_score=scores_, average=average, **kwargs, 48 | ) 49 | for average in ("macro", "weighted") 50 | } 51 | 52 | def prediction_metrics(name, func): 53 | return { 54 | f"{name}_{average}": func(y_true=labels, y_pred=predictions, average=average) 55 | for average in ("macro", "micro", "weighted") 56 | } 57 | 58 | label_lookup = dict(zip(model.classes_, range(model.classes_.shape[0]))) 59 | probs = np.exp(scores - logsumexp(scores, axis=1, keepdims=True)) 60 | label_ind = np.asarray([label_lookup[ll] for ll in labels]) 61 | 62 | label_counts = Counter(labels) 63 | 64 | res = dict( 65 | class_names=cols, 66 | num_instances=len(labels), 67 | label_counts=dict(label_counts.items()), 68 | class_prior={kk: vv / len(labels) for kk, vv in label_counts.items()}, 69 | accuracy=metrics.accuracy_score(labels, predictions), 70 | confusion_matrix=metrics.confusion_matrix(labels, predictions), 71 | **score_metrics("auroc_ovo", metrics.roc_auc_score, label_ind, probs, multi_class="ovo"), 72 | **score_metrics("auroc_ovr", metrics.roc_auc_score, label_ind, probs, multi_class="ovr"), 73 | **prediction_metrics("f1", metrics.f1_score), 74 | **prediction_metrics("precision", metrics.precision_score), 75 | **prediction_metrics("recall", metrics.recall_score), 76 | per_class_metrics=dict(), 77 | ) 78 | 79 | if len(cols) > 2: 80 | for col 
in np.unique(labels): 81 | yi = label_lookup[col] 82 | yy_i = labels == col 83 | y_hat_i = predictions == col 84 | res["per_class_metrics"][col] = dict( 85 | index=yi, 86 | label=col, 87 | count=yy_i.sum(), 88 | class_prior=yy_i.mean(), 89 | accuracy=metrics.accuracy_score(yy_i, y_hat_i), 90 | auroc=metrics.roc_auc_score(yy_i, y_hat_i), 91 | f1=metrics.f1_score(yy_i, y_hat_i), 92 | precision=metrics.precision_score(yy_i, y_hat_i), 93 | recall=metrics.recall_score(yy_i, y_hat_i), 94 | confusion_matrix=metrics.confusion_matrix(yy_i, y_hat_i), 95 | ) 96 | 97 | return res 98 | -------------------------------------------------------------------------------- /src/meta.py: -------------------------------------------------------------------------------- 1 | from pathlib import Path 2 | 3 | from loguru import logger 4 | 5 | from src.utils.loaders import get_env 6 | from src.utils.loaders import load_yaml 7 | from src.utils.loaders import metadata_path 8 | 9 | 10 | __all__ = [ 11 | "DatasetMeta", 12 | "BaseMeta", 13 | "HARMeta", 14 | "ModalityMeta", 15 | "DatasetMeta", 16 | ] 17 | 18 | 19 | class BaseMeta(object): 20 | def __init__(self, path, *args, **kwargs): 21 | self.path = Path(path) 22 | self.name = self.path.stem 23 | self.meta = dict() 24 | 25 | if path: 26 | try: 27 | meta = load_yaml(path) 28 | 29 | if meta is None: 30 | logger.info(f'The content metadata module "{self.name}" from {path} is empty. Assigning empty dict') 31 | meta = dict() 32 | else: 33 | if not isinstance(meta, dict): 34 | logger.warning(f"Metadata not of type dict loaded: {meta}") 35 | 36 | self.meta = meta 37 | 38 | except FileNotFoundError: 39 | # logger.warning(f'The metadata file for "{self.name}" was not found.') 40 | pass 41 | 42 | def __getitem__(self, item): 43 | if item not in self.meta: 44 | logger.exception(KeyError(f"{item} not found in {self.__class__.__name__}")) 45 | return self.meta[item] 46 | 47 | def __contains__(self, item): 48 | return item in self.meta 49 | 50 | def __repr__(self): 51 | return f"<{self.name} {self.meta.__repr__()}>" 52 | 53 | def keys(self): 54 | return self.meta.keys() 55 | 56 | def values(self): 57 | return self.meta.values() 58 | 59 | def items(self): 60 | return self.meta.items() 61 | 62 | def insert(self, key, value): 63 | assert key not in self.meta 64 | self.meta[key] = value 65 | 66 | 67 | """ 68 | Non-functional metadata 69 | """ 70 | 71 | 72 | class HARMeta(BaseMeta): 73 | def __init__(self, path, *args, **kwargs): 74 | super(HARMeta, self).__init__(path=metadata_path("tasks", "har.yaml"), *args, **kwargs) 75 | 76 | 77 | class LocalisationMeta(BaseMeta): 78 | def __init__(self, path, *args, **kwargs): 79 | super(LocalisationMeta, self).__init__(path=metadata_path("tasks", "localisation.yaml"), *args, **kwargs) 80 | 81 | 82 | class ModalityMeta(BaseMeta): 83 | def __init__(self, path, *args, **kwargs): 84 | super(ModalityMeta, self).__init__(name=metadata_path("modality.yaml"), *args, **kwargs) 85 | 86 | 87 | class DatasetMeta(BaseMeta): 88 | def __init__(self, path, *args, **kwargs): 89 | if isinstance(path, str): 90 | path = Path("metadata", "datasets", f"{path}.yaml") 91 | 92 | assert path.exists() 93 | 94 | super(DatasetMeta, self).__init__(path=path, *args, **kwargs) 95 | 96 | if "fs" not in self.meta: 97 | logger.exception(KeyError(f'The file {path} does not contain the key "fs"')) 98 | 99 | self.inv_lookup = dict() 100 | 101 | for task_name in self.meta["tasks"].keys(): 102 | task_label_file = metadata_path("tasks", f"{task_name}.yaml") 103 | task_labels = 
load_yaml(task_label_file) 104 | dataset_labels = self.meta["tasks"][task_name]["target_transform"] 105 | if not set(dataset_labels.keys()).issubset(set(task_labels)): 106 | logger.exception( 107 | ValueError( 108 | f"The following labels from dataset {path} are not accounted for in {task_label_file}: " 109 | f"{set(dataset_labels.keys()).difference(task_labels.keys())}" 110 | ) 111 | ) 112 | self.inv_lookup[task_name] = {dataset_labels[kk]: kk for kk, vv in dataset_labels.items()} 113 | 114 | @property 115 | def fs(self): 116 | return float(self.meta["fs"]) 117 | 118 | @property 119 | def zip_path(self): 120 | return get_env("ZIP_ROOT") / self.name 121 | -------------------------------------------------------------------------------- /har_zero.py: -------------------------------------------------------------------------------- 1 | from operator import itemgetter 2 | from typing import Dict 3 | from typing import List 4 | from typing import Tuple 5 | 6 | from mldb import NodeWrapper 7 | 8 | from har_basic import basic_har 9 | from src.base import ExecutionGraph 10 | from src.base import get_ancestral_metadata 11 | from src.models.base import BasicScorer 12 | from src.models.base import ClassifierWrapper 13 | from src.models.ensembles import ZeroShotVotingClassifier 14 | from src.utils.misc import randomised_order 15 | 16 | 17 | def make_zero_shot_classifier( 18 | feat_name, 19 | features, 20 | estimators: List[Tuple[str, NodeWrapper]], 21 | task_name, 22 | data_partition, 23 | train_test_split, 24 | label_alignment: Dict[str, str], 25 | evaluate: bool = True, 26 | ): 27 | clf_names = sorted(map(itemgetter(0), estimators)) 28 | 29 | graph: ExecutionGraph = features.graph / f"zero-shot-from={sorted(clf_names)}" / task_name / data_partition / train_test_split 30 | 31 | estimator = graph.instantiate_node( 32 | key=f"ZeroShotVotingClassifier-{feat_name}".lower(), 33 | func=ZeroShotVotingClassifier, 34 | kwargs=dict(estimators=estimators, voting="soft", verbose=10, label_alignment=label_alignment), 35 | ) 36 | 37 | model = ClassifierWrapper( 38 | parent=graph, 39 | estimator=estimator, 40 | features=features, 41 | task=features.graph[task_name], 42 | split=graph.get_split_series(data_partition=data_partition, train_test_split=train_test_split), 43 | scorer=BasicScorer(), 44 | evaluate=evaluate, 45 | ) 46 | 47 | return model 48 | 49 | 50 | def har_zero( 51 | fs_new: float = 33, 52 | win_len: float = 3, 53 | win_inc: float = 1, 54 | clf_name: str = "sgd", 55 | task_name: str = "har", 56 | dataset_partition: str = "predefined", 57 | feat_name: str = "statistical", 58 | evaluate: bool = False, 59 | ): 60 | kwargs = dict( 61 | fs_new=fs_new, win_len=win_len, win_inc=win_inc, task_name=task_name, feat_name=feat_name, clf_name=clf_name 62 | ) 63 | 64 | external_datasets = dict( 65 | pamap2=dict(dataset_name="pamap2", location="chest", modality="accel"), 66 | uschad=dict(dataset_name="uschad", location="waist", modality="accel"), 67 | ) 68 | 69 | test_dataset = dict(dataset_name="anguita2013", location="waist", modality="accel") 70 | 71 | label_alignment = dict( 72 | cycle="walk", 73 | elevator_down="stand", 74 | elevator_up="stand", 75 | iron="stand", 76 | jump="walk", 77 | other="walk", 78 | rope_jump="walk", 79 | run="walk", 80 | sit="sit", 81 | sleep="lie", 82 | stand="stand", 83 | vacuum="walk", 84 | walk="walk", 85 | walk_down="walk_down", 86 | walk_left="walk", 87 | walk_nordic="walk", 88 | walk_right="walk", 89 | walk_up="walk_up", 90 | lie="lie", 91 | ) 92 | 93 | features, test_models = 
basic_har(data_partition="predefined", **test_dataset, **kwargs) 94 | 95 | auxiliary_models = dict() 96 | for model_name, model_kwargs in external_datasets.items(): 97 | aux_features, aux_models = basic_har(data_partition="deployable", **model_kwargs, **kwargs) 98 | auxiliary_models[model_name] = aux_models 99 | 100 | models = dict() 101 | 102 | train_test_splits = get_ancestral_metadata(features, "data_partitions")[dataset_partition] 103 | for train_test_split in randomised_order(train_test_splits): 104 | models[train_test_split] = make_zero_shot_classifier( 105 | estimators=[(mn, mm[train_test_split].model) for mn, mm in auxiliary_models.items()], 106 | feat_name=feat_name, 107 | features=features, 108 | task_name=task_name, 109 | train_test_split=train_test_split, 110 | data_partition=dataset_partition, 111 | evaluate=evaluate, 112 | label_alignment=label_alignment, 113 | ) 114 | 115 | return models 116 | 117 | 118 | if __name__ == "__main__": 119 | har_zero(evaluate=True) 120 | -------------------------------------------------------------------------------- /src/datasets/base.py: -------------------------------------------------------------------------------- 1 | from os.path import basename 2 | from os.path import join 3 | from os.path import splitext 4 | 5 | import pandas as pd 6 | 7 | from src.base import ExecutionGraph 8 | from src.base import get_ancestral_metadata 9 | from src.meta import DatasetMeta 10 | 11 | __all__ = [ 12 | "Dataset", 13 | ] 14 | 15 | 16 | def validate_split_names(split_name, split_df, split_cols): 17 | meta_set = set(split_cols) 18 | df_set = set(split_df.columns) 19 | if not df_set.issubset(meta_set): 20 | raise ValueError( 21 | f"Split dataframe columns ({df_set}) not subset of metadata ({meta_set}) for the split type {split_name}." 
22 | ) 23 | 24 | 25 | class Dataset(ExecutionGraph): 26 | def __init__(self, name, *args, **kwargs): 27 | super(Dataset, self).__init__(name=f"datasets/{name}", meta=DatasetMeta(name)) 28 | 29 | def load_meta(*args, **kwargs): 30 | return self.meta.meta 31 | 32 | load_meta.__name__ = name 33 | 34 | metadata = self.instantiate_node(key=f"{name}-metadata", backend="yaml", func=load_meta, kwargs=dict()) 35 | 36 | zip_name = kwargs.get("unzip_path", lambda x: x)(splitext(basename(self.meta.meta["download_urls"][0]))[0]) 37 | self.unzip_path = join(self.meta.zip_path, splitext(zip_name)[0]) 38 | 39 | index = self.instantiate_node( 40 | key="index", func=self.build_index, backend="pandas", kwargs=dict(path=self.unzip_path, metatdata=metadata), 41 | ) 42 | 43 | # Build the indexes 44 | self.instantiate_node( 45 | key="predefined", 46 | func=self.build_predefined, 47 | backend="pandas", 48 | kwargs=dict(path=self.unzip_path, metatdata=metadata), 49 | ) 50 | 51 | data_partitions = get_ancestral_metadata(self, "data_partitions") 52 | 53 | self.instantiate_node( 54 | key="loso", 55 | func=self.build_loso, 56 | backend="pandas", 57 | kwargs=dict(index=index, columns=data_partitions["loso"]), 58 | ) 59 | 60 | self.instantiate_node( 61 | key="deployable", 62 | func=self.build_deployable, 63 | backend="pandas", 64 | kwargs=dict(index=index, columns=data_partitions["deployable"]), 65 | ) 66 | 67 | tasks = get_ancestral_metadata(self, "tasks") 68 | for task in tasks: 69 | self.instantiate_node( 70 | key=task, 71 | func=self.build_label, 72 | backend="pandas", 73 | kwargs=dict(path=self.unzip_path, task=task, inv_lookup=self.meta.inv_lookup[task], metatdata=metadata), 74 | ) 75 | 76 | # Build list of outputs 77 | for placement_modality in self.meta["sources"]: 78 | loc = placement_modality["loc"] 79 | mod = placement_modality["mod"] 80 | 81 | self.instantiate_node( 82 | key=f"{loc=}-{mod=}", 83 | func=self.build_data, 84 | backend="numpy", 85 | kwargs=dict(loc=loc, mod=mod, metadata=metadata), 86 | ) 87 | 88 | def build_label(self, *args, **kwargs): 89 | raise NotImplementedError 90 | 91 | def build_index(self, *args, **kwargs): 92 | raise NotImplementedError 93 | 94 | def build_data(self, loc, mod, *args, **kwargs): 95 | raise NotImplementedError 96 | 97 | def build_predefined(self, *args, **kwargs): 98 | raise NotImplementedError 99 | 100 | def build_deployable(self, index, columns): 101 | splits = pd.DataFrame({"fold_0": ["train"] * len(index)}, dtype="category") 102 | validate_split_names(split_name="deployable", split_df=splits, split_cols=columns) 103 | return splits 104 | 105 | def build_loso(self, index, columns): 106 | splits = pd.DataFrame( 107 | { 108 | f"fold_{ki}": index.trial.apply(lambda tt: ["train", "test"][tt == kk]) 109 | for ki, kk in enumerate(index.trial.unique()) 110 | } 111 | ) 112 | validate_split_names(split_name="loso", split_df=splits, split_cols=columns) 113 | return splits 114 | -------------------------------------------------------------------------------- /src/utils/loaders.py: -------------------------------------------------------------------------------- 1 | from os import environ 2 | from pathlib import Path 3 | 4 | import pandas as pd 5 | import yaml 6 | from dotenv import find_dotenv 7 | from dotenv import load_dotenv 8 | from loguru import logger 9 | 10 | 11 | __all__ = [ 12 | # Generic 13 | "load_csv_data", 14 | "load_metadata", 15 | "build_path", 16 | "get_yaml_file_list", 17 | "iter_dataset_paths", 18 | "iter_task_paths", 19 | "metadata_path", 20 | # Metadata 
loaders 21 | "load_task_metadata", 22 | "load_modality_metadata", 23 | # Module importers 24 | "dataset_importer", 25 | "transformer_importer", 26 | "feature_importer", 27 | "pipeline_importer", 28 | "model_importer", 29 | "visualisation_importer", 30 | "load_yaml", 31 | ] 32 | 33 | 34 | def get_env(key) -> Path: 35 | load_dotenv(find_dotenv()) 36 | return Path(environ[key]) 37 | 38 | 39 | # Root directory of the project 40 | def get_project_root() -> Path: 41 | return get_env("PROJECT_ROOT") 42 | 43 | 44 | # For building file structure 45 | def build_path(*args) -> Path: 46 | path = get_env("DATA_ROOT").joinpath(*args) 47 | return path 48 | 49 | 50 | def metadata_path(*args) -> Path: 51 | path = get_project_root() / "metadata" 52 | return path.joinpath(*args) 53 | 54 | 55 | # Generic CSV loader 56 | def load_csv_data(fname, astype="list"): 57 | df = pd.read_csv(fname, delim_whitespace=True, header=None) 58 | 59 | if astype in {"dataframe", "pandas", "pd"}: 60 | return df 61 | if astype in {"values", "np", "numpy"}: 62 | return df.values 63 | if astype == "list": 64 | return df.values.ravel().tolist() 65 | 66 | logger.exception(ValueError(f"Un-implemented type specification: {astype}")) 67 | 68 | 69 | # YAML file loaders 70 | def iter_files(path, suffix, stem=False): 71 | fil_iter = filter(lambda fil: fil.suffix == suffix, path.iterdir()) 72 | if stem: 73 | yield from map(lambda fil: fil.stem, fil_iter) 74 | yield from map(lambda fil: path / fil, fil_iter) 75 | 76 | 77 | def iter_dataset_paths(): 78 | return iter_files(path=metadata_path("datasets"), suffix=".yaml", stem=False) 79 | 80 | 81 | def iter_task_paths(): 82 | return iter_files(path=metadata_path("tasks"), suffix=".yaml", stem=False) 83 | 84 | 85 | def load_yaml(filename): 86 | with open(filename, "r") as fil: 87 | return yaml.load(fil, Loader=yaml.SafeLoader) 88 | 89 | 90 | # Metadata 91 | def load_metadata(*args): 92 | return load_yaml(metadata_path(*args)) 93 | 94 | 95 | def load_task_metadata(task_name): 96 | return load_metadata("task", f"{task_name}.yaml") 97 | 98 | 99 | # Dataset metadata 100 | def load_modality_metadata(): 101 | return load_metadata("modality.yaml") 102 | 103 | 104 | # 105 | 106 | 107 | def get_yaml_file_list(*args, stem=False): 108 | path = metadata_path(*args) 109 | fil_iter = iter_files(path=path, suffix=".yaml", stem=stem) 110 | return list(fil_iter) 111 | 112 | 113 | # Module importers 114 | 115 | 116 | def module_importer(module_path, class_name, *args, **kwargs): 117 | """ 118 | 119 | Args: 120 | module_path: 121 | class_name: 122 | *args: 123 | **kwargs: 124 | 125 | Returns: 126 | 127 | """ 128 | m = __import__(module_path, fromlist=[class_name]) 129 | c = getattr(m, class_name) 130 | return c(*args, **kwargs) 131 | 132 | 133 | def dataset_importer(class_name, *args, **kwargs): 134 | """ 135 | 136 | Args: 137 | class_name: 138 | *args: 139 | **kwargs: 140 | 141 | Returns: 142 | 143 | """ 144 | return module_importer(module_path="src.datasets", class_name=class_name, *args, **kwargs) 145 | 146 | 147 | def feature_importer(class_name, *args, **kwargs): 148 | """ 149 | 150 | Args: 151 | class_name: 152 | *args: 153 | **kwargs: 154 | 155 | Returns: 156 | 157 | """ 158 | return module_importer(module_path="src.features", class_name=class_name, *args, **kwargs) 159 | 160 | 161 | def transformer_importer(class_name, *args, **kwargs): 162 | """ 163 | 164 | Args: 165 | class_name: 166 | *args: 167 | **kwargs: 168 | 169 | Returns: 170 | 171 | """ 172 | return module_importer(module_path="src.transformers", 
class_name=class_name, *args, **kwargs) 173 | 174 | 175 | def pipeline_importer(class_name, *args, **kwargs): 176 | """ 177 | 178 | Args: 179 | class_name: 180 | *args: 181 | **kwargs: 182 | 183 | Returns: 184 | 185 | """ 186 | return module_importer(module_path="src.pipelines", class_name=class_name, *args, **kwargs) 187 | 188 | 189 | def model_importer(class_name, *args, **kwargs): 190 | """ 191 | 192 | Args: 193 | class_name: 194 | *args: 195 | **kwargs: 196 | 197 | Returns: 198 | 199 | """ 200 | return module_importer(module_path="src.models", class_name=class_name, *args, **kwargs) 201 | 202 | 203 | def visualisation_importer(class_name, *args, **kwargs): 204 | """ 205 | 206 | Args: 207 | class_name: 208 | *args: 209 | **kwargs: 210 | 211 | Returns: 212 | 213 | """ 214 | return module_importer(module_path="src.visualisations", class_name=class_name, *args, **kwargs) 215 | -------------------------------------------------------------------------------- /src/features/statistical_features_impl.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import scipy.signal 3 | import scipy.stats 4 | from spectrum import arburg 5 | 6 | 7 | __all__ = [ 8 | "mad", 9 | "sma", 10 | "energy", 11 | "autoreg", 12 | "corr", 13 | "td_entropy", 14 | "fd_entropy", 15 | "mean_freq", 16 | "bands_energy", 17 | "t_feat", 18 | "f_feat", 19 | ] 20 | 21 | """ 22 | TIME DOMAIN FEATURES 23 | """ 24 | 25 | 26 | def mad(data, axis): 27 | return np.median(np.abs(data), axis=axis) 28 | 29 | 30 | def sma(data, axis): 31 | return np.abs(data).sum(tuple(np.arange(1, data.ndim)))[:, None] 32 | 33 | 34 | def energy(data, axis): 35 | return np.power(data, 2).mean(axis=axis) 36 | 37 | 38 | def autoreg(data, axis): 39 | def _autoreg(datum): 40 | order = 4 41 | try: 42 | coef, _, _ = arburg(datum, order) 43 | coef = coef.real.tolist() 44 | except ValueError: 45 | coef = [0] * order 46 | return coef 47 | 48 | ar = np.asarray([[_autoreg(data[jj, :, ii]) for ii in range(data.shape[2])] for jj in range(data.shape[0])]) 49 | 50 | return ar.reshape(ar.shape[0], -1) 51 | 52 | 53 | def corr(data, axis): 54 | inds = np.tril_indices(3, k=-1) 55 | cor = np.asarray([np.corrcoef(datum.T)[inds] for datum in data]) 56 | return cor 57 | 58 | 59 | def td_entropy(data, axis, bins=16): 60 | bins = np.linspace(-4, 4, bins) 61 | 62 | def _td_entropy(datum): 63 | ent = [] 64 | for ci in range(datum.shape[1]): 65 | pp, bb = np.histogram(datum[:, ci], bins, density=True) 66 | ent.append(scipy.stats.entropy(pp * (bb[1:] - bb[:-1]), base=2)) 67 | return ent 68 | 69 | H = np.asarray([_td_entropy(datum) for datum in data]) 70 | 71 | return H 72 | 73 | 74 | """ 75 | FREQUENCY DOMAIN FEATURES 76 | """ 77 | 78 | 79 | def fd_entropy(psd, axis, td=False): 80 | H = scipy.stats.entropy((psd / psd.sum(axis=axis)[:, None, :]).transpose(1, 0, 2), base=2) 81 | return H 82 | 83 | 84 | def mean_freq(freq, spec, axis): 85 | return (spec * freq[None, :, None]).sum(axis=axis) 86 | 87 | 88 | def bands_energy(freq, spec, axis): 89 | # Based on window of 2.56 seconds sampled at 50 Hz: 128 samples 90 | orig_freqs = np.fft.fftfreq(128, 1 / 50)[:64] 91 | orig_band_inds = np.asarray( 92 | [ 93 | orig_freqs[[ll - 1, uu - 1]] 94 | for ll, uu in [ 95 | [1, 8], 96 | [9, 16], 97 | [17, 24], 98 | [25, 32], 99 | [33, 40], 100 | [41, 48], 101 | [49, 56], 102 | [57, 64], 103 | [1, 16], 104 | [17, 32], 105 | [22, 48], 106 | [49, 64], 107 | [1, 24], 108 | [25, 48], 109 | ] 110 | ] 111 | ) 112 | 113 | # Generate the inds 114 | bands 
= np.asarray([(freq > ll) & (freq <= uu) for ll, uu in orig_band_inds]).T 115 | 116 | # Compute the sum with tensor multiplication 117 | band_energy = np.einsum("ijk,kl->ijl", spec.transpose(0, 2, 1), bands).transpose(0, 2, 1) 118 | band_energy = band_energy.reshape(band_energy.shape[0], -1) 119 | 120 | return band_energy 121 | 122 | 123 | def add_magnitude(data): 124 | assert isinstance(data, np.ndarray) 125 | return np.concatenate((data, np.sqrt(np.power(data, 2).sum(axis=2, keepdims=True)) - 1), axis=-1) 126 | 127 | 128 | """ 129 | Time and frequency feature interfaces 130 | """ 131 | 132 | 133 | def t_feat(data): 134 | data = add_magnitude(data) 135 | features = [ 136 | f(data, axis=1) 137 | for f in [ 138 | np.mean, # 3 (cumsum: 3) 139 | np.std, # 3 (cumsum: 6) 140 | mad, # 3 (cumsum: 9) 141 | np.max, # 3 (cumsum: 12) 142 | np.min, # 3 (cumsum: 15) 143 | sma, # 1 --- (cumsum: 16) 144 | energy, # 3 --- (cumsum: 19) 145 | scipy.stats.iqr, # 3 (cumsum: 22) 146 | td_entropy, # 3 (cumsum: 25) 147 | # autoreg, # 12 (cumsum: 37) 148 | corr, # 3 (cumsum: 40) 149 | ] 150 | ] 151 | 152 | feats = np.concatenate(features, axis=1) 153 | return feats 154 | 155 | 156 | def f_feat(data, fs): 157 | data = add_magnitude(data) 158 | 159 | freq, spec = scipy.signal.periodogram(data, fs=fs, axis=1) 160 | spec_normed = spec / spec.sum(axis=1)[:, None, :] 161 | 162 | features = [ 163 | f(spec, axis=1) 164 | for f in [ 165 | np.mean, # 3 (cumsum: 3) 166 | np.std, # 3 (cumsum: 6) 167 | mad, # 3 (cumsum: 9) 168 | np.max, # 3 (cumsum: 12) 169 | np.min, # 3 (cumsum: 15) 170 | sma, # 1 (cumsum: 16) 171 | energy, # 3 (cumsum: 19) 172 | scipy.stats.iqr, # 3 (cumsum: 22) 173 | fd_entropy, # 3 (cumsum: 25) 174 | np.argmax, # 3 (cumsum: 28) 175 | scipy.stats.skew, # 3 (cumsum: 31) 176 | scipy.stats.kurtosis, # 3 (cumsum: 34) 177 | ] 178 | ] 179 | 180 | features += [ 181 | mean_freq(freq, spec_normed, axis=1), # 3 (cumsum: 37) 182 | bands_energy(freq, spec_normed, axis=1), # 42 (cumsum: 79) (not on mag) 183 | ] 184 | 185 | feats = np.concatenate(features, axis=1) 186 | return feats 187 | -------------------------------------------------------------------------------- /src/transformers/window.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import pandas as pd 3 | from loguru import logger 4 | 5 | from src.base import get_ancestral_metadata 6 | from src.utils.decorators import PartitionByTrial 7 | 8 | 9 | __all__ = [ 10 | "window", 11 | ] 12 | 13 | 14 | def window_data(index, data, fs, win_len, win_inc): 15 | assert data.ndim == 2 16 | 17 | win_len = int(win_len * fs) 18 | win_inc = int(win_inc * fs) 19 | 20 | data_windowed = sliding_window_rect(data, win_len, win_inc) 21 | 22 | if data.shape[0] // win_len == 0: 23 | raise ValueError 24 | elif data.shape[0] // win_len == 1: 25 | return data_windowed[None, ...] 
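    # Note: sliding_window() drops singleton axes, so the branch below restores the
    # channel axis for single-channel inputs, keeping every output three-dimensional
    # with the (n_windows, win_len, n_channels) layout.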
26 | elif data_windowed.ndim == 2: 27 | return data_windowed[..., None] 28 | assert data_windowed.ndim == 3 29 | return data_windowed 30 | 31 | 32 | def window_index(index, data, fs, win_len, win_inc, slice_at="middle"): 33 | assert isinstance(data, pd.DataFrame) 34 | data_windowed = window_data(index=index, data=data.values, fs=fs, win_len=win_len, win_inc=win_inc) 35 | ind = dict(start=0, middle=data_windowed.shape[1] // 2, end=-1)[slice_at] 36 | df = pd.DataFrame(data_windowed[:, ind, :], columns=data.columns) 37 | df = df.astype(data.dtypes) 38 | return df 39 | 40 | 41 | def window(parent, win_len, win_inc): 42 | root = parent / f"{win_len=:03.2f}-{win_inc=:03.2f}" 43 | 44 | fs = get_ancestral_metadata(root, "fs") 45 | 46 | kwargs = dict(index=parent.index["index"], win_len=win_len, win_inc=win_inc, fs=fs) 47 | 48 | # Build index outputs 49 | for key, node in parent.index.items(): 50 | root.instantiate_node( 51 | key=key, func=PartitionByTrial(window_index), kwargs=dict(data=node, **kwargs), backend="pandas" 52 | ) 53 | 54 | # Build Data outputs 55 | for key, node in parent.outputs.items(): 56 | root.instantiate_node( 57 | key=key, func=PartitionByTrial(window_data), kwargs=dict(data=node, **kwargs), backend="none", 58 | ) 59 | 60 | return root 61 | 62 | 63 | def norm_shape(shape): 64 | try: 65 | i = int(shape) 66 | return (i,) 67 | except TypeError: 68 | # shape was not a number 69 | pass 70 | 71 | try: 72 | t = tuple(shape) 73 | return t 74 | except TypeError: 75 | # shape was not iterable 76 | pass 77 | 78 | logger.exception(TypeError("shape must be an int, or a tuple of ints")) 79 | 80 | 81 | def sliding_window(a, ws, ss=None, flatten=True): 82 | """ 83 | based on: https://stackoverflow.com/questions/22685274 84 | 85 | Return a sliding window over a in any number of dimensions 86 | 87 | Parameters 88 | ---------- 89 | a : ndarray 90 | an n-dimensional numpy array 91 | ws : int, tuple 92 | an int (a is 1D) or tuple (a is 2D or greater) representing the size of 93 | each dimension of the window 94 | ss : int, tuple 95 | an int (a is 1D) or tuple (a is 2D or greater) representing the amount 96 | to slide the window in each dimension. If not specified, it defaults to ws. 97 | flatten : book 98 | if True, all slices are flattened, otherwise, there is an extra dimension 99 | for each dimension of the input. 100 | 101 | Returns 102 | ------- 103 | strided : ndarray 104 | an array containing each n-dimensional window from a 105 | """ 106 | 107 | if None is ss: 108 | # ss was not provided. the windows will not overlap in any direction. 109 | ss = ws 110 | ws = norm_shape(ws) 111 | ss = norm_shape(ss) 112 | 113 | # convert ws, ss, and a.shape to numpy arrays so that we can do math in every 114 | # dimension at once. 115 | ws = np.array(ws) 116 | ss = np.array(ss) 117 | shape = np.array(a.shape) 118 | 119 | # ensure that ws, ss, and a.shape all have the same number of dimensions 120 | ls = [len(shape), len(ws), len(ss)] 121 | if 1 != len(set(ls)): 122 | logger.exception(ValueError(f"a.shape, ws and ss must all have the same length. They were {ls}")) 123 | 124 | # ensure that ws is smaller than a in every dimension 125 | if np.any(ws > shape): 126 | logger.exception( 127 | ValueError( 128 | f"ws cannot be larger than a in any dimension. a.shape was %s and " "ws was {(str(a.shape), str(ws))}" 129 | ) 130 | ) 131 | 132 | # how many slices will there be in each dimension? 
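    # For example, with a.shape == (1000, 3), ws == (150, 3) and ss == (50, 3):
    # ((shape - ws) // ss) + 1 -> (18, 1), the strided view is built with shape
    # (18, 1, 150, 3), and the flatten step below collapses it to (18, 150, 3).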
133 | newshape = norm_shape(((shape - ws) // ss) + 1) 134 | # the shape of the strided array will be the number of slices in each dimension 135 | # plus the shape of the window (tuple addition) 136 | newshape += norm_shape(ws) 137 | # the strides tuple will be the array's strides multiplied by step size, plus 138 | # the array's strides (tuple addition) 139 | newstrides = norm_shape(np.array(a.strides) * ss) + a.strides 140 | strided = np.lib.stride_tricks.as_strided(a, shape=newshape, strides=newstrides) 141 | if not flatten: 142 | return strided 143 | 144 | # Collapse strided so that it has one more dimension than the window. I.e., 145 | # the new array is a flat list of slices. 146 | meat = len(ws) if ws.shape else 0 147 | firstdim = (np.product(newshape[:-meat]),) if ws.shape else () 148 | dim = firstdim + (newshape[-meat:]) 149 | dim = list(filter(lambda i: i != 1, dim)) 150 | 151 | return strided.reshape(dim) 152 | 153 | 154 | def sliding_window_rect(data, length, increment): 155 | length = (length, data.shape[1]) 156 | increment = (increment, data.shape[1]) 157 | 158 | return sliding_window(data, length, increment) 159 | -------------------------------------------------------------------------------- /Makefile: -------------------------------------------------------------------------------- 1 | .PHONY: clean data lint requirements sync_data_to_s3 sync_data_from_s3 2 | 3 | ################################################################################# 4 | # GLOBALS # 5 | ################################################################################# 6 | 7 | PROJECT_DIR := $(shell dirname $(realpath $(lastword $(MAKEFILE_LIST)))) 8 | BUCKET = [OPTIONAL] your-bucket-for-syncing-data (do not include 's3://') 9 | PROFILE = default 10 | PROJECT_NAME = har_datasets 11 | PYTHON_INTERPRETER = pipenv run 12 | MAKE = /usr/bin/make 13 | 14 | ifeq (,$(shell which conda)) 15 | HAS_CONDA=False 16 | else 17 | HAS_CONDA=True 18 | endif 19 | 20 | ################################################################################# 21 | # COMMANDS # 22 | ################################################################################# 23 | 24 | ## Install Python Dependencies 25 | docs: 26 | ifeq (True,$(HAS_CONDA)) 27 | conda install --file requirements.txt 28 | else 29 | pip install -r requirements.txt 30 | endif 31 | 32 | ## Install Python Dependencies 33 | requirements: 34 | ifeq (True,$(HAS_CONDA)) 35 | conda install --file requirements.txt 36 | else 37 | pip install -r requirements.txt 38 | endif 39 | 40 | ## Make Dataset Table 41 | tables: 42 | $(PYTHON_INTERPRETER) make_tables.py 43 | 44 | ## Download the raw data 45 | download: 46 | $(PYTHON_INTERPRETER) make_download.py 47 | 48 | ## Delete all compiled Python files 49 | clean: 50 | find . -name "*.pyc" -delete 51 | 52 | ## Lint using flake8 53 | lint: 54 | ## F401 module imported but unused 55 | ## F403 ‘from module import *’ used; unable to detect undefined names 56 | ## W293 blank line contains whitespace 57 | flake8 . 
\ 58 | --exclude=data/,docs/,*__init__.py \ 59 | --ignore=W293,F401,F403 \ 60 | --max-line-length=120 61 | 62 | ## Upload Data to S3 63 | sync_data_to_s3: 64 | ifeq (default,$(PROFILE)) 65 | aws s3 sync data/ s3://$(BUCKET)/data/ 66 | else 67 | aws s3 sync data/ s3://$(BUCKET)/data/ --profile $(PROFILE) 68 | endif 69 | 70 | ## Download Data from S3 71 | sync_data_from_s3: 72 | ifeq (default,$(PROFILE)) 73 | aws s3 sync s3://$(BUCKET)/data/ data/ 74 | else 75 | aws s3 sync s3://$(BUCKET)/data/ data/ --profile $(PROFILE) 76 | endif 77 | 78 | ## Set up python interpreter environment 79 | create_environment: 80 | ifeq (True,$(HAS_CONDA)) 81 | @echo ">>> Detected conda, creating conda environment." 82 | ifeq (3,$(findstring 3,$(PYTHON_INTERPRETER))) 83 | conda create --name $(PROJECT_NAME) python=3 84 | else 85 | conda create --name $(PROJECT_NAME) python=2.7 86 | endif 87 | @echo ">>> New conda env created. Activate with:\nsource activate $(PROJECT_NAME)" 88 | else 89 | @pip install -q virtualenv virtualenvwrapper 90 | @echo ">>> Installing virtualenvwrapper if not already intalled.\nMake sure the following lines are in shell startup file\n\ 91 | export WORKON_HOME=$$HOME/.virtualenvs\nexport PROJECT_HOME=$$HOME/Devel\nsource /usr/local/bin/virtualenvwrapper.sh\n" 92 | @bash -c "source `which virtualenvwrapper.sh`;mkvirtualenv $(PROJECT_NAME) --python=$(PYTHON_INTERPRETER)" 93 | @echo ">>> New virtualenv created. Activate with:\nworkon $(PROJECT_NAME)" 94 | endif 95 | 96 | ## Test python environment is setup correctly 97 | test_environment: 98 | $(PYTHON_INTERPRETER) test_environment.py 99 | 100 | ################################################################################# 101 | # PROJECT RULES # 102 | ################################################################################# 103 | 104 | 105 | 106 | ################################################################################# 107 | # Self Documenting Commands # 108 | ################################################################################# 109 | 110 | .DEFAULT_GOAL := show-help 111 | 112 | # Inspired by 113 | # sed script explained: 114 | # /^##/: 115 | # * save line in hold space 116 | # * purge line 117 | # * Loop: 118 | # * append newline + line to hold space 119 | # * go to next line 120 | # * if line starts with doc comment, strip comment character off and loop 121 | # * remove target prerequisites 122 | # * append hold space (+ newline) to line 123 | # * replace newline plus comments by `---` 124 | # * print line 125 | # Separate expressions are necessary because labels cannot be delimited by 126 | # semicolon; see 127 | .PHONY: show-help 128 | show-help: 129 | @echo "$$(tput bold)Available rules:$$(tput sgr0)" 130 | @echo 131 | @sed -n -e "/^## / { \ 132 | h; \ 133 | s/.*//; \ 134 | :doc" \ 135 | -e "H; \ 136 | n; \ 137 | s/^## //; \ 138 | t doc" \ 139 | -e "s/:.*//; \ 140 | G; \ 141 | s/\\n## /---/; \ 142 | s/\\n/ /g; \ 143 | p; \ 144 | }" ${MAKEFILE_LIST} \ 145 | | LC_ALL='C' sort --ignore-case \ 146 | | awk -F '---' \ 147 | -v ncol=$$(tput cols) \ 148 | -v indent=19 \ 149 | -v col_on="$$(tput setaf 6)" \ 150 | -v col_off="$$(tput sgr0)" \ 151 | '{ \ 152 | printf "%s%*s%s ", col_on, -indent, $$1, col_off; \ 153 | n = split($$2, words, " "); \ 154 | line_length = ncol - indent; \ 155 | for (i = 1; i <= n; i++) { \ 156 | line_length -= length(words[i]) + 1; \ 157 | if (line_length <= 0) { \ 158 | line_length = ncol - indent - length(words[i]) - 1; \ 159 | printf "\n%*s ", -indent, " "; \ 160 | } \ 161 | printf 
"%s ", words[i]; \ 162 | } \ 163 | printf "\n"; \ 164 | }' \ 165 | | more $(shell test $(shell uname) = Darwin && echo '--no-init --raw-control-chars') 166 | -------------------------------------------------------------------------------- /docs/make.bat: -------------------------------------------------------------------------------- 1 | @ECHO OFF 2 | 3 | REM Command file for Sphinx documentation 4 | 5 | if "%SPHINXBUILD%" == "" ( 6 | set SPHINXBUILD=sphinx-build 7 | ) 8 | set BUILDDIR=_build 9 | set ALLSPHINXOPTS=-d %BUILDDIR%/doctrees %SPHINXOPTS% . 10 | set I18NSPHINXOPTS=%SPHINXOPTS% . 11 | if NOT "%PAPER%" == "" ( 12 | set ALLSPHINXOPTS=-D latex_paper_size=%PAPER% %ALLSPHINXOPTS% 13 | set I18NSPHINXOPTS=-D latex_paper_size=%PAPER% %I18NSPHINXOPTS% 14 | ) 15 | 16 | if "%1" == "" goto help 17 | 18 | if "%1" == "help" ( 19 | :help 20 | echo.Please use `make ^` where ^ is one of 21 | echo. html to make standalone HTML files 22 | echo. dirhtml to make HTML files named index.html in directories 23 | echo. singlehtml to make a single large HTML file 24 | echo. pickle to make pickle files 25 | echo. json to make JSON files 26 | echo. htmlhelp to make HTML files and a HTML help project 27 | echo. qthelp to make HTML files and a qthelp project 28 | echo. devhelp to make HTML files and a Devhelp project 29 | echo. epub to make an epub 30 | echo. latex to make LaTeX files, you can set PAPER=a4 or PAPER=letter 31 | echo. text to make text files 32 | echo. man to make manual pages 33 | echo. texinfo to make Texinfo files 34 | echo. gettext to make PO message catalogs 35 | echo. changes to make an overview over all changed/added/deprecated items 36 | echo. linkcheck to check all external links for integrity 37 | echo. doctest to run all doctests embedded in the documentation if enabled 38 | goto end 39 | ) 40 | 41 | if "%1" == "clean" ( 42 | for /d %%i in (%BUILDDIR%\*) do rmdir /q /s %%i 43 | del /q /s %BUILDDIR%\* 44 | goto end 45 | ) 46 | 47 | if "%1" == "html" ( 48 | %SPHINXBUILD% -b html %ALLSPHINXOPTS% %BUILDDIR%/html 49 | if errorlevel 1 exit /b 1 50 | echo. 51 | echo.Build finished. The HTML pages are in %BUILDDIR%/html. 52 | goto end 53 | ) 54 | 55 | if "%1" == "dirhtml" ( 56 | %SPHINXBUILD% -b dirhtml %ALLSPHINXOPTS% %BUILDDIR%/dirhtml 57 | if errorlevel 1 exit /b 1 58 | echo. 59 | echo.Build finished. The HTML pages are in %BUILDDIR%/dirhtml. 60 | goto end 61 | ) 62 | 63 | if "%1" == "singlehtml" ( 64 | %SPHINXBUILD% -b singlehtml %ALLSPHINXOPTS% %BUILDDIR%/singlehtml 65 | if errorlevel 1 exit /b 1 66 | echo. 67 | echo.Build finished. The HTML pages are in %BUILDDIR%/singlehtml. 68 | goto end 69 | ) 70 | 71 | if "%1" == "pickle" ( 72 | %SPHINXBUILD% -b pickle %ALLSPHINXOPTS% %BUILDDIR%/pickle 73 | if errorlevel 1 exit /b 1 74 | echo. 75 | echo.Build finished; now you can process the pickle files. 76 | goto end 77 | ) 78 | 79 | if "%1" == "json" ( 80 | %SPHINXBUILD% -b json %ALLSPHINXOPTS% %BUILDDIR%/json 81 | if errorlevel 1 exit /b 1 82 | echo. 83 | echo.Build finished; now you can process the JSON files. 84 | goto end 85 | ) 86 | 87 | if "%1" == "htmlhelp" ( 88 | %SPHINXBUILD% -b htmlhelp %ALLSPHINXOPTS% %BUILDDIR%/htmlhelp 89 | if errorlevel 1 exit /b 1 90 | echo. 91 | echo.Build finished; now you can run HTML Help Workshop with the ^ 92 | .hhp project file in %BUILDDIR%/htmlhelp. 93 | goto end 94 | ) 95 | 96 | if "%1" == "qthelp" ( 97 | %SPHINXBUILD% -b qthelp %ALLSPHINXOPTS% %BUILDDIR%/qthelp 98 | if errorlevel 1 exit /b 1 99 | echo. 
100 | echo.Build finished; now you can run "qcollectiongenerator" with the ^ 101 | .qhcp project file in %BUILDDIR%/qthelp, like this: 102 | echo.^> qcollectiongenerator %BUILDDIR%\qthelp\har_datasets.qhcp 103 | echo.To view the help file: 104 | echo.^> assistant -collectionFile %BUILDDIR%\qthelp\har_datasets.ghc 105 | goto end 106 | ) 107 | 108 | if "%1" == "devhelp" ( 109 | %SPHINXBUILD% -b devhelp %ALLSPHINXOPTS% %BUILDDIR%/devhelp 110 | if errorlevel 1 exit /b 1 111 | echo. 112 | echo.Build finished. 113 | goto end 114 | ) 115 | 116 | if "%1" == "epub" ( 117 | %SPHINXBUILD% -b epub %ALLSPHINXOPTS% %BUILDDIR%/epub 118 | if errorlevel 1 exit /b 1 119 | echo. 120 | echo.Build finished. The epub file is in %BUILDDIR%/epub. 121 | goto end 122 | ) 123 | 124 | if "%1" == "latex" ( 125 | %SPHINXBUILD% -b latex %ALLSPHINXOPTS% %BUILDDIR%/latex 126 | if errorlevel 1 exit /b 1 127 | echo. 128 | echo.Build finished; the LaTeX files are in %BUILDDIR%/latex. 129 | goto end 130 | ) 131 | 132 | if "%1" == "text" ( 133 | %SPHINXBUILD% -b text %ALLSPHINXOPTS% %BUILDDIR%/text 134 | if errorlevel 1 exit /b 1 135 | echo. 136 | echo.Build finished. The text files are in %BUILDDIR%/text. 137 | goto end 138 | ) 139 | 140 | if "%1" == "man" ( 141 | %SPHINXBUILD% -b man %ALLSPHINXOPTS% %BUILDDIR%/man 142 | if errorlevel 1 exit /b 1 143 | echo. 144 | echo.Build finished. The manual pages are in %BUILDDIR%/man. 145 | goto end 146 | ) 147 | 148 | if "%1" == "texinfo" ( 149 | %SPHINXBUILD% -b texinfo %ALLSPHINXOPTS% %BUILDDIR%/texinfo 150 | if errorlevel 1 exit /b 1 151 | echo. 152 | echo.Build finished. The Texinfo files are in %BUILDDIR%/texinfo. 153 | goto end 154 | ) 155 | 156 | if "%1" == "gettext" ( 157 | %SPHINXBUILD% -b gettext %I18NSPHINXOPTS% %BUILDDIR%/locale 158 | if errorlevel 1 exit /b 1 159 | echo. 160 | echo.Build finished. The message catalogs are in %BUILDDIR%/locale. 161 | goto end 162 | ) 163 | 164 | if "%1" == "changes" ( 165 | %SPHINXBUILD% -b changes %ALLSPHINXOPTS% %BUILDDIR%/changes 166 | if errorlevel 1 exit /b 1 167 | echo. 168 | echo.The overview file is in %BUILDDIR%/changes. 169 | goto end 170 | ) 171 | 172 | if "%1" == "linkcheck" ( 173 | %SPHINXBUILD% -b linkcheck %ALLSPHINXOPTS% %BUILDDIR%/linkcheck 174 | if errorlevel 1 exit /b 1 175 | echo. 176 | echo.Link check complete; look for any errors in the above output ^ 177 | or in %BUILDDIR%/linkcheck/output.txt. 178 | goto end 179 | ) 180 | 181 | if "%1" == "doctest" ( 182 | %SPHINXBUILD% -b doctest %ALLSPHINXOPTS% %BUILDDIR%/doctest 183 | if errorlevel 1 exit /b 1 184 | echo. 185 | echo.Testing of doctests in the sources finished, look at the ^ 186 | results in %BUILDDIR%/doctest/output.txt. 187 | goto end 188 | ) 189 | 190 | :end 191 | -------------------------------------------------------------------------------- /docs/Makefile: -------------------------------------------------------------------------------- 1 | # Makefile for Sphinx documentation 2 | # 3 | 4 | # You can set these variables from the command line. 5 | SPHINXOPTS = 6 | SPHINXBUILD = sphinx-build 7 | PAPER = 8 | BUILDDIR = _build 9 | 10 | # Internal variables. 11 | PAPEROPT_a4 = -D latex_paper_size=a4 12 | PAPEROPT_letter = -D latex_paper_size=letter 13 | ALLSPHINXOPTS = -d $(BUILDDIR)/doctrees $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) . 14 | # the i18n builder cannot share the environment and doctrees with the others 15 | I18NSPHINXOPTS = $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) . 
16 | 17 | .PHONY: help clean html dirhtml singlehtml pickle json htmlhelp qthelp devhelp epub latex latexpdf text man changes linkcheck doctest gettext 18 | 19 | help: 20 | @echo "Please use \`make ' where is one of" 21 | @echo " html to make standalone HTML files" 22 | @echo " dirhtml to make HTML files named index.html in directories" 23 | @echo " singlehtml to make a single large HTML file" 24 | @echo " pickle to make pickle files" 25 | @echo " json to make JSON files" 26 | @echo " htmlhelp to make HTML files and a HTML help project" 27 | @echo " qthelp to make HTML files and a qthelp project" 28 | @echo " devhelp to make HTML files and a Devhelp project" 29 | @echo " epub to make an epub" 30 | @echo " latex to make LaTeX files, you can set PAPER=a4 or PAPER=letter" 31 | @echo " latexpdf to make LaTeX files and run them through pdflatex" 32 | @echo " text to make text files" 33 | @echo " man to make manual pages" 34 | @echo " texinfo to make Texinfo files" 35 | @echo " info to make Texinfo files and run them through makeinfo" 36 | @echo " gettext to make PO message catalogs" 37 | @echo " changes to make an overview of all changed/added/deprecated items" 38 | @echo " linkcheck to check all external links for integrity" 39 | @echo " doctest to run all doctests embedded in the documentation (if enabled)" 40 | 41 | clean: 42 | -rm -rf $(BUILDDIR)/* 43 | 44 | html: 45 | $(SPHINXBUILD) -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html 46 | @echo 47 | @echo "Build finished. The HTML pages are in $(BUILDDIR)/html." 48 | 49 | dirhtml: 50 | $(SPHINXBUILD) -b dirhtml $(ALLSPHINXOPTS) $(BUILDDIR)/dirhtml 51 | @echo 52 | @echo "Build finished. The HTML pages are in $(BUILDDIR)/dirhtml." 53 | 54 | singlehtml: 55 | $(SPHINXBUILD) -b singlehtml $(ALLSPHINXOPTS) $(BUILDDIR)/singlehtml 56 | @echo 57 | @echo "Build finished. The HTML page is in $(BUILDDIR)/singlehtml." 58 | 59 | pickle: 60 | $(SPHINXBUILD) -b pickle $(ALLSPHINXOPTS) $(BUILDDIR)/pickle 61 | @echo 62 | @echo "Build finished; now you can process the pickle files." 63 | 64 | json: 65 | $(SPHINXBUILD) -b json $(ALLSPHINXOPTS) $(BUILDDIR)/json 66 | @echo 67 | @echo "Build finished; now you can process the JSON files." 68 | 69 | htmlhelp: 70 | $(SPHINXBUILD) -b htmlhelp $(ALLSPHINXOPTS) $(BUILDDIR)/htmlhelp 71 | @echo 72 | @echo "Build finished; now you can run HTML Help Workshop with the" \ 73 | ".hhp project file in $(BUILDDIR)/htmlhelp." 74 | 75 | qthelp: 76 | $(SPHINXBUILD) -b qthelp $(ALLSPHINXOPTS) $(BUILDDIR)/qthelp 77 | @echo 78 | @echo "Build finished; now you can run "qcollectiongenerator" with the" \ 79 | ".qhcp project file in $(BUILDDIR)/qthelp, like this:" 80 | @echo "# qcollectiongenerator $(BUILDDIR)/qthelp/har_datasets.qhcp" 81 | @echo "To view the help file:" 82 | @echo "# assistant -collectionFile $(BUILDDIR)/qthelp/har_datasets.qhc" 83 | 84 | devhelp: 85 | $(SPHINXBUILD) -b devhelp $(ALLSPHINXOPTS) $(BUILDDIR)/devhelp 86 | @echo 87 | @echo "Build finished." 88 | @echo "To view the help file:" 89 | @echo "# mkdir -p $$HOME/.local/share/devhelp/har_datasets" 90 | @echo "# ln -s $(BUILDDIR)/devhelp $$HOME/.local/share/devhelp/har_datasets" 91 | @echo "# devhelp" 92 | 93 | epub: 94 | $(SPHINXBUILD) -b epub $(ALLSPHINXOPTS) $(BUILDDIR)/epub 95 | @echo 96 | @echo "Build finished. The epub file is in $(BUILDDIR)/epub." 97 | 98 | latex: 99 | $(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex 100 | @echo 101 | @echo "Build finished; the LaTeX files are in $(BUILDDIR)/latex." 
102 | @echo "Run \`make' in that directory to run these through (pdf)latex" \ 103 | "(use \`make latexpdf' here to do that automatically)." 104 | 105 | latexpdf: 106 | $(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex 107 | @echo "Running LaTeX files through pdflatex..." 108 | $(MAKE) -C $(BUILDDIR)/latex all-pdf 109 | @echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex." 110 | 111 | text: 112 | $(SPHINXBUILD) -b text $(ALLSPHINXOPTS) $(BUILDDIR)/text 113 | @echo 114 | @echo "Build finished. The text files are in $(BUILDDIR)/text." 115 | 116 | man: 117 | $(SPHINXBUILD) -b man $(ALLSPHINXOPTS) $(BUILDDIR)/man 118 | @echo 119 | @echo "Build finished. The manual pages are in $(BUILDDIR)/man." 120 | 121 | texinfo: 122 | $(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo 123 | @echo 124 | @echo "Build finished. The Texinfo files are in $(BUILDDIR)/texinfo." 125 | @echo "Run \`make' in that directory to run these through makeinfo" \ 126 | "(use \`make info' here to do that automatically)." 127 | 128 | info: 129 | $(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo 130 | @echo "Running Texinfo files through makeinfo..." 131 | make -C $(BUILDDIR)/texinfo info 132 | @echo "makeinfo finished; the Info files are in $(BUILDDIR)/texinfo." 133 | 134 | gettext: 135 | $(SPHINXBUILD) -b gettext $(I18NSPHINXOPTS) $(BUILDDIR)/locale 136 | @echo 137 | @echo "Build finished. The message catalogs are in $(BUILDDIR)/locale." 138 | 139 | changes: 140 | $(SPHINXBUILD) -b changes $(ALLSPHINXOPTS) $(BUILDDIR)/changes 141 | @echo 142 | @echo "The overview file is in $(BUILDDIR)/changes." 143 | 144 | linkcheck: 145 | $(SPHINXBUILD) -b linkcheck $(ALLSPHINXOPTS) $(BUILDDIR)/linkcheck 146 | @echo 147 | @echo "Link check complete; look for any errors in the above output " \ 148 | "or in $(BUILDDIR)/linkcheck/output.txt." 149 | 150 | doctest: 151 | $(SPHINXBUILD) -b doctest $(ALLSPHINXOPTS) $(BUILDDIR)/doctest 152 | @echo "Testing of doctests in the sources finished, look at the " \ 153 | "results in $(BUILDDIR)/doctest/output.txt." 
154 | -------------------------------------------------------------------------------- /src/models/base.py: -------------------------------------------------------------------------------- 1 | from typing import Any 2 | from typing import Dict 3 | from typing import Optional 4 | 5 | import numpy as np 6 | import pandas as pd 7 | from loguru import logger 8 | from mldb import NodeWrapper 9 | from sklearn.base import BaseEstimator 10 | from sklearn.model_selection import GridSearchCV 11 | from sklearn.model_selection import GroupKFold 12 | 13 | from src.base import ExecutionGraph 14 | from src.evaluation.classification import evaluate_data_split 15 | 16 | __all__ = ["instantiate_and_fit", "ClassifierWrapper", "BasicScorer"] 17 | 18 | 19 | def instantiate_and_fit( 20 | index: pd.DataFrame, 21 | fold: pd.DataFrame, 22 | X: np.ndarray, 23 | y: pd.DataFrame, 24 | estimator: BaseEstimator, 25 | n_splits: int = 5, 26 | param_grid: Optional[Dict[str, Any]] = None, 27 | ) -> BaseEstimator: 28 | assert fold.shape[0] == index.shape[0] 29 | assert fold.shape[0] == X.shape[0] 30 | assert fold.shape[0] == y.shape[0] 31 | 32 | fold_vals = fold.ravel() 33 | 34 | train_inds = fold_vals == "train" 35 | val_inds = fold_vals == "val" 36 | 37 | if val_inds.sum(): 38 | raise NotImplementedError("Explicit validation indices not yet supported.") 39 | 40 | y = y.values.ravel() 41 | 42 | nan_row, nan_col = np.nonzero(np.isnan(X) | np.isinf(X)) 43 | if len(nan_row): 44 | logger.warning(f"Setting {len(nan_row)} NaN elements to zero before fitting {estimator}.") 45 | X[nan_row, nan_col] = 0 46 | 47 | logger.info(f"Fitting {estimator} on data (shape: {X.shape})") 48 | 49 | if param_grid is not None: 50 | group_k_fold = GroupKFold(n_splits=n_splits).split(X[train_inds], y[train_inds], index.trial.values[train_inds]) 51 | 52 | grid_search = GridSearchCV(estimator=estimator, param_grid=param_grid, verbose=10, cv=list(group_k_fold)) 53 | grid_search.fit(X[train_inds], y[train_inds]) 54 | 55 | return grid_search.best_estimator_ 56 | 57 | estimator.fit(X[train_inds], y[train_inds]) 58 | return estimator 59 | 60 | 61 | # noinspection PyPep8Naming 62 | class BasicScorer(object): 63 | def fit(self, estimator: Any, X: np.ndarray, y: np.ndarray): 64 | return estimator.fit(X, y) 65 | 66 | def score(self, estimator: Any, X: np.ndarray, y: np.ndarray): 67 | return estimator.score(X, y) 68 | 69 | def transform(self, estimator: Any, X: np.ndarray): 70 | return estimator.transform(X) 71 | 72 | def decision_function(self, estimator: Any, X: np.ndarray): 73 | return estimator.predict_proba(X) 74 | 75 | def predict(self, estimator: Any, X: np.ndarray): 76 | return estimator.predict(X) 77 | 78 | def predict_proba(self, estimator: Any, X: np.ndarray): 79 | return estimator.predict_proba(X) 80 | 81 | def predict_log_proba(self, estimator: Any, X: np.ndarray): 82 | return estimator.predict_proba(X) 83 | 84 | 85 | # noinspection PyPep8Naming 86 | class ClassifierWrapper(ExecutionGraph): 87 | def __init__( 88 | self, 89 | parent: ExecutionGraph, 90 | features: NodeWrapper, 91 | split: NodeWrapper, 92 | task: NodeWrapper, 93 | estimator: NodeWrapper, 94 | param_grid: Optional[Dict[str, Any]] = None, 95 | scorer: Optional[BasicScorer] = None, 96 | evaluate: bool = False, 97 | ): 98 | assert isinstance(parent, ExecutionGraph) 99 | assert isinstance(features, NodeWrapper) 100 | assert isinstance(split, NodeWrapper) 101 | assert isinstance(task, NodeWrapper) 102 | assert isinstance(estimator, NodeWrapper) 103 | 104 | super(ClassifierWrapper, 
self).__init__(parent=parent, name=f"estimator={str(estimator.name.name)}") 105 | 106 | self.features = features 107 | self.split = split 108 | self.task = task 109 | 110 | self.scorer = BasicScorer() if scorer is None else scorer 111 | 112 | model = self.instantiate_node( 113 | key="model", 114 | func=instantiate_and_fit, 115 | backend="sklearn", 116 | kwargs=dict( 117 | X=features, y=task, index=self["index"], fold=self.split, estimator=estimator, param_grid=param_grid, 118 | ), 119 | ) 120 | 121 | results = self.get_or_create( 122 | key="results", 123 | func=evaluate_data_split, 124 | backend="json", 125 | kwargs=dict(split=split, targets=task, estimator=model, prob_predictions=self.predict_proba(features)), 126 | ) 127 | 128 | if evaluate: 129 | self.dump_graph() 130 | model.evaluate() 131 | results.evaluate() 132 | 133 | @property 134 | def model(self): 135 | return self["model"] 136 | 137 | @property 138 | def results(self): 139 | return self["results"] 140 | 141 | def fit(self, X, y) -> NodeWrapper: 142 | logger.warning(f"it looks like you're attempting to re-fit a model on new data - is this the intent?") 143 | return self.instantiate_orphan_node(func=self.scorer.fit, kwargs=dict(estimator=self["model"], X=X, y=y),) 144 | 145 | def score(self, X, y) -> NodeWrapper: 146 | return self.instantiate_orphan_node(func=self.scorer.score, kwargs=dict(estimator=self["model"], X=X, y=y)) 147 | 148 | def transform(self, X) -> NodeWrapper: 149 | return self.instantiate_orphan_node(func=self.scorer.transform, kwargs=dict(estimator=self["model"], X=X)) 150 | 151 | def predict(self, X) -> NodeWrapper: 152 | return self.instantiate_orphan_node(func=self.scorer.predict, kwargs=dict(estimator=self["model"], X=X)) 153 | 154 | def decision_function(self, X) -> NodeWrapper: 155 | return self.instantiate_orphan_node( 156 | func=self.scorer.decision_function, kwargs=dict(estimator=self["model"], X=X) 157 | ) 158 | 159 | def predict_proba(self, X) -> NodeWrapper: 160 | return self.instantiate_orphan_node(func=self.scorer.predict_proba, kwargs=dict(estimator=self["model"], X=X)) 161 | 162 | def predict_log_proba(self, X) -> NodeWrapper: 163 | return self.instantiate_orphan_node( 164 | func=self.scorer.predict_log_proba, kwargs=dict(estimator=self["model"], X=X) 165 | ) 166 | -------------------------------------------------------------------------------- /src/utils/decorators.py: -------------------------------------------------------------------------------- 1 | from functools import partial 2 | from functools import update_wrapper 3 | 4 | import numpy as np 5 | import pandas as pd 6 | from loguru import logger 7 | from pandas.api.types import is_categorical_dtype 8 | from tqdm import tqdm 9 | 10 | from src.utils.exceptions import ModalityNotPresentError 11 | from src.utils.loaders import dataset_importer 12 | 13 | 14 | __all__ = [ 15 | "index_decorator", 16 | "fold_decorator", 17 | "label_decorator", 18 | "PartitionByTrial", 19 | "partitioning_decorator", 20 | ] 21 | 22 | 23 | class DecoratorBase(object): 24 | def __init__(self, func): 25 | update_wrapper(self, func) 26 | self.func = func 27 | 28 | def __get__(self, obj, objtype): 29 | return partial(self.__call__, obj) 30 | 31 | def __call__(self, *args, **kwargs): 32 | return self.func(*args, **kwargs) 33 | 34 | 35 | class LabelDecorator(DecoratorBase): 36 | def __init__(self, func): 37 | super(LabelDecorator, self).__init__(func) 38 | 39 | def __call__(self, *args, **kwargs): 40 | df = super(LabelDecorator, self).__call__(*args, **kwargs) 41 | 42 | # 
TODO/FIXME: remove this strange pattern 43 | if isinstance(df, tuple): 44 | inv_lookup, df = df 45 | df = pd.DataFrame(df) 46 | for ci in df.columns: 47 | df[ci] = df[ci].apply(lambda ll: inv_lookup[ll]) 48 | 49 | assert len(df.columns) == 1 50 | 51 | df = pd.DataFrame(df) 52 | df.columns = [f"target" for _ in range(len(df.columns))] 53 | if is_categorical_dtype(df["target"]): 54 | df = df.astype(dict(target="category")) 55 | 56 | return df 57 | 58 | 59 | class FoldDecorator(DecoratorBase): 60 | def __init__(self, func): 61 | super(FoldDecorator, self).__init__(func) 62 | 63 | def __call__(self, *args, **kwargs): 64 | df = super(FoldDecorator, self).__call__(*args, **kwargs) 65 | if isinstance(df.columns, pd.RangeIndex): 66 | df.columns = [f"fold_{fi}" for fi in range(len(df.columns))] 67 | df = df.astype({col: "category" for col in df.columns}) 68 | return df 69 | 70 | 71 | class IndexDecorator(DecoratorBase): 72 | def __init__(self, func): 73 | super(IndexDecorator, self).__init__(func) 74 | 75 | def __call__(self, *args, **kwargs): 76 | df = super(IndexDecorator, self).__call__(*args, **kwargs) 77 | df.columns = ["subject", "trial", "time"] 78 | return df.astype(dict(subject="category", trial="category", time=float)) 79 | 80 | 81 | def infer_data_type(data): 82 | if isinstance(data, np.ndarray): 83 | return "numpy" 84 | elif isinstance(data, pd.DataFrame): 85 | return "pandas" 86 | 87 | logger.exception( 88 | TypeError(f"Unsupported data type in infer_data_type ({type(data)}), currently only {{numpy, pandas}}") 89 | ) 90 | 91 | 92 | def slice_data_type(data, inds, data_type_name): 93 | if data_type_name == "numpy": 94 | return data[inds] 95 | elif data_type_name == "pandas": 96 | return data.loc[inds] 97 | 98 | logger.exception( 99 | TypeError(f"Unsupported data type in slice_data_type ({type(data)}), currently only {{numpy, pandas}}") 100 | ) 101 | 102 | 103 | def concat_data_type(datas, data_type_name): 104 | if data_type_name == "numpy": 105 | return np.concatenate(datas, axis=0) 106 | elif data_type_name == "pandas": 107 | df = pd.concat(datas, axis=0) 108 | return df.reset_index(drop=True) 109 | 110 | logger.exception( 111 | TypeError(f"Unsupported data type in concat_data_type ({type(datas)}), currently only {{numpy, pandas}}") 112 | ) 113 | 114 | 115 | class PartitionByTrial(DecoratorBase): 116 | """ 117 | 118 | """ 119 | 120 | def __init__(self, func): 121 | super(PartitionByTrial, self).__init__(func=func) 122 | 123 | def __call__(self, index, data, *args, **kwargs): 124 | if index.shape[0] != data.shape[0]: 125 | logger.exception( 126 | ValueError( 127 | f"The data and index should have the same length " 128 | "with index: {index.shape}; and data: {data.shape}" 129 | ) 130 | ) 131 | output = [] 132 | trials = index.trial.unique() 133 | data_type = infer_data_type(data) 134 | for trial in tqdm(trials): 135 | inds = index.trial == trial 136 | index_ = index.loc[inds] 137 | data_ = slice_data_type(data, inds, data_type) 138 | vals = self.func(index=index_, data=data_, *args, **kwargs) 139 | opdt = infer_data_type(vals) 140 | if opdt != data_type: 141 | logger.exception( 142 | ValueError( 143 | f"The data type of {self.func} should be the same as the input {data_type} " 144 | f"but instead got {opdt}" 145 | ) 146 | ) 147 | output.append(vals) 148 | return concat_data_type(output, data_type) 149 | 150 | 151 | class RequiredModalities(DecoratorBase): 152 | def __init__(self, func, *modalities): 153 | super(RequiredModalities, self).__init__(func=func) 154 | 155 | 
self.required_modalities = set(modalities) 156 | 157 | def __call__(self, dataset, *args, **kwargs): 158 | dataset = dataset_importer(dataset) 159 | dataset_modalities = dataset.meta.modalities 160 | for required_modality in self.required_modalities: 161 | if required_modality not in dataset_modalities: 162 | logger.exception( 163 | ModalityNotPresentError( 164 | f"The modality {required_modality} is required by the function {self.func}. " 165 | f"However, the dataset {dataset} does not have {required_modality}. The " 166 | f"available modalities are: {dataset_modalities})" 167 | ) 168 | ) 169 | 170 | super(self, RequiredModalities).__call__(dataset, *args, **kwargs) 171 | 172 | 173 | required_modalities = RequiredModalities 174 | 175 | 176 | label_decorator = LabelDecorator 177 | index_decorator = IndexDecorator 178 | fold_decorator = FoldDecorator 179 | partitioning_decorator = PartitionByTrial 180 | -------------------------------------------------------------------------------- /docs/conf.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # 3 | # har_datasets documentation build configuration file, created by 4 | # sphinx-quickstart. 5 | # 6 | # This file is execfile()d with the current directory set to its containing dir. 7 | # 8 | # Note that not all possible configuration values are present in this 9 | # autogenerated file. 10 | # 11 | # All configuration values have a default; values that are commented out 12 | # serve to show the default. 13 | 14 | import os 15 | import sys 16 | 17 | # If extensions (or modules to document with autodoc) are in another directory, 18 | # add these directories to sys.path here. If the directory is relative to the 19 | # documentation root, use os.path.abspath to make it absolute, like shown here. 20 | # sys.path.insert(0, os.path.abspath('.')) 21 | 22 | # -- General configuration ----------------------------------------------------- 23 | 24 | # If your documentation needs a minimal Sphinx version, state it here. 25 | # needs_sphinx = '1.0' 26 | 27 | # Add any Sphinx extension module names here, as strings. They can be extensions 28 | # coming with Sphinx (named 'sphinx.ext.*') or your custom ones. 29 | extensions = [] 30 | 31 | # Add any paths that contain templates here, relative to this directory. 32 | templates_path = ["_templates"] 33 | 34 | # The suffix of source filenames. 35 | source_suffix = ".rst" 36 | 37 | # The encoding of source files. 38 | # source_encoding = 'utf-8-sig' 39 | 40 | # The master toctree document. 41 | master_doc = "index" 42 | 43 | # General information about the project. 44 | project = "har_datasets" 45 | 46 | # The version info for the project you're documenting, acts as replacement for 47 | # |version| and |release|, also used in various other places throughout the 48 | # built documents. 49 | # 50 | # The short X.Y version. 51 | version = "0.1" 52 | # The full version, including alpha/beta/rc tags. 53 | release = "0.1" 54 | 55 | # The language for content autogenerated by Sphinx. Refer to documentation 56 | # for a list of supported languages. 57 | # language = None 58 | 59 | # There are two options for replacing |today|: either, you set today to some 60 | # non-false value, then it is used: 61 | # today = '' 62 | # Else, today_fmt is used as the format for a strftime call. 63 | # today_fmt = '%B %d, %Y' 64 | 65 | # List of patterns, relative to source directory, that match files and 66 | # directories to ignore when looking for source files. 
67 | exclude_patterns = ["_build"] 68 | 69 | # The reST default role (used for this markup: `text`) to use for all documents. 70 | # default_role = None 71 | 72 | # If true, '()' will be appended to :func: etc. cross-reference text. 73 | # add_function_parentheses = True 74 | 75 | # If true, the current module name will be prepended to all description 76 | # unit titles (such as .. function::). 77 | # add_module_names = True 78 | 79 | # If true, sectionauthor and moduleauthor directives will be shown in the 80 | # output. They are ignored by default. 81 | # show_authors = False 82 | 83 | # The name of the Pygments (syntax highlighting) style to use. 84 | pygments_style = "sphinx" 85 | 86 | # A list of ignored prefixes for module index sorting. 87 | # modindex_common_prefix = [] 88 | 89 | 90 | # -- Options for HTML output --------------------------------------------------- 91 | 92 | # The theme to use for HTML and HTML Help pages. See the documentation for 93 | # a list of builtin themes. 94 | html_theme = "default" 95 | 96 | # Theme options are theme-specific and customize the look and feel of a theme 97 | # further. For a list of options available for each theme, see the 98 | # documentation. 99 | # html_theme_options = {} 100 | 101 | # Add any paths that contain custom themes here, relative to this directory. 102 | # html_theme_path = [] 103 | 104 | # The name for this set of Sphinx documents. If None, it defaults to 105 | # " v documentation". 106 | # html_title = None 107 | 108 | # A shorter title for the navigation bar. Default is the same as html_title. 109 | # html_short_title = None 110 | 111 | # The name of an image file (relative to this directory) to place at the top 112 | # of the sidebar. 113 | # html_logo = None 114 | 115 | # The name of an image file (within the static path) to use as favicon of the 116 | # docs. This file should be a Windows icon file (.ico) being 16x16 or 32x32 117 | # pixels large. 118 | # html_favicon = None 119 | 120 | # Add any paths that contain custom static files (such as style sheets) here, 121 | # relative to this directory. They are copied after the builtin static files, 122 | # so a file named "default.css" will overwrite the builtin "default.css". 123 | html_static_path = ["_static"] 124 | 125 | # If not '', a 'Last updated on:' timestamp is inserted at every page bottom, 126 | # using the given strftime format. 127 | # html_last_updated_fmt = '%b %d, %Y' 128 | 129 | # If true, SmartyPants will be used to convert quotes and dashes to 130 | # typographically correct entities. 131 | # html_use_smartypants = True 132 | 133 | # Custom sidebar templates, maps document names to template names. 134 | # html_sidebars = {} 135 | 136 | # Additional templates that should be rendered to pages, maps page names to 137 | # template names. 138 | # html_additional_pages = {} 139 | 140 | # If false, no module index is generated. 141 | # html_domain_indices = True 142 | 143 | # If false, no index is generated. 144 | # html_use_index = True 145 | 146 | # If true, the index is split into individual pages for each letter. 147 | # html_split_index = False 148 | 149 | # If true, links to the reST sources are added to the pages. 150 | # html_show_sourcelink = True 151 | 152 | # If true, "Created using Sphinx" is shown in the HTML footer. Default is True. 153 | # html_show_sphinx = True 154 | 155 | # If true, "(C) Copyright ..." is shown in the HTML footer. Default is True. 
156 | # html_show_copyright = True 157 | 158 | # If true, an OpenSearch description file will be output, and all pages will 159 | # contain a tag referring to it. The value of this option must be the 160 | # base URL from which the finished HTML is served. 161 | # html_use_opensearch = '' 162 | 163 | # This is the file name suffix for HTML files (e.g. ".xhtml"). 164 | # html_file_suffix = None 165 | 166 | # Output file base name for HTML help builder. 167 | htmlhelp_basename = "har_datasetsdoc" 168 | 169 | 170 | # -- Options for LaTeX output -------------------------------------------------- 171 | 172 | latex_elements = { 173 | # The paper size ('letterpaper' or 'a4paper'). 174 | # 'papersize': 'letterpaper', 175 | # The font size ('10pt', '11pt' or '12pt'). 176 | # 'pointsize': '10pt', 177 | # Additional stuff for the LaTeX preamble. 178 | # 'preamble': '', 179 | } 180 | 181 | # Grouping the document tree into LaTeX files. List of tuples 182 | # (source start file, target name, title, author, documentclass [howto/manual]). 183 | latex_documents = [ 184 | ("index", "har_datasets.tex", "har_datasets Documentation", "Niall Twomey", "manual"), 185 | ] 186 | 187 | # The name of an image file (relative to this directory) to place at the top of 188 | # the title page. 189 | # latex_logo = None 190 | 191 | # For "manual" documents, if this is true, then toplevel headings are parts, 192 | # not chapters. 193 | # latex_use_parts = False 194 | 195 | # If true, show page references after internal links. 196 | # latex_show_pagerefs = False 197 | 198 | # If true, show URL addresses after external links. 199 | # latex_show_urls = False 200 | 201 | # Documents to append as an appendix to all manuals. 202 | # latex_appendices = [] 203 | 204 | # If false, no module index is generated. 205 | # latex_domain_indices = True 206 | 207 | 208 | # -- Options for manual page output -------------------------------------------- 209 | 210 | # One entry per manual page. List of tuples 211 | # (source start file, name, description, authors, manual section). 212 | man_pages = [("index", "har_datasets", "har_datasets Documentation", ["Niall Twomey"], 1)] 213 | 214 | # If true, show URL addresses after external links. 215 | # man_show_urls = False 216 | 217 | 218 | # -- Options for Texinfo output ------------------------------------------------ 219 | 220 | # Grouping the document tree into Texinfo files. List of tuples 221 | # (source start file, target name, title, author, 222 | # dir menu entry, description, category) 223 | texinfo_documents = [ 224 | ( 225 | "index", 226 | "har_datasets", 227 | "har_datasets Documentation", 228 | "Niall Twomey", 229 | "har_datasets", 230 | "A collection of human activity recognition (HAR) datasets, complete with metadata, consistent labels, and processing engine for unified HAR analysis. ", 231 | "Miscellaneous", 232 | ), 233 | ] 234 | 235 | # Documents to append as an appendix to all manuals. 236 | # texinfo_appendices = [] 237 | 238 | # If false, no module index is generated. 239 | # texinfo_domain_indices = True 240 | 241 | # How to display URL addresses: 'footnote', 'no', or 'inline'. 
242 | # texinfo_show_urls = 'footnote' 243 | -------------------------------------------------------------------------------- /src/base.py: -------------------------------------------------------------------------------- 1 | from functools import lru_cache 2 | from functools import partial 3 | from operator import itemgetter 4 | from pathlib import Path 5 | from typing import Any 6 | from typing import Callable 7 | from typing import Dict 8 | from typing import Iterable 9 | from typing import List 10 | from typing import Optional 11 | from typing import Tuple 12 | from typing import Union 13 | 14 | import pygraphviz as pgv 15 | from loguru import logger 16 | from mldb import ComputationGraph 17 | from mldb import FileLockExistsException 18 | from mldb import NodeWrapper 19 | from mldb.backends import Backend 20 | from mldb.backends import JsonBackend 21 | from mldb.backends import NumpyBackend 22 | from mldb.backends import PandasBackend 23 | from mldb.backends import PickleBackend 24 | from mldb.backends import PNGBackend 25 | from mldb.backends import ScikitLearnBackend 26 | from mldb.backends import VolatileBackend 27 | from mldb.backends import YamlBackend 28 | 29 | from src.functional.common import node_itemgetter 30 | from src.keys import Key 31 | from src.meta import BaseMeta 32 | from src.utils.decorators import DecoratorBase 33 | from src.utils.loaders import build_path 34 | from src.utils.loaders import get_yaml_file_list 35 | from src.utils.misc import NumpyEncoder 36 | from src.utils.misc import randomised_order 37 | 38 | 39 | __all__ = ["ExecutionGraph", "get_ancestral_metadata"] 40 | 41 | 42 | INDEX_FILES_SET = set( 43 | get_yaml_file_list("indices", stem=True) 44 | + get_yaml_file_list("tasks", stem=True) 45 | + get_yaml_file_list("data_partitions", stem=True) 46 | ) 47 | 48 | DATA_ROOT: Path = build_path() 49 | 50 | BACKEND_DICT = dict( 51 | none=VolatileBackend(), 52 | pickle=PickleBackend(DATA_ROOT), 53 | pandas=PandasBackend(DATA_ROOT), 54 | numpy=NumpyBackend(DATA_ROOT), 55 | json=JsonBackend(DATA_ROOT, cls=NumpyEncoder), 56 | sklearn=ScikitLearnBackend(DATA_ROOT), 57 | png=PNGBackend(DATA_ROOT), 58 | yaml=YamlBackend(DATA_ROOT), 59 | ) 60 | 61 | 62 | @lru_cache(2 ** 16) 63 | def is_index_key(key: Optional[Union[Key, str]]) -> bool: 64 | if key is None: 65 | return False 66 | if isinstance(key, Key): 67 | key = key.key 68 | assert isinstance(key, str) 69 | return key in INDEX_FILES_SET 70 | 71 | 72 | def validate_meta(meta: Union[BaseMeta, Path, str], name: Union[Path, str]) -> BaseMeta: 73 | if isinstance(meta, BaseMeta): 74 | return meta 75 | elif isinstance(meta, (str, Path)): 76 | return BaseMeta(path=meta) 77 | elif isinstance(name, (str, Path)): 78 | return BaseMeta(path=name) 79 | 80 | logger.exception(f"Ambiguous metadata specification with {name=} and {meta=}") 81 | 82 | raise ValueError 83 | 84 | 85 | def validate_backend(backend: Optional[str], key: Optional[Union[str, Key]] = None) -> Backend: 86 | if is_index_key(key): 87 | if backend != "pandas": 88 | logger.warning(f"Backend for node {key} is not pandas - setting value to 'pandas'.") 89 | backend = "pandas" 90 | 91 | else: 92 | if backend is None: 93 | backend = "none" 94 | 95 | if backend not in BACKEND_DICT: 96 | logger.exception(f"Backend ({backend}) not in known list ({sorted(BACKEND_DICT.keys())})") 97 | raise KeyError 98 | 99 | return BACKEND_DICT[backend] 100 | 101 | 102 | def relative_node_name(identifier: Path, key: Union[Key, str]) -> Path: 103 | assert isinstance(key, (Key, str)) 104 | return 
identifier / str(key) 105 | 106 | 107 | def absolute_node_name(identifier: Path, key: Union[Key, str]) -> Path: 108 | return DATA_ROOT / relative_node_name(identifier=identifier, key=key) 109 | 110 | 111 | class NodeGroup(object): 112 | def __init__(self, graph: "ExecutionGraph"): 113 | self.graph = graph 114 | 115 | def __repr__(self) -> str: 116 | graph_name = self.graph.name 117 | nodes = sorted(map(str, self.keys())) 118 | return f"{self.__class__.__name__}({graph_name=}, {nodes=})" 119 | 120 | def __getitem__(self, key: Union[Key, str]) -> NodeWrapper: 121 | assert self.validate_key(key) 122 | 123 | key = Key(key) 124 | 125 | if key in self.graph.nodes: 126 | return self.graph.nodes[key] 127 | 128 | if self.graph.parent is not None: 129 | return self.parent_group[key] 130 | 131 | logger.exception(f"Unable to find key '{key}' in graph - reached root.") 132 | 133 | raise KeyError 134 | 135 | def keys(self) -> Iterable[Union[Key, str]]: 136 | yield from map(itemgetter(0), self.items()) 137 | 138 | def values(self) -> Iterable[NodeWrapper]: 139 | yield from map(itemgetter(1), self.items()) 140 | 141 | def items(self) -> Iterable[Tuple[Union[Key, str], NodeWrapper]]: 142 | keys = [key for key in self.graph.nodes.keys() if self.validate_key(key)] 143 | 144 | if len(keys) == 0: 145 | yield from self.parent_group.items() 146 | 147 | else: 148 | for key in keys: 149 | yield key, self.graph.nodes[key] 150 | 151 | def validate_key(self, key: Union[Key, str]) -> bool: 152 | raise NotImplementedError 153 | 154 | @property 155 | def parent_group(self) -> "NodeGroup": 156 | raise NotImplementedError 157 | 158 | 159 | class OutputGroup(NodeGroup): 160 | def validate_key(self, key: Union[Key, str]) -> bool: 161 | return not is_index_key(key) 162 | 163 | @property 164 | def parent_group(self) -> "OutputGroup": 165 | return self.graph.parent.outputs 166 | 167 | 168 | class IndexGroup(NodeGroup): 169 | def validate_key(self, key): 170 | return is_index_key(key) 171 | 172 | @property 173 | def parent_group(self) -> "IndexGroup": 174 | return self.graph.parent.index 175 | 176 | @property 177 | def index(self): 178 | return self["index"] 179 | 180 | @property 181 | def har(self): 182 | return self["har"] 183 | 184 | @property 185 | def localisation(self): 186 | return self["localisation"] 187 | 188 | @property 189 | def predefined(self): 190 | return self["predefined"] 191 | 192 | @property 193 | def loso(self): 194 | return self["loso"] 195 | 196 | @property 197 | def deployable(self): 198 | return self["deployable"] 199 | 200 | 201 | class ExecutionGraph(ComputationGraph): 202 | def __init__(self, name, parent=None, meta=None): 203 | super(ExecutionGraph, self).__init__(name=name) 204 | self.meta = validate_meta(meta=meta, name=name) 205 | self.parent: Optional["ExecutionGraph"] = parent 206 | self.index = IndexGroup(graph=self) 207 | self.outputs = OutputGroup(graph=self) 208 | 209 | # NODE CREATION/ACQUISITION 210 | 211 | def instantiate_orphan_node( 212 | self, 213 | func: Callable, 214 | args: Optional[Union[Any, List[Any], Tuple[Any]]] = None, 215 | kwargs: Optional[Dict[str, Any]] = None, 216 | ) -> NodeWrapper: 217 | return self.make_node(name=None, func=func, backend=None, args=args, kwargs=kwargs, cache=False) 218 | 219 | def instantiate_node( 220 | self, 221 | key: Union[Key, str], 222 | func: Callable, 223 | args: Optional[Union[Any, List[Any], Tuple[Any]]] = None, 224 | kwargs: Optional[Dict[str, Any]] = None, 225 | backend: Optional[str] = None, 226 | force_add: bool = False, 227 | ) -> 
NodeWrapper: 228 | key = Key(key) 229 | if not force_add: 230 | assert key not in self.nodes 231 | name = absolute_node_name(identifier=self.identifier, key=key) 232 | backend = validate_backend(backend, key) 233 | return self.make_node(name=name, key=key, func=func, backend=backend, args=args, kwargs=kwargs) 234 | 235 | def get_or_create( 236 | self, 237 | key: Union[Key, str], 238 | func: Callable, 239 | args: Optional[Union[Any, Tuple[Any]]] = None, 240 | kwargs: Optional[Dict[str, Any]] = None, 241 | backend: Optional[str] = None, 242 | ) -> NodeWrapper: 243 | key = Key(key) 244 | if key in self.nodes: 245 | return self.nodes[key] 246 | return self.instantiate_node(key=key, func=func, backend=backend, args=args, kwargs=kwargs) 247 | 248 | def acquire_node(self, node: NodeWrapper, key: Optional[Union[Key, str]] = None) -> None: 249 | if key is None: 250 | raise NotImplementedError 251 | key = Key(key) 252 | if key in self.nodes: 253 | raise KeyError(f"Cannot acquire {key} since a node of this name's already in {self.nodes.keys()} of {self}") 254 | self.nodes[key] = node 255 | 256 | def __getitem__(self, key: Union[Key, str]) -> NodeWrapper: 257 | if is_index_key(key): 258 | return self.index[key] 259 | return self.outputs[key] 260 | 261 | # Some convenience functions 262 | 263 | def get_split_series(self, data_partition: str, train_test_split: str) -> NodeWrapper: 264 | return self.instantiate_node( 265 | key=f"{data_partition=}-{train_test_split=}", 266 | func=node_itemgetter(train_test_split), 267 | backend="pandas", 268 | args=self.index[data_partition], 269 | ) 270 | 271 | # BRANCHING 272 | 273 | def make_child(self, name: Union[Key, str], meta: Tuple[Path, str] = None) -> "ExecutionGraph": 274 | return ExecutionGraph(name=name, parent=self, meta=meta) 275 | 276 | def make_sibling(self) -> "ExecutionGraph": 277 | logger.warning("Making siblings not tested - may be buggy!") 278 | return ExecutionGraph(name=self.name, parent=self.parent, meta=self.meta) 279 | 280 | def __truediv__(self, name: Union[Key, str]) -> "ExecutionGraph": 281 | return self.make_child(name=name) 282 | 283 | # EVALUATION 284 | 285 | @property 286 | def identifier(self) -> Path: 287 | if self.parent is None: 288 | return Path(self.name) 289 | return self.parent.identifier / self.name 290 | 291 | def dump_graph(self) -> None: 292 | dump_graph(graph=self, filepath=absolute_node_name(identifier=self.identifier, key="graph.pdf")) 293 | 294 | def evaluate(self, force: bool = False) -> Dict[str, Any]: 295 | output = dict() 296 | for key in randomised_order(self.nodes.keys()): 297 | node = self.nodes[key] 298 | try: 299 | output[key] = node.evaluate() 300 | except FileLockExistsException: 301 | logger.warning(f"Skipping evaluation of {node.name} as it's already being computed.") 302 | return output 303 | 304 | # @staticmethod 305 | # def build_root(): 306 | # return ExecutionGraph(name="datasets") 307 | 308 | # @staticmethod 309 | # def zip_root(): 310 | # return ExecutionGraph("zips") 311 | 312 | 313 | def dump_graph(graph: Union[NodeWrapper, ExecutionGraph, ComputationGraph], filepath: Path): 314 | nodes = dict() 315 | edges = [] 316 | 317 | if isinstance(graph, NodeWrapper): 318 | consume_nodes(nodes, edges, graph) 319 | elif isinstance(graph, ExecutionGraph): 320 | for _, node in graph.outputs.items(): 321 | consume_nodes(nodes, edges, node) 322 | else: 323 | raise TypeError 324 | 325 | nodes = {str(kk): vv for kk, vv in nodes.items()} 326 | edges = list(map(lambda rr: list(map(str, rr)), edges)) 327 | 328 | G = 
pgv.AGraph(directed=True, strict=True, rankdir="LR") 329 | for node_id, node_name in nodes.items(): 330 | G.add_node(node_id, label=node_name) 331 | G.add_edges_from(edges) 332 | try: 333 | G.layout("dot") 334 | filepath.parent.mkdir(exist_ok=True, parents=True) 335 | G.draw(filepath) 336 | G.close() 337 | except ValueError as ex: 338 | logger.exception(f"Unable to save dot file {filepath}: {ex}") 339 | 340 | return nodes, edges 341 | 342 | 343 | def get_all_sources(node: NodeWrapper): 344 | sources = [] 345 | 346 | def resolve(nn): 347 | if isinstance(nn, NodeWrapper): 348 | sources.append(nn) 349 | elif isinstance(nn, (list, tuple)): 350 | for ni in nn: 351 | resolve(ni) 352 | elif isinstance(nn, dict): 353 | for ni in nn.values(): 354 | resolve(ni) 355 | 356 | resolve(node.args) 357 | resolve(node.kwargs) 358 | 359 | return sources 360 | 361 | 362 | def consume_nodes(nodes: Dict[str, str], edges: List[Tuple[str, str]], ptr: NodeWrapper): 363 | def add_node(node): 364 | node_name = node.name 365 | func = node.func 366 | if isinstance(func, partial): 367 | func = node.func.func.__self__.func 368 | elif isinstance(node.func, DecoratorBase): 369 | func = func.func 370 | func_name = func.__name__ 371 | if node_name not in nodes: 372 | if isinstance(node.name, Path): 373 | name = f"{func_name} =>\n{node.name.stem}" 374 | else: 375 | name = func_name 376 | nodes[node_name] = f"{name}" 377 | return node_name 378 | 379 | add_node(ptr) 380 | 381 | for source_node in get_all_sources(ptr): 382 | source_name = add_node(source_node) 383 | edges.append((source_name, ptr.name)) 384 | consume_nodes(nodes, edges, source_node) 385 | 386 | 387 | def get_ancestral_metadata(graph: Union[NodeWrapper, ExecutionGraph], key: str): 388 | if isinstance(graph, NodeWrapper): 389 | graph = graph.graph 390 | if graph.meta is None: 391 | logger.exception(f'The key "{key}" cannot be found in "{graph}"') 392 | raise KeyError 393 | if key in graph.meta: 394 | return graph.meta[key] 395 | if graph.parent is None: 396 | logger.exception(f'The key "{key}" cannot be found in the ancestry of "{graph}"') 397 | raise ValueError 398 | return get_ancestral_metadata(graph.parent, key) 399 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Introduction 2 | 3 | This repository aims to provide a unified interface to wearable-based Human Activity Recognition (HAR) datasets. The philosophy is to acquire many datasets from a wide variety of recording conditions and to translate these into a consistent data format in order to more easily address open questions on feature extraction/representation learning, meta/transfer learning, and active learning, amongst other tasks. Ultimately, I aim to create a home for more easily understanding the stability, strengths and weaknesses of the state of the art in HAR. 4 | 5 | # Setup 6 | 7 | ## Virtual environment 8 | 9 | It is good practice to use a virtual environment when working with this repository. I have recently been using [miniconda](https://docs.conda.io/en/latest/miniconda.html) as my Python management system; it works much like Anaconda. The following commands use `pipenv` to create a new environment, activate it, and install the requirements (including development dependencies) into that environment. 10 | 11 | ```bash 12 | pipenv install --python 3.8 --skip-lock --dev 13 | pipenv shell 14 | pre-commit install 15 | ``` 16 | 17 | ## dotenv 18 | 19 | Several global variables are required for this library to work.
I set these up with the [dotenv](https://pypi.org/project/python-dotenv/) library. This searches for a file called `.env` that should be found in the project root. It then loads environment variables called `PROJECT_ROOT`, `ZIP_ROOT` and `BUILD_ROOT`. In my system, these are set up roughly as follows. 20 | 21 | ```bash 22 | export PROJECT_ROOT="/users/username/workspace/har_datasets" 23 | export ZIP_ROOT="/users/username/workspace/har_datasets/data/zip" 24 | export BUILD_ROOT="/users/username/workspace/har_datasets/data/build" 25 | ``` 26 | 27 | # Data Format 28 | 29 | The data from all datasets listed in this project are converted into one consistent format that consists of four key elements: 30 | 31 | 1. the train/validation/test fold definition file; 32 | 2. the label file; 33 | 3. the data file; and 34 | 4. an index file. 35 | 36 | Note that the serialisation format used in this repository stores data on a per-sample basis. This means that each of the files listed above will have the same number of rows. 37 | 38 | ## Index File 39 | 40 | The following columns are required for the index file: 41 | 42 | ``` 43 | subject, trial, time 44 | ``` 45 | 46 | `subject` defines a subject identifier, `trial` allows for different trials to be specified (eg it can distinguish data from subjects who perform a task several times), and `time` defines the time (absolute or relative). Subject and trial should be integers, but need not be contiguous. Although time can be considered unnecessary in many applications (especially if the recording was done in a controlled environment or following a script), it is added here to allow for the detection of missing data (missing time stamps) and time-of-day features (if `time` represents epoch time, for example). 47 | 48 | This file must have three columns only. 49 | 50 | ## Task Files 51 | 52 | The following structure is required for the task files: 53 | 54 | ``` 55 | label_vals 56 | ``` 57 | 58 | This file must have at least one column. In general, it is expected that the column will be a list of strings (where each string corresponds to the target). This is not a requirement, however, and the label values may be vector-valued. It is important that the correct model and evaluation criteria are associated with the task. 59 | 60 | ## Data File 61 | 62 | The data format is quite simple: 63 | 64 | ``` 65 | x, y, z 66 | ``` 67 | 68 | where `x`, `y` and `z` correspond to the axes of the wearable. By default, different files are created for each modality (ie accelerometer, gyroscope and magnetometer) and for each location (eg wrist, waist). For example, if one accelerometer is on the wrist, a file called `accel-wrist` will be created for it. There is no restriction on the number of columns in this file, but we expect that more often than not there will be three columns, one per axis of the device. 69 | 70 | This file must have at least one column. 71 | 72 | ## Fold Definitions 73 | 74 | Train and test folds are defined by the columns of this file: 75 | 76 | ``` 77 | fold_1 78 | -1 79 | -1 80 | -1 81 | 0 82 | 0 83 | 0 84 | 1 85 | 1 86 | 1 87 | ``` 88 | 89 | The behaviour of these folds is based on scikit-learn's [PredefinedSplit](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.PredefinedSplit.html). Additional folds can (if necessary) be defined by adding supplementary columns to this file. For example, if doing 10 times 10-fold cross-validation, 10 fold identifiers would be contained in each of the 10 columns. A sketch of how a single fold column maps onto `PredefinedSplit` is given below.
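This is a minimal sketch, intended only to illustrate the mapping: the column values are the toy ones from the example above, and in a real pipeline the column would be read from the fold definition file rather than hard-coded.

```python
import numpy as np
from sklearn.model_selection import PredefinedSplit

# Toy fold column from the example above: rows marked -1 are always used for
# training, while 0 and 1 each define one held-out test fold.
fold_1 = np.array([-1, -1, -1, 0, 0, 0, 1, 1, 1])

splitter = PredefinedSplit(test_fold=fold_1)
for train_idx, test_idx in splitter.split():
    print("train:", train_idx, "test:", test_idx)
# train: [0 1 2 6 7 8] test: [3 4 5]
# train: [0 1 2 3 4 5] test: [6 7 8]
```

Samples marked `-1` never appear in a test split, which is how rows reserved purely for training are expressed.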
90 | 91 | This file must have at least one column. 92 | 93 | Several special fold definitions are also supported. `LOSO` performs leave-one-subject-out cross-validation, and `deployable` learns models on all of the data with the expectation that the model will be deployed outside of the scope of the pipeline that created it. 94 | 95 | # Contributing 96 | 97 | I hope to receive pull requests adding new datasets, processing methods, features, and models to this repository. Requests are likely to be accepted once the exact data format, feature extraction, modelling and evaluation interfaces are relatively stable. 98 | 99 | ## Contributing Datasets 100 | 101 | 1. Create a new [yaml](https://en.m.wikipedia.org/wiki/YAML) file in the `metadata/datasets` directory and fill out the information as accurately as possible. Follow the styles and detail given in the entries named `anguita2013`, `pamap2` and `uschad`. The accuracy of the entered metadata will be strictly moderated before a submission is accepted. Note: 102 | - The name of the file and the `name` field in the yaml file must be lower case. 103 | - List all sensor modalities in the dataset in the `modalities` field. The modality names should be consistent with the values found in `metadata/modality.yaml`. 104 | - List all sensor placements in the dataset in the `placements` field. The placement names should be consistent with the values found in `metadata/placement.yaml`. 105 | - List all outputs in the dataset in the `sources` field. For example, if a data source arrives from an accelerometer placed on the wrist, add a dict entry like `{"placement": "wrist", "modality": "accel"}`. This can be tedious, but there is great value in doing this. 106 | - If the dataset introduces a new task, add a new file `metadata/tasks/<task>.yaml` (where `<task>` is the name of the task). List all new target names in this file (see `metadata/tasks/har.yaml` for an example). 107 | - If the dataset introduces a new target to an existing task, add it to the end of the existing `metadata/tasks/<task>.yaml`. 108 | - If the sensor has been placed on a new location, add it to the end of `metadata/placement.yaml`. 109 | - If the sensor is of a new modality, add it to the end of `metadata/modality.yaml`. 110 | 2. Run `make table`. This will update the dataset table in the `tables` directory. Ensure this command executes successfully and verify that the entered information is accurate. 111 | 3. Run `make data`. This will download the archive automatically based on the URLs provided in the `download_urls` field from step 1 above. 112 | 4. Copy the file `src/datasets/__new__.py` to `src/datasets/<name>.py` (where `<name>` is the dataset name defined in step 1 above). The purpose of this file is to translate the data into the expected format described in the sections above. In particular, separate files with the wearable data, annotated labels, pre-defined folds, and index files are required. Use the implementations of the aforementioned datasets (`anguita2013`, `pamap2` and `uschad`) in `src/datasets` as examples of how this has been achieved. 113 | 114 | ## Contributing Pipelines 115 | 116 | (Under construction. See `examples/basic_har.py` for basic examples.) 117 | 118 | ## Contributing Models 119 | 120 | (Under construction. See `src/models/sklearn/basic.py` for basic examples.) 121 | 122 | 123 | # Datasets 124 | 125 | The following table enumerates the datasets that are under consideration for inclusion in this repository.
126 | 127 | | First Author | Dataset Name | Paper (URL) | Data Description (URL) | Data Download (URL) | Year | fs | Accel | Gyro | Mag | #Subjects | #Activities | Notes | 128 | | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | 129 | | Banos | banos2012 | [A benchmark dataset to evaluate sensor displacement in activity recognition](http://www.orestibanos.com/paper_files/banos_ubicomp_2012.pdf) | [Description](http://archive.ics.uci.edu/ml/datasets/REALDISP+Activity+Recognition+Dataset) | [Download](http://archive.ics.uci.edu/ml/machine-learning-databases/00305/realistic_sensor_displacement.zip) | 2012 | 50 | yes | yes | yes | 17 | 33 | | 130 | | Banos | banos2015 | [mHealthDroid: a novel framework for agile development of mobile health applications](https://link.springer.com/chapter/10.1007/978-3-319-13105-4_14) | [Description](http://archive.ics.uci.edu/ml/datasets/mhealth+dataset) | [Download](http://archive.ics.uci.edu/ml/machine-learning-databases/00319/MHEALTHDATASET.zip) | 2015 | 50 | yes | yes | yes | 10 | 12 | | 131 | | Barshan | barshan2014 | [Recognizing daily and sports activities in two open source machine learning environments using body-worn sensor units](https://ieeexplore.ieee.org/abstract/document/8130901/) | [Description](https://archive.ics.uci.edu/ml/datasets/daily+and+sports+activities) | [Download](https://archive.ics.uci.edu/ml/machine-learning-databases/00256/data.zip) | 2014 | 25 | yes | yes | yes | 8 | 19 | | 132 | | Bruno | bruno2013 | [Analysis of Human Behavior Recognition Algorithms based on Acceleration Data](https://www.researchgate.net/profile/Barbara_Bruno2/publication/261415865_Analysis_of_human_behavior_recognition_algorithms_based_on_acceleration_data/links/53d001320cf25dc05cfca025.pdf) | [Description](DescriptionURL) | [Download](DownloadURL) | 2013 | 32 | yes | | | 16 | 14 | Notes | 133 | | Casale | casale2015 | [Personalization and user verification in wearable systems using biometric walking patterns](https://dl.acm.org/citation.cfm?id=2339117) | | | 2012 | 52 | yes | | | 7 | 15 | | 134 | | Chen | utdmhad | [UTD-MHAD: A Multimodal Dataset for Human Action Recognition Utilizing a Depth Camera and a Wearable Inertial Sensor](https://ieeexplore.ieee.org/abstract/document/7350781) | [Description](https://www.utdallas.edu/~kehtar/UTD-MHAD.html) | [Download](http://www.utdallas.edu/~kehtar/UTD-MAD/Inertial.zip) | 2015 | 50 | yes | yes | | 9 | 21 | | 135 | | Chavarriaga | opportunity | [The Opportunity challenge: A benchmark database for on-body sensor-based activity recognition](https://www.sciencedirect.com/science/article/pii/S0167865512004205) | [Description](https://archive.ics.uci.edu/ml/datasets/opportunity+activity+recognition) | [Download](https://archive.ics.uci.edu/ml/machine-learning-databases/00226/OpportunityUCIDataset.zip) | 2012 | 30 | yes | yes | yes | 12 | 7 | Several annotation tracks. 
| 136 | | Chereshnev | hugadb | [HuGaDB: Human Gait Database for Activity Recognition from Wearable Inertial Sensor Networks](https://link.springer.com/chapter/10.1007/978-3-319-73013-4_12) | [Description](https://github.com/romanchereshnev/HuGaDB) | [Download](https://www.dropbox.com/s/7nb9g650i5m9k6c/HuGaDB.zip?dl=0) | 2017 | ~56 | yes | yes | | 18 | 12 | | 137 | | Kwapisz | wisdm | [Activity Recognition using Cell Phone Accelerometers](http://www.cis.fordham.edu/wisdm/includes/files/sensorKDD-2010.pdf) | [Description](http://www.cis.fordham.edu/wisdm/dataset.php) | [Download](http://www.cis.fordham.edu/wisdm/includes/datasets/latest/WISDM_ar_latest.tar.gz) | 2012 | 20 | yes | | | 29 | 6 | | 138 | | Micucci | micucci2017 | [UniMiB SHAR: A Dataset for Human Activity Recognition Using Acceleration Data from Smartphones](https://www.mdpi.com/2076-3417/7/10/1101/html) | [Description](http://www.sal.disco.unimib.it/technologies/unimib-shar/) | [Download](https://www.dropbox.com/s/x2fpfqj0bpf8ep6/UniMiB-SHAR.zip?dl=0) | 2017 | 50 | yes | | | 30 | 8 | Notes | 139 | | Ortiz | ortiz2015 | [Human Activity Recognition on Smartphones with Awareness of Basic Activities and Postural Transitions](https://link.springer.com/chapter/10.1007/978-3-319-11179-7_23) | [Description](http://archive.ics.uci.edu/ml/datasets/Smartphone-Based%20Recognition%20of%20Human%20Activities%20and%20Postural%20Transitions) | [Download](http://archive.ics.uci.edu/ml/machine-learning-databases/00341/HAPT%20Data%20Set.zip) | 2015 | 50 | yes | yes | | ? | 7 | With postural transitions | 140 | | Shoaib | shoaib2014 | [Fusion of Smartphone Motion Sensors for Physical Activity Recognition](https://www.mdpi.com/1424-8220/14/6/10146) | [Description](https://www.researchgate.net/publication/266384007_Sensors_Activity_Recognition_DataSet) | [Download](https://www.researchgate.net/profile/Muhammad_Shoaib20/publication/266384007_Sensors_Activity_Recognition_DataSet/data/542e9d260cf277d58e8ec40c/Sensors-Activity-Recognition-DataSet-Shoaib.rar) | 2014 | 50 | yes | yes | yes | 7 | 7 | | 141 | | Siirtola | siirtola2012 | [Recognizing human activities user-independently on smartphones based on accelerometer data](https://dialnet.unirioja.es/servlet/articulo?codigo=3954593) | [Description](http://www.oulu.fi/bisg/node/40364) | [Download](http://www.ee.oulu.fi/research/neurogroup/opendata/OpenHAR.zip) | 2012 | 40 | yes | | | 7 | 5 | | 142 | | Stisen | stisen2015 | [Smart Devices are Different: Assessing and MitigatingMobile Sensing Heterogeneities for Activity Recognition](https://dl.acm.org/citation.cfm?id=2809718) | [Description](https://archive.ics.uci.edu/ml/datasets/Heterogeneity+Activity+Recognition) | [Download](https://archive.ics.uci.edu/ml/machine-learning-databases/00344/Activity%20recognition%20exp.zip) | 2015 | 50-200 | yes | | | 9 | 6 | | 143 | | Sztyler | sztyler2016 | [On-body localization of wearable devices: An investigation of position-aware activity recognition](https://ieeexplore.ieee.org/document/7456521) | [Description](http://sensor.informatik.uni-mannheim.de/index.html#dataset_realworld) | [Download](http://wifo5-14.informatik.uni-mannheim.de/sensor/dataset/realworld2016/realworld2016_dataset.zip) | 2016 | 50 | yes | yes | yes | 15 | 8 | Many other sensors also (video, light, sound, etc) | 144 | | Twomey | spherechallenge | [The SPHERE Challenge: Activity Recognition with Multimodal Sensor Data](https://arxiv.org/abs/1603.00797) | [Description](https://data.bris.ac.uk/data/dataset/8gccwpx47rav19vk8x4xapcog) | 
[Download](https://data.bris.ac.uk/datasets/8gccwpx47rav19vk8x4xapcog/8gccwpx47rav19vk8x4xapcog.zip) | 2016 | 20 | yes | | | 20 | 20 | | 145 | | Ugulino | ugulino2012 | [Wearable Computing: Accelerometers’ Data Classification of Body Postures and Movements](https://link.springer.com/chapter/10.1007/978-3-642-34459-6_6) | [Description](http://groupware.les.inf.puc-rio.br/har) | [Download](http://groupware.les.inf.puc-rio.br/static/har/SystematicReview-RIS-Format.zip) | 2012 | 50 | yes | | | 4 | 5 | | 146 | | Vavoulas | mobiact | [The MobiAct Dataset: Recognition of Activities of Daily Living using Smartphones](http://www.scitepress.org/Papers/2016/57924/57924.pdf) | [Description](https://bmi.teicrete.gr/en/the-mobifall-and-mobiact-datasets-2/) | [Fill out this form to download](https://bmi.hmu.gr/the-mobifall-and-mobiact-datasets-2/) | 2016 | 100 | yes | | | 57 | 9 | | 147 | 148 | 149 | 150 | 151 | 152 | 153 | # Project Structure 154 | 155 | This project follows the [DataScience CookieCutter](https://drivendata.github.io/cookiecutter-data-science/) template with the aim of facilitating reproducible models and results. the majority of commands are executed with the `make` command, and we also provide a high-level data loading interface. 156 | 157 | --------------------------------------------------------------------------------
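As a quick illustration of the `make`-driven workflow described in the project structure section, a typical session might look like the following. This is a minimal sketch only: it assumes the environment has been set up as in the Setup section, that a `.env` file is in place, and that the `data` and `table` targets referenced in the contributing guide exist on your checkout.

```bash
# Set up and enter the environment (see the Setup section)
pipenv install --python 3.8 --skip-lock --dev
pipenv shell

# With no target, make prints the self-documenting list of available rules
make

# Targets referenced in the contributing guide: download the raw archives
# and regenerate the dataset tables
make data
make table
```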