├── .gitignore ├── Filternet Paper - architectures.ipynb ├── Filternet Paper - ensembles.ipynb ├── Filternet Paper - event metrics visualization.ipynb ├── Filternet Paper - heat map examples to compare model results.ipynb ├── Filternet Paper - inference window length.ipynb ├── Filternet Paper - multimodal sensor fusion.ipynb ├── Filternet Paper - overall results summary.ipynb ├── Filternet Paper - plot training history example.ipynb ├── README.md ├── ensemble_effects.png ├── environment.yaml ├── filternet ├── __init__.py ├── datasets │ ├── .gitignore │ ├── __init__.py │ ├── har.py │ ├── intention_recognition.py │ ├── opportunity.py │ └── smartphone_hapt.py ├── models │ ├── __init__.py │ ├── base_layers.py │ ├── base_net.py │ ├── deep_conv_lstm.py │ ├── filter_net.py │ ├── filter_net_ensemble.py │ └── reference_architectures.py ├── mputil.py └── training │ ├── __init__.py │ ├── ensemble_train.py │ ├── evalmodel.py │ ├── train.py │ └── trainable.py ├── multimodal_sensor_fusion.png ├── scripts ├── __init__.py ├── run_base_configs_exp.py ├── run_ensemble_exp.py └── run_mm_base_configs_exp.py ├── setup.py ├── stripchart heatmaps.png ├── tests ├── datasets │ ├── test_har.py │ ├── test_intention_recognition.py │ ├── test_opportunity.py │ └── test_smartphone_hapt.py ├── test_datasets.py ├── test_init.py ├── test_models.py └── test_train.py ├── training_history.png └── win_len_effects.png /.gitignore: -------------------------------------------------------------------------------- 1 | # Byte-compiled / optimized / DLL files 2 | __pycache__/ 3 | *.py[cod] 4 | *$py.class 5 | 6 | # C extensions 7 | *.so 8 | 9 | # Distribution / packaging 10 | .Python 11 | build/ 12 | develop-eggs/ 13 | dist/ 14 | downloads/ 15 | eggs/ 16 | .eggs/ 17 | lib/ 18 | lib64/ 19 | parts/ 20 | sdist/ 21 | var/ 22 | wheels/ 23 | pip-wheel-metadata/ 24 | share/python-wheels/ 25 | *.egg-info/ 26 | .installed.cfg 27 | *.egg 28 | MANIFEST 29 | 30 | # Installer logs 31 | pip-log.txt 32 | pip-delete-this-directory.txt 33 | 34 | # Unit test / coverage reports 35 | htmlcov/ 36 | .tox/ 37 | .nox/ 38 | .coverage 39 | .coverage.* 40 | .cache 41 | nosetests.xml 42 | coverage.xml 43 | *.cover 44 | *.py,cover 45 | .hypothesis/ 46 | .pytest_cache/ 47 | cover/ 48 | 49 | # Jupyter Notebook 50 | .ipynb_checkpoints 51 | 52 | # User-specific stuff 53 | .idea 54 | .idea/ 55 | .idea/* 56 | 57 | Icon 58 | 59 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | 2 | # FilterNet: A many-to-many deep learning architecture for time series classification 3 | 4 | [![DOI](https://zenodo.org/badge/242397153.svg)](https://zenodo.org/badge/latestdoi/242397153) 5 | 6 | This repository contains code to reproduce the results and figures in the paper: 7 | *[FilterNet: A many-to-many deep learning architecture for time series classification](https://www.mdpi.com/703084)*. 8 | 9 | ## Setup 10 | The easiest way to run this software is via the Anaconda Python distribution. 11 | 12 | 1. Install Anaconda 13 | 2. Run `conda env create -f environment.yaml` 14 | 3. Enable the `filternet` environment, like, `source activate filternet` 15 | 4. 
Install filternet so it is importable, by running `pip install -e .` in the same 16 | directory as setup.py 17 | 18 | ## Running tests 19 | In the root dir of this repo: 20 | 21 | ``` 22 | pytest tests 23 | ``` 24 | 25 | This will be *really* slow the first time because it has to download and pre-process 26 | several large AR datasets. 27 | 28 | Subsequent test runs will probably still be slow, but... less slow. 29 | 30 | ## Reproducing Results 31 | 32 | 1. Run the scripts in the `scripts/` directory. These are very long-running scripts that 33 | reproduce each experimental condition many times. You might want to set, e.g., `NUM_REPEATS=1` 34 | if you don't need this level of reproducibility. 35 | 36 | 2. Run the notebooks to re-produce the figures. You might need to edit a few paths to specific 37 | models to match the filenames on your system, especially if you changed the 38 | `NAME` or `NUM_REPEATS` parameters. 39 | 40 | ------ 41 | Copyright (C) 2020 Pet Insight Project - All Rights Reserved 42 | -------------------------------------------------------------------------------- /ensemble_effects.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/WhistleLabs/FilterNet/f5bcf82e8a542197b7b84d3786e808adfdf7c56f/ensemble_effects.png -------------------------------------------------------------------------------- /environment.yaml: -------------------------------------------------------------------------------- 1 | name: filternet 2 | channels: 3 | - pytorch 4 | - defaults 5 | 6 | dependencies: 7 | - anaconda == 2019.10 # unpin this for latest packages; this pins lots of versions. 8 | - ipython 9 | - jupyter 10 | - matplotlib 11 | - numpy 12 | - pandas 13 | - pip 14 | - pytest 15 | - pytorch == 1.0.1 # unpin for latest pytorch 16 | - scikit-learn 17 | - scipy 18 | - seaborn 19 | - traits 20 | - pip: 21 | - hyperopt 22 | - pyEDFlib 23 | - ray == 0.8.4 # you could unpin, but api seems unstable 24 | - tabulate 25 | - torchsummary 26 | - torchsummaryX 27 | - typing 28 | - ward-metrics == 0.9.5 # probably doesn't need to be pinned 29 | 30 | 31 | 32 | -------------------------------------------------------------------------------- /filternet/__init__.py: -------------------------------------------------------------------------------- 1 | # Copyright (C) 2020 Pet Insight Project - All Rights Reserved 2 | 3 | import os 4 | 5 | base_dir = os.path.split(__file__)[0] -------------------------------------------------------------------------------- /filternet/datasets/.gitignore: -------------------------------------------------------------------------------- 1 | *.zip 2 | OpportunityUCIDataset -------------------------------------------------------------------------------- /filternet/datasets/__init__.py: -------------------------------------------------------------------------------- 1 | # Copyright (C) 2020 Pet Insight Project - All Rights Reserved 2 | 3 | import os 4 | 5 | import numpy as np 6 | import sklearn 7 | 8 | datasets_dir = os.path.split(__file__)[0] 9 | 10 | 11 | def sliding_window_x_y(Xc, ycs, win_len=128, step=None, shuffle=True): 12 | if step is None: 13 | step = int(win_len / 2) 14 | start_idxs = np.arange(0, len(Xc) - win_len, step) 15 | X = ( 16 | np.array([Xc[i : i + win_len] for i in start_idxs]) 17 | .transpose([0, 2, 1]) 18 | .astype(np.float32) 19 | ) # [N, C, L] 20 | ys = [ 21 | np.array([yc[i : i + win_len] for i in start_idxs]).astype(np.long) 22 | for yc in ycs 23 | ] # [len(ycs), N, L] 24 | if shuffle: 25 | 
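        # sklearn.utils.shuffle permutes X and every y window array in lockstep
        # along axis 0, so windows stay aligned with their label windows; the
        # fixed random_state below keeps the shuffle reproducible across runs.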
X, *ys = sklearn.utils.shuffle(X, *ys, random_state=0)
 26 |         return X, ys
 27 |     else:
 28 |         return X, ys
 29 | 
-------------------------------------------------------------------------------- /filternet/datasets/har.py: --------------------------------------------------------------------------------
  1 | # Copyright (C) 2020 Pet Insight Project - All Rights Reserved
  2 | 
  3 | """ Loads the smartphone human activity recognition dataset.
  4 | 
  5 | UCI HAR
  6 | """
  7 | 
  8 | import os
  9 | import urllib
 10 | from zipfile import ZipFile
 11 | 
 12 | # from . import datasets_dir
 13 | import numpy as np
 14 | 
 15 | datasets_dir = os.path.dirname(__file__)
 16 | 
 17 | import pandas as pd
 18 | 
 19 | # Papers
 20 | # http://conference.scipy.org/proceedings/scipy2018/pdfs/christian_mcdaniel.pdf
 21 | # github: https://github.com/xtianmcd/accelstm
 22 | # https://www.mdpi.com/1424-8220/17/11/2556/htm
 23 | # https://arxiv.org/pdf/1801.04503.pdf
 24 | # http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.641.2285&rep=rep1&type=pdf
 25 | 
 26 | 
 27 | DATASET_URL = "https://archive.ics.uci.edu/ml/machine-learning-databases/00240/UCI%20HAR%20Dataset.zip"
 28 | _, DATASET_FILE = os.path.split(DATASET_URL)
 29 | DATASET_SUBDIR, _ = os.path.splitext(DATASET_FILE)
 30 | DATASET_SUBDIR = DATASET_SUBDIR.replace("%20", "_")
 31 | OUTPUT_DIR = os.path.join(datasets_dir, DATASET_SUBDIR)
 32 | OUTPUT_SUBDIR = os.path.join(OUTPUT_DIR, "UCI HAR Dataset")
 33 | 
 34 | dir_map = {"1": "x", "2": "y", "3": "z"}
 35 | 
 36 | 
 37 | def download_if_needed():
 38 |     """Downloads and extracts .zip if needed. """
 39 |     if not os.path.exists(OUTPUT_DIR):
 40 |         if not os.path.exists(os.path.join(datasets_dir, DATASET_FILE)):
 41 |             print("Downloading .zip file...")
 42 |             urllib.request.urlretrieve(
 43 |                 DATASET_URL, os.path.join(datasets_dir, DATASET_FILE)
 44 |             )
 45 |             assert os.path.exists(os.path.join(datasets_dir, DATASET_FILE))
 46 | 
 47 |         print(f"Extracting to {OUTPUT_DIR}...")
 48 |         zip = ZipFile(os.path.join(datasets_dir, DATASET_FILE))
 49 |         zip.extractall(OUTPUT_DIR)
 50 | 
 51 |     assert os.path.exists(OUTPUT_SUBDIR)
 52 | 
 53 | 
 54 | def get_label_series():
 55 |     activity_labels_path = os.path.join(OUTPUT_SUBDIR, "activity_labels.txt")
 56 |     print(f"Loading file {activity_labels_path}")
 57 |     label_series = pd.read_csv(activity_labels_path, sep=r"\s+", header=None)
 58 |     label_series.columns = ["class_number", "class_label"]
 59 |     label_series = label_series.dropna(how="any", axis=0)
 60 |     return pd.Series(
 61 |         label_series["class_label"].values, index=label_series["class_number"].values
 62 |     )
 63 | 
 64 | 
 65 | def load_labels():
 66 |     label_file_path = os.path.join(OUTPUT_SUBDIR, "train/y_train.txt")
 67 |     print(f"Reading labels file {label_file_path}")
 68 |     train_labels = pd.read_csv(label_file_path, sep=r"\s+", header=None)
 69 |     train_labels.columns = ["activity_id"]
 70 | 
 71 |     train_inds = np.arange(0, len(train_labels), 2)
 72 |     train_inds = np.tile(train_inds, (128, 1)).T.flatten()
 73 |     train_labels = train_labels.loc[train_inds, :]
 74 | 
 75 |     label_file_path = os.path.join(OUTPUT_SUBDIR, "test/y_test.txt")
 76 |     print(f"Reading labels file {label_file_path}")
 77 |     test_labels = pd.read_csv(label_file_path, sep=r"\s+", header=None)
 78 |     test_labels.columns = ["activity_id"]
 79 | 
 80 |     test_inds = np.arange(0, len(test_labels), 2)
 81 |     test_inds = np.tile(test_inds, (128, 1)).T.flatten()
 82 |     test_labels = test_labels.loc[test_inds, :]
 83 | 
 84 |     label_series = get_label_series()
 85 | 
 86 |     df_labels = pd.concat([train_labels, test_labels], ignore_index=True)
 87 | 
 88 | 
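    # Map the numeric activity ids (1-6) onto their text labels (e.g. 1 -> WALKING)
    # using the series built by get_label_series() above.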
df_labels["activity_label"] = df_labels["activity_id"].map(label_series) 89 | 90 | return df_labels 91 | 92 | 93 | def load_subjects(): 94 | subject_fp = os.path.join(OUTPUT_SUBDIR, "train/subject_train.txt") 95 | print(f"Reading subjects file {subject_fp}") 96 | train_subjects = pd.read_csv(subject_fp, r"\s+", header=None) 97 | train_subjects.columns = ["subject_id"] 98 | 99 | train_inds = np.arange(0, len(train_subjects), 2) 100 | train_inds = np.tile(train_inds, (128, 1)).T.flatten() 101 | train_subjects = train_subjects.loc[train_inds, :] 102 | 103 | subject_fp = os.path.join(OUTPUT_SUBDIR, "test/subject_test.txt") 104 | print(f"Reading subjects file {subject_fp}") 105 | test_subjects = pd.read_csv(subject_fp, r"\s+", header=None) 106 | test_subjects.columns = ["subject_id"] 107 | 108 | test_inds = np.arange(0, len(test_subjects), 2) 109 | test_inds = np.tile(test_inds, (128, 1)).T.flatten() 110 | test_subjects = test_subjects.loc[test_inds, :] 111 | 112 | df_subjects = pd.concat([train_subjects, test_subjects], ignore_index=True) 113 | 114 | return df_subjects 115 | 116 | 117 | def separate_subjects(): 118 | all_subjects = set(range(1, 31)) 119 | # validation_subjects = {4, 12, 20, 27} # 3 test (4, 12, 20) + 1 train (27) 120 | validation_subjects = {5, 16, 27} 121 | test_subjects = {2, 4, 9, 10, 12, 13, 18, 20, 24} # Given test subjects 122 | train_subjects = all_subjects - test_subjects - validation_subjects 123 | assert train_subjects | test_subjects | validation_subjects == all_subjects 124 | assert set() == validation_subjects & test_subjects 125 | assert set() == train_subjects & test_subjects 126 | assert set() == train_subjects & validation_subjects 127 | return train_subjects, validation_subjects, test_subjects 128 | 129 | 130 | def get_column_info(): 131 | sensor_types = ["total_accel", "body_accel", "gyro"] 132 | file_names = ["total_acc", "body_acc", "body_gyro"] 133 | num_chans = [3, 3, 3] 134 | assert len(num_chans) == len(sensor_types) 135 | types = [] 136 | names = [] 137 | file_prefixes = [] 138 | for i in range(len(sensor_types)): 139 | t = sensor_types[i] 140 | nc = num_chans[i] 141 | types.extend([t] * nc) 142 | file_prefixes.extend([file_names[i]] * nc) 143 | names.extend([f"{t[0]}{j + 1}" for j in range(nc)]) 144 | 145 | # Note that this segfaults if the input data is a tuple and not a list. WTF pandas? 
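    # (I.e., the row data below is deliberately a list-of-lists: with a tuple of
    # rows, the pandas build pinned in environment.yaml hard-crashed rather than
    # raising a useful error.)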
146 | df = pd.DataFrame( 147 | [types, file_prefixes], index=["sensor_type", "file_prefix"], columns=names 148 | ).T 149 | df.index.name = "name" 150 | df["output"] = False 151 | 152 | df = df.reindex( 153 | df.index.append(pd.Index(["subject_id", "activity_id", "activity_label"])) 154 | ) 155 | 156 | df.loc[["activity_id", "activity_label"], "output"] = True 157 | 158 | return df 159 | 160 | 161 | def get_data(): 162 | train_subjects, validation_subjects, test_subjects = separate_subjects() 163 | labels = load_labels() 164 | subjects = load_subjects() 165 | col_info = get_column_info() 166 | 167 | output_cols = col_info.index[col_info.output == True] 168 | not_input_cols = col_info.index[~(col_info.output == False)] 169 | all_sensors = col_info.loc[col_info.output == False, "sensor_type"].unique() 170 | all_data = [] 171 | for d_type in ["train", "test"]: 172 | 173 | exp_data = [] 174 | for t in all_sensors: 175 | sensor_type_mask = t == col_info.sensor_type 176 | prefix = col_info.loc[sensor_type_mask, "file_prefix"].unique()[0] 177 | cur_sensors = col_info.loc[sensor_type_mask] 178 | for i_name in cur_sensors.index: 179 | file_path = os.path.join( 180 | OUTPUT_SUBDIR, 181 | d_type, 182 | "Inertial Signals", 183 | f"{prefix}_{dir_map[i_name[-1]]}_{d_type}.txt", 184 | ) 185 | print(f"Reading file {file_path}") 186 | tmp_df = pd.read_csv(file_path, sep=r"\s+", header=None) 187 | tmp_s = pd.Series(tmp_df.values[::2, :].flatten(), name=i_name) 188 | exp_data.append(tmp_s) 189 | 190 | all_data.append(pd.concat(exp_data, axis=1, ignore_index=False)) 191 | 192 | all_data = pd.concat(all_data, axis=0, ignore_index=True) 193 | for col in col_info.index[col_info.output == True]: 194 | all_data[col] = labels[col] 195 | 196 | all_data["subject_id"] = subjects.values 197 | 198 | train_data = all_data.loc[all_data["subject_id"].isin(train_subjects), :] 199 | validation_data = all_data.loc[all_data["subject_id"].isin(validation_subjects), :] 200 | test_data = all_data.loc[all_data["subject_id"].isin(test_subjects), :] 201 | return train_data, validation_data, test_data 202 | 203 | 204 | def get_dfs_processed(): 205 | df_train, df_val, df_test = get_data() 206 | col_df = get_column_info() 207 | # Calculate feature normalizing stats from training set only 208 | norm_mean = df_train.mean(axis=0) 209 | norm_std = df_train.std(axis=0) 210 | 211 | # Apply to all sets to normalize features 212 | input_features = col_df.index[col_df.output == False] 213 | for df in (df_train, df_val, df_test): 214 | # de-mean 215 | df.loc[:, input_features] -= norm_mean 216 | 217 | # unit std dev 218 | df.loc[:, input_features] /= norm_std 219 | 220 | # interpolate and NA's -> 0 221 | df.loc[:, input_features] = df.loc[:, input_features].interpolate().fillna(0) 222 | 223 | return df_train, df_val, df_test 224 | 225 | 226 | _df_dicts = {} 227 | 228 | 229 | def get_or_make_dfs(): 230 | if _df_dicts: 231 | return _df_dicts 232 | download_if_needed() 233 | 234 | cache_dir = os.path.join(OUTPUT_DIR, "cache") 235 | if not os.path.isdir(cache_dir) or not os.path.isfile( 236 | os.path.join(cache_dir, "df_train.df.pkl") 237 | ): 238 | print("Smartphone data not cached. 
Creating cache now...") 239 | try: 240 | os.makedirs(cache_dir) 241 | except FileExistsError: 242 | pass 243 | 244 | df_train, df_val, df_test = get_dfs_processed() 245 | df_cols = get_column_info() 246 | 247 | s_labels = get_label_series() 248 | 249 | df_train.to_pickle(os.path.join(cache_dir, "df_train.df.pkl")) 250 | _df_dicts["df_train"] = df_train 251 | df_val.to_pickle(os.path.join(cache_dir, "df_val.df.pkl")) 252 | _df_dicts["df_val"] = df_val 253 | df_test.to_pickle(os.path.join(cache_dir, "df_test.df.pkl")) 254 | _df_dicts["df_test"] = df_test 255 | 256 | df_cols.to_pickle(os.path.join(cache_dir, "df_cols.df.pkl")) 257 | _df_dicts["df_cols"] = df_cols 258 | s_labels.to_pickle(os.path.join(cache_dir, "s_labels.s.pkl")) 259 | _df_dicts["s_labels"] = s_labels 260 | print("Caching done.") 261 | else: 262 | print("Loading cached data.") 263 | _df_dicts["df_train"] = pd.read_pickle( 264 | os.path.join(cache_dir, "df_train.df.pkl") 265 | ) 266 | _df_dicts["df_val"] = pd.read_pickle(os.path.join(cache_dir, "df_val.df.pkl")) 267 | _df_dicts["df_test"] = pd.read_pickle(os.path.join(cache_dir, "df_test.df.pkl")) 268 | 269 | _df_dicts["df_cols"] = pd.read_pickle(os.path.join(cache_dir, "df_cols.df.pkl")) 270 | _df_dicts["s_labels"] = pd.read_pickle( 271 | os.path.join(cache_dir, "s_labels.s.pkl") 272 | ) 273 | print("Loaded.") 274 | 275 | return _df_dicts 276 | 277 | 278 | def get_x_y_contig( 279 | which_set="train", 280 | dfs_dict=None, 281 | y_cols=["y_activity"], 282 | sensor_subset=None, 283 | include_transitions=False, 284 | ): 285 | """ Load X and y as contiguous time vectors (with various runs concatenated together). 286 | 287 | Parameters 288 | ---------- 289 | which_set: str 290 | which of the pre-defined splits to load. Can specify multiples, separated by '+'. E.g., 291 | "train", "val", "test", "train+val", etc. 292 | dfs_dict: dict 293 | Can provide pre-laoded df_dict to save time. (deprecated) 294 | y_cols: List[str] 295 | List of which label columns to return (e.g., ['y_gesture'], ['y_locomotion'], or ['y_gesture', 'y_locomotion'] 296 | sensor_subset: ty.Iterable 297 | Which subset of sensors to include. 
Valid values include: 298 | "accels", "accel" 299 | "gyros", "gyro" 300 | "accels+gyros", "accel+gyro" 301 | "all" 302 | include_transitions : bool 303 | """ 304 | if not dfs_dict: 305 | dfs_dict = get_or_make_dfs() 306 | 307 | assert type(y_cols) == list 308 | 309 | df = [] 310 | for _which_set in which_set.split("+"): 311 | df.append(dfs_dict["df_" + _which_set]) 312 | 313 | df = pd.concat(df) 314 | 315 | df_cols = dfs_dict["df_cols"] 316 | 317 | if not sensor_subset or sensor_subset == "all": 318 | cols = df_cols.index[df_cols.output == False] 319 | else: 320 | cols = [] 321 | for s in sensor_subset.split("+"): 322 | if s.endswith("s"): 323 | s = s[:-1] 324 | cols.append(df_cols.index[df_cols.sensor_type == s]) 325 | cols = pd.concat(cols) 326 | 327 | Xc = df[cols].values.copy() 328 | s_labels = dfs_dict["s_labels"] 329 | activity_outputs = df["activity_id"].copy() 330 | # Replace nulls with 0 331 | assert not activity_outputs.isnull().any() 332 | 333 | if (activity_outputs == 0).sum() == 0: 334 | # No None class 335 | activity_outputs -= 1 336 | s_labels.index = s_labels.index - 1 337 | 338 | ycs = [activity_outputs.values.copy()] 339 | 340 | # Include the null class 341 | output_spec_dict = { 342 | "name": "y_activity", 343 | "num_classes": len(s_labels), 344 | "classes": s_labels.sort_index().to_list(), 345 | } 346 | 347 | data_spec = { 348 | "dataset_name": "har", 349 | "input_channels": Xc.shape[1], 350 | "n_outputs": len(ycs), 351 | "input_features": cols.to_list(), 352 | "output_spec": [output_spec_dict], 353 | } 354 | 355 | return Xc, ycs, data_spec 356 | 357 | 358 | if __name__ == "__main__": 359 | pass 360 | # get_x_y_contig() 361 | -------------------------------------------------------------------------------- /filternet/datasets/intention_recognition.py: -------------------------------------------------------------------------------- 1 | # Copyright (C) 2020 Pet Insight Project - All Rights Reserved 2 | 3 | """ EEG Based intention recognition 4 | 5 | Good paper to base it off of: 6 | https://arxiv.org/pdf/1708.06578.pdf 7 | https://github.com/pbashivan/EEGLearn 8 | """ 9 | 10 | import os 11 | import urllib 12 | from zipfile import ZipFile 13 | 14 | # from . import datasets_dir 15 | import numpy as np 16 | import pyedflib as pyedflib 17 | 18 | datasets_dir = os.path.dirname(__file__) 19 | 20 | import pandas as pd 21 | 22 | DATASET_URL = "https://www.physionet.org/static/published-projects/eegmmidb/eeg-motor-movementimagery-dataset-1.0.0.zip" 23 | _, DATASET_FILE = os.path.split(DATASET_URL) 24 | DATASET_SUBDIR, _ = os.path.splitext(DATASET_FILE) 25 | DATASET_SUBDIR = DATASET_SUBDIR.replace("%20", "_") 26 | OUTPUT_DIR = os.path.join(datasets_dir, DATASET_SUBDIR) 27 | 28 | 29 | def download_if_needed(): 30 | """Downloads and extracts .zip if needed. 
""" 31 | if not os.path.exists(OUTPUT_DIR): 32 | if not os.path.exists(os.path.join(datasets_dir, DATASET_FILE)): 33 | print("Downloading .zip file...") 34 | urllib.request.urlretrieve( 35 | DATASET_URL, os.path.join(datasets_dir, DATASET_FILE) 36 | ) 37 | assert os.path.exists(os.path.join(datasets_dir, DATASET_FILE)) 38 | 39 | print(f"Extracting to {OUTPUT_DIR}...") 40 | zip = ZipFile(os.path.join(datasets_dir, DATASET_FILE)) 41 | zip.extractall(OUTPUT_DIR) 42 | 43 | assert os.path.exists(OUTPUT_DIR) 44 | 45 | 46 | def separate_subjects(): 47 | all_subjects = set(range(1, 110)) 48 | all_subjects -= {89} # Screwed up according to paper 49 | # validation_subjects = {4, 12, 20, 27} # 3 test (4, 12, 20) + 1 train (27) 50 | 51 | np.random.seed(seed=123) 52 | test_subjects = set( 53 | np.random.choice( 54 | list(all_subjects), size=int(len(all_subjects) // 20), replace=False 55 | ) 56 | ) 57 | validation_subjects = set( 58 | np.random.choice( 59 | list(all_subjects - test_subjects), 60 | size=int((len(all_subjects) - len(test_subjects)) // 20), 61 | replace=False, 62 | ) 63 | ) 64 | train_subjects = all_subjects - test_subjects - validation_subjects 65 | 66 | print(f"test_subjects ({len(test_subjects)}): {test_subjects}") 67 | print(f"validation_subjects ({len(validation_subjects)}): {validation_subjects}") 68 | print(f"train_subjects ({len(train_subjects)}): {train_subjects}") 69 | 70 | assert train_subjects | test_subjects | validation_subjects == all_subjects 71 | assert set() == validation_subjects & test_subjects 72 | assert set() == train_subjects & test_subjects 73 | assert set() == train_subjects & validation_subjects 74 | return train_subjects, validation_subjects, test_subjects 75 | 76 | 77 | def get_label_series(): 78 | return pd.Series( 79 | { 80 | 0: "Eyes Closed", 81 | 1: "Left Fist", 82 | 2: "Right Fist", 83 | 3: "Both Fists", 84 | 4: "Both Feet", 85 | } 86 | ) 87 | 88 | 89 | def get_column_info(): 90 | sensor_types = ["eeg"] 91 | num_chans = [64] 92 | assert len(num_chans) == len(sensor_types) 93 | types = [] 94 | names = [] 95 | file_path = os.path.join(OUTPUT_DIR, "files", "S001", "S001R01.edf") 96 | f = pyedflib.EdfReader(file_path) 97 | names = f.getSignalLabels() 98 | for i in range(len(sensor_types)): 99 | t = sensor_types[i] 100 | nc = num_chans[i] 101 | types.extend([t] * nc) 102 | 103 | # Note that this segfaults if the input data is a tuple and not a list. WTF pandas? 
104 | df = pd.DataFrame([types], index=["sensor_type"], columns=names).T 105 | df.index.name = "name" 106 | df["output"] = False 107 | 108 | df = df.reindex( 109 | df.index.append( 110 | pd.Index(["experiment_id", "subject_id", "activity_id", "activity_label"]) 111 | ) 112 | ) 113 | 114 | df.loc[["activity_id", "activity_label"], "output"] = True 115 | 116 | return df 117 | 118 | 119 | def get_data(): 120 | train_subjects, validation_subjects, test_subjects = separate_subjects() 121 | col_info = get_column_info() 122 | 123 | train_data = [] 124 | validation_data = [] 125 | test_data = [] 126 | 127 | left_or_right_trials = {4, 8, 12} 128 | both_trials = {6, 10, 14} 129 | label_series = get_label_series() 130 | all_trials = left_or_right_trials | both_trials 131 | all_subjects = train_subjects | validation_subjects | test_subjects 132 | for sub in all_subjects: 133 | sub_str = f"S{sub:03d}" 134 | print(f"Loading data for subject {sub}") 135 | sub_folder = os.path.join(OUTPUT_DIR, "files", sub_str) 136 | sub_dfs = [] 137 | 138 | for t in all_trials: 139 | # https://pyedflib.readthedocs.io/en/latest/ 140 | file_path = os.path.join(sub_folder, f"{sub_str}R{t:02d}.edf") 141 | # print(f"Loading file {file_path}") 142 | f = pyedflib.EdfReader(file_path) 143 | 144 | n = f.signals_in_file 145 | assert n == 64 146 | trial_data = np.empty((f.getNSamples()[0], n), order="F") 147 | for i in np.arange(n): 148 | trial_data[:, i] = f.readSignal(i) 149 | 150 | output_classes = np.zeros(trial_data.shape[0], order="F") 151 | start_times, durations, labels = f.readAnnotations() 152 | sr = 160 153 | for st, dur, lab in zip(start_times, durations, labels): 154 | s_ind = int(st * sr) 155 | e_ind = int(s_ind + dur * sr) 156 | if lab == "T0": 157 | output_classes[s_ind:e_ind] = 0 158 | elif lab == "T1": 159 | output_classes[s_ind:e_ind] = 1 if t in left_or_right_trials else 3 160 | elif lab == "T2": 161 | output_classes[s_ind:e_ind] = 2 if t in left_or_right_trials else 4 162 | else: 163 | raise ValueError(f"The label {lab} is not defined") 164 | 165 | df = pd.DataFrame( 166 | trial_data, columns=col_info.index[col_info.output == False] 167 | ) 168 | df["experiment_id"] = t 169 | df["subject_id"] = sub 170 | df["activity_id"] = output_classes 171 | df["activity_label"] = df["activity_id"].map(label_series) 172 | sub_dfs.append(df) 173 | 174 | if sub in train_subjects: 175 | train_data.extend(sub_dfs) 176 | elif sub in validation_subjects: 177 | validation_data.extend(sub_dfs) 178 | elif sub in test_subjects: 179 | test_data.extend(sub_dfs) 180 | else: 181 | raise ValueError(f"Unexpected subject {sub}") 182 | 183 | print(f"Done with subject {sub}") 184 | 185 | print("Creating training DF") 186 | train_data = pd.concat(train_data, axis=0, ignore_index=True) 187 | print("Creating validation DF") 188 | validation_data = pd.concat(validation_data, axis=0, ignore_index=True) 189 | print("Creating test DF") 190 | test_data = pd.concat(test_data, axis=0, ignore_index=True) 191 | print("Created all data frames") 192 | return train_data, validation_data, test_data 193 | 194 | 195 | def get_dfs_processed(): 196 | df_train, df_val, df_test = get_data() 197 | col_df = get_column_info() 198 | 199 | # Apply to all sets to normalize features 200 | input_features = col_df.index[col_df.output == False] 201 | 202 | # Calculate feature normalizing stats from training set only 203 | norm_mean = df_train.loc[:, input_features].mean(axis=0) 204 | norm_std = df_train.loc[:, input_features].std(axis=0) 205 | 206 | print("Normalizing 
dataframes") 207 | for df in (df_test, df_val, df_train): 208 | # de-mean 209 | df.loc[:, input_features] = df.loc[:, input_features] - norm_mean 210 | 211 | # unit std dev 212 | df.loc[:, input_features] = df.loc[:, input_features].divide(norm_std, axis=1) 213 | 214 | # interpolate and NA's -> 0 215 | df.loc[:, input_features] = df.loc[:, input_features].interpolate().fillna(0) 216 | 217 | print("Done normalizing dataframes") 218 | return df_train, df_val, df_test 219 | 220 | 221 | _df_dicts = {} 222 | 223 | 224 | def get_or_make_dfs(): 225 | if _df_dicts: 226 | return _df_dicts 227 | download_if_needed() 228 | 229 | cache_dir = os.path.join(OUTPUT_DIR, "cache") 230 | if not os.path.isdir(cache_dir) or not os.path.isfile( 231 | os.path.join(cache_dir, "df_train.df.pkl") 232 | ): 233 | print("Intention recognition data not cached. Creating cache now...") 234 | try: 235 | os.makedirs(cache_dir) 236 | except FileExistsError: 237 | pass 238 | 239 | df_train, df_val, df_test = get_dfs_processed() 240 | df_cols = get_column_info() 241 | 242 | s_labels = get_label_series() 243 | 244 | print("Saving data to cache") 245 | df_train.to_pickle(os.path.join(cache_dir, "df_train.df.pkl")) 246 | _df_dicts["df_train"] = df_train 247 | df_val.to_pickle(os.path.join(cache_dir, "df_val.df.pkl")) 248 | _df_dicts["df_val"] = df_val 249 | df_test.to_pickle(os.path.join(cache_dir, "df_test.df.pkl")) 250 | _df_dicts["df_test"] = df_test 251 | 252 | df_cols.to_pickle(os.path.join(cache_dir, "df_cols.df.pkl")) 253 | _df_dicts["df_cols"] = df_cols 254 | s_labels.to_pickle(os.path.join(cache_dir, "s_labels.s.pkl")) 255 | _df_dicts["s_labels"] = s_labels 256 | print("Caching done.") 257 | else: 258 | print("Loading cached data.") 259 | _df_dicts["df_train"] = pd.read_pickle( 260 | os.path.join(cache_dir, "df_train.df.pkl") 261 | ) 262 | _df_dicts["df_val"] = pd.read_pickle(os.path.join(cache_dir, "df_val.df.pkl")) 263 | _df_dicts["df_test"] = pd.read_pickle(os.path.join(cache_dir, "df_test.df.pkl")) 264 | 265 | _df_dicts["df_cols"] = pd.read_pickle(os.path.join(cache_dir, "df_cols.df.pkl")) 266 | _df_dicts["s_labels"] = pd.read_pickle( 267 | os.path.join(cache_dir, "s_labels.s.pkl") 268 | ) 269 | print("Loaded.") 270 | 271 | return _df_dicts 272 | 273 | 274 | def get_x_y_contig( 275 | which_set="train", dfs_dict=None, y_cols=["y_activity"], sensor_subset=None 276 | ): 277 | """ Load X and y as contiguous time vectors (with various runs concatenated together). 278 | 279 | Parameters 280 | ---------- 281 | which_set: str 282 | which of the pre-defined splits to load. Can specify multiples, separated by '+'. E.g., 283 | "train", "val", "test", "train+val", etc. 284 | dfs_dict: dict 285 | Can provide pre-laoded df_dict to save time. (deprecated) 286 | y_cols: List[str] 287 | List of which label columns to return (e.g., ['y_gesture'], ['y_locomotion'], or ['y_gesture', 'y_locomotion'] 288 | sensor_subset: ty.Iterable 289 | Which subset of sensors to include. 
Valid values include: 290 | "accels", "accel" 291 | "gyros", "gyro" 292 | "accels+gyros", "accel+gyro" 293 | "all" 294 | """ 295 | if not dfs_dict: 296 | dfs_dict = get_or_make_dfs() 297 | 298 | assert type(y_cols) == list 299 | 300 | df = [] 301 | for _which_set in which_set.split("+"): 302 | df.append(dfs_dict["df_" + _which_set]) 303 | 304 | df = pd.concat(df) 305 | 306 | df_cols = dfs_dict["df_cols"] 307 | 308 | if not sensor_subset or sensor_subset == "all": 309 | cols = df_cols.index[df_cols.output == False] 310 | else: 311 | cols = [] 312 | for s in sensor_subset.split("+"): 313 | if s.endswith("s"): 314 | s = s[:-1] 315 | cols.append(df_cols.index[df_cols.sensor_type == s]) 316 | cols = pd.concat(cols) 317 | 318 | Xc = df[cols].values 319 | s_labels = dfs_dict["s_labels"] 320 | activity_outputs = df["activity_id"].copy() 321 | 322 | # Replace nulls with 0 323 | activity_outputs = activity_outputs.fillna(0) 324 | 325 | ycs = [activity_outputs.values] 326 | 327 | output_spec_dict = { 328 | "name": "y_activity", 329 | "num_classes": len(s_labels), 330 | "classes": s_labels.sort_index().to_list(), 331 | } 332 | 333 | data_spec = { 334 | "dataset_name": "intention", 335 | "input_channels": Xc.shape[1], 336 | "n_outputs": len(ycs), 337 | "input_features": cols.to_list(), 338 | "output_spec": [output_spec_dict], 339 | } 340 | 341 | return Xc, ycs, data_spec 342 | 343 | 344 | if __name__ == "__main__": 345 | pass 346 | # get_x_y_contig() 347 | -------------------------------------------------------------------------------- /filternet/datasets/opportunity.py: -------------------------------------------------------------------------------- 1 | # Copyright (C) 2020 Pet Insight Project - All Rights Reserved 2 | 3 | """ Loads the opportunity dataset. Based largely on 4 | https://github.com/sussexwearlab/DeepConvLSTM/blob/master/preprocess_data.py 5 | but refactored and with a slightly different normalization 6 | 7 | (we de-mean and normalize to unit std deviation using 8 | the statistics of the training set, instead of rescaling + clipping from 0->1 with predefined limits as 9 | the referenced repo does. ) 10 | 11 | """ 12 | 13 | import os 14 | import re 15 | import urllib.error 16 | import urllib.parse 17 | import urllib.request 18 | from zipfile import ZipFile 19 | 20 | import pandas as pd 21 | 22 | from . 
import datasets_dir
 23 | 
 24 | # datasets_dir = os.path.dirname(__file__)
 25 | 
 26 | DATASET_URL = "https://archive.ics.uci.edu/ml/machine-learning-databases/00226/OpportunityUCIDataset.zip"
 27 | _, DATASET_FILE = os.path.split(DATASET_URL)
 28 | DATASET_SUBDIR, _ = os.path.splitext(DATASET_FILE)
 29 | 
 30 | # +
 31 | fns_train = [
 32 |     "dataset/S1-ADL4.dat",
 33 |     "dataset/S1-Drill.dat",
 34 |     "dataset/S1-ADL5.dat",
 35 |     "dataset/S1-ADL1.dat",
 36 |     "dataset/S1-ADL2.dat",
 37 |     "dataset/S1-ADL3.dat",
 38 |     "dataset/S2-ADL2.dat",
 39 |     "dataset/S3-ADL2.dat",
 40 |     "dataset/S3-ADL1.dat",
 41 |     "dataset/S2-ADL1.dat",
 42 |     "dataset/S3-Drill.dat",
 43 |     # 'dataset/S4-ADL4.dat',
 44 |     # 'dataset/S4-ADL5.dat',
 45 |     "dataset/S2-Drill.dat",
 46 |     # 'dataset/S4-ADL2.dat',
 47 |     # 'dataset/S4-ADL3.dat',
 48 |     # 'dataset/S4-ADL1.dat',
 49 |     # 'dataset/S4-Drill.dat'
 50 | ]
 51 | 
 52 | fns_val = ["dataset/S2-ADL3.dat", "dataset/S3-ADL3.dat"]
 53 | 
 54 | fns_test = [
 55 |     "dataset/S2-ADL4.dat",
 56 |     "dataset/S2-ADL5.dat",
 57 |     "dataset/S3-ADL4.dat",
 58 |     "dataset/S3-ADL5.dat",
 59 | ]
 60 | 
 61 | 
 62 | def make_df_cols():
 63 |     """ Make a dataframe of the columns in the datafiles that can be downselected for various sensor subsets, labels, etc."""
 64 | 
 65 |     # Read in columns list
 66 |     with open(
 67 |         os.path.join(datasets_dir, DATASET_SUBDIR, "dataset/column_names.txt")
 68 |     ) as f:
 69 |         txt = f.read()
 70 | 
 71 |     # Extract Feature columns first
 72 |     pattern = r"^Column: (?P<col_no_matlab>\d*) (?P<cat>\S*\s?\S*) (?P<posn>\S*) (?P<chan>\S*); .*unit = (?P<unit>.*)$"
 73 | 
 74 |     df_feat_cols = pd.DataFrame.from_records(
 75 |         [m.groupdict() for m in re.compile(pattern, re.MULTILINE).finditer(txt)]
 76 |     )
 77 |     df_feat_cols["col_no"] = df_feat_cols["col_no_matlab"].astype(int) - 1
 78 |     df_feat_cols = df_feat_cols.set_index("col_no").sort_index()
 79 |     df_feat_cols["name"] = (
 80 |         (
 81 |             df_feat_cols.cat.str.slice(0, 2)
 82 |             + " "
 83 |             + df_feat_cols["posn"]
 84 |             + " "
 85 |             + df_feat_cols["chan"]
 86 |         )
 87 |         .str.replace("^", "hi")
 88 |         .str.replace("_", "lo")
 89 |     )
 90 | 
 91 |     # Then label columns
 92 |     pattern = r"^Column: (?P<col_no_matlab>\d*) (?P<name>\S*\s?\S*)$"
 93 |     df_label_cols = pd.DataFrame.from_records(
 94 |         [m.groupdict() for m in re.compile(pattern, re.MULTILINE).finditer(txt)]
 95 |     )
 96 |     df_label_cols["col_no"] = df_label_cols["col_no_matlab"].astype(int) - 1
 97 |     df_label_cols = df_label_cols.set_index("col_no").sort_index()
 98 | 
 99 |     # Combine to get all columns
100 |     df_cols = pd.concat([df_feat_cols, df_label_cols], sort=True).sort_index()
101 | 
102 |     return df_cols
103 | 
104 | 
105 | def make_df_labels():
106 |     """ Make a dataframe of the different class labels """
107 |     # +
108 |     pattern = r"^(?P<src_idx>\d*) - (?P<track_name>\S*) - (?P<label_name>.*)$"
109 |     with open(
110 |         os.path.join(datasets_dir, DATASET_SUBDIR, "dataset/label_legend.txt")
111 |     ) as f:
112 |         txt = f.read()
113 |     df_label_legend = pd.DataFrame.from_records(
114 |         [m.groupdict() for m in re.compile(pattern, re.MULTILINE).finditer(txt)]
115 |     )
116 | 
117 |     # Label mapping for locomotion
118 |     df_labels_locomotion = df_label_legend.query(
119 |         'track_name == "Locomotion"'
120 |     ).reset_index(drop=True)
121 |     df_labels_locomotion["idx"] = df_labels_locomotion.index + 1
122 |     df_labels_locomotion.src_idx = df_labels_locomotion.src_idx.astype(int)
123 |     df_labels_locomotion = df_labels_locomotion.set_index("src_idx", drop=True)
124 |     df_labels_locomotion.loc[0] = ("Null", "", 0)
125 | 
126 |     # Label mapping for gestures
127 |     df_labels_gesture = df_label_legend.query(
128 |         'track_name == "ML_Both_Arms"'
129 |     ).reset_index(drop=True)
130 | 
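    # As for locomotion above: number the gesture classes 1..N in "idx", key the
    # frame by the dataset's original label ids ("src_idx"), and add an explicit
    # Null class at index 0.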
df_labels_gesture["idx"] = df_labels_gesture.index + 1 131 | df_labels_gesture.src_idx = df_labels_gesture.src_idx.astype(int) 132 | df_labels_gesture = df_labels_gesture.set_index("src_idx", drop=True) 133 | df_labels_gesture.loc[0] = ("Null", "", 0) 134 | 135 | return df_labels_locomotion, df_labels_gesture 136 | 137 | 138 | def download_if_needed(): 139 | """Downloads and extracts .zip if needed. """ 140 | 141 | if not os.path.exists(os.path.join(datasets_dir, DATASET_SUBDIR)): 142 | if not os.path.exists(os.path.join(datasets_dir, DATASET_FILE)): 143 | print("Downloading .zip file...") 144 | urllib.request.urlretrieve( 145 | DATASET_URL, os.path.join(datasets_dir, DATASET_FILE) 146 | ) 147 | assert os.path.exists(os.path.join(datasets_dir, DATASET_FILE)) 148 | 149 | print("Extracting...") 150 | zip = ZipFile(os.path.join(datasets_dir, DATASET_FILE)) 151 | zip.extractall(datasets_dir) 152 | 153 | assert os.path.exists(os.path.join(datasets_dir, DATASET_SUBDIR)) 154 | 155 | 156 | def load_opp_dataset(fn): 157 | """ Load an individual file """ 158 | print(fn) 159 | return pd.read_csv( 160 | os.path.join(datasets_dir, DATASET_SUBDIR, fn), header=None, sep="\s+" 161 | ) 162 | 163 | 164 | def get_dfs_raw(): 165 | """ Load dataframes of concatenated data files for the three pre-defined splits. """ 166 | df_train = pd.concat([load_opp_dataset(fn) for fn in fns_train]) 167 | df_val = pd.concat([load_opp_dataset(fn) for fn in fns_val]) 168 | df_test = pd.concat([load_opp_dataset(fn) for fn in fns_test]) 169 | 170 | return df_train, df_val, df_test 171 | 172 | 173 | def get_dfs_processed(): 174 | """ Load and preprocess the data from raw data down to more manageable dataframes that can be chopped 175 | up a bit for specific purposes. This includes de-meaning, etc.""" 176 | df_cols = make_df_cols() 177 | df_feat_cols = df_cols[~df_cols.cat.isna()] 178 | 179 | df_labels_locomotion, df_labels_gestures = make_df_labels() 180 | 181 | df_train, df_val, df_test = get_dfs_raw() 182 | 183 | for df in [df_train, df_val, df_test]: 184 | # Meaningful column names 185 | df.columns = df_cols.name.values 186 | 187 | # Calculate feature normalizing stats from training set only 188 | norm_mean = df_train.loc[:, df_feat_cols.name].mean() 189 | norm_std = df_train.loc[:, df_feat_cols.name].std() 190 | 191 | # Apply to all sets to normalize features 192 | for df in [df_train, df_val, df_test]: 193 | # de-mean 194 | df.loc[:, df_feat_cols.name] -= norm_mean 195 | 196 | # unit std dev 197 | df.loc[:, df_feat_cols.name] /= norm_std 198 | 199 | # interpolate and NA's -> 0 200 | df.loc[:, df_feat_cols.name] = ( 201 | df.loc[:, df_feat_cols.name].interpolate().fillna(0) 202 | ) 203 | 204 | df["y_locomotion"] = df["Locomotion"].map(df_labels_locomotion.idx) 205 | df["y_gesture"] = df["ML_Both_Arms"].map(df_labels_gestures.idx) 206 | 207 | return df_train, df_val, df_test 208 | 209 | 210 | _df_dicts = None 211 | 212 | 213 | def get_or_make_dfs(): 214 | """ Loads pre-processed dataframes from disk, if they exist; creates them if not. Once loaded, they are 215 | cached in-memory for fast subsequent loads. """ 216 | 217 | global _df_dicts 218 | if _df_dicts is not None: 219 | return _df_dicts 220 | download_if_needed() 221 | cache_dir = os.path.join(datasets_dir, DATASET_SUBDIR, "cache") 222 | if not os.path.exists(cache_dir): 223 | print("Opportunity data not cached. 
Creating cache now...") 224 | os.makedirs(cache_dir) 225 | 226 | df_cols = make_df_cols() 227 | df_labels_locomotion, df_labels_gestures = make_df_labels() 228 | df_train, df_val, df_test = get_dfs_processed() 229 | 230 | df_train.to_pickle(os.path.join(cache_dir, "df_train.df.pkl")) 231 | df_val.to_pickle(os.path.join(cache_dir, "df_val.df.pkl")) 232 | df_test.to_pickle(os.path.join(cache_dir, "df_test.df.pkl")) 233 | 234 | df_labels_locomotion.to_pickle( 235 | os.path.join(cache_dir, "df_labels_locomotion.df.pkl") 236 | ) 237 | df_labels_gestures.to_pickle( 238 | os.path.join(cache_dir, "df_labels_gestures.df.pkl") 239 | ) 240 | df_cols.to_pickle(os.path.join(cache_dir, "df_cols.df.pkl")) 241 | print("Caching done.") 242 | 243 | print("Loading cached data.") 244 | df_train = pd.read_pickle(os.path.join(cache_dir, "df_train.df.pkl")) 245 | df_val = pd.read_pickle(os.path.join(cache_dir, "df_val.df.pkl")) 246 | df_test = pd.read_pickle(os.path.join(cache_dir, "df_test.df.pkl")) 247 | 248 | df_labels_locomotion = pd.read_pickle( 249 | os.path.join(cache_dir, "df_labels_locomotion.df.pkl") 250 | ) 251 | df_labels_gestures = pd.read_pickle( 252 | os.path.join(cache_dir, "df_labels_gestures.df.pkl") 253 | ) 254 | df_cols = pd.read_pickle(os.path.join(cache_dir, "df_cols.df.pkl")) 255 | print("Loaded.") 256 | 257 | _df_dicts = dict( 258 | df_train=df_train, 259 | df_val=df_val, 260 | df_test=df_test, 261 | df_labels_locomotion=df_labels_locomotion, 262 | df_labels_gestures=df_labels_gestures, 263 | df_cols=df_cols, 264 | ) 265 | 266 | return _df_dicts 267 | 268 | 269 | def get_x_y_contig( 270 | which_set="train", dfs_dict=None, y_cols=["y_gesture"], sensor_subset=None 271 | ): 272 | """ Load X and y as contiguous time vectors (with various runs concatenated together). 273 | 274 | Parameters 275 | ---------- 276 | which_set: str 277 | which of the pre-defined splits to load. Can specify multiples, separated by '+'. E.g., 278 | "train", "val", "test", "train+val", etc. 279 | dfs_dict: dict 280 | Can provide pre-laoded df_dict to save time. (deprecated) 281 | y_cols: List[str] 282 | List of which label columns to return (e.g., ['y_gesture'], ['y_locomotion'], or ['y_gesture', 'y_locomotion'] 283 | sensor_subset: str 284 | Which subset of sensors to include. 
Valid values include: 285 | "accels", 286 | "gyros", 287 | "accels+gyros", 288 | "accels+gyros+magnetic", << above categories are for motion jacket only 289 | "opportunity" <<< all 113 sensors used in opportunity challenge 290 | """ 291 | if dfs_dict is None: 292 | dfs_dict = get_or_make_dfs() 293 | 294 | assert type(y_cols) == list 295 | 296 | df = [] 297 | for _which_set in which_set.split("+"): 298 | df.append(dfs_dict["df_" + _which_set]) 299 | 300 | df = pd.concat(df) 301 | 302 | df_cols = dfs_dict["df_cols"] 303 | 304 | # Used for sensor subsets (arm + back IMUs) 305 | core_posns = ["RUA", "LUA", "RLA", "LLA", "BACK"] 306 | 307 | # For full Opportunity complement of sensors: 308 | xtra_posns = [ 309 | "L-SHOE", 310 | "R-SHOE", 311 | "RKN_", 312 | "RUA_", 313 | "RWR", 314 | "LUA_", 315 | "HIP", 316 | "RUA^", 317 | "RKN^", 318 | "LWR", 319 | "LH", 320 | "LUA^", 321 | "RH", 322 | ] 323 | 324 | if sensor_subset is None or sensor_subset == "opportunity": 325 | posns = core_posns + xtra_posns 326 | chan_pattern = "" 327 | cat_pattern = "" 328 | else: 329 | posns = core_posns 330 | # regex for selecting sensors by prefixes 331 | chan_pattern = "|".join([f"^{s[:3]}" for s in sensor_subset.split("+")]) 332 | cat_pattern = "^Inertial" 333 | 334 | feature_cols = df_cols[ 335 | ( 336 | df_cols.posn.isin(posns) 337 | & ~(df_cols.chan.str.match("^Quat") == True) 338 | & (df_cols.chan.str.match(chan_pattern) == True) 339 | & (df_cols.cat.str.match(cat_pattern) == True) 340 | ) 341 | ] 342 | 343 | Xc = df[feature_cols.name].values 344 | ycs = [df[y_col].values for y_col in y_cols] 345 | 346 | output_spec_dict = { 347 | "y_gesture": { 348 | "name": "gesture", 349 | "num_classes": len(dfs_dict["df_labels_gestures"]), 350 | "classes": dfs_dict["df_labels_gestures"] 351 | .set_index("idx") 352 | .sort_index() 353 | .label_name.to_list(), 354 | }, 355 | "y_locomotion": { 356 | "name": "locomotion", 357 | "num_classes": len(dfs_dict["df_labels_locomotion"]), 358 | "classes": dfs_dict["df_labels_locomotion"] 359 | .set_index("idx") 360 | .sort_index() 361 | .label_name.to_list(), 362 | }, 363 | } 364 | 365 | data_spec = { 366 | "dataset_name": "opportunity", 367 | "input_channels": Xc.shape[1], 368 | "n_outputs": len(y_cols), 369 | "input_features": feature_cols.name.to_list(), 370 | "output_spec": [output_spec_dict[y_col] for y_col in y_cols], 371 | } 372 | 373 | return Xc, ycs, data_spec 374 | 375 | 376 | if __name__ == "__main__": 377 | pass 378 | -------------------------------------------------------------------------------- /filternet/datasets/smartphone_hapt.py: -------------------------------------------------------------------------------- 1 | # Copyright (C) 2020 Pet Insight Project - All Rights Reserved 2 | 3 | """ Loads the smartphone human activity and posture detection dataset. 4 | 5 | HAPT 6 | """ 7 | 8 | import os 9 | import urllib 10 | from zipfile import ZipFile 11 | 12 | # from . 
import datasets_dir 13 | import numpy as np 14 | 15 | datasets_dir = os.path.dirname(__file__) 16 | 17 | import pandas as pd 18 | 19 | # Papers 20 | # http://conference.scipy.org/proceedings/scipy2018/pdfs/christian_mcdaniel.pdf 21 | # github: https://github.com/xtianmcd/accelstm 22 | # https://www.mdpi.com/1424-8220/17/11/2556/htm 23 | # https://arxiv.org/pdf/1801.04503.pdf 24 | # http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.641.2285&rep=rep1&type=pdf 25 | 26 | 27 | DATASET_URL = "http://archive.ics.uci.edu/ml/machine-learning-databases/00341/HAPT%20Data%20Set.zip" 28 | _, DATASET_FILE = os.path.split(DATASET_URL) 29 | DATASET_SUBDIR, _ = os.path.splitext(DATASET_FILE) 30 | DATASET_SUBDIR = DATASET_SUBDIR.replace("%20", "_") 31 | OUTPUT_DIR = os.path.join(datasets_dir, DATASET_SUBDIR) 32 | 33 | 34 | def download_if_needed(): 35 | """Downloads and extracts .zip if needed. """ 36 | if not os.path.exists(OUTPUT_DIR): 37 | if not os.path.exists(os.path.join(datasets_dir, DATASET_FILE)): 38 | print("Downloading .zip file...") 39 | urllib.request.urlretrieve( 40 | DATASET_URL, os.path.join(datasets_dir, DATASET_FILE) 41 | ) 42 | assert os.path.exists(os.path.join(datasets_dir, DATASET_FILE)) 43 | 44 | print(f"Extracting to {OUTPUT_DIR}...") 45 | zip = ZipFile(os.path.join(datasets_dir, DATASET_FILE)) 46 | zip.extractall(OUTPUT_DIR) 47 | 48 | assert os.path.exists(OUTPUT_DIR) 49 | 50 | 51 | def get_label_series(): 52 | activity_labels_path = os.path.join(OUTPUT_DIR, "activity_labels.txt") 53 | print(f"Loading file {activity_labels_path}") 54 | label_series = pd.read_csv(activity_labels_path, sep=r"\s+", header=None) 55 | label_series.columns = ["class_number", "class_label"] 56 | label_series = label_series.dropna(how="any", axis=0) 57 | return pd.Series( 58 | label_series["class_label"].values, index=label_series["class_number"].values 59 | ) 60 | 61 | 62 | def load_labels(): 63 | label_file_path = os.path.join(OUTPUT_DIR, "RawData/labels.txt") 64 | print(f"Reading labels file {label_file_path}") 65 | df_labels = pd.read_csv(label_file_path, r"\s+", header=None) 66 | df_labels.columns = [ 67 | "experiment_id", 68 | "subject_id", 69 | "activity_id", 70 | "label_start_point", 71 | "label_end_point", 72 | ] 73 | 74 | label_series = get_label_series() 75 | 76 | df_labels["activity_label"] = df_labels["activity_id"].map(label_series) 77 | 78 | return df_labels.sort_values( 79 | ["experiment_id", "label_start_point", "label_end_point"] 80 | ) 81 | 82 | 83 | def separate_subjects(): 84 | all_subjects = set(range(1, 31)) 85 | # validation_subjects = {4, 12, 20, 27} # 3 test (4, 12, 20) + 1 train (27) 86 | validation_subjects = {5, 16, 27} 87 | test_subjects = {2, 4, 9, 10, 12, 13, 18, 20, 24} # Given test subjects 88 | train_subjects = all_subjects - test_subjects - validation_subjects 89 | assert train_subjects | test_subjects | validation_subjects == all_subjects 90 | assert set() == validation_subjects & test_subjects 91 | assert set() == train_subjects & test_subjects 92 | assert set() == train_subjects & validation_subjects 93 | return train_subjects, validation_subjects, test_subjects 94 | 95 | 96 | def get_column_info(): 97 | sensor_types = ["accel", "gyro"] 98 | file_names = ["acc", "gyro"] 99 | num_chans = [3, 3] 100 | assert len(num_chans) == len(sensor_types) 101 | types = [] 102 | names = [] 103 | file_prefixes = [] 104 | for i in range(len(sensor_types)): 105 | t = sensor_types[i] 106 | nc = num_chans[i] 107 | types.extend([t] * nc) 108 | file_prefixes.extend([file_names[i]] * 
nc) 109 | names.extend([f"{t[0]}{j + 1}" for j in range(nc)]) 110 | 111 | # Note that this segfaults if the input data is a tuple and not a list. WTF pandas? 112 | df = pd.DataFrame( 113 | [types, file_prefixes], index=["sensor_type", "file_prefix"], columns=names 114 | ).T 115 | df.index.name = "name" 116 | df["output"] = False 117 | 118 | df = df.reindex( 119 | df.index.append( 120 | pd.Index(["experiment_id", "subject_id", "activity_id", "activity_label"]) 121 | ) 122 | ) 123 | 124 | df.loc[["activity_id", "activity_label"], "output"] = True 125 | 126 | return df 127 | 128 | 129 | def get_data(): 130 | train_subjects, validation_subjects, test_subjects = separate_subjects() 131 | labels = load_labels() 132 | col_info = get_column_info() 133 | 134 | output_cols = col_info.index[col_info.output == True] 135 | not_input_cols = col_info.index[~(col_info.output == False)] 136 | all_sensors = col_info.loc[col_info.output == False, "sensor_type"].unique() 137 | train_data = [] 138 | validation_data = [] 139 | test_data = [] 140 | for exp_num, exp_labels in labels.groupby("experiment_id"): 141 | user_id = exp_labels["subject_id"].unique() # Should only be 1 142 | assert len(user_id) == 1 143 | user_id = user_id[0] 144 | 145 | exp_data = [] 146 | for t in all_sensors: 147 | sensor_type_mask = t == col_info.sensor_type 148 | prefix = col_info.loc[sensor_type_mask, "file_prefix"].unique()[0] 149 | file_path = os.path.join( 150 | OUTPUT_DIR, 151 | "RawData", 152 | f"{prefix}_exp{exp_num:02d}_user{user_id:02d}.txt", 153 | ) 154 | print(f"Reading file {file_path}") 155 | tmp_df = pd.read_csv(file_path, sep=r"\s+", header=None) 156 | tmp_df.columns = col_info.index[sensor_type_mask] 157 | exp_data.append(tmp_df) 158 | exp_data = pd.concat(exp_data, axis=1, ignore_index=False) 159 | 160 | # Make 1 based to match everything else 161 | exp_data.index = exp_data.index + 1 162 | exp_data = pd.concat( 163 | (exp_data, pd.DataFrame(index=exp_data.index, columns=not_input_cols)), 164 | axis=1, 165 | ignore_index=False, 166 | ) 167 | 168 | for _, l in exp_labels.iterrows(): 169 | exp_data.loc[ 170 | l["label_start_point"] : l["label_end_point"], output_cols 171 | ] = l[output_cols].values 172 | 173 | # Just to help make sure matches expectations 174 | exp_data = exp_data.reset_index(drop=True) 175 | 176 | exp_data["experiment_id"] = exp_num 177 | exp_data["subject_id"] = user_id 178 | exp_data["activity_id"] = exp_data["activity_id"].fillna(0) 179 | exp_data["activity_label"] = "UNKNOWN" 180 | 181 | if user_id in train_subjects: 182 | train_data.append(exp_data) 183 | elif user_id in validation_subjects: 184 | validation_data.append(exp_data) 185 | elif user_id in test_subjects: 186 | test_data.append(exp_data) 187 | else: 188 | raise ValueError(f"Unexpected subject {user_id}") 189 | 190 | train_data = pd.concat(train_data) 191 | validation_data = pd.concat(validation_data) 192 | test_data = pd.concat(test_data) 193 | return train_data, validation_data, test_data 194 | 195 | 196 | def get_dfs_processed(): 197 | df_train, df_val, df_test = get_data() 198 | col_df = get_column_info() 199 | # Calculate feature normalizing stats from training set only 200 | norm_mean = df_train.mean(axis=0) 201 | norm_std = df_train.std(axis=0) 202 | 203 | # Apply to all sets to normalize features 204 | input_features = col_df.index[col_df.output == False] 205 | for df in (df_train, df_val, df_test): 206 | # de-mean 207 | df.loc[:, input_features] -= norm_mean 208 | 209 | # unit std dev 210 | df.loc[:, input_features] /= norm_std 
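        # Note: norm_mean/norm_std are computed from the training split only, so
        # no scaling statistics leak from the validation/test splits.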
211 | 212 | # interpolate and NA's -> 0 213 | df.loc[:, input_features] = df.loc[:, input_features].interpolate().fillna(0) 214 | 215 | return df_train, df_val, df_test 216 | 217 | 218 | _df_dicts = {} 219 | 220 | 221 | def get_or_make_dfs(): 222 | if _df_dicts: 223 | return _df_dicts 224 | download_if_needed() 225 | 226 | cache_dir = os.path.join(OUTPUT_DIR, "cache") 227 | if not os.path.isdir(cache_dir) or not os.path.isfile( 228 | os.path.join(cache_dir, "df_train.df.pkl") 229 | ): 230 | print("Smartphone data not cached. Creating cache now...") 231 | try: 232 | os.makedirs(cache_dir) 233 | except FileExistsError: 234 | pass 235 | 236 | df_train, df_val, df_test = get_dfs_processed() 237 | df_cols = get_column_info() 238 | 239 | s_labels = get_label_series() 240 | 241 | df_train.to_pickle(os.path.join(cache_dir, "df_train.df.pkl")) 242 | _df_dicts["df_train"] = df_train 243 | df_val.to_pickle(os.path.join(cache_dir, "df_val.df.pkl")) 244 | _df_dicts["df_val"] = df_val 245 | df_test.to_pickle(os.path.join(cache_dir, "df_test.df.pkl")) 246 | _df_dicts["df_test"] = df_test 247 | 248 | df_cols.to_pickle(os.path.join(cache_dir, "df_cols.df.pkl")) 249 | _df_dicts["df_cols"] = df_cols 250 | s_labels.to_pickle(os.path.join(cache_dir, "s_labels.s.pkl")) 251 | _df_dicts["s_labels"] = s_labels 252 | print("Caching done.") 253 | else: 254 | print("Loading cached data.") 255 | _df_dicts["df_train"] = pd.read_pickle( 256 | os.path.join(cache_dir, "df_train.df.pkl") 257 | ) 258 | _df_dicts["df_val"] = pd.read_pickle(os.path.join(cache_dir, "df_val.df.pkl")) 259 | _df_dicts["df_test"] = pd.read_pickle(os.path.join(cache_dir, "df_test.df.pkl")) 260 | 261 | _df_dicts["df_cols"] = pd.read_pickle(os.path.join(cache_dir, "df_cols.df.pkl")) 262 | _df_dicts["s_labels"] = pd.read_pickle( 263 | os.path.join(cache_dir, "s_labels.s.pkl") 264 | ) 265 | print("Loaded.") 266 | 267 | return _df_dicts 268 | 269 | 270 | def get_x_y_contig( 271 | which_set="train", 272 | dfs_dict=None, 273 | y_cols=("y_activity",), 274 | sensor_subset=None, 275 | include_transitions=False, 276 | ): 277 | """ Load X and y as contiguous time vectors (with various runs concatenated together). 278 | 279 | Parameters 280 | ---------- 281 | which_set: str 282 | which of the pre-defined splits to load. Can specify multiples, separated by '+'. E.g., 283 | "train", "val", "test", "train+val", etc. 284 | dfs_dict: dict 285 | Can provide pre-laoded df_dict to save time. (deprecated) 286 | y_cols: List[str] 287 | List of which label columns to return (e.g., ['y_gesture'], ['y_locomotion'], or ['y_gesture', 'y_locomotion'] 288 | sensor_subset: ty.Iterable 289 | Which subset of sensors to include. 
Valid values include:
290 |             "accels", "accel"
291 |             "gyros", "gyro"
292 |             "accels+gyros", "accel+gyro"
293 |             "all"
294 |     include_transitions : bool
295 |     """
296 |     y_cols = list(y_cols)
297 |     if not dfs_dict:
298 |         dfs_dict = get_or_make_dfs()
299 | 
300 |     assert isinstance(y_cols, list)
301 | 
302 |     df = []
303 |     for _which_set in which_set.split("+"):
304 |         df.append(dfs_dict["df_" + _which_set])
305 | 
306 |     df = pd.concat(df)
307 | 
308 |     df_cols = dfs_dict["df_cols"]
309 | 
310 |     if not sensor_subset or sensor_subset == "all":
311 |         cols = df_cols.index[df_cols.output == False]
312 |     else:
313 |         cols = []
314 |         for s in sensor_subset.split("+"):
315 |             if s.endswith("s"):
316 |                 s = s[:-1]
317 |             cols.append(df_cols.index[df_cols.sensor_type == s])
318 |         cols = pd.concat(cols)
319 | 
320 |     Xc = df[cols].values
321 |     s_labels = dfs_dict["s_labels"]
322 |     activity_outputs = df["activity_id"].copy()
323 |     if not include_transitions:
324 |         transition_mask = s_labels.str.contains("_TO_")
325 |         transition_activities = s_labels.index[transition_mask]
326 |         activity_outputs[activity_outputs.isin(transition_activities)] = np.nan
327 |         s_labels = s_labels[~transition_mask]
328 | 
329 |     # Replace nulls with 0
330 |     activity_outputs = activity_outputs.fillna(0)
331 | 
332 |     ycs = [activity_outputs.values]
333 | 
334 |     # Include the null class
335 |     output_spec_dict = {
336 |         "name": "y_activity",
337 |         "num_classes": len(s_labels) + 1,
338 |         "classes": [""] + s_labels.sort_index().to_list(),
339 |     }
340 | 
341 |     data_spec = {
342 |         "dataset_name": "smartphone",
343 |         "input_channels": Xc.shape[1],
344 |         "n_outputs": len(ycs),
345 |         "input_features": cols.to_list(),
346 |         "output_spec": [output_spec_dict],
347 |     }
348 | 
349 |     return Xc, ycs, data_spec
350 | 
351 | 
352 | if __name__ == "__main__":
353 |     pass
354 | 
-------------------------------------------------------------------------------- /filternet/models/__init__.py: --------------------------------------------------------------------------------
  1 | # Copyright (C) 2020 Pet Insight Project - All Rights Reserved
  2 | 
  3 | from .base_net import BaseNet
  4 | from .deep_conv_lstm import DeepConvLSTM
  5 | from .filter_net import FilterNet
  6 | from .filter_net_ensemble import FilterNetEnsemble
  7 | 
-------------------------------------------------------------------------------- /filternet/models/base_layers.py: --------------------------------------------------------------------------------
  1 | # Copyright (C) 2020 Pet Insight Project - All Rights Reserved
  2 | 
  3 | import numpy as np
  4 | from torch import nn
  5 | 
  6 | 
  7 | class CustomRNNMixin(object):
  8 |     """Mixin to help wrap PyTorch recurrent layer(s) to swap axes 1&2 (and back), since that's what PyTorch RNNs expect.
  9 |     """
 10 | 
 11 |     def __init__(self, *args, **kwargs):
 12 |         if "batch_first" not in kwargs:
 13 |             kwargs["batch_first"] = True
 14 |         super().__init__(*args, **kwargs)
 15 | 
 16 |     def forward(self, input):
 17 |         input = input.transpose(1, 2).contiguous()
 18 |         output, h_n = super().forward(input)
 19 |         return output.transpose(1, 2).contiguous()
 20 | 
 21 | 
 22 | class CustomGRU(CustomRNNMixin, nn.GRU):
 23 |     """Wraps PyTorch GRU to swap axes 1&2 (and back) since that's what PyTorch RNNs expect.
 24 |     GRU sub-type.
 25 |     """
 26 | 
 27 |     pass
 28 | 
 29 | 
 30 | class CustomLSTM(CustomRNNMixin, nn.LSTM):
 31 |     """Wraps PyTorch LSTM version to swap axes 1&2 (and back) since that's what PyTorch RNNs expect.
 32 |     LSTM sub-type.
33 |     """
34 | 
35 |     pass
36 | 
37 | 
38 | class CGLLayer(nn.Sequential):
39 |     """Flexible implementation of a convolution/GRU/LSTM layer, which is the basic building block of our models. Each
40 |     layer is made up of (optional) dropout, a CNN, GRU, or LSTM layer surrounded by (optional) striding/pooling
41 |     layers, and a BatchNorm layer.
42 | 
43 |     This layer subclasses torch.nn.Sequential so that all the pytorch magic still works with it (like transferring
44 |     to/from devices, initializing weights, switching back/forth to eval mode, etc.)
45 |     """
46 | 
47 |     output_size = (
48 |         None
49 |     )  # type: int # depth (channels) output by this layer, useful for hooking up to subsequent modules.
50 | 
51 |     def __init__(
52 |         self,
53 |         input_size,
54 |         output_size,
55 |         kernel_size=5,
56 |         type="cnn",
57 |         stride=1,
58 |         pool=None,
59 |         dropout=0.1,
60 |         stride_pos=None,
61 |         batch_norm=True,
62 |         groups=1,
63 |     ):
64 |         """
65 | 
66 |         Parameters
67 |         ----------
68 |         input_size: int
69 |             Depth (channels) of input / previous layer
70 |         output_size: int
71 |             Depth (channels) that this layer will output
72 |         kernel_size: int
73 |             For CNNs
74 |         type: str
75 |             'cnn', 'lstm', or 'gru'; determines primary layer type.
76 |         stride: int
77 |             How much to decimate output (in temporal dimension) via _striding_. Defaults to 1 (no decimation).
78 |         pool: int
79 |             How much to decimate output (in temporal dimension) via _average_pooling_. Defaults to None (no pooling).
80 |         dropout: float
81 |             Amount of dropout to apply. Defaults to 0.1; falsy values (0 or None) disable dropout.
82 |         stride_pos: str
83 |             For recurrent layers only; determines whether striding/pooling is done *before* (default) or
84 |             *after* the recurrent layer.
85 |         batch_norm: bool
86 |             If True (default), the activation layer is followed by a batchnorm layer.
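        Example (an illustrative sketch; a strided CNN block that halves the
        temporal length):

            >>> import torch
            >>> layer = CGLLayer(32, 64, kernel_size=5, type="cnn", stride=2, pool=2)
            >>> layer(torch.randn(8, 32, 256)).shape
            torch.Size([8, 64, 128])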
87 | """ 88 | 89 | layers = [] 90 | self.output_size = output_size 91 | 92 | if type == "cnn": 93 | if dropout: 94 | layers.append(nn.Dropout2d(dropout)) 95 | s = 1 if pool else stride 96 | p = int(np.ceil((kernel_size - s) / 2.0)) 97 | layers.append( 98 | nn.Conv1d( 99 | input_size, 100 | output_size, 101 | stride=s, 102 | kernel_size=kernel_size, 103 | padding=p, 104 | groups=groups, 105 | ) 106 | ) 107 | layers.append(nn.ReLU()) 108 | if pool: 109 | p = int(np.ceil((pool - stride) / 2.0)) 110 | layers.append( 111 | nn.AvgPool1d(pool, stride, padding=p, count_include_pad=False) 112 | ) 113 | elif type in ["gru", "lstm"]: 114 | klass = {"gru": CustomGRU, "lstm": CustomLSTM}[type] 115 | if (pool or stride) and stride_pos != "post": 116 | pl = 1 if not pool else pool 117 | p = np.ceil((pl - stride) / 2.0).astype(int) 118 | layers.append(nn.AvgPool1d(pl, stride=stride, padding=p)) 119 | if dropout: 120 | layers.append(nn.Dropout2d(dropout)) 121 | assert output_size % 2 == 0 # must be even b/c bidirectional 122 | layers.append( 123 | klass( 124 | input_size=input_size, 125 | hidden_size=int(output_size / 2), 126 | bidirectional=True, 127 | ) 128 | ) 129 | if (pool or stride) and stride_pos == "post": 130 | pl = 1 if not pool else pool 131 | p = np.ceil((pl - stride) / 2.0).astype(int) 132 | layers.append(nn.AvgPool1d(pl, stride=stride, padding=p)) 133 | else: 134 | raise ValueError("Unknown layer type: %s" % type) 135 | 136 | # Follow with BN 137 | if batch_norm: 138 | layers.append(nn.BatchNorm1d(self.output_size)) 139 | 140 | super().__init__(*layers) 141 | -------------------------------------------------------------------------------- /filternet/models/base_net.py: -------------------------------------------------------------------------------- 1 | # Copyright (C) 2020 Pet Insight Project - All Rights Reserved 2 | 3 | import numpy as np 4 | import torch 5 | from torch import nn 6 | 7 | 8 | class BaseNet(nn.Module): 9 | """ Abstract 'base' network that can be reimplemented for specific architectures.""" 10 | 11 | def __init__( 12 | self, 13 | input_channels=113, 14 | num_output_classes=[18], 15 | output_type="many_to_many", 16 | keep_intermediates=False, 17 | **other_kwargs, 18 | ): 19 | 20 | self.output_type = output_type 21 | self.num_output_classes = num_output_classes 22 | self.input_channels = input_channels 23 | self.keep_intermediates = keep_intermediates 24 | self.padding_lost_per_side = 0 25 | self.output_stride = 1 26 | 27 | super(BaseNet, self).__init__() 28 | 29 | self.build(**other_kwargs) 30 | 31 | def build(self, **other_kwargs): 32 | """ Builds the network. Can take any number of custom params as kwargs to configure it. 33 | REIMPLEMENT IN SUBCLASSES. 34 | """ 35 | raise NotImplementedError() 36 | 37 | def forward(self, X, **kwargs): 38 | ys = self._forward(X, **kwargs) 39 | 40 | if self.output_type == "many_to_one_takelast": 41 | return [y[:, :, [-1]] for y in ys] 42 | elif self.output_type == "many_to_many": 43 | return ys 44 | else: 45 | raise NotImplemented(self.output_type) 46 | 47 | def _forward(self, X, **kwargs): 48 | """Forward pass logic specific to this network type. 49 | REIPMLEMENT IN SUBCLASSES. 50 | Input dimensionality: (N, C_{in}, L_{in})""" 51 | raise NotImplementedError() 52 | 53 | def transform_targets(self, ys, one_hot=True): 54 | """ Convert a `y` vector (one of `ys`) into targets that can be compared 55 | to network outputs... take into account padding, one-hot encoding (if requested), 56 | and whether the network is many-to-many or many-to-one. 
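        For example (illustrative): for a many-to-many model with output_stride=8
        and targets y of shape [N, 512], this returns class indices of shape
        [N, 64], sampled at 64 evenly spaced points centered in the window; with
        one_hot=True the result is instead of shape [N, num_classes, 64].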
""" 57 | ys2 = [] 58 | for i_y, y in enumerate(ys): 59 | if self.output_type == "many_to_one_takelast" and not one_hot: 60 | ys2.append(y[:, [-1]]) 61 | continue 62 | 63 | # Account for any change in sequence length due to padding 64 | if self.padding_lost_per_side > 0: 65 | y = y[:, self.padding_lost_per_side : -self.padding_lost_per_side] 66 | 67 | # for many-to-many, if needed: 68 | win_len = y.shape[-1] 69 | # Calculate number of outputs. This is not always accurate and sometimes 70 | # 'floor' needs to change to 'ceil' or vice-versa... TBD is to implement 71 | # a system that calculates this accurately for all of our possible 72 | # architectures. 73 | output_size = int(np.floor(win_len / float(self.output_stride))) 74 | # Now, create that many outputs, evenly spaced by output_stride 75 | output_idxs = np.arange(output_size) * self.output_stride 76 | # Now, center it in the middle of the window. Depending on our 77 | # architecture, this many not be *exactly* optimal, but it's 78 | # a good guess on average. 79 | # Note: win_len - 1 because of zero-indexing 80 | output_idxs = np.round( 81 | output_idxs - (output_idxs.mean() - (win_len - 1) / 2.0) 82 | ).astype(int) 83 | 84 | if one_hot: 85 | if len(y.shape) == 2: 86 | # Do one-hot encoding 87 | y = torch.zeros( 88 | y.size()[0], 89 | self.num_output_classes[i_y], 90 | y.size()[1], 91 | device=y.device, 92 | ).scatter_(1, y.unsqueeze(1), 1) 93 | 94 | if self.output_type == "many_to_one_takelast": 95 | ys2.append(y[:, :, [output_idxs[-1]]]) 96 | elif self.output_type == "many_to_many": 97 | ys2.append(y[:, :, output_idxs]) 98 | else: 99 | raise NotImplemented(self.output_type) 100 | 101 | else: 102 | if self.output_type == "many_to_one_takelast": 103 | ys2.append(y[:, [output_idxs[-1]]]) 104 | elif self.output_type == "many_to_many": 105 | ys2.append(y[:, output_idxs]) 106 | else: 107 | raise NotImplemented(self.output_type) 108 | 109 | return ys2 110 | -------------------------------------------------------------------------------- /filternet/models/deep_conv_lstm.py: -------------------------------------------------------------------------------- 1 | # Copyright (C) 2020 Pet Insight Project - All Rights Reserved 2 | 3 | from torch import nn 4 | 5 | from .base_net import BaseNet 6 | 7 | 8 | class DeepConvLSTM(BaseNet): 9 | """ A pytorch implementation of 'DeepConvLSTM' as described in: 10 | 11 | [1] F. J. Ordóñez and D. Roggen, “Deep Convolutional and LSTM Recurrent Neural Networks for 12 | Multimodal Wearable Activity Recognition,” Sensors, vol. 16, no. 1, p. 115, Jan. 2016. 
13 | """ 14 | 15 | def __init__(self, **other_kwargs): 16 | super().__init__(output_type="many_to_one_takelast", **other_kwargs) 17 | 18 | def build( 19 | self, 20 | num_filters=64, 21 | filter_size=5, 22 | num_units_lstm=128, 23 | scale=1.0, 24 | **other_kwargs, 25 | ): 26 | 27 | pad = 0 28 | 29 | num_filters = int(num_filters * scale) 30 | num_units_lstm = int(num_units_lstm * scale) 31 | 32 | n_conv = 4 33 | conv_stack = [] 34 | in_shape = 1 35 | for i in range(n_conv): 36 | conv_stack.append( 37 | nn.Conv2d(in_shape, num_filters, (filter_size, 1), padding=(pad, 0)) 38 | ) 39 | conv_stack.append(nn.ReLU()) 40 | self.padding_lost_per_side += int((filter_size - 1) / 2) 41 | in_shape = num_filters 42 | 43 | self.conv_stack = nn.Sequential(*conv_stack) 44 | 45 | self.drop1 = nn.Dropout(0.5) 46 | self.lstm1 = nn.LSTM(num_filters * self.input_channels, num_units_lstm) 47 | self.drop2 = nn.Dropout(0.5) 48 | self.lstm2 = nn.LSTM(num_units_lstm, num_units_lstm) 49 | 50 | end_stacks = [] 51 | for num_output_classes in self.num_output_classes: 52 | end_stacks.append( 53 | nn.Sequential( 54 | nn.Dropout(0.5), 55 | # This sort of Conv1D acts as a time-distributed Dense layer. 56 | nn.Conv1d( 57 | num_units_lstm, num_output_classes, 1 58 | ), # time-distributed dense 59 | ) 60 | ) 61 | self.end_stacks = nn.ModuleList(end_stacks) 62 | 63 | # Original DeepConvLSTM used an orthogonal weight initialization 64 | for m in self.modules(): 65 | if isinstance(m, (nn.Conv2d, nn.Conv1d)): 66 | nn.init.orthogonal_(m.weight) 67 | 68 | def _forward(self, X, **kwargs): 69 | """(N, C_{in}, L_{in})""" 70 | Xs = [X] # [batch, chan, seq] 71 | Xs.append( 72 | Xs[-1].unsqueeze(1).permute([0, 1, 3, 2]) 73 | ) # [batch, filters, seq, sensors] 74 | 75 | Xs.append(self.conv_stack(Xs[-1])) 76 | 77 | Xs.append( 78 | Xs[-1].permute([2, 0, 1, 3]).flatten(2) 79 | ) # to [seq, batch, (filtersxsensors)] 80 | 81 | Xs.append(self.drop1(Xs[-1])) 82 | Xs.append(self.lstm1(Xs[-1])[0]) 83 | 84 | Xs.append(self.drop2(Xs[-1])) 85 | Xs.append(self.lstm2(Xs[-1])[0]) 86 | 87 | Xs.append(Xs[-1].permute([1, 2, 0])) # back to [batch, chan, seq] 88 | 89 | ys = [end_stack(Xs[-1]) for end_stack in self.end_stacks] 90 | # No softmax because the pytorch cross_entropy loss function wants the raw outputs. 
91 | 92 | if self.keep_intermediates: 93 | self.Xs = Xs 94 | 95 | return ys 96 | -------------------------------------------------------------------------------- /filternet/models/filter_net.py: -------------------------------------------------------------------------------- 1 | # Copyright (C) 2020 Pet Insight Project - All Rights Reserved 2 | 3 | import numpy as np 4 | import torch 5 | import torch.nn.functional as F 6 | from torch import nn 7 | 8 | from .base_layers import CGLLayer 9 | from .base_net import BaseNet 10 | 11 | DEFAULT_WIDTH = 100 12 | 13 | 14 | class FilterNet(BaseNet): 15 | def build( 16 | self, 17 | n_pre=1, 18 | w_pre=DEFAULT_WIDTH, 19 | n_strided=3, 20 | w_strided=DEFAULT_WIDTH, 21 | n_interp=4, 22 | w_interp=DEFAULT_WIDTH, 23 | n_dense_pre_l=1, 24 | w_dense_pre_l=DEFAULT_WIDTH, 25 | n_l=1, 26 | w_l=DEFAULT_WIDTH, 27 | n_dense_post_l=0, 28 | w_dense_post_l=int(DEFAULT_WIDTH / 2), 29 | cnn_kernel_size=5, 30 | scale=1.0, 31 | bn_pre=False, 32 | dropout=0.1, 33 | do_pool=True, 34 | stride_pos="post", 35 | stride_amt=2, 36 | **other_kwargs, 37 | ): 38 | # if scale != 1: 39 | w_pre = int((w_pre * scale)) # / 6) * 6 40 | w_strided = int((w_strided * scale)) # / 6) * 6 41 | w_interp = int(w_interp * scale) 42 | w_dense_pre_l = int(w_dense_pre_l * scale) 43 | w_l = int((w_l * scale) / 2) * 2 44 | w_dense_post_l = int(w_dense_post_l * scale) 45 | 46 | down_stack_1 = [] 47 | in_shape = self.input_channels 48 | 49 | if bn_pre: 50 | down_stack_1.append(nn.BatchNorm1d(in_shape)) 51 | 52 | for i in range(n_pre): 53 | down_stack_1.append( 54 | CGLLayer(in_shape, w_pre, cnn_kernel_size, type="cnn", dropout=dropout) 55 | ) 56 | in_shape = down_stack_1[-1].output_size 57 | 58 | for i in range(n_strided): 59 | stride = stride_amt 60 | pool = stride if (do_pool and stride > 1) else None 61 | ltype = "cnn" 62 | down_stack_1.append( 63 | CGLLayer( 64 | in_shape, 65 | w_strided, 66 | cnn_kernel_size, 67 | type=ltype, 68 | stride=stride, 69 | pool=pool, 70 | stride_pos=stride_pos, 71 | dropout=dropout, 72 | # groups=3 if ( i % 2 == 0 ) else 2 73 | ) 74 | ) 75 | self.output_stride *= stride 76 | in_shape = down_stack_1[-1].output_size 77 | ds_1_end_size = in_shape 78 | self.down_stack_1 = nn.Sequential(*down_stack_1) 79 | 80 | ds2_ltype = "cnn" 81 | down_stack_2 = [] 82 | 83 | for i in range(n_interp): 84 | stride = stride_amt if (i < n_interp - 1) else 1 85 | pool = stride if (do_pool and stride > 1) else None 86 | w = int(np.ceil(w_interp * 0.5 ** (i + 1))) 87 | # if i == n_interp-1: 88 | # w = int(w_interp * .66) 89 | # if i == n_interp - 2: 90 | # w =int(w_interp * .33) 91 | # else: 92 | # w = w_interp 93 | down_stack_2.append( 94 | CGLLayer( 95 | in_shape, 96 | w, 97 | cnn_kernel_size, 98 | type=ds2_ltype, 99 | stride=stride, 100 | pool=pool, 101 | stride_pos=stride_pos, 102 | dropout=dropout, 103 | # groups = 3 if ( i % 2 == 0 ) else 2 104 | ) 105 | ) 106 | in_shape = down_stack_2[-1].output_size 107 | 108 | self.down_stack_2 = nn.Sequential(*down_stack_2) 109 | 110 | self.merged_output_size = ds_1_end_size + sum( 111 | [l.output_size for l in down_stack_2] 112 | ) 113 | 114 | in_shape = self.merged_output_size 115 | 116 | lstm_stack = [] 117 | for i in range(n_dense_pre_l): 118 | lstm_stack.append( 119 | CGLLayer( 120 | in_shape, w_dense_pre_l, kernel_size=1, type="cnn", dropout=dropout 121 | ) 122 | ) 123 | in_shape = lstm_stack[-1].output_size 124 | 125 | for i in range(n_l): 126 | lstm_stack.append( 127 | CGLLayer( 128 | in_shape, 129 | w_l, 130 | cnn_kernel_size, # unused when 
type!-='cnn' 131 | type="lstm", 132 | dropout=dropout, 133 | ) 134 | ) 135 | in_shape = lstm_stack[-1].output_size 136 | 137 | for i in range(n_dense_post_l): 138 | lstm_stack.append( 139 | CGLLayer( 140 | in_shape, w_dense_post_l, kernel_size=1, type="cnn", dropout=dropout 141 | ) 142 | ) 143 | in_shape = lstm_stack[-1].output_size 144 | 145 | self.lstm_stack = nn.Sequential(*lstm_stack) 146 | 147 | # [batch, chan, seq] 148 | 149 | end_stacks = [] 150 | for num_output_classes in self.num_output_classes: 151 | end_stacks.append( 152 | nn.Sequential( 153 | nn.Dropout(dropout), 154 | # # This sort of Conv1D acts as a time-distributed Dense layer. 155 | nn.Linear(in_shape, num_output_classes), 156 | # nn.Conv1d( 157 | # in_shape, num_output_classes, 1 158 | # ), # time-distributed dense 159 | ) 160 | # CGLLayer( 161 | # in_shape, 162 | # num_output_classes, 163 | # kernel_size=1, 164 | # type="cnn", 165 | # dropout=dropout, 166 | # batch_norm=False 167 | # ) 168 | ) 169 | 170 | self.end_stacks = nn.ModuleList(end_stacks) 171 | 172 | def _forward(self, X, **kwargs): 173 | """(N, C_{in}, L_{in})""" 174 | Xs = [X] # [batch, chan, seq] 175 | Xs.append(self.down_stack_1(Xs[-1])) 176 | 177 | to_merge = [Xs[-1]] 178 | for module in self.down_stack_2: 179 | output = module(Xs[-1]) 180 | Xs.append(output) 181 | to_merge.append( 182 | F.interpolate( 183 | output, 184 | size=to_merge[0].shape[-1], 185 | mode="linear", 186 | align_corners=False, 187 | ) 188 | ) 189 | 190 | merged = torch.cat(to_merge, dim=1) 191 | Xs.append(merged) 192 | Xs.append(self.lstm_stack(Xs[-1])) 193 | 194 | if self.keep_intermediates: 195 | self.Xs = Xs 196 | 197 | ys = [] 198 | 199 | # (N, C_{in}, L_{in}) 200 | 201 | for end_stack in self.end_stacks: 202 | # (N, C_{in}, L_{in}) => # (N, L_{in}, C_{in},) 203 | x = Xs[-1].permute([0, 2, 1]) 204 | x = end_stack(x) 205 | x = x.permute([0, 2, 1]) 206 | ys.append(x) 207 | 208 | # ys = [end_stack(Xs[-1]) for end_stack in self.end_stacks] 209 | 210 | # No softmax because the pytorch cross_entropy loss function wants the raw outputs. 
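        # Illustrative shape walk-through (an editorial note, assuming the defaults
        # win_len=512, stride_amt=2, n_strided=3, n_interp=4): down_stack_1 maps
        # [N, C_in, 512] -> [N, w, 64] (output_stride = 2**3 = 8); down_stack_2
        # continues down to lengths 32, 16, 8, and 8, and each of its outputs is
        # F.interpolate'd back to length 64 and concatenated along dim=1 with the
        # down_stack_1 output before the lstm_stack and end_stacks run.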
211 | 212 | return ys 213 | -------------------------------------------------------------------------------- /filternet/models/filter_net_ensemble.py: -------------------------------------------------------------------------------- 1 | # Copyright (C) 2020 Pet Insight Project - All Rights Reserved 2 | 3 | import torch 4 | from torch import nn 5 | 6 | from .base_net import BaseNet 7 | 8 | 9 | class FilterNetEnsemble(BaseNet): 10 | variance_penalty = 0.0 11 | 12 | def build(self, **config): 13 | pass 14 | 15 | def set_models(self, models): 16 | self.model = nn.ModuleList([m for m in models]) 17 | 18 | def _forward(self, X, **kwargs): 19 | """(N, C_{in}, L_{in})""" 20 | outputs_list = [sub_model(X) for sub_model in self.model] 21 | outputs = [] 22 | for i in range(len(self.model[0].num_output_classes)): 23 | output_ = torch.stack([_outputs[i] for _outputs in outputs_list]) 24 | if self.variance_penalty: 25 | s = torch.std(output_, dim=0) 26 | mean = output = torch.mean(output_, dim=0) 27 | output = mean - self.variance_penalty * s 28 | else: 29 | output = torch.mean(output_, dim=0) 30 | outputs.append(output) 31 | # output, _ = torch.median(outputs, dim=0) 32 | # output = torch.log(torch.mean(torch.softmax(outputs, dim=2), dim=0)) 33 | 34 | return outputs 35 | 36 | def transform_targets(self, y, one_hot=True): 37 | return self.model[0].transform_targets(y, one_hot=one_hot) 38 | -------------------------------------------------------------------------------- /filternet/models/reference_architectures.py: -------------------------------------------------------------------------------- 1 | # Copyright (C) 2020 Pet Insight Project - All Rights Reserved 2 | 3 | """ Reference architectures """ 4 | from copy import deepcopy 5 | 6 | ref_archs = { 7 | "base_cnn": { 8 | "n_pre": 1, 9 | "n_strided": 3, 10 | "n_interp": 0, 11 | "n_dense_pre_l": 0, 12 | "n_l": 0, 13 | "n_dense_post_l": 0, 14 | }, 15 | "multi_scale_cnn": { 16 | "n_pre": 1, 17 | "n_strided": 3, 18 | "n_interp": 4, 19 | "n_dense_pre_l": 1, 20 | "n_l": 0, 21 | "n_dense_post_l": 0, 22 | }, 23 | "base_lstm": { 24 | "n_pre": 0, 25 | "lr_decay": 1.0, 26 | "n_strided": 0, 27 | "n_interp": 0, 28 | "n_dense_pre_l": 0, 29 | "n_l": 1, 30 | "n_dense_post_l": 0, 31 | }, 32 | "cnn_lstm": { 33 | "n_pre": 1, 34 | "n_strided": 3, 35 | "n_interp": 0, 36 | "n_dense_pre_l": 1, 37 | "n_l": 1, 38 | "n_dense_post_l": 0, 39 | }, 40 | "multi_scale_cnn_lstm": { 41 | "n_pre": 1, 42 | "n_strided": 3, 43 | "n_interp": 4, 44 | "n_dense_pre_l": 1, 45 | "n_l": 1, 46 | "n_dense_post_l": 0, 47 | }, 48 | } 49 | 50 | 51 | def get_ref_arch(name): 52 | return deepcopy(ref_archs[name]) 53 | -------------------------------------------------------------------------------- /filternet/mputil.py: -------------------------------------------------------------------------------- 1 | # Copyright (C) 2020 Pet Insight Project - All Rights Reserved 2 | 3 | import time 4 | 5 | 6 | class Timer(object): 7 | """ Convenience class for timing code w/ a context timer. 8 | 9 | Note that timer prints wall time, but also keeps track of 10 | cpu time internally (you can access via timer.interval_cpu 11 | 12 | Not super accurate, but super convenient! 13 | 14 | with Timer('Name of timer'): 15 | do_slow_stuff(...) 16 | do_some_other_stuff(...) 17 | 18 | # Once code reaches here, timing and name info will be printed to stdout. 
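    Typical output (illustrative; see __enter__/__exit__ below for the exact
    format strings):

        / / / / [Name of timer ...]
        \ \ \ \ 1.234 s wall (1.201 s cpu) [... Name of timer]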
19 | 20 | """ 21 | 22 | def __init__(self, name=__name__, log_output=True): 23 | self.name = name 24 | self.log_output = log_output 25 | 26 | def __enter__(self): 27 | self.start_wall = time.perf_counter() 28 | self.start_cpu = time.process_time() 29 | if self.log_output: 30 | print(f"/ / / / [{self.name} ...]") 31 | return self 32 | 33 | def __exit__(self, *args): 34 | self.end_wall = time.perf_counter() 35 | self.end_cpu = time.process_time() 36 | self.interval_wall = self.end_wall - self.start_wall 37 | self.interval_cpu = self.end_cpu - self.start_cpu 38 | if self.log_output: 39 | print( 40 | f"\\ \\ \\ \\ {self.interval_wall:.03f} s wall ({self.interval_cpu:.03f} s cpu) [... {self.name}]\n" 41 | ) 42 | -------------------------------------------------------------------------------- /filternet/training/__init__.py: -------------------------------------------------------------------------------- 1 | # Copyright (C) 2020 Pet Insight Project - All Rights Reserved 2 | 3 | -------------------------------------------------------------------------------- /filternet/training/ensemble_train.py: -------------------------------------------------------------------------------- 1 | # Copyright (C) 2020 Pet Insight Project - All Rights Reserved 2 | 3 | import os 4 | import pickle 5 | import typing as ty 6 | 7 | import numpy as np 8 | import torch 9 | import torch.nn.functional as F 10 | import torch.optim 11 | import traits.api as t 12 | from torch.utils.data import DataLoader, TensorDataset 13 | 14 | from filternet import models 15 | from filternet.training.evalmodel import EvalModel 16 | from filternet.training.train import Trainer 17 | 18 | 19 | class EnsembleTrainer(t.HasStrictTraits): 20 | def __init__(self, config={}, **kwargs): 21 | trainer_template = Trainer(**config) 22 | super().__init__(trainer_template=trainer_template, config=config, **kwargs) 23 | 24 | config: dict = t.Dict() 25 | 26 | trainer_template: Trainer = t.Instance(Trainer) 27 | trainers: ty.List[Trainer] = t.List(t.Instance(Trainer)) 28 | 29 | n_folds = t.Int(5) 30 | 31 | dl_test: DataLoader = t.DelegatesTo("trainer_template") 32 | data_spec: dict = t.DelegatesTo("trainer_template") 33 | cuda: bool = t.DelegatesTo("trainer_template") 34 | device: str = t.DelegatesTo("trainer_template") 35 | loss_func: str = t.DelegatesTo("trainer_template") 36 | batch_size: int = t.DelegatesTo("trainer_template") 37 | win_len: int = t.DelegatesTo("trainer_template") 38 | has_null_class: bool = t.DelegatesTo("trainer_template") 39 | predict_null_class: bool = t.DelegatesTo("trainer_template") 40 | name: str = t.Str() 41 | 42 | def _name_default(self): 43 | import time 44 | 45 | modelstr = "Ensemble" 46 | timestr = time.strftime("%Y%m%d-%H%M%S") 47 | return f"{modelstr}_{timestr}" 48 | 49 | X_folds = t.Tuple(transient=True) 50 | ys_folds = t.Tuple(transient=True) 51 | 52 | def _trainers_default(self): 53 | # Temp trainer for grabbing datasets, etc 54 | tt = self.trainer_template 55 | tt.init_data() 56 | 57 | # Combine official train & val sets 58 | X = torch.cat([tt.dl_train.dataset.tensors[0], tt.dl_val.dataset.tensors[0]]) 59 | ys = [ 60 | torch.cat([yt, yv]) 61 | for yt, yv in zip( 62 | tt.dl_train.dataset.tensors[1:], tt.dl_val.dataset.tensors[1:] 63 | ) 64 | ] 65 | # make folds 66 | fold_len = int(np.ceil(len(X) / self.n_folds)) 67 | self.X_folds = torch.split(X, fold_len) 68 | self.ys_folds = [torch.split(y, fold_len) for y in ys] 69 | 70 | trainers = [] 71 | for i_val_fold in range(self.n_folds): 72 | trainer = Trainer( 73 | 
validation_fold=i_val_fold, 74 | name=f"{self.name}/{i_val_fold}", 75 | **self.config, 76 | ) 77 | 78 | trainer.dl_test = tt.dl_test 79 | 80 | trainers.append(trainer) 81 | 82 | return trainers 83 | 84 | model: models.BaseNet = t.Instance(torch.nn.Module, transient=True) 85 | 86 | def _model_default(self): 87 | model = models.FilterNetEnsemble() 88 | model.set_models([trainer.model for trainer in self.trainers]) 89 | return model 90 | 91 | model_path: str = t.Str() 92 | 93 | def _model_path_default(self): 94 | return f"saved_models/{self.name}/" 95 | 96 | def init_data(self): 97 | # Initiate loading of datasets, model 98 | pass 99 | # for trainer in self.trainers: 100 | # trainer.init_data() 101 | 102 | def init_train(self): 103 | pass 104 | # for trainer in self.trainers: 105 | # trainer.init_train() 106 | 107 | def train(self, max_epochs=50): 108 | """ A pretty standard training loop, constrained to stop in `max_epochs` but may stop early if our 109 | custom stopping metric does not improve for `self.patience` epochs. Always checkpoints 110 | when a new best stopping_metric is achieved. An alternative to using 111 | ray.tune for training.""" 112 | 113 | for trainer in self.trainers: 114 | # Add data to trainer 115 | 116 | X_train = torch.cat( 117 | [ 118 | arr 119 | for i, arr in enumerate(self.X_folds) 120 | if i != trainer.validation_fold 121 | ] 122 | ) 123 | ys_train = [ 124 | torch.cat( 125 | [arr for i, arr in enumerate(y) if i != trainer.validation_fold] 126 | ) 127 | for y in self.ys_folds 128 | ] 129 | 130 | X_val = torch.cat( 131 | [ 132 | arr 133 | for i, arr in enumerate(self.X_folds) 134 | if i == trainer.validation_fold 135 | ] 136 | ) 137 | ys_val = [ 138 | torch.cat( 139 | [arr for i, arr in enumerate(y) if i == trainer.validation_fold] 140 | ) 141 | for y in self.ys_folds 142 | ] 143 | 144 | trainer.dl_train = DataLoader( 145 | TensorDataset(torch.Tensor(X_train), *ys_train), 146 | batch_size=trainer.batch_size, 147 | shuffle=True, 148 | ) 149 | trainer.data_spec = self.trainer_template.data_spec 150 | trainer.epoch_iters = self.trainer_template.epoch_iters 151 | trainer.dl_val = DataLoader( 152 | TensorDataset(torch.Tensor(X_val), *ys_val), 153 | batch_size=trainer.batch_size, 154 | shuffle=False, 155 | ) 156 | 157 | # Now clear local vars to save ranm 158 | X_train = ys_train = X_val = ys_val = None 159 | 160 | trainer.init_data() 161 | trainer.init_train() 162 | trainer.train(max_epochs=max_epochs) 163 | 164 | # Clear trainer train and val datasets to save ram 165 | trainer.dl_train = t.Undefined 166 | trainer.dl_val = t.Undefined 167 | 168 | print(f"RESTORING TO best model") 169 | trainer._restore() 170 | trainer._save() 171 | 172 | trainer.print_train_summary() 173 | 174 | em = EvalModel(trainer=trainer) 175 | 176 | em.run_test_set() 177 | em.calc_metrics() 178 | em.calc_ward_metrics() 179 | print(em.classification_report_df.to_string(float_format="%.3f")) 180 | em._save() 181 | 182 | def print_train_summary(self): 183 | for trainer in self.trainers: 184 | trainer.print_train_summary() 185 | 186 | def _save(self, checkpoint_dir=None, save_model=True, save_trainer=True): 187 | """ Saves/checkpoints model state and training state to disk. 
""" 188 | if checkpoint_dir is None: 189 | checkpoint_dir = self.model_path 190 | else: 191 | self.model_path = checkpoint_dir 192 | 193 | os.makedirs(checkpoint_dir, exist_ok=True) 194 | 195 | # save model params 196 | model_path = os.path.join(checkpoint_dir, "model.pth") 197 | trainer_path = os.path.join(checkpoint_dir, "trainer.pth") 198 | 199 | if save_model: 200 | torch.save(self.model.state_dict(), model_path) 201 | if save_trainer: 202 | with open(trainer_path, "wb") as f: 203 | pickle.dump(self, f) 204 | 205 | return checkpoint_dir 206 | 207 | def _restore(self, checkpoint_dir=None): 208 | """ Restores model state and training state from disk. """ 209 | 210 | if checkpoint_dir is None: 211 | checkpoint_dir = self.model_path 212 | 213 | model_path = os.path.join(checkpoint_dir, "model.pth") 214 | trainer_path = os.path.join(checkpoint_dir, "trainer.pth") 215 | 216 | # Reconstitute old trainer and copy state to this trainer. 217 | with open(trainer_path, "rb") as f: 218 | other_trainer = pickle.load(f) 219 | 220 | self.__setstate__(other_trainer.__getstate__()) 221 | 222 | # Load sub-models 223 | for trainer in self.trainers: 224 | trainer._restore() 225 | 226 | # Load model (after loading state in case we need to re-initialize model from config) 227 | self.model.load_state_dict(torch.load(model_path, map_location=self.device)) 228 | # self.model = self.model._model_default() 229 | -------------------------------------------------------------------------------- /filternet/training/evalmodel.py: -------------------------------------------------------------------------------- 1 | # Copyright (C) 2020 Pet Insight Project - All Rights Reserved 2 | 3 | import os 4 | import pickle 5 | import typing as ty 6 | from builtins import AssertionError 7 | 8 | import numpy as np 9 | import pandas as pd 10 | import sklearn.metrics 11 | import torch 12 | import torch.nn.functional as F 13 | import torch.optim 14 | import traits.api as t 15 | from scipy.special import softmax 16 | from torch.utils.data import DataLoader 17 | 18 | from filternet import models as mo 19 | from filternet.mputil import Timer 20 | from filternet.training.train import Trainer 21 | 22 | 23 | class ClassWardMetrics(t.HasStrictTraits): 24 | segment_twoset_results: dict = t.Dict() 25 | event_detailed_scores: dict = t.Dict() 26 | event_standard_scores: dict = t.Dict() 27 | 28 | 29 | class WardMetrics(t.HasStrictTraits): 30 | class_ward_metrics: ty.List[ClassWardMetrics] = t.List(ClassWardMetrics, []) 31 | overall_ward_metrics: ClassWardMetrics = t.Instance(ClassWardMetrics) 32 | df_event_scores: pd.DataFrame = t.Instance(pd.DataFrame()) 33 | df_event_detailed_scores: pd.DataFrame = t.Instance(pd.DataFrame()) 34 | df_segment_2set_results: pd.DataFrame = t.Instance(pd.DataFrame()) 35 | 36 | 37 | class EvalModel(t.HasStrictTraits): 38 | trainer: Trainer = t.Any() 39 | model: mo.BaseNet = t.DelegatesTo("trainer") 40 | dl_test: DataLoader = t.DelegatesTo("trainer") 41 | data_spec: dict = t.DelegatesTo("trainer") 42 | cuda: bool = t.DelegatesTo("trainer") 43 | device: str = t.DelegatesTo("trainer") 44 | loss_func: str = t.DelegatesTo("trainer") 45 | model_path: str = t.DelegatesTo("trainer") 46 | has_null_class: bool = t.DelegatesTo("trainer") 47 | predict_null_class: bool = t.DelegatesTo("trainer") 48 | 49 | # 'prediction' mode employs overlap and reconstructs signal 50 | # as a contiguous timeseries w/ optional windowing. 51 | # It aims for best accuracy/f1 by using overlap, and will 52 | # typically outperform 'training' mode. 
53 | # 'training' mode does not average repeated point and does 54 | # not window; it should product acc/loss/f1 similar to 55 | # training mode. 56 | run_mode: str = t.Enum(["prediction", "training"]) 57 | window: str = t.Enum(["hanning", "boxcar"]) 58 | eval_batch_size: int = t.Int(100) 59 | 60 | target_names: ty.List[str] = t.ListStr() 61 | 62 | def _target_names_default(self): 63 | target_names = self.data_spec["output_spec"][0]["classes"] 64 | 65 | if self.has_null_class: 66 | assert target_names[0] in ("", "Null") 67 | 68 | if not self.predict_null_class: 69 | target_names = target_names[1:] 70 | 71 | return target_names 72 | 73 | def _run_model_on_batch(self, data, targets): 74 | targets = torch.stack(targets) 75 | 76 | if self.cuda: 77 | data, targets = data.cuda(), targets.cuda() 78 | 79 | output = self.model(data) 80 | 81 | _targets = self.model.transform_targets(targets, one_hot=False) 82 | if self.loss_func == "cross_entropy": 83 | _losses = [F.cross_entropy(o, t) for o, t in zip(output, _targets)] 84 | loss = sum(_losses) 85 | elif self.loss_func == "binary_cross_entropy": 86 | _targets_onehot = self.model.transform_targets(targets, one_hot=True) 87 | _losses = [ 88 | F.binary_cross_entropy_with_logits(o, t) 89 | for o, t in zip(output, _targets_onehot) 90 | ] 91 | loss = sum(_losses) 92 | else: 93 | raise NotImplementedError(self.loss) 94 | 95 | # Assume only 1 output: 96 | 97 | return loss, output[0], _targets[0], _losses[0] 98 | 99 | def run_test_set(self, dl=None): 100 | """ Runs `self.model` on `self.dl_test` (or a provided dl) and stores results for subsequent evaluation. """ 101 | if dl is None: 102 | dl = self.dl_test 103 | 104 | if self.cuda: 105 | self.model.cuda() 106 | self.model.eval() 107 | if self.eval_batch_size: 108 | dl = DataLoader(dl.dataset, batch_size=self.eval_batch_size, shuffle=False) 109 | # 110 | # # Xc, yc = data.get_x_y_contig('test') 111 | X, *ys = dl.dataset.tensors 112 | # X: [N, input_chans, win_len] 113 | step = int(X.shape[2] / 2) 114 | assert torch.equal(X[0, :, step], X[1, :, 0]) 115 | 116 | losses = [] 117 | outputsraw = [] 118 | outputs = [] 119 | targets = [] 120 | 121 | with Timer("run", log_output=False) as tr: 122 | with Timer("infer", log_output=False) as ti: 123 | for batch_idx, (data, *target) in enumerate(dl): 124 | ( 125 | batch_loss, 126 | batch_output, 127 | batch_targets, 128 | train_losses, 129 | ) = self._run_model_on_batch(data, target) 130 | 131 | losses.append(batch_loss.detach().cpu().item()) 132 | outputsraw.append(batch_output.detach().cpu().data.numpy()) 133 | outputs.append( 134 | torch.argmax(batch_output, 1, False).detach().cpu().data.numpy() 135 | ) 136 | targets.append(batch_targets.detach().cpu().data.numpy()) 137 | self.infer_time_s_cpu = ti.interval_cpu 138 | self.infer_time_s_wall = ti.interval_wall 139 | 140 | self.loss = np.mean(losses) 141 | targets = np.concatenate(targets, axis=0) # [N, out_win_len] 142 | outputsraw = np.concatenate( 143 | outputsraw, axis=0 144 | ) # [N, n_out_classes, out_win_len] 145 | outputs = np.concatenate(outputs, axis=0) # [N, n_out_classes, out_win_len] 146 | 147 | # win_len = toutputsraw[0].shape[-1] 148 | if ( 149 | self.model.output_type == "many_to_one_takelast" 150 | or self.run_mode == "training" 151 | ): 152 | self.targets = np.concatenate(targets, axis=-1) # [N,] 153 | self.outputsraw = np.concatenate( 154 | outputsraw, axis=-1 155 | ) # [n_out_classes, N,] 156 | self.outputs = np.concatenate(outputs, axis=-1) # [N,] 157 | 158 | elif self.run_mode == "prediction": 159 | 
n_segments, n_classes, out_win_len = outputsraw.shape 160 | 161 | output_step = int(out_win_len / 2) 162 | 163 | if self.window == "hanning": 164 | EPS = 0.001 # prevents divide-by-zero 165 | arr_window = (1 - EPS) * np.hanning(out_win_len) + EPS 166 | elif self.window == "boxcar": 167 | arr_window = np.ones((out_win_len,)) 168 | else: 169 | raise ValueError() 170 | 171 | # Allocate space for merged predictions 172 | if self.has_null_class and not self.predict_null_class: 173 | outputsraw2 = np.zeros( 174 | (n_segments + 1, n_classes - 1, output_step, 2) 175 | ) 176 | window2 = np.zeros( 177 | (n_segments + 1, n_classes - 1, output_step, 2) 178 | ) # [N+1, out_win_len/2, 2] 179 | # Drop in outputs/window vals in the two layers 180 | outputsraw = outputsraw[:, 1:, :] 181 | else: 182 | outputsraw2 = np.zeros((n_segments + 1, n_classes, output_step, 2)) 183 | window2 = np.zeros( 184 | (n_segments + 1, n_classes, output_step, 2) 185 | ) # [N+1, out_win_len/2, 2] 186 | 187 | # Drop in outputs/window vals in the two layers 188 | outputsraw2[:-1, :, :, 0] = outputsraw[:, :, :output_step] 189 | outputsraw2[1:, :, :, 1] = outputsraw[ 190 | :, :, output_step : output_step * 2 191 | ] 192 | window2[:-1, :, :, 0] = arr_window[:output_step] 193 | window2[1:, :, :, 1] = arr_window[output_step : output_step * 2] 194 | 195 | merged_outputsraw = (outputsraw2 * window2).sum(axis=3) / (window2).sum( 196 | axis=3 197 | ) 198 | softmaxed_merged_outputsraw = softmax(merged_outputsraw, axis=1) 199 | merged_outputs = np.argmax(softmaxed_merged_outputsraw, 1) 200 | 201 | self.outputsraw = np.concatenate(merged_outputsraw, axis=-1) 202 | self.outputs = np.concatenate(merged_outputs, axis=-1) 203 | self.targets = np.concatenate( 204 | np.concatenate( 205 | [ 206 | targets[:, :output_step], 207 | targets[[-1], output_step : output_step * 2], 208 | ], 209 | axis=0, 210 | ), 211 | axis=-1, 212 | ) 213 | else: 214 | raise ValueError() 215 | 216 | if self.has_null_class and not self.predict_null_class: 217 | not_null_mask = self.targets > 0 218 | self.outputsraw = self.outputsraw[..., not_null_mask] 219 | self.outputs = self.outputs[not_null_mask] 220 | self.targets = self.targets[not_null_mask] 221 | self.targets -= 1 222 | 223 | self.n_samples_in = np.prod(dl.dataset.tensors[1].shape) 224 | self.n_samples_out = len(self.outputs) 225 | self.infer_samples_per_s = self.n_samples_in / self.infer_time_s_wall 226 | self.run_time_s_cpu = tr.interval_cpu 227 | self.run_time_s_wall = tr.interval_wall 228 | 229 | loss: float = t.Float() 230 | targets: np.ndarray = t.Array() 231 | outputsraw: np.ndarray = t.Array() 232 | outputs: np.ndarray = t.Array() 233 | n_samples_in: int = t.Int() 234 | n_samples_out: int = t.Int() 235 | infer_samples_per_s: float = t.Float() 236 | 237 | infer_time_s_cpu: float = t.Float() 238 | infer_time_s_wall: float = t.Float() 239 | run_time_s_cpu: float = t.Float() 240 | run_time_s_wall: float = t.Float() 241 | 242 | extra: dict = t.Dict({}) 243 | 244 | acc: float = t.Float() 245 | f1: float = t.Float() 246 | f1_mean: float = t.Float() 247 | event_f1: float = t.Float() 248 | classification_report_txt: str = t.Str() 249 | classification_report_dict: dict = t.Dict() 250 | classification_report_df: pd.DataFrame = t.Property(t.Instance(pd.DataFrame)) 251 | confusion_matrix: np.ndarray = t.Array() 252 | 253 | nonull_acc: float = t.Float() 254 | nonull_f1: float = t.Float() 255 | nonull_f1_mean: float = t.Float() 256 | nonull_classification_report_txt: str = t.Str() 257 | nonull_classification_report_dict: 
dict = t.Dict()
258 |     nonull_classification_report_df: pd.DataFrame = t.Property(t.Instance(pd.DataFrame))
259 |     nonull_confusion_matrix: np.ndarray = t.Array()
260 | 
261 |     def calc_metrics(self):
262 | 
263 |         self.acc = sklearn.metrics.accuracy_score(self.targets, self.outputs)
264 |         self.f1 = sklearn.metrics.f1_score(
265 |             self.targets, self.outputs, average="weighted"
266 |         )
267 |         self.f1_mean = sklearn.metrics.f1_score(
268 |             self.targets, self.outputs, average="macro"
269 |         )
270 | 
271 |         self.classification_report_txt = sklearn.metrics.classification_report(
272 |             self.targets,
273 |             self.outputs,
274 |             digits=3,
275 |             labels=np.arange(len(self.target_names)),
276 |             target_names=self.target_names,
277 |         )
278 |         self.classification_report_dict = sklearn.metrics.classification_report(
279 |             self.targets,
280 |             self.outputs,
281 |             digits=3,
282 |             output_dict=True,
283 |             labels=np.arange(len(self.target_names)),
284 |             target_names=self.target_names,
285 |         )
286 |         self.confusion_matrix = sklearn.metrics.confusion_matrix(
287 |             self.targets, self.outputs
288 |         )
289 | 
290 |         # Now, ignoring the null/none class:
291 |         if self.has_null_class and self.predict_null_class:
292 |             # assume the null class comes first
293 |             nonull_mask = self.targets > 0
294 |             nonull_targets = self.targets[nonull_mask]
295 |             # nonull_outputs = self.outputs[nonull_mask]
296 |             nonull_outputs = self.outputsraw[1:, :].argmax(axis=0)[nonull_mask] + 1
297 | 
298 |             self.nonull_acc = sklearn.metrics.accuracy_score(
299 |                 nonull_targets, nonull_outputs
300 |             )
301 |             self.nonull_f1 = sklearn.metrics.f1_score(
302 |                 nonull_targets, nonull_outputs, average="weighted"
303 |             )
304 |             self.nonull_f1_mean = sklearn.metrics.f1_score(
305 |                 nonull_targets, nonull_outputs, average="macro"
306 |             )
307 |             self.nonull_classification_report_txt = sklearn.metrics.classification_report(
308 |                 nonull_targets,
309 |                 nonull_outputs,
310 |                 digits=3,
311 |                 labels=np.arange(len(self.target_names)),
312 |                 target_names=self.target_names,
313 |             )
314 |             self.nonull_classification_report_dict = sklearn.metrics.classification_report(
315 |                 nonull_targets,
316 |                 nonull_outputs,
317 |                 digits=3,
318 |                 output_dict=True,
319 |                 labels=np.arange(len(self.target_names)),
320 |                 target_names=self.target_names,
321 |             )
322 |             self.nonull_confusion_matrix = sklearn.metrics.confusion_matrix(
323 |                 nonull_targets, nonull_outputs
324 |             )
325 |         else:
326 |             self.nonull_acc = self.acc
327 |             self.nonull_f1 = self.f1
328 |             self.nonull_f1_mean = self.f1_mean
329 |             self.nonull_classification_report_txt = self.classification_report_txt
330 |             self.nonull_classification_report_dict = self.classification_report_dict
331 |             self.nonull_confusion_matrix = self.confusion_matrix
332 | 
333 |     ward_metrics: WardMetrics = t.Instance(WardMetrics)
334 | 
335 |     def calc_ward_metrics(self):
336 |         """ Do event-wise metrics, using the `wardmetrics` package which implements metrics from:
337 | 
338 |         [1] J. A. Ward, P. Lukowicz, and H. W. Gellersen, “Performance metrics for activity recognition,”
339 |         ACM Trans. Intell. Syst. Technol., vol. 2, no. 1, pp. 1–23, Jan. 2011.
340 | """ 341 | 342 | import wardmetrics 343 | 344 | # Must be in prediction mode -- otherwise, data is not contiguous, ward metrics will be bogus 345 | assert self.run_mode == "prediction" 346 | 347 | targets = self.targets 348 | predictions = self.outputs 349 | 350 | wmetrics = WardMetrics() 351 | 352 | targets_events = wardmetrics.frame_results_to_events(targets) 353 | preds_events = wardmetrics.frame_results_to_events(predictions) 354 | 355 | for i, class_name in enumerate(self.target_names): 356 | class_wmetrics = ClassWardMetrics() 357 | 358 | t = targets_events.get(str(i), []) 359 | p = preds_events.get(str(i), []) 360 | # class_wmetrics['t'] = t 361 | # class_wmetrics['p'] = p 362 | 363 | try: 364 | assert len(t) and len(p) 365 | ( 366 | twoset_results, 367 | segments_with_scores, 368 | segment_counts, 369 | normed_segment_counts, 370 | ) = wardmetrics.eval_segments(t, p) 371 | class_wmetrics.segment_twoset_results = twoset_results 372 | 373 | ( 374 | gt_event_scores, 375 | det_event_scores, 376 | detailed_scores, 377 | standard_scores, 378 | ) = wardmetrics.eval_events(t, p) 379 | class_wmetrics.event_detailed_scores = detailed_scores 380 | class_wmetrics.event_standard_scores = standard_scores 381 | except (AssertionError, ZeroDivisionError) as e: 382 | class_wmetrics.segment_twoset_results = {} 383 | class_wmetrics.event_detailed_scores = {} 384 | class_wmetrics.event_standard_scores = {} 385 | # print("Empty Results or targets for a class.") 386 | # raise ValueError("Empty Results or targets for a class.") 387 | 388 | wmetrics.class_ward_metrics.append(class_wmetrics) 389 | 390 | tt = [] 391 | pp = [] 392 | for i, class_name in enumerate(self.target_names): 393 | # skip null class for combined eventing: 394 | if class_name in ("", "Null"): 395 | continue 396 | 397 | if len(tt) or len(pp): 398 | offset = np.max(tt + pp) + 2 399 | else: 400 | offset = 0 401 | [(a + offset, b + offset) for (a, b) in t] 402 | 403 | t = targets_events.get(str(i), []) 404 | p = preds_events.get(str(i), []) 405 | 406 | tt += [(a + offset, b + offset) for (a, b) in t] 407 | pp += [(a + offset, b + offset) for (a, b) in p] 408 | 409 | t = tt 410 | p = pp 411 | 412 | class_wmetrics = ClassWardMetrics() 413 | assert len(t) and len(p) 414 | ( 415 | twoset_results, 416 | segments_with_scores, 417 | segment_counts, 418 | normed_segment_counts, 419 | ) = wardmetrics.eval_segments(t, p) 420 | class_wmetrics.segment_twoset_results = twoset_results 421 | 422 | ( 423 | gt_event_scores, 424 | det_event_scores, 425 | detailed_scores, 426 | standard_scores, 427 | ) = wardmetrics.eval_events(t, p) 428 | class_wmetrics.event_detailed_scores = detailed_scores 429 | class_wmetrics.event_standard_scores = standard_scores 430 | 431 | # Reformat as dataframe for easier calculations 432 | df = pd.DataFrame( 433 | [cm.event_standard_scores for cm in wmetrics.class_ward_metrics], 434 | index=self.target_names, 435 | ) 436 | df.loc["all_nonull"] = class_wmetrics.event_standard_scores 437 | 438 | # Calculate F1's to summarize recall/precision for each class 439 | df["f1"] = ( 440 | 2 * (df["precision"] * df["recall"]) / (df["precision"] + df["recall"]) 441 | ) 442 | df["f1 (weighted)"] = ( 443 | 2 444 | * (df["precision (weighted)"] * df["recall (weighted)"]) 445 | / (df["precision (weighted)"] + df["recall (weighted)"]) 446 | ) 447 | 448 | # Load dataframes into dictionary output 449 | wmetrics.df_event_scores = df 450 | wmetrics.df_event_detailed_scores = pd.DataFrame( 451 | [cm.event_detailed_scores for cm in 
wmetrics.class_ward_metrics], 452 | index=self.target_names, 453 | ) 454 | wmetrics.df_segment_2set_results = pd.DataFrame( 455 | [cm.segment_twoset_results for cm in wmetrics.class_ward_metrics], 456 | index=self.target_names, 457 | ) 458 | wmetrics.overall_ward_metrics = class_wmetrics 459 | 460 | self.ward_metrics = wmetrics 461 | self.event_f1 = self.ward_metrics.df_event_scores.loc["all_nonull", "f1"] 462 | 463 | def _get_classification_report_df(self): 464 | df = pd.DataFrame(self.classification_report_dict).T 465 | 466 | # Include Ward-metrics-derived "Event F1 (unweighted by length)" 467 | if self.ward_metrics: 468 | df["event_f1"] = self.ward_metrics.df_event_scores["f1"] 469 | else: 470 | df["event_f1"] = np.nan 471 | 472 | # Calculate various summary averages 473 | df.loc["macro avg", "event_f1"] = df["event_f1"].iloc[:-3].mean() 474 | df.loc["weighted avg", "event_f1"] = ( 475 | df["event_f1"].iloc[:-3] * df["support"].iloc[:-3] 476 | ).sum() / df["support"].iloc[:-3].sum() 477 | 478 | df["support"] = df["support"].astype(int) 479 | 480 | return df 481 | 482 | def _get_nonull_classification_report_df(self): 483 | target_names = self.target_names 484 | if not (target_names[0] in ("", "Null")): 485 | return None 486 | 487 | df = pd.DataFrame(self.nonull_classification_report_dict).T 488 | 489 | df["support"] = df["support"].astype(int) 490 | 491 | return df 492 | 493 | def _save(self, checkpoint_dir=None): 494 | """ Saves/checkpoints model state and training state to disk. """ 495 | if checkpoint_dir is None: 496 | checkpoint_dir = self.model_path 497 | 498 | os.makedirs(checkpoint_dir, exist_ok=True) 499 | 500 | # save model params 501 | evalmodel_path = os.path.join(checkpoint_dir, "evalmodel.pth") 502 | 503 | with open(evalmodel_path, "wb") as f: 504 | pickle.dump(self, f) 505 | 506 | return checkpoint_dir 507 | 508 | def _restore(self, checkpoint_dir=None): 509 | """ Restores model state and training state from disk. """ 510 | 511 | if checkpoint_dir is None: 512 | checkpoint_dir = self.model_path 513 | 514 | evalmodel_path = os.path.join(checkpoint_dir, "evalmodel.pth") 515 | 516 | # Reconstitute old trainer and copy state to this trainer. 
517 | with open(evalmodel_path, "rb") as f: 518 | other_evalmodel = pickle.load(f) 519 | 520 | self.__setstate__(other_evalmodel.__getstate__()) 521 | 522 | self.trainer._restore(checkpoint_dir) 523 | 524 | 525 | def load_eval_model_from_dir(checkpoint_dir: str): 526 | em = EvalModel() 527 | em._restore(checkpoint_dir) 528 | return em 529 | -------------------------------------------------------------------------------- /filternet/training/train.py: -------------------------------------------------------------------------------- 1 | # Copyright (C) 2020 Pet Insight Project - All Rights Reserved 2 | 3 | import os 4 | import pickle 5 | import typing as ty 6 | 7 | import numpy as np 8 | import pandas as pd 9 | import sklearn.metrics 10 | import torch 11 | import torch.nn.functional as F 12 | import torch.optim 13 | import traits.api as t 14 | from torch.utils.data import DataLoader, TensorDataset 15 | 16 | from filternet import models 17 | from filternet.datasets import sliding_window_x_y 18 | from filternet.models.reference_architectures import get_ref_arch 19 | from filternet.mputil import Timer 20 | 21 | 22 | class EpochMetrics(t.HasStrictTraits): 23 | f1: float = t.Float() 24 | loss: float = t.Float() 25 | acc: float = t.Float() 26 | 27 | 28 | class EpochRecord(t.HasStrictTraits): 29 | epoch: int = t.Int() 30 | train: EpochMetrics = t.Instance(EpochMetrics) 31 | val: EpochMetrics = t.Instance(EpochMetrics) 32 | 33 | lr: float = t.Float() 34 | iter_s_cpu: float = t.Float() 35 | iter_s_wall: float = t.Float() 36 | should_checkpoint: bool = t.Bool(False) 37 | done: bool = t.Bool(False) 38 | stopping_metric: float = t.Float() 39 | 40 | def to_dict(self): 41 | d = { 42 | k: v 43 | for k, v in self.__dict__.items() 44 | if v is not None and not type(v) == EpochMetrics 45 | } 46 | for f in ["train", "val"]: 47 | em = getattr(self, f) 48 | if em: 49 | for k, v in em.__dict__.items(): 50 | if v is not None: 51 | d[f"{f}_{k}"] = v 52 | return d 53 | 54 | 55 | class TrainState(t.HasStrictTraits): 56 | epoch_records: ty.List[EpochRecord] = t.List(EpochRecord, []) 57 | best_sm: float = t.Float() 58 | best_loss: float = t.Float() 59 | best_f1: float = t.Float() 60 | extra: dict = t.Dict() 61 | 62 | def to_df(self): 63 | return ( 64 | pd.DataFrame.from_records([er.to_dict() for er in self.epoch_records]) 65 | .set_index("epoch") 66 | .sort_index(axis=1) 67 | ) 68 | 69 | 70 | class Trainer(t.HasStrictTraits): 71 | model: models.BaseNet = t.Instance(torch.nn.Module, transient=True) 72 | 73 | def _model_default(self): 74 | 75 | # Merge 'base config' (if requested) and any overrides in 'model_config' 76 | if self.base_config: 77 | model_config = get_ref_arch(self.base_config) 78 | else: 79 | model_config = {} 80 | if self.model_config: 81 | model_config.update(self.model_config) 82 | if self.data_spec: 83 | model_config.update( 84 | { 85 | "input_channels": self.data_spec["input_channels"], 86 | "num_output_classes": [ 87 | s["num_classes"] for s in self.data_spec["output_spec"] 88 | ], 89 | } 90 | ) 91 | # create model accordingly 92 | model_class = getattr(models, self.model_class) 93 | return model_class(**model_config) 94 | 95 | base_config: str = t.Str() 96 | model_config: dict = t.Dict() 97 | model_class: str = t.Enum("FilterNet", "DeepConvLSTM") 98 | 99 | lr_exp: float = t.Float(-3.0) 100 | batch_size: int = t.Int() 101 | win_len: int = t.Int(512) 102 | n_samples_per_batch: int = t.Int(5000) 103 | train_step: int = t.Int(16) 104 | seed: int = t.Int() 105 | decimation: int = t.Int(1) 106 | optim_type: 
str = t.Enum(["Adam", "SGD", "RMSprop"])
107 |     loss_func: str = t.Enum(["cross_entropy", "binary_cross_entropy"])
108 |     patience: int = t.Int(10)
109 |     lr_decay: float = t.Float(0.95)
110 |     weight_decay: float = t.Float(1e-4)
111 |     alpha: float = t.Float(0.99)
112 |     momentum: float = t.Float(0.25)
113 |     validation_fold: int = t.Int()
114 |     epoch_size: float = t.Float(2.0)
115 |     y_cols: str = t.Str()
116 |     sensor_subset: str = t.Str()
117 | 
118 |     has_null_class: bool = t.Bool()
119 | 
120 |     def _has_null_class_default(self):
121 |         return self.data_spec["output_spec"][0]["classes"][0] in ("", "Null")
122 | 
123 |     predict_null_class: bool = t.Bool(True)
124 | 
125 |     _class_weights: torch.Tensor = t.Instance(torch.Tensor)
126 | 
127 |     def __class_weights_default(self):
128 |         # No class weights for now: they didn't seem to improve results significantly and
129 |         # added yet another hyper-parameter. Using zero for the null class didn't seem to work well.
130 |         if False and self.has_null_class and not self.predict_null_class:
131 |             cw = torch.ones(self.model.num_output_classes, device=self.device)
132 |             cw[0] = 0.01
133 |             cw /= cw.sum()
134 |             return cw
135 |         return None
136 | 
137 |     dataset: str = t.Enum(
138 |         ["opportunity", "smartphone_hapt", "har", "intention_recognition"]
139 |     )
140 |     name: str = t.Str()
141 | 
142 |     def _name_default(self):
143 |         import time
144 | 
145 |         modelstr = self.model.__class__.__name__
146 |         timestr = time.strftime("%Y%m%d-%H%M%S")
147 |         return f"{modelstr}_{timestr}"
148 | 
149 |     model_path: str = t.Str()
150 | 
151 |     def _model_path_default(self):
152 |         return f"saved_models/{self.name}/"
153 | 
154 |     data_spec: dict = t.Any()
155 |     epoch_iters: int = t.Int(0)
156 |     train_state: TrainState = t.Instance(TrainState, ())
157 |     cp_iter: int = t.Int()
158 | 
159 |     cuda: bool = t.Bool(transient=True)
160 | 
161 |     def _cuda_default(self):
162 |         return torch.cuda.is_available()
163 | 
164 |     device: str = t.Str(transient=True)
165 | 
166 |     def _device_default(self):
167 |         return "cuda" if self.cuda else "cpu"
168 | 
169 |     dl_train: DataLoader = t.Instance(DataLoader, transient=True)
170 | 
171 |     def _dl_train_default(self):
172 |         return self._get_dl("train")
173 | 
174 |     dl_val: DataLoader = t.Instance(DataLoader, transient=True)
175 | 
176 |     def _dl_val_default(self):
177 |         return self._get_dl("val")
178 | 
179 |     dl_test: DataLoader = t.Instance(DataLoader, transient=True)
180 | 
181 |     def _dl_test_default(self):
182 |         return self._get_dl("test")
183 | 
184 |     def _get_dl(self, s):
185 | 
186 |         if self.dataset == "opportunity":
187 |             from filternet.datasets.opportunity import get_x_y_contig
188 |         elif self.dataset == "smartphone_hapt":
189 |             from filternet.datasets.smartphone_hapt import get_x_y_contig
190 |         elif self.dataset == "har":
191 |             from filternet.datasets.har import get_x_y_contig
192 |         elif self.dataset == "intention_recognition":
193 |             from filternet.datasets.intention_recognition import get_x_y_contig
194 |         else:
195 |             raise ValueError(f"Unknown dataset {self.dataset}")
196 | 
197 |         kwargs = {}
198 |         if self.y_cols:
199 |             kwargs["y_cols"] = self.y_cols
200 |         if self.sensor_subset:
201 |             kwargs["sensor_subset"] = self.sensor_subset
202 | 
203 |         Xc, ycs, data_spec = get_x_y_contig(s, **kwargs)
204 | 
205 |         if s == "train":
206 |             # Training shuffles, and we set epoch size to length of the dataset. We can set train_step as
207 |             # small as we want to get more windows; we'll only run len(Xc)/win_len of them in each training
208 |             # epoch.
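            # Illustrative arithmetic (an editorial note): with win_len=512 and
            # train_step=16, a 100,000-sample recording yields
            # ~(100000 - 512) / 16 ≈ 6218 windows, but _train_epoch stops after
            # roughly epoch_iters * epoch_size samples' worth of them.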
209 | self.epoch_iters = int(len(Xc) / self.decimation) 210 | X, ys = sliding_window_x_y( 211 | Xc, ycs, win_len=self.win_len, step=self.train_step, shuffle=False 212 | ) 213 | # Set the overall data spec using the training set, 214 | # and modify later if more info is needed. 215 | self.data_spec = data_spec 216 | else: 217 | # Val and test data are not shuffled. 218 | # Each point is inferred ~twice b/c step = win_len/2 219 | X, ys = sliding_window_x_y( 220 | Xc, 221 | ycs, 222 | win_len=self.win_len, 223 | step=int(self.win_len / 2), 224 | shuffle=False, # Cannot be true with windows 225 | ) 226 | 227 | dl = DataLoader( 228 | TensorDataset(torch.Tensor(X), *[torch.Tensor(y).long() for y in ys]), 229 | batch_size=self.batch_size, 230 | shuffle=True if s == "train" else False, 231 | ) 232 | return dl 233 | 234 | def _batch_size_default(self): 235 | batch_size = int(self.n_samples_per_batch / self.win_len) 236 | print(f"Batch size: {batch_size}") 237 | return batch_size 238 | 239 | optimizer = t.Any(transient=True) 240 | 241 | def _optimizer_default(self): 242 | if self.optim_type == "SGD": 243 | optimizer = torch.optim.SGD( 244 | self.model.parameters(), 245 | lr=10 ** (self.lr_exp), 246 | momentum=self.momentum, 247 | weight_decay=self.weight_decay, 248 | ) 249 | elif self.optim_type == "Adam": 250 | optimizer = torch.optim.Adam( 251 | self.model.parameters(), 252 | lr=10 ** (self.lr_exp), 253 | weight_decay=self.weight_decay, 254 | amsgrad=True, 255 | ) 256 | elif self.optim_type == "RMSprop": 257 | optimizer = torch.optim.RMSprop( 258 | self.model.parameters(), 259 | lr=10 ** (self.lr_exp), 260 | alpha=self.alpha, 261 | weight_decay=self.weight_decay, 262 | momentum=self.momentum, 263 | ) 264 | else: 265 | raise NotImplementedError(self.optim_type) 266 | return optimizer 267 | 268 | iteration: int = t.Property(t.Int) 269 | 270 | def _get_iteration(self): 271 | return len(self.train_state.epoch_records) + 1 272 | 273 | lr_scheduler = t.Any(transient=True) 274 | 275 | def _lr_scheduler_default(self): 276 | lr_scheduler = torch.optim.lr_scheduler.ExponentialLR( 277 | self.optimizer, self.lr_decay # , last_epoch=self._iteration 278 | ) 279 | 280 | # If this is being re-instantiated in mid-training, then we must 281 | # iterate scheduler forward to match the training step. 
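        # Illustrative schedule (an editorial note): with lr_exp=-3.0 and
        # lr_decay=0.95, the learning rate after epoch k is 1e-3 * 0.95**k
        # (e.g., ~6.0e-4 after 10 epochs).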
282 | for i in range(self.iteration): 283 | if self.lr_decay != 1: 284 | lr_scheduler.step() 285 | 286 | return lr_scheduler 287 | 288 | ##### 289 | # Training Methods 290 | ## 291 | def _train_batch(self, data, targets): 292 | self.optimizer.zero_grad() 293 | loss, output, _targets, _ = self._run_model_on_batch(data, targets) 294 | loss.backward() 295 | self.optimizer.step() 296 | # if self.max_lr: 297 | # self.lr_scheduler.step() 298 | 299 | return loss, output, _targets 300 | 301 | def _run_model_on_batch(self, data, targets): 302 | targets = torch.stack(targets) 303 | 304 | if self.cuda: 305 | data, targets = data.cuda(), targets.cuda() 306 | 307 | output = self.model(data) 308 | 309 | _targets = self.model.transform_targets(targets, one_hot=False) 310 | if self.loss_func == "cross_entropy": 311 | _losses = [ 312 | F.cross_entropy(o, t, weight=self._class_weights) 313 | for o, t in zip(output, _targets) 314 | ] 315 | loss = sum(_losses) 316 | elif self.loss_func == "binary_cross_entropy": 317 | _targets_onehot = self.model.transform_targets(targets, one_hot=True) 318 | _losses = [ 319 | F.binary_cross_entropy_with_logits(o, t, weight=self._class_weights) 320 | for o, t in zip(output, _targets_onehot) 321 | ] 322 | loss = sum(_losses) 323 | else: 324 | raise NotImplementedError(self.loss) 325 | 326 | # Assume only 1 output: 327 | 328 | return loss, output[0], _targets[0], _losses[0] 329 | 330 | def _calc_validation_loss(self): 331 | running_loss = 0 332 | self.model.eval() 333 | with torch.no_grad(): 334 | for batch_idx, (data, *targets) in enumerate(self.dl_val): 335 | loss, _, _, _ = self._run_model_on_batch(data, targets) 336 | running_loss += loss.item() * data.size(0) 337 | 338 | return running_loss / len(self.dl_val.dataset) 339 | 340 | def _train_epoch(self): 341 | 342 | self.model.train() 343 | 344 | train_losses = [] 345 | train_accs = [] 346 | 347 | for batch_idx, (data, *targets) in enumerate(self.dl_train): 348 | if ( 349 | batch_idx * data.shape[0] * data.shape[2] 350 | > self.epoch_iters * self.epoch_size 351 | ): 352 | # we've effectively finished one epoch worth of data; break! 
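                # (Illustrative numbers: with the defaults n_samples_per_batch=5000
                # and win_len=512, batch_size is 9, so each batch covers
                # 9 * 512 = 4608 samples.)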
353 |                 break
354 | 
355 |             batch_loss, batch_output, batch_targets = self._train_batch(data, targets)
356 |             train_losses.append(batch_loss.detach().cpu().item())
357 |             batch_preds = torch.argmax(batch_output, 1, False)
358 |             train_accs.append(
359 |                 (batch_preds == batch_targets).detach().cpu().float().mean().item()
360 |             )
361 | 
362 |         if self.lr_decay != 1:
363 |             self.lr_scheduler.step()
364 | 
365 |         return EpochMetrics(loss=np.mean(train_losses), acc=np.mean(train_accs))
366 | 
367 |     def _val_epoch(self):
368 |         return self._eval_epoch(self.dl_val)
369 | 
370 |     def _eval_epoch(self, data_loader):
371 |         # Validation
372 |         self.model.eval()
373 | 
374 |         losses = []
375 |         outputs = []
376 |         targets = []
377 | 
378 |         with torch.no_grad():
379 |             for batch_idx, (data, *target) in enumerate(data_loader):
380 |                 (
381 |                     batch_loss,
382 |                     batch_output,
383 |                     batch_targets,
384 |                     _batch_losses,
385 |                 ) = self._run_model_on_batch(data, target)
386 | 
387 |                 losses.append(batch_loss.detach().cpu().item())
388 |                 outputs.append(
389 |                     torch.argmax(batch_output, 1, False)
390 |                     .detach()
391 |                     .cpu()
392 |                     .data.numpy()
393 |                     .flatten()
394 |                 )
395 |                 targets.append(batch_targets.detach().cpu().data.numpy().flatten())
396 | 
397 |         targets = np.hstack(targets)
398 |         outputs = np.hstack(outputs)
399 |         acc = sklearn.metrics.accuracy_score(targets, outputs)
400 |         f1 = sklearn.metrics.f1_score(targets, outputs, average="weighted")
401 | 
402 |         return EpochMetrics(loss=np.mean(losses), acc=acc, f1=f1)
403 | 
404 |     def init_data(self):
405 |         # Initiate loading of datasets, model
406 |         _, _, _ = self.dl_train, self.dl_val, self.dl_test
407 |         _ = self.model
408 | 
409 |     def init_train(self):
410 | 
411 |         # initialization
412 |         if self.seed:
413 |             torch.manual_seed(self.seed)
414 |         if self.cuda:
415 |             if self.seed:
416 |                 torch.cuda.manual_seed(self.seed)
417 |         self.model.to(self.device)
418 | 
419 |     def train_one_epoch(self, verbose=True) -> EpochRecord:
420 |         """ Train a single epoch -- method tailored to the Ray.tune methodology."""
421 |         epoch_record = EpochRecord(epoch=len(self.train_state.epoch_records))
422 |         self.train_state.epoch_records.append(epoch_record)
423 | 
424 |         with Timer("Train Epoch", log_output=verbose) as t:
425 |             epoch_record.train = self._train_epoch()
426 |             epoch_record.iter_s_cpu = t.interval_cpu
427 |             epoch_record.iter_s_wall = t.interval_wall
428 |             epoch_record.lr = self.optimizer.param_groups[0]["lr"]
429 | 
430 |         with Timer("Val Epoch", log_output=verbose):
431 |             epoch_record.val = self._val_epoch()
432 | 
433 |         df = self.train_state.to_df()
434 | 
435 |         # Early stopping / checkpointing implementation
436 |         df["raw_metric"] = df.val_loss / df.val_f1
437 |         df["ewma_smoothed_loss"] = (
438 |             df["raw_metric"].ewm(ignore_na=False, halflife=3).mean()
439 |         )
440 |         df["instability_penalty"] = (
441 |             df["raw_metric"].rolling(5, min_periods=3).std().fillna(0.75)
442 |         )
443 |         stopping_metric = df["stopping_metric"] = (
444 |             df["ewma_smoothed_loss"] + df["instability_penalty"]
445 |         )
446 |         epoch_record.stopping_metric = df["stopping_metric"].iloc[-1]
447 | 
448 |         idx_this_iter = stopping_metric.index.max()
449 |         idx_best_yet = stopping_metric.idxmin()
450 |         self.train_state.best_sm = df.loc[idx_best_yet, "stopping_metric"]
451 |         self.train_state.best_loss = df.loc[idx_best_yet, "val_loss"]
452 |         self.train_state.best_f1 = df.loc[idx_best_yet, "val_f1"]
453 | 
454 |         if idx_best_yet == idx_this_iter:
455 |             # Best yet! Checkpoint.
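            # (A new minimum of the smoothed metric above -- the ewma of
            #  val_loss / val_f1 plus a rolling-std instability penalty,
            #  lower is better -- so flag this epoch for checkpointing.)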
456 |             epoch_record.should_checkpoint = True
457 |             self.cp_iter = epoch_record.epoch
458 | 
459 |         else:
460 |             if self.patience is not None:
461 |                 patience_counter = idx_this_iter - idx_best_yet
462 |                 assert patience_counter >= 0
463 |                 if patience_counter > self.patience:
464 |                     if verbose:
465 |                         print(
466 |                             f"Early stop! Out of patience ({patience_counter} > {self.patience})"
467 |                         )
468 |                     epoch_record.done = True
469 | 
470 |         if verbose:
471 |             self.print_train_summary()
472 | 
473 |         return epoch_record
474 | 
475 |     def train(self, max_epochs=50, verbose=True):
476 |         """ A fairly standard training loop: runs for at most `max_epochs` epochs, but may
477 |         stop early if our custom stopping metric does not improve for `self.patience` epochs.
478 |         Always checkpoints when a new best stopping_metric is achieved. An alternative to
479 |         using ray.tune for training."""
480 | 
481 |         self.init_data()
482 |         self.init_train()
483 | 
484 |         while True:
485 |             epoch_record = self.train_one_epoch(verbose=verbose)
486 | 
487 |             if epoch_record.should_checkpoint:
488 |                 last_cp = self._save()
489 |                 if verbose:
490 |                     print(f"<<<< Checkpointed ({last_cp}) >>>>")
491 |             if epoch_record.done:
492 |                 break
493 |             if epoch_record.epoch >= max_epochs:
494 |                 break
495 | 
496 |         # Save trainer state, but not the model.
497 |         self._save(save_model=False)
498 |         if verbose:
499 |             print(self.model_path)
500 | 
501 |     def print_train_summary(self):
502 |         df = self.train_state.to_df()
503 | 
504 |         with pd.option_context(
505 |             "display.max_rows",
506 |             100,
507 |             "display.max_columns",
508 |             100,
509 |             "display.precision",
510 |             3,
511 |             "display.width",
512 |             180,
513 |         ):
514 |             print(df.drop(["done"], axis=1, errors="ignore"))
515 | 
516 |     def _save(self, checkpoint_dir=None, save_model=True, save_trainer=True):
517 |         """ Saves/checkpoints model state and training state to disk. """
518 |         if checkpoint_dir is None:
519 |             checkpoint_dir = self.model_path
520 |         else:
521 |             self.model_path = checkpoint_dir
522 | 
523 |         os.makedirs(checkpoint_dir, exist_ok=True)
524 | 
525 |         # save model params
526 |         model_path = os.path.join(checkpoint_dir, "model.pth")
527 |         trainer_path = os.path.join(checkpoint_dir, "trainer.pth")
528 | 
529 |         if save_model:
530 |             torch.save(self.model.state_dict(), model_path)
531 |         if save_trainer:
532 |             with open(trainer_path, "wb") as f:
533 |                 pickle.dump(self, f)
534 | 
535 |         return checkpoint_dir
536 | 
537 |     def _restore(self, checkpoint_dir=None):
538 |         """ Restores model state and training state from disk. """
539 | 
540 |         if checkpoint_dir is None:
541 |             checkpoint_dir = self.model_path
542 | 
543 |         model_path = os.path.join(checkpoint_dir, "model.pth")
544 |         trainer_path = os.path.join(checkpoint_dir, "trainer.pth")
545 | 
546 |         # Reconstitute old trainer and copy state to this trainer.
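        # (_save() pickled the whole Trainer; traits marked transient=True,
        #  such as the optimizer and lr_scheduler, are dropped from that pickle,
        #  so they are rebuilt from defaults after the state is copied below.)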
547 | with open(trainer_path, "rb") as f: 548 | other_trainer = pickle.load(f) 549 | 550 | self.__setstate__(other_trainer.__getstate__()) 551 | 552 | # Load model (after loading state in case we need to re-initialize model from config) 553 | self.model.load_state_dict(torch.load(model_path, map_location=self.device)) 554 | 555 | # Be careful to reinitialize optimizer and lr scheduler 556 | self.optimizer = self._optimizer_default() 557 | self.lr_scheduler = self._lr_scheduler_default() 558 | 559 | 560 | # 561 | # class EnsembleCNNLSTMTrainable(CNNLSTMTrainable): 562 | # def _setup(self, config={}): 563 | # """Decimation is for speedup during unit testing only.""" 564 | # super()._setup(config=config) 565 | # self.model = mo.FilterNetEnsemble(config=config) 566 | -------------------------------------------------------------------------------- /filternet/training/trainable.py: -------------------------------------------------------------------------------- 1 | # Copyright (C) 2020 Pet Insight Project - All Rights Reserved 2 | 3 | import ray.tune 4 | 5 | from filternet.models.reference_architectures import get_ref_arch 6 | from filternet.training.train import Trainer 7 | 8 | 9 | class MPTrainable(ray.tune.Trainable): 10 | def _setup(self, config={}): 11 | """Decimation is for speedup during unit testing only.""" 12 | if config.get("base_config", False) and "model_config" not in config: 13 | # Use the requested 'base config', updating it with any other 14 | # requested options. 15 | config["model_config"] = get_ref_arch(config["base_config"]) 16 | print(f"Using base config: {config['base_config']}") 17 | 18 | self.trainer = trainer = Trainer(**config) 19 | self.trainer.init_data() 20 | self.trainer.init_train() 21 | 22 | def _train(self): 23 | epoch_record = self.trainer.train_one_epoch() 24 | d = epoch_record.to_dict() 25 | d["mean_loss"] = d["val_loss"] 26 | d["mean_accuracy"] = d["val_acc"] 27 | return d 28 | 29 | def _save(self, checkpoint_dir): 30 | """ Saves/checkpoints model state and training state to disk. """ 31 | 32 | return self.trainer._save(checkpoint_dir) 33 | 34 | def _restore(self, checkpoint_dir): 35 | """ Restores model state and training state from disk. 
""" 36 | return self.trainer._restore(checkpoint_dir) 37 | -------------------------------------------------------------------------------- /multimodal_sensor_fusion.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/WhistleLabs/FilterNet/f5bcf82e8a542197b7b84d3786e808adfdf7c56f/multimodal_sensor_fusion.png -------------------------------------------------------------------------------- /scripts/__init__.py: -------------------------------------------------------------------------------- 1 | # Copyright (C) 2020 Pet Insight Project - All Rights Reserved 2 | 3 | -------------------------------------------------------------------------------- /scripts/run_base_configs_exp.py: -------------------------------------------------------------------------------- 1 | # Copyright (C) 2020 Pet Insight Project - All Rights Reserved 2 | 3 | """ 4 | Executes a series of benchmarking runs to be used in plots & tables in the paper: 5 | * FilterNet reference configurations 6 | * DeepConvLSTM reimpplementation 7 | * .5x ms-c/l 8 | # 2X ms-c/l 9 | 10 | """ 11 | 12 | import sys 13 | 14 | sys.path.insert(0, ".") 15 | 16 | from filternet.models.reference_architectures import ref_archs 17 | from filternet.training.evalmodel import * 18 | 19 | NAME = "base_configs_7" # unique name for this particular run 20 | MAX_EPOCHS = 100 21 | NUM_REPEATS = 10 22 | saved_model_glob = ( 23 | f"saved_models/{NAME}*/evalmodel.pth" # helps NB's to load these models 24 | ) 25 | 26 | 27 | def do_run(): 28 | for i in range(0, NUM_REPEATS): 29 | # Do the FilterNet reference architectures 30 | for ref_arch in ref_archs.keys(): 31 | name = f"{NAME}_{ref_arch}_{i}" 32 | 33 | config = {} 34 | config["base_config"] = ref_arch 35 | config["name"] = f"{NAME}_{ref_arch}_{i}" 36 | 37 | trainer = Trainer(**config) 38 | trainer.init_data() 39 | trainer.init_train() 40 | 41 | trainer.train(max_epochs=MAX_EPOCHS) 42 | 43 | em = EvalModel(trainer=trainer) 44 | em._save() 45 | # Load fresh for consistency 46 | em = load_eval_model_from_dir(em.model_path) 47 | 48 | em.run_test_set() 49 | em.calc_metrics() 50 | em.calc_ward_metrics() 51 | print(em.classification_report_df) 52 | em._save() 53 | 54 | # Also do a matching number of DeepConvLSTMs 55 | name = f"{NAME}_deepconvlstm_{i}" 56 | 57 | config = { 58 | "win_len": 24, 59 | "batch_size": 100, 60 | "model_class": "DeepConvLSTM", 61 | "model_config": {"scale": 1.0}, 62 | } 63 | # config["base_config"] = ref_arch 64 | # config["model_config"] = {} #get_ref_arch("multi_scale_cnn_lstm") 65 | config["name"] = name 66 | 67 | trainer = Trainer(**config) 68 | trainer.init_data() 69 | trainer.init_train() 70 | 71 | trainer.train(max_epochs=MAX_EPOCHS) 72 | 73 | em = EvalModel(trainer=trainer) 74 | em._save() 75 | # Load fresh for consistency 76 | em = load_eval_model_from_dir(em.model_path) 77 | em.run_test_set() 78 | em.calc_metrics() 79 | em.calc_ward_metrics() 80 | print(em.classification_report_df) 81 | em._save() 82 | 83 | # Also do a matching number of .5x scale models. 
84 |         ref_arch = "multi_scale_cnn_lstm"
85 | 
86 |         name = f"{NAME}_mscl_p5x_{i}"
87 | 
88 |         config = {}
89 |         config["base_config"] = ref_arch
90 |         config["model_config"] = {"scale": 0.5}
91 |         config["name"] = name
92 | 
93 |         trainer = Trainer(**config)
94 |         trainer.init_data()
95 |         trainer.init_train()
96 | 
97 |         trainer.train(max_epochs=MAX_EPOCHS)
98 | 
99 |         em = EvalModel(trainer=trainer)
100 |         em._save()
101 |         # Load fresh for consistency
102 |         em = load_eval_model_from_dir(em.model_path)
103 |         em.run_test_set()
104 |         em.calc_metrics()
105 |         em.calc_ward_metrics()
106 |         print(em.classification_report_df)
107 |         em._save()
108 | 
109 |         # Also do a matching number of 2x scale models.
110 |         ref_arch = "multi_scale_cnn_lstm"
111 | 
112 |         name = f"{NAME}_mscl_2x_{i}"
113 | 
114 |         config = {}
115 |         config["base_config"] = ref_arch
116 |         config["model_config"] = {"scale": 2}
117 |         config["name"] = name
118 | 
119 |         trainer = Trainer(**config)
120 |         trainer.init_data()
121 |         trainer.init_train()
122 | 
123 |         trainer.train(max_epochs=MAX_EPOCHS)
124 | 
125 |         em = EvalModel(trainer=trainer)
126 |         em._save()
127 |         # Load fresh for consistency
128 |         em = load_eval_model_from_dir(em.model_path)
129 |         em.run_test_set()
130 |         em.calc_metrics()
131 |         em.calc_ward_metrics()
132 |         print(em.classification_report_df)
133 |         em._save()
134 | 
135 | 
136 | if __name__ == "__main__":
137 |     do_run()
138 | 
--------------------------------------------------------------------------------
/scripts/run_ensemble_exp.py:
--------------------------------------------------------------------------------
1 | # Copyright (C) 2020 Pet Insight Project - All Rights Reserved
2 | 
3 | """
4 | Executes a series of benchmarking runs to be used in plots & tables in the paper:
5 | * Ensembled MS-C/L models with various #'s of folds / submodels
6 | """
7 | 
8 | import sys
9 | 
10 | sys.path.insert(0, ".")
11 | 
12 | from filternet.training.evalmodel import *
13 | from filternet.training.ensemble_train import EnsembleTrainer
14 | 
15 | NAME = "ensembles_3"  # unique name for this particular run
16 | MAX_EPOCHS = 100
17 | NUM_REPEATS = 10
18 | NUM_FOLDS = [2, 3, 4, 5]
19 | saved_model_glob = (
20 |     f"saved_models/{NAME}*/evalmodel.pth"  # helps NB's to load these models
21 | )
22 | 
23 | 
24 | def do_run():
25 |     for i in range(0, NUM_REPEATS):
26 |         # Train an ensemble at each fold count
27 |         for num_folds in NUM_FOLDS:
28 |             name = f"{NAME}_{num_folds}_folds_{i}"
29 | 
30 |             config = {"base_config": "multi_scale_cnn_lstm", "model_config": {}}
31 | 
32 |             trainer = EnsembleTrainer(n_folds=num_folds, name=name, config=config)
33 |             trainer.init_data()
34 | 
35 |             trainer.train(max_epochs=MAX_EPOCHS)
36 |             trainer._save()
37 | 
38 |             em = EvalModel(trainer=trainer)
39 | 
40 |             # annotate extra field, to make post-analysis easier.
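            # (These keys travel with the saved EvalModel, letting the analysis
            #  notebooks group results by experiment, fold count, and repeat
            #  without re-parsing model filenames.)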
41 | em.extra["exp_name"] = NAME 42 | em.extra["num_folds"] = num_folds 43 | em.extra["i_repeat"] = i 44 | 45 | em.run_test_set() 46 | em.calc_metrics() 47 | em.calc_ward_metrics() 48 | print(em.classification_report_df) 49 | em._save() 50 | 51 | 52 | if __name__ == "__main__": 53 | do_run() 54 | -------------------------------------------------------------------------------- /scripts/run_mm_base_configs_exp.py: -------------------------------------------------------------------------------- 1 | # Copyright (C) 2020 Pet Insight Project - All Rights Reserved 2 | 3 | """ 4 | Executes a series of benchmarking runs to be used in plots & tables in the paper: 5 | * Runs for 'multimodal sensor fusion analysis' 6 | 7 | """ 8 | 9 | import sys 10 | 11 | sys.path.insert(0, ".") 12 | 13 | from filternet.models.reference_architectures import ref_archs 14 | from filternet.training.evalmodel import * 15 | from filternet.training.ensemble_train import EnsembleTrainer 16 | 17 | NAME = "mm_base_configs_2" # unique name for this particular run 18 | MAX_EPOCHS = 100 # 0 19 | NUM_REPEATS = 5 # 20 20 | saved_model_glob = ( 21 | f"saved_models/{NAME}*/evalmodel.pth" # helps NB's to load these models 22 | ) 23 | 24 | # Iterate through these architectures 25 | ref_archs = [ 26 | # 'deepconvlstm', # slow 27 | "base_cnn", 28 | "base_lstm", 29 | "cnn_lstm", 30 | "multi_scale_cnn", 31 | "multi_scale_cnn_lstm", 32 | ] 33 | 34 | # For each architecture, train/eval on these sensor subsets 35 | sensor_subsets = [ 36 | "accels", 37 | "gyros", 38 | "accels+gyros", 39 | "accels+gyros+magnetic", 40 | "opportunity", 41 | ] 42 | 43 | 44 | def do_run(): 45 | for i in range(1, NUM_REPEATS): 46 | # Do the FilterNet reference architectures 47 | for ref_arch in ref_archs: 48 | for sensor_subset in sensor_subsets: 49 | name = f"{NAME}_{ref_arch}_{sensor_subset}_{i}" 50 | 51 | config = {} 52 | config["base_config"] = ref_arch 53 | config["name"] = name 54 | config["sensor_subset"] = sensor_subset 55 | 56 | trainer = Trainer(**config) 57 | trainer.init_data() 58 | trainer.init_train() 59 | 60 | trainer.train(max_epochs=MAX_EPOCHS) 61 | 62 | em = EvalModel(trainer=trainer) 63 | em._save() 64 | 65 | # Load fresh for consistency 66 | em = load_eval_model_from_dir(em.model_path) 67 | 68 | em.run_test_set() 69 | em.calc_metrics() 70 | em.calc_ward_metrics() 71 | print(em.classification_report_df) 72 | print(f"Weighted F1: {em.f1:.4f}") 73 | print(f"Event F1: {em.event_f1:.4f}") 74 | print(f"Nonull F1: {em.nonull_f1:.4f}") 75 | em._save() 76 | 77 | # And also do the 4-fold ensemble, which requires slightly different code 78 | for sensor_subset in sensor_subsets: 79 | num_folds = 4 80 | 81 | name = f"{NAME}_{num_folds}_folds_{sensor_subset}_{i}" 82 | 83 | config = {"base_config": "multi_scale_cnn_lstm", "model_config": {}} 84 | config["sensor_subset"] = sensor_subset 85 | 86 | trainer = EnsembleTrainer(n_folds=num_folds, name=name, config=config) 87 | trainer.init_data() 88 | 89 | trainer.train(max_epochs=MAX_EPOCHS) 90 | trainer._save() 91 | 92 | em = EvalModel(trainer=trainer) 93 | em._save() 94 | 95 | # Load fresh for consistency 96 | em = load_eval_model_from_dir(em.model_path) 97 | 98 | # annotate extra field, to make post-analysis easier. 
99 | em.extra["exp_name"] = NAME 100 | em.extra["num_folds"] = num_folds 101 | em.extra["i_repeat"] = i 102 | 103 | em.run_test_set() 104 | em.calc_metrics() 105 | em.calc_ward_metrics() 106 | print(em.classification_report_df) 107 | em._save() 108 | 109 | 110 | if __name__ == "__main__": 111 | do_run() 112 | -------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- 1 | from setuptools import setup 2 | 3 | setup( 4 | name='FilterNet', 5 | version='', 6 | packages=['filternet'], 7 | url='https://github.com/WhistleLabs/FilterNet', 8 | license='', 9 | author='Whistle Labs', 10 | author_email='', 11 | description='' 12 | ) 13 | -------------------------------------------------------------------------------- /stripchart heatmaps.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/WhistleLabs/FilterNet/f5bcf82e8a542197b7b84d3786e808adfdf7c56f/stripchart heatmaps.png -------------------------------------------------------------------------------- /tests/datasets/test_har.py: -------------------------------------------------------------------------------- 1 | # Copyright (C) 2020 Pet Insight Project - All Rights Reserved 2 | 3 | import os 4 | 5 | import pytest 6 | 7 | from filternet.datasets import har as ds 8 | from filternet.datasets import sliding_window_x_y 9 | 10 | 11 | @pytest.fixture 12 | def dfs_dict(): 13 | return ds.get_or_make_dfs() 14 | 15 | 16 | def test_download(): 17 | ds.download_if_needed() 18 | assert os.path.exists(os.path.join(ds.datasets_dir, ds.DATASET_SUBDIR)) 19 | assert os.path.exists(os.path.join(ds.datasets_dir, ds.DATASET_FILE)) 20 | 21 | 22 | def test_get_or_make_dfs(dfs_dict): 23 | assert dfs_dict["df_train"].shape == (403712, 12) 24 | assert dfs_dict["df_val"].shape == (66816, 12) 25 | assert dfs_dict["df_test"].shape == (188672, 12) 26 | 27 | for df in [dfs_dict["df_train"], dfs_dict["df_val"], dfs_dict["df_test"]]: 28 | assert df.isna().sum().sum() == 0 29 | 30 | assert dfs_dict["s_labels"].shape == (6,) 31 | assert dfs_dict["df_cols"].shape == (12, 3) 32 | 33 | 34 | def test_get_x_y(dfs_dict): 35 | lens = {} 36 | for which_set in ["train", "train+val", "val", "test"]: 37 | Xc, ycs, data_spec = ds.get_x_y_contig(which_set, dfs_dict=dfs_dict) 38 | wl = 128 39 | X, ys = sliding_window_x_y(Xc, ycs, win_len=wl) 40 | 41 | assert X.shape[1] == data_spec["input_channels"] 42 | assert X.shape[2] == wl 43 | for y in ys: 44 | assert y.shape[1] == wl 45 | 46 | lens[which_set] = len(Xc) 47 | assert len(data_spec["input_features"]) == data_spec["input_channels"] 48 | assert data_spec["n_outputs"] == len(data_spec["output_spec"]) 49 | 50 | for o in data_spec["output_spec"]: 51 | assert "name" in o 52 | assert o["num_classes"] == len(o["classes"]) 53 | 54 | assert "dataset_name" in data_spec 55 | 56 | assert lens["train"] + lens["val"] == lens["train+val"] 57 | 58 | 59 | def test_urls(): 60 | assert ds.DATASET_FILE == "UCI%20HAR%20Dataset.zip" 61 | assert ds.DATASET_SUBDIR == "UCI_HAR_Dataset" 62 | -------------------------------------------------------------------------------- /tests/datasets/test_intention_recognition.py: -------------------------------------------------------------------------------- 1 | # Copyright (C) 2020 Pet Insight Project - All Rights Reserved 2 | 3 | import os 4 | 5 | import pytest 6 | 7 | from filternet.datasets import intention_recognition as ds 8 | from 
filternet.datasets import sliding_window_x_y 9 | 10 | 11 | @pytest.fixture 12 | def dfs_dict(): 13 | return ds.get_or_make_dfs() 14 | 15 | 16 | def test_download(): 17 | ds.download_if_needed() 18 | assert os.path.exists(os.path.join(ds.datasets_dir, ds.DATASET_SUBDIR)) 19 | assert os.path.exists(os.path.join(ds.datasets_dir, ds.DATASET_FILE)) 20 | 21 | 22 | def test_get_or_make_dfs(dfs_dict): 23 | # assert dfs_dict["df_train"].shape == (11565536, 68) 24 | # assert dfs_dict["df_val"].shape == (2009920, 68) 25 | # assert dfs_dict["df_test"].shape == (2468032, 68) 26 | 27 | for df in [dfs_dict["df_train"], dfs_dict["df_val"], dfs_dict["df_test"]]: 28 | assert df.isna().sum().sum() == 0 29 | 30 | assert dfs_dict["s_labels"].shape == (5,) 31 | assert dfs_dict["df_cols"].shape == (68, 2) 32 | 33 | 34 | def test_get_x_y(dfs_dict): 35 | lens = {} 36 | for which_set in ["train", "train+val", "val", "test"]: 37 | Xc, ycs, data_spec = ds.get_x_y_contig(which_set, dfs_dict=dfs_dict) 38 | wl = 128 39 | X, ys = sliding_window_x_y(Xc, ycs, win_len=wl) 40 | 41 | assert X.shape[1] == data_spec["input_channels"] 42 | assert X.shape[2] == wl 43 | for y in ys: 44 | assert y.shape[1] == wl 45 | 46 | lens[which_set] = len(Xc) 47 | assert len(data_spec["input_features"]) == data_spec["input_channels"] 48 | assert data_spec["n_outputs"] == len(data_spec["output_spec"]) 49 | 50 | for o in data_spec["output_spec"]: 51 | assert "name" in o 52 | assert o["num_classes"] == len(o["classes"]) 53 | 54 | assert "dataset_name" in data_spec 55 | 56 | assert lens["train"] + lens["val"] == lens["train+val"] 57 | 58 | 59 | def test_urls(): 60 | assert ds.DATASET_FILE == "eeg-motor-movementimagery-dataset-1.0.0.zip" 61 | assert ds.DATASET_SUBDIR == "eeg-motor-movementimagery-dataset-1.0.0" 62 | -------------------------------------------------------------------------------- /tests/datasets/test_opportunity.py: -------------------------------------------------------------------------------- 1 | # Copyright (C) 2020 Pet Insight Project - All Rights Reserved 2 | 3 | import os 4 | 5 | import pytest 6 | 7 | from filternet.datasets import opportunity as opp 8 | from filternet.datasets import sliding_window_x_y 9 | 10 | 11 | @pytest.fixture 12 | def dfs_dict(): 13 | return opp.get_or_make_dfs() 14 | 15 | 16 | def test_download_opportunity(): 17 | opp.download_if_needed() 18 | assert os.path.exists(os.path.join(opp.datasets_dir, opp.DATASET_SUBDIR)) 19 | assert os.path.exists(os.path.join(opp.datasets_dir, opp.DATASET_FILE)) 20 | 21 | 22 | def test_get_or_make_dfs(dfs_dict): 23 | assert dfs_dict["df_train"].shape == (497014, 252) 24 | assert dfs_dict["df_val"].shape == (60949, 252) 25 | assert dfs_dict["df_test"].shape == (118750, 252) 26 | 27 | for df in [dfs_dict["df_train"], dfs_dict["df_val"], dfs_dict["df_test"]]: 28 | assert df.isna().sum().sum() == 0 29 | 30 | assert dfs_dict["df_labels_locomotion"].shape == (5, 3) 31 | assert dfs_dict["df_labels_gestures"].shape == (18, 3) 32 | 33 | assert dfs_dict["df_cols"].shape == (250, 6) 34 | 35 | 36 | def test_get_x_y(dfs_dict): 37 | lens = {} 38 | for which_set in ["train", "train+val", "val", "test"]: 39 | Xc, ycs, data_spec = opp.get_x_y_contig(which_set, dfs_dict=dfs_dict) 40 | wl = 128 41 | X, ys = sliding_window_x_y(Xc, ycs, win_len=wl) 42 | 43 | assert X.shape[1] == 113 44 | assert X.shape[2] == wl 45 | for y in ys: 46 | assert y.shape[1] == wl 47 | 48 | lens[which_set] = len(Xc) 49 | assert data_spec["input_channels"] == 113 50 | assert len(data_spec["input_features"]) == 
data_spec["input_channels"] 51 | assert data_spec["n_outputs"] == len(data_spec["output_spec"]) 52 | 53 | for o in data_spec["output_spec"]: 54 | assert "name" in o 55 | assert o["num_classes"] == len(o["classes"]) 56 | 57 | assert "dataset_name" in data_spec 58 | 59 | assert lens["train"] + lens["val"] == lens["train+val"] 60 | 61 | 62 | def test_get_different_outputs(dfs_dict): 63 | with pytest.raises(AssertionError): 64 | Xc, ycs, data_spec = opp.get_x_y_contig(dfs_dict=dfs_dict, y_cols="y_gesture") 65 | Xc, ycs, data_spec = opp.get_x_y_contig(dfs_dict=dfs_dict, y_cols=["y_gesture"]) 66 | assert len(ycs) == 1 67 | Xc, ycs2, data_spec = opp.get_x_y_contig( 68 | dfs_dict=dfs_dict, y_cols=["y_gesture", "y_locomotion"] 69 | ) 70 | assert len(ycs2) == 2 71 | assert ycs[0].shape == ycs2[0].shape 72 | 73 | 74 | def test_get_sensor_subsets(dfs_dict): 75 | lens = {} 76 | expected_lens = { 77 | "accels": 15, 78 | "gyros": 15, 79 | "accels+gyros": 30, 80 | "accels+gyros+magnetic": 45, 81 | "opportunity": 113, 82 | None: 113, 83 | } 84 | for sensor_subset in [ 85 | None, 86 | "accels", 87 | "gyros", 88 | "accels+gyros", 89 | "accels+gyros+magnetic", 90 | "opportunity", 91 | ]: 92 | Xc, ycs, data_spec = opp.get_x_y_contig( 93 | "train+val", sensor_subset=sensor_subset, dfs_dict=dfs_dict 94 | ) 95 | assert Xc.shape[1] == expected_lens[sensor_subset] 96 | 97 | 98 | def test_urls(): 99 | assert opp.DATASET_FILE == "OpportunityUCIDataset.zip" 100 | assert opp.DATASET_SUBDIR == "OpportunityUCIDataset" 101 | -------------------------------------------------------------------------------- /tests/datasets/test_smartphone_hapt.py: -------------------------------------------------------------------------------- 1 | # Copyright (C) 2020 Pet Insight Project - All Rights Reserved 2 | 3 | import os 4 | 5 | import pytest 6 | 7 | from filternet.datasets import sliding_window_x_y 8 | from filternet.datasets import smartphone_hapt as ds 9 | 10 | 11 | @pytest.fixture 12 | def dfs_dict(): 13 | return ds.get_or_make_dfs() 14 | 15 | 16 | def test_download(): 17 | ds.download_if_needed() 18 | assert os.path.exists(os.path.join(ds.datasets_dir, ds.DATASET_SUBDIR)) 19 | assert os.path.exists(os.path.join(ds.datasets_dir, ds.DATASET_FILE)) 20 | 21 | 22 | def test_get_or_make_dfs(dfs_dict): 23 | assert dfs_dict["df_train"].shape == (686203, 10) 24 | assert dfs_dict["df_val"].shape == (111281, 10) 25 | assert dfs_dict["df_test"].shape == (325288, 10) 26 | 27 | for df in [dfs_dict["df_train"], dfs_dict["df_val"], dfs_dict["df_test"]]: 28 | assert not df.isna().any().any() 29 | 30 | assert dfs_dict["s_labels"].shape == (12,) 31 | assert dfs_dict["df_cols"].shape == (10, 3) 32 | 33 | 34 | def test_get_x_y(dfs_dict): 35 | lens = {} 36 | for which_set in ["train", "train+val", "val", "test"]: 37 | Xc, ycs, data_spec = ds.get_x_y_contig(which_set, dfs_dict=dfs_dict) 38 | wl = 128 39 | X, ys = sliding_window_x_y(Xc, ycs, win_len=wl) 40 | 41 | assert X.shape[1] == data_spec["input_channels"] 42 | assert X.shape[2] == wl 43 | for y in ys: 44 | assert y.shape[1] == wl 45 | 46 | lens[which_set] = len(Xc) 47 | assert len(data_spec["input_features"]) == data_spec["input_channels"] 48 | assert data_spec["n_outputs"] == len(data_spec["output_spec"]) 49 | 50 | for o in data_spec["output_spec"]: 51 | assert "name" in o 52 | assert o["num_classes"] == len(o["classes"]) 53 | 54 | assert "dataset_name" in data_spec 55 | 56 | assert lens["train"] + lens["val"] == lens["train+val"] 57 | 58 | 59 | def test_urls(): 60 | assert ds.DATASET_FILE == 
"HAPT%20Data%20Set.zip" 61 | assert ds.DATASET_SUBDIR == "HAPT_Data_Set" 62 | -------------------------------------------------------------------------------- /tests/test_datasets.py: -------------------------------------------------------------------------------- 1 | # Copyright (C) 2020 Pet Insight Project - All Rights Reserved 2 | 3 | import filternet.datasets 4 | 5 | 6 | def test_datasets_dir(): 7 | assert "datasets" in filternet.datasets.datasets_dir 8 | print(filternet.datasets.datasets_dir) 9 | -------------------------------------------------------------------------------- /tests/test_init.py: -------------------------------------------------------------------------------- 1 | # Copyright (C) 2020 Pet Insight Project - All Rights Reserved 2 | 3 | import filternet 4 | 5 | 6 | def test_base_dir(): 7 | assert "filternet" in filternet.base_dir 8 | print(filternet.base_dir) 9 | -------------------------------------------------------------------------------- /tests/test_models.py: -------------------------------------------------------------------------------- 1 | # Copyright (C) 2020 Pet Insight Project - All Rights Reserved 2 | 3 | import pytest 4 | import torch 5 | 6 | import filternet.models as mo 7 | from filternet.datasets import opportunity as opp, sliding_window_x_y 8 | 9 | 10 | @pytest.fixture 11 | def dfs_dict(): 12 | return opp.get_or_make_dfs() 13 | 14 | 15 | @pytest.fixture 16 | def x_y_dict(): 17 | wl = 64 18 | xys = {} 19 | for which_set in ["train", "val", "test"]: 20 | Xc, ycs, data_spec = opp.get_x_y_contig(which_set) 21 | 22 | X, ys = sliding_window_x_y(Xc, ycs, win_len=wl) 23 | 24 | assert X.shape[1] == 113 25 | assert X.shape[2] == wl 26 | assert ys[0].shape[1] == wl 27 | 28 | xys["X_" + which_set] = torch.Tensor(X) 29 | xys["ys_" + which_set] = [torch.Tensor(y).long() for y in ys] 30 | xys["win_len"] = wl 31 | return xys 32 | 33 | 34 | def test_make_model(): 35 | net = mo.DeepConvLSTM(scale=0.25) 36 | 37 | 38 | def test_transform_output_m2o(x_y_dict): 39 | net = mo.DeepConvLSTM(scale=(1.0 / 8)) 40 | N = 10 41 | X = x_y_dict["X_train"][:N] 42 | ys = [y[:N] for y in x_y_dict["ys_train"]] 43 | 44 | y_outs = net(X) 45 | for y_out, num_output_classes in zip(y_outs, net.num_output_classes): 46 | assert y_out.shape == (N, num_output_classes, 1) 47 | y_comps = net.transform_targets(ys) 48 | for y_comp, y_out in zip(y_comps, y_outs): 49 | assert y_comp.shape == y_out.shape 50 | 51 | 52 | def test_transform_output_m2m(x_y_dict): 53 | net = mo.DeepConvLSTM(scale=(1.0 / 8)) 54 | N = 10 55 | X = x_y_dict["X_train"][:N] 56 | ys = [y[:N] for y in x_y_dict["ys_train"]] 57 | 58 | y_outs = net(X) 59 | for y_out, num_output_classes in zip(y_outs, net.num_output_classes): 60 | assert y_out.shape == ( 61 | N, 62 | num_output_classes, 63 | 1 #x_y_dict["win_len"] - 2 * net.padding_lost_per_side, 64 | ) 65 | y_comps = net.transform_targets(ys) 66 | for y_comp, y_out in zip(y_comps, y_outs): 67 | assert y_comp.shape == y_out.shape 68 | 69 | 70 | @pytest.mark.skip("No need to generate 10,000 different models every time!") 71 | def test_make_cnn_lstm_models(x_y_dict): 72 | N = 10 73 | X = x_y_dict["X_train"][:N] 74 | ys = [y[:N] for y in x_y_dict["ys_train"]] 75 | 76 | i = 0 77 | for n_pre in [0, 1, 2]: 78 | print("n_pre ", n_pre) 79 | for n_strided in [0, 3]: 80 | print("n_strided ", n_strided) 81 | for n_interp in [0, 1, 3]: 82 | print("n_interp ", n_interp) 83 | for n_dense_pre_l in [0, 1, 2]: 84 | print("n_dense_pre_l", n_dense_pre_l) 85 | for n_l in [0, 1, 2]: 86 | print("n_l ", n_l) 87 
| for n_dense_post_l in [0, 1, 2]: 88 | print("n_dense_post_l ", n_dense_post_l) 89 | for do_pool in [True, False]: 90 | print("do_pool ", do_pool) 91 | for stride_pos in ["pre", "post"]: 92 | print("stride_pos ", stride_pos) 93 | for dropout in [0, 0.5]: 94 | print("dropout ", dropout) 95 | for bn_pre in [True, False]: 96 | print("bn_pre ", bn_pre) 97 | i += 1 98 | opts = dict( 99 | output_type="many_to_many", 100 | scale=(1.0 / 4), 101 | n_pre=n_pre, 102 | n_strided=n_strided, 103 | n_interp=n_interp, 104 | n_dense_pre_l=n_dense_pre_l, 105 | n_l=n_l, 106 | n_dense_post_l=n_dense_post_l, 107 | do_pool=do_pool, 108 | stride_pos=stride_pos, 109 | dropout=dropout, 110 | bn_pre=bn_pre, 111 | ) 112 | print(i) 113 | try: 114 | net = mo.FilterNet(**opts) 115 | y_outs = net(X) 116 | for y_out, num_output_classes in zip( 117 | y_outs, net.num_output_classes 118 | ): 119 | assert y_out.shape == ( 120 | N, 121 | num_output_classes, 122 | x_y_dict["win_len"] 123 | / net.output_stride, 124 | ) 125 | y_comps = net.transform_targets(ys) 126 | for y_comp, y_out in zip( 127 | y_comps, y_outs 128 | ): 129 | assert y_comp.shape == y_out.shape 130 | except Exception as e: 131 | print(opts) 132 | raise 133 | -------------------------------------------------------------------------------- /tests/test_train.py: -------------------------------------------------------------------------------- 1 | # Copyright (C) 2020 Pet Insight Project - All Rights Reserved 2 | 3 | import os 4 | 5 | import pytest 6 | import torch 7 | 8 | from filternet.datasets import opportunity as opp, sliding_window_x_y 9 | from filternet.training.trainable import MPTrainable 10 | 11 | 12 | @pytest.fixture 13 | def dfs_dict(): 14 | return opp.get_or_make_dfs() 15 | 16 | 17 | @pytest.fixture 18 | def x_y_dict(dfs_dict): 19 | wl = 64 20 | xys = {} 21 | for which_set in ["train", "val", "test"]: 22 | Xc, ycs, data_spec = opp.get_x_y_contig(which_set, dfs_dict=dfs_dict) 23 | 24 | X, ys = sliding_window_x_y(Xc, ycs, win_len=wl) 25 | 26 | assert X.shape[1] == 113 27 | assert X.shape[2] == wl 28 | for y in ys: 29 | assert y.shape[1] == wl 30 | 31 | xys["X_" + which_set] = torch.Tensor(X) 32 | xys["ys_" + which_set] = [torch.Tensor(y).long() for y in ys] 33 | xys["win_len"] = wl 34 | return xys 35 | 36 | 37 | def test_train_val_test_model(): 38 | trainable = MPTrainable( 39 | { 40 | "name": "unittest", 41 | "loss_func": "cross_entropy", 42 | "decimation": 10, 43 | "base_config": "base_cnn", 44 | "model_config": {"scale": (1.0 / 16)}, 45 | } 46 | ) 47 | 48 | trainer = trainable.trainer 49 | assert trainer.model is not None 50 | assert trainer.optimizer is not None 51 | assert trainer.win_len is not None 52 | assert trainer.loss_func is not None 53 | 54 | assert trainer.dl_train is not None 55 | assert trainer.dl_val is not None 56 | assert trainer.dl_test is not None 57 | 58 | # one training iteration 59 | ret = trainable.train() 60 | assert not ret["done"] 61 | assert ret["training_iteration"] == 1 62 | assert "train_loss" in ret 63 | assert "train_acc" in ret 64 | assert "mean_loss" in ret 65 | assert "mean_accuracy" in ret 66 | assert "val_f1" in ret 67 | assert ret["config"]["loss_func"] == trainer.loss_func 68 | print(ret) 69 | 70 | trainable.trainer.loss_func = "binary_cross_entropy" 71 | ret = trainable.train() 72 | assert not ret["done"] 73 | assert ret["training_iteration"] == 2 74 | assert "train_loss" in ret 75 | print(ret) 76 | 77 | ret = trainable.train() 78 | assert not ret["done"] 79 | assert ret["training_iteration"] == 3 80 | assert 
"train_loss" in ret 81 | print(ret) 82 | 83 | trainer.train_state.extra["temp"] = 1 84 | 85 | path = trainable.save() 86 | assert os.path.exists(path) 87 | assert trainer.train_state.extra["temp"] == 1 88 | trainer.train_state.extra["temp"] = 2 89 | assert trainer.train_state.extra["temp"] == 2 90 | trainable.restore(path) 91 | assert trainer.train_state.extra["temp"] == 1 # make sure restoring state worked. 92 | ret = trainable.train() 93 | assert ret["training_iteration"] == 4 94 | 95 | print(ret) 96 | 97 | 98 | # 99 | # def test_train_diff_dimensionalities(): 100 | # 101 | # trainable = train.CNNLSTMTrainable({'output_type': 'many_to_one_takelast', 'decimation': 10, 'loss': 'binary_cross_entropy', 'scale': (1.0 / 8)}) 102 | # ret = trainable.train() 103 | # assert ret['training_iteration'] == 1 104 | # assert 'train_loss' in ret 105 | # assert 'train_acc' in ret 106 | # assert 'mean_loss' in ret 107 | # assert 'mean_accuracy' in ret 108 | # assert 'val_f1' in ret 109 | # 110 | # trainable = train.CNNLSTMTrainable( 111 | # {'output_type': 'many_to_one_takelast', 'decimation': 10, 'loss': 'cross_entropy', 'scale': (1.0/8)}) 112 | # ret = trainable.train() 113 | # assert ret['training_iteration'] == 1 114 | # assert 'train_loss' in ret 115 | # assert 'train_acc' in ret 116 | # assert 'mean_loss' in ret 117 | # assert 'mean_accuracy' in ret 118 | # assert 'val_f1' in ret 119 | # 120 | # trainable = train.CNNLSTMTrainable({'output_type': 'many_to_many', 'decimation': 10, 'scale': (1.0 / 8)}) 121 | # ret = trainable.train() 122 | # assert ret['training_iteration'] == 1 123 | # assert 'train_loss' in ret 124 | # assert 'train_acc' in ret 125 | # assert 'mean_loss' in ret 126 | # assert 'mean_accuracy' in ret 127 | # assert 'val_f1' in ret 128 | # 129 | # # Re-enable when ability to use different models is re-enabled: 130 | # def test_train_diff_models(): 131 | # trainable = train.CNNLSTMTrainable({'model_class': 'DeepConvLSTM', 'decimation': 10, 'scale': (1.0 / 8)}) 132 | # ret = trainable.train() 133 | # assert ret['training_iteration'] == 1 134 | # assert 'train_loss' in ret 135 | # assert 'train_acc' in ret 136 | # assert 'mean_loss' in ret 137 | # assert 'mean_accuracy' in ret 138 | # assert 'val_f1' in ret 139 | # ret = trainable.test_with_overlap() 140 | # assert 'test_f1' in ret 141 | # for o in ret['output_records']: 142 | # assert 'classification_report_txt' in o 143 | # 144 | # 145 | # trainable = train.CNNLSTMTrainable({'model_class': 'FilterNet', 'decimation': 10, 'scale': (1.0 / 8)}) 146 | # ret = trainable.train() 147 | # assert ret['training_iteration'] == 1 148 | # assert 'train_loss' in ret 149 | # assert 'train_acc' in ret 150 | # assert 'mean_loss' in ret 151 | # assert 'mean_accuracy' in ret 152 | # assert 'val_f1' in ret 153 | # ret = trainable.test_with_overlap() 154 | # assert 'test_f1' in ret 155 | # for o in ret['output_records']: 156 | # assert 'classification_report_txt' in o 157 | -------------------------------------------------------------------------------- /training_history.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/WhistleLabs/FilterNet/f5bcf82e8a542197b7b84d3786e808adfdf7c56f/training_history.png -------------------------------------------------------------------------------- /win_len_effects.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/WhistleLabs/FilterNet/f5bcf82e8a542197b7b84d3786e808adfdf7c56f/win_len_effects.png --------------------------------------------------------------------------------