├── .gitignore ├── Filternet Paper - architectures.ipynb ├── Filternet Paper - ensembles.ipynb ├── Filternet Paper - event metrics visualization.ipynb ├── Filternet Paper - heat map examples to compare model results.ipynb ├── Filternet Paper - inference window length.ipynb ├── Filternet Paper - multimodal sensor fusion.ipynb ├── Filternet Paper - overall results summary.ipynb ├── Filternet Paper - plot training history example.ipynb ├── README.md ├── ensemble_effects.png ├── environment.yaml ├── filternet ├── __init__.py ├── datasets │ ├── .gitignore │ ├── __init__.py │ ├── har.py │ ├── intention_recognition.py │ ├── opportunity.py │ └── smartphone_hapt.py ├── models │ ├── __init__.py │ ├── base_layers.py │ ├── base_net.py │ ├── deep_conv_lstm.py │ ├── filter_net.py │ ├── filter_net_ensemble.py │ └── reference_architectures.py ├── mputil.py └── training │ ├── __init__.py │ ├── ensemble_train.py │ ├── evalmodel.py │ ├── train.py │ └── trainable.py ├── multimodal_sensor_fusion.png ├── scripts ├── __init__.py ├── run_base_configs_exp.py ├── run_ensemble_exp.py └── run_mm_base_configs_exp.py ├── setup.py ├── stripchart heatmaps.png ├── tests ├── datasets │ ├── test_har.py │ ├── test_intention_recognition.py │ ├── test_opportunity.py │ └── test_smartphone_hapt.py ├── test_datasets.py ├── test_init.py ├── test_models.py └── test_train.py ├── training_history.png └── win_len_effects.png /.gitignore: -------------------------------------------------------------------------------- 1 | # Byte-compiled / optimized / DLL files 2 | __pycache__/ 3 | *.py[cod] 4 | *$py.class 5 | 6 | # C extensions 7 | *.so 8 | 9 | # Distribution / packaging 10 | .Python 11 | build/ 12 | develop-eggs/ 13 | dist/ 14 | downloads/ 15 | eggs/ 16 | .eggs/ 17 | lib/ 18 | lib64/ 19 | parts/ 20 | sdist/ 21 | var/ 22 | wheels/ 23 | pip-wheel-metadata/ 24 | share/python-wheels/ 25 | *.egg-info/ 26 | .installed.cfg 27 | *.egg 28 | MANIFEST 29 | 30 | # Installer logs 31 | pip-log.txt 32 | pip-delete-this-directory.txt 33 | 34 | # Unit test / coverage reports 35 | htmlcov/ 36 | .tox/ 37 | .nox/ 38 | .coverage 39 | .coverage.* 40 | .cache 41 | nosetests.xml 42 | coverage.xml 43 | *.cover 44 | *.py,cover 45 | .hypothesis/ 46 | .pytest_cache/ 47 | cover/ 48 | 49 | # Jupyter Notebook 50 | .ipynb_checkpoints 51 | 52 | # User-specific stuff 53 | .idea 54 | .idea/ 55 | .idea/* 56 | 57 | Icon 58 | 59 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | 2 | # FilterNet: A many-to-many deep learning architecture for time series classification 3 | 4 | [![DOI](https://zenodo.org/badge/242397153.svg)](https://zenodo.org/badge/latestdoi/242397153) 5 | 6 | This repository contains code to reproduce the results and figures in the paper: 7 | *[FilterNet: A many-to-many deep learning architecture for time series classification](https://www.mdpi.com/703084)*. 8 | 9 | ## Setup 10 | The easiest way to run this software is via the Anaconda Python distribution. 11 | 12 | 1. Install Anaconda 13 | 2. Run `conda env create -f environment.yaml` 14 | 3. Enable the `filternet` environment, like, `source activate filternet` 15 | 4. 
Install filternet so it is importable, by running `pip install -e .` in the same 16 | directory as setup.py 17 | 18 | ## Running tests 19 | In the root dir of this repo: 20 | 21 | ``` 22 | pytest tests 23 | ``` 24 | 25 | This will be *really* slow the first time because it has to download and pre-process 26 | several large AR datasets. 27 | 28 | Subsequent test runs will probably still be slow, but... less slow. 29 | 30 | ## Reproducing Results 31 | 32 | 1. Run the scripts in the `scripts/` directory. These are very long-running scripts that 33 | reproduce each experimental condition many times. You might want to set, e.g., `NUM_REPEATS=1` 34 | if you don't need this level of reproducibility. 35 | 36 | 2. Run the notebooks to re-produce the figures. You might need to edit a few paths to specific 37 | models to match the filenames on your system, especially if you changed the 38 | `NAME` or `NUM_REPEATS` parameters. 39 | 40 | ------ 41 | Copyright (C) 2020 Pet Insight Project - All Rights Reserved 42 | -------------------------------------------------------------------------------- /ensemble_effects.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/WhistleLabs/FilterNet/f5bcf82e8a542197b7b84d3786e808adfdf7c56f/ensemble_effects.png -------------------------------------------------------------------------------- /environment.yaml: -------------------------------------------------------------------------------- 1 | name: filternet 2 | channels: 3 | - pytorch 4 | - defaults 5 | 6 | dependencies: 7 | - anaconda == 2019.10 # unpin this for latest packages; this pins lots of versions. 8 | - ipython 9 | - jupyter 10 | - matplotlib 11 | - numpy 12 | - pandas 13 | - pip 14 | - pytest 15 | - pytorch == 1.0.1 # unpin for latest pytorch 16 | - scikit-learn 17 | - scipy 18 | - seaborn 19 | - traits 20 | - pip: 21 | - hyperopt 22 | - pyEDFlib 23 | - ray == 0.8.4 # you could unpin, but api seems unstable 24 | - tabulate 25 | - torchsummary 26 | - torchsummaryX 27 | - typing 28 | - ward-metrics == 0.9.5 # probably doesn't need to be pinned 29 | 30 | 31 | 32 | -------------------------------------------------------------------------------- /filternet/__init__.py: -------------------------------------------------------------------------------- 1 | # Copyright (C) 2020 Pet Insight Project - All Rights Reserved 2 | 3 | import os 4 | 5 | base_dir = os.path.split(__file__)[0] -------------------------------------------------------------------------------- /filternet/datasets/.gitignore: -------------------------------------------------------------------------------- 1 | *.zip 2 | OpportunityUCIDataset -------------------------------------------------------------------------------- /filternet/datasets/__init__.py: -------------------------------------------------------------------------------- 1 | # Copyright (C) 2020 Pet Insight Project - All Rights Reserved 2 | 3 | import os 4 | 5 | import numpy as np 6 | import sklearn 7 | 8 | datasets_dir = os.path.split(__file__)[0] 9 | 10 | 11 | def sliding_window_x_y(Xc, ycs, win_len=128, step=None, shuffle=True): 12 | if step is None: 13 | step = int(win_len / 2) 14 | start_idxs = np.arange(0, len(Xc) - win_len, step) 15 | X = ( 16 | np.array([Xc[i : i + win_len] for i in start_idxs]) 17 | .transpose([0, 2, 1]) 18 | .astype(np.float32) 19 | ) # [N, C, L] 20 | ys = [ 21 | np.array([yc[i : i + win_len] for i in start_idxs]).astype(np.long) 22 | for yc in ycs 23 | ] # [len(ycs), N, L] 24 | if shuffle: 25 | 
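        # sklearn.utils.shuffle permutes X and every y window array in lockstep
        # along axis 0, so windows stay aligned with their label windows; the
        # fixed random_state below keeps the shuffle reproducible across runs.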
X, *ys = sklearn.utils.shuffle(X, *ys, random_state=0)
 26 |         return X, ys
 27 |     else:
 28 |         return X, ys
 29 | 
-------------------------------------------------------------------------------- /filternet/datasets/har.py: --------------------------------------------------------------------------------
  1 | # Copyright (C) 2020 Pet Insight Project - All Rights Reserved
  2 | 
  3 | """ Loads the smartphone human activity recognition dataset.
  4 | 
  5 | UCI HAR
  6 | """
  7 | 
  8 | import os
  9 | import urllib
 10 | from zipfile import ZipFile
 11 | 
 12 | # from . import datasets_dir
 13 | import numpy as np
 14 | 
 15 | datasets_dir = os.path.dirname(__file__)
 16 | 
 17 | import pandas as pd
 18 | 
 19 | # Papers
 20 | # http://conference.scipy.org/proceedings/scipy2018/pdfs/christian_mcdaniel.pdf
 21 | # github: https://github.com/xtianmcd/accelstm
 22 | # https://www.mdpi.com/1424-8220/17/11/2556/htm
 23 | # https://arxiv.org/pdf/1801.04503.pdf
 24 | # http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.641.2285&rep=rep1&type=pdf
 25 | 
 26 | 
 27 | DATASET_URL = "https://archive.ics.uci.edu/ml/machine-learning-databases/00240/UCI%20HAR%20Dataset.zip"
 28 | _, DATASET_FILE = os.path.split(DATASET_URL)
 29 | DATASET_SUBDIR, _ = os.path.splitext(DATASET_FILE)
 30 | DATASET_SUBDIR = DATASET_SUBDIR.replace("%20", "_")
 31 | OUTPUT_DIR = os.path.join(datasets_dir, DATASET_SUBDIR)
 32 | OUTPUT_SUBDIR = os.path.join(OUTPUT_DIR, "UCI HAR Dataset")
 33 | 
 34 | dir_map = {"1": "x", "2": "y", "3": "z"}
 35 | 
 36 | 
 37 | def download_if_needed():
 38 |     """Downloads and extracts .zip if needed. """
 39 |     if not os.path.exists(OUTPUT_DIR):
 40 |         if not os.path.exists(os.path.join(datasets_dir, DATASET_FILE)):
 41 |             print("Downloading .zip file...")
 42 |             urllib.request.urlretrieve(
 43 |                 DATASET_URL, os.path.join(datasets_dir, DATASET_FILE)
 44 |             )
 45 |             assert os.path.exists(os.path.join(datasets_dir, DATASET_FILE))
 46 | 
 47 |         print(f"Extracting to {OUTPUT_DIR}...")
 48 |         zip = ZipFile(os.path.join(datasets_dir, DATASET_FILE))
 49 |         zip.extractall(OUTPUT_DIR)
 50 | 
 51 |     assert os.path.exists(OUTPUT_SUBDIR)
 52 | 
 53 | 
 54 | def get_label_series():
 55 |     activity_labels_path = os.path.join(OUTPUT_SUBDIR, "activity_labels.txt")
 56 |     print(f"Loading file {activity_labels_path}")
 57 |     label_series = pd.read_csv(activity_labels_path, sep=r"\s+", header=None)
 58 |     label_series.columns = ["class_number", "class_label"]
 59 |     label_series = label_series.dropna(how="any", axis=0)
 60 |     return pd.Series(
 61 |         label_series["class_label"].values, index=label_series["class_number"].values
 62 |     )
 63 | 
 64 | 
 65 | def load_labels():
 66 |     label_file_path = os.path.join(OUTPUT_SUBDIR, "train/y_train.txt")
 67 |     print(f"Reading labels file {label_file_path}")
 68 |     train_labels = pd.read_csv(label_file_path, sep=r"\s+", header=None)
 69 |     train_labels.columns = ["activity_id"]
 70 | 
 71 |     train_inds = np.arange(0, len(train_labels), 2)
 72 |     train_inds = np.tile(train_inds, (128, 1)).T.flatten()
 73 |     train_labels = train_labels.loc[train_inds, :]
 74 | 
 75 |     label_file_path = os.path.join(OUTPUT_SUBDIR, "test/y_test.txt")
 76 |     print(f"Reading labels file {label_file_path}")
 77 |     test_labels = pd.read_csv(label_file_path, sep=r"\s+", header=None)
 78 |     test_labels.columns = ["activity_id"]
 79 | 
 80 |     test_inds = np.arange(0, len(test_labels), 2)
 81 |     test_inds = np.tile(test_inds, (128, 1)).T.flatten()
 82 |     test_labels = test_labels.loc[test_inds, :]
 83 | 
 84 |     label_series = get_label_series()
 85 | 
 86 |     df_labels = pd.concat([train_labels, test_labels], ignore_index=True)
 87 | 
 88 | 
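    # Map the numeric activity ids (1-6) onto their text labels (e.g. 1 -> WALKING)
    # using the series built by get_label_series() above.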
df_labels["activity_label"] = df_labels["activity_id"].map(label_series) 89 | 90 | return df_labels 91 | 92 | 93 | def load_subjects(): 94 | subject_fp = os.path.join(OUTPUT_SUBDIR, "train/subject_train.txt") 95 | print(f"Reading subjects file {subject_fp}") 96 | train_subjects = pd.read_csv(subject_fp, r"\s+", header=None) 97 | train_subjects.columns = ["subject_id"] 98 | 99 | train_inds = np.arange(0, len(train_subjects), 2) 100 | train_inds = np.tile(train_inds, (128, 1)).T.flatten() 101 | train_subjects = train_subjects.loc[train_inds, :] 102 | 103 | subject_fp = os.path.join(OUTPUT_SUBDIR, "test/subject_test.txt") 104 | print(f"Reading subjects file {subject_fp}") 105 | test_subjects = pd.read_csv(subject_fp, r"\s+", header=None) 106 | test_subjects.columns = ["subject_id"] 107 | 108 | test_inds = np.arange(0, len(test_subjects), 2) 109 | test_inds = np.tile(test_inds, (128, 1)).T.flatten() 110 | test_subjects = test_subjects.loc[test_inds, :] 111 | 112 | df_subjects = pd.concat([train_subjects, test_subjects], ignore_index=True) 113 | 114 | return df_subjects 115 | 116 | 117 | def separate_subjects(): 118 | all_subjects = set(range(1, 31)) 119 | # validation_subjects = {4, 12, 20, 27} # 3 test (4, 12, 20) + 1 train (27) 120 | validation_subjects = {5, 16, 27} 121 | test_subjects = {2, 4, 9, 10, 12, 13, 18, 20, 24} # Given test subjects 122 | train_subjects = all_subjects - test_subjects - validation_subjects 123 | assert train_subjects | test_subjects | validation_subjects == all_subjects 124 | assert set() == validation_subjects & test_subjects 125 | assert set() == train_subjects & test_subjects 126 | assert set() == train_subjects & validation_subjects 127 | return train_subjects, validation_subjects, test_subjects 128 | 129 | 130 | def get_column_info(): 131 | sensor_types = ["total_accel", "body_accel", "gyro"] 132 | file_names = ["total_acc", "body_acc", "body_gyro"] 133 | num_chans = [3, 3, 3] 134 | assert len(num_chans) == len(sensor_types) 135 | types = [] 136 | names = [] 137 | file_prefixes = [] 138 | for i in range(len(sensor_types)): 139 | t = sensor_types[i] 140 | nc = num_chans[i] 141 | types.extend([t] * nc) 142 | file_prefixes.extend([file_names[i]] * nc) 143 | names.extend([f"{t[0]}{j + 1}" for j in range(nc)]) 144 | 145 | # Note that this segfaults if the input data is a tuple and not a list. WTF pandas? 
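    # (I.e., the row data below is deliberately a list-of-lists: with a tuple of
    # rows, the pandas build pinned in environment.yaml hard-crashed rather than
    # raising a useful error.)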
146 | df = pd.DataFrame( 147 | [types, file_prefixes], index=["sensor_type", "file_prefix"], columns=names 148 | ).T 149 | df.index.name = "name" 150 | df["output"] = False 151 | 152 | df = df.reindex( 153 | df.index.append(pd.Index(["subject_id", "activity_id", "activity_label"])) 154 | ) 155 | 156 | df.loc[["activity_id", "activity_label"], "output"] = True 157 | 158 | return df 159 | 160 | 161 | def get_data(): 162 | train_subjects, validation_subjects, test_subjects = separate_subjects() 163 | labels = load_labels() 164 | subjects = load_subjects() 165 | col_info = get_column_info() 166 | 167 | output_cols = col_info.index[col_info.output == True] 168 | not_input_cols = col_info.index[~(col_info.output == False)] 169 | all_sensors = col_info.loc[col_info.output == False, "sensor_type"].unique() 170 | all_data = [] 171 | for d_type in ["train", "test"]: 172 | 173 | exp_data = [] 174 | for t in all_sensors: 175 | sensor_type_mask = t == col_info.sensor_type 176 | prefix = col_info.loc[sensor_type_mask, "file_prefix"].unique()[0] 177 | cur_sensors = col_info.loc[sensor_type_mask] 178 | for i_name in cur_sensors.index: 179 | file_path = os.path.join( 180 | OUTPUT_SUBDIR, 181 | d_type, 182 | "Inertial Signals", 183 | f"{prefix}_{dir_map[i_name[-1]]}_{d_type}.txt", 184 | ) 185 | print(f"Reading file {file_path}") 186 | tmp_df = pd.read_csv(file_path, sep=r"\s+", header=None) 187 | tmp_s = pd.Series(tmp_df.values[::2, :].flatten(), name=i_name) 188 | exp_data.append(tmp_s) 189 | 190 | all_data.append(pd.concat(exp_data, axis=1, ignore_index=False)) 191 | 192 | all_data = pd.concat(all_data, axis=0, ignore_index=True) 193 | for col in col_info.index[col_info.output == True]: 194 | all_data[col] = labels[col] 195 | 196 | all_data["subject_id"] = subjects.values 197 | 198 | train_data = all_data.loc[all_data["subject_id"].isin(train_subjects), :] 199 | validation_data = all_data.loc[all_data["subject_id"].isin(validation_subjects), :] 200 | test_data = all_data.loc[all_data["subject_id"].isin(test_subjects), :] 201 | return train_data, validation_data, test_data 202 | 203 | 204 | def get_dfs_processed(): 205 | df_train, df_val, df_test = get_data() 206 | col_df = get_column_info() 207 | # Calculate feature normalizing stats from training set only 208 | norm_mean = df_train.mean(axis=0) 209 | norm_std = df_train.std(axis=0) 210 | 211 | # Apply to all sets to normalize features 212 | input_features = col_df.index[col_df.output == False] 213 | for df in (df_train, df_val, df_test): 214 | # de-mean 215 | df.loc[:, input_features] -= norm_mean 216 | 217 | # unit std dev 218 | df.loc[:, input_features] /= norm_std 219 | 220 | # interpolate and NA's -> 0 221 | df.loc[:, input_features] = df.loc[:, input_features].interpolate().fillna(0) 222 | 223 | return df_train, df_val, df_test 224 | 225 | 226 | _df_dicts = {} 227 | 228 | 229 | def get_or_make_dfs(): 230 | if _df_dicts: 231 | return _df_dicts 232 | download_if_needed() 233 | 234 | cache_dir = os.path.join(OUTPUT_DIR, "cache") 235 | if not os.path.isdir(cache_dir) or not os.path.isfile( 236 | os.path.join(cache_dir, "df_train.df.pkl") 237 | ): 238 | print("Smartphone data not cached. 
Creating cache now...") 239 | try: 240 | os.makedirs(cache_dir) 241 | except FileExistsError: 242 | pass 243 | 244 | df_train, df_val, df_test = get_dfs_processed() 245 | df_cols = get_column_info() 246 | 247 | s_labels = get_label_series() 248 | 249 | df_train.to_pickle(os.path.join(cache_dir, "df_train.df.pkl")) 250 | _df_dicts["df_train"] = df_train 251 | df_val.to_pickle(os.path.join(cache_dir, "df_val.df.pkl")) 252 | _df_dicts["df_val"] = df_val 253 | df_test.to_pickle(os.path.join(cache_dir, "df_test.df.pkl")) 254 | _df_dicts["df_test"] = df_test 255 | 256 | df_cols.to_pickle(os.path.join(cache_dir, "df_cols.df.pkl")) 257 | _df_dicts["df_cols"] = df_cols 258 | s_labels.to_pickle(os.path.join(cache_dir, "s_labels.s.pkl")) 259 | _df_dicts["s_labels"] = s_labels 260 | print("Caching done.") 261 | else: 262 | print("Loading cached data.") 263 | _df_dicts["df_train"] = pd.read_pickle( 264 | os.path.join(cache_dir, "df_train.df.pkl") 265 | ) 266 | _df_dicts["df_val"] = pd.read_pickle(os.path.join(cache_dir, "df_val.df.pkl")) 267 | _df_dicts["df_test"] = pd.read_pickle(os.path.join(cache_dir, "df_test.df.pkl")) 268 | 269 | _df_dicts["df_cols"] = pd.read_pickle(os.path.join(cache_dir, "df_cols.df.pkl")) 270 | _df_dicts["s_labels"] = pd.read_pickle( 271 | os.path.join(cache_dir, "s_labels.s.pkl") 272 | ) 273 | print("Loaded.") 274 | 275 | return _df_dicts 276 | 277 | 278 | def get_x_y_contig( 279 | which_set="train", 280 | dfs_dict=None, 281 | y_cols=["y_activity"], 282 | sensor_subset=None, 283 | include_transitions=False, 284 | ): 285 | """ Load X and y as contiguous time vectors (with various runs concatenated together). 286 | 287 | Parameters 288 | ---------- 289 | which_set: str 290 | which of the pre-defined splits to load. Can specify multiples, separated by '+'. E.g., 291 | "train", "val", "test", "train+val", etc. 292 | dfs_dict: dict 293 | Can provide pre-laoded df_dict to save time. (deprecated) 294 | y_cols: List[str] 295 | List of which label columns to return (e.g., ['y_gesture'], ['y_locomotion'], or ['y_gesture', 'y_locomotion'] 296 | sensor_subset: ty.Iterable 297 | Which subset of sensors to include. 
Valid values include: 298 | "accels", "accel" 299 | "gyros", "gyro" 300 | "accels+gyros", "accel+gyro" 301 | "all" 302 | include_transitions : bool 303 | """ 304 | if not dfs_dict: 305 | dfs_dict = get_or_make_dfs() 306 | 307 | assert type(y_cols) == list 308 | 309 | df = [] 310 | for _which_set in which_set.split("+"): 311 | df.append(dfs_dict["df_" + _which_set]) 312 | 313 | df = pd.concat(df) 314 | 315 | df_cols = dfs_dict["df_cols"] 316 | 317 | if not sensor_subset or sensor_subset == "all": 318 | cols = df_cols.index[df_cols.output == False] 319 | else: 320 | cols = [] 321 | for s in sensor_subset.split("+"): 322 | if s.endswith("s"): 323 | s = s[:-1] 324 | cols.append(df_cols.index[df_cols.sensor_type == s]) 325 | cols = pd.concat(cols) 326 | 327 | Xc = df[cols].values.copy() 328 | s_labels = dfs_dict["s_labels"] 329 | activity_outputs = df["activity_id"].copy() 330 | # Replace nulls with 0 331 | assert not activity_outputs.isnull().any() 332 | 333 | if (activity_outputs == 0).sum() == 0: 334 | # No None class 335 | activity_outputs -= 1 336 | s_labels.index = s_labels.index - 1 337 | 338 | ycs = [activity_outputs.values.copy()] 339 | 340 | # Include the null class 341 | output_spec_dict = { 342 | "name": "y_activity", 343 | "num_classes": len(s_labels), 344 | "classes": s_labels.sort_index().to_list(), 345 | } 346 | 347 | data_spec = { 348 | "dataset_name": "har", 349 | "input_channels": Xc.shape[1], 350 | "n_outputs": len(ycs), 351 | "input_features": cols.to_list(), 352 | "output_spec": [output_spec_dict], 353 | } 354 | 355 | return Xc, ycs, data_spec 356 | 357 | 358 | if __name__ == "__main__": 359 | pass 360 | # get_x_y_contig() 361 | -------------------------------------------------------------------------------- /filternet/datasets/intention_recognition.py: -------------------------------------------------------------------------------- 1 | # Copyright (C) 2020 Pet Insight Project - All Rights Reserved 2 | 3 | """ EEG Based intention recognition 4 | 5 | Good paper to base it off of: 6 | https://arxiv.org/pdf/1708.06578.pdf 7 | https://github.com/pbashivan/EEGLearn 8 | """ 9 | 10 | import os 11 | import urllib 12 | from zipfile import ZipFile 13 | 14 | # from . import datasets_dir 15 | import numpy as np 16 | import pyedflib as pyedflib 17 | 18 | datasets_dir = os.path.dirname(__file__) 19 | 20 | import pandas as pd 21 | 22 | DATASET_URL = "https://www.physionet.org/static/published-projects/eegmmidb/eeg-motor-movementimagery-dataset-1.0.0.zip" 23 | _, DATASET_FILE = os.path.split(DATASET_URL) 24 | DATASET_SUBDIR, _ = os.path.splitext(DATASET_FILE) 25 | DATASET_SUBDIR = DATASET_SUBDIR.replace("%20", "_") 26 | OUTPUT_DIR = os.path.join(datasets_dir, DATASET_SUBDIR) 27 | 28 | 29 | def download_if_needed(): 30 | """Downloads and extracts .zip if needed. 
""" 31 | if not os.path.exists(OUTPUT_DIR): 32 | if not os.path.exists(os.path.join(datasets_dir, DATASET_FILE)): 33 | print("Downloading .zip file...") 34 | urllib.request.urlretrieve( 35 | DATASET_URL, os.path.join(datasets_dir, DATASET_FILE) 36 | ) 37 | assert os.path.exists(os.path.join(datasets_dir, DATASET_FILE)) 38 | 39 | print(f"Extracting to {OUTPUT_DIR}...") 40 | zip = ZipFile(os.path.join(datasets_dir, DATASET_FILE)) 41 | zip.extractall(OUTPUT_DIR) 42 | 43 | assert os.path.exists(OUTPUT_DIR) 44 | 45 | 46 | def separate_subjects(): 47 | all_subjects = set(range(1, 110)) 48 | all_subjects -= {89} # Screwed up according to paper 49 | # validation_subjects = {4, 12, 20, 27} # 3 test (4, 12, 20) + 1 train (27) 50 | 51 | np.random.seed(seed=123) 52 | test_subjects = set( 53 | np.random.choice( 54 | list(all_subjects), size=int(len(all_subjects) // 20), replace=False 55 | ) 56 | ) 57 | validation_subjects = set( 58 | np.random.choice( 59 | list(all_subjects - test_subjects), 60 | size=int((len(all_subjects) - len(test_subjects)) // 20), 61 | replace=False, 62 | ) 63 | ) 64 | train_subjects = all_subjects - test_subjects - validation_subjects 65 | 66 | print(f"test_subjects ({len(test_subjects)}): {test_subjects}") 67 | print(f"validation_subjects ({len(validation_subjects)}): {validation_subjects}") 68 | print(f"train_subjects ({len(train_subjects)}): {train_subjects}") 69 | 70 | assert train_subjects | test_subjects | validation_subjects == all_subjects 71 | assert set() == validation_subjects & test_subjects 72 | assert set() == train_subjects & test_subjects 73 | assert set() == train_subjects & validation_subjects 74 | return train_subjects, validation_subjects, test_subjects 75 | 76 | 77 | def get_label_series(): 78 | return pd.Series( 79 | { 80 | 0: "Eyes Closed", 81 | 1: "Left Fist", 82 | 2: "Right Fist", 83 | 3: "Both Fists", 84 | 4: "Both Feet", 85 | } 86 | ) 87 | 88 | 89 | def get_column_info(): 90 | sensor_types = ["eeg"] 91 | num_chans = [64] 92 | assert len(num_chans) == len(sensor_types) 93 | types = [] 94 | names = [] 95 | file_path = os.path.join(OUTPUT_DIR, "files", "S001", "S001R01.edf") 96 | f = pyedflib.EdfReader(file_path) 97 | names = f.getSignalLabels() 98 | for i in range(len(sensor_types)): 99 | t = sensor_types[i] 100 | nc = num_chans[i] 101 | types.extend([t] * nc) 102 | 103 | # Note that this segfaults if the input data is a tuple and not a list. WTF pandas? 
104 | df = pd.DataFrame([types], index=["sensor_type"], columns=names).T 105 | df.index.name = "name" 106 | df["output"] = False 107 | 108 | df = df.reindex( 109 | df.index.append( 110 | pd.Index(["experiment_id", "subject_id", "activity_id", "activity_label"]) 111 | ) 112 | ) 113 | 114 | df.loc[["activity_id", "activity_label"], "output"] = True 115 | 116 | return df 117 | 118 | 119 | def get_data(): 120 | train_subjects, validation_subjects, test_subjects = separate_subjects() 121 | col_info = get_column_info() 122 | 123 | train_data = [] 124 | validation_data = [] 125 | test_data = [] 126 | 127 | left_or_right_trials = {4, 8, 12} 128 | both_trials = {6, 10, 14} 129 | label_series = get_label_series() 130 | all_trials = left_or_right_trials | both_trials 131 | all_subjects = train_subjects | validation_subjects | test_subjects 132 | for sub in all_subjects: 133 | sub_str = f"S{sub:03d}" 134 | print(f"Loading data for subject {sub}") 135 | sub_folder = os.path.join(OUTPUT_DIR, "files", sub_str) 136 | sub_dfs = [] 137 | 138 | for t in all_trials: 139 | # https://pyedflib.readthedocs.io/en/latest/ 140 | file_path = os.path.join(sub_folder, f"{sub_str}R{t:02d}.edf") 141 | # print(f"Loading file {file_path}") 142 | f = pyedflib.EdfReader(file_path) 143 | 144 | n = f.signals_in_file 145 | assert n == 64 146 | trial_data = np.empty((f.getNSamples()[0], n), order="F") 147 | for i in np.arange(n): 148 | trial_data[:, i] = f.readSignal(i) 149 | 150 | output_classes = np.zeros(trial_data.shape[0], order="F") 151 | start_times, durations, labels = f.readAnnotations() 152 | sr = 160 153 | for st, dur, lab in zip(start_times, durations, labels): 154 | s_ind = int(st * sr) 155 | e_ind = int(s_ind + dur * sr) 156 | if lab == "T0": 157 | output_classes[s_ind:e_ind] = 0 158 | elif lab == "T1": 159 | output_classes[s_ind:e_ind] = 1 if t in left_or_right_trials else 3 160 | elif lab == "T2": 161 | output_classes[s_ind:e_ind] = 2 if t in left_or_right_trials else 4 162 | else: 163 | raise ValueError(f"The label {lab} is not defined") 164 | 165 | df = pd.DataFrame( 166 | trial_data, columns=col_info.index[col_info.output == False] 167 | ) 168 | df["experiment_id"] = t 169 | df["subject_id"] = sub 170 | df["activity_id"] = output_classes 171 | df["activity_label"] = df["activity_id"].map(label_series) 172 | sub_dfs.append(df) 173 | 174 | if sub in train_subjects: 175 | train_data.extend(sub_dfs) 176 | elif sub in validation_subjects: 177 | validation_data.extend(sub_dfs) 178 | elif sub in test_subjects: 179 | test_data.extend(sub_dfs) 180 | else: 181 | raise ValueError(f"Unexpected subject {sub}") 182 | 183 | print(f"Done with subject {sub}") 184 | 185 | print("Creating training DF") 186 | train_data = pd.concat(train_data, axis=0, ignore_index=True) 187 | print("Creating validation DF") 188 | validation_data = pd.concat(validation_data, axis=0, ignore_index=True) 189 | print("Creating test DF") 190 | test_data = pd.concat(test_data, axis=0, ignore_index=True) 191 | print("Created all data frames") 192 | return train_data, validation_data, test_data 193 | 194 | 195 | def get_dfs_processed(): 196 | df_train, df_val, df_test = get_data() 197 | col_df = get_column_info() 198 | 199 | # Apply to all sets to normalize features 200 | input_features = col_df.index[col_df.output == False] 201 | 202 | # Calculate feature normalizing stats from training set only 203 | norm_mean = df_train.loc[:, input_features].mean(axis=0) 204 | norm_std = df_train.loc[:, input_features].std(axis=0) 205 | 206 | print("Normalizing 
dataframes") 207 | for df in (df_test, df_val, df_train): 208 | # de-mean 209 | df.loc[:, input_features] = df.loc[:, input_features] - norm_mean 210 | 211 | # unit std dev 212 | df.loc[:, input_features] = df.loc[:, input_features].divide(norm_std, axis=1) 213 | 214 | # interpolate and NA's -> 0 215 | df.loc[:, input_features] = df.loc[:, input_features].interpolate().fillna(0) 216 | 217 | print("Done normalizing dataframes") 218 | return df_train, df_val, df_test 219 | 220 | 221 | _df_dicts = {} 222 | 223 | 224 | def get_or_make_dfs(): 225 | if _df_dicts: 226 | return _df_dicts 227 | download_if_needed() 228 | 229 | cache_dir = os.path.join(OUTPUT_DIR, "cache") 230 | if not os.path.isdir(cache_dir) or not os.path.isfile( 231 | os.path.join(cache_dir, "df_train.df.pkl") 232 | ): 233 | print("Intention recognition data not cached. Creating cache now...") 234 | try: 235 | os.makedirs(cache_dir) 236 | except FileExistsError: 237 | pass 238 | 239 | df_train, df_val, df_test = get_dfs_processed() 240 | df_cols = get_column_info() 241 | 242 | s_labels = get_label_series() 243 | 244 | print("Saving data to cache") 245 | df_train.to_pickle(os.path.join(cache_dir, "df_train.df.pkl")) 246 | _df_dicts["df_train"] = df_train 247 | df_val.to_pickle(os.path.join(cache_dir, "df_val.df.pkl")) 248 | _df_dicts["df_val"] = df_val 249 | df_test.to_pickle(os.path.join(cache_dir, "df_test.df.pkl")) 250 | _df_dicts["df_test"] = df_test 251 | 252 | df_cols.to_pickle(os.path.join(cache_dir, "df_cols.df.pkl")) 253 | _df_dicts["df_cols"] = df_cols 254 | s_labels.to_pickle(os.path.join(cache_dir, "s_labels.s.pkl")) 255 | _df_dicts["s_labels"] = s_labels 256 | print("Caching done.") 257 | else: 258 | print("Loading cached data.") 259 | _df_dicts["df_train"] = pd.read_pickle( 260 | os.path.join(cache_dir, "df_train.df.pkl") 261 | ) 262 | _df_dicts["df_val"] = pd.read_pickle(os.path.join(cache_dir, "df_val.df.pkl")) 263 | _df_dicts["df_test"] = pd.read_pickle(os.path.join(cache_dir, "df_test.df.pkl")) 264 | 265 | _df_dicts["df_cols"] = pd.read_pickle(os.path.join(cache_dir, "df_cols.df.pkl")) 266 | _df_dicts["s_labels"] = pd.read_pickle( 267 | os.path.join(cache_dir, "s_labels.s.pkl") 268 | ) 269 | print("Loaded.") 270 | 271 | return _df_dicts 272 | 273 | 274 | def get_x_y_contig( 275 | which_set="train", dfs_dict=None, y_cols=["y_activity"], sensor_subset=None 276 | ): 277 | """ Load X and y as contiguous time vectors (with various runs concatenated together). 278 | 279 | Parameters 280 | ---------- 281 | which_set: str 282 | which of the pre-defined splits to load. Can specify multiples, separated by '+'. E.g., 283 | "train", "val", "test", "train+val", etc. 284 | dfs_dict: dict 285 | Can provide pre-laoded df_dict to save time. (deprecated) 286 | y_cols: List[str] 287 | List of which label columns to return (e.g., ['y_gesture'], ['y_locomotion'], or ['y_gesture', 'y_locomotion'] 288 | sensor_subset: ty.Iterable 289 | Which subset of sensors to include. 
Valid values include: 290 | "accels", "accel" 291 | "gyros", "gyro" 292 | "accels+gyros", "accel+gyro" 293 | "all" 294 | """ 295 | if not dfs_dict: 296 | dfs_dict = get_or_make_dfs() 297 | 298 | assert type(y_cols) == list 299 | 300 | df = [] 301 | for _which_set in which_set.split("+"): 302 | df.append(dfs_dict["df_" + _which_set]) 303 | 304 | df = pd.concat(df) 305 | 306 | df_cols = dfs_dict["df_cols"] 307 | 308 | if not sensor_subset or sensor_subset == "all": 309 | cols = df_cols.index[df_cols.output == False] 310 | else: 311 | cols = [] 312 | for s in sensor_subset.split("+"): 313 | if s.endswith("s"): 314 | s = s[:-1] 315 | cols.append(df_cols.index[df_cols.sensor_type == s]) 316 | cols = pd.concat(cols) 317 | 318 | Xc = df[cols].values 319 | s_labels = dfs_dict["s_labels"] 320 | activity_outputs = df["activity_id"].copy() 321 | 322 | # Replace nulls with 0 323 | activity_outputs = activity_outputs.fillna(0) 324 | 325 | ycs = [activity_outputs.values] 326 | 327 | output_spec_dict = { 328 | "name": "y_activity", 329 | "num_classes": len(s_labels), 330 | "classes": s_labels.sort_index().to_list(), 331 | } 332 | 333 | data_spec = { 334 | "dataset_name": "intention", 335 | "input_channels": Xc.shape[1], 336 | "n_outputs": len(ycs), 337 | "input_features": cols.to_list(), 338 | "output_spec": [output_spec_dict], 339 | } 340 | 341 | return Xc, ycs, data_spec 342 | 343 | 344 | if __name__ == "__main__": 345 | pass 346 | # get_x_y_contig() 347 | -------------------------------------------------------------------------------- /filternet/datasets/opportunity.py: -------------------------------------------------------------------------------- 1 | # Copyright (C) 2020 Pet Insight Project - All Rights Reserved 2 | 3 | """ Loads the opportunity dataset. Based largely on 4 | https://github.com/sussexwearlab/DeepConvLSTM/blob/master/preprocess_data.py 5 | but refactored and with a slightly different normalization 6 | 7 | (we de-mean and normalize to unit std deviation using 8 | the statistics of the training set, instead of rescaling + clipping from 0->1 with predefined limits as 9 | the referenced repo does. ) 10 | 11 | """ 12 | 13 | import os 14 | import re 15 | import urllib.error 16 | import urllib.parse 17 | import urllib.request 18 | from zipfile import ZipFile 19 | 20 | import pandas as pd 21 | 22 | from . 
import datasets_dir
 23 | 
 24 | # datasets_dir = os.path.dirname(__file__)
 25 | 
 26 | DATASET_URL = "https://archive.ics.uci.edu/ml/machine-learning-databases/00226/OpportunityUCIDataset.zip"
 27 | _, DATASET_FILE = os.path.split(DATASET_URL)
 28 | DATASET_SUBDIR, _ = os.path.splitext(DATASET_FILE)
 29 | 
 30 | # +
 31 | fns_train = [
 32 |     "dataset/S1-ADL4.dat",
 33 |     "dataset/S1-Drill.dat",
 34 |     "dataset/S1-ADL5.dat",
 35 |     "dataset/S1-ADL1.dat",
 36 |     "dataset/S1-ADL2.dat",
 37 |     "dataset/S1-ADL3.dat",
 38 |     "dataset/S2-ADL2.dat",
 39 |     "dataset/S3-ADL2.dat",
 40 |     "dataset/S3-ADL1.dat",
 41 |     "dataset/S2-ADL1.dat",
 42 |     "dataset/S3-Drill.dat",
 43 |     # 'dataset/S4-ADL4.dat',
 44 |     # 'dataset/S4-ADL5.dat',
 45 |     "dataset/S2-Drill.dat",
 46 |     # 'dataset/S4-ADL2.dat',
 47 |     # 'dataset/S4-ADL3.dat',
 48 |     # 'dataset/S4-ADL1.dat',
 49 |     # 'dataset/S4-Drill.dat'
 50 | ]
 51 | 
 52 | fns_val = ["dataset/S2-ADL3.dat", "dataset/S3-ADL3.dat"]
 53 | 
 54 | fns_test = [
 55 |     "dataset/S2-ADL4.dat",
 56 |     "dataset/S2-ADL5.dat",
 57 |     "dataset/S3-ADL4.dat",
 58 |     "dataset/S3-ADL5.dat",
 59 | ]
 60 | 
 61 | 
 62 | def make_df_cols():
 63 |     """ Make a dataframe of the columns in the datafiles that can be downselected for various sensor subsets, labels, etc."""
 64 | 
 65 |     # Read in columns list
 66 |     with open(
 67 |         os.path.join(datasets_dir, DATASET_SUBDIR, "dataset/column_names.txt")
 68 |     ) as f:
 69 |         txt = f.read()
 70 | 
 71 |     # Extract Feature columns first
 72 |     pattern = r"^Column: (?P<col_no_matlab>\d*) (?P<cat>\S*\s?\S*) (?P<posn>\S*) (?P<chan>\S*); .*unit = (?P<unit>.*)$"
 73 | 
 74 |     df_feat_cols = pd.DataFrame.from_records(
 75 |         [m.groupdict() for m in re.compile(pattern, re.MULTILINE).finditer(txt)]
 76 |     )
 77 |     df_feat_cols["col_no"] = df_feat_cols["col_no_matlab"].astype(int) - 1
 78 |     df_feat_cols = df_feat_cols.set_index("col_no").sort_index()
 79 |     df_feat_cols["name"] = (
 80 |         (
 81 |             df_feat_cols.cat.str.slice(0, 2)
 82 |             + " "
 83 |             + df_feat_cols["posn"]
 84 |             + " "
 85 |             + df_feat_cols["chan"]
 86 |         )
 87 |         .str.replace("^", "hi")
 88 |         .str.replace("_", "lo")
 89 |     )
 90 | 
 91 |     # Then label columns
 92 |     pattern = r"^Column: (?P<col_no_matlab>\d*) (?P<name>\S*\s?\S*)$"
 93 |     df_label_cols = pd.DataFrame.from_records(
 94 |         [m.groupdict() for m in re.compile(pattern, re.MULTILINE).finditer(txt)]
 95 |     )
 96 |     df_label_cols["col_no"] = df_label_cols["col_no_matlab"].astype(int) - 1
 97 |     df_label_cols = df_label_cols.set_index("col_no").sort_index()
 98 | 
 99 |     # Combine to get all columns
100 |     df_cols = pd.concat([df_feat_cols, df_label_cols], sort=True).sort_index()
101 | 
102 |     return df_cols
103 | 
104 | 
105 | def make_df_labels():
106 |     """ Make a dataframe of the different class labels """
107 |     # +
108 |     pattern = r"^(?P<src_idx>\d*) - (?P<track_name>\S*) - (?P<label_name>.*)$"
109 |     with open(
110 |         os.path.join(datasets_dir, DATASET_SUBDIR, "dataset/label_legend.txt")
111 |     ) as f:
112 |         txt = f.read()
113 |     df_label_legend = pd.DataFrame.from_records(
114 |         [m.groupdict() for m in re.compile(pattern, re.MULTILINE).finditer(txt)]
115 |     )
116 | 
117 |     # Label mapping for locomotion
118 |     df_labels_locomotion = df_label_legend.query(
119 |         'track_name == "Locomotion"'
120 |     ).reset_index(drop=True)
121 |     df_labels_locomotion["idx"] = df_labels_locomotion.index + 1
122 |     df_labels_locomotion.src_idx = df_labels_locomotion.src_idx.astype(int)
123 |     df_labels_locomotion = df_labels_locomotion.set_index("src_idx", drop=True)
124 |     df_labels_locomotion.loc[0] = ("Null", "", 0)
125 | 
126 |     # Label mapping for gestures
127 |     df_labels_gesture = df_label_legend.query(
128 |         'track_name == "ML_Both_Arms"'
129 |     ).reset_index(drop=True)
130 | 
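    # As for locomotion above: number the gesture classes 1..N in "idx", key the
    # frame by the dataset's original label ids ("src_idx"), and add an explicit
    # Null class at index 0.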
df_labels_gesture["idx"] = df_labels_gesture.index + 1 131 | df_labels_gesture.src_idx = df_labels_gesture.src_idx.astype(int) 132 | df_labels_gesture = df_labels_gesture.set_index("src_idx", drop=True) 133 | df_labels_gesture.loc[0] = ("Null", "", 0) 134 | 135 | return df_labels_locomotion, df_labels_gesture 136 | 137 | 138 | def download_if_needed(): 139 | """Downloads and extracts .zip if needed. """ 140 | 141 | if not os.path.exists(os.path.join(datasets_dir, DATASET_SUBDIR)): 142 | if not os.path.exists(os.path.join(datasets_dir, DATASET_FILE)): 143 | print("Downloading .zip file...") 144 | urllib.request.urlretrieve( 145 | DATASET_URL, os.path.join(datasets_dir, DATASET_FILE) 146 | ) 147 | assert os.path.exists(os.path.join(datasets_dir, DATASET_FILE)) 148 | 149 | print("Extracting...") 150 | zip = ZipFile(os.path.join(datasets_dir, DATASET_FILE)) 151 | zip.extractall(datasets_dir) 152 | 153 | assert os.path.exists(os.path.join(datasets_dir, DATASET_SUBDIR)) 154 | 155 | 156 | def load_opp_dataset(fn): 157 | """ Load an individual file """ 158 | print(fn) 159 | return pd.read_csv( 160 | os.path.join(datasets_dir, DATASET_SUBDIR, fn), header=None, sep="\s+" 161 | ) 162 | 163 | 164 | def get_dfs_raw(): 165 | """ Load dataframes of concatenated data files for the three pre-defined splits. """ 166 | df_train = pd.concat([load_opp_dataset(fn) for fn in fns_train]) 167 | df_val = pd.concat([load_opp_dataset(fn) for fn in fns_val]) 168 | df_test = pd.concat([load_opp_dataset(fn) for fn in fns_test]) 169 | 170 | return df_train, df_val, df_test 171 | 172 | 173 | def get_dfs_processed(): 174 | """ Load and preprocess the data from raw data down to more manageable dataframes that can be chopped 175 | up a bit for specific purposes. This includes de-meaning, etc.""" 176 | df_cols = make_df_cols() 177 | df_feat_cols = df_cols[~df_cols.cat.isna()] 178 | 179 | df_labels_locomotion, df_labels_gestures = make_df_labels() 180 | 181 | df_train, df_val, df_test = get_dfs_raw() 182 | 183 | for df in [df_train, df_val, df_test]: 184 | # Meaningful column names 185 | df.columns = df_cols.name.values 186 | 187 | # Calculate feature normalizing stats from training set only 188 | norm_mean = df_train.loc[:, df_feat_cols.name].mean() 189 | norm_std = df_train.loc[:, df_feat_cols.name].std() 190 | 191 | # Apply to all sets to normalize features 192 | for df in [df_train, df_val, df_test]: 193 | # de-mean 194 | df.loc[:, df_feat_cols.name] -= norm_mean 195 | 196 | # unit std dev 197 | df.loc[:, df_feat_cols.name] /= norm_std 198 | 199 | # interpolate and NA's -> 0 200 | df.loc[:, df_feat_cols.name] = ( 201 | df.loc[:, df_feat_cols.name].interpolate().fillna(0) 202 | ) 203 | 204 | df["y_locomotion"] = df["Locomotion"].map(df_labels_locomotion.idx) 205 | df["y_gesture"] = df["ML_Both_Arms"].map(df_labels_gestures.idx) 206 | 207 | return df_train, df_val, df_test 208 | 209 | 210 | _df_dicts = None 211 | 212 | 213 | def get_or_make_dfs(): 214 | """ Loads pre-processed dataframes from disk, if they exist; creates them if not. Once loaded, they are 215 | cached in-memory for fast subsequent loads. """ 216 | 217 | global _df_dicts 218 | if _df_dicts is not None: 219 | return _df_dicts 220 | download_if_needed() 221 | cache_dir = os.path.join(datasets_dir, DATASET_SUBDIR, "cache") 222 | if not os.path.exists(cache_dir): 223 | print("Opportunity data not cached. 
Creating cache now...") 224 | os.makedirs(cache_dir) 225 | 226 | df_cols = make_df_cols() 227 | df_labels_locomotion, df_labels_gestures = make_df_labels() 228 | df_train, df_val, df_test = get_dfs_processed() 229 | 230 | df_train.to_pickle(os.path.join(cache_dir, "df_train.df.pkl")) 231 | df_val.to_pickle(os.path.join(cache_dir, "df_val.df.pkl")) 232 | df_test.to_pickle(os.path.join(cache_dir, "df_test.df.pkl")) 233 | 234 | df_labels_locomotion.to_pickle( 235 | os.path.join(cache_dir, "df_labels_locomotion.df.pkl") 236 | ) 237 | df_labels_gestures.to_pickle( 238 | os.path.join(cache_dir, "df_labels_gestures.df.pkl") 239 | ) 240 | df_cols.to_pickle(os.path.join(cache_dir, "df_cols.df.pkl")) 241 | print("Caching done.") 242 | 243 | print("Loading cached data.") 244 | df_train = pd.read_pickle(os.path.join(cache_dir, "df_train.df.pkl")) 245 | df_val = pd.read_pickle(os.path.join(cache_dir, "df_val.df.pkl")) 246 | df_test = pd.read_pickle(os.path.join(cache_dir, "df_test.df.pkl")) 247 | 248 | df_labels_locomotion = pd.read_pickle( 249 | os.path.join(cache_dir, "df_labels_locomotion.df.pkl") 250 | ) 251 | df_labels_gestures = pd.read_pickle( 252 | os.path.join(cache_dir, "df_labels_gestures.df.pkl") 253 | ) 254 | df_cols = pd.read_pickle(os.path.join(cache_dir, "df_cols.df.pkl")) 255 | print("Loaded.") 256 | 257 | _df_dicts = dict( 258 | df_train=df_train, 259 | df_val=df_val, 260 | df_test=df_test, 261 | df_labels_locomotion=df_labels_locomotion, 262 | df_labels_gestures=df_labels_gestures, 263 | df_cols=df_cols, 264 | ) 265 | 266 | return _df_dicts 267 | 268 | 269 | def get_x_y_contig( 270 | which_set="train", dfs_dict=None, y_cols=["y_gesture"], sensor_subset=None 271 | ): 272 | """ Load X and y as contiguous time vectors (with various runs concatenated together). 273 | 274 | Parameters 275 | ---------- 276 | which_set: str 277 | which of the pre-defined splits to load. Can specify multiples, separated by '+'. E.g., 278 | "train", "val", "test", "train+val", etc. 279 | dfs_dict: dict 280 | Can provide pre-laoded df_dict to save time. (deprecated) 281 | y_cols: List[str] 282 | List of which label columns to return (e.g., ['y_gesture'], ['y_locomotion'], or ['y_gesture', 'y_locomotion'] 283 | sensor_subset: str 284 | Which subset of sensors to include. 
Valid values include: 285 | "accels", 286 | "gyros", 287 | "accels+gyros", 288 | "accels+gyros+magnetic", << above categories are for motion jacket only 289 | "opportunity" <<< all 113 sensors used in opportunity challenge 290 | """ 291 | if dfs_dict is None: 292 | dfs_dict = get_or_make_dfs() 293 | 294 | assert type(y_cols) == list 295 | 296 | df = [] 297 | for _which_set in which_set.split("+"): 298 | df.append(dfs_dict["df_" + _which_set]) 299 | 300 | df = pd.concat(df) 301 | 302 | df_cols = dfs_dict["df_cols"] 303 | 304 | # Used for sensor subsets (arm + back IMUs) 305 | core_posns = ["RUA", "LUA", "RLA", "LLA", "BACK"] 306 | 307 | # For full Opportunity complement of sensors: 308 | xtra_posns = [ 309 | "L-SHOE", 310 | "R-SHOE", 311 | "RKN_", 312 | "RUA_", 313 | "RWR", 314 | "LUA_", 315 | "HIP", 316 | "RUA^", 317 | "RKN^", 318 | "LWR", 319 | "LH", 320 | "LUA^", 321 | "RH", 322 | ] 323 | 324 | if sensor_subset is None or sensor_subset == "opportunity": 325 | posns = core_posns + xtra_posns 326 | chan_pattern = "" 327 | cat_pattern = "" 328 | else: 329 | posns = core_posns 330 | # regex for selecting sensors by prefixes 331 | chan_pattern = "|".join([f"^{s[:3]}" for s in sensor_subset.split("+")]) 332 | cat_pattern = "^Inertial" 333 | 334 | feature_cols = df_cols[ 335 | ( 336 | df_cols.posn.isin(posns) 337 | & ~(df_cols.chan.str.match("^Quat") == True) 338 | & (df_cols.chan.str.match(chan_pattern) == True) 339 | & (df_cols.cat.str.match(cat_pattern) == True) 340 | ) 341 | ] 342 | 343 | Xc = df[feature_cols.name].values 344 | ycs = [df[y_col].values for y_col in y_cols] 345 | 346 | output_spec_dict = { 347 | "y_gesture": { 348 | "name": "gesture", 349 | "num_classes": len(dfs_dict["df_labels_gestures"]), 350 | "classes": dfs_dict["df_labels_gestures"] 351 | .set_index("idx") 352 | .sort_index() 353 | .label_name.to_list(), 354 | }, 355 | "y_locomotion": { 356 | "name": "locomotion", 357 | "num_classes": len(dfs_dict["df_labels_locomotion"]), 358 | "classes": dfs_dict["df_labels_locomotion"] 359 | .set_index("idx") 360 | .sort_index() 361 | .label_name.to_list(), 362 | }, 363 | } 364 | 365 | data_spec = { 366 | "dataset_name": "opportunity", 367 | "input_channels": Xc.shape[1], 368 | "n_outputs": len(y_cols), 369 | "input_features": feature_cols.name.to_list(), 370 | "output_spec": [output_spec_dict[y_col] for y_col in y_cols], 371 | } 372 | 373 | return Xc, ycs, data_spec 374 | 375 | 376 | if __name__ == "__main__": 377 | pass 378 | -------------------------------------------------------------------------------- /filternet/datasets/smartphone_hapt.py: -------------------------------------------------------------------------------- 1 | # Copyright (C) 2020 Pet Insight Project - All Rights Reserved 2 | 3 | """ Loads the smartphone human activity and posture detection dataset. 4 | 5 | HAPT 6 | """ 7 | 8 | import os 9 | import urllib 10 | from zipfile import ZipFile 11 | 12 | # from . 
import datasets_dir 13 | import numpy as np 14 | 15 | datasets_dir = os.path.dirname(__file__) 16 | 17 | import pandas as pd 18 | 19 | # Papers 20 | # http://conference.scipy.org/proceedings/scipy2018/pdfs/christian_mcdaniel.pdf 21 | # github: https://github.com/xtianmcd/accelstm 22 | # https://www.mdpi.com/1424-8220/17/11/2556/htm 23 | # https://arxiv.org/pdf/1801.04503.pdf 24 | # http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.641.2285&rep=rep1&type=pdf 25 | 26 | 27 | DATASET_URL = "http://archive.ics.uci.edu/ml/machine-learning-databases/00341/HAPT%20Data%20Set.zip" 28 | _, DATASET_FILE = os.path.split(DATASET_URL) 29 | DATASET_SUBDIR, _ = os.path.splitext(DATASET_FILE) 30 | DATASET_SUBDIR = DATASET_SUBDIR.replace("%20", "_") 31 | OUTPUT_DIR = os.path.join(datasets_dir, DATASET_SUBDIR) 32 | 33 | 34 | def download_if_needed(): 35 | """Downloads and extracts .zip if needed. """ 36 | if not os.path.exists(OUTPUT_DIR): 37 | if not os.path.exists(os.path.join(datasets_dir, DATASET_FILE)): 38 | print("Downloading .zip file...") 39 | urllib.request.urlretrieve( 40 | DATASET_URL, os.path.join(datasets_dir, DATASET_FILE) 41 | ) 42 | assert os.path.exists(os.path.join(datasets_dir, DATASET_FILE)) 43 | 44 | print(f"Extracting to {OUTPUT_DIR}...") 45 | zip = ZipFile(os.path.join(datasets_dir, DATASET_FILE)) 46 | zip.extractall(OUTPUT_DIR) 47 | 48 | assert os.path.exists(OUTPUT_DIR) 49 | 50 | 51 | def get_label_series(): 52 | activity_labels_path = os.path.join(OUTPUT_DIR, "activity_labels.txt") 53 | print(f"Loading file {activity_labels_path}") 54 | label_series = pd.read_csv(activity_labels_path, sep=r"\s+", header=None) 55 | label_series.columns = ["class_number", "class_label"] 56 | label_series = label_series.dropna(how="any", axis=0) 57 | return pd.Series( 58 | label_series["class_label"].values, index=label_series["class_number"].values 59 | ) 60 | 61 | 62 | def load_labels(): 63 | label_file_path = os.path.join(OUTPUT_DIR, "RawData/labels.txt") 64 | print(f"Reading labels file {label_file_path}") 65 | df_labels = pd.read_csv(label_file_path, r"\s+", header=None) 66 | df_labels.columns = [ 67 | "experiment_id", 68 | "subject_id", 69 | "activity_id", 70 | "label_start_point", 71 | "label_end_point", 72 | ] 73 | 74 | label_series = get_label_series() 75 | 76 | df_labels["activity_label"] = df_labels["activity_id"].map(label_series) 77 | 78 | return df_labels.sort_values( 79 | ["experiment_id", "label_start_point", "label_end_point"] 80 | ) 81 | 82 | 83 | def separate_subjects(): 84 | all_subjects = set(range(1, 31)) 85 | # validation_subjects = {4, 12, 20, 27} # 3 test (4, 12, 20) + 1 train (27) 86 | validation_subjects = {5, 16, 27} 87 | test_subjects = {2, 4, 9, 10, 12, 13, 18, 20, 24} # Given test subjects 88 | train_subjects = all_subjects - test_subjects - validation_subjects 89 | assert train_subjects | test_subjects | validation_subjects == all_subjects 90 | assert set() == validation_subjects & test_subjects 91 | assert set() == train_subjects & test_subjects 92 | assert set() == train_subjects & validation_subjects 93 | return train_subjects, validation_subjects, test_subjects 94 | 95 | 96 | def get_column_info(): 97 | sensor_types = ["accel", "gyro"] 98 | file_names = ["acc", "gyro"] 99 | num_chans = [3, 3] 100 | assert len(num_chans) == len(sensor_types) 101 | types = [] 102 | names = [] 103 | file_prefixes = [] 104 | for i in range(len(sensor_types)): 105 | t = sensor_types[i] 106 | nc = num_chans[i] 107 | types.extend([t] * nc) 108 | file_prefixes.extend([file_names[i]] * 
nc) 109 | names.extend([f"{t[0]}{j + 1}" for j in range(nc)]) 110 | 111 | # Note that this segfaults if the input data is a tuple and not a list. WTF pandas? 112 | df = pd.DataFrame( 113 | [types, file_prefixes], index=["sensor_type", "file_prefix"], columns=names 114 | ).T 115 | df.index.name = "name" 116 | df["output"] = False 117 | 118 | df = df.reindex( 119 | df.index.append( 120 | pd.Index(["experiment_id", "subject_id", "activity_id", "activity_label"]) 121 | ) 122 | ) 123 | 124 | df.loc[["activity_id", "activity_label"], "output"] = True 125 | 126 | return df 127 | 128 | 129 | def get_data(): 130 | train_subjects, validation_subjects, test_subjects = separate_subjects() 131 | labels = load_labels() 132 | col_info = get_column_info() 133 | 134 | output_cols = col_info.index[col_info.output == True] 135 | not_input_cols = col_info.index[~(col_info.output == False)] 136 | all_sensors = col_info.loc[col_info.output == False, "sensor_type"].unique() 137 | train_data = [] 138 | validation_data = [] 139 | test_data = [] 140 | for exp_num, exp_labels in labels.groupby("experiment_id"): 141 | user_id = exp_labels["subject_id"].unique() # Should only be 1 142 | assert len(user_id) == 1 143 | user_id = user_id[0] 144 | 145 | exp_data = [] 146 | for t in all_sensors: 147 | sensor_type_mask = t == col_info.sensor_type 148 | prefix = col_info.loc[sensor_type_mask, "file_prefix"].unique()[0] 149 | file_path = os.path.join( 150 | OUTPUT_DIR, 151 | "RawData", 152 | f"{prefix}_exp{exp_num:02d}_user{user_id:02d}.txt", 153 | ) 154 | print(f"Reading file {file_path}") 155 | tmp_df = pd.read_csv(file_path, sep=r"\s+", header=None) 156 | tmp_df.columns = col_info.index[sensor_type_mask] 157 | exp_data.append(tmp_df) 158 | exp_data = pd.concat(exp_data, axis=1, ignore_index=False) 159 | 160 | # Make 1 based to match everything else 161 | exp_data.index = exp_data.index + 1 162 | exp_data = pd.concat( 163 | (exp_data, pd.DataFrame(index=exp_data.index, columns=not_input_cols)), 164 | axis=1, 165 | ignore_index=False, 166 | ) 167 | 168 | for _, l in exp_labels.iterrows(): 169 | exp_data.loc[ 170 | l["label_start_point"] : l["label_end_point"], output_cols 171 | ] = l[output_cols].values 172 | 173 | # Just to help make sure matches expectations 174 | exp_data = exp_data.reset_index(drop=True) 175 | 176 | exp_data["experiment_id"] = exp_num 177 | exp_data["subject_id"] = user_id 178 | exp_data["activity_id"] = exp_data["activity_id"].fillna(0) 179 | exp_data["activity_label"] = "UNKNOWN" 180 | 181 | if user_id in train_subjects: 182 | train_data.append(exp_data) 183 | elif user_id in validation_subjects: 184 | validation_data.append(exp_data) 185 | elif user_id in test_subjects: 186 | test_data.append(exp_data) 187 | else: 188 | raise ValueError(f"Unexpected subject {user_id}") 189 | 190 | train_data = pd.concat(train_data) 191 | validation_data = pd.concat(validation_data) 192 | test_data = pd.concat(test_data) 193 | return train_data, validation_data, test_data 194 | 195 | 196 | def get_dfs_processed(): 197 | df_train, df_val, df_test = get_data() 198 | col_df = get_column_info() 199 | # Calculate feature normalizing stats from training set only 200 | norm_mean = df_train.mean(axis=0) 201 | norm_std = df_train.std(axis=0) 202 | 203 | # Apply to all sets to normalize features 204 | input_features = col_df.index[col_df.output == False] 205 | for df in (df_train, df_val, df_test): 206 | # de-mean 207 | df.loc[:, input_features] -= norm_mean 208 | 209 | # unit std dev 210 | df.loc[:, input_features] /= norm_std 
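        # Note: norm_mean/norm_std are computed from the training split only, so
        # no scaling statistics leak from the validation/test splits.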
211 | 212 | # interpolate and NA's -> 0 213 | df.loc[:, input_features] = df.loc[:, input_features].interpolate().fillna(0) 214 | 215 | return df_train, df_val, df_test 216 | 217 | 218 | _df_dicts = {} 219 | 220 | 221 | def get_or_make_dfs(): 222 | if _df_dicts: 223 | return _df_dicts 224 | download_if_needed() 225 | 226 | cache_dir = os.path.join(OUTPUT_DIR, "cache") 227 | if not os.path.isdir(cache_dir) or not os.path.isfile( 228 | os.path.join(cache_dir, "df_train.df.pkl") 229 | ): 230 | print("Smartphone data not cached. Creating cache now...") 231 | try: 232 | os.makedirs(cache_dir) 233 | except FileExistsError: 234 | pass 235 | 236 | df_train, df_val, df_test = get_dfs_processed() 237 | df_cols = get_column_info() 238 | 239 | s_labels = get_label_series() 240 | 241 | df_train.to_pickle(os.path.join(cache_dir, "df_train.df.pkl")) 242 | _df_dicts["df_train"] = df_train 243 | df_val.to_pickle(os.path.join(cache_dir, "df_val.df.pkl")) 244 | _df_dicts["df_val"] = df_val 245 | df_test.to_pickle(os.path.join(cache_dir, "df_test.df.pkl")) 246 | _df_dicts["df_test"] = df_test 247 | 248 | df_cols.to_pickle(os.path.join(cache_dir, "df_cols.df.pkl")) 249 | _df_dicts["df_cols"] = df_cols 250 | s_labels.to_pickle(os.path.join(cache_dir, "s_labels.s.pkl")) 251 | _df_dicts["s_labels"] = s_labels 252 | print("Caching done.") 253 | else: 254 | print("Loading cached data.") 255 | _df_dicts["df_train"] = pd.read_pickle( 256 | os.path.join(cache_dir, "df_train.df.pkl") 257 | ) 258 | _df_dicts["df_val"] = pd.read_pickle(os.path.join(cache_dir, "df_val.df.pkl")) 259 | _df_dicts["df_test"] = pd.read_pickle(os.path.join(cache_dir, "df_test.df.pkl")) 260 | 261 | _df_dicts["df_cols"] = pd.read_pickle(os.path.join(cache_dir, "df_cols.df.pkl")) 262 | _df_dicts["s_labels"] = pd.read_pickle( 263 | os.path.join(cache_dir, "s_labels.s.pkl") 264 | ) 265 | print("Loaded.") 266 | 267 | return _df_dicts 268 | 269 | 270 | def get_x_y_contig( 271 | which_set="train", 272 | dfs_dict=None, 273 | y_cols=("y_activity",), 274 | sensor_subset=None, 275 | include_transitions=False, 276 | ): 277 | """ Load X and y as contiguous time vectors (with various runs concatenated together). 278 | 279 | Parameters 280 | ---------- 281 | which_set: str 282 | which of the pre-defined splits to load. Can specify multiples, separated by '+'. E.g., 283 | "train", "val", "test", "train+val", etc. 284 | dfs_dict: dict 285 | Can provide pre-laoded df_dict to save time. (deprecated) 286 | y_cols: List[str] 287 | List of which label columns to return (e.g., ['y_gesture'], ['y_locomotion'], or ['y_gesture', 'y_locomotion'] 288 | sensor_subset: ty.Iterable 289 | Which subset of sensors to include. 
Valid values include:
290 |             "accels", "accel"
291 |             "gyros", "gyro"
292 |             "accels+gyros", "accel+gyro"
293 |             "all"
294 |     include_transitions : bool
295 |     """
296 |     y_cols = list(y_cols)
297 |     if not dfs_dict:
298 |         dfs_dict = get_or_make_dfs()
299 | 
300 |     assert isinstance(y_cols, list)
301 | 
302 |     df = []
303 |     for _which_set in which_set.split("+"):
304 |         df.append(dfs_dict["df_" + _which_set])
305 | 
306 |     df = pd.concat(df)
307 | 
308 |     df_cols = dfs_dict["df_cols"]
309 | 
310 |     if not sensor_subset or sensor_subset == "all":
311 |         cols = df_cols.index[df_cols.output == False]
312 |     else:
313 |         cols = []
314 |         for s in sensor_subset.split("+"):
315 |             if s.endswith("s"):
316 |                 s = s[:-1]
317 |             cols.append(df_cols.index[df_cols.sensor_type == s])
318 |         cols = pd.concat(cols)
319 | 
320 |     Xc = df[cols].values
321 |     s_labels = dfs_dict["s_labels"]
322 |     activity_outputs = df["activity_id"].copy()
323 |     if not include_transitions:
324 |         transition_mask = s_labels.str.contains("_TO_")
325 |         transition_activities = s_labels.index[transition_mask]
326 |         activity_outputs[activity_outputs.isin(transition_activities)] = np.nan
327 |         s_labels = s_labels[~transition_mask]
328 | 
329 |     # Replace nulls with 0
330 |     activity_outputs = activity_outputs.fillna(0)
331 | 
332 |     ycs = [activity_outputs.values]
333 | 
334 |     # Include the null class
335 |     output_spec_dict = {
336 |         "name": "y_activity",
337 |         "num_classes": len(s_labels) + 1,
338 |         "classes": [""] + s_labels.sort_index().to_list(),
339 |     }
340 | 
341 |     data_spec = {
342 |         "dataset_name": "smartphone",
343 |         "input_channels": Xc.shape[1],
344 |         "n_outputs": len(ycs),
345 |         "input_features": cols.to_list(),
346 |         "output_spec": [output_spec_dict],
347 |     }
348 | 
349 |     return Xc, ycs, data_spec
350 | 
351 | 
352 | if __name__ == "__main__":
353 |     pass
354 | 
-------------------------------------------------------------------------------- /filternet/models/__init__.py: --------------------------------------------------------------------------------
  1 | # Copyright (C) 2020 Pet Insight Project - All Rights Reserved
  2 | 
  3 | from .base_net import BaseNet
  4 | from .deep_conv_lstm import DeepConvLSTM
  5 | from .filter_net import FilterNet
  6 | from .filter_net_ensemble import FilterNetEnsemble
  7 | 
-------------------------------------------------------------------------------- /filternet/models/base_layers.py: --------------------------------------------------------------------------------
  1 | # Copyright (C) 2020 Pet Insight Project - All Rights Reserved
  2 | 
  3 | import numpy as np
  4 | from torch import nn
  5 | 
  6 | 
  7 | class CustomRNNMixin(object):
  8 |     """Mixin to help wrap PyTorch recurrent layer(s) to swap axes 1&2 (and back), since that's what PyTorch RNNs expect.
  9 |     """
 10 | 
 11 |     def __init__(self, *args, **kwargs):
 12 |         if "batch_first" not in kwargs:
 13 |             kwargs["batch_first"] = True
 14 |         super().__init__(*args, **kwargs)
 15 | 
 16 |     def forward(self, input):
 17 |         input = input.transpose(1, 2).contiguous()
 18 |         output, h_n = super().forward(input)
 19 |         return output.transpose(1, 2).contiguous()
 20 | 
 21 | 
 22 | class CustomGRU(CustomRNNMixin, nn.GRU):
 23 |     """Wraps PyTorch GRU to swap axes 1&2 (and back) since that's what PyTorch RNNs expect.
 24 |     GRU sub-type.
 25 |     """
 26 | 
 27 |     pass
 28 | 
 29 | 
 30 | class CustomLSTM(CustomRNNMixin, nn.LSTM):
 31 |     """Wraps PyTorch LSTM version to swap axes 1&2 (and back) since that's what PyTorch RNNs expect.
 32 |     LSTM sub-type.
33 |     """
34 | 
35 |     pass
36 | 
37 | 
38 | class CGLLayer(nn.Sequential):
39 |     """Flexible implementation of a convolution/GRU/LSTM layer, which is the basic building block of our models. Each
40 |     layer is made up of (optional) dropout, a CNN, GRU, or LSTM layer surrounded by (optional) striding/pooling
41 |     layers, and a BatchNorm layer.
42 | 
43 |     This layer subclasses torch.nn.Sequential so that all the pytorch magic still works with it (like transferring
44 |     to/from devices, initializing weights, switching back/forth to eval mode, etc.)
45 |     """
46 | 
47 |     output_size = (
48 |         None
49 |     )  # type: int # depth (channels) output by this layer, useful for hooking up to subsequent modules.
50 | 
51 |     def __init__(
52 |         self,
53 |         input_size,
54 |         output_size,
55 |         kernel_size=5,
56 |         type="cnn",
57 |         stride=1,
58 |         pool=None,
59 |         dropout=0.1,
60 |         stride_pos=None,
61 |         batch_norm=True,
62 |         groups=1,
63 |     ):
64 |         """
65 | 
66 |         Parameters
67 |         ----------
68 |         input_size: int
69 |             Depth (channels) of input / previous layer
70 |         output_size: int
71 |             Depth (channels) that this layer will output
72 |         kernel_size: int
73 |             For CNNs
74 |         type: str
75 |             'cnn', 'lstm', or 'gru'; determines primary layer type.
76 |         stride: int
77 |             How much to decimate output (in temporal dimension) via _striding_. Defaults to 1 (no decimation).
78 |         pool: int
79 |             How much to decimate output (in temporal dimension) via _average_pooling_. Defaults to None (no pooling).
80 |         dropout: float
81 |             Amount of dropout to apply. Defaults to 0.1; falsy values (0 or None) disable dropout.
82 |         stride_pos: str
83 |             For recurrent layers only; determines whether striding/pooling is done *before* (default) or
84 |             *after* the recurrent layer.
85 |         batch_norm: bool
86 |             If True (default), the activation layer is followed by a batchnorm layer.
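        Example (an illustrative sketch; a strided CNN block that halves the
        temporal length):

            >>> import torch
            >>> layer = CGLLayer(32, 64, kernel_size=5, type="cnn", stride=2, pool=2)
            >>> layer(torch.randn(8, 32, 256)).shape
            torch.Size([8, 64, 128])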
87 | """ 88 | 89 | layers = [] 90 | self.output_size = output_size 91 | 92 | if type == "cnn": 93 | if dropout: 94 | layers.append(nn.Dropout2d(dropout)) 95 | s = 1 if pool else stride 96 | p = int(np.ceil((kernel_size - s) / 2.0)) 97 | layers.append( 98 | nn.Conv1d( 99 | input_size, 100 | output_size, 101 | stride=s, 102 | kernel_size=kernel_size, 103 | padding=p, 104 | groups=groups, 105 | ) 106 | ) 107 | layers.append(nn.ReLU()) 108 | if pool: 109 | p = int(np.ceil((pool - stride) / 2.0)) 110 | layers.append( 111 | nn.AvgPool1d(pool, stride, padding=p, count_include_pad=False) 112 | ) 113 | elif type in ["gru", "lstm"]: 114 | klass = {"gru": CustomGRU, "lstm": CustomLSTM}[type] 115 | if (pool or stride) and stride_pos != "post": 116 | pl = 1 if not pool else pool 117 | p = np.ceil((pl - stride) / 2.0).astype(int) 118 | layers.append(nn.AvgPool1d(pl, stride=stride, padding=p)) 119 | if dropout: 120 | layers.append(nn.Dropout2d(dropout)) 121 | assert output_size % 2 == 0 # must be even b/c bidirectional 122 | layers.append( 123 | klass( 124 | input_size=input_size, 125 | hidden_size=int(output_size / 2), 126 | bidirectional=True, 127 | ) 128 | ) 129 | if (pool or stride) and stride_pos == "post": 130 | pl = 1 if not pool else pool 131 | p = np.ceil((pl - stride) / 2.0).astype(int) 132 | layers.append(nn.AvgPool1d(pl, stride=stride, padding=p)) 133 | else: 134 | raise ValueError("Unknown layer type: %s" % type) 135 | 136 | # Follow with BN 137 | if batch_norm: 138 | layers.append(nn.BatchNorm1d(self.output_size)) 139 | 140 | super().__init__(*layers) 141 | -------------------------------------------------------------------------------- /filternet/models/base_net.py: -------------------------------------------------------------------------------- 1 | # Copyright (C) 2020 Pet Insight Project - All Rights Reserved 2 | 3 | import numpy as np 4 | import torch 5 | from torch import nn 6 | 7 | 8 | class BaseNet(nn.Module): 9 | """ Abstract 'base' network that can be reimplemented for specific architectures.""" 10 | 11 | def __init__( 12 | self, 13 | input_channels=113, 14 | num_output_classes=[18], 15 | output_type="many_to_many", 16 | keep_intermediates=False, 17 | **other_kwargs, 18 | ): 19 | 20 | self.output_type = output_type 21 | self.num_output_classes = num_output_classes 22 | self.input_channels = input_channels 23 | self.keep_intermediates = keep_intermediates 24 | self.padding_lost_per_side = 0 25 | self.output_stride = 1 26 | 27 | super(BaseNet, self).__init__() 28 | 29 | self.build(**other_kwargs) 30 | 31 | def build(self, **other_kwargs): 32 | """ Builds the network. Can take any number of custom params as kwargs to configure it. 33 | REIMPLEMENT IN SUBCLASSES. 34 | """ 35 | raise NotImplementedError() 36 | 37 | def forward(self, X, **kwargs): 38 | ys = self._forward(X, **kwargs) 39 | 40 | if self.output_type == "many_to_one_takelast": 41 | return [y[:, :, [-1]] for y in ys] 42 | elif self.output_type == "many_to_many": 43 | return ys 44 | else: 45 | raise NotImplemented(self.output_type) 46 | 47 | def _forward(self, X, **kwargs): 48 | """Forward pass logic specific to this network type. 49 | REIPMLEMENT IN SUBCLASSES. 50 | Input dimensionality: (N, C_{in}, L_{in})""" 51 | raise NotImplementedError() 52 | 53 | def transform_targets(self, ys, one_hot=True): 54 | """ Convert a `y` vector (one of `ys`) into targets that can be compared 55 | to network outputs... take into account padding, one-hot encoding (if requested), 56 | and whether the network is many-to-many or many-to-one. 
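        For example (illustrative): for a many-to-many model with output_stride=8
        and targets y of shape [N, 512], this returns class indices of shape
        [N, 64], sampled at 64 evenly spaced points centered in the window; with
        one_hot=True the result is instead of shape [N, num_classes, 64].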
""" 57 | ys2 = [] 58 | for i_y, y in enumerate(ys): 59 | if self.output_type == "many_to_one_takelast" and not one_hot: 60 | ys2.append(y[:, [-1]]) 61 | continue 62 | 63 | # Account for any change in sequence length due to padding 64 | if self.padding_lost_per_side > 0: 65 | y = y[:, self.padding_lost_per_side : -self.padding_lost_per_side] 66 | 67 | # for many-to-many, if needed: 68 | win_len = y.shape[-1] 69 | # Calculate number of outputs. This is not always accurate and sometimes 70 | # 'floor' needs to change to 'ceil' or vice-versa... TBD is to implement 71 | # a system that calculates this accurately for all of our possible 72 | # architectures. 73 | output_size = int(np.floor(win_len / float(self.output_stride))) 74 | # Now, create that many outputs, evenly spaced by output_stride 75 | output_idxs = np.arange(output_size) * self.output_stride 76 | # Now, center it in the middle of the window. Depending on our 77 | # architecture, this many not be *exactly* optimal, but it's 78 | # a good guess on average. 79 | # Note: win_len - 1 because of zero-indexing 80 | output_idxs = np.round( 81 | output_idxs - (output_idxs.mean() - (win_len - 1) / 2.0) 82 | ).astype(int) 83 | 84 | if one_hot: 85 | if len(y.shape) == 2: 86 | # Do one-hot encoding 87 | y = torch.zeros( 88 | y.size()[0], 89 | self.num_output_classes[i_y], 90 | y.size()[1], 91 | device=y.device, 92 | ).scatter_(1, y.unsqueeze(1), 1) 93 | 94 | if self.output_type == "many_to_one_takelast": 95 | ys2.append(y[:, :, [output_idxs[-1]]]) 96 | elif self.output_type == "many_to_many": 97 | ys2.append(y[:, :, output_idxs]) 98 | else: 99 | raise NotImplemented(self.output_type) 100 | 101 | else: 102 | if self.output_type == "many_to_one_takelast": 103 | ys2.append(y[:, [output_idxs[-1]]]) 104 | elif self.output_type == "many_to_many": 105 | ys2.append(y[:, output_idxs]) 106 | else: 107 | raise NotImplemented(self.output_type) 108 | 109 | return ys2 110 | -------------------------------------------------------------------------------- /filternet/models/deep_conv_lstm.py: -------------------------------------------------------------------------------- 1 | # Copyright (C) 2020 Pet Insight Project - All Rights Reserved 2 | 3 | from torch import nn 4 | 5 | from .base_net import BaseNet 6 | 7 | 8 | class DeepConvLSTM(BaseNet): 9 | """ A pytorch implementation of 'DeepConvLSTM' as described in: 10 | 11 | [1] F. J. Ordóñez and D. Roggen, “Deep Convolutional and LSTM Recurrent Neural Networks for 12 | Multimodal Wearable Activity Recognition,” Sensors, vol. 16, no. 1, p. 115, Jan. 2016. 
13 | """ 14 | 15 | def __init__(self, **other_kwargs): 16 | super().__init__(output_type="many_to_one_takelast", **other_kwargs) 17 | 18 | def build( 19 | self, 20 | num_filters=64, 21 | filter_size=5, 22 | num_units_lstm=128, 23 | scale=1.0, 24 | **other_kwargs, 25 | ): 26 | 27 | pad = 0 28 | 29 | num_filters = int(num_filters * scale) 30 | num_units_lstm = int(num_units_lstm * scale) 31 | 32 | n_conv = 4 33 | conv_stack = [] 34 | in_shape = 1 35 | for i in range(n_conv): 36 | conv_stack.append( 37 | nn.Conv2d(in_shape, num_filters, (filter_size, 1), padding=(pad, 0)) 38 | ) 39 | conv_stack.append(nn.ReLU()) 40 | self.padding_lost_per_side += int((filter_size - 1) / 2) 41 | in_shape = num_filters 42 | 43 | self.conv_stack = nn.Sequential(*conv_stack) 44 | 45 | self.drop1 = nn.Dropout(0.5) 46 | self.lstm1 = nn.LSTM(num_filters * self.input_channels, num_units_lstm) 47 | self.drop2 = nn.Dropout(0.5) 48 | self.lstm2 = nn.LSTM(num_units_lstm, num_units_lstm) 49 | 50 | end_stacks = [] 51 | for num_output_classes in self.num_output_classes: 52 | end_stacks.append( 53 | nn.Sequential( 54 | nn.Dropout(0.5), 55 | # This sort of Conv1D acts as a time-distributed Dense layer. 56 | nn.Conv1d( 57 | num_units_lstm, num_output_classes, 1 58 | ), # time-distributed dense 59 | ) 60 | ) 61 | self.end_stacks = nn.ModuleList(end_stacks) 62 | 63 | # Original DeepConvLSTM used an orthogonal weight initialization 64 | for m in self.modules(): 65 | if isinstance(m, (nn.Conv2d, nn.Conv1d)): 66 | nn.init.orthogonal_(m.weight) 67 | 68 | def _forward(self, X, **kwargs): 69 | """(N, C_{in}, L_{in})""" 70 | Xs = [X] # [batch, chan, seq] 71 | Xs.append( 72 | Xs[-1].unsqueeze(1).permute([0, 1, 3, 2]) 73 | ) # [batch, filters, seq, sensors] 74 | 75 | Xs.append(self.conv_stack(Xs[-1])) 76 | 77 | Xs.append( 78 | Xs[-1].permute([2, 0, 1, 3]).flatten(2) 79 | ) # to [seq, batch, (filtersxsensors)] 80 | 81 | Xs.append(self.drop1(Xs[-1])) 82 | Xs.append(self.lstm1(Xs[-1])[0]) 83 | 84 | Xs.append(self.drop2(Xs[-1])) 85 | Xs.append(self.lstm2(Xs[-1])[0]) 86 | 87 | Xs.append(Xs[-1].permute([1, 2, 0])) # back to [batch, chan, seq] 88 | 89 | ys = [end_stack(Xs[-1]) for end_stack in self.end_stacks] 90 | # No softmax because the pytorch cross_entropy loss function wants the raw outputs. 
91 | 92 | if self.keep_intermediates: 93 | self.Xs = Xs 94 | 95 | return ys 96 | -------------------------------------------------------------------------------- /filternet/models/filter_net.py: -------------------------------------------------------------------------------- 1 | # Copyright (C) 2020 Pet Insight Project - All Rights Reserved 2 | 3 | import numpy as np 4 | import torch 5 | import torch.nn.functional as F 6 | from torch import nn 7 | 8 | from .base_layers import CGLLayer 9 | from .base_net import BaseNet 10 | 11 | DEFAULT_WIDTH = 100 12 | 13 | 14 | class FilterNet(BaseNet): 15 | def build( 16 | self, 17 | n_pre=1, 18 | w_pre=DEFAULT_WIDTH, 19 | n_strided=3, 20 | w_strided=DEFAULT_WIDTH, 21 | n_interp=4, 22 | w_interp=DEFAULT_WIDTH, 23 | n_dense_pre_l=1, 24 | w_dense_pre_l=DEFAULT_WIDTH, 25 | n_l=1, 26 | w_l=DEFAULT_WIDTH, 27 | n_dense_post_l=0, 28 | w_dense_post_l=int(DEFAULT_WIDTH / 2), 29 | cnn_kernel_size=5, 30 | scale=1.0, 31 | bn_pre=False, 32 | dropout=0.1, 33 | do_pool=True, 34 | stride_pos="post", 35 | stride_amt=2, 36 | **other_kwargs, 37 | ): 38 | # if scale != 1: 39 | w_pre = int((w_pre * scale)) # / 6) * 6 40 | w_strided = int((w_strided * scale)) # / 6) * 6 41 | w_interp = int(w_interp * scale) 42 | w_dense_pre_l = int(w_dense_pre_l * scale) 43 | w_l = int((w_l * scale) / 2) * 2 44 | w_dense_post_l = int(w_dense_post_l * scale) 45 | 46 | down_stack_1 = [] 47 | in_shape = self.input_channels 48 | 49 | if bn_pre: 50 | down_stack_1.append(nn.BatchNorm1d(in_shape)) 51 | 52 | for i in range(n_pre): 53 | down_stack_1.append( 54 | CGLLayer(in_shape, w_pre, cnn_kernel_size, type="cnn", dropout=dropout) 55 | ) 56 | in_shape = down_stack_1[-1].output_size 57 | 58 | for i in range(n_strided): 59 | stride = stride_amt 60 | pool = stride if (do_pool and stride > 1) else None 61 | ltype = "cnn" 62 | down_stack_1.append( 63 | CGLLayer( 64 | in_shape, 65 | w_strided, 66 | cnn_kernel_size, 67 | type=ltype, 68 | stride=stride, 69 | pool=pool, 70 | stride_pos=stride_pos, 71 | dropout=dropout, 72 | # groups=3 if ( i % 2 == 0 ) else 2 73 | ) 74 | ) 75 | self.output_stride *= stride 76 | in_shape = down_stack_1[-1].output_size 77 | ds_1_end_size = in_shape 78 | self.down_stack_1 = nn.Sequential(*down_stack_1) 79 | 80 | ds2_ltype = "cnn" 81 | down_stack_2 = [] 82 | 83 | for i in range(n_interp): 84 | stride = stride_amt if (i < n_interp - 1) else 1 85 | pool = stride if (do_pool and stride > 1) else None 86 | w = int(np.ceil(w_interp * 0.5 ** (i + 1))) 87 | # if i == n_interp-1: 88 | # w = int(w_interp * .66) 89 | # if i == n_interp - 2: 90 | # w =int(w_interp * .33) 91 | # else: 92 | # w = w_interp 93 | down_stack_2.append( 94 | CGLLayer( 95 | in_shape, 96 | w, 97 | cnn_kernel_size, 98 | type=ds2_ltype, 99 | stride=stride, 100 | pool=pool, 101 | stride_pos=stride_pos, 102 | dropout=dropout, 103 | # groups = 3 if ( i % 2 == 0 ) else 2 104 | ) 105 | ) 106 | in_shape = down_stack_2[-1].output_size 107 | 108 | self.down_stack_2 = nn.Sequential(*down_stack_2) 109 | 110 | self.merged_output_size = ds_1_end_size + sum( 111 | [l.output_size for l in down_stack_2] 112 | ) 113 | 114 | in_shape = self.merged_output_size 115 | 116 | lstm_stack = [] 117 | for i in range(n_dense_pre_l): 118 | lstm_stack.append( 119 | CGLLayer( 120 | in_shape, w_dense_pre_l, kernel_size=1, type="cnn", dropout=dropout 121 | ) 122 | ) 123 | in_shape = lstm_stack[-1].output_size 124 | 125 | for i in range(n_l): 126 | lstm_stack.append( 127 | CGLLayer( 128 | in_shape, 129 | w_l, 130 | cnn_kernel_size, # unused when 
type!-='cnn' 131 | type="lstm", 132 | dropout=dropout, 133 | ) 134 | ) 135 | in_shape = lstm_stack[-1].output_size 136 | 137 | for i in range(n_dense_post_l): 138 | lstm_stack.append( 139 | CGLLayer( 140 | in_shape, w_dense_post_l, kernel_size=1, type="cnn", dropout=dropout 141 | ) 142 | ) 143 | in_shape = lstm_stack[-1].output_size 144 | 145 | self.lstm_stack = nn.Sequential(*lstm_stack) 146 | 147 | # [batch, chan, seq] 148 | 149 | end_stacks = [] 150 | for num_output_classes in self.num_output_classes: 151 | end_stacks.append( 152 | nn.Sequential( 153 | nn.Dropout(dropout), 154 | # # This sort of Conv1D acts as a time-distributed Dense layer. 155 | nn.Linear(in_shape, num_output_classes), 156 | # nn.Conv1d( 157 | # in_shape, num_output_classes, 1 158 | # ), # time-distributed dense 159 | ) 160 | # CGLLayer( 161 | # in_shape, 162 | # num_output_classes, 163 | # kernel_size=1, 164 | # type="cnn", 165 | # dropout=dropout, 166 | # batch_norm=False 167 | # ) 168 | ) 169 | 170 | self.end_stacks = nn.ModuleList(end_stacks) 171 | 172 | def _forward(self, X, **kwargs): 173 | """(N, C_{in}, L_{in})""" 174 | Xs = [X] # [batch, chan, seq] 175 | Xs.append(self.down_stack_1(Xs[-1])) 176 | 177 | to_merge = [Xs[-1]] 178 | for module in self.down_stack_2: 179 | output = module(Xs[-1]) 180 | Xs.append(output) 181 | to_merge.append( 182 | F.interpolate( 183 | output, 184 | size=to_merge[0].shape[-1], 185 | mode="linear", 186 | align_corners=False, 187 | ) 188 | ) 189 | 190 | merged = torch.cat(to_merge, dim=1) 191 | Xs.append(merged) 192 | Xs.append(self.lstm_stack(Xs[-1])) 193 | 194 | if self.keep_intermediates: 195 | self.Xs = Xs 196 | 197 | ys = [] 198 | 199 | # (N, C_{in}, L_{in}) 200 | 201 | for end_stack in self.end_stacks: 202 | # (N, C_{in}, L_{in}) => # (N, L_{in}, C_{in},) 203 | x = Xs[-1].permute([0, 2, 1]) 204 | x = end_stack(x) 205 | x = x.permute([0, 2, 1]) 206 | ys.append(x) 207 | 208 | # ys = [end_stack(Xs[-1]) for end_stack in self.end_stacks] 209 | 210 | # No softmax because the pytorch cross_entropy loss function wants the raw outputs. 
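        # Illustrative shape walk-through (an editorial note, assuming the defaults
        # win_len=512, stride_amt=2, n_strided=3, n_interp=4): down_stack_1 maps
        # [N, C_in, 512] -> [N, w, 64] (output_stride = 2**3 = 8); down_stack_2
        # continues down to lengths 32, 16, 8, and 8, and each of its outputs is
        # F.interpolate'd back to length 64 and concatenated along dim=1 with the
        # down_stack_1 output before the lstm_stack and end_stacks run.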
211 | 212 | return ys 213 | -------------------------------------------------------------------------------- /filternet/models/filter_net_ensemble.py: -------------------------------------------------------------------------------- 1 | # Copyright (C) 2020 Pet Insight Project - All Rights Reserved 2 | 3 | import torch 4 | from torch import nn 5 | 6 | from .base_net import BaseNet 7 | 8 | 9 | class FilterNetEnsemble(BaseNet): 10 | variance_penalty = 0.0 11 | 12 | def build(self, **config): 13 | pass 14 | 15 | def set_models(self, models): 16 | self.model = nn.ModuleList([m for m in models]) 17 | 18 | def _forward(self, X, **kwargs): 19 | """(N, C_{in}, L_{in})""" 20 | outputs_list = [sub_model(X) for sub_model in self.model] 21 | outputs = [] 22 | for i in range(len(self.model[0].num_output_classes)): 23 | output_ = torch.stack([_outputs[i] for _outputs in outputs_list]) 24 | if self.variance_penalty: 25 | s = torch.std(output_, dim=0) 26 | mean = output = torch.mean(output_, dim=0) 27 | output = mean - self.variance_penalty * s 28 | else: 29 | output = torch.mean(output_, dim=0) 30 | outputs.append(output) 31 | # output, _ = torch.median(outputs, dim=0) 32 | # output = torch.log(torch.mean(torch.softmax(outputs, dim=2), dim=0)) 33 | 34 | return outputs 35 | 36 | def transform_targets(self, y, one_hot=True): 37 | return self.model[0].transform_targets(y, one_hot=one_hot) 38 | -------------------------------------------------------------------------------- /filternet/models/reference_architectures.py: -------------------------------------------------------------------------------- 1 | # Copyright (C) 2020 Pet Insight Project - All Rights Reserved 2 | 3 | """ Reference architectures """ 4 | from copy import deepcopy 5 | 6 | ref_archs = { 7 | "base_cnn": { 8 | "n_pre": 1, 9 | "n_strided": 3, 10 | "n_interp": 0, 11 | "n_dense_pre_l": 0, 12 | "n_l": 0, 13 | "n_dense_post_l": 0, 14 | }, 15 | "multi_scale_cnn": { 16 | "n_pre": 1, 17 | "n_strided": 3, 18 | "n_interp": 4, 19 | "n_dense_pre_l": 1, 20 | "n_l": 0, 21 | "n_dense_post_l": 0, 22 | }, 23 | "base_lstm": { 24 | "n_pre": 0, 25 | "lr_decay": 1.0, 26 | "n_strided": 0, 27 | "n_interp": 0, 28 | "n_dense_pre_l": 0, 29 | "n_l": 1, 30 | "n_dense_post_l": 0, 31 | }, 32 | "cnn_lstm": { 33 | "n_pre": 1, 34 | "n_strided": 3, 35 | "n_interp": 0, 36 | "n_dense_pre_l": 1, 37 | "n_l": 1, 38 | "n_dense_post_l": 0, 39 | }, 40 | "multi_scale_cnn_lstm": { 41 | "n_pre": 1, 42 | "n_strided": 3, 43 | "n_interp": 4, 44 | "n_dense_pre_l": 1, 45 | "n_l": 1, 46 | "n_dense_post_l": 0, 47 | }, 48 | } 49 | 50 | 51 | def get_ref_arch(name): 52 | return deepcopy(ref_archs[name]) 53 | -------------------------------------------------------------------------------- /filternet/mputil.py: -------------------------------------------------------------------------------- 1 | # Copyright (C) 2020 Pet Insight Project - All Rights Reserved 2 | 3 | import time 4 | 5 | 6 | class Timer(object): 7 | """ Convenience class for timing code w/ a context timer. 8 | 9 | Note that timer prints wall time, but also keeps track of 10 | cpu time internally (you can access via timer.interval_cpu 11 | 12 | Not super accurate, but super convenient! 13 | 14 | with Timer('Name of timer'): 15 | do_slow_stuff(...) 16 | do_some_other_stuff(...) 17 | 18 | # Once code reaches here, timing and name info will be printed to stdout. 
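    Typical output (illustrative; see __enter__/__exit__ below for the exact
    format strings):

        / / / / [Name of timer ...]
        \ \ \ \ 1.234 s wall (1.201 s cpu) [... Name of timer]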
19 | 20 | """ 21 | 22 | def __init__(self, name=__name__, log_output=True): 23 | self.name = name 24 | self.log_output = log_output 25 | 26 | def __enter__(self): 27 | self.start_wall = time.perf_counter() 28 | self.start_cpu = time.process_time() 29 | if self.log_output: 30 | print(f"/ / / / [{self.name} ...]") 31 | return self 32 | 33 | def __exit__(self, *args): 34 | self.end_wall = time.perf_counter() 35 | self.end_cpu = time.process_time() 36 | self.interval_wall = self.end_wall - self.start_wall 37 | self.interval_cpu = self.end_cpu - self.start_cpu 38 | if self.log_output: 39 | print( 40 | f"\\ \\ \\ \\ {self.interval_wall:.03f} s wall ({self.interval_cpu:.03f} s cpu) [... {self.name}]\n" 41 | ) 42 | -------------------------------------------------------------------------------- /filternet/training/__init__.py: -------------------------------------------------------------------------------- 1 | # Copyright (C) 2020 Pet Insight Project - All Rights Reserved 2 | 3 | -------------------------------------------------------------------------------- /filternet/training/ensemble_train.py: -------------------------------------------------------------------------------- 1 | # Copyright (C) 2020 Pet Insight Project - All Rights Reserved 2 | 3 | import os 4 | import pickle 5 | import typing as ty 6 | 7 | import numpy as np 8 | import torch 9 | import torch.nn.functional as F 10 | import torch.optim 11 | import traits.api as t 12 | from torch.utils.data import DataLoader, TensorDataset 13 | 14 | from filternet import models 15 | from filternet.training.evalmodel import EvalModel 16 | from filternet.training.train import Trainer 17 | 18 | 19 | class EnsembleTrainer(t.HasStrictTraits): 20 | def __init__(self, config={}, **kwargs): 21 | trainer_template = Trainer(**config) 22 | super().__init__(trainer_template=trainer_template, config=config, **kwargs) 23 | 24 | config: dict = t.Dict() 25 | 26 | trainer_template: Trainer = t.Instance(Trainer) 27 | trainers: ty.List[Trainer] = t.List(t.Instance(Trainer)) 28 | 29 | n_folds = t.Int(5) 30 | 31 | dl_test: DataLoader = t.DelegatesTo("trainer_template") 32 | data_spec: dict = t.DelegatesTo("trainer_template") 33 | cuda: bool = t.DelegatesTo("trainer_template") 34 | device: str = t.DelegatesTo("trainer_template") 35 | loss_func: str = t.DelegatesTo("trainer_template") 36 | batch_size: int = t.DelegatesTo("trainer_template") 37 | win_len: int = t.DelegatesTo("trainer_template") 38 | has_null_class: bool = t.DelegatesTo("trainer_template") 39 | predict_null_class: bool = t.DelegatesTo("trainer_template") 40 | name: str = t.Str() 41 | 42 | def _name_default(self): 43 | import time 44 | 45 | modelstr = "Ensemble" 46 | timestr = time.strftime("%Y%m%d-%H%M%S") 47 | return f"{modelstr}_{timestr}" 48 | 49 | X_folds = t.Tuple(transient=True) 50 | ys_folds = t.Tuple(transient=True) 51 | 52 | def _trainers_default(self): 53 | # Temp trainer for grabbing datasets, etc 54 | tt = self.trainer_template 55 | tt.init_data() 56 | 57 | # Combine official train & val sets 58 | X = torch.cat([tt.dl_train.dataset.tensors[0], tt.dl_val.dataset.tensors[0]]) 59 | ys = [ 60 | torch.cat([yt, yv]) 61 | for yt, yv in zip( 62 | tt.dl_train.dataset.tensors[1:], tt.dl_val.dataset.tensors[1:] 63 | ) 64 | ] 65 | # make folds 66 | fold_len = int(np.ceil(len(X) / self.n_folds)) 67 | self.X_folds = torch.split(X, fold_len) 68 | self.ys_folds = [torch.split(y, fold_len) for y in ys] 69 | 70 | trainers = [] 71 | for i_val_fold in range(self.n_folds): 72 | trainer = Trainer( 73 | 
validation_fold=i_val_fold, 74 | name=f"{self.name}/{i_val_fold}", 75 | **self.config, 76 | ) 77 | 78 | trainer.dl_test = tt.dl_test 79 | 80 | trainers.append(trainer) 81 | 82 | return trainers 83 | 84 | model: models.BaseNet = t.Instance(torch.nn.Module, transient=True) 85 | 86 | def _model_default(self): 87 | model = models.FilterNetEnsemble() 88 | model.set_models([trainer.model for trainer in self.trainers]) 89 | return model 90 | 91 | model_path: str = t.Str() 92 | 93 | def _model_path_default(self): 94 | return f"saved_models/{self.name}/" 95 | 96 | def init_data(self): 97 | # Initiate loading of datasets, model 98 | pass 99 | # for trainer in self.trainers: 100 | # trainer.init_data() 101 | 102 | def init_train(self): 103 | pass 104 | # for trainer in self.trainers: 105 | # trainer.init_train() 106 | 107 | def train(self, max_epochs=50): 108 | """ A pretty standard training loop, constrained to stop in `max_epochs` but may stop early if our 109 | custom stopping metric does not improve for `self.patience` epochs. Always checkpoints 110 | when a new best stopping_metric is achieved. An alternative to using 111 | ray.tune for training.""" 112 | 113 | for trainer in self.trainers: 114 | # Add data to trainer 115 | 116 | X_train = torch.cat( 117 | [ 118 | arr 119 | for i, arr in enumerate(self.X_folds) 120 | if i != trainer.validation_fold 121 | ] 122 | ) 123 | ys_train = [ 124 | torch.cat( 125 | [arr for i, arr in enumerate(y) if i != trainer.validation_fold] 126 | ) 127 | for y in self.ys_folds 128 | ] 129 | 130 | X_val = torch.cat( 131 | [ 132 | arr 133 | for i, arr in enumerate(self.X_folds) 134 | if i == trainer.validation_fold 135 | ] 136 | ) 137 | ys_val = [ 138 | torch.cat( 139 | [arr for i, arr in enumerate(y) if i == trainer.validation_fold] 140 | ) 141 | for y in self.ys_folds 142 | ] 143 | 144 | trainer.dl_train = DataLoader( 145 | TensorDataset(torch.Tensor(X_train), *ys_train), 146 | batch_size=trainer.batch_size, 147 | shuffle=True, 148 | ) 149 | trainer.data_spec = self.trainer_template.data_spec 150 | trainer.epoch_iters = self.trainer_template.epoch_iters 151 | trainer.dl_val = DataLoader( 152 | TensorDataset(torch.Tensor(X_val), *ys_val), 153 | batch_size=trainer.batch_size, 154 | shuffle=False, 155 | ) 156 | 157 | # Now clear local vars to save ranm 158 | X_train = ys_train = X_val = ys_val = None 159 | 160 | trainer.init_data() 161 | trainer.init_train() 162 | trainer.train(max_epochs=max_epochs) 163 | 164 | # Clear trainer train and val datasets to save ram 165 | trainer.dl_train = t.Undefined 166 | trainer.dl_val = t.Undefined 167 | 168 | print(f"RESTORING TO best model") 169 | trainer._restore() 170 | trainer._save() 171 | 172 | trainer.print_train_summary() 173 | 174 | em = EvalModel(trainer=trainer) 175 | 176 | em.run_test_set() 177 | em.calc_metrics() 178 | em.calc_ward_metrics() 179 | print(em.classification_report_df.to_string(float_format="%.3f")) 180 | em._save() 181 | 182 | def print_train_summary(self): 183 | for trainer in self.trainers: 184 | trainer.print_train_summary() 185 | 186 | def _save(self, checkpoint_dir=None, save_model=True, save_trainer=True): 187 | """ Saves/checkpoints model state and training state to disk. 
""" 188 | if checkpoint_dir is None: 189 | checkpoint_dir = self.model_path 190 | else: 191 | self.model_path = checkpoint_dir 192 | 193 | os.makedirs(checkpoint_dir, exist_ok=True) 194 | 195 | # save model params 196 | model_path = os.path.join(checkpoint_dir, "model.pth") 197 | trainer_path = os.path.join(checkpoint_dir, "trainer.pth") 198 | 199 | if save_model: 200 | torch.save(self.model.state_dict(), model_path) 201 | if save_trainer: 202 | with open(trainer_path, "wb") as f: 203 | pickle.dump(self, f) 204 | 205 | return checkpoint_dir 206 | 207 | def _restore(self, checkpoint_dir=None): 208 | """ Restores model state and training state from disk. """ 209 | 210 | if checkpoint_dir is None: 211 | checkpoint_dir = self.model_path 212 | 213 | model_path = os.path.join(checkpoint_dir, "model.pth") 214 | trainer_path = os.path.join(checkpoint_dir, "trainer.pth") 215 | 216 | # Reconstitute old trainer and copy state to this trainer. 217 | with open(trainer_path, "rb") as f: 218 | other_trainer = pickle.load(f) 219 | 220 | self.__setstate__(other_trainer.__getstate__()) 221 | 222 | # Load sub-models 223 | for trainer in self.trainers: 224 | trainer._restore() 225 | 226 | # Load model (after loading state in case we need to re-initialize model from config) 227 | self.model.load_state_dict(torch.load(model_path, map_location=self.device)) 228 | # self.model = self.model._model_default() 229 | -------------------------------------------------------------------------------- /filternet/training/evalmodel.py: -------------------------------------------------------------------------------- 1 | # Copyright (C) 2020 Pet Insight Project - All Rights Reserved 2 | 3 | import os 4 | import pickle 5 | import typing as ty 6 | from builtins import AssertionError 7 | 8 | import numpy as np 9 | import pandas as pd 10 | import sklearn.metrics 11 | import torch 12 | import torch.nn.functional as F 13 | import torch.optim 14 | import traits.api as t 15 | from scipy.special import softmax 16 | from torch.utils.data import DataLoader 17 | 18 | from filternet import models as mo 19 | from filternet.mputil import Timer 20 | from filternet.training.train import Trainer 21 | 22 | 23 | class ClassWardMetrics(t.HasStrictTraits): 24 | segment_twoset_results: dict = t.Dict() 25 | event_detailed_scores: dict = t.Dict() 26 | event_standard_scores: dict = t.Dict() 27 | 28 | 29 | class WardMetrics(t.HasStrictTraits): 30 | class_ward_metrics: ty.List[ClassWardMetrics] = t.List(ClassWardMetrics, []) 31 | overall_ward_metrics: ClassWardMetrics = t.Instance(ClassWardMetrics) 32 | df_event_scores: pd.DataFrame = t.Instance(pd.DataFrame()) 33 | df_event_detailed_scores: pd.DataFrame = t.Instance(pd.DataFrame()) 34 | df_segment_2set_results: pd.DataFrame = t.Instance(pd.DataFrame()) 35 | 36 | 37 | class EvalModel(t.HasStrictTraits): 38 | trainer: Trainer = t.Any() 39 | model: mo.BaseNet = t.DelegatesTo("trainer") 40 | dl_test: DataLoader = t.DelegatesTo("trainer") 41 | data_spec: dict = t.DelegatesTo("trainer") 42 | cuda: bool = t.DelegatesTo("trainer") 43 | device: str = t.DelegatesTo("trainer") 44 | loss_func: str = t.DelegatesTo("trainer") 45 | model_path: str = t.DelegatesTo("trainer") 46 | has_null_class: bool = t.DelegatesTo("trainer") 47 | predict_null_class: bool = t.DelegatesTo("trainer") 48 | 49 | # 'prediction' mode employs overlap and reconstructs signal 50 | # as a contiguous timeseries w/ optional windowing. 51 | # It aims for best accuracy/f1 by using overlap, and will 52 | # typically outperform 'training' mode. 
53 | # 'training' mode does not average repeated point and does 54 | # not window; it should product acc/loss/f1 similar to 55 | # training mode. 56 | run_mode: str = t.Enum(["prediction", "training"]) 57 | window: str = t.Enum(["hanning", "boxcar"]) 58 | eval_batch_size: int = t.Int(100) 59 | 60 | target_names: ty.List[str] = t.ListStr() 61 | 62 | def _target_names_default(self): 63 | target_names = self.data_spec["output_spec"][0]["classes"] 64 | 65 | if self.has_null_class: 66 | assert target_names[0] in ("", "Null") 67 | 68 | if not self.predict_null_class: 69 | target_names = target_names[1:] 70 | 71 | return target_names 72 | 73 | def _run_model_on_batch(self, data, targets): 74 | targets = torch.stack(targets) 75 | 76 | if self.cuda: 77 | data, targets = data.cuda(), targets.cuda() 78 | 79 | output = self.model(data) 80 | 81 | _targets = self.model.transform_targets(targets, one_hot=False) 82 | if self.loss_func == "cross_entropy": 83 | _losses = [F.cross_entropy(o, t) for o, t in zip(output, _targets)] 84 | loss = sum(_losses) 85 | elif self.loss_func == "binary_cross_entropy": 86 | _targets_onehot = self.model.transform_targets(targets, one_hot=True) 87 | _losses = [ 88 | F.binary_cross_entropy_with_logits(o, t) 89 | for o, t in zip(output, _targets_onehot) 90 | ] 91 | loss = sum(_losses) 92 | else: 93 | raise NotImplementedError(self.loss) 94 | 95 | # Assume only 1 output: 96 | 97 | return loss, output[0], _targets[0], _losses[0] 98 | 99 | def run_test_set(self, dl=None): 100 | """ Runs `self.model` on `self.dl_test` (or a provided dl) and stores results for subsequent evaluation. """ 101 | if dl is None: 102 | dl = self.dl_test 103 | 104 | if self.cuda: 105 | self.model.cuda() 106 | self.model.eval() 107 | if self.eval_batch_size: 108 | dl = DataLoader(dl.dataset, batch_size=self.eval_batch_size, shuffle=False) 109 | # 110 | # # Xc, yc = data.get_x_y_contig('test') 111 | X, *ys = dl.dataset.tensors 112 | # X: [N, input_chans, win_len] 113 | step = int(X.shape[2] / 2) 114 | assert torch.equal(X[0, :, step], X[1, :, 0]) 115 | 116 | losses = [] 117 | outputsraw = [] 118 | outputs = [] 119 | targets = [] 120 | 121 | with Timer("run", log_output=False) as tr: 122 | with Timer("infer", log_output=False) as ti: 123 | for batch_idx, (data, *target) in enumerate(dl): 124 | ( 125 | batch_loss, 126 | batch_output, 127 | batch_targets, 128 | train_losses, 129 | ) = self._run_model_on_batch(data, target) 130 | 131 | losses.append(batch_loss.detach().cpu().item()) 132 | outputsraw.append(batch_output.detach().cpu().data.numpy()) 133 | outputs.append( 134 | torch.argmax(batch_output, 1, False).detach().cpu().data.numpy() 135 | ) 136 | targets.append(batch_targets.detach().cpu().data.numpy()) 137 | self.infer_time_s_cpu = ti.interval_cpu 138 | self.infer_time_s_wall = ti.interval_wall 139 | 140 | self.loss = np.mean(losses) 141 | targets = np.concatenate(targets, axis=0) # [N, out_win_len] 142 | outputsraw = np.concatenate( 143 | outputsraw, axis=0 144 | ) # [N, n_out_classes, out_win_len] 145 | outputs = np.concatenate(outputs, axis=0) # [N, n_out_classes, out_win_len] 146 | 147 | # win_len = toutputsraw[0].shape[-1] 148 | if ( 149 | self.model.output_type == "many_to_one_takelast" 150 | or self.run_mode == "training" 151 | ): 152 | self.targets = np.concatenate(targets, axis=-1) # [N,] 153 | self.outputsraw = np.concatenate( 154 | outputsraw, axis=-1 155 | ) # [n_out_classes, N,] 156 | self.outputs = np.concatenate(outputs, axis=-1) # [N,] 157 | 158 | elif self.run_mode == "prediction": 159 | 
n_segments, n_classes, out_win_len = outputsraw.shape 160 | 161 | output_step = int(out_win_len / 2) 162 | 163 | if self.window == "hanning": 164 | EPS = 0.001 # prevents divide-by-zero 165 | arr_window = (1 - EPS) * np.hanning(out_win_len) + EPS 166 | elif self.window == "boxcar": 167 | arr_window = np.ones((out_win_len,)) 168 | else: 169 | raise ValueError() 170 | 171 | # Allocate space for merged predictions 172 | if self.has_null_class and not self.predict_null_class: 173 | outputsraw2 = np.zeros( 174 | (n_segments + 1, n_classes - 1, output_step, 2) 175 | ) 176 | window2 = np.zeros( 177 | (n_segments + 1, n_classes - 1, output_step, 2) 178 | ) # [N+1, out_win_len/2, 2] 179 | # Drop in outputs/window vals in the two layers 180 | outputsraw = outputsraw[:, 1:, :] 181 | else: 182 | outputsraw2 = np.zeros((n_segments + 1, n_classes, output_step, 2)) 183 | window2 = np.zeros( 184 | (n_segments + 1, n_classes, output_step, 2) 185 | ) # [N+1, out_win_len/2, 2] 186 | 187 | # Drop in outputs/window vals in the two layers 188 | outputsraw2[:-1, :, :, 0] = outputsraw[:, :, :output_step] 189 | outputsraw2[1:, :, :, 1] = outputsraw[ 190 | :, :, output_step : output_step * 2 191 | ] 192 | window2[:-1, :, :, 0] = arr_window[:output_step] 193 | window2[1:, :, :, 1] = arr_window[output_step : output_step * 2] 194 | 195 | merged_outputsraw = (outputsraw2 * window2).sum(axis=3) / (window2).sum( 196 | axis=3 197 | ) 198 | softmaxed_merged_outputsraw = softmax(merged_outputsraw, axis=1) 199 | merged_outputs = np.argmax(softmaxed_merged_outputsraw, 1) 200 | 201 | self.outputsraw = np.concatenate(merged_outputsraw, axis=-1) 202 | self.outputs = np.concatenate(merged_outputs, axis=-1) 203 | self.targets = np.concatenate( 204 | np.concatenate( 205 | [ 206 | targets[:, :output_step], 207 | targets[[-1], output_step : output_step * 2], 208 | ], 209 | axis=0, 210 | ), 211 | axis=-1, 212 | ) 213 | else: 214 | raise ValueError() 215 | 216 | if self.has_null_class and not self.predict_null_class: 217 | not_null_mask = self.targets > 0 218 | self.outputsraw = self.outputsraw[..., not_null_mask] 219 | self.outputs = self.outputs[not_null_mask] 220 | self.targets = self.targets[not_null_mask] 221 | self.targets -= 1 222 | 223 | self.n_samples_in = np.prod(dl.dataset.tensors[1].shape) 224 | self.n_samples_out = len(self.outputs) 225 | self.infer_samples_per_s = self.n_samples_in / self.infer_time_s_wall 226 | self.run_time_s_cpu = tr.interval_cpu 227 | self.run_time_s_wall = tr.interval_wall 228 | 229 | loss: float = t.Float() 230 | targets: np.ndarray = t.Array() 231 | outputsraw: np.ndarray = t.Array() 232 | outputs: np.ndarray = t.Array() 233 | n_samples_in: int = t.Int() 234 | n_samples_out: int = t.Int() 235 | infer_samples_per_s: float = t.Float() 236 | 237 | infer_time_s_cpu: float = t.Float() 238 | infer_time_s_wall: float = t.Float() 239 | run_time_s_cpu: float = t.Float() 240 | run_time_s_wall: float = t.Float() 241 | 242 | extra: dict = t.Dict({}) 243 | 244 | acc: float = t.Float() 245 | f1: float = t.Float() 246 | f1_mean: float = t.Float() 247 | event_f1: float = t.Float() 248 | classification_report_txt: str = t.Str() 249 | classification_report_dict: dict = t.Dict() 250 | classification_report_df: pd.DataFrame = t.Property(t.Instance(pd.DataFrame)) 251 | confusion_matrix: np.ndarray = t.Array() 252 | 253 | nonull_acc: float = t.Float() 254 | nonull_f1: float = t.Float() 255 | nonull_f1_mean: float = t.Float() 256 | nonull_classification_report_txt: str = t.Str() 257 | nonull_classification_report_dict: 
dict = t.Dict()
258 |     nonull_classification_report_df: pd.DataFrame = t.Property(t.Instance(pd.DataFrame))
259 |     nonull_confusion_matrix: np.ndarray = t.Array()
260 | 
261 |     def calc_metrics(self):
262 | 
263 |         self.acc = sklearn.metrics.accuracy_score(self.targets, self.outputs)
264 |         self.f1 = sklearn.metrics.f1_score(
265 |             self.targets, self.outputs, average="weighted"
266 |         )
267 |         self.f1_mean = sklearn.metrics.f1_score(
268 |             self.targets, self.outputs, average="macro"
269 |         )
270 | 
271 |         self.classification_report_txt = sklearn.metrics.classification_report(
272 |             self.targets,
273 |             self.outputs,
274 |             digits=3,
275 |             labels=np.arange(len(self.target_names)),
276 |             target_names=self.target_names,
277 |         )
278 |         self.classification_report_dict = sklearn.metrics.classification_report(
279 |             self.targets,
280 |             self.outputs,
281 |             digits=3,
282 |             output_dict=True,
283 |             labels=np.arange(len(self.target_names)),
284 |             target_names=self.target_names,
285 |         )
286 |         self.confusion_matrix = sklearn.metrics.confusion_matrix(
287 |             self.targets, self.outputs
288 |         )
289 | 
290 |         # Now, ignoring the null/none class:
291 |         if self.has_null_class and self.predict_null_class:
292 |             # assume the null class comes first
293 |             nonull_mask = self.targets > 0
294 |             nonull_targets = self.targets[nonull_mask]
295 |             # nonull_outputs = self.outputs[nonull_mask]
296 |             nonull_outputs = self.outputsraw[1:, :].argmax(axis=0)[nonull_mask] + 1
297 | 
298 |             self.nonull_acc = sklearn.metrics.accuracy_score(
299 |                 nonull_targets, nonull_outputs
300 |             )
301 |             self.nonull_f1 = sklearn.metrics.f1_score(
302 |                 nonull_targets, nonull_outputs, average="weighted"
303 |             )
304 |             self.nonull_f1_mean = sklearn.metrics.f1_score(
305 |                 nonull_targets, nonull_outputs, average="macro"
306 |             )
307 |             self.nonull_classification_report_txt = sklearn.metrics.classification_report(
308 |                 nonull_targets,
309 |                 nonull_outputs,
310 |                 digits=3,
311 |                 labels=np.arange(len(self.target_names)),
312 |                 target_names=self.target_names,
313 |             )
314 |             self.nonull_classification_report_dict = sklearn.metrics.classification_report(
315 |                 nonull_targets,
316 |                 nonull_outputs,
317 |                 digits=3,
318 |                 output_dict=True,
319 |                 labels=np.arange(len(self.target_names)),
320 |                 target_names=self.target_names,
321 |             )
322 |             self.nonull_confusion_matrix = sklearn.metrics.confusion_matrix(
323 |                 nonull_targets, nonull_outputs
324 |             )
325 |         else:
326 |             self.nonull_acc = self.acc
327 |             self.nonull_f1 = self.f1
328 |             self.nonull_f1_mean = self.f1_mean
329 |             self.nonull_classification_report_txt = self.classification_report_txt
330 |             self.nonull_classification_report_dict = self.classification_report_dict
331 |             self.nonull_confusion_matrix = self.confusion_matrix
332 | 
333 |     ward_metrics: WardMetrics = t.Instance(WardMetrics)
334 | 
335 |     def calc_ward_metrics(self):
336 |         """ Do event-wise metrics, using the `wardmetrics` package which implements metrics from:
337 | 
338 |         [1] J. A. Ward, P. Lukowicz, and H. W. Gellersen, “Performance metrics for activity recognition,”
339 |         ACM Trans. Intell. Syst. Technol., vol. 2, no. 1, pp. 1–23, Jan. 2011.
340 | """ 341 | 342 | import wardmetrics 343 | 344 | # Must be in prediction mode -- otherwise, data is not contiguous, ward metrics will be bogus 345 | assert self.run_mode == "prediction" 346 | 347 | targets = self.targets 348 | predictions = self.outputs 349 | 350 | wmetrics = WardMetrics() 351 | 352 | targets_events = wardmetrics.frame_results_to_events(targets) 353 | preds_events = wardmetrics.frame_results_to_events(predictions) 354 | 355 | for i, class_name in enumerate(self.target_names): 356 | class_wmetrics = ClassWardMetrics() 357 | 358 | t = targets_events.get(str(i), []) 359 | p = preds_events.get(str(i), []) 360 | # class_wmetrics['t'] = t 361 | # class_wmetrics['p'] = p 362 | 363 | try: 364 | assert len(t) and len(p) 365 | ( 366 | twoset_results, 367 | segments_with_scores, 368 | segment_counts, 369 | normed_segment_counts, 370 | ) = wardmetrics.eval_segments(t, p) 371 | class_wmetrics.segment_twoset_results = twoset_results 372 | 373 | ( 374 | gt_event_scores, 375 | det_event_scores, 376 | detailed_scores, 377 | standard_scores, 378 | ) = wardmetrics.eval_events(t, p) 379 | class_wmetrics.event_detailed_scores = detailed_scores 380 | class_wmetrics.event_standard_scores = standard_scores 381 | except (AssertionError, ZeroDivisionError) as e: 382 | class_wmetrics.segment_twoset_results = {} 383 | class_wmetrics.event_detailed_scores = {} 384 | class_wmetrics.event_standard_scores = {} 385 | # print("Empty Results or targets for a class.") 386 | # raise ValueError("Empty Results or targets for a class.") 387 | 388 | wmetrics.class_ward_metrics.append(class_wmetrics) 389 | 390 | tt = [] 391 | pp = [] 392 | for i, class_name in enumerate(self.target_names): 393 | # skip null class for combined eventing: 394 | if class_name in ("", "Null"): 395 | continue 396 | 397 | if len(tt) or len(pp): 398 | offset = np.max(tt + pp) + 2 399 | else: 400 | offset = 0 401 | [(a + offset, b + offset) for (a, b) in t] 402 | 403 | t = targets_events.get(str(i), []) 404 | p = preds_events.get(str(i), []) 405 | 406 | tt += [(a + offset, b + offset) for (a, b) in t] 407 | pp += [(a + offset, b + offset) for (a, b) in p] 408 | 409 | t = tt 410 | p = pp 411 | 412 | class_wmetrics = ClassWardMetrics() 413 | assert len(t) and len(p) 414 | ( 415 | twoset_results, 416 | segments_with_scores, 417 | segment_counts, 418 | normed_segment_counts, 419 | ) = wardmetrics.eval_segments(t, p) 420 | class_wmetrics.segment_twoset_results = twoset_results 421 | 422 | ( 423 | gt_event_scores, 424 | det_event_scores, 425 | detailed_scores, 426 | standard_scores, 427 | ) = wardmetrics.eval_events(t, p) 428 | class_wmetrics.event_detailed_scores = detailed_scores 429 | class_wmetrics.event_standard_scores = standard_scores 430 | 431 | # Reformat as dataframe for easier calculations 432 | df = pd.DataFrame( 433 | [cm.event_standard_scores for cm in wmetrics.class_ward_metrics], 434 | index=self.target_names, 435 | ) 436 | df.loc["all_nonull"] = class_wmetrics.event_standard_scores 437 | 438 | # Calculate F1's to summarize recall/precision for each class 439 | df["f1"] = ( 440 | 2 * (df["precision"] * df["recall"]) / (df["precision"] + df["recall"]) 441 | ) 442 | df["f1 (weighted)"] = ( 443 | 2 444 | * (df["precision (weighted)"] * df["recall (weighted)"]) 445 | / (df["precision (weighted)"] + df["recall (weighted)"]) 446 | ) 447 | 448 | # Load dataframes into dictionary output 449 | wmetrics.df_event_scores = df 450 | wmetrics.df_event_detailed_scores = pd.DataFrame( 451 | [cm.event_detailed_scores for cm in 
wmetrics.class_ward_metrics], 452 | index=self.target_names, 453 | ) 454 | wmetrics.df_segment_2set_results = pd.DataFrame( 455 | [cm.segment_twoset_results for cm in wmetrics.class_ward_metrics], 456 | index=self.target_names, 457 | ) 458 | wmetrics.overall_ward_metrics = class_wmetrics 459 | 460 | self.ward_metrics = wmetrics 461 | self.event_f1 = self.ward_metrics.df_event_scores.loc["all_nonull", "f1"] 462 | 463 | def _get_classification_report_df(self): 464 | df = pd.DataFrame(self.classification_report_dict).T 465 | 466 | # Include Ward-metrics-derived "Event F1 (unweighted by length)" 467 | if self.ward_metrics: 468 | df["event_f1"] = self.ward_metrics.df_event_scores["f1"] 469 | else: 470 | df["event_f1"] = np.nan 471 | 472 | # Calculate various summary averages 473 | df.loc["macro avg", "event_f1"] = df["event_f1"].iloc[:-3].mean() 474 | df.loc["weighted avg", "event_f1"] = ( 475 | df["event_f1"].iloc[:-3] * df["support"].iloc[:-3] 476 | ).sum() / df["support"].iloc[:-3].sum() 477 | 478 | df["support"] = df["support"].astype(int) 479 | 480 | return df 481 | 482 | def _get_nonull_classification_report_df(self): 483 | target_names = self.target_names 484 | if not (target_names[0] in ("", "Null")): 485 | return None 486 | 487 | df = pd.DataFrame(self.nonull_classification_report_dict).T 488 | 489 | df["support"] = df["support"].astype(int) 490 | 491 | return df 492 | 493 | def _save(self, checkpoint_dir=None): 494 | """ Saves/checkpoints model state and training state to disk. """ 495 | if checkpoint_dir is None: 496 | checkpoint_dir = self.model_path 497 | 498 | os.makedirs(checkpoint_dir, exist_ok=True) 499 | 500 | # save model params 501 | evalmodel_path = os.path.join(checkpoint_dir, "evalmodel.pth") 502 | 503 | with open(evalmodel_path, "wb") as f: 504 | pickle.dump(self, f) 505 | 506 | return checkpoint_dir 507 | 508 | def _restore(self, checkpoint_dir=None): 509 | """ Restores model state and training state from disk. """ 510 | 511 | if checkpoint_dir is None: 512 | checkpoint_dir = self.model_path 513 | 514 | evalmodel_path = os.path.join(checkpoint_dir, "evalmodel.pth") 515 | 516 | # Reconstitute old trainer and copy state to this trainer. 
517 | with open(evalmodel_path, "rb") as f: 518 | other_evalmodel = pickle.load(f) 519 | 520 | self.__setstate__(other_evalmodel.__getstate__()) 521 | 522 | self.trainer._restore(checkpoint_dir) 523 | 524 | 525 | def load_eval_model_from_dir(checkpoint_dir: str): 526 | em = EvalModel() 527 | em._restore(checkpoint_dir) 528 | return em 529 | -------------------------------------------------------------------------------- /filternet/training/train.py: -------------------------------------------------------------------------------- 1 | # Copyright (C) 2020 Pet Insight Project - All Rights Reserved 2 | 3 | import os 4 | import pickle 5 | import typing as ty 6 | 7 | import numpy as np 8 | import pandas as pd 9 | import sklearn.metrics 10 | import torch 11 | import torch.nn.functional as F 12 | import torch.optim 13 | import traits.api as t 14 | from torch.utils.data import DataLoader, TensorDataset 15 | 16 | from filternet import models 17 | from filternet.datasets import sliding_window_x_y 18 | from filternet.models.reference_architectures import get_ref_arch 19 | from filternet.mputil import Timer 20 | 21 | 22 | class EpochMetrics(t.HasStrictTraits): 23 | f1: float = t.Float() 24 | loss: float = t.Float() 25 | acc: float = t.Float() 26 | 27 | 28 | class EpochRecord(t.HasStrictTraits): 29 | epoch: int = t.Int() 30 | train: EpochMetrics = t.Instance(EpochMetrics) 31 | val: EpochMetrics = t.Instance(EpochMetrics) 32 | 33 | lr: float = t.Float() 34 | iter_s_cpu: float = t.Float() 35 | iter_s_wall: float = t.Float() 36 | should_checkpoint: bool = t.Bool(False) 37 | done: bool = t.Bool(False) 38 | stopping_metric: float = t.Float() 39 | 40 | def to_dict(self): 41 | d = { 42 | k: v 43 | for k, v in self.__dict__.items() 44 | if v is not None and not type(v) == EpochMetrics 45 | } 46 | for f in ["train", "val"]: 47 | em = getattr(self, f) 48 | if em: 49 | for k, v in em.__dict__.items(): 50 | if v is not None: 51 | d[f"{f}_{k}"] = v 52 | return d 53 | 54 | 55 | class TrainState(t.HasStrictTraits): 56 | epoch_records: ty.List[EpochRecord] = t.List(EpochRecord, []) 57 | best_sm: float = t.Float() 58 | best_loss: float = t.Float() 59 | best_f1: float = t.Float() 60 | extra: dict = t.Dict() 61 | 62 | def to_df(self): 63 | return ( 64 | pd.DataFrame.from_records([er.to_dict() for er in self.epoch_records]) 65 | .set_index("epoch") 66 | .sort_index(axis=1) 67 | ) 68 | 69 | 70 | class Trainer(t.HasStrictTraits): 71 | model: models.BaseNet = t.Instance(torch.nn.Module, transient=True) 72 | 73 | def _model_default(self): 74 | 75 | # Merge 'base config' (if requested) and any overrides in 'model_config' 76 | if self.base_config: 77 | model_config = get_ref_arch(self.base_config) 78 | else: 79 | model_config = {} 80 | if self.model_config: 81 | model_config.update(self.model_config) 82 | if self.data_spec: 83 | model_config.update( 84 | { 85 | "input_channels": self.data_spec["input_channels"], 86 | "num_output_classes": [ 87 | s["num_classes"] for s in self.data_spec["output_spec"] 88 | ], 89 | } 90 | ) 91 | # create model accordingly 92 | model_class = getattr(models, self.model_class) 93 | return model_class(**model_config) 94 | 95 | base_config: str = t.Str() 96 | model_config: dict = t.Dict() 97 | model_class: str = t.Enum("FilterNet", "DeepConvLSTM") 98 | 99 | lr_exp: float = t.Float(-3.0) 100 | batch_size: int = t.Int() 101 | win_len: int = t.Int(512) 102 | n_samples_per_batch: int = t.Int(5000) 103 | train_step: int = t.Int(16) 104 | seed: int = t.Int() 105 | decimation: int = t.Int(1) 106 | optim_type: 
str = t.Enum(["Adam", "SGD", "RMSprop"])
107 |     loss_func: str = t.Enum(["cross_entropy", "binary_cross_entropy"])
108 |     patience: int = t.Int(10)
109 |     lr_decay: float = t.Float(0.95)
110 |     weight_decay: float = t.Float(1e-4)
111 |     alpha: float = t.Float(0.99)
112 |     momentum: float = t.Float(0.25)
113 |     validation_fold: int = t.Int()
114 |     epoch_size: float = t.Float(2.0)
115 |     y_cols: str = t.Str()
116 |     sensor_subset: str = t.Str()
117 | 
118 |     has_null_class: bool = t.Bool()
119 | 
120 |     def _has_null_class_default(self):
121 |         return self.data_spec["output_spec"][0]["classes"][0] in ("", "Null")
122 | 
123 |     predict_null_class: bool = t.Bool(True)
124 | 
125 |     _class_weights: torch.Tensor = t.Instance(torch.Tensor)
126 | 
127 |     def __class_weights_default(self):
128 |         # No class weights for now: they didn't seem to improve results significantly and
129 |         # added yet another hyper-parameter. Using zero for the null class didn't seem to work well.
130 |         if False and self.has_null_class and not self.predict_null_class:
131 |             cw = torch.ones(self.model.num_output_classes, device=self.device)
132 |             cw[0] = 0.01
133 |             cw /= cw.sum()
134 |             return cw
135 |         return None
136 | 
137 |     dataset: str = t.Enum(
138 |         ["opportunity", "smartphone_hapt", "har", "intention_recognition"]
139 |     )
140 |     name: str = t.Str()
141 | 
142 |     def _name_default(self):
143 |         import time
144 | 
145 |         modelstr = self.model.__class__.__name__
146 |         timestr = time.strftime("%Y%m%d-%H%M%S")
147 |         return f"{modelstr}_{timestr}"
148 | 
149 |     model_path: str = t.Str()
150 | 
151 |     def _model_path_default(self):
152 |         return f"saved_models/{self.name}/"
153 | 
154 |     data_spec: dict = t.Any()
155 |     epoch_iters: int = t.Int(0)
156 |     train_state: TrainState = t.Instance(TrainState, ())
157 |     cp_iter: int = t.Int()
158 | 
159 |     cuda: bool = t.Bool(transient=True)
160 | 
161 |     def _cuda_default(self):
162 |         return torch.cuda.is_available()
163 | 
164 |     device: str = t.Str(transient=True)
165 | 
166 |     def _device_default(self):
167 |         return "cuda" if self.cuda else "cpu"
168 | 
169 |     dl_train: DataLoader = t.Instance(DataLoader, transient=True)
170 | 
171 |     def _dl_train_default(self):
172 |         return self._get_dl("train")
173 | 
174 |     dl_val: DataLoader = t.Instance(DataLoader, transient=True)
175 | 
176 |     def _dl_val_default(self):
177 |         return self._get_dl("val")
178 | 
179 |     dl_test: DataLoader = t.Instance(DataLoader, transient=True)
180 | 
181 |     def _dl_test_default(self):
182 |         return self._get_dl("test")
183 | 
184 |     def _get_dl(self, s):
185 | 
186 |         if self.dataset == "opportunity":
187 |             from filternet.datasets.opportunity import get_x_y_contig
188 |         elif self.dataset == "smartphone_hapt":
189 |             from filternet.datasets.smartphone_hapt import get_x_y_contig
190 |         elif self.dataset == "har":
191 |             from filternet.datasets.har import get_x_y_contig
192 |         elif self.dataset == "intention_recognition":
193 |             from filternet.datasets.intention_recognition import get_x_y_contig
194 |         else:
195 |             raise ValueError(f"Unknown dataset {self.dataset}")
196 | 
197 |         kwargs = {}
198 |         if self.y_cols:
199 |             kwargs["y_cols"] = self.y_cols
200 |         if self.sensor_subset:
201 |             kwargs["sensor_subset"] = self.sensor_subset
202 | 
203 |         Xc, ycs, data_spec = get_x_y_contig(s, **kwargs)
204 | 
205 |         if s == "train":
206 |             # Training shuffles, and we set epoch size to length of the dataset. We can set train_step as
207 |             # small as we want to get more windows; we'll only run len(Xc)/win_len of them in each training
208 |             # epoch.
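            # Illustrative arithmetic (an editorial note): with win_len=512 and
            # train_step=16, a 100,000-sample recording yields
            # ~(100000 - 512) / 16 ≈ 6218 windows, but _train_epoch stops after
            # roughly epoch_iters * epoch_size samples' worth of them.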
209 | self.epoch_iters = int(len(Xc) / self.decimation) 210 | X, ys = sliding_window_x_y( 211 | Xc, ycs, win_len=self.win_len, step=self.train_step, shuffle=False 212 | ) 213 | # Set the overall data spec using the training set, 214 | # and modify later if more info is needed. 215 | self.data_spec = data_spec 216 | else: 217 | # Val and test data are not shuffled. 218 | # Each point is inferred ~twice b/c step = win_len/2 219 | X, ys = sliding_window_x_y( 220 | Xc, 221 | ycs, 222 | win_len=self.win_len, 223 | step=int(self.win_len / 2), 224 | shuffle=False, # Cannot be true with windows 225 | ) 226 | 227 | dl = DataLoader( 228 | TensorDataset(torch.Tensor(X), *[torch.Tensor(y).long() for y in ys]), 229 | batch_size=self.batch_size, 230 | shuffle=True if s == "train" else False, 231 | ) 232 | return dl 233 | 234 | def _batch_size_default(self): 235 | batch_size = int(self.n_samples_per_batch / self.win_len) 236 | print(f"Batch size: {batch_size}") 237 | return batch_size 238 | 239 | optimizer = t.Any(transient=True) 240 | 241 | def _optimizer_default(self): 242 | if self.optim_type == "SGD": 243 | optimizer = torch.optim.SGD( 244 | self.model.parameters(), 245 | lr=10 ** (self.lr_exp), 246 | momentum=self.momentum, 247 | weight_decay=self.weight_decay, 248 | ) 249 | elif self.optim_type == "Adam": 250 | optimizer = torch.optim.Adam( 251 | self.model.parameters(), 252 | lr=10 ** (self.lr_exp), 253 | weight_decay=self.weight_decay, 254 | amsgrad=True, 255 | ) 256 | elif self.optim_type == "RMSprop": 257 | optimizer = torch.optim.RMSprop( 258 | self.model.parameters(), 259 | lr=10 ** (self.lr_exp), 260 | alpha=self.alpha, 261 | weight_decay=self.weight_decay, 262 | momentum=self.momentum, 263 | ) 264 | else: 265 | raise NotImplementedError(self.optim_type) 266 | return optimizer 267 | 268 | iteration: int = t.Property(t.Int) 269 | 270 | def _get_iteration(self): 271 | return len(self.train_state.epoch_records) + 1 272 | 273 | lr_scheduler = t.Any(transient=True) 274 | 275 | def _lr_scheduler_default(self): 276 | lr_scheduler = torch.optim.lr_scheduler.ExponentialLR( 277 | self.optimizer, self.lr_decay # , last_epoch=self._iteration 278 | ) 279 | 280 | # If this is being re-instantiated in mid-training, then we must 281 | # iterate scheduler forward to match the training step. 
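        # Illustrative schedule (an editorial note): with lr_exp=-3.0 and
        # lr_decay=0.95, the learning rate after epoch k is 1e-3 * 0.95**k
        # (e.g., ~6.0e-4 after 10 epochs).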
282 | for i in range(self.iteration): 283 | if self.lr_decay != 1: 284 | lr_scheduler.step() 285 | 286 | return lr_scheduler 287 | 288 | ##### 289 | # Training Methods 290 | ## 291 | def _train_batch(self, data, targets): 292 | self.optimizer.zero_grad() 293 | loss, output, _targets, _ = self._run_model_on_batch(data, targets) 294 | loss.backward() 295 | self.optimizer.step() 296 | # if self.max_lr: 297 | # self.lr_scheduler.step() 298 | 299 | return loss, output, _targets 300 | 301 | def _run_model_on_batch(self, data, targets): 302 | targets = torch.stack(targets) 303 | 304 | if self.cuda: 305 | data, targets = data.cuda(), targets.cuda() 306 | 307 | output = self.model(data) 308 | 309 | _targets = self.model.transform_targets(targets, one_hot=False) 310 | if self.loss_func == "cross_entropy": 311 | _losses = [ 312 | F.cross_entropy(o, t, weight=self._class_weights) 313 | for o, t in zip(output, _targets) 314 | ] 315 | loss = sum(_losses) 316 | elif self.loss_func == "binary_cross_entropy": 317 | _targets_onehot = self.model.transform_targets(targets, one_hot=True) 318 | _losses = [ 319 | F.binary_cross_entropy_with_logits(o, t, weight=self._class_weights) 320 | for o, t in zip(output, _targets_onehot) 321 | ] 322 | loss = sum(_losses) 323 | else: 324 | raise NotImplementedError(self.loss) 325 | 326 | # Assume only 1 output: 327 | 328 | return loss, output[0], _targets[0], _losses[0] 329 | 330 | def _calc_validation_loss(self): 331 | running_loss = 0 332 | self.model.eval() 333 | with torch.no_grad(): 334 | for batch_idx, (data, *targets) in enumerate(self.dl_val): 335 | loss, _, _, _ = self._run_model_on_batch(data, targets) 336 | running_loss += loss.item() * data.size(0) 337 | 338 | return running_loss / len(self.dl_val.dataset) 339 | 340 | def _train_epoch(self): 341 | 342 | self.model.train() 343 | 344 | train_losses = [] 345 | train_accs = [] 346 | 347 | for batch_idx, (data, *targets) in enumerate(self.dl_train): 348 | if ( 349 | batch_idx * data.shape[0] * data.shape[2] 350 | > self.epoch_iters * self.epoch_size 351 | ): 352 | # we've effectively finished one epoch worth of data; break! 
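                # (Illustrative numbers: with the defaults n_samples_per_batch=5000
                # and win_len=512, batch_size is 9, so each batch covers
                # 9 * 512 = 4608 samples.)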
353 |                 break
354 | 
355 |             batch_loss, batch_output, batch_targets = self._train_batch(data, targets)
356 |             train_losses.append(batch_loss.detach().cpu().item())
357 |             batch_preds = torch.argmax(batch_output, 1, False)
358 |             train_accs.append(
359 |                 (batch_preds == batch_targets).detach().cpu().float().mean().item()
360 |             )
361 | 
362 |         if self.lr_decay != 1:
363 |             self.lr_scheduler.step()
364 | 
365 |         return EpochMetrics(loss=np.mean(train_losses), acc=np.mean(train_accs))
366 | 
367 |     def _val_epoch(self):
368 |         return self._eval_epoch(self.dl_val)
369 | 
370 |     def _eval_epoch(self, data_loader):
371 |         # Validation
372 |         self.model.eval()
373 | 
374 |         losses = []
375 |         outputs = []
376 |         targets = []
377 | 
378 |         with torch.no_grad():
379 |             for batch_idx, (data, *target) in enumerate(data_loader):
380 |                 (
381 |                     batch_loss,
382 |                     batch_output,
383 |                     batch_targets,
384 |                     _batch_losses,
385 |                 ) = self._run_model_on_batch(data, target)
386 | 
387 |                 losses.append(batch_loss.detach().cpu().item())
388 |                 outputs.append(
389 |                     torch.argmax(batch_output, 1, False)
390 |                     .detach()
391 |                     .cpu()
392 |                     .data.numpy()
393 |                     .flatten()
394 |                 )
395 |                 targets.append(batch_targets.detach().cpu().data.numpy().flatten())
396 | 
397 |         targets = np.hstack(targets)
398 |         outputs = np.hstack(outputs)
399 |         acc = sklearn.metrics.accuracy_score(targets, outputs)
400 |         f1 = sklearn.metrics.f1_score(targets, outputs, average="weighted")
401 | 
402 |         return EpochMetrics(loss=np.mean(losses), acc=acc, f1=f1)
403 | 
404 |     def init_data(self):
405 |         # Initiate loading of datasets, model
406 |         _, _, _ = self.dl_train, self.dl_val, self.dl_test
407 |         _ = self.model
408 | 
409 |     def init_train(self):
410 | 
411 |         # initialization
412 |         if self.seed:
413 |             torch.manual_seed(self.seed)
414 |         if self.cuda:
415 |             if self.seed:
416 |                 torch.cuda.manual_seed(self.seed)
417 |         self.model.to(self.device)
418 | 
419 |     def train_one_epoch(self, verbose=True) -> EpochRecord:
420 |         """ Train a single epoch -- method tailored to the Ray.tune methodology."""
421 |         epoch_record = EpochRecord(epoch=len(self.train_state.epoch_records))
422 |         self.train_state.epoch_records.append(epoch_record)
423 | 
424 |         with Timer("Train Epoch", log_output=verbose) as t:
425 |             epoch_record.train = self._train_epoch()
426 |             epoch_record.iter_s_cpu = t.interval_cpu
427 |             epoch_record.iter_s_wall = t.interval_wall
428 |             epoch_record.lr = self.optimizer.param_groups[0]["lr"]
429 | 
430 |         with Timer("Val Epoch", log_output=verbose):
431 |             epoch_record.val = self._val_epoch()
432 | 
433 |         df = self.train_state.to_df()
434 | 
435 |         # Early stopping / checkpointing implementation
436 |         df["raw_metric"] = df.val_loss / df.val_f1
437 |         df["ewma_smoothed_loss"] = (
438 |             df["raw_metric"].ewm(ignore_na=False, halflife=3).mean()
439 |         )
440 |         df["instability_penalty"] = (
441 |             df["raw_metric"].rolling(5, min_periods=3).std().fillna(0.75)
442 |         )
443 |         stopping_metric = df["stopping_metric"] = (
444 |             df["ewma_smoothed_loss"] + df["instability_penalty"]
445 |         )
446 |         epoch_record.stopping_metric = df["stopping_metric"].iloc[-1]
447 | 
448 |         idx_this_iter = stopping_metric.index.max()
449 |         idx_best_yet = stopping_metric.idxmin()
450 |         self.train_state.best_sm = df.loc[idx_best_yet, "stopping_metric"]
451 |         self.train_state.best_loss = df.loc[idx_best_yet, "val_loss"]
452 |         self.train_state.best_f1 = df.loc[idx_best_yet, "val_f1"]
453 | 
454 |         if idx_best_yet == idx_this_iter:
455 |             # Best yet! Checkpoint.
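            # (A new minimum of the smoothed metric above -- the ewma of
            #  val_loss / val_f1 plus a rolling-std instability penalty,
            #  lower is better -- so flag this epoch for checkpointing.)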
456 |             epoch_record.should_checkpoint = True
457 |             self.cp_iter = epoch_record.epoch
458 | 
459 |         else:
460 |             if self.patience is not None:
461 |                 patience_counter = idx_this_iter - idx_best_yet
462 |                 assert patience_counter >= 0
463 |                 if patience_counter > self.patience:
464 |                     if verbose:
465 |                         print(
466 |                             f"Early stop! Out of patience ({patience_counter} > {self.patience})"
467 |                         )
468 |                     epoch_record.done = True
469 | 
470 |         if verbose:
471 |             self.print_train_summary()
472 | 
473 |         return epoch_record
474 | 
475 |     def train(self, max_epochs=50, verbose=True):
476 |         """ A fairly standard training loop: runs for at most `max_epochs` epochs, but may
477 |         stop early if our custom stopping metric does not improve for `self.patience` epochs.
478 |         Always checkpoints when a new best stopping_metric is achieved. An alternative to
479 |         using ray.tune for training."""
480 | 
481 |         self.init_data()
482 |         self.init_train()
483 | 
484 |         while True:
485 |             epoch_record = self.train_one_epoch(verbose=verbose)
486 | 
487 |             if epoch_record.should_checkpoint:
488 |                 last_cp = self._save()
489 |                 if verbose:
490 |                     print(f"<<<< Checkpointed ({last_cp}) >>>>")
491 |             if epoch_record.done:
492 |                 break
493 |             if epoch_record.epoch >= max_epochs:
494 |                 break
495 | 
496 |         # Save trainer state, but not the model.
497 |         self._save(save_model=False)
498 |         if verbose:
499 |             print(self.model_path)
500 | 
501 |     def print_train_summary(self):
502 |         df = self.train_state.to_df()
503 | 
504 |         with pd.option_context(
505 |             "display.max_rows",
506 |             100,
507 |             "display.max_columns",
508 |             100,
509 |             "display.precision",
510 |             3,
511 |             "display.width",
512 |             180,
513 |         ):
514 |             print(df.drop(["done"], axis=1, errors="ignore"))
515 | 
516 |     def _save(self, checkpoint_dir=None, save_model=True, save_trainer=True):
517 |         """ Saves/checkpoints model state and training state to disk. """
518 |         if checkpoint_dir is None:
519 |             checkpoint_dir = self.model_path
520 |         else:
521 |             self.model_path = checkpoint_dir
522 | 
523 |         os.makedirs(checkpoint_dir, exist_ok=True)
524 | 
525 |         # save model params
526 |         model_path = os.path.join(checkpoint_dir, "model.pth")
527 |         trainer_path = os.path.join(checkpoint_dir, "trainer.pth")
528 | 
529 |         if save_model:
530 |             torch.save(self.model.state_dict(), model_path)
531 |         if save_trainer:
532 |             with open(trainer_path, "wb") as f:
533 |                 pickle.dump(self, f)
534 | 
535 |         return checkpoint_dir
536 | 
537 |     def _restore(self, checkpoint_dir=None):
538 |         """ Restores model state and training state from disk. """
539 | 
540 |         if checkpoint_dir is None:
541 |             checkpoint_dir = self.model_path
542 | 
543 |         model_path = os.path.join(checkpoint_dir, "model.pth")
544 |         trainer_path = os.path.join(checkpoint_dir, "trainer.pth")
545 | 
546 |         # Reconstitute old trainer and copy state to this trainer.
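        # (_save() pickled the whole Trainer; traits marked transient=True,
        #  such as the optimizer and lr_scheduler, are dropped from that pickle,
        #  so they are rebuilt from defaults after the state is copied below.)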
547 | with open(trainer_path, "rb") as f: 548 | other_trainer = pickle.load(f) 549 | 550 | self.__setstate__(other_trainer.__getstate__()) 551 | 552 | # Load model (after loading state in case we need to re-initialize model from config) 553 | self.model.load_state_dict(torch.load(model_path, map_location=self.device)) 554 | 555 | # Be careful to reinitialize optimizer and lr scheduler 556 | self.optimizer = self._optimizer_default() 557 | self.lr_scheduler = self._lr_scheduler_default() 558 | 559 | 560 | # 561 | # class EnsembleCNNLSTMTrainable(CNNLSTMTrainable): 562 | # def _setup(self, config={}): 563 | # """Decimation is for speedup during unit testing only.""" 564 | # super()._setup(config=config) 565 | # self.model = mo.FilterNetEnsemble(config=config) 566 | -------------------------------------------------------------------------------- /filternet/training/trainable.py: -------------------------------------------------------------------------------- 1 | # Copyright (C) 2020 Pet Insight Project - All Rights Reserved 2 | 3 | import ray.tune 4 | 5 | from filternet.models.reference_architectures import get_ref_arch 6 | from filternet.training.train import Trainer 7 | 8 | 9 | class MPTrainable(ray.tune.Trainable): 10 | def _setup(self, config={}): 11 | """Decimation is for speedup during unit testing only.""" 12 | if config.get("base_config", False) and "model_config" not in config: 13 | # Use the requested 'base config', updating it with any other 14 | # requested options. 15 | config["model_config"] = get_ref_arch(config["base_config"]) 16 | print(f"Using base config: {config['base_config']}") 17 | 18 | self.trainer = trainer = Trainer(**config) 19 | self.trainer.init_data() 20 | self.trainer.init_train() 21 | 22 | def _train(self): 23 | epoch_record = self.trainer.train_one_epoch() 24 | d = epoch_record.to_dict() 25 | d["mean_loss"] = d["val_loss"] 26 | d["mean_accuracy"] = d["val_acc"] 27 | return d 28 | 29 | def _save(self, checkpoint_dir): 30 | """ Saves/checkpoints model state and training state to disk. """ 31 | 32 | return self.trainer._save(checkpoint_dir) 33 | 34 | def _restore(self, checkpoint_dir): 35 | """ Restores model state and training state from disk. 
""" 36 | return self.trainer._restore(checkpoint_dir) 37 | -------------------------------------------------------------------------------- /multimodal_sensor_fusion.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/WhistleLabs/FilterNet/f5bcf82e8a542197b7b84d3786e808adfdf7c56f/multimodal_sensor_fusion.png -------------------------------------------------------------------------------- /scripts/__init__.py: -------------------------------------------------------------------------------- 1 | # Copyright (C) 2020 Pet Insight Project - All Rights Reserved 2 | 3 | -------------------------------------------------------------------------------- /scripts/run_base_configs_exp.py: -------------------------------------------------------------------------------- 1 | # Copyright (C) 2020 Pet Insight Project - All Rights Reserved 2 | 3 | """ 4 | Executes a series of benchmarking runs to be used in plots & tables in the paper: 5 | * FilterNet reference configurations 6 | * DeepConvLSTM reimpplementation 7 | * .5x ms-c/l 8 | # 2X ms-c/l 9 | 10 | """ 11 | 12 | import sys 13 | 14 | sys.path.insert(0, ".") 15 | 16 | from filternet.models.reference_architectures import ref_archs 17 | from filternet.training.evalmodel import * 18 | 19 | NAME = "base_configs_7" # unique name for this particular run 20 | MAX_EPOCHS = 100 21 | NUM_REPEATS = 10 22 | saved_model_glob = ( 23 | f"saved_models/{NAME}*/evalmodel.pth" # helps NB's to load these models 24 | ) 25 | 26 | 27 | def do_run(): 28 | for i in range(0, NUM_REPEATS): 29 | # Do the FilterNet reference architectures 30 | for ref_arch in ref_archs.keys(): 31 | name = f"{NAME}_{ref_arch}_{i}" 32 | 33 | config = {} 34 | config["base_config"] = ref_arch 35 | config["name"] = f"{NAME}_{ref_arch}_{i}" 36 | 37 | trainer = Trainer(**config) 38 | trainer.init_data() 39 | trainer.init_train() 40 | 41 | trainer.train(max_epochs=MAX_EPOCHS) 42 | 43 | em = EvalModel(trainer=trainer) 44 | em._save() 45 | # Load fresh for consistency 46 | em = load_eval_model_from_dir(em.model_path) 47 | 48 | em.run_test_set() 49 | em.calc_metrics() 50 | em.calc_ward_metrics() 51 | print(em.classification_report_df) 52 | em._save() 53 | 54 | # Also do a matching number of DeepConvLSTMs 55 | name = f"{NAME}_deepconvlstm_{i}" 56 | 57 | config = { 58 | "win_len": 24, 59 | "batch_size": 100, 60 | "model_class": "DeepConvLSTM", 61 | "model_config": {"scale": 1.0}, 62 | } 63 | # config["base_config"] = ref_arch 64 | # config["model_config"] = {} #get_ref_arch("multi_scale_cnn_lstm") 65 | config["name"] = name 66 | 67 | trainer = Trainer(**config) 68 | trainer.init_data() 69 | trainer.init_train() 70 | 71 | trainer.train(max_epochs=MAX_EPOCHS) 72 | 73 | em = EvalModel(trainer=trainer) 74 | em._save() 75 | # Load fresh for consistency 76 | em = load_eval_model_from_dir(em.model_path) 77 | em.run_test_set() 78 | em.calc_metrics() 79 | em.calc_ward_metrics() 80 | print(em.classification_report_df) 81 | em._save() 82 | 83 | # Also do a matching number of .5x scale models. 
84 |         ref_arch = "multi_scale_cnn_lstm"
85 | 
86 |         name = f"{NAME}_mscl_p5x_{i}"
87 | 
88 |         config = {}
89 |         config["base_config"] = ref_arch
90 |         config["model_config"] = {"scale": 0.5}
91 |         config["name"] = name
92 | 
93 |         trainer = Trainer(**config)
94 |         trainer.init_data()
95 |         trainer.init_train()
96 | 
97 |         trainer.train(max_epochs=MAX_EPOCHS)
98 | 
99 |         em = EvalModel(trainer=trainer)
100 |         em._save()
101 |         # Load fresh for consistency
102 |         em = load_eval_model_from_dir(em.model_path)
103 |         em.run_test_set()
104 |         em.calc_metrics()
105 |         em.calc_ward_metrics()
106 |         print(em.classification_report_df)
107 |         em._save()
108 | 
109 |         # Also do a matching number of 2x scale models.
110 |         ref_arch = "multi_scale_cnn_lstm"
111 | 
112 |         name = f"{NAME}_mscl_2x_{i}"
113 | 
114 |         config = {}
115 |         config["base_config"] = ref_arch
116 |         config["model_config"] = {"scale": 2}
117 |         config["name"] = name
118 | 
119 |         trainer = Trainer(**config)
120 |         trainer.init_data()
121 |         trainer.init_train()
122 | 
123 |         trainer.train(max_epochs=MAX_EPOCHS)
124 | 
125 |         em = EvalModel(trainer=trainer)
126 |         em._save()
127 |         # Load fresh for consistency
128 |         em = load_eval_model_from_dir(em.model_path)
129 |         em.run_test_set()
130 |         em.calc_metrics()
131 |         em.calc_ward_metrics()
132 |         print(em.classification_report_df)
133 |         em._save()
134 | 
135 | 
136 | if __name__ == "__main__":
137 |     do_run()
138 | 
--------------------------------------------------------------------------------
/scripts/run_ensemble_exp.py:
--------------------------------------------------------------------------------
1 | # Copyright (C) 2020 Pet Insight Project - All Rights Reserved
2 | 
3 | """
4 | Executes a series of benchmarking runs to be used in plots & tables in the paper:
5 | * Ensembled MS-C/L models with various #'s of folds / submodels
6 | """
7 | 
8 | import sys
9 | 
10 | sys.path.insert(0, ".")
11 | 
12 | from filternet.training.evalmodel import *
13 | from filternet.training.ensemble_train import EnsembleTrainer
14 | 
15 | NAME = "ensembles_3"  # unique name for this particular run
16 | MAX_EPOCHS = 100
17 | NUM_REPEATS = 10
18 | NUM_FOLDS = [2, 3, 4, 5]
19 | saved_model_glob = (
20 |     f"saved_models/{NAME}*/evalmodel.pth"  # helps NB's to load these models
21 | )
22 | 
23 | 
24 | def do_run():
25 |     for i in range(0, NUM_REPEATS):
26 |         # Train an ensemble at each fold count
27 |         for num_folds in NUM_FOLDS:
28 |             name = f"{NAME}_{num_folds}_folds_{i}"
29 | 
30 |             config = {"base_config": "multi_scale_cnn_lstm", "model_config": {}}
31 | 
32 |             trainer = EnsembleTrainer(n_folds=num_folds, name=name, config=config)
33 |             trainer.init_data()
34 | 
35 |             trainer.train(max_epochs=MAX_EPOCHS)
36 |             trainer._save()
37 | 
38 |             em = EvalModel(trainer=trainer)
39 | 
40 |             # annotate extra field, to make post-analysis easier.
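            # (These keys travel with the saved EvalModel, letting the analysis
            #  notebooks group results by experiment, fold count, and repeat
            #  without re-parsing model filenames.)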
41 | em.extra["exp_name"] = NAME 42 | em.extra["num_folds"] = num_folds 43 | em.extra["i_repeat"] = i 44 | 45 | em.run_test_set() 46 | em.calc_metrics() 47 | em.calc_ward_metrics() 48 | print(em.classification_report_df) 49 | em._save() 50 | 51 | 52 | if __name__ == "__main__": 53 | do_run() 54 | -------------------------------------------------------------------------------- /scripts/run_mm_base_configs_exp.py: -------------------------------------------------------------------------------- 1 | # Copyright (C) 2020 Pet Insight Project - All Rights Reserved 2 | 3 | """ 4 | Executes a series of benchmarking runs to be used in plots & tables in the paper: 5 | * Runs for 'multimodal sensor fusion analysis' 6 | 7 | """ 8 | 9 | import sys 10 | 11 | sys.path.insert(0, ".") 12 | 13 | from filternet.models.reference_architectures import ref_archs 14 | from filternet.training.evalmodel import * 15 | from filternet.training.ensemble_train import EnsembleTrainer 16 | 17 | NAME = "mm_base_configs_2" # unique name for this particular run 18 | MAX_EPOCHS = 100 # 0 19 | NUM_REPEATS = 5 # 20 20 | saved_model_glob = ( 21 | f"saved_models/{NAME}*/evalmodel.pth" # helps NB's to load these models 22 | ) 23 | 24 | # Iterate through these architectures 25 | ref_archs = [ 26 | # 'deepconvlstm', # slow 27 | "base_cnn", 28 | "base_lstm", 29 | "cnn_lstm", 30 | "multi_scale_cnn", 31 | "multi_scale_cnn_lstm", 32 | ] 33 | 34 | # For each architecture, train/eval on these sensor subsets 35 | sensor_subsets = [ 36 | "accels", 37 | "gyros", 38 | "accels+gyros", 39 | "accels+gyros+magnetic", 40 | "opportunity", 41 | ] 42 | 43 | 44 | def do_run(): 45 | for i in range(1, NUM_REPEATS): 46 | # Do the FilterNet reference architectures 47 | for ref_arch in ref_archs: 48 | for sensor_subset in sensor_subsets: 49 | name = f"{NAME}_{ref_arch}_{sensor_subset}_{i}" 50 | 51 | config = {} 52 | config["base_config"] = ref_arch 53 | config["name"] = name 54 | config["sensor_subset"] = sensor_subset 55 | 56 | trainer = Trainer(**config) 57 | trainer.init_data() 58 | trainer.init_train() 59 | 60 | trainer.train(max_epochs=MAX_EPOCHS) 61 | 62 | em = EvalModel(trainer=trainer) 63 | em._save() 64 | 65 | # Load fresh for consistency 66 | em = load_eval_model_from_dir(em.model_path) 67 | 68 | em.run_test_set() 69 | em.calc_metrics() 70 | em.calc_ward_metrics() 71 | print(em.classification_report_df) 72 | print(f"Weighted F1: {em.f1:.4f}") 73 | print(f"Event F1: {em.event_f1:.4f}") 74 | print(f"Nonull F1: {em.nonull_f1:.4f}") 75 | em._save() 76 | 77 | # And also do the 4-fold ensemble, which requires slightly different code 78 | for sensor_subset in sensor_subsets: 79 | num_folds = 4 80 | 81 | name = f"{NAME}_{num_folds}_folds_{sensor_subset}_{i}" 82 | 83 | config = {"base_config": "multi_scale_cnn_lstm", "model_config": {}} 84 | config["sensor_subset"] = sensor_subset 85 | 86 | trainer = EnsembleTrainer(n_folds=num_folds, name=name, config=config) 87 | trainer.init_data() 88 | 89 | trainer.train(max_epochs=MAX_EPOCHS) 90 | trainer._save() 91 | 92 | em = EvalModel(trainer=trainer) 93 | em._save() 94 | 95 | # Load fresh for consistency 96 | em = load_eval_model_from_dir(em.model_path) 97 | 98 | # annotate extra field, to make post-analysis easier. 
99 | em.extra["exp_name"] = NAME 100 | em.extra["num_folds"] = num_folds 101 | em.extra["i_repeat"] = i 102 | 103 | em.run_test_set() 104 | em.calc_metrics() 105 | em.calc_ward_metrics() 106 | print(em.classification_report_df) 107 | em._save() 108 | 109 | 110 | if __name__ == "__main__": 111 | do_run() 112 | -------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- 1 | from setuptools import setup 2 | 3 | setup( 4 | name='FilterNet', 5 | version='', 6 | packages=['filternet'], 7 | url='https://github.com/WhistleLabs/FilterNet', 8 | license='', 9 | author='Whistle Labs', 10 | author_email='', 11 | description='' 12 | ) 13 | -------------------------------------------------------------------------------- /stripchart heatmaps.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/WhistleLabs/FilterNet/f5bcf82e8a542197b7b84d3786e808adfdf7c56f/stripchart heatmaps.png -------------------------------------------------------------------------------- /tests/datasets/test_har.py: -------------------------------------------------------------------------------- 1 | # Copyright (C) 2020 Pet Insight Project - All Rights Reserved 2 | 3 | import os 4 | 5 | import pytest 6 | 7 | from filternet.datasets import har as ds 8 | from filternet.datasets import sliding_window_x_y 9 | 10 | 11 | @pytest.fixture 12 | def dfs_dict(): 13 | return ds.get_or_make_dfs() 14 | 15 | 16 | def test_download(): 17 | ds.download_if_needed() 18 | assert os.path.exists(os.path.join(ds.datasets_dir, ds.DATASET_SUBDIR)) 19 | assert os.path.exists(os.path.join(ds.datasets_dir, ds.DATASET_FILE)) 20 | 21 | 22 | def test_get_or_make_dfs(dfs_dict): 23 | assert dfs_dict["df_train"].shape == (403712, 12) 24 | assert dfs_dict["df_val"].shape == (66816, 12) 25 | assert dfs_dict["df_test"].shape == (188672, 12) 26 | 27 | for df in [dfs_dict["df_train"], dfs_dict["df_val"], dfs_dict["df_test"]]: 28 | assert df.isna().sum().sum() == 0 29 | 30 | assert dfs_dict["s_labels"].shape == (6,) 31 | assert dfs_dict["df_cols"].shape == (12, 3) 32 | 33 | 34 | def test_get_x_y(dfs_dict): 35 | lens = {} 36 | for which_set in ["train", "train+val", "val", "test"]: 37 | Xc, ycs, data_spec = ds.get_x_y_contig(which_set, dfs_dict=dfs_dict) 38 | wl = 128 39 | X, ys = sliding_window_x_y(Xc, ycs, win_len=wl) 40 | 41 | assert X.shape[1] == data_spec["input_channels"] 42 | assert X.shape[2] == wl 43 | for y in ys: 44 | assert y.shape[1] == wl 45 | 46 | lens[which_set] = len(Xc) 47 | assert len(data_spec["input_features"]) == data_spec["input_channels"] 48 | assert data_spec["n_outputs"] == len(data_spec["output_spec"]) 49 | 50 | for o in data_spec["output_spec"]: 51 | assert "name" in o 52 | assert o["num_classes"] == len(o["classes"]) 53 | 54 | assert "dataset_name" in data_spec 55 | 56 | assert lens["train"] + lens["val"] == lens["train+val"] 57 | 58 | 59 | def test_urls(): 60 | assert ds.DATASET_FILE == "UCI%20HAR%20Dataset.zip" 61 | assert ds.DATASET_SUBDIR == "UCI_HAR_Dataset" 62 | -------------------------------------------------------------------------------- /tests/datasets/test_intention_recognition.py: -------------------------------------------------------------------------------- 1 | # Copyright (C) 2020 Pet Insight Project - All Rights Reserved 2 | 3 | import os 4 | 5 | import pytest 6 | 7 | from filternet.datasets import intention_recognition as ds 8 | from 
filternet.datasets import sliding_window_x_y 9 | 10 | 11 | @pytest.fixture 12 | def dfs_dict(): 13 | return ds.get_or_make_dfs() 14 | 15 | 16 | def test_download(): 17 | ds.download_if_needed() 18 | assert os.path.exists(os.path.join(ds.datasets_dir, ds.DATASET_SUBDIR)) 19 | assert os.path.exists(os.path.join(ds.datasets_dir, ds.DATASET_FILE)) 20 | 21 | 22 | def test_get_or_make_dfs(dfs_dict): 23 | # assert dfs_dict["df_train"].shape == (11565536, 68) 24 | # assert dfs_dict["df_val"].shape == (2009920, 68) 25 | # assert dfs_dict["df_test"].shape == (2468032, 68) 26 | 27 | for df in [dfs_dict["df_train"], dfs_dict["df_val"], dfs_dict["df_test"]]: 28 | assert df.isna().sum().sum() == 0 29 | 30 | assert dfs_dict["s_labels"].shape == (5,) 31 | assert dfs_dict["df_cols"].shape == (68, 2) 32 | 33 | 34 | def test_get_x_y(dfs_dict): 35 | lens = {} 36 | for which_set in ["train", "train+val", "val", "test"]: 37 | Xc, ycs, data_spec = ds.get_x_y_contig(which_set, dfs_dict=dfs_dict) 38 | wl = 128 39 | X, ys = sliding_window_x_y(Xc, ycs, win_len=wl) 40 | 41 | assert X.shape[1] == data_spec["input_channels"] 42 | assert X.shape[2] == wl 43 | for y in ys: 44 | assert y.shape[1] == wl 45 | 46 | lens[which_set] = len(Xc) 47 | assert len(data_spec["input_features"]) == data_spec["input_channels"] 48 | assert data_spec["n_outputs"] == len(data_spec["output_spec"]) 49 | 50 | for o in data_spec["output_spec"]: 51 | assert "name" in o 52 | assert o["num_classes"] == len(o["classes"]) 53 | 54 | assert "dataset_name" in data_spec 55 | 56 | assert lens["train"] + lens["val"] == lens["train+val"] 57 | 58 | 59 | def test_urls(): 60 | assert ds.DATASET_FILE == "eeg-motor-movementimagery-dataset-1.0.0.zip" 61 | assert ds.DATASET_SUBDIR == "eeg-motor-movementimagery-dataset-1.0.0" 62 | -------------------------------------------------------------------------------- /tests/datasets/test_opportunity.py: -------------------------------------------------------------------------------- 1 | # Copyright (C) 2020 Pet Insight Project - All Rights Reserved 2 | 3 | import os 4 | 5 | import pytest 6 | 7 | from filternet.datasets import opportunity as opp 8 | from filternet.datasets import sliding_window_x_y 9 | 10 | 11 | @pytest.fixture 12 | def dfs_dict(): 13 | return opp.get_or_make_dfs() 14 | 15 | 16 | def test_download_opportunity(): 17 | opp.download_if_needed() 18 | assert os.path.exists(os.path.join(opp.datasets_dir, opp.DATASET_SUBDIR)) 19 | assert os.path.exists(os.path.join(opp.datasets_dir, opp.DATASET_FILE)) 20 | 21 | 22 | def test_get_or_make_dfs(dfs_dict): 23 | assert dfs_dict["df_train"].shape == (497014, 252) 24 | assert dfs_dict["df_val"].shape == (60949, 252) 25 | assert dfs_dict["df_test"].shape == (118750, 252) 26 | 27 | for df in [dfs_dict["df_train"], dfs_dict["df_val"], dfs_dict["df_test"]]: 28 | assert df.isna().sum().sum() == 0 29 | 30 | assert dfs_dict["df_labels_locomotion"].shape == (5, 3) 31 | assert dfs_dict["df_labels_gestures"].shape == (18, 3) 32 | 33 | assert dfs_dict["df_cols"].shape == (250, 6) 34 | 35 | 36 | def test_get_x_y(dfs_dict): 37 | lens = {} 38 | for which_set in ["train", "train+val", "val", "test"]: 39 | Xc, ycs, data_spec = opp.get_x_y_contig(which_set, dfs_dict=dfs_dict) 40 | wl = 128 41 | X, ys = sliding_window_x_y(Xc, ycs, win_len=wl) 42 | 43 | assert X.shape[1] == 113 44 | assert X.shape[2] == wl 45 | for y in ys: 46 | assert y.shape[1] == wl 47 | 48 | lens[which_set] = len(Xc) 49 | assert data_spec["input_channels"] == 113 50 | assert len(data_spec["input_features"]) == 
data_spec["input_channels"] 51 | assert data_spec["n_outputs"] == len(data_spec["output_spec"]) 52 | 53 | for o in data_spec["output_spec"]: 54 | assert "name" in o 55 | assert o["num_classes"] == len(o["classes"]) 56 | 57 | assert "dataset_name" in data_spec 58 | 59 | assert lens["train"] + lens["val"] == lens["train+val"] 60 | 61 | 62 | def test_get_different_outputs(dfs_dict): 63 | with pytest.raises(AssertionError): 64 | Xc, ycs, data_spec = opp.get_x_y_contig(dfs_dict=dfs_dict, y_cols="y_gesture") 65 | Xc, ycs, data_spec = opp.get_x_y_contig(dfs_dict=dfs_dict, y_cols=["y_gesture"]) 66 | assert len(ycs) == 1 67 | Xc, ycs2, data_spec = opp.get_x_y_contig( 68 | dfs_dict=dfs_dict, y_cols=["y_gesture", "y_locomotion"] 69 | ) 70 | assert len(ycs2) == 2 71 | assert ycs[0].shape == ycs2[0].shape 72 | 73 | 74 | def test_get_sensor_subsets(dfs_dict): 75 | lens = {} 76 | expected_lens = { 77 | "accels": 15, 78 | "gyros": 15, 79 | "accels+gyros": 30, 80 | "accels+gyros+magnetic": 45, 81 | "opportunity": 113, 82 | None: 113, 83 | } 84 | for sensor_subset in [ 85 | None, 86 | "accels", 87 | "gyros", 88 | "accels+gyros", 89 | "accels+gyros+magnetic", 90 | "opportunity", 91 | ]: 92 | Xc, ycs, data_spec = opp.get_x_y_contig( 93 | "train+val", sensor_subset=sensor_subset, dfs_dict=dfs_dict 94 | ) 95 | assert Xc.shape[1] == expected_lens[sensor_subset] 96 | 97 | 98 | def test_urls(): 99 | assert opp.DATASET_FILE == "OpportunityUCIDataset.zip" 100 | assert opp.DATASET_SUBDIR == "OpportunityUCIDataset" 101 | -------------------------------------------------------------------------------- /tests/datasets/test_smartphone_hapt.py: -------------------------------------------------------------------------------- 1 | # Copyright (C) 2020 Pet Insight Project - All Rights Reserved 2 | 3 | import os 4 | 5 | import pytest 6 | 7 | from filternet.datasets import sliding_window_x_y 8 | from filternet.datasets import smartphone_hapt as ds 9 | 10 | 11 | @pytest.fixture 12 | def dfs_dict(): 13 | return ds.get_or_make_dfs() 14 | 15 | 16 | def test_download(): 17 | ds.download_if_needed() 18 | assert os.path.exists(os.path.join(ds.datasets_dir, ds.DATASET_SUBDIR)) 19 | assert os.path.exists(os.path.join(ds.datasets_dir, ds.DATASET_FILE)) 20 | 21 | 22 | def test_get_or_make_dfs(dfs_dict): 23 | assert dfs_dict["df_train"].shape == (686203, 10) 24 | assert dfs_dict["df_val"].shape == (111281, 10) 25 | assert dfs_dict["df_test"].shape == (325288, 10) 26 | 27 | for df in [dfs_dict["df_train"], dfs_dict["df_val"], dfs_dict["df_test"]]: 28 | assert not df.isna().any().any() 29 | 30 | assert dfs_dict["s_labels"].shape == (12,) 31 | assert dfs_dict["df_cols"].shape == (10, 3) 32 | 33 | 34 | def test_get_x_y(dfs_dict): 35 | lens = {} 36 | for which_set in ["train", "train+val", "val", "test"]: 37 | Xc, ycs, data_spec = ds.get_x_y_contig(which_set, dfs_dict=dfs_dict) 38 | wl = 128 39 | X, ys = sliding_window_x_y(Xc, ycs, win_len=wl) 40 | 41 | assert X.shape[1] == data_spec["input_channels"] 42 | assert X.shape[2] == wl 43 | for y in ys: 44 | assert y.shape[1] == wl 45 | 46 | lens[which_set] = len(Xc) 47 | assert len(data_spec["input_features"]) == data_spec["input_channels"] 48 | assert data_spec["n_outputs"] == len(data_spec["output_spec"]) 49 | 50 | for o in data_spec["output_spec"]: 51 | assert "name" in o 52 | assert o["num_classes"] == len(o["classes"]) 53 | 54 | assert "dataset_name" in data_spec 55 | 56 | assert lens["train"] + lens["val"] == lens["train+val"] 57 | 58 | 59 | def test_urls(): 60 | assert ds.DATASET_FILE == 
"HAPT%20Data%20Set.zip" 61 | assert ds.DATASET_SUBDIR == "HAPT_Data_Set" 62 | -------------------------------------------------------------------------------- /tests/test_datasets.py: -------------------------------------------------------------------------------- 1 | # Copyright (C) 2020 Pet Insight Project - All Rights Reserved 2 | 3 | import filternet.datasets 4 | 5 | 6 | def test_datasets_dir(): 7 | assert "datasets" in filternet.datasets.datasets_dir 8 | print(filternet.datasets.datasets_dir) 9 | -------------------------------------------------------------------------------- /tests/test_init.py: -------------------------------------------------------------------------------- 1 | # Copyright (C) 2020 Pet Insight Project - All Rights Reserved 2 | 3 | import filternet 4 | 5 | 6 | def test_base_dir(): 7 | assert "filternet" in filternet.base_dir 8 | print(filternet.base_dir) 9 | -------------------------------------------------------------------------------- /tests/test_models.py: -------------------------------------------------------------------------------- 1 | # Copyright (C) 2020 Pet Insight Project - All Rights Reserved 2 | 3 | import pytest 4 | import torch 5 | 6 | import filternet.models as mo 7 | from filternet.datasets import opportunity as opp, sliding_window_x_y 8 | 9 | 10 | @pytest.fixture 11 | def dfs_dict(): 12 | return opp.get_or_make_dfs() 13 | 14 | 15 | @pytest.fixture 16 | def x_y_dict(): 17 | wl = 64 18 | xys = {} 19 | for which_set in ["train", "val", "test"]: 20 | Xc, ycs, data_spec = opp.get_x_y_contig(which_set) 21 | 22 | X, ys = sliding_window_x_y(Xc, ycs, win_len=wl) 23 | 24 | assert X.shape[1] == 113 25 | assert X.shape[2] == wl 26 | assert ys[0].shape[1] == wl 27 | 28 | xys["X_" + which_set] = torch.Tensor(X) 29 | xys["ys_" + which_set] = [torch.Tensor(y).long() for y in ys] 30 | xys["win_len"] = wl 31 | return xys 32 | 33 | 34 | def test_make_model(): 35 | net = mo.DeepConvLSTM(scale=0.25) 36 | 37 | 38 | def test_transform_output_m2o(x_y_dict): 39 | net = mo.DeepConvLSTM(scale=(1.0 / 8)) 40 | N = 10 41 | X = x_y_dict["X_train"][:N] 42 | ys = [y[:N] for y in x_y_dict["ys_train"]] 43 | 44 | y_outs = net(X) 45 | for y_out, num_output_classes in zip(y_outs, net.num_output_classes): 46 | assert y_out.shape == (N, num_output_classes, 1) 47 | y_comps = net.transform_targets(ys) 48 | for y_comp, y_out in zip(y_comps, y_outs): 49 | assert y_comp.shape == y_out.shape 50 | 51 | 52 | def test_transform_output_m2m(x_y_dict): 53 | net = mo.DeepConvLSTM(scale=(1.0 / 8)) 54 | N = 10 55 | X = x_y_dict["X_train"][:N] 56 | ys = [y[:N] for y in x_y_dict["ys_train"]] 57 | 58 | y_outs = net(X) 59 | for y_out, num_output_classes in zip(y_outs, net.num_output_classes): 60 | assert y_out.shape == ( 61 | N, 62 | num_output_classes, 63 | 1 #x_y_dict["win_len"] - 2 * net.padding_lost_per_side, 64 | ) 65 | y_comps = net.transform_targets(ys) 66 | for y_comp, y_out in zip(y_comps, y_outs): 67 | assert y_comp.shape == y_out.shape 68 | 69 | 70 | @pytest.mark.skip("No need to generate 10,000 different models every time!") 71 | def test_make_cnn_lstm_models(x_y_dict): 72 | N = 10 73 | X = x_y_dict["X_train"][:N] 74 | ys = [y[:N] for y in x_y_dict["ys_train"]] 75 | 76 | i = 0 77 | for n_pre in [0, 1, 2]: 78 | print("n_pre ", n_pre) 79 | for n_strided in [0, 3]: 80 | print("n_strided ", n_strided) 81 | for n_interp in [0, 1, 3]: 82 | print("n_interp ", n_interp) 83 | for n_dense_pre_l in [0, 1, 2]: 84 | print("n_dense_pre_l", n_dense_pre_l) 85 | for n_l in [0, 1, 2]: 86 | print("n_l ", n_l) 87 
| for n_dense_post_l in [0, 1, 2]: 88 | print("n_dense_post_l ", n_dense_post_l) 89 | for do_pool in [True, False]: 90 | print("do_pool ", do_pool) 91 | for stride_pos in ["pre", "post"]: 92 | print("stride_pos ", stride_pos) 93 | for dropout in [0, 0.5]: 94 | print("dropout ", dropout) 95 | for bn_pre in [True, False]: 96 | print("bn_pre ", bn_pre) 97 | i += 1 98 | opts = dict( 99 | output_type="many_to_many", 100 | scale=(1.0 / 4), 101 | n_pre=n_pre, 102 | n_strided=n_strided, 103 | n_interp=n_interp, 104 | n_dense_pre_l=n_dense_pre_l, 105 | n_l=n_l, 106 | n_dense_post_l=n_dense_post_l, 107 | do_pool=do_pool, 108 | stride_pos=stride_pos, 109 | dropout=dropout, 110 | bn_pre=bn_pre, 111 | ) 112 | print(i) 113 | try: 114 | net = mo.FilterNet(**opts) 115 | y_outs = net(X) 116 | for y_out, num_output_classes in zip( 117 | y_outs, net.num_output_classes 118 | ): 119 | assert y_out.shape == ( 120 | N, 121 | num_output_classes, 122 | x_y_dict["win_len"] 123 | / net.output_stride, 124 | ) 125 | y_comps = net.transform_targets(ys) 126 | for y_comp, y_out in zip( 127 | y_comps, y_outs 128 | ): 129 | assert y_comp.shape == y_out.shape 130 | except Exception as e: 131 | print(opts) 132 | raise 133 | -------------------------------------------------------------------------------- /tests/test_train.py: -------------------------------------------------------------------------------- 1 | # Copyright (C) 2020 Pet Insight Project - All Rights Reserved 2 | 3 | import os 4 | 5 | import pytest 6 | import torch 7 | 8 | from filternet.datasets import opportunity as opp, sliding_window_x_y 9 | from filternet.training.trainable import MPTrainable 10 | 11 | 12 | @pytest.fixture 13 | def dfs_dict(): 14 | return opp.get_or_make_dfs() 15 | 16 | 17 | @pytest.fixture 18 | def x_y_dict(dfs_dict): 19 | wl = 64 20 | xys = {} 21 | for which_set in ["train", "val", "test"]: 22 | Xc, ycs, data_spec = opp.get_x_y_contig(which_set, dfs_dict=dfs_dict) 23 | 24 | X, ys = sliding_window_x_y(Xc, ycs, win_len=wl) 25 | 26 | assert X.shape[1] == 113 27 | assert X.shape[2] == wl 28 | for y in ys: 29 | assert y.shape[1] == wl 30 | 31 | xys["X_" + which_set] = torch.Tensor(X) 32 | xys["ys_" + which_set] = [torch.Tensor(y).long() for y in ys] 33 | xys["win_len"] = wl 34 | return xys 35 | 36 | 37 | def test_train_val_test_model(): 38 | trainable = MPTrainable( 39 | { 40 | "name": "unittest", 41 | "loss_func": "cross_entropy", 42 | "decimation": 10, 43 | "base_config": "base_cnn", 44 | "model_config": {"scale": (1.0 / 16)}, 45 | } 46 | ) 47 | 48 | trainer = trainable.trainer 49 | assert trainer.model is not None 50 | assert trainer.optimizer is not None 51 | assert trainer.win_len is not None 52 | assert trainer.loss_func is not None 53 | 54 | assert trainer.dl_train is not None 55 | assert trainer.dl_val is not None 56 | assert trainer.dl_test is not None 57 | 58 | # one training iteration 59 | ret = trainable.train() 60 | assert not ret["done"] 61 | assert ret["training_iteration"] == 1 62 | assert "train_loss" in ret 63 | assert "train_acc" in ret 64 | assert "mean_loss" in ret 65 | assert "mean_accuracy" in ret 66 | assert "val_f1" in ret 67 | assert ret["config"]["loss_func"] == trainer.loss_func 68 | print(ret) 69 | 70 | trainable.trainer.loss_func = "binary_cross_entropy" 71 | ret = trainable.train() 72 | assert not ret["done"] 73 | assert ret["training_iteration"] == 2 74 | assert "train_loss" in ret 75 | print(ret) 76 | 77 | ret = trainable.train() 78 | assert not ret["done"] 79 | assert ret["training_iteration"] == 3 80 | assert 
"train_loss" in ret 81 | print(ret) 82 | 83 | trainer.train_state.extra["temp"] = 1 84 | 85 | path = trainable.save() 86 | assert os.path.exists(path) 87 | assert trainer.train_state.extra["temp"] == 1 88 | trainer.train_state.extra["temp"] = 2 89 | assert trainer.train_state.extra["temp"] == 2 90 | trainable.restore(path) 91 | assert trainer.train_state.extra["temp"] == 1 # make sure restoring state worked. 92 | ret = trainable.train() 93 | assert ret["training_iteration"] == 4 94 | 95 | print(ret) 96 | 97 | 98 | # 99 | # def test_train_diff_dimensionalities(): 100 | # 101 | # trainable = train.CNNLSTMTrainable({'output_type': 'many_to_one_takelast', 'decimation': 10, 'loss': 'binary_cross_entropy', 'scale': (1.0 / 8)}) 102 | # ret = trainable.train() 103 | # assert ret['training_iteration'] == 1 104 | # assert 'train_loss' in ret 105 | # assert 'train_acc' in ret 106 | # assert 'mean_loss' in ret 107 | # assert 'mean_accuracy' in ret 108 | # assert 'val_f1' in ret 109 | # 110 | # trainable = train.CNNLSTMTrainable( 111 | # {'output_type': 'many_to_one_takelast', 'decimation': 10, 'loss': 'cross_entropy', 'scale': (1.0/8)}) 112 | # ret = trainable.train() 113 | # assert ret['training_iteration'] == 1 114 | # assert 'train_loss' in ret 115 | # assert 'train_acc' in ret 116 | # assert 'mean_loss' in ret 117 | # assert 'mean_accuracy' in ret 118 | # assert 'val_f1' in ret 119 | # 120 | # trainable = train.CNNLSTMTrainable({'output_type': 'many_to_many', 'decimation': 10, 'scale': (1.0 / 8)}) 121 | # ret = trainable.train() 122 | # assert ret['training_iteration'] == 1 123 | # assert 'train_loss' in ret 124 | # assert 'train_acc' in ret 125 | # assert 'mean_loss' in ret 126 | # assert 'mean_accuracy' in ret 127 | # assert 'val_f1' in ret 128 | # 129 | # # Re-enable when ability to use different models is re-enabled: 130 | # def test_train_diff_models(): 131 | # trainable = train.CNNLSTMTrainable({'model_class': 'DeepConvLSTM', 'decimation': 10, 'scale': (1.0 / 8)}) 132 | # ret = trainable.train() 133 | # assert ret['training_iteration'] == 1 134 | # assert 'train_loss' in ret 135 | # assert 'train_acc' in ret 136 | # assert 'mean_loss' in ret 137 | # assert 'mean_accuracy' in ret 138 | # assert 'val_f1' in ret 139 | # ret = trainable.test_with_overlap() 140 | # assert 'test_f1' in ret 141 | # for o in ret['output_records']: 142 | # assert 'classification_report_txt' in o 143 | # 144 | # 145 | # trainable = train.CNNLSTMTrainable({'model_class': 'FilterNet', 'decimation': 10, 'scale': (1.0 / 8)}) 146 | # ret = trainable.train() 147 | # assert ret['training_iteration'] == 1 148 | # assert 'train_loss' in ret 149 | # assert 'train_acc' in ret 150 | # assert 'mean_loss' in ret 151 | # assert 'mean_accuracy' in ret 152 | # assert 'val_f1' in ret 153 | # ret = trainable.test_with_overlap() 154 | # assert 'test_f1' in ret 155 | # for o in ret['output_records']: 156 | # assert 'classification_report_txt' in o 157 | -------------------------------------------------------------------------------- /training_history.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/WhistleLabs/FilterNet/f5bcf82e8a542197b7b84d3786e808adfdf7c56f/training_history.png -------------------------------------------------------------------------------- /win_len_effects.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/WhistleLabs/FilterNet/f5bcf82e8a542197b7b84d3786e808adfdf7c56f/win_len_effects.png --------------------------------------------------------------------------------