├── .gitignore ├── 2D-Manifold.ipynb ├── 2D-Synthetic.ipynb ├── README.md ├── aggregate_results.py ├── datasets ├── .gitkeep ├── electricity_inputs.npy ├── electricity_targets.npy ├── ionosphere_inputs.npy ├── ionosphere_targets.npy ├── mushroom_inputs.npy ├── mushroom_targets.npy ├── sonar_inputs.npy ├── sonar_targets.npy ├── spectf_inputs.npy └── spectf_targets.npy ├── ensembling_methods.py ├── launch_jobs.py ├── neural_network.py ├── run_experiment.py └── utils.py /.gitignore: -------------------------------------------------------------------------------- 1 | .ipynb_checkpoints/ 2 | __pycache__/ 3 | *.DS_Store 4 | *.pyc 5 | *.aux 6 | *.bbl 7 | *.blg 8 | *.out 9 | *.log 10 | *.bst 11 | datasets/icu*.npy 12 | results/ 13 | writeup/ 14 | 15 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Ensembles of Locally Independent Prediction Models 2 | 3 | This repository contains code used to generate the results in 4 | [Ensembles of Locally Independent Prediction Models](https://arxiv.org/abs/1911.01291). 5 | 6 | 7 | ## Main Idea 8 | 9 | Ensembling is a subfield of machine learning that studies how predictions from 10 | multiple models, all trained to solve the same task, can be combined to improve 11 | performance. One intuition behind ensembling is that there is "wisdom in the 12 | crowds" -- that individual models may be fallible, but that many together may 13 | be more correct. However, this intuition rests on the assumption that the 14 | models are unbiased (i.e. make errors independent of the data) and that they 15 | aren't all wrong in the same ways (i.e. make errors independent of each other). 16 | 17 | This second property--the property of making independent errors on new data--is 18 | often called the "diversity" of the ensemble. Although it cannot be measured 19 | directly without access to that new data, many ensembling methods still try to 20 | encourage diversity by optimizing proxies that _can_ be evaluated on training 21 | data. One example is 22 | [negative correlation learning](https://ieeexplore.ieee.org/document/809027) (NCL), 23 | which penalizes models for making correlated predictions on the training set, 24 | even as it encourages them to still make correct predictions. 25 | 26 | However, these two goals are clearly at odds. On the training set, which we 27 | assume to be "true" in some way, we really do want all of our models to make 28 | correct predictions. If that happens their predictions will of course be 29 | correlated, but the correlation of training set predictions shouldn't 30 | necessarily imply anything about the correlation (or more generally statistical 31 | dependence) of errors on _new data_, especially under distributional shift. 32 | 33 | So instead of trying to reduce the dependence of training predictions, our 34 | method (local independence training, or LIT) tries to enforce independence 35 | between _changes_ in training predictions when we _locally extrapolate_ away 36 | from the data--where we define these extrapolations in terms of infinitesimal 37 | Gaussian perturbations, possibly projected down to the data manifold. What we 38 | find is that this procedure produces a qualitatively different kind of model 39 | diversity than prediction-based methods like NCL, and that in many cases, it 40 | can lead to much more diverse and accurate predictions on new data. 
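
Concretely, the diversity penalty LIT adds to the usual prediction loss is the squared
cosine similarity between pairs of models' input gradients (the gradients of each
model's predicted log-odds with respect to the inputs), as implemented by
`squared_cos_sim` in [ensembling_methods.py](./ensembling_methods.py). A minimal NumPy
sketch of the pairwise penalty -- illustrative only, since the repository computes it
on Tensorflow tensors inside the training graph, sums it over training points and
model pairs, and scales it by a hyperparameter `lambda_overlap`:

```python
import numpy as np

def lit_penalty(grads_a, grads_b, eps=1e-6):
    # grads_a, grads_b: (n_points, n_features) input gradients of two models'
    # predicted log-odds. Returns the squared cosine similarity averaged over
    # points: 0 means the models extrapolate independently around the data,
    # 1 means they extrapolate identically (up to scale).
    num = np.sum(grads_a * grads_b, axis=1) ** 2
    den = np.sum(grads_a ** 2, axis=1) * np.sum(grads_b ** 2, axis=1)
    return np.mean(num / (den + eps))
```

Driving this penalty toward zero pushes the models to change in locally orthogonal
directions around each training point, even while they agree on the training labels.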
However, 41 | the number of locally independent models we can successfully train without an 42 | accuracy tradeoff depends on the ambiguity of the dataset. 43 | 44 | ## Repository Structure 45 | 46 | - [2D-Synthetic.ipynb](./2D-Synthetic.ipynb) contains the 2D synthetic 47 | experiments that we use to demonstrate the qualitative differences between 48 | random restarts, LIT, and NCL. It's a good starting point for understanding 49 | the contribution intuitively. 50 | - [2D-Manifold.ipynb](./2D-Manifold.ipynb) demonstrates how LIT can be 51 | generalized to work with projections down to a data manifold, providing a 52 | strategy for addressing an important limitation. 53 | - [ensembling_methods.py](./ensembling_methods.py) contains implementations of 54 | LIT as well as baseline methods (NCL, bagging, AdaBoost, and amended 55 | cross-entropy), building on top of [neural_network.py](./neural_network.py) 56 | (which abstracts away some of the Tensorflow boilerplate code). 57 | - [run_experiment.py](./run_experiment.py) contains the core script that 58 | generated the main quantitative results. To fully replicate the results, see 59 | also [launch_jobs.py](./launch_jobs.py) and 60 | [aggregate_results.py](./aggregate_results.py). 61 | 62 | ## Note About Data 63 | 64 | We provide the (z-scored) benchmark classification datasets used to generate 65 | the paper in the [datasets](./datasets) directory, but we omit the ICU 66 | mortality dataset (generated using the same procedures from 67 | [Ghassemi et al. 2017](https://www.ncbi.nlm.nih.gov/pubmed/28815112)) 68 | from this public repository since it requires access to the 69 | [MIMIC-III database](https://mimic.physionet.org/). 70 | If you have already obtained access to MIMIC-III and seek to reproduce those 71 | results, please contact us using the emails provided in the paper. 72 | 73 | ## See Also 74 | 75 | A [workshop version of this paper](https://arxiv.org/abs/1806.08716) and [associated repository](https://github.com/dtak/local-independence-public), where we focus on interpretability rather than predictive performance. 76 | 77 | ## Citation 78 | 79 | You can cite this work using 80 | 81 | ``` 82 | @inproceedings{ross2020ensembles, 83 | author = {Ross, Andrew Slavin and Pan, Weiwei and Celi, Leo Anthony and Doshi-Velez, Finale}, 84 | title = {Ensembles of Locally Independent Prediction Models}, 85 | booktitle = {Thirty-Fourth AAAI Conference on Artificial Intelligence}, 86 | year = {2020}, 87 | url = {https://arxiv.org/abs/1911.01291}, 88 | } 89 | ``` 90 | -------------------------------------------------------------------------------- /aggregate_results.py: -------------------------------------------------------------------------------- 1 | import os 2 | import re 3 | import csv 4 | import glob 5 | import argparse 6 | import numpy as np 7 | import pandas as pd 8 | from collections import defaultdict, Counter 9 | 10 | """This is a script to aggregate the results of running `launch_jobs.py` into a 11 | set of CSV files that, for each dataset and splitting method, summarize 12 | mean/std/sem predictive performances and diversity metrics. 
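For example, the mushroom dataset with the purely random split produces
{result_dir}/mushroom-none.csv, containing one row per ensembling method and columns
such as ensem_auc_mu, ensem_auc_sd, and ensem_auc_se (the dataset name here is
illustrative).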
These are then used 13 | by additional code to output plots and LaTeX tables.""" 14 | 15 | parser = argparse.ArgumentParser() 16 | parser.add_argument("--result_dir", type=str) 17 | FLAGS = parser.parse_args() 18 | 19 | files = glob.glob(os.path.join(FLAGS.result_dir, '**/*.csv')) 20 | 21 | methods = ['restarts','baggings','adaboost','negcorr','amended','diverse'] 22 | 23 | def method_label(m): 24 | if '-' in m: 25 | lmb = ', $\lambda=10^{'+'{:.1f}'.format(np.log10(float(m.split('-')[-1])))+'}$' 26 | if m.startswith('diverse'): 27 | return 'LIT'+lmb 28 | elif m.startswith('negcorr'): 29 | return 'NCL'+lmb 30 | elif m.startswith('amended'): 31 | return 'ACE'+lmb 32 | else: 33 | assert(False) 34 | else: 35 | return { 36 | 'diverse': 'LIT', 37 | 'negcorr': 'NCL', 38 | 'amended': 'ACE', 39 | 'restarts': 'RRs', 40 | 'baggings': 'Bag', 41 | 'adaboost': 'Ada' }[m] 42 | 43 | def load_run_with_cross_validation(f): 44 | df = pd.read_csv(f) 45 | df['n_models'] = int(re.search('n-models-(\d+)', f).group(1)) 46 | df['reg_param'] = [float(name.split('-')[-1]) if '-' in name else np.nan for name in df.ensemble_type] 47 | for prefix in ['diverse','negcorr','amended']: 48 | rows = df[df.ensemble_type.str.startswith(prefix)] 49 | if len(rows) > 0: 50 | max_idx = rows.ensem_val_auc.idxmax() 51 | max_row = rows.loc[max_idx] 52 | max_row['ensemble_type'] = prefix 53 | df = df.append(max_row) 54 | return df 55 | 56 | def aggregate(fs): 57 | dfs = [load_run_with_cross_validation(f) for f in fs] 58 | cols = list(dfs[0].columns) 59 | cols.remove('ensemble_type') 60 | aggs = dict((c, ['mean','std','sem']) for c in cols) 61 | return pd.concat(dfs).groupby('ensemble_type').agg(aggs) 62 | 63 | def load_experiment(ds, split): 64 | return aggregate([f for f in files if (('dataset-{}'.format(ds) in f) and ('split-{}'.format(split) in f))]) 65 | 66 | for ds in ['mushroom','ionosphere','sonar','spectf','electricity','icu']: 67 | for split in ['none','norm']: 68 | if ds == 'icu' and split == 'norm': split = 'limit' 69 | print(ds, split) 70 | 71 | exp = load_experiment(ds, split) 72 | cols = ['method'] 73 | for col in exp.columns.levels[0]: 74 | cols.append(col + '_mu') 75 | cols.append(col + '_sd') 76 | cols.append(col + '_se') 77 | 78 | result_file = os.path.join(FLAGS.result_dir, '{}-{}.csv'.format(ds, split)) 79 | with open(result_file, 'w') as f: 80 | writer = csv.DictWriter(f, fieldnames=cols) 81 | writer.writeheader() 82 | rows = [] 83 | methods = [exp.iloc[i].name for i in range(len(exp))] 84 | for method in methods: 85 | row = { 'method': method_label(method) } 86 | for col in exp.columns.levels[0]: 87 | row[col + '_mu'] = exp.loc[method][col]['mean'] 88 | row[col + '_sd'] = exp.loc[method][col]['std'] 89 | row[col + '_se'] = exp.loc[method][col]['sem'] 90 | writer.writerow(row) 91 | -------------------------------------------------------------------------------- /datasets/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/dtak/lit/67885bb83adaeaae97244e16579f5796914304f3/datasets/.gitkeep -------------------------------------------------------------------------------- /datasets/electricity_inputs.npy: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/dtak/lit/67885bb83adaeaae97244e16579f5796914304f3/datasets/electricity_inputs.npy -------------------------------------------------------------------------------- /datasets/electricity_targets.npy: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/dtak/lit/67885bb83adaeaae97244e16579f5796914304f3/datasets/electricity_targets.npy -------------------------------------------------------------------------------- /datasets/ionosphere_inputs.npy: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/dtak/lit/67885bb83adaeaae97244e16579f5796914304f3/datasets/ionosphere_inputs.npy -------------------------------------------------------------------------------- /datasets/ionosphere_targets.npy: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/dtak/lit/67885bb83adaeaae97244e16579f5796914304f3/datasets/ionosphere_targets.npy -------------------------------------------------------------------------------- /datasets/mushroom_inputs.npy: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/dtak/lit/67885bb83adaeaae97244e16579f5796914304f3/datasets/mushroom_inputs.npy -------------------------------------------------------------------------------- /datasets/mushroom_targets.npy: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/dtak/lit/67885bb83adaeaae97244e16579f5796914304f3/datasets/mushroom_targets.npy -------------------------------------------------------------------------------- /datasets/sonar_inputs.npy: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/dtak/lit/67885bb83adaeaae97244e16579f5796914304f3/datasets/sonar_inputs.npy -------------------------------------------------------------------------------- /datasets/sonar_targets.npy: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/dtak/lit/67885bb83adaeaae97244e16579f5796914304f3/datasets/sonar_targets.npy -------------------------------------------------------------------------------- /datasets/spectf_inputs.npy: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/dtak/lit/67885bb83adaeaae97244e16579f5796914304f3/datasets/spectf_inputs.npy -------------------------------------------------------------------------------- /datasets/spectf_targets.npy: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/dtak/lit/67885bb83adaeaae97244e16579f5796914304f3/datasets/spectf_targets.npy -------------------------------------------------------------------------------- /ensembling_methods.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import tensorflow as tf 3 | from sklearn.ensemble import AdaBoostClassifier 4 | from utils import * 5 | 6 | def squared_cos_sim(v,w,eps=1e-6): 7 | """Tensorflow operation to compute the elementwise squared cosine 8 | similarity between two sets of vectors.""" 9 | num = tf.reduce_sum(v*w, axis=1)**2 10 | den = tf.reduce_sum(v*v, axis=1)*tf.reduce_sum(w*w, axis=1) 11 | return num / (den + eps) 12 | 13 | def train_diverse_models(cls, n, X, y, 14 | grad_quantity='binary_logit_input_gradients', 15 | lambda_overlap=0.01, **kw): 16 | """Main method implementing local independence training.""" 17 | 18 | if len(y.shape) == 1: 19 | y = onehot(y) 20 | 21 | if y.shape[1] > 2 and grad_quantity == 
'binary_logit_input_gradients': 22 | grad_quantity = 'cross_entropy_input_gradients' 23 | 24 | # Instantiate neural networks 25 | models = [cls() for _ in range(n)] 26 | 27 | # Gather their input gradients 28 | igrads = [getattr(m, grad_quantity) for m in models] 29 | 30 | # Compute the prediction loss (sum of indiv. losses) 31 | regular_loss = tf.add_n([m.loss_function(**kw) for m in models]) 32 | 33 | # Compute the diversity loss (average CosIndepErr of pairs) 34 | diverse_loss = tf.add_n([tf.reduce_sum(squared_cos_sim(igrads[i], igrads[j])) 35 | for i in range(n) 36 | for j in range(i+1, n)]) * lambda_overlap 37 | 38 | # Combine losses and train 39 | loss = regular_loss + diverse_loss 40 | ops = { 'xent': regular_loss, 'same': diverse_loss } 41 | for i, m in enumerate(models, 1): 42 | ops['acc{}'.format(i)] = m.accuracy 43 | sw = np.ones(len(X)) 44 | data = train_batches(models, X, y, sample_weight=sw, **kw) 45 | with tf.Session() as sess: 46 | minimize(sess, loss, data, operations=ops, **kw) 47 | for m in models: 48 | m.vals = [v.eval() for v in m.vars] 49 | 50 | # Return trained models 51 | return models 52 | 53 | def train_amended_xent_models(cls, n, X, y, lambda_overlap=0.01, **kw): 54 | if len(y.shape) == 1: 55 | y = onehot(y) 56 | 57 | # Instantiate models 58 | models = [cls() for _ in range(n)] 59 | 60 | # Compute the prediction loss (sum of indiv. losses) 61 | regular_loss = tf.add_n([m.loss_function(**kw) for m in models]) 62 | 63 | # Compute the diversity loss (cross-entropy between models) 64 | diverse_loss = -tf.add_n([ 65 | tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(logits=models[i].logits, labels=models[j].probs)) 66 | for i in range(n) for j in range(n) if i != j 67 | ]) * (lambda_overlap / (n*(n-1)*0.5)) 68 | 69 | # Combine losses and train 70 | loss = regular_loss + diverse_loss 71 | ops = { 'xent': regular_loss, 'same': diverse_loss } 72 | for i, m in enumerate(models, 1): 73 | ops['acc{}'.format(i)] = m.accuracy 74 | sw = np.ones(len(X)) 75 | data = train_batches(models, X, y, sample_weight=sw, **kw) 76 | with tf.Session() as sess: 77 | minimize(sess, loss, data, operations=ops, **kw) 78 | for m in models: 79 | m.vals = [v.eval() for v in m.vars] 80 | 81 | # Return trained models 82 | return models 83 | 84 | def train_restart_models(cls, n, X, y, **kw): 85 | """Fit a collection of models over random restarts.""" 86 | models = [cls() for _ in range(n)] 87 | for i in range(n): 88 | models[i].fit(X,y,**kw) 89 | return models 90 | 91 | def train_bagged_models(cls, n, X, y, **kw): 92 | """Fit a collection of models using bagging.""" 93 | models = [cls() for _ in range(n)] 94 | for i in range(n): 95 | idx = np.random.choice(np.arange(len(X)), size=len(X), replace=True).astype(int) 96 | models[i].fit(X[idx],y[idx],**kw) 97 | return models 98 | 99 | def train_neg_corr_models(cls, n, X, y, lambda_overlap=0.01, **kw): 100 | """Fit a collection of models using negative correlation learning. 
Note 101 | this uses a 0-1 MSE loss rather than cross-entropy.""" 102 | 103 | if len(y.shape) == 1: 104 | y = onehot(y) 105 | 106 | # Instantiate models 107 | models = [cls() for _ in range(n)] 108 | 109 | # Compute their mean predicted probability 110 | mean_pred = tf.add_n([m.probs for m in models]) / len(models) 111 | 112 | # Compute MSE prediction loss 113 | zero_one_losses = [tf.nn.l2_loss(m.probs-m.y) for m in models] 114 | 115 | # Compute diversity losses (difference from mean) 116 | neg_corr_losses = [-tf.nn.l2_loss(m.probs-mean_pred) for m in models] 117 | 118 | # Combine losses and train 119 | regular_loss = tf.add_n(zero_one_losses) 120 | diverse_loss = tf.add_n(neg_corr_losses) 121 | loss = regular_loss + lambda_overlap * diverse_loss 122 | ops = { 'xent': regular_loss, 'same': diverse_loss } 123 | for i, m in enumerate(models, 1): 124 | ops['acc{}'.format(i)] = m.accuracy 125 | sw = np.ones(len(X)) 126 | data = train_batches(models, X, y, sample_weight=sw, **kw) 127 | with tf.Session() as sess: 128 | minimize(sess, loss, data, operations=ops, **kw) 129 | for m in models: 130 | m.vals = [v.eval() for v in m.vars] 131 | 132 | # Return trained models 133 | return models 134 | 135 | 136 | def train_adaboost_models(cls, n, X, y, **kw): 137 | """Fit a collection of neural networks using AdaBoost.""" 138 | if len(y.shape) == 1: 139 | classes = np.array([0.,1.]) 140 | y_ = y 141 | else: 142 | classes = np.arange(y.shape[1]).astype(float) 143 | y_ = np.argmax(y, axis=1) 144 | 145 | # First, create a wrapper class that can be interepreted as a model from 146 | # within scikit-learn. 147 | class sklearn_compatible_mlp(): 148 | def __init__(self, **kwargs): 149 | self.mlp = cls() 150 | self.params_ = kwargs 151 | def get_params(self, **kwargs): return self.params_ 152 | def set_params(self, **kwargs): self.params_ = kwargs 153 | @property 154 | def classes_(self): return classes 155 | @property 156 | def n_classes_(self): return len(classes) 157 | def fit(self, X, y, sample_weight=None, **_): 158 | N = len(X) 159 | assert(y.shape == (N,)) 160 | assert(np.abs(sample_weight.sum()-1) < 0.001) 161 | self.mlp = cls() 162 | self.mlp.fit(X,y,sample_weight=sample_weight, **kw) 163 | def predict_proba(self, X, **_): 164 | return self.mlp.predict_proba(X) 165 | 166 | # Now, use scikit-learn's implementation of AdaBoost to fit the ensemble. 167 | ab = AdaBoostClassifier(base_estimator=sklearn_compatible_mlp(), n_estimators=n) 168 | ab.fit(X, y_) 169 | 170 | # Return the scikit-learn ensemble instance. 171 | return ab 172 | 173 | def train_diverse_models_w_projection(cls, n, X, y, projections, 174 | grad_quantity='binary_logit_input_gradients', 175 | lambda_overlap=0.01, **kw): 176 | """Local independence training modified to first project gradients to a 177 | lower dimensional space. 
This can be used to implement the local 178 | independence penalty over a manifold.""" 179 | 180 | if len(y.shape) == 1: 181 | y = onehot(y) 182 | 183 | # Define Tensorflow operation to project gradients 184 | D = X.shape[1] 185 | def project_to(low_dim_basis, high_dim_vectors): 186 | return tf.reduce_sum(tf.reshape(high_dim_vectors, (-1,1,D)) * low_dim_basis, axis=2) 187 | 188 | # Instantiate models and input-space gradients 189 | models = [cls() for _ in range(n)] 190 | igrads = [getattr(m, grad_quantity) for m in models] 191 | 192 | # Add a placeholder to each model for the projection matrices 193 | for m in models: 194 | m.proj = tf.placeholder(tf.float32, [None,projections.shape[1],projections.shape[2]]) 195 | 196 | # Compute prediction and diversity losses 197 | regular_loss = tf.add_n([m.loss_function(**kw) for m in models]) 198 | diverse_loss = tf.add_n([tf.reduce_sum(squared_cos_sim( 199 | project_to(models[i].proj, igrads[i]), 200 | project_to(models[j].proj, igrads[j]))) 201 | for i in range(n) 202 | for j in range(i+1, n)]) * lambda_overlap 203 | loss = regular_loss + diverse_loss 204 | 205 | # Train ensemble, passing in the additional projection matrices 206 | ops = { 'xent': regular_loss, 'same': diverse_loss } 207 | for i, m in enumerate(models, 1): 208 | ops['acc{}'.format(i)] = m.accuracy 209 | sw = np.ones(len(X)) 210 | data = train_batches(models, X, y, sample_weight=sw, proj=projections, **kw) 211 | with tf.Session() as sess: 212 | minimize(sess, loss, data, operations=ops, **kw) 213 | for m in models: 214 | m.vals = [v.eval() for v in m.vars] 215 | 216 | # Return trained models 217 | return models 218 | -------------------------------------------------------------------------------- /launch_jobs.py: -------------------------------------------------------------------------------- 1 | import os 2 | import argparse 3 | import numpy as np 4 | 5 | """This is a script to repeatedly launch experiments (i.e. invoke 6 | `run_experiment.py`) and generate the full set of results from the paper. 
7 | 8 | NOTE: Some of this code is specific to Harvard Odyssey, and would need to be 9 | rewritten to work on different research clusters.""" 10 | 11 | parser = argparse.ArgumentParser() 12 | parser.add_argument("--base_dir", type=str) 13 | parser.add_argument("--conda_env", type=str) 14 | parser.add_argument("--partition", type=str) 15 | parser.add_argument("--mem_limit", type=str, default='20000') 16 | FLAGS = parser.parse_args() 17 | 18 | slurm_template = """#!/bin/bash 19 | #SBATCH --mem={mem_limit} 20 | #SBATCH -t {time_limit} 21 | #SBATCH -p {partition} 22 | #SBATCH -o {out_file} 23 | #SBATCH -e {err_file} 24 | 25 | module load Anaconda3/5.0.1-fasrc01 26 | source activate {conda_env} 27 | {job_command} 28 | """ 29 | 30 | def launch_job(restart, dataset, n_models, split, time_limit=None, mem_limit=None): 31 | if time_limit is None: time_limit = '0-{0:02d}:00'.format(min(24, n_models*2)) 32 | if mem_limit is None: mem_limit = FLAGS.mem_limit 33 | 34 | save_dir = 'restart-{}__dataset-{}__n-models-{}__split-{}/'.format(restart+1, dataset, n_models, split) 35 | save_dir = os.path.join(FLAGS.base_dir, save_dir) 36 | out_file = os.path.join(save_dir, 'job-%j.out') 37 | err_file = os.path.join(save_dir, 'job-%j.err') 38 | slurm_file = os.path.join(save_dir, 'job.slurm') 39 | os.system('mkdir -p {}'.format(save_dir)) 40 | 41 | job_command = "python -u run_experiment.py --save_dir={} --n_models={} --dataset={} --split={}".format(save_dir, n_models, dataset, split) 42 | slurm_command = slurm_template.format( 43 | job_command=job_command, 44 | time_limit=time_limit, 45 | mem_limit=mem_limit, 46 | partition=FLAGS.partition, 47 | conda_env=FLAGS.conda_env, 48 | out_file=out_file, 49 | err_file=err_file) 50 | with open(slurm_file, "w") as f: f.write(slurm_command) 51 | os.system("cat {} | sbatch".format(slurm_file)) 52 | 53 | datasets = ['covertype', 'ionosphere', 'sonar', 'spectf', 'mushroom', 'electricity'] 54 | datasets += ['icu'] # available on request if you have access to MIMIC-III. 55 | 56 | for restart in range(10): 57 | for n_models in [2,3,5,8,13]: 58 | for dataset in datasets: 59 | splits = ['none', 'limit'] if 'icu' in dataset else ['none', 'norm'] 60 | for split in splits: 61 | launch_job(restart, dataset, n_models, split) 62 | -------------------------------------------------------------------------------- /neural_network.py: -------------------------------------------------------------------------------- 1 | import uuid 2 | import numpy as np 3 | import tensorflow as tf 4 | import six.moves.cPickle as pickle 5 | from six import add_metaclass 6 | from abc import ABCMeta, abstractmethod, abstractproperty 7 | from utils import * 8 | 9 | """ 10 | Object-oriented class for handling neural networks implemented 11 | in Tensorflow. 
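
Concrete networks subclass NeuralNetwork and define x_shape, y_shape, and
rebuild_model (whose returned list must end with the logits tensor). A minimal
sketch, mirroring the Net class in run_experiment.py (layer sizes and names here
are illustrative):

    class MLP(NeuralNetwork):
        @property
        def x_shape(self): return [None, 30]  # [batch, n_features]
        @property
        def y_shape(self): return [None, 2]   # [batch, n_classes]
        def rebuild_model(self, X, **_):
            H = tf.layers.dense(X, 256, name=self.name+'/H', activation=tf.nn.relu)
            return [H, tf.layers.dense(H, 2, name=self.name+'/out', activation=None)]

    model = MLP()
    model.fit(X_train, y_train, num_epochs=10)  # X_train, y_train: numpy arrays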
12 | """ 13 | @add_metaclass(ABCMeta) 14 | class NeuralNetwork(): 15 | def __init__(self, name=None, dtype=tf.float32, **kwargs): 16 | self.vals = None # Holds the trained weights of the network 17 | self.name = (name or str(uuid.uuid4())) # Tensorflow variable scope 18 | self.dtype = dtype 19 | self.setup_model(**kwargs) 20 | assert(hasattr(self, 'X')) 21 | assert(hasattr(self, 'y')) 22 | assert(hasattr(self, 'logits')) 23 | 24 | def setup_model(self, X=None, y=None, **kw): 25 | """Defines common placeholders, then calls rebuild_model""" 26 | with tf.name_scope(self.name): 27 | self.X = tf.placeholder(self.dtype, self.x_shape, name="X") if X is None else X 28 | self.y = tf.placeholder(self.dtype, self.y_shape, name="y") if y is None else y 29 | self.sample_weight = tf.placeholder(self.dtype, [None], name="sample_weight") 30 | self.is_train = tf.placeholder_with_default( 31 | tf.constant(False, dtype=tf.bool), shape=(), name="is_train") 32 | self.model = self.rebuild_model(self.X, **kw) 33 | self.recompute_vars() 34 | 35 | def rebuild_model(self, X, reuse=None, **kw): 36 | """Override this in subclasses. Define Tensorflow operations and return 37 | list whose last entry is logits.""" 38 | 39 | @property 40 | def logits(self): 41 | return self.model[-1] 42 | 43 | @abstractproperty 44 | def x_shape(self): 45 | """Specify the shape of X; for MNIST, this could be [None, 784]""" 46 | 47 | @abstractproperty 48 | def y_shape(self): 49 | """Specify the shape of y; for MNIST, this would be [None, 10]""" 50 | 51 | @property 52 | def num_features(self): 53 | return np.product(self.x_shape[1:]) 54 | 55 | @property 56 | def num_classes(self): 57 | return np.product(self.y_shape[1:]) 58 | 59 | @property 60 | def trainable_vars(self): 61 | """Return this model's trainable variables""" 62 | return [v for v in tf.trainable_variables() if v in self.vars] 63 | 64 | def input_grad(self, f): 65 | """Helper to take input gradients""" 66 | return tf.gradients(f, self.X)[0] 67 | 68 | def cross_entropy_with(self, y): 69 | """Compute sample-weighted cross-entropy classification loss""" 70 | w = self.sample_weight / tf.reduce_sum(self.sample_weight) 71 | return tf.reduce_sum(tf.nn.softmax_cross_entropy_with_logits_v2(logits=self.logits, labels=y) * w) 72 | 73 | @cachedproperty 74 | def preds(self): 75 | """Tensorflow operation to return predicted labels.""" 76 | return tf.argmax(self.logits, axis=1) 77 | 78 | @cachedproperty 79 | def probs(self): 80 | """Tensorflow operation to return predicted label probabilities.""" 81 | return tf.nn.softmax(self.logits) 82 | 83 | @cachedproperty 84 | def logps(self): 85 | """Tensorflow operation to return predicted label log-probabilities.""" 86 | return self.logits - tf.reduce_logsumexp(self.logits, 1, keepdims=True) 87 | 88 | @cachedproperty 89 | def grad_sum_logps(self): 90 | """Tensorflow operation returning gradient of the sum of log-probabilities. 
91 | Can be used as the gradient for LIT for multi-class classification (doesn't 92 | require labels).""" 93 | return self.input_grad(self.logps) 94 | 95 | @cachedproperty 96 | def l2_weights(self): 97 | """Tensorflow operation returning sum of squared weight values""" 98 | return tf.add_n([l2_loss(v) for v in self.trainable_vars]) 99 | 100 | @cachedproperty 101 | def cross_entropy(self): 102 | """Tensorflow operation returning classification cross-entropy""" 103 | return self.cross_entropy_with(self.y) 104 | 105 | @cachedproperty 106 | def cross_entropy_input_gradients(self): 107 | """Tensorflow operation returning gradient of the loss. Can be used as the 108 | gradient for LIT for multi-class classification but does require labels.""" 109 | return self.input_grad(self.cross_entropy) 110 | 111 | @cachedproperty 112 | def predicted_logit_input_gradients(self): 113 | """Tensorflow operation returning gradient of the predicted log-odds. 114 | Mostly useful for visualization rather than training.""" 115 | return self.input_grad(self.logits * self.y) 116 | 117 | @cachedproperty 118 | def binary_logits(self): 119 | """Tensorflow operation returning the actual predicted log-odds (binary 120 | only).""" 121 | assert(self.num_classes == 2) 122 | return self.logps[:,1] - self.logps[:,0] 123 | 124 | @cachedproperty 125 | def binary_logit_input_gradients(self): 126 | """Tensorflow operation returning gradient of the predicted binary 127 | log-odds. This is what we use for LIT in binary classification.""" 128 | return self.input_grad(self.binary_logits) 129 | 130 | @cachedproperty 131 | def accuracy(self): 132 | """Tensorflow operation returning classification accuracy.""" 133 | return tf.reduce_mean(tf.cast(tf.equal(self.preds, tf.argmax(self.y, 1)), dtype=tf.float32)) 134 | 135 | def score(self, X, y, **kw): 136 | """Compute classification accuracy for numpy inputs and labels.""" 137 | if len(y.shape) == 2: 138 | return np.mean(self.predict(X, **kw) == np.argmax(y, 1)) 139 | else: 140 | return np.mean(self.predict(X, **kw) == y) 141 | 142 | def predict(self, X, **kw): 143 | """Compute predictions for numpy inputs.""" 144 | with tf.Session() as sess: 145 | self.init(sess) 146 | return self.batch_eval(sess, self.preds, X, **kw) 147 | 148 | def predict_logits(self, X, **kw): 149 | """Compute raw logits for numpy inputs.""" 150 | with tf.Session() as sess: 151 | self.init(sess) 152 | return self.batch_eval(sess, self.logits, X, **kw) 153 | 154 | def predict_binary_logodds(self, X, **kw): 155 | """Compute predicted binary log-odds for numpy inputs.""" 156 | with tf.Session() as sess: 157 | self.init(sess) 158 | return self.batch_eval(sess, self.binary_logits, X, **kw) 159 | 160 | def predict_proba(self, X, **kw): 161 | """Compute predicted probabilities for numpy inputs.""" 162 | with tf.Session() as sess: 163 | self.init(sess) 164 | return self.batch_eval(sess, self.probs, X, **kw) 165 | 166 | def batch_eval(self, sess, quantity, X, n=256): 167 | """Internal helper to batch computations (prevents memory issues)""" 168 | vals = sess.run(quantity, feed_dict={ self.X: X[:n] }) 169 | stack = np.vstack if len(vals.shape) > 1 else np.hstack 170 | for i in range(n, len(X), n): 171 | vals = stack((vals, sess.run(quantity, feed_dict={ self.X: X[i:i+n] }))) 172 | return vals 173 | 174 | def input_gradients(self, X, y=None, n=256, **kw): 175 | """Computes different kinds of input gradients for inputs (and optionally 176 | labels). 
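For example, run_experiment.py calls this on the test inputs (passing the
`logits` keyword) to collect the per-model gradients used in its reported
diversity metrics.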
See input_gradients_ for details.""" 177 | with tf.Session() as sess: 178 | self.init(sess) 179 | return self.batch_input_gradients_(sess, X, y, n, **kw) 180 | 181 | def batch_input_gradients_(self, sess, X, y=None, n=256, **kw): 182 | yy = y[:n] if y is not None and not isint(y) else y 183 | grads = self.input_gradients_(sess, X[:n], yy, **kw) 184 | for i in range(n, len(X), n): 185 | yy = y[i:i+n] if y is not None and not isint(y) else y 186 | grads = np.vstack((grads, 187 | self.input_gradients_(sess, X[i:i+n], yy, **kw))) 188 | return grads 189 | 190 | def input_gradients_(self, sess, X, y=None, logits=False, quantity=None): 191 | if quantity is not None: 192 | return sess.run(quantity, feed_dict={ self.X: X }) 193 | if y is None: 194 | return sess.run(self.grad_sum_logps, feed_dict={ self.X: X }) 195 | elif logits and self.num_classes == 2: 196 | return sess.run(self.binary_logit_input_gradients, feed_dict={ self.X: X }) 197 | elif isint(y): 198 | y = onehot(np.array([y]*len(X)), self.num_classes) 199 | feed = { self.X: X, self.y: y, self.sample_weight: np.ones(len(X)) } 200 | if logits: 201 | return sess.run(self.predicted_logit_input_gradients, feed_dict=feed) 202 | else: 203 | return sess.run(self.cross_entropy_input_gradients, feed_dict=feed) 204 | 205 | def loss_function(self, l2_weights=0., **kw): 206 | """Construct the loss function Tensorflow op given hyperparameters.""" 207 | log_likelihood = self.cross_entropy 208 | if l2_weights > 0: 209 | log_prior = l2_weights * self.l2_weights 210 | else: 211 | log_prior = 0 212 | return log_likelihood + log_prior 213 | 214 | def fit(self, X, y, loss_fn=None, init=False, sample_weight=None, **kw): 215 | """Fit the neural network on the particular dataset.""" 216 | if loss_fn is None: 217 | loss_fn = self.loss_function(**kw) 218 | if len(y.shape) == 1: 219 | y = onehot(y, self.num_classes) 220 | if sample_weight is None: 221 | sample_weight = np.ones(len(X)) 222 | ops = { 'xent': self.cross_entropy, 'loss': loss_fn, 'accu': self.accuracy } 223 | batches = train_batches([self], X, y, sample_weight=sample_weight, **kw) 224 | with tf.Session() as sess: 225 | if init: self.init(sess) 226 | minimize(sess, loss_fn, batches, ops, **kw) 227 | self.vals = [v.eval() for v in self.vars] 228 | 229 | def recompute_vars(self): 230 | """Determine which Tensorflow variables are associated with this 231 | network.""" 232 | self.vars = tf.get_default_graph().get_collection( 233 | tf.GraphKeys.GLOBAL_VARIABLES, scope=self.name) 234 | 235 | def init(self, sess): 236 | """Prepare this network to be used in a Tensorflow session.""" 237 | if self.vals is None: 238 | sess.run(tf.global_variables_initializer()) 239 | else: 240 | for var, val in zip(self.vars, self.vals): 241 | sess.run(var.assign(val)) 242 | 243 | def save(self, filename): 244 | """Save the weights of the network to a pickle file.""" 245 | with open(filename, 'wb') as f: 246 | pickle.dump(self.vals, f) 247 | 248 | def load(self, filename): 249 | """Load the weights of the network from a pickle file.""" 250 | with open(filename, 'rb') as f: 251 | self.vals = pickle.load(f) 252 | -------------------------------------------------------------------------------- /run_experiment.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import numpy as np 3 | import tensorflow as tf 4 | from scipy.stats import pearsonr 5 | from utils import * 6 | from neural_network import * 7 | from ensembling_methods import * 8 | 9 | parser = argparse.ArgumentParser() 
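# Command-line flags (semantics inferred from the code below): save_dir is the
# directory (with trailing slash) where the results CSV and model pickles are
# written, n_models is the ensemble size, dataset must match a
# datasets/<name>_inputs.npy / <name>_targets.npy pair, and split selects the
# train/test splitting strategy ('none', 'norm', or 'limit'); launch_jobs.py
# shows how these flags are combined into full invocations.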
10 | parser.add_argument("--save_dir", type=str) 11 | parser.add_argument("--n_models", type=int) 12 | parser.add_argument("--dataset", type=str) 13 | parser.add_argument("--split", type=str, default='none') 14 | FLAGS = parser.parse_args() 15 | 16 | save_dir = FLAGS.save_dir 17 | n_models = FLAGS.n_models 18 | dataset = FLAGS.dataset 19 | split = FLAGS.split 20 | 21 | # Load the dataset 22 | X = np.load('datasets/{}_inputs.npy'.format(dataset)) 23 | y = np.load('datasets/{}_targets.npy'.format(dataset)) 24 | 25 | # Decide how to split it 26 | if split == 'norm': 27 | # If splitting by "norm", split train and test by distance from origin, 28 | # but subsplit train and validation randomly 29 | norms = np.linalg.norm(X, axis=1) 30 | midpt = np.median(norms) 31 | X_test = X[np.argwhere(norms > midpt)[:,0]] 32 | y_test = y[np.argwhere(norms > midpt)[:,0]] 33 | X_train = X[np.argwhere(norms <= midpt)[:,0]] 34 | y_train = y[np.argwhere(norms <= midpt)[:,0]] 35 | X_test, X_val, y_test, y_val = tt_split(X_test, y_test) 36 | else: 37 | # Otherwise, split train/test/val completely randomly 38 | X_train, X_test, y_train, y_test = tt_split(X, y) 39 | X_train, X_val, y_train, y_val = tt_split(X_train, y_train) 40 | 41 | if split == 'limit': 42 | # If splitting by "limit", use a smaller training set 43 | X_train = X_train[:1000] 44 | y_train = y_train[:1000] 45 | 46 | # Define helpers for printing performance metrics to a csv. 47 | cols = ( 48 | ['ensemble_type', 'ensem_val_auc'] 49 | + ['indiv_auc_avg', 'indiv_auc_std', 'ensem_auc', 'indiv_auc_max', 'indiv_auc_min'] 50 | + ['indiv_acc_avg', 'indiv_acc_std', 'ensem_acc', 'indiv_acc_max', 'indiv_acc_min'] 51 | + ['q_stat', 'interrater', 'err_corr', 'grad_cos2'] 52 | ) 53 | 54 | def write_row(row, mode='a+'): 55 | csv = open(save_dir + 'auc_results.csv', mode) 56 | csv.write(','.join(row) + '\n') 57 | csv.close() 58 | 59 | write_row(cols, mode='w') 60 | 61 | # Define main helper for evaluating models and saving results 62 | def save_models(models, name, moe_auc=None, moe_val=None, moe_acc=None): 63 | row = {'ensemble_type': name} 64 | print(name) 65 | test_preds = [] 66 | val_preds = [] 67 | accs = [] 68 | aucs = [] 69 | grads = [] 70 | 71 | # For each model, save its parameters, compute its individual AUC, and 72 | # compile its predictions 73 | for i, m in enumerate(models): 74 | m.save('{}{}_model{}.pkl'.format(save_dir, name, i)) 75 | testp = m.predict_proba(X_test) 76 | valp = m.predict_proba(X_val) 77 | auc = scoring_fun(testp, y_test) 78 | acc = accuracy_fun(testp, y_test) 79 | print(' Model #{} AUC: {:.4f}, acc: {:.4f}'.format(i+1,auc,acc)) 80 | aucs.append(auc) 81 | accs.append(acc) 82 | test_preds.append(testp) 83 | val_preds.append(valp) 84 | grads.append(m.input_gradients(X_test, logits=(y_test.max() == 1))) 85 | 86 | # Save max, min, mean, and standard deviation of individual model AUC 87 | print(' Indiv AUC max: {:.4f}'.format(np.max(aucs))) 88 | print(' Indiv AUC min: {:.4f}'.format(np.min(aucs))) 89 | print(' Indiv AUC mu: {:.4f}'.format(np.mean(aucs))) 90 | print(' Indiv AUC sd: {:.4f}'.format(np.std(aucs))) 91 | 92 | row['indiv_auc_max'] = '{:.6f}'.format(np.max(aucs)) 93 | row['indiv_auc_min'] = '{:.6f}'.format(np.min(aucs)) 94 | row['indiv_auc_avg'] = '{:.6f}'.format(np.mean(aucs)) 95 | row['indiv_auc_std'] = '{:.6f}'.format(np.std(aucs)) 96 | 97 | row['indiv_acc_max'] = '{:.6f}'.format(np.max(accs)) 98 | row['indiv_acc_min'] = '{:.6f}'.format(np.min(accs)) 99 | row['indiv_acc_avg'] = '{:.6f}'.format(np.mean(accs)) 100 | 
row['indiv_acc_std'] = '{:.6f}'.format(np.std(accs)) 101 | 102 | # Compute AUC of average prediction 103 | val_preds = np.array(val_preds) 104 | test_preds = np.array(test_preds) 105 | grads = np.array(grads) 106 | avg_auc = scoring_fun(test_preds.mean(axis=0), y_test) 107 | avg_acc = accuracy_fun(test_preds.mean(axis=0), y_test) 108 | print(' Ens. Avg AUC: {:.4f}, acc: {:.4f}'.format(avg_auc, avg_acc)) 109 | 110 | avg_auc_val = scoring_fun(val_preds.mean(axis=0), y_val) 111 | 112 | # Report it (unless it's adaboost and we've been passed its weighted 113 | # predictions) 114 | if moe_auc is None: 115 | row['ensem_auc'] = '{:.6f}'.format(avg_auc) 116 | row['ensem_acc'] = '{:.6f}'.format(avg_acc) 117 | row['ensem_val_auc'] = '{:.6f}'.format(avg_auc_val) 118 | else: 119 | row['ensem_auc'] = '{:.6f}'.format(moe_auc) 120 | row['ensem_acc'] = '{:.6f}'.format(moe_acc) 121 | row['ensem_val_auc'] = '{:.6f}'.format(moe_val) 122 | 123 | # If the ensemble had more than one model (i.e. if it wasn't AdaBoost 124 | # terminating early), then compute standard diversity measures (+ ours). 125 | if len(models) > 1: 126 | # First determine where each model erred 127 | ens_errors = [error_masks(preds, y_test) for preds in test_preds] 128 | ens_errsets = [set(np.argwhere(error)[:,0]) for error in ens_errors] 129 | 130 | # Compute the error correlation rho_avg 131 | error_corr = np.mean([ 132 | pearsonr(err1, err2)[0] 133 | for i,err1 in enumerate(ens_errors) 134 | for err2 in ens_errors[i+1:]]) 135 | 136 | # Compute the q-statistic 137 | q_stat = np.mean([ 138 | yules_q_statistic(e1,e2,y_test) 139 | for i,e1 in enumerate(ens_errsets) 140 | for e2 in ens_errsets[i+1:]]) 141 | 142 | # Compute the interrater agreement (See Eq. 16 of Kuncheva & Whitaker 2003) 143 | Dis_av = np.mean([ 144 | disagreement_measure(e1,e2,y_test) 145 | for i,e1 in enumerate(ens_errsets) 146 | for e2 in ens_errsets[i+1:]]) 147 | avg_acc = np.mean(accs) 148 | try: 149 | kappa = 1 - Dis_av / (2 * avg_acc * (1-avg_acc)) 150 | except ZeroDivisionError: 151 | kappa = np.nan 152 | 153 | # Compute the value of the LIT penalty 154 | gradcos2 = np.mean([elemwise_sq_cos_sim(g1, g2) 155 | for i,g1 in enumerate(grads) 156 | for g2 in grads[i+1:]]) 157 | 158 | print(' Q. statistic: {:.4f}'.format(q_stat)) 159 | print(' Interrater agg: {:.4f}'.format(kappa)) 160 | print(' Err. correl.: {:.4f}'.format(error_corr)) 161 | print(' Av grad cos2: {:.4f}'.format(gradcos2)) 162 | 163 | row['q_stat'] = '{:.6f}'.format(q_stat) 164 | row['interrater'] = '{:.6f}'.format(kappa) 165 | row['err_corr'] = '{:.6f}'.format(error_corr) 166 | row['grad_cos2'] = '{:.6f}'.format(gradcos2) 167 | else: 168 | row['q_stat'] = 'nan' 169 | row['interrater'] = 'nan' 170 | row['err_corr'] = 'nan' 171 | row['grad_cos2'] = 'nan' 172 | 173 | assert(set(row.keys()) == set(cols)) 174 | 175 | # Print everything to the CSV. 
176 | write_row([row[k] for k in cols]) 177 | 178 | # Define the neural network architecture - 256-unit hidden layer w/ dropout 179 | if len(y_train.shape) == 1: 180 | y_shape = 2 181 | else: 182 | y_shape = y_train.shape[1] 183 | class Net(NeuralNetwork): 184 | @property 185 | def x_shape(self): return [None, X_train.shape[1]] 186 | @property 187 | def y_shape(self): return [None, y_shape] 188 | def rebuild_model(self, X, **_): 189 | L0 = X 190 | L1 = tf.layers.dense(L0, 256, name=self.name+'/L1', activation=tf.nn.relu) 191 | L1 = tf.layers.dropout(L1, training=self.is_train) 192 | L2 = tf.layers.dense(L1, y_shape, name=self.name+'/L2', activation=None) 193 | return [L1, L2] 194 | 195 | # Set up training parameters -- we'll use 0.0001 weight decay and train for the 196 | # minimum epochs to run for 5000 iterations. 197 | num_epochs = int(np.ceil(np.ceil((5000*128) / float(len(X_train))))) 198 | train_args = [Net, n_models, X_train, y_train] 199 | train_kwargs = { 200 | 'num_epochs': num_epochs, 201 | 'l2_weights': 0.0001, 202 | 'print_every': 100 203 | } 204 | 205 | # Train random restarts 206 | tf.reset_default_graph() 207 | save_models(train_restart_models(*train_args, **train_kwargs), 'restarts') 208 | 209 | # Train bagging 210 | tf.reset_default_graph() 211 | save_models(train_bagged_models(*train_args, **train_kwargs), 'baggings') 212 | 213 | # Train adaboost (using scikit-learn's default implementation) 214 | tf.reset_default_graph() 215 | adaboost = train_adaboost_models(*train_args, **train_kwargs) 216 | adaboost_models = [e.mlp for e in adaboost.estimators_] 217 | save_models(adaboost_models, 'adaboost', 218 | moe_auc=scoring_fun(adaboost.predict_proba(X_test), y_test), 219 | moe_val=scoring_fun(adaboost.predict_proba(X_val), y_val), 220 | moe_acc=accuracy_fun(adaboost.predict_proba(X_test), y_test)) 221 | 222 | for penalty in np.logspace(-4, 1, 16): 223 | # Run LIT 224 | tf.reset_default_graph() 225 | save_models( 226 | train_diverse_models(*train_args, lambda_overlap=penalty, **train_kwargs), 227 | 'diverse-{:.4f}'.format(penalty)) 228 | 229 | # Run NCL 230 | tf.reset_default_graph() 231 | save_models( 232 | train_neg_corr_models(*train_args, lambda_overlap=penalty, **train_kwargs), 233 | 'negcorr-{:.4f}'.format(penalty)) 234 | 235 | # Run ACE 236 | tf.reset_default_graph() 237 | save_models( 238 | train_amended_xent_models(*train_args, lambda_overlap=penalty, **train_kwargs), 239 | 'amended-{:.4f}'.format(penalty)) 240 | -------------------------------------------------------------------------------- /utils.py: -------------------------------------------------------------------------------- 1 | from __future__ import print_function 2 | import six 3 | import time 4 | import numpy as np 5 | import tensorflow as tf 6 | from sklearn.metrics import roc_auc_score, accuracy_score 7 | from sklearn.model_selection import train_test_split 8 | 9 | def l1_loss(x): 10 | return tf.reduce_sum(tf.abs(x)) 11 | 12 | def l2_loss(x): 13 | return tf.nn.l2_loss(x) 14 | 15 | class cachedproperty(object): 16 | """Simplified https://github.com/pydanny/cached-property""" 17 | def __init__(self, function): 18 | self.__doc__ = getattr(function, '__doc__') 19 | self.function = function 20 | 21 | def __get__(self, instance, klass): 22 | if instance is None: return self 23 | value = instance.__dict__[self.function.__name__] = self.function(instance) 24 | return value 25 | 26 | def isint(x): 27 | return isinstance(x, (int, np.int32, np.int64)) 28 | 29 | def onehot(Y, K=None): 30 | if K is None: 31 | K = 
np.unique(Y) 32 | elif isint(K): 33 | K = list(range(K)) 34 | data = np.array([[y == k for k in K] for y in Y]).astype(int) 35 | return data 36 | 37 | def minibatch_indexes(lenX, batch_size=256, num_epochs=50, **kw): 38 | n = int(np.ceil(lenX / batch_size)) 39 | for epoch in range(num_epochs): 40 | for batch in range(n): 41 | i = epoch*n + batch 42 | yield i, epoch, slice((i%n)*batch_size, ((i%n)+1)*batch_size) 43 | 44 | def train_feed(idx, models, **kw): 45 | """Convert a set of models, a set of indexes, and numpy arrays given by the 46 | keyword arguments to a set of feed dictionaries for each model.""" 47 | feed = {} 48 | for m in models: 49 | feed[m.is_train] = True 50 | for dictionary in [kw, kw.get('feed_dict', {})]: 51 | for key, val in six.iteritems(dictionary): 52 | attr = getattr(m, key) if isinstance(key, str) and hasattr(m, key) else key 53 | if type(attr) == type(m.X): 54 | if len(attr.shape) >= 1: 55 | if attr.shape[0].value is None: 56 | feed[attr] = val[idx] 57 | return feed 58 | 59 | def train_batches(models, X, y, **kw): 60 | for i, epoch, idx in minibatch_indexes(len(X), **kw): 61 | yield i, epoch, train_feed(idx, models, X=X, y=y, **kw) 62 | 63 | def reinitialize_variables(sess): 64 | """Construct a Tensorflow operation to initialize any variables in its graph 65 | which are not already initialized.""" 66 | uninitialized_vars = [] 67 | for var in tf.global_variables(): 68 | try: 69 | sess.run(var) 70 | except tf.errors.FailedPreconditionError: 71 | uninitialized_vars.append(var) 72 | return tf.variables_initializer(uninitialized_vars) 73 | 74 | def minimize(sess, loss_fn, batches, operations={}, learning_rate=0.001, print_every=None, var_list=None, **kw): 75 | """Minimize a loss function over the provided batches of data, possibly 76 | printing progress.""" 77 | optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate) 78 | train_op = optimizer.minimize(loss_fn, var_list=var_list) 79 | op_keys = sorted(list(operations.keys())) 80 | ops = [train_op] + [operations[k] for k in op_keys] 81 | t = time.time() 82 | sess.run(reinitialize_variables(sess)) 83 | for i, epoch, batch in batches: 84 | results = sess.run(ops, feed_dict=batch) 85 | if print_every and i % print_every == 0: 86 | s = 'Batch {}, epoch {}, time {:.1f}s'.format(i, epoch, time.time() - t) 87 | for j,k in enumerate(op_keys, 1): 88 | s += ', {} {:.4f}'.format(k, results[j]) 89 | print(s) 90 | 91 | def tt_split(X, y, test_size=0.2): 92 | return train_test_split(X, y, test_size=test_size, stratify=y) 93 | 94 | def elemwise_sq_cos_sim(v, w, eps=1e-8): 95 | assert(len(v.shape) == 2) 96 | assert(len(w.shape) == 2) 97 | num = np.sum(v*w, axis=1)**2 98 | den = np.sum(v*v, axis=1) * np.sum(w*w, axis=1) 99 | return num / (den + eps) 100 | 101 | def yules_q_statistic(e1, e2, y_test): 102 | n = len(y_test) 103 | n00 = len(e1.intersection(e2)) 104 | n01 = len(e1.difference(e2)) 105 | n10 = len(e2.difference(e1)) 106 | n11 = n - len(e1.union(e2)) 107 | assert(n00+n01+n10+n11 == n) 108 | numer = n11*n00 - n01*n10 109 | denom = n11*n00 + n01*n10 110 | if numer == 0: 111 | return 0 112 | else: 113 | return numer / float(denom) 114 | 115 | def disagreement_measure(e1, e2, y_test): 116 | n = len(y_test) 117 | n01 = len(e1.difference(e2)) 118 | n10 = len(e2.difference(e1)) 119 | return (n01 + n10) / n 120 | 121 | def scoring_fun(y_pred, y_true): 122 | if len(y_true.shape) == 1: 123 | assert(y_true.max() == 1) # binary 124 | if len(y_pred.shape) == 1: 125 | preds = y_pred 126 | else: 127 | preds = y_pred[:,1] 128 | return 
roc_auc_score(y_true, preds) 129 | else: 130 | return accuracy_fun(y_pred, y_true) 131 | 132 | def accuracy_fun(y_pred, y_true): 133 | if len(y_true.shape) == 1: 134 | assert(y_true.max() == 1) # binary 135 | if len(y_pred.shape) == 1: 136 | preds = (y_pred > 0.5).astype(int) 137 | else: 138 | preds = np.argmax(y_pred, axis=1) 139 | return np.mean(y_true == preds) 140 | else: 141 | return np.mean(np.argmax(y_true, axis=1) == np.argmax(y_pred, axis=1)) 142 | 143 | def error_masks(y_pred, y_true): 144 | if len(y_true.shape) == 1: 145 | assert(y_true.max() == 1) # binary 146 | if len(y_pred.shape) == 1: 147 | preds = (y_pred > 0.5).astype(int) 148 | else: 149 | preds = np.argmax(y_pred, axis=1) 150 | return (preds != y_true).astype(int) 151 | else: 152 | return (np.argmax(y_true, axis=1) != np.argmax(y_pred, axis=1)).astype(int) 153 | --------------------------------------------------------------------------------