├── .gitignore ├── 2D-Manifold.ipynb ├── 2D-Synthetic.ipynb ├── README.md ├── aggregate_results.py ├── datasets ├── .gitkeep ├── electricity_inputs.npy ├── electricity_targets.npy ├── ionosphere_inputs.npy ├── ionosphere_targets.npy ├── mushroom_inputs.npy ├── mushroom_targets.npy ├── sonar_inputs.npy ├── sonar_targets.npy ├── spectf_inputs.npy └── spectf_targets.npy ├── ensembling_methods.py ├── launch_jobs.py ├── neural_network.py ├── run_experiment.py └── utils.py /.gitignore: -------------------------------------------------------------------------------- 1 | .ipynb_checkpoints/ 2 | __pycache__/ 3 | *.DS_Store 4 | *.pyc 5 | *.aux 6 | *.bbl 7 | *.blg 8 | *.out 9 | *.log 10 | *.bst 11 | datasets/icu*.npy 12 | results/ 13 | writeup/ 14 | 15 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Ensembles of Locally Independent Prediction Models 2 | 3 | This repository contains code used to generate the results in 4 | [Ensembles of Locally Independent Prediction Models](https://arxiv.org/abs/1911.01291). 5 | 6 | 7 | ## Main Idea 8 | 9 | Ensembling is a subfield of machine learning that studies how predictions from 10 | multiple models, all trained to solve the same task, can be combined to improve 11 | performance. One intuition behind ensembling is that there is "wisdom in the 12 | crowds" -- that individual models may be fallible, but that many together may 13 | be more correct. However, this intuition rests on the assumption that the 14 | models are unbiased (i.e. make errors independent of the data) and that they 15 | aren't all wrong in the same ways (i.e. make errors independent of each other). 16 | 17 | This second property--the property of making independent errors on new data--is 18 | often called the "diversity" of the ensemble. Although it cannot be measured 19 | directly without access to that new data, many ensembling methods still try to 20 | encourage diversity by optimizing proxies that _can_ be evaluated on training 21 | data. One example is 22 | [negative correlation learning](https://ieeexplore.ieee.org/document/809027) (NCL), 23 | which penalizes models for making correlated predictions on the training set, 24 | even as it encourages them to still make correct predictions. 25 | 26 | However, these two goals are clearly at odds. On the training set, which we 27 | assume to be "true" in some way, we really do want all of our models to make 28 | correct predictions. If that happens their predictions will of course be 29 | correlated, but the correlation of training set predictions shouldn't 30 | necessarily imply anything about the correlation (or more generally statistical 31 | dependence) of errors on _new data_, especially under distributional shift. 32 | 33 | So instead of trying to reduce the dependence of training predictions, our 34 | method (local independence training, or LIT) tries to enforce independence 35 | between _changes_ in training predictions when we _locally extrapolate_ away 36 | from the data--where we define these extrapolations in terms of infinitesimal 37 | Gaussian perturbations, possibly projected down to the data manifold. What we 38 | find is that this procedure produces a qualitatively different kind of model 39 | diversity than prediction-based methods like NCL, and that in many cases, it 40 | can lead to much more diverse and accurate predictions on new data. 
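
Concretely, the diversity penalty LIT adds to the usual prediction loss is the squared
cosine similarity between pairs of models' input gradients (the gradients of each
model's predicted log-odds with respect to the inputs), as implemented by
`squared_cos_sim` in [ensembling_methods.py](./ensembling_methods.py). A minimal NumPy
sketch of the pairwise penalty -- illustrative only, since the repository computes it
on Tensorflow tensors inside the training graph, sums it over training points and
model pairs, and scales it by a hyperparameter `lambda_overlap`:

```python
import numpy as np

def lit_penalty(grads_a, grads_b, eps=1e-6):
    # grads_a, grads_b: (n_points, n_features) input gradients of two models'
    # predicted log-odds. Returns the squared cosine similarity averaged over
    # points: 0 means the models extrapolate independently around the data,
    # 1 means they extrapolate identically (up to scale).
    num = np.sum(grads_a * grads_b, axis=1) ** 2
    den = np.sum(grads_a ** 2, axis=1) * np.sum(grads_b ** 2, axis=1)
    return np.mean(num / (den + eps))
```

Driving this penalty toward zero pushes the models to change in locally orthogonal
directions around each training point, even while they agree on the training labels.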
However, 41 | the number of locally independent models we can successfully train without an 42 | accuracy tradeoff depends on the ambiguity of the dataset. 43 | 44 | ## Repository Structure 45 | 46 | - [2D-Synthetic.ipynb](./2D-Synthetic.ipynb) contains the 2D synthetic 47 | experiments that we use to demonstrate the qualitative differences between 48 | random restarts, LIT, and NCL. It's a good starting point for understanding 49 | the contribution intuitively. 50 | - [2D-Manifold.ipynb](./2D-Manifold.ipynb) demonstrates how LIT can be 51 | generalized to work with projections down to a data manifold, providing a 52 | strategy for addressing an important limitation. 53 | - [ensembling_methods.py](./ensembling_methods.py) contains implementations of 54 | LIT as well as baseline methods (NCL, bagging, AdaBoost, and amended 55 | cross-entropy), building on top of [neural_network.py](./neural_network.py) 56 | (which abstracts away some of the Tensorflow boilerplate code). 57 | - [run_experiment.py](./run_experiment.py) contains the core script that 58 | generated the main quantitative results. To fully replicate the results, see 59 | also [launch_jobs.py](./launch_jobs.py) and 60 | [aggregate_results.py](./aggregate_results.py). 61 | 62 | ## Note About Data 63 | 64 | We provide the (z-scored) benchmark classification datasets used to generate 65 | the paper in the [datasets](./datasets) directory, but we omit the ICU 66 | mortality dataset (generated using the same procedures from 67 | [Ghassemi et al. 2017](https://www.ncbi.nlm.nih.gov/pubmed/28815112)) 68 | from this public repository since it requires access to the 69 | [MIMIC-III database](https://mimic.physionet.org/). 70 | If you have already obtained access to MIMIC-III and seek to reproduce those 71 | results, please contact us using the emails provided in the paper. 72 | 73 | ## See Also 74 | 75 | A [workshop version of this paper](https://arxiv.org/abs/1806.08716) and [associated repository](https://github.com/dtak/local-independence-public), where we focus on interpretability rather than predictive performance. 76 | 77 | ## Citation 78 | 79 | You can cite this work using 80 | 81 | ``` 82 | @inproceedings{ross2020ensembles, 83 | author = {Ross, Andrew Slavin and Pan, Weiwei and Celi, Leo Anthony and Doshi-Velez, Finale}, 84 | title = {Ensembles of Locally Independent Prediction Models}, 85 | booktitle = {Thirty-Fourth AAAI Conference on Artificial Intelligence}, 86 | year = {2020}, 87 | url = {https://arxiv.org/abs/1911.01291}, 88 | } 89 | ``` 90 | -------------------------------------------------------------------------------- /aggregate_results.py: -------------------------------------------------------------------------------- 1 | import os 2 | import re 3 | import csv 4 | import glob 5 | import argparse 6 | import numpy as np 7 | import pandas as pd 8 | from collections import defaultdict, Counter 9 | 10 | """This is a script to aggregate the results of running `launch_jobs.py` into a 11 | set of CSV files that, for each dataset and splitting method, summarize 12 | mean/std/sem predictive performances and diversity metrics. 
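For example, the mushroom dataset with the purely random split produces
{result_dir}/mushroom-none.csv, containing one row per ensembling method and columns
such as ensem_auc_mu, ensem_auc_sd, and ensem_auc_se (the dataset name here is
illustrative).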
These are then used 13 | by additional code to output plots and LaTeX tables.""" 14 | 15 | parser = argparse.ArgumentParser() 16 | parser.add_argument("--result_dir", type=str) 17 | FLAGS = parser.parse_args() 18 | 19 | files = glob.glob(os.path.join(FLAGS.result_dir, '**/*.csv')) 20 | 21 | methods = ['restarts','baggings','adaboost','negcorr','amended','diverse'] 22 | 23 | def method_label(m): 24 | if '-' in m: 25 | lmb = ', $\lambda=10^{'+'{:.1f}'.format(np.log10(float(m.split('-')[-1])))+'}$' 26 | if m.startswith('diverse'): 27 | return 'LIT'+lmb 28 | elif m.startswith('negcorr'): 29 | return 'NCL'+lmb 30 | elif m.startswith('amended'): 31 | return 'ACE'+lmb 32 | else: 33 | assert(False) 34 | else: 35 | return { 36 | 'diverse': 'LIT', 37 | 'negcorr': 'NCL', 38 | 'amended': 'ACE', 39 | 'restarts': 'RRs', 40 | 'baggings': 'Bag', 41 | 'adaboost': 'Ada' }[m] 42 | 43 | def load_run_with_cross_validation(f): 44 | df = pd.read_csv(f) 45 | df['n_models'] = int(re.search('n-models-(\d+)', f).group(1)) 46 | df['reg_param'] = [float(name.split('-')[-1]) if '-' in name else np.nan for name in df.ensemble_type] 47 | for prefix in ['diverse','negcorr','amended']: 48 | rows = df[df.ensemble_type.str.startswith(prefix)] 49 | if len(rows) > 0: 50 | max_idx = rows.ensem_val_auc.idxmax() 51 | max_row = rows.loc[max_idx] 52 | max_row['ensemble_type'] = prefix 53 | df = df.append(max_row) 54 | return df 55 | 56 | def aggregate(fs): 57 | dfs = [load_run_with_cross_validation(f) for f in fs] 58 | cols = list(dfs[0].columns) 59 | cols.remove('ensemble_type') 60 | aggs = dict((c, ['mean','std','sem']) for c in cols) 61 | return pd.concat(dfs).groupby('ensemble_type').agg(aggs) 62 | 63 | def load_experiment(ds, split): 64 | return aggregate([f for f in files if (('dataset-{}'.format(ds) in f) and ('split-{}'.format(split) in f))]) 65 | 66 | for ds in ['mushroom','ionosphere','sonar','spectf','electricity','icu']: 67 | for split in ['none','norm']: 68 | if ds == 'icu' and split == 'norm': split = 'limit' 69 | print(ds, split) 70 | 71 | exp = load_experiment(ds, split) 72 | cols = ['method'] 73 | for col in exp.columns.levels[0]: 74 | cols.append(col + '_mu') 75 | cols.append(col + '_sd') 76 | cols.append(col + '_se') 77 | 78 | result_file = os.path.join(FLAGS.result_dir, '{}-{}.csv'.format(ds, split)) 79 | with open(result_file, 'w') as f: 80 | writer = csv.DictWriter(f, fieldnames=cols) 81 | writer.writeheader() 82 | rows = [] 83 | methods = [exp.iloc[i].name for i in range(len(exp))] 84 | for method in methods: 85 | row = { 'method': method_label(method) } 86 | for col in exp.columns.levels[0]: 87 | row[col + '_mu'] = exp.loc[method][col]['mean'] 88 | row[col + '_sd'] = exp.loc[method][col]['std'] 89 | row[col + '_se'] = exp.loc[method][col]['sem'] 90 | writer.writerow(row) 91 | -------------------------------------------------------------------------------- /datasets/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/dtak/lit/67885bb83adaeaae97244e16579f5796914304f3/datasets/.gitkeep -------------------------------------------------------------------------------- /datasets/electricity_inputs.npy: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/dtak/lit/67885bb83adaeaae97244e16579f5796914304f3/datasets/electricity_inputs.npy -------------------------------------------------------------------------------- /datasets/electricity_targets.npy: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/dtak/lit/67885bb83adaeaae97244e16579f5796914304f3/datasets/electricity_targets.npy -------------------------------------------------------------------------------- /datasets/ionosphere_inputs.npy: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/dtak/lit/67885bb83adaeaae97244e16579f5796914304f3/datasets/ionosphere_inputs.npy -------------------------------------------------------------------------------- /datasets/ionosphere_targets.npy: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/dtak/lit/67885bb83adaeaae97244e16579f5796914304f3/datasets/ionosphere_targets.npy -------------------------------------------------------------------------------- /datasets/mushroom_inputs.npy: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/dtak/lit/67885bb83adaeaae97244e16579f5796914304f3/datasets/mushroom_inputs.npy -------------------------------------------------------------------------------- /datasets/mushroom_targets.npy: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/dtak/lit/67885bb83adaeaae97244e16579f5796914304f3/datasets/mushroom_targets.npy -------------------------------------------------------------------------------- /datasets/sonar_inputs.npy: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/dtak/lit/67885bb83adaeaae97244e16579f5796914304f3/datasets/sonar_inputs.npy -------------------------------------------------------------------------------- /datasets/sonar_targets.npy: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/dtak/lit/67885bb83adaeaae97244e16579f5796914304f3/datasets/sonar_targets.npy -------------------------------------------------------------------------------- /datasets/spectf_inputs.npy: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/dtak/lit/67885bb83adaeaae97244e16579f5796914304f3/datasets/spectf_inputs.npy -------------------------------------------------------------------------------- /datasets/spectf_targets.npy: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/dtak/lit/67885bb83adaeaae97244e16579f5796914304f3/datasets/spectf_targets.npy -------------------------------------------------------------------------------- /ensembling_methods.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import tensorflow as tf 3 | from sklearn.ensemble import AdaBoostClassifier 4 | from utils import * 5 | 6 | def squared_cos_sim(v,w,eps=1e-6): 7 | """Tensorflow operation to compute the elementwise squared cosine 8 | similarity between two sets of vectors.""" 9 | num = tf.reduce_sum(v*w, axis=1)**2 10 | den = tf.reduce_sum(v*v, axis=1)*tf.reduce_sum(w*w, axis=1) 11 | return num / (den + eps) 12 | 13 | def train_diverse_models(cls, n, X, y, 14 | grad_quantity='binary_logit_input_gradients', 15 | lambda_overlap=0.01, **kw): 16 | """Main method implementing local independence training.""" 17 | 18 | if len(y.shape) == 1: 19 | y = onehot(y) 20 | 21 | if y.shape[1] > 2 and grad_quantity == 
'binary_logit_input_gradients': 22 | grad_quantity = 'cross_entropy_input_gradients' 23 | 24 | # Instantiate neural networks 25 | models = [cls() for _ in range(n)] 26 | 27 | # Gather their input gradients 28 | igrads = [getattr(m, grad_quantity) for m in models] 29 | 30 | # Compute the prediction loss (sum of indiv. losses) 31 | regular_loss = tf.add_n([m.loss_function(**kw) for m in models]) 32 | 33 | # Compute the diversity loss (average CosIndepErr of pairs) 34 | diverse_loss = tf.add_n([tf.reduce_sum(squared_cos_sim(igrads[i], igrads[j])) 35 | for i in range(n) 36 | for j in range(i+1, n)]) * lambda_overlap 37 | 38 | # Combine losses and train 39 | loss = regular_loss + diverse_loss 40 | ops = { 'xent': regular_loss, 'same': diverse_loss } 41 | for i, m in enumerate(models, 1): 42 | ops['acc{}'.format(i)] = m.accuracy 43 | sw = np.ones(len(X)) 44 | data = train_batches(models, X, y, sample_weight=sw, **kw) 45 | with tf.Session() as sess: 46 | minimize(sess, loss, data, operations=ops, **kw) 47 | for m in models: 48 | m.vals = [v.eval() for v in m.vars] 49 | 50 | # Return trained models 51 | return models 52 | 53 | def train_amended_xent_models(cls, n, X, y, lambda_overlap=0.01, **kw): 54 | if len(y.shape) == 1: 55 | y = onehot(y) 56 | 57 | # Instantiate models 58 | models = [cls() for _ in range(n)] 59 | 60 | # Compute the prediction loss (sum of indiv. losses) 61 | regular_loss = tf.add_n([m.loss_function(**kw) for m in models]) 62 | 63 | # Compute the diversity loss (cross-entropy between models) 64 | diverse_loss = -tf.add_n([ 65 | tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(logits=models[i].logits, labels=models[j].probs)) 66 | for i in range(n) for j in range(n) if i != j 67 | ]) * (lambda_overlap / (n*(n-1)*0.5)) 68 | 69 | # Combine losses and train 70 | loss = regular_loss + diverse_loss 71 | ops = { 'xent': regular_loss, 'same': diverse_loss } 72 | for i, m in enumerate(models, 1): 73 | ops['acc{}'.format(i)] = m.accuracy 74 | sw = np.ones(len(X)) 75 | data = train_batches(models, X, y, sample_weight=sw, **kw) 76 | with tf.Session() as sess: 77 | minimize(sess, loss, data, operations=ops, **kw) 78 | for m in models: 79 | m.vals = [v.eval() for v in m.vars] 80 | 81 | # Return trained models 82 | return models 83 | 84 | def train_restart_models(cls, n, X, y, **kw): 85 | """Fit a collection of models over random restarts.""" 86 | models = [cls() for _ in range(n)] 87 | for i in range(n): 88 | models[i].fit(X,y,**kw) 89 | return models 90 | 91 | def train_bagged_models(cls, n, X, y, **kw): 92 | """Fit a collection of models using bagging.""" 93 | models = [cls() for _ in range(n)] 94 | for i in range(n): 95 | idx = np.random.choice(np.arange(len(X)), size=len(X), replace=True).astype(int) 96 | models[i].fit(X[idx],y[idx],**kw) 97 | return models 98 | 99 | def train_neg_corr_models(cls, n, X, y, lambda_overlap=0.01, **kw): 100 | """Fit a collection of models using negative correlation learning. 
Note 101 | this uses a 0-1 MSE loss rather than cross-entropy.""" 102 | 103 | if len(y.shape) == 1: 104 | y = onehot(y) 105 | 106 | # Instantiate models 107 | models = [cls() for _ in range(n)] 108 | 109 | # Compute their mean predicted probability 110 | mean_pred = tf.add_n([m.probs for m in models]) / len(models) 111 | 112 | # Compute MSE prediction loss 113 | zero_one_losses = [tf.nn.l2_loss(m.probs-m.y) for m in models] 114 | 115 | # Compute diversity losses (difference from mean) 116 | neg_corr_losses = [-tf.nn.l2_loss(m.probs-mean_pred) for m in models] 117 | 118 | # Combine losses and train 119 | regular_loss = tf.add_n(zero_one_losses) 120 | diverse_loss = tf.add_n(neg_corr_losses) 121 | loss = regular_loss + lambda_overlap * diverse_loss 122 | ops = { 'xent': regular_loss, 'same': diverse_loss } 123 | for i, m in enumerate(models, 1): 124 | ops['acc{}'.format(i)] = m.accuracy 125 | sw = np.ones(len(X)) 126 | data = train_batches(models, X, y, sample_weight=sw, **kw) 127 | with tf.Session() as sess: 128 | minimize(sess, loss, data, operations=ops, **kw) 129 | for m in models: 130 | m.vals = [v.eval() for v in m.vars] 131 | 132 | # Return trained models 133 | return models 134 | 135 | 136 | def train_adaboost_models(cls, n, X, y, **kw): 137 | """Fit a collection of neural networks using AdaBoost.""" 138 | if len(y.shape) == 1: 139 | classes = np.array([0.,1.]) 140 | y_ = y 141 | else: 142 | classes = np.arange(y.shape[1]).astype(float) 143 | y_ = np.argmax(y, axis=1) 144 | 145 | # First, create a wrapper class that can be interepreted as a model from 146 | # within scikit-learn. 147 | class sklearn_compatible_mlp(): 148 | def __init__(self, **kwargs): 149 | self.mlp = cls() 150 | self.params_ = kwargs 151 | def get_params(self, **kwargs): return self.params_ 152 | def set_params(self, **kwargs): self.params_ = kwargs 153 | @property 154 | def classes_(self): return classes 155 | @property 156 | def n_classes_(self): return len(classes) 157 | def fit(self, X, y, sample_weight=None, **_): 158 | N = len(X) 159 | assert(y.shape == (N,)) 160 | assert(np.abs(sample_weight.sum()-1) < 0.001) 161 | self.mlp = cls() 162 | self.mlp.fit(X,y,sample_weight=sample_weight, **kw) 163 | def predict_proba(self, X, **_): 164 | return self.mlp.predict_proba(X) 165 | 166 | # Now, use scikit-learn's implementation of AdaBoost to fit the ensemble. 167 | ab = AdaBoostClassifier(base_estimator=sklearn_compatible_mlp(), n_estimators=n) 168 | ab.fit(X, y_) 169 | 170 | # Return the scikit-learn ensemble instance. 171 | return ab 172 | 173 | def train_diverse_models_w_projection(cls, n, X, y, projections, 174 | grad_quantity='binary_logit_input_gradients', 175 | lambda_overlap=0.01, **kw): 176 | """Local independence training modified to first project gradients to a 177 | lower dimensional space. 
This can be used to implement the local 178 | independence penalty over a manifold.""" 179 | 180 | if len(y.shape) == 1: 181 | y = onehot(y) 182 | 183 | # Define Tensorflow operation to project gradients 184 | D = X.shape[1] 185 | def project_to(low_dim_basis, high_dim_vectors): 186 | return tf.reduce_sum(tf.reshape(high_dim_vectors, (-1,1,D)) * low_dim_basis, axis=2) 187 | 188 | # Instantiate models and input-space gradients 189 | models = [cls() for _ in range(n)] 190 | igrads = [getattr(m, grad_quantity) for m in models] 191 | 192 | # Add a placeholder to each model for the projection matrices 193 | for m in models: 194 | m.proj = tf.placeholder(tf.float32, [None,projections.shape[1],projections.shape[2]]) 195 | 196 | # Compute prediction and diversity losses 197 | regular_loss = tf.add_n([m.loss_function(**kw) for m in models]) 198 | diverse_loss = tf.add_n([tf.reduce_sum(squared_cos_sim( 199 | project_to(models[i].proj, igrads[i]), 200 | project_to(models[j].proj, igrads[j]))) 201 | for i in range(n) 202 | for j in range(i+1, n)]) * lambda_overlap 203 | loss = regular_loss + diverse_loss 204 | 205 | # Train ensemble, passing in the additional projection matrices 206 | ops = { 'xent': regular_loss, 'same': diverse_loss } 207 | for i, m in enumerate(models, 1): 208 | ops['acc{}'.format(i)] = m.accuracy 209 | sw = np.ones(len(X)) 210 | data = train_batches(models, X, y, sample_weight=sw, proj=projections, **kw) 211 | with tf.Session() as sess: 212 | minimize(sess, loss, data, operations=ops, **kw) 213 | for m in models: 214 | m.vals = [v.eval() for v in m.vars] 215 | 216 | # Return trained models 217 | return models 218 | -------------------------------------------------------------------------------- /launch_jobs.py: -------------------------------------------------------------------------------- 1 | import os 2 | import argparse 3 | import numpy as np 4 | 5 | """This is a script to repeatedly launch experiments (i.e. invoke 6 | `run_experiment.py`) and generate the full set of results from the paper. 
7 | 8 | NOTE: Some of this code is specific to Harvard Odyssey, and would need to be 9 | rewritten to work on different research clusters.""" 10 | 11 | parser = argparse.ArgumentParser() 12 | parser.add_argument("--base_dir", type=str) 13 | parser.add_argument("--conda_env", type=str) 14 | parser.add_argument("--partition", type=str) 15 | parser.add_argument("--mem_limit", type=str, default='20000') 16 | FLAGS = parser.parse_args() 17 | 18 | slurm_template = """#!/bin/bash 19 | #SBATCH --mem={mem_limit} 20 | #SBATCH -t {time_limit} 21 | #SBATCH -p {partition} 22 | #SBATCH -o {out_file} 23 | #SBATCH -e {err_file} 24 | 25 | module load Anaconda3/5.0.1-fasrc01 26 | source activate {conda_env} 27 | {job_command} 28 | """ 29 | 30 | def launch_job(restart, dataset, n_models, split, time_limit=None, mem_limit=None): 31 | if time_limit is None: time_limit = '0-{0:02d}:00'.format(min(24, n_models*2)) 32 | if mem_limit is None: mem_limit = FLAGS.mem_limit 33 | 34 | save_dir = 'restart-{}__dataset-{}__n-models-{}__split-{}/'.format(restart+1, dataset, n_models, split) 35 | save_dir = os.path.join(FLAGS.base_dir, save_dir) 36 | out_file = os.path.join(save_dir, 'job-%j.out') 37 | err_file = os.path.join(save_dir, 'job-%j.err') 38 | slurm_file = os.path.join(save_dir, 'job.slurm') 39 | os.system('mkdir -p {}'.format(save_dir)) 40 | 41 | job_command = "python -u run_experiment.py --save_dir={} --n_models={} --dataset={} --split={}".format(save_dir, n_models, dataset, split) 42 | slurm_command = slurm_template.format( 43 | job_command=job_command, 44 | time_limit=time_limit, 45 | mem_limit=mem_limit, 46 | partition=FLAGS.partition, 47 | conda_env=FLAGS.conda_env, 48 | out_file=out_file, 49 | err_file=err_file) 50 | with open(slurm_file, "w") as f: f.write(slurm_command) 51 | os.system("cat {} | sbatch".format(slurm_file)) 52 | 53 | datasets = ['covertype', 'ionosphere', 'sonar', 'spectf', 'mushroom', 'electricity'] 54 | datasets += ['icu'] # available on request if you have access to MIMIC-III. 55 | 56 | for restart in range(10): 57 | for n_models in [2,3,5,8,13]: 58 | for dataset in datasets: 59 | splits = ['none', 'limit'] if 'icu' in dataset else ['none', 'norm'] 60 | for split in splits: 61 | launch_job(restart, dataset, n_models, split) 62 | -------------------------------------------------------------------------------- /neural_network.py: -------------------------------------------------------------------------------- 1 | import uuid 2 | import numpy as np 3 | import tensorflow as tf 4 | import six.moves.cPickle as pickle 5 | from six import add_metaclass 6 | from abc import ABCMeta, abstractmethod, abstractproperty 7 | from utils import * 8 | 9 | """ 10 | Object-oriented class for handling neural networks implemented 11 | in Tensorflow. 
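
Concrete networks subclass NeuralNetwork and define x_shape, y_shape, and
rebuild_model (whose returned list must end with the logits tensor). A minimal
sketch, mirroring the Net class in run_experiment.py (layer sizes and names here
are illustrative):

    class MLP(NeuralNetwork):
        @property
        def x_shape(self): return [None, 30]  # [batch, n_features]
        @property
        def y_shape(self): return [None, 2]   # [batch, n_classes]
        def rebuild_model(self, X, **_):
            H = tf.layers.dense(X, 256, name=self.name+'/H', activation=tf.nn.relu)
            return [H, tf.layers.dense(H, 2, name=self.name+'/out', activation=None)]

    model = MLP()
    model.fit(X_train, y_train, num_epochs=10)  # X_train, y_train: numpy arrays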
12 | """ 13 | @add_metaclass(ABCMeta) 14 | class NeuralNetwork(): 15 | def __init__(self, name=None, dtype=tf.float32, **kwargs): 16 | self.vals = None # Holds the trained weights of the network 17 | self.name = (name or str(uuid.uuid4())) # Tensorflow variable scope 18 | self.dtype = dtype 19 | self.setup_model(**kwargs) 20 | assert(hasattr(self, 'X')) 21 | assert(hasattr(self, 'y')) 22 | assert(hasattr(self, 'logits')) 23 | 24 | def setup_model(self, X=None, y=None, **kw): 25 | """Defines common placeholders, then calls rebuild_model""" 26 | with tf.name_scope(self.name): 27 | self.X = tf.placeholder(self.dtype, self.x_shape, name="X") if X is None else X 28 | self.y = tf.placeholder(self.dtype, self.y_shape, name="y") if y is None else y 29 | self.sample_weight = tf.placeholder(self.dtype, [None], name="sample_weight") 30 | self.is_train = tf.placeholder_with_default( 31 | tf.constant(False, dtype=tf.bool), shape=(), name="is_train") 32 | self.model = self.rebuild_model(self.X, **kw) 33 | self.recompute_vars() 34 | 35 | def rebuild_model(self, X, reuse=None, **kw): 36 | """Override this in subclasses. Define Tensorflow operations and return 37 | list whose last entry is logits.""" 38 | 39 | @property 40 | def logits(self): 41 | return self.model[-1] 42 | 43 | @abstractproperty 44 | def x_shape(self): 45 | """Specify the shape of X; for MNIST, this could be [None, 784]""" 46 | 47 | @abstractproperty 48 | def y_shape(self): 49 | """Specify the shape of y; for MNIST, this would be [None, 10]""" 50 | 51 | @property 52 | def num_features(self): 53 | return np.product(self.x_shape[1:]) 54 | 55 | @property 56 | def num_classes(self): 57 | return np.product(self.y_shape[1:]) 58 | 59 | @property 60 | def trainable_vars(self): 61 | """Return this model's trainable variables""" 62 | return [v for v in tf.trainable_variables() if v in self.vars] 63 | 64 | def input_grad(self, f): 65 | """Helper to take input gradients""" 66 | return tf.gradients(f, self.X)[0] 67 | 68 | def cross_entropy_with(self, y): 69 | """Compute sample-weighted cross-entropy classification loss""" 70 | w = self.sample_weight / tf.reduce_sum(self.sample_weight) 71 | return tf.reduce_sum(tf.nn.softmax_cross_entropy_with_logits_v2(logits=self.logits, labels=y) * w) 72 | 73 | @cachedproperty 74 | def preds(self): 75 | """Tensorflow operation to return predicted labels.""" 76 | return tf.argmax(self.logits, axis=1) 77 | 78 | @cachedproperty 79 | def probs(self): 80 | """Tensorflow operation to return predicted label probabilities.""" 81 | return tf.nn.softmax(self.logits) 82 | 83 | @cachedproperty 84 | def logps(self): 85 | """Tensorflow operation to return predicted label log-probabilities.""" 86 | return self.logits - tf.reduce_logsumexp(self.logits, 1, keepdims=True) 87 | 88 | @cachedproperty 89 | def grad_sum_logps(self): 90 | """Tensorflow operation returning gradient of the sum of log-probabilities. 
91 | Can be used as the gradient for LIT for multi-class classification (doesn't 92 | require labels).""" 93 | return self.input_grad(self.logps) 94 | 95 | @cachedproperty 96 | def l2_weights(self): 97 | """Tensorflow operation returning sum of squared weight values""" 98 | return tf.add_n([l2_loss(v) for v in self.trainable_vars]) 99 | 100 | @cachedproperty 101 | def cross_entropy(self): 102 | """Tensorflow operation returning classification cross-entropy""" 103 | return self.cross_entropy_with(self.y) 104 | 105 | @cachedproperty 106 | def cross_entropy_input_gradients(self): 107 | """Tensorflow operation returning gradient of the loss. Can be used as the 108 | gradient for LIT for multi-class classification but does require labels.""" 109 | return self.input_grad(self.cross_entropy) 110 | 111 | @cachedproperty 112 | def predicted_logit_input_gradients(self): 113 | """Tensorflow operation returning gradient of the predicted log-odds. 114 | Mostly useful for visualization rather than training.""" 115 | return self.input_grad(self.logits * self.y) 116 | 117 | @cachedproperty 118 | def binary_logits(self): 119 | """Tensorflow operation returning the actual predicted log-odds (binary 120 | only).""" 121 | assert(self.num_classes == 2) 122 | return self.logps[:,1] - self.logps[:,0] 123 | 124 | @cachedproperty 125 | def binary_logit_input_gradients(self): 126 | """Tensorflow operation returning gradient of the predicted binary 127 | log-odds. This is what we use for LIT in binary classification.""" 128 | return self.input_grad(self.binary_logits) 129 | 130 | @cachedproperty 131 | def accuracy(self): 132 | """Tensorflow operation returning classification accuracy.""" 133 | return tf.reduce_mean(tf.cast(tf.equal(self.preds, tf.argmax(self.y, 1)), dtype=tf.float32)) 134 | 135 | def score(self, X, y, **kw): 136 | """Compute classification accuracy for numpy inputs and labels.""" 137 | if len(y.shape) == 2: 138 | return np.mean(self.predict(X, **kw) == np.argmax(y, 1)) 139 | else: 140 | return np.mean(self.predict(X, **kw) == y) 141 | 142 | def predict(self, X, **kw): 143 | """Compute predictions for numpy inputs.""" 144 | with tf.Session() as sess: 145 | self.init(sess) 146 | return self.batch_eval(sess, self.preds, X, **kw) 147 | 148 | def predict_logits(self, X, **kw): 149 | """Compute raw logits for numpy inputs.""" 150 | with tf.Session() as sess: 151 | self.init(sess) 152 | return self.batch_eval(sess, self.logits, X, **kw) 153 | 154 | def predict_binary_logodds(self, X, **kw): 155 | """Compute predicted binary log-odds for numpy inputs.""" 156 | with tf.Session() as sess: 157 | self.init(sess) 158 | return self.batch_eval(sess, self.binary_logits, X, **kw) 159 | 160 | def predict_proba(self, X, **kw): 161 | """Compute predicted probabilities for numpy inputs.""" 162 | with tf.Session() as sess: 163 | self.init(sess) 164 | return self.batch_eval(sess, self.probs, X, **kw) 165 | 166 | def batch_eval(self, sess, quantity, X, n=256): 167 | """Internal helper to batch computations (prevents memory issues)""" 168 | vals = sess.run(quantity, feed_dict={ self.X: X[:n] }) 169 | stack = np.vstack if len(vals.shape) > 1 else np.hstack 170 | for i in range(n, len(X), n): 171 | vals = stack((vals, sess.run(quantity, feed_dict={ self.X: X[i:i+n] }))) 172 | return vals 173 | 174 | def input_gradients(self, X, y=None, n=256, **kw): 175 | """Computes different kinds of input gradients for inputs (and optionally 176 | labels). 
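For example, run_experiment.py calls this on the test inputs (passing the
`logits` keyword) to collect the per-model gradients used in its reported
diversity metrics.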
See input_gradients_ for details.""" 177 | with tf.Session() as sess: 178 | self.init(sess) 179 | return self.batch_input_gradients_(sess, X, y, n, **kw) 180 | 181 | def batch_input_gradients_(self, sess, X, y=None, n=256, **kw): 182 | yy = y[:n] if y is not None and not isint(y) else y 183 | grads = self.input_gradients_(sess, X[:n], yy, **kw) 184 | for i in range(n, len(X), n): 185 | yy = y[i:i+n] if y is not None and not isint(y) else y 186 | grads = np.vstack((grads, 187 | self.input_gradients_(sess, X[i:i+n], yy, **kw))) 188 | return grads 189 | 190 | def input_gradients_(self, sess, X, y=None, logits=False, quantity=None): 191 | if quantity is not None: 192 | return sess.run(quantity, feed_dict={ self.X: X }) 193 | if y is None: 194 | return sess.run(self.grad_sum_logps, feed_dict={ self.X: X }) 195 | elif logits and self.num_classes == 2: 196 | return sess.run(self.binary_logit_input_gradients, feed_dict={ self.X: X }) 197 | elif isint(y): 198 | y = onehot(np.array([y]*len(X)), self.num_classes) 199 | feed = { self.X: X, self.y: y, self.sample_weight: np.ones(len(X)) } 200 | if logits: 201 | return sess.run(self.predicted_logit_input_gradients, feed_dict=feed) 202 | else: 203 | return sess.run(self.cross_entropy_input_gradients, feed_dict=feed) 204 | 205 | def loss_function(self, l2_weights=0., **kw): 206 | """Construct the loss function Tensorflow op given hyperparameters.""" 207 | log_likelihood = self.cross_entropy 208 | if l2_weights > 0: 209 | log_prior = l2_weights * self.l2_weights 210 | else: 211 | log_prior = 0 212 | return log_likelihood + log_prior 213 | 214 | def fit(self, X, y, loss_fn=None, init=False, sample_weight=None, **kw): 215 | """Fit the neural network on the particular dataset.""" 216 | if loss_fn is None: 217 | loss_fn = self.loss_function(**kw) 218 | if len(y.shape) == 1: 219 | y = onehot(y, self.num_classes) 220 | if sample_weight is None: 221 | sample_weight = np.ones(len(X)) 222 | ops = { 'xent': self.cross_entropy, 'loss': loss_fn, 'accu': self.accuracy } 223 | batches = train_batches([self], X, y, sample_weight=sample_weight, **kw) 224 | with tf.Session() as sess: 225 | if init: self.init(sess) 226 | minimize(sess, loss_fn, batches, ops, **kw) 227 | self.vals = [v.eval() for v in self.vars] 228 | 229 | def recompute_vars(self): 230 | """Determine which Tensorflow variables are associated with this 231 | network.""" 232 | self.vars = tf.get_default_graph().get_collection( 233 | tf.GraphKeys.GLOBAL_VARIABLES, scope=self.name) 234 | 235 | def init(self, sess): 236 | """Prepare this network to be used in a Tensorflow session.""" 237 | if self.vals is None: 238 | sess.run(tf.global_variables_initializer()) 239 | else: 240 | for var, val in zip(self.vars, self.vals): 241 | sess.run(var.assign(val)) 242 | 243 | def save(self, filename): 244 | """Save the weights of the network to a pickle file.""" 245 | with open(filename, 'wb') as f: 246 | pickle.dump(self.vals, f) 247 | 248 | def load(self, filename): 249 | """Load the weights of the network from a pickle file.""" 250 | with open(filename, 'rb') as f: 251 | self.vals = pickle.load(f) 252 | -------------------------------------------------------------------------------- /run_experiment.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import numpy as np 3 | import tensorflow as tf 4 | from scipy.stats import pearsonr 5 | from utils import * 6 | from neural_network import * 7 | from ensembling_methods import * 8 | 9 | parser = argparse.ArgumentParser() 
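# Command-line flags (semantics inferred from the code below): save_dir is the
# directory (with trailing slash) where the results CSV and model pickles are
# written, n_models is the ensemble size, dataset must match a
# datasets/<name>_inputs.npy / <name>_targets.npy pair, and split selects the
# train/test splitting strategy ('none', 'norm', or 'limit'); launch_jobs.py
# shows how these flags are combined into full invocations.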
10 | parser.add_argument("--save_dir", type=str) 11 | parser.add_argument("--n_models", type=int) 12 | parser.add_argument("--dataset", type=str) 13 | parser.add_argument("--split", type=str, default='none') 14 | FLAGS = parser.parse_args() 15 | 16 | save_dir = FLAGS.save_dir 17 | n_models = FLAGS.n_models 18 | dataset = FLAGS.dataset 19 | split = FLAGS.split 20 | 21 | # Load the dataset 22 | X = np.load('datasets/{}_inputs.npy'.format(dataset)) 23 | y = np.load('datasets/{}_targets.npy'.format(dataset)) 24 | 25 | # Decide how to split it 26 | if split == 'norm': 27 | # If splitting by "norm", split train and test by distance from origin, 28 | # but subsplit train and validation randomly 29 | norms = np.linalg.norm(X, axis=1) 30 | midpt = np.median(norms) 31 | X_test = X[np.argwhere(norms > midpt)[:,0]] 32 | y_test = y[np.argwhere(norms > midpt)[:,0]] 33 | X_train = X[np.argwhere(norms <= midpt)[:,0]] 34 | y_train = y[np.argwhere(norms <= midpt)[:,0]] 35 | X_test, X_val, y_test, y_val = tt_split(X_test, y_test) 36 | else: 37 | # Otherwise, split train/test/val completely randomly 38 | X_train, X_test, y_train, y_test = tt_split(X, y) 39 | X_train, X_val, y_train, y_val = tt_split(X_train, y_train) 40 | 41 | if split == 'limit': 42 | # If splitting by "limit", use a smaller training set 43 | X_train = X_train[:1000] 44 | y_train = y_train[:1000] 45 | 46 | # Define helpers for printing performance metrics to a csv. 47 | cols = ( 48 | ['ensemble_type', 'ensem_val_auc'] 49 | + ['indiv_auc_avg', 'indiv_auc_std', 'ensem_auc', 'indiv_auc_max', 'indiv_auc_min'] 50 | + ['indiv_acc_avg', 'indiv_acc_std', 'ensem_acc', 'indiv_acc_max', 'indiv_acc_min'] 51 | + ['q_stat', 'interrater', 'err_corr', 'grad_cos2'] 52 | ) 53 | 54 | def write_row(row, mode='a+'): 55 | csv = open(save_dir + 'auc_results.csv', mode) 56 | csv.write(','.join(row) + '\n') 57 | csv.close() 58 | 59 | write_row(cols, mode='w') 60 | 61 | # Define main helper for evaluating models and saving results 62 | def save_models(models, name, moe_auc=None, moe_val=None, moe_acc=None): 63 | row = {'ensemble_type': name} 64 | print(name) 65 | test_preds = [] 66 | val_preds = [] 67 | accs = [] 68 | aucs = [] 69 | grads = [] 70 | 71 | # For each model, save its parameters, compute its individual AUC, and 72 | # compile its predictions 73 | for i, m in enumerate(models): 74 | m.save('{}{}_model{}.pkl'.format(save_dir, name, i)) 75 | testp = m.predict_proba(X_test) 76 | valp = m.predict_proba(X_val) 77 | auc = scoring_fun(testp, y_test) 78 | acc = accuracy_fun(testp, y_test) 79 | print(' Model #{} AUC: {:.4f}, acc: {:.4f}'.format(i+1,auc,acc)) 80 | aucs.append(auc) 81 | accs.append(acc) 82 | test_preds.append(testp) 83 | val_preds.append(valp) 84 | grads.append(m.input_gradients(X_test, logits=(y_test.max() == 1))) 85 | 86 | # Save max, min, mean, and standard deviation of individual model AUC 87 | print(' Indiv AUC max: {:.4f}'.format(np.max(aucs))) 88 | print(' Indiv AUC min: {:.4f}'.format(np.min(aucs))) 89 | print(' Indiv AUC mu: {:.4f}'.format(np.mean(aucs))) 90 | print(' Indiv AUC sd: {:.4f}'.format(np.std(aucs))) 91 | 92 | row['indiv_auc_max'] = '{:.6f}'.format(np.max(aucs)) 93 | row['indiv_auc_min'] = '{:.6f}'.format(np.min(aucs)) 94 | row['indiv_auc_avg'] = '{:.6f}'.format(np.mean(aucs)) 95 | row['indiv_auc_std'] = '{:.6f}'.format(np.std(aucs)) 96 | 97 | row['indiv_acc_max'] = '{:.6f}'.format(np.max(accs)) 98 | row['indiv_acc_min'] = '{:.6f}'.format(np.min(accs)) 99 | row['indiv_acc_avg'] = '{:.6f}'.format(np.mean(accs)) 100 | 
row['indiv_acc_std'] = '{:.6f}'.format(np.std(accs)) 101 | 102 | # Compute AUC of average prediction 103 | val_preds = np.array(val_preds) 104 | test_preds = np.array(test_preds) 105 | grads = np.array(grads) 106 | avg_auc = scoring_fun(test_preds.mean(axis=0), y_test) 107 | avg_acc = accuracy_fun(test_preds.mean(axis=0), y_test) 108 | print(' Ens. Avg AUC: {:.4f}, acc: {:.4f}'.format(avg_auc, avg_acc)) 109 | 110 | avg_auc_val = scoring_fun(val_preds.mean(axis=0), y_val) 111 | 112 | # Report it (unless it's adaboost and we've been passed its weighted 113 | # predictions) 114 | if moe_auc is None: 115 | row['ensem_auc'] = '{:.6f}'.format(avg_auc) 116 | row['ensem_acc'] = '{:.6f}'.format(avg_acc) 117 | row['ensem_val_auc'] = '{:.6f}'.format(avg_auc_val) 118 | else: 119 | row['ensem_auc'] = '{:.6f}'.format(moe_auc) 120 | row['ensem_acc'] = '{:.6f}'.format(moe_acc) 121 | row['ensem_val_auc'] = '{:.6f}'.format(moe_val) 122 | 123 | # If the ensemble had more than one model (i.e. if it wasn't AdaBoost 124 | # terminating early), then compute standard diversity measures (+ ours). 125 | if len(models) > 1: 126 | # First determine where each model erred 127 | ens_errors = [error_masks(preds, y_test) for preds in test_preds] 128 | ens_errsets = [set(np.argwhere(error)[:,0]) for error in ens_errors] 129 | 130 | # Compute the error correlation rho_avg 131 | error_corr = np.mean([ 132 | pearsonr(err1, err2)[0] 133 | for i,err1 in enumerate(ens_errors) 134 | for err2 in ens_errors[i+1:]]) 135 | 136 | # Compute the q-statistic 137 | q_stat = np.mean([ 138 | yules_q_statistic(e1,e2,y_test) 139 | for i,e1 in enumerate(ens_errsets) 140 | for e2 in ens_errsets[i+1:]]) 141 | 142 | # Compute the interrater agreement (See Eq. 16 of Kuncheva & Whitaker 2003) 143 | Dis_av = np.mean([ 144 | disagreement_measure(e1,e2,y_test) 145 | for i,e1 in enumerate(ens_errsets) 146 | for e2 in ens_errsets[i+1:]]) 147 | avg_acc = np.mean(accs) 148 | try: 149 | kappa = 1 - Dis_av / (2 * avg_acc * (1-avg_acc)) 150 | except ZeroDivisionError: 151 | kappa = np.nan 152 | 153 | # Compute the value of the LIT penalty 154 | gradcos2 = np.mean([elemwise_sq_cos_sim(g1, g2) 155 | for i,g1 in enumerate(grads) 156 | for g2 in grads[i+1:]]) 157 | 158 | print(' Q. statistic: {:.4f}'.format(q_stat)) 159 | print(' Interrater agg: {:.4f}'.format(kappa)) 160 | print(' Err. correl.: {:.4f}'.format(error_corr)) 161 | print(' Av grad cos2: {:.4f}'.format(gradcos2)) 162 | 163 | row['q_stat'] = '{:.6f}'.format(q_stat) 164 | row['interrater'] = '{:.6f}'.format(kappa) 165 | row['err_corr'] = '{:.6f}'.format(error_corr) 166 | row['grad_cos2'] = '{:.6f}'.format(gradcos2) 167 | else: 168 | row['q_stat'] = 'nan' 169 | row['interrater'] = 'nan' 170 | row['err_corr'] = 'nan' 171 | row['grad_cos2'] = 'nan' 172 | 173 | assert(set(row.keys()) == set(cols)) 174 | 175 | # Print everything to the CSV. 
176 | write_row([row[k] for k in cols]) 177 | 178 | # Define the neural network architecture - 256-unit hidden layer w/ dropout 179 | if len(y_train.shape) == 1: 180 | y_shape = 2 181 | else: 182 | y_shape = y_train.shape[1] 183 | class Net(NeuralNetwork): 184 | @property 185 | def x_shape(self): return [None, X_train.shape[1]] 186 | @property 187 | def y_shape(self): return [None, y_shape] 188 | def rebuild_model(self, X, **_): 189 | L0 = X 190 | L1 = tf.layers.dense(L0, 256, name=self.name+'/L1', activation=tf.nn.relu) 191 | L1 = tf.layers.dropout(L1, training=self.is_train) 192 | L2 = tf.layers.dense(L1, y_shape, name=self.name+'/L2', activation=None) 193 | return [L1, L2] 194 | 195 | # Set up training parameters -- we'll use 0.0001 weight decay and train for the 196 | # minimum epochs to run for 5000 iterations. 197 | num_epochs = int(np.ceil(np.ceil((5000*128) / float(len(X_train))))) 198 | train_args = [Net, n_models, X_train, y_train] 199 | train_kwargs = { 200 | 'num_epochs': num_epochs, 201 | 'l2_weights': 0.0001, 202 | 'print_every': 100 203 | } 204 | 205 | # Train random restarts 206 | tf.reset_default_graph() 207 | save_models(train_restart_models(*train_args, **train_kwargs), 'restarts') 208 | 209 | # Train bagging 210 | tf.reset_default_graph() 211 | save_models(train_bagged_models(*train_args, **train_kwargs), 'baggings') 212 | 213 | # Train adaboost (using scikit-learn's default implementation) 214 | tf.reset_default_graph() 215 | adaboost = train_adaboost_models(*train_args, **train_kwargs) 216 | adaboost_models = [e.mlp for e in adaboost.estimators_] 217 | save_models(adaboost_models, 'adaboost', 218 | moe_auc=scoring_fun(adaboost.predict_proba(X_test), y_test), 219 | moe_val=scoring_fun(adaboost.predict_proba(X_val), y_val), 220 | moe_acc=accuracy_fun(adaboost.predict_proba(X_test), y_test)) 221 | 222 | for penalty in np.logspace(-4, 1, 16): 223 | # Run LIT 224 | tf.reset_default_graph() 225 | save_models( 226 | train_diverse_models(*train_args, lambda_overlap=penalty, **train_kwargs), 227 | 'diverse-{:.4f}'.format(penalty)) 228 | 229 | # Run NCL 230 | tf.reset_default_graph() 231 | save_models( 232 | train_neg_corr_models(*train_args, lambda_overlap=penalty, **train_kwargs), 233 | 'negcorr-{:.4f}'.format(penalty)) 234 | 235 | # Run ACE 236 | tf.reset_default_graph() 237 | save_models( 238 | train_amended_xent_models(*train_args, lambda_overlap=penalty, **train_kwargs), 239 | 'amended-{:.4f}'.format(penalty)) 240 | -------------------------------------------------------------------------------- /utils.py: -------------------------------------------------------------------------------- 1 | from __future__ import print_function 2 | import six 3 | import time 4 | import numpy as np 5 | import tensorflow as tf 6 | from sklearn.metrics import roc_auc_score, accuracy_score 7 | from sklearn.model_selection import train_test_split 8 | 9 | def l1_loss(x): 10 | return tf.reduce_sum(tf.abs(x)) 11 | 12 | def l2_loss(x): 13 | return tf.nn.l2_loss(x) 14 | 15 | class cachedproperty(object): 16 | """Simplified https://github.com/pydanny/cached-property""" 17 | def __init__(self, function): 18 | self.__doc__ = getattr(function, '__doc__') 19 | self.function = function 20 | 21 | def __get__(self, instance, klass): 22 | if instance is None: return self 23 | value = instance.__dict__[self.function.__name__] = self.function(instance) 24 | return value 25 | 26 | def isint(x): 27 | return isinstance(x, (int, np.int32, np.int64)) 28 | 29 | def onehot(Y, K=None): 30 | if K is None: 31 | K = 
np.unique(Y) 32 | elif isint(K): 33 | K = list(range(K)) 34 | data = np.array([[y == k for k in K] for y in Y]).astype(int) 35 | return data 36 | 37 | def minibatch_indexes(lenX, batch_size=256, num_epochs=50, **kw): 38 | n = int(np.ceil(lenX / batch_size)) 39 | for epoch in range(num_epochs): 40 | for batch in range(n): 41 | i = epoch*n + batch 42 | yield i, epoch, slice((i%n)*batch_size, ((i%n)+1)*batch_size) 43 | 44 | def train_feed(idx, models, **kw): 45 | """Convert a set of models, a set of indexes, and numpy arrays given by the 46 | keyword arguments to a set of feed dictionaries for each model.""" 47 | feed = {} 48 | for m in models: 49 | feed[m.is_train] = True 50 | for dictionary in [kw, kw.get('feed_dict', {})]: 51 | for key, val in six.iteritems(dictionary): 52 | attr = getattr(m, key) if isinstance(key, str) and hasattr(m, key) else key 53 | if type(attr) == type(m.X): 54 | if len(attr.shape) >= 1: 55 | if attr.shape[0].value is None: 56 | feed[attr] = val[idx] 57 | return feed 58 | 59 | def train_batches(models, X, y, **kw): 60 | for i, epoch, idx in minibatch_indexes(len(X), **kw): 61 | yield i, epoch, train_feed(idx, models, X=X, y=y, **kw) 62 | 63 | def reinitialize_variables(sess): 64 | """Construct a Tensorflow operation to initialize any variables in its graph 65 | which are not already initialized.""" 66 | uninitialized_vars = [] 67 | for var in tf.global_variables(): 68 | try: 69 | sess.run(var) 70 | except tf.errors.FailedPreconditionError: 71 | uninitialized_vars.append(var) 72 | return tf.variables_initializer(uninitialized_vars) 73 | 74 | def minimize(sess, loss_fn, batches, operations={}, learning_rate=0.001, print_every=None, var_list=None, **kw): 75 | """Minimize a loss function over the provided batches of data, possibly 76 | printing progress.""" 77 | optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate) 78 | train_op = optimizer.minimize(loss_fn, var_list=var_list) 79 | op_keys = sorted(list(operations.keys())) 80 | ops = [train_op] + [operations[k] for k in op_keys] 81 | t = time.time() 82 | sess.run(reinitialize_variables(sess)) 83 | for i, epoch, batch in batches: 84 | results = sess.run(ops, feed_dict=batch) 85 | if print_every and i % print_every == 0: 86 | s = 'Batch {}, epoch {}, time {:.1f}s'.format(i, epoch, time.time() - t) 87 | for j,k in enumerate(op_keys, 1): 88 | s += ', {} {:.4f}'.format(k, results[j]) 89 | print(s) 90 | 91 | def tt_split(X, y, test_size=0.2): 92 | return train_test_split(X, y, test_size=test_size, stratify=y) 93 | 94 | def elemwise_sq_cos_sim(v, w, eps=1e-8): 95 | assert(len(v.shape) == 2) 96 | assert(len(w.shape) == 2) 97 | num = np.sum(v*w, axis=1)**2 98 | den = np.sum(v*v, axis=1) * np.sum(w*w, axis=1) 99 | return num / (den + eps) 100 | 101 | def yules_q_statistic(e1, e2, y_test): 102 | n = len(y_test) 103 | n00 = len(e1.intersection(e2)) 104 | n01 = len(e1.difference(e2)) 105 | n10 = len(e2.difference(e1)) 106 | n11 = n - len(e1.union(e2)) 107 | assert(n00+n01+n10+n11 == n) 108 | numer = n11*n00 - n01*n10 109 | denom = n11*n00 + n01*n10 110 | if numer == 0: 111 | return 0 112 | else: 113 | return numer / float(denom) 114 | 115 | def disagreement_measure(e1, e2, y_test): 116 | n = len(y_test) 117 | n01 = len(e1.difference(e2)) 118 | n10 = len(e2.difference(e1)) 119 | return (n01 + n10) / n 120 | 121 | def scoring_fun(y_pred, y_true): 122 | if len(y_true.shape) == 1: 123 | assert(y_true.max() == 1) # binary 124 | if len(y_pred.shape) == 1: 125 | preds = y_pred 126 | else: 127 | preds = y_pred[:,1] 128 | return 
roc_auc_score(y_true, preds) 129 | else: 130 | return accuracy_fun(y_pred, y_true) 131 | 132 | def accuracy_fun(y_pred, y_true): 133 | if len(y_true.shape) == 1: 134 | assert(y_true.max() == 1) # binary 135 | if len(y_pred.shape) == 1: 136 | preds = (y_pred > 0.5).astype(int) 137 | else: 138 | preds = np.argmax(y_pred, axis=1) 139 | return np.mean(y_true == preds) 140 | else: 141 | return np.mean(np.argmax(y_true, axis=1) == np.argmax(y_pred, axis=1)) 142 | 143 | def error_masks(y_pred, y_true): 144 | if len(y_true.shape) == 1: 145 | assert(y_true.max() == 1) # binary 146 | if len(y_pred.shape) == 1: 147 | preds = (y_pred > 0.5).astype(int) 148 | else: 149 | preds = np.argmax(y_pred, axis=1) 150 | return (preds != y_true).astype(int) 151 | else: 152 | return (np.argmax(y_true, axis=1) != np.argmax(y_pred, axis=1)).astype(int) 153 | --------------------------------------------------------------------------------