├── scripts
│   ├── .gitignore
│   ├── run_experiments.sh
│   └── make_figures.py
├── CODE_OF_CONDUCT.md
├── CONTRIBUTING.md
├── README.md
├── plotting.py
├── .gitignore
├── private_model_inversion.py
├── fisher_experiment.py
├── reweighted.py
├── models.py
├── test_jacobians.py
├── model_inversion.py
├── dataloading.py
└── LICENSE

/scripts/.gitignore:
--------------------------------------------------------------------------------
1 | *.pdf
2 | *.png
3 | *.json
4 | *.pth
5 |
--------------------------------------------------------------------------------
/CODE_OF_CONDUCT.md:
--------------------------------------------------------------------------------
1 | # Code of Conduct
2 |
3 | Facebook has adopted a Code of Conduct that we expect project participants to adhere to.
4 | Please read the [full text](https://code.fb.com/codeofconduct/)
5 | so that you can understand what actions will and will not be tolerated.
6 |
--------------------------------------------------------------------------------
/CONTRIBUTING.md:
--------------------------------------------------------------------------------
1 | # Contributing to `fisher_information_loss`
2 | We want to make contributing to this project as easy and transparent as
3 | possible.
4 |
5 | ## Pull Requests
6 | We actively welcome your pull requests.
7 |
8 | 1. Fork the repo and create your branch from `master`.
9 | 2. If you've added code that should be tested, add tests.
10 | 3. If you've changed APIs, update the documentation.
11 | 4. Ensure the test suite passes.
12 | 5. If you haven't already, complete the Contributor License Agreement ("CLA").
13 |
14 | ## Contributor License Agreement ("CLA")
15 | In order to accept your pull request, we need you to submit a CLA. You only need
16 | to do this once to work on any of Facebook's open source projects.
17 |
18 | Complete your CLA here:
19 |
20 | ## Issues
21 | We use GitHub issues to track public bugs. Please ensure your description is
22 | clear and has sufficient instructions to be able to reproduce the issue.
23 |
24 | ## License
25 | By contributing to `fisher_information_loss`, you agree that your contributions will be licensed
26 | under the LICENSE file in the root directory of this source tree.
27 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Fisher Information Loss
2 |
3 | This repository contains code that can be used to reproduce the experimental
4 | results presented in the paper:
5 |
6 | Awni Hannun, Chuan Guo and Laurens van der Maaten. Measuring Data Leakage in
7 | Machine-Learning Models with Fisher Information.
8 | [arXiv:2102.11673](https://arxiv.org/abs/2102.11673), 2021.
9 |
10 | # Installation
11 |
12 | The code requires Python 3.7+, [PyTorch
13 | 1.7.1+](https://pytorch.org/get-started/locally/), and torchvision 0.8.2+.
14 |
15 | Create an Anaconda environment and install the dependencies:
16 |
17 | ```
18 | conda create --name fil
19 | conda activate fil
20 | conda install -c pytorch pytorch torchvision
21 | pip install gitpython numpy
22 | ```
23 |
24 | # Usage
25 |
26 | The script `fisher_experiment.py` computes the per-example FIL for the given
27 | dataset and model.
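Concretely, the FIL of a training example is the spectral norm (largest
singular value) of that example's influence Jacobian, i.e. the Jacobian of the
learned parameters with respect to the example's features and label. A minimal
sketch of this final step (the helper name below is illustrative; the
repository's implementation is `compute_information_loss` in `models.py`):

```
import torch

def per_example_fil(J):
    # J has shape (N, d, d + 1): for each of the N training examples, the
    # Jacobian of the d model parameters with respect to that example's
    # features and its label. The per-example FIL eta_i is the spectral
    # norm of J_i.
    return torch.linalg.norm(J, ord=2, dim=(1, 2))
```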
An example run is:
28 |
29 | ```
30 | python fisher_experiment.py \
31 |     --dataset mnist \
32 |     --model least_squares
33 | ```
34 |
35 | To see usage options for the script run:
36 |
37 | ```
38 | python fisher_experiment.py --help
39 | ```
40 |
41 | Other scripts in the repository are:
42 | - `reweighted.py` : Run the iteratively reweighted Fisher information loss
43 |   (IRFIL) algorithm.
44 | - `model_inversion.py` : Attribute inversion experiments for a non-private
45 |   model.
46 | - `private_model_inversion.py` : Attribute inversion experiments for a private
47 |   model.
48 | - `test_jacobians.py` : Unit tests.
49 |
50 | To run the full set of experiments in the accompanying paper:
51 | ```
52 | cd scripts/ && ./run_experiments.sh
53 | ```
54 |
55 | # Citing this Repository
56 |
57 | If you use the code in this repository, please cite the following paper:
58 |
59 | ```
60 | @inproceedings{hannun2021fil,
61 |   title={Measuring Data Leakage in Machine-Learning Models with Fisher
62 |     Information},
63 |   author={Hannun, Awni and Guo, Chuan and van der Maaten, Laurens},
64 |   booktitle={Conference on Uncertainty in Artificial Intelligence},
65 |   year={2021}
66 | }
67 | ```
68 |
69 | # License
70 |
71 | This code is released under a CC-BY-NC 4.0 license. Please see the
72 | [LICENSE](LICENSE) file for more information.
73 |
74 | Please review Facebook Open Source [Terms of
75 | Use](https://opensource.facebook.com/legal/terms) and [Privacy
76 | Policy](https://opensource.facebook.com/legal/privacy).
77 |
78 |
--------------------------------------------------------------------------------
/plotting.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python3
2 |
3 | # Copyright (c) Facebook, Inc. and its affiliates.
4 | # All rights reserved.
5 |
6 | # This source code is licensed under the license found in the
7 | # LICENSE file in the root directory of this source tree.
8 |
9 |
10 | """
11 | A simple example of creating a figure with text rendered in LaTeX.
12 |
13 | https://jwalton.info/Embed-Publication-Matplotlib-Latex/
14 | """
15 |
16 | import seaborn as sns
17 | import matplotlib.pyplot as plt
18 |
19 | # Using seaborn's style
20 | plt.style.use('seaborn-white')
21 |
22 | WIDTH = 345
23 | GR = (5**.5 - 1) / 2
24 | FORMAT = "pdf"
25 |
26 | tex_fonts = {
27 |     # Use LaTeX to write all text
28 |     "text.usetex": True,
29 |     "font.family": "serif",
30 |     "axes.labelsize": 14,
31 |     "font.size": 14,
32 |     # Make the legend/label fonts a little smaller
33 |     "legend.fontsize": 12,
34 |     "xtick.labelsize": 12,
35 |     "ytick.labelsize": 12
36 | }
37 |
38 | plt.rcParams.update(tex_fonts)
39 | plt.rcParams.update({"legend.handlelength": 1})
40 |
41 | def savefig(filename):
42 |     plt.savefig(
43 |         filename + "."
+ FORMAT, format=FORMAT, dpi=1200, bbox_inches="tight") 44 | 45 | def line_plot( 46 | Y, X, xlabel=None, ylabel=None, ymax=None, ymin=None, 47 | xmax=None, xmin=None, filename=None, legend=None, errors=None, 48 | xlog=False, ylog=False, size=None, marker="s"): 49 | colors = sns.cubehelix_palette(Y.shape[0], start=2, rot=0, dark=0, light=.5) 50 | plt.clf() 51 | if legend is None: 52 | legend = [None] * Y.shape[0] 53 | 54 | if size is not None: 55 | plt.figure(figsize=size) 56 | 57 | for n in range(Y.shape[0]): 58 | x = X[n, :] if X.ndim == 2 else X 59 | plt.plot(x, Y[n, :], label=legend[n], color=colors[n], 60 | marker=marker, markersize=5) 61 | if errors is not None: 62 | plt.fill_between( 63 | x, Y[n, :] - errors[n, :], Y[n, :] + errors[n, :], 64 | alpha=0.1, color=colors[n]) 65 | 66 | if ymax is not None: 67 | plt.ylim(top=ymax) 68 | if ymin is not None: 69 | plt.ylim(bottom=ymin) 70 | if xmax is not None: 71 | plt.xlim(right=xmax) 72 | if xmin is not None: 73 | plt.xlim(left=xmin) 74 | 75 | plt.xlabel(xlabel) 76 | plt.ylabel(ylabel) 77 | if legend[0] is not None: 78 | plt.legend() 79 | 80 | axes = plt.gca() 81 | if xlog: 82 | axes.semilogx(10.) 83 | if ylog: 84 | axes.semilogy(10.) 85 | 86 | if filename is not None: 87 | savefig(filename) 88 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | # Misc. 2 | .nfs* 3 | .DS_Store 4 | 5 | # vim 6 | *.swp 7 | 8 | # Ignore results folder 9 | results/ 10 | 11 | # Byte-compiled / optimized / DLL files 12 | __pycache__/ 13 | *.py[cod] 14 | *$py.class 15 | 16 | # C extensions 17 | *.so 18 | 19 | # Distribution / packaging 20 | .Python 21 | build/ 22 | develop-eggs/ 23 | dist/ 24 | downloads/ 25 | eggs/ 26 | .eggs/ 27 | lib/ 28 | lib64/ 29 | parts/ 30 | sdist/ 31 | var/ 32 | wheels/ 33 | pip-wheel-metadata/ 34 | share/python-wheels/ 35 | *.egg-info/ 36 | .installed.cfg 37 | *.egg 38 | MANIFEST 39 | 40 | # PyInstaller 41 | # Usually these files are written by a python script from a template 42 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 43 | *.manifest 44 | *.spec 45 | 46 | # Installer logs 47 | pip-log.txt 48 | pip-delete-this-directory.txt 49 | 50 | # Unit test / coverage reports 51 | htmlcov/ 52 | .tox/ 53 | .nox/ 54 | .coverage 55 | .coverage.* 56 | .cache 57 | nosetests.xml 58 | coverage.xml 59 | *.cover 60 | *.py,cover 61 | .hypothesis/ 62 | .pytest_cache/ 63 | 64 | # Translations 65 | *.mo 66 | *.pot 67 | 68 | # Django stuff: 69 | *.log 70 | local_settings.py 71 | db.sqlite3 72 | db.sqlite3-journal 73 | 74 | # Flask stuff: 75 | instance/ 76 | .webassets-cache 77 | 78 | # Scrapy stuff: 79 | .scrapy 80 | 81 | # Sphinx documentation 82 | docs/_build/ 83 | 84 | # PyBuilder 85 | target/ 86 | 87 | # Jupyter Notebook 88 | .ipynb_checkpoints 89 | 90 | # IPython 91 | profile_default/ 92 | ipython_config.py 93 | 94 | # pyenv 95 | .python-version 96 | 97 | # pipenv 98 | # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control. 99 | # However, in case of collaboration, if having platform-specific dependencies or dependencies 100 | # having no cross-platform support, pipenv may install dependencies that don't work, or not 101 | # install all needed dependencies. 102 | #Pipfile.lock 103 | 104 | # PEP 582; used by e.g. 
github.com/David-OConnor/pyflow 105 | __pypackages__/ 106 | 107 | # Celery stuff 108 | celerybeat-schedule 109 | celerybeat.pid 110 | 111 | # SageMath parsed files 112 | *.sage.py 113 | 114 | # Environments 115 | .env 116 | .venv 117 | env/ 118 | venv/ 119 | ENV/ 120 | env.bak/ 121 | venv.bak/ 122 | 123 | # Spyder project settings 124 | .spyderproject 125 | .spyproject 126 | 127 | # Rope project settings 128 | .ropeproject 129 | 130 | # mkdocs documentation 131 | /site 132 | 133 | # mypy 134 | .mypy_cache/ 135 | .dmypy.json 136 | dmypy.json 137 | 138 | # Pyre type checker 139 | .pyre/ 140 | -------------------------------------------------------------------------------- /scripts/run_experiments.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | # Copyright (c) Facebook, Inc. and its affiliates. 4 | # All rights reserved. 5 | 6 | # This source code is licensed under the license found in the 7 | # LICENSE file in the root directory of this source tree. 8 | 9 | REPO_ROOT=`pwd` 10 | RESULT_FOLDER=$REPO_ROOT/uai2021 11 | 12 | ### MNIST AND CIFAR EXPERIMENTS ### 13 | 14 | # Linear regression for MNIST and CIFAR-10 15 | for DATASET in "mnist" "cifar10" 16 | do 17 | RESULTS_FILE="${RESULT_FOLDER}/${DATASET}_linear_pca20.json" 18 | python $REPO_ROOT/fisher_experiment.py \ 19 | --dataset $DATASET \ 20 | --model least_squares \ 21 | --trials 1 \ 22 | --results_file $RESULTS_FILE 23 | done 24 | 25 | # Logistic regression for MNIST and CIFAR-10 26 | RESULTS_FILE="${RESULT_FOLDER}/mnist_logistic_pca20.json" 27 | python $REPO_ROOT/fisher_experiment.py \ 28 | --dataset mnist \ 29 | --model logistic \ 30 | --l2 8e-4 \ 31 | --trials 1 \ 32 | --results_file $RESULTS_FILE 33 | 34 | RESULTS_FILE="${RESULT_FOLDER}/cifar10_logistic_pca20.json" 35 | python $REPO_ROOT/fisher_experiment.py \ 36 | --dataset cifar10 \ 37 | --model logistic \ 38 | --l2 8e-5 \ 39 | --trials 1 \ 40 | --results_file $RESULTS_FILE 41 | 42 | # Iteratively reweighted FP for linear regression 43 | for DATASET in "mnist" "cifar10" 44 | do 45 | RESULTS_FILE="${RESULT_FOLDER}/${DATASET}_linear_reweighted" 46 | python $REPO_ROOT/reweighted.py \ 47 | --dataset $DATASET \ 48 | --model least_squares \ 49 | --weight_method sample \ 50 | --iters 15 \ 51 | --results_file $RESULTS_FILE 52 | done 53 | 54 | # Iteratively reweighted FP for logistic regression 55 | RESULTS_FILE="${RESULT_FOLDER}/mnist_logistic_reweighted" 56 | python $REPO_ROOT/reweighted.py \ 57 | --dataset mnist \ 58 | --model logistic \ 59 | --weight_method sample \ 60 | --iters 15 \ 61 | --l2 8e-4 \ 62 | --results_file $RESULTS_FILE 63 | 64 | RESULTS_FILE="${RESULT_FOLDER}/cifar10_logistic_reweighted" 65 | python $REPO_ROOT/reweighted.py \ 66 | --dataset cifar10 \ 67 | --model logistic \ 68 | --weight_method sample \ 69 | --iters 15 \ 70 | --l2 8e-5 \ 71 | --results_file $RESULTS_FILE 72 | 73 | 74 | ### IWPC EXPERIMENTS ### 75 | 76 | DATASET="iwpc" 77 | MODEL="least_squares" 78 | 79 | # For L2 and sigma inversion plots: 80 | for L2 in "1e-5" "1e-3" "1e-1" "1" 81 | do 82 | FIL_RESULTS="${RESULT_FOLDER}/${DATASET}_${MODEL}_fil_l2_${L2}" 83 | python $REPO_ROOT/reweighted.py \ 84 | --dataset $DATASET \ 85 | --model $MODEL \ 86 | --pca_dims 0 \ 87 | --no_norm \ 88 | --l2 $L2 \ 89 | --attribute 11 13 \ 90 | --results_file $FIL_RESULTS 91 | 92 | INVERSION_RESULTS="${RESULT_FOLDER}/${DATASET}_${MODEL}_inversion_l2_${L2}.json" 93 | python $REPO_ROOT/model_inversion.py \ 94 | --inverter all \ 95 | --dataset $DATASET \ 96 | --model $MODEL 
\ 97 | --l2 $L2 \ 98 | --results_file $INVERSION_RESULTS 99 | 100 | for INVERTER in 'fredrikson14' 'whitebox' 101 | do 102 | INVERSION_RESULTS="${RESULT_FOLDER}/${DATASET}_${MODEL}_${INVERTER}_private_inversion_l2_${L2}.json" 103 | python $REPO_ROOT/private_model_inversion.py \ 104 | --dataset $DATASET \ 105 | --trials 100 \ 106 | --noise_scales 1e-5 2e-5 5e-5 1e-4 2e-4 5e-4 1e-3 2e-3 5e-3 1e-2 2e-2 5e-2 1e-1 2e-1 5e-1 1 \ 107 | --inverter $INVERTER \ 108 | --model $MODEL \ 109 | --l2 $L2 \ 110 | --results_file $INVERSION_RESULTS 111 | done 112 | done 113 | 114 | # For IRFIL inversion plots: 115 | L2=1e-2 116 | IRFIL_RESULTS="${RESULT_FOLDER}/${DATASET}_${MODEL}_irfil" 117 | python $REPO_ROOT/reweighted.py \ 118 | --dataset $DATASET \ 119 | --model $MODEL \ 120 | --pca_dims 0 \ 121 | --iters 10 \ 122 | --no_norm \ 123 | --l2 $L2 \ 124 | --attribute 11 13 \ 125 | --results_file $IRFIL_RESULTS 126 | 127 | for INVERTER in 'fredrikson14' 'whitebox' 128 | do 129 | INVERSION_RESULTS="${RESULT_FOLDER}/${DATASET}_${MODEL}_${INVERTER}_private_inversion_irfil.json" 130 | python $REPO_ROOT/private_model_inversion.py \ 131 | --dataset $DATASET \ 132 | --trials 100 \ 133 | --noise_scales 1e-4 1e-3 1e-2 \ 134 | --inverter $INVERTER \ 135 | --model $MODEL \ 136 | --l2 $L2 \ 137 | --weights_file ${IRFIL_RESULTS}.pth \ 138 | --results_file $INVERSION_RESULTS 139 | done 140 | 141 | ### UCI ADULT EXPERIMENTS ### 142 | DATASET="uciadult" 143 | MODEL="least_squares" 144 | L2=1e-3 145 | 146 | RESULTS_FILE="${RESULT_FOLDER}/${DATASET}_${MODEL}_inversion.json" 147 | python $REPO_ROOT/model_inversion.py \ 148 | --dataset $DATASET \ 149 | --model $MODEL \ 150 | --l2 $L2 \ 151 | --inverter all \ 152 | --results_file $RESULTS_FILE 153 | 154 | IRFIL_RESULTS="${RESULT_FOLDER}/${DATASET}_${MODEL}_irfil" 155 | python $REPO_ROOT/reweighted.py \ 156 | --dataset $DATASET \ 157 | --model $MODEL \ 158 | --iters 10 \ 159 | --l2 $L2 \ 160 | --pca_dims 0 \ 161 | --no_norm \ 162 | --attribute 24 25 \ 163 | --results_file $IRFIL_RESULTS 164 | 165 | RESULTS_FILE="${RESULT_FOLDER}/${DATASET}_${MODEL}_whitebox_private_inversion_irfil.json" 166 | python $REPO_ROOT/private_model_inversion.py \ 167 | --dataset $DATASET \ 168 | --model $MODEL \ 169 | --l2 $L2 \ 170 | --inverter whitebox \ 171 | --trials 100 \ 172 | --noise_scales 1e-4 1e-3 1e-2 1e-1 1 \ 173 | --weights_file ${IRFIL_RESULTS}.pth \ 174 | --results_file $RESULTS_FILE 175 | -------------------------------------------------------------------------------- /private_model_inversion.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | # Copyright (c) Facebook, Inc. and its affiliates. 4 | # All rights reserved. 5 | 6 | # This source code is licensed under the license found in the 7 | # LICENSE file in the root directory of this source tree. 
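"""
Attribute inversion attacks against output-perturbed ("private") models.

Trains a model, adds Gaussian noise of a given scale to its parameters, and
measures how well the Fredrikson et al. or whitebox inverter recovers the
target attribute, along with the noisy model's train and test performance.
"""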
8 | 9 | import argparse 10 | import json 11 | import logging 12 | import math 13 | import time 14 | import torch 15 | 16 | import models 17 | import dataloading 18 | from model_inversion import fredrikson14_inverter, WhiteboxInverter, compute_metrics, features_to_category 19 | 20 | # set up logger: 21 | logger = logging.getLogger() 22 | logger.setLevel(logging.INFO) 23 | 24 | 25 | def eval_model(model, data, regression): 26 | predictions = model.predict(data["features"], regression=regression) 27 | if regression: 28 | acc = (predictions - data["targets"]).pow(2).mean() 29 | else: 30 | acc = ((predictions == data["targets"]).float()).mean() 31 | return acc.item() 32 | 33 | 34 | def run_inversion(args, data, test_data, weights): 35 | regression = (args.dataset == "iwpc" or args.dataset == "synth") 36 | 37 | # Train model: 38 | model = models.get_model(args.model) 39 | logging.info(f"Training model {args.model}") 40 | model.train(data, l2=args.l2, weights=weights) 41 | 42 | if args.dataset == "uciadult": 43 | target_attribute = (24, 25) # [not married, married] 44 | elif args.dataset == "iwpc": 45 | #target_attribute = (2, 7) # CYP2C9 genotype 46 | target_attribute = (11, 13) # VKORC1 genotype 47 | elif args.dataset == "synth": 48 | target_attribute = (0, 2) 49 | else: 50 | raise NotImplementedError("Dataset not yet implemented.") 51 | 52 | if args.inverter == "fredrikson14": 53 | def invert(private_model, noise_scale=None): 54 | return fredrikson14_inverter( 55 | data, target_attribute, private_model, weights) 56 | invert_fn = invert 57 | elif args.inverter == "whitebox": 58 | inverter = WhiteboxInverter( 59 | data, target_attribute, type(model), weights, args.l2) 60 | def invert(private_model, noise_scale=None): 61 | return inverter.predict(private_model, gamma=noise_scale) 62 | invert_fn = invert 63 | 64 | results = {} 65 | theta = model.get_params() 66 | for noise_scale in args.noise_scales: 67 | logging.info(f"Running inversion for noise scale {noise_scale}.") 68 | all_predictions = [] 69 | train_accs = [] 70 | test_accs = [] 71 | for trial in range(args.trials): 72 | # Add noise: 73 | theta_priv = theta + torch.randn_like(theta) * noise_scale 74 | model.set_params(theta_priv) 75 | # Check train and test predictions: 76 | train_acc = eval_model(model, data, regression) 77 | test_acc = eval_model(model, test_data, regression) 78 | if regression: 79 | logging.info(f"MSE Train {train_acc:.3f}, MSE Test {test_acc:.3f}.") 80 | else: 81 | logging.info(f"Acc Train {train_acc:.3f}, Acc Test {test_acc:.3f}.") 82 | predictions = invert_fn(model, noise_scale=noise_scale) 83 | acc = compute_metrics(data, predictions, target_attribute) 84 | logging.info(f"Private inversion accuracy {acc:.4f}") 85 | all_predictions.append(predictions.tolist()) 86 | train_accs.append(train_acc) 87 | test_accs.append(test_acc) 88 | 89 | results[noise_scale] = { 90 | "predictions" : all_predictions, 91 | "train_acc" : train_accs, 92 | "test_acc" : test_accs, 93 | } 94 | 95 | results["target"] = features_to_category( 96 | data["features"][:, range(*target_attribute)]).tolist() 97 | return results 98 | 99 | 100 | def main(args): 101 | regression = (args.dataset == "iwpc" or args.dataset == "synth") 102 | data = dataloading.load_dataset( 103 | name=args.dataset, split="train", normalize=False, 104 | num_classes=2, root=args.data_folder, regression=regression) 105 | test_data = dataloading.load_dataset( 106 | name=args.dataset, split="test", normalize=False, 107 | num_classes=2, root=args.data_folder, regression=regression) 
108 | 109 | if args.subsample > 0: 110 | data = dataloading.subsample(data, args.subsample) 111 | 112 | if args.weights_file is not None: 113 | all_weights = torch.load(args.weights_file) 114 | else: 115 | all_weights = [torch.ones(len(data["targets"]))] 116 | 117 | results = [] 118 | for it, weights in enumerate(all_weights): 119 | if len(all_weights) > 1: 120 | logging.info(f"Iteration {it} weights for model inversion.") 121 | results.append(run_inversion(args, data, test_data, weights)) 122 | 123 | if args.results_file is not None: 124 | with open(args.results_file, 'w') as fid: 125 | json.dump(results, fid) 126 | 127 | 128 | if __name__ == "__main__": 129 | parser = argparse.ArgumentParser(description="Model inversion.") 130 | parser.add_argument("--data_folder", default="/tmp", type=str, 131 | help="folder in which to store data (default: '/tmp')") 132 | parser.add_argument("--dataset", default="uciadult", type=str, 133 | choices=["uciadult", "iwpc", "synth"], 134 | help="dataset to use.") 135 | parser.add_argument("--model", default="least_squares", type=str, 136 | choices=["least_squares", "logistic"], 137 | help="type of model (default: least_squares)") 138 | parser.add_argument("--l2", default=0, type=float, 139 | help="l2 regularization parameter") 140 | parser.add_argument("--inverter", default="whitebox", type=str, 141 | choices=["fredrikson14", "whitebox"], 142 | help="inversion method to use (default: whitebox)") 143 | parser.add_argument("--noise_scales", metavar='N', type=float, 144 | nargs='+', default=[0], 145 | help="Gaussian noise scales for output perturbation") 146 | parser.add_argument("--trials", default=1, type=int, 147 | help="number of noise vectors to test") 148 | parser.add_argument("--subsample", default=0, type=int, 149 | help="number of training examples") 150 | parser.add_argument("--weights_file", default=None, type=str, 151 | help="(optional) file to load IRFIL weights from") 152 | parser.add_argument("--results_file", default=None, type=str, 153 | help="(optional) path to save results") 154 | args = parser.parse_args() 155 | main(args) 156 | -------------------------------------------------------------------------------- /fisher_experiment.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | # Copyright (c) Facebook, Inc. and its affiliates. 4 | # All rights reserved. 5 | 6 | # This source code is licensed under the license found in the 7 | # LICENSE file in the root directory of this source tree. 
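"""
Per-example Fisher information loss (FIL) experiment.

Trains a linear or logistic model, computes every training example's influence
Jacobian and its FIL (eta), optionally retrains after clipping high-eta
examples, and evaluates test accuracy under Gaussian output perturbation
calibrated to a set of target eta values.
"""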
8 | 9 | import argparse 10 | import json 11 | import logging 12 | import math 13 | import time 14 | import torch 15 | 16 | import models 17 | import dataloading 18 | 19 | # set up logger: 20 | logger = logging.getLogger() 21 | logger.setLevel(logging.INFO) 22 | 23 | 24 | def compute_accuracy(model, data, noise_scale=0, trials=100, regression=False): 25 | accuracies = [] 26 | X, y = data["features"], data["targets"] 27 | theta = model.get_params() 28 | for _ in range(trials): 29 | theta_priv = theta + torch.randn_like(theta) * noise_scale 30 | model.set_params(theta_priv) 31 | if regression: 32 | acc = model.loss(data).mean().item() 33 | else: 34 | predictions = model.predict(X) 35 | acc = ((predictions == y).float()).mean().item() 36 | accuracies.append(acc) 37 | model.set_params(theta) 38 | accuracies = torch.tensor(accuracies) 39 | return torch.mean(accuracies).item(), torch.std(accuracies).item() 40 | 41 | 42 | def clip_data(data, etas, clip): 43 | keep_ids = etas < clip 44 | return {"features" : data["features"][keep_ids, ...], 45 | "targets" : data["targets"][keep_ids]} 46 | 47 | 48 | def eval_comparison_stats(model, data): 49 | theta = model.get_params() 50 | theta.requires_grad = True 51 | 52 | # compute per sample losses: 53 | losses = model.loss(data) 54 | 55 | # compute the norm of the gradient of the loss at each sample 56 | # w.r.t. the model weights: 57 | def func(theta): 58 | model.theta = theta 59 | return model.loss(data) 60 | ind_grads = torch.autograd.functional.jacobian(func, theta) 61 | grad_norms = ind_grads.norm(dim=1) 62 | 63 | theta.requires_grad = False 64 | 65 | # compute per sample inner product with weights 66 | weight_dots = data["features"] @ theta 67 | 68 | return losses.tolist(), weight_dots.tolist(), grad_norms.tolist() 69 | 70 | def main(args): 71 | regression = (args.dataset == "iwpc") 72 | data = dataloading.load_dataset( 73 | name=args.dataset, split="train", normalize=not args.no_norm, 74 | num_classes=2, root=args.data_folder, regression=regression) 75 | test_data = dataloading.load_dataset( 76 | name=args.dataset, split="test", normalize=not args.no_norm, 77 | num_classes=2, root=args.data_folder, regression=regression) 78 | if args.pca_dims > 0: 79 | data, pca = dataloading.pca(data, num_dims=args.pca_dims) 80 | test_data, _ = dataloading.pca(test_data, mapping=pca) 81 | 82 | model = models.get_model(args.model) 83 | 84 | # Find the optimal parameters for the model: 85 | logging.info(f"Training model {args.model}") 86 | model.train(data, l2=args.l2) 87 | 88 | # Check predictions for sanity: 89 | accuracy, _ = compute_accuracy(model, data, regression=regression) 90 | if regression: 91 | logging.info("Training MSE of classifier {:.3f}".format(accuracy)) 92 | else: 93 | logging.info("Training accuracy of classifier {:.3f}".format(accuracy)) 94 | 95 | # Compute the Jacobian of the influence of each example on the optimal 96 | # parameters: 97 | logging.info(f"Computing influence Jacobian on training set...") 98 | start = time.time() 99 | J = model.influence_jacobian(data) 100 | time_per_sample = 1e3 * (time.time() - start) / len(data["targets"]) 101 | logging.info("Time taken per example {:.3f} (ms)".format(time_per_sample)) 102 | 103 | # Compute the Fisher information loss from the FIM (J^T J) for each example 104 | # in the training set (J^T J is the Fisher information with Gaussian output 105 | # perturbation on the parameters at a scale of 1): 106 | start = time.time() 107 | logging.info(f"Computing Fisher information loss...") 108 | etas = 
models.compute_information_loss(J) 109 | time_per_sample = 1e3 * (time.time() - start) / len(etas) 110 | logging.info( 111 | "Computed {} examples, maximum eta: {:.3f}, " 112 | "time per sample {:.3f} (ms).".format( 113 | len(etas), max(etas), time_per_sample)) 114 | 115 | # Compute some comparison points: 116 | losses, weight_dots, grad_norms = eval_comparison_stats(model, data) 117 | 118 | # Retrain the model and measure the new etas if removing most lossy 119 | # examples: 120 | if args.clip > 0: 121 | clipped_data = clip_data(data, etas, args.clip) 122 | logging.info( 123 | "Kept {}/{} samples, retrain and compute eta..".format( 124 | len(clipped_data["targets"]), len(data["targets"]))) 125 | model.train(clipped_data, l2=args.l2) 126 | J = model.influence_jacobian(clipped_data) 127 | etas = models.compute_information_loss(J) 128 | etamax = max(etas) 129 | else: 130 | etamax = max(etas) 131 | 132 | # Measure the test accuracy as a function of the noise needed to attain a 133 | # desired eta: 134 | accuracies = [] 135 | stds = [] 136 | for eta in args.etas: 137 | # Compute the Gaussian noise scale needed for eta: 138 | scale = etamax / eta 139 | # Measure test accuracy: 140 | accuracy, std = compute_accuracy( 141 | model, test_data, noise_scale=scale, trials=args.trials, 142 | regression=regression) 143 | accuracies.append(accuracy) 144 | stds.append(std) 145 | 146 | results = { 147 | "clip" : args.clip, 148 | "accuracies" : accuracies, 149 | "stds" : stds, 150 | "etas" : etas.tolist(), 151 | "train_losses" : losses, 152 | "train_dot_weights" : weight_dots, 153 | "train_grad_norms" : grad_norms, 154 | } 155 | with open(args.results_file, 'w') as fid: 156 | json.dump(results, fid) 157 | 158 | 159 | if __name__ == "__main__": 160 | parser = argparse.ArgumentParser(description="Fisher information loss.") 161 | parser.add_argument("--data_folder", default="/tmp", type=str, 162 | help="folder in which to store data (default: '/tmp')") 163 | parser.add_argument("--dataset", default="mnist", type=str, 164 | choices=["mnist", "cifar10", "cifar100", "uciadult", "iwpc"], 165 | help="dataset to use.") 166 | parser.add_argument("--model", default="least_squares", type=str, 167 | choices=["least_squares", "logistic"], 168 | help="type of model (default: least_squares)") 169 | parser.add_argument("--pca_dims", default=20, type=int, 170 | help="Number of PCA dimensions (if 0, uses raw features)") 171 | parser.add_argument("--no_norm", default=False, action="store_true", 172 | help="Don't normalize examples to lie in unit ball") 173 | parser.add_argument("--l2", default=0, type=float, 174 | help="l2 regularization parameter") 175 | parser.add_argument("--results_file", 176 | default="/tmp/private_model_results.json", type=str, 177 | help="file in which to save the results") 178 | parser.add_argument('--etas', metavar='N', type=float, 179 | nargs='+', default=[1.0], 180 | help='Fisher information loss levels (eta) to evaluate accuracy') 181 | parser.add_argument('--clip', type=float, default=0.0, 182 | help='eta removal threshold for data') 183 | parser.add_argument('--trials', type=int, default=100, 184 | help='number of trials to evaluate an output perturbed model') 185 | args = parser.parse_args() 186 | main(args) 187 | -------------------------------------------------------------------------------- /reweighted.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | # Copyright (c) Facebook, Inc. and its affiliates. 
4 | # All rights reserved. 5 | 6 | # This source code is licensed under the license found in the 7 | # LICENSE file in the root directory of this source tree. 8 | 9 | import argparse 10 | import json 11 | import logging 12 | import math 13 | import time 14 | import torch 15 | 16 | import models 17 | import dataloading 18 | 19 | # set up logger: 20 | logger = logging.getLogger() 21 | logger.setLevel(logging.INFO) 22 | 23 | 24 | def compute_accuracy(model, data, regression=False): 25 | X, y = data["features"], data["targets"] 26 | if regression: 27 | acc = model.loss(data).mean().item() 28 | else: 29 | predictions = model.predict(X) 30 | acc = ((predictions == y).float()).mean().item() 31 | return acc 32 | 33 | 34 | def get_weights(method, prev_weights, data): 35 | weights = torch.ones(len(data["targets"])) 36 | if method == "sample": 37 | weights[:] = prev_weights.data 38 | elif method == "class": 39 | n_class = data["targets"].max() + 1 40 | for c in range(n_class): 41 | mask = data["targets"] == c 42 | weights[mask] = prev_weights[mask].mean() 43 | elif method == "hybrid": 44 | n_class = data["targets"].max() + 1 45 | for c in range(n_class): 46 | mask = data["targets"] == c 47 | weights[mask] = prev_weights[mask].mean() 48 | weights *= prev_weights 49 | else: 50 | raise ValueError(f"Invalid weight method {method}.") 51 | return weights 52 | 53 | 54 | def main(args): 55 | regression = args.dataset == "iwpc" or args.dataset == "synth" 56 | data = dataloading.load_dataset( 57 | name=args.dataset, split="train", normalize=not args.no_norm, 58 | num_classes=2, root=args.data_folder, regression=regression) 59 | test_data = dataloading.load_dataset( 60 | name=args.dataset, split="test", normalize=not args.no_norm, 61 | num_classes=2, root=args.data_folder, regression=regression) 62 | if args.pca_dims > 0: 63 | data, pca = dataloading.pca(data, num_dims=args.pca_dims) 64 | test_data, _ = dataloading.pca(test_data, mapping=pca) 65 | 66 | model = models.get_model(args.model) 67 | 68 | # Find the optimal parameters for the model: 69 | logging.info(f"Training {args.model} model.") 70 | model.train(data, l2=args.l2) 71 | 72 | train_accuracy = compute_accuracy(model, data, regression=regression) 73 | test_accuracy = compute_accuracy(model, test_data, regression=regression) 74 | if regression: 75 | logging.info(f"MSE train {train_accuracy:.3f}," 76 | f" test: {test_accuracy:.3f}.") 77 | else: 78 | logging.info(f"Accuracy train {train_accuracy:.3f}," 79 | f" test: {test_accuracy:.3f}.") 80 | 81 | # Compute the Fisher information loss, eta, for each example in the 82 | # training set: 83 | logging.info("Computing unweighted etas on training set...") 84 | J = model.influence_jacobian(data) 85 | etas = models.compute_information_loss(J, target_attribute=args.attribute, 86 | constrained=args.constrained) 87 | logging.info(f"etas max: {etas.max().item():.4f}," 88 | f" mean: {etas.mean().item():.4f}, std: {etas.std().item():.4f}.") 89 | 90 | # Reweight using the fisher information loss: 91 | updated_fi = etas.reciprocal().detach() 92 | maxs = [etas.max().item()] 93 | means = [etas.mean().item()] 94 | stds = [etas.std().item()] 95 | train_accs = [train_accuracy] 96 | test_accs = [test_accuracy] 97 | all_weights = [torch.ones(len(updated_fi))] 98 | for i in range(args.iters): 99 | logging.info(f"Iter {i}: Training weighted model...") 100 | updated_fi *= (len(updated_fi) / updated_fi.sum()) 101 | # TODO does it make sense to renormalize after clamping? 
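        # updated_fi is the running product of reciprocal etas, rescaled
        # above to have mean 1; the clamp below keeps any single example
        # from being dropped entirely or from dominating the weighted fit.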
102 | updated_fi.clamp_(min=args.min_weight, max=args.max_weight) 103 | weights = get_weights(args.weight_method, updated_fi, data) 104 | model.train(data, l2=args.l2, weights=weights.detach()) 105 | 106 | # Check predictions of weighted model: 107 | train_accuracy = compute_accuracy(model, data, regression=regression) 108 | test_accuracy = compute_accuracy(model, test_data, regression=regression) 109 | if regression: 110 | logging.info(f"Weighted model MSE train {train_accuracy:.3f}," 111 | f" test: {test_accuracy:.3f}.") 112 | else: 113 | logging.info(f"Weighted model accuracy train {train_accuracy:.3f}," 114 | f" test: {test_accuracy:.3f}.") 115 | 116 | J = model.influence_jacobian(data) 117 | weighted_etas = models.compute_information_loss(J, target_attribute=args.attribute, 118 | constrained=args.constrained) 119 | updated_fi /= weighted_etas 120 | maxs.append(weighted_etas.max().item()) 121 | means.append(weighted_etas.mean().item()) 122 | stds.append(weighted_etas.std().item()) 123 | train_accs.append(train_accuracy) 124 | test_accs.append(test_accuracy) 125 | all_weights.append(weights) 126 | logging.info(f"Weighted etas max: {maxs[-1]:.4f}," 127 | f" mean: {means[-1]:.4f}," 128 | f" std: {stds[-1]:.4f}.") 129 | 130 | results = { 131 | "weights" : weights.tolist(), 132 | "etas" : etas.tolist(), 133 | "weighted_etas" : weighted_etas.tolist(), 134 | "eta_maxes" : maxs, 135 | "eta_means" : means, 136 | "eta_stds" : stds, 137 | "train_accs" : train_accs, 138 | "test_accs" : test_accs, 139 | } 140 | 141 | with open(args.results_file + ".json", 'w') as fid: 142 | json.dump(results, fid) 143 | torch.save(torch.stack(all_weights), args.results_file + ".pth") 144 | 145 | 146 | if __name__ == "__main__": 147 | parser = argparse.ArgumentParser(description="Information loss.") 148 | parser.add_argument("--data_folder", default="/tmp", type=str, 149 | help="folder in which to store data (default: '/tmp')") 150 | parser.add_argument("--dataset", default="mnist", type=str, 151 | choices=["mnist", "cifar10", "cifar100", "uciadult", "iwpc", "synth"], 152 | help="dataset to use.") 153 | parser.add_argument("--model", default="least_squares", type=str, 154 | choices=["least_squares", "logistic"], 155 | help="type of model (default: least_squares)") 156 | parser.add_argument("--weight_method", default="sample", type=str, 157 | choices=["sample", "class", "hybrid"], 158 | help="Method to weight the loss by (default: sample)") 159 | parser.add_argument("--min_weight", default=0, type=float, 160 | help="Minimum per-sample weight (default: 0)") 161 | parser.add_argument("--max_weight", default=float("inf"), type=float, 162 | help="Maximum per-sample weight (default: inf)") 163 | parser.add_argument("--attribute", default=None, nargs="+", type=int, 164 | help="Which attributes to reweight for privacy (None for full)") 165 | parser.add_argument("--iters", default=1, type=int, 166 | help="Number of iterations.") 167 | parser.add_argument("--pca_dims", default=20, type=int, 168 | help="Number of PCA dimensions (if 0, uses raw features)") 169 | parser.add_argument("--no_norm", default=False, action="store_true", 170 | help="Don't normalize examples to lie in unit ball") 171 | parser.add_argument("--constrained", default=False, action="store_true", 172 | help="Use constrained Fisher information matrix") 173 | parser.add_argument("--l2", default=0, type=float, 174 | help="l2 regularization parameter") 175 | parser.add_argument("--results_file", 176 | default="/tmp/private_model_results", type=str, 177 | 
help="file in which to save the results") 178 | args = parser.parse_args() 179 | main(args) 180 | -------------------------------------------------------------------------------- /models.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | # Copyright (c) Facebook, Inc. and its affiliates. 4 | # All rights reserved. 5 | 6 | # This source code is licensed under the license found in the 7 | # LICENSE file in the root directory of this source tree. 8 | 9 | import torch 10 | 11 | 12 | def compute_information_loss(J, target_attribute=None, constrained=False): 13 | """ 14 | Compute the Fisher information loss, eta, from the largest eigenvalues 15 | of the unscaled Fisher information matrix for each sample. 16 | 17 | Arguments: 18 | target_attribute: Used to specify a range of attributes 19 | (default None for all attributes) 20 | constrained: If True, constrain the attributes to sum to 1 by 21 | multiplying FIM with an orthogonal matrix U 22 | """ 23 | if target_attribute is not None: 24 | J = J[:, :, range(*target_attribute)] 25 | if constrained: 26 | d = J.size(2) 27 | assert d > 1, 'Cannot constrain 1-dimensional attribute vector' 28 | U = torch.vstack([torch.zeros(1, d-1), torch.ones(d-1, d-1).tril()]) 29 | U /= -U.sum(0).unsqueeze(0) 30 | U += torch.vstack([torch.eye(d-1), torch.zeros(1, d-1)]) 31 | U /= U.norm(2, 0).unsqueeze(0) 32 | J = J @ U 33 | return torch.linalg.norm(J, ord=2, dim=(1, 2)) 34 | 35 | 36 | def get_model(model_type): 37 | if type(model_type) is type: 38 | return model_type() 39 | 40 | if model_type == "least_squares": 41 | return LeastSquares() 42 | elif model_type == "logistic": 43 | return Logistic() 44 | raise ValueError(f"Unknown model type {model_type}") 45 | 46 | 47 | def weighted_least_squares_jacobian(A, theta, X, y, w): 48 | """ 49 | Jacobian of the remove and update function with respect to the update. 50 | """ 51 | r = (X @ theta - y)[:, None, None] 52 | XA = X @ A 53 | JX = -(r * A.unsqueeze(0) + XA.unsqueeze(2) * theta[None, None, :]) 54 | return w[:, None, None] * JX, w[:, None] * XA 55 | 56 | 57 | def least_squares_jacobian(A, theta, X, y): 58 | """ 59 | Jacobian of the remove and update function with respect to the update. 60 | """ 61 | r = (X @ theta - y)[:, None, None] 62 | XA = X @ A 63 | JX = -(r * A.unsqueeze(0) + XA.unsqueeze(2) * theta[None, None, :]) 64 | return JX, XA 65 | 66 | 67 | class LeastSquares: 68 | 69 | def train(self, data, l2=0, weights=None): 70 | n = len(data["targets"]) 71 | if weights is None: 72 | weights = torch.ones(n) 73 | assert len(weights) == n, "Invalid number of weights" 74 | 75 | # Save the weights for the jacobian: 76 | self.weights = weights 77 | 78 | X = data["features"] 79 | y = data["targets"].float() 80 | # [-1, 1] works much better for regression 81 | y[y == 0] = -1 82 | XTX = (weights[:, None] * X).T @ X 83 | XTXdiag = torch.diagonal(XTX) 84 | XTXdiag += (n * l2) 85 | b = X.T @ (weights * y) 86 | theta = torch.solve(b[:, None], XTX)[0].squeeze(1) 87 | # Need A to compute the Jacobian. 88 | A = torch.inverse(XTX) 89 | self.A = A 90 | self.theta = theta 91 | 92 | def get_params(self): 93 | return self.theta 94 | 95 | def set_params(self, theta): 96 | self.theta = theta 97 | 98 | def predict(self, X, regression=False): 99 | """ 100 | Given a data matrix X with examples as rows, 101 | returns a {0, 1} prediction for each x in X. 
102 | """ 103 | if regression: 104 | return X @ self.theta 105 | else: 106 | return (X @ self.theta) > 0 107 | 108 | def loss(self, data): 109 | """ 110 | Evaluate the loss for each example in a given dataset. 111 | """ 112 | X = data["features"] 113 | y = data["targets"].float() 114 | # [-1, 1] works much better for regression 115 | y[y == 0] = -1 116 | return (X @ self.theta - y)**2 / 2 117 | 118 | def influence_jacobian(self, data, weighted=True): 119 | """ 120 | Compute the Jacobian of the influence of each 121 | example on the optimal parameters. The resulting 122 | Jacobian will have shape N x d x (d+1) where N is 123 | the number of data points. 124 | """ 125 | X = data["features"] 126 | y = data["targets"].float() 127 | y[y == 0] = -1 128 | if weighted: 129 | JX, Jy = weighted_least_squares_jacobian( 130 | self.A, self.theta, X, y, self.weights) 131 | else: 132 | JX, Jy = least_squares_jacobian( 133 | self.A, self.theta, X, y) 134 | return torch.cat([JX, Jy.unsqueeze(2)], dim=2) 135 | 136 | 137 | class Logistic: 138 | 139 | def train(self, data, l2=0, init=None, weights=None): 140 | n = len(data["targets"]) 141 | if weights is None: 142 | weights = torch.ones(n) 143 | assert len(weights) == n, "Invalid number of weights" 144 | 145 | # Save for the jacobian: 146 | self.weights = weights 147 | self.l2 = n * l2 148 | 149 | X = data["features"] 150 | y = data["targets"].float() 151 | theta = torch.randn(X.shape[1], requires_grad=True) 152 | if init is not None: 153 | theta.data[:] = init[:] 154 | 155 | crit = torch.nn.BCEWithLogitsLoss(reduction="none") 156 | optimizer = torch.optim.LBFGS([theta], line_search_fn="strong_wolfe") 157 | def closure(): 158 | optimizer.zero_grad() 159 | loss = (crit(X @ theta, y) * weights).sum() 160 | loss += (self.l2 / 2.0) * (theta**2).sum() 161 | loss.backward() 162 | return loss 163 | for _ in range(100): 164 | loss = optimizer.step(closure) 165 | self.theta = theta 166 | 167 | def get_params(self): 168 | return self.theta 169 | 170 | def set_params(self, theta): 171 | self.theta = theta 172 | 173 | def predict(self, X): 174 | """ 175 | Given a data matrix X with examples as rows, 176 | returns a {0, 1} prediction for each x in X. 177 | """ 178 | return (X @ self.theta) > 0 179 | 180 | def loss(self, data): 181 | X = data["features"] 182 | y = data["targets"].float() 183 | return torch.nn.BCEWithLogitsLoss(reduction="none")(X @ self.theta, y) 184 | 185 | def influence_jacobian(self, data): 186 | """ 187 | Compute the Jacobian of the influence of each 188 | example on the optimal parameters. The resulting 189 | Jacobian will have shape N x d x (d+1) where N is 190 | the number of data points. 191 | """ 192 | X = data["features"] 193 | y = data["targets"].float() 194 | 195 | # Compute the Hessian at theta for all X: 196 | s = (X @ self.theta).sigmoid().unsqueeze(1) 197 | H = (self.weights.unsqueeze(1) * s * (1-s) * X).T @ X 198 | Hdiag = torch.diagonal(H) 199 | Hdiag += self.l2 200 | Hinv = H.inverse() 201 | 202 | # Compute the Jacobian of the gradient w.r.t. 
theta at each (x, y) pair 203 | XHinv = X @ Hinv 204 | JX = -(s * (1-s) * XHinv).unsqueeze(2) * self.theta[None, None, :] 205 | JX -= (s - y.unsqueeze(1)).unsqueeze(2) * Hinv.unsqueeze(0) 206 | JX = self.weights[:, None, None] * JX 207 | JY = (self.weights[:, None] * XHinv).unsqueeze(2) 208 | return torch.cat([JX, JY], dim=2) 209 | 210 | class MultinomialLogistic: 211 | 212 | def train(self, data): 213 | X = data["features"] 214 | y = data["targets"] 215 | c = torch.max(y) + 1 216 | 217 | theta = torch.randn((X.shape[1], c), requires_grad=True) 218 | optimizer = torch.optim.LBFGS([theta], line_search_fn="strong_wolfe") 219 | crit = torch.nn.CrossEntropyLoss(reduction="mean") 220 | def closure(): 221 | optimizer.zero_grad() 222 | loss = crit(X @ theta, y) 223 | loss.backward() 224 | return loss 225 | for _ in range(100): 226 | loss = optimizer.step(closure) 227 | 228 | self.theta = theta 229 | 230 | def predict(self, X): 231 | """ 232 | Given a data matrix X with examples as rows, 233 | returns a {0, 1} prediction for each x in X. 234 | """ 235 | return torch.argmax(X @ self.theta, axis=1) 236 | -------------------------------------------------------------------------------- /test_jacobians.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | # Copyright (c) Facebook, Inc. and its affiliates. 4 | # All rights reserved. 5 | 6 | # This source code is licensed under the license found in the 7 | # LICENSE file in the root directory of this source tree. 8 | 9 | import math 10 | import torch 11 | import unittest 12 | 13 | from models import least_squares_jacobian 14 | from models import weighted_least_squares_jacobian 15 | 16 | 17 | def least_squares_jacobian_single(A, theta, x, y): 18 | """ 19 | Jacobian of the remove and update function with respect to the update. 20 | """ 21 | r = (theta.dot(x) - y) 22 | Ax = A @ x 23 | return -(r * A + Ax.ger(theta)), Ax 24 | 25 | 26 | def weighted_least_squares_jacobian_single(A, theta, x, y, w): 27 | """ 28 | Jacobian of the remove and update function with respect to the update for 29 | weighted least squares. 30 | """ 31 | sqw = math.sqrt(w) 32 | x = sqw * x 33 | y = sqw * y 34 | r = (theta.dot(x) - y) 35 | Ax = A @ x 36 | return -sqw * (r * A + Ax.ger(theta)), sqw * Ax 37 | 38 | 39 | def least_squares_update(A, theta, x, y, w=1.0): 40 | """ 41 | Updates a given solution to the least squares problem to incorporate the 42 | new data point (x, y). 43 | A is `(X^T X)^{-1}` and theta are the optimal parameters (`A X^T y`) 44 | """ 45 | x = math.sqrt(w) * x 46 | y = math.sqrt(w) * y 47 | Ax = A @ x 48 | c = 1 / (1 + x.T @ Ax) 49 | return theta - c * (x.T @ theta - y) * Ax 50 | 51 | 52 | def least_squares_remove_and_update(A, theta, X, Y, xr, yr, xn, yn): 53 | Au = rank_two_update(A, xr, xn, steps=[-1, 1]) 54 | return Au @ (X.T @ Y - xr * yr + xn * yn) 55 | 56 | 57 | def least_squares_jacobian_update(A, theta, x, y): 58 | """ 59 | Jacobian of the update function with respect to the update. 60 | """ 61 | Ax = A @ x 62 | c = 1 / (1 + x.T @ Ax) 63 | r = x.dot(theta) - y 64 | t1 = ((2 * r * c**2) * Ax).ger(Ax) 65 | t2 = -r * c * A 66 | t3 = -c * Ax.ger(theta) 67 | Jx = t1 + t2 + t3 68 | Jy = c * Ax 69 | return Jx, Jy 70 | 71 | 72 | def rank_one_update(A, x, step=1): 73 | """ 74 | Compute the updated A when applying the rank-one update of x on A^{-1}. 75 | E.g. compute `(A^{-1} + step * x x^T)^{-1}` using the Sherman-Morrison formula. 
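    Note: with c = 1 / (step + x^T A x) this simplification of Sherman-Morrison
    is exact only when step is +1 or -1 (so that 1/step == step), which is how
    the tests below use it.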
76 | """ 77 | Ax = A @ x 78 | c = 1 / (step + x.dot(Ax)) 79 | return A - c * Ax.ger(Ax) 80 | 81 | 82 | def rank_two_update(A, x1, x2, steps=[1, 1]): 83 | steps = torch.tensor([1 / s for s in steps]) 84 | U = torch.stack([x1, x2], dim=1) 85 | AU = A @ U 86 | D = (U.T @ AU + torch.diag(steps)).inverse() 87 | return A - AU @ D @ AU.T 88 | 89 | 90 | class TestJacobian(unittest.TestCase): 91 | 92 | def test_rank_one_update(self): 93 | d = 10 94 | A = torch.rand(d, d) 95 | x = torch.rand(d) 96 | Aminus = rank_one_update(A, x, step=-1) 97 | Aplus = rank_one_update(Aminus, x, step=1) 98 | self.assertTrue(torch.allclose(A, Aplus, rtol=1e-4, atol=1e-4)) 99 | 100 | 101 | def test_rank_two_update(self): 102 | d = 10 103 | A = torch.rand(d, d) 104 | A = (A + A.T) + 5 * torch.eye(10) 105 | x1 = torch.rand(d) 106 | x2 = torch.rand(d) 107 | Atwo = rank_two_update(A, x1, x2, steps=[1, 1]) 108 | Aone_one = rank_one_update(rank_one_update(A, x1), x2) 109 | self.assertTrue(torch.allclose(Aone_one, Atwo, rtol=1e-4, atol=1e-4)) 110 | 111 | Aminus = rank_two_update(A, x1, x2, steps=[1, 1]) 112 | Aplus = rank_two_update(Aminus, x1, x2, steps=[-1, -1]) 113 | # TODO this seems to be too unstable.. 114 | self.assertTrue(torch.allclose(A, Aplus, rtol=1e-2, atol=1e-2)) 115 | 116 | 117 | def test_jacobians(self): 118 | # Make a random sample: 119 | d = 10 120 | n = 20 121 | X = torch.randn(n, d) 122 | Y = torch.randn(n) 123 | 124 | # Find least squares solution for the full dataset: 125 | A = torch.inverse(X.T @ X) 126 | theta = A @ (X.T @ Y) 127 | 128 | xi = X[0, :] 129 | yi = Y[0] 130 | 131 | # Method 1:, Compute Jacobian w.r.t. x_i, y_i by 132 | # 1. Removing x_i, y_i from A and theta 133 | # 2. Expressing theta* as a function of x, y via a rank-one update 134 | # 3. Computing the Jacobian of 2 at x_i, y_i 135 | A_minus = rank_one_update(A, xi, step=-1) 136 | theta_minus = A_minus @ (X.T @ Y - xi * yi) 137 | def f_x_y(x, y): 138 | return least_squares_update(A_minus, theta_minus, x, y) 139 | 140 | # Using torch autograd: 141 | Jx_auto, Jy_auto = torch.autograd.functional.jacobian(f_x_y, (xi, yi)) 142 | 143 | # Using closed form: 144 | Jx, Jy = least_squares_jacobian_update(A_minus, theta_minus, xi, yi) 145 | 146 | self.assertTrue(torch.allclose(Jy.squeeze(), Jy_auto.squeeze(), rtol=1e-4, atol=1e-4)) 147 | self.assertTrue(torch.allclose(Jx, Jx_auto, rtol=1e-4, atol=1e-4)) 148 | 149 | # Method 2: Compute Jacobian w.r.t. x_i, y_i by 150 | # 1. Expressing theta* as a function of removing x_i, y_i and adding in 151 | # x, y via a rank-two update 152 | # 2. 
Computing the Jacobian of 2 at x, y 153 | def f_x_y(x, y): 154 | return least_squares_remove_and_update(A, theta, X, Y, xi, yi, x, y) 155 | Jx_auto2, Jy_auto2 = torch.autograd.functional.jacobian(f_x_y, (xi, yi)) 156 | self.assertTrue(torch.allclose(Jy_auto.squeeze(), Jy_auto2.squeeze(), rtol=1e-4, atol=1e-4)) 157 | self.assertTrue(torch.allclose(Jx_auto, Jx_auto2, rtol=1e-4, atol=1e-4)) 158 | 159 | # Using closed form: 160 | Jx, Jy = least_squares_jacobian_single(A, theta, xi, yi) 161 | self.assertTrue(torch.allclose(Jy, Jy_auto, rtol=1e-4, atol=1e-4)) 162 | self.assertTrue(torch.allclose(Jx, Jx_auto, rtol=1e-4, atol=1e-4)) 163 | 164 | def test_weighted_jacobian(self): 165 | # Make a random sample: 166 | d = 10 167 | n = 20 168 | W = torch.diag(torch.ones(n)) 169 | X = torch.randn(n, d) 170 | Y = torch.randn(n) 171 | 172 | for w in [0.25, 0.5, 1.0, 2.0, 3.0, 4.0]: 173 | 174 | W[0, 0] = w 175 | 176 | # Find least squares solution for the full dataset: 177 | A = torch.inverse(X.T @ W @ X) 178 | theta = A @ (X.T @ W @ Y) 179 | 180 | xi = X[0, :] 181 | yi = Y[0] 182 | 183 | A_minus = rank_one_update(A, math.sqrt(w) * xi, step=-1) 184 | theta_minus = A_minus @ (X.T @ W @ Y - w * xi * yi) 185 | def f_x_y(x, y): 186 | return least_squares_update(A_minus, theta_minus, x, y, w) 187 | 188 | # Using torch autograd: 189 | Jx_auto, Jy_auto = torch.autograd.functional.jacobian(f_x_y, (xi, yi)) 190 | 191 | # Using closed form: 192 | Jx, Jy = weighted_least_squares_jacobian_single(A, theta, xi, yi, w=w) 193 | self.assertTrue(torch.allclose(Jy, Jy_auto, rtol=1e-4, atol=1e-4)) 194 | self.assertTrue(torch.allclose(Jx, Jx_auto, rtol=1e-4, atol=1e-4)) 195 | 196 | 197 | def test_batched_jacobian(self): 198 | d = 10 199 | n = 20 200 | X = torch.randn(n, d) 201 | Y = torch.randn(n) 202 | 203 | # Find least squares solution for the full dataset: 204 | A = torch.inverse(X.T @ X) 205 | theta = A @ (X.T @ Y) 206 | 207 | batchJx, batchJy = least_squares_jacobian(A, theta, X, Y) 208 | 209 | singleJs = [least_squares_jacobian_single(A, theta, x, y) for x, y in zip(X, Y)] 210 | singleJx, singleJy = zip(*singleJs) 211 | self.assertTrue(torch.allclose(batchJx, torch.stack(singleJx), rtol=1e-4, atol=1e-4)) 212 | self.assertTrue(torch.allclose(batchJy, torch.stack(singleJy), rtol=1e-4, atol=1e-4)) 213 | 214 | # With weights: 215 | W = torch.rand(n) * 5 216 | A = (W.unsqueeze(1) * X).T @ X 217 | theta = A @ (X.T @ (W * Y)) 218 | 219 | batchJx, batchJy = weighted_least_squares_jacobian(A, theta, X, Y, W) 220 | 221 | singleJs = [weighted_least_squares_jacobian_single(A, theta, x, y, w) for x, y, w in zip(X, Y, W)] 222 | singleJx, singleJy = zip(*singleJs) 223 | self.assertTrue(torch.allclose(batchJx, torch.stack(singleJx), rtol=1e-3, atol=1e-4)) 224 | self.assertTrue(torch.allclose(batchJy, torch.stack(singleJy), rtol=1e-4, atol=1e-4)) 225 | 226 | 227 | def test_logistic_jacobian(self): 228 | def grad(w, x, y, l2): 229 | s = torch.sigmoid(w @ x) 230 | return x @ (s - y) + x.shape[1] * l2 * w 231 | 232 | def Hinv(w, x, l2): 233 | s = torch.sigmoid(w @ x) 234 | H = s * (1 - s) * x @ x.T + x.shape[1] * l2 * torch.eye(x.shape[0]) 235 | return torch.inverse(H) 236 | 237 | def solve(x, y, l2, its=30): 238 | w = 1e-1*torch.randn(x.shape[0], dtype=torch.double) 239 | for it in range(its): 240 | w = w - Hinv(w, x, l2=l2) @ grad(w, x, y, l2=l2) 241 | assert grad(w, x, y, l2).norm().item() < 1e-12, "Did not converge." 
242 | return w 243 | 244 | def compute_Jf_exact(w, x, y, l2, i): 245 | xi = x[:, i] 246 | yi = y[i] 247 | si = torch.sigmoid(w.dot(xi)) 248 | nabla_xyw = si * (si - 1) * xi.ger(w) + (yi - si) * torch.eye(xi.shape[0]) 249 | Hi = Hinv(w, x, l2) 250 | return Hi @ nabla_xyw 251 | 252 | def compute_Jf_fd(x, y, l2, i, epsilon=1e-6): 253 | Jf_fd = [] 254 | for j in range(x.shape[0]): 255 | x[j, i] += epsilon 256 | w_up = solve(x, y, l2) 257 | x[j, i] -= 2*epsilon 258 | w_down = solve(x, y, l2) 259 | Jf_fd.append((w_up - w_down) / (2* epsilon)) 260 | x[j, i] += epsilon 261 | return torch.stack(Jf_fd, dim=1) 262 | 263 | torch.random.manual_seed(123) 264 | d = 4 265 | n = 40 266 | x = torch.randn((d, n), dtype=torch.double) 267 | y = torch.randint(high=1, size=(n,)) 268 | l2 = 1e-5 269 | 270 | # Compute Jacobian at x_0 with finite differences 271 | Jf_fd = compute_Jf_fd(x, y, l2, 0) 272 | 273 | # Compute Jacobian at x_0 analytically 274 | w_star = solve(x, y, l2) 275 | Jf = compute_Jf_exact(w_star, x, y, l2, 0) 276 | 277 | self.assertTrue(torch.allclose(Jf, Jf_fd, rtol=1e-7, atol=1e-7)) 278 | 279 | 280 | # run all the tests: 281 | if __name__ == '__main__': 282 | unittest.main() 283 | -------------------------------------------------------------------------------- /model_inversion.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | # Copyright (c) Facebook, Inc. and its affiliates. 4 | # All rights reserved. 5 | 6 | # This source code is licensed under the license found in the 7 | # LICENSE file in the root directory of this source tree. 8 | 9 | import argparse 10 | import json 11 | import logging 12 | import math 13 | import time 14 | import torch 15 | 16 | import models 17 | import dataloading 18 | 19 | # set up logger: 20 | logger = logging.getLogger() 21 | logger.setLevel(logging.INFO) 22 | 23 | 24 | INVERTERS = ["baseline", "ideal", "fredrikson14", "whitebox", "all"] 25 | 26 | 27 | def features_to_category(values, one_hot=False): 28 | if not one_hot: 29 | last_value = (values==0).all(axis=1, keepdim=True).to(values.dtype) 30 | values = torch.cat([values, last_value], dim=1) 31 | return torch.argmax(values, dim=1) 32 | 33 | 34 | def compute_log_marginal(train_data, target_attribute, one_hot=False, weights=None): 35 | if weights is None: 36 | weights = torch.ones(train_data["features"].size(0)) 37 | target_values = train_data["features"][:, range(*target_attribute)] 38 | if not one_hot: 39 | last_value = (target_values==0).all( 40 | axis=1, keepdim=True).to(target_values.dtype) 41 | target_values = torch.cat([target_values, last_value], dim=1) 42 | return (weights[:, None] * target_values).sum(axis=0).log() 43 | 44 | 45 | def baseline_inverter( 46 | train_data, target_attribute, weights=None, one_hot=False, **kwargs): 47 | """ 48 | The baseline inverter simply measures the prior for the target attribute 49 | (using the training data) and predicts the mode of the prior for every 50 | target example. 
51 | """ 52 | # NB A stronger baseline might have access to the joint prior and/or the 53 | # target label 54 | log_marginal = compute_log_marginal( 55 | train_data, target_attribute, weights=weights, one_hot=one_hot) 56 | prediction = torch.argmax(log_marginal) 57 | return torch.full( 58 | size=(train_data["features"].shape[0],), 59 | fill_value=prediction, 60 | dtype=torch.long) 61 | 62 | 63 | def ideal_inverter(train_data, target_attribute, **kwargs): 64 | """ 65 | The ideal inverter uses the training data to learn a model to predict the 66 | target attribute given all other features including the label. 67 | """ 68 | def swap_feature_target(data): 69 | features = data["features"] 70 | new_features = torch.cat([ 71 | features[:, :target_attribute[0]], 72 | features[:, target_attribute[1]:], 73 | data["targets"][:, None].float() 74 | ], axis=1) 75 | new_targets = features_to_category( 76 | features[:, range(*target_attribute)]) 77 | return { 78 | "features" : new_features, 79 | "targets" : new_targets, 80 | } 81 | tmp_train = swap_feature_target(train_data) 82 | model = models.MultinomialLogistic() 83 | model.train(tmp_train) 84 | return model.predict(tmp_train["features"]) 85 | 86 | 87 | def fredrikson14_inverter( 88 | train_data, target_attribute, model, weights=None, one_hot=False, **kwargs): 89 | """ 90 | Implements the model inversion attack of: 91 | Fredrikson, 2014, Privacy in Pharmacogenetics: An End-to-End Case Study 92 | of Personalized Warfarin Dosing 93 | """ 94 | if weights is None: 95 | weights = torch.ones(train_data["features"].size(0)) 96 | 97 | log_marginal = compute_log_marginal( 98 | train_data, target_attribute, weights=weights, one_hot=one_hot) 99 | 100 | if type(model) == models.LeastSquares: 101 | n, d = train_data["features"].shape 102 | std_var = (weights * model.loss(train_data)).sum().true_divide(n - d) 103 | score_fn = lambda data : -0.5 * (weights * model.loss(data)) / std_var 104 | elif type(model) == models.Logistic: 105 | preds = model.predict(train_data["features"]) 106 | y = train_data["targets"] 107 | matched = preds == y 108 | confusions = torch.tensor([ 109 | [matched[y == 0].sum(), (~matched)[y == 0].sum()], 110 | [(~matched)[y == 1].sum(), matched[y == 1].sum()] 111 | ]) 112 | pi = confusions.true_divide(confusions.sum(axis=0, keepdim=True)) 113 | def score_fn(data): 114 | preds = model.predict(data["features"]) 115 | y = data["targets"] 116 | return pi[y, preds.long()].log() 117 | else: 118 | raise ValueError("Unknown model type.") 119 | 120 | # For each possible value of the target attribute compute score for the 121 | # attribute which should be proportional to log pi(y, y') + log p(x), where 122 | # pi(y, y') is a model dependent performance measure. 123 | tgt_features = train_data["features"].clone().detach() 124 | tgt_features[:, range(*target_attribute)] = 0. 125 | 126 | scores = [] 127 | for c in range(*target_attribute): 128 | tgt_features[:, c] = 1. 129 | score = score_fn( 130 | {"features": tgt_features, "targets": train_data["targets"]}) 131 | score += log_marginal[c - target_attribute[0]] 132 | scores.append(score) 133 | tgt_features[:, c] = 0. 
134 | # Try all 0s 135 | if not one_hot: 136 | score = score_fn( 137 | {"features": tgt_features, "targets": train_data["targets"]}) 138 | score += log_marginal[-1] 139 | scores.append(score) 140 | 141 | # Make the prediction: 142 | return torch.argmax(torch.stack(scores, axis=1), axis=1) 143 | 144 | 145 | class WhiteboxInverter: 146 | 147 | def __init__(self, train_data, target_attribute, model_type, weights, l2, one_hot=False): 148 | # Compute marginal counts (proportional to log marginal): 149 | self.log_marginal = compute_log_marginal( 150 | train_data, target_attribute, weights=weights, one_hot=one_hot) 151 | self.one_hot = one_hot 152 | # Store learned theta for each possible attribute 153 | self.thetas = torch.zeros( 154 | train_data["features"].size(0), 155 | target_attribute[1] - target_attribute[0] + (not one_hot), 156 | train_data["features"].size(1)) 157 | for i in range(train_data["features"].size(0)): 158 | tmp = train_data["features"][i, range(*target_attribute)].clone() 159 | train_data["features"][i, range(*target_attribute)] = 0. 160 | for c in range(*target_attribute): 161 | j = c - target_attribute[0] 162 | train_data["features"][i, c] = 1. 163 | model = models.get_model(model_type) 164 | model.train(train_data, l2=l2, weights=weights) 165 | self.thetas[i, j] = model.theta 166 | train_data["features"][i, c] = 0. 167 | if not one_hot: 168 | # Try all 0s (last attribute) 169 | model = models.get_model(model_type) 170 | model.train(train_data, l2=l2, weights=weights) 171 | self.thetas[i, -1] = model.theta 172 | train_data["features"][i, range(*target_attribute)] = tmp 173 | 174 | # prior_lam controls strength of prior 175 | def predict(self, model, gamma=0, prior_lam=0): 176 | scores = -(self.thetas - model.theta.view(1, 1, -1)).pow(2).sum(2) 177 | if gamma > 0 and prior_lam > 0: 178 | scores = 0.5 * scores / gamma**2 + prior_lam * self.log_marginal.unsqueeze(0) 179 | return torch.argmax(scores, axis=1) 180 | 181 | 182 | def whitebox_inverter(train_data, target_attribute, model, weights=None, l2=0.0, **kwargs): 183 | """ 184 | Whitebox inverter has access to the trained model and all of the trained 185 | data less the one example's target attribute value. Predictions are made 186 | from solving: 187 | argmax_{attribute} p(model | available data, attribute) p(attribute) 188 | """ 189 | inverter = WhiteboxInverter(train_data, target_attribute, type(model), weights, l2) 190 | return inverter.predict(model) 191 | 192 | 193 | def compute_metrics(data, predictions, attribute): 194 | """ 195 | Computes the accuracy for the `predictions` of 196 | `attribute` on `data`. 197 | """ 198 | reference = features_to_category(data["features"][:, range(*attribute)]) 199 | return (predictions == reference).float().mean().item() 200 | 201 | 202 | def run_inversion(args, data, weights): 203 | regression = (args.dataset == "iwpc" or args.dataset == "synth") 204 | 205 | # Train model: 206 | model = models.get_model(args.model) 207 | logging.info(f"Training model {args.model}") 208 | model.train(data, l2=args.l2, weights=weights) 209 | # Check predictions for sanity: 210 | predictions = model.predict(data["features"], regression=regression) 211 | if regression: 212 | acc = (predictions - data["targets"]).pow(2).mean() 213 | logging.info(f"Training MSE of regressor {acc.item():.3f}") 214 | else: 215 | acc = ((predictions == data["targets"]).float()).mean() 216 | logging.info(f"Training accuracy of classifier {acc.item():.3f}") 217 | 218 | # The target attribute can be specified as a range, e.g. 
`(4, 8)` means the 219 | # 4th through the 7th feature are the values of the encoded target attribute. 220 | if args.dataset == "uciadult": 221 | target_attribute = (24, 25) # [not married, married] 222 | elif args.dataset == "iwpc": 223 | #target_attribute = (2, 7) # CYP2C9 genotype 224 | target_attribute = (11, 13) # VKORC1 genotype 225 | else: 226 | raise NotImplementedError("Dataset not yet implemented.") 227 | 228 | if args.inverter == "all": 229 | inverters = INVERTERS[:-1] 230 | else: 231 | inverters = [args.inverter] 232 | 233 | target = features_to_category(data["features"][:, range(*target_attribute)]) 234 | results = { "target" : target.tolist() } 235 | 236 | for inverter in inverters: 237 | invert_fn = globals()[f"{inverter}_inverter"] 238 | predictions = invert_fn( 239 | data, target_attribute, model=model, weights=weights, l2=args.l2) 240 | acc = compute_metrics(data, predictions, target_attribute) 241 | logging.info(f"{inverter} inverter Accuracy {acc:.4f}") 242 | results[inverter] = predictions.tolist() 243 | 244 | return results 245 | 246 | 247 | def main(args): 248 | regression = (args.dataset == "iwpc" or args.dataset == "synth") 249 | data = dataloading.load_dataset( 250 | name=args.dataset, split="train", normalize=False, 251 | num_classes=2, root=args.data_folder, regression=regression) 252 | if args.subsample > 0: 253 | data = dataloading.subsample(data, args.subsample) 254 | 255 | if args.weights_file is not None: 256 | all_weights = torch.load(args.weights_file) 257 | else: 258 | all_weights = [torch.ones(len(data["targets"]))] 259 | 260 | results = [] 261 | for it, weights in enumerate(all_weights): 262 | if len(all_weights) > 1: 263 | logging.info(f"Iteration {it} weights for model inversion.") 264 | results.append(run_inversion(args, data, weights)) 265 | 266 | if args.results_file is not None: 267 | with open(args.results_file, 'w') as fid: 268 | json.dump(results, fid) 269 | 270 | 271 | if __name__ == "__main__": 272 | parser = argparse.ArgumentParser(description="Model inversion.") 273 | parser.add_argument("--data_folder", default="/tmp", type=str, 274 | help="folder in which to store data (default: '/tmp')") 275 | parser.add_argument("--dataset", default="uciadult", type=str, 276 | choices=["uciadult", "iwpc", "synth"], 277 | help="dataset to use.") 278 | parser.add_argument("--model", default="least_squares", type=str, 279 | choices=["least_squares", "logistic"], 280 | help="type of model (default: least_squares)") 281 | parser.add_argument("--inverter", default="fredrikson14", type=str, 282 | choices=INVERTERS, 283 | help="inversion method to use (default: fredrikson14)") 284 | parser.add_argument("--l2", default=0, type=float, 285 | help="l2 regularization parameter") 286 | parser.add_argument("--subsample", default=0, type=int, 287 | help="number of training examples") 288 | parser.add_argument("--weights_file", default=None, type=str, 289 | help="(optional) file to load IRFIL weights from") 290 | parser.add_argument("--results_file", default=None, type=str, 291 | help="(optional) path to save results") 292 | args = parser.parse_args() 293 | main(args) 294 | -------------------------------------------------------------------------------- /scripts/make_figures.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | # Copyright (c) Facebook, Inc. and its affiliates. 4 | # All rights reserved. 
5 | 6 | # This source code is licensed under the license found in the 7 | # LICENSE file in the root directory of this source tree. 8 | 9 | import argparse 10 | import json 11 | import math 12 | import matplotlib.pyplot as plt 13 | import numpy as np 14 | import os 15 | import scipy.stats as scs 16 | import seaborn as sns 17 | 18 | import sys 19 | sys.path.append("..") 20 | 21 | import dataloading 22 | import plotting 23 | 24 | 25 | COLOR = sns.cubehelix_palette(1, start=2, rot=0, dark=0, light=.5)[0] 26 | 27 | 28 | def load_results(results_path, file_name): 29 | with open(os.path.join(results_path, file_name), 'r') as fid: 30 | return json.load(fid) 31 | 32 | 33 | def eta_overlap(results_path, prefix): 34 | etas_li = np.array(load_results( 35 | results_path, f"{prefix}_linear_pca20.json")["etas"]) 36 | etas_lo = np.array(load_results( 37 | results_path, f"{prefix}_logistic_pca20.json")["etas"]) 38 | idx_li = etas_li.argsort() 39 | idx_lo = etas_lo.argsort() 40 | n_overlap = len(set(idx_li[-500:]).intersection(idx_lo[-500:])) 41 | print("="*20) 42 | print(prefix) 43 | print(f"Num overlap {n_overlap}/100") 44 | 45 | 46 | def eta_histogram(results_path, save_path, prefix, train, labels): 47 | """ 48 | Histogram of samples by individual sample eta. 49 | """ 50 | plt.clf() 51 | 52 | n_class = len(labels) 53 | colors = sns.cubehelix_palette(n_class, start=2, rot=0, dark=0, light=.5) 54 | 55 | plt.figure(figsize=(6, 6)) 56 | etas = np.array(load_results( 57 | results_path, f"{prefix}_pca20.json")["etas"]) 58 | targets = train["targets"].numpy() 59 | etas = [etas[targets == c] for c in range(n_class)] 60 | 61 | for c in range(n_class): 62 | plt.hist( 63 | etas[c], bins=80, color=colors[c], alpha=0.5, label=f"{labels[c]}") 64 | plt.axvline( 65 | etas[c].mean(), color=colors[c], linestyle='dashed', linewidth=2) 66 | plt.xlabel("Per sample $\eta$", fontsize=30) 67 | plt.xticks(fontsize=26) 68 | plt.ylabel("Number of samples", fontsize=30) 69 | plt.yticks(fontsize=26) 70 | plt.legend() 71 | plotting.savefig(os.path.join(save_path, f"{prefix}_eta_hist")) 72 | 73 | 74 | def view_images(train, results_path, save_path, prefix): 75 | """ 76 | View the most and least leaked images. 77 | """ 78 | # sort etas by index 79 | etas = load_results( 80 | results_path, f"{prefix}_pca20.json")["etas"] 81 | sorted_etas = sorted( 82 | zip(etas, range(len(etas))), key=lambda x: x[0], reverse=True) 83 | 84 | ims = train["features"].squeeze() 85 | n_ims = 8 86 | f, axarr = plt.subplots(2, n_ims, figsize=(7, 2.2)) 87 | f.subplots_adjust(wspace=0.05) 88 | for priv in [False, True]: 89 | for i in range(n_ims): 90 | ax = axarr[int(priv), i] 91 | idx = -(i + 1) if priv else i 92 | im = sorted_etas[idx][1] 93 | image = ims[im, ...] 
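            # color images come back from dataloading channels-first (C, H, W);
            # move the channels last so imshow renders them correctly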
94 | if image.ndim == 3: 95 | image = image.permute(1, 2, 0) 96 | ax.imshow(image, cmap='gray') 97 | ax.axis("off") 98 | title = "{:.1e}".format(sorted_etas[idx][0]) 99 | ax.set_title(title, fontsize=14) 100 | ax.get_xaxis().set_visible(False) 101 | ax.get_yaxis().set_visible(False) 102 | plotting.savefig(os.path.join(save_path, f"{prefix}_images")) 103 | plt.close(f) 104 | 105 | 106 | def correlations(results_path, save_path, prefix): 107 | results = load_results(results_path, f"{prefix}_pca20.json") 108 | etas = np.array(results["etas"]) 109 | n_samples = 2000 110 | np.random.seed(n_samples) 111 | samples = np.random.permutation(len(etas))[:n_samples] 112 | losses = np.array(results["train_losses"]) 113 | grad_norms = np.array(results["train_grad_norms"]) 114 | alternatives = [ 115 | ("loss", "(a) Loss $\ell({\\bf w^*}^\\top {\\bf x}, y)$", losses), 116 | ("gradnorm", "(b) Gradient norm $\|\\nabla_{\\bf w^*} \ell\|_2$", grad_norms)] 117 | f, axarr = plt.subplots(1, 2, figsize=(10, 4), sharey=True) 118 | f.subplots_adjust(wspace=0.1) 119 | for e, (method, xlabel, values) in enumerate(alternatives): 120 | ax = axarr[e] 121 | ax.scatter(values[samples], etas[samples], s=2.5, color=COLOR) 122 | ax.set_xlabel(xlabel) 123 | axarr[0].set_ylabel("FIL $\eta$") 124 | 125 | plotting.savefig(os.path.join(save_path, f"{prefix}_scatter_alternatives_eta")) 126 | plt.clf() 127 | 128 | 129 | def iterated_reweighted_etas(results_path, save_path, prefix): 130 | """ 131 | Line plot of variance of Fisher information loss eta with iterated 132 | reweighting. 133 | """ 134 | linear = load_results(results_path, f"{prefix}_linear_reweighted.json") 135 | logistic = load_results(results_path, f"{prefix}_logistic_reweighted.json") 136 | stds = np.array([linear["eta_stds"], logistic["eta_stds"]]) 137 | iterations = np.arange(stds.shape[1]) 138 | plotting.line_plot( 139 | stds, iterations, legend=["Linear", "Logistic"], 140 | xlabel="Iteration", ylabel="Standard deviation of $\eta$", 141 | ylog=True, 142 | size=(5, 5), 143 | filename=os.path.join(save_path, f"{prefix}_eta_std_vs_iterations")) 144 | for results, model in [(linear, "linear"), (logistic, "logistic")]: 145 | em = results["eta_means"] 146 | std = results["eta_stds"] 147 | acc = results["test_accs"] 148 | print("="*20) 149 | print(f"{prefix} {model}") 150 | print(f"Pre IRFP eta {em[0]:.3f}, std {std[0]:.3f}, test accuracy {acc[0]:.3f}") 151 | print(f"Post IRFP eta {em[-1]:.3f}, std {std[-1]:.3f}, test accuracy {acc[-1]:.3f}") 152 | 153 | 154 | def private_mse_and_fil(results_path, save_path): 155 | L2s = ['1e-5', '1e-3', '1e-1', '1'] 156 | noise_scales = [ 157 | '1e-05', '2e-05', '5e-05', '0.0001', '0.0002', 158 | '0.0005', '0.001', '0.002', '0.005', '0.01', 159 | '0.02', '0.05', '0.1', '0.2', '0.5', '1.0', 160 | ] 161 | 162 | fils = [] 163 | mean_etas = [] 164 | mses = [] 165 | for l2 in L2s: 166 | etas = load_results( 167 | results_path, f"iwpc_least_squares_fil_l2_{l2}.json")["etas"] 168 | fils.append(etas) 169 | inversion_results = load_results( 170 | results_path, 171 | f"iwpc_least_squares_whitebox_private_inversion_l2_{l2}.json") 172 | mses.append([inversion_results[0][noise_scale]['test_acc'] 173 | for noise_scale in noise_scales]) 174 | mean_etas.append(np.mean(etas)) 175 | sigmas = np.array([float(ns) for ns in noise_scales]) 176 | mean_etas = np.array(mean_etas)[:, None] / sigmas 177 | 178 | l2s = np.array([float(l2) for l2 in L2s]) 179 | legend = ["$\lambda=10^{%d}$"%int(math.log10(float(l2))) for l2 in L2s] 180 | 181 | # Plot FILs: 182 | 
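    # Histogram the per-example etas for each L2 setting on log-spaced bins and
    # draw the counts as lines so the distributions share a log x-axis.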
num_bins = 100 183 | fil_counts = [] 184 | fil_centers = [] 185 | for fil in fils: 186 | lower = math.log10(np.min(fil)) 187 | upper = math.log10(np.max(fil) + 1e-4) 188 | bins = np.logspace(lower, upper, num_bins + 1) 189 | counts, edges = np.histogram(fil, bins=bins) 190 | centers = (edges[:-1] + edges[1:]) / 2 191 | fil_counts.append(counts) 192 | fil_centers.append(centers) 193 | plotting.line_plot( 194 | np.array(fil_counts), 195 | np.array(fil_centers), 196 | xlabel="FIL $\eta$ (at $\sigma=1$)", 197 | ylabel="Number of examples", 198 | legend=legend, 199 | marker=None, 200 | size=(5, 5), 201 | xlog=True, 202 | filename=os.path.join( 203 | args.save_path, 204 | f"iwpc_fil_counts_varying_l2"), 205 | ) 206 | 207 | # PLot MSEs 208 | mses = np.array(mses) # [L2, noise_scale, trials] 209 | mse_means = mses.mean(axis=2) 210 | mse_stds = mses.std(axis=2) 211 | plotting.line_plot( 212 | mse_means, mean_etas, legend=legend, 213 | xlabel="Mean $\\bar{\eta}$", 214 | ylabel="Test MSE", 215 | ylog=True, 216 | xlog=True, 217 | size=(5, 5), 218 | errors=mse_stds, 219 | filename=os.path.join(args.save_path, f"iwpc_mse_vs_eta_varying_l2"), 220 | ) 221 | 222 | 223 | def private_inversion_accuracy(results_path, save_path): 224 | L2s = ['1e-5', '1e-3', '1e-1', '1'] 225 | noise_scales = [ 226 | '1e-05', '2e-05', '5e-05', '0.0001', '0.0002', 227 | '0.0005', '0.001', '0.002', '0.005', '0.01', 228 | '0.02', '0.05', '0.1', '0.2', '0.5', '1.0', 229 | ] 230 | 231 | inversion_results = load_results( 232 | results_path, f"iwpc_least_squares_inversion_l2_1e-3.json")[0] 233 | target = np.array(inversion_results["target"]) 234 | baseline = (target == np.array(inversion_results["baseline"])).mean() 235 | 236 | # Load the max eta for each L2. 237 | mean_etas = [] 238 | for l2 in L2s: 239 | etas = load_results( 240 | results_path, f"iwpc_least_squares_fil_l2_{l2}.json")["etas"] 241 | mean_etas.append(np.mean(etas)) 242 | sigmas = np.array([float(ns) for ns in noise_scales]) 243 | mean_etas = np.array(mean_etas)[:, None] / sigmas 244 | 245 | results = {"whitebox": {}, "fredrikson14": {}} 246 | for inverter in ["whitebox", "fredrikson14"]: 247 | results = [] 248 | for l2 in L2s: 249 | # inversion results are in a list ordered by ieration or IRFIL, 250 | # each dictionary is the results at a given noise scale along with 251 | # the target values 252 | inversion_results = load_results( 253 | results_path, 254 | f"iwpc_least_squares_{inverter}_private_inversion_l2_{l2}.json") 255 | all_accs = [] 256 | for noise_scale in noise_scales: 257 | accs = [] 258 | for prediction in inversion_results[0][noise_scale]['predictions']: 259 | accs.append((np.array(prediction) == target).mean()) 260 | all_accs.append([np.mean(accs), np.std(accs)]) 261 | results.append(all_accs) 262 | 263 | results = np.array(results) # [L2, noise scale, mean/std] 264 | means = results[:, :, 0] 265 | stds = results[:, :, 1] 266 | 267 | legend = ["$\lambda=10^{%d}$"%int(math.log10(float(l2))) for l2 in L2s] 268 | 269 | plotting.line_plot( 270 | means, mean_etas, legend=legend, 271 | xlabel="Mean $\\bar{\eta}$", 272 | ylabel="Attribute inversion accuracy", 273 | errors=stds, 274 | ymax=1.02, 275 | ymin=0.2, 276 | xlog=True, 277 | size=(5, 5)) 278 | plt.semilogx([0, 1e3], [baseline]*2, 'k--', label="Prior") 279 | plt.legend() 280 | plt.xlim(right=1e3) 281 | plotting.savefig(os.path.join( 282 | args.save_path, 283 | f"iwpc_{inverter}_vs_eta_varying_l2")) 284 | 285 | 286 | def irfil_inversion(results_path, dataset, save_path): 287 | noise_scale = "0.001" 288 | 
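    # number of IRFIL iterations to include in the mean/std eta plot below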
its = 10 289 | 290 | irfil_results = load_results(results_path, f"{dataset}_least_squares_irfil.json") 291 | etas = np.array(irfil_results["etas"]) 292 | eta_means = np.array(irfil_results["eta_means"])[:its] 293 | eta_stds = np.array(irfil_results["eta_stds"])[:its] 294 | plotting.line_plot( 295 | eta_means[None, :], np.arange(its), 296 | xlabel="Steps of IRFIL", 297 | ylabel="Mean $\\bar{\eta}$", 298 | errors=eta_stds[None, :], 299 | size=(4.85, 5.05), 300 | filename=os.path.join(args.save_path, f"{dataset}_mean_fil"), 301 | ) 302 | 303 | def compute_correct_ratio(etas, num_bins, predictions, target): 304 | order = etas.argsort() 305 | bin_size = len(target) // num_bins + 1 306 | bin_accs = [] 307 | for prediction in predictions: 308 | prediction = np.array(prediction) 309 | correct = (prediction == target) 310 | bin_accs.append([correct[order[lower:lower + bin_size]].mean() 311 | for lower in range(0, len(correct), bin_size)]) 312 | return np.array(bin_accs) 313 | 314 | inversion_results = load_results( 315 | results_path, 316 | f"{dataset}_least_squares_whitebox_private_inversion_irfil.json") 317 | target = np.array(inversion_results[0]["target"]) 318 | num_bins = 10 319 | ratio_means = [] 320 | ratio_stds = [] 321 | its = [0, 2, 10] 322 | for it in its: 323 | predictions = inversion_results[it][noise_scale]['predictions'] 324 | ratios = compute_correct_ratio(etas, num_bins, predictions, target) 325 | ratio_means.append(ratios.mean(axis=0)) 326 | ratio_stds.append(ratios.std(axis=0)) 327 | ratio_means = np.array(ratio_means) 328 | ratio_stds = np.array(ratio_stds) 329 | 330 | plotting.line_plot( 331 | ratio_means, np.arange(num_bins), 332 | legend=["Iteration {}".format(it) for it in its], 333 | xlabel="FIL ($\eta$) percentile", 334 | ylabel="Attribute inversion accuracy", 335 | errors=ratio_stds, 336 | size=(5, 5), 337 | filename=os.path.join( 338 | args.save_path, 339 | f"{dataset}_whitebox_eta_percentile"), 340 | ) 341 | 342 | 343 | 344 | def main(args): 345 | labels = {"mnist" : ["0", "1"], "cifar10": ["Plane", "Car"]} 346 | for dataset in ["mnist", "cifar10"]: 347 | train = dataloading.load_dataset( 348 | name=dataset, split="train", normalize=False, 349 | num_classes=2, reshape=False, root=args.data_folder) 350 | 351 | for model in ["linear", "logistic"]: 352 | prefix = f"{dataset}_{model}" 353 | 354 | # Histogram of etas: 355 | eta_histogram( 356 | args.results_path, args.save_path, 357 | prefix, train, labels[dataset]) 358 | 359 | # Most and least leaked images: 360 | view_images(train, args.results_path, args.save_path, prefix) 361 | 362 | eta_overlap(args.results_path, f"{dataset}") 363 | 364 | # Plot of eta stds vs iterations of reweighting 365 | iterated_reweighted_etas( 366 | args.results_path, args.save_path, f"{dataset}") 367 | 368 | # Plot correlations of eta with other metrics 369 | correlations(args.results_path, args.save_path, "mnist_linear") 370 | 371 | # IWPC MSE and FIL with output pertubration 372 | private_mse_and_fil(args.results_path, args.save_path) 373 | 374 | # IWPC Fredrikson and whitebox attribute inversion results. 
375 | private_inversion_accuracy(args.results_path, args.save_path) 376 | 377 | # IWPC and UCI Adult attribute inversion results as a function of 378 | # iterations of IRFIL 379 | for dataset in ["iwpc", "uciadult"]: 380 | irfil_inversion(args.results_path, dataset, args.save_path) 381 | 382 | 383 | if __name__ == "__main__": 384 | parser = argparse.ArgumentParser( 385 | description="Script to make all the figures.") 386 | parser.add_argument("--results_path", type=str, default=".", 387 | help="Path of saved results") 388 | parser.add_argument("--data_folder", default="/tmp", type=str, 389 | help="folder in which to store data (default: '/tmp')") 390 | parser.add_argument("--save_path", default=".", type=str, 391 | help="folder in which to store figures (default: '.')") 392 | parser.add_argument("--format", default=None, type=str, 393 | help="format to save figures (default: \"pdf\")") 394 | args = parser.parse_args() 395 | if args.format is not None: 396 | plotting.FORMAT = args.format 397 | 398 | plt.rcParams.update({ 399 | "axes.titlesize": 24, 400 | "legend.fontsize": 20, 401 | "xtick.labelsize": 18, 402 | "ytick.labelsize" : 18, 403 | "axes.labelsize": 24 404 | }) 405 | 406 | main(args) 407 | -------------------------------------------------------------------------------- /dataloading.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | # Copyright (c) Facebook, Inc. and its affiliates. 4 | # All rights reserved. 5 | 6 | # This source code is licensed under the license found in the 7 | # LICENSE file in the root directory of this source tree. 8 | 9 | import logging 10 | import numpy as np 11 | import os 12 | import torch 13 | from torchvision.datasets import MNIST, CIFAR10, CIFAR100 14 | from torchvision.datasets.mnist import read_image_file, read_label_file 15 | from torchvision.datasets.utils import download_url 16 | 17 | import time 18 | import sys 19 | import git 20 | 21 | 22 | class Synth: 23 | """ 24 | Synthetic dataset for testing. 25 | """ 26 | def __init__(self, root, train=True, download=True): 27 | regression = True 28 | drop = True 29 | num_examples = 1000 30 | num_attributes = 3 31 | categories_per_attribute = 3 32 | 33 | # For repeatability 34 | torch.manual_seed(int(train)) 35 | X = torch.randint( 36 | 0, categories_per_attribute, (num_examples, num_attributes)) 37 | X = torch.nn.functional.one_hot(X, categories_per_attribute) 38 | if drop: 39 | # Remove the last attribute to get rid of perfect collinearity 40 | X = X[:, :, :-1] 41 | X = X.reshape(num_examples, -1) 42 | self.data = X.float() 43 | 44 | if regression: 45 | self.targets = torch.randn(num_examples) * 50 46 | else: 47 | self.targets = torch.randint(0, 2, (num_examples,)) 48 | torch.manual_seed(time.time()) 49 | 50 | 51 | class UCIAdult: 52 | """ 53 | UCI Adult dataset: 54 | http://archive.ics.uci.edu/ml/datasets/Adult 55 | 56 | The task is to classify individuals as making above or below $50k 57 | based on certain demographic attributes. 58 | 59 | Data key: 60 | 0 age: continuous. 61 | 1 workclass: Private, Self-emp-not-inc, Self-emp-inc, Federal-gov, Local-gov, 62 | State-gov, Without-pay, (NB removed with ?: Never-worked). 63 | 2 fnlwgt: continuous. 64 | 3 education: Bachelors, Some-college, 11th, HS-grad, Prof-school, 65 | Assoc-acdm, Assoc-voc, 9th, 7th-8th, 12th, Masters, 1st-4th, 66 | 10th, Doctorate, 5th-6th, Preschool. 67 | 4 education-num: continuous. 
68 | 5 marital-status: Married-civ-spouse, Divorced, Never-married, Separated, 69 | Widowed, Married-spouse-absent, Married-AF-spouse. 70 | 6 occupation: Tech-support, Craft-repair, Other-service, Sales, 71 | Exec-managerial, Prof-specialty, Handlers-cleaners, 72 | Machine-op-inspct, Adm-clerical, Farming-fishing, Transport-moving, 73 | Priv-house-serv, Protective-serv, Armed-Forces. 74 | 7 relationship: Wife, Own-child, Husband, Not-in-family, Other-relative, 75 | Unmarried. 76 | 8 race: White, Asian-Pac-Islander, Amer-Indian-Eskimo, Other, Black. 77 | 9 sex: Female, Male. 78 | 10 capital-gain: continuous. 79 | 11 capital-loss: continuous. 80 | 12 hours-per-week: continuous. 81 | 13 native-country: United-States, Cambodia, England, Puerto-Rico, Canada, 82 | Germany, Outlying-US(Guam-USVI-etc), India, Japan, Greece, South, 83 | China, Cuba, Iran, Honduras, Philippines, Italy, Poland, Jamaica, 84 | Vietnam, Mexico, Portugal, Ireland, France, Dominican-Republic, Laos, 85 | Ecuador, Taiwan, Haiti, Columbia, Hungary, Guatemala, Nicaragua, 86 | Scotland, Thailand, Yugoslavia, El-Salvador, Trinadad&Tobago, Peru, 87 | Hong, Holand-Netherlands. 88 | 89 | 14 label: >50K, <=50K. 90 | """ 91 | 92 | def __init__(self, root, train=True, download=False, mehnaz20=True, drop=True): 93 | """ 94 | mehnaz20: If True, then pre-process the data according to: 95 | Black-box Model Inversion Attribute Inference Attacks on 96 | Classification Models, Mehnaz 2020, https://arxiv.org/abs/2012.03404. 97 | Namely, they combine the marital status features (5) into a single 98 | binary feature {Married-civ-spouse, Married-spouse-absent, 99 | Married-Af-spouse} vs {Divorced, Never-married, Separated, Widowed} and 100 | remove the relationship features (7). 101 | drop: If True, remove the last feature for a one-hot encoded 102 | attribute. This helps alleviate perfect colinearity amongst 103 | the features. 104 | """ 105 | self.root = root 106 | if download: 107 | self.download() 108 | continuous_ids = set([0, 2, 4, 10, 11, 12]) 109 | feature_keys = [set() for _ in range(15)] 110 | 111 | def load(dataset): 112 | with open(os.path.join(root, dataset)) as fid: 113 | # load data ignoring rows with missing values 114 | lines = (l.strip() for l in fid) 115 | lines = (l.split(",") for l in lines if "?" 
not in l) 116 | lines = [l for l in lines if len(l) == 15] 117 | return lines 118 | 119 | for line in load("adult.data"): 120 | for e, k in enumerate(line): 121 | if e in continuous_ids: 122 | continue 123 | k = k.strip() 124 | feature_keys[e].add(k) 125 | feature_keys = [{k: i for i, k in enumerate(sorted(fk))} 126 | for fk in feature_keys] 127 | self.feature_keys = feature_keys 128 | if mehnaz20: 129 | # Remap marital status to binary feature: 130 | marital_status = feature_keys[5] 131 | for ms in ["Divorced", "Never-married", "Separated", "Widowed"]: 132 | marital_status[ms] = 0 133 | for ms in ["Married-AF-spouse", "Married-civ-spouse", "Married-spouse-absent"]: 134 | marital_status[ms] = 1 135 | 136 | def process(dataset, mean_stds=None): 137 | features = [] 138 | targets = [] 139 | for line in load(dataset): 140 | example = [] 141 | for e, k in enumerate(line): 142 | k = k.strip().strip(".") 143 | example.append(int(feature_keys[e].get(k, k))) 144 | features.append(example[:-1]) 145 | targets.append(example[-1]) 146 | features = torch.tensor(features, dtype=torch.float) 147 | features = list(features.split(1, dim=1)) 148 | targets = torch.tensor(targets) 149 | 150 | if mean_stds is None: 151 | mean_stds = {} 152 | for e, feat in enumerate(features): 153 | keys = feature_keys[e] 154 | # Normalize continuous features: 155 | if len(keys) == 0: 156 | if e not in mean_stds: 157 | mean_stds[e] = (torch.mean(feat), torch.std(feat)) 158 | mean, std = mean_stds[e] 159 | features[e] = (feat - mean) / std 160 | # One-hot encode non-continuous features: 161 | else: 162 | num_feats = max(keys.values()) + 1 163 | features[e] = torch.nn.functional.one_hot( 164 | feat.squeeze().to(torch.long), num_feats) 165 | if drop: 166 | features[e] = features[e][:, :-1] 167 | features[e] = features[e].to(torch.float) 168 | if mehnaz20: 169 | # Remove relationship status: 170 | features.pop(7) 171 | features = torch.cat(features, dim=1) 172 | return features, targets, mean_stds 173 | 174 | features, targets, mean_stds = process("adult.data") 175 | if not train: 176 | features, targets, _ = process("adult.test", mean_stds) 177 | self.data = features 178 | self.targets = targets 179 | 180 | def download(self): 181 | logging.info("Maybe downloading UCI Adult dataset.") 182 | base_url = "http://archive.ics.uci.edu/ml/machine-learning-databases/adult/" 183 | for f in ["adult.data", "adult.names", "adult.test"]: 184 | download_url(os.path.join(base_url, f), root=self.root) 185 | 186 | 187 | class IWPC: 188 | """ 189 | International Warfarin Pharmacogenetics Consortium (IWPC) dataset: 190 | https://www.pharmgkb.org/downloads 191 | Pre-processed version from https://github.com/samuel-yeom/ml-privacy-csf18 192 | 193 | The task is to predict stable Warfarin dosage given demographic, medical, and genetic attributes. 194 | 195 | Processed data has the following attributes: 196 | 0 race: asian/white/black 197 | 1 age: rounded down to 10s and whitened 198 | 2 height: continuous; whitened 199 | 3 weight: continuous; whitened 200 | 4 amiodarone: binary 201 | 5 CYP2C9 inducer: binary 202 | 6 CYP2C9 genotype: 11/12/13/22/23/33 203 | 7 VKORC1 genotype: CC/CT/TT 204 | 8 label: Warfarin dosage; whitened 205 | """ 206 | def __init__(self, root, train=True, download=False, drop=True): 207 | """ 208 | drop: If True, remove the last feature for a one-hot encoded 209 | attribute. This helps alleviate perfect colinearity amongst 210 | the features. 
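        For example, with drop=True one column is removed per one-hot
        attribute, so the three VKORC1 genotypes (CC/CT/TT) end up encoded
        by two indicator columns.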
211 | """ 212 | if download: 213 | # download dataloading code and data from repo 214 | try: 215 | git.Repo.clone_from("git@github.com:samuel-yeom/ml-privacy-csf18.git", root) 216 | except git.GitCommandError: 217 | print("Directory exists and is non-empty, skipping download") 218 | sys.path.append(os.path.join(root, "code")) 219 | from main import load_iwpc 220 | X, y, featnames = load_iwpc(os.path.join(root, "data")) 221 | X = X.todense() 222 | if drop: 223 | attrs = [f.split("=")[0] for f in featnames] 224 | drop_keys = ["cyp2c9", "race", "vkorc1"] 225 | drop_idx = [attrs.index(k) for k in drop_keys] 226 | X = np.delete(X, drop_idx, axis=1) 227 | featnames = np.delete(featnames, drop_idx) 228 | print("Attributes: " + str(featnames)) 229 | X = torch.from_numpy(X).float() 230 | y = torch.from_numpy(y).float() 231 | # fix a random 80:20 train-val split 232 | torch.manual_seed(0) 233 | perm = torch.randperm(X.size(0)) 234 | n_train = int(0.8 * X.size(0)) 235 | if train: 236 | self.data = X[perm[:n_train], :] 237 | self.targets = y[perm[:n_train]] 238 | else: 239 | self.data = X[perm[n_train:], :] 240 | self.targets = y[perm[n_train:]] 241 | torch.manual_seed(time.time()) 242 | 243 | 244 | class MNIST1M(MNIST): 245 | """ 246 | MNIST1M dataset that can be generated using InfiMNIST. 247 | 248 | Note: This dataset cannot be downloaded automatically. 249 | """ 250 | 251 | def __init__(self, root, train=True, transform=None, target_transform=None, 252 | download=False): 253 | super(MNIST1M, self).__init__( 254 | root, 255 | train=train, 256 | transform=transform, 257 | target_transform=target_transform, 258 | download=download, 259 | ) 260 | 261 | def download(self): 262 | """ 263 | Process MNIST1M data if it does not exist in processed_folder already. 264 | """ 265 | 266 | # check if processed data does not exist: 267 | if self._check_exists(): 268 | return 269 | 270 | # process and save as torch files: 271 | logging.info("Processing MNIST1M data...") 272 | os.makedirs(self.processed_folder, exist_ok=True) 273 | training_set = ( 274 | read_image_file(os.path.join(self.raw_folder, "mnist1m-images-idx3-ubyte")), 275 | read_label_file(os.path.join(self.raw_folder, "mnist1m-labels-idx1-ubyte")) 276 | ) 277 | test_set = ( 278 | read_image_file(os.path.join(self.raw_folder, "t10k-images-idx3-ubyte")), 279 | read_label_file(os.path.join(self.raw_folder, "t10k-labels-idx1-ubyte")) 280 | ) 281 | with open(os.path.join(self.processed_folder, self.training_file), "wb") as f: 282 | torch.save(training_set, f) 283 | with open(os.path.join(self.processed_folder, self.test_file), "wb") as f: 284 | torch.save(test_set, f) 285 | logging.info("Done!") 286 | 287 | 288 | def load_dataset( 289 | name="mnist", 290 | split="train", 291 | normalize=True, 292 | reshape=True, 293 | num_classes=None, 294 | regression=False, 295 | root="/tmp", 296 | ): 297 | """ 298 | Loads train or test `split` from the dataset with the specified `name`. 299 | Setting `normalize` to `True` (default) normalizes each feature vector to 300 | lie on the unit ball. Setting `reshape` to `True` (default) flattens n-D 301 | arrays into vectors. Specifying `num_classes` selects only the first 302 | `num_classes` of the classification problem (default: all classes). 
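    A minimal usage sketch (assumes the torchvision MNIST download succeeds):

        train = load_dataset(name="mnist", split="train", num_classes=2)
        # train["features"]: flattened, unit-ball-normalized vectors of shape (N, 784)
        # train["targets"]: labels in {0, 1}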
303 | """ 304 | 305 | # assertions: 306 | assert split in ["train", "test"], f"unknown split: {split}" 307 | image_sets = { 308 | "mnist": MNIST, 309 | "mnist1m": MNIST1M, 310 | "cifar10": CIFAR10, 311 | "cifar100": CIFAR100, 312 | } 313 | datasets = { 314 | "uciadult": UCIAdult, "iwpc": IWPC, "synth": Synth} 315 | datasets.update(image_sets) 316 | assert name in datasets, f"unknown dataset: {name}" 317 | 318 | # download the image dataset: 319 | dataset = datasets[name]( 320 | f"{root}/{name}_original", 321 | download=True, 322 | train=(split == "train"), 323 | ) 324 | 325 | # preprocess the image dataset: 326 | features, targets = dataset.data, dataset.targets 327 | if not torch.is_tensor(features): 328 | features = torch.from_numpy(features) 329 | if not torch.is_tensor(targets): 330 | targets = torch.tensor(targets) 331 | if name in image_sets: 332 | features = features.float().div_(255.) 333 | if not regression: 334 | targets = targets.long() 335 | 336 | # flatten images or convert to NCHW: 337 | if reshape: 338 | features = features.reshape(features.size(0), -1) 339 | else: 340 | if len(features.shape) == 3: 341 | features = features.unsqueeze(3) 342 | features = features.permute(0, 3, 1, 2) 343 | 344 | # select only subset of classes: 345 | if not regression and num_classes is not None: 346 | assert num_classes >= 2, "number of classes must be >= 2" 347 | mask = targets.lt(num_classes) # assumes classes are 0, 1, ..., C - 1 348 | features = features[mask, :] 349 | targets = targets[mask] 350 | 351 | # normalize all samples to lie within unit ball: 352 | if normalize: 353 | assert reshape, "normalization without reshaping unsupported" 354 | features.div_(features.norm(dim=1).max()) 355 | # return dataset: 356 | return {"features": features, "targets": targets} 357 | 358 | 359 | def load_datasampler(dataset, batch_size=1, shuffle=True, transform=None): 360 | """ 361 | Returns a data sampler that yields samples of the specified `dataset` with the 362 | given `batch_size`. An optional `transform` for samples can also be given. 363 | If `shuffle` is `True` (default), samples are shuffled. 364 | """ 365 | assert dataset["features"].size(0) == dataset["targets"].size(0), \ 366 | "number of feature vectors and targets must match" 367 | if transform is not None: 368 | assert callable(transform), "transform must be callable if specified" 369 | N = dataset["features"].size(0) 370 | 371 | # define simple dataset sampler: 372 | def sampler(): 373 | idx = 0 374 | perm = torch.randperm(N) if shuffle else torch.range(0, N).long() 375 | while idx < N: 376 | 377 | # get batch: 378 | start = idx 379 | end = min(idx + batch_size, N) 380 | batch = dataset["features"][perm[start:end], :] 381 | 382 | # apply transform: 383 | if transform is not None: 384 | transformed_batch = [ 385 | transform(batch[n, :]) for n in range(batch.size(0)) 386 | ] 387 | batch = torch.stack(transformed_batch, dim=0) 388 | 389 | # return sample: 390 | yield {"features": batch, "targets": dataset["targets"][perm[start:end]]} 391 | idx += batch_size 392 | 393 | # return sampler: 394 | return sampler 395 | 396 | 397 | def subsample(data, num_samples, random=True): 398 | """ 399 | Subsamples the specified `data` to contain `num_samples` samples. Set 400 | `random` to `False` to not select samples randomly but only pick top ones. 
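    Example sketch (hypothetical size): `subsample(data, 1000)` keeps 1,000
    randomly chosen examples from every tensor in the `data` dict.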
401 | """ 402 | 403 | # assertions: 404 | assert isinstance(data, dict), "data must be a dict" 405 | assert "targets" in data, "data dict does not have targets field" 406 | dataset_size = data["targets"].nelement() 407 | assert num_samples > 0, "num_samples must be positive integer value" 408 | assert num_samples <= dataset_size, "num_samples cannot exceed data size" 409 | 410 | # subsample data: 411 | if random: 412 | permutation = torch.randperm(dataset_size) 413 | for key, value in data.items(): 414 | if random: 415 | data[key] = value.index_select(0, permutation[:num_samples]) 416 | else: 417 | data[key] = value.narrow(0, 0, num_samples).contiguous() 418 | return data 419 | 420 | 421 | def pca(data, num_dims=None, mapping=None): 422 | """ 423 | Applies PCA on the specified `data` to reduce its dimensionality to 424 | `num_dims` dimensions, and returns the reduced data and `mapping`. 425 | 426 | If a `mapping` is specified as input, `num_dims` is ignored and that mapping 427 | is applied on the input `data`. 428 | """ 429 | 430 | # work on both data tensor and data dict: 431 | data_dict = False 432 | if isinstance(data, dict): 433 | assert "features" in data, "data dict does not have features field" 434 | data_dict = True 435 | original_data = data 436 | data = original_data["features"] 437 | assert data.dim() == 2, "data tensor must be two-dimensional matrix" 438 | 439 | # compute PCA mapping: 440 | if mapping is None: 441 | assert num_dims is not None, "must specify num_dims or mapping" 442 | mean = torch.mean(data, 0, keepdim=True) 443 | zero_mean_data = data.sub(mean) 444 | covariance = torch.matmul(zero_mean_data.t(), zero_mean_data) 445 | _, projection = torch.symeig(covariance, eigenvectors=True) 446 | projection = projection[:, -min(num_dims, projection.size(1)):] 447 | mapping = {"mean": mean, "projection": projection} 448 | else: 449 | assert isinstance(mapping, dict), "mapping must be a dict" 450 | assert "mean" in mapping and "projection" in mapping, "mapping missing keys" 451 | if num_dims is not None: 452 | logging.warning("Value of num_dims is ignored when mapping is specified.") 453 | 454 | # apply PCA mapping: 455 | reduced_data = data.sub(mapping["mean"]).matmul(mapping["projection"]) 456 | 457 | # return results: 458 | if data_dict: 459 | original_data["features"] = reduced_data 460 | reduced_data = original_data 461 | return reduced_data, mapping 462 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Attribution-NonCommercial 4.0 International 2 | 3 | ======================================================================= 4 | 5 | Creative Commons Corporation ("Creative Commons") is not a law firm and 6 | does not provide legal services or legal advice. Distribution of 7 | Creative Commons public licenses does not create a lawyer-client or 8 | other relationship. Creative Commons makes its licenses and related 9 | information available on an "as-is" basis. Creative Commons gives no 10 | warranties regarding its licenses, any material licensed under their 11 | terms and conditions, or any related information. Creative Commons 12 | disclaims all liability for damages resulting from their use to the 13 | fullest extent possible. 
14 | 15 | Using Creative Commons Public Licenses 16 | 17 | Creative Commons public licenses provide a standard set of terms and 18 | conditions that creators and other rights holders may use to share 19 | original works of authorship and other material subject to copyright 20 | and certain other rights specified in the public license below. The 21 | following considerations are for informational purposes only, are not 22 | exhaustive, and do not form part of our licenses. 23 | 24 | Considerations for licensors: Our public licenses are 25 | intended for use by those authorized to give the public 26 | permission to use material in ways otherwise restricted by 27 | copyright and certain other rights. Our licenses are 28 | irrevocable. Licensors should read and understand the terms 29 | and conditions of the license they choose before applying it. 30 | Licensors should also secure all rights necessary before 31 | applying our licenses so that the public can reuse the 32 | material as expected. Licensors should clearly mark any 33 | material not subject to the license. This includes other CC- 34 | licensed material, or material used under an exception or 35 | limitation to copyright. More considerations for licensors: 36 | wiki.creativecommons.org/Considerations_for_licensors 37 | 38 | Considerations for the public: By using one of our public 39 | licenses, a licensor grants the public permission to use the 40 | licensed material under specified terms and conditions. If 41 | the licensor's permission is not necessary for any reason--for 42 | example, because of any applicable exception or limitation to 43 | copyright--then that use is not regulated by the license. Our 44 | licenses grant only permissions under copyright and certain 45 | other rights that a licensor has authority to grant. Use of 46 | the licensed material may still be restricted for other 47 | reasons, including because others have copyright or other 48 | rights in the material. A licensor may make special requests, 49 | such as asking that all changes be marked or described. 50 | Although not required by our licenses, you are encouraged to 51 | respect those requests where reasonable. More_considerations 52 | for the public: 53 | wiki.creativecommons.org/Considerations_for_licensees 54 | 55 | ======================================================================= 56 | 57 | Creative Commons Attribution-NonCommercial 4.0 International Public 58 | License 59 | 60 | By exercising the Licensed Rights (defined below), You accept and agree 61 | to be bound by the terms and conditions of this Creative Commons 62 | Attribution-NonCommercial 4.0 International Public License ("Public 63 | License"). To the extent this Public License may be interpreted as a 64 | contract, You are granted the Licensed Rights in consideration of Your 65 | acceptance of these terms and conditions, and the Licensor grants You 66 | such rights in consideration of benefits the Licensor receives from 67 | making the Licensed Material available under these terms and 68 | conditions. 69 | 70 | Section 1 -- Definitions. 71 | 72 | a. Adapted Material means material subject to Copyright and Similar 73 | Rights that is derived from or based upon the Licensed Material 74 | and in which the Licensed Material is translated, altered, 75 | arranged, transformed, or otherwise modified in a manner requiring 76 | permission under the Copyright and Similar Rights held by the 77 | Licensor. 
For purposes of this Public License, where the Licensed 78 | Material is a musical work, performance, or sound recording, 79 | Adapted Material is always produced where the Licensed Material is 80 | synched in timed relation with a moving image. 81 | 82 | b. Adapter's License means the license You apply to Your Copyright 83 | and Similar Rights in Your contributions to Adapted Material in 84 | accordance with the terms and conditions of this Public License. 85 | 86 | c. Copyright and Similar Rights means copyright and/or similar rights 87 | closely related to copyright including, without limitation, 88 | performance, broadcast, sound recording, and Sui Generis Database 89 | Rights, without regard to how the rights are labeled or 90 | categorized. For purposes of this Public License, the rights 91 | specified in Section 2(b)(1)-(2) are not Copyright and Similar 92 | Rights. 93 | d. Effective Technological Measures means those measures that, in the 94 | absence of proper authority, may not be circumvented under laws 95 | fulfilling obligations under Article 11 of the WIPO Copyright 96 | Treaty adopted on December 20, 1996, and/or similar international 97 | agreements. 98 | 99 | e. Exceptions and Limitations means fair use, fair dealing, and/or 100 | any other exception or limitation to Copyright and Similar Rights 101 | that applies to Your use of the Licensed Material. 102 | 103 | f. Licensed Material means the artistic or literary work, database, 104 | or other material to which the Licensor applied this Public 105 | License. 106 | 107 | g. Licensed Rights means the rights granted to You subject to the 108 | terms and conditions of this Public License, which are limited to 109 | all Copyright and Similar Rights that apply to Your use of the 110 | Licensed Material and that the Licensor has authority to license. 111 | 112 | h. Licensor means the individual(s) or entity(ies) granting rights 113 | under this Public License. 114 | 115 | i. NonCommercial means not primarily intended for or directed towards 116 | commercial advantage or monetary compensation. For purposes of 117 | this Public License, the exchange of the Licensed Material for 118 | other material subject to Copyright and Similar Rights by digital 119 | file-sharing or similar means is NonCommercial provided there is 120 | no payment of monetary compensation in connection with the 121 | exchange. 122 | 123 | j. Share means to provide material to the public by any means or 124 | process that requires permission under the Licensed Rights, such 125 | as reproduction, public display, public performance, distribution, 126 | dissemination, communication, or importation, and to make material 127 | available to the public including in ways that members of the 128 | public may access the material from a place and at a time 129 | individually chosen by them. 130 | 131 | k. Sui Generis Database Rights means rights other than copyright 132 | resulting from Directive 96/9/EC of the European Parliament and of 133 | the Council of 11 March 1996 on the legal protection of databases, 134 | as amended and/or succeeded, as well as other essentially 135 | equivalent rights anywhere in the world. 136 | 137 | l. You means the individual or entity exercising the Licensed Rights 138 | under this Public License. Your has a corresponding meaning. 139 | 140 | Section 2 -- Scope. 141 | 142 | a. License grant. 143 | 144 | 1. 
Subject to the terms and conditions of this Public License, 145 | the Licensor hereby grants You a worldwide, royalty-free, 146 | non-sublicensable, non-exclusive, irrevocable license to 147 | exercise the Licensed Rights in the Licensed Material to: 148 | 149 | a. reproduce and Share the Licensed Material, in whole or 150 | in part, for NonCommercial purposes only; and 151 | 152 | b. produce, reproduce, and Share Adapted Material for 153 | NonCommercial purposes only. 154 | 155 | 2. Exceptions and Limitations. For the avoidance of doubt, where 156 | Exceptions and Limitations apply to Your use, this Public 157 | License does not apply, and You do not need to comply with 158 | its terms and conditions. 159 | 160 | 3. Term. The term of this Public License is specified in Section 161 | 6(a). 162 | 163 | 4. Media and formats; technical modifications allowed. The 164 | Licensor authorizes You to exercise the Licensed Rights in 165 | all media and formats whether now known or hereafter created, 166 | and to make technical modifications necessary to do so. The 167 | Licensor waives and/or agrees not to assert any right or 168 | authority to forbid You from making technical modifications 169 | necessary to exercise the Licensed Rights, including 170 | technical modifications necessary to circumvent Effective 171 | Technological Measures. For purposes of this Public License, 172 | simply making modifications authorized by this Section 2(a) 173 | (4) never produces Adapted Material. 174 | 175 | 5. Downstream recipients. 176 | 177 | a. Offer from the Licensor -- Licensed Material. Every 178 | recipient of the Licensed Material automatically 179 | receives an offer from the Licensor to exercise the 180 | Licensed Rights under the terms and conditions of this 181 | Public License. 182 | 183 | b. No downstream restrictions. You may not offer or impose 184 | any additional or different terms or conditions on, or 185 | apply any Effective Technological Measures to, the 186 | Licensed Material if doing so restricts exercise of the 187 | Licensed Rights by any recipient of the Licensed 188 | Material. 189 | 190 | 6. No endorsement. Nothing in this Public License constitutes or 191 | may be construed as permission to assert or imply that You 192 | are, or that Your use of the Licensed Material is, connected 193 | with, or sponsored, endorsed, or granted official status by, 194 | the Licensor or others designated to receive attribution as 195 | provided in Section 3(a)(1)(A)(i). 196 | 197 | b. Other rights. 198 | 199 | 1. Moral rights, such as the right of integrity, are not 200 | licensed under this Public License, nor are publicity, 201 | privacy, and/or other similar personality rights; however, to 202 | the extent possible, the Licensor waives and/or agrees not to 203 | assert any such rights held by the Licensor to the limited 204 | extent necessary to allow You to exercise the Licensed 205 | Rights, but not otherwise. 206 | 207 | 2. Patent and trademark rights are not licensed under this 208 | Public License. 209 | 210 | 3. To the extent possible, the Licensor waives any right to 211 | collect royalties from You for the exercise of the Licensed 212 | Rights, whether directly or through a collecting society 213 | under any voluntary or waivable statutory or compulsory 214 | licensing scheme. In all other cases the Licensor expressly 215 | reserves any right to collect such royalties, including when 216 | the Licensed Material is used other than for NonCommercial 217 | purposes. 
218 | 219 | Section 3 -- License Conditions. 220 | 221 | Your exercise of the Licensed Rights is expressly made subject to the 222 | following conditions. 223 | 224 | a. Attribution. 225 | 226 | 1. If You Share the Licensed Material (including in modified 227 | form), You must: 228 | 229 | a. retain the following if it is supplied by the Licensor 230 | with the Licensed Material: 231 | 232 | i. identification of the creator(s) of the Licensed 233 | Material and any others designated to receive 234 | attribution, in any reasonable manner requested by 235 | the Licensor (including by pseudonym if 236 | designated); 237 | 238 | ii. a copyright notice; 239 | 240 | iii. a notice that refers to this Public License; 241 | 242 | iv. a notice that refers to the disclaimer of 243 | warranties; 244 | 245 | v. a URI or hyperlink to the Licensed Material to the 246 | extent reasonably practicable; 247 | 248 | b. indicate if You modified the Licensed Material and 249 | retain an indication of any previous modifications; and 250 | 251 | c. indicate the Licensed Material is licensed under this 252 | Public License, and include the text of, or the URI or 253 | hyperlink to, this Public License. 254 | 255 | 2. You may satisfy the conditions in Section 3(a)(1) in any 256 | reasonable manner based on the medium, means, and context in 257 | which You Share the Licensed Material. For example, it may be 258 | reasonable to satisfy the conditions by providing a URI or 259 | hyperlink to a resource that includes the required 260 | information. 261 | 262 | 3. If requested by the Licensor, You must remove any of the 263 | information required by Section 3(a)(1)(A) to the extent 264 | reasonably practicable. 265 | 266 | 4. If You Share Adapted Material You produce, the Adapter's 267 | License You apply must not prevent recipients of the Adapted 268 | Material from complying with this Public License. 269 | 270 | Section 4 -- Sui Generis Database Rights. 271 | 272 | Where the Licensed Rights include Sui Generis Database Rights that 273 | apply to Your use of the Licensed Material: 274 | 275 | a. for the avoidance of doubt, Section 2(a)(1) grants You the right 276 | to extract, reuse, reproduce, and Share all or a substantial 277 | portion of the contents of the database for NonCommercial purposes 278 | only; 279 | 280 | b. if You include all or a substantial portion of the database 281 | contents in a database in which You have Sui Generis Database 282 | Rights, then the database in which You have Sui Generis Database 283 | Rights (but not its individual contents) is Adapted Material; and 284 | 285 | c. You must comply with the conditions in Section 3(a) if You Share 286 | all or a substantial portion of the contents of the database. 287 | 288 | For the avoidance of doubt, this Section 4 supplements and does not 289 | replace Your obligations under this Public License where the Licensed 290 | Rights include other Copyright and Similar Rights. 291 | 292 | Section 5 -- Disclaimer of Warranties and Limitation of Liability. 293 | 294 | a. UNLESS OTHERWISE SEPARATELY UNDERTAKEN BY THE LICENSOR, TO THE 295 | EXTENT POSSIBLE, THE LICENSOR OFFERS THE LICENSED MATERIAL AS-IS 296 | AND AS-AVAILABLE, AND MAKES NO REPRESENTATIONS OR WARRANTIES OF 297 | ANY KIND CONCERNING THE LICENSED MATERIAL, WHETHER EXPRESS, 298 | IMPLIED, STATUTORY, OR OTHER. 
THIS INCLUDES, WITHOUT LIMITATION, 299 | WARRANTIES OF TITLE, MERCHANTABILITY, FITNESS FOR A PARTICULAR 300 | PURPOSE, NON-INFRINGEMENT, ABSENCE OF LATENT OR OTHER DEFECTS, 301 | ACCURACY, OR THE PRESENCE OR ABSENCE OF ERRORS, WHETHER OR NOT 302 | KNOWN OR DISCOVERABLE. WHERE DISCLAIMERS OF WARRANTIES ARE NOT 303 | ALLOWED IN FULL OR IN PART, THIS DISCLAIMER MAY NOT APPLY TO YOU. 304 | 305 | b. TO THE EXTENT POSSIBLE, IN NO EVENT WILL THE LICENSOR BE LIABLE 306 | TO YOU ON ANY LEGAL THEORY (INCLUDING, WITHOUT LIMITATION, 307 | NEGLIGENCE) OR OTHERWISE FOR ANY DIRECT, SPECIAL, INDIRECT, 308 | INCIDENTAL, CONSEQUENTIAL, PUNITIVE, EXEMPLARY, OR OTHER LOSSES, 309 | COSTS, EXPENSES, OR DAMAGES ARISING OUT OF THIS PUBLIC LICENSE OR 310 | USE OF THE LICENSED MATERIAL, EVEN IF THE LICENSOR HAS BEEN 311 | ADVISED OF THE POSSIBILITY OF SUCH LOSSES, COSTS, EXPENSES, OR 312 | DAMAGES. WHERE A LIMITATION OF LIABILITY IS NOT ALLOWED IN FULL OR 313 | IN PART, THIS LIMITATION MAY NOT APPLY TO YOU. 314 | 315 | c. The disclaimer of warranties and limitation of liability provided 316 | above shall be interpreted in a manner that, to the extent 317 | possible, most closely approximates an absolute disclaimer and 318 | waiver of all liability. 319 | 320 | Section 6 -- Term and Termination. 321 | 322 | a. This Public License applies for the term of the Copyright and 323 | Similar Rights licensed here. However, if You fail to comply with 324 | this Public License, then Your rights under this Public License 325 | terminate automatically. 326 | 327 | b. Where Your right to use the Licensed Material has terminated under 328 | Section 6(a), it reinstates: 329 | 330 | 1. automatically as of the date the violation is cured, provided 331 | it is cured within 30 days of Your discovery of the 332 | violation; or 333 | 334 | 2. upon express reinstatement by the Licensor. 335 | 336 | For the avoidance of doubt, this Section 6(b) does not affect any 337 | right the Licensor may have to seek remedies for Your violations 338 | of this Public License. 339 | 340 | c. For the avoidance of doubt, the Licensor may also offer the 341 | Licensed Material under separate terms or conditions or stop 342 | distributing the Licensed Material at any time; however, doing so 343 | will not terminate this Public License. 344 | 345 | d. Sections 1, 5, 6, 7, and 8 survive termination of this Public 346 | License. 347 | 348 | Section 7 -- Other Terms and Conditions. 349 | 350 | a. The Licensor shall not be bound by any additional or different 351 | terms or conditions communicated by You unless expressly agreed. 352 | 353 | b. Any arrangements, understandings, or agreements regarding the 354 | Licensed Material not stated herein are separate from and 355 | independent of the terms and conditions of this Public License. 356 | 357 | Section 8 -- Interpretation. 358 | 359 | a. For the avoidance of doubt, this Public License does not, and 360 | shall not be interpreted to, reduce, limit, restrict, or impose 361 | conditions on any use of the Licensed Material that could lawfully 362 | be made without permission under this Public License. 363 | 364 | b. To the extent possible, if any provision of this Public License is 365 | deemed unenforceable, it shall be automatically reformed to the 366 | minimum extent necessary to make it enforceable. If the provision 367 | cannot be reformed, it shall be severed from this Public License 368 | without affecting the enforceability of the remaining terms and 369 | conditions. 370 | 371 | c. 
No term or condition of this Public License will be waived and no 372 | failure to comply consented to unless expressly agreed to by the 373 | Licensor. 374 | 375 | d. Nothing in this Public License constitutes or may be interpreted 376 | as a limitation upon, or waiver of, any privileges and immunities 377 | that apply to the Licensor or You, including from the legal 378 | processes of any jurisdiction or authority. 379 | 380 | ======================================================================= 381 | 382 | Creative Commons is not a party to its public 383 | licenses. Notwithstanding, Creative Commons may elect to apply one of 384 | its public licenses to material it publishes and in those instances 385 | will be considered the “Licensor.” The text of the Creative Commons 386 | public licenses is dedicated to the public domain under the CC0 Public 387 | Domain Dedication. Except for the limited purpose of indicating that 388 | material is shared under a Creative Commons public license or as 389 | otherwise permitted by the Creative Commons policies published at 390 | creativecommons.org/policies, Creative Commons does not authorize the 391 | use of the trademark "Creative Commons" or any other trademark or logo 392 | of Creative Commons without its prior written consent including, 393 | without limitation, in connection with any unauthorized modifications 394 | to any of its public licenses or any other arrangements, 395 | understandings, or agreements concerning use of licensed material. For 396 | the avoidance of doubt, this paragraph does not form part of the 397 | public licenses. 398 | 399 | Creative Commons may be contacted at creativecommons.org. 400 | --------------------------------------------------------------------------------