├── scripts
│   ├── .gitignore
│   ├── run_experiments.sh
│   └── make_figures.py
├── CODE_OF_CONDUCT.md
├── CONTRIBUTING.md
├── README.md
├── plotting.py
├── .gitignore
├── private_model_inversion.py
├── fisher_experiment.py
├── reweighted.py
├── models.py
├── test_jacobians.py
├── model_inversion.py
├── dataloading.py
└── LICENSE

/scripts/.gitignore:
--------------------------------------------------------------------------------
1 | *.pdf
2 | *.png
3 | *.json
4 | *.pth
5 |
--------------------------------------------------------------------------------
/CODE_OF_CONDUCT.md:
--------------------------------------------------------------------------------
1 | # Code of Conduct
2 |
3 | Facebook has adopted a Code of Conduct that we expect project participants to adhere to.
4 | Please read the [full text](https://code.fb.com/codeofconduct/)
5 | so that you can understand what actions will and will not be tolerated.
6 |
--------------------------------------------------------------------------------
/CONTRIBUTING.md:
--------------------------------------------------------------------------------
1 | # Contributing to `fisher_information_loss`
2 | We want to make contributing to this project as easy and transparent as
3 | possible.
4 |
5 | ## Pull Requests
6 | We actively welcome your pull requests.
7 |
8 | 1. Fork the repo and create your branch from `master`.
9 | 2. If you've added code that should be tested, add tests.
10 | 3. If you've changed APIs, update the documentation.
11 | 4. Ensure the test suite passes.
12 | 5. If you haven't already, complete the Contributor License Agreement ("CLA").
13 |
14 | ## Contributor License Agreement ("CLA")
15 | In order to accept your pull request, we need you to submit a CLA. You only need
16 | to do this once to work on any of Facebook's open source projects.
17 |
18 | Complete your CLA here:
19 |
20 | ## Issues
21 | We use GitHub issues to track public bugs. Please ensure your description is
22 | clear and has sufficient instructions to be able to reproduce the issue.
23 |
24 | ## License
25 | By contributing to `fisher_information_loss`, you agree that your contributions will be licensed
26 | under the LICENSE file in the root directory of this source tree.
27 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Fisher Information Loss
2 |
3 | This repository contains code that can be used to reproduce the experimental
4 | results presented in the paper:
5 |
6 | Awni Hannun, Chuan Guo and Laurens van der Maaten. Measuring Data Leakage in
7 | Machine-Learning Models with Fisher Information.
8 | [arXiv:2102.11673](https://arxiv.org/abs/2102.11673), 2021.
9 |
10 | # Installation
11 |
12 | The code requires Python 3.7+, [PyTorch
13 | 1.7.1+](https://pytorch.org/get-started/locally/), and torchvision 0.8.2+.
14 |
15 | Create an Anaconda environment and install the dependencies:
16 |
17 | ```
18 | conda create --name fil
19 | conda activate fil
20 | conda install -c pytorch pytorch torchvision
21 | pip install gitpython numpy
22 | ```
23 |
24 | # Usage
25 |
26 | The script `fisher_experiment.py` computes the per-example FIL for the given
27 | dataset and model.
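Concretely, the FIL of a training example is the spectral norm (largest
singular value) of that example's influence Jacobian, i.e. the Jacobian of the
learned parameters with respect to the example's features and label. A minimal
sketch of this final step (the helper name below is illustrative; the
repository's implementation is `compute_information_loss` in `models.py`):

```
import torch

def per_example_fil(J):
    # J has shape (N, d, d + 1): for each of the N training examples, the
    # Jacobian of the d model parameters with respect to that example's
    # features and its label. The per-example FIL eta_i is the spectral
    # norm of J_i.
    return torch.linalg.norm(J, ord=2, dim=(1, 2))
```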
An example run is:
28 |
29 | ```
30 | python fisher_experiment.py \
31 |     --dataset mnist \
32 |     --model least_squares
33 | ```
34 |
35 | To see usage options for the script run:
36 |
37 | ```
38 | python fisher_experiment.py --help
39 | ```
40 |
41 | Other scripts in the repository are:
42 | - `reweighted.py` : Run the iteratively reweighted Fisher information loss
43 |   (IRFIL) algorithm.
44 | - `model_inversion.py` : Attribute inversion experiments for a non-private
45 |   model.
46 | - `private_model_inversion.py` : Attribute inversion experiments for a private
47 |   model.
48 | - `test_jacobians.py` : Unit tests.
49 |
50 | To run the full set of experiments in the accompanying paper:
51 | ```
52 | cd scripts/ && ./run_experiments.sh
53 | ```
54 |
55 | # Citing this Repository
56 |
57 | If you use the code in this repository, please cite the following paper:
58 |
59 | ```
60 | @inproceedings{hannun2021fil,
61 |   title={Measuring Data Leakage in Machine-Learning Models with Fisher
62 |     Information},
63 |   author={Hannun, Awni and Guo, Chuan and van der Maaten, Laurens},
64 |   booktitle={Conference on Uncertainty in Artificial Intelligence},
65 |   year={2021}
66 | }
67 | ```
68 |
69 | # License
70 |
71 | This code is released under a CC-BY-NC 4.0 license. Please see the
72 | [LICENSE](LICENSE) file for more information.
73 |
74 | Please review Facebook Open Source [Terms of
75 | Use](https://opensource.facebook.com/legal/terms) and [Privacy
76 | Policy](https://opensource.facebook.com/legal/privacy).
77 |
78 |
--------------------------------------------------------------------------------
/plotting.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python3
2 |
3 | # Copyright (c) Facebook, Inc. and its affiliates.
4 | # All rights reserved.
5 |
6 | # This source code is licensed under the license found in the
7 | # LICENSE file in the root directory of this source tree.
8 |
9 |
10 | """
11 | A simple example of creating a figure with text rendered in LaTeX.
12 |
13 | https://jwalton.info/Embed-Publication-Matplotlib-Latex/
14 | """
15 |
16 | import seaborn as sns
17 | import matplotlib.pyplot as plt
18 |
19 | # Using seaborn's style
20 | plt.style.use('seaborn-white')
21 |
22 | WIDTH = 345
23 | GR = (5**.5 - 1) / 2
24 | FORMAT = "pdf"
25 |
26 | tex_fonts = {
27 |     # Use LaTeX to write all text
28 |     "text.usetex": True,
29 |     "font.family": "serif",
30 |     "axes.labelsize": 14,
31 |     "font.size": 14,
32 |     # Make the legend/label fonts a little smaller
33 |     "legend.fontsize": 12,
34 |     "xtick.labelsize": 12,
35 |     "ytick.labelsize": 12
36 | }
37 |
38 | plt.rcParams.update(tex_fonts)
39 | plt.rcParams.update({"legend.handlelength": 1})
40 |
41 | def savefig(filename):
42 |     plt.savefig(
43 |         filename + "."
+ FORMAT, format=FORMAT, dpi=1200, bbox_inches="tight") 44 | 45 | def line_plot( 46 | Y, X, xlabel=None, ylabel=None, ymax=None, ymin=None, 47 | xmax=None, xmin=None, filename=None, legend=None, errors=None, 48 | xlog=False, ylog=False, size=None, marker="s"): 49 | colors = sns.cubehelix_palette(Y.shape[0], start=2, rot=0, dark=0, light=.5) 50 | plt.clf() 51 | if legend is None: 52 | legend = [None] * Y.shape[0] 53 | 54 | if size is not None: 55 | plt.figure(figsize=size) 56 | 57 | for n in range(Y.shape[0]): 58 | x = X[n, :] if X.ndim == 2 else X 59 | plt.plot(x, Y[n, :], label=legend[n], color=colors[n], 60 | marker=marker, markersize=5) 61 | if errors is not None: 62 | plt.fill_between( 63 | x, Y[n, :] - errors[n, :], Y[n, :] + errors[n, :], 64 | alpha=0.1, color=colors[n]) 65 | 66 | if ymax is not None: 67 | plt.ylim(top=ymax) 68 | if ymin is not None: 69 | plt.ylim(bottom=ymin) 70 | if xmax is not None: 71 | plt.xlim(right=xmax) 72 | if xmin is not None: 73 | plt.xlim(left=xmin) 74 | 75 | plt.xlabel(xlabel) 76 | plt.ylabel(ylabel) 77 | if legend[0] is not None: 78 | plt.legend() 79 | 80 | axes = plt.gca() 81 | if xlog: 82 | axes.semilogx(10.) 83 | if ylog: 84 | axes.semilogy(10.) 85 | 86 | if filename is not None: 87 | savefig(filename) 88 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | # Misc. 2 | .nfs* 3 | .DS_Store 4 | 5 | # vim 6 | *.swp 7 | 8 | # Ignore results folder 9 | results/ 10 | 11 | # Byte-compiled / optimized / DLL files 12 | __pycache__/ 13 | *.py[cod] 14 | *$py.class 15 | 16 | # C extensions 17 | *.so 18 | 19 | # Distribution / packaging 20 | .Python 21 | build/ 22 | develop-eggs/ 23 | dist/ 24 | downloads/ 25 | eggs/ 26 | .eggs/ 27 | lib/ 28 | lib64/ 29 | parts/ 30 | sdist/ 31 | var/ 32 | wheels/ 33 | pip-wheel-metadata/ 34 | share/python-wheels/ 35 | *.egg-info/ 36 | .installed.cfg 37 | *.egg 38 | MANIFEST 39 | 40 | # PyInstaller 41 | # Usually these files are written by a python script from a template 42 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 43 | *.manifest 44 | *.spec 45 | 46 | # Installer logs 47 | pip-log.txt 48 | pip-delete-this-directory.txt 49 | 50 | # Unit test / coverage reports 51 | htmlcov/ 52 | .tox/ 53 | .nox/ 54 | .coverage 55 | .coverage.* 56 | .cache 57 | nosetests.xml 58 | coverage.xml 59 | *.cover 60 | *.py,cover 61 | .hypothesis/ 62 | .pytest_cache/ 63 | 64 | # Translations 65 | *.mo 66 | *.pot 67 | 68 | # Django stuff: 69 | *.log 70 | local_settings.py 71 | db.sqlite3 72 | db.sqlite3-journal 73 | 74 | # Flask stuff: 75 | instance/ 76 | .webassets-cache 77 | 78 | # Scrapy stuff: 79 | .scrapy 80 | 81 | # Sphinx documentation 82 | docs/_build/ 83 | 84 | # PyBuilder 85 | target/ 86 | 87 | # Jupyter Notebook 88 | .ipynb_checkpoints 89 | 90 | # IPython 91 | profile_default/ 92 | ipython_config.py 93 | 94 | # pyenv 95 | .python-version 96 | 97 | # pipenv 98 | # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control. 99 | # However, in case of collaboration, if having platform-specific dependencies or dependencies 100 | # having no cross-platform support, pipenv may install dependencies that don't work, or not 101 | # install all needed dependencies. 102 | #Pipfile.lock 103 | 104 | # PEP 582; used by e.g. 
github.com/David-OConnor/pyflow 105 | __pypackages__/ 106 | 107 | # Celery stuff 108 | celerybeat-schedule 109 | celerybeat.pid 110 | 111 | # SageMath parsed files 112 | *.sage.py 113 | 114 | # Environments 115 | .env 116 | .venv 117 | env/ 118 | venv/ 119 | ENV/ 120 | env.bak/ 121 | venv.bak/ 122 | 123 | # Spyder project settings 124 | .spyderproject 125 | .spyproject 126 | 127 | # Rope project settings 128 | .ropeproject 129 | 130 | # mkdocs documentation 131 | /site 132 | 133 | # mypy 134 | .mypy_cache/ 135 | .dmypy.json 136 | dmypy.json 137 | 138 | # Pyre type checker 139 | .pyre/ 140 | -------------------------------------------------------------------------------- /scripts/run_experiments.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | # Copyright (c) Facebook, Inc. and its affiliates. 4 | # All rights reserved. 5 | 6 | # This source code is licensed under the license found in the 7 | # LICENSE file in the root directory of this source tree. 8 | 9 | REPO_ROOT=`pwd` 10 | RESULT_FOLDER=$REPO_ROOT/uai2021 11 | 12 | ### MNIST AND CIFAR EXPERIMENTS ### 13 | 14 | # Linear regression for MNIST and CIFAR-10 15 | for DATASET in "mnist" "cifar10" 16 | do 17 | RESULTS_FILE="${RESULT_FOLDER}/${DATASET}_linear_pca20.json" 18 | python $REPO_ROOT/fisher_experiment.py \ 19 | --dataset $DATASET \ 20 | --model least_squares \ 21 | --trials 1 \ 22 | --results_file $RESULTS_FILE 23 | done 24 | 25 | # Logistic regression for MNIST and CIFAR-10 26 | RESULTS_FILE="${RESULT_FOLDER}/mnist_logistic_pca20.json" 27 | python $REPO_ROOT/fisher_experiment.py \ 28 | --dataset mnist \ 29 | --model logistic \ 30 | --l2 8e-4 \ 31 | --trials 1 \ 32 | --results_file $RESULTS_FILE 33 | 34 | RESULTS_FILE="${RESULT_FOLDER}/cifar10_logistic_pca20.json" 35 | python $REPO_ROOT/fisher_experiment.py \ 36 | --dataset cifar10 \ 37 | --model logistic \ 38 | --l2 8e-5 \ 39 | --trials 1 \ 40 | --results_file $RESULTS_FILE 41 | 42 | # Iteratively reweighted FP for linear regression 43 | for DATASET in "mnist" "cifar10" 44 | do 45 | RESULTS_FILE="${RESULT_FOLDER}/${DATASET}_linear_reweighted" 46 | python $REPO_ROOT/reweighted.py \ 47 | --dataset $DATASET \ 48 | --model least_squares \ 49 | --weight_method sample \ 50 | --iters 15 \ 51 | --results_file $RESULTS_FILE 52 | done 53 | 54 | # Iteratively reweighted FP for logistic regression 55 | RESULTS_FILE="${RESULT_FOLDER}/mnist_logistic_reweighted" 56 | python $REPO_ROOT/reweighted.py \ 57 | --dataset mnist \ 58 | --model logistic \ 59 | --weight_method sample \ 60 | --iters 15 \ 61 | --l2 8e-4 \ 62 | --results_file $RESULTS_FILE 63 | 64 | RESULTS_FILE="${RESULT_FOLDER}/cifar10_logistic_reweighted" 65 | python $REPO_ROOT/reweighted.py \ 66 | --dataset cifar10 \ 67 | --model logistic \ 68 | --weight_method sample \ 69 | --iters 15 \ 70 | --l2 8e-5 \ 71 | --results_file $RESULTS_FILE 72 | 73 | 74 | ### IWPC EXPERIMENTS ### 75 | 76 | DATASET="iwpc" 77 | MODEL="least_squares" 78 | 79 | # For L2 and sigma inversion plots: 80 | for L2 in "1e-5" "1e-3" "1e-1" "1" 81 | do 82 | FIL_RESULTS="${RESULT_FOLDER}/${DATASET}_${MODEL}_fil_l2_${L2}" 83 | python $REPO_ROOT/reweighted.py \ 84 | --dataset $DATASET \ 85 | --model $MODEL \ 86 | --pca_dims 0 \ 87 | --no_norm \ 88 | --l2 $L2 \ 89 | --attribute 11 13 \ 90 | --results_file $FIL_RESULTS 91 | 92 | INVERSION_RESULTS="${RESULT_FOLDER}/${DATASET}_${MODEL}_inversion_l2_${L2}.json" 93 | python $REPO_ROOT/model_inversion.py \ 94 | --inverter all \ 95 | --dataset $DATASET \ 96 | --model $MODEL 
\ 97 | --l2 $L2 \ 98 | --results_file $INVERSION_RESULTS 99 | 100 | for INVERTER in 'fredrikson14' 'whitebox' 101 | do 102 | INVERSION_RESULTS="${RESULT_FOLDER}/${DATASET}_${MODEL}_${INVERTER}_private_inversion_l2_${L2}.json" 103 | python $REPO_ROOT/private_model_inversion.py \ 104 | --dataset $DATASET \ 105 | --trials 100 \ 106 | --noise_scales 1e-5 2e-5 5e-5 1e-4 2e-4 5e-4 1e-3 2e-3 5e-3 1e-2 2e-2 5e-2 1e-1 2e-1 5e-1 1 \ 107 | --inverter $INVERTER \ 108 | --model $MODEL \ 109 | --l2 $L2 \ 110 | --results_file $INVERSION_RESULTS 111 | done 112 | done 113 | 114 | # For IRFIL inversion plots: 115 | L2=1e-2 116 | IRFIL_RESULTS="${RESULT_FOLDER}/${DATASET}_${MODEL}_irfil" 117 | python $REPO_ROOT/reweighted.py \ 118 | --dataset $DATASET \ 119 | --model $MODEL \ 120 | --pca_dims 0 \ 121 | --iters 10 \ 122 | --no_norm \ 123 | --l2 $L2 \ 124 | --attribute 11 13 \ 125 | --results_file $IRFIL_RESULTS 126 | 127 | for INVERTER in 'fredrikson14' 'whitebox' 128 | do 129 | INVERSION_RESULTS="${RESULT_FOLDER}/${DATASET}_${MODEL}_${INVERTER}_private_inversion_irfil.json" 130 | python $REPO_ROOT/private_model_inversion.py \ 131 | --dataset $DATASET \ 132 | --trials 100 \ 133 | --noise_scales 1e-4 1e-3 1e-2 \ 134 | --inverter $INVERTER \ 135 | --model $MODEL \ 136 | --l2 $L2 \ 137 | --weights_file ${IRFIL_RESULTS}.pth \ 138 | --results_file $INVERSION_RESULTS 139 | done 140 | 141 | ### UCI ADULT EXPERIMENTS ### 142 | DATASET="uciadult" 143 | MODEL="least_squares" 144 | L2=1e-3 145 | 146 | RESULTS_FILE="${RESULT_FOLDER}/${DATASET}_${MODEL}_inversion.json" 147 | python $REPO_ROOT/model_inversion.py \ 148 | --dataset $DATASET \ 149 | --model $MODEL \ 150 | --l2 $L2 \ 151 | --inverter all \ 152 | --results_file $RESULTS_FILE 153 | 154 | IRFIL_RESULTS="${RESULT_FOLDER}/${DATASET}_${MODEL}_irfil" 155 | python $REPO_ROOT/reweighted.py \ 156 | --dataset $DATASET \ 157 | --model $MODEL \ 158 | --iters 10 \ 159 | --l2 $L2 \ 160 | --pca_dims 0 \ 161 | --no_norm \ 162 | --attribute 24 25 \ 163 | --results_file $IRFIL_RESULTS 164 | 165 | RESULTS_FILE="${RESULT_FOLDER}/${DATASET}_${MODEL}_whitebox_private_inversion_irfil.json" 166 | python $REPO_ROOT/private_model_inversion.py \ 167 | --dataset $DATASET \ 168 | --model $MODEL \ 169 | --l2 $L2 \ 170 | --inverter whitebox \ 171 | --trials 100 \ 172 | --noise_scales 1e-4 1e-3 1e-2 1e-1 1 \ 173 | --weights_file ${IRFIL_RESULTS}.pth \ 174 | --results_file $RESULTS_FILE 175 | -------------------------------------------------------------------------------- /private_model_inversion.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | # Copyright (c) Facebook, Inc. and its affiliates. 4 | # All rights reserved. 5 | 6 | # This source code is licensed under the license found in the 7 | # LICENSE file in the root directory of this source tree. 
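"""
Attribute inversion attacks against output-perturbed ("private") models.

Trains a model, adds Gaussian noise of a given scale to its parameters, and
measures how well the Fredrikson et al. or whitebox inverter recovers the
target attribute, along with the noisy model's train and test performance.
"""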
8 | 9 | import argparse 10 | import json 11 | import logging 12 | import math 13 | import time 14 | import torch 15 | 16 | import models 17 | import dataloading 18 | from model_inversion import fredrikson14_inverter, WhiteboxInverter, compute_metrics, features_to_category 19 | 20 | # set up logger: 21 | logger = logging.getLogger() 22 | logger.setLevel(logging.INFO) 23 | 24 | 25 | def eval_model(model, data, regression): 26 | predictions = model.predict(data["features"], regression=regression) 27 | if regression: 28 | acc = (predictions - data["targets"]).pow(2).mean() 29 | else: 30 | acc = ((predictions == data["targets"]).float()).mean() 31 | return acc.item() 32 | 33 | 34 | def run_inversion(args, data, test_data, weights): 35 | regression = (args.dataset == "iwpc" or args.dataset == "synth") 36 | 37 | # Train model: 38 | model = models.get_model(args.model) 39 | logging.info(f"Training model {args.model}") 40 | model.train(data, l2=args.l2, weights=weights) 41 | 42 | if args.dataset == "uciadult": 43 | target_attribute = (24, 25) # [not married, married] 44 | elif args.dataset == "iwpc": 45 | #target_attribute = (2, 7) # CYP2C9 genotype 46 | target_attribute = (11, 13) # VKORC1 genotype 47 | elif args.dataset == "synth": 48 | target_attribute = (0, 2) 49 | else: 50 | raise NotImplementedError("Dataset not yet implemented.") 51 | 52 | if args.inverter == "fredrikson14": 53 | def invert(private_model, noise_scale=None): 54 | return fredrikson14_inverter( 55 | data, target_attribute, private_model, weights) 56 | invert_fn = invert 57 | elif args.inverter == "whitebox": 58 | inverter = WhiteboxInverter( 59 | data, target_attribute, type(model), weights, args.l2) 60 | def invert(private_model, noise_scale=None): 61 | return inverter.predict(private_model, gamma=noise_scale) 62 | invert_fn = invert 63 | 64 | results = {} 65 | theta = model.get_params() 66 | for noise_scale in args.noise_scales: 67 | logging.info(f"Running inversion for noise scale {noise_scale}.") 68 | all_predictions = [] 69 | train_accs = [] 70 | test_accs = [] 71 | for trial in range(args.trials): 72 | # Add noise: 73 | theta_priv = theta + torch.randn_like(theta) * noise_scale 74 | model.set_params(theta_priv) 75 | # Check train and test predictions: 76 | train_acc = eval_model(model, data, regression) 77 | test_acc = eval_model(model, test_data, regression) 78 | if regression: 79 | logging.info(f"MSE Train {train_acc:.3f}, MSE Test {test_acc:.3f}.") 80 | else: 81 | logging.info(f"Acc Train {train_acc:.3f}, Acc Test {test_acc:.3f}.") 82 | predictions = invert_fn(model, noise_scale=noise_scale) 83 | acc = compute_metrics(data, predictions, target_attribute) 84 | logging.info(f"Private inversion accuracy {acc:.4f}") 85 | all_predictions.append(predictions.tolist()) 86 | train_accs.append(train_acc) 87 | test_accs.append(test_acc) 88 | 89 | results[noise_scale] = { 90 | "predictions" : all_predictions, 91 | "train_acc" : train_accs, 92 | "test_acc" : test_accs, 93 | } 94 | 95 | results["target"] = features_to_category( 96 | data["features"][:, range(*target_attribute)]).tolist() 97 | return results 98 | 99 | 100 | def main(args): 101 | regression = (args.dataset == "iwpc" or args.dataset == "synth") 102 | data = dataloading.load_dataset( 103 | name=args.dataset, split="train", normalize=False, 104 | num_classes=2, root=args.data_folder, regression=regression) 105 | test_data = dataloading.load_dataset( 106 | name=args.dataset, split="test", normalize=False, 107 | num_classes=2, root=args.data_folder, regression=regression) 
108 | 109 | if args.subsample > 0: 110 | data = dataloading.subsample(data, args.subsample) 111 | 112 | if args.weights_file is not None: 113 | all_weights = torch.load(args.weights_file) 114 | else: 115 | all_weights = [torch.ones(len(data["targets"]))] 116 | 117 | results = [] 118 | for it, weights in enumerate(all_weights): 119 | if len(all_weights) > 1: 120 | logging.info(f"Iteration {it} weights for model inversion.") 121 | results.append(run_inversion(args, data, test_data, weights)) 122 | 123 | if args.results_file is not None: 124 | with open(args.results_file, 'w') as fid: 125 | json.dump(results, fid) 126 | 127 | 128 | if __name__ == "__main__": 129 | parser = argparse.ArgumentParser(description="Model inversion.") 130 | parser.add_argument("--data_folder", default="/tmp", type=str, 131 | help="folder in which to store data (default: '/tmp')") 132 | parser.add_argument("--dataset", default="uciadult", type=str, 133 | choices=["uciadult", "iwpc", "synth"], 134 | help="dataset to use.") 135 | parser.add_argument("--model", default="least_squares", type=str, 136 | choices=["least_squares", "logistic"], 137 | help="type of model (default: least_squares)") 138 | parser.add_argument("--l2", default=0, type=float, 139 | help="l2 regularization parameter") 140 | parser.add_argument("--inverter", default="whitebox", type=str, 141 | choices=["fredrikson14", "whitebox"], 142 | help="inversion method to use (default: whitebox)") 143 | parser.add_argument("--noise_scales", metavar='N', type=float, 144 | nargs='+', default=[0], 145 | help="Gaussian noise scales for output perturbation") 146 | parser.add_argument("--trials", default=1, type=int, 147 | help="number of noise vectors to test") 148 | parser.add_argument("--subsample", default=0, type=int, 149 | help="number of training examples") 150 | parser.add_argument("--weights_file", default=None, type=str, 151 | help="(optional) file to load IRFIL weights from") 152 | parser.add_argument("--results_file", default=None, type=str, 153 | help="(optional) path to save results") 154 | args = parser.parse_args() 155 | main(args) 156 | -------------------------------------------------------------------------------- /fisher_experiment.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | # Copyright (c) Facebook, Inc. and its affiliates. 4 | # All rights reserved. 5 | 6 | # This source code is licensed under the license found in the 7 | # LICENSE file in the root directory of this source tree. 
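"""
Per-example Fisher information loss (FIL) experiment.

Trains a linear or logistic model, computes every training example's influence
Jacobian and its FIL (eta), optionally retrains after clipping high-eta
examples, and evaluates test accuracy under Gaussian output perturbation
calibrated to a set of target eta values.
"""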
8 | 9 | import argparse 10 | import json 11 | import logging 12 | import math 13 | import time 14 | import torch 15 | 16 | import models 17 | import dataloading 18 | 19 | # set up logger: 20 | logger = logging.getLogger() 21 | logger.setLevel(logging.INFO) 22 | 23 | 24 | def compute_accuracy(model, data, noise_scale=0, trials=100, regression=False): 25 | accuracies = [] 26 | X, y = data["features"], data["targets"] 27 | theta = model.get_params() 28 | for _ in range(trials): 29 | theta_priv = theta + torch.randn_like(theta) * noise_scale 30 | model.set_params(theta_priv) 31 | if regression: 32 | acc = model.loss(data).mean().item() 33 | else: 34 | predictions = model.predict(X) 35 | acc = ((predictions == y).float()).mean().item() 36 | accuracies.append(acc) 37 | model.set_params(theta) 38 | accuracies = torch.tensor(accuracies) 39 | return torch.mean(accuracies).item(), torch.std(accuracies).item() 40 | 41 | 42 | def clip_data(data, etas, clip): 43 | keep_ids = etas < clip 44 | return {"features" : data["features"][keep_ids, ...], 45 | "targets" : data["targets"][keep_ids]} 46 | 47 | 48 | def eval_comparison_stats(model, data): 49 | theta = model.get_params() 50 | theta.requires_grad = True 51 | 52 | # compute per sample losses: 53 | losses = model.loss(data) 54 | 55 | # compute the norm of the gradient of the loss at each sample 56 | # w.r.t. the model weights: 57 | def func(theta): 58 | model.theta = theta 59 | return model.loss(data) 60 | ind_grads = torch.autograd.functional.jacobian(func, theta) 61 | grad_norms = ind_grads.norm(dim=1) 62 | 63 | theta.requires_grad = False 64 | 65 | # compute per sample inner product with weights 66 | weight_dots = data["features"] @ theta 67 | 68 | return losses.tolist(), weight_dots.tolist(), grad_norms.tolist() 69 | 70 | def main(args): 71 | regression = (args.dataset == "iwpc") 72 | data = dataloading.load_dataset( 73 | name=args.dataset, split="train", normalize=not args.no_norm, 74 | num_classes=2, root=args.data_folder, regression=regression) 75 | test_data = dataloading.load_dataset( 76 | name=args.dataset, split="test", normalize=not args.no_norm, 77 | num_classes=2, root=args.data_folder, regression=regression) 78 | if args.pca_dims > 0: 79 | data, pca = dataloading.pca(data, num_dims=args.pca_dims) 80 | test_data, _ = dataloading.pca(test_data, mapping=pca) 81 | 82 | model = models.get_model(args.model) 83 | 84 | # Find the optimal parameters for the model: 85 | logging.info(f"Training model {args.model}") 86 | model.train(data, l2=args.l2) 87 | 88 | # Check predictions for sanity: 89 | accuracy, _ = compute_accuracy(model, data, regression=regression) 90 | if regression: 91 | logging.info("Training MSE of classifier {:.3f}".format(accuracy)) 92 | else: 93 | logging.info("Training accuracy of classifier {:.3f}".format(accuracy)) 94 | 95 | # Compute the Jacobian of the influence of each example on the optimal 96 | # parameters: 97 | logging.info(f"Computing influence Jacobian on training set...") 98 | start = time.time() 99 | J = model.influence_jacobian(data) 100 | time_per_sample = 1e3 * (time.time() - start) / len(data["targets"]) 101 | logging.info("Time taken per example {:.3f} (ms)".format(time_per_sample)) 102 | 103 | # Compute the Fisher information loss from the FIM (J^T J) for each example 104 | # in the training set (J^T J is the Fisher information with Gaussian output 105 | # perturbation on the parameters at a scale of 1): 106 | start = time.time() 107 | logging.info(f"Computing Fisher information loss...") 108 | etas = 
models.compute_information_loss(J) 109 | time_per_sample = 1e3 * (time.time() - start) / len(etas) 110 | logging.info( 111 | "Computed {} examples, maximum eta: {:.3f}, " 112 | "time per sample {:.3f} (ms).".format( 113 | len(etas), max(etas), time_per_sample)) 114 | 115 | # Compute some comparison points: 116 | losses, weight_dots, grad_norms = eval_comparison_stats(model, data) 117 | 118 | # Retrain the model and measure the new etas if removing most lossy 119 | # examples: 120 | if args.clip > 0: 121 | clipped_data = clip_data(data, etas, args.clip) 122 | logging.info( 123 | "Kept {}/{} samples, retrain and compute eta..".format( 124 | len(clipped_data["targets"]), len(data["targets"]))) 125 | model.train(clipped_data, l2=args.l2) 126 | J = model.influence_jacobian(clipped_data) 127 | etas = models.compute_information_loss(J) 128 | etamax = max(etas) 129 | else: 130 | etamax = max(etas) 131 | 132 | # Measure the test accuracy as a function of the noise needed to attain a 133 | # desired eta: 134 | accuracies = [] 135 | stds = [] 136 | for eta in args.etas: 137 | # Compute the Gaussian noise scale needed for eta: 138 | scale = etamax / eta 139 | # Measure test accuracy: 140 | accuracy, std = compute_accuracy( 141 | model, test_data, noise_scale=scale, trials=args.trials, 142 | regression=regression) 143 | accuracies.append(accuracy) 144 | stds.append(std) 145 | 146 | results = { 147 | "clip" : args.clip, 148 | "accuracies" : accuracies, 149 | "stds" : stds, 150 | "etas" : etas.tolist(), 151 | "train_losses" : losses, 152 | "train_dot_weights" : weight_dots, 153 | "train_grad_norms" : grad_norms, 154 | } 155 | with open(args.results_file, 'w') as fid: 156 | json.dump(results, fid) 157 | 158 | 159 | if __name__ == "__main__": 160 | parser = argparse.ArgumentParser(description="Fisher information loss.") 161 | parser.add_argument("--data_folder", default="/tmp", type=str, 162 | help="folder in which to store data (default: '/tmp')") 163 | parser.add_argument("--dataset", default="mnist", type=str, 164 | choices=["mnist", "cifar10", "cifar100", "uciadult", "iwpc"], 165 | help="dataset to use.") 166 | parser.add_argument("--model", default="least_squares", type=str, 167 | choices=["least_squares", "logistic"], 168 | help="type of model (default: least_squares)") 169 | parser.add_argument("--pca_dims", default=20, type=int, 170 | help="Number of PCA dimensions (if 0, uses raw features)") 171 | parser.add_argument("--no_norm", default=False, action="store_true", 172 | help="Don't normalize examples to lie in unit ball") 173 | parser.add_argument("--l2", default=0, type=float, 174 | help="l2 regularization parameter") 175 | parser.add_argument("--results_file", 176 | default="/tmp/private_model_results.json", type=str, 177 | help="file in which to save the results") 178 | parser.add_argument('--etas', metavar='N', type=float, 179 | nargs='+', default=[1.0], 180 | help='Fisher information loss levels (eta) to evaluate accuracy') 181 | parser.add_argument('--clip', type=float, default=0.0, 182 | help='eta removal threshold for data') 183 | parser.add_argument('--trials', type=int, default=100, 184 | help='number of trials to evaluate an output perturbed model') 185 | args = parser.parse_args() 186 | main(args) 187 | -------------------------------------------------------------------------------- /reweighted.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | # Copyright (c) Facebook, Inc. and its affiliates. 
4 | # All rights reserved. 5 | 6 | # This source code is licensed under the license found in the 7 | # LICENSE file in the root directory of this source tree. 8 | 9 | import argparse 10 | import json 11 | import logging 12 | import math 13 | import time 14 | import torch 15 | 16 | import models 17 | import dataloading 18 | 19 | # set up logger: 20 | logger = logging.getLogger() 21 | logger.setLevel(logging.INFO) 22 | 23 | 24 | def compute_accuracy(model, data, regression=False): 25 | X, y = data["features"], data["targets"] 26 | if regression: 27 | acc = model.loss(data).mean().item() 28 | else: 29 | predictions = model.predict(X) 30 | acc = ((predictions == y).float()).mean().item() 31 | return acc 32 | 33 | 34 | def get_weights(method, prev_weights, data): 35 | weights = torch.ones(len(data["targets"])) 36 | if method == "sample": 37 | weights[:] = prev_weights.data 38 | elif method == "class": 39 | n_class = data["targets"].max() + 1 40 | for c in range(n_class): 41 | mask = data["targets"] == c 42 | weights[mask] = prev_weights[mask].mean() 43 | elif method == "hybrid": 44 | n_class = data["targets"].max() + 1 45 | for c in range(n_class): 46 | mask = data["targets"] == c 47 | weights[mask] = prev_weights[mask].mean() 48 | weights *= prev_weights 49 | else: 50 | raise ValueError(f"Invalid weight method {method}.") 51 | return weights 52 | 53 | 54 | def main(args): 55 | regression = args.dataset == "iwpc" or args.dataset == "synth" 56 | data = dataloading.load_dataset( 57 | name=args.dataset, split="train", normalize=not args.no_norm, 58 | num_classes=2, root=args.data_folder, regression=regression) 59 | test_data = dataloading.load_dataset( 60 | name=args.dataset, split="test", normalize=not args.no_norm, 61 | num_classes=2, root=args.data_folder, regression=regression) 62 | if args.pca_dims > 0: 63 | data, pca = dataloading.pca(data, num_dims=args.pca_dims) 64 | test_data, _ = dataloading.pca(test_data, mapping=pca) 65 | 66 | model = models.get_model(args.model) 67 | 68 | # Find the optimal parameters for the model: 69 | logging.info(f"Training {args.model} model.") 70 | model.train(data, l2=args.l2) 71 | 72 | train_accuracy = compute_accuracy(model, data, regression=regression) 73 | test_accuracy = compute_accuracy(model, test_data, regression=regression) 74 | if regression: 75 | logging.info(f"MSE train {train_accuracy:.3f}," 76 | f" test: {test_accuracy:.3f}.") 77 | else: 78 | logging.info(f"Accuracy train {train_accuracy:.3f}," 79 | f" test: {test_accuracy:.3f}.") 80 | 81 | # Compute the Fisher information loss, eta, for each example in the 82 | # training set: 83 | logging.info("Computing unweighted etas on training set...") 84 | J = model.influence_jacobian(data) 85 | etas = models.compute_information_loss(J, target_attribute=args.attribute, 86 | constrained=args.constrained) 87 | logging.info(f"etas max: {etas.max().item():.4f}," 88 | f" mean: {etas.mean().item():.4f}, std: {etas.std().item():.4f}.") 89 | 90 | # Reweight using the fisher information loss: 91 | updated_fi = etas.reciprocal().detach() 92 | maxs = [etas.max().item()] 93 | means = [etas.mean().item()] 94 | stds = [etas.std().item()] 95 | train_accs = [train_accuracy] 96 | test_accs = [test_accuracy] 97 | all_weights = [torch.ones(len(updated_fi))] 98 | for i in range(args.iters): 99 | logging.info(f"Iter {i}: Training weighted model...") 100 | updated_fi *= (len(updated_fi) / updated_fi.sum()) 101 | # TODO does it make sense to renormalize after clamping? 
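        # updated_fi is the running product of reciprocal etas, rescaled
        # above to have mean 1; the clamp below keeps any single example
        # from being dropped entirely or from dominating the weighted fit.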
102 | updated_fi.clamp_(min=args.min_weight, max=args.max_weight) 103 | weights = get_weights(args.weight_method, updated_fi, data) 104 | model.train(data, l2=args.l2, weights=weights.detach()) 105 | 106 | # Check predictions of weighted model: 107 | train_accuracy = compute_accuracy(model, data, regression=regression) 108 | test_accuracy = compute_accuracy(model, test_data, regression=regression) 109 | if regression: 110 | logging.info(f"Weighted model MSE train {train_accuracy:.3f}," 111 | f" test: {test_accuracy:.3f}.") 112 | else: 113 | logging.info(f"Weighted model accuracy train {train_accuracy:.3f}," 114 | f" test: {test_accuracy:.3f}.") 115 | 116 | J = model.influence_jacobian(data) 117 | weighted_etas = models.compute_information_loss(J, target_attribute=args.attribute, 118 | constrained=args.constrained) 119 | updated_fi /= weighted_etas 120 | maxs.append(weighted_etas.max().item()) 121 | means.append(weighted_etas.mean().item()) 122 | stds.append(weighted_etas.std().item()) 123 | train_accs.append(train_accuracy) 124 | test_accs.append(test_accuracy) 125 | all_weights.append(weights) 126 | logging.info(f"Weighted etas max: {maxs[-1]:.4f}," 127 | f" mean: {means[-1]:.4f}," 128 | f" std: {stds[-1]:.4f}.") 129 | 130 | results = { 131 | "weights" : weights.tolist(), 132 | "etas" : etas.tolist(), 133 | "weighted_etas" : weighted_etas.tolist(), 134 | "eta_maxes" : maxs, 135 | "eta_means" : means, 136 | "eta_stds" : stds, 137 | "train_accs" : train_accs, 138 | "test_accs" : test_accs, 139 | } 140 | 141 | with open(args.results_file + ".json", 'w') as fid: 142 | json.dump(results, fid) 143 | torch.save(torch.stack(all_weights), args.results_file + ".pth") 144 | 145 | 146 | if __name__ == "__main__": 147 | parser = argparse.ArgumentParser(description="Information loss.") 148 | parser.add_argument("--data_folder", default="/tmp", type=str, 149 | help="folder in which to store data (default: '/tmp')") 150 | parser.add_argument("--dataset", default="mnist", type=str, 151 | choices=["mnist", "cifar10", "cifar100", "uciadult", "iwpc", "synth"], 152 | help="dataset to use.") 153 | parser.add_argument("--model", default="least_squares", type=str, 154 | choices=["least_squares", "logistic"], 155 | help="type of model (default: least_squares)") 156 | parser.add_argument("--weight_method", default="sample", type=str, 157 | choices=["sample", "class", "hybrid"], 158 | help="Method to weight the loss by (default: sample)") 159 | parser.add_argument("--min_weight", default=0, type=float, 160 | help="Minimum per-sample weight (default: 0)") 161 | parser.add_argument("--max_weight", default=float("inf"), type=float, 162 | help="Maximum per-sample weight (default: inf)") 163 | parser.add_argument("--attribute", default=None, nargs="+", type=int, 164 | help="Which attributes to reweight for privacy (None for full)") 165 | parser.add_argument("--iters", default=1, type=int, 166 | help="Number of iterations.") 167 | parser.add_argument("--pca_dims", default=20, type=int, 168 | help="Number of PCA dimensions (if 0, uses raw features)") 169 | parser.add_argument("--no_norm", default=False, action="store_true", 170 | help="Don't normalize examples to lie in unit ball") 171 | parser.add_argument("--constrained", default=False, action="store_true", 172 | help="Use constrained Fisher information matrix") 173 | parser.add_argument("--l2", default=0, type=float, 174 | help="l2 regularization parameter") 175 | parser.add_argument("--results_file", 176 | default="/tmp/private_model_results", type=str, 177 | 
help="file in which to save the results") 178 | args = parser.parse_args() 179 | main(args) 180 | -------------------------------------------------------------------------------- /models.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | # Copyright (c) Facebook, Inc. and its affiliates. 4 | # All rights reserved. 5 | 6 | # This source code is licensed under the license found in the 7 | # LICENSE file in the root directory of this source tree. 8 | 9 | import torch 10 | 11 | 12 | def compute_information_loss(J, target_attribute=None, constrained=False): 13 | """ 14 | Compute the Fisher information loss, eta, from the largest eigenvalues 15 | of the unscaled Fisher information matrix for each sample. 16 | 17 | Arguments: 18 | target_attribute: Used to specify a range of attributes 19 | (default None for all attributes) 20 | constrained: If True, constrain the attributes to sum to 1 by 21 | multiplying FIM with an orthogonal matrix U 22 | """ 23 | if target_attribute is not None: 24 | J = J[:, :, range(*target_attribute)] 25 | if constrained: 26 | d = J.size(2) 27 | assert d > 1, 'Cannot constrain 1-dimensional attribute vector' 28 | U = torch.vstack([torch.zeros(1, d-1), torch.ones(d-1, d-1).tril()]) 29 | U /= -U.sum(0).unsqueeze(0) 30 | U += torch.vstack([torch.eye(d-1), torch.zeros(1, d-1)]) 31 | U /= U.norm(2, 0).unsqueeze(0) 32 | J = J @ U 33 | return torch.linalg.norm(J, ord=2, dim=(1, 2)) 34 | 35 | 36 | def get_model(model_type): 37 | if type(model_type) is type: 38 | return model_type() 39 | 40 | if model_type == "least_squares": 41 | return LeastSquares() 42 | elif model_type == "logistic": 43 | return Logistic() 44 | raise ValueError(f"Unknown model type {model_type}") 45 | 46 | 47 | def weighted_least_squares_jacobian(A, theta, X, y, w): 48 | """ 49 | Jacobian of the remove and update function with respect to the update. 50 | """ 51 | r = (X @ theta - y)[:, None, None] 52 | XA = X @ A 53 | JX = -(r * A.unsqueeze(0) + XA.unsqueeze(2) * theta[None, None, :]) 54 | return w[:, None, None] * JX, w[:, None] * XA 55 | 56 | 57 | def least_squares_jacobian(A, theta, X, y): 58 | """ 59 | Jacobian of the remove and update function with respect to the update. 60 | """ 61 | r = (X @ theta - y)[:, None, None] 62 | XA = X @ A 63 | JX = -(r * A.unsqueeze(0) + XA.unsqueeze(2) * theta[None, None, :]) 64 | return JX, XA 65 | 66 | 67 | class LeastSquares: 68 | 69 | def train(self, data, l2=0, weights=None): 70 | n = len(data["targets"]) 71 | if weights is None: 72 | weights = torch.ones(n) 73 | assert len(weights) == n, "Invalid number of weights" 74 | 75 | # Save the weights for the jacobian: 76 | self.weights = weights 77 | 78 | X = data["features"] 79 | y = data["targets"].float() 80 | # [-1, 1] works much better for regression 81 | y[y == 0] = -1 82 | XTX = (weights[:, None] * X).T @ X 83 | XTXdiag = torch.diagonal(XTX) 84 | XTXdiag += (n * l2) 85 | b = X.T @ (weights * y) 86 | theta = torch.solve(b[:, None], XTX)[0].squeeze(1) 87 | # Need A to compute the Jacobian. 88 | A = torch.inverse(XTX) 89 | self.A = A 90 | self.theta = theta 91 | 92 | def get_params(self): 93 | return self.theta 94 | 95 | def set_params(self, theta): 96 | self.theta = theta 97 | 98 | def predict(self, X, regression=False): 99 | """ 100 | Given a data matrix X with examples as rows, 101 | returns a {0, 1} prediction for each x in X. 
102 | """ 103 | if regression: 104 | return X @ self.theta 105 | else: 106 | return (X @ self.theta) > 0 107 | 108 | def loss(self, data): 109 | """ 110 | Evaluate the loss for each example in a given dataset. 111 | """ 112 | X = data["features"] 113 | y = data["targets"].float() 114 | # [-1, 1] works much better for regression 115 | y[y == 0] = -1 116 | return (X @ self.theta - y)**2 / 2 117 | 118 | def influence_jacobian(self, data, weighted=True): 119 | """ 120 | Compute the Jacobian of the influence of each 121 | example on the optimal parameters. The resulting 122 | Jacobian will have shape N x d x (d+1) where N is 123 | the number of data points. 124 | """ 125 | X = data["features"] 126 | y = data["targets"].float() 127 | y[y == 0] = -1 128 | if weighted: 129 | JX, Jy = weighted_least_squares_jacobian( 130 | self.A, self.theta, X, y, self.weights) 131 | else: 132 | JX, Jy = least_squares_jacobian( 133 | self.A, self.theta, X, y) 134 | return torch.cat([JX, Jy.unsqueeze(2)], dim=2) 135 | 136 | 137 | class Logistic: 138 | 139 | def train(self, data, l2=0, init=None, weights=None): 140 | n = len(data["targets"]) 141 | if weights is None: 142 | weights = torch.ones(n) 143 | assert len(weights) == n, "Invalid number of weights" 144 | 145 | # Save for the jacobian: 146 | self.weights = weights 147 | self.l2 = n * l2 148 | 149 | X = data["features"] 150 | y = data["targets"].float() 151 | theta = torch.randn(X.shape[1], requires_grad=True) 152 | if init is not None: 153 | theta.data[:] = init[:] 154 | 155 | crit = torch.nn.BCEWithLogitsLoss(reduction="none") 156 | optimizer = torch.optim.LBFGS([theta], line_search_fn="strong_wolfe") 157 | def closure(): 158 | optimizer.zero_grad() 159 | loss = (crit(X @ theta, y) * weights).sum() 160 | loss += (self.l2 / 2.0) * (theta**2).sum() 161 | loss.backward() 162 | return loss 163 | for _ in range(100): 164 | loss = optimizer.step(closure) 165 | self.theta = theta 166 | 167 | def get_params(self): 168 | return self.theta 169 | 170 | def set_params(self, theta): 171 | self.theta = theta 172 | 173 | def predict(self, X): 174 | """ 175 | Given a data matrix X with examples as rows, 176 | returns a {0, 1} prediction for each x in X. 177 | """ 178 | return (X @ self.theta) > 0 179 | 180 | def loss(self, data): 181 | X = data["features"] 182 | y = data["targets"].float() 183 | return torch.nn.BCEWithLogitsLoss(reduction="none")(X @ self.theta, y) 184 | 185 | def influence_jacobian(self, data): 186 | """ 187 | Compute the Jacobian of the influence of each 188 | example on the optimal parameters. The resulting 189 | Jacobian will have shape N x d x (d+1) where N is 190 | the number of data points. 191 | """ 192 | X = data["features"] 193 | y = data["targets"].float() 194 | 195 | # Compute the Hessian at theta for all X: 196 | s = (X @ self.theta).sigmoid().unsqueeze(1) 197 | H = (self.weights.unsqueeze(1) * s * (1-s) * X).T @ X 198 | Hdiag = torch.diagonal(H) 199 | Hdiag += self.l2 200 | Hinv = H.inverse() 201 | 202 | # Compute the Jacobian of the gradient w.r.t. 
theta at each (x, y) pair 203 | XHinv = X @ Hinv 204 | JX = -(s * (1-s) * XHinv).unsqueeze(2) * self.theta[None, None, :] 205 | JX -= (s - y.unsqueeze(1)).unsqueeze(2) * Hinv.unsqueeze(0) 206 | JX = self.weights[:, None, None] * JX 207 | JY = (self.weights[:, None] * XHinv).unsqueeze(2) 208 | return torch.cat([JX, JY], dim=2) 209 | 210 | class MultinomialLogistic: 211 | 212 | def train(self, data): 213 | X = data["features"] 214 | y = data["targets"] 215 | c = torch.max(y) + 1 216 | 217 | theta = torch.randn((X.shape[1], c), requires_grad=True) 218 | optimizer = torch.optim.LBFGS([theta], line_search_fn="strong_wolfe") 219 | crit = torch.nn.CrossEntropyLoss(reduction="mean") 220 | def closure(): 221 | optimizer.zero_grad() 222 | loss = crit(X @ theta, y) 223 | loss.backward() 224 | return loss 225 | for _ in range(100): 226 | loss = optimizer.step(closure) 227 | 228 | self.theta = theta 229 | 230 | def predict(self, X): 231 | """ 232 | Given a data matrix X with examples as rows, 233 | returns a {0, 1} prediction for each x in X. 234 | """ 235 | return torch.argmax(X @ self.theta, axis=1) 236 | -------------------------------------------------------------------------------- /test_jacobians.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | # Copyright (c) Facebook, Inc. and its affiliates. 4 | # All rights reserved. 5 | 6 | # This source code is licensed under the license found in the 7 | # LICENSE file in the root directory of this source tree. 8 | 9 | import math 10 | import torch 11 | import unittest 12 | 13 | from models import least_squares_jacobian 14 | from models import weighted_least_squares_jacobian 15 | 16 | 17 | def least_squares_jacobian_single(A, theta, x, y): 18 | """ 19 | Jacobian of the remove and update function with respect to the update. 20 | """ 21 | r = (theta.dot(x) - y) 22 | Ax = A @ x 23 | return -(r * A + Ax.ger(theta)), Ax 24 | 25 | 26 | def weighted_least_squares_jacobian_single(A, theta, x, y, w): 27 | """ 28 | Jacobian of the remove and update function with respect to the update for 29 | weighted least squares. 30 | """ 31 | sqw = math.sqrt(w) 32 | x = sqw * x 33 | y = sqw * y 34 | r = (theta.dot(x) - y) 35 | Ax = A @ x 36 | return -sqw * (r * A + Ax.ger(theta)), sqw * Ax 37 | 38 | 39 | def least_squares_update(A, theta, x, y, w=1.0): 40 | """ 41 | Updates a given solution to the least squares problem to incorporate the 42 | new data point (x, y). 43 | A is `(X^T X)^{-1}` and theta are the optimal parameters (`A X^T y`) 44 | """ 45 | x = math.sqrt(w) * x 46 | y = math.sqrt(w) * y 47 | Ax = A @ x 48 | c = 1 / (1 + x.T @ Ax) 49 | return theta - c * (x.T @ theta - y) * Ax 50 | 51 | 52 | def least_squares_remove_and_update(A, theta, X, Y, xr, yr, xn, yn): 53 | Au = rank_two_update(A, xr, xn, steps=[-1, 1]) 54 | return Au @ (X.T @ Y - xr * yr + xn * yn) 55 | 56 | 57 | def least_squares_jacobian_update(A, theta, x, y): 58 | """ 59 | Jacobian of the update function with respect to the update. 60 | """ 61 | Ax = A @ x 62 | c = 1 / (1 + x.T @ Ax) 63 | r = x.dot(theta) - y 64 | t1 = ((2 * r * c**2) * Ax).ger(Ax) 65 | t2 = -r * c * A 66 | t3 = -c * Ax.ger(theta) 67 | Jx = t1 + t2 + t3 68 | Jy = c * Ax 69 | return Jx, Jy 70 | 71 | 72 | def rank_one_update(A, x, step=1): 73 | """ 74 | Compute the updated A when applying the rank-one update of x on A^{-1}. 75 | E.g. compute `(A^{-1} + step * x x^T)^{-1}` using the Sherman-Morrison formula. 
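    Note: with c = 1 / (step + x^T A x) this simplification of Sherman-Morrison
    is exact only when step is +1 or -1 (so that 1/step == step), which is how
    the tests below use it.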
76 | """ 77 | Ax = A @ x 78 | c = 1 / (step + x.dot(Ax)) 79 | return A - c * Ax.ger(Ax) 80 | 81 | 82 | def rank_two_update(A, x1, x2, steps=[1, 1]): 83 | steps = torch.tensor([1 / s for s in steps]) 84 | U = torch.stack([x1, x2], dim=1) 85 | AU = A @ U 86 | D = (U.T @ AU + torch.diag(steps)).inverse() 87 | return A - AU @ D @ AU.T 88 | 89 | 90 | class TestJacobian(unittest.TestCase): 91 | 92 | def test_rank_one_update(self): 93 | d = 10 94 | A = torch.rand(d, d) 95 | x = torch.rand(d) 96 | Aminus = rank_one_update(A, x, step=-1) 97 | Aplus = rank_one_update(Aminus, x, step=1) 98 | self.assertTrue(torch.allclose(A, Aplus, rtol=1e-4, atol=1e-4)) 99 | 100 | 101 | def test_rank_two_update(self): 102 | d = 10 103 | A = torch.rand(d, d) 104 | A = (A + A.T) + 5 * torch.eye(10) 105 | x1 = torch.rand(d) 106 | x2 = torch.rand(d) 107 | Atwo = rank_two_update(A, x1, x2, steps=[1, 1]) 108 | Aone_one = rank_one_update(rank_one_update(A, x1), x2) 109 | self.assertTrue(torch.allclose(Aone_one, Atwo, rtol=1e-4, atol=1e-4)) 110 | 111 | Aminus = rank_two_update(A, x1, x2, steps=[1, 1]) 112 | Aplus = rank_two_update(Aminus, x1, x2, steps=[-1, -1]) 113 | # TODO this seems to be too unstable.. 114 | self.assertTrue(torch.allclose(A, Aplus, rtol=1e-2, atol=1e-2)) 115 | 116 | 117 | def test_jacobians(self): 118 | # Make a random sample: 119 | d = 10 120 | n = 20 121 | X = torch.randn(n, d) 122 | Y = torch.randn(n) 123 | 124 | # Find least squares solution for the full dataset: 125 | A = torch.inverse(X.T @ X) 126 | theta = A @ (X.T @ Y) 127 | 128 | xi = X[0, :] 129 | yi = Y[0] 130 | 131 | # Method 1:, Compute Jacobian w.r.t. x_i, y_i by 132 | # 1. Removing x_i, y_i from A and theta 133 | # 2. Expressing theta* as a function of x, y via a rank-one update 134 | # 3. Computing the Jacobian of 2 at x_i, y_i 135 | A_minus = rank_one_update(A, xi, step=-1) 136 | theta_minus = A_minus @ (X.T @ Y - xi * yi) 137 | def f_x_y(x, y): 138 | return least_squares_update(A_minus, theta_minus, x, y) 139 | 140 | # Using torch autograd: 141 | Jx_auto, Jy_auto = torch.autograd.functional.jacobian(f_x_y, (xi, yi)) 142 | 143 | # Using closed form: 144 | Jx, Jy = least_squares_jacobian_update(A_minus, theta_minus, xi, yi) 145 | 146 | self.assertTrue(torch.allclose(Jy.squeeze(), Jy_auto.squeeze(), rtol=1e-4, atol=1e-4)) 147 | self.assertTrue(torch.allclose(Jx, Jx_auto, rtol=1e-4, atol=1e-4)) 148 | 149 | # Method 2: Compute Jacobian w.r.t. x_i, y_i by 150 | # 1. Expressing theta* as a function of removing x_i, y_i and adding in 151 | # x, y via a rank-two update 152 | # 2. 
Computing the Jacobian of 2 at x, y 153 | def f_x_y(x, y): 154 | return least_squares_remove_and_update(A, theta, X, Y, xi, yi, x, y) 155 | Jx_auto2, Jy_auto2 = torch.autograd.functional.jacobian(f_x_y, (xi, yi)) 156 | self.assertTrue(torch.allclose(Jy_auto.squeeze(), Jy_auto2.squeeze(), rtol=1e-4, atol=1e-4)) 157 | self.assertTrue(torch.allclose(Jx_auto, Jx_auto2, rtol=1e-4, atol=1e-4)) 158 | 159 | # Using closed form: 160 | Jx, Jy = least_squares_jacobian_single(A, theta, xi, yi) 161 | self.assertTrue(torch.allclose(Jy, Jy_auto, rtol=1e-4, atol=1e-4)) 162 | self.assertTrue(torch.allclose(Jx, Jx_auto, rtol=1e-4, atol=1e-4)) 163 | 164 | def test_weighted_jacobian(self): 165 | # Make a random sample: 166 | d = 10 167 | n = 20 168 | W = torch.diag(torch.ones(n)) 169 | X = torch.randn(n, d) 170 | Y = torch.randn(n) 171 | 172 | for w in [0.25, 0.5, 1.0, 2.0, 3.0, 4.0]: 173 | 174 | W[0, 0] = w 175 | 176 | # Find least squares solution for the full dataset: 177 | A = torch.inverse(X.T @ W @ X) 178 | theta = A @ (X.T @ W @ Y) 179 | 180 | xi = X[0, :] 181 | yi = Y[0] 182 | 183 | A_minus = rank_one_update(A, math.sqrt(w) * xi, step=-1) 184 | theta_minus = A_minus @ (X.T @ W @ Y - w * xi * yi) 185 | def f_x_y(x, y): 186 | return least_squares_update(A_minus, theta_minus, x, y, w) 187 | 188 | # Using torch autograd: 189 | Jx_auto, Jy_auto = torch.autograd.functional.jacobian(f_x_y, (xi, yi)) 190 | 191 | # Using closed form: 192 | Jx, Jy = weighted_least_squares_jacobian_single(A, theta, xi, yi, w=w) 193 | self.assertTrue(torch.allclose(Jy, Jy_auto, rtol=1e-4, atol=1e-4)) 194 | self.assertTrue(torch.allclose(Jx, Jx_auto, rtol=1e-4, atol=1e-4)) 195 | 196 | 197 | def test_batched_jacobian(self): 198 | d = 10 199 | n = 20 200 | X = torch.randn(n, d) 201 | Y = torch.randn(n) 202 | 203 | # Find least squares solution for the full dataset: 204 | A = torch.inverse(X.T @ X) 205 | theta = A @ (X.T @ Y) 206 | 207 | batchJx, batchJy = least_squares_jacobian(A, theta, X, Y) 208 | 209 | singleJs = [least_squares_jacobian_single(A, theta, x, y) for x, y in zip(X, Y)] 210 | singleJx, singleJy = zip(*singleJs) 211 | self.assertTrue(torch.allclose(batchJx, torch.stack(singleJx), rtol=1e-4, atol=1e-4)) 212 | self.assertTrue(torch.allclose(batchJy, torch.stack(singleJy), rtol=1e-4, atol=1e-4)) 213 | 214 | # With weights: 215 | W = torch.rand(n) * 5 216 | A = (W.unsqueeze(1) * X).T @ X 217 | theta = A @ (X.T @ (W * Y)) 218 | 219 | batchJx, batchJy = weighted_least_squares_jacobian(A, theta, X, Y, W) 220 | 221 | singleJs = [weighted_least_squares_jacobian_single(A, theta, x, y, w) for x, y, w in zip(X, Y, W)] 222 | singleJx, singleJy = zip(*singleJs) 223 | self.assertTrue(torch.allclose(batchJx, torch.stack(singleJx), rtol=1e-3, atol=1e-4)) 224 | self.assertTrue(torch.allclose(batchJy, torch.stack(singleJy), rtol=1e-4, atol=1e-4)) 225 | 226 | 227 | def test_logistic_jacobian(self): 228 | def grad(w, x, y, l2): 229 | s = torch.sigmoid(w @ x) 230 | return x @ (s - y) + x.shape[1] * l2 * w 231 | 232 | def Hinv(w, x, l2): 233 | s = torch.sigmoid(w @ x) 234 | H = s * (1 - s) * x @ x.T + x.shape[1] * l2 * torch.eye(x.shape[0]) 235 | return torch.inverse(H) 236 | 237 | def solve(x, y, l2, its=30): 238 | w = 1e-1*torch.randn(x.shape[0], dtype=torch.double) 239 | for it in range(its): 240 | w = w - Hinv(w, x, l2=l2) @ grad(w, x, y, l2=l2) 241 | assert grad(w, x, y, l2).norm().item() < 1e-12, "Did not converge." 
242 | return w 243 | 244 | def compute_Jf_exact(w, x, y, l2, i): 245 | xi = x[:, i] 246 | yi = y[i] 247 | si = torch.sigmoid(w.dot(xi)) 248 | nabla_xyw = si * (si - 1) * xi.ger(w) + (yi - si) * torch.eye(xi.shape[0]) 249 | Hi = Hinv(w, x, l2) 250 | return Hi @ nabla_xyw 251 | 252 | def compute_Jf_fd(x, y, l2, i, epsilon=1e-6): 253 | Jf_fd = [] 254 | for j in range(x.shape[0]): 255 | x[j, i] += epsilon 256 | w_up = solve(x, y, l2) 257 | x[j, i] -= 2*epsilon 258 | w_down = solve(x, y, l2) 259 | Jf_fd.append((w_up - w_down) / (2* epsilon)) 260 | x[j, i] += epsilon 261 | return torch.stack(Jf_fd, dim=1) 262 | 263 | torch.random.manual_seed(123) 264 | d = 4 265 | n = 40 266 | x = torch.randn((d, n), dtype=torch.double) 267 | y = torch.randint(high=1, size=(n,)) 268 | l2 = 1e-5 269 | 270 | # Compute Jacobian at x_0 with finite differences 271 | Jf_fd = compute_Jf_fd(x, y, l2, 0) 272 | 273 | # Compute Jacobian at x_0 analytically 274 | w_star = solve(x, y, l2) 275 | Jf = compute_Jf_exact(w_star, x, y, l2, 0) 276 | 277 | self.assertTrue(torch.allclose(Jf, Jf_fd, rtol=1e-7, atol=1e-7)) 278 | 279 | 280 | # run all the tests: 281 | if __name__ == '__main__': 282 | unittest.main() 283 | -------------------------------------------------------------------------------- /model_inversion.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | # Copyright (c) Facebook, Inc. and its affiliates. 4 | # All rights reserved. 5 | 6 | # This source code is licensed under the license found in the 7 | # LICENSE file in the root directory of this source tree. 8 | 9 | import argparse 10 | import json 11 | import logging 12 | import math 13 | import time 14 | import torch 15 | 16 | import models 17 | import dataloading 18 | 19 | # set up logger: 20 | logger = logging.getLogger() 21 | logger.setLevel(logging.INFO) 22 | 23 | 24 | INVERTERS = ["baseline", "ideal", "fredrikson14", "whitebox", "all"] 25 | 26 | 27 | def features_to_category(values, one_hot=False): 28 | if not one_hot: 29 | last_value = (values==0).all(axis=1, keepdim=True).to(values.dtype) 30 | values = torch.cat([values, last_value], dim=1) 31 | return torch.argmax(values, dim=1) 32 | 33 | 34 | def compute_log_marginal(train_data, target_attribute, one_hot=False, weights=None): 35 | if weights is None: 36 | weights = torch.ones(train_data["features"].size(0)) 37 | target_values = train_data["features"][:, range(*target_attribute)] 38 | if not one_hot: 39 | last_value = (target_values==0).all( 40 | axis=1, keepdim=True).to(target_values.dtype) 41 | target_values = torch.cat([target_values, last_value], dim=1) 42 | return (weights[:, None] * target_values).sum(axis=0).log() 43 | 44 | 45 | def baseline_inverter( 46 | train_data, target_attribute, weights=None, one_hot=False, **kwargs): 47 | """ 48 | The baseline inverter simply measures the prior for the target attribute 49 | (using the training data) and predicts the mode of the prior for every 50 | target example. 
51 | """ 52 | # NB A stronger baseline might have access to the joint prior and/or the 53 | # target label 54 | log_marginal = compute_log_marginal( 55 | train_data, target_attribute, weights=weights, one_hot=one_hot) 56 | prediction = torch.argmax(log_marginal) 57 | return torch.full( 58 | size=(train_data["features"].shape[0],), 59 | fill_value=prediction, 60 | dtype=torch.long) 61 | 62 | 63 | def ideal_inverter(train_data, target_attribute, **kwargs): 64 | """ 65 | The ideal inverter uses the training data to learn a model to predict the 66 | target attribute given all other features including the label. 67 | """ 68 | def swap_feature_target(data): 69 | features = data["features"] 70 | new_features = torch.cat([ 71 | features[:, :target_attribute[0]], 72 | features[:, target_attribute[1]:], 73 | data["targets"][:, None].float() 74 | ], axis=1) 75 | new_targets = features_to_category( 76 | features[:, range(*target_attribute)]) 77 | return { 78 | "features" : new_features, 79 | "targets" : new_targets, 80 | } 81 | tmp_train = swap_feature_target(train_data) 82 | model = models.MultinomialLogistic() 83 | model.train(tmp_train) 84 | return model.predict(tmp_train["features"]) 85 | 86 | 87 | def fredrikson14_inverter( 88 | train_data, target_attribute, model, weights=None, one_hot=False, **kwargs): 89 | """ 90 | Implements the model inversion attack of: 91 | Fredrikson, 2014, Privacy in Pharmacogenetics: An End-to-End Case Study 92 | of Personalized Warfarin Dosing 93 | """ 94 | if weights is None: 95 | weights = torch.ones(train_data["features"].size(0)) 96 | 97 | log_marginal = compute_log_marginal( 98 | train_data, target_attribute, weights=weights, one_hot=one_hot) 99 | 100 | if type(model) == models.LeastSquares: 101 | n, d = train_data["features"].shape 102 | std_var = (weights * model.loss(train_data)).sum().true_divide(n - d) 103 | score_fn = lambda data : -0.5 * (weights * model.loss(data)) / std_var 104 | elif type(model) == models.Logistic: 105 | preds = model.predict(train_data["features"]) 106 | y = train_data["targets"] 107 | matched = preds == y 108 | confusions = torch.tensor([ 109 | [matched[y == 0].sum(), (~matched)[y == 0].sum()], 110 | [(~matched)[y == 1].sum(), matched[y == 1].sum()] 111 | ]) 112 | pi = confusions.true_divide(confusions.sum(axis=0, keepdim=True)) 113 | def score_fn(data): 114 | preds = model.predict(data["features"]) 115 | y = data["targets"] 116 | return pi[y, preds.long()].log() 117 | else: 118 | raise ValueError("Unknown model type.") 119 | 120 | # For each possible value of the target attribute compute score for the 121 | # attribute which should be proportional to log pi(y, y') + log p(x), where 122 | # pi(y, y') is a model dependent performance measure. 123 | tgt_features = train_data["features"].clone().detach() 124 | tgt_features[:, range(*target_attribute)] = 0. 125 | 126 | scores = [] 127 | for c in range(*target_attribute): 128 | tgt_features[:, c] = 1. 129 | score = score_fn( 130 | {"features": tgt_features, "targets": train_data["targets"]}) 131 | score += log_marginal[c - target_attribute[0]] 132 | scores.append(score) 133 | tgt_features[:, c] = 0. 
134 | # Try all 0s 135 | if not one_hot: 136 | score = score_fn( 137 | {"features": tgt_features, "targets": train_data["targets"]}) 138 | score += log_marginal[-1] 139 | scores.append(score) 140 | 141 | # Make the prediction: 142 | return torch.argmax(torch.stack(scores, axis=1), axis=1) 143 | 144 | 145 | class WhiteboxInverter: 146 | 147 | def __init__(self, train_data, target_attribute, model_type, weights, l2, one_hot=False): 148 | # Compute marginal counts (proportional to log marginal): 149 | self.log_marginal = compute_log_marginal( 150 | train_data, target_attribute, weights=weights, one_hot=one_hot) 151 | self.one_hot = one_hot 152 | # Store learned theta for each possible attribute 153 | self.thetas = torch.zeros( 154 | train_data["features"].size(0), 155 | target_attribute[1] - target_attribute[0] + (not one_hot), 156 | train_data["features"].size(1)) 157 | for i in range(train_data["features"].size(0)): 158 | tmp = train_data["features"][i, range(*target_attribute)].clone() 159 | train_data["features"][i, range(*target_attribute)] = 0. 160 | for c in range(*target_attribute): 161 | j = c - target_attribute[0] 162 | train_data["features"][i, c] = 1. 163 | model = models.get_model(model_type) 164 | model.train(train_data, l2=l2, weights=weights) 165 | self.thetas[i, j] = model.theta 166 | train_data["features"][i, c] = 0. 167 | if not one_hot: 168 | # Try all 0s (last attribute) 169 | model = models.get_model(model_type) 170 | model.train(train_data, l2=l2, weights=weights) 171 | self.thetas[i, -1] = model.theta 172 | train_data["features"][i, range(*target_attribute)] = tmp 173 | 174 | # prior_lam controls strength of prior 175 | def predict(self, model, gamma=0, prior_lam=0): 176 | scores = -(self.thetas - model.theta.view(1, 1, -1)).pow(2).sum(2) 177 | if gamma > 0 and prior_lam > 0: 178 | scores = 0.5 * scores / gamma**2 + prior_lam * self.log_marginal.unsqueeze(0) 179 | return torch.argmax(scores, axis=1) 180 | 181 | 182 | def whitebox_inverter(train_data, target_attribute, model, weights=None, l2=0.0, **kwargs): 183 | """ 184 | Whitebox inverter has access to the trained model and all of the trained 185 | data less the one example's target attribute value. Predictions are made 186 | from solving: 187 | argmax_{attribute} p(model | available data, attribute) p(attribute) 188 | """ 189 | inverter = WhiteboxInverter(train_data, target_attribute, type(model), weights, l2) 190 | return inverter.predict(model) 191 | 192 | 193 | def compute_metrics(data, predictions, attribute): 194 | """ 195 | Computes the accuracy for the `predictions` of 196 | `attribute` on `data`. 197 | """ 198 | reference = features_to_category(data["features"][:, range(*attribute)]) 199 | return (predictions == reference).float().mean().item() 200 | 201 | 202 | def run_inversion(args, data, weights): 203 | regression = (args.dataset == "iwpc" or args.dataset == "synth") 204 | 205 | # Train model: 206 | model = models.get_model(args.model) 207 | logging.info(f"Training model {args.model}") 208 | model.train(data, l2=args.l2, weights=weights) 209 | # Check predictions for sanity: 210 | predictions = model.predict(data["features"], regression=regression) 211 | if regression: 212 | acc = (predictions - data["targets"]).pow(2).mean() 213 | logging.info(f"Training MSE of regressor {acc.item():.3f}") 214 | else: 215 | acc = ((predictions == data["targets"]).float()).mean() 216 | logging.info(f"Training accuracy of classifier {acc.item():.3f}") 217 | 218 | # The target attribute can be specified as a range, e.g. 
`(4, 8)` means the 219 | # 4th through the 7th feature are the values of the encoded target attribute. 220 | if args.dataset == "uciadult": 221 | target_attribute = (24, 25) # [not married, married] 222 | elif args.dataset == "iwpc": 223 | #target_attribute = (2, 7) # CYP2C9 genotype 224 | target_attribute = (11, 13) # VKORC1 genotype 225 | else: 226 | raise NotImplementedError("Dataset not yet implemented.") 227 | 228 | if args.inverter == "all": 229 | inverters = INVERTERS[:-1] 230 | else: 231 | inverters = [args.inverter] 232 | 233 | target = features_to_category(data["features"][:, range(*target_attribute)]) 234 | results = { "target" : target.tolist() } 235 | 236 | for inverter in inverters: 237 | invert_fn = globals()[f"{inverter}_inverter"] 238 | predictions = invert_fn( 239 | data, target_attribute, model=model, weights=weights, l2=args.l2) 240 | acc = compute_metrics(data, predictions, target_attribute) 241 | logging.info(f"{inverter} inverter Accuracy {acc:.4f}") 242 | results[inverter] = predictions.tolist() 243 | 244 | return results 245 | 246 | 247 | def main(args): 248 | regression = (args.dataset == "iwpc" or args.dataset == "synth") 249 | data = dataloading.load_dataset( 250 | name=args.dataset, split="train", normalize=False, 251 | num_classes=2, root=args.data_folder, regression=regression) 252 | if args.subsample > 0: 253 | data = dataloading.subsample(data, args.subsample) 254 | 255 | if args.weights_file is not None: 256 | all_weights = torch.load(args.weights_file) 257 | else: 258 | all_weights = [torch.ones(len(data["targets"]))] 259 | 260 | results = [] 261 | for it, weights in enumerate(all_weights): 262 | if len(all_weights) > 1: 263 | logging.info(f"Iteration {it} weights for model inversion.") 264 | results.append(run_inversion(args, data, weights)) 265 | 266 | if args.results_file is not None: 267 | with open(args.results_file, 'w') as fid: 268 | json.dump(results, fid) 269 | 270 | 271 | if __name__ == "__main__": 272 | parser = argparse.ArgumentParser(description="Model inversion.") 273 | parser.add_argument("--data_folder", default="/tmp", type=str, 274 | help="folder in which to store data (default: '/tmp')") 275 | parser.add_argument("--dataset", default="uciadult", type=str, 276 | choices=["uciadult", "iwpc", "synth"], 277 | help="dataset to use.") 278 | parser.add_argument("--model", default="least_squares", type=str, 279 | choices=["least_squares", "logistic"], 280 | help="type of model (default: least_squares)") 281 | parser.add_argument("--inverter", default="fredrikson14", type=str, 282 | choices=INVERTERS, 283 | help="inversion method to use (default: fredrikson14)") 284 | parser.add_argument("--l2", default=0, type=float, 285 | help="l2 regularization parameter") 286 | parser.add_argument("--subsample", default=0, type=int, 287 | help="number of training examples") 288 | parser.add_argument("--weights_file", default=None, type=str, 289 | help="(optional) file to load IRFIL weights from") 290 | parser.add_argument("--results_file", default=None, type=str, 291 | help="(optional) path to save results") 292 | args = parser.parse_args() 293 | main(args) 294 | -------------------------------------------------------------------------------- /scripts/make_figures.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | # Copyright (c) Facebook, Inc. and its affiliates. 4 | # All rights reserved. 
5 | 6 | # This source code is licensed under the license found in the 7 | # LICENSE file in the root directory of this source tree. 8 | 9 | import argparse 10 | import json 11 | import math 12 | import matplotlib.pyplot as plt 13 | import numpy as np 14 | import os 15 | import scipy.stats as scs 16 | import seaborn as sns 17 | 18 | import sys 19 | sys.path.append("..") 20 | 21 | import dataloading 22 | import plotting 23 | 24 | 25 | COLOR = sns.cubehelix_palette(1, start=2, rot=0, dark=0, light=.5)[0] 26 | 27 | 28 | def load_results(results_path, file_name): 29 | with open(os.path.join(results_path, file_name), 'r') as fid: 30 | return json.load(fid) 31 | 32 | 33 | def eta_overlap(results_path, prefix): 34 | etas_li = np.array(load_results( 35 | results_path, f"{prefix}_linear_pca20.json")["etas"]) 36 | etas_lo = np.array(load_results( 37 | results_path, f"{prefix}_logistic_pca20.json")["etas"]) 38 | idx_li = etas_li.argsort() 39 | idx_lo = etas_lo.argsort() 40 | n_overlap = len(set(idx_li[-500:]).intersection(idx_lo[-500:])) 41 | print("="*20) 42 | print(prefix) 43 | print(f"Num overlap {n_overlap}/100") 44 | 45 | 46 | def eta_histogram(results_path, save_path, prefix, train, labels): 47 | """ 48 | Histogram of samples by individual sample eta. 49 | """ 50 | plt.clf() 51 | 52 | n_class = len(labels) 53 | colors = sns.cubehelix_palette(n_class, start=2, rot=0, dark=0, light=.5) 54 | 55 | plt.figure(figsize=(6, 6)) 56 | etas = np.array(load_results( 57 | results_path, f"{prefix}_pca20.json")["etas"]) 58 | targets = train["targets"].numpy() 59 | etas = [etas[targets == c] for c in range(n_class)] 60 | 61 | for c in range(n_class): 62 | plt.hist( 63 | etas[c], bins=80, color=colors[c], alpha=0.5, label=f"{labels[c]}") 64 | plt.axvline( 65 | etas[c].mean(), color=colors[c], linestyle='dashed', linewidth=2) 66 | plt.xlabel("Per sample $\eta$", fontsize=30) 67 | plt.xticks(fontsize=26) 68 | plt.ylabel("Number of samples", fontsize=30) 69 | plt.yticks(fontsize=26) 70 | plt.legend() 71 | plotting.savefig(os.path.join(save_path, f"{prefix}_eta_hist")) 72 | 73 | 74 | def view_images(train, results_path, save_path, prefix): 75 | """ 76 | View the most and least leaked images. 77 | """ 78 | # sort etas by index 79 | etas = load_results( 80 | results_path, f"{prefix}_pca20.json")["etas"] 81 | sorted_etas = sorted( 82 | zip(etas, range(len(etas))), key=lambda x: x[0], reverse=True) 83 | 84 | ims = train["features"].squeeze() 85 | n_ims = 8 86 | f, axarr = plt.subplots(2, n_ims, figsize=(7, 2.2)) 87 | f.subplots_adjust(wspace=0.05) 88 | for priv in [False, True]: 89 | for i in range(n_ims): 90 | ax = axarr[int(priv), i] 91 | idx = -(i + 1) if priv else i 92 | im = sorted_etas[idx][1] 93 | image = ims[im, ...] 
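            # color images come back from dataloading channels-first (C, H, W);
            # move the channels last so imshow renders them correctly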
94 | if image.ndim == 3: 95 | image = image.permute(1, 2, 0) 96 | ax.imshow(image, cmap='gray') 97 | ax.axis("off") 98 | title = "{:.1e}".format(sorted_etas[idx][0]) 99 | ax.set_title(title, fontsize=14) 100 | ax.get_xaxis().set_visible(False) 101 | ax.get_yaxis().set_visible(False) 102 | plotting.savefig(os.path.join(save_path, f"{prefix}_images")) 103 | plt.close(f) 104 | 105 | 106 | def correlations(results_path, save_path, prefix): 107 | results = load_results(results_path, f"{prefix}_pca20.json") 108 | etas = np.array(results["etas"]) 109 | n_samples = 2000 110 | np.random.seed(n_samples) 111 | samples = np.random.permutation(len(etas))[:n_samples] 112 | losses = np.array(results["train_losses"]) 113 | grad_norms = np.array(results["train_grad_norms"]) 114 | alternatives = [ 115 | ("loss", "(a) Loss $\ell({\\bf w^*}^\\top {\\bf x}, y)$", losses), 116 | ("gradnorm", "(b) Gradient norm $\|\\nabla_{\\bf w^*} \ell\|_2$", grad_norms)] 117 | f, axarr = plt.subplots(1, 2, figsize=(10, 4), sharey=True) 118 | f.subplots_adjust(wspace=0.1) 119 | for e, (method, xlabel, values) in enumerate(alternatives): 120 | ax = axarr[e] 121 | ax.scatter(values[samples], etas[samples], s=2.5, color=COLOR) 122 | ax.set_xlabel(xlabel) 123 | axarr[0].set_ylabel("FIL $\eta$") 124 | 125 | plotting.savefig(os.path.join(save_path, f"{prefix}_scatter_alternatives_eta")) 126 | plt.clf() 127 | 128 | 129 | def iterated_reweighted_etas(results_path, save_path, prefix): 130 | """ 131 | Line plot of variance of Fisher information loss eta with iterated 132 | reweighting. 133 | """ 134 | linear = load_results(results_path, f"{prefix}_linear_reweighted.json") 135 | logistic = load_results(results_path, f"{prefix}_logistic_reweighted.json") 136 | stds = np.array([linear["eta_stds"], logistic["eta_stds"]]) 137 | iterations = np.arange(stds.shape[1]) 138 | plotting.line_plot( 139 | stds, iterations, legend=["Linear", "Logistic"], 140 | xlabel="Iteration", ylabel="Standard deviation of $\eta$", 141 | ylog=True, 142 | size=(5, 5), 143 | filename=os.path.join(save_path, f"{prefix}_eta_std_vs_iterations")) 144 | for results, model in [(linear, "linear"), (logistic, "logistic")]: 145 | em = results["eta_means"] 146 | std = results["eta_stds"] 147 | acc = results["test_accs"] 148 | print("="*20) 149 | print(f"{prefix} {model}") 150 | print(f"Pre IRFP eta {em[0]:.3f}, std {std[0]:.3f}, test accuracy {acc[0]:.3f}") 151 | print(f"Post IRFP eta {em[-1]:.3f}, std {std[-1]:.3f}, test accuracy {acc[-1]:.3f}") 152 | 153 | 154 | def private_mse_and_fil(results_path, save_path): 155 | L2s = ['1e-5', '1e-3', '1e-1', '1'] 156 | noise_scales = [ 157 | '1e-05', '2e-05', '5e-05', '0.0001', '0.0002', 158 | '0.0005', '0.001', '0.002', '0.005', '0.01', 159 | '0.02', '0.05', '0.1', '0.2', '0.5', '1.0', 160 | ] 161 | 162 | fils = [] 163 | mean_etas = [] 164 | mses = [] 165 | for l2 in L2s: 166 | etas = load_results( 167 | results_path, f"iwpc_least_squares_fil_l2_{l2}.json")["etas"] 168 | fils.append(etas) 169 | inversion_results = load_results( 170 | results_path, 171 | f"iwpc_least_squares_whitebox_private_inversion_l2_{l2}.json") 172 | mses.append([inversion_results[0][noise_scale]['test_acc'] 173 | for noise_scale in noise_scales]) 174 | mean_etas.append(np.mean(etas)) 175 | sigmas = np.array([float(ns) for ns in noise_scales]) 176 | mean_etas = np.array(mean_etas)[:, None] / sigmas 177 | 178 | l2s = np.array([float(l2) for l2 in L2s]) 179 | legend = ["$\lambda=10^{%d}$"%int(math.log10(float(l2))) for l2 in L2s] 180 | 181 | # Plot FILs: 182 | 
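    # Histogram the per-example etas for each L2 setting on log-spaced bins and
    # draw the counts as lines so the distributions share a log x-axis.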
num_bins = 100 183 | fil_counts = [] 184 | fil_centers = [] 185 | for fil in fils: 186 | lower = math.log10(np.min(fil)) 187 | upper = math.log10(np.max(fil) + 1e-4) 188 | bins = np.logspace(lower, upper, num_bins + 1) 189 | counts, edges = np.histogram(fil, bins=bins) 190 | centers = (edges[:-1] + edges[1:]) / 2 191 | fil_counts.append(counts) 192 | fil_centers.append(centers) 193 | plotting.line_plot( 194 | np.array(fil_counts), 195 | np.array(fil_centers), 196 | xlabel="FIL $\eta$ (at $\sigma=1$)", 197 | ylabel="Number of examples", 198 | legend=legend, 199 | marker=None, 200 | size=(5, 5), 201 | xlog=True, 202 | filename=os.path.join( 203 | args.save_path, 204 | f"iwpc_fil_counts_varying_l2"), 205 | ) 206 | 207 | # PLot MSEs 208 | mses = np.array(mses) # [L2, noise_scale, trials] 209 | mse_means = mses.mean(axis=2) 210 | mse_stds = mses.std(axis=2) 211 | plotting.line_plot( 212 | mse_means, mean_etas, legend=legend, 213 | xlabel="Mean $\\bar{\eta}$", 214 | ylabel="Test MSE", 215 | ylog=True, 216 | xlog=True, 217 | size=(5, 5), 218 | errors=mse_stds, 219 | filename=os.path.join(args.save_path, f"iwpc_mse_vs_eta_varying_l2"), 220 | ) 221 | 222 | 223 | def private_inversion_accuracy(results_path, save_path): 224 | L2s = ['1e-5', '1e-3', '1e-1', '1'] 225 | noise_scales = [ 226 | '1e-05', '2e-05', '5e-05', '0.0001', '0.0002', 227 | '0.0005', '0.001', '0.002', '0.005', '0.01', 228 | '0.02', '0.05', '0.1', '0.2', '0.5', '1.0', 229 | ] 230 | 231 | inversion_results = load_results( 232 | results_path, f"iwpc_least_squares_inversion_l2_1e-3.json")[0] 233 | target = np.array(inversion_results["target"]) 234 | baseline = (target == np.array(inversion_results["baseline"])).mean() 235 | 236 | # Load the max eta for each L2. 237 | mean_etas = [] 238 | for l2 in L2s: 239 | etas = load_results( 240 | results_path, f"iwpc_least_squares_fil_l2_{l2}.json")["etas"] 241 | mean_etas.append(np.mean(etas)) 242 | sigmas = np.array([float(ns) for ns in noise_scales]) 243 | mean_etas = np.array(mean_etas)[:, None] / sigmas 244 | 245 | results = {"whitebox": {}, "fredrikson14": {}} 246 | for inverter in ["whitebox", "fredrikson14"]: 247 | results = [] 248 | for l2 in L2s: 249 | # inversion results are in a list ordered by ieration or IRFIL, 250 | # each dictionary is the results at a given noise scale along with 251 | # the target values 252 | inversion_results = load_results( 253 | results_path, 254 | f"iwpc_least_squares_{inverter}_private_inversion_l2_{l2}.json") 255 | all_accs = [] 256 | for noise_scale in noise_scales: 257 | accs = [] 258 | for prediction in inversion_results[0][noise_scale]['predictions']: 259 | accs.append((np.array(prediction) == target).mean()) 260 | all_accs.append([np.mean(accs), np.std(accs)]) 261 | results.append(all_accs) 262 | 263 | results = np.array(results) # [L2, noise scale, mean/std] 264 | means = results[:, :, 0] 265 | stds = results[:, :, 1] 266 | 267 | legend = ["$\lambda=10^{%d}$"%int(math.log10(float(l2))) for l2 in L2s] 268 | 269 | plotting.line_plot( 270 | means, mean_etas, legend=legend, 271 | xlabel="Mean $\\bar{\eta}$", 272 | ylabel="Attribute inversion accuracy", 273 | errors=stds, 274 | ymax=1.02, 275 | ymin=0.2, 276 | xlog=True, 277 | size=(5, 5)) 278 | plt.semilogx([0, 1e3], [baseline]*2, 'k--', label="Prior") 279 | plt.legend() 280 | plt.xlim(right=1e3) 281 | plotting.savefig(os.path.join( 282 | args.save_path, 283 | f"iwpc_{inverter}_vs_eta_varying_l2")) 284 | 285 | 286 | def irfil_inversion(results_path, dataset, save_path): 287 | noise_scale = "0.001" 288 | 
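    # number of IRFIL iterations to include in the mean/std eta plot below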
its = 10 289 | 290 | irfil_results = load_results(results_path, f"{dataset}_least_squares_irfil.json") 291 | etas = np.array(irfil_results["etas"]) 292 | eta_means = np.array(irfil_results["eta_means"])[:its] 293 | eta_stds = np.array(irfil_results["eta_stds"])[:its] 294 | plotting.line_plot( 295 | eta_means[None, :], np.arange(its), 296 | xlabel="Steps of IRFIL", 297 | ylabel="Mean $\\bar{\eta}$", 298 | errors=eta_stds[None, :], 299 | size=(4.85, 5.05), 300 | filename=os.path.join(args.save_path, f"{dataset}_mean_fil"), 301 | ) 302 | 303 | def compute_correct_ratio(etas, num_bins, predictions, target): 304 | order = etas.argsort() 305 | bin_size = len(target) // num_bins + 1 306 | bin_accs = [] 307 | for prediction in predictions: 308 | prediction = np.array(prediction) 309 | correct = (prediction == target) 310 | bin_accs.append([correct[order[lower:lower + bin_size]].mean() 311 | for lower in range(0, len(correct), bin_size)]) 312 | return np.array(bin_accs) 313 | 314 | inversion_results = load_results( 315 | results_path, 316 | f"{dataset}_least_squares_whitebox_private_inversion_irfil.json") 317 | target = np.array(inversion_results[0]["target"]) 318 | num_bins = 10 319 | ratio_means = [] 320 | ratio_stds = [] 321 | its = [0, 2, 10] 322 | for it in its: 323 | predictions = inversion_results[it][noise_scale]['predictions'] 324 | ratios = compute_correct_ratio(etas, num_bins, predictions, target) 325 | ratio_means.append(ratios.mean(axis=0)) 326 | ratio_stds.append(ratios.std(axis=0)) 327 | ratio_means = np.array(ratio_means) 328 | ratio_stds = np.array(ratio_stds) 329 | 330 | plotting.line_plot( 331 | ratio_means, np.arange(num_bins), 332 | legend=["Iteration {}".format(it) for it in its], 333 | xlabel="FIL ($\eta$) percentile", 334 | ylabel="Attribute inversion accuracy", 335 | errors=ratio_stds, 336 | size=(5, 5), 337 | filename=os.path.join( 338 | args.save_path, 339 | f"{dataset}_whitebox_eta_percentile"), 340 | ) 341 | 342 | 343 | 344 | def main(args): 345 | labels = {"mnist" : ["0", "1"], "cifar10": ["Plane", "Car"]} 346 | for dataset in ["mnist", "cifar10"]: 347 | train = dataloading.load_dataset( 348 | name=dataset, split="train", normalize=False, 349 | num_classes=2, reshape=False, root=args.data_folder) 350 | 351 | for model in ["linear", "logistic"]: 352 | prefix = f"{dataset}_{model}" 353 | 354 | # Histogram of etas: 355 | eta_histogram( 356 | args.results_path, args.save_path, 357 | prefix, train, labels[dataset]) 358 | 359 | # Most and least leaked images: 360 | view_images(train, args.results_path, args.save_path, prefix) 361 | 362 | eta_overlap(args.results_path, f"{dataset}") 363 | 364 | # Plot of eta stds vs iterations of reweighting 365 | iterated_reweighted_etas( 366 | args.results_path, args.save_path, f"{dataset}") 367 | 368 | # Plot correlations of eta with other metrics 369 | correlations(args.results_path, args.save_path, "mnist_linear") 370 | 371 | # IWPC MSE and FIL with output pertubration 372 | private_mse_and_fil(args.results_path, args.save_path) 373 | 374 | # IWPC Fredrikson and whitebox attribute inversion results. 
375 | private_inversion_accuracy(args.results_path, args.save_path) 376 | 377 | # IWPC and UCI Adult attribute inversion results as a function of 378 | # iterations of IRFIL 379 | for dataset in ["iwpc", "uciadult"]: 380 | irfil_inversion(args.results_path, dataset, args.save_path) 381 | 382 | 383 | if __name__ == "__main__": 384 | parser = argparse.ArgumentParser( 385 | description="Script to make all the figures.") 386 | parser.add_argument("--results_path", type=str, default=".", 387 | help="Path of saved results") 388 | parser.add_argument("--data_folder", default="/tmp", type=str, 389 | help="folder in which to store data (default: '/tmp')") 390 | parser.add_argument("--save_path", default=".", type=str, 391 | help="folder in which to store figures (default: '.')") 392 | parser.add_argument("--format", default=None, type=str, 393 | help="format to save figures (default: \"pdf\")") 394 | args = parser.parse_args() 395 | if args.format is not None: 396 | plotting.FORMAT = args.format 397 | 398 | plt.rcParams.update({ 399 | "axes.titlesize": 24, 400 | "legend.fontsize": 20, 401 | "xtick.labelsize": 18, 402 | "ytick.labelsize" : 18, 403 | "axes.labelsize": 24 404 | }) 405 | 406 | main(args) 407 | -------------------------------------------------------------------------------- /dataloading.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | # Copyright (c) Facebook, Inc. and its affiliates. 4 | # All rights reserved. 5 | 6 | # This source code is licensed under the license found in the 7 | # LICENSE file in the root directory of this source tree. 8 | 9 | import logging 10 | import numpy as np 11 | import os 12 | import torch 13 | from torchvision.datasets import MNIST, CIFAR10, CIFAR100 14 | from torchvision.datasets.mnist import read_image_file, read_label_file 15 | from torchvision.datasets.utils import download_url 16 | 17 | import time 18 | import sys 19 | import git 20 | 21 | 22 | class Synth: 23 | """ 24 | Synthetic dataset for testing. 25 | """ 26 | def __init__(self, root, train=True, download=True): 27 | regression = True 28 | drop = True 29 | num_examples = 1000 30 | num_attributes = 3 31 | categories_per_attribute = 3 32 | 33 | # For repeatability 34 | torch.manual_seed(int(train)) 35 | X = torch.randint( 36 | 0, categories_per_attribute, (num_examples, num_attributes)) 37 | X = torch.nn.functional.one_hot(X, categories_per_attribute) 38 | if drop: 39 | # Remove the last attribute to get rid of perfect collinearity 40 | X = X[:, :, :-1] 41 | X = X.reshape(num_examples, -1) 42 | self.data = X.float() 43 | 44 | if regression: 45 | self.targets = torch.randn(num_examples) * 50 46 | else: 47 | self.targets = torch.randint(0, 2, (num_examples,)) 48 | torch.manual_seed(time.time()) 49 | 50 | 51 | class UCIAdult: 52 | """ 53 | UCI Adult dataset: 54 | http://archive.ics.uci.edu/ml/datasets/Adult 55 | 56 | The task is to classify individuals as making above or below $50k 57 | based on certain demographic attributes. 58 | 59 | Data key: 60 | 0 age: continuous. 61 | 1 workclass: Private, Self-emp-not-inc, Self-emp-inc, Federal-gov, Local-gov, 62 | State-gov, Without-pay, (NB removed with ?: Never-worked). 63 | 2 fnlwgt: continuous. 64 | 3 education: Bachelors, Some-college, 11th, HS-grad, Prof-school, 65 | Assoc-acdm, Assoc-voc, 9th, 7th-8th, 12th, Masters, 1st-4th, 66 | 10th, Doctorate, 5th-6th, Preschool. 67 | 4 education-num: continuous. 
68 | 5 marital-status: Married-civ-spouse, Divorced, Never-married, Separated, 69 | Widowed, Married-spouse-absent, Married-AF-spouse. 70 | 6 occupation: Tech-support, Craft-repair, Other-service, Sales, 71 | Exec-managerial, Prof-specialty, Handlers-cleaners, 72 | Machine-op-inspct, Adm-clerical, Farming-fishing, Transport-moving, 73 | Priv-house-serv, Protective-serv, Armed-Forces. 74 | 7 relationship: Wife, Own-child, Husband, Not-in-family, Other-relative, 75 | Unmarried. 76 | 8 race: White, Asian-Pac-Islander, Amer-Indian-Eskimo, Other, Black. 77 | 9 sex: Female, Male. 78 | 10 capital-gain: continuous. 79 | 11 capital-loss: continuous. 80 | 12 hours-per-week: continuous. 81 | 13 native-country: United-States, Cambodia, England, Puerto-Rico, Canada, 82 | Germany, Outlying-US(Guam-USVI-etc), India, Japan, Greece, South, 83 | China, Cuba, Iran, Honduras, Philippines, Italy, Poland, Jamaica, 84 | Vietnam, Mexico, Portugal, Ireland, France, Dominican-Republic, Laos, 85 | Ecuador, Taiwan, Haiti, Columbia, Hungary, Guatemala, Nicaragua, 86 | Scotland, Thailand, Yugoslavia, El-Salvador, Trinadad&Tobago, Peru, 87 | Hong, Holand-Netherlands. 88 | 89 | 14 label: >50K, <=50K. 90 | """ 91 | 92 | def __init__(self, root, train=True, download=False, mehnaz20=True, drop=True): 93 | """ 94 | mehnaz20: If True, then pre-process the data according to: 95 | Black-box Model Inversion Attribute Inference Attacks on 96 | Classification Models, Mehnaz 2020, https://arxiv.org/abs/2012.03404. 97 | Namely, they combine the marital status features (5) into a single 98 | binary feature {Married-civ-spouse, Married-spouse-absent, 99 | Married-Af-spouse} vs {Divorced, Never-married, Separated, Widowed} and 100 | remove the relationship features (7). 101 | drop: If True, remove the last feature for a one-hot encoded 102 | attribute. This helps alleviate perfect colinearity amongst 103 | the features. 104 | """ 105 | self.root = root 106 | if download: 107 | self.download() 108 | continuous_ids = set([0, 2, 4, 10, 11, 12]) 109 | feature_keys = [set() for _ in range(15)] 110 | 111 | def load(dataset): 112 | with open(os.path.join(root, dataset)) as fid: 113 | # load data ignoring rows with missing values 114 | lines = (l.strip() for l in fid) 115 | lines = (l.split(",") for l in lines if "?" 
not in l) 116 | lines = [l for l in lines if len(l) == 15] 117 | return lines 118 | 119 | for line in load("adult.data"): 120 | for e, k in enumerate(line): 121 | if e in continuous_ids: 122 | continue 123 | k = k.strip() 124 | feature_keys[e].add(k) 125 | feature_keys = [{k: i for i, k in enumerate(sorted(fk))} 126 | for fk in feature_keys] 127 | self.feature_keys = feature_keys 128 | if mehnaz20: 129 | # Remap marital status to binary feature: 130 | marital_status = feature_keys[5] 131 | for ms in ["Divorced", "Never-married", "Separated", "Widowed"]: 132 | marital_status[ms] = 0 133 | for ms in ["Married-AF-spouse", "Married-civ-spouse", "Married-spouse-absent"]: 134 | marital_status[ms] = 1 135 | 136 | def process(dataset, mean_stds=None): 137 | features = [] 138 | targets = [] 139 | for line in load(dataset): 140 | example = [] 141 | for e, k in enumerate(line): 142 | k = k.strip().strip(".") 143 | example.append(int(feature_keys[e].get(k, k))) 144 | features.append(example[:-1]) 145 | targets.append(example[-1]) 146 | features = torch.tensor(features, dtype=torch.float) 147 | features = list(features.split(1, dim=1)) 148 | targets = torch.tensor(targets) 149 | 150 | if mean_stds is None: 151 | mean_stds = {} 152 | for e, feat in enumerate(features): 153 | keys = feature_keys[e] 154 | # Normalize continuous features: 155 | if len(keys) == 0: 156 | if e not in mean_stds: 157 | mean_stds[e] = (torch.mean(feat), torch.std(feat)) 158 | mean, std = mean_stds[e] 159 | features[e] = (feat - mean) / std 160 | # One-hot encode non-continuous features: 161 | else: 162 | num_feats = max(keys.values()) + 1 163 | features[e] = torch.nn.functional.one_hot( 164 | feat.squeeze().to(torch.long), num_feats) 165 | if drop: 166 | features[e] = features[e][:, :-1] 167 | features[e] = features[e].to(torch.float) 168 | if mehnaz20: 169 | # Remove relationship status: 170 | features.pop(7) 171 | features = torch.cat(features, dim=1) 172 | return features, targets, mean_stds 173 | 174 | features, targets, mean_stds = process("adult.data") 175 | if not train: 176 | features, targets, _ = process("adult.test", mean_stds) 177 | self.data = features 178 | self.targets = targets 179 | 180 | def download(self): 181 | logging.info("Maybe downloading UCI Adult dataset.") 182 | base_url = "http://archive.ics.uci.edu/ml/machine-learning-databases/adult/" 183 | for f in ["adult.data", "adult.names", "adult.test"]: 184 | download_url(os.path.join(base_url, f), root=self.root) 185 | 186 | 187 | class IWPC: 188 | """ 189 | International Warfarin Pharmacogenetics Consortium (IWPC) dataset: 190 | https://www.pharmgkb.org/downloads 191 | Pre-processed version from https://github.com/samuel-yeom/ml-privacy-csf18 192 | 193 | The task is to predict stable Warfarin dosage given demographic, medical, and genetic attributes. 194 | 195 | Processed data has the following attributes: 196 | 0 race: asian/white/black 197 | 1 age: rounded down to 10s and whitened 198 | 2 height: continuous; whitened 199 | 3 weight: continuous; whitened 200 | 4 amiodarone: binary 201 | 5 CYP2C9 inducer: binary 202 | 6 CYP2C9 genotype: 11/12/13/22/23/33 203 | 7 VKORC1 genotype: CC/CT/TT 204 | 8 label: Warfarin dosage; whitened 205 | """ 206 | def __init__(self, root, train=True, download=False, drop=True): 207 | """ 208 | drop: If True, remove the last feature for a one-hot encoded 209 | attribute. This helps alleviate perfect colinearity amongst 210 | the features. 
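        For example, with drop=True one column is removed per one-hot
        attribute, so the three VKORC1 genotypes (CC/CT/TT) end up encoded
        by two indicator columns.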
211 | """ 212 | if download: 213 | # download dataloading code and data from repo 214 | try: 215 | git.Repo.clone_from("git@github.com:samuel-yeom/ml-privacy-csf18.git", root) 216 | except git.GitCommandError: 217 | print("Directory exists and is non-empty, skipping download") 218 | sys.path.append(os.path.join(root, "code")) 219 | from main import load_iwpc 220 | X, y, featnames = load_iwpc(os.path.join(root, "data")) 221 | X = X.todense() 222 | if drop: 223 | attrs = [f.split("=")[0] for f in featnames] 224 | drop_keys = ["cyp2c9", "race", "vkorc1"] 225 | drop_idx = [attrs.index(k) for k in drop_keys] 226 | X = np.delete(X, drop_idx, axis=1) 227 | featnames = np.delete(featnames, drop_idx) 228 | print("Attributes: " + str(featnames)) 229 | X = torch.from_numpy(X).float() 230 | y = torch.from_numpy(y).float() 231 | # fix a random 80:20 train-val split 232 | torch.manual_seed(0) 233 | perm = torch.randperm(X.size(0)) 234 | n_train = int(0.8 * X.size(0)) 235 | if train: 236 | self.data = X[perm[:n_train], :] 237 | self.targets = y[perm[:n_train]] 238 | else: 239 | self.data = X[perm[n_train:], :] 240 | self.targets = y[perm[n_train:]] 241 | torch.manual_seed(time.time()) 242 | 243 | 244 | class MNIST1M(MNIST): 245 | """ 246 | MNIST1M dataset that can be generated using InfiMNIST. 247 | 248 | Note: This dataset cannot be downloaded automatically. 249 | """ 250 | 251 | def __init__(self, root, train=True, transform=None, target_transform=None, 252 | download=False): 253 | super(MNIST1M, self).__init__( 254 | root, 255 | train=train, 256 | transform=transform, 257 | target_transform=target_transform, 258 | download=download, 259 | ) 260 | 261 | def download(self): 262 | """ 263 | Process MNIST1M data if it does not exist in processed_folder already. 264 | """ 265 | 266 | # check if processed data does not exist: 267 | if self._check_exists(): 268 | return 269 | 270 | # process and save as torch files: 271 | logging.info("Processing MNIST1M data...") 272 | os.makedirs(self.processed_folder, exist_ok=True) 273 | training_set = ( 274 | read_image_file(os.path.join(self.raw_folder, "mnist1m-images-idx3-ubyte")), 275 | read_label_file(os.path.join(self.raw_folder, "mnist1m-labels-idx1-ubyte")) 276 | ) 277 | test_set = ( 278 | read_image_file(os.path.join(self.raw_folder, "t10k-images-idx3-ubyte")), 279 | read_label_file(os.path.join(self.raw_folder, "t10k-labels-idx1-ubyte")) 280 | ) 281 | with open(os.path.join(self.processed_folder, self.training_file), "wb") as f: 282 | torch.save(training_set, f) 283 | with open(os.path.join(self.processed_folder, self.test_file), "wb") as f: 284 | torch.save(test_set, f) 285 | logging.info("Done!") 286 | 287 | 288 | def load_dataset( 289 | name="mnist", 290 | split="train", 291 | normalize=True, 292 | reshape=True, 293 | num_classes=None, 294 | regression=False, 295 | root="/tmp", 296 | ): 297 | """ 298 | Loads train or test `split` from the dataset with the specified `name`. 299 | Setting `normalize` to `True` (default) normalizes each feature vector to 300 | lie on the unit ball. Setting `reshape` to `True` (default) flattens n-D 301 | arrays into vectors. Specifying `num_classes` selects only the first 302 | `num_classes` of the classification problem (default: all classes). 
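    A minimal usage sketch (assumes the torchvision MNIST download succeeds):

        train = load_dataset(name="mnist", split="train", num_classes=2)
        # train["features"]: flattened, unit-ball-normalized vectors of shape (N, 784)
        # train["targets"]: labels in {0, 1}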
303 | """ 304 | 305 | # assertions: 306 | assert split in ["train", "test"], f"unknown split: {split}" 307 | image_sets = { 308 | "mnist": MNIST, 309 | "mnist1m": MNIST1M, 310 | "cifar10": CIFAR10, 311 | "cifar100": CIFAR100, 312 | } 313 | datasets = { 314 | "uciadult": UCIAdult, "iwpc": IWPC, "synth": Synth} 315 | datasets.update(image_sets) 316 | assert name in datasets, f"unknown dataset: {name}" 317 | 318 | # download the image dataset: 319 | dataset = datasets[name]( 320 | f"{root}/{name}_original", 321 | download=True, 322 | train=(split == "train"), 323 | ) 324 | 325 | # preprocess the image dataset: 326 | features, targets = dataset.data, dataset.targets 327 | if not torch.is_tensor(features): 328 | features = torch.from_numpy(features) 329 | if not torch.is_tensor(targets): 330 | targets = torch.tensor(targets) 331 | if name in image_sets: 332 | features = features.float().div_(255.) 333 | if not regression: 334 | targets = targets.long() 335 | 336 | # flatten images or convert to NCHW: 337 | if reshape: 338 | features = features.reshape(features.size(0), -1) 339 | else: 340 | if len(features.shape) == 3: 341 | features = features.unsqueeze(3) 342 | features = features.permute(0, 3, 1, 2) 343 | 344 | # select only subset of classes: 345 | if not regression and num_classes is not None: 346 | assert num_classes >= 2, "number of classes must be >= 2" 347 | mask = targets.lt(num_classes) # assumes classes are 0, 1, ..., C - 1 348 | features = features[mask, :] 349 | targets = targets[mask] 350 | 351 | # normalize all samples to lie within unit ball: 352 | if normalize: 353 | assert reshape, "normalization without reshaping unsupported" 354 | features.div_(features.norm(dim=1).max()) 355 | # return dataset: 356 | return {"features": features, "targets": targets} 357 | 358 | 359 | def load_datasampler(dataset, batch_size=1, shuffle=True, transform=None): 360 | """ 361 | Returns a data sampler that yields samples of the specified `dataset` with the 362 | given `batch_size`. An optional `transform` for samples can also be given. 363 | If `shuffle` is `True` (default), samples are shuffled. 364 | """ 365 | assert dataset["features"].size(0) == dataset["targets"].size(0), \ 366 | "number of feature vectors and targets must match" 367 | if transform is not None: 368 | assert callable(transform), "transform must be callable if specified" 369 | N = dataset["features"].size(0) 370 | 371 | # define simple dataset sampler: 372 | def sampler(): 373 | idx = 0 374 | perm = torch.randperm(N) if shuffle else torch.range(0, N).long() 375 | while idx < N: 376 | 377 | # get batch: 378 | start = idx 379 | end = min(idx + batch_size, N) 380 | batch = dataset["features"][perm[start:end], :] 381 | 382 | # apply transform: 383 | if transform is not None: 384 | transformed_batch = [ 385 | transform(batch[n, :]) for n in range(batch.size(0)) 386 | ] 387 | batch = torch.stack(transformed_batch, dim=0) 388 | 389 | # return sample: 390 | yield {"features": batch, "targets": dataset["targets"][perm[start:end]]} 391 | idx += batch_size 392 | 393 | # return sampler: 394 | return sampler 395 | 396 | 397 | def subsample(data, num_samples, random=True): 398 | """ 399 | Subsamples the specified `data` to contain `num_samples` samples. Set 400 | `random` to `False` to not select samples randomly but only pick top ones. 
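    Example sketch (hypothetical size): `subsample(data, 1000)` keeps 1,000
    randomly chosen examples from every tensor in the `data` dict.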
401 | """ 402 | 403 | # assertions: 404 | assert isinstance(data, dict), "data must be a dict" 405 | assert "targets" in data, "data dict does not have targets field" 406 | dataset_size = data["targets"].nelement() 407 | assert num_samples > 0, "num_samples must be positive integer value" 408 | assert num_samples <= dataset_size, "num_samples cannot exceed data size" 409 | 410 | # subsample data: 411 | if random: 412 | permutation = torch.randperm(dataset_size) 413 | for key, value in data.items(): 414 | if random: 415 | data[key] = value.index_select(0, permutation[:num_samples]) 416 | else: 417 | data[key] = value.narrow(0, 0, num_samples).contiguous() 418 | return data 419 | 420 | 421 | def pca(data, num_dims=None, mapping=None): 422 | """ 423 | Applies PCA on the specified `data` to reduce its dimensionality to 424 | `num_dims` dimensions, and returns the reduced data and `mapping`. 425 | 426 | If a `mapping` is specified as input, `num_dims` is ignored and that mapping 427 | is applied on the input `data`. 428 | """ 429 | 430 | # work on both data tensor and data dict: 431 | data_dict = False 432 | if isinstance(data, dict): 433 | assert "features" in data, "data dict does not have features field" 434 | data_dict = True 435 | original_data = data 436 | data = original_data["features"] 437 | assert data.dim() == 2, "data tensor must be two-dimensional matrix" 438 | 439 | # compute PCA mapping: 440 | if mapping is None: 441 | assert num_dims is not None, "must specify num_dims or mapping" 442 | mean = torch.mean(data, 0, keepdim=True) 443 | zero_mean_data = data.sub(mean) 444 | covariance = torch.matmul(zero_mean_data.t(), zero_mean_data) 445 | _, projection = torch.symeig(covariance, eigenvectors=True) 446 | projection = projection[:, -min(num_dims, projection.size(1)):] 447 | mapping = {"mean": mean, "projection": projection} 448 | else: 449 | assert isinstance(mapping, dict), "mapping must be a dict" 450 | assert "mean" in mapping and "projection" in mapping, "mapping missing keys" 451 | if num_dims is not None: 452 | logging.warning("Value of num_dims is ignored when mapping is specified.") 453 | 454 | # apply PCA mapping: 455 | reduced_data = data.sub(mapping["mean"]).matmul(mapping["projection"]) 456 | 457 | # return results: 458 | if data_dict: 459 | original_data["features"] = reduced_data 460 | reduced_data = original_data 461 | return reduced_data, mapping 462 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Attribution-NonCommercial 4.0 International 2 | 3 | ======================================================================= 4 | 5 | Creative Commons Corporation ("Creative Commons") is not a law firm and 6 | does not provide legal services or legal advice. Distribution of 7 | Creative Commons public licenses does not create a lawyer-client or 8 | other relationship. Creative Commons makes its licenses and related 9 | information available on an "as-is" basis. Creative Commons gives no 10 | warranties regarding its licenses, any material licensed under their 11 | terms and conditions, or any related information. Creative Commons 12 | disclaims all liability for damages resulting from their use to the 13 | fullest extent possible. 
14 | 15 | Using Creative Commons Public Licenses 16 | 17 | Creative Commons public licenses provide a standard set of terms and 18 | conditions that creators and other rights holders may use to share 19 | original works of authorship and other material subject to copyright 20 | and certain other rights specified in the public license below. The 21 | following considerations are for informational purposes only, are not 22 | exhaustive, and do not form part of our licenses. 23 | 24 | Considerations for licensors: Our public licenses are 25 | intended for use by those authorized to give the public 26 | permission to use material in ways otherwise restricted by 27 | copyright and certain other rights. Our licenses are 28 | irrevocable. Licensors should read and understand the terms 29 | and conditions of the license they choose before applying it. 30 | Licensors should also secure all rights necessary before 31 | applying our licenses so that the public can reuse the 32 | material as expected. Licensors should clearly mark any 33 | material not subject to the license. This includes other CC- 34 | licensed material, or material used under an exception or 35 | limitation to copyright. More considerations for licensors: 36 | wiki.creativecommons.org/Considerations_for_licensors 37 | 38 | Considerations for the public: By using one of our public 39 | licenses, a licensor grants the public permission to use the 40 | licensed material under specified terms and conditions. If 41 | the licensor's permission is not necessary for any reason--for 42 | example, because of any applicable exception or limitation to 43 | copyright--then that use is not regulated by the license. Our 44 | licenses grant only permissions under copyright and certain 45 | other rights that a licensor has authority to grant. Use of 46 | the licensed material may still be restricted for other 47 | reasons, including because others have copyright or other 48 | rights in the material. A licensor may make special requests, 49 | such as asking that all changes be marked or described. 50 | Although not required by our licenses, you are encouraged to 51 | respect those requests where reasonable. More_considerations 52 | for the public: 53 | wiki.creativecommons.org/Considerations_for_licensees 54 | 55 | ======================================================================= 56 | 57 | Creative Commons Attribution-NonCommercial 4.0 International Public 58 | License 59 | 60 | By exercising the Licensed Rights (defined below), You accept and agree 61 | to be bound by the terms and conditions of this Creative Commons 62 | Attribution-NonCommercial 4.0 International Public License ("Public 63 | License"). To the extent this Public License may be interpreted as a 64 | contract, You are granted the Licensed Rights in consideration of Your 65 | acceptance of these terms and conditions, and the Licensor grants You 66 | such rights in consideration of benefits the Licensor receives from 67 | making the Licensed Material available under these terms and 68 | conditions. 69 | 70 | Section 1 -- Definitions. 71 | 72 | a. Adapted Material means material subject to Copyright and Similar 73 | Rights that is derived from or based upon the Licensed Material 74 | and in which the Licensed Material is translated, altered, 75 | arranged, transformed, or otherwise modified in a manner requiring 76 | permission under the Copyright and Similar Rights held by the 77 | Licensor. 
For purposes of this Public License, where the Licensed 78 | Material is a musical work, performance, or sound recording, 79 | Adapted Material is always produced where the Licensed Material is 80 | synched in timed relation with a moving image. 81 | 82 | b. Adapter's License means the license You apply to Your Copyright 83 | and Similar Rights in Your contributions to Adapted Material in 84 | accordance with the terms and conditions of this Public License. 85 | 86 | c. Copyright and Similar Rights means copyright and/or similar rights 87 | closely related to copyright including, without limitation, 88 | performance, broadcast, sound recording, and Sui Generis Database 89 | Rights, without regard to how the rights are labeled or 90 | categorized. For purposes of this Public License, the rights 91 | specified in Section 2(b)(1)-(2) are not Copyright and Similar 92 | Rights. 93 | d. Effective Technological Measures means those measures that, in the 94 | absence of proper authority, may not be circumvented under laws 95 | fulfilling obligations under Article 11 of the WIPO Copyright 96 | Treaty adopted on December 20, 1996, and/or similar international 97 | agreements. 98 | 99 | e. Exceptions and Limitations means fair use, fair dealing, and/or 100 | any other exception or limitation to Copyright and Similar Rights 101 | that applies to Your use of the Licensed Material. 102 | 103 | f. Licensed Material means the artistic or literary work, database, 104 | or other material to which the Licensor applied this Public 105 | License. 106 | 107 | g. Licensed Rights means the rights granted to You subject to the 108 | terms and conditions of this Public License, which are limited to 109 | all Copyright and Similar Rights that apply to Your use of the 110 | Licensed Material and that the Licensor has authority to license. 111 | 112 | h. Licensor means the individual(s) or entity(ies) granting rights 113 | under this Public License. 114 | 115 | i. NonCommercial means not primarily intended for or directed towards 116 | commercial advantage or monetary compensation. For purposes of 117 | this Public License, the exchange of the Licensed Material for 118 | other material subject to Copyright and Similar Rights by digital 119 | file-sharing or similar means is NonCommercial provided there is 120 | no payment of monetary compensation in connection with the 121 | exchange. 122 | 123 | j. Share means to provide material to the public by any means or 124 | process that requires permission under the Licensed Rights, such 125 | as reproduction, public display, public performance, distribution, 126 | dissemination, communication, or importation, and to make material 127 | available to the public including in ways that members of the 128 | public may access the material from a place and at a time 129 | individually chosen by them. 130 | 131 | k. Sui Generis Database Rights means rights other than copyright 132 | resulting from Directive 96/9/EC of the European Parliament and of 133 | the Council of 11 March 1996 on the legal protection of databases, 134 | as amended and/or succeeded, as well as other essentially 135 | equivalent rights anywhere in the world. 136 | 137 | l. You means the individual or entity exercising the Licensed Rights 138 | under this Public License. Your has a corresponding meaning. 139 | 140 | Section 2 -- Scope. 141 | 142 | a. License grant. 143 | 144 | 1. 
Subject to the terms and conditions of this Public License, 145 | the Licensor hereby grants You a worldwide, royalty-free, 146 | non-sublicensable, non-exclusive, irrevocable license to 147 | exercise the Licensed Rights in the Licensed Material to: 148 | 149 | a. reproduce and Share the Licensed Material, in whole or 150 | in part, for NonCommercial purposes only; and 151 | 152 | b. produce, reproduce, and Share Adapted Material for 153 | NonCommercial purposes only. 154 | 155 | 2. Exceptions and Limitations. For the avoidance of doubt, where 156 | Exceptions and Limitations apply to Your use, this Public 157 | License does not apply, and You do not need to comply with 158 | its terms and conditions. 159 | 160 | 3. Term. The term of this Public License is specified in Section 161 | 6(a). 162 | 163 | 4. Media and formats; technical modifications allowed. The 164 | Licensor authorizes You to exercise the Licensed Rights in 165 | all media and formats whether now known or hereafter created, 166 | and to make technical modifications necessary to do so. The 167 | Licensor waives and/or agrees not to assert any right or 168 | authority to forbid You from making technical modifications 169 | necessary to exercise the Licensed Rights, including 170 | technical modifications necessary to circumvent Effective 171 | Technological Measures. For purposes of this Public License, 172 | simply making modifications authorized by this Section 2(a) 173 | (4) never produces Adapted Material. 174 | 175 | 5. Downstream recipients. 176 | 177 | a. Offer from the Licensor -- Licensed Material. Every 178 | recipient of the Licensed Material automatically 179 | receives an offer from the Licensor to exercise the 180 | Licensed Rights under the terms and conditions of this 181 | Public License. 182 | 183 | b. No downstream restrictions. You may not offer or impose 184 | any additional or different terms or conditions on, or 185 | apply any Effective Technological Measures to, the 186 | Licensed Material if doing so restricts exercise of the 187 | Licensed Rights by any recipient of the Licensed 188 | Material. 189 | 190 | 6. No endorsement. Nothing in this Public License constitutes or 191 | may be construed as permission to assert or imply that You 192 | are, or that Your use of the Licensed Material is, connected 193 | with, or sponsored, endorsed, or granted official status by, 194 | the Licensor or others designated to receive attribution as 195 | provided in Section 3(a)(1)(A)(i). 196 | 197 | b. Other rights. 198 | 199 | 1. Moral rights, such as the right of integrity, are not 200 | licensed under this Public License, nor are publicity, 201 | privacy, and/or other similar personality rights; however, to 202 | the extent possible, the Licensor waives and/or agrees not to 203 | assert any such rights held by the Licensor to the limited 204 | extent necessary to allow You to exercise the Licensed 205 | Rights, but not otherwise. 206 | 207 | 2. Patent and trademark rights are not licensed under this 208 | Public License. 209 | 210 | 3. To the extent possible, the Licensor waives any right to 211 | collect royalties from You for the exercise of the Licensed 212 | Rights, whether directly or through a collecting society 213 | under any voluntary or waivable statutory or compulsory 214 | licensing scheme. In all other cases the Licensor expressly 215 | reserves any right to collect such royalties, including when 216 | the Licensed Material is used other than for NonCommercial 217 | purposes. 
218 | 219 | Section 3 -- License Conditions. 220 | 221 | Your exercise of the Licensed Rights is expressly made subject to the 222 | following conditions. 223 | 224 | a. Attribution. 225 | 226 | 1. If You Share the Licensed Material (including in modified 227 | form), You must: 228 | 229 | a. retain the following if it is supplied by the Licensor 230 | with the Licensed Material: 231 | 232 | i. identification of the creator(s) of the Licensed 233 | Material and any others designated to receive 234 | attribution, in any reasonable manner requested by 235 | the Licensor (including by pseudonym if 236 | designated); 237 | 238 | ii. a copyright notice; 239 | 240 | iii. a notice that refers to this Public License; 241 | 242 | iv. a notice that refers to the disclaimer of 243 | warranties; 244 | 245 | v. a URI or hyperlink to the Licensed Material to the 246 | extent reasonably practicable; 247 | 248 | b. indicate if You modified the Licensed Material and 249 | retain an indication of any previous modifications; and 250 | 251 | c. indicate the Licensed Material is licensed under this 252 | Public License, and include the text of, or the URI or 253 | hyperlink to, this Public License. 254 | 255 | 2. You may satisfy the conditions in Section 3(a)(1) in any 256 | reasonable manner based on the medium, means, and context in 257 | which You Share the Licensed Material. For example, it may be 258 | reasonable to satisfy the conditions by providing a URI or 259 | hyperlink to a resource that includes the required 260 | information. 261 | 262 | 3. If requested by the Licensor, You must remove any of the 263 | information required by Section 3(a)(1)(A) to the extent 264 | reasonably practicable. 265 | 266 | 4. If You Share Adapted Material You produce, the Adapter's 267 | License You apply must not prevent recipients of the Adapted 268 | Material from complying with this Public License. 269 | 270 | Section 4 -- Sui Generis Database Rights. 271 | 272 | Where the Licensed Rights include Sui Generis Database Rights that 273 | apply to Your use of the Licensed Material: 274 | 275 | a. for the avoidance of doubt, Section 2(a)(1) grants You the right 276 | to extract, reuse, reproduce, and Share all or a substantial 277 | portion of the contents of the database for NonCommercial purposes 278 | only; 279 | 280 | b. if You include all or a substantial portion of the database 281 | contents in a database in which You have Sui Generis Database 282 | Rights, then the database in which You have Sui Generis Database 283 | Rights (but not its individual contents) is Adapted Material; and 284 | 285 | c. You must comply with the conditions in Section 3(a) if You Share 286 | all or a substantial portion of the contents of the database. 287 | 288 | For the avoidance of doubt, this Section 4 supplements and does not 289 | replace Your obligations under this Public License where the Licensed 290 | Rights include other Copyright and Similar Rights. 291 | 292 | Section 5 -- Disclaimer of Warranties and Limitation of Liability. 293 | 294 | a. UNLESS OTHERWISE SEPARATELY UNDERTAKEN BY THE LICENSOR, TO THE 295 | EXTENT POSSIBLE, THE LICENSOR OFFERS THE LICENSED MATERIAL AS-IS 296 | AND AS-AVAILABLE, AND MAKES NO REPRESENTATIONS OR WARRANTIES OF 297 | ANY KIND CONCERNING THE LICENSED MATERIAL, WHETHER EXPRESS, 298 | IMPLIED, STATUTORY, OR OTHER. 
THIS INCLUDES, WITHOUT LIMITATION, 299 | WARRANTIES OF TITLE, MERCHANTABILITY, FITNESS FOR A PARTICULAR 300 | PURPOSE, NON-INFRINGEMENT, ABSENCE OF LATENT OR OTHER DEFECTS, 301 | ACCURACY, OR THE PRESENCE OR ABSENCE OF ERRORS, WHETHER OR NOT 302 | KNOWN OR DISCOVERABLE. WHERE DISCLAIMERS OF WARRANTIES ARE NOT 303 | ALLOWED IN FULL OR IN PART, THIS DISCLAIMER MAY NOT APPLY TO YOU. 304 | 305 | b. TO THE EXTENT POSSIBLE, IN NO EVENT WILL THE LICENSOR BE LIABLE 306 | TO YOU ON ANY LEGAL THEORY (INCLUDING, WITHOUT LIMITATION, 307 | NEGLIGENCE) OR OTHERWISE FOR ANY DIRECT, SPECIAL, INDIRECT, 308 | INCIDENTAL, CONSEQUENTIAL, PUNITIVE, EXEMPLARY, OR OTHER LOSSES, 309 | COSTS, EXPENSES, OR DAMAGES ARISING OUT OF THIS PUBLIC LICENSE OR 310 | USE OF THE LICENSED MATERIAL, EVEN IF THE LICENSOR HAS BEEN 311 | ADVISED OF THE POSSIBILITY OF SUCH LOSSES, COSTS, EXPENSES, OR 312 | DAMAGES. WHERE A LIMITATION OF LIABILITY IS NOT ALLOWED IN FULL OR 313 | IN PART, THIS LIMITATION MAY NOT APPLY TO YOU. 314 | 315 | c. The disclaimer of warranties and limitation of liability provided 316 | above shall be interpreted in a manner that, to the extent 317 | possible, most closely approximates an absolute disclaimer and 318 | waiver of all liability. 319 | 320 | Section 6 -- Term and Termination. 321 | 322 | a. This Public License applies for the term of the Copyright and 323 | Similar Rights licensed here. However, if You fail to comply with 324 | this Public License, then Your rights under this Public License 325 | terminate automatically. 326 | 327 | b. Where Your right to use the Licensed Material has terminated under 328 | Section 6(a), it reinstates: 329 | 330 | 1. automatically as of the date the violation is cured, provided 331 | it is cured within 30 days of Your discovery of the 332 | violation; or 333 | 334 | 2. upon express reinstatement by the Licensor. 335 | 336 | For the avoidance of doubt, this Section 6(b) does not affect any 337 | right the Licensor may have to seek remedies for Your violations 338 | of this Public License. 339 | 340 | c. For the avoidance of doubt, the Licensor may also offer the 341 | Licensed Material under separate terms or conditions or stop 342 | distributing the Licensed Material at any time; however, doing so 343 | will not terminate this Public License. 344 | 345 | d. Sections 1, 5, 6, 7, and 8 survive termination of this Public 346 | License. 347 | 348 | Section 7 -- Other Terms and Conditions. 349 | 350 | a. The Licensor shall not be bound by any additional or different 351 | terms or conditions communicated by You unless expressly agreed. 352 | 353 | b. Any arrangements, understandings, or agreements regarding the 354 | Licensed Material not stated herein are separate from and 355 | independent of the terms and conditions of this Public License. 356 | 357 | Section 8 -- Interpretation. 358 | 359 | a. For the avoidance of doubt, this Public License does not, and 360 | shall not be interpreted to, reduce, limit, restrict, or impose 361 | conditions on any use of the Licensed Material that could lawfully 362 | be made without permission under this Public License. 363 | 364 | b. To the extent possible, if any provision of this Public License is 365 | deemed unenforceable, it shall be automatically reformed to the 366 | minimum extent necessary to make it enforceable. If the provision 367 | cannot be reformed, it shall be severed from this Public License 368 | without affecting the enforceability of the remaining terms and 369 | conditions. 370 | 371 | c. 
No term or condition of this Public License will be waived and no 372 | failure to comply consented to unless expressly agreed to by the 373 | Licensor. 374 | 375 | d. Nothing in this Public License constitutes or may be interpreted 376 | as a limitation upon, or waiver of, any privileges and immunities 377 | that apply to the Licensor or You, including from the legal 378 | processes of any jurisdiction or authority. 379 | 380 | ======================================================================= 381 | 382 | Creative Commons is not a party to its public 383 | licenses. Notwithstanding, Creative Commons may elect to apply one of 384 | its public licenses to material it publishes and in those instances 385 | will be considered the “Licensor.” The text of the Creative Commons 386 | public licenses is dedicated to the public domain under the CC0 Public 387 | Domain Dedication. Except for the limited purpose of indicating that 388 | material is shared under a Creative Commons public license or as 389 | otherwise permitted by the Creative Commons policies published at 390 | creativecommons.org/policies, Creative Commons does not authorize the 391 | use of the trademark "Creative Commons" or any other trademark or logo 392 | of Creative Commons without its prior written consent including, 393 | without limitation, in connection with any unauthorized modifications 394 | to any of its public licenses or any other arrangements, 395 | understandings, or agreements concerning use of licensed material. For 396 | the avoidance of doubt, this paragraph does not form part of the 397 | public licenses. 398 | 399 | Creative Commons may be contacted at creativecommons.org. 400 | --------------------------------------------------------------------------------