Object representing an experiment, based on its data generator and model pair
41 |
Args
42 |
43 |
generator : Generator
44 |
data abstraction used for this experiment
45 |
model : Model.__class__
46 |
a Model class that represents the model to be used
47 |
n_targets : int
48 |
the total number of target models
49 |
n_shadows : int
50 |
the total number of shadow models
51 |
hyperparams : dict or DictConfig
52 |
dictionary containing every useful hyper-parameter for the Model;
53 | if a list is provided for some hyperparameter(s), we grid optimise between all given options (except for keyword layers)
54 |
n_queries : int
55 |
the number of queries used in the scope of grey- and black-box attacks. Must be strictly superior to n_targets
56 |
n_classes : int
57 |
the number of classes considered for property inference; if 1 then a regression is performed
58 |
range : tuple
59 |
the range of values accepted for regression tasks (needed for regression, ignored for classification)
60 | it is possible to pass an iterable of multiple ranges in order to perform multi-variable property inference regression, in which case the values of the variables are passed to the Generator as a list
61 |
62 |
Methods
63 |
64 |
65 | def run_blackbox(self, n_outputs=1)
66 |
67 |
68 |
Runs a blackbox attack on the target models, by using the result of random queries as features for a meta-classifier
69 |
Args
70 |
71 |
n_outputs : int
72 |
number of attack results to output, using multiple random subsets of the shadow models
73 |
74 |
Returns: Attack accuracy on target models for the classification task, or mean absolute error for the regression task
75 |
76 |
77 | def run_loss_test(self)
78 |
79 |
80 |
Runs a loss test attack on target models. Works only for the binary classification attack on a classifier.
a Model class that represents the model to be used. If None, will be the same as target models
92 |
hyperparams : dict or DictConfig
93 |
dictionary containing every useful hyper-parameter for the Model;
94 | Hyperparameters of shadow models will NOT be optimised. If None, will be the same as target models.
36 | class GaussianGenerator
37 | (n_samples=1024)
38 |
39 |
40 |
Generator sampling from a multivariate Gaussian Distribution in which features are correlated.
41 | Label is made categorical by checking whether it is positive or negative.
42 | Sensitive attribute is the mean of the fourth feature vector
Returns a dataset sampled from the data; the label variable corresponds to the property being attacked
77 |
Args
78 |
79 |
label : int or float or numpy.array
80 |
the label corresponding to the dataset being queried - when performing regression, the value of the target variable(s)
81 |
adv : bool
82 |
a boolean describing whether we are using target or adversary data split
83 |
84 |
Returns
85 |
a pandas DataFrame representing our dataset for this experiment
86 |
87 |
88 |
89 |
90 | class IndependentPropertyGenerator
91 | (n_samples=1024)
92 |
93 |
94 |
Generator sampling from a multivariate Gaussian Distribution in which features are not correlated with the label, but are correlated between each other.
95 | Label is made categorical by checking whether it is positive or negative.
96 | Sensitive attribute is the mean of the fourth feature vector
Generator subsampling records from a larger dataset.
181 |
Classification case: samples using a specific proportion for label 1, and for proportion of 0.5 for label 0. Only works with boolean labels.
182 | Regression mode: samples using a specific given proportion between 0 and 1
183 |
Args
184 |
185 |
data : pandas.Dataframe
186 |
the larger dataset to subsample from
187 |
label_col : str
188 |
the label being predicted by the models
189 |
sensitive_attribute : str
190 |
the attribute which distribution being inferred by the property inference attack; is always considered as categorical
191 |
target_category
192 |
if sensitive_attribute is not a binary vector, the category considered in the sensitive attribute
193 |
n_samples : int
194 |
the number of records to sample
195 |
proportion : float
196 |
the proportion of the target_category in the datasets subsampled with label 1 ; ignored in the regression case
197 |
split : bool
198 |
whether to split original dataset between target and adversary
199 |
regression : bool
200 |
whether to use the sampler in regression or classification mode
propinfer is a modular framework to run Property Inference Attacks on Machine Learning models.
26 |
To run an experiment, you simply should define subclasses of Generator and Model
27 | to represent your data source and your evaluated model respectively.
28 |
Logging is available for this framework, using logger propinfer.
If the model has only one layer, or is not a DNN, as a numpy array.
171 |
If the model has multiple layers without biases, as a list of numpy arrays representing each layer.
172 |
If the model has multiple layers with weights and biases, arrays of the corresponding weights and biases are
173 | grouped in a list, with weights going before biases.
174 |
175 |
Returns: the model's parameters
176 |
177 |
178 | def predict(self, data)
179 |
180 |
181 |
Makes predictions on the given data
182 |
Args
183 |
184 |
data
185 |
DataFrame containing all useful data
186 |
187 |
Returns: np.array containing predictions
188 |
189 |
190 | def predict_proba(self, data)
191 |
192 |
193 |
Outputs prediction probability scores for the given data
194 |
Args
195 |
196 |
data
197 |
DataFrame containing all useful data
198 |
199 |
Returns:np.array containing probability scores
200 |
201 |
202 |
203 |
204 |
205 |
206 |
246 |
247 |
250 |
251 |
--------------------------------------------------------------------------------
/env.yml:
--------------------------------------------------------------------------------
1 | name: property-inference
2 | channels:
3 | - defaults
4 | dependencies:
5 | - scikit-learn
6 | - python=3.9
7 | - pandas
8 | - seaborn
9 | - numpy
10 | - matplotlib
11 | - pip:
12 | - hydra-core
13 |
--------------------------------------------------------------------------------
/logging.ini:
--------------------------------------------------------------------------------
1 | [loggers]
2 | keys=root, propinfer
3 |
4 | [handlers]
5 | keys=consoleHandler, fileHandler
6 |
7 | [formatters]
8 | keys=simpleFormatter
9 |
10 | [logger_root]
11 | level=WARNING
12 | handlers=consoleHandler
13 |
14 | [logger_propinfer]
15 | level=DEBUG
16 | handlers=consoleHandler, fileHandler
17 | qualname=propinfer
18 | propagate=0
19 |
20 | [handler_consoleHandler]
21 | class=StreamHandler
22 | level=INFO
23 | formatter=simpleFormatter
24 | args=(sys.stdout,)
25 |
26 | [handler_fileHandler]
27 | class=FileHandler
28 | level=DEBUG
29 | formatter=simpleFormatter
30 | args=('%(logfilename)s', 'w')
31 |
32 | [formatter_simpleFormatter]
33 | format=%(asctime)s - %(name)s - %(levelname)s - %(message)s
--------------------------------------------------------------------------------
/propinfer/__init__.py:
--------------------------------------------------------------------------------
1 | """propinfer is a modular framework to run Property Inference Attacks on Machine Learning models.
2 |
3 | To run an experiment, you simply should define subclasses of `Generator` and `Model`
4 | to represent your data source and your evaluated model respectively.
5 |
6 | Logging is available for this framework, using logger `propinfer`.
7 |
8 | Version 1.3.0
9 |
10 | (c) [EPFL](https://epfl.ch/) [Data Science Lab (dlab)](https://dlab.epfl.ch/) 2022"""
11 |
12 | import logging
13 |
14 | from propinfer.experiment import Experiment
15 | from propinfer.generator import Generator, GaussianGenerator, IndependentPropertyGenerator, ProbitGenerator, \
16 | LinearGenerator, SubsamplingGenerator, MultilabelProbitGenerator
17 | from propinfer.model import Model, LinReg, LogReg, MLP
18 |
19 | logging.getLogger('propinfer').addHandler(logging.NullHandler())
20 |
21 | __pdoc__ = {
22 | 'deepsets': False,
23 | 'model_utils': False
24 | }
--------------------------------------------------------------------------------
/propinfer/deepsets.py:
--------------------------------------------------------------------------------
1 | import torch
2 | import torch.nn as nn
3 | from torch.utils.data import TensorDataset, DataLoader
4 | import numpy as np
5 |
6 | import logging
7 | logger = logging.getLogger('propinfer')
8 |
9 | __pdoc__ = {
10 | 'DeepSets': False
11 | }
12 |
13 |
14 | class DeepSets(nn.Module):
15 | def __init__(self, param, latent_dim, epochs, lr, wd, dropout=0.5, bs=32, n_classes=2, out_dim=1):
16 | super().__init__()
17 |
18 | self.device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
19 |
20 | if isinstance(param, np.ndarray):
21 | param = list(param)
22 | if isinstance(param, list):
23 | self.reducer = list()
24 | self.dimensions = list()
25 | context_size = 0
26 | for i, layer in enumerate(param):
27 |
28 | if isinstance(layer, list):
29 | self.dimensions.append((layer[0].shape[0], layer[0].shape[1] + 1))
30 | dim = layer[0].shape[1] + 1 + context_size
31 | context_size = layer[0].shape[0]*latent_dim
32 | else:
33 | if len(layer.shape) < 2:
34 | layer = layer.reshape((1, -1))
35 |
36 | self.dimensions.append((layer.shape[0], layer.shape[1]))
37 | dim = layer.shape[1] + context_size
38 | context_size = layer.shape[0] * latent_dim
39 |
40 | self.reducer.append(
41 | nn.Sequential(nn.Linear(dim, 64), nn.ReLU(),
42 | nn.Linear(64, latent_dim), nn.Dropout(dropout), nn.ReLU()).to(self.device))
43 | else:
44 | raise AttributeError('The given param is not a list or ndarray, but is {}'.format(type(param).__name__))
45 |
46 | dim = len(param) * latent_dim
47 |
48 | out_dim = n_classes if n_classes > 1 else out_dim
49 |
50 | self.classifier = nn.Sequential(
51 | nn.Linear(dim, out_dim)
52 | ).to(self.device)
53 |
54 | self.epochs = epochs
55 | self.lr = lr
56 | self.wd = wd
57 | self.bs = bs
58 | self.n_classes = n_classes
59 | self.out_dim = out_dim
60 |
61 | def forward(self, X):
62 | offset = 0
63 | context = None
64 | l = list()
65 |
66 | for i, dim in enumerate(self.dimensions):
67 |
68 | layer = X[:, offset:offset + dim[0] * dim[1]].view(-1, dim[0], dim[1])
69 | offset += dim[0] * dim[1]
70 |
71 | if context is not None:
72 | layer = torch.cat((layer, context.view(layer.size()[0], 1, -1).repeat_interleave(dim[0], dim=1)), dim=2)
73 |
74 | n = self.reducer[i](layer)
75 | context = n.flatten(start_dim=1)
76 |
77 | l.append(n.sum(axis=1))
78 |
79 | x = torch.cat(l, dim=1)
80 | x = self.classifier(x)
81 | return x.view(X.shape[0], -1) if self.out_dim > 1 else x.flatten()
82 |
83 | def parameters(self, recurse: bool = True):
84 | params = list(self.classifier.parameters())
85 | for r in self.reducer:
86 | params.extend(list(r.parameters()))
87 | return params
88 |
89 | def __transform(self, parameters):
90 | tensors = list()
91 | for param in parameters:
92 | if isinstance(param, np.ndarray):
93 | param = list(param)
94 |
95 | flat = list()
96 | for i, p in enumerate(param):
97 | if isinstance(p, list):
98 | flat.append(np.concatenate(p, axis=1).flatten())
99 | else:
100 | flat.append(p.flatten())
101 |
102 | tensors.append(torch.tensor(np.concatenate(flat), dtype=torch.float32, device=self.device).view(1, -1))
103 | return torch.cat(tensors, dim=0)
104 |
105 | def fit(self, parameters, labels):
106 | y_true_dtype = torch.int64 if self.n_classes > 1 else torch.float32
107 | ds = TensorDataset(self.__transform(parameters),
108 | torch.tensor(labels, dtype=y_true_dtype, device=self.device))
109 | loader = DataLoader(ds, batch_size=self.bs, shuffle=True)
110 | opt = torch.optim.Adam(self.parameters(), lr=self.lr, weight_decay=self.wd)
111 | criterion = torch.nn.CrossEntropyLoss() if self.n_classes > 1 else torch.nn.MSELoss()
112 | for e in range(self.epochs):
113 | tot_loss = 0
114 | for X, y_true in loader:
115 | opt.zero_grad()
116 | y_pred = self.forward(X)
117 | if self.out_dim > 1 and self.n_classes == 1:
118 | y_true = y_true.view(X.shape[0], -1)
119 | else:
120 | y_true = y_true.view(-1)
121 | loss = criterion(y_pred, y_true)
122 | tot_loss += loss.item()
123 | loss.backward()
124 | opt.step()
125 | if e % 10 == 0 or e == self.epochs-1:
126 | logger.debug('Training DeepSets - Epoch {} - Loss={:.4f}'.format(e, tot_loss))
127 |
128 | def predict(self, parameters):
129 | for r in self.reducer:
130 | r.train(False)
131 | self.classifier.train(False)
132 |
133 | loader = DataLoader(self.__transform(parameters), batch_size=self.bs, shuffle=False)
134 |
135 | predictions = list()
136 |
137 | if self.n_classes > 1:
138 | for X in loader:
139 | predictions.append(self.forward(X).detach().argmax(dim=1).cpu().numpy())
140 | else:
141 | for X in loader:
142 | predictions.append(self.forward(X).detach().cpu().numpy())
143 |
144 | return np.concatenate(predictions)
145 |
--------------------------------------------------------------------------------
/propinfer/experiment.py:
--------------------------------------------------------------------------------
1 | import pandas as pd
2 | import numpy as np
3 |
4 | from sklearn.neural_network import MLPClassifier, MLPRegressor
5 | from sklearn.metrics import accuracy_score, mean_absolute_error, mean_squared_error
6 | from sklearn.model_selection import StratifiedShuffleSplit
7 | from omegaconf import DictConfig
8 | from itertools import product
9 | from random import sample
10 |
11 | from propinfer.generator import Generator
12 | from propinfer.model import Model
13 | from propinfer.deepsets import DeepSets
14 | from propinfer.model_utils import transform_parameters
15 |
16 | import logging
17 | logger = logging.getLogger('propinfer')
18 |
19 |
20 | class Experiment:
21 | def __init__(self, generator, label_col, model, n_targets, n_shadows, hyperparams, n_queries=1024, n_classes=2, range=None):
22 | """Object representing an experiment, based on its data generator and model pair
23 |
24 | Args:
25 | generator (Generator): data abstraction used for this experiment
26 | model (Model.__class__): a Model class that represents the model to be used
27 | n_targets (int): the total number of target models
28 | n_shadows (int): the total number of shadow models
29 | hyperparams (dict or DictConfig): dictionary containing every useful hyper-parameter for the Model;
30 | if a list is provided for some hyperparameter(s), we grid optimise between all given options (except for keyword `layers`)
31 | n_queries (int): the number of queries used in the scope of grey- and black-box attacks. Must be strictly superior to `n_targets`
32 | n_classes (int): the number of classes considered for property inference; if 1 then a regression is performed
33 | range (tuple): the range of values accepted for regression tasks (needed for regression, ignored for classification)
34 | it is possible to pass an iterable of multiple ranges in order to perform multi-variable property inference regression, in which case the values of the variables are passed to the Generator as a list
35 | """
36 |
37 | assert isinstance(generator, Generator), 'The given generator is not an instance of Generator, but {}'.format(type(generator).__name__)
38 | self.generator = generator
39 |
40 | assert isinstance(label_col, str), 'label_col should be a string, but is {}'.format(type(label_col).__name__)
41 | self.label_col = label_col
42 |
43 | assert issubclass(model, Model), 'The given model is not a subclass of Model'
44 | self.model = model
45 |
46 | assert isinstance(n_targets, int), 'The given n_targets is not an integer, but is {}'.format(type(n_targets).__name__)
47 | self.n_targets = n_targets
48 |
49 | assert isinstance(n_shadows, int), 'The given n_shadows is not an integer, but is {}'.format(type(n_shadows).__name__)
50 | self.n_shadows = n_shadows
51 |
52 | if hyperparams is not None:
53 | assert isinstance(hyperparams, DictConfig) or isinstance(hyperparams, dict),\
54 | 'The given hyperparameters are not a dict or a DictConfig, but are {}'.format(type(hyperparams).__name__)
55 | self.hyperparams = hyperparams
56 | if np.any([isinstance(p, list) for p in hyperparams.values()]):
57 | self.__optimise_hyperparams()
58 | else:
59 | self.hyperparams = dict()
60 |
61 | assert isinstance(n_queries, int), 'The given n_queries is not an integer, but is {}'.format(type(n_queries).__name__)
62 | assert n_queries > n_targets, f'n_queries={n_queries} must be strictly superior to n_targets={n_targets}'
63 | self.n_queries = n_queries
64 |
65 | assert isinstance(n_classes, int), 'The given n_classes is not an integer, but is {}'.format(type(n_classes).__name__)
66 | if n_classes == 1:
67 | assert range is not None
68 | assert hasattr(range, '__getitem__')
69 |
70 | self.n_classes = n_classes
71 | self.range = range
72 |
73 | self.targets = None
74 | self.labels = None
75 |
76 | self.shadow_models = None
77 | self.shadow_labels = None
78 |
79 | self.shadow_models = None
80 | self.shadow_labels = None
81 |
82 | if n_classes == 1 and hasattr(self.range[0], '__getitem__'):
83 | label = [r[0] for r in self.range]
84 | data = self.generator.sample(label)
85 | elif n_classes == 1:
86 | data = self.generator.sample(range[0])
87 | else:
88 | data = self.generator.sample(0)
89 | reg = self.model(self.label_col, self.hyperparams).fit(data).predict_proba(data)
90 | self.is_regression = len(reg.shape) < 2 or reg.shape[1] == 1
91 |
92 | def __optimise_hyperparams(self):
93 | """Private method for hyperparamters grid optimisation"""
94 | optims = list()
95 | keys = list()
96 |
97 | for k, v in self.hyperparams.items():
98 | if isinstance(v, list) and k != 'layers':
99 | optims.append(v)
100 | keys.append(k)
101 |
102 | logger.debug('Optimising hyperparameters: {}'.format(keys))
103 |
104 | optims = list(product(*optims))
105 |
106 | best_res = -np.inf
107 | reg_checked = False
108 | is_reg = False
109 |
110 | for params in optims:
111 | hyperparams = self.hyperparams.copy()
112 | for i, p in enumerate(params):
113 | hyperparams[keys[i]] = p
114 | train = [self.generator.sample(b) for b in [False, True]]
115 | test = [self.generator.sample(b) for b in [False, True]]
116 |
117 | res = 0.
118 |
119 | for i in range(len(train)):
120 | models = [self.model(self.label_col, hyperparams).fit(train[i]) for _ in range(10)]
121 |
122 | if not reg_checked and \
123 | (len(models[0].predict_proba(test[0]).shape) < 2 or
124 | models[0].predict_proba(test[0]).shape[1] == 1):
125 | is_reg = True
126 |
127 | reg_checked = True
128 |
129 | if is_reg:
130 | res -= np.mean([mean_squared_error(test[i][self.label_col], m.predict(train[i])) for m in models])
131 | else:
132 | res += np.mean([accuracy_score(test[i][self.label_col], m.predict(train[i])) for m in models])
133 |
134 | res /= len(train)
135 |
136 | if res > best_res:
137 | best_res = res
138 | self.hyperparams = hyperparams
139 |
140 | logger.debug('Best hyperparameters defined as: {}'.format(self.hyperparams))
141 | if is_reg:
142 | logger.debug('Best MSE: {:.2}'.format(-best_res))
143 | else:
144 | logger.debug('Best accuracy: {:.2%}'.format(best_res))
145 |
146 | def run_targets(self):
147 | """Create and fit target models """
148 | if self.n_classes > 1:
149 | self.labels = np.concatenate([[i]*(self.n_targets//self.n_classes) for i in range(self.n_classes)],
150 | dtype=np.int8)
151 | if self.n_targets % self.n_classes > 0:
152 | self.labels = np.concatenate((self.labels,
153 | np.random.randint(0, self.n_classes, self.n_targets % self.n_classes)),
154 | dtype=np.int8)
155 | elif self.n_classes == 1:
156 | if hasattr(self.range[0], '__getitem__'):
157 | bounds = np.array(self.range)
158 | self.labels = np.random.uniform(bounds[:, 0], bounds[:, 1], (self.n_targets, len(self.range)))
159 | else:
160 | self.labels = np.arange(self.range[0], self.range[1], (self.range[1] - self.range[0])/self.n_targets)
161 | else:
162 | raise AttributeError("Invalid n_classes provided: {}".format(self.n_classes))
163 |
164 | self.targets = [self.model(self.label_col, self.hyperparams).fit(data) for data in
165 | [self.generator.sample(label) for label in self.labels]]
166 |
167 | if self.is_regression:
168 | scores = [mean_squared_error(data[self.label_col], self.targets[i].predict(data)) for i, data in
169 | enumerate([self.generator.sample(label) for label in self.labels])]
170 | logger.debug('Target models MAE - mean={:.2} - std={:.2} - min={:.2} - max={:.2}'.format(
171 | np.mean(scores), np.std(scores), np.min(scores), np.max(scores)))
172 |
173 | else:
174 | scores = [accuracy_score(data[self.label_col], self.targets[i].predict(data)) for i, data in
175 | enumerate([self.generator.sample(label) for label in self.labels])]
176 | logger.debug('Target models accuracy - mean={:.2%} - std={:.2%} - min={:.2%} - max={:.2%}'.format(
177 | np.mean(scores), np.std(scores), np.min(scores), np.max(scores)))
178 |
179 | def run_shadows(self, model=None, hyperparams=None):
180 | """Create and fit shadow models
181 |
182 | Args:
183 | model (Model.__class__): a Model class that represents the model to be used. If None, will be the same as target models
184 | hyperparams (dict or DictConfig): dictionary containing every useful hyper-parameter for the Model;
185 | Hyperparameters of shadow models will NOT be optimised. If None, will be the same as target models.
186 | """
187 | if model is not None:
188 | assert issubclass(model, Model), 'The given model is not a subclass of Model'
189 |
190 | if hyperparams is not None:
191 | assert isinstance(hyperparams, DictConfig) or isinstance(hyperparams, dict),\
192 | 'The given hyperparameters are not a dict or a DictConfig, but are {}'.format(type(hyperparams).__name__)
193 | else:
194 | self.hyperparams = dict()
195 |
196 | else:
197 | model = self.model
198 | hyperparams = self.hyperparams
199 |
200 | if self.n_classes > 1:
201 | self.shadow_labels = np.concatenate([[i]*(self.n_shadows//self.n_classes) for i in range(self.n_classes)],
202 | dtype=np.int8)
203 | if self.n_shadows % self.n_classes > 0:
204 | self.shadow_labels = np.concatenate((self.shadow_labels,
205 | np.random.randint(0, self.n_classes, self.n_shadows % self.n_classes)),
206 | dtype=np.int8)
207 | elif self.n_classes == 1:
208 | if hasattr(self.range[0], '__getitem__'):
209 | bounds = np.array(self.range)
210 | self.shadow_labels = np.random.uniform(bounds[:, 0], bounds[:, 1], (self.n_shadows, len(self.range)))
211 | else:
212 | self.shadow_labels = np.arange(self.range[0], self.range[1], (self.range[1] - self.range[0])/self.n_shadows)
213 | else:
214 | raise AttributeError("Invalid n_classes provided: {}".format(self.n_classes))
215 |
216 | self.shadow_models = [model(self.label_col, hyperparams).fit(data) for data in
217 | [self.generator.sample(label, adv=True) for label in self.shadow_labels]]
218 |
219 | if self.is_regression:
220 | scores = [mean_squared_error(data[self.label_col], self.shadow_models[i].predict(data)) for i, data in
221 | enumerate([self.generator.sample(label, adv=True) for label in self.shadow_labels])]
222 | logger.debug('Shadow models MAE - mean={:.2} - std={:.2} - min={:.2} - max={:.2}'.format(
223 | np.mean(scores), np.std(scores), np.min(scores), np.max(scores)))
224 |
225 | else:
226 | scores = [accuracy_score(data[self.label_col], self.shadow_models[i].predict(data)) for i, data in
227 | enumerate([self.generator.sample(label, adv=True) for label in self.shadow_labels])]
228 | logger.debug('Shadow models accuracy - mean={:.2%} - std={:.2%} - min={:.2%} - max={:.2%}'.format(
229 | np.mean(scores), np.std(scores), np.min(scores), np.max(scores)))
230 |
231 | def run_loss_test(self):
232 | """Runs a loss test attack on target models. Works only for the binary classification attack on a classifier.
233 |
234 | Returns: Attack accuracy on target models
235 | """
236 | assert self.targets is not None
237 | assert self.n_classes == 2
238 |
239 | y_true = [False, True]
240 | X_test = [self.generator.sample(b, adv=True) for b in y_true]
241 |
242 | accuracy = [[accuracy_score(X[self.label_col], t.predict(X)) for X in X_test] for t in self.targets]
243 | return accuracy_score(self.labels, [np.argmax(acc) for acc in accuracy])
244 |
245 | def __run_multiple(self, n, func, *args):
246 | """Helper private method to run a same attack multiple times"""
247 |
248 | sss = StratifiedShuffleSplit(n_splits=n, train_size=0.5)
249 | shadow_models = np.array(self.shadow_models)
250 | shadow_labels = np.array(self.shadow_labels)
251 |
252 | accs = []
253 |
254 | if self.n_classes > 1:
255 | for idx, _ in sss.split(shadow_models, shadow_labels):
256 | self.shadow_models, self.shadow_labels = shadow_models[idx], shadow_labels[idx]
257 | accs.append(func(*args))
258 | else:
259 | for _ in range(n):
260 | idx = sample(range(self.n_shadows), self.n_shadows//2)
261 | self.shadow_models, self.shadow_labels = shadow_models[idx], shadow_labels[idx]
262 | accs.append(func(*args))
263 |
264 | self.shadow_models, self.shadow_labels = shadow_models, shadow_labels
265 |
266 | return accs
267 |
268 | def run_threshold_test(self, n_outputs=1):
269 | """Runs a threshold test attack on target models. Works only for the binary classification attack on a classifier.
270 |
271 | Args:
272 | n_outputs (int): number of attack results to output, using multiple random subsets of the shadow models
273 |
274 | Returns: Attack accuracy on target models
275 | """
276 | assert self.targets is not None
277 | assert self.shadow_models is not None
278 | assert self.n_classes == 2
279 |
280 | if n_outputs > 1:
281 | return self.__run_multiple(n_outputs, self.run_threshold_test)
282 |
283 | y_true = [False, True]
284 | X_test = [self.generator.sample(b, adv=True) for b in y_true]
285 |
286 | shadow_labels = np.array(self.shadow_labels, dtype=bool)
287 | accuracy = np.array([[accuracy_score(X[self.label_col], s.predict(X)) for X in X_test] for s in self.shadow_models])
288 | k = np.argmax(np.abs(np.sum(accuracy[shadow_labels, :], axis=0) -
289 | np.sum(accuracy[~shadow_labels, :], axis=0)))
290 | higher_acc = np.argmax([np.sum(accuracy[~shadow_labels, k]), np.sum(accuracy[shadow_labels, k])])
291 |
292 | thr = 0.0
293 | best_acc = 0.0
294 | for z in np.arange(0, 1, 1e-2):
295 | thr_labels = [higher_acc if acc > z else not higher_acc for acc in accuracy[:, k]]
296 | acc = accuracy_score(shadow_labels, thr_labels)
297 | if acc > best_acc:
298 | thr = z
299 | best_acc = acc
300 |
301 | accuracy = np.array([accuracy_score(X_test[k][self.label_col], t.predict(X_test[k])) for t in self.targets])
302 | y_pred = [higher_acc if acc > thr else not higher_acc for acc in accuracy]
303 | return accuracy_score(self.labels, y_pred)
304 |
305 | def __get_score(self, y_pred):
306 | if self.n_classes > 1:
307 | return accuracy_score(self.labels, y_pred)
308 | else:
309 | if len(y_pred.shape) == 1:
310 | return mean_absolute_error(self.labels, y_pred)
311 | else:
312 | return [mean_absolute_error(self.labels[:, i], y_pred[:, i]) for i in range(y_pred.shape[1])]
313 |
314 | def run_whitebox_deepsets(self, hyperparams, n_outputs=1):
315 | """Runs a whitebox attack on the target models using a DeepSets meta-classifier
316 |
317 | Args:
318 | hyperparams (dict or DictConfig): Hyperparameters for the DeepSets meta-classifier.
319 | Accepted keywords are: latent_dim (default=5); epochs (default=20); learning_rate (default=1e-4); weight_decay (default=1e-4)
320 | n_outputs (int): number of attack results to output, using multiple random subsets of the shadow models
321 |
322 | Returns: Attack accuracy on target models
323 | """
324 | assert self.targets is not None
325 | assert self.shadow_models is not None
326 |
327 | if n_outputs > 1:
328 | return self.__run_multiple(n_outputs, self.run_whitebox_deepsets, hyperparams)
329 |
330 | if hyperparams is not None:
331 | assert isinstance(hyperparams, DictConfig) or isinstance(hyperparams, dict),\
332 | 'The given hyperparameters are not a dict or a DictConfig, but are {}'.format(type(hyperparams).__name__)
333 | else:
334 | hyperparams = dict()
335 |
336 | latent_dim = hyperparams['latent_dim'] if 'latent_dim' in hyperparams.keys() else 5
337 | epochs = hyperparams['epochs'] if 'epochs' in hyperparams.keys() else 20
338 | lr = hyperparams['learning_rate'] if 'learning_rate' in hyperparams.keys() else 1e-4
339 | wd = hyperparams['weight_decay'] if 'weight_decay' in hyperparams.keys() else 1e-4
340 | out_dim = 1 if self.n_classes > 1 or not hasattr(self.range[0], '__getitem__') else len(self.range)
341 |
342 | meta_classifier = DeepSets(self.shadow_models[0].parameters(), latent_dim=latent_dim,
343 | epochs=epochs, lr=lr, wd=wd, n_classes=self.n_classes, out_dim=out_dim)
344 |
345 | train = [s.parameters() for s in self.shadow_models]
346 | test = [t.parameters() for t in self.targets]
347 |
348 | meta_classifier.fit(train, self.shadow_labels)
349 | y_pred = meta_classifier.predict(test)
350 |
351 | del train, test, meta_classifier
352 |
353 | return self.__get_score(y_pred)
354 |
355 | def run_whitebox_sort(self, sort=True, n_outputs=1):
356 | """Runs a whitebox attack on the target models, by using the model parameters as features for a meta-classifier
357 |
358 | Args:
359 | sort (bool): whether to perform node sorting (to be used for permutation-invariant DNN)
360 | n_outputs (int): number of attack results to output, using multiple random subsets of the shadow models
361 |
362 | Returns: Attack accuracy on target models for the classification task, or mean absolute error for the regression task
363 | """
364 | assert self.targets is not None
365 | assert self.shadow_models is not None
366 |
367 | if n_outputs > 1:
368 | return self.__run_multiple(n_outputs, self.run_whitebox_sort, sort)
369 |
370 | train = pd.DataFrame(data=[transform_parameters(s.parameters(), sort=sort)
371 | for s in self.shadow_models])
372 |
373 | test = pd.DataFrame(data=[transform_parameters(t.parameters(), sort=sort)
374 | for t in self.targets])
375 |
376 | meta_classifier = MLPClassifier(hidden_layer_sizes=(128, 64), max_iter=1024, early_stopping=True) \
377 | if self.n_classes > 1 else MLPRegressor(hidden_layer_sizes=(128, 64), max_iter=1024, early_stopping=True)
378 |
379 | meta_classifier.fit(train, self.shadow_labels)
380 | y_pred = meta_classifier.predict(test)
381 |
382 | del train, test, meta_classifier
383 |
384 | return self.__get_score(y_pred)
385 |
386 | def run_blackbox(self, n_outputs=1):
387 | """Runs a blackbox attack on the target models, by using the result of random queries as features for a meta-classifier
388 |
389 | Args:
390 | n_outputs (int): number of attack results to output, using multiple random subsets of the shadow models
391 |
392 | Returns: Attack accuracy on target models for the classification task, or mean absolute error for the regression task
393 | """
394 | assert self.targets is not None
395 | assert self.shadow_models is not None
396 |
397 | if n_outputs > 1:
398 | return self.__run_multiple(n_outputs, self.run_blackbox)
399 |
400 | meta_classifier = MLPClassifier(hidden_layer_sizes=(128, 64), max_iter=1024, early_stopping=True) \
401 | if self.n_classes > 1 else MLPRegressor(hidden_layer_sizes=(128, 64), max_iter=1024, early_stopping=True)
402 |
403 | if self.n_classes > 1:
404 | queries = pd.concat([self.generator.sample(i, adv=True) for i in range(self.n_classes)])
405 | labels = np.concatenate([[i]*len(self.generator.sample(i, adv=True)) for i in range(self.n_classes)])
406 | elif self.n_classes == 1:
407 | if hasattr(self.range[0], '__getitem__'):
408 | bounds = np.array(self.range)
409 | labels = np.random.uniform(bounds[:, 0], bounds[:, 1], (10*len(self.range), len(self.range)))
410 | sample_len = len(self.generator.sample([0]*len(self.range), adv=True))
411 | else:
412 | labels = np.arange(self.range[0], self.range[1], (self.range[1] - self.range[0])/10)
413 | sample_len = len(self.generator.sample(0, adv=True))
414 |
415 | queries = pd.concat([self.generator.sample(l, adv=True) for l in labels])
416 | labels = np.concatenate([[l]*sample_len for l in labels])
417 | else:
418 | raise AttributeError("Invalid n_classes provided: {}".format(self.n_classes))
419 |
420 | sss = StratifiedShuffleSplit(n_splits=1, train_size=self.n_queries)
421 | idx, _ = list(sss.split(queries, labels))[0]
422 | queries = queries.iloc[idx]
423 |
424 | train = pd.DataFrame(data=[s.predict(queries).flatten() for s in self.shadow_models])
425 | test = pd.DataFrame(data=[s.predict(queries).flatten() for s in self.targets])
426 |
427 | meta_classifier.fit(train, self.shadow_labels)
428 | y_pred = meta_classifier.predict(test)
429 |
430 | del train, test, meta_classifier
431 |
432 | return self.__get_score(y_pred)
433 |
--------------------------------------------------------------------------------
/propinfer/generator.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | from numpy import array, eye, ones, int32, int64, float32
3 | from numpy.random import normal, multivariate_normal
4 | from pandas import DataFrame, concat, get_dummies
5 | from sklearn.model_selection import StratifiedShuffleSplit
6 |
7 | __pdoc__ = {
8 | 'multivariate_normal': False,
9 | 'normal': False
10 | }
11 |
12 |
13 | class Generator:
14 | """An abstraction class used to query for data"""
15 |
16 | def __init__(self, n_samples=1024):
17 | assert isinstance(n_samples, int), 'n_samples should be an int, but {} was provided'.format(type(n_samples).__name__)
18 | self.n_samples = n_samples
19 |
20 | def sample(self, label, adv=False):
21 | """Returns a dataset sampled from the data; the label variable corresponds to the property being attacked
22 |
23 | Args:
24 | label (int or float or numpy.array): the label corresponding to the dataset being queried - when performing regression, the value of the target variable(s)
25 | adv (bool): a boolean describing whether we are using target or adversary data split
26 |
27 | Returns:
28 | a pandas DataFrame representing our dataset for this experiment
29 | """
30 | raise NotImplementedError
31 |
32 |
33 | class GaussianGenerator(Generator):
34 | """Generator sampling from a multivariate Gaussian Distribution in which features are correlated.
35 | Label is made categorical by checking whether it is positive or negative.
36 | Sensitive attribute is the mean of the fourth feature vector"""
37 |
38 | def sample(self, label, adv=False):
39 | mean = array([0.]*5)
40 | mean[4] = label
41 |
42 | cov = eye(5)
43 |
44 | for i in range(1, 5):
45 | cov[0, i] = cov[i, 0] = 0.5
46 |
47 | data = DataFrame(data=multivariate_normal(mean, cov, size=self.n_samples),
48 | columns=['label', 'f1', 'f2', 'f3', 'f4'], dtype=float32)
49 | data['label'] = (data['label'] > 0).astype('int32')
50 |
51 | return data
52 |
53 |
54 | class IndependentPropertyGenerator(Generator):
55 | """Generator sampling from a multivariate Gaussian Distribution in which features are not correlated with the label, but are correlated between each other.
56 | Label is made categorical by checking whether it is positive or negative.
57 | Sensitive attribute is the mean of the fourth feature vector"""
58 | def sample(self, label, adv=False):
59 | mean = array([0.] * 5)
60 | mean[4] = label
61 |
62 | cov = eye(5)
63 | for i in range(1, 4):
64 | cov[0, i] = cov[i, 0] = 0.5
65 |
66 | data = DataFrame(data=multivariate_normal(mean, cov, size=self.n_samples),
67 | columns=['label', 'f1', 'f2', 'f3', 'f4'], dtype=float32)
68 | data['label'] = (data['label'] > 0).astype('int32')
69 |
70 | return data
71 |
72 |
73 | class LinearGenerator(Generator):
74 | """Generator sampling from a linear model with additive white gaussian noise
75 |
76 | The sensitive attribute defines the mean of the covariates"""
77 |
78 | def __init__(self, n_samples=1024):
79 | super().__init__(n_samples)
80 | self.beta = ones(4) + normal(0., 1., 4)
81 |
82 | def sample(self, label, adv=False):
83 | x = multivariate_normal(ones(4) * label, eye(4), size=self.n_samples)
84 | y = x @ self.beta + normal(0., size=self.n_samples) + 0.5
85 |
86 | data = DataFrame(data=x,
87 | columns=['f1', 'f2', 'f3', 'f4'], dtype=float32)
88 | data['label'] = y.astype('float32')
89 |
90 | return data
91 |
92 |
93 | class ProbitGenerator(LinearGenerator):
94 | """Generator sampling from a probit model with additive white gaussian noise
95 |
96 | The sensitive attribute defines the mean of the covariates"""
97 |
98 | def sample(self, label, adv=False):
99 | data = super().sample(label, adv)
100 | data['label'] = (data['label'] > 0).astype('int32')
101 | return data
102 |
103 |
104 | class MultilabelProbitGenerator(Generator):
105 | """Generator sampling from a probit model of which sensitive attribute are the mean and variance of the covariates"""
106 |
107 | def sample(self, label, adv=False):
108 | mean = array([label[0]] * 4)
109 | cov = eye(4) + 2 * eye(4) * label[1]
110 |
111 | x = multivariate_normal(mean, cov, size=self.n_samples)
112 |
113 | beta = array([-1., 1., -0.5, 1.5])
114 | y = x @ beta + 0.25 + normal(0., 1., size=self.n_samples)
115 |
116 | data = DataFrame(data=x, columns=['f1', 'f2', 'f3', 'f4'], dtype=float32)
117 | data['label'] = (y > 0).astype('int32')
118 |
119 | return data
120 |
121 |
122 | class SubsamplingGenerator(Generator):
123 | def __init__(self, data, label_col, sensitive_attribute, target_category=None,
124 | n_samples=1024, proportion=None, split=False, regression=False):
125 | """Generator subsampling records from a larger dataset.
126 |
127 | Classification case: samples using a specific proportion for label 1, and for proportion of 0.5 for label 0. Only works with boolean labels.
128 | Regression mode: samples using a specific given proportion between 0 and 1
129 |
130 | Args:
131 | data (pandas.Dataframe): the larger dataset to subsample from
132 | label_col (str): the label being predicted by the models
133 | sensitive_attribute (str): the attribute which distribution being inferred by the property inference attack; is always considered as categorical
134 | target_category: if sensitive_attribute is not a binary vector, the category considered in the sensitive attribute
135 | n_samples (int): the number of records to sample
136 | proportion (float): the proportion of the target_category in the datasets subsampled with label 1 ; ignored in the regression case
137 | split (bool): whether to split original dataset between target and adversary
138 | regression (bool): whether to use the sampler in regression or classification mode
139 | """
140 | super().__init__(n_samples)
141 |
142 | assert isinstance(data, DataFrame), 'Given data should be a DataFrame, but is {}'.format(type(data).__name__)
143 | self.data = data
144 |
145 | assert isinstance(label_col, str), 'label_col should be a string, but is {}' .format(type(label_col).__name__)
146 | assert label_col in data.columns, 'label_col not in data columns'
147 | self.label_col = label_col
148 |
149 | assert isinstance(sensitive_attribute, str), 'sensitive_attribute should be a string, but is {}'.format(type(sensitive_attribute).__name__)
150 | assert sensitive_attribute in data.columns, 'sensitive_attribute not in data columns'
151 | self.attr = sensitive_attribute
152 |
153 | assert isinstance(split, bool), 'Split should be a bool, but is {}'.format(type(split).__name__)
154 | self.split = split
155 | if split:
156 | sss = StratifiedShuffleSplit(train_size=0.5)
157 | self.tar, self.adv = next(sss.split(data, data[[self.label_col, self.attr]]))
158 |
159 | self.data[sensitive_attribute] = self.data[sensitive_attribute].astype('category')
160 |
161 | if not target_category:
162 | assert len(data[data[sensitive_attribute] == 0]) + len(data[data[sensitive_attribute] == 1]) == len(data), \
163 | 'target_category not specified but sensitive attribute is not a binary vector'
164 | self.pos = data[sensitive_attribute] == 1
165 | self.data['attr'] = self.data[sensitive_attribute]
166 |
167 | else:
168 | assert target_category in self.data[sensitive_attribute].cat.categories, \
169 | 'target category {} not in {} column'.format(target_category, sensitive_attribute)
170 | self.pos = data[sensitive_attribute] == target_category
171 | self.data['attr'] = self.data[sensitive_attribute].cat.codes
172 |
173 | assert isinstance(regression, bool), 'Regression should be a bool, but is {}'.format(type(regression).__name__)
174 | self.regression = regression
175 |
176 | if not self.regression:
177 | self.set_proportion(proportion)
178 |
179 | def sample(self, label, adv=False):
180 | if not self.regression:
181 | assert np.isclose(label, 0) or np.isclose(label, 1)
182 |
183 | if self.split:
184 | data = self.data.iloc[self.adv] if adv else self.data.iloc[self.tar]
185 | pos = self.pos.iloc[self.adv] if adv else self.pos.iloc[self.tar]
186 | else:
187 | data = self.data
188 | pos = self.pos
189 |
190 | if self.regression:
191 | prop = label
192 | else:
193 | prop = self.proportion if label else 0.5
194 |
195 | # Sampling positive examples
196 | n = int(self.n_samples * prop)
197 | if n > 0:
198 | sss = StratifiedShuffleSplit(train_size=n)
199 | try:
200 | idx, _ = next(sss.split(data[pos], data[pos][[self.label_col, 'attr']]))
201 | pos_df = data[pos].iloc[idx]
202 | except ValueError:
203 | pos_df = data[pos].sample(n)
204 | else:
205 | pos_df = None
206 |
207 | # Sampling negative examples
208 | n = self.n_samples - int(self.n_samples * prop)
209 | if n > 0:
210 | sss = StratifiedShuffleSplit(train_size=n)
211 | try:
212 | idx, _ = next(sss.split(data[~pos], data[~pos][[self.label_col, 'attr']]))
213 | neg_df = data[~pos].iloc[idx]
214 | except ValueError:
215 | neg_df = data[~pos].sample(n)
216 | else:
217 | neg_df = None
218 |
219 | if pos_df is not None:
220 | out = concat((pos_df, neg_df)) if neg_df is not None else pos_df
221 | else:
222 | out = neg_df
223 |
224 | if not (out.dtypes[self.label_col] == int32 or out.dtypes[self.label_col] == int64):
225 | out[self.label_col] = out[self.label_col].astype('category').cat.codes
226 |
227 | out = out.drop('attr', axis=1)
228 |
229 | return get_dummies(out)
230 |
231 | def set_proportion(self, proportion):
232 | assert 0. <= proportion <= 1., 'proportion is {:.2f} but should be in [0., 1.]'.format(proportion)
233 | self.proportion = proportion
234 |
--------------------------------------------------------------------------------
/propinfer/model.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import torch
3 | import torch.nn as nn
4 | import warnings
5 |
6 | from sklearn.linear_model import LinearRegression, LogisticRegression
7 | from sklearn.exceptions import ConvergenceWarning
8 |
9 | from torch.nn.functional import softmax
10 | from omegaconf import DictConfig
11 |
12 |
13 | class Model:
14 | def __init__(self, label_col, normalise):
15 | """An abstract class to be extended to represent the models that will be attacked.
16 |
17 | Args:
18 | label_col: the index of the column to be used as Label
19 | normalise (bool): whether to normalise data before fit/predict
20 | """
21 | assert isinstance(label_col, str), 'label_col should be a string'
22 | self.label_col = label_col
23 |
24 | assert isinstance(normalise, bool), 'normalise should be bool'
25 | self.normalise = normalise
26 |
27 | self.device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
28 |
29 | self.train_mean = None
30 | self.train_std = None
31 |
32 | def _prepare_data(self, df, train=True):
33 | """Prepares data by separating features from labels and eventually normalising features.
34 |
35 | Args:
36 | df (DataFrame): the data to be prepared
37 | train (bool): whether we are preparing a train or test set
38 |
39 | Returns:
40 | X (DataFrame): feature data
41 | y (Series): label data
42 | """
43 | feature_cols = df.columns.to_list()
44 | feature_cols.remove(self.label_col)
45 |
46 | X = df[feature_cols].copy()
47 | y = df[self.label_col].copy()
48 |
49 | if self.normalise:
50 | norm = X.select_dtypes(exclude=[np.uint8, np.int8])
51 |
52 | if train or self.train_mean is None:
53 | self.train_mean = norm.mean()
54 | self.train_std = norm.std()
55 | if self.train_std < 1e-5:
56 | self.train_std = 1.
57 |
58 | X[norm.columns] = (norm - self.train_mean) / self.train_std
59 |
60 | return X, y
61 |
62 | def _prepare_dataloader(self, df, bs=32, train=True, regression=False):
63 | """Prepares data, and puts it inside a ready-to-use PyTorch DataLoader.
64 |
65 | Args:
66 | df (DataFrame): the data to be prepared
67 | bs (int): batch-size
68 | train (bool): whether we are preparing a train or test set
69 |
70 | Returns: a PyTorch DataLoader
71 | """
72 | X, y = self._prepare_data(df, train)
73 |
74 | X = torch.tensor(X.values.astype(np.float32), device=self.device)
75 | y = torch.tensor(y.values.astype(np.int64 if not regression else np.float32), device=self.device)
76 | data = torch.utils.data.TensorDataset(X, y)
77 | loader = torch.utils.data.DataLoader(dataset=data, batch_size=bs, shuffle=train)
78 |
79 | return loader
80 |
81 | def fit(self, data):
82 | """Fits the model according to the given data
83 | Args:
84 | data: DataFrame containing all useful data
85 | Returns: Model, the model itself
86 | """
87 | raise NotImplementedError
88 |
89 | def predict(self, data):
90 | """Makes predictions on the given data
91 | Args:
92 | data: DataFrame containing all useful data
93 | Returns: np.array containing predictions
94 | """
95 | res = self.predict_proba(data)
96 | return res.flatten() if len(res.shape) < 2 or res.shape[1] == 1 else res.argmax(axis=1)
97 |
98 | def predict_proba(self, data):
99 | """Outputs prediction probability scores for the given data
100 | Args:
101 | data: DataFrame containing all useful data
102 | Returns:np.array containing probability scores
103 | """
104 | raise NotImplementedError
105 |
106 | def parameters(self):
107 | """Returns the model's parameters.
108 |
109 | * If the model has only one layer, or is not a DNN, as a numpy array.
110 | * If the model has multiple layers without biases, as a list of numpy arrays representing each layer.
111 | * If the model has multiple layers with weights and biases, arrays of the corresponding weights and biases are
112 | grouped in a list, with weights going before biases.
113 |
114 | Returns: the model's parameters
115 | """
116 |
117 | return []
118 |
119 |
120 | class LinReg(Model):
121 | def __init__(self, label_col, hyperparams=None):
122 | """A linear regression based model
123 |
124 | Args:
125 | label_col: the index of the column to be used as Label
126 | hyperparams (dict of DictConfig): hyperperameters for the Model
127 | Accepted keywords: max_iter (default = 100), normalise (default=False)
128 | """
129 | if hyperparams is not None:
130 | assert isinstance(hyperparams, DictConfig) or isinstance(hyperparams, dict),\
131 | 'The given hyperparameters are not a dict or a DictConfig, but are {}'.format(type(hyperparams).__name__)
132 | else:
133 | hyperparams = dict()
134 |
135 | if 'normalise' in hyperparams.keys():
136 | normalise = hyperparams['normalise']
137 | elif 'normalize' in hyperparams.keys():
138 | normalise = hyperparams['normalize']
139 | else:
140 | normalise = False
141 |
142 | super().__init__(label_col, normalise)
143 | self.model = LinearRegression()
144 |
145 | def fit(self, data):
146 | X, y = self._prepare_data(data, train=True)
147 | with warnings.catch_warnings():
148 | warnings.simplefilter("ignore", category=ConvergenceWarning)
149 | self.model.fit(X, y)
150 | return self
151 |
152 | def predict_proba(self, data):
153 | X, _ = self._prepare_data(data, train=True)
154 | return self.model.predict(X)
155 |
156 | def parameters(self):
157 | intercept = self.model.intercept_
158 | if not isinstance(intercept, np.ndarray):
159 | intercept = np.array([intercept])
160 | return np.concatenate([intercept, self.model.coef_.flatten()])
161 |
162 |
163 | class LogReg(LinReg):
164 | def __init__(self, label_col, hyperparams):
165 | """A logistic regression based model
166 |
167 | Args:
168 | label_col: the index of the column to be used as Label
169 | hyperparams (dict of DictConfig): hyperperameters for the Model
170 | Accepted keywords: max_iter (default = 100), normalise (default=False)
171 | """
172 | if hyperparams is not None:
173 | assert isinstance(hyperparams, DictConfig) or isinstance(hyperparams, dict),\
174 | 'The given hyperparameters are not a dict or a DictConfig, but are {}'.format(type(hyperparams).__name__)
175 | else:
176 | hyperparams = dict()
177 |
178 | max_iter = hyperparams['max_iter'] if 'max_iter' in hyperparams.keys() else 100
179 |
180 | super().__init__(label_col, hyperparams)
181 | self.model = LogisticRegression(max_iter=max_iter)
182 |
183 | def predict_proba(self, data):
184 | X, _ = self._prepare_data(data, train=True)
185 | return self.model.predict_proba(X)
186 |
187 |
188 | class MLP(Model):
189 | def __init__(self, label_col, hyperparams):
190 | """A Multi-Layer Perceptron based model, for either regression or classification
191 |
192 | Args:
193 | label_col: the index of the column to be used as Label
194 | hyperparams (dict of DictConfig): hyperperameters for the Model
195 | Accepted keywords: input_size (mandatory), n_classes (mandatory, performs regression if is 1),
196 | layers (default=[64,16]), epochs (default=20), learning_rate (default=1e-1), weight_decay (default=1e-2),
197 | batch_size (default=32), normalise (default=False)
198 | """
199 | assert isinstance(hyperparams, DictConfig) or isinstance(hyperparams, dict), \
200 | 'The given hyperparameters are not a dict or a DictConfig, but are {}'.format(type(hyperparams).__name__)
201 |
202 | if 'normalise' in hyperparams.keys():
203 | normalise = hyperparams['normalise']
204 | elif 'normalize' in hyperparams.keys():
205 | normalise = hyperparams['normalize']
206 | else:
207 | normalise = False
208 | super(MLP, self).__init__(label_col, normalise)
209 |
210 | layers = hyperparams['layers'] if 'layers' in hyperparams.keys() else [64, 16]
211 |
212 | input_size = hyperparams['input_size']
213 |
214 | # Legacy version compatibility
215 | if 'num_classes' in hyperparams.keys():
216 | hyperparams['n_classes'] = hyperparams['num_classes']
217 |
218 | self.n_classes = hyperparams['n_classes']
219 |
220 | seq = list()
221 | for l in layers:
222 | seq.extend([
223 | nn.Linear(input_size, l),
224 | nn.ReLU()
225 | ])
226 | input_size = l
227 |
228 | seq.extend([
229 | nn.Linear(input_size, self.n_classes)
230 | ])
231 |
232 | self.model = nn.Sequential(*seq).to(self.device)
233 |
234 | self.epochs = hyperparams['epochs'] if 'epochs' in hyperparams.keys() else 10
235 | self.lr = hyperparams['learning_rate'] if 'learning_rate' in hyperparams.keys() else 1e-1
236 | self.wd = hyperparams['weight_decay'] if 'weight_decay' in hyperparams.keys() else 1e-2
237 | self.bs = hyperparams['batch_size'] if 'batch_size' in hyperparams.keys() else 32
238 |
239 | def fit(self, data):
240 | loader = self._prepare_dataloader(data, bs=self.bs, train=True, regression=self.n_classes == 1)
241 |
242 | opt = torch.optim.Adam(self.model.parameters(), lr=self.lr, weight_decay=self.wd)
243 | criterion = nn.CrossEntropyLoss() if self.n_classes > 1 else nn.MSELoss()
244 |
245 | for _ in range(self.epochs):
246 | for X, y_true in loader:
247 | opt.zero_grad()
248 | y_pred = self.model(X)
249 |
250 | if y_pred.shape[1] == 1:
251 | y_pred = y_pred.flatten()
252 |
253 | loss = criterion(y_pred, y_true)
254 | loss.backward()
255 | opt.step()
256 |
257 | return self
258 |
259 | def predict_proba(self, data):
260 | loader = self._prepare_dataloader(data, bs=self.bs, train=False, regression=self.n_classes == 1)
261 | preds = list()
262 |
263 | if self.n_classes > 1:
264 | for X, _ in loader:
265 | preds.append(softmax(self.model(X).cpu(), dim=1))
266 |
267 | else:
268 | for X, _ in loader:
269 | preds.append(self.model(X).cpu())
270 |
271 | return np.nan_to_num(torch.cat(preds, dim=0).detach().cpu().numpy())
272 |
273 | def parameters(self):
274 | params = self.model.state_dict()
275 | out = list()
276 | for i in {int(k.split('.')[0]) for k in params.keys()}:
277 | w = np.nan_to_num(params['{}.weight'.format(i)].detach().cpu().numpy())
278 | b = np.nan_to_num(params['{}.bias'.format(i)].view(-1, 1).detach().cpu().numpy())
279 | out.append([w, b])
280 | return out
--------------------------------------------------------------------------------
/propinfer/model_utils.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 |
3 |
4 | def transform_parameters(parameters, sort=False):
5 | if isinstance(parameters, np.ndarray):
6 | if sort:
7 | return np.sort(parameters.flatten())
8 | else:
9 | return parameters.flatten()
10 | elif isinstance(parameters, list):
11 | if sort:
12 | return sort_parameters(parameters)
13 | else:
14 | return flatten_parameters(parameters)
15 | else:
16 | raise AttributeError(
17 | 'Parameters should be a numpy array or a list, but is {}'.format(type(parameters).__name__))
18 |
19 |
20 | def flatten_parameters(parameters):
21 | out = []
22 | for p in parameters:
23 | if isinstance(p, list):
24 | out.extend([array.flatten() for array in p])
25 | else:
26 | out.append(p.flatten())
27 | return np.concatenate(out)
28 |
29 |
30 | def sort_parameters(parameters):
31 | out = []
32 | for i in range(len(parameters)-1):
33 | if isinstance(parameters[i], list):
34 | order = np.argsort(parameters[i][0].sum(axis=1))
35 | out.append(parameters[i][0][order, :].flatten())
36 | out.append(parameters[i][1][order, :].flatten())
37 | else:
38 | order = np.argsort(np.abs(parameters[i].sum(axis=1)))
39 | out.append(parameters[i][order, :].flatten())
40 |
41 | if isinstance(parameters[i + 1], list):
42 | parameters[i+1][0] = parameters[i+1][0][:, order]
43 | else:
44 | parameters[i+1] = parameters[i+1][:, order]
45 |
46 | if isinstance(parameters[-1], list):
47 | out.extend([array.flatten() for array in parameters[-1]])
48 | else:
49 | out.append(parameters[-1].flatten())
50 |
51 | out = np.concatenate(out)
52 | return out
53 |
--------------------------------------------------------------------------------
/pyproject.toml:
--------------------------------------------------------------------------------
1 | [build-system]
2 | requires = [
3 | "setuptools>=42",
4 | "wheel"
5 | ]
6 | build-backend = "setuptools.build_meta"
--------------------------------------------------------------------------------
/run.py:
--------------------------------------------------------------------------------
1 | import json
2 | import logging.config
3 |
4 | from os import path
5 |
6 | from omegaconf import DictConfig
7 | import hydra
8 |
9 | from propinfer import Experiment
10 | from propinfer import GaussianGenerator, IndependentPropertyGenerator, ProbitGenerator, NonlinearGenerator
11 | from propinfer import LogReg, MLP
12 |
13 | CWD = path.dirname(__file__)
14 |
15 | MODELS = {
16 | 'LogReg': LogReg,
17 | 'MLP': MLP
18 | }
19 |
20 | GENERATORS = {
21 | 'GaussianGenerator': GaussianGenerator,
22 | 'IndependentPropertyGenerator': IndependentPropertyGenerator,
23 | 'ProbitGenerator': ProbitGenerator,
24 | 'NonlinearGenerator': NonlinearGenerator
25 | }
26 |
27 | from os import path, mkdir
28 | from time import strftime
29 |
30 | TIMESTAMP = strftime('%d%m%y_%H:%M:%S')
31 |
32 | config = path.abspath(path.join(path.dirname(__file__), 'logging.ini'))
33 |
34 | logdir = path.abspath(path.join(path.dirname(__file__),"./logs"))
35 | if not path.isdir(logdir):
36 | mkdir(logdir)
37 | logfile = logdir + '/logs_property-inference-attacks_' + TIMESTAMP + '.txt'
38 |
39 | logging.config.fileConfig(config, defaults={'logfilename': logfile})
40 |
41 | # create logger
42 | logger = logging.getLogger('propinfer')
43 |
44 |
45 | @hydra.main(config_path="config", config_name="config")
46 | def main(cfg: DictConfig):
47 | experiments = dict()
48 | for gen in cfg.experiments.generators:
49 | generator = GENERATORS[gen](num_samples=cfg.generators.n_samples)
50 | for model in cfg.experiments.models:
51 | n_classes = cfg.experiments.n_classes if 'n_classes' in cfg.experiments.keys() else 2
52 | exp_range = None if n_classes > 1 else cfg.experiments.range
53 | experiment = Experiment(generator, cfg.generators.label_col, MODELS[model], cfg.experiments.n_targets, cfg.experiments.n_shadows,
54 | cfg.models[model], cfg.experiments.n_queries, n_classes=n_classes, range=exp_range)
55 |
56 | logger.info('Training target models: {} - {}'.format(gen, model))
57 | experiment.run_targets()
58 | logger.info('Training shadow models: {} - {}'.format(gen, model))
59 | experiment.run_shadows(MODELS[model], cfg.models[model])
60 |
61 | runs = list(cfg.experiments.runs)
62 | if 'BlackBox' in runs:
63 | runs.remove('BlackBox')
64 | runs.append('BlackBox')
65 |
66 | for run in cfg.experiments.runs:
67 | name = '{} - {} - {}'.format(gen, model, run)
68 | logger.info('Running {}...'.format(name))
69 | if run == 'LossTest':
70 | experiments[name] = experiment.run_loss_test()
71 | elif run == 'ThresholdTest':
72 | experiments[name] = experiment.run_threshold_test()
73 | elif run == 'Naive':
74 | experiments[name] = experiment.run_whitebox_sort(sort=False)
75 | elif run == 'Sort':
76 | experiments[name] = experiment.run_whitebox_sort(sort=True)
77 | elif run == 'DeepSets':
78 | experiments[name] = experiment.run_whitebox_deepsets(cfg.deepsets)
79 | elif run == 'GreyBox':
80 | experiments[name] = experiment.run_blackbox()
81 | elif run == 'BlackBox':
82 | logger.info('Training default shadow models: {} - {}'.format(gen, cfg.experiments.blackbox_model))
83 | experiment.run_shadows(MODELS[cfg.experiments.blackbox_model], cfg.models[cfg.experiments.blackbox_model])
84 | experiments[name] = experiment.run_blackbox()
85 | else:
86 | raise AttributeError('Invalid run provided: should be Naive, Sort, DeepSets, GreyBox or BlackBox'
87 | ' - instead is {}'.format(run))
88 |
89 | if n_classes > 1:
90 | logger.info('Attack accuracy for {}: {:.2%}'.format(name, experiments[name]))
91 | else:
92 | logger.info('Mean absolute error for {}: {:.2f}'.format(name, experiments[name]))
93 |
94 | # Output results
95 | outfile_name = 'results_PIA_' + TIMESTAMP + '.json'
96 | outdir = path.join(CWD, cfg.outdir)
97 | if not path.isdir(outdir):
98 | mkdir(outdir)
99 | with open(path.join(outdir, outfile_name), 'w') as f:
100 | json.dump(experiments, f)
101 |
102 |
103 | if __name__ == "__main__":
104 | main()
105 |
--------------------------------------------------------------------------------
/setup.cfg:
--------------------------------------------------------------------------------
1 | [metadata]
2 | name = propinfer
3 | version = 1.3.0
4 | author = Léo Meynent
5 | author_email = leo.meynent@epfl.ch
6 | description = Modular framework to run Property Inference Attacks on Machine Learning models.
7 | long_description = file: README.md
8 | long_description_content_type = text/markdown
9 | url = https://epfl-dlab.github.io/property-inference-attacks/
10 | project_urls =
11 | Repository = https://github.com/epfl-dlab/property-inference-attacks/
12 | Tracker = https://github.com/epfl-dlab/property-inference-attacks/issues
13 | classifiers =
14 | Programming Language :: Python :: 3
15 | License :: OSI Approved :: MIT License
16 | Operating System :: OS Independent
17 | Intended Audience :: Science/Research
18 | license = MIT
19 | license_files = LICENSE.md
20 |
21 | [options]
22 | packages = propinfer
23 | python_requires = >=3.6
24 | install_requires =
25 | scikit-learn
26 | pandas
27 | numpy
28 | hydra-core
--------------------------------------------------------------------------------
/tests/__init__.py:
--------------------------------------------------------------------------------
1 | import logging
2 | logger = logging.getLogger('propinfer')
3 |
4 | for handler in logger.handlers:
5 | if handler.name == 'consoleHandler':
6 | handler.setLevel(logging.WARNING)
--------------------------------------------------------------------------------
/tests/test_deepsets.py:
--------------------------------------------------------------------------------
1 | from unittest import TestCase
2 |
3 | from propinfer.deepsets import DeepSets
4 | from propinfer import MLP
5 |
6 | from numpy import array
7 |
8 | DEFAULT_HYPERPARAMS_MLP = {
9 | 'input_size': 5,
10 | 'num_classes': 2,
11 | 'epochs': 10,
12 | 'learning_rate': 1e-3,
13 | 'weight_decay': 1e-4,
14 | 'normalise': False,
15 | 'layers': (128, 64)
16 | }
17 |
18 |
19 | class Test(TestCase):
20 | def test_deepsets(self):
21 | model = MLP('label', DEFAULT_HYPERPARAMS_MLP)
22 | multi_params = [model.parameters()]*64
23 | ds = DeepSets(model.parameters(), 8, 2, 1e-3, 1e-4)
24 | ds.fit(multi_params, array([0]*64))
25 | assert len(ds.predict(multi_params)) == 64
--------------------------------------------------------------------------------
/tests/test_experiment.py:
--------------------------------------------------------------------------------
1 | from unittest import TestCase
2 |
3 | from propinfer import Experiment
4 | from propinfer import GaussianGenerator, IndependentPropertyGenerator, MultilabelProbitGenerator, LinearGenerator
5 | from propinfer import LinReg, LogReg, MLP
6 |
7 | import numpy as np
8 |
9 |
10 | DEFAULT_HYPERPARAMS_MLP = {
11 | "input_size": 4,
12 | "layers": (4, 4),
13 | "num_classes": 2,
14 | "epochs": 1,
15 | "learning_rate": [1e-1, 1e-2],
16 | "weight_decay": [1e-2, 1e-3],
17 | "batch_size": 32
18 | }
19 |
20 | DEFAULT_HYPERPARAMS_DEEPSETS = {
21 | 'latent_dim': 8,
22 | 'epochs': 1,
23 | 'learning_rate': 1e-3,
24 | 'weight_decay': 1e-4
25 | }
26 |
27 |
28 | class TestExperiment(TestCase):
29 | def setUp(self):
30 | self.num_targets = 128
31 | self.num_shadows = 512
32 |
33 | self.gen = GaussianGenerator()
34 | self.model = LogReg
35 | self.exp = Experiment(self.gen, 'label', self.model, self.num_targets, self.num_shadows, {'max_iter': 100})
36 |
37 | def test_prepare_attacks(self):
38 | self.exp.run_targets()
39 | assert self.exp.targets is not None
40 | assert sum(self.exp.labels) == self.num_targets//2
41 | assert len(self.exp.labels) == self.num_targets
42 |
43 | def test_run_shadows(self):
44 | self.exp.run_shadows(LogReg, {'max_iter': 100})
45 | assert self.exp.shadow_models is not None
46 | assert sum(self.exp.shadow_labels) == self.num_shadows // 2
47 | assert len(self.exp.shadow_labels) == self.num_shadows
48 |
49 | def test_attacks(self):
50 | self.exp.run_targets()
51 | self.exp.run_shadows(LogReg, {'max_iter': 100})
52 |
53 | assert self.exp.run_loss_test() > 0.25
54 | assert self.exp.run_threshold_test() > 0.25
55 | assert self.exp.run_whitebox_deepsets(DEFAULT_HYPERPARAMS_DEEPSETS) > 0.25
56 |
57 | res = dict()
58 | res['whitebox'] = self.exp.run_whitebox_sort()
59 | res['blackbox'] = self.exp.run_blackbox()
60 |
61 | indep = IndependentPropertyGenerator()
62 | exp_indep = Experiment(indep, 'label', self.model, self.num_targets, self.num_shadows, {'max_iter': 100})
63 |
64 | exp_indep.run_targets()
65 | exp_indep.run_shadows(LogReg, {'max_iter': 100})
66 |
67 | res_indep = dict()
68 | res_indep['whitebox'] = exp_indep.run_whitebox_sort()
69 | res_indep['blackbox'] = exp_indep.run_blackbox()
70 |
71 | assert res['whitebox'] > res_indep['whitebox']
72 | assert res['blackbox'] > res_indep['blackbox']
73 |
74 | def test_optimise_classifier(self):
75 | assert isinstance(DEFAULT_HYPERPARAMS_MLP['learning_rate'], list)
76 | assert isinstance(DEFAULT_HYPERPARAMS_MLP['weight_decay'], list)
77 |
78 | self.model = MLP
79 | self.exp = Experiment(self.gen, 'label', self.model, self.num_targets, self.num_shadows, DEFAULT_HYPERPARAMS_MLP)
80 |
81 | assert not isinstance(self.exp.hyperparams['learning_rate'], list)
82 | assert not isinstance(self.exp.hyperparams['weight_decay'], list)
83 |
84 | def test_optimise_regressor(self):
85 | self.gen = LinearGenerator()
86 | self.model = LinReg
87 | self.exp = Experiment(self.gen, 'label', self.model, self.num_targets, self.num_shadows, {'dummy': [0., 1.]})
88 |
89 | assert not isinstance(self.exp.hyperparams['dummy'], list)
90 |
91 | def test_attacks_multiple(self):
92 | self.exp.run_targets()
93 | self.exp.run_shadows(LogReg, {'max_iter': 100})
94 |
95 | assert len(self.exp.run_threshold_test(n_outputs=2)) == 2
96 | assert len(self.exp.run_whitebox_sort(n_outputs=2)) == 2
97 | assert len(self.exp.run_whitebox_deepsets(DEFAULT_HYPERPARAMS_DEEPSETS, n_outputs=2)) == 2
98 | assert len(self.exp.run_whitebox_sort(n_outputs=2)) == 2
99 | assert len(self.exp.run_blackbox(n_outputs=2)) == 2
100 |
101 | def test_wb_bb_regression(self):
102 | self.exp = Experiment(self.gen, 'label', self.model, self.num_targets, self.num_shadows, {'max_iter': 100},
103 | n_classes=1, range=(-1., 1.))
104 |
105 | self.exp.run_targets()
106 | self.exp.run_shadows(LogReg, {'max_iter': 100})
107 |
108 | assert self.exp.run_whitebox_sort() < 1
109 | assert self.exp.run_blackbox() < 1
110 |
111 | def test_wb_bb_multiclass(self):
112 | self.exp = Experiment(self.gen, 'label', self.model, self.num_targets, self.num_shadows, {'max_iter': 100},
113 | n_classes=3)
114 |
115 | self.exp.run_targets()
116 | self.exp.run_shadows(LogReg, {'max_iter': 100})
117 |
118 | assert self.exp.run_whitebox_sort() > 0.5
119 | assert self.exp.run_blackbox() > 0.5
120 |
121 | def test_deepsets_regression(self):
122 | self.exp = Experiment(self.gen, 'label', self.model, self.num_targets, self.num_shadows, {'max_iter': 100},
123 | n_classes=1, range=(-1., 1.))
124 |
125 | self.exp.run_targets()
126 | self.exp.run_shadows(LogReg, {'max_iter': 100})
127 |
128 | assert self.exp.run_whitebox_deepsets(DEFAULT_HYPERPARAMS_DEEPSETS) < 1
129 |
130 | def test_deepsets_multiclass(self):
131 | self.exp = Experiment(self.gen, 'label', self.model, self.num_targets, self.num_shadows, {'max_iter': 100},
132 | n_classes=3)
133 |
134 | self.exp.run_targets()
135 | self.exp.run_shadows(LogReg, {'max_iter': 100})
136 |
137 | assert self.exp.run_whitebox_deepsets(DEFAULT_HYPERPARAMS_DEEPSETS) > 0.25
138 |
139 | def test_multiple_regression(self):
140 | self.exp = Experiment(self.gen, 'label', self.model, self.num_targets, self.num_shadows, {'max_iter': 100},
141 | n_classes=1, range=(-1., 1.))
142 |
143 | self.exp.run_targets()
144 | self.exp.run_shadows(LogReg, {'max_iter': 100})
145 |
146 | assert len(self.exp.run_whitebox_sort(n_outputs=2)) == 2
147 |
148 | def test_multiple_multiclass(self):
149 | self.exp = Experiment(self.gen, 'label', self.model, self.num_targets, self.num_shadows, {'max_iter': 100},
150 | n_classes=3)
151 |
152 | self.exp.run_targets()
153 | self.exp.run_shadows(LogReg, {'max_iter': 100})
154 |
155 | assert len(self.exp.run_whitebox_sort(n_outputs=2)) == 2
156 |
157 | def test_multiclass_nondivisible_number_models(self):
158 | self.exp = Experiment(self.gen, 'label', self.model, 10, 13, {'max_iter': 100},
159 | n_classes=3)
160 | self.exp.run_targets()
161 | self.exp.run_shadows()
162 |
163 | assert len(self.exp.labels) == 10
164 | assert len(self.exp.targets) == 10
165 | assert len(self.exp.shadow_labels) == 13
166 | assert len(self.exp.shadow_models) == 13
167 |
168 | def test_multivariable_regression(self):
169 | gen = MultilabelProbitGenerator()
170 | self.exp = Experiment(gen, 'label', self.model, self.num_targets, self.num_shadows, {'max_iter': 100},
171 | n_classes=1, range=np.array(((0., 1.), (0., 1.))))
172 |
173 | self.exp.run_targets()
174 | self.exp.run_shadows()
175 |
176 | assert len(self.exp.run_whitebox_sort()) == 2
177 | assert len(self.exp.run_blackbox()) == 2
178 | assert len(self.exp.run_whitebox_deepsets(DEFAULT_HYPERPARAMS_DEEPSETS)) == 2
179 |
180 | def test_attack_regressor(self):
181 | gen = LinearGenerator()
182 | self.exp = Experiment(gen, 'label', LinReg, self.num_targets, self.num_shadows, dict())
183 |
184 | self.exp.run_targets()
185 | self.exp.run_shadows()
186 |
187 | assert self.exp.run_whitebox_sort() > 0.25
188 | assert self.exp.run_blackbox() > 0.25
189 | assert self.exp.run_whitebox_deepsets(DEFAULT_HYPERPARAMS_DEEPSETS) > 0.25
--------------------------------------------------------------------------------
/tests/test_generator.py:
--------------------------------------------------------------------------------
1 | from unittest import TestCase
2 |
3 | from propinfer import Generator, GaussianGenerator, ProbitGenerator, IndependentPropertyGenerator, \
4 | LinearGenerator, SubsamplingGenerator, MultilabelProbitGenerator
5 | from numpy import stack, sum, int32
6 | from numpy.random import randint
7 | from pandas import DataFrame
8 |
9 | class TestExperiment(TestCase):
10 | def test_subsampling_generator(self):
11 | attr1 = randint(0, 2, 32768)
12 | attr2 = randint(0, 3, 32768)
13 | attr3 = randint(0, 4, 32768)
14 |
15 | data = DataFrame(data=stack((attr1, attr2, attr3), axis=1), columns=['Bin', 'Tri', 'Quad'], dtype=int32)
16 | data.loc[:, 'Cat'] = data.Quad.astype('category')
17 |
18 | gen = SubsamplingGenerator(data, 'Quad', 'Bin', proportion=0.1)
19 |
20 | sample = gen.sample(False)
21 | assert 0.49 < sum(sample['Bin_1']) / len(sample) < 0.51
22 | assert 0.2 < sum(sample['Quad'] == 1) / len(sample) < 0.3
23 |
24 | sample = gen.sample(True)
25 | assert 0.09 < sum(sample['Bin_1']) / len(sample) < 0.11
26 | assert 0.2 < sum(sample['Quad'] == 1) / len(sample) < 0.3
27 |
28 | self.assertRaises(AssertionError, SubsamplingGenerator, data, 'Tri', 'Quad', proportion=0.1)
29 |
30 | gen = SubsamplingGenerator(data, 'Tri', 'Quad', target_category=1, proportion=0.1)
31 | sample = gen.sample(False)
32 | assert 0.25 < sum(sample['Tri'] == 1) / len(sample) < 0.4
33 | assert 0.49 < sum(sample['Quad_1']) / len(sample) < 0.51
34 |
35 | sample = gen.sample(True)
36 | assert 0.25 < sum(sample['Tri'] == 1) / len(sample) < 0.4
37 | assert 0.09 < sum(sample['Quad_1']) / len(sample) < 0.11
38 | assert 0.25 < sum(sample['Quad_0']) / len(sample) < 0.35
39 | assert 0.25 < sum(sample['Quad_2']) / len(sample) < 0.35
40 | assert 0.25 < sum(sample['Quad_3']) / len(sample) < 0.35
41 |
42 | gen = SubsamplingGenerator(data, 'Tri', 'Quad', target_category=1, proportion=0.1, split=True)
43 | sample = gen.sample(False)
44 | assert 0.25 < sum(sample['Tri'] == 1) / len(sample) < 0.4
45 | assert 0.49 < sum(sample['Quad_1']) / len(sample) < 0.51
46 |
47 | sample = gen.sample(True)
48 | assert 0.25 < sum(sample['Tri'] == 1) / len(sample) < 0.4
49 | assert 0.09 < sum(sample['Quad_1']) / len(sample) < 0.11
50 |
51 | gen = SubsamplingGenerator(data, 'Tri', 'Cat', target_category=1, proportion=0.1)
52 | gen.sample(False)
53 |
54 | gen = SubsamplingGenerator(data, 'Cat', 'Bin', proportion=0.1)
55 | gen.sample(False)
56 |
57 | gen = SubsamplingGenerator(data, 'Tri', 'Quad', target_category=1, regression=True)
58 | sample = gen.sample(0.5)
59 | assert 0.25 < sum(sample['Tri'] == 1) / len(sample) < 0.4
60 | assert 0.49 < sum(sample['Quad_1']) / len(sample) < 0.51
61 |
62 | sample = gen.sample(0.25)
63 | assert 0.24 < sum(sample['Quad_1']) / len(sample) < 0.26
64 |
65 | sample = gen.sample(0.75)
66 | assert 0.74 < sum(sample['Quad_1']) / len(sample) < 0.76
67 |
68 | sample = gen.sample(0.)
69 | assert sum(sample['Quad_1']) / len(sample) < 0.01
70 |
71 | sample = gen.sample(1.)
72 | assert 0.99 < sum(sample['Quad_1']) / len(sample)
73 |
74 | def test_generator(self):
75 | gen = Generator()
76 | self.assertRaises(NotImplementedError, gen.sample, 0)
77 |
78 | gen = GaussianGenerator()
79 | assert gen.sample(0).mean()[1] < 0.1
80 |
81 | gen = IndependentPropertyGenerator()
82 | assert gen.sample(0).mean()[1] < 0.1
83 |
84 | gen = ProbitGenerator()
85 | assert gen.sample(0).mean()[1] < 0.1
86 |
87 | gen = LinearGenerator()
88 | assert gen.sample(0).mean()[1] < 0.1
89 |
90 | def test_multilabel_probit_generator(self):
91 | gen = MultilabelProbitGenerator()
92 | assert gen.sample((0., 1.)).mean()['f1'] < 0.1
93 | assert gen.sample((0., 1.)).var()['f1'] > 2.
94 |
95 | assert gen.sample((1., 0.)).mean()['f1'] > 0.9
96 | assert gen.sample((1., 0.)).var()['f1'] < 1.5
97 |
--------------------------------------------------------------------------------
/tests/test_model.py:
--------------------------------------------------------------------------------
1 | from unittest import TestCase
2 |
3 | from propinfer import LinReg, LogReg, MLP
4 | from propinfer import GaussianGenerator, LinearGenerator
5 |
6 | from sklearn.metrics import accuracy_score, mean_squared_error
7 |
8 | DEFAULT_HYPERPARAMS_LOGREG = {
9 | "max_iter": 100
10 | }
11 |
12 | DEFAULT_HYPERPARAMS_MLP = {
13 | "input_size": 4,
14 | "num_classes": 2,
15 | "epochs": 20,
16 | "learning_rate": 1e-3,
17 | "weight_decay": 1e-4,
18 | "batch_size": 32,
19 | "layers": [8]
20 | }
21 |
22 | DEFAULT_HYPERPARAMS_MLP_REGRESSOR = {
23 | "input_size": 4,
24 | "num_classes": 1,
25 | "epochs": 20,
26 | "learning_rate": 1e-2,
27 | "weight_decay": 1e-3,
28 | "batch_size": 32,
29 | "layers": [8]
30 | }
31 |
32 |
33 | class Test(TestCase):
34 | def test_linreg(self):
35 | gen = LinearGenerator()
36 | model = LinReg('label')
37 |
38 | train = gen.sample(False)
39 | model.fit(train)
40 |
41 | assert mean_squared_error(train['label'], model.predict(train)) < 2.
42 |
43 | def test_logreg(self):
44 | gen = GaussianGenerator()
45 | model = LogReg('label', DEFAULT_HYPERPARAMS_LOGREG)
46 |
47 | train = gen.sample(False)
48 | model.fit(train)
49 |
50 | assert accuracy_score(train['label'], model.predict(train)) > 0.75
51 |
52 | def test_mlp(self):
53 | gen = GaussianGenerator()
54 |
55 | model = MLP('label', DEFAULT_HYPERPARAMS_MLP)
56 | assert model.parameters()[0][0].shape[0] == 8
57 |
58 | train = gen.sample(False)
59 | model.fit(train)
60 |
61 | assert accuracy_score(train['label'], model.predict(train)) > 0.75
62 |
63 | def test_mlp_regression(self):
64 | gen = LinearGenerator()
65 |
66 | model = MLP('label', DEFAULT_HYPERPARAMS_MLP_REGRESSOR)
67 |
68 | train = gen.sample(False)
69 | model.fit(train)
70 |
71 | assert mean_squared_error(train['label'], model.predict(train)) < 2.
72 |
--------------------------------------------------------------------------------
/tests/test_model_utils.py:
--------------------------------------------------------------------------------
1 | from unittest import TestCase
2 |
3 | from propinfer import MLP
4 | from propinfer.model_utils import sort_parameters, flatten_parameters
5 |
6 | DEFAULT_HYPERPARAMS = {
7 | "input_size": 4,
8 | "hidden_size": 20,
9 | "num_classes": 2,
10 | "epochs": 20,
11 | "learning_rate": 1e-3,
12 | "batch_size": 32
13 | }
14 |
15 |
16 | class Test(TestCase):
17 | def test_sort_parameters(self):
18 | model = MLP('None', DEFAULT_HYPERPARAMS)
19 | params_flat = flatten_parameters(model.parameters())
20 | params_transf = sort_parameters(model.parameters())
21 |
22 | assert params_flat.shape[0] == params_transf.shape[0]
23 | assert params_flat[-1] == params_transf[-1]
24 |
--------------------------------------------------------------------------------