--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2020 Vaikkunth Anton Lalana
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # PrivacyFL: A simulator for privacy-preserving and secure federated learning
2 |
3 | This repository contains the source code for running a privacy-preserving federated learning simulator. The source code is currently set up for a configuration of three clients performing secure and differentially private federated learning using logistic regression on the MNIST dataset. This library, however, is meant to be modified to simulate your own secure federated machine learning configuration. We hope that this simulation can help users decide whether it is beneficial for them to participate in differentially private federated learning for a given differentially private algorithm.
4 |
5 | ## UPDATE: Paper accepted at the 29th ACM International Conference on Information and Knowledge Management (CIKM 2020)
6 |
7 | Paper and video link: https://dl.acm.org/doi/10.1145/3340531.3412771
8 |
9 | ## Getting Started
10 |
11 | These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.
12 |
13 |
14 | ### Installing
15 | First, clone this repository locally. Then create a conda environment by running:
16 | ```
17 | conda env create -f environment.yml -n YourEnvironmentName
18 | ```
19 | Activate the new environment:
20 | ```
21 | source activate YourEnvironmentName
22 | ```
23 | To validate correct installation, `cd` into `src` and run
24 | ```
25 | python run_simulation.py
26 | ```
27 | If you encounter any issues, please let us know so that we can help in getting the simulation up and running.
28 |
29 | ### Configuring Your Simulation
30 | This library is intended to be modified as needed for your use case. We have provided a default `config.py` file as an example.
31 |
32 | Some simulation behavior can easily be configured by changing the variables in the config file. The file contains the Boolean variables `USE_SECURITY` and `USE_DP_PRIVACY` to toggle the security and differential privacy features. The security feature does not affect accuracy; however, you can set `USE_DP_PRIVACY` to `False` if you want to see what the federated accuracy would be without differential privacy.
33 |
34 | The default `config.py` file also has `USING_CUMULATIVE` set to `True`. This means that the dataset for a client on iteration `i` contains all of the datapoints from iteration `i-1` as well as `len_per_iteration` new datapoints. As such, this flag also makes each client train its weights from scratch every iteration. Conversely, one can set the `USING_CUMULATIVE` flag to `False`, which makes the datasets non-cumulative and has clients perform gradient descent starting from the previous iteration's federated weights.
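
For example, a minimal excerpt of these toggles (the values shown are illustrative, not necessarily the shipped defaults):
```
# config.py (excerpt) -- illustrative values
USE_SECURITY = True       # Diffie-Hellman-based masking of shared weights
USE_DP_PRIVACY = False    # set to True to add differentially private noise
USING_CUMULATIVE = True   # datasets grow each iteration; clients retrain from scratch
```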
35 |
36 | ## System Architecture
37 | ### Agent
38 | Agent (defined in agent.py) is the base class for this simulation. It is not meant to be initialized directly, but subclassed to create an agent with whichever behavior you would like. We have provided two sample subclasses, **ClientAgent** and **ServerAgent**, which are both used in the sample simulation.
39 |
40 | ### ClientAgent
41 | An instance of the **ClientAgent** class represents an entity that is training a machine learning model on the same task as the other client agents. The initialization arguments for a client are `agent_number`, `train_datasets`, `evaluator`, and `active_clients`. Client agents are assigned an `agent_number`, which is appended to the string `client_agent` to create their name. For example, in the example simulation there are three client agents named `client_agent0`, `client_agent1`, and `client_agent2`. When initialized, clients are also provided their datasets, which in the example are pyspark dataframes. The client is also passed an `evaluator`, which it uses in the simulation to evaluate its own weights and the federated weights. `evaluator` is an instance of the ModelEvaluator class defined in `utils/model_evaluator.py`.
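
Below is a sketch of how the initializer might construct a client; the surrounding variable names follow `initializer_serverless.py` and should be treated as assumptions:
```
# Sketch: constructing a client agent (names follow initializer_serverless.py)
client = ClientAgent(agent_number=0,
                     train_datasets=client_to_datasets['client_agent0'],
                     evaluator=ModelEvaluator(X_test, y_test),
                     active_clients=active_clients)
```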
42 |
43 | There are two important methods of **ClientAgent** that are invoked by the **ServerAgent**. The first is `produce_weights`, which is called every iteration and prompts the client to perform its machine learning task on its dataset for that iteration.
44 | The second method is `receive_weights`, which is called at the end of every iteration when the server has federated weights to return to the client.
45 |
46 | ### ServerAgent
47 | An instance of the **ServerAgent** class represents a third-party entity that is responsible for invoking the simulation and corresponding with the client agents. It is possible to configure a simulation to use more than one **ServerAgent**, but the straightforward example in the repository currently only creates one instance. Initializing a **ServerAgent** only requires the same default argument as its superclass **Agent**: `agent_number`, which should be `0` for the first server agent.
48 |
49 | **ServerAgent** has one method: `request_values`. Calling `request_values` signals the server agent to start requesting values from the clients, thereby starting the online portion of the simulation. Its only argument is `iters`, which dictates how many iterations to run the simulation for. Note that if `iters` is too large, the client agents may run out of data. For the example shown in the repository, set `iters` less than or equal to the `iterations` argument of the **Initializer** `__init__` method, since that is the method that creates the datasets and distributes them to the clients. If you wish to change the behavior of the simulation, `request_values` is a good place to start; subsequently add or modify any methods that are called on the **ClientAgent** class.
50 |
51 | The `request_values` method first requests weights in parallel from the clients by calling their `produce_weights` method, averages them, and then returns them to the clients in parallel by calling their `receive_weights` method. In the current example, the client returns a message to the server agent through the `receive_weights` method indicating whether its weights have converged; see the sketch below.
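
A minimal sketch of one iteration of that request/return loop (a hypothetical simplification: the message-body fields follow `client_agent.py`, and latency bookkeeping is omitted):
```
# Hypothetical simplification of one iteration inside request_values
clients = list(self.directory.clients.values())
with ThreadPool(len(clients)) as pool:
    replies = pool.map(lambda c: c.produce_weights(request_message), clients)  # train in parallel
federated_weights = np.average([r.body['weights'] for r in replies], axis=0)
federated_intercepts = np.average([r.body['intercepts'] for r in replies], axis=0)
for client in clients:
    client.receive_weights(return_message)  # body carries the averaged values back
```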
52 |
53 | ### Initializer
54 | An instance of the **Initializer** class is used to initialize the agents and the model evaluator. In addition, any offline-stage logic, such as a Diffie-Hellman key exchange, should occur in this class. In our example, it loads the MNIST dataset and processes it for the client agent instances.
55 |
56 | To commence the simulation, the initializer's `run_simulation` method is invoked, which then invokes the server agent's `request_values` method.
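
For reference, the serverless proof-of-concept in `src/Serverless` is driven the same way:
```
# From src/Serverless/run_simulation_serverless.py
initializer = InitializerServerless(num_clients=config.NUM_CLIENTS, iterations=config.ITERATIONS)
initializer.run_simulation(config.ITERATIONS)
```
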
57 | ### Directory
58 | The **Directory** class contains a mapping of agent names to agent instances, which allows agents to invoke other agents' methods knowing only their name. An instance of **Directory** is created in the `__init__` method of the **Initializer** class after all the agents have been created. It is then passed on to all the agents using their `set_directory` method.
59 |
60 | An example usage to call some method of `client_agent1` would be:
61 | `self.directory.clients['client_agent1'].METHOD_TO_CALL()`
62 |
63 | ### Utils Folder
64 | This folder contains utilities such as data processing, differential privacy functions, and more. For the most part, the functions in here are implementation-specific, and you should feel free to add any auxiliary function scripts.
65 |
66 | ## Features
67 | The library is intended to help users simulate secure federated learning to decide whether it would be feasible and beneficial. A snippet of a sample output looks like:
68 | ```
69 | Performance Metrics for client_agent2 on iteration 1
70 | -------------------------------------------
71 | Personal accuracy: 0.8283333333333334
72 | Personal computation time: 0:00:01.194242
73 | Federated accuracy: 0.8566666666666667
74 | Simulated time to receive federated weights: 0:00:07.202375
75 |
76 | Performance Metrics for client_agent0 on iteration 1
77 | -------------------------------------------
78 | Personal accuracy: 0.8216666666666667
79 | Personal computation time: 0:00:01.198737
80 | Federated accuracy: 0.8566666666666667
81 | Simulated time to receive federated weights: 0:00:09.202375
82 | ```
83 | As you can see, the simulation prints out i) the personal accuracy: the accuracy that the client can obtain by itself on its own dataset, with no DP noise added (NOTE: this quantity does incorporate other clients' data if you set the `config.USING_CUMULATIVE` flag to `False`, since that indicates to clients that they should start training each iteration from the previous iteration's federated weights, the datasets not being cumulative); and ii) the federated accuracy: the accuracy of the federated model, which is the average of all the clients' personal weights plus differentially private noise for that iteration. Note that while the clients benefit from participating in the simulation in this example, that is not always the case. In particular, as one increases the amount of differentially private noise, the federated accuracy is expected to decrease. The personal accuracy, on the other hand, will remain the same, since it is assumed you don't add differentially private noise to your personal model given that you are not sharing it.
84 |
85 | In addition, this library allows you to simulate how long it would take to receive the federated values back each iteration. `Personal computation time` indicates how long your training took for that iteration, while `Simulated time to receive federated weights` takes into account user-defined communication latencies between the clients and the server, as well as how long it took the other clients to compute their weights and the server to average them.
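
Latencies are configured per agent pair in `config.py` via `LATENCY_DICT`; for example:
```
# config.py (excerpt): agent-to-agent latencies used for the simulated times
LATENCY_DICT['client_agent0']['server_agent0'] = timedelta(seconds=0.3)
LATENCY_DICT['server_agent0']['client_agent0'] = timedelta(seconds=0.3)
```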
86 |
87 | ## Authors
88 |
89 | Vaikkunth Mugunthan*, Anton Peraire*, Lalana Kagal
90 |
91 | ## License
92 |
93 | This project is licensed under the MIT License
94 |
95 | MIT License
96 |
97 | Copyright (c) 2020 Vaikkunth, Anton, Lalana (PrivacyFL)
98 |
99 | Permission is hereby granted, free of charge, to any person obtaining a copy
100 | of this software and associated documentation files (the "Software"), to deal
101 | in the Software without restriction, including without limitation the rights
102 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
103 | copies of the Software, and to permit persons to whom the Software is
104 | furnished to do so, subject to the following conditions:
105 |
106 | The above copyright notice and this permission notice shall be included in all
107 | copies or substantial portions of the Software.
108 |
109 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
110 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
111 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
112 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
113 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
114 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
115 | SOFTWARE.
116 |
--------------------------------------------------------------------------------
/environment.yml:
--------------------------------------------------------------------------------
1 | name: privacyFL
2 | channels:
3 | - anaconda
4 | - conda-forge
5 | - defaults
6 | dependencies:
7 | - astroid=2.3.3
8 | - attrs=19.1.0
9 | - backcall=0.1.0
10 | - blas=1.0
11 | - bleach=3.1.0
12 | - ca-certificates=2019.11.27
13 | - certifi=2019.11.28
14 | - cffi=1.12.3
15 | - colorama=0.4.3
16 | - decorator=4.4.0
17 | - defusedxml=0.6.0
18 | - deprecated=1.2.6
19 | - entrypoints=0.3
20 | - gettext=0.19.8.1
21 | - glib=2.56.2
22 | - icc_rt=2019.0.0
23 | - icu=58.2
24 | - intel-openmp=2019.4
25 | - ipykernel=5.1.2
26 | - ipython=7.8.0
27 | - ipython_genutils=0.2.0
28 | - ipywidgets=7.5.1
29 | - isort=4.3.21
30 | - jedi=0.15.1
31 | - jinja2=2.10.1
32 | - joblib=0.13.2
33 | - jpeg=9b
34 | - jsonschema=3.0.2
35 | - jupyter=1.0.0
36 | - jupyter_client=5.3.1
37 | - jupyter_console=6.0.0
38 | - jupyter_core=4.5.0
39 | - lazy-object-proxy=1.4.3
40 | - libblas=3.8.0
41 | - libcblas=3.8.0
42 | - libffi=3.2.1
43 | - libiconv=1.15
44 | - liblapack=3.8.0
45 | - libpng=1.6.37
46 | - libsodium=1.0.16
47 | - m2w64-gcc-libgfortran=5.3.0
48 | - m2w64-gcc-libs=5.3.0
49 | - m2w64-gcc-libs-core=5.3.0
50 | - m2w64-gmp=6.1.0
51 | - m2w64-libwinpthread-git=5.0.0.4634.697f757
52 | - markupsafe=1.1.1
53 | - mccabe=0.6.1
54 | - mistune=0.8.4
55 | - mkl=2019.4
56 | - mkl-service=2.3.0
57 | - mkl_fft=1.0.14
58 | - mkl_random=1.0.2
59 | - msys2-conda-epoch=20160418
60 | - nbconvert=5.5.0
61 | - nbformat=4.4.0
62 | - notebook=6.0.1
63 | - numpy=1.16.5
64 | - numpy-base=1.16.5
65 | - openssl=1.1.1g
66 | - pandas=0.25.3
67 | - pandoc=2.2.3.2
68 | - pandocfilters=1.4.2
69 | - parso=0.5.1
70 | - pcre=8.43
71 | - pexpect=4.7.0
72 | - pickleshare=0.7.5
73 | - pip=19.2.2
74 | - prometheus_client=0.7.1
75 | - prompt_toolkit=2.0.9
76 | - ptyprocess=0.6.0
77 | - py4j=0.10.7
78 | - pycparser=2.19
79 | - pygments=2.4.2
80 | - pylint=2.4.4
81 | - pynacl=1.3.0
82 | - pyqt=5.9.2
83 | - pyrsistent=0.14.11
84 | - pyspark=2.4.4
85 | - python=3.7.4
86 | - python-dateutil=2.8.0
87 | - python_abi=3.7
88 | - pytz=2019.2
89 | - pywinpty=0.5.7
90 | - pyzmq=18.1.0
91 | - qt=5.9.7
92 | - qtconsole=4.5.5
93 | - scikit-learn=0.21.3
94 | - scipy=1.3.1
95 | - send2trash=1.5.0
96 | - setuptools=41.0.1
97 | - sip=4.19.8
98 | - six=1.12.0
99 | - sqlite=3.29.0
100 | - terminado=0.8.2
101 | - testpath=0.4.2
102 | - tk=8.6.8
103 | - tornado=6.0.3
104 | - traitlets=4.3.2
105 | - typed-ast=1.4.1
106 | - vc=14.1
107 | - vs2015_runtime=14.16.27012
108 | - wcwidth=0.1.7
109 | - webencodings=0.5.1
110 | - wheel=0.33.4
111 | - widgetsnbextension=3.5.1
112 | - wincertstore=0.2
113 | - winpty=0.4.3
114 | - wrapt=1.11.2
115 | - xz=5.2.4
116 | - zeromq=4.3.1
117 | - zlib=1.2.11
118 | - pip:
119 | - cryptography==2.9.2
120 | prefix: Anaconda3\envs\privacyFL
121 |
--------------------------------------------------------------------------------
/src/Serverless/client_agent_serverless.py:
--------------------------------------------------------------------------------
1 | import sys
2 |
3 | sys.path.append('..')
4 |
5 | import copy
6 | import numpy as np
8 | import random
9 | import config
10 | import threading
11 | from warnings import simplefilter
12 | from datetime import datetime
13 | from sklearn import metrics
14 |
15 |
16 | from agent import Agent
17 | from message import Message
18 | from utils.dp_mechanisms import laplace
19 | import utils.diffie_hellman as dh
21 | from sklearn.pipeline import Pipeline
22 | from sklearn.linear_model import SGDClassifier
23 | from sklearn.preprocessing import MinMaxScaler
24 |
25 | from pyspark.ml.classification import LogisticRegression
26 |
27 | simplefilter(action='ignore', category=FutureWarning)
28 |
29 |
30 | class ClientAgentServerless(Agent):
31 | def __init__(self, agent_number, train_datasets, evaluator, active_clients):
32 | """
33 | Initializes an instance of client agent
34 |
35 | :param agent_number: id for agent
36 | :type agent_number: int
37 | :param train_datasets: dictionary mapping iteration to dataset for given iteration
38 | :type train_datasets: dictionary indexed by ints mapping to pyspark dataframes
39 | :param evaluator: evaluator instance used to evaluate new weights
40 | :type evaluator: evaluator, defined in parallelized.py
41 | :param active_clients: Clients currently in simulation. Will be updated if clients drop out
42 |         :type active_clients: set
43 | """
44 | super(ClientAgentServerless, self).__init__(agent_number=agent_number, agent_type="client_agent")
45 |
46 | self.train_datasets = train_datasets
47 | self.evaluator = evaluator
48 | self.active_clients = active_clients
49 |
50 | self.directory = None
51 | self.pubkeyList = None
52 | self.seckeyList = None
53 | self.otherkeyList = None
54 | self.commonkeyList = None
55 | self.seeds = None
56 | self.deltas = None
57 |
58 | self.computation_times = {}
59 |
60 | self.personal_weights = {} # personal weights. Maps iteration (int) to weights (numpy array)
61 | self.personal_intercepts = {}
62 | self.weights_dp_noise = {} # keyed by iteration; noise added at each iteration
63 | self.intercepts_dp_noise = {}
64 |
65 | self.other_client_weights = {}
66 | self.other_client_intercepts = {}
67 |
68 | self.federated_weights = {} # averaged weights
69 | self.federated_intercepts = {}
70 | self.personal_accuracy = {}
71 | self.federated_accuracy = {}
72 |
73 | def initializations(self):
74 | """
75 |         Performs initializations that must be done after the instance is created
76 | :return: None
77 | :rtype: None
78 | """
79 | assert (self.directory is not None)
80 | clients = self.directory.clients
81 | num_clients = len(clients)
82 |
83 | pubkeyList, seckeyList = dh.keygeneration(num_clients, self.agent_number)
84 |
85 | # note this works because dicts are ordered in Python 3.6+
86 | self.pubkeyList = dict(zip(clients.keys(), pubkeyList))
87 | self.seckeyList = dict(zip(clients.keys(), seckeyList))
88 |
89 | # these dictionaries will be populated after key exchange
90 |
91 | self.otherkeyList = {agent_name: None for agent_name, __ in clients.items()}
92 | self.otherkeyList[self.name] = 0 # set to zero for yourself!
93 |
94 | self.commonkeyList = {agent_name: None for agent_name, __ in clients.items()}
95 | self.commonkeyList[self.name] = 0
96 |
97 | self.seeds = {agent_name: None for agent_name, __ in clients.items()}
98 | self.seeds[self.name] = 0
99 |
100 | self.deltas = {agent_name: None for agent_name, __ in clients.items()}
101 | self.deltas[self.name] = 0
102 |
103 | def send_pubkeys(self):
104 | """
105 |         Sends public keys to the other clients in the simulation, as required by the Diffie-Hellman protocol.
106 | """
107 | for agent_name, agent in self.directory.clients.items():
108 | pubkey = self.pubkeyList[agent_name] # retrieve pubkey for client we're sending to
109 | body = {'pubkey': pubkey}
110 | msg = Message(sender_name=self.name, recipient_name=agent_name, body=body)
111 | agent.receive_pubkey(msg) # invoke method of receiving agent
112 |
113 | def receive_pubkey(self, message):
114 | """
115 | Receives public key from another client
116 | :param message: message containing pubkey from another client
117 | :type message: instance of Message defined in message.py
118 | """
119 | sender = message.sender
120 | body = message.body
121 | pubkey = body["pubkey"]
122 | self.otherkeyList[sender] = pubkey
123 |
124 | def initialize_common_keys(self):
125 | """
126 | Initializes common key list to be used as offsets for sending weights
127 | """
128 | pubkeyList = list(self.pubkeyList.values())
129 | seckeyList = list(self.seckeyList.values())
130 | otherkeyList = list(self.otherkeyList.values())
131 | commonkeyList = dh.keyexchange(len(self.directory.clients), self.agent_number, pubkeyList, seckeyList,
132 | otherkeyList) # generates common keys
133 | for i, agent in enumerate(self.commonkeyList):
134 | self.commonkeyList[agent] = commonkeyList[i]
135 |
136 | self.update_deltas() # this method generates seeds and deltas from the common keys
137 |
138 | def produce_weights(self, iteration, lock):
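        """
        Trains this client's model for the given iteration and sends the (optionally
        noised and masked) weights directly to the other active clients.
        :param iteration: current iteration, indexed from 1
        :param lock: lock shared across clients, guarding the random seed while DP noise is added
        """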
139 | start_time = datetime.now()
140 |
141 |         if iteration > len(self.train_datasets):  # iteration is indexed starting from 1
142 |             raise ValueError(
143 |                 'Not enough data to support iteration {}. Either change the iteration data length in config.py or decrease the number of iterations.'.format(
144 |                     iteration))
145 |
146 | if config.USING_PYSPARK:
147 | weights, intercepts = self.compute_weights_pyspark(iteration)
148 | else:
149 | weights, intercepts = self.compute_weights_sklearn(iteration)
150 |
151 | self.personal_weights[iteration] = weights
152 | self.personal_intercepts[iteration] = intercepts
153 |
154 | # create copies of weights and intercepts since we may be adding to them
155 | final_weights, final_intercepts = copy.deepcopy(weights), copy.deepcopy(intercepts)
156 |
157 | if config.USE_DP_PRIVACY:
158 | lock.acquire() # for random seed
159 | final_weights, final_intercepts = \
160 | self.add_noise(weights=weights, intercepts=intercepts, iteration=iteration)
161 | lock.release()
162 |
163 |         if config.USE_SECURITY:  # adding security via Diffie-Hellman offsets
164 | final_weights, final_intercepts = \
165 | self.add_security_offsets(weights=final_weights, intercepts=final_intercepts)
166 | end_time = datetime.now()
167 | computation_time = end_time - start_time
168 | self.computation_times[iteration] = computation_time
169 | # multiply latency by two: first the server has to request the value, then the client has to return it
170 |
171 | body = {'weights': final_weights, 'intercepts': final_intercepts, 'iteration': iteration} # generate body
172 |
173 | for client_name in self.active_clients:
174 | client_agent = self.directory.clients[client_name]
175 | if client_name != self.name:
176 | message = Message(sender_name=self.name, recipient_name=client_name, body=body)
177 | client_agent.receive_weights(message)
178 |
179 | return None
180 |
181 | def compute_weights_pyspark(self, iteration):
182 | '''
183 | Corresponds to algorithm 2 in the paper
184 | '''
185 | dataset = self.train_datasets[iteration]
186 | lr = LogisticRegression(maxIter=config.LOG_MAX_ITER)
187 | lrModel = lr.fit(dataset)
188 |
189 | weights = lrModel.coefficientMatrix.toArray()
190 | intercepts = lrModel.interceptVector
191 | return weights, intercepts
192 |
193 | def compute_weights_sklearn(self, iteration):
194 | '''
195 |         Corresponds to algorithm 1 in the paper
196 | '''
197 | X, y = self.train_datasets[iteration]
198 |
199 | lr = SGDClassifier(alpha=0.0001, loss="log", random_state=config.RANDOM_SEEDS[self.name][iteration])
200 |
201 | # Assign prev round coefficients
202 | if iteration > 1:
203 | federated_weights = copy.deepcopy(self.federated_weights[iteration - 1])
204 | federated_intercepts = copy.deepcopy(self.federated_intercepts[iteration - 1])
205 | else:
206 | federated_weights = None
207 | federated_intercepts = None
208 |
209 | lr.fit(X, y, coef_init=federated_weights, intercept_init=federated_intercepts)
210 | local_weights = lr.coef_
211 | local_intercepts = lr.intercept_
212 |
213 | return local_weights, local_intercepts
214 |
215 | def add_noise(self, weights, intercepts, iteration):
216 | # preparing value to send to server by adding deltas and DP noise
217 |
218 | weights_shape = weights.shape
219 | weights_dp_noise = np.zeros(weights_shape)
220 |
221 | intercepts_shape = intercepts.shape
222 | intercepts_dp_noise = np.zeros(intercepts_shape)
223 |
224 | # generate DP parameters
225 | active_clients_lens = [config.LENS_PER_ITERATION[client_name] for client_name in self.active_clients]
226 |
227 | smallest_dataset = min(active_clients_lens)
228 | if config.USING_CUMULATIVE:
229 | smallest_dataset *= iteration
230 |
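        # The sensitivity below uses the bound 2 / (num_active_clients * smallest_dataset * alpha),
        # where alpha is the regularization parameter set in config.py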
231 | sensitivity = 2 / (
232 | len(self.active_clients) * smallest_dataset * config.alpha)
233 | epsilon = config.EPSILONS[self.name]
234 |
235 | random.seed(config.RANDOM_SEEDS[self.name][iteration])
236 | # adding differentially private noise
237 |         for i in range(weights_shape[0]):  # weights is a 2-D array
238 | for j in range(weights_shape[1]):
239 | if config.DP_ALGORITHM == 'Laplace':
240 | dp_noise = laplace(mean=config.mean, sensitivity=sensitivity, epsilon=epsilon)
241 | elif config.DP_ALGORITHM == 'Gamma':
242 | scale = sensitivity / epsilon
243 | num_clients = len(self.directory.clients)
244 | dp_noise = random.gammavariate(1 / num_clients, scale) - random.gammavariate(1 / num_clients,
245 | scale)
246 | else:
247 | raise AssertionError('Need to specify config.DP_ALGORITHM as Laplace or Gamma')
248 | weights_dp_noise[i][j] = dp_noise
249 |
250 | if config.INTERCEPTS_DP_NOISE:
251 | for i in range(intercepts_shape[0]):
252 | if config.DP_ALGORITHM == 'Laplace':
253 | dp_noise = laplace(mean=config.mean, sensitivity=sensitivity, epsilon=epsilon)
254 | elif config.DP_ALGORITHM == 'Gamma':
255 | scale = sensitivity / epsilon
256 | num_clients = len(self.directory.clients)
257 | dp_noise = random.gammavariate(1 / num_clients, scale) - random.gammavariate(1 / num_clients, scale)
258 | else:
259 | raise AssertionError('Need to specify config.DP_ALGORITHM as Laplace or Gamma')
260 | intercepts_dp_noise[i] = dp_noise
261 |
262 | weights_with_noise = copy.deepcopy(weights) # make a copy to not mutate weights
263 | intercepts_with_noise = copy.deepcopy(intercepts)
264 |
265 | self.weights_dp_noise[iteration] = weights_dp_noise
266 | weights_with_noise += weights_dp_noise
267 | self.intercepts_dp_noise[iteration] = intercepts_dp_noise
268 | intercepts_with_noise += intercepts_dp_noise
269 | return weights_with_noise, intercepts_with_noise
270 |
271 | def add_security_offsets(self, weights, intercepts):
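        """
        Called if the config.USE_SECURITY flag is on. Uses the offsets established by the
        Diffie-Hellman key exchange to mask weights and intercepts: client i adds the offset
        shared with client j when i < j and subtracts it when i > j, so the masks cancel
        when the clients' contributions are averaged.
        :return: weights, intercepts
        :rtype: numpy array, numpy array
        """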
272 | adding = True # Controls flow of loop. When other agent number is greater, subtract offset instead of add it
273 | for agent_name, offset in self.deltas.items(): # dictionary but should be ordered since Python 3
274 | if agent_name == self.name:
275 | adding = False # from here on out subtract offsets for next clients
276 | elif agent_name in self.active_clients:
277 |                 if adding:
278 | weights += offset
279 | intercepts += offset
280 | else:
281 | weights -= offset
282 | intercepts -= offset
283 | else:
284 | # client no longer in simulation so don't add offset
285 | pass
286 | self.update_deltas() # update the deltas after using them
287 |
288 | return weights, intercepts
289 |
290 | def update_deltas(self):
291 | """
292 |         Updates the per-client offsets (deltas) and seeds. On the first call, the Diffie-Hellman common keys serve as the initial seeds; on later calls, each existing seed is split into an offset and a new seed.
293 | """
294 |
295 | if None not in self.commonkeyList.values(): # if first time calling this function
296 | agents_and_seeds = self.commonkeyList.items()
297 | self.commonkeyList = self.commonkeyList.fromkeys(self.commonkeyList.keys(), None)
298 | else:
299 |             # use existing seeds to generate new seeds and offsets
300 | agents_and_seeds = self.seeds.items()
301 |
302 | for agent, seed in agents_and_seeds:
303 | # uses current seeds to generate new deltas and new seeds
304 | if agent != self.name:
305 | seed_b = bin(seed) # cast to binary
306 | delta_b = seed_b[:20]
307 | delta = int(delta_b, 2) # convert back to decimal from base 2
308 |
309 | seed_b = seed_b[20:]
310 | seed = int(seed_b, 2)
311 | random.seed(seed) # generate new seed
312 | seed = random.randint(-sys.maxsize, sys.maxsize)
313 | self.seeds[agent] = seed
314 | self.deltas[agent] = delta
315 |
316 | def receive_weights(self, message):
317 | body = message.body
318 | iteration, client_weight, client_intercept = body['iteration'], body['weights'], body[
319 | 'intercepts']
320 | if iteration not in self.other_client_weights:
321 | self.other_client_weights[iteration] = {}
322 | self.other_client_intercepts[iteration] = {}
323 |
324 | self.other_client_weights[iteration][message.sender] = client_weight
325 | self.other_client_intercepts[iteration][message.sender] = client_intercept
326 | if len(self.other_client_weights[iteration]) == len(self.active_clients) - 1: # -1 because of yourself
327 | self.federate_weights(iteration)
328 | return None
329 |
330 | def federate_weights(self, iteration):
331 | while iteration not in self.personal_weights.keys():
332 | pass # waiting until you've also finished computation
333 |
334 | all_weights = list(self.other_client_weights[iteration].values())
335 | all_weights.append(self.personal_weights[iteration])
336 |
337 | all_intercepts = list(self.other_client_intercepts[iteration].values())
338 | all_intercepts.append(self.personal_intercepts[iteration])
339 |
340 |         federated_weights = np.average(all_weights, axis=0)  # the weights for this iteration!
341 | federated_intercepts = np.average(all_intercepts, axis=0)
342 | self.federated_weights[iteration] = federated_weights
343 | self.federated_intercepts[iteration] = federated_intercepts
344 |
345 | personal_weights = self.personal_weights[iteration]
346 | personal_intercepts = self.personal_intercepts[iteration]
347 |
348 | personal_accuracy = self.evaluator.accuracy(personal_weights, personal_intercepts)
349 | federated_accuracy = self.evaluator.accuracy(federated_weights, federated_intercepts)
350 |
351 | self.personal_accuracy[iteration] = personal_accuracy
352 | self.federated_accuracy[iteration] = federated_accuracy
353 |
354 | args = [self.name, iteration, personal_accuracy, federated_accuracy]
355 | iteration_report = 'Performance Metrics for {} on iteration {} \n' \
356 | '------------------------------------------- \n' \
357 | 'Personal accuracy: {} \n' \
358 | 'Federated accuracy: {} \n'
359 | print(iteration_report.format(*args))
360 |
361 | return None
362 |
363 | def final_statistics(self):
364 |         print("PERSONAL ACCURACY IS {}".format(self.personal_accuracy))
365 |         print("FEDERATED ACCURACY IS {}".format(self.federated_accuracy))
368 |
--------------------------------------------------------------------------------
/src/Serverless/config.py:
--------------------------------------------------------------------------------
1 | """
2 | Config file for simulation. Accessed by most files.
3 | Configuration options included:
4 | - Simulation Parameters
5 | - Toggling Security
6 | - Toggling DP Privacy
7 | - Toggling Client Dropout
8 |     - Setting latency values
9 | """
10 |
11 |
12 | """Imports"""
13 | from datetime import timedelta
14 | import random
15 |
16 | """Simulation Parameters"""
17 | # NUM_CLIENTS: Number of client agents
18 | NUM_CLIENTS = 3
19 | client_names = ['client_agent' + str(i) for i in range(NUM_CLIENTS)]
20 | # NUM_SERVERS: More than one server will require additional coding to specify each server's behavior in the simulation
21 | NUM_SERVERS = 1
22 | # ITERATIONS: How many iterations to run simulation for
23 | ITERATIONS = 4
24 | # len_per_iteration: how many new datapoints each client gets per iteration. On iteration i (indexed from 1), each client has i * len_per_iteration samples
25 | len_per_iteration = 100 # using equal size datasets for each client in this example
26 | LENS_PER_ITERATION = {client_name: len_per_iteration for client_name in client_names}
27 |
28 | # LEN_TEST: Length of test dataset. Note whole dataset length is 1797
29 | LEN_TEST = 300
30 |
31 | VERBOSITY = 1 # 1 to print out the result of each iteration
32 |
33 | """Pyspark"""
34 | # NOTE: As currently implemented, both of these should be False to use Algorithm 1, and both should be True to use Algorithm 2.
35 | # Using Algorithm 2 without cumulative datasets means the weights from your previous iterations don't end up getting used.
36 | USING_PYSPARK = True
37 | USING_CUMULATIVE = True
38 |
39 | """Security"""
40 | # USE_SECURITY: Implements Diffie-Hellman key exchange for added security. Slows runtime slightly, but has no effect on accuracy
41 | USE_SECURITY = True
42 |
43 | """Differential Privacy"""
44 | # USE_DP_PRIVACY: Whether to implement differential privacy functionality. Defaults to Laplacian noise.
45 | USE_DP_PRIVACY = False
46 | # SUBTRACT_DP_NOISE: Use more advanced version of protocol which has each client subtract the DP noise it added from the federated model it receives
47 | SUBTRACT_DP_NOISE = False # Subtract your own DP noise from federated model to increase accuracy
48 | assert not SUBTRACT_DP_NOISE or USE_DP_PRIVACY  # only subtract DP noise if it is being added to begin with
50 |
51 | INTERCEPTS_DP_NOISE = False # Add DP noise to intercepts (for Logistic Regression example)
52 | # DP_ALGORITHM: either Laplace or Gamma. Can easily add more in client agent's code
53 |
54 | DP_ALGORITHM = "Gamma"
55 |
56 | # DP Privacy Parameters
57 | epsilon = 1.0 # smaller epsilon --> more noise/less accuracy
58 | # can make each client's epsilon different if desired
59 | EPSILONS = {client_name: epsilon for client_name in client_names}
60 | alpha = 1
61 | mean = 0
62 |
63 | """Client Dropout"""
64 | # CLIENT_DROPOUT: When True, clients drop out of the simulation when their personal weights are within tolerance of the federated weights
65 | CLIENT_DROPOUT = True
66 | tolerance = 20.0 # note this value should change depending on whether you are normalizing
67 |
68 | """Latency"""
69 | SIMULATE_LATENCIES = True
70 | # Define any agent-agent communication latencies here. If none is provided, defaults to zero.
71 |
72 |
73 | LATENCY_DICT = {}
74 | # fill in the rest with zeros:
75 | if 'server_agent0' not in LATENCY_DICT.keys():
76 | LATENCY_DICT['server_agent0'] = {}
77 |
78 | for client_name in client_names:
79 | if client_name not in LATENCY_DICT.keys():
80 | LATENCY_DICT[client_name] = {client_name2: timedelta(seconds=0.1) for client_name2 in client_names}
81 | LATENCY_DICT[client_name]['server_agent0'] = timedelta(seconds=0.1)
82 | LATENCY_DICT['server_agent0'][client_name] = timedelta(seconds=0.1)
83 |
84 | LATENCY_DICT['client_agent1'] = {client_name: timedelta(seconds=2.0) for client_name in client_names}
85 | LATENCY_DICT['client_agent1']['server_agent0'] = timedelta(seconds=2.0)
86 | LATENCY_DICT['server_agent0']['client_agent1'] = timedelta(seconds=2.0)
87 |
88 | LATENCY_DICT['client_agent0']['server_agent0'] = timedelta(seconds=0.3)
89 | LATENCY_DICT['server_agent0']['client_agent0'] = timedelta(seconds=0.3)
90 |
91 | """
92 | Additional Constants: likely won't need modification
93 | """
94 | # LOG_MAX_ITER: max iterations for the logistic regression
95 | LOG_MAX_ITER = 10
96 | random.seed(0)
97 | # RANDOM_SEEDS: required for reproducibility of simulation. Seeds every iteration of the training for each client
98 | RANDOM_SEEDS = {client_name: list(random.sample(range(0, 1000000), 100)) for client_name in client_names}
99 |
--------------------------------------------------------------------------------
/src/Serverless/directory_serverless.py:
--------------------------------------------------------------------------------
1 | class Directory:
2 | """
3 | Contains mappings for client and server names to instances in simulation
4 | """
5 | def __init__(self, clients):
6 | self.clients = clients
7 |
--------------------------------------------------------------------------------
/src/Serverless/initializer_serverless.py:
--------------------------------------------------------------------------------
1 | """
2 | NOTE: The Serverless directory was written as a proof of concept extension of our library
3 | that involves no servers, with clients communicating with each other directly.
4 | """
5 |
6 |
7 |
8 | import sys
9 | sys.path.append('..')
10 |
11 | import config
12 | import datetime
13 | import numpy as np
14 | import pickle
15 |
16 | from client_agent_serverless import ClientAgentServerless
18 | from pyspark.sql import SparkSession
19 | from sklearn.datasets import load_digits
20 | from directory_serverless import Directory
21 |
22 |
23 | from utils import data_formatting
24 |
25 | import multiprocessing
26 | from multiprocessing.pool import ThreadPool
27 | from message import Message
28 |
29 | from utils.print_config import print_config
30 | from utils.model_evaluator import ModelEvaluator
31 |
32 |
33 |
34 | def client_computation_caller(inp):
35 | client_instance, iteration, lock = inp
36 | __ = client_instance.produce_weights(iteration, lock)
37 | return None
38 |
39 | class InitializerServerless:
40 | def __init__(self, num_clients, iterations):
41 | """
42 |         Offline stage of the simulation. Initializes the clients and gives each client its data.
43 |         :param num_clients: number of clients to be used for the simulation
44 |         :param iterations: number of iterations to run the simulation for
46 | """
48 | if config.USING_PYSPARK:
49 | spark = SparkSession.builder.appName('SecureFederatedLearning').getOrCreate() # initialize spark session
50 |             spark.sparkContext.setLogLevel("ERROR")  # suppress Spark's log messages
51 |
52 | digits = load_digits() # using sklearn's MNIST dataset
53 | X, y = digits.data, digits.target
54 |
55 | X_train, X_test = X[:-config.LEN_TEST], X[-config.LEN_TEST:]
56 | y_train, y_test = y[:-config.LEN_TEST], y[-config.LEN_TEST:]
57 |
58 | # extract only amount that we require
59 | number_of_samples = 0
60 | for client_name in config.client_names:
61 | len_per_iteration = config.LENS_PER_ITERATION[client_name]
62 | number_of_samples += len_per_iteration * iterations
63 |
64 | X_train = X_train[:number_of_samples]
65 | y_train = y_train[:number_of_samples]
66 |
67 | client_to_datasets = data_formatting.partition_data(X_train, y_train, config.client_names, iterations,
68 | config.LENS_PER_ITERATION, cumulative=config.USING_CUMULATIVE, pyspark=config.USING_PYSPARK)
69 |
70 | #print_config(len_per_iteration=config.LEN_PER_ITERATION)
71 | print('\n \n \nSTARTING SIMULATION \n \n \n')
72 |
73 | active_clients = {'client_agent' + str(i) for i in range(num_clients)}
74 | self.clients = {
75 | 'client_agent' + str(i): ClientAgentServerless(agent_number=i,
76 | train_datasets=client_to_datasets['client_agent' + str(i)],
77 | evaluator=ModelEvaluator(X_test, y_test),
78 | active_clients=active_clients) for i in
79 | range(num_clients)} # initialize the agents
80 |
81 | # create directory with mappings from names to instances
82 | self.directory = Directory(clients=self.clients)
83 |
84 | for agent_name, agent in self.clients.items():
85 | agent.set_directory(self.directory)
86 | agent.initializations()
87 |
88 |         # OFFLINE Diffie-Hellman key exchange
89 |         # NOTE: this is sequential in implementation, but simulated as occurring in parallel
90 | if config.USE_SECURITY:
91 | key_exchange_start = datetime.datetime.now() # measuring how long the python script takes
92 | max_latencies = []
93 | for client_name, client in self.clients.items():
94 | # not including logic of sending/receiving public keys in latency computation since it is nearly zero
95 | client.send_pubkeys()
96 | max_latency = max(config.LATENCY_DICT[client_name].values())
97 | max_latencies.append(max_latency)
98 | simulated_time = max(max_latencies)
99 |
100 | key_exchange_end = datetime.datetime.now() # measuring runtime
101 | key_exchange_duration = key_exchange_end - key_exchange_start
102 | simulated_time += key_exchange_duration
103 | if config.SIMULATE_LATENCIES:
104 | print(
105 |                 'Diffie-Hellman key exchange simulated duration: {}\nDiffie-Hellman key exchange real run-time: {}\n'.format(
106 | simulated_time, key_exchange_duration))
107 |
108 | for client_name, client in self.clients.items():
109 | client.initialize_common_keys()
110 |
111 |
112 | def request_values(self, num_iterations):
113 | """
114 |         Method invoked to start the simulation. Prints out the accuracy for each client on each
115 |         iteration (what the weights would be if not for the simulation) and the federated accuracy.
116 |         :param num_iterations: number of iterations to run
117 | """
118 | for i in range(1, num_iterations+1):
119 | m = multiprocessing.Manager()
120 | lock = m.Lock()
121 | with ThreadPool(len(self.clients)) as calling_pool:
122 | args = []
123 | for client_instance in self.clients.values():
124 | args.append((client_instance, i, lock))
125 | __ = calling_pool.map(client_computation_caller, args)
126 |
127 |
128 | def run_simulation(self, num_iterations):
129 | """
130 | Online stage of simulation.
131 | :param num_iterations: number of iterations to run
133 | """
134 | # ONLINE
135 | self.request_values(num_iterations)
136 | for client_name, client_agent in self.directory.clients.items():
137 | client_agent.final_statistics()
138 |
139 |
--------------------------------------------------------------------------------
/src/Serverless/run_simulation_serverless.py:
--------------------------------------------------------------------------------
1 | import sys
2 | sys.path.append('..')
3 |
4 | import random
5 | import warnings
6 | import datetime
7 | import config
8 | import numpy as np
9 |
10 | from initializer_serverless import InitializerServerless
11 |
12 |
13 | if __name__ == '__main__':
14 | random.seed(0)
15 | np.random.seed(0)
16 | initializer = InitializerServerless(num_clients=config.NUM_CLIENTS, iterations=config.ITERATIONS)
17 | # can use any amount of iterations less than config.ITERATIONS but the
18 | # initializer has only given each client config.ITERATIONS datasets for training.
19 | a = datetime.datetime.now()
20 | initializer.run_simulation(config.ITERATIONS)
21 |     b = datetime.datetime.now()
22 |     print('Total simulation runtime: {}'.format(b - a))
--------------------------------------------------------------------------------
/src/agent.py:
--------------------------------------------------------------------------------
1 | from abc import ABC, abstractmethod
2 |
3 |
4 | class Agent(ABC):
5 | def __init__(self, agent_number, agent_type):
6 | self.agent_number = agent_number
7 |         self.agent_type = agent_type  # e.g. 'client_agent' or 'server_agent'
8 | self.directory = None # will be added after initialization
9 |
10 | @property
11 | def name(self):
12 | return str(self.agent_type) + str(self.agent_number)
13 |
14 | def set_directory(self, directory):
15 | self.directory = directory
16 |
--------------------------------------------------------------------------------
/src/client_agent.py:
--------------------------------------------------------------------------------
1 | import copy
2 | import numpy as np
3 | import sys
4 | import random
5 | import config
6 | import threading
7 | from warnings import simplefilter
8 | from datetime import datetime
9 | from sklearn import metrics
10 |
11 | from agent import Agent
12 | from message import Message
13 | from utils.dp_mechanisms import laplace
14 | import utils.diffie_hellman as dh
15 |
16 | from sklearn.linear_model import SGDClassifier
17 |
18 |
19 | simplefilter(action='ignore', category=FutureWarning)
20 |
21 | class ClientAgent(Agent):
22 | def __init__(self, agent_number, train_datasets, evaluator, active_clients):
23 | """
24 | Initializes an instance of client agent
25 |
26 | :param agent_number: id for agent
27 | :type agent_number: int
28 | :param train_datasets: dictionary mapping iteration to dataset for given iteration
29 | :type train_datasets: dictionary indexed by ints mapping to numpy arrays
30 | :param evaluator: evaluator instance used to evaluate new weights
31 | :type evaluator: evaluator, defined in parallelized.py
32 | :param active_clients: Clients currently in simulation. Will be updated if clients drop out
33 | """
34 | super(ClientAgent, self).__init__(agent_number=agent_number, agent_type="client_agent")
35 |
36 | self.train_datasets = train_datasets
37 | self.evaluator = evaluator
38 | self.active_clients = active_clients
39 |
40 | self.directory = None
41 | self.pubkeyList = None
42 | self.seckeyList = None
43 | self.otherkeyList = None
44 | self.commonkeyList = None
45 | self.seeds = None
46 | self.deltas = None
47 |
48 | self.computation_times = {}
49 |
50 | self.personal_weights = {} # personal weights. Maps iteration (int) to weights (numpy array)
51 | self.personal_intercepts = {}
52 |
53 | self.weights_dp_noise = {} # keyed by iteration; noise added at each iteration
54 | self.intercepts_dp_noise = {}
55 |
56 | self.federated_weights = {} # averaged weights
57 | self.federated_intercepts = {}
58 |
59 | self.personal_accuracy = {}
60 | self.federated_accuracy = {}
61 |
62 | def initializations(self):
63 | """
64 |         Performs initializations that must be done after the instance is created
65 | :return: None
66 | :rtype: None
67 | """
68 | assert (self.directory is not None)
69 | clients = self.directory.clients
70 | num_clients = len(clients)
71 |
72 | pubkeyList, seckeyList = dh.keygeneration(num_clients, self.agent_number)
73 |
74 | # note this works because dicts are ordered in Python 3.6+
75 | self.pubkeyList = dict(zip(clients.keys(), pubkeyList))
76 | self.seckeyList = dict(zip(clients.keys(), seckeyList))
77 |
78 | # these dictionaries will be populated after key exchange
79 |
80 | self.otherkeyList = {agent_name: None for agent_name, __ in clients.items()}
81 | self.otherkeyList[self.name] = 0 # set to zero for yourself!
82 |
83 | self.commonkeyList = {agent_name: None for agent_name, __ in clients.items()}
84 | self.commonkeyList[self.name] = 0
85 |
86 | self.seeds = {agent_name: None for agent_name, __ in clients.items()}
87 | self.seeds[self.name] = 0
88 |
89 | self.deltas = {agent_name: None for agent_name, __ in clients.items()}
90 | self.deltas[self.name] = 0
91 |
92 | def send_pubkeys(self):
93 | """
94 |         Sends public keys to the other clients in the simulation, as required by the Diffie-Hellman protocol.
95 | """
96 | for agent_name, agent in self.directory.clients.items():
97 | pubkey = self.pubkeyList[agent_name] # retrieve pubkey for client we're sending to
98 | body = {'pubkey': pubkey}
99 | msg = Message(sender_name=self.name, recipient_name=agent_name, body=body)
100 | agent.receive_pubkey(msg) # invoke method of receiving agent
101 |
102 | def receive_pubkey(self, message):
103 | """
104 | Receives public key from another client
105 | :param message: message containing pubkey from another client
106 | :type message: instance of Message defined in message.py
107 | """
108 | sender = message.sender
109 | body = message.body
110 | pubkey = body["pubkey"]
111 | self.otherkeyList[sender] = pubkey
112 |
113 | def initialize_common_keys(self):
114 | """
115 | Initializes common key list to be used as offsets for sending weights
116 | """
117 | pubkeyList = list(self.pubkeyList.values())
118 | seckeyList = list(self.seckeyList.values())
119 | otherkeyList = list(self.otherkeyList.values())
120 | commonkeyList = dh.keyexchange(len(self.directory.clients), self.agent_number, pubkeyList, seckeyList,
121 | otherkeyList) # generates common keys
122 | for i, agent in enumerate(self.commonkeyList):
123 | self.commonkeyList[agent] = commonkeyList[i]
124 |
125 | self.update_deltas() # this method generates seeds and deltas from the common keys
126 |
127 | def produce_weights(self, message):
128 | """
129 | :param message: message containing information necessary to produce weights for the iteration
130 | :type message: Message
131 | :return: message containing weights with security and/or DP noise added, as specified in config.py
132 | :rtype: Message
133 | """
134 | start_time = datetime.now()
135 | body = message.body
136 | iteration, lock, simulated_time = body['iteration'], body['lock'], body['simulated_time']
137 |
138 |         if iteration > len(self.train_datasets):  # iteration is indexed starting from 1
139 |             raise ValueError(
140 |                 'Not enough data to support iteration {}. Either change the iteration data length in config.py or decrease the number of iterations.'.format(
141 |                     iteration))
142 |
143 |         if config.USING_CUMULATIVE:  # choosing between algorithms
144 |             # datasets are cumulative, so compute weights from scratch
145 |             weights, intercepts = self.compute_weights_cumulative(iteration)
146 |         else:
147 |             # gradient descent starting from previous iteration's federated weights
148 |             weights, intercepts = self.compute_weights_noncumulative(iteration)
149 |
150 | self.personal_weights[iteration] = weights
151 | self.personal_intercepts[iteration] = intercepts
152 |
153 | # create copies of weights and intercepts since we may be adding to them
154 | final_weights, final_intercepts = copy.deepcopy(weights), copy.deepcopy(intercepts)
155 |
156 | if config.USE_DP_PRIVACY:
157 | lock.acquire() # for random seed
158 | final_weights, final_intercepts = \
159 | self.add_noise(weights=weights, intercepts=intercepts, iteration=iteration)
160 | lock.release()
161 |
162 |         if config.USE_SECURITY:  # adding security via Diffie-Hellman offsets
163 | final_weights, final_intercepts = \
164 | self.add_security_offsets(weights=final_weights, intercepts=final_intercepts)
165 | end_time = datetime.now()
166 | computation_time = end_time - start_time
167 | self.computation_times[iteration] = computation_time
168 | # multiply latency by two: first the server has to request the value, then the client has to return it
169 |
170 | simulated_time += computation_time + config.LATENCY_DICT[self.name]['server_agent0']
171 |
172 | body = {'weights': final_weights, 'intercepts': final_intercepts, 'iter': iteration,
173 | 'computation_time': computation_time, 'simulated_time': simulated_time} # generate body
174 |
175 | return Message(sender_name=self.name, recipient_name=self.directory.server_agents, body=body)
176 |
177 | def compute_weights_cumulative(self, iteration):
178 | X, y = self.train_datasets[iteration]
179 | lr = SGDClassifier(alpha=0.0001, loss="log", random_state=config.RANDOM_SEEDS[self.name][iteration])
180 | lr.fit(X, y)
181 | local_weights = lr.coef_
182 | local_intercepts = lr.intercept_
183 | return local_weights, local_intercepts
184 |
185 | def compute_weights_noncumulative(self, iteration):
186 |
187 | """
188 | Example of a function that would compute weights. This one uses sklearn to perform
189 | logistic regression. If using this function, the datasets should not be cumulative, i.e.,
190 | the dataset in iteration i+1 should be completely new data since the training starts with the
191 | federated weights from the previous iteration. Note that if using a compute_weights function like this,
192 |         the 'personal weights' are not created with only this client's dataset, since it uses the *federated*
193 |         weights from previous iterations, which include other clients' data.
194 | :return: weights, intercepts
195 | :rtype: numpy arrays
196 | """
197 | X, y = self.train_datasets[iteration]
198 |
199 | lr = SGDClassifier(alpha=0.0001, loss="log", random_state=config.RANDOM_SEEDS[self.name][iteration])
200 |
201 | # Assign prev round coefficients
202 | if iteration > 1:
203 | federated_weights = copy.deepcopy(self.federated_weights[iteration - 1])
204 | federated_intercepts = copy.deepcopy(self.federated_intercepts[iteration - 1])
205 | else:
206 | federated_weights = None
207 | federated_intercepts = None
208 |
209 | lr.fit(X, y, coef_init=federated_weights, intercept_init=federated_intercepts)
210 | local_weights = lr.coef_
211 | local_intercepts = lr.intercept_
212 |
213 | return local_weights, local_intercepts
214 |
215 | def add_noise(self, weights, intercepts, iteration):
216 | """
217 | Adds differentially private noise to weights as specified by parameters in config.py.
218 |         Also adds noise to intercepts if INTERCEPTS_DP_NOISE is set in config.py.
219 | The sensitivity is computed using the size of the smallest dataset used by any client in this iteration.
220 |         Note that modifications to add_noise might be necessary depending on whether you are using cumulative or non-cumulative
221 | datasets.
222 | :return: weights, intercepts
223 | :rtype: numpy arrays
224 | """
225 | weights_shape = weights.shape
226 | weights_dp_noise = np.zeros(weights_shape)
227 |
228 | intercepts_shape = intercepts.shape
229 | intercepts_dp_noise = np.zeros(intercepts_shape)
230 |
231 | # generate DP parameters
232 | active_clients_lens = [config.LENS_PER_ITERATION[client_name] for client_name in self.active_clients]
233 |
234 | smallest_dataset = min(active_clients_lens)
235 | if config.USING_CUMULATIVE:
236 | smallest_dataset *= iteration
237 |
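        # The sensitivity below uses the bound 2 / (num_active_clients * smallest_dataset * alpha),
        # where alpha is the regularization parameter set in config.py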
238 | sensitivity = 2 / (
239 | len(self.active_clients) * smallest_dataset * config.alpha)
240 | epsilon = config.EPSILONS[self.name]
241 |
242 | random.seed(config.RANDOM_SEEDS[self.name][iteration])
243 | # adding differentially private noise
244 |         for i in range(weights_shape[0]):  # weights is a 2-D array
245 | for j in range(weights_shape[1]):
246 | if config.DP_ALGORITHM == 'Laplace':
247 | dp_noise = laplace(mean=config.mean, sensitivity=sensitivity, epsilon=epsilon)
248 | elif config.DP_ALGORITHM == 'Gamma':
249 | scale = sensitivity / epsilon
250 | num_clients = len(self.directory.clients)
251 | dp_noise = random.gammavariate(1 / num_clients, scale) - random.gammavariate(1 / num_clients,
252 | scale)
253 | else:
254 | raise AssertionError('Need to specify config.DP_ALGORITHM as Laplace or Gamma')
255 | weights_dp_noise[i][j] = dp_noise
256 |
257 | if config.INTERCEPTS_DP_NOISE:
258 | for i in range(intercepts_shape[0]):
259 | if config.DP_ALGORITHM == 'Laplace':
260 | dp_noise = laplace(mean=config.mean, sensitivity=sensitivity, epsilon=epsilon)
261 | elif config.DP_ALGORITHM == 'Gamma':
262 | scale = sensitivity / epsilon
263 | num_clients = len(self.directory.clients)
264 | dp_noise = random.gammavariate(1 / num_clients, scale) - random.gammavariate(1 / num_clients, scale)
265 | else:
266 | raise AssertionError('Need to specify config.DP_ALGORITHM as Laplace or Gamma')
267 | intercepts_dp_noise[i] = dp_noise
268 |
269 | weights_with_noise = copy.deepcopy(weights) # make a copy to not mutate weights
270 | intercepts_with_noise = copy.deepcopy(intercepts)
271 |
272 | self.weights_dp_noise[iteration] = weights_dp_noise
273 | weights_with_noise += weights_dp_noise
274 | self.intercepts_dp_noise[iteration] = intercepts_dp_noise
275 | intercepts_with_noise += intercepts_dp_noise
276 | return weights_with_noise, intercepts_with_noise
277 |
278 | def add_security_offsets(self, weights, intercepts):
279 | """
280 |         Called if the config.USE_SECURITY flag is on. Uses the offsets established by the Diffie-Hellman key exchange
281 | to mask weights and intercepts. Client i adds the offset established with Client j to the weights if i < j
282 | and otherwise subtracts it if i > j. If i = j, the client does not add anything since it does not have an offset
283 | with itself.
284 | :return: weights, intercepts
285 | :rtype: numpy array, numpy array
286 | """
287 | adding = True # Controls flow of loop. When other agent number is greater, subtract offset instead of add it
288 | for agent_name, offset in self.deltas.items(): # dictionary but should be ordered since Python 3
289 | if agent_name == self.name:
290 | adding = False # from here on out subtract offsets for next clients
291 | elif agent_name in self.active_clients:
292 |                 if adding:
293 | weights += offset
294 | intercepts += offset
295 | else:
296 | weights -= offset
297 | intercepts -= offset
298 | else:
299 | # client no longer in simulation so don't add offset
300 | pass
301 | self.update_deltas() # update the deltas after using them
302 |
303 | return weights, intercepts
304 |
305 | def update_deltas(self):
306 | """
307 |         Updates the seeds and offsets (deltas) shared with each other agent. Called after each iteration.
308 | """
309 | if None not in self.commonkeyList.values(): # if first time calling this function
310 | agents_and_seeds = self.commonkeyList.items()
311 | self.commonkeyList = self.commonkeyList.fromkeys(self.commonkeyList.keys(), None)
312 | else:
313 |             # use existing seeds to generate new seeds and offsets
314 | agents_and_seeds = self.seeds.items()
315 |
316 | for agent, seed in agents_and_seeds:
317 | # uses current seeds to generate new deltas and new seeds
318 | if agent != self.name:
319 | seed_b = bin(seed) # cast to binary
320 | delta_b = seed_b[:20]
321 | delta = int(delta_b, 2) # convert back to decimal from base 2
322 |
323 | seed_b = seed_b[20:]
324 | seed = int(seed_b, 2)
325 | random.seed(seed) # generate new seed
326 | seed = random.randint(-sys.maxsize, sys.maxsize)
327 | self.seeds[agent] = seed
328 | self.deltas[agent] = delta
329 |
330 | def receive_weights(self, message):
331 | """
332 | Called by server agent to return federated weights.
333 | :param message: message containing return weights and other necessary information
334 | :type message: Message
335 | :return: Message indicating whether client has converged in training this iteration, which only
336 | matters if config.CLIENT_DROPOUT is True.
337 | :rtype: Message
338 | """
339 | body = message.body
340 | iteration, return_weights, return_intercepts, simulated_time = body['iteration'], body['return_weights'], body[
341 | 'return_intercepts'], body['simulated_time']
342 |
343 | return_weights = copy.deepcopy(return_weights)
344 | return_intercepts = copy.deepcopy(return_intercepts)
345 |
346 | if config.USE_DP_PRIVACY and config.SUBTRACT_DP_NOISE:
347 | # subtract your own DP noise
348 | return_weights -= self.weights_dp_noise[iteration] / len(self.active_clients)
349 | return_intercepts -= self.intercepts_dp_noise[iteration] / len(self.active_clients)
350 |
351 | self.federated_weights[iteration] = return_weights
352 | self.federated_intercepts[iteration] = return_intercepts
353 |
354 | personal_weights = self.personal_weights[iteration]
355 | personal_intercepts = self.personal_intercepts[iteration]
356 |
357 | converged = self.satisfactory_weights((personal_weights, personal_intercepts), (
358 | return_weights, return_intercepts)) # check whether weights have converged
359 | personal_accuracy = self.evaluator.accuracy(personal_weights, personal_intercepts)
360 | federated_accuracy = self.evaluator.accuracy(return_weights, return_intercepts)
361 |
362 | self.personal_accuracy[iteration] = personal_accuracy
363 | self.federated_accuracy[iteration] = federated_accuracy
364 |
365 | args = [self.name, iteration, personal_accuracy, federated_accuracy]
366 | iteration_report = 'Performance Metrics for {} on iteration {} \n' \
367 | '------------------------------------------- \n' \
368 | 'Personal accuracy: {} \n' \
369 |                            'Federated accuracy: {} \n'
370 |
371 | if config.SIMULATE_LATENCIES:
372 | args.append(self.computation_times[iteration])
373 | iteration_report += 'Personal computation time: {} \n'
374 |
375 | args.append(simulated_time)
376 | iteration_report += 'Simulated time to receive federated weights: {} \n \n'
377 |
378 | if config.VERBOSITY:
379 | print(iteration_report.format(*args))
380 |
381 | msg = Message(sender_name=self.name, recipient_name=self.directory.server_agents,
382 | body={'converged': converged,
383 | 'simulated_time': simulated_time + config.LATENCY_DICT[self.name]['server_agent0']})
384 | return msg
385 |
386 | def satisfactory_weights(self, personal, federated):
387 | """
388 | Private function to check convergence of weights
389 |         :param personal: personal weights and personal intercepts
390 | :type personal: tuple of numpy arrays
391 | :param federated: federated weights and federated intercepts
392 | :type federated: tuple of numpy arrays
393 | :return: True if converged.
394 | :rtype: Bool
395 | """
396 | personal_weights, personal_intercepts = personal
397 | federated_weights, federated_intercepts = federated
398 |
399 | weights_differences = np.abs(federated_weights - personal_weights)
400 | intercepts_differences = np.abs(federated_intercepts - personal_intercepts)
401 | return (weights_differences < config.tolerance).all() and (
402 | intercepts_differences < config.tolerance).all() # check all weights are close enough
403 |
404 | def remove_active_clients(self, message):
405 | """
406 | Method invoked by server agent when clients have dropped out.
407 |         If another client has dropped out, this client needs to know so that
408 |         it knows not to add that client's security offset, and so that it can dynamically recompute
409 | the differential privacy parameters.
410 | :return: None
411 | """
412 | body = message.body
413 | clients_to_remove, simulated_time, iteration = body['clients_to_remove'], body['simulated_time'], body[
414 | 'iteration']
415 |
416 | print('Simulated time for client {} to finish iteration {}: {}\n'.format(self.name, iteration, simulated_time))
417 |
418 | self.active_clients -= clients_to_remove
419 | return None
--------------------------------------------------------------------------------
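
The mask-cancellation argument behind `add_security_offsets` is easy to check numerically. Here is a minimal, self-contained sketch: the offsets are random numbers standing in for the Diffie-Hellman-derived deltas, and the shapes and values are purely illustrative.

```
import numpy as np

# Each pair (i, j) with i < j shares one offset: client i adds it, client j subtracts it.
np.random.seed(0)
num_clients, shape = 3, (2, 2)
delta = {(i, j): np.random.uniform(-100, 100, size=shape)
         for i in range(num_clients) for j in range(i + 1, num_clients)}

true_weights = [np.full(shape, float(c)) for c in range(num_clients)]

masked = []
for c in range(num_clients):
    w = true_weights[c].copy()
    for (i, j), offset in delta.items():
        if c == i:
            w += offset  # lower-numbered client adds the offset
        elif c == j:
            w -= offset  # higher-numbered client subtracts it
    masked.append(w)

# Each masked update looks random on its own, but every offset appears once with
# each sign, so the server's average equals the average of the true weights.
assert np.allclose(np.average(masked, axis=0), np.average(true_weights, axis=0))
```
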
/src/config.py:
--------------------------------------------------------------------------------
1 | """
2 | Config file for simulation. Accessed by most files.
3 | Configuration options included:
4 | - Simulation Parameters
5 | - Toggling Security
6 | - Toggling DP Privacy
7 | - Toggling Client Dropout
8 | - Setting latency values
9 |
10 | See the PrivacyFL paper for more information about parameters.
11 | https://arxiv.org/abs/2002.08423
12 | """
13 |
14 | """Imports"""
15 | from datetime import timedelta
16 | import random
17 |
18 | """Simulation Parameters"""
19 | # NUM_CLIENTS: Number of client agents
20 | NUM_CLIENTS = 3
21 | client_names = ['client_agent' + str(i) for i in range(NUM_CLIENTS)]
22 | # NUM_SERVERS: More than one server will require additional coding to specify each server's behavior in the simulation
23 | NUM_SERVERS = 1
24 | # ITERATIONS: How many iterations to run simulation for
25 | ITERATIONS = 10
26 | # LENS_PER_ITERATION: how many new datapoints each client gets per iteration. On iteration i (1-indexed), each client has seen i * len_per_iteration samples
27 | len_per_iteration = 50 # using equal size datasets for each client in this example
28 | LENS_PER_ITERATION = {client_name: len_per_iteration for client_name in client_names}
29 |
30 | # LEN_TEST: Length of test dataset. Note whole dataset length is 1797
31 | LEN_TEST = 300
32 |
33 | VERBOSITY = 1 # 1 to print out the result of each iteration
34 |
35 | """Dataset Formatting"""
36 | # When set to True, the simulation uses cumulative datasets (i.e., each iteration includes the data from previous ones), which also affects
37 | # training: with cumulative datasets, each client retrains on its full dataset from scratch every iteration.
38 | USING_CUMULATIVE = True
39 |
40 |
41 | """Security"""
42 | # USE_SECURITY: Implements Diffie-Hellman key exchange for added security. Slows runtime slightly, but has no effect on model accuracy
43 | USE_SECURITY = True
44 |
45 | """Differential Privacy"""
46 | # USE_DP_PRIVACY: Whether to implement differential privacy functionality. The noise mechanism is selected by DP_ALGORITHM below.
47 | USE_DP_PRIVACY = False
48 | # SUBTRACT_DP_NOISE: Use more advanced version of protocol which has each client subtract the DP noise it added from the federated model it receives
49 | SUBTRACT_DP_NOISE = False # Subtract your own DP noise from federated model to increase accuracy
50 | assert not SUBTRACT_DP_NOISE or USE_DP_PRIVACY  # Only subtract DP noise if adding it to begin with
51 | 
52 |
53 | INTERCEPTS_DP_NOISE = False # Add DP noise to intercepts (for Logistic Regression example)
54 | # DP_ALGORITHM: either Laplace or Gamma. Can easily add more in client agent's code
55 |
56 | DP_ALGORITHM = "Gamma"
57 |
58 | # DP Privacy Parameters
59 | epsilon = 1.0 # smaller epsilon --> more noise/less accuracy
60 | # can make each client's epsilon different if desired
61 | EPSILONS = {client_name: epsilon for client_name in client_names}
62 | alpha = 1
63 | mean = 0
64 |
65 | """Client Dropout"""
66 | # CLIENT_DROPOUT: When TRUE, clients drop out of simulation when personal weights are within tolerance of federated weights
67 | CLIENT_DROPOUT = False
68 | tolerance = 10.0 # note this value should change depending on whether you are normalizing
69 |
70 | """Latency"""
71 | SIMULATE_LATENCIES = True
72 | # Define any agent-agent communication latencies here. If none is provided, defaults to zero.
73 |
74 |
75 | LATENCY_DICT = {}
76 | # fill in the rest with zeros:
77 | if 'server_agent0' not in LATENCY_DICT.keys():
78 | LATENCY_DICT['server_agent0'] = {}
79 |
80 | for client_name in client_names:
81 | if client_name not in LATENCY_DICT.keys():
82 | LATENCY_DICT[client_name] = {client_name2: timedelta(seconds=0.1) for client_name2 in client_names}
83 | LATENCY_DICT[client_name]['server_agent0'] = timedelta(seconds=0.1)
84 | LATENCY_DICT['server_agent0'][client_name] = timedelta(seconds=0.1)
85 |
86 | LATENCY_DICT['client_agent1'] = {client_name: timedelta(seconds=2.0) for client_name in client_names}
87 | LATENCY_DICT['client_agent1']['server_agent0'] = timedelta(seconds=2.0)
88 | LATENCY_DICT['server_agent0']['client_agent1'] = timedelta(seconds=2.0)
89 |
90 | LATENCY_DICT['client_agent0']['server_agent0'] = timedelta(seconds=0.3)
91 | LATENCY_DICT['server_agent0']['client_agent0'] = timedelta(seconds=0.3)
92 |
93 | """
94 | Additional Constants: likely won't need modification
95 | """
96 | # LOG_MAX_ITER: max iterations for the logistic regression
97 | LOG_MAX_ITER = 10
98 | random.seed(0)
99 | # RANDOM_SEEDS: required for reproducibility of simulation. Seeds every iteration of the training for each client
100 | RANDOM_SEEDS = {client_name: list(random.sample(range(0, 1000000), 100)) for client_name in client_names}
101 |
--------------------------------------------------------------------------------
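
Because the values above are plain module-level constants, customizing a simulation just means editing this file before it is imported. As an illustrative sketch (the sizes and budgets below are assumptions, not recommendations), a five-client run with unequal per-iteration dataset sizes and per-client privacy budgets could look like:

```
# hypothetical edits to src/config.py for a heterogeneous 5-client run
NUM_CLIENTS = 5
client_names = ['client_agent' + str(i) for i in range(NUM_CLIENTS)]

# clients 0 and 1 contribute 100 samples per iteration, the rest 50
LENS_PER_ITERATION = {name: (100 if i < 2 else 50)
                      for i, name in enumerate(client_names)}

# give the data-rich clients a tighter privacy budget (smaller epsilon = more noise)
EPSILONS = {name: (0.5 if i < 2 else 1.0)
            for i, name in enumerate(client_names)}
```
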
/src/directory.py:
--------------------------------------------------------------------------------
1 | class Directory:
2 | """
3 | Contains mappings for client and server names to instances in simulation
4 | """
5 | def __init__(self, clients, server_agents):
6 | self.clients = clients
7 | self.server_agents = server_agents
8 | self.all_agents = {**self.clients, **self.server_agents}
--------------------------------------------------------------------------------
/src/initializer.py:
--------------------------------------------------------------------------------
1 | import config
2 | import datetime
3 | import numpy as np
4 | import pickle
5 | import random
6 |
7 | from client_agent import ClientAgent
8 | from server_agent import ServerAgent
9 | from directory import Directory
10 | from sklearn.datasets import load_digits
11 |
12 | from utils import data_formatting
13 |
14 | from utils.print_config import print_config
15 | from utils.model_evaluator import ModelEvaluator
16 | from sklearn.preprocessing import MinMaxScaler
17 |
18 |
19 | class Initializer:
20 | def __init__(self, num_clients, num_servers, iterations):
21 | """
22 |         Offline stage of simulation. Initializes clients and servers for the simulation and gives each client its data.
23 |         :param num_clients: number of clients to be used for simulation
24 |         :param num_servers: number of servers to be used for simulation. Custom coding required if greater than 1.
25 | :param iterations: number of iterations to run simulation for
26 | """
27 |
28 | global len_per_iteration
29 |
30 |         digits = load_digits()  # using sklearn's digits dataset (a small, MNIST-like set of 8x8 images)
31 | X, y = digits.data, digits.target
32 |
33 | scaler = MinMaxScaler()
34 | scaler.fit(X)
35 | X = scaler.transform(X)
36 |
37 | X_train, X_test = X[:-config.LEN_TEST], X[-config.LEN_TEST:]
38 | y_train, y_test = y[:-config.LEN_TEST], y[-config.LEN_TEST:]
39 |
40 | # extract only amount that we require
41 | number_of_samples = 0
42 | for client_name in config.client_names:
43 | len_per_iteration = config.LENS_PER_ITERATION[client_name]
44 | number_of_samples += len_per_iteration * iterations
45 |
46 | X_train = X_train[:number_of_samples]
47 | y_train = y_train[:number_of_samples]
48 |
49 | client_to_datasets = data_formatting.partition_data(X_train, y_train, config.client_names, iterations,
50 | config.LENS_PER_ITERATION, cumulative=config.USING_CUMULATIVE)
51 |
52 |         # print_config(len_per_iteration=config.LENS_PER_ITERATION)
53 | print('\n \n \nSTARTING SIMULATION \n \n \n')
54 |
55 | active_clients = {'client_agent' + str(i) for i in range(num_clients)}
56 | self.clients = {
57 | 'client_agent' + str(i): ClientAgent(agent_number=i,
58 | train_datasets=client_to_datasets['client_agent' + str(i)],
59 | evaluator=ModelEvaluator(X_test, y_test),
60 | active_clients=active_clients) for i in
61 | range(num_clients)} # initialize the agents
62 |
63 | self.server_agents = {'server_agent' + str(i): ServerAgent(agent_number=i) for i in
64 | range(num_servers)} # initialize servers
65 |
66 | # create directory with mappings from names to instances
67 | self.directory = Directory(clients=self.clients, server_agents=self.server_agents)
68 |
69 | for agent_name, agent in self.clients.items():
70 | agent.set_directory(self.directory)
71 | agent.initializations()
72 | for agent_name, agent in self.server_agents.items():
73 | agent.set_directory(self.directory)
74 |
75 |         # OFFLINE Diffie-Hellman key exchange
76 |         # NOTE: this is sequential in implementation, but simulated as occurring in parallel
77 | if config.USE_SECURITY:
78 | key_exchange_start = datetime.datetime.now() # measuring how long the python script takes
79 | max_latencies = []
80 | for client_name, client in self.clients.items():
81 | # not including logic of sending/receiving public keys in latency computation since it is nearly zero
82 | client.send_pubkeys()
83 | max_latency = max(config.LATENCY_DICT[client_name].values())
84 | max_latencies.append(max_latency)
85 | simulated_time = max(max_latencies)
86 |
87 | key_exchange_end = datetime.datetime.now() # measuring runtime
88 | key_exchange_duration = key_exchange_end - key_exchange_start
89 | simulated_time += key_exchange_duration
90 | if config.SIMULATE_LATENCIES:
91 | print(
92 |                 'Diffie-Hellman key exchange simulated duration: {}\nDiffie-Hellman key exchange real run-time: {}\n'.format(
93 | simulated_time, key_exchange_duration))
94 |
95 | for client_name, client in self.clients.items():
96 | client.initialize_common_keys()
97 |
98 | def run_simulation(self, num_iterations, server_agent_name='server_agent0'):
99 | """
100 | Online stage of simulation.
101 | :param num_iterations: number of iterations to run
102 | :param server_agent_name: which server to use. Defaults to first server.
103 | """
104 | # ONLINE
105 | server_agent = self.directory.server_agents[server_agent_name]
106 | server_agent.request_values(num_iterations=num_iterations)
107 | server_agent.final_statistics()
108 |
109 |
--------------------------------------------------------------------------------
/src/message.py:
--------------------------------------------------------------------------------
1 | class Message:
2 | """
3 | Used for all client-client and client-server communications
4 | """
5 | def __init__(self, sender_name, recipient_name, body):
6 | """
7 | :param sender_name: name of sender
8 | :param recipient_name: name of recipient
9 |         :param body: Content depends on the message being sent.
10 | """
11 | self.sender = sender_name
12 | self.recipient = recipient_name
13 | self.body = body
14 |
15 | def __str__(self):
16 |         return f"Message from {self.sender} to {self.recipient}.\nBody is: {self.body}\n\n"
--------------------------------------------------------------------------------
/src/run_simulation.py:
--------------------------------------------------------------------------------
1 | import random
2 | import warnings
3 | import datetime
4 | import config
5 | import numpy as np
6 |
7 | from initializer import Initializer
8 |
9 | if __name__ == '__main__':
10 | random.seed(0)
11 | np.random.seed(0)
12 | initializer = Initializer(num_clients=config.NUM_CLIENTS, iterations=config.ITERATIONS,
13 | num_servers=config.NUM_SERVERS)
14 |     # can run any number of iterations up to config.ITERATIONS, since the
15 |     # initializer has only given each client config.ITERATIONS datasets for training.
16 | a = datetime.datetime.now()
17 | initializer.run_simulation(config.ITERATIONS,
18 | server_agent_name='server_agent0')
19 | b = datetime.datetime.now()
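20 |     print('Real run-time for simulation: {}'.format(b - a))  # report the wall-clock time measured above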
--------------------------------------------------------------------------------
/src/server_agent.py:
--------------------------------------------------------------------------------
1 | import sys
2 | sys.path.append('..')
3 |
4 | import numpy as np
5 | import config
6 | from datetime import datetime
7 | import multiprocessing
8 | from multiprocessing.pool import ThreadPool
9 | from utils.latency_helper import find_slowest_time
10 | from agent import Agent
11 | from message import Message
12 |
13 | def client_computation_caller(inp):
14 | client_instance, message = inp
15 | return_message = client_instance.produce_weights(message=message)
16 | return return_message
17 |
18 |
19 | def client_weights_returner(inp):
20 | client_instance, message = inp
21 |     return_message = client_instance.receive_weights(message)  # a Message noting whether the client converged
22 |     return return_message
23 |
24 |
25 | def client_agent_dropout_caller(inp):
26 | client_instance, message = inp
27 | __ = client_instance.remove_active_clients(message)
28 | return None
29 |
30 |
31 | class ServerAgent(Agent):
32 | """ Server agent that averages (federated) weights and returns them to clients"""
33 |
34 | def __init__(self, agent_number):
35 | super(ServerAgent, self).__init__(agent_number=agent_number, agent_type='server_agent')
36 | self.averaged_weights = {}
37 | self.averaged_intercepts = {}
38 |
39 | def request_values(self, num_iterations):
40 | """
41 |         Method invoked to start simulation. Prints out which clients converged on which iteration.
42 |         Also prints each client's personal accuracy (what its own weights achieve without federation) and the federated accuracy on each iteration.
43 |         :param num_iterations: number of iterations to run
44 | """
45 | converged = {} # maps client names to iteration of convergence. Contains all inactive clients
46 | active_clients = set(self.directory.clients.keys())
47 |
48 | for i in range(1, num_iterations+1):
49 | weights = {}
50 | intercepts = {}
51 |
52 | m = multiprocessing.Manager()
53 | lock = m.Lock()
54 | with ThreadPool(len(active_clients)) as calling_pool:
55 | args = []
56 | for client_name in active_clients:
57 | client_instance = self.directory.clients[client_name]
58 | body = {'iteration': i, 'lock': lock, 'simulated_time':config.LATENCY_DICT[self.name][client_name]}
59 | arg = Message(sender_name=self.name, recipient_name=client_name, body=body)
60 | args.append((client_instance, arg))
61 | messages = calling_pool.map(client_computation_caller, args)
62 |
63 | server_logic_start = datetime.now()
64 |
65 | vals = {message.sender: (message.body['weights'], message.body['intercepts']) for message in messages}
66 | simulated_time = find_slowest_time(messages)
67 |
68 | # add them to the weights_dictionary
69 | for client_name, return_vals in vals.items():
70 | client_weights, client_intercepts = return_vals
71 | weights[client_name] = np.array(client_weights)
72 | intercepts[client_name] = np.array(client_intercepts)
73 |
74 | weights_np = list(weights.values()) # the weights for this iteration!
75 | intercepts_np = list(intercepts.values())
76 |
77 | try:
78 | averaged_weights = np.average(weights_np, axis=0) # gets rid of security offsets
79 |             except ValueError:  # np.average fails on ragged arrays, e.g., when a client is missing a class
80 | raise ValueError('''DATA INSUFFICIENT: Some client does not have a sample from each class so dimension of weights is incorrect. Make
81 | train length per iteration larger for each client to avoid this issue''')
82 |
83 | averaged_intercepts = np.average(intercepts_np, axis=0)
84 | self.averaged_weights[i] = averaged_weights ## averaged weights for this iteration!!
85 | self.averaged_intercepts[i] = averaged_intercepts
86 |
87 | # add time server logic takes
88 | server_logic_end = datetime.now()
89 | server_logic_time = server_logic_end - server_logic_start
90 | simulated_time += server_logic_time
91 |
92 | with ThreadPool(len(active_clients)) as returning_pool:
93 | args = []
94 | for client_name in active_clients:
95 | client_instance = self.directory.clients[client_name]
96 | body = {'iteration': i, 'return_weights': averaged_weights,
97 | 'return_intercepts': averaged_intercepts,
98 | 'simulated_time': simulated_time + config.LATENCY_DICT[self.name][client_name]}
99 | message = Message(sender_name=self.name, recipient_name=client_name, body=body)
100 | args.append((client_instance, message))
101 | return_messages = returning_pool.map(client_weights_returner, args)
102 |
103 | simulated_time = find_slowest_time(return_messages)
104 | server_logic_start = datetime.now()
105 | clients_to_remove = set()
106 | for message in return_messages:
107 |                 if message.body['converged'] and message.sender not in converged:  # converging
108 | converged[message.sender] = i # iteration of convergence
109 | clients_to_remove.add(message.sender)
110 |
111 | server_logic_end = datetime.now()
112 | server_logic_time = server_logic_end - server_logic_start
113 | simulated_time += server_logic_time
114 |
115 | if config.CLIENT_DROPOUT:
116 | # tell the clients which other clients have dropped out
117 | active_clients -= clients_to_remove
118 |                 if len(active_clients) < 2:  # no point in continuing without at least 2 clients
119 | self.print_convergences(converged)
120 | return
121 | with ThreadPool(len(active_clients)) as calling_pool:
122 | args = []
123 | for client_name in active_clients:
124 | client_instance = self.directory.clients[client_name]
125 | body = {'clients_to_remove': clients_to_remove, 'simulated_time': simulated_time + config.LATENCY_DICT[self.name][client_name], 'iteration':i}
126 | message = Message(sender_name=self.name, recipient_name=client_name, body=body)
127 | args.append((client_instance, message))
128 | __ = calling_pool.map(client_agent_dropout_caller, args)
129 |
130 |
131 | # at end of all iterations
132 | self.print_convergences(converged)
133 |
134 | def print_convergences(self, converged):
135 | """
136 | Used to print out all the clients that have converged at the end of request values
137 | :param converged: dict of converged clients containing iteration of convergence
138 | :type converged: dict
139 | """
140 |
141 | for client_name in self.directory.clients.keys():
142 | if client_name in converged:
143 | print('Client {} converged on iteration {}'.format(client_name, converged[client_name]))
144 | if client_name not in converged:
145 | print('Client {} never converged'.format(client_name))
146 |
147 | def final_statistics(self):
148 | """
149 | USED FOR RESEARCH PURPOSES.
150 | """
151 | # for research purposes
152 | client_accs = []
153 | fed_acc = []
154 | for client_name, client_instance in self.directory.clients.items():
155 | fed_acc.append(list(client_instance.federated_accuracy.values()))
156 | client_accs.append(list(client_instance.personal_accuracy.values()))
157 |
158 | if config.CLIENT_DROPOUT:
159 | print('Federated accuracies are {}'.format(dict(zip(self.directory.clients, fed_acc))))
160 | else:
161 | client_accs = list(np.mean(client_accs, axis=0))
162 | fed_acc = list(np.mean(fed_acc, axis=0))
163 | print('Personal accuracy on final iteration is {}'.format(client_accs))
164 | print('Federated accuracy on final iteration is {}'.format(fed_acc)) # should all be the same if no dropout
165 |
166 |
--------------------------------------------------------------------------------
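
The simulated-time accounting in `request_values` composes per-client latencies with the measured server logic time: each reply carries its own round-trip time, `find_slowest_time` keeps the slowest one, and the server's processing time is added on top. A stripped-down sketch of one round, with illustrative latency and compute values:

```
from datetime import timedelta

# illustrative per-client latencies (server <-> client) and client compute times
latency = {'client_agent0': timedelta(seconds=0.3),
           'client_agent1': timedelta(seconds=2.0)}
compute = {'client_agent0': timedelta(seconds=0.8),
           'client_agent1': timedelta(seconds=0.5)}

# a reply arrives after: server->client latency + client compute + client->server latency
arrival = {c: latency[c] + compute[c] + latency[c] for c in latency}

# keep the slowest reply (find_slowest_time), then add the measured server logic time
simulated_time = max(arrival.values()) + timedelta(seconds=0.01)
print(simulated_time)  # 0:00:04.510000
```
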
/src/utils/asymmetric_laplacian.py:
--------------------------------------------------------------------------------
1 | import scipy.integrate as integrate
2 | import numpy as np
3 | from math import inf
4 |
5 | '''
6 | Given privacy parameters epsilon and global_sensitivity, and scale parameter k, return the probability density function
7 | of the asymmetric Laplace distribution.
8 | '''
9 | def asymmetric_laplace(epsilon, global_sensitivity, k):
10 | lam = epsilon / global_sensitivity
11 | def f(x):
12 | if x < 0:
13 | return lam / (k + 1 / k) * np.exp(lam * x / k)
14 | else :
15 | return lam / (k + 1 / k) * np.exp(-lam * x * k)
16 | return f
17 |
18 |
19 | '''
20 | Given epsilon, global_sensitivity, and k, return the L1 cost for the asymmetric laplacian mechanism.
21 | '''
22 | def asymmetric_laplace_L1_eval(epsilon, global_sensitivity, k):
23 | return integrate.quad(lambda x: abs(x) * asymmetric_laplace(epsilon, global_sensitivity, k)(x), -inf, inf)[0]
24 |
25 |
26 | '''
27 | Given epsilon, global_sensitivity, and k, return the L2 cost for the asymmetric laplacian mechanism.
28 | '''
29 | def asymmetric_laplace_L2_eval(epsilon, global_sensitivity, k):
30 | return integrate.quad(lambda x: x**2 * asymmetric_laplace(epsilon, global_sensitivity,k)(x), -inf, inf)[0]
31 |
32 |
33 | # Example for getting results for the asymmetric Laplacian
34 | def asymmetric_laplacian_example():
35 | epsilon = 1e-4 # change here for different epsilon values
36 |     global_sensitivity = 1 # change here for different global sensitivity values
37 | k = 3 # change here for different scale parameter values
38 | L1_cost = asymmetric_laplace_L1_eval(epsilon, global_sensitivity, k)
39 | L2_cost = asymmetric_laplace_L2_eval(epsilon, global_sensitivity, k)
40 |     print('L1 cost: {}, L2 cost: {}'.format(L1_cost, L2_cost))  # report the computed costs
41 | asymmetric_laplacian_example()
--------------------------------------------------------------------------------
/src/utils/data_formatting.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 |
3 |
4 | def partition_data(X, y, client_names, num_iterations, lens_per_iteration, cumulative=False):
5 | """
6 | Function used to partition data to give to clients for simulation.
7 | :type X: numpy array
8 | :type y: numpy array
9 | :param client_names: list of all client agents' name
10 | :param num_iterations: number of iterations to partition data for (initially set in config.py)
11 | :type num_iterations: int
12 | :param lens_per_iteration: length of new dataset available each iteration for each client
13 | :type lens_per_iteration: dictionary
14 |     :param cumulative: flag that indicates whether dataset creation should be cumulative. If True,
15 | the dataset at iteration i+1 contains all of the data from previous iterations as well.
16 | :type cumulative: bool
17 | :return: dictionary mapping each client by name to another dictionary which contains its dataset for each
18 | iteration mapped by ints (iteration).
19 |
20 |
21 | """
22 | client_datasets = {client_name: None for client_name in client_names}
23 | # partition each client its data
24 | last_index = 0 # where to start the next client's dataset from
25 | for client_name in client_names:
26 | datasets_i = {} # datasets for client i
27 | len_per_iteration = lens_per_iteration[client_name]
28 | start_idx = last_index
29 | last_index += num_iterations * len_per_iteration # where this client's datasets will end
30 | for j in range(1, num_iterations+1):
31 |             if cumulative:  # dataset gets bigger each iteration
32 | end_indx = start_idx + len_per_iteration * j
33 | else:
34 | end_indx = start_idx + len_per_iteration # add the length per iteration
35 |
36 |
37 | #print('From {} to {}'.format(start_idx, end_indx))
38 | X_ij = X[start_idx:end_indx]
39 | y_ij = y[start_idx:end_indx]
40 |
41 | datasets_i[j] = (X_ij, y_ij)
42 |
43 | if not cumulative:
44 | start_idx = end_indx # move up start index
45 |
46 | client_datasets[client_name] = datasets_i
47 | return client_datasets
48 |
49 |
--------------------------------------------------------------------------------
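
A quick usage sketch of `partition_data` on toy data (run from `src/` so that `utils` is importable; the shapes and sizes are illustrative):

```
import numpy as np
from utils import data_formatting

X = np.arange(24).reshape(12, 2)  # 12 toy samples, 2 features
y = np.arange(12)
datasets = data_formatting.partition_data(
    X, y,
    client_names=['client_agent0', 'client_agent1'],
    num_iterations=3,
    lens_per_iteration={'client_agent0': 2, 'client_agent1': 2},
    cumulative=True)

# client_agent0 is carved out of rows 0-5, client_agent1 out of rows 6-11
X_i, y_i = datasets['client_agent0'][2]  # iteration 2, cumulative: rows 0-3
print(X_i.shape)  # (4, 2)
```
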
/src/utils/diffie_hellman.py:
--------------------------------------------------------------------------------
1 | from cryptography.hazmat.backends import default_backend
2 | from cryptography.hazmat.primitives.kdf.hkdf import HKDF
3 | from cryptography.hazmat.primitives.asymmetric import ec
4 | from cryptography.hazmat.primitives import hashes
5 | import random
6 |
7 | def keygeneration(num, ip): # num - number of parties ; ip - ith party
8 | publicKeyList = []
9 | secretKeyList = []
10 | for i in range(num):
11 | if i == ip:
12 | publicKeyList.append(0)
13 | secretKeyList.append(0)
14 | else:
15 | secretKey = ec.generate_private_key(ec.SECP384R1(), default_backend())
16 | publicKey = secretKey.public_key()
17 | publicKeyList.append(publicKey)
18 | secretKeyList.append(secretKey)
19 | return publicKeyList,secretKeyList
20 |
21 | def keyexchange(num, ip, selfPublicKeys, selfSecretKeys, othersPublicKeys):
22 | exchangeKeys = []
23 | for i in range(num):
24 | if i == ip:
25 | exchangeKeys.append(0)
26 | else:
27 | shareKey = selfSecretKeys[i].exchange(ec.ECDH(), othersPublicKeys[i])
28 | #Hashing the keys
29 | newKey = HKDF(
30 | algorithm=hashes.SHA256(),
31 | length=32,
32 | salt=None,
33 | info=b'handshake data',
34 | backend=default_backend()
35 | ).derive(shareKey)
36 | keyCom = int.from_bytes(newKey, byteorder='big')
37 | exchangeKeys.append(keyCom)
38 | return exchangeKeys
39 |
40 |
41 | def randomize(sd):
42 | random.seed(sd)
43 | rand = random.getrandbits(256*2)
44 | randBin = bin(rand)
45 |     appendZeros = 512 - (len(randBin) - 2)  # zero-pad to the full 512 bits
46 | r = '0' * appendZeros + randBin[2:]
47 | # first portion - mask the inputs ; second portion - seed for PRG
48 | second = int(r[0:256], 2)
49 | first = int(r[256:] , 2)
50 | return first, second
51 |
52 |
53 | def randomize_all(ip, exchangeKeys, div):
54 |
55 | for i in range(len(exchangeKeys)):
56 | if i == ip:
57 | continue
58 | exchangeKeys[i], party = randomize( exchangeKeys[i])
59 |
60 | return exchangeKeys, party
61 |
--------------------------------------------------------------------------------
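
A minimal two-party sketch of how these helpers fit together. In the simulator the public keys travel between clients inside `Message` objects; here the two parties are wired up directly, using the convention that the key party j generates for slot i is handed to party i:

```
from utils.diffie_hellman import keygeneration, keyexchange  # run from src/

# each party generates one keypair per counterpart (the slot for itself stays 0)
pub0, sec0 = keygeneration(2, 0)
pub1, sec1 = keygeneration(2, 1)

# party i passes in othersPublicKeys[j] = the public key party j generated for slot i
shared0 = keyexchange(2, 0, pub0, sec0, [0, pub1[0]])
shared1 = keyexchange(2, 1, pub1, sec1, [pub0[1], 0])

# ECDH followed by HKDF leaves both parties with the same integer common key
assert shared0[1] == shared1[0]
```
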
/src/utils/dp_mechanisms.py:
--------------------------------------------------------------------------------
1 | from numbers import Real
2 | import random
3 | import numpy as np
4 |
5 |
6 | """
7 | WARNING: only laplace will produce reproducible results, since the other DP mechanisms
8 | included here use numpy's random module, which is not thread-safe.
9 | """
10 | def laplace(mean, sensitivity, epsilon): # mean : value to be randomized (mean)
11 | scale = sensitivity / epsilon
12 | rand = random.uniform(0,1) - 0.5 # rand : uniform random variable
13 | return mean - scale * np.sign(rand) * np.log(1 - 2 * np.abs(rand))
14 |
15 |
16 | def boundedLaplace(mean, sensitivity, epsilon, delta):
17 |     # requires epsilon > 0 and 0 < delta < 0.5
18 |
19 | scale = sensitivity / epsilon
20 |
21 | def cdf(mean):
22 |
23 | if scale == 0:
24 | if mean < 0:
25 | return 0
26 | else:
27 | return 1
28 |
29 | if mean < 0:
30 | return 0.5 * np.exp(mean / scale)
31 |
32 | return 1 - 0.5 * np.exp(-mean / scale)
33 |
34 | if(scale==0):
35 | noiseBound = -1
36 | else:
37 | noiseBound=scale * np.log(1 + (np.exp(epsilon) - 1) / 2 / delta)
38 |
39 |     rand = np.random.random()
40 | rand *= cdf(noiseBound) - cdf(-noiseBound)
41 | rand += cdf(-noiseBound)
42 | rand -= 0.5
43 |
44 | return mean - scale * (np.sign(rand) * np.log(1 - 2 * np.abs(rand)))
45 |
46 | def staircase(mean,sensitivity,epsilon,gamma):
47 | # 0 <= gamma <= 1 , delta=0, epsilon>0
48 |
49 | #gamma = 1 / (1 + np.exp(epsilon / 2))
50 |
51 |     if np.random.random() < 0.5:
52 | sign = -1
53 | else:
54 | sign = 1
55 |
56 |     geoRand = np.random.geometric(1 - np.exp(-epsilon)) - 1 # geoRand : geometric random variable
57 |     rand = np.random.random() # uniform random variable
58 | 
59 |     if np.random.random() < gamma / (gamma + (1 - gamma) * np.exp(-epsilon)):
60 | binRand = 0 # binary random variable
61 | else:
62 | binRand = 1
63 |
64 | return mean + sign * ((1 - binRand) * ((geoRand + gamma * rand) * sensitivity) +
65 | binRand * ((geoRand + gamma + (1 - gamma) * rand) *
66 | sensitivity))
67 |
68 | def gaussian(mean, sensitivity, epsilon, delta ):
69 |
70 | scale = np.sqrt(2 * np.log(1.25 /delta)) * sensitivity / epsilon
71 |     randUnif1 = np.random.random()
72 |     randUnif2 = np.random.random()
73 | 
74 |     # Box-Muller transform; the sine branch would give a second, independent standard normal
75 |     stdNormal = np.sqrt(- 2 * np.log(randUnif1)) * np.cos(2 * np.pi * randUnif2)
76 |
77 | return stdNormal * scale + mean
78 |
79 |
80 |
--------------------------------------------------------------------------------
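
Two quick sanity checks, as a sketch. First, `laplace` noise should have variance 2 * (sensitivity / epsilon)^2. Second, the Gamma mechanism in `client_agent.py` relies on the infinite divisibility of the Laplace distribution: when each of n clients adds the difference of two Gamma(1/n, scale) draws, the n contributions sum to Laplace-distributed noise.

```
import random
import numpy as np
from utils.dp_mechanisms import laplace  # run from src/

# check 1: variance of laplace(mean, sensitivity, epsilon) should be 2 * scale^2
scale = 1 / 0.5  # sensitivity = 1, epsilon = 0.5
samples = np.array([laplace(mean=0, sensitivity=1, epsilon=0.5) for _ in range(100000)])
print(samples.mean(), samples.var(), 2 * scale ** 2)  # mean ~0, both variances ~8

# check 2: n gamma-difference draws sum to Laplace(scale) noise
n = 5
sums = [sum(random.gammavariate(1 / n, scale) - random.gammavariate(1 / n, scale)
            for _ in range(n)) for _ in range(100000)]
print(np.var(sums), 2 * scale ** 2)  # both ~8
```
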
/src/utils/latency_helper.py:
--------------------------------------------------------------------------------
1 | def find_slowest_time(messages):
2 | simulated_communication_times = {message.sender: message.body['simulated_time'] for message in messages}
3 | slowest_client = max(simulated_communication_times, key=simulated_communication_times.get)
4 | simulated_time = simulated_communication_times[
5 | slowest_client] # simulated time it would take for server to receive all values
6 | return simulated_time
--------------------------------------------------------------------------------
/src/utils/model_evaluator.py:
--------------------------------------------------------------------------------
1 | from sklearn.linear_model import LogisticRegression
2 |
3 | class ModelEvaluator:
4 | """
5 | Used to evaluate weights on test dataset. Evaluation is performed with sklearn due to difficulties of
6 | changing weights for pyspark's logistic regression.
7 | """
8 |
9 | def __init__(self, X_test, Y_test):
10 | """
11 |         Creates a logistic regression object whose weights will be overridden.
12 | :param X_test: numpy array of test inputs
13 | :param Y_test: numpy array of test labels
14 | """
15 | self.X_test = X_test
16 | self.Y_test = Y_test
17 | self.logisticRegr = LogisticRegression()
18 | self.logisticRegr.fit(self.X_test, self.Y_test)
19 |
20 | def accuracy(self, weights, intercepts):
21 | """
22 | Calculates accuracy on test dataset given new weights and intercepts
23 | :param weights: numpy array of weights
24 | :param intercepts: numpy array of intercepts
25 | :return: returns accuracy on test dataset
26 | """
27 |         self.logisticRegr.coef_ = weights  # override the fitted weights
28 | self.logisticRegr.intercept_ = intercepts
29 | return self.logisticRegr.score(self.X_test, self.Y_test)
--------------------------------------------------------------------------------
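
`ModelEvaluator` relies on scikit-learn letting you overwrite `coef_` and `intercept_` on an already-fitted `LogisticRegression` and then call `score`. A minimal sketch of that trick, with all-zero weights standing in for externally computed federated weights:

```
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression

digits = load_digits()
X, y = digits.data, digits.target

clf = LogisticRegression(max_iter=1000)
clf.fit(X, y)  # fit once so the model has coefficient arrays of the right shape

# inject external weights and score with them
clf.coef_ = np.zeros_like(clf.coef_)
clf.intercept_ = np.zeros_like(clf.intercept_)
print(clf.score(X, y))  # accuracy under the injected weights (~0.10 for all zeros)
```
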
/src/utils/print_config.py:
--------------------------------------------------------------------------------
1 | import sys
2 |
3 | sys.path.append('..')
4 | import config
5 |
6 | def print_config(len_per_iteration):
7 | """
8 |     Prints parameters at the start of simulation. The argument is dynamically created and hence not in config.
9 |     :param len_per_iteration: length of the training dataset for each client for each iteration
10 |     """
11 | 
12 | print('\n')
13 | print(
14 | 'Running simulation with: \n{} clients \n{} iterations \n{}differential privacy \nand {}security \n'.format(
15 | config.NUM_CLIENTS, config.ITERATIONS, 'no ' if not config.USE_DP_PRIVACY else '',
16 | 'no ' if not config.USE_SECURITY else ''))
17 | print('Training length per client per iteration is {}\n'.format((len_per_iteration)))
18 | print(
19 | 'Simulation parameters are: \nTolerance for weight convergence = {} \nEpsilon for DP privacy is {}'.format(
20 | config.tolerance, config.epsilon))
21 |
--------------------------------------------------------------------------------
/src/utils/truncated_laplacian.py:
--------------------------------------------------------------------------------
1 | import scipy.integrate as integrate
2 | import numpy as np
3 |
4 | '''
5 | This is a helper function for generating the probability density function for the truncated Laplace distribution.
6 | It processes A and B to the desired values. If the input A, B values are not appropriate for the desired
7 | epsilon, we default to symmetric A, B values.
8 | '''
9 | def process(epsilon, global_sensitivity, A, B):
10 | lam = 1 / epsilon
11 | A_ = lam * np.log(2 + (1 - global_sensitivity) / global_sensitivity * np.exp(-B / lam) - 1 / global_sensitivity * np.exp(-(B - 1) / lam))
12 | if abs(B) < abs(A_):
13 | return A_, B
14 | B_ = -lam * np.log(2 + (1 - global_sensitivity)/global_sensitivity * np.exp(A/lam) - 1/global_sensitivity * np.exp((B+1)/lam))
15 | if abs(A) < abs(B_):
16 | return A, B_
17 | A_ = global_sensitivity / epsilon * np.log(1 + (np.exp(epsilon) - 1)/(2 * global_sensitivity))
18 | B_ = - A_
19 | return A_, B_
20 |
21 | '''
22 | Given privacy parameters epsilon, global_sensitivity, and scale parameters A, and B, return a function that is the probability
23 | density function for the truncated laplace distribution.
24 | '''
25 | def truncated_laplace(epsilon, global_sensitivity, A, B):
26 | lam = 1 / epsilon
27 | M = 1 / (lam * (2 - np.exp(A / lam) - np.exp(-B / lam)))
28 | return lambda x: M * np.exp(-abs(x) / lam)
29 |
30 | '''
31 | Given epsilon, global_sensitivity, A, B, return the L1 cost for the truncated laplacian mechanism.
32 | '''
33 | def truncated_laplace_L1_eval(epsilon, global_sensitivity, A, B):
34 | return integrate.quad(lambda x: abs(x) * truncated_laplace(epsilon, global_sensitivity, A, B)(x), A, B)[0]
35 |
36 | '''
37 | Given epsilon, global_sensitivity, A, B, return the L2 cost for the truncated laplacian mechanism.
38 | '''
39 | def truncated_laplace_L2_eval(epsilon, global_sensitivity, A, B):
40 | return integrate.quad(lambda x: x**2 * truncated_laplace(epsilon, global_sensitivity, A, B)(x), A, B)[0]
41 |
42 | # Example for getting results for the truncated Laplacian
43 | def truncated_laplacian_example():
44 | epsilon = 1e-4 # change here for different epsilon values
45 | global_sensitivity = 1 # change here for different global sensitivity
46 | A = -10 # change here for different left truncation values
47 | a, b = process(epsilon, global_sensitivity, A, -A) # It is necessary to call process function to get the appropriate bounds
48 | L1_cost = truncated_laplace_L1_eval(epsilon, global_sensitivity, a, b)
49 | L2_cost = truncated_laplace_L2_eval(epsilon, global_sensitivity, a, b)
--------------------------------------------------------------------------------