├── DPBoost
│   ├── README.md
│   ├── framework.png
│   └── paper.pdf
├── FL_survey
│   ├── README.md
│   ├── paper.pdf
│   └── taxonomy.png
├── OARF
│   ├── README.md
│   ├── arch.png
│   └── paper.pdf
├── README.md
├── SimFL
│   ├── README.md
│   ├── paper.pdf
│   └── train_a_tree.png
└── Tutorial
    ├── README.md
    └── introduction.mp4
/DPBoost/README.md: -------------------------------------------------------------------------------- 1 | ## Privacy-Preserving Gradient Boosting Decision Trees 2 | 3 | Authors: Qinbin Li, Zhaomin Wu, Zeyi Wen, Bingsheng He 4 | 5 | Abstract: The Gradient Boosting Decision Tree (GBDT) has been a popular machine learning model for various tasks in recent years. In this paper, we study how to improve the model accuracy of GBDT while preserving the strong guarantee of differential privacy. Sensitivity and privacy budget are two key design aspects for the effectiveness of differentially private models. Existing solutions for GBDT with differential privacy suffer from significant accuracy loss due to overly loose sensitivity bounds and ineffective privacy budget allocations (especially across different trees in the GBDT model). Loose sensitivity bounds require injecting more noise to obtain a fixed privacy level. Ineffective privacy budget allocations worsen the accuracy loss, especially when the number of trees is large. Therefore, we propose a new GBDT training algorithm that achieves tighter sensitivity bounds and more effective noise allocations. Specifically, by investigating the properties of gradients and the contribution of each tree in GBDTs, we propose to adaptively control the gradients of training data in each iteration and to clip leaf node values, in order to tighten the sensitivity bounds. Furthermore, we design a novel boosting framework to allocate the privacy budget between trees so that the accuracy loss can be further reduced. Our experiments show that our approach can achieve much better model accuracy than other baselines. 
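The clipping-plus-noise idea from the abstract can be sketched in a few lines. This is a minimal illustrative sketch, not the paper's exact DPBoost algorithm: the function names, the regularized leaf-value formula, and the Laplace sampling below are our own generic GBDT/differential-privacy ingredients.

```python
import math
import random

def clip(value, bound):
    """Clip a scalar into [-bound, bound] to bound its sensitivity."""
    return max(-bound, min(bound, value))

def private_leaf_value(gradients, grad_bound=1.0, reg_lambda=0.1, epsilon=1.0):
    """Differentially private leaf value for one tree leaf (illustrative sketch).

    Gradients are clipped so that a single record can change the leaf value
    by a bounded amount; Laplace noise calibrated to that sensitivity is
    then added to the leaf value.
    """
    clipped = [clip(g, grad_bound) for g in gradients]
    raw_value = -sum(clipped) / (len(clipped) + reg_lambda)
    # Worst case: one record with a maximally clipped gradient.
    sensitivity = grad_bound / (1.0 + reg_lambda)
    noise_scale = sensitivity / epsilon  # Laplace mechanism: b = sensitivity / epsilon
    # Sample Laplace(0, b) via the inverse CDF of a uniform draw on (-0.5, 0.5).
    u = random.random() - 0.5
    noise = -noise_scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return raw_value + noise
```

Clipping bounds how much any single training record can move the leaf value, which in turn bounds the Laplace noise scale needed for a given privacy budget epsilon.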
6 | 7 | Paper: https://arxiv.org/pdf/1911.04209.pdf 8 | 9 | Code: https://github.com/QinbinLi/DPBoost 10 | 11 | Remark: accepted to AAAI 2020. 12 | 13 | ![](framework.png) 14 | -------------------------------------------------------------------------------- /DPBoost/framework.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Xtra-Computing/PrivML/e888aa8db4688ac064b2d3220dfcc441393729a2/DPBoost/framework.png -------------------------------------------------------------------------------- /DPBoost/paper.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Xtra-Computing/PrivML/e888aa8db4688ac064b2d3220dfcc441393729a2/DPBoost/paper.pdf -------------------------------------------------------------------------------- /FL_survey/README.md: -------------------------------------------------------------------------------- 1 | ## A Survey on Federated Learning Systems: Vision, Hype and Reality for Data Privacy and Protection 2 | 3 | Authors: Qinbin Li, Zeyi Wen, Zhaomin Wu, Sixu Hu, Naibo Wang, Bingsheng He 4 | 5 | Abstract: Federated learning has been a hot research area, enabling the collaborative training of machine learning models among different organizations under privacy restrictions. As researchers try to support more machine learning models with different privacy-preserving approaches, there is a need to develop systems and infrastructures that ease the development of various federated learning algorithms. Just as deep learning systems such as Caffe, PyTorch, and TensorFlow boost the development of deep learning algorithms, federated learning systems are equally important, and face challenges from various aspects such as impractical system assumptions, scalability, and efficiency. 
Inspired by federated systems in other fields such as databases and cloud computing, we investigate the characteristics of existing federated learning systems. We find that two important features of federated systems in other fields, i.e., heterogeneity and autonomy, also apply to existing federated learning systems. Moreover, we provide a thorough categorization of federated learning systems according to six different aspects, including data distribution, machine learning model, privacy mechanism, communication architecture, scale of federation, and motivation of federation. The categorization can help the design of federated learning systems, as shown in our case studies. Lastly, we conduct a systematic comparison of existing federated learning systems and present future research opportunities and directions. 6 | 7 | Paper: https://arxiv.org/pdf/1907.09693.pdf 8 | 9 | ![](taxonomy.png) -------------------------------------------------------------------------------- /FL_survey/paper.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Xtra-Computing/PrivML/e888aa8db4688ac064b2d3220dfcc441393729a2/FL_survey/paper.pdf -------------------------------------------------------------------------------- /FL_survey/taxonomy.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Xtra-Computing/PrivML/e888aa8db4688ac064b2d3220dfcc441393729a2/FL_survey/taxonomy.png -------------------------------------------------------------------------------- /OARF/README.md: -------------------------------------------------------------------------------- 1 | ## The OARF Benchmark Suite: Characterization and Implications for Federated Learning Systems 2 | 3 | Authors: Sixu Hu, Yuan Li, Xu Liu, Qinbin Li, Zhaomin Wu, Bingsheng He 4 | 5 | Abstract: This paper presents and characterizes an Open Application Repository for Federated Learning (OARF), 
a benchmark suite for federated machine learning systems. Previously available benchmarks for federated learning focused mainly on synthetic datasets and used a very limited number of applications. OARF includes different data partitioning methods (horizontal, vertical, and hybrid) as well as emerging applications in image, text, and structured data, which represent different scenarios in federated learning. Our characterization shows that the benchmark suite is diverse in data size, distribution, feature distribution, and learning task complexity. We have developed reference implementations and evaluated important aspects of federated learning, including model accuracy, communication cost, differential privacy, secure multiparty computation, and vertical federated learning. 6 | 7 | Paper: https://arxiv.org/pdf/2006.07856 8 | 9 | ![](arch.png) 10 | -------------------------------------------------------------------------------- /OARF/arch.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Xtra-Computing/PrivML/e888aa8db4688ac064b2d3220dfcc441393729a2/OARF/arch.png -------------------------------------------------------------------------------- /OARF/paper.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Xtra-Computing/PrivML/e888aa8db4688ac064b2d3220dfcc441393729a2/OARF/paper.pdf -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Private Machine Learning 2 | 3 | ## Table of Contents 4 | * [Overview](#overview) 5 | * [Project Descriptions](#project-descriptions) 6 | * [Publications](#publications) 7 | 8 | ## Overview 9 | 10 | This repo summarizes the private machine learning work of the Xtra group. Currently we work mainly on two areas: federated learning and differential privacy. 
Federated learning enables multiple parties to collaboratively train a model without exchanging their local data. 11 | 12 | ## Project Descriptions 13 | 14 | We have worked/are working on the following projects. 15 | 16 | (1) [Federated Learning Survey](#FL_survey): We conducted a survey on federated learning systems. 17 | 18 | (2) [Federated Gradient Boosting Decision Trees](#SimFL): We designed a novel federated learning framework for gradient boosting decision trees. 19 | 20 | (3) [Differentially Private Gradient Boosting Decision Trees](#DPBoost): We designed a differentially private gradient boosting decision tree training algorithm. 21 | 22 | (4) [Federated Learning Benchmarks](#OARF): We designed a benchmark for evaluating the components in different FL systems. 23 | 24 | ## Publications 25 | 26 | * [A Survey on Federated Learning Systems: Vision, Hype and Reality for Data Privacy and Protection](https://qinbinli.com/files/FLSurvey.pdf)
27 | Qinbin Li, Zeyi Wen, Zhaomin Wu, Sixu Hu, Naibo Wang, Bingsheng He
28 | arXiv preprint 29 | * We conducted a comprehensive analysis against existing federated learning systems from different aspects (see [details](FL_survey)). 30 | 31 | * [Practical Federated Gradient Boosting Decision Trees](https://arxiv.org/abs/1911.04206)
32 | Qinbin Li, Zeyi Wen, Bingsheng He
33 | Thirty-Fourth AAAI Conference on Artificial Intelligence. AAAI 2020. 34 | * We proposed a novel federated learning framework for gradient boosting decision trees by exploiting similarity (see [details](SimFL)). 35 | 36 | * [Privacy-Preserving Gradient Boosting Decision Trees](https://arxiv.org/abs/1911.04209)
37 | Qinbin Li, Zhaomin Wu, Zeyi Wen, Bingsheng He
38 | Thirty-Fourth AAAI Conference on Artificial Intelligence. AAAI 2020. 39 | * We designed a new differentially private gradient boosting decision trees training algorithm (see [details](DPBoost)). 40 | 41 | * [The OARF Benchmark Suite: Characterization and Implications for Federated Learning Systems](https://arxiv.org/abs/2006.07856)
42 | Sixu Hu, Yuan Li, Xu Liu, Qinbin Li, Zhaomin Wu, Bingsheng He
43 | arXiv preprint. 44 | * We designed a benchmark for evaluating the components in different FL systems (see [details](OARF), [code](https://github.com/Xtra-computing/OARF)). 45 | 46 | -------------------------------------------------------------------------------- /SimFL/README.md: -------------------------------------------------------------------------------- 1 | ## Practical Federated Gradient Boosting Decision Trees 2 | 3 | Authors: Qinbin Li, Zeyi Wen, Bingsheng He 4 | 5 | Abstract: Gradient Boosting Decision Trees (GBDTs) have become very successful in recent years, winning many awards in machine learning and data mining competitions. There have been several recent studies on how to train GBDTs in the federated learning setting. In this paper, we focus on horizontal federated learning, where data samples with the same features are distributed among multiple parties. However, existing studies are not efficient or effective enough for practical use. They suffer either from inefficiency due to costly data transformations such as secret sharing and homomorphic encryption, or from low model accuracy due to differential privacy designs. We study a practical federated environment with relaxed privacy constraints. In this environment, a dishonest party might obtain some information about the other parties' data, but it is still impossible for the dishonest party to derive the actual raw data of other parties. Specifically, each party boosts a number of trees by exploiting similarity information based on locality-sensitive hashing. We prove that our framework is secure without exposing the original records to other parties, while the computation overhead in the training process is kept low. 
Our experimental studies show that, compared with normal training using the local data of each party, our approach can significantly improve the predictive accuracy, and achieves accuracy comparable to the original GBDT trained with the data from all parties. 6 | 7 | Paper: https://arxiv.org/pdf/1911.04206.pdf 8 | 9 | Code: https://github.com/Xtra-Computing/SimFL 10 | 11 | Remark: accepted to AAAI 2020. 12 | 13 | ![](train_a_tree.png) 14 | -------------------------------------------------------------------------------- /SimFL/paper.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Xtra-Computing/PrivML/e888aa8db4688ac064b2d3220dfcc441393729a2/SimFL/paper.pdf -------------------------------------------------------------------------------- /SimFL/train_a_tree.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Xtra-Computing/PrivML/e888aa8db4688ac064b2d3220dfcc441393729a2/SimFL/train_a_tree.png -------------------------------------------------------------------------------- /Tutorial/README.md: -------------------------------------------------------------------------------- 1 | # A Tutorial on Federated Learning Systems: Comparative Studies and Hands-on Demonstrations 2 | 3 | 4 | ## Presenters 5 | Qinbin Li, Sixu Hu, Zhaomin Wu, Yuan Li 6 | 7 | School of Computing 8 | 9 | National University of Singapore 10 | 11 | ## Videos 12 | You can watch the pre-recorded videos [here](https://drive.google.com/drive/folders/1CBzsy0lg4ML3rTFcMJECZRqkmjxegKdT?usp=sharing). 13 | 14 | ## Table of Contents 15 | * [Abstract](#Abstract) 16 | * [Outline](#Outline) 17 | * [References](#References) 18 | 19 | ## Abstract 20 | 21 | Federated learning has become a hot research area in machine learning; it enables multiple parties to collaboratively train a model without exchanging their local data. 
It is attractive to researchers working in many areas such as distributed learning, privacy, and fairness. Meanwhile, many companies have developed their own federated learning systems, such as [Webank FATE](https://github.com/FederatedAI/FATE) and [Google TensorFlow Federated](https://github.com/tensorflow/federated), which provide useful platforms for researchers. 22 | 23 | In this tutorial, we will introduce the concept of federated learning, including widely used federated learning frameworks. Specifically, we will present comparative studies of existing systems from the perspectives of data partitioning, machine learning model, scale of federation, communication architecture, privacy mechanism, and motivation of federation. Moreover, we will present existing federated learning systems and demonstrate the usage of FATE. We will also cover the benchmark study of federated learning systems. 24 | 25 | 26 | ## Outline 27 | 28 | * An Overview of Federated Learning (~30 min) 29 | * Concept 30 | * Category 31 | * Challenges 32 | * Federated Learning Systems 33 | * Demo 34 | * Benchmark (~30 min) 35 | * Vertical Federated Learning (~20 min) 36 | 37 | ## References 38 | [1] [A Survey on Federated Learning Systems: Vision, Hype and Reality for Data Privacy and Protection](https://arxiv.org/pdf/1907.09693.pdf)
39 | Qinbin Li, Zeyi Wen, Zhaomin Wu, Sixu Hu, Naibo Wang, Bingsheng He
40 | 41 | -------------------------------------------------------------------------------- /Tutorial/introduction.mp4: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Xtra-Computing/PrivML/e888aa8db4688ac064b2d3220dfcc441393729a2/Tutorial/introduction.mp4 --------------------------------------------------------------------------------