├── .github
│   └── ISSUE_TEMPLATE
│       └── paper-reading-templates.md
└── README.md

/.github/ISSUE_TEMPLATE/paper-reading-templates.md:
--------------------------------------------------------------------------------
---
name: Paper Reading Templates
about: Template for writing up notes on a paper read for this survey.
title: ''
labels: ''
assignees: ''

---

## In one sentence

### Paper link

### Authors / Affiliations

### Submission date (yyyy/MM/dd)

## Overview

## Novelty / Differences from prior work

## Method

## Results

## Comments
--------------------------------------------------------------------------------

/README.md:
--------------------------------------------------------------------------------
# survey-deep-gradient-compression


## Gradient Compression (Sparsification)
A minimal sketch of the Top-k-with-error-feedback idea shared by several of these papers follows the list.
* [ ] [Chen, Chia-Yu, et al. "ScaleCom: Scalable sparsified gradient compression for communication-efficient distributed training." arXiv preprint arXiv:2104.11125 (2021).](https://arxiv.org/pdf/2104.11125.pdf)
* [ ] [Han, Pengchao, Shiqiang Wang, and Kin K. Leung. "Adaptive gradient sparsification for efficient federated learning: An online learning approach." arXiv preprint arXiv:2001.04756 (2020).](https://arxiv.org/abs/2001.04756)
* [X] [Dutta, Aritra, et al. "On the discrepancy between the theoretical analysis and practical implementations of compressed communication for distributed deep learning." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 34. No. 04. 2020.](https://ojs.aaai.org/index.php/AAAI/article/view/5793)
* [X] [Shi, Shaohuai, et al. "Communication-efficient distributed deep learning with merged gradient sparsification on GPUs." IEEE INFOCOM 2020-IEEE Conference on Computer Communications. IEEE, 2020.](https://www.comp.hkbu.edu.hk/~chxw/papers/infocom_2020_MGS.pdf)
* [X] [Mishchenko, Konstantin, Filip Hanzely, and Peter Richtárik. "99% of worker-master communication in distributed optimization is not needed." Conference on Uncertainty in Artificial Intelligence. PMLR, 2020.](http://proceedings.mlr.press/v124/mishchenko20a/mishchenko20a.pdf)
* [ ] [Shi, Shaohuai, et al. "A distributed synchronous SGD algorithm with global Top-k sparsification for low bandwidth networks." 2019 IEEE 39th International Conference on Distributed Computing Systems (ICDCS). IEEE, 2019.](https://arxiv.org/pdf/1901.04359.pdf)
* [X] [Shi, Shaohuai, et al. "A Convergence Analysis of Distributed SGD with Communication-Efficient Gradient Sparsification." IJCAI. 2019.](https://www.ijcai.org/Proceedings/2019/0473.pdf)
* [ ] [Sattler, Felix, et al. "Sparse binary compression: Towards distributed deep learning with minimal communication." 2019 International Joint Conference on Neural Networks (IJCNN). IEEE, 2019.](https://arxiv.org/pdf/1805.08768.pdf)
* [X] [Shi, Shaohuai, et al. "Layer-wise adaptive gradient sparsification for distributed deep learning with convergence guarantees." arXiv preprint arXiv:1911.08727 (2019).](https://arxiv.org/abs/1911.08727)
* [X] [Alistarh, Dan, et al. "The Convergence of Sparsified Gradient Methods." Advances in Neural Information Processing Systems 31 (2018).](https://papers.nips.cc/paper/2018/hash/314450613369e0ee72d0da7f6fee773c-Abstract.html)
* [X] [Stich, Sebastian U., Jean-Baptiste Cordonnier, and Martin Jaggi. "Sparsified SGD with Memory." Advances in Neural Information Processing Systems 31 (2018): 4447-4458.](https://proceedings.neurips.cc/paper/2018/hash/b440509a0106086a67bc2ea9df0a1dab-Abstract.html)
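Several of the papers above (e.g., Stich et al.'s "Sparsified SGD with Memory" and Alistarh et al.'s convergence analysis) study Top-k sparsification combined with an error-feedback/memory term. The snippet below is only a minimal NumPy sketch of that idea for orientation, not code from any of the papers; the function name `topk_with_memory` and all parameters are my own.

```python
# Minimal sketch of Top-k gradient sparsification with error feedback (memory),
# in the spirit of Stich et al. (2018) and Alistarh et al. (2018).
# Illustration only; names and parameters are assumptions, not reference code.
import numpy as np

def topk_with_memory(grad, memory, k):
    """Keep the k largest-magnitude entries of (grad + memory);
    the untransmitted remainder is carried over as residual error."""
    corrected = grad + memory                          # error feedback: add residual
    idx = np.argpartition(np.abs(corrected), -k)[-k:]  # indices of the k largest magnitudes
    sparse = np.zeros_like(corrected)
    sparse[idx] = corrected[idx]                       # only these k values are communicated
    new_memory = corrected - sparse                    # residual kept locally for the next step
    return sparse, new_memory

# Toy usage: one worker compressing random gradients over a few steps.
rng = np.random.default_rng(0)
memory = np.zeros(10)
for step in range(3):
    grad = rng.normal(size=10)
    sparse_grad, memory = topk_with_memory(grad, memory, k=2)
    print(step, np.count_nonzero(sparse_grad), float(np.abs(memory).sum()))
```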

## Gradient Quantization
A minimal sketch of unbiased stochastic quantization, the building block behind several of these papers, follows the list.
* [X] [Zhu, Guangxu, et al. "One-bit over-the-air aggregation for communication-efficient federated edge learning: Design and convergence analysis." IEEE Transactions on Wireless Communications (2020).](https://arxiv.org/pdf/2001.05713.pdf)
* [X] [Abdi, Afshin, and Faramarz Fekri. "Quantized compressive sampling of stochastic gradients for efficient communication in distributed deep learning." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 34. No. 04. 2020.](https://ojs.aaai.org/index.php/AAAI/article/download/5706/5562)
* [ ] [Sohn, Jy-yong, et al. "Election coding for distributed learning: Protecting SignSGD against byzantine attacks." Advances in Neural Information Processing Systems 33 (2020).](https://proceedings.neurips.cc/paper/2020/hash/a7f0d2b95c60161b3f3c82f764b1d1c9-Abstract.html)
* [ ] [Horváth, Samuel, et al. "Stochastic distributed learning with gradient quantization and variance reduction." arXiv preprint arXiv:1904.05115 (2019).](https://arxiv.org/pdf/1904.05115.pdf)
* [ ] [Zheng, Shuai, Ziyue Huang, and James T. Kwok. "Communication-efficient distributed blockwise momentum SGD with error-feedback." arXiv preprint arXiv:1905.10936 (2019).](https://arxiv.org/pdf/1905.10936.pdf)
* [X] [Karimireddy, Sai Praneeth, et al. "Error feedback fixes SignSGD and other gradient compression schemes." International Conference on Machine Learning. PMLR, 2019.](http://proceedings.mlr.press/v97/karimireddy19a.html)
* [ ] [Wu, Jiaxiang, et al. "Error compensated quantized SGD and its applications to large-scale distributed optimization." International Conference on Machine Learning. PMLR, 2018.](http://proceedings.mlr.press/v80/wu18d/wu18d.pdf)
* [X] [Bernstein, Jeremy, et al. "signSGD with majority vote is communication efficient and fault tolerant." arXiv preprint arXiv:1810.05291 (2018).](https://arxiv.org/pdf/1810.05291.pdf)
* [X] [Yu, Mingchao, et al. "GradiVeQ: Vector quantization for bandwidth-efficient gradient aggregation in distributed CNN training." arXiv preprint arXiv:1811.03617 (2018).](https://arxiv.org/pdf/1811.03617.pdf)
* [X] [Alistarh, Dan, et al. "QSGD: Communication-efficient SGD via gradient quantization and encoding." Advances in Neural Information Processing Systems 30 (2017): 1709-1720.](https://papers.nips.cc/paper/2017/file/6c340f25839e6acdc73414517203f5f0-Paper.pdf)
* [X] [Wen, Wei, et al. "TernGrad: Ternary gradients to reduce communication in distributed deep learning." Proceedings of the 31st International Conference on Neural Information Processing Systems. 2017.](https://dl.acm.org/doi/pdf/10.5555/3294771.3294915)
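TernGrad and QSGD in the list above both rely on unbiased stochastic quantization of each gradient coordinate. The sketch below illustrates TernGrad-style ternarization only in spirit; `ternarize` and its arguments are my own naming, not the papers' reference implementations.

```python
# Minimal sketch of unbiased stochastic ternary quantization, loosely in the
# spirit of TernGrad (Wen et al., 2017) and QSGD-style stochastic rounding.
# Illustration only; not code from the papers.
import numpy as np

def ternarize(grad, rng):
    """Map each coordinate to {-s, 0, +s} with s = max|grad|, chosen so that
    the quantized vector is an unbiased estimate of grad."""
    s = np.max(np.abs(grad))
    if s == 0.0:
        return np.zeros_like(grad)
    p = np.abs(grad) / s                  # probability of keeping a nonzero value
    mask = rng.random(grad.shape) < p     # Bernoulli(|g_i| / s) per coordinate
    return s * np.sign(grad) * mask

rng = np.random.default_rng(0)
g = rng.normal(size=5)
# Averaging many independent quantizations should recover g (unbiasedness check).
est = np.mean([ternarize(g, rng) for _ in range(20000)], axis=0)
print(np.round(g, 3), np.round(est, 3))
```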

## Asynchronous Learning
* [X] [Grishchenko, Dmitry, et al. "Asynchronous distributed learning with sparse communications and identification." (2018).](https://hal.archives-ouvertes.fr/hal-01950120/document)
* [ ] [Bogoychev, Nikolay, et al. "Accelerating asynchronous stochastic gradient descent for neural machine translation." arXiv preprint arXiv:1808.08859 (2018).](https://arxiv.org/pdf/1808.08859.pdf)

## Federated Learning
* [ ] [Haddadpour, Farzin, et al. "Federated learning with compression: Unified analysis and sharp guarantees." International Conference on Artificial Intelligence and Statistics. PMLR, 2021.](http://proceedings.mlr.press/v130/haddadpour21a/haddadpour21a.pdf)
* [ ] [Amiri, Mohammad Mohammadi, and Deniz Gündüz. "Federated learning over wireless fading channels." IEEE Transactions on Wireless Communications 19.5 (2020): 3546-3557.](https://arxiv.org/pdf/1907.09769.pdf)
* [ ] [Abad, M. Salehi Heydar, et al. "Hierarchical federated learning across heterogeneous cellular networks." ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2020.](https://arxiv.org/pdf/1909.02362.pdf)

## Others
* [ ] [Zhu, Ligeng, and Song Han. "Deep leakage from gradients." Federated Learning. Springer, Cham, 2020. 17-31.](https://arxiv.org/pdf/1906.08935.pdf)

* [ ] [Jiang, Yimin, et al. "A Unified Architecture for Accelerating Distributed DNN Training in Heterogeneous GPU/CPU Clusters." 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20). 2020.](https://www.usenix.org/system/files/osdi20-jiang.pdf)

* [ ] [Singh, Navjot, et al. "SQuARM-SGD: Communication-Efficient Momentum SGD for Decentralized Optimization." arXiv preprint arXiv:2005.07041 (2020).](https://arxiv.org/pdf/2005.07041.pdf)

* [ ] [Sapio, Amedeo, et al. "Scaling distributed machine learning with in-network aggregation." arXiv preprint arXiv:1903.06701 (2019).](https://arxiv.org/pdf/1903.06701.pdf)

* [ ] [Jayarajan, Anand, et al. "Priority-based parameter propagation for distributed DNN training." arXiv preprint arXiv:1905.03960 (2019).](https://arxiv.org/pdf/1905.03960.pdf)

* [ ] [Renggli, Cédric, et al. "SparCML: High-performance sparse communication for machine learning." Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. 2019.](https://arxiv.org/pdf/1802.08021.pdf)

* [ ] [Haddadpour, Farzin, et al. "Local SGD with periodic averaging: Tighter analysis and adaptive synchronization." arXiv preprint arXiv:1910.13598 (2019).](https://arxiv.org/pdf/1910.13598.pdf)

* [ ] [Coleman, Cody, et al. "Analysis of DAWNBench, a time-to-accuracy machine learning performance benchmark." ACM SIGOPS Operating Systems Review 53.1 (2019): 14-25.](https://arxiv.org/pdf/1806.01427.pdf)
  Proposes a new benchmark and evaluation metrics.

* [ ] [Shi, Shaohuai, Xiaowen Chu, and Bo Li. "MG-WFBP: Efficient data communication for distributed synchronous SGD algorithms." IEEE INFOCOM 2019-IEEE Conference on Computer Communications. IEEE, 2019.](https://arxiv.org/pdf/1811.11141.pdf)

* [ ] [Luo, Liang, et al. "Parameter hub: A rack-scale parameter server for distributed deep neural network training." Proceedings of the ACM Symposium on Cloud Computing. 2018.](https://dl.acm.org/doi/pdf/10.1145/3267809.3267840)

* [ ] [Li, Youjie, et al. "Pipe-SGD: A decentralized pipelined SGD framework for distributed deep net training." arXiv preprint arXiv:1811.03619 (2018).](https://arxiv.org/pdf/1811.03619.pdf)

* [ ] [Sridharan, Srinivas, et al. "On scale-out deep learning training for cloud and HPC." arXiv preprint arXiv:1801.08030 (2018).](https://arxiv.org/pdf/1801.08030.pdf)
--------------------------------------------------------------------------------