├── .github └── PULL_REQUEST_TEMPLATE.md ├── AutoML_system.md ├── GNN_system.md ├── LICENSE ├── README.md ├── conference.md ├── data_processing.md ├── drl_system.md ├── edge_system.md ├── federated_learning_system.md ├── imgs └── AI_system.png ├── inference.md ├── infra.md ├── llm_serving.md ├── llm_training.md ├── note ├── Eurosys2020.md ├── Eurosys2021.md ├── MLSys2021.md ├── NSDI2021.md ├── OSDI2020.md ├── SIGCOMM2020.md ├── SoCC2019.md ├── SoCC2020.md └── SoCC2021.md ├── paper └── mlsys-whitepaper.pdf ├── training.md └── video_system.md /.github/PULL_REQUEST_TEMPLATE.md: -------------------------------------------------------------------------------- 1 | 2 | 3 | - Title [[Paper]](link) [[GitHub]](link) 4 | - Author (*conference(journal) year*) 5 | - Summary: 6 | -------------------------------------------------------------------------------- /AutoML_system.md: -------------------------------------------------------------------------------- 1 | # AutoML System 2 | 3 | ## Survey 4 | 5 | - A curated list of automated machine learning papers, articles, tutorials, slides and projects [[GitHub]](https://github.com/hibayesian/awesome-automl-papers) 6 | - Taking human out of learning applications: A survey on automated machine learning. [[Must Read Survey]](https://arxiv.org/pdf/1810.13306.pdf) 7 | - Quanming, Y., Mengshuo, W., Hugo, J.E., Isabelle, G., Yi-Qi, H., Yu-Feng, L., Wei-Wei, T., Qiang, Y. and Yang, Y. 8 | - AutoML Freiburg-Hannover [[Website]](https://www.ml4aad.org/automl/) 9 | - Survey on End-To-End Machine Learning Automation [[Paper]](https://arxiv.org/pdf/1906.02287.pdf) [[GitHub]](https://github.com/DataSystemsGroupUT/AutoML_Survey) 10 | - Design Automation for Efficient Deep Learning Computing [[paper]](https://arxiv.org/pdf/1904.10616.pdf) [[GitHub]](https://github.com/mit-han-lab/haq-release) 11 | - Han, Song, et al. (*arXiv preprint arXiv:1904.10616 (2019)*) 12 | 13 | 14 | ## AutoML Opensource Toolkit 15 | - Swearingen, Thomas, et al. "ATM: A distributed, collaborative, scalable system for automated machine learning." 2017 IEEE International Conference on Big Data (Big Data). IEEE, 2017. [[Paper]](https://dai.lids.mit.edu/wp-content/uploads/2018/02/atm_IEEE_BIgData-9-1.pdf) [[GitHub]](https://github.com/HDI-Project/ATM) 16 | - Google vizier: A service for black-box optimization. [[Paper]](https://ai.google/research/pubs/pub46180.pdf) [[GitHub]](https://github.com/tobegit3hub/advisor) 17 | - Golovin, Daniel, et al. (*SIGMOD 2017*) 18 | - Aut-sklearn: Automated Machine Learning with scikit-learn [[GitHub]](https://github.com/automl/auto-sklearn) [[Paper]](https://papers.nips.cc/paper/5872-efficient-and-robust-automated-machine-learning.pdf) 19 | - Katib: A Distributed General AutoML Platform on Kubernetes [[GitHub]](https://github.com/kubeflow/katib/) [[Paper]](https://www.usenix.org/system/files/opml19papers-zhou.pdf) 20 | - NNI: An open source AutoML toolkit for neural architecture search and hyper-parameter tuning [[GitHub]](https://github.com/Microsoft/nni) 21 | - AutoKeras: Accessible AutoML for deep learning. [[GitHub]](https://github.com/keras-team/autokeras) 22 | - Facebook/Ax: Adaptive experimentation is the machine-learning guided process of iteratively exploring a (possibly infinite) parameter space in order to identify optimal configurations in a resource-efficient manner. [[GitHub]](https://github.com/facebook/Ax) 23 | - DeepSwarm: DeepSwarm is an open-source library which uses Ant Colony Optimization to tackle the neural architecture search problem. 
[[GitHub]](https://github.com/Pattio/DeepSwarm) 24 | - Google/AdaNet: AdaNet is a lightweight TensorFlow-based framework for automatically learning high-quality models with minimal expert. Importantly, AdaNet provides a general framework for not only learning a neural network architecture, but also for learning to ensemble to obtain even better models. [[GitHub]](https://github.com/tensorflow/adanet) 25 | - TransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an AutoML library for building modular, reusable, strongly typed machine learning workflows on Apache Spark with minimal hand-tuning [[GitHub]](https://github.com/salesforce/TransmogrifAI) 26 | - Angel-ML/automl:An automatic machine learning toolkit, including hyper-parameter tuning and feature engineering. [[GitHub]](https://github.com/Angel-ML/automlI) 27 | 28 | 29 | ## Auto Model Selection 30 | 31 | - Automating model search for large scale machine learning [[Paper]](https://amplab.cs.berkeley.edu/wp-content/uploads/2015/07/163-sparks.pdf) 32 | - Sparks, E.R., Talwalkar, A., Haas, D., Franklin, M.J., Jordan, M.I. and Kraska, T., 2015, August. 33 | - In Proceedings of the Sixth ACM Symposium on Cloud Computing (pp. 368-380). ACM. 34 | - A framework for searching a predictive model [[Paper]](https://arxiv.org/pdf/1908.10310.pdf) 35 | - Takahashi, Yoshiki, Masato Asahara, and Kazuyuki Shudo 36 | - In SysML Conference, vol. 2018. 2018. 37 | - Dynamic Autoselection and Autotuning of Machine Learning Models for Cloud Network Analytics [[Paper]](https://ieeexplore.ieee.org/document/8500348) 38 | - IEEE Transactions on Parallel and Distributed Systems 30, no. 5 (2018): 1052-1064. 39 | - Karn, Rupesh Raj, Prabhakar Kudva, and Ibrahim Abe M. Elfadel. 40 | -------------------------------------------------------------------------------- /GNN_system.md: -------------------------------------------------------------------------------- 1 | # System for GNN training&inference 2 | 3 | Papers and projects about GNN training and inference 4 | 5 | # Platform 6 | 7 | - Graph-Learn(GL) is a framework designed to simplify the application of graph neural networks(GNNs). [[Paper]](http://www.vldb.org/pvldb/vol12/p2094-zhu.pdf) [[Github]](https://github.com/alibaba/graph-learn) 8 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2019 Huaizheng 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | [![Maintenance](https://img.shields.io/badge/Maintained%3F-YES-green.svg)](https://github.com/HuaizhengZhang/Awesome-System-for-Machine-Learning/graphs/commit-activity) 2 | [![Commit Activity](https://img.shields.io/github/commit-activity/m/HuaizhengZhang/Awesome-System-for-Machine-Learning.svg?color=red)](https://github.com/HuaizhengZhang/Awesome-System-for-Machine-Learning/graphs/commit-activity) 3 | [![Last Commit](https://img.shields.io/github/last-commit/HuaizhengZhang/Awesome-System-for-Machine-Learning.svg)](https://github.com/HuaizhengZhang/Awesome-System-for-Machine-Learning/commits/master) 4 | [![Ask Me Anything !](https://img.shields.io/badge/Ask%20me-anything-1abc9c.svg)](https://GitHub.com/Naereen/ama) 5 | [![Awesome](https://awesome.re/badge.svg)](https://awesome.re) 6 | [![GitHub license](https://img.shields.io/github/license/HuaizhengZhang/Awesome-System-for-Machine-Learning.svg?color=blue)](https://github.com/HuaizhengZhang/Awesome-System-for-Machine-Learning/blob/master/LICENSE) 7 | [![GitHub stars](https://img.shields.io/github/stars/HuaizhengZhang/Awesome-System-for-Machine-Learning.svg?style=social)](https://GitHub.com/HuaizhengZhang/Awesome-System-for-Machine-Learning/stargazers/) 8 | 9 | # AI System School 10 | 11 | ### 💫💫💫 System for Machine Learning, LLM (Large Language Model), GenAI (Generative AI) 12 | 13 | ### Updates: 14 | 15 | - Video Tutorials [[YouTube]](https://youtu.be/ChD1_aVZJ0g?si=Kg-yB3F4Iea0Xp9J) [[bilibili]](https://www.bilibili.com/video/BV1ZwYUerEtL/) [[小红书]](http://xhslink.com/MmrjcT) 16 | - We are preparing a new website [[Lets Go AI]](https://letsgoai.pro/) for this repo!!! 17 | 18 | ### *Path to System for AI* [[Whitepaper You Must Read]](./paper/mlsys-whitepaper.pdf) 19 | 20 | A curated list of research in machine learning systems. Link to the code if available is also present. Now we have a [team](#maintainer) to maintain this project. *You are very welcome to pull request by using our template*. 
21 | 22 | ![AI system](https://github.com/HuaizhengZhang/Awesome-System-for-Machine-Learning/blob/master/imgs/AI_system.png) 23 | 24 | ## System for AI (Ordered by Category) 25 | 26 | ### ML / DL Infra 27 | 28 | - [Data Processing](data_processing.md#data-processing) 29 | - [Training System](training.md#training-system) 30 | - [Inference System](inference.md#inference-system) 31 | - [Machine Learning Infrastructure](infra.md#machine-learning-infrastructure) 32 | 33 | ### LLM Infra 34 | 35 | - [LLM Training](llm_training.md#llm_training) 36 | - [LLM Serving](llm_serving.md#llm_serving) 37 | 38 | ### Domain-Specific Infra 39 | 40 | - [Video System](video_system.md#video-system) 41 | - [AutoML System](AutoML_system.md#automl-system) 42 | - [Edge AI](edge_system.md#edge-or-mobile-papers) 43 | - [GNN System](GNN_system.md#system-for-gnn-traininginference) 44 | - [Federated Learning System](federated_learning_system.md#federated-learning-system) 45 | - [Deep Reinforcement Learning System](drl_system.md#deep-reinforcement-learning-system) 46 | 47 | ## System for ML/LLM Conference 48 | 49 | ### Conference 50 | 51 | - OSDI 52 | - SOSP 53 | - SIGCOMM 54 | - NSDI 55 | - MLSys 56 | - ATC 57 | - Eurosys 58 | - Middleware 59 | - SoCC 60 | - TinyML 61 | 62 | ## General Resources 63 | 64 | - [Survey](#survey) 65 | - [Book](#book) 66 | - [Video](#video) 67 | - [Course](#course) 68 | - [Blog](#blog) 69 | 70 | ## Survey 71 | 72 | - Toward Highly Available, Intelligent Cloud and ML Systems [[Slide]](http://sysnetome.com/Talks/cguo_netai_2018.pdf) 73 | - A curated list of awesome System Designing articles, videos and resources for distributed computing, AKA Big Data. [[GitHub]](https://github.com/madd86/awesome-system-design) 74 | - awesome-production-machine-learning: A curated list of awesome open source libraries to deploy, monitor, version and scale your machine learning [[GitHub]](https://github.com/EthicalML/awesome-production-machine-learning) 75 | - Opportunities and Challenges Of Machine Learning Accelerators In Production [[Paper]](https://www.usenix.org/system/files/opml19papers-ananthanarayanan.pdf) 76 | - Ananthanarayanan, Rajagopal, et al. " 77 | - 2019 {USENIX} Conference on Operational Machine Learning (OpML 19). 2019. 78 | - How (and How Not) to Write a Good Systems Paper [[Advice]](https://www.usenix.org/legacy/events/samples/submit/advice_old.html) 79 | - Applied machine learning at Facebook: a datacenter infrastructure perspective [[Paper]](https://research.fb.com/wp-content/uploads/2017/12/hpca-2018-facebook.pdf) 80 | - Hazelwood, Kim, et al. (*HPCA 2018*) 81 | - Infrastructure for Usable Machine Learning: The Stanford DAWN Project 82 | - Bailis, Peter, Kunle Olukotun, Christopher Ré, and Matei Zaharia. (*preprint 2017*) 83 | - Hidden technical debt in machine learning systems [[Paper]](https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf) 84 | - Sculley, David, et al. (*NIPS 2015*) 85 | - End-to-end arguments in system design [[Paper]](http://web.mit.edu/Saltzer/www/publications/endtoend/endtoend.pdf) 86 | - Saltzer, Jerome H., David P. Reed, and David D. Clark. 87 | - System Design for Large Scale Machine Learning [[Thesis]](http://shivaram.org/publications/shivaram-dissertation.pdf) 88 | - Deep Learning Inference in Facebook Data Centers: Characterization, Performance Optimizations and Hardware Implications [[Paper]](https://arxiv.org/pdf/1811.09886.pdf) 89 | - Park, Jongsoo, Maxim Naumov, Protonu Basu et al. 
*arXiv 2018* 90 | - Summary: This paper presents a characterizations of DL models and then shows the new design principle of DL hardware. 91 | - A Berkeley View of Systems Challenges for AI [[Paper]](https://arxiv.org/pdf/1712.05855.pdf) 92 | 93 | 94 | ## Book 95 | 96 | - Computer Architecture: A Quantitative Approach [[Must read]](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.115.1881&rep=rep1&type=pdf) 97 | - Distributed Machine Learning Patterns [[Website]](https://www.manning.com/books/distributed-machine-learning-patterns) 98 | - Streaming Systems [[Book]](https://www.oreilly.com/library/view/streaming-systems/9781491983867/) 99 | - Kubernetes in Action (start to read) [[Book]](https://www.oreilly.com/library/view/kubernetes-in-action/9781617293726/) 100 | - Machine Learning Systems: Designs that scale [[Website]](https://www.manning.com/books/machine-learning-systems) 101 | - Trust in Machine Learning [[Website]](https://www.manning.com/books/trust-in-machine-learning) 102 | - Automated Machine Learning in Action [[Website]](https://www.manning.com/books/automated-machine-learning-in-action) 103 | - Machine Learning Systems: Principles and Practices of Engineering Artificially Intelligent Systems [[Website]](https://mlsysbook.ai/) 104 | ## Video 105 | 106 | - ScalaDML2020: Learn from the best minds in the machine learning community. [[Video]](https://info.matroid.com/scaledml-media-archive-preview) 107 | - Jeff Dean: "Achieving Rapid Response Times in Large Online Services" Keynote - Velocity 2014 [[YouTube]](https://www.youtube.com/watch?v=1-3Ahy7Fxsc) 108 | - From Research to Production with PyTorch [[Video]](https://www.infoq.com/presentations/pytorch-torchscript-botorch/#downloadPdf/) 109 | - Introduction to Microservices, Docker, and Kubernetes [[YouTube]](https://www.youtube.com/watch?v=1xo-0gCVhTU) 110 | - ICML Keynote: Lessons Learned from Helping 200,000 non-ML experts use ML [[Video]](https://slideslive.com/38916584/keynote-lessons-learned-from-helping-200000-nonml-experts-use-ml) 111 | - Adaptive & Multitask Learning Systems [[Website]](https://www.amtl-workshop.org/schedule) 112 | - System thinking. A TED talk. [[YouTube]](https://www.youtube.com/watch?v=_vS_b7cJn2A) 113 | - Flexible systems are the next frontier of machine learning. Jeff Dean [[YouTube]](https://www.youtube.com/watch?v=Jnunp-EymJQ&list=WL&index=12) 114 | - Is It Time to Rewrite the Operating System in Rust? [[YouTube]](https://www.youtube.com/watch?v=HgtRAbE1nBM&list=WL&index=17&t=0s) 115 | - InfoQ: AI, ML and Data Engineering [[YouTube]](https://www.youtube.com/playlist?list=PLndbWGuLoHeYsZk6VpCEj_SSd9IFgjJ-2) 116 | - Start to watch. 117 | - Netflix: Human-centric Machine Learning Infrastructure [[InfoQ]](https://www.infoq.com/presentations/netflix-ml-infrastructure?utm_source=youtube&utm_medium=link&utm_campaign=qcontalks) 118 | - SysML 2019: [[YouTube]](https://www.youtube.com/channel/UChutDKIa-AYyAmbT45s991g/videos) 119 | - ScaledML 2019: David Patterson, Ion Stoica, Dawn Song and so on [[YouTube]](https://www.youtube.com/playlist?list=PLRM2gQVaW_wWXoUnSfZTxpgDmNaAS1RtG) 120 | - ScaledML 2018: Jeff Dean, Ion Stoica, Yangqing Jia and so on [[YouTube]](https://www.youtube.com/playlist?list=PLRM2gQVaW_wW9KAxcibxdqY_TDyvmEjzm) [[Slides]](https://www.matroid.com/blog/post/slides-and-videos-from-scaledml-2018) 121 | - A New Golden Age for Computer Architecture History, Challenges, and Opportunities. 
David Patterson [[YouTube]](https://www.youtube.com/watch?v=uyc_pDBJotI&t=767s) 122 | - How to Have a Bad Career. David Patterson (I am a big fan) [[YouTube]](https://www.youtube.com/watch?v=Rn1w4MRHIhc) 123 | - SysML 18: Perspectives and Challenges. Michael Jordan [[YouTube]](https://www.youtube.com/watch?v=4inIBmY8dQI&t=26s) 124 | - SysML 18: Systems and Machine Learning Symbiosis. Jeff Dean [[YouTube]](https://www.youtube.com/watch?v=Nj6uxDki6-0) 125 | - AutoML Basics: Automated Machine Learning in Action. Qingquan Song, Haifeng Jin, Xia Hu [[YouTube]](https://www.youtube.com/watch?v=9KpieG0B7VM) 126 | 127 | ## Course 128 | 129 | - CS692 Seminar: Systems for Machine Learning, Machine Learning for Systems [[GitHub]](https://github.com/guanh01/CS692-mlsys) 130 | - Topics in Networks: Machine Learning for Networking and Systems, Autumn 2019 [[Course Website]](https://people.cs.uchicago.edu/~junchenj/34702-fall19/syllabus.html) 131 | - CS6465: Emerging Cloud Technologies and Systems Challenges [[Cornell]](http://www.cs.cornell.edu/courses/cs6465/2019fa/) 132 | - CS294: AI For Systems and Systems For AI. [[UC Berkeley Spring]](https://github.com/ucbrise/cs294-ai-sys-sp19) (*Strong Recommendation*) [[Machine Learning Systems (Fall 2019)]](https://ucbrise.github.io/cs294-ai-sys-fa19/) 133 | - CSE 599W: System for ML. [[Chen Tianqi]](https://github.com/tqchen) [[University of Washington]](http://dlsys.cs.washington.edu/) 134 | - EECS 598: Systems for AI (W'21). [[Mosharaf Chowdhury]](https://www.mosharaf.com/) [[Systems for AI (W'21)]](https://github.com/mosharaf/eecs598/tree/w21-ai) 135 | - Tutorial code on how to build your own Deep Learning System in 2k Lines [[GitHub]](https://github.com/tqchen/tinyflow) 136 | - CSE 291F: Advanced Data Analytics and ML Systems. [[UCSD]](http://cseweb.ucsd.edu/classes/wi19/cse291-f/) 137 | - CSci 8980: Machine Learning in Computer Systems [[University of Minnesota, Twin Cities]](http://www-users.cselabs.umn.edu/classes/Spring-2019/csci8980/) 138 | - Mu Li (MxNet, Parameter Server): Introduction to Deep Learning [[Best DL Course I think]](https://courses.d2l.ai/berkeley-stat-157/index.html) [[Book]](https://www.d2l.ai/) 139 | - 10-605: Machine Learning with Large Datasets. [[CMU]](https://10605.github.io/fall2020/index.html) 140 | - CS 329S: Machine Learning Systems Design. 
[[Stanford]](https://stanford-cs329s.github.io/index.html) 141 | 142 | ## Blog 143 | 144 | - Parallelizing across multiple CPU/GPUs to speed up deep learning inference at the edge [[Amazon Blog]](https://aws.amazon.com/blogs/machine-learning/parallelizing-across-multiple-cpu-gpus-to-speed-up-deep-learning-inference-at-the-edge/) 145 | - Building Robust Production-Ready Deep Learning Vision Models in Minutes [[Blog]](https://medium.com/google-developer-experts/building-robust-production-ready-deep-learning-vision-models-in-minutes-acd716f6450a) 146 | - Deploy Machine Learning Models with Keras, FastAPI, Redis and Docker [[Blog]](https://medium.com/@shane.soh/deploy-machine-learning-models-with-keras-fastapi-redis-and-docker-4940df614ece) 147 | - How to Deploy a Machine Learning Model -- Creating a production-ready API using FastAPI + Uvicorn [[Blog]](https://towardsdatascience.com/how-to-deploy-a-machine-learning-model-dc51200fe8cf) [[GitHub]](https://github.com/MaartenGr/ML-API) 148 | - Deploying a Machine Learning Model as a REST API [[Blog]](https://towardsdatascience.com/deploying-a-machine-learning-model-as-a-rest-api-4a03b865c166) 149 | - Continuous Delivery for Machine Learning [[Blog]](https://martinfowler.com/articles/cd4ml.html) 150 | - Kubernetes CheatSheets In A4 [[GitHub]](https://github.com/HuaizhengZhang/cheatsheet-kubernetes-A4) 151 | - A Gentle Introduction to Kubernetes [[Blog]](https://medium.com/faun/a-gentle-introduction-to-kubernetes-4961e443ba26) 152 | - Train and Deploy Machine Learning Model With Web Interface - Docker, PyTorch & Flask [[GitHub]](https://github.com/imadelh/ML-web-app) 153 | - Learning Kubernetes, The Chinese Taoist Way [[GitHub]](https://github.com/caicloud/kube-ladder) 154 | - Data pipelines, Luigi, Airflow: everything you need to know [[Blog]](https://towardsdatascience.com/data-pipelines-luigi-airflow-everything-you-need-to-know-18dc741449b7) 155 | - The Deep Learning Toolset — An Overview [[Blog]](https://medium.com/luminovo/the-deep-learning-toolset-an-overview-b71756016c06) 156 | - Summary of CSE 599W: Systems for ML [[Chinese Blog]](http://jcf94.com/2018/10/04/2018-10-04-cse559w/) 157 | - Polyaxon, Argo and Seldon for Model Training, Package and Deployment in Kubernetes [[Blog]](https://medium.com/analytics-vidhya/polyaxon-argo-and-seldon-for-model-training-package-and-deployment-in-kubernetes-fa089ba7d60b) 158 | - Overview of the different approaches to putting Machine Learning (ML) models in production [[Blog]](https://medium.com/analytics-and-data/overview-of-the-different-approaches-to-putting-machinelearning-ml-models-in-production-c699b34abf86) 159 | - Being a Data Scientist does not make you a Software Engineer [[Part1]](https://towardsdatascience.com/being-a-data-scientist-does-not-make-you-a-software-engineer-c64081526372) 160 | Architecting a Machine Learning Pipeline [[Part2]](https://towardsdatascience.com/architecting-a-machine-learning-pipeline-a847f094d1c7) 161 | - Model Serving in PyTorch [[Blog]](https://pytorch.org/blog/model-serving-in-pyorch/) 162 | - Machine learning in Netflix [[Medium]](https://medium.com/@NetflixTechBlog) 163 | - SciPy Conference Materials (slides, repo) [[GitHub]](https://github.com/deniederhut/Slides-SciPyConf-2018) 164 | - 继Spark之后,UC Berkeley 推出新一代AI计算引擎——Ray [[Blog]](http://www.qtmuniao.com/2019/04/06/ray/) 165 | - 了解/从事机器学习/深度学习系统相关的研究需要什么样的知识结构? 
[[Zhihu]](https://www.zhihu.com/question/315611053/answer/623529977) 166 | - Learn Kubernetes in Under 3 Hours: A Detailed Guide to Orchestrating Containers [[Blog]](https://www.freecodecamp.org/news/learn-kubernetes-in-under-3-hours-a-detailed-guide-to-orchestrating-containers-114ff420e882/) [[GitHub]](https://github.com/rinormaloku/k8s-mastery) 167 | - data-engineer-roadmap: Learning from multiple companies in Silicon Valley. Netflix, Facebook, Google, Startups [[GitHub]](https://github.com/hasbrain/data-engineer-roadmap) 168 | - TensorFlow Serving + Docker + Tornado机器学习模型生产级快速部署 [[Blog]](https://zhuanlan.zhihu.com/p/52096200?utm_source=wechat_session&utm_medium=social&utm_oi=38612796178432) 169 | - Deploying a Machine Learning Model as a REST API [[Blog]](https://towardsdatascience.com/deploying-a-machine-learning-model-as-a-rest-api-4a03b865c166) 170 | - Colossal-AI: A Unified Deep Learning System for Big Model Era [[Blog]](https://medium.com/@hpcaitech/train-18-billion-parameter-gpt-models-with-a-single-gpu-on-your-personal-computer-8793d08332dc) [[GitHub]](https://github.com/hpcaitech/ColossalAI) 171 | - Data Engineer Roadmap [[Scaler Blogs]](https://www.scaler.com/blog/data-engineer-roadmap/) 172 | 173 | -------------------------------------------------------------------------------- /conference.md: -------------------------------------------------------------------------------- 1 | ## Year 2020 2 | 3 | ### ATC 2020 [[Program]](https://www.usenix.org/conference/atc20/technical-sessions) 4 | 5 | - Serverless in the Wild: Characterizing and Optimizing the Serverless Workload at a Large Cloud Provider [[Paper]](https://www.usenix.org/conference/atc20/presentation/shahrad) 6 | - Mohammad Shahrad, Rodrigo Fonseca, Íñigo Goiri, Gohar Chaudhry, Paul Batum, Jason Cooke, Eduardo Laureano, Colby Tresness, Mark Russinovich, and Ricardo Bianchini, Microsoft Azure and Microsoft Research 7 | - Summary: Since many ML services are stateless, FaaS is a good chance to reduce cost. This paper analyzes the real-wolrd workload on FaaS and provides a good dataset. 8 | - Lessons Learned from the Chameleon Testbed [[Paper]](https://www.usenix.org/conference/atc20/presentation/keahey) 9 | - Kate Keahey, Argonne National Laboratory; Jason Anderson and Zhuo Zhen, University of Chicago; Pierre Riteau, StackHPC Ltd; Paul Ruth, RENCI UNC Chapel Hill; Dan Stanzione, Texas Advanced Computing Center; Mert Cevik, RENCI UNC Chapel Hill; Jacob Colleran and Haryadi S. Gunawi, University of Chicago; Cody Hammock, Texas Advanced Computing Center; Joe Mambretti, Northwestern University; Alexander Barnes, François Halbah, Alex Rocha, and Joe Stubbs, Texas Advanced Computing Center 10 | - Offload Annotations: Bringing Heterogeneous Computing to Existing Libraries and Workloads [[Paper]](https://www.usenix.org/conference/atc20/presentation/yuan) 11 | - Gina Yuan, Shoumik Palkar, Deepak Narayanan, and Matei Zaharia, Stanford University 12 | - HetPipe: Enabling Large DNN Training on (Whimpy) Heterogeneous GPU Clusters through Integration of Pipelined Model Parallelism and Data Parallelism [[Paper]](https://www.usenix.org/conference/atc20/presentation/park) 13 | - Jay H. Park, Gyeongchan Yun, Chang M. Yi, Nguyen T. Nguyen, and Seungmin Lee, UNIST; Jaesik Choi, KAIST; Sam H. 
Noh and Young-ri Choi, UNIST 14 | - AutoSys: The Design and Operation of Learning-Augmented Systems [[Paper]](https://www.usenix.org/conference/atc20/presentation/liang-mike) 15 | - Chieh-Jan Mike Liang, Hui Xue, Mao Yang, and Lidong Zhou, Microsoft Research; Lifei Zhu, Peking University and Microsoft Research; Zhao Lucis Li and Zibo Wang, University of Science and Technology of China and Microsoft Research; Qi Chen and Quanlu Zhang, Microsoft Research; Chuanjie Liu, Microsoft Bing Platform; Wenjun Dai, Microsoft Bing Ads 16 | - Daydream: Accurately Estimating the Efficacy of Optimizations for DNN Training [[Paper]](https://www.usenix.org/conference/atc20/presentation/zhu-hongyu) 17 | - Hongyu Zhu, University of Toronto & Vector Institute; Amar Phanishayee, Microsoft Research; Gennady Pekhimenko, University of Toronto & Vector Institute 18 | - ALERT: Accurate Learning for Energy and Timeliness [[Paper]](https://www.usenix.org/conference/atc20/presentation/wan) 19 | - Chengcheng Wan, Muhammad Santriaji, Eri Rogers, Henry Hoffmann, Michael Maire, and Shan Lu, University of Chicago 20 | - NeuOS: A Latency-Predictable Multi-Dimensional Optimization Framework for DNN-driven Autonomous Systems [[Paper]](https://www.usenix.org/conference/atc20/presentation/bateni) 21 | - Soroush Bateni and Cong Liu, University of Texas at Dallas 22 | - PERCIVAL: Making In-Browser Perceptual Ad Blocking Practical with Deep Learning [[Paper]](https://www.usenix.org/conference/atc20/presentation/din) 23 | - Zainul Abi Din, UC Davis; Panagiotis Tigas, University of Oxford; Samuel T. King, UC Davis, Bouncer Technologies; Benjamin Livshits, Brave Software, Imperial College London 24 | 25 | 26 | 27 | ### ICLR 2020: Challenges in Deploying and Monitoring Machine Learning Systems [[Workshop]](https://icml.cc/Conferences/2020/Schedule?showEvent=5738) 28 | 29 | ### MLsys 2020 [[All Papers]](https://mlsys.org/Conferences/2020/ScheduleMultitrack?text=&session=&event_type=&day=) 30 | 31 | Most of the papers deserve to be read. 32 | 33 | ### NSDI 2020 34 | 35 | - Gandalf: An Intelligent, End-To-End Analytics Service for Safe Deployment in Large-Scale Cloud Infrastructure [[Paper]](https://www.usenix.org/system/files/nsdi20-paper-li.pdf) 36 | - Ze Li, Qian Cheng, Ken Hsieh, and Yingnong Dang, Microsoft Azure; Peng Huang, Johns Hopkins University; Pankaj Singh and Xinsheng Yang, Microsoft Azure; Qingwei Lin, Microsoft Research; Youjiang Wu, Sebastien Levy, and Murali Chintalapati, Microsoft Azure 37 | - Telekine: Secure Computing with Cloud GPUs [[Paper]](https://www.usenix.org/system/files/nsdi20-paper-hunt.pdf) 38 | - Tyler Hunt, Zhipeng Jia, Vance Miller, Ariel Szekely, and Yige Hu, The University of Texas at Austin; Christopher J. Rossbach, The University of Texas at Austin and VMware Research; Emmett Witchel, The University of Texas at Austin 39 | - Rex: Preventing Bugs and Misconfiguration in Large Services Using Correlated Change Analysis [[Paper]](https://www.usenix.org/system/files/nsdi20-paper-mehta.pdf) 40 | - Sonu Mehta, Ranjita Bhagwan, and Rahul Kumar, Microsoft Research India; Chetan Bansal, Microsoft Research; Chandra Maddila and B. 
Ashok, Microsoft Research India; Sumit Asthana, University of Michigan; Christian Bird, Microsoft Research; Aditya Kumar 41 | - Themis: Fair and Efficient GPU Cluster Scheduling [[Paper]](https://www.usenix.org/system/files/nsdi20-paper-mahajan.pdf) 42 | - Kshiteej Mahajan, Arjun Balasubramanian, Arjun Singhvi, Shivaram Venkataraman, and Aditya Akella, University of Wisconsin-Madison; Amar Phanishayee, Microsoft Research; Shuchi Chawla, University of Wisconsin-Madison 43 | 44 | 45 | ### Eurosys 2020 46 | 47 | - Balancing Efficiency and Fairness in Heterogeneous GPU Clusters for Deep Learning [[Paper]](https://dl.acm.org/doi/pdf/10.1145/3342195.3387555) 48 | - Shubham Chaudhary, Ramachandran Ramjee, Muthian Sivathanu, Nipun Kwatra, Srinidhi Viswanatha (Microsoft Research India) 49 | - Experiences of Landing Machine Learning onto Market-Scale Mobile Malware Detection [[Paper]](https://dl.acm.org/doi/pdf/10.1145/3342195.3387530) 50 | - Liangyi Gong, Zhenhua Li (Tsinghua University), Feng Qian (University of Minnesota, Twin Cities), Zifan Zhang (Tsinghua University & Tencent Co. LTD), Qi Alfred Chen, Zhiyun Qian (University of California, Riverside), Hao Lin (Tsinghua University), Yunhao Liu (Tsinghua University & Michigan State University) 51 | - Autopilot: workload autoscaling at Google [[Paper]](https://dl.acm.org/doi/pdf/10.1145/3342195.3387524?download=false) 52 | - Krzysztof Rzadca (Google, University of Warsaw), Pawel Findeisen, Jacek Swiderski, Przemyslaw Zych, Przemyslaw Broniek, Jarek Kusmierek, Pawel Nowak, Beata Strack, Piotr Witusowski, Steven Hand, John Wilkes (Google) 53 | - Borg: the Next Generation [[Paper]](https://dl.acm.org/doi/pdf/10.1145/3342195.3387517) 54 | - Muhammad Tirmazi (Harvard University), Adam Barker (Google and University of St Andrews), Nan Deng, Md Ehtesam Haque, Zhijing Gene Qin, Steven Hand (Google), Mor Harchol-Balter (Carnegie Mellon University), John Wilkes (Google) 55 | - AlloX: Compute Allocation in Hybrid Clusters [[Paper]](https://www.mosharaf.com/wp-content/uploads/allox-eurosys20.pdf) 56 | - Tan N. Le (SUNY Korea, Stony Brook University), Xiao Sun (Stony Brook University), Mosharaf Chowdhury (University of Michigan), Zhenhua Liu (Stony Brook University) 57 | - Env2Vec: Accelerating VNF Testing with Deep Learning [[Paper]](https://dl.acm.org/doi/abs/10.1145/3342195.3387525) 58 | - Guangyuan Piao, Patrick K. 
Nicholson, Diego Lugones (Nokia Bell Labs) 59 | 60 | 61 | ## Year 2019 62 | 63 | ### MLSys 2019 [[Proceedings]](https://proceedings.mlsys.org/book/2019) 64 | 65 | ### ATC 2019 66 | 67 | - The Design and Operation of CloudLab [[Paper]](https://www.usenix.org/system/files/atc19-duplyakin_0.pdf) 68 | - Dmitry Duplyakin, Robert Ricci, Aleksander Maricq, Gary Wong, Jonathon Duerig, Eric Eide, Leigh Stoller, Mike Hibler, David Johnson, and Kirk Webb, University of Utah; Aditya Akella, University of Wisconsin—Madison; Kuangching Wang, Clemson University; Glenn Ricart, US Ignite; Larry Landweber, University of Wisconsin—Madison; Chip Elliott, Raytheon; Michael Zink and Emmanuel Cecchet, University of Massachusetts Amherst; Snigdhaswin Kar and Prabodh Mishra, Clemson University 69 | - NeuGraph: Parallel Deep Neural Network Computation on Large Graphs [[Paper]](https://www.usenix.org/system/files/atc19-ma_0.pdf) 70 | - Lingxiao Ma and Zhi Yang, Peking University; Youshan Miao, Jilong Xue, Ming Wu, and Lidong Zhou, Microsoft Research; Yafei Dai, Peking University 71 | - Optimizing CNN Model Inference on CPUs [[Paper]](https://www.usenix.org/system/files/atc19-liu-yizhi.pdf) 72 | - Yizhi Liu, Yao Wang, Ruofei Yu, Mu Li, Vin Sharma, and Yida Wang, Amazon 73 | - Accelerating Rule-matching Systems with Learned Rankers [[Paper]](https://www.usenix.org/system/files/atc19-li-zhao-lucas.pdf) 74 | - Zhao Lucis Li, University of Science and Technology China; Chieh-Jan Mike Liang and Wei Bai, Microsoft Research; Qiming Zheng, Shanghai Jiao Tong University; Yongqiang Xiong, Microsoft Research; Guangzhong Sun, University of Science and Technology China 75 | - MArk: Exploiting Cloud Services for Cost-Effective, SLO-Aware Machine Learning Inference Serving [[Paper]](https://www.usenix.org/system/files/atc19-zhang-chengliang.pdf) 76 | - Chengliang Zhang, Minchen Yu, and Wei Wang, Hong Kong University of Science and Technology; Feng Yan, University of Nevada, Reno 77 | - Cross-dataset Time Series Anomaly Detection for Cloud Systems [[Paper]](https://www.usenix.org/system/files/atc19-zhang-xu.pdf) 78 | - Xu Zhang, Microsoft Research, Nanjing University; Qingwei Lin, Yong Xu, and Si Qin, Microsoft Research; Hongyu Zhang, The University of Newcastle; Bo Qiao, Microsoft Research; Yingnong Dang, Xinsheng Yang, Qian Cheng, Murali Chintalapati, Youjiang Wu, and Ken Hsieh, Microsoft; Kaixin Sui, Xin Meng, Yaohai Xu, and Wenchi Zhang, Microsoft Research; Furao Shen, Nanjing University; Dongmei Zhang, Microsoft Research 79 | 80 | 81 | 82 | -------------------------------------------------------------------------------- /data_processing.md: -------------------------------------------------------------------------------- 1 | # Data Processing 2 | 3 | Data processing and feature extraction play a key role in machine learning pipeline. 4 | 5 | - Kedro is a workflow development tool that helps you build data pipelines that are robust, scalable, deployable, reproducible and versioned. [[GitHub]](https://github.com/quantumblacklabs/kedro) 6 | - Google/jax: Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more [[GitHub]](https://github.com/google/jax) 7 | - CuPy: NumPy-like API accelerated with CUDA [[GitHub]](https://github.com/cupy/cupy) 8 | - Modin: Speed up your Pandas workflows by changing a single line of code [[GitHub]](https://github.com/modin-project/modin) 9 | - Weld: Weld is a runtime for improving the performance of data-intensive applications. 
[[Project Website]](https://www.weld.rs/) 10 | - Halide: A Language and Compiler for Optimizing Parallelism, Locality, and Recomputation in Image Processing Pipelines [[Project Website]](http://halide-lang.org/) 11 | - Jonathan Ragan-Kelley, Connelly Barnes, Andrew Adams, Sylvain Paris, Frédo Durand, Saman Amarasinghe. (*PLDI 2013*) 12 | - Summary: Halide is a programming language designed to make it easier to write high-performance image and array processing code on modern machines. 13 | - a-mma/AquilaDB: Resilient, Replicated, Decentralized, Host neutral vector database to store Feature Vectors along with JSON Metadata. Do similarity search from anywhere, even from the darkest rifts of Aquila. Production ready solution for Machine Learning engineers and Data scientists. [[GitHub]](https://github.com/a-mma/AquilaDB) 14 | - ShannonAI/service-streamer: Boosting your Web Services of Deep Learning Applications. [[GitHub]](https://github.com/ShannonAI/service-streamer) 15 | -------------------------------------------------------------------------------- /drl_system.md: -------------------------------------------------------------------------------- 1 | # Deep Reinforcement Learning System 2 | 3 | For now, this category only contains system for drl papers and projects. 4 | 5 | - Mao, Hongzi, et al. "Park: An Open Platform for Learning-Augmented Computer Systems." Advances in Neural Information Processing Systems. 2019. 6 | - Summary: This work builds a platform to introduce DRL to computer system optimizaton. It provides a lot of APIs so researcher can focus on developing algorithm rather spend a lot of time on writing system engineering codes. 7 | - Ray: A Distributed Framework for Emerging {AI} Applications [[GitHub]](https://www.usenix.org/conference/osdi18/presentation/moritz) 8 | - Moritz, Philipp, et al. (*OSDI 2018*) 9 | - Summary: Distributed DRL training, simulation and inference system. Can be used as a high-performance python framework. 10 | - Elf: An extensive, lightweight and flexible research platform for real-time strategy games [[Paper]](https://papers.nips.cc/paper/6859-elf-an-extensive-lightweight-and-flexible-research-platform-for-real-time-strategy-games.pdf) [[GitHub]](https://github.com/facebookresearch/ELF) 11 | - Tian, Yuandong, Qucheng Gong, Wenling Shang, Yuxin Wu, and C. Lawrence Zitnick. (*NIPS 2017*) 12 | - Summary: 13 | - Horizon: Facebook's Open Source Applied Reinforcement Learning Platform [[Paper]](https://arxiv.org/pdf/1811.00260) [[GitHub]](https://github.com/facebookresearch/Horizon) 14 | - Gauci, Jason, et al. (*preprint 2019*) 15 | - RLgraph: Modular Computation Graphs for Deep Reinforcement Learning [[Paper]](http://www.sysml.cc/doc/2019/43.pdf)[[GitHub]](https://github.com/rlgraph/rlgraph) 16 | - Schaarschmidt, Michael, Sven Mika, Kai Fricke, and Eiko Yoneki. (*SysML 2019*) 17 | - Summary: 18 | - Stable-Baselines: Stable-Baselines3: Reliable Reinforcement Learning Implementations 2021 19 | - [[Paper]](https://www.jmlr.org/papers/volume22/20-1364/20-1364.pdf) [[GitHub]](https://github.com/hill-a/stable-baselines) 20 | -------------------------------------------------------------------------------- /edge_system.md: -------------------------------------------------------------------------------- 1 | # Edge or Mobile Papers 2 | 3 | This part contains papers of projects of edge or mobile system for ML. 
4 | 5 | ## Project 6 | 7 | - deepC is a vendor independent deep learning library, compiler and inference framework designed for small form-factor devices including μControllers, IoT and Edge devices[[GitHub]](https://github.com/ai-techsystems/deepC) 8 | - Tengine, developed by OPEN AI LAB, is an AI application development platform for AIoT scenarios launched by OPEN AI LAB, which is dedicated to solving the fragmentation problem of aiot industrial chain and accelerating the landing of AI industrialization. [[GitHub]](https://github.com/OAID/Tengine) 9 | - Mobile Computer Vision @ Facebook [[GitHub]](https://github.com/facebookresearch/mobile-vision) 10 | - alibaba/MNN: MNN is a lightweight deep neural network inference engine. It loads models and do inference on devices. [[GitHub]](https://github.com/alibaba/MNN) 11 | - XiaoMi/mobile-ai-bench: Benchmarking Neural Network Inference on Mobile Devices [[GitHub]](https://github.com/XiaoMi/mobile-ai-bench) 12 | - XiaoMi/mace-models: Mobile AI Compute Engine Model Zoo [[GitHub]](https://github.com/XiaoMi/mace-models) 13 | - Tencent/nccn: ncnn is a high-performance neural network inference computing framework optimized for mobile platforms. [[Github]](https://github.com/Tencent/ncnn) 14 | - Tencent/TNN: [[Github]](https://github.com/Tencent/TNN) 15 | 16 | ## Survey 17 | 18 | - Convergence of edge computing and deep learning: A comprehensive survey. [[Paper]](https://arxiv.org/pdf/1907.08349) 19 | - Wang, X., Han, Y., Leung, V. C., Niyato, D., Yan, X., & Chen, X. (2020). 20 | - IEEE Communications Surveys & Tutorials, 22(2), 869-904. 21 | - Deep learning with edge computing: A review. [[Paper]](https://www.cs.ucr.edu/~jiasi/pub/deep_edge_review.pdf) 22 | - Chen, J., & Ran, X. 23 | - Proceedings of the IEEE, 107(8), 1655-1674.(2019). 24 | - Edge Intelligence: Paving the Last Mile of Artificial Intelligence with Edge Computing. [[Paper]](https://arxiv.org/pdf/1905.10083.pdf) 25 | - Zhou, Z., Chen, X., Li, E., Zeng, L., Luo, K., & Zhang, J. 26 | - arXiv: Distributed, Parallel, and Cluster Computing. (2019). 27 | - Machine Learning at Facebook: Understanding Inference at the Edge. [[Paper]](https://ieeexplore.ieee.org/abstract/document/8675201) 28 | - Wu, C., Brooks, D., Chen, K., Chen, D., Choudhury, S., Dukhan, M., ... & Zhang, P. 29 | - high-performance computer architecture.(2019). 30 | 31 | ## Edge AI Paper 32 | 33 | - Modeling of Deep Neural Network (DNN) Placement and Inference in Edge Computing. [[GitHub]](https://arxiv.org/pdf/2001.06901.pdf) 34 | - Bensalem, M., Dizdarević, J. and Jukan, A., 2020. 35 | - arXiv preprint arXiv:2001.06901. 36 | - Latency and Throughput Characterization of Convolutional Neural Networks for Mobile Computer Vision [[Paper]](https://arxiv.org/pdf/1803.09492.pdf) 37 | - Hanhirova, J., Kämäräinen, T., Seppälä, S., Siekkinen, M., Hirvisalo, V. and Ylä-Jääski 38 | - In Proceedings of the 9th ACM Multimedia Systems Conference (pp. 204-215). 39 | - Characterizing the Deep Neural Networks Inference Performance of Mobile Applications. [[Paper]](https://arxiv.org/pdf/1909.04783.pdf) 40 | - Ogden, S.S. and Guo, T., 2019. 41 | - arXiv preprint arXiv:1909.04783. 42 | - Neurosurgeon: Collaborative intelligence between the cloud and mobile edge. [[Paper]](http://web.eecs.umich.edu/~jahausw/publications/kang2017neurosurgeon.pdf) 43 | - Kang, Y., Hauswald, J., Gao, C., Rovinski, A., Mudge, T., Mars, J. and Tang, L., 2017, April. 44 | - In ACM SIGARCH Computer Architecture News (Vol. 45, No. 1, pp. 615-629). ACM. 
45 | - 26ms Inference Time for ResNet-50: Towards Real-Time Execution of all DNNs on Smartphone [[Paper]](https://arxiv.org/pdf/1905.00571.pdf) 46 | - Wei Niu, Xiaolong Ma, Yanzhi Wang, Bin Ren (*ICML2019*) 47 | - NestDNN: Resource-Aware Multi-Tenant On-Device Deep Learning for Continuous Mobile Vision [[Paper]](https://dl.acm.org/doi/pdf/10.1145/3241539.3241559) 48 | - Fang, Biyi, Xiao Zeng, and Mi Zhang. (*MobiCom 2018*) 49 | - Summary: Borrow some ideas from network prune. The pruned model then recovers to trade-off computation resource and accuracy at runtime 50 | - Lavea: Latency-aware video analytics on edge computing platform [[Paper]](http://www.cs.wayne.edu/~weisong/papers/yi17-LAVEA.pdf) 51 | - Yi, Shanhe, et al. (*Second ACM/IEEE Symposium on Edge Computing. ACM, 2017.*) 52 | - Scaling Video Analytics on Constrained Edge Nodes [[Paper]](http://www.sysml.cc/doc/2019/197.pdf) [[GitHub]](https://github.com/viscloud/filterforward) 53 | - Canel, C., Kim, T., Zhou, G., Li, C., Lim, H., Andersen, D. G., Kaminsky, M., and Dulloo (*SysML 2019*) 54 | - Big/little deep neural network for ultra low power inference. 55 | - Park, E., Kim, D. Y., Kim, S., Kim, Y. M., Kim, G., Yoon, S., & Yoo, S. 56 | - international conference on hardware/software codesign and system synthesis.(2015) 57 | - Collaborative learning between cloud and end devices: an empirical study on location prediction. [[Paper]](https://www.microsoft.com/en-us/research/uploads/prod/2019/08/sec19colla.pdf) 58 | - Lu, Y., Shu, Y., Tan, X., Liu, Y., Zhou, M., Chen, Q., & Pei, D. 59 | - ACM/IEEE Symposium on Edge Computing(2019) 60 | - Context-Aware Convolutional Neural Network over Distributed System in Collaborative Computing. [[Paper]](https://dl.acm.org/doi/10.1145/3316781.3317792) 61 | - Choi, J., Hakimi, Z., Shin, P. W., Sampson, J., & Narayanan, V. (2019). 62 | - design automation conference. 63 | - OpenEI: An Open Framework for Edge Intelligence. [[Paper]](https://arxiv.org/pdf/1906.01864.pdf) 64 | - Zhang, X., Wang, Y., Lu, S., Liu, L., Xu, L., & Shi, W. 65 | - international conference on distributed computing systems.(2019). 66 | - Swing: Swarm Computing for Mobile Sensing.[[Paper]](http://people.duke.edu/~bcl15/documents/fan18-icdcs.pdf) 67 | - Fan, S., Salonidis, T., & Lee, B. C. 68 | - international conference on distributed computing systems(2018). 69 | - Bottlenet++: An end-to-end approach for feature compression in device-edge co-inference systems. [[Paper]](https://arxiv.org/pdf/1910.14315.pdf) 70 | - Shao, J., & Zhang, J. 71 | - In 2020 IEEE International Conference on Communications Workshops (ICC Workshops) (pp. 1-6). IEEE.(2020, June). 72 | - JointDNN: an efficient training and inference engine for intelligent mobile cloud computing services. [[Paper]](https://arxiv.org/pdf/1801.08618.pdf) 73 | - Eshratifar, A. E., Abrishami, M. S., & Pedram, M. 74 | - IEEE Transactions on Mobile Computing.(2019). 75 | - TeamNet: A Collaborative Inference Framework on the Edge. 76 | - Fang, Y., Jin, Z., & Zheng, R. 77 | - In 2019 IEEE 39th International Conference on Distributed Computing Systems (ICDCS) (pp. 1487-1496). IEEE. (2019, July). 78 | - Distributing deep neural networks with containerized partitions at the edge. [[Paper]](https://www.usenix.org/system/files/hotedge19-paper-zhou.pdf) 79 | - Zhou, L., Wen, H., Teodorescu, R., & Du, D. H. (2019). 80 | - In 2nd {USENIX} Workshop on Hot Topics in Edge Computing (HotEdge 19). 
81 | - Distributed Machine Learning through Heterogeneous Edge Systems.[[Paper]](https://i2.cs.hku.hk/~cwu/papers/hphu-aaai20.pdf) 82 | - Hu, H., Wang, D., & Wu, C. (2020). 83 | - In AAAI (pp. 7179-7186). 84 | - Dynamic adaptive DNN surgery for inference acceleration on the edge. [[Paper]](https://ieeexplore.ieee.org/abstract/document/8737614/) 85 | - Hu, C., Bao, W., Wang, D., & Liu, F. (2019, April). 86 | - In IEEE INFOCOM 2019-IEEE Conference on Computer Communications (pp. 1423-1431). IEEE. 87 | - Collaborative execution of deep neural networks on internet of things devices. [[Paper]](https://arxiv.org/pdf/1901.02537) 88 | - Hadidi, R., Cao, J., Ryoo, M. S., & Kim, H. 89 | - arXiv preprint arXiv:1901.02537.(2019). 90 | - DeepThings: Distributed Adaptive Deep Learning Inference on Resource-Constrained IoT Edge Clusters. [[Paper]](https://ieeexplore.ieee.org/document/8493499) 91 | - Zhao, Z., Barijough, K. M., & Gerstlauer, A. 92 | - IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 37(11), 2348-2359.(2018). 93 | 94 | ## Fog AI Paper 95 | 96 | - Fogflow: Easy programming of iot services over cloud and edges for smart cities. [[Paper]](https://ieeexplore.ieee.org/document/8022859) [[GitHub]](https://github.com/smartfog/fogflow) 97 | - Cheng, Bin, Gürkan Solmaz, Flavio Cirillo, Ernö Kovacs, Kazuyuki Terasawa, and Atsushi Kitazawa. 98 | - IEEE Internet of Things Journal 5, no. 2 (2017): 696-707. 99 | 100 | 101 | 102 | 103 | 104 | -------------------------------------------------------------------------------- /federated_learning_system.md: -------------------------------------------------------------------------------- 1 | # Federated Learning System 2 | 3 | ## Papers 4 | 5 | - Towards Federated Learning at Scale: System Design [[Paper]](https://arxiv.org/abs/1902.01046) [MLSys'19] 6 | 7 | - BatchCrypt: Efficient Homomorphic Encryption for Cross-Silo Federated Learning [[Paper]](https://www.usenix.org/system/files/atc20-zhang-chengliang.pdf) [[Github]](https://github.com/marcoszh/BatchCrypt) [ATC'20] 8 | 9 | ## Projects 10 | 11 | - FATE @ Webank [[Github]](https://github.com/FederatedAI/FATE) 12 | - Tensorflow Federated @ Google [[Github]](https://github.com/tensorflow/federated) 13 | - PySyft @ OpenMined [[Github]](https://github.com/OpenMined/PySyft) 14 | - A Generic Framework for Privacy Preserving Peep Pearning [[Paper]](https://arxiv.org/abs/1811.04017) 15 | - PaddleFL @ Baidu [[Github]](https://github.com/PaddlePaddle/PaddleFL) 16 | - Nvidia Clara SDK [[Web]](https://developer.nvidia.com/clara) 17 | - Flower [[Github]](https://github.com/adap/flower) [[Website]](https://flower.dev/) [[Paper]](https://arxiv.org/abs/2007.14390) 18 | - A unified approach to federated learning, analytics, and evaluation. Federate any workload, any ML framework, and any programming language. 19 | -------------------------------------------------------------------------------- /imgs/AI_system.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/HuaizhengZhang/AI-Infra-from-Zero-to-Hero/e956db3a372fb520d5f0eeec18176fdd81eacd4d/imgs/AI_system.png -------------------------------------------------------------------------------- /inference.md: -------------------------------------------------------------------------------- 1 | # Inference System 2 | 3 | System for machine learning inference. 4 | 5 | ## Benchmark 6 | - Wanling Gao, Fei Tang, Jianfeng Zhan, et al. 
"AIBench: A Datacenter AI Benchmark Suite, BenchCouncil". [[Paper]](https://arxiv.org/pdf/2005.03459.pdf) [[Website]](https://www.benchcouncil.org/AIBench/index.html) 7 | - BaiduBench: Benchmarking Deep Learning operations on different hardware. [[Github]](https://github.com/baidu-research/DeepBench#inference-benchmark) 8 | - Reddi, Vijay Janapa, et al. "Mlperf inference benchmark." arXiv preprint arXiv:1911.02549 (2019). [[Paper]](https://arxiv.org/pdf/1911.02549.pdf) [[GitHub]](https://github.com/mlperf/inference) 9 | - Bianco, Simone, et al. "Benchmark analysis of representative deep neural network architectures." IEEE Access 6 (2018): 64270-64277. [[Paper]](https://arxiv.org/abs/1810.00736) 10 | - Almeida, Mario, et al. "EmBench: Quantifying Performance Variations of Deep Neural Networks across Modern Commodity Devices." The 3rd International Workshop on Deep Learning for Mobile Systems and Applications. 2019. [[Paper]](https://arxiv.org/pdf/1905.07346.pdf) 11 | 12 | ## Model Management 13 | 14 | - Model Card Toolkit. The Model Card Toolkit (MCT) streamlines and automates generation of Model Cards [1], machine learning documents that provide context and transparency into a model's development and performance. [[Paper]](https://arxiv.org/pdf/1810.03993.pdf) [[GitHub]](https://github.com/tensorflow/model-card-toolkit) 15 | - DLHub: Model and data serving for science. [[Paper]](https://arxiv.org/pdf/1811.11213.pdf) 16 | - Chard, R., Li, Z., Chard, K., Ward, L., Babuji, Y., Woodard, A., Tuecke, S., Blaiszik, B., Franklin, M. and Foster, I., 2019, May. 17 | - In 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS) (pp. 283-292). IEEE. 18 | - Publishing and Serving Machine Learning Models with DLHub. [[Paper]](https://dl.acm.org/doi/10.1145/3332186.3332246) 19 | - TRAINS - Auto-Magical Experiment Manager & Version Control for AI [[GitHub]](https://github.com/allegroai/trains) 20 | - ModelDB: A system to manage ML models [[GitHub]](https://github.com/mitdbg/modeldb) [[MIT short paper]](https://mitdbg.github.io/modeldb/papers/hilda_modeldb.pdf) 21 | - iterative/dvc: Data & models versioning for ML projects, make them shareable and reproducible [[GitHub]](https://github.com/iterative/dvc) 22 | 23 | ## Model Serving 24 | 25 | - Announcing RedisAI 1.0: AI Serving Engine for Real-Time Applications [[Blog]](https://redislabs.com/blog/redisai-ai-serving-engine-for-real-time-applications/) 26 | - Cloudburst: Stateful Functions-as-a-Service. [\[Paper\]](https://arxiv.org/pdf/2001.04592.pdf) [\[GitHub\]](https://github.com/hydro-project/cloudburst) 27 | - Vikram Sreekanti, Chenggang Wu, Xiayue Charles Lin, Johann Schleier-Smith, Joseph E. Gonzalez, Joseph M. Hellerstein, Alexey Tumanov 28 | - VLDB 2020 29 | - A stateful FaaS platform. 30 | (1) feasibility of general-purpose stateful serverless computing. 31 | (2) Autoscaling via logical disaggregation of storage and compute, state management via physical 32 | colocation of caches with compute services. 33 | (3) LDPC design pattern 34 | - Optimizing Prediction Serving on Low-Latency Serverless Dataflow [[Paper]](https://arxiv.org/pdf/2007.05832.pdf) 35 | - Sreekanti, Vikram, Harikaran Subbaraj, Chenggang Wu, Joseph E. Gonzalez, and Joseph M. Hellerstein. 36 | - arXiv preprint arXiv:2007.05832 (2020). 37 | - Serving DNNs like Clockwork: Performance Predictability from the Bottom Up. [[Paper]](https://arxiv.org/pdf/2006.02464.pdf) 38 | - Gujarati, A., Karimi, R., Alzayat, S., Kaufmann, A., Vigfusson, Y. and Mace, J., 2020. 
39 | - OSDI 2020 40 | - Swayam: distributed autoscaling to meet SLAs of machine learning inference services with resource efficiency [[Paper]](https://www.microsoft.com/en-us/research/uploads/prod/2018/01/2017.Middleware.Swayam.TailLatencyInAzureML.pdf) 41 | - Gujarati, Arpan, Sameh Elnikety, Yuxiong He, Kathryn S. McKinley, and Björn B. Brandenburg. 42 | - In Proceedings of the 18th ACM/IFIP/USENIX Middleware Conference, pp. 109-120. 2017. 43 | - Summary: a cloud autoscaler. (1) model-based autoscaling that takes into account SLAs and ML inference workload characteristics, (2) a distributed protocol that uses partial load information and prediction at frontends to provi- sion new service instances, and (3) a backend self-decommissioning protocol for service instances 44 | - Swift machine learning model serving scheduling: a region based reinforcement learning approach. [[Paper]](https://dl.acm.org/doi/10.1145/3295500.3356164) [[GitHub]](https://github.com/SC-RRL/RRL) 45 | - Qin, Heyang, Syed Zawad, Yanqi Zhou, Lei Yang, Dongfang Zhao, and Feng Yan. 46 | - In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-23. 2019. 47 | - Summary: The system performances under different similar con- figurations in a region can be accurately estimated by using the system performance under one of these configurations, due to their similarity. Region based DRL is designed for parallelism selection. 48 | - TorchServe is a flexible and easy to use tool for serving PyTorch models. [[GitHub]](https://github.com/pytorch/serve) 49 | - Seldon Core: Blazing Fast, Industry-Ready ML. An open source platform to deploy your machine learning models on Kubernetes at massive scale. [[GitHub]](https://github.com/SeldonIO/seldon-core) 50 | - MArk: Exploiting Cloud Services for Cost-Effective, SLO-Aware Machine Learning Inference Serving [[Paper]](https://www.usenix.org/system/files/atc19-zhang-chengliang.pdf) [[GitHub]](https://github.com/marcoszh/MArk-Project) 51 | - Zhang, C., Yu, M., Wang, W. and Yan, F., 2019. 52 | - In 2019 {USENIX} Annual Technical Conference ({USENIX}{ATC} 19) (pp. 1049-1062). 53 | - Summary: address the scalability and cost minimization issues for model serving on the public cloud. 54 | - Parity Models: Erasure-Coded Resilience for Prediction Serving Systems(SOSP2019) [[Paper]](http://www.cs.cmu.edu/~rvinayak/papers/sosp2019parity-models.pdf) [[GitHub]](https://github.com/Thesys-lab/parity-models) 55 | - Nexus: Nexus is a scalable and efficient serving system for DNN applications on GPU cluster (SOSP2019) [[Paper]](https://pdfs.semanticscholar.org/0c0f/353dbac84311ea4f1485d4a8ac0b0459be8c.pdf) [[GitHub]](https://github.com/uwsampl/nexus) 56 | - Deep Learning Inference Service at Microsoft [[Paper]](https://www.usenix.org/system/files/opml19papers-soifer.pdf) 57 | - J Soifer, et al. (*OptML2019*) 58 | - {PRETZEL}: Opening the Black Box of Machine Learning Prediction Serving Systems. [[Paper]](https://www.usenix.org/system/files/osdi18-lee.pdf) 59 | - Lee, Y., Scolari, A., Chun, B.G., Santambrogio, M.D., Weimer, M. and Interlandi, M., 2018. 
(*OSDI 2018*) 60 | - Brusta: PyTorch model serving project [[GitHub]](https://github.com/hyoungseok/brusta) 61 | - Model Server for Apache MXNet: Model Server for Apache MXNet is a tool for serving neural net models for inference [[GitHub]](https://github.com/awslabs/mxnet-model-server) 62 | - TFX: A TensorFlow-Based Production-Scale Machine Learning Platform [[Paper]](http://stevenwhang.com/tfx_paper.pdf) [[Website]](https://www.tensorflow.org/tfx) [[GitHub]](https://github.com/tensorflow/tfx) 63 | - Baylor, Denis, et al. (*KDD 2017*) 64 | - Tensorflow-serving: Flexible, high-performance ml serving [[Paper]](https://arxiv.org/pdf/1712.06139) [[GitHub]](https://github.com/tensorflow/serving) 65 | - Olston, Christopher, et al. 66 | - IntelAI/OpenVINO-model-server: Inference model server implementation with gRPC interface, compatible with TensorFlow serving API and OpenVINO™ as the execution backend. [[GitHub]](https://github.com/IntelAI/OpenVINO-model-server) 67 | - Clipper: A Low-Latency Online Prediction Serving System [[Paper]](https://www.usenix.org/system/files/conference/nsdi17/nsdi17-crankshaw.pdf) 68 | [[GitHub]](https://github.com/ucbrise/clipper) 69 | - Crankshaw, Daniel, et al. (*NSDI 2017*) 70 | - Summary: Adaptive batch 71 | - InferLine: ML Inference Pipeline Composition Framework [[Paper]](https://arxiv.org/pdf/1812.01776.pdf) [[GitHub]](https://github.com/simon-mo/inferline-models) 72 | - Crankshaw, Daniel, et al. (*SoCC 2020*) 73 | - Summary: update version of Clipper 74 | - TrIMS: Transparent and Isolated Model Sharing for Low Latency Deep LearningInference in Function as a Service Environments [[Paper]](https://arxiv.org/pdf/1811.09732.pdf) 75 | - Dakkak, Abdul, et al (*Preprint*) 76 | - Summary: model cold start problem 77 | - Rafiki: machine learning as an analytics service system [[Paper]](http://www.vldb.org/pvldb/vol12/p128-wang.pdf) [[GitHub]](https://github.com/nginyc/rafiki) 78 | - Wang, Wei, Jinyang Gao, Meihui Zhang, Sheng Wang, Gang Chen, Teck Khim Ng, Beng Chin Ooi, Jie Shao, and Moaz Reyad. 79 | - Summary: Contain both training and inference. Auto-Hype-Parameter search for training. Ensemble models for inference. Using DRL to balance trade-off between accuracy and latency. 80 | - GraphPipe: Machine Learning Model Deployment Made Simple [[GitHub]](https://github.com/oracle/graphpipe) 81 | - Orkhon: ML Inference Framework and Server Runtime [[GitHub]](https://github.com/vertexclique/orkhon) 82 | - NVIDIA/tensorrt-inference-server: The TensorRT Inference Server provides a cloud inferencing solution optimized for NVIDIA GPUs. [[GitHub]](https://github.com/NVIDIA/tensorrt-inference-server) [[Slides: DEEP INTO TRTIS]](https://on-demand.gputechconf.com/gtc-cn/2019/pdf/CN9506/presentation.pdf) 83 | - torchpipe: Ensemble Pipeline Serving with Pytorch Frontend. Boosting DL Service Throughput 1.5-4x by Ensemble Pipeline Serving with Concurrent CUDA Streams for PyTorch/LibTorch Frontend and TensorRT/CVCUDA, etc., Backends. [[GitHub]](https://github.com/torchpipe/torchpipe) 84 | - INFaaS: Automated Model-less Inference Serving [[GitHub]](https://github.com/stanford-mast/INFaaS), [[Paper]](https://www.usenix.org/conference/atc21/presentation/romero) 85 | - Francisco Romero, Qian Li, Neeraja J. Yadwadkar, and Christos Kozyrakis (*ATC 2021*) 86 | - Llama: A Heterogeneous & Serverless Framework for Auto-Tuning Video Analytics Pipelines 87 | - Francisco Romero, Mark Zhao, Neeraja J. 
71 | - InferLine: ML Inference Pipeline Composition Framework [[Paper]](https://arxiv.org/pdf/1812.01776.pdf) [[GitHub]](https://github.com/simon-mo/inferline-models)
72 |   - Crankshaw, Daniel, et al. (*SoCC 2020*)
73 |   - Summary: a follow-up to Clipper that provisions and scales whole inference pipelines.
74 | - TrIMS: Transparent and Isolated Model Sharing for Low Latency Deep Learning Inference in Function as a Service Environments [[Paper]](https://arxiv.org/pdf/1811.09732.pdf)
75 |   - Dakkak, Abdul, et al. (*Preprint*)
76 |   - Summary: tackles the model cold-start problem.
77 | - Rafiki: machine learning as an analytics service system [[Paper]](http://www.vldb.org/pvldb/vol12/p128-wang.pdf) [[GitHub]](https://github.com/nginyc/rafiki)
78 |   - Wang, Wei, Jinyang Gao, Meihui Zhang, Sheng Wang, Gang Chen, Teck Khim Ng, Beng Chin Ooi, Jie Shao, and Moaz Reyad.
79 |   - Summary: covers both training and inference, with automatic hyper-parameter search for training and ensemble models for inference; uses DRL to balance the trade-off between accuracy and latency.
80 | - GraphPipe: Machine Learning Model Deployment Made Simple [[GitHub]](https://github.com/oracle/graphpipe)
81 | - Orkhon: ML Inference Framework and Server Runtime [[GitHub]](https://github.com/vertexclique/orkhon)
82 | - NVIDIA/tensorrt-inference-server: The TensorRT Inference Server provides a cloud inferencing solution optimized for NVIDIA GPUs. [[GitHub]](https://github.com/NVIDIA/tensorrt-inference-server) [[Slides: DEEP INTO TRTIS]](https://on-demand.gputechconf.com/gtc-cn/2019/pdf/CN9506/presentation.pdf)
83 | - torchpipe: Boosting DL Service Throughput 1.5-4x by Ensemble Pipeline Serving with Concurrent CUDA Streams for PyTorch/LibTorch Frontends and TensorRT/CVCUDA, etc., Backends. [[GitHub]](https://github.com/torchpipe/torchpipe)
84 | - INFaaS: Automated Model-less Inference Serving [[GitHub]](https://github.com/stanford-mast/INFaaS) [[Paper]](https://www.usenix.org/conference/atc21/presentation/romero)
85 |   - Francisco Romero, Qian Li, Neeraja J. Yadwadkar, and Christos Kozyrakis (*ATC 2021*)
86 | - Llama: A Heterogeneous & Serverless Framework for Auto-Tuning Video Analytics Pipelines
87 |   - Francisco Romero, Mark Zhao, Neeraja J. Yadwadkar, and Christos Kozyrakis (*SoCC 2021*)
88 | - Scrooge: A Cost-Effective Deep Learning Inference System [[Paper]](https://dl.acm.org/doi/pdf/10.1145/3472883.3486993)
89 |   - Yitao Hu, Rajrup Ghosh, Ramesh Govindan
90 | - Apache PredictionIO® is an open source Machine Learning Server built on top of a state-of-the-art open source stack for developers and data scientists to create predictive engines for any machine learning task [[Website]](http://predictionio.apache.org/)
91 |
92 | ## Cache for Inference
93 |
94 | - Kumar, Adarsh, et al. "Accelerating deep learning inference via freezing." 11th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud 19). 2019. [[Paper]](http://shivaram.org/publications/freeze-hotcloud19.pdf)
95 | - Xu, Mengwei, et al. "DeepCache: Principled cache for mobile deep vision." Proceedings of the 24th Annual International Conference on Mobile Computing and Networking. 2018. [[Paper]](https://arxiv.org/pdf/1712.01670.pdf)
96 | - Park, Keunyoung, and Doo-Hyun Kim. "Accelerating image classification using feature map similarity in convolutional neural networks." Applied Sciences 9.1 (2019): 108. [[Paper]](https://www.mdpi.com/2076-3417/9/1/108/htm)
97 | - Cavigelli, Lukas, and Luca Benini. "CBinfer: Exploiting frame-to-frame locality for faster convolutional network inference on video streams." IEEE Transactions on Circuits and Systems for Video Technology (2019). [[Paper]](https://arxiv.org/pdf/1808.05488)
98 |
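The common thread in these four papers is temporal and structural locality: if the input barely changes, reuse previous computation instead of recomputing. A toy frame-level version of that gate with NumPy (DeepCache and CBinfer apply the idea at region/layer granularity inside the network; the threshold is illustrative):

```python
import numpy as np

class FrameGate:
    """Skip full inference when the new frame is close to the last processed one."""

    def __init__(self, threshold=8.0):
        self.threshold = threshold      # mean absolute pixel difference
        self.last_frame = None
        self.last_result = None

    def infer(self, frame, model):
        if self.last_frame is not None:
            diff = np.abs(frame.astype(np.float32) - self.last_frame).mean()
            if diff < self.threshold:
                return self.last_result          # cache hit: reuse previous output
        self.last_frame = frame.astype(np.float32)
        self.last_result = model(frame)          # cache miss: run the model
        return self.last_result
```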
99 | ## Inference Optimization
100 |
101 | - Jointly Optimizing Preprocessing and Inference for DNN-based Visual Analytics [[Paper]](https://arxiv.org/pdf/2007.13005.pdf)
102 |   - Daniel Kang, Ankit Mathur, Teja Veeramacheneni, Peter Bailis, Matei Zaharia
103 |   - VLDB 2021
104 | - Willump: A Statistically-Aware End-to-end Optimizer for Machine Learning Inference. [[arxiv]](https://arxiv.org/pdf/1906.01974.pdf) [[GitHub]](https://github.com/stanford-futuredata/Willump)
105 |   - Peter Kraft, Daniel Kang, Deepak Narayanan, Shoumik Palkar, Peter Bailis, Matei Zaharia.
106 |   - arXiv Preprint. 2019.
107 | - TensorRT is a C++ library that facilitates high-performance inference on NVIDIA GPUs and deep learning accelerators. [[GitHub]](https://github.com/NVIDIA/TensorRT)
108 | - Dynamic Space-Time Scheduling for GPU Inference [[Paper]](http://learningsys.org/nips18/assets/papers/102CameraReadySubmissionGPU_Virtualization%20(8).pdf) [[GitHub]](https://github.com/ucbrise/caravel)
109 |   - Jain, Paras, et al. (*NIPS 18, System for ML*)
110 |   - Summary: optimizations for GPU multi-tenancy
111 | - Dynamic Scheduling For Dynamic Control Flow in Deep Learning Systems [[Paper]](http://www.cs.cmu.edu/~jinlianw/papers/dynamic_scheduling_nips18_sysml.pdf)
112 |   - Wei, Jinliang, Garth Gibson, Vijay Vasudevan, and Eric Xing. (*Ongoing work*)
113 | - Accelerating Deep Learning Workloads through Efficient Multi-Model Execution. [[Paper]](https://cs.stanford.edu/~matei/papers/2018/mlsys_hivemind.pdf)
114 |   - D. Narayanan, K. Santhanam, A. Phanishayee and M. Zaharia. (*NeurIPS Systems for ML Workshop 2018*)
115 |   - Summary: HiveMind takes as input models grouped into model batches that are amenable to co-optimization and co-execution, and consists of a compiler and a runtime.
116 | - DeepCPU: Serving RNN-based Deep Learning Models 10x Faster [[Paper]](https://www.usenix.org/system/files/conference/atc18/atc18-zhang-minjia.pdf)
117 |   - Minjia Zhang, Samyam Rajbhandari, Wenhan Wang, and Yuxiong He, Microsoft AI and Research (*ATC 2018*)
118 |
119 | ## Cluster Management for Inference (currently only multi-tenancy)
120 |
121 | - Ease.ml: Towards multi-tenant resource sharing for machine learning workloads [[Paper]](http://www.vldb.org/pvldb/vol11/p607-li.pdf) [[GitHub]](https://github.com/DS3Lab/easeml) [[Demo]](http://www.vldb.org/pvldb/vol11/p2054-karlas.pdf)
122 |   - Li, Tian, et al.
123 |   - Proceedings of the VLDB Endowment 11.5 (2018): 607-620.
124 | - Perseus: Characterizing Performance and Cost of Multi-Tenant Serving for CNN Models [[Paper]](https://arxiv.org/pdf/1912.02322.pdf)
125 |   - LeMay, Matthew, Shijian Li, and Tian Guo.
126 |   - arXiv preprint arXiv:1912.02322 (2019).
127 |
128 | ## Machine Learning Compiler
129 | - Hummingbird: Hummingbird is a library for compiling trained traditional ML models into tensor computations. Hummingbird allows users to seamlessly leverage neural network frameworks (such as PyTorch) to accelerate traditional ML models. [[GitHub]](https://github.com/microsoft/hummingbird)
130 | - TVM: An Automated End-to-End Optimizing Compiler for Deep Learning [[Paper]](https://www.usenix.org/system/files/osdi18-chen.pdf) [[YouTube]](https://www.youtube.com/watch?v=I1APhlSjVjs) [[Project Website]](https://tvm.ai/)
131 |   - Chen, Tianqi, et al. (*OSDI 2018*)
132 |   - Summary: the automated optimization is very impressive: a learned cost model (ranks candidates by the objective function) plus a schedule explorer (parallel simulated annealing).
133 | - Facebook TC: Tensor Comprehensions (TC) is a fully-functional C++ library to automatically synthesize high-performance machine learning kernels using Halide, ISL and NVRTC or LLVM. [[GitHub]](https://github.com/facebookresearch/TensorComprehensions)
134 | - Tensorflow/mlir: "Multi-Level Intermediate Representation" Compiler Infrastructure [[GitHub]](https://github.com/tensorflow/mlir) [[Video]](https://www.youtube.com/watch?v=qzljG6DKgic)
135 | - PyTorch/glow: Compiler for Neural Network hardware accelerators [[GitHub]](https://github.com/pytorch/glow)
136 | - TASO: Optimizing Deep Learning Computation with Automatic Generation of Graph Substitutions [[Paper]](https://cs.stanford.edu/~matei/papers/2019/sosp_taso.pdf) [[GitHub]](https://github.com/jiazhihao/TASO)
137 |   - Jia, Zhihao, Oded Padon, James Thomas, Todd Warszawski, Matei Zaharia, and Alex Aiken. (*SOSP 2019*)
138 |   - Evaluated against TVM and XLA
139 | - SGLang: manages the KV cache with RadixAttention (a simplified sketch of the prefix-cache idea follows this list) [[Paper]](https://arxiv.org/pdf/2312.07104.pdf) [[Github]](https://github.com/sgl-project/sglang)
140 |   - Lianmin Zheng, Liangsheng Yin, Zhiqiang Xie, Jeff Huang, Chuyue Sun, Cody Hao Yu, Shiyi Cao, Christos Kozyrakis, Ion Stoica, Joseph E. Gonzalez, Clark Barrett, Ying Sheng
141 |
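SGLang's RadixAttention keeps the KV caches of earlier requests in a radix tree keyed by token sequences, so a new request can reuse the KV cache of its longest previously-seen prefix. A much-simplified, per-token trie sketch of the lookup (illustrative only; the real system uses a compressed radix tree with eviction and GPU memory management):

```python
class PrefixCacheNode:
    def __init__(self):
        self.children = {}   # token id -> PrefixCacheNode
        self.kv = None       # placeholder for this token's KV-cache block

class PrefixCache:
    """Simplified per-token trie standing in for SGLang's compressed radix tree."""

    def __init__(self):
        self.root = PrefixCacheNode()

    def match_prefix(self, tokens):
        """Return how many leading tokens already have cached KV blocks."""
        node, matched = self.root, 0
        for t in tokens:
            if t in node.children and node.children[t].kv is not None:
                node = node.children[t]
                matched += 1
            else:
                break
        return matched

    def insert(self, tokens, kv_blocks):
        node = self.root
        for t, kv in zip(tokens, kv_blocks):
            node = node.children.setdefault(t, PrefixCacheNode())
            node.kv = kv
```

A request that shares, say, a long system prompt with earlier requests then only needs to prefill the unmatched suffix.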
--------------------------------------------------------------------------------
/infra.md:
--------------------------------------------------------------------------------
1 | # Machine Learning Infrastructure
2 |
3 | Frameworks, infra and useful toolkits (e.g., visualization) for training, inference or both. You can check [[AI infrastructures list]](https://github.com/1duo/awesome-ai-infrastructures) for more.
4 |
5 | - [Paper](#paper)
6 | - [Project with code](#ml-platform)
7 | - [GPU tech](#gpu-sharing)
8 | - [Tool](#useful-tools)
9 |
10 | ## Paper
11 |
12 | - The Case for Learning-and-System Co-design [[Paper]](https://dl.acm.org/citation.cfm?id=3352031)
13 |   - Mike Liang, C.J., Xue, H., Yang, M. and Zhou, L., 2019.
14 |   - ACM SIGOPS Operating Systems Review, 53(1), pp.68-74.
15 |   - Summary: makes the system learnable; proposes a framework named AutoSys that contains both a training plane and an inference plane.
16 |
17 | *These three papers are not only for ML but also for Big Data, and they are too good to be ignored.*
18 |
19 |
20 | - Large-scale cluster management at Google with Borg [[Paper]](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43438.pdf)
21 |   - Verma, Abhishek, et al.
22 |   - Proceedings of the Tenth European Conference on Computer Systems. 2015.
23 | - Apache Hadoop YARN: Yet another resource negotiator [[Paper]](https://www.cse.ust.hk/~weiwa/teaching/Fall15-COMP6611B/reading_list/YARN.pdf)
24 |   - Vavilapalli, Vinod Kumar, et al.
25 |   - Proceedings of the 4th annual Symposium on Cloud Computing. 2013.
26 | - Mesos: A platform for fine-grained resource sharing in the data center [[Paper]](https://people.eecs.berkeley.edu/~alig/papers/mesos.pdf)
27 |   - Hindman, Benjamin, et al.
28 |   - In NSDI, vol. 11, 2011.
29 |
30 |
31 | ## ML Platform
32 | - Submitit is a lightweight tool for submitting Python functions for computation within a Slurm cluster [[GitHub]](https://github.com/facebookincubator/submitit)
33 | - Jittor is a high-performance deep learning framework based on JIT compiling and meta-operators. [[GitHub]](https://github.com/Jittor/jittor)
34 | - MindSpore is a new open source deep learning training/inference framework that could be used for mobile, edge and cloud scenarios. [[GitHub]](https://github.com/mindspore-ai/mindspore)
35 | - MegEngine is a fast, scalable and easy-to-use numerical evaluation framework, with auto-differentiation. [[GitHub]](https://github.com/MegEngine/MegEngine)
36 | - cortexlabs/cortex: Deploy machine learning applications without worrying about setting up infrastructure, managing dependencies, or orchestrating data pipelines. [[GitHub]](https://github.com/cortexlabs/cortex)
37 | - Osquery is a SQL powered operating system instrumentation, monitoring, and analytics framework. [[Facebook Project]](https://osquery.io/)
38 | - Kubeflow: Kubeflow is a machine learning (ML) toolkit that is dedicated to making deployments of ML workflows on Kubernetes simple, portable, and scalable. [[GitHub]](https://github.com/kubeflow/pipelines)
39 | - Polyaxon: A platform for reproducible and scalable machine learning and deep learning on Kubernetes. [[GitHub]](https://github.com/polyaxon/polyaxon)
40 | - MLOps on Azure [[GitHub]](https://github.com/microsoft/MLOps)
41 | - Flambé: An ML framework to accelerate research and its path to production. [[GitHub]](https://github.com/Open-ASAPP/flambe)
42 | - Ludwig is a toolbox built on top of TensorFlow that allows users to train and test deep learning models without the need to write code. [[GitHub]](https://github.com/uber/ludwig)
43 | - intel-analytics/analytics-zoo: Distributed TensorFlow, Keras and BigDL on Apache Spark [[GitHub]](https://github.com/intel-analytics/analytics-zoo)
44 | - Machine Learning for .NET [[GitHub]](https://github.com/dotnet/machinelearning)
45 |   - ML.NET is a cross-platform open-source machine learning framework which makes machine learning accessible to .NET developers.
46 |   - ML.NET allows .NET developers to develop their own models and infuse custom machine learning into their applications, using .NET, even without prior expertise in developing or tuning machine learning models.
47 | - ONNX: Open Neural Network Exchange (a minimal export-and-run example follows this list) [[GitHub]](https://github.com/onnx/onnx)
48 | - ONNXRuntime: has an open architecture that is continually evolving to address the newest developments and challenges in AI and Deep Learning. ONNX Runtime stays up to date with the ONNX standard, supporting all ONNX releases with future compatibility and maintaining backwards compatibility with prior releases. [[GitHub]](https://github.com/microsoft/onnxruntime)
49 | - BentoML: Machine Learning Toolkit for packaging and deploying models [[GitHub]](https://github.com/bentoml/BentoML)
50 | - EuclidesDB: A multi-model machine learning feature embedding database [[GitHub]](https://github.com/perone/euclidesdb)
51 | - Prefect: Prefect is a new workflow management system, designed for modern infrastructure and powered by the open-source Prefect Core workflow engine. [[GitHub]](https://github.com/PrefectHQ/prefect)
52 | - MindsDB: MindsDB's goal is to make it very simple for developers to use the power of artificial neural networks in their projects [[GitHub]](https://github.com/mindsdb/mindsdb)
53 | - PAI: OpenPAI is an open source platform that provides complete AI model training and resource management capabilities. [[Microsoft Project]](https://github.com/Microsoft/pai#resources)
54 | - Bistro: Scheduling Data-Parallel Jobs Against Live Production Systems [[Facebook Project]](https://github.com/facebook/bistro)
55 | - GNES is Generic Neural Elastic Search, a cloud-native semantic search system based on deep neural networks. [[GitHub]](https://github.com/gnes-ai/gnes)
56 |
57 |
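A common path between the platforms above is the ONNX interchange format: export a model from one framework, then run it wherever ONNX Runtime does. A minimal sketch (the model and shapes are placeholders):

```python
import torch
import onnxruntime as ort

model = torch.nn.Linear(4, 2).eval()            # placeholder model
dummy = torch.randn(1, 4)

# Export the PyTorch model to the ONNX interchange format.
torch.onnx.export(model, dummy, "model.onnx",
                  input_names=["input"], output_names=["output"])

# Run the exported graph with ONNX Runtime.
sess = ort.InferenceSession("model.onnx")
(out,) = sess.run(None, {"input": dummy.numpy()})
print(out.shape)   # (1, 2)
```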
58 | ## GPU Sharing
59 |
60 | - Yu, P. and Chowdhury, M., 2019. Salus: Fine-Grained GPU Sharing Primitives for Deep Learning Applications. arXiv preprint arXiv:1902.04610. [[Paper]](https://arxiv.org/pdf/1902.04610.pdf) [[GitHub]](https://github.com/SymbioticLab/Salus)
61 | - gpushare-scheduler-extender [[GitHub]](https://github.com/HuaizhengZhang/gpushare-scheduler-extender)
62 |   - More and more data scientists run their NVIDIA GPU-based inference tasks on Kubernetes. Some of these tasks can be run on the same NVIDIA GPU device to increase GPU utilization, so one important challenge is how to share GPUs between pods.
63 |
64 |
65 | ## Useful Tools
66 |
67 | #### Profile
68 |
69 | - Performance issues analysis (Off-CPU) [[Website]](http://www.brendangregg.com/offcpuanalysis.html)
70 | - Collective Knowledge repository to automate MLPerf - a broad ML benchmark suite for measuring performance of ML software frameworks, ML hardware accelerators, and ML cloud platforms [[GitHub]](https://github.com/ctuning/ck-mlperf)
71 | - NetworKit is a growing open-source toolkit for large-scale network analysis. [[GitHub]](https://github.com/kit-parco/networkit)
72 | - gpu-sentry: Flask-based package for monitoring utilisation of NVIDIA GPUs. [[GitHub]](https://github.com/jacenkow/gpu-sentry)
73 | - anderskm/gputil: A Python module for getting the GPU status from NVIDIA GPUs using nvidia-smi programmatically in Python (a usage example follows this list) [[GitHub]](https://github.com/anderskm/gputil)
74 | - Pytorch-Memory-Utils: track your GPU memory usage during training with PyTorch. [[GitHub]](https://github.com/Oldpan/Pytorch-Memory-Utils)
75 | - torchstat: a lightweight neural network analyzer based on PyTorch. [[GitHub]](https://github.com/Swall0w/torchstat)
76 | - NVIDIA GPU Monitoring Tools [[GitHub]](https://github.com/NVIDIA/gpu-monitoring-tools)
77 | - PyTorch/cpuinfo: cpuinfo is a library to detect information about the host CPU that is essential for performance optimization. [[GitHub]](https://github.com/pytorch/cpuinfo)
78 | - Memory consumption and FLOP counts for popular networks [[GitHub]](https://github.com/albanie/convnet-burden)
79 | - Intel® VTune™ Amplifier [[Website]](https://software.intel.com/en-us/vtune)
80 |   - Stop guessing why software is slow. Advanced sampling and profiling techniques quickly analyze your code, isolate issues, and deliver insights for optimizing performance on modern processors.
81 | - Pyflame: A Ptracing Profiler For Python [[GitHub]](https://github.com/uber/pyflame)
82 |
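Most of these monitoring utilities take only a few lines to use. For example, with GPUtil (assumes `nvidia-smi` is available on the host):

```python
import GPUtil

# Print load and memory usage for every visible NVIDIA GPU.
for gpu in GPUtil.getGPUs():
    print(f"GPU {gpu.id} ({gpu.name}): "
          f"load={gpu.load * 100:.0f}%  "
          f"mem={gpu.memoryUsed:.0f}/{gpu.memoryTotal:.0f} MiB")
```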
83 | #### Others
84 | - Facebook AI Performance Evaluation Platform [[GitHub]](https://github.com/facebook/FAI-PEP)
85 | - Netron: Visualizer for deep learning and machine learning models [[GitHub]](https://github.com/lutzroeder/netron)
86 | - Facebook/FBGEMM: FBGEMM (Facebook GEneral Matrix Multiplication) is a low-precision, high-performance matrix-matrix multiplication and convolution library for server-side inference. [[GitHub]](https://github.com/pytorch/FBGEMM)
87 | - Dslabs: Distributed Systems Labs and Framework for the UW systems course [[GitHub]](https://github.com/emichael/dslabs)
88 | - Machine Learning Model Zoo [[Website]](https://modelzoo.co/)
89 | - Faiss: A library for efficient similarity search and clustering of dense vectors [[GitHub]](https://github.com/facebookresearch/faiss)
90 | - Microsoft/MMdnn: A comprehensive, cross-framework solution to convert, visualize and diagnose deep neural network models. [[GitHub]](https://github.com/Microsoft/MMdnn)
91 | - Example recipes for Kubernetes Network Policies that you can just copy-paste [[GitHub]](https://github.com/ahmetb/kubernetes-network-policy-recipes)
--------------------------------------------------------------------------------
/llm_serving.md:
--------------------------------------------------------------------------------
1 | # LLM Serving
2 |
3 | ## 2024
4 |
5 | ### OSDI
6 |
7 | - Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve [[paper]](https://www.usenix.org/conference/osdi24/presentation/agrawal)
8 | - ServerlessLLM: Low-Latency Serverless Inference for Large Language Models [[paper]](https://www.usenix.org/conference/osdi24/presentation/fu)
9 | - InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management [[paper]](https://www.usenix.org/conference/osdi24/presentation/lee)
10 | - Llumnix: Dynamic Scheduling for Large Language Model Serving [[paper]](https://www.usenix.org/conference/osdi24/presentation/sun-biao)
11 | - DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving (a schematic prefill/decode loop follows this list) [[paper]](https://www.usenix.org/conference/osdi24/presentation/zhong-yinmin)
12 | - dLoRA: Dynamically Orchestrating Requests and Adapters for LoRA LLM Serving [[paper]](https://www.usenix.org/conference/osdi24/presentation/wu-bingyang)
13 | - Parrot: Efficient Serving of LLM-based Applications with Semantic Variable [[paper]](https://www.usenix.org/conference/osdi24/technical-sessions)
14 | - USHER: Holistic Interference Avoidance for Resource Optimized ML Inference [[paper]](https://www.usenix.org/conference/osdi24/presentation/shubha)
15 | - Fairness in Serving Large Language Models [[paper]](https://www.usenix.org/conference/osdi24/presentation/sheng)
16 |
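Several of these systems (DistServe most explicitly) build on the fact that LLM inference has two phases with very different profiles: a compute-bound prefill over the whole prompt, then memory-bandwidth-bound decode steps that extend the KV cache one token at a time. A schematic loop (`model.prefill`/`model.decode` are hypothetical placeholders, not any specific system's API):

```python
def generate(model, prompt_tokens, max_new_tokens):
    # Prefill: one compute-bound pass over the whole prompt builds the
    # KV cache and produces the first generated token.
    kv_cache, token = model.prefill(prompt_tokens)
    output = [token]
    for _ in range(max_new_tokens - 1):
        # Decode: each step reads the entire KV cache but computes attention
        # for just one new token -- bound by memory bandwidth, not compute.
        kv_cache, token = model.decode(token, kv_cache)
        output.append(token)
        if token == model.eos_token_id:
            break
    return output
```

Disaggregating the two phases onto different machines (DistServe) or interleaving them carefully (Sarathi-Serve) is about keeping both regimes efficient at once.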
--------------------------------------------------------------------------------
/llm_training.md:
--------------------------------------------------------------------------------
1 | # LLM Training
2 |
3 | ## General
4 |
5 | - The Llama 3 Herd of Models [[paper]](https://arxiv.org/abs/2407.21783)
6 | - TorchScale - A Library for Transformers at (Any) Scale [[GitHub]](https://github.com/microsoft/torchscale)
7 | - DLRover: An Automatic Distributed Deep Learning System [[GitHub]](https://github.com/intelligent-machine-learning/dlrover)
8 |
9 |
10 | ## 2024
11 |
12 | - FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision [[paper]](https://tridao.me/blog/2024/flash3/)
13 | - MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs [[paper]](https://arxiv.org/abs/2402.15627)
14 | - ByteCheckpoint: A Unified Checkpointing System for LLM Development [[paper]](https://arxiv.org/abs/2407.20143)
15 |
--------------------------------------------------------------------------------
/note/Eurosys2020.md:
--------------------------------------------------------------------------------
1 |
2 |
--------------------------------------------------------------------------------
/note/Eurosys2021.md:
--------------------------------------------------------------------------------
1 | ## MLsys in Eurosys2021
2 |
3 | ### Main conference
4 |
5 | - Profiling Dataflow Systems on Multiple Abstraction Levels [[paper]](https://doi.org/10.1145/3447786.3456254) [[YouTube]](https://www.youtube.com/watch?v=s78qnixF-Ww&list=PLzDuHU-z7gNjuSbEYCFXZtWAl3nAdNF2f&index=17)
6 |   - Alexander Beischl (Technical University of Munich, Germany), Timo Kersten (Technical University of Munich, Germany), Maximilian Bandle (Technical University of Munich, Germany), Jana Giceva (Technical University of Munich, Germany), Thomas Neumann (Technical University of Munich).
7 |   - Before this paper, the existing solutions
8 |   - This paper wants
9 |   - So they propose
10 |   - Hard part
11 |   - What metrics do the authors focus on?
12 |
13 | - OFC: an opportunistic caching system for FaaS platforms [[Paper]](https://doi.org/10.1145/3447786.3456239) [[YouTube]](https://www.youtube.com/watch?v=wjo92xfIFLI&list=PLzDuHU-z7gNjuSbEYCFXZtWAl3nAdNF2f&index=15)
14 |   - Djob Mvondo (University Grenoble Alpes, France), Mathieu Bacou (TeleCom SudParis, France), Kevin Nguetchouang (ENSP, Cameroon), Lucien Ngale (ENSP, Cameroon), Stephane Pouget (ENS Lyon, France), Josiane Kouam (INRIA, France), Renaud Lachaize (University Grenoble Alpes, France), Jinho Hwang (Facebook, United States of America), Tim Wood (GWU, USA), Daniel Hagimont (University of Toulouse, France), Noel De Palma (Grenoble Alpes University, France), Batchakui Bernabé (ENSP, Cameroon), Alain Tchana (ENS Lyon, France).
15 |
16 | - SmartHarvest: Harvesting Idle CPUs Safely and Efficiently in the Cloud [[paper]](https://doi.org/10.1145/3447786.3456225) [[YouTube]](https://www.youtube.com/watch?v=9298p68G8f4&list=PLzDuHU-z7gNjuSbEYCFXZtWAl3nAdNF2f&index=18)
17 |   - Yawen Wang (Stanford University), Kapil Arya (Microsoft Research), Marios Kogias (Microsoft Research, Switzerland), Manohar Vanga (Nokia Bell Labs, Germany), Aditya Bhandari (Microsoft), Neeraja J. Yadwadkar (Stanford University, United States of America), Siddhartha Sen (Microsoft Research), Sameh Elnikety (Microsoft Research, United States of America), Christos Kozyrakis (Stanford University, United States of America), Ricardo Bianchini (Microsoft Research, United States of America).
18 |
19 | - Rubberband: Cloud-based Hyperparameter Tuning [[paper]](https://dl.acm.org/doi/pdf/10.1145/3447786.3456245) [[YouTube]](https://www.youtube.com/watch?v=w_04ks34jwk&list=PLzDuHU-z7gNjuSbEYCFXZtWAl3nAdNF2f&index=20)
20 |   - Richard Liaw (UC Berkeley), Ujval Misra (UC Berkeley), Lisa Dunlap (UC Berkeley), Joseph Gonzalez (UC Berkeley, United States of America), Ion Stoica (UC Berkeley, United States of America), Alexey Tumanov (Georgia Tech, United States of America), Kirthevasan Kandasamy (UC Berkeley), Romil Bhardwaj (UC Berkeley, United States of America).
21 |
22 | - Characterizing, Exploiting, and Detecting DMA Code Injection Vulnerabilities in the Presence of an IOMMU [[paper]](https://dl.acm.org/doi/10.1145/3447786.3456249)
23 |   - Alex Markuze (Technion), Shay Vargaftik (VMware Research), Gil Kupfer (Technion), Boris Pismenny (Technion), Nadav Amit (VMware Research), Adam Morrison (Tel Aviv University), Dan Tsafrir (Technion & VMware Research, Israel)
24 |
25 | - Take it to the Limit: Peak Prediction-driven Resource Overcommitment in Datacenters [[paper]](https://doi.org/10.1145/3447786.3456259) [[Simulator]](https://github.com/googleinterns/cluster-resource-forecast)
26 |   - Noman Bashir (University of Massachusetts Amherst, United States of America), Nan Deng (Google LLC, United States of America), Krzysztof Rzadca (Google LLC and University of Warsaw, Poland), David Irwin (University of Massachusetts, Amherst), Sree Kodak (Google LLC), Rohit Jnagal (Google LLC).
27 |
28 | - Accelerating Graph Sampling for Graph Machine Learning using GPUs [[paper]](https://dl.acm.org/doi/10.1145/3447786.3456244)
29 |   - Abhinav Jangda (University of Massachusetts Amherst, United States of America), Sandeep Polisetty (University of Massachusetts Amherst), Arjun Guha (Northeastern University, United States of America), Marco Serafini (University of Massachusetts Amherst, United States of America).
30 |
31 | - Seastar: Vertex-Centric Programming for Graph Neural Networks [[Paper]](https://doi.org/10.1145/3447786.3456247)
32 |   - Yidi Wu (The Chinese University of Hong Kong), Kaihao Ma (The Chinese University of Hong Kong), Zhenkun Cai (The Chinese University of Hong Kong), Tatiana Jin (The Chinese University of Hong Kong), Boyang Li (The Chinese University of Hong Kong, China), Chenguang Zheng (The Chinese University of Hong Kong, China), James Cheng (The Chinese University of Hong Kong), Fan Yu (Huawei Technologies Co. Ltd).
33 |
34 | - DGCL: An Efficient Communication Library for Distributed GNN Training [[paper]](https://doi.org/10.1145/3447786.3456233)
35 |   - Zhenkun Cai (The Chinese University of Hong Kong), Xiao Yan (Southern University of Science and Technology), Yidi Wu (The Chinese University of Hong Kong), Kaihao Ma (The Chinese University of Hong Kong), James Cheng (The Chinese University of Hong Kong), Fan Yu (Huawei Technologies Co. Ltd).
36 |
37 | - FlexGraph: A flexible and efficient distributed framework for GNN training [[paper]](https://doi.org/10.1145/3447786.3456229)
38 |   - Lei Wang (Alibaba Group, China), Qiang Yin (Alibaba Group, China), Chao Tian (Alibaba Group), Jianbang Yang (Shanghai Jiao Tong University), Rong Chen (Shanghai Jiao Tong University, China), Wenyuan Yu (Alibaba Group, China), Zihang Yao (Shanghai Jiao Tong University), Jingren Zhou (Alibaba Group).
39 |
40 | - Tahoe: Tree Structure-Aware High Performance Inference Engine for Decision Tree Ensemble on GPU [[paper]](https://doi.org/10.1145/3447786.3456251)
41 |   - Zhen Xie (University of California, Merced), Wenqian Dong (University of California, Merced), Jiawen Liu (University of California, Merced), Hang Liu (Stevens Institute of Technology, United States of America), Dong Li (University of California, Merced).
42 |
43 | - Achieving Low Tail-latency and High Scalability for Serializable Transactions in Edge Computing [[paper]](https://doi.org/10.1145/3447786.3456238)
44 |   - Xusheng Chen (University of Hong Kong, China), Haoze Song (University of Hong Kong), Jianyu Jiang (The University of Hong Kong, China), Chaoyi Ruan (USTC), Cheng Li (USTC, China), Sen Wang (Huawei Technologies), Nicholas Zhang (Huawei Technologies, China), Reynold Cheng (University of Hong Kong), Heming Cui (University of Hong Kong, China).
45 |
46 |
47 | ### EuroMLSys [[program]](https://www.euromlsys.eu/#schedule)
48 |
49 | - Interference-Aware Scheduling for Inference Serving [[paper]](https://dl.acm.org/doi/pdf/10.1145/3437984.3458837)
50 | - Predicting CPU usage for proactive autoscaling
51 |   - LSTM-based CPU usage prediction
52 | - AutoAblation: Automated Parallel Ablation Studies for Deep Learning
53 | - Queen Jane Approximately: Enabling Efficient Neural Network Inference with Context-Adaptivity
54 |   - early-exit model (see the sketch after this list)
55 | - Fast Optimisation of Convolutional Neural Network Inference using System Performance Models
56 |   - performance modeling
57 |
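The early-exit idea above attaches intermediate classifiers to a backbone and stops as soon as one of them is confident. A minimal PyTorch-style sketch (the module lists and threshold are illustrative):

```python
import torch

@torch.no_grad()
def early_exit_infer(blocks, exit_heads, x, threshold=0.9):
    """Run backbone blocks in order; return at the first confident exit head.

    blocks, exit_heads -- equal-length lists of nn.Module (illustrative).
    Assumes batch size 1 because of the .item() calls.
    """
    for block, head in zip(blocks, exit_heads):
        x = block(x)
        probs = torch.softmax(head(x.flatten(1)), dim=-1)
        conf, pred = probs.max(dim=-1)
        if conf.item() >= threshold:        # confident enough: exit early
            return pred.item(), conf.item()
    return pred.item(), conf.item()         # fell through to the final exit
```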
58 | ### EdgeSys Workshop [[proceeding]](https://dl.acm.org/doi/proceedings/10.1145/3434770)
59 |
60 | - eCaaS: A Management Framework of Edge Container as a Service for Business Workload [[paper]](https://dl.acm.org/doi/pdf/10.1145/3434770.3459741)
61 |   - an edge application deployment framework
62 |
63 | - AlertMe: Towards Natural Language-Based Live Video Trigger Systems at the Edge [[paper]](https://dl.acm.org/doi/pdf/10.1145/3434770.3459740)
64 |   - multimodal AI on the edge
65 |
66 | - Scheduling Continuous Operators for IoT Edge Analytics [[paper]](https://dl.acm.org/doi/pdf/10.1145/3434770.3459738)
67 |   - IoT analytics on Edge-Fog-Cloud infrastructure
68 |
--------------------------------------------------------------------------------
/note/MLSys2021.md:
--------------------------------------------------------------------------------
1 | # MLsys 2021 [[Proceeding]](https://proceedings.mlsys.org/paper/2022)
2 |
3 |
4 | ## TinyML
5 |
6 | - Boveda: Building an On-Chip Deep Learning Memory Hierarchy Brick by Brick [[paper]](https://proceedings.mlsys.org/paper/2021/hash/013d407166ec4fa56eb1e1f8cbe183b9-Abstract.html)
7 |   - on-chip data compression for DNNs
8 |   - not my expertise — comments welcome.
9 |
10 | - Value Learning for Throughput Optimization of Deep Neural Networks [[paper]](https://proceedings.mlsys.org/paper/2021/file/73278a4a86960eeb576a8fd4c9ec6997-Paper.pdf)
11 |   - models DNN optimization as an MDP (state + action)
12 |   - exploring the state space efficiently:
--------------------------------------------------------------------------------
/note/NSDI2021.md:
--------------------------------------------------------------------------------
1 | # NSDI 2021 MLsys or SysML
2 |
3 | Paper Link: https://www.usenix.org/conference/nsdi21/technical-sessions
4 |
5 |
6 |
7 | - Alohamora: Reviving HTTP/2 Push and Preload by Adapting Policies On the Fly
8 |   - Nikhil Kansal, Murali Ramanujam, and Ravi Netravali, UCLA
9 |   - Before this paper, the existing solutions
10 |   - This paper wants
11 |   - So they propose
12 |   - Hard part
13 |   - What metrics do the authors focus on?
--------------------------------------------------------------------------------
/note/OSDI2020.md:
--------------------------------------------------------------------------------
1 | # ML System in OSDI 2020
2 |
3 | ## Compiler
4 |
5 | - Ansor: Generating High-Performance Tensor Programs for Deep Learning [[arxiv]](https://arxiv.org/pdf/2006.06762.pdf)
6 |   - Lianmin Zheng, UC Berkeley; Chengfan Jia, Minmin Sun, and Zhao Wu, Alibaba Inc.; Cody Hao Yu, Amazon Web Services, Inc; Ameer Haj-Ali, UC Berkeley; Yida Wang, Amazon Web Services, Inc; Jun Yang, Alibaba Inc.; Danyang Zhuo, Duke University and UC Berkeley; Koushik Sen, Joseph Gonzalez, and Ion Stoica, UC Berkeley
7 | - A Tensor Compiler Approach for One-size-fits-all ML Prediction Serving [[arxiv]](https://scnakandala.github.io/papers/TR_2020_Hummingbird.pdf) [[code]](https://github.com/microsoft/hummingbird)
8 |   - Supun Nakandala, University of California San Diego; Karla Saur, Microsoft; Gyeong-In Yu, Seoul National University; Konstantinos Karanasos and Carlo Curino, Microsoft; Markus Weimer, Microsoft Research; Matteo Interlandi, Microsoft
9 | - Rammer: Enabling Holistic Deep Learning Compiler Optimizations with rTasks
10 |   - Lingxiao Ma, Peking University and Microsoft Research; Zhiqiang Xie, ShanghaiTech University and Microsoft Research; Zhi Yang, Peking University; Jilong Xue, Youshan Miao, Wei Cui, Wenxiang Hu, Fan Yang, Lintao Zhang, and Lidong Zhou, Microsoft Research
11 |
12 | ## Serving
13 |
14 | - Serving DNNs like Clockwork: Performance Predictability from the Bottom Up [[arxiv]](https://arxiv.org/pdf/2006.02464.pdf)
15 |   - Arpan Gujarati, Max Planck Institute for Software Systems; Reza Karimi, Emory University; Safya Alzayat and Antoine Kaufmann, Max Planck Institute for Software Systems; Ymir Vigfusson, Emory University; Jonathan Mace, Max Planck Institute for Software Systems
16 | - PipeSwitch: Fast Pipelined Context Switching for Deep Learning Applications
17 |   - Zhihao Bai and Zhen Zhang, Johns Hopkins University; Yibo Zhu, ByteDance Inc.; Xin Jin, Johns Hopkins University
18 |
19 | ## Training in clusters
20 |
21 | - HiveD: Sharing a GPU Cluster for Deep Learning with Guarantees [[Code]](https://github.com/microsoft/hivedscheduler)
22 |   - Hanyu Zhao, Peking University; Zhenhua Han, The University of Hong Kong; Zhi Yang, Peking University; Quanlu Zhang, Fan Yang, Lidong Zhou, and Mao Yang, Microsoft Research; Francis C.M. Lau, The University of Hong Kong; Yuqi Wang, Yifan Xiong, and Bin Wang, Microsoft
23 | - Retiarii: A Deep Learning Exploratory-Training Framework
24 |   - Quanlu Zhang, Zhenhua Han, Fan Yang, Yuge Zhang, Zhe Liu, Mao Yang, and Lidong Zhou, Microsoft Research Asia
25 | - Heterogeneity-Aware Cluster Scheduling Policies for Deep Learning Workloads [[arxiv]](https://arxiv.org/pdf/2008.09213.pdf)
26 |   - Deepak Narayanan, Keshav Santhanam, and Fiodar Kazhamiaka, Stanford University; Amar Phanishayee, Microsoft Research; Matei Zaharia, Stanford University
27 | - KungFu: Making Training in Distributed Machine Learning Adaptive
28 |   - Luo Mai, Guo Li, Marcel Wagenlander, Konstantinos Fertakis, Andrei-Octavian Brabete, and Peter Pietzuch, Imperial College London
29 | - A Unified Architecture for Accelerating Distributed DNN Training in Heterogeneous GPU/CPU Clusters
30 |   - Yimin Jiang, Tsinghua University; Yibo Zhu, ByteDance Inc.; Chang Lan, Google; Bairen Yi, ByteDance Inc.; Yong Cui, Tsinghua University; Chuanxiong Guo, ByteDance Inc.
31 | - AntMan: Dynamic Scaling on GPU Cluster for Deep Learning
32 |   - Wencong Xiao, Shiru Ren, Yong Li, Yang Zhang, Pengyang Hou, Zhi Li, Yihui Feng, Wei Lin, and Yangqing Jia, Alibaba Group
--------------------------------------------------------------------------------
/note/SIGCOMM2020.md:
--------------------------------------------------------------------------------
1 | # Notes of ML System Related Papers in SIGCOMM2020
2 |
3 |
4 | ## Video Analysis System with Deep Learning
5 |
6 | - SIGCOMM 2020 Topic Preview: Video + Machine Learning [[Video]](https://www.youtube.com/watch?v=5VtWWG_a1sk&feature=youtu.be)
7 |
8 | - Server-Driven Video Streaming for Deep Learning Inference [[Paper]](https://dl.acm.org/doi/pdf/10.1145/3387514.3405887)
9 |   - Du, Kuntai, Ahsan Pervaiz, Xin Yuan, Aakanksha Chowdhery, Qizheng Zhang, Henry Hoffmann, and Junchen Jiang.
10 |   - Context:
11 |   - Challenges:
12 |   - Existing Solution:
13 |   - Gaps:
14 |   - Proposed Method
15 |
16 | - Reducto: On-Camera Filtering for Resource-Efficient Real-Time Video Analytics [[Paper]](https://dl.acm.org/doi/pdf/10.1145/3387514.3405874)
17 |   - Li, Yuanqi, Arthi Padmanabhan, Pengzhan Zhao, Yufei Wang, Guoqing Harry Xu, and Ravi Netravali.
18 |   - Context:
19 |   - Challenges:
20 |   - Existing Solution:
21 |   - Gaps:
22 |   - Proposed Method
23 |
24 | - Neural-Enhanced Live Streaming: Improving Live Video Ingest via Online Learning [[Paper]](https://dl.acm.org/doi/pdf/10.1145/3387514.3405856)
25 |   - Kim, Jaehong, Youngmok Jung, Hyunho Yeo, Juncheol Ye, and Dongsu Han.
26 |   - Context:
27 |   - Challenges:
28 |   - Existing Solution:
29 |   - Gaps:
30 |   - Proposed Method
31 |
32 | ## Deep Learning for Networking
33 |
34 | - Interpreting Deep Learning-Based Networking Systems [[Paper]](https://dl.acm.org/doi/pdf/10.1145/3387514.3405859)
35 |   - Meng, Zili, Minhu Wang, Jiasong Bai, Mingwei Xu, Hongzi Mao, and Hongxin Hu.
36 |   - Context:
37 |   - Challenges:
38 |   - Existing Solution:
39 |   - Gaps:
40 |   - Proposed Method
41 |
--------------------------------------------------------------------------------
/note/SoCC2019.md:
--------------------------------------------------------------------------------
1 | ## Machine Learning Related Paper
2 |
3 | - Cartel: A System for Collaborative Transfer Learning at the Edge (Georgia Institute of Technology)
4 |   - This paper is about transfer learning at edge nodes.
5 |   - Background: with the spread of edge nodes and 5G, a wide variety of applications are developing rapidly and raising new requirements.
6 |   - Problem: in current ML systems, 1) data must be moved to a central data center for modeling and the model then shipped back to the edge nodes, which is very inefficient; 2) a full model may not be needed at an edge node — one could train only on edge-local data, but that loses accuracy.
7 |   - Approach: this paper builds a collaborative learning system over an edge-node cloud (a shared environment) and tailors the model at each node, so that it reacts quickly to data changes while matching the accuracy of a large model.
8 |   - Results: 1) speed: reacts to workload changes faster than isolated learning; 2) transfers much less data and much smaller models; 3) shorter training time.
9 |   - Key mechanisms: 1) drift detection; 2) logical neighbors (nodes with similar prior data); 3) knowledge transfer.
10 |
11 | - HyperSched: Dynamic Resource Reallocation for Model Development on a Deadline (UC Berkeley)
12 |   - During training, this paper gives more resources to the trials that will eventually reach higher accuracy, so the best model finishes training and can be deployed before the deadline.
13 |
14 | - BigDL: A Distributed Deep Learning Framework for Big Data (Intel & Tencent)
15 |   - Describes a large-scale data-processing and deep learning framework deployed on company clusters.
16 |
17 | - Cirrus: a Serverless Framework for End-to-end ML Workflows (UC Berkeley, a high-quality paper)
18 |   - Shows how to build a deep learning training platform on top of AWS Lambda and S3.
19 |   - Background:
20 |   - Problem:
21 |   - Approach:
22 |   - Results:
23 |
24 | - DCUDA: Dynamic GPU Scheduling with Live Migration Support
25 |   - Dynamically adjusts GPU resource utilization, mainly by live-migrating running tasks to rebalance load.
26 |
27 | - BurScale: Using Burstable Instances for Cost-Effective Autoscaling in the Public Cloud (Penn)
28 |   - Uses burstable instances to mask the transient problems that arise during autoscaling; the queueing-theory modeling inside is the interesting part.
29 |   - Background:
30 |   - Problem:
31 |   - Approach:
32 |   - Results:
33 |
34 | - An automated, cross-layer instrumentation framework for diagnosing performance problems in distributed applications
35 |   - For diagnosing problems in clusters; a short one, worth reading later.
36 |
37 | - A System-Wide Debugging Assistant Powered by Natural Language Processing
38 |   - Does system debugging with NLP — a novel idea, worth a look later.
39 |
40 | - Perceptual Compression for Video Storage and Processing Systems (a UW work)
41 |   - Uses saliency models for video compression and designs a storage manager to manage users' perceptual information.
42 |
43 | - PerfDebug: Performance Debugging of Computation Skew in Dataflow Systems
44 |   - Also for performance diagnosis; looks high quality and worth a read.
45 |   - Background:
46 |   - Problem:
47 |   - Approach:
48 |   - Results:
49 |
50 |
51 |
52 |
53 |
54 |
--------------------------------------------------------------------------------
/note/SoCC2020.md:
--------------------------------------------------------------------------------
1 | # Two Machine Learning System Sections
2 |
3 | - PaGraph: Scaling GNN training on large graphs via computation-aware caching
4 |   - Targets large-scale GNN training. (I lack the background here, so I cannot fully follow it.) Current approaches use mini-batches and sampling to train on large graphs.
5 |   - But when loading feature-rich vertices from CPU to GPU, they hit a scalability problem: bandwidth severely limits training.
6 |   - This paper proposes a system that trains GNNs on a single machine with multiple GPUs. It uses spare GPU resources as a cache to reduce data-loading time, and designs a caching policy that considers both graph structure and data-access patterns. Finally, for distributed computation across multiple GPUs, it develops a fast GNN-computation-aware partition algorithm to reduce cross-partition cuts.
7 |
8 | - Baechi: fast device placement of machine learning graphs
9 |   - This work appears to follow Google's DRL paper on device placement.
10 |   - Prior approaches make model-parallel training very time-consuming, because planning where operators go on different devices itself takes a long time.
11 |   - This paper proposes a system that can run on small, resource-constrained clusters. It includes an efficient algorithm that greatly speeds up the placement search.
12 |
13 | - Semi-dynamic load balancing: efficient distributed learning in non-dedicated environments (HKUST)
14 |   - Training deep learning models on clusters often means non-dedicated and heterogeneous resources. Distributed training then gets capped by the slower devices, and efficiency drops.
15 |   - Existing load balancers cannot solve this problem efficiently and practically.
16 |   - So this paper designs a semi-dynamic load balancer: at the boundary of each training iteration, it uses each machine's instantaneous processing capacity to adjust its batch size in time.
17 |
18 | - Network-accelerated distributed machine learning for multi-tenant settings (Uber, AWS, University of Wisconsin-Madison)
19 |   - Resource contention during cluster training causes stragglers (similar to the paper above).
20 |   - Proposes a unified network manager, MLfabric, to solve this problem.
21 |   - Network-heavy transfers such as gradients and models are queued and managed centrally to speed up convergence and improve efficiency. It also 1) opportunistically uses idle machines for data aggregation and analysis, and 2) creates replicas to improve the system's fault tolerance.
22 |
23 | - Vessels: Efficient and Scalable Deep Learning Prediction on Trusted Processors (Purdue)
24 |   - This paper is mainly about protecting sensitive data in deep learning systems (not an area I know well).
25 |   - It first analyzes two notable problems of Intel SGX: the need to allocate large memory and very low memory reuse.
26 |   - To solve this, it proposes a new system for memory optimization. Concretely, it identifies and analyzes the memory allocation and usage patterns of deep learning programs, then creates a trusted environment that executes the program with less resource allocation and higher reuse.
27 |
28 | - **InferLine: latency-aware provisioning and scaling for prediction serving pipelines (a follow-up from UC Berkeley's Clipper team)**
29 |   - This paper focuses on optimizing ML inference pipelines.
30 |   - To guarantee end-to-end latency across the whole inference flow, one must jointly consider each model's batch size, the hardware type, and the dynamic arrival rate.
31 |   - The paper proposes the InferLine serving system to provision resources and manage every pipeline stage (preprocessing, inference, ...). Concretely, it pairs a low-frequency combinatorial planner with a high-frequency autoscaling tuner. The former profiles each stage and simulates different combinations, then automatically picks the combination of hardware, batch size, and replication factor. The latter analyzes each stage with network calculus (I do not fully understand this part; need to check the details) and autoscales per stage according to the arrival rate.
32 |
33 | - GSLICE: controlled spatial sharing of GPUs for a scalable inference platform
34 |   - This paper focuses on GPU sharing for the inference stage.
35 |   - It first argues that MPS and CUDA streams are not enough to guarantee predictable performance and achieve scalability.
36 |   - It then designs a GPU resource allocation and management framework. The framework virtualizes GPU resources into multiple separate inference functions to guarantee resource isolation and thus performance. It then uses a self-learning approach to dynamically assign GPU resources and batching schemes. The whole system can scale inference functions out, reacting fully to different arrival rates and achieving scalability.
37 |
38 | - Elastic parameter server load distribution in deep learning clusters
39 |   - This paper focuses on speeding up training.
40 |   - When training with a parameter-server architecture, various causes (uneven parameter distribution, network resource contention, etc.) make some nodes slow, dragging down system efficiency.
41 |   - The paper designs a dynamic parameter-server load balancer to solve this: during training, it uses an exploitation-exploration scheme to adjust the parameter distribution (to read in detail later).
42 |
43 |
44 |
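For reference, the parameter-server pattern that the last note rebalances looks like this in miniature — workers pull the latest weights, compute gradients on their data shard, and push updates back. A single-process NumPy sketch (no networking or sharding; purely illustrative):

```python
import numpy as np

class ParameterServer:
    """Toy in-process parameter server; real systems shard w across server nodes."""

    def __init__(self, dim, lr=0.1):
        self.w = np.zeros(dim)
        self.lr = lr

    def pull(self):
        return self.w.copy()

    def push(self, grad):
        self.w -= self.lr * grad

def worker_step(server, x_batch, y_batch):
    w = server.pull()                                    # pull current weights
    pred = x_batch @ w
    grad = x_batch.T @ (pred - y_batch) / len(y_batch)   # linear-regression gradient
    server.push(grad)                                    # push the update
```

Uneven sharding of `w` across multiple servers is exactly what makes some of them stragglers, which is what the elastic load balancer above corrects.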
--------------------------------------------------------------------------------
/note/SoCC2021.md:
--------------------------------------------------------------------------------
1 | # SoCC 2021 [[paper]](https://dl.acm.org/doi/proceedings/10.1145/3472883)
2 |
3 | ## Morphling: Fast, Near-Optimal Auto-Configuration for Cloud-Native Model Serving
4 |
--------------------------------------------------------------------------------
/paper/mlsys-whitepaper.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/HuaizhengZhang/AI-Infra-from-Zero-to-Hero/e956db3a372fb520d5f0eeec18176fdd81eacd4d/paper/mlsys-whitepaper.pdf
--------------------------------------------------------------------------------
/training.md:
--------------------------------------------------------------------------------
1 | # Training System
2 |
3 | System for deep learning training. Currently, I only summarize some arXiv papers here and put accepted papers into the [conference](note) section.
4 |
5 | ## Survey
6 |
7 | - Mayer, Ruben, and Hans-Arno Jacobsen. "Scalable Deep Learning on Distributed Infrastructures: Challenges, Techniques, and Tools." ACM Computing Surveys (CSUR) 53.1 (2020): 1-37. [[Paper]](https://arxiv.org/pdf/1903.11314.pdf)
8 |
9 | ## Training(Multi-jobs on cluster)
10 |
11 | - Pollux: Co-adaptive Cluster Scheduling for Goodput-Optimized Deep Learning [[arxiv]](https://arxiv.org/pdf/2008.12260.pdf) [[GitHub]](https://github.com/petuum/adaptdl)
12 |   - arXiv preprint arXiv:2008.12260 (2020).
13 |   - Qiao, Aurick, Willie Neiswanger, Qirong Ho, Hao Zhang, Gregory R. Ganger, and Eric P. Xing
14 | - Themis: Fair and Efficient GPU Cluster Scheduling. [[Paper]](http://wisr.cs.wisc.edu/papers/nsdi20-themis.pdf)
15 |   - Mahajan, K., Balasubramanian, A., Singhvi, A., Venkataraman, S., Akella, A., Phanishayee, A. and Chawla, S., 2020.
16 |   - In 17th USENIX Symposium on Networked Systems Design and Implementation (NSDI 20) (pp. 289-304).
17 | - Tiresias: A GPU cluster manager for distributed deep learning. [[Paper]](https://www.usenix.org/system/files/nsdi19-gu.pdf) [[GitHub]](https://github.com/SymbioticLab/Tiresias)
18 |   - Gu, J., Chowdhury, M., Shin, K.G., Zhu, Y., Jeon, M., Qian, J., Liu, H. and Guo, C., 2019.
19 |   - In 16th USENIX Symposium on Networked Systems Design and Implementation (NSDI 19) (pp. 485-500).
20 | - Microsoft OpenPAI HiveDScheduler: As one standalone component of Microsoft OpenPAI, HiveD is designed to be a Kubernetes Scheduler Extender for Multi-Tenant GPU clusters. [[Project]](https://github.com/microsoft/hivedscheduler)
21 | - Gandiva: Introspective cluster scheduling for deep learning. [[Paper]](https://www.usenix.org/system/files/osdi18-xiao.pdf)
22 |   - Xiao, Wencong, et al. (*OSDI 2018*)
23 |   - Summary: improves the efficiency of hyper-parameter tuning jobs in a cluster; aware of hardware utilization.
24 | - Optimus: an efficient dynamic resource scheduler for deep learning clusters [[Paper]](https://i.cs.hku.hk/~cwu/papers/yhpeng-eurosys18.pdf)
25 |   - Peng, Yanghua, et al. (*EuroSys 2018*)
26 |   - Summary: job scheduling on clusters, with total completion time as the metric.
27 | - Multi-tenant GPU clusters for deep learning workloads: Analysis and implications. [[Paper]](https://www.microsoft.com/en-us/research/uploads/prod/2018/05/gpu_sched_tr.pdf) [[dataset]](https://github.com/msr-fiddle/philly-traces)
28 |   - Jeon, Myeongjae, Shivaram Venkataraman, Junjie Qian, Amar Phanishayee, Wencong Xiao, and Fan Yang
29 | - Slurm: A Highly Scalable Workload Manager [[GitHub]](https://github.com/SchedMD/slurm)
30 |
31 |
32 | ## Training(Parallelism)
33 |
34 | - ZeRO: Memory Optimization Towards Training A Trillion Parameter Models. *Microsoft Work* [[Paper]](https://arxiv.org/pdf/1910.02054.pdf) [[GitHub]](https://github.com/microsoft/DeepSpeed)
35 | - Class materials for a distributed systems lecture series [[GitHub]](https://github.com/aphyr/distsys-class)
36 | - A Unified Architecture for Accelerating Distributed
37 | DNN Training in Heterogeneous GPU/CPU Clusters [[Paper]](https://www.usenix.org/system/files/osdi20-jiang.pdf) [[GitHub]](https://github.com/bytedance/byteps)
38 |   - Yimin Jiang, Yibo Zhu, Chang Lan, Bairen Yi, Yong Cui, Chuanxiong Guo. (OSDI 2020)
39 |   - Summary: state-of-the-art parameter server
40 | - PipeDream: Generalized Pipeline Parallelism for DNN Training (SOSP2019) [[Paper]](https://cs.stanford.edu/~matei/papers/2019/sosp_pipedream.pdf) [[Github]](https://github.com/msr-fiddle/pipedream)
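PipeDream (and GPipe further down) split a model into stages and keep every stage busy by streaming micro-batches through the pipeline. A toy, forward-only schedule showing the micro-batch idea (runs sequentially here; real systems put each stage on its own GPU and also interleave backward passes, e.g. PipeDream's 1F1B):

```python
def pipeline_forward(stages, microbatches):
    """Stage s handles micro-batch m at step s + m, so all stages
    work concurrently once the pipeline fills."""
    n_stages, n_mb = len(stages), len(microbatches)
    acts = {(-1, m): mb for m, mb in enumerate(microbatches)}  # stage -1 = raw input
    for step in range(n_stages + n_mb - 1):
        for s in range(n_stages):
            m = step - s
            if 0 <= m < n_mb:
                acts[(s, m)] = stages[s](acts[(s - 1, m)])
    return [acts[(n_stages - 1, m)] for m in range(n_mb)]

# e.g. pipeline_forward([stage1_fn, stage2_fn, stage3_fn], chunks_of_a_batch)
```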
41 | - Exploring Hidden Dimensions in Parallelizing Convolutional Neural Networks. [[Paper]](http://proceedings.mlr.press/v80/jia18a/jia18a.pdf) [[GitHub]](https://github.com/flexflow/FlexFlow)
42 |   - Zhihao Jia, Sina Lin, Charles R. Qi, and Alex Aiken. (*ICML 2018*)
43 | - Mesh-TensorFlow: Deep Learning for Supercomputers [[Paper]](https://arxiv.org/pdf/1811.02084.pdf) [[GitHub]](https://github.com/tensorflow/mesh)
44 |   - Shazeer, Noam, Youlong Cheng, Niki Parmar, Dustin Tran, et al. (*NIPS 2018*)
45 |   - Summary: generalizes data parallelism so that any tensor dimension can be split across devices, demonstrated on large language models.
46 | - PyTorch-BigGraph: A Large-scale Graph Embedding System [[Paper]](https://arxiv.org/pdf/1903.12287.pdf) [[GitHub]](https://github.com/facebookresearch/PyTorch-BigGraph)
47 |   - Lerer, Adam and Wu, Ledell and Shen, Jiajun and Lacroix, Timothee and Wehrstedt, Luca and Bose, Abhijit and Peysakhovich, Alex (*SysML 2019*)
48 | - Beyond data and model parallelism for deep neural networks [[Paper]](https://arxiv.org/pdf/1807.05358.pdf) [[GitHub]](https://github.com/jiazhihao/metaflow_sysml19)
49 |   - Jia, Zhihao, Matei Zaharia, and Alex Aiken. (*SysML 2019*)
50 |   - Summary: SOAP (sample, operation, attribute and parameter) parallelism. Operator graph, device topology and execution optimizer. MCMC search algorithm and execution simulator.
51 | - Device placement optimization with reinforcement learning [[Paper]](https://arxiv.org/pdf/1706.04972.pdf)
52 |   - Mirhoseini, Azalia, Hieu Pham, Quoc V. Le, Benoit Steiner, Rasmus Larsen, Yuefeng Zhou, Naveen Kumar, Mohammad Norouzi, Samy Bengio, and Jeff Dean. (*ICML 17*)
53 |   - Summary: uses REINFORCE to learn a device placement policy; groups operations for execution; needs a lot of GPUs.
54 | - Spotlight: Optimizing device placement for training deep neural networks [[Paper]](http://proceedings.mlr.press/v80/gao18a/gao18a.pdf)
55 |   - Gao, Yuanxiang, Li Chen, and Baochun Li (*ICML 18*)
56 | - GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism [[Paper]](https://arxiv.org/pdf/1811.06965.pdf) [[GitHub]](https://github.com/tensorflow/lingvo/blob/master/lingvo/core/gpipe.py) [[News]](https://www.cnbeta.com/articles/tech/824495.htm)
57 |   - Huang, Yanping, et al. (*arXiv preprint arXiv:1811.06965 (2018)*)
58 | - Horovod: Distributed training framework for TensorFlow, Keras, and PyTorch.
59 |   [[GitHub]](https://github.com/uber/horovod)
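Horovod's data-parallel pattern — all-reduce gradients across workers on every step — takes only a few extra lines in a training script. A minimal PyTorch sketch (launch with `horovodrun`; the model and data below are placeholders):

```python
import torch
import horovod.torch as hvd

hvd.init()
torch.cuda.set_device(hvd.local_rank())          # one GPU per process

model = torch.nn.Linear(10, 1).cuda()            # placeholder model
# Convention: scale the learning rate by the number of workers.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01 * hvd.size())

# Wrap the optimizer so gradients are all-reduced across workers,
# and make sure every worker starts from the same weights.
optimizer = hvd.DistributedOptimizer(optimizer, named_parameters=model.named_parameters())
hvd.broadcast_parameters(model.state_dict(), root_rank=0)

for x, y in [(torch.randn(32, 10).cuda(), torch.randn(32, 1).cuda())]:
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()
```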
60 | - Distributed machine learning infrastructure for large-scale robotics research [[GitHub]](https://github.com/google-research/tensor2robot) [[Blog]](https://ai.google/research/teams/brain/robotics/)
61 | - A Generic Communication Scheduler for Distributed DNN Training Acceleration [[Paper]](https://i.cs.hku.hk/~cwu/papers/yhpeng-sosp19.pdf) [[BytePS]](https://github.com/bytedance/byteps)
62 |   - Peng, Y., Zhu, Y., Chen, Y., Bao, Y., Yi, B., Lan, C., Wu, C. and Guo, C. (*SOSP 2019*)
63 |   - Summary: a communication scheduler
64 |
65 |
66 | ## Training(Fault-tolerant)
67 |
68 | - Oobleck: Resilient Distributed Training of Large Models Using Pipeline Templates [[Paper]](https://dl.acm.org/doi/abs/10.1145/3600006.3613152) [[GitHub]](https://github.com/SymbioticLab/Oobleck)
69 |   - Jang, Insu and Yang, Zhenning and Zhang, Zhen and Jin, Xin and Chowdhury, Mosharaf, (*SOSP 2023*)
70 | - Bamboo: Making Preemptible Instances Resilient for Affordable Training of Large DNNs [[Paper]](https://www.usenix.org/conference/nsdi23/presentation/thorpe) [[GitHub]](https://github.com/uclasystem/bamboo)
71 |   - John Thorpe, Pengzhan Zhao, Jonathan Eyolfson, Yifan Qiao, Zhihao Jia, Minjia Zhang, Ravi Netravali and Guoqing Harry Xu, (*NSDI 2023*)
72 | - Varuna: scalable, low-cost training of massive deep learning models [[Paper]](https://dl.acm.org/doi/abs/10.1145/3492321.3519584) [[GitHub]](https://github.com/microsoft/varuna)
73 |   - Athlur, Sanjith and Saran, Nitika and Sivathanu, Muthian and Ramjee, Ramachandran and Kwatra, Nipun, (*EuroSys 2022*)
74 |
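These systems differ in how they recover (pipeline templates, preemption-aware redundancy, low-cost spot training), but all of them rest on checkpointing. The baseline pattern, with an atomic rename so that a crash mid-write never corrupts the latest checkpoint (a generic sketch, not the API of any system above):

```python
import os
import torch

def save_checkpoint(model, optimizer, step, path="ckpt.pt"):
    """Write to a temp file first, then atomically rename over the old one."""
    tmp = path + ".tmp"
    torch.save({"model": model.state_dict(),
                "optimizer": optimizer.state_dict(),
                "step": step}, tmp)
    os.replace(tmp, path)        # atomic on POSIX: the old checkpoint stays valid

def maybe_resume(model, optimizer, path="ckpt.pt"):
    """Return the step to resume from (0 if no checkpoint exists)."""
    if not os.path.exists(path):
        return 0
    state = torch.load(path)
    model.load_state_dict(state["model"])
    optimizer.load_state_dict(state["optimizer"])
    return state["step"] + 1
```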
75 |
76 | ## Training(Energy-efficient)
77 |
78 | - Zeus: Understanding and Optimizing GPU Energy Consumption of DNN Training [[Paper]](https://www.usenix.org/system/files/nsdi23-you.pdf) [[GitHub]](https://github.com/ml-energy/zeus) [[ml.energy]](https://ml.energy/zeus/) [[The ML.ENERGY Initiative]](https://ml.energy/)
79 |   - Jie You, Jae-Won Chung, and Mosharaf Chowdhury (*NSDI 2023*)
80 |
81 |
--------------------------------------------------------------------------------
/video_system.md:
--------------------------------------------------------------------------------
1 | # Video System
2 |
3 | ## Tools
4 |
5 | - VideoFlow: Python framework that facilitates the quick development of complex video analysis applications and other series-processing based applications in a multiprocessing environment. [[GitHub]](https://github.com/videoflow/videoflow)
6 | - VidGear: Powerful Multi-Threaded OpenCV and FFmpeg based Turbo Video Processing Python Library with unique State-of-the-Art Features. [[GitHub]](https://github.com/abhiTronix/vidgear)
7 | - NVIDIA DALI: A library containing both highly optimized building blocks and an execution engine for data pre-processing in deep learning applications [[GitHub]](https://github.com/NVIDIA/DALI)
8 | - TensorStream: A library for real-time video stream decoding to CUDA memory [[GitHub]](https://github.com/Fonbet/argus-tensor-stream)
9 | - C++ image processing library using SIMD: SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2, AVX-512, VMX(Altivec) [[GitHub]](https://github.com/ermig1979/Simd)
10 | - Pretrained image and video models for PyTorch. [[GitHub]](https://github.com/alexandonian/pretorched-x)
11 | - LiveDetect - Live video client to DeepDetect. [[GitHub]](https://github.com/jolibrain/livedetect)
12 |
13 | ## Video Analysis Papers
14 |
15 | - Server-Driven Video Streaming for Deep Learning Inference [[Paper]](https://kuntaidu.github.io/assets/doc/DDS.pdf)
16 |   - Kuntai Du, Ahsan Pervaiz, Xin Yuan, Aakanksha Chowdhery, Qizheng Zhang, Henry Hoffmann, Junchen Jiang (*SIGCOMM2020*)
17 | - Reducto: On-Camera Filtering for Resource-Efficient Real-Time Video Analytics [[Paper]](http://web.cs.ucla.edu/~harryxu/papers/li-sigcomm20.pdf)
18 |   - Yuanqi Li, Arthi Padmanabhan, Pengzhan Zhao, Yufei Wang, Guoqing Harry Xu, Ravi Netravali. (*SIGCOMM2020*)
19 | - Fu, Daniel Y., et al. "Rekall: Specifying video events using compositions of spatiotemporal labels." arXiv preprint arXiv:1910.02993 (2019). [[Paper]](https://arxiv.org/pdf/1910.02993.pdf)
20 | - Puffer: Puffer is a Stanford University research study about using machine learning to improve video-streaming algorithms. [[GitHub]](https://github.com/StanfordSNR/puffer)
21 | - Visual Road: A Video Data Management Benchmark [[Project Website]](http://db.cs.washington.edu/projects/visualroad/)
22 |   - Brandon Haynes, Amrita Mazumdar, Magdalena Balazinska, Luis Ceze, Alvin Cheung (*SIGMOD 2019*)
23 | - CaTDet: Cascaded Tracked Detector for Efficient Object Detection from Video [[Paper]](http://www.sysml.cc/doc/2019/111.pdf)
24 |   - Mao, Huizi, Taeyoung Kong, and William J. Dally. (*SysML2019*)
25 | - Live Video Analytics at Scale with Approximation and Delay-Tolerance [[Paper]](https://www.microsoft.com/en-us/research/wp-content/uploads/2017/02/videostorm_nsdi17.pdf)
26 |   - Zhang, Haoyu, Ganesh Ananthanarayanan, Peter Bodik, Matthai Philipose, Paramvir Bahl, and Michael J. Freedman. (*NSDI 2017*)
27 | - Chameleon: scalable adaptation of video analytics [[Paper]](http://people.cs.uchicago.edu/~junchenj/docs/Chameleon_SIGCOMM_CameraReady.pdf)
28 |   - Jiang, Junchen, et al. (*SIGCOMM 2018*)
29 |   - Summary: a configuration controller for balancing accuracy and resource use; the "golden configuration" is a good design. Naive periodic profiling often costs more than the resource savings gained by adapting the configurations.
30 | - Kang, Daniel, Peter Bailis, and Matei Zaharia. "Blazeit: Fast exploratory video queries using neural networks." arXiv preprint arXiv:1805.01046 (2018). [[Paper]](https://arxiv.org/pdf/1805.01046.pdf)
31 | - Noscope: optimizing neural network queries over video at scale [[Paper]](https://arxiv.org/pdf/1703.02529) [[GitHub]](https://github.com/stanford-futuredata/noscope)
32 |   - Kang, Daniel, John Emmons, Firas Abuzaid, Peter Bailis, and Matei Zaharia. (*VLDB2017*)
33 |   - Summary: information cache + difference-detection model + small specialized model + sequence optimizer (a toy cascade is sketched below).
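NoScope's cascade — skip unchanged frames, answer easy frames with a small specialized model, and fall back to the full model only when the small one is unsure — can be sketched as follows (all thresholds and callables are illustrative):

```python
def cascade_detect(frame, prev_frame, diff_detector, small_model, full_model,
                   low=0.2, high=0.8):
    """Toy NoScope-style cascade for a binary query such as 'is a bus visible?'."""
    if not diff_detector(frame, prev_frame):
        return None                      # frame unchanged: reuse the previous answer
    p = small_model(frame)               # cheap specialized model, probability in [0, 1]
    if p >= high:
        return True                      # confidently positive
    if p <= low:
        return False                     # confidently negative
    return full_model(frame)             # uncertain: pay for the full reference model
```

The thresholds `low`/`high` trade accuracy for the fraction of frames that ever reach the expensive model.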
34 | - SVE: Distributed video processing at Facebook scale [[Paper]](http://www.cs.princeton.edu/~wlloyd/papers/sve-sosp17.pdf)
35 |   - Huang, Qi, et al. (*SOSP2017*)
36 | - Scanner: Efficient Video Analysis at Scale [[Paper]](http://graphics.stanford.edu/papers/scanner/poms18_scanner.pdf) [[GitHub]](https://github.com/scanner-research/scanner)
37 |   - Poms, Alex, Will Crichton, Pat Hanrahan, and Kayvon Fatahalian (*SIGGRAPH 2018*)
38 | - A cloud-based large-scale distributed video analysis system [[Paper]](https://ai.google/research/pubs/pub45631)
39 |   - Wang, Yongzhe, et al. (*ICIP 2016*)
40 | - Rosetta: Large scale system for text detection and recognition in images [[Paper]](https://research.fb.com/wp-content/uploads/2018/10/Rosetta-Large-scale-system-for-text-detection-and-recognition-in-images.pdf)
41 |   - Borisyuk, Fedor, Albert Gordo, and Viswanath Sivakumar. (*KDD 2018*)
42 |
43 |
44 | ## Video Streaming Papers
45 |
46 | - Neural-Enhanced Live Streaming: Improving Live Video Ingest via Online Learning [[Paper]](http://ina.kaist.ac.kr/~livenas/livenas_sigcomm2020.pdf)
47 |   - Jaehong Kim, Youngmok Jung, Hyunho Yeo, Juncheol Ye, and Dongsu Han (*SIGCOMM2020*)
48 | - Learning in situ: a randomized experiment in video streaming [[Paper]](https://www.usenix.org/system/files/nsdi20-paper-yan.pdf)
49 |   - Francis Y. Yan and Hudson Ayers, Stanford University; Chenzhi Zhu, Tsinghua University; Sadjad Fouladi, James Hong, Keyi Zhang, Philip Levis, and Keith Winstein, Stanford University (*NSDI2020*)
50 | - CSI: Inferring Mobile ABR Video Adaptation Behavior under HTTPS and QUIC [[Paper]](https://dl.acm.org/doi/abs/10.1145/3342195.3387558)
51 |   - Shichang Xu (University of Michigan), Subhabrata Sen (AT&T Labs Research), Z. Morley Mao (University of Michigan) (*Eurosys2020*)
52 | - Reconstructing proprietary video streaming algorithms [[Paper]](https://www.usenix.org/conference/atc20/presentation/gruener)
53 |   - Maximilian Grüner, Melissa Licciardello, and Ankit Singla, ETH Zürich (*ATC2020*)
54 | - Neural adaptive content-aware internet video delivery. [[Paper]](https://www.usenix.org/system/files/osdi18-yeo.pdf) [[GitHub]](https://github.com/kaist-ina/NAS_public)
55 |   - Yeo, H., Jung, Y., Kim, J., Shin, J. and Han, D., 2018. (*OSDI 2018*)
56 |   - Summary: combines video super-resolution with ABR.
57 |
--------------------------------------------------------------------------------