├── .gitignore
├── Contribute
    ├── renderer.py
    ├── data.yaml
    └── config.yaml
├── LicenseBlock
    ├── config.yaml
    ├── renderer.py
    └── data.yaml
├── Papers
    ├── AutoML
    │   ├── config.yaml
    │   ├── renderer.py
    │   ├── README.md
    │   └── data.yaml
    ├── Others
    │   ├── config.yaml
    │   ├── renderer.py
    │   ├── README.md
    │   └── data.yaml
    ├── Network_Pruning
    │   ├── config.yaml
    │   ├── renderer.py
    │   └── README.md
    ├── Applications
    │   ├── config.yaml
    │   ├── renderer.py
    │   ├── README.md
    │   └── data.yaml
    ├── Quantization
    │   ├── config.yaml
    │   ├── renderer.py
    │   ├── README.md
    │   └── data.yaml
    ├── Federated_Learning
    │   ├── config.yaml
    │   ├── renderer.py
    │   ├── README.md
    │   └── data.yaml
    ├── Efficient_Architectures
    │   ├── config.yaml
    │   └── renderer.py
    ├── ML_Algorithms_For_Edge
    │   ├── config.yaml
    │   ├── renderer.py
    │   ├── README.md
    │   └── data.yaml
    ├── config.yaml
    ├── renderer.py
    ├── data.yaml
    └── README.md
├── Datasets
    ├── config.yaml
    ├── renderer.py
    └── data.yaml
├── Challenges
    ├── config.yaml
    ├── renderer.py
    └── data.yaml
├── Other_Resources
    ├── config.yaml
    ├── renderer.py
    └── data.yaml
├── AI_Chips
    ├── config.yaml
    ├── data.yaml
    └── renderer.py
├── Books
    ├── config.yaml
    ├── renderer.py
    └── data.yaml
├── MCU_and_MPU_Software_Packages
    ├── config.yaml
    ├── renderer.py
    └── data.yaml
├── Inference_Engines
    ├── config.yaml
    ├── renderer.py
    └── data.yaml
├── config.yaml
├── data.yaml
├── utils.py
├── style.py
├── awesome.py
├── LICENSE
└── README.md


/.gitignore:
--------------------------------------------------------------------------------
1 | __pycache__
2 | 


--------------------------------------------------------------------------------
/Contribute/renderer.py:
--------------------------------------------------------------------------------
1 | def renderer(fp, data, config):
2 |     pass
3 | 


--------------------------------------------------------------------------------
/LicenseBlock/config.yaml:
--------------------------------------------------------------------------------
1 | # LICENSE
2 | ---
3 | description:
4 | ...
5 | 


--------------------------------------------------------------------------------
/Papers/AutoML/config.yaml:
--------------------------------------------------------------------------------
1 | # AUTOML
2 | ---
3 | sort_key: date
4 | sort_reverse: true
5 | ...
6 | 


--------------------------------------------------------------------------------
/Papers/Others/config.yaml:
--------------------------------------------------------------------------------
1 | # OTHERS
2 | ---
3 | sort_key: date
4 | sort_reverse: true
5 | ...
6 | 


--------------------------------------------------------------------------------
/Papers/Network_Pruning/config.yaml:
--------------------------------------------------------------------------------
1 | # PRUNING
2 | ---
3 | sort_key: date
4 | sort_reverse: true
5 | ...


--------------------------------------------------------------------------------
/Papers/Applications/config.yaml:
--------------------------------------------------------------------------------
1 | # APPLICATIONS
2 | ---
3 | sort_key: date
4 | sort_reverse: true
5 | ...
6 | 


--------------------------------------------------------------------------------
/Papers/Quantization/config.yaml:
--------------------------------------------------------------------------------
1 | # QUANTIZATION
2 | ---
3 | sort_key: date
4 | sort_reverse: true
5 | ...
6 | 


--------------------------------------------------------------------------------
/Contribute/data.yaml:
--------------------------------------------------------------------------------
1 | # CONTRIBUTE
2 | #
3 | # [Template]
4 | # name:
5 | ---
6 | name: Contribute
7 | ...
8 | 


--------------------------------------------------------------------------------
/Papers/Federated_Learning/config.yaml:
--------------------------------------------------------------------------------
1 | # FEDERATED LEARNING
2 | ---
3 | sort_key: date
4 | sort_reverse: true
5 | ...
6 | 


--------------------------------------------------------------------------------
/Papers/Efficient_Architectures/config.yaml:
--------------------------------------------------------------------------------
1 | # EFFICIENT NETWORKS
2 | ---
3 | sort_key: date
4 | sort_reverse: true
5 | ...
6 | 


--------------------------------------------------------------------------------
/Papers/ML_Algorithms_For_Edge/config.yaml:
--------------------------------------------------------------------------------
1 | # ML ALGORITHMS FOR EDGE
2 | ---
3 | sort_key: date
4 | sort_reverse: true
5 | ...
6 | 


--------------------------------------------------------------------------------
/Datasets/config.yaml:
--------------------------------------------------------------------------------
 1 | # DATASETS
 2 | #
 3 | # [Template]
 4 | # description:
 5 | # sort_key:
 6 | # sort_reverse:
 7 | ---
 8 | description:
 9 | sort_key: name
10 | ...
11 | 


--------------------------------------------------------------------------------
/Challenges/config.yaml:
--------------------------------------------------------------------------------
 1 | # CHALLENGES
 2 | #
 3 | # [Template]
 4 | # description:
 5 | # sort_key:
 6 | # sort_reverse:
 7 | ---
 8 | description:
 9 | sort_key: name
10 | ...
11 | 


--------------------------------------------------------------------------------
/Papers/config.yaml:
--------------------------------------------------------------------------------
 1 | # PAPERS
 2 | #
 3 | # [Template]
 4 | #
 5 | # description:
 6 | # sort_key:
 7 | # sort_reverse:
 8 | ---
 9 | description:
10 | sort_key: name
11 | ...
12 | 


--------------------------------------------------------------------------------
/Other_Resources/config.yaml:
--------------------------------------------------------------------------------
 1 | # OTHER RESOURCES
 2 | #
 3 | # [Template]
 4 | #
 5 | # description:
 6 | # sort_key:
 7 | # sort_reverse:
 8 | ---
 9 | description:
10 | sort_key: name
11 | ...
12 | 


--------------------------------------------------------------------------------
/AI_Chips/config.yaml:
--------------------------------------------------------------------------------
 1 | # AI CHIPS
 2 | #
 3 | # [Template]
 4 | #
 5 | # description:
 6 | # sort_key:
 7 | # sort_reverse:
 8 | ---
 9 | description: List of resources about AI Chips
10 | sort_key: name
11 | ...
12 | 


--------------------------------------------------------------------------------
/LicenseBlock/renderer.py:
--------------------------------------------------------------------------------
 1 | from style import h2
 2 | from style import newline
 3 | 
 4 | 
 5 | def renderer(fp, data, config):
 6 |    fp.write(data["logo"])
 7 |    newline(fp)
 8 |    newline(fp)
 9 |    fp.write(data["description"])
10 | 


--------------------------------------------------------------------------------
/Books/config.yaml:
--------------------------------------------------------------------------------
 1 | # BOOKS
 2 | #
 3 | # [Template]
 4 | #
 5 | # description:
 6 | # sort_key:
 7 | # sort_reverse:
 8 | ---
 9 | description: List of books with focus on on-device (e.g., edge or mobile) machine learning.
10 | sort_key: published
11 | sort_reverse: true
12 | ...
13 | 


--------------------------------------------------------------------------------
/MCU_and_MPU_Software_Packages/config.yaml:
--------------------------------------------------------------------------------
 1 | # MCU AND MPU SOFTWARE PACKAGES
 2 | #
 3 | # [Template]
 4 | #
 5 | # description:
 6 | # sort_key:
 7 | # sort_reverse:
 8 | ---
 9 | description: List of software packages for AI development on MCU and MPU
10 | sort_key: name
11 | ...
12 | 


--------------------------------------------------------------------------------
/Inference_Engines/config.yaml:
--------------------------------------------------------------------------------
 1 | # INFERENCE ENGINES
 2 | #
 3 | # [Template]
 4 | #
 5 | # description:
 6 | # sort_key:
 7 | # sort_reverse:
 8 | ---
 9 | description: List of machine learning inference engines and APIs that are optimized for execution and/or training on edge devices.
10 | sort_key: name
11 | ...
12 | 


--------------------------------------------------------------------------------
/AI_Chips/data.yaml:
--------------------------------------------------------------------------------
 1 | # AI CHIPS
 2 | #
 3 | # [Template]
 4 | #
 5 | # -
 6 | #     name:
 7 | #     description:
 8 | #     link:
 9 | ---
10 | 
11 | -
12 |     name: AI Chip (ICs and IPs)
13 |     description:  A list of ICs and IPs for AI, Machine Learning and Deep Learning
14 |     link: https://github.com/basicmi/AI-Chip
15 | 


--------------------------------------------------------------------------------
/Datasets/renderer.py:
--------------------------------------------------------------------------------
 1 | from style import h3
 2 | from style import p
 3 | from style import a
 4 | from style import newline
 5 | 
 6 | 
 7 | def renderer(fp, data, config):
 8 |     fp.write(h3(a([
 9 |         data["name"],
10 |         data["url"],
11 |     ])))
12 |     fp.write(p(data["description"]))
13 |     newline(fp)
14 | 


--------------------------------------------------------------------------------
/Challenges/renderer.py:
--------------------------------------------------------------------------------
 1 | from style import h3
 2 | from style import p
 3 | from style import a
 4 | from style import newline
 5 | 
 6 | 
 7 | def renderer(fp, data, config):
 8 |     fp.write(h3(a([
 9 |         data["name"],
10 |         data["url"],
11 |     ])))
12 |     fp.write(p(data["description"]))
13 |     newline(fp)
14 | 


--------------------------------------------------------------------------------
/AI_Chips/renderer.py:
--------------------------------------------------------------------------------
 1 | from style import li
 2 | from style import h3
 3 | from style import a
 4 | 
 5 | 
 6 | def renderer(fp, data, config):
 7 |     fp.write(h3(a([data["name"], data["link"]])))
 8 | 
 9 |     if data["description"] is not None:
10 |         fp.write(data["description"])
11 |         fp.write("\n")
12 | 
13 |     fp.write("\n")
14 | 


--------------------------------------------------------------------------------
/MCU_and_MPU_Software_Packages/renderer.py:
--------------------------------------------------------------------------------
 1 | from style import li
 2 | from style import h3
 3 | from style import a
 4 | 
 5 | 
 6 | def renderer(fp, data, config):
 7 |     fp.write(h3(a([data["name"], data["link"]])))
 8 | 
 9 |     if data["description"] is not None:
10 |         fp.write(data["description"])
11 |         fp.write("\n")
12 | 
13 |     fp.write("\n")
14 | 


--------------------------------------------------------------------------------
/Other_Resources/renderer.py:
--------------------------------------------------------------------------------
 1 | from style import p
 2 | from style import a
 3 | from style import h3
 4 | from style import newline
 5 | from utils import name2link
 6 | 
 7 | 
 8 | def renderer(fp, data, config):
 9 |     fp.write(h3(a([
10 |         data["name"],
11 |         data["url"],
12 |     ])))
13 |     newline(fp)
14 |     fp.write(p(data["description"]))
15 |     newline(fp)
16 | 


--------------------------------------------------------------------------------
/LicenseBlock/data.yaml:
--------------------------------------------------------------------------------
1 | # LICENSE
2 | ---
3 | name: CC0 1.0 Universal (CC0 1.0) Public Domain Dedication
4 | logo: "[![CC0](http://mirrors.creativecommons.org/presskit/buttons/88x31/svg/cc-zero.svg)](https://creativecommons.org/publicdomain/zero/1.0/)"
5 | description: To the extent possible under law, [Bisonai](https://bisonai.com/) has waived all copyright and related or neighboring rights to this work.
6 | ...
7 | 


--------------------------------------------------------------------------------
/config.yaml:
--------------------------------------------------------------------------------
 1 | # AWESOME EDGE MACHINE LEARNING
 2 | ---
 3 | title: Awesome Edge Machine Learning
 4 | description: A curated list of awesome edge machine learning resources, including research papers, inference engines, challenges, books, meetups and others.
 5 | url: https://github.com/bisonai/awesome-edge-machine-learning
 6 | filename: README.md
 7 | max_level: 2
 8 | badge:
 9 |   - "[![Awesome](https://awesome.re/badge-flat2.svg)](https://awesome.re)"
10 | ...
11 | 


--------------------------------------------------------------------------------
/Contribute/config.yaml:
--------------------------------------------------------------------------------
1 | # CONTRIBUTE
2 | ---
3 | description: >
4 |   Unlike other awesome list, we are storing data in <a href="https://en.wikipedia.org/wiki/YAML">YAML</a> format and markdown files are generated with `awesome.py` script.
5 | 
6 | 
7 |   Every directory contains `data.yaml` which stores data we want to display and `config.yaml` which stores its metadata (e.g. way of sorting data). The way how data will be presented is defined in `renderer.py`.
8 | ...
9 | 


--------------------------------------------------------------------------------
/Papers/AutoML/renderer.py:
--------------------------------------------------------------------------------
 1 | from style import li
 2 | from style import h2
 3 | from style import a
 4 | from style import p
 5 | from style import newline
 6 | 
 7 | 
 8 | def renderer(fp, data, config):
 9 |     year, month, day = data["date"].split("/")
10 |     fp.write(h2(a([data["name"].strip(), data["url"].strip()]) + f", {year}/{month}"))
11 |     fp.write(data["authors"])
12 |     newline(fp)
13 |     newline(fp)
14 |     fp.write(p(data["abstract"]))
15 |     newline(fp)
16 | 


--------------------------------------------------------------------------------
/Papers/Others/renderer.py:
--------------------------------------------------------------------------------
 1 | from style import li
 2 | from style import h2
 3 | from style import a
 4 | from style import p
 5 | from style import newline
 6 | 
 7 | 
 8 | def renderer(fp, data, config):
 9 |     year, month, day = data["date"].split("/")
10 |     fp.write(h2(a([data["name"].strip(), data["url"].strip()]) + f", {year}/{month}"))
11 |     fp.write(data["authors"])
12 |     newline(fp)
13 |     newline(fp)
14 |     fp.write(p(data["abstract"]))
15 |     newline(fp)
16 | 


--------------------------------------------------------------------------------
/Papers/Applications/renderer.py:
--------------------------------------------------------------------------------
 1 | from style import li
 2 | from style import h2
 3 | from style import a
 4 | from style import p
 5 | from style import newline
 6 | 
 7 | 
 8 | def renderer(fp, data, config):
 9 |     year, month, day = data["date"].split("/")
10 |     fp.write(h2(a([data["name"].strip(), data["url"].strip()]) + f", {year}/{month}"))
11 |     fp.write(data["authors"])
12 |     newline(fp)
13 |     newline(fp)
14 |     fp.write(p(data["abstract"]))
15 |     newline(fp)
16 | 


--------------------------------------------------------------------------------
/Papers/Network_Pruning/renderer.py:
--------------------------------------------------------------------------------
 1 | from style import li
 2 | from style import h2
 3 | from style import a
 4 | from style import p
 5 | from style import newline
 6 | 
 7 | 
 8 | def renderer(fp, data, config):
 9 |     year, month, day = data["date"].split("/")
10 |     fp.write(h2(a([data["name"].strip(), data["url"].strip()]) + f", {year}/{month}"))
11 |     fp.write(data["authors"])
12 |     newline(fp)
13 |     newline(fp)
14 |     fp.write(p(data["abstract"]))
15 |     newline(fp)
16 | 


--------------------------------------------------------------------------------
/Papers/Quantization/renderer.py:
--------------------------------------------------------------------------------
 1 | from style import li
 2 | from style import h2
 3 | from style import a
 4 | from style import p
 5 | from style import newline
 6 | 
 7 | 
 8 | def renderer(fp, data, config):
 9 |     year, month, day = data["date"].split("/")
10 |     fp.write(h2(a([data["name"].strip(), data["url"].strip()]) + f", {year}/{month}"))
11 |     fp.write(data["authors"])
12 |     newline(fp)
13 |     newline(fp)
14 |     fp.write(p(data["abstract"]))
15 |     newline(fp)
16 | 


--------------------------------------------------------------------------------
/Papers/Federated_Learning/renderer.py:
--------------------------------------------------------------------------------
 1 | from style import li
 2 | from style import h2
 3 | from style import a
 4 | from style import p
 5 | from style import newline
 6 | 
 7 | 
 8 | def renderer(fp, data, config):
 9 |     year, month, day = data["date"].split("/")
10 |     fp.write(h2(a([data["name"].strip(), data["url"].strip()]) + f", {year}/{month}"))
11 |     fp.write(data["authors"])
12 |     newline(fp)
13 |     newline(fp)
14 |     fp.write(p(data["abstract"]))
15 |     newline(fp)
16 | 


--------------------------------------------------------------------------------
/Papers/Efficient_Architectures/renderer.py:
--------------------------------------------------------------------------------
 1 | from style import li
 2 | from style import h2
 3 | from style import a
 4 | from style import p
 5 | from style import newline
 6 | 
 7 | 
 8 | def renderer(fp, data, config):
 9 |     year, month, day = data["date"].split("/")
10 |     fp.write(h2(a([data["name"].strip(), data["url"].strip()]) + f", {year}/{month}"))
11 |     fp.write(data["authors"])
12 |     newline(fp)
13 |     newline(fp)
14 |     fp.write(p(data["abstract"]))
15 |     newline(fp)
16 | 


--------------------------------------------------------------------------------
/Papers/ML_Algorithms_For_Edge/renderer.py:
--------------------------------------------------------------------------------
 1 | from style import li
 2 | from style import h2
 3 | from style import a
 4 | from style import p
 5 | from style import newline
 6 | 
 7 | 
 8 | def renderer(fp, data, config):
 9 |     year, month, day = data["date"].split("/")
10 |     fp.write(h2(a([data["name"].strip(), data["url"].strip()]) + f", {year}/{month}"))
11 |     fp.write(data["authors"])
12 |     newline(fp)
13 |     newline(fp)
14 |     fp.write(p(data["abstract"]))
15 |     newline(fp)
16 | 


--------------------------------------------------------------------------------
/Datasets/data.yaml:
--------------------------------------------------------------------------------
 1 | # DATASETS
 2 | ---
 3 | -
 4 |   name: Visual Wake Words Dataset
 5 |   url: https://arxiv.org/abs/1906.05721
 6 |   description: >
 7 |      Visual Wake Words represents a common microcontroller vision use-case of identifying whether a person is present in the image or not, and provides a realistic benchmark for tiny vision models. Within a limited memory footprint of 250 KB, several state-of-the-art mobile models achieve accuracy of 85-90% on the Visual Wake Words dataset.
 8 | 
 9 | ...
10 | 


--------------------------------------------------------------------------------
/Papers/renderer.py:
--------------------------------------------------------------------------------
 1 | from style import h3
 2 | from style import p
 3 | from style import a
 4 | from style import li
 5 | from style import newline
 6 | from utils import name2dir
 7 | 
 8 | 
 9 | # TODO extract Papers directory automatically
10 | def renderer(fp, data, config):
11 |     fp.write(h3(a([
12 |         data["name"],
13 |         config["url"] + "/tree/master/Papers/" + name2dir(data["name"]),
14 |     ])))
15 |     fp.write(p(data["description"]))
16 |     fp.write("\n")
17 | 
18 | 
19 | def renderer_subdir(fp, data, config):
20 |     li(fp, a([data["name"].strip(), data["url"]]) + ". " + data["authors"].strip())
21 |     newline(fp)
22 | 


--------------------------------------------------------------------------------
/Inference_Engines/renderer.py:
--------------------------------------------------------------------------------
 1 | from style import li
 2 | from style import h3
 3 | from style import a
 4 | 
 5 | 
 6 | # TODO image: platform(iOS, Android), 32, 16, 8 bits, gpu/cpu acceleration
 7 | # TODO link to companies
 8 | def renderer(fp, data, config):
 9 |     fp.write(h3(data["name"]))
10 | 
11 |     if data["sourcecode"] is not None:
12 |         li(fp, [
13 |             "Source code: ",
14 |             a(data["sourcecode"]),
15 |         ])
16 | 
17 |     if data["documentation"] is not None:
18 |         li(fp, [
19 |             "Documentation: ",
20 |             a(data["documentation"]),
21 |         ])
22 | 
23 |     li(fp, data["company"])
24 |     fp.write("\n")
25 | 


--------------------------------------------------------------------------------
/Books/renderer.py:
--------------------------------------------------------------------------------
 1 | from style import h3
 2 | from style import p
 3 | from style import a
 4 | from style import newline
 5 | from style import li
 6 | 
 7 | 
 8 | # TODO separate authors and add links
 9 | def renderer(fp, data, config):
10 |     title = data["title"] if data["subtitle"] is None else f"{data['title']}: {data['subtitle']}"
11 |     fp.write(h3(a([
12 |         title,
13 |         data["url"],
14 |     ])))
15 | 
16 |     # Authors
17 |     if len(data["authors"]) > 1:
18 |         author = "Authors: "
19 |     else:
20 |         author = "Author: "
21 | 
22 |     author += ", ".join(data["authors"])
23 |     li(fp, author)
24 | 
25 |     li(fp, f"Published: {data['published']}")
26 |     newline(fp)
27 | 


--------------------------------------------------------------------------------
/Challenges/data.yaml:
--------------------------------------------------------------------------------
 1 | # CHALLENGES
 2 | ---
 3 | 
 4 | -
 5 |   name: Low Power Recognition Challenge (LPIRC)
 6 |   url: https://rebootingcomputing.ieee.org/lpirc
 7 |   description: >
 8 |     Competition with focus on the best vision solutions that can simultaneously achieve high accuracy in computer vision and energy efficiency. LPIRC is regularly held during computer vision conferences (CVPR, ICCV and others) since 2015 and the winners’ solutions have already improved 24 times in the ratio of accuracy divided by energy.
 9 | 
10 | 
11 |     - [Online Track](https://rebootingcomputing.ieee.org/lpirc/online-track)
12 | 
13 | 
14 |     - [Onsite Track](https://rebootingcomputing.ieee.org/lpirc/onsite-track)
15 | 
16 | ...
17 | 


--------------------------------------------------------------------------------
/data.yaml:
--------------------------------------------------------------------------------
 1 | # TABLE OF CONTENTS
 2 | #
 3 | # `name` have to correspond to the name of directory.
 4 | # The space between words will be replaced with underscore _
 5 | #
 6 | # The order of content will be same as defined here in this file.
 7 | #
 8 | # [Template]
 9 | # -
10 | #   name:
11 | ---
12 | 
13 | -
14 |   name: Papers
15 | 
16 | -
17 |   name: Datasets
18 | 
19 | -
20 |   name: Inference Engines
21 | 
22 | -
23 |   name: MCU and MPU Software Packages
24 | 
25 | -
26 |   name: AI Chips
27 | 
28 | #-
29 | #    name: Labs
30 | # TODO bisonai inference time benchmark
31 | #-
32 | #    name: Benchmarks
33 | 
34 | -
35 |   name: Books
36 | 
37 | -
38 |   name: Challenges
39 | 
40 | #-
41 | #    name: Companies
42 | # -
43 | #   name: Meetups
44 | 
45 | -
46 |   name: Other Resources
47 | 
48 | -
49 |  name: Contribute
50 | 
51 | -
52 |   name: LicenseBlock
53 | 
54 | ...
55 | 


--------------------------------------------------------------------------------
/Other_Resources/data.yaml:
--------------------------------------------------------------------------------
 1 | # OTHER RESOURCES
 2 | #
 3 | # [Template]
 4 | #
 5 | # -
 6 | #   name:
 7 | #   url:
 8 | #   description:
 9 | ---
10 | 
11 | -
12 |   name: Awesome EMDL
13 |   url: https://github.com/EMDL/awesome-emdl
14 |   description: Embedded and mobile deep learning research resources
15 | 
16 | -
17 |   name: Awesome Pruning
18 |   url: https://github.com/he-y/Awesome-Pruning
19 |   description: A curated list of neural network pruning resources
20 | 
21 | -
22 |   name: Efficient DNNs
23 |   url: https://github.com/MingSun-Tse/EfficientDNNs
24 |   description: Collection of recent methods on DNN compression and acceleration
25 | 
26 | -
27 |   name: Machine Think
28 |   url: https://machinethink.net/
29 |   description: Machine learning tutorials targeted for iOS devices
30 | 
31 | -
32 |   name: Pete Warden's blog
33 |   url: https://petewarden.com/
34 |   description:
35 | 
36 | ...
37 | 


--------------------------------------------------------------------------------
/Books/data.yaml:
--------------------------------------------------------------------------------
 1 | # BOOKS
 2 | #
 3 | # [Template]
 4 | #
 5 | # -
 6 | #   title:
 7 | #   subtitle:
 8 | #   authors:
 9 | #     -
10 | #   url:
11 | #   published:
12 | ---
13 | 
14 | -
15 |   title: Building Mobile Applications with TensorFlow
16 |   subtitle:
17 |   authors:
18 |     - Pete Warden
19 |   url: https://www.oreilly.com/library/view/building-mobile-applications/9781491988435/
20 |   published: 2017
21 | 
22 | -
23 |   title: Machine Learning by Tutorials
24 |   subtitle: Beginning machine learning for Apple and iOS
25 |   authors:
26 |     - Matthijs Hollemans
27 |   url: https://store.raywenderlich.com/products/machine-learning-by-tutorials
28 |   published: 2019
29 | 
30 | -
31 |   title: Core ML Survival Guide
32 |   subtitle:
33 |   authors:
34 |     - Matthijs Hollemans
35 |   url: https://leanpub.com/coreml-survival-guide
36 |   published: 2018
37 | 
38 | -
39 |   title: TinyML
40 |   subtitle: Machine Learning with TensorFlow on Arduino, and Ultra-Low Power Micro-Controllers
41 |   authors:
42 |     - Pete Warden
43 |     - Daniel Situnayake
44 |   url: http://shop.oreilly.com/product/0636920254508.do
45 |   published: 2020
46 | 
47 | ...
48 | 


--------------------------------------------------------------------------------
/utils.py:
--------------------------------------------------------------------------------
 1 | from pathlib import Path
 2 | from typing import List
 3 | from typing import Dict
 4 | 
 5 | from yaml import load
 6 | from yaml import Loader
 7 | 
 8 | 
 9 | def parse_yaml(filepath: Path):
10 |     with open(filepath, "r") as f:
11 |         stream = "".join(f.readlines())
12 |         return load(stream, Loader=Loader)
13 | 
14 | 
15 | def name2dir(name: str):
16 |     return "_".join([s for s in name.split(" ")])
17 | 
18 | 
19 | def name2link(name: str):
20 |     """Used for hyperlink anchors"""
21 |     if not isinstance(name, str):
22 |         name = str(name)
23 | 
24 |     return "-".join([s.lower() for s in name.split(" ")])
25 | 
26 | 
27 | def dir2name(name: str):
28 |     if not isinstance(name, str):
29 |         name = str(name)
30 | 
31 |     # return " ".join([w[0].upper() + w[1:] for w in name.split("_")])
32 |     return " ".join([w for w in name.split("_")])
33 | 
34 | 
35 | def find_subdirectories(path: Path):
36 |     if not isinstance(path, Path):
37 |         path = Path(path)
38 | 
39 |     return sorted(list(filter(lambda f: f.is_dir() and f.name != "__pycache__", path.glob("*"))))
40 | 
41 | 
42 | def sort(
43 |         data: List[Dict],
44 |         sort_key: str,
45 |         sort_reverse: bool,
46 | ):
47 |     if sort_key is not None:
48 |         data = sorted(data, key=lambda k: k[sort_key], reverse=sort_reverse)
49 | 
50 |     return data
51 | 


--------------------------------------------------------------------------------
/style.py:
--------------------------------------------------------------------------------
 1 | from typing import List
 2 | 
 3 | 
 4 | def concatenate(text: List):
 5 |     return "".join(filter(lambda x: x is not None, text))
 6 | 
 7 | 
 8 | def li(fp, text):
 9 |     if isinstance(text, list):
10 |         text = concatenate(text)
11 | 
12 |     fp.write("- " + text + "\n")
13 | 
14 | 
15 | def lili(fp, text):
16 |     """Second level of list items"""
17 |     fp.write("\t")
18 |     li(fp, text)
19 | 
20 | 
21 | def h1(text):
22 |     return "# " + text + "\n"
23 | 
24 | 
25 | def h2(text):
26 |     return "## " + text + "\n"
27 | 
28 | 
29 | def h3(text):
30 |     return "### " + text + "\n"
31 | 
32 | 
33 | def h4(text):
34 |     return "#### " + text + "\n"
35 | 
36 | 
37 | def h5(text):
38 |     return "##### " + text + "\n"
39 | 
40 | 
41 | def h6(text):
42 |     return "###### " + text + "\n"
43 | 
44 | 
45 | def p(text):
46 |     if text is None:
47 |         return "\n"
48 |     else:
49 |         return str(text) + "\n"
50 | 
51 | 
52 | def a(args: List):
53 |     if not isinstance(args, list):
54 |         args = [args]
55 | 
56 |     if len(args) == 1:
57 |         src = args[0]
58 |         if src is None:
59 |             return ""
60 |         else:
61 |             return f"[{src}]({src})"
62 |     if len(args) == 2:
63 |         name = args[0]
64 |         src = args[1]
65 |         if name is None or src is None:
66 |             return ""
67 |         else:
68 |             return f"[{name}]({src})"
69 |     else:
70 |         raise NotImplementedError
71 | 
72 | 
73 | def newline(fp, iter=1):
74 |     for _ in range(iter):
75 |         fp.write("\n")
76 | 


--------------------------------------------------------------------------------
/Papers/data.yaml:
--------------------------------------------------------------------------------
 1 | # PAPERS
 2 | #
 3 | # [Template]
 4 | #
 5 | # -
 6 | #   name:
 7 | #   description:
 8 | ---
 9 | 
10 | - name: Applications
11 |   description: >
12 |     There is a countless number of possible edge machine learning applications. Here, we collect papers that describe specific solutions.
13 | 
14 | -
15 |   name: Federated Learning
16 |   description: >
17 |     Federated Learning enables mobile phones to collaboratively learn a shared prediction model while keeping all the training data on device, decoupling the ability to do machine learning from the need to store the data in the cloud.<sup><a href="https://ai.googleblog.com/2017/04/federated-learning-collaborative.html" target="_blank">Google AI blog: Federated Learning</a></sup>
18 | 
19 | -
20 |   name: Quantization
21 |   description: >
22 |     Quantization is the process of reducing a precision (from 32 bit floating point into lower bit depth representations) of weights and/or activations in a neural network. The advantages of this method are reduced model size and faster model inference on hardware that support arithmetic operations in lower precision.
23 | 
24 | -
25 |   name: Network Pruning
26 |   description: >
27 |     Pruning is a common method to derive a compact network – after training, some structural portion of the parameters is removed, along with its associated computations.<sup><a href="http://jankautz.com/publications/Importance4NNPruning_CVPR19.pdf" target="_blank">Importance Estimation for Neural Network Pruning</a></sup>
28 | 
29 | -
30 |   name: AutoML
31 |   description: >
32 |     Automated machine learning (AutoML) is the process of automating the end-to-end process of applying machine learning to real-world problems.<sup><a href="https://en.wikipedia.org/wiki/Automated_machine_learning" targe="_blank">Wikipedia</a></sup> AutoML is for example used to design new efficient neural architectures with a constraint on a computational budget (defined either as a number of FLOPS or as an inference time measured on real device) or a size of the architecture.
33 | 
34 | -
35 |   name: Efficient Architectures
36 |   description: >
37 |     Efficient architectures represent neural networks with small memory footprint and fast inference time when measured on edge devices.
38 | 
39 | - name: ML Algorithms For Edge
40 |   description: >
41 |     Standard machine learning algorithms are not always able to run on edge devices due to large computational requirements and space complexity. This section introduces optimized machine learning algorithms.
42 | 
43 | -
44 |   name: Others
45 |   description: >
46 |     This section contains papers that are related to edge machine learning but are not part of any major group. These papers often deal with deployment issues (i.e. optimizing inference on target platform).
47 | 
48 | ...
49 | 


--------------------------------------------------------------------------------
/Inference_Engines/data.yaml:
--------------------------------------------------------------------------------
  1 | # INFERENCE ENGINES
  2 | #
  3 | # [Template]
  4 | #
  5 | # -
  6 | #     name:
  7 | #     company:
  8 | #     sourcecode:
  9 | #     documentation:
 10 | #     platform:
 11 | #     gpu:
 12 | ---
 13 | 
 14 | -
 15 |     name: Arm Compute Library
 16 |     company: Arm
 17 |     sourcecode: https://github.com/ARM-software/ComputeLibrary
 18 |     documentation:
 19 |     platform:
 20 |     gpu:
 21 | 
 22 | -
 23 |     name: Qualcomm Neural Processing SDK for AI
 24 |     company: Qualcomm
 25 |     sourcecode: https://developer.qualcomm.com/software/qualcomm-neural-processing-sdk
 26 |     documentation:
 27 |     platform: Android
 28 |     gpu: true
 29 | 
 30 | -
 31 |     name: Embedded Learning Library
 32 |     company: Microsoft
 33 |     sourcecode: https://github.com/Microsoft/ELL
 34 |     documentation: https://microsoft.github.io/ELL
 35 |     platform: Raspberry Pi, Arduino, micro:bit
 36 |     gpu: false
 37 | 
 38 | -
 39 |     name: Bender
 40 |     company: Xmartlabs
 41 |     sourcecode: https://github.com/xmartlabs/Bender
 42 |     documentation: https://xmartlabs.github.io/Bender/
 43 |     platform: iOS
 44 |     gpu: true
 45 | 
 46 | -
 47 |     name: dabnn
 48 |     company: JDAI Computer Vision
 49 |     sourcecode: https://github.com/JDAI-CV/dabnn
 50 |     documentation:
 51 |     platform: Android
 52 |     gpu:
 53 | 
 54 | -
 55 |     name: Tengine
 56 |     company: OAID
 57 |     sourcecode: https://github.com/OAID/Tengine
 58 |     documentation:
 59 |     platform: Android
 60 |     gpu: true
 61 | 
 62 | -
 63 |     name: MACE
 64 |     company: XiaoMi
 65 |     sourcecode: https://github.com/XiaoMi/mace
 66 |     documentation: https://mace.readthedocs.io/
 67 |     platform: Android, iOS
 68 |     gpu:
 69 | 
 70 | -
 71 |     name: MNN
 72 |     company: Alibaba
 73 |     sourcecode: https://github.com/alibaba/MNN
 74 |     documentation:
 75 |     platform:
 76 |     gpu:
 77 | 
 78 | -
 79 |     name: Feather CNN
 80 |     company: Tencent
 81 |     sourcecode: https://github.com/Tencent/FeatherCNN
 82 |     documentation:
 83 |     platform:
 84 |     gpu:
 85 | 
 86 | -
 87 |     name: NCNN
 88 |     company: Tencent
 89 |     sourcecode: https://github.com/tencent/ncnn
 90 |     documentation:
 91 |     platform: iOS, Android
 92 |     gpu: true
 93 | 
 94 | -
 95 |     name: Paddle Mobile
 96 |     company: Baidu
 97 |     sourcecode: https://github.com/PaddlePaddle/paddle-mobile
 98 |     documentation:
 99 |     platform:
100 |     gpu:
101 | 
102 | -
103 |     name: MXNet
104 |     company: Amazon
105 |     sourcecode:
106 |     documentation: https://mxnet.incubator.apache.org/versions/master/faq/smart_device.html
107 |     platform:
108 |     gpu:
109 | 
110 | -
111 |     name: TensorFlow Lite
112 |     company: Google
113 |     sourcecode: https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite
114 |     documentation: https://www.tensorflow.org/lite/
115 |     platform: Android, iOS
116 |     gpu: true
117 | 
118 | -
119 |     name: Caffe 2
120 |     company: Facebook
121 |     sourcecode: https://github.com/pytorch/pytorch/tree/master/caffe2
122 |     documentation: https://caffe2.ai/
123 |     platform:
124 |     gpu:
125 | 
126 | -
127 |     name: CoreML
128 |     company: Apple
129 |     sourcecode:
130 |     documentation: https://developer.apple.com/documentation/coreml
131 |     platform: iOS
132 |     gpu: true
133 | 
134 | -
135 |     name: Neural Networks API
136 |     company: Google
137 |     sourcecode:
138 |     documentation: https://developer.android.com/ndk/guides/neuralnetworks/
139 |     platform:
140 |     gpu: true
141 | 
142 | -
143 |     name: Deeplearning4j
144 |     company: Skymind
145 |     sourcecode:
146 |     documentation: https://deeplearning4j.org/docs/latest/deeplearning4j-android
147 |     platform:
148 |     gpu:
149 | 
150 | ...
151 | 


--------------------------------------------------------------------------------
/MCU_and_MPU_Software_Packages/data.yaml:
--------------------------------------------------------------------------------
 1 | # MCU AND MPU SOFTWARE PACKAGES
 2 | #
 3 | # [Template]
 4 | #
 5 | # -
 6 | #     name:
 7 | #     description:
 8 | #     link:
 9 | #     company:
10 | ---
11 | 
12 | -
13 |     name: FP-AI-Sensing
14 |     description: STM32Cube function pack for ultra-low power IoT node with artificial intelligence (AI) application based on audio and motion sensing
15 |     link: https://www.st.com/content/st_com/en/products/embedded-software/mcu-mpu-embedded-software/stm32-embedded-software/stm32-ode-function-pack-sw/fp-ai-sensing1.html
16 |     company: STMicroelectronics
17 | 
18 | -
19 |     name: FP-AI-VISION1
20 |     description: FP-AI-VISION1 is an STM32Cube function pack featuring examples of computer vision applications based on Convolutional Neural Network (CNN)
21 |     link: https://www.st.com/content/st_com/en/products/embedded-software/mcu-mpu-embedded-software/stm32-embedded-software/stm32cube-expansion-packages/fp-ai-vision1.html
22 |     company: STMicroelectronics
23 | 
24 | -
25 |     name: X-LINUX-AI-CV
26 |     description: X-LINUX-AI-CV is an STM32 MPU OpenSTLinux Expansion Package that targets Artificial Intelligence for computer vision applications based on Convolutional Neural Network (CNN)
27 |     link: https://www.st.com/content/st_com/en/products/embedded-software/mcu-mpu-embedded-software/stm32-embedded-software/stm32-mpu-openstlinux-expansion-packages/x-linux-ai-cv.html
28 |     company: STMicroelectronics
29 | 
30 | -
31 |     name: e-AI Translator
32 |     description: Tool for converting  Caffe and TensorFlow models to MCU/MPU development environment
33 |     link: https://www.renesas.com/jp/en/solutions/key-technology/e-ai/tool.html
34 |     company: Renesas
35 | 
36 | -
37 |     name: e-AI Checker
38 |     description: Based on the output result from the translator, the ROM/RAM mounting size and the inference execution processing time are calculated while referring to the information of the selected MCU/MPU
39 |     link: https://www.renesas.com/jp/en/solutions/key-technology/e-ai/tool.html
40 |     company: Renesas
41 | 
42 | -
43 |     name: Processor SDK Linux for AM57x
44 |     description: TIDL software framework leverages a highly optimized neural network implementation on TI’s Sitara AM57x processors, making use of hardware acceleration on the device
45 |     link: http://www.ti.com/tool/SITARA-MACHINE-LEARNING
46 |     company: Texas Instruments
47 | 
48 | -
49 |     name: eIQ ML Software Development Environment
50 |     description: The NXP® eIQ™ machine learning software development environment enables the use of ML algorithms on NXP MCUs, i.MX RT crossover MCUs, and i.MX family SoCs. eIQ software includes inference engines, neural network compilers and optimized libraries
51 |     link: https://www.nxp.com/design/software/development-software/eiq-ml-development-environment:EIQ
52 |     company: NXP
53 | 
54 | -
55 |     name: eIQ Auto deep learning (DL) toolkit
56 |     description: The NXP eIQ™ Auto deep learning (DL) toolkit enables developers to introduce DL algorithms into their applications and to continue satisfying automotive standards
57 |     link: https://www.nxp.com/design/software/development-software/eiq-auto-dl-toolkit:EIQ-AUTO
58 |     company: NXP
59 | 
60 | -
61 |     name: eIQ™ Software for Arm® NN Inference Engine
62 |     description:
63 |     link: https://www.nxp.com/design/software/development-software/eiq-ml-development-environment/eiq-software-for-arm-nn-inference-engine:eIQArmNN
64 |     company: NXP
65 | 
66 | -
67 |     name: eIQ™ for TensorFlow Lite
68 |     description:
69 |     link: https://www.nxp.com/design/software/development-software/eiq-ml-development-environment/eiq-for-tensorflow-lite:eIQTensorFlowLite
70 |     company: NXP
71 | 
72 | -
73 |     name: eIQ™ for Glow Neural Network Compiler
74 |     description:
75 |     link: https://www.nxp.com/design/software/development-software/eiq-ml-development-environment/eiq-for-glow-neural-network-compiler:eIQ-Glow
76 |     company: NXP
77 | 
78 | -
79 |     name: eIQ™ for Arm® CMSIS-NN
80 |     description:
81 |     link: https://www.nxp.com/design/software/development-software/eiq-ml-development-environment/eiq-for-arm-cmsis-nn:eIQArmCMSISNN
82 |     company: NXP
83 | 


--------------------------------------------------------------------------------
/Papers/Federated_Learning/README.md:
--------------------------------------------------------------------------------
 1 | # Federated Learning
 2 | [Back to awesome edge machine learning](https://github.com/bisonai/awesome-edge-machine-learning)
 3 | 
 4 | [Back to Papers](https://github.com/bisonai/awesome-edge-machine-learning/tree/master/Papers)
 5 | 
 6 | Federated Learning enables mobile phones to collaboratively learn a shared prediction model while keeping all the training data on device, decoupling the ability to do machine learning from the need to store the data in the cloud.<sup><a href="https://ai.googleblog.com/2017/04/federated-learning-collaborative.html" target="_blank">Google AI blog: Federated Learning</a></sup>
 7 | 
 8 | 
 9 | ## [Towards Federated Learning at Scale: System Design](https://arxiv.org/abs/1902.01046), 2019/02
10 | Keith Bonawitz, Hubert Eichner, Wolfgang Grieskamp, Dzmitry Huba, Alex Ingerman, Vladimir Ivanov, Chloe Kiddon, Jakub Konečný, Stefano Mazzocchi, H. Brendan McMahan, Timon Van Overveldt, David Petrou, Daniel Ramage, Jason Roselander
11 | 
12 | Federated Learning is a distributed machine learning approach which enables model training on a large corpus of decentralized data. We have built a scalable production system for Federated Learning in the domain of mobile devices, based on TensorFlow. In this paper, we describe the resulting high-level design, sketch some of the challenges and their solutions, and touch upon the open problems and future directions.
13 | 
14 | 
15 | ## [Adaptive Federated Learning in Resource Constrained Edge Computing Systems](https://arxiv.org/abs/1804.05271), 2018/04
16 | Shiqiang Wang, Tiffany Tuor, Theodoros Salonidis, Kin K. Leung, Christian Makaya, Ting He, Kevin Chan
17 | 
18 | Emerging technologies and applications including Internet of Things (IoT), social networking, and crowd-sourcing generate large amounts of data at the network edge. Machine learning models are often built from the collected data, to enable the detection, classification, and prediction of future events. Due to bandwidth, storage, and privacy concerns, it is often impractical to send all the data to a centralized location. In this paper, we consider the problem of learning model parameters from data distributed across multiple edge nodes, without sending raw data to a centralized place. Our focus is on a generic class of machine learning models that are trained using gradient-descent based approaches. We analyze the convergence bound of distributed gradient descent from a theoretical point of view, based on which we propose a control algorithm that determines the best trade-off between local update and global parameter aggregation to minimize the loss function under a given resource budget. The performance of the proposed algorithm is evaluated via extensive experiments with real datasets, both on a networked prototype system and in a larger-scale simulated environment. The experimentation results show that our proposed approach performs near to the optimum with various machine learning models and different data distributions.
19 | 
20 | 
21 | ## [Communication-Efficient Learning of Deep Networks from Decentralized Data](https://arxiv.org/abs/1602.05629), 2016/02
22 | H. Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, Blaise Agüera y Arcas
23 | 
24 | Modern mobile devices have access to a wealth of data suitable for learning models, which in turn can greatly improve the user experience on the device. For example, language models can improve speech recognition and text entry, and image models can automatically select good photos. However, this rich data is often privacy sensitive, large in quantity, or both, which may preclude logging to the data center and training there using conventional approaches. We advocate an alternative that leaves the training data distributed on the mobile devices, and learns a shared model by aggregating locally-computed updates. We term this decentralized approach Federated Learning. We present a practical method for the federated learning of deep networks based on iterative model averaging, and conduct an extensive empirical evaluation, considering five different model architectures and four datasets. These experiments demonstrate the approach is robust to the unbalanced and non-IID data distributions that are a defining characteristic of this setting. Communication costs are the principal constraint, and we show a reduction in required communication rounds by 10-100x as compared to synchronized stochastic gradient descent.
25 | 
26 | 
27 | 


--------------------------------------------------------------------------------
/Papers/Federated_Learning/data.yaml:
--------------------------------------------------------------------------------
 1 | # FEDERATED LEARNING
 2 | #
 3 | # `date` format is as following yyy/mm/dd. For example 6 May 2019 would be 2019/05/06.
 4 | # In case of arxiv, use the date of the first version of paper.
 5 | #
 6 | # [Template]
 7 | #
 8 | # -
 9 | #   name:
10 | #   url:
11 | #   date:
12 | #   conference:
13 | #   code:
14 | #   authors:
15 | #   abstract:
16 | ---
17 | 
18 | -
19 |   name: >
20 |     Communication-Efficient Learning of Deep Networks from Decentralized Data
21 |   url: https://arxiv.org/abs/1602.05629
22 |   date: 2016/02/17
23 |   conference: AISTATS 2017
24 |   code:
25 |   authors: H. Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, Blaise Agüera y Arcas
26 |   abstract: >
27 |     Modern mobile devices have access to a wealth of data suitable for learning models, which in turn can greatly improve the user experience on the device. For example, language models can improve speech recognition and text entry, and image models can automatically select good photos. However, this rich data is often privacy sensitive, large in quantity, or both, which may preclude logging to the data center and training there using conventional approaches. We advocate an alternative that leaves the training data distributed on the mobile devices, and learns a shared model by aggregating locally-computed updates. We term this decentralized approach Federated Learning.
28 |     We present a practical method for the federated learning of deep networks based on iterative model averaging, and conduct an extensive empirical evaluation, considering five different model architectures and four datasets. These experiments demonstrate the approach is robust to the unbalanced and non-IID data distributions that are a defining characteristic of this setting. Communication costs are the principal constraint, and we show a reduction in required communication rounds by 10-100x as compared to synchronized stochastic gradient descent.
29 | 
30 | -
31 |   name: >
32 |     Towards Federated Learning at Scale: System Design
33 |   url: https://arxiv.org/abs/1902.01046
34 |   date: 2019/02/04
35 |   conference:
36 |   code:
37 |   authors: Keith Bonawitz, Hubert Eichner, Wolfgang Grieskamp, Dzmitry Huba, Alex Ingerman, Vladimir Ivanov, Chloe Kiddon, Jakub Konečný, Stefano Mazzocchi, H. Brendan McMahan, Timon Van Overveldt, David Petrou, Daniel Ramage, Jason Roselander
38 |   abstract: >
39 |     Federated Learning is a distributed machine learning approach which enables model training on a large corpus of decentralized data. We have built a scalable production system for Federated Learning in the domain of mobile devices, based on TensorFlow. In this paper, we describe the resulting high-level design, sketch some of the challenges and their solutions, and touch upon the open problems and future directions.
40 | 
41 | -
42 |   name: >
43 |     Adaptive Federated Learning in Resource Constrained Edge Computing Systems
44 |   url: https://arxiv.org/abs/1804.05271
45 |   date: 2018/04/14
46 |   conference: IEEE Journal on Selected Areas in Communications
47 |   code:
48 |   authors: Shiqiang Wang, Tiffany Tuor, Theodoros Salonidis, Kin K. Leung, Christian Makaya, Ting He, Kevin Chan
49 |   abstract: >
50 |     Emerging technologies and applications including Internet of Things (IoT), social networking, and crowd-sourcing generate large amounts of data at the network edge. Machine learning models are often built from the collected data, to enable the detection, classification, and prediction of future events. Due to bandwidth, storage, and privacy concerns, it is often impractical to send all the data to a centralized location. In this paper, we consider the problem of learning model parameters from data distributed across multiple edge nodes, without sending raw data to a centralized place. Our focus is on a generic class of machine learning models that are trained using gradient-descent based approaches. We analyze the convergence bound of distributed gradient descent from a theoretical point of view, based on which we propose a control algorithm that determines the best trade-off between local update and global parameter aggregation to minimize the loss function under a given resource budget. The performance of the proposed algorithm is evaluated via extensive experiments with real datasets, both on a networked prototype system and in a larger-scale simulated environment. The experimentation results show that our proposed approach performs near to the optimum with various machine learning models and different data distributions.
51 | ...
52 | 


--------------------------------------------------------------------------------
/awesome.py:
--------------------------------------------------------------------------------
  1 | # AWESOME EDGE MACHINE LEARNING
  2 | # Bisonai, 2019
  3 | from pathlib import Path
  4 | 
  5 | from style import li
  6 | from style import lili
  7 | from style import h1
  8 | from style import h2
  9 | from style import a
 10 | from style import p
 11 | from style import newline
 12 | from utils import parse_yaml
 13 | from utils import name2dir
 14 | from utils import dir2name
 15 | from utils import name2link
 16 | from utils import find_subdirectories
 17 | from utils import sort
 18 | 
 19 | # TODO conference badges
 20 | 
 21 | config = parse_yaml("config.yaml")
 22 | f = open(config["filename"], "w")
 23 | 
 24 | 
 25 | # Introduction ########################################################
 26 | f.write(h1(config["title"]))
 27 | for badge in config["badge"]:
 28 |     f.write(badge)
 29 |     newline(f)
 30 | 
 31 | newline(f)
 32 | f.write(config["description"])
 33 | newline(f, iter=2)
 34 | 
 35 | 
 36 | # Table of Contents ###################################################
 37 | f.write(h2("Table of Contents"))
 38 | table_of_contents = parse_yaml("data.yaml")
 39 | default_level = 1
 40 | max_level = config.get("max_level", default_level)
 41 | level = default_level
 42 | for tol in table_of_contents:
 43 |     li(f, a([
 44 |         tol["name"],
 45 |         config["url"] + "#" + name2link(tol["name"]),
 46 |     ]))
 47 | 
 48 |     # Deeper levels in table of contents
 49 |     while True:
 50 |         if level < max_level:
 51 |             level += 1
 52 |             sub_table_of_contents = find_subdirectories(name2dir(tol["name"]))
 53 |             for s in sub_table_of_contents:
 54 |                 lili(f, a([
 55 |                     dir2name(s.name),
 56 |                     config["url"] + "/tree/master/" + str(s),
 57 |                 ]))
 58 |         else:
 59 |             level = default_level
 60 |             break
 61 | 
 62 | newline(f)
 63 | 
 64 | 
 65 | # Main Content ########################################################
 66 | for tol in table_of_contents:
 67 |     f.write(h2(tol["name"]))
 68 | 
 69 |     datafile = Path(name2dir(tol["name"]))
 70 |     if not datafile.is_dir():
 71 |         print(f"You must create directory for {tol['name']} and populate it with data.yaml, config.yaml and renderer.py files.")
 72 |         continue
 73 | 
 74 |     data = parse_yaml(datafile / "data.yaml")
 75 |     config_local = parse_yaml(datafile / "config.yaml")
 76 | 
 77 |     # Section description
 78 |     description = config_local.get("description", None)
 79 |     if description is not None:
 80 |         f.write(p(description))
 81 |         newline(f)
 82 | 
 83 |     # Sort content of section
 84 |     sort_key = config_local.get("sort_key", None)
 85 |     data = sort(data, sort_key, config_local.get("sort_reverse", False))
 86 | 
 87 |     exec(f"from {datafile}.renderer import renderer")
 88 | 
 89 |     try:
 90 |         exec(f"from {datafile}.renderer import renderer_subdir")
 91 |         # e.g. content of Papers / README.md
 92 |         fp_sub2 = open(str(Path(tol["name"]) / "README.md"), "w")
 93 |         fp_sub2.write(h1(tol["name"]))
 94 |         fp_sub2.write(a(["Back to awesome edge machine learning", config["url"]]))
 95 |         newline(fp_sub2, iter=2)
 96 |     except:
 97 |         pass
 98 | 
 99 |     if not isinstance(data, list):
100 |         data = [data]
101 |     for d in data:
102 |         renderer(f, d, config)
103 | 
104 |     subdirs = find_subdirectories(datafile)
105 |     for idx, sub in enumerate(subdirs):
106 |         # e.g. content of Papers / AutoML / README.md
107 |         data_sub = parse_yaml(sub / "data.yaml")
108 |         config_sub = parse_yaml(sub / "config.yaml")
109 |         fp_sub = open(sub / "README.md", "w")
110 | 
111 |         fp_sub.write(h1(dir2name(sub.name)))
112 |         fp_sub.write(a(["Back to awesome edge machine learning", config["url"]]))
113 |         newline(fp_sub, iter=2)
114 |         fp_sub.write(a([f"Back to {datafile}", config["url"] + f"/tree/master/{datafile}"]))
115 |         newline(fp_sub, iter=2)
116 |         fp_sub.write(data[idx]["description"])
117 |         newline(fp_sub, iter=2)
118 | 
119 |         exec(f"from {str(sub).replace('/', '.')}.renderer import renderer")
120 | 
121 |         try:
122 |             fp_sub2.write(h2(dir2name(sub.name)))
123 |             newline(fp_sub2)
124 |             fp_sub2.write((data[idx]["description"]))
125 |             newline(fp_sub2)
126 |         except:
127 |             pass
128 | 
129 |         if config_sub is not None:
130 |             sort_key = config_sub.get("sort_key", None)
131 |             data_sub = sort(data_sub, sort_key, config_sub.get("sort_reverse", False))
132 |             for d in data_sub:
133 |                 renderer(fp_sub, d, config)
134 |                 try:
135 |                     renderer_subdir(fp_sub2, d, config)
136 |                 except:
137 |                     pass
138 |         fp_sub.close()
139 | 


--------------------------------------------------------------------------------
/Papers/ML_Algorithms_For_Edge/README.md:
--------------------------------------------------------------------------------
 1 | # ML Algorithms For Edge
 2 | [Back to awesome edge machine learning](https://github.com/bisonai/awesome-edge-machine-learning)
 3 | 
 4 | [Back to Papers](https://github.com/bisonai/awesome-edge-machine-learning/tree/master/Papers)
 5 | 
 6 | Standard machine learning algorithms are not always able to run on edge devices due to large computational requirements and space complexity. This section introduces optimized machine learning algorithms.
 7 | 
 8 | 
 9 | ## [Shallow RNNs: A Method for Accurate Time-series Classification on Tiny Devices](https://dkdennis.xyz/static/sharnn-neurips19-paper.pdf), 2019/12
10 | Don Dennis, Durmus Alp Emre Acar, Vikram Mandikal, Vinu Sankar Sadasivan, Harsha Vardhan Simhadri, Venkatesh Saligrama, Prateek Jain
11 | 
12 | Recurrent Neural Networks (RNNs) capture long dependencies and context, and hence are the key component of typical sequential data based tasks. However, the sequential nature of RNNs dictates a large inference cost for long sequences even if the hardware supports parallelization. To induce long-term dependencies, and yet admit parallelization, we introduce novel shallow RNNs. In this architecture, the first layer splits the input sequence and runs several independent RNNs. The second layer consumes the output of the first layer using a second RNN thus capturing long dependencies. We provide theoretical justification for our architecture under weak assumptions that we verify on real-world benchmarks. Furthermore, we show that for time-series classification, our technique leads to substantially improved inference time over standard RNNs without compromising accuracy. For example, we can deploy audio-keyword classification on tiny Cortex M4 devices (100MHz processor, 256KB RAM, no DSP available) which was not possible using standard RNN models. Similarly, using ShaRNN in the popular Listen-Attend-Spell (LAS) architecture for phoneme classification [4], we can reduce the lag in phoneme classification by 10-12x while maintaining state-of-the-art accuracy.
13 | 
14 | 
15 | ## [ProtoNN: Compressed and Accurate kNN for Resource-scarce Devices](http://proceedings.mlr.press/v70/gupta17a.html), 2017/08
16 | Chirag Gupta, Arun Sai Suggala, Ankit Goyal, Harsha Vardhan Simhadri, Bhargavi Paranjape, Ashish Kumar, Saurabh Goyal, Raghavendra Udupa, Manik Varma, Prateek Jain
17 | 
18 | Several real-world applications require real-time prediction on resource-scarce devices such as an Internet of Things (IoT) sensor. Such applications demand prediction models with small storage and computational complexity that do not compromise significantly on accuracy. In this work, we propose ProtoNN, a novel algorithm that addresses the problem of real-time and accurate prediction on resource-scarce devices. ProtoNN is inspired by k-Nearest Neighbor (KNN) but has several orders lower storage and prediction complexity. ProtoNN models can be deployed even on devices with puny storage and computational power (e.g. an Arduino UNO with 2kB RAM) to get excellent prediction accuracy. ProtoNN derives its strength from three key ideas: a) learning a small number of prototypes to represent the entire training set, b) sparse low dimensional projection of data, c) joint discriminative learning of the projection and prototypes with explicit model size constraint. We conduct systematic empirical evaluation of ProtoNN on a variety of supervised learning tasks (binary, multi-class, multi-label classification) and show that it gives nearly state-of-the-art prediction accuracy on resource-scarce devices while consuming several orders lower storage, and using minimal working memory.
19 | 
20 | 
21 | ## [Resource-efficient Machine Learning in 2 KB RAM for the Internet of Things](http://proceedings.mlr.press/v70/kumar17a.html), 2017/08
22 | Ashish Kumar, Saurabh Goyal, Manik Varma
23 | 
24 | This paper develops a novel tree-based algorithm, called Bonsai, for efficient prediction on IoT devices – such as those based on the Arduino Uno board having an 8 bit ATmega328P microcontroller operating at 16 MHz with no native floating point support, 2 KB RAM and 32 KB read-only flash. Bonsai maintains prediction accuracy while minimizing model size and prediction costs by: (a) developing a tree model which learns a single, shallow, sparse tree with powerful nodes; (b) sparsely projecting all data into a low-dimensional space in which the tree is learnt; and (c) jointly learning all tree and projection parameters. Experimental results on multiple benchmark datasets demonstrate that Bonsai can make predictions in milliseconds even on slow microcontrollers, can fit in KB of memory, has lower battery consumption than all other algorithms while achieving prediction accuracies that can be as much as 30\% higher than state-of-the-art methods for resource-efficient machine learning. Bonsai is also shown to generalize to other resource constrained settings beyond IoT by generating significantly better search results as compared to Bing’s L3 ranker when the model size is restricted to 300 bytes. Bonsai’s code can be downloaded from [http://www.manikvarma.org/code/Bonsai/download.html](http://www.manikvarma.org/code/Bonsai/download.html).
25 | 
26 | 
27 | 


--------------------------------------------------------------------------------
/Papers/ML_Algorithms_For_Edge/data.yaml:
--------------------------------------------------------------------------------
 1 | # ML ALGORITHMS FOR EDGE
 2 | #
 3 | # `date` format is as following yyy/mm/dd. For example 6 May 2019 would be 2019/05/06.
 4 | # In case of arxiv, use the date of the first version of paper.
 5 | #
 6 | # [Template]
 7 | #
 8 | # -
 9 | #   name:
10 | #   url:
11 | #   date:
12 | #   conference:
13 | #   code:
14 | #   authors:
15 | #   abstract:
16 | ---
17 | 
18 | -
19 |   name: >
20 |     ProtoNN: Compressed and Accurate kNN for Resource-scarce Devices
21 |   url: http://proceedings.mlr.press/v70/gupta17a.html
22 |   date: 2017/08/06
23 |   conference: ICML 2017
24 |   code: https://github.com/microsoft/EdgeML/tree/master/cpp/src/ProtoNN
25 |   authors: Chirag Gupta, Arun Sai Suggala, Ankit Goyal, Harsha Vardhan Simhadri, Bhargavi Paranjape, Ashish Kumar, Saurabh Goyal, Raghavendra Udupa, Manik Varma, Prateek Jain
26 |   abstract: >
27 |     Several real-world applications require real-time prediction on resource-scarce devices such as an Internet of Things (IoT) sensor. Such applications demand prediction models with small storage and computational complexity that do not compromise significantly on accuracy. In this work, we propose ProtoNN, a novel algorithm that addresses the problem of real-time and accurate prediction on resource-scarce devices. ProtoNN is inspired by k-Nearest Neighbor (KNN) but has several orders lower storage and prediction complexity. ProtoNN models can be deployed even on devices with puny storage and computational power (e.g. an Arduino UNO with 2kB RAM) to get excellent prediction accuracy. ProtoNN derives its strength from three key ideas: a) learning a small number of prototypes to represent the entire training set, b) sparse low dimensional projection of data, c) joint discriminative learning of the projection and prototypes with explicit model size constraint. We conduct systematic empirical evaluation of ProtoNN on a variety of supervised learning tasks (binary, multi-class, multi-label classification) and show that it gives nearly state-of-the-art prediction accuracy on resource-scarce devices while consuming several orders lower storage, and using minimal working memory.
28 | 
29 | -
30 |   name: >
31 |     Resource-efficient Machine Learning in 2 KB RAM for the Internet of Things
32 |   url: http://proceedings.mlr.press/v70/kumar17a.html
33 |   date: 2017/08/06
34 |   conference: ICML 2017
35 |   code: https://github.com/microsoft/EdgeML/tree/master/cpp/src/Bonsai
36 |   authors: Ashish Kumar, Saurabh Goyal, Manik Varma
37 |   abstract: >
38 |     This paper develops a novel tree-based algorithm, called Bonsai, for efficient prediction on IoT devices – such as those based on the Arduino Uno board having an 8 bit ATmega328P microcontroller operating at 16 MHz with no native floating point support, 2 KB RAM and 32 KB read-only flash. Bonsai maintains prediction accuracy while minimizing model size and prediction costs by: (a) developing a tree model which learns a single, shallow, sparse tree with powerful nodes; (b) sparsely projecting all data into a low-dimensional space in which the tree is learnt; and (c) jointly learning all tree and projection parameters. Experimental results on multiple benchmark datasets demonstrate that Bonsai can make predictions in milliseconds even on slow microcontrollers, can fit in KB of memory, has lower battery consumption than all other algorithms while achieving prediction accuracies that can be as much as 30\% higher than state-of-the-art methods for resource-efficient machine learning. Bonsai is also shown to generalize to other resource constrained settings beyond IoT by generating significantly better search results as compared to Bing’s L3 ranker when the model size is restricted to 300 bytes. Bonsai’s code can be downloaded from [http://www.manikvarma.org/code/Bonsai/download.html](http://www.manikvarma.org/code/Bonsai/download.html).
39 | 
40 | -
41 |   name: >
42 |     Shallow RNNs: A Method for Accurate Time-series Classification on Tiny Devices
43 |   url: https://dkdennis.xyz/static/sharnn-neurips19-paper.pdf
44 |   date: 2019/12/09
45 |   conference: NeurIPS 2019
46 |   code: https://github.com/Microsoft/EdgeML/
47 |   authors: Don Dennis, Durmus Alp Emre Acar, Vikram Mandikal, Vinu Sankar Sadasivan, Harsha Vardhan Simhadri, Venkatesh Saligrama, Prateek Jain
48 |   abstract: >
49 |     Recurrent Neural Networks (RNNs) capture long dependencies and context, and hence are the key component of typical sequential data based tasks. However, the sequential nature of RNNs dictates a large inference cost for long sequences even if the hardware supports parallelization. To induce long-term dependencies, and yet admit parallelization, we introduce novel shallow RNNs. In this architecture, the first layer splits the input sequence and runs several independent RNNs. The second layer consumes the output of the first layer using a second RNN thus capturing long dependencies. We provide theoretical justification for our architecture under weak assumptions that we verify on real-world benchmarks. Furthermore, we show that for time-series classification, our technique leads to substantially improved inference time over standard RNNs without compromising accuracy. For example, we can deploy audio-keyword classification on tiny Cortex M4 devices (100MHz processor, 256KB RAM, no DSP available) which was not possible using standard RNN models. Similarly, using ShaRNN in the popular Listen-Attend-Spell (LAS) architecture for phoneme classification [4], we can reduce the lag in phoneme classification by 10-12x while maintaining state-of-the-art accuracy.
50 | 
51 | ...
52 | 


--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
  1 | 2019 Bisonai Authors
  2 | 
  3 | Creative Commons Legal Code
  4 | 
  5 | CC0 1.0 Universal
  6 | 
  7 |     CREATIVE COMMONS CORPORATION IS NOT A LAW FIRM AND DOES NOT PROVIDE
  8 |     LEGAL SERVICES. DISTRIBUTION OF THIS DOCUMENT DOES NOT CREATE AN
  9 |     ATTORNEY-CLIENT RELATIONSHIP. CREATIVE COMMONS PROVIDES THIS
 10 |     INFORMATION ON AN "AS-IS" BASIS. CREATIVE COMMONS MAKES NO WARRANTIES
 11 |     REGARDING THE USE OF THIS DOCUMENT OR THE INFORMATION OR WORKS
 12 |     PROVIDED HEREUNDER, AND DISCLAIMS LIABILITY FOR DAMAGES RESULTING FROM
 13 |     THE USE OF THIS DOCUMENT OR THE INFORMATION OR WORKS PROVIDED
 14 |     HEREUNDER.
 15 | 
 16 | Statement of Purpose
 17 | 
 18 | The laws of most jurisdictions throughout the world automatically confer
 19 | exclusive Copyright and Related Rights (defined below) upon the creator
 20 | and subsequent owner(s) (each and all, an "owner") of an original work of
 21 | authorship and/or a database (each, a "Work").
 22 | 
 23 | Certain owners wish to permanently relinquish those rights to a Work for
 24 | the purpose of contributing to a commons of creative, cultural and
 25 | scientific works ("Commons") that the public can reliably and without fear
 26 | of later claims of infringement build upon, modify, incorporate in other
 27 | works, reuse and redistribute as freely as possible in any form whatsoever
 28 | and for any purposes, including without limitation commercial purposes.
 29 | These owners may contribute to the Commons to promote the ideal of a free
 30 | culture and the further production of creative, cultural and scientific
 31 | works, or to gain reputation or greater distribution for their Work in
 32 | part through the use and efforts of others.
 33 | 
 34 | For these and/or other purposes and motivations, and without any
 35 | expectation of additional consideration or compensation, the person
 36 | associating CC0 with a Work (the "Affirmer"), to the extent that he or she
 37 | is an owner of Copyright and Related Rights in the Work, voluntarily
 38 | elects to apply CC0 to the Work and publicly distribute the Work under its
 39 | terms, with knowledge of his or her Copyright and Related Rights in the
 40 | Work and the meaning and intended legal effect of CC0 on those rights.
 41 | 
 42 | 1. Copyright and Related Rights. A Work made available under CC0 may be
 43 | protected by copyright and related or neighboring rights ("Copyright and
 44 | Related Rights"). Copyright and Related Rights include, but are not
 45 | limited to, the following:
 46 | 
 47 |   i. the right to reproduce, adapt, distribute, perform, display,
 48 |      communicate, and translate a Work;
 49 |  ii. moral rights retained by the original author(s) and/or performer(s);
 50 | iii. publicity and privacy rights pertaining to a person's image or
 51 |      likeness depicted in a Work;
 52 |  iv. rights protecting against unfair competition in regards to a Work,
 53 |      subject to the limitations in paragraph 4(a), below;
 54 |   v. rights protecting the extraction, dissemination, use and reuse of data
 55 |      in a Work;
 56 |  vi. database rights (such as those arising under Directive 96/9/EC of the
 57 |      European Parliament and of the Council of 11 March 1996 on the legal
 58 |      protection of databases, and under any national implementation
 59 |      thereof, including any amended or successor version of such
 60 |      directive); and
 61 | vii. other similar, equivalent or corresponding rights throughout the
 62 |      world based on applicable law or treaty, and any national
 63 |      implementations thereof.
 64 | 
 65 | 2. Waiver. To the greatest extent permitted by, but not in contravention
 66 | of, applicable law, Affirmer hereby overtly, fully, permanently,
 67 | irrevocably and unconditionally waives, abandons, and surrenders all of
 68 | Affirmer's Copyright and Related Rights and associated claims and causes
 69 | of action, whether now known or unknown (including existing as well as
 70 | future claims and causes of action), in the Work (i) in all territories
 71 | worldwide, (ii) for the maximum duration provided by applicable law or
 72 | treaty (including future time extensions), (iii) in any current or future
 73 | medium and for any number of copies, and (iv) for any purpose whatsoever,
 74 | including without limitation commercial, advertising or promotional
 75 | purposes (the "Waiver"). Affirmer makes the Waiver for the benefit of each
 76 | member of the public at large and to the detriment of Affirmer's heirs and
 77 | successors, fully intending that such Waiver shall not be subject to
 78 | revocation, rescission, cancellation, termination, or any other legal or
 79 | equitable action to disrupt the quiet enjoyment of the Work by the public
 80 | as contemplated by Affirmer's express Statement of Purpose.
 81 | 
 82 | 3. Public License Fallback. Should any part of the Waiver for any reason
 83 | be judged legally invalid or ineffective under applicable law, then the
 84 | Waiver shall be preserved to the maximum extent permitted taking into
 85 | account Affirmer's express Statement of Purpose. In addition, to the
 86 | extent the Waiver is so judged Affirmer hereby grants to each affected
 87 | person a royalty-free, non transferable, non sublicensable, non exclusive,
 88 | irrevocable and unconditional license to exercise Affirmer's Copyright and
 89 | Related Rights in the Work (i) in all territories worldwide, (ii) for the
 90 | maximum duration provided by applicable law or treaty (including future
 91 | time extensions), (iii) in any current or future medium and for any number
 92 | of copies, and (iv) for any purpose whatsoever, including without
 93 | limitation commercial, advertising or promotional purposes (the
 94 | "License"). The License shall be deemed effective as of the date CC0 was
 95 | applied by Affirmer to the Work. Should any part of the License for any
 96 | reason be judged legally invalid or ineffective under applicable law, such
 97 | partial invalidity or ineffectiveness shall not invalidate the remainder
 98 | of the License, and in such case Affirmer hereby affirms that he or she
 99 | will not (i) exercise any of his or her remaining Copyright and Related
100 | Rights in the Work or (ii) assert any associated claims and causes of
101 | action with respect to the Work, in either case contrary to Affirmer's
102 | express Statement of Purpose.
103 | 
104 | 4. Limitations and Disclaimers.
105 | 
106 |  a. No trademark or patent rights held by Affirmer are waived, abandoned,
107 |     surrendered, licensed or otherwise affected by this document.
108 |  b. Affirmer offers the Work as-is and makes no representations or
109 |     warranties of any kind concerning the Work, express, implied,
110 |     statutory or otherwise, including without limitation warranties of
111 |     title, merchantability, fitness for a particular purpose, non
112 |     infringement, or the absence of latent or other defects, accuracy, or
113 |     the present or absence of errors, whether or not discoverable, all to
114 |     the greatest extent permissible under applicable law.
115 |  c. Affirmer disclaims responsibility for clearing rights of other persons
116 |     that may apply to the Work or any use thereof, including without
117 |     limitation any person's Copyright and Related Rights in the Work.
118 |     Further, Affirmer disclaims responsibility for obtaining any necessary
119 |     consents, permissions or other rights required for any use of the
120 |     Work.
121 |  d. Affirmer understands and acknowledges that Creative Commons is not a
122 |     party to this document and has no duty or obligation with respect to
123 |     this CC0 or use of the Work.
124 | 
125 | 


--------------------------------------------------------------------------------
/Papers/Applications/README.md:
--------------------------------------------------------------------------------
 1 | # Applications
 2 | [Back to awesome edge machine learning](https://github.com/bisonai/awesome-edge-machine-learning)
 3 | 
 4 | [Back to Papers](https://github.com/bisonai/awesome-edge-machine-learning/tree/master/Papers)
 5 | 
 6 | There is a countless number of possible edge machine learning applications. Here, we collect papers that describe specific solutions.
 7 | 
 8 | 
 9 | ## [ALBERT: A Lite BERT for Self-supervised Learning of Language Representations](https://arxiv.org/abs/1909.11942), 2019/09
10 | Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut
11 | 
12 | Increasing model size when pretraining natural language representations often results in improved performance on downstream tasks. However, at some point further model increases become harder due to GPU/TPU memory limitations, longer training times, and unexpected model degradation. To address these problems, we present two parameter-reduction techniques to lower memory consumption and increase the training speed of BERT. Comprehensive empirical evidence shows that our proposed methods lead to models that scale much better compared to the original BERT. We also use a self-supervised loss that focuses on modeling inter-sentence coherence, and show it consistently helps downstream tasks with multi-sentence inputs. As a result, our best model establishes new state-of-the-art results on the GLUE, RACE, and SQuAD benchmarks while having fewer parameters compared to BERT-large.The code and the pretrained models are available at this [URL](https://github.com/google-research/google-research/tree/master/albert).
13 | 
14 | 
15 | ## [TinyBERT: Distilling BERT for Natural Language Understanding](https://arxiv.org/abs/1909.10351), 2019/09
16 | Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, Fang Wang, Qun Liu
17 | 
18 | Language model pre-training, such as BERT, has significantly improved the performances of many natural language processing tasks. However, pre-trained language models are usually computationally expensive and memory intensive, so it is difficult to effectively execute them on some resource-restricted devices. To accelerate inference and reduce model size while maintaining accuracy, we firstly propose a novel transformer distillation method that is a specially designed knowledge distillation (KD) method for transformer-based models. By leveraging this new KD method, the plenty of knowledge encoded in a large teacher BERT can be well transferred to a small student TinyBERT. Moreover, we introduce a new two-stage learning framework for TinyBERT, which performs transformer distillation at both the pre-training and task-specific learning stages. This framework ensures that TinyBERT can capture both the general-domain and task-specific knowledge of the teacher BERT.TinyBERT is empirically effective and achieves more than 96% the performance of teacher BERTBASE on GLUE benchmark while being 7.5x smaller and 9.4x faster on inference. TinyBERT is also significantly better than state-of-the-art baselines on BERT distillation, with only about 28% parameters and about 31% inference time of them.
19 | 
20 | 
21 | ## [Temporal Convolution for Real-time Keyword Spotting on Mobile Devices](https://arxiv.org/abs/1904.03814), 2019/04
22 | Seungwoo Choi, Seokjun Seo, Beomjun Shin, Hyeongmin Byun, Martin Kersner, Beomsu Kim, Dongyoung Kim, Sungjoo Ha
23 | 
24 | Keyword spotting (KWS) plays a critical role in enabling speech-based user interactions on smart devices. Recent developments in the field of deep learning have led to wide adoption of convolutional neural networks (CNNs) in KWS systems due to their exceptional accuracy and robustness. The main challenge faced by KWS systems is the trade-off between high accuracy and low latency. Unfortunately, there has been little quantitative analysis of the actual latency of KWS models on mobile devices. This is especially concerning since conventional convolution-based KWS approaches are known to require a large number of operations to attain an adequate level of performance. In this paper, we propose a temporal convolution for real-time KWS on mobile devices. Unlike most of the 2D convolution-based KWS approaches that require a deep architecture to fully capture both low- and high-frequency domains, we exploit temporal convolutions with a compact ResNet architecture. In Google Speech Command Dataset, we achieve more than 385x speedup on Google Pixel 1 and surpass the accuracy compared to the state-of-the-art model. In addition, we release the implementation of the proposed and the baseline models including an end-to-end pipeline for training models and evaluating them on mobile devices.
25 | 
26 | 
27 | ## [Towards Real-Time Automatic Portrait Matting on Mobile Devices](https://arxiv.org/abs/1904.03816), 2019/04
28 | Seokjun Seo, Seungwoo Choi, Martin Kersner, Beomjun Shin, Hyungsuk Yoon, Hyeongmin Byun, Sungjoo Ha
29 | 
30 | We tackle the problem of automatic portrait matting on mobile devices. The proposed model is aimed at attaining real-time inference on mobile devices with minimal degradation of model performance. Our model MMNet, based on multi-branch dilated convolution with linear bottleneck blocks, outperforms the state-of-the-art model and is orders of magnitude faster. The model can be accelerated four times to attain 30 FPS on Xiaomi Mi 5 device with moderate increase in the gradient error. Under the same conditions, our model has an order of magnitude less number of parameters and is faster than Mobile DeepLabv3 while maintaining comparable performance. The accompanied implementation can be found at this [URL](https://github.com/hyperconnect/MMNet).
31 | 
32 | ## [ThunderNet: Towards Real-time Generic Object Detection](https://arxiv.org/abs/1903.11752), 2019/03
33 | Zheng Qin, Zeming Li, Zhaoning Zhang, Yiping Bao, Gang Yu, Yuxing Peng, Jian Sun
34 | 
35 | Real-time generic object detection on mobile platforms is a crucial but challenging computer vision task. However, previous CNN-based detectors suffer from enormous computational cost, which hinders them from real-time inference in computation-constrained scenarios. In this paper, we investigate the effectiveness of two-stage detectors in real-time generic detection and propose a lightweight two-stage detector named ThunderNet. In the backbone part, we analyze the drawbacks in previous lightweight backbones and present a lightweight backbone designed for object detection. In the detection part, we exploit an extremely efficient RPN and detection head design. To generate more discriminative feature representation, we design two efficient architecture blocks, Context Enhancement Module and Spatial Attention Module. At last, we investigate the balance between the input resolution, the backbone, and the detection head. Compared with lightweight one-stage detectors, ThunderNet achieves superior performance with only 40% of the computational cost on PASCAL VOC and COCO benchmarks. Without bells and whistles, our model runs at 24.1 fps on an ARM-based device. To the best of our knowledge, this is the first real-time detector reported on ARM platforms. Code will be released for paper reproduction.
36 | 
37 | 
38 | ## [PFLD: A Practical Facial Landmark Detector](https://arxiv.org/abs/1902.10859), 2019/02
39 | Xiaojie Guo, Siyuan Li, Jinke Yu, Jiawan Zhang, Jiayi Ma, Lin Ma, Wei Liu, Haibin Ling
40 | 
41 | Being accurate, efficient, and compact is essential to a facial landmark detector for practical use. To simultaneously consider the three concerns, this paper investigates a neat model with promising detection accuracy under wild environments e.g., unconstrained pose, expression, lighting, and occlusion conditions) and super real-time speed on a mobile device. More concretely, we customize an end-to-end single stage network associated with acceleration techniques. During the training phase, for each sample, rotation information is estimated for geometrically regularizing landmark localization, which is then NOT involved in the testing phase. A novel loss is designed to, besides considering the geometrical regularization, mitigate the issue of data imbalance by adjusting weights of samples to different states, such as large pose, extreme lighting, and occlusion, in the training set. Extensive experiments are conducted to demonstrate the efficacy of our design and reveal its superior performance over state-of-the-art alternatives on widely-adopted challenging benchmarks, i.e., 300W (including iBUG, LFPW, AFW, HELEN, and XM2VTS) and AFLW. Our model can be merely 2.1Mb of size and reach over 140 fps per face on a mobile phone (Qualcomm ARM 845 processor) with high precision, making it attractive for large-scale or real-time applications. We have made our practical system based on PFLD 0.25X model publicly available at this [URL](http://sites.google.com/view/xjguo/fld) for encouraging comparisons and improvements from the community.
42 | 
43 | 
44 | ## [TSM: Temporal Shift Module for Efficient Video Understanding](https://arxiv.org/abs/1811.08383), 2018/11
45 | Ji Lin, Chuang Gan, Song Han
46 | 
47 | The explosive growth in video streaming gives rise to challenges on performing video understanding at high accuracy and low computation cost. Conventional 2D CNNs are computationally cheap but cannot capture temporal relationships; 3D CNN based methods can achieve good performance but are computationally intensive, making it expensive to deploy. In this paper, we propose a generic and effective Temporal Shift Module (TSM) that enjoys both high efficiency and high performance. Specifically, it can achieve the performance of 3D CNN but maintain 2D CNN's complexity. TSM shifts part of the channels along the temporal dimension; thus facilitate information exchanged among neighboring frames. It can be inserted into 2D CNNs to achieve temporal modeling at zero computation and zero parameters. We also extended TSM to online setting, which enables real-time low-latency online video recognition and video object detection. TSM is accurate and efficient: it ranks the first place on the Something-Something leaderboard upon publication; on Jetson Nano and Galaxy Note8, it achieves a low latency of 13ms and 35ms for online video recognition. The code is available at: this https URL.
48 | 
49 | 
50 | 


--------------------------------------------------------------------------------
/Papers/Applications/data.yaml:
--------------------------------------------------------------------------------
 1 | # APPLICATIONS
 2 | #
 3 | # `date` format is as following yyy/mm/dd. For example 6 May 2019 would be 2019/05/06.
 4 | # In case of arxiv, use the date of the first version of paper.
 5 | #
 6 | # [Template]
 7 | #
 8 | # -
 9 | #   name:
10 | #   url:
11 | #   date:
12 | #   conference:
13 | #   code:
14 | #   authors:
15 | #   abstract:
16 | ---
17 | 
18 | -
19 |   name: >
20 |     Temporal Convolution for Real-time Keyword Spotting on Mobile Devices
21 |   url: https://arxiv.org/abs/1904.03814
22 |   date: 2019/04/08
23 |   conference: INTERSPEECH 2019
24 |   code: https://github.com/hyperconnect/TC-ResNet
25 |   authors: Seungwoo Choi, Seokjun Seo, Beomjun Shin, Hyeongmin Byun, Martin Kersner, Beomsu Kim, Dongyoung Kim, Sungjoo Ha
26 |   abstract: >
27 |     Keyword spotting (KWS) plays a critical role in enabling speech-based user interactions on smart devices. Recent developments in the field of deep learning have led to wide adoption of convolutional neural networks (CNNs) in KWS systems due to their exceptional accuracy and robustness. The main challenge faced by KWS systems is the trade-off between high accuracy and low latency. Unfortunately, there has been little quantitative analysis of the actual latency of KWS models on mobile devices. This is especially concerning since conventional convolution-based KWS approaches are known to require a large number of operations to attain an adequate level of performance. In this paper, we propose a temporal convolution for real-time KWS on mobile devices. Unlike most of the 2D convolution-based KWS approaches that require a deep architecture to fully capture both low- and high-frequency domains, we exploit temporal convolutions with a compact ResNet architecture. In Google Speech Command Dataset, we achieve more than 385x speedup on Google Pixel 1 and surpass the accuracy compared to the state-of-the-art model. In addition, we release the implementation of the proposed and the baseline models including an end-to-end pipeline for training models and evaluating them on mobile devices.
28 | 
29 | -
30 |   name: >
31 |     Towards Real-Time Automatic Portrait Matting on Mobile Devices
32 |   url: https://arxiv.org/abs/1904.03816
33 |   date: 2019/04/08
34 |   conference:
35 |   code: https://github.com/hyperconnect/MMNet
36 |   authors: Seokjun Seo, Seungwoo Choi, Martin Kersner, Beomjun Shin, Hyungsuk Yoon, Hyeongmin Byun, Sungjoo Ha
37 |   abstract: >-
38 |     We tackle the problem of automatic portrait matting on mobile devices. The proposed model is aimed at attaining real-time inference on mobile devices with minimal degradation of model performance. Our model MMNet, based on multi-branch dilated convolution with linear bottleneck blocks, outperforms the state-of-the-art model and is orders of magnitude faster. The model can be accelerated four times to attain 30 FPS on Xiaomi Mi 5 device with moderate increase in the gradient error. Under the same conditions, our model has an order of magnitude less number of parameters and is faster than Mobile DeepLabv3 while maintaining comparable performance. The accompanied implementation can be found at this [URL](https://github.com/hyperconnect/MMNet).
39 | 
40 | -
41 |   name: >
42 |     PFLD: A Practical Facial Landmark Detector
43 |   url: https://arxiv.org/abs/1902.10859
44 |   date: 2019/02/28
45 |   conference:
46 |   code: https://sites.google.com/view/xjguo/fld
47 |   authors: Xiaojie Guo, Siyuan Li, Jinke Yu, Jiawan Zhang, Jiayi Ma, Lin Ma, Wei Liu, Haibin Ling
48 |   abstract: >
49 |      Being accurate, efficient, and compact is essential to a facial landmark detector for practical use. To simultaneously consider the three concerns, this paper investigates a neat model with promising detection accuracy under wild environments e.g., unconstrained pose, expression, lighting, and occlusion conditions) and super real-time speed on a mobile device. More concretely, we customize an end-to-end single stage network associated with acceleration techniques. During the training phase, for each sample, rotation information is estimated for geometrically regularizing landmark localization, which is then NOT involved in the testing phase. A novel loss is designed to, besides considering the geometrical regularization, mitigate the issue of data imbalance by adjusting weights of samples to different states, such as large pose, extreme lighting, and occlusion, in the training set. Extensive experiments are conducted to demonstrate the efficacy of our design and reveal its superior performance over state-of-the-art alternatives on widely-adopted challenging benchmarks, i.e., 300W (including iBUG, LFPW, AFW, HELEN, and XM2VTS) and AFLW. Our model can be merely 2.1Mb of size and reach over 140 fps per face on a mobile phone (Qualcomm ARM 845 processor) with high precision, making it attractive for large-scale or real-time applications. We have made our practical system based on PFLD 0.25X model publicly available at this [URL](http://sites.google.com/view/xjguo/fld) for encouraging comparisons and improvements from the community.
50 | 
51 | -
52 |   name: >
53 |     ThunderNet: Towards Real-time Generic Object Detection
54 |   url: https://arxiv.org/abs/1903.11752
55 |   date: 2019/03/28
56 |   conference: ICCV 2019
57 |   code:
58 |   authors: Zheng Qin, Zeming Li, Zhaoning Zhang, Yiping Bao, Gang Yu, Yuxing Peng, Jian Sun
59 |   abstract: >
60 |     Real-time generic object detection on mobile platforms is a crucial but challenging computer vision task. However, previous CNN-based detectors suffer from enormous computational cost, which hinders them from real-time inference in computation-constrained scenarios. In this paper, we investigate the effectiveness of two-stage detectors in real-time generic detection and propose a lightweight two-stage detector named ThunderNet. In the backbone part, we analyze the drawbacks in previous lightweight backbones and present a lightweight backbone designed for object detection. In the detection part, we exploit an extremely efficient RPN and detection head design. To generate more discriminative feature representation, we design two efficient architecture blocks, Context Enhancement Module and Spatial Attention Module. At last, we investigate the balance between the input resolution, the backbone, and the detection head. Compared with lightweight one-stage detectors, ThunderNet achieves superior performance with only 40% of the computational cost on PASCAL VOC and COCO benchmarks. Without bells and whistles, our model runs at 24.1 fps on an ARM-based device. To the best of our knowledge, this is the first real-time detector reported on ARM platforms. Code will be released for paper reproduction.
61 | 
62 | -
63 |   name: >
64 |     TinyBERT: Distilling BERT for Natural Language Understanding
65 |   url: https://arxiv.org/abs/1909.10351
66 |   date: 2019/09/23
67 |   conference:
68 |   code: https://github.com/huawei-noah/Pretrained-Language-Model/tree/master/TinyBERT
69 |   authors: Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, Fang Wang, Qun Liu
70 |   abstract: >
71 |     Language model pre-training, such as BERT, has significantly improved the performances of many natural language processing tasks. However, pre-trained language models are usually computationally expensive and memory intensive, so it is difficult to effectively execute them on some resource-restricted devices. To accelerate inference and reduce model size while maintaining accuracy, we firstly propose a novel transformer distillation method that is a specially designed knowledge distillation (KD) method for transformer-based models. By leveraging this new KD method, the plenty of knowledge encoded in a large teacher BERT can be well transferred to a small student TinyBERT. Moreover, we introduce a new two-stage learning framework for TinyBERT, which performs transformer distillation at both the pre-training and task-specific learning stages. This framework ensures that TinyBERT can capture both the general-domain and task-specific knowledge of the teacher BERT.TinyBERT is empirically effective and achieves more than 96% the performance of teacher BERTBASE on GLUE benchmark while being 7.5x smaller and 9.4x faster on inference. TinyBERT is also significantly better than state-of-the-art baselines on BERT distillation, with only about 28% parameters and about 31% inference time of them.
72 | 
73 | -
74 |   name: >
75 |     ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
76 |   url: https://arxiv.org/abs/1909.11942
77 |   date: 2019/09/26
78 |   conference: ICLR 2020
79 |   code: https://github.com/google-research/google-research/tree/master/albert
80 |   authors: Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut
81 |   abstract: >
82 |     Increasing model size when pretraining natural language representations often results in improved performance on downstream tasks. However, at some point further model increases become harder due to GPU/TPU memory limitations, longer training times, and unexpected model degradation. To address these problems, we present two parameter-reduction techniques to lower memory consumption and increase the training speed of BERT. Comprehensive empirical evidence shows that our proposed methods lead to models that scale much better compared to the original BERT. We also use a self-supervised loss that focuses on modeling inter-sentence coherence, and show it consistently helps downstream tasks with multi-sentence inputs. As a result, our best model establishes new state-of-the-art results on the GLUE, RACE, and SQuAD benchmarks while having fewer parameters compared to BERT-large.The code and the pretrained models are available at this [URL](https://github.com/google-research/google-research/tree/master/albert).
83 | 
84 | -
85 |   name: >
86 |     TSM: Temporal Shift Module for Efficient Video Understanding
87 |   url: https://arxiv.org/abs/1811.08383
88 |   date: 2018/11/20
89 |   conference: ICCV 2019
90 |   code: https://github.com/mit-han-lab/temporal-shift-module
91 |   authors: Ji Lin, Chuang Gan, Song Han
92 |   abstract: >
93 |     The explosive growth in video streaming gives rise to challenges on performing video understanding at high accuracy and low computation cost. Conventional 2D CNNs are computationally cheap but cannot capture temporal relationships; 3D CNN based methods can achieve good performance but are computationally intensive, making it expensive to deploy. In this paper, we propose a generic and effective Temporal Shift Module (TSM) that enjoys both high efficiency and high performance. Specifically, it can achieve the performance of 3D CNN but maintain 2D CNN's complexity. TSM shifts part of the channels along the temporal dimension; thus facilitate information exchanged among neighboring frames. It can be inserted into 2D CNNs to achieve temporal modeling at zero computation and zero parameters. We also extended TSM to online setting, which enables real-time low-latency online video recognition and video object detection. TSM is accurate and efficient: it ranks the first place on the Something-Something leaderboard upon publication; on Jetson Nano and Galaxy Note8, it achieves a low latency of 13ms and 35ms for online video recognition. The code is available at: this https URL.
94 | 
95 | ...
96 | 


--------------------------------------------------------------------------------
/Papers/AutoML/README.md:
--------------------------------------------------------------------------------
 1 | # AutoML
 2 | [Back to awesome edge machine learning](https://github.com/bisonai/awesome-edge-machine-learning)
 3 | 
 4 | [Back to Papers](https://github.com/bisonai/awesome-edge-machine-learning/tree/master/Papers)
 5 | 
 6 | Automated machine learning (AutoML) is the process of automating the end-to-end process of applying machine learning to real-world problems.<sup><a href="https://en.wikipedia.org/wiki/Automated_machine_learning" targe="_blank">Wikipedia</a></sup> AutoML is for example used to design new efficient neural architectures with a constraint on a computational budget (defined either as a number of FLOPS or as an inference time measured on real device) or a size of the architecture.
 7 | 
 8 | 
 9 | ## [ChamNet: Towards Efficient Network Design through Platform-Aware Model Adaptation](https://arxiv.org/abs/1812.08934), 2018/12
10 | Xiaoliang Dai, Peizhao Zhang, Bichen Wu, Hongxu Yin, Fei Sun, Yanghan Wang, Marat Dukhan, Yunqing Hu, Yiming Wu, Yangqing Jia, Peter Vajda, Matt Uyttendaele, Niraj K. Jha
11 | 
12 | This paper proposes an efficient neural network (NN) architecture design methodology called Chameleon that honors given resource constraints. Instead of developing new building blocks or using computationally-intensive reinforcement learning algorithms, our approach leverages existing efficient network building blocks and focuses on exploiting hardware traits and adapting computation resources to fit target latency and/or energy constraints. We formulate platform-aware NN architecture search in an optimization framework and propose a novel algorithm to search for optimal architectures aided by efficient accuracy and resource (latency and/or energy) predictors. At the core of our algorithm lies an accuracy predictor built atop Gaussian Process with Bayesian optimization for iterative sampling. With a one-time building cost for the predictors, our algorithm produces state-of-the-art model architectures on different platforms under given constraints in just minutes. Our results show that adapting computation resources to building blocks is critical to model performance. Without the addition of any bells and whistles, our models achieve significant accuracy improvements against state-of-the-art hand-crafted and automatically designed architectures. We achieve 73.8% and 75.3% top-1 accuracy on ImageNet at 20ms latency on a mobile CPU and DSP. At reduced latency, our models achieve up to 8.5% (4.8%) and 6.6% (9.3%) absolute top-1 accuracy improvements compared to MobileNetV2 and MnasNet, respectively, on a mobile CPU (DSP), and 2.7% (4.6%) and 5.6% (2.6%) accuracy gains over ResNet-101 and ResNet-152, respectively, on an Nvidia GPU (Intel CPU).
13 | 
14 | 
15 | ## [FBNet: Hardware-Aware Efficient ConvNet Design via Differentiable Neural Architecture Search](https://arxiv.org/abs/1812.03443), 2018/12
16 | Bichen Wu, Xiaoliang Dai, Peizhao Zhang, Yanghan Wang, Fei Sun, Yiming Wu, Yuandong Tian, Peter Vajda, Yangqing Jia, Kurt Keutzer
17 | 
18 | Designing accurate and efficient ConvNets for mobile devices is challenging because the design space is combinatorially large. Due to this, previous neural architecture search (NAS) methods are computationally expensive. ConvNet architecture optimality depends on factors such as input resolution and target devices. However, existing approaches are too expensive for case-by-case redesigns. Also, previous work focuses primarily on reducing FLOPs, but FLOP count does not always reflect actual latency. To address these, we propose a differentiable neural architecture search (DNAS) framework that uses gradient-based methods to optimize ConvNet architectures, avoiding enumerating and training individual architectures separately as in previous methods. FBNets, a family of models discovered by DNAS surpass state-of-the-art models both designed manually and generated automatically. FBNet-B achieves 74.1% top-1 accuracy on ImageNet with 295M FLOPs and 23.1 ms latency on a Samsung S8 phone, 2.4x smaller and 1.5x faster than MobileNetV2-1.3 with similar accuracy. Despite higher accuracy and lower latency than MnasNet, we estimate FBNet-B's search cost is 420x smaller than MnasNet's, at only 216 GPU-hours. Searched for different resolutions and channel sizes, FBNets achieve 1.5% to 6.4% higher accuracy than MobileNetV2. The smallest FBNet achieves 50.2% accuracy and 2.9 ms latency (345 frames per second) on a Samsung S8. Over a Samsung-optimized FBNet, the iPhone-X-optimized model achieves a 1.4x speedup on an iPhone X.
19 | 
20 | 
21 | ## [ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware](https://arxiv.org/abs/1812.00332), 2018/12
22 | Han Cai, Ligeng Zhu, Song Han
23 | 
24 | Neural architecture search (NAS) has a great impact by automatically designing effective neural network architectures. However, the prohibitive computational demand of conventional NAS algorithms (e.g. 104 GPU hours) makes it difficult to \emph{directly} search the architectures on large-scale tasks (e.g. ImageNet). Differentiable NAS can reduce the cost of GPU hours via a continuous representation of network architecture but suffers from the high GPU memory consumption issue (grow linearly w.r.t. candidate set size). As a result, they need to utilize~\emph{proxy} tasks, such as training on a smaller dataset, or learning with only a few blocks, or training just for a few epochs. These architectures optimized on proxy tasks are not guaranteed to be optimal on the target task. In this paper, we present \emph{ProxylessNAS} that can \emph{directly} learn the architectures for large-scale target tasks and target hardware platforms. We address the high memory consumption issue of differentiable NAS and reduce the computational cost (GPU hours and GPU memory) to the same level of regular training while still allowing a large candidate set. Experiments on CIFAR-10 and ImageNet demonstrate the effectiveness of directness and specialization. On CIFAR-10, our model achieves 2.08\% test error with only 5.7M parameters, better than the previous state-of-the-art architecture AmoebaNet-B, while using 6× fewer parameters. On ImageNet, our model achieves 3.1\% better top-1 accuracy than MobileNetV2, while being 1.2× faster with measured GPU latency. We also apply ProxylessNAS to specialize neural architectures for hardware with direct hardware metrics (e.g. latency) and provide insights for efficient CNN architecture design.
25 | 
26 | 
27 | ## [MnasNet: Platform-Aware Neural Architecture Search for Mobile](https://arxiv.org/abs/1807.11626), 2018/07
28 | Mingxing Tan, Bo Chen, Ruoming Pang, Vijay Vasudevan, Mark Sandler, Andrew Howard, Quoc V. Le
29 | 
30 | Designing convolutional neural networks (CNN) for mobile devices is challenging because mobile models need to be small and fast, yet still accurate. Although significant efforts have been dedicated to design and improve mobile CNNs on all dimensions, it is very difficult to manually balance these trade-offs when there are so many architectural possibilities to consider. In this paper, we propose an automated mobile neural architecture search (MNAS) approach, which explicitly incorporate model latency into the main objective so that the search can identify a model that achieves a good trade-off between accuracy and latency. Unlike previous work, where latency is considered via another, often inaccurate proxy (e.g., FLOPS), our approach directly measures real-world inference latency by executing the model on mobile phones. To further strike the right balance between flexibility and search space size, we propose a novel factorized hierarchical search space that encourages layer diversity throughout the network. Experimental results show that our approach consistently outperforms state-of-the-art mobile CNN models across multiple vision tasks. On the ImageNet classification task, our MnasNet achieves 75.2% top-1 accuracy with 78ms latency on a Pixel phone, which is 1.8x faster than MobileNetV2 [29] with 0.5% higher accuracy and 2.3x faster than NASNet [36] with 1.2% higher accuracy. Our MnasNet also achieves better mAP quality than MobileNets for COCO object detection. Code is at this [URL](https://github.com/tensorflow/tpu/tree/master/models/official/mnasnet)
31 | 
32 | 
33 | ## [NetAdapt: Platform-Aware Neural Network Adaptation for Mobile Applications](https://arxiv.org/abs/1804.03230), 2018/04
34 | Tien-Ju Yang, Andrew Howard, Bo Chen, Xiao Zhang, Alec Go, Mark Sandler, Vivienne Sze, Hartwig Adam
35 | 
36 | This work proposes an algorithm, called NetAdapt, that automatically adapts a pre-trained deep neural network to a mobile platform given a resource budget. While many existing algorithms simplify networks based on the number of MACs or weights, optimizing those indirect metrics may not necessarily reduce the direct metrics, such as latency and energy consumption. To solve this problem, NetAdapt incorporates direct metrics into its adaptation algorithm. These direct metrics are evaluated using empirical measurements, so that detailed knowledge of the platform and toolchain is not required. NetAdapt automatically and progressively simplifies a pre-trained network until the resource budget is met while maximizing the accuracy. Experiment results show that NetAdapt achieves better accuracy versus latency trade-offs on both mobile CPU and mobile GPU, compared with the state-of-the-art automated network simplification algorithms. For image classification on the ImageNet dataset, NetAdapt achieves up to a 1.7× speedup in measured inference latency with equal or higher accuracy on MobileNets (V1&V2).
37 | 
38 | 
39 | ## [AMC: AutoML for Model Compression and Acceleration on Mobile Devices](https://arxiv.org/abs/1802.03494), 2018/02
40 | Yihui He, Ji Lin, Zhijian Liu, Hanrui Wang, Li-Jia Li, Song Han
41 | 
42 | Model compression is a critical technique to efficiently deploy neural network models on mobile devices which have limited computation resources and tight power budgets. Conventional model compression techniques rely on hand-crafted heuristics and rule-based policies that require domain experts to explore the large design space trading off among model size, speed, and accuracy, which is usually sub-optimal and time-consuming. In this paper, we propose AutoML for Model Compression (AMC) which leverage reinforcement learning to provide the model compression policy. This learning-based compression policy outperforms conventional rule-based compression policy by having higher compression ratio, better preserving the accuracy and freeing human labor. Under 4x FLOPs reduction, we achieved 2.7% better accuracy than the handcrafted model compression policy for VGG-16 on ImageNet. We applied this automated, push-the-button compression pipeline to MobileNet and achieved 1.81x speedup of measured inference latency on an Android phone and 1.43x speedup on the Titan XP GPU, with only 0.1% loss of ImageNet Top-1 accuracy.
43 | 
44 | 
45 | ## [MorphNet: Fast & Simple Resource-Constrained Structure Learning of Deep Networks](https://arxiv.org/abs/1711.06798), 2017/11
46 | Ariel Gordon, Elad Eban, Ofir Nachum, Bo Chen, Hao Wu, Tien-Ju Yang, Edward Choi
47 | 
48 | We present MorphNet, an approach to automate the design of neural network structures. MorphNet iteratively shrinks and expands a network, shrinking via a resource-weighted sparsifying regularizer on activations and expanding via a uniform multiplicative factor on all layers. In contrast to previous approaches, our method is scalable to large networks, adaptable to specific resource constraints (e.g. the number of floating-point operations per inference), and capable of increasing the network's performance. When applied to standard network architectures on a wide variety of datasets, our approach discovers novel structures in each domain, obtaining higher performance while respecting the resource constraint.
49 | 
50 | 
51 | 


--------------------------------------------------------------------------------
/Papers/AutoML/data.yaml:
--------------------------------------------------------------------------------
 1 | # AUTOML
 2 | #
 3 | # `date` format is as following yyy/mm/dd. For example 6 May 2019 would be 2019/05/06.
 4 | # In case of arxiv, use the date of the first version of paper.
 5 | #
 6 | # [Template]
 7 | #
 8 | # -
 9 | #   name:
10 | #   url:
11 | #   date:
12 | #   conference:
13 | #   code:
14 | #   authors:
15 | #   abstract:
16 | ---
17 | 
18 | -
19 |   name: >
20 |     ChamNet: Towards Efficient Network Design through Platform-Aware Model Adaptation
21 |   url: https://arxiv.org/abs/1812.08934
22 |   date: 2018/12/21
23 |   conference: CVPR 2019
24 |   code: https://github.com/facebookresearch/mobile-vision
25 |   authors: Xiaoliang Dai, Peizhao Zhang, Bichen Wu, Hongxu Yin, Fei Sun, Yanghan Wang, Marat Dukhan, Yunqing Hu, Yiming Wu, Yangqing Jia, Peter Vajda, Matt Uyttendaele, Niraj K. Jha
26 |   abstract: >
27 |     This paper proposes an efficient neural network (NN) architecture design methodology called Chameleon that honors given resource constraints. Instead of developing new building blocks or using computationally-intensive reinforcement learning algorithms, our approach leverages existing efficient network building blocks and focuses on exploiting hardware traits and adapting computation resources to fit target latency and/or energy constraints. We formulate platform-aware NN architecture search in an optimization framework and propose a novel algorithm to search for optimal architectures aided by efficient accuracy and resource (latency and/or energy) predictors. At the core of our algorithm lies an accuracy predictor built atop Gaussian Process with Bayesian optimization for iterative sampling. With a one-time building cost for the predictors, our algorithm produces state-of-the-art model architectures on different platforms under given constraints in just minutes. Our results show that adapting computation resources to building blocks is critical to model performance. Without the addition of any bells and whistles, our models achieve significant accuracy improvements against state-of-the-art hand-crafted and automatically designed architectures. We achieve 73.8% and 75.3% top-1 accuracy on ImageNet at 20ms latency on a mobile CPU and DSP. At reduced latency, our models achieve up to 8.5% (4.8%) and 6.6% (9.3%) absolute top-1 accuracy improvements compared to MobileNetV2 and MnasNet, respectively, on a mobile CPU (DSP), and 2.7% (4.6%) and 5.6% (2.6%) accuracy gains over ResNet-101 and ResNet-152, respectively, on an Nvidia GPU (Intel CPU).
28 | 
29 | -
30 |   name: >
31 |     FBNet: Hardware-Aware Efficient ConvNet Design via Differentiable Neural Architecture Search
32 |   url: https://arxiv.org/abs/1812.03443
33 |   date: 2018/12/09
34 |   conference: CVPR 2019
35 |   code: https://github.com/facebookresearch/mobile-vision
36 |   authors: Bichen Wu, Xiaoliang Dai, Peizhao Zhang, Yanghan Wang, Fei Sun, Yiming Wu, Yuandong Tian, Peter Vajda, Yangqing Jia, Kurt Keutzer
37 |   abstract: >
38 |    Designing accurate and efficient ConvNets for mobile devices is challenging because the design space is combinatorially large. Due to this, previous neural architecture search (NAS) methods are computationally expensive. ConvNet architecture optimality depends on factors such as input resolution and target devices. However, existing approaches are too expensive for case-by-case redesigns. Also, previous work focuses primarily on reducing FLOPs, but FLOP count does not always reflect actual latency. To address these, we propose a differentiable neural architecture search (DNAS) framework that uses gradient-based methods to optimize ConvNet architectures, avoiding enumerating and training individual architectures separately as in previous methods. FBNets, a family of models discovered by DNAS surpass state-of-the-art models both designed manually and generated automatically. FBNet-B achieves 74.1% top-1 accuracy on ImageNet with 295M FLOPs and 23.1 ms latency on a Samsung S8 phone, 2.4x smaller and 1.5x faster than MobileNetV2-1.3 with similar accuracy. Despite higher accuracy and lower latency than MnasNet, we estimate FBNet-B's search cost is 420x smaller than MnasNet's, at only 216 GPU-hours. Searched for different resolutions and channel sizes, FBNets achieve 1.5% to 6.4% higher accuracy than MobileNetV2. The smallest FBNet achieves 50.2% accuracy and 2.9 ms latency (345 frames per second) on a Samsung S8. Over a Samsung-optimized FBNet, the iPhone-X-optimized model achieves a 1.4x speedup on an iPhone X.
39 | 
40 | -
41 |   name: >
42 |     AMC: AutoML for Model Compression and Acceleration on Mobile Devices
43 |   url: https://arxiv.org/abs/1802.03494
44 |   date: 2018/02/10
45 |   conference: ECCV 2018
46 |   code: https://github.com/mit-han-lab/amc-compressed-models
47 |   authors: Yihui He, Ji Lin, Zhijian Liu, Hanrui Wang, Li-Jia Li, Song Han
48 |   abstract: >
49 |     Model compression is a critical technique to efficiently deploy neural network models on mobile devices which have limited computation resources and tight power budgets. Conventional model compression techniques rely on hand-crafted heuristics and rule-based policies that require domain experts to explore the large design space trading off among model size, speed, and accuracy, which is usually sub-optimal and time-consuming. In this paper, we propose AutoML for Model Compression (AMC) which leverage reinforcement learning to provide the model compression policy. This learning-based compression policy outperforms conventional rule-based compression policy by having higher compression ratio, better preserving the accuracy and freeing human labor. Under 4x FLOPs reduction, we achieved 2.7% better accuracy than the handcrafted model compression policy for VGG-16 on ImageNet. We applied this automated, push-the-button compression pipeline to MobileNet and achieved 1.81x speedup of measured inference latency on an Android phone and 1.43x speedup on the Titan XP GPU, with only 0.1% loss of ImageNet Top-1 accuracy.
50 | 
51 | -
52 |   name: >
53 |     ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware
54 |   url: https://arxiv.org/abs/1812.00332
55 |   date: 2018/12/02
56 |   conference: ICLR 2019
57 |   code: https://github.com/mit-han-lab/ProxylessNAS
58 |   authors: Han Cai, Ligeng Zhu, Song Han
59 |   abstract: >
60 |     Neural architecture search (NAS) has a great impact by automatically designing effective neural network architectures. However, the prohibitive computational demand of conventional NAS algorithms (e.g. 104 GPU hours) makes it difficult to \emph{directly} search the architectures on large-scale tasks (e.g. ImageNet). Differentiable NAS can reduce the cost of GPU hours via a continuous representation of network architecture but suffers from the high GPU memory consumption issue (grow linearly w.r.t. candidate set size). As a result, they need to utilize~\emph{proxy} tasks, such as training on a smaller dataset, or learning with only a few blocks, or training just for a few epochs. These architectures optimized on proxy tasks are not guaranteed to be optimal on the target task. In this paper, we present \emph{ProxylessNAS} that can \emph{directly} learn the architectures for large-scale target tasks and target hardware platforms. We address the high memory consumption issue of differentiable NAS and reduce the computational cost (GPU hours and GPU memory) to the same level of regular training while still allowing a large candidate set. Experiments on CIFAR-10 and ImageNet demonstrate the effectiveness of directness and specialization. On CIFAR-10, our model achieves 2.08\% test error with only 5.7M parameters, better than the previous state-of-the-art architecture AmoebaNet-B, while using 6× fewer parameters. On ImageNet, our model achieves 3.1\% better top-1 accuracy than MobileNetV2, while being 1.2× faster with measured GPU latency. We also apply ProxylessNAS to specialize neural architectures for hardware with direct hardware metrics (e.g. latency) and provide insights for efficient CNN architecture design.
61 | 
62 | -
63 |   name: >
64 |     MnasNet: Platform-Aware Neural Architecture Search for Mobile
65 |   url: https://arxiv.org/abs/1807.11626
66 |   date: 2018/07/31
67 |   conference: CVPR 2019
68 |   code: https://github.com/tensorflow/tpu/tree/master/models/official/mnasnet
69 |   authors: Mingxing Tan, Bo Chen, Ruoming Pang, Vijay Vasudevan, Mark Sandler, Andrew Howard, Quoc V. Le
70 |   abstract: >
71 |     Designing convolutional neural networks (CNN) for mobile devices is challenging because mobile models need to be small and fast, yet still accurate. Although significant efforts have been dedicated to design and improve mobile CNNs on all dimensions, it is very difficult to manually balance these trade-offs when there are so many architectural possibilities to consider. In this paper, we propose an automated mobile neural architecture search (MNAS) approach, which explicitly incorporate model latency into the main objective so that the search can identify a model that achieves a good trade-off between accuracy and latency. Unlike previous work, where latency is considered via another, often inaccurate proxy (e.g., FLOPS), our approach directly measures real-world inference latency by executing the model on mobile phones. To further strike the right balance between flexibility and search space size, we propose a novel factorized hierarchical search space that encourages layer diversity throughout the network. Experimental results show that our approach consistently outperforms state-of-the-art mobile CNN models across multiple vision tasks. On the ImageNet classification task, our MnasNet achieves 75.2% top-1 accuracy with 78ms latency on a Pixel phone, which is 1.8x faster than MobileNetV2 [29] with 0.5% higher accuracy and 2.3x faster than NASNet [36] with 1.2% higher accuracy. Our MnasNet also achieves better mAP quality than MobileNets for COCO object detection. Code is at this [URL](https://github.com/tensorflow/tpu/tree/master/models/official/mnasnet)
72 | 
73 | -
74 |   name: >
75 |    NetAdapt: Platform-Aware Neural Network Adaptation for Mobile Applications
76 |   url: https://arxiv.org/abs/1804.03230
77 |   date: 2018/04/19
78 |   conference: ECCV 2018
79 |   code:
80 |   authors: Tien-Ju Yang, Andrew Howard, Bo Chen, Xiao Zhang, Alec Go, Mark Sandler, Vivienne Sze, Hartwig Adam
81 |   abstract: >
82 |     This work proposes an algorithm, called NetAdapt, that automatically adapts a pre-trained deep neural network to a mobile platform given a resource budget. While many existing algorithms simplify networks based on the number of MACs or weights, optimizing those indirect metrics may not necessarily reduce the direct metrics, such as latency and energy consumption. To solve this problem, NetAdapt incorporates direct metrics into its adaptation algorithm. These direct metrics are evaluated using empirical measurements, so that detailed knowledge of the platform and toolchain is not required. NetAdapt automatically and progressively simplifies a pre-trained network until the resource budget is met while maximizing the accuracy. Experiment results show that NetAdapt achieves better accuracy versus latency trade-offs on both mobile CPU and mobile GPU, compared with the state-of-the-art automated network simplification algorithms. For image classification on the ImageNet dataset, NetAdapt achieves up to a 1.7× speedup in measured inference latency with equal or higher accuracy on MobileNets (V1&V2).
83 | 
84 | -
85 |   name: >
86 |     MorphNet: Fast & Simple Resource-Constrained Structure Learning of Deep Networks
87 |   url: https://arxiv.org/abs/1711.06798
88 |   date: 2017/11/18
89 |   conference:
90 |   code:
91 |   authors: Ariel Gordon, Elad Eban, Ofir Nachum, Bo Chen, Hao Wu, Tien-Ju Yang, Edward Choi
92 |   abstract: >
93 |     We present MorphNet, an approach to automate the design of neural network structures. MorphNet iteratively shrinks and expands a network, shrinking via a resource-weighted sparsifying regularizer on activations and expanding via a uniform multiplicative factor on all layers. In contrast to previous approaches, our method is scalable to large networks, adaptable to specific resource constraints (e.g. the number of floating-point operations per inference), and capable of increasing the network's performance. When applied to standard network architectures on a wide variety of datasets, our approach discovers novel structures in each domain, obtaining higher performance while respecting the resource constraint.
94 | 
95 | ...
96 | 


--------------------------------------------------------------------------------
/Papers/Quantization/README.md:
--------------------------------------------------------------------------------
 1 | # Quantization
 2 | [Back to awesome edge machine learning](https://github.com/bisonai/awesome-edge-machine-learning)
 3 | 
 4 | [Back to Papers](https://github.com/bisonai/awesome-edge-machine-learning/tree/master/Papers)
 5 | 
 6 | Quantization is the process of reducing a precision (from 32 bit floating point into lower bit depth representations) of weights and/or activations in a neural network. The advantages of this method are reduced model size and faster model inference on hardware that support arithmetic operations in lower precision.
 7 | 
 8 | 
 9 | ## [And the Bit Goes Down: Revisiting the Quantization of Neural Networks](https://arxiv.org/abs/1907.05686), 2019/07
10 | Pierre Stock, Armand Joulin, Rémi Gribonval, Benjamin Graham, Hervé Jégou
11 | 
12 | In this paper, we address the problem of reducing the memory footprint of ResNet-like convolutional network architectures. We introduce a vector quantization method that aims at preserving the quality of the reconstruction of the network outputs and not its weights. The advantage of our approach is that it minimizes the loss reconstruction error for in-domain inputs and does not require any labelled data. We also use byte-aligned codebooks to produce compressed networks with efficient inference on CPU. We validate our approach by quantizing a high performing ResNet-50 model to a memory size of 5 MB (20x compression factor) while preserving a top-1 accuracy of 76.1% on ImageNet object classification and by compressing a Mask R-CNN with a size budget around 6 MB.
13 | 
14 | 
15 | ## [Data-Free Quantization through Weight Equalization and Bias Correction](https://arxiv.org/abs/1906.04721), 2019/06
16 | Markus Nagel, Mart van Baalen, Tijmen Blankevoort, Max Welling
17 | 
18 | We introduce a data-free quantization method for deep neural networks that does not require fine-tuning or hyperparameter selection. It achieves near-original model performance on common computer vision architectures and tasks. 8-bit fixed-point quantization is essential for efficient inference in modern deep learning hardware architectures. However, quantizing models to run in 8-bit is a non-trivial task, frequently leading to either significant performance reduction or engineering time spent on training a network to be amenable to quantization. Our approach relies on equalizing the weight ranges in the network by making use of a scale-equivariance property of activation functions. In addition the method corrects biases in the error that are introduced during quantization. This improves quantization accuracy performance, and can be applied ubiquitously to almost any model with a straight-forward API call. For common architectures, such as the MobileNet family, we achieve state-of-the-art quantized model performance. We further show that the method also extends to other computer vision architectures and tasks such as semantic segmentation and object detection.
19 | 
20 | 
21 | ## [HAQ: Hardware-Aware Automated Quantization with Mixed Precision](https://arxiv.org/abs/1811.08886), 2018/11
22 | Kuan Wang, Zhijian Liu, Yujun Lin, Ji Lin, Song Han
23 | 
24 | Model quantization is a widely used technique to compress and accelerate deep neural network (DNN) inference. Emergent DNN hardware accelerators begin to support mixed precision (1-8 bits) to further improve the computation efficiency, which raises a great challenge to find the optimal bitwidth for each layer: it requires domain experts to explore the vast design space trading off among accuracy, latency, energy, and model size, which is both time-consuming and sub-optimal. Conventional quantization algorithm ignores the different hardware architectures and quantizes all the layers in a uniform way. In this paper, we introduce the Hardware-Aware Automated Quantization (HAQ) framework which leverages the reinforcement learning to automatically determine the quantization policy, and we take the hardware accelerator's feedback in the design loop. Rather than relying on proxy signals such as FLOPs and model size, we employ a hardware simulator to generate direct feedback signals (latency and energy) to the RL agent. Compared with conventional methods, our framework is fully automated and can specialize the quantization policy for different neural network architectures and hardware architectures. Our framework effectively reduced the latency by 1.4-1.95x and the energy consumption by 1.9x with negligible loss of accuracy compared with the fixed bitwidth (8 bits) quantization. Our framework reveals that the optimal policies on different hardware architectures (i.e., edge and cloud architectures) under different resource constraints (i.e., latency, energy and model size) are drastically different. We interpreted the implication of different quantization policies, which offer insights for both neural network architecture design and hardware architecture design.
25 | 
26 | 
27 | ## [Quantizing deep convolutional networks for efficient inference: A whitepaper](https://arxiv.org/abs/1806.08342), 2018/06
28 | Raghuraman Krishnamoorthi
29 | 
30 | We present an overview of techniques for quantizing convolutional neural networks for inference with integer weights and activations. Per-channel quantization of weights and per-layer quantization of activations to 8-bits of precision post-training produces classification accuracies within 2% of floating point networks for a wide variety of CNN architectures. Model sizes can be reduced by a factor of 4 by quantizing weights to 8-bits, even when 8-bit arithmetic is not supported. This can be achieved with simple, post training quantization of weights.We benchmark latencies of quantized networks on CPUs and DSPs and observe a speedup of 2x-3x for quantized implementations compared to floating point on CPUs. Speedups of up to 10x are observed on specialized processors with fixed point SIMD capabilities, like the Qualcomm QDSPs with HVX. Quantization-aware training can provide further improvements, reducing the gap to floating point to 1% at 8-bit precision. Quantization-aware training also allows for reducing the precision of weights to four bits with accuracy losses ranging from 2% to 10%, with higher accuracy drop for smaller networks.We introduce tools in TensorFlow and TensorFlowLite for quantizing convolutional networks and review best practices for quantization-aware training to obtain high accuracy with quantized weights and activations. We recommend that per-channel quantization of weights and per-layer quantization of activations be the preferred quantization scheme for hardware acceleration and kernel optimization. We also propose that future processors and hardware accelerators for optimized inference support precisions of 4, 8 and 16 bits.
31 | 
32 | 
33 | ## [A Quantization-Friendly Separable Convolution for MobileNets](https://arxiv.org/abs/1803.08607), 2018/03
34 | Tao Sheng, Chen Feng, Shaojie Zhuo, Xiaopeng Zhang, Liang Shen, Mickey Aleksic
35 | 
36 | As deep learning (DL) is being rapidly pushed to edge computing, researchers invented various ways to make inference computation more efficient on mobile/IoT devices, such as network pruning, parameter compression, and etc. Quantization, as one of the key approaches, can effectively offload GPU, and make it possible to deploy DL on fixed-point pipeline. Unfortunately, not all existing networks design are friendly to quantization. For example, the popular lightweight MobileNetV1, while it successfully reduces parameter size and computation latency with separable convolution, our experiment shows its quantized models have large accuracy gap against its float point models. To resolve this, we analyzed the root cause of quantization loss and proposed a quantization-friendly separable convolution architecture. By evaluating the image classification task on ImageNet2012 dataset, our modified MobileNetV1 model can archive 8-bit inference top-1 accuracy in 68.03%, almost closed the gap to the float pipeline.
37 | 
38 | 
39 | ## [Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference](https://arxiv.org/abs/1712.05877), 2017/11
40 | Benoit Jacob, Skirmantas Kligys, Bo Chen, Menglong Zhu, Matthew Tang, Andrew Howard, Hartwig Adam, Dmitry Kalenichenko
41 | 
42 | The rising popularity of intelligent mobile devices and the daunting computational cost of deep learning-based models call for efficient and accurate on-device inference schemes. We propose a quantization scheme that allows inference to be carried out using integer-only arithmetic, which can be implemented more efficiently than floating point inference on commonly available integer-only hardware. We also co-design a training procedure to preserve end-to-end model accuracy post quantization. As a result, the proposed quantization scheme improves the tradeoff between accuracy and on-device latency. The improvements are significant even on MobileNets, a model family known for run-time efficiency, and are demonstrated in ImageNet classification and COCO detection on popular CPUs.
43 | 
44 | 
45 | ## [Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations](https://arxiv.org/abs/1609.07061), 2016/09
46 | Itay Hubara, Matthieu Courbariaux, Daniel Soudry, Ran El-Yaniv, Yoshua Bengio
47 | 
48 | We introduce a method to train Quantized Neural Networks (QNNs) --- neural networks with extremely low precision (e.g., 1-bit) weights and activations, at run-time. At train-time the quantized weights and activations are used for computing the parameter gradients. During the forward pass, QNNs drastically reduce memory size and accesses, and replace most arithmetic operations with bit-wise operations. As a result, power consumption is expected to be drastically reduced. We trained QNNs over the MNIST, CIFAR-10, SVHN and ImageNet datasets. The resulting QNNs achieve prediction accuracy comparable to their 32-bit counterparts. For example, our quantized version of AlexNet with 1-bit weights and 2-bit activations achieves 51% top-1 accuracy. Moreover, we quantize the parameter gradients to 6-bits as well which enables gradients computation using only bit-wise operation. Quantized recurrent neural networks were tested over the Penn Treebank dataset, and achieved comparable accuracy as their 32-bit counterparts using only 4-bits. Last but not least, we programmed a binary matrix multiplication GPU kernel with which it is possible to run our MNIST QNN 7 times faster than with an unoptimized GPU kernel, without suffering any loss in classification accuracy. The QNN code is available online.
49 | 
50 | 
51 | ## [DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients](https://arxiv.org/abs/1606.06160), 2016/06
52 | Shuchang Zhou, Yuxin Wu, Zekun Ni, Xinyu Zhou, He Wen, Yuheng Zou
53 | 
54 | We propose DoReFa-Net, a method to train convolutional neural networks that have low bitwidth weights and activations using low bitwidth parameter gradients. In particular, during backward pass, parameter gradients are stochastically quantized to low bitwidth numbers before being propagated to convolutional layers. As convolutions during forward/backward passes can now operate on low bitwidth weights and activations/gradients respectively, DoReFa-Net can use bit convolution kernels to accelerate both training and inference. Moreover, as bit convolutions can be efficiently implemented on CPU, FPGA, ASIC and GPU, DoReFa-Net opens the way to accelerate training of low bitwidth neural network on these hardware. Our experiments on SVHN and ImageNet datasets prove that DoReFa-Net can achieve comparable prediction accuracy as 32-bit counterparts. For example, a DoReFa-Net derived from AlexNet that has 1-bit weights, 2-bit activations, can be trained from scratch using 6-bit gradients to get 46.1\% top-1 accuracy on ImageNet validation set. The DoReFa-Net AlexNet model is released publicly.
55 | 
56 | 
57 | ## [Quantized Convolutional Neural Networks for Mobile Devices](https://arxiv.org/abs/1512.06473), 2015/12
58 | Jiaxiang Wu, Cong Leng, Yuhang Wang, Qinghao Hu, Jian Cheng
59 | 
60 | Recently, convolutional neural networks (CNN) have demonstrated impressive performance in various computer vision tasks. However, high performance hardware is typically indispensable for the application of CNN models due to the high computation complexity, which prohibits their further extensions. In this paper, we propose an efficient framework, namely Quantized CNN, to simultaneously speed-up the computation and reduce the storage and memory overhead of CNN models. Both filter kernels in convolutional layers and weighting matrices in fully-connected layers are quantized, aiming at minimizing the estimation error of each layer's response. Extensive experiments on the ILSVRC-12 benchmark demonstrate 4~6x speed-up and 15~20x compression with merely one percentage loss of classification accuracy. With our quantized CNN model, even mobile devices can accurately classify images within one second.
61 | 
62 | 
63 | 


--------------------------------------------------------------------------------
/Papers/Others/README.md:
--------------------------------------------------------------------------------
 1 | # Others
 2 | [Back to awesome edge machine learning](https://github.com/bisonai/awesome-edge-machine-learning)
 3 | 
 4 | [Back to Papers](https://github.com/bisonai/awesome-edge-machine-learning/tree/master/Papers)
 5 | 
 6 | This section contains papers that are related to edge machine learning but are not part of any major group. These papers often deal with deployment issues (i.e. optimizing inference on target platform).
 7 | 
 8 | 
 9 | ## [Distributed Machine Learning on Mobile Devices: A Survey](https://arxiv.org/abs/1909.08329), 2019/09
10 | Renjie Gu, Shuo Yang, Fan Wu
11 | 
12 | In recent years, mobile devices have gained increasingly development with stronger computation capability and larger storage. Some of the computation-intensive machine learning and deep learning tasks can now be run on mobile devices. To take advantage of the resources available on mobile devices and preserve users' privacy, the idea of mobile distributed machine learning is proposed. It uses local hardware resources and local data to solve machine learning sub-problems on mobile devices, and only uploads computation results instead of original data to contribute to the optimization of the global model. This architecture can not only relieve computation and storage burden on servers, but also protect the users' sensitive information. Another benefit is the bandwidth reduction, as various kinds of local data can now participate in the training process without being uploaded to the server. In this paper, we provide a comprehensive survey on recent studies of mobile distributed machine learning. We survey a number of widely-used mobile distributed machine learning methods. We also present an in-depth discussion on the challenges and future directions in this area. We believe that this survey can demonstrate a clear overview of mobile distributed machine learning and provide guidelines on applying mobile distributed machine learning to real applications.
13 | 
14 | 
15 | ## [Machine Learning at the Network Edge: A Survey](https://arxiv.org/abs/1908.00080), 2019/07
16 | M.G. Sarwar Murshed, Christopher Murphy, Daqing Hou, Nazar Khan, Ganesh Ananthanarayanan, Faraz Hussain
17 | 
18 | Devices comprising the Internet of Things, such as sensors and small cameras, usually have small memories and limited computational power. The proliferation of such resource-constrained devices in recent years has led to the generation of large quantities of data. These data-producing devices are appealing targets for machine learning applications but struggle to run machine learning algorithms due to their limited computing capability. They typically offload input data to external computing systems (such as cloud servers) for further processing. The results of the machine learning computations are communicated back to the resource-scarce devices, but this worsens latency, leads to increased communication costs, and adds to privacy concerns. Therefore, efforts have been made to place additional computing devices at the edge of the network, i.e close to the IoT devices where the data is generated. Deploying machine learning systems on such edge devices alleviates the above issues by allowing computations to be performed close to the data sources. This survey describes major research efforts where machine learning has been deployed at the edge of computer networks.
19 | 
20 | 
21 | ## [Convergence of Edge Computing and Deep Learning: A Comprehensive Survey](https://arxiv.org/abs/1907.08349), 2019/07
22 | Yiwen Han, Xiaofei Wang, Victor C.M. Leung, Dusit Niyato, Xueqiang Yan, Xu Chen
23 | 
24 | Ubiquitous sensors and smart devices from factories and communities guarantee massive amounts of data and ever-increasing computing power is driving the core of computation and services from the cloud to the edge of the network. As an important enabler broadly changing people's lives, from face recognition to ambitious smart factories and cities, artificial intelligence (especially deep learning) applications and services have experienced a thriving development process. However, due to efficiency and latency issues, the current cloud computing service architecture hinders the vision of "providing artificial intelligence for every person and every organization at everywhere". Thus, recently, a better solution is unleashing deep learning services from the cloud to the edge near to data sources. Therefore, edge intelligence, aiming to facilitate the deployment of deep learning services by edge computing, has received great attention. In addition, deep learning, as the main representative of artificial intelligence, can be integrated into edge computing frameworks to build intelligent edge for dynamic, adaptive edge maintenance and management. With regard to mutually benefited edge intelligence and intelligent edge, this paper introduces and discusses: 1) the application scenarios of both; 2) the practical implementation methods and enabling technologies, namely deep learning training and inference in the customized edge computing framework; 3) existing challenges and future trends of more pervasive and fine-grained intelligence. We believe that this survey can help readers to garner information scattered across the communication, networking, and deep learning, understand the connections between enabling technologies, and promotes further discussions on the fusion of edge intelligence and intelligent edge.
25 | 
26 | 
27 | ## [On-Device Neural Net Inference with Mobile GPUs](https://arxiv.org/abs/1907.01989), 2019/07
28 | Juhyun Lee, Nikolay Chirkov, Ekaterina Ignasheva, Yury Pisarchyk, Mogan Shieh, Fabio Riccardi, Raman Sarokin, Andrei Kulik, Matthias Grundmann
29 | 
30 | On-device inference of machine learning models for mobile phones is desirable due to its lower latency and increased privacy. Running such a compute-intensive task solely on the mobile CPU, however, can be difficult due to limited computing power, thermal constraints, and energy consumption. App developers and researchers have begun exploiting hardware accelerators to overcome these challenges. Recently, device manufacturers are adding neural processing units into high-end phones for on-device inference, but these account for only a small fraction of hand-held devices. In this paper, we present how we leverage the mobile GPU, a ubiquitous hardware accelerator on virtually every phone, to run inference of deep neural networks in real-time for both Android and iOS devices. By describing our architecture, we also discuss how to design networks that are mobile GPU-friendly. Our state-of-the-art mobile GPU inference engine is integrated into the open-source project TensorFlow Lite and publicly available at [https://tensorflow.org/lite](https://tensorflow.org/lite).
31 | 
32 | 
33 | ## [Edge Intelligence: Paving the Last Mile of Artificial Intelligence with Edge Computing](https://arxiv.org/abs/1905.10083), 2019/05
34 | Zhi Zhou, Xu Chen, En Li, Liekang Zeng, Ke Luo, Junshan Zhang
35 | 
36 | With the breakthroughs in deep learning, the recent years have witnessed a booming of artificial intelligence (AI) applications and services, spanning from personal assistant to recommendation systems to video/audio surveillance. More recently, with the proliferation of mobile computing and Internet-of-Things (IoT), billions of mobile and IoT devices are connected to the Internet, generating zillions Bytes of data at the network edge. Driving by this trend, there is an urgent need to push the AI frontiers to the network edge so as to fully unleash the potential of the edge big data. To meet this demand, edge computing, an emerging paradigm that pushes computing tasks and services from the network core to the network edge, has been widely recognized as a promising solution. The resulted new inter-discipline, edge AI or edge intelligence, is beginning to receive a tremendous amount of interest. However, research on edge intelligence is still in its infancy stage, and a dedicated venue for exchanging the recent advances of edge intelligence is highly desired by both the computer system and artificial intelligence communities. To this end, we conduct a comprehensive survey of the recent research efforts on edge intelligence. Specifically, we first review the background and motivation for artificial intelligence running at the network edge. We then provide an overview of the overarching architectures, frameworks and emerging key technologies for deep learning model towards training/inference at the network edge. Finally, we discuss future research opportunities on edge intelligence. We believe that this survey will elicit escalating attentions, stimulate fruitful discussions and inspire further research ideas on edge intelligence.
37 | 
38 | 
39 | ## [Deep Learning on Mobile Devices - A Review](https://arxiv.org/abs/1904.09274), 2019/03
40 | Yunbin Deng
41 | 
42 | Recent breakthroughs in deep learning and artificial intelligence technologies have enabled numerous mobile applications. While traditional computation paradigms rely on mobile sensing and cloud computing, deep learning implemented on mobile devices provides several advantages. These advantages include low communication bandwidth, small cloud computing resource cost, quick response time, and improved data privacy. Research and development of deep learning on mobile and embedded devices has recently attracted much attention. This paper provides a timely review of this fast-paced field to give the researcher, engineer, practitioner, and graduate student a quick grasp on the recent advancements of deep learning on mobile devices. In this paper, we discuss hardware architectures for mobile deep learning, including Field Programmable Gate Arrays, Application Specific Integrated Circuit, and recent mobile Graphic Processing Units. We present Size, Weight, Area and Power considerations and their relation to algorithm optimizations, such as quantization, pruning, compression, and approximations that simplify computation while retaining performance accuracy. We cover existing systems and give a state-of-the-industry review of TensorFlow, MXNet, Mobile AI Compute Engine, and Paddle-mobile deep learning platform. We discuss resources for mobile deep learning practitioners, including tools, libraries, models, and performance benchmarks. We present applications of various mobile sensing modalities to industries, ranging from robotics, healthcare and multi-media, biometrics to autonomous drive and defense. We address the key deep learning challenges to overcome, including low quality data, and small training/adaptation data sets. In addition, the review provides numerous citations and links to existing code bases implementing various technologies.
43 | 
44 | 
45 | ## [Wireless Network Intelligence at the Edge](https://arxiv.org/abs/1812.02858), 2018/12
46 | Jihong Park, Sumudu Samarakoon, Mehdi Bennis, Mérouane Debbah
47 | 
48 | Fueled by the availability of more data and computing power, recent breakthroughs in cloud-based machine learning (ML) have transformed every aspect of our lives from face recognition and medical diagnosis to natural language processing. However, classical ML exerts severe demands in terms of energy, memory and computing resources, limiting their adoption for resource constrained edge devices. The new breed of intelligent devices and high-stake applications (drones, augmented/virtual reality, autonomous systems, etc.), requires a novel paradigm change calling for distributed, low-latency and reliable ML at the wireless network edge (referred to as edge ML). In edge ML, training data is unevenly distributed over a large number of edge nodes, which have access to a tiny fraction of the data. Moreover training and inference is carried out collectively over wireless links, where edge devices communicate and exchange their learned models (not their private data). In a first of its kind, this article explores key building blocks of edge ML, different neural network architectural splits and their inherent tradeoffs, as well as theoretical and technical enablers stemming from a wide range of mathematical disciplines. Finally, several case studies pertaining to various high-stake applications are presented demonstrating the effectiveness of edge ML in unlocking the full potential of 5G and beyond.
49 | 
50 | 
51 | ## [Machine Learning at Facebook:Understanding Inference at the Edge](https://research.fb.com/wp-content/uploads/2018/12/Machine-Learning-at-Facebook-Understanding-Inference-at-the-Edge.pdf), 2018/12
52 | Carole-Jean Wu, David Brooks, Kevin Chen, Douglas Chen, Sy Choudhury, Marat Dukhan,Kim Hazelwood, Eldad Isaac, Yangqing Jia, Bill Jia, Tommer Leyvand, Hao Lu, Yang Lu, Lin Qiao,Brandon Reagen, Joe Spisak, Fei Sun, Andrew Tulloch, Peter Vajda, Xiaodong Wang,Yanghan Wang, Bram Wasti, Yiming Wu, Ran Xian, Sungjoo Yoo, Peizhao Zhang
53 | 
54 | At Facebook, machine learning provides a wide range ofcapabilities that drive many aspects of user experienceincluding ranking posts, content understanding, objectdetection and tracking for augmented and virtual real-ity, speech and text translations.  While machine learn-ing  models  are  currently  trained  on  customized  data-center infrastructure, Facebook is working to bring ma-chine learning inference to the edge.  By doing so, userexperience is improved with reduced latency (inferencetime) and becomes less dependent on network connec-tivity.  Furthermore, this also enables many more appli-cations  of  deep  learning  with  important  features  onlymade available at the edge.  This paper takes a data-driven  approach  to  present  the  opportunities  and  de-sign  challenges  faced  by  Facebook  in  order  to  enablemachine learning inference locally on smartphones andother edge platforms.
55 | 
56 | 
57 | 


--------------------------------------------------------------------------------
/Papers/Quantization/data.yaml:
--------------------------------------------------------------------------------
  1 | # QUANTIZATION
  2 | #
  3 | # `date` format is as following yyy/mm/dd. For example 6 May 2019 would be 2019/05/06.
  4 | # In case of arxiv, use the date of the first version of paper.
  5 | #
  6 | # [Template]
  7 | #
  8 | # -
  9 | #   name:
 10 | #   url:
 11 | #   date:
 12 | #   conference:
 13 | #   code:
 14 | #   authors:
 15 | #   abstract:
 16 | ---
 17 | 
 18 | -
 19 |   name: >
 20 |     And the Bit Goes Down: Revisiting the Quantization of Neural Networks
 21 |   url: https://arxiv.org/abs/1907.05686
 22 |   date: 2019/07/12
 23 |   conference:
 24 |   code: https://github.com/facebookresearch/kill-the-bits
 25 |   authors: Pierre Stock, Armand Joulin, Rémi Gribonval, Benjamin Graham, Hervé Jégou
 26 |   abstract: >
 27 |     In this paper, we address the problem of reducing the memory footprint of ResNet-like convolutional network architectures. We introduce a vector quantization method that aims at preserving the quality of the reconstruction of the network outputs and not its weights. The advantage of our approach is that it minimizes the loss reconstruction error for in-domain inputs and does not require any labelled data. We also use byte-aligned codebooks to produce compressed networks with efficient inference on CPU. We validate our approach by quantizing a high performing ResNet-50 model to a memory size of 5 MB (20x compression factor) while preserving a top-1 accuracy of 76.1% on ImageNet object classification and by compressing a Mask R-CNN with a size budget around 6 MB.
 28 | 
 29 | -
 30 |   name: >
 31 |     A Quantization-Friendly Separable Convolution for MobileNets
 32 |   url: https://arxiv.org/abs/1803.08607
 33 |   date: 2018/03/22
 34 |   conference:
 35 |   code:
 36 |   authors: Tao Sheng, Chen Feng, Shaojie Zhuo, Xiaopeng Zhang, Liang Shen, Mickey Aleksic
 37 |   abstract: >
 38 |     As deep learning (DL) is being rapidly pushed to edge computing, researchers invented various ways to make inference computation more efficient on mobile/IoT devices, such as network pruning, parameter compression, and etc. Quantization, as one of the key approaches, can effectively offload GPU, and make it possible to deploy DL on fixed-point pipeline. Unfortunately, not all existing networks design are friendly to quantization. For example, the popular lightweight MobileNetV1, while it successfully reduces parameter size and computation latency with separable convolution, our experiment shows its quantized models have large accuracy gap against its float point models. To resolve this, we analyzed the root cause of quantization loss and proposed a quantization-friendly separable convolution architecture. By evaluating the image classification task on ImageNet2012 dataset, our modified MobileNetV1 model can archive 8-bit inference top-1 accuracy in 68.03%, almost closed the gap to the float pipeline.
 39 | 
 40 | -
 41 |   name: >
 42 |     DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients
 43 |   url: https://arxiv.org/abs/1606.06160
 44 |   date: 2016/06/20
 45 |   conference:
 46 |   code:
 47 |   authors: Shuchang Zhou, Yuxin Wu, Zekun Ni, Xinyu Zhou, He Wen, Yuheng Zou
 48 |   abstract: >
 49 |     We propose DoReFa-Net, a method to train convolutional neural networks that have low bitwidth weights and activations using low bitwidth parameter gradients. In particular, during backward pass, parameter gradients are stochastically quantized to low bitwidth numbers before being propagated to convolutional layers. As convolutions during forward/backward passes can now operate on low bitwidth weights and activations/gradients respectively, DoReFa-Net can use bit convolution kernels to accelerate both training and inference. Moreover, as bit convolutions can be efficiently implemented on CPU, FPGA, ASIC and GPU, DoReFa-Net opens the way to accelerate training of low bitwidth neural network on these hardware. Our experiments on SVHN and ImageNet datasets prove that DoReFa-Net can achieve comparable prediction accuracy as 32-bit counterparts. For example, a DoReFa-Net derived from AlexNet that has 1-bit weights, 2-bit activations, can be trained from scratch using 6-bit gradients to get 46.1\% top-1 accuracy on ImageNet validation set. The DoReFa-Net AlexNet model is released publicly.
 50 | 
 51 | -
 52 |   name: >
 53 |     Data-Free Quantization through Weight Equalization and Bias Correction
 54 |   url: https://arxiv.org/abs/1906.04721
 55 |   date: 2019/06/11
 56 |   conference:
 57 |   code:
 58 |   authors: Markus Nagel, Mart van Baalen, Tijmen Blankevoort, Max Welling
 59 |   abstract: >
 60 |     We introduce a data-free quantization method for deep neural networks that does not require fine-tuning or hyperparameter selection. It achieves near-original model performance on common computer vision architectures and tasks. 8-bit fixed-point quantization is essential for efficient inference in modern deep learning hardware architectures. However, quantizing models to run in 8-bit is a non-trivial task, frequently leading to either significant performance reduction or engineering time spent on training a network to be amenable to quantization. Our approach relies on equalizing the weight ranges in the network by making use of a scale-equivariance property of activation functions. In addition the method corrects biases in the error that are introduced during quantization. This improves quantization accuracy performance, and can be applied ubiquitously to almost any model with a straight-forward API call. For common architectures, such as the MobileNet family, we achieve state-of-the-art quantized model performance. We further show that the method also extends to other computer vision architectures and tasks such as semantic segmentation and object detection.
 61 | 
 62 | -
 63 |   name: >
 64 |     Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations
 65 |   url: https://arxiv.org/abs/1609.07061
 66 |   date: 2016/09/22
 67 |   conference:
 68 |   authors: Itay Hubara, Matthieu Courbariaux, Daniel Soudry, Ran El-Yaniv, Yoshua Bengio
 69 |   abstract: >
 70 |     We introduce a method to train Quantized Neural Networks (QNNs) --- neural networks with extremely low precision (e.g., 1-bit) weights and activations, at run-time. At train-time the quantized weights and activations are used for computing the parameter gradients. During the forward pass, QNNs drastically reduce memory size and accesses, and replace most arithmetic operations with bit-wise operations. As a result, power consumption is expected to be drastically reduced. We trained QNNs over the MNIST, CIFAR-10, SVHN and ImageNet datasets. The resulting QNNs achieve prediction accuracy comparable to their 32-bit counterparts. For example, our quantized version of AlexNet with 1-bit weights and 2-bit activations achieves 51% top-1 accuracy. Moreover, we quantize the parameter gradients to 6-bits as well which enables gradients computation using only bit-wise operation. Quantized recurrent neural networks were tested over the Penn Treebank dataset, and achieved comparable accuracy as their 32-bit counterparts using only 4-bits. Last but not least, we programmed a binary matrix multiplication GPU kernel with which it is possible to run our MNIST QNN 7 times faster than with an unoptimized GPU kernel, without suffering any loss in classification accuracy. The QNN code is available online.
 71 | 
 72 | -
 73 |   name: Quantized Convolutional Neural Networks for Mobile Devices
 74 |   url: https://arxiv.org/abs/1512.06473
 75 |   date: 2015/12/21
 76 |   conference: CVPR 2016
 77 |   authors: Jiaxiang Wu, Cong Leng, Yuhang Wang, Qinghao Hu, Jian Cheng
 78 |   abstract: >
 79 |     Recently, convolutional neural networks (CNN) have demonstrated impressive performance in various computer vision tasks. However, high performance hardware is typically indispensable for the application of CNN models due to the high computation complexity, which prohibits their further extensions. In this paper, we propose an efficient framework, namely Quantized CNN, to simultaneously speed-up the computation and reduce the storage and memory overhead of CNN models. Both filter kernels in convolutional layers and weighting matrices in fully-connected layers are quantized, aiming at minimizing the estimation error of each layer's response. Extensive experiments on the ILSVRC-12 benchmark demonstrate 4~6x speed-up and 15~20x compression with merely one percentage loss of classification accuracy. With our quantized CNN model, even mobile devices can accurately classify images within one second.
 80 | 
 81 | -
 82 |   name: >
 83 |     Quantizing deep convolutional networks for efficient inference: A whitepaper
 84 |   url: https://arxiv.org/abs/1806.08342
 85 |   date: 2018/06/21
 86 |   conference:
 87 |   authors: Raghuraman Krishnamoorthi
 88 |   abstract: >
 89 |     We present an overview of techniques for quantizing convolutional neural networks for inference with integer weights and activations. Per-channel quantization of weights and per-layer quantization of activations to 8-bits of precision post-training produces classification accuracies within 2% of floating point networks for a wide variety of CNN architectures. Model sizes can be reduced by a factor of 4 by quantizing weights to 8-bits, even when 8-bit arithmetic is not supported. This can be achieved with simple, post training quantization of weights.We benchmark latencies of quantized networks on CPUs and DSPs and observe a speedup of 2x-3x for quantized implementations compared to floating point on CPUs. Speedups of up to 10x are observed on specialized processors with fixed point SIMD capabilities, like the Qualcomm QDSPs with HVX.
 90 |     Quantization-aware training can provide further improvements, reducing the gap to floating point to 1% at 8-bit precision. Quantization-aware training also allows for reducing the precision of weights to four bits with accuracy losses ranging from 2% to 10%, with higher accuracy drop for smaller networks.We introduce tools in TensorFlow and TensorFlowLite for quantizing convolutional networks and review best practices for quantization-aware training to obtain high accuracy with quantized weights and activations. We recommend that per-channel quantization of weights and per-layer quantization of activations be the preferred quantization scheme for hardware acceleration and kernel optimization. We also propose that future processors and hardware accelerators for optimized inference support precisions of 4, 8 and 16 bits.
 91 | 
 92 | -
 93 |   name: >
 94 |     Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference
 95 |   url: https://arxiv.org/abs/1712.05877
 96 |   date: 2017/11/15
 97 |   conference:
 98 |   authors: Benoit Jacob, Skirmantas Kligys, Bo Chen, Menglong Zhu, Matthew Tang, Andrew Howard, Hartwig Adam, Dmitry Kalenichenko
 99 |   abstract: >
100 |     The rising popularity of intelligent mobile devices and the daunting computational cost of deep learning-based models call for efficient and accurate on-device inference schemes. We propose a quantization scheme that allows inference to be carried out using integer-only arithmetic, which can be implemented more efficiently than floating point inference on commonly available integer-only hardware. We also co-design a training procedure to preserve end-to-end model accuracy post quantization. As a result, the proposed quantization scheme improves the tradeoff between accuracy and on-device latency. The improvements are significant even on MobileNets, a model family known for run-time efficiency, and are demonstrated in ImageNet classification and COCO detection on popular CPUs.
101 | 
102 | -
103 |   name: >
104 |     HAQ: Hardware-Aware Automated Quantization with Mixed Precision
105 |   url: https://arxiv.org/abs/1811.08886
106 |   date: 2018/11/21
107 |   conference: CVPR 2019
108 |   code: https://github.com/mit-han-lab/haq-release
109 |   authors: Kuan Wang, Zhijian Liu, Yujun Lin, Ji Lin, Song Han
110 |   abstract: >
111 |     Model quantization is a widely used technique to compress and accelerate deep neural network (DNN) inference. Emergent DNN hardware accelerators begin to support mixed precision (1-8 bits) to further improve the computation efficiency, which raises a great challenge to find the optimal bitwidth for each layer: it requires domain experts to explore the vast design space trading off among accuracy, latency, energy, and model size, which is both time-consuming and sub-optimal. Conventional quantization algorithm ignores the different hardware architectures and quantizes all the layers in a uniform way. In this paper, we introduce the Hardware-Aware Automated Quantization (HAQ) framework which leverages the reinforcement learning to automatically determine the quantization policy, and we take the hardware accelerator's feedback in the design loop. Rather than relying on proxy signals such as FLOPs and model size, we employ a hardware simulator to generate direct feedback signals (latency and energy) to the RL agent. Compared with conventional methods, our framework is fully automated and can specialize the quantization policy for different neural network architectures and hardware architectures. Our framework effectively reduced the latency by 1.4-1.95x and the energy consumption by 1.9x with negligible loss of accuracy compared with the fixed bitwidth (8 bits) quantization. Our framework reveals that the optimal policies on different hardware architectures (i.e., edge and cloud architectures) under different resource constraints (i.e., latency, energy and model size) are drastically different. We interpreted the implication of different quantization policies, which offer insights for both neural network architecture design and hardware architecture design.
112 | 
113 | ...
114 | 


--------------------------------------------------------------------------------
/Papers/Others/data.yaml:
--------------------------------------------------------------------------------
  1 | # OTHERS
  2 | #
  3 | # `date` format is as following yyy/mm/dd. For example 6 May 2019 would be 2019/05/06.
  4 | # In case of arxiv, use the date of the first version of paper.
  5 | #
  6 | # [Template]
  7 | #
  8 | # -
  9 | #   name:
 10 | #   url:
 11 | #   date:
 12 | #   conference:
 13 | #   code:
 14 | #   authors:
 15 | #   abstract:
 16 | ---
 17 | 
 18 | -
 19 |   name: >
 20 |     Machine Learning at Facebook:Understanding Inference at the Edge
 21 |   url: https://research.fb.com/wp-content/uploads/2018/12/Machine-Learning-at-Facebook-Understanding-Inference-at-the-Edge.pdf
 22 |   date: 2018/12/01
 23 |   conference:
 24 |   code:
 25 |   authors: Carole-Jean Wu, David Brooks, Kevin Chen, Douglas Chen, Sy Choudhury, Marat Dukhan,Kim Hazelwood, Eldad Isaac, Yangqing Jia, Bill Jia, Tommer Leyvand, Hao Lu, Yang Lu, Lin Qiao,Brandon Reagen, Joe Spisak, Fei Sun, Andrew Tulloch, Peter Vajda, Xiaodong Wang,Yanghan Wang, Bram Wasti, Yiming Wu, Ran Xian, Sungjoo Yoo, Peizhao Zhang
 26 |   abstract: >
 27 |     At Facebook, machine learning provides a wide range ofcapabilities that drive many aspects of user experienceincluding ranking posts, content understanding, objectdetection and tracking for augmented and virtual real-ity, speech and text translations.  While machine learn-ing  models  are  currently  trained  on  customized  data-center infrastructure, Facebook is working to bring ma-chine learning inference to the edge.  By doing so, userexperience is improved with reduced latency (inferencetime) and becomes less dependent on network connec-tivity.  Furthermore, this also enables many more appli-cations  of  deep  learning  with  important  features  onlymade available at the edge.  This paper takes a data-driven  approach  to  present  the  opportunities  and  de-sign  challenges  faced  by  Facebook  in  order  to  enablemachine learning inference locally on smartphones andother edge platforms.
 28 | 
 29 | -
 30 |   name: >
 31 |     On-Device Neural Net Inference with Mobile GPUs
 32 |   url: https://arxiv.org/abs/1907.01989
 33 |   date: 2019/07/03
 34 |   conference:
 35 |   code:
 36 |   authors: Juhyun Lee, Nikolay Chirkov, Ekaterina Ignasheva, Yury Pisarchyk, Mogan Shieh, Fabio Riccardi, Raman Sarokin, Andrei Kulik, Matthias Grundmann
 37 |   abstract: >
 38 |     On-device inference of machine learning models for mobile phones is desirable due to its lower latency and increased privacy. Running such a compute-intensive task solely on the mobile CPU, however, can be difficult due to limited computing power, thermal constraints, and energy consumption. App developers and researchers have begun exploiting hardware accelerators to overcome these challenges. Recently, device manufacturers are adding neural processing units into high-end phones for on-device inference, but these account for only a small fraction of hand-held devices. In this paper, we present how we leverage the mobile GPU, a ubiquitous hardware accelerator on virtually every phone, to run inference of deep neural networks in real-time for both Android and iOS devices. By describing our architecture, we also discuss how to design networks that are mobile GPU-friendly. Our state-of-the-art mobile GPU inference engine is integrated into the open-source project TensorFlow Lite and publicly available at [https://tensorflow.org/lite](https://tensorflow.org/lite).
 39 | 
 40 | -
 41 |   name: >
 42 |     Deep Learning on Mobile Devices - A Review
 43 |   url: https://arxiv.org/abs/1904.09274
 44 |   date: 2019/03/21
 45 |   conference:
 46 |   code:
 47 |   authors: Yunbin Deng
 48 |   abstract: >
 49 |     Recent breakthroughs in deep learning and artificial intelligence technologies have enabled numerous mobile applications. While traditional computation paradigms rely on mobile sensing and cloud computing, deep learning implemented on mobile devices provides several advantages. These advantages include low communication bandwidth, small cloud computing resource cost, quick response time, and improved data privacy. Research and development of deep learning on mobile and embedded devices has recently attracted much attention. This paper provides a timely review of this fast-paced field to give the researcher, engineer, practitioner, and graduate student a quick grasp on the recent advancements of deep learning on mobile devices. In this paper, we discuss hardware architectures for mobile deep learning, including Field Programmable Gate Arrays, Application Specific Integrated Circuit, and recent mobile Graphic Processing Units. We present Size, Weight, Area and Power considerations and their relation to algorithm optimizations, such as quantization, pruning, compression, and approximations that simplify computation while retaining performance accuracy. We cover existing systems and give a state-of-the-industry review of TensorFlow, MXNet, Mobile AI Compute Engine, and Paddle-mobile deep learning platform. We discuss resources for mobile deep learning practitioners, including tools, libraries, models, and performance benchmarks. We present applications of various mobile sensing modalities to industries, ranging from robotics, healthcare and multi-media, biometrics to autonomous drive and defense. We address the key deep learning challenges to overcome, including low quality data, and small training/adaptation data sets. In addition, the review provides numerous citations and links to existing code bases implementing various technologies.
 50 | 
 51 | -
 52 |   name: >
 53 |     Convergence of Edge Computing and Deep Learning: A Comprehensive Survey
 54 |   url: https://arxiv.org/abs/1907.08349
 55 |   date: 2019/07/19
 56 |   conference:
 57 |   code:
 58 |   authors: Yiwen Han, Xiaofei Wang, Victor C.M. Leung, Dusit Niyato, Xueqiang Yan, Xu Chen
 59 |   abstract: >
 60 |     Ubiquitous sensors and smart devices from factories and communities guarantee massive amounts of data and ever-increasing computing power is driving the core of computation and services from the cloud to the edge of the network. As an important enabler broadly changing people's lives, from face recognition to ambitious smart factories and cities, artificial intelligence (especially deep learning) applications and services have experienced a thriving development process. However, due to efficiency and latency issues, the current cloud computing service architecture hinders the vision of "providing artificial intelligence for every person and every organization at everywhere". Thus, recently, a better solution is unleashing deep learning services from the cloud to the edge near to data sources. Therefore, edge intelligence, aiming to facilitate the deployment of deep learning services by edge computing, has received great attention. In addition, deep learning, as the main representative of artificial intelligence, can be integrated into edge computing frameworks to build intelligent edge for dynamic, adaptive edge maintenance and management. With regard to mutually benefited edge intelligence and intelligent edge, this paper introduces and discusses: 1) the application scenarios of both; 2) the practical implementation methods and enabling technologies, namely deep learning training and inference in the customized edge computing framework; 3) existing challenges and future trends of more pervasive and fine-grained intelligence. We believe that this survey can help readers to garner information scattered across the communication, networking, and deep learning, understand the connections between enabling technologies, and promotes further discussions on the fusion of edge intelligence and intelligent edge.
 61 | 
 62 | -
 63 |   name: >
 64 |     Machine Learning at the Network Edge: A Survey
 65 |   url: https://arxiv.org/abs/1908.00080
 66 |   date: 2019/07/31
 67 |   conference:
 68 |   code:
 69 |   authors: M.G. Sarwar Murshed, Christopher Murphy, Daqing Hou, Nazar Khan, Ganesh Ananthanarayanan, Faraz Hussain
 70 |   abstract: >
 71 |     Devices comprising the Internet of Things, such as sensors and small cameras, usually have small memories and limited computational power. The proliferation of such resource-constrained devices in recent years has led to the generation of large quantities of data. These data-producing devices are appealing targets for machine learning applications but struggle to run machine learning algorithms due to their limited computing capability. They typically offload input data to external computing systems (such as cloud servers) for further processing. The results of the machine learning computations are communicated back to the resource-scarce devices, but this worsens latency, leads to increased communication costs, and adds to privacy concerns. Therefore, efforts have been made to place additional computing devices at the edge of the network, i.e close to the IoT devices where the data is generated. Deploying machine learning systems on such edge devices alleviates the above issues by allowing computations to be performed close to the data sources. This survey describes major research efforts where machine learning has been deployed at the edge of computer networks.
 72 | 
 73 | -
 74 |   name: >
 75 |     Distributed Machine Learning on Mobile Devices: A Survey
 76 |   url: https://arxiv.org/abs/1909.08329
 77 |   date: 2019/09/18
 78 |   conference:
 79 |   code:
 80 |   authors: Renjie Gu, Shuo Yang, Fan Wu
 81 |   abstract: >
 82 |     In recent years, mobile devices have gained increasingly development with stronger computation capability and larger storage. Some of the computation-intensive machine learning and deep learning tasks can now be run on mobile devices. To take advantage of the resources available on mobile devices and preserve users' privacy, the idea of mobile distributed machine learning is proposed. It uses local hardware resources and local data to solve machine learning sub-problems on mobile devices, and only uploads computation results instead of original data to contribute to the optimization of the global model. This architecture can not only relieve computation and storage burden on servers, but also protect the users' sensitive information. Another benefit is the bandwidth reduction, as various kinds of local data can now participate in the training process without being uploaded to the server. In this paper, we provide a comprehensive survey on recent studies of mobile distributed machine learning. We survey a number of widely-used mobile distributed machine learning methods. We also present an in-depth discussion on the challenges and future directions in this area. We believe that this survey can demonstrate a clear overview of mobile distributed machine learning and provide guidelines on applying mobile distributed machine learning to real applications.
 83 | 
 84 | -
 85 |   name: >
 86 |     Edge Intelligence: Paving the Last Mile of Artificial Intelligence with Edge Computing
 87 |   url: https://arxiv.org/abs/1905.10083
 88 |   date: 2019/05/24
 89 |   conference:
 90 |   code:
 91 |   authors: Zhi Zhou, Xu Chen, En Li, Liekang Zeng, Ke Luo, Junshan Zhang
 92 |   abstract: >
 93 |     With the breakthroughs in deep learning, the recent years have witnessed a booming of artificial intelligence (AI) applications and services, spanning from personal assistant to recommendation systems to video/audio surveillance. More recently, with the proliferation of mobile computing and Internet-of-Things (IoT), billions of mobile and IoT devices are connected to the Internet, generating zillions Bytes of data at the network edge. Driving by this trend, there is an urgent need to push the AI frontiers to the network edge so as to fully unleash the potential of the edge big data. To meet this demand, edge computing, an emerging paradigm that pushes computing tasks and services from the network core to the network edge, has been widely recognized as a promising solution. The resulted new inter-discipline, edge AI or edge intelligence, is beginning to receive a tremendous amount of interest. However, research on edge intelligence is still in its infancy stage, and a dedicated venue for exchanging the recent advances of edge intelligence is highly desired by both the computer system and artificial intelligence communities. To this end, we conduct a comprehensive survey of the recent research efforts on edge intelligence. Specifically, we first review the background and motivation for artificial intelligence running at the network edge. We then provide an overview of the overarching architectures, frameworks and emerging key technologies for deep learning model towards training/inference at the network edge. Finally, we discuss future research opportunities on edge intelligence. We believe that this survey will elicit escalating attentions, stimulate fruitful discussions and inspire further research ideas on edge intelligence.
 94 | 
 95 | -
 96 |   name: >
 97 |     Wireless Network Intelligence at the Edge
 98 |   url: https://arxiv.org/abs/1812.02858
 99 |   date: 2018/12/07
100 |   conference:
101 |   code:
102 |   authors: Jihong Park, Sumudu Samarakoon, Mehdi Bennis, Mérouane Debbah
103 |   abstract: >
104 |     Fueled by the availability of more data and computing power, recent breakthroughs in cloud-based machine learning (ML) have transformed every aspect of our lives from face recognition and medical diagnosis to natural language processing. However, classical ML exerts severe demands in terms of energy, memory and computing resources, limiting their adoption for resource constrained edge devices. The new breed of intelligent devices and high-stake applications (drones, augmented/virtual reality, autonomous systems, etc.), requires a novel paradigm change calling for distributed, low-latency and reliable ML at the wireless network edge (referred to as edge ML). In edge ML, training data is unevenly distributed over a large number of edge nodes, which have access to a tiny fraction of the data. Moreover training and inference is carried out collectively over wireless links, where edge devices communicate and exchange their learned models (not their private data). In a first of its kind, this article explores key building blocks of edge ML, different neural network architectural splits and their inherent tradeoffs, as well as theoretical and technical enablers stemming from a wide range of mathematical disciplines. Finally, several case studies pertaining to various high-stake applications are presented demonstrating the effectiveness of edge ML in unlocking the full potential of 5G and beyond.
105 | 
106 | ...
107 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
  1 | # Awesome Edge Machine Learning
  2 | [![Awesome](https://awesome.re/badge-flat2.svg)](https://awesome.re)
  3 | 
  4 | A curated list of awesome edge machine learning resources, including research papers, inference engines, challenges, books, meetups and others.
  5 | 
  6 | ## Table of Contents
  7 | - [Papers](https://github.com/bisonai/awesome-edge-machine-learning#papers)
  8 | 	- [Applications](https://github.com/bisonai/awesome-edge-machine-learning/tree/master/Papers/Applications)
  9 | 	- [AutoML](https://github.com/bisonai/awesome-edge-machine-learning/tree/master/Papers/AutoML)
 10 | 	- [Efficient Architectures](https://github.com/bisonai/awesome-edge-machine-learning/tree/master/Papers/Efficient_Architectures)
 11 | 	- [Federated Learning](https://github.com/bisonai/awesome-edge-machine-learning/tree/master/Papers/Federated_Learning)
 12 | 	- [ML Algorithms For Edge](https://github.com/bisonai/awesome-edge-machine-learning/tree/master/Papers/ML_Algorithms_For_Edge)
 13 | 	- [Network Pruning](https://github.com/bisonai/awesome-edge-machine-learning/tree/master/Papers/Network_Pruning)
 14 | 	- [Others](https://github.com/bisonai/awesome-edge-machine-learning/tree/master/Papers/Others)
 15 | 	- [Quantization](https://github.com/bisonai/awesome-edge-machine-learning/tree/master/Papers/Quantization)
 16 | - [Datasets](https://github.com/bisonai/awesome-edge-machine-learning#datasets)
 17 | - [Inference Engines](https://github.com/bisonai/awesome-edge-machine-learning#inference-engines)
 18 | - [MCU and MPU Software Packages](https://github.com/bisonai/awesome-edge-machine-learning#mcu-and-mpu-software-packages)
 19 | - [AI Chips](https://github.com/bisonai/awesome-edge-machine-learning#ai-chips)
 20 | - [Books](https://github.com/bisonai/awesome-edge-machine-learning#books)
 21 | - [Challenges](https://github.com/bisonai/awesome-edge-machine-learning#challenges)
 22 | - [Other Resources](https://github.com/bisonai/awesome-edge-machine-learning#other-resources)
 23 | - [Contribute](https://github.com/bisonai/awesome-edge-machine-learning#contribute)
 24 | - [LicenseBlock](https://github.com/bisonai/awesome-edge-machine-learning#licenseblock)
 25 | 
 26 | ## Papers
 27 | ### [Applications](https://github.com/bisonai/awesome-edge-machine-learning/tree/master/Papers/Applications)
 28 | There is a countless number of possible edge machine learning applications. Here, we collect papers that describe specific solutions.
 29 | 
 30 | 
 31 | ### [AutoML](https://github.com/bisonai/awesome-edge-machine-learning/tree/master/Papers/AutoML)
 32 | Automated machine learning (AutoML) is the process of automating the end-to-end process of applying machine learning to real-world problems.<sup><a href="https://en.wikipedia.org/wiki/Automated_machine_learning" targe="_blank">Wikipedia</a></sup> AutoML is for example used to design new efficient neural architectures with a constraint on a computational budget (defined either as a number of FLOPS or as an inference time measured on real device) or a size of the architecture.
 33 | 
 34 | 
 35 | ### [Efficient Architectures](https://github.com/bisonai/awesome-edge-machine-learning/tree/master/Papers/Efficient_Architectures)
 36 | Efficient architectures represent neural networks with small memory footprint and fast inference time when measured on edge devices.
 37 | 
 38 | 
 39 | ### [Federated Learning](https://github.com/bisonai/awesome-edge-machine-learning/tree/master/Papers/Federated_Learning)
 40 | Federated Learning enables mobile phones to collaboratively learn a shared prediction model while keeping all the training data on device, decoupling the ability to do machine learning from the need to store the data in the cloud.<sup><a href="https://ai.googleblog.com/2017/04/federated-learning-collaborative.html" target="_blank">Google AI blog: Federated Learning</a></sup>
 41 | 
 42 | 
 43 | ### [ML Algorithms For Edge](https://github.com/bisonai/awesome-edge-machine-learning/tree/master/Papers/ML_Algorithms_For_Edge)
 44 | Standard machine learning algorithms are not always able to run on edge devices due to large computational requirements and space complexity. This section introduces optimized machine learning algorithms.
 45 | 
 46 | 
 47 | ### [Network Pruning](https://github.com/bisonai/awesome-edge-machine-learning/tree/master/Papers/Network_Pruning)
 48 | Pruning is a common method to derive a compact network – after training, some structural portion of the parameters is removed, along with its associated computations.<sup><a href="http://jankautz.com/publications/Importance4NNPruning_CVPR19.pdf" target="_blank">Importance Estimation for Neural Network Pruning</a></sup>
 49 | 
 50 | 
 51 | ### [Others](https://github.com/bisonai/awesome-edge-machine-learning/tree/master/Papers/Others)
 52 | This section contains papers that are related to edge machine learning but are not part of any major group. These papers often deal with deployment issues (i.e. optimizing inference on target platform).
 53 | 
 54 | 
 55 | ### [Quantization](https://github.com/bisonai/awesome-edge-machine-learning/tree/master/Papers/Quantization)
 56 | Quantization is the process of reducing a precision (from 32 bit floating point into lower bit depth representations) of weights and/or activations in a neural network. The advantages of this method are reduced model size and faster model inference on hardware that support arithmetic operations in lower precision.
 57 | 
 58 | 
 59 | ## Datasets
 60 | ### [Visual Wake Words Dataset](https://arxiv.org/abs/1906.05721)
 61 | Visual Wake Words represents a common microcontroller vision use-case of identifying whether a person is present in the image or not, and provides a realistic benchmark for tiny vision models. Within a limited memory footprint of 250 KB, several state-of-the-art mobile models achieve accuracy of 85-90% on the Visual Wake Words dataset.
 62 | 
 63 | 
 64 | ## Inference Engines
 65 | List of machine learning inference engines and APIs that are optimized for execution and/or training on edge devices.
 66 | 
 67 | ### Arm Compute Library
 68 | - Source code: [https://github.com/ARM-software/ComputeLibrary](https://github.com/ARM-software/ComputeLibrary)
 69 | - Arm
 70 | 
 71 | ### Bender
 72 | - Source code: [https://github.com/xmartlabs/Bender](https://github.com/xmartlabs/Bender)
 73 | - Documentation: [https://xmartlabs.github.io/Bender/](https://xmartlabs.github.io/Bender/)
 74 | - Xmartlabs
 75 | 
 76 | ### Caffe 2
 77 | - Source code: [https://github.com/pytorch/pytorch/tree/master/caffe2](https://github.com/pytorch/pytorch/tree/master/caffe2)
 78 | - Documentation: [https://caffe2.ai/](https://caffe2.ai/)
 79 | - Facebook
 80 | 
 81 | ### CoreML
 82 | - Documentation: [https://developer.apple.com/documentation/coreml](https://developer.apple.com/documentation/coreml)
 83 | - Apple
 84 | 
 85 | ### Deeplearning4j
 86 | - Documentation: [https://deeplearning4j.org/docs/latest/deeplearning4j-android](https://deeplearning4j.org/docs/latest/deeplearning4j-android)
 87 | - Skymind
 88 | 
 89 | ### Embedded Learning Library
 90 | - Source code: [https://github.com/Microsoft/ELL](https://github.com/Microsoft/ELL)
 91 | - Documentation: [https://microsoft.github.io/ELL](https://microsoft.github.io/ELL)
 92 | - Microsoft
 93 | 
 94 | ### Feather CNN
 95 | - Source code: [https://github.com/Tencent/FeatherCNN](https://github.com/Tencent/FeatherCNN)
 96 | - Tencent
 97 | 
 98 | ### MACE
 99 | - Source code: [https://github.com/XiaoMi/mace](https://github.com/XiaoMi/mace)
100 | - Documentation: [https://mace.readthedocs.io/](https://mace.readthedocs.io/)
101 | - XiaoMi
102 | 
103 | ### MNN
104 | - Source code: [https://github.com/alibaba/MNN](https://github.com/alibaba/MNN)
105 | - Alibaba
106 | 
107 | ### MXNet
108 | - Documentation: [https://mxnet.incubator.apache.org/versions/master/faq/smart_device.html](https://mxnet.incubator.apache.org/versions/master/faq/smart_device.html)
109 | - Amazon
110 | 
111 | ### NCNN
112 | - Source code: [https://github.com/tencent/ncnn](https://github.com/tencent/ncnn)
113 | - Tencent
114 | 
115 | ### Neural Networks API
116 | - Documentation: [https://developer.android.com/ndk/guides/neuralnetworks/](https://developer.android.com/ndk/guides/neuralnetworks/)
117 | - Google
118 | 
119 | ### Paddle Mobile
120 | - Source code: [https://github.com/PaddlePaddle/paddle-mobile](https://github.com/PaddlePaddle/paddle-mobile)
121 | - Baidu
122 | 
123 | ### Qualcomm Neural Processing SDK for AI
124 | - Source code: [https://developer.qualcomm.com/software/qualcomm-neural-processing-sdk](https://developer.qualcomm.com/software/qualcomm-neural-processing-sdk)
125 | - Qualcomm
126 | 
127 | ### Tengine
128 | - Source code: [https://github.com/OAID/Tengine](https://github.com/OAID/Tengine)
129 | - OAID
130 | 
131 | ### TensorFlow Lite
132 | - Source code: [https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite)
133 | - Documentation: [https://www.tensorflow.org/lite/](https://www.tensorflow.org/lite/)
134 | - Google
135 | 
136 | ### dabnn
137 | - Source code: [https://github.com/JDAI-CV/dabnn](https://github.com/JDAI-CV/dabnn)
138 | - JDAI Computer Vision
139 | 
140 | ## MCU and MPU Software Packages
141 | List of software packages for AI development on MCU and MPU
142 | 
143 | ### [FP-AI-Sensing](https://www.st.com/content/st_com/en/products/embedded-software/mcu-mpu-embedded-software/stm32-embedded-software/stm32-ode-function-pack-sw/fp-ai-sensing1.html)
144 | STM32Cube function pack for ultra-low power IoT node with artificial intelligence (AI) application based on audio and motion sensing
145 | 
146 | ### [FP-AI-VISION1](https://www.st.com/content/st_com/en/products/embedded-software/mcu-mpu-embedded-software/stm32-embedded-software/stm32cube-expansion-packages/fp-ai-vision1.html)
147 | FP-AI-VISION1 is an STM32Cube function pack featuring examples of computer vision applications based on Convolutional Neural Network (CNN)
148 | 
149 | ### [Processor SDK Linux for AM57x](http://www.ti.com/tool/SITARA-MACHINE-LEARNING)
150 | TIDL software framework leverages a highly optimized neural network implementation on TI’s Sitara AM57x processors, making use of hardware acceleration on the device
151 | 
152 | ### [X-LINUX-AI-CV](https://www.st.com/content/st_com/en/products/embedded-software/mcu-mpu-embedded-software/stm32-embedded-software/stm32-mpu-openstlinux-expansion-packages/x-linux-ai-cv.html)
153 | X-LINUX-AI-CV is an STM32 MPU OpenSTLinux Expansion Package that targets Artificial Intelligence for computer vision applications based on Convolutional Neural Network (CNN)
154 | 
155 | ### [e-AI Checker](https://www.renesas.com/jp/en/solutions/key-technology/e-ai/tool.html)
156 | Based on the output result from the translator, the ROM/RAM mounting size and the inference execution processing time are calculated while referring to the information of the selected MCU/MPU
157 | 
158 | ### [e-AI Translator](https://www.renesas.com/jp/en/solutions/key-technology/e-ai/tool.html)
159 | Tool for converting  Caffe and TensorFlow models to MCU/MPU development environment
160 | 
161 | ### [eIQ Auto deep learning (DL) toolkit](https://www.nxp.com/design/software/development-software/eiq-auto-dl-toolkit:EIQ-AUTO)
162 | The NXP eIQ™ Auto deep learning (DL) toolkit enables developers to introduce DL algorithms into their applications and to continue satisfying automotive standards
163 | 
164 | ### [eIQ ML Software Development Environment](https://www.nxp.com/design/software/development-software/eiq-ml-development-environment:EIQ)
165 | The NXP® eIQ™ machine learning software development environment enables the use of ML algorithms on NXP MCUs, i.MX RT crossover MCUs, and i.MX family SoCs. eIQ software includes inference engines, neural network compilers and optimized libraries
166 | 
167 | ### [eIQ™ Software for Arm® NN Inference Engine](https://www.nxp.com/design/software/development-software/eiq-ml-development-environment/eiq-software-for-arm-nn-inference-engine:eIQArmNN)
168 | 
169 | ### [eIQ™ for Arm® CMSIS-NN](https://www.nxp.com/design/software/development-software/eiq-ml-development-environment/eiq-for-arm-cmsis-nn:eIQArmCMSISNN)
170 | 
171 | ### [eIQ™ for Glow Neural Network Compiler](https://www.nxp.com/design/software/development-software/eiq-ml-development-environment/eiq-for-glow-neural-network-compiler:eIQ-Glow)
172 | 
173 | ### [eIQ™ for TensorFlow Lite](https://www.nxp.com/design/software/development-software/eiq-ml-development-environment/eiq-for-tensorflow-lite:eIQTensorFlowLite)
174 | 
175 | ## AI Chips
176 | List of resources about AI Chips
177 | 
178 | ### [AI Chip (ICs and IPs)](https://github.com/basicmi/AI-Chip)
179 | A list of ICs and IPs for AI, Machine Learning and Deep Learning
180 | 
181 | ## Books
182 | List of books with focus on on-device (e.g., edge or mobile) machine learning.
183 | 
184 | ### [TinyML: Machine Learning with TensorFlow on Arduino, and Ultra-Low Power Micro-Controllers](http://shop.oreilly.com/product/0636920254508.do)
185 | - Authors: Pete Warden, Daniel Situnayake
186 | - Published: 2020
187 | 
188 | ### [Machine Learning by Tutorials: Beginning machine learning for Apple and iOS](https://store.raywenderlich.com/products/machine-learning-by-tutorials)
189 | - Author: Matthijs Hollemans
190 | - Published: 2019
191 | 
192 | ### [Core ML Survival Guide](https://leanpub.com/coreml-survival-guide)
193 | - Author: Matthijs Hollemans
194 | - Published: 2018
195 | 
196 | ### [Building Mobile Applications with TensorFlow](https://www.oreilly.com/library/view/building-mobile-applications/9781491988435/)
197 | - Author: Pete Warden
198 | - Published: 2017
199 | 
200 | ## Challenges
201 | ### [Low Power Recognition Challenge (LPIRC)](https://rebootingcomputing.ieee.org/lpirc)
202 | Competition with focus on the best vision solutions that can simultaneously achieve high accuracy in computer vision and energy efficiency. LPIRC is regularly held during computer vision conferences (CVPR, ICCV and others) since 2015 and the winners’ solutions have already improved 24 times in the ratio of accuracy divided by energy.
203 | 
204 | - [Online Track](https://rebootingcomputing.ieee.org/lpirc/online-track)
205 | 
206 | - [Onsite Track](https://rebootingcomputing.ieee.org/lpirc/onsite-track)
207 | 
208 | 
209 | ## Other Resources
210 | ### [Awesome EMDL](https://github.com/EMDL/awesome-emdl)
211 | 
212 | Embedded and mobile deep learning research resources
213 | 
214 | ### [Awesome Pruning](https://github.com/he-y/Awesome-Pruning)
215 | 
216 | A curated list of neural network pruning resources
217 | 
218 | ### [Efficient DNNs](https://github.com/MingSun-Tse/EfficientDNNs)
219 | 
220 | Collection of recent methods on DNN compression and acceleration
221 | 
222 | ### [Machine Think](https://machinethink.net/)
223 | 
224 | Machine learning tutorials targeted for iOS devices
225 | 
226 | ### [Pete Warden's blog](https://petewarden.com/)
227 | 
228 | 
229 | 
230 | ## Contribute
231 | Unlike other awesome list, we are storing data in <a href="https://en.wikipedia.org/wiki/YAML">YAML</a> format and markdown files are generated with `awesome.py` script.
232 | 
233 | Every directory contains `data.yaml` which stores data we want to display and `config.yaml` which stores its metadata (e.g. way of sorting data). The way how data will be presented is defined in `renderer.py`.
234 | 
235 | 
236 | ## LicenseBlock
237 | [![CC0](http://mirrors.creativecommons.org/presskit/buttons/88x31/svg/cc-zero.svg)](https://creativecommons.org/publicdomain/zero/1.0/)
238 | 
239 | To the extent possible under law, [Bisonai](https://bisonai.com/) has waived all copyright and related or neighboring rights to this work.


--------------------------------------------------------------------------------
/Papers/README.md:
--------------------------------------------------------------------------------
  1 | # Papers
  2 | [Back to awesome edge machine learning](https://github.com/bisonai/awesome-edge-machine-learning)
  3 | 
  4 | ## Applications
  5 | 
  6 | There is a countless number of possible edge machine learning applications. Here, we collect papers that describe specific solutions.
  7 | 
  8 | - [ALBERT: A Lite BERT for Self-supervised Learning of Language Representations](https://arxiv.org/abs/1909.11942). Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut
  9 | 
 10 | - [TinyBERT: Distilling BERT for Natural Language Understanding](https://arxiv.org/abs/1909.10351). Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, Fang Wang, Qun Liu
 11 | 
 12 | - [Temporal Convolution for Real-time Keyword Spotting on Mobile Devices](https://arxiv.org/abs/1904.03814). Seungwoo Choi, Seokjun Seo, Beomjun Shin, Hyeongmin Byun, Martin Kersner, Beomsu Kim, Dongyoung Kim, Sungjoo Ha
 13 | 
 14 | - [Towards Real-Time Automatic Portrait Matting on Mobile Devices](https://arxiv.org/abs/1904.03816). Seokjun Seo, Seungwoo Choi, Martin Kersner, Beomjun Shin, Hyungsuk Yoon, Hyeongmin Byun, Sungjoo Ha
 15 | 
 16 | - [ThunderNet: Towards Real-time Generic Object Detection](https://arxiv.org/abs/1903.11752). Zheng Qin, Zeming Li, Zhaoning Zhang, Yiping Bao, Gang Yu, Yuxing Peng, Jian Sun
 17 | 
 18 | - [PFLD: A Practical Facial Landmark Detector](https://arxiv.org/abs/1902.10859). Xiaojie Guo, Siyuan Li, Jinke Yu, Jiawan Zhang, Jiayi Ma, Lin Ma, Wei Liu, Haibin Ling
 19 | 
 20 | - [TSM: Temporal Shift Module for Efficient Video Understanding](https://arxiv.org/abs/1811.08383). Ji Lin, Chuang Gan, Song Han
 21 | 
 22 | ## AutoML
 23 | 
 24 | Automated machine learning (AutoML) is the process of automating the end-to-end process of applying machine learning to real-world problems.<sup><a href="https://en.wikipedia.org/wiki/Automated_machine_learning" targe="_blank">Wikipedia</a></sup> AutoML is for example used to design new efficient neural architectures with a constraint on a computational budget (defined either as a number of FLOPS or as an inference time measured on real device) or a size of the architecture.
 25 | 
 26 | - [ChamNet: Towards Efficient Network Design through Platform-Aware Model Adaptation](https://arxiv.org/abs/1812.08934). Xiaoliang Dai, Peizhao Zhang, Bichen Wu, Hongxu Yin, Fei Sun, Yanghan Wang, Marat Dukhan, Yunqing Hu, Yiming Wu, Yangqing Jia, Peter Vajda, Matt Uyttendaele, Niraj K. Jha
 27 | 
 28 | - [FBNet: Hardware-Aware Efficient ConvNet Design via Differentiable Neural Architecture Search](https://arxiv.org/abs/1812.03443). Bichen Wu, Xiaoliang Dai, Peizhao Zhang, Yanghan Wang, Fei Sun, Yiming Wu, Yuandong Tian, Peter Vajda, Yangqing Jia, Kurt Keutzer
 29 | 
 30 | - [ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware](https://arxiv.org/abs/1812.00332). Han Cai, Ligeng Zhu, Song Han
 31 | 
 32 | - [MnasNet: Platform-Aware Neural Architecture Search for Mobile](https://arxiv.org/abs/1807.11626). Mingxing Tan, Bo Chen, Ruoming Pang, Vijay Vasudevan, Mark Sandler, Andrew Howard, Quoc V. Le
 33 | 
 34 | - [NetAdapt: Platform-Aware Neural Network Adaptation for Mobile Applications](https://arxiv.org/abs/1804.03230). Tien-Ju Yang, Andrew Howard, Bo Chen, Xiao Zhang, Alec Go, Mark Sandler, Vivienne Sze, Hartwig Adam
 35 | 
 36 | - [AMC: AutoML for Model Compression and Acceleration on Mobile Devices](https://arxiv.org/abs/1802.03494). Yihui He, Ji Lin, Zhijian Liu, Hanrui Wang, Li-Jia Li, Song Han
 37 | 
 38 | - [MorphNet: Fast & Simple Resource-Constrained Structure Learning of Deep Networks](https://arxiv.org/abs/1711.06798). Ariel Gordon, Elad Eban, Ofir Nachum, Bo Chen, Hao Wu, Tien-Ju Yang, Edward Choi
 39 | 
 40 | ## Efficient Architectures
 41 | 
 42 | Efficient architectures represent neural networks with small memory footprint and fast inference time when measured on edge devices.
 43 | 
 44 | - [Compression of convolutional neural networks for high performance image matching tasks on mobile devices](https://arxiv.org/abs/2001.03102). Roy Miles, Krystian Mikolajczyk
 45 | 
 46 | - [Once-for-All: Train One Network and Specialize it for Efficient Deployment on Diverse Hardware Platforms](https://arxiv.org/abs/1908.09791). Han Cai, Chuang Gan, Tianzhe Wang, Zhekai Zhang, Song Han
 47 | 
 48 | - [MixNet: Mixed Depthwise Convolutional Kernels](https://arxiv.org/abs/1907.09595). Mingxing Tan, Quoc V. Le
 49 | 
 50 | - [Efficient On-Device Models using Neural Projections](http://proceedings.mlr.press/v97/ravi19a/ravi19a.pdf). Sujith Ravi
 51 | 
 52 | - [Butterfly Transform: An Efficient FFT Based Neural Architecture Design](https://arxiv.org/abs/1906.02256). Keivan Alizadeh, Ali Farhadi, Mohammad Rastegari
 53 | 
 54 | - [EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks](https://arxiv.org/abs/1905.11946). Mingxing Tan, Quoc V. Le
 55 | 
 56 | - [Searching for MobileNetV3](https://arxiv.org/abs/1905.02244). Andrew Howard, Mark Sandler, Grace Chu, Liang-Chieh Chen, Bo Chen, Mingxing Tan, Weijun Wang, Yukun Zhu, Ruoming Pang, Vijay Vasudevan, Quoc V. Le, Hartwig Adam
 57 | 
 58 | - [ESPNetv2: A Light-weight, Power Efficient, and General Purpose Convolutional Neural Network](https://arxiv.org/abs/1811.11431). Sachin Mehta, Mohammad Rastegari, Linda Shapiro, Hannaneh Hajishirzi
 59 | 
 60 | - [ESPNet: Efficient Spatial Pyramid of Dilated Convolutions for Semantic Segmentation](https://arxiv.org/abs/1803.06815). Sachin Mehta, Mohammad Rastegari, Anat Caspi, Linda Shapiro, Hannaneh Hajishirzi
 61 | 
 62 | - [MobileNetV2: Inverted Residuals and Linear Bottlenecks](https://arxiv.org/abs/1801.04381). Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, Liang-Chieh Chen
 63 | 
 64 | - [CondenseNet: An Efficient DenseNet using Learned Group Convolutions](https://arxiv.org/abs/1711.09224). Gao Huang, Shichen Liu, Laurens van der Maaten, Kilian Q. Weinberger
 65 | 
 66 | - [BranchyNet: Fast Inference via Early Exiting from Deep Neural Networks](https://arxiv.org/abs/1709.01686). Surat Teerapittayanon, Bradley McDanel, H.T. Kung
 67 | 
 68 | - [DeepRebirth: Accelerating Deep Neural Network Execution on Mobile Devices](https://arxiv.org/abs/1708.04728). Dawei Li, Xiaolong Wang, Deguang Kong
 69 | 
 70 | - [ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices](https://arxiv.org/abs/1707.01083). Xiangyu Zhang, Xinyu Zhou, Mengxiao Lin, Jian Sun
 71 | 
 72 | - [MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications](https://arxiv.org/abs/1704.04861). Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, Hartwig Adam
 73 | 
 74 | - [SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size](https://arxiv.org/abs/1602.07360). Forrest N. Iandola, Song Han, Matthew W. Moskewicz, Khalid Ashraf, William J. Dally, Kurt Keutzer
 75 | 
 76 | ## Federated Learning
 77 | 
 78 | Federated Learning enables mobile phones to collaboratively learn a shared prediction model while keeping all the training data on device, decoupling the ability to do machine learning from the need to store the data in the cloud.<sup><a href="https://ai.googleblog.com/2017/04/federated-learning-collaborative.html" target="_blank">Google AI blog: Federated Learning</a></sup>
 79 | 
 80 | - [Towards Federated Learning at Scale: System Design](https://arxiv.org/abs/1902.01046). Keith Bonawitz, Hubert Eichner, Wolfgang Grieskamp, Dzmitry Huba, Alex Ingerman, Vladimir Ivanov, Chloe Kiddon, Jakub Konečný, Stefano Mazzocchi, H. Brendan McMahan, Timon Van Overveldt, David Petrou, Daniel Ramage, Jason Roselander
 81 | 
 82 | - [Adaptive Federated Learning in Resource Constrained Edge Computing Systems](https://arxiv.org/abs/1804.05271). Shiqiang Wang, Tiffany Tuor, Theodoros Salonidis, Kin K. Leung, Christian Makaya, Ting He, Kevin Chan
 83 | 
 84 | - [Communication-Efficient Learning of Deep Networks from Decentralized Data](https://arxiv.org/abs/1602.05629). H. Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, Blaise Agüera y Arcas
 85 | 
 86 | ## ML Algorithms For Edge
 87 | 
 88 | Standard machine learning algorithms are not always able to run on edge devices due to large computational requirements and space complexity. This section introduces optimized machine learning algorithms.
 89 | 
 90 | - [Shallow RNNs: A Method for Accurate Time-series Classification on Tiny Devices](https://dkdennis.xyz/static/sharnn-neurips19-paper.pdf). Don Dennis, Durmus Alp Emre Acar, Vikram Mandikal, Vinu Sankar Sadasivan, Harsha Vardhan Simhadri, Venkatesh Saligrama, Prateek Jain
 91 | 
 92 | - [ProtoNN: Compressed and Accurate kNN for Resource-scarce Devices](http://proceedings.mlr.press/v70/gupta17a.html). Chirag Gupta, Arun Sai Suggala, Ankit Goyal, Harsha Vardhan Simhadri, Bhargavi Paranjape, Ashish Kumar, Saurabh Goyal, Raghavendra Udupa, Manik Varma, Prateek Jain
 93 | 
 94 | - [Resource-efficient Machine Learning in 2 KB RAM for the Internet of Things](http://proceedings.mlr.press/v70/kumar17a.html). Ashish Kumar, Saurabh Goyal, Manik Varma
 95 | 
 96 | ## Network Pruning
 97 | 
 98 | Pruning is a common method to derive a compact network – after training, some structural portion of the parameters is removed, along with its associated computations.<sup><a href="http://jankautz.com/publications/Importance4NNPruning_CVPR19.pdf" target="_blank">Importance Estimation for Neural Network Pruning</a></sup>
 99 | 
100 | - [Gate Decorator: Global Filter Pruning Method for Accelerating Deep Convolutional Neural Networks](https://arxiv.org/abs/1909.08174). Zhonghui You, Kun Yan, Jinmian Ye, Meng Ma, Ping Wang
101 | 
102 | - [Importance Estimation for Neural Network Pruning](http://jankautz.com/publications/Importance4NNPruning_CVPR19.pdf). Pavlo Molchanov, Arun Mallya, Stephen Tyree, Iuri Frosio, Jan Kautz
103 | 
104 | - [Centripetal SGD for Pruning Very Deep Convolutional Networks with Complicated Structure](https://arxiv.org/abs/1904.03837). Xiaohan Ding, Guiguang Ding, Yuchen Guo, Jungong Han
105 | 
106 | - [Towards Optimal Structured CNN Pruning via Generative Adversarial Learning](https://arxiv.org/abs/1903.09291). Shaohui Lin, Rongrong Ji, Chenqian Yan, Baochang Zhang, Liujuan Cao, Qixiang Ye, Feiyue Huang, David Doermann
107 | 
108 | - [Variational Convolutional Neural Network Pruning](http://openaccess.thecvf.com/content_CVPR_2019/papers/Zhao_Variational_Convolutional_Neural_Network_Pruning_CVPR_2019_paper.pdf). Chenglong Zhao, Bingbing Ni, Jian Zhang, Qiwei Zhao, Wenjun Zhang, Qi Tian
109 | 
110 | - [On Implicit Filter Level Sparsity in Convolutional Neural Networks](https://arxiv.org/abs/1811.12495). Dushyant Mehta, Kwang In Kim, Christian Theobalt
111 | 
112 | - [Structured Pruning of Neural Networks with Budget-Aware Regularization](https://arxiv.org/abs/1811.09332). Carl Lemaire, Andrew Achkar, Pierre-Marc Jodoin
113 | 
114 | - [Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration](https://arxiv.org/abs/1811.00250). Yang He, Ping Liu, Ziwei Wang, Zhilan Hu, Yi Yang
115 | 
116 | - [Discrimination-aware Channel Pruning for Deep Neural Networks](https://arxiv.org/abs/1810.11809). Zhuangwei Zhuang, Mingkui Tan, Bohan Zhuang, Jing Liu, Yong Guo, Qingyao Wu, Junzhou Huang, Jinhui Zhu
117 | 
118 | - [Rethinking the Value of Network Pruning](https://arxiv.org/abs/1810.05270). Zhuang Liu, Mingjie Sun, Tinghui Zhou, Gao Huang, Trevor Darrell
119 | 
120 | - [The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks](https://arxiv.org/abs/1803.03635). Jonathan Frankle, Michael Carbin
121 | 
122 | - [To prune, or not to prune: exploring the efficacy of pruning for model compression](https://arxiv.org/abs/1710.01878). Michael Zhu, Suyog Gupta
123 | 
124 | - [ThiNet: A Filter Level Pruning Method for Deep Neural Network Compression](https://arxiv.org/abs/1707.06342). Jian-Hao Luo, Jianxin Wu, Weiyao Lin
125 | 
126 | - [Channel Pruning for Accelerating Very Deep Neural Networks](https://arxiv.org/abs/1707.06168). Yihui He, Xiangyu Zhang, Jian Sun
127 | 
128 | ## Others
129 | 
130 | This section contains papers that are related to edge machine learning but are not part of any major group. These papers often deal with deployment issues (i.e. optimizing inference on target platform).
131 | 
132 | - [Distributed Machine Learning on Mobile Devices: A Survey](https://arxiv.org/abs/1909.08329). Renjie Gu, Shuo Yang, Fan Wu
133 | 
134 | - [Machine Learning at the Network Edge: A Survey](https://arxiv.org/abs/1908.00080). M.G. Sarwar Murshed, Christopher Murphy, Daqing Hou, Nazar Khan, Ganesh Ananthanarayanan, Faraz Hussain
135 | 
136 | - [Convergence of Edge Computing and Deep Learning: A Comprehensive Survey](https://arxiv.org/abs/1907.08349). Yiwen Han, Xiaofei Wang, Victor C.M. Leung, Dusit Niyato, Xueqiang Yan, Xu Chen
137 | 
138 | - [On-Device Neural Net Inference with Mobile GPUs](https://arxiv.org/abs/1907.01989). Juhyun Lee, Nikolay Chirkov, Ekaterina Ignasheva, Yury Pisarchyk, Mogan Shieh, Fabio Riccardi, Raman Sarokin, Andrei Kulik, Matthias Grundmann
139 | 
140 | - [Edge Intelligence: Paving the Last Mile of Artificial Intelligence with Edge Computing](https://arxiv.org/abs/1905.10083). Zhi Zhou, Xu Chen, En Li, Liekang Zeng, Ke Luo, Junshan Zhang
141 | 
142 | - [Deep Learning on Mobile Devices - A Review](https://arxiv.org/abs/1904.09274). Yunbin Deng
143 | 
144 | - [Wireless Network Intelligence at the Edge](https://arxiv.org/abs/1812.02858). Jihong Park, Sumudu Samarakoon, Mehdi Bennis, Mérouane Debbah
145 | 
146 | - [Machine Learning at Facebook:Understanding Inference at the Edge](https://research.fb.com/wp-content/uploads/2018/12/Machine-Learning-at-Facebook-Understanding-Inference-at-the-Edge.pdf). Carole-Jean Wu, David Brooks, Kevin Chen, Douglas Chen, Sy Choudhury, Marat Dukhan,Kim Hazelwood, Eldad Isaac, Yangqing Jia, Bill Jia, Tommer Leyvand, Hao Lu, Yang Lu, Lin Qiao,Brandon Reagen, Joe Spisak, Fei Sun, Andrew Tulloch, Peter Vajda, Xiaodong Wang,Yanghan Wang, Bram Wasti, Yiming Wu, Ran Xian, Sungjoo Yoo, Peizhao Zhang
147 | 
148 | ## Quantization
149 | 
150 | Quantization is the process of reducing a precision (from 32 bit floating point into lower bit depth representations) of weights and/or activations in a neural network. The advantages of this method are reduced model size and faster model inference on hardware that support arithmetic operations in lower precision.
151 | 
152 | - [And the Bit Goes Down: Revisiting the Quantization of Neural Networks](https://arxiv.org/abs/1907.05686). Pierre Stock, Armand Joulin, Rémi Gribonval, Benjamin Graham, Hervé Jégou
153 | 
154 | - [Data-Free Quantization through Weight Equalization and Bias Correction](https://arxiv.org/abs/1906.04721). Markus Nagel, Mart van Baalen, Tijmen Blankevoort, Max Welling
155 | 
156 | - [HAQ: Hardware-Aware Automated Quantization with Mixed Precision](https://arxiv.org/abs/1811.08886). Kuan Wang, Zhijian Liu, Yujun Lin, Ji Lin, Song Han
157 | 
158 | - [Quantizing deep convolutional networks for efficient inference: A whitepaper](https://arxiv.org/abs/1806.08342). Raghuraman Krishnamoorthi
159 | 
160 | - [A Quantization-Friendly Separable Convolution for MobileNets](https://arxiv.org/abs/1803.08607). Tao Sheng, Chen Feng, Shaojie Zhuo, Xiaopeng Zhang, Liang Shen, Mickey Aleksic
161 | 
162 | - [Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference](https://arxiv.org/abs/1712.05877). Benoit Jacob, Skirmantas Kligys, Bo Chen, Menglong Zhu, Matthew Tang, Andrew Howard, Hartwig Adam, Dmitry Kalenichenko
163 | 
164 | - [Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations](https://arxiv.org/abs/1609.07061). Itay Hubara, Matthieu Courbariaux, Daniel Soudry, Ran El-Yaniv, Yoshua Bengio
165 | 
166 | - [DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients](https://arxiv.org/abs/1606.06160). Shuchang Zhou, Yuxin Wu, Zekun Ni, Xinyu Zhou, He Wen, Yuheng Zou
167 | 
168 | - [Quantized Convolutional Neural Networks for Mobile Devices](https://arxiv.org/abs/1512.06473). Jiaxiang Wu, Cong Leng, Yuhang Wang, Qinghao Hu, Jian Cheng
169 | 
170 | 


--------------------------------------------------------------------------------
/Papers/Network_Pruning/README.md:
--------------------------------------------------------------------------------
 1 | # Network Pruning
 2 | [Back to awesome edge machine learning](https://github.com/bisonai/awesome-edge-machine-learning)
 3 | 
 4 | [Back to Papers](https://github.com/bisonai/awesome-edge-machine-learning/tree/master/Papers)
 5 | 
 6 | Pruning is a common method to derive a compact network – after training, some structural portion of the parameters is removed, along with its associated computations.<sup><a href="http://jankautz.com/publications/Importance4NNPruning_CVPR19.pdf" target="_blank">Importance Estimation for Neural Network Pruning</a></sup>
 7 | 
 8 | 
 9 | ## [Gate Decorator: Global Filter Pruning Method for Accelerating Deep Convolutional Neural Networks](https://arxiv.org/abs/1909.08174), 2019/09
10 | Zhonghui You, Kun Yan, Jinmian Ye, Meng Ma, Ping Wang
11 | 
12 | Filter pruning is one of the most effective ways to accelerate and compress convolutional neural networks (CNNs). In this work, we propose a global filter pruning algorithm called Gate Decorator, which transforms a vanilla CNN module by multiplying its output by the channel-wise scaling factors, i.e. gate. When the scaling factor is set to zero, it is equivalent to removing the corresponding filter. We use Taylor expansion to estimate the change in the loss function caused by setting the scaling factor to zero and use the estimation for the global filter importance ranking. Then we prune the network by removing those unimportant filters. After pruning, we merge all the scaling factors into its original module, so no special operations or structures are introduced. Moreover, we propose an iterative pruning framework called Tick-Tock to improve pruning accuracy. The extensive experiments demonstrate the effectiveness of our approaches. For example, we achieve the state-of-the-art pruning ratio on ResNet-56 by reducing 70% FLOPs without noticeable loss in accuracy. For ResNet-50 on ImageNet, our pruned model with 40% FLOPs reduction outperforms the baseline model by 0.31% in top-1 accuracy. Various datasets are used, including CIFAR-10, CIFAR-100, CUB-200, ImageNet ILSVRC-12 and PASCAL VOC 2011. Code is available at this [URL](https://github.com/youzhonghui/gate-decorator-pruning).
13 | 
14 | 
15 | ## [Importance Estimation for Neural Network Pruning](http://jankautz.com/publications/Importance4NNPruning_CVPR19.pdf), 2019/06
16 | Pavlo Molchanov, Arun Mallya, Stephen Tyree, Iuri Frosio, Jan Kautz
17 | 
18 | Structural pruning of neural network parameters reduces computation, energy, and memory transfer costs during inference. We propose a novel method that estimates the contribution of a neuron (filter) to the final loss and iteratively removes those with smaller scores. We describe two variations of our method using the first and second-order Taylor expansions to approximate a filter's contribution. Both methods scale consistently across any network layer without requiring per-layer sensitivity analysis and can be applied to any kind of layer, including skip connections. For modern networks trained on ImageNet, we measured experimentally a high (>93%) correlation between the contribution computed by our methods and a reliable estimate of the true importance. Pruning with the proposed methods leads to an improvement over state-of-the-art in terms of accuracy, FLOPs, and parameter reduction. On ResNet-101, we achieve a 40% FLOPS reduction by removing 30% of the parameters, with a loss of 0.02% in the top-1 accuracy on ImageNet. Code is available at [https://github.com/NVlabs/Taylor_pruning](https://github.com/NVlabs/Taylor_pruning).
19 | 
20 | 
21 | ## [Centripetal SGD for Pruning Very Deep Convolutional Networks with Complicated Structure](https://arxiv.org/abs/1904.03837), 2019/04
22 | Xiaohan Ding, Guiguang Ding, Yuchen Guo, Jungong Han
23 | 
24 | The redundancy is widely recognized in Convolutional Neural Networks (CNNs), which enables to remove unimportant filters from convolutional layers so as to slim the network with acceptable performance drop. Inspired by the linear and combinational properties of convolution, we seek to make some filters increasingly close and eventually identical for network slimming. To this end, we propose Centripetal SGD (C-SGD), a novel optimization method, which can train several filters to collapse into a single point in the parameter hyperspace. When the training is completed, the removal of the identical filters can trim the network with NO performance loss, thus no finetuning is needed. By doing so, we have partly solved an open problem of constrained filter pruning on CNNs with complicated structure, where some layers must be pruned following others. Our experimental results on CIFAR-10 and ImageNet have justified the effectiveness of C-SGD-based filter pruning. Moreover, we have provided empirical evidences for the assumption that the redundancy in deep neural networks helps the convergence of training by showing that a redundant CNN trained using C-SGD outperforms a normally trained counterpart with the equivalent width.
25 | 
26 | 
27 | ## [Towards Optimal Structured CNN Pruning via Generative Adversarial Learning](https://arxiv.org/abs/1903.09291), 2019/03
28 | Shaohui Lin, Rongrong Ji, Chenqian Yan, Baochang Zhang, Liujuan Cao, Qixiang Ye, Feiyue Huang, David Doermann
29 | 
30 | Structured pruning of filters or neurons has received increased focus for compressing convolutional neural networks. Most existing methods rely on multi-stage optimizations in a layer-wise manner for iteratively pruning and retraining which may not be optimal and may be computation intensive. Besides, these methods are designed for pruning a specific structure, such as filter or block structures without jointly pruning heterogeneous structures. In this paper, we propose an effective structured pruning approach that jointly prunes filters as well as other structures in an end-to-end manner. To accomplish this, we first introduce a soft mask to scale the output of these structures by defining a new objective function with sparsity regularization to align the output of baseline and network with this mask. We then effectively solve the optimization problem by generative adversarial learning (GAL), which learns a sparse soft mask in a label-free and an end-to-end manner. By forcing more scaling factors in the soft mask to zero, the fast iterative shrinkage-thresholding algorithm (FISTA) can be leveraged to fast and reliably remove the corresponding structures. Extensive experiments demonstrate the effectiveness of GAL on different datasets, including MNIST, CIFAR-10 and ImageNet ILSVRC 2012. For example, on ImageNet ILSVRC 2012, the pruned ResNet-50 achieves 10.88\% Top-5 error and results in a factor of 3.7x speedup. This significantly outperforms state-of-the-art methods.
31 | 
32 | 
33 | ## [Variational Convolutional Neural Network Pruning](http://openaccess.thecvf.com/content_CVPR_2019/papers/Zhao_Variational_Convolutional_Neural_Network_Pruning_CVPR_2019_paper.pdf), 2019/01
34 | Chenglong Zhao, Bingbing Ni, Jian Zhang, Qiwei Zhao, Wenjun Zhang, Qi Tian
35 | 
36 | We propose a variational Bayesian scheme for pruningconvolutional neural networks in channel level. This idea ismotivated by the fact that deterministic value based pruningmethods are inherently improper and unstable. In a nut-shell, variational technique is introduced to estimate dis-tribution of a newly proposed parameter, called channelsaliency, based on this, redundant channels can be removedfrom model via a simple criterion.  The advantages aretwo-fold: 1) Our method conducts channel pruning with-out desire of re-training stage, thus improving the compu-tation efficiency. 2) Our method is implemented as a stand-alone module, called variational pruning layer, which canbe straightforwardly inserted into off-the-shelf deep learn-ing packages, without any special network design. Exten-sive experimental results well demonstrate the effectivenessof our method: For CIFAR-10, we perform channel removalon different CNN models up to 74% reduction, which resultsin significant size reduction and computation saving. ForImageNet, about 40% channels of ResNet-50 are removedwithout compromising accuracy.
37 | 
38 | 
39 | ## [On Implicit Filter Level Sparsity in Convolutional Neural Networks](https://arxiv.org/abs/1811.12495), 2018/11
40 | Dushyant Mehta, Kwang In Kim, Christian Theobalt
41 | 
42 | We investigate filter level sparsity that emerges in convolutional neural networks (CNNs) which employ Batch Normalization and ReLU activation, and are trained with adaptive gradient descent techniques and L2 regularization or weight decay. We conduct an extensive experimental study casting our initial findings into hypotheses and conclusions about the mechanisms underlying the emergent filter level sparsity. This study allows new insight into the performance gap obeserved between adapative and non-adaptive gradient descent methods in practice. Further, analysis of the effect of training strategies and hyperparameters on the sparsity leads to practical suggestions in designing CNN training strategies enabling us to explore the tradeoffs between feature selectivity, network capacity, and generalization performance. Lastly, we show that the implicit sparsity can be harnessed for neural network speedup at par or better than explicit sparsification / pruning approaches, with no modifications to the typical training pipeline required.
43 | 
44 | 
45 | ## [Structured Pruning of Neural Networks with Budget-Aware Regularization](https://arxiv.org/abs/1811.09332), 2018/11
46 | Carl Lemaire, Andrew Achkar, Pierre-Marc Jodoin
47 | 
48 | Pruning methods have shown to be effective at reducing the size of deep neural networks while keeping accuracy almost intact. Among the most effective methods are those that prune a network while training it with a sparsity prior loss and learnable dropout parameters. A shortcoming of these approaches however is that neither the size nor the inference speed of the pruned network can be controlled directly; yet this is a key feature for targeting deployment of CNNs on low-power hardware. To overcome this, we introduce a budgeted regularized pruning framework for deep convolutional neural networks. Our approach naturally fits into traditional neural network training as it consists of a learnable masking layer, a novel budget-aware objective function, and the use of knowledge distillation. We also provide insights on how to prune a residual network and how this can lead to new architectures. Experimental results reveal that CNNs pruned with our method are more accurate and less compute-hungry than state-of-the-art methods. Also, our approach is more effective at preventing accuracy collapse in case of severe pruning; this allows us to attain pruning factors up to 16x without significantly affecting the accuracy.
49 | 
50 | 
51 | ## [Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration](https://arxiv.org/abs/1811.00250), 2018/11
52 | Yang He, Ping Liu, Ziwei Wang, Zhilan Hu, Yi Yang
53 | 
54 | Previous works utilized "smaller-norm-less-important" criterion to prune filters with smaller norm values in a convolutional neural network. In this paper, we analyze this norm-based criterion and point out that its effectiveness depends on two requirements that are not always met: (1) the norm deviation of the filters should be large; (2) the minimum norm of the filters should be small. To solve this problem, we propose a novel filter pruning method, namely Filter Pruning via Geometric Median (FPGM), to compress the model regardless of those two requirements. Unlike previous methods, FPGM compresses CNN models by pruning filters with redundancy, rather than those with "relatively less" importance. When applied to two image classification benchmarks, our method validates its usefulness and strengths. Notably, on CIFAR-10, FPGM reduces more than 52% FLOPs on ResNet-110 with even 2.69% relative accuracy improvement. Moreover, on ILSVRC-2012, FPGM reduces more than 42% FLOPs on ResNet-101 without top-5 accuracy drop, which has advanced the state-of-the-art. Code is publicly available on [GitHub](https://github.com/he-y/filter-pruning-geometric-median).
55 | 
56 | 
57 | ## [Discrimination-aware Channel Pruning for Deep Neural Networks](https://arxiv.org/abs/1810.11809), 2018/10
58 | Zhuangwei Zhuang, Mingkui Tan, Bohan Zhuang, Jing Liu, Yong Guo, Qingyao Wu, Junzhou Huang, Jinhui Zhu
59 | 
60 | Channel pruning is one of the predominant approaches for deep model compression. Existing pruning methods either train from scratch with sparsity constraints on channels, or minimize the reconstruction error between the pre-trained feature maps and the compressed ones. Both strategies suffer from some limitations: the former kind is computationally expensive and difficult to converge, whilst the latter kind optimizes the reconstruction error but ignores the discriminative power of channels. To overcome these drawbacks, we investigate a simple-yet-effective method, called discrimination-aware channel pruning, to choose those channels that really contribute to discriminative power. To this end, we introduce additional losses into the network to increase the discriminative power of intermediate layers and then select the most discriminative channels for each layer by considering the additional loss and the reconstruction error. Last, we propose a greedy algorithm to conduct channel selection and parameter optimization in an iterative way. Extensive experiments demonstrate the effectiveness of our method. For example, on ILSVRC-12, our pruned ResNet-50 with 30% reduction of channels even outperforms the original model by 0.39% in top-1 accuracy.
61 | 
62 | 
63 | ## [Rethinking the Value of Network Pruning](https://arxiv.org/abs/1810.05270), 2018/10
64 | Zhuang Liu, Mingjie Sun, Tinghui Zhou, Gao Huang, Trevor Darrell
65 | 
66 | Network pruning is widely used for reducing the heavy inference cost of deep models in low-resource settings. A typical pruning algorithm is a three-stage pipeline, i.e., training (a large model), pruning and fine-tuning. During pruning, according to a certain criterion, redundant weights are pruned and important weights are kept to best preserve the accuracy. In this work, we make several surprising observations which contradict common beliefs. For all state-of-the-art structured pruning algorithms we examined, fine-tuning a pruned model only gives comparable or worse performance than training that model with randomly initialized weights. For pruning algorithms which assume a predefined target network architecture, one can get rid of the full pipeline and directly train the target network from scratch. Our observations are consistent for multiple network architectures, datasets, and tasks, which imply that: 1) training a large, over-parameterized model is often not necessary to obtain an efficient final model, 2) learned "important" weights of the large model are typically not useful for the small pruned model, 3) the pruned architecture itself, rather than a set of inherited "important" weights, is more crucial to the efficiency in the final model, which suggests that in some cases pruning can be useful as an architecture search paradigm. Our results suggest the need for more careful baseline evaluations in future research on structured pruning methods. We also compare with the "Lottery Ticket Hypothesis" (Frankle & Carbin 2019), and find that with optimal learning rate, the "winning ticket" initialization as used in Frankle & Carbin (2019) does not bring improvement over random initialization.
67 | 
68 | 
69 | ## [The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks](https://arxiv.org/abs/1803.03635), 2018/03
70 | Jonathan Frankle, Michael Carbin
71 | 
72 | Neural network pruning techniques can reduce the parameter counts of trained networks by over 90%, decreasing storage requirements and improving computational performance of inference without compromising accuracy. However, contemporary experience is that the sparse architectures produced by pruning are difficult to train from the start, which would similarly improve training performance. We find that a standard pruning technique naturally uncovers subnetworks whose initializations made them capable of training effectively. Based on these results, we articulate the "lottery ticket hypothesis:" dense, randomly-initialized, feed-forward networks contain subnetworks ("winning tickets") that - when trained in isolation - reach test accuracy comparable to the original network in a similar number of iterations. The winning tickets we find have won the initialization lottery: their connections have initial weights that make training particularly effective. We present an algorithm to identify winning tickets and a series of experiments that support the lottery ticket hypothesis and the importance of these fortuitous initializations. We consistently find winning tickets that are less than 10-20% of the size of several fully-connected and convolutional feed-forward architectures for MNIST and CIFAR10. Above this size, the winning tickets that we find learn faster than the original network and reach higher test accuracy.
73 | 
74 | 
75 | ## [To prune, or not to prune: exploring the efficacy of pruning for model compression](https://arxiv.org/abs/1710.01878), 2017/10
76 | Michael Zhu, Suyog Gupta
77 | 
78 | Model pruning seeks to induce sparsity in a deep neural network's various connection matrices, thereby reducing the number of nonzero-valued parameters in the model. Recent reports (Han et al., 2015; Narang et al., 2017) prune deep networks at the cost of only a marginal loss in accuracy and achieve a sizable reduction in model size. This hints at the possibility that the baseline models in these experiments are perhaps severely over-parameterized at the outset and a viable alternative for model compression might be to simply reduce the number of hidden units while maintaining the model's dense connection structure, exposing a similar trade-off in model size and accuracy. We investigate these two distinct paths for model compression within the context of energy-efficient inference in resource-constrained environments and propose a new gradual pruning technique that is simple and straightforward to apply across a variety of models/datasets with minimal tuning and can be seamlessly incorporated within the training process. We compare the accuracy of large, but pruned models (large-sparse) and their smaller, but dense (small-dense) counterparts with identical memory footprint. Across a broad range of neural network architectures (deep CNNs, stacked LSTM, and seq2seq LSTM models), we find large-sparse models to consistently outperform small-dense models and achieve up to 10x reduction in number of non-zero parameters with minimal loss in accuracy.
79 | 
80 | 
81 | ## [ThiNet: A Filter Level Pruning Method for Deep Neural Network Compression](https://arxiv.org/abs/1707.06342), 2017/07
82 | Jian-Hao Luo, Jianxin Wu, Weiyao Lin
83 | 
84 | We propose an efficient and unified framework, namely ThiNet, to simultaneously accelerate and compress CNN models in both training and inference stages. We focus on the filter level pruning, i.e., the whole filter would be discarded if it is less important. Our method does not change the original network structure, thus it can be perfectly supported by any off-the-shelf deep learning libraries. We formally establish filter pruning as an optimization problem, and reveal that we need to prune filters based on statistics information computed from its next layer, not the current layer, which differentiates ThiNet from existing methods. Experimental results demonstrate the effectiveness of this strategy, which has advanced the state-of-the-art. We also show the performance of ThiNet on ILSVRC-12 benchmark. ThiNet achieves 3.31× FLOPs reduction and 16.63× compression on VGG-16, with only 0.52% top-5 accuracy drop. Similar experiments with ResNet-50 reveal that even for a compact network, ThiNet can also reduce more than half of the parameters and FLOPs, at the cost of roughly 1% top-5 accuracy drop. Moreover, the original VGG-16 model can be further pruned into a very small model with only 5.05MB model size, preserving AlexNet level accuracy but showing much stronger generalization ability.
85 | 
86 | 
87 | ## [Channel Pruning for Accelerating Very Deep Neural Networks](https://arxiv.org/abs/1707.06168), 2017/07
88 | Yihui He, Xiangyu Zhang, Jian Sun
89 | 
90 | In this paper, we introduce a new channel pruning method to accelerate very deep convolutional neural networks.Given a trained CNN model, we propose an iterative two-step algorithm to effectively prune each layer, by a LASSO regression based channel selection and least square reconstruction. We further generalize this algorithm to multi-layer and multi-branch cases. Our method reduces the accumulated error and enhance the compatibility with various architectures. Our pruned VGG-16 achieves the state-of-the-art results by 5x speed-up along with only 0.3% increase of error. More importantly, our method is able to accelerate modern networks like ResNet, Xception and suffers only 1.4%, 1.0% accuracy loss under 2x speed-up respectively, which is significant. Code has been made publicly available.
91 | 
92 | 
93 | 


--------------------------------------------------------------------------------