├── .gitignore
├── Contribute
│   ├── renderer.py
│   ├── data.yaml
│   └── config.yaml
├── LicenseBlock
│   ├── config.yaml
│   ├── renderer.py
│   └── data.yaml
├── Papers
│   ├── AutoML
│   │   ├── config.yaml
│   │   ├── renderer.py
│   │   ├── README.md
│   │   └── data.yaml
│   ├── Others
│   │   ├── config.yaml
│   │   ├── renderer.py
│   │   ├── README.md
│   │   └── data.yaml
│   ├── Network_Pruning
│   │   ├── config.yaml
│   │   ├── renderer.py
│   │   └── README.md
│   ├── Applications
│   │   ├── config.yaml
│   │   ├── renderer.py
│   │   ├── README.md
│   │   └── data.yaml
│   ├── Quantization
│   │   ├── config.yaml
│   │   ├── renderer.py
│   │   ├── README.md
│   │   └── data.yaml
│   ├── Federated_Learning
│   │   ├── config.yaml
│   │   ├── renderer.py
│   │   ├── README.md
│   │   └── data.yaml
│   ├── Efficient_Architectures
│   │   ├── config.yaml
│   │   └── renderer.py
│   ├── ML_Algorithms_For_Edge
│   │   ├── config.yaml
│   │   ├── renderer.py
│   │   ├── README.md
│   │   └── data.yaml
│   ├── config.yaml
│   ├── renderer.py
│   ├── data.yaml
│   └── README.md
├── Datasets
│   ├── config.yaml
│   ├── renderer.py
│   └── data.yaml
├── Challenges
│   ├── config.yaml
│   ├── renderer.py
│   └── data.yaml
├── Other_Resources
│   ├── config.yaml
│   ├── renderer.py
│   └── data.yaml
├── AI_Chips
│   ├── config.yaml
│   ├── data.yaml
│   └── renderer.py
├── Books
│   ├── config.yaml
│   ├── renderer.py
│   └── data.yaml
├── MCU_and_MPU_Software_Packages
│   ├── config.yaml
│   ├── renderer.py
│   └── data.yaml
├── Inference_Engines
│   ├── config.yaml
│   ├── renderer.py
│   └── data.yaml
├── config.yaml
├── data.yaml
├── utils.py
├── style.py
├── awesome.py
├── LICENSE
└── README.md
/.gitignore: -------------------------------------------------------------------------------- 1 | __pycache__ 2 | -------------------------------------------------------------------------------- /Contribute/renderer.py: -------------------------------------------------------------------------------- 1 | def renderer(fp, data, config): 2 | pass 3 | -------------------------------------------------------------------------------- /LicenseBlock/config.yaml: -------------------------------------------------------------------------------- 1 | # LICENSE 2 | --- 3 | description: 4 | ... 5 | -------------------------------------------------------------------------------- /Papers/AutoML/config.yaml: -------------------------------------------------------------------------------- 1 | # AUTOML 2 | --- 3 | sort_key: date 4 | sort_reverse: true 5 | ... 6 | -------------------------------------------------------------------------------- /Papers/Others/config.yaml: -------------------------------------------------------------------------------- 1 | # OTHERS 2 | --- 3 | sort_key: date 4 | sort_reverse: true 5 | ... 6 | -------------------------------------------------------------------------------- /Papers/Network_Pruning/config.yaml: -------------------------------------------------------------------------------- 1 | # PRUNING 2 | --- 3 | sort_key: date 4 | sort_reverse: true 5 | ... -------------------------------------------------------------------------------- /Papers/Applications/config.yaml: -------------------------------------------------------------------------------- 1 | # APPLICATIONS 2 | --- 3 | sort_key: date 4 | sort_reverse: true 5 | ... 6 | -------------------------------------------------------------------------------- /Papers/Quantization/config.yaml: -------------------------------------------------------------------------------- 1 | # QUANTIZATION 2 | --- 3 | sort_key: date 4 | sort_reverse: true 5 | ... 
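Aside: the `sort_key` / `sort_reverse` pairs in the config files above are consumed by the `sort()` helper in `utils.py` (shown later in this dump). A minimal sketch of what `sort_key: date` combined with `sort_reverse: true` does; the two entries below are invented purely for illustration:

from utils import sort  # sort(data, sort_key, sort_reverse), defined in utils.py

papers = [
    {"name": "Older paper", "date": "2016/02/17"},
    {"name": "Newer paper", "date": "2019/02/04"},
]

# The yyyy/mm/dd date strings sort correctly as plain strings,
# and sort_reverse: true puts the newest entry first.
ordered = sort(papers, "date", True)
print([p["name"] for p in ordered])  # ['Newer paper', 'Older paper']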
6 | -------------------------------------------------------------------------------- /Contribute/data.yaml: -------------------------------------------------------------------------------- 1 | # CONTRIBUTE 2 | # 3 | # [Template] 4 | # name: 5 | --- 6 | name: Contribute 7 | ... 8 | -------------------------------------------------------------------------------- /Papers/Federated_Learning/config.yaml: -------------------------------------------------------------------------------- 1 | # FEDERATED LEARNING 2 | --- 3 | sort_key: date 4 | sort_reverse: true 5 | ... 6 | -------------------------------------------------------------------------------- /Papers/Efficient_Architectures/config.yaml: -------------------------------------------------------------------------------- 1 | # EFFICIENT NETWORKS 2 | --- 3 | sort_key: date 4 | sort_reverse: true 5 | ... 6 | -------------------------------------------------------------------------------- /Papers/ML_Algorithms_For_Edge/config.yaml: -------------------------------------------------------------------------------- 1 | # ML ALGORITHMS FOR EDGE 2 | --- 3 | sort_key: date 4 | sort_reverse: true 5 | ... 6 | -------------------------------------------------------------------------------- /Datasets/config.yaml: -------------------------------------------------------------------------------- 1 | # DATASETS 2 | # 3 | # [Template] 4 | # description: 5 | # sort_key: 6 | # sort_reverse: 7 | --- 8 | description: 9 | sort_key: name 10 | ... 11 | -------------------------------------------------------------------------------- /Challenges/config.yaml: -------------------------------------------------------------------------------- 1 | # CHALLENGES 2 | # 3 | # [Template] 4 | # description: 5 | # sort_key: 6 | # sort_reverse: 7 | --- 8 | description: 9 | sort_key: name 10 | ... 11 | -------------------------------------------------------------------------------- /Papers/config.yaml: -------------------------------------------------------------------------------- 1 | # PAPERS 2 | # 3 | # [Template] 4 | # 5 | # description: 6 | # sort_key: 7 | # sort_reverse: 8 | --- 9 | description: 10 | sort_key: name 11 | ... 12 | -------------------------------------------------------------------------------- /Other_Resources/config.yaml: -------------------------------------------------------------------------------- 1 | # OTHER RESOURCES 2 | # 3 | # [Template] 4 | # 5 | # description: 6 | # sort_key: 7 | # sort_reverse: 8 | --- 9 | description: 10 | sort_key: name 11 | ... 12 | -------------------------------------------------------------------------------- /AI_Chips/config.yaml: -------------------------------------------------------------------------------- 1 | # AI CHIPS 2 | # 3 | # [Template] 4 | # 5 | # description: 6 | # sort_key: 7 | # sort_reverse: 8 | --- 9 | description: List of resources about AI Chips 10 | sort_key: name 11 | ... 
12 | -------------------------------------------------------------------------------- /LicenseBlock/renderer.py: -------------------------------------------------------------------------------- 1 | from style import h2 2 | from style import newline 3 | 4 | 5 | def renderer(fp, data, config): 6 | fp.write(data["logo"]) 7 | newline(fp) 8 | newline(fp) 9 | fp.write(data["description"]) 10 | -------------------------------------------------------------------------------- /Books/config.yaml: -------------------------------------------------------------------------------- 1 | # BOOKS 2 | # 3 | # [Template] 4 | # 5 | # description: 6 | # sort_key: 7 | # sort_reverse: 8 | --- 9 | description: List of books with focus on on-device (e.g., edge or mobile) machine learning. 10 | sort_key: published 11 | sort_reverse: true 12 | ... 13 | -------------------------------------------------------------------------------- /MCU_and_MPU_Software_Packages/config.yaml: -------------------------------------------------------------------------------- 1 | # MCU AND MPU SOFTWARE PACKAGES 2 | # 3 | # [Template] 4 | # 5 | # description: 6 | # sort_key: 7 | # sort_reverse: 8 | --- 9 | description: List of software packages for AI development on MCU and MPU 10 | sort_key: name 11 | ... 12 | -------------------------------------------------------------------------------- /Inference_Engines/config.yaml: -------------------------------------------------------------------------------- 1 | # INFERENCE ENGINES 2 | # 3 | # [Template] 4 | # 5 | # description: 6 | # sort_key: 7 | # sort_reverse: 8 | --- 9 | description: List of machine learning inference engines and APIs that are optimized for execution and/or training on edge devices. 10 | sort_key: name 11 | ... 12 | -------------------------------------------------------------------------------- /AI_Chips/data.yaml: -------------------------------------------------------------------------------- 1 | # AI CHIPS 2 | # 3 | # [Template] 4 | # 5 | # - 6 | # name: 7 | # description: 8 | # link: 9 | --- 10 | 11 | - 12 | name: AI Chip (ICs and IPs) 13 | description: A list of ICs and IPs for AI, Machine Learning and Deep Learning 14 | link: https://github.com/basicmi/AI-Chip 15 | -------------------------------------------------------------------------------- /Datasets/renderer.py: -------------------------------------------------------------------------------- 1 | from style import h3 2 | from style import p 3 | from style import a 4 | from style import newline 5 | 6 | 7 | def renderer(fp, data, config): 8 | fp.write(h3(a([ 9 | data["name"], 10 | data["url"], 11 | ]))) 12 | fp.write(p(data["description"])) 13 | newline(fp) 14 | -------------------------------------------------------------------------------- /Challenges/renderer.py: -------------------------------------------------------------------------------- 1 | from style import h3 2 | from style import p 3 | from style import a 4 | from style import newline 5 | 6 | 7 | def renderer(fp, data, config): 8 | fp.write(h3(a([ 9 | data["name"], 10 | data["url"], 11 | ]))) 12 | fp.write(p(data["description"])) 13 | newline(fp) 14 | -------------------------------------------------------------------------------- /AI_Chips/renderer.py: -------------------------------------------------------------------------------- 1 | from style import li 2 | from style import h3 3 | from style import a 4 | 5 | 6 | def renderer(fp, data, config): 7 | fp.write(h3(a([data["name"], data["link"]]))) 8 | 9 | if data["description"] is not None: 10 | 
fp.write(data["description"]) 11 | fp.write("\n") 12 | 13 | fp.write("\n") 14 | -------------------------------------------------------------------------------- /MCU_and_MPU_Software_Packages/renderer.py: -------------------------------------------------------------------------------- 1 | from style import li 2 | from style import h3 3 | from style import a 4 | 5 | 6 | def renderer(fp, data, config): 7 | fp.write(h3(a([data["name"], data["link"]]))) 8 | 9 | if data["description"] is not None: 10 | fp.write(data["description"]) 11 | fp.write("\n") 12 | 13 | fp.write("\n") 14 | -------------------------------------------------------------------------------- /Other_Resources/renderer.py: -------------------------------------------------------------------------------- 1 | from style import p 2 | from style import a 3 | from style import h3 4 | from style import newline 5 | from utils import name2link 6 | 7 | 8 | def renderer(fp, data, config): 9 | fp.write(h3(a([ 10 | data["name"], 11 | data["url"], 12 | ]))) 13 | newline(fp) 14 | fp.write(p(data["description"])) 15 | newline(fp) 16 | -------------------------------------------------------------------------------- /LicenseBlock/data.yaml: -------------------------------------------------------------------------------- 1 | # LICENSE 2 | --- 3 | name: CC0 1.0 Universal (CC0 1.0) Public Domain Dedication 4 | logo: "[![CC0](http://mirrors.creativecommons.org/presskit/buttons/88x31/svg/cc-zero.svg)](https://creativecommons.org/publicdomain/zero/1.0/)" 5 | description: To the extent possible under law, [Bisonai](https://bisonai.com/) has waived all copyright and related or neighboring rights to this work. 6 | ... 7 | -------------------------------------------------------------------------------- /config.yaml: -------------------------------------------------------------------------------- 1 | # AWESOME EDGE MACHINE LEARNING 2 | --- 3 | title: Awesome Edge Machine Learning 4 | description: A curated list of awesome edge machine learning resources, including research papers, inference engines, challenges, books, meetups and others. 5 | url: https://github.com/bisonai/awesome-edge-machine-learning 6 | filename: README.md 7 | max_level: 2 8 | badge: 9 | - "[![Awesome](https://awesome.re/badge-flat2.svg)](https://awesome.re)" 10 | ... 11 | -------------------------------------------------------------------------------- /Contribute/config.yaml: -------------------------------------------------------------------------------- 1 | # CONTRIBUTE 2 | --- 3 | description: > 4 | Unlike other awesome list, we are storing data in YAML format and markdown files are generated with `awesome.py` script. 5 | 6 | 7 | Every directory contains `data.yaml` which stores data we want to display and `config.yaml` which stores its metadata (e.g. way of sorting data). The way how data will be presented is defined in `renderer.py`. 8 | ... 
9 | -------------------------------------------------------------------------------- /Papers/AutoML/renderer.py: -------------------------------------------------------------------------------- 1 | from style import li 2 | from style import h2 3 | from style import a 4 | from style import p 5 | from style import newline 6 | 7 | 8 | def renderer(fp, data, config): 9 | year, month, day = data["date"].split("/") 10 | fp.write(h2(a([data["name"].strip(), data["url"].strip()]) + f", {year}/{month}")) 11 | fp.write(data["authors"]) 12 | newline(fp) 13 | newline(fp) 14 | fp.write(p(data["abstract"])) 15 | newline(fp) 16 | -------------------------------------------------------------------------------- /Papers/Others/renderer.py: -------------------------------------------------------------------------------- 1 | from style import li 2 | from style import h2 3 | from style import a 4 | from style import p 5 | from style import newline 6 | 7 | 8 | def renderer(fp, data, config): 9 | year, month, day = data["date"].split("/") 10 | fp.write(h2(a([data["name"].strip(), data["url"].strip()]) + f", {year}/{month}")) 11 | fp.write(data["authors"]) 12 | newline(fp) 13 | newline(fp) 14 | fp.write(p(data["abstract"])) 15 | newline(fp) 16 | -------------------------------------------------------------------------------- /Papers/Applications/renderer.py: -------------------------------------------------------------------------------- 1 | from style import li 2 | from style import h2 3 | from style import a 4 | from style import p 5 | from style import newline 6 | 7 | 8 | def renderer(fp, data, config): 9 | year, month, day = data["date"].split("/") 10 | fp.write(h2(a([data["name"].strip(), data["url"].strip()]) + f", {year}/{month}")) 11 | fp.write(data["authors"]) 12 | newline(fp) 13 | newline(fp) 14 | fp.write(p(data["abstract"])) 15 | newline(fp) 16 | -------------------------------------------------------------------------------- /Papers/Network_Pruning/renderer.py: -------------------------------------------------------------------------------- 1 | from style import li 2 | from style import h2 3 | from style import a 4 | from style import p 5 | from style import newline 6 | 7 | 8 | def renderer(fp, data, config): 9 | year, month, day = data["date"].split("/") 10 | fp.write(h2(a([data["name"].strip(), data["url"].strip()]) + f", {year}/{month}")) 11 | fp.write(data["authors"]) 12 | newline(fp) 13 | newline(fp) 14 | fp.write(p(data["abstract"])) 15 | newline(fp) 16 | -------------------------------------------------------------------------------- /Papers/Quantization/renderer.py: -------------------------------------------------------------------------------- 1 | from style import li 2 | from style import h2 3 | from style import a 4 | from style import p 5 | from style import newline 6 | 7 | 8 | def renderer(fp, data, config): 9 | year, month, day = data["date"].split("/") 10 | fp.write(h2(a([data["name"].strip(), data["url"].strip()]) + f", {year}/{month}")) 11 | fp.write(data["authors"]) 12 | newline(fp) 13 | newline(fp) 14 | fp.write(p(data["abstract"])) 15 | newline(fp) 16 | -------------------------------------------------------------------------------- /Papers/Federated_Learning/renderer.py: -------------------------------------------------------------------------------- 1 | from style import li 2 | from style import h2 3 | from style import a 4 | from style import p 5 | from style import newline 6 | 7 | 8 | def renderer(fp, data, config): 9 | year, month, day = data["date"].split("/") 10 | 
fp.write(h2(a([data["name"].strip(), data["url"].strip()]) + f", {year}/{month}")) 11 | fp.write(data["authors"]) 12 | newline(fp) 13 | newline(fp) 14 | fp.write(p(data["abstract"])) 15 | newline(fp) 16 | -------------------------------------------------------------------------------- /Papers/Efficient_Architectures/renderer.py: -------------------------------------------------------------------------------- 1 | from style import li 2 | from style import h2 3 | from style import a 4 | from style import p 5 | from style import newline 6 | 7 | 8 | def renderer(fp, data, config): 9 | year, month, day = data["date"].split("/") 10 | fp.write(h2(a([data["name"].strip(), data["url"].strip()]) + f", {year}/{month}")) 11 | fp.write(data["authors"]) 12 | newline(fp) 13 | newline(fp) 14 | fp.write(p(data["abstract"])) 15 | newline(fp) 16 | -------------------------------------------------------------------------------- /Papers/ML_Algorithms_For_Edge/renderer.py: -------------------------------------------------------------------------------- 1 | from style import li 2 | from style import h2 3 | from style import a 4 | from style import p 5 | from style import newline 6 | 7 | 8 | def renderer(fp, data, config): 9 | year, month, day = data["date"].split("/") 10 | fp.write(h2(a([data["name"].strip(), data["url"].strip()]) + f", {year}/{month}")) 11 | fp.write(data["authors"]) 12 | newline(fp) 13 | newline(fp) 14 | fp.write(p(data["abstract"])) 15 | newline(fp) 16 | -------------------------------------------------------------------------------- /Datasets/data.yaml: -------------------------------------------------------------------------------- 1 | # DATASETS 2 | --- 3 | - 4 | name: Visual Wake Words Dataset 5 | url: https://arxiv.org/abs/1906.05721 6 | description: > 7 | Visual Wake Words represents a common microcontroller vision use-case of identifying whether a person is present in the image or not, and provides a realistic benchmark for tiny vision models. Within a limited memory footprint of 250 KB, several state-of-the-art mobile models achieve accuracy of 85-90% on the Visual Wake Words dataset. 8 | 9 | ... 10 | -------------------------------------------------------------------------------- /Papers/renderer.py: -------------------------------------------------------------------------------- 1 | from style import h3 2 | from style import p 3 | from style import a 4 | from style import li 5 | from style import newline 6 | from utils import name2dir 7 | 8 | 9 | # TODO extract Papers directory automatically 10 | def renderer(fp, data, config): 11 | fp.write(h3(a([ 12 | data["name"], 13 | config["url"] + "/tree/master/Papers/" + name2dir(data["name"]), 14 | ]))) 15 | fp.write(p(data["description"])) 16 | fp.write("\n") 17 | 18 | 19 | def renderer_subdir(fp, data, config): 20 | li(fp, a([data["name"].strip(), data["url"]]) + ". 
" + data["authors"].strip()) 21 | newline(fp) 22 | -------------------------------------------------------------------------------- /Inference_Engines/renderer.py: -------------------------------------------------------------------------------- 1 | from style import li 2 | from style import h3 3 | from style import a 4 | 5 | 6 | # TODO image: platform(iOS, Android), 32, 16, 8 bits, gpu/cpu acceleration 7 | # TODO link to companies 8 | def renderer(fp, data, config): 9 | fp.write(h3(data["name"])) 10 | 11 | if data["sourcecode"] is not None: 12 | li(fp, [ 13 | "Source code: ", 14 | a(data["sourcecode"]), 15 | ]) 16 | 17 | if data["documentation"] is not None: 18 | li(fp, [ 19 | "Documentation: ", 20 | a(data["documentation"]), 21 | ]) 22 | 23 | li(fp, data["company"]) 24 | fp.write("\n") 25 | -------------------------------------------------------------------------------- /Books/renderer.py: -------------------------------------------------------------------------------- 1 | from style import h3 2 | from style import p 3 | from style import a 4 | from style import newline 5 | from style import li 6 | 7 | 8 | # TODO separate authors and add links 9 | def renderer(fp, data, config): 10 | title = data["title"] if data["subtitle"] is None else f"{data['title']}: {data['subtitle']}" 11 | fp.write(h3(a([ 12 | title, 13 | data["url"], 14 | ]))) 15 | 16 | # Authors 17 | if len(data["authors"]) > 1: 18 | author = "Authors: " 19 | else: 20 | author = "Author: " 21 | 22 | author += ", ".join(data["authors"]) 23 | li(fp, author) 24 | 25 | li(fp, f"Published: {data['published']}") 26 | newline(fp) 27 | -------------------------------------------------------------------------------- /Challenges/data.yaml: -------------------------------------------------------------------------------- 1 | # CHALLENGES 2 | --- 3 | 4 | - 5 | name: Low Power Recognition Challenge (LPIRC) 6 | url: https://rebootingcomputing.ieee.org/lpirc 7 | description: > 8 | Competition with focus on the best vision solutions that can simultaneously achieve high accuracy in computer vision and energy efficiency. LPIRC is regularly held during computer vision conferences (CVPR, ICCV and others) since 2015 and the winners’ solutions have already improved 24 times in the ratio of accuracy divided by energy. 9 | 10 | 11 | - [Online Track](https://rebootingcomputing.ieee.org/lpirc/online-track) 12 | 13 | 14 | - [Onsite Track](https://rebootingcomputing.ieee.org/lpirc/onsite-track) 15 | 16 | ... 17 | -------------------------------------------------------------------------------- /data.yaml: -------------------------------------------------------------------------------- 1 | # TABLE OF CONTENTS 2 | # 3 | # `name` have to correspond to the name of directory. 4 | # The space between words will be replaced with underscore _ 5 | # 6 | # The order of content will be same as defined here in this file. 7 | # 8 | # [Template] 9 | # - 10 | # name: 11 | --- 12 | 13 | - 14 | name: Papers 15 | 16 | - 17 | name: Datasets 18 | 19 | - 20 | name: Inference Engines 21 | 22 | - 23 | name: MCU and MPU Software Packages 24 | 25 | - 26 | name: AI Chips 27 | 28 | #- 29 | # name: Labs 30 | # TODO bisonai inference time benchmark 31 | #- 32 | # name: Benchmarks 33 | 34 | - 35 | name: Books 36 | 37 | - 38 | name: Challenges 39 | 40 | #- 41 | # name: Companies 42 | # - 43 | # name: Meetups 44 | 45 | - 46 | name: Other Resources 47 | 48 | - 49 | name: Contribute 50 | 51 | - 52 | name: LicenseBlock 53 | 54 | ... 
55 | -------------------------------------------------------------------------------- /Other_Resources/data.yaml: -------------------------------------------------------------------------------- 1 | # OTHER RESOURCES 2 | # 3 | # [Template] 4 | # 5 | # - 6 | # name: 7 | # url: 8 | # description: 9 | --- 10 | 11 | - 12 | name: Awesome EMDL 13 | url: https://github.com/EMDL/awesome-emdl 14 | description: Embedded and mobile deep learning research resources 15 | 16 | - 17 | name: Awesome Pruning 18 | url: https://github.com/he-y/Awesome-Pruning 19 | description: A curated list of neural network pruning resources 20 | 21 | - 22 | name: Efficient DNNs 23 | url: https://github.com/MingSun-Tse/EfficientDNNs 24 | description: Collection of recent methods on DNN compression and acceleration 25 | 26 | - 27 | name: Machine Think 28 | url: https://machinethink.net/ 29 | description: Machine learning tutorials targeted for iOS devices 30 | 31 | - 32 | name: Pete Warden's blog 33 | url: https://petewarden.com/ 34 | description: 35 | 36 | ... 37 | -------------------------------------------------------------------------------- /Books/data.yaml: -------------------------------------------------------------------------------- 1 | # BOOKS 2 | # 3 | # [Template] 4 | # 5 | # - 6 | # title: 7 | # subtitle: 8 | # authors: 9 | # - 10 | # url: 11 | # published: 12 | --- 13 | 14 | - 15 | title: Building Mobile Applications with TensorFlow 16 | subtitle: 17 | authors: 18 | - Pete Warden 19 | url: https://www.oreilly.com/library/view/building-mobile-applications/9781491988435/ 20 | published: 2017 21 | 22 | - 23 | title: Machine Learning by Tutorials 24 | subtitle: Beginning machine learning for Apple and iOS 25 | authors: 26 | - Matthijs Hollemans 27 | url: https://store.raywenderlich.com/products/machine-learning-by-tutorials 28 | published: 2019 29 | 30 | - 31 | title: Core ML Survival Guide 32 | subtitle: 33 | authors: 34 | - Matthijs Hollemans 35 | url: https://leanpub.com/coreml-survival-guide 36 | published: 2018 37 | 38 | - 39 | title: TinyML 40 | subtitle: Machine Learning with TensorFlow on Arduino, and Ultra-Low Power Micro-Controllers 41 | authors: 42 | - Pete Warden 43 | - Daniel Situnayake 44 | url: http://shop.oreilly.com/product/0636920254508.do 45 | published: 2020 46 | 47 | ... 
48 | -------------------------------------------------------------------------------- /utils.py: -------------------------------------------------------------------------------- 1 | from pathlib import Path 2 | from typing import List 3 | from typing import Dict 4 | 5 | from yaml import load 6 | from yaml import Loader 7 | 8 | 9 | def parse_yaml(filepath: Path): 10 | with open(filepath, "r") as f: 11 | stream = "".join(f.readlines()) 12 | return load(stream, Loader=Loader) 13 | 14 | 15 | def name2dir(name: str): 16 | return "_".join([s for s in name.split(" ")]) 17 | 18 | 19 | def name2link(name: str): 20 | """Used for hyperlink anchors""" 21 | if not isinstance(name, str): 22 | name = str(name) 23 | 24 | return "-".join([s.lower() for s in name.split(" ")]) 25 | 26 | 27 | def dir2name(name: str): 28 | if not isinstance(name, str): 29 | name = str(name) 30 | 31 | # return " ".join([w[0].upper() + w[1:] for w in name.split("_")]) 32 | return " ".join([w for w in name.split("_")]) 33 | 34 | 35 | def find_subdirectories(path: Path): 36 | if not isinstance(path, Path): 37 | path = Path(path) 38 | 39 | return sorted(list(filter(lambda f: f.is_dir() and f.name != "__pycache__", path.glob("*")))) 40 | 41 | 42 | def sort( 43 | data: List[Dict], 44 | sort_key: str, 45 | sort_reverse: bool, 46 | ): 47 | if sort_key is not None: 48 | data = sorted(data, key=lambda k: k[sort_key], reverse=sort_reverse) 49 | 50 | return data 51 | -------------------------------------------------------------------------------- /style.py: -------------------------------------------------------------------------------- 1 | from typing import List 2 | 3 | 4 | def concatenate(text: List): 5 | return "".join(filter(lambda x: x is not None, text)) 6 | 7 | 8 | def li(fp, text): 9 | if isinstance(text, list): 10 | text = concatenate(text) 11 | 12 | fp.write("- " + text + "\n") 13 | 14 | 15 | def lili(fp, text): 16 | """Second level of list items""" 17 | fp.write("\t") 18 | li(fp, text) 19 | 20 | 21 | def h1(text): 22 | return "# " + text + "\n" 23 | 24 | 25 | def h2(text): 26 | return "## " + text + "\n" 27 | 28 | 29 | def h3(text): 30 | return "### " + text + "\n" 31 | 32 | 33 | def h4(text): 34 | return "#### " + text + "\n" 35 | 36 | 37 | def h5(text): 38 | return "##### " + text + "\n" 39 | 40 | 41 | def h6(text): 42 | return "###### " + text + "\n" 43 | 44 | 45 | def p(text): 46 | if text is None: 47 | return "\n" 48 | else: 49 | return str(text) + "\n" 50 | 51 | 52 | def a(args: List): 53 | if not isinstance(args, list): 54 | args = [args] 55 | 56 | if len(args) == 1: 57 | src = args[0] 58 | if src is None: 59 | return "" 60 | else: 61 | return f"[{src}]({src})" 62 | if len(args) == 2: 63 | name = args[0] 64 | src = args[1] 65 | if name is None or src is None: 66 | return "" 67 | else: 68 | return f"[{name}]({src})" 69 | else: 70 | raise NotImplementedError 71 | 72 | 73 | def newline(fp, iter=1): 74 | for _ in range(iter): 75 | fp.write("\n") 76 | -------------------------------------------------------------------------------- /Papers/data.yaml: -------------------------------------------------------------------------------- 1 | # PAPERS 2 | # 3 | # [Template] 4 | # 5 | # - 6 | # name: 7 | # description: 8 | --- 9 | 10 | - name: Applications 11 | description: > 12 | There is a countless number of possible edge machine learning applications. Here, we collect papers that describe specific solutions. 
13 | 14 | - 15 | name: Federated Learning 16 | description: > 17 | Federated Learning enables mobile phones to collaboratively learn a shared prediction model while keeping all the training data on device, decoupling the ability to do machine learning from the need to store the data in the cloud. (Google AI blog: Federated Learning) 18 | 19 | - 20 | name: Quantization 21 | description: > 22 | Quantization is the process of reducing the precision of weights and/or activations in a neural network, e.g. from 32 bit floating point to lower bit-depth representations. The advantages of this method are a reduced model size and faster model inference on hardware that supports arithmetic operations in lower precision. 23 | 24 | - 25 | name: Network Pruning 26 | description: > 27 | Pruning is a common method to derive a compact network – after training, some structural portion of the parameters is removed, along with its associated computations. (Importance Estimation for Neural Network Pruning) 28 | 29 | - 30 | name: AutoML 31 | description: > 32 | Automated machine learning (AutoML) is the process of automating the end-to-end process of applying machine learning to real-world problems. (Wikipedia) AutoML is, for example, used to design new efficient neural architectures under a constraint on the computational budget (defined either as a number of FLOPs or as an inference time measured on a real device) or on the size of the architecture. 33 | 34 | - 35 | name: Efficient Architectures 36 | description: > 37 | Efficient architectures represent neural networks with a small memory footprint and fast inference time when measured on edge devices. 38 | 39 | - name: ML Algorithms For Edge 40 | description: > 41 | Standard machine learning algorithms are not always able to run on edge devices due to large computational requirements and space complexity. This section introduces optimized machine learning algorithms. 42 | 43 | - 44 | name: Others 45 | description: > 46 | This section contains papers that are related to edge machine learning but are not part of any major group. These papers often deal with deployment issues (i.e. optimizing inference on the target platform). 47 | 48 | ... 
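As a side note to the Quantization description above: a toy sketch of symmetric post-training quantization of a weight tensor to int8. This is a generic illustration of the idea (it assumes NumPy is available) and is not taken from any particular paper listed in this repository.

import numpy as np


def quantize_int8(weights):
    # Symmetric linear quantization: one float32 scale for the whole tensor.
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize(q, scale):
    # Approximate reconstruction of the original float32 weights.
    return q.astype(np.float32) * scale


w = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int8(w)
print(np.abs(w - dequantize(q, scale)).max())  # small quantization error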
49 | -------------------------------------------------------------------------------- /Inference_Engines/data.yaml: -------------------------------------------------------------------------------- 1 | # INFERENCE ENGINES 2 | # 3 | # [Template] 4 | # 5 | # - 6 | # name: 7 | # company: 8 | # sourcecode: 9 | # documentation: 10 | # platform: 11 | # gpu: 12 | --- 13 | 14 | - 15 | name: Arm Compute Library 16 | company: Arm 17 | sourcecode: https://github.com/ARM-software/ComputeLibrary 18 | documentation: 19 | platform: 20 | gpu: 21 | 22 | - 23 | name: Qualcomm Neural Processing SDK for AI 24 | company: Qualcomm 25 | sourcecode: https://developer.qualcomm.com/software/qualcomm-neural-processing-sdk 26 | documentation: 27 | platform: Android 28 | gpu: true 29 | 30 | - 31 | name: Embedded Learning Library 32 | company: Microsoft 33 | sourcecode: https://github.com/Microsoft/ELL 34 | documentation: https://microsoft.github.io/ELL 35 | platform: Raspberry Pi, Arduino, micro:bit 36 | gpu: false 37 | 38 | - 39 | name: Bender 40 | company: Xmartlabs 41 | sourcecode: https://github.com/xmartlabs/Bender 42 | documentation: https://xmartlabs.github.io/Bender/ 43 | platform: iOS 44 | gpu: true 45 | 46 | - 47 | name: dabnn 48 | company: JDAI Computer Vision 49 | sourcecode: https://github.com/JDAI-CV/dabnn 50 | documentation: 51 | platform: Android 52 | gpu: 53 | 54 | - 55 | name: Tengine 56 | company: OAID 57 | sourcecode: https://github.com/OAID/Tengine 58 | documentation: 59 | platform: Android 60 | gpu: true 61 | 62 | - 63 | name: MACE 64 | company: XiaoMi 65 | sourcecode: https://github.com/XiaoMi/mace 66 | documentation: https://mace.readthedocs.io/ 67 | platform: Android, iOS 68 | gpu: 69 | 70 | - 71 | name: MNN 72 | company: Alibaba 73 | sourcecode: https://github.com/alibaba/MNN 74 | documentation: 75 | platform: 76 | gpu: 77 | 78 | - 79 | name: Feather CNN 80 | company: Tencent 81 | sourcecode: https://github.com/Tencent/FeatherCNN 82 | documentation: 83 | platform: 84 | gpu: 85 | 86 | - 87 | name: NCNN 88 | company: Tencent 89 | sourcecode: https://github.com/tencent/ncnn 90 | documentation: 91 | platform: iOS, Android 92 | gpu: true 93 | 94 | - 95 | name: Paddle Mobile 96 | company: Baidu 97 | sourcecode: https://github.com/PaddlePaddle/paddle-mobile 98 | documentation: 99 | platform: 100 | gpu: 101 | 102 | - 103 | name: MXNet 104 | company: Amazon 105 | sourcecode: 106 | documentation: https://mxnet.incubator.apache.org/versions/master/faq/smart_device.html 107 | platform: 108 | gpu: 109 | 110 | - 111 | name: TensorFlow Lite 112 | company: Google 113 | sourcecode: https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite 114 | documentation: https://www.tensorflow.org/lite/ 115 | platform: Android, iOS 116 | gpu: true 117 | 118 | - 119 | name: Caffe 2 120 | company: Facebook 121 | sourcecode: https://github.com/pytorch/pytorch/tree/master/caffe2 122 | documentation: https://caffe2.ai/ 123 | platform: 124 | gpu: 125 | 126 | - 127 | name: CoreML 128 | company: Apple 129 | sourcecode: 130 | documentation: https://developer.apple.com/documentation/coreml 131 | platform: iOS 132 | gpu: true 133 | 134 | - 135 | name: Neural Networks API 136 | company: Google 137 | sourcecode: 138 | documentation: https://developer.android.com/ndk/guides/neuralnetworks/ 139 | platform: 140 | gpu: true 141 | 142 | - 143 | name: Deeplearning4j 144 | company: Skymind 145 | sourcecode: 146 | documentation: https://deeplearning4j.org/docs/latest/deeplearning4j-android 147 | platform: 148 | gpu: 149 | 150 | ... 
151 | -------------------------------------------------------------------------------- /MCU_and_MPU_Software_Packages/data.yaml: -------------------------------------------------------------------------------- 1 | # MCU AND MPU SOFTWARE PACKAGES 2 | # 3 | # [Template] 4 | # 5 | # - 6 | # name: 7 | # description: 8 | # link: 9 | # company: 10 | --- 11 | 12 | - 13 | name: FP-AI-Sensing 14 | description: STM32Cube function pack for ultra-low power IoT node with artificial intelligence (AI) application based on audio and motion sensing 15 | link: https://www.st.com/content/st_com/en/products/embedded-software/mcu-mpu-embedded-software/stm32-embedded-software/stm32-ode-function-pack-sw/fp-ai-sensing1.html 16 | company: STMicroelectronics 17 | 18 | - 19 | name: FP-AI-VISION1 20 | description: FP-AI-VISION1 is an STM32Cube function pack featuring examples of computer vision applications based on Convolutional Neural Network (CNN) 21 | link: https://www.st.com/content/st_com/en/products/embedded-software/mcu-mpu-embedded-software/stm32-embedded-software/stm32cube-expansion-packages/fp-ai-vision1.html 22 | company: STMicroelectronics 23 | 24 | - 25 | name: X-LINUX-AI-CV 26 | description: X-LINUX-AI-CV is an STM32 MPU OpenSTLinux Expansion Package that targets Artificial Intelligence for computer vision applications based on Convolutional Neural Network (CNN) 27 | link: https://www.st.com/content/st_com/en/products/embedded-software/mcu-mpu-embedded-software/stm32-embedded-software/stm32-mpu-openstlinux-expansion-packages/x-linux-ai-cv.html 28 | company: STMicroelectronics 29 | 30 | - 31 | name: e-AI Translator 32 | description: Tool for converting Caffe and TensorFlow models to MCU/MPU development environment 33 | link: https://www.renesas.com/jp/en/solutions/key-technology/e-ai/tool.html 34 | company: Renesas 35 | 36 | - 37 | name: e-AI Checker 38 | description: Based on the output result from the translator, the ROM/RAM mounting size and the inference execution processing time are calculated while referring to the information of the selected MCU/MPU 39 | link: https://www.renesas.com/jp/en/solutions/key-technology/e-ai/tool.html 40 | company: Renesas 41 | 42 | - 43 | name: Processor SDK Linux for AM57x 44 | description: TIDL software framework leverages a highly optimized neural network implementation on TI’s Sitara AM57x processors, making use of hardware acceleration on the device 45 | link: http://www.ti.com/tool/SITARA-MACHINE-LEARNING 46 | company: Texas Instruments 47 | 48 | - 49 | name: eIQ ML Software Development Environment 50 | description: The NXP® eIQ™ machine learning software development environment enables the use of ML algorithms on NXP MCUs, i.MX RT crossover MCUs, and i.MX family SoCs. 
eIQ software includes inference engines, neural network compilers and optimized libraries 51 | link: https://www.nxp.com/design/software/development-software/eiq-ml-development-environment:EIQ 52 | company: NXP 53 | 54 | - 55 | name: eIQ Auto deep learning (DL) toolkit 56 | description: The NXP eIQ™ Auto deep learning (DL) toolkit enables developers to introduce DL algorithms into their applications and to continue satisfying automotive standards 57 | link: https://www.nxp.com/design/software/development-software/eiq-auto-dl-toolkit:EIQ-AUTO 58 | company: NXP 59 | 60 | - 61 | name: eIQ™ Software for Arm® NN Inference Engine 62 | description: 63 | link: https://www.nxp.com/design/software/development-software/eiq-ml-development-environment/eiq-software-for-arm-nn-inference-engine:eIQArmNN 64 | company: NXP 65 | 66 | - 67 | name: eIQ™ for TensorFlow Lite 68 | description: 69 | link: https://www.nxp.com/design/software/development-software/eiq-ml-development-environment/eiq-for-tensorflow-lite:eIQTensorFlowLite 70 | company: NXP 71 | 72 | - 73 | name: eIQ™ for Glow Neural Network Compiler 74 | description: 75 | link: https://www.nxp.com/design/software/development-software/eiq-ml-development-environment/eiq-for-glow-neural-network-compiler:eIQ-Glow 76 | company: NXP 77 | 78 | - 79 | name: eIQ™ for Arm® CMSIS-NN 80 | description: 81 | link: https://www.nxp.com/design/software/development-software/eiq-ml-development-environment/eiq-for-arm-cmsis-nn:eIQArmCMSISNN 82 | company: NXP 83 | -------------------------------------------------------------------------------- /Papers/Federated_Learning/README.md: -------------------------------------------------------------------------------- 1 | # Federated Learning 2 | [Back to awesome edge machine learning](https://github.com/bisonai/awesome-edge-machine-learning) 3 | 4 | [Back to Papers](https://github.com/bisonai/awesome-edge-machine-learning/tree/master/Papers) 5 | 6 | Federated Learning enables mobile phones to collaboratively learn a shared prediction model while keeping all the training data on device, decoupling the ability to do machine learning from the need to store the data in the cloud.Google AI blog: Federated Learning 7 | 8 | 9 | ## [Towards Federated Learning at Scale: System Design](https://arxiv.org/abs/1902.01046), 2019/02 10 | Keith Bonawitz, Hubert Eichner, Wolfgang Grieskamp, Dzmitry Huba, Alex Ingerman, Vladimir Ivanov, Chloe Kiddon, Jakub Konečný, Stefano Mazzocchi, H. Brendan McMahan, Timon Van Overveldt, David Petrou, Daniel Ramage, Jason Roselander 11 | 12 | Federated Learning is a distributed machine learning approach which enables model training on a large corpus of decentralized data. We have built a scalable production system for Federated Learning in the domain of mobile devices, based on TensorFlow. In this paper, we describe the resulting high-level design, sketch some of the challenges and their solutions, and touch upon the open problems and future directions. 13 | 14 | 15 | ## [Adaptive Federated Learning in Resource Constrained Edge Computing Systems](https://arxiv.org/abs/1804.05271), 2018/04 16 | Shiqiang Wang, Tiffany Tuor, Theodoros Salonidis, Kin K. Leung, Christian Makaya, Ting He, Kevin Chan 17 | 18 | Emerging technologies and applications including Internet of Things (IoT), social networking, and crowd-sourcing generate large amounts of data at the network edge. Machine learning models are often built from the collected data, to enable the detection, classification, and prediction of future events. 
Due to bandwidth, storage, and privacy concerns, it is often impractical to send all the data to a centralized location. In this paper, we consider the problem of learning model parameters from data distributed across multiple edge nodes, without sending raw data to a centralized place. Our focus is on a generic class of machine learning models that are trained using gradient-descent based approaches. We analyze the convergence bound of distributed gradient descent from a theoretical point of view, based on which we propose a control algorithm that determines the best trade-off between local update and global parameter aggregation to minimize the loss function under a given resource budget. The performance of the proposed algorithm is evaluated via extensive experiments with real datasets, both on a networked prototype system and in a larger-scale simulated environment. The experimentation results show that our proposed approach performs near to the optimum with various machine learning models and different data distributions. 19 | 20 | 21 | ## [Communication-Efficient Learning of Deep Networks from Decentralized Data](https://arxiv.org/abs/1602.05629), 2016/02 22 | H. Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, Blaise Agüera y Arcas 23 | 24 | Modern mobile devices have access to a wealth of data suitable for learning models, which in turn can greatly improve the user experience on the device. For example, language models can improve speech recognition and text entry, and image models can automatically select good photos. However, this rich data is often privacy sensitive, large in quantity, or both, which may preclude logging to the data center and training there using conventional approaches. We advocate an alternative that leaves the training data distributed on the mobile devices, and learns a shared model by aggregating locally-computed updates. We term this decentralized approach Federated Learning. We present a practical method for the federated learning of deep networks based on iterative model averaging, and conduct an extensive empirical evaluation, considering five different model architectures and four datasets. These experiments demonstrate the approach is robust to the unbalanced and non-IID data distributions that are a defining characteristic of this setting. Communication costs are the principal constraint, and we show a reduction in required communication rounds by 10-100x as compared to synchronized stochastic gradient descent. 25 | 26 | 27 | -------------------------------------------------------------------------------- /Papers/Federated_Learning/data.yaml: -------------------------------------------------------------------------------- 1 | # FEDERATED LEARNING 2 | # 3 | # `date` format is as following yyy/mm/dd. For example 6 May 2019 would be 2019/05/06. 4 | # In case of arxiv, use the date of the first version of paper. 5 | # 6 | # [Template] 7 | # 8 | # - 9 | # name: 10 | # url: 11 | # date: 12 | # conference: 13 | # code: 14 | # authors: 15 | # abstract: 16 | --- 17 | 18 | - 19 | name: > 20 | Communication-Efficient Learning of Deep Networks from Decentralized Data 21 | url: https://arxiv.org/abs/1602.05629 22 | date: 2016/02/17 23 | conference: AISTATS 2017 24 | code: 25 | authors: H. Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, Blaise Agüera y Arcas 26 | abstract: > 27 | Modern mobile devices have access to a wealth of data suitable for learning models, which in turn can greatly improve the user experience on the device. 
For example, language models can improve speech recognition and text entry, and image models can automatically select good photos. However, this rich data is often privacy sensitive, large in quantity, or both, which may preclude logging to the data center and training there using conventional approaches. We advocate an alternative that leaves the training data distributed on the mobile devices, and learns a shared model by aggregating locally-computed updates. We term this decentralized approach Federated Learning. 28 | We present a practical method for the federated learning of deep networks based on iterative model averaging, and conduct an extensive empirical evaluation, considering five different model architectures and four datasets. These experiments demonstrate the approach is robust to the unbalanced and non-IID data distributions that are a defining characteristic of this setting. Communication costs are the principal constraint, and we show a reduction in required communication rounds by 10-100x as compared to synchronized stochastic gradient descent. 29 | 30 | - 31 | name: > 32 | Towards Federated Learning at Scale: System Design 33 | url: https://arxiv.org/abs/1902.01046 34 | date: 2019/02/04 35 | conference: 36 | code: 37 | authors: Keith Bonawitz, Hubert Eichner, Wolfgang Grieskamp, Dzmitry Huba, Alex Ingerman, Vladimir Ivanov, Chloe Kiddon, Jakub Konečný, Stefano Mazzocchi, H. Brendan McMahan, Timon Van Overveldt, David Petrou, Daniel Ramage, Jason Roselander 38 | abstract: > 39 | Federated Learning is a distributed machine learning approach which enables model training on a large corpus of decentralized data. We have built a scalable production system for Federated Learning in the domain of mobile devices, based on TensorFlow. In this paper, we describe the resulting high-level design, sketch some of the challenges and their solutions, and touch upon the open problems and future directions. 40 | 41 | - 42 | name: > 43 | Adaptive Federated Learning in Resource Constrained Edge Computing Systems 44 | url: https://arxiv.org/abs/1804.05271 45 | date: 2018/04/14 46 | conference: IEEE Journal on Selected Areas in Communications 47 | code: 48 | authors: Shiqiang Wang, Tiffany Tuor, Theodoros Salonidis, Kin K. Leung, Christian Makaya, Ting He, Kevin Chan 49 | abstract: > 50 | Emerging technologies and applications including Internet of Things (IoT), social networking, and crowd-sourcing generate large amounts of data at the network edge. Machine learning models are often built from the collected data, to enable the detection, classification, and prediction of future events. Due to bandwidth, storage, and privacy concerns, it is often impractical to send all the data to a centralized location. In this paper, we consider the problem of learning model parameters from data distributed across multiple edge nodes, without sending raw data to a centralized place. Our focus is on a generic class of machine learning models that are trained using gradient-descent based approaches. We analyze the convergence bound of distributed gradient descent from a theoretical point of view, based on which we propose a control algorithm that determines the best trade-off between local update and global parameter aggregation to minimize the loss function under a given resource budget. The performance of the proposed algorithm is evaluated via extensive experiments with real datasets, both on a networked prototype system and in a larger-scale simulated environment. 
The experimentation results show that our proposed approach performs near to the optimum with various machine learning models and different data distributions. 51 | ... 52 | -------------------------------------------------------------------------------- /awesome.py: -------------------------------------------------------------------------------- 1 | # AWESOME EDGE MACHINE LEARNING 2 | # Bisonai, 2019 3 | from pathlib import Path 4 | 5 | from style import li 6 | from style import lili 7 | from style import h1 8 | from style import h2 9 | from style import a 10 | from style import p 11 | from style import newline 12 | from utils import parse_yaml 13 | from utils import name2dir 14 | from utils import dir2name 15 | from utils import name2link 16 | from utils import find_subdirectories 17 | from utils import sort 18 | 19 | # TODO conference badges 20 | 21 | config = parse_yaml("config.yaml") 22 | f = open(config["filename"], "w") 23 | 24 | 25 | # Introduction ######################################################## 26 | f.write(h1(config["title"])) 27 | for badge in config["badge"]: 28 | f.write(badge) 29 | newline(f) 30 | 31 | newline(f) 32 | f.write(config["description"]) 33 | newline(f, iter=2) 34 | 35 | 36 | # Table of Contents ################################################### 37 | f.write(h2("Table of Contents")) 38 | table_of_contents = parse_yaml("data.yaml") 39 | default_level = 1 40 | max_level = config.get("max_level", default_level) 41 | level = default_level 42 | for tol in table_of_contents: 43 | li(f, a([ 44 | tol["name"], 45 | config["url"] + "#" + name2link(tol["name"]), 46 | ])) 47 | 48 | # Deeper levels in table of contents 49 | while True: 50 | if level < max_level: 51 | level += 1 52 | sub_table_of_contents = find_subdirectories(name2dir(tol["name"])) 53 | for s in sub_table_of_contents: 54 | lili(f, a([ 55 | dir2name(s.name), 56 | config["url"] + "/tree/master/" + str(s), 57 | ])) 58 | else: 59 | level = default_level 60 | break 61 | 62 | newline(f) 63 | 64 | 65 | # Main Content ######################################################## 66 | for tol in table_of_contents: 67 | f.write(h2(tol["name"])) 68 | 69 | datafile = Path(name2dir(tol["name"])) 70 | if not datafile.is_dir(): 71 | print(f"You must create directory for {tol['name']} and populate it with data.yaml, config.yaml and renderer.py files.") 72 | continue 73 | 74 | data = parse_yaml(datafile / "data.yaml") 75 | config_local = parse_yaml(datafile / "config.yaml") 76 | 77 | # Section description 78 | description = config_local.get("description", None) 79 | if description is not None: 80 | f.write(p(description)) 81 | newline(f) 82 | 83 | # Sort content of section 84 | sort_key = config_local.get("sort_key", None) 85 | data = sort(data, sort_key, config_local.get("sort_reverse", False)) 86 | 87 | exec(f"from {datafile}.renderer import renderer") 88 | 89 | try: 90 | exec(f"from {datafile}.renderer import renderer_subdir") 91 | # e.g. content of Papers / README.md 92 | fp_sub2 = open(str(Path(tol["name"]) / "README.md"), "w") 93 | fp_sub2.write(h1(tol["name"])) 94 | fp_sub2.write(a(["Back to awesome edge machine learning", config["url"]])) 95 | newline(fp_sub2, iter=2) 96 | except: 97 | pass 98 | 99 | if not isinstance(data, list): 100 | data = [data] 101 | for d in data: 102 | renderer(f, d, config) 103 | 104 | subdirs = find_subdirectories(datafile) 105 | for idx, sub in enumerate(subdirs): 106 | # e.g. 
content of Papers / AutoML / README.md 107 | data_sub = parse_yaml(sub / "data.yaml") 108 | config_sub = parse_yaml(sub / "config.yaml") 109 | fp_sub = open(sub / "README.md", "w") 110 | 111 | fp_sub.write(h1(dir2name(sub.name))) 112 | fp_sub.write(a(["Back to awesome edge machine learning", config["url"]])) 113 | newline(fp_sub, iter=2) 114 | fp_sub.write(a([f"Back to {datafile}", config["url"] + f"/tree/master/{datafile}"])) 115 | newline(fp_sub, iter=2) 116 | fp_sub.write(data[idx]["description"]) 117 | newline(fp_sub, iter=2) 118 | 119 | exec(f"from {str(sub).replace('/', '.')}.renderer import renderer") 120 | 121 | try: 122 | fp_sub2.write(h2(dir2name(sub.name))) 123 | newline(fp_sub2) 124 | fp_sub2.write((data[idx]["description"])) 125 | newline(fp_sub2) 126 | except: 127 | pass 128 | 129 | if config_sub is not None: 130 | sort_key = config_sub.get("sort_key", None) 131 | data_sub = sort(data_sub, sort_key, config_sub.get("sort_reverse", False)) 132 | for d in data_sub: 133 | renderer(fp_sub, d, config) 134 | try: 135 | renderer_subdir(fp_sub2, d, config) 136 | except: 137 | pass 138 | fp_sub.close() 139 | -------------------------------------------------------------------------------- /Papers/ML_Algorithms_For_Edge/README.md: -------------------------------------------------------------------------------- 1 | # ML Algorithms For Edge 2 | [Back to awesome edge machine learning](https://github.com/bisonai/awesome-edge-machine-learning) 3 | 4 | [Back to Papers](https://github.com/bisonai/awesome-edge-machine-learning/tree/master/Papers) 5 | 6 | Standard machine learning algorithms are not always able to run on edge devices due to large computational requirements and space complexity. This section introduces optimized machine learning algorithms. 7 | 8 | 9 | ## [Shallow RNNs: A Method for Accurate Time-series Classification on Tiny Devices](https://dkdennis.xyz/static/sharnn-neurips19-paper.pdf), 2019/12 10 | Don Dennis, Durmus Alp Emre Acar, Vikram Mandikal, Vinu Sankar Sadasivan, Harsha Vardhan Simhadri, Venkatesh Saligrama, Prateek Jain 11 | 12 | Recurrent Neural Networks (RNNs) capture long dependencies and context, and hence are the key component of typical sequential data based tasks. However, the sequential nature of RNNs dictates a large inference cost for long sequences even if the hardware supports parallelization. To induce long-term dependencies, and yet admit parallelization, we introduce novel shallow RNNs. In this architecture, the first layer splits the input sequence and runs several independent RNNs. The second layer consumes the output of the first layer using a second RNN thus capturing long dependencies. We provide theoretical justification for our architecture under weak assumptions that we verify on real-world benchmarks. Furthermore, we show that for time-series classification, our technique leads to substantially improved inference time over standard RNNs without compromising accuracy. For example, we can deploy audio-keyword classification on tiny Cortex M4 devices (100MHz processor, 256KB RAM, no DSP available) which was not possible using standard RNN models. Similarly, using ShaRNN in the popular Listen-Attend-Spell (LAS) architecture for phoneme classification [4], we can reduce the lag in phoneme classification by 10-12x while maintaining state-of-the-art accuracy. 
13 | 14 | 15 | ## [ProtoNN: Compressed and Accurate kNN for Resource-scarce Devices](http://proceedings.mlr.press/v70/gupta17a.html), 2017/08 16 | Chirag Gupta, Arun Sai Suggala, Ankit Goyal, Harsha Vardhan Simhadri, Bhargavi Paranjape, Ashish Kumar, Saurabh Goyal, Raghavendra Udupa, Manik Varma, Prateek Jain 17 | 18 | Several real-world applications require real-time prediction on resource-scarce devices such as an Internet of Things (IoT) sensor. Such applications demand prediction models with small storage and computational complexity that do not compromise significantly on accuracy. In this work, we propose ProtoNN, a novel algorithm that addresses the problem of real-time and accurate prediction on resource-scarce devices. ProtoNN is inspired by k-Nearest Neighbor (KNN) but has several orders lower storage and prediction complexity. ProtoNN models can be deployed even on devices with puny storage and computational power (e.g. an Arduino UNO with 2kB RAM) to get excellent prediction accuracy. ProtoNN derives its strength from three key ideas: a) learning a small number of prototypes to represent the entire training set, b) sparse low dimensional projection of data, c) joint discriminative learning of the projection and prototypes with explicit model size constraint. We conduct systematic empirical evaluation of ProtoNN on a variety of supervised learning tasks (binary, multi-class, multi-label classification) and show that it gives nearly state-of-the-art prediction accuracy on resource-scarce devices while consuming several orders lower storage, and using minimal working memory. 19 | 20 | 21 | ## [Resource-efficient Machine Learning in 2 KB RAM for the Internet of Things](http://proceedings.mlr.press/v70/kumar17a.html), 2017/08 22 | Ashish Kumar, Saurabh Goyal, Manik Varma 23 | 24 | This paper develops a novel tree-based algorithm, called Bonsai, for efficient prediction on IoT devices – such as those based on the Arduino Uno board having an 8 bit ATmega328P microcontroller operating at 16 MHz with no native floating point support, 2 KB RAM and 32 KB read-only flash. Bonsai maintains prediction accuracy while minimizing model size and prediction costs by: (a) developing a tree model which learns a single, shallow, sparse tree with powerful nodes; (b) sparsely projecting all data into a low-dimensional space in which the tree is learnt; and (c) jointly learning all tree and projection parameters. Experimental results on multiple benchmark datasets demonstrate that Bonsai can make predictions in milliseconds even on slow microcontrollers, can fit in KB of memory, has lower battery consumption than all other algorithms while achieving prediction accuracies that can be as much as 30\% higher than state-of-the-art methods for resource-efficient machine learning. Bonsai is also shown to generalize to other resource constrained settings beyond IoT by generating significantly better search results as compared to Bing’s L3 ranker when the model size is restricted to 300 bytes. Bonsai’s code can be downloaded from [http://www.manikvarma.org/code/Bonsai/download.html](http://www.manikvarma.org/code/Bonsai/download.html). 25 | 26 | 27 | -------------------------------------------------------------------------------- /Papers/ML_Algorithms_For_Edge/data.yaml: -------------------------------------------------------------------------------- 1 | # ML ALGORITHMS FOR EDGE 2 | # 3 | # `date` format is as following yyy/mm/dd. For example 6 May 2019 would be 2019/05/06. 
4 | # In case of arxiv, use the date of the first version of paper. 5 | # 6 | # [Template] 7 | # 8 | # - 9 | # name: 10 | # url: 11 | # date: 12 | # conference: 13 | # code: 14 | # authors: 15 | # abstract: 16 | --- 17 | 18 | - 19 | name: > 20 | ProtoNN: Compressed and Accurate kNN for Resource-scarce Devices 21 | url: http://proceedings.mlr.press/v70/gupta17a.html 22 | date: 2017/08/06 23 | conference: ICML 2017 24 | code: https://github.com/microsoft/EdgeML/tree/master/cpp/src/ProtoNN 25 | authors: Chirag Gupta, Arun Sai Suggala, Ankit Goyal, Harsha Vardhan Simhadri, Bhargavi Paranjape, Ashish Kumar, Saurabh Goyal, Raghavendra Udupa, Manik Varma, Prateek Jain 26 | abstract: > 27 | Several real-world applications require real-time prediction on resource-scarce devices such as an Internet of Things (IoT) sensor. Such applications demand prediction models with small storage and computational complexity that do not compromise significantly on accuracy. In this work, we propose ProtoNN, a novel algorithm that addresses the problem of real-time and accurate prediction on resource-scarce devices. ProtoNN is inspired by k-Nearest Neighbor (KNN) but has several orders lower storage and prediction complexity. ProtoNN models can be deployed even on devices with puny storage and computational power (e.g. an Arduino UNO with 2kB RAM) to get excellent prediction accuracy. ProtoNN derives its strength from three key ideas: a) learning a small number of prototypes to represent the entire training set, b) sparse low dimensional projection of data, c) joint discriminative learning of the projection and prototypes with explicit model size constraint. We conduct systematic empirical evaluation of ProtoNN on a variety of supervised learning tasks (binary, multi-class, multi-label classification) and show that it gives nearly state-of-the-art prediction accuracy on resource-scarce devices while consuming several orders lower storage, and using minimal working memory. 28 | 29 | - 30 | name: > 31 | Resource-efficient Machine Learning in 2 KB RAM for the Internet of Things 32 | url: http://proceedings.mlr.press/v70/kumar17a.html 33 | date: 2017/08/06 34 | conference: ICML 2017 35 | code: https://github.com/microsoft/EdgeML/tree/master/cpp/src/Bonsai 36 | authors: Ashish Kumar, Saurabh Goyal, Manik Varma 37 | abstract: > 38 | This paper develops a novel tree-based algorithm, called Bonsai, for efficient prediction on IoT devices – such as those based on the Arduino Uno board having an 8 bit ATmega328P microcontroller operating at 16 MHz with no native floating point support, 2 KB RAM and 32 KB read-only flash. Bonsai maintains prediction accuracy while minimizing model size and prediction costs by: (a) developing a tree model which learns a single, shallow, sparse tree with powerful nodes; (b) sparsely projecting all data into a low-dimensional space in which the tree is learnt; and (c) jointly learning all tree and projection parameters. Experimental results on multiple benchmark datasets demonstrate that Bonsai can make predictions in milliseconds even on slow microcontrollers, can fit in KB of memory, has lower battery consumption than all other algorithms while achieving prediction accuracies that can be as much as 30\% higher than state-of-the-art methods for resource-efficient machine learning. Bonsai is also shown to generalize to other resource constrained settings beyond IoT by generating significantly better search results as compared to Bing’s L3 ranker when the model size is restricted to 300 bytes. 
Bonsai’s code can be downloaded from [http://www.manikvarma.org/code/Bonsai/download.html](http://www.manikvarma.org/code/Bonsai/download.html). 39 | 40 | - 41 | name: > 42 | Shallow RNNs: A Method for Accurate Time-series Classification on Tiny Devices 43 | url: https://dkdennis.xyz/static/sharnn-neurips19-paper.pdf 44 | date: 2019/12/09 45 | conference: NeurIPS 2019 46 | code: https://github.com/Microsoft/EdgeML/ 47 | authors: Don Dennis, Durmus Alp Emre Acar, Vikram Mandikal, Vinu Sankar Sadasivan, Harsha Vardhan Simhadri, Venkatesh Saligrama, Prateek Jain 48 | abstract: > 49 | Recurrent Neural Networks (RNNs) capture long dependencies and context, and hence are the key component of typical sequential data based tasks. However, the sequential nature of RNNs dictates a large inference cost for long sequences even if the hardware supports parallelization. To induce long-term dependencies, and yet admit parallelization, we introduce novel shallow RNNs. In this architecture, the first layer splits the input sequence and runs several independent RNNs. The second layer consumes the output of the first layer using a second RNN thus capturing long dependencies. We provide theoretical justification for our architecture under weak assumptions that we verify on real-world benchmarks. Furthermore, we show that for time-series classification, our technique leads to substantially improved inference time over standard RNNs without compromising accuracy. For example, we can deploy audio-keyword classification on tiny Cortex M4 devices (100MHz processor, 256KB RAM, no DSP available) which was not possible using standard RNN models. Similarly, using ShaRNN in the popular Listen-Attend-Spell (LAS) architecture for phoneme classification [4], we can reduce the lag in phoneme classification by 10-12x while maintaining state-of-the-art accuracy. 50 | 51 | ... 52 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | 2019 Bisonai Authors 2 | 3 | Creative Commons Legal Code 4 | 5 | CC0 1.0 Universal 6 | 7 | CREATIVE COMMONS CORPORATION IS NOT A LAW FIRM AND DOES NOT PROVIDE 8 | LEGAL SERVICES. DISTRIBUTION OF THIS DOCUMENT DOES NOT CREATE AN 9 | ATTORNEY-CLIENT RELATIONSHIP. CREATIVE COMMONS PROVIDES THIS 10 | INFORMATION ON AN "AS-IS" BASIS. CREATIVE COMMONS MAKES NO WARRANTIES 11 | REGARDING THE USE OF THIS DOCUMENT OR THE INFORMATION OR WORKS 12 | PROVIDED HEREUNDER, AND DISCLAIMS LIABILITY FOR DAMAGES RESULTING FROM 13 | THE USE OF THIS DOCUMENT OR THE INFORMATION OR WORKS PROVIDED 14 | HEREUNDER. 15 | 16 | Statement of Purpose 17 | 18 | The laws of most jurisdictions throughout the world automatically confer 19 | exclusive Copyright and Related Rights (defined below) upon the creator 20 | and subsequent owner(s) (each and all, an "owner") of an original work of 21 | authorship and/or a database (each, a "Work"). 22 | 23 | Certain owners wish to permanently relinquish those rights to a Work for 24 | the purpose of contributing to a commons of creative, cultural and 25 | scientific works ("Commons") that the public can reliably and without fear 26 | of later claims of infringement build upon, modify, incorporate in other 27 | works, reuse and redistribute as freely as possible in any form whatsoever 28 | and for any purposes, including without limitation commercial purposes. 
29 | These owners may contribute to the Commons to promote the ideal of a free 30 | culture and the further production of creative, cultural and scientific 31 | works, or to gain reputation or greater distribution for their Work in 32 | part through the use and efforts of others. 33 | 34 | For these and/or other purposes and motivations, and without any 35 | expectation of additional consideration or compensation, the person 36 | associating CC0 with a Work (the "Affirmer"), to the extent that he or she 37 | is an owner of Copyright and Related Rights in the Work, voluntarily 38 | elects to apply CC0 to the Work and publicly distribute the Work under its 39 | terms, with knowledge of his or her Copyright and Related Rights in the 40 | Work and the meaning and intended legal effect of CC0 on those rights. 41 | 42 | 1. Copyright and Related Rights. A Work made available under CC0 may be 43 | protected by copyright and related or neighboring rights ("Copyright and 44 | Related Rights"). Copyright and Related Rights include, but are not 45 | limited to, the following: 46 | 47 | i. the right to reproduce, adapt, distribute, perform, display, 48 | communicate, and translate a Work; 49 | ii. moral rights retained by the original author(s) and/or performer(s); 50 | iii. publicity and privacy rights pertaining to a person's image or 51 | likeness depicted in a Work; 52 | iv. rights protecting against unfair competition in regards to a Work, 53 | subject to the limitations in paragraph 4(a), below; 54 | v. rights protecting the extraction, dissemination, use and reuse of data 55 | in a Work; 56 | vi. database rights (such as those arising under Directive 96/9/EC of the 57 | European Parliament and of the Council of 11 March 1996 on the legal 58 | protection of databases, and under any national implementation 59 | thereof, including any amended or successor version of such 60 | directive); and 61 | vii. other similar, equivalent or corresponding rights throughout the 62 | world based on applicable law or treaty, and any national 63 | implementations thereof. 64 | 65 | 2. Waiver. To the greatest extent permitted by, but not in contravention 66 | of, applicable law, Affirmer hereby overtly, fully, permanently, 67 | irrevocably and unconditionally waives, abandons, and surrenders all of 68 | Affirmer's Copyright and Related Rights and associated claims and causes 69 | of action, whether now known or unknown (including existing as well as 70 | future claims and causes of action), in the Work (i) in all territories 71 | worldwide, (ii) for the maximum duration provided by applicable law or 72 | treaty (including future time extensions), (iii) in any current or future 73 | medium and for any number of copies, and (iv) for any purpose whatsoever, 74 | including without limitation commercial, advertising or promotional 75 | purposes (the "Waiver"). Affirmer makes the Waiver for the benefit of each 76 | member of the public at large and to the detriment of Affirmer's heirs and 77 | successors, fully intending that such Waiver shall not be subject to 78 | revocation, rescission, cancellation, termination, or any other legal or 79 | equitable action to disrupt the quiet enjoyment of the Work by the public 80 | as contemplated by Affirmer's express Statement of Purpose. 81 | 82 | 3. Public License Fallback. 
Should any part of the Waiver for any reason 83 | be judged legally invalid or ineffective under applicable law, then the 84 | Waiver shall be preserved to the maximum extent permitted taking into 85 | account Affirmer's express Statement of Purpose. In addition, to the 86 | extent the Waiver is so judged Affirmer hereby grants to each affected 87 | person a royalty-free, non transferable, non sublicensable, non exclusive, 88 | irrevocable and unconditional license to exercise Affirmer's Copyright and 89 | Related Rights in the Work (i) in all territories worldwide, (ii) for the 90 | maximum duration provided by applicable law or treaty (including future 91 | time extensions), (iii) in any current or future medium and for any number 92 | of copies, and (iv) for any purpose whatsoever, including without 93 | limitation commercial, advertising or promotional purposes (the 94 | "License"). The License shall be deemed effective as of the date CC0 was 95 | applied by Affirmer to the Work. Should any part of the License for any 96 | reason be judged legally invalid or ineffective under applicable law, such 97 | partial invalidity or ineffectiveness shall not invalidate the remainder 98 | of the License, and in such case Affirmer hereby affirms that he or she 99 | will not (i) exercise any of his or her remaining Copyright and Related 100 | Rights in the Work or (ii) assert any associated claims and causes of 101 | action with respect to the Work, in either case contrary to Affirmer's 102 | express Statement of Purpose. 103 | 104 | 4. Limitations and Disclaimers. 105 | 106 | a. No trademark or patent rights held by Affirmer are waived, abandoned, 107 | surrendered, licensed or otherwise affected by this document. 108 | b. Affirmer offers the Work as-is and makes no representations or 109 | warranties of any kind concerning the Work, express, implied, 110 | statutory or otherwise, including without limitation warranties of 111 | title, merchantability, fitness for a particular purpose, non 112 | infringement, or the absence of latent or other defects, accuracy, or 113 | the present or absence of errors, whether or not discoverable, all to 114 | the greatest extent permissible under applicable law. 115 | c. Affirmer disclaims responsibility for clearing rights of other persons 116 | that may apply to the Work or any use thereof, including without 117 | limitation any person's Copyright and Related Rights in the Work. 118 | Further, Affirmer disclaims responsibility for obtaining any necessary 119 | consents, permissions or other rights required for any use of the 120 | Work. 121 | d. Affirmer understands and acknowledges that Creative Commons is not a 122 | party to this document and has no duty or obligation with respect to 123 | this CC0 or use of the Work. 124 | 125 | -------------------------------------------------------------------------------- /Papers/Applications/README.md: -------------------------------------------------------------------------------- 1 | # Applications 2 | [Back to awesome edge machine learning](https://github.com/bisonai/awesome-edge-machine-learning) 3 | 4 | [Back to Papers](https://github.com/bisonai/awesome-edge-machine-learning/tree/master/Papers) 5 | 6 | There is a countless number of possible edge machine learning applications. Here, we collect papers that describe specific solutions. 
7 | 8 | 9 | ## [ALBERT: A Lite BERT for Self-supervised Learning of Language Representations](https://arxiv.org/abs/1909.11942), 2019/09 10 | Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut 11 | 12 | Increasing model size when pretraining natural language representations often results in improved performance on downstream tasks. However, at some point further model increases become harder due to GPU/TPU memory limitations, longer training times, and unexpected model degradation. To address these problems, we present two parameter-reduction techniques to lower memory consumption and increase the training speed of BERT. Comprehensive empirical evidence shows that our proposed methods lead to models that scale much better compared to the original BERT. We also use a self-supervised loss that focuses on modeling inter-sentence coherence, and show it consistently helps downstream tasks with multi-sentence inputs. As a result, our best model establishes new state-of-the-art results on the GLUE, RACE, and SQuAD benchmarks while having fewer parameters compared to BERT-large.The code and the pretrained models are available at this [URL](https://github.com/google-research/google-research/tree/master/albert). 13 | 14 | 15 | ## [TinyBERT: Distilling BERT for Natural Language Understanding](https://arxiv.org/abs/1909.10351), 2019/09 16 | Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, Fang Wang, Qun Liu 17 | 18 | Language model pre-training, such as BERT, has significantly improved the performances of many natural language processing tasks. However, pre-trained language models are usually computationally expensive and memory intensive, so it is difficult to effectively execute them on some resource-restricted devices. To accelerate inference and reduce model size while maintaining accuracy, we firstly propose a novel transformer distillation method that is a specially designed knowledge distillation (KD) method for transformer-based models. By leveraging this new KD method, the plenty of knowledge encoded in a large teacher BERT can be well transferred to a small student TinyBERT. Moreover, we introduce a new two-stage learning framework for TinyBERT, which performs transformer distillation at both the pre-training and task-specific learning stages. This framework ensures that TinyBERT can capture both the general-domain and task-specific knowledge of the teacher BERT.TinyBERT is empirically effective and achieves more than 96% the performance of teacher BERTBASE on GLUE benchmark while being 7.5x smaller and 9.4x faster on inference. TinyBERT is also significantly better than state-of-the-art baselines on BERT distillation, with only about 28% parameters and about 31% inference time of them. 19 | 20 | 21 | ## [Temporal Convolution for Real-time Keyword Spotting on Mobile Devices](https://arxiv.org/abs/1904.03814), 2019/04 22 | Seungwoo Choi, Seokjun Seo, Beomjun Shin, Hyeongmin Byun, Martin Kersner, Beomsu Kim, Dongyoung Kim, Sungjoo Ha 23 | 24 | Keyword spotting (KWS) plays a critical role in enabling speech-based user interactions on smart devices. Recent developments in the field of deep learning have led to wide adoption of convolutional neural networks (CNNs) in KWS systems due to their exceptional accuracy and robustness. The main challenge faced by KWS systems is the trade-off between high accuracy and low latency. Unfortunately, there has been little quantitative analysis of the actual latency of KWS models on mobile devices. 
This is especially concerning since conventional convolution-based KWS approaches are known to require a large number of operations to attain an adequate level of performance. In this paper, we propose a temporal convolution for real-time KWS on mobile devices. Unlike most of the 2D convolution-based KWS approaches that require a deep architecture to fully capture both low- and high-frequency domains, we exploit temporal convolutions with a compact ResNet architecture. In Google Speech Command Dataset, we achieve more than 385x speedup on Google Pixel 1 and surpass the accuracy compared to the state-of-the-art model. In addition, we release the implementation of the proposed and the baseline models including an end-to-end pipeline for training models and evaluating them on mobile devices. 25 | 26 | 27 | ## [Towards Real-Time Automatic Portrait Matting on Mobile Devices](https://arxiv.org/abs/1904.03816), 2019/04 28 | Seokjun Seo, Seungwoo Choi, Martin Kersner, Beomjun Shin, Hyungsuk Yoon, Hyeongmin Byun, Sungjoo Ha 29 | 30 | We tackle the problem of automatic portrait matting on mobile devices. The proposed model is aimed at attaining real-time inference on mobile devices with minimal degradation of model performance. Our model MMNet, based on multi-branch dilated convolution with linear bottleneck blocks, outperforms the state-of-the-art model and is orders of magnitude faster. The model can be accelerated four times to attain 30 FPS on Xiaomi Mi 5 device with moderate increase in the gradient error. Under the same conditions, our model has an order of magnitude less number of parameters and is faster than Mobile DeepLabv3 while maintaining comparable performance. The accompanied implementation can be found at this [URL](https://github.com/hyperconnect/MMNet). 31 | 32 | ## [ThunderNet: Towards Real-time Generic Object Detection](https://arxiv.org/abs/1903.11752), 2019/03 33 | Zheng Qin, Zeming Li, Zhaoning Zhang, Yiping Bao, Gang Yu, Yuxing Peng, Jian Sun 34 | 35 | Real-time generic object detection on mobile platforms is a crucial but challenging computer vision task. However, previous CNN-based detectors suffer from enormous computational cost, which hinders them from real-time inference in computation-constrained scenarios. In this paper, we investigate the effectiveness of two-stage detectors in real-time generic detection and propose a lightweight two-stage detector named ThunderNet. In the backbone part, we analyze the drawbacks in previous lightweight backbones and present a lightweight backbone designed for object detection. In the detection part, we exploit an extremely efficient RPN and detection head design. To generate more discriminative feature representation, we design two efficient architecture blocks, Context Enhancement Module and Spatial Attention Module. At last, we investigate the balance between the input resolution, the backbone, and the detection head. Compared with lightweight one-stage detectors, ThunderNet achieves superior performance with only 40% of the computational cost on PASCAL VOC and COCO benchmarks. Without bells and whistles, our model runs at 24.1 fps on an ARM-based device. To the best of our knowledge, this is the first real-time detector reported on ARM platforms. Code will be released for paper reproduction. 
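As a concrete illustration of the temporal-convolution approach from the keyword-spotting paper earlier in this section: the sketch below is a simplification (not the released TC-ResNet code; it has no residual blocks and every layer size is assumed for illustration) that treats the MFCC frequency bins as input channels and convolves along the time axis only, instead of applying 2D convolutions over the time-frequency plane.

```python
# Minimal sketch of 1D temporal convolution for keyword spotting.
# Not the released TC-ResNet model; all sizes are illustrative.
import torch
import torch.nn as nn

class TemporalConvKWS(nn.Module):
    def __init__(self, n_mfcc=40, num_keywords=12):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(n_mfcc, 64, kernel_size=3, padding=1),   # frequency bins act as channels
            nn.BatchNorm1d(64), nn.ReLU(),
            nn.Conv1d(64, 64, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm1d(64), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),                            # global average pooling over time
        )
        self.fc = nn.Linear(64, num_keywords)

    def forward(self, x):
        # x: (batch, n_mfcc, time_frames)
        return self.fc(self.features(x).squeeze(-1))

logits = TemporalConvKWS()(torch.randn(4, 40, 101))             # -> shape (4, 12)
```

Keeping the convolution one-dimensional keeps the operation count low enough for real-time inference on a phone-class CPU, which is the trade-off these papers target.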
36 | 37 | 38 | ## [PFLD: A Practical Facial Landmark Detector](https://arxiv.org/abs/1902.10859), 2019/02 39 | Xiaojie Guo, Siyuan Li, Jinke Yu, Jiawan Zhang, Jiayi Ma, Lin Ma, Wei Liu, Haibin Ling 40 | 41 | Being accurate, efficient, and compact is essential to a facial landmark detector for practical use. To simultaneously consider the three concerns, this paper investigates a neat model with promising detection accuracy under wild environments e.g., unconstrained pose, expression, lighting, and occlusion conditions) and super real-time speed on a mobile device. More concretely, we customize an end-to-end single stage network associated with acceleration techniques. During the training phase, for each sample, rotation information is estimated for geometrically regularizing landmark localization, which is then NOT involved in the testing phase. A novel loss is designed to, besides considering the geometrical regularization, mitigate the issue of data imbalance by adjusting weights of samples to different states, such as large pose, extreme lighting, and occlusion, in the training set. Extensive experiments are conducted to demonstrate the efficacy of our design and reveal its superior performance over state-of-the-art alternatives on widely-adopted challenging benchmarks, i.e., 300W (including iBUG, LFPW, AFW, HELEN, and XM2VTS) and AFLW. Our model can be merely 2.1Mb of size and reach over 140 fps per face on a mobile phone (Qualcomm ARM 845 processor) with high precision, making it attractive for large-scale or real-time applications. We have made our practical system based on PFLD 0.25X model publicly available at this [URL](http://sites.google.com/view/xjguo/fld) for encouraging comparisons and improvements from the community. 42 | 43 | 44 | ## [TSM: Temporal Shift Module for Efficient Video Understanding](https://arxiv.org/abs/1811.08383), 2018/11 45 | Ji Lin, Chuang Gan, Song Han 46 | 47 | The explosive growth in video streaming gives rise to challenges on performing video understanding at high accuracy and low computation cost. Conventional 2D CNNs are computationally cheap but cannot capture temporal relationships; 3D CNN based methods can achieve good performance but are computationally intensive, making it expensive to deploy. In this paper, we propose a generic and effective Temporal Shift Module (TSM) that enjoys both high efficiency and high performance. Specifically, it can achieve the performance of 3D CNN but maintain 2D CNN's complexity. TSM shifts part of the channels along the temporal dimension; thus facilitate information exchanged among neighboring frames. It can be inserted into 2D CNNs to achieve temporal modeling at zero computation and zero parameters. We also extended TSM to online setting, which enables real-time low-latency online video recognition and video object detection. TSM is accurate and efficient: it ranks the first place on the Something-Something leaderboard upon publication; on Jetson Nano and Galaxy Note8, it achieves a low latency of 13ms and 35ms for online video recognition. The code is available at: this https URL. 48 | 49 | 50 | -------------------------------------------------------------------------------- /Papers/Applications/data.yaml: -------------------------------------------------------------------------------- 1 | # APPLICATIONS 2 | # 3 | # `date` format is as following yyy/mm/dd. For example 6 May 2019 would be 2019/05/06. 4 | # In case of arxiv, use the date of the first version of paper. 
5 | # 6 | # [Template] 7 | # 8 | # - 9 | # name: 10 | # url: 11 | # date: 12 | # conference: 13 | # code: 14 | # authors: 15 | # abstract: 16 | --- 17 | 18 | - 19 | name: > 20 | Temporal Convolution for Real-time Keyword Spotting on Mobile Devices 21 | url: https://arxiv.org/abs/1904.03814 22 | date: 2019/04/08 23 | conference: INTERSPEECH 2019 24 | code: https://github.com/hyperconnect/TC-ResNet 25 | authors: Seungwoo Choi, Seokjun Seo, Beomjun Shin, Hyeongmin Byun, Martin Kersner, Beomsu Kim, Dongyoung Kim, Sungjoo Ha 26 | abstract: > 27 | Keyword spotting (KWS) plays a critical role in enabling speech-based user interactions on smart devices. Recent developments in the field of deep learning have led to wide adoption of convolutional neural networks (CNNs) in KWS systems due to their exceptional accuracy and robustness. The main challenge faced by KWS systems is the trade-off between high accuracy and low latency. Unfortunately, there has been little quantitative analysis of the actual latency of KWS models on mobile devices. This is especially concerning since conventional convolution-based KWS approaches are known to require a large number of operations to attain an adequate level of performance. In this paper, we propose a temporal convolution for real-time KWS on mobile devices. Unlike most of the 2D convolution-based KWS approaches that require a deep architecture to fully capture both low- and high-frequency domains, we exploit temporal convolutions with a compact ResNet architecture. In Google Speech Command Dataset, we achieve more than 385x speedup on Google Pixel 1 and surpass the accuracy compared to the state-of-the-art model. In addition, we release the implementation of the proposed and the baseline models including an end-to-end pipeline for training models and evaluating them on mobile devices. 28 | 29 | - 30 | name: > 31 | Towards Real-Time Automatic Portrait Matting on Mobile Devices 32 | url: https://arxiv.org/abs/1904.03816 33 | date: 2019/04/08 34 | conference: 35 | code: https://github.com/hyperconnect/MMNet 36 | authors: Seokjun Seo, Seungwoo Choi, Martin Kersner, Beomjun Shin, Hyungsuk Yoon, Hyeongmin Byun, Sungjoo Ha 37 | abstract: >- 38 | We tackle the problem of automatic portrait matting on mobile devices. The proposed model is aimed at attaining real-time inference on mobile devices with minimal degradation of model performance. Our model MMNet, based on multi-branch dilated convolution with linear bottleneck blocks, outperforms the state-of-the-art model and is orders of magnitude faster. The model can be accelerated four times to attain 30 FPS on Xiaomi Mi 5 device with moderate increase in the gradient error. Under the same conditions, our model has an order of magnitude less number of parameters and is faster than Mobile DeepLabv3 while maintaining comparable performance. The accompanied implementation can be found at this [URL](https://github.com/hyperconnect/MMNet). 39 | 40 | - 41 | name: > 42 | PFLD: A Practical Facial Landmark Detector 43 | url: https://arxiv.org/abs/1902.10859 44 | date: 2019/02/28 45 | conference: 46 | code: https://sites.google.com/view/xjguo/fld 47 | authors: Xiaojie Guo, Siyuan Li, Jinke Yu, Jiawan Zhang, Jiayi Ma, Lin Ma, Wei Liu, Haibin Ling 48 | abstract: > 49 | Being accurate, efficient, and compact is essential to a facial landmark detector for practical use. 
To simultaneously consider the three concerns, this paper investigates a neat model with promising detection accuracy under wild environments e.g., unconstrained pose, expression, lighting, and occlusion conditions) and super real-time speed on a mobile device. More concretely, we customize an end-to-end single stage network associated with acceleration techniques. During the training phase, for each sample, rotation information is estimated for geometrically regularizing landmark localization, which is then NOT involved in the testing phase. A novel loss is designed to, besides considering the geometrical regularization, mitigate the issue of data imbalance by adjusting weights of samples to different states, such as large pose, extreme lighting, and occlusion, in the training set. Extensive experiments are conducted to demonstrate the efficacy of our design and reveal its superior performance over state-of-the-art alternatives on widely-adopted challenging benchmarks, i.e., 300W (including iBUG, LFPW, AFW, HELEN, and XM2VTS) and AFLW. Our model can be merely 2.1Mb of size and reach over 140 fps per face on a mobile phone (Qualcomm ARM 845 processor) with high precision, making it attractive for large-scale or real-time applications. We have made our practical system based on PFLD 0.25X model publicly available at this [URL](http://sites.google.com/view/xjguo/fld) for encouraging comparisons and improvements from the community. 50 | 51 | - 52 | name: > 53 | ThunderNet: Towards Real-time Generic Object Detection 54 | url: https://arxiv.org/abs/1903.11752 55 | date: 2019/03/28 56 | conference: ICCV 2019 57 | code: 58 | authors: Zheng Qin, Zeming Li, Zhaoning Zhang, Yiping Bao, Gang Yu, Yuxing Peng, Jian Sun 59 | abstract: > 60 | Real-time generic object detection on mobile platforms is a crucial but challenging computer vision task. However, previous CNN-based detectors suffer from enormous computational cost, which hinders them from real-time inference in computation-constrained scenarios. In this paper, we investigate the effectiveness of two-stage detectors in real-time generic detection and propose a lightweight two-stage detector named ThunderNet. In the backbone part, we analyze the drawbacks in previous lightweight backbones and present a lightweight backbone designed for object detection. In the detection part, we exploit an extremely efficient RPN and detection head design. To generate more discriminative feature representation, we design two efficient architecture blocks, Context Enhancement Module and Spatial Attention Module. At last, we investigate the balance between the input resolution, the backbone, and the detection head. Compared with lightweight one-stage detectors, ThunderNet achieves superior performance with only 40% of the computational cost on PASCAL VOC and COCO benchmarks. Without bells and whistles, our model runs at 24.1 fps on an ARM-based device. To the best of our knowledge, this is the first real-time detector reported on ARM platforms. Code will be released for paper reproduction. 
61 | 62 | - 63 | name: > 64 | TinyBERT: Distilling BERT for Natural Language Understanding 65 | url: https://arxiv.org/abs/1909.10351 66 | date: 2019/09/23 67 | conference: 68 | code: https://github.com/huawei-noah/Pretrained-Language-Model/tree/master/TinyBERT 69 | authors: Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, Fang Wang, Qun Liu 70 | abstract: > 71 | Language model pre-training, such as BERT, has significantly improved the performances of many natural language processing tasks. However, pre-trained language models are usually computationally expensive and memory intensive, so it is difficult to effectively execute them on some resource-restricted devices. To accelerate inference and reduce model size while maintaining accuracy, we firstly propose a novel transformer distillation method that is a specially designed knowledge distillation (KD) method for transformer-based models. By leveraging this new KD method, the plenty of knowledge encoded in a large teacher BERT can be well transferred to a small student TinyBERT. Moreover, we introduce a new two-stage learning framework for TinyBERT, which performs transformer distillation at both the pre-training and task-specific learning stages. This framework ensures that TinyBERT can capture both the general-domain and task-specific knowledge of the teacher BERT.TinyBERT is empirically effective and achieves more than 96% the performance of teacher BERTBASE on GLUE benchmark while being 7.5x smaller and 9.4x faster on inference. TinyBERT is also significantly better than state-of-the-art baselines on BERT distillation, with only about 28% parameters and about 31% inference time of them. 72 | 73 | - 74 | name: > 75 | ALBERT: A Lite BERT for Self-supervised Learning of Language Representations 76 | url: https://arxiv.org/abs/1909.11942 77 | date: 2019/09/26 78 | conference: ICLR 2020 79 | code: https://github.com/google-research/google-research/tree/master/albert 80 | authors: Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut 81 | abstract: > 82 | Increasing model size when pretraining natural language representations often results in improved performance on downstream tasks. However, at some point further model increases become harder due to GPU/TPU memory limitations, longer training times, and unexpected model degradation. To address these problems, we present two parameter-reduction techniques to lower memory consumption and increase the training speed of BERT. Comprehensive empirical evidence shows that our proposed methods lead to models that scale much better compared to the original BERT. We also use a self-supervised loss that focuses on modeling inter-sentence coherence, and show it consistently helps downstream tasks with multi-sentence inputs. As a result, our best model establishes new state-of-the-art results on the GLUE, RACE, and SQuAD benchmarks while having fewer parameters compared to BERT-large.The code and the pretrained models are available at this [URL](https://github.com/google-research/google-research/tree/master/albert). 
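The TinyBERT entry above rests on knowledge distillation. The snippet below is a generic soft-label distillation loss, shown only to make that idea concrete; it is not TinyBERT's layer-wise transformer distillation, and the temperature, weighting, and tensor shapes are arbitrary.

```python
# Generic knowledge-distillation loss (soft targets from a teacher + hard labels).
# Illustration only; TinyBERT additionally distills attention maps and hidden states.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, temperature=2.0, alpha=0.5):
    # Soft-target term: match the teacher's softened output distribution.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    # Hard-label term: ordinary cross-entropy on the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

student_logits = torch.randn(8, 2, requires_grad=True)   # e.g. a binary GLUE-style task
teacher_logits = torch.randn(8, 2)
labels = torch.randint(0, 2, (8,))
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()
```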
83 | 84 | - 85 | name: > 86 | TSM: Temporal Shift Module for Efficient Video Understanding 87 | url: https://arxiv.org/abs/1811.08383 88 | date: 2018/11/20 89 | conference: ICCV 2019 90 | code: https://github.com/mit-han-lab/temporal-shift-module 91 | authors: Ji Lin, Chuang Gan, Song Han 92 | abstract: > 93 | The explosive growth in video streaming gives rise to challenges on performing video understanding at high accuracy and low computation cost. Conventional 2D CNNs are computationally cheap but cannot capture temporal relationships; 3D CNN based methods can achieve good performance but are computationally intensive, making it expensive to deploy. In this paper, we propose a generic and effective Temporal Shift Module (TSM) that enjoys both high efficiency and high performance. Specifically, it can achieve the performance of 3D CNN but maintain 2D CNN's complexity. TSM shifts part of the channels along the temporal dimension; thus facilitate information exchanged among neighboring frames. It can be inserted into 2D CNNs to achieve temporal modeling at zero computation and zero parameters. We also extended TSM to online setting, which enables real-time low-latency online video recognition and video object detection. TSM is accurate and efficient: it ranks the first place on the Something-Something leaderboard upon publication; on Jetson Nano and Galaxy Note8, it achieves a low latency of 13ms and 35ms for online video recognition. The code is available at: this https URL. 94 | 95 | ... 96 | -------------------------------------------------------------------------------- /Papers/AutoML/README.md: -------------------------------------------------------------------------------- 1 | # AutoML 2 | [Back to awesome edge machine learning](https://github.com/bisonai/awesome-edge-machine-learning) 3 | 4 | [Back to Papers](https://github.com/bisonai/awesome-edge-machine-learning/tree/master/Papers) 5 | 6 | Automated machine learning (AutoML) is the process of automating the end-to-end process of applying machine learning to real-world problems.Wikipedia AutoML is for example used to design new efficient neural architectures with a constraint on a computational budget (defined either as a number of FLOPS or as an inference time measured on real device) or a size of the architecture. 7 | 8 | 9 | ## [ChamNet: Towards Efficient Network Design through Platform-Aware Model Adaptation](https://arxiv.org/abs/1812.08934), 2018/12 10 | Xiaoliang Dai, Peizhao Zhang, Bichen Wu, Hongxu Yin, Fei Sun, Yanghan Wang, Marat Dukhan, Yunqing Hu, Yiming Wu, Yangqing Jia, Peter Vajda, Matt Uyttendaele, Niraj K. Jha 11 | 12 | This paper proposes an efficient neural network (NN) architecture design methodology called Chameleon that honors given resource constraints. Instead of developing new building blocks or using computationally-intensive reinforcement learning algorithms, our approach leverages existing efficient network building blocks and focuses on exploiting hardware traits and adapting computation resources to fit target latency and/or energy constraints. We formulate platform-aware NN architecture search in an optimization framework and propose a novel algorithm to search for optimal architectures aided by efficient accuracy and resource (latency and/or energy) predictors. At the core of our algorithm lies an accuracy predictor built atop Gaussian Process with Bayesian optimization for iterative sampling. 
With a one-time building cost for the predictors, our algorithm produces state-of-the-art model architectures on different platforms under given constraints in just minutes. Our results show that adapting computation resources to building blocks is critical to model performance. Without the addition of any bells and whistles, our models achieve significant accuracy improvements against state-of-the-art hand-crafted and automatically designed architectures. We achieve 73.8% and 75.3% top-1 accuracy on ImageNet at 20ms latency on a mobile CPU and DSP. At reduced latency, our models achieve up to 8.5% (4.8%) and 6.6% (9.3%) absolute top-1 accuracy improvements compared to MobileNetV2 and MnasNet, respectively, on a mobile CPU (DSP), and 2.7% (4.6%) and 5.6% (2.6%) accuracy gains over ResNet-101 and ResNet-152, respectively, on an Nvidia GPU (Intel CPU). 13 | 14 | 15 | ## [FBNet: Hardware-Aware Efficient ConvNet Design via Differentiable Neural Architecture Search](https://arxiv.org/abs/1812.03443), 2018/12 16 | Bichen Wu, Xiaoliang Dai, Peizhao Zhang, Yanghan Wang, Fei Sun, Yiming Wu, Yuandong Tian, Peter Vajda, Yangqing Jia, Kurt Keutzer 17 | 18 | Designing accurate and efficient ConvNets for mobile devices is challenging because the design space is combinatorially large. Due to this, previous neural architecture search (NAS) methods are computationally expensive. ConvNet architecture optimality depends on factors such as input resolution and target devices. However, existing approaches are too expensive for case-by-case redesigns. Also, previous work focuses primarily on reducing FLOPs, but FLOP count does not always reflect actual latency. To address these, we propose a differentiable neural architecture search (DNAS) framework that uses gradient-based methods to optimize ConvNet architectures, avoiding enumerating and training individual architectures separately as in previous methods. FBNets, a family of models discovered by DNAS surpass state-of-the-art models both designed manually and generated automatically. FBNet-B achieves 74.1% top-1 accuracy on ImageNet with 295M FLOPs and 23.1 ms latency on a Samsung S8 phone, 2.4x smaller and 1.5x faster than MobileNetV2-1.3 with similar accuracy. Despite higher accuracy and lower latency than MnasNet, we estimate FBNet-B's search cost is 420x smaller than MnasNet's, at only 216 GPU-hours. Searched for different resolutions and channel sizes, FBNets achieve 1.5% to 6.4% higher accuracy than MobileNetV2. The smallest FBNet achieves 50.2% accuracy and 2.9 ms latency (345 frames per second) on a Samsung S8. Over a Samsung-optimized FBNet, the iPhone-X-optimized model achieves a 1.4x speedup on an iPhone X. 19 | 20 | 21 | ## [ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware](https://arxiv.org/abs/1812.00332), 2018/12 22 | Han Cai, Ligeng Zhu, Song Han 23 | 24 | Neural architecture search (NAS) has a great impact by automatically designing effective neural network architectures. However, the prohibitive computational demand of conventional NAS algorithms (e.g. 104 GPU hours) makes it difficult to \emph{directly} search the architectures on large-scale tasks (e.g. ImageNet). Differentiable NAS can reduce the cost of GPU hours via a continuous representation of network architecture but suffers from the high GPU memory consumption issue (grow linearly w.r.t. candidate set size). 
As a result, they need to utilize~\emph{proxy} tasks, such as training on a smaller dataset, or learning with only a few blocks, or training just for a few epochs. These architectures optimized on proxy tasks are not guaranteed to be optimal on the target task. In this paper, we present \emph{ProxylessNAS} that can \emph{directly} learn the architectures for large-scale target tasks and target hardware platforms. We address the high memory consumption issue of differentiable NAS and reduce the computational cost (GPU hours and GPU memory) to the same level of regular training while still allowing a large candidate set. Experiments on CIFAR-10 and ImageNet demonstrate the effectiveness of directness and specialization. On CIFAR-10, our model achieves 2.08\% test error with only 5.7M parameters, better than the previous state-of-the-art architecture AmoebaNet-B, while using 6× fewer parameters. On ImageNet, our model achieves 3.1\% better top-1 accuracy than MobileNetV2, while being 1.2× faster with measured GPU latency. We also apply ProxylessNAS to specialize neural architectures for hardware with direct hardware metrics (e.g. latency) and provide insights for efficient CNN architecture design. 25 | 26 | 27 | ## [MnasNet: Platform-Aware Neural Architecture Search for Mobile](https://arxiv.org/abs/1807.11626), 2018/07 28 | Mingxing Tan, Bo Chen, Ruoming Pang, Vijay Vasudevan, Mark Sandler, Andrew Howard, Quoc V. Le 29 | 30 | Designing convolutional neural networks (CNN) for mobile devices is challenging because mobile models need to be small and fast, yet still accurate. Although significant efforts have been dedicated to design and improve mobile CNNs on all dimensions, it is very difficult to manually balance these trade-offs when there are so many architectural possibilities to consider. In this paper, we propose an automated mobile neural architecture search (MNAS) approach, which explicitly incorporate model latency into the main objective so that the search can identify a model that achieves a good trade-off between accuracy and latency. Unlike previous work, where latency is considered via another, often inaccurate proxy (e.g., FLOPS), our approach directly measures real-world inference latency by executing the model on mobile phones. To further strike the right balance between flexibility and search space size, we propose a novel factorized hierarchical search space that encourages layer diversity throughout the network. Experimental results show that our approach consistently outperforms state-of-the-art mobile CNN models across multiple vision tasks. On the ImageNet classification task, our MnasNet achieves 75.2% top-1 accuracy with 78ms latency on a Pixel phone, which is 1.8x faster than MobileNetV2 [29] with 0.5% higher accuracy and 2.3x faster than NASNet [36] with 1.2% higher accuracy. Our MnasNet also achieves better mAP quality than MobileNets for COCO object detection. Code is at this [URL](https://github.com/tensorflow/tpu/tree/master/models/official/mnasnet) 31 | 32 | 33 | ## [NetAdapt: Platform-Aware Neural Network Adaptation for Mobile Applications](https://arxiv.org/abs/1804.03230), 2018/04 34 | Tien-Ju Yang, Andrew Howard, Bo Chen, Xiao Zhang, Alec Go, Mark Sandler, Vivienne Sze, Hartwig Adam 35 | 36 | This work proposes an algorithm, called NetAdapt, that automatically adapts a pre-trained deep neural network to a mobile platform given a resource budget. 
While many existing algorithms simplify networks based on the number of MACs or weights, optimizing those indirect metrics may not necessarily reduce the direct metrics, such as latency and energy consumption. To solve this problem, NetAdapt incorporates direct metrics into its adaptation algorithm. These direct metrics are evaluated using empirical measurements, so that detailed knowledge of the platform and toolchain is not required. NetAdapt automatically and progressively simplifies a pre-trained network until the resource budget is met while maximizing the accuracy. Experiment results show that NetAdapt achieves better accuracy versus latency trade-offs on both mobile CPU and mobile GPU, compared with the state-of-the-art automated network simplification algorithms. For image classification on the ImageNet dataset, NetAdapt achieves up to a 1.7× speedup in measured inference latency with equal or higher accuracy on MobileNets (V1&V2). 37 | 38 | 39 | ## [AMC: AutoML for Model Compression and Acceleration on Mobile Devices](https://arxiv.org/abs/1802.03494), 2018/02 40 | Yihui He, Ji Lin, Zhijian Liu, Hanrui Wang, Li-Jia Li, Song Han 41 | 42 | Model compression is a critical technique to efficiently deploy neural network models on mobile devices which have limited computation resources and tight power budgets. Conventional model compression techniques rely on hand-crafted heuristics and rule-based policies that require domain experts to explore the large design space trading off among model size, speed, and accuracy, which is usually sub-optimal and time-consuming. In this paper, we propose AutoML for Model Compression (AMC) which leverage reinforcement learning to provide the model compression policy. This learning-based compression policy outperforms conventional rule-based compression policy by having higher compression ratio, better preserving the accuracy and freeing human labor. Under 4x FLOPs reduction, we achieved 2.7% better accuracy than the handcrafted model compression policy for VGG-16 on ImageNet. We applied this automated, push-the-button compression pipeline to MobileNet and achieved 1.81x speedup of measured inference latency on an Android phone and 1.43x speedup on the Titan XP GPU, with only 0.1% loss of ImageNet Top-1 accuracy. 43 | 44 | 45 | ## [MorphNet: Fast & Simple Resource-Constrained Structure Learning of Deep Networks](https://arxiv.org/abs/1711.06798), 2017/11 46 | Ariel Gordon, Elad Eban, Ofir Nachum, Bo Chen, Hao Wu, Tien-Ju Yang, Edward Choi 47 | 48 | We present MorphNet, an approach to automate the design of neural network structures. MorphNet iteratively shrinks and expands a network, shrinking via a resource-weighted sparsifying regularizer on activations and expanding via a uniform multiplicative factor on all layers. In contrast to previous approaches, our method is scalable to large networks, adaptable to specific resource constraints (e.g. the number of floating-point operations per inference), and capable of increasing the network's performance. When applied to standard network architectures on a wide variety of datasets, our approach discovers novel structures in each domain, obtaining higher performance while respecting the resource constraint. 49 | 50 | 51 | -------------------------------------------------------------------------------- /Papers/AutoML/data.yaml: -------------------------------------------------------------------------------- 1 | # AUTOML 2 | # 3 | # `date` format is as following yyy/mm/dd. 
For example 6 May 2019 would be 2019/05/06. 4 | # In case of arxiv, use the date of the first version of paper. 5 | # 6 | # [Template] 7 | # 8 | # - 9 | # name: 10 | # url: 11 | # date: 12 | # conference: 13 | # code: 14 | # authors: 15 | # abstract: 16 | --- 17 | 18 | - 19 | name: > 20 | ChamNet: Towards Efficient Network Design through Platform-Aware Model Adaptation 21 | url: https://arxiv.org/abs/1812.08934 22 | date: 2018/12/21 23 | conference: CVPR 2019 24 | code: https://github.com/facebookresearch/mobile-vision 25 | authors: Xiaoliang Dai, Peizhao Zhang, Bichen Wu, Hongxu Yin, Fei Sun, Yanghan Wang, Marat Dukhan, Yunqing Hu, Yiming Wu, Yangqing Jia, Peter Vajda, Matt Uyttendaele, Niraj K. Jha 26 | abstract: > 27 | This paper proposes an efficient neural network (NN) architecture design methodology called Chameleon that honors given resource constraints. Instead of developing new building blocks or using computationally-intensive reinforcement learning algorithms, our approach leverages existing efficient network building blocks and focuses on exploiting hardware traits and adapting computation resources to fit target latency and/or energy constraints. We formulate platform-aware NN architecture search in an optimization framework and propose a novel algorithm to search for optimal architectures aided by efficient accuracy and resource (latency and/or energy) predictors. At the core of our algorithm lies an accuracy predictor built atop Gaussian Process with Bayesian optimization for iterative sampling. With a one-time building cost for the predictors, our algorithm produces state-of-the-art model architectures on different platforms under given constraints in just minutes. Our results show that adapting computation resources to building blocks is critical to model performance. Without the addition of any bells and whistles, our models achieve significant accuracy improvements against state-of-the-art hand-crafted and automatically designed architectures. We achieve 73.8% and 75.3% top-1 accuracy on ImageNet at 20ms latency on a mobile CPU and DSP. At reduced latency, our models achieve up to 8.5% (4.8%) and 6.6% (9.3%) absolute top-1 accuracy improvements compared to MobileNetV2 and MnasNet, respectively, on a mobile CPU (DSP), and 2.7% (4.6%) and 5.6% (2.6%) accuracy gains over ResNet-101 and ResNet-152, respectively, on an Nvidia GPU (Intel CPU). 28 | 29 | - 30 | name: > 31 | FBNet: Hardware-Aware Efficient ConvNet Design via Differentiable Neural Architecture Search 32 | url: https://arxiv.org/abs/1812.03443 33 | date: 2018/12/09 34 | conference: CVPR 2019 35 | code: https://github.com/facebookresearch/mobile-vision 36 | authors: Bichen Wu, Xiaoliang Dai, Peizhao Zhang, Yanghan Wang, Fei Sun, Yiming Wu, Yuandong Tian, Peter Vajda, Yangqing Jia, Kurt Keutzer 37 | abstract: > 38 | Designing accurate and efficient ConvNets for mobile devices is challenging because the design space is combinatorially large. Due to this, previous neural architecture search (NAS) methods are computationally expensive. ConvNet architecture optimality depends on factors such as input resolution and target devices. However, existing approaches are too expensive for case-by-case redesigns. Also, previous work focuses primarily on reducing FLOPs, but FLOP count does not always reflect actual latency. 
To address these, we propose a differentiable neural architecture search (DNAS) framework that uses gradient-based methods to optimize ConvNet architectures, avoiding enumerating and training individual architectures separately as in previous methods. FBNets, a family of models discovered by DNAS surpass state-of-the-art models both designed manually and generated automatically. FBNet-B achieves 74.1% top-1 accuracy on ImageNet with 295M FLOPs and 23.1 ms latency on a Samsung S8 phone, 2.4x smaller and 1.5x faster than MobileNetV2-1.3 with similar accuracy. Despite higher accuracy and lower latency than MnasNet, we estimate FBNet-B's search cost is 420x smaller than MnasNet's, at only 216 GPU-hours. Searched for different resolutions and channel sizes, FBNets achieve 1.5% to 6.4% higher accuracy than MobileNetV2. The smallest FBNet achieves 50.2% accuracy and 2.9 ms latency (345 frames per second) on a Samsung S8. Over a Samsung-optimized FBNet, the iPhone-X-optimized model achieves a 1.4x speedup on an iPhone X. 39 | 40 | - 41 | name: > 42 | AMC: AutoML for Model Compression and Acceleration on Mobile Devices 43 | url: https://arxiv.org/abs/1802.03494 44 | date: 2018/02/10 45 | conference: ECCV 2018 46 | code: https://github.com/mit-han-lab/amc-compressed-models 47 | authors: Yihui He, Ji Lin, Zhijian Liu, Hanrui Wang, Li-Jia Li, Song Han 48 | abstract: > 49 | Model compression is a critical technique to efficiently deploy neural network models on mobile devices which have limited computation resources and tight power budgets. Conventional model compression techniques rely on hand-crafted heuristics and rule-based policies that require domain experts to explore the large design space trading off among model size, speed, and accuracy, which is usually sub-optimal and time-consuming. In this paper, we propose AutoML for Model Compression (AMC) which leverage reinforcement learning to provide the model compression policy. This learning-based compression policy outperforms conventional rule-based compression policy by having higher compression ratio, better preserving the accuracy and freeing human labor. Under 4x FLOPs reduction, we achieved 2.7% better accuracy than the handcrafted model compression policy for VGG-16 on ImageNet. We applied this automated, push-the-button compression pipeline to MobileNet and achieved 1.81x speedup of measured inference latency on an Android phone and 1.43x speedup on the Titan XP GPU, with only 0.1% loss of ImageNet Top-1 accuracy. 50 | 51 | - 52 | name: > 53 | ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware 54 | url: https://arxiv.org/abs/1812.00332 55 | date: 2018/12/02 56 | conference: ICLR 2019 57 | code: https://github.com/mit-han-lab/ProxylessNAS 58 | authors: Han Cai, Ligeng Zhu, Song Han 59 | abstract: > 60 | Neural architecture search (NAS) has a great impact by automatically designing effective neural network architectures. However, the prohibitive computational demand of conventional NAS algorithms (e.g. 104 GPU hours) makes it difficult to \emph{directly} search the architectures on large-scale tasks (e.g. ImageNet). Differentiable NAS can reduce the cost of GPU hours via a continuous representation of network architecture but suffers from the high GPU memory consumption issue (grow linearly w.r.t. candidate set size). As a result, they need to utilize~\emph{proxy} tasks, such as training on a smaller dataset, or learning with only a few blocks, or training just for a few epochs. 
These architectures optimized on proxy tasks are not guaranteed to be optimal on the target task. In this paper, we present \emph{ProxylessNAS} that can \emph{directly} learn the architectures for large-scale target tasks and target hardware platforms. We address the high memory consumption issue of differentiable NAS and reduce the computational cost (GPU hours and GPU memory) to the same level of regular training while still allowing a large candidate set. Experiments on CIFAR-10 and ImageNet demonstrate the effectiveness of directness and specialization. On CIFAR-10, our model achieves 2.08\% test error with only 5.7M parameters, better than the previous state-of-the-art architecture AmoebaNet-B, while using 6× fewer parameters. On ImageNet, our model achieves 3.1\% better top-1 accuracy than MobileNetV2, while being 1.2× faster with measured GPU latency. We also apply ProxylessNAS to specialize neural architectures for hardware with direct hardware metrics (e.g. latency) and provide insights for efficient CNN architecture design. 61 | 62 | - 63 | name: > 64 | MnasNet: Platform-Aware Neural Architecture Search for Mobile 65 | url: https://arxiv.org/abs/1807.11626 66 | date: 2018/07/31 67 | conference: CVPR 2019 68 | code: https://github.com/tensorflow/tpu/tree/master/models/official/mnasnet 69 | authors: Mingxing Tan, Bo Chen, Ruoming Pang, Vijay Vasudevan, Mark Sandler, Andrew Howard, Quoc V. Le 70 | abstract: > 71 | Designing convolutional neural networks (CNN) for mobile devices is challenging because mobile models need to be small and fast, yet still accurate. Although significant efforts have been dedicated to design and improve mobile CNNs on all dimensions, it is very difficult to manually balance these trade-offs when there are so many architectural possibilities to consider. In this paper, we propose an automated mobile neural architecture search (MNAS) approach, which explicitly incorporate model latency into the main objective so that the search can identify a model that achieves a good trade-off between accuracy and latency. Unlike previous work, where latency is considered via another, often inaccurate proxy (e.g., FLOPS), our approach directly measures real-world inference latency by executing the model on mobile phones. To further strike the right balance between flexibility and search space size, we propose a novel factorized hierarchical search space that encourages layer diversity throughout the network. Experimental results show that our approach consistently outperforms state-of-the-art mobile CNN models across multiple vision tasks. On the ImageNet classification task, our MnasNet achieves 75.2% top-1 accuracy with 78ms latency on a Pixel phone, which is 1.8x faster than MobileNetV2 [29] with 0.5% higher accuracy and 2.3x faster than NASNet [36] with 1.2% higher accuracy. Our MnasNet also achieves better mAP quality than MobileNets for COCO object detection. Code is at this [URL](https://github.com/tensorflow/tpu/tree/master/models/official/mnasnet) 72 | 73 | - 74 | name: > 75 | NetAdapt: Platform-Aware Neural Network Adaptation for Mobile Applications 76 | url: https://arxiv.org/abs/1804.03230 77 | date: 2018/04/19 78 | conference: ECCV 2018 79 | code: 80 | authors: Tien-Ju Yang, Andrew Howard, Bo Chen, Xiao Zhang, Alec Go, Mark Sandler, Vivienne Sze, Hartwig Adam 81 | abstract: > 82 | This work proposes an algorithm, called NetAdapt, that automatically adapts a pre-trained deep neural network to a mobile platform given a resource budget. 
While many existing algorithms simplify networks based on the number of MACs or weights, optimizing those indirect metrics may not necessarily reduce the direct metrics, such as latency and energy consumption. To solve this problem, NetAdapt incorporates direct metrics into its adaptation algorithm. These direct metrics are evaluated using empirical measurements, so that detailed knowledge of the platform and toolchain is not required. NetAdapt automatically and progressively simplifies a pre-trained network until the resource budget is met while maximizing the accuracy. Experimental results show that NetAdapt achieves better accuracy versus latency trade-offs on both mobile CPU and mobile GPU, compared with the state-of-the-art automated network simplification algorithms. For image classification on the ImageNet dataset, NetAdapt achieves up to a 1.7× speedup in measured inference latency with equal or higher accuracy on MobileNets (V1&V2). 83 | 84 | - 85 | name: > 86 | MorphNet: Fast & Simple Resource-Constrained Structure Learning of Deep Networks 87 | url: https://arxiv.org/abs/1711.06798 88 | date: 2017/11/18 89 | conference: 90 | code: 91 | authors: Ariel Gordon, Elad Eban, Ofir Nachum, Bo Chen, Hao Wu, Tien-Ju Yang, Edward Choi 92 | abstract: > 93 | We present MorphNet, an approach to automate the design of neural network structures. MorphNet iteratively shrinks and expands a network, shrinking via a resource-weighted sparsifying regularizer on activations and expanding via a uniform multiplicative factor on all layers. In contrast to previous approaches, our method is scalable to large networks, adaptable to specific resource constraints (e.g. the number of floating-point operations per inference), and capable of increasing the network's performance. When applied to standard network architectures on a wide variety of datasets, our approach discovers novel structures in each domain, obtaining higher performance while respecting the resource constraint. 94 | 95 | ... 96 | -------------------------------------------------------------------------------- /Papers/Quantization/README.md: -------------------------------------------------------------------------------- 1 | # Quantization 2 | [Back to awesome edge machine learning](https://github.com/bisonai/awesome-edge-machine-learning) 3 | 4 | [Back to Papers](https://github.com/bisonai/awesome-edge-machine-learning/tree/master/Papers) 5 | 6 | Quantization is the process of reducing the precision of weights and/or activations in a neural network, from 32-bit floating point to lower bit-depth representations. The advantages of this method are a reduced model size and faster model inference on hardware that supports arithmetic operations in lower precision. A brief, illustrative code sketch of uniform affine quantization is included further down this page, after the separable convolution entry. 7 | 8 | 9 | ## [And the Bit Goes Down: Revisiting the Quantization of Neural Networks](https://arxiv.org/abs/1907.05686), 2019/07 10 | Pierre Stock, Armand Joulin, Rémi Gribonval, Benjamin Graham, Hervé Jégou 11 | 12 | In this paper, we address the problem of reducing the memory footprint of ResNet-like convolutional network architectures. We introduce a vector quantization method that aims at preserving the quality of the reconstruction of the network outputs and not its weights. The advantage of our approach is that it minimizes the loss reconstruction error for in-domain inputs and does not require any labelled data. We also use byte-aligned codebooks to produce compressed networks with efficient inference on CPU.
We validate our approach by quantizing a high performing ResNet-50 model to a memory size of 5 MB (20x compression factor) while preserving a top-1 accuracy of 76.1% on ImageNet object classification and by compressing a Mask R-CNN with a size budget around 6 MB. 13 | 14 | 15 | ## [Data-Free Quantization through Weight Equalization and Bias Correction](https://arxiv.org/abs/1906.04721), 2019/06 16 | Markus Nagel, Mart van Baalen, Tijmen Blankevoort, Max Welling 17 | 18 | We introduce a data-free quantization method for deep neural networks that does not require fine-tuning or hyperparameter selection. It achieves near-original model performance on common computer vision architectures and tasks. 8-bit fixed-point quantization is essential for efficient inference in modern deep learning hardware architectures. However, quantizing models to run in 8-bit is a non-trivial task, frequently leading to either significant performance reduction or engineering time spent on training a network to be amenable to quantization. Our approach relies on equalizing the weight ranges in the network by making use of a scale-equivariance property of activation functions. In addition the method corrects biases in the error that are introduced during quantization. This improves quantization accuracy performance, and can be applied ubiquitously to almost any model with a straight-forward API call. For common architectures, such as the MobileNet family, we achieve state-of-the-art quantized model performance. We further show that the method also extends to other computer vision architectures and tasks such as semantic segmentation and object detection. 19 | 20 | 21 | ## [HAQ: Hardware-Aware Automated Quantization with Mixed Precision](https://arxiv.org/abs/1811.08886), 2018/11 22 | Kuan Wang, Zhijian Liu, Yujun Lin, Ji Lin, Song Han 23 | 24 | Model quantization is a widely used technique to compress and accelerate deep neural network (DNN) inference. Emergent DNN hardware accelerators begin to support mixed precision (1-8 bits) to further improve the computation efficiency, which raises a great challenge to find the optimal bitwidth for each layer: it requires domain experts to explore the vast design space trading off among accuracy, latency, energy, and model size, which is both time-consuming and sub-optimal. Conventional quantization algorithm ignores the different hardware architectures and quantizes all the layers in a uniform way. In this paper, we introduce the Hardware-Aware Automated Quantization (HAQ) framework which leverages the reinforcement learning to automatically determine the quantization policy, and we take the hardware accelerator's feedback in the design loop. Rather than relying on proxy signals such as FLOPs and model size, we employ a hardware simulator to generate direct feedback signals (latency and energy) to the RL agent. Compared with conventional methods, our framework is fully automated and can specialize the quantization policy for different neural network architectures and hardware architectures. Our framework effectively reduced the latency by 1.4-1.95x and the energy consumption by 1.9x with negligible loss of accuracy compared with the fixed bitwidth (8 bits) quantization. Our framework reveals that the optimal policies on different hardware architectures (i.e., edge and cloud architectures) under different resource constraints (i.e., latency, energy and model size) are drastically different. 
We interpreted the implication of different quantization policies, which offer insights for both neural network architecture design and hardware architecture design. 25 | 26 | 27 | ## [Quantizing deep convolutional networks for efficient inference: A whitepaper](https://arxiv.org/abs/1806.08342), 2018/06 28 | Raghuraman Krishnamoorthi 29 | 30 | We present an overview of techniques for quantizing convolutional neural networks for inference with integer weights and activations. Per-channel quantization of weights and per-layer quantization of activations to 8-bits of precision post-training produces classification accuracies within 2% of floating point networks for a wide variety of CNN architectures. Model sizes can be reduced by a factor of 4 by quantizing weights to 8-bits, even when 8-bit arithmetic is not supported. This can be achieved with simple, post training quantization of weights.We benchmark latencies of quantized networks on CPUs and DSPs and observe a speedup of 2x-3x for quantized implementations compared to floating point on CPUs. Speedups of up to 10x are observed on specialized processors with fixed point SIMD capabilities, like the Qualcomm QDSPs with HVX. Quantization-aware training can provide further improvements, reducing the gap to floating point to 1% at 8-bit precision. Quantization-aware training also allows for reducing the precision of weights to four bits with accuracy losses ranging from 2% to 10%, with higher accuracy drop for smaller networks.We introduce tools in TensorFlow and TensorFlowLite for quantizing convolutional networks and review best practices for quantization-aware training to obtain high accuracy with quantized weights and activations. We recommend that per-channel quantization of weights and per-layer quantization of activations be the preferred quantization scheme for hardware acceleration and kernel optimization. We also propose that future processors and hardware accelerators for optimized inference support precisions of 4, 8 and 16 bits. 31 | 32 | 33 | ## [A Quantization-Friendly Separable Convolution for MobileNets](https://arxiv.org/abs/1803.08607), 2018/03 34 | Tao Sheng, Chen Feng, Shaojie Zhuo, Xiaopeng Zhang, Liang Shen, Mickey Aleksic 35 | 36 | As deep learning (DL) is being rapidly pushed to edge computing, researchers invented various ways to make inference computation more efficient on mobile/IoT devices, such as network pruning, parameter compression, and etc. Quantization, as one of the key approaches, can effectively offload GPU, and make it possible to deploy DL on fixed-point pipeline. Unfortunately, not all existing networks design are friendly to quantization. For example, the popular lightweight MobileNetV1, while it successfully reduces parameter size and computation latency with separable convolution, our experiment shows its quantized models have large accuracy gap against its float point models. To resolve this, we analyzed the root cause of quantization loss and proposed a quantization-friendly separable convolution architecture. By evaluating the image classification task on ImageNet2012 dataset, our modified MobileNetV1 model can archive 8-bit inference top-1 accuracy in 68.03%, almost closed the gap to the float pipeline. 
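As an editorial aside to the papers above (not taken from any of them): most post-training schemes in this section build on uniform affine quantization, where a float tensor is mapped to low-bit integers through a scale and a zero-point. The sketch below is a minimal NumPy illustration of that idea; the function names and clipping choices are ours, not those of any specific library or paper.

```python
import numpy as np

def quantize_uniform_affine(x, num_bits=8):
    """Map a float tensor to unsigned integers using a scale and a zero-point."""
    qmin, qmax = 0, 2 ** num_bits - 1
    # Extend the range to include 0.0 so that zero stays exactly representable.
    x_min, x_max = min(float(x.min()), 0.0), max(float(x.max()), 0.0)
    scale = (x_max - x_min) / (qmax - qmin) or 1.0  # guard against constant tensors
    zero_point = int(round(qmin - x_min / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate float values from the integer representation."""
    return scale * (q.astype(np.float32) - zero_point)

weights = np.random.randn(64, 32).astype(np.float32)
q, scale, zp = quantize_uniform_affine(weights)
max_error = np.abs(weights - dequantize(q, scale, zp)).max()  # roughly bounded by scale / 2
```

Real 8-bit pipelines add per-channel scales, activation-range calibration, or quantization-aware training on top of this basic mapping, as the papers in this section describe.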
37 | 38 | 39 | ## [Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference](https://arxiv.org/abs/1712.05877), 2017/11 40 | Benoit Jacob, Skirmantas Kligys, Bo Chen, Menglong Zhu, Matthew Tang, Andrew Howard, Hartwig Adam, Dmitry Kalenichenko 41 | 42 | The rising popularity of intelligent mobile devices and the daunting computational cost of deep learning-based models call for efficient and accurate on-device inference schemes. We propose a quantization scheme that allows inference to be carried out using integer-only arithmetic, which can be implemented more efficiently than floating point inference on commonly available integer-only hardware. We also co-design a training procedure to preserve end-to-end model accuracy post quantization. As a result, the proposed quantization scheme improves the tradeoff between accuracy and on-device latency. The improvements are significant even on MobileNets, a model family known for run-time efficiency, and are demonstrated in ImageNet classification and COCO detection on popular CPUs. 43 | 44 | 45 | ## [Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations](https://arxiv.org/abs/1609.07061), 2016/09 46 | Itay Hubara, Matthieu Courbariaux, Daniel Soudry, Ran El-Yaniv, Yoshua Bengio 47 | 48 | We introduce a method to train Quantized Neural Networks (QNNs) --- neural networks with extremely low precision (e.g., 1-bit) weights and activations, at run-time. At train-time the quantized weights and activations are used for computing the parameter gradients. During the forward pass, QNNs drastically reduce memory size and accesses, and replace most arithmetic operations with bit-wise operations. As a result, power consumption is expected to be drastically reduced. We trained QNNs over the MNIST, CIFAR-10, SVHN and ImageNet datasets. The resulting QNNs achieve prediction accuracy comparable to their 32-bit counterparts. For example, our quantized version of AlexNet with 1-bit weights and 2-bit activations achieves 51% top-1 accuracy. Moreover, we quantize the parameter gradients to 6-bits as well which enables gradients computation using only bit-wise operation. Quantized recurrent neural networks were tested over the Penn Treebank dataset, and achieved comparable accuracy as their 32-bit counterparts using only 4-bits. Last but not least, we programmed a binary matrix multiplication GPU kernel with which it is possible to run our MNIST QNN 7 times faster than with an unoptimized GPU kernel, without suffering any loss in classification accuracy. The QNN code is available online. 49 | 50 | 51 | ## [DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients](https://arxiv.org/abs/1606.06160), 2016/06 52 | Shuchang Zhou, Yuxin Wu, Zekun Ni, Xinyu Zhou, He Wen, Yuheng Zou 53 | 54 | We propose DoReFa-Net, a method to train convolutional neural networks that have low bitwidth weights and activations using low bitwidth parameter gradients. In particular, during backward pass, parameter gradients are stochastically quantized to low bitwidth numbers before being propagated to convolutional layers. As convolutions during forward/backward passes can now operate on low bitwidth weights and activations/gradients respectively, DoReFa-Net can use bit convolution kernels to accelerate both training and inference. 
Moreover, as bit convolutions can be efficiently implemented on CPU, FPGA, ASIC and GPU, DoReFa-Net opens the way to accelerate training of low bitwidth neural network on these hardware. Our experiments on SVHN and ImageNet datasets prove that DoReFa-Net can achieve comparable prediction accuracy as 32-bit counterparts. For example, a DoReFa-Net derived from AlexNet that has 1-bit weights, 2-bit activations, can be trained from scratch using 6-bit gradients to get 46.1\% top-1 accuracy on ImageNet validation set. The DoReFa-Net AlexNet model is released publicly. 55 | 56 | 57 | ## [Quantized Convolutional Neural Networks for Mobile Devices](https://arxiv.org/abs/1512.06473), 2015/12 58 | Jiaxiang Wu, Cong Leng, Yuhang Wang, Qinghao Hu, Jian Cheng 59 | 60 | Recently, convolutional neural networks (CNN) have demonstrated impressive performance in various computer vision tasks. However, high performance hardware is typically indispensable for the application of CNN models due to the high computation complexity, which prohibits their further extensions. In this paper, we propose an efficient framework, namely Quantized CNN, to simultaneously speed-up the computation and reduce the storage and memory overhead of CNN models. Both filter kernels in convolutional layers and weighting matrices in fully-connected layers are quantized, aiming at minimizing the estimation error of each layer's response. Extensive experiments on the ILSVRC-12 benchmark demonstrate 4~6x speed-up and 15~20x compression with merely one percentage loss of classification accuracy. With our quantized CNN model, even mobile devices can accurately classify images within one second. 61 | 62 | 63 | -------------------------------------------------------------------------------- /Papers/Others/README.md: -------------------------------------------------------------------------------- 1 | # Others 2 | [Back to awesome edge machine learning](https://github.com/bisonai/awesome-edge-machine-learning) 3 | 4 | [Back to Papers](https://github.com/bisonai/awesome-edge-machine-learning/tree/master/Papers) 5 | 6 | This section contains papers that are related to edge machine learning but are not part of any major group. These papers often deal with deployment issues (i.e. optimizing inference on target platform). 7 | 8 | 9 | ## [Distributed Machine Learning on Mobile Devices: A Survey](https://arxiv.org/abs/1909.08329), 2019/09 10 | Renjie Gu, Shuo Yang, Fan Wu 11 | 12 | In recent years, mobile devices have gained increasingly development with stronger computation capability and larger storage. Some of the computation-intensive machine learning and deep learning tasks can now be run on mobile devices. To take advantage of the resources available on mobile devices and preserve users' privacy, the idea of mobile distributed machine learning is proposed. It uses local hardware resources and local data to solve machine learning sub-problems on mobile devices, and only uploads computation results instead of original data to contribute to the optimization of the global model. This architecture can not only relieve computation and storage burden on servers, but also protect the users' sensitive information. Another benefit is the bandwidth reduction, as various kinds of local data can now participate in the training process without being uploaded to the server. In this paper, we provide a comprehensive survey on recent studies of mobile distributed machine learning. We survey a number of widely-used mobile distributed machine learning methods. 
We also present an in-depth discussion on the challenges and future directions in this area. We believe that this survey can demonstrate a clear overview of mobile distributed machine learning and provide guidelines on applying mobile distributed machine learning to real applications. 13 | 14 | 15 | ## [Machine Learning at the Network Edge: A Survey](https://arxiv.org/abs/1908.00080), 2019/07 16 | M.G. Sarwar Murshed, Christopher Murphy, Daqing Hou, Nazar Khan, Ganesh Ananthanarayanan, Faraz Hussain 17 | 18 | Devices comprising the Internet of Things, such as sensors and small cameras, usually have small memories and limited computational power. The proliferation of such resource-constrained devices in recent years has led to the generation of large quantities of data. These data-producing devices are appealing targets for machine learning applications but struggle to run machine learning algorithms due to their limited computing capability. They typically offload input data to external computing systems (such as cloud servers) for further processing. The results of the machine learning computations are communicated back to the resource-scarce devices, but this worsens latency, leads to increased communication costs, and adds to privacy concerns. Therefore, efforts have been made to place additional computing devices at the edge of the network, i.e close to the IoT devices where the data is generated. Deploying machine learning systems on such edge devices alleviates the above issues by allowing computations to be performed close to the data sources. This survey describes major research efforts where machine learning has been deployed at the edge of computer networks. 19 | 20 | 21 | ## [Convergence of Edge Computing and Deep Learning: A Comprehensive Survey](https://arxiv.org/abs/1907.08349), 2019/07 22 | Yiwen Han, Xiaofei Wang, Victor C.M. Leung, Dusit Niyato, Xueqiang Yan, Xu Chen 23 | 24 | Ubiquitous sensors and smart devices from factories and communities guarantee massive amounts of data and ever-increasing computing power is driving the core of computation and services from the cloud to the edge of the network. As an important enabler broadly changing people's lives, from face recognition to ambitious smart factories and cities, artificial intelligence (especially deep learning) applications and services have experienced a thriving development process. However, due to efficiency and latency issues, the current cloud computing service architecture hinders the vision of "providing artificial intelligence for every person and every organization at everywhere". Thus, recently, a better solution is unleashing deep learning services from the cloud to the edge near to data sources. Therefore, edge intelligence, aiming to facilitate the deployment of deep learning services by edge computing, has received great attention. In addition, deep learning, as the main representative of artificial intelligence, can be integrated into edge computing frameworks to build intelligent edge for dynamic, adaptive edge maintenance and management. With regard to mutually benefited edge intelligence and intelligent edge, this paper introduces and discusses: 1) the application scenarios of both; 2) the practical implementation methods and enabling technologies, namely deep learning training and inference in the customized edge computing framework; 3) existing challenges and future trends of more pervasive and fine-grained intelligence. 
We believe that this survey can help readers to garner information scattered across the communication, networking, and deep learning, understand the connections between enabling technologies, and promotes further discussions on the fusion of edge intelligence and intelligent edge. 25 | 26 | 27 | ## [On-Device Neural Net Inference with Mobile GPUs](https://arxiv.org/abs/1907.01989), 2019/07 28 | Juhyun Lee, Nikolay Chirkov, Ekaterina Ignasheva, Yury Pisarchyk, Mogan Shieh, Fabio Riccardi, Raman Sarokin, Andrei Kulik, Matthias Grundmann 29 | 30 | On-device inference of machine learning models for mobile phones is desirable due to its lower latency and increased privacy. Running such a compute-intensive task solely on the mobile CPU, however, can be difficult due to limited computing power, thermal constraints, and energy consumption. App developers and researchers have begun exploiting hardware accelerators to overcome these challenges. Recently, device manufacturers are adding neural processing units into high-end phones for on-device inference, but these account for only a small fraction of hand-held devices. In this paper, we present how we leverage the mobile GPU, a ubiquitous hardware accelerator on virtually every phone, to run inference of deep neural networks in real-time for both Android and iOS devices. By describing our architecture, we also discuss how to design networks that are mobile GPU-friendly. Our state-of-the-art mobile GPU inference engine is integrated into the open-source project TensorFlow Lite and publicly available at [https://tensorflow.org/lite](https://tensorflow.org/lite). 31 | 32 | 33 | ## [Edge Intelligence: Paving the Last Mile of Artificial Intelligence with Edge Computing](https://arxiv.org/abs/1905.10083), 2019/05 34 | Zhi Zhou, Xu Chen, En Li, Liekang Zeng, Ke Luo, Junshan Zhang 35 | 36 | With the breakthroughs in deep learning, the recent years have witnessed a booming of artificial intelligence (AI) applications and services, spanning from personal assistant to recommendation systems to video/audio surveillance. More recently, with the proliferation of mobile computing and Internet-of-Things (IoT), billions of mobile and IoT devices are connected to the Internet, generating zillions Bytes of data at the network edge. Driving by this trend, there is an urgent need to push the AI frontiers to the network edge so as to fully unleash the potential of the edge big data. To meet this demand, edge computing, an emerging paradigm that pushes computing tasks and services from the network core to the network edge, has been widely recognized as a promising solution. The resulted new inter-discipline, edge AI or edge intelligence, is beginning to receive a tremendous amount of interest. However, research on edge intelligence is still in its infancy stage, and a dedicated venue for exchanging the recent advances of edge intelligence is highly desired by both the computer system and artificial intelligence communities. To this end, we conduct a comprehensive survey of the recent research efforts on edge intelligence. Specifically, we first review the background and motivation for artificial intelligence running at the network edge. We then provide an overview of the overarching architectures, frameworks and emerging key technologies for deep learning model towards training/inference at the network edge. Finally, we discuss future research opportunities on edge intelligence. 
We believe that this survey will elicit escalating attentions, stimulate fruitful discussions and inspire further research ideas on edge intelligence. 37 | 38 | 39 | ## [Deep Learning on Mobile Devices - A Review](https://arxiv.org/abs/1904.09274), 2019/03 40 | Yunbin Deng 41 | 42 | Recent breakthroughs in deep learning and artificial intelligence technologies have enabled numerous mobile applications. While traditional computation paradigms rely on mobile sensing and cloud computing, deep learning implemented on mobile devices provides several advantages. These advantages include low communication bandwidth, small cloud computing resource cost, quick response time, and improved data privacy. Research and development of deep learning on mobile and embedded devices has recently attracted much attention. This paper provides a timely review of this fast-paced field to give the researcher, engineer, practitioner, and graduate student a quick grasp on the recent advancements of deep learning on mobile devices. In this paper, we discuss hardware architectures for mobile deep learning, including Field Programmable Gate Arrays, Application Specific Integrated Circuit, and recent mobile Graphic Processing Units. We present Size, Weight, Area and Power considerations and their relation to algorithm optimizations, such as quantization, pruning, compression, and approximations that simplify computation while retaining performance accuracy. We cover existing systems and give a state-of-the-industry review of TensorFlow, MXNet, Mobile AI Compute Engine, and Paddle-mobile deep learning platform. We discuss resources for mobile deep learning practitioners, including tools, libraries, models, and performance benchmarks. We present applications of various mobile sensing modalities to industries, ranging from robotics, healthcare and multi-media, biometrics to autonomous drive and defense. We address the key deep learning challenges to overcome, including low quality data, and small training/adaptation data sets. In addition, the review provides numerous citations and links to existing code bases implementing various technologies. 43 | 44 | 45 | ## [Wireless Network Intelligence at the Edge](https://arxiv.org/abs/1812.02858), 2018/12 46 | Jihong Park, Sumudu Samarakoon, Mehdi Bennis, Mérouane Debbah 47 | 48 | Fueled by the availability of more data and computing power, recent breakthroughs in cloud-based machine learning (ML) have transformed every aspect of our lives from face recognition and medical diagnosis to natural language processing. However, classical ML exerts severe demands in terms of energy, memory and computing resources, limiting their adoption for resource constrained edge devices. The new breed of intelligent devices and high-stake applications (drones, augmented/virtual reality, autonomous systems, etc.), requires a novel paradigm change calling for distributed, low-latency and reliable ML at the wireless network edge (referred to as edge ML). In edge ML, training data is unevenly distributed over a large number of edge nodes, which have access to a tiny fraction of the data. Moreover training and inference is carried out collectively over wireless links, where edge devices communicate and exchange their learned models (not their private data). 
In a first of its kind, this article explores key building blocks of edge ML, different neural network architectural splits and their inherent tradeoffs, as well as theoretical and technical enablers stemming from a wide range of mathematical disciplines. Finally, several case studies pertaining to various high-stake applications are presented demonstrating the effectiveness of edge ML in unlocking the full potential of 5G and beyond. 49 | 50 | 51 | ## [Machine Learning at Facebook: Understanding Inference at the Edge](https://research.fb.com/wp-content/uploads/2018/12/Machine-Learning-at-Facebook-Understanding-Inference-at-the-Edge.pdf), 2018/12 52 | Carole-Jean Wu, David Brooks, Kevin Chen, Douglas Chen, Sy Choudhury, Marat Dukhan, Kim Hazelwood, Eldad Isaac, Yangqing Jia, Bill Jia, Tommer Leyvand, Hao Lu, Yang Lu, Lin Qiao, Brandon Reagen, Joe Spisak, Fei Sun, Andrew Tulloch, Peter Vajda, Xiaodong Wang, Yanghan Wang, Bram Wasti, Yiming Wu, Ran Xian, Sungjoo Yoo, Peizhao Zhang 53 | 54 | At Facebook, machine learning provides a wide range of capabilities that drive many aspects of user experience, including ranking posts, content understanding, object detection and tracking for augmented and virtual reality, and speech and text translations. While machine learning models are currently trained on customized datacenter infrastructure, Facebook is working to bring machine learning inference to the edge. By doing so, user experience is improved with reduced latency (inference time) and becomes less dependent on network connectivity. Furthermore, this also enables many more applications of deep learning with important features only made available at the edge. This paper takes a data-driven approach to present the opportunities and design challenges faced by Facebook in order to enable machine learning inference locally on smartphones and other edge platforms. 55 | 56 | 57 | -------------------------------------------------------------------------------- /Papers/Quantization/data.yaml: -------------------------------------------------------------------------------- 1 | # QUANTIZATION 2 | # 3 | # `date` format is as follows: yyyy/mm/dd. For example 6 May 2019 would be 2019/05/06. 4 | # In case of arXiv, use the date of the first version of the paper. 5 | # 6 | # [Template] 7 | # 8 | # - 9 | # name: 10 | # url: 11 | # date: 12 | # conference: 13 | # code: 14 | # authors: 15 | # abstract: 16 | --- 17 | 18 | - 19 | name: > 20 | And the Bit Goes Down: Revisiting the Quantization of Neural Networks 21 | url: https://arxiv.org/abs/1907.05686 22 | date: 2019/07/12 23 | conference: 24 | code: https://github.com/facebookresearch/kill-the-bits 25 | authors: Pierre Stock, Armand Joulin, Rémi Gribonval, Benjamin Graham, Hervé Jégou 26 | abstract: > 27 | In this paper, we address the problem of reducing the memory footprint of ResNet-like convolutional network architectures. We introduce a vector quantization method that aims at preserving the quality of the reconstruction of the network outputs and not its weights. The advantage of our approach is that it minimizes the loss reconstruction error for in-domain inputs and does not require any labelled data. We also use byte-aligned codebooks to produce compressed networks with efficient inference on CPU.
28 | 29 | - 30 | name: > 31 | A Quantization-Friendly Separable Convolution for MobileNets 32 | url: https://arxiv.org/abs/1803.08607 33 | date: 2018/03/22 34 | conference: 35 | code: 36 | authors: Tao Sheng, Chen Feng, Shaojie Zhuo, Xiaopeng Zhang, Liang Shen, Mickey Aleksic 37 | abstract: > 38 | As deep learning (DL) is being rapidly pushed to edge computing, researchers invented various ways to make inference computation more efficient on mobile/IoT devices, such as network pruning, parameter compression, and etc. Quantization, as one of the key approaches, can effectively offload GPU, and make it possible to deploy DL on fixed-point pipeline. Unfortunately, not all existing networks design are friendly to quantization. For example, the popular lightweight MobileNetV1, while it successfully reduces parameter size and computation latency with separable convolution, our experiment shows its quantized models have large accuracy gap against its float point models. To resolve this, we analyzed the root cause of quantization loss and proposed a quantization-friendly separable convolution architecture. By evaluating the image classification task on ImageNet2012 dataset, our modified MobileNetV1 model can archive 8-bit inference top-1 accuracy in 68.03%, almost closed the gap to the float pipeline. 39 | 40 | - 41 | name: > 42 | DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients 43 | url: https://arxiv.org/abs/1606.06160 44 | date: 2016/06/20 45 | conference: 46 | code: 47 | authors: Shuchang Zhou, Yuxin Wu, Zekun Ni, Xinyu Zhou, He Wen, Yuheng Zou 48 | abstract: > 49 | We propose DoReFa-Net, a method to train convolutional neural networks that have low bitwidth weights and activations using low bitwidth parameter gradients. In particular, during backward pass, parameter gradients are stochastically quantized to low bitwidth numbers before being propagated to convolutional layers. As convolutions during forward/backward passes can now operate on low bitwidth weights and activations/gradients respectively, DoReFa-Net can use bit convolution kernels to accelerate both training and inference. Moreover, as bit convolutions can be efficiently implemented on CPU, FPGA, ASIC and GPU, DoReFa-Net opens the way to accelerate training of low bitwidth neural network on these hardware. Our experiments on SVHN and ImageNet datasets prove that DoReFa-Net can achieve comparable prediction accuracy as 32-bit counterparts. For example, a DoReFa-Net derived from AlexNet that has 1-bit weights, 2-bit activations, can be trained from scratch using 6-bit gradients to get 46.1\% top-1 accuracy on ImageNet validation set. The DoReFa-Net AlexNet model is released publicly. 50 | 51 | - 52 | name: > 53 | Data-Free Quantization through Weight Equalization and Bias Correction 54 | url: https://arxiv.org/abs/1906.04721 55 | date: 2019/06/11 56 | conference: 57 | code: 58 | authors: Markus Nagel, Mart van Baalen, Tijmen Blankevoort, Max Welling 59 | abstract: > 60 | We introduce a data-free quantization method for deep neural networks that does not require fine-tuning or hyperparameter selection. It achieves near-original model performance on common computer vision architectures and tasks. 8-bit fixed-point quantization is essential for efficient inference in modern deep learning hardware architectures. 
However, quantizing models to run in 8-bit is a non-trivial task, frequently leading to either significant performance reduction or engineering time spent on training a network to be amenable to quantization. Our approach relies on equalizing the weight ranges in the network by making use of a scale-equivariance property of activation functions. In addition the method corrects biases in the error that are introduced during quantization. This improves quantization accuracy performance, and can be applied ubiquitously to almost any model with a straight-forward API call. For common architectures, such as the MobileNet family, we achieve state-of-the-art quantized model performance. We further show that the method also extends to other computer vision architectures and tasks such as semantic segmentation and object detection. 61 | 62 | - 63 | name: > 64 | Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations 65 | url: https://arxiv.org/abs/1609.07061 66 | date: 2016/09/22 67 | conference: 68 | authors: Itay Hubara, Matthieu Courbariaux, Daniel Soudry, Ran El-Yaniv, Yoshua Bengio 69 | abstract: > 70 | We introduce a method to train Quantized Neural Networks (QNNs) --- neural networks with extremely low precision (e.g., 1-bit) weights and activations, at run-time. At train-time the quantized weights and activations are used for computing the parameter gradients. During the forward pass, QNNs drastically reduce memory size and accesses, and replace most arithmetic operations with bit-wise operations. As a result, power consumption is expected to be drastically reduced. We trained QNNs over the MNIST, CIFAR-10, SVHN and ImageNet datasets. The resulting QNNs achieve prediction accuracy comparable to their 32-bit counterparts. For example, our quantized version of AlexNet with 1-bit weights and 2-bit activations achieves 51% top-1 accuracy. Moreover, we quantize the parameter gradients to 6-bits as well which enables gradients computation using only bit-wise operation. Quantized recurrent neural networks were tested over the Penn Treebank dataset, and achieved comparable accuracy as their 32-bit counterparts using only 4-bits. Last but not least, we programmed a binary matrix multiplication GPU kernel with which it is possible to run our MNIST QNN 7 times faster than with an unoptimized GPU kernel, without suffering any loss in classification accuracy. The QNN code is available online. 71 | 72 | - 73 | name: Quantized Convolutional Neural Networks for Mobile Devices 74 | url: https://arxiv.org/abs/1512.06473 75 | date: 2015/12/21 76 | conference: CVPR 2016 77 | authors: Jiaxiang Wu, Cong Leng, Yuhang Wang, Qinghao Hu, Jian Cheng 78 | abstract: > 79 | Recently, convolutional neural networks (CNN) have demonstrated impressive performance in various computer vision tasks. However, high performance hardware is typically indispensable for the application of CNN models due to the high computation complexity, which prohibits their further extensions. In this paper, we propose an efficient framework, namely Quantized CNN, to simultaneously speed-up the computation and reduce the storage and memory overhead of CNN models. Both filter kernels in convolutional layers and weighting matrices in fully-connected layers are quantized, aiming at minimizing the estimation error of each layer's response. Extensive experiments on the ILSVRC-12 benchmark demonstrate 4~6x speed-up and 15~20x compression with merely one percentage loss of classification accuracy. 
With our quantized CNN model, even mobile devices can accurately classify images within one second. 80 | 81 | - 82 | name: > 83 | Quantizing deep convolutional networks for efficient inference: A whitepaper 84 | url: https://arxiv.org/abs/1806.08342 85 | date: 2018/06/21 86 | conference: 87 | authors: Raghuraman Krishnamoorthi 88 | abstract: > 89 | We present an overview of techniques for quantizing convolutional neural networks for inference with integer weights and activations. Per-channel quantization of weights and per-layer quantization of activations to 8-bits of precision post-training produces classification accuracies within 2% of floating point networks for a wide variety of CNN architectures. Model sizes can be reduced by a factor of 4 by quantizing weights to 8-bits, even when 8-bit arithmetic is not supported. This can be achieved with simple, post training quantization of weights.We benchmark latencies of quantized networks on CPUs and DSPs and observe a speedup of 2x-3x for quantized implementations compared to floating point on CPUs. Speedups of up to 10x are observed on specialized processors with fixed point SIMD capabilities, like the Qualcomm QDSPs with HVX. 90 | Quantization-aware training can provide further improvements, reducing the gap to floating point to 1% at 8-bit precision. Quantization-aware training also allows for reducing the precision of weights to four bits with accuracy losses ranging from 2% to 10%, with higher accuracy drop for smaller networks.We introduce tools in TensorFlow and TensorFlowLite for quantizing convolutional networks and review best practices for quantization-aware training to obtain high accuracy with quantized weights and activations. We recommend that per-channel quantization of weights and per-layer quantization of activations be the preferred quantization scheme for hardware acceleration and kernel optimization. We also propose that future processors and hardware accelerators for optimized inference support precisions of 4, 8 and 16 bits. 91 | 92 | - 93 | name: > 94 | Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference 95 | url: https://arxiv.org/abs/1712.05877 96 | date: 2017/11/15 97 | conference: 98 | authors: Benoit Jacob, Skirmantas Kligys, Bo Chen, Menglong Zhu, Matthew Tang, Andrew Howard, Hartwig Adam, Dmitry Kalenichenko 99 | abstract: > 100 | The rising popularity of intelligent mobile devices and the daunting computational cost of deep learning-based models call for efficient and accurate on-device inference schemes. We propose a quantization scheme that allows inference to be carried out using integer-only arithmetic, which can be implemented more efficiently than floating point inference on commonly available integer-only hardware. We also co-design a training procedure to preserve end-to-end model accuracy post quantization. As a result, the proposed quantization scheme improves the tradeoff between accuracy and on-device latency. The improvements are significant even on MobileNets, a model family known for run-time efficiency, and are demonstrated in ImageNet classification and COCO detection on popular CPUs. 
101 | 102 | - 103 | name: > 104 | HAQ: Hardware-Aware Automated Quantization with Mixed Precision 105 | url: https://arxiv.org/abs/1811.08886 106 | date: 2018/11/21 107 | conference: CVPR 2019 108 | code: https://github.com/mit-han-lab/haq-release 109 | authors: Kuan Wang, Zhijian Liu, Yujun Lin, Ji Lin, Song Han 110 | abstract: > 111 | Model quantization is a widely used technique to compress and accelerate deep neural network (DNN) inference. Emergent DNN hardware accelerators begin to support mixed precision (1-8 bits) to further improve the computation efficiency, which raises a great challenge to find the optimal bitwidth for each layer: it requires domain experts to explore the vast design space trading off among accuracy, latency, energy, and model size, which is both time-consuming and sub-optimal. Conventional quantization algorithm ignores the different hardware architectures and quantizes all the layers in a uniform way. In this paper, we introduce the Hardware-Aware Automated Quantization (HAQ) framework which leverages the reinforcement learning to automatically determine the quantization policy, and we take the hardware accelerator's feedback in the design loop. Rather than relying on proxy signals such as FLOPs and model size, we employ a hardware simulator to generate direct feedback signals (latency and energy) to the RL agent. Compared with conventional methods, our framework is fully automated and can specialize the quantization policy for different neural network architectures and hardware architectures. Our framework effectively reduced the latency by 1.4-1.95x and the energy consumption by 1.9x with negligible loss of accuracy compared with the fixed bitwidth (8 bits) quantization. Our framework reveals that the optimal policies on different hardware architectures (i.e., edge and cloud architectures) under different resource constraints (i.e., latency, energy and model size) are drastically different. We interpreted the implication of different quantization policies, which offer insights for both neural network architecture design and hardware architecture design. 112 | 113 | ... 114 | -------------------------------------------------------------------------------- /Papers/Others/data.yaml: -------------------------------------------------------------------------------- 1 | # OTHERS 2 | # 3 | # `date` format is as follows: yyyy/mm/dd. For example 6 May 2019 would be 2019/05/06. 4 | # In case of arXiv, use the date of the first version of the paper. 5 | # 6 | # [Template] 7 | # 8 | # - 9 | # name: 10 | # url: 11 | # date: 12 | # conference: 13 | # code: 14 | # authors: 15 | # abstract: 16 | --- 17 | 18 | - 19 | name: > 20 | Machine Learning at Facebook: Understanding Inference at the Edge 21 | url: https://research.fb.com/wp-content/uploads/2018/12/Machine-Learning-at-Facebook-Understanding-Inference-at-the-Edge.pdf 22 | date: 2018/12/01 23 | conference: 24 | code: 25 | authors: Carole-Jean Wu, David Brooks, Kevin Chen, Douglas Chen, Sy Choudhury, Marat Dukhan, Kim Hazelwood, Eldad Isaac, Yangqing Jia, Bill Jia, Tommer Leyvand, Hao Lu, Yang Lu, Lin Qiao, Brandon Reagen, Joe Spisak, Fei Sun, Andrew Tulloch, Peter Vajda, Xiaodong Wang, Yanghan Wang, Bram Wasti, Yiming Wu, Ran Xian, Sungjoo Yoo, Peizhao Zhang 26 | abstract: > 27 | At Facebook, machine learning provides a wide range of capabilities that drive many aspects of user experience, including ranking posts, content understanding, object detection and tracking for augmented and virtual reality, and speech and text translations.
While machine learning models are currently trained on customized datacenter infrastructure, Facebook is working to bring machine learning inference to the edge. By doing so, user experience is improved with reduced latency (inference time) and becomes less dependent on network connectivity. Furthermore, this also enables many more applications of deep learning with important features only made available at the edge. This paper takes a data-driven approach to present the opportunities and design challenges faced by Facebook in order to enable machine learning inference locally on smartphones and other edge platforms. 28 | 29 | - 30 | name: > 31 | On-Device Neural Net Inference with Mobile GPUs 32 | url: https://arxiv.org/abs/1907.01989 33 | date: 2019/07/03 34 | conference: 35 | code: 36 | authors: Juhyun Lee, Nikolay Chirkov, Ekaterina Ignasheva, Yury Pisarchyk, Mogan Shieh, Fabio Riccardi, Raman Sarokin, Andrei Kulik, Matthias Grundmann 37 | abstract: > 38 | On-device inference of machine learning models for mobile phones is desirable due to its lower latency and increased privacy. Running such a compute-intensive task solely on the mobile CPU, however, can be difficult due to limited computing power, thermal constraints, and energy consumption. App developers and researchers have begun exploiting hardware accelerators to overcome these challenges. Recently, device manufacturers are adding neural processing units into high-end phones for on-device inference, but these account for only a small fraction of hand-held devices. In this paper, we present how we leverage the mobile GPU, a ubiquitous hardware accelerator on virtually every phone, to run inference of deep neural networks in real-time for both Android and iOS devices. By describing our architecture, we also discuss how to design networks that are mobile GPU-friendly. Our state-of-the-art mobile GPU inference engine is integrated into the open-source project TensorFlow Lite and publicly available at [https://tensorflow.org/lite](https://tensorflow.org/lite). 39 | 40 | - 41 | name: > 42 | Deep Learning on Mobile Devices - A Review 43 | url: https://arxiv.org/abs/1904.09274 44 | date: 2019/03/21 45 | conference: 46 | code: 47 | authors: Yunbin Deng 48 | abstract: > 49 | Recent breakthroughs in deep learning and artificial intelligence technologies have enabled numerous mobile applications. While traditional computation paradigms rely on mobile sensing and cloud computing, deep learning implemented on mobile devices provides several advantages. These advantages include low communication bandwidth, small cloud computing resource cost, quick response time, and improved data privacy. Research and development of deep learning on mobile and embedded devices has recently attracted much attention. This paper provides a timely review of this fast-paced field to give the researcher, engineer, practitioner, and graduate student a quick grasp on the recent advancements of deep learning on mobile devices. In this paper, we discuss hardware architectures for mobile deep learning, including Field Programmable Gate Arrays, Application Specific Integrated Circuit, and recent mobile Graphic Processing Units. We present Size, Weight, Area and Power considerations and their relation to algorithm optimizations, such as quantization, pruning, compression, and approximations that simplify computation while retaining performance accuracy.
We cover existing systems and give a state-of-the-industry review of TensorFlow, MXNet, Mobile AI Compute Engine, and Paddle-mobile deep learning platform. We discuss resources for mobile deep learning practitioners, including tools, libraries, models, and performance benchmarks. We present applications of various mobile sensing modalities to industries, ranging from robotics, healthcare and multi-media, biometrics to autonomous drive and defense. We address the key deep learning challenges to overcome, including low quality data, and small training/adaptation data sets. In addition, the review provides numerous citations and links to existing code bases implementing various technologies. 50 | 51 | - 52 | name: > 53 | Convergence of Edge Computing and Deep Learning: A Comprehensive Survey 54 | url: https://arxiv.org/abs/1907.08349 55 | date: 2019/07/19 56 | conference: 57 | code: 58 | authors: Yiwen Han, Xiaofei Wang, Victor C.M. Leung, Dusit Niyato, Xueqiang Yan, Xu Chen 59 | abstract: > 60 | Ubiquitous sensors and smart devices from factories and communities guarantee massive amounts of data and ever-increasing computing power is driving the core of computation and services from the cloud to the edge of the network. As an important enabler broadly changing people's lives, from face recognition to ambitious smart factories and cities, artificial intelligence (especially deep learning) applications and services have experienced a thriving development process. However, due to efficiency and latency issues, the current cloud computing service architecture hinders the vision of "providing artificial intelligence for every person and every organization at everywhere". Thus, recently, a better solution is unleashing deep learning services from the cloud to the edge near to data sources. Therefore, edge intelligence, aiming to facilitate the deployment of deep learning services by edge computing, has received great attention. In addition, deep learning, as the main representative of artificial intelligence, can be integrated into edge computing frameworks to build intelligent edge for dynamic, adaptive edge maintenance and management. With regard to mutually benefited edge intelligence and intelligent edge, this paper introduces and discusses: 1) the application scenarios of both; 2) the practical implementation methods and enabling technologies, namely deep learning training and inference in the customized edge computing framework; 3) existing challenges and future trends of more pervasive and fine-grained intelligence. We believe that this survey can help readers to garner information scattered across the communication, networking, and deep learning, understand the connections between enabling technologies, and promotes further discussions on the fusion of edge intelligence and intelligent edge. 61 | 62 | - 63 | name: > 64 | Machine Learning at the Network Edge: A Survey 65 | url: https://arxiv.org/abs/1908.00080 66 | date: 2019/07/31 67 | conference: 68 | code: 69 | authors: M.G. Sarwar Murshed, Christopher Murphy, Daqing Hou, Nazar Khan, Ganesh Ananthanarayanan, Faraz Hussain 70 | abstract: > 71 | Devices comprising the Internet of Things, such as sensors and small cameras, usually have small memories and limited computational power. The proliferation of such resource-constrained devices in recent years has led to the generation of large quantities of data. 
These data-producing devices are appealing targets for machine learning applications but struggle to run machine learning algorithms due to their limited computing capability. They typically offload input data to external computing systems (such as cloud servers) for further processing. The results of the machine learning computations are communicated back to the resource-scarce devices, but this worsens latency, leads to increased communication costs, and adds to privacy concerns. Therefore, efforts have been made to place additional computing devices at the edge of the network, i.e close to the IoT devices where the data is generated. Deploying machine learning systems on such edge devices alleviates the above issues by allowing computations to be performed close to the data sources. This survey describes major research efforts where machine learning has been deployed at the edge of computer networks. 72 | 73 | - 74 | name: > 75 | Distributed Machine Learning on Mobile Devices: A Survey 76 | url: https://arxiv.org/abs/1909.08329 77 | date: 2019/09/18 78 | conference: 79 | code: 80 | authors: Renjie Gu, Shuo Yang, Fan Wu 81 | abstract: > 82 | In recent years, mobile devices have gained increasingly development with stronger computation capability and larger storage. Some of the computation-intensive machine learning and deep learning tasks can now be run on mobile devices. To take advantage of the resources available on mobile devices and preserve users' privacy, the idea of mobile distributed machine learning is proposed. It uses local hardware resources and local data to solve machine learning sub-problems on mobile devices, and only uploads computation results instead of original data to contribute to the optimization of the global model. This architecture can not only relieve computation and storage burden on servers, but also protect the users' sensitive information. Another benefit is the bandwidth reduction, as various kinds of local data can now participate in the training process without being uploaded to the server. In this paper, we provide a comprehensive survey on recent studies of mobile distributed machine learning. We survey a number of widely-used mobile distributed machine learning methods. We also present an in-depth discussion on the challenges and future directions in this area. We believe that this survey can demonstrate a clear overview of mobile distributed machine learning and provide guidelines on applying mobile distributed machine learning to real applications. 83 | 84 | - 85 | name: > 86 | Edge Intelligence: Paving the Last Mile of Artificial Intelligence with Edge Computing 87 | url: https://arxiv.org/abs/1905.10083 88 | date: 2019/05/24 89 | conference: 90 | code: 91 | authors: Zhi Zhou, Xu Chen, En Li, Liekang Zeng, Ke Luo, Junshan Zhang 92 | abstract: > 93 | With the breakthroughs in deep learning, the recent years have witnessed a booming of artificial intelligence (AI) applications and services, spanning from personal assistant to recommendation systems to video/audio surveillance. More recently, with the proliferation of mobile computing and Internet-of-Things (IoT), billions of mobile and IoT devices are connected to the Internet, generating zillions Bytes of data at the network edge. Driving by this trend, there is an urgent need to push the AI frontiers to the network edge so as to fully unleash the potential of the edge big data. 
To meet this demand, edge computing, an emerging paradigm that pushes computing tasks and services from the network core to the network edge, has been widely recognized as a promising solution. The resulted new inter-discipline, edge AI or edge intelligence, is beginning to receive a tremendous amount of interest. However, research on edge intelligence is still in its infancy stage, and a dedicated venue for exchanging the recent advances of edge intelligence is highly desired by both the computer system and artificial intelligence communities. To this end, we conduct a comprehensive survey of the recent research efforts on edge intelligence. Specifically, we first review the background and motivation for artificial intelligence running at the network edge. We then provide an overview of the overarching architectures, frameworks and emerging key technologies for deep learning model towards training/inference at the network edge. Finally, we discuss future research opportunities on edge intelligence. We believe that this survey will elicit escalating attentions, stimulate fruitful discussions and inspire further research ideas on edge intelligence. 94 | 95 | - 96 | name: > 97 | Wireless Network Intelligence at the Edge 98 | url: https://arxiv.org/abs/1812.02858 99 | date: 2018/12/07 100 | conference: 101 | code: 102 | authors: Jihong Park, Sumudu Samarakoon, Mehdi Bennis, Mérouane Debbah 103 | abstract: > 104 | Fueled by the availability of more data and computing power, recent breakthroughs in cloud-based machine learning (ML) have transformed every aspect of our lives from face recognition and medical diagnosis to natural language processing. However, classical ML exerts severe demands in terms of energy, memory and computing resources, limiting their adoption for resource constrained edge devices. The new breed of intelligent devices and high-stake applications (drones, augmented/virtual reality, autonomous systems, etc.), requires a novel paradigm change calling for distributed, low-latency and reliable ML at the wireless network edge (referred to as edge ML). In edge ML, training data is unevenly distributed over a large number of edge nodes, which have access to a tiny fraction of the data. Moreover training and inference is carried out collectively over wireless links, where edge devices communicate and exchange their learned models (not their private data). In a first of its kind, this article explores key building blocks of edge ML, different neural network architectural splits and their inherent tradeoffs, as well as theoretical and technical enablers stemming from a wide range of mathematical disciplines. Finally, several case studies pertaining to various high-stake applications are presented demonstrating the effectiveness of edge ML in unlocking the full potential of 5G and beyond. 105 | 106 | ... 107 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Awesome Edge Machine Learning 2 | [![Awesome](https://awesome.re/badge-flat2.svg)](https://awesome.re) 3 | 4 | A curated list of awesome edge machine learning resources, including research papers, inference engines, challenges, books, meetups and others. 
5 | 6 | ## Table of Contents 7 | - [Papers](https://github.com/bisonai/awesome-edge-machine-learning#papers) 8 | - [Applications](https://github.com/bisonai/awesome-edge-machine-learning/tree/master/Papers/Applications) 9 | - [AutoML](https://github.com/bisonai/awesome-edge-machine-learning/tree/master/Papers/AutoML) 10 | - [Efficient Architectures](https://github.com/bisonai/awesome-edge-machine-learning/tree/master/Papers/Efficient_Architectures) 11 | - [Federated Learning](https://github.com/bisonai/awesome-edge-machine-learning/tree/master/Papers/Federated_Learning) 12 | - [ML Algorithms For Edge](https://github.com/bisonai/awesome-edge-machine-learning/tree/master/Papers/ML_Algorithms_For_Edge) 13 | - [Network Pruning](https://github.com/bisonai/awesome-edge-machine-learning/tree/master/Papers/Network_Pruning) 14 | - [Others](https://github.com/bisonai/awesome-edge-machine-learning/tree/master/Papers/Others) 15 | - [Quantization](https://github.com/bisonai/awesome-edge-machine-learning/tree/master/Papers/Quantization) 16 | - [Datasets](https://github.com/bisonai/awesome-edge-machine-learning#datasets) 17 | - [Inference Engines](https://github.com/bisonai/awesome-edge-machine-learning#inference-engines) 18 | - [MCU and MPU Software Packages](https://github.com/bisonai/awesome-edge-machine-learning#mcu-and-mpu-software-packages) 19 | - [AI Chips](https://github.com/bisonai/awesome-edge-machine-learning#ai-chips) 20 | - [Books](https://github.com/bisonai/awesome-edge-machine-learning#books) 21 | - [Challenges](https://github.com/bisonai/awesome-edge-machine-learning#challenges) 22 | - [Other Resources](https://github.com/bisonai/awesome-edge-machine-learning#other-resources) 23 | - [Contribute](https://github.com/bisonai/awesome-edge-machine-learning#contribute) 24 | - [LicenseBlock](https://github.com/bisonai/awesome-edge-machine-learning#licenseblock) 25 | 26 | ## Papers 27 | ### [Applications](https://github.com/bisonai/awesome-edge-machine-learning/tree/master/Papers/Applications) 28 | There are countless possible edge machine learning applications. Here, we collect papers that describe specific solutions. 29 | 30 | 31 | ### [AutoML](https://github.com/bisonai/awesome-edge-machine-learning/tree/master/Papers/AutoML) 32 | Automated machine learning (AutoML) is the process of automating the end-to-end process of applying machine learning to real-world problems (Wikipedia). AutoML is, for example, used to design new efficient neural architectures under a constraint on the computational budget (defined either as a number of FLOPs or as an inference time measured on a real device) or on the size of the architecture. 33 | 34 | 35 | ### [Efficient Architectures](https://github.com/bisonai/awesome-edge-machine-learning/tree/master/Papers/Efficient_Architectures) 36 | Efficient architectures are neural networks with a small memory footprint and fast inference time when measured on edge devices.
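To make "efficient" concrete, here is a small illustrative sketch (not taken from any of the listed papers; the layer sizes are made up) comparing the parameter count of a standard convolution with that of the depthwise separable convolution used as a building block by several architectures in this section, e.g. the MobileNet family.

```python
# Illustrative only: parameter counts of a standard convolution versus a
# depthwise separable convolution. The layer configuration is hypothetical.

def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    # A standard k x k convolution mixes channels and space in a single step.
    return c_in * c_out * k * k

def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    depthwise = c_in * k * k   # one k x k filter per input channel (spatial step)
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise

if __name__ == "__main__":
    c_in, c_out, k = 128, 256, 3  # hypothetical layer sizes
    std = standard_conv_params(c_in, c_out, k)
    sep = depthwise_separable_params(c_in, c_out, k)
    print(f"standard: {std}, separable: {sep}, ratio: {std / sep:.1f}x")
    # For this configuration the separable layer needs roughly 8-9x fewer parameters.
```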
37 | 38 | 39 | ### [Federated Learning](https://github.com/bisonai/awesome-edge-machine-learning/tree/master/Papers/Federated_Learning) 40 | Federated Learning enables mobile phones to collaboratively learn a shared prediction model while keeping all the training data on device, decoupling the ability to do machine learning from the need to store the data in the cloud (Google AI blog: Federated Learning). 41 | 42 | 43 | ### [ML Algorithms For Edge](https://github.com/bisonai/awesome-edge-machine-learning/tree/master/Papers/ML_Algorithms_For_Edge) 44 | Standard machine learning algorithms are not always able to run on edge devices due to their large computational requirements and space complexity. This section introduces optimized machine learning algorithms. 45 | 46 | 47 | ### [Network Pruning](https://github.com/bisonai/awesome-edge-machine-learning/tree/master/Papers/Network_Pruning) 48 | Pruning is a common method to derive a compact network – after training, some structural portion of the parameters is removed, along with its associated computations (Importance Estimation for Neural Network Pruning). 49 | 50 | 51 | ### [Others](https://github.com/bisonai/awesome-edge-machine-learning/tree/master/Papers/Others) 52 | This section contains papers that are related to edge machine learning but are not part of any major group. These papers often deal with deployment issues (i.e., optimizing inference on the target platform). 53 | 54 | 55 | ### [Quantization](https://github.com/bisonai/awesome-edge-machine-learning/tree/master/Papers/Quantization) 56 | Quantization is the process of reducing the precision of weights and/or activations in a neural network (from 32-bit floating point to lower bit-depth representations). The advantages of this method are reduced model size and faster model inference on hardware that supports arithmetic operations in lower precision. 57 | 58 | 59 | ## Datasets 60 | ### [Visual Wake Words Dataset](https://arxiv.org/abs/1906.05721) 61 | Visual Wake Words represents a common microcontroller vision use-case of identifying whether a person is present in the image or not, and provides a realistic benchmark for tiny vision models. Within a limited memory footprint of 250 KB, several state-of-the-art mobile models achieve accuracy of 85-90% on the Visual Wake Words dataset. 62 | 63 | 64 | ## Inference Engines 65 | List of machine learning inference engines and APIs that are optimized for execution and/or training on edge devices.
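As a quick illustration of what using one of the engines below typically looks like, here is a minimal sketch of running an already-converted model with the TensorFlow Lite Python interpreter; the model file name is a placeholder, and the input is random data with whatever shape the model declares.

```python
import numpy as np
import tensorflow as tf  # the TF Lite interpreter ships with TensorFlow

# Placeholder path: any model previously converted to the .tflite format.
interpreter = tf.lite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Random input matching the shape and dtype the model expects.
dummy_input = np.random.random_sample(input_details[0]["shape"]).astype(
    input_details[0]["dtype"])

interpreter.set_tensor(input_details[0]["index"], dummy_input)
interpreter.invoke()
prediction = interpreter.get_tensor(output_details[0]["index"])
print(prediction.shape)
```

Most of the engines listed below follow the same pattern: convert or compile the trained model offline, then load it on the device behind a small runtime API.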
66 | 67 | ### Arm Compute Library 68 | - Source code: [https://github.com/ARM-software/ComputeLibrary](https://github.com/ARM-software/ComputeLibrary) 69 | - Arm 70 | 71 | ### Bender 72 | - Source code: [https://github.com/xmartlabs/Bender](https://github.com/xmartlabs/Bender) 73 | - Documentation: [https://xmartlabs.github.io/Bender/](https://xmartlabs.github.io/Bender/) 74 | - Xmartlabs 75 | 76 | ### Caffe 2 77 | - Source code: [https://github.com/pytorch/pytorch/tree/master/caffe2](https://github.com/pytorch/pytorch/tree/master/caffe2) 78 | - Documentation: [https://caffe2.ai/](https://caffe2.ai/) 79 | - Facebook 80 | 81 | ### CoreML 82 | - Documentation: [https://developer.apple.com/documentation/coreml](https://developer.apple.com/documentation/coreml) 83 | - Apple 84 | 85 | ### Deeplearning4j 86 | - Documentation: [https://deeplearning4j.org/docs/latest/deeplearning4j-android](https://deeplearning4j.org/docs/latest/deeplearning4j-android) 87 | - Skymind 88 | 89 | ### Embedded Learning Library 90 | - Source code: [https://github.com/Microsoft/ELL](https://github.com/Microsoft/ELL) 91 | - Documentation: [https://microsoft.github.io/ELL](https://microsoft.github.io/ELL) 92 | - Microsoft 93 | 94 | ### Feather CNN 95 | - Source code: [https://github.com/Tencent/FeatherCNN](https://github.com/Tencent/FeatherCNN) 96 | - Tencent 97 | 98 | ### MACE 99 | - Source code: [https://github.com/XiaoMi/mace](https://github.com/XiaoMi/mace) 100 | - Documentation: [https://mace.readthedocs.io/](https://mace.readthedocs.io/) 101 | - XiaoMi 102 | 103 | ### MNN 104 | - Source code: [https://github.com/alibaba/MNN](https://github.com/alibaba/MNN) 105 | - Alibaba 106 | 107 | ### MXNet 108 | - Documentation: [https://mxnet.incubator.apache.org/versions/master/faq/smart_device.html](https://mxnet.incubator.apache.org/versions/master/faq/smart_device.html) 109 | - Amazon 110 | 111 | ### NCNN 112 | - Source code: [https://github.com/tencent/ncnn](https://github.com/tencent/ncnn) 113 | - Tencent 114 | 115 | ### Neural Networks API 116 | - Documentation: [https://developer.android.com/ndk/guides/neuralnetworks/](https://developer.android.com/ndk/guides/neuralnetworks/) 117 | - Google 118 | 119 | ### Paddle Mobile 120 | - Source code: [https://github.com/PaddlePaddle/paddle-mobile](https://github.com/PaddlePaddle/paddle-mobile) 121 | - Baidu 122 | 123 | ### Qualcomm Neural Processing SDK for AI 124 | - Source code: [https://developer.qualcomm.com/software/qualcomm-neural-processing-sdk](https://developer.qualcomm.com/software/qualcomm-neural-processing-sdk) 125 | - Qualcomm 126 | 127 | ### Tengine 128 | - Source code: [https://github.com/OAID/Tengine](https://github.com/OAID/Tengine) 129 | - OAID 130 | 131 | ### TensorFlow Lite 132 | - Source code: [https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite) 133 | - Documentation: [https://www.tensorflow.org/lite/](https://www.tensorflow.org/lite/) 134 | - Google 135 | 136 | ### dabnn 137 | - Source code: [https://github.com/JDAI-CV/dabnn](https://github.com/JDAI-CV/dabnn) 138 | - JDAI Computer Vision 139 | 140 | ## MCU and MPU Software Packages 141 | List of software packages for AI development on MCU and MPU 142 | 143 | ### [FP-AI-Sensing](https://www.st.com/content/st_com/en/products/embedded-software/mcu-mpu-embedded-software/stm32-embedded-software/stm32-ode-function-pack-sw/fp-ai-sensing1.html) 144 | STM32Cube function pack for ultra-low power IoT node with artificial 
intelligence (AI) application based on audio and motion sensing 145 | 146 | ### [FP-AI-VISION1](https://www.st.com/content/st_com/en/products/embedded-software/mcu-mpu-embedded-software/stm32-embedded-software/stm32cube-expansion-packages/fp-ai-vision1.html) 147 | FP-AI-VISION1 is an STM32Cube function pack featuring examples of computer vision applications based on Convolutional Neural Network (CNN) 148 | 149 | ### [Processor SDK Linux for AM57x](http://www.ti.com/tool/SITARA-MACHINE-LEARNING) 150 | TIDL software framework leverages a highly optimized neural network implementation on TI’s Sitara AM57x processors, making use of hardware acceleration on the device 151 | 152 | ### [X-LINUX-AI-CV](https://www.st.com/content/st_com/en/products/embedded-software/mcu-mpu-embedded-software/stm32-embedded-software/stm32-mpu-openstlinux-expansion-packages/x-linux-ai-cv.html) 153 | X-LINUX-AI-CV is an STM32 MPU OpenSTLinux Expansion Package that targets Artificial Intelligence for computer vision applications based on Convolutional Neural Network (CNN) 154 | 155 | ### [e-AI Checker](https://www.renesas.com/jp/en/solutions/key-technology/e-ai/tool.html) 156 | Based on the output result from the translator, the ROM/RAM mounting size and the inference execution processing time are calculated while referring to the information of the selected MCU/MPU 157 | 158 | ### [e-AI Translator](https://www.renesas.com/jp/en/solutions/key-technology/e-ai/tool.html) 159 | Tool for converting Caffe and TensorFlow models to MCU/MPU development environment 160 | 161 | ### [eIQ Auto deep learning (DL) toolkit](https://www.nxp.com/design/software/development-software/eiq-auto-dl-toolkit:EIQ-AUTO) 162 | The NXP eIQ™ Auto deep learning (DL) toolkit enables developers to introduce DL algorithms into their applications and to continue satisfying automotive standards 163 | 164 | ### [eIQ ML Software Development Environment](https://www.nxp.com/design/software/development-software/eiq-ml-development-environment:EIQ) 165 | The NXP® eIQ™ machine learning software development environment enables the use of ML algorithms on NXP MCUs, i.MX RT crossover MCUs, and i.MX family SoCs. eIQ software includes inference engines, neural network compilers and optimized libraries 166 | 167 | ### [eIQ™ Software for Arm® NN Inference Engine](https://www.nxp.com/design/software/development-software/eiq-ml-development-environment/eiq-software-for-arm-nn-inference-engine:eIQArmNN) 168 | 169 | ### [eIQ™ for Arm® CMSIS-NN](https://www.nxp.com/design/software/development-software/eiq-ml-development-environment/eiq-for-arm-cmsis-nn:eIQArmCMSISNN) 170 | 171 | ### [eIQ™ for Glow Neural Network Compiler](https://www.nxp.com/design/software/development-software/eiq-ml-development-environment/eiq-for-glow-neural-network-compiler:eIQ-Glow) 172 | 173 | ### [eIQ™ for TensorFlow Lite](https://www.nxp.com/design/software/development-software/eiq-ml-development-environment/eiq-for-tensorflow-lite:eIQTensorFlowLite) 174 | 175 | ## AI Chips 176 | List of resources about AI Chips 177 | 178 | ### [AI Chip (ICs and IPs)](https://github.com/basicmi/AI-Chip) 179 | A list of ICs and IPs for AI, Machine Learning and Deep Learning 180 | 181 | ## Books 182 | List of books with focus on on-device (e.g., edge or mobile) machine learning. 
183 | 184 | ### [TinyML: Machine Learning with TensorFlow Lite on Arduino and Ultra-Low-Power Microcontrollers](http://shop.oreilly.com/product/0636920254508.do) 185 | - Authors: Pete Warden, Daniel Situnayake 186 | - Published: 2020 187 | 188 | ### [Machine Learning by Tutorials: Beginning machine learning for Apple and iOS](https://store.raywenderlich.com/products/machine-learning-by-tutorials) 189 | - Author: Matthijs Hollemans 190 | - Published: 2019 191 | 192 | ### [Core ML Survival Guide](https://leanpub.com/coreml-survival-guide) 193 | - Author: Matthijs Hollemans 194 | - Published: 2018 195 | 196 | ### [Building Mobile Applications with TensorFlow](https://www.oreilly.com/library/view/building-mobile-applications/9781491988435/) 197 | - Author: Pete Warden 198 | - Published: 2017 199 | 200 | ## Challenges 201 | ### [Low-Power Image Recognition Challenge (LPIRC)](https://rebootingcomputing.ieee.org/lpirc) 202 | Competition focused on vision solutions that simultaneously achieve high accuracy and high energy efficiency. LPIRC has been held regularly at computer vision conferences (CVPR, ICCV and others) since 2015, and the winners’ solutions have improved the ratio of accuracy to energy consumption 24-fold. 203 | 204 | - [Online Track](https://rebootingcomputing.ieee.org/lpirc/online-track) 205 | 206 | - [Onsite Track](https://rebootingcomputing.ieee.org/lpirc/onsite-track) 207 | 208 | 209 | ## Other Resources 210 | ### [Awesome EMDL](https://github.com/EMDL/awesome-emdl) 211 | 212 | Embedded and mobile deep learning research resources 213 | 214 | ### [Awesome Pruning](https://github.com/he-y/Awesome-Pruning) 215 | 216 | A curated list of neural network pruning resources 217 | 218 | ### [Efficient DNNs](https://github.com/MingSun-Tse/EfficientDNNs) 219 | 220 | Collection of recent methods on DNN compression and acceleration 221 | 222 | ### [Machine Think](https://machinethink.net/) 223 | 224 | Machine learning tutorials targeted at iOS devices 225 | 226 | ### [Pete Warden's blog](https://petewarden.com/) 227 | 228 | 229 | 230 | ## Contribute 231 | Unlike other awesome lists, we store the data in YAML format and generate the markdown files with the `awesome.py` script. 232 | 233 | Every directory contains a `data.yaml` file, which stores the data we want to display, and a `config.yaml` file, which stores its metadata (e.g., how the data should be sorted). How the data is presented is defined in `renderer.py`. 234 | 235 | 236 | ## LicenseBlock 237 | [![CC0](http://mirrors.creativecommons.org/presskit/buttons/88x31/svg/cc-zero.svg)](https://creativecommons.org/publicdomain/zero/1.0/) 238 | 239 | To the extent possible under law, [Bisonai](https://bisonai.com/) has waived all copyright and related or neighboring rights to this work. -------------------------------------------------------------------------------- /Papers/README.md: -------------------------------------------------------------------------------- 1 | # Papers 2 | [Back to awesome edge machine learning](https://github.com/bisonai/awesome-edge-machine-learning) 3 | 4 | ## Applications 5 | 6 | There are countless possible edge machine learning applications. Here, we collect papers that describe specific solutions. 7 | 8 | - [ALBERT: A Lite BERT for Self-supervised Learning of Language Representations](https://arxiv.org/abs/1909.11942). 
Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut 9 | 10 | - [TinyBERT: Distilling BERT for Natural Language Understanding](https://arxiv.org/abs/1909.10351). Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, Fang Wang, Qun Liu 11 | 12 | - [Temporal Convolution for Real-time Keyword Spotting on Mobile Devices](https://arxiv.org/abs/1904.03814). Seungwoo Choi, Seokjun Seo, Beomjun Shin, Hyeongmin Byun, Martin Kersner, Beomsu Kim, Dongyoung Kim, Sungjoo Ha 13 | 14 | - [Towards Real-Time Automatic Portrait Matting on Mobile Devices](https://arxiv.org/abs/1904.03816). Seokjun Seo, Seungwoo Choi, Martin Kersner, Beomjun Shin, Hyungsuk Yoon, Hyeongmin Byun, Sungjoo Ha 15 | 16 | - [ThunderNet: Towards Real-time Generic Object Detection](https://arxiv.org/abs/1903.11752). Zheng Qin, Zeming Li, Zhaoning Zhang, Yiping Bao, Gang Yu, Yuxing Peng, Jian Sun 17 | 18 | - [PFLD: A Practical Facial Landmark Detector](https://arxiv.org/abs/1902.10859). Xiaojie Guo, Siyuan Li, Jinke Yu, Jiawan Zhang, Jiayi Ma, Lin Ma, Wei Liu, Haibin Ling 19 | 20 | - [TSM: Temporal Shift Module for Efficient Video Understanding](https://arxiv.org/abs/1811.08383). Ji Lin, Chuang Gan, Song Han 21 | 22 | ## AutoML 23 | 24 | Automated machine learning (AutoML) is the process of automating the end-to-end process of applying machine learning to real-world problems.Wikipedia AutoML is for example used to design new efficient neural architectures with a constraint on a computational budget (defined either as a number of FLOPS or as an inference time measured on real device) or a size of the architecture. 25 | 26 | - [ChamNet: Towards Efficient Network Design through Platform-Aware Model Adaptation](https://arxiv.org/abs/1812.08934). Xiaoliang Dai, Peizhao Zhang, Bichen Wu, Hongxu Yin, Fei Sun, Yanghan Wang, Marat Dukhan, Yunqing Hu, Yiming Wu, Yangqing Jia, Peter Vajda, Matt Uyttendaele, Niraj K. Jha 27 | 28 | - [FBNet: Hardware-Aware Efficient ConvNet Design via Differentiable Neural Architecture Search](https://arxiv.org/abs/1812.03443). Bichen Wu, Xiaoliang Dai, Peizhao Zhang, Yanghan Wang, Fei Sun, Yiming Wu, Yuandong Tian, Peter Vajda, Yangqing Jia, Kurt Keutzer 29 | 30 | - [ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware](https://arxiv.org/abs/1812.00332). Han Cai, Ligeng Zhu, Song Han 31 | 32 | - [MnasNet: Platform-Aware Neural Architecture Search for Mobile](https://arxiv.org/abs/1807.11626). Mingxing Tan, Bo Chen, Ruoming Pang, Vijay Vasudevan, Mark Sandler, Andrew Howard, Quoc V. Le 33 | 34 | - [NetAdapt: Platform-Aware Neural Network Adaptation for Mobile Applications](https://arxiv.org/abs/1804.03230). Tien-Ju Yang, Andrew Howard, Bo Chen, Xiao Zhang, Alec Go, Mark Sandler, Vivienne Sze, Hartwig Adam 35 | 36 | - [AMC: AutoML for Model Compression and Acceleration on Mobile Devices](https://arxiv.org/abs/1802.03494). Yihui He, Ji Lin, Zhijian Liu, Hanrui Wang, Li-Jia Li, Song Han 37 | 38 | - [MorphNet: Fast & Simple Resource-Constrained Structure Learning of Deep Networks](https://arxiv.org/abs/1711.06798). Ariel Gordon, Elad Eban, Ofir Nachum, Bo Chen, Hao Wu, Tien-Ju Yang, Edward Choi 39 | 40 | ## Efficient Architectures 41 | 42 | Efficient architectures represent neural networks with small memory footprint and fast inference time when measured on edge devices. 43 | 44 | - [Compression of convolutional neural networks for high performance image matching tasks on mobile devices](https://arxiv.org/abs/2001.03102). 
Roy Miles, Krystian Mikolajczyk 45 | 46 | - [Once-for-All: Train One Network and Specialize it for Efficient Deployment on Diverse Hardware Platforms](https://arxiv.org/abs/1908.09791). Han Cai, Chuang Gan, Tianzhe Wang, Zhekai Zhang, Song Han 47 | 48 | - [MixNet: Mixed Depthwise Convolutional Kernels](https://arxiv.org/abs/1907.09595). Mingxing Tan, Quoc V. Le 49 | 50 | - [Efficient On-Device Models using Neural Projections](http://proceedings.mlr.press/v97/ravi19a/ravi19a.pdf). Sujith Ravi 51 | 52 | - [Butterfly Transform: An Efficient FFT Based Neural Architecture Design](https://arxiv.org/abs/1906.02256). Keivan Alizadeh, Ali Farhadi, Mohammad Rastegari 53 | 54 | - [EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks](https://arxiv.org/abs/1905.11946). Mingxing Tan, Quoc V. Le 55 | 56 | - [Searching for MobileNetV3](https://arxiv.org/abs/1905.02244). Andrew Howard, Mark Sandler, Grace Chu, Liang-Chieh Chen, Bo Chen, Mingxing Tan, Weijun Wang, Yukun Zhu, Ruoming Pang, Vijay Vasudevan, Quoc V. Le, Hartwig Adam 57 | 58 | - [ESPNetv2: A Light-weight, Power Efficient, and General Purpose Convolutional Neural Network](https://arxiv.org/abs/1811.11431). Sachin Mehta, Mohammad Rastegari, Linda Shapiro, Hannaneh Hajishirzi 59 | 60 | - [ESPNet: Efficient Spatial Pyramid of Dilated Convolutions for Semantic Segmentation](https://arxiv.org/abs/1803.06815). Sachin Mehta, Mohammad Rastegari, Anat Caspi, Linda Shapiro, Hannaneh Hajishirzi 61 | 62 | - [MobileNetV2: Inverted Residuals and Linear Bottlenecks](https://arxiv.org/abs/1801.04381). Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, Liang-Chieh Chen 63 | 64 | - [CondenseNet: An Efficient DenseNet using Learned Group Convolutions](https://arxiv.org/abs/1711.09224). Gao Huang, Shichen Liu, Laurens van der Maaten, Kilian Q. Weinberger 65 | 66 | - [BranchyNet: Fast Inference via Early Exiting from Deep Neural Networks](https://arxiv.org/abs/1709.01686). Surat Teerapittayanon, Bradley McDanel, H.T. Kung 67 | 68 | - [DeepRebirth: Accelerating Deep Neural Network Execution on Mobile Devices](https://arxiv.org/abs/1708.04728). Dawei Li, Xiaolong Wang, Deguang Kong 69 | 70 | - [ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices](https://arxiv.org/abs/1707.01083). Xiangyu Zhang, Xinyu Zhou, Mengxiao Lin, Jian Sun 71 | 72 | - [MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications](https://arxiv.org/abs/1704.04861). Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, Hartwig Adam 73 | 74 | - [SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size](https://arxiv.org/abs/1602.07360). Forrest N. Iandola, Song Han, Matthew W. Moskewicz, Khalid Ashraf, William J. Dally, Kurt Keutzer 75 | 76 | ## Federated Learning 77 | 78 | Federated Learning enables mobile phones to collaboratively learn a shared prediction model while keeping all the training data on device, decoupling the ability to do machine learning from the need to store the data in the cloud.Google AI blog: Federated Learning 79 | 80 | - [Towards Federated Learning at Scale: System Design](https://arxiv.org/abs/1902.01046). Keith Bonawitz, Hubert Eichner, Wolfgang Grieskamp, Dzmitry Huba, Alex Ingerman, Vladimir Ivanov, Chloe Kiddon, Jakub Konečný, Stefano Mazzocchi, H. 
Brendan McMahan, Timon Van Overveldt, David Petrou, Daniel Ramage, Jason Roselander 81 | 82 | - [Adaptive Federated Learning in Resource Constrained Edge Computing Systems](https://arxiv.org/abs/1804.05271). Shiqiang Wang, Tiffany Tuor, Theodoros Salonidis, Kin K. Leung, Christian Makaya, Ting He, Kevin Chan 83 | 84 | - [Communication-Efficient Learning of Deep Networks from Decentralized Data](https://arxiv.org/abs/1602.05629). H. Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, Blaise Agüera y Arcas 85 | 86 | ## ML Algorithms For Edge 87 | 88 | Standard machine learning algorithms are not always able to run on edge devices due to large computational requirements and space complexity. This section introduces optimized machine learning algorithms. 89 | 90 | - [Shallow RNNs: A Method for Accurate Time-series Classification on Tiny Devices](https://dkdennis.xyz/static/sharnn-neurips19-paper.pdf). Don Dennis, Durmus Alp Emre Acar, Vikram Mandikal, Vinu Sankar Sadasivan, Harsha Vardhan Simhadri, Venkatesh Saligrama, Prateek Jain 91 | 92 | - [ProtoNN: Compressed and Accurate kNN for Resource-scarce Devices](http://proceedings.mlr.press/v70/gupta17a.html). Chirag Gupta, Arun Sai Suggala, Ankit Goyal, Harsha Vardhan Simhadri, Bhargavi Paranjape, Ashish Kumar, Saurabh Goyal, Raghavendra Udupa, Manik Varma, Prateek Jain 93 | 94 | - [Resource-efficient Machine Learning in 2 KB RAM for the Internet of Things](http://proceedings.mlr.press/v70/kumar17a.html). Ashish Kumar, Saurabh Goyal, Manik Varma 95 | 96 | ## Network Pruning 97 | 98 | Pruning is a common method to derive a compact network – after training, some structural portion of the parameters is removed, along with its associated computations.Importance Estimation for Neural Network Pruning 99 | 100 | - [Gate Decorator: Global Filter Pruning Method for Accelerating Deep Convolutional Neural Networks](https://arxiv.org/abs/1909.08174). Zhonghui You, Kun Yan, Jinmian Ye, Meng Ma, Ping Wang 101 | 102 | - [Importance Estimation for Neural Network Pruning](http://jankautz.com/publications/Importance4NNPruning_CVPR19.pdf). Pavlo Molchanov, Arun Mallya, Stephen Tyree, Iuri Frosio, Jan Kautz 103 | 104 | - [Centripetal SGD for Pruning Very Deep Convolutional Networks with Complicated Structure](https://arxiv.org/abs/1904.03837). Xiaohan Ding, Guiguang Ding, Yuchen Guo, Jungong Han 105 | 106 | - [Towards Optimal Structured CNN Pruning via Generative Adversarial Learning](https://arxiv.org/abs/1903.09291). Shaohui Lin, Rongrong Ji, Chenqian Yan, Baochang Zhang, Liujuan Cao, Qixiang Ye, Feiyue Huang, David Doermann 107 | 108 | - [Variational Convolutional Neural Network Pruning](http://openaccess.thecvf.com/content_CVPR_2019/papers/Zhao_Variational_Convolutional_Neural_Network_Pruning_CVPR_2019_paper.pdf). Chenglong Zhao, Bingbing Ni, Jian Zhang, Qiwei Zhao, Wenjun Zhang, Qi Tian 109 | 110 | - [On Implicit Filter Level Sparsity in Convolutional Neural Networks](https://arxiv.org/abs/1811.12495). Dushyant Mehta, Kwang In Kim, Christian Theobalt 111 | 112 | - [Structured Pruning of Neural Networks with Budget-Aware Regularization](https://arxiv.org/abs/1811.09332). Carl Lemaire, Andrew Achkar, Pierre-Marc Jodoin 113 | 114 | - [Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration](https://arxiv.org/abs/1811.00250). Yang He, Ping Liu, Ziwei Wang, Zhilan Hu, Yi Yang 115 | 116 | - [Discrimination-aware Channel Pruning for Deep Neural Networks](https://arxiv.org/abs/1810.11809). 
Zhuangwei Zhuang, Mingkui Tan, Bohan Zhuang, Jing Liu, Yong Guo, Qingyao Wu, Junzhou Huang, Jinhui Zhu 117 | 118 | - [Rethinking the Value of Network Pruning](https://arxiv.org/abs/1810.05270). Zhuang Liu, Mingjie Sun, Tinghui Zhou, Gao Huang, Trevor Darrell 119 | 120 | - [The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks](https://arxiv.org/abs/1803.03635). Jonathan Frankle, Michael Carbin 121 | 122 | - [To prune, or not to prune: exploring the efficacy of pruning for model compression](https://arxiv.org/abs/1710.01878). Michael Zhu, Suyog Gupta 123 | 124 | - [ThiNet: A Filter Level Pruning Method for Deep Neural Network Compression](https://arxiv.org/abs/1707.06342). Jian-Hao Luo, Jianxin Wu, Weiyao Lin 125 | 126 | - [Channel Pruning for Accelerating Very Deep Neural Networks](https://arxiv.org/abs/1707.06168). Yihui He, Xiangyu Zhang, Jian Sun 127 | 128 | ## Others 129 | 130 | This section contains papers that are related to edge machine learning but are not part of any major group. These papers often deal with deployment issues (i.e. optimizing inference on target platform). 131 | 132 | - [Distributed Machine Learning on Mobile Devices: A Survey](https://arxiv.org/abs/1909.08329). Renjie Gu, Shuo Yang, Fan Wu 133 | 134 | - [Machine Learning at the Network Edge: A Survey](https://arxiv.org/abs/1908.00080). M.G. Sarwar Murshed, Christopher Murphy, Daqing Hou, Nazar Khan, Ganesh Ananthanarayanan, Faraz Hussain 135 | 136 | - [Convergence of Edge Computing and Deep Learning: A Comprehensive Survey](https://arxiv.org/abs/1907.08349). Yiwen Han, Xiaofei Wang, Victor C.M. Leung, Dusit Niyato, Xueqiang Yan, Xu Chen 137 | 138 | - [On-Device Neural Net Inference with Mobile GPUs](https://arxiv.org/abs/1907.01989). Juhyun Lee, Nikolay Chirkov, Ekaterina Ignasheva, Yury Pisarchyk, Mogan Shieh, Fabio Riccardi, Raman Sarokin, Andrei Kulik, Matthias Grundmann 139 | 140 | - [Edge Intelligence: Paving the Last Mile of Artificial Intelligence with Edge Computing](https://arxiv.org/abs/1905.10083). Zhi Zhou, Xu Chen, En Li, Liekang Zeng, Ke Luo, Junshan Zhang 141 | 142 | - [Deep Learning on Mobile Devices - A Review](https://arxiv.org/abs/1904.09274). Yunbin Deng 143 | 144 | - [Wireless Network Intelligence at the Edge](https://arxiv.org/abs/1812.02858). Jihong Park, Sumudu Samarakoon, Mehdi Bennis, Mérouane Debbah 145 | 146 | - [Machine Learning at Facebook:Understanding Inference at the Edge](https://research.fb.com/wp-content/uploads/2018/12/Machine-Learning-at-Facebook-Understanding-Inference-at-the-Edge.pdf). Carole-Jean Wu, David Brooks, Kevin Chen, Douglas Chen, Sy Choudhury, Marat Dukhan,Kim Hazelwood, Eldad Isaac, Yangqing Jia, Bill Jia, Tommer Leyvand, Hao Lu, Yang Lu, Lin Qiao,Brandon Reagen, Joe Spisak, Fei Sun, Andrew Tulloch, Peter Vajda, Xiaodong Wang,Yanghan Wang, Bram Wasti, Yiming Wu, Ran Xian, Sungjoo Yoo, Peizhao Zhang 147 | 148 | ## Quantization 149 | 150 | Quantization is the process of reducing a precision (from 32 bit floating point into lower bit depth representations) of weights and/or activations in a neural network. The advantages of this method are reduced model size and faster model inference on hardware that support arithmetic operations in lower precision. 151 | 152 | - [And the Bit Goes Down: Revisiting the Quantization of Neural Networks](https://arxiv.org/abs/1907.05686). 
Pierre Stock, Armand Joulin, Rémi Gribonval, Benjamin Graham, Hervé Jégou 153 | 154 | - [Data-Free Quantization through Weight Equalization and Bias Correction](https://arxiv.org/abs/1906.04721). Markus Nagel, Mart van Baalen, Tijmen Blankevoort, Max Welling 155 | 156 | - [HAQ: Hardware-Aware Automated Quantization with Mixed Precision](https://arxiv.org/abs/1811.08886). Kuan Wang, Zhijian Liu, Yujun Lin, Ji Lin, Song Han 157 | 158 | - [Quantizing deep convolutional networks for efficient inference: A whitepaper](https://arxiv.org/abs/1806.08342). Raghuraman Krishnamoorthi 159 | 160 | - [A Quantization-Friendly Separable Convolution for MobileNets](https://arxiv.org/abs/1803.08607). Tao Sheng, Chen Feng, Shaojie Zhuo, Xiaopeng Zhang, Liang Shen, Mickey Aleksic 161 | 162 | - [Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference](https://arxiv.org/abs/1712.05877). Benoit Jacob, Skirmantas Kligys, Bo Chen, Menglong Zhu, Matthew Tang, Andrew Howard, Hartwig Adam, Dmitry Kalenichenko 163 | 164 | - [Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations](https://arxiv.org/abs/1609.07061). Itay Hubara, Matthieu Courbariaux, Daniel Soudry, Ran El-Yaniv, Yoshua Bengio 165 | 166 | - [DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients](https://arxiv.org/abs/1606.06160). Shuchang Zhou, Yuxin Wu, Zekun Ni, Xinyu Zhou, He Wen, Yuheng Zou 167 | 168 | - [Quantized Convolutional Neural Networks for Mobile Devices](https://arxiv.org/abs/1512.06473). Jiaxiang Wu, Cong Leng, Yuhang Wang, Qinghao Hu, Jian Cheng 169 | 170 | -------------------------------------------------------------------------------- /Papers/Network_Pruning/README.md: -------------------------------------------------------------------------------- 1 | # Network Pruning 2 | [Back to awesome edge machine learning](https://github.com/bisonai/awesome-edge-machine-learning) 3 | 4 | [Back to Papers](https://github.com/bisonai/awesome-edge-machine-learning/tree/master/Papers) 5 | 6 | Pruning is a common method to derive a compact network – after training, some structural portion of the parameters is removed, along with its associated computations.Importance Estimation for Neural Network Pruning 7 | 8 | 9 | ## [Gate Decorator: Global Filter Pruning Method for Accelerating Deep Convolutional Neural Networks](https://arxiv.org/abs/1909.08174), 2019/09 10 | Zhonghui You, Kun Yan, Jinmian Ye, Meng Ma, Ping Wang 11 | 12 | Filter pruning is one of the most effective ways to accelerate and compress convolutional neural networks (CNNs). In this work, we propose a global filter pruning algorithm called Gate Decorator, which transforms a vanilla CNN module by multiplying its output by the channel-wise scaling factors, i.e. gate. When the scaling factor is set to zero, it is equivalent to removing the corresponding filter. We use Taylor expansion to estimate the change in the loss function caused by setting the scaling factor to zero and use the estimation for the global filter importance ranking. Then we prune the network by removing those unimportant filters. After pruning, we merge all the scaling factors into its original module, so no special operations or structures are introduced. Moreover, we propose an iterative pruning framework called Tick-Tock to improve pruning accuracy. The extensive experiments demonstrate the effectiveness of our approaches. 
For example, we achieve the state-of-the-art pruning ratio on ResNet-56 by reducing 70% FLOPs without noticeable loss in accuracy. For ResNet-50 on ImageNet, our pruned model with 40% FLOPs reduction outperforms the baseline model by 0.31% in top-1 accuracy. Various datasets are used, including CIFAR-10, CIFAR-100, CUB-200, ImageNet ILSVRC-12 and PASCAL VOC 2011. Code is available at this [URL](https://github.com/youzhonghui/gate-decorator-pruning). 13 | 14 | 15 | ## [Importance Estimation for Neural Network Pruning](http://jankautz.com/publications/Importance4NNPruning_CVPR19.pdf), 2019/06 16 | Pavlo Molchanov, Arun Mallya, Stephen Tyree, Iuri Frosio, Jan Kautz 17 | 18 | Structural pruning of neural network parameters reduces computation, energy, and memory transfer costs during inference. We propose a novel method that estimates the contribution of a neuron (filter) to the final loss and iteratively removes those with smaller scores. We describe two variations of our method using the first and second-order Taylor expansions to approximate a filter's contribution. Both methods scale consistently across any network layer without requiring per-layer sensitivity analysis and can be applied to any kind of layer, including skip connections. For modern networks trained on ImageNet, we measured experimentally a high (>93%) correlation between the contribution computed by our methods and a reliable estimate of the true importance. Pruning with the proposed methods leads to an improvement over state-of-the-art in terms of accuracy, FLOPs, and parameter reduction. On ResNet-101, we achieve a 40% FLOPS reduction by removing 30% of the parameters, with a loss of 0.02% in the top-1 accuracy on ImageNet. Code is available at [https://github.com/NVlabs/Taylor_pruning](https://github.com/NVlabs/Taylor_pruning). 19 | 20 | 21 | ## [Centripetal SGD for Pruning Very Deep Convolutional Networks with Complicated Structure](https://arxiv.org/abs/1904.03837), 2019/04 22 | Xiaohan Ding, Guiguang Ding, Yuchen Guo, Jungong Han 23 | 24 | The redundancy is widely recognized in Convolutional Neural Networks (CNNs), which enables to remove unimportant filters from convolutional layers so as to slim the network with acceptable performance drop. Inspired by the linear and combinational properties of convolution, we seek to make some filters increasingly close and eventually identical for network slimming. To this end, we propose Centripetal SGD (C-SGD), a novel optimization method, which can train several filters to collapse into a single point in the parameter hyperspace. When the training is completed, the removal of the identical filters can trim the network with NO performance loss, thus no finetuning is needed. By doing so, we have partly solved an open problem of constrained filter pruning on CNNs with complicated structure, where some layers must be pruned following others. Our experimental results on CIFAR-10 and ImageNet have justified the effectiveness of C-SGD-based filter pruning. Moreover, we have provided empirical evidences for the assumption that the redundancy in deep neural networks helps the convergence of training by showing that a redundant CNN trained using C-SGD outperforms a normally trained counterpart with the equivalent width. 
25 | 26 | 27 | ## [Towards Optimal Structured CNN Pruning via Generative Adversarial Learning](https://arxiv.org/abs/1903.09291), 2019/03 28 | Shaohui Lin, Rongrong Ji, Chenqian Yan, Baochang Zhang, Liujuan Cao, Qixiang Ye, Feiyue Huang, David Doermann 29 | 30 | Structured pruning of filters or neurons has received increased focus for compressing convolutional neural networks. Most existing methods rely on multi-stage optimizations in a layer-wise manner for iteratively pruning and retraining which may not be optimal and may be computation intensive. Besides, these methods are designed for pruning a specific structure, such as filter or block structures without jointly pruning heterogeneous structures. In this paper, we propose an effective structured pruning approach that jointly prunes filters as well as other structures in an end-to-end manner. To accomplish this, we first introduce a soft mask to scale the output of these structures by defining a new objective function with sparsity regularization to align the output of baseline and network with this mask. We then effectively solve the optimization problem by generative adversarial learning (GAL), which learns a sparse soft mask in a label-free and an end-to-end manner. By forcing more scaling factors in the soft mask to zero, the fast iterative shrinkage-thresholding algorithm (FISTA) can be leveraged to fast and reliably remove the corresponding structures. Extensive experiments demonstrate the effectiveness of GAL on different datasets, including MNIST, CIFAR-10 and ImageNet ILSVRC 2012. For example, on ImageNet ILSVRC 2012, the pruned ResNet-50 achieves 10.88\% Top-5 error and results in a factor of 3.7x speedup. This significantly outperforms state-of-the-art methods. 31 | 32 | 33 | ## [Variational Convolutional Neural Network Pruning](http://openaccess.thecvf.com/content_CVPR_2019/papers/Zhao_Variational_Convolutional_Neural_Network_Pruning_CVPR_2019_paper.pdf), 2019/01 34 | Chenglong Zhao, Bingbing Ni, Jian Zhang, Qiwei Zhao, Wenjun Zhang, Qi Tian 35 | 36 | We propose a variational Bayesian scheme for pruningconvolutional neural networks in channel level. This idea ismotivated by the fact that deterministic value based pruningmethods are inherently improper and unstable. In a nut-shell, variational technique is introduced to estimate dis-tribution of a newly proposed parameter, called channelsaliency, based on this, redundant channels can be removedfrom model via a simple criterion. The advantages aretwo-fold: 1) Our method conducts channel pruning with-out desire of re-training stage, thus improving the compu-tation efficiency. 2) Our method is implemented as a stand-alone module, called variational pruning layer, which canbe straightforwardly inserted into off-the-shelf deep learn-ing packages, without any special network design. Exten-sive experimental results well demonstrate the effectivenessof our method: For CIFAR-10, we perform channel removalon different CNN models up to 74% reduction, which resultsin significant size reduction and computation saving. ForImageNet, about 40% channels of ResNet-50 are removedwithout compromising accuracy. 
37 | 38 | 39 | ## [On Implicit Filter Level Sparsity in Convolutional Neural Networks](https://arxiv.org/abs/1811.12495), 2018/11 40 | Dushyant Mehta, Kwang In Kim, Christian Theobalt 41 | 42 | We investigate filter level sparsity that emerges in convolutional neural networks (CNNs) which employ Batch Normalization and ReLU activation, and are trained with adaptive gradient descent techniques and L2 regularization or weight decay. We conduct an extensive experimental study casting our initial findings into hypotheses and conclusions about the mechanisms underlying the emergent filter level sparsity. This study allows new insight into the performance gap obeserved between adapative and non-adaptive gradient descent methods in practice. Further, analysis of the effect of training strategies and hyperparameters on the sparsity leads to practical suggestions in designing CNN training strategies enabling us to explore the tradeoffs between feature selectivity, network capacity, and generalization performance. Lastly, we show that the implicit sparsity can be harnessed for neural network speedup at par or better than explicit sparsification / pruning approaches, with no modifications to the typical training pipeline required. 43 | 44 | 45 | ## [Structured Pruning of Neural Networks with Budget-Aware Regularization](https://arxiv.org/abs/1811.09332), 2018/11 46 | Carl Lemaire, Andrew Achkar, Pierre-Marc Jodoin 47 | 48 | Pruning methods have shown to be effective at reducing the size of deep neural networks while keeping accuracy almost intact. Among the most effective methods are those that prune a network while training it with a sparsity prior loss and learnable dropout parameters. A shortcoming of these approaches however is that neither the size nor the inference speed of the pruned network can be controlled directly; yet this is a key feature for targeting deployment of CNNs on low-power hardware. To overcome this, we introduce a budgeted regularized pruning framework for deep convolutional neural networks. Our approach naturally fits into traditional neural network training as it consists of a learnable masking layer, a novel budget-aware objective function, and the use of knowledge distillation. We also provide insights on how to prune a residual network and how this can lead to new architectures. Experimental results reveal that CNNs pruned with our method are more accurate and less compute-hungry than state-of-the-art methods. Also, our approach is more effective at preventing accuracy collapse in case of severe pruning; this allows us to attain pruning factors up to 16x without significantly affecting the accuracy. 49 | 50 | 51 | ## [Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration](https://arxiv.org/abs/1811.00250), 2018/11 52 | Yang He, Ping Liu, Ziwei Wang, Zhilan Hu, Yi Yang 53 | 54 | Previous works utilized "smaller-norm-less-important" criterion to prune filters with smaller norm values in a convolutional neural network. In this paper, we analyze this norm-based criterion and point out that its effectiveness depends on two requirements that are not always met: (1) the norm deviation of the filters should be large; (2) the minimum norm of the filters should be small. To solve this problem, we propose a novel filter pruning method, namely Filter Pruning via Geometric Median (FPGM), to compress the model regardless of those two requirements. 
Unlike previous methods, FPGM compresses CNN models by pruning filters with redundancy, rather than those with "relatively less" importance. When applied to two image classification benchmarks, our method validates its usefulness and strengths. Notably, on CIFAR-10, FPGM reduces more than 52% FLOPs on ResNet-110 with even 2.69% relative accuracy improvement. Moreover, on ILSVRC-2012, FPGM reduces more than 42% FLOPs on ResNet-101 without top-5 accuracy drop, which has advanced the state-of-the-art. Code is publicly available on [GitHub](https://github.com/he-y/filter-pruning-geometric-median). 55 | 56 | 57 | ## [Discrimination-aware Channel Pruning for Deep Neural Networks](https://arxiv.org/abs/1810.11809), 2018/10 58 | Zhuangwei Zhuang, Mingkui Tan, Bohan Zhuang, Jing Liu, Yong Guo, Qingyao Wu, Junzhou Huang, Jinhui Zhu 59 | 60 | Channel pruning is one of the predominant approaches for deep model compression. Existing pruning methods either train from scratch with sparsity constraints on channels, or minimize the reconstruction error between the pre-trained feature maps and the compressed ones. Both strategies suffer from some limitations: the former kind is computationally expensive and difficult to converge, whilst the latter kind optimizes the reconstruction error but ignores the discriminative power of channels. To overcome these drawbacks, we investigate a simple-yet-effective method, called discrimination-aware channel pruning, to choose those channels that really contribute to discriminative power. To this end, we introduce additional losses into the network to increase the discriminative power of intermediate layers and then select the most discriminative channels for each layer by considering the additional loss and the reconstruction error. Last, we propose a greedy algorithm to conduct channel selection and parameter optimization in an iterative way. Extensive experiments demonstrate the effectiveness of our method. For example, on ILSVRC-12, our pruned ResNet-50 with 30% reduction of channels even outperforms the original model by 0.39% in top-1 accuracy. 61 | 62 | 63 | ## [Rethinking the Value of Network Pruning](https://arxiv.org/abs/1810.05270), 2018/10 64 | Zhuang Liu, Mingjie Sun, Tinghui Zhou, Gao Huang, Trevor Darrell 65 | 66 | Network pruning is widely used for reducing the heavy inference cost of deep models in low-resource settings. A typical pruning algorithm is a three-stage pipeline, i.e., training (a large model), pruning and fine-tuning. During pruning, according to a certain criterion, redundant weights are pruned and important weights are kept to best preserve the accuracy. In this work, we make several surprising observations which contradict common beliefs. For all state-of-the-art structured pruning algorithms we examined, fine-tuning a pruned model only gives comparable or worse performance than training that model with randomly initialized weights. For pruning algorithms which assume a predefined target network architecture, one can get rid of the full pipeline and directly train the target network from scratch. 
Our observations are consistent for multiple network architectures, datasets, and tasks, which imply that: 1) training a large, over-parameterized model is often not necessary to obtain an efficient final model, 2) learned "important" weights of the large model are typically not useful for the small pruned model, 3) the pruned architecture itself, rather than a set of inherited "important" weights, is more crucial to the efficiency in the final model, which suggests that in some cases pruning can be useful as an architecture search paradigm. Our results suggest the need for more careful baseline evaluations in future research on structured pruning methods. We also compare with the "Lottery Ticket Hypothesis" (Frankle & Carbin 2019), and find that with optimal learning rate, the "winning ticket" initialization as used in Frankle & Carbin (2019) does not bring improvement over random initialization. 67 | 68 | 69 | ## [The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks](https://arxiv.org/abs/1803.03635), 2018/03 70 | Jonathan Frankle, Michael Carbin 71 | 72 | Neural network pruning techniques can reduce the parameter counts of trained networks by over 90%, decreasing storage requirements and improving computational performance of inference without compromising accuracy. However, contemporary experience is that the sparse architectures produced by pruning are difficult to train from the start, which would similarly improve training performance. We find that a standard pruning technique naturally uncovers subnetworks whose initializations made them capable of training effectively. Based on these results, we articulate the "lottery ticket hypothesis:" dense, randomly-initialized, feed-forward networks contain subnetworks ("winning tickets") that - when trained in isolation - reach test accuracy comparable to the original network in a similar number of iterations. The winning tickets we find have won the initialization lottery: their connections have initial weights that make training particularly effective. We present an algorithm to identify winning tickets and a series of experiments that support the lottery ticket hypothesis and the importance of these fortuitous initializations. We consistently find winning tickets that are less than 10-20% of the size of several fully-connected and convolutional feed-forward architectures for MNIST and CIFAR10. Above this size, the winning tickets that we find learn faster than the original network and reach higher test accuracy. 73 | 74 | 75 | ## [To prune, or not to prune: exploring the efficacy of pruning for model compression](https://arxiv.org/abs/1710.01878), 2017/10 76 | Michael Zhu, Suyog Gupta 77 | 78 | Model pruning seeks to induce sparsity in a deep neural network's various connection matrices, thereby reducing the number of nonzero-valued parameters in the model. Recent reports (Han et al., 2015; Narang et al., 2017) prune deep networks at the cost of only a marginal loss in accuracy and achieve a sizable reduction in model size. This hints at the possibility that the baseline models in these experiments are perhaps severely over-parameterized at the outset and a viable alternative for model compression might be to simply reduce the number of hidden units while maintaining the model's dense connection structure, exposing a similar trade-off in model size and accuracy. 
We investigate these two distinct paths for model compression within the context of energy-efficient inference in resource-constrained environments and propose a new gradual pruning technique that is simple and straightforward to apply across a variety of models/datasets with minimal tuning and can be seamlessly incorporated within the training process. We compare the accuracy of large, but pruned models (large-sparse) and their smaller, but dense (small-dense) counterparts with identical memory footprint. Across a broad range of neural network architectures (deep CNNs, stacked LSTM, and seq2seq LSTM models), we find large-sparse models to consistently outperform small-dense models and achieve up to 10x reduction in number of non-zero parameters with minimal loss in accuracy. 79 | 80 | 81 | ## [ThiNet: A Filter Level Pruning Method for Deep Neural Network Compression](https://arxiv.org/abs/1707.06342), 2017/07 82 | Jian-Hao Luo, Jianxin Wu, Weiyao Lin 83 | 84 | We propose an efficient and unified framework, namely ThiNet, to simultaneously accelerate and compress CNN models in both training and inference stages. We focus on the filter level pruning, i.e., the whole filter would be discarded if it is less important. Our method does not change the original network structure, thus it can be perfectly supported by any off-the-shelf deep learning libraries. We formally establish filter pruning as an optimization problem, and reveal that we need to prune filters based on statistics information computed from its next layer, not the current layer, which differentiates ThiNet from existing methods. Experimental results demonstrate the effectiveness of this strategy, which has advanced the state-of-the-art. We also show the performance of ThiNet on ILSVRC-12 benchmark. ThiNet achieves 3.31× FLOPs reduction and 16.63× compression on VGG-16, with only 0.52% top-5 accuracy drop. Similar experiments with ResNet-50 reveal that even for a compact network, ThiNet can also reduce more than half of the parameters and FLOPs, at the cost of roughly 1% top-5 accuracy drop. Moreover, the original VGG-16 model can be further pruned into a very small model with only 5.05MB model size, preserving AlexNet level accuracy but showing much stronger generalization ability. 85 | 86 | 87 | ## [Channel Pruning for Accelerating Very Deep Neural Networks](https://arxiv.org/abs/1707.06168), 2017/07 88 | Yihui He, Xiangyu Zhang, Jian Sun 89 | 90 | In this paper, we introduce a new channel pruning method to accelerate very deep convolutional neural networks.Given a trained CNN model, we propose an iterative two-step algorithm to effectively prune each layer, by a LASSO regression based channel selection and least square reconstruction. We further generalize this algorithm to multi-layer and multi-branch cases. Our method reduces the accumulated error and enhance the compatibility with various architectures. Our pruned VGG-16 achieves the state-of-the-art results by 5x speed-up along with only 0.3% increase of error. More importantly, our method is able to accelerate modern networks like ResNet, Xception and suffers only 1.4%, 1.0% accuracy loss under 2x speed-up respectively, which is significant. Code has been made publicly available. 91 | 92 | 93 | --------------------------------------------------------------------------------