# CNN-Inference-Engine-Quick-View
A quick view of high-performance convolutional neural network (CNN) inference engines on mobile devices.

### Runtime-speed Comparisons
* [AI-Benchmarks](http://ai-benchmark.com/ranking_detailed.html)

### Data-flow / Graph Optimization
* [nnfusion](https://github.com/microsoft/nnfusion)
* [TASO](https://github.com/jiazhihao/TASO)
* [AMD MIGraphX](https://onnxruntime.ai/docs/execution-providers/MIGraphX-ExecutionProvider.html)

### FLOAT32-Support
| Framework | Main Platform | Model Compatibility | Detection-Support | Speed Benchmarks
| :----------- | :------: | :------------: | :------------: | :------------:
| [Bolt](https://github.com/huawei-noah/bolt) | CPU (**ARM** optimized) / x86 / Mali GPU | Caffe / TensorFlow / PyTorch / ONNX | Y | [Link](https://github.com/huawei-noah/bolt/blob/master/docs/BENCHMARK.md)
| [TNN](https://github.com/Tencent/TNN) | CPU (**ARM** optimized) / Mali / Adreno / Apple GPU | Caffe / TensorFlow / PyTorch | Y | [Link](https://github.com/Tencent/TNN/blob/master/doc/en/development/profiling_en.md)
| [PPLNN](https://github.com/openppl-public/ppl.nn) | CPU (**ARM**/**x86** optimized) / Nvidia GPU | ONNX | Y | [ARM Link](https://github.com/openppl-public/ppl.nn/blob/master/docs/en/arm-doc/benchmark_tool.md) / [CUDA Link](https://github.com/openppl-public/ppl.nn/blob/master/docs/en/cuda-doc/benchmark_tool.md)
| [Paddle-Lite](https://github.com/PaddlePaddle/Paddle-Lite) | CPU (**ARM** optimized) / Mali GPU / FPGA / **NPU** | Paddle / Caffe / ONNX | Y | [Link](https://paddlepaddle.github.io/Paddle-Lite/develop/benchmark/)
| [MNN](https://github.com/alibaba/MNN) | CPU (**ARM** optimized) / Mali GPU | Caffe / TensorFlow / ONNX | Y | [Link](https://github.com/alibaba/MNN/blob/master/benchmark/result/2020-3-22.md)
| [NCNN](https://github.com/Tencent/ncnn) | CPU (**ARM** optimized) / Mali GPU | Caffe / PyTorch / MXNet / ONNX | Y | [3rd-party Link](https://gitlab.com/llhe/mobile-ai-bench/-/jobs/402963978) / [Official Link](https://github.com/Tencent/ncnn/tree/master/benchmark)
| [MACE](https://github.com/XiaoMi/mace) | CPU (**ARM** optimized) / Mali GPU / DSP | Caffe / TensorFlow / ONNX | Y | [Link](https://gitlab.com/llhe/mobile-ai-bench/-/jobs/151820514)
| [Tengine](https://github.com/OAID/Tengine) | CPU (**ARM A72** optimized) | Caffe / MXNet | Y | [Link](https://github.com/OAID/Tengine/blob/master/doc/benchmark.md)
| [AutoKernel](https://github.com/OAID/AutoKernel) | CPU / GPU / NPU | Caffe / MXNet / TensorFlow / PyTorch / Darknet | Y | [Link](https://github.com/OAID/Tengine/blob/master/doc/benchmark.md)
| [Synet](https://github.com/ermig1979/Synet) | CPU (**ARM** optimized) / x86 | Caffe / PyTorch / TensorFlow / MXNet / ONNX | Y | -
| [MsnhNet](https://github.com/msnh2012/Msnhnet) | CPU (**ARM** optimized) / Mali GPU / x86 / TensorRT | PyTorch | Y | [Link](https://github.com/msnh2012/Msnhnet)
| [ONNX-Runtime](https://github.com/microsoft/onnxruntime) | CPU / Nvidia GPU | ONNX | Y | -
| [HiAI](https://developer.huawei.com/consumer/cn/hiai) | Kirin CPU / NPU | Caffe / TensorFlow | Y | -
| [NNIE](https://github.com/RaySue/NNIE-lite) | NPU | Caffe | Y | [1 TOPS](http://www.hisilicon.com/-/media/Hisilicon/pdf/Surveillance_mobilecam/Hi3516DV300.pdf)
| [Intel-Caffe](https://github.com/intel/caffe) | CPU (**Intel** optimized) | Caffe | Y | [Link](https://github.com/intel/caffe/wiki/INTEL%C2%AE-OPTIMIZED-CAFFE-PERFORMANCE-AND-CONVERGENCE)
| [FeatherCNN](https://github.com/Tencent/FeatherCNN) | CPU (**ARM** optimized) | Caffe | N | [Link](https://github.com/Tencent/FeatherCNN/wiki/Benchmarks) / [Unofficial Link](https://www.zhihu.com/question/276372408)
| [TensorFlow Lite](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/lite) | CPU (**Android** optimized) | Caffe2 / TensorFlow / ONNX | Y | [Link](https://www.tensorflow.org/mobile/tflite/performance)
| [TensorRT](https://devblogs.nvidia.com/tensorrt-3-faster-tensorflow-inference/) | GPU (**Volta** optimized) | Caffe / TensorFlow / ONNX | Y | [Link](http://on-demand.gputechconf.com/gtc-eu/2017/presentation/23425-han-vanholder-efficient-inference-with-tensorrt.pdf)
| [TVM](https://github.com/dmlc/tvm) | CPU (**ARM** optimized) / Mali GPU / FPGA | ONNX | Y | -
| [SNPE](https://developer.qualcomm.com/docs/snpe/index.html) | CPU (**Qualcomm** optimized) / GPU / DSP | Caffe / Caffe2 / TensorFlow / ONNX | Y | [Link](https://gitlab.com/llhe/mobile-ai-bench/-/jobs/402963978)
| [Pocket-Tensor](https://github.com/GValiente/pocket-tensor) | CPU (**ARM**/**x86** optimized) | Keras | N | [Link](https://github.com/GValiente/pocket-tensor)
| [ZQCNN](https://github.com/zuoqing1988/ZQCNN-v0.0) | CPU | Caffe / MXNet | Y | [Link](https://github.com/zuoqing1988/ZQCNN-v0.0)
| [ARM-NEON-to-x86-SSE](https://github.com/intel/ARM_NEON_2_x86_SSE) | CPU (**Intel** optimized) | Intrinsics-Level | - | -
| [Simd](https://github.com/ermig1979/Simd) | CPU (all platforms optimized) | Intrinsics-Level | - | -
| [clDNN](https://github.com/intel/clDNN) | Intel® Processor Graphics / Iris™ Pro Graphics | Caffe / TensorFlow / MXNet / ONNX | Y | [Link](https://software.intel.com/en-us/articles/accelerate-deep-learning-inference-with-integrated-intel-processor-graphics-rev-2-0)
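Most of the engines above accept an ONNX graph, so a quick FP32 sanity check is easy to script. Below is a minimal sketch using the ONNX Runtime Python API; the `model.onnx` path and the 1x3x224x224 input shape are assumptions for illustration, not part of the table above.

```python
import numpy as np
import onnxruntime as ort

# Minimal FP32 inference sketch (model path and input shape are assumed).
sess = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
input_name = sess.get_inputs()[0].name

x = np.random.rand(1, 3, 224, 224).astype(np.float32)  # dummy NCHW batch
outputs = sess.run(None, {input_name: x})              # None = fetch all outputs
print(outputs[0].shape)
```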

### FIX16-Support
| Framework | Main Platform | Model Compatibility | Detection-Support | Speed Benchmarks
| :----------- | :------: | :------------: | :------------: | :------------:
| [Bolt](https://github.com/huawei-noah/bolt) | CPU (**ARM** optimized) / x86 / Mali GPU | Caffe / TensorFlow / PyTorch | Y | [Link](https://github.com/huawei-noah/bolt/blob/master/docs/BENCHMARK.md)
| [ARM32-SGEMM-LIB](https://github.com/JunLee85/ARM32-SGEMM-LIB) | CPU (**ARM** optimized) | GEMM Library | N | [Link](https://github.com/JunLee85/ARM32-SGEMM-LIB/wiki)
| [TNN](https://github.com/Tencent/TNN) | CPU (**ARM** optimized) / Mali / Adreno / Apple GPU | Caffe / TensorFlow / PyTorch | Y | [Link](https://github.com/Tencent/TNN/blob/master/doc/en/development/profiling_en.md)
| [Yolov2-Xilinx-PYNQ](https://github.com/dhm2013724/yolov2_xilinx_fpga) | FPGA (Xilinx PYNQ) | Yolov2-only | Y | [Link](https://github.com/dhm2013724/yolov2_xilinx_fpga)

### INT8-Support
| Framework | Main Platform | Model Compatibility | Detection-Support | Speed Benchmarks
| :----------- | :------: | :------------: | :------------: | :------------:
| [Bolt](https://github.com/huawei-noah/bolt) | CPU (**ARM** optimized) / x86 / Mali GPU | Caffe / TensorFlow / PyTorch | Y | [Link](https://github.com/huawei-noah/bolt/blob/master/docs/BENCHMARK.md)
| [Intel-Caffe](https://github.com/intel/caffe) | CPU (**Intel Skylake**) | Caffe | Y | [Link](https://github.com/intel/caffe/wiki/INTEL%C2%AE-OPTIMIZED-CAFFE-PERFORMANCE-AND-CONVERGENCE)
| [TNN](https://github.com/Tencent/TNN) | CPU (**ARM** optimized) / Mali / Adreno / Apple GPU | Caffe / TensorFlow / PyTorch | Y | [Link](https://github.com/Tencent/TNN/blob/master/doc/en/development/profiling_en.md)
| [PPLNN](https://github.com/openppl-public/ppl.nn) | Nvidia GPU (optimized) | ONNX | Y | [Link](https://github.com/openppl-public/ppl.nn/blob/master/docs/en/cuda-doc/benchmark_tool.md)
| [NCNN](https://github.com/Tencent/ncnn) | CPU (**ARM** optimized) | Caffe / PyTorch / MXNet / ONNX | Y | [Link](https://gitlab.com/llhe/mobile-ai-bench/-/jobs/402963978)
| [Paddle-Lite](https://github.com/PaddlePaddle/Paddle-Lite) | CPU (**ARM** optimized) / Mali GPU / FPGA | Paddle / Caffe / ONNX | Y | [Link](https://paddlepaddle.github.io/Paddle-Lite/develop/benchmark/)
| [MNN](https://github.com/alibaba/MNN) | CPU (**ARM** optimized) / Mali GPU | Caffe / TensorFlow / ONNX | Y | [Link](https://github.com/alibaba/MNN/blob/master/benchmark/result/2020-3-22.md)
| [TensorFlow Lite](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/lite) | CPU (**ARM**) | Caffe2 / TensorFlow / ONNX | Y | [Link](https://gitlab.com/llhe/mobile-ai-bench/-/jobs/402963978)
| [TensorRT](https://devblogs.nvidia.com/tensorrt-3-faster-tensorflow-inference/) | GPU (**Volta**) | Caffe / TensorFlow / ONNX | Y | [Link](http://on-demand.gputechconf.com/gtc-eu/2017/presentation/23425-han-vanholder-efficient-inference-with-tensorrt.pdf)
| [gemmlowp](https://github.com/google/gemmlowp) | CPU (ARM / x86) | GEMM Library | - | -
| [SNPE](https://developer.qualcomm.com/software/qualcomm-neural-processing-sdk) | DSP (Quantized DLC) | Caffe / Caffe2 / TensorFlow / ONNX | Y | [Link](https://gitlab.com/llhe/mobile-ai-bench/-/jobs/402963978)
| [MACE](https://github.com/XiaoMi/mace) | CPU (**ARM** optimized) / Mali GPU / DSP | Caffe / TensorFlow / ONNX | Y | [Link](https://gitlab.com/llhe/mobile-ai-bench/-/jobs/402963978)
| [TF2](https://github.com/TF2-Engine/TF2) | FPGA | Caffe / PyTorch / TensorFlow | Y | [Link](https://github.com/TF2-Engine/TF2#runtime-engine)
| [TVM](https://github.com/dmlc/tvm) | CPU (**ARM** optimized) / Mali GPU / FPGA | ONNX | Y | [Link](https://github.com/vinx13/tvm-cuda-int8-benchmark)
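As a concrete example of how INT8 support is usually exercised, here is a hedged sketch of TensorFlow Lite full-integer post-training quantization. The `saved_model_dir` path, the 1x224x224x3 input shape, and the random calibration data are placeholders; a real run needs on the order of 100 representative samples from the training distribution.

```python
import numpy as np
import tensorflow as tf

def representative_dataset():
    # Calibration feed: random data stands in for real samples here.
    for _ in range(100):
        yield [np.random.rand(1, 224, 224, 3).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")  # assumed path
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
# Force full-integer kernels; keep inputs/outputs int8 as well.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

with open("model_int8.tflite", "wb") as f:
    f.write(converter.convert())
```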

### TERNARY-Support
| Framework | Main Platform | Model Compatibility | Detection-Support | Speed Benchmarks
| :----------- | :------: | :------------: | :------------: | :------------:
| [Gemmbitserial](https://github.com/maltanar/gemmbitserial) | CPU (ARM / x86) | GEMM Library | - | [Link](http://www.idi.ntnu.no/%7Eyamanu/2017-cases-wip-quantizedmm-preprint.pdf)

### BINARY-Support
| Framework | Main Platform | Model Compatibility | Detection-Support | Speed Benchmarks
| :----------- | :------: | :------------: | :------------: | :------------:
| [Bolt](https://github.com/huawei-noah/bolt) | CPU (**ARM** optimized) / x86 / Mali GPU | Caffe / TensorFlow / PyTorch | Y | [Link](https://github.com/huawei-noah/bolt/blob/master/docs/BENCHMARK.md)
| [BMXNet](https://github.com/hpi-xnor/BMXNet) | CPU (ARM / x86) / GPU | MXNet | Y | [Link](https://arxiv.org/abs/1705.09864)
| [daBNN](https://github.com/JDAI-CV/dabnn) | CPU (ARM) | Caffe / TensorFlow / ONNX | N | [Link](https://github.com/JDAI-CV/dabnn/blob/master/images/comparison_en.png)
| [Espresso](https://github.com/fpeder/espresso) | GPU | - | N | [Link](https://arxiv.org/abs/1705.09864)
| [BNN-PYNQ](https://github.com/Xilinx/BNN-PYNQ) | FPGA (Xilinx PYNQ) | - | N | [Link](https://openreview.net/forum?id=Sk6fD5yCb)
| [FINN](https://github.com/Xilinx/FINN) | FPGA (Xilinx) | - | N | [Link](https://arxiv.org/abs/1612.07119)
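The binary engines above all reduce convolution to the same primitive: an XNOR-popcount dot product over {-1, +1} weights and activations packed into machine words. The sketch below is illustrative Python, not any listed library's API:

```python
import numpy as np

def pack_bits(v):
    """Pack a {-1,+1} vector into an int (bit i set <=> v[i] == +1)."""
    bits = 0
    for i, x in enumerate(v):
        if x > 0:
            bits |= 1 << i
    return bits

def binary_dot(a_bits, b_bits, n):
    """Dot product of two packed {-1,+1} vectors: 2*popcount(XNOR) - n."""
    xnor = ~(a_bits ^ b_bits) & ((1 << n) - 1)  # mask to the n valid bits
    matches = bin(xnor).count("1")              # popcount
    return 2 * matches - n                      # matches - mismatches

a = np.random.choice([-1, 1], size=64)
b = np.random.choice([-1, 1], size=64)
assert binary_dot(pack_bits(a), pack_bits(b), 64) == int(np.dot(a, b))
```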

### NLP-Support
| Framework | Main Platform | Model Compatibility | Speed Benchmarks
| :----------- | :------: | :------------: | :------------:
| [TurboTransformers](https://github.com/Tencent/TurboTransformers) | CPU / Nvidia GPU | PyTorch | [Link](https://github.com/Tencent/TurboTransformers#performance)
| [Bolt](https://github.com/huawei-noah/bolt) | CPU / Mali GPU | Caffe / ONNX | [Link](https://github.com/huawei-noah/bolt/blob/master/docs/BENCHMARK.md)
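Since every row above ultimately points at a latency number, it helps to be explicit about how such numbers are normally produced: a few warm-up runs first, then an average over many timed runs. Below is a minimal harness sketch, again using ONNX Runtime as the example backend; the `bert.onnx` file, the `input_ids` feed name, and the 1x128 shape are assumptions to adjust for the actual exported graph.

```python
import time
import numpy as np
import onnxruntime as ort

def benchmark(session, feed, warmup=10, runs=100):
    """Average latency in ms: discard warm-up runs, then time `runs` inferences."""
    for _ in range(warmup):
        session.run(None, feed)
    start = time.perf_counter()
    for _ in range(runs):
        session.run(None, feed)
    return (time.perf_counter() - start) / runs * 1e3

# Assumed model file and input name; adjust to the exported graph.
sess = ort.InferenceSession("bert.onnx", providers=["CPUExecutionProvider"])
ids = np.random.randint(0, 30000, size=(1, 128), dtype=np.int64)
print(f"{benchmark(sess, {'input_ids': ids}):.2f} ms / inference")
```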