# CNN-Inference-Engine-Quick-View
A quick view of high-performance convolutional neural network (CNN) inference engines on mobile devices.

### Runtime-speed Comparisons
* [AI-Benchmarks](http://ai-benchmark.com/ranking_detailed.html)

### Data-flow / Graph Optimization
* [nnfusion](https://github.com/microsoft/nnfusion)
* [TASO](https://github.com/jiazhihao/TASO)
* [AMD MIGraphX](https://onnxruntime.ai/docs/execution-providers/MIGraphX-ExecutionProvider.html)

### FLOAT32-Support
| Framework | Main Platform | Model Compatibility | Detection-Support | Speed Benchmarks
| :----------- | :------: | :------------: | :------------: | :------------:
| [Bolt](https://github.com/huawei-noah/bolt) | CPU (**ARM** optimized) / x86 / Mali GPU | Caffe / TensorFlow / PyTorch / ONNX | Y | [Link](https://github.com/huawei-noah/bolt/blob/master/docs/BENCHMARK.md)
| [TNN](https://github.com/Tencent/TNN) | CPU (**ARM** optimized) / Mali / Adreno / Apple GPU | Caffe / TensorFlow / PyTorch | Y | [Link](https://github.com/Tencent/TNN/blob/master/doc/en/development/profiling_en.md)
| [PPLNN](https://github.com/openppl-public/ppl.nn) | CPU (**ARM**/**x86** optimized) / Nvidia GPU | ONNX | Y | [ARM Link](https://github.com/openppl-public/ppl.nn/blob/master/docs/en/arm-doc/benchmark_tool.md) / [CUDA Link](https://github.com/openppl-public/ppl.nn/blob/master/docs/en/cuda-doc/benchmark_tool.md)
| [Paddle-Lite](https://github.com/PaddlePaddle/Paddle-Lite) | CPU (**ARM** optimized) / Mali GPU / FPGA / **NPU** | Paddle / Caffe / ONNX | Y | [Link](https://paddlepaddle.github.io/Paddle-Lite/develop/benchmark/)
| [MNN](https://github.com/alibaba/MNN) | CPU (**ARM** optimized) / Mali GPU | Caffe / TensorFlow / ONNX | Y | [Link](https://github.com/alibaba/MNN/blob/master/benchmark/result/2020-3-22.md)
| [NCNN](https://github.com/Tencent/ncnn) | CPU (**ARM** optimized) / Mali GPU | Caffe / PyTorch / MXNet / ONNX | Y | [3rd-party Link](https://gitlab.com/llhe/mobile-ai-bench/-/jobs/402963978) / [Official Link](https://github.com/Tencent/ncnn/tree/master/benchmark)
| [MACE](https://github.com/XiaoMi/mace) | CPU (**ARM** optimized) / Mali GPU / DSP | Caffe / TensorFlow / ONNX | Y | [Link](https://gitlab.com/llhe/mobile-ai-bench/-/jobs/151820514)
| [Tengine](https://github.com/OAID/Tengine) | CPU (**ARM A72** optimized) | Caffe / MXNet | Y | [Link](https://github.com/OAID/Tengine/blob/master/doc/benchmark.md)
| [AutoKernel](https://github.com/OAID/AutoKernel) | CPU / GPU / NPU | Caffe / MXNet / TensorFlow / PyTorch / Darknet | Y | [Link](https://github.com/OAID/Tengine/blob/master/doc/benchmark.md)
| [Synet](https://github.com/ermig1979/Synet) | CPU (**ARM** optimized) / x86 | Caffe / PyTorch / TensorFlow / MXNet / ONNX | Y | -
| [MsnhNet](https://github.com/msnh2012/Msnhnet) | CPU (**ARM** optimized) / Mali GPU / x86 / TensorRT | PyTorch | Y | [Link](https://github.com/msnh2012/Msnhnet)
| [ONNX-Runtime](https://github.com/microsoft/onnxruntime) | CPU / Nvidia GPU | ONNX | Y | -
| [HiAI](https://developer.huawei.com/consumer/cn/hiai) | Kirin CPU / NPU | Caffe / TensorFlow | Y | -
| [NNIE](https://github.com/RaySue/NNIE-lite) | NPU | Caffe | Y | [1 TOPS](http://www.hisilicon.com/-/media/Hisilicon/pdf/Surveillance_mobilecam/Hi3516DV300.pdf)
| [Intel-Caffe](https://github.com/intel/caffe) | CPU (**Intel** optimized) | Caffe | Y | [Link](https://github.com/intel/caffe/wiki/INTEL%C2%AE-OPTIMIZED-CAFFE-PERFORMANCE-AND-CONVERGENCE)
| [FeatherCNN](https://github.com/Tencent/FeatherCNN) | CPU (**ARM** optimized) | Caffe | N | [Link](https://github.com/Tencent/FeatherCNN/wiki/Benchmarks) / [Unofficial Link](https://www.zhihu.com/question/276372408)
| [TensorFlow Lite](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/lite) | CPU (**Android** optimized) | Caffe2 / TensorFlow / ONNX | Y | [Link](https://www.tensorflow.org/mobile/tflite/performance)
| [TensorRT](https://devblogs.nvidia.com/tensorrt-3-faster-tensorflow-inference/) | GPU (**Volta** optimized) | Caffe / TensorFlow / ONNX | Y | [Link](http://on-demand.gputechconf.com/gtc-eu/2017/presentation/23425-han-vanholder-efficient-inference-with-tensorrt.pdf)
| [TVM](https://github.com/dmlc/tvm) | CPU (**ARM** optimized) / Mali GPU / FPGA | ONNX | Y | -
| [SNPE](https://developer.qualcomm.com/docs/snpe/index.html) | CPU (**Qualcomm** optimized) / GPU / DSP | Caffe / Caffe2 / TensorFlow / ONNX | Y | [Link](https://gitlab.com/llhe/mobile-ai-bench/-/jobs/402963978)
| [Pocket-Tensor](https://github.com/GValiente/pocket-tensor) | CPU (**ARM**/**x86** optimized) | Keras | N | [Link](https://github.com/GValiente/pocket-tensor)
| [ZQCNN](https://github.com/zuoqing1988/ZQCNN-v0.0) | CPU | Caffe / MXNet | Y | [Link](https://github.com/zuoqing1988/ZQCNN-v0.0)
| [ARM-NEON-to-x86-SSE](https://github.com/intel/ARM_NEON_2_x86_SSE) | CPU (**Intel** optimized) | Intrinsics-Level | - | -
| [Simd](https://github.com/ermig1979/Simd) | CPU (all platforms optimized) | Intrinsics-Level | - | -
| [clDNN](https://github.com/intel/clDNN) | Intel® Processor Graphics / Iris™ Pro Graphics | Caffe / TensorFlow / MXNet / ONNX | Y | [Link](https://software.intel.com/en-us/articles/accelerate-deep-learning-inference-with-integrated-intel-processor-graphics-rev-2-0)
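Most of the engines above accept an ONNX graph, so a quick FP32 sanity check is easy to script. Below is a minimal sketch using the ONNX Runtime Python API; the `model.onnx` path and the 1x3x224x224 input shape are assumptions for illustration, not part of the table above.

```python
import numpy as np
import onnxruntime as ort

# Minimal FP32 inference sketch (model path and input shape are assumed).
sess = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
input_name = sess.get_inputs()[0].name

x = np.random.rand(1, 3, 224, 224).astype(np.float32)  # dummy NCHW batch
outputs = sess.run(None, {input_name: x})              # None = fetch all outputs
print(outputs[0].shape)
```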

### FIX16-Support
| Framework | Main Platform | Model Compatibility | Detection-Support | Speed Benchmarks
| :----------- | :------: | :------------: | :------------: | :------------:
| [Bolt](https://github.com/huawei-noah/bolt) | CPU (**ARM** optimized) / x86 / Mali GPU | Caffe / TensorFlow / PyTorch | Y | [Link](https://github.com/huawei-noah/bolt/blob/master/docs/BENCHMARK.md)
| [ARM32-SGEMM-LIB](https://github.com/JunLee85/ARM32-SGEMM-LIB) | CPU (**ARM** optimized) | GEMM Library | N | [Link](https://github.com/JunLee85/ARM32-SGEMM-LIB/wiki)
| [TNN](https://github.com/Tencent/TNN) | CPU (**ARM** optimized) / Mali / Adreno / Apple GPU | Caffe / TensorFlow / PyTorch | Y | [Link](https://github.com/Tencent/TNN/blob/master/doc/en/development/profiling_en.md)
| [Yolov2-Xilinx-PYNQ](https://github.com/dhm2013724/yolov2_xilinx_fpga) | FPGA (Xilinx PYNQ) | Yolov2-only | Y | [Link](https://github.com/dhm2013724/yolov2_xilinx_fpga)

### INT8-Support
| Framework | Main Platform | Model Compatibility | Detection-Support | Speed Benchmarks
| :----------- | :------: | :------------: | :------------: | :------------:
| [Bolt](https://github.com/huawei-noah/bolt) | CPU (**ARM** optimized) / x86 / Mali GPU | Caffe / TensorFlow / PyTorch | Y | [Link](https://github.com/huawei-noah/bolt/blob/master/docs/BENCHMARK.md)
| [Intel-Caffe](https://github.com/intel/caffe) | CPU (**Intel Skylake**) | Caffe | Y | [Link](https://github.com/intel/caffe/wiki/INTEL%C2%AE-OPTIMIZED-CAFFE-PERFORMANCE-AND-CONVERGENCE)
| [TNN](https://github.com/Tencent/TNN) | CPU (**ARM** optimized) / Mali / Adreno / Apple GPU | Caffe / TensorFlow / PyTorch | Y | [Link](https://github.com/Tencent/TNN/blob/master/doc/en/development/profiling_en.md)
| [PPLNN](https://github.com/openppl-public/ppl.nn) | Nvidia GPU (optimized) | ONNX | Y | [Link](https://github.com/openppl-public/ppl.nn/blob/master/docs/en/cuda-doc/benchmark_tool.md)
| [NCNN](https://github.com/Tencent/ncnn) | CPU (**ARM** optimized) | Caffe / PyTorch / MXNet / ONNX | Y | [Link](https://gitlab.com/llhe/mobile-ai-bench/-/jobs/402963978)
| [Paddle-Lite](https://github.com/PaddlePaddle/Paddle-Lite) | CPU (**ARM** optimized) / Mali GPU / FPGA | Paddle / Caffe / ONNX | Y | [Link](https://paddlepaddle.github.io/Paddle-Lite/develop/benchmark/)
| [MNN](https://github.com/alibaba/MNN) | CPU (**ARM** optimized) / Mali GPU | Caffe / TensorFlow / ONNX | Y | [Link](https://github.com/alibaba/MNN/blob/master/benchmark/result/2020-3-22.md)
| [TensorFlow Lite](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/lite) | CPU (**ARM**) | Caffe2 / TensorFlow / ONNX | Y | [Link](https://gitlab.com/llhe/mobile-ai-bench/-/jobs/402963978)
| [TensorRT](https://devblogs.nvidia.com/tensorrt-3-faster-tensorflow-inference/) | GPU (**Volta**) | Caffe / TensorFlow / ONNX | Y | [Link](http://on-demand.gputechconf.com/gtc-eu/2017/presentation/23425-han-vanholder-efficient-inference-with-tensorrt.pdf)
| [gemmlowp](https://github.com/google/gemmlowp) | CPU (ARM / x86) | GEMM Library | - | -
| [SNPE](https://developer.qualcomm.com/software/qualcomm-neural-processing-sdk) | DSP (Quantized DLC) | Caffe / Caffe2 / TensorFlow / ONNX | Y | [Link](https://gitlab.com/llhe/mobile-ai-bench/-/jobs/402963978)
| [MACE](https://github.com/XiaoMi/mace) | CPU (**ARM** optimized) / Mali GPU / DSP | Caffe / TensorFlow / ONNX | Y | [Link](https://gitlab.com/llhe/mobile-ai-bench/-/jobs/402963978)
| [TF2](https://github.com/TF2-Engine/TF2) | FPGA | Caffe / PyTorch / TensorFlow | Y | [Link](https://github.com/TF2-Engine/TF2#runtime-engine)
| [TVM](https://github.com/dmlc/tvm) | CPU (**ARM** optimized) / Mali GPU / FPGA | ONNX | Y | [Link](https://github.com/vinx13/tvm-cuda-int8-benchmark)
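As a concrete example of how INT8 support is usually exercised, here is a hedged sketch of TensorFlow Lite full-integer post-training quantization. The `saved_model_dir` path, the 1x224x224x3 input shape, and the random calibration data are placeholders; a real run needs on the order of 100 representative samples from the training distribution.

```python
import numpy as np
import tensorflow as tf

def representative_dataset():
    # Calibration feed: random data stands in for real samples here.
    for _ in range(100):
        yield [np.random.rand(1, 224, 224, 3).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")  # assumed path
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
# Force full-integer kernels; keep inputs/outputs int8 as well.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

with open("model_int8.tflite", "wb") as f:
    f.write(converter.convert())
```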

### TERNARY-Support
| Framework | Main Platform | Model Compatibility | Detection-Support | Speed Benchmarks
| :----------- | :------: | :------------: | :------------: | :------------:
| [Gemmbitserial](https://github.com/maltanar/gemmbitserial) | CPU (ARM / x86) | GEMM Library | - | [Link](http://www.idi.ntnu.no/%7Eyamanu/2017-cases-wip-quantizedmm-preprint.pdf)

### BINARY-Support
| Framework | Main Platform | Model Compatibility | Detection-Support | Speed Benchmarks
| :----------- | :------: | :------------: | :------------: | :------------:
| [Bolt](https://github.com/huawei-noah/bolt) | CPU (**ARM** optimized) / x86 / Mali GPU | Caffe / TensorFlow / PyTorch | Y | [Link](https://github.com/huawei-noah/bolt/blob/master/docs/BENCHMARK.md)
| [BMXNet](https://github.com/hpi-xnor/BMXNet) | CPU (ARM / x86) / GPU | MXNet | Y | [Link](https://arxiv.org/abs/1705.09864)
| [daBNN](https://github.com/JDAI-CV/dabnn) | CPU (ARM) | Caffe / TensorFlow / ONNX | N | [Link](https://github.com/JDAI-CV/dabnn/blob/master/images/comparison_en.png)
| [Espresso](https://github.com/fpeder/espresso) | GPU | - | N | [Link](https://arxiv.org/abs/1705.09864)
| [BNN-PYNQ](https://github.com/Xilinx/BNN-PYNQ) | FPGA (Xilinx PYNQ) | - | N | [Link](https://openreview.net/forum?id=Sk6fD5yCb)
| [FINN](https://github.com/Xilinx/FINN) | FPGA (Xilinx) | - | N | [Link](https://arxiv.org/abs/1612.07119)
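The binary engines above all reduce convolution to the same primitive: an XNOR-popcount dot product over {-1, +1} weights and activations packed into machine words. The sketch below is illustrative Python, not any listed library's API:

```python
import numpy as np

def pack_bits(v):
    """Pack a {-1,+1} vector into an int (bit i set <=> v[i] == +1)."""
    bits = 0
    for i, x in enumerate(v):
        if x > 0:
            bits |= 1 << i
    return bits

def binary_dot(a_bits, b_bits, n):
    """Dot product of two packed {-1,+1} vectors: 2*popcount(XNOR) - n."""
    xnor = ~(a_bits ^ b_bits) & ((1 << n) - 1)  # mask to the n valid bits
    matches = bin(xnor).count("1")              # popcount
    return 2 * matches - n                      # matches - mismatches

a = np.random.choice([-1, 1], size=64)
b = np.random.choice([-1, 1], size=64)
assert binary_dot(pack_bits(a), pack_bits(b), 64) == int(np.dot(a, b))
```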

### NLP-Support
| Framework | Main Platform | Model Compatibility | Speed Benchmarks
| :----------- | :------: | :------------: | :------------:
| [TurboTransformers](https://github.com/Tencent/TurboTransformers) | CPU / Nvidia GPU | PyTorch | [Link](https://github.com/Tencent/TurboTransformers#performance)
| [Bolt](https://github.com/huawei-noah/bolt) | CPU / Mali GPU | Caffe / ONNX | [Link](https://github.com/huawei-noah/bolt/blob/master/docs/BENCHMARK.md)
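Since every row above ultimately points at a latency number, it helps to be explicit about how such numbers are normally produced: a few warm-up runs first, then an average over many timed runs. Below is a minimal harness sketch, again using ONNX Runtime as the example backend; the `bert.onnx` file, the `input_ids` feed name, and the 1x128 shape are assumptions to adjust for the actual exported graph.

```python
import time
import numpy as np
import onnxruntime as ort

def benchmark(session, feed, warmup=10, runs=100):
    """Average latency in ms: discard warm-up runs, then time `runs` inferences."""
    for _ in range(warmup):
        session.run(None, feed)
    start = time.perf_counter()
    for _ in range(runs):
        session.run(None, feed)
    return (time.perf_counter() - start) / runs * 1e3

# Assumed model file and input name; adjust to the exported graph.
sess = ort.InferenceSession("bert.onnx", providers=["CPUExecutionProvider"])
ids = np.random.randint(0, 30000, size=(1, 128), dtype=np.int64)
print(f"{benchmark(sess, {'input_ids': ids}):.2f} ms / inference")
```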