# Deep-Learning-Hardware-Accelerator
A collection of works on hardware accelerators for deep learning.

## Conference Papers
### 2015
* Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks (FPGA 2015)

### 2016
* DnnWeaver: From High-Level Deep Network Models to FPGA Acceleration (MICRO 2016)
* Fused-Layer CNN Accelerators (MICRO 2016)
* Caffeine: Towards Uniformed Representation and Acceleration for Deep Convolutional Neural Networks (ICCAD 2016)
* Going Deeper with Embedded FPGA Platform for Convolutional Neural Network (FPGA 2016)
* Automatic Code Generation of Convolutional Neural Networks in FPGA Implementation (FPT 2016)
* Angel-Eye: A Complete Design Flow for Mapping CNN onto Customized Hardware (ISVLSI 2016)
* A High Performance FPGA-Based Accelerator for Large-Scale Convolutional Neural Networks (FPL 2016)
* Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks (ISCA 2016)
* C-Brain: A Deep Learning Accelerator that Tames the Diversity of CNNs through Adaptive Data-Level Parallelization (DAC 2016)
* Stripes: Bit-Serial Deep Neural Network Computing (MICRO 2016)
* Design Space Exploration of FPGA-Based Deep Convolutional Neural Networks (ASP-DAC 2016)

### 2017
* Optimizing Loop Operation and Dataflow in FPGA Acceleration of Deep Convolutional Neural Networks (FPGA 2017)
* Automated Systolic Array Architecture Synthesis for High Throughput CNN Inference on FPGAs (DAC 2017)
* A Pipelined and Scalable Dataflow Implementation of Convolutional Neural Networks on FPGA (IPDPSW 2017)
* A Multistage Dataflow Implementation of a Deep Convolutional Neural Network Based on FPGA for High-Speed Object Recognition (SSIAI 2017)
* Maximizing CNN Accelerator Efficiency Through Resource Partitioning (ISCA 2017)
* Design Space Exploration of FPGA Accelerators for Convolutional Neural Networks (DATE 2017)
* Work-in-Progress: A Power-Efficient and High Performance FPGA Accelerator for Convolutional Neural Networks (CODES+ISSS 2017)
* A Power-Efficient Accelerator for Convolutional Neural Networks (CLUSTER 2017)
* In-Datacenter Performance Analysis of a Tensor Processing Unit (ISCA 2017)
* FlexFlow: A Flexible Dataflow Accelerator Architecture for Convolutional Neural Networks (HPCA 2017)
* COSY: An Energy-Efficient Hardware Architecture for Deep Convolutional Neural Networks Based on Systolic Array (ICPADS 2017)

### 2019
* An Energy-Aware Bit-Serial Streaming Deep Convolutional Neural Network Accelerator (ICIP 2019)

## Journal Papers
### 2016
* Power-Efficient Accelerator Design for Neural Networks Using Computation Reuse (IEEE Computer Architecture Letters 2016 Jan.-June)

### 2017
* Stripes: Bit-Serial Deep Neural Network Computing (IEEE Computer Architecture Letters 2017 Jan.-June)
* Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks (JSSC 2017 Jan.)
* Embedded Streaming Deep Neural Networks Accelerator With Applications (TNNLS 2017 July)
* Deep Convolutional Neural Network Architecture With Reconfigurable Computation Patterns (TVLSI 2017 Aug.)
* Origami: A 803-GOp/s/W Convolutional Network Accelerator (TCSVT 2017 Nov.)

### 2018
* Angel-Eye: A Complete Design Flow for Mapping CNN Onto Embedded FPGA (TCAD 2018 Jan.)
* A Reconfigurable Streaming Deep Convolutional Neural Network Accelerator for Internet of Things (TCSI 2018 Jan.)
* An Architecture to Accelerate Convolution in Deep Neural Networks (TCSI 2018 April)
* Data and Hardware Efficient Design for Convolutional Neural Network (TCSI 2018 May)
* Efficient Hardware Architectures for Deep Convolutional Neural Network (TCSI 2018 June)
* Optimizing the Convolution Operation to Accelerate Deep Neural Networks on FPGA (TVLSI 2018 Early Access)

## Accelerators with Quantization Techniques Discussed in the Paper
* Going Deeper with Embedded FPGA Platform for Convolutional Neural Network (FPGA 2016)
* Angel-Eye: A Complete Design Flow for Mapping CNN onto Embedded FPGA (ISVLSI 2016) (TCAD 2018 Jan.)

## Papers on Bit Reduction
* An Analytical Method to Determine Minimum Per-Layer Precision of Deep Neural Networks (ICASSP 2018)
* True-Gradient Based Training of Deep Binary Activated Neural Networks via Continuous Binarization (ICASSP 2018)

## Serial-Approach Architectures
* Bit-Pragmatic Deep Neural Network Computing (2016)
* Stripes: Bit-Serial Deep Neural Network Computing (MICRO 2016)
* Stripes: Bit-Serial Deep Neural Network Computing (IEEE Computer Architecture Letters 2017 Jan.-June)
* Dynamic Stripes: Exploiting the Dynamic Precision Requirements of Activation Values in Neural Networks (2017)
* Value-Based Deep-Learning Acceleration (IEEE Micro 2018 Jan./Feb.)
* Exploiting Typical Values to Accelerate Deep Learning (Computer 2018 May)
* Loom: Exploiting Weight and Activation Precisions to Accelerate Convolutional Neural Networks (DAC 2018)

## Zero-Skipping Architectures
* Cnvlutin: Ineffectual-Neuron-Free Deep Neural Network Computing (ISCA 2016)
* Cnvlutin2: Ineffectual-Activation-and-Weight-Free Deep Neural Network Computing (2017)