├── A Flexible Parallel Hardware Architecture for AdaBoost-Based Real-Time Object Detection.md ├── A Fully Connected Layer Elimination for a Binarized Convolutional Neural Network on an FPGA.md ├── A High-Throughput and Power-Efficient FPGA Implementation of YOLO CNN for Object Detection.md ├── A Lightweight YOLOv2 A Binarized CNN with A Parallel Support Vector Regression for an FPGA.md ├── A Multi-Resolution FPGA-Based Architecture for Real-Time Edge and Corner Detection.md ├── A Novel FPGA Accelerator Design for Real-Time and Ultra-Low Power Deep Convolutional Neural Networks Compared With Titan.md ├── A Real-Time Object Detection Accelerator with Compressed SSDLite on FPGA.md ├── A new multi-scroll Chua’s circuit with composite hyperbolic tangent-cubic nonlinearity Complex dynamics, Hardware implementation and Image encryption application.md ├── A robust and fixed-time zeroing neural dynamics for computing time-variant nonlinear equation using a novel nonlinear activation function.md ├── ASIC and FPGA Implementation of the Gaussian Mixture Model Algorithm for Real-Time Segmentation of High Definition video.md ├── Accelerating Tiny YOLO v3 using FPGA-based HardwareSoftware Co-Design.md ├── An Energy-Efficient FPGA-Based Deconvolutional Neural Networks Accelerator for Single Image Super-Resolution.md ├── An Extremely Simple Multi-Wing Chaotic System:Dynamics Analysis, Encryption Application and Hardware Implementation.md ├── An Object Detector based on Multiscale Sliding Window Search using a Fully Pipelined Binarized CNN on an FPGA.md ├── An efficient and cost effective FPGA based implementation of the Viola-Jones face detection algorithm.md ├── Angel-Eye A Complete Design Flow for Mapping CNN Onto Embedded FPGA.md ├── Design and FPGA Implementation of a Pseudo-random Number Generator Based on a Hopfield Neural Network Under Electromagnetic Radiation.md ├── Energy-Efficient Pedestrian Detection System Exploiting Statistical Error Compensation for Lossy Memory 
Data Compression.md ├── FPGA acceleration of Hyperspectral Image Processing for High-Speed Detection Applications.md ├── FPGA-Based Real-Time Moving Target Detection System for Unmanned Aerial Vehicle Application.md ├── LACS A High-Computational-Efficiency Accelerator for CNNs.md ├── Multichannel Pulse-Coupled-Neural-Network-Based Color Image Segmentation for Object Detection.md ├── NOVEL FPGA BASED HAAR CLASSIFIER FACE DETECTION ALGORITHM ACCELERATION.md ├── Pseudorandom number generator based on a 5D hyperchaotic four-wing memristive system and its FPGA implementation.md ├── README.md ├── REQ-YOLO A Resource-Aware, Efficient Quantization Framework for Object Detection on FPGAs.md ├── Real-time DDoS attack detection using FPGA.md ├── Real-time hardware–software embedded vision system for ITS smart camera implemented in Zynq SoC.md ├── SpWA An Efficient Sparse Winograd Convolutional Neural Networks Accelerator on FPGAs.md ├── Sparse-YOLO HardwareSoftware Co-Design of an FPGA Accelerator for YOLOv2.md └── Towards a Scalable HardwareSoftware Co-Design Platform for Real-time Pedestrian Tracking Based on a ZYNQ-7000 Device.md /A Flexible Parallel Hardware Architecture for AdaBoost-Based Real-Time Object Detection.md: -------------------------------------------------------------------------------- 1 | # A Flexible Parallel Hardware Architecture for AdaBoost-Based Real-Time Object Detection 2 | 3 | ## ABSTRACT 4 | 5 | Real-time object detection is becoming necessary for a wide number of applications related to computer vision and image processing, security, bioinformatics, and several other areas. Existing software implementations of object detection algorithms are constrained in small-sized images and rely on favorable conditions in the image frame to achieve real-time detection frame rates. Efforts to design hardware architectures have yielded encouraging results, yet are mostly directed towards a single application, targeting specific operating environments. 
Consequently, there is a need for hardware architectures capable of detecting several objects in large image frames, and which can be used under several object detection scenarios. In this work, we present a generic, flexible parallel architecture, which is suitable for all ranges of object detection applications and image sizes. The architecture implements the AdaBoost-based detection algorithm, which is considered one of the most efficient object detection algorithms. Through both field-programmable gate array emulation and large-scale implementation, and register transfer level synthesis and simulation, we illustrate that the architecture can detect objects in large images (up to 1024×768 pixels) with frame rates that can vary between 64 and 139 fps for various applications and input image frame sizes. 6 | 7 | ## Index Terms 8 | 9 | Object detection, systolic arrays, VLSI. 10 | 11 | ## Contribution 12 | 13 | The architecture proposed in this work is based on a massively parallel systolic computation of the classification engine using a systolic array implementation which yields extremely high detection frames per second (fps). The architecture is designed in such a way as to boost parallel computation of the classifiers used in the algorithm, and parallelize integral image computation, reducing the frequency of off-chip memory access. To make the architecture scalable in terms of image sizes, we utilize an image pyramid generation module in conjunction with the systolic array. As the array elements are modular and simple, and communication is regular and predetermined, the architecture is highly scalable and can operate at high frequency. The designer can select all the appropriate design parameters with the targeted operating environment in mind, without affecting the real-time constraints. 
The designer can also choose the operating frequency (with power constraints in mind), the array size (with area constraints in mind), and image size (with targeted application specifications in mind). The architecture is flexible as well in terms of input image size; the maximum input image size depends on the silicon budget available, however smaller images may easily be processed by the system as the input image size can be loaded as a parameter. Moreover, the architecture can support different training sets and different training set formats. 14 | 15 | ## The overall design of the Framework 16 | 17 | ![image-20210905184826795](https://gitee.com/feiyipengfei/pic-md1/raw/master/20210910202616.png) 18 | 19 | 20 | ## Performance 21 | 22 | ![image-20210905184911854](https://gitee.com/feiyipengfei/pic-md1/raw/master/20210910202647.png) 23 | 24 | ![image-20210905184942828](https://gitee.com/feiyipengfei/pic-md1/raw/master/20210910202713.png) 25 | 26 | -------------------------------------------------------------------------------- /A Fully Connected Layer Elimination for a Binarized Convolutional Neural Network on an FPGA.md: -------------------------------------------------------------------------------- 1 | # A Fully Connected Layer Elimination for a Binarized Convolutional Neural Network on an FPGA 2 | ## ABSTRACT 3 | 4 | Pre-trained convolutional deep neural networks (CNNs) are widely used in embedded systems, which require high power and area efficiency. In that case, the CPU is too slow, the embedded GPU dissipates much power, and the ASIC cannot keep up with the rapid progress of the CNN variations. This paper uses a binarized CNN which treats only binary 2-values for the inputs and the weights. Since the multiplier is replaced by an XNOR circuit, we can realize a high-performance MAC circuit by using many XNOR circuits. 
In the paper, we eliminate internal FC layers excluding the last one, then insert a binarized average pooling layer, which can be realized by a majority circuit for binarized (1/0) values. In that case, since the weight memory is replaced by the 1’s counter, we can realize a more compact and faster CNN than the conventional ones. We implemented the VGG-11 benchmark CNN for the CIFAR10 image classification task on the Xilinx Inc. Zedboard. Compared with the conventional binarized implementations on an FPGA, the classification accuracy was almost the same, the performance per power efficiency is 5.1 times better, the performance per area efficiency is 8.0 times better, and the performance per memory is 8.2 times better. 5 | 6 | ## Contribution 7 | 8 | They eliminate the internal FC layers rather than pruning them. By analyzing the distributions of both the weights and the activations, they show that, with a trick in the training algorithm, the multiply-accumulate (MAC) operation in the binarized CNN is almost the same as a binarized average pooling operation. Thus, the internal FC layers are replaced by an average pooling layer, which is realized by a 1’s counter. They also propose a shared XNOR-MAC architecture and its streaming operation to realize a high-performance convolutional operation with small hardware. 
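
The XNOR-based MAC and the majority-based pooling described above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the paper's hardware design: the bit encoding (1 for +1, 0 for -1), the tie-breaking rule in the majority vote, and the function names `xnor_popcount_dot`, `signed_dot`, and `binarized_average_pool` are all hypothetical choices made for illustration.

```python
def xnor_popcount_dot(x_bits, w_bits):
    """Dot product of two +1/-1 vectors stored as bits (assumed: 1 -> +1, 0 -> -1).

    XNOR marks the positions where input and weight agree; a 1's counter
    (popcount) then replaces the multiplier/adder tree.
    """
    if len(x_bits) != len(w_bits):
        raise ValueError("input and weight vectors must have the same length")
    ones = sum(1 for x, w in zip(x_bits, w_bits) if x == w)  # XNOR + popcount
    return 2 * ones - len(x_bits)  # agreements minus disagreements


def signed_dot(x_bits, w_bits):
    """Reference result: expand the bits to +1/-1 and multiply-accumulate."""
    return sum((2 * x - 1) * (2 * w - 1) for x, w in zip(x_bits, w_bits))


def binarized_average_pool(bits):
    """Majority vote over 1/0 values; ties resolve to 1 (an assumption)."""
    return 1 if 2 * sum(bits) >= len(bits) else 0


x = [1, 0, 1, 1]
w = [1, 1, 0, 1]
print(xnor_popcount_dot(x, w), signed_dot(x, w))
print(binarized_average_pool([1, 1, 0]), binarized_average_pool([0, 0, 1]))
```

The first two functions return the same value for any pair of equal-length bit vectors, which is why an XNOR gate plus a counter can stand in for a multiplier in this setting. The pooling function needs no weights at all, only a count of ones, which matches the text's point that the weight memory is replaced by a 1's counter.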
9 | 10 | ## The overall design of the Framework 11 | 12 | ![image-20210909195709486](https://gitee.com/feiyipengfei/pic-md1/raw/master/20210909195709.png) 13 | 14 | ## Performance 15 | 16 | ![image-20210909195753834](https://gitee.com/feiyipengfei/pic-md1/raw/master/20210909195753.png) 17 | 18 | ![image-20210909195818089](https://gitee.com/feiyipengfei/pic-md1/raw/master/20210909195818.png) 19 | 20 | -------------------------------------------------------------------------------- /A High-Throughput and Power-Efficient FPGA Implementation of YOLO CNN for Object Detection.md: -------------------------------------------------------------------------------- 1 | # A High-Throughput and Power-Efficient FPGA Implementation of YOLO CNN for Object Detection 2 | 3 | ## Abstract 4 | 5 | Convolutional neural networks (CNNs) require numerous computations and external memory accesses. Frequent accesses to off-chip memory cause slow processing and large power dissipation. For real-time object detection with high throughput and power efficiency, this paper presents a Tera-OPS streaming hardware accelerator implementing a you-only-look-once (YOLO) CNN. The parameters of the YOLO CNN are retrained and quantized with the PASCAL VOC data set using binary weight and flexible low-bit activation. The binary weight enables storing the entire network model in block RAMs of a field-programmable gate array (FPGA) to reduce off-chip accesses aggressively and, thereby, achieve significant performance enhancement. In the proposed design, all convolutional layers are fully pipelined for enhanced hardware utilization. The input image is delivered to the accelerator line-by-line. Similarly, the output from the previous layer is transmitted to the next layer line-by-line. The intermediate data are fully reused across layers, thereby eliminating external memory accesses. The decreased dynamic random access memory (DRAM) accesses reduce DRAM power consumption. 
Furthermore, as the convolutional layers are fully parameterized, it is easy to scale up the network. In this streaming design, each convolution layer is mapped to a dedicated hardware block. Therefore, it outperforms the “one-size-fits-all” designs in both performance and power efficiency. This CNN implemented using VC707 FPGA achieves a throughput of 1.877 tera operations per second (TOPS) at 200 MHz with batch processing while consuming 18.29 W of on-chip power, which shows the best power efficiency compared with the previous research. As for object detection accuracy, it achieves a mean average precision (mAP) of 64.16% for the PASCAL VOC 2007 data set that is only 2.63% lower than the mAP of the same YOLO network with full precision. 6 | 7 | ## Index Terms 8 | 9 | Binary weight, low-precision quantization, object detection, streaming architecture, you-only-look-once(YOLO). 10 | 11 | ## Contribution 12 | 13 | 1) A binary weight, flexible low-bit activation, hardware-centric quantization, and a retraining method for YOLO CNN are presented. This paper shows that even the binary weight and 3-to-6-bit activation are adequate to realize the desired accuracy of object detection. The advantages of this quantization are as follows: 1) it requires a minimum number of DSPs, as the convolutional kernel contains only summations and 2) binary weight enables storing the entire network model in an on-chip memory to minimize the off-chip accesses, thereby enhancing the performance. 14 | 2) A scalable and high-accuracy streaming architecture for real-time object detection is proposed. The intermediate data are reused to minimize the size of the input buffer of each convolution layer while eliminating the accesses to the off-chip memory. The convolutional layers are fully parameterized. Thus, it is easy to change the network structure. 15 | 3) The proposed architecture is implemented, and its relative merits are highlighted by comparing with the previous works. 
A real-time demo for the object detection is also presented. 16 | 17 | ## The overall design of the Framework 18 | 19 | ![image-20210906233419551](https://gitee.com/feiyipengfei/pic-md1/raw/master/20210906233419.png) 20 | 21 | ## Performance 22 | 23 | ![image-20210906233543804](https://gitee.com/feiyipengfei/pic-md1/raw/master/20210906233543.png) 24 | 25 | -------------------------------------------------------------------------------- /A Lightweight YOLOv2 A Binarized CNN with A Parallel Support Vector Regression for an FPGA.md: -------------------------------------------------------------------------------- 1 | # A Lightweight YOLOv2: A Binarized CNN with A Parallel Support Vector Regression for an FPGA 2 | ## ABSTRACT 3 | 4 | A frame object detection problem consists of two problems: one is a regression problem to spatially separated bounding boxes, the second is the associated classification of the objects within realtime frame rate. It is widely used in the embedded systems, such as robotics, autonomous driving, security, and drones - all of which require high-performance and low-power consumption. This paper implements the YOLO (You only look once) object detector on an FPGA, which is faster and has higher accuracy. It is based on the convolutional deep neural network (CNN), and it is a dominant part both the performance and the area. However, the object detector based on the CNN consists of a bounding box prediction (regression) and a class estimation (classification). Thus, the conventional all binarized CNN fails to recognize in most cases. In the paper, we propose a light-weight YOLOv2, which consists of the binarized CNN for feature extraction and the parallel support vector regression (SVR) for both classification and localization. To our knowledge, this is the first time binarized CNN's have been successfully used in object detection. We implement a pipelined based architecture for the lightweight YOLOv2 on the Xilinx Inc. 
zcu102 board, which has the Xilinx Inc. Zynq Ultra-scale+ MPSoC. The implemented object detector achieved 40.81 frames per second (FPS). Compared with the ARM Cortex-A57, it was 177.4 times faster, it dissipated 1.1 times more power, and its performance per power efficiency was 158.9 times better. Also, compared with the NVIDIA Pascal embedded GPU, it was 27.5 times faster, it dissipated 1.5 times lower power, and its performance per power efficiency was 42.9 times better. Thus, our method is suitable for the frame object detector for an embedded vision system. 5 | 6 | ## KEYWORDS 7 | 8 | Convolutional Deep Neural Network; Object Detection; Binarized Deep Neural Network 9 | 10 | ## Contribution 11 | 12 | They showed that, because a CNN-based object detector consists of a bounding box prediction (regression) and a class estimation (classification), the conventional all-binarized CNN fails to recognize objects in most cases. They introduced this problem to the research area and proposed a lightweight YOLOv2 which consists of the conventional binarized CNN with parallel SVRs. They showed the architecture for such a mixed low-precision CNN and SVRs on the FPGA. They demonstrated a performance-per-power efficient object detector on an FPGA. They implemented the proposed lightweight YOLOv2 on the Xilinx Inc. zcu102 Zynq UltraScale+ MPSoC evaluation board. They compared it with the embedded CPU and the embedded GPU with respect to the YOLOv2 (batch size was 1). Compared with the ARM Cortex-A57, it was 177.4 times faster, it dissipated 1.1 times more power, and its performance per power efficiency was 158.9 times better. Also, compared with the Pascal embedded GPU, it was 27.5 times faster, it dissipated 1.5 times lower power, and its performance per power efficiency was 42.9 times better. 
13 | 14 | ## The overall design of the Framework 15 | 16 | ![image-20210906231255653](https://gitee.com/feiyipengfei/pic-md1/raw/master/20210906231302.png) 17 | 18 | ## Performance 19 | 20 | ![image-20210906231420432](https://gitee.com/feiyipengfei/pic-md1/raw/master/20210906231420.png) 21 | 22 | -------------------------------------------------------------------------------- /A Multi-Resolution FPGA-Based Architecture for Real-Time Edge and Corner Detection.md: -------------------------------------------------------------------------------- 1 | # A Multi-Resolution FPGA-Based Architecture for Real-Time Edge and Corner Detection 2 | ## ABSTRACT 3 | 4 | This work presents a new flexible parameterizable architecture for image and video processing with reduced latency and memory requirements, supporting a variable input resolution. The proposed architecture is optimized for feature detection, more specifically, the Canny edge detector and the Harris corner detector. The architecture contains neighborhood extractors and threshold operators that can be parameterized at runtime. Also, algorithm simplifications are employed to reduce mathematical complexity, memory requirements, and latency without losing reliability. Furthermore, we present the proposed architecture implementation on an FPGA-based platform and its analogous optimized implementation on a GPU-based architecture for comparison. A performance analysis of the FPGA and the GPU implementations, and an extra CPU reference implementation, shows the competitive throughput of the proposed architecture even at a much lower clock frequency than those of the GPU and the CPU. Also, the results show a clear advantage of the proposed architecture in terms of power consumption and maintain a reliable performance with noisy images, low latency and memory requirements. 
5 | 6 | ## Index Terms 7 | 8 | Reconfigurable hardware, graphics processors, real-time systems, computer vision, edge and feature detection 9 | 10 | ## Contribution 11 | 12 | They propose a new multi-resolution FPGA-based architecture that supports runtime parameterizations of its internal processing blocks. They also propose an optimized GPU implementation of those algorithms in order to provide a comparison between these two approaches, analyzing their advantages and drawbacks. With proper design constraints and application-to-architecture mapping, they show how FPGAs can be a suitable alternative to GPU-based image and video processing units, both in terms of flexibility and real-time performance. This is especially valid when portability, low-latency and power consumption are needed. This paper is an updated extension of the work in which only the FPGA implementation was addressed. Also, the present work provides additional results obtained with up-to-date FPGA- and GPU-based platforms. 13 | 14 | ## The overall design of the Framework 15 | 16 | ![image-20210908220644285](https://gitee.com/feiyipengfei/pic-md1/raw/master/20210908220644.png) 17 | 18 | ## Performance 19 | 20 | ![image-20210908220712375](https://gitee.com/feiyipengfei/pic-md1/raw/master/20210908220712.png) 21 | 22 | -------------------------------------------------------------------------------- /A Novel FPGA Accelerator Design for Real-Time and Ultra-Low Power Deep Convolutional Neural Networks Compared With Titan.md: -------------------------------------------------------------------------------- 1 | # A Novel FPGA Accelerator Design for Real-Time and Ultra-Low Power Deep Convolutional Neural Networks Compared With Titan X GPU 2 | 3 | ## ABSTRACT 4 | 5 | Convolutional neural networks (CNNs) based deep learning algorithms require high data flow and computational intensity. 
For real-time industrial applications, they need to overcome challenges such as high data bandwidth requirement and power consumption on hardware platforms. In this work, we have analyzed in detail the data dependency in the CNN accelerator and propose specific pipelined operations and data organized manner to design a high throughput CNN accelerator on FPGA. Besides, we have optimized the kernel operations to obtain a high power efficiency. The proposed CNN accelerator supports image classification and real-time object detection with high accuracy. The evaluation results show that our CNN-based FPGA accelerator can achieve 740 Giga operations per second (GOPS) at 200 MHz with kernel power of 12.2 watts on Intel Arria 10 FPGA. For object detection tasks, our system can achieve 105 fps with 56.5 mAP or 25 fps with 73.6 mAP on VOC dataset. Since we use the mixed fixed-point data representation, the detection accuracy is comparable with the GPU-based YOLO V2 framework. The power efficiency of our system is ∼3.3× better than Titan X GPU and ∼418× better than Intel E5-2620 V4 CPU. 6 | 7 | ## INDEX TERMS 8 | 9 | Deep neural network accelerator, FPGA, pipeline architecture, parallel computing, mixed fixed-point, object detection, low power. 10 | 11 | ## Contribution 12 | 13 | - They analyze in detail the data dependency in the CNN accelerator and present a high throughput CNN-based FPGA accelerator. Specifically, they use a pipelined MAC operation structure to remove loop-carried data dependency. They also propose the zigzag fetch unit to remove line data dependency. 14 | - To achieve a high power efficiency, they propose the offline preprocessing and combination of batch normalization (BN) and scale/bias (SB) and approximation expression for kernel computation. 15 | - They have applied the CNN accelerator on advanced multi-object detection frameworks such as tiny YOLO V2 and full YOLO V2. 
To acquire a high accuracy, they use 8-16 bits mixed fixed-point data representation in the object detection task and achieve comparable accuracy compared with Titan X GPU. The demo can be found in. 16 | - Their CNN accelerator provides a definable interface to reconfigure the new CNN model easily, and it supports the Caffe framework. 17 | 18 | ## The overall design of the Framework 19 | 20 | ![image-20210907222042481](https://gitee.com/feiyipengfei/pic-md1/raw/master/20210907222042.png) 21 | 22 | ## Performance 23 | 24 | ![image-20210907222108849](https://gitee.com/feiyipengfei/pic-md1/raw/master/20210907222108.png) 25 | 26 | -------------------------------------------------------------------------------- /A Real-Time Object Detection Accelerator with Compressed SSDLite on FPGA.md: -------------------------------------------------------------------------------- 1 | # A Real-Time Object Detection Accelerator with Compressed SSDLite on FPGA 2 | ## Abstract 3 | 4 | Convolutional neural network (CNN)-based object detection has been widely employed in various applications such as autonomous driving and intelligent video surveillance. However, the computational complexity of conventional convolution hinders its application in embedded systems. Recently, a mobile-friendly CNN model SSDLite-MobileNetV2 (SSDLiteM2) has been proposed for object detection. This model consists of a novel layer called bottleneck residual block (BRB). Although SSDLiteM2 contains far fewer parameters and computations than conventional CNN models, its performance on embedded devices still cannot meet the requirements of real-time processing. This paper proposes a novel FPGA-based architecture for SSDLiteM2 in combination with hardware optimizations including fused BRB, processing element (PE) sharing and load-balanced channel pruning. Moreover, a novel quantization scheme called partial quantization has been developed, which partially quantizes SSDLiteM2 to 8 bits with only 1.8% accuracy loss. 
Experiments show that the proposed design on a Xilinx ZC706 device can achieve up to 65 frames per second with 20.3 mean average precision on the COCO dataset. 5 | 6 | ## Contribution 7 | 8 | - A novel FPGA-based architecture for SSDLiteM2, which supports multiple types of convolutions with different kernel sizes. 9 | - Several innovative hardware optimizations such as fused BRB, PE sharing and load-balanced channel pruning, which improve the overall performance as well as the hardware efficiency. 10 | - Complementary software optimizations including partial quantization and bias folding, which reduce not only the computational complexity but also the amount of parameters. 11 | 12 | ## The overall design of the Framework 13 | 14 | ![image-20210907075607954](https://gitee.com/feiyipengfei/pic-md1/raw/master/20210907075615.png) 15 | 16 | ## Performance 17 | 18 | ![image-20210907075757746](https://gitee.com/feiyipengfei/pic-md1/raw/master/20210907075757.png) 19 | 20 | -------------------------------------------------------------------------------- /A new multi-scroll Chua’s circuit with composite hyperbolic tangent-cubic nonlinearity Complex dynamics, Hardware implementation and Image encryption application.md: -------------------------------------------------------------------------------- 1 | # A new multi-scroll Chua’s circuit with composite hyperbolic tangent-cubic nonlinearity: Complex dynamics, Hardware implementation and Image encryption application 2 | 3 | ## ABSTRACT 4 | 5 | In this paper, a new method for generating multi-scroll chaotic attractors via constructing a compound hyperbolic tangent-cubic nonlinear function in canonical Chua’s circuit is presented. The basic dynamic characteristics of the system are analyzed, including equilibrium points, bifurcation diagrams, Lyapunov exponents, phase portraits, time-domain diagrams and attractive basins. 
What is surprising is that the proposed multi-scroll Chua’s circuit also exhibits rich dynamic behaviors like coexisting multiple attractors, transient period, intermittent chaos and offset boosting. In addition, we put forward the application of the system in chaotic image encryption, and analyzed some security performance evaluation indexes to show that the new Chua’s chaotic cipher system has high security and reliable encryption performance. Finally, the hardware design and experiments of the chaotic digital circuits and image encryption are carried out. Both numerical simulation and FPGA experimental results verify the feasibility and usability of the proposed new multi-scroll Chua’s system. 6 | 7 | ## Contribution 8 | 9 | This article proposes a new Chua’s circuit with smooth compound hyperbolic tangent-cubic nonlinearity, which can generate multi-scroll chaotic attractors. The dynamical behaviors of the proposed system are investigated through evaluating bifurcation diagrams, Lyapunov exponents, phase portraits, time-domain diagrams and attractive basins. Interestingly, qualitative studies show that the new Chua’s circuit exhibits coexisting multiple attractors, transient period, intermittent chaos and offset boosting. In addition, they also apply the system to image encryption, and analyze it through some security performance evaluation methods. Finally, they implement the digital circuit and image encryption of the new multi-scroll Chua’s chaotic system on FPGA platform, and the experimental observation of the attractors proves that it is suitable for generating chaotic behavior, and it is verified that the application in image encryption has a good encryption effect. 
10 | 11 | ## The overall design of the Framework 12 | 13 | ![image-20210913225338549](https://gitee.com/feiyipengfei/pic-md1/raw/master/20210913225345.png) 14 | 15 | ## Main experimental analysis 16 | 17 | Parameter *𝑎* is fixed at 0.3, and parameter *𝑏* is the control parameter used to control the number of scrolls. When the initial values of the system are chosen as [0.5, 0, 0] and parameters *𝛼* = 8, *𝛽* = 12, Fig. 1 displays the numerical simulation results of one-to-three-scroll Chua’s attractors with different parameter values of *𝑏*. The bifurcation diagram and Lyapunov exponents spectrum of the system with parameter *𝑏* are shown in Fig. 2. 18 | 19 | ![image-20210913230747931](https://gitee.com/feiyipengfei/pic-md1/raw/master/20210913230747.png) 20 | 21 | ![image-20210913230835981](https://gitee.com/feiyipengfei/pic-md1/raw/master/20210913230836.png) 22 | 23 | As shown in Fig. 3, taking the parameters *𝑎* = 0.3, *𝑏* = 0.6 as an example, the curve of smooth nonlinear hyperbolic tangent-cubic function has five determined equilibrium points, among which four non-zero equilibrium points are symmetrical with respect to the origin. 
24 | 25 | ![image-20210913231045490](https://gitee.com/feiyipengfei/pic-md1/raw/master/20210913231045.png) 26 | 27 | ## Performance 28 | 29 | ![image-20210913225424696](https://gitee.com/feiyipengfei/pic-md1/raw/master/20210913225424.png) 30 | 31 | ![image-20210913230425784](https://gitee.com/feiyipengfei/pic-md1/raw/master/20210913230425.png) 32 | 33 | 34 | 35 | -------------------------------------------------------------------------------- /A robust and fixed-time zeroing neural dynamics for computing time-variant nonlinear equation using a novel nonlinear activation function.md: -------------------------------------------------------------------------------- 1 | # A robust and fixed-time zeroing neural dynamics for computing time-variant nonlinear equation using a novel nonlinear activation function 2 | 3 | ## ABSTRACT 4 | 5 | Nonlinear activation functions play an important role in zeroing neural network (ZNN), and it has been proved that ZNN can achieve finite-time convergence when the sign-bi-power (SBP) activation function is explored. However, its upper bound depends on initial states of ZNN seriously, which will restrict some practical applications since the knowledge of initial conditions is generally unavailable in advance. Besides, SBP activation function does not make ZNN reject external disturbances simultaneously. To address the above two issues encountered by ZNN, by suggesting a new nonlinear activation function, a robust and fixed-time zeroing neural dynamics (RaFT-ZND) model is proposed and analyzed for time-variant nonlinear equation (TVNE). As compared to the previous ZNN model with SBP activation function, the RaFT-ZND model not only converges to the theoretical solution of TVNE within a fixed time, but also rejects external disturbances to show good robustness. In addition, the upper bound of the fixed-time convergence is theoretically computed in mathematics, which is independent of initial states of the RaFT-ZND model. 
At last, computer simulations are conducted under external disturbances, and comparative results demonstrate the effectiveness, robustness, and advantage of the RaFT-ZND model for solving TVNE. 6 | 7 | ## Contribution 8 | 9 | - A new activation function is developed to modify the comprehensive performance of zeroing neural dynamics. As compared to the previous activation functions, the new activation function can achieve the best results. 10 | 11 | - On basis of such a new activation function, a robust and fixed-time zeroing neural dynamics (RaFT-ZND) model is proposed and analyzed for time-variant nonlinear equation (TVNE). In addition, the detailed theoretical analyses about robustness and fixed-time convergence of the RaFT-ZND model are presented. 12 | 13 | - We give a TVNE example to test the effort of the RaFT-ZND model by comparing the previous ZND model with existing activation functions. Simulative results demonstrate that the RaFT-ZND model is a better model, and combines the robust and fixed-time convergent merits. 14 | 15 | ## Performance 16 | 17 | ![img](https://gitee.com/feiyipengfei/pic-md1/raw/master/20210913231359.JPG) 18 | 19 | ![img](https://gitee.com/feiyipengfei/pic-md1/raw/master/20210913231415.JPG) -------------------------------------------------------------------------------- /ASIC and FPGA Implementation of the Gaussian Mixture Model Algorithm for Real-Time Segmentation of High Definition video.md: -------------------------------------------------------------------------------- 1 | # ASIC and FPGA Implementation of the Gaussian Mixture Model Algorithm for Real-Time Segmentation of High Definition video 2 | ## ABSTRACT 3 | 4 | Background identification is a common feature in many video processing systems. This paper proposes two hardware implementations of the OpenCV version of the Gaussian mixture model (GMM), a background identification algorithm. 
The implemented version of the algorithm allows a fast initialization of the background model while an innovative, hardware-oriented formulation of the GMM equations makes the proposed circuits able to perform real-time background identification on high definition (HD) video sequences with frame size 1920 × 1080. The first of the two circuits is designed with commercial field-programmable gate-array (FPGA) devices as target. When implemented on Virtex6 vlx75t, the proposed circuit processes 91 HD fps (frames per second) and uses 3% of FPGA logic resources. The second circuit is oriented to the implementation in UMC-90 nm CMOS standard cell technology, and is proposed in two versions. Both versions can process at a frame rate higher than 60 HD fps. The first version uses the constant voltage scaling technique to provide a low power implementation. It provides silicon area occupation of 28847 µm² and energy dissipation per pixel of 15.3 pJ/pixel. The second version is designed to reduce silicon area utilization and occupies 21847 µm² with an energy dissipation of 49.4 pJ/pixel. 5 | 6 | ## Index Terms 7 | 8 | Application-specific integrated circuits (ASICs), computer vision, field programmable gate arrays (FPGAs), image motion analysis, object detection, subtraction techniques. 9 | 10 | ## Contribution 11 | 12 | 1) An innovative, hardware-oriented formulation of the GMM equations that allows hardware saving and speed improvement without affecting the output of the GMM algorithm. 13 | 14 | 2) The implementation of the above cited GMM equations in an FPGA-oriented circuit that outperforms previously proposed circuits. 15 | 16 | 3) The ASIC standard cell implementation of the proposed background (Bg) identification circuit (in two versions that optimize power and silicon area occupation, respectively), thus providing a performance reference for ASIC designers that is still missing in the scientific literature. 
17 | 18 | 4) The experimental demonstration of the proposed FPGA circuit in running on-line video systems. 19 | 20 | ## The overall design of the Framework 21 | 22 | ![image-20210908214522428](https://gitee.com/feiyipengfei/pic-md1/raw/master/20210908214529.png) 23 | 24 | ## Performance 25 | 26 | ![image-20210908215020355](https://gitee.com/feiyipengfei/pic-md1/raw/master/20210908215020.png) 27 | 28 | ![image-20210908215107368](https://gitee.com/feiyipengfei/pic-md1/raw/master/20210908215107.png) 29 | 30 | -------------------------------------------------------------------------------- /Accelerating Tiny YOLO v3 using FPGA-based HardwareSoftware Co-Design.md: -------------------------------------------------------------------------------- 1 | # Accelerating Tiny YOLO v3 using FPGA-based Hardware/Software Co-Design 2 | ## ABSTRACT 3 | 4 | Convolutional Neural Networks (CNNs) are influencing major breakthroughs in computer vision by achieving unprecedented accuracy on tasks such as image classification, object detection, landmark detection and semantic segmentation. Owing to high computational complexity of most modern CNN architectures, graphical processing units (GPUs) are being utilized to achieve real-time performance albeit at a high energy cost. Consequently, Field Programmable Gate Arrays (FPGAs) based hardware accelerators are also making their way as they demonstrate GPU-like performance with significantly lower energy consumption that is well-suited for embedded vision applications. In this paper, we employ Hardware/Software Co-Design approach to accelerate Tiny YOLOv3 – an efficient CNN architecture for object detection – by designing a hardware accelerator for convolution, the most complex operation involved in the CNNs. Experimental results show significant performance gains, in the range of 3.9× to 21.3×, over previous implementations of efficient object detection algorithms. 
5 | 6 | ## Contribution 7 | 8 | - They perform detailed profiling of Tiny YOLOv3 to categorize computational workloads by operations. 9 | - They propose a hardware accelerator based on hardware/software co-design approach and discuss optimizations leading to an efficient design. 10 | - They design a parallel and pipelined hardware accelerator to offload the operations that are computationally intractable for the soft-core processor. 11 | 12 | ## The overall design of the Framework 13 | 14 | ![image-20210907213615720](https://gitee.com/feiyipengfei/pic-md1/raw/master/20210907213615.png) 15 | 16 | ## Performance 17 | 18 | ![image-20210907213730202](https://gitee.com/feiyipengfei/pic-md1/raw/master/20210907213730.png) 19 | 20 | -------------------------------------------------------------------------------- /An Energy-Efficient FPGA-Based Deconvolutional Neural Networks Accelerator for Single Image Super-Resolution.md: -------------------------------------------------------------------------------- 1 | # An Energy-Efficient FPGA-Based Deconvolutional Neural Networks Accelerator for Single Image Super-Resolution 2 | 3 | ## ABSTRACT 4 | 5 | Convolutional neural networks (CNNs) demonstrate excellent performance in various computer vision applications. In recent years, FPGA-based CNN accelerators have been proposed for optimizing performance and power efficiency. Most accelerators are designed for object detection and recognition algorithms that are performed on low-resolution images. However, real-time image super-resolution (SR) cannot be implemented on a typical accelerator because of the long execution cycles required to generate high-resolution (HR) images, such as those used in ultra-high-definition systems. In this paper, we propose a novel CNN accelerator with efficient parallelization methods for SR applications. First, we propose a new methodology for optimizing the deconvolutional neural networks (DCNNs) used for increasing feature maps. 
Second, we propose a novel method to optimize CNN dataflow so that the SR algorithm can be driven at low power in display applications. Finally, we quantize and compress a DCNN-based SR algorithm into an optimal model for efficient inference using on-chip memory. We present an energy-efficient architecture for SR and validate our architecture on a mobile panel with quad-high-definition resolution. Our experimental results show that, with the same hardware resources, the proposed DCNN accelerator achieves a throughput up to 108 times greater than that of a conventional DCNN accelerator. In addition, our SR system achieves an energy efficiency of 144.9, 293.0, and 500.2 GOPS/W at SR scale factors of 2, 3, and 4, respectively. Furthermore, we demonstrate that our system can restore HR images to a high quality while greatly reducing the data bit-width and the number of parameters compared with conventional SR algorithms. 6 | 7 | ## Index Terms 8 | 9 | Accelerator architectures, deep neural networks (DNNs), deep learning, super-resolution, system architecture. 10 | 11 | ## Contribution 12 | 13 | - They propose a novel DCNN accelerator that can be massively parallelized by transforming the deconvolutional layer into the convolutional layer (the TDC method). In their previous work, they identified a load imbalance problem during the convolution process executed by the TDC method. To overcome this problem, they propose a new load balance-aware TDC method that increases the efficiency of sparse matrix multiplication. 14 | - They propose a dataflow for hardware acceleration to store the intermediate data between the layers using the on-chip memory. 15 | - They quantize and compress a representative DCNN-based SR algorithm, called FSRCNN, into an optimal model for efficient inference using on-chip memory. The same optimization process can be applied to other SR algorithms so that they can also be implemented in on-chip memory. They present an energy-efficient DNN-based SR system. 
Their system achieves an energy efficiency of 144.9 GOPS/W, 293.0 GOPS/W, and 500.2 GOPS/W for SR scale factors of 2, 3, and 4, respectively. 16 | 17 | ## The overall design of the Framework 18 | 19 | ![image-20210907231853443](https://gitee.com/feiyipengfei/pic-md1/raw/master/20210907231853.png) 20 | 21 | ## Performance 22 | 23 | ![image-20210907231951406](C:\Users\Administrator\AppData\Roaming\Typora\typora-user-images\image-20210907231951406.png) 24 | 25 | ![image-20210907232030556](https://gitee.com/feiyipengfei/pic-md1/raw/master/20210907232030.png) 26 | 27 | ![image-20210907232110253](https://gitee.com/feiyipengfei/pic-md1/raw/master/20210907232110.png) 28 | 29 | ![image-20210907232130231](https://gitee.com/feiyipengfei/pic-md1/raw/master/20210907232130.png) 30 | 31 | ![image-20210907232205552](https://gitee.com/feiyipengfei/pic-md1/raw/master/20210907232205.png) 32 | 33 | -------------------------------------------------------------------------------- /An Extremely Simple Multi-Wing Chaotic System:Dynamics Analysis, Encryption Application and Hardware Implementation.md: -------------------------------------------------------------------------------- 1 | # An Extremely Simple Multi-Wing Chaotic System: Dynamics Analysis, Encryption Application and Hardware Implementation 2 | ## ABSTRACT 3 | 4 | Polynomial functions have been the main barrier restricting the circuit realization and engineering application of multi-wing chaotic systems (MWCSs). To eliminate this bottleneck, we construct a simple MWCS without polynomial functions by introducing a sinusoidal function in a Sprott C system. Theoretical analysis and numerical simulations show that the MWCS can not only generate multi-butterfly attractors with an arbitrary number of butterflies, but also adjust the number of the butterflies by multiple ways including self-oscillating time, control parameters, and initial states. 
To further explore the advantage of the proposed MWCS, we realize its analog circuit using commercially available electronic elements. The results demonstrate that in comparison to traditional MWCSs, our circuit implementation greatly reduces the number of electronic components required. This makes the MWCS more suitable for many chaos-based engineering applications. Furthermore, we propose an application of the MWCS to chaotic image encryption. Histogram, correlation, information entropy, and key sensitivity show that the simple image encryption scheme has high security and reliable encryption performance. Finally, we develop a field-programmable gate array (FPGA) test platform to implement the MWCS-based image cryptosystem. Both theoretical analysis and experimental results verify the feasibility and availability of the proposed MWCS. 5 | 6 | ## Index Terms 7 | 8 | Chaotic system, image encryption, FPGA implementation, nonlinear circuit, multi-butterfly attractor, multistability 9 | 10 | ## Contribution 11 | 12 | - An extremely simple MWCS with no polynomial functions is designed, and its unique dynamics properties are revealed. 13 | - A MWCS-based image encryption scheme is proposed, and its various security metrics are analyzed. 14 | - The image cryptosystem based on the MWCS is implemented and demonstrated on the FPGA platform. 15 | 16 | ## Main experimental analysis 17 | 18 | In Equation, the equilibrium points of the system are determined by the equation *z*=*kπ*. That is, the equilibrium points extend in order along the *z*-axis as *k* increases. It should be pointed out that, unlike in previous MWCSs, all equilibrium points exist in the system itself, rather than being extended by adding additional polynomial functions. Thus, the new system does not require additional polynomial functions to yield multi-butterfly attractors. For *k*=0, *±*1, *±*2, *±*3, *±*4, *±*5, Fig. 
1 illustrates the distribution of the equilibrium points on the *k*-*z* plane and the *x*-*z* plane with the increase of *k*. We can see that the equilibrium points of the system extend along the *z*-axis as *k* increases. All equilibrium points are symmetric about the *z*-axis, which forms infinite pairs of equilibrium points with the same stabilities. Moreover, the nonhyperbolic equilibrium points and the unstable index-1 saddle points extend alternately along the *z*-axis direction in the *x*-*z* plane. 19 | 20 | ![image-20210913232530969](https://gitee.com/feiyipengfei/pic-md1/raw/master/20210913232531.png) 21 | 22 | In this subsection, the dynamic characteristics with respect to the parameter *c* are investigated by using the bifurcation diagram and Lyapunov exponents. With the parameters *a*=1, *b*=2.1 and the initial values (*x*0, *y*0, *z*0)=(0.1, 0.1, 0.1), we can plot the bifurcation diagrams and Lyapunov exponents of the system with respect to the parameter *c*∈(0, 1.8), as shown in Fig. 2, where xmax denotes the maxima of the *x* variable. Fig. 2(a) shows that the system exhibits a forward period-doubling bifurcation as the parameter *c* increases. When *c* increases to 0.055, the system enters an intermittent chaotic area that lasts until *c*=0.62. As *c* increases further, the chaotic state degenerates to a periodic state by tangent bifurcation. However, the periodic state quickly evolves into a stable chaotic area by the forward period-doubling bifurcation route again. Finally, the chaotic state degenerates to a stable point through the tangent bifurcation route at *c*=1.7. The Lyapunov exponents in Fig. 2(b) are basically consistent with the dynamical behavior on the bifurcation in Fig. 2(a). The phase portraits of the attractors of the system with different values of *c* are further presented to illustrate its dynamic evolution with the parameter, as shown in Fig. 3. 
Clearly, the system generates different attractors from the initial values (0.1, 0.1, 0.1) for *c*=0.048, 0.051, 0.01, 1.2, respectively. 23 | 24 | ![image-20210913232843243](https://gitee.com/feiyipengfei/pic-md1/raw/master/20210913232843.png) 25 | 26 | ## Performance 27 | 28 | ![image-20210913232936439](https://gitee.com/feiyipengfei/pic-md1/raw/master/20210913232936.png) 29 | 30 | ![image-20210913233017057](https://gitee.com/feiyipengfei/pic-md1/raw/master/20210913233017.png) 31 | 32 | ![image-20210913233115175](https://gitee.com/feiyipengfei/pic-md1/raw/master/20210913233115.png) 33 | 34 | -------------------------------------------------------------------------------- /An Object Detector based on Multiscale Sliding Window Search using a Fully Pipelined Binarized CNN on an FPGA.md: -------------------------------------------------------------------------------- 1 | # An Object Detector based on Multiscale Sliding Window Search using a Fully Pipelined Binarized CNN on an FPGA 2 | ## ABSTRACT 3 | 4 | An object detection problem consists of two problems: one is classification of the detected object category and the other is localization. Frame object detection is used in embedded vision systems, such as a robot, an automobile, a security camera, and a drone. These applications require high-performance computation and low-power consumption by an inexpensive device. This paper proposes a multiscale sliding window based object detector using a fully pipelined binarized deep convolutional neural network (BCNN) on an FPGA. It consists of a sliding window part, a fully pipelined BCNN classifier, and an ARM processing unit for detection. Duplicate detections were filtered by using a non-maximum suppression algorithm running on the ARM processor. We propose the fully pipelined layers for the BCNN and its architecture for FPGA realization. 
Since the proposed BCNN circuit uses on-chip memories on the FPGA, its throughput is higher than a GPU based one with practical recognition accuracy. We trained the VGG11 based BCNN using the KITTI vision benchmark for the car detection scenario. Then, we implemented the proposed object detector on the Xilinx Inc. Zynq UltraScale+ MPSoC zcu102 evaluation board. The GPU based object detectors were too slow for the realtime application requirement (HD frame rate), with the exception of YOLOv2. Compared with the GPU implementation of YOLOv2, the proposed FPGA detector had higher recognition accuracy and lower power consumption. Thus, the FPGA based object detector is suitable for embedded realtime applications. 5 | 6 | ## Contribution 7 | 8 | - They propose a multiscale sliding window based object detector on the FPGA. It uses a binarized deep neural network (BCNN) circuit consisting of heterogeneous pipeline circuits, a structure that decreases throughput on modern GPUs. Thus, their detector is faster than GPU based ones. The proposed architecture is based on a multiscale sliding window algorithm, so many techniques can be applied to it, such as post processing to reduce the number of image extractions. Therefore, the realization is relevant to other computer vision hardware researchers, who may apply their own results to it. 9 | 10 | - They implemented the proposed object detector on a Xilinx Inc. zcu102 Zynq UltraScale+ MPSoC evaluation board. They compared their FPGA based object detector with GPU based ones on the KITTI vision benchmark suite for a car detector (medium). Although most GPU detectors, except for YOLOv2, achieved around 90% recognition accuracy, their frame rates were lower than the high definition (HD) television frame rate (29.97 FPS). Thus, they cannot be used for real-time detection on an embedded system. 
Only YOLOv2 and their FPGA realization exceeded the HD frame rate. As for the recognition accuracy, the FPGA realization is higher than the YOLOv2 based one. The other advantage of the FPGA realization is the power consumption. YOLOv2 requires 250 W using the NVIDIA Titan X (Pascal architecture), while their FPGA realization requires only 2.5 W. Thus, the FPGA based object detector is suitable for embedded realtime applications. 11 | 12 | ## The overall design of the Framework 13 | 14 | ![image-20210907210426512](https://gitee.com/feiyipengfei/pic-md1/raw/master/20210907210426.png) 15 | 16 | ## Performance 17 | 18 | ![image-20210907210455837](https://gitee.com/feiyipengfei/pic-md1/raw/master/20210907210455.png) 19 | 20 | -------------------------------------------------------------------------------- /An efficient and cost effective FPGA based implementation of the Viola-Jones face detection algorithm.md: -------------------------------------------------------------------------------- 1 | # An efficient and cost effective FPGA based implementation of the Viola-Jones face detection algorithm 2 | ## ABSTRACT 3 | 4 | We present a field-programmable gate array (FPGA) based implementation of the popular Viola-Jones face detection algorithm, which is an essential building block in many applications such as video surveillance and tracking. Our implementation is a complete system level hardware design described in a hardware description language and validated on the affordable DE2-115 evaluation board. Our primary objective is to study the achievable performance with a low-end FPGA chip based implementation. In addition, we release to the public domain the entire project. We hope that this will enable other researchers to easily replicate and compare their results to ours and that it will encourage and facilitate further research and educational ideas in the areas of image processing, computer vision, and advanced digital design and FPGA prototyping. 
5 | 6 | ## Contribution 7 | 8 | They present a complete hardware implementation of the Viola-Jones face detection algorithm on a low-end FPGA chip. They focus on the Viola-Jones face detection algorithm due to its popularity and efficiency and because it underlies many other face detection algorithms. Their hardware implementation is described entirely in a hardware description language (HDL). They compare their HDL implementation to software-based implementations executed on general purpose processors (CPUs). The FPGA based hardware implementation offers lower performance, measured in frames per second (fps), than the software CPU-only implementations for an image size of 320 × 240 pixels. However, it represents a good solution from a performance-power-price point of view. In addition, the FPGA based implementation has the potential to improve performance if deployed with greater parallelism, especially for larger image sizes on more complex but also more expensive FPGA chips. As such, they release the FPGA based implementation to the public domain. 9 | 10 | ## The overall design of the Framework 11 | 12 | ![image-20210909103626302](https://gitee.com/feiyipengfei/pic-md1/raw/master/20210909103626.png) 13 | 14 | ## Performance 15 | 16 | ![image-20210909103714405](https://gitee.com/feiyipengfei/pic-md1/raw/master/20210909103714.png) 17 | 18 | -------------------------------------------------------------------------------- /Angel-Eye A Complete Design Flow for Mapping CNN Onto Embedded FPGA.md: -------------------------------------------------------------------------------- 1 | # Angel-Eye: A Complete Design Flow for Mapping CNN Onto Embedded FPGA 2 | 3 | ## ABSTRACT 4 | 5 | The convolutional neural network (CNN) has become a successful algorithm in the field of artificial intelligence and a strong candidate for many computer vision algorithms. However, the computational complexity of CNNs is much higher than that of traditional algorithms. 
With the help of GPU acceleration, CNN based applications are widely deployed in servers. However, for embedded platforms, CNN-based solutions are still too complex to be applied. Various dedicated hardware designs on field programmable gate arrays (FPGAs) have been carried out to accelerate CNNs, while few of them explore the whole design flow for both fast deployment and high power efficiency. In this paper, we investigate state-of-the-art CNN models and CNN-based applications. Requirements on memory, computation and the flexibility of the system are summarized for mapping CNN on embedded FPGAs. Based on these requirements, we propose Angel-Eye, a programmable and flexible CNN accelerator architecture, together with a data quantization strategy and a compilation tool. The data quantization strategy helps reduce the bit-width down to 8 bits with negligible accuracy loss. The compilation tool maps a given CNN model efficiently onto hardware. Evaluated on the Zynq XC7Z045 platform, Angel-Eye is 6× faster and 5× better in power efficiency than a peer FPGA implementation on the same platform. Applications of VGG network, pedestrian detection and face alignment are used to evaluate our design on Zynq XC7Z020. NVIDIA TK1 and TX1 platforms are used for comparison. Angel-Eye achieves similar performance and delivers up to 16× better energy efficiency. 6 | 7 | ## Index Terms 8 | 9 | Convolutional neural network (CNN), design flow, embedded field-programmable gate array (FPGA), hardware/software co-design. 10 | 11 | ## Contribution 12 | 13 | - A data quantization strategy to compress the original network to a fixed-point form. 14 | - A parameterized and run-time configurable hardware architecture to support various networks and fit into various platforms. 15 | - A compiler is proposed to map a CNN model onto the hardware architecture. 
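
The first contribution above, compressing a network to an 8-bit fixed-point form, can be illustrated with a minimal sketch. It assumes a simple scheme in which each tensor is scaled by a power of two chosen so that its largest magnitude still fits in the signed 8-bit range; the function names `choose_frac_bits`, `quantize`, and `dequantize` are hypothetical and are not taken from the paper, whose actual quantization strategy may differ.

```python
def choose_frac_bits(values, total_bits=8):
    """Pick the largest fractional bit count such that max |v| still fits.

    Illustrative assumption: one power-of-two scale per tensor.
    """
    qmax = (1 << (total_bits - 1)) - 1
    max_abs = max(abs(v) for v in values)
    frac_bits = total_bits - 1
    # Drop fractional bits until the largest magnitude is representable.
    while max_abs * 2.0 ** frac_bits > qmax:
        frac_bits -= 1
    return frac_bits


def quantize(values, frac_bits, total_bits=8):
    """Map floats to signed integers, rounding and saturating at the range limits."""
    qmin = -(1 << (total_bits - 1))
    qmax = (1 << (total_bits - 1)) - 1
    scale = 2.0 ** frac_bits
    return [min(qmax, max(qmin, round(v * scale))) for v in values]


def dequantize(codes, frac_bits):
    """Recover approximate floats from the integer codes."""
    scale = 2.0 ** frac_bits
    return [c / scale for c in codes]


weights = [0.5, -0.25, 0.75, -1.5]   # toy values, not from the paper
frac_bits = choose_frac_bits(weights)
codes = quantize(weights, frac_bits)
restored = dequantize(codes, frac_bits)
```

Because the scale is a power of two, rescaling in hardware reduces to a bit shift, which is one reason this kind of scheme is often considered for FPGA targets.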
16 | 17 | ## The overall design of the Framework 18 | 19 | ![image-20210907215806282](https://gitee.com/feiyipengfei/pic-md1/raw/master/20210907215806.png) 20 | 21 | ## Performance 22 | 23 | ![image-20210907215946395](https://gitee.com/feiyipengfei/pic-md1/raw/master/20210907215946.png) 24 | 25 | -------------------------------------------------------------------------------- /Design and FPGA Implementation of a Pseudo-random Number Generator Based on a Hopfield Neural Network Under Electromagnetic Radiation.md: -------------------------------------------------------------------------------- 1 | # Design and FPGA Implementation of a Pseudo-random Number Generator Based on a Hopfield Neural Network Under Electromagnetic Radiation 2 | 3 | ## ABSTRACT 4 | 5 | When implementing a pseudo-random number generator (PRNG) for neural network chaos-based systems on FPGAs, chaotic degradation caused by numerical accuracy constraints can have a dramatic impact on the performance of the PRNG. To suppress this degradation, a PRNG with a feedback controller based on a Hopfield neural network chaotic oscillator is proposed, in which a neuron is exposed to electromagnetic radiation. We choose the 6 | magnetic flux across the cell membrane of the neuron as a feedback condition of the feedback controller to disturb other neurons, thus avoiding periodicity. The proposed PRNG is modeled and simulated on Vivado 2018.3 software and implemented and synthesized by the FPGA device ZYNQ-XC7Z020 on Xilinx using Verilog HDL code. As the basic entropy source, the Hopfield neural network with one neuron exposed to electromagnetic radiation has been implemented on the FPGA using the high precision 32-bit Runge Kutta fourth-order method (RK4) algorithm from the IEEE 754-1985 floating point standard. The post-processing module consists of 32 registers and 15 XOR comparators. The binary data generated by the scheme was tested and analyzed using the NIST 800.22 statistical test suite. 
The results show that it has high security and randomness. Finally, an image encryption 7 | and decryption system based on PRNG is designed and implemented on FPGA. The feasibility of the system is proved by simulation and security analysis. 8 | 9 | ## Keywords 10 | 11 | PRNG, hopfield neural network, electromagnetic radiation, chaotic degradation, FPGA, security analysis, image encryption and decryption system 12 | 13 | ## Contribution 14 | 15 | In this paper, a PRNG with a feedback controller based on the improved Hopfield chaotic neural network oscillator is proposed and well implemented on FPGA. Among them, the magnetic flux of neurons is taken as the judgment condition, and the feedback controller is used to add the corresponding 16 | interference factor to the neurons with the highest Lyapunov exponent, so as to reduce the influence of chaos degradation on the generated random numbers and improve the randomness of the random sequence. The post-processing unit consists of 32 registers and 15 XOR comparators. From the chip statistics, it can be seen that the PRNG can be implemented on FPGA and the output data rate can be up to 16.2 Mbit/s. The performance 17 | of the PRNG was tested. The security analysis and FPGA implementation of the image encryption and decryption system based on PRNG show that PRNG has good randomness and engineering application value. Existing feedback controllers and post-processing algorithms will be improved in the future to further improve the randomness of the PRNG and reduce the impact of chaotic degradation. 
18 | 19 | ## The overall design of the Framework 20 | 21 | ![image-20210913233515245](https://gitee.com/feiyipengfei/pic-md1/raw/master/20210913233515.png) 22 | 23 | ## Performance 24 | 25 | ![image-20210913233745718](https://gitee.com/feiyipengfei/pic-md1/raw/master/20210913233745.png) 26 | 27 | -------------------------------------------------------------------------------- /Energy-Efficient Pedestrian Detection System Exploiting Statistical Error Compensation for Lossy Memory Data Compression.md: -------------------------------------------------------------------------------- 1 | # Energy-Efficient Pedestrian Detection System: Exploiting Statistical Error Compensation for Lossy Memory Data Compression 2 | ## ABSTRACT 3 | 4 | Pedestrian detection represents an important application for embedded vision systems. Focusing on the most energy constrained implementations, systems have typically employed histogram of oriented gradients features and support vector machine classification, which leads to low detection accuracy (a log-average miss rate of 68% on the Caltech Pedestrian dataset). Additionally, single-scale detection is often adopted in these systems for real-time processing, which further deteriorates the detection performance. In this paper, we propose a hardware accelerator achieving substantially higher detection accuracy by employing aggregated channel features (ACFs) at multiple different scales and using boosted decision trees for classification. Though resulting in higher accuracy, the higher dimensionality ACFs exacerbate memory operations, which become the energy and speed bottlenecks of the system. To overcome this, we employ binary discrete cosine transform to perform low-overhead and lossy compression, to efficiently store and access feature data. For restoring performance following compression, we exploit retraining of the classifier, resulting in an optimal model for pedestrian detection. 
The proposed accelerator is implemented in field-programmable gate array, which can process 40 video graphics array frames (640 × 480 resolution) per second at a log-average miss rate of 42% on the Caltech Pedestrian dataset, with compression reducing memory energy by 4× and overall energy by 1.7×. 5 | 6 | ## Index Terms 7 | 8 | Energy efficiency, error compensation, field-programmable gate array (FPGA), lossy compression, pedestrian detection. 9 | 10 | ## Contribution 11 | 12 | 1) The concept of statistical error compensation for overcoming feature data errors from compression loss is presented and analyzed for the potential it holds in addressing memory operations. They find that substantial opportunity for compression on feature data opens up, namely three times more with statistical error compensation than without it, in our demonstrated system. 13 | 14 | 2) An architecture for pedestrian detection is developed whereby the use of lossy compression is exploited toward mitigating expensive memory operations. However, performing compression over a block of feature data necessitates additional buffering, which introduces additional memory overhead besides the compression and decompression operational overheads. In order to minimize the overheads, they perform architectural analysis and employ a hardware-friendly compression approach. Ultimately, the demonstrated architecture achieves four times reduction in memory energy. 15 | 16 | 3) Their proposed memory compression architecture is implemented using field-programmable gate array (FPGA). For improving detection accuracy, we use boosted decision trees on high-dimensional aggregated channel features (ACFs). Furthermore, multiscale detection is supported to increase accuracy, and the computations involved are accelerated using the method of feature pyramid approximation. 
The system can run at 40 video graphics array (VGA) frames per second (fps) and achieves high detection accuracy (a log-average miss rate of 42%) on the Caltech Pedestrian dataset. Compared to a baseline system without feature compression, the log-average miss rate is increased by just 1%, thanks to statistical error compensation. The 4× memory energy reduction by feature data compression yields a 1.7× overall energy reduction for the system. 17 | 18 | ## The overall design of the Framework 19 | 20 | ![image-20210909165357941](https://gitee.com/feiyipengfei/pic-md1/raw/master/20210909165358.png) 21 | 22 | ## Performance 23 | 24 | ![image-20210909165428437](https://gitee.com/feiyipengfei/pic-md1/raw/master/20210909165428.png) 25 | 26 | -------------------------------------------------------------------------------- /FPGA acceleration of Hyperspectral Image Processing for High-Speed Detection Applications.md: -------------------------------------------------------------------------------- 1 | # FPGA acceleration of Hyperspectral Image Processing for High-Speed Detection Applications 2 | ## ABSTRACT 3 | 4 | Recent advances in photonics and imaging technology allow the development of cutting-edge, lightweight hyperspectral sensors, both push-broom/line-scanning and snapshot/frame. At the same time, emerging applications in robotics, food inspection, medicine and earth observation are posing critical challenges on real-time processing and computational efficiency, both in terms of accuracy and power consumption. In this direction, in the current paper, we accelerate hyperspectral processing kernels by utilizing FPGAs, i.e., Zynq-7000 SoC, to perform similarity-based matching of spectral signatures. We propose a custom HW architecture based on multi-level parallelization, modularity, and parametric VHDL coding, which allows for in-depth design space exploration and trade-off analysis. 
Depending on configuration, our implementation processes 22−107 Megapixels per second providing an acceleration of 40−355x vs Intel-i3 CPU and 360−10⁴x vs the embedded ARM Cortex A9, whereas the overall detection quality ranges from 56% to 97% when evaluated with multiple objects and images of 285 spectral channels. 5 | 6 | ## Contribution 7 | 8 | They proposed a highly parallel parametric architecture to detect numerous hyperspectral signatures/pixels on the fly via similarity-based matching. Implemented on a Zynq XC7Z045 FPGA, the HW throughput increases almost proportionally to cost, sustaining 22−107 Megapixels per second with various matching metrics, e.g., L1, L2, χ2, while consuming 8−82K LUTs. Depending on configuration, the HW speedup is 40−355x vs SW execution on Intel Core-i3 and 360−10⁴x vs ARM Cortex-A9. Evaluated with the APEX dataset, the FPGA provides overall detection quality of 56−97% depending on image and object. Their accuracy-speed-cost exploration showed that the most efficient metric for straightforward matching is L1 and, also, that using multiple signatures per object consistently improves the L1 detection accuracy by up to 12%.
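As an illustration of the similarity-based matching described above, the following is a minimal software sketch, not the paper's VHDL design. The function names, the toy four-channel spectra, and the particular form of the χ2 distance are assumptions made for this example; the paper may define its metrics and decision rule differently.

```python
import math


def l1(a, b):
    """Sum of absolute differences between two spectra."""
    return sum(abs(x - y) for x, y in zip(a, b))


def l2(a, b):
    """Euclidean distance between two spectra."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def chi2(a, b):
    """One common form of the chi-square distance (assumed here).

    Channels where both spectra are zero are skipped to avoid 0/0.
    """
    return sum((x - y) ** 2 / (x + y) for x, y in zip(a, b) if x + y != 0)


def match_pixel(pixel, signatures, metric=l1):
    """Return (label, distance) of the closest reference signature.

    `signatures` maps an object label to a list of reference spectra,
    so an object may be described by several signatures.
    """
    best_label, best_dist = None, float("inf")
    for label, refs in signatures.items():
        for ref in refs:
            dist = metric(pixel, ref)
            if dist < best_dist:
                best_label, best_dist = label, dist
    return best_label, best_dist


# Hypothetical 4-channel spectra (real images in the summary have 285 channels).
signatures = {
    "grass": [[0.1, 0.4, 0.8, 0.7], [0.2, 0.5, 0.7, 0.6]],
    "road": [[0.3, 0.3, 0.3, 0.3]],
}
pixel = [0.15, 0.45, 0.75, 0.65]
label, dist = match_pixel(pixel, signatures)
print(label, round(dist, 3))
```

Every pixel-to-signature comparison is independent of the others, which is what makes this kind of matching a natural fit for the multi-level parallelization the summary mentions.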
9 | 10 | ## The overall design of the Framework 11 | 12 | ![image-20210909161621303](https://gitee.com/feiyipengfei/pic-md1/raw/master/20210909161621.png) 13 | 14 | ## Performance 15 | 16 | ![image-20210909161754023](https://gitee.com/feiyipengfei/pic-md1/raw/master/20210909161754.png) 17 | 18 | -------------------------------------------------------------------------------- /FPGA-Based Real-Time Moving Target Detection System for Unmanned Aerial Vehicle Application.md: -------------------------------------------------------------------------------- 1 | # FPGA-Based Real-Time Moving Target Detection System for Unmanned Aerial Vehicle Application 2 | ## ABSTRACT 3 | 4 | Moving target detection is the most common task for Unmanned Aerial Vehicles (UAVs) to find and track objects of interest from a bird’s eye view in mobile aerial surveillance for civilian applications such as search and rescue operations. The complex detection algorithm can be implemented in a real-time embedded system using a Field Programmable Gate Array (FPGA). This paper presents the development of a real-time moving target detection System-on-Chip (SoC) using an FPGA for deployment on a UAV. The detection algorithm utilizes an area-based image registration technique which includes motion estimation and object segmentation processes. The moving target detection system has been prototyped on a low-cost Terasic DE2-115 board mounted with a TRDB-D5M camera. The system consists of a Nios II processor and stream-oriented dedicated hardware accelerators running at a 100 MHz clock rate, achieving a processing speed of 30 frames per second for 640 × 480 pixel greyscale videos. 5 | 6 | ## Contribution 7 | 8 | (i) Development of real-time moving target detection in a System-on-Chip (SoC), attaining a processing rate of 30 frames per second (fps) for 640 × 480 pixel video.
9 | 10 | (ii) Prototyping of the proposed system on a low-cost FPGA board (Terasic DE2-115) mounted with a 5-megapixel camera (TRDB-D5M), occupying only 13% of total combinational functions and 13% of total memory bits. 11 | 12 | (iii) Partitioning and pipeline scheduling of the detection algorithm in a hardware/software (HW/SW) codesign for maximum processing throughput. 13 | 14 | (iv) Stream-oriented hardware accelerators, including block matching and object segmentation modules, which are able to operate at one cycle per pixel. 15 | 16 | (v) Analysis of detection performance with different densities of area-based ego-motion estimation and different frame differencing thresholds. 17 | 18 | ## The overall design of the Framework 19 | 20 | ![image-20210909105518127](https://gitee.com/feiyipengfei/pic-md1/raw/master/20210909105518.png) 21 | 22 | ## Performance 23 | 24 | ![image-20210909110458221](https://gitee.com/feiyipengfei/pic-md1/raw/master/20210909110458.png) 25 | 26 | -------------------------------------------------------------------------------- /LACS A High-Computational-Efficiency Accelerator for CNNs.md: -------------------------------------------------------------------------------- 1 | # LACS: A High-Computational-Efficiency Accelerator for CNNs 2 | ## ABSTRACT 3 | 4 | Convolutional neural networks (CNNs) have become continually deeper. With the increasing depth of CNNs, the invalid calculations caused by padding-zero operations, filling-zero operations and stride length (stride length > 1) represent an increasing proportion of all calculations. To adapt to different CNNs and to eliminate the influences of padding-zero operations, filling-zero operations and stride length on the computational efficiency of the accelerator, we draw upon the computation pattern of CPUs to design an efficient and versatile CNN accelerator, LACS (Loading-Addressing-Computing-Storing).
We reduce the amount of data movements between registers and the on-chip buffer from O(k × k) to O(k) by a bypass buffer mechanism. Finally, we deploy LACS on a field-programmable gate array (FPGA) chip and analyze the factors that affect the computational efficiency of LACS. We also run popular CNNs on LACS. The results show that LACS achieves an extremely high computational efficiency, 98.51% when executing AlexNet and 99.66% when executing VGG-16, significantly exceeding state-of-the-art accelerators. 5 | 6 | ## INDEX TERMS 7 | 8 | Accelerator, convolutional neural networks (CNNs), field-programmable gate array (FPGA), buffer mechanism. 9 | 10 | ## Contribution 11 | 12 | 1) We design the LACS architecture for convolution layers. LACS eliminates the influences of padding-zero operations and stride length by establishing the coordinate relationships between the input feature maps, convolution kernels, output feature maps and stride length. We also design a set of instructions for LACS. 13 | 14 | 2) A simple and practical data partitioning method is proposed to eliminate filling-zero operations and to increase the DRAM burst length. 15 | 16 | 3) A bypass buffer is designed to reduce the amount of data movements between registers and the output buffer from k × k to k. 17 | 18 | 4) We show how to extend LACS by adding a POOL module. 19 | 20 | 5) The factors affecting the computational efficiency of LACS are analyzed. According to the results, we propose a strategy to optimize LACS. We also test the computational efficiency using popular CNNs, AlexNet and VGG-16, and compare it to the latest accelerators. The results show that LACS is an efficient and versatile accelerator. 
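To make the notion of invalid calculations from padding-zero operations concrete, here is a small sketch that counts multiply-accumulate (MAC) operations for one input/output channel pair. It is not LACS's addressing logic; the function name and the "valid fraction" it reports are assumptions for illustration, and the paper's computational-efficiency metric may be defined differently.

```python
def conv_mac_counts(height, width, kernel, pad, stride):
    """Return (total_macs, valid_macs) for one channel pair.

    total_macs: MACs if every kernel tap is computed, padded zeros included.
    valid_macs: MACs whose input operand is a real (non-padded) pixel.
    """
    out_h = (height + 2 * pad - kernel) // stride + 1
    out_w = (width + 2 * pad - kernel) // stride + 1
    total = out_h * out_w * kernel * kernel

    def valid_taps(out_idx, size):
        # Coordinate relationship: output index -> input window start.
        start = out_idx * stride - pad
        lo = max(start, 0)
        hi = min(start + kernel, size)
        return max(hi - lo, 0)

    # Valid taps factor into rows x columns, so the sums can be multiplied.
    rows = sum(valid_taps(i, height) for i in range(out_h))
    cols = sum(valid_taps(j, width) for j in range(out_w))
    return total, rows * cols


total, valid = conv_mac_counts(height=5, width=5, kernel=3, pad=1, stride=1)
print(total, valid, valid / total)
```

In this toy 5 × 5 case roughly a quarter of the MACs touch padded zeros. An accelerator that derives the valid window from the output coordinate, as the contribution list describes, does not need to spend cycles on those taps.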
21 | 22 | ## The overall design of the Framework 23 | 24 | ![image-20210909203457317](https://gitee.com/feiyipengfei/pic-md1/raw/master/20210909203457.png) 25 | 26 | ## Performance 27 | 28 | ![image-20210909203625476](https://gitee.com/feiyipengfei/pic-md1/raw/master/20210909203625.png) 29 | 30 | -------------------------------------------------------------------------------- /Multichannel Pulse-Coupled-Neural-Network-Based Color Image Segmentation for Object Detection.md: -------------------------------------------------------------------------------- 1 | # Multichannel Pulse-Coupled-Neural-Network-Based Color Image Segmentation for Object Detection 2 | 3 | ## ABSTRACT 4 | 5 | This paper proposes a pulse-coupled neural network (PCNN) with multichannel (MPCNN) linking and feeding fields for color image segmentation. Different from the conventional PCNN, pulse-based radial basis function units are introduced into the model neurons of PCNN to determine the fast links among neurons with respect to their spectral feature vectors and spatial proximity. The computing of the color image segmentation can be implemented in parallel on a field-programmable-gate-array chip. Furthermore, the results of segmentations are applied to an object-detection scheme. Experimental results show that the performance of the proposed MPCNN is comparable to those of other popular image segmentation algorithms for the segmentation of noisy images while its parallel neural circuits improve the speed of processing drastically as compared with the sequential-code-based counterparts. 6 | 7 | ## Index Terms 8 | 9 | Color image segmentation, field-programmable gate array (FPGA), object detection, pulse-coupled neural network (PCNN), radial basis function (RBF) neural network. 10 | 11 | ## Contribution 12 | 13 | In this paper, they propose an improved PCNN model to perform color image segmentation and circumvent the aforementioned problems of conventional PCNN.
First, it is designed as a vector-oriented model to deal with color images. This scheme introduces a pulse-based radial basis function (RBF) into the model neurons of PCNN to work as a bioinspired computation unit. It uses the timing of the individual pulses produced by the neurons to determine the distances between pixels’ feature vectors and their rank order, respectively, so that the fast links can be established among neurons with respect to the spectral feature vectors and spatial proximity of mapped pixels. The pulse-based RBF model’s pulsed behavior is designed to be compatible with PCNN’s pulsed behavior so that the algorithm can efficiently deal with color images whose pixels are represented as vectors instead of scalars. Here, the dimension of a pixel’s spectral feature vector is not limited to three. Thus, the processing of color images can be considered as a special case of this multichannel image processor. For brevity, they use the acronym MPCNN for the proposed multichannel PCNN. Second, the segmentation algorithm is designed in full parallelism by utilizing the 2-D structure of PCNN. The neurons’ connectivity may transfer in parallel in the 2-D space to establish intra-/intersegment linking. To improve the computational speed, some dynamics of pulsed neurons are simplified to make the neural circuits more compact for implementation on large-scale programmable digital integrated circuits such as field-programmable gate arrays (FPGAs). In addition, the proposed scheme can be implemented on FPGA chips to overcome the bottleneck of computational burden. Analyses and experiments show that the time complexity of the proposed scheme can be reduced by up to hundreds of times as compared with the sequential-code-based counterparts, such as seeded region growing (SRG) and JSEG, while its quantitative performance is also competitive for the segmentation of noisy images.
14 | 15 | ## The overall design of the Framework 16 | 17 | ![img](https://gitee.com/feiyipengfei/pic-md1/raw/master/20210910202343.png) 18 | 19 | ## Performance 20 | 21 | ![img](https://gitee.com/feiyipengfei/pic-md1/raw/master/20210905231712.png) 22 | 23 | -------------------------------------------------------------------------------- /NOVEL FPGA BASED HAAR CLASSIFIER FACE DETECTION ALGORITHM ACCELERATION.md: -------------------------------------------------------------------------------- 1 | # NOVEL FPGA BASED HAAR CLASSIFIER FACE DETECTION ALGORITHM ACCELERATION 2 | 3 | ## ABSTRACT 4 | 5 | We present here a novel approach to use FPGA to accelerate the Haar-classifier based face detection algorithm. With highly pipelined microarchitecture and utilizing abundant parallel arithmetic units in the FPGA, we’ve achieved real-time performance of face detection having very high detection rate and low false positives. Moreover, our approach is flexible toward the resources available on the FPGA chip. This work also provides us an understanding toward using FPGA for implementing non-systolic based vision algorithm acceleration. Our implementation is realized on a HiTech Global PCIe card that contains a Xilinx XC5VLX110T FPGA chip. 6 | 7 | ## Contribution 8 | 9 | First, they reconstructed the software-based face detection application. Because the Haar classifier function costs more than 95% of the total time, they ported only the Haar classifier function step to the FPGA board, and left the pre-processing and post-processing on the host PC (personal computer). They also changed the data flow (loop cycles) from looping for every classifier per pixel, to looping for every stage per pixel. 10 | 11 | Second, in order to reduce resource requirements and use the FPGA’s intrinsic parallel granularity, they replace all FP (floating point) operations with integer computations.
With such a transformation, they could utilize the Xilinx Virtex 5’s embedded DSP48E building blocks to accelerate the multiplications and additions in the Haar classifier function. A single cycle of FPGA operation is equivalent to 100× to 1000× software clock ticks for the same functionality. This lower data precision does not affect the final detection accuracy. 12 | 13 | Third, they employ extensive pipelining to increase algorithm-level parallelism and to maintain frequency. Their FPGA design implements more than 16 pipeline stages for the Haar classifier functions. They then tried to match the 17×17 pixel sub-window with 16N classifiers for each stage, where N is an integer. For example, the first stage has 16 classifiers, compared with three classifiers in the software version. Although each pixel needs more computations to pass the first stage in the FPGA implementation, the first stage in the FPGA dropped more than 90% of the total non-face pixels, compared with a 50% drop rate for the software version. 14 | 15 | Fourth, they designed their FPGA implementation of the Haar classifier function with reuse in mind. It consists of intellectual property (IP) building blocks which could be used for other applications with similar time-consuming Haar functions. The data path units, such as the multiplier and adders, can easily be reused in other implementations too. And the design files could be easily scaled for different parameters.
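The floating-point-to-integer conversion mentioned in the second point can be sketched as follows. This is only an illustration of the general fixed-point technique, not the authors' design: the 12-bit fractional precision, the function names, and the example weights and threshold are all assumptions.

```python
FRAC_BITS = 12  # assumed fixed-point precision, not taken from the paper


def to_fixed(value, frac_bits=FRAC_BITS):
    """Scale a floating-point constant to an integer with frac_bits fraction bits."""
    return int(round(value * (1 << frac_bits)))


def weak_classifier_float(rect_sums, weights, threshold):
    """Reference: weighted sum of rectangle sums compared to a threshold."""
    return sum(w * s for w, s in zip(weights, rect_sums)) >= threshold


def weak_classifier_int(rect_sums, weights_q, threshold_q):
    """Integer-only version: multiply-adds and one comparison.

    rect_sums are integer pixel sums (e.g. from an integral image), so
    scaling the weights and the threshold by the same factor keeps the
    comparison equivalent up to rounding of the constants.
    """
    return sum(w * s for w, s in zip(weights_q, rect_sums)) >= threshold_q


# Hypothetical two-rectangle feature.
weights = [-1.0, 3.0]
threshold = 12.5
weights_q = [to_fixed(w) for w in weights]
threshold_q = to_fixed(threshold)

for rect_sums in ([1200, 400], [1000, 400]):
    print(
        rect_sums,
        weak_classifier_float(rect_sums, weights, threshold),
        weak_classifier_int(rect_sums, weights_q, threshold_q),
    )
```

The integer version consists only of multiplications and additions, which is the kind of operation the summary says was mapped onto the DSP48E blocks.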
16 | 17 | ## The overall design of the Framework 18 | 19 | ![img](https://gitee.com/feiyipengfei/pic-md1/raw/master/20210910201552.png) 20 | 21 | ## Performance 22 | 23 | ![img](https://gitee.com/feiyipengfei/pic-md1/raw/master/20210910202144.png) 24 | 25 | -------------------------------------------------------------------------------- /Pseudorandom number generator based on a 5D hyperchaotic four-wing memristive system and its FPGA implementation.md: -------------------------------------------------------------------------------- 1 | # Pseudorandom number generator based on a 5D hyperchaotic four-wing memristive system and its FPGA implementation 2 | 3 | ## ABSTRACT 4 | 5 | Pseudorandom numbers are widely used in information encryption, spread spectrum communication and other scientific and engineering fields. Because chaos is very sensitive to the initial conditions and has good inherent pseudo-random characteristics, research on pseudorandom number generators (PRNGs) based on chaotic systems has become a promising new line of exploration. This paper presents an FPGA PRNG based on a 5D hyperchaotic four-wing memristive system (HFWMS). The 5D HFWMS has multiline equilibrium and three positive Lyapunov exponents, which indicates that the system has very complex dynamic behavior. On this basis, an FPGA PRNG based on the 5D HFWMS is proposed. The proposed PRNG is implemented in VHDL, modeled and simulated on the Vivado 2018.3 platform, and synthesized for the Xilinx ZYNQ-XC7Z020 FPGA device. The post-processing module consists of 16 linear shift registers and a 15-level XOR chain. The maximum operating frequency is 138.331 MHz and the speed is 15.37 Mbit/s. The random bit sets generated by the PRNG are further verified with the NIST SP 800-22 statistical test suite. The security is analyzed by dynamic degradation, keyspace, key sensitivity and correlation. Experiments show that the design can be applied to various embedded cryptographic applications.
6 | 7 | ## Contribution 8 | 9 | This paper presents an FPGA-based PRNG built on the HFWMS. The 5D HFWMS has very complex dynamic behavior, with multi-line equilibrium and three positive Lyapunov exponents, so it is very suitable for the design of a PRNG. On this basis, a new PRNG based on the 5D HFWMS is proposed. The post-processing module consists of 16 shift registers and a 15-level XOR chain. Finally, the PRNG test of the designed chaotic oscillator shows that the maximum operating frequency is 138.331 MHz and the speed is 15.37 Mbit/s. The random bit sets generated by the PRNG have been further verified 10 | with the NIST SP 800-22 statistical test suite, which proves that the proposed design scheme can be applied to various embedded cryptography applications. 11 | 12 | ## Main experimental analysis 13 | 14 | When the system parameters are set as a = 1, b = 1, c = 0.7, m = 1, d = 0.2, e = 0.1, n = 0.01, and the initial conditions are (1, −1, 1, 1, 1), the system behaves as a four-wing hyperchaotic attractor, and its phase portrait is shown in Fig. 1a. Figures 2 and 3 are the LE spectrum and bifurcation diagram of the system with respect to parameter a, respectively. From Fig. 2, it can be seen that the system has three positive LEs; in particular, when a = 0.78, the five LEs are LE1 = 0.1712, LE2 = 0.0907, LE3 = 0.0107, LE4 = 0, LE5 = −2.3243. From the bifurcation diagram in Fig. 3, it can be seen that the system has periodic, chaotic and hyperchaotic phenomena. When the system parameters of system (1) are selected as a = 11, b = 1, c = 0.7, m = 1, d = 0.2, e = 0.1, n = 0.01, and the initial conditions are selected as (0, 1, 0, 0, 0), the system presents a two-wing hyperchaotic attractor, as shown in Fig. 1b.
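The Lyapunov exponents quoted above can be checked with a few lines of arithmetic. The sketch below counts the positive exponents and tests whether their sum is negative (a dissipative system). It also computes the Kaplan-Yorke dimension, which the summary above does not report and which is added here only as an illustrative extra; the function name is an assumption of this example.

```python
# Lyapunov exponents quoted for a = 0.78.
lyapunov_exponents = [0.1712, 0.0907, 0.0107, 0.0, -2.3243]

positive = [le for le in lyapunov_exponents if le > 0]
is_hyperchaotic = len(positive) >= 2  # more than one positive exponent
is_dissipative = sum(lyapunov_exponents) < 0


def kaplan_yorke_dimension(exponents):
    """Kaplan-Yorke dimension: j + (sum of first j LEs) / |LE_(j+1)|,
    where j is the largest count whose partial sum is non-negative."""
    ordered = sorted(exponents, reverse=True)
    partial_sum = 0.0
    for j, le in enumerate(ordered):
        if partial_sum + le < 0:
            return j + partial_sum / abs(le)
        partial_sum += le
    return float(len(ordered))


print(len(positive), is_hyperchaotic, is_dissipative)
print(round(kaplan_yorke_dimension(lyapunov_exponents), 4))
```

With three positive exponents and a negative sum, the quoted values are consistent with the hyperchaotic, bounded behavior described in the text.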
15 | 16 | ![image-20210913231927824](https://gitee.com/feiyipengfei/pic-md1/raw/master/20210913231927.png) 17 | 18 | ## Performance 19 | 20 | ![image-20210913232005577](https://gitee.com/feiyipengfei/pic-md1/raw/master/20210913232005.png) 21 | 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # The advanced works of Object detection based on FPGAs 2 | 3 | This repository provides an up-to-date list of studies on advanced FPGA-based object detection. We plan to keep the list updated over the long term. It follows the taxonomy provided in the following paper (please cite the paper if you benefit from this repository): 4 | 5 | Tao S, Kai Z, Zhe C, Qian M, Jiawen W, Lu W, "FPGA-based accelerator for object detection:A comprehensive review" 6 | 7 | ## Copyright issues 8 | The review paper is not yet fully published; when quoting the articles introduced here, please follow their copyright information. 9 | 10 | # Table of Contents (Follows the taxonomy in the paper) 11 | 12 | 1. [Object detection under traditional methods](#1-object-detection-under-traditional-methods) 13 | 2. [Object detection under the deep learning method](#2-object-detection-under-the-deep-learning-method) 14 | 15 | # 1.
Object detection under traditional methods 16 | 17 | - [NOVEL FPGA BASED HAAR CLASSIFIER FACE DETECTION ALGORITHM ACCELERATION](https://github.com/vivian13maker/FPGA-based-accelerator-for-object-detection-A-comprehensive-review/blob/main/NOVEL%20FPGA%20BASED%20HAAR%20CLASSIFIER%20FACE%20DETECTION%20ALGORITHM%20ACCELERATION.md#novel-fpga-based-haar-classifier-face-detection-algorithm-acceleration), [[paper]](https://www.researchgate.net/profile/Shih-Lien-Lu/publication/4375375_Novel_FPGA_based_Haar_classifier_face_detection_algorithm_acceleration/links/0fcfd50933e992036b000000/Novel-FPGA-based-Haar-classifier-face-detection-algorithm-acceleration.pdf?origin=publication_detail) 18 | - [Multichannel Pulse-Coupled-Neural-Network-Based Color Image Segmentation for Object Detection](https://github.com/vivian13maker/FPGA-based-accelerator-for-object-detection-A-comprehensive-review/blob/main/Multichannel%20Pulse-Coupled-Neural-Network-Based%20Color%20Image%20Segmentation%20for%20Object%20Detection.md#multichannel-pulse-coupled-neural-network-based-color-image-segmentation-for-object-detection), [[paper]](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5991960) 19 | - [A Flexible Parallel Hardware Architecture for AdaBoost-Based Real-Time Object Detection](https://github.com/vivian13maker/FPGA-based-accelerator-for-object-detection-A-comprehensive-review/blob/main/A%20Flexible%20Parallel%20Hardware%20Architecture%20for%20AdaBoost-Based%20Real-Time%20Object%20Detection.md#a-flexible-parallel-hardware-architecture-for-adaboost-based-real-time-object-detection), [[paper]](http://islab.soe.uoguelph.ca/sareibi/TEACHING_dr/ENG6530_RCS_html_dr/outline_W2017/docs/PAPER_REVIEW_dr/2014_dr/GRAD_dr/FPGA_Object_Detection.pdf) 20 | - [ASIC and FPGA Implementation of the Gaussian Mixture Model Algorithm for Real-Time Segmentation of High Definition 
video](https://github.com/vivian13maker/FPGA-based-accelerator-for-object-detection-A-comprehensive-review/blob/main/ASIC%20and%20FPGA%20Implementation%20of%20the%20Gaussian%20Mixture%20Model%20Algorithm%20for%20Real-Time%20Segmentation%20of%20High%20Definition%20video.md#asic-and-fpga-implementation-of-the-gaussian-mixture-model-algorithm-for-real-time-segmentation-of-high-definition-video), [[paper]](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6481463) 21 | - [A Multi-Resolution FPGA-Based Architecture for Real-Time Edge and Corner Detection](https://github.com/vivian13maker/FPGA-based-accelerator-for-object-detection-A-comprehensive-review/blob/main/A%20Multi-Resolution%20FPGA-Based%20Architecture%20for%20Real-Time%20Edge%20and%20Corner%20Detection.md#a-multi-resolution-fpga-based-architecture-for-real-time-edge-and-corner-detection), [[paper]](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6532283&tag=1) 22 | - [Real-time DDoS attack detection using FPGA](https://github.com/vivian13maker/FPGA-based-accelerator-for-object-detection-A-comprehensive-review/blob/main/Real-time%20DDoS%20attack%20detection%20using%20FPGA.md#real-time-ddos-attack-detection-using-fpga), 
[[paper]](https://pdf.sciencedirectassets.com/271515/1-s2.0-S0140366417X00145/1-s2.0-S0140366416306442/main.pdf?X-Amz-Security-Token=IQoJb3JpZ2luX2VjEIv%2F%2F%2F%2F%2F%2F%2F%2F%2F%2FwEaCXVzLWVhc3QtMSJHMEUCIQDroCWGkQN2XzrnwGlmWFagrg%2FRRsU05YF2%2FVUxL1P1hwIgIh2w8AwG%2BTnjOVKUOa9WTeafQZG%2FzWpWHoBASsK6jNIqgwQI1P%2F%2F%2F%2F%2F%2F%2F%2F%2F%2FARAEGgwwNTkwMDM1NDY4NjUiDEDjcyYQxCR6tZrWtCrXA2j%2FQZudCQismE7Xxvc1Z3rL7l1c3wBIj%2BXSlBDaHNjjM%2F0VGzycSbtKXSrXqwaMpxq952ZHEVVV6fc%2BRBALq2uG1cuMHZsDp%2B%2FjvbXYduyH4irAK%2BmOPTSVL9LaWg9RJq0qQg5aEqv1KbBdAsXikVsWS7%2BfY0iWjpPTLD0j%2F4GeYsHQmZdyFBM6x6boxArlQeE0iUuSrJBxhkdWMR6sbuPJQI%2BR462dVjJ2ZfbgCuRey19Zg3sPyAZr0V4Hn%2BwzwyES3oRUDdirTekq%2BEI8mKkazzKLtrLcP2f3lurv37%2F2mEOiCxDRRXx6yqJxQaipllz2dHkp0oEmEb%2FxOxZ%2F1xva5AH92HiET%2FF%2FisU36jF7plkSZUW%2BpHsRgaZsxA%2FPZQap248lZTxUbvt9Mqdwwd2Ab4WDJHPeJnIeM4rf%2Bm4CiZmaugV6Yro%2BO25%2FZqRUURwjoguLjAeKJtLDRfB9QuBGaYokOTI9C0wh8LQoK4hYCYU6r1q1sn4PjPVFGC7laYnjtd9YNVnOkpqcFawGnHLw4G0M2gOMr0buR3KGddiPXkOeLSi7pb3m3HdsyDM%2BIZEo9b%2BCcAx%2F77jNrMsHywKWcPCODpgM6scIauRmhnRAJ%2BodyGfqnjDS5OyJBjqlAfQHx683tsTAPt47PHofPBHAsCvHYCR6mFfbVWombgLQB4zEPVS5c7eGf%2B05MuY87XWqEjmI%2F6x2Qy%2Bs%2BEckVbHBn5nF4DWk9pLb1XilnFXbuzv%2BtR8o9lQxgsKvRzZKa7u26Uzmf9O6nQQ0PLzeNZPyEB9AMnLmFxPp3AxftPsLytIqpvzUd87BHPOkKevKOLmt%2BnmgKiDPxV4%2BdLEvHBtWFBf5CQ%3D%3D&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20210910T115637Z&X-Amz-SignedHeaders=host&X-Amz-Expires=300&X-Amz-Credential=ASIAQ3PHCVTY5LL4ONHX%2F20210910%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Signature=473fd4c0343cd4decf36e9b537261e1b1c7184b40d35e7a256c7cfcc9f5a0dae&hash=5672e07fbb7e7f767e2d43907a2ce5ba3b82ed6298ef94d6f70127df0c44cc0a&host=68042c943591013ac2b2430a89b270f6af2c76d8dfd086a07176afe7c76c2c61&pii=S0140366416306442&tid=spdf-440306f7-5d6b-47a6-a8ba-597231133837&sid=405589e954c7e4489209b4b6469f22e8ab85gxrqa&type=client) 23 | - [An efficient and cost effective FPGA based implementation of the Viola-Jones face detection 
algorithm](https://github.com/vivian13maker/FPGA-based-accelerator-for-object-detection-A-comprehensive-review/blob/main/An%20efficient%20and%20cost%20effective%20FPGA%20based%20implementation%20of%20the%20Viola-Jones%20face%20detection%20algorithm.md#an-efficient-and-cost-effective-fpga-based-implementation-of-the-viola-jones-face-detection-algorithm), [[paper]](https://pdf.sciencedirectassets.com/314097/1-s2.0-S2468067217X00023/1-s2.0-S2468067216300116/main.pdf?X-Amz-Security-Token=IQoJb3JpZ2luX2VjEIv%2F%2F%2F%2F%2F%2F%2F%2F%2F%2FwEaCXVzLWVhc3QtMSJHMEUCIQDroCWGkQN2XzrnwGlmWFagrg%2FRRsU05YF2%2FVUxL1P1hwIgIh2w8AwG%2BTnjOVKUOa9WTeafQZG%2FzWpWHoBASsK6jNIqgwQI1P%2F%2F%2F%2F%2F%2F%2F%2F%2F%2FARAEGgwwNTkwMDM1NDY4NjUiDEDjcyYQxCR6tZrWtCrXA2j%2FQZudCQismE7Xxvc1Z3rL7l1c3wBIj%2BXSlBDaHNjjM%2F0VGzycSbtKXSrXqwaMpxq952ZHEVVV6fc%2BRBALq2uG1cuMHZsDp%2B%2FjvbXYduyH4irAK%2BmOPTSVL9LaWg9RJq0qQg5aEqv1KbBdAsXikVsWS7%2BfY0iWjpPTLD0j%2F4GeYsHQmZdyFBM6x6boxArlQeE0iUuSrJBxhkdWMR6sbuPJQI%2BR462dVjJ2ZfbgCuRey19Zg3sPyAZr0V4Hn%2BwzwyES3oRUDdirTekq%2BEI8mKkazzKLtrLcP2f3lurv37%2F2mEOiCxDRRXx6yqJxQaipllz2dHkp0oEmEb%2FxOxZ%2F1xva5AH92HiET%2FF%2FisU36jF7plkSZUW%2BpHsRgaZsxA%2FPZQap248lZTxUbvt9Mqdwwd2Ab4WDJHPeJnIeM4rf%2Bm4CiZmaugV6Yro%2BO25%2FZqRUURwjoguLjAeKJtLDRfB9QuBGaYokOTI9C0wh8LQoK4hYCYU6r1q1sn4PjPVFGC7laYnjtd9YNVnOkpqcFawGnHLw4G0M2gOMr0buR3KGddiPXkOeLSi7pb3m3HdsyDM%2BIZEo9b%2BCcAx%2F77jNrMsHywKWcPCODpgM6scIauRmhnRAJ%2BodyGfqnjDS5OyJBjqlAfQHx683tsTAPt47PHofPBHAsCvHYCR6mFfbVWombgLQB4zEPVS5c7eGf%2B05MuY87XWqEjmI%2F6x2Qy%2Bs%2BEckVbHBn5nF4DWk9pLb1XilnFXbuzv%2BtR8o9lQxgsKvRzZKa7u26Uzmf9O6nQQ0PLzeNZPyEB9AMnLmFxPp3AxftPsLytIqpvzUd87BHPOkKevKOLmt%2BnmgKiDPxV4%2BdLEvHBtWFBf5CQ%3D%3D&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20210910T120116Z&X-Amz-SignedHeaders=host&X-Amz-Expires=300&X-Amz-Credential=ASIAQ3PHCVTY5LL4ONHX%2F20210910%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Signature=0875b362da97b104ef03d345a75fb2c4201dbf6e0286ebfd29eaf0ac80b4c008&hash=0c16728afe2ae300224ec939217fe40889d3e90251ac9a58a0489
ce2934af946&host=68042c943591013ac2b2430a89b270f6af2c76d8dfd086a07176afe7c76c2c61&pii=S2468067216300116&tid=spdf-72c81ead-0663-4bd5-86dd-591e268a547d&sid=405589e954c7e4489209b4b6469f22e8ab85gxrqa&type=client) 24 | - [FPGA-Based Real-Time Moving Target Detection System for Unmanned Aerial Vehicle Application](https://github.com/vivian13maker/FPGA-based-accelerator-for-object-detection-A-comprehensive-review/blob/main/FPGA-Based%20Real-Time%20Moving%20Target%20Detection%20System%20for%20Unmanned%20Aerial%20Vehicle%20Application.md#fpga-based-real-time-moving-target-detection-system-for-unmanned-aerial-vehicle-application), [[paper]](https://downloads.hindawi.com/journals/ijrc/2016/8457908.pdf) 25 | - [FPGA acceleration of Hyperspectral Image Processing for High-Speed Detection Applications](https://github.com/vivian13maker/FPGA-based-accelerator-for-object-detection-A-comprehensive-review/blob/main/FPGA%20acceleration%20of%20Hyperspectral%20Image%20Processing%20for%20High-Speed%20Detection%20Applications.md#fpga-acceleration-of-hyperspectral-image-processing-for-high-speed-detection-applications), [[paper]](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8050773) 26 | - [Energy-Efficient Pedestrian Detection System: Exploiting Statistical Error Compensation for Lossy Memory Data Compression](https://github.com/vivian13maker/FPGA-based-accelerator-for-object-detection-A-comprehensive-review/blob/main/Energy-Efficient%20Pedestrian%20Detection%20System%20Exploiting%20Statistical%20Error%20Compensation%20for%20Lossy%20Memory%20Data%20Compression.md#energy-efficient-pedestrian-detection-system-exploiting-statistical-error-compensation-for-lossy-memory-data-compression), [[paper]](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8310932) 27 | - [Real-time hardware–software embedded vision system for ITS smart camera implemented in Zynq 
SoC](https://github.com/vivian13maker/FPGA-based-accelerator-for-object-detection-A-comprehensive-review/blob/main/Real-time%20hardware%E2%80%93software%20embedded%20vision%20system%20for%20ITS%20smart%20camera%20implemented%20in%20Zynq%20SoC.md#real-time-hardwaresoftware-embedded-vision-system-for-its-smart-camera-implemented-in-zynq-soc), [[paper]](https://link.springer.com/content/pdf/10.1007/s11554-016-0588-9.pdf) 28 | - [Towards a Scalable Hardware/Software Co-Design Platform for Real-time Pedestrian Tracking Based on a ZYNQ-7000 Device](https://github.com/vivian13maker/FPGA-based-accelerator-for-object-detection-A-comprehensive-review/blob/main/Towards%20a%20Scalable%20HardwareSoftware%20Co-Design%20Platform%20for%20Real-time%20Pedestrian%20Tracking%20Based%20on%20a%20ZYNQ-7000%20Device.md#towards-a-scalable-hardwaresoftware-co-design-platform-for-real-time-pedestrian-tracking-based-on-a-zynq-7000-device), [[paper]](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8307853) 29 | 30 | # 2. 
Object detection under the deep learning method 31 | 32 | - [A Lightweight YOLOv2: A Binarized CNN with A Parallel Support Vector Regression for an FPGA](https://github.com/vivian13maker/FPGA-based-accelerator-for-object-detection-A-comprehensive-review/blob/main/A%20Lightweight%20YOLOv2%20A%20Binarized%20CNN%20with%20A%20Parallel%20Support%20Vector%20Regression%20for%20an%20FPGA.md#a-lightweight-yolov2-a-binarized-cnn-with-a-parallel-support-vector-regression-for-an-fpga), [[paper]](https://dl.acm.org/doi/pdf/10.1145/3174243.3174266) 33 | - [A High-Throughput and Power-Efficient FPGA Implementation of YOLO CNN for Object Detection](https://github.com/vivian13maker/FPGA-based-accelerator-for-object-detection-A-comprehensive-review/blob/main/A%20High-Throughput%20and%20Power-Efficient%20FPGA%20Implementation%20of%20YOLO%20CNN%20for%20Object%20Detection.md#a-high-throughput-and-power-efficient-fpga-implementation-of-yolo-cnn-for-object-detection), [[paper]](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8678682) 34 | - [REQ-YOLO A Resource-Aware, Efficient Quantization Framework for Object Detection on FPGAs](https://github.com/vivian13maker/FPGA-based-accelerator-for-object-detection-A-comprehensive-review/blob/main/REQ-YOLO%20A%20Resource-Aware,%20Efficient%20Quantization%20Framework%20for%20Object%20Detection%20on%20FPGAs.md#req-yolo-a-resource-aware-efficient-quantization-framework-for-object-detection-on-fpgas), [[paper]](https://arxiv.org/pdf/1909.13396.pdf) 35 | - [A Real-Time Object Detection Accelerator with Compressed SSDLite on FPGA](https://github.com/vivian13maker/FPGA-based-accelerator-for-object-detection-A-comprehensive-review/blob/main/A%20Real-Time%20Object%20Detection%20Accelerator%20with%20Compressed%20SSDLite%20on%20FPGA.md#a-real-time-object-detection-accelerator-with-compressed-ssdlite-on-fpga), [[paper]](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8742299) 36 | - [An Object Detector based on Multiscale Sliding Window 
Search using a Fully Pipelined Binarized CNN on an FPGA](https://github.com/vivian13maker/FPGA-based-accelerator-for-object-detection-A-comprehensive-review/blob/main/An%20Object%20Detector%20based%20on%20Multiscale%20Sliding%20Window%20Search%20using%20a%20Fully%20Pipelined%20Binarized%20CNN%20on%20an%20FPGA.md#an-object-detector-based-on-multiscale-sliding-window-search-using-a-fully-pipelined-binarized-cnn-on-an-fpga), [[paper]](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8280135) 37 | - [Accelerating Tiny YOLO v3 using FPGA-based Hardware/Software Co-Design](https://github.com/vivian13maker/FPGA-based-accelerator-for-object-detection-A-comprehensive-review/blob/main/Accelerating%20Tiny%20YOLO%20v3%20using%20FPGA-based%20HardwareSoftware%20Co-Design.md#accelerating-tiny-yolo-v3-using-fpga-based-hardwaresoftware-co-design), [[paper]](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9180843) 38 | - [Angel-Eye: A Complete Design Flow for Mapping CNN Onto Embedded FPGA](https://github.com/vivian13maker/FPGA-based-accelerator-for-object-detection-A-comprehensive-review/blob/main/Angel-Eye%20A%20Complete%20Design%20Flow%20for%20Mapping%20CNN%20Onto%20Embedded%20FPGA.md#angel-eye-a-complete-design-flow-for-mapping-cnn-onto-embedded-fpga), [[paper]](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7930521) 39 | - [A Novel FPGA Accelerator Design for Real-Time and Ultra-Low Power Deep Convolutional Neural Networks Compared With Titan X GPU](https://github.com/vivian13maker/FPGA-based-accelerator-for-object-detection-A-comprehensive-review/blob/main/A%20Novel%20FPGA%20Accelerator%20Design%20for%20Real-Time%20and%20Ultra-Low%20Power%20Deep%20Convolutional%20Neural%20Networks%20Compared%20With%20Titan.md#a-novel-fpga-accelerator-design-for-real-time-and-ultra-low-power-deep-convolutional-neural-networks-compared-with-titan-x-gpu), [[paper]](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9108269) 40 | - [SpWA: An Efficient Sparse 
Winograd Convolutional Neural Networks Accelerator on FPGAs](https://github.com/vivian13maker/FPGA-based-accelerator-for-object-detection-A-comprehensive-review/blob/main/SpWA%20An%20Efficient%20Sparse%20Winograd%20Convolutional%20Neural%20Networks%20Accelerator%20on%20FPGAs.md#spwa-an-efficient-sparse-winograd-convolutional-neural-networks-accelerator-on-fpgas), [[paper]](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8465842) 41 | - [An Energy-Efficient FPGA-Based Deconvolutional Neural Networks Accelerator for Single Image Super-Resolution](https://github.com/vivian13maker/FPGA-based-accelerator-for-object-detection-A-comprehensive-review/blob/main/An%20Energy-Efficient%20FPGA-Based%20Deconvolutional%20Neural%20Networks%20Accelerator%20for%20Single%20Image%20Super-Resolution.md#an-energy-efficient-fpga-based-deconvolutional-neural-networks-accelerator-for-single-image-super-resolution), [[paper]](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8584497) 42 | - [A Fully Connected Layer Elimination for a Binarized Convolutional Neural Network on an FPGA](https://github.com/vivian13maker/FPGA-based-accelerator-for-object-detection-A-comprehensive-review/blob/main/A%20Fully%20Connected%20Layer%20Elimination%20for%20a%20Binarized%20Convolutional%20Neural%20Network%20on%20an%20FPGA.md#a-fully-connected-layer-elimination-for-a-binarized-convolutional-neural-network-on-an-fpga), [[paper]](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8056771) 43 | - [Sparse-YOLO: Hardware/Software Co-Design of an FPGA Accelerator for YOLOv2](https://github.com/vivian13maker/FPGA-based-accelerator-for-object-detection-A-comprehensive-review/blob/main/Sparse-YOLO%20HardwareSoftware%20Co-Design%20of%20an%20FPGA%20Accelerator%20for%20YOLOv2.md#sparse-yolo-hardwaresoftware-co-design-of-an-fpga-accelerator-for-yolov2), [[paper]](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9122495) 44 | - [LACS: A High-Computational-Efficiency Accelerator for 
CNNs](https://github.com/vivian13maker/FPGA-based-accelerator-for-object-detection-A-comprehensive-review/blob/main/LACS%20A%20High-Computational-Efficiency%20Accelerator%20for%20CNNs.md#lacs-a-high-computational-efficiency-accelerator-for-cnns), [[paper]](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8944026) 45 | - [A new multi-scroll Chua’s circuit with composite hyperbolic tangent-cubic nonlinearity: Complex dynamics, Hardware implementation and Image encryption application](https://github.com/vivian13maker/The-advanced-work-of-Object-detection-based-on-FPGAs/blob/main/A%20new%20multi-scroll%20Chua%E2%80%99s%20circuit%20with%20composite%20hyperbolic%20tangent-cubic%20nonlinearity%20Complex%20dynamics,%20Hardware%20implementation%20and%20Image%20encryption%20application.md#a-new-multi-scroll-chuas-circuit-with-composite-hyperbolic-tangent-cubic-nonlinearity-complex-dynamics-hardware-implementation-and-image-encryption-application), [[paper]](https://pdf.sciencedirectassets.com/271564/1-s2.0-S0167926021X00053/1-s2.0-S0167926021000663/main.pdf?X-Amz-Security-Token=IQoJb3JpZ2luX2VjENf%2F%2F%2F%2F%2F%2F%2F%2F%2F%2FwEaCXVzLWVhc3QtMSJHMEUCIQC4mre1L12NM3T8s%2BOIGPKfJw%2F36uoGGFXrhPQ8jf%2BmuQIgaZDIAMjeH2Yz0PmbBA0hf95B8mszfMUgm5OFNkbMIYUq%2BgMIMBAEGgwwNTkwMDM1NDY4NjUiDKtqkTaEdFhFdZfQlyrXA0LYbCfntNJ1BO7AcWxpyMOkl6evf6ycYC6WOgyNRQ%2FIPRTPDsJ8H5%2Foox%2BcaR3rGjmpbKxpOBYswW5hCyQzKsll7TH3PeGR8rIXZPNhTuyatx23wsHsdu%2FZP73p0gDB5%2BoGdmAaxOr1Ii3LbP6AcI108r81DVBFqYXjxv9UrWNkjol4ndUbTfZ3x5kMYpqnEcKvkRIyTHx%2FrHdaG1j0bL%2Fw0myvTxy2%2Bya%2B3q0tcQU%2FxYi8Ni1k7rb7O8qe6QKXQazla7mZ%2BzvcrNN11akT%2BvWGwHrrL1Xm1GqCiCGRJS%2FsQw4MD06mlYY1uIyEr2CoOMAzpzguMMwz8h8GZbGLJGLiyB25MWeQN0Llmqehefoe9s2cV%2BznPRHtLSHIXb8Q1bOZdIWRYHGJKcuzb3YqRniiW9gNFKOfRzz5UdQBBiGL3C%2FQY9gHTvUJ8wnHRsLmitQvmcItViaG%2FaFZPHyV21cZDNyJQalKSD%2FiwHQ5MX60DPMKdHZ%2FgnJ%2BDmV1d041FiY2LG754ZwP8bD0VJClmnW0ce4BtmyRiJjs%2F8ew6llguOffYHNPPZTCewCmyfmFGb3Pjoaxf%2BWyLUnEfQuwFrkLXjXF5Offkuv6ICzLpkPI26q19JM8eTCuw%2F2JBjqlAXh
Cs0pEcONxSXT5Kwbd2I%2B1kfTUvIaeA40%2BPWoPdVWeWjxNfPShYjjgZoE89gzrj%2FIeU6%2BoWOERmES%2Fsdjow3hNkIXDN0OmsQyMvxaeLmkjsdnAtd4z%2Fha9hQv4uik0YB6dbXQfswN7WUtRdAzWG2XXYv22d00BAmigC76f%2F%2Fa2vm8XLsO1yYUF4us4pTgw0%2FjirQ7GUZb2M75DcZ1nEbj21rv%2Baw%3D%3D&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20210913T155307Z&X-Amz-SignedHeaders=host&X-Amz-Expires=300&X-Amz-Credential=ASIAQ3PHCVTY53S44Q4N%2F20210913%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Signature=6a731e6436933c03f0adb057bbf1c8e620b0722d24b14dd6d411dc4ca451a63c&hash=93668ace5e0bba0a2ab762b7414a44ec2eb4e9f8b510716799a897c8b5090dd4&host=68042c943591013ac2b2430a89b270f6af2c76d8dfd086a07176afe7c76c2c61&pii=S0167926021000663&tid=spdf-724dab76-4710-45e3-bb32-78126f46b369&sid=98e80d873e2c794dad5bd31673eee39b05d5gxrqa&type=client) 46 | - [A robust and fixed-time zeroing neural dynamics for computing time-variant nonlinear equation using a novel nonlinear activation function](https://github.com/vivian13maker/The-advanced-work-of-Object-detection-based-on-FPGAs/blob/main/A%20robust%20and%20fixed-time%20zeroing%20neural%20dynamics%20for%20computing%20time-variant%20nonlinear%20equation%20using%20a%20novel%20nonlinear%20activation%20function.md#a-robust-and-fixed-time-zeroing-neural-dynamics-for-computing-time-variant-nonlinear-equation-using-a-novel-nonlinear-activation-function), 
[[paper]](https://pdf.sciencedirectassets.com/271597/1-s2.0-S0925231219X0020X/1-s2.0-S0925231219303935/main.pdf?X-Amz-Security-Token=IQoJb3JpZ2luX2VjENf%2F%2F%2F%2F%2F%2F%2F%2F%2F%2FwEaCXVzLWVhc3QtMSJHMEUCIHCLRvsc2%2B5noAOe3P%2B44C8tqcTLDDlns9qtXrHLnRXcAiEAow15iO5APwMcjh6s%2BnyYXsoKGohlSFpnDw8VXWpsJPMq%2BgMILxAEGgwwNTkwMDM1NDY4NjUiDFhOHkiaHmrnKrFhTCrXA50dGLo%2B78yfuG3xIqGruPrOs6aZ92Usm2zR1VtUeLolGsYzxTu4cBolvj2Sn%2BffWmY9v42mHbH%2Fv5VXA8oCd%2BU6%2FyPC6A5q82NuU%2FX8mc12g0xAZmIvNrwccq2ZZ31kDIO72%2Bj%2FVZq%2BPbznnxAXAuD5KcB92eni8FO8VViyVl%2B5Czm1%2FixFKz%2FFHffb5naoPfHBvPFIg4UALH3Cmbp5lHE6IF9da4q2%2FRXZgtGsiySrGUPLaVJlRj%2FK%2Bd%2BOXXC7U9KWr2yaNOPF%2Be6XEbT0gNAFlZ%2FNfmKUFANX22Z6nZVF9S7x1%2FjdZS9MyLOC6ioSwyMl1Z7tP6j44YkifjUA%2FZcCPW5tT%2FTeqLwPR6Ij4EXiTFGhhgbcAtuigLaw1GF6tQ7Gi%2BNolnUJ5mKL4XplDzTBKt44a53gutsf77fSBlqgjsAl0wsHy7ZQe8U2T%2BVghageFg7fw%2B2L5QeLdVnR3P5X5N93Io405XKZ4l%2BvD6nHHe%2FeMrAjmLfqdxOKUe2V40GNA66ZDKXHRw67O5oh6A8DmBQiNKzSTw55gqqIYP5LfJKoWCORYZud03rnFeCJndSL85XXbsXdWJlCT80nnJxWFIX3pQN2A9cTzCKu2hR7M2qUSXNHIjDyu%2F2JBjqlAU9MEcPJ4Fcyd6W4tA1ZlsTXQPzf3jM5j29GfNhU7JMF1tqBb6I0JiP6tbTkk6M%2FEeOYJSzQOFXhgvltnhi1dHuboeYIcGAN5cWB3bZjDLgOGepbQoKcSdqJzUCTfAKMWZiRz7Ei4ITBCavgmbqf%2FP9jVsQaBUeYSABdi1pf3HT6Q%2FHkscojSUHzAY0eUqshMB%2FoKraerdrrUsmgYTydYSnSKJorkw%3D%3D&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20210913T155729Z&X-Amz-SignedHeaders=host&X-Amz-Expires=300&X-Amz-Credential=ASIAQ3PHCVTYZQ74CTY7%2F20210913%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Signature=581ff7120041f79054b51c3f7112af50f9410a3d7a41be2e4dc956d9eb9e5789&hash=8dc84f84d9612dd1fbe43dd1cd3a411dd2dc4150643783aaf45f9140abb03962&host=68042c943591013ac2b2430a89b270f6af2c76d8dfd086a07176afe7c76c2c61&pii=S0925231219303935&tid=spdf-7f71911f-1e93-4b73-9d32-72351c0d3cae&sid=98e80d873e2c794dad5bd31673eee39b05d5gxrqa&type=client) 47 | - [An Extremely Simple Multi-Wing Chaotic System: Dynamics Analysis, Encryption Application and Hardware 
Implementation](https://github.com/vivian13maker/The-advanced-work-of-Object-detection-based-on-FPGAs/blob/main/An%20Extremely%20Simple%20Multi-Wing%20Chaotic%20System%EF%BC%9ADynamics%20Analysis,%20Encryption%20Application%20and%20Hardware%20Implementation.md#an-extremely-simple-multi-wing-chaotic-system-dynamics-analysis-encryption-application-and-hardware-implementation), [[paper]](https://www.researchgate.net/publication/348146005_An_Extremely_Simple_Multi-Wing_Chaotic_System_Dynamics_Analysis_Encryption_Application_and_Hardware_Implementation) 48 | - [Design and FPGA Implementation of a Pseudo-random Number Generator Based on a Hopfield Neural Network Under Electromagnetic Radiation](https://github.com/vivian13maker/The-advanced-work-of-Object-detection-based-on-FPGAs/blob/main/Design%20and%20FPGA%20Implementation%20of%20a%20Pseudo-random%20Number%20Generator%20Based%20on%20a%20Hop%EF%AC%81eld%20Neural%20Network%20Under%20Electromagnetic%20Radiation.md#design-and-fpga-implementation-of-a-pseudo-random-number-generator-based-on-a-hop%EF%AC%81eld-neural-network-under-electromagnetic-radiation), [[paper]](https://www.researchgate.net/publication/352125459_Design_and_FPGA_Implementation_of_a_Pseudo-random_Number_Generator_Based_on_a_Hopfield_Neural_Network_Under_Electromagnetic_Radiation) 49 | - [Pseudorandom number generator based on a 5D hyperchaotic four-wing memristive system and its FPGA implementation](https://github.com/vivian13maker/The-advanced-work-of-Object-detection-based-on-FPGAs/blob/main/Pseudorandom%20number%20generator%20based%20on%20a%205D%20hyperchaotic%20four-wing%20memristive%20system%20and%20its%20FPGA%20implementation.md#pseudorandom-number-generator-based-on-a-5d-hyperchaotic-four-wing-memristive-system-and-its-fpga-implementation), [[paper]](https://link.springer.com/content/pdf/10.1140/epjs/s11734-021-00132-x.pdf) 50 | 51 | # Contact 52 | Please contact Qian M (kmust_mzq@126.com) for your questions about this webpage. 
53 | -------------------------------------------------------------------------------- /REQ-YOLO A Resource-Aware, Efficient Quantization Framework for Object Detection on FPGAs.md: -------------------------------------------------------------------------------- 1 | # REQ-YOLO: A Resource-Aware, Efficient Quantization Framework for Object Detection on FPGAs 2 | 3 | ## ABSTRACT 4 | 5 | Deep neural networks (DNNs), as the basis of object detection, will play a key role in the development of future autonomous systems with full autonomy. The autonomous systems have special requirements of real-time, energy-efficient implementations of DNNs on a power-constrained system. Two research thrusts are dedicated to performance and energy efficiency enhancement of the inference phase of DNNs. The first one is model compression techniques while the second is efficient hardware implementation. Recent works on extremely-low-bit CNNs such as the binary neural network (BNN) and XNOR-Net replace the traditional floating point operations with binary bit operations, which significantly reduces the memory bandwidth and storage requirements. However, they suffer from non-negligible accuracy loss and underutilize the digital signal processing (DSP) blocks of FPGAs. 6 | 7 | To overcome these limitations, this paper proposes REQ-YOLO, a resource-aware, systematic weight quantization framework for object detection, considering both algorithm and hardware resource aspects in object detection. We adopt the block-circulant matrix method and propose a heterogeneous weight quantization using the *Alternating Direction Method of Multipliers* (ADMM), an effective optimization technique for general, non-convex optimization problems. 
To achieve real-time, highly-efficient implementations on FPGA, we present the detailed hardware implementation of block circulant matrices on CONV layers and develop an efficient processing element (PE) structure supporting the heterogeneous weight quantization, CONV dataflow and pipelining techniques, design optimization, and a template-based automatic synthesis framework to optimally exploit hardware resources. Experimental results show that our proposed REQ-YOLO framework can significantly compress the YOLO model while introducing very small accuracy degradation. The related codes are here: https://github.com/Anonymous788/heterogeneous_ADMM_YOLO. 8 | 9 | ## **KEYWORDS** 10 | 11 | FPGA; YOLO; object detection; compression; ADMM 12 | 13 | ## Contribution 14 | 15 | • They present a detailed hardware implementation and optimization of block circulant matrices on CONV layers on object detection tasks. 16 | • They present a heterogeneous weight quantization method including both equal-distance and mixed powers-of-two methods considering hardware resources on FPGAs. They adopt ADMM to directly quantize the FFT results of the weights. 17 | • They employ an HLS design methodology for productive development and optimal hardware resource exploration of their FPGA-based YOLO accelerator. 18 | 19 | ## The overall design of the Framework 20 | 21 | ![image-20210906234930919](https://gitee.com/feiyipengfei/pic-md1/raw/master/20210906234930.png) 22 | 23 | ## Performance 24 | 25 | ![image-20210906235056482](https://gitee.com/feiyipengfei/pic-md1/raw/master/20210906235056.png) 26 | 27 | -------------------------------------------------------------------------------- /Real-time DDoS attack detection using FPGA.md: -------------------------------------------------------------------------------- 1 | # Real-time DDoS attack detection using FPGA 2 | ## ABSTRACT 3 | 4 | A real-time DDoS attack detection method should identify attacks with low computational overhead. 
Although a large number of statistical methods have been designed for DDoS attack detection, only a few real-time statistical solutions detect DDoS attacks in hardware. In this paper, a real-time DDoS detection method is proposed that uses a novel correlation measure to identify DDoS attacks. The effectiveness of the method is evaluated with three network datasets, viz., CAIDA DDoS 2007, MIT DARPA, and TUIDS. Further, the proposed method is implemented on an FPGA to analyze its performance. The method yields high detection accuracy and the FPGA implementation requires less than one microsecond to identify an attack. 5 | 6 | ## Contribution 7 | 8 | 1. A robust correlation measure, referred to as *NaHiD*, capable of handling shifting and scaling correlations among the features of two traffic objects. 9 | 10 | 2. Selection of a small subset of traffic features for DDoS detection without compromising detection accuracy. 11 | 12 | 3. A fast DDoS attack detection method implemented on software as well as hardware platforms. The hardware implementation uses a Field Programmable Gate Array (FPGA) device and requires less than one microsecond to classify an incoming traffic sample as attack or normal. 13 | 14 | 4. Evaluation of the proposed *NaHiD* and DDoS attack detection method in terms of detection time and accuracy on three benchmark network datasets. 
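To make the idea of a correlation measure that tolerates shifting and scaling more concrete, the sketch below uses the plain Pearson correlation coefficient as a stand-in. It is not *NaHiD*, whose definition is not given in this summary. The function names `pearson` and `is_attack`, the feature vectors, and the threshold of 0.9 are all hypothetical choices made for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length feature vectors.

    Used here only as a stand-in for the paper's NaHiD measure: Pearson
    correlation is unchanged when one vector is shifted by a constant or
    scaled by a positive factor.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # a constant vector carries no correlation information
    return cov / (norm_x * norm_y)


def is_attack(sample, normal_profile, threshold=0.9):
    """Flag a traffic sample whose correlation with a normal profile is low.

    The threshold is an assumed value, not one taken from the paper.
    """
    return pearson(sample, normal_profile) < threshold


normal_profile = [10.0, 20.0, 30.0, 40.0]
shifted_scaled = [2.0 * v + 5.0 for v in normal_profile]  # same shape as the profile
dissimilar = [40.0, 10.0, 35.0, 5.0]                      # different shape

print(is_attack(shifted_scaled, normal_profile))
print(is_attack(dissimilar, normal_profile))
```

A measure of this kind needs only sums, products, and a comparison per sample, which is one reason correlation-based detectors map naturally onto hardware.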
15 | 16 | ## The overall design of the Framework 17 | 18 | ![image-20210909095933443](https://gitee.com/feiyipengfei/pic-md1/raw/master/20210909095933.png) 19 | 20 | ## Performance 21 | 22 | ![image-20210909100057531](https://gitee.com/feiyipengfei/pic-md1/raw/master/20210909100057.png) 23 | 24 | -------------------------------------------------------------------------------- /Real-time hardware–software embedded vision system for ITS smart camera implemented in Zynq SoC.md: -------------------------------------------------------------------------------- 1 | # Real-time hardware–software embedded vision system for ITS smart camera implemented in Zynq SoC 2 | ## ABSTRACT 3 | 4 | The article demonstrates the usefulness of heterogeneous System on Chip (SoC) devices in smart cameras used in intelligent transportation systems (ITS). In a compact, energy-efficient system, the following exemplary algorithms were implemented: vehicle queue length estimation, vehicle detection, vehicle counting and speed estimation (using multiple virtual detection lines), as well as vehicle type (local binary features and SVM classifier) and colour (k-means classifier and YCbCr colourspace analysis) recognition. The solution exploits the hardware–software architecture, i.e. the combination of reconfigurable resources and the efficient ARM processor. Most of the modules were implemented in hardware, using Verilog HDL, taking full advantage of the possible parallelization and pipelining, which made real-time image processing possible. The ARM processor is responsible for executing some parts of the algorithm, i.e. high-level image processing and analysis, as well as for communication with the external systems (e.g. traffic light controllers). The demonstrated results indicate that modern SoC systems are a very interesting platform for advanced ITS systems and other advanced embedded image processing, analysis and recognition applications. 
5 | 6 | ## Keywords 7 | 8 | Intelligent Transportation Systems · Hardware-software image processing (Zynq SoC) · Vehicle queue length estimation · Vehicle detection · Vehicle type and colour recognition 9 | 10 | ## Contribution 11 | 12 | – the concept of using the heterogeneous (hardware–software) Zynq SoC device for an ITS smart camera, which yields an effective, low-power computing platform, 13 | 14 | – evaluation of this concept by implementing sample algorithms used in ITS smart cameras, 15 | 16 | – the proposal of a new and robust vehicle detection algorithm customized for a hardware–software system, 17 | 18 | – the proof that a Zynq-based smart camera allows real-time image processing of a 720 × 576 pixel @ 50 fps video stream. 19 | 20 | ## The overall design of the Framework 21 | 22 | ![image-20210909174601202](https://gitee.com/feiyipengfei/pic-md1/raw/master/20210909174601.png) 23 | 24 | ## Performance 25 | 26 | ![image-20210909174625019](https://gitee.com/feiyipengfei/pic-md1/raw/master/20210909174625.png) 27 | 28 | -------------------------------------------------------------------------------- /SpWA An Efficient Sparse Winograd Convolutional Neural Networks Accelerator on FPGAs.md: -------------------------------------------------------------------------------- 1 | # SpWA: An Efficient Sparse Winograd Convolutional Neural Networks Accelerator on FPGAs 2 | ## ABSTRACT 3 | 4 | FPGAs have been an efficient accelerator for CNN inference due to their high performance, flexibility, and energy efficiency. To improve the performance of CNNs on FPGAs, fast algorithms and sparse methods emerge as the most attractive alternatives, which can effectively reduce the complexity of CNNs. Using fast algorithms, the feature maps are transformed to a special domain to reduce the arithmetic complexity. On the other hand, compressing CNN models by pruning the unimportant connections reduces both storage and arithmetic complexity. 
5 | 6 | In this paper, we introduce a sparse Winograd convolution accelerator (SpWA) combining these two orthogonal approaches on FPGAs. First, we employ a novel dataflow by rearranging the filter layout in Winograd convolution. Then we design an efficient architecture to implement SpWA using a line buffer design and Compressed-Sparse-Column (CSC) format-based processing elements. Finally, we propose an efficient algorithm based on dynamic programming to balance the computation among different processing elements. Experimental results on VGG16 and YOLO networks show a 2.9x-3.1x speedup compared with the state-of-the-art technique. 7 | 8 | ## Contribution 9 | 10 | - They present a dataflow that applies the sparse Winograd algorithm for accelerating CNNs on FPGAs. In this dataflow, they batch the transformed input tiles and rearrange the filter layout. 11 | - They propose an architecture based on the SpWA dataflow. The SpWA architecture employs efficient line-buffer and PE designs. 12 | - They design an efficient algorithm to determine the partition of sparse matrices into groups so as to minimize the total idle cycles. 
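For readers unfamiliar with the Winograd transform that the SpWA dataflow builds on, here is a minimal sketch of the textbook 1-D case F(2,3). It computes two outputs of a 3-tap filter with four multiplications instead of six. This is the standard algorithm, not the paper's 2-D sparse dataflow, and the function names `winograd_f23` and `direct_f23` are hypothetical.

```python
def winograd_f23(d, g):
    """Two outputs of a 3-tap 1-D convolution (CNN-style, no filter flip)
    computed with the Winograd F(2,3) algorithm: 4 multiplies instead of 6.
    """
    d0, d1, d2, d3 = d
    g0, g1, g2 = g

    # Filter transform: depends only on the weights, so it can be
    # precomputed once per filter.
    u0 = g0
    u1 = (g0 + g1 + g2) / 2
    u2 = (g0 - g1 + g2) / 2
    u3 = g2

    # Input transform followed by element-wise multiplication.
    m1 = (d0 - d2) * u0
    m2 = (d1 + d2) * u1
    m3 = (d2 - d1) * u2
    m4 = (d1 - d3) * u3

    # Output transform.
    return [m1 + m2 + m3, m2 - m3 - m4]


def direct_f23(d, g):
    """Reference result using the direct sliding-window definition."""
    return [sum(d[i + k] * g[k] for k in range(3)) for i in range(2)]


tile = [1.0, 2.0, 3.0, 4.0]
weights = [1.0, 0.0, -1.0]
print(winograd_f23(tile, weights))
print(direct_f23(tile, weights))
```

With these example weights, two of the four transformed filter values (`u1` and `u2`) are zero, so the corresponding multiplications contribute nothing. Skipping such zero products is the general motivation for combining pruning with Winograd convolution, although how SpWA schedules them is specific to the paper.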
13 | 14 | ## The overall design of the Framework 15 | 16 | ![image-20210907225639832](https://gitee.com/feiyipengfei/pic-md1/raw/master/20210907225639.png) 17 | 18 | ## Performance 19 | 20 | ![image-20210907225753764](https://gitee.com/feiyipengfei/pic-md1/raw/master/20210907225753.png) 21 | 22 | ![image-20210907225841379](https://gitee.com/feiyipengfei/pic-md1/raw/master/20210907225841.png) 23 | 24 | -------------------------------------------------------------------------------- /Sparse-YOLO HardwareSoftware Co-Design of an FPGA Accelerator for YOLOv2.md: -------------------------------------------------------------------------------- 1 | # Sparse-YOLO: Hardware/Software Co-Design of an FPGA Accelerator for YOLOv2 2 | ## ABSTRACT 3 | 4 | Convolutional neural network (CNN) based object detection algorithms are becoming dominant in many application fields due to their superior accuracy advantage over traditional schemes. Among them, You Only Look Once (YOLO) is one of the most popular detection frameworks and shows the best trade-offs between speed and accuracy. However, due to the intrinsic high computational workload of CNN, it is still challenging when targeting high-throughput processing with low cost in energy consumption. In this paper, we propose a hardware/software (HW/SW) co-design methodology targeting CPU+FPGA-based heterogeneous platforms. Firstly, we extend a novel sparse convolution algorithm to the YOLOv2 framework, and then develop a resource-efficient FPGA accelerator architecture based on asynchronously executed parallel convolution cores. Secondly, algorithm-level optimization schemes, including hardware-aware neural network pruning, clustering and quantization are introduced, which successfully reduce the computational workload of the YOLOv2 algorithm by 7 times. Finally, an end-to-end design space exploration flow for FPGA-based accelerator design is presented and two HW/SW partition strategies are studied and implemented. 
Experimental results show that our design can achieve a peak throughput of 2.13 TOPS (72.5 fps) on an Intel Arria-10 GX1150 FPGA under the working frequency of 211 MHz, while the detection accuracy is 74.45 on the PASCAL VOC2007 dataset. 5 | 6 | ## Index Terms 7 | 8 | convolutional neural networks, fine-grained pruning, field programmable gate arrays, object detection, YOLO 9 | 10 | ## Contribution 11 | 12 | • A novel sparse convolution scheme, namely the Accumulate-Before-Multiplication Sparse-Convolution (ABM-SpConv), was applied to the YOLOv2 object detection algorithm. They proposed a hardware-aware algorithm-level optimization flow for the YOLOv2 network, including pruning, clustering, layer fusion, and quantization, to overcome the low hardware utilization issue of the ABM-SpConv scheme that prior work has failed to address. 13 | 14 | • A dedicated accelerator architecture was developed targeting CPU+FPGA-based heterogeneous computing platforms. The proposed accelerator architecture achieved the best balance between the flexibility of software and the high computational efficiency of customized hardware circuits. 15 | 16 | • An end-to-end hardware-software (HW-SW) co-design workflow was proposed, which covered all the important research topics from algorithm-level neural network compression to hardware-level sparse-convolution-specific circuit design and architecture design space exploration (DSE). 17 | 18 | • Two HW/SW partition strategies, which targeted different hardware settings, were proposed and corresponding designs were implemented on an Intel Arria 10 GX1150 FPGA. Experimental results have shown that their optimization scheme compressed the memory footprint of YOLOv2’s CNN model by 20× and the computational workload by 7× at the cost of only 2.35% loss in detection accuracy, which had no obvious impact on the effectiveness of the real-world application tested in this work. 
The design targeting platforms with powerful CPUs achieved a detection speed of 72.5 fps at 416 × 416 input resolution, while the design targeting embedded platforms could also perform real-time detection at a frame rate of 61.9 fps. Their design shows more than 3.3× improvement in throughput over the best FPGA-based YOLO accelerator reported in the literature. 19 | 20 | ## The overall design of the Framework 21 | 22 | ![image-20210909201936546](https://gitee.com/feiyipengfei/pic-md1/raw/master/20210909201936.png) 23 | 24 | ## Performance 25 | 26 | ![image-20210909202030944](https://gitee.com/feiyipengfei/pic-md1/raw/master/20210909202030.png) 27 | 28 | -------------------------------------------------------------------------------- /Towards a Scalable HardwareSoftware Co-Design Platform for Real-time Pedestrian Tracking Based on a ZYNQ-7000 Device.md: -------------------------------------------------------------------------------- 1 | # Towards a Scalable Hardware/Software Co-Design Platform for Real-time Pedestrian Tracking Based on a ZYNQ-7000 Device 2 | ## Abstract 3 | 4 | Currently, most designers face a daunting task to research different design flows and learn the intricacies of specific software from various manufacturers in hardware/software co-design. The urgent need for a scalable hardware/software co-design platform has become a key strategic element for developing hardware/software integrated systems. In this paper, we propose a new design flow for building a scalable co-design platform on FPGA-based system-on-chip. We employ an integrated approach to implement histogram of oriented gradients (HOG) features and support vector machine (SVM) classification on a programmable device for pedestrian tracking. Not only is the hardware resource analysis reported, but the precision and success rates of pedestrian tracking on nine open access image data sets are also analysed. 
Finally, our proposed design flow can be used for any real-time image processing-related products on programmable ZYNQ-based embedded systems, which benefit from a reduced design time, and provides a scalable solution for embedded image processing products. 5 | 6 | ## Keywords 7 | 8 | Object tracking; Open-access; System-on-Chip; Hardware/Software co-design 9 | 10 | ## Contribution 11 | 12 | They propose a general solution and prototyping design, which allows any software developer to quickly validate and prototype image applications in an environment very close to the final implementation, with real-time execution on an embedded hardware architecture. For other boards, development follows the same concept, with only the synthesis steps changing. 13 | 14 | They propose a hardware/software co-design, which allows any software developer to quickly validate and prototype image applications in an environment very close to the final implementation, with real-time execution on embedded hardware. In general, libraries in a hardware/software co-design system need to be recompiled. For instance, traditional high-level synthesis (HLS) includes three steps (allocation, scheduling and binding), and the Xilinx SDK flow then involves a lot of recompilation and redevelopment of the interfaces between the FPGA and the on-chip processors. Hence, a lot of development work is often repeated. With their co-design platform, however, hardware IPs can be used as a device through a DDR shared memory scheme. 15 | 16 | ## The overall design of the Framework 17 | 18 | ![image-20210909193811460](https://gitee.com/feiyipengfei/pic-md1/raw/master/20210909193811.png) 19 | 20 | ## Performance 21 | 22 | ![image-20210909193911519](https://gitee.com/feiyipengfei/pic-md1/raw/master/20210909193911.png) 23 | 24 | --------------------------------------------------------------------------------