├── Dockerfile
├── README.md
├── YOLOv4_cam.py
├── cv2_info.py
├── face_SSD.py
├── golden_axe.png
├── gstream.png
├── model_SSD
│   ├── deploy.prototxt
│   └── res10_300x300_ssd_iter_140000_fp16.caffemodel
├── model_SURE
│   └── EDSR_x3.pb
├── model_YOLOv4
│   ├── coco.names
│   └── yolov4.cfg
├── super_resolution.py
└── twistellar.png

/Dockerfile:
--------------------------------------------------------------------------------
ARG CUDA_VERSION=10.2
ARG CUDNN_VERSION=7
ARG UBUNTU_VERSION=18.04

FROM nvidia/cuda:${CUDA_VERSION}-cudnn${CUDNN_VERSION}-devel-ubuntu${UBUNTU_VERSION}
LABEL maintainer="github.com/Fizmath <boy.cosmic@yandex.by>"

ARG PYTHON_VERSION=3.8
ARG OPENCV_VERSION=4.5.0

ENV DEBIAN_FRONTEND=noninteractive

RUN apt-get -qq update && \
    apt-get -qq install \
    # Python:
    python${PYTHON_VERSION} \
    python${PYTHON_VERSION}-dev \
    libpython${PYTHON_VERSION} \
    libpython${PYTHON_VERSION}-dev \
    python-dev \
    python3-setuptools \
    # development tools, OpenCV image/video/GUI dependencies, optimization packages, etc.:
    apt-utils \
    autoconf \
    automake \
    checkinstall \
    cmake \
    gfortran \
    git \
    libatlas-base-dev \
    libavcodec-dev \
    libavformat-dev \
    libavresample-dev \
    libeigen3-dev \
    libexpat1-dev \
    libglew-dev \
    libgtk-3-dev \
    libjpeg-dev \
    libopenexr-dev \
    libpng-dev \
    libpostproc-dev \
    libpq-dev \
    libqt5opengl5-dev \
    libsm6 \
    libswscale-dev \
    libtbb2 \
    libtbb-dev \
    libtiff-dev \
    libtool \
    libv4l-dev \
    libwebp-dev \
    libxext6 \
    libxrender1 \
    libxvidcore-dev \
    pkg-config \
    protobuf-compiler \
    qt5-default \
    unzip \
    wget \
    yasm \
    zlib1g-dev \
    # GStreamer:
    libgstreamer1.0-0 \
    gstreamer1.0-plugins-base \
    gstreamer1.0-plugins-good \
    gstreamer1.0-plugins-bad \
    gstreamer1.0-plugins-ugly \
    gstreamer1.0-libav \
    gstreamer1.0-doc \
    gstreamer1.0-tools \
    gstreamer1.0-x \
    gstreamer1.0-alsa \
    gstreamer1.0-gl \
    gstreamer1.0-gtk3 \
    gstreamer1.0-qt5 \
    gstreamer1.0-pulseaudio \
    libgstreamer1.0-dev \
    libgstreamer-plugins-base1.0-dev && \
    rm -rf /var/lib/apt/lists/* && \
    apt-get purge --auto-remove && \
    apt-get clean

# make the newly installed Python the system-wide python3:
RUN update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.6 1 && \
    update-alternatives --install /usr/bin/python3 python3 /usr/bin/python${PYTHON_VERSION} 2 && \
    update-alternatives --config python3

# pip and NumPy for the newly installed Python:
RUN wget https://bootstrap.pypa.io/get-pip.py && \
    python${PYTHON_VERSION} get-pip.py --no-setuptools --no-wheel && \
    rm get-pip.py && \
    pip install numpy

# OpenCV and opencv_contrib:
RUN cd /opt/ && \
    wget https://github.com/opencv/opencv/archive/${OPENCV_VERSION}.zip -O opencv.zip && \
    unzip -qq opencv.zip && \
    rm opencv.zip && \
    wget https://github.com/opencv/opencv_contrib/archive/${OPENCV_VERSION}.zip -O opencv-co.zip && \
    unzip -qq opencv-co.zip && \
    rm opencv-co.zip && \
    mkdir /opt/opencv-${OPENCV_VERSION}/build && cd /opt/opencv-${OPENCV_VERSION}/build && \
    cmake \
      -D BUILD_opencv_java=OFF \
      -D WITH_CUDA=ON \
      -D WITH_CUBLAS=ON \
      -D OPENCV_DNN_CUDA=ON \
      -D CUDA_ARCH_PTX=7.5 \
      -D WITH_NVCUVID=ON \
      -D WITH_CUFFT=ON \
      -D WITH_OPENGL=ON \
      -D WITH_QT=ON \
      -D WITH_IPP=ON \
      -D WITH_TBB=ON \
      -D WITH_EIGEN=ON \
      -D CMAKE_BUILD_TYPE=RELEASE \
      -D OPENCV_EXTRA_MODULES_PATH=/opt/opencv_contrib-${OPENCV_VERSION}/modules \
      -D PYTHON2_EXECUTABLE=$(python${PYTHON_VERSION} -c "import sys; print(sys.prefix)") \
      -D CMAKE_INSTALL_PREFIX=$(python${PYTHON_VERSION} -c "import sys; print(sys.prefix)") \
      -D PYTHON_EXECUTABLE=$(which python${PYTHON_VERSION}) \
      -D PYTHON_INCLUDE_DIR=$(python${PYTHON_VERSION} -c "from distutils.sysconfig import get_python_inc; print(get_python_inc())") \
      -D PYTHON_PACKAGES_PATH=$(python${PYTHON_VERSION} -c "from distutils.sysconfig import get_python_lib; print(get_python_lib())") \
      -D CUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda \
      -D CMAKE_LIBRARY_PATH=/usr/local/cuda/lib64/stubs \
      .. && \
    make -j$(nproc) && \
    make install && \
    ldconfig && \
    rm -rf /opt/opencv-${OPENCV_VERSION} && rm -rf /opt/opencv_contrib-${OPENCV_VERSION}

ENV NVIDIA_DRIVER_CAPABILITIES=all
ENV XDG_RUNTIME_DIR=/tmp

WORKDIR /myapp
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------

![Docker Build Status](https://img.shields.io/docker/cloud/build/fizmath/gpu-opencv)
![Docker](https://img.shields.io/docker/cloud/automated/fizmath/gpu-opencv)
![DockerHub Pulls](https://img.shields.io/docker/pulls/fizmath/gpu-opencv.svg)


# GPU-accelerated Docker container with OpenCV 4.5, Python 3.8 and GStreamer

- [opencv](https://github.com/opencv/opencv) + [opencv_contrib](https://github.com/opencv/opencv_contrib)
- Python 3.8.0
- Ubuntu 18.04 LTS
- GStreamer 1.14.5
- FFMPEG
- CUDA 10.2
- NVIDIA GPU arch: 30 35 37 50 52 60 61 70 75
- CUDA_ARCH_PTX = 75 (the container does not work with **NVIDIA Ampere GPUs** (`sm_86`); for the RTX 30 series, please see this [new repo](https://github.com/Fizmath/Docker-opencv-GPU-RTX_30))
- cuDNN: 7.6.5
- OpenCL
- Qt5::OpenGL 5.9.5
- Intel IPP and TBB
- Uncompressed size: 6.02 GB


Pull the image from Docker Hub:

- [https://hub.docker.com/u/fizmath](https://hub.docker.com/u/fizmath)
```sh
$ docker pull fizmath/gpu-opencv:latest
```

## How to run

- With GPU

You need to install the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html) on your machine.
Run the container with this command:

```
$ docker run --gpus all -it --rm fizmath/gpu-opencv:latest
root@22067ad0cc87:/myapp#
```

- With CPU

If no GPU is available on your machine, you can still run the container with plain [Docker](https://docs.docker.com/engine/install/):
```
$ docker run -it --rm fizmath/gpu-opencv:latest
root@cc00562d816e:/myapp#
```

To run the example `.py` files in this repo on the CPU, comment out these two lines:

```
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)
```

## Test with the examples in this repo

1. Download this repo. The pretrained models are included except [yolov4.weights](https://github.com/easyadin/Object-Detection-YOLOv4#pre-trained-models), which you need to download yourself. For simplicity, the scripts take no command-line arguments.

2. Unzip it and mount its directory on your machine as a Docker volume:
```sh
$ docker run --gpus all -it --rm -v :/myapp fizmath/gpu-opencv:latest
root@771c5bcb2895:/myapp# ls
Dockerfile YOLOv4_cam.py face_SSD.py model_SSD model_YOLOv4
README.md cv2_info.py golden_axe.png model_SURE super_resolution.py
```
3. Write the OpenCV build information to a text file (`cv2_Build_Info`) and check the output in your volume:
```sh
root@771c5bcb2895:/myapp# python3 cv2_info.py
```
4. Image super-resolution with OpenCV, CUDA and Docker: the following command upscales the included [image](golden_axe.png) and writes the super-resolved image `SURE_golden_axe.png` to your volume:
```sh
root@771c5bcb2895:/myapp# python3 super_resolution.py
```
Compare both images visually. If you run out of GPU memory, run inference on the CPU instead; see the [.py](./super_resolution.py) file.
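Steps 3 and 4 rely on the image's OpenCV actually being built with CUDA, cuDNN and GStreamer. As a minimal sketch, assuming the indented `Name:   YES (...)` / `Name:   NO` layout that `cv2.getBuildInformation()` prints, the hypothetical helper `parse_build_flags` below pulls those YES/NO switches out of the `cv2_Build_Info` text. The sample string is illustrative, not real output from this image.

```python
import re


def parse_build_flags(build_info: str) -> dict:
    """Map 'Name:   YES/NO (...)' lines of an OpenCV build-info dump to booleans.

    Assumes the indented layout printed by cv2.getBuildInformation();
    lines whose value is not YES or NO (versions, paths, section titles) are skipped.
    """
    flags = {}
    for line in build_info.splitlines():
        match = re.match(r"\s*([^:]+):\s+(YES|NO)\b", line)
        if match:
            flags[match.group(1).strip()] = match.group(2) == "YES"
    return flags


if __name__ == "__main__":
    # Hypothetical excerpt shaped like OpenCV's build-info output.
    sample = """
  Video I/O:
    FFMPEG:                      YES
    GStreamer:                   YES (1.14.5)
  NVIDIA CUDA:                   YES (ver 10.2, CUFFT CUBLAS NVCUVID)
  cuDNN:                         YES (ver 7.6.5)
  Java:                          NO
"""
    flags = parse_build_flags(sample)
    for key in ("NVIDIA CUDA", "cuDNN", "GStreamer"):
        print(f"{key}: {'enabled' if flags.get(key) else 'missing'}")
```

Inside the container you could pass it `open('cv2_Build_Info').read()` or the string returned by `cv2.getBuildInformation()` directly; if `NVIDIA CUDA` comes back missing, the CUDA backend lines in the examples will not give you GPU inference.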

For the next two examples, add these options to `docker run`:

- `-e DISPLAY=$DISPLAY`: passes your machine's display ID to the container.

- `--device="/dev/video0:/dev/video0"`: gives the container access to the camera.

- `-v /tmp/.X11-unix:/tmp/.X11-unix:rw`: shares the host's X server socket with the container. To display a GUI from Docker, the X client inside the container must communicate with the host's X server.

Note that I tested the above commands on Ubuntu; they may differ on other systems.

5. Real-time face detection with OpenCV DNN, GStreamer, CUDA and Docker. Before running the container, type in your host terminal:
```sh
$ xhost +
access control disabled, clients can connect from any host
```
This disables X server access control so that clients, including the container, can connect to the running X server. When you are done with the examples, type `xhost -` to re-enable access control for security.

```sh
$ docker run --gpus all --rm -it -e DISPLAY=$DISPLAY -v :/myapp -v /tmp/.X11-unix:/tmp/.X11-unix:rw --device="/dev/video0:/dev/video0" fizmath/gpu-opencv:latest
root@1be1f7efabf9:/myapp# python3 face_SSD.py
```

![img](twistellar.png)

I drew ***encircling ellipses*** instead of the usual face ***bounding boxes***. Press `q` to end the session.

6. Real-time object detection with YOLO v4, GStreamer, CUDA and Docker. First download [yolov4.weights](https://github.com/easyadin/Object-Detection-YOLOv4#pre-trained-models) and put it in its [folder](model_YOLOv4). Then, in the same container:

```sh
root@1be1f7efabf9:/myapp# python3 YOLOv4_cam.py
```
Hold some objects from this [list](model_YOLOv4/coco.names) in front of your camera.
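In `YOLOv4_cam.py`, `model.detect(frame, CONFIDENCE_THRESHOLD, NMS_THRESHOLD)` first discards candidates below the confidence threshold and then applies non-maximum suppression (NMS) with an IoU threshold. The pure-Python sketch below illustrates what those two numbers control using a plain greedy NMS over `(x, y, w, h)` boxes. It ignores class labels and is not OpenCV's internal code; OpenCV's implementation may differ in details. The names `iou` and `greedy_nms` are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two (x, y, w, h) boxes."""
    ix = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: List[Box], scores: List[float],
               conf_threshold: float, nms_threshold: float) -> List[int]:
    """Return the indices of the kept boxes, highest score first."""
    # 1) drop low-confidence candidates
    order = [i for i in range(len(boxes)) if scores[i] >= conf_threshold]
    # 2) visit the remaining boxes from most to least confident
    order.sort(key=lambda i: scores[i], reverse=True)
    kept: List[int] = []
    for i in order:
        # keep a box only if it overlaps no already-kept box by more than nms_threshold
        if all(iou(boxes[i], boxes[k]) <= nms_threshold for k in kept):
            kept.append(i)
    return kept


if __name__ == "__main__":
    boxes = [(10, 10, 100, 100), (15, 15, 100, 100), (300, 300, 50, 50), (0, 0, 20, 20)]
    scores = [0.9, 0.8, 0.6, 0.1]
    # same values as CONFIDENCE_THRESHOLD and NMS_THRESHOLD in YOLOv4_cam.py
    print(greedy_nms(boxes, scores, conf_threshold=0.2, nms_threshold=0.4))  # → [0, 2]
```

Box 1 overlaps box 0 with an IoU of about 0.82 and is suppressed; box 3 never reaches NMS because its score is below 0.2. Raising `NMS_THRESHOLD` keeps more overlapping boxes, and raising `CONFIDENCE_THRESHOLD` drops more weak detections.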

## Inspect the GStreamer API outside the OpenCV wrapper

In the container's shell (started with the display options above), type:

```sh
root@1be1f7efabf9:/myapp# gst-launch-1.0 videotestsrc ! videoconvert ! autovideosink
```
A GStreamer "hello world" test-pattern window should pop up. See the [documentation](https://gstreamer.freedesktop.org/documentation/tutorials/basic/gstreamer-tools.html?gi-language=python).

## OpenCV and GStreamer debugging

Export OpenCV [log levels](https://docs.opencv.org/4.5.0/da/db0/namespacecv_1_1utils_1_1logging.html) or GStreamer [debug levels](https://gstreamer.freedesktop.org/documentation/tutorials/basic/debugging-tools.html?gi-language=python) in the container's shell. For example:

```sh
root@1be1f7efabf9:/myapp# export OPENCV_LOG_LEVEL=INFO
```
or

```sh
root@1be1f7efabf9:/myapp# export GST_DEBUG=2
```

Then run one of the examples and browse the output.

## Build your own image
The [Dockerfile](Dockerfile) behind the image may not be perfectly structured; it is just my own assembly.
You can modify and upgrade it and build an image that suits your requirements:

```bash
$ docker build -f Dockerfile -t : .
```
It won't be that straightforward: expect some deprecation warnings and compatibility issues.
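Because the Dockerfile declares `CUDA_VERSION`, `CUDNN_VERSION`, `UBUNTU_VERSION`, `PYTHON_VERSION` and `OPENCV_VERSION` as `ARG`s, one way to try other versions without editing the file is `docker build --build-arg NAME=value`. The sketch below assembles such a command from Python; the helper name `docker_build_cmd` and the tag `my-opencv:dev` are hypothetical, and it does not check that the resulting `nvidia/cuda` base tag exists or that the packages and OpenCV version are compatible.

```python
import shlex
from typing import Dict, List

# ARG names declared in this repo's Dockerfile
KNOWN_ARGS = {"CUDA_VERSION", "CUDNN_VERSION", "UBUNTU_VERSION",
              "PYTHON_VERSION", "OPENCV_VERSION"}


def docker_build_cmd(tag: str, build_args: Dict[str, str],
                     dockerfile: str = "Dockerfile", context: str = ".") -> List[str]:
    """Assemble a `docker build` argv with --build-arg overrides (hypothetical helper)."""
    unknown = set(build_args) - KNOWN_ARGS
    if unknown:
        raise ValueError(f"not declared as ARG in the Dockerfile: {sorted(unknown)}")
    cmd = ["docker", "build", "-f", dockerfile, "-t", tag]
    for name, value in sorted(build_args.items()):
        cmd += ["--build-arg", f"{name}={value}"]
    cmd.append(context)
    return cmd


if __name__ == "__main__":
    cmd = docker_build_cmd("my-opencv:dev", {"OPENCV_VERSION": "4.5.1"})
    print(shlex.join(cmd))
    # to actually build (untested with other versions): subprocess.run(cmd, check=True)
```

Rejecting unknown names is a deliberate choice: Docker only warns about a `--build-arg` that the Dockerfile never consumes, so a typo would otherwise silently build the default versions.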
To keep the image lightweight and compatible with older GPU architectures (`SM_30`, `SM_35`, `SM_37`), I used `10.2-cudnn7-devel-ubuntu18.04` as the base image.

--------------------------------------------------------------------------------
/YOLOv4_cam.py:
--------------------------------------------------------------------------------
"""
OpenCV YOLO v4 webcam object detection on GPU/CPU

https://gist.github.com/YashasSamaga/e2b19a6807a13046e399f4bc3cca3a49
https://github.com/easyadin/Object-Detection-YOLOv4

"""
import cv2
import time
import numpy as np

CONFIDENCE_THRESHOLD = 0.2
NMS_THRESHOLD = 0.4

net = cv2.dnn.readNet("./model_YOLOv4/yolov4.weights", "./model_YOLOv4/yolov4.cfg")

# comment out these two lines to run inference on the CPU:
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)
#
# if your GPU supports FP16, you can replace the target line above with:
# net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA_FP16)

model = cv2.dnn_DetectionModel(net)
model.setInputParams(size=(416, 416), scale=1 / 255, swapRB=True)

with open("./model_YOLOv4/coco.names", "r") as f:
    class_names = [cname.strip() for cname in f.readlines()]

COLORS = np.random.randint(256, size=(10, 3)).tolist()  # 10 random colors

# record the camera's native parameters; change 0 if your input device differs:
camera = cv2.VideoCapture(0)
fps = camera.get(cv2.CAP_PROP_FPS)
width0 = int(camera.get(cv2.CAP_PROP_FRAME_WIDTH))
height0 = int(camera.get(cv2.CAP_PROP_FRAME_HEIGHT))
camera.release()  # free the device so the GStreamer pipeline below can open it

# v4l2src reads /dev/video0 by default; for another camera use 'v4l2src device=/dev/videoN ! ...'
gst_str = 'v4l2src ! videoconvert ! videoscale ! videorate ! ' \
          'video/x-raw,width=1000,height=500,framerate=12/1 ! appsink sync=false drop=TRUE '

cap = cv2.VideoCapture(gst_str, cv2.CAP_GSTREAMER)
wid = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
hei = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
fpsGS = cap.get(cv2.CAP_PROP_FPS)

while True:
    ret, frame = cap.read()
    if not ret:
        print('Stream gone: no frame received.')
        break
    start = time.time()
    classes, scores, boxes = model.detect(frame, CONFIDENCE_THRESHOLD, NMS_THRESHOLD)

    # OpenCV 4.5.0 returns Nx1 class/score arrays; np.ravel() also handles flat arrays
    for classid, score, box in zip(np.ravel(classes), np.ravel(scores), boxes):
        classid = int(classid)
        color = COLORS[classid % len(COLORS)]
        label = "%s : %.2f" % (class_names[classid], score)
        cv2.rectangle(frame, box, color, 2)
        cv2.putText(frame, label, (box[0], box[1] - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.7, color, 2)
    # the text may fall outside the frame for other frame sizes; tune text_x and text_y if so
    text_x = frame.shape[1] - 400
    text_y = 30
    st1 = f'Camera {width0}*{height0},fps:{fps}'
    cv2.putText(frame, st1, (text_x, text_y + 20), cv2.FONT_HERSHEY_COMPLEX_SMALL, 1, (66, 245, 242), 1)
    #
    st2 = f'GStreamer {wid}*{hei},fps:{fpsGS}'
    cv2.putText(frame, st2, (text_x, text_y + 60), cv2.FONT_HERSHEY_COMPLEX_SMALL, 1, (66, 245, 242), 1)
    # detection fps (inference plus drawing)
    end = time.time()
    fps_det = 1 / (end - start)
    st3 = f'Detection fps: {fps_det:.2f} '
    cv2.putText(frame, st3, (text_x, text_y + 100), cv2.FONT_HERSHEY_COMPLEX_SMALL, 1, (66, 245, 242), 1)
    #
    cv2.imshow("detections", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()

--------------------------------------------------------------------------------
/cv2_info.py:
--------------------------------------------------------------------------------

import cv2

# write OpenCV's build configuration (CUDA, cuDNN, GStreamer, ...) to a text file
with open('cv2_Build_Info', 'a') as f:
    f.write(cv2.getBuildInformation())
--------------------------------------------------------------------------------
/face_SSD.py:
--------------------------------------------------------------------------------
"""
Face detection with OpenCV DNN and GStreamer on GPU/CPU

https://github.com/opencv/opencv/tree/master/samples/dnn

"""
import cv2
import numpy as np
import time

# Before changing w, h, fps, record your camera's native width, height and fps.
# Replace 0 with your device index if it differs:
camera = cv2.VideoCapture(0)
fps = camera.get(cv2.CAP_PROP_FPS)
width = int(camera.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(camera.get(cv2.CAP_PROP_FRAME_HEIGHT))
camera.release()  # free the device so the GStreamer pipeline below can open it

net = cv2.dnn.readNetFromCaffe("./model_SSD/deploy.prototxt",
                               "./model_SSD/res10_300x300_ssd_iter_140000_fp16.caffemodel")

# comment out the two lines below for CPU inference
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)
#

confid = 0.75

# v4l2src reads /dev/video0 by default; for another camera use 'v4l2src device=/dev/videoN ! ...'
gst_str = 'v4l2src ! videoconvert ! videoscale ! videorate ! ' \
          'video/x-raw,width=900,height=500,framerate=20/1 ! appsink sync=false drop=TRUE '

camera = cv2.VideoCapture(gst_str, cv2.CAP_GSTREAMER)
fpsGS = camera.get(cv2.CAP_PROP_FPS)

while True:
    ret, frame = camera.read()
    if not ret:
        print("Stream gone: no frame received.")
        break
    start = time.time()
    h, w = frame.shape[:2]
    # 300x300 input, scale 1.0, per-channel mean (104, 177, 123) subtracted
    imageBlob = cv2.dnn.blobFromImage(cv2.resize(frame, (300, 300)), 1.0, (300, 300), (104, 177, 123))
    net.setInput(imageBlob)
    detections = net.forward()
    # detections has shape (1, 1, N, 7): [image_id, label, confidence, x1, y1, x2, y2]
    for i in range(0, detections.shape[2]):
        confidence = detections[0, 0, i, 2]
        if confidence > confid:
            # box corners are normalized to [0, 1]; scale them to pixels
            box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
            (startX, startY, endX, endY) = box.astype("int")
            face = frame[startY:endY, startX:endX]
            (fH, fW) = face.shape[:2]
            if fW < 30 or fH < 30:
                continue
            # encircling ellipse: an open arc (120 to 360 degrees) around the face
            a1 = int((np.absolute(startX - endX)) / 2)
            a2 = int((np.absolute(startY - endY)) / 2)
            midx = startX + a1
            midy = startY + a2
            cv2.ellipse(frame, (midx, midy), (a1, int(a1 * 1.5)), 0, 120, 360, (66, 245, 242), 2, lineType=cv2.LINE_8)
    # the text may fall outside the frame for other frame sizes; tune text_x and text_y if so
    text_x = frame.shape[1] - 400
    text_y = 30
    st1 = f'Camera {width}*{height},fps:{fps}'
    cv2.putText(frame, st1, (text_x, text_y + 20), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (66, 245, 242), 1)
    # GStreamer frame size and fps
    (h, w) = frame.shape[:2]
    st2 = f'GStreamer {w}*{h},fps:{fpsGS}'
    cv2.putText(frame, st2, (text_x, text_y + 60), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (66, 245, 242), 1)
    # detection fps (inference plus drawing)
    end = time.time()
    fps_det = 1 / (end - start)
    st3 = f'Detection fps: {fps_det:.1f}'
    cv2.putText(frame, st3, (text_x, text_y + 100), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (66, 245, 242), 1)
    cv2.imshow('OpenCV DNN face detection ', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

camera.release()
cv2.destroyAllWindows()

-------------------------------------------------------------------------------- /golden_axe.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Fizmath/Docker-opencv-GPU/c4b1fc4c3e4e22417bb49a8990bf62d33454f175/golden_axe.png -------------------------------------------------------------------------------- /gstream.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Fizmath/Docker-opencv-GPU/c4b1fc4c3e4e22417bb49a8990bf62d33454f175/gstream.png -------------------------------------------------------------------------------- /model_SSD/deploy.prototxt: -------------------------------------------------------------------------------- 1 | input: "data" 2 | input_shape { 3 | dim: 1 4 | dim: 3 5 | dim: 300 6 | dim: 300 7 | } 8 | 9 | layer { 10 | name: "data_bn" 11 | type: "BatchNorm" 12 | bottom: "data" 13 | top: "data_bn" 14 | param { 15 | lr_mult: 0.0 16 | } 17 | param { 18 | lr_mult: 0.0 19 | } 20 | param { 21 | lr_mult: 0.0 22 | } 23 | } 24 | layer { 25 | name: "data_scale" 26 | type: "Scale" 27 | bottom: "data_bn" 28 | top: "data_bn" 29 | param { 30 | lr_mult: 1.0 31 | decay_mult: 1.0 32 | } 33 | param { 34 | lr_mult: 2.0 35 | decay_mult: 1.0 36 | } 37 | scale_param { 38 | bias_term: true 39 | } 40 | } 41 | layer { 42 | name: "conv1_h" 43 | type: "Convolution" 44 | bottom: "data_bn" 45 | top: "conv1_h" 46 | param { 47 | lr_mult: 1.0 48 | decay_mult: 1.0 49 | } 50 | param { 51 | lr_mult: 2.0 52 | decay_mult: 1.0 53 | } 54 | convolution_param { 55 | num_output: 32 56 | pad: 3 57 | kernel_size: 7 58 | stride: 2 59 | weight_filler { 60 | type: "msra" 61 | variance_norm: FAN_OUT 62 | } 63 | bias_filler { 64 | type: "constant" 65 | value: 0.0 66 | } 67 | } 68 | } 69 | layer { 70 | name: "conv1_bn_h" 71 | type: "BatchNorm" 72 | bottom: "conv1_h" 73 | top: "conv1_h" 74 | param { 75 | lr_mult: 0.0 76 | } 77 | param { 78 | lr_mult: 0.0 79 | } 
80 | param { 81 | lr_mult: 0.0 82 | } 83 | } 84 | layer { 85 | name: "conv1_scale_h" 86 | type: "Scale" 87 | bottom: "conv1_h" 88 | top: "conv1_h" 89 | param { 90 | lr_mult: 1.0 91 | decay_mult: 1.0 92 | } 93 | param { 94 | lr_mult: 2.0 95 | decay_mult: 1.0 96 | } 97 | scale_param { 98 | bias_term: true 99 | } 100 | } 101 | layer { 102 | name: "conv1_relu" 103 | type: "ReLU" 104 | bottom: "conv1_h" 105 | top: "conv1_h" 106 | } 107 | layer { 108 | name: "conv1_pool" 109 | type: "Pooling" 110 | bottom: "conv1_h" 111 | top: "conv1_pool" 112 | pooling_param { 113 | kernel_size: 3 114 | stride: 2 115 | } 116 | } 117 | layer { 118 | name: "layer_64_1_conv1_h" 119 | type: "Convolution" 120 | bottom: "conv1_pool" 121 | top: "layer_64_1_conv1_h" 122 | param { 123 | lr_mult: 1.0 124 | decay_mult: 1.0 125 | } 126 | convolution_param { 127 | num_output: 32 128 | bias_term: false 129 | pad: 1 130 | kernel_size: 3 131 | stride: 1 132 | weight_filler { 133 | type: "msra" 134 | } 135 | bias_filler { 136 | type: "constant" 137 | value: 0.0 138 | } 139 | } 140 | } 141 | layer { 142 | name: "layer_64_1_bn2_h" 143 | type: "BatchNorm" 144 | bottom: "layer_64_1_conv1_h" 145 | top: "layer_64_1_conv1_h" 146 | param { 147 | lr_mult: 0.0 148 | } 149 | param { 150 | lr_mult: 0.0 151 | } 152 | param { 153 | lr_mult: 0.0 154 | } 155 | } 156 | layer { 157 | name: "layer_64_1_scale2_h" 158 | type: "Scale" 159 | bottom: "layer_64_1_conv1_h" 160 | top: "layer_64_1_conv1_h" 161 | param { 162 | lr_mult: 1.0 163 | decay_mult: 1.0 164 | } 165 | param { 166 | lr_mult: 2.0 167 | decay_mult: 1.0 168 | } 169 | scale_param { 170 | bias_term: true 171 | } 172 | } 173 | layer { 174 | name: "layer_64_1_relu2" 175 | type: "ReLU" 176 | bottom: "layer_64_1_conv1_h" 177 | top: "layer_64_1_conv1_h" 178 | } 179 | layer { 180 | name: "layer_64_1_conv2_h" 181 | type: "Convolution" 182 | bottom: "layer_64_1_conv1_h" 183 | top: "layer_64_1_conv2_h" 184 | param { 185 | lr_mult: 1.0 186 | decay_mult: 1.0 187 | } 188 | 
convolution_param { 189 | num_output: 32 190 | bias_term: false 191 | pad: 1 192 | kernel_size: 3 193 | stride: 1 194 | weight_filler { 195 | type: "msra" 196 | } 197 | bias_filler { 198 | type: "constant" 199 | value: 0.0 200 | } 201 | } 202 | } 203 | layer { 204 | name: "layer_64_1_sum" 205 | type: "Eltwise" 206 | bottom: "layer_64_1_conv2_h" 207 | bottom: "conv1_pool" 208 | top: "layer_64_1_sum" 209 | } 210 | layer { 211 | name: "layer_128_1_bn1_h" 212 | type: "BatchNorm" 213 | bottom: "layer_64_1_sum" 214 | top: "layer_128_1_bn1_h" 215 | param { 216 | lr_mult: 0.0 217 | } 218 | param { 219 | lr_mult: 0.0 220 | } 221 | param { 222 | lr_mult: 0.0 223 | } 224 | } 225 | layer { 226 | name: "layer_128_1_scale1_h" 227 | type: "Scale" 228 | bottom: "layer_128_1_bn1_h" 229 | top: "layer_128_1_bn1_h" 230 | param { 231 | lr_mult: 1.0 232 | decay_mult: 1.0 233 | } 234 | param { 235 | lr_mult: 2.0 236 | decay_mult: 1.0 237 | } 238 | scale_param { 239 | bias_term: true 240 | } 241 | } 242 | layer { 243 | name: "layer_128_1_relu1" 244 | type: "ReLU" 245 | bottom: "layer_128_1_bn1_h" 246 | top: "layer_128_1_bn1_h" 247 | } 248 | layer { 249 | name: "layer_128_1_conv1_h" 250 | type: "Convolution" 251 | bottom: "layer_128_1_bn1_h" 252 | top: "layer_128_1_conv1_h" 253 | param { 254 | lr_mult: 1.0 255 | decay_mult: 1.0 256 | } 257 | convolution_param { 258 | num_output: 128 259 | bias_term: false 260 | pad: 1 261 | kernel_size: 3 262 | stride: 2 263 | weight_filler { 264 | type: "msra" 265 | } 266 | bias_filler { 267 | type: "constant" 268 | value: 0.0 269 | } 270 | } 271 | } 272 | layer { 273 | name: "layer_128_1_bn2" 274 | type: "BatchNorm" 275 | bottom: "layer_128_1_conv1_h" 276 | top: "layer_128_1_conv1_h" 277 | param { 278 | lr_mult: 0.0 279 | } 280 | param { 281 | lr_mult: 0.0 282 | } 283 | param { 284 | lr_mult: 0.0 285 | } 286 | } 287 | layer { 288 | name: "layer_128_1_scale2" 289 | type: "Scale" 290 | bottom: "layer_128_1_conv1_h" 291 | top: "layer_128_1_conv1_h" 292 | 
param { 293 | lr_mult: 1.0 294 | decay_mult: 1.0 295 | } 296 | param { 297 | lr_mult: 2.0 298 | decay_mult: 1.0 299 | } 300 | scale_param { 301 | bias_term: true 302 | } 303 | } 304 | layer { 305 | name: "layer_128_1_relu2" 306 | type: "ReLU" 307 | bottom: "layer_128_1_conv1_h" 308 | top: "layer_128_1_conv1_h" 309 | } 310 | layer { 311 | name: "layer_128_1_conv2" 312 | type: "Convolution" 313 | bottom: "layer_128_1_conv1_h" 314 | top: "layer_128_1_conv2" 315 | param { 316 | lr_mult: 1.0 317 | decay_mult: 1.0 318 | } 319 | convolution_param { 320 | num_output: 128 321 | bias_term: false 322 | pad: 1 323 | kernel_size: 3 324 | stride: 1 325 | weight_filler { 326 | type: "msra" 327 | } 328 | bias_filler { 329 | type: "constant" 330 | value: 0.0 331 | } 332 | } 333 | } 334 | layer { 335 | name: "layer_128_1_conv_expand_h" 336 | type: "Convolution" 337 | bottom: "layer_128_1_bn1_h" 338 | top: "layer_128_1_conv_expand_h" 339 | param { 340 | lr_mult: 1.0 341 | decay_mult: 1.0 342 | } 343 | convolution_param { 344 | num_output: 128 345 | bias_term: false 346 | pad: 0 347 | kernel_size: 1 348 | stride: 2 349 | weight_filler { 350 | type: "msra" 351 | } 352 | bias_filler { 353 | type: "constant" 354 | value: 0.0 355 | } 356 | } 357 | } 358 | layer { 359 | name: "layer_128_1_sum" 360 | type: "Eltwise" 361 | bottom: "layer_128_1_conv2" 362 | bottom: "layer_128_1_conv_expand_h" 363 | top: "layer_128_1_sum" 364 | } 365 | layer { 366 | name: "layer_256_1_bn1" 367 | type: "BatchNorm" 368 | bottom: "layer_128_1_sum" 369 | top: "layer_256_1_bn1" 370 | param { 371 | lr_mult: 0.0 372 | } 373 | param { 374 | lr_mult: 0.0 375 | } 376 | param { 377 | lr_mult: 0.0 378 | } 379 | } 380 | layer { 381 | name: "layer_256_1_scale1" 382 | type: "Scale" 383 | bottom: "layer_256_1_bn1" 384 | top: "layer_256_1_bn1" 385 | param { 386 | lr_mult: 1.0 387 | decay_mult: 1.0 388 | } 389 | param { 390 | lr_mult: 2.0 391 | decay_mult: 1.0 392 | } 393 | scale_param { 394 | bias_term: true 395 | } 396 | } 
397 | layer { 398 | name: "layer_256_1_relu1" 399 | type: "ReLU" 400 | bottom: "layer_256_1_bn1" 401 | top: "layer_256_1_bn1" 402 | } 403 | layer { 404 | name: "layer_256_1_conv1" 405 | type: "Convolution" 406 | bottom: "layer_256_1_bn1" 407 | top: "layer_256_1_conv1" 408 | param { 409 | lr_mult: 1.0 410 | decay_mult: 1.0 411 | } 412 | convolution_param { 413 | num_output: 256 414 | bias_term: false 415 | pad: 1 416 | kernel_size: 3 417 | stride: 2 418 | weight_filler { 419 | type: "msra" 420 | } 421 | bias_filler { 422 | type: "constant" 423 | value: 0.0 424 | } 425 | } 426 | } 427 | layer { 428 | name: "layer_256_1_bn2" 429 | type: "BatchNorm" 430 | bottom: "layer_256_1_conv1" 431 | top: "layer_256_1_conv1" 432 | param { 433 | lr_mult: 0.0 434 | } 435 | param { 436 | lr_mult: 0.0 437 | } 438 | param { 439 | lr_mult: 0.0 440 | } 441 | } 442 | layer { 443 | name: "layer_256_1_scale2" 444 | type: "Scale" 445 | bottom: "layer_256_1_conv1" 446 | top: "layer_256_1_conv1" 447 | param { 448 | lr_mult: 1.0 449 | decay_mult: 1.0 450 | } 451 | param { 452 | lr_mult: 2.0 453 | decay_mult: 1.0 454 | } 455 | scale_param { 456 | bias_term: true 457 | } 458 | } 459 | layer { 460 | name: "layer_256_1_relu2" 461 | type: "ReLU" 462 | bottom: "layer_256_1_conv1" 463 | top: "layer_256_1_conv1" 464 | } 465 | layer { 466 | name: "layer_256_1_conv2" 467 | type: "Convolution" 468 | bottom: "layer_256_1_conv1" 469 | top: "layer_256_1_conv2" 470 | param { 471 | lr_mult: 1.0 472 | decay_mult: 1.0 473 | } 474 | convolution_param { 475 | num_output: 256 476 | bias_term: false 477 | pad: 1 478 | kernel_size: 3 479 | stride: 1 480 | weight_filler { 481 | type: "msra" 482 | } 483 | bias_filler { 484 | type: "constant" 485 | value: 0.0 486 | } 487 | } 488 | } 489 | layer { 490 | name: "layer_256_1_conv_expand" 491 | type: "Convolution" 492 | bottom: "layer_256_1_bn1" 493 | top: "layer_256_1_conv_expand" 494 | param { 495 | lr_mult: 1.0 496 | decay_mult: 1.0 497 | } 498 | convolution_param { 499 | 
num_output: 256 500 | bias_term: false 501 | pad: 0 502 | kernel_size: 1 503 | stride: 2 504 | weight_filler { 505 | type: "msra" 506 | } 507 | bias_filler { 508 | type: "constant" 509 | value: 0.0 510 | } 511 | } 512 | } 513 | layer { 514 | name: "layer_256_1_sum" 515 | type: "Eltwise" 516 | bottom: "layer_256_1_conv2" 517 | bottom: "layer_256_1_conv_expand" 518 | top: "layer_256_1_sum" 519 | } 520 | layer { 521 | name: "layer_512_1_bn1" 522 | type: "BatchNorm" 523 | bottom: "layer_256_1_sum" 524 | top: "layer_512_1_bn1" 525 | param { 526 | lr_mult: 0.0 527 | } 528 | param { 529 | lr_mult: 0.0 530 | } 531 | param { 532 | lr_mult: 0.0 533 | } 534 | } 535 | layer { 536 | name: "layer_512_1_scale1" 537 | type: "Scale" 538 | bottom: "layer_512_1_bn1" 539 | top: "layer_512_1_bn1" 540 | param { 541 | lr_mult: 1.0 542 | decay_mult: 1.0 543 | } 544 | param { 545 | lr_mult: 2.0 546 | decay_mult: 1.0 547 | } 548 | scale_param { 549 | bias_term: true 550 | } 551 | } 552 | layer { 553 | name: "layer_512_1_relu1" 554 | type: "ReLU" 555 | bottom: "layer_512_1_bn1" 556 | top: "layer_512_1_bn1" 557 | } 558 | layer { 559 | name: "layer_512_1_conv1_h" 560 | type: "Convolution" 561 | bottom: "layer_512_1_bn1" 562 | top: "layer_512_1_conv1_h" 563 | param { 564 | lr_mult: 1.0 565 | decay_mult: 1.0 566 | } 567 | convolution_param { 568 | num_output: 128 569 | bias_term: false 570 | pad: 1 571 | kernel_size: 3 572 | stride: 1 # 2 573 | weight_filler { 574 | type: "msra" 575 | } 576 | bias_filler { 577 | type: "constant" 578 | value: 0.0 579 | } 580 | } 581 | } 582 | layer { 583 | name: "layer_512_1_bn2_h" 584 | type: "BatchNorm" 585 | bottom: "layer_512_1_conv1_h" 586 | top: "layer_512_1_conv1_h" 587 | param { 588 | lr_mult: 0.0 589 | } 590 | param { 591 | lr_mult: 0.0 592 | } 593 | param { 594 | lr_mult: 0.0 595 | } 596 | } 597 | layer { 598 | name: "layer_512_1_scale2_h" 599 | type: "Scale" 600 | bottom: "layer_512_1_conv1_h" 601 | top: "layer_512_1_conv1_h" 602 | param { 603 | 
lr_mult: 1.0 604 | decay_mult: 1.0 605 | } 606 | param { 607 | lr_mult: 2.0 608 | decay_mult: 1.0 609 | } 610 | scale_param { 611 | bias_term: true 612 | } 613 | } 614 | layer { 615 | name: "layer_512_1_relu2" 616 | type: "ReLU" 617 | bottom: "layer_512_1_conv1_h" 618 | top: "layer_512_1_conv1_h" 619 | } 620 | layer { 621 | name: "layer_512_1_conv2_h" 622 | type: "Convolution" 623 | bottom: "layer_512_1_conv1_h" 624 | top: "layer_512_1_conv2_h" 625 | param { 626 | lr_mult: 1.0 627 | decay_mult: 1.0 628 | } 629 | convolution_param { 630 | num_output: 256 631 | bias_term: false 632 | pad: 2 # 1 633 | kernel_size: 3 634 | stride: 1 635 | dilation: 2 636 | weight_filler { 637 | type: "msra" 638 | } 639 | bias_filler { 640 | type: "constant" 641 | value: 0.0 642 | } 643 | } 644 | } 645 | layer { 646 | name: "layer_512_1_conv_expand_h" 647 | type: "Convolution" 648 | bottom: "layer_512_1_bn1" 649 | top: "layer_512_1_conv_expand_h" 650 | param { 651 | lr_mult: 1.0 652 | decay_mult: 1.0 653 | } 654 | convolution_param { 655 | num_output: 256 656 | bias_term: false 657 | pad: 0 658 | kernel_size: 1 659 | stride: 1 # 2 660 | weight_filler { 661 | type: "msra" 662 | } 663 | bias_filler { 664 | type: "constant" 665 | value: 0.0 666 | } 667 | } 668 | } 669 | layer { 670 | name: "layer_512_1_sum" 671 | type: "Eltwise" 672 | bottom: "layer_512_1_conv2_h" 673 | bottom: "layer_512_1_conv_expand_h" 674 | top: "layer_512_1_sum" 675 | } 676 | layer { 677 | name: "last_bn_h" 678 | type: "BatchNorm" 679 | bottom: "layer_512_1_sum" 680 | top: "layer_512_1_sum" 681 | param { 682 | lr_mult: 0.0 683 | } 684 | param { 685 | lr_mult: 0.0 686 | } 687 | param { 688 | lr_mult: 0.0 689 | } 690 | } 691 | layer { 692 | name: "last_scale_h" 693 | type: "Scale" 694 | bottom: "layer_512_1_sum" 695 | top: "layer_512_1_sum" 696 | param { 697 | lr_mult: 1.0 698 | decay_mult: 1.0 699 | } 700 | param { 701 | lr_mult: 2.0 702 | decay_mult: 1.0 703 | } 704 | scale_param { 705 | bias_term: true 706 | } 707 | 
} 708 | layer { 709 | name: "last_relu" 710 | type: "ReLU" 711 | bottom: "layer_512_1_sum" 712 | top: "fc7" 713 | } 714 | 715 | layer { 716 | name: "conv6_1_h" 717 | type: "Convolution" 718 | bottom: "fc7" 719 | top: "conv6_1_h" 720 | param { 721 | lr_mult: 1 722 | decay_mult: 1 723 | } 724 | param { 725 | lr_mult: 2 726 | decay_mult: 0 727 | } 728 | convolution_param { 729 | num_output: 128 730 | pad: 0 731 | kernel_size: 1 732 | stride: 1 733 | weight_filler { 734 | type: "xavier" 735 | } 736 | bias_filler { 737 | type: "constant" 738 | value: 0 739 | } 740 | } 741 | } 742 | layer { 743 | name: "conv6_1_relu" 744 | type: "ReLU" 745 | bottom: "conv6_1_h" 746 | top: "conv6_1_h" 747 | } 748 | layer { 749 | name: "conv6_2_h" 750 | type: "Convolution" 751 | bottom: "conv6_1_h" 752 | top: "conv6_2_h" 753 | param { 754 | lr_mult: 1 755 | decay_mult: 1 756 | } 757 | param { 758 | lr_mult: 2 759 | decay_mult: 0 760 | } 761 | convolution_param { 762 | num_output: 256 763 | pad: 1 764 | kernel_size: 3 765 | stride: 2 766 | weight_filler { 767 | type: "xavier" 768 | } 769 | bias_filler { 770 | type: "constant" 771 | value: 0 772 | } 773 | } 774 | } 775 | layer { 776 | name: "conv6_2_relu" 777 | type: "ReLU" 778 | bottom: "conv6_2_h" 779 | top: "conv6_2_h" 780 | } 781 | layer { 782 | name: "conv7_1_h" 783 | type: "Convolution" 784 | bottom: "conv6_2_h" 785 | top: "conv7_1_h" 786 | param { 787 | lr_mult: 1 788 | decay_mult: 1 789 | } 790 | param { 791 | lr_mult: 2 792 | decay_mult: 0 793 | } 794 | convolution_param { 795 | num_output: 64 796 | pad: 0 797 | kernel_size: 1 798 | stride: 1 799 | weight_filler { 800 | type: "xavier" 801 | } 802 | bias_filler { 803 | type: "constant" 804 | value: 0 805 | } 806 | } 807 | } 808 | layer { 809 | name: "conv7_1_relu" 810 | type: "ReLU" 811 | bottom: "conv7_1_h" 812 | top: "conv7_1_h" 813 | } 814 | layer { 815 | name: "conv7_2_h" 816 | type: "Convolution" 817 | bottom: "conv7_1_h" 818 | top: "conv7_2_h" 819 | param { 820 | lr_mult: 1 821 
| decay_mult: 1 822 | } 823 | param { 824 | lr_mult: 2 825 | decay_mult: 0 826 | } 827 | convolution_param { 828 | num_output: 128 829 | pad: 1 830 | kernel_size: 3 831 | stride: 2 832 | weight_filler { 833 | type: "xavier" 834 | } 835 | bias_filler { 836 | type: "constant" 837 | value: 0 838 | } 839 | } 840 | } 841 | layer { 842 | name: "conv7_2_relu" 843 | type: "ReLU" 844 | bottom: "conv7_2_h" 845 | top: "conv7_2_h" 846 | } 847 | layer { 848 | name: "conv8_1_h" 849 | type: "Convolution" 850 | bottom: "conv7_2_h" 851 | top: "conv8_1_h" 852 | param { 853 | lr_mult: 1 854 | decay_mult: 1 855 | } 856 | param { 857 | lr_mult: 2 858 | decay_mult: 0 859 | } 860 | convolution_param { 861 | num_output: 64 862 | pad: 0 863 | kernel_size: 1 864 | stride: 1 865 | weight_filler { 866 | type: "xavier" 867 | } 868 | bias_filler { 869 | type: "constant" 870 | value: 0 871 | } 872 | } 873 | } 874 | layer { 875 | name: "conv8_1_relu" 876 | type: "ReLU" 877 | bottom: "conv8_1_h" 878 | top: "conv8_1_h" 879 | } 880 | layer { 881 | name: "conv8_2_h" 882 | type: "Convolution" 883 | bottom: "conv8_1_h" 884 | top: "conv8_2_h" 885 | param { 886 | lr_mult: 1 887 | decay_mult: 1 888 | } 889 | param { 890 | lr_mult: 2 891 | decay_mult: 0 892 | } 893 | convolution_param { 894 | num_output: 128 895 | pad: 1 896 | kernel_size: 3 897 | stride: 1 898 | weight_filler { 899 | type: "xavier" 900 | } 901 | bias_filler { 902 | type: "constant" 903 | value: 0 904 | } 905 | } 906 | } 907 | layer { 908 | name: "conv8_2_relu" 909 | type: "ReLU" 910 | bottom: "conv8_2_h" 911 | top: "conv8_2_h" 912 | } 913 | layer { 914 | name: "conv9_1_h" 915 | type: "Convolution" 916 | bottom: "conv8_2_h" 917 | top: "conv9_1_h" 918 | param { 919 | lr_mult: 1 920 | decay_mult: 1 921 | } 922 | param { 923 | lr_mult: 2 924 | decay_mult: 0 925 | } 926 | convolution_param { 927 | num_output: 64 928 | pad: 0 929 | kernel_size: 1 930 | stride: 1 931 | weight_filler { 932 | type: "xavier" 933 | } 934 | bias_filler { 935 | type: 
"constant" 936 | value: 0 937 | } 938 | } 939 | } 940 | layer { 941 | name: "conv9_1_relu" 942 | type: "ReLU" 943 | bottom: "conv9_1_h" 944 | top: "conv9_1_h" 945 | } 946 | layer { 947 | name: "conv9_2_h" 948 | type: "Convolution" 949 | bottom: "conv9_1_h" 950 | top: "conv9_2_h" 951 | param { 952 | lr_mult: 1 953 | decay_mult: 1 954 | } 955 | param { 956 | lr_mult: 2 957 | decay_mult: 0 958 | } 959 | convolution_param { 960 | num_output: 128 961 | pad: 1 962 | kernel_size: 3 963 | stride: 1 964 | weight_filler { 965 | type: "xavier" 966 | } 967 | bias_filler { 968 | type: "constant" 969 | value: 0 970 | } 971 | } 972 | } 973 | layer { 974 | name: "conv9_2_relu" 975 | type: "ReLU" 976 | bottom: "conv9_2_h" 977 | top: "conv9_2_h" 978 | } 979 | layer { 980 | name: "conv4_3_norm" 981 | type: "Normalize" 982 | bottom: "layer_256_1_bn1" 983 | top: "conv4_3_norm" 984 | norm_param { 985 | across_spatial: false 986 | scale_filler { 987 | type: "constant" 988 | value: 20 989 | } 990 | channel_shared: false 991 | } 992 | } 993 | layer { 994 | name: "conv4_3_norm_mbox_loc" 995 | type: "Convolution" 996 | bottom: "conv4_3_norm" 997 | top: "conv4_3_norm_mbox_loc" 998 | param { 999 | lr_mult: 1 1000 | decay_mult: 1 1001 | } 1002 | param { 1003 | lr_mult: 2 1004 | decay_mult: 0 1005 | } 1006 | convolution_param { 1007 | num_output: 16 1008 | pad: 1 1009 | kernel_size: 3 1010 | stride: 1 1011 | weight_filler { 1012 | type: "xavier" 1013 | } 1014 | bias_filler { 1015 | type: "constant" 1016 | value: 0 1017 | } 1018 | } 1019 | } 1020 | layer { 1021 | name: "conv4_3_norm_mbox_loc_perm" 1022 | type: "Permute" 1023 | bottom: "conv4_3_norm_mbox_loc" 1024 | top: "conv4_3_norm_mbox_loc_perm" 1025 | permute_param { 1026 | order: 0 1027 | order: 2 1028 | order: 3 1029 | order: 1 1030 | } 1031 | } 1032 | layer { 1033 | name: "conv4_3_norm_mbox_loc_flat" 1034 | type: "Flatten" 1035 | bottom: "conv4_3_norm_mbox_loc_perm" 1036 | top: "conv4_3_norm_mbox_loc_flat" 1037 | flatten_param { 1038 | 
axis: 1 1039 | } 1040 | } 1041 | layer { 1042 | name: "conv4_3_norm_mbox_conf" 1043 | type: "Convolution" 1044 | bottom: "conv4_3_norm" 1045 | top: "conv4_3_norm_mbox_conf" 1046 | param { 1047 | lr_mult: 1 1048 | decay_mult: 1 1049 | } 1050 | param { 1051 | lr_mult: 2 1052 | decay_mult: 0 1053 | } 1054 | convolution_param { 1055 | num_output: 8 # 84 1056 | pad: 1 1057 | kernel_size: 3 1058 | stride: 1 1059 | weight_filler { 1060 | type: "xavier" 1061 | } 1062 | bias_filler { 1063 | type: "constant" 1064 | value: 0 1065 | } 1066 | } 1067 | } 1068 | layer { 1069 | name: "conv4_3_norm_mbox_conf_perm" 1070 | type: "Permute" 1071 | bottom: "conv4_3_norm_mbox_conf" 1072 | top: "conv4_3_norm_mbox_conf_perm" 1073 | permute_param { 1074 | order: 0 1075 | order: 2 1076 | order: 3 1077 | order: 1 1078 | } 1079 | } 1080 | layer { 1081 | name: "conv4_3_norm_mbox_conf_flat" 1082 | type: "Flatten" 1083 | bottom: "conv4_3_norm_mbox_conf_perm" 1084 | top: "conv4_3_norm_mbox_conf_flat" 1085 | flatten_param { 1086 | axis: 1 1087 | } 1088 | } 1089 | layer { 1090 | name: "conv4_3_norm_mbox_priorbox" 1091 | type: "PriorBox" 1092 | bottom: "conv4_3_norm" 1093 | bottom: "data" 1094 | top: "conv4_3_norm_mbox_priorbox" 1095 | prior_box_param { 1096 | min_size: 30.0 1097 | max_size: 60.0 1098 | aspect_ratio: 2 1099 | flip: true 1100 | clip: false 1101 | variance: 0.1 1102 | variance: 0.1 1103 | variance: 0.2 1104 | variance: 0.2 1105 | step: 8 1106 | offset: 0.5 1107 | } 1108 | } 1109 | layer { 1110 | name: "fc7_mbox_loc" 1111 | type: "Convolution" 1112 | bottom: "fc7" 1113 | top: "fc7_mbox_loc" 1114 | param { 1115 | lr_mult: 1 1116 | decay_mult: 1 1117 | } 1118 | param { 1119 | lr_mult: 2 1120 | decay_mult: 0 1121 | } 1122 | convolution_param { 1123 | num_output: 24 1124 | pad: 1 1125 | kernel_size: 3 1126 | stride: 1 1127 | weight_filler { 1128 | type: "xavier" 1129 | } 1130 | bias_filler { 1131 | type: "constant" 1132 | value: 0 1133 | } 1134 | } 1135 | } 1136 | layer { 1137 | name: 
"fc7_mbox_loc_perm" 1138 | type: "Permute" 1139 | bottom: "fc7_mbox_loc" 1140 | top: "fc7_mbox_loc_perm" 1141 | permute_param { 1142 | order: 0 1143 | order: 2 1144 | order: 3 1145 | order: 1 1146 | } 1147 | } 1148 | layer { 1149 | name: "fc7_mbox_loc_flat" 1150 | type: "Flatten" 1151 | bottom: "fc7_mbox_loc_perm" 1152 | top: "fc7_mbox_loc_flat" 1153 | flatten_param { 1154 | axis: 1 1155 | } 1156 | } 1157 | layer { 1158 | name: "fc7_mbox_conf" 1159 | type: "Convolution" 1160 | bottom: "fc7" 1161 | top: "fc7_mbox_conf" 1162 | param { 1163 | lr_mult: 1 1164 | decay_mult: 1 1165 | } 1166 | param { 1167 | lr_mult: 2 1168 | decay_mult: 0 1169 | } 1170 | convolution_param { 1171 | num_output: 12 # 126 1172 | pad: 1 1173 | kernel_size: 3 1174 | stride: 1 1175 | weight_filler { 1176 | type: "xavier" 1177 | } 1178 | bias_filler { 1179 | type: "constant" 1180 | value: 0 1181 | } 1182 | } 1183 | } 1184 | layer { 1185 | name: "fc7_mbox_conf_perm" 1186 | type: "Permute" 1187 | bottom: "fc7_mbox_conf" 1188 | top: "fc7_mbox_conf_perm" 1189 | permute_param { 1190 | order: 0 1191 | order: 2 1192 | order: 3 1193 | order: 1 1194 | } 1195 | } 1196 | layer { 1197 | name: "fc7_mbox_conf_flat" 1198 | type: "Flatten" 1199 | bottom: "fc7_mbox_conf_perm" 1200 | top: "fc7_mbox_conf_flat" 1201 | flatten_param { 1202 | axis: 1 1203 | } 1204 | } 1205 | layer { 1206 | name: "fc7_mbox_priorbox" 1207 | type: "PriorBox" 1208 | bottom: "fc7" 1209 | bottom: "data" 1210 | top: "fc7_mbox_priorbox" 1211 | prior_box_param { 1212 | min_size: 60.0 1213 | max_size: 111.0 1214 | aspect_ratio: 2 1215 | aspect_ratio: 3 1216 | flip: true 1217 | clip: false 1218 | variance: 0.1 1219 | variance: 0.1 1220 | variance: 0.2 1221 | variance: 0.2 1222 | step: 16 1223 | offset: 0.5 1224 | } 1225 | } 1226 | layer { 1227 | name: "conv6_2_mbox_loc" 1228 | type: "Convolution" 1229 | bottom: "conv6_2_h" 1230 | top: "conv6_2_mbox_loc" 1231 | param { 1232 | lr_mult: 1 1233 | decay_mult: 1 1234 | } 1235 | param { 1236 | 
lr_mult: 2 1237 | decay_mult: 0 1238 | } 1239 | convolution_param { 1240 | num_output: 24 1241 | pad: 1 1242 | kernel_size: 3 1243 | stride: 1 1244 | weight_filler { 1245 | type: "xavier" 1246 | } 1247 | bias_filler { 1248 | type: "constant" 1249 | value: 0 1250 | } 1251 | } 1252 | } 1253 | layer { 1254 | name: "conv6_2_mbox_loc_perm" 1255 | type: "Permute" 1256 | bottom: "conv6_2_mbox_loc" 1257 | top: "conv6_2_mbox_loc_perm" 1258 | permute_param { 1259 | order: 0 1260 | order: 2 1261 | order: 3 1262 | order: 1 1263 | } 1264 | } 1265 | layer { 1266 | name: "conv6_2_mbox_loc_flat" 1267 | type: "Flatten" 1268 | bottom: "conv6_2_mbox_loc_perm" 1269 | top: "conv6_2_mbox_loc_flat" 1270 | flatten_param { 1271 | axis: 1 1272 | } 1273 | } 1274 | layer { 1275 | name: "conv6_2_mbox_conf" 1276 | type: "Convolution" 1277 | bottom: "conv6_2_h" 1278 | top: "conv6_2_mbox_conf" 1279 | param { 1280 | lr_mult: 1 1281 | decay_mult: 1 1282 | } 1283 | param { 1284 | lr_mult: 2 1285 | decay_mult: 0 1286 | } 1287 | convolution_param { 1288 | num_output: 12 # 126 1289 | pad: 1 1290 | kernel_size: 3 1291 | stride: 1 1292 | weight_filler { 1293 | type: "xavier" 1294 | } 1295 | bias_filler { 1296 | type: "constant" 1297 | value: 0 1298 | } 1299 | } 1300 | } 1301 | layer { 1302 | name: "conv6_2_mbox_conf_perm" 1303 | type: "Permute" 1304 | bottom: "conv6_2_mbox_conf" 1305 | top: "conv6_2_mbox_conf_perm" 1306 | permute_param { 1307 | order: 0 1308 | order: 2 1309 | order: 3 1310 | order: 1 1311 | } 1312 | } 1313 | layer { 1314 | name: "conv6_2_mbox_conf_flat" 1315 | type: "Flatten" 1316 | bottom: "conv6_2_mbox_conf_perm" 1317 | top: "conv6_2_mbox_conf_flat" 1318 | flatten_param { 1319 | axis: 1 1320 | } 1321 | } 1322 | layer { 1323 | name: "conv6_2_mbox_priorbox" 1324 | type: "PriorBox" 1325 | bottom: "conv6_2_h" 1326 | bottom: "data" 1327 | top: "conv6_2_mbox_priorbox" 1328 | prior_box_param { 1329 | min_size: 111.0 1330 | max_size: 162.0 1331 | aspect_ratio: 2 1332 | aspect_ratio: 3 1333 | 
flip: true 1334 | clip: false 1335 | variance: 0.1 1336 | variance: 0.1 1337 | variance: 0.2 1338 | variance: 0.2 1339 | step: 32 1340 | offset: 0.5 1341 | } 1342 | } 1343 | layer { 1344 | name: "conv7_2_mbox_loc" 1345 | type: "Convolution" 1346 | bottom: "conv7_2_h" 1347 | top: "conv7_2_mbox_loc" 1348 | param { 1349 | lr_mult: 1 1350 | decay_mult: 1 1351 | } 1352 | param { 1353 | lr_mult: 2 1354 | decay_mult: 0 1355 | } 1356 | convolution_param { 1357 | num_output: 24 1358 | pad: 1 1359 | kernel_size: 3 1360 | stride: 1 1361 | weight_filler { 1362 | type: "xavier" 1363 | } 1364 | bias_filler { 1365 | type: "constant" 1366 | value: 0 1367 | } 1368 | } 1369 | } 1370 | layer { 1371 | name: "conv7_2_mbox_loc_perm" 1372 | type: "Permute" 1373 | bottom: "conv7_2_mbox_loc" 1374 | top: "conv7_2_mbox_loc_perm" 1375 | permute_param { 1376 | order: 0 1377 | order: 2 1378 | order: 3 1379 | order: 1 1380 | } 1381 | } 1382 | layer { 1383 | name: "conv7_2_mbox_loc_flat" 1384 | type: "Flatten" 1385 | bottom: "conv7_2_mbox_loc_perm" 1386 | top: "conv7_2_mbox_loc_flat" 1387 | flatten_param { 1388 | axis: 1 1389 | } 1390 | } 1391 | layer { 1392 | name: "conv7_2_mbox_conf" 1393 | type: "Convolution" 1394 | bottom: "conv7_2_h" 1395 | top: "conv7_2_mbox_conf" 1396 | param { 1397 | lr_mult: 1 1398 | decay_mult: 1 1399 | } 1400 | param { 1401 | lr_mult: 2 1402 | decay_mult: 0 1403 | } 1404 | convolution_param { 1405 | num_output: 12 # 126 1406 | pad: 1 1407 | kernel_size: 3 1408 | stride: 1 1409 | weight_filler { 1410 | type: "xavier" 1411 | } 1412 | bias_filler { 1413 | type: "constant" 1414 | value: 0 1415 | } 1416 | } 1417 | } 1418 | layer { 1419 | name: "conv7_2_mbox_conf_perm" 1420 | type: "Permute" 1421 | bottom: "conv7_2_mbox_conf" 1422 | top: "conv7_2_mbox_conf_perm" 1423 | permute_param { 1424 | order: 0 1425 | order: 2 1426 | order: 3 1427 | order: 1 1428 | } 1429 | } 1430 | layer { 1431 | name: "conv7_2_mbox_conf_flat" 1432 | type: "Flatten" 1433 | bottom: 
"conv7_2_mbox_conf_perm" 1434 | top: "conv7_2_mbox_conf_flat" 1435 | flatten_param { 1436 | axis: 1 1437 | } 1438 | } 1439 | layer { 1440 | name: "conv7_2_mbox_priorbox" 1441 | type: "PriorBox" 1442 | bottom: "conv7_2_h" 1443 | bottom: "data" 1444 | top: "conv7_2_mbox_priorbox" 1445 | prior_box_param { 1446 | min_size: 162.0 1447 | max_size: 213.0 1448 | aspect_ratio: 2 1449 | aspect_ratio: 3 1450 | flip: true 1451 | clip: false 1452 | variance: 0.1 1453 | variance: 0.1 1454 | variance: 0.2 1455 | variance: 0.2 1456 | step: 64 1457 | offset: 0.5 1458 | } 1459 | } 1460 | layer { 1461 | name: "conv8_2_mbox_loc" 1462 | type: "Convolution" 1463 | bottom: "conv8_2_h" 1464 | top: "conv8_2_mbox_loc" 1465 | param { 1466 | lr_mult: 1 1467 | decay_mult: 1 1468 | } 1469 | param { 1470 | lr_mult: 2 1471 | decay_mult: 0 1472 | } 1473 | convolution_param { 1474 | num_output: 16 1475 | pad: 1 1476 | kernel_size: 3 1477 | stride: 1 1478 | weight_filler { 1479 | type: "xavier" 1480 | } 1481 | bias_filler { 1482 | type: "constant" 1483 | value: 0 1484 | } 1485 | } 1486 | } 1487 | layer { 1488 | name: "conv8_2_mbox_loc_perm" 1489 | type: "Permute" 1490 | bottom: "conv8_2_mbox_loc" 1491 | top: "conv8_2_mbox_loc_perm" 1492 | permute_param { 1493 | order: 0 1494 | order: 2 1495 | order: 3 1496 | order: 1 1497 | } 1498 | } 1499 | layer { 1500 | name: "conv8_2_mbox_loc_flat" 1501 | type: "Flatten" 1502 | bottom: "conv8_2_mbox_loc_perm" 1503 | top: "conv8_2_mbox_loc_flat" 1504 | flatten_param { 1505 | axis: 1 1506 | } 1507 | } 1508 | layer { 1509 | name: "conv8_2_mbox_conf" 1510 | type: "Convolution" 1511 | bottom: "conv8_2_h" 1512 | top: "conv8_2_mbox_conf" 1513 | param { 1514 | lr_mult: 1 1515 | decay_mult: 1 1516 | } 1517 | param { 1518 | lr_mult: 2 1519 | decay_mult: 0 1520 | } 1521 | convolution_param { 1522 | num_output: 8 # 84 1523 | pad: 1 1524 | kernel_size: 3 1525 | stride: 1 1526 | weight_filler { 1527 | type: "xavier" 1528 | } 1529 | bias_filler { 1530 | type: "constant" 1531 | 
value: 0 1532 | } 1533 | } 1534 | } 1535 | layer { 1536 | name: "conv8_2_mbox_conf_perm" 1537 | type: "Permute" 1538 | bottom: "conv8_2_mbox_conf" 1539 | top: "conv8_2_mbox_conf_perm" 1540 | permute_param { 1541 | order: 0 1542 | order: 2 1543 | order: 3 1544 | order: 1 1545 | } 1546 | } 1547 | layer { 1548 | name: "conv8_2_mbox_conf_flat" 1549 | type: "Flatten" 1550 | bottom: "conv8_2_mbox_conf_perm" 1551 | top: "conv8_2_mbox_conf_flat" 1552 | flatten_param { 1553 | axis: 1 1554 | } 1555 | } 1556 | layer { 1557 | name: "conv8_2_mbox_priorbox" 1558 | type: "PriorBox" 1559 | bottom: "conv8_2_h" 1560 | bottom: "data" 1561 | top: "conv8_2_mbox_priorbox" 1562 | prior_box_param { 1563 | min_size: 213.0 1564 | max_size: 264.0 1565 | aspect_ratio: 2 1566 | flip: true 1567 | clip: false 1568 | variance: 0.1 1569 | variance: 0.1 1570 | variance: 0.2 1571 | variance: 0.2 1572 | step: 100 1573 | offset: 0.5 1574 | } 1575 | } 1576 | layer { 1577 | name: "conv9_2_mbox_loc" 1578 | type: "Convolution" 1579 | bottom: "conv9_2_h" 1580 | top: "conv9_2_mbox_loc" 1581 | param { 1582 | lr_mult: 1 1583 | decay_mult: 1 1584 | } 1585 | param { 1586 | lr_mult: 2 1587 | decay_mult: 0 1588 | } 1589 | convolution_param { 1590 | num_output: 16 1591 | pad: 1 1592 | kernel_size: 3 1593 | stride: 1 1594 | weight_filler { 1595 | type: "xavier" 1596 | } 1597 | bias_filler { 1598 | type: "constant" 1599 | value: 0 1600 | } 1601 | } 1602 | } 1603 | layer { 1604 | name: "conv9_2_mbox_loc_perm" 1605 | type: "Permute" 1606 | bottom: "conv9_2_mbox_loc" 1607 | top: "conv9_2_mbox_loc_perm" 1608 | permute_param { 1609 | order: 0 1610 | order: 2 1611 | order: 3 1612 | order: 1 1613 | } 1614 | } 1615 | layer { 1616 | name: "conv9_2_mbox_loc_flat" 1617 | type: "Flatten" 1618 | bottom: "conv9_2_mbox_loc_perm" 1619 | top: "conv9_2_mbox_loc_flat" 1620 | flatten_param { 1621 | axis: 1 1622 | } 1623 | } 1624 | layer { 1625 | name: "conv9_2_mbox_conf" 1626 | type: "Convolution" 1627 | bottom: "conv9_2_h" 1628 | top: 
"conv9_2_mbox_conf" 1629 | param { 1630 | lr_mult: 1 1631 | decay_mult: 1 1632 | } 1633 | param { 1634 | lr_mult: 2 1635 | decay_mult: 0 1636 | } 1637 | convolution_param { 1638 | num_output: 8 # 84 1639 | pad: 1 1640 | kernel_size: 3 1641 | stride: 1 1642 | weight_filler { 1643 | type: "xavier" 1644 | } 1645 | bias_filler { 1646 | type: "constant" 1647 | value: 0 1648 | } 1649 | } 1650 | } 1651 | layer { 1652 | name: "conv9_2_mbox_conf_perm" 1653 | type: "Permute" 1654 | bottom: "conv9_2_mbox_conf" 1655 | top: "conv9_2_mbox_conf_perm" 1656 | permute_param { 1657 | order: 0 1658 | order: 2 1659 | order: 3 1660 | order: 1 1661 | } 1662 | } 1663 | layer { 1664 | name: "conv9_2_mbox_conf_flat" 1665 | type: "Flatten" 1666 | bottom: "conv9_2_mbox_conf_perm" 1667 | top: "conv9_2_mbox_conf_flat" 1668 | flatten_param { 1669 | axis: 1 1670 | } 1671 | } 1672 | layer { 1673 | name: "conv9_2_mbox_priorbox" 1674 | type: "PriorBox" 1675 | bottom: "conv9_2_h" 1676 | bottom: "data" 1677 | top: "conv9_2_mbox_priorbox" 1678 | prior_box_param { 1679 | min_size: 264.0 1680 | max_size: 315.0 1681 | aspect_ratio: 2 1682 | flip: true 1683 | clip: false 1684 | variance: 0.1 1685 | variance: 0.1 1686 | variance: 0.2 1687 | variance: 0.2 1688 | step: 300 1689 | offset: 0.5 1690 | } 1691 | } 1692 | layer { 1693 | name: "mbox_loc" 1694 | type: "Concat" 1695 | bottom: "conv4_3_norm_mbox_loc_flat" 1696 | bottom: "fc7_mbox_loc_flat" 1697 | bottom: "conv6_2_mbox_loc_flat" 1698 | bottom: "conv7_2_mbox_loc_flat" 1699 | bottom: "conv8_2_mbox_loc_flat" 1700 | bottom: "conv9_2_mbox_loc_flat" 1701 | top: "mbox_loc" 1702 | concat_param { 1703 | axis: 1 1704 | } 1705 | } 1706 | layer { 1707 | name: "mbox_conf" 1708 | type: "Concat" 1709 | bottom: "conv4_3_norm_mbox_conf_flat" 1710 | bottom: "fc7_mbox_conf_flat" 1711 | bottom: "conv6_2_mbox_conf_flat" 1712 | bottom: "conv7_2_mbox_conf_flat" 1713 | bottom: "conv8_2_mbox_conf_flat" 1714 | bottom: "conv9_2_mbox_conf_flat" 1715 | top: "mbox_conf" 1716 | 
concat_param { 1717 | axis: 1 1718 | } 1719 | } 1720 | layer { 1721 | name: "mbox_priorbox" 1722 | type: "Concat" 1723 | bottom: "conv4_3_norm_mbox_priorbox" 1724 | bottom: "fc7_mbox_priorbox" 1725 | bottom: "conv6_2_mbox_priorbox" 1726 | bottom: "conv7_2_mbox_priorbox" 1727 | bottom: "conv8_2_mbox_priorbox" 1728 | bottom: "conv9_2_mbox_priorbox" 1729 | top: "mbox_priorbox" 1730 | concat_param { 1731 | axis: 2 1732 | } 1733 | } 1734 | 1735 | layer { 1736 | name: "mbox_conf_reshape" 1737 | type: "Reshape" 1738 | bottom: "mbox_conf" 1739 | top: "mbox_conf_reshape" 1740 | reshape_param { 1741 | shape { 1742 | dim: 0 1743 | dim: -1 1744 | dim: 2 1745 | } 1746 | } 1747 | } 1748 | layer { 1749 | name: "mbox_conf_softmax" 1750 | type: "Softmax" 1751 | bottom: "mbox_conf_reshape" 1752 | top: "mbox_conf_softmax" 1753 | softmax_param { 1754 | axis: 2 1755 | } 1756 | } 1757 | layer { 1758 | name: "mbox_conf_flatten" 1759 | type: "Flatten" 1760 | bottom: "mbox_conf_softmax" 1761 | top: "mbox_conf_flatten" 1762 | flatten_param { 1763 | axis: 1 1764 | } 1765 | } 1766 | 1767 | layer { 1768 | name: "detection_out" 1769 | type: "DetectionOutput" 1770 | bottom: "mbox_loc" 1771 | bottom: "mbox_conf_flatten" 1772 | bottom: "mbox_priorbox" 1773 | top: "detection_out" 1774 | include { 1775 | phase: TEST 1776 | } 1777 | detection_output_param { 1778 | num_classes: 2 1779 | share_location: true 1780 | background_label_id: 0 1781 | nms_param { 1782 | nms_threshold: 0.3 1783 | top_k: 400 1784 | } 1785 | code_type: CENTER_SIZE 1786 | keep_top_k: 200 1787 | confidence_threshold: 0.01 1788 | } 1789 | } 1790 | -------------------------------------------------------------------------------- /model_SSD/res10_300x300_ssd_iter_140000_fp16.caffemodel: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Fizmath/Docker-opencv-GPU/c4b1fc4c3e4e22417bb49a8990bf62d33454f175/model_SSD/res10_300x300_ssd_iter_140000_fp16.caffemodel 
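The six detection heads in the deploy.prototxt above must agree with their PriorBox layers: each `*_mbox_loc` convolution emits 4 offsets per prior and each `*_mbox_conf` convolution emits `num_classes` (here 2) scores per prior. The commented-out values (`# 84`, `# 126`) work out to 21 scores per prior, which suggests the heads were adapted from a 21-class configuration. The Python sketch below re-derives every head's `num_output` from its `min_size`/`max_size`/`aspect_ratio`/`flip` settings. It also shows the CENTER_SIZE box decoding that `DetectionOutput` applies with the listed variances (0.1, 0.1, 0.2, 0.2).

The sketch assumes the prior-generation rule of the original Caffe SSD `PriorBox` layer: ratio 1 comes first, each ratio's inverse is added when `flip: true`, and there is one extra sqrt(min_size*max_size) box. Boxes are written as (cx, cy, w, h) tuples for readability. The helper names are hypothetical and are not part of this repo or of OpenCV.

```python
import math

# PriorBox / head settings copied from the deploy.prototxt above:
# (head, min_size, max_size, aspect_ratio list, loc num_output, conf num_output)
HEADS = [
    ("conv4_3_norm", 30.0, 60.0, [2], 16, 8),
    ("fc7", 60.0, 111.0, [2, 3], 24, 12),
    ("conv6_2", 111.0, 162.0, [2, 3], 24, 12),
    ("conv7_2", 162.0, 213.0, [2, 3], 24, 12),
    ("conv8_2", 213.0, 264.0, [2], 16, 8),
    ("conv9_2", 264.0, 315.0, [2], 16, 8),
]
NUM_CLASSES = 2                  # detection_output_param.num_classes (background + face)
VARIANCE = (0.1, 0.1, 0.2, 0.2)  # prior_box_param.variance


def expand_aspect_ratios(ratios, flip=True):
    """Ratio 1.0 first, then each new ratio and (with flip) its inverse."""
    out = [1.0]
    for ar in ratios:
        if any(abs(ar - a) < 1e-6 for a in out):
            continue  # skip duplicates, as the Caffe layer does
        out.append(float(ar))
        if flip:
            out.append(1.0 / ar)
    return out


def priors_per_location(ratios, has_max_size=True, flip=True):
    # one min_size box per aspect ratio, plus one sqrt(min*max) square box
    return len(expand_aspect_ratios(ratios, flip)) + (1 if has_max_size else 0)


def prior_box_sizes(min_size, max_size, ratios, flip=True):
    """(width, height) in pixels of every prior at one feature-map cell."""
    sizes = [(min_size, min_size)]
    s = math.sqrt(min_size * max_size)
    sizes.append((s, s))
    for ar in expand_aspect_ratios(ratios, flip)[1:]:
        sizes.append((min_size * math.sqrt(ar), min_size / math.sqrt(ar)))
    return sizes


def decode_center_size(prior, loc, variance=VARIANCE):
    """Apply predicted offsets (dx, dy, dw, dh) to a prior (cx, cy, w, h)."""
    pcx, pcy, pw, ph = prior
    dx, dy, dw, dh = loc
    cx = pcx + variance[0] * dx * pw
    cy = pcy + variance[1] * dy * ph
    w = pw * math.exp(variance[2] * dw)
    h = ph * math.exp(variance[3] * dh)
    return cx, cy, w, h


for name, mn, mx, ratios, loc_out, conf_out in HEADS:
    k = priors_per_location(ratios)
    # each prior needs 4 box offsets and NUM_CLASSES class scores
    assert loc_out == 4 * k and conf_out == NUM_CLASSES * k, name
    print(f"{name}: {k} priors/location -> loc {4 * k}, conf {NUM_CLASSES * k}")
```

With `aspect_ratio: 2` and `flip: true` the ratio list becomes [1, 2, 0.5], which gives 4 priors per cell (loc 16, conf 8). Adding `aspect_ratio: 3` extends it to [1, 2, 0.5, 3, 1/3], which gives 6 priors per cell (loc 24, conf 12). This matches the `mbox_conf_reshape` to `[N, -1, 2]` followed by a softmax over axis 2.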
-------------------------------------------------------------------------------- /model_SURE/EDSR_x3.pb: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Fizmath/Docker-opencv-GPU/c4b1fc4c3e4e22417bb49a8990bf62d33454f175/model_SURE/EDSR_x3.pb -------------------------------------------------------------------------------- /model_YOLOv4/coco.names: -------------------------------------------------------------------------------- 1 | person 2 | bicycle 3 | car 4 | motorbike 5 | aeroplane 6 | bus 7 | train 8 | truck 9 | boat 10 | traffic light 11 | fire hydrant 12 | stop sign 13 | parking meter 14 | bench 15 | bird 16 | cat 17 | dog 18 | horse 19 | sheep 20 | cow 21 | elephant 22 | bear 23 | zebra 24 | giraffe 25 | backpack 26 | umbrella 27 | handbag 28 | tie 29 | suitcase 30 | frisbee 31 | skis 32 | snowboard 33 | sports ball 34 | kite 35 | baseball bat 36 | baseball glove 37 | skateboard 38 | surfboard 39 | tennis racket 40 | bottle 41 | wine glass 42 | cup 43 | fork 44 | knife 45 | spoon 46 | bowl 47 | banana 48 | apple 49 | sandwich 50 | orange 51 | broccoli 52 | carrot 53 | hot dog 54 | pizza 55 | donut 56 | cake 57 | chair 58 | sofa 59 | pottedplant 60 | bed 61 | diningtable 62 | toilet 63 | tvmonitor 64 | laptop 65 | mouse 66 | remote 67 | keyboard 68 | cell phone 69 | microwave 70 | oven 71 | toaster 72 | sink 73 | refrigerator 74 | book 75 | clock 76 | vase 77 | scissors 78 | teddy bear 79 | hair drier 80 | toothbrush 81 | -------------------------------------------------------------------------------- /model_YOLOv4/yolov4.cfg: -------------------------------------------------------------------------------- 1 | [net] 2 | batch=64 3 | subdivisions=8 4 | # Training 5 | #width=512 6 | #height=512 7 | width=416 8 | height=416 9 | channels=3 10 | momentum=0.949 11 | decay=0.0005 12 | angle=0 13 | saturation = 1.5 14 | exposure = 1.5 15 | hue=.1 16 | 17 | learning_rate=0.0013 18 | burn_in=1000 19 | 
max_batches = 500500 20 | policy=steps 21 | steps=400000,450000 22 | scales=.1,.1 23 | 24 | #cutmix=1 25 | mosaic=1 26 | 27 | #:104x104 54:52x52 85:26x26 104:13x13 for 416 28 | 29 | [convolutional] 30 | batch_normalize=1 31 | filters=32 32 | size=3 33 | stride=1 34 | pad=1 35 | activation=mish 36 | 37 | # Downsample 38 | 39 | [convolutional] 40 | batch_normalize=1 41 | filters=64 42 | size=3 43 | stride=2 44 | pad=1 45 | activation=mish 46 | 47 | [convolutional] 48 | batch_normalize=1 49 | filters=64 50 | size=1 51 | stride=1 52 | pad=1 53 | activation=mish 54 | 55 | [route] 56 | layers = -2 57 | 58 | [convolutional] 59 | batch_normalize=1 60 | filters=64 61 | size=1 62 | stride=1 63 | pad=1 64 | activation=mish 65 | 66 | [convolutional] 67 | batch_normalize=1 68 | filters=32 69 | size=1 70 | stride=1 71 | pad=1 72 | activation=mish 73 | 74 | [convolutional] 75 | batch_normalize=1 76 | filters=64 77 | size=3 78 | stride=1 79 | pad=1 80 | activation=mish 81 | 82 | [shortcut] 83 | from=-3 84 | activation=linear 85 | 86 | [convolutional] 87 | batch_normalize=1 88 | filters=64 89 | size=1 90 | stride=1 91 | pad=1 92 | activation=mish 93 | 94 | [route] 95 | layers = -1,-7 96 | 97 | [convolutional] 98 | batch_normalize=1 99 | filters=64 100 | size=1 101 | stride=1 102 | pad=1 103 | activation=mish 104 | 105 | # Downsample 106 | 107 | [convolutional] 108 | batch_normalize=1 109 | filters=128 110 | size=3 111 | stride=2 112 | pad=1 113 | activation=mish 114 | 115 | [convolutional] 116 | batch_normalize=1 117 | filters=64 118 | size=1 119 | stride=1 120 | pad=1 121 | activation=mish 122 | 123 | [route] 124 | layers = -2 125 | 126 | [convolutional] 127 | batch_normalize=1 128 | filters=64 129 | size=1 130 | stride=1 131 | pad=1 132 | activation=mish 133 | 134 | [convolutional] 135 | batch_normalize=1 136 | filters=64 137 | size=1 138 | stride=1 139 | pad=1 140 | activation=mish 141 | 142 | [convolutional] 143 | batch_normalize=1 144 | filters=64 145 | size=3 146 | stride=1 
147 | pad=1 148 | activation=mish 149 | 150 | [shortcut] 151 | from=-3 152 | activation=linear 153 | 154 | [convolutional] 155 | batch_normalize=1 156 | filters=64 157 | size=1 158 | stride=1 159 | pad=1 160 | activation=mish 161 | 162 | [convolutional] 163 | batch_normalize=1 164 | filters=64 165 | size=3 166 | stride=1 167 | pad=1 168 | activation=mish 169 | 170 | [shortcut] 171 | from=-3 172 | activation=linear 173 | 174 | [convolutional] 175 | batch_normalize=1 176 | filters=64 177 | size=1 178 | stride=1 179 | pad=1 180 | activation=mish 181 | 182 | [route] 183 | layers = -1,-10 184 | 185 | [convolutional] 186 | batch_normalize=1 187 | filters=128 188 | size=1 189 | stride=1 190 | pad=1 191 | activation=mish 192 | 193 | # Downsample 194 | 195 | [convolutional] 196 | batch_normalize=1 197 | filters=256 198 | size=3 199 | stride=2 200 | pad=1 201 | activation=mish 202 | 203 | [convolutional] 204 | batch_normalize=1 205 | filters=128 206 | size=1 207 | stride=1 208 | pad=1 209 | activation=mish 210 | 211 | [route] 212 | layers = -2 213 | 214 | [convolutional] 215 | batch_normalize=1 216 | filters=128 217 | size=1 218 | stride=1 219 | pad=1 220 | activation=mish 221 | 222 | [convolutional] 223 | batch_normalize=1 224 | filters=128 225 | size=1 226 | stride=1 227 | pad=1 228 | activation=mish 229 | 230 | [convolutional] 231 | batch_normalize=1 232 | filters=128 233 | size=3 234 | stride=1 235 | pad=1 236 | activation=mish 237 | 238 | [shortcut] 239 | from=-3 240 | activation=linear 241 | 242 | [convolutional] 243 | batch_normalize=1 244 | filters=128 245 | size=1 246 | stride=1 247 | pad=1 248 | activation=mish 249 | 250 | [convolutional] 251 | batch_normalize=1 252 | filters=128 253 | size=3 254 | stride=1 255 | pad=1 256 | activation=mish 257 | 258 | [shortcut] 259 | from=-3 260 | activation=linear 261 | 262 | [convolutional] 263 | batch_normalize=1 264 | filters=128 265 | size=1 266 | stride=1 267 | pad=1 268 | activation=mish 269 | 270 | [convolutional] 271 | 
batch_normalize=1 272 | filters=128 273 | size=3 274 | stride=1 275 | pad=1 276 | activation=mish 277 | 278 | [shortcut] 279 | from=-3 280 | activation=linear 281 | 282 | [convolutional] 283 | batch_normalize=1 284 | filters=128 285 | size=1 286 | stride=1 287 | pad=1 288 | activation=mish 289 | 290 | [convolutional] 291 | batch_normalize=1 292 | filters=128 293 | size=3 294 | stride=1 295 | pad=1 296 | activation=mish 297 | 298 | [shortcut] 299 | from=-3 300 | activation=linear 301 | 302 | 303 | [convolutional] 304 | batch_normalize=1 305 | filters=128 306 | size=1 307 | stride=1 308 | pad=1 309 | activation=mish 310 | 311 | [convolutional] 312 | batch_normalize=1 313 | filters=128 314 | size=3 315 | stride=1 316 | pad=1 317 | activation=mish 318 | 319 | [shortcut] 320 | from=-3 321 | activation=linear 322 | 323 | [convolutional] 324 | batch_normalize=1 325 | filters=128 326 | size=1 327 | stride=1 328 | pad=1 329 | activation=mish 330 | 331 | [convolutional] 332 | batch_normalize=1 333 | filters=128 334 | size=3 335 | stride=1 336 | pad=1 337 | activation=mish 338 | 339 | [shortcut] 340 | from=-3 341 | activation=linear 342 | 343 | [convolutional] 344 | batch_normalize=1 345 | filters=128 346 | size=1 347 | stride=1 348 | pad=1 349 | activation=mish 350 | 351 | [convolutional] 352 | batch_normalize=1 353 | filters=128 354 | size=3 355 | stride=1 356 | pad=1 357 | activation=mish 358 | 359 | [shortcut] 360 | from=-3 361 | activation=linear 362 | 363 | [convolutional] 364 | batch_normalize=1 365 | filters=128 366 | size=1 367 | stride=1 368 | pad=1 369 | activation=mish 370 | 371 | [convolutional] 372 | batch_normalize=1 373 | filters=128 374 | size=3 375 | stride=1 376 | pad=1 377 | activation=mish 378 | 379 | [shortcut] 380 | from=-3 381 | activation=linear 382 | 383 | [convolutional] 384 | batch_normalize=1 385 | filters=128 386 | size=1 387 | stride=1 388 | pad=1 389 | activation=mish 390 | 391 | [route] 392 | layers = -1,-28 393 | 394 | [convolutional] 395 | 
batch_normalize=1 396 | filters=256 397 | size=1 398 | stride=1 399 | pad=1 400 | activation=mish 401 | 402 | # Downsample 403 | 404 | [convolutional] 405 | batch_normalize=1 406 | filters=512 407 | size=3 408 | stride=2 409 | pad=1 410 | activation=mish 411 | 412 | [convolutional] 413 | batch_normalize=1 414 | filters=256 415 | size=1 416 | stride=1 417 | pad=1 418 | activation=mish 419 | 420 | [route] 421 | layers = -2 422 | 423 | [convolutional] 424 | batch_normalize=1 425 | filters=256 426 | size=1 427 | stride=1 428 | pad=1 429 | activation=mish 430 | 431 | [convolutional] 432 | batch_normalize=1 433 | filters=256 434 | size=1 435 | stride=1 436 | pad=1 437 | activation=mish 438 | 439 | [convolutional] 440 | batch_normalize=1 441 | filters=256 442 | size=3 443 | stride=1 444 | pad=1 445 | activation=mish 446 | 447 | [shortcut] 448 | from=-3 449 | activation=linear 450 | 451 | 452 | [convolutional] 453 | batch_normalize=1 454 | filters=256 455 | size=1 456 | stride=1 457 | pad=1 458 | activation=mish 459 | 460 | [convolutional] 461 | batch_normalize=1 462 | filters=256 463 | size=3 464 | stride=1 465 | pad=1 466 | activation=mish 467 | 468 | [shortcut] 469 | from=-3 470 | activation=linear 471 | 472 | 473 | [convolutional] 474 | batch_normalize=1 475 | filters=256 476 | size=1 477 | stride=1 478 | pad=1 479 | activation=mish 480 | 481 | [convolutional] 482 | batch_normalize=1 483 | filters=256 484 | size=3 485 | stride=1 486 | pad=1 487 | activation=mish 488 | 489 | [shortcut] 490 | from=-3 491 | activation=linear 492 | 493 | 494 | [convolutional] 495 | batch_normalize=1 496 | filters=256 497 | size=1 498 | stride=1 499 | pad=1 500 | activation=mish 501 | 502 | [convolutional] 503 | batch_normalize=1 504 | filters=256 505 | size=3 506 | stride=1 507 | pad=1 508 | activation=mish 509 | 510 | [shortcut] 511 | from=-3 512 | activation=linear 513 | 514 | 515 | [convolutional] 516 | batch_normalize=1 517 | filters=256 518 | size=1 519 | stride=1 520 | pad=1 521 | 
activation=mish 522 | 523 | [convolutional] 524 | batch_normalize=1 525 | filters=256 526 | size=3 527 | stride=1 528 | pad=1 529 | activation=mish 530 | 531 | [shortcut] 532 | from=-3 533 | activation=linear 534 | 535 | 536 | [convolutional] 537 | batch_normalize=1 538 | filters=256 539 | size=1 540 | stride=1 541 | pad=1 542 | activation=mish 543 | 544 | [convolutional] 545 | batch_normalize=1 546 | filters=256 547 | size=3 548 | stride=1 549 | pad=1 550 | activation=mish 551 | 552 | [shortcut] 553 | from=-3 554 | activation=linear 555 | 556 | 557 | [convolutional] 558 | batch_normalize=1 559 | filters=256 560 | size=1 561 | stride=1 562 | pad=1 563 | activation=mish 564 | 565 | [convolutional] 566 | batch_normalize=1 567 | filters=256 568 | size=3 569 | stride=1 570 | pad=1 571 | activation=mish 572 | 573 | [shortcut] 574 | from=-3 575 | activation=linear 576 | 577 | [convolutional] 578 | batch_normalize=1 579 | filters=256 580 | size=1 581 | stride=1 582 | pad=1 583 | activation=mish 584 | 585 | [convolutional] 586 | batch_normalize=1 587 | filters=256 588 | size=3 589 | stride=1 590 | pad=1 591 | activation=mish 592 | 593 | [shortcut] 594 | from=-3 595 | activation=linear 596 | 597 | [convolutional] 598 | batch_normalize=1 599 | filters=256 600 | size=1 601 | stride=1 602 | pad=1 603 | activation=mish 604 | 605 | [route] 606 | layers = -1,-28 607 | 608 | [convolutional] 609 | batch_normalize=1 610 | filters=512 611 | size=1 612 | stride=1 613 | pad=1 614 | activation=mish 615 | 616 | # Downsample 617 | 618 | [convolutional] 619 | batch_normalize=1 620 | filters=1024 621 | size=3 622 | stride=2 623 | pad=1 624 | activation=mish 625 | 626 | [convolutional] 627 | batch_normalize=1 628 | filters=512 629 | size=1 630 | stride=1 631 | pad=1 632 | activation=mish 633 | 634 | [route] 635 | layers = -2 636 | 637 | [convolutional] 638 | batch_normalize=1 639 | filters=512 640 | size=1 641 | stride=1 642 | pad=1 643 | activation=mish 644 | 645 | [convolutional] 646 | 
batch_normalize=1 647 | filters=512 648 | size=1 649 | stride=1 650 | pad=1 651 | activation=mish 652 | 653 | [convolutional] 654 | batch_normalize=1 655 | filters=512 656 | size=3 657 | stride=1 658 | pad=1 659 | activation=mish 660 | 661 | [shortcut] 662 | from=-3 663 | activation=linear 664 | 665 | [convolutional] 666 | batch_normalize=1 667 | filters=512 668 | size=1 669 | stride=1 670 | pad=1 671 | activation=mish 672 | 673 | [convolutional] 674 | batch_normalize=1 675 | filters=512 676 | size=3 677 | stride=1 678 | pad=1 679 | activation=mish 680 | 681 | [shortcut] 682 | from=-3 683 | activation=linear 684 | 685 | [convolutional] 686 | batch_normalize=1 687 | filters=512 688 | size=1 689 | stride=1 690 | pad=1 691 | activation=mish 692 | 693 | [convolutional] 694 | batch_normalize=1 695 | filters=512 696 | size=3 697 | stride=1 698 | pad=1 699 | activation=mish 700 | 701 | [shortcut] 702 | from=-3 703 | activation=linear 704 | 705 | [convolutional] 706 | batch_normalize=1 707 | filters=512 708 | size=1 709 | stride=1 710 | pad=1 711 | activation=mish 712 | 713 | [convolutional] 714 | batch_normalize=1 715 | filters=512 716 | size=3 717 | stride=1 718 | pad=1 719 | activation=mish 720 | 721 | [shortcut] 722 | from=-3 723 | activation=linear 724 | 725 | [convolutional] 726 | batch_normalize=1 727 | filters=512 728 | size=1 729 | stride=1 730 | pad=1 731 | activation=mish 732 | 733 | [route] 734 | layers = -1,-16 735 | 736 | [convolutional] 737 | batch_normalize=1 738 | filters=1024 739 | size=1 740 | stride=1 741 | pad=1 742 | activation=mish 743 | 744 | ########################## 745 | 746 | [convolutional] 747 | batch_normalize=1 748 | filters=512 749 | size=1 750 | stride=1 751 | pad=1 752 | activation=leaky 753 | 754 | [convolutional] 755 | batch_normalize=1 756 | size=3 757 | stride=1 758 | pad=1 759 | filters=1024 760 | activation=leaky 761 | 762 | [convolutional] 763 | batch_normalize=1 764 | filters=512 765 | size=1 766 | stride=1 767 | pad=1 768 | 
activation=leaky 769 | 770 | ### SPP ### 771 | [maxpool] 772 | stride=1 773 | size=5 774 | 775 | [route] 776 | layers=-2 777 | 778 | [maxpool] 779 | stride=1 780 | size=9 781 | 782 | [route] 783 | layers=-4 784 | 785 | [maxpool] 786 | stride=1 787 | size=13 788 | 789 | [route] 790 | layers=-1,-3,-5,-6 791 | ### End SPP ### 792 | 793 | [convolutional] 794 | batch_normalize=1 795 | filters=512 796 | size=1 797 | stride=1 798 | pad=1 799 | activation=leaky 800 | 801 | [convolutional] 802 | batch_normalize=1 803 | size=3 804 | stride=1 805 | pad=1 806 | filters=1024 807 | activation=leaky 808 | 809 | [convolutional] 810 | batch_normalize=1 811 | filters=512 812 | size=1 813 | stride=1 814 | pad=1 815 | activation=leaky 816 | 817 | [convolutional] 818 | batch_normalize=1 819 | filters=256 820 | size=1 821 | stride=1 822 | pad=1 823 | activation=leaky 824 | 825 | [upsample] 826 | stride=2 827 | 828 | [route] 829 | layers = 85 830 | 831 | [convolutional] 832 | batch_normalize=1 833 | filters=256 834 | size=1 835 | stride=1 836 | pad=1 837 | activation=leaky 838 | 839 | [route] 840 | layers = -1, -3 841 | 842 | [convolutional] 843 | batch_normalize=1 844 | filters=256 845 | size=1 846 | stride=1 847 | pad=1 848 | activation=leaky 849 | 850 | [convolutional] 851 | batch_normalize=1 852 | size=3 853 | stride=1 854 | pad=1 855 | filters=512 856 | activation=leaky 857 | 858 | [convolutional] 859 | batch_normalize=1 860 | filters=256 861 | size=1 862 | stride=1 863 | pad=1 864 | activation=leaky 865 | 866 | [convolutional] 867 | batch_normalize=1 868 | size=3 869 | stride=1 870 | pad=1 871 | filters=512 872 | activation=leaky 873 | 874 | [convolutional] 875 | batch_normalize=1 876 | filters=256 877 | size=1 878 | stride=1 879 | pad=1 880 | activation=leaky 881 | 882 | [convolutional] 883 | batch_normalize=1 884 | filters=128 885 | size=1 886 | stride=1 887 | pad=1 888 | activation=leaky 889 | 890 | [upsample] 891 | stride=2 892 | 893 | [route] 894 | layers = 54 895 | 896 | 
[convolutional] 897 | batch_normalize=1 898 | filters=128 899 | size=1 900 | stride=1 901 | pad=1 902 | activation=leaky 903 | 904 | [route] 905 | layers = -1, -3 906 | 907 | [convolutional] 908 | batch_normalize=1 909 | filters=128 910 | size=1 911 | stride=1 912 | pad=1 913 | activation=leaky 914 | 915 | [convolutional] 916 | batch_normalize=1 917 | size=3 918 | stride=1 919 | pad=1 920 | filters=256 921 | activation=leaky 922 | 923 | [convolutional] 924 | batch_normalize=1 925 | filters=128 926 | size=1 927 | stride=1 928 | pad=1 929 | activation=leaky 930 | 931 | [convolutional] 932 | batch_normalize=1 933 | size=3 934 | stride=1 935 | pad=1 936 | filters=256 937 | activation=leaky 938 | 939 | [convolutional] 940 | batch_normalize=1 941 | filters=128 942 | size=1 943 | stride=1 944 | pad=1 945 | activation=leaky 946 | 947 | ########################## 948 | 949 | [convolutional] 950 | batch_normalize=1 951 | size=3 952 | stride=1 953 | pad=1 954 | filters=256 955 | activation=leaky 956 | 957 | [convolutional] 958 | size=1 959 | stride=1 960 | pad=1 961 | filters=255 962 | activation=linear 963 | 964 | 965 | [yolo] 966 | mask = 0,1,2 967 | anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401 968 | classes=80 969 | num=9 970 | jitter=.3 971 | ignore_thresh = .7 972 | truth_thresh = 1 973 | scale_x_y = 1.2 974 | iou_thresh=0.213 975 | cls_normalizer=1.0 976 | iou_normalizer=0.07 977 | iou_loss=ciou 978 | nms_kind=greedynms 979 | beta_nms=0.6 980 | max_delta=5 981 | 982 | 983 | [route] 984 | layers = -4 985 | 986 | [convolutional] 987 | batch_normalize=1 988 | size=3 989 | stride=2 990 | pad=1 991 | filters=256 992 | activation=leaky 993 | 994 | [route] 995 | layers = -1, -16 996 | 997 | [convolutional] 998 | batch_normalize=1 999 | filters=256 1000 | size=1 1001 | stride=1 1002 | pad=1 1003 | activation=leaky 1004 | 1005 | [convolutional] 1006 | batch_normalize=1 1007 | size=3 1008 | stride=1 1009 | pad=1 1010 | filters=512 1011 | 
activation=leaky 1012 | 1013 | [convolutional] 1014 | batch_normalize=1 1015 | filters=256 1016 | size=1 1017 | stride=1 1018 | pad=1 1019 | activation=leaky 1020 | 1021 | [convolutional] 1022 | batch_normalize=1 1023 | size=3 1024 | stride=1 1025 | pad=1 1026 | filters=512 1027 | activation=leaky 1028 | 1029 | [convolutional] 1030 | batch_normalize=1 1031 | filters=256 1032 | size=1 1033 | stride=1 1034 | pad=1 1035 | activation=leaky 1036 | 1037 | [convolutional] 1038 | batch_normalize=1 1039 | size=3 1040 | stride=1 1041 | pad=1 1042 | filters=512 1043 | activation=leaky 1044 | 1045 | [convolutional] 1046 | size=1 1047 | stride=1 1048 | pad=1 1049 | filters=255 1050 | activation=linear 1051 | 1052 | 1053 | [yolo] 1054 | mask = 3,4,5 1055 | anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401 1056 | classes=80 1057 | num=9 1058 | jitter=.3 1059 | ignore_thresh = .7 1060 | truth_thresh = 1 1061 | scale_x_y = 1.1 1062 | iou_thresh=0.213 1063 | cls_normalizer=1.0 1064 | iou_normalizer=0.07 1065 | iou_loss=ciou 1066 | nms_kind=greedynms 1067 | beta_nms=0.6 1068 | max_delta=5 1069 | 1070 | 1071 | [route] 1072 | layers = -4 1073 | 1074 | [convolutional] 1075 | batch_normalize=1 1076 | size=3 1077 | stride=2 1078 | pad=1 1079 | filters=512 1080 | activation=leaky 1081 | 1082 | [route] 1083 | layers = -1, -37 1084 | 1085 | [convolutional] 1086 | batch_normalize=1 1087 | filters=512 1088 | size=1 1089 | stride=1 1090 | pad=1 1091 | activation=leaky 1092 | 1093 | [convolutional] 1094 | batch_normalize=1 1095 | size=3 1096 | stride=1 1097 | pad=1 1098 | filters=1024 1099 | activation=leaky 1100 | 1101 | [convolutional] 1102 | batch_normalize=1 1103 | filters=512 1104 | size=1 1105 | stride=1 1106 | pad=1 1107 | activation=leaky 1108 | 1109 | [convolutional] 1110 | batch_normalize=1 1111 | size=3 1112 | stride=1 1113 | pad=1 1114 | filters=1024 1115 | activation=leaky 1116 | 1117 | [convolutional] 1118 | batch_normalize=1 1119 | filters=512 
1120 | size=1 1121 | stride=1 1122 | pad=1 1123 | activation=leaky 1124 | 1125 | [convolutional] 1126 | batch_normalize=1 1127 | size=3 1128 | stride=1 1129 | pad=1 1130 | filters=1024 1131 | activation=leaky 1132 | 1133 | [convolutional] 1134 | size=1 1135 | stride=1 1136 | pad=1 1137 | filters=255 1138 | activation=linear 1139 | 1140 | 1141 | [yolo] 1142 | mask = 6,7,8 1143 | anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401 1144 | classes=80 1145 | num=9 1146 | jitter=.3 1147 | ignore_thresh = .7 1148 | truth_thresh = 1 1149 | random=1 1150 | scale_x_y = 1.05 1151 | iou_thresh=0.213 1152 | cls_normalizer=1.0 1153 | iou_normalizer=0.07 1154 | iou_loss=ciou 1155 | nms_kind=greedynms 1156 | beta_nms=0.6 1157 | max_delta=5 1158 | -------------------------------------------------------------------------------- /super_resolution.py: -------------------------------------------------------------------------------- 1 | """ 2 | OpenCV DNN Super Resolution with GPU/CPU 3 | https://github.com/opencv/opencv_contrib/tree/master/modules/dnn_superres 4 | 5 | """ 6 | import cv2 7 | from cv2 import dnn_superres 8 | 9 | image = cv2.imread('./golden_axe.png') 10 | if image is None: 11 |     raise FileNotFoundError("could not read ./golden_axe.png") 12 | path = "./model_SURE/EDSR_x3.pb" 13 | 14 | net = dnn_superres.DnnSuperResImpl_create() 15 | net.readModel(path) 16 | # If you run out of GPU memory, comment out the following two lines to run inference on the CPU instead 17 | net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA) 18 | net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA) 19 | # 20 | # Select the EDSR model and its upscaling factor; the factor must match the loaded file (EDSR_x3.pb -> 3) 21 | net.setModel("edsr", 3) 22 | result = net.upsample(image) 23 | cv2.imwrite("./SURE_golden_axe.png", result) -------------------------------------------------------------------------------- /twistellar.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/Fizmath/Docker-opencv-GPU/c4b1fc4c3e4e22417bb49a8990bf62d33454f175/twistellar.png
--------------------------------------------------------------------------------
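In the yolov4.cfg listed above, `[route]` and `[shortcut]` sections point at other layers by index (for example `layers = -1,-28`, `layers = -1,-3,-5,-6`, `from=-3`). The following is a minimal Python sketch, not part of this repository, that resolves those references and counts output channels. The helper names (`parse_cfg`, `output_channels`, `csp_route_offset`) are hypothetical. It assumes Darknet's usual conventions: layers are numbered from 0 starting at the first section after `[net]`, negative references are relative to the current layer and non-negative ones are absolute, and a multi-input `[route]` concatenates along the channel axis. Only the section types used here are handled.

```python
"""Sketch: resolve Darknet [route]/[shortcut] references and count channels.

Assumptions (illustrative, not taken from the repository): layers are indexed
from 0 starting at the first section after [net]; negative references are
relative to the current layer, non-negative ones are absolute; a [route]
concatenates its inputs along the channel axis.
"""
from typing import Dict, List, Tuple

Section = Tuple[str, Dict[str, str]]


def parse_cfg(text: str) -> List[Section]:
    """Split cfg text into (section_name, options) pairs, dropping comments."""
    sections: List[Section] = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            sections.append((line[1:-1], {}))
        else:
            key, value = line.split("=", 1)
            sections[-1][1][key.strip()] = value.strip()
    return sections


def resolve(index: int, ref: int) -> int:
    """Map a layer reference to an absolute index (negative = relative)."""
    return index + ref if ref < 0 else ref


def output_channels(sections: List[Section], in_channels: int = 3) -> List[int]:
    """Return the output channel count of every layer, in order."""
    channels: List[int] = []
    for i, (name, opts) in enumerate(sections):
        prev = channels[i - 1] if i > 0 else in_channels
        if name == "convolutional":
            channels.append(int(opts["filters"]))
        elif name == "route":
            refs = [int(r) for r in opts["layers"].split(",")]
            channels.append(sum(channels[resolve(i, r)] for r in refs))
        elif name in ("shortcut", "maxpool", "upsample"):
            channels.append(prev)  # element-wise add and pooling keep the depth
        else:
            raise ValueError(f"unhandled section [{name}]")
    return channels


def csp_route_offset(n_blocks: int) -> int:
    """Relative index of a CSP stage's first 1x1 conv ("part A") as seen from
    the stage's closing [route]: part A, [route] -2, part B, three layers per
    residual block, the transition conv, then the [route] itself."""
    return -(4 + 3 * n_blocks)


# The SPP block from yolov4.cfg, preceded by the 512-filter conv that feeds it.
SPP_CFG = """
[convolutional]
filters=512
size=1
### SPP ###
[maxpool]
stride=1
size=5
[route]
layers=-2
[maxpool]
stride=1
size=9
[route]
layers=-4
[maxpool]
stride=1
size=13
[route]
layers=-1,-3,-5,-6
"""

if __name__ == "__main__":
    print(output_channels(parse_cfg(SPP_CFG)))  # [512, 512, 512, 512, 512, 512, 2048]
    print(csp_route_offset(8), csp_route_offset(4))  # -28 -16
```

Under these assumptions the offsets `-28` (8 residual blocks) and `-16` (4 residual blocks) both reach back to the stage's first 1x1 convolution, so each CSP `[route]` concatenates two equal halves (256 + 256 and 512 + 512), and the SPP `[route]` stacks four 512-channel maps into 2048 channels.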
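The yolov4.cfg above marks each stride-2, 3x3 convolution with `# Downsample`, uses stride-1 `[maxpool]` layers of size 5, 9 and 13 in the SPP block, and uses `[upsample] stride=2` in the neck. This short Python sketch shows the resulting feature-map sizes. The formulas reflect Darknet's conventions as I understand them and are assumptions: with `pad=1` a convolution is padded by `size // 2` pixels, a `[maxpool]` defaults to `size - 1` pixels of total padding, and `[upsample]` multiplies the size by its stride. The 416x416 input is illustrative only, since the `[net]` section is not shown here.

```python
"""Sketch: spatial-size arithmetic for yolov4.cfg layers (Darknet conventions assumed)."""
from typing import List


def conv_out(n: int, size: int, stride: int, pad: int = 1) -> int:
    # pad=1 in the cfg means "pad by size // 2 on each side"
    padding = size // 2 if pad else 0
    return (n + 2 * padding - size) // stride + 1


def maxpool_out(n: int, size: int, stride: int) -> int:
    padding = size - 1  # total padding, split across both sides
    return (n + padding - size) // stride + 1


def upsample_out(n: int, stride: int) -> int:
    return n * stride


def backbone_sizes(n: int, downsamples: int = 5) -> List[int]:
    """Feature-map size after each 3x3, stride-2 '# Downsample' convolution."""
    sizes = []
    for _ in range(downsamples):
        n = conv_out(n, size=3, stride=2)
        sizes.append(n)
    return sizes


if __name__ == "__main__":
    print(backbone_sizes(416))  # [208, 104, 52, 26, 13]
    # SPP: stride-1 pools keep 13x13, so their outputs can be concatenated by [route]
    print([maxpool_out(13, k, 1) for k in (5, 9, 13)])  # [13, 13, 13]
    # The neck upsamples back to the stride-16 and stride-8 grids
    print(upsample_out(13, 2), upsample_out(26, 2))  # 26 52
```

The design point this illustrates is that stride-1 pooling with `size - 1` padding leaves the grid unchanged, which is what allows the SPP `[route]` to concatenate maps pooled with very different window sizes.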
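The yolov4.cfg above uses three activations: `mish` in the CSP backbone, `leaky` in the SPP/neck and head convolutions, and `linear` on every `[shortcut]` and on the 1x1 convolution that feeds each `[yolo]` layer. A small Python sketch of the three functions follows. The 0.1 negative slope for `leaky` is Darknet's value as far as I know; treat it as an assumption.

```python
"""Sketch: the activations named in yolov4.cfg (leaky slope 0.1 assumed)."""
import math


def softplus(x: float) -> float:
    # ln(1 + e^x), rewritten so that large |x| does not overflow math.exp
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x: float) -> float:
    # mish(x) = x * tanh(softplus(x)); smooth, and non-monotonic for negative x
    return x * math.tanh(softplus(x))


def leaky(x: float, slope: float = 0.1) -> float:
    # identity for positive inputs, small linear slope for negative ones
    return x if x > 0 else slope * x


def linear(x: float) -> float:
    # no-op activation used on [shortcut] sums and before each [yolo] layer
    return x


if __name__ == "__main__":
    for v in (-2.0, -0.5, 0.0, 1.0):
        print(f"{v:5.1f}  mish={mish(v):+.4f}  leaky={leaky(v):+.4f}")
```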
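Each `[yolo]` section in the cfg above lists all nine anchor pairs but uses `mask` to select three of them (`0,1,2`, `3,4,5`, `6,7,8`), and the 1x1 convolution before it has `filters=255`, which equals (80 classes + 4 box coordinates + 1 objectness score) x 3 anchors. The Python sketch below shows that arithmetic and decodes one raw prediction with `scale_x_y`. The helper names are hypothetical, and the decode formula is my reading of Darknet's yolo layer (logistic-activated `tx`, `ty` rescaled by `scale_x_y` and shifted by `-0.5 * (scale_x_y - 1)`; widths and heights relative to the network input size). It is an assumption, not this repository's code, and the 416x416 input is illustrative.

```python
"""Sketch: YOLO head arithmetic for yolov4.cfg (decode formula assumed)."""
import math
from typing import List, Sequence, Tuple

# The anchors line shared by all three [yolo] sections above.
ANCHORS = [12, 16, 19, 36, 40, 28, 36, 75, 76, 55,
           72, 146, 142, 110, 192, 243, 459, 401]


def anchor_pairs(flat: Sequence[int]) -> List[Tuple[int, int]]:
    """Group the flat anchors list into (width, height) pairs."""
    return [(flat[i], flat[i + 1]) for i in range(0, len(flat), 2)]


def head_anchors(mask: Sequence[int], flat: Sequence[int] = ANCHORS) -> List[Tuple[int, int]]:
    """The anchors a [yolo] layer actually uses, selected by its mask."""
    pairs = anchor_pairs(flat)
    return [pairs[m] for m in mask]


def head_filters(classes: int, mask: Sequence[int]) -> int:
    # per anchor: 4 box coordinates + 1 objectness + one score per class
    return (classes + 5) * len(mask)


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def decode_box(t, col, row, grid_w, grid_h, anchor, net_w, net_h, scale_x_y):
    """Turn raw outputs (tx, ty, tw, th) for one grid cell into a box
    (center x, center y, width, height) normalized to [0, 1]."""
    tx, ty, tw, th = t
    offset = -0.5 * (scale_x_y - 1)  # keeps the widened range centered on the cell
    bx = (col + sigmoid(tx) * scale_x_y + offset) / grid_w
    by = (row + sigmoid(ty) * scale_x_y + offset) / grid_h
    bw = math.exp(tw) * anchor[0] / net_w
    bh = math.exp(th) * anchor[1] / net_h
    return bx, by, bw, bh


if __name__ == "__main__":
    print(head_filters(80, [6, 7, 8]))  # 255
    print(head_anchors([6, 7, 8]))  # [(142, 110), (192, 243), (459, 401)]
    # Zero raw outputs in the center cell of a 13x13 grid (416x416 input assumed)
    print(decode_box((0, 0, 0, 0), 6, 6, 13, 13, (142, 110), 416, 416, 1.05))
```

With `scale_x_y > 1`, the center offset inside a cell can range slightly beyond [0, 1] (for 1.05, from -0.025 to 1.025), which makes centers lying exactly on a cell border reachable without saturating the sigmoid. This is why the cfg sets 1.2, 1.1 and 1.05 on the three heads.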