├── Jupyter ├── README.md ├── inference.py ├── safety_gear_detector_jupyter.ipynb └── safety_gear_detector_jupyter.py ├── LICENSE ├── README.md ├── application ├── inference.py └── safety_gear_detector.py ├── docs └── images │ ├── archdia.png │ ├── jupy1.png │ ├── jupy2.png │ └── safetygear.png ├── resources ├── Safety_Full_Hat_and_Vest.mp4 ├── config.json └── worker-safety-mobilenet │ ├── worker_safety_mobilenet.caffemodel │ └── worker_safety_mobilenet.prototxt └── setup.sh /Jupyter/README.md: -------------------------------------------------------------------------------- 1 | # Safety Gear Detector 2 | 3 | | Details | | 4 | |-----------------------|---------------| 5 | | Target OS: | Ubuntu\* 18.04 LTS | 6 | | Programming Language: | Python\* 3.5| 7 | | Time to Complete: | 30-40min | 8 | 9 | ![safety-gear-detector](./docs/images/safetygear.png) 10 | 11 | 12 | ## What It Does 13 | This reference implementation is capable of detecting people passing in front of a camera and detecting if the people are wearing safety-jackets and hard-hats. The application counts the number of people who are violating the safety gear standards and the total number of people detected. 14 | 15 | ## Requirements 16 | 17 | ### Hardware 18 | 19 | - 6th to 8th Generation Intel® Core™ processors with Iris® Pro graphics or Intel® HD Graphics 20 | 21 | ### Software 22 | 23 | - [Ubuntu\* 18.04 LTS](http://releases.ubuntu.com/18.04/)
24 | **Note**: We recommend using a 4.14+ Linux* kernel with this software. Run the following command to determine the kernel version: 25 | 26 | ``` 27 | uname -a 28 | ``` 29 | 30 | - OpenCL™ Runtime Package 31 | 32 | - Intel® Distribution of OpenVINO™ toolkit 2020 R3 Release 33 | 34 | ## How It Works 35 | The application uses the Inference Engine included in the Intel® Distribution of OpenVINO™ toolkit. 36 | 37 | Firstly, a trained neural network detects people in the frame and displays a green colored bounding box over them. For each person detected, the application determines if they are wearing a safety-jacket and hard-hat. If they are not, an alert is registered with the system. 38 | 39 | ![Architectural diagram](./docs/images/archdia.png) 40 | 41 | ## Setup 42 | 43 | ### Install Intel® Distribution of OpenVINO™ toolkit 44 | Refer to [Install the Intel® Distribution of OpenVINO™ toolkit for Linux*](https://software.intel.com/en-us/articles/OpenVINO-Install-Linux) to install and set up the toolkit. 45 | 46 | Install the OpenCL™ Runtime Package to run inference on the GPU. It is not mandatory for CPU inference. 47 | 48 | 54 | 55 | ## Setup 56 | ### Get the code 57 | Clone the reference implementation 58 | ``` 59 | sudo apt-get update && sudo apt-get install git 60 | git clone https://gitlab.devtools.intel.com/reference-implementations/safety-gear-detector-python-with-worker-safety-model.git 61 | ``` 62 | 63 | ### Install OpenVINO 64 | 65 | Refer to [Install Intel® Distribution of OpenVINO™ toolkit for Linux*](https://software.intel.com/en-us/articles/OpenVINO-Install-Linux) to learn how to install and configure the toolkit. 66 | 67 | Install the OpenCL™ Runtime Package to run inference on the GPU, as shown in the instructions below. It is not mandatory for CPU inference. 68 | 69 | ### Other dependencies 70 | #### FFmpeg* 71 | FFmpeg is a free and open-source project capable of recording, converting and streaming digital audio and video in various formats. It can be used to do most of our multimedia tasks quickly and easily say, audio compression, audio/video format conversion, extract images from a video and a lot more. 72 | 73 | 74 | ## Which model to use 75 | 76 | This application uses the [person-detection-retail-0013](https://docs.openvinotoolkit.org/2020.3/_models_intel_person_detection_retail_0013_description_person_detection_retail_0013.html) Intel® model, that can be downloaded using the **model downloader**. The **model downloader** downloads the __.xml__ and __.bin__ files that will be used by the application. 77 | 78 | The application also uses the **worker_safety_mobilenet** model, whose Caffe* model file are provided in the `resources/worker-safety-mobilenet` directory. These need to be passed through the model optimizer to generate the IR (the .xml and .bin files) that will be used by the application. 79 | 80 | To download the models and install the dependencies of the application, run the below command in the `safety-gear-detector-cpp-with-worker-safety-model` directory: 81 | ``` 82 | ./setup.sh 83 | ``` 84 | 85 | ### The Config File 86 | 87 | The _resources/config.json_ contains the path of video that will be used by the application as input. 88 | 89 | For example: 90 | ``` 91 | { 92 | "inputs": [ 93 | { 94 | "video":"path_to_video/video1.mp4" 95 | } 96 | ] 97 | } 98 | ``` 99 | 100 | The `path/to/video` is the path to an input video file. 101 | 102 | ### Which Input Video to use 103 | 104 | The application works with any input video. 
Sample videos are provided [here](https://github.com/intel-iot-devkit/sample-videos/). 105 | 106 | For first-use, we recommend using the *Safety_Full_Hat_and_Vest.mp4* video which is present in the `resources/` directory. 107 | 108 | For example: 109 | ``` 110 | { 111 | "inputs": [ 112 | { 113 | "video":"sample-videos/Safety_Full_Hat_and_Vest.mp4" 114 | }, 115 | { 116 | "video":"sample-videos/Safety_Full_Hat_and_Vest.mp4" 117 | } 118 | ] 119 | } 120 | ``` 121 | If the user wants to use any other video, it can be used by providing the path in the config.json file. 122 | 123 | ### Using the Camera Stream instead of video 124 | 125 | Replace `path/to/video` with the camera ID in the config.json file, where the ID is taken from the video device (the number **X** in /dev/video**X**). 126 | 127 | On Ubuntu, to list all available video devices use the following command: 128 | 129 | ``` 130 | ls /dev/video* 131 | ``` 132 | 133 | For example, if the output of above command is __/dev/video0__, then config.json would be: 134 | 135 | ``` 136 | { 137 | "inputs": [ 138 | { 139 | "video":"0" 140 | } 141 | ] 142 | } 143 | ``` 144 | 145 | ### Setup the Environment 146 | 147 | Configure the environment to use the Intel® Distribution of OpenVINO™ toolkit by exporting environment variables: 148 | 149 | ``` 150 | source /opt/intel/openvino/bin/setupvars.sh 151 | ``` 152 | 153 | __Note__: This command needs to be executed only once in the terminal where the application will be executed. If the terminal is closed, the command needs to be executed again. 154 | 155 | ## Run the Code on Jupyter* 156 | 157 | * Change the current directory to the git-cloned application code location on your system: 158 | ``` 159 | cd /Jupyter 160 | ``` 161 | 162 | 198 | 199 | #### Follow the steps to run the code on Jupyter: 200 | 201 | ![Jupyter Notebook](./docs/images/jupy1.png) 202 | 203 | 1. Click on **New** button on the right side of the jupyter window. 204 | 205 | 2. Click on **Python 3** option from the drop down list. 206 | 207 | 3. In the first cell type **import os** and press **Shift+Enter** from the keyboard. 208 | 209 | 4. Export the environment variables in second cell of Jupyter and press **Shift+Enter**.
210 | ``` 211 | %env MODEL = /opt/intel/openvino/deployment_tools/open_model_zoo/tools/downloader/intel/person-detection-retail-0013/FP32/person-detection-retail-0013.xml 212 | %env USE_SAFETY_MODEL = ../resources/worker-safety-mobilenet/FP32/worker_safety_mobilenet.xml 213 | ``` 214 | 215 | 5. To select the target device to infer on (DEVICE), 216 | export the environment variable as given below. If this step is skipped, the device defaults to CPU. For example:
217 | %env DEVICE = CPU
218 | 219 | 6. To run the application in sync mode, export the environment variable **%env FLAG = sync**. By default, the application runs in async mode. 220 | 221 | 222 | 7. Copy the code from **safety_gear_detector_jupyter.py**, paste it in the next cell and press **Shift+Enter**. 223 | 224 | 8. Alternatively, the code can be run in the following way. 225 | 226 | i. Click on the **safety_gear_detector_jupyter.ipynb** file in the Jupyter notebook window. 227 | 228 | ii. Click on the **Kernel** menu and then select **Restart & Run All** from the drop-down list. 229 | 230 | iii. Click on **Restart and Run All Cells**. 231 | 232 | ![Jupyter Notebook](./docs/images/jupy2.png) 233 | 234 | **NOTE:** 235 | 236 | 1. To run the application on **GPU**: 237 | 238 | * With the floating-point precision 32 (FP32), change the **%env DEVICE = CPU** to **%env DEVICE = GPU**.
239 | **FP32:** FP32 is a single-precision floating-point format that uses 32 bits to represent a number: 1 sign bit, 8 exponent bits and 23 fraction bits. For more information, [click here](https://en.wikipedia.org/wiki/Single-precision_floating-point_format) 240 | * With the floating-point precision 16 (FP16), change the environment variables as given below:
241 | ``` 242 | %env DEVICE = GPU 243 | %env MODEL=/opt/intel/openvino/deployment_tools/open_model_zoo/tools/downloader/intel/person-detection-retail-0013/FP16/person-detection-retail-0013.xml 244 | %env USE_SAFETY_MODEL = ../resources/worker-safety-mobilenet/FP16/worker_safety_mobilenet.xml 245 | ``` 246 | **FP16:** FP16 is a half-precision floating-point format that uses 16 bits to represent a number: 1 sign bit, 5 exponent bits and 10 fraction bits. For more information, [click here](https://en.wikipedia.org/wiki/Half-precision_floating-point_format) 247 | 248 | 2. To run the application on **Intel® Neural Compute Stick**: 249 | * Change the **%env DEVICE = CPU** to **%env DEVICE = MYRIAD**. 250 | **%env MODEL=/opt/intel/openvino/deployment_tools/open_model_zoo/tools/downloader/intel/person-detection-retail-0013/FP16/person-detection-retail-0013.xml**
251 | **%env USE_SAFETY_MODEL = ../resources/worker-safety-mobilenet/FP16/worker_safety_mobilenet.xml**
252 | 253 | 3. To run the application on **Intel® Movidius™ VPU**: 254 | - Change the **%env DEVICE = CPU** to **%env DEVICE = HDDL**. 255 | - The HDDL plugin supports only FP16 models. Change the environment variables as shown below so that the models passed to the application are of data type FP16.
256 | **%env MODEL=/opt/intel/openvino/deployment_tools/open_model_zoo/tools/downloader/intel/person-detection-retail-0013/FP16/person-detection-retail-0013.xml**
257 | **%env USE_SAFETY_MODEL = ../resources/worker-safety-mobilenet/FP16/worker_safety_mobilenet.xml**
258 | 259 | 267 | 4. To run the application on multiple devices: 268 | - Change the **%env DEVICE = CPU** to **%env DEVICE = MULTI:CPU,GPU,MYRIAD** 269 | - With the floating-point precision 16 (FP16), change the path of the model in the environment variable MODEL as given below: 270 | **%env MODEL=/opt/intel/openvino/deployment_tools/open_model_zoo/tools/downloader/intel/person-detection-retail-0013/FP16/person-detection-retail-0013.xml**
271 | **%env USE_SAFETY_MODEL = ../resources/worker-safety-mobilenet/FP16/worker_safety_mobilenet.xml**
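
For reference, the sketch below summarizes how the notebook code interprets these environment variables. It mirrors the `env_parser()` logic in **safety_gear_detector_jupyter.py** rather than defining any new behaviour, and is only a simplified illustration of the defaults.

```python
import os

# MODEL is mandatory: the .xml of the person-detection IR; the .bin is derived from it.
model_xml = os.environ['MODEL']
model_bin = os.path.splitext(model_xml)[0] + '.bin'

# DEVICE is optional and defaults to CPU; MULTI:device1,device2 is also accepted.
device = os.environ.get('DEVICE', 'CPU')

# If USE_SAFETY_MODEL is unset, the application falls back to the OpenCV
# HSV-threshold checks for hard-hats and safety jackets.
use_safety_model = 'USE_SAFETY_MODEL' in os.environ

# FLAG selects the inference mode: async by default, sync when FLAG is set
# to anything other than 'async' (the steps above use FLAG = sync).
is_async_mode = os.environ.get('FLAG', 'async') == 'async'
```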
272 | 273 | 274 | 280 | -------------------------------------------------------------------------------- /Jupyter/inference.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | """ 3 | Copyright (c) 2018 Intel Corporation. 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining 6 | a copy of this software and associated documentation files (the 7 | "Software"), to deal in the Software without restriction, including 8 | without limitation the rights to use, copy, modify, merge, publish, 9 | distribute, sublicense, and/or sell copies of the Software, and to 10 | permit persons to whom the Software is furnished to do so, subject to 11 | the following conditions: 12 | 13 | The above copyright notice and this permission notice shall be 14 | included in all copies or substantial portions of the Software. 15 | 16 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, 17 | EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF 18 | MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND 19 | NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE 20 | LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION 21 | OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION 22 | WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 23 | """ 24 | 25 | import os 26 | import sys 27 | import logging as log 28 | from openvino.inference_engine import IENetwork, IECore 29 | 30 | 31 | class Network: 32 | """ 33 | Load and configure inference plugins for the specified target devices 34 | and performs synchronous and asynchronous modes for the specified infer requests. 35 | """ 36 | 37 | def __init__(self): 38 | self.net = None 39 | self.plugin = None 40 | self.input_blob = None 41 | self.out_blob = None 42 | self.net_plugin = None 43 | self.infer_request_handle = None 44 | 45 | def load_model(self, model, device, input_size, output_size, num_requests, cpu_extension=None, plugin=None): 46 | """ 47 | Loads a network and an image to the Inference Engine plugin. 48 | :param model: .xml file of pre trained model 49 | :param cpu_extension: extension for the CPU device 50 | :param device: Target device 51 | :param input_size: Number of input layers 52 | :param output_size: Number of output layers 53 | :param num_requests: Index of Infer request value. Limited to device capabilities. 54 | :param plugin: Plugin for specified device 55 | :return: Shape of input layer 56 | """ 57 | 58 | model_xml = model 59 | model_bin = os.path.splitext(model_xml)[0] + ".bin" 60 | # Plugin initialization for specified device 61 | # and load extensions library if specified 62 | if not plugin: 63 | log.info("Initializing plugin for {} device...".format(device)) 64 | self.plugin = IECore() 65 | else: 66 | self.plugin = plugin 67 | 68 | if cpu_extension and 'CPU' in device: 69 | self.plugin.add_extension(cpu_extension, "CPU") 70 | 71 | # Read IR 72 | log.info("Reading IR...") 73 | self.net = self.plugin.read_network(model=model_xml, weights=model_bin) #IENetwork(model=model_xml, weights=model_bin) 74 | log.info("Loading IR to the plugin...") 75 | 76 | if "CPU" in device: 77 | supported_layers = self.plugin.query_network(self.net, "CPU") 78 | not_supported_layers = \ 79 | [l for l in self.net.layers.keys() if l not in supported_layers] 80 | if len(not_supported_layers) != 0: 81 | log.error("Following layers are not supported by " 82 | "the plugin for specified device {}:\n {}". 
83 | format(device, 84 | ', '.join(not_supported_layers))) 85 | # log.error("Please try to specify cpu extensions library path" 86 | # " in command line parameters using -l " 87 | # "or --cpu_extension command line argument") 88 | sys.exit(1) 89 | 90 | if num_requests == 0: 91 | # Loads network read from IR to the plugin 92 | self.net_plugin = self.plugin.load_network(network=self.net, device_name=device) 93 | else: 94 | self.net_plugin = self.plugin.load_network(network=self.net, num_requests=num_requests, device_name=device) 95 | # log.error("num_requests != 0") 96 | 97 | self.input_blob = next(iter(self.net.inputs)) 98 | self.out_blob = next(iter(self.net.outputs)) 99 | assert len(self.net.inputs.keys()) == input_size, \ 100 | "Supports only {} input topologies".format(len(self.net.inputs)) 101 | assert len(self.net.outputs) == output_size, \ 102 | "Supports only {} output topologies".format(len(self.net.outputs)) 103 | 104 | return self.plugin, self.get_input_shape() 105 | 106 | def get_input_shape(self): 107 | """ 108 | Gives the shape of the input layer of the network. 109 | :return: None 110 | """ 111 | return self.net.inputs[self.input_blob].shape 112 | 113 | def performance_counter(self, request_id): 114 | """ 115 | Queries performance measures per layer to get feedback of what is the 116 | most time consuming layer. 117 | :param request_id: Index of Infer request value. Limited to device capabilities 118 | :return: Performance of the layer 119 | """ 120 | perf_count = self.net_plugin.requests[request_id].get_perf_counts() 121 | return perf_count 122 | 123 | def exec_net(self, request_id, frame): 124 | """ 125 | Starts asynchronous inference for specified request. 126 | :param request_id: Index of Infer request value. Limited to device capabilities. 127 | :param frame: Input image 128 | :return: Instance of Executable Network class 129 | """ 130 | self.infer_request_handle = self.net_plugin.start_async( 131 | request_id=request_id, inputs={self.input_blob: frame}) 132 | return self.net_plugin 133 | 134 | def wait(self, request_id): 135 | """ 136 | Waits for the result to become available. 137 | :param request_id: Index of Infer request value. Limited to device capabilities. 138 | :return: Timeout value 139 | """ 140 | wait_process = self.net_plugin.requests[request_id].wait(-1) 141 | return wait_process 142 | 143 | def get_output(self, request_id, output=None): 144 | """ 145 | Gives a list of results for the output layer of the network. 146 | :param request_id: Index of Infer request value. Limited to device capabilities. 
147 | :param output: Name of the output layer 148 | :return: Results for the specified request 149 | """ 150 | if output: 151 | res = self.infer_request_handle.outputs[output] 152 | else: 153 | res = self.net_plugin.requests[request_id].outputs[self.out_blob] 154 | return res 155 | 156 | def clean(self): 157 | """ 158 | Deletes all the instances 159 | :return: None 160 | """ 161 | del self.net_plugin 162 | del self.plugin 163 | del self.net 164 | -------------------------------------------------------------------------------- /Jupyter/safety_gear_detector_jupyter.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": null, 6 | "metadata": {}, 7 | "outputs": [], 8 | "source": [ 9 | "import os" 10 | ] 11 | }, 12 | { 13 | "cell_type": "code", 14 | "execution_count": null, 15 | "metadata": {}, 16 | "outputs": [], 17 | "source": [ 18 | "%env MODEL = /opt/intel/openvino/deployment_tools/open_model_zoo/tools/downloader/intel/person-detection-retail-0013/FP32/person-detection-retail-0013.xml\n", 19 | "%env USE_SAFETY_MODEL = ../resources/worker-safety-mobilenet/FP32/worker_safety_mobilenet.xml" 20 | ] 21 | }, 22 | { 23 | "cell_type": "code", 24 | "execution_count": null, 25 | "metadata": {}, 26 | "outputs": [], 27 | "source": [ 28 | "#!/usr/bin/env python3\n", 29 | "\"\"\"\n", 30 | " Copyright (c) 2018 Intel Corporation.\n", 31 | "\n", 32 | " Permission is hereby granted, free of charge, to any person obtaining\n", 33 | " a copy of this software and associated documentation files (the\n", 34 | " \"Software\"), to deal in the Software without restriction, including\n", 35 | " without limitation the rights to use, copy, modify, merge, publish,\n", 36 | " distribute, sublicense, and/or sell copies of the Software, and to\n", 37 | " permit persons to whom the Software is furnished to do so, subject to\n", 38 | " the following conditions:\n", 39 | "\n", 40 | " The above copyright notice and this permission notice shall be\n", 41 | " included in all copies or substantial portions of the Software.\n", 42 | "\n", 43 | " THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND,\n", 44 | " EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF\n", 45 | " MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND\n", 46 | " NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE\n", 47 | " LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION\n", 48 | " OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION\n", 49 | " WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.\n", 50 | "\"\"\"\n", 51 | "\n", 52 | "from __future__ import print_function\n", 53 | "import sys\n", 54 | "import os\n", 55 | "import cv2\n", 56 | "import numpy as np\n", 57 | "import datetime\n", 58 | "import json\n", 59 | "from inference import Network\n", 60 | "\n", 61 | "# Global vars\n", 62 | "cpu_extension = ''\n", 63 | "conf_modelLayers = ''\n", 64 | "conf_modelWeights = ''\n", 65 | "targetDevice = \"CPU\"\n", 66 | "conf_batchSize = 1\n", 67 | "conf_modelPersonLabel = 1\n", 68 | "conf_inferConfidenceThreshold = 0.7\n", 69 | "conf_inFrameViolationsThreshold = 19\n", 70 | "conf_inFramePeopleThreshold = 5\n", 71 | "padding = 30\n", 72 | "viol_wk = 0\n", 73 | "acceptedDevices = ['CPU', 'GPU', 'MYRIAD', 'HETERO:FPGA,CPU', 'HDDL']\n", 74 | "videos = []\n", 75 | "name_of_videos = []\n", 76 | "CONFIG_FILE = '../resources/config.json'\n", 77 | "\n", 78 | "class Video:\n", 79 | " def __init__(self, idx, path):\n", 80 | " if path.isnumeric():\n", 81 | " self.video = cv2.VideoCapture(int(path))\n", 82 | " self.name = \"Cam \" + str(idx)\n", 83 | " else:\n", 84 | " if os.path.exists(path):\n", 85 | " self.video = cv2.VideoCapture(path)\n", 86 | " self.name = \"Video \" + str(idx)\n", 87 | " else:\n", 88 | " print(\"Either wrong input path or empty line is found. Please check the conf.json file\")\n", 89 | " exit(21)\n", 90 | " if not self.video.isOpened():\n", 91 | " print(\"Couldn't open video: \" + path)\n", 92 | " sys.exit(20)\n", 93 | " self.height = int(self.video.get(cv2.CAP_PROP_FRAME_HEIGHT))\n", 94 | " self.width = int(self.video.get(cv2.CAP_PROP_FRAME_WIDTH))\n", 95 | "\n", 96 | " self.currentViolationCount = 0\n", 97 | " self.currentViolationCountConfidence = 0\n", 98 | " self.prevViolationCount = 0\n", 99 | " self.totalViolations = 0\n", 100 | " self.totalPeopleCount = 0\n", 101 | " self.currentPeopleCount = 0\n", 102 | " self.currentPeopleCountConfidence = 0\n", 103 | " self.prevPeopleCount = 0\n", 104 | " self.currentTotalPeopleCount = 0\n", 105 | "\n", 106 | " cv2.namedWindow(self.name, cv2.WINDOW_NORMAL)\n", 107 | " self.frame_start_time = datetime.datetime.now()\n", 108 | "\n", 109 | "\n", 110 | "def env_parser():\n", 111 | " \"\"\"\n", 112 | " Parses the inputs.\n", 113 | " :return: None\n", 114 | " \"\"\"\n", 115 | " global use_safety_model, conf_modelLayers, conf_modelWeights, targetDevice, cpu_extension, videos,\\\n", 116 | " conf_safety_modelWeights, conf_safety_modelLayers, is_async_mode\n", 117 | " if 'MODEL' in os.environ:\n", 118 | " conf_modelLayers = os.environ['MODEL']\n", 119 | " conf_modelWeights = os.path.splitext(conf_modelLayers)[0] + \".bin\"\n", 120 | " else:\n", 121 | " print(\"Please provide path for the .xml file.\")\n", 122 | " sys.exit(0)\n", 123 | " if 'DEVICE' in os.environ:\n", 124 | " targetDevice = os.environ['DEVICE']\n", 125 | " if 'MULTI' not in targetDevice and targetDevice not in acceptedDevices:\n", 126 | " print(\"Unsupported device: \" + targetDevice)\n", 127 | " sys.exit(2)\n", 128 | " elif 'MULTI' in targetDevice:\n", 129 | " target_devices = targetDevice.split(':')[1].split(',')\n", 130 | " for multi_device in target_devices:\n", 131 | " if multi_device not in acceptedDevices:\n", 132 | " print(\"Unsupported device: \" + targetDevice)\n", 133 | " 
sys.exit(2)\n", 134 | " if 'CPU_EXTENSION' in os.environ:\n", 135 | " cpu_extension = os.environ['CPU_EXTENSION']\n", 136 | " if 'USE_SAFETY_MODEL' in os.environ:\n", 137 | " conf_safety_modelLayers = os.environ['USE_SAFETY_MODEL']\n", 138 | " conf_safety_modelWeights = os.path.splitext(conf_safety_modelLayers)[0] + \".bin\"\n", 139 | " use_safety_model = True\n", 140 | " else:\n", 141 | " use_safety_model = False\n", 142 | " if 'FLAG' in os.environ:\n", 143 | " if os.environ['FLAG'] == 'async':\n", 144 | " is_async_mode = True\n", 145 | " print('Application running in Async mode')\n", 146 | " else:\n", 147 | " is_async_mode = False\n", 148 | " print('Application running in Sync mode')\n", 149 | " else:\n", 150 | " is_async_mode = True\n", 151 | " print('Application running in Async mode')\n", 152 | " assert os.path.isfile(CONFIG_FILE), \"{} file doesn't exist\".format(CONFIG_FILE)\n", 153 | " config = json.loads(open(CONFIG_FILE).read())\n", 154 | " for idx, item in enumerate(config['inputs']):\n", 155 | " vid = Video(idx, item['video'])\n", 156 | " name_of_videos.append([idx, item['video']])\n", 157 | " videos.append([idx, vid])\n", 158 | "\n", 159 | "\n", 160 | "\n", 161 | "def detect_safety_hat(img):\n", 162 | " \"\"\"\n", 163 | " Detection of the hat of the person.\n", 164 | " :param img: Current frame\n", 165 | " :return: Boolean value of the detected hat\n", 166 | " \"\"\"\n", 167 | " lowH = 15\n", 168 | " lowS = 65\n", 169 | " lowV = 75\n", 170 | "\n", 171 | " highH = 30\n", 172 | " highS = 255\n", 173 | " highV = 255\n", 174 | "\n", 175 | " crop = 0\n", 176 | " height = 15\n", 177 | " perc = 8\n", 178 | "\n", 179 | " hsv = np.zeros(1)\n", 180 | "\n", 181 | " try:\n", 182 | " hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)\n", 183 | " except cv2.error as e:\n", 184 | " print(\"%d %d %d\" % (img.shape))\n", 185 | " print(\"%d %d %d\" % (img.shape))\n", 186 | " print(e)\n", 187 | "\n", 188 | " threshold_img = cv2.inRange(hsv, (lowH, lowS, lowV), (highH, highS, highV))\n", 189 | "\n", 190 | " x = 0\n", 191 | " y = int(threshold_img.shape[0] * crop / 100)\n", 192 | " w = int(threshold_img.shape[1])\n", 193 | " h = int(threshold_img.shape[0] * height / 100)\n", 194 | " img_cropped = threshold_img[y: y + h, x: x + w]\n", 195 | "\n", 196 | " if cv2.countNonZero(threshold_img) < img_cropped.size * perc / 100:\n", 197 | " return False\n", 198 | "\n", 199 | " return True\n", 200 | "\n", 201 | "\n", 202 | "def detect_safety_jacket(img):\n", 203 | " \"\"\"\n", 204 | " Detection of the safety jacket of the person.\n", 205 | " :param img: Current frame\n", 206 | " :return: Boolean value of the detected jacket\n", 207 | " \"\"\"\n", 208 | " lowH = 0\n", 209 | " lowS = 150\n", 210 | " lowV = 42\n", 211 | "\n", 212 | " highH = 11\n", 213 | " highS = 255\n", 214 | " highV = 255\n", 215 | "\n", 216 | " crop = 15\n", 217 | " height = 40\n", 218 | " perc = 23\n", 219 | "\n", 220 | " hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)\n", 221 | "\n", 222 | " threshold_img = cv2.inRange(hsv, (lowH, lowS, lowV), (highH, highS, highV))\n", 223 | "\n", 224 | " x = 0\n", 225 | " y = int(threshold_img.shape[0] * crop / 100)\n", 226 | " w = int(threshold_img.shape[1])\n", 227 | " h = int(threshold_img.shape[0] * height / 100)\n", 228 | " img_cropped = threshold_img[y: y + h, x: x + w]\n", 229 | "\n", 230 | " if cv2.countNonZero(threshold_img) < img_cropped.size * perc / 100:\n", 231 | " return False\n", 232 | "\n", 233 | " return True\n", 234 | "\n", 235 | "\n", 236 | "def detect_workers(workers, frame):\n", 237 | " 
\"\"\"\n", 238 | " Detection of the person with the safety guards.\n", 239 | " :param workers: Total number of the person in the current frame\n", 240 | " :param frame: Current frame\n", 241 | " :return: Total violation count of the person\n", 242 | " \"\"\"\n", 243 | " violations = 0\n", 244 | " global viol_wk\n", 245 | " for worker in workers:\n", 246 | " xmin, ymin, xmax, ymax = worker\n", 247 | " crop = frame[ymin:ymax, xmin:xmax]\n", 248 | " if 0 not in crop.shape:\n", 249 | " if detect_safety_hat(crop):\n", 250 | " if detect_safety_jacket(crop):\n", 251 | " cv2.rectangle(frame, (xmin, ymin), (xmax, ymax),\n", 252 | " (0, 255, 0), 2)\n", 253 | " else:\n", 254 | " cv2.rectangle(frame, (xmin, ymin), (xmax, ymax),\n", 255 | " (0, 0, 255), 2)\n", 256 | " violations += 1\n", 257 | " viol_wk += 1\n", 258 | "\n", 259 | " else:\n", 260 | " cv2.rectangle(frame, (xmin, ymin), (xmax, ymax), (0, 0, 255), 2)\n", 261 | " violations += 1\n", 262 | " viol_wk += 1\n", 263 | "\n", 264 | " return violations\n", 265 | "\n", 266 | "\n", 267 | "def main():\n", 268 | " \"\"\"\n", 269 | " Load the network and parse the output.\n", 270 | " :return: None\n", 271 | " \"\"\"\n", 272 | " env_parser()\n", 273 | " global is_async_mode\n", 274 | " nextReq = 1\n", 275 | " currReq = 0\n", 276 | " nextReq_s = 1\n", 277 | " currReq_s = 0\n", 278 | " prevVideo = None\n", 279 | " vid_finished = [False] * len(videos)\n", 280 | " min_FPS = min([videos[i][1].video.get(cv2.CAP_PROP_FPS) for i in range(len(videos))])\n", 281 | " # Initialise the class\n", 282 | " infer_network = Network()\n", 283 | " infer_network_safety = Network()\n", 284 | " # Load the network to IE plugin to get shape of input layer\n", 285 | " plugin, (batch_size, channels, model_height, model_width) = \\\n", 286 | " infer_network.load_model(conf_modelLayers, targetDevice, 1, 1, 2, cpu_extension)\n", 287 | " if use_safety_model:\n", 288 | " batch_size_sm, channels_sm, model_height_sm, model_width_sm = \\\n", 289 | " infer_network_safety.load_model(conf_safety_modelLayers, targetDevice, 1, 1, 2, cpu_extension, plugin)[1]\n", 290 | "\n", 291 | " while True:\n", 292 | " for index, currVideo in videos:\n", 293 | " # Read image from video/cam\n", 294 | " vfps = int(round(currVideo.video.get(cv2.CAP_PROP_FPS)))\n", 295 | " for i in range(0, int(round(vfps / min_FPS))):\n", 296 | " ret, current_img = currVideo.video.read()\n", 297 | " if not ret:\n", 298 | " vid_finished[index] = True\n", 299 | " break\n", 300 | " if vid_finished[index]:\n", 301 | " stream_end_frame = np.zeros((int(currVideo.height), int(currVideo.width), 1),\n", 302 | " dtype='uint8')\n", 303 | " cv2.putText(stream_end_frame, \"Input file {} has ended\".format\n", 304 | " (name_of_videos[index][1].split('/')[-1]) ,\n", 305 | " (10, int(currVideo.height/2)),\n", 306 | " cv2.FONT_HERSHEY_COMPLEX, 1, (255, 255, 255), 2)\n", 307 | " cv2.imshow(currVideo.name, stream_end_frame)\n", 308 | " continue\n", 309 | " # Transform image to person detection model input\n", 310 | " rsImg = cv2.resize(current_img, (model_width, model_height))\n", 311 | " rsImg = rsImg.transpose((2, 0, 1))\n", 312 | " rsImg = rsImg.reshape((batch_size, channels, model_height, model_width))\n", 313 | "\n", 314 | " infer_start_time = datetime.datetime.now()\n", 315 | " # Infer current image\n", 316 | " if is_async_mode:\n", 317 | " infer_network.exec_net(nextReq, rsImg)\n", 318 | " else:\n", 319 | " infer_network.exec_net(currReq, rsImg)\n", 320 | " prevVideo = currVideo\n", 321 | " previous_img = current_img\n", 322 | " # Wait 
for previous request to end\n", 323 | " if infer_network.wait(currReq) == 0:\n", 324 | " infer_end_time = (datetime.datetime.now() - infer_start_time) * 1000\n", 325 | " in_frame_workers = []\n", 326 | " people = 0\n", 327 | " violations = 0\n", 328 | " hard_hat_detection =False\n", 329 | " vest_detection = False\n", 330 | " result = infer_network.get_output(currReq)\n", 331 | " # Filter output\n", 332 | " for obj in result[0][0]:\n", 333 | " if obj[2] > conf_inferConfidenceThreshold:\n", 334 | " xmin = int(obj[3] * prevVideo.width)\n", 335 | " ymin = int(obj[4] * prevVideo.height)\n", 336 | " xmax = int(obj[5] * prevVideo.width)\n", 337 | " ymax = int(obj[6] * prevVideo.height)\n", 338 | " xmin = int(xmin - padding) if (xmin - padding) > 0 else 0\n", 339 | " ymin = int(ymin - padding) if (ymin - padding) > 0 else 0\n", 340 | " xmax = int(xmax + padding) if (xmax + padding) < prevVideo.width else prevVideo.width\n", 341 | " ymax = int(ymax + padding) if (ymax + padding) < prevVideo.height else prevVideo.height\n", 342 | " cv2.rectangle(previous_img, (xmin, ymin), (xmax, ymax), (0, 255, 0), 2)\n", 343 | " people += 1\n", 344 | " in_frame_workers.append((xmin, ymin, xmax, ymax))\n", 345 | " new_frame = previous_img[ymin:ymax, xmin:xmax]\n", 346 | " if use_safety_model:\n", 347 | " # Transform image to safety model input\n", 348 | " in_frame_sm = cv2.resize(new_frame, (model_width_sm, model_height_sm))\n", 349 | " in_frame_sm = in_frame_sm.transpose((2, 0, 1))\n", 350 | " in_frame_sm = in_frame_sm.reshape((batch_size_sm, channels_sm, model_height_sm, model_width_sm))\n", 351 | "\n", 352 | " infer_start_time_sm = datetime.datetime.now()\n", 353 | " if is_async_mode:\n", 354 | " infer_network_safety.exec_net(nextReq_s, in_frame_sm)\n", 355 | " else:\n", 356 | " infer_network_safety.exec_net(currReq_s, in_frame_sm)\n", 357 | " # Wait for the result\n", 358 | " infer_network_safety.wait(currReq_s)\n", 359 | " infer_end_time_sm = (datetime.datetime.now() - infer_start_time_sm) * 1000\n", 360 | "\n", 361 | " result_sm = infer_network_safety.get_output(currReq_s)\n", 362 | " # Filter output\n", 363 | " hard_hat_detection = False\n", 364 | " vest_detection = False\n", 365 | " detection_list = []\n", 366 | " for obj_sm in result_sm[0][0]:\n", 367 | "\n", 368 | " if (obj_sm[2] > 0.4):\n", 369 | " # Detect safety vest\n", 370 | " if (int(obj_sm[1])) == 2:\n", 371 | " xmin_sm = int(obj_sm[3] * (xmax-xmin))\n", 372 | " ymin_sm = int(obj_sm[4] * (ymax-ymin))\n", 373 | " xmax_sm = int(obj_sm[5] * (xmax-xmin))\n", 374 | " ymax_sm = int(obj_sm[6] * (ymax-ymin))\n", 375 | " if vest_detection == False:\n", 376 | " detection_list.append([xmin_sm+xmin, ymin_sm+ymin, xmax_sm+xmin, ymax_sm+ymin])\n", 377 | " vest_detection = True\n", 378 | "\n", 379 | " # Detect hard-hat\n", 380 | " if int(obj_sm[1]) == 4:\n", 381 | " xmin_sm_v = int(obj_sm[3] * (xmax-xmin))\n", 382 | " ymin_sm_v = int(obj_sm[4] * (ymax-ymin))\n", 383 | " xmax_sm_v = int(obj_sm[5] * (xmax-xmin))\n", 384 | " ymax_sm_v = int(obj_sm[6] * (ymax-ymin))\n", 385 | " if hard_hat_detection == False:\n", 386 | " detection_list.append([xmin_sm_v+xmin, ymin_sm_v+ymin, xmax_sm_v+xmin, ymax_sm_v+ymin])\n", 387 | " hard_hat_detection = True\n", 388 | "\n", 389 | " if hard_hat_detection is False or vest_detection is False:\n", 390 | " violations += 1\n", 391 | " for _rect in detection_list:\n", 392 | " cv2.rectangle(current_img, (_rect[0] , _rect[1]), (_rect[2] , _rect[3]), (0, 255, 0), 2)\n", 393 | " if is_async_mode:\n", 394 | " currReq_s, nextReq_s = nextReq_s, 
currReq_s\n", 395 | "\n", 396 | " # Use OpenCV if worker-safety-model is not provided\n", 397 | " else :\n", 398 | " violations = detect_workers(in_frame_workers, previous_img)\n", 399 | "\n", 400 | " # Check if detected violations equals previous frames\n", 401 | " if violations == prevVideo.currentViolationCount:\n", 402 | " prevVideo.currentViolationCountConfidence += 1\n", 403 | "\n", 404 | " # If frame threshold is reached, change validated count\n", 405 | " if prevVideo.currentViolationCountConfidence == conf_inFrameViolationsThreshold:\n", 406 | "\n", 407 | " # If another violation occurred, save image\n", 408 | " if prevVideo.currentViolationCount > prevVideo.prevViolationCount:\n", 409 | " prevVideo.totalViolations += (prevVideo.currentViolationCount - prevVideo.prevViolationCount)\n", 410 | " prevVideo.prevViolationCount = prevVideo.currentViolationCount\n", 411 | " else:\n", 412 | " prevVideo.currentViolationCountConfidence = 0\n", 413 | " prevVideo.currentViolationCount = violations\n", 414 | "\n", 415 | " # Check if detected people count equals previous frames\n", 416 | " if people == prevVideo.currentPeopleCount:\n", 417 | " prevVideo.currentPeopleCountConfidence += 1\n", 418 | " # If frame threshold is reached, change validated count\n", 419 | " if prevVideo.currentPeopleCountConfidence == conf_inFrameViolationsThreshold:\n", 420 | " prevVideo.currentTotalPeopleCount += (\n", 421 | " prevVideo.currentPeopleCount - prevVideo.prevPeopleCount)\n", 422 | " if prevVideo.currentTotalPeopleCount > prevVideo.prevPeopleCount:\n", 423 | " prevVideo.totalPeopleCount += prevVideo.currentTotalPeopleCount - prevVideo.prevPeopleCount\n", 424 | " prevVideo.prevPeopleCount = prevVideo.currentPeopleCount\n", 425 | " else:\n", 426 | " prevVideo.currentPeopleCountConfidence = 0\n", 427 | " prevVideo.currentPeopleCount = people\n", 428 | "\n", 429 | " frame_end_time = datetime.datetime.now()\n", 430 | " cv2.putText(previous_img, 'Total people count: ' + str(\n", 431 | " prevVideo.totalPeopleCount), (10, prevVideo.height - 10),\n", 432 | " cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2)\n", 433 | " cv2.putText(previous_img, 'Current people count: ' + str(\n", 434 | " prevVideo.currentTotalPeopleCount),\n", 435 | " (10, prevVideo.height - 40),\n", 436 | " cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2)\n", 437 | " cv2.putText(previous_img, 'Total violation count: ' + str(\n", 438 | " prevVideo.totalViolations), (10, prevVideo.height - 70),\n", 439 | " cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2)\n", 440 | " cv2.putText(previous_img, 'FPS: %0.2fs' % (1 / (\n", 441 | " frame_end_time - prevVideo.frame_start_time).total_seconds()),\n", 442 | " (10, prevVideo.height - 100),\n", 443 | " cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2)\n", 444 | " cv2.putText(previous_img, 'Inference time: N\\A for async mode' if is_async_mode else 'Inference time: {}ms'.format((infer_end_time).total_seconds()),\n", 445 | " (10, prevVideo.height - 130),\n", 446 | " cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2)\n", 447 | " cv2.imshow(prevVideo.name, previous_img)\n", 448 | " prevVideo.frame_start_time = datetime.datetime.now()\n", 449 | " # Swap\n", 450 | " if is_async_mode:\n", 451 | " currReq, nextReq = nextReq, currReq\n", 452 | " previous_img = current_img\n", 453 | " prevVideo = currVideo\n", 454 | " # Exit if ESC key is pressed\n", 455 | " if cv2.waitKey(1) == 27:\n", 456 | " print(\"Attempting to stop input files\")\n", 457 | " infer_network.clean()\n", 458 | " infer_network_safety.clean()\n", 459 | " 
cv2.destroyAllWindows()\n", 460 | " return\n", 461 | " \n", 462 | " if False not in vid_finished:\n", 463 | " infer_network.clean()\n", 464 | " infer_network_safety.clean()\n", 465 | " cv2.destroyAllWindows()\n", 466 | " break\n", 467 | "\n", 468 | "\n", 469 | "\n", 470 | "if __name__ == '__main__':\n", 471 | " main()" 472 | ] 473 | }, 474 | { 475 | "cell_type": "code", 476 | "execution_count": null, 477 | "metadata": {}, 478 | "outputs": [], 479 | "source": [] 480 | }, 481 | { 482 | "cell_type": "code", 483 | "execution_count": null, 484 | "metadata": {}, 485 | "outputs": [], 486 | "source": [] 487 | } 488 | ], 489 | "metadata": { 490 | "kernelspec": { 491 | "display_name": "Python 3", 492 | "language": "python", 493 | "name": "python3" 494 | }, 495 | "language_info": { 496 | "codemirror_mode": { 497 | "name": "ipython", 498 | "version": 3 499 | }, 500 | "file_extension": ".py", 501 | "mimetype": "text/x-python", 502 | "name": "python", 503 | "nbconvert_exporter": "python", 504 | "pygments_lexer": "ipython3", 505 | "version": "3.6.9" 506 | } 507 | }, 508 | "nbformat": 4, 509 | "nbformat_minor": 2 510 | } 511 | -------------------------------------------------------------------------------- /Jupyter/safety_gear_detector_jupyter.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | """ 3 | Copyright (c) 2018 Intel Corporation. 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining 6 | a copy of this software and associated documentation files (the 7 | "Software"), to deal in the Software without restriction, including 8 | without limitation the rights to use, copy, modify, merge, publish, 9 | distribute, sublicense, and/or sell copies of the Software, and to 10 | permit persons to whom the Software is furnished to do so, subject to 11 | the following conditions: 12 | 13 | The above copyright notice and this permission notice shall be 14 | included in all copies or substantial portions of the Software. 15 | 16 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, 17 | EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF 18 | MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND 19 | NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE 20 | LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION 21 | OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION 22 | WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
23 | """ 24 | 25 | from __future__ import print_function 26 | import sys 27 | import os 28 | import cv2 29 | import numpy as np 30 | import datetime 31 | import json 32 | from inference import Network 33 | 34 | # Global vars 35 | cpu_extension = '' 36 | conf_modelLayers = '' 37 | conf_modelWeights = '' 38 | targetDevice = "CPU" 39 | conf_batchSize = 1 40 | conf_modelPersonLabel = 1 41 | conf_inferConfidenceThreshold = 0.7 42 | conf_inFrameViolationsThreshold = 19 43 | conf_inFramePeopleThreshold = 5 44 | padding = 30 45 | viol_wk = 0 46 | acceptedDevices = ['CPU', 'GPU', 'MYRIAD', 'HETERO:FPGA,CPU', 'HDDL'] 47 | videos = [] 48 | name_of_videos = [] 49 | CONFIG_FILE = '../resources/config.json' 50 | 51 | class Video: 52 | def __init__(self, idx, path): 53 | if path.isnumeric(): 54 | self.video = cv2.VideoCapture(int(path)) 55 | self.name = "Cam " + str(idx) 56 | else: 57 | if os.path.exists(path): 58 | self.video = cv2.VideoCapture(path) 59 | self.name = "Video " + str(idx) 60 | else: 61 | print("Either wrong input path or empty line is found. Please check the conf.json file") 62 | exit(21) 63 | if not self.video.isOpened(): 64 | print("Couldn't open video: " + path) 65 | sys.exit(20) 66 | self.height = int(self.video.get(cv2.CAP_PROP_FRAME_HEIGHT)) 67 | self.width = int(self.video.get(cv2.CAP_PROP_FRAME_WIDTH)) 68 | 69 | self.currentViolationCount = 0 70 | self.currentViolationCountConfidence = 0 71 | self.prevViolationCount = 0 72 | self.totalViolations = 0 73 | self.totalPeopleCount = 0 74 | self.currentPeopleCount = 0 75 | self.currentPeopleCountConfidence = 0 76 | self.prevPeopleCount = 0 77 | self.currentTotalPeopleCount = 0 78 | 79 | cv2.namedWindow(self.name, cv2.WINDOW_NORMAL) 80 | self.frame_start_time = datetime.datetime.now() 81 | 82 | 83 | def env_parser(): 84 | """ 85 | Parses the inputs. 86 | :return: None 87 | """ 88 | global use_safety_model, conf_modelLayers, conf_modelWeights, targetDevice, cpu_extension, videos,\ 89 | conf_safety_modelWeights, conf_safety_modelLayers, is_async_mode 90 | if 'MODEL' in os.environ: 91 | conf_modelLayers = os.environ['MODEL'] 92 | conf_modelWeights = os.path.splitext(conf_modelLayers)[0] + ".bin" 93 | else: 94 | print("Please provide path for the .xml file.") 95 | sys.exit(0) 96 | if 'DEVICE' in os.environ: 97 | targetDevice = os.environ['DEVICE'] 98 | if targetDevice not in acceptedDevices: 99 | print("Selected device, %s not supported." 
% (targetDevice)) 100 | sys.exit(12) 101 | if 'CPU_EXTENSION' in os.environ: 102 | cpu_extension = os.environ['CPU_EXTENSION'] 103 | if 'USE_SAFETY_MODEL' in os.environ: 104 | conf_safety_modelLayers = os.environ['USE_SAFETY_MODEL'] 105 | conf_safety_modelWeights = os.path.splitext(conf_safety_modelLayers)[0] + ".bin" 106 | use_safety_model = True 107 | else: 108 | use_safety_model = False 109 | if 'FLAG' in os.environ: 110 | if os.environ['FLAG'] == 'async': 111 | is_async_mode = True 112 | print('Application running in Async mode') 113 | else: 114 | is_async_mode = False 115 | print('Application running in Sync mode') 116 | else: 117 | is_async_mode = True 118 | print('Application running in Async mode') 119 | assert os.path.isfile(CONFIG_FILE), "{} file doesn't exist".format(CONFIG_FILE) 120 | config = json.loads(open(CONFIG_FILE).read()) 121 | for idx, item in enumerate(config['inputs']): 122 | vid = Video(idx, item['video']) 123 | name_of_videos.append([idx, item['video']]) 124 | videos.append([idx, vid]) 125 | 126 | 127 | 128 | def detect_safety_hat(img): 129 | """ 130 | Detection of the hat of the person. 131 | :param img: Current frame 132 | :return: Boolean value of the detected hat 133 | """ 134 | lowH = 15 135 | lowS = 65 136 | lowV = 75 137 | 138 | highH = 30 139 | highS = 255 140 | highV = 255 141 | 142 | crop = 0 143 | height = 15 144 | perc = 8 145 | 146 | hsv = np.zeros(1) 147 | 148 | try: 149 | hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV) 150 | except cv2.error as e: 151 | print("%d %d %d" % (img.shape)) 152 | print("%d %d %d" % (img.shape)) 153 | print(e) 154 | 155 | threshold_img = cv2.inRange(hsv, (lowH, lowS, lowV), (highH, highS, highV)) 156 | 157 | x = 0 158 | y = int(threshold_img.shape[0] * crop / 100) 159 | w = int(threshold_img.shape[1]) 160 | h = int(threshold_img.shape[0] * height / 100) 161 | img_cropped = threshold_img[y: y + h, x: x + w] 162 | 163 | if cv2.countNonZero(threshold_img) < img_cropped.size * perc / 100: 164 | return False 165 | 166 | return True 167 | 168 | 169 | def detect_safety_jacket(img): 170 | """ 171 | Detection of the safety jacket of the person. 172 | :param img: Current frame 173 | :return: Boolean value of the detected jacket 174 | """ 175 | lowH = 0 176 | lowS = 150 177 | lowV = 42 178 | 179 | highH = 11 180 | highS = 255 181 | highV = 255 182 | 183 | crop = 15 184 | height = 40 185 | perc = 23 186 | 187 | hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV) 188 | 189 | threshold_img = cv2.inRange(hsv, (lowH, lowS, lowV), (highH, highS, highV)) 190 | 191 | x = 0 192 | y = int(threshold_img.shape[0] * crop / 100) 193 | w = int(threshold_img.shape[1]) 194 | h = int(threshold_img.shape[0] * height / 100) 195 | img_cropped = threshold_img[y: y + h, x: x + w] 196 | 197 | if cv2.countNonZero(threshold_img) < img_cropped.size * perc / 100: 198 | return False 199 | 200 | return True 201 | 202 | 203 | def detect_workers(workers, frame): 204 | """ 205 | Detection of the person with the safety guards. 
206 | :param workers: Total number of the person in the current frame 207 | :param frame: Current frame 208 | :return: Total violation count of the person 209 | """ 210 | violations = 0 211 | global viol_wk 212 | for worker in workers: 213 | xmin, ymin, xmax, ymax = worker 214 | crop = frame[ymin:ymax, xmin:xmax] 215 | if 0 not in crop.shape: 216 | if detect_safety_hat(crop): 217 | if detect_safety_jacket(crop): 218 | cv2.rectangle(frame, (xmin, ymin), (xmax, ymax), 219 | (0, 255, 0), 2) 220 | else: 221 | cv2.rectangle(frame, (xmin, ymin), (xmax, ymax), 222 | (0, 0, 255), 2) 223 | violations += 1 224 | viol_wk += 1 225 | 226 | else: 227 | cv2.rectangle(frame, (xmin, ymin), (xmax, ymax), (0, 0, 255), 2) 228 | violations += 1 229 | viol_wk += 1 230 | 231 | return violations 232 | 233 | 234 | def main(): 235 | """ 236 | Load the network and parse the output. 237 | :return: None 238 | """ 239 | env_parser() 240 | global is_async_mode 241 | nextReq = 1 242 | currReq = 0 243 | nextReq_s = 1 244 | currReq_s = 0 245 | prevVideo = None 246 | vid_finished = [False] * len(videos) 247 | min_FPS = min([videos[i][1].video.get(cv2.CAP_PROP_FPS) for i in range(len(videos))]) 248 | # Initialise the class 249 | infer_network = Network() 250 | infer_network_safety = Network() 251 | # Load the network to IE plugin to get shape of input layer 252 | plugin, (batch_size, channels, model_height, model_width) = \ 253 | infer_network.load_model(conf_modelLayers, targetDevice, 1, 1, 2, cpu_extension) 254 | if use_safety_model: 255 | batch_size_sm, channels_sm, model_height_sm, model_width_sm = \ 256 | infer_network_safety.load_model(conf_safety_modelLayers, targetDevice, 1, 1, 2, cpu_extension, plugin)[1] 257 | 258 | while True: 259 | for index, currVideo in videos: 260 | # Read image from video/cam 261 | vfps = int(round(currVideo.video.get(cv2.CAP_PROP_FPS))) 262 | for i in range(0, int(round(vfps / min_FPS))): 263 | ret, current_img = currVideo.video.read() 264 | if not ret: 265 | vid_finished[index] = True 266 | break 267 | if vid_finished[index]: 268 | stream_end_frame = np.zeros((int(currVideo.height), int(currVideo.width), 1), 269 | dtype='uint8') 270 | cv2.putText(stream_end_frame, "Input file {} has ended".format 271 | (name_of_videos[index][1].split('/')[-1]) , 272 | (10, int(currVideo.height/2)), 273 | cv2.FONT_HERSHEY_COMPLEX, 1, (255, 255, 255), 2) 274 | cv2.imshow(currVideo.name, stream_end_frame) 275 | continue 276 | # Transform image to person detection model input 277 | rsImg = cv2.resize(current_img, (model_width, model_height)) 278 | rsImg = rsImg.transpose((2, 0, 1)) 279 | rsImg = rsImg.reshape((batch_size, channels, model_height, model_width)) 280 | 281 | infer_start_time = datetime.datetime.now() 282 | # Infer current image 283 | if is_async_mode: 284 | infer_network.exec_net(nextReq, rsImg) 285 | else: 286 | infer_network.exec_net(currReq, rsImg) 287 | prevVideo = currVideo 288 | previous_img = current_img 289 | # Wait for previous request to end 290 | if infer_network.wait(currReq) == 0: 291 | infer_end_time = (datetime.datetime.now() - infer_start_time) * 1000 292 | in_frame_workers = [] 293 | people = 0 294 | violations = 0 295 | hard_hat_detection =False 296 | vest_detection = False 297 | result = infer_network.get_output(currReq) 298 | # Filter output 299 | for obj in result[0][0]: 300 | if obj[2] > conf_inferConfidenceThreshold: 301 | xmin = int(obj[3] * prevVideo.width) 302 | ymin = int(obj[4] * prevVideo.height) 303 | xmax = int(obj[5] * prevVideo.width) 304 | ymax = int(obj[6] * 
prevVideo.height) 305 | xmin = int(xmin - padding) if (xmin - padding) > 0 else 0 306 | ymin = int(ymin - padding) if (ymin - padding) > 0 else 0 307 | xmax = int(xmax + padding) if (xmax + padding) < prevVideo.width else prevVideo.width 308 | ymax = int(ymax + padding) if (ymax + padding) < prevVideo.height else prevVideo.height 309 | cv2.rectangle(previous_img, (xmin, ymin), (xmax, ymax), (0, 255, 0), 2) 310 | people += 1 311 | in_frame_workers.append((xmin, ymin, xmax, ymax)) 312 | new_frame = previous_img[ymin:ymax, xmin:xmax] 313 | if use_safety_model: 314 | # Transform image to safety model input 315 | in_frame_sm = cv2.resize(new_frame, (model_width_sm, model_height_sm)) 316 | in_frame_sm = in_frame_sm.transpose((2, 0, 1)) 317 | in_frame_sm = in_frame_sm.reshape((batch_size_sm, channels_sm, model_height_sm, model_width_sm)) 318 | 319 | infer_start_time_sm = datetime.datetime.now() 320 | if is_async_mode: 321 | infer_network_safety.exec_net(nextReq_s, in_frame_sm) 322 | else: 323 | infer_network_safety.exec_net(currReq_s, in_frame_sm) 324 | # Wait for the result 325 | infer_network_safety.wait(currReq_s) 326 | infer_end_time_sm = (datetime.datetime.now() - infer_start_time_sm) * 1000 327 | 328 | result_sm = infer_network_safety.get_output(currReq_s) 329 | # Filter output 330 | hard_hat_detection = False 331 | vest_detection = False 332 | detection_list = [] 333 | for obj_sm in result_sm[0][0]: 334 | 335 | if (obj_sm[2] > 0.4): 336 | # Detect safety vest 337 | if (int(obj_sm[1])) == 2: 338 | xmin_sm = int(obj_sm[3] * (xmax-xmin)) 339 | ymin_sm = int(obj_sm[4] * (ymax-ymin)) 340 | xmax_sm = int(obj_sm[5] * (xmax-xmin)) 341 | ymax_sm = int(obj_sm[6] * (ymax-ymin)) 342 | if vest_detection == False: 343 | detection_list.append([xmin_sm+xmin, ymin_sm+ymin, xmax_sm+xmin, ymax_sm+ymin]) 344 | vest_detection = True 345 | 346 | # Detect hard-hat 347 | if int(obj_sm[1]) == 4: 348 | xmin_sm_v = int(obj_sm[3] * (xmax-xmin)) 349 | ymin_sm_v = int(obj_sm[4] * (ymax-ymin)) 350 | xmax_sm_v = int(obj_sm[5] * (xmax-xmin)) 351 | ymax_sm_v = int(obj_sm[6] * (ymax-ymin)) 352 | if hard_hat_detection == False: 353 | detection_list.append([xmin_sm_v+xmin, ymin_sm_v+ymin, xmax_sm_v+xmin, ymax_sm_v+ymin]) 354 | hard_hat_detection = True 355 | 356 | if hard_hat_detection is False or vest_detection is False: 357 | violations += 1 358 | for _rect in detection_list: 359 | cv2.rectangle(current_img, (_rect[0] , _rect[1]), (_rect[2] , _rect[3]), (0, 255, 0), 2) 360 | if is_async_mode: 361 | currReq_s, nextReq_s = nextReq_s, currReq_s 362 | 363 | # Use OpenCV if worker-safety-model is not provided 364 | else : 365 | violations = detect_workers(in_frame_workers, previous_img) 366 | 367 | # Check if detected violations equals previous frames 368 | if violations == prevVideo.currentViolationCount: 369 | prevVideo.currentViolationCountConfidence += 1 370 | 371 | # If frame threshold is reached, change validated count 372 | if prevVideo.currentViolationCountConfidence == conf_inFrameViolationsThreshold: 373 | 374 | # If another violation occurred, save image 375 | if prevVideo.currentViolationCount > prevVideo.prevViolationCount: 376 | prevVideo.totalViolations += (prevVideo.currentViolationCount - prevVideo.prevViolationCount) 377 | prevVideo.prevViolationCount = prevVideo.currentViolationCount 378 | else: 379 | prevVideo.currentViolationCountConfidence = 0 380 | prevVideo.currentViolationCount = violations 381 | 382 | # Check if detected people count equals previous frames 383 | if people == 
prevVideo.currentPeopleCount: 384 | prevVideo.currentPeopleCountConfidence += 1 385 | # If frame threshold is reached, change validated count 386 | if prevVideo.currentPeopleCountConfidence == conf_inFrameViolationsThreshold: 387 | prevVideo.currentTotalPeopleCount += ( 388 | prevVideo.currentPeopleCount - prevVideo.prevPeopleCount) 389 | if prevVideo.currentTotalPeopleCount > prevVideo.prevPeopleCount: 390 | prevVideo.totalPeopleCount += prevVideo.currentTotalPeopleCount - prevVideo.prevPeopleCount 391 | prevVideo.prevPeopleCount = prevVideo.currentPeopleCount 392 | else: 393 | prevVideo.currentPeopleCountConfidence = 0 394 | prevVideo.currentPeopleCount = people 395 | 396 | frame_end_time = datetime.datetime.now() 397 | cv2.putText(previous_img, 'Total people count: ' + str( 398 | prevVideo.totalPeopleCount), (10, prevVideo.height - 10), 399 | cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2) 400 | cv2.putText(previous_img, 'Current people count: ' + str( 401 | prevVideo.currentTotalPeopleCount), 402 | (10, prevVideo.height - 40), 403 | cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2) 404 | cv2.putText(previous_img, 'Total violation count: ' + str( 405 | prevVideo.totalViolations), (10, prevVideo.height - 70), 406 | cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2) 407 | cv2.putText(previous_img, 'FPS: %0.2fs' % (1 / ( 408 | frame_end_time - prevVideo.frame_start_time).total_seconds()), 409 | (10, prevVideo.height - 100), 410 | cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2) 411 | cv2.putText(previous_img, 'Inference time: N\A for async mode' if is_async_mode else 'Inference time: {}ms'.format((infer_end_time).total_seconds()), 412 | (10, prevVideo.height - 130), 413 | cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2) 414 | cv2.imshow(prevVideo.name, previous_img) 415 | prevVideo.frame_start_time = datetime.datetime.now() 416 | # Swap 417 | if is_async_mode: 418 | currReq, nextReq = nextReq, currReq 419 | previous_img = current_img 420 | prevVideo = currVideo 421 | # Exit if ESC key is pressed 422 | if cv2.waitKey(1) == 27: 423 | print("Attempting to stop input files") 424 | infer_network.clean() 425 | infer_network_safety.clean() 426 | cv2.destroyAllWindows() 427 | return 428 | 429 | if False not in vid_finished: 430 | infer_network.clean() 431 | infer_network_safety.clean() 432 | cv2.destroyAllWindows() 433 | break 434 | 435 | 436 | 437 | if __name__ == '__main__': 438 | main() 439 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | BSD 3-Clause License 2 | 3 | Copyright (c) 2021, Intel Corporation 4 | All rights reserved. 5 | 6 | Redistribution and use in source and binary forms, with or without 7 | modification, are permitted provided that the following conditions are met: 8 | 9 | 1. Redistributions of source code must retain the above copyright notice, this 10 | list of conditions and the following disclaimer. 11 | 12 | 2. Redistributions in binary form must reproduce the above copyright notice, 13 | this list of conditions and the following disclaimer in the documentation 14 | and/or other materials provided with the distribution. 15 | 16 | 3. Neither the name of the copyright holder nor the names of its 17 | contributors may be used to endorse or promote products derived from 18 | this software without specific prior written permission. 
19 | 20 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" 21 | AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE 22 | IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 23 | DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE 24 | FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 25 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 26 | SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 27 | CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 28 | OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 29 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 30 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Safety Gear Detector 2 | 3 | | Details | | 4 | |-----------------------|---------------| 5 | | Target OS: | Ubuntu\* 18.04 LTS | 6 | | Programming Language: | Python\* 3.5| 7 | | Time to Complete: | 30-40min | 8 | 9 | ![safety-gear-detector](docs/images/safetygear.png) 10 | 11 | 12 | ## What It Does 13 | This reference implementation is capable of detecting people passing in front of a camera and detecting if the people are wearing safety-jackets and hard-hats. The application counts the number of people who are violating the safety gear standards and the total number of people detected. 14 | 15 | ## Requirements 16 | 17 | ### Hardware 18 | 19 | - 6th to 8th Generation Intel® Core™ processors with Iris® Pro graphics or Intel® HD Graphics 20 | 21 | ### Software 22 | 23 | - [Ubuntu\* 18.04 LTS](http://releases.ubuntu.com/18.04/)
24 | **Note**: We recommend using a 4.14+ Linux* kernel with this software. Run the following command to determine the kernel version: 25 | 26 | ``` 27 | uname -a 28 | ``` 29 | 30 | - OpenCL™ Runtime Package 31 | 32 | - Intel® Distribution of OpenVINO™ toolkit 2020 R3 Release 33 | 34 | ## How It Works 35 | The application uses the Inference Engine included in the Intel® Distribution of OpenVINO™ toolkit. 36 | 37 | First, a trained neural network detects people in the frame and draws a green bounding box around each of them. For each person detected, the application determines whether they are wearing a safety-jacket and hard-hat. If they are not, an alert is registered with the system. 38 | 39 | ![Architectural diagram](docs/images/archdia.png) 40 | 41 | ## Setup 42 | 43 | ### Install Intel® Distribution of OpenVINO™ toolkit 44 | Refer to [Install the Intel® Distribution of OpenVINO™ toolkit for Linux*](https://software.intel.com/en-us/articles/OpenVINO-Install-Linux) to install and set up the toolkit. 45 | 46 | Install the OpenCL™ Runtime Package to run inference on the GPU. It is not mandatory for CPU inference. 47 | 48 | ### Other dependencies 49 | #### FFmpeg* 50 | FFmpeg is a free and open-source project capable of recording, converting and streaming digital audio and video in various formats. It can be used for most common multimedia tasks, such as audio compression, audio/video format conversion and extracting images from a video. 51 | 52 | ## Setup 53 | ### Get the code 54 | Clone the reference implementation: 55 | ``` 56 | sudo apt-get update && sudo apt-get install git 57 | git clone https://github.com/intel-iot-devkit/safety-gear-detector-python.git 58 | ``` 59 | 60 | ### Install OpenVINO 61 | 62 | Refer to [Install Intel® Distribution of OpenVINO™ toolkit for Linux*](https://software.intel.com/en-us/articles/OpenVINO-Install-Linux) to learn how to install and configure the toolkit. 63 | 64 | Install the OpenCL™ Runtime Package to run inference on the GPU, as shown in the instructions below. It is not mandatory for CPU inference. 65 | 66 | 67 | ## Which model to use 68 | 69 | This application uses the [person-detection-retail-0013](https://docs.openvinotoolkit.org/2020.3/_models_intel_person_detection_retail_0013_description_person_detection_retail_0013.html) Intel® model, which can be downloaded using the **model downloader**. The **model downloader** downloads the __.xml__ and __.bin__ files that will be used by the application. 70 | 71 | The application also uses the **worker_safety_mobilenet** model, whose Caffe* model files are provided in the `resources/worker-safety-mobilenet` directory. These need to be passed through the model optimizer to generate the IR (the .xml and .bin files) that will be used by the application. 72 | 73 | To download the models and install the dependencies of the application, run the below command in the cloned `safety-gear-detector-python` directory: 74 | ``` 75 | ./setup.sh 76 | ``` 77 | 78 | ### The Config File 79 | 80 | The _resources/config.json_ file contains the path of each video that the application uses as input. 81 | 82 | For example: 83 | ``` 84 | { 85 | "inputs": [ 86 | { 87 | "video":"path_to_video/video1.mp4" 88 | } 89 | ] 90 | } 91 | ``` 92 | 93 | Here, `path_to_video/video1.mp4` is the path to an input video file. 94 | 95 | ### Which Input Video to use 96 | 97 | The application works with any input video. Sample videos are provided [here](https://github.com/intel-iot-devkit/sample-videos/). 
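Each entry under `inputs` in _resources/config.json_ is opened as a separate OpenCV capture and shown in its own window, and a purely numeric `"video"` value is treated as a camera index rather than a file path (this mirrors the `Video` class in `application/safety_gear_detector.py`). The snippet below is a minimal, illustrative sketch of that input-selection logic only, not the application's actual entry point:

```
import json
import os
import cv2

CONFIG_FILE = "../resources/config.json"

# Read the list of inputs from the config file
with open(CONFIG_FILE) as f:
    config = json.load(f)

captures = []
for idx, item in enumerate(config["inputs"]):
    source = item["video"]
    if source.isnumeric():
        # A numeric string such as "0" is treated as a camera index (/dev/video0)
        cap = cv2.VideoCapture(int(source))
    elif os.path.exists(source):
        # Otherwise the value is treated as the path to a video file
        cap = cv2.VideoCapture(source)
    else:
        raise FileNotFoundError("Input not found: {}".format(source))
    if not cap.isOpened():
        raise RuntimeError("Could not open input: {}".format(source))
    captures.append((idx, cap))
```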
98 | 99 | For first use, we recommend the *Safety_Full_Hat_and_Vest.mp4* video provided in the `resources/` directory. 100 | 101 | For example: 102 | ``` 103 | { 104 | "inputs": [ 105 | { 106 | "video":"../resources/Safety_Full_Hat_and_Vest.mp4" 107 | }, 108 | { 109 | "video":"../resources/Safety_Full_Hat_and_Vest.mp4" 110 | } 111 | ] 112 | } 113 | ``` 114 | To use any other video, provide its path in the config.json file. 115 | 116 | ### Using the Camera Stream instead of video 117 | 118 | Replace the video path in the config.json file with the camera ID, where the ID is taken from the video device (the number **X** in /dev/video**X**). 119 | 120 | On Ubuntu, to list all available video devices use the following command: 121 | 122 | ``` 123 | ls /dev/video* 124 | ``` 125 | 126 | For example, if the output of the above command is __/dev/video0__, then config.json would be: 127 | 128 | ``` 129 | { 130 | "inputs": [ 131 | { 132 | "video":"0" 133 | } 134 | ] 135 | } 136 | ``` 137 | 138 | ### Setup the Environment 139 | 140 | Configure the environment to use the Intel® Distribution of OpenVINO™ toolkit by exporting environment variables: 141 | 142 | ``` 143 | source /opt/intel/openvino/bin/setupvars.sh 144 | ``` 145 | 146 | __Note__: This command needs to be executed only once in the terminal where the application will be executed. If the terminal is closed, the command needs to be executed again. 147 | 148 | ## Run the Application 149 | 150 | Change the current directory to the application code inside the cloned repository: 151 | ``` 152 | cd safety-gear-detector-python/application 153 | ``` 154 | 155 | To see a list of the various options: 156 | ``` 157 | ./safety_gear_detector.py -h 158 | ``` 159 | Specify the target device to run on with the `-d` command-line argument. If no target device is specified, the application runs on the CPU by default. 160 | To run with multiple devices use _-d MULTI:device1,device2_. For example: _-d MULTI:CPU,GPU,MYRIAD_ 161 | 162 | ### Run on the CPU 163 | 164 | To run the application using the **worker_safety_mobilenet** model, use the `-sm` flag followed by the path to the worker_safety_mobilenet.xml file, as follows: 165 | ``` 166 | ./safety_gear_detector.py -d CPU -m /opt/intel/openvino/deployment_tools/open_model_zoo/tools/downloader/intel/person-detection-retail-0013/FP32/person-detection-retail-0013.xml -sm ../resources/worker-safety-mobilenet/FP32/worker_safety_mobilenet.xml 167 | ``` 168 | If the worker_safety_mobilenet model is not provided as a command-line argument, the application falls back to OpenCV color thresholding to detect the safety jacket and hard-hat. To run the application without the worker_safety_mobilenet model: 169 | ``` 170 | ./safety_gear_detector.py -d CPU -m /opt/intel/openvino/deployment_tools/open_model_zoo/tools/downloader/intel/person-detection-retail-0013/FP32/person-detection-retail-0013.xml 171 | ``` 172 | **Note:** By default, the application runs in async mode. To run the application in sync mode, use ```-f sync``` as a command-line argument. 173 | 174 | ### Run on the Integrated GPU 175 | * To run on the integrated Intel GPU with floating point precision 32 (FP32), use the `-d GPU` command-line argument: 176 | 177 | **FP32:** FP32 is a single-precision floating-point format that uses 32 bits to represent a number: 1 sign bit, 8 exponent bits and 23 fraction (mantissa) bits. 
For more information, [click here](https://en.wikipedia.org/wiki/Single-precision_floating-point_format) 178 | 179 | ``` 180 | ./safety_gear_detector.py -d GPU -m /opt/intel/openvino/deployment_tools/open_model_zoo/tools/downloader/intel/person-detection-retail-0013/FP32/person-detection-retail-0013.xml -sm ../resources/worker-safety-mobilenet/FP32/worker_safety_mobilenet.xml 181 | ``` 182 | * To run on the integrated Intel® GPU with floating point precision 16 (FP16): 183 | 184 | **FP16:** FP16 is a half-precision floating-point format that uses 16 bits to represent a number: 1 sign bit, 5 exponent bits and 10 fraction (mantissa) bits. For more information, [click here](https://en.wikipedia.org/wiki/Half-precision_floating-point_format) 185 | 186 | ``` 187 | ./safety_gear_detector.py -d GPU -m /opt/intel/openvino/deployment_tools/open_model_zoo/tools/downloader/intel/person-detection-retail-0013/FP16/person-detection-retail-0013.xml -sm ../resources/worker-safety-mobilenet/FP16/worker_safety_mobilenet.xml 188 | ``` 189 | ### Run on the Intel® Neural Compute Stick 190 | To run on the Intel® Neural Compute Stick, use the `-d MYRIAD` command-line argument: 191 | ``` 192 | ./safety_gear_detector.py -d MYRIAD -m /opt/intel/openvino/deployment_tools/open_model_zoo/tools/downloader/intel/person-detection-retail-0013/FP16/person-detection-retail-0013.xml -sm ../resources/worker-safety-mobilenet/FP16/worker_safety_mobilenet.xml 193 | ``` 194 | 195 | ### Run on the Intel® Movidius™ VPU 196 | To run on the Intel® Movidius™ VPU, use the `-d HDDL` command-line argument: 197 | ``` 198 | ./safety_gear_detector.py -m /opt/intel/openvino/deployment_tools/open_model_zoo/tools/downloader/intel/person-detection-retail-0013/FP16/person-detection-retail-0013.xml -sm ../resources/worker-safety-mobilenet/FP16/worker_safety_mobilenet.xml -d HDDL 199 | ``` 200 | **Note:** The Intel® Movidius™ VPU can only run FP16 models. The model that is passed to the application through the `-m` command-line argument must be of data type FP16. 201 | 202 | 240 | -------------------------------------------------------------------------------- /application/inference.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | """ 3 | Copyright (c) 2018 Intel Corporation. 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining 6 | a copy of this software and associated documentation files (the 7 | "Software"), to deal in the Software without restriction, including 8 | without limitation the rights to use, copy, modify, merge, publish, 9 | distribute, sublicense, and/or sell copies of the Software, and to 10 | permit persons to whom the Software is furnished to do so, subject to 11 | the following conditions: 12 | 13 | The above copyright notice and this permission notice shall be 14 | included in all copies or substantial portions of the Software. 15 | 16 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, 17 | EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF 18 | MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND 19 | NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE 20 | LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION 21 | OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION 22 | WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
23 | """ 24 | 25 | import os 26 | import sys 27 | import logging as log 28 | from openvino.inference_engine import IENetwork, IECore 29 | 30 | 31 | class Network: 32 | """ 33 | Load and configure inference plugins for the specified target devices 34 | and performs synchronous and asynchronous modes for the specified infer requests. 35 | """ 36 | 37 | def __init__(self): 38 | self.net = None 39 | self.plugin = None 40 | self.input_blob = None 41 | self.out_blob = None 42 | self.net_plugin = None 43 | self.infer_request_handle = None 44 | 45 | def load_model(self, model, device, input_size, output_size, num_requests, cpu_extension=None, plugin=None): 46 | """ 47 | Loads a network and an image to the Inference Engine plugin. 48 | :param model: .xml file of pre trained model 49 | :param cpu_extension: extension for the CPU device 50 | :param device: Target device 51 | :param input_size: Number of input layers 52 | :param output_size: Number of output layers 53 | :param num_requests: Index of Infer request value. Limited to device capabilities. 54 | :param plugin: Plugin for specified device 55 | :return: Shape of input layer 56 | """ 57 | 58 | model_xml = model 59 | model_bin = os.path.splitext(model_xml)[0] + ".bin" 60 | # Plugin initialization for specified device 61 | # and load extensions library if specified 62 | if not plugin: 63 | log.info("Initializing plugin for {} device...".format(device)) 64 | self.plugin = IECore() 65 | else: 66 | self.plugin = plugin 67 | 68 | if cpu_extension and 'CPU' in device: 69 | self.plugin.add_extension(cpu_extension, "CPU") 70 | 71 | # Read IR 72 | log.info("Reading IR...") 73 | self.net = self.plugin.read_network(model=model_xml, weights=model_bin) #IENetwork(model=model_xml, weights=model_bin) 74 | log.info("Loading IR to the plugin...") 75 | 76 | if "CPU" in device: 77 | supported_layers = self.plugin.query_network(self.net, "CPU") 78 | not_supported_layers = \ 79 | [l for l in self.net.layers.keys() if l not in supported_layers] 80 | if len(not_supported_layers) != 0: 81 | log.error("Following layers are not supported by " 82 | "the plugin for specified device {}:\n {}". 83 | format(device, 84 | ', '.join(not_supported_layers))) 85 | # log.error("Please try to specify cpu extensions library path" 86 | # " in command line parameters using -l " 87 | # "or --cpu_extension command line argument") 88 | sys.exit(1) 89 | 90 | if num_requests == 0: 91 | # Loads network read from IR to the plugin 92 | self.net_plugin = self.plugin.load_network(network=self.net, device_name=device) 93 | else: 94 | self.net_plugin = self.plugin.load_network(network=self.net, num_requests=num_requests, device_name=device) 95 | # log.error("num_requests != 0") 96 | 97 | self.input_blob = next(iter(self.net.inputs)) 98 | self.out_blob = next(iter(self.net.outputs)) 99 | assert len(self.net.inputs.keys()) == input_size, \ 100 | "Supports only {} input topologies".format(len(self.net.inputs)) 101 | assert len(self.net.outputs) == output_size, \ 102 | "Supports only {} output topologies".format(len(self.net.outputs)) 103 | 104 | return self.plugin, self.get_input_shape() 105 | 106 | def get_input_shape(self): 107 | """ 108 | Gives the shape of the input layer of the network. 109 | :return: None 110 | """ 111 | return self.net.inputs[self.input_blob].shape 112 | 113 | def performance_counter(self, request_id): 114 | """ 115 | Queries performance measures per layer to get feedback of what is the 116 | most time consuming layer. 117 | :param request_id: Index of Infer request value. 
Limited to device capabilities 118 | :return: Performance of the layer 119 | """ 120 | perf_count = self.net_plugin.requests[request_id].get_perf_counts() 121 | return perf_count 122 | 123 | def exec_net(self, request_id, frame): 124 | """ 125 | Starts asynchronous inference for specified request. 126 | :param request_id: Index of Infer request value. Limited to device capabilities. 127 | :param frame: Input image 128 | :return: Instance of Executable Network class 129 | """ 130 | self.infer_request_handle = self.net_plugin.start_async( 131 | request_id=request_id, inputs={self.input_blob: frame}) 132 | return self.net_plugin 133 | 134 | def wait(self, request_id): 135 | """ 136 | Waits for the result to become available. 137 | :param request_id: Index of Infer request value. Limited to device capabilities. 138 | :return: Timeout value 139 | """ 140 | wait_process = self.net_plugin.requests[request_id].wait(-1) 141 | return wait_process 142 | 143 | def get_output(self, request_id, output=None): 144 | """ 145 | Gives a list of results for the output layer of the network. 146 | :param request_id: Index of Infer request value. Limited to device capabilities. 147 | :param output: Name of the output layer 148 | :return: Results for the specified request 149 | """ 150 | if output: 151 | res = self.infer_request_handle.outputs[output] 152 | else: 153 | res = self.net_plugin.requests[request_id].outputs[self.out_blob] 154 | return res 155 | 156 | def clean(self): 157 | """ 158 | Deletes all the instances 159 | :return: None 160 | """ 161 | del self.net_plugin 162 | del self.plugin 163 | del self.net 164 | -------------------------------------------------------------------------------- /application/safety_gear_detector.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | """ 3 | Copyright (c) 2018 Intel Corporation. 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining 6 | a copy of this software and associated documentation files (the 7 | "Software"), to deal in the Software without restriction, including 8 | without limitation the rights to use, copy, modify, merge, publish, 9 | distribute, sublicense, and/or sell copies of the Software, and to 10 | permit persons to whom the Software is furnished to do so, subject to 11 | the following conditions: 12 | 13 | The above copyright notice and this permission notice shall be 14 | included in all copies or substantial portions of the Software. 15 | 16 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, 17 | EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF 18 | MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND 19 | NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE 20 | LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION 21 | OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION 22 | WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
23 | """ 24 | 25 | from __future__ import print_function 26 | import sys 27 | import os 28 | import cv2 29 | import numpy as np 30 | from argparse import ArgumentParser 31 | import datetime 32 | import json 33 | from inference import Network 34 | 35 | # Global vars 36 | cpu_extension = '' 37 | conf_modelLayers = '' 38 | conf_modelWeights = '' 39 | conf_safety_modelLayers = '' 40 | conf_safety_modelWeights = '' 41 | targetDevice = "CPU" 42 | conf_batchSize = 1 43 | conf_modelPersonLabel = 1 44 | conf_inferConfidenceThreshold = 0.7 45 | conf_inFrameViolationsThreshold = 19 46 | conf_inFramePeopleThreshold = 5 47 | use_safety_model = False 48 | padding = 30 49 | viol_wk = 0 50 | acceptedDevices = ['CPU', 'GPU', 'MYRIAD', 'HETERO:FPGA,CPU', 'HDDL'] 51 | videos = [] 52 | name_of_videos = [] 53 | CONFIG_FILE = '../resources/config.json' 54 | is_async_mode = True 55 | 56 | 57 | class Video: 58 | def __init__(self, idx, path): 59 | if path.isnumeric(): 60 | self.video = cv2.VideoCapture(int(path)) 61 | self.name = "Cam " + str(idx) 62 | else: 63 | if os.path.exists(path): 64 | self.video = cv2.VideoCapture(path) 65 | self.name = "Video " + str(idx) 66 | else: 67 | print("Either wrong input path or empty line is found. Please check the conf.json file") 68 | exit(21) 69 | if not self.video.isOpened(): 70 | print("Couldn't open video: " + path) 71 | sys.exit(20) 72 | self.height = int(self.video.get(cv2.CAP_PROP_FRAME_HEIGHT)) 73 | self.width = int(self.video.get(cv2.CAP_PROP_FRAME_WIDTH)) 74 | 75 | self.currentViolationCount = 0 76 | self.currentViolationCountConfidence = 0 77 | self.prevViolationCount = 0 78 | self.totalViolations = 0 79 | self.totalPeopleCount = 0 80 | self.currentPeopleCount = 0 81 | self.currentPeopleCountConfidence = 0 82 | self.prevPeopleCount = 0 83 | self.currentTotalPeopleCount = 0 84 | 85 | cv2.namedWindow(self.name, cv2.WINDOW_NORMAL) 86 | self.frame_start_time = datetime.datetime.now() 87 | 88 | 89 | def get_args(): 90 | """ 91 | Parses the argument. 92 | :return: None 93 | """ 94 | global is_async_mode 95 | parser = ArgumentParser() 96 | parser.add_argument("-d", "--device", 97 | help="Specify the target device to infer on; CPU, GPU," 98 | "FPGA, MYRIAD or HDDL is acceptable. Application will" 99 | "look for a suitable plugin for device specified" 100 | " (CPU by default)", 101 | type=str, required=False) 102 | parser.add_argument("-m", "--model", 103 | help="Path to an .xml file with a trained model's" 104 | " weights.", 105 | required=True, type=str) 106 | parser.add_argument("-sm", "--safety_model", 107 | help="Path to an .xml file with a trained model's" 108 | " weights.", 109 | required=False, type=str, default=None) 110 | parser.add_argument("-e", "--cpu_extension", 111 | help="MKLDNN (CPU)-targeted custom layers. 
Absolute " 112 | "path to a shared library with the kernels impl", 113 | type=str, default=None) 114 | parser.add_argument("-f", "--flag", help="sync or async", default="async", type=str) 115 | 116 | args = parser.parse_args() 117 | 118 | global conf_modelLayers, conf_modelWeights, conf_safety_modelLayers, conf_safety_modelWeights, \ 119 | targetDevice, cpu_extension, videos, use_safety_model 120 | if args.model: 121 | conf_modelLayers = args.model 122 | conf_modelWeights = os.path.splitext(conf_modelLayers)[0] + ".bin" 123 | if args.safety_model: 124 | conf_safety_modelLayers = args.safety_model 125 | conf_safety_modelWeights = os.path.splitext(conf_safety_modelLayers)[0] + ".bin" 126 | use_safety_model = True 127 | if args.device: 128 | targetDevice = args.device 129 | if "MULTI:" not in targetDevice: 130 | if targetDevice not in acceptedDevices: 131 | print("Selected device, %s not supported." % (targetDevice)) 132 | sys.exit(12) 133 | if args.cpu_extension: 134 | cpu_extension = args.cpu_extension 135 | if args.flag == "async": 136 | is_async_mode = True 137 | print('Application running in Async mode') 138 | else: 139 | is_async_mode = False 140 | print('Application running in Sync mode') 141 | assert os.path.isfile(CONFIG_FILE), "{} file doesn't exist".format(CONFIG_FILE) 142 | config = json.loads(open(CONFIG_FILE).read()) 143 | for idx, item in enumerate(config['inputs']): 144 | vid = Video(idx, item['video']) 145 | name_of_videos.append([idx, item['video']]) 146 | videos.append([idx, vid]) 147 | 148 | 149 | def detect_safety_hat(img): 150 | """ 151 | Detection of the hat of the person. 152 | :param img: Current frame 153 | :return: Boolean value of the detected hat 154 | """ 155 | lowH = 15 156 | lowS = 65 157 | lowV = 75 158 | 159 | highH = 30 160 | highS = 255 161 | highV = 255 162 | 163 | crop = 0 164 | height = 15 165 | perc = 8 166 | 167 | hsv = np.zeros(1) 168 | 169 | try: 170 | hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV) 171 | except cv2.error as e: 172 | print("%d %d %d" % (img.shape)) 173 | print("%d %d %d" % (img.shape)) 174 | print(e) 175 | 176 | threshold_img = cv2.inRange(hsv, (lowH, lowS, lowV), (highH, highS, highV)) 177 | 178 | x = 0 179 | y = int(threshold_img.shape[0] * crop / 100) 180 | w = int(threshold_img.shape[1]) 181 | h = int(threshold_img.shape[0] * height / 100) 182 | img_cropped = threshold_img[y: y + h, x: x + w] 183 | 184 | if cv2.countNonZero(threshold_img) < img_cropped.size * perc / 100: 185 | return False 186 | return True 187 | 188 | 189 | def detect_safety_jacket(img): 190 | """ 191 | Detection of the safety jacket of the person. 192 | :param img: Current frame 193 | :return: Boolean value of the detected jacket 194 | """ 195 | lowH = 0 196 | lowS = 150 197 | lowV = 42 198 | 199 | highH = 11 200 | highS = 255 201 | highV = 255 202 | 203 | crop = 15 204 | height = 40 205 | perc = 23 206 | 207 | hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV) 208 | 209 | threshold_img = cv2.inRange(hsv, (lowH, lowS, lowV), (highH, highS, highV)) 210 | 211 | x = 0 212 | y = int(threshold_img.shape[0] * crop / 100) 213 | w = int(threshold_img.shape[1]) 214 | h = int(threshold_img.shape[0] * height / 100) 215 | img_cropped = threshold_img[y: y + h, x: x + w] 216 | 217 | if cv2.countNonZero(threshold_img) < img_cropped.size * perc / 100: 218 | return False 219 | return True 220 | 221 | 222 | def detect_workers(workers, frame): 223 | """ 224 | Detection of the person with the safety guards. 
225 | :param workers: Total number of the person in the current frame 226 | :param frame: Current frame 227 | :return: Total violation count of the person 228 | """ 229 | violations = 0 230 | global viol_wk 231 | for worker in workers: 232 | xmin, ymin, xmax, ymax = worker 233 | crop = frame[ymin:ymax, xmin:xmax] 234 | if 0 not in crop.shape: 235 | if detect_safety_hat(crop): 236 | if detect_safety_jacket(crop): 237 | cv2.rectangle(frame, (xmin, ymin), (xmax, ymax), 238 | (0, 255, 0), 2) 239 | else: 240 | cv2.rectangle(frame, (xmin, ymin), (xmax, ymax), 241 | (0, 0, 255), 2) 242 | violations += 1 243 | viol_wk += 1 244 | 245 | else: 246 | cv2.rectangle(frame, (xmin, ymin), (xmax, ymax), (0, 0, 255), 2) 247 | violations += 1 248 | viol_wk += 1 249 | return violations 250 | 251 | 252 | def main(): 253 | """ 254 | Load the network and parse the output. 255 | :return: None 256 | """ 257 | get_args() 258 | global is_async_mode 259 | nextReq = 1 260 | currReq = 0 261 | nextReq_s = 1 262 | currReq_s = 0 263 | prevVideo = None 264 | vid_finished = [False] * len(videos) 265 | min_FPS = min([videos[i][1].video.get(cv2.CAP_PROP_FPS) for i in range(len(videos))]) 266 | 267 | # Initialise the class 268 | infer_network = Network() 269 | infer_network_safety = Network() 270 | # Load the network to IE plugin to get shape of input layer 271 | plugin, (batch_size, channels, model_height, model_width) = \ 272 | infer_network.load_model(conf_modelLayers, targetDevice, 1, 1, 2, cpu_extension) 273 | if use_safety_model: 274 | batch_size_sm, channels_sm, model_height_sm, model_width_sm = \ 275 | infer_network_safety.load_model(conf_safety_modelLayers, targetDevice, 1, 1, 2, cpu_extension, plugin)[1] 276 | 277 | while True: 278 | for index, currVideo in videos: 279 | # Read image from video/cam 280 | vfps = int(round(currVideo.video.get(cv2.CAP_PROP_FPS))) 281 | for i in range(0, int(round(vfps / min_FPS))): 282 | ret, current_img = currVideo.video.read() 283 | if not ret: 284 | vid_finished[index] = True 285 | break 286 | if vid_finished[index]: 287 | stream_end_frame = np.zeros((int(currVideo.height), int(currVideo.width), 1), 288 | dtype='uint8') 289 | cv2.putText(stream_end_frame, "Input file {} has ended".format 290 | (name_of_videos[index][1].split('/')[-1]) , 291 | (10, int(currVideo.height/2)), 292 | cv2.FONT_HERSHEY_COMPLEX, 1, (255, 255, 255), 2) 293 | cv2.imshow(currVideo.name, stream_end_frame) 294 | continue 295 | # Transform image to person detection model input 296 | rsImg = cv2.resize(current_img, (model_width, model_height)) 297 | rsImg = rsImg.transpose((2, 0, 1)) 298 | rsImg = rsImg.reshape((batch_size, channels, model_height, model_width)) 299 | 300 | infer_start_time = datetime.datetime.now() 301 | # Infer current image 302 | if is_async_mode: 303 | infer_network.exec_net(nextReq, rsImg) 304 | else: 305 | infer_network.exec_net(currReq, rsImg) 306 | prevVideo = currVideo 307 | previous_img = current_img 308 | 309 | # Wait for previous request to end 310 | if infer_network.wait(currReq) == 0: 311 | infer_end_time = (datetime.datetime.now() - infer_start_time) * 1000 312 | 313 | in_frame_workers = [] 314 | 315 | people = 0 316 | violations = 0 317 | hard_hat_detection =False 318 | vest_detection = False 319 | result = infer_network.get_output(currReq) 320 | # Filter output 321 | for obj in result[0][0]: 322 | if obj[2] > conf_inferConfidenceThreshold: 323 | xmin = int(obj[3] * prevVideo.width) 324 | ymin = int(obj[4] * prevVideo.height) 325 | xmax = int(obj[5] * prevVideo.width) 326 | ymax = 
int(obj[6] * prevVideo.height) 327 | xmin = int(xmin - padding) if (xmin - padding) > 0 else 0 328 | ymin = int(ymin - padding) if (ymin - padding) > 0 else 0 329 | xmax = int(xmax + padding) if (xmax + padding) < prevVideo.width else prevVideo.width 330 | ymax = int(ymax + padding) if (ymax + padding) < prevVideo.height else prevVideo.height 331 | cv2.rectangle(previous_img, (xmin, ymin), (xmax, ymax), (0, 255, 0), 2) 332 | people += 1 333 | in_frame_workers.append((xmin, ymin, xmax, ymax)) 334 | new_frame = previous_img[ymin:ymax, xmin:xmax] 335 | if use_safety_model: 336 | 337 | # Transform image to safety model input 338 | in_frame_sm = cv2.resize(new_frame, (model_width_sm, model_height_sm)) 339 | in_frame_sm = in_frame_sm.transpose((2, 0, 1)) 340 | in_frame_sm = in_frame_sm.reshape((batch_size_sm, channels_sm, model_height_sm, model_width_sm)) 341 | 342 | infer_start_time_sm = datetime.datetime.now() 343 | if is_async_mode: 344 | infer_network_safety.exec_net(nextReq_s, in_frame_sm) 345 | else: 346 | infer_network_safety.exec_net(currReq_s, in_frame_sm) 347 | # Wait for the result 348 | infer_network_safety.wait(currReq_s) 349 | infer_end_time_sm = (datetime.datetime.now() - infer_start_time_sm) * 1000 350 | 351 | result_sm = infer_network_safety.get_output(currReq_s) 352 | # Filter output 353 | hard_hat_detection = False 354 | vest_detection = False 355 | detection_list = [] 356 | for obj_sm in result_sm[0][0]: 357 | 358 | if (obj_sm[2] > 0.4): 359 | # Detect safety vest 360 | if (int(obj_sm[1])) == 2: 361 | xmin_sm = int(obj_sm[3] * (xmax-xmin)) 362 | ymin_sm = int(obj_sm[4] * (ymax-ymin)) 363 | xmax_sm = int(obj_sm[5] * (xmax-xmin)) 364 | ymax_sm = int(obj_sm[6] * (ymax-ymin)) 365 | if vest_detection == False: 366 | detection_list.append([xmin_sm+xmin, ymin_sm+ymin, xmax_sm+xmin, ymax_sm+ymin]) 367 | vest_detection = True 368 | 369 | # Detect hard-hat 370 | if int(obj_sm[1]) == 4: 371 | xmin_sm_v = int(obj_sm[3] * (xmax-xmin)) 372 | ymin_sm_v = int(obj_sm[4] * (ymax-ymin)) 373 | xmax_sm_v = int(obj_sm[5] * (xmax-xmin)) 374 | ymax_sm_v = int(obj_sm[6] * (ymax-ymin)) 375 | if hard_hat_detection == False: 376 | detection_list.append([xmin_sm_v+xmin, ymin_sm_v+ymin, xmax_sm_v+xmin, ymax_sm_v+ymin]) 377 | hard_hat_detection = True 378 | 379 | if hard_hat_detection is False or vest_detection is False: 380 | violations += 1 381 | for _rect in detection_list: 382 | cv2.rectangle(current_img, (_rect[0] , _rect[1]), (_rect[2] , _rect[3]), (0, 255, 0), 2) 383 | if is_async_mode: 384 | currReq_s, nextReq_s = nextReq_s, currReq_s 385 | 386 | # Use OpenCV if worker-safety-model is not provided 387 | else : 388 | violations = detect_workers(in_frame_workers, previous_img) 389 | 390 | # Check if detected violations equals previous frames 391 | if violations == prevVideo.currentViolationCount: 392 | prevVideo.currentViolationCountConfidence += 1 393 | 394 | # If frame threshold is reached, change validated count 395 | if prevVideo.currentViolationCountConfidence == conf_inFrameViolationsThreshold: 396 | 397 | # If another violation occurred, save image 398 | if prevVideo.currentViolationCount > prevVideo.prevViolationCount: 399 | prevVideo.totalViolations += (prevVideo.currentViolationCount - prevVideo.prevViolationCount) 400 | prevVideo.prevViolationCount = prevVideo.currentViolationCount 401 | else: 402 | prevVideo.currentViolationCountConfidence = 0 403 | prevVideo.currentViolationCount = violations 404 | 405 | # Check if detected people count equals previous frames 406 | if people == 
prevVideo.currentPeopleCount: 407 | prevVideo.currentPeopleCountConfidence += 1 408 | 409 | # If frame threshold is reached, change validated count 410 | if prevVideo.currentPeopleCountConfidence == conf_inFrameViolationsThreshold: 411 | prevVideo.currentTotalPeopleCount += ( 412 | prevVideo.currentPeopleCount - prevVideo.prevPeopleCount) 413 | if prevVideo.currentTotalPeopleCount > prevVideo.prevPeopleCount: 414 | prevVideo.totalPeopleCount += prevVideo.currentTotalPeopleCount - prevVideo.prevPeopleCount 415 | prevVideo.prevPeopleCount = prevVideo.currentPeopleCount 416 | else: 417 | prevVideo.currentPeopleCountConfidence = 0 418 | prevVideo.currentPeopleCount = people 419 | 420 | 421 | 422 | frame_end_time = datetime.datetime.now() 423 | cv2.putText(previous_img, 'Total people count: ' + str( 424 | prevVideo.totalPeopleCount), (10, prevVideo.height - 10), 425 | cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2) 426 | cv2.putText(previous_img, 'Current people count: ' + str( 427 | prevVideo.currentTotalPeopleCount), 428 | (10, prevVideo.height - 40), 429 | cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2) 430 | cv2.putText(previous_img, 'Total violation count: ' + str( 431 | prevVideo.totalViolations), (10, prevVideo.height - 70), 432 | cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2) 433 | cv2.putText(previous_img, 'FPS: %0.2fs' % (1 / ( 434 | frame_end_time - prevVideo.frame_start_time).total_seconds()), 435 | (10, prevVideo.height - 100), 436 | cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2) 437 | cv2.putText(previous_img, "Inference time: N\A for async mode" if is_async_mode else\ 438 | "Inference time: {:.3f} ms".format((infer_end_time).total_seconds()), 439 | (10, prevVideo.height - 130), 440 | cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2) 441 | 442 | cv2.imshow(prevVideo.name, previous_img) 443 | prevVideo.frame_start_time = datetime.datetime.now() 444 | # Swap 445 | if is_async_mode: 446 | currReq, nextReq = nextReq, currReq 447 | previous_img = current_img 448 | prevVideo = currVideo 449 | if cv2.waitKey(1) == 27: 450 | print("Attempting to stop input files") 451 | infer_network.clean() 452 | infer_network_safety.clean() 453 | cv2.destroyAllWindows() 454 | return 455 | 456 | if False not in vid_finished: 457 | infer_network.clean() 458 | infer_network_safety.clean() 459 | cv2.destroyAllWindows() 460 | break 461 | 462 | 463 | if __name__ == '__main__': 464 | main() 465 | -------------------------------------------------------------------------------- /docs/images/archdia.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/intel-iot-devkit/safety-gear-detector-python/f631969dc9fd916c365ab05fdce321d81eca26d8/docs/images/archdia.png -------------------------------------------------------------------------------- /docs/images/jupy1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/intel-iot-devkit/safety-gear-detector-python/f631969dc9fd916c365ab05fdce321d81eca26d8/docs/images/jupy1.png -------------------------------------------------------------------------------- /docs/images/jupy2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/intel-iot-devkit/safety-gear-detector-python/f631969dc9fd916c365ab05fdce321d81eca26d8/docs/images/jupy2.png -------------------------------------------------------------------------------- /docs/images/safetygear.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/intel-iot-devkit/safety-gear-detector-python/f631969dc9fd916c365ab05fdce321d81eca26d8/docs/images/safetygear.png -------------------------------------------------------------------------------- /resources/Safety_Full_Hat_and_Vest.mp4: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/intel-iot-devkit/safety-gear-detector-python/f631969dc9fd916c365ab05fdce321d81eca26d8/resources/Safety_Full_Hat_and_Vest.mp4 -------------------------------------------------------------------------------- /resources/config.json: -------------------------------------------------------------------------------- 1 | { 2 | "inputs":[ 3 | { 4 | "video":"../resources/Safety_Full_Hat_and_Vest.mp4" 5 | }, 6 | { 7 | "video":"../resources/Safety_Full_Hat_and_Vest.mp4" 8 | } 9 | ] 10 | } 11 | 12 | -------------------------------------------------------------------------------- /resources/worker-safety-mobilenet/worker_safety_mobilenet.caffemodel: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/intel-iot-devkit/safety-gear-detector-python/f631969dc9fd916c365ab05fdce321d81eca26d8/resources/worker-safety-mobilenet/worker_safety_mobilenet.caffemodel -------------------------------------------------------------------------------- /resources/worker-safety-mobilenet/worker_safety_mobilenet.prototxt: -------------------------------------------------------------------------------- 1 | name: "MobileNet-SSD" 2 | input: "data" 3 | input_shape { 4 | dim: 1 5 | dim: 3 6 | dim: 224 7 | dim: 224 8 | } 9 | layer { 10 | name: "conv0" 11 | type: "Convolution" 12 | bottom: "data" 13 | top: "conv0" 14 | param { 15 | lr_mult: 0.1 16 | decay_mult: 0.1 17 | } 18 | convolution_param { 19 | num_output: 32 20 | bias_term: false 21 | pad: 1 22 | kernel_size: 3 23 | stride: 2 24 | weight_filler { 25 | type: "msra" 26 | } 27 | } 28 | } 29 | layer { 30 | name: "conv0/bn" 31 | type: "BatchNorm" 32 | bottom: "conv0" 33 | top: "conv0" 34 | param { 35 | lr_mult: 0 36 | decay_mult: 0 37 | } 38 | param { 39 | lr_mult: 0 40 | decay_mult: 0 41 | } 42 | param { 43 | lr_mult: 0 44 | decay_mult: 0 45 | } 46 | } 47 | layer { 48 | name: "conv0/scale" 49 | type: "Scale" 50 | bottom: "conv0" 51 | top: "conv0" 52 | param { 53 | lr_mult: 0.1 54 | decay_mult: 0.0 55 | } 56 | param { 57 | lr_mult: 0.2 58 | decay_mult: 0.0 59 | } 60 | scale_param { 61 | filler { 62 | value: 1 63 | } 64 | bias_term: true 65 | bias_filler { 66 | value: 0 67 | } 68 | } 69 | } 70 | layer { 71 | name: "conv0/relu" 72 | type: "ReLU" 73 | bottom: "conv0" 74 | top: "conv0" 75 | } 76 | layer { 77 | name: "conv1/dw" 78 | type: "Convolution" 79 | bottom: "conv0" 80 | top: "conv1/dw" 81 | param { 82 | lr_mult: 0.1 83 | decay_mult: 0.1 84 | } 85 | convolution_param { 86 | num_output: 32 87 | bias_term: false 88 | pad: 1 89 | kernel_size: 3 90 | group: 32 91 | engine: CAFFE 92 | weight_filler { 93 | type: "msra" 94 | } 95 | } 96 | } 97 | layer { 98 | name: "conv1/dw/bn" 99 | type: "BatchNorm" 100 | bottom: "conv1/dw" 101 | top: "conv1/dw" 102 | param { 103 | lr_mult: 0 104 | decay_mult: 0 105 | } 106 | param { 107 | lr_mult: 0 108 | decay_mult: 0 109 | } 110 | param { 111 | lr_mult: 0 112 | decay_mult: 0 113 | } 114 | } 115 | layer { 116 | name: "conv1/dw/scale" 117 | type: "Scale" 118 | bottom: "conv1/dw" 119 | top: "conv1/dw" 120 | param { 121 | lr_mult: 
0.1 122 | decay_mult: 0.0 123 | } 124 | param { 125 | lr_mult: 0.2 126 | decay_mult: 0.0 127 | } 128 | scale_param { 129 | filler { 130 | value: 1 131 | } 132 | bias_term: true 133 | bias_filler { 134 | value: 0 135 | } 136 | } 137 | } 138 | layer { 139 | name: "conv1/dw/relu" 140 | type: "ReLU" 141 | bottom: "conv1/dw" 142 | top: "conv1/dw" 143 | } 144 | layer { 145 | name: "conv1" 146 | type: "Convolution" 147 | bottom: "conv1/dw" 148 | top: "conv1" 149 | param { 150 | lr_mult: 0.1 151 | decay_mult: 0.1 152 | } 153 | convolution_param { 154 | num_output: 64 155 | bias_term: false 156 | kernel_size: 1 157 | weight_filler { 158 | type: "msra" 159 | } 160 | } 161 | } 162 | layer { 163 | name: "conv1/bn" 164 | type: "BatchNorm" 165 | bottom: "conv1" 166 | top: "conv1" 167 | param { 168 | lr_mult: 0 169 | decay_mult: 0 170 | } 171 | param { 172 | lr_mult: 0 173 | decay_mult: 0 174 | } 175 | param { 176 | lr_mult: 0 177 | decay_mult: 0 178 | } 179 | } 180 | layer { 181 | name: "conv1/scale" 182 | type: "Scale" 183 | bottom: "conv1" 184 | top: "conv1" 185 | param { 186 | lr_mult: 0.1 187 | decay_mult: 0.0 188 | } 189 | param { 190 | lr_mult: 0.2 191 | decay_mult: 0.0 192 | } 193 | scale_param { 194 | filler { 195 | value: 1 196 | } 197 | bias_term: true 198 | bias_filler { 199 | value: 0 200 | } 201 | } 202 | } 203 | layer { 204 | name: "conv1/relu" 205 | type: "ReLU" 206 | bottom: "conv1" 207 | top: "conv1" 208 | } 209 | layer { 210 | name: "conv2/dw" 211 | type: "Convolution" 212 | bottom: "conv1" 213 | top: "conv2/dw" 214 | param { 215 | lr_mult: 0.1 216 | decay_mult: 0.1 217 | } 218 | convolution_param { 219 | num_output: 64 220 | bias_term: false 221 | pad: 1 222 | kernel_size: 3 223 | stride: 2 224 | group: 64 225 | engine: CAFFE 226 | weight_filler { 227 | type: "msra" 228 | } 229 | } 230 | } 231 | layer { 232 | name: "conv2/dw/bn" 233 | type: "BatchNorm" 234 | bottom: "conv2/dw" 235 | top: "conv2/dw" 236 | param { 237 | lr_mult: 0 238 | decay_mult: 0 239 | } 240 | param { 241 | lr_mult: 0 242 | decay_mult: 0 243 | } 244 | param { 245 | lr_mult: 0 246 | decay_mult: 0 247 | } 248 | } 249 | layer { 250 | name: "conv2/dw/scale" 251 | type: "Scale" 252 | bottom: "conv2/dw" 253 | top: "conv2/dw" 254 | param { 255 | lr_mult: 0.1 256 | decay_mult: 0.0 257 | } 258 | param { 259 | lr_mult: 0.2 260 | decay_mult: 0.0 261 | } 262 | scale_param { 263 | filler { 264 | value: 1 265 | } 266 | bias_term: true 267 | bias_filler { 268 | value: 0 269 | } 270 | } 271 | } 272 | layer { 273 | name: "conv2/dw/relu" 274 | type: "ReLU" 275 | bottom: "conv2/dw" 276 | top: "conv2/dw" 277 | } 278 | layer { 279 | name: "conv2" 280 | type: "Convolution" 281 | bottom: "conv2/dw" 282 | top: "conv2" 283 | param { 284 | lr_mult: 0.1 285 | decay_mult: 0.1 286 | } 287 | convolution_param { 288 | num_output: 128 289 | bias_term: false 290 | kernel_size: 1 291 | weight_filler { 292 | type: "msra" 293 | } 294 | } 295 | } 296 | layer { 297 | name: "conv2/bn" 298 | type: "BatchNorm" 299 | bottom: "conv2" 300 | top: "conv2" 301 | param { 302 | lr_mult: 0 303 | decay_mult: 0 304 | } 305 | param { 306 | lr_mult: 0 307 | decay_mult: 0 308 | } 309 | param { 310 | lr_mult: 0 311 | decay_mult: 0 312 | } 313 | } 314 | layer { 315 | name: "conv2/scale" 316 | type: "Scale" 317 | bottom: "conv2" 318 | top: "conv2" 319 | param { 320 | lr_mult: 0.1 321 | decay_mult: 0.0 322 | } 323 | param { 324 | lr_mult: 0.2 325 | decay_mult: 0.0 326 | } 327 | scale_param { 328 | filler { 329 | value: 1 330 | } 331 | bias_term: true 332 | bias_filler { 333 
| value: 0 334 | } 335 | } 336 | } 337 | layer { 338 | name: "conv2/relu" 339 | type: "ReLU" 340 | bottom: "conv2" 341 | top: "conv2" 342 | } 343 | layer { 344 | name: "conv3/dw" 345 | type: "Convolution" 346 | bottom: "conv2" 347 | top: "conv3/dw" 348 | param { 349 | lr_mult: 0.1 350 | decay_mult: 0.1 351 | } 352 | convolution_param { 353 | num_output: 128 354 | bias_term: false 355 | pad: 1 356 | kernel_size: 3 357 | group: 128 358 | engine: CAFFE 359 | weight_filler { 360 | type: "msra" 361 | } 362 | } 363 | } 364 | layer { 365 | name: "conv3/dw/bn" 366 | type: "BatchNorm" 367 | bottom: "conv3/dw" 368 | top: "conv3/dw" 369 | param { 370 | lr_mult: 0 371 | decay_mult: 0 372 | } 373 | param { 374 | lr_mult: 0 375 | decay_mult: 0 376 | } 377 | param { 378 | lr_mult: 0 379 | decay_mult: 0 380 | } 381 | } 382 | layer { 383 | name: "conv3/dw/scale" 384 | type: "Scale" 385 | bottom: "conv3/dw" 386 | top: "conv3/dw" 387 | param { 388 | lr_mult: 0.1 389 | decay_mult: 0.0 390 | } 391 | param { 392 | lr_mult: 0.2 393 | decay_mult: 0.0 394 | } 395 | scale_param { 396 | filler { 397 | value: 1 398 | } 399 | bias_term: true 400 | bias_filler { 401 | value: 0 402 | } 403 | } 404 | } 405 | layer { 406 | name: "conv3/dw/relu" 407 | type: "ReLU" 408 | bottom: "conv3/dw" 409 | top: "conv3/dw" 410 | } 411 | layer { 412 | name: "conv3" 413 | type: "Convolution" 414 | bottom: "conv3/dw" 415 | top: "conv3" 416 | param { 417 | lr_mult: 0.1 418 | decay_mult: 0.1 419 | } 420 | convolution_param { 421 | num_output: 128 422 | bias_term: false 423 | kernel_size: 1 424 | weight_filler { 425 | type: "msra" 426 | } 427 | } 428 | } 429 | layer { 430 | name: "conv3/bn" 431 | type: "BatchNorm" 432 | bottom: "conv3" 433 | top: "conv3" 434 | param { 435 | lr_mult: 0 436 | decay_mult: 0 437 | } 438 | param { 439 | lr_mult: 0 440 | decay_mult: 0 441 | } 442 | param { 443 | lr_mult: 0 444 | decay_mult: 0 445 | } 446 | } 447 | layer { 448 | name: "conv3/scale" 449 | type: "Scale" 450 | bottom: "conv3" 451 | top: "conv3" 452 | param { 453 | lr_mult: 0.1 454 | decay_mult: 0.0 455 | } 456 | param { 457 | lr_mult: 0.2 458 | decay_mult: 0.0 459 | } 460 | scale_param { 461 | filler { 462 | value: 1 463 | } 464 | bias_term: true 465 | bias_filler { 466 | value: 0 467 | } 468 | } 469 | } 470 | layer { 471 | name: "conv3/relu" 472 | type: "ReLU" 473 | bottom: "conv3" 474 | top: "conv3" 475 | } 476 | layer { 477 | name: "conv4/dw" 478 | type: "Convolution" 479 | bottom: "conv3" 480 | top: "conv4/dw" 481 | param { 482 | lr_mult: 0.1 483 | decay_mult: 0.1 484 | } 485 | convolution_param { 486 | num_output: 128 487 | bias_term: false 488 | pad: 1 489 | kernel_size: 3 490 | stride: 2 491 | group: 128 492 | engine: CAFFE 493 | weight_filler { 494 | type: "msra" 495 | } 496 | } 497 | } 498 | layer { 499 | name: "conv4/dw/bn" 500 | type: "BatchNorm" 501 | bottom: "conv4/dw" 502 | top: "conv4/dw" 503 | param { 504 | lr_mult: 0 505 | decay_mult: 0 506 | } 507 | param { 508 | lr_mult: 0 509 | decay_mult: 0 510 | } 511 | param { 512 | lr_mult: 0 513 | decay_mult: 0 514 | } 515 | } 516 | layer { 517 | name: "conv4/dw/scale" 518 | type: "Scale" 519 | bottom: "conv4/dw" 520 | top: "conv4/dw" 521 | param { 522 | lr_mult: 0.1 523 | decay_mult: 0.0 524 | } 525 | param { 526 | lr_mult: 0.2 527 | decay_mult: 0.0 528 | } 529 | scale_param { 530 | filler { 531 | value: 1 532 | } 533 | bias_term: true 534 | bias_filler { 535 | value: 0 536 | } 537 | } 538 | } 539 | layer { 540 | name: "conv4/dw/relu" 541 | type: "ReLU" 542 | bottom: "conv4/dw" 543 | top: 
"conv4/dw" 544 | } 545 | layer { 546 | name: "conv4" 547 | type: "Convolution" 548 | bottom: "conv4/dw" 549 | top: "conv4" 550 | param { 551 | lr_mult: 0.1 552 | decay_mult: 0.1 553 | } 554 | convolution_param { 555 | num_output: 256 556 | bias_term: false 557 | kernel_size: 1 558 | weight_filler { 559 | type: "msra" 560 | } 561 | } 562 | } 563 | layer { 564 | name: "conv4/bn" 565 | type: "BatchNorm" 566 | bottom: "conv4" 567 | top: "conv4" 568 | param { 569 | lr_mult: 0 570 | decay_mult: 0 571 | } 572 | param { 573 | lr_mult: 0 574 | decay_mult: 0 575 | } 576 | param { 577 | lr_mult: 0 578 | decay_mult: 0 579 | } 580 | } 581 | layer { 582 | name: "conv4/scale" 583 | type: "Scale" 584 | bottom: "conv4" 585 | top: "conv4" 586 | param { 587 | lr_mult: 0.1 588 | decay_mult: 0.0 589 | } 590 | param { 591 | lr_mult: 0.2 592 | decay_mult: 0.0 593 | } 594 | scale_param { 595 | filler { 596 | value: 1 597 | } 598 | bias_term: true 599 | bias_filler { 600 | value: 0 601 | } 602 | } 603 | } 604 | layer { 605 | name: "conv4/relu" 606 | type: "ReLU" 607 | bottom: "conv4" 608 | top: "conv4" 609 | } 610 | layer { 611 | name: "conv5/dw" 612 | type: "Convolution" 613 | bottom: "conv4" 614 | top: "conv5/dw" 615 | param { 616 | lr_mult: 0.1 617 | decay_mult: 0.1 618 | } 619 | convolution_param { 620 | num_output: 256 621 | bias_term: false 622 | pad: 1 623 | kernel_size: 3 624 | group: 256 625 | engine: CAFFE 626 | weight_filler { 627 | type: "msra" 628 | } 629 | } 630 | } 631 | layer { 632 | name: "conv5/dw/bn" 633 | type: "BatchNorm" 634 | bottom: "conv5/dw" 635 | top: "conv5/dw" 636 | param { 637 | lr_mult: 0 638 | decay_mult: 0 639 | } 640 | param { 641 | lr_mult: 0 642 | decay_mult: 0 643 | } 644 | param { 645 | lr_mult: 0 646 | decay_mult: 0 647 | } 648 | } 649 | layer { 650 | name: "conv5/dw/scale" 651 | type: "Scale" 652 | bottom: "conv5/dw" 653 | top: "conv5/dw" 654 | param { 655 | lr_mult: 0.1 656 | decay_mult: 0.0 657 | } 658 | param { 659 | lr_mult: 0.2 660 | decay_mult: 0.0 661 | } 662 | scale_param { 663 | filler { 664 | value: 1 665 | } 666 | bias_term: true 667 | bias_filler { 668 | value: 0 669 | } 670 | } 671 | } 672 | layer { 673 | name: "conv5/dw/relu" 674 | type: "ReLU" 675 | bottom: "conv5/dw" 676 | top: "conv5/dw" 677 | } 678 | layer { 679 | name: "conv5" 680 | type: "Convolution" 681 | bottom: "conv5/dw" 682 | top: "conv5" 683 | param { 684 | lr_mult: 0.1 685 | decay_mult: 0.1 686 | } 687 | convolution_param { 688 | num_output: 256 689 | bias_term: false 690 | kernel_size: 1 691 | weight_filler { 692 | type: "msra" 693 | } 694 | } 695 | } 696 | layer { 697 | name: "conv5/bn" 698 | type: "BatchNorm" 699 | bottom: "conv5" 700 | top: "conv5" 701 | param { 702 | lr_mult: 0 703 | decay_mult: 0 704 | } 705 | param { 706 | lr_mult: 0 707 | decay_mult: 0 708 | } 709 | param { 710 | lr_mult: 0 711 | decay_mult: 0 712 | } 713 | } 714 | layer { 715 | name: "conv5/scale" 716 | type: "Scale" 717 | bottom: "conv5" 718 | top: "conv5" 719 | param { 720 | lr_mult: 0.1 721 | decay_mult: 0.0 722 | } 723 | param { 724 | lr_mult: 0.2 725 | decay_mult: 0.0 726 | } 727 | scale_param { 728 | filler { 729 | value: 1 730 | } 731 | bias_term: true 732 | bias_filler { 733 | value: 0 734 | } 735 | } 736 | } 737 | layer { 738 | name: "conv5/relu" 739 | type: "ReLU" 740 | bottom: "conv5" 741 | top: "conv5" 742 | } 743 | layer { 744 | name: "conv6/dw" 745 | type: "Convolution" 746 | bottom: "conv5" 747 | top: "conv6/dw" 748 | param { 749 | lr_mult: 0.1 750 | decay_mult: 0.1 751 | } 752 | convolution_param { 753 | 
num_output: 256 754 | bias_term: false 755 | pad: 1 756 | kernel_size: 3 757 | stride: 2 758 | group: 256 759 | engine: CAFFE 760 | weight_filler { 761 | type: "msra" 762 | } 763 | } 764 | } 765 | layer { 766 | name: "conv6/dw/bn" 767 | type: "BatchNorm" 768 | bottom: "conv6/dw" 769 | top: "conv6/dw" 770 | param { 771 | lr_mult: 0 772 | decay_mult: 0 773 | } 774 | param { 775 | lr_mult: 0 776 | decay_mult: 0 777 | } 778 | param { 779 | lr_mult: 0 780 | decay_mult: 0 781 | } 782 | } 783 | layer { 784 | name: "conv6/dw/scale" 785 | type: "Scale" 786 | bottom: "conv6/dw" 787 | top: "conv6/dw" 788 | param { 789 | lr_mult: 0.1 790 | decay_mult: 0.0 791 | } 792 | param { 793 | lr_mult: 0.2 794 | decay_mult: 0.0 795 | } 796 | scale_param { 797 | filler { 798 | value: 1 799 | } 800 | bias_term: true 801 | bias_filler { 802 | value: 0 803 | } 804 | } 805 | } 806 | layer { 807 | name: "conv6/dw/relu" 808 | type: "ReLU" 809 | bottom: "conv6/dw" 810 | top: "conv6/dw" 811 | } 812 | layer { 813 | name: "conv6" 814 | type: "Convolution" 815 | bottom: "conv6/dw" 816 | top: "conv6" 817 | param { 818 | lr_mult: 0.1 819 | decay_mult: 0.1 820 | } 821 | convolution_param { 822 | num_output: 512 823 | bias_term: false 824 | kernel_size: 1 825 | weight_filler { 826 | type: "msra" 827 | } 828 | } 829 | } 830 | layer { 831 | name: "conv6/bn" 832 | type: "BatchNorm" 833 | bottom: "conv6" 834 | top: "conv6" 835 | param { 836 | lr_mult: 0 837 | decay_mult: 0 838 | } 839 | param { 840 | lr_mult: 0 841 | decay_mult: 0 842 | } 843 | param { 844 | lr_mult: 0 845 | decay_mult: 0 846 | } 847 | } 848 | layer { 849 | name: "conv6/scale" 850 | type: "Scale" 851 | bottom: "conv6" 852 | top: "conv6" 853 | param { 854 | lr_mult: 0.1 855 | decay_mult: 0.0 856 | } 857 | param { 858 | lr_mult: 0.2 859 | decay_mult: 0.0 860 | } 861 | scale_param { 862 | filler { 863 | value: 1 864 | } 865 | bias_term: true 866 | bias_filler { 867 | value: 0 868 | } 869 | } 870 | } 871 | layer { 872 | name: "conv6/relu" 873 | type: "ReLU" 874 | bottom: "conv6" 875 | top: "conv6" 876 | } 877 | layer { 878 | name: "conv7/dw" 879 | type: "Convolution" 880 | bottom: "conv6" 881 | top: "conv7/dw" 882 | param { 883 | lr_mult: 0.1 884 | decay_mult: 0.1 885 | } 886 | convolution_param { 887 | num_output: 512 888 | bias_term: false 889 | pad: 1 890 | kernel_size: 3 891 | group: 512 892 | engine: CAFFE 893 | weight_filler { 894 | type: "msra" 895 | } 896 | } 897 | } 898 | layer { 899 | name: "conv7/dw/bn" 900 | type: "BatchNorm" 901 | bottom: "conv7/dw" 902 | top: "conv7/dw" 903 | param { 904 | lr_mult: 0 905 | decay_mult: 0 906 | } 907 | param { 908 | lr_mult: 0 909 | decay_mult: 0 910 | } 911 | param { 912 | lr_mult: 0 913 | decay_mult: 0 914 | } 915 | } 916 | layer { 917 | name: "conv7/dw/scale" 918 | type: "Scale" 919 | bottom: "conv7/dw" 920 | top: "conv7/dw" 921 | param { 922 | lr_mult: 0.1 923 | decay_mult: 0.0 924 | } 925 | param { 926 | lr_mult: 0.2 927 | decay_mult: 0.0 928 | } 929 | scale_param { 930 | filler { 931 | value: 1 932 | } 933 | bias_term: true 934 | bias_filler { 935 | value: 0 936 | } 937 | } 938 | } 939 | layer { 940 | name: "conv7/dw/relu" 941 | type: "ReLU" 942 | bottom: "conv7/dw" 943 | top: "conv7/dw" 944 | } 945 | layer { 946 | name: "conv7" 947 | type: "Convolution" 948 | bottom: "conv7/dw" 949 | top: "conv7" 950 | param { 951 | lr_mult: 0.1 952 | decay_mult: 0.1 953 | } 954 | convolution_param { 955 | num_output: 512 956 | bias_term: false 957 | kernel_size: 1 958 | weight_filler { 959 | type: "msra" 960 | } 961 | } 962 | } 963 
| layer { 964 | name: "conv7/bn" 965 | type: "BatchNorm" 966 | bottom: "conv7" 967 | top: "conv7" 968 | param { 969 | lr_mult: 0 970 | decay_mult: 0 971 | } 972 | param { 973 | lr_mult: 0 974 | decay_mult: 0 975 | } 976 | param { 977 | lr_mult: 0 978 | decay_mult: 0 979 | } 980 | } 981 | layer { 982 | name: "conv7/scale" 983 | type: "Scale" 984 | bottom: "conv7" 985 | top: "conv7" 986 | param { 987 | lr_mult: 0.1 988 | decay_mult: 0.0 989 | } 990 | param { 991 | lr_mult: 0.2 992 | decay_mult: 0.0 993 | } 994 | scale_param { 995 | filler { 996 | value: 1 997 | } 998 | bias_term: true 999 | bias_filler { 1000 | value: 0 1001 | } 1002 | } 1003 | } 1004 | layer { 1005 | name: "conv7/relu" 1006 | type: "ReLU" 1007 | bottom: "conv7" 1008 | top: "conv7" 1009 | } 1010 | layer { 1011 | name: "conv8/dw" 1012 | type: "Convolution" 1013 | bottom: "conv7" 1014 | top: "conv8/dw" 1015 | param { 1016 | lr_mult: 0.1 1017 | decay_mult: 0.1 1018 | } 1019 | convolution_param { 1020 | num_output: 512 1021 | bias_term: false 1022 | pad: 1 1023 | kernel_size: 3 1024 | group: 512 1025 | engine: CAFFE 1026 | weight_filler { 1027 | type: "msra" 1028 | } 1029 | } 1030 | } 1031 | layer { 1032 | name: "conv8/dw/bn" 1033 | type: "BatchNorm" 1034 | bottom: "conv8/dw" 1035 | top: "conv8/dw" 1036 | param { 1037 | lr_mult: 0 1038 | decay_mult: 0 1039 | } 1040 | param { 1041 | lr_mult: 0 1042 | decay_mult: 0 1043 | } 1044 | param { 1045 | lr_mult: 0 1046 | decay_mult: 0 1047 | } 1048 | } 1049 | layer { 1050 | name: "conv8/dw/scale" 1051 | type: "Scale" 1052 | bottom: "conv8/dw" 1053 | top: "conv8/dw" 1054 | param { 1055 | lr_mult: 0.1 1056 | decay_mult: 0.0 1057 | } 1058 | param { 1059 | lr_mult: 0.2 1060 | decay_mult: 0.0 1061 | } 1062 | scale_param { 1063 | filler { 1064 | value: 1 1065 | } 1066 | bias_term: true 1067 | bias_filler { 1068 | value: 0 1069 | } 1070 | } 1071 | } 1072 | layer { 1073 | name: "conv8/dw/relu" 1074 | type: "ReLU" 1075 | bottom: "conv8/dw" 1076 | top: "conv8/dw" 1077 | } 1078 | layer { 1079 | name: "conv8" 1080 | type: "Convolution" 1081 | bottom: "conv8/dw" 1082 | top: "conv8" 1083 | param { 1084 | lr_mult: 0.1 1085 | decay_mult: 0.1 1086 | } 1087 | convolution_param { 1088 | num_output: 512 1089 | bias_term: false 1090 | kernel_size: 1 1091 | weight_filler { 1092 | type: "msra" 1093 | } 1094 | } 1095 | } 1096 | layer { 1097 | name: "conv8/bn" 1098 | type: "BatchNorm" 1099 | bottom: "conv8" 1100 | top: "conv8" 1101 | param { 1102 | lr_mult: 0 1103 | decay_mult: 0 1104 | } 1105 | param { 1106 | lr_mult: 0 1107 | decay_mult: 0 1108 | } 1109 | param { 1110 | lr_mult: 0 1111 | decay_mult: 0 1112 | } 1113 | } 1114 | layer { 1115 | name: "conv8/scale" 1116 | type: "Scale" 1117 | bottom: "conv8" 1118 | top: "conv8" 1119 | param { 1120 | lr_mult: 0.1 1121 | decay_mult: 0.0 1122 | } 1123 | param { 1124 | lr_mult: 0.2 1125 | decay_mult: 0.0 1126 | } 1127 | scale_param { 1128 | filler { 1129 | value: 1 1130 | } 1131 | bias_term: true 1132 | bias_filler { 1133 | value: 0 1134 | } 1135 | } 1136 | } 1137 | layer { 1138 | name: "conv8/relu" 1139 | type: "ReLU" 1140 | bottom: "conv8" 1141 | top: "conv8" 1142 | } 1143 | layer { 1144 | name: "conv9/dw" 1145 | type: "Convolution" 1146 | bottom: "conv8" 1147 | top: "conv9/dw" 1148 | param { 1149 | lr_mult: 0.1 1150 | decay_mult: 0.1 1151 | } 1152 | convolution_param { 1153 | num_output: 512 1154 | bias_term: false 1155 | pad: 1 1156 | kernel_size: 3 1157 | group: 512 1158 | engine: CAFFE 1159 | weight_filler { 1160 | type: "msra" 1161 | } 1162 | } 1163 | } 1164 | 
layer { 1165 | name: "conv9/dw/bn" 1166 | type: "BatchNorm" 1167 | bottom: "conv9/dw" 1168 | top: "conv9/dw" 1169 | param { 1170 | lr_mult: 0 1171 | decay_mult: 0 1172 | } 1173 | param { 1174 | lr_mult: 0 1175 | decay_mult: 0 1176 | } 1177 | param { 1178 | lr_mult: 0 1179 | decay_mult: 0 1180 | } 1181 | } 1182 | layer { 1183 | name: "conv9/dw/scale" 1184 | type: "Scale" 1185 | bottom: "conv9/dw" 1186 | top: "conv9/dw" 1187 | param { 1188 | lr_mult: 0.1 1189 | decay_mult: 0.0 1190 | } 1191 | param { 1192 | lr_mult: 0.2 1193 | decay_mult: 0.0 1194 | } 1195 | scale_param { 1196 | filler { 1197 | value: 1 1198 | } 1199 | bias_term: true 1200 | bias_filler { 1201 | value: 0 1202 | } 1203 | } 1204 | } 1205 | layer { 1206 | name: "conv9/dw/relu" 1207 | type: "ReLU" 1208 | bottom: "conv9/dw" 1209 | top: "conv9/dw" 1210 | } 1211 | layer { 1212 | name: "conv9" 1213 | type: "Convolution" 1214 | bottom: "conv9/dw" 1215 | top: "conv9" 1216 | param { 1217 | lr_mult: 0.1 1218 | decay_mult: 0.1 1219 | } 1220 | convolution_param { 1221 | num_output: 512 1222 | bias_term: false 1223 | kernel_size: 1 1224 | weight_filler { 1225 | type: "msra" 1226 | } 1227 | } 1228 | } 1229 | layer { 1230 | name: "conv9/bn" 1231 | type: "BatchNorm" 1232 | bottom: "conv9" 1233 | top: "conv9" 1234 | param { 1235 | lr_mult: 0 1236 | decay_mult: 0 1237 | } 1238 | param { 1239 | lr_mult: 0 1240 | decay_mult: 0 1241 | } 1242 | param { 1243 | lr_mult: 0 1244 | decay_mult: 0 1245 | } 1246 | } 1247 | layer { 1248 | name: "conv9/scale" 1249 | type: "Scale" 1250 | bottom: "conv9" 1251 | top: "conv9" 1252 | param { 1253 | lr_mult: 0.1 1254 | decay_mult: 0.0 1255 | } 1256 | param { 1257 | lr_mult: 0.2 1258 | decay_mult: 0.0 1259 | } 1260 | scale_param { 1261 | filler { 1262 | value: 1 1263 | } 1264 | bias_term: true 1265 | bias_filler { 1266 | value: 0 1267 | } 1268 | } 1269 | } 1270 | layer { 1271 | name: "conv9/relu" 1272 | type: "ReLU" 1273 | bottom: "conv9" 1274 | top: "conv9" 1275 | } 1276 | layer { 1277 | name: "conv10/dw" 1278 | type: "Convolution" 1279 | bottom: "conv9" 1280 | top: "conv10/dw" 1281 | param { 1282 | lr_mult: 0.1 1283 | decay_mult: 0.1 1284 | } 1285 | convolution_param { 1286 | num_output: 512 1287 | bias_term: false 1288 | pad: 1 1289 | kernel_size: 3 1290 | group: 512 1291 | engine: CAFFE 1292 | weight_filler { 1293 | type: "msra" 1294 | } 1295 | } 1296 | } 1297 | layer { 1298 | name: "conv10/dw/bn" 1299 | type: "BatchNorm" 1300 | bottom: "conv10/dw" 1301 | top: "conv10/dw" 1302 | param { 1303 | lr_mult: 0 1304 | decay_mult: 0 1305 | } 1306 | param { 1307 | lr_mult: 0 1308 | decay_mult: 0 1309 | } 1310 | param { 1311 | lr_mult: 0 1312 | decay_mult: 0 1313 | } 1314 | } 1315 | layer { 1316 | name: "conv10/dw/scale" 1317 | type: "Scale" 1318 | bottom: "conv10/dw" 1319 | top: "conv10/dw" 1320 | param { 1321 | lr_mult: 0.1 1322 | decay_mult: 0.0 1323 | } 1324 | param { 1325 | lr_mult: 0.2 1326 | decay_mult: 0.0 1327 | } 1328 | scale_param { 1329 | filler { 1330 | value: 1 1331 | } 1332 | bias_term: true 1333 | bias_filler { 1334 | value: 0 1335 | } 1336 | } 1337 | } 1338 | layer { 1339 | name: "conv10/dw/relu" 1340 | type: "ReLU" 1341 | bottom: "conv10/dw" 1342 | top: "conv10/dw" 1343 | } 1344 | layer { 1345 | name: "conv10" 1346 | type: "Convolution" 1347 | bottom: "conv10/dw" 1348 | top: "conv10" 1349 | param { 1350 | lr_mult: 0.1 1351 | decay_mult: 0.1 1352 | } 1353 | convolution_param { 1354 | num_output: 512 1355 | bias_term: false 1356 | kernel_size: 1 1357 | weight_filler { 1358 | type: "msra" 1359 | } 1360 | } 
1361 | } 1362 | layer { 1363 | name: "conv10/bn" 1364 | type: "BatchNorm" 1365 | bottom: "conv10" 1366 | top: "conv10" 1367 | param { 1368 | lr_mult: 0 1369 | decay_mult: 0 1370 | } 1371 | param { 1372 | lr_mult: 0 1373 | decay_mult: 0 1374 | } 1375 | param { 1376 | lr_mult: 0 1377 | decay_mult: 0 1378 | } 1379 | } 1380 | layer { 1381 | name: "conv10/scale" 1382 | type: "Scale" 1383 | bottom: "conv10" 1384 | top: "conv10" 1385 | param { 1386 | lr_mult: 0.1 1387 | decay_mult: 0.0 1388 | } 1389 | param { 1390 | lr_mult: 0.2 1391 | decay_mult: 0.0 1392 | } 1393 | scale_param { 1394 | filler { 1395 | value: 1 1396 | } 1397 | bias_term: true 1398 | bias_filler { 1399 | value: 0 1400 | } 1401 | } 1402 | } 1403 | layer { 1404 | name: "conv10/relu" 1405 | type: "ReLU" 1406 | bottom: "conv10" 1407 | top: "conv10" 1408 | } 1409 | layer { 1410 | name: "conv11/dw" 1411 | type: "Convolution" 1412 | bottom: "conv10" 1413 | top: "conv11/dw" 1414 | param { 1415 | lr_mult: 0.1 1416 | decay_mult: 0.1 1417 | } 1418 | convolution_param { 1419 | num_output: 512 1420 | bias_term: false 1421 | pad: 1 1422 | kernel_size: 3 1423 | group: 512 1424 | engine: CAFFE 1425 | weight_filler { 1426 | type: "msra" 1427 | } 1428 | } 1429 | } 1430 | layer { 1431 | name: "conv11/dw/bn" 1432 | type: "BatchNorm" 1433 | bottom: "conv11/dw" 1434 | top: "conv11/dw" 1435 | param { 1436 | lr_mult: 0 1437 | decay_mult: 0 1438 | } 1439 | param { 1440 | lr_mult: 0 1441 | decay_mult: 0 1442 | } 1443 | param { 1444 | lr_mult: 0 1445 | decay_mult: 0 1446 | } 1447 | } 1448 | layer { 1449 | name: "conv11/dw/scale" 1450 | type: "Scale" 1451 | bottom: "conv11/dw" 1452 | top: "conv11/dw" 1453 | param { 1454 | lr_mult: 0.1 1455 | decay_mult: 0.0 1456 | } 1457 | param { 1458 | lr_mult: 0.2 1459 | decay_mult: 0.0 1460 | } 1461 | scale_param { 1462 | filler { 1463 | value: 1 1464 | } 1465 | bias_term: true 1466 | bias_filler { 1467 | value: 0 1468 | } 1469 | } 1470 | } 1471 | layer { 1472 | name: "conv11/dw/relu" 1473 | type: "ReLU" 1474 | bottom: "conv11/dw" 1475 | top: "conv11/dw" 1476 | } 1477 | layer { 1478 | name: "conv11" 1479 | type: "Convolution" 1480 | bottom: "conv11/dw" 1481 | top: "conv11" 1482 | param { 1483 | lr_mult: 0.1 1484 | decay_mult: 0.1 1485 | } 1486 | convolution_param { 1487 | num_output: 512 1488 | bias_term: false 1489 | kernel_size: 1 1490 | weight_filler { 1491 | type: "msra" 1492 | } 1493 | } 1494 | } 1495 | layer { 1496 | name: "conv11/bn" 1497 | type: "BatchNorm" 1498 | bottom: "conv11" 1499 | top: "conv11" 1500 | param { 1501 | lr_mult: 0 1502 | decay_mult: 0 1503 | } 1504 | param { 1505 | lr_mult: 0 1506 | decay_mult: 0 1507 | } 1508 | param { 1509 | lr_mult: 0 1510 | decay_mult: 0 1511 | } 1512 | } 1513 | layer { 1514 | name: "conv11/scale" 1515 | type: "Scale" 1516 | bottom: "conv11" 1517 | top: "conv11" 1518 | param { 1519 | lr_mult: 0.1 1520 | decay_mult: 0.0 1521 | } 1522 | param { 1523 | lr_mult: 0.2 1524 | decay_mult: 0.0 1525 | } 1526 | scale_param { 1527 | filler { 1528 | value: 1 1529 | } 1530 | bias_term: true 1531 | bias_filler { 1532 | value: 0 1533 | } 1534 | } 1535 | } 1536 | layer { 1537 | name: "conv11/relu" 1538 | type: "ReLU" 1539 | bottom: "conv11" 1540 | top: "conv11" 1541 | } 1542 | layer { 1543 | name: "conv12/dw" 1544 | type: "Convolution" 1545 | bottom: "conv11" 1546 | top: "conv12/dw" 1547 | param { 1548 | lr_mult: 0.1 1549 | decay_mult: 0.1 1550 | } 1551 | convolution_param { 1552 | num_output: 512 1553 | bias_term: false 1554 | pad: 1 1555 | kernel_size: 3 1556 | stride: 2 1557 | group: 
512 1558 | engine: CAFFE 1559 | weight_filler { 1560 | type: "msra" 1561 | } 1562 | } 1563 | } 1564 | layer { 1565 | name: "conv12/dw/bn" 1566 | type: "BatchNorm" 1567 | bottom: "conv12/dw" 1568 | top: "conv12/dw" 1569 | param { 1570 | lr_mult: 0 1571 | decay_mult: 0 1572 | } 1573 | param { 1574 | lr_mult: 0 1575 | decay_mult: 0 1576 | } 1577 | param { 1578 | lr_mult: 0 1579 | decay_mult: 0 1580 | } 1581 | } 1582 | layer { 1583 | name: "conv12/dw/scale" 1584 | type: "Scale" 1585 | bottom: "conv12/dw" 1586 | top: "conv12/dw" 1587 | param { 1588 | lr_mult: 0.1 1589 | decay_mult: 0.0 1590 | } 1591 | param { 1592 | lr_mult: 0.2 1593 | decay_mult: 0.0 1594 | } 1595 | scale_param { 1596 | filler { 1597 | value: 1 1598 | } 1599 | bias_term: true 1600 | bias_filler { 1601 | value: 0 1602 | } 1603 | } 1604 | } 1605 | layer { 1606 | name: "conv12/dw/relu" 1607 | type: "ReLU" 1608 | bottom: "conv12/dw" 1609 | top: "conv12/dw" 1610 | } 1611 | layer { 1612 | name: "conv12" 1613 | type: "Convolution" 1614 | bottom: "conv12/dw" 1615 | top: "conv12" 1616 | param { 1617 | lr_mult: 0.1 1618 | decay_mult: 0.1 1619 | } 1620 | convolution_param { 1621 | num_output: 1024 1622 | bias_term: false 1623 | kernel_size: 1 1624 | weight_filler { 1625 | type: "msra" 1626 | } 1627 | } 1628 | } 1629 | layer { 1630 | name: "conv12/bn" 1631 | type: "BatchNorm" 1632 | bottom: "conv12" 1633 | top: "conv12" 1634 | param { 1635 | lr_mult: 0 1636 | decay_mult: 0 1637 | } 1638 | param { 1639 | lr_mult: 0 1640 | decay_mult: 0 1641 | } 1642 | param { 1643 | lr_mult: 0 1644 | decay_mult: 0 1645 | } 1646 | } 1647 | layer { 1648 | name: "conv12/scale" 1649 | type: "Scale" 1650 | bottom: "conv12" 1651 | top: "conv12" 1652 | param { 1653 | lr_mult: 0.1 1654 | decay_mult: 0.0 1655 | } 1656 | param { 1657 | lr_mult: 0.2 1658 | decay_mult: 0.0 1659 | } 1660 | scale_param { 1661 | filler { 1662 | value: 1 1663 | } 1664 | bias_term: true 1665 | bias_filler { 1666 | value: 0 1667 | } 1668 | } 1669 | } 1670 | layer { 1671 | name: "conv12/relu" 1672 | type: "ReLU" 1673 | bottom: "conv12" 1674 | top: "conv12" 1675 | } 1676 | layer { 1677 | name: "conv13/dw" 1678 | type: "Convolution" 1679 | bottom: "conv12" 1680 | top: "conv13/dw" 1681 | param { 1682 | lr_mult: 0.1 1683 | decay_mult: 0.1 1684 | } 1685 | convolution_param { 1686 | num_output: 1024 1687 | bias_term: false 1688 | pad: 1 1689 | kernel_size: 3 1690 | group: 1024 1691 | engine: CAFFE 1692 | weight_filler { 1693 | type: "msra" 1694 | } 1695 | } 1696 | } 1697 | layer { 1698 | name: "conv13/dw/bn" 1699 | type: "BatchNorm" 1700 | bottom: "conv13/dw" 1701 | top: "conv13/dw" 1702 | param { 1703 | lr_mult: 0 1704 | decay_mult: 0 1705 | } 1706 | param { 1707 | lr_mult: 0 1708 | decay_mult: 0 1709 | } 1710 | param { 1711 | lr_mult: 0 1712 | decay_mult: 0 1713 | } 1714 | } 1715 | layer { 1716 | name: "conv13/dw/scale" 1717 | type: "Scale" 1718 | bottom: "conv13/dw" 1719 | top: "conv13/dw" 1720 | param { 1721 | lr_mult: 0.1 1722 | decay_mult: 0.0 1723 | } 1724 | param { 1725 | lr_mult: 0.2 1726 | decay_mult: 0.0 1727 | } 1728 | scale_param { 1729 | filler { 1730 | value: 1 1731 | } 1732 | bias_term: true 1733 | bias_filler { 1734 | value: 0 1735 | } 1736 | } 1737 | } 1738 | layer { 1739 | name: "conv13/dw/relu" 1740 | type: "ReLU" 1741 | bottom: "conv13/dw" 1742 | top: "conv13/dw" 1743 | } 1744 | layer { 1745 | name: "conv13" 1746 | type: "Convolution" 1747 | bottom: "conv13/dw" 1748 | top: "conv13" 1749 | param { 1750 | lr_mult: 0.1 1751 | decay_mult: 0.1 1752 | } 1753 | convolution_param { 
1754 | num_output: 1024 1755 | bias_term: false 1756 | kernel_size: 1 1757 | weight_filler { 1758 | type: "msra" 1759 | } 1760 | } 1761 | } 1762 | layer { 1763 | name: "conv13/bn" 1764 | type: "BatchNorm" 1765 | bottom: "conv13" 1766 | top: "conv13" 1767 | param { 1768 | lr_mult: 0 1769 | decay_mult: 0 1770 | } 1771 | param { 1772 | lr_mult: 0 1773 | decay_mult: 0 1774 | } 1775 | param { 1776 | lr_mult: 0 1777 | decay_mult: 0 1778 | } 1779 | } 1780 | layer { 1781 | name: "conv13/scale" 1782 | type: "Scale" 1783 | bottom: "conv13" 1784 | top: "conv13" 1785 | param { 1786 | lr_mult: 0.1 1787 | decay_mult: 0.0 1788 | } 1789 | param { 1790 | lr_mult: 0.2 1791 | decay_mult: 0.0 1792 | } 1793 | scale_param { 1794 | filler { 1795 | value: 1 1796 | } 1797 | bias_term: true 1798 | bias_filler { 1799 | value: 0 1800 | } 1801 | } 1802 | } 1803 | layer { 1804 | name: "conv13/relu" 1805 | type: "ReLU" 1806 | bottom: "conv13" 1807 | top: "conv13" 1808 | } 1809 | layer { 1810 | name: "conv14_1" 1811 | type: "Convolution" 1812 | bottom: "conv13" 1813 | top: "conv14_1" 1814 | param { 1815 | lr_mult: 0.1 1816 | decay_mult: 0.1 1817 | } 1818 | convolution_param { 1819 | num_output: 256 1820 | bias_term: false 1821 | kernel_size: 1 1822 | weight_filler { 1823 | type: "msra" 1824 | } 1825 | } 1826 | } 1827 | layer { 1828 | name: "conv14_1/bn" 1829 | type: "BatchNorm" 1830 | bottom: "conv14_1" 1831 | top: "conv14_1" 1832 | param { 1833 | lr_mult: 0 1834 | decay_mult: 0 1835 | } 1836 | param { 1837 | lr_mult: 0 1838 | decay_mult: 0 1839 | } 1840 | param { 1841 | lr_mult: 0 1842 | decay_mult: 0 1843 | } 1844 | } 1845 | layer { 1846 | name: "conv14_1/scale" 1847 | type: "Scale" 1848 | bottom: "conv14_1" 1849 | top: "conv14_1" 1850 | param { 1851 | lr_mult: 0.1 1852 | decay_mult: 0.0 1853 | } 1854 | param { 1855 | lr_mult: 0.2 1856 | decay_mult: 0.0 1857 | } 1858 | scale_param { 1859 | filler { 1860 | value: 1 1861 | } 1862 | bias_term: true 1863 | bias_filler { 1864 | value: 0 1865 | } 1866 | } 1867 | } 1868 | layer { 1869 | name: "conv14_1/relu" 1870 | type: "ReLU" 1871 | bottom: "conv14_1" 1872 | top: "conv14_1" 1873 | } 1874 | layer { 1875 | name: "conv14_2" 1876 | type: "Convolution" 1877 | bottom: "conv14_1" 1878 | top: "conv14_2" 1879 | param { 1880 | lr_mult: 0.1 1881 | decay_mult: 0.1 1882 | } 1883 | convolution_param { 1884 | num_output: 512 1885 | bias_term: false 1886 | pad: 1 1887 | kernel_size: 3 1888 | stride: 2 1889 | weight_filler { 1890 | type: "msra" 1891 | } 1892 | } 1893 | } 1894 | layer { 1895 | name: "conv14_2/bn" 1896 | type: "BatchNorm" 1897 | bottom: "conv14_2" 1898 | top: "conv14_2" 1899 | param { 1900 | lr_mult: 0 1901 | decay_mult: 0 1902 | } 1903 | param { 1904 | lr_mult: 0 1905 | decay_mult: 0 1906 | } 1907 | param { 1908 | lr_mult: 0 1909 | decay_mult: 0 1910 | } 1911 | } 1912 | layer { 1913 | name: "conv14_2/scale" 1914 | type: "Scale" 1915 | bottom: "conv14_2" 1916 | top: "conv14_2" 1917 | param { 1918 | lr_mult: 0.1 1919 | decay_mult: 0.0 1920 | } 1921 | param { 1922 | lr_mult: 0.2 1923 | decay_mult: 0.0 1924 | } 1925 | scale_param { 1926 | filler { 1927 | value: 1 1928 | } 1929 | bias_term: true 1930 | bias_filler { 1931 | value: 0 1932 | } 1933 | } 1934 | } 1935 | layer { 1936 | name: "conv14_2/relu" 1937 | type: "ReLU" 1938 | bottom: "conv14_2" 1939 | top: "conv14_2" 1940 | } 1941 | layer { 1942 | name: "conv15_1" 1943 | type: "Convolution" 1944 | bottom: "conv14_2" 1945 | top: "conv15_1" 1946 | param { 1947 | lr_mult: 0.1 1948 | decay_mult: 0.1 1949 | } 1950 | 
convolution_param { 1951 | num_output: 128 1952 | bias_term: false 1953 | kernel_size: 1 1954 | weight_filler { 1955 | type: "msra" 1956 | } 1957 | } 1958 | } 1959 | layer { 1960 | name: "conv15_1/bn" 1961 | type: "BatchNorm" 1962 | bottom: "conv15_1" 1963 | top: "conv15_1" 1964 | param { 1965 | lr_mult: 0 1966 | decay_mult: 0 1967 | } 1968 | param { 1969 | lr_mult: 0 1970 | decay_mult: 0 1971 | } 1972 | param { 1973 | lr_mult: 0 1974 | decay_mult: 0 1975 | } 1976 | } 1977 | layer { 1978 | name: "conv15_1/scale" 1979 | type: "Scale" 1980 | bottom: "conv15_1" 1981 | top: "conv15_1" 1982 | param { 1983 | lr_mult: 0.1 1984 | decay_mult: 0.0 1985 | } 1986 | param { 1987 | lr_mult: 0.2 1988 | decay_mult: 0.0 1989 | } 1990 | scale_param { 1991 | filler { 1992 | value: 1 1993 | } 1994 | bias_term: true 1995 | bias_filler { 1996 | value: 0 1997 | } 1998 | } 1999 | } 2000 | layer { 2001 | name: "conv15_1/relu" 2002 | type: "ReLU" 2003 | bottom: "conv15_1" 2004 | top: "conv15_1" 2005 | } 2006 | layer { 2007 | name: "conv15_2" 2008 | type: "Convolution" 2009 | bottom: "conv15_1" 2010 | top: "conv15_2" 2011 | param { 2012 | lr_mult: 0.1 2013 | decay_mult: 0.1 2014 | } 2015 | convolution_param { 2016 | num_output: 256 2017 | bias_term: false 2018 | pad: 1 2019 | kernel_size: 3 2020 | stride: 2 2021 | weight_filler { 2022 | type: "msra" 2023 | } 2024 | } 2025 | } 2026 | layer { 2027 | name: "conv15_2/bn" 2028 | type: "BatchNorm" 2029 | bottom: "conv15_2" 2030 | top: "conv15_2" 2031 | param { 2032 | lr_mult: 0 2033 | decay_mult: 0 2034 | } 2035 | param { 2036 | lr_mult: 0 2037 | decay_mult: 0 2038 | } 2039 | param { 2040 | lr_mult: 0 2041 | decay_mult: 0 2042 | } 2043 | } 2044 | layer { 2045 | name: "conv15_2/scale" 2046 | type: "Scale" 2047 | bottom: "conv15_2" 2048 | top: "conv15_2" 2049 | param { 2050 | lr_mult: 0.1 2051 | decay_mult: 0.0 2052 | } 2053 | param { 2054 | lr_mult: 0.2 2055 | decay_mult: 0.0 2056 | } 2057 | scale_param { 2058 | filler { 2059 | value: 1 2060 | } 2061 | bias_term: true 2062 | bias_filler { 2063 | value: 0 2064 | } 2065 | } 2066 | } 2067 | layer { 2068 | name: "conv15_2/relu" 2069 | type: "ReLU" 2070 | bottom: "conv15_2" 2071 | top: "conv15_2" 2072 | } 2073 | layer { 2074 | name: "conv16_1" 2075 | type: "Convolution" 2076 | bottom: "conv15_2" 2077 | top: "conv16_1" 2078 | param { 2079 | lr_mult: 0.1 2080 | decay_mult: 0.1 2081 | } 2082 | convolution_param { 2083 | num_output: 128 2084 | bias_term: false 2085 | kernel_size: 1 2086 | weight_filler { 2087 | type: "msra" 2088 | } 2089 | } 2090 | } 2091 | layer { 2092 | name: "conv16_1/bn" 2093 | type: "BatchNorm" 2094 | bottom: "conv16_1" 2095 | top: "conv16_1" 2096 | param { 2097 | lr_mult: 0 2098 | decay_mult: 0 2099 | } 2100 | param { 2101 | lr_mult: 0 2102 | decay_mult: 0 2103 | } 2104 | param { 2105 | lr_mult: 0 2106 | decay_mult: 0 2107 | } 2108 | } 2109 | layer { 2110 | name: "conv16_1/scale" 2111 | type: "Scale" 2112 | bottom: "conv16_1" 2113 | top: "conv16_1" 2114 | param { 2115 | lr_mult: 0.1 2116 | decay_mult: 0.0 2117 | } 2118 | param { 2119 | lr_mult: 0.2 2120 | decay_mult: 0.0 2121 | } 2122 | scale_param { 2123 | filler { 2124 | value: 1 2125 | } 2126 | bias_term: true 2127 | bias_filler { 2128 | value: 0 2129 | } 2130 | } 2131 | } 2132 | layer { 2133 | name: "conv16_1/relu" 2134 | type: "ReLU" 2135 | bottom: "conv16_1" 2136 | top: "conv16_1" 2137 | } 2138 | layer { 2139 | name: "conv16_2" 2140 | type: "Convolution" 2141 | bottom: "conv16_1" 2142 | top: "conv16_2" 2143 | param { 2144 | lr_mult: 0.1 2145 | 
decay_mult: 0.1 2146 | } 2147 | convolution_param { 2148 | num_output: 256 2149 | bias_term: false 2150 | pad: 1 2151 | kernel_size: 3 2152 | stride: 2 2153 | weight_filler { 2154 | type: "msra" 2155 | } 2156 | } 2157 | } 2158 | layer { 2159 | name: "conv16_2/bn" 2160 | type: "BatchNorm" 2161 | bottom: "conv16_2" 2162 | top: "conv16_2" 2163 | param { 2164 | lr_mult: 0 2165 | decay_mult: 0 2166 | } 2167 | param { 2168 | lr_mult: 0 2169 | decay_mult: 0 2170 | } 2171 | param { 2172 | lr_mult: 0 2173 | decay_mult: 0 2174 | } 2175 | } 2176 | layer { 2177 | name: "conv16_2/scale" 2178 | type: "Scale" 2179 | bottom: "conv16_2" 2180 | top: "conv16_2" 2181 | param { 2182 | lr_mult: 0.1 2183 | decay_mult: 0.0 2184 | } 2185 | param { 2186 | lr_mult: 0.2 2187 | decay_mult: 0.0 2188 | } 2189 | scale_param { 2190 | filler { 2191 | value: 1 2192 | } 2193 | bias_term: true 2194 | bias_filler { 2195 | value: 0 2196 | } 2197 | } 2198 | } 2199 | layer { 2200 | name: "conv16_2/relu" 2201 | type: "ReLU" 2202 | bottom: "conv16_2" 2203 | top: "conv16_2" 2204 | } 2205 | layer { 2206 | name: "conv17_1" 2207 | type: "Convolution" 2208 | bottom: "conv16_2" 2209 | top: "conv17_1" 2210 | param { 2211 | lr_mult: 0.1 2212 | decay_mult: 0.1 2213 | } 2214 | convolution_param { 2215 | num_output: 64 2216 | bias_term: false 2217 | kernel_size: 1 2218 | weight_filler { 2219 | type: "msra" 2220 | } 2221 | } 2222 | } 2223 | layer { 2224 | name: "conv17_1/bn" 2225 | type: "BatchNorm" 2226 | bottom: "conv17_1" 2227 | top: "conv17_1" 2228 | param { 2229 | lr_mult: 0 2230 | decay_mult: 0 2231 | } 2232 | param { 2233 | lr_mult: 0 2234 | decay_mult: 0 2235 | } 2236 | param { 2237 | lr_mult: 0 2238 | decay_mult: 0 2239 | } 2240 | } 2241 | layer { 2242 | name: "conv17_1/scale" 2243 | type: "Scale" 2244 | bottom: "conv17_1" 2245 | top: "conv17_1" 2246 | param { 2247 | lr_mult: 0.1 2248 | decay_mult: 0.0 2249 | } 2250 | param { 2251 | lr_mult: 0.2 2252 | decay_mult: 0.0 2253 | } 2254 | scale_param { 2255 | filler { 2256 | value: 1 2257 | } 2258 | bias_term: true 2259 | bias_filler { 2260 | value: 0 2261 | } 2262 | } 2263 | } 2264 | layer { 2265 | name: "conv17_1/relu" 2266 | type: "ReLU" 2267 | bottom: "conv17_1" 2268 | top: "conv17_1" 2269 | } 2270 | layer { 2271 | name: "conv17_2" 2272 | type: "Convolution" 2273 | bottom: "conv17_1" 2274 | top: "conv17_2" 2275 | param { 2276 | lr_mult: 0.1 2277 | decay_mult: 0.1 2278 | } 2279 | convolution_param { 2280 | num_output: 128 2281 | bias_term: false 2282 | pad: 1 2283 | kernel_size: 3 2284 | stride: 2 2285 | weight_filler { 2286 | type: "msra" 2287 | } 2288 | } 2289 | } 2290 | layer { 2291 | name: "conv17_2/bn" 2292 | type: "BatchNorm" 2293 | bottom: "conv17_2" 2294 | top: "conv17_2" 2295 | param { 2296 | lr_mult: 0 2297 | decay_mult: 0 2298 | } 2299 | param { 2300 | lr_mult: 0 2301 | decay_mult: 0 2302 | } 2303 | param { 2304 | lr_mult: 0 2305 | decay_mult: 0 2306 | } 2307 | } 2308 | layer { 2309 | name: "conv17_2/scale" 2310 | type: "Scale" 2311 | bottom: "conv17_2" 2312 | top: "conv17_2" 2313 | param { 2314 | lr_mult: 0.1 2315 | decay_mult: 0.0 2316 | } 2317 | param { 2318 | lr_mult: 0.2 2319 | decay_mult: 0.0 2320 | } 2321 | scale_param { 2322 | filler { 2323 | value: 1 2324 | } 2325 | bias_term: true 2326 | bias_filler { 2327 | value: 0 2328 | } 2329 | } 2330 | } 2331 | layer { 2332 | name: "conv17_2/relu" 2333 | type: "ReLU" 2334 | bottom: "conv17_2" 2335 | top: "conv17_2" 2336 | } 2337 | layer { 2338 | name: "conv11_mbox_loc" 2339 | type: "Convolution" 2340 | bottom: "conv11" 2341 | 
top: "conv11_mbox_loc" 2342 | param { 2343 | lr_mult: 0.1 2344 | decay_mult: 0.1 2345 | } 2346 | param { 2347 | lr_mult: 0.2 2348 | decay_mult: 0.0 2349 | } 2350 | convolution_param { 2351 | num_output: 12 2352 | kernel_size: 1 2353 | weight_filler { 2354 | type: "msra" 2355 | } 2356 | bias_filler { 2357 | type: "constant" 2358 | value: 0.0 2359 | } 2360 | } 2361 | } 2362 | layer { 2363 | name: "conv11_mbox_loc_perm" 2364 | type: "Permute" 2365 | bottom: "conv11_mbox_loc" 2366 | top: "conv11_mbox_loc_perm" 2367 | permute_param { 2368 | order: 0 2369 | order: 2 2370 | order: 3 2371 | order: 1 2372 | } 2373 | } 2374 | layer { 2375 | name: "conv11_mbox_loc_flat" 2376 | type: "Flatten" 2377 | bottom: "conv11_mbox_loc_perm" 2378 | top: "conv11_mbox_loc_flat" 2379 | flatten_param { 2380 | axis: 1 2381 | } 2382 | } 2383 | layer { 2384 | name: "conv11_mbox_conf_new_worker" 2385 | type: "Convolution" 2386 | bottom: "conv11" 2387 | top: "conv11_mbox_conf_new_worker" 2388 | param { 2389 | lr_mult: 1.0 2390 | decay_mult: 1.0 2391 | } 2392 | param { 2393 | lr_mult: 2.0 2394 | decay_mult: 0.0 2395 | } 2396 | convolution_param { 2397 | num_output: 15 2398 | kernel_size: 1 2399 | weight_filler { 2400 | type: "msra" 2401 | } 2402 | bias_filler { 2403 | type: "constant" 2404 | value: 0.0 2405 | } 2406 | } 2407 | } 2408 | layer { 2409 | name: "conv11_mbox_conf_perm" 2410 | type: "Permute" 2411 | bottom: "conv11_mbox_conf_new_worker" 2412 | top: "conv11_mbox_conf_perm" 2413 | permute_param { 2414 | order: 0 2415 | order: 2 2416 | order: 3 2417 | order: 1 2418 | } 2419 | } 2420 | layer { 2421 | name: "conv11_mbox_conf_flat" 2422 | type: "Flatten" 2423 | bottom: "conv11_mbox_conf_perm" 2424 | top: "conv11_mbox_conf_flat" 2425 | flatten_param { 2426 | axis: 1 2427 | } 2428 | } 2429 | layer { 2430 | name: "conv11_mbox_priorbox" 2431 | type: "PriorBox" 2432 | bottom: "conv11" 2433 | bottom: "data" 2434 | top: "conv11_mbox_priorbox" 2435 | prior_box_param { 2436 | min_size: 60.0 2437 | aspect_ratio: 2.0 2438 | flip: true 2439 | clip: false 2440 | variance: 0.1 2441 | variance: 0.1 2442 | variance: 0.2 2443 | variance: 0.2 2444 | offset: 0.5 2445 | } 2446 | } 2447 | layer { 2448 | name: "conv13_mbox_loc" 2449 | type: "Convolution" 2450 | bottom: "conv13" 2451 | top: "conv13_mbox_loc" 2452 | param { 2453 | lr_mult: 0.1 2454 | decay_mult: 0.1 2455 | } 2456 | param { 2457 | lr_mult: 0.2 2458 | decay_mult: 0.0 2459 | } 2460 | convolution_param { 2461 | num_output: 24 2462 | kernel_size: 1 2463 | weight_filler { 2464 | type: "msra" 2465 | } 2466 | bias_filler { 2467 | type: "constant" 2468 | value: 0.0 2469 | } 2470 | } 2471 | } 2472 | layer { 2473 | name: "conv13_mbox_loc_perm" 2474 | type: "Permute" 2475 | bottom: "conv13_mbox_loc" 2476 | top: "conv13_mbox_loc_perm" 2477 | permute_param { 2478 | order: 0 2479 | order: 2 2480 | order: 3 2481 | order: 1 2482 | } 2483 | } 2484 | layer { 2485 | name: "conv13_mbox_loc_flat" 2486 | type: "Flatten" 2487 | bottom: "conv13_mbox_loc_perm" 2488 | top: "conv13_mbox_loc_flat" 2489 | flatten_param { 2490 | axis: 1 2491 | } 2492 | } 2493 | layer { 2494 | name: "conv13_mbox_conf_new_worker" 2495 | type: "Convolution" 2496 | bottom: "conv13" 2497 | top: "conv13_mbox_conf_new_worker" 2498 | param { 2499 | lr_mult: 1.0 2500 | decay_mult: 1.0 2501 | } 2502 | param { 2503 | lr_mult: 2.0 2504 | decay_mult: 0.0 2505 | } 2506 | convolution_param { 2507 | num_output: 30 2508 | kernel_size: 1 2509 | weight_filler { 2510 | type: "msra" 2511 | } 2512 | bias_filler { 2513 | type: "constant" 2514 | 
value: 0.0 2515 | } 2516 | } 2517 | } 2518 | layer { 2519 | name: "conv13_mbox_conf_perm" 2520 | type: "Permute" 2521 | bottom: "conv13_mbox_conf_new_worker" 2522 | top: "conv13_mbox_conf_perm" 2523 | permute_param { 2524 | order: 0 2525 | order: 2 2526 | order: 3 2527 | order: 1 2528 | } 2529 | } 2530 | layer { 2531 | name: "conv13_mbox_conf_flat" 2532 | type: "Flatten" 2533 | bottom: "conv13_mbox_conf_perm" 2534 | top: "conv13_mbox_conf_flat" 2535 | flatten_param { 2536 | axis: 1 2537 | } 2538 | } 2539 | layer { 2540 | name: "conv13_mbox_priorbox" 2541 | type: "PriorBox" 2542 | bottom: "conv13" 2543 | bottom: "data" 2544 | top: "conv13_mbox_priorbox" 2545 | prior_box_param { 2546 | min_size: 105.0 2547 | max_size: 150.0 2548 | aspect_ratio: 2.0 2549 | aspect_ratio: 3.0 2550 | flip: true 2551 | clip: false 2552 | variance: 0.1 2553 | variance: 0.1 2554 | variance: 0.2 2555 | variance: 0.2 2556 | offset: 0.5 2557 | } 2558 | } 2559 | layer { 2560 | name: "conv14_2_mbox_loc" 2561 | type: "Convolution" 2562 | bottom: "conv14_2" 2563 | top: "conv14_2_mbox_loc" 2564 | param { 2565 | lr_mult: 0.1 2566 | decay_mult: 0.1 2567 | } 2568 | param { 2569 | lr_mult: 0.2 2570 | decay_mult: 0.0 2571 | } 2572 | convolution_param { 2573 | num_output: 24 2574 | kernel_size: 1 2575 | weight_filler { 2576 | type: "msra" 2577 | } 2578 | bias_filler { 2579 | type: "constant" 2580 | value: 0.0 2581 | } 2582 | } 2583 | } 2584 | layer { 2585 | name: "conv14_2_mbox_loc_perm" 2586 | type: "Permute" 2587 | bottom: "conv14_2_mbox_loc" 2588 | top: "conv14_2_mbox_loc_perm" 2589 | permute_param { 2590 | order: 0 2591 | order: 2 2592 | order: 3 2593 | order: 1 2594 | } 2595 | } 2596 | layer { 2597 | name: "conv14_2_mbox_loc_flat" 2598 | type: "Flatten" 2599 | bottom: "conv14_2_mbox_loc_perm" 2600 | top: "conv14_2_mbox_loc_flat" 2601 | flatten_param { 2602 | axis: 1 2603 | } 2604 | } 2605 | layer { 2606 | name: "conv14_2_mbox_conf_new_worker" 2607 | type: "Convolution" 2608 | bottom: "conv14_2" 2609 | top: "conv14_2_mbox_conf_new_worker" 2610 | param { 2611 | lr_mult: 1.0 2612 | decay_mult: 1.0 2613 | } 2614 | param { 2615 | lr_mult: 2.0 2616 | decay_mult: 0.0 2617 | } 2618 | convolution_param { 2619 | num_output: 30 2620 | kernel_size: 1 2621 | weight_filler { 2622 | type: "msra" 2623 | } 2624 | bias_filler { 2625 | type: "constant" 2626 | value: 0.0 2627 | } 2628 | } 2629 | } 2630 | layer { 2631 | name: "conv14_2_mbox_conf_perm" 2632 | type: "Permute" 2633 | bottom: "conv14_2_mbox_conf_new_worker" 2634 | top: "conv14_2_mbox_conf_perm" 2635 | permute_param { 2636 | order: 0 2637 | order: 2 2638 | order: 3 2639 | order: 1 2640 | } 2641 | } 2642 | layer { 2643 | name: "conv14_2_mbox_conf_flat" 2644 | type: "Flatten" 2645 | bottom: "conv14_2_mbox_conf_perm" 2646 | top: "conv14_2_mbox_conf_flat" 2647 | flatten_param { 2648 | axis: 1 2649 | } 2650 | } 2651 | layer { 2652 | name: "conv14_2_mbox_priorbox" 2653 | type: "PriorBox" 2654 | bottom: "conv14_2" 2655 | bottom: "data" 2656 | top: "conv14_2_mbox_priorbox" 2657 | prior_box_param { 2658 | min_size: 150.0 2659 | max_size: 195.0 2660 | aspect_ratio: 2.0 2661 | aspect_ratio: 3.0 2662 | flip: true 2663 | clip: false 2664 | variance: 0.1 2665 | variance: 0.1 2666 | variance: 0.2 2667 | variance: 0.2 2668 | offset: 0.5 2669 | } 2670 | } 2671 | layer { 2672 | name: "conv15_2_mbox_loc" 2673 | type: "Convolution" 2674 | bottom: "conv15_2" 2675 | top: "conv15_2_mbox_loc" 2676 | param { 2677 | lr_mult: 0.1 2678 | decay_mult: 0.1 2679 | } 2680 | param { 2681 | lr_mult: 0.2 2682 | 
decay_mult: 0.0 2683 | } 2684 | convolution_param { 2685 | num_output: 24 2686 | kernel_size: 1 2687 | weight_filler { 2688 | type: "msra" 2689 | } 2690 | bias_filler { 2691 | type: "constant" 2692 | value: 0.0 2693 | } 2694 | } 2695 | } 2696 | layer { 2697 | name: "conv15_2_mbox_loc_perm" 2698 | type: "Permute" 2699 | bottom: "conv15_2_mbox_loc" 2700 | top: "conv15_2_mbox_loc_perm" 2701 | permute_param { 2702 | order: 0 2703 | order: 2 2704 | order: 3 2705 | order: 1 2706 | } 2707 | } 2708 | layer { 2709 | name: "conv15_2_mbox_loc_flat" 2710 | type: "Flatten" 2711 | bottom: "conv15_2_mbox_loc_perm" 2712 | top: "conv15_2_mbox_loc_flat" 2713 | flatten_param { 2714 | axis: 1 2715 | } 2716 | } 2717 | layer { 2718 | name: "conv15_2_mbox_conf_new_worker" 2719 | type: "Convolution" 2720 | bottom: "conv15_2" 2721 | top: "conv15_2_mbox_conf_new_worker" 2722 | param { 2723 | lr_mult: 1.0 2724 | decay_mult: 1.0 2725 | } 2726 | param { 2727 | lr_mult: 2.0 2728 | decay_mult: 0.0 2729 | } 2730 | convolution_param { 2731 | num_output: 30 2732 | kernel_size: 1 2733 | weight_filler { 2734 | type: "msra" 2735 | } 2736 | bias_filler { 2737 | type: "constant" 2738 | value: 0.0 2739 | } 2740 | } 2741 | } 2742 | layer { 2743 | name: "conv15_2_mbox_conf_perm" 2744 | type: "Permute" 2745 | bottom: "conv15_2_mbox_conf_new_worker" 2746 | top: "conv15_2_mbox_conf_perm" 2747 | permute_param { 2748 | order: 0 2749 | order: 2 2750 | order: 3 2751 | order: 1 2752 | } 2753 | } 2754 | layer { 2755 | name: "conv15_2_mbox_conf_flat" 2756 | type: "Flatten" 2757 | bottom: "conv15_2_mbox_conf_perm" 2758 | top: "conv15_2_mbox_conf_flat" 2759 | flatten_param { 2760 | axis: 1 2761 | } 2762 | } 2763 | layer { 2764 | name: "conv15_2_mbox_priorbox" 2765 | type: "PriorBox" 2766 | bottom: "conv15_2" 2767 | bottom: "data" 2768 | top: "conv15_2_mbox_priorbox" 2769 | prior_box_param { 2770 | min_size: 195.0 2771 | max_size: 240.0 2772 | aspect_ratio: 2.0 2773 | aspect_ratio: 3.0 2774 | flip: true 2775 | clip: false 2776 | variance: 0.1 2777 | variance: 0.1 2778 | variance: 0.2 2779 | variance: 0.2 2780 | offset: 0.5 2781 | } 2782 | } 2783 | layer { 2784 | name: "conv16_2_mbox_loc" 2785 | type: "Convolution" 2786 | bottom: "conv16_2" 2787 | top: "conv16_2_mbox_loc" 2788 | param { 2789 | lr_mult: 0.1 2790 | decay_mult: 0.1 2791 | } 2792 | param { 2793 | lr_mult: 0.2 2794 | decay_mult: 0.0 2795 | } 2796 | convolution_param { 2797 | num_output: 24 2798 | kernel_size: 1 2799 | weight_filler { 2800 | type: "msra" 2801 | } 2802 | bias_filler { 2803 | type: "constant" 2804 | value: 0.0 2805 | } 2806 | } 2807 | } 2808 | layer { 2809 | name: "conv16_2_mbox_loc_perm" 2810 | type: "Permute" 2811 | bottom: "conv16_2_mbox_loc" 2812 | top: "conv16_2_mbox_loc_perm" 2813 | permute_param { 2814 | order: 0 2815 | order: 2 2816 | order: 3 2817 | order: 1 2818 | } 2819 | } 2820 | layer { 2821 | name: "conv16_2_mbox_loc_flat" 2822 | type: "Flatten" 2823 | bottom: "conv16_2_mbox_loc_perm" 2824 | top: "conv16_2_mbox_loc_flat" 2825 | flatten_param { 2826 | axis: 1 2827 | } 2828 | } 2829 | layer { 2830 | name: "conv16_2_mbox_conf_new_worker" 2831 | type: "Convolution" 2832 | bottom: "conv16_2" 2833 | top: "conv16_2_mbox_conf_new_worker" 2834 | param { 2835 | lr_mult: 1.0 2836 | decay_mult: 1.0 2837 | } 2838 | param { 2839 | lr_mult: 2.0 2840 | decay_mult: 0.0 2841 | } 2842 | convolution_param { 2843 | num_output: 30 2844 | kernel_size: 1 2845 | weight_filler { 2846 | type: "msra" 2847 | } 2848 | bias_filler { 2849 | type: "constant" 2850 | value: 0.0 2851 | } 
2852 | } 2853 | } 2854 | layer { 2855 | name: "conv16_2_mbox_conf_perm" 2856 | type: "Permute" 2857 | bottom: "conv16_2_mbox_conf_new_worker" 2858 | top: "conv16_2_mbox_conf_perm" 2859 | permute_param { 2860 | order: 0 2861 | order: 2 2862 | order: 3 2863 | order: 1 2864 | } 2865 | } 2866 | layer { 2867 | name: "conv16_2_mbox_conf_flat" 2868 | type: "Flatten" 2869 | bottom: "conv16_2_mbox_conf_perm" 2870 | top: "conv16_2_mbox_conf_flat" 2871 | flatten_param { 2872 | axis: 1 2873 | } 2874 | } 2875 | layer { 2876 | name: "conv16_2_mbox_priorbox" 2877 | type: "PriorBox" 2878 | bottom: "conv16_2" 2879 | bottom: "data" 2880 | top: "conv16_2_mbox_priorbox" 2881 | prior_box_param { 2882 | min_size: 240.0 2883 | max_size: 285.0 2884 | aspect_ratio: 2.0 2885 | aspect_ratio: 3.0 2886 | flip: true 2887 | clip: false 2888 | variance: 0.1 2889 | variance: 0.1 2890 | variance: 0.2 2891 | variance: 0.2 2892 | offset: 0.5 2893 | } 2894 | } 2895 | layer { 2896 | name: "conv17_2_mbox_loc" 2897 | type: "Convolution" 2898 | bottom: "conv17_2" 2899 | top: "conv17_2_mbox_loc" 2900 | param { 2901 | lr_mult: 0.1 2902 | decay_mult: 0.1 2903 | } 2904 | param { 2905 | lr_mult: 0.2 2906 | decay_mult: 0.0 2907 | } 2908 | convolution_param { 2909 | num_output: 24 2910 | kernel_size: 1 2911 | weight_filler { 2912 | type: "msra" 2913 | } 2914 | bias_filler { 2915 | type: "constant" 2916 | value: 0.0 2917 | } 2918 | } 2919 | } 2920 | layer { 2921 | name: "conv17_2_mbox_loc_perm" 2922 | type: "Permute" 2923 | bottom: "conv17_2_mbox_loc" 2924 | top: "conv17_2_mbox_loc_perm" 2925 | permute_param { 2926 | order: 0 2927 | order: 2 2928 | order: 3 2929 | order: 1 2930 | } 2931 | } 2932 | layer { 2933 | name: "conv17_2_mbox_loc_flat" 2934 | type: "Flatten" 2935 | bottom: "conv17_2_mbox_loc_perm" 2936 | top: "conv17_2_mbox_loc_flat" 2937 | flatten_param { 2938 | axis: 1 2939 | } 2940 | } 2941 | layer { 2942 | name: "conv17_2_mbox_conf_new_worker" 2943 | type: "Convolution" 2944 | bottom: "conv17_2" 2945 | top: "conv17_2_mbox_conf_new_worker" 2946 | param { 2947 | lr_mult: 1.0 2948 | decay_mult: 1.0 2949 | } 2950 | param { 2951 | lr_mult: 2.0 2952 | decay_mult: 0.0 2953 | } 2954 | convolution_param { 2955 | num_output: 30 2956 | kernel_size: 1 2957 | weight_filler { 2958 | type: "msra" 2959 | } 2960 | bias_filler { 2961 | type: "constant" 2962 | value: 0.0 2963 | } 2964 | } 2965 | } 2966 | layer { 2967 | name: "conv17_2_mbox_conf_perm" 2968 | type: "Permute" 2969 | bottom: "conv17_2_mbox_conf_new_worker" 2970 | top: "conv17_2_mbox_conf_perm" 2971 | permute_param { 2972 | order: 0 2973 | order: 2 2974 | order: 3 2975 | order: 1 2976 | } 2977 | } 2978 | layer { 2979 | name: "conv17_2_mbox_conf_flat" 2980 | type: "Flatten" 2981 | bottom: "conv17_2_mbox_conf_perm" 2982 | top: "conv17_2_mbox_conf_flat" 2983 | flatten_param { 2984 | axis: 1 2985 | } 2986 | } 2987 | layer { 2988 | name: "conv17_2_mbox_priorbox" 2989 | type: "PriorBox" 2990 | bottom: "conv17_2" 2991 | bottom: "data" 2992 | top: "conv17_2_mbox_priorbox" 2993 | prior_box_param { 2994 | min_size: 285.0 2995 | max_size: 300.0 2996 | aspect_ratio: 2.0 2997 | aspect_ratio: 3.0 2998 | flip: true 2999 | clip: false 3000 | variance: 0.1 3001 | variance: 0.1 3002 | variance: 0.2 3003 | variance: 0.2 3004 | offset: 0.5 3005 | } 3006 | } 3007 | layer { 3008 | name: "mbox_loc" 3009 | type: "Concat" 3010 | bottom: "conv11_mbox_loc_flat" 3011 | bottom: "conv13_mbox_loc_flat" 3012 | bottom: "conv14_2_mbox_loc_flat" 3013 | bottom: "conv15_2_mbox_loc_flat" 3014 | bottom: 
"conv16_2_mbox_loc_flat" 3015 | bottom: "conv17_2_mbox_loc_flat" 3016 | top: "mbox_loc" 3017 | concat_param { 3018 | axis: 1 3019 | } 3020 | } 3021 | layer { 3022 | name: "mbox_conf" 3023 | type: "Concat" 3024 | bottom: "conv11_mbox_conf_flat" 3025 | bottom: "conv13_mbox_conf_flat" 3026 | bottom: "conv14_2_mbox_conf_flat" 3027 | bottom: "conv15_2_mbox_conf_flat" 3028 | bottom: "conv16_2_mbox_conf_flat" 3029 | bottom: "conv17_2_mbox_conf_flat" 3030 | top: "mbox_conf" 3031 | concat_param { 3032 | axis: 1 3033 | } 3034 | } 3035 | layer { 3036 | name: "mbox_priorbox" 3037 | type: "Concat" 3038 | bottom: "conv11_mbox_priorbox" 3039 | bottom: "conv13_mbox_priorbox" 3040 | bottom: "conv14_2_mbox_priorbox" 3041 | bottom: "conv15_2_mbox_priorbox" 3042 | bottom: "conv16_2_mbox_priorbox" 3043 | bottom: "conv17_2_mbox_priorbox" 3044 | top: "mbox_priorbox" 3045 | concat_param { 3046 | axis: 2 3047 | } 3048 | } 3049 | layer { 3050 | name: "mbox_conf_reshape" 3051 | type: "Reshape" 3052 | bottom: "mbox_conf" 3053 | top: "mbox_conf_reshape" 3054 | reshape_param { 3055 | shape { 3056 | dim: 0 3057 | dim: -1 3058 | dim: 6 3059 | } 3060 | } 3061 | } 3062 | layer { 3063 | name: "mbox_conf_softmax" 3064 | type: "Softmax" 3065 | bottom: "mbox_conf_reshape" 3066 | top: "mbox_conf_softmax" 3067 | softmax_param { 3068 | axis: 2 3069 | } 3070 | } 3071 | layer { 3072 | name: "mbox_conf_flatten" 3073 | type: "Flatten" 3074 | bottom: "mbox_conf_softmax" 3075 | top: "mbox_conf_flatten" 3076 | flatten_param { 3077 | axis: 1 3078 | } 3079 | } 3080 | layer { 3081 | name: "detection_out" 3082 | type: "DetectionOutput" 3083 | bottom: "mbox_loc" 3084 | bottom: "mbox_conf_flatten" 3085 | bottom: "mbox_priorbox" 3086 | top: "detection_out" 3087 | include { 3088 | phase: TEST 3089 | } 3090 | detection_output_param { 3091 | num_classes: 5 3092 | share_location: true 3093 | background_label_id: 0 3094 | nms_param { 3095 | nms_threshold: 0.45 3096 | top_k: 100 3097 | } 3098 | code_type: CENTER_SIZE 3099 | keep_top_k: 100 3100 | confidence_threshold: 0.25 3101 | } 3102 | } 3103 | -------------------------------------------------------------------------------- /setup.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | # Copyright (c) 2018 Intel Corporation. 3 | # Permission is hereby granted, free of charge, to any person obtaining 4 | # a copy of this software and associated documentation files (the 5 | # "Software"), to deal in the Software without restriction, including 6 | # without limitation the rights to use, copy, modify, merge, publish, 7 | # distribute, sublicense, and/or sell copies of the Software, and to 8 | # permit persons to whom the Software is furnished to do so, subject to 9 | # the following conditions: 10 | # 11 | # The above copyright notice and this permission notice shall be 12 | # included in all copies or substantial portions of the Software. 13 | # 14 | # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, 15 | # EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF 16 | # MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND 17 | # NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE 18 | # LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION 19 | # OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION 20 | # WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
21 | 22 | BASE_DIR=$(pwd) 23 | 24 | # Install the dependencies 25 | sudo apt-get update 26 | sudo apt-get install -y ffmpeg 27 | sudo apt-get install -y python3-pip 28 | sudo pip3 install numpy jupyter 29 | 30 | # Download the person detection model 31 | cd /opt/intel/openvino/deployment_tools/tools/model_downloader 32 | sudo ./downloader.py --name person-detection-retail-0013 33 | 34 | # Optimize the worker-safety-mobilenet model 35 | cd /opt/intel/openvino/deployment_tools/model_optimizer/ 36 | ./mo_caffe.py --input_model $BASE_DIR/resources/worker-safety-mobilenet/worker_safety_mobilenet.caffemodel -o $BASE_DIR/resources/worker-safety-mobilenet/FP32 --data_type FP32 37 | ./mo_caffe.py --input_model $BASE_DIR/resources/worker-safety-mobilenet/worker_safety_mobilenet.caffemodel -o $BASE_DIR/resources/worker-safety-mobilenet/FP16 --data_type FP16 38 | 39 | --------------------------------------------------------------------------------
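
`setup.sh` above downloads person-detection-retail-0013 with the model downloader and runs the Model Optimizer on the worker-safety-mobilenet Caffe model, writing FP32 and FP16 IR files under `resources/worker-safety-mobilenet/`. A quick way to confirm the conversion succeeded is to load the IR with the Inference Engine Python API. This is a minimal sketch, assuming the default output paths and model name produced by `setup.sh` and the OpenVINO™ 2020.x Python API; it is not part of the reference application itself.

```
# Sanity-check the IR generated by setup.sh (FP32 paths assumed; adjust for FP16).
from openvino.inference_engine import IECore

ie = IECore()
net = ie.read_network(
    model="resources/worker-safety-mobilenet/FP32/worker_safety_mobilenet.xml",
    weights="resources/worker-safety-mobilenet/FP32/worker_safety_mobilenet.bin",
)
exec_net = ie.load_network(network=net, device_name="CPU")

print("Inputs :", list(net.input_info.keys()))
print("Outputs:", list(net.outputs.keys()))  # expected to include "detection_out"
```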
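
The prototxt shown earlier ends in a standard SSD `DetectionOutput` layer named `detection_out` (`num_classes: 5`, `confidence_threshold: 0.25`). Such a layer conventionally emits a `[1, 1, N, 7]` blob in which each row is `[image_id, label, confidence, xmin, ymin, xmax, ymax]`, with coordinates normalized to the input image. The sketch below shows how rows of that blob are typically decoded into pixel-space boxes; the `res` dict and the helper name are assumptions for illustration, not code taken from the application, and the label-to-class mapping depends on how the model was trained.

```
# Sketch: turn one frame's Caffe-SSD DetectionOutput blob into pixel-space boxes.
# `res` is assumed to be the dict returned by exec_net.infer({input_blob: image});
# the output name "detection_out" matches the prototxt above.
def decode_detections(res, frame_height, frame_width, conf_threshold=0.25):
    boxes = []
    # Each row: [image_id, label, confidence, xmin, ymin, xmax, ymax], coords in [0, 1].
    for det in res["detection_out"][0][0]:
        image_id, label, conf, xmin, ymin, xmax, ymax = det
        # Rows padded with image_id = -1 carry no detection.
        if image_id < 0 or conf < conf_threshold:
            continue
        boxes.append((int(label), float(conf),
                      int(xmin * frame_width), int(ymin * frame_height),
                      int(xmax * frame_width), int(ymax * frame_height)))
    return boxes
```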