├── Jupyter
│   ├── README.md
│   ├── inference.py
│   ├── safety_gear_detector_jupyter.ipynb
│   └── safety_gear_detector_jupyter.py
├── LICENSE
├── README.md
├── application
│   ├── inference.py
│   └── safety_gear_detector.py
├── docs
│   └── images
│       ├── archdia.png
│       ├── jupy1.png
│       ├── jupy2.png
│       └── safetygear.png
├── resources
│   ├── Safety_Full_Hat_and_Vest.mp4
│   ├── config.json
│   └── worker-safety-mobilenet
│       ├── worker_safety_mobilenet.caffemodel
│       └── worker_safety_mobilenet.prototxt
└── setup.sh
/Jupyter/README.md:
--------------------------------------------------------------------------------
1 | # Safety Gear Detector
2 |
3 | | Details | |
4 | |-----------------------|---------------|
5 | | Target OS: | Ubuntu\* 18.04 LTS |
6 | | Programming Language: | Python\* 3.5|
7 | | Time to Complete: | 30-40min |
8 |
9 | 
10 |
11 |
12 | ## What It Does
13 | This reference implementation detects people passing in front of a camera and determines whether they are wearing safety-jackets and hard-hats. The application counts the number of people violating the safety-gear standards and the total number of people detected.
14 |
15 | ## Requirements
16 |
17 | ### Hardware
18 |
19 | - 6th to 8th Generation Intel® Core™ processors with Iris® Pro graphics or Intel® HD Graphics
20 |
21 | ### Software
22 |
23 | - [Ubuntu\* 18.04 LTS](http://releases.ubuntu.com/18.04/)
24 | **Note**: We recommend using a 4.14+ Linux* kernel with this software. Run the following command to determine the kernel version:
25 |
26 | ```
27 | uname -a
28 | ```
29 |
30 | - OpenCL™ Runtime Package
31 |
32 | - Intel® Distribution of OpenVINO™ toolkit 2020 R3 Release
33 |
34 | ## How It Works
35 | The application uses the Inference Engine included in the Intel® Distribution of OpenVINO™ toolkit.
36 |
37 | Firstly, a trained neural network detects people in the frame and displays a green colored bounding box over them. For each person detected, the application determines if they are wearing a safety-jacket and hard-hat. If they are not, an alert is registered with the system.
38 |
39 | 
40 |
41 | ## Setup
42 |
43 | ### Install Intel® Distribution of OpenVINO™ toolkit
44 | Refer to [Install the Intel® Distribution of OpenVINO™ toolkit for Linux*](https://software.intel.com/en-us/articles/OpenVINO-Install-Linux) to install and set up the toolkit.
45 |
46 | Install the OpenCL™ Runtime Package to run inference on the GPU. It is not mandatory for CPU inference.
47 |
48 |
54 |
56 | ### Get the code
57 | Clone the reference implementation
58 | ```
59 | sudo apt-get update && sudo apt-get install git
60 | git clone https://gitlab.devtools.intel.com/reference-implementations/safety-gear-detector-python-with-worker-safety-model.git
61 | ```
62 |
69 | ### Other dependencies
70 | #### FFmpeg*
71 | FFmpeg is a free and open-source project capable of recording, converting, and streaming digital audio and video in various formats. It can be used to handle most common multimedia tasks quickly and easily, such as audio compression, audio/video format conversion, and extracting images from a video.
72 |
73 |
74 | ## Which model to use
75 |
76 | This application uses the [person-detection-retail-0013](https://docs.openvinotoolkit.org/2020.3/_models_intel_person_detection_retail_0013_description_person_detection_retail_0013.html) Intel® model, which can be downloaded using the **model downloader**. The **model downloader** downloads the __.xml__ and __.bin__ files that will be used by the application.
77 |
78 | The application also uses the **worker_safety_mobilenet** model, whose Caffe* model files are provided in the `resources/worker-safety-mobilenet` directory. These need to be passed through the Model Optimizer to generate the IR (the .xml and .bin files) that will be used by the application.
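For reference, a minimal sketch (based on `Network.load_model` in `inference.py`, with a placeholder model path) of how the generated IR is loaded at runtime:

```
import os
from openvino.inference_engine import IECore

# Placeholder path to the IR produced by the Model Optimizer.
model_xml = "../resources/worker-safety-mobilenet/FP32/worker_safety_mobilenet.xml"
model_bin = os.path.splitext(model_xml)[0] + ".bin"   # weights file sits next to the .xml
ie = IECore()
net = ie.read_network(model=model_xml, weights=model_bin)
exec_net = ie.load_network(network=net, device_name="CPU", num_requests=2)
```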
79 |
80 | To download the models and install the dependencies of the application, run the below command in the `safety-gear-detector-python-with-worker-safety-model` directory:
81 | ```
82 | ./setup.sh
83 | ```
84 |
85 | ### The Config File
86 |
87 | The _resources/config.json_ file contains the paths of the videos that will be used by the application as input.
88 |
89 | For example:
90 | ```
91 | {
92 |     "inputs": [
93 |         {
94 |             "video": "path_to_video/video1.mp4"
95 |         }
96 |     ]
97 | }
98 | ```
99 |
100 | Here, `path_to_video/video1.mp4` is the path to an input video file.
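A minimal sketch of how the application reads this file (mirroring `env_parser()` in `safety_gear_detector_jupyter.py`):

```
import json

# Each entry in "inputs" becomes one input stream for the application.
with open('../resources/config.json') as f:
    config = json.load(f)
for idx, item in enumerate(config['inputs']):
    print(idx, item['video'])
```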
101 |
102 | ### Which Input Video to use
103 |
104 | The application works with any input video. Sample videos are provided [here](https://github.com/intel-iot-devkit/sample-videos/).
105 |
106 | For first use, we recommend the *Safety_Full_Hat_and_Vest.mp4* video, which is present in the `resources/` directory.
107 |
108 | For example:
109 | ```
110 | {
111 |     "inputs": [
112 |         {
113 |             "video": "sample-videos/Safety_Full_Hat_and_Vest.mp4"
114 |         },
115 |         {
116 |             "video": "sample-videos/Safety_Full_Hat_and_Vest.mp4"
117 |         }
118 |     ]
119 | }
121 | To use any other video, provide its path in the config.json file.
122 |
123 | ### Using the Camera Stream instead of video
124 |
125 | Replace `path/to/video` with the camera ID in the config.json file, where the ID is taken from the video device (the number **X** in /dev/video**X**).
126 |
127 | On Ubuntu, to list all available video devices use the following command:
128 |
129 | ```
130 | ls /dev/video*
131 | ```
132 |
133 | For example, if the output of the above command is __/dev/video0__, then config.json would be:
134 |
135 | ```
136 | {
137 |     "inputs": [
138 |         {
139 |             "video": "0"
140 |         }
141 |     ]
142 | }
143 | ```
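A minimal sketch of how such an entry is interpreted (mirroring the `Video` class in `safety_gear_detector_jupyter.py`):

```
import cv2

path = "0"                               # value taken from config.json
if path.isnumeric():
    cap = cv2.VideoCapture(int(path))    # numeric string -> camera index (/dev/video0)
else:
    cap = cv2.VideoCapture(path)         # otherwise treated as a video file path
```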
144 |
145 | ### Setup the Environment
146 |
147 | Configure the environment to use the Intel® Distribution of OpenVINO™ toolkit by exporting environment variables:
148 |
149 | ```
150 | source /opt/intel/openvino/bin/setupvars.sh
151 | ```
152 |
153 | __Note__: This command needs to be executed only once in the terminal where the application will be executed. If the terminal is closed, the command needs to be executed again.
154 |
155 | ## Run the Code on Jupyter*
156 |
157 | * Change the current directory to the git-cloned application code location on your system:
158 | ```
159 | cd <path-to-the-cloned-repo>/Jupyter
160 | ```
161 |
162 |
198 |
199 | #### Follow the steps to run the code on Jupyter:
200 |
201 | 
202 |
203 | 1. Click on the **New** button on the right side of the Jupyter window.
204 |
205 | 2. Select the **Python 3** option from the drop-down list.
206 |
207 | 3. In the first cell, type **import os** and press **Shift+Enter**.
208 |
209 | 4. Export the environment variables in the second cell of Jupyter and press **Shift+Enter**.
210 | ```
211 | %env MODEL = /opt/intel/openvino/deployment_tools/open_model_zoo/tools/downloader/intel/person-detection-retail-0013/FP32/person-detection-retail-0013.xml
212 | %env USE_SAFETY_MODEL = ../resources/worker-safety-mobilenet/FP32/worker_safety_mobilenet.xml
213 | ```
214 |
215 | 5. The user can set the target device to infer on (DEVICE) by
216 | exporting the environment variable as given below, if required. If this step is skipped, the default value (CPU) is used. For example:
217 | %env DEVICE = CPU
218 |
219 | 6. To run the application in sync mode, export the environment variable **%env FLAG = sync**. By default, the application runs in async mode. (A short sketch of how these environment variables are consumed appears after the notes below.)
220 |
221 |
222 | 7. Copy the code from **safety_gear_detector_jupyter.py** and paste it in the next cell and press **Shift+Enter**.
223 |
224 | 8. Alternatively, the code can be run in the following way:
225 |
226 |     i. Click on the **safety_gear_detector_jupyter.ipynb** file in the Jupyter Notebook window.
227 |
228 |     ii. Click on the **Kernel** menu and then select **Restart & Run All** from the drop-down list.
229 |
230 |     iii. Click on **Restart and Run All Cells**.
231 |
232 | 
233 |
234 | **NOTE:**
235 |
236 | 1. To run the application on **GPU**:
237 |
238 | * With the floating point precision 32 (FP32), change the **%env DEVICE = CPU** to **%env DEVICE = GPU**.
239 | **FP32:** FP32 is single-precision floating-point arithmetic that uses 32 bits to represent numbers: 1 bit for the sign, 8 bits for the exponent, and 23 bits for the fraction. For more information, [click here](https://en.wikipedia.org/wiki/Single-precision_floating-point_format)
240 | * With the floating point precision 16 (FP16), change the environment variables as given below:
241 | ```
242 | %env DEVICE = GPU
243 | %env MODEL=/opt/intel/openvino/deployment_tools/open_model_zoo/tools/downloader/intel/person-detection-retail-0013/FP16/person-detection-retail-0013.xml
244 | %env USE_SAFETY_MODEL = ../resources/worker-safety-mobilenet/FP16/worker_safety_mobilenet.xml
245 | ```
246 | **FP16:** FP16 is half-precision floating-point arithmetic that uses 16 bits to represent numbers: 1 bit for the sign, 5 bits for the exponent, and 10 bits for the fraction. For more information, [click here](https://en.wikipedia.org/wiki/Half-precision_floating-point_format)
247 |
248 | 2. To run the application on **Intel® Neural Compute Stick**:
249 | * Change the **%env DEVICE = CPU** to **%env DEVICE = MYRIAD**.
250 | **%env MODEL=/opt/intel/openvino/deployment_tools/open_model_zoo/tools/downloader/intel/person-detection-retail-0013/FP16/person-detection-retail-0013.xml**
251 | **%env USE_SAFETY_MODEL = ../resources/worker-safety-mobilenet/FP16/worker_safety_mobilenet.xml**
252 |
253 | 3. To run the application on **Intel® Movidius™ VPU**:
254 | - Change the **%env DEVICE = CPU** to **%env DEVICE = HDDL**.
255 | - The HDDL can run only FP16 models. Change the environment variables for the models as shown below; the models passed to the application must be of data type FP16.
256 | **%env MODEL=/opt/intel/openvino/deployment_tools/open_model_zoo/tools/downloader/intel/person-detection-retail-0013/FP16/person-detection-retail-0013.xml**
257 | **%env USE_SAFETY_MODEL = ../resources/worker-safety-mobilenet/FP16/worker_safety_mobilenet.xml**
258 |
259 |
267 | 4. To run the application on multiple devices:
268 | - Change the **%env DEVICE = CPU** to **%env DEVICE = MULTI:CPU,GPU,MYRIAD**
269 | - With the floating point precision 16 (FP16), change the path of the model in the environment variable MODEL as given below:
270 | **%env MODEL=/opt/intel/openvino/deployment_tools/open_model_zoo/tools/downloader/intel/person-detection-retail-0013/FP16/person-detection-retail-0013.xml**
271 | **%env USE_SAFETY_MODEL = ../resources/worker-safety-mobilenet/FP16/worker_safety_mobilenet.xml**
272 |
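The environment variables set above are read by `env_parser()` in the application code. A minimal sketch of how they are consumed (the defaults shown mirror the source):

```
import os

model_xml = os.environ['MODEL']                              # path to the .xml IR file
model_bin = os.path.splitext(model_xml)[0] + ".bin"          # matching weights file
device = os.environ.get('DEVICE', 'CPU')                     # target device, CPU by default
use_safety_model = 'USE_SAFETY_MODEL' in os.environ          # optional worker-safety model
is_async_mode = os.environ.get('FLAG', 'async') == 'async'   # async unless FLAG=sync
```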
273 |
274 |
280 |
--------------------------------------------------------------------------------
/Jupyter/inference.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python3
2 | """
3 | Copyright (c) 2018 Intel Corporation.
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining
6 | a copy of this software and associated documentation files (the
7 | "Software"), to deal in the Software without restriction, including
8 | without limitation the rights to use, copy, modify, merge, publish,
9 | distribute, sublicense, and/or sell copies of the Software, and to
10 | permit persons to whom the Software is furnished to do so, subject to
11 | the following conditions:
12 |
13 | The above copyright notice and this permission notice shall be
14 | included in all copies or substantial portions of the Software.
15 |
16 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
17 | EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
18 | MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
19 | NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
20 | LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
21 | OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
22 | WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
23 | """
24 |
25 | import os
26 | import sys
27 | import logging as log
28 | from openvino.inference_engine import IENetwork, IECore
29 |
30 |
31 | class Network:
32 | """
33 | Load and configure inference plugins for the specified target devices
34 | and performs synchronous and asynchronous modes for the specified infer requests.
35 | """
36 |
37 | def __init__(self):
38 | self.net = None
39 | self.plugin = None
40 | self.input_blob = None
41 | self.out_blob = None
42 | self.net_plugin = None
43 | self.infer_request_handle = None
44 |
45 | def load_model(self, model, device, input_size, output_size, num_requests, cpu_extension=None, plugin=None):
46 | """
47 | Loads a network and an image to the Inference Engine plugin.
48 | :param model: .xml file of pre trained model
49 | :param cpu_extension: extension for the CPU device
50 | :param device: Target device
51 | :param input_size: Number of input layers
52 | :param output_size: Number of output layers
53 |         :param num_requests: Number of infer requests to create. Limited to device capabilities.
54 | :param plugin: Plugin for specified device
55 | :return: Shape of input layer
56 | """
57 |
58 | model_xml = model
59 | model_bin = os.path.splitext(model_xml)[0] + ".bin"
60 | # Plugin initialization for specified device
61 | # and load extensions library if specified
62 | if not plugin:
63 | log.info("Initializing plugin for {} device...".format(device))
64 | self.plugin = IECore()
65 | else:
66 | self.plugin = plugin
67 |
68 | if cpu_extension and 'CPU' in device:
69 | self.plugin.add_extension(cpu_extension, "CPU")
70 |
71 | # Read IR
72 | log.info("Reading IR...")
73 | self.net = self.plugin.read_network(model=model_xml, weights=model_bin) #IENetwork(model=model_xml, weights=model_bin)
74 | log.info("Loading IR to the plugin...")
75 |
76 | if "CPU" in device:
77 | supported_layers = self.plugin.query_network(self.net, "CPU")
78 | not_supported_layers = \
79 | [l for l in self.net.layers.keys() if l not in supported_layers]
80 | if len(not_supported_layers) != 0:
81 | log.error("Following layers are not supported by "
82 | "the plugin for specified device {}:\n {}".
83 | format(device,
84 | ', '.join(not_supported_layers)))
85 | # log.error("Please try to specify cpu extensions library path"
86 | # " in command line parameters using -l "
87 | # "or --cpu_extension command line argument")
88 | sys.exit(1)
89 |
90 | if num_requests == 0:
91 | # Loads network read from IR to the plugin
92 | self.net_plugin = self.plugin.load_network(network=self.net, device_name=device)
93 | else:
94 | self.net_plugin = self.plugin.load_network(network=self.net, num_requests=num_requests, device_name=device)
95 | # log.error("num_requests != 0")
96 |
97 | self.input_blob = next(iter(self.net.inputs))
98 | self.out_blob = next(iter(self.net.outputs))
99 |         assert len(self.net.inputs.keys()) == input_size, \
100 |             "Supports only {} input topologies".format(input_size)
101 |         assert len(self.net.outputs) == output_size, \
102 |             "Supports only {} output topologies".format(output_size)
103 |
104 | return self.plugin, self.get_input_shape()
105 |
106 | def get_input_shape(self):
107 | """
108 | Gives the shape of the input layer of the network.
109 |         :return: Shape of the input layer of the network
110 | """
111 | return self.net.inputs[self.input_blob].shape
112 |
113 | def performance_counter(self, request_id):
114 | """
115 | Queries performance measures per layer to get feedback of what is the
116 | most time consuming layer.
117 | :param request_id: Index of Infer request value. Limited to device capabilities
118 | :return: Performance of the layer
119 | """
120 | perf_count = self.net_plugin.requests[request_id].get_perf_counts()
121 | return perf_count
122 |
123 | def exec_net(self, request_id, frame):
124 | """
125 | Starts asynchronous inference for specified request.
126 | :param request_id: Index of Infer request value. Limited to device capabilities.
127 | :param frame: Input image
128 | :return: Instance of Executable Network class
129 | """
130 | self.infer_request_handle = self.net_plugin.start_async(
131 | request_id=request_id, inputs={self.input_blob: frame})
132 | return self.net_plugin
133 |
134 | def wait(self, request_id):
135 | """
136 | Waits for the result to become available.
137 | :param request_id: Index of Infer request value. Limited to device capabilities.
138 |         :return: Status code of the inference request
139 | """
140 | wait_process = self.net_plugin.requests[request_id].wait(-1)
141 | return wait_process
142 |
143 | def get_output(self, request_id, output=None):
144 | """
145 | Gives a list of results for the output layer of the network.
146 | :param request_id: Index of Infer request value. Limited to device capabilities.
147 | :param output: Name of the output layer
148 | :return: Results for the specified request
149 | """
150 | if output:
151 | res = self.infer_request_handle.outputs[output]
152 | else:
153 | res = self.net_plugin.requests[request_id].outputs[self.out_blob]
154 | return res
155 |
156 | def clean(self):
157 | """
158 | Deletes all the instances
159 | :return: None
160 | """
161 | del self.net_plugin
162 | del self.plugin
163 | del self.net
164 |
--------------------------------------------------------------------------------
/Jupyter/safety_gear_detector_jupyter.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "code",
5 | "execution_count": null,
6 | "metadata": {},
7 | "outputs": [],
8 | "source": [
9 | "import os"
10 | ]
11 | },
12 | {
13 | "cell_type": "code",
14 | "execution_count": null,
15 | "metadata": {},
16 | "outputs": [],
17 | "source": [
18 | "%env MODEL = /opt/intel/openvino/deployment_tools/open_model_zoo/tools/downloader/intel/person-detection-retail-0013/FP32/person-detection-retail-0013.xml\n",
19 | "%env USE_SAFETY_MODEL = ../resources/worker-safety-mobilenet/FP32/worker_safety_mobilenet.xml"
20 | ]
21 | },
22 | {
23 | "cell_type": "code",
24 | "execution_count": null,
25 | "metadata": {},
26 | "outputs": [],
27 | "source": [
28 | "#!/usr/bin/env python3\n",
29 | "\"\"\"\n",
30 | " Copyright (c) 2018 Intel Corporation.\n",
31 | "\n",
32 | " Permission is hereby granted, free of charge, to any person obtaining\n",
33 | " a copy of this software and associated documentation files (the\n",
34 | " \"Software\"), to deal in the Software without restriction, including\n",
35 | " without limitation the rights to use, copy, modify, merge, publish,\n",
36 | " distribute, sublicense, and/or sell copies of the Software, and to\n",
37 | " permit persons to whom the Software is furnished to do so, subject to\n",
38 | " the following conditions:\n",
39 | "\n",
40 | " The above copyright notice and this permission notice shall be\n",
41 | " included in all copies or substantial portions of the Software.\n",
42 | "\n",
43 | " THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND,\n",
44 | " EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF\n",
45 | " MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND\n",
46 | " NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE\n",
47 | " LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION\n",
48 | " OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION\n",
49 | " WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.\n",
50 | "\"\"\"\n",
51 | "\n",
52 | "from __future__ import print_function\n",
53 | "import sys\n",
54 | "import os\n",
55 | "import cv2\n",
56 | "import numpy as np\n",
57 | "import datetime\n",
58 | "import json\n",
59 | "from inference import Network\n",
60 | "\n",
61 | "# Global vars\n",
62 | "cpu_extension = ''\n",
63 | "conf_modelLayers = ''\n",
64 | "conf_modelWeights = ''\n",
65 | "targetDevice = \"CPU\"\n",
66 | "conf_batchSize = 1\n",
67 | "conf_modelPersonLabel = 1\n",
68 | "conf_inferConfidenceThreshold = 0.7\n",
69 | "conf_inFrameViolationsThreshold = 19\n",
70 | "conf_inFramePeopleThreshold = 5\n",
71 | "padding = 30\n",
72 | "viol_wk = 0\n",
73 | "acceptedDevices = ['CPU', 'GPU', 'MYRIAD', 'HETERO:FPGA,CPU', 'HDDL']\n",
74 | "videos = []\n",
75 | "name_of_videos = []\n",
76 | "CONFIG_FILE = '../resources/config.json'\n",
77 | "\n",
78 | "class Video:\n",
79 | " def __init__(self, idx, path):\n",
80 | " if path.isnumeric():\n",
81 | " self.video = cv2.VideoCapture(int(path))\n",
82 | " self.name = \"Cam \" + str(idx)\n",
83 | " else:\n",
84 | " if os.path.exists(path):\n",
85 | " self.video = cv2.VideoCapture(path)\n",
86 | " self.name = \"Video \" + str(idx)\n",
87 | " else:\n",
88 | " print(\"Either wrong input path or empty line is found. Please check the conf.json file\")\n",
89 | " exit(21)\n",
90 | " if not self.video.isOpened():\n",
91 | " print(\"Couldn't open video: \" + path)\n",
92 | " sys.exit(20)\n",
93 | " self.height = int(self.video.get(cv2.CAP_PROP_FRAME_HEIGHT))\n",
94 | " self.width = int(self.video.get(cv2.CAP_PROP_FRAME_WIDTH))\n",
95 | "\n",
96 | " self.currentViolationCount = 0\n",
97 | " self.currentViolationCountConfidence = 0\n",
98 | " self.prevViolationCount = 0\n",
99 | " self.totalViolations = 0\n",
100 | " self.totalPeopleCount = 0\n",
101 | " self.currentPeopleCount = 0\n",
102 | " self.currentPeopleCountConfidence = 0\n",
103 | " self.prevPeopleCount = 0\n",
104 | " self.currentTotalPeopleCount = 0\n",
105 | "\n",
106 | " cv2.namedWindow(self.name, cv2.WINDOW_NORMAL)\n",
107 | " self.frame_start_time = datetime.datetime.now()\n",
108 | "\n",
109 | "\n",
110 | "def env_parser():\n",
111 | " \"\"\"\n",
112 | " Parses the inputs.\n",
113 | " :return: None\n",
114 | " \"\"\"\n",
115 | " global use_safety_model, conf_modelLayers, conf_modelWeights, targetDevice, cpu_extension, videos,\\\n",
116 | " conf_safety_modelWeights, conf_safety_modelLayers, is_async_mode\n",
117 | " if 'MODEL' in os.environ:\n",
118 | " conf_modelLayers = os.environ['MODEL']\n",
119 | " conf_modelWeights = os.path.splitext(conf_modelLayers)[0] + \".bin\"\n",
120 | " else:\n",
121 | " print(\"Please provide path for the .xml file.\")\n",
122 | " sys.exit(0)\n",
123 | " if 'DEVICE' in os.environ:\n",
124 | " targetDevice = os.environ['DEVICE']\n",
125 | " if 'MULTI' not in targetDevice and targetDevice not in acceptedDevices:\n",
126 | " print(\"Unsupported device: \" + targetDevice)\n",
127 | " sys.exit(2)\n",
128 | " elif 'MULTI' in targetDevice:\n",
129 | " target_devices = targetDevice.split(':')[1].split(',')\n",
130 | " for multi_device in target_devices:\n",
131 | " if multi_device not in acceptedDevices:\n",
132 | " print(\"Unsupported device: \" + targetDevice)\n",
133 | " sys.exit(2)\n",
134 | " if 'CPU_EXTENSION' in os.environ:\n",
135 | " cpu_extension = os.environ['CPU_EXTENSION']\n",
136 | " if 'USE_SAFETY_MODEL' in os.environ:\n",
137 | " conf_safety_modelLayers = os.environ['USE_SAFETY_MODEL']\n",
138 | " conf_safety_modelWeights = os.path.splitext(conf_safety_modelLayers)[0] + \".bin\"\n",
139 | " use_safety_model = True\n",
140 | " else:\n",
141 | " use_safety_model = False\n",
142 | " if 'FLAG' in os.environ:\n",
143 | " if os.environ['FLAG'] == 'async':\n",
144 | " is_async_mode = True\n",
145 | " print('Application running in Async mode')\n",
146 | " else:\n",
147 | " is_async_mode = False\n",
148 | " print('Application running in Sync mode')\n",
149 | " else:\n",
150 | " is_async_mode = True\n",
151 | " print('Application running in Async mode')\n",
152 | " assert os.path.isfile(CONFIG_FILE), \"{} file doesn't exist\".format(CONFIG_FILE)\n",
153 | " config = json.loads(open(CONFIG_FILE).read())\n",
154 | " for idx, item in enumerate(config['inputs']):\n",
155 | " vid = Video(idx, item['video'])\n",
156 | " name_of_videos.append([idx, item['video']])\n",
157 | " videos.append([idx, vid])\n",
158 | "\n",
159 | "\n",
160 | "\n",
161 | "def detect_safety_hat(img):\n",
162 | " \"\"\"\n",
163 | " Detection of the hat of the person.\n",
164 | " :param img: Current frame\n",
165 | " :return: Boolean value of the detected hat\n",
166 | " \"\"\"\n",
167 | " lowH = 15\n",
168 | " lowS = 65\n",
169 | " lowV = 75\n",
170 | "\n",
171 | " highH = 30\n",
172 | " highS = 255\n",
173 | " highV = 255\n",
174 | "\n",
175 | " crop = 0\n",
176 | " height = 15\n",
177 | " perc = 8\n",
178 | "\n",
179 | " hsv = np.zeros(1)\n",
180 | "\n",
181 | " try:\n",
182 | " hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)\n",
183 | " except cv2.error as e:\n",
184 | " print(\"%d %d %d\" % (img.shape))\n",
185 | " print(\"%d %d %d\" % (img.shape))\n",
186 | " print(e)\n",
187 | "\n",
188 | " threshold_img = cv2.inRange(hsv, (lowH, lowS, lowV), (highH, highS, highV))\n",
189 | "\n",
190 | " x = 0\n",
191 | " y = int(threshold_img.shape[0] * crop / 100)\n",
192 | " w = int(threshold_img.shape[1])\n",
193 | " h = int(threshold_img.shape[0] * height / 100)\n",
194 | " img_cropped = threshold_img[y: y + h, x: x + w]\n",
195 | "\n",
196 | " if cv2.countNonZero(threshold_img) < img_cropped.size * perc / 100:\n",
197 | " return False\n",
198 | "\n",
199 | " return True\n",
200 | "\n",
201 | "\n",
202 | "def detect_safety_jacket(img):\n",
203 | " \"\"\"\n",
204 | " Detection of the safety jacket of the person.\n",
205 | " :param img: Current frame\n",
206 | " :return: Boolean value of the detected jacket\n",
207 | " \"\"\"\n",
208 | " lowH = 0\n",
209 | " lowS = 150\n",
210 | " lowV = 42\n",
211 | "\n",
212 | " highH = 11\n",
213 | " highS = 255\n",
214 | " highV = 255\n",
215 | "\n",
216 | " crop = 15\n",
217 | " height = 40\n",
218 | " perc = 23\n",
219 | "\n",
220 | " hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)\n",
221 | "\n",
222 | " threshold_img = cv2.inRange(hsv, (lowH, lowS, lowV), (highH, highS, highV))\n",
223 | "\n",
224 | " x = 0\n",
225 | " y = int(threshold_img.shape[0] * crop / 100)\n",
226 | " w = int(threshold_img.shape[1])\n",
227 | " h = int(threshold_img.shape[0] * height / 100)\n",
228 | " img_cropped = threshold_img[y: y + h, x: x + w]\n",
229 | "\n",
230 | " if cv2.countNonZero(threshold_img) < img_cropped.size * perc / 100:\n",
231 | " return False\n",
232 | "\n",
233 | " return True\n",
234 | "\n",
235 | "\n",
236 | "def detect_workers(workers, frame):\n",
237 | " \"\"\"\n",
238 | " Detection of the person with the safety guards.\n",
239 | " :param workers: Total number of the person in the current frame\n",
240 | " :param frame: Current frame\n",
241 | " :return: Total violation count of the person\n",
242 | " \"\"\"\n",
243 | " violations = 0\n",
244 | " global viol_wk\n",
245 | " for worker in workers:\n",
246 | " xmin, ymin, xmax, ymax = worker\n",
247 | " crop = frame[ymin:ymax, xmin:xmax]\n",
248 | " if 0 not in crop.shape:\n",
249 | " if detect_safety_hat(crop):\n",
250 | " if detect_safety_jacket(crop):\n",
251 | " cv2.rectangle(frame, (xmin, ymin), (xmax, ymax),\n",
252 | " (0, 255, 0), 2)\n",
253 | " else:\n",
254 | " cv2.rectangle(frame, (xmin, ymin), (xmax, ymax),\n",
255 | " (0, 0, 255), 2)\n",
256 | " violations += 1\n",
257 | " viol_wk += 1\n",
258 | "\n",
259 | " else:\n",
260 | " cv2.rectangle(frame, (xmin, ymin), (xmax, ymax), (0, 0, 255), 2)\n",
261 | " violations += 1\n",
262 | " viol_wk += 1\n",
263 | "\n",
264 | " return violations\n",
265 | "\n",
266 | "\n",
267 | "def main():\n",
268 | " \"\"\"\n",
269 | " Load the network and parse the output.\n",
270 | " :return: None\n",
271 | " \"\"\"\n",
272 | " env_parser()\n",
273 | " global is_async_mode\n",
274 | " nextReq = 1\n",
275 | " currReq = 0\n",
276 | " nextReq_s = 1\n",
277 | " currReq_s = 0\n",
278 | " prevVideo = None\n",
279 | " vid_finished = [False] * len(videos)\n",
280 | " min_FPS = min([videos[i][1].video.get(cv2.CAP_PROP_FPS) for i in range(len(videos))])\n",
281 | " # Initialise the class\n",
282 | " infer_network = Network()\n",
283 | " infer_network_safety = Network()\n",
284 | " # Load the network to IE plugin to get shape of input layer\n",
285 | " plugin, (batch_size, channels, model_height, model_width) = \\\n",
286 | " infer_network.load_model(conf_modelLayers, targetDevice, 1, 1, 2, cpu_extension)\n",
287 | " if use_safety_model:\n",
288 | " batch_size_sm, channels_sm, model_height_sm, model_width_sm = \\\n",
289 | " infer_network_safety.load_model(conf_safety_modelLayers, targetDevice, 1, 1, 2, cpu_extension, plugin)[1]\n",
290 | "\n",
291 | " while True:\n",
292 | " for index, currVideo in videos:\n",
293 | " # Read image from video/cam\n",
294 | " vfps = int(round(currVideo.video.get(cv2.CAP_PROP_FPS)))\n",
295 | " for i in range(0, int(round(vfps / min_FPS))):\n",
296 | " ret, current_img = currVideo.video.read()\n",
297 | " if not ret:\n",
298 | " vid_finished[index] = True\n",
299 | " break\n",
300 | " if vid_finished[index]:\n",
301 | " stream_end_frame = np.zeros((int(currVideo.height), int(currVideo.width), 1),\n",
302 | " dtype='uint8')\n",
303 | " cv2.putText(stream_end_frame, \"Input file {} has ended\".format\n",
304 | " (name_of_videos[index][1].split('/')[-1]) ,\n",
305 | " (10, int(currVideo.height/2)),\n",
306 | " cv2.FONT_HERSHEY_COMPLEX, 1, (255, 255, 255), 2)\n",
307 | " cv2.imshow(currVideo.name, stream_end_frame)\n",
308 | " continue\n",
309 | " # Transform image to person detection model input\n",
310 | " rsImg = cv2.resize(current_img, (model_width, model_height))\n",
311 | " rsImg = rsImg.transpose((2, 0, 1))\n",
312 | " rsImg = rsImg.reshape((batch_size, channels, model_height, model_width))\n",
313 | "\n",
314 | " infer_start_time = datetime.datetime.now()\n",
315 | " # Infer current image\n",
316 | " if is_async_mode:\n",
317 | " infer_network.exec_net(nextReq, rsImg)\n",
318 | " else:\n",
319 | " infer_network.exec_net(currReq, rsImg)\n",
320 | " prevVideo = currVideo\n",
321 | " previous_img = current_img\n",
322 | " # Wait for previous request to end\n",
323 | " if infer_network.wait(currReq) == 0:\n",
324 | " infer_end_time = (datetime.datetime.now() - infer_start_time) * 1000\n",
325 | " in_frame_workers = []\n",
326 | " people = 0\n",
327 | " violations = 0\n",
328 | " hard_hat_detection =False\n",
329 | " vest_detection = False\n",
330 | " result = infer_network.get_output(currReq)\n",
331 | " # Filter output\n",
332 | " for obj in result[0][0]:\n",
333 | " if obj[2] > conf_inferConfidenceThreshold:\n",
334 | " xmin = int(obj[3] * prevVideo.width)\n",
335 | " ymin = int(obj[4] * prevVideo.height)\n",
336 | " xmax = int(obj[5] * prevVideo.width)\n",
337 | " ymax = int(obj[6] * prevVideo.height)\n",
338 | " xmin = int(xmin - padding) if (xmin - padding) > 0 else 0\n",
339 | " ymin = int(ymin - padding) if (ymin - padding) > 0 else 0\n",
340 | " xmax = int(xmax + padding) if (xmax + padding) < prevVideo.width else prevVideo.width\n",
341 | " ymax = int(ymax + padding) if (ymax + padding) < prevVideo.height else prevVideo.height\n",
342 | " cv2.rectangle(previous_img, (xmin, ymin), (xmax, ymax), (0, 255, 0), 2)\n",
343 | " people += 1\n",
344 | " in_frame_workers.append((xmin, ymin, xmax, ymax))\n",
345 | " new_frame = previous_img[ymin:ymax, xmin:xmax]\n",
346 | " if use_safety_model:\n",
347 | " # Transform image to safety model input\n",
348 | " in_frame_sm = cv2.resize(new_frame, (model_width_sm, model_height_sm))\n",
349 | " in_frame_sm = in_frame_sm.transpose((2, 0, 1))\n",
350 | " in_frame_sm = in_frame_sm.reshape((batch_size_sm, channels_sm, model_height_sm, model_width_sm))\n",
351 | "\n",
352 | " infer_start_time_sm = datetime.datetime.now()\n",
353 | " if is_async_mode:\n",
354 | " infer_network_safety.exec_net(nextReq_s, in_frame_sm)\n",
355 | " else:\n",
356 | " infer_network_safety.exec_net(currReq_s, in_frame_sm)\n",
357 | " # Wait for the result\n",
358 | " infer_network_safety.wait(currReq_s)\n",
359 | " infer_end_time_sm = (datetime.datetime.now() - infer_start_time_sm) * 1000\n",
360 | "\n",
361 | " result_sm = infer_network_safety.get_output(currReq_s)\n",
362 | " # Filter output\n",
363 | " hard_hat_detection = False\n",
364 | " vest_detection = False\n",
365 | " detection_list = []\n",
366 | " for obj_sm in result_sm[0][0]:\n",
367 | "\n",
368 | " if (obj_sm[2] > 0.4):\n",
369 | " # Detect safety vest\n",
370 | " if (int(obj_sm[1])) == 2:\n",
371 | " xmin_sm = int(obj_sm[3] * (xmax-xmin))\n",
372 | " ymin_sm = int(obj_sm[4] * (ymax-ymin))\n",
373 | " xmax_sm = int(obj_sm[5] * (xmax-xmin))\n",
374 | " ymax_sm = int(obj_sm[6] * (ymax-ymin))\n",
375 | " if vest_detection == False:\n",
376 | " detection_list.append([xmin_sm+xmin, ymin_sm+ymin, xmax_sm+xmin, ymax_sm+ymin])\n",
377 | " vest_detection = True\n",
378 | "\n",
379 | " # Detect hard-hat\n",
380 | " if int(obj_sm[1]) == 4:\n",
381 | " xmin_sm_v = int(obj_sm[3] * (xmax-xmin))\n",
382 | " ymin_sm_v = int(obj_sm[4] * (ymax-ymin))\n",
383 | " xmax_sm_v = int(obj_sm[5] * (xmax-xmin))\n",
384 | " ymax_sm_v = int(obj_sm[6] * (ymax-ymin))\n",
385 | " if hard_hat_detection == False:\n",
386 | " detection_list.append([xmin_sm_v+xmin, ymin_sm_v+ymin, xmax_sm_v+xmin, ymax_sm_v+ymin])\n",
387 | " hard_hat_detection = True\n",
388 | "\n",
389 | " if hard_hat_detection is False or vest_detection is False:\n",
390 | " violations += 1\n",
391 | " for _rect in detection_list:\n",
392 | " cv2.rectangle(current_img, (_rect[0] , _rect[1]), (_rect[2] , _rect[3]), (0, 255, 0), 2)\n",
393 | " if is_async_mode:\n",
394 | " currReq_s, nextReq_s = nextReq_s, currReq_s\n",
395 | "\n",
396 | " # Use OpenCV if worker-safety-model is not provided\n",
397 | " else :\n",
398 | " violations = detect_workers(in_frame_workers, previous_img)\n",
399 | "\n",
400 | " # Check if detected violations equals previous frames\n",
401 | " if violations == prevVideo.currentViolationCount:\n",
402 | " prevVideo.currentViolationCountConfidence += 1\n",
403 | "\n",
404 | " # If frame threshold is reached, change validated count\n",
405 | " if prevVideo.currentViolationCountConfidence == conf_inFrameViolationsThreshold:\n",
406 | "\n",
407 | " # If another violation occurred, save image\n",
408 | " if prevVideo.currentViolationCount > prevVideo.prevViolationCount:\n",
409 | " prevVideo.totalViolations += (prevVideo.currentViolationCount - prevVideo.prevViolationCount)\n",
410 | " prevVideo.prevViolationCount = prevVideo.currentViolationCount\n",
411 | " else:\n",
412 | " prevVideo.currentViolationCountConfidence = 0\n",
413 | " prevVideo.currentViolationCount = violations\n",
414 | "\n",
415 | " # Check if detected people count equals previous frames\n",
416 | " if people == prevVideo.currentPeopleCount:\n",
417 | " prevVideo.currentPeopleCountConfidence += 1\n",
418 | " # If frame threshold is reached, change validated count\n",
419 | " if prevVideo.currentPeopleCountConfidence == conf_inFrameViolationsThreshold:\n",
420 | " prevVideo.currentTotalPeopleCount += (\n",
421 | " prevVideo.currentPeopleCount - prevVideo.prevPeopleCount)\n",
422 | " if prevVideo.currentTotalPeopleCount > prevVideo.prevPeopleCount:\n",
423 | " prevVideo.totalPeopleCount += prevVideo.currentTotalPeopleCount - prevVideo.prevPeopleCount\n",
424 | " prevVideo.prevPeopleCount = prevVideo.currentPeopleCount\n",
425 | " else:\n",
426 | " prevVideo.currentPeopleCountConfidence = 0\n",
427 | " prevVideo.currentPeopleCount = people\n",
428 | "\n",
429 | " frame_end_time = datetime.datetime.now()\n",
430 | " cv2.putText(previous_img, 'Total people count: ' + str(\n",
431 | " prevVideo.totalPeopleCount), (10, prevVideo.height - 10),\n",
432 | " cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2)\n",
433 | " cv2.putText(previous_img, 'Current people count: ' + str(\n",
434 | " prevVideo.currentTotalPeopleCount),\n",
435 | " (10, prevVideo.height - 40),\n",
436 | " cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2)\n",
437 | " cv2.putText(previous_img, 'Total violation count: ' + str(\n",
438 | " prevVideo.totalViolations), (10, prevVideo.height - 70),\n",
439 | " cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2)\n",
440 | " cv2.putText(previous_img, 'FPS: %0.2fs' % (1 / (\n",
441 | " frame_end_time - prevVideo.frame_start_time).total_seconds()),\n",
442 | " (10, prevVideo.height - 100),\n",
443 | " cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2)\n",
444 | " cv2.putText(previous_img, 'Inference time: N\\A for async mode' if is_async_mode else 'Inference time: {}ms'.format((infer_end_time).total_seconds()),\n",
445 | " (10, prevVideo.height - 130),\n",
446 | " cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2)\n",
447 | " cv2.imshow(prevVideo.name, previous_img)\n",
448 | " prevVideo.frame_start_time = datetime.datetime.now()\n",
449 | " # Swap\n",
450 | " if is_async_mode:\n",
451 | " currReq, nextReq = nextReq, currReq\n",
452 | " previous_img = current_img\n",
453 | " prevVideo = currVideo\n",
454 | " # Exit if ESC key is pressed\n",
455 | " if cv2.waitKey(1) == 27:\n",
456 | " print(\"Attempting to stop input files\")\n",
457 | " infer_network.clean()\n",
458 | " infer_network_safety.clean()\n",
459 | " cv2.destroyAllWindows()\n",
460 | " return\n",
461 | " \n",
462 | " if False not in vid_finished:\n",
463 | " infer_network.clean()\n",
464 | " infer_network_safety.clean()\n",
465 | " cv2.destroyAllWindows()\n",
466 | " break\n",
467 | "\n",
468 | "\n",
469 | "\n",
470 | "if __name__ == '__main__':\n",
471 | " main()"
472 | ]
473 | },
474 | {
475 | "cell_type": "code",
476 | "execution_count": null,
477 | "metadata": {},
478 | "outputs": [],
479 | "source": []
480 | },
481 | {
482 | "cell_type": "code",
483 | "execution_count": null,
484 | "metadata": {},
485 | "outputs": [],
486 | "source": []
487 | }
488 | ],
489 | "metadata": {
490 | "kernelspec": {
491 | "display_name": "Python 3",
492 | "language": "python",
493 | "name": "python3"
494 | },
495 | "language_info": {
496 | "codemirror_mode": {
497 | "name": "ipython",
498 | "version": 3
499 | },
500 | "file_extension": ".py",
501 | "mimetype": "text/x-python",
502 | "name": "python",
503 | "nbconvert_exporter": "python",
504 | "pygments_lexer": "ipython3",
505 | "version": "3.6.9"
506 | }
507 | },
508 | "nbformat": 4,
509 | "nbformat_minor": 2
510 | }
511 |
--------------------------------------------------------------------------------
/Jupyter/safety_gear_detector_jupyter.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python3
2 | """
3 | Copyright (c) 2018 Intel Corporation.
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining
6 | a copy of this software and associated documentation files (the
7 | "Software"), to deal in the Software without restriction, including
8 | without limitation the rights to use, copy, modify, merge, publish,
9 | distribute, sublicense, and/or sell copies of the Software, and to
10 | permit persons to whom the Software is furnished to do so, subject to
11 | the following conditions:
12 |
13 | The above copyright notice and this permission notice shall be
14 | included in all copies or substantial portions of the Software.
15 |
16 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
17 | EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
18 | MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
19 | NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
20 | LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
21 | OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
22 | WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
23 | """
24 |
25 | from __future__ import print_function
26 | import sys
27 | import os
28 | import cv2
29 | import numpy as np
30 | import datetime
31 | import json
32 | from inference import Network
33 |
34 | # Global vars
35 | cpu_extension = ''
36 | conf_modelLayers = ''
37 | conf_modelWeights = ''
38 | targetDevice = "CPU"
39 | conf_batchSize = 1
40 | conf_modelPersonLabel = 1
41 | conf_inferConfidenceThreshold = 0.7
42 | conf_inFrameViolationsThreshold = 19
43 | conf_inFramePeopleThreshold = 5
44 | padding = 30
45 | viol_wk = 0
46 | acceptedDevices = ['CPU', 'GPU', 'MYRIAD', 'HETERO:FPGA,CPU', 'HDDL']
47 | videos = []
48 | name_of_videos = []
49 | CONFIG_FILE = '../resources/config.json'
50 |
51 | class Video:
52 | def __init__(self, idx, path):
53 | if path.isnumeric():
54 | self.video = cv2.VideoCapture(int(path))
55 | self.name = "Cam " + str(idx)
56 | else:
57 | if os.path.exists(path):
58 | self.video = cv2.VideoCapture(path)
59 | self.name = "Video " + str(idx)
60 | else:
61 | print("Either wrong input path or empty line is found. Please check the conf.json file")
62 | exit(21)
63 | if not self.video.isOpened():
64 | print("Couldn't open video: " + path)
65 | sys.exit(20)
66 | self.height = int(self.video.get(cv2.CAP_PROP_FRAME_HEIGHT))
67 | self.width = int(self.video.get(cv2.CAP_PROP_FRAME_WIDTH))
68 |
69 | self.currentViolationCount = 0
70 | self.currentViolationCountConfidence = 0
71 | self.prevViolationCount = 0
72 | self.totalViolations = 0
73 | self.totalPeopleCount = 0
74 | self.currentPeopleCount = 0
75 | self.currentPeopleCountConfidence = 0
76 | self.prevPeopleCount = 0
77 | self.currentTotalPeopleCount = 0
78 |
79 | cv2.namedWindow(self.name, cv2.WINDOW_NORMAL)
80 | self.frame_start_time = datetime.datetime.now()
81 |
82 |
83 | def env_parser():
84 | """
85 | Parses the inputs.
86 | :return: None
87 | """
88 | global use_safety_model, conf_modelLayers, conf_modelWeights, targetDevice, cpu_extension, videos,\
89 | conf_safety_modelWeights, conf_safety_modelLayers, is_async_mode
90 | if 'MODEL' in os.environ:
91 | conf_modelLayers = os.environ['MODEL']
92 | conf_modelWeights = os.path.splitext(conf_modelLayers)[0] + ".bin"
93 | else:
94 | print("Please provide path for the .xml file.")
95 | sys.exit(0)
96 | if 'DEVICE' in os.environ:
97 | targetDevice = os.environ['DEVICE']
98 |         if 'MULTI' not in targetDevice and targetDevice not in acceptedDevices:
99 |             print("Unsupported device: " + targetDevice)
100 |             sys.exit(2)
101 |         elif 'MULTI' in targetDevice:
102 |             target_devices = targetDevice.split(':')[1].split(',')
103 |             for multi_device in target_devices:
104 |                 if multi_device not in acceptedDevices:
105 |                     print("Unsupported device: " + targetDevice)
106 |                     sys.exit(2)
101 | if 'CPU_EXTENSION' in os.environ:
102 | cpu_extension = os.environ['CPU_EXTENSION']
103 | if 'USE_SAFETY_MODEL' in os.environ:
104 | conf_safety_modelLayers = os.environ['USE_SAFETY_MODEL']
105 | conf_safety_modelWeights = os.path.splitext(conf_safety_modelLayers)[0] + ".bin"
106 | use_safety_model = True
107 | else:
108 | use_safety_model = False
109 | if 'FLAG' in os.environ:
110 | if os.environ['FLAG'] == 'async':
111 | is_async_mode = True
112 | print('Application running in Async mode')
113 | else:
114 | is_async_mode = False
115 | print('Application running in Sync mode')
116 | else:
117 | is_async_mode = True
118 | print('Application running in Async mode')
119 | assert os.path.isfile(CONFIG_FILE), "{} file doesn't exist".format(CONFIG_FILE)
120 | config = json.loads(open(CONFIG_FILE).read())
121 | for idx, item in enumerate(config['inputs']):
122 | vid = Video(idx, item['video'])
123 | name_of_videos.append([idx, item['video']])
124 | videos.append([idx, vid])
125 |
126 |
127 |
128 | def detect_safety_hat(img):
129 | """
130 | Detection of the hat of the person.
131 | :param img: Current frame
132 | :return: Boolean value of the detected hat
133 | """
134 | lowH = 15
135 | lowS = 65
136 | lowV = 75
137 |
138 | highH = 30
139 | highS = 255
140 | highV = 255
141 |
142 | crop = 0
143 | height = 15
144 | perc = 8
145 |
146 | hsv = np.zeros(1)
147 |
148 | try:
149 | hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
150 | except cv2.error as e:
151 |         print("%d %d %d" % (img.shape))
153 | print(e)
154 |
155 | threshold_img = cv2.inRange(hsv, (lowH, lowS, lowV), (highH, highS, highV))
156 |
157 | x = 0
158 | y = int(threshold_img.shape[0] * crop / 100)
159 | w = int(threshold_img.shape[1])
160 | h = int(threshold_img.shape[0] * height / 100)
161 | img_cropped = threshold_img[y: y + h, x: x + w]
162 |
163 | if cv2.countNonZero(threshold_img) < img_cropped.size * perc / 100:
164 | return False
165 |
166 | return True
167 |
168 |
169 | def detect_safety_jacket(img):
170 | """
171 | Detection of the safety jacket of the person.
172 | :param img: Current frame
173 | :return: Boolean value of the detected jacket
174 | """
175 | lowH = 0
176 | lowS = 150
177 | lowV = 42
178 |
179 | highH = 11
180 | highS = 255
181 | highV = 255
182 |
183 | crop = 15
184 | height = 40
185 | perc = 23
186 |
187 | hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
188 |
189 | threshold_img = cv2.inRange(hsv, (lowH, lowS, lowV), (highH, highS, highV))
190 |
191 | x = 0
192 | y = int(threshold_img.shape[0] * crop / 100)
193 | w = int(threshold_img.shape[1])
194 | h = int(threshold_img.shape[0] * height / 100)
195 | img_cropped = threshold_img[y: y + h, x: x + w]
196 |
197 | if cv2.countNonZero(threshold_img) < img_cropped.size * perc / 100:
198 | return False
199 |
200 | return True
201 |
202 |
203 | def detect_workers(workers, frame):
204 | """
205 | Detection of the person with the safety guards.
206 | :param workers: Total number of the person in the current frame
207 | :param frame: Current frame
208 | :return: Total violation count of the person
209 | """
210 | violations = 0
211 | global viol_wk
212 | for worker in workers:
213 | xmin, ymin, xmax, ymax = worker
214 | crop = frame[ymin:ymax, xmin:xmax]
215 | if 0 not in crop.shape:
216 | if detect_safety_hat(crop):
217 | if detect_safety_jacket(crop):
218 | cv2.rectangle(frame, (xmin, ymin), (xmax, ymax),
219 | (0, 255, 0), 2)
220 | else:
221 | cv2.rectangle(frame, (xmin, ymin), (xmax, ymax),
222 | (0, 0, 255), 2)
223 | violations += 1
224 | viol_wk += 1
225 |
226 | else:
227 | cv2.rectangle(frame, (xmin, ymin), (xmax, ymax), (0, 0, 255), 2)
228 | violations += 1
229 | viol_wk += 1
230 |
231 | return violations
232 |
233 |
234 | def main():
235 | """
236 | Load the network and parse the output.
237 | :return: None
238 | """
239 | env_parser()
240 | global is_async_mode
241 | nextReq = 1
242 | currReq = 0
243 | nextReq_s = 1
244 | currReq_s = 0
245 | prevVideo = None
246 | vid_finished = [False] * len(videos)
247 | min_FPS = min([videos[i][1].video.get(cv2.CAP_PROP_FPS) for i in range(len(videos))])
248 | # Initialise the class
249 | infer_network = Network()
250 | infer_network_safety = Network()
251 | # Load the network to IE plugin to get shape of input layer
252 | plugin, (batch_size, channels, model_height, model_width) = \
253 | infer_network.load_model(conf_modelLayers, targetDevice, 1, 1, 2, cpu_extension)
254 | if use_safety_model:
255 | batch_size_sm, channels_sm, model_height_sm, model_width_sm = \
256 | infer_network_safety.load_model(conf_safety_modelLayers, targetDevice, 1, 1, 2, cpu_extension, plugin)[1]
257 |
258 | while True:
259 | for index, currVideo in videos:
260 | # Read image from video/cam
261 | vfps = int(round(currVideo.video.get(cv2.CAP_PROP_FPS)))
262 | for i in range(0, int(round(vfps / min_FPS))):
263 | ret, current_img = currVideo.video.read()
264 | if not ret:
265 | vid_finished[index] = True
266 | break
267 | if vid_finished[index]:
268 | stream_end_frame = np.zeros((int(currVideo.height), int(currVideo.width), 1),
269 | dtype='uint8')
270 | cv2.putText(stream_end_frame, "Input file {} has ended".format
271 | (name_of_videos[index][1].split('/')[-1]) ,
272 | (10, int(currVideo.height/2)),
273 | cv2.FONT_HERSHEY_COMPLEX, 1, (255, 255, 255), 2)
274 | cv2.imshow(currVideo.name, stream_end_frame)
275 | continue
276 | # Transform image to person detection model input
277 | rsImg = cv2.resize(current_img, (model_width, model_height))
278 | rsImg = rsImg.transpose((2, 0, 1))
279 | rsImg = rsImg.reshape((batch_size, channels, model_height, model_width))
280 |
281 | infer_start_time = datetime.datetime.now()
282 | # Infer current image
283 | if is_async_mode:
284 | infer_network.exec_net(nextReq, rsImg)
285 | else:
286 | infer_network.exec_net(currReq, rsImg)
287 | prevVideo = currVideo
288 | previous_img = current_img
289 | # Wait for previous request to end
290 | if infer_network.wait(currReq) == 0:
291 | infer_end_time = (datetime.datetime.now() - infer_start_time) * 1000
292 | in_frame_workers = []
293 | people = 0
294 | violations = 0
295 | hard_hat_detection =False
296 | vest_detection = False
297 | result = infer_network.get_output(currReq)
298 | # Filter output
299 | for obj in result[0][0]:
300 | if obj[2] > conf_inferConfidenceThreshold:
301 | xmin = int(obj[3] * prevVideo.width)
302 | ymin = int(obj[4] * prevVideo.height)
303 | xmax = int(obj[5] * prevVideo.width)
304 | ymax = int(obj[6] * prevVideo.height)
305 | xmin = int(xmin - padding) if (xmin - padding) > 0 else 0
306 | ymin = int(ymin - padding) if (ymin - padding) > 0 else 0
307 | xmax = int(xmax + padding) if (xmax + padding) < prevVideo.width else prevVideo.width
308 | ymax = int(ymax + padding) if (ymax + padding) < prevVideo.height else prevVideo.height
309 | cv2.rectangle(previous_img, (xmin, ymin), (xmax, ymax), (0, 255, 0), 2)
310 | people += 1
311 | in_frame_workers.append((xmin, ymin, xmax, ymax))
312 | new_frame = previous_img[ymin:ymax, xmin:xmax]
313 | if use_safety_model:
314 | # Transform image to safety model input
315 | in_frame_sm = cv2.resize(new_frame, (model_width_sm, model_height_sm))
316 | in_frame_sm = in_frame_sm.transpose((2, 0, 1))
317 | in_frame_sm = in_frame_sm.reshape((batch_size_sm, channels_sm, model_height_sm, model_width_sm))
318 |
319 | infer_start_time_sm = datetime.datetime.now()
320 | if is_async_mode:
321 | infer_network_safety.exec_net(nextReq_s, in_frame_sm)
322 | else:
323 | infer_network_safety.exec_net(currReq_s, in_frame_sm)
324 | # Wait for the result
325 | infer_network_safety.wait(currReq_s)
326 | infer_end_time_sm = (datetime.datetime.now() - infer_start_time_sm) * 1000
327 |
328 | result_sm = infer_network_safety.get_output(currReq_s)
329 | # Filter output
330 | hard_hat_detection = False
331 | vest_detection = False
332 | detection_list = []
333 | for obj_sm in result_sm[0][0]:
334 |
335 | if (obj_sm[2] > 0.4):
336 | # Detect safety vest
337 | if (int(obj_sm[1])) == 2:
338 | xmin_sm = int(obj_sm[3] * (xmax-xmin))
339 | ymin_sm = int(obj_sm[4] * (ymax-ymin))
340 | xmax_sm = int(obj_sm[5] * (xmax-xmin))
341 | ymax_sm = int(obj_sm[6] * (ymax-ymin))
342 | if vest_detection == False:
343 | detection_list.append([xmin_sm+xmin, ymin_sm+ymin, xmax_sm+xmin, ymax_sm+ymin])
344 | vest_detection = True
345 |
346 | # Detect hard-hat
347 | if int(obj_sm[1]) == 4:
348 | xmin_sm_v = int(obj_sm[3] * (xmax-xmin))
349 | ymin_sm_v = int(obj_sm[4] * (ymax-ymin))
350 | xmax_sm_v = int(obj_sm[5] * (xmax-xmin))
351 | ymax_sm_v = int(obj_sm[6] * (ymax-ymin))
352 | if hard_hat_detection == False:
353 | detection_list.append([xmin_sm_v+xmin, ymin_sm_v+ymin, xmax_sm_v+xmin, ymax_sm_v+ymin])
354 | hard_hat_detection = True
355 |
356 | if hard_hat_detection is False or vest_detection is False:
357 | violations += 1
358 | for _rect in detection_list:
359 | cv2.rectangle(current_img, (_rect[0] , _rect[1]), (_rect[2] , _rect[3]), (0, 255, 0), 2)
360 | if is_async_mode:
361 | currReq_s, nextReq_s = nextReq_s, currReq_s
362 |
363 | # Use OpenCV if worker-safety-model is not provided
364 | else :
365 | violations = detect_workers(in_frame_workers, previous_img)
366 |
367 | # Check if detected violations equals previous frames
368 | if violations == prevVideo.currentViolationCount:
369 | prevVideo.currentViolationCountConfidence += 1
370 |
371 | # If frame threshold is reached, change validated count
372 | if prevVideo.currentViolationCountConfidence == conf_inFrameViolationsThreshold:
373 |
374 | # If another violation occurred, save image
375 | if prevVideo.currentViolationCount > prevVideo.prevViolationCount:
376 | prevVideo.totalViolations += (prevVideo.currentViolationCount - prevVideo.prevViolationCount)
377 | prevVideo.prevViolationCount = prevVideo.currentViolationCount
378 | else:
379 | prevVideo.currentViolationCountConfidence = 0
380 | prevVideo.currentViolationCount = violations
381 |
382 | # Check if detected people count equals previous frames
383 | if people == prevVideo.currentPeopleCount:
384 | prevVideo.currentPeopleCountConfidence += 1
385 | # If frame threshold is reached, change validated count
386 | if prevVideo.currentPeopleCountConfidence == conf_inFrameViolationsThreshold:
387 | prevVideo.currentTotalPeopleCount += (
388 | prevVideo.currentPeopleCount - prevVideo.prevPeopleCount)
389 | if prevVideo.currentTotalPeopleCount > prevVideo.prevPeopleCount:
390 | prevVideo.totalPeopleCount += prevVideo.currentTotalPeopleCount - prevVideo.prevPeopleCount
391 | prevVideo.prevPeopleCount = prevVideo.currentPeopleCount
392 | else:
393 | prevVideo.currentPeopleCountConfidence = 0
394 | prevVideo.currentPeopleCount = people
395 |
396 | frame_end_time = datetime.datetime.now()
397 | cv2.putText(previous_img, 'Total people count: ' + str(
398 | prevVideo.totalPeopleCount), (10, prevVideo.height - 10),
399 | cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2)
400 | cv2.putText(previous_img, 'Current people count: ' + str(
401 | prevVideo.currentTotalPeopleCount),
402 | (10, prevVideo.height - 40),
403 | cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2)
404 | cv2.putText(previous_img, 'Total violation count: ' + str(
405 | prevVideo.totalViolations), (10, prevVideo.height - 70),
406 | cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2)
407 | cv2.putText(previous_img, 'FPS: %0.2fs' % (1 / (
408 | frame_end_time - prevVideo.frame_start_time).total_seconds()),
409 | (10, prevVideo.height - 100),
410 | cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2)
411 |                 cv2.putText(previous_img, 'Inference time: N/A for async mode' if is_async_mode else 'Inference time: {}ms'.format((infer_end_time).total_seconds()),
412 | (10, prevVideo.height - 130),
413 | cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2)
414 | cv2.imshow(prevVideo.name, previous_img)
415 | prevVideo.frame_start_time = datetime.datetime.now()
416 | # Swap
417 | if is_async_mode:
418 | currReq, nextReq = nextReq, currReq
419 | previous_img = current_img
420 | prevVideo = currVideo
421 | # Exit if ESC key is pressed
422 | if cv2.waitKey(1) == 27:
423 | print("Attempting to stop input files")
424 | infer_network.clean()
425 | infer_network_safety.clean()
426 | cv2.destroyAllWindows()
427 | return
428 |
429 | if False not in vid_finished:
430 | infer_network.clean()
431 | infer_network_safety.clean()
432 | cv2.destroyAllWindows()
433 | break
434 |
435 |
436 |
437 | if __name__ == '__main__':
438 | main()
439 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | BSD 3-Clause License
2 |
3 | Copyright (c) 2021, Intel Corporation
4 | All rights reserved.
5 |
6 | Redistribution and use in source and binary forms, with or without
7 | modification, are permitted provided that the following conditions are met:
8 |
9 | 1. Redistributions of source code must retain the above copyright notice, this
10 | list of conditions and the following disclaimer.
11 |
12 | 2. Redistributions in binary form must reproduce the above copyright notice,
13 | this list of conditions and the following disclaimer in the documentation
14 | and/or other materials provided with the distribution.
15 |
16 | 3. Neither the name of the copyright holder nor the names of its
17 | contributors may be used to endorse or promote products derived from
18 | this software without specific prior written permission.
19 |
20 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
21 | AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
22 | IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
23 | DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
24 | FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
25 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
26 | SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
27 | CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
28 | OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
29 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
30 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Safety Gear Detector
2 |
3 | | Details | |
4 | |-----------------------|---------------|
5 | | Target OS: | Ubuntu\* 18.04 LTS |
6 | | Programming Language: | Python\* 3.5|
7 | | Time to Complete: | 30-40min |
8 |
9 | 
10 |
11 |
12 | ## What It Does
13 | This reference implementation detects people passing in front of a camera and determines whether they are wearing safety-jackets and hard-hats. The application counts the number of people violating the safety-gear standards and the total number of people detected.
14 |
15 | ## Requirements
16 |
17 | ### Hardware
18 |
19 | - 6th to 8th Generation Intel® Core™ processors with Iris® Pro graphics or Intel® HD Graphics
20 |
21 | ### Software
22 |
23 | - [Ubuntu\* 18.04 LTS](http://releases.ubuntu.com/18.04/)
24 | **Note**: We recommend using a 4.14+ Linux* kernel with this software. Run the following command to determine the kernel version:
25 |
26 | ```
27 | uname -a
28 | ```
29 |
30 | - OpenCL™ Runtime Package
31 |
32 | - Intel® Distribution of OpenVINO™ toolkit 2020 R3 Release
33 |
34 | ## How It Works
35 | The application uses the Inference Engine included in the Intel® Distribution of OpenVINO™ toolkit.
36 |
37 | Firstly, a trained neural network detects people in the frame and displays a green colored bounding box over them. For each person detected, the application determines if they are wearing a safety-jacket and hard-hat. If they are not, an alert is registered with the system.
38 |
39 | 
40 |
41 | ## Setup
42 |
43 | ### Install Intel® Distribution of OpenVINO™ toolkit
44 | Refer to [Install the Intel® Distribution of OpenVINO™ toolkit for Linux*](https://software.intel.com/en-us/articles/OpenVINO-Install-Linux) to install and set up the toolkit.
45 |
46 | Install the OpenCL™ Runtime Package to run inference on the GPU. It is not mandatory for CPU inference.
47 |
48 | ### Other dependencies
49 | #### FFmpeg*
50 | FFmpeg is a free and open-source project capable of recording, converting and streaming digital audio and video in various formats. It handles common multimedia tasks such as audio compression, audio/video format conversion and extracting images from a video.
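If FFmpeg is not already present on the system (the `setup.sh` script described below may install it for you), it can typically be installed from the standard Ubuntu repositories:
```
sudo apt-get install ffmpeg
```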
51 |
53 | ### Get the code
54 | Clone the reference implementation
55 | ```
56 | sudo apt-get update && sudo apt-get install git
57 | git clone https://github.com/intel-iot-devkit/safety-gear-detector-python.git
58 | ```
59 |
66 |
67 | ## Which model to use
68 |
69 | This application uses the [person-detection-retail-0013](https://docs.openvinotoolkit.org/2020.3/_models_intel_person_detection_retail_0013_description_person_detection_retail_0013.html) Intel® model, which can be downloaded using the **model downloader**. The **model downloader** downloads the __.xml__ and __.bin__ files that will be used by the application.
70 |
71 | The application also uses the **worker_safety_mobilenet** model, whose Caffe* model files are provided in the `resources/worker-safety-mobilenet` directory. These need to be passed through the Model Optimizer to generate the IR (the .xml and .bin files) used by the application.
72 |
73 | To download the models and install the dependencies of the application, run the command below in the cloned `safety-gear-detector-python` directory:
74 | ```
75 | ./setup.sh
76 | ```
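If you prefer to perform the equivalent steps by hand, the sketch below is an illustrative approximation only (it assumes a default OpenVINO™ 2020 installation under `/opt/intel/openvino`; the authoritative steps are the ones in `setup.sh`):
```
# Download the person-detection model with the Open Model Zoo downloader
cd /opt/intel/openvino/deployment_tools/open_model_zoo/tools/downloader
sudo ./downloader.py --name person-detection-retail-0013

# Convert the worker-safety Caffe* model to IR with the Model Optimizer
cd <path-to-repo>/resources/worker-safety-mobilenet
python3 /opt/intel/openvino/deployment_tools/model_optimizer/mo.py \
    --input_model worker_safety_mobilenet.caffemodel \
    --input_proto worker_safety_mobilenet.prototxt \
    --data_type FP32 --output_dir FP32
```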
77 |
78 | ### The Config File
79 |
80 | The _resources/config.json_ file contains the path to the video that will be used by the application as input.
81 |
82 | For example:
83 | ```
84 | {
85 | "inputs": [
86 | {
87 | "video":"path_to_video/video1.mp4",
88 | }
89 | ]
90 | }
91 | ```
92 |
93 | Replace `path_to_video/video1.mp4` with the path to your input video file.
94 |
95 | ### Which Input Video to use
96 |
97 | The application works with any input video. Sample videos are provided [here](https://github.com/intel-iot-devkit/sample-videos/).
98 |
99 | For first use, we recommend the *Safety_Full_Hat_and_Vest.mp4* video, which is present in the `resources/` directory.
100 |
101 | For example:
102 | ```
103 | {
104 | "inputs": [
105 | {
106 | "video":"sample-videos/Safety_Full_Hat_and_Vest.mp4"
107 | },
108 | {
109 | "video":"sample-videos/Safety_Full_Hat_and_Vest.mp4"
110 | }
111 | ]
112 | }
113 | ```
114 | To use any other video, provide its path in the config.json file.
115 |
116 | ### Using the Camera Stream instead of video
117 |
118 | Replace the video path with the camera ID in the config.json file, where the ID is taken from the video device (the number **X** in /dev/video**X**).
119 |
120 | On Ubuntu, to list all available video devices use the following command:
121 |
122 | ```
123 | ls /dev/video*
124 | ```
125 |
126 | For example, if the output of the above command is __/dev/video0__, then config.json would be:
127 |
128 | ```
129 | {
130 | "inputs": [
131 | {
132 | "video":"0"
133 | }
134 | ]
135 | }
136 | ```
137 |
138 | ### Setup the Environment
139 |
140 | Configure the environment to use the Intel® Distribution of OpenVINO™ toolkit by exporting environment variables:
141 |
142 | ```
143 | source /opt/intel/openvino/bin/setupvars.sh
144 | ```
145 |
146 | __Note__: This command needs to be executed only once in the terminal where the application will be executed. If the terminal is closed, the command needs to be executed again.
147 |
148 | ## Run the Application
149 |
150 | Change the current directory to the `application` directory of the cloned repository:
151 | ```
152 | cd safety-gear-detector-python/application
153 | ```
154 |
155 | To see a list of the various options:
156 | ```
157 | ./safety_gear_detector.py -h
158 | ```
159 | A user can specify the target device to run on using the `-d` command-line argument. If no target device is specified, the application runs on the CPU by default.
160 | To run with multiple devices use _-d MULTI:device1,device2_. For example: _-d MULTI:CPU,GPU,MYRIAD_
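A full invocation with multiple devices could look like the following (illustrative only; it assumes both the CPU and GPU plugins are configured):
```
./safety_gear_detector.py -d MULTI:CPU,GPU -m /opt/intel/openvino/deployment_tools/open_model_zoo/tools/downloader/intel/person-detection-retail-0013/FP32/person-detection-retail-0013.xml -sm ../resources/worker-safety-mobilenet/FP32/worker_safety_mobilenet.xml
```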
161 |
162 | ### Run on the CPU
163 |
164 | To run the application using the **worker_safety_mobilenet** model, use the `-sm` flag followed by the path to the worker_safety_mobilenet.xml file, as follows:
165 | ```
166 | ./safety_gear_detector.py -d CPU -m /opt/intel/openvino/deployment_tools/open_model_zoo/tools/downloader/intel/person-detection-retail-0013/FP32/person-detection-retail-0013.xml -sm ../resources/worker-safety-mobilenet/FP32/worker_safety_mobilenet.xml
167 | ```
168 | If the worker_safety_mobilenet model is not provided as a command-line argument, the application uses OpenCV HSV color thresholding to detect the safety jacket and hard-hat. To run the application without the worker_safety_mobilenet model:
169 | ```
170 | ./safety_gear_detector.py -d CPU -m /opt/intel/openvino/deployment_tools/open_model_zoo/tools/downloader/intel/person-detection-retail-0013/FP32/person-detection-retail-0013.xml
171 | ```
172 | **Note:** By default, the application runs in async mode. To run the application in sync mode, use ```-f sync``` as a command-line argument.
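For example, to run the CPU pipeline above in sync mode:
```
./safety_gear_detector.py -d CPU -m /opt/intel/openvino/deployment_tools/open_model_zoo/tools/downloader/intel/person-detection-retail-0013/FP32/person-detection-retail-0013.xml -sm ../resources/worker-safety-mobilenet/FP32/worker_safety_mobilenet.xml -f sync
```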
173 |
174 | ### Run on the Integrated GPU
175 | * To run on the integrated Intel GPU with floating point precision 32 (FP32), use the `-d GPU` command-line argument:
176 |
177 | **FP32:** FP32 is a single-precision floating-point format that uses 32 bits to represent a number: 1 bit for the sign, 8 bits for the exponent and 23 bits for the fraction. For more information, [click here](https://en.wikipedia.org/wiki/Single-precision_floating-point_format)
178 |
179 | ```
180 | ./safety_gear_detector.py -d GPU -m /opt/intel/openvino/deployment_tools/open_model_zoo/tools/downloader/intel/person-detection-retail-0013/FP32/person-detection-retail-0013.xml -sm ../resources/worker-safety-mobilenet/FP32/worker_safety_mobilenet.xml
181 | ```
182 | * To run on the integrated Intel® GPU with floating point precision 16 (FP16):
183 |
184 | **FP16:** FP16 is a half-precision floating-point format that uses 16 bits to represent a number: 1 bit for the sign, 5 bits for the exponent and 10 bits for the fraction. For more information, [click here](https://en.wikipedia.org/wiki/Half-precision_floating-point_format)
185 |
186 | ```
187 | ./safety_gear_detector.py -d GPU -m /opt/intel/openvino/deployment_tools/open_model_zoo/tools/downloader/intel/person-detection-retail-0013/FP16/person-detection-retail-0013.xml -sm ../resources/worker-safety-mobilenet/FP16/worker_safety_mobilenet.xml
188 | ```
189 | ### Run on the Intel® Neural Compute Stick
190 | To run on the Intel® Neural Compute Stick, use the `-d MYRIAD` command-line argument:
191 | ```
192 | ./safety_gear_detector.py -d MYRIAD -m /opt/intel/openvino/deployment_tools/open_model_zoo/tools/downloader/intel/person-detection-retail-0013/FP16/person-detection-retail-0013.xml -sm ../resources/worker-safety-mobilenet/FP16/worker_safety_mobilenet.xml
193 | ```
194 |
195 | ### Run on the Intel® Movidius™ VPU
196 | To run on the Intel® Movidius™ VPU, use the `-d HDDL` command-line argument:
197 | ```
198 | ./safety_gear_detector.py -m /opt/intel/openvino/deployment_tools/open_model_zoo/tools/downloader/intel/person-detection-retail-0013/FP16/person-detection-retail-0013.xml -sm ../resources/worker-safety-mobilenet/FP16/worker_safety_mobilenet.xml -d HDDL
199 | ```
200 | **Note:** The Intel® Movidius™ VPU can only run FP16 models. The model that is passed to the application through the `-m ` command-line argument must be of data type FP16.
201 |
202 |
240 |
--------------------------------------------------------------------------------
/application/inference.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python3
2 | """
3 | Copyright (c) 2018 Intel Corporation.
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining
6 | a copy of this software and associated documentation files (the
7 | "Software"), to deal in the Software without restriction, including
8 | without limitation the rights to use, copy, modify, merge, publish,
9 | distribute, sublicense, and/or sell copies of the Software, and to
10 | permit persons to whom the Software is furnished to do so, subject to
11 | the following conditions:
12 |
13 | The above copyright notice and this permission notice shall be
14 | included in all copies or substantial portions of the Software.
15 |
16 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
17 | EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
18 | MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
19 | NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
20 | LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
21 | OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
22 | WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
23 | """
24 |
25 | import os
26 | import sys
27 | import logging as log
28 | from openvino.inference_engine import IENetwork, IECore
29 |
30 |
31 | class Network:
32 | """
33 | Load and configure inference plugins for the specified target devices
34 | and performs synchronous and asynchronous modes for the specified infer requests.
35 | """
36 |
37 | def __init__(self):
38 | self.net = None
39 | self.plugin = None
40 | self.input_blob = None
41 | self.out_blob = None
42 | self.net_plugin = None
43 | self.infer_request_handle = None
44 |
45 | def load_model(self, model, device, input_size, output_size, num_requests, cpu_extension=None, plugin=None):
46 | """
47 | Loads a network and an image to the Inference Engine plugin.
48 | :param model: .xml file of pre trained model
49 | :param cpu_extension: extension for the CPU device
50 | :param device: Target device
51 | :param input_size: Number of input layers
52 | :param output_size: Number of output layers
53 | :param num_requests: Index of Infer request value. Limited to device capabilities.
54 | :param plugin: Plugin for specified device
55 | :return: Shape of input layer
56 | """
57 |
58 | model_xml = model
59 | model_bin = os.path.splitext(model_xml)[0] + ".bin"
60 | # Plugin initialization for specified device
61 | # and load extensions library if specified
62 | if not plugin:
63 | log.info("Initializing plugin for {} device...".format(device))
64 | self.plugin = IECore()
65 | else:
66 | self.plugin = plugin
67 |
68 | if cpu_extension and 'CPU' in device:
69 | self.plugin.add_extension(cpu_extension, "CPU")
70 |
71 | # Read IR
72 | log.info("Reading IR...")
73 | self.net = self.plugin.read_network(model=model_xml, weights=model_bin) #IENetwork(model=model_xml, weights=model_bin)
74 | log.info("Loading IR to the plugin...")
75 |
76 | if "CPU" in device:
77 | supported_layers = self.plugin.query_network(self.net, "CPU")
78 | not_supported_layers = \
79 | [l for l in self.net.layers.keys() if l not in supported_layers]
80 | if len(not_supported_layers) != 0:
81 | log.error("Following layers are not supported by "
82 | "the plugin for specified device {}:\n {}".
83 | format(device,
84 | ', '.join(not_supported_layers)))
85 | # log.error("Please try to specify cpu extensions library path"
86 | # " in command line parameters using -l "
87 | # "or --cpu_extension command line argument")
88 | sys.exit(1)
89 |
90 | if num_requests == 0:
91 | # Loads network read from IR to the plugin
92 | self.net_plugin = self.plugin.load_network(network=self.net, device_name=device)
93 | else:
94 | self.net_plugin = self.plugin.load_network(network=self.net, num_requests=num_requests, device_name=device)
95 | # log.error("num_requests != 0")
96 |
97 | self.input_blob = next(iter(self.net.inputs))
98 | self.out_blob = next(iter(self.net.outputs))
99 | assert len(self.net.inputs.keys()) == input_size, \
100 | "Supports only {} input topologies".format(len(self.net.inputs))
101 | assert len(self.net.outputs) == output_size, \
102 | "Supports only {} output topologies".format(len(self.net.outputs))
103 |
104 | return self.plugin, self.get_input_shape()
105 |
106 | def get_input_shape(self):
107 | """
108 | Gives the shape of the input layer of the network.
109 | :return: Shape of the input layer
110 | """
111 | return self.net.inputs[self.input_blob].shape
112 |
113 | def performance_counter(self, request_id):
114 | """
115 | Queries performance measures per layer to get feedback of what is the
116 | most time consuming layer.
117 | :param request_id: Index of Infer request value. Limited to device capabilities
118 | :return: Performance of the layer
119 | """
120 | perf_count = self.net_plugin.requests[request_id].get_perf_counts()
121 | return perf_count
122 |
123 | def exec_net(self, request_id, frame):
124 | """
125 | Starts asynchronous inference for specified request.
126 | :param request_id: Index of Infer request value. Limited to device capabilities.
127 | :param frame: Input image
128 | :return: Instance of Executable Network class
129 | """
130 | self.infer_request_handle = self.net_plugin.start_async(
131 | request_id=request_id, inputs={self.input_blob: frame})
132 | return self.net_plugin
133 |
134 | def wait(self, request_id):
135 | """
136 | Waits for the result to become available.
137 | :param request_id: Index of Infer request value. Limited to device capabilities.
138 | :return: Status of the inference request (0 when the result is ready)
139 | """
140 | wait_process = self.net_plugin.requests[request_id].wait(-1)
141 | return wait_process
142 |
143 | def get_output(self, request_id, output=None):
144 | """
145 | Gives a list of results for the output layer of the network.
146 | :param request_id: Index of Infer request value. Limited to device capabilities.
147 | :param output: Name of the output layer
148 | :return: Results for the specified request
149 | """
150 | if output:
151 | res = self.infer_request_handle.outputs[output]
152 | else:
153 | res = self.net_plugin.requests[request_id].outputs[self.out_blob]
154 | return res
155 |
156 | def clean(self):
157 | """
158 | Deletes all the instances
159 | :return: None
160 | """
161 | del self.net_plugin
162 | del self.plugin
163 | del self.net
164 |
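# Illustrative usage sketch (for reference; mirrors how the class is driven in
# application/safety_gear_detector.py):
#
#   net = Network()
#   plugin, (n, c, h, w) = net.load_model("person-detection-retail-0013.xml",
#                                         "CPU", 1, 1, 2)
#   net.exec_net(0, frame)                # frame reshaped to (n, c, h, w)
#   if net.wait(0) == 0:
#       detections = net.get_output(0)    # SSD-style [1, 1, N, 7] output
#   net.clean()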
--------------------------------------------------------------------------------
/application/safety_gear_detector.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python3
2 | """
3 | Copyright (c) 2018 Intel Corporation.
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining
6 | a copy of this software and associated documentation files (the
7 | "Software"), to deal in the Software without restriction, including
8 | without limitation the rights to use, copy, modify, merge, publish,
9 | distribute, sublicense, and/or sell copies of the Software, and to
10 | permit persons to whom the Software is furnished to do so, subject to
11 | the following conditions:
12 |
13 | The above copyright notice and this permission notice shall be
14 | included in all copies or substantial portions of the Software.
15 |
16 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
17 | EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
18 | MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
19 | NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
20 | LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
21 | OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
22 | WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
23 | """
24 |
25 | from __future__ import print_function
26 | import sys
27 | import os
28 | import cv2
29 | import numpy as np
30 | from argparse import ArgumentParser
31 | import datetime
32 | import json
33 | from inference import Network
34 |
35 | # Global vars
36 | cpu_extension = ''
37 | conf_modelLayers = ''
38 | conf_modelWeights = ''
39 | conf_safety_modelLayers = ''
40 | conf_safety_modelWeights = ''
41 | targetDevice = "CPU"
42 | conf_batchSize = 1
43 | conf_modelPersonLabel = 1
44 | conf_inferConfidenceThreshold = 0.7
45 | conf_inFrameViolationsThreshold = 19
46 | conf_inFramePeopleThreshold = 5
47 | use_safety_model = False
48 | padding = 30
49 | viol_wk = 0
50 | acceptedDevices = ['CPU', 'GPU', 'MYRIAD', 'HETERO:FPGA,CPU', 'HDDL']
51 | videos = []
52 | name_of_videos = []
53 | CONFIG_FILE = '../resources/config.json'
54 | is_async_mode = True
55 |
56 |
57 | class Video:
58 | def __init__(self, idx, path):
59 | if path.isnumeric():
60 | self.video = cv2.VideoCapture(int(path))
61 | self.name = "Cam " + str(idx)
62 | else:
63 | if os.path.exists(path):
64 | self.video = cv2.VideoCapture(path)
65 | self.name = "Video " + str(idx)
66 | else:
67 | print("Either wrong input path or empty line is found. Please check the conf.json file")
68 | exit(21)
69 | if not self.video.isOpened():
70 | print("Couldn't open video: " + path)
71 | sys.exit(20)
72 | self.height = int(self.video.get(cv2.CAP_PROP_FRAME_HEIGHT))
73 | self.width = int(self.video.get(cv2.CAP_PROP_FRAME_WIDTH))
74 |
75 | self.currentViolationCount = 0
76 | self.currentViolationCountConfidence = 0
77 | self.prevViolationCount = 0
78 | self.totalViolations = 0
79 | self.totalPeopleCount = 0
80 | self.currentPeopleCount = 0
81 | self.currentPeopleCountConfidence = 0
82 | self.prevPeopleCount = 0
83 | self.currentTotalPeopleCount = 0
84 |
85 | cv2.namedWindow(self.name, cv2.WINDOW_NORMAL)
86 | self.frame_start_time = datetime.datetime.now()
87 |
88 |
89 | def get_args():
90 | """
91 | Parses the command-line arguments.
92 | :return: None
93 | """
94 | global is_async_mode
95 | parser = ArgumentParser()
96 | parser.add_argument("-d", "--device",
97 | help="Specify the target device to infer on; CPU, GPU,"
98 | "FPGA, MYRIAD or HDDL is acceptable. Application will"
99 | "look for a suitable plugin for device specified"
100 | " (CPU by default)",
101 | type=str, required=False)
102 | parser.add_argument("-m", "--model",
103 | help="Path to an .xml file with a trained model's"
104 | " weights.",
105 | required=True, type=str)
106 | parser.add_argument("-sm", "--safety_model",
107 | help="Path to an .xml file with a trained model's"
108 | " weights.",
109 | required=False, type=str, default=None)
110 | parser.add_argument("-e", "--cpu_extension",
111 | help="MKLDNN (CPU)-targeted custom layers. Absolute "
112 | "path to a shared library with the kernels impl",
113 | type=str, default=None)
114 | parser.add_argument("-f", "--flag", help="sync or async", default="async", type=str)
115 |
116 | args = parser.parse_args()
117 |
118 | global conf_modelLayers, conf_modelWeights, conf_safety_modelLayers, conf_safety_modelWeights, \
119 | targetDevice, cpu_extension, videos, use_safety_model
120 | if args.model:
121 | conf_modelLayers = args.model
122 | conf_modelWeights = os.path.splitext(conf_modelLayers)[0] + ".bin"
123 | if args.safety_model:
124 | conf_safety_modelLayers = args.safety_model
125 | conf_safety_modelWeights = os.path.splitext(conf_safety_modelLayers)[0] + ".bin"
126 | use_safety_model = True
127 | if args.device:
128 | targetDevice = args.device
129 | if "MULTI:" not in targetDevice:
130 | if targetDevice not in acceptedDevices:
131 | print("Selected device, %s not supported." % (targetDevice))
132 | sys.exit(12)
133 | if args.cpu_extension:
134 | cpu_extension = args.cpu_extension
135 | if args.flag == "async":
136 | is_async_mode = True
137 | print('Application running in Async mode')
138 | else:
139 | is_async_mode = False
140 | print('Application running in Sync mode')
141 | assert os.path.isfile(CONFIG_FILE), "{} file doesn't exist".format(CONFIG_FILE)
142 | config = json.loads(open(CONFIG_FILE).read())
143 | for idx, item in enumerate(config['inputs']):
144 | vid = Video(idx, item['video'])
145 | name_of_videos.append([idx, item['video']])
146 | videos.append([idx, vid])
147 |
148 |
149 | def detect_safety_hat(img):
150 | """
151 | Detection of the hat of the person.
152 | :param img: Current frame
153 | :return: Boolean value of the detected hat
154 | """
155 | lowH = 15
156 | lowS = 65
157 | lowV = 75
158 |
159 | highH = 30
160 | highS = 255
161 | highV = 255
162 |
163 | crop = 0
164 | height = 15
165 | perc = 8
166 |
167 | hsv = np.zeros(1)
168 |
169 | try:
170 | hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
171 | except cv2.error as e:
172 | print("%d %d %d" % (img.shape))
173 | print("%d %d %d" % (img.shape))
174 | print(e)
175 |
176 | threshold_img = cv2.inRange(hsv, (lowH, lowS, lowV), (highH, highS, highV))
177 |
178 | x = 0
179 | y = int(threshold_img.shape[0] * crop / 100)
180 | w = int(threshold_img.shape[1])
181 | h = int(threshold_img.shape[0] * height / 100)
182 | img_cropped = threshold_img[y: y + h, x: x + w]
183 |
184 | if cv2.countNonZero(threshold_img) < img_cropped.size * perc / 100:
185 | return False
186 | return True
187 |
188 |
189 | def detect_safety_jacket(img):
190 | """
191 | Detection of the safety jacket of the person.
192 | :param img: Current frame
193 | :return: Boolean value of the detected jacket
194 | """
195 | lowH = 0
196 | lowS = 150
197 | lowV = 42
198 |
199 | highH = 11
200 | highS = 255
201 | highV = 255
202 |
203 | crop = 15
204 | height = 40
205 | perc = 23
206 |
207 | hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
208 |
209 | threshold_img = cv2.inRange(hsv, (lowH, lowS, lowV), (highH, highS, highV))
210 |
211 | x = 0
212 | y = int(threshold_img.shape[0] * crop / 100)
213 | w = int(threshold_img.shape[1])
214 | h = int(threshold_img.shape[0] * height / 100)
215 | img_cropped = threshold_img[y: y + h, x: x + w]
216 |
217 | if cv2.countNonZero(threshold_img) < img_cropped.size * perc / 100:
218 | return False
219 | return True
220 |
221 |
222 | def detect_workers(workers, frame):
223 | """
224 | Detection of the person with the safety guards.
225 | :param workers: Total number of the person in the current frame
226 | :param frame: Current frame
227 | :return: Total violation count of the person
228 | """
229 | violations = 0
230 | global viol_wk
231 | for worker in workers:
232 | xmin, ymin, xmax, ymax = worker
233 | crop = frame[ymin:ymax, xmin:xmax]
234 | if 0 not in crop.shape:
235 | if detect_safety_hat(crop):
236 | if detect_safety_jacket(crop):
237 | cv2.rectangle(frame, (xmin, ymin), (xmax, ymax),
238 | (0, 255, 0), 2)
239 | else:
240 | cv2.rectangle(frame, (xmin, ymin), (xmax, ymax),
241 | (0, 0, 255), 2)
242 | violations += 1
243 | viol_wk += 1
244 |
245 | else:
246 | cv2.rectangle(frame, (xmin, ymin), (xmax, ymax), (0, 0, 255), 2)
247 | violations += 1
248 | viol_wk += 1
249 | return violations
250 |
251 |
252 | def main():
253 | """
254 | Load the network and parse the output.
255 | :return: None
256 | """
257 | get_args()
258 | global is_async_mode
259 | nextReq = 1
260 | currReq = 0
261 | nextReq_s = 1
262 | currReq_s = 0
263 | prevVideo = None
264 | vid_finished = [False] * len(videos)
265 | min_FPS = min([videos[i][1].video.get(cv2.CAP_PROP_FPS) for i in range(len(videos))])
266 |
267 | # Initialise the class
268 | infer_network = Network()
269 | infer_network_safety = Network()
270 | # Load the network to IE plugin to get shape of input layer
271 | plugin, (batch_size, channels, model_height, model_width) = \
272 | infer_network.load_model(conf_modelLayers, targetDevice, 1, 1, 2, cpu_extension)
273 | if use_safety_model:
274 | batch_size_sm, channels_sm, model_height_sm, model_width_sm = \
275 | infer_network_safety.load_model(conf_safety_modelLayers, targetDevice, 1, 1, 2, cpu_extension, plugin)[1]
276 |
277 | while True:
278 | for index, currVideo in videos:
279 | # Read image from video/cam
280 | vfps = int(round(currVideo.video.get(cv2.CAP_PROP_FPS)))
281 | for i in range(0, int(round(vfps / min_FPS))):
282 | ret, current_img = currVideo.video.read()
283 | if not ret:
284 | vid_finished[index] = True
285 | break
286 | if vid_finished[index]:
287 | stream_end_frame = np.zeros((int(currVideo.height), int(currVideo.width), 1),
288 | dtype='uint8')
289 | cv2.putText(stream_end_frame, "Input file {} has ended".format
290 | (name_of_videos[index][1].split('/')[-1]) ,
291 | (10, int(currVideo.height/2)),
292 | cv2.FONT_HERSHEY_COMPLEX, 1, (255, 255, 255), 2)
293 | cv2.imshow(currVideo.name, stream_end_frame)
294 | continue
295 | # Transform image to person detection model input
296 | rsImg = cv2.resize(current_img, (model_width, model_height))
297 | rsImg = rsImg.transpose((2, 0, 1))
298 | rsImg = rsImg.reshape((batch_size, channels, model_height, model_width))
299 |
300 | infer_start_time = datetime.datetime.now()
301 | # Infer current image
302 | if is_async_mode:
303 | infer_network.exec_net(nextReq, rsImg)
304 | else:
305 | infer_network.exec_net(currReq, rsImg)
306 | prevVideo = currVideo
307 | previous_img = current_img
308 |
309 | # Wait for previous request to end
310 | if infer_network.wait(currReq) == 0:
311 | infer_end_time = (datetime.datetime.now() - infer_start_time) * 1000
312 |
313 | in_frame_workers = []
314 |
315 | people = 0
316 | violations = 0
317 | hard_hat_detection = False
318 | vest_detection = False
319 | result = infer_network.get_output(currReq)
320 | # Filter output
321 | for obj in result[0][0]:
322 | if obj[2] > conf_inferConfidenceThreshold:
323 | xmin = int(obj[3] * prevVideo.width)
324 | ymin = int(obj[4] * prevVideo.height)
325 | xmax = int(obj[5] * prevVideo.width)
326 | ymax = int(obj[6] * prevVideo.height)
327 | xmin = int(xmin - padding) if (xmin - padding) > 0 else 0
328 | ymin = int(ymin - padding) if (ymin - padding) > 0 else 0
329 | xmax = int(xmax + padding) if (xmax + padding) < prevVideo.width else prevVideo.width
330 | ymax = int(ymax + padding) if (ymax + padding) < prevVideo.height else prevVideo.height
331 | cv2.rectangle(previous_img, (xmin, ymin), (xmax, ymax), (0, 255, 0), 2)
332 | people += 1
333 | in_frame_workers.append((xmin, ymin, xmax, ymax))
334 | new_frame = previous_img[ymin:ymax, xmin:xmax]
335 | if use_safety_model:
336 |
337 | # Transform image to safety model input
338 | in_frame_sm = cv2.resize(new_frame, (model_width_sm, model_height_sm))
339 | in_frame_sm = in_frame_sm.transpose((2, 0, 1))
340 | in_frame_sm = in_frame_sm.reshape((batch_size_sm, channels_sm, model_height_sm, model_width_sm))
341 |
342 | infer_start_time_sm = datetime.datetime.now()
343 | if is_async_mode:
344 | infer_network_safety.exec_net(nextReq_s, in_frame_sm)
345 | else:
346 | infer_network_safety.exec_net(currReq_s, in_frame_sm)
347 | # Wait for the result
348 | infer_network_safety.wait(currReq_s)
349 | infer_end_time_sm = (datetime.datetime.now() - infer_start_time_sm) * 1000
350 |
351 | result_sm = infer_network_safety.get_output(currReq_s)
352 | # Filter output
353 | hard_hat_detection = False
354 | vest_detection = False
355 | detection_list = []
356 | for obj_sm in result_sm[0][0]:
357 |
358 | if (obj_sm[2] > 0.4):
359 | # Detect safety vest
360 | if (int(obj_sm[1])) == 2:
361 | xmin_sm = int(obj_sm[3] * (xmax-xmin))
362 | ymin_sm = int(obj_sm[4] * (ymax-ymin))
363 | xmax_sm = int(obj_sm[5] * (xmax-xmin))
364 | ymax_sm = int(obj_sm[6] * (ymax-ymin))
365 | if vest_detection == False:
366 | detection_list.append([xmin_sm+xmin, ymin_sm+ymin, xmax_sm+xmin, ymax_sm+ymin])
367 | vest_detection = True
368 |
369 | # Detect hard-hat
370 | if int(obj_sm[1]) == 4:
371 | xmin_sm_v = int(obj_sm[3] * (xmax-xmin))
372 | ymin_sm_v = int(obj_sm[4] * (ymax-ymin))
373 | xmax_sm_v = int(obj_sm[5] * (xmax-xmin))
374 | ymax_sm_v = int(obj_sm[6] * (ymax-ymin))
375 | if hard_hat_detection == False:
376 | detection_list.append([xmin_sm_v+xmin, ymin_sm_v+ymin, xmax_sm_v+xmin, ymax_sm_v+ymin])
377 | hard_hat_detection = True
378 |
379 | if hard_hat_detection is False or vest_detection is False:
380 | violations += 1
381 | for _rect in detection_list:
382 | cv2.rectangle(current_img, (_rect[0] , _rect[1]), (_rect[2] , _rect[3]), (0, 255, 0), 2)
383 | if is_async_mode:
384 | currReq_s, nextReq_s = nextReq_s, currReq_s
385 |
386 | # Use OpenCV if worker-safety-model is not provided
387 | else:
388 | violations = detect_workers(in_frame_workers, previous_img)
389 |
390 | # Check if detected violations equals previous frames
391 | if violations == prevVideo.currentViolationCount:
392 | prevVideo.currentViolationCountConfidence += 1
393 |
394 | # If frame threshold is reached, change validated count
395 | if prevVideo.currentViolationCountConfidence == conf_inFrameViolationsThreshold:
396 |
397 | # If another violation occurred, save image
398 | if prevVideo.currentViolationCount > prevVideo.prevViolationCount:
399 | prevVideo.totalViolations += (prevVideo.currentViolationCount - prevVideo.prevViolationCount)
400 | prevVideo.prevViolationCount = prevVideo.currentViolationCount
401 | else:
402 | prevVideo.currentViolationCountConfidence = 0
403 | prevVideo.currentViolationCount = violations
404 |
405 | # Check if detected people count equals previous frames
406 | if people == prevVideo.currentPeopleCount:
407 | prevVideo.currentPeopleCountConfidence += 1
408 |
409 | # If frame threshold is reached, change validated count
410 | if prevVideo.currentPeopleCountConfidence == conf_inFrameViolationsThreshold:
411 | prevVideo.currentTotalPeopleCount += (
412 | prevVideo.currentPeopleCount - prevVideo.prevPeopleCount)
413 | if prevVideo.currentTotalPeopleCount > prevVideo.prevPeopleCount:
414 | prevVideo.totalPeopleCount += prevVideo.currentTotalPeopleCount - prevVideo.prevPeopleCount
415 | prevVideo.prevPeopleCount = prevVideo.currentPeopleCount
416 | else:
417 | prevVideo.currentPeopleCountConfidence = 0
418 | prevVideo.currentPeopleCount = people
419 |
420 |
421 |
422 | frame_end_time = datetime.datetime.now()
423 | cv2.putText(previous_img, 'Total people count: ' + str(
424 | prevVideo.totalPeopleCount), (10, prevVideo.height - 10),
425 | cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2)
426 | cv2.putText(previous_img, 'Current people count: ' + str(
427 | prevVideo.currentTotalPeopleCount),
428 | (10, prevVideo.height - 40),
429 | cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2)
430 | cv2.putText(previous_img, 'Total violation count: ' + str(
431 | prevVideo.totalViolations), (10, prevVideo.height - 70),
432 | cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2)
433 | cv2.putText(previous_img, 'FPS: %0.2fs' % (1 / (
434 | frame_end_time - prevVideo.frame_start_time).total_seconds()),
435 | (10, prevVideo.height - 100),
436 | cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2)
437 | cv2.putText(previous_img, "Inference time: N\A for async mode" if is_async_mode else\
438 | "Inference time: {:.3f} ms".format((infer_end_time).total_seconds()),
439 | (10, prevVideo.height - 130),
440 | cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2)
441 |
442 | cv2.imshow(prevVideo.name, previous_img)
443 | prevVideo.frame_start_time = datetime.datetime.now()
444 | # Swap
445 | if is_async_mode:
446 | currReq, nextReq = nextReq, currReq
447 | previous_img = current_img
448 | prevVideo = currVideo
449 | if cv2.waitKey(1) == 27:
450 | print("Attempting to stop input files")
451 | infer_network.clean()
452 | infer_network_safety.clean()
453 | cv2.destroyAllWindows()
454 | return
455 |
456 | if False not in vid_finished:
457 | infer_network.clean()
458 | infer_network_safety.clean()
459 | cv2.destroyAllWindows()
460 | break
461 |
462 |
463 | if __name__ == '__main__':
464 | main()
465 |
--------------------------------------------------------------------------------
/docs/images/archdia.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/intel-iot-devkit/safety-gear-detector-python/f631969dc9fd916c365ab05fdce321d81eca26d8/docs/images/archdia.png
--------------------------------------------------------------------------------
/docs/images/jupy1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/intel-iot-devkit/safety-gear-detector-python/f631969dc9fd916c365ab05fdce321d81eca26d8/docs/images/jupy1.png
--------------------------------------------------------------------------------
/docs/images/jupy2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/intel-iot-devkit/safety-gear-detector-python/f631969dc9fd916c365ab05fdce321d81eca26d8/docs/images/jupy2.png
--------------------------------------------------------------------------------
/docs/images/safetygear.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/intel-iot-devkit/safety-gear-detector-python/f631969dc9fd916c365ab05fdce321d81eca26d8/docs/images/safetygear.png
--------------------------------------------------------------------------------
/resources/Safety_Full_Hat_and_Vest.mp4:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/intel-iot-devkit/safety-gear-detector-python/f631969dc9fd916c365ab05fdce321d81eca26d8/resources/Safety_Full_Hat_and_Vest.mp4
--------------------------------------------------------------------------------
/resources/config.json:
--------------------------------------------------------------------------------
1 | {
2 | "inputs":[
3 | {
4 | "video":"../resources/Safety_Full_Hat_and_Vest.mp4"
5 | },
6 | {
7 | "video":"../resources/Safety_Full_Hat_and_Vest.mp4"
8 | }
9 | ]
10 | }
11 |
12 |
--------------------------------------------------------------------------------
/resources/worker-safety-mobilenet/worker_safety_mobilenet.caffemodel:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/intel-iot-devkit/safety-gear-detector-python/f631969dc9fd916c365ab05fdce321d81eca26d8/resources/worker-safety-mobilenet/worker_safety_mobilenet.caffemodel
--------------------------------------------------------------------------------
/resources/worker-safety-mobilenet/worker_safety_mobilenet.prototxt:
--------------------------------------------------------------------------------
1 | name: "MobileNet-SSD"
2 | input: "data"
3 | input_shape {
4 | dim: 1
5 | dim: 3
6 | dim: 224
7 | dim: 224
8 | }
9 | layer {
10 | name: "conv0"
11 | type: "Convolution"
12 | bottom: "data"
13 | top: "conv0"
14 | param {
15 | lr_mult: 0.1
16 | decay_mult: 0.1
17 | }
18 | convolution_param {
19 | num_output: 32
20 | bias_term: false
21 | pad: 1
22 | kernel_size: 3
23 | stride: 2
24 | weight_filler {
25 | type: "msra"
26 | }
27 | }
28 | }
29 | layer {
30 | name: "conv0/bn"
31 | type: "BatchNorm"
32 | bottom: "conv0"
33 | top: "conv0"
34 | param {
35 | lr_mult: 0
36 | decay_mult: 0
37 | }
38 | param {
39 | lr_mult: 0
40 | decay_mult: 0
41 | }
42 | param {
43 | lr_mult: 0
44 | decay_mult: 0
45 | }
46 | }
47 | layer {
48 | name: "conv0/scale"
49 | type: "Scale"
50 | bottom: "conv0"
51 | top: "conv0"
52 | param {
53 | lr_mult: 0.1
54 | decay_mult: 0.0
55 | }
56 | param {
57 | lr_mult: 0.2
58 | decay_mult: 0.0
59 | }
60 | scale_param {
61 | filler {
62 | value: 1
63 | }
64 | bias_term: true
65 | bias_filler {
66 | value: 0
67 | }
68 | }
69 | }
70 | layer {
71 | name: "conv0/relu"
72 | type: "ReLU"
73 | bottom: "conv0"
74 | top: "conv0"
75 | }
76 | layer {
77 | name: "conv1/dw"
78 | type: "Convolution"
79 | bottom: "conv0"
80 | top: "conv1/dw"
81 | param {
82 | lr_mult: 0.1
83 | decay_mult: 0.1
84 | }
85 | convolution_param {
86 | num_output: 32
87 | bias_term: false
88 | pad: 1
89 | kernel_size: 3
90 | group: 32
91 | engine: CAFFE
92 | weight_filler {
93 | type: "msra"
94 | }
95 | }
96 | }
97 | layer {
98 | name: "conv1/dw/bn"
99 | type: "BatchNorm"
100 | bottom: "conv1/dw"
101 | top: "conv1/dw"
102 | param {
103 | lr_mult: 0
104 | decay_mult: 0
105 | }
106 | param {
107 | lr_mult: 0
108 | decay_mult: 0
109 | }
110 | param {
111 | lr_mult: 0
112 | decay_mult: 0
113 | }
114 | }
115 | layer {
116 | name: "conv1/dw/scale"
117 | type: "Scale"
118 | bottom: "conv1/dw"
119 | top: "conv1/dw"
120 | param {
121 | lr_mult: 0.1
122 | decay_mult: 0.0
123 | }
124 | param {
125 | lr_mult: 0.2
126 | decay_mult: 0.0
127 | }
128 | scale_param {
129 | filler {
130 | value: 1
131 | }
132 | bias_term: true
133 | bias_filler {
134 | value: 0
135 | }
136 | }
137 | }
138 | layer {
139 | name: "conv1/dw/relu"
140 | type: "ReLU"
141 | bottom: "conv1/dw"
142 | top: "conv1/dw"
143 | }
144 | layer {
145 | name: "conv1"
146 | type: "Convolution"
147 | bottom: "conv1/dw"
148 | top: "conv1"
149 | param {
150 | lr_mult: 0.1
151 | decay_mult: 0.1
152 | }
153 | convolution_param {
154 | num_output: 64
155 | bias_term: false
156 | kernel_size: 1
157 | weight_filler {
158 | type: "msra"
159 | }
160 | }
161 | }
162 | layer {
163 | name: "conv1/bn"
164 | type: "BatchNorm"
165 | bottom: "conv1"
166 | top: "conv1"
167 | param {
168 | lr_mult: 0
169 | decay_mult: 0
170 | }
171 | param {
172 | lr_mult: 0
173 | decay_mult: 0
174 | }
175 | param {
176 | lr_mult: 0
177 | decay_mult: 0
178 | }
179 | }
180 | layer {
181 | name: "conv1/scale"
182 | type: "Scale"
183 | bottom: "conv1"
184 | top: "conv1"
185 | param {
186 | lr_mult: 0.1
187 | decay_mult: 0.0
188 | }
189 | param {
190 | lr_mult: 0.2
191 | decay_mult: 0.0
192 | }
193 | scale_param {
194 | filler {
195 | value: 1
196 | }
197 | bias_term: true
198 | bias_filler {
199 | value: 0
200 | }
201 | }
202 | }
203 | layer {
204 | name: "conv1/relu"
205 | type: "ReLU"
206 | bottom: "conv1"
207 | top: "conv1"
208 | }
209 | layer {
210 | name: "conv2/dw"
211 | type: "Convolution"
212 | bottom: "conv1"
213 | top: "conv2/dw"
214 | param {
215 | lr_mult: 0.1
216 | decay_mult: 0.1
217 | }
218 | convolution_param {
219 | num_output: 64
220 | bias_term: false
221 | pad: 1
222 | kernel_size: 3
223 | stride: 2
224 | group: 64
225 | engine: CAFFE
226 | weight_filler {
227 | type: "msra"
228 | }
229 | }
230 | }
231 | layer {
232 | name: "conv2/dw/bn"
233 | type: "BatchNorm"
234 | bottom: "conv2/dw"
235 | top: "conv2/dw"
236 | param {
237 | lr_mult: 0
238 | decay_mult: 0
239 | }
240 | param {
241 | lr_mult: 0
242 | decay_mult: 0
243 | }
244 | param {
245 | lr_mult: 0
246 | decay_mult: 0
247 | }
248 | }
249 | layer {
250 | name: "conv2/dw/scale"
251 | type: "Scale"
252 | bottom: "conv2/dw"
253 | top: "conv2/dw"
254 | param {
255 | lr_mult: 0.1
256 | decay_mult: 0.0
257 | }
258 | param {
259 | lr_mult: 0.2
260 | decay_mult: 0.0
261 | }
262 | scale_param {
263 | filler {
264 | value: 1
265 | }
266 | bias_term: true
267 | bias_filler {
268 | value: 0
269 | }
270 | }
271 | }
272 | layer {
273 | name: "conv2/dw/relu"
274 | type: "ReLU"
275 | bottom: "conv2/dw"
276 | top: "conv2/dw"
277 | }
278 | layer {
279 | name: "conv2"
280 | type: "Convolution"
281 | bottom: "conv2/dw"
282 | top: "conv2"
283 | param {
284 | lr_mult: 0.1
285 | decay_mult: 0.1
286 | }
287 | convolution_param {
288 | num_output: 128
289 | bias_term: false
290 | kernel_size: 1
291 | weight_filler {
292 | type: "msra"
293 | }
294 | }
295 | }
296 | layer {
297 | name: "conv2/bn"
298 | type: "BatchNorm"
299 | bottom: "conv2"
300 | top: "conv2"
301 | param {
302 | lr_mult: 0
303 | decay_mult: 0
304 | }
305 | param {
306 | lr_mult: 0
307 | decay_mult: 0
308 | }
309 | param {
310 | lr_mult: 0
311 | decay_mult: 0
312 | }
313 | }
314 | layer {
315 | name: "conv2/scale"
316 | type: "Scale"
317 | bottom: "conv2"
318 | top: "conv2"
319 | param {
320 | lr_mult: 0.1
321 | decay_mult: 0.0
322 | }
323 | param {
324 | lr_mult: 0.2
325 | decay_mult: 0.0
326 | }
327 | scale_param {
328 | filler {
329 | value: 1
330 | }
331 | bias_term: true
332 | bias_filler {
333 | value: 0
334 | }
335 | }
336 | }
337 | layer {
338 | name: "conv2/relu"
339 | type: "ReLU"
340 | bottom: "conv2"
341 | top: "conv2"
342 | }
343 | layer {
344 | name: "conv3/dw"
345 | type: "Convolution"
346 | bottom: "conv2"
347 | top: "conv3/dw"
348 | param {
349 | lr_mult: 0.1
350 | decay_mult: 0.1
351 | }
352 | convolution_param {
353 | num_output: 128
354 | bias_term: false
355 | pad: 1
356 | kernel_size: 3
357 | group: 128
358 | engine: CAFFE
359 | weight_filler {
360 | type: "msra"
361 | }
362 | }
363 | }
364 | layer {
365 | name: "conv3/dw/bn"
366 | type: "BatchNorm"
367 | bottom: "conv3/dw"
368 | top: "conv3/dw"
369 | param {
370 | lr_mult: 0
371 | decay_mult: 0
372 | }
373 | param {
374 | lr_mult: 0
375 | decay_mult: 0
376 | }
377 | param {
378 | lr_mult: 0
379 | decay_mult: 0
380 | }
381 | }
382 | layer {
383 | name: "conv3/dw/scale"
384 | type: "Scale"
385 | bottom: "conv3/dw"
386 | top: "conv3/dw"
387 | param {
388 | lr_mult: 0.1
389 | decay_mult: 0.0
390 | }
391 | param {
392 | lr_mult: 0.2
393 | decay_mult: 0.0
394 | }
395 | scale_param {
396 | filler {
397 | value: 1
398 | }
399 | bias_term: true
400 | bias_filler {
401 | value: 0
402 | }
403 | }
404 | }
405 | layer {
406 | name: "conv3/dw/relu"
407 | type: "ReLU"
408 | bottom: "conv3/dw"
409 | top: "conv3/dw"
410 | }
411 | layer {
412 | name: "conv3"
413 | type: "Convolution"
414 | bottom: "conv3/dw"
415 | top: "conv3"
416 | param {
417 | lr_mult: 0.1
418 | decay_mult: 0.1
419 | }
420 | convolution_param {
421 | num_output: 128
422 | bias_term: false
423 | kernel_size: 1
424 | weight_filler {
425 | type: "msra"
426 | }
427 | }
428 | }
429 | layer {
430 | name: "conv3/bn"
431 | type: "BatchNorm"
432 | bottom: "conv3"
433 | top: "conv3"
434 | param {
435 | lr_mult: 0
436 | decay_mult: 0
437 | }
438 | param {
439 | lr_mult: 0
440 | decay_mult: 0
441 | }
442 | param {
443 | lr_mult: 0
444 | decay_mult: 0
445 | }
446 | }
447 | layer {
448 | name: "conv3/scale"
449 | type: "Scale"
450 | bottom: "conv3"
451 | top: "conv3"
452 | param {
453 | lr_mult: 0.1
454 | decay_mult: 0.0
455 | }
456 | param {
457 | lr_mult: 0.2
458 | decay_mult: 0.0
459 | }
460 | scale_param {
461 | filler {
462 | value: 1
463 | }
464 | bias_term: true
465 | bias_filler {
466 | value: 0
467 | }
468 | }
469 | }
470 | layer {
471 | name: "conv3/relu"
472 | type: "ReLU"
473 | bottom: "conv3"
474 | top: "conv3"
475 | }
476 | layer {
477 | name: "conv4/dw"
478 | type: "Convolution"
479 | bottom: "conv3"
480 | top: "conv4/dw"
481 | param {
482 | lr_mult: 0.1
483 | decay_mult: 0.1
484 | }
485 | convolution_param {
486 | num_output: 128
487 | bias_term: false
488 | pad: 1
489 | kernel_size: 3
490 | stride: 2
491 | group: 128
492 | engine: CAFFE
493 | weight_filler {
494 | type: "msra"
495 | }
496 | }
497 | }
498 | layer {
499 | name: "conv4/dw/bn"
500 | type: "BatchNorm"
501 | bottom: "conv4/dw"
502 | top: "conv4/dw"
503 | param {
504 | lr_mult: 0
505 | decay_mult: 0
506 | }
507 | param {
508 | lr_mult: 0
509 | decay_mult: 0
510 | }
511 | param {
512 | lr_mult: 0
513 | decay_mult: 0
514 | }
515 | }
516 | layer {
517 | name: "conv4/dw/scale"
518 | type: "Scale"
519 | bottom: "conv4/dw"
520 | top: "conv4/dw"
521 | param {
522 | lr_mult: 0.1
523 | decay_mult: 0.0
524 | }
525 | param {
526 | lr_mult: 0.2
527 | decay_mult: 0.0
528 | }
529 | scale_param {
530 | filler {
531 | value: 1
532 | }
533 | bias_term: true
534 | bias_filler {
535 | value: 0
536 | }
537 | }
538 | }
539 | layer {
540 | name: "conv4/dw/relu"
541 | type: "ReLU"
542 | bottom: "conv4/dw"
543 | top: "conv4/dw"
544 | }
545 | layer {
546 | name: "conv4"
547 | type: "Convolution"
548 | bottom: "conv4/dw"
549 | top: "conv4"
550 | param {
551 | lr_mult: 0.1
552 | decay_mult: 0.1
553 | }
554 | convolution_param {
555 | num_output: 256
556 | bias_term: false
557 | kernel_size: 1
558 | weight_filler {
559 | type: "msra"
560 | }
561 | }
562 | }
563 | layer {
564 | name: "conv4/bn"
565 | type: "BatchNorm"
566 | bottom: "conv4"
567 | top: "conv4"
568 | param {
569 | lr_mult: 0
570 | decay_mult: 0
571 | }
572 | param {
573 | lr_mult: 0
574 | decay_mult: 0
575 | }
576 | param {
577 | lr_mult: 0
578 | decay_mult: 0
579 | }
580 | }
581 | layer {
582 | name: "conv4/scale"
583 | type: "Scale"
584 | bottom: "conv4"
585 | top: "conv4"
586 | param {
587 | lr_mult: 0.1
588 | decay_mult: 0.0
589 | }
590 | param {
591 | lr_mult: 0.2
592 | decay_mult: 0.0
593 | }
594 | scale_param {
595 | filler {
596 | value: 1
597 | }
598 | bias_term: true
599 | bias_filler {
600 | value: 0
601 | }
602 | }
603 | }
604 | layer {
605 | name: "conv4/relu"
606 | type: "ReLU"
607 | bottom: "conv4"
608 | top: "conv4"
609 | }
610 | layer {
611 | name: "conv5/dw"
612 | type: "Convolution"
613 | bottom: "conv4"
614 | top: "conv5/dw"
615 | param {
616 | lr_mult: 0.1
617 | decay_mult: 0.1
618 | }
619 | convolution_param {
620 | num_output: 256
621 | bias_term: false
622 | pad: 1
623 | kernel_size: 3
624 | group: 256
625 | engine: CAFFE
626 | weight_filler {
627 | type: "msra"
628 | }
629 | }
630 | }
631 | layer {
632 | name: "conv5/dw/bn"
633 | type: "BatchNorm"
634 | bottom: "conv5/dw"
635 | top: "conv5/dw"
636 | param {
637 | lr_mult: 0
638 | decay_mult: 0
639 | }
640 | param {
641 | lr_mult: 0
642 | decay_mult: 0
643 | }
644 | param {
645 | lr_mult: 0
646 | decay_mult: 0
647 | }
648 | }
649 | layer {
650 | name: "conv5/dw/scale"
651 | type: "Scale"
652 | bottom: "conv5/dw"
653 | top: "conv5/dw"
654 | param {
655 | lr_mult: 0.1
656 | decay_mult: 0.0
657 | }
658 | param {
659 | lr_mult: 0.2
660 | decay_mult: 0.0
661 | }
662 | scale_param {
663 | filler {
664 | value: 1
665 | }
666 | bias_term: true
667 | bias_filler {
668 | value: 0
669 | }
670 | }
671 | }
672 | layer {
673 | name: "conv5/dw/relu"
674 | type: "ReLU"
675 | bottom: "conv5/dw"
676 | top: "conv5/dw"
677 | }
678 | layer {
679 | name: "conv5"
680 | type: "Convolution"
681 | bottom: "conv5/dw"
682 | top: "conv5"
683 | param {
684 | lr_mult: 0.1
685 | decay_mult: 0.1
686 | }
687 | convolution_param {
688 | num_output: 256
689 | bias_term: false
690 | kernel_size: 1
691 | weight_filler {
692 | type: "msra"
693 | }
694 | }
695 | }
696 | layer {
697 | name: "conv5/bn"
698 | type: "BatchNorm"
699 | bottom: "conv5"
700 | top: "conv5"
701 | param {
702 | lr_mult: 0
703 | decay_mult: 0
704 | }
705 | param {
706 | lr_mult: 0
707 | decay_mult: 0
708 | }
709 | param {
710 | lr_mult: 0
711 | decay_mult: 0
712 | }
713 | }
714 | layer {
715 | name: "conv5/scale"
716 | type: "Scale"
717 | bottom: "conv5"
718 | top: "conv5"
719 | param {
720 | lr_mult: 0.1
721 | decay_mult: 0.0
722 | }
723 | param {
724 | lr_mult: 0.2
725 | decay_mult: 0.0
726 | }
727 | scale_param {
728 | filler {
729 | value: 1
730 | }
731 | bias_term: true
732 | bias_filler {
733 | value: 0
734 | }
735 | }
736 | }
737 | layer {
738 | name: "conv5/relu"
739 | type: "ReLU"
740 | bottom: "conv5"
741 | top: "conv5"
742 | }
743 | layer {
744 | name: "conv6/dw"
745 | type: "Convolution"
746 | bottom: "conv5"
747 | top: "conv6/dw"
748 | param {
749 | lr_mult: 0.1
750 | decay_mult: 0.1
751 | }
752 | convolution_param {
753 | num_output: 256
754 | bias_term: false
755 | pad: 1
756 | kernel_size: 3
757 | stride: 2
758 | group: 256
759 | engine: CAFFE
760 | weight_filler {
761 | type: "msra"
762 | }
763 | }
764 | }
765 | layer {
766 | name: "conv6/dw/bn"
767 | type: "BatchNorm"
768 | bottom: "conv6/dw"
769 | top: "conv6/dw"
770 | param {
771 | lr_mult: 0
772 | decay_mult: 0
773 | }
774 | param {
775 | lr_mult: 0
776 | decay_mult: 0
777 | }
778 | param {
779 | lr_mult: 0
780 | decay_mult: 0
781 | }
782 | }
783 | layer {
784 | name: "conv6/dw/scale"
785 | type: "Scale"
786 | bottom: "conv6/dw"
787 | top: "conv6/dw"
788 | param {
789 | lr_mult: 0.1
790 | decay_mult: 0.0
791 | }
792 | param {
793 | lr_mult: 0.2
794 | decay_mult: 0.0
795 | }
796 | scale_param {
797 | filler {
798 | value: 1
799 | }
800 | bias_term: true
801 | bias_filler {
802 | value: 0
803 | }
804 | }
805 | }
806 | layer {
807 | name: "conv6/dw/relu"
808 | type: "ReLU"
809 | bottom: "conv6/dw"
810 | top: "conv6/dw"
811 | }
812 | layer {
813 | name: "conv6"
814 | type: "Convolution"
815 | bottom: "conv6/dw"
816 | top: "conv6"
817 | param {
818 | lr_mult: 0.1
819 | decay_mult: 0.1
820 | }
821 | convolution_param {
822 | num_output: 512
823 | bias_term: false
824 | kernel_size: 1
825 | weight_filler {
826 | type: "msra"
827 | }
828 | }
829 | }
830 | layer {
831 | name: "conv6/bn"
832 | type: "BatchNorm"
833 | bottom: "conv6"
834 | top: "conv6"
835 | param {
836 | lr_mult: 0
837 | decay_mult: 0
838 | }
839 | param {
840 | lr_mult: 0
841 | decay_mult: 0
842 | }
843 | param {
844 | lr_mult: 0
845 | decay_mult: 0
846 | }
847 | }
848 | layer {
849 | name: "conv6/scale"
850 | type: "Scale"
851 | bottom: "conv6"
852 | top: "conv6"
853 | param {
854 | lr_mult: 0.1
855 | decay_mult: 0.0
856 | }
857 | param {
858 | lr_mult: 0.2
859 | decay_mult: 0.0
860 | }
861 | scale_param {
862 | filler {
863 | value: 1
864 | }
865 | bias_term: true
866 | bias_filler {
867 | value: 0
868 | }
869 | }
870 | }
871 | layer {
872 | name: "conv6/relu"
873 | type: "ReLU"
874 | bottom: "conv6"
875 | top: "conv6"
876 | }
877 | layer {
878 | name: "conv7/dw"
879 | type: "Convolution"
880 | bottom: "conv6"
881 | top: "conv7/dw"
882 | param {
883 | lr_mult: 0.1
884 | decay_mult: 0.1
885 | }
886 | convolution_param {
887 | num_output: 512
888 | bias_term: false
889 | pad: 1
890 | kernel_size: 3
891 | group: 512
892 | engine: CAFFE
893 | weight_filler {
894 | type: "msra"
895 | }
896 | }
897 | }
898 | layer {
899 | name: "conv7/dw/bn"
900 | type: "BatchNorm"
901 | bottom: "conv7/dw"
902 | top: "conv7/dw"
903 | param {
904 | lr_mult: 0
905 | decay_mult: 0
906 | }
907 | param {
908 | lr_mult: 0
909 | decay_mult: 0
910 | }
911 | param {
912 | lr_mult: 0
913 | decay_mult: 0
914 | }
915 | }
916 | layer {
917 | name: "conv7/dw/scale"
918 | type: "Scale"
919 | bottom: "conv7/dw"
920 | top: "conv7/dw"
921 | param {
922 | lr_mult: 0.1
923 | decay_mult: 0.0
924 | }
925 | param {
926 | lr_mult: 0.2
927 | decay_mult: 0.0
928 | }
929 | scale_param {
930 | filler {
931 | value: 1
932 | }
933 | bias_term: true
934 | bias_filler {
935 | value: 0
936 | }
937 | }
938 | }
939 | layer {
940 | name: "conv7/dw/relu"
941 | type: "ReLU"
942 | bottom: "conv7/dw"
943 | top: "conv7/dw"
944 | }
945 | layer {
946 | name: "conv7"
947 | type: "Convolution"
948 | bottom: "conv7/dw"
949 | top: "conv7"
950 | param {
951 | lr_mult: 0.1
952 | decay_mult: 0.1
953 | }
954 | convolution_param {
955 | num_output: 512
956 | bias_term: false
957 | kernel_size: 1
958 | weight_filler {
959 | type: "msra"
960 | }
961 | }
962 | }
963 | layer {
964 | name: "conv7/bn"
965 | type: "BatchNorm"
966 | bottom: "conv7"
967 | top: "conv7"
968 | param {
969 | lr_mult: 0
970 | decay_mult: 0
971 | }
972 | param {
973 | lr_mult: 0
974 | decay_mult: 0
975 | }
976 | param {
977 | lr_mult: 0
978 | decay_mult: 0
979 | }
980 | }
981 | layer {
982 | name: "conv7/scale"
983 | type: "Scale"
984 | bottom: "conv7"
985 | top: "conv7"
986 | param {
987 | lr_mult: 0.1
988 | decay_mult: 0.0
989 | }
990 | param {
991 | lr_mult: 0.2
992 | decay_mult: 0.0
993 | }
994 | scale_param {
995 | filler {
996 | value: 1
997 | }
998 | bias_term: true
999 | bias_filler {
1000 | value: 0
1001 | }
1002 | }
1003 | }
1004 | layer {
1005 | name: "conv7/relu"
1006 | type: "ReLU"
1007 | bottom: "conv7"
1008 | top: "conv7"
1009 | }
1010 | layer {
1011 | name: "conv8/dw"
1012 | type: "Convolution"
1013 | bottom: "conv7"
1014 | top: "conv8/dw"
1015 | param {
1016 | lr_mult: 0.1
1017 | decay_mult: 0.1
1018 | }
1019 | convolution_param {
1020 | num_output: 512
1021 | bias_term: false
1022 | pad: 1
1023 | kernel_size: 3
1024 | group: 512
1025 | engine: CAFFE
1026 | weight_filler {
1027 | type: "msra"
1028 | }
1029 | }
1030 | }
1031 | layer {
1032 | name: "conv8/dw/bn"
1033 | type: "BatchNorm"
1034 | bottom: "conv8/dw"
1035 | top: "conv8/dw"
1036 | param {
1037 | lr_mult: 0
1038 | decay_mult: 0
1039 | }
1040 | param {
1041 | lr_mult: 0
1042 | decay_mult: 0
1043 | }
1044 | param {
1045 | lr_mult: 0
1046 | decay_mult: 0
1047 | }
1048 | }
1049 | layer {
1050 | name: "conv8/dw/scale"
1051 | type: "Scale"
1052 | bottom: "conv8/dw"
1053 | top: "conv8/dw"
1054 | param {
1055 | lr_mult: 0.1
1056 | decay_mult: 0.0
1057 | }
1058 | param {
1059 | lr_mult: 0.2
1060 | decay_mult: 0.0
1061 | }
1062 | scale_param {
1063 | filler {
1064 | value: 1
1065 | }
1066 | bias_term: true
1067 | bias_filler {
1068 | value: 0
1069 | }
1070 | }
1071 | }
1072 | layer {
1073 | name: "conv8/dw/relu"
1074 | type: "ReLU"
1075 | bottom: "conv8/dw"
1076 | top: "conv8/dw"
1077 | }
1078 | layer {
1079 | name: "conv8"
1080 | type: "Convolution"
1081 | bottom: "conv8/dw"
1082 | top: "conv8"
1083 | param {
1084 | lr_mult: 0.1
1085 | decay_mult: 0.1
1086 | }
1087 | convolution_param {
1088 | num_output: 512
1089 | bias_term: false
1090 | kernel_size: 1
1091 | weight_filler {
1092 | type: "msra"
1093 | }
1094 | }
1095 | }
1096 | layer {
1097 | name: "conv8/bn"
1098 | type: "BatchNorm"
1099 | bottom: "conv8"
1100 | top: "conv8"
1101 | param {
1102 | lr_mult: 0
1103 | decay_mult: 0
1104 | }
1105 | param {
1106 | lr_mult: 0
1107 | decay_mult: 0
1108 | }
1109 | param {
1110 | lr_mult: 0
1111 | decay_mult: 0
1112 | }
1113 | }
1114 | layer {
1115 | name: "conv8/scale"
1116 | type: "Scale"
1117 | bottom: "conv8"
1118 | top: "conv8"
1119 | param {
1120 | lr_mult: 0.1
1121 | decay_mult: 0.0
1122 | }
1123 | param {
1124 | lr_mult: 0.2
1125 | decay_mult: 0.0
1126 | }
1127 | scale_param {
1128 | filler {
1129 | value: 1
1130 | }
1131 | bias_term: true
1132 | bias_filler {
1133 | value: 0
1134 | }
1135 | }
1136 | }
1137 | layer {
1138 | name: "conv8/relu"
1139 | type: "ReLU"
1140 | bottom: "conv8"
1141 | top: "conv8"
1142 | }
1143 | layer {
1144 | name: "conv9/dw"
1145 | type: "Convolution"
1146 | bottom: "conv8"
1147 | top: "conv9/dw"
1148 | param {
1149 | lr_mult: 0.1
1150 | decay_mult: 0.1
1151 | }
1152 | convolution_param {
1153 | num_output: 512
1154 | bias_term: false
1155 | pad: 1
1156 | kernel_size: 3
1157 | group: 512
1158 | engine: CAFFE
1159 | weight_filler {
1160 | type: "msra"
1161 | }
1162 | }
1163 | }
1164 | layer {
1165 | name: "conv9/dw/bn"
1166 | type: "BatchNorm"
1167 | bottom: "conv9/dw"
1168 | top: "conv9/dw"
1169 | param {
1170 | lr_mult: 0
1171 | decay_mult: 0
1172 | }
1173 | param {
1174 | lr_mult: 0
1175 | decay_mult: 0
1176 | }
1177 | param {
1178 | lr_mult: 0
1179 | decay_mult: 0
1180 | }
1181 | }
1182 | layer {
1183 | name: "conv9/dw/scale"
1184 | type: "Scale"
1185 | bottom: "conv9/dw"
1186 | top: "conv9/dw"
1187 | param {
1188 | lr_mult: 0.1
1189 | decay_mult: 0.0
1190 | }
1191 | param {
1192 | lr_mult: 0.2
1193 | decay_mult: 0.0
1194 | }
1195 | scale_param {
1196 | filler {
1197 | value: 1
1198 | }
1199 | bias_term: true
1200 | bias_filler {
1201 | value: 0
1202 | }
1203 | }
1204 | }
1205 | layer {
1206 | name: "conv9/dw/relu"
1207 | type: "ReLU"
1208 | bottom: "conv9/dw"
1209 | top: "conv9/dw"
1210 | }
1211 | layer {
1212 | name: "conv9"
1213 | type: "Convolution"
1214 | bottom: "conv9/dw"
1215 | top: "conv9"
1216 | param {
1217 | lr_mult: 0.1
1218 | decay_mult: 0.1
1219 | }
1220 | convolution_param {
1221 | num_output: 512
1222 | bias_term: false
1223 | kernel_size: 1
1224 | weight_filler {
1225 | type: "msra"
1226 | }
1227 | }
1228 | }
1229 | layer {
1230 | name: "conv9/bn"
1231 | type: "BatchNorm"
1232 | bottom: "conv9"
1233 | top: "conv9"
1234 | param {
1235 | lr_mult: 0
1236 | decay_mult: 0
1237 | }
1238 | param {
1239 | lr_mult: 0
1240 | decay_mult: 0
1241 | }
1242 | param {
1243 | lr_mult: 0
1244 | decay_mult: 0
1245 | }
1246 | }
1247 | layer {
1248 | name: "conv9/scale"
1249 | type: "Scale"
1250 | bottom: "conv9"
1251 | top: "conv9"
1252 | param {
1253 | lr_mult: 0.1
1254 | decay_mult: 0.0
1255 | }
1256 | param {
1257 | lr_mult: 0.2
1258 | decay_mult: 0.0
1259 | }
1260 | scale_param {
1261 | filler {
1262 | value: 1
1263 | }
1264 | bias_term: true
1265 | bias_filler {
1266 | value: 0
1267 | }
1268 | }
1269 | }
1270 | layer {
1271 | name: "conv9/relu"
1272 | type: "ReLU"
1273 | bottom: "conv9"
1274 | top: "conv9"
1275 | }
1276 | layer {
1277 | name: "conv10/dw"
1278 | type: "Convolution"
1279 | bottom: "conv9"
1280 | top: "conv10/dw"
1281 | param {
1282 | lr_mult: 0.1
1283 | decay_mult: 0.1
1284 | }
1285 | convolution_param {
1286 | num_output: 512
1287 | bias_term: false
1288 | pad: 1
1289 | kernel_size: 3
1290 | group: 512
1291 | engine: CAFFE
1292 | weight_filler {
1293 | type: "msra"
1294 | }
1295 | }
1296 | }
1297 | layer {
1298 | name: "conv10/dw/bn"
1299 | type: "BatchNorm"
1300 | bottom: "conv10/dw"
1301 | top: "conv10/dw"
1302 | param {
1303 | lr_mult: 0
1304 | decay_mult: 0
1305 | }
1306 | param {
1307 | lr_mult: 0
1308 | decay_mult: 0
1309 | }
1310 | param {
1311 | lr_mult: 0
1312 | decay_mult: 0
1313 | }
1314 | }
1315 | layer {
1316 | name: "conv10/dw/scale"
1317 | type: "Scale"
1318 | bottom: "conv10/dw"
1319 | top: "conv10/dw"
1320 | param {
1321 | lr_mult: 0.1
1322 | decay_mult: 0.0
1323 | }
1324 | param {
1325 | lr_mult: 0.2
1326 | decay_mult: 0.0
1327 | }
1328 | scale_param {
1329 | filler {
1330 | value: 1
1331 | }
1332 | bias_term: true
1333 | bias_filler {
1334 | value: 0
1335 | }
1336 | }
1337 | }
1338 | layer {
1339 | name: "conv10/dw/relu"
1340 | type: "ReLU"
1341 | bottom: "conv10/dw"
1342 | top: "conv10/dw"
1343 | }
1344 | layer {
1345 | name: "conv10"
1346 | type: "Convolution"
1347 | bottom: "conv10/dw"
1348 | top: "conv10"
1349 | param {
1350 | lr_mult: 0.1
1351 | decay_mult: 0.1
1352 | }
1353 | convolution_param {
1354 | num_output: 512
1355 | bias_term: false
1356 | kernel_size: 1
1357 | weight_filler {
1358 | type: "msra"
1359 | }
1360 | }
1361 | }
1362 | layer {
1363 | name: "conv10/bn"
1364 | type: "BatchNorm"
1365 | bottom: "conv10"
1366 | top: "conv10"
1367 | param {
1368 | lr_mult: 0
1369 | decay_mult: 0
1370 | }
1371 | param {
1372 | lr_mult: 0
1373 | decay_mult: 0
1374 | }
1375 | param {
1376 | lr_mult: 0
1377 | decay_mult: 0
1378 | }
1379 | }
1380 | layer {
1381 | name: "conv10/scale"
1382 | type: "Scale"
1383 | bottom: "conv10"
1384 | top: "conv10"
1385 | param {
1386 | lr_mult: 0.1
1387 | decay_mult: 0.0
1388 | }
1389 | param {
1390 | lr_mult: 0.2
1391 | decay_mult: 0.0
1392 | }
1393 | scale_param {
1394 | filler {
1395 | value: 1
1396 | }
1397 | bias_term: true
1398 | bias_filler {
1399 | value: 0
1400 | }
1401 | }
1402 | }
1403 | layer {
1404 | name: "conv10/relu"
1405 | type: "ReLU"
1406 | bottom: "conv10"
1407 | top: "conv10"
1408 | }
1409 | layer {
1410 | name: "conv11/dw"
1411 | type: "Convolution"
1412 | bottom: "conv10"
1413 | top: "conv11/dw"
1414 | param {
1415 | lr_mult: 0.1
1416 | decay_mult: 0.1
1417 | }
1418 | convolution_param {
1419 | num_output: 512
1420 | bias_term: false
1421 | pad: 1
1422 | kernel_size: 3
1423 | group: 512
1424 | engine: CAFFE
1425 | weight_filler {
1426 | type: "msra"
1427 | }
1428 | }
1429 | }
1430 | layer {
1431 | name: "conv11/dw/bn"
1432 | type: "BatchNorm"
1433 | bottom: "conv11/dw"
1434 | top: "conv11/dw"
1435 | param {
1436 | lr_mult: 0
1437 | decay_mult: 0
1438 | }
1439 | param {
1440 | lr_mult: 0
1441 | decay_mult: 0
1442 | }
1443 | param {
1444 | lr_mult: 0
1445 | decay_mult: 0
1446 | }
1447 | }
1448 | layer {
1449 | name: "conv11/dw/scale"
1450 | type: "Scale"
1451 | bottom: "conv11/dw"
1452 | top: "conv11/dw"
1453 | param {
1454 | lr_mult: 0.1
1455 | decay_mult: 0.0
1456 | }
1457 | param {
1458 | lr_mult: 0.2
1459 | decay_mult: 0.0
1460 | }
1461 | scale_param {
1462 | filler {
1463 | value: 1
1464 | }
1465 | bias_term: true
1466 | bias_filler {
1467 | value: 0
1468 | }
1469 | }
1470 | }
1471 | layer {
1472 | name: "conv11/dw/relu"
1473 | type: "ReLU"
1474 | bottom: "conv11/dw"
1475 | top: "conv11/dw"
1476 | }
1477 | layer {
1478 | name: "conv11"
1479 | type: "Convolution"
1480 | bottom: "conv11/dw"
1481 | top: "conv11"
1482 | param {
1483 | lr_mult: 0.1
1484 | decay_mult: 0.1
1485 | }
1486 | convolution_param {
1487 | num_output: 512
1488 | bias_term: false
1489 | kernel_size: 1
1490 | weight_filler {
1491 | type: "msra"
1492 | }
1493 | }
1494 | }
1495 | layer {
1496 | name: "conv11/bn"
1497 | type: "BatchNorm"
1498 | bottom: "conv11"
1499 | top: "conv11"
1500 | param {
1501 | lr_mult: 0
1502 | decay_mult: 0
1503 | }
1504 | param {
1505 | lr_mult: 0
1506 | decay_mult: 0
1507 | }
1508 | param {
1509 | lr_mult: 0
1510 | decay_mult: 0
1511 | }
1512 | }
1513 | layer {
1514 | name: "conv11/scale"
1515 | type: "Scale"
1516 | bottom: "conv11"
1517 | top: "conv11"
1518 | param {
1519 | lr_mult: 0.1
1520 | decay_mult: 0.0
1521 | }
1522 | param {
1523 | lr_mult: 0.2
1524 | decay_mult: 0.0
1525 | }
1526 | scale_param {
1527 | filler {
1528 | value: 1
1529 | }
1530 | bias_term: true
1531 | bias_filler {
1532 | value: 0
1533 | }
1534 | }
1535 | }
1536 | layer {
1537 | name: "conv11/relu"
1538 | type: "ReLU"
1539 | bottom: "conv11"
1540 | top: "conv11"
1541 | }
1542 | layer {
1543 | name: "conv12/dw"
1544 | type: "Convolution"
1545 | bottom: "conv11"
1546 | top: "conv12/dw"
1547 | param {
1548 | lr_mult: 0.1
1549 | decay_mult: 0.1
1550 | }
1551 | convolution_param {
1552 | num_output: 512
1553 | bias_term: false
1554 | pad: 1
1555 | kernel_size: 3
1556 | stride: 2
1557 | group: 512
1558 | engine: CAFFE
1559 | weight_filler {
1560 | type: "msra"
1561 | }
1562 | }
1563 | }
1564 | layer {
1565 | name: "conv12/dw/bn"
1566 | type: "BatchNorm"
1567 | bottom: "conv12/dw"
1568 | top: "conv12/dw"
1569 | param {
1570 | lr_mult: 0
1571 | decay_mult: 0
1572 | }
1573 | param {
1574 | lr_mult: 0
1575 | decay_mult: 0
1576 | }
1577 | param {
1578 | lr_mult: 0
1579 | decay_mult: 0
1580 | }
1581 | }
1582 | layer {
1583 | name: "conv12/dw/scale"
1584 | type: "Scale"
1585 | bottom: "conv12/dw"
1586 | top: "conv12/dw"
1587 | param {
1588 | lr_mult: 0.1
1589 | decay_mult: 0.0
1590 | }
1591 | param {
1592 | lr_mult: 0.2
1593 | decay_mult: 0.0
1594 | }
1595 | scale_param {
1596 | filler {
1597 | value: 1
1598 | }
1599 | bias_term: true
1600 | bias_filler {
1601 | value: 0
1602 | }
1603 | }
1604 | }
1605 | layer {
1606 | name: "conv12/dw/relu"
1607 | type: "ReLU"
1608 | bottom: "conv12/dw"
1609 | top: "conv12/dw"
1610 | }
1611 | layer {
1612 | name: "conv12"
1613 | type: "Convolution"
1614 | bottom: "conv12/dw"
1615 | top: "conv12"
1616 | param {
1617 | lr_mult: 0.1
1618 | decay_mult: 0.1
1619 | }
1620 | convolution_param {
1621 | num_output: 1024
1622 | bias_term: false
1623 | kernel_size: 1
1624 | weight_filler {
1625 | type: "msra"
1626 | }
1627 | }
1628 | }
1629 | layer {
1630 | name: "conv12/bn"
1631 | type: "BatchNorm"
1632 | bottom: "conv12"
1633 | top: "conv12"
1634 | param {
1635 | lr_mult: 0
1636 | decay_mult: 0
1637 | }
1638 | param {
1639 | lr_mult: 0
1640 | decay_mult: 0
1641 | }
1642 | param {
1643 | lr_mult: 0
1644 | decay_mult: 0
1645 | }
1646 | }
1647 | layer {
1648 | name: "conv12/scale"
1649 | type: "Scale"
1650 | bottom: "conv12"
1651 | top: "conv12"
1652 | param {
1653 | lr_mult: 0.1
1654 | decay_mult: 0.0
1655 | }
1656 | param {
1657 | lr_mult: 0.2
1658 | decay_mult: 0.0
1659 | }
1660 | scale_param {
1661 | filler {
1662 | value: 1
1663 | }
1664 | bias_term: true
1665 | bias_filler {
1666 | value: 0
1667 | }
1668 | }
1669 | }
1670 | layer {
1671 | name: "conv12/relu"
1672 | type: "ReLU"
1673 | bottom: "conv12"
1674 | top: "conv12"
1675 | }
1676 | layer {
1677 | name: "conv13/dw"
1678 | type: "Convolution"
1679 | bottom: "conv12"
1680 | top: "conv13/dw"
1681 | param {
1682 | lr_mult: 0.1
1683 | decay_mult: 0.1
1684 | }
1685 | convolution_param {
1686 | num_output: 1024
1687 | bias_term: false
1688 | pad: 1
1689 | kernel_size: 3
1690 | group: 1024
1691 | engine: CAFFE
1692 | weight_filler {
1693 | type: "msra"
1694 | }
1695 | }
1696 | }
1697 | layer {
1698 | name: "conv13/dw/bn"
1699 | type: "BatchNorm"
1700 | bottom: "conv13/dw"
1701 | top: "conv13/dw"
1702 | param {
1703 | lr_mult: 0
1704 | decay_mult: 0
1705 | }
1706 | param {
1707 | lr_mult: 0
1708 | decay_mult: 0
1709 | }
1710 | param {
1711 | lr_mult: 0
1712 | decay_mult: 0
1713 | }
1714 | }
1715 | layer {
1716 | name: "conv13/dw/scale"
1717 | type: "Scale"
1718 | bottom: "conv13/dw"
1719 | top: "conv13/dw"
1720 | param {
1721 | lr_mult: 0.1
1722 | decay_mult: 0.0
1723 | }
1724 | param {
1725 | lr_mult: 0.2
1726 | decay_mult: 0.0
1727 | }
1728 | scale_param {
1729 | filler {
1730 | value: 1
1731 | }
1732 | bias_term: true
1733 | bias_filler {
1734 | value: 0
1735 | }
1736 | }
1737 | }
1738 | layer {
1739 | name: "conv13/dw/relu"
1740 | type: "ReLU"
1741 | bottom: "conv13/dw"
1742 | top: "conv13/dw"
1743 | }
1744 | layer {
1745 | name: "conv13"
1746 | type: "Convolution"
1747 | bottom: "conv13/dw"
1748 | top: "conv13"
1749 | param {
1750 | lr_mult: 0.1
1751 | decay_mult: 0.1
1752 | }
1753 | convolution_param {
1754 | num_output: 1024
1755 | bias_term: false
1756 | kernel_size: 1
1757 | weight_filler {
1758 | type: "msra"
1759 | }
1760 | }
1761 | }
1762 | layer {
1763 | name: "conv13/bn"
1764 | type: "BatchNorm"
1765 | bottom: "conv13"
1766 | top: "conv13"
1767 | param {
1768 | lr_mult: 0
1769 | decay_mult: 0
1770 | }
1771 | param {
1772 | lr_mult: 0
1773 | decay_mult: 0
1774 | }
1775 | param {
1776 | lr_mult: 0
1777 | decay_mult: 0
1778 | }
1779 | }
1780 | layer {
1781 | name: "conv13/scale"
1782 | type: "Scale"
1783 | bottom: "conv13"
1784 | top: "conv13"
1785 | param {
1786 | lr_mult: 0.1
1787 | decay_mult: 0.0
1788 | }
1789 | param {
1790 | lr_mult: 0.2
1791 | decay_mult: 0.0
1792 | }
1793 | scale_param {
1794 | filler {
1795 | value: 1
1796 | }
1797 | bias_term: true
1798 | bias_filler {
1799 | value: 0
1800 | }
1801 | }
1802 | }
1803 | layer {
1804 | name: "conv13/relu"
1805 | type: "ReLU"
1806 | bottom: "conv13"
1807 | top: "conv13"
1808 | }
1809 | layer {
1810 | name: "conv14_1"
1811 | type: "Convolution"
1812 | bottom: "conv13"
1813 | top: "conv14_1"
1814 | param {
1815 | lr_mult: 0.1
1816 | decay_mult: 0.1
1817 | }
1818 | convolution_param {
1819 | num_output: 256
1820 | bias_term: false
1821 | kernel_size: 1
1822 | weight_filler {
1823 | type: "msra"
1824 | }
1825 | }
1826 | }
1827 | layer {
1828 | name: "conv14_1/bn"
1829 | type: "BatchNorm"
1830 | bottom: "conv14_1"
1831 | top: "conv14_1"
1832 | param {
1833 | lr_mult: 0
1834 | decay_mult: 0
1835 | }
1836 | param {
1837 | lr_mult: 0
1838 | decay_mult: 0
1839 | }
1840 | param {
1841 | lr_mult: 0
1842 | decay_mult: 0
1843 | }
1844 | }
1845 | layer {
1846 | name: "conv14_1/scale"
1847 | type: "Scale"
1848 | bottom: "conv14_1"
1849 | top: "conv14_1"
1850 | param {
1851 | lr_mult: 0.1
1852 | decay_mult: 0.0
1853 | }
1854 | param {
1855 | lr_mult: 0.2
1856 | decay_mult: 0.0
1857 | }
1858 | scale_param {
1859 | filler {
1860 | value: 1
1861 | }
1862 | bias_term: true
1863 | bias_filler {
1864 | value: 0
1865 | }
1866 | }
1867 | }
1868 | layer {
1869 | name: "conv14_1/relu"
1870 | type: "ReLU"
1871 | bottom: "conv14_1"
1872 | top: "conv14_1"
1873 | }
1874 | layer {
1875 | name: "conv14_2"
1876 | type: "Convolution"
1877 | bottom: "conv14_1"
1878 | top: "conv14_2"
1879 | param {
1880 | lr_mult: 0.1
1881 | decay_mult: 0.1
1882 | }
1883 | convolution_param {
1884 | num_output: 512
1885 | bias_term: false
1886 | pad: 1
1887 | kernel_size: 3
1888 | stride: 2
1889 | weight_filler {
1890 | type: "msra"
1891 | }
1892 | }
1893 | }
1894 | layer {
1895 | name: "conv14_2/bn"
1896 | type: "BatchNorm"
1897 | bottom: "conv14_2"
1898 | top: "conv14_2"
1899 | param {
1900 | lr_mult: 0
1901 | decay_mult: 0
1902 | }
1903 | param {
1904 | lr_mult: 0
1905 | decay_mult: 0
1906 | }
1907 | param {
1908 | lr_mult: 0
1909 | decay_mult: 0
1910 | }
1911 | }
1912 | layer {
1913 | name: "conv14_2/scale"
1914 | type: "Scale"
1915 | bottom: "conv14_2"
1916 | top: "conv14_2"
1917 | param {
1918 | lr_mult: 0.1
1919 | decay_mult: 0.0
1920 | }
1921 | param {
1922 | lr_mult: 0.2
1923 | decay_mult: 0.0
1924 | }
1925 | scale_param {
1926 | filler {
1927 | value: 1
1928 | }
1929 | bias_term: true
1930 | bias_filler {
1931 | value: 0
1932 | }
1933 | }
1934 | }
1935 | layer {
1936 | name: "conv14_2/relu"
1937 | type: "ReLU"
1938 | bottom: "conv14_2"
1939 | top: "conv14_2"
1940 | }
1941 | layer {
1942 | name: "conv15_1"
1943 | type: "Convolution"
1944 | bottom: "conv14_2"
1945 | top: "conv15_1"
1946 | param {
1947 | lr_mult: 0.1
1948 | decay_mult: 0.1
1949 | }
1950 | convolution_param {
1951 | num_output: 128
1952 | bias_term: false
1953 | kernel_size: 1
1954 | weight_filler {
1955 | type: "msra"
1956 | }
1957 | }
1958 | }
1959 | layer {
1960 | name: "conv15_1/bn"
1961 | type: "BatchNorm"
1962 | bottom: "conv15_1"
1963 | top: "conv15_1"
1964 | param {
1965 | lr_mult: 0
1966 | decay_mult: 0
1967 | }
1968 | param {
1969 | lr_mult: 0
1970 | decay_mult: 0
1971 | }
1972 | param {
1973 | lr_mult: 0
1974 | decay_mult: 0
1975 | }
1976 | }
1977 | layer {
1978 | name: "conv15_1/scale"
1979 | type: "Scale"
1980 | bottom: "conv15_1"
1981 | top: "conv15_1"
1982 | param {
1983 | lr_mult: 0.1
1984 | decay_mult: 0.0
1985 | }
1986 | param {
1987 | lr_mult: 0.2
1988 | decay_mult: 0.0
1989 | }
1990 | scale_param {
1991 | filler {
1992 | value: 1
1993 | }
1994 | bias_term: true
1995 | bias_filler {
1996 | value: 0
1997 | }
1998 | }
1999 | }
2000 | layer {
2001 | name: "conv15_1/relu"
2002 | type: "ReLU"
2003 | bottom: "conv15_1"
2004 | top: "conv15_1"
2005 | }
2006 | layer {
2007 | name: "conv15_2"
2008 | type: "Convolution"
2009 | bottom: "conv15_1"
2010 | top: "conv15_2"
2011 | param {
2012 | lr_mult: 0.1
2013 | decay_mult: 0.1
2014 | }
2015 | convolution_param {
2016 | num_output: 256
2017 | bias_term: false
2018 | pad: 1
2019 | kernel_size: 3
2020 | stride: 2
2021 | weight_filler {
2022 | type: "msra"
2023 | }
2024 | }
2025 | }
2026 | layer {
2027 | name: "conv15_2/bn"
2028 | type: "BatchNorm"
2029 | bottom: "conv15_2"
2030 | top: "conv15_2"
2031 | param {
2032 | lr_mult: 0
2033 | decay_mult: 0
2034 | }
2035 | param {
2036 | lr_mult: 0
2037 | decay_mult: 0
2038 | }
2039 | param {
2040 | lr_mult: 0
2041 | decay_mult: 0
2042 | }
2043 | }
2044 | layer {
2045 | name: "conv15_2/scale"
2046 | type: "Scale"
2047 | bottom: "conv15_2"
2048 | top: "conv15_2"
2049 | param {
2050 | lr_mult: 0.1
2051 | decay_mult: 0.0
2052 | }
2053 | param {
2054 | lr_mult: 0.2
2055 | decay_mult: 0.0
2056 | }
2057 | scale_param {
2058 | filler {
2059 | value: 1
2060 | }
2061 | bias_term: true
2062 | bias_filler {
2063 | value: 0
2064 | }
2065 | }
2066 | }
2067 | layer {
2068 | name: "conv15_2/relu"
2069 | type: "ReLU"
2070 | bottom: "conv15_2"
2071 | top: "conv15_2"
2072 | }
2073 | layer {
2074 | name: "conv16_1"
2075 | type: "Convolution"
2076 | bottom: "conv15_2"
2077 | top: "conv16_1"
2078 | param {
2079 | lr_mult: 0.1
2080 | decay_mult: 0.1
2081 | }
2082 | convolution_param {
2083 | num_output: 128
2084 | bias_term: false
2085 | kernel_size: 1
2086 | weight_filler {
2087 | type: "msra"
2088 | }
2089 | }
2090 | }
2091 | layer {
2092 | name: "conv16_1/bn"
2093 | type: "BatchNorm"
2094 | bottom: "conv16_1"
2095 | top: "conv16_1"
2096 | param {
2097 | lr_mult: 0
2098 | decay_mult: 0
2099 | }
2100 | param {
2101 | lr_mult: 0
2102 | decay_mult: 0
2103 | }
2104 | param {
2105 | lr_mult: 0
2106 | decay_mult: 0
2107 | }
2108 | }
2109 | layer {
2110 | name: "conv16_1/scale"
2111 | type: "Scale"
2112 | bottom: "conv16_1"
2113 | top: "conv16_1"
2114 | param {
2115 | lr_mult: 0.1
2116 | decay_mult: 0.0
2117 | }
2118 | param {
2119 | lr_mult: 0.2
2120 | decay_mult: 0.0
2121 | }
2122 | scale_param {
2123 | filler {
2124 | value: 1
2125 | }
2126 | bias_term: true
2127 | bias_filler {
2128 | value: 0
2129 | }
2130 | }
2131 | }
2132 | layer {
2133 | name: "conv16_1/relu"
2134 | type: "ReLU"
2135 | bottom: "conv16_1"
2136 | top: "conv16_1"
2137 | }
2138 | layer {
2139 | name: "conv16_2"
2140 | type: "Convolution"
2141 | bottom: "conv16_1"
2142 | top: "conv16_2"
2143 | param {
2144 | lr_mult: 0.1
2145 | decay_mult: 0.1
2146 | }
2147 | convolution_param {
2148 | num_output: 256
2149 | bias_term: false
2150 | pad: 1
2151 | kernel_size: 3
2152 | stride: 2
2153 | weight_filler {
2154 | type: "msra"
2155 | }
2156 | }
2157 | }
2158 | layer {
2159 | name: "conv16_2/bn"
2160 | type: "BatchNorm"
2161 | bottom: "conv16_2"
2162 | top: "conv16_2"
2163 | param {
2164 | lr_mult: 0
2165 | decay_mult: 0
2166 | }
2167 | param {
2168 | lr_mult: 0
2169 | decay_mult: 0
2170 | }
2171 | param {
2172 | lr_mult: 0
2173 | decay_mult: 0
2174 | }
2175 | }
2176 | layer {
2177 | name: "conv16_2/scale"
2178 | type: "Scale"
2179 | bottom: "conv16_2"
2180 | top: "conv16_2"
2181 | param {
2182 | lr_mult: 0.1
2183 | decay_mult: 0.0
2184 | }
2185 | param {
2186 | lr_mult: 0.2
2187 | decay_mult: 0.0
2188 | }
2189 | scale_param {
2190 | filler {
2191 | value: 1
2192 | }
2193 | bias_term: true
2194 | bias_filler {
2195 | value: 0
2196 | }
2197 | }
2198 | }
2199 | layer {
2200 | name: "conv16_2/relu"
2201 | type: "ReLU"
2202 | bottom: "conv16_2"
2203 | top: "conv16_2"
2204 | }
2205 | layer {
2206 | name: "conv17_1"
2207 | type: "Convolution"
2208 | bottom: "conv16_2"
2209 | top: "conv17_1"
2210 | param {
2211 | lr_mult: 0.1
2212 | decay_mult: 0.1
2213 | }
2214 | convolution_param {
2215 | num_output: 64
2216 | bias_term: false
2217 | kernel_size: 1
2218 | weight_filler {
2219 | type: "msra"
2220 | }
2221 | }
2222 | }
2223 | layer {
2224 | name: "conv17_1/bn"
2225 | type: "BatchNorm"
2226 | bottom: "conv17_1"
2227 | top: "conv17_1"
2228 | param {
2229 | lr_mult: 0
2230 | decay_mult: 0
2231 | }
2232 | param {
2233 | lr_mult: 0
2234 | decay_mult: 0
2235 | }
2236 | param {
2237 | lr_mult: 0
2238 | decay_mult: 0
2239 | }
2240 | }
2241 | layer {
2242 | name: "conv17_1/scale"
2243 | type: "Scale"
2244 | bottom: "conv17_1"
2245 | top: "conv17_1"
2246 | param {
2247 | lr_mult: 0.1
2248 | decay_mult: 0.0
2249 | }
2250 | param {
2251 | lr_mult: 0.2
2252 | decay_mult: 0.0
2253 | }
2254 | scale_param {
2255 | filler {
2256 | value: 1
2257 | }
2258 | bias_term: true
2259 | bias_filler {
2260 | value: 0
2261 | }
2262 | }
2263 | }
2264 | layer {
2265 | name: "conv17_1/relu"
2266 | type: "ReLU"
2267 | bottom: "conv17_1"
2268 | top: "conv17_1"
2269 | }
2270 | layer {
2271 | name: "conv17_2"
2272 | type: "Convolution"
2273 | bottom: "conv17_1"
2274 | top: "conv17_2"
2275 | param {
2276 | lr_mult: 0.1
2277 | decay_mult: 0.1
2278 | }
2279 | convolution_param {
2280 | num_output: 128
2281 | bias_term: false
2282 | pad: 1
2283 | kernel_size: 3
2284 | stride: 2
2285 | weight_filler {
2286 | type: "msra"
2287 | }
2288 | }
2289 | }
2290 | layer {
2291 | name: "conv17_2/bn"
2292 | type: "BatchNorm"
2293 | bottom: "conv17_2"
2294 | top: "conv17_2"
2295 | param {
2296 | lr_mult: 0
2297 | decay_mult: 0
2298 | }
2299 | param {
2300 | lr_mult: 0
2301 | decay_mult: 0
2302 | }
2303 | param {
2304 | lr_mult: 0
2305 | decay_mult: 0
2306 | }
2307 | }
2308 | layer {
2309 | name: "conv17_2/scale"
2310 | type: "Scale"
2311 | bottom: "conv17_2"
2312 | top: "conv17_2"
2313 | param {
2314 | lr_mult: 0.1
2315 | decay_mult: 0.0
2316 | }
2317 | param {
2318 | lr_mult: 0.2
2319 | decay_mult: 0.0
2320 | }
2321 | scale_param {
2322 | filler {
2323 | value: 1
2324 | }
2325 | bias_term: true
2326 | bias_filler {
2327 | value: 0
2328 | }
2329 | }
2330 | }
2331 | layer {
2332 | name: "conv17_2/relu"
2333 | type: "ReLU"
2334 | bottom: "conv17_2"
2335 | top: "conv17_2"
2336 | }
2337 | layer {
2338 | name: "conv11_mbox_loc"
2339 | type: "Convolution"
2340 | bottom: "conv11"
2341 | top: "conv11_mbox_loc"
2342 | param {
2343 | lr_mult: 0.1
2344 | decay_mult: 0.1
2345 | }
2346 | param {
2347 | lr_mult: 0.2
2348 | decay_mult: 0.0
2349 | }
2350 | convolution_param {
2351 | num_output: 12
2352 | kernel_size: 1
2353 | weight_filler {
2354 | type: "msra"
2355 | }
2356 | bias_filler {
2357 | type: "constant"
2358 | value: 0.0
2359 | }
2360 | }
2361 | }
2362 | layer {
2363 | name: "conv11_mbox_loc_perm"
2364 | type: "Permute"
2365 | bottom: "conv11_mbox_loc"
2366 | top: "conv11_mbox_loc_perm"
2367 | permute_param {
2368 | order: 0
2369 | order: 2
2370 | order: 3
2371 | order: 1
2372 | }
2373 | }
2374 | layer {
2375 | name: "conv11_mbox_loc_flat"
2376 | type: "Flatten"
2377 | bottom: "conv11_mbox_loc_perm"
2378 | top: "conv11_mbox_loc_flat"
2379 | flatten_param {
2380 | axis: 1
2381 | }
2382 | }
2383 | layer {
2384 | name: "conv11_mbox_conf_new_worker"
2385 | type: "Convolution"
2386 | bottom: "conv11"
2387 | top: "conv11_mbox_conf_new_worker"
2388 | param {
2389 | lr_mult: 1.0
2390 | decay_mult: 1.0
2391 | }
2392 | param {
2393 | lr_mult: 2.0
2394 | decay_mult: 0.0
2395 | }
2396 | convolution_param {
2397 | num_output: 15
2398 | kernel_size: 1
2399 | weight_filler {
2400 | type: "msra"
2401 | }
2402 | bias_filler {
2403 | type: "constant"
2404 | value: 0.0
2405 | }
2406 | }
2407 | }
2408 | layer {
2409 | name: "conv11_mbox_conf_perm"
2410 | type: "Permute"
2411 | bottom: "conv11_mbox_conf_new_worker"
2412 | top: "conv11_mbox_conf_perm"
2413 | permute_param {
2414 | order: 0
2415 | order: 2
2416 | order: 3
2417 | order: 1
2418 | }
2419 | }
2420 | layer {
2421 | name: "conv11_mbox_conf_flat"
2422 | type: "Flatten"
2423 | bottom: "conv11_mbox_conf_perm"
2424 | top: "conv11_mbox_conf_flat"
2425 | flatten_param {
2426 | axis: 1
2427 | }
2428 | }
2429 | layer {
2430 | name: "conv11_mbox_priorbox"
2431 | type: "PriorBox"
2432 | bottom: "conv11"
2433 | bottom: "data"
2434 | top: "conv11_mbox_priorbox"
2435 | prior_box_param {
2436 | min_size: 60.0
2437 | aspect_ratio: 2.0
2438 | flip: true
2439 | clip: false
2440 | variance: 0.1
2441 | variance: 0.1
2442 | variance: 0.2
2443 | variance: 0.2
2444 | offset: 0.5
2445 | }
2446 | }
2447 | layer {
2448 | name: "conv13_mbox_loc"
2449 | type: "Convolution"
2450 | bottom: "conv13"
2451 | top: "conv13_mbox_loc"
2452 | param {
2453 | lr_mult: 0.1
2454 | decay_mult: 0.1
2455 | }
2456 | param {
2457 | lr_mult: 0.2
2458 | decay_mult: 0.0
2459 | }
2460 | convolution_param {
2461 | num_output: 24
2462 | kernel_size: 1
2463 | weight_filler {
2464 | type: "msra"
2465 | }
2466 | bias_filler {
2467 | type: "constant"
2468 | value: 0.0
2469 | }
2470 | }
2471 | }
2472 | layer {
2473 | name: "conv13_mbox_loc_perm"
2474 | type: "Permute"
2475 | bottom: "conv13_mbox_loc"
2476 | top: "conv13_mbox_loc_perm"
2477 | permute_param {
2478 | order: 0
2479 | order: 2
2480 | order: 3
2481 | order: 1
2482 | }
2483 | }
2484 | layer {
2485 | name: "conv13_mbox_loc_flat"
2486 | type: "Flatten"
2487 | bottom: "conv13_mbox_loc_perm"
2488 | top: "conv13_mbox_loc_flat"
2489 | flatten_param {
2490 | axis: 1
2491 | }
2492 | }
2493 | layer {
2494 | name: "conv13_mbox_conf_new_worker"
2495 | type: "Convolution"
2496 | bottom: "conv13"
2497 | top: "conv13_mbox_conf_new_worker"
2498 | param {
2499 | lr_mult: 1.0
2500 | decay_mult: 1.0
2501 | }
2502 | param {
2503 | lr_mult: 2.0
2504 | decay_mult: 0.0
2505 | }
2506 | convolution_param {
2507 | num_output: 30
2508 | kernel_size: 1
2509 | weight_filler {
2510 | type: "msra"
2511 | }
2512 | bias_filler {
2513 | type: "constant"
2514 | value: 0.0
2515 | }
2516 | }
2517 | }
2518 | layer {
2519 | name: "conv13_mbox_conf_perm"
2520 | type: "Permute"
2521 | bottom: "conv13_mbox_conf_new_worker"
2522 | top: "conv13_mbox_conf_perm"
2523 | permute_param {
2524 | order: 0
2525 | order: 2
2526 | order: 3
2527 | order: 1
2528 | }
2529 | }
2530 | layer {
2531 | name: "conv13_mbox_conf_flat"
2532 | type: "Flatten"
2533 | bottom: "conv13_mbox_conf_perm"
2534 | top: "conv13_mbox_conf_flat"
2535 | flatten_param {
2536 | axis: 1
2537 | }
2538 | }
2539 | layer {
2540 | name: "conv13_mbox_priorbox"
2541 | type: "PriorBox"
2542 | bottom: "conv13"
2543 | bottom: "data"
2544 | top: "conv13_mbox_priorbox"
2545 | prior_box_param {
2546 | min_size: 105.0
2547 | max_size: 150.0
2548 | aspect_ratio: 2.0
2549 | aspect_ratio: 3.0
2550 | flip: true
2551 | clip: false
2552 | variance: 0.1
2553 | variance: 0.1
2554 | variance: 0.2
2555 | variance: 0.2
2556 | offset: 0.5
2557 | }
2558 | }
2559 | layer {
2560 | name: "conv14_2_mbox_loc"
2561 | type: "Convolution"
2562 | bottom: "conv14_2"
2563 | top: "conv14_2_mbox_loc"
2564 | param {
2565 | lr_mult: 0.1
2566 | decay_mult: 0.1
2567 | }
2568 | param {
2569 | lr_mult: 0.2
2570 | decay_mult: 0.0
2571 | }
2572 | convolution_param {
2573 | num_output: 24
2574 | kernel_size: 1
2575 | weight_filler {
2576 | type: "msra"
2577 | }
2578 | bias_filler {
2579 | type: "constant"
2580 | value: 0.0
2581 | }
2582 | }
2583 | }
2584 | layer {
2585 | name: "conv14_2_mbox_loc_perm"
2586 | type: "Permute"
2587 | bottom: "conv14_2_mbox_loc"
2588 | top: "conv14_2_mbox_loc_perm"
2589 | permute_param {
2590 | order: 0
2591 | order: 2
2592 | order: 3
2593 | order: 1
2594 | }
2595 | }
2596 | layer {
2597 | name: "conv14_2_mbox_loc_flat"
2598 | type: "Flatten"
2599 | bottom: "conv14_2_mbox_loc_perm"
2600 | top: "conv14_2_mbox_loc_flat"
2601 | flatten_param {
2602 | axis: 1
2603 | }
2604 | }
2605 | layer {
2606 | name: "conv14_2_mbox_conf_new_worker"
2607 | type: "Convolution"
2608 | bottom: "conv14_2"
2609 | top: "conv14_2_mbox_conf_new_worker"
2610 | param {
2611 | lr_mult: 1.0
2612 | decay_mult: 1.0
2613 | }
2614 | param {
2615 | lr_mult: 2.0
2616 | decay_mult: 0.0
2617 | }
2618 | convolution_param {
2619 | num_output: 30
2620 | kernel_size: 1
2621 | weight_filler {
2622 | type: "msra"
2623 | }
2624 | bias_filler {
2625 | type: "constant"
2626 | value: 0.0
2627 | }
2628 | }
2629 | }
2630 | layer {
2631 | name: "conv14_2_mbox_conf_perm"
2632 | type: "Permute"
2633 | bottom: "conv14_2_mbox_conf_new_worker"
2634 | top: "conv14_2_mbox_conf_perm"
2635 | permute_param {
2636 | order: 0
2637 | order: 2
2638 | order: 3
2639 | order: 1
2640 | }
2641 | }
2642 | layer {
2643 | name: "conv14_2_mbox_conf_flat"
2644 | type: "Flatten"
2645 | bottom: "conv14_2_mbox_conf_perm"
2646 | top: "conv14_2_mbox_conf_flat"
2647 | flatten_param {
2648 | axis: 1
2649 | }
2650 | }
2651 | layer {
2652 | name: "conv14_2_mbox_priorbox"
2653 | type: "PriorBox"
2654 | bottom: "conv14_2"
2655 | bottom: "data"
2656 | top: "conv14_2_mbox_priorbox"
2657 | prior_box_param {
2658 | min_size: 150.0
2659 | max_size: 195.0
2660 | aspect_ratio: 2.0
2661 | aspect_ratio: 3.0
2662 | flip: true
2663 | clip: false
2664 | variance: 0.1
2665 | variance: 0.1
2666 | variance: 0.2
2667 | variance: 0.2
2668 | offset: 0.5
2669 | }
2670 | }
2671 | layer {
2672 | name: "conv15_2_mbox_loc"
2673 | type: "Convolution"
2674 | bottom: "conv15_2"
2675 | top: "conv15_2_mbox_loc"
2676 | param {
2677 | lr_mult: 0.1
2678 | decay_mult: 0.1
2679 | }
2680 | param {
2681 | lr_mult: 0.2
2682 | decay_mult: 0.0
2683 | }
2684 | convolution_param {
2685 | num_output: 24
2686 | kernel_size: 1
2687 | weight_filler {
2688 | type: "msra"
2689 | }
2690 | bias_filler {
2691 | type: "constant"
2692 | value: 0.0
2693 | }
2694 | }
2695 | }
2696 | layer {
2697 | name: "conv15_2_mbox_loc_perm"
2698 | type: "Permute"
2699 | bottom: "conv15_2_mbox_loc"
2700 | top: "conv15_2_mbox_loc_perm"
2701 | permute_param {
2702 | order: 0
2703 | order: 2
2704 | order: 3
2705 | order: 1
2706 | }
2707 | }
2708 | layer {
2709 | name: "conv15_2_mbox_loc_flat"
2710 | type: "Flatten"
2711 | bottom: "conv15_2_mbox_loc_perm"
2712 | top: "conv15_2_mbox_loc_flat"
2713 | flatten_param {
2714 | axis: 1
2715 | }
2716 | }
2717 | layer {
2718 | name: "conv15_2_mbox_conf_new_worker"
2719 | type: "Convolution"
2720 | bottom: "conv15_2"
2721 | top: "conv15_2_mbox_conf_new_worker"
2722 | param {
2723 | lr_mult: 1.0
2724 | decay_mult: 1.0
2725 | }
2726 | param {
2727 | lr_mult: 2.0
2728 | decay_mult: 0.0
2729 | }
2730 | convolution_param {
2731 | num_output: 30
2732 | kernel_size: 1
2733 | weight_filler {
2734 | type: "msra"
2735 | }
2736 | bias_filler {
2737 | type: "constant"
2738 | value: 0.0
2739 | }
2740 | }
2741 | }
2742 | layer {
2743 | name: "conv15_2_mbox_conf_perm"
2744 | type: "Permute"
2745 | bottom: "conv15_2_mbox_conf_new_worker"
2746 | top: "conv15_2_mbox_conf_perm"
2747 | permute_param {
2748 | order: 0
2749 | order: 2
2750 | order: 3
2751 | order: 1
2752 | }
2753 | }
2754 | layer {
2755 | name: "conv15_2_mbox_conf_flat"
2756 | type: "Flatten"
2757 | bottom: "conv15_2_mbox_conf_perm"
2758 | top: "conv15_2_mbox_conf_flat"
2759 | flatten_param {
2760 | axis: 1
2761 | }
2762 | }
2763 | layer {
2764 | name: "conv15_2_mbox_priorbox"
2765 | type: "PriorBox"
2766 | bottom: "conv15_2"
2767 | bottom: "data"
2768 | top: "conv15_2_mbox_priorbox"
2769 | prior_box_param {
2770 | min_size: 195.0
2771 | max_size: 240.0
2772 | aspect_ratio: 2.0
2773 | aspect_ratio: 3.0
2774 | flip: true
2775 | clip: false
2776 | variance: 0.1
2777 | variance: 0.1
2778 | variance: 0.2
2779 | variance: 0.2
2780 | offset: 0.5
2781 | }
2782 | }
2783 | layer {
2784 | name: "conv16_2_mbox_loc"
2785 | type: "Convolution"
2786 | bottom: "conv16_2"
2787 | top: "conv16_2_mbox_loc"
2788 | param {
2789 | lr_mult: 0.1
2790 | decay_mult: 0.1
2791 | }
2792 | param {
2793 | lr_mult: 0.2
2794 | decay_mult: 0.0
2795 | }
2796 | convolution_param {
2797 | num_output: 24
2798 | kernel_size: 1
2799 | weight_filler {
2800 | type: "msra"
2801 | }
2802 | bias_filler {
2803 | type: "constant"
2804 | value: 0.0
2805 | }
2806 | }
2807 | }
2808 | layer {
2809 | name: "conv16_2_mbox_loc_perm"
2810 | type: "Permute"
2811 | bottom: "conv16_2_mbox_loc"
2812 | top: "conv16_2_mbox_loc_perm"
2813 | permute_param {
2814 | order: 0
2815 | order: 2
2816 | order: 3
2817 | order: 1
2818 | }
2819 | }
2820 | layer {
2821 | name: "conv16_2_mbox_loc_flat"
2822 | type: "Flatten"
2823 | bottom: "conv16_2_mbox_loc_perm"
2824 | top: "conv16_2_mbox_loc_flat"
2825 | flatten_param {
2826 | axis: 1
2827 | }
2828 | }
2829 | layer {
2830 | name: "conv16_2_mbox_conf_new_worker"
2831 | type: "Convolution"
2832 | bottom: "conv16_2"
2833 | top: "conv16_2_mbox_conf_new_worker"
2834 | param {
2835 | lr_mult: 1.0
2836 | decay_mult: 1.0
2837 | }
2838 | param {
2839 | lr_mult: 2.0
2840 | decay_mult: 0.0
2841 | }
2842 | convolution_param {
2843 | num_output: 30
2844 | kernel_size: 1
2845 | weight_filler {
2846 | type: "msra"
2847 | }
2848 | bias_filler {
2849 | type: "constant"
2850 | value: 0.0
2851 | }
2852 | }
2853 | }
2854 | layer {
2855 | name: "conv16_2_mbox_conf_perm"
2856 | type: "Permute"
2857 | bottom: "conv16_2_mbox_conf_new_worker"
2858 | top: "conv16_2_mbox_conf_perm"
2859 | permute_param {
2860 | order: 0
2861 | order: 2
2862 | order: 3
2863 | order: 1
2864 | }
2865 | }
2866 | layer {
2867 | name: "conv16_2_mbox_conf_flat"
2868 | type: "Flatten"
2869 | bottom: "conv16_2_mbox_conf_perm"
2870 | top: "conv16_2_mbox_conf_flat"
2871 | flatten_param {
2872 | axis: 1
2873 | }
2874 | }
2875 | layer {
2876 | name: "conv16_2_mbox_priorbox"
2877 | type: "PriorBox"
2878 | bottom: "conv16_2"
2879 | bottom: "data"
2880 | top: "conv16_2_mbox_priorbox"
2881 | prior_box_param {
2882 | min_size: 240.0
2883 | max_size: 285.0
2884 | aspect_ratio: 2.0
2885 | aspect_ratio: 3.0
2886 | flip: true
2887 | clip: false
2888 | variance: 0.1
2889 | variance: 0.1
2890 | variance: 0.2
2891 | variance: 0.2
2892 | offset: 0.5
2893 | }
2894 | }
2895 | layer {
2896 | name: "conv17_2_mbox_loc"
2897 | type: "Convolution"
2898 | bottom: "conv17_2"
2899 | top: "conv17_2_mbox_loc"
2900 | param {
2901 | lr_mult: 0.1
2902 | decay_mult: 0.1
2903 | }
2904 | param {
2905 | lr_mult: 0.2
2906 | decay_mult: 0.0
2907 | }
2908 | convolution_param {
2909 | num_output: 24
2910 | kernel_size: 1
2911 | weight_filler {
2912 | type: "msra"
2913 | }
2914 | bias_filler {
2915 | type: "constant"
2916 | value: 0.0
2917 | }
2918 | }
2919 | }
2920 | layer {
2921 | name: "conv17_2_mbox_loc_perm"
2922 | type: "Permute"
2923 | bottom: "conv17_2_mbox_loc"
2924 | top: "conv17_2_mbox_loc_perm"
2925 | permute_param {
2926 | order: 0
2927 | order: 2
2928 | order: 3
2929 | order: 1
2930 | }
2931 | }
2932 | layer {
2933 | name: "conv17_2_mbox_loc_flat"
2934 | type: "Flatten"
2935 | bottom: "conv17_2_mbox_loc_perm"
2936 | top: "conv17_2_mbox_loc_flat"
2937 | flatten_param {
2938 | axis: 1
2939 | }
2940 | }
2941 | layer {
2942 | name: "conv17_2_mbox_conf_new_worker"
2943 | type: "Convolution"
2944 | bottom: "conv17_2"
2945 | top: "conv17_2_mbox_conf_new_worker"
2946 | param {
2947 | lr_mult: 1.0
2948 | decay_mult: 1.0
2949 | }
2950 | param {
2951 | lr_mult: 2.0
2952 | decay_mult: 0.0
2953 | }
2954 | convolution_param {
2955 | num_output: 30
2956 | kernel_size: 1
2957 | weight_filler {
2958 | type: "msra"
2959 | }
2960 | bias_filler {
2961 | type: "constant"
2962 | value: 0.0
2963 | }
2964 | }
2965 | }
2966 | layer {
2967 | name: "conv17_2_mbox_conf_perm"
2968 | type: "Permute"
2969 | bottom: "conv17_2_mbox_conf_new_worker"
2970 | top: "conv17_2_mbox_conf_perm"
2971 | permute_param {
2972 | order: 0
2973 | order: 2
2974 | order: 3
2975 | order: 1
2976 | }
2977 | }
2978 | layer {
2979 | name: "conv17_2_mbox_conf_flat"
2980 | type: "Flatten"
2981 | bottom: "conv17_2_mbox_conf_perm"
2982 | top: "conv17_2_mbox_conf_flat"
2983 | flatten_param {
2984 | axis: 1
2985 | }
2986 | }
2987 | layer {
2988 | name: "conv17_2_mbox_priorbox"
2989 | type: "PriorBox"
2990 | bottom: "conv17_2"
2991 | bottom: "data"
2992 | top: "conv17_2_mbox_priorbox"
2993 | prior_box_param {
2994 | min_size: 285.0
2995 | max_size: 300.0
2996 | aspect_ratio: 2.0
2997 | aspect_ratio: 3.0
2998 | flip: true
2999 | clip: false
3000 | variance: 0.1
3001 | variance: 0.1
3002 | variance: 0.2
3003 | variance: 0.2
3004 | offset: 0.5
3005 | }
3006 | }
3007 | layer {
3008 | name: "mbox_loc"
3009 | type: "Concat"
3010 | bottom: "conv11_mbox_loc_flat"
3011 | bottom: "conv13_mbox_loc_flat"
3012 | bottom: "conv14_2_mbox_loc_flat"
3013 | bottom: "conv15_2_mbox_loc_flat"
3014 | bottom: "conv16_2_mbox_loc_flat"
3015 | bottom: "conv17_2_mbox_loc_flat"
3016 | top: "mbox_loc"
3017 | concat_param {
3018 | axis: 1
3019 | }
3020 | }
3021 | layer {
3022 | name: "mbox_conf"
3023 | type: "Concat"
3024 | bottom: "conv11_mbox_conf_flat"
3025 | bottom: "conv13_mbox_conf_flat"
3026 | bottom: "conv14_2_mbox_conf_flat"
3027 | bottom: "conv15_2_mbox_conf_flat"
3028 | bottom: "conv16_2_mbox_conf_flat"
3029 | bottom: "conv17_2_mbox_conf_flat"
3030 | top: "mbox_conf"
3031 | concat_param {
3032 | axis: 1
3033 | }
3034 | }
3035 | layer {
3036 | name: "mbox_priorbox"
3037 | type: "Concat"
3038 | bottom: "conv11_mbox_priorbox"
3039 | bottom: "conv13_mbox_priorbox"
3040 | bottom: "conv14_2_mbox_priorbox"
3041 | bottom: "conv15_2_mbox_priorbox"
3042 | bottom: "conv16_2_mbox_priorbox"
3043 | bottom: "conv17_2_mbox_priorbox"
3044 | top: "mbox_priorbox"
3045 | concat_param {
3046 | axis: 2
3047 | }
3048 | }
3049 | layer {
3050 | name: "mbox_conf_reshape"
3051 | type: "Reshape"
3052 | bottom: "mbox_conf"
3053 | top: "mbox_conf_reshape"
3054 | reshape_param {
3055 | shape {
3056 | dim: 0
3057 | dim: -1
3058 | dim: 6
3059 | }
3060 | }
3061 | }
3062 | layer {
3063 | name: "mbox_conf_softmax"
3064 | type: "Softmax"
3065 | bottom: "mbox_conf_reshape"
3066 | top: "mbox_conf_softmax"
3067 | softmax_param {
3068 | axis: 2
3069 | }
3070 | }
3071 | layer {
3072 | name: "mbox_conf_flatten"
3073 | type: "Flatten"
3074 | bottom: "mbox_conf_softmax"
3075 | top: "mbox_conf_flatten"
3076 | flatten_param {
3077 | axis: 1
3078 | }
3079 | }
3080 | layer {
3081 | name: "detection_out"
3082 | type: "DetectionOutput"
3083 | bottom: "mbox_loc"
3084 | bottom: "mbox_conf_flatten"
3085 | bottom: "mbox_priorbox"
3086 | top: "detection_out"
3087 | include {
3088 | phase: TEST
3089 | }
3090 | detection_output_param {
3091 | num_classes: 5
3092 | share_location: true
3093 | background_label_id: 0
3094 | nms_param {
3095 | nms_threshold: 0.45
3096 | top_k: 100
3097 | }
3098 | code_type: CENTER_SIZE
3099 | keep_top_k: 100
3100 | confidence_threshold: 0.25
3101 | }
3102 | }
3103 |
--------------------------------------------------------------------------------
/setup.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 | # Copyright (c) 2018 Intel Corporation.
3 | # Permission is hereby granted, free of charge, to any person obtaining
4 | # a copy of this software and associated documentation files (the
5 | # "Software"), to deal in the Software without restriction, including
6 | # without limitation the rights to use, copy, modify, merge, publish,
7 | # distribute, sublicense, and/or sell copies of the Software, and to
8 | # permit persons to whom the Software is furnished to do so, subject to
9 | # the following conditions:
10 | #
11 | # The above copyright notice and this permission notice shall be
12 | # included in all copies or substantial portions of the Software.
13 | #
14 | # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
15 | # EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
16 | # MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
17 | # NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
18 | # LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
19 | # OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
20 | # WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
21 |
22 | BASE_DIR=$(pwd)
23 |
24 | # Install the dependencies
25 | sudo apt-get update
26 | sudo apt-get install -y ffmpeg
27 | sudo apt-get install -y python3-pip
28 | sudo pip3 install numpy jupyter
29 |
30 | # Download the person-detection-retail-0013 model with the Model Downloader
31 | cd /opt/intel/openvino/deployment_tools/tools/model_downloader
32 | sudo ./downloader.py --name person-detection-retail-0013
33 |
34 | # Convert and optimize the worker-safety-mobilenet Caffe model to OpenVINO IR (FP32 and FP16)
35 | cd /opt/intel/openvino/deployment_tools/model_optimizer/
36 | ./mo_caffe.py --input_model "$BASE_DIR/resources/worker-safety-mobilenet/worker_safety_mobilenet.caffemodel" -o "$BASE_DIR/resources/worker-safety-mobilenet/FP32" --data_type FP32
37 | ./mo_caffe.py --input_model "$BASE_DIR/resources/worker-safety-mobilenet/worker_safety_mobilenet.caffemodel" -o "$BASE_DIR/resources/worker-safety-mobilenet/FP16" --data_type FP16
38 |
39 |
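40 | # mo_caffe.py writes worker_safety_mobilenet.xml and .bin (the IR files) into the FP32 and
41 | # FP16 folders above; FP32 is generally used for CPU inference, FP16 for GPU/VPU targets.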
--------------------------------------------------------------------------------
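
For reference, the prototxt above ends in a DetectionOutput layer, so the converted model emits detections as [image_id, label, confidence, xmin, ymin, xmax, ymax] rows. Below is a minimal sketch of loading the FP32 IR produced by setup.sh with the OpenVINO Inference Engine Python API (2020.x); the relative paths, sample image name, and CPU device choice are illustrative assumptions, not the application's exact code.

```
import cv2
from openvino.inference_engine import IECore

# IR files written by setup.sh (paths relative to the cloned repo -- adjust as needed)
model_xml = "resources/worker-safety-mobilenet/FP32/worker_safety_mobilenet.xml"
model_bin = "resources/worker-safety-mobilenet/FP32/worker_safety_mobilenet.bin"

ie = IECore()
net = ie.read_network(model=model_xml, weights=model_bin)
input_blob = next(iter(net.input_info))
output_blob = next(iter(net.outputs))
n, c, h, w = net.input_info[input_blob].input_data.shape
exec_net = ie.load_network(network=net, device_name="CPU")

frame = cv2.imread("sample_frame.jpg")                  # hypothetical test image (BGR)
blob = cv2.resize(frame, (w, h)).transpose((2, 0, 1))   # HWC -> CHW
blob = blob.reshape((n, c, h, w))

# DetectionOutput produces a [1, 1, N, 7] array:
# [image_id, label, confidence, xmin, ymin, xmax, ymax] with normalized coordinates
detections = exec_net.infer({input_blob: blob})[output_blob]
for _, label, conf, xmin, ymin, xmax, ymax in detections[0][0]:
    if conf > 0.25:   # matches the confidence_threshold in the prototxt
        print(int(label), round(float(conf), 2), xmin, ymin, xmax, ymax)
```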