├── .gitignore
├── LICENSE
├── README.md
├── cfg
│   ├── cd53s-yolov3.cfg
│   ├── cd53s.cfg
│   ├── csresnext50-panet-spp.cfg
│   ├── yolov3-1cls.cfg
│   ├── yolov3-asff.cfg
│   ├── yolov3-spp-1cls.cfg
│   ├── yolov3-spp-20cls.cfg
│   ├── yolov3-spp-2cls.cfg
│   ├── yolov3-spp-3cls.cfg
│   ├── yolov3-spp-6cls.cfg
│   ├── yolov3-spp-matrix.cfg
│   ├── yolov3-spp-pan-scale.cfg
│   ├── yolov3-spp.cfg
│   ├── yolov3-spp3.cfg
│   ├── yolov3-tiny-1cls.cfg
│   ├── yolov3-tiny-2cls.cfg
│   ├── yolov3-tiny-3cls.cfg
│   ├── yolov3-tiny.cfg
│   ├── yolov3-tiny3-1cls.cfg
│   ├── yolov3-tiny3.cfg
│   ├── yolov3.cfg
│   ├── yolov4-relu.cfg
│   ├── yolov4-tiny.cfg
│   └── yolov4.cfg
├── data
│   ├── traffic_light.data
│   ├── traffic_light.names
│   ├── train.shapes
│   ├── train.txt
│   ├── val.shapes
│   └── val.txt
├── detect.py
├── img_to_vid.py
├── models.py
├── notebooks
│   └── eda.ipynb
├── outputs
│   ├── video2.txt
│   ├── video3.txt
│   └── video4_Trim.txt
├── prepare_labels.py
├── prepare_train_val.py
├── preview_images
│   ├── vid_prev1.PNG
│   ├── vid_prev2.PNG
│   └── vid_prev3.PNG
├── requirements.txt
├── results.png
├── results_model_12.txt
├── runs
│   └── Sep11_01-03-16_57a6ce0d91d9model_12
│       └── events.out.tfevents.1599786201.57a6ce0d91d9.426.0
├── test.py
├── test_batch0_gt.jpg
├── test_batch0_pred.jpg
├── train.py
├── utils
│   ├── __init__.py
│   ├── adabound.py
│   ├── datasets.py
│   ├── evolve.sh
│   ├── gcp.sh
│   ├── google_utils.py
│   ├── layers.py
│   ├── parse_config.py
│   ├── torch_utils.py
│   └── utils.py
└── weights
    └── readme.txt
/.gitignore:
--------------------------------------------------------------------------------
1 | # Byte-compiled / optimized / DLL files
2 | __pycache__/
3 | *.py[cod]
4 | *$py.class
5 |
6 | # C extensions
7 | *.so
8 |
9 | # Distribution / packaging
10 | .Python
11 | build/
12 | develop-eggs/
13 | dist/
14 | downloads/
15 | eggs/
16 | .eggs/
17 | lib/
18 | lib64/
19 | parts/
20 | sdist/
21 | var/
22 | wheels/
23 | pip-wheel-metadata/
24 | share/python-wheels/
25 | *.egg-info/
26 | .installed.cfg
27 | *.egg
28 | MANIFEST
29 |
30 | # PyInstaller
31 | # Usually these files are written by a python script from a template
32 | # before PyInstaller builds the exe, so as to inject date/other infos into it.
33 | *.manifest
34 | *.spec
35 |
36 | # Installer logs
37 | pip-log.txt
38 | pip-delete-this-directory.txt
39 |
40 | # Unit test / coverage reports
41 | htmlcov/
42 | .tox/
43 | .nox/
44 | .coverage
45 | .coverage.*
46 | .cache
47 | nosetests.xml
48 | coverage.xml
49 | *.cover
50 | *.py,cover
51 | .hypothesis/
52 | .pytest_cache/
53 |
54 | # Translations
55 | *.mo
56 | *.pot
57 |
58 | # Django stuff:
59 | *.log
60 | local_settings.py
61 | db.sqlite3
62 | db.sqlite3-journal
63 |
64 | # Flask stuff:
65 | instance/
66 | .webassets-cache
67 |
68 | # Scrapy stuff:
69 | .scrapy
70 |
71 | # Sphinx documentation
72 | docs/_build/
73 |
74 | # PyBuilder
75 | target/
76 |
77 | # Jupyter Notebook
78 | .ipynb_checkpoints
79 |
80 | # IPython
81 | profile_default/
82 | ipython_config.py
83 |
84 | # pyenv
85 | .python-version
86 |
87 | # pipenv
88 | # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
89 | # However, in case of collaboration, if having platform-specific dependencies or dependencies
90 | # having no cross-platform support, pipenv may install dependencies that don't work, or not
91 | # install all needed dependencies.
92 | #Pipfile.lock
93 |
94 | # PEP 582; used by e.g. github.com/David-OConnor/pyflow
95 | __pypackages__/
96 |
97 | # Celery stuff
98 | celerybeat-schedule
99 | celerybeat.pid
100 |
101 | # SageMath parsed files
102 | *.sage.py
103 |
104 | # Environments
105 | .env
106 | .venv
107 | env/
108 | venv/
109 | ENV/
110 | env.bak/
111 | venv.bak/
112 |
113 | # Spyder project settings
114 | .spyderproject
115 | .spyproject
116 |
117 | # Rope project settings
118 | .ropeproject
119 |
120 | # mkdocs documentation
121 | /site
122 |
123 | # mypy
124 | .mypy_cache/
125 | .dmypy.json
126 | dmypy.json
127 |
128 | # Pyre type checker
129 | .pyre/
130 |
131 | # input data and models
132 | input/
133 | models/
134 | outputs/*.mp4
135 | weights/*.pt
136 | commands.txt
137 | training_tracker.xlsx
138 |
139 | # data files
140 | *.csv
141 | *.h5
142 | *.pkl
143 | *.pth
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Real Time Traffic Light Detection using Deep Learning (YOLOv3)
2 |
3 |
4 |
5 | ## Table of Contents
6 |
7 | * [About](#About)
8 | * [Progress and TODO](#Progress-and-TODO)
9 | * [Download Trained Weights](#Download-Trained-Weights)
10 | * [Get the Dataset](#Get-the-Dataset)
11 | * [Steps to Train](#Steps-to-Train)
12 | * [Query on Ultralytics YOLOv3 img-size](#Query-on-Ultralytics-YOLOv3-img-size)
13 | * [To Detect Using the Trained Model](#To-Detect-Using-the-Trained-Model)
14 | * [References](#References)
15 |
16 |
17 |
18 | ## About
19 |
20 | ***This project aims to detect traffic lights in real time using deep learning, as a part of autonomous driving technology.***
21 |
22 | * [Click on the following video to get a better idea about the project and predictions](https://www.youtube.com/watch?v=yy3XsMFKeSg&feature=youtu.be).
23 |
24 | [](https://youtu.be/yy3XsMFKeSg)
25 |
26 |
27 |
28 | ## Progress and TODO
29 |
30 | * **Implementation for all the traffic light types is done, but the final model is still being trained almost every day to make it better. Check the [Download Trained Weights](#Download-Trained-Weights) section to get your desired weight files and try the model on your system.**
31 |
32 | - [x] Detecting red (circular) `stop` sign.
33 | - [x] Detecting green (circular) `go` sign.
34 | - [x] Training for night-time detection => working, but not perfect. Better updates to come soon.
35 | - [x] Detecting `warningLeft` sign.
36 | - [x] Detecting `goLeft` sign.
37 | - [x] Detecting `stopleft` sign.
38 | - [x] Detecting `warning` sign.
39 | - [ ] Carla support => **This one is a bit tricky.**
40 |
41 |
42 |
43 | ## Download Trained Weights
44 |
45 | ***Download the trained weights from [here](https://drive.google.com/drive/folders/1nGRGqw5KP6js9UbXDL5G99j_jYdKgdXl?usp=sharing).***
46 |
47 | * `best_model_12.pt`: **Trained for 67 epochs on all the traffic signs. Current mAP is 0.919**
48 |
49 |
50 |
51 | ## Get the Dataset
52 |
53 | This project uses the [LISA Traffic Light Dataset](https://www.kaggle.com/mbornoe/lisa-traffic-light-dataset). Download the dataset from Kaggle [here](https://www.kaggle.com/mbornoe/lisa-traffic-light-dataset).
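   |
   | If the Kaggle CLI is set up on your machine, the dataset can also be pulled from the command line. This is just a convenience; the dataset slug below is assumed from the Kaggle URL above.
   |
   | * `kaggle datasets download -d mbornoe/lisa-traffic-light-dataset`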
54 |
55 |
56 |
57 | ## Steps to Train
58 |
59 | * **The current train/test split is 90/10 and the input image size is 608x608, so training may take a long time on a modest GPU. I have trained the model on Google Colab with a Tesla T4/P100 GPU; one epoch with all the classes took around 1 hour on a Tesla T4. Also, check the `cfg` folder and files before training. You have to use the cfg file corresponding to the number of classes you are training on, and if you want to change the number of classes, you have to change the cfg file too (see the cfg sketch after these steps). The current model has been trained on all 6 classes, so the cfg file is `yolov3-spp-6cls.cfg`.**
60 |
61 | * Prepare the data. **Please do take a look at the paths inside the `prepare_labels.py` file and change them according to your preference and convenience**.
62 | * `python prepare_labels.py`
63 | * Create the train and validation text files (**Current train/validation split = 90/10**).
64 | * `python prepare_train_val.py`
65 | * To train on your own system (the current [model](https://drive.google.com/drive/folders/1nGRGqw5KP6js9UbXDL5G99j_jYdKgdXl?usp=sharing) has been trained for 30 epochs). A sketch of the `traffic_light.data` file referenced by `--data` is shown after these steps.
66 | * **To train from scratch**: `python train.py --data /traffic_light.data --batch 2 --cfg cfg/yolov3-spp-6cls.cfg --epochs 55 --weights "" --name from_scratch`
67 | * **Using COCO pretrained weights**: `python train.py --data /traffic_light.data --batch 4 --cfg cfg/yolov3-spp-6cls.cfg --epochs 55 --multi-scale --img-size 608 608 --weights weights/yolov3-spp-ultralytics.pt --name coco_pretrained`
68 | * **To resume training**: `python train.py --data /traffic_light.data --batch 2 --cfg cfg/yolov3-spp-6cls.cfg --epochs <num epochs> --multi-scale --img-size 608 608 --resume --weights weights/<checkpoint>.pt --name <run name>`
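   |
   | The `--data` argument in the commands above points to the `traffic_light.data` file, which ties together the class count, the train/validation image lists, and the names file. Below is a minimal sketch of what such a file typically looks like; the exact paths are assumptions and depend on where `prepare_train_val.py` writes `train.txt` and `val.txt` on your system.
   |
   | ```
   | # traffic_light.data (sketch -- adjust the paths to your setup)
   | classes=6
   | train=data/train.txt
   | valid=data/val.txt
   | names=data/traffic_light.names
   | ```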
69 |
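   | Changing the number of classes means editing two values in every detection head of the chosen cfg file: `classes=` in each `[yolo]` block and `filters=` in the `[convolutional]` layer directly before it, which must equal `(classes + 5) * 3`. A minimal sketch of one head for the 6-class setup is shown below; the 1-class cfgs in this repo use `filters=18` for the same reason.
   |
   | ```
   | [convolutional]
   | size=1
   | stride=1
   | pad=1
   | # filters = (classes + 5) * 3 = (6 + 5) * 3
   | filters=33
   | activation=linear
   |
   | [yolo]
   | mask = 6,7,8
   | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
   | classes=6
   | num=9
   | jitter=.3
   | ignore_thresh = .7
   | truth_thresh = 1
   | random=1
   | ```
   |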
70 | ### [Query on Ultralytics YOLOv3 img-size](https://github.com/ultralytics/yolov3/issues/456).
71 |
72 | * Short answer: the image size in the `cfg` file is not used; only the `--img-size` argument passed to the Python scripts' argument parser is used.
73 |
74 |
75 |
76 | ## To Detect Using the Trained Model
77 |
78 | * **Download the [weights here](https://drive.google.com/drive/folders/1nGRGqw5KP6js9UbXDL5G99j_jYdKgdXl?usp=sharing) first, and place them under the `weights` folder.**
79 | * `python detect.py --source <path to image/video> --view-img --weights weights/<weight file>.pt --img-size 608`
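   |
   | For example, to run the `best_model_12.pt` weights from the [Download Trained Weights](#Download-Trained-Weights) section on one of the test videos (the source file name here is only an illustration; point `--source` at whatever image or video you have):
   |
   | * `python detect.py --source video3.mp4 --view-img --weights weights/best_model_12.pt --img-size 608`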
80 |
81 |
82 |
83 | ## References
84 |
85 | ### Articles / Blogs / Tutorials
86 |
87 | * [Recognizing Traffic Lights With Deep Learning.](https://www.freecodecamp.org/news/recognizing-traffic-lights-with-deep-learning-23dae23287cc/)
88 | * [Self Driving Vehicles: Traffic Light Detection and Classification with TensorFlow Object Detection API.](https://becominghuman.ai/traffic-light-detection-tensorflow-api-c75fdbadac62)
89 |
90 | ### Papers
91 |
92 | * [Detecting Traffic Lights by Single Shot Detection.](https://arxiv.org/pdf/1805.02523.pdf)
93 | * [A Hierarchical Deep Architecture and Mini-Batch Selection Method For Joint Traffic Sign and Light Detection.](https://arxiv.org/pdf/1806.07987v2.pdf)
94 | * [Accurate traffic light detection using deep neural network with focal regression loss.](https://www.sciencedirect.com/science/article/pii/S0262885619300538)
95 |
96 | ### GitHub
97 |
98 | * The YOLOv3 code has been taken from the [Ultralytics YOLOv3](https://github.com/ultralytics/yolov3) repo and modified for this use case.
99 | * [TL-SSD: Detecting Traffic Lights by Single Shot Detection.](https://github.com/julimueller/tl_ssd)
100 | * [Detecting Traffic Lights in Real-time with YOLOv3.](https://github.com/berktepebag/Traffic-light-detection-with-YOLOv3-BOSCH-traffic-light-dataset)
101 |
102 | ### Dataset
103 |
104 | * [LISA Traffic Light Dataset.](https://www.kaggle.com/mbornoe/lisa-traffic-light-dataset)
105 |
106 | ### Image / Video Credits
107 |
108 | * **These may include links and citations for the data that I use for testing. You can also use these links to obtain the videos.**
109 | * `video1.mp4`: https://www.youtube.com/watch?v=yJrW8werMUs.
110 | * `video2.mp4`: https://www.youtube.com/watch?v=pU8ThDYZcCc.
111 | * `video3.mp4`: https://www.youtube.com/watch?v=iS5sq9IELEo.
112 | * `video4.mp4`: https://www.youtube.com/watch?v=GfWskqDjeTE.
113 | * `video5.mp4`: https://www.youtube.com/watch?v=7HaJArMDKgI.
114 | * `video6.mp4`: https://www.youtube.com/watch?v=NK_HNF1C8yA.
115 | * `video7.mp4`: https://www.youtube.com/watch?v=w-W9esW3eqI.
116 | * `video8.mp4`: https://www.youtube.com/watch?v=RPDYLA8Rh_M.
117 | * `video9.mp4`: https://www.youtube.com/watch?v=imeV3Pm-ZLE.
118 |
--------------------------------------------------------------------------------
/cfg/yolov3-1cls.cfg:
--------------------------------------------------------------------------------
1 | [net]
2 | # Testing
3 | #batch=1
4 | #subdivisions=1
5 | # Training
6 | batch=16
7 | subdivisions=1
8 | width=416
9 | height=416
10 | channels=3
11 | momentum=0.9
12 | decay=0.0005
13 | angle=0
14 | saturation = 1.5
15 | exposure = 1.5
16 | hue=.1
17 |
18 | learning_rate=0.001
19 | burn_in=1000
20 | max_batches = 500200
21 | policy=steps
22 | steps=400000,450000
23 | scales=.1,.1
24 |
25 | [convolutional]
26 | batch_normalize=1
27 | filters=32
28 | size=3
29 | stride=1
30 | pad=1
31 | activation=leaky
32 |
33 | # Downsample
34 |
35 | [convolutional]
36 | batch_normalize=1
37 | filters=64
38 | size=3
39 | stride=2
40 | pad=1
41 | activation=leaky
42 |
43 | [convolutional]
44 | batch_normalize=1
45 | filters=32
46 | size=1
47 | stride=1
48 | pad=1
49 | activation=leaky
50 |
51 | [convolutional]
52 | batch_normalize=1
53 | filters=64
54 | size=3
55 | stride=1
56 | pad=1
57 | activation=leaky
58 |
59 | [shortcut]
60 | from=-3
61 | activation=linear
62 |
63 | # Downsample
64 |
65 | [convolutional]
66 | batch_normalize=1
67 | filters=128
68 | size=3
69 | stride=2
70 | pad=1
71 | activation=leaky
72 |
73 | [convolutional]
74 | batch_normalize=1
75 | filters=64
76 | size=1
77 | stride=1
78 | pad=1
79 | activation=leaky
80 |
81 | [convolutional]
82 | batch_normalize=1
83 | filters=128
84 | size=3
85 | stride=1
86 | pad=1
87 | activation=leaky
88 |
89 | [shortcut]
90 | from=-3
91 | activation=linear
92 |
93 | [convolutional]
94 | batch_normalize=1
95 | filters=64
96 | size=1
97 | stride=1
98 | pad=1
99 | activation=leaky
100 |
101 | [convolutional]
102 | batch_normalize=1
103 | filters=128
104 | size=3
105 | stride=1
106 | pad=1
107 | activation=leaky
108 |
109 | [shortcut]
110 | from=-3
111 | activation=linear
112 |
113 | # Downsample
114 |
115 | [convolutional]
116 | batch_normalize=1
117 | filters=256
118 | size=3
119 | stride=2
120 | pad=1
121 | activation=leaky
122 |
123 | [convolutional]
124 | batch_normalize=1
125 | filters=128
126 | size=1
127 | stride=1
128 | pad=1
129 | activation=leaky
130 |
131 | [convolutional]
132 | batch_normalize=1
133 | filters=256
134 | size=3
135 | stride=1
136 | pad=1
137 | activation=leaky
138 |
139 | [shortcut]
140 | from=-3
141 | activation=linear
142 |
143 | [convolutional]
144 | batch_normalize=1
145 | filters=128
146 | size=1
147 | stride=1
148 | pad=1
149 | activation=leaky
150 |
151 | [convolutional]
152 | batch_normalize=1
153 | filters=256
154 | size=3
155 | stride=1
156 | pad=1
157 | activation=leaky
158 |
159 | [shortcut]
160 | from=-3
161 | activation=linear
162 |
163 | [convolutional]
164 | batch_normalize=1
165 | filters=128
166 | size=1
167 | stride=1
168 | pad=1
169 | activation=leaky
170 |
171 | [convolutional]
172 | batch_normalize=1
173 | filters=256
174 | size=3
175 | stride=1
176 | pad=1
177 | activation=leaky
178 |
179 | [shortcut]
180 | from=-3
181 | activation=linear
182 |
183 | [convolutional]
184 | batch_normalize=1
185 | filters=128
186 | size=1
187 | stride=1
188 | pad=1
189 | activation=leaky
190 |
191 | [convolutional]
192 | batch_normalize=1
193 | filters=256
194 | size=3
195 | stride=1
196 | pad=1
197 | activation=leaky
198 |
199 | [shortcut]
200 | from=-3
201 | activation=linear
202 |
203 |
204 | [convolutional]
205 | batch_normalize=1
206 | filters=128
207 | size=1
208 | stride=1
209 | pad=1
210 | activation=leaky
211 |
212 | [convolutional]
213 | batch_normalize=1
214 | filters=256
215 | size=3
216 | stride=1
217 | pad=1
218 | activation=leaky
219 |
220 | [shortcut]
221 | from=-3
222 | activation=linear
223 |
224 | [convolutional]
225 | batch_normalize=1
226 | filters=128
227 | size=1
228 | stride=1
229 | pad=1
230 | activation=leaky
231 |
232 | [convolutional]
233 | batch_normalize=1
234 | filters=256
235 | size=3
236 | stride=1
237 | pad=1
238 | activation=leaky
239 |
240 | [shortcut]
241 | from=-3
242 | activation=linear
243 |
244 | [convolutional]
245 | batch_normalize=1
246 | filters=128
247 | size=1
248 | stride=1
249 | pad=1
250 | activation=leaky
251 |
252 | [convolutional]
253 | batch_normalize=1
254 | filters=256
255 | size=3
256 | stride=1
257 | pad=1
258 | activation=leaky
259 |
260 | [shortcut]
261 | from=-3
262 | activation=linear
263 |
264 | [convolutional]
265 | batch_normalize=1
266 | filters=128
267 | size=1
268 | stride=1
269 | pad=1
270 | activation=leaky
271 |
272 | [convolutional]
273 | batch_normalize=1
274 | filters=256
275 | size=3
276 | stride=1
277 | pad=1
278 | activation=leaky
279 |
280 | [shortcut]
281 | from=-3
282 | activation=linear
283 |
284 | # Downsample
285 |
286 | [convolutional]
287 | batch_normalize=1
288 | filters=512
289 | size=3
290 | stride=2
291 | pad=1
292 | activation=leaky
293 |
294 | [convolutional]
295 | batch_normalize=1
296 | filters=256
297 | size=1
298 | stride=1
299 | pad=1
300 | activation=leaky
301 |
302 | [convolutional]
303 | batch_normalize=1
304 | filters=512
305 | size=3
306 | stride=1
307 | pad=1
308 | activation=leaky
309 |
310 | [shortcut]
311 | from=-3
312 | activation=linear
313 |
314 |
315 | [convolutional]
316 | batch_normalize=1
317 | filters=256
318 | size=1
319 | stride=1
320 | pad=1
321 | activation=leaky
322 |
323 | [convolutional]
324 | batch_normalize=1
325 | filters=512
326 | size=3
327 | stride=1
328 | pad=1
329 | activation=leaky
330 |
331 | [shortcut]
332 | from=-3
333 | activation=linear
334 |
335 |
336 | [convolutional]
337 | batch_normalize=1
338 | filters=256
339 | size=1
340 | stride=1
341 | pad=1
342 | activation=leaky
343 |
344 | [convolutional]
345 | batch_normalize=1
346 | filters=512
347 | size=3
348 | stride=1
349 | pad=1
350 | activation=leaky
351 |
352 | [shortcut]
353 | from=-3
354 | activation=linear
355 |
356 |
357 | [convolutional]
358 | batch_normalize=1
359 | filters=256
360 | size=1
361 | stride=1
362 | pad=1
363 | activation=leaky
364 |
365 | [convolutional]
366 | batch_normalize=1
367 | filters=512
368 | size=3
369 | stride=1
370 | pad=1
371 | activation=leaky
372 |
373 | [shortcut]
374 | from=-3
375 | activation=linear
376 |
377 | [convolutional]
378 | batch_normalize=1
379 | filters=256
380 | size=1
381 | stride=1
382 | pad=1
383 | activation=leaky
384 |
385 | [convolutional]
386 | batch_normalize=1
387 | filters=512
388 | size=3
389 | stride=1
390 | pad=1
391 | activation=leaky
392 |
393 | [shortcut]
394 | from=-3
395 | activation=linear
396 |
397 |
398 | [convolutional]
399 | batch_normalize=1
400 | filters=256
401 | size=1
402 | stride=1
403 | pad=1
404 | activation=leaky
405 |
406 | [convolutional]
407 | batch_normalize=1
408 | filters=512
409 | size=3
410 | stride=1
411 | pad=1
412 | activation=leaky
413 |
414 | [shortcut]
415 | from=-3
416 | activation=linear
417 |
418 |
419 | [convolutional]
420 | batch_normalize=1
421 | filters=256
422 | size=1
423 | stride=1
424 | pad=1
425 | activation=leaky
426 |
427 | [convolutional]
428 | batch_normalize=1
429 | filters=512
430 | size=3
431 | stride=1
432 | pad=1
433 | activation=leaky
434 |
435 | [shortcut]
436 | from=-3
437 | activation=linear
438 |
439 | [convolutional]
440 | batch_normalize=1
441 | filters=256
442 | size=1
443 | stride=1
444 | pad=1
445 | activation=leaky
446 |
447 | [convolutional]
448 | batch_normalize=1
449 | filters=512
450 | size=3
451 | stride=1
452 | pad=1
453 | activation=leaky
454 |
455 | [shortcut]
456 | from=-3
457 | activation=linear
458 |
459 | # Downsample
460 |
461 | [convolutional]
462 | batch_normalize=1
463 | filters=1024
464 | size=3
465 | stride=2
466 | pad=1
467 | activation=leaky
468 |
469 | [convolutional]
470 | batch_normalize=1
471 | filters=512
472 | size=1
473 | stride=1
474 | pad=1
475 | activation=leaky
476 |
477 | [convolutional]
478 | batch_normalize=1
479 | filters=1024
480 | size=3
481 | stride=1
482 | pad=1
483 | activation=leaky
484 |
485 | [shortcut]
486 | from=-3
487 | activation=linear
488 |
489 | [convolutional]
490 | batch_normalize=1
491 | filters=512
492 | size=1
493 | stride=1
494 | pad=1
495 | activation=leaky
496 |
497 | [convolutional]
498 | batch_normalize=1
499 | filters=1024
500 | size=3
501 | stride=1
502 | pad=1
503 | activation=leaky
504 |
505 | [shortcut]
506 | from=-3
507 | activation=linear
508 |
509 | [convolutional]
510 | batch_normalize=1
511 | filters=512
512 | size=1
513 | stride=1
514 | pad=1
515 | activation=leaky
516 |
517 | [convolutional]
518 | batch_normalize=1
519 | filters=1024
520 | size=3
521 | stride=1
522 | pad=1
523 | activation=leaky
524 |
525 | [shortcut]
526 | from=-3
527 | activation=linear
528 |
529 | [convolutional]
530 | batch_normalize=1
531 | filters=512
532 | size=1
533 | stride=1
534 | pad=1
535 | activation=leaky
536 |
537 | [convolutional]
538 | batch_normalize=1
539 | filters=1024
540 | size=3
541 | stride=1
542 | pad=1
543 | activation=leaky
544 |
545 | [shortcut]
546 | from=-3
547 | activation=linear
548 |
549 | ######################
550 |
551 | [convolutional]
552 | batch_normalize=1
553 | filters=512
554 | size=1
555 | stride=1
556 | pad=1
557 | activation=leaky
558 |
559 | [convolutional]
560 | batch_normalize=1
561 | size=3
562 | stride=1
563 | pad=1
564 | filters=1024
565 | activation=leaky
566 |
567 | [convolutional]
568 | batch_normalize=1
569 | filters=512
570 | size=1
571 | stride=1
572 | pad=1
573 | activation=leaky
574 |
575 | [convolutional]
576 | batch_normalize=1
577 | size=3
578 | stride=1
579 | pad=1
580 | filters=1024
581 | activation=leaky
582 |
583 | [convolutional]
584 | batch_normalize=1
585 | filters=512
586 | size=1
587 | stride=1
588 | pad=1
589 | activation=leaky
590 |
591 | [convolutional]
592 | batch_normalize=1
593 | size=3
594 | stride=1
595 | pad=1
596 | filters=1024
597 | activation=leaky
598 |
599 | [convolutional]
600 | size=1
601 | stride=1
602 | pad=1
603 | filters=18
604 | activation=linear
605 |
606 |
607 | [yolo]
608 | mask = 6,7,8
609 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
610 | classes=1
611 | num=9
612 | jitter=.3
613 | ignore_thresh = .7
614 | truth_thresh = 1
615 | random=1
616 |
617 |
618 | [route]
619 | layers = -4
620 |
621 | [convolutional]
622 | batch_normalize=1
623 | filters=256
624 | size=1
625 | stride=1
626 | pad=1
627 | activation=leaky
628 |
629 | [upsample]
630 | stride=2
631 |
632 | [route]
633 | layers = -1, 61
634 |
635 |
636 |
637 | [convolutional]
638 | batch_normalize=1
639 | filters=256
640 | size=1
641 | stride=1
642 | pad=1
643 | activation=leaky
644 |
645 | [convolutional]
646 | batch_normalize=1
647 | size=3
648 | stride=1
649 | pad=1
650 | filters=512
651 | activation=leaky
652 |
653 | [convolutional]
654 | batch_normalize=1
655 | filters=256
656 | size=1
657 | stride=1
658 | pad=1
659 | activation=leaky
660 |
661 | [convolutional]
662 | batch_normalize=1
663 | size=3
664 | stride=1
665 | pad=1
666 | filters=512
667 | activation=leaky
668 |
669 | [convolutional]
670 | batch_normalize=1
671 | filters=256
672 | size=1
673 | stride=1
674 | pad=1
675 | activation=leaky
676 |
677 | [convolutional]
678 | batch_normalize=1
679 | size=3
680 | stride=1
681 | pad=1
682 | filters=512
683 | activation=leaky
684 |
685 | [convolutional]
686 | size=1
687 | stride=1
688 | pad=1
689 | filters=18
690 | activation=linear
691 |
692 |
693 | [yolo]
694 | mask = 3,4,5
695 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
696 | classes=1
697 | num=9
698 | jitter=.3
699 | ignore_thresh = .7
700 | truth_thresh = 1
701 | random=1
702 |
703 |
704 |
705 | [route]
706 | layers = -4
707 |
708 | [convolutional]
709 | batch_normalize=1
710 | filters=128
711 | size=1
712 | stride=1
713 | pad=1
714 | activation=leaky
715 |
716 | [upsample]
717 | stride=2
718 |
719 | [route]
720 | layers = -1, 36
721 |
722 |
723 |
724 | [convolutional]
725 | batch_normalize=1
726 | filters=128
727 | size=1
728 | stride=1
729 | pad=1
730 | activation=leaky
731 |
732 | [convolutional]
733 | batch_normalize=1
734 | size=3
735 | stride=1
736 | pad=1
737 | filters=256
738 | activation=leaky
739 |
740 | [convolutional]
741 | batch_normalize=1
742 | filters=128
743 | size=1
744 | stride=1
745 | pad=1
746 | activation=leaky
747 |
748 | [convolutional]
749 | batch_normalize=1
750 | size=3
751 | stride=1
752 | pad=1
753 | filters=256
754 | activation=leaky
755 |
756 | [convolutional]
757 | batch_normalize=1
758 | filters=128
759 | size=1
760 | stride=1
761 | pad=1
762 | activation=leaky
763 |
764 | [convolutional]
765 | batch_normalize=1
766 | size=3
767 | stride=1
768 | pad=1
769 | filters=256
770 | activation=leaky
771 |
772 | [convolutional]
773 | size=1
774 | stride=1
775 | pad=1
776 | filters=18
777 | activation=linear
778 |
779 |
780 | [yolo]
781 | mask = 0,1,2
782 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
783 | classes=1
784 | num=9
785 | jitter=.3
786 | ignore_thresh = .7
787 | truth_thresh = 1
788 | random=1
789 |
--------------------------------------------------------------------------------
/cfg/yolov3-asff.cfg:
--------------------------------------------------------------------------------
1 | # Generated by Glenn Jocher (glenn.jocher@ultralytics.com) for https://github.com/ultralytics/yolov3
2 | # def kmean_anchors(path='../coco/train2017.txt', n=12, img_size=(320, 640)): # from utils.utils import *; kmean_anchors()
3 | # Evolving anchors: 100%|██████████| 1000/1000 [41:15<00:00, 2.48s/it]
4 | # 0.20 iou_thr: 0.992 best possible recall, 4.25 anchors > thr
5 | # kmeans anchors (n=12, img_size=(320, 640), IoU=0.005/0.184/0.634-min/mean/best): 6,9, 15,16, 17,35, 37,26, 36,67, 63,42, 57,100, 121,81, 112,169, 241,158, 195,310, 426,359
6 |
7 | [net]
8 | # Testing
9 | # batch=1
10 | # subdivisions=1
11 | # Training
12 | batch=64
13 | subdivisions=16
14 | width=608
15 | height=608
16 | channels=3
17 | momentum=0.9
18 | decay=0.0005
19 | angle=0
20 | saturation = 1.5
21 | exposure = 1.5
22 | hue=.1
23 |
24 | learning_rate=0.001
25 | burn_in=1000
26 | max_batches = 500200
27 | policy=steps
28 | steps=400000,450000
29 | scales=.1,.1
30 |
31 | [convolutional]
32 | batch_normalize=1
33 | filters=32
34 | size=3
35 | stride=1
36 | pad=1
37 | activation=leaky
38 |
39 | # Downsample
40 |
41 | [convolutional]
42 | batch_normalize=1
43 | filters=64
44 | size=3
45 | stride=2
46 | pad=1
47 | activation=leaky
48 |
49 | [convolutional]
50 | batch_normalize=1
51 | filters=32
52 | size=1
53 | stride=1
54 | pad=1
55 | activation=leaky
56 |
57 | [convolutional]
58 | batch_normalize=1
59 | filters=64
60 | size=3
61 | stride=1
62 | pad=1
63 | activation=leaky
64 |
65 | [shortcut]
66 | from=-3
67 | activation=linear
68 |
69 | # Downsample
70 |
71 | [convolutional]
72 | batch_normalize=1
73 | filters=128
74 | size=3
75 | stride=2
76 | pad=1
77 | activation=leaky
78 |
79 | [convolutional]
80 | batch_normalize=1
81 | filters=64
82 | size=1
83 | stride=1
84 | pad=1
85 | activation=leaky
86 |
87 | [convolutional]
88 | batch_normalize=1
89 | filters=128
90 | size=3
91 | stride=1
92 | pad=1
93 | activation=leaky
94 |
95 | [shortcut]
96 | from=-3
97 | activation=linear
98 |
99 | [convolutional]
100 | batch_normalize=1
101 | filters=64
102 | size=1
103 | stride=1
104 | pad=1
105 | activation=leaky
106 |
107 | [convolutional]
108 | batch_normalize=1
109 | filters=128
110 | size=3
111 | stride=1
112 | pad=1
113 | activation=leaky
114 |
115 | [shortcut]
116 | from=-3
117 | activation=linear
118 |
119 | # Downsample
120 |
121 | [convolutional]
122 | batch_normalize=1
123 | filters=256
124 | size=3
125 | stride=2
126 | pad=1
127 | activation=leaky
128 |
129 | [convolutional]
130 | batch_normalize=1
131 | filters=128
132 | size=1
133 | stride=1
134 | pad=1
135 | activation=leaky
136 |
137 | [convolutional]
138 | batch_normalize=1
139 | filters=256
140 | size=3
141 | stride=1
142 | pad=1
143 | activation=leaky
144 |
145 | [shortcut]
146 | from=-3
147 | activation=linear
148 |
149 | [convolutional]
150 | batch_normalize=1
151 | filters=128
152 | size=1
153 | stride=1
154 | pad=1
155 | activation=leaky
156 |
157 | [convolutional]
158 | batch_normalize=1
159 | filters=256
160 | size=3
161 | stride=1
162 | pad=1
163 | activation=leaky
164 |
165 | [shortcut]
166 | from=-3
167 | activation=linear
168 |
169 | [convolutional]
170 | batch_normalize=1
171 | filters=128
172 | size=1
173 | stride=1
174 | pad=1
175 | activation=leaky
176 |
177 | [convolutional]
178 | batch_normalize=1
179 | filters=256
180 | size=3
181 | stride=1
182 | pad=1
183 | activation=leaky
184 |
185 | [shortcut]
186 | from=-3
187 | activation=linear
188 |
189 | [convolutional]
190 | batch_normalize=1
191 | filters=128
192 | size=1
193 | stride=1
194 | pad=1
195 | activation=leaky
196 |
197 | [convolutional]
198 | batch_normalize=1
199 | filters=256
200 | size=3
201 | stride=1
202 | pad=1
203 | activation=leaky
204 |
205 | [shortcut]
206 | from=-3
207 | activation=linear
208 |
209 | [convolutional]
210 | batch_normalize=1
211 | filters=128
212 | size=1
213 | stride=1
214 | pad=1
215 | activation=leaky
216 |
217 | [convolutional]
218 | batch_normalize=1
219 | filters=256
220 | size=3
221 | stride=1
222 | pad=1
223 | activation=leaky
224 |
225 | [shortcut]
226 | from=-3
227 | activation=linear
228 |
229 | [convolutional]
230 | batch_normalize=1
231 | filters=128
232 | size=1
233 | stride=1
234 | pad=1
235 | activation=leaky
236 |
237 | [convolutional]
238 | batch_normalize=1
239 | filters=256
240 | size=3
241 | stride=1
242 | pad=1
243 | activation=leaky
244 |
245 | [shortcut]
246 | from=-3
247 | activation=linear
248 |
249 | [convolutional]
250 | batch_normalize=1
251 | filters=128
252 | size=1
253 | stride=1
254 | pad=1
255 | activation=leaky
256 |
257 | [convolutional]
258 | batch_normalize=1
259 | filters=256
260 | size=3
261 | stride=1
262 | pad=1
263 | activation=leaky
264 |
265 | [shortcut]
266 | from=-3
267 | activation=linear
268 |
269 | [convolutional]
270 | batch_normalize=1
271 | filters=128
272 | size=1
273 | stride=1
274 | pad=1
275 | activation=leaky
276 |
277 | [convolutional]
278 | batch_normalize=1
279 | filters=256
280 | size=3
281 | stride=1
282 | pad=1
283 | activation=leaky
284 |
285 | [shortcut]
286 | from=-3
287 | activation=linear
288 |
289 | # Downsample
290 |
291 | [convolutional]
292 | batch_normalize=1
293 | filters=512
294 | size=3
295 | stride=2
296 | pad=1
297 | activation=leaky
298 |
299 | [convolutional]
300 | batch_normalize=1
301 | filters=256
302 | size=1
303 | stride=1
304 | pad=1
305 | activation=leaky
306 |
307 | [convolutional]
308 | batch_normalize=1
309 | filters=512
310 | size=3
311 | stride=1
312 | pad=1
313 | activation=leaky
314 |
315 | [shortcut]
316 | from=-3
317 | activation=linear
318 |
319 | [convolutional]
320 | batch_normalize=1
321 | filters=256
322 | size=1
323 | stride=1
324 | pad=1
325 | activation=leaky
326 |
327 | [convolutional]
328 | batch_normalize=1
329 | filters=512
330 | size=3
331 | stride=1
332 | pad=1
333 | activation=leaky
334 |
335 | [shortcut]
336 | from=-3
337 | activation=linear
338 |
339 | [convolutional]
340 | batch_normalize=1
341 | filters=256
342 | size=1
343 | stride=1
344 | pad=1
345 | activation=leaky
346 |
347 | [convolutional]
348 | batch_normalize=1
349 | filters=512
350 | size=3
351 | stride=1
352 | pad=1
353 | activation=leaky
354 |
355 | [shortcut]
356 | from=-3
357 | activation=linear
358 |
359 | [convolutional]
360 | batch_normalize=1
361 | filters=256
362 | size=1
363 | stride=1
364 | pad=1
365 | activation=leaky
366 |
367 | [convolutional]
368 | batch_normalize=1
369 | filters=512
370 | size=3
371 | stride=1
372 | pad=1
373 | activation=leaky
374 |
375 | [shortcut]
376 | from=-3
377 | activation=linear
378 |
379 | [convolutional]
380 | batch_normalize=1
381 | filters=256
382 | size=1
383 | stride=1
384 | pad=1
385 | activation=leaky
386 |
387 | [convolutional]
388 | batch_normalize=1
389 | filters=512
390 | size=3
391 | stride=1
392 | pad=1
393 | activation=leaky
394 |
395 | [shortcut]
396 | from=-3
397 | activation=linear
398 |
399 | [convolutional]
400 | batch_normalize=1
401 | filters=256
402 | size=1
403 | stride=1
404 | pad=1
405 | activation=leaky
406 |
407 | [convolutional]
408 | batch_normalize=1
409 | filters=512
410 | size=3
411 | stride=1
412 | pad=1
413 | activation=leaky
414 |
415 | [shortcut]
416 | from=-3
417 | activation=linear
418 |
419 | [convolutional]
420 | batch_normalize=1
421 | filters=256
422 | size=1
423 | stride=1
424 | pad=1
425 | activation=leaky
426 |
427 | [convolutional]
428 | batch_normalize=1
429 | filters=512
430 | size=3
431 | stride=1
432 | pad=1
433 | activation=leaky
434 |
435 | [shortcut]
436 | from=-3
437 | activation=linear
438 |
439 | [convolutional]
440 | batch_normalize=1
441 | filters=256
442 | size=1
443 | stride=1
444 | pad=1
445 | activation=leaky
446 |
447 | [convolutional]
448 | batch_normalize=1
449 | filters=512
450 | size=3
451 | stride=1
452 | pad=1
453 | activation=leaky
454 |
455 | [shortcut]
456 | from=-3
457 | activation=linear
458 |
459 | # Downsample
460 |
461 | [convolutional]
462 | batch_normalize=1
463 | filters=1024
464 | size=3
465 | stride=2
466 | pad=1
467 | activation=leaky
468 |
469 | [convolutional]
470 | batch_normalize=1
471 | filters=512
472 | size=1
473 | stride=1
474 | pad=1
475 | activation=leaky
476 |
477 | [convolutional]
478 | batch_normalize=1
479 | filters=1024
480 | size=3
481 | stride=1
482 | pad=1
483 | activation=leaky
484 |
485 | [shortcut]
486 | from=-3
487 | activation=linear
488 |
489 | [convolutional]
490 | batch_normalize=1
491 | filters=512
492 | size=1
493 | stride=1
494 | pad=1
495 | activation=leaky
496 |
497 | [convolutional]
498 | batch_normalize=1
499 | filters=1024
500 | size=3
501 | stride=1
502 | pad=1
503 | activation=leaky
504 |
505 | [shortcut]
506 | from=-3
507 | activation=linear
508 |
509 | [convolutional]
510 | batch_normalize=1
511 | filters=512
512 | size=1
513 | stride=1
514 | pad=1
515 | activation=leaky
516 |
517 | [convolutional]
518 | batch_normalize=1
519 | filters=1024
520 | size=3
521 | stride=1
522 | pad=1
523 | activation=leaky
524 |
525 | [shortcut]
526 | from=-3
527 | activation=linear
528 |
529 | [convolutional]
530 | batch_normalize=1
531 | filters=512
532 | size=1
533 | stride=1
534 | pad=1
535 | activation=leaky
536 |
537 | [convolutional]
538 | batch_normalize=1
539 | filters=1024
540 | size=3
541 | stride=1
542 | pad=1
543 | activation=leaky
544 |
545 | [shortcut]
546 | from=-3
547 | activation=linear
548 |
549 | ######################
550 |
551 | [convolutional]
552 | batch_normalize=1
553 | filters=512
554 | size=1
555 | stride=1
556 | pad=1
557 | activation=leaky
558 |
559 | [convolutional]
560 | batch_normalize=1
561 | size=3
562 | stride=1
563 | pad=1
564 | filters=1024
565 | activation=leaky
566 |
567 | [convolutional]
568 | batch_normalize=1
569 | filters=512
570 | size=1
571 | stride=1
572 | pad=1
573 | activation=leaky
574 |
575 | # SPP --------------------------------------------------------------------------
576 | [maxpool]
577 | stride=1
578 | size=5
579 |
580 | [route]
581 | layers=-2
582 |
583 | [maxpool]
584 | stride=1
585 | size=9
586 |
587 | [route]
588 | layers=-4
589 |
590 | [maxpool]
591 | stride=1
592 | size=13
593 |
594 | [route]
595 | layers=-1,-3,-5,-6
596 | # SPP --------------------------------------------------------------------------
597 |
598 | [convolutional]
599 | batch_normalize=1
600 | filters=512
601 | size=1
602 | stride=1
603 | pad=1
604 | activation=leaky
605 |
606 | [convolutional]
607 | batch_normalize=1
608 | size=3
609 | stride=1
610 | pad=1
611 | filters=1024
612 | activation=leaky
613 |
614 | [convolutional]
615 | batch_normalize=1
616 | filters=512
617 | size=1
618 | stride=1
619 | pad=1
620 | activation=leaky
621 |
622 | [convolutional]
623 | batch_normalize=1
624 | size=3
625 | stride=1
626 | pad=1
627 | filters=1024
628 | activation=leaky
629 |
630 | [convolutional]
631 | size=1
632 | stride=1
633 | pad=1
634 | filters=258
635 | activation=linear
636 |
637 | # YOLO -------------------------------------------------------------------------
638 |
639 | [route]
640 | layers = -3
641 |
642 | [convolutional]
643 | batch_normalize=1
644 | filters=256
645 | size=1
646 | stride=1
647 | pad=1
648 | activation=leaky
649 |
650 | [upsample]
651 | stride=2
652 |
653 | [route]
654 | layers = -1, 61
655 |
656 | [convolutional]
657 | batch_normalize=1
658 | filters=256
659 | size=1
660 | stride=1
661 | pad=1
662 | activation=leaky
663 |
664 | [convolutional]
665 | batch_normalize=1
666 | size=3
667 | stride=1
668 | pad=1
669 | filters=512
670 | activation=leaky
671 |
672 | [convolutional]
673 | batch_normalize=1
674 | filters=256
675 | size=1
676 | stride=1
677 | pad=1
678 | activation=leaky
679 |
680 | [convolutional]
681 | batch_normalize=1
682 | size=3
683 | stride=1
684 | pad=1
685 | filters=512
686 | activation=leaky
687 |
688 | [convolutional]
689 | batch_normalize=1
690 | filters=256
691 | size=1
692 | stride=1
693 | pad=1
694 | activation=leaky
695 |
696 | [convolutional]
697 | batch_normalize=1
698 | size=3
699 | stride=1
700 | pad=1
701 | filters=512
702 | activation=leaky
703 |
704 | [convolutional]
705 | size=1
706 | stride=1
707 | pad=1
708 | filters=258
709 | activation=linear
710 |
711 | # YOLO -------------------------------------------------------------------------
712 |
713 | [route]
714 | layers = -3
715 |
716 | [convolutional]
717 | batch_normalize=1
718 | filters=128
719 | size=1
720 | stride=1
721 | pad=1
722 | activation=leaky
723 |
724 | [upsample]
725 | stride=2
726 |
727 | [route]
728 | layers = -1, 36
729 |
730 | [convolutional]
731 | batch_normalize=1
732 | filters=128
733 | size=1
734 | stride=1
735 | pad=1
736 | activation=leaky
737 |
738 | [convolutional]
739 | batch_normalize=1
740 | size=3
741 | stride=1
742 | pad=1
743 | filters=256
744 | activation=leaky
745 |
746 | [convolutional]
747 | batch_normalize=1
748 | filters=128
749 | size=1
750 | stride=1
751 | pad=1
752 | activation=leaky
753 |
754 | [convolutional]
755 | batch_normalize=1
756 | size=3
757 | stride=1
758 | pad=1
759 | filters=256
760 | activation=leaky
761 |
762 | [convolutional]
763 | batch_normalize=1
764 | filters=128
765 | size=1
766 | stride=1
767 | pad=1
768 | activation=leaky
769 |
770 | [convolutional]
771 | batch_normalize=1
772 | size=3
773 | stride=1
774 | pad=1
775 | filters=256
776 | activation=leaky
777 |
778 | [convolutional]
779 | size=1
780 | stride=1
781 | pad=1
782 | filters=258
783 | activation=linear
784 |
785 | [yolo]
786 | from=88,99,110
787 | mask = 6,7,8
788 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
789 | classes=80
790 | num=9
791 |
792 | [yolo]
793 | from=88,99,110
794 | mask = 3,4,5
795 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
796 | classes=80
797 | num=9
798 |
799 | [yolo]
800 | from=88,99,110
801 | mask = 0,1,2
802 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
803 | classes=80
804 | num=9
--------------------------------------------------------------------------------
/cfg/yolov3-spp-1cls.cfg:
--------------------------------------------------------------------------------
1 | [net]
2 | # Testing
3 | # batch=1
4 | # subdivisions=1
5 | # Training
6 | batch=64
7 | subdivisions=16
8 | width=608
9 | height=608
10 | channels=3
11 | momentum=0.9
12 | decay=0.0005
13 | angle=0
14 | saturation = 1.5
15 | exposure = 1.5
16 | hue=.1
17 |
18 | learning_rate=0.001
19 | burn_in=100
20 | max_batches = 5000
21 | policy=steps
22 | steps=4000,4500
23 | scales=.1,.1
24 |
25 | [convolutional]
26 | batch_normalize=1
27 | filters=32
28 | size=3
29 | stride=1
30 | pad=1
31 | activation=leaky
32 |
33 | # Downsample
34 |
35 | [convolutional]
36 | batch_normalize=1
37 | filters=64
38 | size=3
39 | stride=2
40 | pad=1
41 | activation=leaky
42 |
43 | [convolutional]
44 | batch_normalize=1
45 | filters=32
46 | size=1
47 | stride=1
48 | pad=1
49 | activation=leaky
50 |
51 | [convolutional]
52 | batch_normalize=1
53 | filters=64
54 | size=3
55 | stride=1
56 | pad=1
57 | activation=leaky
58 |
59 | [shortcut]
60 | from=-3
61 | activation=linear
62 |
63 | # Downsample
64 |
65 | [convolutional]
66 | batch_normalize=1
67 | filters=128
68 | size=3
69 | stride=2
70 | pad=1
71 | activation=leaky
72 |
73 | [convolutional]
74 | batch_normalize=1
75 | filters=64
76 | size=1
77 | stride=1
78 | pad=1
79 | activation=leaky
80 |
81 | [convolutional]
82 | batch_normalize=1
83 | filters=128
84 | size=3
85 | stride=1
86 | pad=1
87 | activation=leaky
88 |
89 | [shortcut]
90 | from=-3
91 | activation=linear
92 |
93 | [convolutional]
94 | batch_normalize=1
95 | filters=64
96 | size=1
97 | stride=1
98 | pad=1
99 | activation=leaky
100 |
101 | [convolutional]
102 | batch_normalize=1
103 | filters=128
104 | size=3
105 | stride=1
106 | pad=1
107 | activation=leaky
108 |
109 | [shortcut]
110 | from=-3
111 | activation=linear
112 |
113 | # Downsample
114 |
115 | [convolutional]
116 | batch_normalize=1
117 | filters=256
118 | size=3
119 | stride=2
120 | pad=1
121 | activation=leaky
122 |
123 | [convolutional]
124 | batch_normalize=1
125 | filters=128
126 | size=1
127 | stride=1
128 | pad=1
129 | activation=leaky
130 |
131 | [convolutional]
132 | batch_normalize=1
133 | filters=256
134 | size=3
135 | stride=1
136 | pad=1
137 | activation=leaky
138 |
139 | [shortcut]
140 | from=-3
141 | activation=linear
142 |
143 | [convolutional]
144 | batch_normalize=1
145 | filters=128
146 | size=1
147 | stride=1
148 | pad=1
149 | activation=leaky
150 |
151 | [convolutional]
152 | batch_normalize=1
153 | filters=256
154 | size=3
155 | stride=1
156 | pad=1
157 | activation=leaky
158 |
159 | [shortcut]
160 | from=-3
161 | activation=linear
162 |
163 | [convolutional]
164 | batch_normalize=1
165 | filters=128
166 | size=1
167 | stride=1
168 | pad=1
169 | activation=leaky
170 |
171 | [convolutional]
172 | batch_normalize=1
173 | filters=256
174 | size=3
175 | stride=1
176 | pad=1
177 | activation=leaky
178 |
179 | [shortcut]
180 | from=-3
181 | activation=linear
182 |
183 | [convolutional]
184 | batch_normalize=1
185 | filters=128
186 | size=1
187 | stride=1
188 | pad=1
189 | activation=leaky
190 |
191 | [convolutional]
192 | batch_normalize=1
193 | filters=256
194 | size=3
195 | stride=1
196 | pad=1
197 | activation=leaky
198 |
199 | [shortcut]
200 | from=-3
201 | activation=linear
202 |
203 |
204 | [convolutional]
205 | batch_normalize=1
206 | filters=128
207 | size=1
208 | stride=1
209 | pad=1
210 | activation=leaky
211 |
212 | [convolutional]
213 | batch_normalize=1
214 | filters=256
215 | size=3
216 | stride=1
217 | pad=1
218 | activation=leaky
219 |
220 | [shortcut]
221 | from=-3
222 | activation=linear
223 |
224 | [convolutional]
225 | batch_normalize=1
226 | filters=128
227 | size=1
228 | stride=1
229 | pad=1
230 | activation=leaky
231 |
232 | [convolutional]
233 | batch_normalize=1
234 | filters=256
235 | size=3
236 | stride=1
237 | pad=1
238 | activation=leaky
239 |
240 | [shortcut]
241 | from=-3
242 | activation=linear
243 |
244 | [convolutional]
245 | batch_normalize=1
246 | filters=128
247 | size=1
248 | stride=1
249 | pad=1
250 | activation=leaky
251 |
252 | [convolutional]
253 | batch_normalize=1
254 | filters=256
255 | size=3
256 | stride=1
257 | pad=1
258 | activation=leaky
259 |
260 | [shortcut]
261 | from=-3
262 | activation=linear
263 |
264 | [convolutional]
265 | batch_normalize=1
266 | filters=128
267 | size=1
268 | stride=1
269 | pad=1
270 | activation=leaky
271 |
272 | [convolutional]
273 | batch_normalize=1
274 | filters=256
275 | size=3
276 | stride=1
277 | pad=1
278 | activation=leaky
279 |
280 | [shortcut]
281 | from=-3
282 | activation=linear
283 |
284 | # Downsample
285 |
286 | [convolutional]
287 | batch_normalize=1
288 | filters=512
289 | size=3
290 | stride=2
291 | pad=1
292 | activation=leaky
293 |
294 | [convolutional]
295 | batch_normalize=1
296 | filters=256
297 | size=1
298 | stride=1
299 | pad=1
300 | activation=leaky
301 |
302 | [convolutional]
303 | batch_normalize=1
304 | filters=512
305 | size=3
306 | stride=1
307 | pad=1
308 | activation=leaky
309 |
310 | [shortcut]
311 | from=-3
312 | activation=linear
313 |
314 |
315 | [convolutional]
316 | batch_normalize=1
317 | filters=256
318 | size=1
319 | stride=1
320 | pad=1
321 | activation=leaky
322 |
323 | [convolutional]
324 | batch_normalize=1
325 | filters=512
326 | size=3
327 | stride=1
328 | pad=1
329 | activation=leaky
330 |
331 | [shortcut]
332 | from=-3
333 | activation=linear
334 |
335 |
336 | [convolutional]
337 | batch_normalize=1
338 | filters=256
339 | size=1
340 | stride=1
341 | pad=1
342 | activation=leaky
343 |
344 | [convolutional]
345 | batch_normalize=1
346 | filters=512
347 | size=3
348 | stride=1
349 | pad=1
350 | activation=leaky
351 |
352 | [shortcut]
353 | from=-3
354 | activation=linear
355 |
356 |
357 | [convolutional]
358 | batch_normalize=1
359 | filters=256
360 | size=1
361 | stride=1
362 | pad=1
363 | activation=leaky
364 |
365 | [convolutional]
366 | batch_normalize=1
367 | filters=512
368 | size=3
369 | stride=1
370 | pad=1
371 | activation=leaky
372 |
373 | [shortcut]
374 | from=-3
375 | activation=linear
376 |
377 | [convolutional]
378 | batch_normalize=1
379 | filters=256
380 | size=1
381 | stride=1
382 | pad=1
383 | activation=leaky
384 |
385 | [convolutional]
386 | batch_normalize=1
387 | filters=512
388 | size=3
389 | stride=1
390 | pad=1
391 | activation=leaky
392 |
393 | [shortcut]
394 | from=-3
395 | activation=linear
396 |
397 |
398 | [convolutional]
399 | batch_normalize=1
400 | filters=256
401 | size=1
402 | stride=1
403 | pad=1
404 | activation=leaky
405 |
406 | [convolutional]
407 | batch_normalize=1
408 | filters=512
409 | size=3
410 | stride=1
411 | pad=1
412 | activation=leaky
413 |
414 | [shortcut]
415 | from=-3
416 | activation=linear
417 |
418 |
419 | [convolutional]
420 | batch_normalize=1
421 | filters=256
422 | size=1
423 | stride=1
424 | pad=1
425 | activation=leaky
426 |
427 | [convolutional]
428 | batch_normalize=1
429 | filters=512
430 | size=3
431 | stride=1
432 | pad=1
433 | activation=leaky
434 |
435 | [shortcut]
436 | from=-3
437 | activation=linear
438 |
439 | [convolutional]
440 | batch_normalize=1
441 | filters=256
442 | size=1
443 | stride=1
444 | pad=1
445 | activation=leaky
446 |
447 | [convolutional]
448 | batch_normalize=1
449 | filters=512
450 | size=3
451 | stride=1
452 | pad=1
453 | activation=leaky
454 |
455 | [shortcut]
456 | from=-3
457 | activation=linear
458 |
459 | # Downsample
460 |
461 | [convolutional]
462 | batch_normalize=1
463 | filters=1024
464 | size=3
465 | stride=2
466 | pad=1
467 | activation=leaky
468 |
469 | [convolutional]
470 | batch_normalize=1
471 | filters=512
472 | size=1
473 | stride=1
474 | pad=1
475 | activation=leaky
476 |
477 | [convolutional]
478 | batch_normalize=1
479 | filters=1024
480 | size=3
481 | stride=1
482 | pad=1
483 | activation=leaky
484 |
485 | [shortcut]
486 | from=-3
487 | activation=linear
488 |
489 | [convolutional]
490 | batch_normalize=1
491 | filters=512
492 | size=1
493 | stride=1
494 | pad=1
495 | activation=leaky
496 |
497 | [convolutional]
498 | batch_normalize=1
499 | filters=1024
500 | size=3
501 | stride=1
502 | pad=1
503 | activation=leaky
504 |
505 | [shortcut]
506 | from=-3
507 | activation=linear
508 |
509 | [convolutional]
510 | batch_normalize=1
511 | filters=512
512 | size=1
513 | stride=1
514 | pad=1
515 | activation=leaky
516 |
517 | [convolutional]
518 | batch_normalize=1
519 | filters=1024
520 | size=3
521 | stride=1
522 | pad=1
523 | activation=leaky
524 |
525 | [shortcut]
526 | from=-3
527 | activation=linear
528 |
529 | [convolutional]
530 | batch_normalize=1
531 | filters=512
532 | size=1
533 | stride=1
534 | pad=1
535 | activation=leaky
536 |
537 | [convolutional]
538 | batch_normalize=1
539 | filters=1024
540 | size=3
541 | stride=1
542 | pad=1
543 | activation=leaky
544 |
545 | [shortcut]
546 | from=-3
547 | activation=linear
548 |
549 | ######################
550 |
551 | [convolutional]
552 | batch_normalize=1
553 | filters=512
554 | size=1
555 | stride=1
556 | pad=1
557 | activation=leaky
558 |
559 | [convolutional]
560 | batch_normalize=1
561 | size=3
562 | stride=1
563 | pad=1
564 | filters=1024
565 | activation=leaky
566 |
567 | [convolutional]
568 | batch_normalize=1
569 | filters=512
570 | size=1
571 | stride=1
572 | pad=1
573 | activation=leaky
574 |
575 | ### SPP ###
576 | [maxpool]
577 | stride=1
578 | size=5
579 |
580 | [route]
581 | layers=-2
582 |
583 | [maxpool]
584 | stride=1
585 | size=9
586 |
587 | [route]
588 | layers=-4
589 |
590 | [maxpool]
591 | stride=1
592 | size=13
593 |
594 | [route]
595 | layers=-1,-3,-5,-6
596 |
597 | ### End SPP ###
598 |
599 | [convolutional]
600 | batch_normalize=1
601 | filters=512
602 | size=1
603 | stride=1
604 | pad=1
605 | activation=leaky
606 |
607 |
608 | [convolutional]
609 | batch_normalize=1
610 | size=3
611 | stride=1
612 | pad=1
613 | filters=1024
614 | activation=leaky
615 |
616 | [convolutional]
617 | batch_normalize=1
618 | filters=512
619 | size=1
620 | stride=1
621 | pad=1
622 | activation=leaky
623 |
624 | [convolutional]
625 | batch_normalize=1
626 | size=3
627 | stride=1
628 | pad=1
629 | filters=1024
630 | activation=leaky
631 |
632 | [convolutional]
633 | size=1
634 | stride=1
635 | pad=1
636 | filters=18
637 | activation=linear
638 |
639 |
640 | [yolo]
641 | mask = 6,7,8
642 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
643 | classes=1
644 | num=9
645 | jitter=.3
646 | ignore_thresh = .7
647 | truth_thresh = 1
648 | random=1
649 |
650 |
651 | [route]
652 | layers = -4
653 |
654 | [convolutional]
655 | batch_normalize=1
656 | filters=256
657 | size=1
658 | stride=1
659 | pad=1
660 | activation=leaky
661 |
662 | [upsample]
663 | stride=2
664 |
665 | [route]
666 | layers = -1, 61
667 |
668 |
669 |
670 | [convolutional]
671 | batch_normalize=1
672 | filters=256
673 | size=1
674 | stride=1
675 | pad=1
676 | activation=leaky
677 |
678 | [convolutional]
679 | batch_normalize=1
680 | size=3
681 | stride=1
682 | pad=1
683 | filters=512
684 | activation=leaky
685 |
686 | [convolutional]
687 | batch_normalize=1
688 | filters=256
689 | size=1
690 | stride=1
691 | pad=1
692 | activation=leaky
693 |
694 | [convolutional]
695 | batch_normalize=1
696 | size=3
697 | stride=1
698 | pad=1
699 | filters=512
700 | activation=leaky
701 |
702 | [convolutional]
703 | batch_normalize=1
704 | filters=256
705 | size=1
706 | stride=1
707 | pad=1
708 | activation=leaky
709 |
710 | [convolutional]
711 | batch_normalize=1
712 | size=3
713 | stride=1
714 | pad=1
715 | filters=512
716 | activation=leaky
717 |
718 | [convolutional]
719 | size=1
720 | stride=1
721 | pad=1
722 | filters=18
723 | activation=linear
724 |
725 |
726 | [yolo]
727 | mask = 3,4,5
728 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
729 | classes=1
730 | num=9
731 | jitter=.3
732 | ignore_thresh = .7
733 | truth_thresh = 1
734 | random=1
735 |
736 |
737 |
738 | [route]
739 | layers = -4
740 |
741 | [convolutional]
742 | batch_normalize=1
743 | filters=128
744 | size=1
745 | stride=1
746 | pad=1
747 | activation=leaky
748 |
749 | [upsample]
750 | stride=2
751 |
752 | [route]
753 | layers = -1, 36
754 |
755 |
756 |
757 | [convolutional]
758 | batch_normalize=1
759 | filters=128
760 | size=1
761 | stride=1
762 | pad=1
763 | activation=leaky
764 |
765 | [convolutional]
766 | batch_normalize=1
767 | size=3
768 | stride=1
769 | pad=1
770 | filters=256
771 | activation=leaky
772 |
773 | [convolutional]
774 | batch_normalize=1
775 | filters=128
776 | size=1
777 | stride=1
778 | pad=1
779 | activation=leaky
780 |
781 | [convolutional]
782 | batch_normalize=1
783 | size=3
784 | stride=1
785 | pad=1
786 | filters=256
787 | activation=leaky
788 |
789 | [convolutional]
790 | batch_normalize=1
791 | filters=128
792 | size=1
793 | stride=1
794 | pad=1
795 | activation=leaky
796 |
797 | [convolutional]
798 | batch_normalize=1
799 | size=3
800 | stride=1
801 | pad=1
802 | filters=256
803 | activation=leaky
804 |
805 | [convolutional]
806 | size=1
807 | stride=1
808 | pad=1
809 | filters=18
810 | activation=linear
811 |
812 |
813 | [yolo]
814 | mask = 0,1,2
815 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
816 | classes=1
817 | num=9
818 | jitter=.3
819 | ignore_thresh = .7
820 | truth_thresh = 1
821 | random=1
822 |
--------------------------------------------------------------------------------
/cfg/yolov3-spp-20cls.cfg:
--------------------------------------------------------------------------------
1 | [net]
2 | # Testing
3 | # batch=1
4 | # subdivisions=1
5 | # Training
6 | batch=64
7 | subdivisions=16
8 | width=608
9 | height=608
10 | channels=3
11 | momentum=0.9
12 | decay=0.0005
13 | angle=0
14 | saturation = 1.5
15 | exposure = 1.5
16 | hue=.1
17 |
18 | learning_rate=0.001
19 | burn_in=100
20 | max_batches = 5000
21 | policy=steps
22 | steps=4000,4500
23 | scales=.1,.1
24 |
25 | [convolutional]
26 | batch_normalize=1
27 | filters=32
28 | size=3
29 | stride=1
30 | pad=1
31 | activation=leaky
32 |
33 | # Downsample
34 |
35 | [convolutional]
36 | batch_normalize=1
37 | filters=64
38 | size=3
39 | stride=2
40 | pad=1
41 | activation=leaky
42 |
43 | [convolutional]
44 | batch_normalize=1
45 | filters=32
46 | size=1
47 | stride=1
48 | pad=1
49 | activation=leaky
50 |
51 | [convolutional]
52 | batch_normalize=1
53 | filters=64
54 | size=3
55 | stride=1
56 | pad=1
57 | activation=leaky
58 |
59 | [shortcut]
60 | from=-3
61 | activation=linear
62 |
63 | # Downsample
64 |
65 | [convolutional]
66 | batch_normalize=1
67 | filters=128
68 | size=3
69 | stride=2
70 | pad=1
71 | activation=leaky
72 |
73 | [convolutional]
74 | batch_normalize=1
75 | filters=64
76 | size=1
77 | stride=1
78 | pad=1
79 | activation=leaky
80 |
81 | [convolutional]
82 | batch_normalize=1
83 | filters=128
84 | size=3
85 | stride=1
86 | pad=1
87 | activation=leaky
88 |
89 | [shortcut]
90 | from=-3
91 | activation=linear
92 |
93 | [convolutional]
94 | batch_normalize=1
95 | filters=64
96 | size=1
97 | stride=1
98 | pad=1
99 | activation=leaky
100 |
101 | [convolutional]
102 | batch_normalize=1
103 | filters=128
104 | size=3
105 | stride=1
106 | pad=1
107 | activation=leaky
108 |
109 | [shortcut]
110 | from=-3
111 | activation=linear
112 |
113 | # Downsample
114 |
115 | [convolutional]
116 | batch_normalize=1
117 | filters=256
118 | size=3
119 | stride=2
120 | pad=1
121 | activation=leaky
122 |
123 | [convolutional]
124 | batch_normalize=1
125 | filters=128
126 | size=1
127 | stride=1
128 | pad=1
129 | activation=leaky
130 |
131 | [convolutional]
132 | batch_normalize=1
133 | filters=256
134 | size=3
135 | stride=1
136 | pad=1
137 | activation=leaky
138 |
139 | [shortcut]
140 | from=-3
141 | activation=linear
142 |
143 | [convolutional]
144 | batch_normalize=1
145 | filters=128
146 | size=1
147 | stride=1
148 | pad=1
149 | activation=leaky
150 |
151 | [convolutional]
152 | batch_normalize=1
153 | filters=256
154 | size=3
155 | stride=1
156 | pad=1
157 | activation=leaky
158 |
159 | [shortcut]
160 | from=-3
161 | activation=linear
162 |
163 | [convolutional]
164 | batch_normalize=1
165 | filters=128
166 | size=1
167 | stride=1
168 | pad=1
169 | activation=leaky
170 |
171 | [convolutional]
172 | batch_normalize=1
173 | filters=256
174 | size=3
175 | stride=1
176 | pad=1
177 | activation=leaky
178 |
179 | [shortcut]
180 | from=-3
181 | activation=linear
182 |
183 | [convolutional]
184 | batch_normalize=1
185 | filters=128
186 | size=1
187 | stride=1
188 | pad=1
189 | activation=leaky
190 |
191 | [convolutional]
192 | batch_normalize=1
193 | filters=256
194 | size=3
195 | stride=1
196 | pad=1
197 | activation=leaky
198 |
199 | [shortcut]
200 | from=-3
201 | activation=linear
202 |
203 |
204 | [convolutional]
205 | batch_normalize=1
206 | filters=128
207 | size=1
208 | stride=1
209 | pad=1
210 | activation=leaky
211 |
212 | [convolutional]
213 | batch_normalize=1
214 | filters=256
215 | size=3
216 | stride=1
217 | pad=1
218 | activation=leaky
219 |
220 | [shortcut]
221 | from=-3
222 | activation=linear
223 |
224 | [convolutional]
225 | batch_normalize=1
226 | filters=128
227 | size=1
228 | stride=1
229 | pad=1
230 | activation=leaky
231 |
232 | [convolutional]
233 | batch_normalize=1
234 | filters=256
235 | size=3
236 | stride=1
237 | pad=1
238 | activation=leaky
239 |
240 | [shortcut]
241 | from=-3
242 | activation=linear
243 |
244 | [convolutional]
245 | batch_normalize=1
246 | filters=128
247 | size=1
248 | stride=1
249 | pad=1
250 | activation=leaky
251 |
252 | [convolutional]
253 | batch_normalize=1
254 | filters=256
255 | size=3
256 | stride=1
257 | pad=1
258 | activation=leaky
259 |
260 | [shortcut]
261 | from=-3
262 | activation=linear
263 |
264 | [convolutional]
265 | batch_normalize=1
266 | filters=128
267 | size=1
268 | stride=1
269 | pad=1
270 | activation=leaky
271 |
272 | [convolutional]
273 | batch_normalize=1
274 | filters=256
275 | size=3
276 | stride=1
277 | pad=1
278 | activation=leaky
279 |
280 | [shortcut]
281 | from=-3
282 | activation=linear
283 |
284 | # Downsample
285 |
286 | [convolutional]
287 | batch_normalize=1
288 | filters=512
289 | size=3
290 | stride=2
291 | pad=1
292 | activation=leaky
293 |
294 | [convolutional]
295 | batch_normalize=1
296 | filters=256
297 | size=1
298 | stride=1
299 | pad=1
300 | activation=leaky
301 |
302 | [convolutional]
303 | batch_normalize=1
304 | filters=512
305 | size=3
306 | stride=1
307 | pad=1
308 | activation=leaky
309 |
310 | [shortcut]
311 | from=-3
312 | activation=linear
313 |
314 |
315 | [convolutional]
316 | batch_normalize=1
317 | filters=256
318 | size=1
319 | stride=1
320 | pad=1
321 | activation=leaky
322 |
323 | [convolutional]
324 | batch_normalize=1
325 | filters=512
326 | size=3
327 | stride=1
328 | pad=1
329 | activation=leaky
330 |
331 | [shortcut]
332 | from=-3
333 | activation=linear
334 |
335 |
336 | [convolutional]
337 | batch_normalize=1
338 | filters=256
339 | size=1
340 | stride=1
341 | pad=1
342 | activation=leaky
343 |
344 | [convolutional]
345 | batch_normalize=1
346 | filters=512
347 | size=3
348 | stride=1
349 | pad=1
350 | activation=leaky
351 |
352 | [shortcut]
353 | from=-3
354 | activation=linear
355 |
356 |
357 | [convolutional]
358 | batch_normalize=1
359 | filters=256
360 | size=1
361 | stride=1
362 | pad=1
363 | activation=leaky
364 |
365 | [convolutional]
366 | batch_normalize=1
367 | filters=512
368 | size=3
369 | stride=1
370 | pad=1
371 | activation=leaky
372 |
373 | [shortcut]
374 | from=-3
375 | activation=linear
376 |
377 | [convolutional]
378 | batch_normalize=1
379 | filters=256
380 | size=1
381 | stride=1
382 | pad=1
383 | activation=leaky
384 |
385 | [convolutional]
386 | batch_normalize=1
387 | filters=512
388 | size=3
389 | stride=1
390 | pad=1
391 | activation=leaky
392 |
393 | [shortcut]
394 | from=-3
395 | activation=linear
396 |
397 |
398 | [convolutional]
399 | batch_normalize=1
400 | filters=256
401 | size=1
402 | stride=1
403 | pad=1
404 | activation=leaky
405 |
406 | [convolutional]
407 | batch_normalize=1
408 | filters=512
409 | size=3
410 | stride=1
411 | pad=1
412 | activation=leaky
413 |
414 | [shortcut]
415 | from=-3
416 | activation=linear
417 |
418 |
419 | [convolutional]
420 | batch_normalize=1
421 | filters=256
422 | size=1
423 | stride=1
424 | pad=1
425 | activation=leaky
426 |
427 | [convolutional]
428 | batch_normalize=1
429 | filters=512
430 | size=3
431 | stride=1
432 | pad=1
433 | activation=leaky
434 |
435 | [shortcut]
436 | from=-3
437 | activation=linear
438 |
439 | [convolutional]
440 | batch_normalize=1
441 | filters=256
442 | size=1
443 | stride=1
444 | pad=1
445 | activation=leaky
446 |
447 | [convolutional]
448 | batch_normalize=1
449 | filters=512
450 | size=3
451 | stride=1
452 | pad=1
453 | activation=leaky
454 |
455 | [shortcut]
456 | from=-3
457 | activation=linear
458 |
459 | # Downsample
460 |
461 | [convolutional]
462 | batch_normalize=1
463 | filters=1024
464 | size=3
465 | stride=2
466 | pad=1
467 | activation=leaky
468 |
469 | [convolutional]
470 | batch_normalize=1
471 | filters=512
472 | size=1
473 | stride=1
474 | pad=1
475 | activation=leaky
476 |
477 | [convolutional]
478 | batch_normalize=1
479 | filters=1024
480 | size=3
481 | stride=1
482 | pad=1
483 | activation=leaky
484 |
485 | [shortcut]
486 | from=-3
487 | activation=linear
488 |
489 | [convolutional]
490 | batch_normalize=1
491 | filters=512
492 | size=1
493 | stride=1
494 | pad=1
495 | activation=leaky
496 |
497 | [convolutional]
498 | batch_normalize=1
499 | filters=1024
500 | size=3
501 | stride=1
502 | pad=1
503 | activation=leaky
504 |
505 | [shortcut]
506 | from=-3
507 | activation=linear
508 |
509 | [convolutional]
510 | batch_normalize=1
511 | filters=512
512 | size=1
513 | stride=1
514 | pad=1
515 | activation=leaky
516 |
517 | [convolutional]
518 | batch_normalize=1
519 | filters=1024
520 | size=3
521 | stride=1
522 | pad=1
523 | activation=leaky
524 |
525 | [shortcut]
526 | from=-3
527 | activation=linear
528 |
529 | [convolutional]
530 | batch_normalize=1
531 | filters=512
532 | size=1
533 | stride=1
534 | pad=1
535 | activation=leaky
536 |
537 | [convolutional]
538 | batch_normalize=1
539 | filters=1024
540 | size=3
541 | stride=1
542 | pad=1
543 | activation=leaky
544 |
545 | [shortcut]
546 | from=-3
547 | activation=linear
548 |
549 | ######################
550 |
551 | [convolutional]
552 | batch_normalize=1
553 | filters=512
554 | size=1
555 | stride=1
556 | pad=1
557 | activation=leaky
558 |
559 | [convolutional]
560 | batch_normalize=1
561 | size=3
562 | stride=1
563 | pad=1
564 | filters=1024
565 | activation=leaky
566 |
567 | [convolutional]
568 | batch_normalize=1
569 | filters=512
570 | size=1
571 | stride=1
572 | pad=1
573 | activation=leaky
574 |
575 | ### SPP ###
576 | [maxpool]
577 | stride=1
578 | size=5
579 |
580 | [route]
581 | layers=-2
582 |
583 | [maxpool]
584 | stride=1
585 | size=9
586 |
587 | [route]
588 | layers=-4
589 |
590 | [maxpool]
591 | stride=1
592 | size=13
593 |
594 | [route]
595 | layers=-1,-3,-5,-6
596 |
597 | ### End SPP ###
598 |
599 | [convolutional]
600 | batch_normalize=1
601 | filters=512
602 | size=1
603 | stride=1
604 | pad=1
605 | activation=leaky
606 |
607 |
608 | [convolutional]
609 | batch_normalize=1
610 | size=3
611 | stride=1
612 | pad=1
613 | filters=1024
614 | activation=leaky
615 |
616 | [convolutional]
617 | batch_normalize=1
618 | filters=512
619 | size=1
620 | stride=1
621 | pad=1
622 | activation=leaky
623 |
624 | [convolutional]
625 | batch_normalize=1
626 | size=3
627 | stride=1
628 | pad=1
629 | filters=1024
630 | activation=leaky
631 |
632 | [convolutional]
633 | size=1
634 | stride=1
635 | pad=1
636 | filters=75
637 | activation=linear
638 |
639 |
640 | [yolo]
641 | mask = 6,7,8
642 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
643 | classes=20
644 | num=9
645 | jitter=.3
646 | ignore_thresh = .7
647 | truth_thresh = 1
648 | random=1
649 |
650 |
651 | [route]
652 | layers = -4
653 |
654 | [convolutional]
655 | batch_normalize=1
656 | filters=256
657 | size=1
658 | stride=1
659 | pad=1
660 | activation=leaky
661 |
662 | [upsample]
663 | stride=2
664 |
665 | [route]
666 | layers = -1, 61
667 |
668 |
669 |
670 | [convolutional]
671 | batch_normalize=1
672 | filters=256
673 | size=1
674 | stride=1
675 | pad=1
676 | activation=leaky
677 |
678 | [convolutional]
679 | batch_normalize=1
680 | size=3
681 | stride=1
682 | pad=1
683 | filters=512
684 | activation=leaky
685 |
686 | [convolutional]
687 | batch_normalize=1
688 | filters=256
689 | size=1
690 | stride=1
691 | pad=1
692 | activation=leaky
693 |
694 | [convolutional]
695 | batch_normalize=1
696 | size=3
697 | stride=1
698 | pad=1
699 | filters=512
700 | activation=leaky
701 |
702 | [convolutional]
703 | batch_normalize=1
704 | filters=256
705 | size=1
706 | stride=1
707 | pad=1
708 | activation=leaky
709 |
710 | [convolutional]
711 | batch_normalize=1
712 | size=3
713 | stride=1
714 | pad=1
715 | filters=512
716 | activation=leaky
717 |
718 | [convolutional]
719 | size=1
720 | stride=1
721 | pad=1
722 | filters=75
723 | activation=linear
724 |
725 |
726 | [yolo]
727 | mask = 3,4,5
728 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
729 | classes=20
730 | num=9
731 | jitter=.3
732 | ignore_thresh = .7
733 | truth_thresh = 1
734 | random=1
735 |
736 |
737 |
738 | [route]
739 | layers = -4
740 |
741 | [convolutional]
742 | batch_normalize=1
743 | filters=128
744 | size=1
745 | stride=1
746 | pad=1
747 | activation=leaky
748 |
749 | [upsample]
750 | stride=2
751 |
752 | [route]
753 | layers = -1, 36
754 |
755 |
756 |
757 | [convolutional]
758 | batch_normalize=1
759 | filters=128
760 | size=1
761 | stride=1
762 | pad=1
763 | activation=leaky
764 |
765 | [convolutional]
766 | batch_normalize=1
767 | size=3
768 | stride=1
769 | pad=1
770 | filters=256
771 | activation=leaky
772 |
773 | [convolutional]
774 | batch_normalize=1
775 | filters=128
776 | size=1
777 | stride=1
778 | pad=1
779 | activation=leaky
780 |
781 | [convolutional]
782 | batch_normalize=1
783 | size=3
784 | stride=1
785 | pad=1
786 | filters=256
787 | activation=leaky
788 |
789 | [convolutional]
790 | batch_normalize=1
791 | filters=128
792 | size=1
793 | stride=1
794 | pad=1
795 | activation=leaky
796 |
797 | [convolutional]
798 | batch_normalize=1
799 | size=3
800 | stride=1
801 | pad=1
802 | filters=256
803 | activation=leaky
804 |
805 | [convolutional]
806 | size=1
807 | stride=1
808 | pad=1
809 | filters=75
810 | activation=linear
811 |
812 |
813 | [yolo]
814 | mask = 0,1,2
815 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
816 | classes=20
817 | num=9
818 | jitter=.3
819 | ignore_thresh = .7
820 | truth_thresh = 1
821 | random=1
822 |
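
The *-Ncls variants of this file differ from one another only in the classes= value of the three [yolo] heads and in the filters= value of the 1x1 convolution feeding each head, which Darknet sizes as (classes + 5) * number of masked anchors (3 per head here). A minimal sketch of that arithmetic, using a hypothetical helper that is not part of this repository:

    # head_filters.py -- sketch of the Darknet YOLO head sizing rule
    def head_filters(num_classes: int, anchors_per_head: int = 3) -> int:
        # each anchor predicts 4 box coordinates + 1 objectness score + num_classes class scores
        return (num_classes + 5) * anchors_per_head

    assert head_filters(20) == 75  # yolov3-spp-20cls.cfg: classes=20, filters=75
    assert head_filters(2) == 21   # yolov3-spp-2cls.cfg:  classes=2,  filters=21
    assert head_filters(3) == 24   # yolov3-spp-3cls.cfg:  classes=3,  filters=24
    assert head_filters(6) == 33   # yolov3-spp-6cls.cfg:  classes=6,  filters=33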
--------------------------------------------------------------------------------
/cfg/yolov3-spp-2cls.cfg:
--------------------------------------------------------------------------------
1 | [net]
2 | # Testing
3 | # batch=1
4 | # subdivisions=1
5 | # Training
6 | batch=64
7 | subdivisions=16
8 | width=608
9 | height=608
10 | channels=3
11 | momentum=0.9
12 | decay=0.0005
13 | angle=0
14 | saturation = 1.5
15 | exposure = 1.5
16 | hue=.1
17 |
18 | learning_rate=0.001
19 | burn_in=100
20 | max_batches = 5000
21 | policy=steps
22 | steps=4000,4500
23 | scales=.1,.1
24 |
25 | [convolutional]
26 | batch_normalize=1
27 | filters=32
28 | size=3
29 | stride=1
30 | pad=1
31 | activation=leaky
32 |
33 | # Downsample
34 |
35 | [convolutional]
36 | batch_normalize=1
37 | filters=64
38 | size=3
39 | stride=2
40 | pad=1
41 | activation=leaky
42 |
43 | [convolutional]
44 | batch_normalize=1
45 | filters=32
46 | size=1
47 | stride=1
48 | pad=1
49 | activation=leaky
50 |
51 | [convolutional]
52 | batch_normalize=1
53 | filters=64
54 | size=3
55 | stride=1
56 | pad=1
57 | activation=leaky
58 |
59 | [shortcut]
60 | from=-3
61 | activation=linear
62 |
63 | # Downsample
64 |
65 | [convolutional]
66 | batch_normalize=1
67 | filters=128
68 | size=3
69 | stride=2
70 | pad=1
71 | activation=leaky
72 |
73 | [convolutional]
74 | batch_normalize=1
75 | filters=64
76 | size=1
77 | stride=1
78 | pad=1
79 | activation=leaky
80 |
81 | [convolutional]
82 | batch_normalize=1
83 | filters=128
84 | size=3
85 | stride=1
86 | pad=1
87 | activation=leaky
88 |
89 | [shortcut]
90 | from=-3
91 | activation=linear
92 |
93 | [convolutional]
94 | batch_normalize=1
95 | filters=64
96 | size=1
97 | stride=1
98 | pad=1
99 | activation=leaky
100 |
101 | [convolutional]
102 | batch_normalize=1
103 | filters=128
104 | size=3
105 | stride=1
106 | pad=1
107 | activation=leaky
108 |
109 | [shortcut]
110 | from=-3
111 | activation=linear
112 |
113 | # Downsample
114 |
115 | [convolutional]
116 | batch_normalize=1
117 | filters=256
118 | size=3
119 | stride=2
120 | pad=1
121 | activation=leaky
122 |
123 | [convolutional]
124 | batch_normalize=1
125 | filters=128
126 | size=1
127 | stride=1
128 | pad=1
129 | activation=leaky
130 |
131 | [convolutional]
132 | batch_normalize=1
133 | filters=256
134 | size=3
135 | stride=1
136 | pad=1
137 | activation=leaky
138 |
139 | [shortcut]
140 | from=-3
141 | activation=linear
142 |
143 | [convolutional]
144 | batch_normalize=1
145 | filters=128
146 | size=1
147 | stride=1
148 | pad=1
149 | activation=leaky
150 |
151 | [convolutional]
152 | batch_normalize=1
153 | filters=256
154 | size=3
155 | stride=1
156 | pad=1
157 | activation=leaky
158 |
159 | [shortcut]
160 | from=-3
161 | activation=linear
162 |
163 | [convolutional]
164 | batch_normalize=1
165 | filters=128
166 | size=1
167 | stride=1
168 | pad=1
169 | activation=leaky
170 |
171 | [convolutional]
172 | batch_normalize=1
173 | filters=256
174 | size=3
175 | stride=1
176 | pad=1
177 | activation=leaky
178 |
179 | [shortcut]
180 | from=-3
181 | activation=linear
182 |
183 | [convolutional]
184 | batch_normalize=1
185 | filters=128
186 | size=1
187 | stride=1
188 | pad=1
189 | activation=leaky
190 |
191 | [convolutional]
192 | batch_normalize=1
193 | filters=256
194 | size=3
195 | stride=1
196 | pad=1
197 | activation=leaky
198 |
199 | [shortcut]
200 | from=-3
201 | activation=linear
202 |
203 |
204 | [convolutional]
205 | batch_normalize=1
206 | filters=128
207 | size=1
208 | stride=1
209 | pad=1
210 | activation=leaky
211 |
212 | [convolutional]
213 | batch_normalize=1
214 | filters=256
215 | size=3
216 | stride=1
217 | pad=1
218 | activation=leaky
219 |
220 | [shortcut]
221 | from=-3
222 | activation=linear
223 |
224 | [convolutional]
225 | batch_normalize=1
226 | filters=128
227 | size=1
228 | stride=1
229 | pad=1
230 | activation=leaky
231 |
232 | [convolutional]
233 | batch_normalize=1
234 | filters=256
235 | size=3
236 | stride=1
237 | pad=1
238 | activation=leaky
239 |
240 | [shortcut]
241 | from=-3
242 | activation=linear
243 |
244 | [convolutional]
245 | batch_normalize=1
246 | filters=128
247 | size=1
248 | stride=1
249 | pad=1
250 | activation=leaky
251 |
252 | [convolutional]
253 | batch_normalize=1
254 | filters=256
255 | size=3
256 | stride=1
257 | pad=1
258 | activation=leaky
259 |
260 | [shortcut]
261 | from=-3
262 | activation=linear
263 |
264 | [convolutional]
265 | batch_normalize=1
266 | filters=128
267 | size=1
268 | stride=1
269 | pad=1
270 | activation=leaky
271 |
272 | [convolutional]
273 | batch_normalize=1
274 | filters=256
275 | size=3
276 | stride=1
277 | pad=1
278 | activation=leaky
279 |
280 | [shortcut]
281 | from=-3
282 | activation=linear
283 |
284 | # Downsample
285 |
286 | [convolutional]
287 | batch_normalize=1
288 | filters=512
289 | size=3
290 | stride=2
291 | pad=1
292 | activation=leaky
293 |
294 | [convolutional]
295 | batch_normalize=1
296 | filters=256
297 | size=1
298 | stride=1
299 | pad=1
300 | activation=leaky
301 |
302 | [convolutional]
303 | batch_normalize=1
304 | filters=512
305 | size=3
306 | stride=1
307 | pad=1
308 | activation=leaky
309 |
310 | [shortcut]
311 | from=-3
312 | activation=linear
313 |
314 |
315 | [convolutional]
316 | batch_normalize=1
317 | filters=256
318 | size=1
319 | stride=1
320 | pad=1
321 | activation=leaky
322 |
323 | [convolutional]
324 | batch_normalize=1
325 | filters=512
326 | size=3
327 | stride=1
328 | pad=1
329 | activation=leaky
330 |
331 | [shortcut]
332 | from=-3
333 | activation=linear
334 |
335 |
336 | [convolutional]
337 | batch_normalize=1
338 | filters=256
339 | size=1
340 | stride=1
341 | pad=1
342 | activation=leaky
343 |
344 | [convolutional]
345 | batch_normalize=1
346 | filters=512
347 | size=3
348 | stride=1
349 | pad=1
350 | activation=leaky
351 |
352 | [shortcut]
353 | from=-3
354 | activation=linear
355 |
356 |
357 | [convolutional]
358 | batch_normalize=1
359 | filters=256
360 | size=1
361 | stride=1
362 | pad=1
363 | activation=leaky
364 |
365 | [convolutional]
366 | batch_normalize=1
367 | filters=512
368 | size=3
369 | stride=1
370 | pad=1
371 | activation=leaky
372 |
373 | [shortcut]
374 | from=-3
375 | activation=linear
376 |
377 | [convolutional]
378 | batch_normalize=1
379 | filters=256
380 | size=1
381 | stride=1
382 | pad=1
383 | activation=leaky
384 |
385 | [convolutional]
386 | batch_normalize=1
387 | filters=512
388 | size=3
389 | stride=1
390 | pad=1
391 | activation=leaky
392 |
393 | [shortcut]
394 | from=-3
395 | activation=linear
396 |
397 |
398 | [convolutional]
399 | batch_normalize=1
400 | filters=256
401 | size=1
402 | stride=1
403 | pad=1
404 | activation=leaky
405 |
406 | [convolutional]
407 | batch_normalize=1
408 | filters=512
409 | size=3
410 | stride=1
411 | pad=1
412 | activation=leaky
413 |
414 | [shortcut]
415 | from=-3
416 | activation=linear
417 |
418 |
419 | [convolutional]
420 | batch_normalize=1
421 | filters=256
422 | size=1
423 | stride=1
424 | pad=1
425 | activation=leaky
426 |
427 | [convolutional]
428 | batch_normalize=1
429 | filters=512
430 | size=3
431 | stride=1
432 | pad=1
433 | activation=leaky
434 |
435 | [shortcut]
436 | from=-3
437 | activation=linear
438 |
439 | [convolutional]
440 | batch_normalize=1
441 | filters=256
442 | size=1
443 | stride=1
444 | pad=1
445 | activation=leaky
446 |
447 | [convolutional]
448 | batch_normalize=1
449 | filters=512
450 | size=3
451 | stride=1
452 | pad=1
453 | activation=leaky
454 |
455 | [shortcut]
456 | from=-3
457 | activation=linear
458 |
459 | # Downsample
460 |
461 | [convolutional]
462 | batch_normalize=1
463 | filters=1024
464 | size=3
465 | stride=2
466 | pad=1
467 | activation=leaky
468 |
469 | [convolutional]
470 | batch_normalize=1
471 | filters=512
472 | size=1
473 | stride=1
474 | pad=1
475 | activation=leaky
476 |
477 | [convolutional]
478 | batch_normalize=1
479 | filters=1024
480 | size=3
481 | stride=1
482 | pad=1
483 | activation=leaky
484 |
485 | [shortcut]
486 | from=-3
487 | activation=linear
488 |
489 | [convolutional]
490 | batch_normalize=1
491 | filters=512
492 | size=1
493 | stride=1
494 | pad=1
495 | activation=leaky
496 |
497 | [convolutional]
498 | batch_normalize=1
499 | filters=1024
500 | size=3
501 | stride=1
502 | pad=1
503 | activation=leaky
504 |
505 | [shortcut]
506 | from=-3
507 | activation=linear
508 |
509 | [convolutional]
510 | batch_normalize=1
511 | filters=512
512 | size=1
513 | stride=1
514 | pad=1
515 | activation=leaky
516 |
517 | [convolutional]
518 | batch_normalize=1
519 | filters=1024
520 | size=3
521 | stride=1
522 | pad=1
523 | activation=leaky
524 |
525 | [shortcut]
526 | from=-3
527 | activation=linear
528 |
529 | [convolutional]
530 | batch_normalize=1
531 | filters=512
532 | size=1
533 | stride=1
534 | pad=1
535 | activation=leaky
536 |
537 | [convolutional]
538 | batch_normalize=1
539 | filters=1024
540 | size=3
541 | stride=1
542 | pad=1
543 | activation=leaky
544 |
545 | [shortcut]
546 | from=-3
547 | activation=linear
548 |
549 | ######################
550 |
551 | [convolutional]
552 | batch_normalize=1
553 | filters=512
554 | size=1
555 | stride=1
556 | pad=1
557 | activation=leaky
558 |
559 | [convolutional]
560 | batch_normalize=1
561 | size=3
562 | stride=1
563 | pad=1
564 | filters=1024
565 | activation=leaky
566 |
567 | [convolutional]
568 | batch_normalize=1
569 | filters=512
570 | size=1
571 | stride=1
572 | pad=1
573 | activation=leaky
574 |
575 | ### SPP ###
576 | [maxpool]
577 | stride=1
578 | size=5
579 |
580 | [route]
581 | layers=-2
582 |
583 | [maxpool]
584 | stride=1
585 | size=9
586 |
587 | [route]
588 | layers=-4
589 |
590 | [maxpool]
591 | stride=1
592 | size=13
593 |
594 | [route]
595 | layers=-1,-3,-5,-6
596 |
597 | ### End SPP ###
598 |
599 | [convolutional]
600 | batch_normalize=1
601 | filters=512
602 | size=1
603 | stride=1
604 | pad=1
605 | activation=leaky
606 |
607 |
608 | [convolutional]
609 | batch_normalize=1
610 | size=3
611 | stride=1
612 | pad=1
613 | filters=1024
614 | activation=leaky
615 |
616 | [convolutional]
617 | batch_normalize=1
618 | filters=512
619 | size=1
620 | stride=1
621 | pad=1
622 | activation=leaky
623 |
624 | [convolutional]
625 | batch_normalize=1
626 | size=3
627 | stride=1
628 | pad=1
629 | filters=1024
630 | activation=leaky
631 |
632 | [convolutional]
633 | size=1
634 | stride=1
635 | pad=1
636 | filters=21
637 | activation=linear
638 |
639 |
640 | [yolo]
641 | mask = 6,7,8
642 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
643 | classes=2
644 | num=9
645 | jitter=.3
646 | ignore_thresh = .7
647 | truth_thresh = 1
648 | random=1
649 |
650 |
651 | [route]
652 | layers = -4
653 |
654 | [convolutional]
655 | batch_normalize=1
656 | filters=256
657 | size=1
658 | stride=1
659 | pad=1
660 | activation=leaky
661 |
662 | [upsample]
663 | stride=2
664 |
665 | [route]
666 | layers = -1, 61
667 |
668 |
669 |
670 | [convolutional]
671 | batch_normalize=1
672 | filters=256
673 | size=1
674 | stride=1
675 | pad=1
676 | activation=leaky
677 |
678 | [convolutional]
679 | batch_normalize=1
680 | size=3
681 | stride=1
682 | pad=1
683 | filters=512
684 | activation=leaky
685 |
686 | [convolutional]
687 | batch_normalize=1
688 | filters=256
689 | size=1
690 | stride=1
691 | pad=1
692 | activation=leaky
693 |
694 | [convolutional]
695 | batch_normalize=1
696 | size=3
697 | stride=1
698 | pad=1
699 | filters=512
700 | activation=leaky
701 |
702 | [convolutional]
703 | batch_normalize=1
704 | filters=256
705 | size=1
706 | stride=1
707 | pad=1
708 | activation=leaky
709 |
710 | [convolutional]
711 | batch_normalize=1
712 | size=3
713 | stride=1
714 | pad=1
715 | filters=512
716 | activation=leaky
717 |
718 | [convolutional]
719 | size=1
720 | stride=1
721 | pad=1
722 | filters=21
723 | activation=linear
724 |
725 |
726 | [yolo]
727 | mask = 3,4,5
728 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
729 | classes=2
730 | num=9
731 | jitter=.3
732 | ignore_thresh = .7
733 | truth_thresh = 1
734 | random=1
735 |
736 |
737 |
738 | [route]
739 | layers = -4
740 |
741 | [convolutional]
742 | batch_normalize=1
743 | filters=128
744 | size=1
745 | stride=1
746 | pad=1
747 | activation=leaky
748 |
749 | [upsample]
750 | stride=2
751 |
752 | [route]
753 | layers = -1, 36
754 |
755 |
756 |
757 | [convolutional]
758 | batch_normalize=1
759 | filters=128
760 | size=1
761 | stride=1
762 | pad=1
763 | activation=leaky
764 |
765 | [convolutional]
766 | batch_normalize=1
767 | size=3
768 | stride=1
769 | pad=1
770 | filters=256
771 | activation=leaky
772 |
773 | [convolutional]
774 | batch_normalize=1
775 | filters=128
776 | size=1
777 | stride=1
778 | pad=1
779 | activation=leaky
780 |
781 | [convolutional]
782 | batch_normalize=1
783 | size=3
784 | stride=1
785 | pad=1
786 | filters=256
787 | activation=leaky
788 |
789 | [convolutional]
790 | batch_normalize=1
791 | filters=128
792 | size=1
793 | stride=1
794 | pad=1
795 | activation=leaky
796 |
797 | [convolutional]
798 | batch_normalize=1
799 | size=3
800 | stride=1
801 | pad=1
802 | filters=256
803 | activation=leaky
804 |
805 | [convolutional]
806 | size=1
807 | stride=1
808 | pad=1
809 | filters=21
810 | activation=linear
811 |
812 |
813 | [yolo]
814 | mask = 0,1,2
815 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
816 | classes=2
817 | num=9
818 | jitter=.3
819 | ignore_thresh = .7
820 | truth_thresh = 1
821 | random=1
822 |
--------------------------------------------------------------------------------
/cfg/yolov3-spp-3cls.cfg:
--------------------------------------------------------------------------------
1 | [net]
2 | # Testing
3 | # batch=1
4 | # subdivisions=1
5 | # Training
6 | batch=64
7 | subdivisions=16
8 | width=608
9 | height=608
10 | channels=3
11 | momentum=0.9
12 | decay=0.0005
13 | angle=0
14 | saturation = 1.5
15 | exposure = 1.5
16 | hue=.1
17 |
18 | learning_rate=0.001
19 | burn_in=100
20 | max_batches = 5000
21 | policy=steps
22 | steps=4000,4500
23 | scales=.1,.1
24 |
25 | [convolutional]
26 | batch_normalize=1
27 | filters=32
28 | size=3
29 | stride=1
30 | pad=1
31 | activation=leaky
32 |
33 | # Downsample
34 |
35 | [convolutional]
36 | batch_normalize=1
37 | filters=64
38 | size=3
39 | stride=2
40 | pad=1
41 | activation=leaky
42 |
43 | [convolutional]
44 | batch_normalize=1
45 | filters=32
46 | size=1
47 | stride=1
48 | pad=1
49 | activation=leaky
50 |
51 | [convolutional]
52 | batch_normalize=1
53 | filters=64
54 | size=3
55 | stride=1
56 | pad=1
57 | activation=leaky
58 |
59 | [shortcut]
60 | from=-3
61 | activation=linear
62 |
63 | # Downsample
64 |
65 | [convolutional]
66 | batch_normalize=1
67 | filters=128
68 | size=3
69 | stride=2
70 | pad=1
71 | activation=leaky
72 |
73 | [convolutional]
74 | batch_normalize=1
75 | filters=64
76 | size=1
77 | stride=1
78 | pad=1
79 | activation=leaky
80 |
81 | [convolutional]
82 | batch_normalize=1
83 | filters=128
84 | size=3
85 | stride=1
86 | pad=1
87 | activation=leaky
88 |
89 | [shortcut]
90 | from=-3
91 | activation=linear
92 |
93 | [convolutional]
94 | batch_normalize=1
95 | filters=64
96 | size=1
97 | stride=1
98 | pad=1
99 | activation=leaky
100 |
101 | [convolutional]
102 | batch_normalize=1
103 | filters=128
104 | size=3
105 | stride=1
106 | pad=1
107 | activation=leaky
108 |
109 | [shortcut]
110 | from=-3
111 | activation=linear
112 |
113 | # Downsample
114 |
115 | [convolutional]
116 | batch_normalize=1
117 | filters=256
118 | size=3
119 | stride=2
120 | pad=1
121 | activation=leaky
122 |
123 | [convolutional]
124 | batch_normalize=1
125 | filters=128
126 | size=1
127 | stride=1
128 | pad=1
129 | activation=leaky
130 |
131 | [convolutional]
132 | batch_normalize=1
133 | filters=256
134 | size=3
135 | stride=1
136 | pad=1
137 | activation=leaky
138 |
139 | [shortcut]
140 | from=-3
141 | activation=linear
142 |
143 | [convolutional]
144 | batch_normalize=1
145 | filters=128
146 | size=1
147 | stride=1
148 | pad=1
149 | activation=leaky
150 |
151 | [convolutional]
152 | batch_normalize=1
153 | filters=256
154 | size=3
155 | stride=1
156 | pad=1
157 | activation=leaky
158 |
159 | [shortcut]
160 | from=-3
161 | activation=linear
162 |
163 | [convolutional]
164 | batch_normalize=1
165 | filters=128
166 | size=1
167 | stride=1
168 | pad=1
169 | activation=leaky
170 |
171 | [convolutional]
172 | batch_normalize=1
173 | filters=256
174 | size=3
175 | stride=1
176 | pad=1
177 | activation=leaky
178 |
179 | [shortcut]
180 | from=-3
181 | activation=linear
182 |
183 | [convolutional]
184 | batch_normalize=1
185 | filters=128
186 | size=1
187 | stride=1
188 | pad=1
189 | activation=leaky
190 |
191 | [convolutional]
192 | batch_normalize=1
193 | filters=256
194 | size=3
195 | stride=1
196 | pad=1
197 | activation=leaky
198 |
199 | [shortcut]
200 | from=-3
201 | activation=linear
202 |
203 |
204 | [convolutional]
205 | batch_normalize=1
206 | filters=128
207 | size=1
208 | stride=1
209 | pad=1
210 | activation=leaky
211 |
212 | [convolutional]
213 | batch_normalize=1
214 | filters=256
215 | size=3
216 | stride=1
217 | pad=1
218 | activation=leaky
219 |
220 | [shortcut]
221 | from=-3
222 | activation=linear
223 |
224 | [convolutional]
225 | batch_normalize=1
226 | filters=128
227 | size=1
228 | stride=1
229 | pad=1
230 | activation=leaky
231 |
232 | [convolutional]
233 | batch_normalize=1
234 | filters=256
235 | size=3
236 | stride=1
237 | pad=1
238 | activation=leaky
239 |
240 | [shortcut]
241 | from=-3
242 | activation=linear
243 |
244 | [convolutional]
245 | batch_normalize=1
246 | filters=128
247 | size=1
248 | stride=1
249 | pad=1
250 | activation=leaky
251 |
252 | [convolutional]
253 | batch_normalize=1
254 | filters=256
255 | size=3
256 | stride=1
257 | pad=1
258 | activation=leaky
259 |
260 | [shortcut]
261 | from=-3
262 | activation=linear
263 |
264 | [convolutional]
265 | batch_normalize=1
266 | filters=128
267 | size=1
268 | stride=1
269 | pad=1
270 | activation=leaky
271 |
272 | [convolutional]
273 | batch_normalize=1
274 | filters=256
275 | size=3
276 | stride=1
277 | pad=1
278 | activation=leaky
279 |
280 | [shortcut]
281 | from=-3
282 | activation=linear
283 |
284 | # Downsample
285 |
286 | [convolutional]
287 | batch_normalize=1
288 | filters=512
289 | size=3
290 | stride=2
291 | pad=1
292 | activation=leaky
293 |
294 | [convolutional]
295 | batch_normalize=1
296 | filters=256
297 | size=1
298 | stride=1
299 | pad=1
300 | activation=leaky
301 |
302 | [convolutional]
303 | batch_normalize=1
304 | filters=512
305 | size=3
306 | stride=1
307 | pad=1
308 | activation=leaky
309 |
310 | [shortcut]
311 | from=-3
312 | activation=linear
313 |
314 |
315 | [convolutional]
316 | batch_normalize=1
317 | filters=256
318 | size=1
319 | stride=1
320 | pad=1
321 | activation=leaky
322 |
323 | [convolutional]
324 | batch_normalize=1
325 | filters=512
326 | size=3
327 | stride=1
328 | pad=1
329 | activation=leaky
330 |
331 | [shortcut]
332 | from=-3
333 | activation=linear
334 |
335 |
336 | [convolutional]
337 | batch_normalize=1
338 | filters=256
339 | size=1
340 | stride=1
341 | pad=1
342 | activation=leaky
343 |
344 | [convolutional]
345 | batch_normalize=1
346 | filters=512
347 | size=3
348 | stride=1
349 | pad=1
350 | activation=leaky
351 |
352 | [shortcut]
353 | from=-3
354 | activation=linear
355 |
356 |
357 | [convolutional]
358 | batch_normalize=1
359 | filters=256
360 | size=1
361 | stride=1
362 | pad=1
363 | activation=leaky
364 |
365 | [convolutional]
366 | batch_normalize=1
367 | filters=512
368 | size=3
369 | stride=1
370 | pad=1
371 | activation=leaky
372 |
373 | [shortcut]
374 | from=-3
375 | activation=linear
376 |
377 | [convolutional]
378 | batch_normalize=1
379 | filters=256
380 | size=1
381 | stride=1
382 | pad=1
383 | activation=leaky
384 |
385 | [convolutional]
386 | batch_normalize=1
387 | filters=512
388 | size=3
389 | stride=1
390 | pad=1
391 | activation=leaky
392 |
393 | [shortcut]
394 | from=-3
395 | activation=linear
396 |
397 |
398 | [convolutional]
399 | batch_normalize=1
400 | filters=256
401 | size=1
402 | stride=1
403 | pad=1
404 | activation=leaky
405 |
406 | [convolutional]
407 | batch_normalize=1
408 | filters=512
409 | size=3
410 | stride=1
411 | pad=1
412 | activation=leaky
413 |
414 | [shortcut]
415 | from=-3
416 | activation=linear
417 |
418 |
419 | [convolutional]
420 | batch_normalize=1
421 | filters=256
422 | size=1
423 | stride=1
424 | pad=1
425 | activation=leaky
426 |
427 | [convolutional]
428 | batch_normalize=1
429 | filters=512
430 | size=3
431 | stride=1
432 | pad=1
433 | activation=leaky
434 |
435 | [shortcut]
436 | from=-3
437 | activation=linear
438 |
439 | [convolutional]
440 | batch_normalize=1
441 | filters=256
442 | size=1
443 | stride=1
444 | pad=1
445 | activation=leaky
446 |
447 | [convolutional]
448 | batch_normalize=1
449 | filters=512
450 | size=3
451 | stride=1
452 | pad=1
453 | activation=leaky
454 |
455 | [shortcut]
456 | from=-3
457 | activation=linear
458 |
459 | # Downsample
460 |
461 | [convolutional]
462 | batch_normalize=1
463 | filters=1024
464 | size=3
465 | stride=2
466 | pad=1
467 | activation=leaky
468 |
469 | [convolutional]
470 | batch_normalize=1
471 | filters=512
472 | size=1
473 | stride=1
474 | pad=1
475 | activation=leaky
476 |
477 | [convolutional]
478 | batch_normalize=1
479 | filters=1024
480 | size=3
481 | stride=1
482 | pad=1
483 | activation=leaky
484 |
485 | [shortcut]
486 | from=-3
487 | activation=linear
488 |
489 | [convolutional]
490 | batch_normalize=1
491 | filters=512
492 | size=1
493 | stride=1
494 | pad=1
495 | activation=leaky
496 |
497 | [convolutional]
498 | batch_normalize=1
499 | filters=1024
500 | size=3
501 | stride=1
502 | pad=1
503 | activation=leaky
504 |
505 | [shortcut]
506 | from=-3
507 | activation=linear
508 |
509 | [convolutional]
510 | batch_normalize=1
511 | filters=512
512 | size=1
513 | stride=1
514 | pad=1
515 | activation=leaky
516 |
517 | [convolutional]
518 | batch_normalize=1
519 | filters=1024
520 | size=3
521 | stride=1
522 | pad=1
523 | activation=leaky
524 |
525 | [shortcut]
526 | from=-3
527 | activation=linear
528 |
529 | [convolutional]
530 | batch_normalize=1
531 | filters=512
532 | size=1
533 | stride=1
534 | pad=1
535 | activation=leaky
536 |
537 | [convolutional]
538 | batch_normalize=1
539 | filters=1024
540 | size=3
541 | stride=1
542 | pad=1
543 | activation=leaky
544 |
545 | [shortcut]
546 | from=-3
547 | activation=linear
548 |
549 | ######################
550 |
551 | [convolutional]
552 | batch_normalize=1
553 | filters=512
554 | size=1
555 | stride=1
556 | pad=1
557 | activation=leaky
558 |
559 | [convolutional]
560 | batch_normalize=1
561 | size=3
562 | stride=1
563 | pad=1
564 | filters=1024
565 | activation=leaky
566 |
567 | [convolutional]
568 | batch_normalize=1
569 | filters=512
570 | size=1
571 | stride=1
572 | pad=1
573 | activation=leaky
574 |
575 | ### SPP ###
576 | [maxpool]
577 | stride=1
578 | size=5
579 |
580 | [route]
581 | layers=-2
582 |
583 | [maxpool]
584 | stride=1
585 | size=9
586 |
587 | [route]
588 | layers=-4
589 |
590 | [maxpool]
591 | stride=1
592 | size=13
593 |
594 | [route]
595 | layers=-1,-3,-5,-6
596 |
597 | ### End SPP ###
598 |
599 | [convolutional]
600 | batch_normalize=1
601 | filters=512
602 | size=1
603 | stride=1
604 | pad=1
605 | activation=leaky
606 |
607 |
608 | [convolutional]
609 | batch_normalize=1
610 | size=3
611 | stride=1
612 | pad=1
613 | filters=1024
614 | activation=leaky
615 |
616 | [convolutional]
617 | batch_normalize=1
618 | filters=512
619 | size=1
620 | stride=1
621 | pad=1
622 | activation=leaky
623 |
624 | [convolutional]
625 | batch_normalize=1
626 | size=3
627 | stride=1
628 | pad=1
629 | filters=1024
630 | activation=leaky
631 |
632 | [convolutional]
633 | size=1
634 | stride=1
635 | pad=1
636 | filters=24
637 | activation=linear
638 |
639 |
640 | [yolo]
641 | mask = 6,7,8
642 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
643 | classes=3
644 | num=9
645 | jitter=.3
646 | ignore_thresh = .7
647 | truth_thresh = 1
648 | random=1
649 |
650 |
651 | [route]
652 | layers = -4
653 |
654 | [convolutional]
655 | batch_normalize=1
656 | filters=256
657 | size=1
658 | stride=1
659 | pad=1
660 | activation=leaky
661 |
662 | [upsample]
663 | stride=2
664 |
665 | [route]
666 | layers = -1, 61
667 |
668 |
669 |
670 | [convolutional]
671 | batch_normalize=1
672 | filters=256
673 | size=1
674 | stride=1
675 | pad=1
676 | activation=leaky
677 |
678 | [convolutional]
679 | batch_normalize=1
680 | size=3
681 | stride=1
682 | pad=1
683 | filters=512
684 | activation=leaky
685 |
686 | [convolutional]
687 | batch_normalize=1
688 | filters=256
689 | size=1
690 | stride=1
691 | pad=1
692 | activation=leaky
693 |
694 | [convolutional]
695 | batch_normalize=1
696 | size=3
697 | stride=1
698 | pad=1
699 | filters=512
700 | activation=leaky
701 |
702 | [convolutional]
703 | batch_normalize=1
704 | filters=256
705 | size=1
706 | stride=1
707 | pad=1
708 | activation=leaky
709 |
710 | [convolutional]
711 | batch_normalize=1
712 | size=3
713 | stride=1
714 | pad=1
715 | filters=512
716 | activation=leaky
717 |
718 | [convolutional]
719 | size=1
720 | stride=1
721 | pad=1
722 | filters=24
723 | activation=linear
724 |
725 |
726 | [yolo]
727 | mask = 3,4,5
728 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
729 | classes=3
730 | num=9
731 | jitter=.3
732 | ignore_thresh = .7
733 | truth_thresh = 1
734 | random=1
735 |
736 |
737 |
738 | [route]
739 | layers = -4
740 |
741 | [convolutional]
742 | batch_normalize=1
743 | filters=128
744 | size=1
745 | stride=1
746 | pad=1
747 | activation=leaky
748 |
749 | [upsample]
750 | stride=2
751 |
752 | [route]
753 | layers = -1, 36
754 |
755 |
756 |
757 | [convolutional]
758 | batch_normalize=1
759 | filters=128
760 | size=1
761 | stride=1
762 | pad=1
763 | activation=leaky
764 |
765 | [convolutional]
766 | batch_normalize=1
767 | size=3
768 | stride=1
769 | pad=1
770 | filters=256
771 | activation=leaky
772 |
773 | [convolutional]
774 | batch_normalize=1
775 | filters=128
776 | size=1
777 | stride=1
778 | pad=1
779 | activation=leaky
780 |
781 | [convolutional]
782 | batch_normalize=1
783 | size=3
784 | stride=1
785 | pad=1
786 | filters=256
787 | activation=leaky
788 |
789 | [convolutional]
790 | batch_normalize=1
791 | filters=128
792 | size=1
793 | stride=1
794 | pad=1
795 | activation=leaky
796 |
797 | [convolutional]
798 | batch_normalize=1
799 | size=3
800 | stride=1
801 | pad=1
802 | filters=256
803 | activation=leaky
804 |
805 | [convolutional]
806 | size=1
807 | stride=1
808 | pad=1
809 | filters=24
810 | activation=linear
811 |
812 |
813 | [yolo]
814 | mask = 0,1,2
815 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
816 | classes=3
817 | num=9
818 | jitter=.3
819 | ignore_thresh = .7
820 | truth_thresh = 1
821 | random=1
822 |
--------------------------------------------------------------------------------
/cfg/yolov3-spp-6cls.cfg:
--------------------------------------------------------------------------------
1 | [net]
2 | # Testing
3 | # batch=1
4 | # subdivisions=1
5 | # Training
6 | batch=64
7 | subdivisions=16
8 | width=608
9 | height=608
10 | channels=3
11 | momentum=0.9
12 | decay=0.0005
13 | angle=0
14 | saturation = 1.5
15 | exposure = 1.5
16 | hue=.1
17 |
18 | learning_rate=0.001
19 | burn_in=100
20 | max_batches = 5000
21 | policy=steps
22 | steps=4000,4500
23 | scales=.1,.1
24 |
25 | [convolutional]
26 | batch_normalize=1
27 | filters=32
28 | size=3
29 | stride=1
30 | pad=1
31 | activation=leaky
32 |
33 | # Downsample
34 |
35 | [convolutional]
36 | batch_normalize=1
37 | filters=64
38 | size=3
39 | stride=2
40 | pad=1
41 | activation=leaky
42 |
43 | [convolutional]
44 | batch_normalize=1
45 | filters=32
46 | size=1
47 | stride=1
48 | pad=1
49 | activation=leaky
50 |
51 | [convolutional]
52 | batch_normalize=1
53 | filters=64
54 | size=3
55 | stride=1
56 | pad=1
57 | activation=leaky
58 |
59 | [shortcut]
60 | from=-3
61 | activation=linear
62 |
63 | # Downsample
64 |
65 | [convolutional]
66 | batch_normalize=1
67 | filters=128
68 | size=3
69 | stride=2
70 | pad=1
71 | activation=leaky
72 |
73 | [convolutional]
74 | batch_normalize=1
75 | filters=64
76 | size=1
77 | stride=1
78 | pad=1
79 | activation=leaky
80 |
81 | [convolutional]
82 | batch_normalize=1
83 | filters=128
84 | size=3
85 | stride=1
86 | pad=1
87 | activation=leaky
88 |
89 | [shortcut]
90 | from=-3
91 | activation=linear
92 |
93 | [convolutional]
94 | batch_normalize=1
95 | filters=64
96 | size=1
97 | stride=1
98 | pad=1
99 | activation=leaky
100 |
101 | [convolutional]
102 | batch_normalize=1
103 | filters=128
104 | size=3
105 | stride=1
106 | pad=1
107 | activation=leaky
108 |
109 | [shortcut]
110 | from=-3
111 | activation=linear
112 |
113 | # Downsample
114 |
115 | [convolutional]
116 | batch_normalize=1
117 | filters=256
118 | size=3
119 | stride=2
120 | pad=1
121 | activation=leaky
122 |
123 | [convolutional]
124 | batch_normalize=1
125 | filters=128
126 | size=1
127 | stride=1
128 | pad=1
129 | activation=leaky
130 |
131 | [convolutional]
132 | batch_normalize=1
133 | filters=256
134 | size=3
135 | stride=1
136 | pad=1
137 | activation=leaky
138 |
139 | [shortcut]
140 | from=-3
141 | activation=linear
142 |
143 | [convolutional]
144 | batch_normalize=1
145 | filters=128
146 | size=1
147 | stride=1
148 | pad=1
149 | activation=leaky
150 |
151 | [convolutional]
152 | batch_normalize=1
153 | filters=256
154 | size=3
155 | stride=1
156 | pad=1
157 | activation=leaky
158 |
159 | [shortcut]
160 | from=-3
161 | activation=linear
162 |
163 | [convolutional]
164 | batch_normalize=1
165 | filters=128
166 | size=1
167 | stride=1
168 | pad=1
169 | activation=leaky
170 |
171 | [convolutional]
172 | batch_normalize=1
173 | filters=256
174 | size=3
175 | stride=1
176 | pad=1
177 | activation=leaky
178 |
179 | [shortcut]
180 | from=-3
181 | activation=linear
182 |
183 | [convolutional]
184 | batch_normalize=1
185 | filters=128
186 | size=1
187 | stride=1
188 | pad=1
189 | activation=leaky
190 |
191 | [convolutional]
192 | batch_normalize=1
193 | filters=256
194 | size=3
195 | stride=1
196 | pad=1
197 | activation=leaky
198 |
199 | [shortcut]
200 | from=-3
201 | activation=linear
202 |
203 |
204 | [convolutional]
205 | batch_normalize=1
206 | filters=128
207 | size=1
208 | stride=1
209 | pad=1
210 | activation=leaky
211 |
212 | [convolutional]
213 | batch_normalize=1
214 | filters=256
215 | size=3
216 | stride=1
217 | pad=1
218 | activation=leaky
219 |
220 | [shortcut]
221 | from=-3
222 | activation=linear
223 |
224 | [convolutional]
225 | batch_normalize=1
226 | filters=128
227 | size=1
228 | stride=1
229 | pad=1
230 | activation=leaky
231 |
232 | [convolutional]
233 | batch_normalize=1
234 | filters=256
235 | size=3
236 | stride=1
237 | pad=1
238 | activation=leaky
239 |
240 | [shortcut]
241 | from=-3
242 | activation=linear
243 |
244 | [convolutional]
245 | batch_normalize=1
246 | filters=128
247 | size=1
248 | stride=1
249 | pad=1
250 | activation=leaky
251 |
252 | [convolutional]
253 | batch_normalize=1
254 | filters=256
255 | size=3
256 | stride=1
257 | pad=1
258 | activation=leaky
259 |
260 | [shortcut]
261 | from=-3
262 | activation=linear
263 |
264 | [convolutional]
265 | batch_normalize=1
266 | filters=128
267 | size=1
268 | stride=1
269 | pad=1
270 | activation=leaky
271 |
272 | [convolutional]
273 | batch_normalize=1
274 | filters=256
275 | size=3
276 | stride=1
277 | pad=1
278 | activation=leaky
279 |
280 | [shortcut]
281 | from=-3
282 | activation=linear
283 |
284 | # Downsample
285 |
286 | [convolutional]
287 | batch_normalize=1
288 | filters=512
289 | size=3
290 | stride=2
291 | pad=1
292 | activation=leaky
293 |
294 | [convolutional]
295 | batch_normalize=1
296 | filters=256
297 | size=1
298 | stride=1
299 | pad=1
300 | activation=leaky
301 |
302 | [convolutional]
303 | batch_normalize=1
304 | filters=512
305 | size=3
306 | stride=1
307 | pad=1
308 | activation=leaky
309 |
310 | [shortcut]
311 | from=-3
312 | activation=linear
313 |
314 |
315 | [convolutional]
316 | batch_normalize=1
317 | filters=256
318 | size=1
319 | stride=1
320 | pad=1
321 | activation=leaky
322 |
323 | [convolutional]
324 | batch_normalize=1
325 | filters=512
326 | size=3
327 | stride=1
328 | pad=1
329 | activation=leaky
330 |
331 | [shortcut]
332 | from=-3
333 | activation=linear
334 |
335 |
336 | [convolutional]
337 | batch_normalize=1
338 | filters=256
339 | size=1
340 | stride=1
341 | pad=1
342 | activation=leaky
343 |
344 | [convolutional]
345 | batch_normalize=1
346 | filters=512
347 | size=3
348 | stride=1
349 | pad=1
350 | activation=leaky
351 |
352 | [shortcut]
353 | from=-3
354 | activation=linear
355 |
356 |
357 | [convolutional]
358 | batch_normalize=1
359 | filters=256
360 | size=1
361 | stride=1
362 | pad=1
363 | activation=leaky
364 |
365 | [convolutional]
366 | batch_normalize=1
367 | filters=512
368 | size=3
369 | stride=1
370 | pad=1
371 | activation=leaky
372 |
373 | [shortcut]
374 | from=-3
375 | activation=linear
376 |
377 | [convolutional]
378 | batch_normalize=1
379 | filters=256
380 | size=1
381 | stride=1
382 | pad=1
383 | activation=leaky
384 |
385 | [convolutional]
386 | batch_normalize=1
387 | filters=512
388 | size=3
389 | stride=1
390 | pad=1
391 | activation=leaky
392 |
393 | [shortcut]
394 | from=-3
395 | activation=linear
396 |
397 |
398 | [convolutional]
399 | batch_normalize=1
400 | filters=256
401 | size=1
402 | stride=1
403 | pad=1
404 | activation=leaky
405 |
406 | [convolutional]
407 | batch_normalize=1
408 | filters=512
409 | size=3
410 | stride=1
411 | pad=1
412 | activation=leaky
413 |
414 | [shortcut]
415 | from=-3
416 | activation=linear
417 |
418 |
419 | [convolutional]
420 | batch_normalize=1
421 | filters=256
422 | size=1
423 | stride=1
424 | pad=1
425 | activation=leaky
426 |
427 | [convolutional]
428 | batch_normalize=1
429 | filters=512
430 | size=3
431 | stride=1
432 | pad=1
433 | activation=leaky
434 |
435 | [shortcut]
436 | from=-3
437 | activation=linear
438 |
439 | [convolutional]
440 | batch_normalize=1
441 | filters=256
442 | size=1
443 | stride=1
444 | pad=1
445 | activation=leaky
446 |
447 | [convolutional]
448 | batch_normalize=1
449 | filters=512
450 | size=3
451 | stride=1
452 | pad=1
453 | activation=leaky
454 |
455 | [shortcut]
456 | from=-3
457 | activation=linear
458 |
459 | # Downsample
460 |
461 | [convolutional]
462 | batch_normalize=1
463 | filters=1024
464 | size=3
465 | stride=2
466 | pad=1
467 | activation=leaky
468 |
469 | [convolutional]
470 | batch_normalize=1
471 | filters=512
472 | size=1
473 | stride=1
474 | pad=1
475 | activation=leaky
476 |
477 | [convolutional]
478 | batch_normalize=1
479 | filters=1024
480 | size=3
481 | stride=1
482 | pad=1
483 | activation=leaky
484 |
485 | [shortcut]
486 | from=-3
487 | activation=linear
488 |
489 | [convolutional]
490 | batch_normalize=1
491 | filters=512
492 | size=1
493 | stride=1
494 | pad=1
495 | activation=leaky
496 |
497 | [convolutional]
498 | batch_normalize=1
499 | filters=1024
500 | size=3
501 | stride=1
502 | pad=1
503 | activation=leaky
504 |
505 | [shortcut]
506 | from=-3
507 | activation=linear
508 |
509 | [convolutional]
510 | batch_normalize=1
511 | filters=512
512 | size=1
513 | stride=1
514 | pad=1
515 | activation=leaky
516 |
517 | [convolutional]
518 | batch_normalize=1
519 | filters=1024
520 | size=3
521 | stride=1
522 | pad=1
523 | activation=leaky
524 |
525 | [shortcut]
526 | from=-3
527 | activation=linear
528 |
529 | [convolutional]
530 | batch_normalize=1
531 | filters=512
532 | size=1
533 | stride=1
534 | pad=1
535 | activation=leaky
536 |
537 | [convolutional]
538 | batch_normalize=1
539 | filters=1024
540 | size=3
541 | stride=1
542 | pad=1
543 | activation=leaky
544 |
545 | [shortcut]
546 | from=-3
547 | activation=linear
548 |
549 | ######################
550 |
551 | [convolutional]
552 | batch_normalize=1
553 | filters=512
554 | size=1
555 | stride=1
556 | pad=1
557 | activation=leaky
558 |
559 | [convolutional]
560 | batch_normalize=1
561 | size=3
562 | stride=1
563 | pad=1
564 | filters=1024
565 | activation=leaky
566 |
567 | [convolutional]
568 | batch_normalize=1
569 | filters=512
570 | size=1
571 | stride=1
572 | pad=1
573 | activation=leaky
574 |
575 | ### SPP ###
576 | [maxpool]
577 | stride=1
578 | size=5
579 |
580 | [route]
581 | layers=-2
582 |
583 | [maxpool]
584 | stride=1
585 | size=9
586 |
587 | [route]
588 | layers=-4
589 |
590 | [maxpool]
591 | stride=1
592 | size=13
593 |
594 | [route]
595 | layers=-1,-3,-5,-6
596 |
597 | ### End SPP ###
598 |
599 | [convolutional]
600 | batch_normalize=1
601 | filters=512
602 | size=1
603 | stride=1
604 | pad=1
605 | activation=leaky
606 |
607 |
608 | [convolutional]
609 | batch_normalize=1
610 | size=3
611 | stride=1
612 | pad=1
613 | filters=1024
614 | activation=leaky
615 |
616 | [convolutional]
617 | batch_normalize=1
618 | filters=512
619 | size=1
620 | stride=1
621 | pad=1
622 | activation=leaky
623 |
624 | [convolutional]
625 | batch_normalize=1
626 | size=3
627 | stride=1
628 | pad=1
629 | filters=1024
630 | activation=leaky
631 |
632 | [convolutional]
633 | size=1
634 | stride=1
635 | pad=1
636 | filters=33
637 | activation=linear
638 |
639 |
640 | [yolo]
641 | mask = 6,7,8
642 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
643 | classes=6
644 | num=9
645 | jitter=.3
646 | ignore_thresh = .7
647 | truth_thresh = 1
648 | random=1
649 |
650 |
651 | [route]
652 | layers = -4
653 |
654 | [convolutional]
655 | batch_normalize=1
656 | filters=256
657 | size=1
658 | stride=1
659 | pad=1
660 | activation=leaky
661 |
662 | [upsample]
663 | stride=2
664 |
665 | [route]
666 | layers = -1, 61
667 |
668 |
669 |
670 | [convolutional]
671 | batch_normalize=1
672 | filters=256
673 | size=1
674 | stride=1
675 | pad=1
676 | activation=leaky
677 |
678 | [convolutional]
679 | batch_normalize=1
680 | size=3
681 | stride=1
682 | pad=1
683 | filters=512
684 | activation=leaky
685 |
686 | [convolutional]
687 | batch_normalize=1
688 | filters=256
689 | size=1
690 | stride=1
691 | pad=1
692 | activation=leaky
693 |
694 | [convolutional]
695 | batch_normalize=1
696 | size=3
697 | stride=1
698 | pad=1
699 | filters=512
700 | activation=leaky
701 |
702 | [convolutional]
703 | batch_normalize=1
704 | filters=256
705 | size=1
706 | stride=1
707 | pad=1
708 | activation=leaky
709 |
710 | [convolutional]
711 | batch_normalize=1
712 | size=3
713 | stride=1
714 | pad=1
715 | filters=512
716 | activation=leaky
717 |
718 | [convolutional]
719 | size=1
720 | stride=1
721 | pad=1
722 | filters=33
723 | activation=linear
724 |
725 |
726 | [yolo]
727 | mask = 3,4,5
728 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
729 | classes=6
730 | num=9
731 | jitter=.3
732 | ignore_thresh = .7
733 | truth_thresh = 1
734 | random=1
735 |
736 |
737 |
738 | [route]
739 | layers = -4
740 |
741 | [convolutional]
742 | batch_normalize=1
743 | filters=128
744 | size=1
745 | stride=1
746 | pad=1
747 | activation=leaky
748 |
749 | [upsample]
750 | stride=2
751 |
752 | [route]
753 | layers = -1, 36
754 |
755 |
756 |
757 | [convolutional]
758 | batch_normalize=1
759 | filters=128
760 | size=1
761 | stride=1
762 | pad=1
763 | activation=leaky
764 |
765 | [convolutional]
766 | batch_normalize=1
767 | size=3
768 | stride=1
769 | pad=1
770 | filters=256
771 | activation=leaky
772 |
773 | [convolutional]
774 | batch_normalize=1
775 | filters=128
776 | size=1
777 | stride=1
778 | pad=1
779 | activation=leaky
780 |
781 | [convolutional]
782 | batch_normalize=1
783 | size=3
784 | stride=1
785 | pad=1
786 | filters=256
787 | activation=leaky
788 |
789 | [convolutional]
790 | batch_normalize=1
791 | filters=128
792 | size=1
793 | stride=1
794 | pad=1
795 | activation=leaky
796 |
797 | [convolutional]
798 | batch_normalize=1
799 | size=3
800 | stride=1
801 | pad=1
802 | filters=256
803 | activation=leaky
804 |
805 | [convolutional]
806 | size=1
807 | stride=1
808 | pad=1
809 | filters=33
810 | activation=linear
811 |
812 |
813 | [yolo]
814 | mask = 0,1,2
815 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
816 | classes=6
817 | num=9
818 | jitter=.3
819 | ignore_thresh = .7
820 | truth_thresh = 1
821 | random=1
822 |
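
Unlike the stock yolov3-spp.cfg that follows (burn_in=1000, max_batches=500200, steps=400000,450000), every *-Ncls variant above uses a shortened fine-tuning schedule: burn_in=100, max_batches=5000, steps=4000,4500 with scales=.1,.1. Beyond that schedule, adapting the base file to a new class count only means rewriting each classes= line and the filters= line of the convolution directly above each [yolo] block. A rough sketch of doing that programmatically, as a hypothetical script that is not part of this repository:

    # make_ncls_cfg.py -- derive an N-class variant from a base Darknet cfg (sketch)
    import sys

    def rewrite_cfg(src_path: str, dst_path: str, num_classes: int) -> None:
        lines = open(src_path).read().splitlines()
        head_filters = (num_classes + 5) * 3          # 3 masked anchors per [yolo] head
        for i, line in enumerate(lines):
            if line.strip().startswith('classes='):
                lines[i] = f'classes={num_classes}'
            elif line.strip() == '[yolo]':
                # walk back to the 1x1 conv that feeds this head and resize it
                for j in range(i - 1, -1, -1):
                    if lines[j].strip().startswith('filters='):
                        lines[j] = f'filters={head_filters}'
                        break
        with open(dst_path, 'w') as f:
            f.write('\n'.join(lines) + '\n')

    if __name__ == '__main__':
        # usage: python make_ncls_cfg.py cfg/yolov3-spp.cfg cfg/yolov3-spp-4cls.cfg 4
        rewrite_cfg(sys.argv[1], sys.argv[2], int(sys.argv[3]))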
--------------------------------------------------------------------------------
/cfg/yolov3-spp.cfg:
--------------------------------------------------------------------------------
1 | [net]
2 | # Testing
3 | # batch=1
4 | # subdivisions=1
5 | # Training
6 | batch=64
7 | subdivisions=16
8 | width=608
9 | height=608
10 | channels=3
11 | momentum=0.9
12 | decay=0.0005
13 | angle=0
14 | saturation = 1.5
15 | exposure = 1.5
16 | hue=.1
17 |
18 | learning_rate=0.001
19 | burn_in=1000
20 | max_batches = 500200
21 | policy=steps
22 | steps=400000,450000
23 | scales=.1,.1
24 |
25 | [convolutional]
26 | batch_normalize=1
27 | filters=32
28 | size=3
29 | stride=1
30 | pad=1
31 | activation=leaky
32 |
33 | # Downsample
34 |
35 | [convolutional]
36 | batch_normalize=1
37 | filters=64
38 | size=3
39 | stride=2
40 | pad=1
41 | activation=leaky
42 |
43 | [convolutional]
44 | batch_normalize=1
45 | filters=32
46 | size=1
47 | stride=1
48 | pad=1
49 | activation=leaky
50 |
51 | [convolutional]
52 | batch_normalize=1
53 | filters=64
54 | size=3
55 | stride=1
56 | pad=1
57 | activation=leaky
58 |
59 | [shortcut]
60 | from=-3
61 | activation=linear
62 |
63 | # Downsample
64 |
65 | [convolutional]
66 | batch_normalize=1
67 | filters=128
68 | size=3
69 | stride=2
70 | pad=1
71 | activation=leaky
72 |
73 | [convolutional]
74 | batch_normalize=1
75 | filters=64
76 | size=1
77 | stride=1
78 | pad=1
79 | activation=leaky
80 |
81 | [convolutional]
82 | batch_normalize=1
83 | filters=128
84 | size=3
85 | stride=1
86 | pad=1
87 | activation=leaky
88 |
89 | [shortcut]
90 | from=-3
91 | activation=linear
92 |
93 | [convolutional]
94 | batch_normalize=1
95 | filters=64
96 | size=1
97 | stride=1
98 | pad=1
99 | activation=leaky
100 |
101 | [convolutional]
102 | batch_normalize=1
103 | filters=128
104 | size=3
105 | stride=1
106 | pad=1
107 | activation=leaky
108 |
109 | [shortcut]
110 | from=-3
111 | activation=linear
112 |
113 | # Downsample
114 |
115 | [convolutional]
116 | batch_normalize=1
117 | filters=256
118 | size=3
119 | stride=2
120 | pad=1
121 | activation=leaky
122 |
123 | [convolutional]
124 | batch_normalize=1
125 | filters=128
126 | size=1
127 | stride=1
128 | pad=1
129 | activation=leaky
130 |
131 | [convolutional]
132 | batch_normalize=1
133 | filters=256
134 | size=3
135 | stride=1
136 | pad=1
137 | activation=leaky
138 |
139 | [shortcut]
140 | from=-3
141 | activation=linear
142 |
143 | [convolutional]
144 | batch_normalize=1
145 | filters=128
146 | size=1
147 | stride=1
148 | pad=1
149 | activation=leaky
150 |
151 | [convolutional]
152 | batch_normalize=1
153 | filters=256
154 | size=3
155 | stride=1
156 | pad=1
157 | activation=leaky
158 |
159 | [shortcut]
160 | from=-3
161 | activation=linear
162 |
163 | [convolutional]
164 | batch_normalize=1
165 | filters=128
166 | size=1
167 | stride=1
168 | pad=1
169 | activation=leaky
170 |
171 | [convolutional]
172 | batch_normalize=1
173 | filters=256
174 | size=3
175 | stride=1
176 | pad=1
177 | activation=leaky
178 |
179 | [shortcut]
180 | from=-3
181 | activation=linear
182 |
183 | [convolutional]
184 | batch_normalize=1
185 | filters=128
186 | size=1
187 | stride=1
188 | pad=1
189 | activation=leaky
190 |
191 | [convolutional]
192 | batch_normalize=1
193 | filters=256
194 | size=3
195 | stride=1
196 | pad=1
197 | activation=leaky
198 |
199 | [shortcut]
200 | from=-3
201 | activation=linear
202 |
203 |
204 | [convolutional]
205 | batch_normalize=1
206 | filters=128
207 | size=1
208 | stride=1
209 | pad=1
210 | activation=leaky
211 |
212 | [convolutional]
213 | batch_normalize=1
214 | filters=256
215 | size=3
216 | stride=1
217 | pad=1
218 | activation=leaky
219 |
220 | [shortcut]
221 | from=-3
222 | activation=linear
223 |
224 | [convolutional]
225 | batch_normalize=1
226 | filters=128
227 | size=1
228 | stride=1
229 | pad=1
230 | activation=leaky
231 |
232 | [convolutional]
233 | batch_normalize=1
234 | filters=256
235 | size=3
236 | stride=1
237 | pad=1
238 | activation=leaky
239 |
240 | [shortcut]
241 | from=-3
242 | activation=linear
243 |
244 | [convolutional]
245 | batch_normalize=1
246 | filters=128
247 | size=1
248 | stride=1
249 | pad=1
250 | activation=leaky
251 |
252 | [convolutional]
253 | batch_normalize=1
254 | filters=256
255 | size=3
256 | stride=1
257 | pad=1
258 | activation=leaky
259 |
260 | [shortcut]
261 | from=-3
262 | activation=linear
263 |
264 | [convolutional]
265 | batch_normalize=1
266 | filters=128
267 | size=1
268 | stride=1
269 | pad=1
270 | activation=leaky
271 |
272 | [convolutional]
273 | batch_normalize=1
274 | filters=256
275 | size=3
276 | stride=1
277 | pad=1
278 | activation=leaky
279 |
280 | [shortcut]
281 | from=-3
282 | activation=linear
283 |
284 | # Downsample
285 |
286 | [convolutional]
287 | batch_normalize=1
288 | filters=512
289 | size=3
290 | stride=2
291 | pad=1
292 | activation=leaky
293 |
294 | [convolutional]
295 | batch_normalize=1
296 | filters=256
297 | size=1
298 | stride=1
299 | pad=1
300 | activation=leaky
301 |
302 | [convolutional]
303 | batch_normalize=1
304 | filters=512
305 | size=3
306 | stride=1
307 | pad=1
308 | activation=leaky
309 |
310 | [shortcut]
311 | from=-3
312 | activation=linear
313 |
314 |
315 | [convolutional]
316 | batch_normalize=1
317 | filters=256
318 | size=1
319 | stride=1
320 | pad=1
321 | activation=leaky
322 |
323 | [convolutional]
324 | batch_normalize=1
325 | filters=512
326 | size=3
327 | stride=1
328 | pad=1
329 | activation=leaky
330 |
331 | [shortcut]
332 | from=-3
333 | activation=linear
334 |
335 |
336 | [convolutional]
337 | batch_normalize=1
338 | filters=256
339 | size=1
340 | stride=1
341 | pad=1
342 | activation=leaky
343 |
344 | [convolutional]
345 | batch_normalize=1
346 | filters=512
347 | size=3
348 | stride=1
349 | pad=1
350 | activation=leaky
351 |
352 | [shortcut]
353 | from=-3
354 | activation=linear
355 |
356 |
357 | [convolutional]
358 | batch_normalize=1
359 | filters=256
360 | size=1
361 | stride=1
362 | pad=1
363 | activation=leaky
364 |
365 | [convolutional]
366 | batch_normalize=1
367 | filters=512
368 | size=3
369 | stride=1
370 | pad=1
371 | activation=leaky
372 |
373 | [shortcut]
374 | from=-3
375 | activation=linear
376 |
377 | [convolutional]
378 | batch_normalize=1
379 | filters=256
380 | size=1
381 | stride=1
382 | pad=1
383 | activation=leaky
384 |
385 | [convolutional]
386 | batch_normalize=1
387 | filters=512
388 | size=3
389 | stride=1
390 | pad=1
391 | activation=leaky
392 |
393 | [shortcut]
394 | from=-3
395 | activation=linear
396 |
397 |
398 | [convolutional]
399 | batch_normalize=1
400 | filters=256
401 | size=1
402 | stride=1
403 | pad=1
404 | activation=leaky
405 |
406 | [convolutional]
407 | batch_normalize=1
408 | filters=512
409 | size=3
410 | stride=1
411 | pad=1
412 | activation=leaky
413 |
414 | [shortcut]
415 | from=-3
416 | activation=linear
417 |
418 |
419 | [convolutional]
420 | batch_normalize=1
421 | filters=256
422 | size=1
423 | stride=1
424 | pad=1
425 | activation=leaky
426 |
427 | [convolutional]
428 | batch_normalize=1
429 | filters=512
430 | size=3
431 | stride=1
432 | pad=1
433 | activation=leaky
434 |
435 | [shortcut]
436 | from=-3
437 | activation=linear
438 |
439 | [convolutional]
440 | batch_normalize=1
441 | filters=256
442 | size=1
443 | stride=1
444 | pad=1
445 | activation=leaky
446 |
447 | [convolutional]
448 | batch_normalize=1
449 | filters=512
450 | size=3
451 | stride=1
452 | pad=1
453 | activation=leaky
454 |
455 | [shortcut]
456 | from=-3
457 | activation=linear
458 |
459 | # Downsample
460 |
461 | [convolutional]
462 | batch_normalize=1
463 | filters=1024
464 | size=3
465 | stride=2
466 | pad=1
467 | activation=leaky
468 |
469 | [convolutional]
470 | batch_normalize=1
471 | filters=512
472 | size=1
473 | stride=1
474 | pad=1
475 | activation=leaky
476 |
477 | [convolutional]
478 | batch_normalize=1
479 | filters=1024
480 | size=3
481 | stride=1
482 | pad=1
483 | activation=leaky
484 |
485 | [shortcut]
486 | from=-3
487 | activation=linear
488 |
489 | [convolutional]
490 | batch_normalize=1
491 | filters=512
492 | size=1
493 | stride=1
494 | pad=1
495 | activation=leaky
496 |
497 | [convolutional]
498 | batch_normalize=1
499 | filters=1024
500 | size=3
501 | stride=1
502 | pad=1
503 | activation=leaky
504 |
505 | [shortcut]
506 | from=-3
507 | activation=linear
508 |
509 | [convolutional]
510 | batch_normalize=1
511 | filters=512
512 | size=1
513 | stride=1
514 | pad=1
515 | activation=leaky
516 |
517 | [convolutional]
518 | batch_normalize=1
519 | filters=1024
520 | size=3
521 | stride=1
522 | pad=1
523 | activation=leaky
524 |
525 | [shortcut]
526 | from=-3
527 | activation=linear
528 |
529 | [convolutional]
530 | batch_normalize=1
531 | filters=512
532 | size=1
533 | stride=1
534 | pad=1
535 | activation=leaky
536 |
537 | [convolutional]
538 | batch_normalize=1
539 | filters=1024
540 | size=3
541 | stride=1
542 | pad=1
543 | activation=leaky
544 |
545 | [shortcut]
546 | from=-3
547 | activation=linear
548 |
549 | ######################
550 |
551 | [convolutional]
552 | batch_normalize=1
553 | filters=512
554 | size=1
555 | stride=1
556 | pad=1
557 | activation=leaky
558 |
559 | [convolutional]
560 | batch_normalize=1
561 | size=3
562 | stride=1
563 | pad=1
564 | filters=1024
565 | activation=leaky
566 |
567 | [convolutional]
568 | batch_normalize=1
569 | filters=512
570 | size=1
571 | stride=1
572 | pad=1
573 | activation=leaky
574 |
575 | ### SPP ###
576 | [maxpool]
577 | stride=1
578 | size=5
579 |
580 | [route]
581 | layers=-2
582 |
583 | [maxpool]
584 | stride=1
585 | size=9
586 |
587 | [route]
588 | layers=-4
589 |
590 | [maxpool]
591 | stride=1
592 | size=13
593 |
594 | [route]
595 | layers=-1,-3,-5,-6
596 |
597 | ### End SPP ###
598 |
599 | [convolutional]
600 | batch_normalize=1
601 | filters=512
602 | size=1
603 | stride=1
604 | pad=1
605 | activation=leaky
606 |
607 |
608 | [convolutional]
609 | batch_normalize=1
610 | size=3
611 | stride=1
612 | pad=1
613 | filters=1024
614 | activation=leaky
615 |
616 | [convolutional]
617 | batch_normalize=1
618 | filters=512
619 | size=1
620 | stride=1
621 | pad=1
622 | activation=leaky
623 |
624 | [convolutional]
625 | batch_normalize=1
626 | size=3
627 | stride=1
628 | pad=1
629 | filters=1024
630 | activation=leaky
631 |
632 | [convolutional]
633 | size=1
634 | stride=1
635 | pad=1
636 | filters=255
637 | activation=linear
638 |
639 |
640 | [yolo]
641 | mask = 6,7,8
642 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
643 | classes=80
644 | num=9
645 | jitter=.3
646 | ignore_thresh = .7
647 | truth_thresh = 1
648 | random=1
649 |
650 |
651 | [route]
652 | layers = -4
653 |
654 | [convolutional]
655 | batch_normalize=1
656 | filters=256
657 | size=1
658 | stride=1
659 | pad=1
660 | activation=leaky
661 |
662 | [upsample]
663 | stride=2
664 |
665 | [route]
666 | layers = -1, 61
667 |
668 |
669 |
670 | [convolutional]
671 | batch_normalize=1
672 | filters=256
673 | size=1
674 | stride=1
675 | pad=1
676 | activation=leaky
677 |
678 | [convolutional]
679 | batch_normalize=1
680 | size=3
681 | stride=1
682 | pad=1
683 | filters=512
684 | activation=leaky
685 |
686 | [convolutional]
687 | batch_normalize=1
688 | filters=256
689 | size=1
690 | stride=1
691 | pad=1
692 | activation=leaky
693 |
694 | [convolutional]
695 | batch_normalize=1
696 | size=3
697 | stride=1
698 | pad=1
699 | filters=512
700 | activation=leaky
701 |
702 | [convolutional]
703 | batch_normalize=1
704 | filters=256
705 | size=1
706 | stride=1
707 | pad=1
708 | activation=leaky
709 |
710 | [convolutional]
711 | batch_normalize=1
712 | size=3
713 | stride=1
714 | pad=1
715 | filters=512
716 | activation=leaky
717 |
718 | [convolutional]
719 | size=1
720 | stride=1
721 | pad=1
722 | filters=255
723 | activation=linear
724 |
725 |
726 | [yolo]
727 | mask = 3,4,5
728 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
729 | classes=80
730 | num=9
731 | jitter=.3
732 | ignore_thresh = .7
733 | truth_thresh = 1
734 | random=1
735 |
736 |
737 |
738 | [route]
739 | layers = -4
740 |
741 | [convolutional]
742 | batch_normalize=1
743 | filters=128
744 | size=1
745 | stride=1
746 | pad=1
747 | activation=leaky
748 |
749 | [upsample]
750 | stride=2
751 |
752 | [route]
753 | layers = -1, 36
754 |
755 |
756 |
757 | [convolutional]
758 | batch_normalize=1
759 | filters=128
760 | size=1
761 | stride=1
762 | pad=1
763 | activation=leaky
764 |
765 | [convolutional]
766 | batch_normalize=1
767 | size=3
768 | stride=1
769 | pad=1
770 | filters=256
771 | activation=leaky
772 |
773 | [convolutional]
774 | batch_normalize=1
775 | filters=128
776 | size=1
777 | stride=1
778 | pad=1
779 | activation=leaky
780 |
781 | [convolutional]
782 | batch_normalize=1
783 | size=3
784 | stride=1
785 | pad=1
786 | filters=256
787 | activation=leaky
788 |
789 | [convolutional]
790 | batch_normalize=1
791 | filters=128
792 | size=1
793 | stride=1
794 | pad=1
795 | activation=leaky
796 |
797 | [convolutional]
798 | batch_normalize=1
799 | size=3
800 | stride=1
801 | pad=1
802 | filters=256
803 | activation=leaky
804 |
805 | [convolutional]
806 | size=1
807 | stride=1
808 | pad=1
809 | filters=255
810 | activation=linear
811 |
812 |
813 | [yolo]
814 | mask = 0,1,2
815 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
816 | classes=80
817 | num=9
818 | jitter=.3
819 | ignore_thresh = .7
820 | truth_thresh = 1
821 | random=1
822 |
--------------------------------------------------------------------------------
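
The ### SPP ### block in the config above implements spatial pyramid pooling: three stride-1 max-pools (kernel sizes 5, 9 and 13) are applied to the same 512-channel feature map and concatenated back with it through the [route] layers, so the following 1x1 convolution sees four times the channels. A minimal PyTorch sketch of the same idea (the class name and shapes are illustrative, not taken from models.py):

    import torch
    import torch.nn as nn

    class SPP(nn.Module):
        """Spatial pyramid pooling: parallel stride-1 max-pools whose
        outputs are concatenated with the input feature map."""
        def __init__(self, kernel_sizes=(5, 9, 13)):
            super().__init__()
            # padding = k // 2 keeps the spatial size unchanged at stride 1
            self.pools = nn.ModuleList(
                [nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2)
                 for k in kernel_sizes]
            )

        def forward(self, x):
            # e.g. (N, 512, 13, 13) -> (N, 2048, 13, 13)
            return torch.cat([x] + [pool(x) for pool in self.pools], dim=1)
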
/cfg/yolov3-tiny-1cls.cfg:
--------------------------------------------------------------------------------
1 | [net]
2 | # Testing
3 | batch=1
4 | subdivisions=1
5 | # Training
6 | # batch=64
7 | # subdivisions=2
8 | width=416
9 | height=416
10 | channels=3
11 | momentum=0.9
12 | decay=0.0005
13 | angle=0
14 | saturation = 1.5
15 | exposure = 1.5
16 | hue=.1
17 |
18 | learning_rate=0.001
19 | burn_in=1000
20 | max_batches = 500200
21 | policy=steps
22 | steps=400000,450000
23 | scales=.1,.1
24 |
25 | [convolutional]
26 | batch_normalize=1
27 | filters=16
28 | size=3
29 | stride=1
30 | pad=1
31 | activation=leaky
32 |
33 | [maxpool]
34 | size=2
35 | stride=2
36 |
37 | [convolutional]
38 | batch_normalize=1
39 | filters=32
40 | size=3
41 | stride=1
42 | pad=1
43 | activation=leaky
44 |
45 | [maxpool]
46 | size=2
47 | stride=2
48 |
49 | [convolutional]
50 | batch_normalize=1
51 | filters=64
52 | size=3
53 | stride=1
54 | pad=1
55 | activation=leaky
56 |
57 | [maxpool]
58 | size=2
59 | stride=2
60 |
61 | [convolutional]
62 | batch_normalize=1
63 | filters=128
64 | size=3
65 | stride=1
66 | pad=1
67 | activation=leaky
68 |
69 | [maxpool]
70 | size=2
71 | stride=2
72 |
73 | [convolutional]
74 | batch_normalize=1
75 | filters=256
76 | size=3
77 | stride=1
78 | pad=1
79 | activation=leaky
80 |
81 | [maxpool]
82 | size=2
83 | stride=2
84 |
85 | [convolutional]
86 | batch_normalize=1
87 | filters=512
88 | size=3
89 | stride=1
90 | pad=1
91 | activation=leaky
92 |
93 | [maxpool]
94 | size=2
95 | stride=1
96 |
97 | [convolutional]
98 | batch_normalize=1
99 | filters=1024
100 | size=3
101 | stride=1
102 | pad=1
103 | activation=leaky
104 |
105 | ###########
106 |
107 | [convolutional]
108 | batch_normalize=1
109 | filters=256
110 | size=1
111 | stride=1
112 | pad=1
113 | activation=leaky
114 |
115 | [convolutional]
116 | batch_normalize=1
117 | filters=512
118 | size=3
119 | stride=1
120 | pad=1
121 | activation=leaky
122 |
123 | [convolutional]
124 | size=1
125 | stride=1
126 | pad=1
127 | filters=18
128 | activation=linear
129 |
130 |
131 |
132 | [yolo]
133 | mask = 3,4,5
134 | anchors = 10,14, 23,27, 37,58, 81,82, 135,169, 344,319
135 | classes=1
136 | num=6
137 | jitter=.3
138 | ignore_thresh = .7
139 | truth_thresh = 1
140 | random=1
141 |
142 | [route]
143 | layers = -4
144 |
145 | [convolutional]
146 | batch_normalize=1
147 | filters=128
148 | size=1
149 | stride=1
150 | pad=1
151 | activation=leaky
152 |
153 | [upsample]
154 | stride=2
155 |
156 | [route]
157 | layers = -1, 8
158 |
159 | [convolutional]
160 | batch_normalize=1
161 | filters=256
162 | size=3
163 | stride=1
164 | pad=1
165 | activation=leaky
166 |
167 | [convolutional]
168 | size=1
169 | stride=1
170 | pad=1
171 | filters=18
172 | activation=linear
173 |
174 | [yolo]
175 | mask = 0,1,2
176 | anchors = 10,14, 23,27, 37,58, 81,82, 135,169, 344,319
177 | classes=1
178 | num=6
179 | jitter=.3
180 | ignore_thresh = .7
181 | truth_thresh = 1
182 | random=1
183 |
--------------------------------------------------------------------------------
/cfg/yolov3-tiny-2cls.cfg:
--------------------------------------------------------------------------------
1 | [net]
2 | # Testing
3 | batch=1
4 | subdivisions=1
5 | # Training
6 | # batch=64
7 | # subdivisions=2
8 | width=416
9 | height=416
10 | channels=3
11 | momentum=0.9
12 | decay=0.0005
13 | angle=0
14 | saturation = 1.5
15 | exposure = 1.5
16 | hue=.1
17 |
18 | learning_rate=0.001
19 | burn_in=1000
20 | max_batches = 500200
21 | policy=steps
22 | steps=400000,450000
23 | scales=.1,.1
24 |
25 | [convolutional]
26 | batch_normalize=1
27 | filters=16
28 | size=3
29 | stride=1
30 | pad=1
31 | activation=leaky
32 |
33 | [maxpool]
34 | size=2
35 | stride=2
36 |
37 | [convolutional]
38 | batch_normalize=1
39 | filters=32
40 | size=3
41 | stride=1
42 | pad=1
43 | activation=leaky
44 |
45 | [maxpool]
46 | size=2
47 | stride=2
48 |
49 | [convolutional]
50 | batch_normalize=1
51 | filters=64
52 | size=3
53 | stride=1
54 | pad=1
55 | activation=leaky
56 |
57 | [maxpool]
58 | size=2
59 | stride=2
60 |
61 | [convolutional]
62 | batch_normalize=1
63 | filters=128
64 | size=3
65 | stride=1
66 | pad=1
67 | activation=leaky
68 |
69 | [maxpool]
70 | size=2
71 | stride=2
72 |
73 | [convolutional]
74 | batch_normalize=1
75 | filters=256
76 | size=3
77 | stride=1
78 | pad=1
79 | activation=leaky
80 |
81 | [maxpool]
82 | size=2
83 | stride=2
84 |
85 | [convolutional]
86 | batch_normalize=1
87 | filters=512
88 | size=3
89 | stride=1
90 | pad=1
91 | activation=leaky
92 |
93 | [maxpool]
94 | size=2
95 | stride=1
96 |
97 | [convolutional]
98 | batch_normalize=1
99 | filters=1024
100 | size=3
101 | stride=1
102 | pad=1
103 | activation=leaky
104 |
105 | ###########
106 |
107 | [convolutional]
108 | batch_normalize=1
109 | filters=256
110 | size=1
111 | stride=1
112 | pad=1
113 | activation=leaky
114 |
115 | [convolutional]
116 | batch_normalize=1
117 | filters=512
118 | size=3
119 | stride=1
120 | pad=1
121 | activation=leaky
122 |
123 | [convolutional]
124 | size=1
125 | stride=1
126 | pad=1
127 | filters=21
128 | activation=linear
129 |
130 |
131 |
132 | [yolo]
133 | mask = 3,4,5
134 | anchors = 10,14, 23,27, 37,58, 81,82, 135,169, 344,319
135 | classes=2
136 | num=6
137 | jitter=.3
138 | ignore_thresh = .7
139 | truth_thresh = 1
140 | random=1
141 |
142 | [route]
143 | layers = -4
144 |
145 | [convolutional]
146 | batch_normalize=1
147 | filters=128
148 | size=1
149 | stride=1
150 | pad=1
151 | activation=leaky
152 |
153 | [upsample]
154 | stride=2
155 |
156 | [route]
157 | layers = -1, 8
158 |
159 | [convolutional]
160 | batch_normalize=1
161 | filters=256
162 | size=3
163 | stride=1
164 | pad=1
165 | activation=leaky
166 |
167 | [convolutional]
168 | size=1
169 | stride=1
170 | pad=1
171 | filters=21
172 | activation=linear
173 |
174 | [yolo]
175 | mask = 0,1,2
176 | anchors = 10,14, 23,27, 37,58, 81,82, 135,169, 344,319
177 | classes=2
178 | num=6
179 | jitter=.3
180 | ignore_thresh = .7
181 | truth_thresh = 1
182 | random=1
183 |
--------------------------------------------------------------------------------
/cfg/yolov3-tiny-3cls.cfg:
--------------------------------------------------------------------------------
1 | [net]
2 | # Testing
3 | batch=1
4 | subdivisions=1
5 | # Training
6 | # batch=64
7 | # subdivisions=2
8 | width=416
9 | height=416
10 | channels=3
11 | momentum=0.9
12 | decay=0.0005
13 | angle=0
14 | saturation = 1.5
15 | exposure = 1.5
16 | hue=.1
17 |
18 | learning_rate=0.001
19 | burn_in=1000
20 | max_batches = 500200
21 | policy=steps
22 | steps=400000,450000
23 | scales=.1,.1
24 |
25 | [convolutional]
26 | batch_normalize=1
27 | filters=16
28 | size=3
29 | stride=1
30 | pad=1
31 | activation=leaky
32 |
33 | [maxpool]
34 | size=2
35 | stride=2
36 |
37 | [convolutional]
38 | batch_normalize=1
39 | filters=32
40 | size=3
41 | stride=1
42 | pad=1
43 | activation=leaky
44 |
45 | [maxpool]
46 | size=2
47 | stride=2
48 |
49 | [convolutional]
50 | batch_normalize=1
51 | filters=64
52 | size=3
53 | stride=1
54 | pad=1
55 | activation=leaky
56 |
57 | [maxpool]
58 | size=2
59 | stride=2
60 |
61 | [convolutional]
62 | batch_normalize=1
63 | filters=128
64 | size=3
65 | stride=1
66 | pad=1
67 | activation=leaky
68 |
69 | [maxpool]
70 | size=2
71 | stride=2
72 |
73 | [convolutional]
74 | batch_normalize=1
75 | filters=256
76 | size=3
77 | stride=1
78 | pad=1
79 | activation=leaky
80 |
81 | [maxpool]
82 | size=2
83 | stride=2
84 |
85 | [convolutional]
86 | batch_normalize=1
87 | filters=512
88 | size=3
89 | stride=1
90 | pad=1
91 | activation=leaky
92 |
93 | [maxpool]
94 | size=2
95 | stride=1
96 |
97 | [convolutional]
98 | batch_normalize=1
99 | filters=1024
100 | size=3
101 | stride=1
102 | pad=1
103 | activation=leaky
104 |
105 | ###########
106 |
107 | [convolutional]
108 | batch_normalize=1
109 | filters=256
110 | size=1
111 | stride=1
112 | pad=1
113 | activation=leaky
114 |
115 | [convolutional]
116 | batch_normalize=1
117 | filters=512
118 | size=3
119 | stride=1
120 | pad=1
121 | activation=leaky
122 |
123 | [convolutional]
124 | size=1
125 | stride=1
126 | pad=1
127 | filters=24
128 | activation=linear
129 |
130 |
131 |
132 | [yolo]
133 | mask = 3,4,5
134 | anchors = 10,14, 23,27, 37,58, 81,82, 135,169, 344,319
135 | classes=3
136 | num=6
137 | jitter=.3
138 | ignore_thresh = .7
139 | truth_thresh = 1
140 | random=1
141 |
142 | [route]
143 | layers = -4
144 |
145 | [convolutional]
146 | batch_normalize=1
147 | filters=128
148 | size=1
149 | stride=1
150 | pad=1
151 | activation=leaky
152 |
153 | [upsample]
154 | stride=2
155 |
156 | [route]
157 | layers = -1, 8
158 |
159 | [convolutional]
160 | batch_normalize=1
161 | filters=256
162 | size=3
163 | stride=1
164 | pad=1
165 | activation=leaky
166 |
167 | [convolutional]
168 | size=1
169 | stride=1
170 | pad=1
171 | filters=24
172 | activation=linear
173 |
174 | [yolo]
175 | mask = 0,1,2
176 | anchors = 10,14, 23,27, 37,58, 81,82, 135,169, 344,319
177 | classes=3
178 | num=6
179 | jitter=.3
180 | ignore_thresh = .7
181 | truth_thresh = 1
182 | random=1
183 |
--------------------------------------------------------------------------------
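
Across the three tiny variants above (yolov3-tiny-1cls.cfg, -2cls.cfg, -3cls.cfg), the class-count adaptation amounts to changing classes= in each [yolo] section and filters= in the [convolutional] layer directly before it: 18, 21 and 24 respectively, following the usual darknet rule filters = (classes + 5) * anchors_per_head, with 3 anchor masks per head here. A quick illustrative check in Python:

    def yolo_head_filters(num_classes, anchors_per_head=3):
        # each predicted box carries 4 coordinates + 1 objectness score + num_classes class scores
        return (num_classes + 5) * anchors_per_head

    assert yolo_head_filters(1) == 18    # yolov3-tiny-1cls.cfg
    assert yolo_head_filters(2) == 21    # yolov3-tiny-2cls.cfg
    assert yolo_head_filters(3) == 24    # yolov3-tiny-3cls.cfg
    assert yolo_head_filters(80) == 255  # stock 80-class configs
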
/cfg/yolov3-tiny.cfg:
--------------------------------------------------------------------------------
1 | [net]
2 | # Testing
3 | batch=1
4 | subdivisions=1
5 | # Training
6 | # batch=64
7 | # subdivisions=2
8 | width=416
9 | height=416
10 | channels=3
11 | momentum=0.9
12 | decay=0.0005
13 | angle=0
14 | saturation = 1.5
15 | exposure = 1.5
16 | hue=.1
17 |
18 | learning_rate=0.001
19 | burn_in=1000
20 | max_batches = 500200
21 | policy=steps
22 | steps=400000,450000
23 | scales=.1,.1
24 |
25 | [convolutional]
26 | batch_normalize=1
27 | filters=16
28 | size=3
29 | stride=1
30 | pad=1
31 | activation=leaky
32 |
33 | [maxpool]
34 | size=2
35 | stride=2
36 |
37 | [convolutional]
38 | batch_normalize=1
39 | filters=32
40 | size=3
41 | stride=1
42 | pad=1
43 | activation=leaky
44 |
45 | [maxpool]
46 | size=2
47 | stride=2
48 |
49 | [convolutional]
50 | batch_normalize=1
51 | filters=64
52 | size=3
53 | stride=1
54 | pad=1
55 | activation=leaky
56 |
57 | [maxpool]
58 | size=2
59 | stride=2
60 |
61 | [convolutional]
62 | batch_normalize=1
63 | filters=128
64 | size=3
65 | stride=1
66 | pad=1
67 | activation=leaky
68 |
69 | [maxpool]
70 | size=2
71 | stride=2
72 |
73 | [convolutional]
74 | batch_normalize=1
75 | filters=256
76 | size=3
77 | stride=1
78 | pad=1
79 | activation=leaky
80 |
81 | [maxpool]
82 | size=2
83 | stride=2
84 |
85 | [convolutional]
86 | batch_normalize=1
87 | filters=512
88 | size=3
89 | stride=1
90 | pad=1
91 | activation=leaky
92 |
93 | [maxpool]
94 | size=2
95 | stride=1
96 |
97 | [convolutional]
98 | batch_normalize=1
99 | filters=1024
100 | size=3
101 | stride=1
102 | pad=1
103 | activation=leaky
104 |
105 | ###########
106 |
107 | [convolutional]
108 | batch_normalize=1
109 | filters=256
110 | size=1
111 | stride=1
112 | pad=1
113 | activation=leaky
114 |
115 | [convolutional]
116 | batch_normalize=1
117 | filters=512
118 | size=3
119 | stride=1
120 | pad=1
121 | activation=leaky
122 |
123 | [convolutional]
124 | size=1
125 | stride=1
126 | pad=1
127 | filters=255
128 | activation=linear
129 |
130 |
131 |
132 | [yolo]
133 | mask = 3,4,5
134 | anchors = 10,14, 23,27, 37,58, 81,82, 135,169, 344,319
135 | classes=80
136 | num=6
137 | jitter=.3
138 | ignore_thresh = .7
139 | truth_thresh = 1
140 | random=1
141 |
142 | [route]
143 | layers = -4
144 |
145 | [convolutional]
146 | batch_normalize=1
147 | filters=128
148 | size=1
149 | stride=1
150 | pad=1
151 | activation=leaky
152 |
153 | [upsample]
154 | stride=2
155 |
156 | [route]
157 | layers = -1, 8
158 |
159 | [convolutional]
160 | batch_normalize=1
161 | filters=256
162 | size=3
163 | stride=1
164 | pad=1
165 | activation=leaky
166 |
167 | [convolutional]
168 | size=1
169 | stride=1
170 | pad=1
171 | filters=255
172 | activation=linear
173 |
174 | [yolo]
175 | mask = 1,2,3
176 | anchors = 10,14, 23,27, 37,58, 81,82, 135,169, 344,319
177 | classes=80
178 | num=6
179 | jitter=.3
180 | ignore_thresh = .7
181 | truth_thresh = 1
182 | random=1
183 |
--------------------------------------------------------------------------------
/cfg/yolov3-tiny3-1cls.cfg:
--------------------------------------------------------------------------------
1 | [net]
2 | # Testing
3 | # batch=1
4 | # subdivisions=1
5 | # Training
6 | batch=64
7 | subdivisions=16
8 | width=608
9 | height=608
10 | channels=3
11 | momentum=0.9
12 | decay=0.0005
13 | angle=0
14 | saturation = 1.5
15 | exposure = 1.5
16 | hue=.1
17 |
18 | learning_rate=0.001
19 | burn_in=1000
20 | max_batches = 200000
21 | policy=steps
22 | steps=180000,190000
23 | scales=.1,.1
24 |
25 |
26 | [convolutional]
27 | batch_normalize=1
28 | filters=16
29 | size=3
30 | stride=1
31 | pad=1
32 | activation=leaky
33 |
34 | [maxpool]
35 | size=2
36 | stride=2
37 |
38 | [convolutional]
39 | batch_normalize=1
40 | filters=32
41 | size=3
42 | stride=1
43 | pad=1
44 | activation=leaky
45 |
46 | [maxpool]
47 | size=2
48 | stride=2
49 |
50 | [convolutional]
51 | batch_normalize=1
52 | filters=64
53 | size=3
54 | stride=1
55 | pad=1
56 | activation=leaky
57 |
58 | [maxpool]
59 | size=2
60 | stride=2
61 |
62 | [convolutional]
63 | batch_normalize=1
64 | filters=128
65 | size=3
66 | stride=1
67 | pad=1
68 | activation=leaky
69 |
70 | [maxpool]
71 | size=2
72 | stride=2
73 |
74 | [convolutional]
75 | batch_normalize=1
76 | filters=256
77 | size=3
78 | stride=1
79 | pad=1
80 | activation=leaky
81 |
82 | [maxpool]
83 | size=2
84 | stride=2
85 |
86 | [convolutional]
87 | batch_normalize=1
88 | filters=512
89 | size=3
90 | stride=1
91 | pad=1
92 | activation=leaky
93 |
94 | [maxpool]
95 | size=2
96 | stride=1
97 |
98 | [convolutional]
99 | batch_normalize=1
100 | filters=1024
101 | size=3
102 | stride=1
103 | pad=1
104 | activation=leaky
105 |
106 | ###########
107 |
108 | [convolutional]
109 | batch_normalize=1
110 | filters=256
111 | size=1
112 | stride=1
113 | pad=1
114 | activation=leaky
115 |
116 | [convolutional]
117 | batch_normalize=1
118 | filters=512
119 | size=3
120 | stride=1
121 | pad=1
122 | activation=leaky
123 |
124 | [convolutional]
125 | size=1
126 | stride=1
127 | pad=1
128 | filters=18
129 | activation=linear
130 |
131 |
132 |
133 | [yolo]
134 | mask = 6,7,8
135 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
136 | classes=1
137 | num=9
138 | jitter=.3
139 | ignore_thresh = .7
140 | truth_thresh = 1
141 | random=1
142 |
143 | [route]
144 | layers = -4
145 |
146 | [convolutional]
147 | batch_normalize=1
148 | filters=128
149 | size=1
150 | stride=1
151 | pad=1
152 | activation=leaky
153 |
154 | [upsample]
155 | stride=2
156 |
157 | [route]
158 | layers = -1, 8
159 |
160 | [convolutional]
161 | batch_normalize=1
162 | filters=256
163 | size=3
164 | stride=1
165 | pad=1
166 | activation=leaky
167 |
168 | [convolutional]
169 | size=1
170 | stride=1
171 | pad=1
172 | filters=18
173 | activation=linear
174 |
175 | [yolo]
176 | mask = 3,4,5
177 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
178 | classes=1
179 | num=9
180 | jitter=.3
181 | ignore_thresh = .7
182 | truth_thresh = 1
183 | random=1
184 |
185 |
186 |
187 | [route]
188 | layers = -3
189 |
190 | [convolutional]
191 | batch_normalize=1
192 | filters=128
193 | size=1
194 | stride=1
195 | pad=1
196 | activation=leaky
197 |
198 | [upsample]
199 | stride=2
200 |
201 | [route]
202 | layers = -1, 6
203 |
204 | [convolutional]
205 | batch_normalize=1
206 | filters=128
207 | size=3
208 | stride=1
209 | pad=1
210 | activation=leaky
211 |
212 | [convolutional]
213 | size=1
214 | stride=1
215 | pad=1
216 | filters=18
217 | activation=linear
218 |
219 | [yolo]
220 | mask = 0,1,2
221 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
222 | classes=1
223 | num=9
224 | jitter=.3
225 | ignore_thresh = .7
226 | truth_thresh = 1
227 | random=1
228 |
--------------------------------------------------------------------------------
/cfg/yolov3-tiny3.cfg:
--------------------------------------------------------------------------------
1 | [net]
2 | # Testing
3 | # batch=1
4 | # subdivisions=1
5 | # Training
6 | batch=64
7 | subdivisions=16
8 | width=608
9 | height=608
10 | channels=3
11 | momentum=0.9
12 | decay=0.0005
13 | angle=0
14 | saturation = 1.5
15 | exposure = 1.5
16 | hue=.1
17 |
18 | learning_rate=0.001
19 | burn_in=1000
20 | max_batches = 200000
21 | policy=steps
22 | steps=180000,190000
23 | scales=.1,.1
24 |
25 |
26 | [convolutional]
27 | batch_normalize=1
28 | filters=16
29 | size=3
30 | stride=1
31 | pad=1
32 | activation=leaky
33 |
34 | [maxpool]
35 | size=2
36 | stride=2
37 |
38 | [convolutional]
39 | batch_normalize=1
40 | filters=32
41 | size=3
42 | stride=1
43 | pad=1
44 | activation=leaky
45 |
46 | [maxpool]
47 | size=2
48 | stride=2
49 |
50 | [convolutional]
51 | batch_normalize=1
52 | filters=64
53 | size=3
54 | stride=1
55 | pad=1
56 | activation=leaky
57 |
58 | [maxpool]
59 | size=2
60 | stride=2
61 |
62 | [convolutional]
63 | batch_normalize=1
64 | filters=128
65 | size=3
66 | stride=1
67 | pad=1
68 | activation=leaky
69 |
70 | [maxpool]
71 | size=2
72 | stride=2
73 |
74 | [convolutional]
75 | batch_normalize=1
76 | filters=256
77 | size=3
78 | stride=1
79 | pad=1
80 | activation=leaky
81 |
82 | [maxpool]
83 | size=2
84 | stride=2
85 |
86 | [convolutional]
87 | batch_normalize=1
88 | filters=512
89 | size=3
90 | stride=1
91 | pad=1
92 | activation=leaky
93 |
94 | [maxpool]
95 | size=2
96 | stride=1
97 |
98 | [convolutional]
99 | batch_normalize=1
100 | filters=1024
101 | size=3
102 | stride=1
103 | pad=1
104 | activation=leaky
105 |
106 | ###########
107 |
108 | [convolutional]
109 | batch_normalize=1
110 | filters=256
111 | size=1
112 | stride=1
113 | pad=1
114 | activation=leaky
115 |
116 | [convolutional]
117 | batch_normalize=1
118 | filters=512
119 | size=3
120 | stride=1
121 | pad=1
122 | activation=leaky
123 |
124 | [convolutional]
125 | size=1
126 | stride=1
127 | pad=1
128 | filters=255
129 | activation=linear
130 |
131 |
132 |
133 | [yolo]
134 | mask = 6,7,8
135 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
136 | classes=80
137 | num=9
138 | jitter=.3
139 | ignore_thresh = .7
140 | truth_thresh = 1
141 | random=1
142 |
143 | [route]
144 | layers = -4
145 |
146 | [convolutional]
147 | batch_normalize=1
148 | filters=128
149 | size=1
150 | stride=1
151 | pad=1
152 | activation=leaky
153 |
154 | [upsample]
155 | stride=2
156 |
157 | [route]
158 | layers = -1, 8
159 |
160 | [convolutional]
161 | batch_normalize=1
162 | filters=256
163 | size=3
164 | stride=1
165 | pad=1
166 | activation=leaky
167 |
168 | [convolutional]
169 | size=1
170 | stride=1
171 | pad=1
172 | filters=255
173 | activation=linear
174 |
175 | [yolo]
176 | mask = 3,4,5
177 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
178 | classes=80
179 | num=9
180 | jitter=.3
181 | ignore_thresh = .7
182 | truth_thresh = 1
183 | random=1
184 |
185 |
186 |
187 | [route]
188 | layers = -3
189 |
190 | [convolutional]
191 | batch_normalize=1
192 | filters=128
193 | size=1
194 | stride=1
195 | pad=1
196 | activation=leaky
197 |
198 | [upsample]
199 | stride=2
200 |
201 | [route]
202 | layers = -1, 6
203 |
204 | [convolutional]
205 | batch_normalize=1
206 | filters=128
207 | size=3
208 | stride=1
209 | pad=1
210 | activation=leaky
211 |
212 | [convolutional]
213 | size=1
214 | stride=1
215 | pad=1
216 | filters=255
217 | activation=linear
218 |
219 | [yolo]
220 | mask = 0,1,2
221 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
222 | classes=80
223 | num=9
224 | jitter=.3
225 | ignore_thresh = .7
226 | truth_thresh = 1
227 | random=1
228 |
--------------------------------------------------------------------------------
/cfg/yolov3.cfg:
--------------------------------------------------------------------------------
1 | [net]
2 | # Testing
3 | #batch=1
4 | #subdivisions=1
5 | # Training
6 | batch=16
7 | subdivisions=1
8 | width=416
9 | height=416
10 | channels=3
11 | momentum=0.9
12 | decay=0.0005
13 | angle=0
14 | saturation = 1.5
15 | exposure = 1.5
16 | hue=.1
17 |
18 | learning_rate=0.001
19 | burn_in=1000
20 | max_batches = 500200
21 | policy=steps
22 | steps=400000,450000
23 | scales=.1,.1
24 |
25 | [convolutional]
26 | batch_normalize=1
27 | filters=32
28 | size=3
29 | stride=1
30 | pad=1
31 | activation=leaky
32 |
33 | # Downsample
34 |
35 | [convolutional]
36 | batch_normalize=1
37 | filters=64
38 | size=3
39 | stride=2
40 | pad=1
41 | activation=leaky
42 |
43 | [convolutional]
44 | batch_normalize=1
45 | filters=32
46 | size=1
47 | stride=1
48 | pad=1
49 | activation=leaky
50 |
51 | [convolutional]
52 | batch_normalize=1
53 | filters=64
54 | size=3
55 | stride=1
56 | pad=1
57 | activation=leaky
58 |
59 | [shortcut]
60 | from=-3
61 | activation=linear
62 |
63 | # Downsample
64 |
65 | [convolutional]
66 | batch_normalize=1
67 | filters=128
68 | size=3
69 | stride=2
70 | pad=1
71 | activation=leaky
72 |
73 | [convolutional]
74 | batch_normalize=1
75 | filters=64
76 | size=1
77 | stride=1
78 | pad=1
79 | activation=leaky
80 |
81 | [convolutional]
82 | batch_normalize=1
83 | filters=128
84 | size=3
85 | stride=1
86 | pad=1
87 | activation=leaky
88 |
89 | [shortcut]
90 | from=-3
91 | activation=linear
92 |
93 | [convolutional]
94 | batch_normalize=1
95 | filters=64
96 | size=1
97 | stride=1
98 | pad=1
99 | activation=leaky
100 |
101 | [convolutional]
102 | batch_normalize=1
103 | filters=128
104 | size=3
105 | stride=1
106 | pad=1
107 | activation=leaky
108 |
109 | [shortcut]
110 | from=-3
111 | activation=linear
112 |
113 | # Downsample
114 |
115 | [convolutional]
116 | batch_normalize=1
117 | filters=256
118 | size=3
119 | stride=2
120 | pad=1
121 | activation=leaky
122 |
123 | [convolutional]
124 | batch_normalize=1
125 | filters=128
126 | size=1
127 | stride=1
128 | pad=1
129 | activation=leaky
130 |
131 | [convolutional]
132 | batch_normalize=1
133 | filters=256
134 | size=3
135 | stride=1
136 | pad=1
137 | activation=leaky
138 |
139 | [shortcut]
140 | from=-3
141 | activation=linear
142 |
143 | [convolutional]
144 | batch_normalize=1
145 | filters=128
146 | size=1
147 | stride=1
148 | pad=1
149 | activation=leaky
150 |
151 | [convolutional]
152 | batch_normalize=1
153 | filters=256
154 | size=3
155 | stride=1
156 | pad=1
157 | activation=leaky
158 |
159 | [shortcut]
160 | from=-3
161 | activation=linear
162 |
163 | [convolutional]
164 | batch_normalize=1
165 | filters=128
166 | size=1
167 | stride=1
168 | pad=1
169 | activation=leaky
170 |
171 | [convolutional]
172 | batch_normalize=1
173 | filters=256
174 | size=3
175 | stride=1
176 | pad=1
177 | activation=leaky
178 |
179 | [shortcut]
180 | from=-3
181 | activation=linear
182 |
183 | [convolutional]
184 | batch_normalize=1
185 | filters=128
186 | size=1
187 | stride=1
188 | pad=1
189 | activation=leaky
190 |
191 | [convolutional]
192 | batch_normalize=1
193 | filters=256
194 | size=3
195 | stride=1
196 | pad=1
197 | activation=leaky
198 |
199 | [shortcut]
200 | from=-3
201 | activation=linear
202 |
203 |
204 | [convolutional]
205 | batch_normalize=1
206 | filters=128
207 | size=1
208 | stride=1
209 | pad=1
210 | activation=leaky
211 |
212 | [convolutional]
213 | batch_normalize=1
214 | filters=256
215 | size=3
216 | stride=1
217 | pad=1
218 | activation=leaky
219 |
220 | [shortcut]
221 | from=-3
222 | activation=linear
223 |
224 | [convolutional]
225 | batch_normalize=1
226 | filters=128
227 | size=1
228 | stride=1
229 | pad=1
230 | activation=leaky
231 |
232 | [convolutional]
233 | batch_normalize=1
234 | filters=256
235 | size=3
236 | stride=1
237 | pad=1
238 | activation=leaky
239 |
240 | [shortcut]
241 | from=-3
242 | activation=linear
243 |
244 | [convolutional]
245 | batch_normalize=1
246 | filters=128
247 | size=1
248 | stride=1
249 | pad=1
250 | activation=leaky
251 |
252 | [convolutional]
253 | batch_normalize=1
254 | filters=256
255 | size=3
256 | stride=1
257 | pad=1
258 | activation=leaky
259 |
260 | [shortcut]
261 | from=-3
262 | activation=linear
263 |
264 | [convolutional]
265 | batch_normalize=1
266 | filters=128
267 | size=1
268 | stride=1
269 | pad=1
270 | activation=leaky
271 |
272 | [convolutional]
273 | batch_normalize=1
274 | filters=256
275 | size=3
276 | stride=1
277 | pad=1
278 | activation=leaky
279 |
280 | [shortcut]
281 | from=-3
282 | activation=linear
283 |
284 | # Downsample
285 |
286 | [convolutional]
287 | batch_normalize=1
288 | filters=512
289 | size=3
290 | stride=2
291 | pad=1
292 | activation=leaky
293 |
294 | [convolutional]
295 | batch_normalize=1
296 | filters=256
297 | size=1
298 | stride=1
299 | pad=1
300 | activation=leaky
301 |
302 | [convolutional]
303 | batch_normalize=1
304 | filters=512
305 | size=3
306 | stride=1
307 | pad=1
308 | activation=leaky
309 |
310 | [shortcut]
311 | from=-3
312 | activation=linear
313 |
314 |
315 | [convolutional]
316 | batch_normalize=1
317 | filters=256
318 | size=1
319 | stride=1
320 | pad=1
321 | activation=leaky
322 |
323 | [convolutional]
324 | batch_normalize=1
325 | filters=512
326 | size=3
327 | stride=1
328 | pad=1
329 | activation=leaky
330 |
331 | [shortcut]
332 | from=-3
333 | activation=linear
334 |
335 |
336 | [convolutional]
337 | batch_normalize=1
338 | filters=256
339 | size=1
340 | stride=1
341 | pad=1
342 | activation=leaky
343 |
344 | [convolutional]
345 | batch_normalize=1
346 | filters=512
347 | size=3
348 | stride=1
349 | pad=1
350 | activation=leaky
351 |
352 | [shortcut]
353 | from=-3
354 | activation=linear
355 |
356 |
357 | [convolutional]
358 | batch_normalize=1
359 | filters=256
360 | size=1
361 | stride=1
362 | pad=1
363 | activation=leaky
364 |
365 | [convolutional]
366 | batch_normalize=1
367 | filters=512
368 | size=3
369 | stride=1
370 | pad=1
371 | activation=leaky
372 |
373 | [shortcut]
374 | from=-3
375 | activation=linear
376 |
377 | [convolutional]
378 | batch_normalize=1
379 | filters=256
380 | size=1
381 | stride=1
382 | pad=1
383 | activation=leaky
384 |
385 | [convolutional]
386 | batch_normalize=1
387 | filters=512
388 | size=3
389 | stride=1
390 | pad=1
391 | activation=leaky
392 |
393 | [shortcut]
394 | from=-3
395 | activation=linear
396 |
397 |
398 | [convolutional]
399 | batch_normalize=1
400 | filters=256
401 | size=1
402 | stride=1
403 | pad=1
404 | activation=leaky
405 |
406 | [convolutional]
407 | batch_normalize=1
408 | filters=512
409 | size=3
410 | stride=1
411 | pad=1
412 | activation=leaky
413 |
414 | [shortcut]
415 | from=-3
416 | activation=linear
417 |
418 |
419 | [convolutional]
420 | batch_normalize=1
421 | filters=256
422 | size=1
423 | stride=1
424 | pad=1
425 | activation=leaky
426 |
427 | [convolutional]
428 | batch_normalize=1
429 | filters=512
430 | size=3
431 | stride=1
432 | pad=1
433 | activation=leaky
434 |
435 | [shortcut]
436 | from=-3
437 | activation=linear
438 |
439 | [convolutional]
440 | batch_normalize=1
441 | filters=256
442 | size=1
443 | stride=1
444 | pad=1
445 | activation=leaky
446 |
447 | [convolutional]
448 | batch_normalize=1
449 | filters=512
450 | size=3
451 | stride=1
452 | pad=1
453 | activation=leaky
454 |
455 | [shortcut]
456 | from=-3
457 | activation=linear
458 |
459 | # Downsample
460 |
461 | [convolutional]
462 | batch_normalize=1
463 | filters=1024
464 | size=3
465 | stride=2
466 | pad=1
467 | activation=leaky
468 |
469 | [convolutional]
470 | batch_normalize=1
471 | filters=512
472 | size=1
473 | stride=1
474 | pad=1
475 | activation=leaky
476 |
477 | [convolutional]
478 | batch_normalize=1
479 | filters=1024
480 | size=3
481 | stride=1
482 | pad=1
483 | activation=leaky
484 |
485 | [shortcut]
486 | from=-3
487 | activation=linear
488 |
489 | [convolutional]
490 | batch_normalize=1
491 | filters=512
492 | size=1
493 | stride=1
494 | pad=1
495 | activation=leaky
496 |
497 | [convolutional]
498 | batch_normalize=1
499 | filters=1024
500 | size=3
501 | stride=1
502 | pad=1
503 | activation=leaky
504 |
505 | [shortcut]
506 | from=-3
507 | activation=linear
508 |
509 | [convolutional]
510 | batch_normalize=1
511 | filters=512
512 | size=1
513 | stride=1
514 | pad=1
515 | activation=leaky
516 |
517 | [convolutional]
518 | batch_normalize=1
519 | filters=1024
520 | size=3
521 | stride=1
522 | pad=1
523 | activation=leaky
524 |
525 | [shortcut]
526 | from=-3
527 | activation=linear
528 |
529 | [convolutional]
530 | batch_normalize=1
531 | filters=512
532 | size=1
533 | stride=1
534 | pad=1
535 | activation=leaky
536 |
537 | [convolutional]
538 | batch_normalize=1
539 | filters=1024
540 | size=3
541 | stride=1
542 | pad=1
543 | activation=leaky
544 |
545 | [shortcut]
546 | from=-3
547 | activation=linear
548 |
549 | ######################
550 |
551 | [convolutional]
552 | batch_normalize=1
553 | filters=512
554 | size=1
555 | stride=1
556 | pad=1
557 | activation=leaky
558 |
559 | [convolutional]
560 | batch_normalize=1
561 | size=3
562 | stride=1
563 | pad=1
564 | filters=1024
565 | activation=leaky
566 |
567 | [convolutional]
568 | batch_normalize=1
569 | filters=512
570 | size=1
571 | stride=1
572 | pad=1
573 | activation=leaky
574 |
575 | [convolutional]
576 | batch_normalize=1
577 | size=3
578 | stride=1
579 | pad=1
580 | filters=1024
581 | activation=leaky
582 |
583 | [convolutional]
584 | batch_normalize=1
585 | filters=512
586 | size=1
587 | stride=1
588 | pad=1
589 | activation=leaky
590 |
591 | [convolutional]
592 | batch_normalize=1
593 | size=3
594 | stride=1
595 | pad=1
596 | filters=1024
597 | activation=leaky
598 |
599 | [convolutional]
600 | size=1
601 | stride=1
602 | pad=1
603 | filters=255
604 | activation=linear
605 |
606 |
607 | [yolo]
608 | mask = 6,7,8
609 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
610 | classes=80
611 | num=9
612 | jitter=.3
613 | ignore_thresh = .7
614 | truth_thresh = 1
615 | random=1
616 |
617 |
618 | [route]
619 | layers = -4
620 |
621 | [convolutional]
622 | batch_normalize=1
623 | filters=256
624 | size=1
625 | stride=1
626 | pad=1
627 | activation=leaky
628 |
629 | [upsample]
630 | stride=2
631 |
632 | [route]
633 | layers = -1, 61
634 |
635 |
636 |
637 | [convolutional]
638 | batch_normalize=1
639 | filters=256
640 | size=1
641 | stride=1
642 | pad=1
643 | activation=leaky
644 |
645 | [convolutional]
646 | batch_normalize=1
647 | size=3
648 | stride=1
649 | pad=1
650 | filters=512
651 | activation=leaky
652 |
653 | [convolutional]
654 | batch_normalize=1
655 | filters=256
656 | size=1
657 | stride=1
658 | pad=1
659 | activation=leaky
660 |
661 | [convolutional]
662 | batch_normalize=1
663 | size=3
664 | stride=1
665 | pad=1
666 | filters=512
667 | activation=leaky
668 |
669 | [convolutional]
670 | batch_normalize=1
671 | filters=256
672 | size=1
673 | stride=1
674 | pad=1
675 | activation=leaky
676 |
677 | [convolutional]
678 | batch_normalize=1
679 | size=3
680 | stride=1
681 | pad=1
682 | filters=512
683 | activation=leaky
684 |
685 | [convolutional]
686 | size=1
687 | stride=1
688 | pad=1
689 | filters=255
690 | activation=linear
691 |
692 |
693 | [yolo]
694 | mask = 3,4,5
695 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
696 | classes=80
697 | num=9
698 | jitter=.3
699 | ignore_thresh = .7
700 | truth_thresh = 1
701 | random=1
702 |
703 |
704 |
705 | [route]
706 | layers = -4
707 |
708 | [convolutional]
709 | batch_normalize=1
710 | filters=128
711 | size=1
712 | stride=1
713 | pad=1
714 | activation=leaky
715 |
716 | [upsample]
717 | stride=2
718 |
719 | [route]
720 | layers = -1, 36
721 |
722 |
723 |
724 | [convolutional]
725 | batch_normalize=1
726 | filters=128
727 | size=1
728 | stride=1
729 | pad=1
730 | activation=leaky
731 |
732 | [convolutional]
733 | batch_normalize=1
734 | size=3
735 | stride=1
736 | pad=1
737 | filters=256
738 | activation=leaky
739 |
740 | [convolutional]
741 | batch_normalize=1
742 | filters=128
743 | size=1
744 | stride=1
745 | pad=1
746 | activation=leaky
747 |
748 | [convolutional]
749 | batch_normalize=1
750 | size=3
751 | stride=1
752 | pad=1
753 | filters=256
754 | activation=leaky
755 |
756 | [convolutional]
757 | batch_normalize=1
758 | filters=128
759 | size=1
760 | stride=1
761 | pad=1
762 | activation=leaky
763 |
764 | [convolutional]
765 | batch_normalize=1
766 | size=3
767 | stride=1
768 | pad=1
769 | filters=256
770 | activation=leaky
771 |
772 | [convolutional]
773 | size=1
774 | stride=1
775 | pad=1
776 | filters=255
777 | activation=linear
778 |
779 |
780 | [yolo]
781 | mask = 0,1,2
782 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
783 | classes=80
784 | num=9
785 | jitter=.3
786 | ignore_thresh = .7
787 | truth_thresh = 1
788 | random=1
789 |
--------------------------------------------------------------------------------
/cfg/yolov4-tiny.cfg:
--------------------------------------------------------------------------------
1 | [net]
2 | # Testing
3 | #batch=1
4 | #subdivisions=1
5 | # Training
6 | batch=64
7 | subdivisions=1
8 | width=416
9 | height=416
10 | channels=3
11 | momentum=0.9
12 | decay=0.0005
13 | angle=0
14 | saturation = 1.5
15 | exposure = 1.5
16 | hue=.1
17 |
18 | learning_rate=0.00261
19 | burn_in=1000
20 | max_batches = 500200
21 | policy=steps
22 | steps=400000,450000
23 | scales=.1,.1
24 |
25 | [convolutional]
26 | batch_normalize=1
27 | filters=32
28 | size=3
29 | stride=2
30 | pad=1
31 | activation=leaky
32 |
33 | [convolutional]
34 | batch_normalize=1
35 | filters=64
36 | size=3
37 | stride=2
38 | pad=1
39 | activation=leaky
40 |
41 | [convolutional]
42 | batch_normalize=1
43 | filters=64
44 | size=3
45 | stride=1
46 | pad=1
47 | activation=leaky
48 |
49 | [route]
50 | layers=-1
51 | groups=2
52 | group_id=1
53 |
54 | [convolutional]
55 | batch_normalize=1
56 | filters=32
57 | size=3
58 | stride=1
59 | pad=1
60 | activation=leaky
61 |
62 | [convolutional]
63 | batch_normalize=1
64 | filters=32
65 | size=3
66 | stride=1
67 | pad=1
68 | activation=leaky
69 |
70 | [route]
71 | layers = -1,-2
72 |
73 | [convolutional]
74 | batch_normalize=1
75 | filters=64
76 | size=1
77 | stride=1
78 | pad=1
79 | activation=leaky
80 |
81 | [route]
82 | layers = -6,-1
83 |
84 | [maxpool]
85 | size=2
86 | stride=2
87 |
88 | [convolutional]
89 | batch_normalize=1
90 | filters=128
91 | size=3
92 | stride=1
93 | pad=1
94 | activation=leaky
95 |
96 | [route]
97 | layers=-1
98 | groups=2
99 | group_id=1
100 |
101 | [convolutional]
102 | batch_normalize=1
103 | filters=64
104 | size=3
105 | stride=1
106 | pad=1
107 | activation=leaky
108 |
109 | [convolutional]
110 | batch_normalize=1
111 | filters=64
112 | size=3
113 | stride=1
114 | pad=1
115 | activation=leaky
116 |
117 | [route]
118 | layers = -1,-2
119 |
120 | [convolutional]
121 | batch_normalize=1
122 | filters=128
123 | size=1
124 | stride=1
125 | pad=1
126 | activation=leaky
127 |
128 | [route]
129 | layers = -6,-1
130 |
131 | [maxpool]
132 | size=2
133 | stride=2
134 |
135 | [convolutional]
136 | batch_normalize=1
137 | filters=256
138 | size=3
139 | stride=1
140 | pad=1
141 | activation=leaky
142 |
143 | [route]
144 | layers=-1
145 | groups=2
146 | group_id=1
147 |
148 | [convolutional]
149 | batch_normalize=1
150 | filters=128
151 | size=3
152 | stride=1
153 | pad=1
154 | activation=leaky
155 |
156 | [convolutional]
157 | batch_normalize=1
158 | filters=128
159 | size=3
160 | stride=1
161 | pad=1
162 | activation=leaky
163 |
164 | [route]
165 | layers = -1,-2
166 |
167 | [convolutional]
168 | batch_normalize=1
169 | filters=256
170 | size=1
171 | stride=1
172 | pad=1
173 | activation=leaky
174 |
175 | [route]
176 | layers = -6,-1
177 |
178 | [maxpool]
179 | size=2
180 | stride=2
181 |
182 | [convolutional]
183 | batch_normalize=1
184 | filters=512
185 | size=3
186 | stride=1
187 | pad=1
188 | activation=leaky
189 |
190 | ##################################
191 |
192 | [convolutional]
193 | batch_normalize=1
194 | filters=256
195 | size=1
196 | stride=1
197 | pad=1
198 | activation=leaky
199 |
200 | [convolutional]
201 | batch_normalize=1
202 | filters=512
203 | size=3
204 | stride=1
205 | pad=1
206 | activation=leaky
207 |
208 | [convolutional]
209 | size=1
210 | stride=1
211 | pad=1
212 | filters=255
213 | activation=linear
214 |
215 |
216 |
217 | [yolo]
218 | mask = 3,4,5
219 | anchors = 10,14, 23,27, 37,58, 81,82, 135,169, 344,319
220 | classes=80
221 | num=6
222 | jitter=.3
223 | scale_x_y = 1.05
224 | cls_normalizer=1.0
225 | iou_normalizer=0.07
226 | iou_loss=ciou
227 | ignore_thresh = .7
228 | truth_thresh = 1
229 | random=0
230 | resize=1.5
231 | nms_kind=greedynms
232 | beta_nms=0.6
233 |
234 | [route]
235 | layers = -4
236 |
237 | [convolutional]
238 | batch_normalize=1
239 | filters=128
240 | size=1
241 | stride=1
242 | pad=1
243 | activation=leaky
244 |
245 | [upsample]
246 | stride=2
247 |
248 | [route]
249 | layers = -1, 23
250 |
251 | [convolutional]
252 | batch_normalize=1
253 | filters=256
254 | size=3
255 | stride=1
256 | pad=1
257 | activation=leaky
258 |
259 | [convolutional]
260 | size=1
261 | stride=1
262 | pad=1
263 | filters=255
264 | activation=linear
265 |
266 | [yolo]
267 | mask = 1,2,3
268 | anchors = 10,14, 23,27, 37,58, 81,82, 135,169, 344,319
269 | classes=80
270 | num=6
271 | jitter=.3
272 | scale_x_y = 1.05
273 | cls_normalizer=1.0
274 | iou_normalizer=0.07
275 | iou_loss=ciou
276 | ignore_thresh = .7
277 | truth_thresh = 1
278 | random=0
279 | resize=1.5
280 | nms_kind=greedynms
281 | beta_nms=0.6
282 |
--------------------------------------------------------------------------------
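
yolov4-tiny.cfg above uses [route] sections with groups=2 and group_id=1, which split the preceding layer's output channels into two equal groups and forward only the second one (the CSP-style partial split), before later [route] layers concatenate the branches back together. A rough PyTorch equivalent of that single split operation, for illustration only (the shape corresponds to the 416x416 input in the config):

    import torch

    def route_group(x, groups=2, group_id=1):
        # darknet [route] with groups: split the channel dimension into
        # `groups` equal chunks and pass on only chunk `group_id`
        return torch.chunk(x, groups, dim=1)[group_id]

    x = torch.randn(1, 64, 104, 104)  # output of the third convolution
    print(route_group(x).shape)       # torch.Size([1, 32, 104, 104])
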
/data/traffic_light.data:
--------------------------------------------------------------------------------
1 | classes=6
2 | train=data/train.txt
3 | valid=data/val.txt
4 | names=data/traffic_light.names
--------------------------------------------------------------------------------
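
traffic_light.data is a plain key=value file that points the training and detection code at the class count, the train/val image lists, and the names file. The repository presumably parses it in utils/parse_config.py; the standalone sketch below is only illustrative:

    def parse_data_file(path):
        """Parse a darknet-style *.data file (one key=value per line) into a dict."""
        options = {}
        with open(path) as f:
            for line in f:
                line = line.strip()
                if not line or line.startswith('#'):
                    continue
                key, value = line.split('=', 1)
                options[key.strip()] = value.strip()
        return options

    # parse_data_file('data/traffic_light.data')['names'] -> 'data/traffic_light.names'
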
/data/traffic_light.names:
--------------------------------------------------------------------------------
1 | go
2 | stop
3 | stopLeft
4 | goLeft
5 | warning
6 | warningLeft
--------------------------------------------------------------------------------
/detect.py:
--------------------------------------------------------------------------------
1 | import argparse
2 |
3 | from models import * # set ONNX_EXPORT in models.py
4 | from utils.datasets import *
5 | from utils.utils import *
6 |
7 |
8 | def detect(save_img=False):
9 | imgsz = (320, 192) if ONNX_EXPORT else opt.img_size # (320, 192) or (416, 256) or (608, 352) for (height, width)
10 | out, source, weights, half, view_img, save_txt = opt.output, opt.source, opt.weights, opt.half, opt.view_img, opt.save_txt
11 | webcam = source == '0' or source.startswith('rtsp') or source.startswith('http') or source.endswith('.txt')
12 |
13 | # Initialize
14 | device = torch_utils.select_device(device='cpu' if ONNX_EXPORT else opt.device)
15 | # if os.path.exists(out):
16 | # shutil.rmtree(out) # delete output folder
17 | os.makedirs(out, exist_ok=True) # make new output folder
18 |
19 | # Initialize model
20 | model = Darknet(opt.cfg, imgsz)
21 |
22 | # Load weights
23 | attempt_download(weights)
24 | if weights.endswith('.pt'): # pytorch format
25 | model.load_state_dict(torch.load(weights, map_location=device)['model'])
26 | else: # darknet format
27 | load_darknet_weights(model, weights)
28 |
29 | # Second-stage classifier
30 | classify = False
31 | if classify:
32 | modelc = torch_utils.load_classifier(name='resnet101', n=2) # initialize
33 | modelc.load_state_dict(torch.load('weights/resnet101.pt', map_location=device)['model']) # load weights
34 | modelc.to(device).eval()
35 |
36 | # Eval mode
37 | model.to(device).eval()
38 |
39 | # Fuse Conv2d + BatchNorm2d layers
40 | # model.fuse()
41 |
42 | # Export mode
43 | if ONNX_EXPORT:
44 | model.fuse()
45 | img = torch.zeros((1, 3) + imgsz) # (1, 3, 320, 192)
46 | f = opt.weights.replace(opt.weights.split('.')[-1], 'onnx') # *.onnx filename
47 | torch.onnx.export(model, img, f, verbose=False, opset_version=11,
48 | input_names=['images'], output_names=['classes', 'boxes'])
49 |
50 | # Validate exported model
51 | import onnx
52 | model = onnx.load(f) # Load the ONNX model
53 | onnx.checker.check_model(model) # Check that the IR is well formed
54 | print(onnx.helper.printable_graph(model.graph)) # Print a human readable representation of the graph
55 | return
56 |
57 | # Half precision
58 | half = half and device.type != 'cpu' # half precision only supported on CUDA
59 | if half:
60 | model.half()
61 |
62 | # Set Dataloader
63 | vid_path, vid_writer = None, None
64 | if webcam:
65 | view_img = True
66 | torch.backends.cudnn.benchmark = True # set True to speed up constant image size inference
67 | dataset = LoadStreams(source, img_size=imgsz)
68 | else:
69 | save_img = True
70 | dataset = LoadImages(source, img_size=imgsz)
71 |
72 | # Get names and colors
73 | names = load_classes(opt.names)
74 | # colors = [[random.randint(0, 255) for _ in range(3)] for _ in range(len(names))]
75 |     colors = [(0, 255, 0), (0, 0, 255), (0, 0, 155), (0, 200, 200), (29, 118, 255), (0, 118, 255)]
76 |
77 | # Run inference
78 | t0 = time.time()
79 | img = torch.zeros((1, 3, imgsz, imgsz), device=device) # init img
80 | _ = model(img.half() if half else img.float()) if device.type != 'cpu' else None # run once
81 | for path, img, im0s, vid_cap, frame, nframes in dataset:
82 | img = torch.from_numpy(img).to(device)
83 | img = img.half() if half else img.float() # uint8 to fp16/32
84 | img /= 255.0 # 0 - 255 to 0.0 - 1.0
85 | if img.ndimension() == 3:
86 | img = img.unsqueeze(0)
87 |
88 | # Inference
89 | t1 = torch_utils.time_synchronized()
90 | pred = model(img, augment=opt.augment)[0]
91 | t2 = torch_utils.time_synchronized()
92 |
93 | # to float
94 | if half:
95 | pred = pred.float()
96 |
97 | # Apply NMS
98 | pred = non_max_suppression(pred, opt.conf_thres, opt.iou_thres,
99 | multi_label=False, classes=opt.classes, agnostic=opt.agnostic_nms)
100 |
101 | # Apply Classifier
102 | if classify:
103 | pred = apply_classifier(pred, modelc, img, im0s)
104 |
105 | # Process detections
106 | for i, det in enumerate(pred): # detections for image i
107 | if webcam: # batch_size >= 1
108 | p, s, im0 = path[i], '%g: ' % i, im0s[i].copy()
109 | else:
110 | p, s, im0 = path, '', im0s
111 |
112 | save_path = str(Path(out) / Path(p).name)
113 | print(save_path)
114 | s += '%gx%g ' % img.shape[2:] # print string
115 | gn = torch.tensor(im0.shape)[[1, 0, 1, 0]] # normalization gain whwh
116 | if det is not None and len(det):
117 | # Rescale boxes from imgsz to im0 size
118 | det[:, :4] = scale_coords(img.shape[2:], det[:, :4], im0.shape).round()
119 |
120 | # Print results
121 | for c in det[:, -1].unique():
122 | n = (det[:, -1] == c).sum() # detections per class
123 | s += '%g %ss, ' % (n, names[int(c)]) # add to string
124 |
125 | # Write results
126 | for *xyxy, conf, cls in det:
127 | if save_txt: # Write to file
128 | xywh = (xyxy2xywh(torch.tensor(xyxy).view(1, 4)) / gn).view(-1).tolist() # normalized xywh
129 | with open(save_path[:save_path.rfind('.')] + '.txt', 'a') as file:
130 | file.write(('%g ' * 5 + '\n') % (cls, *xywh)) # label format
131 |
132 | if save_img or view_img: # Add bbox to image
133 | # label = '%s %.2f' % (names[int(cls)], conf)
134 | label = '%s' % (names[int(cls)])
135 | plot_one_box(xyxy, im0, label=label, color=colors[int(cls)])
136 |
137 | # Print time (inference + NMS)
138 | print('%sDone. (%.3fs)' % (s, t2 - t1))
139 |
140 | # Stream results
141 | if view_img:
142 | cv2.imshow(p, im0)
143 | if nframes == 1:
144 | cv2.waitKey(0)
145 | elif nframes > 1:
146 | if cv2.waitKey(1) & 0xFF == ord('q'): # q to quit
147 | print(f"Average FPS: {frame/(time.time() - t0)}")
148 | raise StopIteration
149 |
150 | # Save results (image with detections)
151 | if save_img:
152 | if dataset.mode == 'images':
153 | cv2.imwrite(save_path, im0)
154 | else:
155 | if vid_path != save_path: # new video
156 | vid_path = save_path
157 | if isinstance(vid_writer, cv2.VideoWriter):
158 | vid_writer.release() # release previous video writer
159 |
160 | fps = vid_cap.get(cv2.CAP_PROP_FPS)
161 | w = int(vid_cap.get(cv2.CAP_PROP_FRAME_WIDTH))
162 | h = int(vid_cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
163 | vid_writer = cv2.VideoWriter(save_path, cv2.VideoWriter_fourcc(*opt.fourcc), fps, (w, h))
164 | vid_writer.write(im0)
165 |
166 | if save_txt or save_img:
167 | print('Results saved to %s' % os.getcwd() + os.sep + out)
168 | if platform == 'darwin': # MacOS
169 | os.system('open ' + save_path)
170 |
171 | print('Done. (%.3fs)' % (time.time() - t0))
172 | print(f"Average FPS: {nframes/(time.time() - t0)}")
173 |
174 |
175 | if __name__ == '__main__':
176 | parser = argparse.ArgumentParser()
177 | parser.add_argument('--cfg', type=str, default='cfg/yolov3-spp-6cls.cfg', help='*.cfg path')
178 | parser.add_argument('--names', type=str, default='data/traffic_light.names', help='*.names path')
179 | parser.add_argument('--weights', type=str, required=True, help='weights path')
180 | parser.add_argument('--source', type=str, default='data/samples', help='source') # input file/folder, 0 for webcam
181 | parser.add_argument('--output', type=str, default='outputs', help='output folder') # output folder
182 | parser.add_argument('--img-size', type=int, default=512, help='inference size (pixels)')
183 | parser.add_argument('--conf-thres', type=float, default=0.3, help='object confidence threshold')
184 | parser.add_argument('--iou-thres', type=float, default=0.6, help='IOU threshold for NMS')
185 | parser.add_argument('--fourcc', type=str, default='mp4v', help='output video codec (verify ffmpeg support)')
186 | parser.add_argument('--half', action='store_true', help='half precision FP16 inference')
187 | parser.add_argument('--device', default='', help='device id (i.e. 0 or 0,1) or cpu')
188 | parser.add_argument('--view-img', action='store_true', help='display results')
189 | parser.add_argument('--save-txt', action='store_true', help='save results to *.txt')
190 | parser.add_argument('--classes', nargs='+', type=int, help='filter by class')
191 | parser.add_argument('--agnostic-nms', action='store_true', help='class-agnostic NMS')
192 | parser.add_argument('--augment', action='store_true', help='augmented inference')
193 | opt = parser.parse_args()
194 | opt.cfg = check_file(opt.cfg) # check file
195 | opt.names = check_file(opt.names) # check file
196 | print(opt)
197 |
198 | with torch.no_grad():
199 | detect()
200 |
--------------------------------------------------------------------------------
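
Given the argparse definitions above, a typical invocation for this project would look something like python detect.py --weights <path to .pt or darknet weights> --source <video file or image folder> --view-img; --cfg and --names already default to the 6-class traffic-light files, and --weights is required. The bracketed paths are placeholders, not files shipped with the repository.
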
/img_to_vid.py:
--------------------------------------------------------------------------------
1 | from moviepy.editor import VideoFileClip
2 | from moviepy.editor import ImageSequenceClip
3 | import glob
4 |
5 | fps = 20
6 |
7 | image_paths = glob.glob('../input/lisa_traffic_light_dataset/lisa-traffic-light-dataset/daySequence1/daySequence1/frames/*.jpg')
8 | image_paths.sort()
9 | print(image_paths[:5])
10 | clip = ImageSequenceClip(image_paths, fps=fps)
11 | clip.write_videofile('../input/lisa_traffic_light_dataset/input/test_data/day_seq1.mp4', fps=fps)
12 | print('DONE')
--------------------------------------------------------------------------------
/prepare_labels.py:
--------------------------------------------------------------------------------
1 | import pandas as pd
2 | import os
3 | import glob
4 | import cv2
5 |
6 | from tqdm import tqdm
7 |
8 | show_info = True
9 | images_with_required_classes = 0
10 | total_images = 0
11 | labels = {
12 | 'go': 0,
13 | 'stop': 1,
14 | 'stopLeft': 2,
15 | 'goLeft': 3,
16 | 'warning': 4,
17 | 'warningLeft': 5
18 | }
19 |
20 | root_folder_names = ['dayTrain', 'nightTrain']
21 | root_folder_name_mapper = {
22 | 'dayTrain': 'dayClip',
23 | 'nightTrain': 'nightClip'
24 | }
25 |
26 | annotation_root = '../input/lisa_traffic_light_dataset/lisa-traffic-light-dataset/Annotations/Annotations'
27 | image_root = '../input/lisa_traffic_light_dataset/lisa-traffic-light-dataset'
28 |
29 |
30 | def get_coords(tag, x_min, y_min, x_max, y_max, images_with_required_classes):
31 | """
32 | Return a single integer label for each annotation tag,
33 | along with the normalized x_center, y_center, width, and
34 | height. We divide x_center and width by the image width,
35 | and y_center and height by the image height, to normalize.
36 | Each image is 1280 pixels wide and 960 pixels high.
37 | """
38 | if tag in labels:
39 | if tag == 'go':
40 | label = labels['go']
41 | color = (0, 255, 0)
42 | elif tag == 'stop':
43 | label = labels['stop']
44 | color = (0, 0, 255)
45 | elif tag == 'stopLeft':
46 | label = labels['stopLeft']
47 | color = (0, 0, 155)
48 | elif tag == 'goLeft':
49 | label = labels['goLeft']
50 | color = (0, 200, 200)
51 | elif tag == 'warning':
52 | label = labels['warning']
53 | color = (29, 118, 255)
54 | elif tag == 'warningLeft':
55 | label = labels['warningLeft']
56 | color = (0, 118, 255)
57 |
58 | x_center = ((x_max + x_min) / 2) / 1280
59 | y_center = ((y_max + y_min) / 2) / 960
60 | w = (x_max - x_min) / 1280
61 | h = (y_max - y_min) / 960
62 | return label, x_center, y_center, w, h
63 | else:
64 | label = ''
65 | x_center = ''
66 | y_center = ''
67 | w = ''
68 | h = ''
69 | return label, x_center, y_center, w, h
70 |
71 | for root_folder_name in root_folder_names:
72 | folder_names = os.listdir(f"{annotation_root}/{root_folder_name}")
73 | num_folders = len(folder_names)
74 | mapped_clip = root_folder_name_mapper[root_folder_name]
75 |
76 | for i in range(1, num_folders+1):
77 | print('##### NEW CSV AND IMAGES ####')
78 | # read the annotation CSV file
79 | df = pd.read_csv(f"{annotation_root}/{root_folder_name}/{mapped_clip}{i}/frameAnnotationsBOX.csv",
80 | delimiter=';')
81 | # get all image paths
82 | image_paths = glob.glob(f"{image_root}/{root_folder_name}/{root_folder_name}/{mapped_clip}{i}/frames/*.jpg")
83 | image_paths.sort()
84 |
85 | total_images += len(image_paths)
86 |
87 | if show_info:
88 | print('NUMBER OF IMAGE AND UNIQUE CSV FILE NAMES MAY NOT MATCH')
89 | print('NOT A PROBLEM')
90 | print(f"Total objects in current CSV file: {len(df)}")
91 | print(f"Unique Filenames: {len(df['Filename'].unique())}")
92 | print(df.head())
93 | print(f"Total images in current folder: {len(image_paths)}")
94 |
95 | tags = df['Annotation tag'].values
96 | x_min = df['Upper left corner X'].values
97 | y_min = df['Upper left corner Y'].values
98 | x_max = df['Lower right corner X'].values
99 | y_max = df['Lower right corner Y'].values
100 |
101 |         file_counter = 0  # index to step through the CSV rows
102 | # iterate through all image paths
103 |         for image_path in tqdm(image_paths, total=len(image_paths)):  # avoid shadowing the outer loop's i
104 | image_name = image_path.split(os.path.sep)[-1]
105 | # iterate through all CSV rows
106 | for j in range(len(df)):
107 | if file_counter < len(df):
108 | file_name = df.loc[file_counter]['Filename'].split('/')[-1]
109 | if file_name == image_name:
110 | label, x, y, w, h = get_coords(tags[file_counter],
111 | x_min[file_counter],
112 | y_min[file_counter],
113 | x_max[file_counter],
114 | y_max[file_counter],
115 | images_with_required_classes)
116 | with open(f"../input/lisa_traffic_light_dataset/input/labels/{image_name.split('.')[0]}.txt", 'a+') as f:
117 |                         if isinstance(label, int):
118 |                             # one YOLO-format row: class x_center y_center width height
119 |                             f.write(f"{label} {x} {y} {w} {h}\n")
120 |                         else:
121 |                             # tag not in `labels`: write nothing (open() already created the file)
122 |                             f.write("")
123 | image = cv2.imread(image_path, cv2.IMREAD_COLOR)
124 | cv2.imwrite(f"../input/lisa_traffic_light_dataset/input/images/{image_name}", image)
125 | file_counter += 1
126 | # continue
127 | if file_name != image_name:
128 | break
129 |
130 | print(f"Total images parsed through: {total_images}")
131 | # print(f"Total images with desired classes: {images_with_required_classes}")
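
A worked example of the normalization in get_coords, using a hypothetical 'go' box with corners (600, 300) and (640, 340) on a 1280x960 frame (the box values are made up for illustration):

    label = 0                           # labels['go']
    x_center = (600 + 640) / 2 / 1280   # 0.484375
    y_center = (300 + 340) / 2 / 960    # 0.3333...
    w = (640 - 600) / 1280              # 0.03125
    h = (340 - 300) / 960               # 0.0417
    # the row appended to the image's .txt label file would be:
    # "0 0.484375 0.3333333333333333 0.03125 0.041666666666666664"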
--------------------------------------------------------------------------------
/prepare_train_val.py:
--------------------------------------------------------------------------------
1 | """
2 | This Python script prepares train.txt and val.txt for YOLOv3
3 | training.
4 | """
5 |
6 | import os
7 | import numpy as np
8 | import random
9 |
10 | # get all the image file names from `input/images/*`
11 | image_files = os.listdir('../input/lisa_traffic_light_dataset/input/images')
12 |
13 | # aim for an 80% train / 20% validation split (see the note at the end of this file)
14 | train_indices = []
15 | valid_indices = []
16 | for tr_id in range(int(len(image_files)*0.80)):
17 | train_indices.append(random.randint(0, len(image_files) - 1))
18 |
19 | val_counter = 0
20 | while val_counter != (int(len(image_files)*0.20)):
21 | val_idx = random.randint(0, len(image_files) - 1)
22 | if val_idx not in train_indices:
23 | valid_indices.append(val_idx)
24 | val_counter += 1
25 |
26 | print(f"Training images: {len(train_indices)}")
27 | print(f"Validation images: {len(valid_indices)}")
28 |
29 | for i in train_indices:
30 | with open('data/train.txt', 'a') as train_file:
31 | train_file.writelines(f"../input/lisa_traffic_light_dataset/input/images/{image_files[i]}\n")
32 |
33 | for i in valid_indices:
34 | with open('data/val.txt', 'a') as val_file:
35 | val_file.writelines(f"../input/lisa_traffic_light_dataset/input/images/{image_files[i]}\n")
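
Note that random.randint samples with replacement, so train_indices may contain duplicates and the split is only approximately 80/20 (validation indices are merely guaranteed not to appear in the training list). A stricter, non-overlapping split could be sketched like this (an alternative, not what the script above does):

    import random

    indices = list(range(len(image_files)))
    random.shuffle(indices)
    split = int(len(indices) * 0.80)
    train_indices, valid_indices = indices[:split], indices[split:]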
--------------------------------------------------------------------------------
/preview_images/vid_prev1.PNG:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sovit-123/Traffic-Light-Detection-Using-YOLOv3/e04e11a5f240118a6d710e22c50bab28f5aafe19/preview_images/vid_prev1.PNG
--------------------------------------------------------------------------------
/preview_images/vid_prev2.PNG:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sovit-123/Traffic-Light-Detection-Using-YOLOv3/e04e11a5f240118a6d710e22c50bab28f5aafe19/preview_images/vid_prev2.PNG
--------------------------------------------------------------------------------
/preview_images/vid_prev3.PNG:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sovit-123/Traffic-Light-Detection-Using-YOLOv3/e04e11a5f240118a6d710e22c50bab28f5aafe19/preview_images/vid_prev3.PNG
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | # pip install -U -r requirements.txt
2 | Cython
3 | numpy==1.17
4 | opencv-python
5 | matplotlib
6 | pillow
7 | tensorboard
8 | torchvision -f https://download.pytorch.org/whl/torch_stable.html
9 | torch -f https://download.pytorch.org/whl/torch_stable.html
10 | scipy
11 | tqdm
12 | git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI
13 |
14 | # Conda commands (in lieu of pip) ---------------------------------------------
15 | # conda update -yn base -c defaults conda
16 | # conda install -yc anaconda numpy opencv matplotlib tqdm pillow ipython
17 | # conda install -yc conda-forge scikit-image pycocotools tensorboard
18 | # conda install -yc spyder-ide spyder-line-profiler
19 | # conda install -yc pytorch pytorch torchvision
20 | # conda install -yc conda-forge protobuf numpy && pip install onnx==1.6.0 # https://github.com/onnx/onnx#linux-and-macos
21 |
--------------------------------------------------------------------------------
/results.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sovit-123/Traffic-Light-Detection-Using-YOLOv3/e04e11a5f240118a6d710e22c50bab28f5aafe19/results.png
--------------------------------------------------------------------------------
/results_model_12.txt:
--------------------------------------------------------------------------------
1 | 0/4 8.05G 2.52 0.403 0.435 3.36 7 832 0.587 0.545 0.633 0.564 1.42 0.181 0.0727
2 | 1/4 8.06G 1.85 0.153 0.0973 2.1 2 416 0.658 0.925 0.793 0.768 1.27 0.128 0.0578
3 | 2/4 8.29G 1.66 0.125 0.0837 1.87 1 608 0.629 0.885 0.857 0.732 1.14 0.11 0.0485
4 | 3/4 8.06G 1.28 0.0996 0.0441 1.42 6 576 0.684 0.938 0.894 0.791 0.97 0.0788 0.0228
5 | 4/4 8.23G 1.2 0.0904 0.0434 1.33 4 672 0.764 0.952 0.936 0.846 0.889 0.0768 0.0216
6 | 5/9 9.09G 1.53 0.112 0.0699 1.72 3 672 0.781 0.908 0.897 0.84 1.14 0.101 0.0367
7 | 6/9 9.09G 1.14 0.087 0.0342 1.26 3 768 0.683 0.959 0.924 0.795 0.931 0.0751 0.0167
8 | 7/9 9.08G 0.995 0.0816 0.0228 1.1 3 736 0.801 0.956 0.942 0.869 0.85 0.0712 0.0127
9 | 8/9 9.11G 0.874 0.0741 0.0165 0.965 3 672 0.744 0.968 0.948 0.839 0.73 0.0638 0.00794
10 | 9/9 9.09G 0.805 0.071 0.0125 0.889 2 384 0.757 0.973 0.952 0.851 0.668 0.0591 0.00511
11 | 10/14 9.09G 1.3 0.0988 0.0481 1.45 7 672 0.691 0.953 0.927 0.798 1.03 0.0812 0.0221
12 | 11/14 9.08G 0.916 0.078 0.0171 1.01 4 800 0.761 0.97 0.946 0.852 0.712 0.0609 0.00831
13 | 12/14 9.08G 0.831 0.0741 0.0136 0.919 4 736 0.778 0.971 0.948 0.863 0.669 0.0585 0.00567
14 | 13/14 9.11G 0.76 0.0718 0.0101 0.842 2 672 0.776 0.971 0.95 0.862 0.618 0.0572 0.0038
15 | 14/14 9.09G 0.727 0.0685 0.00834 0.804 1 512 0.783 0.972 0.954 0.867 0.589 0.0554 0.00297
16 | 15/19 9.08G 1.34 0.0823 0.0372 1.46 8 736 0.617 0.946 0.905 0.746 1.08 0.0751 0.0227
17 | 16/19 9.09G 0.983 0.0706 0.0153 1.07 14 384 0.694 0.952 0.93 0.803 0.81 0.0565 0.00395
18 | 17/19 9.08G 0.859 0.0641 0.00816 0.931 7 896 0.711 0.952 0.929 0.814 0.752 0.0543 0.00384
19 | 18/19 9.09G 0.82 0.062 0.00668 0.889 7 736 0.718 0.96 0.933 0.821 0.73 0.0539 0.00327
20 | 19/19 9.08G 0.791 0.0624 0.00605 0.86 16 672 0.763 0.95 0.934 0.846 0.696 0.0521 0.00223
21 | 20/24 9.08G 1.2 0.0762 0.0296 1.31 4 736 0.593 0.952 0.921 0.725 0.981 0.0658 0.0242
22 | 21/24 9.08G 0.902 0.0684 0.0124 0.983 5 800 0.715 0.963 0.942 0.821 0.7 0.0519 0.00328
23 | 22/24 9.08G 0.806 0.0619 0.00636 0.874 8 896 0.738 0.961 0.941 0.835 0.659 0.0509 0.00237
24 | 23/24 9.09G 0.775 0.0606 0.00556 0.841 10 736 0.729 0.965 0.945 0.83 0.649 0.0501 0.00223
25 | 24/24 9.08G 0.763 0.061 0.00482 0.829 11 640 0.758 0.958 0.945 0.846 0.622 0.0488 0.00183
26 | 25/29 9.13G 1.96 0.214 0.73 2.9 8 416 0.345 0.312 0.405 0.261 1.09 0.148 0.284
27 | 26/29 9.13G 1.1 0.134 0.308 1.54 11 384 0.328 0.455 0.454 0.379 0.851 0.105 0.109
28 | 27/29 9.12G 0.998 0.116 0.191 1.3 15 896 0.339 0.463 0.47 0.391 0.79 0.0942 0.0735
29 | 28/29 9.12G 0.941 0.108 0.148 1.2 11 672 0.478 0.475 0.53 0.414 0.778 0.0881 0.056
30 | 29/29 9.13G 0.907 0.104 0.135 1.15 8 672 0.518 0.491 0.57 0.441 0.739 0.0838 0.0513
31 | 30/34 4.61G 1.36 0.125 0.216 1.7 2 736 0.504 0.535 0.594 0.496 1.09 0.105 0.125
32 | 31/34 4.61G 0.957 0.105 0.149 1.21 4 800 0.648 0.596 0.637 0.531 0.862 0.0875 0.0668
33 | 32/34 4.6G 0.852 0.0969 0.091 1.04 3 896 0.656 0.622 0.685 0.548 0.794 0.0818 0.0434
34 | 33/34 4.75G 0.818 0.0945 0.0773 0.99 5 736 0.652 0.624 0.682 0.547 0.793 0.0825 0.0394
35 | 34/34 4.61G 0.811 0.0912 0.0768 0.979 5 640 0.666 0.624 0.685 0.556 0.759 0.0807 0.0332
36 | 35/39 9.13G 1.43 0.11 0.216 1.76 12 416 0.803 0.602 0.704 0.535 1.16 0.0977 0.121
37 | 36/39 9.13G 0.991 0.0986 0.14 1.23 14 384 0.713 0.691 0.741 0.612 0.805 0.0789 0.063
38 | 37/39 9.12G 0.882 0.0915 0.09 1.06 9 896 0.741 0.705 0.769 0.636 0.77 0.0753 0.0571
39 | 38/39 9.12G 0.843 0.088 0.0815 1.01 9 672 0.738 0.72 0.778 0.643 0.748 0.0734 0.051
40 | 39/39 9.13G 0.839 0.0859 0.0757 1 8 672 0.722 0.763 0.8 0.692 0.74 0.0734 0.0467
41 | 40/44 9.12G 1.3 0.0991 0.16 1.56 7 736 0.786 0.642 0.758 0.586 1.1 0.0865 0.105
42 | 41/44 9.12G 0.952 0.0912 0.109 1.15 8 384 0.71 0.797 0.847 0.717 0.745 0.0757 0.048
43 | 42/44 9.12G 0.843 0.0857 0.0773 1.01 12 896 0.737 0.836 0.872 0.761 0.697 0.0729 0.0387
44 | 43/44 9.13G 0.823 0.0824 0.065 0.97 10 736 0.714 0.862 0.868 0.769 0.696 0.0717 0.0339
45 | 44/44 9.12G 0.816 0.081 0.0653 0.963 11 672 0.743 0.868 0.885 0.789 0.689 0.0704 0.0315
46 | 45/49 4.61G 1.11 0.0918 0.105 1.31 2 736 0.789 0.733 0.812 0.727 0.924 0.081 0.0626
47 | 46/49 4.61G 0.832 0.0857 0.0587 0.977 6 800 0.722 0.831 0.833 0.765 0.738 0.0727 0.0326
48 | 47/49 4.61G 0.767 0.0801 0.0438 0.891 6 448 0.727 0.865 0.866 0.783 0.706 0.0715 0.028
49 | 48/49 4.75G 0.752 0.0788 0.0377 0.869 2 736 0.706 0.882 0.867 0.777 0.717 0.0721 0.0269
50 | 49/49 4.61G 0.743 0.0775 0.0339 0.855 6 640 0.724 0.869 0.869 0.784 0.691 0.0693 0.025
51 | 50/54 4.61G 1.07 0.0866 0.0819 1.24 2 736 0.658 0.879 0.804 0.74 0.957 0.0798 0.0825
52 | 51/54 4.61G 0.817 0.0829 0.0593 0.959 6 384 0.742 0.899 0.889 0.81 0.747 0.0722 0.0321
53 | 52/54 4.61G 0.764 0.0773 0.0424 0.884 4 896 0.724 0.909 0.877 0.801 0.724 0.0704 0.0288
54 | 53/54 4.76G 0.736 0.0754 0.0353 0.846 2 736 0.739 0.919 0.9 0.818 0.716 0.0714 0.032
55 | 54/54 4.61G 0.727 0.0753 0.0331 0.836 10 672 0.747 0.921 0.901 0.819 0.695 0.068 0.0261
56 | 55/59 4.6G 1.07 0.0851 0.0833 1.23 3 736 0.813 0.872 0.857 0.84 0.907 0.0777 0.0476
57 | 56/59 4.61G 0.807 0.0805 0.0563 0.944 4 800 0.738 0.909 0.882 0.809 0.689 0.067 0.0225
58 | 57/59 4.61G 0.744 0.0748 0.0406 0.859 7 896 0.731 0.915 0.882 0.809 0.664 0.0652 0.0191
59 | 58/59 4.75G 0.73 0.0732 0.0361 0.84 4 736 0.755 0.913 0.884 0.825 0.666 0.0645 0.0168
60 | 59/59 4.61G 0.722 0.0735 0.0318 0.827 5 640 0.762 0.919 0.892 0.832 0.653 0.0632 0.0154
61 | 60/66 9.13G 1.16 0.0865 0.0977 1.34 14 416 0.844 0.858 0.883 0.845 0.896 0.0836 0.0761
62 | 61/66 9.13G 0.912 0.0838 0.0781 1.07 13 384 0.74 0.906 0.905 0.811 0.703 0.066 0.0316
63 | 62/66 9.12G 0.811 0.0763 0.0522 0.939 9 896 0.761 0.899 0.896 0.823 0.718 0.0675 0.133
64 | 63/66 9.12G 0.785 0.0742 0.0488 0.908 9 736 0.747 0.924 0.912 0.824 0.657 0.0628 0.0181
65 | 64/66 9.12G 0.763 0.073 0.0399 0.876 14 672 0.773 0.923 0.91 0.84 0.643 0.0613 0.015
66 | 65/66 9.12G 0.748 0.071 0.038 0.857 6 768 0.766 0.93 0.92 0.839 0.65 0.0611 0.0139
67 | 66/66 9.12G 0.758 0.0727 0.0375 0.868 15 864 0.771 0.926 0.919 0.839 0.629 0.0594 0.012
68 |
--------------------------------------------------------------------------------
/runs/Sep11_01-03-16_57a6ce0d91d9model_12/events.out.tfevents.1599786201.57a6ce0d91d9.426.0:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sovit-123/Traffic-Light-Detection-Using-YOLOv3/e04e11a5f240118a6d710e22c50bab28f5aafe19/runs/Sep11_01-03-16_57a6ce0d91d9model_12/events.out.tfevents.1599786201.57a6ce0d91d9.426.0
--------------------------------------------------------------------------------
/test.py:
--------------------------------------------------------------------------------
1 | import argparse
2 | import json
3 |
4 | from torch.utils.data import DataLoader
5 |
6 | from models import *
7 | from utils.datasets import *
8 | from utils.utils import *
9 |
10 |
11 | def test(cfg,
12 | data,
13 | weights=None,
14 | batch_size=16,
15 | imgsz=416,
16 | conf_thres=0.001,
17 | iou_thres=0.6, # for nms
18 | save_json=False,
19 | single_cls=False,
20 | augment=False,
21 | model=None,
22 | dataloader=None,
23 | multi_label=True):
24 | # Initialize/load model and set device
25 | if model is None:
26 | is_training = False
27 | device = torch_utils.select_device(opt.device, batch_size=batch_size)
28 | verbose = opt.task == 'test'
29 |
30 | # Remove previous
31 | for f in glob.glob('test_batch*.jpg'):
32 | os.remove(f)
33 |
34 | # Initialize model
35 | model = Darknet(cfg, imgsz)
36 |
37 | # Load weights
38 | attempt_download(weights)
39 | if weights.endswith('.pt'): # pytorch format
40 | model.load_state_dict(torch.load(weights, map_location=device)['model'])
41 | else: # darknet format
42 | load_darknet_weights(model, weights)
43 |
44 | # Fuse
45 | model.fuse()
46 | model.to(device)
47 |
48 | if device.type != 'cpu' and torch.cuda.device_count() > 1:
49 | model = nn.DataParallel(model)
50 | else: # called by train.py
51 | is_training = True
52 | device = next(model.parameters()).device # get model device
53 | verbose = False
54 |
55 | # Configure run
56 | data = parse_data_cfg(data)
57 | nc = 1 if single_cls else int(data['classes']) # number of classes
58 | path = data['valid'] # path to test images
59 | names = load_classes(data['names']) # class names
60 | iouv = torch.linspace(0.5, 0.95, 10).to(device) # iou vector for mAP@0.5:0.95
61 |     iouv = iouv[0].view(1)  # comment out this line to evaluate mAP@0.5:0.95
62 | niou = iouv.numel()
63 |
64 | # Dataloader
65 | if dataloader is None:
66 | dataset = LoadImagesAndLabels(path, imgsz, batch_size, rect=True, single_cls=opt.single_cls, pad=0.5)
67 | batch_size = min(batch_size, len(dataset))
68 | dataloader = DataLoader(dataset,
69 | batch_size=batch_size,
70 | num_workers=min([os.cpu_count(), batch_size if batch_size > 1 else 0, 8]),
71 | pin_memory=True,
72 | collate_fn=dataset.collate_fn)
73 |
74 | seen = 0
75 | model.eval()
76 | _ = model(torch.zeros((1, 3, imgsz, imgsz), device=device)) if device.type != 'cpu' else None # run once
77 | coco91class = coco80_to_coco91_class()
78 | s = ('%20s' + '%10s' * 6) % ('Class', 'Images', 'Targets', 'P', 'R', 'mAP@0.5', 'F1')
79 | p, r, f1, mp, mr, map, mf1, t0, t1 = 0., 0., 0., 0., 0., 0., 0., 0., 0.
80 | loss = torch.zeros(3, device=device)
81 | jdict, stats, ap, ap_class = [], [], [], []
82 | for batch_i, (imgs, targets, paths, shapes) in enumerate(tqdm(dataloader, desc=s)):
83 | imgs = imgs.to(device).float() / 255.0 # uint8 to float32, 0 - 255 to 0.0 - 1.0
84 | targets = targets.to(device)
85 | nb, _, height, width = imgs.shape # batch size, channels, height, width
86 | whwh = torch.Tensor([width, height, width, height]).to(device)
87 |
88 | # Disable gradients
89 | with torch.no_grad():
90 | # Run model
91 | t = torch_utils.time_synchronized()
92 | inf_out, train_out = model(imgs, augment=augment) # inference and training outputs
93 | t0 += torch_utils.time_synchronized() - t
94 |
95 | # Compute loss
96 | if is_training: # if model has loss hyperparameters
97 | loss += compute_loss(train_out, targets, model)[1][:3] # GIoU, obj, cls
98 |
99 | # Run NMS
100 | t = torch_utils.time_synchronized()
101 | output = non_max_suppression(inf_out, conf_thres=conf_thres, iou_thres=iou_thres, multi_label=multi_label)
102 | t1 += torch_utils.time_synchronized() - t
103 |
104 | # Statistics per image
105 | for si, pred in enumerate(output):
106 | labels = targets[targets[:, 0] == si, 1:]
107 | nl = len(labels)
108 | tcls = labels[:, 0].tolist() if nl else [] # target class
109 | seen += 1
110 |
111 | if pred is None:
112 | if nl:
113 | stats.append((torch.zeros(0, niou, dtype=torch.bool), torch.Tensor(), torch.Tensor(), tcls))
114 | continue
115 |
116 | # Append to text file
117 | # with open('test.txt', 'a') as file:
118 | # [file.write('%11.5g' * 7 % tuple(x) + '\n') for x in pred]
119 |
120 | # Clip boxes to image bounds
121 | clip_coords(pred, (height, width))
122 |
123 | # Append to pycocotools JSON dictionary
124 | if save_json:
125 | # [{"image_id": 42, "category_id": 18, "bbox": [258.15, 41.29, 348.26, 243.78], "score": 0.236}, ...
126 | image_id = int(Path(paths[si]).stem.split('_')[-1])
127 | box = pred[:, :4].clone() # xyxy
128 | scale_coords(imgs[si].shape[1:], box, shapes[si][0], shapes[si][1]) # to original shape
129 | box = xyxy2xywh(box) # xywh
130 | box[:, :2] -= box[:, 2:] / 2 # xy center to top-left corner
131 | for p, b in zip(pred.tolist(), box.tolist()):
132 | jdict.append({'image_id': image_id,
133 | 'category_id': coco91class[int(p[5])],
134 | 'bbox': [round(x, 3) for x in b],
135 | 'score': round(p[4], 5)})
136 |
137 | # Assign all predictions as incorrect
138 | correct = torch.zeros(pred.shape[0], niou, dtype=torch.bool, device=device)
139 | if nl:
140 | detected = [] # target indices
141 | tcls_tensor = labels[:, 0]
142 |
143 | # target boxes
144 | tbox = xywh2xyxy(labels[:, 1:5]) * whwh
145 |
146 | # Per target class
147 | for cls in torch.unique(tcls_tensor):
148 |                     ti = (cls == tcls_tensor).nonzero().view(-1)  # target indices
149 |                     pi = (cls == pred[:, 5]).nonzero().view(-1)  # prediction indices
150 |
151 | # Search for detections
152 | if pi.shape[0]:
153 | # Prediction to target ious
154 | ious, i = box_iou(pred[pi, :4], tbox[ti]).max(1) # best ious, indices
155 |
156 | # Append detections
157 | for j in (ious > iouv[0]).nonzero():
158 | d = ti[i[j]] # detected target
159 | if d not in detected:
160 | detected.append(d)
161 | correct[pi[j]] = ious[j] > iouv # iou_thres is 1xn
162 | if len(detected) == nl: # all targets already located in image
163 | break
164 |
165 | # Append statistics (correct, conf, pcls, tcls)
166 | stats.append((correct.cpu(), pred[:, 4].cpu(), pred[:, 5].cpu(), tcls))
167 |
168 | # Plot images
169 | if batch_i < 1:
170 | f = 'test_batch%g_gt.jpg' % batch_i # filename
171 | plot_images(imgs, targets, paths=paths, names=names, fname=f) # ground truth
172 | f = 'test_batch%g_pred.jpg' % batch_i
173 | plot_images(imgs, output_to_target(output, width, height), paths=paths, names=names, fname=f) # predictions
174 |
175 | # Compute statistics
176 | stats = [np.concatenate(x, 0) for x in zip(*stats)] # to numpy
177 | if len(stats):
178 | p, r, ap, f1, ap_class = ap_per_class(*stats)
179 | if niou > 1:
180 | p, r, ap, f1 = p[:, 0], r[:, 0], ap.mean(1), ap[:, 0] # [P, R, AP@0.5:0.95, AP@0.5]
181 | mp, mr, map, mf1 = p.mean(), r.mean(), ap.mean(), f1.mean()
182 | nt = np.bincount(stats[3].astype(np.int64), minlength=nc) # number of targets per class
183 | else:
184 | nt = torch.zeros(1)
185 |
186 | # Print results
187 | pf = '%20s' + '%10.3g' * 6 # print format
188 | print(pf % ('all', seen, nt.sum(), mp, mr, map, mf1))
189 |
190 | # Print results per class
191 | if verbose and nc > 1 and len(stats):
192 | for i, c in enumerate(ap_class):
193 | print(pf % (names[c], seen, nt[c], p[i], r[i], ap[i], f1[i]))
194 |
195 | # Print speeds
196 | if verbose or save_json:
197 | t = tuple(x / seen * 1E3 for x in (t0, t1, t0 + t1)) + (imgsz, imgsz, batch_size) # tuple
198 | print('Speed: %.1f/%.1f/%.1f ms inference/NMS/total per %gx%g image at batch-size %g' % t)
199 |
200 | # Save JSON
201 | if save_json and map and len(jdict):
202 | print('\nCOCO mAP with pycocotools...')
203 | imgIds = [int(Path(x).stem.split('_')[-1]) for x in dataloader.dataset.img_files]
204 | with open('results.json', 'w') as file:
205 | json.dump(jdict, file)
206 |
207 | try:
208 | from pycocotools.coco import COCO
209 | from pycocotools.cocoeval import COCOeval
210 |
211 | # https://github.com/cocodataset/cocoapi/blob/master/PythonAPI/pycocoEvalDemo.ipynb
212 | cocoGt = COCO(glob.glob('../coco/annotations/instances_val*.json')[0]) # initialize COCO ground truth api
213 | cocoDt = cocoGt.loadRes('results.json') # initialize COCO pred api
214 |
215 | cocoEval = COCOeval(cocoGt, cocoDt, 'bbox')
216 | cocoEval.params.imgIds = imgIds # [:32] # only evaluate these images
217 | cocoEval.evaluate()
218 | cocoEval.accumulate()
219 | cocoEval.summarize()
220 | # mf1, map = cocoEval.stats[:2] # update to pycocotools results (mAP@0.5:0.95, mAP@0.5)
221 | except:
222 | print('WARNING: pycocotools must be installed with numpy==1.17 to run correctly. '
223 | 'See https://github.com/cocodataset/cocoapi/issues/356')
224 |
225 | # Return results
226 | maps = np.zeros(nc) + map
227 | for i, c in enumerate(ap_class):
228 | maps[c] = ap[i]
229 | return (mp, mr, map, mf1, *(loss.cpu() / len(dataloader)).tolist()), maps
230 |
231 |
232 | if __name__ == '__main__':
233 | parser = argparse.ArgumentParser(prog='test.py')
234 | parser.add_argument('--cfg', type=str, default='cfg/yolov3-spp.cfg', help='*.cfg path')
235 | parser.add_argument('--data', type=str, default='data/coco2014.data', help='*.data path')
236 | parser.add_argument('--weights', type=str, default='weights/yolov3-spp-ultralytics.pt', help='weights path')
237 | parser.add_argument('--batch-size', type=int, default=16, help='size of each image batch')
238 | parser.add_argument('--img-size', type=int, default=512, help='inference size (pixels)')
239 | parser.add_argument('--conf-thres', type=float, default=0.001, help='object confidence threshold')
240 | parser.add_argument('--iou-thres', type=float, default=0.6, help='IOU threshold for NMS')
241 | parser.add_argument('--save-json', action='store_true', help='save a cocoapi-compatible JSON results file')
242 |     parser.add_argument('--task', default='test', help="'test' or 'benchmark'")
243 | parser.add_argument('--device', default='', help='device id (i.e. 0 or 0,1) or cpu')
244 | parser.add_argument('--single-cls', action='store_true', help='train as single-class dataset')
245 | parser.add_argument('--augment', action='store_true', help='augmented inference')
246 | opt = parser.parse_args()
247 | opt.save_json = opt.save_json or any([x in opt.data for x in ['coco.data', 'coco2014.data', 'coco2017.data']])
248 | opt.cfg = check_file(opt.cfg) # check file
249 | opt.data = check_file(opt.data) # check file
250 | print(opt)
251 |
252 | # task = 'test', 'study', 'benchmark'
253 | if opt.task == 'test': # (default) test normally
254 | test(opt.cfg,
255 | opt.data,
256 | opt.weights,
257 | opt.batch_size,
258 | opt.img_size,
259 | opt.conf_thres,
260 | opt.iou_thres,
261 | opt.save_json,
262 | opt.single_cls,
263 | opt.augment)
264 |
265 |     elif opt.task == 'benchmark':  # mAPs at img-size 256, 384, 512 with iou-thres 0.6 and 0.7
266 | y = []
267 | for i in list(range(256, 640, 128)): # img-size
268 | for j in [0.6, 0.7]: # iou-thres
269 | t = time.time()
270 | r = test(opt.cfg, opt.data, opt.weights, opt.batch_size, i, opt.conf_thres, j, opt.save_json)[0]
271 | y.append(r + (time.time() - t,))
272 | np.savetxt('benchmark.txt', y, fmt='%10.4g') # y = np.loadtxt('study.txt')
273 |
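
For this repository's traffic-light models, test.py is typically run from the command line (test() reads the global opt when no model is passed in, so it is not meant to be imported directly). A hypothetical invocation -- the cfg and .data paths exist in this repo, while the weights filename is an assumption (any trained .pt file under weights/ works):

    python test.py --cfg cfg/yolov3-spp-6cls.cfg --data data/traffic_light.data --weights weights/best.pt --img-size 512 --batch-size 16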
--------------------------------------------------------------------------------
/test_batch0_gt.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sovit-123/Traffic-Light-Detection-Using-YOLOv3/e04e11a5f240118a6d710e22c50bab28f5aafe19/test_batch0_gt.jpg
--------------------------------------------------------------------------------
/test_batch0_pred.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sovit-123/Traffic-Light-Detection-Using-YOLOv3/e04e11a5f240118a6d710e22c50bab28f5aafe19/test_batch0_pred.jpg
--------------------------------------------------------------------------------
/utils/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sovit-123/Traffic-Light-Detection-Using-YOLOv3/e04e11a5f240118a6d710e22c50bab28f5aafe19/utils/__init__.py
--------------------------------------------------------------------------------
/utils/adabound.py:
--------------------------------------------------------------------------------
1 | import math
2 |
3 | import torch
4 | from torch.optim.optimizer import Optimizer
5 |
6 |
7 | class AdaBound(Optimizer):
8 | """Implements AdaBound algorithm.
9 | It has been proposed in `Adaptive Gradient Methods with Dynamic Bound of Learning Rate`_.
10 | Arguments:
11 | params (iterable): iterable of parameters to optimize or dicts defining
12 | parameter groups
13 | lr (float, optional): Adam learning rate (default: 1e-3)
14 | betas (Tuple[float, float], optional): coefficients used for computing
15 | running averages of gradient and its square (default: (0.9, 0.999))
16 | final_lr (float, optional): final (SGD) learning rate (default: 0.1)
17 | gamma (float, optional): convergence speed of the bound functions (default: 1e-3)
18 | eps (float, optional): term added to the denominator to improve
19 | numerical stability (default: 1e-8)
20 | weight_decay (float, optional): weight decay (L2 penalty) (default: 0)
21 | amsbound (boolean, optional): whether to use the AMSBound variant of this algorithm
22 | .. Adaptive Gradient Methods with Dynamic Bound of Learning Rate:
23 | https://openreview.net/forum?id=Bkg3g2R9FX
24 | """
25 |
26 | def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), final_lr=0.1, gamma=1e-3,
27 | eps=1e-8, weight_decay=0, amsbound=False):
28 | if not 0.0 <= lr:
29 | raise ValueError("Invalid learning rate: {}".format(lr))
30 | if not 0.0 <= eps:
31 | raise ValueError("Invalid epsilon value: {}".format(eps))
32 | if not 0.0 <= betas[0] < 1.0:
33 | raise ValueError("Invalid beta parameter at index 0: {}".format(betas[0]))
34 | if not 0.0 <= betas[1] < 1.0:
35 | raise ValueError("Invalid beta parameter at index 1: {}".format(betas[1]))
36 | if not 0.0 <= final_lr:
37 | raise ValueError("Invalid final learning rate: {}".format(final_lr))
38 | if not 0.0 <= gamma < 1.0:
39 | raise ValueError("Invalid gamma parameter: {}".format(gamma))
40 | defaults = dict(lr=lr, betas=betas, final_lr=final_lr, gamma=gamma, eps=eps,
41 | weight_decay=weight_decay, amsbound=amsbound)
42 | super(AdaBound, self).__init__(params, defaults)
43 |
44 | self.base_lrs = list(map(lambda group: group['lr'], self.param_groups))
45 |
46 | def __setstate__(self, state):
47 | super(AdaBound, self).__setstate__(state)
48 | for group in self.param_groups:
49 | group.setdefault('amsbound', False)
50 |
51 | def step(self, closure=None):
52 | """Performs a single optimization step.
53 | Arguments:
54 | closure (callable, optional): A closure that reevaluates the model
55 | and returns the loss.
56 | """
57 | loss = None
58 | if closure is not None:
59 | loss = closure()
60 |
61 | for group, base_lr in zip(self.param_groups, self.base_lrs):
62 | for p in group['params']:
63 | if p.grad is None:
64 | continue
65 | grad = p.grad.data
66 | if grad.is_sparse:
67 | raise RuntimeError(
68 | 'Adam does not support sparse gradients, please consider SparseAdam instead')
69 | amsbound = group['amsbound']
70 |
71 | state = self.state[p]
72 |
73 | # State initialization
74 | if len(state) == 0:
75 | state['step'] = 0
76 | # Exponential moving average of gradient values
77 | state['exp_avg'] = torch.zeros_like(p.data)
78 | # Exponential moving average of squared gradient values
79 | state['exp_avg_sq'] = torch.zeros_like(p.data)
80 | if amsbound:
81 | # Maintains max of all exp. moving avg. of sq. grad. values
82 | state['max_exp_avg_sq'] = torch.zeros_like(p.data)
83 |
84 | exp_avg, exp_avg_sq = state['exp_avg'], state['exp_avg_sq']
85 | if amsbound:
86 | max_exp_avg_sq = state['max_exp_avg_sq']
87 | beta1, beta2 = group['betas']
88 |
89 | state['step'] += 1
90 |
91 | if group['weight_decay'] != 0:
92 | grad = grad.add(group['weight_decay'], p.data)
93 |
94 | # Decay the first and second moment running average coefficient
95 | exp_avg.mul_(beta1).add_(1 - beta1, grad)
96 | exp_avg_sq.mul_(beta2).addcmul_(1 - beta2, grad, grad)
97 | if amsbound:
98 | # Maintains the maximum of all 2nd moment running avg. till now
99 | torch.max(max_exp_avg_sq, exp_avg_sq, out=max_exp_avg_sq)
100 | # Use the max. for normalizing running avg. of gradient
101 | denom = max_exp_avg_sq.sqrt().add_(group['eps'])
102 | else:
103 | denom = exp_avg_sq.sqrt().add_(group['eps'])
104 |
105 | bias_correction1 = 1 - beta1 ** state['step']
106 | bias_correction2 = 1 - beta2 ** state['step']
107 | step_size = group['lr'] * math.sqrt(bias_correction2) / bias_correction1
108 |
109 | # Applies bounds on actual learning rate
110 | # lr_scheduler cannot affect final_lr, this is a workaround to apply lr decay
111 | final_lr = group['final_lr'] * group['lr'] / base_lr
112 | lower_bound = final_lr * (1 - 1 / (group['gamma'] * state['step'] + 1))
113 | upper_bound = final_lr * (1 + 1 / (group['gamma'] * state['step']))
114 | step_size = torch.full_like(denom, step_size)
115 | step_size.div_(denom).clamp_(lower_bound, upper_bound).mul_(exp_avg)
116 |
117 | p.data.add_(-step_size)
118 |
119 | return loss
120 |
121 |
122 | class AdaBoundW(Optimizer):
123 | """Implements AdaBound algorithm with Decoupled Weight Decay (arxiv.org/abs/1711.05101)
124 | It has been proposed in `Adaptive Gradient Methods with Dynamic Bound of Learning Rate`_.
125 | Arguments:
126 | params (iterable): iterable of parameters to optimize or dicts defining
127 | parameter groups
128 | lr (float, optional): Adam learning rate (default: 1e-3)
129 | betas (Tuple[float, float], optional): coefficients used for computing
130 | running averages of gradient and its square (default: (0.9, 0.999))
131 | final_lr (float, optional): final (SGD) learning rate (default: 0.1)
132 | gamma (float, optional): convergence speed of the bound functions (default: 1e-3)
133 | eps (float, optional): term added to the denominator to improve
134 | numerical stability (default: 1e-8)
135 | weight_decay (float, optional): weight decay (L2 penalty) (default: 0)
136 | amsbound (boolean, optional): whether to use the AMSBound variant of this algorithm
137 | .. Adaptive Gradient Methods with Dynamic Bound of Learning Rate:
138 | https://openreview.net/forum?id=Bkg3g2R9FX
139 | """
140 |
141 | def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), final_lr=0.1, gamma=1e-3,
142 | eps=1e-8, weight_decay=0, amsbound=False):
143 | if not 0.0 <= lr:
144 | raise ValueError("Invalid learning rate: {}".format(lr))
145 | if not 0.0 <= eps:
146 | raise ValueError("Invalid epsilon value: {}".format(eps))
147 | if not 0.0 <= betas[0] < 1.0:
148 | raise ValueError("Invalid beta parameter at index 0: {}".format(betas[0]))
149 | if not 0.0 <= betas[1] < 1.0:
150 | raise ValueError("Invalid beta parameter at index 1: {}".format(betas[1]))
151 | if not 0.0 <= final_lr:
152 | raise ValueError("Invalid final learning rate: {}".format(final_lr))
153 | if not 0.0 <= gamma < 1.0:
154 | raise ValueError("Invalid gamma parameter: {}".format(gamma))
155 | defaults = dict(lr=lr, betas=betas, final_lr=final_lr, gamma=gamma, eps=eps,
156 | weight_decay=weight_decay, amsbound=amsbound)
157 | super(AdaBoundW, self).__init__(params, defaults)
158 |
159 | self.base_lrs = list(map(lambda group: group['lr'], self.param_groups))
160 |
161 | def __setstate__(self, state):
162 | super(AdaBoundW, self).__setstate__(state)
163 | for group in self.param_groups:
164 | group.setdefault('amsbound', False)
165 |
166 | def step(self, closure=None):
167 | """Performs a single optimization step.
168 | Arguments:
169 | closure (callable, optional): A closure that reevaluates the model
170 | and returns the loss.
171 | """
172 | loss = None
173 | if closure is not None:
174 | loss = closure()
175 |
176 | for group, base_lr in zip(self.param_groups, self.base_lrs):
177 | for p in group['params']:
178 | if p.grad is None:
179 | continue
180 | grad = p.grad.data
181 | if grad.is_sparse:
182 | raise RuntimeError(
183 | 'Adam does not support sparse gradients, please consider SparseAdam instead')
184 | amsbound = group['amsbound']
185 |
186 | state = self.state[p]
187 |
188 | # State initialization
189 | if len(state) == 0:
190 | state['step'] = 0
191 | # Exponential moving average of gradient values
192 | state['exp_avg'] = torch.zeros_like(p.data)
193 | # Exponential moving average of squared gradient values
194 | state['exp_avg_sq'] = torch.zeros_like(p.data)
195 | if amsbound:
196 | # Maintains max of all exp. moving avg. of sq. grad. values
197 | state['max_exp_avg_sq'] = torch.zeros_like(p.data)
198 |
199 | exp_avg, exp_avg_sq = state['exp_avg'], state['exp_avg_sq']
200 | if amsbound:
201 | max_exp_avg_sq = state['max_exp_avg_sq']
202 | beta1, beta2 = group['betas']
203 |
204 | state['step'] += 1
205 |
206 | # Decay the first and second moment running average coefficient
207 | exp_avg.mul_(beta1).add_(1 - beta1, grad)
208 | exp_avg_sq.mul_(beta2).addcmul_(1 - beta2, grad, grad)
209 | if amsbound:
210 | # Maintains the maximum of all 2nd moment running avg. till now
211 | torch.max(max_exp_avg_sq, exp_avg_sq, out=max_exp_avg_sq)
212 | # Use the max. for normalizing running avg. of gradient
213 | denom = max_exp_avg_sq.sqrt().add_(group['eps'])
214 | else:
215 | denom = exp_avg_sq.sqrt().add_(group['eps'])
216 |
217 | bias_correction1 = 1 - beta1 ** state['step']
218 | bias_correction2 = 1 - beta2 ** state['step']
219 | step_size = group['lr'] * math.sqrt(bias_correction2) / bias_correction1
220 |
221 | # Applies bounds on actual learning rate
222 | # lr_scheduler cannot affect final_lr, this is a workaround to apply lr decay
223 | final_lr = group['final_lr'] * group['lr'] / base_lr
224 | lower_bound = final_lr * (1 - 1 / (group['gamma'] * state['step'] + 1))
225 | upper_bound = final_lr * (1 + 1 / (group['gamma'] * state['step']))
226 | step_size = torch.full_like(denom, step_size)
227 | step_size.div_(denom).clamp_(lower_bound, upper_bound).mul_(exp_avg)
228 |
229 | if group['weight_decay'] != 0:
230 | decayed_weights = torch.mul(p.data, group['weight_decay'])
231 | p.data.add_(-step_size)
232 | p.data.sub_(decayed_weights)
233 | else:
234 | p.data.add_(-step_size)
235 |
236 | return loss
237 |
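
A minimal, self-contained sketch of driving AdaBound on a toy model (purely illustrative of the optimizer API above; it assumes a PyTorch version that still accepts the deprecated add_/addcmul_ call signatures used in step()):

    import torch
    import torch.nn as nn
    from utils.adabound import AdaBound

    model = nn.Linear(10, 2)                                    # toy model
    optimizer = AdaBound(model.parameters(), lr=1e-3, final_lr=0.1)

    x, y = torch.randn(4, 10), torch.randn(4, 2)
    loss = nn.functional.mse_loss(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()   # Adam-style step, clamped between AdaBound's lower/upper learning-rate bounds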
--------------------------------------------------------------------------------
/utils/evolve.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 | #for i in 0 1 2 3
3 | #do
4 | # t=ultralytics/yolov3:v139 && sudo docker pull $t && sudo nvidia-docker run -d --ipc=host -v "$(pwd)"/coco:/usr/src/coco $t utils/evolve.sh $i
5 | # sleep 30
6 | #done
7 |
8 | while true; do
9 | # python3 train.py --data ../data/sm4/out.data --img-size 320 --epochs 100 --batch 64 --accum 1 --weights yolov3-tiny.conv.15 --multi --bucket ult/wer --evolve --cache --device $1 --cfg yolov3-tiny3-1cls.cfg --single --adam
10 | # python3 train.py --data ../out/data.data --img-size 608 --epochs 10 --batch 8 --accum 8 --weights ultralytics68.pt --multi --bucket ult/athena --evolve --device $1 --cfg yolov3-spp-1cls.cfg
11 |
12 | python3 train.py --data coco2014.data --img-size 512 608 --epochs 27 --batch 8 --accum 8 --evolve --weights '' --bucket ult/coco/sppa_512 --device $1 --cfg yolov3-sppa.cfg --multi
13 | done
14 |
15 |
16 | # coco epoch times --img-size 416 608 --epochs 27 --batch 16 --accum 4
17 | # 36:34 2080ti
18 | # 21:58 V100
19 | # 63:00 T4
--------------------------------------------------------------------------------
/utils/gcp.sh:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env bash
2 |
3 | # New VM
4 | rm -rf sample_data yolov3
5 | git clone https://github.com/ultralytics/yolov3
6 | # git clone -b test --depth 1 https://github.com/ultralytics/yolov3 test # branch
7 | # sudo apt-get install zip
8 | #git clone https://github.com/NVIDIA/apex && cd apex && pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" . --user && cd .. && rm -rf apex
9 | sudo conda install -yc conda-forge scikit-image pycocotools
10 | # python3 -c "from yolov3.utils.google_utils import gdrive_download; gdrive_download('193Zp_ye-3qXMonR1nZj3YyxMtQkMy50k','coco2014.zip')"
11 | python3 -c "from yolov3.utils.google_utils import gdrive_download; gdrive_download('1WQT6SOktSe8Uw6r10-2JhbEhMY5DJaph','coco2017.zip')"
12 | python3 -c "from yolov3.utils.google_utils import gdrive_download; gdrive_download('1C3HewOG9akA3y456SZLBJZfNDPkBwAto','knife.zip')"
13 | python3 -c "from yolov3.utils.google_utils import gdrive_download; gdrive_download('13g3LqdpkNE8sPosVJT6KFXlfoMypzRP4','sm4.zip')"
14 | sudo shutdown
15 |
16 | # Mount local SSD
17 | lsblk
18 | sudo mkfs.ext4 -F /dev/nvme0n1
19 | sudo mkdir -p /mnt/disks/nvme0n1
20 | sudo mount /dev/nvme0n1 /mnt/disks/nvme0n1
21 | sudo chmod a+w /mnt/disks/nvme0n1
22 | cp -r coco /mnt/disks/nvme0n1
23 |
24 | # Kill All
25 | t=ultralytics/yolov3:v1
26 | docker kill $(docker ps -a -q --filter ancestor=$t)
27 |
28 | # Evolve coco
29 | sudo -s
30 | t=ultralytics/yolov3:evolve
31 | # docker kill $(docker ps -a -q --filter ancestor=$t)
32 | for i in 0 1 6 7
33 | do
34 | docker pull $t && docker run --gpus all -d --ipc=host -v "$(pwd)"/coco:/usr/src/coco $t bash utils/evolve.sh $i
35 | sleep 30
36 | done
37 |
38 | #COCO training
39 | n=131 && t=ultralytics/coco:v131 && sudo docker pull $t && sudo docker run -it --gpus all --ipc=host -v "$(pwd)"/coco:/usr/src/coco $t python3 train.py --data coco2014.data --img-size 320 640 --epochs 300 --batch 16 --weights '' --device 0 --cfg yolov3-spp.cfg --bucket ult/coco --name $n && sudo shutdown
40 | n=132 && t=ultralytics/coco:v131 && sudo docker pull $t && sudo docker run -it --gpus all --ipc=host -v "$(pwd)"/coco:/usr/src/coco $t python3 train.py --data coco2014.data --img-size 320 640 --epochs 300 --batch 64 --weights '' --device 0 --cfg yolov3-tiny.cfg --bucket ult/coco --name $n && sudo shutdown
41 |
--------------------------------------------------------------------------------
/utils/google_utils.py:
--------------------------------------------------------------------------------
1 | # This file contains google utils: https://cloud.google.com/storage/docs/reference/libraries
2 | # pip install --upgrade google-cloud-storage
3 |
4 | import os
5 | import time
6 |
7 |
8 | # from google.cloud import storage
9 |
10 |
11 | def gdrive_download(id='1HaXkef9z6y5l4vUnCYgdmEAj61c6bfWO', name='coco.zip'):
12 | # https://gist.github.com/tanaikech/f0f2d122e05bf5f971611258c22c110f
13 | # Downloads a file from Google Drive, accepting presented query
14 | # from utils.google_utils import *; gdrive_download()
15 | t = time.time()
16 |
17 | print('Downloading https://drive.google.com/uc?export=download&id=%s as %s... ' % (id, name), end='')
18 | os.remove(name) if os.path.exists(name) else None # remove existing
19 | os.remove('cookie') if os.path.exists('cookie') else None
20 |
21 | # Attempt file download
22 | os.system("curl -c ./cookie -s -L \"https://drive.google.com/uc?export=download&id=%s\" > /dev/null" % id)
23 | if os.path.exists('cookie'): # large file
24 | s = "curl -Lb ./cookie \"https://drive.google.com/uc?export=download&confirm=`awk '/download/ {print $NF}' ./cookie`&id=%s\" -o %s" % (
25 | id, name)
26 | else: # small file
27 | s = "curl -s -L -o %s 'https://drive.google.com/uc?export=download&id=%s'" % (name, id)
28 | r = os.system(s) # execute, capture return values
29 | os.remove('cookie') if os.path.exists('cookie') else None
30 |
31 | # Error check
32 | if r != 0:
33 | os.remove(name) if os.path.exists(name) else None # remove partial
34 | print('Download error ') # raise Exception('Download error')
35 | return r
36 |
37 | # Unzip if archive
38 | if name.endswith('.zip'):
39 | print('unzipping... ', end='')
40 | os.system('unzip -q %s' % name) # unzip
41 | os.remove(name) # remove zip to free space
42 |
43 | print('Done (%.1fs)' % (time.time() - t))
44 | return r
45 |
46 |
47 | def upload_blob(bucket_name, source_file_name, destination_blob_name):
48 | # Uploads a file to a bucket
49 | # https://cloud.google.com/storage/docs/uploading-objects#storage-upload-object-python
50 |
51 | storage_client = storage.Client()
52 | bucket = storage_client.get_bucket(bucket_name)
53 | blob = bucket.blob(destination_blob_name)
54 |
55 | blob.upload_from_filename(source_file_name)
56 |
57 | print('File {} uploaded to {}.'.format(
58 | source_file_name,
59 | destination_blob_name))
60 |
61 |
62 | def download_blob(bucket_name, source_blob_name, destination_file_name):
63 |     # Downloads a blob from a bucket to a local file
64 | storage_client = storage.Client()
65 | bucket = storage_client.get_bucket(bucket_name)
66 | blob = bucket.blob(source_blob_name)
67 |
68 | blob.download_to_filename(destination_file_name)
69 |
70 | print('Blob {} downloaded to {}.'.format(
71 | source_blob_name,
72 | destination_file_name))
73 |
--------------------------------------------------------------------------------
/utils/layers.py:
--------------------------------------------------------------------------------
1 | import torch.nn.functional as F
2 |
3 | from utils.utils import *
4 |
5 |
6 | def make_divisible(v, divisor):
7 |     # Returns v rounded up to the nearest multiple of divisor, so layer channel counts stay divisible (e.g. by 8)
8 | # https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet/mobilenet.py
9 | return math.ceil(v / divisor) * divisor
10 |
11 |
12 | class Flatten(nn.Module):
13 | # Use after nn.AdaptiveAvgPool2d(1) to remove last 2 dimensions
14 | def forward(self, x):
15 | return x.view(x.size(0), -1)
16 |
17 |
18 | class Concat(nn.Module):
19 | # Concatenate a list of tensors along dimension
20 | def __init__(self, dimension=1):
21 | super(Concat, self).__init__()
22 | self.d = dimension
23 |
24 | def forward(self, x):
25 | return torch.cat(x, self.d)
26 |
27 |
28 | class FeatureConcat(nn.Module):
29 | def __init__(self, layers):
30 | super(FeatureConcat, self).__init__()
31 | self.layers = layers # layer indices
32 | self.multiple = len(layers) > 1 # multiple layers flag
33 |
34 | def forward(self, x, outputs):
35 | return torch.cat([outputs[i] for i in self.layers], 1) if self.multiple else outputs[self.layers[0]]
36 |
37 |
38 | class WeightedFeatureFusion(nn.Module): # weighted sum of 2 or more layers https://arxiv.org/abs/1911.09070
39 | def __init__(self, layers, weight=False):
40 | super(WeightedFeatureFusion, self).__init__()
41 | self.layers = layers # layer indices
42 | self.weight = weight # apply weights boolean
43 | self.n = len(layers) + 1 # number of layers
44 | if weight:
45 | self.w = nn.Parameter(torch.zeros(self.n), requires_grad=True) # layer weights
46 |
47 | def forward(self, x, outputs):
48 | # Weights
49 | if self.weight:
50 | w = torch.sigmoid(self.w) * (2 / self.n) # sigmoid weights (0-1)
51 | x = x * w[0]
52 |
53 | # Fusion
54 | nx = x.shape[1] # input channels
55 | for i in range(self.n - 1):
56 | a = outputs[self.layers[i]] * w[i + 1] if self.weight else outputs[self.layers[i]] # feature to add
57 | na = a.shape[1] # feature channels
58 |
59 | # Adjust channels
60 | if nx == na: # same shape
61 | x = x + a
62 | elif nx > na: # slice input
63 | x[:, :na] = x[:, :na] + a # or a = nn.ZeroPad2d((0, 0, 0, 0, 0, dc))(a); x = x + a
64 | else: # slice feature
65 | x = x + a[:, :nx]
66 |
67 | return x
68 |
69 |
70 | class MixConv2d(nn.Module): # MixConv: Mixed Depthwise Convolutional Kernels https://arxiv.org/abs/1907.09595
71 | def __init__(self, in_ch, out_ch, k=(3, 5, 7), stride=1, dilation=1, bias=True, method='equal_params'):
72 | super(MixConv2d, self).__init__()
73 |
74 | groups = len(k)
75 | if method == 'equal_ch': # equal channels per group
76 | i = torch.linspace(0, groups - 1E-6, out_ch).floor() # out_ch indices
77 | ch = [(i == g).sum() for g in range(groups)]
78 | else: # 'equal_params': equal parameter count per group
79 | b = [out_ch] + [0] * groups
80 | a = np.eye(groups + 1, groups, k=-1)
81 | a -= np.roll(a, 1, axis=1)
82 | a *= np.array(k) ** 2
83 | a[0] = 1
84 | ch = np.linalg.lstsq(a, b, rcond=None)[0].round().astype(int) # solve for equal weight indices, ax = b
85 |
86 | self.m = nn.ModuleList([nn.Conv2d(in_channels=in_ch,
87 | out_channels=ch[g],
88 | kernel_size=k[g],
89 | stride=stride,
90 | padding=k[g] // 2, # 'same' pad
91 | dilation=dilation,
92 | bias=bias) for g in range(groups)])
93 |
94 | def forward(self, x):
95 | return torch.cat([m(x) for m in self.m], 1)
96 |
97 |
98 | # Activation functions below -------------------------------------------------------------------------------------------
99 | class SwishImplementation(torch.autograd.Function):
100 | @staticmethod
101 | def forward(ctx, x):
102 | ctx.save_for_backward(x)
103 | return x * torch.sigmoid(x)
104 |
105 | @staticmethod
106 | def backward(ctx, grad_output):
107 | x = ctx.saved_tensors[0]
108 | sx = torch.sigmoid(x) # sigmoid(ctx)
109 | return grad_output * (sx * (1 + x * (1 - sx)))
110 |
111 |
112 | class MishImplementation(torch.autograd.Function):
113 | @staticmethod
114 | def forward(ctx, x):
115 | ctx.save_for_backward(x)
116 | return x.mul(torch.tanh(F.softplus(x))) # x * tanh(ln(1 + exp(x)))
117 |
118 | @staticmethod
119 | def backward(ctx, grad_output):
120 | x = ctx.saved_tensors[0]
121 | sx = torch.sigmoid(x)
122 | fx = F.softplus(x).tanh()
123 | return grad_output * (fx + x * sx * (1 - fx * fx))
124 |
125 |
126 | class MemoryEfficientSwish(nn.Module):
127 | def forward(self, x):
128 | return SwishImplementation.apply(x)
129 |
130 |
131 | class MemoryEfficientMish(nn.Module):
132 | def forward(self, x):
133 | return MishImplementation.apply(x)
134 |
135 |
136 | class Swish(nn.Module):
137 | def forward(self, x):
138 | return x * torch.sigmoid(x)
139 |
140 |
141 | class HardSwish(nn.Module): # https://arxiv.org/pdf/1905.02244.pdf
142 | def forward(self, x):
143 | return x * F.hardtanh(x + 3, 0., 6., True) / 6.
144 |
145 |
146 | class Mish(nn.Module): # https://github.com/digantamisra98/Mish
147 | def forward(self, x):
148 | return x * F.softplus(x).tanh()
149 |
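
A short sketch exercising two of the blocks above (shapes are illustrative; with the default 'equal_params' method the three kernel groups are sized so their concatenation gives back out_ch channels):

    import torch
    from utils.layers import MixConv2d, Mish

    m = MixConv2d(in_ch=32, out_ch=64, k=(3, 5, 7))
    x = torch.randn(1, 32, 64, 64)
    print(m(x).shape)                                # torch.Size([1, 64, 64, 64]) -- parallel 3/5/7 convs, concatenated on channels
    print(Mish()(torch.tensor([-1.0, 0.0, 1.0])))    # x * tanh(softplus(x))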
--------------------------------------------------------------------------------
/utils/parse_config.py:
--------------------------------------------------------------------------------
1 | import os
2 |
3 | import numpy as np
4 |
5 |
6 | def parse_model_cfg(path):
7 | # Parse the yolo *.cfg file and return module definitions path may be 'cfg/yolov3.cfg', 'yolov3.cfg', or 'yolov3'
8 | if not path.endswith('.cfg'): # add .cfg suffix if omitted
9 | path += '.cfg'
10 | if not os.path.exists(path) and os.path.exists('cfg' + os.sep + path): # add cfg/ prefix if omitted
11 | path = 'cfg' + os.sep + path
12 |
13 | with open(path, 'r') as f:
14 | lines = f.read().split('\n')
15 | lines = [x for x in lines if x and not x.startswith('#')]
16 | lines = [x.rstrip().lstrip() for x in lines] # get rid of fringe whitespaces
17 | mdefs = [] # module definitions
18 | for line in lines:
19 | if line.startswith('['): # This marks the start of a new block
20 | mdefs.append({})
21 | mdefs[-1]['type'] = line[1:-1].rstrip()
22 | if mdefs[-1]['type'] == 'convolutional':
23 | mdefs[-1]['batch_normalize'] = 0 # pre-populate with zeros (may be overwritten later)
24 | else:
25 | key, val = line.split("=")
26 | key = key.rstrip()
27 |
28 | if key == 'anchors': # return nparray
29 | mdefs[-1][key] = np.array([float(x) for x in val.split(',')]).reshape((-1, 2)) # np anchors
30 | elif (key in ['from', 'layers', 'mask']) or (key == 'size' and ',' in val): # return array
31 | mdefs[-1][key] = [int(x) for x in val.split(',')]
32 | else:
33 | val = val.strip()
34 | # TODO: .isnumeric() actually fails to get the float case
35 | if val.isnumeric(): # return int or float
36 | mdefs[-1][key] = int(val) if (int(val) - float(val)) == 0 else float(val)
37 | else:
38 | mdefs[-1][key] = val # return string
39 |
40 | # Check all fields are supported
41 | supported = ['type', 'batch_normalize', 'filters', 'size', 'stride', 'pad', 'activation', 'layers', 'groups',
42 | 'from', 'mask', 'anchors', 'classes', 'num', 'jitter', 'ignore_thresh', 'truth_thresh', 'random',
43 | 'stride_x', 'stride_y', 'weights_type', 'weights_normalization', 'scale_x_y', 'beta_nms', 'nms_kind',
44 | 'iou_loss', 'iou_normalizer', 'cls_normalizer', 'iou_thresh', 'probability']
45 |
46 | f = [] # fields
47 | for x in mdefs[1:]:
48 | [f.append(k) for k in x if k not in f]
49 | u = [x for x in f if x not in supported] # unsupported fields
50 | assert not any(u), "Unsupported fields %s in %s. See https://github.com/ultralytics/yolov3/issues/631" % (u, path)
51 |
52 | return mdefs
53 |
54 |
55 | def parse_data_cfg(path):
56 | # Parses the data configuration file
57 | if not os.path.exists(path) and os.path.exists('data' + os.sep + path): # add data/ prefix if omitted
58 | path = 'data' + os.sep + path
59 |
60 | with open(path, 'r') as f:
61 | lines = f.readlines()
62 |
63 | options = dict()
64 | for line in lines:
65 | line = line.strip()
66 | if line == '' or line.startswith('#'):
67 | continue
68 | key, val = line.split('=')
69 | options[key.strip()] = val.strip()
70 |
71 | return options
72 |
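
A sketch of parse_data_cfg on a hypothetical .data file (the key=value format matches the parser above; the actual contents of this repo's data/traffic_light.data are not reproduced here):

    # example.data (hypothetical):
    #   classes=6
    #   train=data/train.txt
    #   valid=data/val.txt
    #   names=data/traffic_light.names
    from utils.parse_config import parse_data_cfg

    opts = parse_data_cfg('example.data')   # 'data/' is prepended automatically if the bare path does not exist
    print(int(opts['classes']), opts['valid'], opts['names'])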
--------------------------------------------------------------------------------
/utils/torch_utils.py:
--------------------------------------------------------------------------------
1 | import math
2 | import os
3 | import time
4 | from copy import deepcopy
5 |
6 | import torch
7 | import torch.backends.cudnn as cudnn
8 | import torch.nn as nn
9 | import torch.nn.functional as F
10 |
11 |
12 | def init_seeds(seed=0):
13 | torch.manual_seed(seed)
14 |
15 |     # These cudnn settings favor speed over reproducibility (benchmark mode is non-deterministic) # https://pytorch.org/docs/stable/notes/randomness.html
16 | if seed == 0:
17 | cudnn.deterministic = False
18 | cudnn.benchmark = True
19 |
20 |
21 | def select_device(device='', apex=False, batch_size=None):
22 | # device = 'cpu' or '0' or '0,1,2,3'
23 | cpu_request = device.lower() == 'cpu'
24 | if device and not cpu_request: # if device requested other than 'cpu'
25 | os.environ['CUDA_VISIBLE_DEVICES'] = device # set environment variable
26 | assert torch.cuda.is_available(), 'CUDA unavailable, invalid device %s requested' % device # check availablity
27 |
28 | cuda = False if cpu_request else torch.cuda.is_available()
29 | if cuda:
30 | c = 1024 ** 2 # bytes to MB
31 | ng = torch.cuda.device_count()
32 | if ng > 1 and batch_size: # check that batch_size is compatible with device_count
33 | assert batch_size % ng == 0, 'batch-size %g not multiple of GPU count %g' % (batch_size, ng)
34 | x = [torch.cuda.get_device_properties(i) for i in range(ng)]
35 | s = 'Using CUDA ' + ('Apex ' if apex else '') # apex for mixed precision https://github.com/NVIDIA/apex
36 | for i in range(0, ng):
37 | if i == 1:
38 | s = ' ' * len(s)
39 | print("%sdevice%g _CudaDeviceProperties(name='%s', total_memory=%dMB)" %
40 | (s, i, x[i].name, x[i].total_memory / c))
41 | else:
42 | print('Using CPU')
43 |
44 | print('') # skip a line
45 | return torch.device('cuda:0' if cuda else 'cpu')
46 |
47 |
48 | def time_synchronized():
49 | torch.cuda.synchronize() if torch.cuda.is_available() else None
50 | return time.time()
51 |
52 |
53 | def initialize_weights(model):
54 | for m in model.modules():
55 | t = type(m)
56 | if t is nn.Conv2d:
57 | pass # nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
58 | elif t is nn.BatchNorm2d:
59 | m.eps = 1e-4
60 | m.momentum = 0.03
61 | elif t in [nn.LeakyReLU, nn.ReLU, nn.ReLU6]:
62 | m.inplace = True
63 |
64 |
65 | def find_modules(model, mclass=nn.Conv2d):
66 | # finds layer indices matching module class 'mclass'
67 | return [i for i, m in enumerate(model.module_list) if isinstance(m, mclass)]
68 |
69 |
70 | def fuse_conv_and_bn(conv, bn):
71 | # https://tehnokv.com/posts/fusing-batchnorm-and-conv/
72 | with torch.no_grad():
73 | # init
74 | fusedconv = torch.nn.Conv2d(conv.in_channels,
75 | conv.out_channels,
76 | kernel_size=conv.kernel_size,
77 | stride=conv.stride,
78 | padding=conv.padding,
79 | bias=True)
80 |
81 | # prepare filters
82 | w_conv = conv.weight.clone().view(conv.out_channels, -1)
83 | w_bn = torch.diag(bn.weight.div(torch.sqrt(bn.eps + bn.running_var)))
84 | fusedconv.weight.copy_(torch.mm(w_bn, w_conv).view(fusedconv.weight.size()))
85 |
86 | # prepare spatial bias
87 | if conv.bias is not None:
88 | b_conv = conv.bias
89 | else:
90 | b_conv = torch.zeros(conv.weight.size(0))
91 | b_bn = bn.bias - bn.weight.mul(bn.running_mean).div(torch.sqrt(bn.running_var + bn.eps))
92 | fusedconv.bias.copy_(torch.mm(w_bn, b_conv.reshape(-1, 1)).reshape(-1) + b_bn)
93 |
94 | return fusedconv
95 |
96 |
97 | def model_info(model, verbose=False):
98 |     # Prints a summary of a PyTorch model (layer/parameter/gradient counts; per-parameter details if verbose)
99 | n_p = sum(x.numel() for x in model.parameters()) # number parameters
100 | n_g = sum(x.numel() for x in model.parameters() if x.requires_grad) # number gradients
101 | if verbose:
102 | print('%5s %40s %9s %12s %20s %10s %10s' % ('layer', 'name', 'gradient', 'parameters', 'shape', 'mu', 'sigma'))
103 | for i, (name, p) in enumerate(model.named_parameters()):
104 | name = name.replace('module_list.', '')
105 | print('%5g %40s %9s %12g %20s %10.3g %10.3g' %
106 | (i, name, p.requires_grad, p.numel(), list(p.shape), p.mean(), p.std()))
107 |
108 | try: # FLOPS
109 | from thop import profile
110 | macs, _ = profile(model, inputs=(torch.zeros(1, 3, 480, 640),), verbose=False)
111 | fs = ', %.1f GFLOPS' % (macs / 1E9 * 2)
112 | except:
113 | fs = ''
114 |
115 | print('Model Summary: %g layers, %g parameters, %g gradients%s' % (len(list(model.parameters())), n_p, n_g, fs))
116 |
117 |
118 | def load_classifier(name='resnet101', n=2):
119 | # Loads a pretrained model reshaped to n-class output
120 | import pretrainedmodels # https://github.com/Cadene/pretrained-models.pytorch#torchvision
121 | model = pretrainedmodels.__dict__[name](num_classes=1000, pretrained='imagenet')
122 |
123 | # Display model properties
124 | for x in ['model.input_size', 'model.input_space', 'model.input_range', 'model.mean', 'model.std']:
125 | print(x + ' =', eval(x))
126 |
127 | # Reshape output to n classes
128 | filters = model.last_linear.weight.shape[1]
129 | model.last_linear.bias = torch.nn.Parameter(torch.zeros(n))
130 | model.last_linear.weight = torch.nn.Parameter(torch.zeros(n, filters))
131 | model.last_linear.out_features = n
132 | return model
133 |
134 |
135 | def scale_img(img, ratio=1.0, same_shape=True): # img(16,3,256,416), r=ratio
136 | # scales img(bs,3,y,x) by ratio
137 | h, w = img.shape[2:]
138 | s = (int(h * ratio), int(w * ratio)) # new size
139 | img = F.interpolate(img, size=s, mode='bilinear', align_corners=False) # resize
140 | if not same_shape: # pad/crop img
141 | gs = 64 # (pixels) grid size
142 | h, w = [math.ceil(x * ratio / gs) * gs for x in (h, w)]
143 | return F.pad(img, [0, w - s[1], 0, h - s[0]], value=0.447) # value = imagenet mean
144 |
145 |
146 | class ModelEMA:
147 | """ Model Exponential Moving Average from https://github.com/rwightman/pytorch-image-models
148 | Keep a moving average of everything in the model state_dict (parameters and buffers).
149 | This is intended to allow functionality like
150 | https://www.tensorflow.org/api_docs/python/tf/train/ExponentialMovingAverage
151 | A smoothed version of the weights is necessary for some training schemes to perform well.
152 | E.g. Google's hyper-params for training MNASNet, MobileNet-V3, EfficientNet, etc that use
153 | RMSprop with a short 2.4-3 epoch decay period and slow LR decay rate of .96-.99 requires EMA
154 | smoothing of weights to match results. Pay attention to the decay constant you are using
155 | relative to your update count per epoch.
156 | To keep EMA from using GPU resources, set device='cpu'. This will save a bit of memory but
157 | disable validation of the EMA weights. Validation will have to be done manually in a separate
158 | process, or after the training stops converging.
159 | This class is sensitive where it is initialized in the sequence of model init,
160 | GPU assignment and distributed training wrappers.
161 | I've tested with the sequence in my own train.py for torch.DataParallel, apex.DDP, and single-GPU.
162 | """
163 |
164 | def __init__(self, model, decay=0.9999, device=''):
165 | # make a copy of the model for accumulating moving average of weights
166 | self.ema = deepcopy(model)
167 | self.ema.eval()
168 | self.updates = 0 # number of EMA updates
169 | self.decay = lambda x: decay * (1 - math.exp(-x / 2000)) # decay exponential ramp (to help early epochs)
170 | self.device = device # perform ema on different device from model if set
171 | if device:
172 | self.ema.to(device=device)
173 | for p in self.ema.parameters():
174 | p.requires_grad_(False)
175 |
176 | def update(self, model):
177 | self.updates += 1
178 | d = self.decay(self.updates)
179 | with torch.no_grad():
180 | if type(model) in (nn.parallel.DataParallel, nn.parallel.DistributedDataParallel):
181 | msd, esd = model.module.state_dict(), self.ema.module.state_dict()
182 | else:
183 | msd, esd = model.state_dict(), self.ema.state_dict()
184 |
185 | for k, v in esd.items():
186 | if v.dtype.is_floating_point:
187 | v *= d
188 | v += (1. - d) * msd[k].detach()
189 |
190 | def update_attr(self, model):
191 | # Assign attributes (which may change during training)
192 | for k in model.__dict__.keys():
193 | if not k.startswith('_'):
194 | setattr(self.ema, k, getattr(model, k))
195 |
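
A minimal sketch of wiring ModelEMA into a training loop, on a toy model (this only mirrors the class docstring; how this repo's train.py actually uses it is not shown in this file):

    import torch
    import torch.nn as nn
    from utils.torch_utils import ModelEMA

    model = nn.Linear(8, 1)                                  # toy model
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    ema = ModelEMA(model, decay=0.9999)

    for _ in range(3):                                       # stand-in for a real training loop
        x, y = torch.randn(16, 8), torch.randn(16, 1)
        loss = nn.functional.mse_loss(model(x), y)
        opt.zero_grad(); loss.backward(); opt.step()
        ema.update(model)                                    # blend the current weights into the EMA copy

    smoothed = ema.ema                                       # evaluate/checkpoint with the smoothed weights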
--------------------------------------------------------------------------------
/weights/readme.txt:
--------------------------------------------------------------------------------
1 | Put your .pt weight files here.
--------------------------------------------------------------------------------