├── Benchmarks_retinanet.xlsx
├── LICENSE
├── README.md
├── challenge2018
│   └── challenge-2018-class-descriptions-500.csv
├── images
│   ├── test
│   │   ├── 00000b4dcff7f799.jpg
│   │   └── 0000d67245642c5f.jpg
│   └── train
│       ├── 0000b86e2fd18333.jpg
│       └── 0000b9115cdf1e54.jpg
├── keras_retinanet
│   ├── .gitignore
│   ├── callbacks
│   │   └── callbacks.py
│   ├── models
│   │   ├── classifier.py
│   │   ├── model_backbone.py
│   │   ├── resnet.py
│   │   └── retinanet.py
│   ├── preprocessing
│   │   ├── generator.py
│   │   ├── image.py
│   │   └── open_images.py
│   ├── setup.py
│   ├── trainer
│   │   ├── convert_model.py
│   │   ├── evaluate.py
│   │   ├── model.py
│   │   └── task.py
│   └── utils
│       ├── anchors.py
│       ├── clean.py
│       ├── freeze.py
│       ├── initializers.py
│       ├── layers.py
│       └── losses.py
└── logo
    ├── keras-logo-2018-large-1200.png
    └── share2.jpg
/Benchmarks_retinanet.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mukeshmithrakumar/RetinaNet/1459aee0c07693ce813a77f97c3e3889cb3c4826/Benchmarks_retinanet.xlsx
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2018 Mukesh Mithrakumar
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 |
2 |
3 | Keras RetinaNet
4 |
27 | What is it :question:
28 |
29 | This is the Keras implementation of RetinaNet for object detection as described in
30 | [Focal Loss for Dense Object Detection](https://arxiv.org/abs/1708.02002)
31 | by Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He and Piotr Dollár.
32 |
33 | If this repository helps you in any way, show your love :heart: by putting a :star: on this project :v:
34 |
35 |
36 | ##### Object Detection:
37 | The RetinaNet used is a single, unified network composed of a ResNet50 backbone network and two task-specific
38 | subnetworks. The backbone is responsible for computing a convolutional feature map over the entire input image and is
39 | an off-the-shelf convolutional network. The first subnet performs classification on the backbone's output; the second
40 | subnet performs convolutional bounding box regression.
41 | RetinaNet is a good model for object detection, but getting it to work was a challenge. I underestimated the large
42 | number of classes and the size of the data set, but with some tweaks was still able to land a bronze medal (Top 20%)
43 | among 450 competitors. The benchmark file is included for reference with the local scores for the predictions and
44 | the parameters used.
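
As a rough sketch (assuming the `keras_retinanet` package is importable; weights and generators are handled separately), the training model with a ResNet50 backbone can be built from the modules in this repository like this:

```
from keras_retinanet.models.resnet import resnet50_retinanet

# Build a RetinaNet training model with a ResNet50 backbone for the
# 500 Open Images challenge classes; ImageNet weights can be fetched
# separately via the backbone's download_imagenet() helper.
model = resnet50_retinanet(num_classes=500)
model.summary()
```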
45 |
46 | ##### Visual Relationship:
47 | I focused on object detection and used a simple multi-class classifier for relationship prediction. Instead of the
48 | usual approach of using an LSTM, I experimented with a Random Forest Classifier and a Multi Output Classifier from
49 | sklearn, mainly to show that an LSTM is just a statistical tool and is not strictly necessary here. The
50 | local classification scores backed this up, giving an accuracy greater than 90%. Since the visual relationship
51 | predictions depend on how well the object detector performs, I was not able to get a better score, but with this
52 | model I was able to land a bronze medal (Top 30%) among 230 competitors.
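
A minimal usage sketch of the relationship classifier in ```classifier.py``` (the path below is a placeholder; it assumes the relationship triplet annotations sit in the `challenge2018` folder):

```
from keras_retinanet.models.classifier import vr_bb_classifier

# Fits a multi-output Random Forest on the relationship triplet annotations
# and returns the trained classifier.
vr_classifier = vr_bb_classifier('/path/to/main_dir')
```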
53 |
54 | ##### Lessons Learned with Tips:
55 | 1. Do not threshold the predictions; leave the low-confidence predictions in the submission file.
56 | Because of the way average precision works, you cannot be penalised for adding additional false positives
57 | with a lower confidence than all your other predictions, but you can still improve your recall if they
58 | find additional objects that weren’t previously detected.
59 | 2. Balance the number of steps and epochs. Due to the number of images in the train set, having a balanced number
60 | of steps and epochs is very important, and even more important is to take all the classes and divide them into bins,
61 | where each bin contains classes with a similar frequency in the data set, to prepare proper epochs.
62 | 3. When running the training, make sure that each class (within an epoch)
63 | has a similar number of occurrences by implementing a sampler to do this work (see the sketch below).
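
A minimal sketch of such a sampler (the function and argument names are illustrative and not part of this repository; it assumes a mapping from each class label to the ids of images containing that class):

```
import random

def balanced_epoch(image_ids_per_class, samples_per_class=100, seed=0):
    """Draw roughly the same number of images for every class in one epoch."""
    rng = random.Random(seed)
    epoch = []
    for label, image_ids in image_ids_per_class.items():
        if len(image_ids) >= samples_per_class:
            # frequent class: subsample without replacement
            epoch.extend(rng.sample(image_ids, samples_per_class))
        else:
            # rare class: oversample with replacement
            epoch.extend(rng.choices(image_ids, k=samples_per_class))
    rng.shuffle(epoch)
    return epoch
```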
64 |
65 | :clipboard: Getting Started
66 |
67 | The build was made for the Google AI Object Detection and Visual Relationship Kaggle challenge, so if you are using
68 | this project on Google's Open Images data set, follow the instructions below to run the module. The code is also written
69 | in such a way that you can take individual modules and build a custom model to suit your needs. So when you install the
70 | package, make sure you turn the imports into absolute imports or follow the Folder Structure shown below.
71 |
72 | ### :dvd: Software Prerequisites
73 |
74 | - keras
75 | - keras-resnet
76 | - tensorflow
77 | - pandas
78 | - numpy
79 | - pillow
80 | - opencv
81 | - sklearn
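
These can be installed with pip, for example (note that opencv and sklearn are published on PyPI as opencv-python and scikit-learn):

```
pip install keras keras-resnet tensorflow pandas numpy pillow opencv-python scikit-learn
```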
82 |
83 | ### :computer: Hardware Prerequisites
84 | The code was initially run on an NVIDIA GeForce GTX 1050 Ti, but the model ran out of memory since the Open Images
85 | data set consists of 1,743,042 images and 500 classes with 12,195,144 bounding boxes, and the images were resized to
86 | 600 by 600. Resizing the images further could have solved the issue, but I did not try it. Instead the code was run on an
87 | NVIDIA Tesla K80 and the model worked fine, and an NVIDIA Tesla P100 was used to convert the training model to an
88 | inference model. So I would recommend a K80 or a better GPU.
89 |
90 | ### :blue_book: Folder Structure
91 |
92 | ```
93 | main_dir
94 | - challenge2018 (The folder containing data files for the challenge)
95 | - images
96 | - train (consists of the train images)
97 | - test (consists of the test images)
98 | - keras_retinanet (keras retinanet package)
99 | - callbacks
100 | - callbacks.py
101 | - models
102 | - classifier.py
103 | - model_backbone.py
104 | - resnet.py
105 | - retinanet.py
106 | - preprocessing
107 | - generator.py
108 | - image.py
109 | - open_images.py
110 | - trainer
111 | - convert_model.py
112 | - evaluate.py
113 | - model.py
114 | - task.py
115 | - utils
116 | - anchors.py
117 | - clean.py
118 | - freeze.py
119 | - initializers.py
120 | - layers.py
121 | - losses.py
122 | ```
123 |
124 | :hourglass: Train
125 |
126 | Run ```task.py``` from the trainer folder.
127 |
128 | #### Usage
129 | ```
130 | task.py main_dir(path/to/main directory) dataset_type(oid)
131 | ```
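
For example (the path is a placeholder):

```
python keras_retinanet/trainer/task.py /path/to/main_dir oid
```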
132 |
133 | :watch: Test
134 |
135 | First run ```convert_model.py``` to convert the training model to an inference model.
136 | Then run ```evaluate.py``` for evaluation. By default, evaluation runs both object detection and visual
137 | relationship identification; to select between object detection and visual relationship identification,
138 | add 'od' or 'vr' when calling ```evaluate.py```.
139 |
140 | #### Usage
141 | ```
142 | convert_model.py main_dir(path/to/main directory) model_in(model name to be used to convert)
143 | evaluate.py main_dir(path/to/main directory) model_in(model name to be used for evaluation)
144 | ```
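
For example (the paths and model file names are placeholders):

```
python keras_retinanet/trainer/convert_model.py /path/to/main_dir resnet50_oid.h5
python keras_retinanet/trainer/evaluate.py /path/to/main_dir resnet50_oid_inference.h5
```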
145 |
146 | :page_facing_up: Documentation
147 |
148 |
149 | callbacks.py:
150 | * CALLED: at model.py by the create callbacks function
151 | * DOES: returns a set of callbacks used for training
152 |
153 | classifier.py:
154 | * CALLED: at evaluate.py by the main function
155 | * DOES: returns a multi-output Random Forest classifier for visual relationship prediction
156 |
157 | model_backbone.py:
158 | * CALLED: at model.py by the train function
159 | * DOES: Load the retinanet model using the correct backbone.
160 |
161 | resnet.py:
162 | * CALLED: at model_backbone.py by the backbone function
163 | * DOES: Constructs a retinanet model using a resnet backbone.
164 |
165 | retinanet.py:
166 | * CALLED: at resnet.py by the resnet_retinanet function
167 | * DOES: Construct a RetinaNet model on top of a backbone
168 |
169 | generator.py:
170 | * CALLED: at open_images.py by the OpenImagesGenerator class
171 | * DOES: creates a train and validation generator for open_images.py processing
172 |
173 | image.py:
174 | * CALLED: at generator.py by the Generator class
175 | * DOES: transformations and pre processing on the images
176 |
177 | open_images.py:
178 | * CALLED: at model.py by the create_generators function
179 | * DOES: returns train and validation generators
180 |
181 | convert_model.py:
182 | * CALLED: stand alone file to convert the train model to inference model
183 | * DOES: converts a train model to inference model
184 |
185 | evaluate.py:
186 | * CALLED: stand alone evaluation file
187 | * DOES: object and visual relationship detection and identification
188 |
189 | model.py:
190 | * CALLED: at task.py
191 | * DOES: the training
192 |
193 | task.py:
194 | * CALLED: stand alone file to be called to start training
195 | * DOES: initiates the training
196 |
197 | anchors.py:
198 | * CALLED: at generator.py
199 | * DOES: Generate anchors for bbox detection
200 |
201 | clean.py:
202 | * CALLED: stand alone file
203 | * DOES: creates data files based on the downloaded train and test images
204 |
205 | freeze.py:
206 | * CALLED: at model.py by the create_models function
207 | * DOES: freeze layers for training
208 |
209 | initializers.py:
210 | * CALLED: at retinanet.py
211 | * DOES: Applies a prior probability to the weights
212 |
213 | layers.py:
214 | * CALLED: at retinanet.py
215 | * DOES: Keras layer for filtering detections
216 |
217 | losses.py:
218 | * CALLED: at model.py by the create_models function
219 | * DOES: calculate the focal and smooth_l1 losses
220 |
221 | :alien: Authors
222 |
223 | * **Mukesh Mithrakumar** - *Initial work* - [Keras_RetinaNet](https://github.com/mukeshmithrakumar/)
224 |
225 | :key: License
226 |
227 | This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details
228 |
229 | :loudspeaker: Acknowledgments
230 |
231 | * Inspiration from Fizyr Keras RetinaNet
232 |
--------------------------------------------------------------------------------
/challenge2018/challenge-2018-class-descriptions-500.csv:
--------------------------------------------------------------------------------
1 | /m/061hd_,Infant bed
2 | /m/06m11,Rose
3 | /m/03120,Flag
4 | /m/01kb5b,Flashlight
5 | /m/0120dh,Sea turtle
6 | /m/0dv5r,Camera
7 | /m/0jbk,Animal
8 | /m/0174n1,Glove
9 | /m/09f_2,Crocodile
10 | /m/01xq0k1,Cattle
11 | /m/03jm5,House
12 | /m/02g30s,Guacamole
13 | /m/05z6w,Penguin
14 | /m/01jfm_,Vehicle registration plate
15 | /m/076lb9,Bench
16 | /m/0gj37,Ladybug
17 | /m/0k0pj,Human nose
18 | /m/0kpqd,Watermelon
19 | /m/0l14j_,Flute
20 | /m/0cyf8,Butterfly
21 | /m/0174k2,Washing machine
22 | /m/0dq75,Raccoon
23 | /m/076bq,Segway
24 | /m/07crc,Taco
25 | /m/0d8zb,Jellyfish
26 | /m/0fszt,Cake
27 | /m/0k1tl,Pen
28 | /m/020kz,Cannon
29 | /m/09728,Bread
30 | /m/07j7r,Tree
31 | /m/0fbdv,Shellfish
32 | /m/03ssj5,Bed
33 | /m/03qrc,Hamster
34 | /m/02dl1y,Hat
35 | /m/01k6s3,Toaster
36 | /m/02jfl0,Sombrero
37 | /m/01krhy,Tiara
38 | /m/04kkgm,Bowl
39 | /m/0ft9s,Dragonfly
40 | /m/0d_2m,Moths and butterflies
41 | /m/0czz2,Antelope
42 | /m/0f4s2w,Vegetable
43 | /m/07dd4,Torch
44 | /m/0cgh4,Building
45 | /m/03bbps,Power plugs and sockets
46 | /m/02pjr4,Blender
47 | /m/04p0qw,Billiard table
48 | /m/02pdsw,Cutting board
49 | /m/01yx86,Bronze sculpture
50 | /m/09dzg,Turtle
51 | /m/0hkxq,Broccoli
52 | /m/07dm6,Tiger
53 | /m/054_l,Mirror
54 | /m/01dws,Bear
55 | /m/027pcv,Zucchini
56 | /m/01d40f,Dress
57 | /m/02rgn06,Volleyball
58 | /m/0342h,Guitar
59 | /m/06bt6,Reptile
60 | /m/0323sq,Golf cart
61 | /m/02zvsm,Tart
62 | /m/02fq_6,Fedora
63 | /m/01lrl,Carnivore
64 | /m/0k4j,Car
65 | /m/04h7h,Lighthouse
66 | /m/07xyvk,Coffeemaker
67 | /m/03y6mg,Food processor
68 | /m/07r04,Truck
69 | /m/03__z0,Bookcase
70 | /m/019w40,Surfboard
71 | /m/09j5n,Footwear
72 | /m/0cvnqh,Bench
73 | /m/01llwg,Necklace
74 | /m/0c9ph5,Flower
75 | /m/015x5n,Radish
76 | /m/0gd2v,Marine mammal
77 | /m/04v6l4,Frying pan
78 | /m/02jz0l,Tap
79 | /m/0dj6p,Peach
80 | /m/04ctx,Knife
81 | /m/080hkjn,Handbag
82 | /m/01c648,Laptop
83 | /m/01j61q,Tent
84 | /m/012n7d,Ambulance
85 | /m/025nd,Christmas tree
86 | /m/09csl,Eagle
87 | /m/01lcw4,Limousine
88 | /m/0h8n5zk,Kitchen & dining room table
89 | /m/0633h,Polar bear
90 | /m/01fdzj,Tower
91 | /m/01226z,Football
92 | /m/0mw_6,Willow
93 | /m/04hgtk,Human head
94 | /m/02pv19,Stop sign
95 | /m/09qck,Banana
96 | /m/063rgb,Mixer
97 | /m/0lt4_,Binoculars
98 | /m/0270h,Dessert
99 | /m/01h3n,Bee
100 | /m/01mzpv,Chair
101 | /m/04169hn,Wood-burning stove
102 | /m/0fm3zh,Flowerpot
103 | /m/0d20w4,Beaker
104 | /m/0_cp5,Oyster
105 | /m/01dy8n,Woodpecker
106 | /m/03m5k,Harp
107 | /m/03dnzn,Bathtub
108 | /m/0h8mzrc,Wall clock
109 | /m/0h8mhzd,Sports uniform
110 | /m/03d443,Rhinoceros
111 | /m/01gllr,Beehive
112 | /m/0642b4,Cupboard
113 | /m/09b5t,Chicken
114 | /m/04yx4,Man
115 | /m/01f8m5,Blue jay
116 | /m/015x4r,Cucumber
117 | /m/01j51,Balloon
118 | /m/02zt3,Kite
119 | /m/03tw93,Fireplace
120 | /m/01jfsr,Lantern
121 | /m/04ylt,Missile
122 | /m/0bt_c3,Book
123 | /m/0cmx8,Spoon
124 | /m/0hqkz,Grapefruit
125 | /m/071qp,Squirrel
126 | /m/0cyhj_,Orange
127 | /m/01xygc,Coat
128 | /m/0420v5,Punching bag
129 | /m/0898b,Zebra
130 | /m/01knjb,Billboard
131 | /m/0199g,Bicycle
132 | /m/03c7gz,Door handle
133 | /m/02x984l,Mechanical fan
134 | /m/04zwwv,Ring binder
135 | /m/04bcr3,Table
136 | /m/0gv1x,Parrot
137 | /m/01nq26,Sock
138 | /m/02s195,Vase
139 | /m/083kb,Weapon
140 | /m/06nrc,Shotgun
141 | /m/0jyfg,Glasses
142 | /m/0nybt,Seahorse
143 | /m/0176mf,Belt
144 | /m/01rzcn,Watercraft
145 | /m/0d4v4,Window
146 | /m/03bk1,Giraffe
147 | /m/096mb,Lion
148 | /m/0h9mv,Tire
149 | /m/07yv9,Vehicle
150 | /m/0ph39,Canoe
151 | /m/01rkbr,Tie
152 | /m/0gjbg72,Shelf
153 | /m/06z37_,Picture frame
154 | /m/01m4t,Printer
155 | /m/035r7c,Human leg
156 | /m/019jd,Boat
157 | /m/02tsc9,Slow cooker
158 | /m/015wgc,Croissant
159 | /m/0c06p,Candle
160 | /m/01dwwc,Pancake
161 | /m/034c16,Pillow
162 | /m/0242l,Coin
163 | /m/02lbcq,Stretcher
164 | /m/03nfch,Sandal
165 | /m/03bt1vf,Woman
166 | /m/01lynh,Stairs
167 | /m/03q5t,Harpsichord
168 | /m/0fqt361,Stool
169 | /m/01bjv,Bus
170 | /m/01s55n,Suitcase
171 | /m/0283dt1,Human mouth
172 | /m/01z1kdw,Juice
173 | /m/016m2d,Skull
174 | /m/02dgv,Door
175 | /m/07y_7,Violin
176 | /m/01_5g,Chopsticks
177 | /m/06_72j,Digital clock
178 | /m/0ftb8,Sunflower
179 | /m/0c29q,Leopard
180 | /m/0jg57,Bell pepper
181 | /m/02l8p9,Harbor seal
182 | /m/078jl,Snake
183 | /m/0llzx,Sewing machine
184 | /m/0dbvp,Goose
185 | /m/09ct_,Helicopter
186 | /m/0dkzw,Seat belt
187 | /m/02p5f1q,Coffee cup
188 | /m/0fx9l,Microwave oven
189 | /m/01b9xk,Hot dog
190 | /m/0b3fp9,Countertop
191 | /m/0h8n27j,Serving tray
192 | /m/0h8n6f9,Dog bed
193 | /m/01599,Beer
194 | /m/017ftj,Sunglasses
195 | /m/044r5d,Golf ball
196 | /m/01dwsz,Waffle
197 | /m/0cdl1,Palm tree
198 | /m/07gql,Trumpet
199 | /m/0hdln,Ruler
200 | /m/0zvk5,Helmet
201 | /m/012w5l,Ladder
202 | /m/021sj1,Office building
203 | /m/0bh9flk,Tablet computer
204 | /m/09gtd,Toilet paper
205 | /m/0jwn_,Pomegranate
206 | /m/02wv6h6,Skirt
207 | /m/02wv84t,Gas stove
208 | /m/021mn,Cookie
209 | /m/018p4k,Cart
210 | /m/06j2d,Raven
211 | /m/033cnk,Egg
212 | /m/01j3zr,Burrito
213 | /m/03fwl,Goat
214 | /m/058qzx,Kitchen knife
215 | /m/06_fw,Skateboard
216 | /m/02x8cch,Salt and pepper shakers
217 | /m/04g2r,Lynx
218 | /m/01b638,Boot
219 | /m/099ssp,Platter
220 | /m/071p9,Ski
221 | /m/01gkx_,Swimwear
222 | /m/0b_rs,Swimming pool
223 | /m/03v5tg,Drinking straw
224 | /m/01j5ks,Wrench
225 | /m/026t6,Drum
226 | /m/0_k2,Ant
227 | /m/039xj_,Human ear
228 | /m/01b7fy,Headphones
229 | /m/0220r2,Fountain
230 | /m/015p6,Bird
231 | /m/0fly7,Jeans
232 | /m/07c52,Television
233 | /m/0n28_,Crab
234 | /m/0hg7b,Microphone
235 | /m/019dx1,Home appliance
236 | /m/04vv5k,Snowplow
237 | /m/020jm,Beetle
238 | /m/047v4b,Artichoke
239 | /m/01xs3r,Jet ski
240 | /m/03kt2w,Stationary bicycle
241 | /m/03q69,Human hair
242 | /m/01dxs,Brown bear
243 | /m/01h8tj,Starfish
244 | /m/0dt3t,Fork
245 | /m/0cjq5,Lobster
246 | /m/0h8lkj8,Corded phone
247 | /m/0271t,Drink
248 | /m/03q5c7,Saucer
249 | /m/0fj52s,Carrot
250 | /m/03vt0,Insect
251 | /m/01x3z,Clock
252 | /m/0d5gx,Castle
253 | /m/0h8my_4,Tennis racket
254 | /m/03ldnb,Ceiling fan
255 | /m/0cjs7,Asparagus
256 | /m/0449p,Jaguar
257 | /m/04szw,Musical instrument
258 | /m/07jdr,Train
259 | /m/01yrx,Cat
260 | /m/06c54,Rifle
261 | /m/04h8sr,Dumbbell
262 | /m/050k8,Mobile phone
263 | /m/0pg52,Taxi
264 | /m/02f9f_,Shower
265 | /m/054fyh,Pitcher
266 | /m/09k_b,Lemon
267 | /m/03xxp,Invertebrate
268 | /m/0jly1,Turkey
269 | /m/06k2mb,High heels
270 | /m/04yqq2,Bust
271 | /m/0bwd_0j,Elephant
272 | /m/02h19r,Scarf
273 | /m/02zn6n,Barrel
274 | /m/07c6l,Trombone
275 | /m/05zsy,Pumpkin
276 | /m/025dyy,Box
277 | /m/07j87,Tomato
278 | /m/09ld4,Frog
279 | /m/01vbnl,Bidet
280 | /m/0dzct,Human face
281 | /m/03fp41,Houseplant
282 | /m/0h2r6,Van
283 | /m/0by6g,Shark
284 | /m/0cxn2,Ice cream
285 | /m/04tn4x,Swim cap
286 | /m/0f6wt,Falcon
287 | /m/05n4y,Ostrich
288 | /m/0gxl3,Handgun
289 | /m/02d9qx,Whiteboard
290 | /m/04m9y,Lizard
291 | /m/05z55,Pasta
292 | /m/01x3jk,Snowmobile
293 | /m/0h8l4fh,Light bulb
294 | /m/031b6r,Window blind
295 | /m/01tcjp,Muffin
296 | /m/01f91_,Pretzel
297 | /m/02522,Computer monitor
298 | /m/0319l,Horn
299 | /m/0c_jw,Furniture
300 | /m/0l515,Sandwich
301 | /m/0306r,Fox
302 | /m/0crjs,Convenience store
303 | /m/0ch_cf,Fish
304 | /m/02xwb,Fruit
305 | /m/01r546,Earrings
306 | /m/03rszm,Curtain
307 | /m/0388q,Grape
308 | /m/03m3pdh,Sofa bed
309 | /m/03k3r,Horse
310 | /m/0hf58v5,Luggage and bags
311 | /m/01y9k5,Desk
312 | /m/05441v,Crutch
313 | /m/03p3bw,Bicycle helmet
314 | /m/0175cv,Tick
315 | /m/0cmf2,Airplane
316 | /m/0ccs93,Canary
317 | /m/02d1br,Spatula
318 | /m/0gjkl,Watch
319 | /m/0jqgx,Lily
320 | /m/0h99cwc,Kitchen appliance
321 | /m/047j0r,Filing cabinet
322 | /m/0k5j,Aircraft
323 | /m/0h8n6ft,Cake stand
324 | /m/0gm28,Candy
325 | /m/0130jx,Sink
326 | /m/04rmv,Mouse
327 | /m/081qc,Wine
328 | /m/0qmmr,Wheelchair
329 | /m/03fj2,Goldfish
330 | /m/040b_t,Refrigerator
331 | /m/02y6n,French fries
332 | /m/0fqfqc,Drawer
333 | /m/030610,Treadmill
334 | /m/07kng9,Picnic basket
335 | /m/029b3,Dice
336 | /m/0fbw6,Cabbage
337 | /m/07qxg_,Football helmet
338 | /m/068zj,Pig
339 | /m/01g317,Person
340 | /m/01bfm9,Shorts
341 | /m/02068x,Gondola
342 | /m/0fz0h,Honeycomb
343 | /m/0jy4k,Doughnut
344 | /m/05kyg_,Chest of drawers
345 | /m/01prls,Land vehicle
346 | /m/01h44,Bat
347 | /m/08pbxl,Monkey
348 | /m/02gzp,Dagger
349 | /m/04brg2,Tableware
350 | /m/031n1,Human foot
351 | /m/02jvh9,Mug
352 | /m/046dlr,Alarm clock
353 | /m/0h8ntjv,Pressure cooker
354 | /m/0k65p,Human hand
355 | /m/011k07,Tortoise
356 | /m/03grzl,Baseball glove
357 | /m/06y5r,Sword
358 | /m/061_f,Pear
359 | /m/01cmb2,Miniskirt
360 | /m/01mqdt,Traffic sign
361 | /m/05r655,Girl
362 | /m/02p3w7d,Roller skates
363 | /m/029tx,Dinosaur
364 | /m/04m6gz,Porch
365 | /m/015h_t,Human beard
366 | /m/06pcq,Submarine sandwich
367 | /m/01bms0,Screwdriver
368 | /m/07fbm7,Strawberry
369 | /m/09tvcd,Wine glass
370 | /m/06nwz,Seafood
371 | /m/0dv9c,Racket
372 | /m/083wq,Wheel
373 | /m/0gd36,Sea lion
374 | /m/0138tl,Toy
375 | /m/07clx,Tea
376 | /m/05ctyq,Tennis ball
377 | /m/0bjyj5,Waste container
378 | /m/0dbzx,Mule
379 | /m/02ctlc,Cricket ball
380 | /m/0fp6w,Pineapple
381 | /m/0djtd,Coconut
382 | /m/0167gd,Doll
383 | /m/078n6m,Coffee table
384 | /m/0152hh,Snowman
385 | /m/04gth,Lavender
386 | /m/0ll1f78,Shrimp
387 | /m/0cffdh,Maple
388 | /m/025rp__,Cowboy hat
389 | /m/02_n6y,Goggles
390 | /m/0wdt60w,Rugby ball
391 | /m/0cydv,Caterpillar
392 | /m/01n5jq,Poster
393 | /m/09rvcxw,Rocket
394 | /m/013y1f,Organ
395 | /m/06ncr,Saxophone
396 | /m/015qff,Traffic light
397 | /m/024g6,Cocktail
398 | /m/05gqfk,Plastic bag
399 | /m/0dv77,Squash
400 | /m/052sf,Mushroom
401 | /m/0cdn1,Hamburger
402 | /m/03jbxj,Light switch
403 | /m/0cyfs,Parachute
404 | /m/0kmg4,Teddy bear
405 | /m/02cvgx,Winter melon
406 | /m/09kx5,Deer
407 | /m/057cc,Musical keyboard
408 | /m/02pkr5,Plumbing fixture
409 | /m/057p5t,Scoreboard
410 | /m/03g8mr,Baseball bat
411 | /m/0frqm,Envelope
412 | /m/03m3vtv,Adhesive tape
413 | /m/0584n8,Briefcase
414 | /m/014y4n,Paddle
415 | /m/01g3x7,Bow and arrow
416 | /m/07cx4,Telephone
417 | /m/07bgp,Sheep
418 | /m/032b3c,Jacket
419 | /m/01bl7v,Boy
420 | /m/0663v,Pizza
421 | /m/0cn6p,Otter
422 | /m/02rdsp,Office supplies
423 | /m/02crq1,Couch
424 | /m/01xqw,Cello
425 | /m/0cnyhnx,Bull
426 | /m/01x_v,Camel
427 | /m/018xm,Ball
428 | /m/09ddx,Duck
429 | /m/084zz,Whale
430 | /m/01n4qj,Shirt
431 | /m/07cmd,Tank
432 | /m/04_sv,Motorcycle
433 | /m/0mkg,Accordion
434 | /m/09d5_,Owl
435 | /m/0c568,Porcupine
436 | /m/02wbtzl,Sun hat
437 | /m/05bm6,Nail
438 | /m/01lsmm,Scissors
439 | /m/0dftk,Swan
440 | /m/0dtln,Lamp
441 | /m/0nl46,Crown
442 | /m/05r5c,Piano
443 | /m/06msq,Sculpture
444 | /m/0cd4d,Cheetah
445 | /m/05kms,Oboe
446 | /m/02jnhm,Tin can
447 | /m/0fldg,Mango
448 | /m/073bxn,Tripod
449 | /m/029bxz,Oven
450 | /m/020lf,Mouse
451 | /m/01btn,Barge
452 | /m/02vqfm,Coffee
453 | /m/06__v,Snowboard
454 | /m/043nyj,Common fig
455 | /m/0grw1,Salad
456 | /m/03hl4l9,Marine invertebrates
457 | /m/0hnnb,Umbrella
458 | /m/04c0y,Kangaroo
459 | /m/0dzf4,Human arm
460 | /m/07v9_z,Measuring cup
461 | /m/0f9_l,Snail
462 | /m/0703r8,Loveseat
463 | /m/01xyhv,Suit
464 | /m/01fh4r,Teapot
465 | /m/04dr76w,Bottle
466 | /m/0pcr,Alpaca
467 | /m/03s_tn,Kettle
468 | /m/07mhn,Trousers
469 | /m/01hrv5,Popcorn
470 | /m/019h78,Centipede
471 | /m/09kmb,Spider
472 | /m/0h23m,Sparrow
473 | /m/050gv4,Plate
474 | /m/01fb_0,Bagel
475 | /m/02w3_ws,Personal care
476 | /m/014j1m,Apple
477 | /m/01gmv2,Brassiere
478 | /m/04y4h8h,Bathroom cabinet
479 | /m/026qbn5,studio couch
480 | /m/01m2v,Computer keyboard
481 | /m/05_5p_0,Table tennis racket
482 | /m/07030,Sushi
483 | /m/01s105,Cabinetry
484 | /m/033rq4,Street light
485 | /m/0162_1,Towel
486 | /m/02z51p,Nightstand
487 | /m/06mf6,Rabbit
488 | /m/02hj4,Dolphin
489 | /m/0bt9lr,Dog
490 | /m/08hvt4,Jug
491 | /m/084rd,Wok
492 | /m/01pns0,Fire hydrant
493 | /m/014sv8,Human eye
494 | /m/079cl,Skyscraper
495 | /m/01940j,Backpack
496 | /m/05vtc,Potato
497 | /m/02w3r3,Paper towel
498 | /m/054xkw,Lifejacket
499 | /m/01bqk0,Bicycle wheel
500 | /m/09g1w,Toilet
501 |
--------------------------------------------------------------------------------
/images/test/00000b4dcff7f799.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mukeshmithrakumar/RetinaNet/1459aee0c07693ce813a77f97c3e3889cb3c4826/images/test/00000b4dcff7f799.jpg
--------------------------------------------------------------------------------
/images/test/0000d67245642c5f.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mukeshmithrakumar/RetinaNet/1459aee0c07693ce813a77f97c3e3889cb3c4826/images/test/0000d67245642c5f.jpg
--------------------------------------------------------------------------------
/images/train/0000b86e2fd18333.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mukeshmithrakumar/RetinaNet/1459aee0c07693ce813a77f97c3e3889cb3c4826/images/train/0000b86e2fd18333.jpg
--------------------------------------------------------------------------------
/images/train/0000b9115cdf1e54.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mukeshmithrakumar/RetinaNet/1459aee0c07693ce813a77f97c3e3889cb3c4826/images/train/0000b9115cdf1e54.jpg
--------------------------------------------------------------------------------
/keras_retinanet/.gitignore:
--------------------------------------------------------------------------------
1 | # Byte-compiled / optimized / DLL files
2 | __pycache__/
3 | *.py[cod]
4 | *$py.class
5 |
6 | # C extensions
7 | *.so
8 |
9 | # Distribution / packaging
10 | .Python
11 | build/
12 | develop-eggs/
13 | dist/
14 | downloads/
15 | eggs/
16 | .eggs/
17 | lib/
18 | lib64/
19 | parts/
20 | sdist/
21 | var/
22 | wheels/
23 | *.egg-info/
24 | .installed.cfg
25 | *.egg
26 | MANIFEST
27 |
28 | # PyInstaller
29 | # Usually these files are written by a python script from a template
30 | # before PyInstaller builds the exe, so as to inject date/other infos into it.
31 | *.manifest
32 | *.spec
33 |
34 | # Installer logs
35 | pip-log.txt
36 | pip-delete-this-directory.txt
37 |
38 | # Unit test / coverage reports
39 | htmlcov/
40 | .tox/
41 | .coverage
42 | .coverage.*
43 | .cache
44 | nosetests.xml
45 | coverage.xml
46 | *.cover
47 | .hypothesis/
48 | .pytest_cache/
49 |
50 | # Translations
51 | *.mo
52 | *.pot
53 |
54 | # Django stuff:
55 | *.log
56 | local_settings.py
57 | db.sqlite3
58 |
59 | # Flask stuff:
60 | instance/
61 | .webassets-cache
62 |
63 | # Scrapy stuff:
64 | .scrapy
65 |
66 | # Sphinx documentation
67 | docs/_build/
68 |
69 | # PyBuilder
70 | target/
71 |
72 | # Jupyter Notebook
73 | .ipynb_checkpoints
74 |
75 | # pyenv
76 | .python-version
77 |
78 | # celery beat schedule file
79 | celerybeat-schedule
80 |
81 | # SageMath parsed files
82 | *.sage.py
83 |
84 | # Environments
85 | .env
86 | .venv
87 | env/
88 | venv/
89 | ENV/
90 | env.bak/
91 | venv.bak/
92 |
93 | # Spyder project settings
94 | .spyderproject
95 | .spyproject
96 |
97 | # Rope project settings
98 | .ropeproject
99 |
100 | # mkdocs documentation
101 | /site
102 |
103 | # mypy
104 | .mypy_cache/
105 |
--------------------------------------------------------------------------------
/keras_retinanet/callbacks/callbacks.py:
--------------------------------------------------------------------------------
1 | from __future__ import print_function
2 |
3 | from ..utils.anchors import compute_overlap
4 | import keras
5 | import os
6 | import cv2
7 | import numpy as np
8 | import warnings
9 |
10 |
11 | class RedirectModel(keras.callbacks.Callback):
12 | """
13 | Callback which wraps another callback, but executed on a different model.
14 |
15 | ```python
16 | model = keras.models.load_model('model.h5')
17 | model_checkpoint = ModelCheckpoint(filepath='snapshot.h5')
18 | parallel_model = multi_gpu_model(model, gpus=2)
19 | parallel_model.fit(X_train, Y_train, callbacks=[RedirectModel(model_checkpoint, model)])
20 | ```
21 |
22 | Args
23 | callback : callback to wrap.
24 | model : model to use when executing callbacks.
25 | """
26 |
27 | def __init__(self,
28 | callback,
29 | model):
30 | super(RedirectModel, self).__init__()
31 |
32 | self.callback = callback
33 | self.redirect_model = model
34 |
35 | def on_epoch_begin(self, epoch, logs=None):
36 | self.callback.on_epoch_begin(epoch, logs=logs)
37 |
38 | def on_epoch_end(self, epoch, logs=None):
39 | self.callback.on_epoch_end(epoch, logs=logs)
40 |
41 | def on_batch_begin(self, batch, logs=None):
42 | self.callback.on_batch_begin(batch, logs=logs)
43 |
44 | def on_batch_end(self, batch, logs=None):
45 | self.callback.on_batch_end(batch, logs=logs)
46 |
47 | def on_train_begin(self, logs=None):
48 | # overwrite the model with our custom model
49 | self.callback.set_model(self.redirect_model)
50 |
51 | self.callback.on_train_begin(logs=logs)
52 |
53 | def on_train_end(self, logs=None):
54 | self.callback.on_train_end(logs=logs)
55 |
56 |
57 | def label_color(label):
58 | """ Return a color from a set of predefined colors. Contains 80 colors in total.
59 | Args
60 | label: The label to get the color for.
61 | Returns
62 | A list of three values representing a RGB color.
63 | If no color is defined for a certain label, the color green is returned and a warning is printed.
64 | """
65 | if label < len(colors):
66 | return colors[label]
67 | else:
68 | warnings.warn('Label {} has no color, returning default.'.format(label))
69 | return (0, 255, 0)
70 |
71 |
72 | """
73 | Generated using:
74 | ```
75 | colors = [list((matplotlib.colors.hsv_to_rgb([x, 1.0, 1.0]) * 255).astype(int)) for x in np.arange(0, 1, 1.0 / 80)]
76 | shuffle(colors)
77 | pprint(colors)
78 | ```
79 | """
80 | colors = [
81 | [31, 0, 255],
82 | [0, 159, 255],
83 | [255, 95, 0],
84 | [255, 19, 0],
85 | [255, 0, 0],
86 | [255, 38, 0],
87 | [0, 255, 25],
88 | [255, 0, 133],
89 | [255, 172, 0],
90 | [108, 0, 255],
91 | [0, 82, 255],
92 | [0, 255, 6],
93 | [255, 0, 152],
94 | [223, 0, 255],
95 | [12, 0, 255],
96 | [0, 255, 178],
97 | [108, 255, 0],
98 | [184, 0, 255],
99 | [255, 0, 76],
100 | [146, 255, 0],
101 | [51, 0, 255],
102 | [0, 197, 255],
103 | [255, 248, 0],
104 | [255, 0, 19],
105 | [255, 0, 38],
106 | [89, 255, 0],
107 | [127, 255, 0],
108 | [255, 153, 0],
109 | [0, 255, 255],
110 | [0, 255, 216],
111 | [0, 255, 121],
112 | [255, 0, 248],
113 | [70, 0, 255],
114 | [0, 255, 159],
115 | [0, 216, 255],
116 | [0, 6, 255],
117 | [0, 63, 255],
118 | [31, 255, 0],
119 | [255, 57, 0],
120 | [255, 0, 210],
121 | [0, 255, 102],
122 | [242, 255, 0],
123 | [255, 191, 0],
124 | [0, 255, 63],
125 | [255, 0, 95],
126 | [146, 0, 255],
127 | [184, 255, 0],
128 | [255, 114, 0],
129 | [0, 255, 235],
130 | [255, 229, 0],
131 | [0, 178, 255],
132 | [255, 0, 114],
133 | [255, 0, 57],
134 | [0, 140, 255],
135 | [0, 121, 255],
136 | [12, 255, 0],
137 | [255, 210, 0],
138 | [0, 255, 44],
139 | [165, 255, 0],
140 | [0, 25, 255],
141 | [0, 255, 140],
142 | [0, 101, 255],
143 | [0, 255, 82],
144 | [223, 255, 0],
145 | [242, 0, 255],
146 | [89, 0, 255],
147 | [165, 0, 255],
148 | [70, 255, 0],
149 | [255, 0, 172],
150 | [255, 76, 0],
151 | [203, 255, 0],
152 | [204, 0, 255],
153 | [255, 0, 229],
154 | [255, 133, 0],
155 | [127, 0, 255],
156 | [0, 235, 255],
157 | [0, 255, 197],
158 | [255, 0, 191],
159 | [0, 44, 255],
160 | [50, 255, 0]
161 | ]
162 |
163 |
164 | def draw_box(image, box, color, thickness=2):
165 | """ Draws a box on an image with a given color.
166 |
167 | # Arguments
168 | image : The image to draw on.
169 | box : A list of 4 elements (x1, y1, x2, y2).
170 | color : The color of the box.
171 | thickness : The thickness of the lines to draw a box with.
172 | """
173 | b = np.array(box).astype(int)
174 | cv2.rectangle(image, (b[0], b[1]), (b[2], b[3]), color, thickness, cv2.LINE_AA)
175 |
176 |
177 | def draw_caption(image, box, caption):
178 | """ Draws a caption above the box in an image.
179 |
180 | # Arguments
181 | image : The image to draw on.
182 | box : A list of 4 elements (x1, y1, x2, y2).
183 | caption : String containing the text to draw.
184 | """
185 | b = np.array(box).astype(int)
186 | cv2.putText(image, caption, (b[0], b[1] - 10), cv2.FONT_HERSHEY_PLAIN, 1, (0, 0, 0), 2)
187 | cv2.putText(image, caption, (b[0], b[1] - 10), cv2.FONT_HERSHEY_PLAIN, 1, (255, 255, 255), 1)
188 |
189 |
190 | def draw_boxes(image, boxes, color, thickness=2):
191 | """ Draws boxes on an image with a given color.
192 |
193 | # Arguments
194 | image : The image to draw on.
195 | boxes : A [N, 4] matrix (x1, y1, x2, y2).
196 | color : The color of the boxes.
197 | thickness : The thickness of the lines to draw boxes with.
198 | """
199 | for b in boxes:
200 | draw_box(image, b, color, thickness=thickness)
201 |
202 |
203 | def draw_detections(image, boxes, scores, labels, color=None, label_to_name=None, score_threshold=0.5):
204 | """ Draws detections in an image.
205 |
206 | # Arguments
207 | image : The image to draw on.
208 | boxes : A [N, 4] matrix (x1, y1, x2, y2).
209 | scores : A list of N classification scores.
210 | labels : A list of N labels.
211 | color : The color of the boxes.
212 | By default the color from keras_retinanet.utils.colors.label_color will be used.
213 | label_to_name : (optional) Functor for mapping a label to a name.
214 | score_threshold : Threshold used for determining what detections to draw.
215 | """
216 | selection = np.where(scores > score_threshold)[0]
217 |
218 | for i in selection:
219 | c = color if color is not None else label_color(labels[i])
220 | draw_box(image, boxes[i, :], color=c)
221 |
222 | # draw labels
223 | caption = (label_to_name(labels[i]) if label_to_name else labels[i]) + ': {0:.2f}'.format(scores[i])
224 | draw_caption(image, boxes[i, :], caption)
225 |
226 |
227 | def draw_annotations(image, annotations, color=(0, 255, 0), label_to_name=None):
228 | """ Draws annotations in an image.
229 |
230 | # Arguments
231 | image : The image to draw on.
232 | annotations : A [N, 5] matrix (x1, y1, x2, y2, label).
233 | color : The color of the boxes.
234 | By default the color from keras_retinanet.utils.colors.label_color will be used.
235 | label_to_name : (optional) Functor for mapping a label to a name.
236 | """
237 | for a in annotations:
238 | label = a[4]
239 | c = color if color is not None else label_color(label)
240 | caption = '{}'.format(label_to_name(label) if label_to_name else label)
241 | draw_caption(image, a, caption)
242 |
243 | draw_box(image, a, color=c)
244 |
245 |
246 | def _compute_ap(recall, precision):
247 | """ Compute the average precision, given the recall and precision curves.
248 |
249 | Code originally from https://github.com/rbgirshick/py-faster-rcnn.
250 |
251 | # Arguments
252 | recall: The recall curve (list).
253 | precision: The precision curve (list).
254 | # Returns
255 | The average precision as computed in py-faster-rcnn.
256 | """
257 | # correct AP calculation
258 | # first append sentinel values at the end
259 | mrec = np.concatenate(([0.], recall, [1.]))
260 | mpre = np.concatenate(([0.], precision, [0.]))
261 |
262 | # compute the precision envelope
263 | for i in range(mpre.size - 1, 0, -1):
264 | mpre[i - 1] = np.maximum(mpre[i - 1], mpre[i])
265 |
266 | # to calculate area under PR curve, look for points
267 | # where X axis (recall) changes value
268 | i = np.where(mrec[1:] != mrec[:-1])[0]
269 |
270 | # and sum (\Delta recall) * prec
271 | ap = np.sum((mrec[i + 1] - mrec[i]) * mpre[i + 1])
272 | return ap
273 |
274 |
275 | def _get_detections(generator, model, score_threshold=0.05, max_detections=100, save_path=None):
276 | """ Get the detections from the model using the generator.
277 |
278 | The result is a list of lists such that the size is:
279 | all_detections[num_images][num_classes] = detections[num_detections, 4 + num_classes]
280 |
281 | # Arguments
282 | generator : The generator used to run images through the model.
283 | model : The model to run on the images.
284 | score_threshold : The score confidence threshold to use.
285 | max_detections : The maximum number of detections to use per image.
286 | save_path : The path to save the images with visualized detections to.
287 | # Returns
288 | A list of lists containing the detections for each image in the generator.
289 | """
290 | all_detections = [[None for i in range(generator.num_classes())] for j in range(generator.size())]
291 |
292 | while True:
293 | for i in range(generator.size()):
294 | try:
295 | raw_image = generator.load_image(i)
296 | image = generator.preprocess_image(raw_image.copy())
297 | except:
298 | break
299 | image, scale = generator.resize_image(image)
300 |
301 | # run network
302 | boxes, scores, labels = model.predict_on_batch(np.expand_dims(image, axis=0))[:3]
303 |
304 | # correct boxes for image scale
305 | boxes /= scale
306 |
307 | # select indices which have a score above the threshold
308 | indices = np.where(scores[0, :] > score_threshold)[0]
309 |
310 | # select those scores
311 | scores = scores[0][indices]
312 |
313 | # find the order with which to sort the scores
314 | scores_sort = np.argsort(-scores)[:max_detections]
315 |
316 | # select detections
317 | image_boxes = boxes[0, indices[scores_sort], :]
318 | image_scores = scores[scores_sort]
319 | image_labels = labels[0, indices[scores_sort]]
320 | image_detections = np.concatenate(
321 | [image_boxes, np.expand_dims(image_scores, axis=1), np.expand_dims(image_labels, axis=1)], axis=1)
322 |
323 | if save_path is not None:
324 | draw_annotations(raw_image, generator.load_annotations(i), label_to_name=generator.label_to_name)
325 | draw_detections(raw_image, image_boxes, image_scores, image_labels, label_to_name=generator.label_to_name)
326 |
327 | cv2.imwrite(os.path.join(save_path, '{}.png'.format(i)), raw_image)
328 |
329 | # copy detections to all_detections
330 | for label in range(generator.num_classes()):
331 | all_detections[i][label] = image_detections[image_detections[:, -1] == label, :-1]
332 |
333 | print('{}/{}'.format(i + 1, generator.size()), end='\r')
334 |
335 | return all_detections
336 |
337 |
338 | def _get_annotations(generator):
339 | """ Get the ground truth annotations from the generator.
340 |
341 | The result is a list of lists such that the size is:
342 | all_detections[num_images][num_classes] = annotations[num_detections, 5]
343 |
344 | # Arguments
345 | generator : The generator used to retrieve ground truth annotations.
346 | # Returns
347 | A list of lists containing the annotations for each image in the generator.
348 | """
349 | all_annotations = [[None for i in range(generator.num_classes())] for j in range(generator.size())]
350 |
351 | for i in range(generator.size()):
352 | # load the annotations
353 | annotations = generator.load_annotations(i)
354 |
355 | # copy detections to all_annotations
356 | for label in range(generator.num_classes()):
357 | all_annotations[i][label] = annotations[annotations[:, 4] == label, :4].copy()
358 |
359 | print('{}/{}'.format(i + 1, generator.size()), end='\r')
360 |
361 | return all_annotations
362 |
363 |
364 | def evaluate(
365 | generator,
366 | model,
367 | iou_threshold=0.5,
368 | score_threshold=0.05,
369 | max_detections=100,
370 | save_path=None
371 | ):
372 | """ Evaluate a given dataset using a given model.
373 |
374 | # Arguments
375 | generator : The generator that represents the dataset to evaluate.
376 | model : The model to evaluate.
377 | iou_threshold : The threshold used to consider when a detection is positive or negative.
378 | score_threshold : The score confidence threshold to use for detections.
379 | max_detections : The maximum number of detections to use per image.
380 | save_path : The path to save images with visualized detections to.
381 | # Returns
382 | A dict mapping class names to mAP scores.
383 | """
384 | # gather all detections and annotations
385 | all_detections = _get_detections(generator, model, score_threshold=score_threshold, max_detections=max_detections,
386 | save_path=save_path)
387 | all_annotations = _get_annotations(generator)
388 | average_precisions = {}
389 |
390 | # process detections and annotations
391 | for label in range(generator.num_classes()):
392 | false_positives = np.zeros((0,))
393 | true_positives = np.zeros((0,))
394 | scores = np.zeros((0,))
395 | num_annotations = 0.0
396 |
397 | for i in range(generator.size()):
398 | detections = all_detections[i][label]
399 | annotations = all_annotations[i][label]
400 | num_annotations += annotations.shape[0]
401 | detected_annotations = []
402 |
403 | for d in detections:
404 | scores = np.append(scores, d[4])
405 |
406 | if annotations.shape[0] == 0:
407 | false_positives = np.append(false_positives, 1)
408 | true_positives = np.append(true_positives, 0)
409 | continue
410 |
411 | overlaps = compute_overlap(np.expand_dims(d, axis=0), annotations)
412 | assigned_annotation = np.argmax(overlaps, axis=1)
413 | max_overlap = overlaps[0, assigned_annotation]
414 |
415 | if max_overlap >= iou_threshold and assigned_annotation not in detected_annotations:
416 | false_positives = np.append(false_positives, 0)
417 | true_positives = np.append(true_positives, 1)
418 | detected_annotations.append(assigned_annotation)
419 | else:
420 | false_positives = np.append(false_positives, 1)
421 | true_positives = np.append(true_positives, 0)
422 |
423 | # no annotations -> AP for this class is 0 (is this correct?)
424 | if num_annotations == 0:
425 | average_precisions[label] = 0, 0
426 | continue
427 |
428 | # sort by score
429 | indices = np.argsort(-scores)
430 | false_positives = false_positives[indices]
431 | true_positives = true_positives[indices]
432 |
433 | # compute false positives and true positives
434 | false_positives = np.cumsum(false_positives)
435 | true_positives = np.cumsum(true_positives)
436 |
437 | # compute recall and precision
438 | recall = true_positives / num_annotations
439 | precision = true_positives / np.maximum(true_positives + false_positives, np.finfo(np.float64).eps)
440 |
441 | # compute average precision
442 | average_precision = _compute_ap(recall, precision)
443 | average_precisions[label] = average_precision, num_annotations
444 |
445 | return average_precisions
446 |
447 |
448 | class Evaluate(keras.callbacks.Callback):
449 | """ Evaluation callback for arbitrary datasets.
450 | """
451 |
452 | def __init__(self, generator, iou_threshold=0.5, score_threshold=0.05, max_detections=100, save_path=None,
453 | tensorboard=None, verbose=1):
454 | """ Evaluate a given dataset using a given model at the end of every epoch during training.
455 |
456 | # Arguments
457 | generator : The generator that represents the dataset to evaluate.
458 | iou_threshold : The threshold used to consider when a detection is positive or negative.
459 | score_threshold : The score confidence threshold to use for detections.
460 | max_detections : The maximum number of detections to use per image.
461 | save_path : The path to save images with visualized detections to.
462 | tensorboard : Instance of keras.callbacks.TensorBoard used to log the mAP value.
463 | verbose : Set the verbosity level, by default this is set to 1.
464 | """
465 | self.generator = generator
466 | self.iou_threshold = iou_threshold
467 | self.score_threshold = score_threshold
468 | self.max_detections = max_detections
469 | self.save_path = save_path
470 | self.tensorboard = tensorboard
471 | self.verbose = verbose
472 |
473 | super(Evaluate, self).__init__()
474 |
475 | def on_epoch_end(self, epoch, logs=None):
476 | logs = logs or {}
477 |
478 | # run evaluation
479 | average_precisions = evaluate(
480 | self.generator,
481 | self.model,
482 | iou_threshold=self.iou_threshold,
483 | score_threshold=self.score_threshold,
484 | max_detections=self.max_detections,
485 | save_path=self.save_path
486 | )
487 |
488 | # compute per class average precision
489 | present_classes = 0
490 | precision = 0
491 | for label, (average_precision, num_annotations) in average_precisions.items():
492 | if self.verbose == 1:
493 | print('{:.0f} instances of class'.format(num_annotations),
494 | self.generator.label_to_name(label), 'with average precision: {:.4f}'.format(average_precision))
495 | if num_annotations > 0:
496 | present_classes += 1
497 | precision += average_precision
498 | self.mean_ap = precision / present_classes
499 |
500 | if self.tensorboard is not None and self.tensorboard.writer is not None:
501 | import tensorflow as tf
502 | summary = tf.Summary()
503 | summary_value = summary.value.add()
504 | summary_value.simple_value = self.mean_ap
505 | summary_value.tag = "mAP"
506 | self.tensorboard.writer.add_summary(summary, epoch)
507 |
508 | logs['mAP'] = self.mean_ap
509 |
510 | if self.verbose == 1:
511 | print('mAP: {:.4f}'.format(self.mean_ap))
512 |
--------------------------------------------------------------------------------
/keras_retinanet/models/classifier.py:
--------------------------------------------------------------------------------
1 | import pandas as pd
2 | import os
3 | from sklearn.multioutput import MultiOutputClassifier
4 | from sklearn.model_selection import train_test_split
5 | from sklearn.metrics import classification_report
6 | from sklearn.ensemble import RandomForestClassifier
7 |
8 |
9 | def vr_bb_classifier(main_dir):
10 |
11 | path = os.path.join(main_dir, 'challenge2018')
12 | train_file = "relationship_triplets_annotations.csv"
13 |
14 | train = pd.read_csv(os.path.join(path, train_file))
15 |
16 | train['box1length'] = train['XMax1'] - train['XMin1']
17 | train['box2length'] = train['XMax2'] - train['XMin2']
18 | train['box1height'] = train['YMax1'] - train['YMin1']
19 | train['box2height'] = train['YMax2'] - train['YMin2']
20 |
21 | train['box1area'] = train['box1length'] * train['box1height']
22 | train['box2area'] = train['box2length'] * train['box2height']
23 |
24 | train["xA"] = train[["XMin1", "XMin2"]].max(axis=1)
25 | train["yA"] = train[["YMin1", "YMin2"]].max(axis=1)
26 | train["xB"] = train[["XMax1", "XMax2"]].min(axis=1)
27 | train["yB"] = train[["YMax1", "YMax2"]].min(axis=1)
28 | # clamp negative extents to zero so non-overlapping boxes get zero intersection area (and IoU)
29 | train["intersectionarea"] = (train["xB"] - train["xA"]).clip(lower=0) * (train["yB"] - train["yA"]).clip(lower=0)
30 | train["unionarea"] = train["box1area"] + train["box2area"] - train["intersectionarea"]
31 | train["iou"] = (train["intersectionarea"] / train["unionarea"])
32 |
33 | drop_columns = ["ImageID", "box1length", "box2length", "box1height",
34 | "box2height", "intersectionarea", "unionarea", "xA", "yA",
35 | "xB", "yB", "box1area", "box2area"]
36 | train = train.drop(columns=drop_columns)
37 |
38 | train = train[['LabelName1', 'LabelName2', 'XMin1', 'XMax1', 'YMin1', 'YMax1', 'XMin2',
39 | 'XMax2', 'YMin2', 'YMax2', 'iou', 'RelationshipLabel']]
40 |
41 | train = pd.get_dummies(train, columns=["RelationshipLabel"])
42 |
43 | COLUMN_NAMES = {"RelationshipLabel_at": "at",
44 | "RelationshipLabel_hits": "hits",
45 | "RelationshipLabel_holds": "holds",
46 | "RelationshipLabel_inside_of": "inside_of",
47 | "RelationshipLabel_interacts_with": "interacts_with",
48 | "RelationshipLabel_is": "is",
49 | "RelationshipLabel_on": "on",
50 | "RelationshipLabel_plays": "plays",
51 | "RelationshipLabel_under": "under",
52 | "RelationshipLabel_wears": "wears",
53 | }
54 |
55 | train = train.rename(columns=COLUMN_NAMES)
56 |
57 | X = train[['XMin1', 'XMax1', 'YMin1', 'YMax1', 'XMin2',
58 | 'XMax2', 'YMin2', 'YMax2', 'iou']]
59 |
60 | y = train[['at', 'hits', 'holds', 'inside_of', 'interacts_with',
61 | 'is', 'on', 'plays', 'under', 'wears']]
62 |
63 | print("Training VR Classifier")
64 | X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.2, random_state=25)
65 |
66 | forest = RandomForestClassifier(n_estimators=500,
67 | verbose=1)
68 | LogReg = MultiOutputClassifier(forest).fit(X_train, y_train)
69 |
70 | # y_pred = LogReg.predict(X_test)
71 | # print(classification_report(y_test, y_pred))
72 | print("VR Classifier Training Complete")
73 |
74 | return LogReg
75 |
76 |
--------------------------------------------------------------------------------
/keras_retinanet/models/model_backbone.py:
--------------------------------------------------------------------------------
1 | import keras.models
2 |
3 |
4 | class Backbone(object):
5 | """ This class stores additional information on backbones.
6 | """
7 |
8 | def __init__(self, backbone):
9 | # a dictionary mapping custom layer names to the correct classes
10 | from ..utils import layers
11 | from ..utils import losses
12 | from ..utils import initializers
13 | self.custom_objects = {
14 | 'UpsampleLike': layers.UpsampleLike,
15 | 'PriorProbability': initializers.PriorProbability,
16 | 'RegressBoxes': layers.RegressBoxes,
17 | 'FilterDetections': layers.FilterDetections,
18 | 'Anchors': layers.Anchors,
19 | 'ClipBoxes': layers.ClipBoxes,
20 | '_smooth_l1': losses.smooth_l1(),
21 | '_focal': losses.focal(),
22 | }
23 |
24 | self.backbone = backbone
25 | self.validate()
26 |
27 | def retinanet(self, *args, **kwargs):
28 | """ Returns a retinanet model using the correct backbone.
29 | """
30 | raise NotImplementedError('retinanet method not implemented.')
31 |
32 | def download_imagenet(self):
33 | """ Downloads ImageNet weights and returns path to weights file.
34 | """
35 | raise NotImplementedError('download_imagenet method not implemented.')
36 |
37 | def validate(self):
38 | """ Checks whether the backbone string is correct.
39 | """
40 | raise NotImplementedError('validate method not implemented.')
41 |
42 | def preprocess_image(self, inputs):
43 | """ Takes as input an image and prepares it for being passed through the network.
44 | Having this function in Backbone allows other backbones to define a specific preprocessing step.
45 | """
46 | raise NotImplementedError('preprocess_image method not implemented.')
47 |
48 |
49 | def backbone(backbone_name):
50 | """ Returns a backbone object for the given backbone.
51 | """
52 | if 'resnet' in backbone_name:
53 | from ..models.resnet import ResNetBackbone as b
54 | else:
55 | raise NotImplementedError('Backbone class for \'{}\' not implemented.'.format(backbone_name))
56 |
57 | return b(backbone_name)
58 |
59 |
60 | def load_model(filepath, backbone_name='resnet50', convert=False, nms=True, class_specific_filter=True):
61 | """ Loads a retinanet model using the correct custom objects.
62 | # Arguments
63 | filepath: one of the following:
64 | - string, path to the saved model, or
65 | - h5py.File object from which to load the model
66 | backbone_name : Backbone with which the model was trained.
67 | convert : Boolean, whether to convert the model to an inference model.
68 | nms : Boolean, whether to add NMS filtering to the converted model.
69 | Only valid if convert=True.
70 | class_specific_filter : Whether to use class specific filtering or filter for the best scoring class only.
71 | # Returns
72 | A keras.models.Model object.
73 | # Raises
74 | ImportError: if h5py is not available.
75 | ValueError: In case of an invalid savefile.
76 | """
77 |
78 | model = keras.models.load_model(filepath, custom_objects=backbone(backbone_name).custom_objects)
79 | if convert:
80 | from ..models.retinanet import retinanet_bbox
81 | print("Starting to convert model...")
82 | model = retinanet_bbox(model=model, nms=nms, class_specific_filter=class_specific_filter)
83 |
84 | return model
85 |
--------------------------------------------------------------------------------
/keras_retinanet/models/resnet.py:
--------------------------------------------------------------------------------
1 | import keras
2 | from keras.utils import get_file
3 | import keras_resnet
4 | import keras_resnet.models
5 | from ..models import retinanet
6 |
7 | from ..models.model_backbone import Backbone
8 | from ..preprocessing.image import preprocess_image
9 |
10 |
11 | class ResNetBackbone(Backbone):
12 | """ Describes backbone information and provides utility functions.
13 | """
14 |
15 | def __init__(self, backbone):
16 | super(ResNetBackbone, self).__init__(backbone)
17 | self.custom_objects.update(keras_resnet.custom_objects)
18 |
19 | def retinanet(self, *args, **kwargs):
20 | """ Returns a retinanet model using the correct backbone.
21 | """
22 | return resnet_retinanet(*args, backbone=self.backbone, **kwargs)
23 |
24 | def download_imagenet(self):
25 | """ Downloads ImageNet weights and returns path to weights file.
26 | """
27 | resnet_filename = 'ResNet-{}-model.keras.h5'
28 | resnet_resource = 'https://github.com/fizyr/keras-models/releases/download/v0.0.1/{}'.format(resnet_filename)
29 | depth = int(self.backbone.replace('resnet', ''))
30 |
31 | filename = resnet_filename.format(depth)
32 | resource = resnet_resource.format(depth)
33 | if depth == 50:
34 | checksum = '3e9f4e4f77bbe2c9bec13b53ee1c2319'
35 | elif depth == 101:
36 | checksum = '05dc86924389e5b401a9ea0348a3213c'
37 | elif depth == 152:
38 | checksum = '6ee11ef2b135592f8031058820bb9e71'
39 |
40 | return get_file(
41 | filename,
42 | resource,
43 | cache_subdir='models',
44 | md5_hash=checksum
45 | )
46 |
47 | def validate(self):
48 | """ Checks whether the backbone string is correct.
49 | """
50 | allowed_backbones = ['resnet50', 'resnet101', 'resnet152']
51 | backbone = self.backbone.split('_')[0]
52 |
53 | if backbone not in allowed_backbones:
54 | raise ValueError('Backbone (\'{}\') not in allowed backbones ({}).'.format(backbone, allowed_backbones))
55 |
56 | def preprocess_image(self, inputs):
57 | """ Takes as input an image and prepares it for being passed through the network.
58 | """
59 | return preprocess_image(inputs)
60 |
61 |
62 | def resnet_retinanet(num_classes, backbone='resnet50', inputs=None, modifier=None, **kwargs):
63 | """ Constructs a retinanet model using a resnet backbone.
64 |
65 | Args
66 | num_classes: Number of classes to predict.
67 | backbone: Which backbone to use (one of ('resnet50', 'resnet101', 'resnet152')).
68 | inputs: The inputs to the network (defaults to a Tensor of shape (None, None, 3)).
69 | modifier: A function handler which can modify the backbone before using it in retinanet (this can be used to
70 | freeze backbone layers for example).
71 |
72 | Returns
73 | RetinaNet model with a ResNet backbone.
74 | """
75 | # choose default input
76 | if inputs is None:
77 | inputs = keras.layers.Input(shape=(None, None, 3))
78 |
79 | # create the resnet backbone
80 | if backbone == 'resnet50':
81 | resnet = keras_resnet.models.ResNet50(inputs, include_top=False, freeze_bn=True)
82 | elif backbone == 'resnet101':
83 | resnet = keras_resnet.models.ResNet101(inputs, include_top=False, freeze_bn=True)
84 | elif backbone == 'resnet152':
85 | resnet = keras_resnet.models.ResNet152(inputs, include_top=False, freeze_bn=True)
86 | else:
87 | raise ValueError('Backbone (\'{}\') is invalid.'.format(backbone))
88 |
89 | # invoke modifier if given
90 | if modifier:
91 | resnet = modifier(resnet)
92 |
93 | # create the full model
94 | return retinanet.retinanet(inputs=inputs, num_classes=num_classes, backbone_layers=resnet.outputs[1:], **kwargs)
95 |
96 |
97 | def resnet50_retinanet(num_classes, inputs=None, **kwargs):
98 | return resnet_retinanet(num_classes=num_classes, backbone='resnet50', inputs=inputs, **kwargs)
99 |
100 |
101 | def resnet101_retinanet(num_classes, inputs=None, **kwargs):
102 | return resnet_retinanet(num_classes=num_classes, backbone='resnet101', inputs=inputs, **kwargs)
103 |
104 |
105 | def resnet152_retinanet(num_classes, inputs=None, **kwargs):
106 | return resnet_retinanet(num_classes=num_classes, backbone='resnet152', inputs=inputs, **kwargs)
107 |
--------------------------------------------------------------------------------
/keras_retinanet/models/retinanet.py:
--------------------------------------------------------------------------------
1 | import keras
2 | from ..utils import initializers
3 | from ..utils import layers
4 | import numpy as np
5 |
6 |
7 | def default_classification_model(
8 | num_classes,
9 | num_anchors,
10 | pyramid_feature_size=256,
11 | prior_probability=0.01,
12 | classification_feature_size=256,
13 | name='classification_submodel'
14 | ):
15 | """ Creates the default regression submodel.
16 |
17 | Args
18 | num_classes : Number of classes to predict a score for at each feature level.
19 | num_anchors : Number of anchors to predict classification scores for at each feature level.
20 | pyramid_feature_size : The number of filters to expect from the feature pyramid levels.
21 | classification_feature_size : The number of filters to use in the layers in the classification submodel.
22 | name : The name of the submodel.
23 |
24 | Returns
25 | A keras.models.Model that predicts classes for each anchor.
26 | """
27 | options = {
28 | 'kernel_size': 3,
29 | 'strides': 1,
30 | 'padding': 'same',
31 | }
32 |
33 | inputs = keras.layers.Input(shape=(None, None, pyramid_feature_size))
34 | outputs = inputs
35 | for i in range(4):
36 | outputs = keras.layers.Conv2D(
37 | filters=classification_feature_size,
38 | activation='relu',
39 | name='pyramid_classification_{}'.format(i),
40 | kernel_initializer=keras.initializers.normal(mean=0.0, stddev=0.01, seed=None),
41 | bias_initializer='zeros',
42 | **options
43 | )(outputs)
44 |
45 | outputs = keras.layers.Conv2D(
46 | filters=num_classes * num_anchors,
47 | kernel_initializer=keras.initializers.zeros(),
48 | bias_initializer=initializers.PriorProbability(probability=prior_probability),
49 | name='pyramid_classification',
50 | **options
51 | )(outputs)
52 |
53 | # reshape output and apply sigmoid
54 | outputs = keras.layers.Reshape((-1, num_classes), name='pyramid_classification_reshape')(outputs)
55 | outputs = keras.layers.Activation('sigmoid', name='pyramid_classification_sigmoid')(outputs)
56 |
57 | return keras.models.Model(inputs=inputs, outputs=outputs, name=name)
58 |
59 |
60 | def default_regression_model(num_anchors, pyramid_feature_size=256, regression_feature_size=256,
61 | name='regression_submodel'):
62 | """ Creates the default regression submodel.
63 |
64 | Args
65 | num_anchors : Number of anchors to regress for each feature level.
66 | pyramid_feature_size : The number of filters to expect from the feature pyramid levels.
67 | regression_feature_size : The number of filters to use in the layers in the regression submodel.
68 | name : The name of the submodel.
69 |
70 | Returns
71 | A keras.models.Model that predicts regression values for each anchor.
72 | """
73 | # All new conv layers except the final one in the
74 | # RetinaNet (classification) subnets are initialized
75 | # with bias b = 0 and a Gaussian weight fill with stddev = 0.01.
76 | options = {
77 | 'kernel_size': 3,
78 | 'strides': 1,
79 | 'padding': 'same',
80 | 'kernel_initializer': keras.initializers.normal(mean=0.0, stddev=0.01, seed=None),
81 | 'bias_initializer': 'zeros'
82 | }
83 |
84 | inputs = keras.layers.Input(shape=(None, None, pyramid_feature_size))
85 | outputs = inputs
86 | for i in range(4):
87 | outputs = keras.layers.Conv2D(
88 | filters=regression_feature_size,
89 | activation='relu',
90 | name='pyramid_regression_{}'.format(i),
91 | **options
92 | )(outputs)
93 |
94 | outputs = keras.layers.Conv2D(num_anchors * 4, name='pyramid_regression', **options)(outputs)
95 | outputs = keras.layers.Reshape((-1, 4), name='pyramid_regression_reshape')(outputs)
96 |
97 | return keras.models.Model(inputs=inputs, outputs=outputs, name=name)
98 |
99 |
100 | def __create_pyramid_features(C3, C4, C5, feature_size=256):
101 | """ Creates the FPN layers on top of the backbone features.
102 |
103 | Args
104 | C3 : Feature stage C3 from the backbone.
105 | C4 : Feature stage C4 from the backbone.
106 | C5 : Feature stage C5 from the backbone.
107 | feature_size : The feature size to use for the resulting feature levels.
108 |
109 | Returns
110 | A list of feature levels [P3, P4, P5, P6, P7].
111 | """
112 | # upsample C5 to get P5 from the FPN paper
113 | P5 = keras.layers.Conv2D(feature_size, kernel_size=1, strides=1, padding='same', name='C5_reduced')(C5)
114 | P5_upsampled = layers.UpsampleLike(name='P5_upsampled')([P5, C4])
115 | P5 = keras.layers.Conv2D(feature_size, kernel_size=3, strides=1, padding='same', name='P5')(P5)
116 |
117 | # add P5 elementwise to C4
118 | P4 = keras.layers.Conv2D(feature_size, kernel_size=1, strides=1, padding='same', name='C4_reduced')(C4)
119 | P4 = keras.layers.Add(name='P4_merged')([P5_upsampled, P4])
120 | P4_upsampled = layers.UpsampleLike(name='P4_upsampled')([P4, C3])
121 | P4 = keras.layers.Conv2D(feature_size, kernel_size=3, strides=1, padding='same', name='P4')(P4)
122 |
123 | # add P4 elementwise to C3
124 | P3 = keras.layers.Conv2D(feature_size, kernel_size=1, strides=1, padding='same', name='C3_reduced')(C3)
125 | P3 = keras.layers.Add(name='P3_merged')([P4_upsampled, P3])
126 | P3 = keras.layers.Conv2D(feature_size, kernel_size=3, strides=1, padding='same', name='P3')(P3)
127 |
128 | # "P6 is obtained via a 3x3 stride-2 conv on C5"
129 | P6 = keras.layers.Conv2D(feature_size, kernel_size=3, strides=2, padding='same', name='P6')(C5)
130 |
131 | # "P7 is computed by applying ReLU followed by a 3x3 stride-2 conv on P6"
132 | P7 = keras.layers.Activation('relu', name='C6_relu')(P6)
133 | P7 = keras.layers.Conv2D(feature_size, kernel_size=3, strides=2, padding='same', name='P7')(P7)
134 |
135 | return [P3, P4, P5, P6, P7]
136 |
137 |
138 | class AnchorParameters:
139 |     """ The parameters that define how anchors are generated.
140 |
141 | Args
142 | sizes : List of sizes to use. Each size corresponds to one feature level.
143 |         strides : List of strides to use. Each stride corresponds to one feature level.
144 | ratios : List of ratios to use per location in a feature map.
145 | scales : List of scales to use per location in a feature map.
146 | """
147 |
148 | def __init__(self, sizes, strides, ratios, scales):
149 | self.sizes = sizes
150 | self.strides = strides
151 | self.ratios = ratios
152 | self.scales = scales
153 |
154 | def num_anchors(self):
155 | return len(self.ratios) * len(self.scales)
156 |
157 |
158 | """
159 | The default anchor parameters.
160 | """
161 | AnchorParameters.default = AnchorParameters(
162 | sizes=[32, 64, 128, 256, 512],
163 | strides=[8, 16, 32, 64, 128],
164 | ratios=np.array([0.5, 1, 2], keras.backend.floatx()),
165 | scales=np.array([2 ** 0, 2 ** (1.0 / 3.0), 2 ** (2.0 / 3.0)], keras.backend.floatx()),
166 | )
167 |
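# Quick illustrative check (not part of retinanet.py, shown only as a sketch): the default
# configuration uses 3 ratios x 3 scales = 9 anchors at every feature-map location, with one
# size/stride pair per pyramid level P3-P7.
_params = AnchorParameters.default
assert _params.num_anchors() == len(_params.ratios) * len(_params.scales) == 9
assert len(_params.sizes) == len(_params.strides) == 5  # P3, P4, P5, P6, P7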
168 |
169 | def default_submodels(num_classes, num_anchors):
170 | """ Create a list of default submodels used for object detection.
171 |
172 | The default submodels contains a regression submodel and a classification submodel.
173 |
174 | Args
175 | num_classes : Number of classes to use.
176 | num_anchors : Number of base anchors.
177 |
178 | Returns
179 |         A list of tuples, where the first element is the name of the submodel and the second element is the
180 | submodel itself.
181 | """
182 | return [
183 | ('regression', default_regression_model(num_anchors)),
184 | ('classification', default_classification_model(num_classes, num_anchors))
185 | ]
186 |
187 |
188 | def __build_model_pyramid(name, model, features):
189 | """ Applies a single submodel to each FPN level.
190 |
191 | Args
192 | name : Name of the submodel.
193 | model : The submodel to evaluate.
194 | features : The FPN features.
195 |
196 | Returns
197 | A tensor containing the response from the submodel on the FPN features.
198 | """
199 | return keras.layers.Concatenate(axis=1, name=name)([model(f) for f in features])
200 |
201 |
202 | def __build_pyramid(models, features):
203 | """ Applies all submodels to each FPN level.
204 |
205 | Args
206 |         models : List of submodels to run on each pyramid level (by default only regression, classification).
207 | features : The FPN features.
208 |
209 | Returns
210 | A list of tensors, one for each submodel.
211 | """
212 | return [__build_model_pyramid(n, m, features) for n, m in models]
213 |
214 |
215 | def __build_anchors(anchor_parameters, features):
216 | """ Builds anchors for the shape of the features from FPN.
217 |
218 | Args
219 |         anchor_parameters : Parameters that determine how anchors are generated.
220 | features : The FPN features.
221 |
222 | Returns
223 | A tensor containing the anchors for the FPN features.
224 |
225 | The shape is:
226 | ```
227 | (batch_size, num_anchors, 4)
228 | ```
229 | """
230 | anchors = [
231 | layers.Anchors(
232 | size=anchor_parameters.sizes[i],
233 | stride=anchor_parameters.strides[i],
234 | ratios=anchor_parameters.ratios,
235 | scales=anchor_parameters.scales,
236 | name='anchors_{}'.format(i)
237 | )(f) for i, f in enumerate(features)
238 | ]
239 |
240 | return keras.layers.Concatenate(axis=1, name='anchors')(anchors)
241 |
242 |
243 | def retinanet(
244 | inputs,
245 | backbone_layers,
246 | num_classes,
247 | num_anchors=9,
248 | create_pyramid_features=__create_pyramid_features,
249 | submodels=None,
250 | name='retinanet'
251 | ):
252 | """ Construct a RetinaNet model on top of a backbone.
253 |
254 | This model is the minimum model necessary for training (with the unfortunate exception of anchors as output).
255 |
256 | Args
257 | inputs : keras.layers.Input (or list of) for the input to the model.
258 | num_classes : Number of classes to classify.
259 | num_anchors : Number of base anchors.
260 | create_pyramid_features : Functor for creating pyramid features given the features C3, C4, C5 from the backbone.
261 | submodels : Submodels to run on each feature map (default is regression and classification
262 | submodels).
263 | name : Name of the model.
264 |
265 | Returns
266 | A keras.models.Model which takes an image as input and outputs generated anchors and the result from each
267 | submodel on every pyramid level.
268 |
269 | The order of the outputs is as defined in submodels:
270 | ```
271 | [
272 | regression, classification, other[0], other[1], ...
273 | ]
274 | ```
275 | """
276 | if submodels is None:
277 | submodels = default_submodels(num_classes, num_anchors)
278 |
279 | C3, C4, C5 = backbone_layers
280 |
281 | # compute pyramid features as per https://arxiv.org/abs/1708.02002
282 | features = create_pyramid_features(C3, C4, C5)
283 |
284 | # for all pyramid levels, run available submodels
285 | pyramids = __build_pyramid(submodels, features)
286 |
287 | return keras.models.Model(inputs=inputs, outputs=pyramids, name=name)
288 |
289 |
290 | def retinanet_bbox(
291 | model=None,
292 | anchor_parameters=AnchorParameters.default,
293 | nms=True,
294 | class_specific_filter=True,
295 | name='retinanet-bbox',
296 | **kwargs
297 | ):
298 | """ Construct a RetinaNet model on top of a backbone and adds convenience functions to output boxes directly.
299 |
300 | This model uses the minimum retinanet model and appends a few layers to compute boxes within the graph.
301 | These layers include applying the regression values to the anchors and performing NMS.
302 |
303 | Args
304 | model : RetinaNet model to append bbox layers to. If None, it will create a RetinaNet model
305 | using **kwargs.
306 | anchor_parameters : Struct containing configuration for anchor generation (sizes, strides, ratios, scales).
307 | nms : Whether to use non-maximum suppression for the filtering step.
308 | class_specific_filter : Whether to use class specific filtering or filter for the best scoring class only.
309 | name : Name of the model.
310 |         **kwargs : Additional kwargs to pass to the minimal retinanet model.
311 |
312 | Returns
313 | A keras.models.Model which takes an image as input and outputs the detections on the image.
314 |
315 | The order is defined as follows:
316 | ```
317 | [
318 | boxes, scores, labels, other[0], other[1], ...
319 | ]
320 | ```
321 | """
322 | if model is None:
323 | model = retinanet(num_anchors=anchor_parameters.num_anchors(), **kwargs)
324 |
325 | # compute the anchors
326 | features = [model.get_layer(p_name).output for p_name in ['P3', 'P4', 'P5', 'P6', 'P7']]
327 | anchors = __build_anchors(anchor_parameters, features)
328 |
329 |     # we expect the regression values as the first output and the classification values as the second
330 | regression = model.outputs[0]
331 | classification = model.outputs[1]
332 |
333 | # "other" can be any additional output from custom submodels, by default this will be []
334 | other = model.outputs[2:]
335 |
336 | # apply predicted regression to anchors
337 | boxes = layers.RegressBoxes(name='boxes')([anchors, regression])
338 | boxes = layers.ClipBoxes(name='clipped_boxes')([model.inputs[0], boxes])
339 |
340 | # filter detections (apply NMS / score threshold / select top-k)
341 | detections = layers.FilterDetections(
342 | nms=nms,
343 | class_specific_filter=class_specific_filter,
344 | name='filtered_detections'
345 | )([boxes, classification] + other)
346 |
347 | outputs = detections
348 |
349 | # construct the model
350 | return keras.models.Model(inputs=model.inputs, outputs=outputs, name=name)
351 |
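# Illustrative inference sketch (not part of retinanet.py above): retinanet_bbox wraps a
# training model and outputs decoded boxes, scores and labels. The weights here are untrained,
# so this only demonstrates the call pattern; the image path points at one of the sample test
# images in this repository and is otherwise arbitrary.
import numpy as np
from keras_retinanet.models.resnet import resnet50_retinanet
from keras_retinanet.models.retinanet import retinanet_bbox
from keras_retinanet.preprocessing.image import read_image_bgr, preprocess_image, resize_image

training_model = resnet50_retinanet(num_classes=500)
inference_model = retinanet_bbox(model=training_model)

image = preprocess_image(read_image_bgr('images/test/00000b4dcff7f799.jpg'))
image, scale = resize_image(image)
boxes, scores, labels = inference_model.predict_on_batch(np.expand_dims(image, axis=0))
boxes /= scale  # map the boxes back to the original (pre-resize) image coordinates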
--------------------------------------------------------------------------------
/keras_retinanet/preprocessing/generator.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import random
3 | import threading
4 | import warnings
5 | import keras
6 |
7 | from ..utils.anchors import anchor_targets_bbox
8 | from ..utils.anchors import anchors_for_shape
9 | from ..utils.anchors import guess_shapes
10 | from ..preprocessing.image import TransformParameters
11 | from ..preprocessing.image import adjust_transform_for_image
12 | from ..preprocessing.image import apply_transform
13 | from ..preprocessing.image import preprocess_image
14 | from ..preprocessing.image import resize_image
15 | from ..preprocessing.image import transform_aabb
16 |
17 |
18 | class Generator(object):
19 | """ Abstract generator class.
20 | """
21 |
22 | def __init__(
23 | self,
24 | transform_generator=None,
25 | batch_size=1,
26 | group_method='ratio', # one of 'none', 'random', 'ratio'
27 | shuffle_groups=True,
28 | image_min_side=800,
29 | image_max_side=1333,
30 | transform_parameters=None,
31 | compute_anchor_targets=anchor_targets_bbox,
32 | compute_shapes=guess_shapes,
33 | preprocess_image=preprocess_image,
34 | ):
35 | """ Initialize Generator object.
36 |
37 | Args
38 | transform_generator : A generator used to randomly transform images and annotations.
39 | batch_size : The size of the batches to generate.
40 | group_method : Determines how images are grouped together (defaults to 'ratio', one of
41 | ('none', 'random', 'ratio')).
42 | shuffle_groups : If True, shuffles the groups each epoch.
43 | image_min_side : After resizing the minimum side of an image is equal to image_min_side.
44 | image_max_side : If after resizing the maximum side is larger than image_max_side, scales down
45 | further so that the max side is equal to image_max_side.
46 | transform_parameters : The transform parameters used for data augmentation.
47 | compute_anchor_targets : Function handler for computing the targets of anchors for an image and its
48 | annotations.
49 | compute_shapes : Function handler for computing the shapes of the pyramid for a given input.
50 | preprocess_image : Function handler for preprocessing an image (scaling / normalizing) for passing
51 | through a network.
52 | """
53 | self.transform_generator = transform_generator
54 | self.batch_size = int(batch_size)
55 | self.group_method = group_method
56 | self.shuffle_groups = shuffle_groups
57 | self.image_min_side = image_min_side
58 | self.image_max_side = image_max_side
59 | self.transform_parameters = transform_parameters or TransformParameters()
60 | self.compute_anchor_targets = compute_anchor_targets
61 | self.compute_shapes = compute_shapes
62 | self.preprocess_image = preprocess_image
63 |
64 | self.group_index = 0
65 | self.lock = threading.Lock()
66 |
67 | self.group_images()
68 |
69 | def size(self):
70 | """ Size of the dataset.
71 | """
72 | raise NotImplementedError('size method not implemented')
73 |
74 | def num_classes(self):
75 | """ Number of classes in the dataset.
76 | """
77 | raise NotImplementedError('num_classes method not implemented')
78 |
79 | def name_to_label(self, name):
80 | """ Map name to label.
81 | """
82 | raise NotImplementedError('name_to_label method not implemented')
83 |
84 | def label_to_name(self, label):
85 | """ Map label to name.
86 | """
87 | raise NotImplementedError('label_to_name method not implemented')
88 |
89 | def image_aspect_ratio(self, image_index):
90 | """ Compute the aspect ratio for an image with image_index.
91 | """
92 | raise NotImplementedError('image_aspect_ratio method not implemented')
93 |
94 | def load_image(self, image_index):
95 | """ Load an image at the image_index.
96 | """
97 | raise NotImplementedError('load_image method not implemented')
98 |
99 | def load_annotations(self, image_index):
100 | """ Load annotations for an image_index.
101 | """
102 | raise NotImplementedError('load_annotations method not implemented')
103 |
104 | def load_annotations_group(self, group):
105 | """ Load annotations for all images in group.
106 | """
107 | return [self.load_annotations(image_index) for image_index in group]
108 |
109 | def filter_annotations(self, image_group, annotations_group, group):
110 |         """ Filter annotations by removing those that are outside of the image bounds or whose width or height is not positive.
111 | """
112 | # test all annotations
113 | for index, (image, annotations) in enumerate(zip(image_group, annotations_group)):
114 | assert (isinstance(annotations,
115 | np.ndarray)), \
116 | '\'load_annotations\' should return a list of numpy arrays, received: {}'.format(type(annotations))
117 |
118 |             # test x2 <= x1 | y2 <= y1 | x1 < 0 | y1 < 0 | x2 > image.shape[1] | y2 > image.shape[0]
119 | invalid_indices = np.where(
120 | (annotations[:, 2] <= annotations[:, 0]) |
121 | (annotations[:, 3] <= annotations[:, 1]) |
122 | (annotations[:, 0] < 0) |
123 | (annotations[:, 1] < 0) |
124 | (annotations[:, 2] > image.shape[1]) |
125 | (annotations[:, 3] > image.shape[0])
126 | )[0]
127 |
128 | # delete invalid indices
129 | if len(invalid_indices):
130 | warnings.warn('Image with id {} (shape {}) contains the following invalid boxes: {}.'.format(
131 | group[index],
132 | image.shape,
133 | [annotations[invalid_index, :] for invalid_index in invalid_indices]
134 | ))
135 | annotations_group[index] = np.delete(annotations, invalid_indices, axis=0)
136 |
137 | return image_group, annotations_group
138 |
139 | def load_image_group(self, group):
140 | """ Load images for all images in a group.
141 | """
142 | return [self.load_image(image_index) for image_index in group]
143 |
144 | def random_transform_group_entry(self, image, annotations):
145 | """ Randomly transforms image and annotation.
146 | """
147 | # randomly transform both image and annotations
148 | if self.transform_generator:
149 | transform = adjust_transform_for_image(next(self.transform_generator), image,
150 | self.transform_parameters.relative_translation)
151 | image = apply_transform(transform, image, self.transform_parameters)
152 |
153 | # Transform the bounding boxes in the annotations.
154 | annotations = annotations.copy()
155 | for index in range(annotations.shape[0]):
156 | annotations[index, :4] = transform_aabb(transform, annotations[index, :4])
157 |
158 | return image, annotations
159 |
160 | def resize_image(self, image):
161 | """ Resize an image using image_min_side and image_max_side.
162 | """
163 | return resize_image(image, min_side=self.image_min_side, max_side=self.image_max_side)
164 |
165 | def preprocess_group_entry(self, image, annotations):
166 | """ Preprocess image and its annotations.
167 | """
168 | # preprocess the image
169 | image = self.preprocess_image(image)
170 |
171 | # # randomly transform image and annotations
172 | # image, annotations = self.random_transform_group_entry(image, annotations)
173 |
174 | # resize image
175 | image, image_scale = self.resize_image(image)
176 |
177 | # apply resizing to annotations too
178 | annotations[:, :4] *= image_scale
179 |
180 | return image, annotations
181 |
182 | def preprocess_group(self, image_group, annotations_group):
183 | """ Preprocess each image and its annotations in its group.
184 | """
185 | for index, (image, annotations) in enumerate(zip(image_group, annotations_group)):
186 | # preprocess a single group entry
187 | image, annotations = self.preprocess_group_entry(image, annotations)
188 |
189 | # copy processed data back to group
190 | image_group[index] = image
191 | annotations_group[index] = annotations
192 |
193 | return image_group, annotations_group
194 |
195 | def group_images(self):
196 |         """ Order the images according to self.group_method and make groups of self.batch_size.
197 | """
198 | # determine the order of the images
199 | order = list(range(self.size()))
200 | if self.group_method == 'random':
201 | random.shuffle(order)
202 | elif self.group_method == 'ratio':
203 | order.sort(key=lambda x: self.image_aspect_ratio(x))
204 |
205 | # divide into groups, one group = one batch
206 | self.groups = [[order[x % len(order)] for x in range(i, i + self.batch_size)] for i in
207 | range(0, len(order), self.batch_size)]
208 |
209 | def compute_inputs(self, image_group):
210 | """ Compute inputs for the network using an image_group.
211 | """
212 | # get the max image shape
213 | max_shape = tuple(max(image.shape[x] for image in image_group) for x in range(3))
214 |
215 | # construct an image batch object
216 | image_batch = np.zeros((self.batch_size,) + max_shape, dtype=keras.backend.floatx())
217 |
218 | # copy all images to the upper left part of the image batch object
219 | for image_index, image in enumerate(image_group):
220 | image_batch[image_index, :image.shape[0], :image.shape[1], :image.shape[2]] = image
221 |
222 | return image_batch
223 |
224 | def generate_anchors(self, image_shape):
225 | return anchors_for_shape(image_shape, shapes_callback=self.compute_shapes)
226 |
227 | def compute_targets(self, image_group, annotations_group):
228 | """ Compute target outputs for the network using images and their annotations.
229 | """
230 | # get the max image shape
231 | max_shape = tuple(max(image.shape[x] for image in image_group) for x in range(3))
232 | anchors = self.generate_anchors(max_shape)
233 |
234 | labels_batch, regression_batch, _ = self.compute_anchor_targets(
235 | anchors,
236 | image_group,
237 | annotations_group,
238 | self.num_classes()
239 | )
240 |
241 | return [regression_batch, labels_batch]
242 |
243 | def compute_input_output(self, group):
244 | """ Compute inputs and target outputs for the network.
245 | """
246 | # load images and annotations
247 | image_group = self.load_image_group(group)
248 | annotations_group = self.load_annotations_group(group)
249 |
250 | # check validity of annotations
251 | image_group, annotations_group = self.filter_annotations(image_group, annotations_group, group)
252 |
253 | # perform preprocessing steps
254 | image_group, annotations_group = self.preprocess_group(image_group, annotations_group)
255 |
256 | # compute network inputs
257 | inputs = self.compute_inputs(image_group)
258 |
259 | # compute network targets
260 | targets = self.compute_targets(image_group, annotations_group)
261 |
262 | return inputs, targets
263 |
264 | def __next__(self):
265 | return self.next()
266 |
267 | def next(self):
268 | # advance the group index
269 | while True:
270 | with self.lock:
271 | if self.group_index == 0 and self.shuffle_groups:
272 | # shuffle groups at start of epoch
273 | random.shuffle(self.groups)
274 | group = self.groups[self.group_index]
275 | self.group_index = (self.group_index + 1) % len(self.groups)
276 | try:
277 | computed = self.compute_input_output(group)
278 |             except Exception:
279 |                 continue  # skip groups that fail to load or preprocess; `break` here would leave `computed` undefined
280 | return computed
281 |
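# Illustrative sketch (not part of generator.py above): pulling one batch from a concrete
# Generator subclass. OpenImagesGenerator (preprocessing/open_images.py) implements the
# abstract methods used by compute_input_output; the dataset directory below is hypothetical
# and must contain the Open Images challenge2018 metadata.
from keras_retinanet.preprocessing.open_images import OpenImagesGenerator

train_generator = OpenImagesGenerator('/data/open-images', subset='train', batch_size=2)
inputs, targets = next(train_generator)
# inputs  : (batch_size, H, W, 3) image batch, zero padded to the largest image in the group
# targets : [regression_batch, labels_batch] as returned by compute_targets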
--------------------------------------------------------------------------------
/keras_retinanet/preprocessing/image.py:
--------------------------------------------------------------------------------
1 | from __future__ import division
2 |
3 | import keras
4 | import numpy as np
5 | import cv2
6 | from PIL import Image
7 |
8 | DEFAULT_PRNG = np.random
9 |
10 |
11 | def colvec(*args):
12 | """ Create a numpy array representing a column vector. """
13 | return np.array([args]).T
14 |
15 |
16 | def transform_aabb(transform, aabb):
17 | """ Apply a transformation to an axis aligned bounding box.
18 |
19 | The result is a new AABB in the same coordinate system as the original AABB.
20 | The new AABB contains all corner points of the original AABB after applying the given transformation.
21 |
22 | Args
23 | transform: The transformation to apply.
24 |         aabb: The axis aligned bounding box as tuple (x1, y1, x2, y2), where
25 |             x1 and y1 are the minimum x and y values of the box and
26 |             x2 and y2 are the maximum x and y values of the box,
27 |             all expressed in the original (untransformed) coordinate system.
28 | Returns
29 | The new AABB as tuple (x1, y1, x2, y2)
30 | """
31 | x1, y1, x2, y2 = aabb
32 | # Transform all 4 corners of the AABB.
33 | points = transform.dot([
34 | [x1, x2, x1, x2],
35 | [y1, y2, y2, y1],
36 | [1, 1, 1, 1],
37 | ])
38 |
39 | # Extract the min and max corners again.
40 | min_corner = points.min(axis=1)
41 | max_corner = points.max(axis=1)
42 |
43 | return [min_corner[0], min_corner[1], max_corner[0], max_corner[1]]
44 |
45 |
46 | def _random_vector(min, max, prng=DEFAULT_PRNG):
47 | """ Construct a random vector between min and max.
48 | Args
49 | min: the minimum value for each component
50 | max: the maximum value for each component
51 | """
52 | min = np.array(min)
53 | max = np.array(max)
54 | assert min.shape == max.shape
55 | assert len(min.shape) == 1
56 | return prng.uniform(min, max)
57 |
58 |
59 | def rotation(angle):
60 | """ Construct a homogeneous 2D rotation matrix.
61 | Args
62 | angle: the angle in radians
63 | Returns
64 | the rotation matrix as 3 by 3 numpy array
65 | """
66 | return np.array([
67 | [np.cos(angle), -np.sin(angle), 0],
68 | [np.sin(angle), np.cos(angle), 0],
69 | [0, 0, 1]
70 | ])
71 |
72 |
73 | def random_rotation(min, max, prng=DEFAULT_PRNG):
74 |     """ Construct a random rotation between min and max.
75 | Args
76 | min: a scalar for the minimum absolute angle in radians
77 | max: a scalar for the maximum absolute angle in radians
78 | prng: the pseudo-random number generator to use.
79 | Returns
80 | a homogeneous 3 by 3 rotation matrix
81 | """
82 | return rotation(prng.uniform(min, max))
83 |
84 |
85 | def translation(translation):
86 | """ Construct a homogeneous 2D translation matrix.
87 |     Args
88 |         translation: the translation 2D vector
89 |     Returns
90 | the translation matrix as 3 by 3 numpy array
91 | """
92 | return np.array([
93 | [1, 0, translation[0]],
94 | [0, 1, translation[1]],
95 | [0, 0, 1]
96 | ])
97 |
98 |
99 | def random_translation(min, max, prng=DEFAULT_PRNG):
100 | """ Construct a random 2D translation between min and max.
101 | Args
102 | min: a 2D vector with the minimum translation for each dimension
103 | max: a 2D vector with the maximum translation for each dimension
104 | prng: the pseudo-random number generator to use.
105 | Returns
106 | a homogeneous 3 by 3 translation matrix
107 | """
108 | return translation(_random_vector(min, max, prng))
109 |
110 |
111 | def shear(angle):
112 | """ Construct a homogeneous 2D shear matrix.
113 | Args
114 | angle: the shear angle in radians
115 | Returns
116 | the shear matrix as 3 by 3 numpy array
117 | """
118 | return np.array([
119 | [1, -np.sin(angle), 0],
120 | [0, np.cos(angle), 0],
121 | [0, 0, 1]
122 | ])
123 |
124 |
125 | def random_shear(min, max, prng=DEFAULT_PRNG):
126 |     """ Construct a random 2D shear matrix with shear angle between min and max.
127 | Args
128 | min: the minimum shear angle in radians.
129 | max: the maximum shear angle in radians.
130 | prng: the pseudo-random number generator to use.
131 | Returns
132 | a homogeneous 3 by 3 shear matrix
133 | """
134 | return shear(prng.uniform(min, max))
135 |
136 |
137 | def scaling(factor):
138 | """ Construct a homogeneous 2D scaling matrix.
139 | Args
140 | factor: a 2D vector for X and Y scaling
141 | Returns
142 | the zoom matrix as 3 by 3 numpy array
143 | """
144 | return np.array([
145 | [factor[0], 0, 0],
146 | [0, factor[1], 0],
147 | [0, 0, 1]
148 | ])
149 |
150 |
151 | def random_scaling(min, max, prng=DEFAULT_PRNG):
152 |     """ Construct a random 2D scale matrix between min and max.
153 |     Args
154 |         min: a 2D vector containing the minimum scaling factor for X and Y.
155 |         max: a 2D vector containing the maximum scaling factor for X and Y.
156 | prng: the pseudo-random number generator to use.
157 | Returns
158 | a homogeneous 3 by 3 scaling matrix
159 | """
160 | return scaling(_random_vector(min, max, prng))
161 |
162 |
163 | def random_flip(flip_x_chance, flip_y_chance, prng=DEFAULT_PRNG):
164 | """ Construct a transformation randomly containing X/Y flips (or not).
165 | Args
166 | flip_x_chance: The chance that the result will contain a flip along the X axis.
167 | flip_y_chance: The chance that the result will contain a flip along the Y axis.
168 | prng: The pseudo-random number generator to use.
169 | Returns
170 | a homogeneous 3 by 3 transformation matrix
171 | """
172 | flip_x = prng.uniform(0, 1) < flip_x_chance
173 | flip_y = prng.uniform(0, 1) < flip_y_chance
174 | # 1 - 2 * bool gives 1 for False and -1 for True.
175 | return scaling((1 - 2 * flip_x, 1 - 2 * flip_y))
176 |
177 |
178 | def change_transform_origin(transform, center):
179 | """ Create a new transform representing the same transformation,
180 | only with the origin of the linear part changed.
181 | Args
182 | transform: the transformation matrix
183 | center: the new origin of the transformation
184 | Returns
185 | translate(center) * transform * translate(-center)
186 | """
187 | center = np.array(center)
188 | return np.linalg.multi_dot([translation(center), transform, translation(-center)])
189 |
190 |
191 | def random_transform(
192 | min_rotation=0,
193 | max_rotation=0,
194 | min_translation=(0, 0),
195 | max_translation=(0, 0),
196 | min_shear=0,
197 | max_shear=0,
198 | min_scaling=(1, 1),
199 | max_scaling=(1, 1),
200 | flip_x_chance=0,
201 | flip_y_chance=0,
202 | prng=DEFAULT_PRNG
203 | ):
204 | """ Create a random transformation.
205 |
206 | The transformation consists of the following operations in this order (from left to right):
207 | * rotation
208 | * translation
209 | * shear
210 | * scaling
211 | * flip x (if applied)
212 | * flip y (if applied)
213 |
214 | Note that by default, the data generators in `keras_retinanet.preprocessing.generators` interpret the translation
215 |     as factor of the image size. So an X translation of 0.1 would translate the image by 10% of its width.
216 | Set `relative_translation` to `False` in the `TransformParameters` of a data generator to have it interpret
217 | the translation directly as pixel distances instead.
218 |
219 | Args
220 | min_rotation: The minimum rotation in radians for the transform as scalar.
221 | max_rotation: The maximum rotation in radians for the transform as scalar.
222 | min_translation: The minimum translation for the transform as 2D column vector.
223 | max_translation: The maximum translation for the transform as 2D column vector.
224 | min_shear: The minimum shear angle for the transform in radians.
225 | max_shear: The maximum shear angle for the transform in radians.
226 | min_scaling: The minimum scaling for the transform as 2D column vector.
227 | max_scaling: The maximum scaling for the transform as 2D column vector.
228 | flip_x_chance: The chance (0 to 1) that a transform will contain a flip along X direction.
229 | flip_y_chance: The chance (0 to 1) that a transform will contain a flip along Y direction.
230 | prng: The pseudo-random number generator to use.
231 | """
232 | return np.linalg.multi_dot([
233 | random_rotation(min_rotation, max_rotation, prng),
234 | random_translation(min_translation, max_translation, prng),
235 | random_shear(min_shear, max_shear, prng),
236 | random_scaling(min_scaling, max_scaling, prng),
237 | random_flip(flip_x_chance, flip_y_chance, prng)
238 | ])
239 |
240 |
241 | def random_transform_generator(prng=None, **kwargs):
242 | """ Create a random transform generator.
243 |
244 | Uses a dedicated, newly created, properly seeded PRNG by default instead of the global DEFAULT_PRNG.
245 |
246 | The transformation consists of the following operations in this order (from left to right):
247 | * rotation
248 | * translation
249 | * shear
250 | * scaling
251 | * flip x (if applied)
252 | * flip y (if applied)
253 |
254 | Note that by default, the data generators in `keras_retinanet.preprocessing.generators` interpret the translation
255 |     as factor of the image size. So an X translation of 0.1 would translate the image by 10% of its width.
256 | Set `relative_translation` to `False` in the `TransformParameters` of a data generator to have it interpret
257 | the translation directly as pixel distances instead.
258 |
259 | Args
260 | min_rotation: The minimum rotation in radians for the transform as scalar.
261 | max_rotation: The maximum rotation in radians for the transform as scalar.
262 | min_translation: The minimum translation for the transform as 2D column vector.
263 | max_translation: The maximum translation for the transform as 2D column vector.
264 | min_shear: The minimum shear angle for the transform in radians.
265 | max_shear: The maximum shear angle for the transform in radians.
266 | min_scaling: The minimum scaling for the transform as 2D column vector.
267 | max_scaling: The maximum scaling for the transform as 2D column vector.
268 | flip_x_chance: The chance (0 to 1) that a transform will contain a flip along X direction.
269 | flip_y_chance: The chance (0 to 1) that a transform will contain a flip along Y direction.
270 | prng: The pseudo-random number generator to use.
271 | """
272 |
273 | if prng is None:
274 | # RandomState automatically seeds using the best available method.
275 | prng = np.random.RandomState()
276 |
277 | while True:
278 | yield random_transform(prng=prng, **kwargs)
279 |
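# Illustrative sketch (not part of image.py above): a typical augmentation generator that
# flips horizontally half of the time and jitters the scale slightly; the values are only
# examples, not tuned settings. The resulting generator is what Generator.transform_generator
# in preprocessing/generator.py expects.
transform_generator = random_transform_generator(
    min_scaling=(0.9, 0.9),
    max_scaling=(1.1, 1.1),
    flip_x_chance=0.5,
)
example_transform = next(transform_generator)  # a homogeneous 3x3 transformation matrix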
280 |
281 | def read_image_bgr(path):
282 | """ Read an image in BGR format.
283 |
284 | Args
285 | path: Path to the image.
286 | """
287 | image = np.asarray(Image.open(path).convert('RGB'))
288 | return image[:, :, ::-1].copy()
289 |
290 |
291 | def preprocess_image(x):
292 | """ Preprocess an image by subtracting the ImageNet mean.
293 |
294 | Args
295 | x: np.array of shape (None, None, 3) or (3, None, None).
296 |             The channels are assumed to already be in BGR order.
297 |
298 |     The image is zero-centered per colour channel with respect to the ImageNet
299 |     dataset (caffe-style preprocessing), without any scaling.
300 |
301 | Returns
302 | The input with the ImageNet mean subtracted.
303 | """
304 | # mostly identical to:
305 | # "https://github.com/keras-team/keras-applications/blob/master/keras_applications/imagenet_utils.py"
306 | # except for converting RGB -> BGR since we assume BGR already
307 | x = x.astype(keras.backend.floatx())
308 |
309 | if keras.backend.image_data_format() == 'channels_first':
310 | if x.ndim == 3:
311 | x[0, :, :] -= 103.939
312 | x[1, :, :] -= 116.779
313 | x[2, :, :] -= 123.68
314 | else:
315 | x[:, 0, :, :] -= 103.939
316 | x[:, 1, :, :] -= 116.779
317 | x[:, 2, :, :] -= 123.68
318 | else:
319 | x[..., 0] -= 103.939
320 | x[..., 1] -= 116.779
321 | x[..., 2] -= 123.68
322 |
323 | return x
324 |
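# Illustrative sketch (not part of image.py above): with the default channels_last image
# data format, a pixel equal to the ImageNet BGR mean maps to zero after preprocessing
# (np is the module-level numpy import of this file).
_mean_image = np.tile(np.array([103.939, 116.779, 123.68]), (2, 2, 1))
assert np.allclose(preprocess_image(_mean_image), 0.0)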
325 |
326 | def adjust_transform_for_image(transform, image, relative_translation):
327 | """ Adjust a transformation for a specific image.
328 |
329 | The translation of the matrix will be scaled with the size of the image.
330 |     The linear part of the transformation will be adjusted so that the origin of the transformation will be
331 | at the center of the image.
332 | """
333 | height, width, channels = image.shape
334 |
335 | result = transform
336 |
337 | # Scale the translation with the image size if specified.
338 | if relative_translation:
339 | result[0:2, 2] *= [width, height]
340 |
341 | # Move the origin of transformation.
342 |     result = change_transform_origin(result, (0.5 * width, 0.5 * height))
343 |
344 | return result
345 |
346 |
347 | class TransformParameters:
348 | """ Struct holding parameters determining how to apply a transformation to an image.
349 |
350 | Args
351 | fill_mode: One of: 'constant', 'nearest', 'reflect', 'wrap'
352 | interpolation: One of: 'nearest', 'linear', 'cubic', 'area', 'lanczos4'
353 | cval: Fill value to use with fill_mode='constant'
354 | data_format: Same as for keras.preprocessing.image.apply_transform
355 | relative_translation: If true (the default), interpret translation as a factor of the image size.
356 | If false, interpret it as absolute pixels.
357 | """
358 |
359 | def __init__(
360 | self,
361 | fill_mode='nearest',
362 | interpolation='linear',
363 | cval=0,
364 | data_format=None,
365 | relative_translation=True,
366 | ):
367 | self.fill_mode = fill_mode
368 | self.cval = cval
369 | self.interpolation = interpolation
370 | self.relative_translation = relative_translation
371 |
372 | if data_format is None:
373 | data_format = keras.backend.image_data_format()
374 | self.data_format = data_format
375 |
376 | if data_format == 'channels_first':
377 | self.channel_axis = 0
378 | elif data_format == 'channels_last':
379 | self.channel_axis = 2
380 | else:
381 | raise ValueError(
382 | "invalid data_format, expected 'channels_first' or 'channels_last', got '{}'".format(data_format))
383 |
384 | def cvBorderMode(self):
385 | if self.fill_mode == 'constant':
386 | return cv2.BORDER_CONSTANT
387 | if self.fill_mode == 'nearest':
388 | return cv2.BORDER_REPLICATE
389 | if self.fill_mode == 'reflect':
390 | return cv2.BORDER_REFLECT_101
391 | if self.fill_mode == 'wrap':
392 | return cv2.BORDER_WRAP
393 |
394 | def cvInterpolation(self):
395 | if self.interpolation == 'nearest':
396 | return cv2.INTER_NEAREST
397 | if self.interpolation == 'linear':
398 | return cv2.INTER_LINEAR
399 | if self.interpolation == 'cubic':
400 | return cv2.INTER_CUBIC
401 | if self.interpolation == 'area':
402 | return cv2.INTER_AREA
403 | if self.interpolation == 'lanczos4':
404 | return cv2.INTER_LANCZOS4
405 |
406 |
407 | def apply_transform(matrix, image, params):
408 | """
409 | Apply a transformation to an image.
410 |
411 | The origin of transformation is at the top left corner of the image.
412 |
413 | The matrix is interpreted such that a point (x, y) on the original image is moved to
414 | transform * (x, y) in the generated image.
415 |     Mathematically speaking, that means that the matrix maps points from the original image space to
416 |     the transformed image space.
417 |
418 | Args
419 |         matrix: A homogeneous 3 by 3 matrix representing the transformation to apply.
420 | image: The image to transform.
421 | params: The transform parameters (see TransformParameters)
422 | """
423 | if params.channel_axis != 2:
424 | image = np.moveaxis(image, params.channel_axis, 2)
425 |
426 | output = cv2.warpAffine(
427 | image,
428 | matrix[:2, :],
429 | dsize=(image.shape[1], image.shape[0]),
430 | flags=params.cvInterpolation(),
431 | borderMode=params.cvBorderMode(),
432 | borderValue=params.cval,
433 | )
434 |
435 | if params.channel_axis != 2:
436 | output = np.moveaxis(output, 2, params.channel_axis)
437 | return output
438 |
439 |
440 | def resize_image(img, min_side=800, max_side=1333):
441 | """ Resize an image such that the size is constrained to min_side and max_side.
442 |
443 | Args
444 | min_side: The image's min side will be equal to min_side after resizing.
445 | max_side: If after resizing the image's max side is above max_side, resize until the max side is
446 | equal to max_side.
447 |
448 | Returns
449 | A resized image.
450 | """
451 | (rows, cols, _) = img.shape
452 |
453 | smallest_side = min(rows, cols)
454 |
455 | # rescale the image so the smallest side is min_side
456 | scale = min_side / smallest_side
457 |
458 | # check if the largest side is now greater than max_side, which can happen
459 | # when images have a large aspect ratio
460 | largest_side = max(rows, cols)
461 | if largest_side * scale > max_side:
462 | scale = max_side / largest_side
463 |
464 | # resize the image with the computed scale
465 | img = cv2.resize(img, None, fx=scale, fy=scale)
466 |
467 | return img, scale
468 |
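# Illustrative sketch (not part of image.py above): for a 500 x 2000 image the min-side scale
# of 800 / 500 = 1.6 would stretch the long side to 3200 px, so the max-side cap wins and the
# final scale is 1333 / 2000 = 0.6665 (np is the module-level numpy import of this file).
_wide_image = np.zeros((500, 2000, 3), dtype='uint8')
_resized, _scale = resize_image(_wide_image)
assert abs(_scale - 1333.0 / 2000.0) < 1e-6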
--------------------------------------------------------------------------------
/keras_retinanet/preprocessing/open_images.py:
--------------------------------------------------------------------------------
1 | import csv
2 | import json
3 | import os
4 | import warnings
5 |
6 | import numpy as np
7 | from PIL import Image
8 |
9 | from ..preprocessing.generator import Generator
10 | from ..preprocessing.image import read_image_bgr
11 |
12 |
13 | def load_hierarchy(metadata_dir, version='challenge2018'):
14 | hierarchy = None
15 | if version == 'challenge2018':
16 | hierarchy = 'bbox_labels_500_hierarchy.json'
17 |
18 | hierarchy_json = os.path.join(metadata_dir, hierarchy)
19 | with open(hierarchy_json) as f:
20 | hierarchy_data = json.loads(f.read())
21 |
22 | return hierarchy_data
23 |
24 |
25 | def load_hierarchy_children(hierarchy):
26 | res = [hierarchy['LabelName']]
27 |
28 | if 'Subcategory' in hierarchy:
29 | for subcategory in hierarchy['Subcategory']:
30 | children = load_hierarchy_children(subcategory)
31 |
32 | for c in children:
33 | res.append(c)
34 |
35 | return res
36 |
37 |
38 | def find_hierarchy_parent(hierarchy, parent_cls):
39 | if hierarchy['LabelName'] == parent_cls:
40 | return hierarchy
41 | elif 'Subcategory' in hierarchy:
42 | for child in hierarchy['Subcategory']:
43 | res = find_hierarchy_parent(child, parent_cls)
44 | if res is not None:
45 | return res
46 |
47 | return None
48 |
49 |
50 | def get_labels(metadata_dir, version='challenge2018'):
51 | if version == 'challenge2018':
52 | csv_file = 'challenge-2018-class-descriptions-500.csv'
53 |
54 | boxable_classes_descriptions = os.path.join(metadata_dir, csv_file)
55 | id_to_labels = {}
56 | cls_index = {}
57 |
58 | i = 0
59 | with open(boxable_classes_descriptions) as f:
60 | for row in csv.reader(f):
61 | # make sure the csv row is not empty (usually the last one)
62 | if len(row):
63 | label = row[0]
64 | description = row[1].replace("\"", "").replace("'", "").replace('`', '')
65 |
66 | id_to_labels[i] = description
67 | cls_index[label] = i
68 |
69 | i += 1
70 | else:
71 | trainable_classes_path = os.path.join(metadata_dir, 'classes-bbox-trainable.txt')
72 | description_path = os.path.join(metadata_dir, 'class-descriptions.csv')
73 |
74 | description_table = {}
75 | with open(description_path) as f:
76 | for row in csv.reader(f):
77 | # make sure the csv row is not empty (usually the last one)
78 | if len(row):
79 | description_table[row[0]] = row[1].replace("\"", "").replace("'", "").replace('`', '')
80 |
81 |         with open(trainable_classes_path, 'r') as f:  # text mode so splitting on '\n' works under Python 3
82 | trainable_classes = f.read().split('\n')
83 |
84 | id_to_labels = dict([(i, description_table[c]) for i, c in enumerate(trainable_classes)])
85 | cls_index = dict([(c, i) for i, c in enumerate(trainable_classes)])
86 |
87 | return id_to_labels, cls_index
88 |
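# Illustrative sketch (not part of open_images.py above): get_labels reads
# challenge-2018-class-descriptions-500.csv ("<MID>,<description>" rows) from the metadata
# directory and returns two lookups. The directory below is hypothetical.
id_to_labels, cls_index = get_labels('/data/open-images/challenge2018')
first_mid = next(iter(cls_index))          # a machine id such as '/m/...'
assert cls_index[first_mid] == 0           # MID -> contiguous class index
assert isinstance(id_to_labels[0], str)    # class index -> human readable description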
89 |
90 | def generate_images_annotations_json(main_dir, metadata_dir, subset, cls_index, version='challenge2018'):
91 | validation_image_ids = {}
92 |
93 | validation_image_ids_path = os.path.join(metadata_dir, 'challenge-2018-image-ids-valset-od.csv')
94 |
95 | with open(validation_image_ids_path, 'r') as csv_file:
96 | reader = csv.DictReader(csv_file, fieldnames=['ImageID'])
97 | next(reader)
98 | for line, row in enumerate(reader):
99 | image_id = row['ImageID']
100 | validation_image_ids[image_id] = True
101 |
102 | annotations_path = os.path.join(metadata_dir, 'challenge-2018-train-annotations-bbox.csv')
103 |
104 | fieldnames = ['ImageID', 'Source', 'LabelName', 'Confidence',
105 | 'XMin', 'XMax', 'YMin', 'YMax',
106 | 'IsOccluded', 'IsTruncated', 'IsGroupOf', 'IsDepiction', 'IsInside']
107 |
108 | id_annotations = dict()
109 | with open(annotations_path, 'r') as csv_file:
110 | reader = csv.DictReader(csv_file, fieldnames=fieldnames)
111 | next(reader)
112 |
113 | images_sizes = {}
114 | for line, row in enumerate(reader):
115 | frame = row['ImageID']
116 |
117 | if version == 'challenge2018':
118 | if subset == 'train':
119 | if frame in validation_image_ids:
120 | continue
121 | elif subset == 'validation':
122 | if frame not in validation_image_ids:
123 | continue
124 | else:
125 | raise NotImplementedError('This generator handles only the train and validation subsets')
126 |
127 | class_name = row['LabelName']
128 |
129 | if class_name not in cls_index:
130 | continue
131 |
132 | cls_id = cls_index[class_name]
133 |
134 | if version == 'challenge2018':
135 | img_path = os.path.join(main_dir, 'images', 'train', frame + '.jpg')
136 | else:
137 | img_path = os.path.join(main_dir, 'images', subset, frame + '.jpg')
138 |
139 | if frame in images_sizes:
140 | width, height = images_sizes[frame]
141 | else:
142 | try:
143 | with Image.open(img_path) as img:
144 | width, height = img.width, img.height
145 | images_sizes[frame] = (width, height)
146 | except Exception as ex:
147 | if version == 'challenge2018':
148 | raise ex
149 | continue
150 |
151 | x1 = float(row['XMin'])
152 | x2 = float(row['XMax'])
153 | y1 = float(row['YMin'])
154 | y2 = float(row['YMax'])
155 |
156 | x1_int = int(round(x1 * width))
157 | x2_int = int(round(x2 * width))
158 | y1_int = int(round(y1 * height))
159 | y2_int = int(round(y2 * height))
160 |
161 | # Check that the bounding box is valid.
162 | if x2 <= x1:
163 | raise ValueError('line {}: x2 ({}) must be higher than x1 ({})'.format(line, x2, x1))
164 | if y2 <= y1:
165 | raise ValueError('line {}: y2 ({}) must be higher than y1 ({})'.format(line, y2, y1))
166 |
167 | if y2_int == y1_int:
168 | warnings.warn('filtering line {}: rounding y2 ({}) and y1 ({}) makes them equal'.format(line, y2, y1))
169 | continue
170 |
171 | if x2_int == x1_int:
172 | warnings.warn('filtering line {}: rounding x2 ({}) and x1 ({}) makes them equal'.format(line, x2, x1))
173 | continue
174 |
175 | img_id = row['ImageID']
176 | annotation = {'cls_id': cls_id, 'x1': x1, 'x2': x2, 'y1': y1, 'y2': y2}
177 |
178 | if img_id in id_annotations:
179 | annotations = id_annotations[img_id]
180 | annotations['boxes'].append(annotation)
181 | else:
182 | id_annotations[img_id] = {'w': width, 'h': height, 'boxes': [annotation]}
183 |
184 | return id_annotations
185 |
186 |
187 | class OpenImagesGenerator(Generator):
188 | def __init__(
189 | self, main_dir, subset, version='challenge2018',
190 | labels_filter=None, annotation_cache_dir='.',
191 | parent_label=None,
192 | **kwargs
193 | ):
194 | if version == 'challenge2018':
195 | metadata = 'challenge2018'
196 | else:
197 |             raise NotImplementedError('This generator currently only supports the challenge2018 version')
198 |
199 | if version == 'challenge2018':
200 | self.base_dir = os.path.join(main_dir, 'images', 'train')
201 |
202 | metadata_dir = os.path.join(main_dir, metadata)
203 | annotation_cache_json = os.path.join(annotation_cache_dir, subset + '.json')
204 |
205 | self.hierarchy = load_hierarchy(metadata_dir, version=version)
206 | id_to_labels, cls_index = get_labels(metadata_dir, version=version)
207 |
208 | if os.path.exists(annotation_cache_json):
209 | print("Loading {} annotations...".format(subset))
210 | with open(annotation_cache_json, 'r') as f:
211 | self.annotations = json.loads(f.read())
212 | else:
213 | print("Starting to generate image annotations...")
214 | self.annotations = generate_images_annotations_json(main_dir, metadata_dir, subset, cls_index,
215 | version=version)
216 | print("Dumping the annotations...")
217 | json.dump(self.annotations, open(annotation_cache_json, "w"))
218 |
219 | if labels_filter is not None or parent_label is not None:
220 | self.id_to_labels, self.annotations = self.__filter_data(id_to_labels, cls_index, labels_filter,
221 | parent_label)
222 | else:
223 | self.id_to_labels = id_to_labels
224 |
225 | self.id_to_image_id = dict([(i, k) for i, k in enumerate(self.annotations)])
226 |
227 | super(OpenImagesGenerator, self).__init__(**kwargs)
228 |
229 | def __filter_data(self, id_to_labels, cls_index, labels_filter=None, parent_label=None):
230 |         """
231 |         Restrict the dataset to a subset of the labels.
232 |
233 |         :param labels_filter: A list of label names to keep, e.g. ['Helmet', 'Hat', 'Analog television'].
234 |         :param parent_label: If set, keeps this parent label as well as all of its children in the
235 |             semantic hierarchy as defined in OID (e.g. 'Animal' keeps the whole Animal subtree).
236 |         :return: The filtered (id_to_labels, annotations) pair.
237 |         """
238 |
239 | children_id_to_labels = {}
240 |
241 | if parent_label is None:
242 | # there is/are no other sublabel(s) other than the labels itself
243 |
244 | for label in labels_filter:
245 |                 for i, lb in iter(id_to_labels.items()):
246 | if lb == label:
247 | children_id_to_labels[i] = label
248 | break
249 | else:
250 | parent_cls = None
251 | for i, lb in iter(id_to_labels.items()):
252 | if lb == parent_label:
253 | parent_id = i
254 | for c, index in iter(cls_index.items()):
255 | if index == parent_id:
256 | parent_cls = c
257 | break
258 |
259 | if parent_cls is None:
260 |             raise Exception('Could not find label {}'.format(parent_label))
261 |
262 | parent_tree = find_hierarchy_parent(self.hierarchy, parent_cls)
263 |
264 | if parent_tree is None:
265 |             raise Exception('Could not find parent {} in the semantic hierarchical tree'.format(parent_label))
266 |
267 | children = load_hierarchy_children(parent_tree)
268 |
269 | for cls in children:
270 | index = cls_index[cls]
271 | label = id_to_labels[index]
272 | children_id_to_labels[index] = label
273 |
274 | id_map = dict([(ind, i) for i, ind in enumerate(iter(children_id_to_labels.keys()))])
275 |
276 | filtered_annotations = {}
277 | for k in self.annotations:
278 | img_ann = self.annotations[k]
279 |
280 | filtered_boxes = []
281 | for ann in img_ann['boxes']:
282 | cls_id = ann['cls_id']
283 | if cls_id in children_id_to_labels:
284 | ann['cls_id'] = id_map[cls_id]
285 | filtered_boxes.append(ann)
286 |
287 | if len(filtered_boxes) > 0:
288 | filtered_annotations[k] = {'w': img_ann['w'], 'h': img_ann['h'], 'boxes': filtered_boxes}
289 |
290 | children_id_to_labels = dict([(id_map[i], l) for (i, l) in iter(children_id_to_labels.items())])
291 |
292 | return children_id_to_labels, filtered_annotations
293 |
294 | def size(self):
295 | return len(self.annotations)
296 |
297 | def num_classes(self):
298 | return len(self.id_to_labels)
299 |
300 | def name_to_label(self, name):
301 | raise NotImplementedError()
302 |
303 | def label_to_name(self, label):
304 | return self.id_to_labels[label]
305 |
306 | def image_aspect_ratio(self, image_index):
307 | img_annotations = self.annotations[self.id_to_image_id[image_index]]
308 | height, width = img_annotations['h'], img_annotations['w']
309 | return float(width) / float(height)
310 |
311 | def image_path(self, image_index):
312 | path = os.path.join(self.base_dir, self.id_to_image_id[image_index] + '.jpg')
313 | return path
314 |
315 | def load_image(self, image_index):
316 | return read_image_bgr(self.image_path(image_index))
317 |
318 | def load_annotations(self, image_index):
319 | image_annotations = self.annotations[self.id_to_image_id[image_index]]
320 |
321 | labels = image_annotations['boxes']
322 | height, width = image_annotations['h'], image_annotations['w']
323 |
324 | boxes = np.zeros((len(labels), 5))
325 | for idx, ann in enumerate(labels):
326 | cls_id = ann['cls_id']
327 | x1 = ann['x1'] * width
328 | x2 = ann['x2'] * width
329 | y1 = ann['y1'] * height
330 | y2 = ann['y2'] * height
331 |
332 | boxes[idx, 0] = x1
333 | boxes[idx, 1] = y1
334 | boxes[idx, 2] = x2
335 | boxes[idx, 3] = y2
336 | boxes[idx, 4] = cls_id
337 |
338 | return boxes
339 |
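# Illustrative sketch (not part of open_images.py above): restricting training to a subset of
# classes, either by explicit label names or by a parent node of the OID semantic hierarchy.
# The dataset directory is hypothetical; the label names mirror the __filter_data docstring.
filtered_generator = OpenImagesGenerator(
    '/data/open-images',
    subset='train',
    labels_filter=['Helmet', 'Hat', 'Analog television'],
)
# or, equivalently for a whole subtree:
# OpenImagesGenerator('/data/open-images', subset='train', parent_label='Animal')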
--------------------------------------------------------------------------------
/keras_retinanet/setup.py:
--------------------------------------------------------------------------------
1 | import setuptools
2 | from setuptools import find_packages
3 |
4 | with open("README.md", "r") as fh:
5 | long_description = fh.read()
6 |
7 | REQUIRED_PACKAGES = ['keras', 'tensorflow', 'keras-resnet', 'six', 'pandas', 'sklearn']
8 |
9 | setuptools.setup(
10 | name="keras_retinanet",
11 | version="1.0.0",
12 | author="Mukesh Mithrakumar",
13 | author_email="mukesh.mithrakumar@jacks.sdstate.edu",
14 | description="Keras implementation of RetinaNet for object detection and visual relationship identification",
15 | long_description=long_description,
16 | long_description_content_type="text/markdown",
17 | url="https://github.com/mukeshmithrakumar/RetinaNet",
18 | classifiers=(
19 |         "Development Status :: 3 - Alpha",
20 | "Intended Audience :: Developers",
21 | "Programming Language :: Python :: 3.6",
22 | "License :: OSI Approved :: MIT License",
23 | "Operating System :: OS Independent",
24 | ),
25 |     keywords="keras retinanet object detection",
26 | packages=find_packages(exclude=['contrib', 'docs', 'tests*']),
27 | install_requires=REQUIRED_PACKAGES,
28 | entry_points={
29 | 'console_scripts': [
30 | 'retinanet_task = keras_retinanet.trainer.task:main',
31 | 'retinanet_train = keras_retinanet.trainer.train:main',
32 | 'retinanet_evaluate = keras_retinanet.trainer.evaluate:main',
33 | ]
34 | },
35 | python_requires='>=3',
36 | )
37 |
--------------------------------------------------------------------------------
/keras_retinanet/trainer/convert_model.py:
--------------------------------------------------------------------------------
1 | import argparse
2 | import os
3 | import sys
4 |
5 |
6 | if __name__ == '__main__' and __package__ is None:
7 | sys.path.insert(0, os.path.join(os.path.dirname(__file__), '..', '..'))
8 | __package__ = "keras_retinanet.trainer"
9 |
10 | from ..models import model_backbone
11 |
12 |
13 | def parse_args(args):
14 | parser = argparse.ArgumentParser(description='Script for converting a training model to an inference model.')
15 |
16 | parser.add_argument(
17 | 'main_dir',
18 | help='Path to dataset directory.'
19 | )
20 | parser.add_argument(
21 | 'model_in',
22 | help='The model to convert.'
23 | )
24 | parser.add_argument(
25 | '--backbone',
26 | help='The backbone of the model to convert.',
27 | default='resnet50'
28 | )
29 | parser.add_argument(
30 | '--no-nms',
31 | help='Disables non maximum suppression.',
32 | dest='nms',
33 | action='store_false'
34 | )
35 | parser.add_argument(
36 | '--no-class-specific-filter',
37 | help='Disables class specific filtering.',
38 | dest='class_specific_filter',
39 | action='store_false'
40 | )
41 |
42 | return parser.parse_args(args)
43 |
44 |
45 | def main(args=None):
46 | # parse arguments
47 | if args is None:
48 | args = sys.argv[1:]
49 | args = parse_args(args)
50 |
51 | # load and convert model
52 | model = model_backbone.load_model(args.model_in,
53 | convert=True,
54 | backbone_name=args.backbone,
55 | nms=args.nms,
56 | class_specific_filter=args.class_specific_filter)
57 |
58 | # save model
59 | model_out_path = os.path.join(args.main_dir, 'keras_retinanet', 'trainer', 'snapshots')
60 |     model_out = os.path.join(model_out_path, '{}_inference.h5'.format(os.path.splitext(os.path.basename(args.model_in))[0]))
61 | model.save(model_out)
62 |
63 |
64 | if __name__ == '__main__':
65 | main()
66 |
--------------------------------------------------------------------------------
/keras_retinanet/trainer/evaluate.py:
--------------------------------------------------------------------------------
1 | import keras
2 | import argparse
3 | import tensorflow as tf
4 | import numpy as np
5 | import sys
6 | import os
7 | import csv
8 | import pandas as pd
9 |
10 | if __name__ == '__main__' and __package__ is None:
11 | sys.path.insert(0, os.path.join(os.path.dirname(__file__), '..', '..'))
12 | __package__ = "keras_retinanet.trainer"
13 |
14 | # Change these to absolute imports if you copy this script outside the keras_retinanet package.
15 | from ..models import model_backbone
16 | from ..preprocessing.image import read_image_bgr, preprocess_image, resize_image
17 | from ..models import classifier
18 |
19 |
20 | def parse_args(args):
21 |     parser = argparse.ArgumentParser(description='Script for evaluating a converted RetinaNet inference model.')
22 |
23 | parser.add_argument(
24 | 'main_dir',
25 | help='Path to dataset directory.'
26 | )
27 | parser.add_argument(
28 | 'model_in',
29 |         help="The converted model to evaluate. "
30 |              "If the training model hasn't been converted for inference, run convert_model first."
31 | )
32 | parser.add_argument(
33 | '--train_type',
34 |         help="Type of predictions you want to make. "
35 |              "If you want to evaluate for Visual Relationship, type 'vr'. "
36 |              "If you want to evaluate for Object Detection, type 'od'. "
37 |              "If you want to evaluate both, type 'both'.",
38 | default='both'
39 | )
40 | parser.add_argument(
41 | '--backbone',
42 | help='The backbone of the model to convert.',
43 | default='resnet50'
44 | )
45 |
46 | return parser.parse_args(args)
47 |
48 |
49 | def get_session():
50 | """ Construct a modified tf session.
51 | """
52 | config = tf.ConfigProto()
53 | config.gpu_options.allow_growth = True
54 | return tf.Session(config=config)
55 |
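# Illustrative sketch (not part of evaluate.py above): registering the growth-enabled session
# with the Keras backend before building or loading models, assuming the TensorFlow backend
# (this matches the TF 1.x style tf.ConfigProto / tf.Session usage in get_session).
keras.backend.tensorflow_backend.set_session(get_session())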
56 |
57 | def makedirs(path):
58 | # Intended behavior: try to create the directory,
59 | # pass if the directory exists already, fails otherwise.
60 | # Meant for Python 2.7/3.n compatibility.
61 | try:
62 | os.makedirs(path)
63 | except OSError:
64 | if not os.path.isdir(path):
65 | raise
66 |
67 |
68 | def get_midlabels(main_dir):
69 | meta_dir = os.path.join(main_dir, 'challenge2018')
70 | csv_file = 'challenge-2018-class-descriptions-500.csv'
71 | boxable_classes_descriptions = os.path.join(meta_dir, csv_file)
72 |
73 | id_to_midlabels = {}
74 | i = 0
75 | with open(boxable_classes_descriptions, 'r') as descriptions_file:
76 | for row in csv.reader(descriptions_file):
77 | if len(row):
78 | label = row[0]
79 | id_to_midlabels[i] = label
80 | i += 1
81 |
82 | return id_to_midlabels
83 |
84 |
85 | def get_annotations(base_dir, model):
86 | id_annotations = dict()
87 | count = 0
88 | for img in os.listdir(base_dir):
89 | try:
90 | img_path = os.path.join(base_dir, img)
91 | raw_image = read_image_bgr(img_path)
92 | image = preprocess_image(raw_image.copy())
93 | image, scale = resize_image(image, min_side=600, max_side=600)
94 | height, width, _ = image.shape
95 |
96 |             img_id = os.path.splitext(img)[0]  # drop the .jpg extension
97 |
98 | # run network
99 | boxes, scores, labels = model.predict_on_batch(np.expand_dims(image, axis=0))
100 |
101 | # boxes in (x1, y1, x2, y2) format
102 | new_boxes2 = []
103 | for box in boxes[0]:
104 | x1_int = round((box[0] / width), 3)
105 | y1_int = round((box[1] / height), 3)
106 | x2_int = round((box[2] / width), 3)
107 | y2_int = round((box[3] / height), 3)
108 | new_boxes2.extend([x1_int, y1_int, x2_int, y2_int])
109 |
110 | new_list = [new_boxes2[i:i + 4] for i in range(0, len(new_boxes2), 4)]
111 |
112 | annotation = {'cls_label': labels, 'box_values': new_list, 'scores': scores}
113 |
114 | if img_id in id_annotations:
115 | annotations = id_annotations[img_id]
116 | annotations['boxes'].append(annotation)
117 | else:
118 | id_annotations[img_id] = {'boxes': [annotation]}
119 |
120 | count += 1
121 | print("{0}/99999".format(count))
122 |
123 |         except Exception:
124 | print("Did not evaluate {}".format(img))
125 | continue
126 |
127 | return id_annotations
128 |
129 |
130 | def od(id_annotations, main_dir):
131 |
132 | id_to_midlabels = get_midlabels(main_dir)
133 |
134 | try:
135 | predict = pd.DataFrame.from_dict(id_annotations)
136 |     except Exception:
137 | print("from dict did not work")
138 |
139 | try:
140 | predict = pd.DataFrame.from_records(id_annotations)
141 |     except Exception:
142 | print("from records did not work")
143 |
144 | sub = []
145 | for k in predict:
146 | # convert class labels to MID format by iterating through class labels
147 | new_clslst = list(map(id_to_midlabels.get, predict[k]['boxes'][0]['cls_label'][0]))
148 |
149 | # copy the scores to the mid labels and create bounding box values
150 | new_boxlist = []
151 | for i, mids in enumerate(new_clslst):
152 | if mids is None:
153 | break
154 | else:
155 | scores = predict[k]['boxes'][0]['scores'][0][i]
156 | _scorelst = str(mids) + ' ' + str(scores)
157 | boxval = str(predict[k]['boxes'][0]['box_values'][i]).strip("[]")
158 | _boxlist = _scorelst + ' ' + boxval
159 | new_boxlist.append(_boxlist)
160 | i += 1
161 |
162 | new_boxlist = ''.join(str(new_boxlist)).replace(",", '').replace("'", '').replace("[", '').replace("]", '')
163 |
164 | sub.append(new_boxlist)
165 |
166 |     # create the submission directory and build output paths inside it
167 |     path = os.path.join(main_dir, 'ODSubmissions')
168 |     makedirs(path)
169 |
170 | print("OD predictions complete")
171 |     with open(os.path.join(path, "od.csv"), "w") as csv_file:
172 | writer = csv.writer(csv_file, delimiter=' ')
173 | for line in sub:
174 | writer.writerow([line])
175 |
176 | header = ["PredictionString"]
177 |     od_file = pd.read_csv(os.path.join(path, "od.csv"), names=header)
178 |
179 | ImageId = []
180 | for k in predict:
181 | ImageId.append(k)
182 |
183 | se = pd.Series(ImageId)
184 | od_file['ImageId'] = se.values
185 |
186 | od_file = od_file[["ImageId", "PredictionString"]]
187 |     od_file.to_csv(os.path.join(path, "submission-od.csv"), index=False)
188 |
189 | print("Writing OD Submission file")
190 |
191 |     if os.path.isfile(os.path.join(path, 'od.csv')):
192 |         os.unlink(os.path.join(path, 'od.csv'))
193 |
194 |
195 | def relationship_list(new_boxlist, new_scorelist, midlist, LogReg):
196 | XMin = []
197 | YMin = []
198 | XMax = []
199 | YMax = []
200 |
201 | for idx, i in enumerate(new_boxlist):
202 | XMin.append(new_boxlist[idx][0])
203 | YMin.append(new_boxlist[idx][1])
204 | XMax.append(new_boxlist[idx][2])
205 | YMax.append(new_boxlist[idx][3])
206 |
207 | if len(midlist) % 2 == 0:
208 | XMin1 = XMin[:int(len(new_boxlist) / 2)]
209 | YMin1 = YMin[:int(len(new_boxlist) / 2)]
210 | XMax1 = XMax[:int(len(new_boxlist) / 2)]
211 | YMax1 = YMax[:int(len(new_boxlist) / 2)]
212 |
213 | new_scorelist1 = new_scorelist[:int(len(new_scorelist) / 2)]
214 | midlist1 = midlist[:int(len(midlist) / 2)]
215 |
216 | XMin2 = XMin[int(len(new_boxlist) / 2):]
217 | YMin2 = YMin[int(len(new_boxlist) / 2):]
218 | XMax2 = XMax[int(len(new_boxlist) / 2):]
219 | YMax2 = YMax[int(len(new_boxlist) / 2):]
220 |
221 | new_scorelist2 = new_scorelist[int(len(new_scorelist) / 2):]
222 | midlist2 = midlist[int(len(midlist) / 2):]
223 |
224 | else:
225 | XMin1 = XMin[:int(len(new_boxlist) / 2)]
226 | YMin1 = YMin[:int(len(new_boxlist) / 2)]
227 | XMax1 = XMax[:int(len(new_boxlist) / 2)]
228 | YMax1 = YMax[:int(len(new_boxlist) / 2)]
229 |
230 | new_scorelist1 = new_scorelist[:int(len(new_scorelist) / 2)]
231 | midlist1 = midlist[:int(len(midlist) / 2)]
232 |
233 | XMin2 = XMin[int(len(new_boxlist) / 2) + 1:]
234 | YMin2 = YMin[int(len(new_boxlist) / 2) + 1:]
235 | XMax2 = XMax[int(len(new_boxlist) / 2) + 1:]
236 | YMax2 = YMax[int(len(new_boxlist) / 2) + 1:]
237 |
238 | new_scorelist2 = new_scorelist[int(len(new_scorelist) / 2) + 1:]
239 | midlist2 = midlist[int(len(midlist) / 2) + 1:]
240 |
241 | vr = pd.DataFrame()
242 |
243 | XMin1_se = pd.Series(XMin1)
244 | YMin1_se = pd.Series(YMin1)
245 | XMax1_se = pd.Series(XMax1)
246 | YMax1_se = pd.Series(YMax1)
247 |
248 | new_scorelist1_se = pd.Series(new_scorelist1)
249 | midlist1_se = pd.Series(midlist1)
250 |
251 | vr['LabelName1'] = midlist1_se.values
252 | vr['scores1'] = new_scorelist1_se.values
253 | vr['XMin1'] = XMin1_se.values
254 | vr['YMin1'] = YMin1_se.values
255 | vr['XMax1'] = XMax1_se.values
256 | vr['YMax1'] = YMax1_se.values
257 |
258 | vr['box_1_length'] = vr['XMax1'] - vr['XMin1']
259 | vr['box_1_height'] = vr['YMax1'] - vr['YMin1']
260 | vr['box_1_area'] = vr['box_1_length'] * vr['box_1_height']
261 |
262 | XMin2_se = pd.Series(XMin2)
263 | YMin2_se = pd.Series(YMin2)
264 | XMax2_se = pd.Series(XMax2)
265 | YMax2_se = pd.Series(YMax2)
266 |
267 | new_scorelist2_se = pd.Series(new_scorelist2)
268 | midlist2_se = pd.Series(midlist2)
269 |
270 | vr['LabelName2'] = midlist2_se.values
271 | vr['scores2'] = new_scorelist2_se.values
272 | vr['XMin2'] = XMin2_se.values
273 | vr['YMin2'] = YMin2_se.values
274 | vr['XMax2'] = XMax2_se.values
275 | vr['YMax2'] = YMax2_se.values
276 |
277 | vr['box_2_length'] = vr['XMax2'] - vr['XMin2']
278 | vr['box_2_height'] = vr['YMax2'] - vr['YMin2']
279 | vr['box_2_area'] = vr['box_2_length'] * vr['box_2_height']
280 |
281 | vr['confidence'] = (vr['scores1'] + vr['scores2']) / 2.0
282 |
283 | vr["xA"] = vr[["XMin1", "XMin2"]].max(axis=1)
284 | vr["yA"] = vr[["YMin1", "YMin2"]].max(axis=1)
285 | vr["xB"] = vr[["XMax1", "XMax2"]].min(axis=1)
286 | vr["yB"] = vr[["YMax1", "YMax2"]].min(axis=1)
287 |
288 | vr["intersectionarea"] = (vr["xB"] - vr["xA"]) * (vr["yB"] - vr["yA"])
289 | vr["unionarea"] = vr["box_1_area"] + vr["box_2_area"] - vr["intersectionarea"]
290 | vr["iou"] = (vr["intersectionarea"] / vr["unionarea"])
291 |
292 | drop_columns = ["intersectionarea", "unionarea", "xA", "yA", "xB", "yB", "box_1_area",
293 | "box_2_area", "scores1", "scores2", "box_1_length", "box_1_height",
294 | "box_2_length", "box_2_height"]
295 |
296 | vr = vr.drop(columns=drop_columns)
297 |
298 |     # replace inf values with NaN so the affected rows can be dropped
299 | vr = vr.replace([np.inf, -np.inf], np.nan)
300 | vr = vr.dropna()
301 |
302 |     # drop rows with a negative IoU: the two boxes do not intersect, so there is no relationship
303 | vr_iou_negative = vr[vr['iou'] < 0]
304 | vr = vr.drop(vr_iou_negative.index, axis=0)
305 |
306 | vr = vr[['confidence', 'LabelName1', 'XMin1', 'YMin1', 'XMax1',
307 | 'YMax1', 'LabelName2', 'XMin2', 'YMin2', 'XMax2', 'YMax2', 'iou']]
308 |
309 | vr_test = vr[['XMin1', 'YMin1', 'XMax1', 'YMax1', 'XMin2', 'YMin2', 'XMax2',
310 | 'YMax2', 'iou']]
311 |
312 | try:
313 | vr_pred = LogReg.predict(vr_test)
314 |
315 | relations_file = {'0': 'at',
316 | "1": 'hits',
317 | "2": 'holds',
318 | "3": 'inside_of',
319 | "4": 'interacts_with',
320 | "5": 'is',
321 | "6": 'on',
322 | "7": 'plays',
323 | "8": 'under',
324 | "9": 'wears'
325 | }
326 |
327 | def get_vr(row):
328 | for c in vr_pred_df.columns:
329 | if row[c] == 1:
330 | return c
331 |
332 | vr_pred_df1 = pd.DataFrame(vr_pred, columns=relations_file)
333 | vr_pred_df = vr_pred_df1.rename(columns=relations_file)
334 | vr_pred_df = vr_pred_df.apply(get_vr, axis=1)
335 | vr['Relationship'] = vr_pred_df.values
336 | vr = vr.dropna()
337 | vr = vr.drop(columns='iou')
338 |
339 | vrlst = vr.values.tolist()
340 | new_vrlst = ''.join(str(vrlst)).replace(",", '').replace("'", '').replace("[", '').replace("]", '')
341 |
342 |     except Exception:
343 | print("EMPTY EVALUATION")
344 | new_vrlst = ''
345 |
346 | return new_vrlst
347 |
348 |
349 | def vr(id_annotations, logreg, main_dir):
350 |
351 | id_to_midlabels = get_midlabels(main_dir)
352 |
353 | try:
354 | predict = pd.DataFrame.from_dict(id_annotations)
355 |     except Exception:
356 | print("from dict did not work")
357 |
358 | try:
359 | predict = pd.DataFrame.from_records(id_annotations)
360 |     except Exception:
361 | print("from records did not work")
362 |
363 | sub = []
364 | for k in predict:
365 | counter = 0
366 |
367 | # convert class labels to MID format by iterating through class labels
368 | clslst = list(map(id_to_midlabels.get, predict[k]['boxes'][0]['cls_label'][0]))
369 |
370 | new_boxlist = []
371 | new_scorelist = []
372 | midlist = []
373 | empty_imgs = []
374 | for i, mids in enumerate(clslst):
375 | if mids is None:
376 | break
377 | else:
378 | scores = predict[k]['boxes'][0]['scores'][0][i]
379 | val = predict[k]['boxes'][0]['box_values'][i]
380 | new_scorelist.append(scores)
381 | midlist.append(mids)
382 | new_boxlist.append(val)
383 | i += 1
384 | counter += 1
385 |
386 | if len(midlist) == 0:
387 | empty_imgs.append(str(counter) + ':' + str(k))
388 | new_vrlst = ''
389 |
390 | else:
391 | new_vrlst = relationship_list(new_boxlist, new_scorelist, midlist, logreg)
392 |
393 | sub.append(new_vrlst)
394 | print("{0}/99999".format(len(sub)))
395 |
396 |     # create the submission directory and build output paths inside it
397 |     path = os.path.join(main_dir, 'VRSubmissions')
398 |     makedirs(path)
399 |
400 |     with open(os.path.join(path, "vr.csv"), "w") as csv_file:
401 | writer = csv.writer(csv_file, delimiter=' ')
402 | for line in sub:
403 | writer.writerow([line])
404 |
405 | header = ["PredictionString"]
406 |     vr_file = pd.read_csv(os.path.join(path, "vr.csv"), names=header)
407 |
408 | ImageId = []
409 | for k in predict:
410 | ImageId.append(k)
411 |
412 | se = pd.Series(ImageId)
413 | vr_file['ImageId'] = se.values
414 |
415 | vr_file = vr_file[["ImageId", "PredictionString"]]
416 |     vr_file.to_csv(os.path.join(path, "submission-vr.csv"), index=False)
417 |
418 | print("Writing VR Submission file")
419 |
420 |     if os.path.isfile(os.path.join(path, 'vr.csv')):
421 |         os.unlink(os.path.join(path, 'vr.csv'))
422 |
423 |
424 | def main(args=None):
425 |
426 | if args is None:
427 | args = sys.argv[1:]
428 | args = parse_args(args)
429 |
430 | keras.backend.tensorflow_backend.set_session(get_session())
431 |
432 | keras_retinanet = os.path.join(args.main_dir, 'keras_retinanet', 'trainer', 'snapshots')
433 | path_to_model = os.path.join(keras_retinanet, '{}.h5'.format(args.model_in))
434 | base_dir = os.path.join(args.main_dir, 'images', 'test')
435 |
436 | # load the evaluation model
437 | print('Loading model {}, this may take a second...'.format(args.model_in))
438 | model = model_backbone.load_model(path_to_model, backbone_name='resnet50')
439 |
440 | print("Starting Evaluation...")
441 |
442 | if args.train_type == 'both':
443 | id_annotations = get_annotations(base_dir, model)
444 | print("Evaluation Completed")
445 |
446 | print("Starting Object Detection Prediction")
447 | od(id_annotations, args.main_dir)
448 |
449 | print("Starting Visual Relationship Bounding Box Classifier Training")
450 | logreg = classifier.vr_bb_classifier(args.main_dir)
451 |
452 | print("Starting Visual Relationship Bounding Box Prediction")
453 | vr(id_annotations, logreg, args.main_dir)
454 | print("Prediction Completed")
455 |
456 | elif args.train_type == 'od':
457 | id_annotations = get_annotations(base_dir, model)
458 | print("Evaluation Completed")
459 |
460 | print("Starting Object Detection Prediction")
461 | od(id_annotations, args.main_dir)
462 | print("Prediction Completed")
463 |
464 | elif args.train_type == 'vr':
465 | id_annotations = get_annotations(base_dir, model)
466 | print("Evaluation Completed")
467 |
468 | print("Starting Visual Relationship Bounding Box Classifier Training")
469 | logreg = classifier.vr_bb_classifier(args.main_dir)
470 |
471 | print("Starting Visual Relationship Bounding Box Prediction")
472 | vr(id_annotations, logreg, args.main_dir)
473 | print("Prediction Completed")
474 |
475 | else:
476 | raise ValueError('Invalid train type received: {}'.format(args.train_type))
477 |
478 |
479 | if __name__ == '__main__':
480 | main()
481 |
--------------------------------------------------------------------------------
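
evaluate.py is driven from the command line; the snapshot is looked up as <main_dir>/keras_retinanet/trainer/snapshots/<model_in>.h5 and must already have been converted with convert_model.py. A typical invocation, with placeholder paths:

    python keras_retinanet/trainer/evaluate.py /path/to/dataset resnet50_oid_inference --train_type od

For reference, each entry collected by get_annotations() is keyed by image id and roughly has this shape (a sketch of the structure, not actual output):

    {'boxes': [{'cls_label': labels,                    # (1, max_detections) class indices
                'box_values': [[x1, y1, x2, y2], ...],  # corners normalised by image width/height
                'scores': scores}]}                     # (1, max_detections) confidence scores
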
/keras_retinanet/trainer/model.py:
--------------------------------------------------------------------------------
1 | import os
2 | import keras
3 | import keras.preprocessing.image
4 | from keras.utils import multi_gpu_model
5 | import tensorflow as tf
6 |
7 | # Change these to absolute imports if you copy this script outside the keras_retinanet package.
8 | from ..utils import losses
9 | from ..models import model_backbone
10 | from ..models.retinanet import retinanet_bbox
11 | from ..utils.anchors import make_shapes_callback
12 | from ..callbacks.callbacks import RedirectModel
13 | from ..callbacks.callbacks import Evaluate
14 | from ..preprocessing.open_images import OpenImagesGenerator
15 | from ..preprocessing.image import random_transform_generator
16 | from ..utils.freeze import freeze as freeze_model
17 |
18 |
19 | def makedirs(path):
20 |     # Try to create the directory; pass if it already exists,
21 |     # and re-raise any other error.
22 |     # Meant for Python 2.7/3.x compatibility.
23 | try:
24 | os.makedirs(path)
25 | except OSError:
26 | if not os.path.isdir(path):
27 | raise
28 |
29 |
30 | def get_session():
31 | """ Construct a modified tf session.
32 | """
33 | config = tf.ConfigProto()
34 | config.gpu_options.allow_growth = True
35 | # config.gpu_options.per_process_gpu_memory_fraction = 0.7
36 | return tf.Session(config=config)
37 |
38 |
39 | def model_with_weights(model, weights, skip_mismatch):
40 | """ Load weights for model.
41 |
42 | Args
43 | model : The model to load weights for.
44 | weights : The weights to load.
45 | skip_mismatch : If True, skips layers whose shape of weights doesn't match with the model.
46 | """
47 | if weights is not None:
48 | model.load_weights(weights, by_name=True, skip_mismatch=skip_mismatch)
49 | return model
50 |
51 |
52 | def create_models(backbone_retinanet, num_classes, weights, multi_gpu=1, freeze_backbone=False):
53 | """ Creates three models (model, training_model, prediction_model).
54 |
55 | Args
56 | backbone_retinanet : A function to call to create a retinanet model with a given backbone.
57 | num_classes : The number of classes to train.
58 | weights : The weights to load into the model.
59 | multi_gpu : The number of GPUs to use for training.
60 | freeze_backbone : If True, disables learning for the backbone.
61 |
62 | Returns
63 | model : The base model. This is also the model that is saved in snapshots.
64 |         training_model : The training model. If multi_gpu <= 1, this is identical to model.
65 | prediction_model : The model wrapped with utility functions to perform object detection
66 | (applies regression values and performs NMS).
67 | """
68 | modifier = freeze_model if freeze_backbone else None
69 |
70 | # Keras recommends initialising a multi-gpu model on the CPU to ease weight sharing, and to prevent OOM errors.
71 | # optionally wrap in a parallel model
72 | if multi_gpu > 1:
73 | with tf.device('/cpu:0'):
74 | model = model_with_weights(backbone_retinanet(num_classes, modifier=modifier), weights=weights,
75 | skip_mismatch=True)
76 | training_model = multi_gpu_model(model, gpus=multi_gpu)
77 | else:
78 | model = model_with_weights(backbone_retinanet(num_classes, modifier=modifier), weights=weights,
79 | skip_mismatch=True)
80 | training_model = model
81 |
82 | # make prediction model
83 | prediction_model = retinanet_bbox(model=model)
84 |
85 | # compile model
86 | training_model.compile(
87 | loss={
88 | 'regression': losses.smooth_l1(),
89 | 'classification': losses.focal()
90 | },
91 | optimizer=keras.optimizers.adam(lr=1e-7, clipnorm=0.001)
92 | )
93 |
94 | return model, training_model, prediction_model
95 |
96 |
97 | def create_callbacks(model, training_model, prediction_model, validation_generator, args):
98 | """ Creates the callbacks to use during training.
99 |
100 | Args
101 | model: The base model.
102 | training_model: The model that is used for training.
103 | prediction_model: The model that should be used for validation.
104 | validation_generator: The generator for creating validation data.
105 | args: parseargs args object.
106 |
107 | Returns:
108 | A list of callbacks used for training.
109 | """
110 | callbacks = []
111 |
112 | tensorboard_callback = None
113 |
114 | if args.tensorboard_dir:
115 | tensorboard_callback = keras.callbacks.TensorBoard(
116 | log_dir=args.tensorboard_dir,
117 | histogram_freq=0,
118 | batch_size=args.batch_size,
119 | write_graph=True,
120 | write_grads=False,
121 | write_images=False,
122 | embeddings_freq=0,
123 | embeddings_layer_names=None,
124 | embeddings_metadata=None
125 | )
126 | callbacks.append(tensorboard_callback)
127 |
128 | if args.evaluation and validation_generator:
129 | evaluation = Evaluate(validation_generator, tensorboard=tensorboard_callback)
130 | evaluation = RedirectModel(evaluation, prediction_model)
131 | callbacks.append(evaluation)
132 |
133 | # save the model
134 | if args.snapshots:
135 | # ensure directory created first; otherwise h5py will error after epoch.
136 | makedirs(args.snapshot_path)
137 | checkpoint = keras.callbacks.ModelCheckpoint(
138 | os.path.join(args.snapshot_path,
139 | '{backbone}_{dataset_type}_{{epoch:02d}}.h5'.format(backbone=args.backbone,
140 | dataset_type=args.dataset_type)),
141 | verbose=1,
142 | # save_best_only=True,
143 | monitor="mAP",
144 | # mode='max'
145 | )
146 | checkpoint = RedirectModel(checkpoint, model)
147 | callbacks.append(checkpoint)
148 |
149 | callbacks.append(keras.callbacks.ReduceLROnPlateau(
150 | monitor='loss',
151 | factor=0.1,
152 | patience=2,
153 | verbose=1,
154 | mode='auto',
155 | min_delta=0.0001,
156 | cooldown=0,
157 | min_lr=0
158 | ))
159 |
160 | return callbacks
161 |
162 |
163 | def create_generators(args, preprocess_image):
164 | """ Create generators for training and validation.
165 |
166 | Args
167 | args : parseargs object containing configuration for generators.
168 | preprocess_image : Function that preprocesses an image for the network.
169 | """
170 | common_args = {
171 | 'batch_size': args.batch_size,
172 | 'image_min_side': args.image_min_side,
173 | 'image_max_side': args.image_max_side,
174 | 'preprocess_image': preprocess_image,
175 | }
176 |
177 | # create random transform generator for augmenting training data
178 | if args.random_transform:
179 | transform_generator = random_transform_generator(
180 | min_rotation=-0.1,
181 | max_rotation=0.1,
182 | min_translation=(-0.1, -0.1),
183 | max_translation=(0.1, 0.1),
184 | min_shear=-0.1,
185 | max_shear=0.1,
186 | min_scaling=(0.9, 0.9),
187 | max_scaling=(1.1, 1.1),
188 | flip_x_chance=0.5,
189 | flip_y_chance=0.5,
190 | )
191 | else:
192 | transform_generator = random_transform_generator(flip_x_chance=0.5)
193 |
194 | if args.dataset_type == 'oid':
195 | train_generator = OpenImagesGenerator(
196 | args.main_dir,
197 | subset='train',
198 | version=args.version,
199 | labels_filter=args.labels_filter,
200 | annotation_cache_dir=args.annotation_cache_dir,
201 | parent_label=args.parent_label,
202 | transform_generator=transform_generator,
203 | **common_args
204 | )
205 |
206 | validation_generator = OpenImagesGenerator(
207 | args.main_dir,
208 | subset='validation',
209 | version=args.version,
210 | labels_filter=args.labels_filter,
211 | annotation_cache_dir=args.annotation_cache_dir,
212 | parent_label=args.parent_label,
213 | **common_args
214 | )
215 |
216 | else:
217 | raise ValueError('Invalid data type received: {}'.format(args.dataset_type))
218 |
219 | return train_generator, validation_generator
220 |
221 |
222 | def train(args):
223 | # create object that stores backbone information
224 | backbone = model_backbone.backbone(args.backbone)
225 |
226 | # optionally choose specific GPU
227 | if args.gpu:
228 | os.environ['CUDA_VISIBLE_DEVICES'] = args.gpu
229 | keras.backend.tensorflow_backend.set_session(get_session())
230 |
231 | # create the generators
232 | print("Going to get the training and validation generators...")
233 | train_generator, validation_generator = create_generators(args, backbone.preprocess_image)
234 |
235 | # create the model
236 | if args.snapshot is not None:
237 | print('Loading model: {} \nThis may take a second...'.format(args.snapshot))
238 | model = model_backbone.load_model(args.snapshot, backbone_name=args.backbone)
239 | training_model = model
240 | prediction_model = retinanet_bbox(model=model)
241 | else:
242 | weights = args.weights
243 | # default to imagenet if nothing else is specified
244 | if weights is None and args.imagenet_weights:
245 | weights = backbone.download_imagenet()
246 |
247 | print('Creating model, this may take a second...')
248 | model, training_model, prediction_model = create_models(
249 | backbone_retinanet=backbone.retinanet,
250 | num_classes=train_generator.num_classes(),
251 | weights=weights,
252 | multi_gpu=args.multi_gpu,
253 | freeze_backbone=args.freeze_backbone
254 | )
255 |
256 | # print model summary
257 | print(model.summary())
258 |
259 | # create the callbacks
260 | callbacks = create_callbacks(
261 | model,
262 | training_model,
263 | prediction_model,
264 | validation_generator,
265 | args,
266 | )
267 |
268 | # start training
269 | print("Started training...")
270 | training_model.fit_generator(
271 | generator=train_generator,
272 | steps_per_epoch=args.steps,
273 | epochs=args.epochs,
274 | verbose=1,
275 | callbacks=callbacks,
276 | )
277 |
278 | print("Training Complete")
279 |
--------------------------------------------------------------------------------
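
A minimal sketch of building the three models that train() above wires together, outside of the full training loop. The calls mirror train()'s defaults; the 500-class count matches the challenge-2018 class-descriptions file:

    from keras_retinanet.models import model_backbone
    from keras_retinanet.trainer.model import create_models

    backbone = model_backbone.backbone('resnet50')

    # model is what gets snapshotted, training_model is what fit_generator runs on,
    # and prediction_model adds the box decoding / NMS layers used for evaluation.
    model, training_model, prediction_model = create_models(
        backbone_retinanet=backbone.retinanet,
        num_classes=500,                       # 500 boxable classes in challenge 2018
        weights=backbone.download_imagenet(),  # same default as train() with --imagenet-weights
        multi_gpu=1,
        freeze_backbone=False)
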
/keras_retinanet/trainer/task.py:
--------------------------------------------------------------------------------
1 | import argparse
2 | import sys
3 | import os
4 |
5 |
6 | if __name__ == '__main__' and __package__ is None:
7 | sys.path.insert(0, os.path.join(os.path.dirname(__file__), '..', '..'))
8 | __package__ = "keras_retinanet.trainer"
9 |
10 | from ..trainer import model # Your model.py file.
11 |
12 | """ Parse the arguments.
13 | """
14 | parser = argparse.ArgumentParser(description='Simple training script for training a RetinaNet network.')
15 | subparsers = parser.add_subparsers(help='Arguments for specific dataset types.', dest='dataset_type')
16 | subparsers.required = True
17 |
18 | def csv_list(string):
19 | return string.split(',')
20 |
21 | oid_parser = subparsers.add_parser('oid')
22 | oid_parser.add_argument(
23 | 'main_dir',
24 | help='Path to dataset directory.'
25 | )
26 | oid_parser.add_argument(
27 | '--version',
28 |     help='The dataset version to use (defaults to challenge2018).',
29 | default='challenge2018'
30 | )
31 | oid_parser.add_argument(
32 | '--labels-filter',
33 | help='A list of labels to filter.',
34 | type=csv_list,
35 | default=None
36 | )
37 | oid_parser.add_argument(
38 | '--annotation-cache-dir',
39 | help='Path to store annotation cache.',
40 | default='.'
41 | )
42 | oid_parser.add_argument(
43 | '--parent-label',
44 | help='Use the hierarchy children of this label.',
45 | default=None
46 | )
47 |
48 | group = parser.add_mutually_exclusive_group()
49 | group.add_argument(
50 | '--snapshot',
51 | help='Resume training from a snapshot.'
52 | )
53 | group.add_argument(
54 | '--imagenet-weights',
55 | help='Initialize the model with pretrained imagenet weights. This is the default behaviour.',
56 | action='store_const',
57 | const=True,
58 | default=True
59 | )
60 | group.add_argument(
61 | '--weights',
62 | help='Initialize the model with weights from a file.'
63 | )
64 | group.add_argument(
65 | '--no-weights',
66 | help='Don\'t initialize the model with any weights.',
67 | dest='imagenet_weights',
68 | action='store_const',
69 | const=False
70 | )
71 |
72 | parser.add_argument(
73 | '--backbone',
74 | help='Backbone model used by retinanet.',
75 | default='resnet50',
76 | type=str
77 | )
78 | parser.add_argument(
79 | '--batch-size',
80 | help='Size of the batches.',
81 | default=1,
82 | type=int
83 | )
84 | parser.add_argument(
85 | '--gpu',
86 | help='Id of the GPU to use (as reported by nvidia-smi).'
87 | )
88 | parser.add_argument(
89 | '--multi-gpu',
90 | help='Number of GPUs to use for parallel processing.',
91 | type=int,
92 | default=1)
93 | parser.add_argument(
94 | '--multi-gpu-force',
95 | help='Extra flag needed to enable (experimental) multi-gpu support.',
96 | action='store_true'
97 | )
98 | parser.add_argument(
99 | '--epochs',
100 | help='Number of epochs to train.',
101 | type=int,
102 | default=50
103 | )
104 | parser.add_argument(
105 | '--steps',
106 | help='Number of steps per epoch.',
107 | type=int,
108 | default=100000
109 | )
110 | parser.add_argument(
111 | '--snapshot-path',
112 |     help="Path to store snapshots of models during training (defaults to './snapshots')",
113 | default='./snapshots'
114 | )
115 | parser.add_argument(
116 | '--tensorboard-dir',
117 | help='Log directory for Tensorboard output',
118 | default='./logs'
119 | )
120 | parser.add_argument(
121 | '--no-snapshots',
122 | help='Disable saving snapshots.',
123 | dest='snapshots',
124 | action='store_false'
125 | )
126 | parser.add_argument(
127 | '--no-evaluation',
128 | help='Disable per epoch evaluation.',
129 | dest='evaluation',
130 | action='store_false'
131 | )
132 | parser.add_argument(
133 | '--freeze-backbone',
134 | help='Freeze training of backbone layers.',
135 | action='store_true'
136 | )
137 | parser.add_argument(
138 | '--random-transform',
139 | help='Randomly transform image and annotations.',
140 | action='store_true'
141 | )
142 | parser.add_argument(
143 | '--image-min-side',
144 | help='Rescale the image so the smallest side is min_side.',
145 | type=int,
146 | default=600
147 | )
148 | parser.add_argument(
149 | '--image-max-side',
150 | help='Rescale the image if the largest side is larger than max_side.',
151 | type=int,
152 | default=600
153 | )
154 |
155 | args = parser.parse_args()
156 |
157 | # Run the training job
158 | model.train(args)
159 |
--------------------------------------------------------------------------------
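
task.py is the training entry point. Options defined on the main parser go before the oid sub-command, dataset-specific ones after it. A typical invocation, with the dataset path as a placeholder (the epoch and step counts shown are just the defaults made explicit):

    python keras_retinanet/trainer/task.py \
        --backbone resnet50 --batch-size 1 --epochs 50 --steps 100000 \
        oid /path/to/dataset --version challenge2018
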
/keras_retinanet/utils/anchors.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import keras
3 |
4 |
5 | def compute_overlap(boxes, query_boxes):
6 | """
7 | Args
8 |         boxes: (N, 4) ndarray of float
9 |         query_boxes: (K, 4) ndarray of float
10 | Returns
11 | overlaps: (N, K) ndarray of overlap between boxes and query_boxes
12 | """
13 |
14 | n_ = boxes.shape[0]
15 | k_ = query_boxes.shape[0]
16 | overlaps = np.zeros((n_, k_), dtype=np.float)
17 | for k in range(k_):
18 | query_box_area = (query_boxes[k, 2] - query_boxes[k, 0] + 1) * (query_boxes[k, 3] - query_boxes[k, 1] + 1)
19 | for n in range(n_):
20 | iw = min(boxes[n, 2], query_boxes[k, 2]) - max(boxes[n, 0], query_boxes[k, 0]) + 1
21 | if iw > 0:
22 | ih = min(boxes[n, 3], query_boxes[k, 3]) - max(boxes[n, 1], query_boxes[k, 1]) + 1
23 | if ih > 0:
24 | box_area = (boxes[n, 2] - boxes[n, 0] + 1) * (boxes[n, 3] - boxes[n, 1] + 1)
25 | all_area = float(box_area + query_box_area - iw * ih)
26 | overlaps[n, k] = iw * ih / all_area
27 | return overlaps
28 |
29 |
30 | def anchor_targets_bbox(
31 | anchors,
32 | image_group,
33 | annotations_group,
34 | num_classes,
35 | negative_overlap=0.3,
36 | positive_overlap=0.7
37 | ):
38 | """ Generate anchor targets for bbox detection.
39 |
40 | Args
41 |         anchors: np.array of anchors of shape (N, 4) for (x1, y1, x2, y2).
42 | image_group: List of BGR images.
43 | annotations_group: List of annotations (np.array of shape (N, 5) for (x1, y1, x2, y2, label)).
44 | num_classes: Number of classes to predict.
45 |         negative_overlap: IoU overlap for negative anchors (all anchors with overlap < negative_overlap
46 |                           are negative).
47 |         positive_overlap: IoU overlap for positive anchors (all anchors with overlap > positive_overlap are positive).
48 |
49 | Returns
50 | labels_batch: batch that contains labels & anchor states (np.array of shape (batch_size, N, num_classes + 1),
51 | where N is the number of anchors for an image and the last column defines the anchor state
52 | (-1 for ignore, 0 for bg, 1 for fg).
53 | regression_batch: batch that contains bounding-box regression targets for an image & anchor states
54 | (np.array of shape (batch_size, N, 4 + 1),
55 | where N is the number of anchors for an image, the first 4 columns define regression targets for
56 | (x1, y1, x2, y2) and the
57 | last column defines anchor states (-1 for ignore, 0 for bg, 1 for fg).
58 |         boxes_batch: ground-truth boxes assigned to each anchor (np.array of shape (batch_size, N, 5), where N is the
59 |                      number of anchors for an image and the 5 values are (x1, y1, x2, y2, label))
60 | """
61 |
62 | assert (len(image_group) == len(annotations_group)), "The length of the images and annotations need to be equal."
63 | assert (len(annotations_group) > 0), "No data received to compute anchor targets for."
64 |
65 | batch_size = len(image_group)
66 |
67 | regression_batch = np.zeros((batch_size, anchors.shape[0], 4 + 1), dtype=keras.backend.floatx())
68 | labels_batch = np.zeros((batch_size, anchors.shape[0], num_classes + 1), dtype=keras.backend.floatx())
69 | boxes_batch = np.zeros((batch_size, anchors.shape[0], annotations_group[0].shape[1]), dtype=keras.backend.floatx())
70 |
71 | # compute labels and regression targets
72 | for index, (image, annotations) in enumerate(zip(image_group, annotations_group)):
73 | if annotations.shape[0]:
74 | # obtain indices of gt annotations with the greatest overlap
75 | positive_indices, ignore_indices, argmax_overlaps_inds = compute_gt_annotations(anchors,
76 | annotations,
77 | negative_overlap,
78 | positive_overlap)
79 |
80 | labels_batch[index, ignore_indices, -1] = -1
81 | labels_batch[index, positive_indices, -1] = 1
82 |
83 | regression_batch[index, ignore_indices, -1] = -1
84 | regression_batch[index, positive_indices, -1] = 1
85 |
86 | # compute box regression targets
87 | annotations = annotations[argmax_overlaps_inds]
88 | boxes_batch[index, ...] = annotations
89 |
90 | # compute target class labels
91 | labels_batch[index, positive_indices, annotations[positive_indices, 4].astype(int)] = 1
92 |
93 | regression_batch[index, :, :-1] = bbox_transform(anchors, annotations)
94 |
95 | # ignore annotations outside of image
96 | if image.shape:
97 | anchors_centers = np.vstack([(anchors[:, 0] + anchors[:, 2]) / 2, (anchors[:, 1] + anchors[:, 3]) / 2]).T
98 | indices = np.logical_or(anchors_centers[:, 0] >= image.shape[1], anchors_centers[:, 1] >= image.shape[0])
99 |
100 | labels_batch[index, indices, -1] = - 1
101 | regression_batch[index, indices, -1] = -1
102 |
103 | return labels_batch, regression_batch, boxes_batch
104 |
105 |
106 | def compute_gt_annotations(
107 | anchors,
108 | annotations,
109 | negative_overlap=0.3,
110 | positive_overlap=0.7
111 | ):
112 | """ Obtain indices of gt annotations with the greatest overlap.
113 |
114 | Args
115 |         anchors: np.array of anchors of shape (N, 4) for (x1, y1, x2, y2).
116 | annotations: np.array of shape (N, 5) for (x1, y1, x2, y2, label).
117 | negative_overlap: IoU overlap for negative anchors (all anchors with overlap < negative_overlap are negative).
118 |         positive_overlap: IoU overlap for positive anchors (all anchors with overlap > positive_overlap are positive).
119 |
120 | Returns
121 | positive_indices: indices of positive anchors
122 | ignore_indices: indices of ignored anchors
123 |         argmax_overlaps_inds: for each anchor, the index of the annotation with the highest overlap
124 | """
125 |
126 | overlaps = compute_overlap(anchors.astype(np.float64), annotations.astype(np.float64))
127 | argmax_overlaps_inds = np.argmax(overlaps, axis=1)
128 | max_overlaps = overlaps[np.arange(overlaps.shape[0]), argmax_overlaps_inds]
129 |
130 | # assign "dont care" labels
131 | positive_indices = max_overlaps >= positive_overlap
132 | ignore_indices = (max_overlaps > negative_overlap) & ~positive_indices
133 |
134 | return positive_indices, ignore_indices, argmax_overlaps_inds
135 |
136 |
137 | def layer_shapes(image_shape, model):
138 | """Compute layer shapes given input image shape and the model.
139 |
140 | Args
141 | image_shape: The shape of the image.
142 | model: The model to use for computing how the image shape is transformed in the pyramid.
143 |
144 | Returns
145 | A dictionary mapping layer names to image shapes.
146 | """
147 | shape = {
148 | model.layers[0].name: (None,) + image_shape,
149 | }
150 |
151 | for layer in model.layers[1:]:
152 | nodes = layer._inbound_nodes
153 | for node in nodes:
154 | inputs = [shape[lr.name] for lr in node.inbound_layers]
155 | if not inputs:
156 | continue
157 | shape[layer.name] = layer.compute_output_shape(inputs[0] if len(inputs) == 1 else inputs)
158 |
159 | return shape
160 |
161 |
162 | def make_shapes_callback(model):
163 | """ Make a function for getting the shape of the pyramid levels.
164 | """
165 |
166 | def get_shapes(image_shape, pyramid_levels):
167 | shape = layer_shapes(image_shape, model)
168 | image_shapes = [shape["P{}".format(level)][1:3] for level in pyramid_levels]
169 | return image_shapes
170 |
171 | return get_shapes
172 |
173 |
174 | def guess_shapes(image_shape, pyramid_levels):
175 | """Guess shapes based on pyramid levels.
176 |
177 | Args
178 | image_shape: The shape of the image.
179 | pyramid_levels: A list of what pyramid levels are used.
180 |
181 | Returns
182 | A list of image shapes at each pyramid level.
183 | """
184 | image_shape = np.array(image_shape[:2])
185 | image_shapes = [(image_shape + 2 ** x - 1) // (2 ** x) for x in pyramid_levels]
186 | return image_shapes
187 |
188 |
189 | def anchors_for_shape(
190 | image_shape,
191 | pyramid_levels=None,
192 | ratios=None,
193 | scales=None,
194 | strides=None,
195 | sizes=None,
196 | shapes_callback=None,
197 | ):
198 |     """ Generates anchors for a given shape.
199 |
200 | Args
201 | image_shape: The shape of the image.
202 | pyramid_levels: List of ints representing which pyramids to use (defaults to [3, 4, 5, 6, 7]).
203 | ratios: List of ratios with which anchors are generated (defaults to [0.5, 1, 2]).
204 | scales: List of scales with which anchors are generated (defaults to [2^0, 2^(1/3), 2^(2/3)]).
205 | strides: Stride per pyramid level, defines how the pyramids are constructed.
206 | sizes: Sizes of the anchors per pyramid level.
207 | shapes_callback: Function to call for getting the shape of the image at different pyramid levels.
208 |
209 | Returns
210 | np.array of shape (N, 4) containing the (x1, y1, x2, y2) coordinates for the anchors.
211 | """
212 | if pyramid_levels is None:
213 | pyramid_levels = [3, 4, 5, 6, 7]
214 | if strides is None:
215 | strides = [2 ** x for x in pyramid_levels]
216 | if sizes is None:
217 | sizes = [2 ** (x + 2) for x in pyramid_levels]
218 | if ratios is None:
219 | ratios = np.array([0.5, 1, 2])
220 | if scales is None:
221 | scales = np.array([2 ** 0, 2 ** (1.0 / 3.0), 2 ** (2.0 / 3.0)])
222 |
223 | if shapes_callback is None:
224 | shapes_callback = guess_shapes
225 | image_shapes = shapes_callback(image_shape, pyramid_levels)
226 |
227 | # compute anchors over all pyramid levels
228 | all_anchors = np.zeros((0, 4))
229 | for idx, p in enumerate(pyramid_levels):
230 | anchors = generate_anchors(base_size=sizes[idx], ratios=ratios, scales=scales)
231 | shifted_anchors = shift(image_shapes[idx], strides[idx], anchors)
232 | all_anchors = np.append(all_anchors, shifted_anchors, axis=0)
233 |
234 | return all_anchors
235 |
236 |
237 | def shift(shape, stride, anchors):
238 | """ Produce shifted anchors based on shape of the map and stride size.
239 |
240 | Args
241 | shape : Shape to shift the anchors over.
242 | stride : Stride to shift the anchors with over the shape.
243 | anchors: The anchors to apply at each location.
244 | """
245 | shift_x = (np.arange(0, shape[1]) + 0.5) * stride
246 | shift_y = (np.arange(0, shape[0]) + 0.5) * stride
247 |
248 | shift_x, shift_y = np.meshgrid(shift_x, shift_y)
249 |
250 | shifts = np.vstack((
251 | shift_x.ravel(), shift_y.ravel(),
252 | shift_x.ravel(), shift_y.ravel()
253 | )).transpose()
254 |
255 | # add A anchors (1, A, 4) to
256 | # cell K shifts (K, 1, 4) to get
257 | # shift anchors (K, A, 4)
258 | # reshape to (K*A, 4) shifted anchors
259 | A = anchors.shape[0]
260 | K = shifts.shape[0]
261 | all_anchors = (anchors.reshape((1, A, 4)) + shifts.reshape((1, K, 4)).transpose((1, 0, 2)))
262 | all_anchors = all_anchors.reshape((K * A, 4))
263 |
264 | return all_anchors
265 |
266 |
267 | def generate_anchors(base_size=16, ratios=None, scales=None):
268 | """
269 | Generate anchor (reference) windows by enumerating aspect ratios X
270 | scales w.r.t. a reference window.
271 | """
272 |
273 | if ratios is None:
274 | ratios = np.array([0.5, 1, 2])
275 |
276 | if scales is None:
277 | scales = np.array([2 ** 0, 2 ** (1.0 / 3.0), 2 ** (2.0 / 3.0)])
278 |
279 | num_anchors = len(ratios) * len(scales)
280 |
281 | # initialize output anchors
282 | anchors = np.zeros((num_anchors, 4))
283 |
284 | # scale base_size
285 | anchors[:, 2:] = base_size * np.tile(scales, (2, len(ratios))).T
286 |
287 | # compute areas of anchors
288 | areas = anchors[:, 2] * anchors[:, 3]
289 |
290 | # correct for ratios
291 | anchors[:, 2] = np.sqrt(areas / np.repeat(ratios, len(scales)))
292 | anchors[:, 3] = anchors[:, 2] * np.repeat(ratios, len(scales))
293 |
294 | # transform from (x_ctr, y_ctr, w, h) -> (x1, y1, x2, y2)
295 | anchors[:, 0::2] -= np.tile(anchors[:, 2] * 0.5, (2, 1)).T
296 | anchors[:, 1::2] -= np.tile(anchors[:, 3] * 0.5, (2, 1)).T
297 |
298 | return anchors
299 |
300 |
301 | def bbox_transform(anchors, gt_boxes, mean=None, std=None):
302 | """Compute bounding-box regression targets for an image."""
303 |
304 | if mean is None:
305 | mean = np.array([0, 0, 0, 0])
306 | if std is None:
307 | std = np.array([0.2, 0.2, 0.2, 0.2])
308 |
309 | if isinstance(mean, (list, tuple)):
310 | mean = np.array(mean)
311 | elif not isinstance(mean, np.ndarray):
312 | raise ValueError('Expected mean to be a np.ndarray, list or tuple. Received: {}'.format(type(mean)))
313 |
314 | if isinstance(std, (list, tuple)):
315 | std = np.array(std)
316 | elif not isinstance(std, np.ndarray):
317 | raise ValueError('Expected std to be a np.ndarray, list or tuple. Received: {}'.format(type(std)))
318 |
319 | anchor_widths = anchors[:, 2] - anchors[:, 0]
320 | anchor_heights = anchors[:, 3] - anchors[:, 1]
321 |
322 | targets_dx1 = (gt_boxes[:, 0] - anchors[:, 0]) / anchor_widths
323 | targets_dy1 = (gt_boxes[:, 1] - anchors[:, 1]) / anchor_heights
324 | targets_dx2 = (gt_boxes[:, 2] - anchors[:, 2]) / anchor_widths
325 | targets_dy2 = (gt_boxes[:, 3] - anchors[:, 3]) / anchor_heights
326 |
327 | targets = np.stack((targets_dx1, targets_dy1, targets_dx2, targets_dy2))
328 | targets = targets.T
329 |
330 | targets = (targets - mean) / std
331 |
332 | return targets
333 |
--------------------------------------------------------------------------------
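
A small worked example of the anchor helpers above, using the default pyramid levels and the 600x600 input size that evaluate.py resizes to (assuming the package is importable as keras_retinanet):

    from keras_retinanet.utils.anchors import guess_shapes, anchors_for_shape

    levels = [3, 4, 5, 6, 7]
    print(guess_shapes((600, 600, 3), levels))
    # feature map sizes per level: 75x75, 38x38, 19x19, 10x10, 5x5

    anchors = anchors_for_shape((600, 600, 3))
    print(anchors.shape)
    # (67995, 4): (75^2 + 38^2 + 19^2 + 10^2 + 5^2) = 7555 positions, 9 anchors each
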
/keras_retinanet/utils/clean.py:
--------------------------------------------------------------------------------
1 | # make a list with the downloaded images
2 | import pandas as pd
3 | import os
4 |
5 | path_to_datafiles = "/home/mukeshmithrakumar/googleai/gcmount/challenge2018/"
6 |
7 | # code to load the train annotation box
8 | print("Reading challenge-2018-train-annotations-bbox.csv ....")
9 | challenge_2018_train_annotations_bbox = pd.read_csv(path_to_datafiles + "challenge-2018-train-annotations-bbox2.csv")
10 | challenge_2018_train_annotations_bbox = pd.DataFrame(challenge_2018_train_annotations_bbox)
11 | print("challenge_2018_train_annotations_bbox shape:", challenge_2018_train_annotations_bbox.shape)
12 |
13 | # code to load the validation annotation box
14 | print("Reading challenge-2018-image-ids-valset-od.csv ....")
15 | challenge_2018_image_ids_valset_od = pd.read_csv(path_to_datafiles + "challenge-2018-image-ids-valset-od2.csv")
16 | challenge_2018_image_ids_valset_od = pd.DataFrame(challenge_2018_image_ids_valset_od)
17 | print("challenge_2018_image_ids_valset_od shape:", challenge_2018_image_ids_valset_od.shape)
18 |
19 | # goes to the directory of the train/val images and creates a list of the downloaded images
20 | directory = "/home/mukeshmithrakumar/googleai/gcmount/images/train/train"
21 | downloaded_list = []
22 | print("Parsing downloaded files ....")
23 | for filename in os.listdir(directory):
24 | downloaded_list.append(filename)
25 | print("downloaded files: ", len(downloaded_list))
26 |
27 | # strip the .jpg extension from the downloaded file names
28 | downloaded_list = [os.path.splitext(imgs)[0] for imgs in downloaded_list]
29 |
30 | # create a new df with descriptions from annotations for the downloaded images
31 | print("Creating new dataframes ....")
32 | train_annotations_bbox_downloaded_df_train = challenge_2018_train_annotations_bbox[
33 | challenge_2018_train_annotations_bbox['ImageID'].isin(downloaded_list)]
34 |
35 | val_annotations_bbox_downloaded_df_train = challenge_2018_image_ids_valset_od[
36 | challenge_2018_image_ids_valset_od['ImageID'].isin(downloaded_list)]
37 |
38 | print("challenge-2018-train-annotations-bbox shape:", train_annotations_bbox_downloaded_df_train.shape)
39 | print("challenge-2018-image-ids-valset-od shape:", val_annotations_bbox_downloaded_df_train.shape)
40 |
41 | # export the data to csv
42 | print("Exporting the csv files ....")
43 | train_annotations_bbox_downloaded_df_train.to_csv(path_to_datafiles
44 | + 'challenge-2018-train-annotations-bbox.csv', index=False)
45 | val_annotations_bbox_downloaded_df_train.to_csv(path_to_datafiles
46 | + 'challenge-2018-image-ids-valset-od.csv', index=False)
47 |
--------------------------------------------------------------------------------
/keras_retinanet/utils/freeze.py:
--------------------------------------------------------------------------------
1 | def freeze(model):
2 | """ Set all layers in a model to non-trainable.
3 |
4 | The weights for these layers will not be updated during training.
5 |
6 | This function modifies the given model in-place,
7 | but it also returns the modified model to allow easy chaining with other functions.
8 | """
9 | for layer in model.layers:
10 | layer.trainable = False
11 | return model
12 |
--------------------------------------------------------------------------------
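
freeze() is what create_models() in trainer/model.py applies to the backbone when --freeze-backbone is set, so only the FPN and the two subnets keep training. A tiny self-contained check (the toy model is illustrative only):

    import keras
    from keras_retinanet.utils.freeze import freeze

    toy = keras.models.Sequential([keras.layers.Dense(4, input_shape=(8,))])
    toy = freeze(toy)
    assert not any(layer.trainable for layer in toy.layers)
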
/keras_retinanet/utils/initializers.py:
--------------------------------------------------------------------------------
1 | import keras
2 | import numpy as np
3 | import math
4 |
5 |
6 | class PriorProbability(keras.initializers.Initializer):
7 | """ Apply a prior probability to the weights.
8 | """
9 |
10 | def __init__(self, probability=0.01):
11 | self.probability = probability
12 |
13 | def get_config(self):
14 | return {
15 | 'probability': self.probability
16 | }
17 |
18 | def __call__(self, shape, dtype=None):
19 | # set bias to -log((1 - p)/p) for foreground
20 | result = np.ones(shape, dtype=dtype) * -math.log((1 - self.probability) / self.probability)
21 |
22 | return result
23 |
--------------------------------------------------------------------------------
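
PriorProbability is the classification-head bias initialisation from the focal loss paper: with probability pi = 0.01 the bias is -log((1 - pi)/pi) ≈ -4.595, so at the start of training every anchor is predicted as foreground with probability ≈ 0.01, which keeps the overwhelming number of background anchors from swamping the loss. A quick numeric check (the shape is arbitrary):

    import math
    from keras_retinanet.utils.initializers import PriorProbability

    bias = PriorProbability(probability=0.01)((720,), dtype='float32')
    print(bias[0])                           # ~ -4.595
    print(1.0 / (1.0 + math.exp(-bias[0])))  # ~ 0.01, sigmoid of the bias recovers the prior
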
/keras_retinanet/utils/layers.py:
--------------------------------------------------------------------------------
1 | import keras
2 | from ..utils import anchors as utils_anchors
3 | import numpy as np
4 | import tensorflow as tf
5 |
6 |
7 | def filter_detections(
8 | boxes,
9 | classification,
10 | other=[],
11 | class_specific_filter=True,
12 | nms=True,
13 | score_threshold=0.05,
14 | max_detections=300,
15 | nms_threshold=0.5
16 | ):
17 | """ Filter detections using the boxes and classification values.
18 | Args
19 | boxes : Tensor of shape (num_boxes, 4) containing the boxes in (x1, y1, x2, y2) format.
20 | classification : Tensor of shape (num_boxes, num_classes) containing the classification scores.
21 | other : List of tensors of shape (num_boxes, ...) to filter along with the boxes and
22 | classification scores.
23 | class_specific_filter : Whether to perform filtering per class, or take the best scoring class and filter those.
24 | nms : Flag to enable/disable non maximum suppression.
25 | score_threshold : Threshold used to prefilter the boxes with.
26 | max_detections : Maximum number of detections to keep.
27 | nms_threshold : Threshold for the IoU value to determine when a box should be suppressed.
28 | Returns
29 | A list of [boxes, scores, labels, other[0], other[1], ...].
30 | boxes is shaped (max_detections, 4) and contains the (x1, y1, x2, y2) of the non-suppressed boxes.
31 | scores is shaped (max_detections,) and contains the scores of the predicted class.
32 | labels is shaped (max_detections,) and contains the predicted label.
33 | other[i] is shaped (max_detections, ...) and contains the filtered other[i] data.
34 | In case there are less than max_detections detections, the tensors are padded with -1's.
35 | """
36 |
37 | def _filter_detections(scores, labels):
38 | # threshold based on score
39 | indices = tf.where(keras.backend.greater(scores, score_threshold))
40 |
41 | if nms:
42 | filtered_boxes = tf.gather_nd(boxes, indices)
43 | filtered_scores = keras.backend.gather(scores, indices)[:, 0]
44 |
45 | # perform NMS
46 | nms_indices = tf.image.non_max_suppression(filtered_boxes, filtered_scores, max_output_size=max_detections,
47 | iou_threshold=nms_threshold)
48 |
49 | # filter indices based on NMS
50 | indices = keras.backend.gather(indices, nms_indices)
51 |
52 | # add indices to list of all indices
53 | labels = tf.gather_nd(labels, indices)
54 | indices = keras.backend.stack([indices[:, 0], labels], axis=1)
55 |
56 | return indices
57 |
58 | if class_specific_filter:
59 | all_indices = []
60 | # perform per class filtering
61 | for c in range(int(classification.shape[1])):
62 | scores = classification[:, c]
63 | labels = c * keras.backend.ones((keras.backend.shape(scores)[0],), dtype='int64')
64 | all_indices.append(_filter_detections(scores, labels))
65 |
66 | # concatenate indices to single tensor
67 | indices = keras.backend.concatenate(all_indices, axis=0)
68 | else:
69 | scores = keras.backend.max(classification, axis=1)
70 | labels = keras.backend.argmax(classification, axis=1)
71 | indices = _filter_detections(scores, labels)
72 |
73 | # select top k
74 | scores = tf.gather_nd(classification, indices)
75 | labels = indices[:, 1]
76 | scores, top_indices = tf.nn.top_k(scores, k=keras.backend.minimum(max_detections, keras.backend.shape(scores)[0]))
77 |
78 | # filter input using the final set of indices
79 | indices = keras.backend.gather(indices[:, 0], top_indices)
80 | boxes = keras.backend.gather(boxes, indices)
81 | labels = keras.backend.gather(labels, top_indices)
82 | other_ = [keras.backend.gather(o, indices) for o in other]
83 |
84 | # zero pad the outputs
85 | pad_size = keras.backend.maximum(0, max_detections - keras.backend.shape(scores)[0])
86 | boxes = tf.pad(boxes, [[0, pad_size], [0, 0]], constant_values=-1)
87 | scores = tf.pad(scores, [[0, pad_size]], constant_values=-1)
88 | labels = tf.pad(labels, [[0, pad_size]], constant_values=-1)
89 | labels = keras.backend.cast(labels, 'int32')
90 | other_ = [tf.pad(o, [[0, pad_size]] + [[0, 0] for _ in range(1, len(o.shape))], constant_values=-1) for o in
91 | other_]
92 |
93 | # set shapes, since we know what they are
94 | boxes.set_shape([max_detections, 4])
95 | scores.set_shape([max_detections])
96 | labels.set_shape([max_detections])
97 | for o, s in zip(other_, [list(keras.backend.int_shape(o)) for o in other]):
98 | o.set_shape([max_detections] + s[1:])
99 |
100 | return [boxes, scores, labels] + other_
101 |
102 |
103 | class FilterDetections(keras.layers.Layer):
104 | """ Keras layer for filtering detections using score threshold and NMS.
105 | """
106 |
107 | def __init__(
108 | self,
109 | nms=True,
110 | class_specific_filter=True,
111 | nms_threshold=0.5,
112 | score_threshold=0.05,
113 | max_detections=300,
114 | parallel_iterations=32,
115 | **kwargs
116 | ):
117 | """ Filters detections using score threshold, NMS and selecting the top-k detections.
118 | Args
119 | nms : Flag to enable/disable NMS.
120 | class_specific_filter : Whether to perform filtering per class, or take the best scoring class and filter
121 | those.
122 | nms_threshold : Threshold for the IoU value to determine when a box should be suppressed.
123 | score_threshold : Threshold used to prefilter the boxes with.
124 | max_detections : Maximum number of detections to keep.
125 | parallel_iterations : Number of batch items to process in parallel.
126 | """
127 | self.nms = nms
128 | self.class_specific_filter = class_specific_filter
129 | self.nms_threshold = nms_threshold
130 | self.score_threshold = score_threshold
131 | self.max_detections = max_detections
132 | self.parallel_iterations = parallel_iterations
133 | super(FilterDetections, self).__init__(**kwargs)
134 |
135 | def call(self, inputs, **kwargs):
136 | """ Constructs the NMS graph.
137 | Args
138 | inputs : List of [boxes, classification, other[0], other[1], ...] tensors.
139 | """
140 | boxes = inputs[0]
141 | classification = inputs[1]
142 | other = inputs[2:]
143 |
144 | # wrap nms with our parameters
145 | def _filter_detections(args):
146 | boxes = args[0]
147 | classification = args[1]
148 | other = args[2]
149 |
150 | return filter_detections(
151 | boxes,
152 | classification,
153 | other,
154 | nms=self.nms,
155 | class_specific_filter=self.class_specific_filter,
156 | score_threshold=self.score_threshold,
157 | max_detections=self.max_detections,
158 | nms_threshold=self.nms_threshold,
159 | )
160 |
161 | # call filter_detections on each batch
162 | outputs = tf.map_fn(
163 | _filter_detections,
164 | elems=[boxes, classification, other],
165 | dtype=[keras.backend.floatx(), keras.backend.floatx(), 'int32'] + [o.dtype for o in other],
166 | parallel_iterations=self.parallel_iterations
167 | )
168 |
169 | return outputs
170 |
171 | def compute_output_shape(self, input_shape):
172 | """ Computes the output shapes given the input shapes.
173 | Args
174 | input_shape : List of input shapes [boxes, classification, other[0], other[1], ...].
175 | Returns
176 | List of tuples representing the output shapes:
177 | [filtered_boxes.shape, filtered_scores.shape, filtered_labels.shape, filtered_other[0].shape,
178 | filtered_other[1].shape, ...]
179 | """
180 | return [(input_shape[0][0], self.max_detections, 4),
181 | (input_shape[1][0], self.max_detections),
182 | (input_shape[1][0], self.max_detections),
183 |                 ] + [tuple([input_shape[i][0], self.max_detections] +
184 | list(input_shape[i][2:])) for i in range(2, len(input_shape))]
185 |
186 | def compute_mask(self, inputs, mask=None):
187 | """ This is required in Keras when there is more than 1 output.
188 | """
189 | return (len(inputs) + 1) * [None]
190 |
191 | def get_config(self):
192 | """ Gets the configuration of this layer.
193 | Returns
194 | Dictionary containing the parameters of this layer.
195 | """
196 | config = super(FilterDetections, self).get_config()
197 | config.update({
198 | 'nms': self.nms,
199 | 'class_specific_filter': self.class_specific_filter,
200 | 'nms_threshold': self.nms_threshold,
201 | 'score_threshold': self.score_threshold,
202 | 'max_detections': self.max_detections,
203 | 'parallel_iterations': self.parallel_iterations,
204 | })
205 |
206 | return config
207 |
208 |
209 | def shift(shape, stride, anchors):
210 | """ Produce shifted anchors based on shape of the map and stride size.
211 | Args
212 | shape : Shape to shift the anchors over.
213 | stride : Stride to shift the anchors with over the shape.
214 | anchors: The anchors to apply at each location.
215 | """
216 | shift_x = (keras.backend.arange(0, shape[1], dtype=keras.backend.floatx())
217 | + keras.backend.constant(0.5, dtype=keras.backend.floatx())) * stride
218 | shift_y = (keras.backend.arange(0, shape[0], dtype=keras.backend.floatx())
219 | + keras.backend.constant(0.5, dtype=keras.backend.floatx())) * stride
220 |
221 | shift_x, shift_y = tf.meshgrid(shift_x, shift_y)
222 | shift_x = keras.backend.reshape(shift_x, [-1])
223 | shift_y = keras.backend.reshape(shift_y, [-1])
224 |
225 | shifts = keras.backend.stack([
226 | shift_x,
227 | shift_y,
228 | shift_x,
229 | shift_y
230 | ], axis=0)
231 |
232 | shifts = keras.backend.transpose(shifts)
233 | number_of_anchors = keras.backend.shape(anchors)[0]
234 |
235 | k = keras.backend.shape(shifts)[0] # number of base points = feat_h * feat_w
236 |
237 | shifted_anchors = keras.backend.reshape(anchors, [1, number_of_anchors, 4]) + keras.backend.cast(
238 | keras.backend.reshape(shifts, [k, 1, 4]), keras.backend.floatx())
239 | shifted_anchors = keras.backend.reshape(shifted_anchors, [k * number_of_anchors, 4])
240 |
241 | return shifted_anchors
242 |
243 |
244 | def resize_images(images, size, method='bilinear', align_corners=False):
245 | """ See https://www.tensorflow.org/versions/master/api_docs/python/tf/image/resize_images .
246 | Args
247 | method: The method used for interpolation. One of ('bilinear', 'nearest', 'bicubic', 'area').
248 | """
249 | methods = {
250 | 'bilinear': tf.image.ResizeMethod.BILINEAR,
251 | 'nearest': tf.image.ResizeMethod.NEAREST_NEIGHBOR,
252 | 'bicubic': tf.image.ResizeMethod.BICUBIC,
253 | 'area': tf.image.ResizeMethod.AREA,
254 | }
255 | return tf.image.resize_images(images, size, methods[method], align_corners)
256 |
257 |
258 | def bbox_transform_inv(boxes, deltas, mean=None, std=None):
259 | """ Applies deltas (usually regression results) to boxes (usually anchors).
260 | Before applying the deltas to the boxes, the normalization that was previously applied (in the generator) has to
261 | be removed.
262 | The mean and std are the mean and std as applied in the generator. They are unnormalized in this function and then
263 | applied to the boxes.
264 | Args
265 | boxes : np.array of shape (B, N, 4), where B is the batch size, N the number of boxes and 4 values for
266 | (x1, y1, x2, y2).
267 | deltas: np.array of same shape as boxes. These deltas (d_x1, d_y1, d_x2, d_y2) are a factor of the width/height.
268 | mean : The mean value used when computing deltas (defaults to [0, 0, 0, 0]).
269 | std : The standard deviation used when computing deltas (defaults to [0.2, 0.2, 0.2, 0.2]).
270 | Returns
271 | A np.array of the same shape as boxes, but with deltas applied to each box.
272 | The mean and std are used during training to normalize the regression values (networks love normalization).
273 | """
274 | if mean is None:
275 | mean = [0, 0, 0, 0]
276 | if std is None:
277 | std = [0.2, 0.2, 0.2, 0.2]
278 |
279 | width = boxes[:, :, 2] - boxes[:, :, 0]
280 | height = boxes[:, :, 3] - boxes[:, :, 1]
281 |
282 | x1 = boxes[:, :, 0] + (deltas[:, :, 0] * std[0] + mean[0]) * width
283 | y1 = boxes[:, :, 1] + (deltas[:, :, 1] * std[1] + mean[1]) * height
284 | x2 = boxes[:, :, 2] + (deltas[:, :, 2] * std[2] + mean[2]) * width
285 | y2 = boxes[:, :, 3] + (deltas[:, :, 3] * std[3] + mean[3]) * height
286 |
287 | pred_boxes = keras.backend.stack([x1, y1, x2, y2], axis=2)
288 |
289 | return pred_boxes
290 |
291 |
292 | class Anchors(keras.layers.Layer):
293 |     """ Keras layer for generating anchors for a given shape.
294 | """
295 |
296 | def __init__(self, size, stride, ratios=None, scales=None, *args, **kwargs):
297 | """ Initializer for an Anchors layer.
298 |
299 | Args
300 | size: The base size of the anchors to generate.
301 | stride: The stride of the anchors to generate.
302 | ratios: The ratios of the anchors to generate (defaults to [0.5, 1, 2]).
303 | scales: The scales of the anchors to generate (defaults to [2^0, 2^(1/3), 2^(2/3)]).
304 | """
305 | self.size = size
306 | self.stride = stride
307 | self.ratios = ratios
308 | self.scales = scales
309 |
310 | if ratios is None:
311 | self.ratios = np.array([0.5, 1, 2], keras.backend.floatx())
312 | elif isinstance(ratios, list):
313 | self.ratios = np.array(ratios)
314 | if scales is None:
315 | self.scales = np.array([2 ** 0, 2 ** (1.0 / 3.0), 2 ** (2.0 / 3.0)], keras.backend.floatx())
316 | elif isinstance(scales, list):
317 | self.scales = np.array(scales)
318 |
319 | self.num_anchors = len(self.ratios) * len(self.scales)
320 | self.anchors = keras.backend.variable(utils_anchors.generate_anchors(
321 | base_size=size,
322 | ratios=self.ratios,
323 | scales=self.scales,
324 | ))
325 |
326 | super(Anchors, self).__init__(*args, **kwargs)
327 |
328 | def call(self, inputs, **kwargs):
329 | features = inputs
330 | features_shape = keras.backend.shape(features)[:3]
331 |
332 | # generate proposals from bbox deltas and shifted anchors
333 | anchors = shift(features_shape[1:3], self.stride, self.anchors)
334 | anchors = keras.backend.tile(keras.backend.expand_dims(anchors, axis=0), (features_shape[0], 1, 1))
335 |
336 | return anchors
337 |
338 | def compute_output_shape(self, input_shape):
339 | if None not in input_shape[1:]:
340 | total = np.prod(input_shape[1:3]) * self.num_anchors
341 | return (input_shape[0], total, 4)
342 | else:
343 | return (input_shape[0], None, 4)
344 |
345 | def get_config(self):
346 | config = super(Anchors, self).get_config()
347 | config.update({
348 | 'size': self.size,
349 | 'stride': self.stride,
350 | 'ratios': self.ratios.tolist(),
351 | 'scales': self.scales.tolist(),
352 | })
353 |
354 | return config
355 |
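# Hedged usage sketch (illustrative, not part of the original module): with the
# default 3 ratios x 3 scales the layer emits 9 anchors per feature-map cell,
# so a 7 x 7 pyramid level yields 7 * 7 * 9 = 441 boxes. The size/stride values
# are an assumption for a single FPN level, not taken from this repository.
def _anchors_layer_demo():
    layer = Anchors(size=32, stride=8)
    return layer.compute_output_shape((None, 7, 7, 256))   # (None, 441, 4)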
356 |
357 | class UpsampleLike(keras.layers.Layer):
358 | """ Keras layer for upsampling a Tensor to be the same shape as another Tensor.
359 | """
360 |
361 | def call(self, inputs, **kwargs):
362 | source, target = inputs
363 | target_shape = keras.backend.shape(target)
364 | return resize_images(source, (target_shape[1], target_shape[2]), method='nearest')
365 |
366 | def compute_output_shape(self, input_shape):
367 | return (input_shape[0][0],) + input_shape[1][1:3] + (input_shape[0][-1],)
368 |
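# Hedged sketch (illustrative, not part of the original module): in the FPN
# top-down pathway a coarse level is resized to the spatial size of the next
# finer level before the element-wise merge; the shapes below are assumptions.
def _upsample_like_demo():
    layer = UpsampleLike()
    # the (None, 8, 8, 256) source is upsampled to the target's 16 x 16 grid
    return layer.compute_output_shape([(None, 8, 8, 256), (None, 16, 16, 256)])   # (None, 16, 16, 256)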
369 |
370 | class RegressBoxes(keras.layers.Layer):
371 | """ Keras layer for applying regression values to boxes.
372 | """
373 |
374 | def __init__(self, mean=None, std=None, *args, **kwargs):
375 | """ Initializer for the RegressBoxes layer.
376 |
377 | Args
378 | mean: The mean value of the regression values which was used for normalization.
380 | std: The standard deviation of the regression values which was used for normalization.
380 | """
381 | if mean is None:
382 | mean = np.array([0, 0, 0, 0])
383 | if std is None:
384 | std = np.array([0.2, 0.2, 0.2, 0.2])
385 |
386 | if isinstance(mean, (list, tuple)):
387 | mean = np.array(mean)
388 | elif not isinstance(mean, np.ndarray):
389 | raise ValueError('Expected mean to be a np.ndarray, list or tuple. Received: {}'.format(type(mean)))
390 |
391 | if isinstance(std, (list, tuple)):
392 | std = np.array(std)
393 | elif not isinstance(std, np.ndarray):
394 | raise ValueError('Expected std to be a np.ndarray, list or tuple. Received: {}'.format(type(std)))
395 |
396 | self.mean = mean
397 | self.std = std
398 | super(RegressBoxes, self).__init__(*args, **kwargs)
399 |
400 | def call(self, inputs, **kwargs):
401 | anchors, regression = inputs
402 | return bbox_transform_inv(anchors, regression, mean=self.mean, std=self.std)
403 |
404 | def compute_output_shape(self, input_shape):
405 | return input_shape[0]
406 |
407 | def get_config(self):
408 | config = super(RegressBoxes, self).get_config()
409 | config.update({
410 | 'mean': self.mean.tolist(),
411 | 'std': self.std.tolist(),
412 | })
413 |
414 | return config
415 |
416 |
417 | class ClipBoxes(keras.layers.Layer):
418 | """ Keras layer to clip box values to lie inside a given shape.
419 | """
420 |
421 | def call(self, inputs, **kwargs):
422 | image, boxes = inputs
423 | shape = keras.backend.cast(keras.backend.shape(image), keras.backend.floatx())
424 |
425 | x1 = tf.clip_by_value(boxes[:, :, 0], 0, shape[2])
426 | y1 = tf.clip_by_value(boxes[:, :, 1], 0, shape[1])
427 | x2 = tf.clip_by_value(boxes[:, :, 2], 0, shape[2])
428 | y2 = tf.clip_by_value(boxes[:, :, 3], 0, shape[1])
429 |
430 | return keras.backend.stack([x1, y1, x2, y2], axis=2)
431 |
432 | def compute_output_shape(self, input_shape):
433 | return input_shape[1]
434 |
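# Hedged worked example (illustrative, not part of the original module): the
# image tensor is (batch, height, width, channels), so x-coordinates are
# clipped against shape[2] (width) and y-coordinates against shape[1] (height).
# On an 800 x 600 image a predicted box (-10, 20, 900, 700) is clamped to
# (0, 20, 800, 600).
def _clip_boxes_demo():
    image = keras.layers.Input(shape=(600, 800, 3))   # height 600, width 800
    boxes = keras.layers.Input(shape=(None, 4))
    return ClipBoxes(name='clipped_boxes')([image, boxes])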
--------------------------------------------------------------------------------
/keras_retinanet/utils/losses.py:
--------------------------------------------------------------------------------
1 | import keras
2 | import tensorflow as tf
3 |
4 |
5 | def focal(alpha=0.25, gamma=2.0):
6 | """ Create a functor for computing the classification focal loss.
7 | Args
8 | alpha: Scale the focal weight with alpha.
9 | gamma: Take the power of the focal weight with gamma.
10 | Returns
11 | A functor that computes the focal loss using the alpha and gamma.
12 | """
13 |
14 | def _focal(y_true, y_pred):
15 | """ Compute the focal loss given the target tensor and the predicted tensor.
16 | As defined in https://arxiv.org/abs/1708.02002
17 | Args
18 | y_true: Tensor of target data from the generator with shape (B, N, num_classes).
19 | y_pred: Tensor of predicted data from the network with shape (B, N, num_classes).
20 | Returns
21 | The focal loss of y_pred w.r.t. y_true.
22 | """
23 | labels = y_true[:, :, :-1]
24 | anchor_state = y_true[:, :, -1] # -1 for ignore, 0 for background, 1 for object
25 | classification = y_pred
26 |
27 | # filter out "ignore" anchors
28 | indices = tf.where(keras.backend.not_equal(anchor_state, -1))
29 | labels = tf.gather_nd(labels, indices)
30 | classification = tf.gather_nd(classification, indices)
31 |
32 | # compute the focal loss
33 | alpha_factor = keras.backend.ones_like(labels) * alpha
34 | alpha_factor = tf.where(keras.backend.equal(labels, 1), alpha_factor, 1 - alpha_factor)
35 | focal_weight = tf.where(keras.backend.equal(labels, 1), 1 - classification, classification)
36 | focal_weight = alpha_factor * focal_weight ** gamma
37 |
38 | cls_loss = focal_weight * keras.backend.binary_crossentropy(labels, classification)
39 |
40 | # compute the normalizer: the number of positive anchors
41 | normalizer = tf.where(keras.backend.equal(anchor_state, 1))
42 | normalizer = keras.backend.cast(keras.backend.shape(normalizer)[0], keras.backend.floatx())
43 | normalizer = keras.backend.maximum(1.0, normalizer)
44 |
45 | return keras.backend.sum(cls_loss) / normalizer
46 |
47 | return _focal
48 |
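# Hedged worked example (illustrative, not part of the original module): the
# focal term down-weights well-classified anchors. For a positive anchor with
# p = 0.9, alpha = 0.25 and gamma = 2 the weight is 0.25 * (1 - 0.9)^2 = 0.0025,
# while a hard positive with p = 0.1 gets 0.25 * 0.9^2 = 0.2025, about 80x more.
def _focal_demo():
    loss_fn = focal(alpha=0.25, gamma=2.0)
    # one image, two anchors, one class; the last channel is the anchor state (1 = object)
    y_true = keras.backend.constant([[[1.0, 1.0], [1.0, 1.0]]])
    y_pred = keras.backend.constant([[[0.9], [0.1]]])
    return loss_fn(y_true, y_pred)   # weighted BCE summed over both anchors / 2 positives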
49 |
50 | def smooth_l1(sigma=3.0):
51 | """ Create a smooth L1 regression loss functor.
52 | Args
53 | sigma: This argument defines the point where the loss changes from L2 to L1.
54 | Returns
55 | A functor for computing the smooth L1 loss given target data and predicted data.
56 | """
57 | sigma_squared = sigma ** 2
58 |
59 | def _smooth_l1(y_true, y_pred):
60 | """ Compute the smooth L1 loss of y_pred w.r.t. y_true.
61 | Args
62 | y_true: Tensor from the generator of shape (B, N, 5). The last value for each box is the state of the
63 | anchor (ignore, negative, positive).
64 | y_pred: Tensor from the network of shape (B, N, 4).
65 | Returns
66 | The smooth L1 loss of y_pred w.r.t. y_true.
67 | """
68 | # separate target and state
69 | regression = y_pred
70 | regression_target = y_true[:, :, :4]
71 | anchor_state = y_true[:, :, 4]
72 |
73 | # filter out "ignore" anchors
74 | indices = tf.where(keras.backend.equal(anchor_state, 1))
75 | regression = tf.gather_nd(regression, indices)
76 | regression_target = tf.gather_nd(regression_target, indices)
77 |
78 | # compute smooth L1 loss
79 | # f(x) = 0.5 * (sigma * x)^2 if |x| < 1 / sigma / sigma
80 | # |x| - 0.5 / sigma / sigma otherwise
81 | regression_diff = regression - regression_target
82 | regression_diff = keras.backend.abs(regression_diff)
83 | regression_loss = tf.where(
84 | keras.backend.less(regression_diff, 1.0 / sigma_squared),
85 | 0.5 * sigma_squared * keras.backend.pow(regression_diff, 2),
86 | regression_diff - 0.5 / sigma_squared
87 | )
88 |
89 | # compute the normalizer: the number of positive anchors
90 | normalizer = keras.backend.maximum(1, keras.backend.shape(indices)[0])
91 | normalizer = keras.backend.cast(normalizer, dtype=keras.backend.floatx())
92 | return keras.backend.sum(regression_loss) / normalizer
93 |
94 | return _smooth_l1
95 |
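# Hedged worked example (illustrative, not part of the original module): with
# sigma = 3 the switch point is 1 / sigma^2 = 1/9 ~ 0.111. A residual of 0.05
# falls in the quadratic branch, 0.5 * 9 * 0.05^2 = 0.01125, while a residual
# of 0.5 falls in the linear branch, 0.5 - 0.5 / 9 ~ 0.4444.
def _smooth_l1_demo():
    loss_fn = smooth_l1(sigma=3.0)
    # one image, one positive anchor (state 1 in the fifth channel)
    y_true = keras.backend.constant([[[0.05, 0.5, 0.0, 0.0, 1.0]]])
    y_pred = keras.backend.constant([[[0.0, 0.0, 0.0, 0.0]]])
    return loss_fn(y_true, y_pred)   # evaluates to (0.01125 + 0.4444) / 1 positive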
--------------------------------------------------------------------------------
/logo/keras-logo-2018-large-1200.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mukeshmithrakumar/RetinaNet/1459aee0c07693ce813a77f97c3e3889cb3c4826/logo/keras-logo-2018-large-1200.png
--------------------------------------------------------------------------------
/logo/share2.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mukeshmithrakumar/RetinaNet/1459aee0c07693ce813a77f97c3e3889cb3c4826/logo/share2.jpg
--------------------------------------------------------------------------------