├── Benchmarks_retinanet.xlsx ├── LICENSE ├── README.md ├── challenge2018 └── challenge-2018-class-descriptions-500.csv ├── images ├── test │ ├── 00000b4dcff7f799.jpg │ └── 0000d67245642c5f.jpg └── train │ ├── 0000b86e2fd18333.jpg │ └── 0000b9115cdf1e54.jpg ├── keras_retinanet ├── .gitignore ├── callbacks │ └── callbacks.py ├── models │ ├── classifier.py │ ├── model_backbone.py │ ├── resnet.py │ └── retinanet.py ├── preprocessing │ ├── generator.py │ ├── image.py │ └── open_images.py ├── setup.py ├── trainer │ ├── convert_model.py │ ├── evaluate.py │ ├── model.py │ └── task.py └── utils │ ├── anchors.py │ ├── clean.py │ ├── freeze.py │ ├── initializers.py │ ├── layers.py │ └── losses.py └── logo ├── keras-logo-2018-large-1200.png └── share2.jpg /Benchmarks_retinanet.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mukeshmithrakumar/RetinaNet/1459aee0c07693ce813a77f97c3e3889cb3c4826/Benchmarks_retinanet.xlsx -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2018 Mukesh Mithrakumar 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 |

# Keras RetinaNet

Not Maintained

*Badges: Open Source Love, GitHub, Python 3.6, GitHub Stars, LinkedIn*

## What is it :question:

28 | 29 | This is the Keras implementation of RetinaNet for object detection as described in 30 | [Focal Loss for Dense Object Detection](https://arxiv.org/abs/1708.02002) 31 | by Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He and Piotr Dollár. 32 | 33 | If this repository helps you in any way, show your love :heart: by putting a :star: on this project :v: 34 | 35 | 36 | ##### Object Detection: 37 | The RetinaNet used is a single, unified network composed of a ResNet50 backbone network and two task-specific 38 | subnetworks. The backbone is responsible for computing a convolutional feature map over the entire input image and is 39 | an off-the-shelf convolutional network. The first subnet performs classification on the backbone's output; the second 40 | subnet performs convolutional bounding box regression. 41 | RetinaNet is a good model for object detection, but getting it to work was a challenge. I underestimated the large 42 | number of classes and the size of the data set, but was still able to land a bronze medal (Top 20%) among 450 43 | competitors with some tweaks. The benchmark file is included for reference with the local scores for the predictions and 44 | the parameters used. 45 | 46 | ##### Visual Relationship: 47 | I focused on object detection and used a simple multi-output classifier for relationship prediction. Unlike the 48 | usual approach of using an LSTM, I experimented with a Random Forest Classifier and a Multi Output Classifier from 49 | sklearn, just to show that an LSTM does not have much intelligence behind it and is simply another statistical tool. The 50 | local classification scores proved me right, giving an accuracy greater than 90%. Since my visual 51 | relationship predictions depended on how well my object detector performed, I was not able to get a better score, but with this 52 | model I was able to land a bronze medal (Top 30%) among 230 competitors. 53 | 54 | ##### Lessons Learned with Tips: 55 | 1. Do not threshold the predictions; leave the low-confidence predictions in the submission file. 56 | Because of the way average precision works, you cannot be penalised for adding additional false positives 57 | with a lower confidence than all your other predictions, but you can still improve your recall if you 58 | find additional objects that weren't previously detected. 59 | 2. Because of the number of images in the train set, having a balanced number of steps 60 | and epochs is very important, and even more important is to divide the classes into bins, 61 | where each bin contains classes with a similar frequency in the data set, in order to prepare proper epochs. 62 | 3. When running the training, make sure that each class (within an epoch) 63 | has a similar number of occurrences by implementing a sampler to do this work (see the sketch below). 64 | 65 |
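The binning and sampling idea in points 2 and 3 above can be sketched as follows. This is a minimal illustration only, not the sampler used in this repository, and it assumes a simplified annotation format of `(image_id, class_id)` pairs:

```
import random
from collections import defaultdict


def make_balanced_epoch(annotations, samples_per_class=100, seed=42):
    """Pick images for one epoch so every class appears a similar number of times.

    annotations: iterable of (image_id, class_id) pairs (an assumed, simplified format).
    """
    random.seed(seed)
    images_by_class = defaultdict(list)
    for image_id, class_id in annotations:
        images_by_class[class_id].append(image_id)

    epoch = []
    for class_id, image_ids in images_by_class.items():
        if len(image_ids) >= samples_per_class:
            # undersample frequent classes
            epoch.extend(random.sample(image_ids, samples_per_class))
        else:
            # oversample rare classes (with replacement)
            epoch.extend(random.choices(image_ids, k=samples_per_class))
    random.shuffle(epoch)
    return epoch
```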

## :clipboard: Getting Started

66 | 67 | The build was made for the Google AI Object Detection and Visual Relationship Kaggle challenge, so if you are using 68 | this project on Google's Open Images data set, follow the instructions below to run the module. The code is also written 69 | in such a way that you can take individual modules and build a custom model to suit your needs. So when you install the 70 | package, make sure you turn the imports into absolute imports or follow the Folder Structure shown below. 71 | 72 | ### :dvd: Software Prerequisites 73 | 74 | - keras 75 | - keras-resnet 76 | - tensorflow 77 | - pandas 78 | - numpy 79 | - pillow 80 | - opencv 81 | - sklearn 82 | (An example install command is given after the folder structure below.) 83 | ### :computer: Hardware Prerequisites 84 | The code was initially run on an NVIDIA GeForce GTX 1050 Ti, but the model exploded because the Open Images data set 85 | consists of 1,743,042 images and 500 classes with 12,195,144 bounding boxes, and the images were resized to 86 | 600 by 600. Resizing the images to a smaller size could have solved the issue, but I did not try it. Instead the code was run on an 87 | NVIDIA Tesla K80 and the model worked fine; an NVIDIA Tesla P100 was used to convert the training model to an inference model. 88 | So I would recommend a K80 or a better GPU. 89 | 90 | ### :blue_book: Folder Structure 91 | 92 | ``` 93 | main_dir 94 | - challenge2018 (The folder containing data files for the challenge) 95 | - images 96 | - train (consists of the train images) 97 | - test (consists of the test images) 98 | - keras_retinanet (keras retinanet package) 99 | - callbacks 100 | - callbacks.py 101 | - models 102 | - classifier.py 103 | - model_backbone.py 104 | - resnet.py 105 | - retinanet.py 106 | - preprocessing 107 | - generator.py 108 | - image.py 109 | - open_images.py 110 | - trainer 111 | - convert_model.py 112 | - evaluate.py 113 | - model.py 114 | - task.py 115 | - utils 116 | - anchors.py 117 | - clean.py 118 | - freeze.py 119 | - initializers.py 120 | - layers.py 121 | - losses.py 122 | ``` 123 | 124 |
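One possible way to install the software prerequisites listed above; the PyPI package names `opencv-python` and `scikit-learn` (for the `opencv` and `sklearn` entries) are assumptions, and you should pick the TensorFlow build that matches your CUDA setup:

```
pip install keras keras-resnet tensorflow-gpu pandas numpy pillow opencv-python scikit-learn
```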

## :hourglass: Train

125 | 126 | Run ```task.py``` from the trainer folder. 127 | 128 | #### Usage 129 | ``` 130 | task.py main_dir(path/to/main directory) dataset_type(oid) 131 | ``` 132 | 133 |
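For example, with the folder structure above, a training run could be started like this, where `/path/to/main_dir` is a placeholder for the directory that contains `challenge2018`, `images` and `keras_retinanet`:

```
cd keras_retinanet/trainer
python task.py /path/to/main_dir oid
```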

## :watch: Test

134 | 135 | First run ```convert_model.py``` to convert the training model to an inference model. 136 | Then run ```evaluate.py``` for evaluation. By default, evaluation runs both object detection and visual 137 | relationship identification; to select between object detection and visual relationship identification, 138 | add 'od' or 'vr' when calling ```evaluate.py```. 139 | 140 | #### Usage 141 | ``` 142 | convert_model.py main_dir(path/to/main directory) model_in(model name to be used to convert) 143 | evaluate.py main_dir(path/to/main directory) model_in(model name to be used for evaluation) 144 | ``` 145 | 146 |
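For example, with a hypothetical training snapshot called `resnet50_oid_training.h5` and its converted counterpart `resnet50_oid_inference.h5` (both names are placeholders), the two steps could look like:

```
cd keras_retinanet/trainer
python convert_model.py /path/to/main_dir resnet50_oid_training.h5
python evaluate.py /path/to/main_dir resnet50_oid_inference.h5
```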

## :page_facing_up: Documentation

147 | 148 | 149 | callbacks.py: 150 | - CALLED: at model.py by the create callbacks function 151 | - DOES: returns a set of callbacks used for training 152 | 153 | classifier.py: 154 | - CALLED: at evaluate.py by the main function 155 | - DOES: returns a multi-output Random Forest classifier for visual relationship prediction 156 | 157 | model_backbone.py: 158 | - CALLED: at model.py by the train function 159 | - DOES: loads the retinanet model using the correct backbone 160 | 161 | resnet.py: 162 | - CALLED: at model_backbone.py by the backbone function 163 | - DOES: constructs a retinanet model using a resnet backbone 164 | 165 | retinanet.py: 166 | - CALLED: at resnet.py by the resnet_retinanet function 167 | - DOES: constructs a RetinaNet model on top of a backbone 168 | 169 | generator.py: 170 | - CALLED: at open_images.py by the OpenImagesGenerator class 171 | - DOES: creates a train and validation generator for open_images.py processing 172 | 173 | image.py: 174 | - CALLED: at generator.py by the Generator class 175 | - DOES: applies transformations and pre-processing to the images 176 | 177 | open_images.py: 178 | - CALLED: at model.py by the create_generators function 179 | - DOES: returns train and validation generators 180 | 181 | convert_model.py: 182 | - CALLED: standalone file to convert the training model to an inference model 183 | - DOES: converts a training model to an inference model 184 | 185 | evaluate.py: 186 | - CALLED: standalone evaluation file 187 | - DOES: object detection and visual relationship identification 188 | 189 | model.py: 190 | - CALLED: at task.py 191 | - DOES: runs the training 192 | 193 | task.py: 194 | - CALLED: standalone file to be called to start training 195 | - DOES: initiates the training 196 | 197 | anchors.py: 198 | - CALLED: at generator.py 199 | - DOES: generates anchors for bounding box detection 200 | 201 | clean.py: 202 | - CALLED: standalone file 203 | - DOES: creates data files based on the downloaded train and test images 204 | 205 | freeze.py: 206 | - CALLED: at model.py by the create_models function 207 | - DOES: freezes layers for training 208 | 209 | initializers.py: 210 | - CALLED: at retinanet.py 211 | - DOES: applies a prior probability to the weights 212 | 213 | layers.py: 214 | - CALLED: at retinanet.py 215 | - DOES: custom Keras layers for anchors, upsampling, box regression/clipping and filtering detections 216 | 217 | losses.py: 218 | - CALLED: at model.py by the create_models function 219 | - DOES: calculates the focal and smooth_l1 losses 220 | 221 |
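As a rough sketch of how these pieces fit together, a training snapshot can be loaded and converted to an inference model through `load_model` in `model_backbone.py` (shown later in this repository) and then run on a preprocessed image. The snapshot path is a placeholder, and the example assumes the package is importable as `keras_retinanet` (see the note on absolute imports in Getting Started):

```
import numpy as np
from keras_retinanet.models.model_backbone import load_model

# load a training snapshot and convert it to an inference model in one step
model = load_model('/path/to/main_dir/resnet50_oid_training.h5',
                   backbone_name='resnet50', convert=True)

# stand-in for a preprocessed, resized image (HxWx3 float array)
image = np.zeros((600, 600, 3), dtype=np.float32)
boxes, scores, labels = model.predict_on_batch(np.expand_dims(image, axis=0))[:3]
```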

## :alien: Authors

222 | 223 | * **Mukesh Mithrakumar** - *Initial work* - [Keras_RetinaNet](https://github.com/mukeshmithrakumar/) 224 | 225 |

## :key: License

226 | 227 | This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details 228 | 229 |

## :loudspeaker: Acknowledgments

230 | 231 | * Inspiration from Fizyr Keras RetinaNet 232 | -------------------------------------------------------------------------------- /challenge2018/challenge-2018-class-descriptions-500.csv: -------------------------------------------------------------------------------- 1 | /m/061hd_,Infant bed 2 | /m/06m11,Rose 3 | /m/03120,Flag 4 | /m/01kb5b,Flashlight 5 | /m/0120dh,Sea turtle 6 | /m/0dv5r,Camera 7 | /m/0jbk,Animal 8 | /m/0174n1,Glove 9 | /m/09f_2,Crocodile 10 | /m/01xq0k1,Cattle 11 | /m/03jm5,House 12 | /m/02g30s,Guacamole 13 | /m/05z6w,Penguin 14 | /m/01jfm_,Vehicle registration plate 15 | /m/076lb9,Bench 16 | /m/0gj37,Ladybug 17 | /m/0k0pj,Human nose 18 | /m/0kpqd,Watermelon 19 | /m/0l14j_,Flute 20 | /m/0cyf8,Butterfly 21 | /m/0174k2,Washing machine 22 | /m/0dq75,Raccoon 23 | /m/076bq,Segway 24 | /m/07crc,Taco 25 | /m/0d8zb,Jellyfish 26 | /m/0fszt,Cake 27 | /m/0k1tl,Pen 28 | /m/020kz,Cannon 29 | /m/09728,Bread 30 | /m/07j7r,Tree 31 | /m/0fbdv,Shellfish 32 | /m/03ssj5,Bed 33 | /m/03qrc,Hamster 34 | /m/02dl1y,Hat 35 | /m/01k6s3,Toaster 36 | /m/02jfl0,Sombrero 37 | /m/01krhy,Tiara 38 | /m/04kkgm,Bowl 39 | /m/0ft9s,Dragonfly 40 | /m/0d_2m,Moths and butterflies 41 | /m/0czz2,Antelope 42 | /m/0f4s2w,Vegetable 43 | /m/07dd4,Torch 44 | /m/0cgh4,Building 45 | /m/03bbps,Power plugs and sockets 46 | /m/02pjr4,Blender 47 | /m/04p0qw,Billiard table 48 | /m/02pdsw,Cutting board 49 | /m/01yx86,Bronze sculpture 50 | /m/09dzg,Turtle 51 | /m/0hkxq,Broccoli 52 | /m/07dm6,Tiger 53 | /m/054_l,Mirror 54 | /m/01dws,Bear 55 | /m/027pcv,Zucchini 56 | /m/01d40f,Dress 57 | /m/02rgn06,Volleyball 58 | /m/0342h,Guitar 59 | /m/06bt6,Reptile 60 | /m/0323sq,Golf cart 61 | /m/02zvsm,Tart 62 | /m/02fq_6,Fedora 63 | /m/01lrl,Carnivore 64 | /m/0k4j,Car 65 | /m/04h7h,Lighthouse 66 | /m/07xyvk,Coffeemaker 67 | /m/03y6mg,Food processor 68 | /m/07r04,Truck 69 | /m/03__z0,Bookcase 70 | /m/019w40,Surfboard 71 | /m/09j5n,Footwear 72 | /m/0cvnqh,Bench 73 | /m/01llwg,Necklace 74 | /m/0c9ph5,Flower 75 | /m/015x5n,Radish 76 | /m/0gd2v,Marine mammal 77 | /m/04v6l4,Frying pan 78 | /m/02jz0l,Tap 79 | /m/0dj6p,Peach 80 | /m/04ctx,Knife 81 | /m/080hkjn,Handbag 82 | /m/01c648,Laptop 83 | /m/01j61q,Tent 84 | /m/012n7d,Ambulance 85 | /m/025nd,Christmas tree 86 | /m/09csl,Eagle 87 | /m/01lcw4,Limousine 88 | /m/0h8n5zk,Kitchen & dining room table 89 | /m/0633h,Polar bear 90 | /m/01fdzj,Tower 91 | /m/01226z,Football 92 | /m/0mw_6,Willow 93 | /m/04hgtk,Human head 94 | /m/02pv19,Stop sign 95 | /m/09qck,Banana 96 | /m/063rgb,Mixer 97 | /m/0lt4_,Binoculars 98 | /m/0270h,Dessert 99 | /m/01h3n,Bee 100 | /m/01mzpv,Chair 101 | /m/04169hn,Wood-burning stove 102 | /m/0fm3zh,Flowerpot 103 | /m/0d20w4,Beaker 104 | /m/0_cp5,Oyster 105 | /m/01dy8n,Woodpecker 106 | /m/03m5k,Harp 107 | /m/03dnzn,Bathtub 108 | /m/0h8mzrc,Wall clock 109 | /m/0h8mhzd,Sports uniform 110 | /m/03d443,Rhinoceros 111 | /m/01gllr,Beehive 112 | /m/0642b4,Cupboard 113 | /m/09b5t,Chicken 114 | /m/04yx4,Man 115 | /m/01f8m5,Blue jay 116 | /m/015x4r,Cucumber 117 | /m/01j51,Balloon 118 | /m/02zt3,Kite 119 | /m/03tw93,Fireplace 120 | /m/01jfsr,Lantern 121 | /m/04ylt,Missile 122 | /m/0bt_c3,Book 123 | /m/0cmx8,Spoon 124 | /m/0hqkz,Grapefruit 125 | /m/071qp,Squirrel 126 | /m/0cyhj_,Orange 127 | /m/01xygc,Coat 128 | /m/0420v5,Punching bag 129 | /m/0898b,Zebra 130 | /m/01knjb,Billboard 131 | /m/0199g,Bicycle 132 | /m/03c7gz,Door handle 133 | /m/02x984l,Mechanical fan 134 | /m/04zwwv,Ring binder 135 | /m/04bcr3,Table 136 | /m/0gv1x,Parrot 137 | /m/01nq26,Sock 138 | /m/02s195,Vase 139 | 
/m/083kb,Weapon 140 | /m/06nrc,Shotgun 141 | /m/0jyfg,Glasses 142 | /m/0nybt,Seahorse 143 | /m/0176mf,Belt 144 | /m/01rzcn,Watercraft 145 | /m/0d4v4,Window 146 | /m/03bk1,Giraffe 147 | /m/096mb,Lion 148 | /m/0h9mv,Tire 149 | /m/07yv9,Vehicle 150 | /m/0ph39,Canoe 151 | /m/01rkbr,Tie 152 | /m/0gjbg72,Shelf 153 | /m/06z37_,Picture frame 154 | /m/01m4t,Printer 155 | /m/035r7c,Human leg 156 | /m/019jd,Boat 157 | /m/02tsc9,Slow cooker 158 | /m/015wgc,Croissant 159 | /m/0c06p,Candle 160 | /m/01dwwc,Pancake 161 | /m/034c16,Pillow 162 | /m/0242l,Coin 163 | /m/02lbcq,Stretcher 164 | /m/03nfch,Sandal 165 | /m/03bt1vf,Woman 166 | /m/01lynh,Stairs 167 | /m/03q5t,Harpsichord 168 | /m/0fqt361,Stool 169 | /m/01bjv,Bus 170 | /m/01s55n,Suitcase 171 | /m/0283dt1,Human mouth 172 | /m/01z1kdw,Juice 173 | /m/016m2d,Skull 174 | /m/02dgv,Door 175 | /m/07y_7,Violin 176 | /m/01_5g,Chopsticks 177 | /m/06_72j,Digital clock 178 | /m/0ftb8,Sunflower 179 | /m/0c29q,Leopard 180 | /m/0jg57,Bell pepper 181 | /m/02l8p9,Harbor seal 182 | /m/078jl,Snake 183 | /m/0llzx,Sewing machine 184 | /m/0dbvp,Goose 185 | /m/09ct_,Helicopter 186 | /m/0dkzw,Seat belt 187 | /m/02p5f1q,Coffee cup 188 | /m/0fx9l,Microwave oven 189 | /m/01b9xk,Hot dog 190 | /m/0b3fp9,Countertop 191 | /m/0h8n27j,Serving tray 192 | /m/0h8n6f9,Dog bed 193 | /m/01599,Beer 194 | /m/017ftj,Sunglasses 195 | /m/044r5d,Golf ball 196 | /m/01dwsz,Waffle 197 | /m/0cdl1,Palm tree 198 | /m/07gql,Trumpet 199 | /m/0hdln,Ruler 200 | /m/0zvk5,Helmet 201 | /m/012w5l,Ladder 202 | /m/021sj1,Office building 203 | /m/0bh9flk,Tablet computer 204 | /m/09gtd,Toilet paper 205 | /m/0jwn_,Pomegranate 206 | /m/02wv6h6,Skirt 207 | /m/02wv84t,Gas stove 208 | /m/021mn,Cookie 209 | /m/018p4k,Cart 210 | /m/06j2d,Raven 211 | /m/033cnk,Egg 212 | /m/01j3zr,Burrito 213 | /m/03fwl,Goat 214 | /m/058qzx,Kitchen knife 215 | /m/06_fw,Skateboard 216 | /m/02x8cch,Salt and pepper shakers 217 | /m/04g2r,Lynx 218 | /m/01b638,Boot 219 | /m/099ssp,Platter 220 | /m/071p9,Ski 221 | /m/01gkx_,Swimwear 222 | /m/0b_rs,Swimming pool 223 | /m/03v5tg,Drinking straw 224 | /m/01j5ks,Wrench 225 | /m/026t6,Drum 226 | /m/0_k2,Ant 227 | /m/039xj_,Human ear 228 | /m/01b7fy,Headphones 229 | /m/0220r2,Fountain 230 | /m/015p6,Bird 231 | /m/0fly7,Jeans 232 | /m/07c52,Television 233 | /m/0n28_,Crab 234 | /m/0hg7b,Microphone 235 | /m/019dx1,Home appliance 236 | /m/04vv5k,Snowplow 237 | /m/020jm,Beetle 238 | /m/047v4b,Artichoke 239 | /m/01xs3r,Jet ski 240 | /m/03kt2w,Stationary bicycle 241 | /m/03q69,Human hair 242 | /m/01dxs,Brown bear 243 | /m/01h8tj,Starfish 244 | /m/0dt3t,Fork 245 | /m/0cjq5,Lobster 246 | /m/0h8lkj8,Corded phone 247 | /m/0271t,Drink 248 | /m/03q5c7,Saucer 249 | /m/0fj52s,Carrot 250 | /m/03vt0,Insect 251 | /m/01x3z,Clock 252 | /m/0d5gx,Castle 253 | /m/0h8my_4,Tennis racket 254 | /m/03ldnb,Ceiling fan 255 | /m/0cjs7,Asparagus 256 | /m/0449p,Jaguar 257 | /m/04szw,Musical instrument 258 | /m/07jdr,Train 259 | /m/01yrx,Cat 260 | /m/06c54,Rifle 261 | /m/04h8sr,Dumbbell 262 | /m/050k8,Mobile phone 263 | /m/0pg52,Taxi 264 | /m/02f9f_,Shower 265 | /m/054fyh,Pitcher 266 | /m/09k_b,Lemon 267 | /m/03xxp,Invertebrate 268 | /m/0jly1,Turkey 269 | /m/06k2mb,High heels 270 | /m/04yqq2,Bust 271 | /m/0bwd_0j,Elephant 272 | /m/02h19r,Scarf 273 | /m/02zn6n,Barrel 274 | /m/07c6l,Trombone 275 | /m/05zsy,Pumpkin 276 | /m/025dyy,Box 277 | /m/07j87,Tomato 278 | /m/09ld4,Frog 279 | /m/01vbnl,Bidet 280 | /m/0dzct,Human face 281 | /m/03fp41,Houseplant 282 | /m/0h2r6,Van 283 | /m/0by6g,Shark 284 | /m/0cxn2,Ice cream 285 | /m/04tn4x,Swim cap 
286 | /m/0f6wt,Falcon 287 | /m/05n4y,Ostrich 288 | /m/0gxl3,Handgun 289 | /m/02d9qx,Whiteboard 290 | /m/04m9y,Lizard 291 | /m/05z55,Pasta 292 | /m/01x3jk,Snowmobile 293 | /m/0h8l4fh,Light bulb 294 | /m/031b6r,Window blind 295 | /m/01tcjp,Muffin 296 | /m/01f91_,Pretzel 297 | /m/02522,Computer monitor 298 | /m/0319l,Horn 299 | /m/0c_jw,Furniture 300 | /m/0l515,Sandwich 301 | /m/0306r,Fox 302 | /m/0crjs,Convenience store 303 | /m/0ch_cf,Fish 304 | /m/02xwb,Fruit 305 | /m/01r546,Earrings 306 | /m/03rszm,Curtain 307 | /m/0388q,Grape 308 | /m/03m3pdh,Sofa bed 309 | /m/03k3r,Horse 310 | /m/0hf58v5,Luggage and bags 311 | /m/01y9k5,Desk 312 | /m/05441v,Crutch 313 | /m/03p3bw,Bicycle helmet 314 | /m/0175cv,Tick 315 | /m/0cmf2,Airplane 316 | /m/0ccs93,Canary 317 | /m/02d1br,Spatula 318 | /m/0gjkl,Watch 319 | /m/0jqgx,Lily 320 | /m/0h99cwc,Kitchen appliance 321 | /m/047j0r,Filing cabinet 322 | /m/0k5j,Aircraft 323 | /m/0h8n6ft,Cake stand 324 | /m/0gm28,Candy 325 | /m/0130jx,Sink 326 | /m/04rmv,Mouse 327 | /m/081qc,Wine 328 | /m/0qmmr,Wheelchair 329 | /m/03fj2,Goldfish 330 | /m/040b_t,Refrigerator 331 | /m/02y6n,French fries 332 | /m/0fqfqc,Drawer 333 | /m/030610,Treadmill 334 | /m/07kng9,Picnic basket 335 | /m/029b3,Dice 336 | /m/0fbw6,Cabbage 337 | /m/07qxg_,Football helmet 338 | /m/068zj,Pig 339 | /m/01g317,Person 340 | /m/01bfm9,Shorts 341 | /m/02068x,Gondola 342 | /m/0fz0h,Honeycomb 343 | /m/0jy4k,Doughnut 344 | /m/05kyg_,Chest of drawers 345 | /m/01prls,Land vehicle 346 | /m/01h44,Bat 347 | /m/08pbxl,Monkey 348 | /m/02gzp,Dagger 349 | /m/04brg2,Tableware 350 | /m/031n1,Human foot 351 | /m/02jvh9,Mug 352 | /m/046dlr,Alarm clock 353 | /m/0h8ntjv,Pressure cooker 354 | /m/0k65p,Human hand 355 | /m/011k07,Tortoise 356 | /m/03grzl,Baseball glove 357 | /m/06y5r,Sword 358 | /m/061_f,Pear 359 | /m/01cmb2,Miniskirt 360 | /m/01mqdt,Traffic sign 361 | /m/05r655,Girl 362 | /m/02p3w7d,Roller skates 363 | /m/029tx,Dinosaur 364 | /m/04m6gz,Porch 365 | /m/015h_t,Human beard 366 | /m/06pcq,Submarine sandwich 367 | /m/01bms0,Screwdriver 368 | /m/07fbm7,Strawberry 369 | /m/09tvcd,Wine glass 370 | /m/06nwz,Seafood 371 | /m/0dv9c,Racket 372 | /m/083wq,Wheel 373 | /m/0gd36,Sea lion 374 | /m/0138tl,Toy 375 | /m/07clx,Tea 376 | /m/05ctyq,Tennis ball 377 | /m/0bjyj5,Waste container 378 | /m/0dbzx,Mule 379 | /m/02ctlc,Cricket ball 380 | /m/0fp6w,Pineapple 381 | /m/0djtd,Coconut 382 | /m/0167gd,Doll 383 | /m/078n6m,Coffee table 384 | /m/0152hh,Snowman 385 | /m/04gth,Lavender 386 | /m/0ll1f78,Shrimp 387 | /m/0cffdh,Maple 388 | /m/025rp__,Cowboy hat 389 | /m/02_n6y,Goggles 390 | /m/0wdt60w,Rugby ball 391 | /m/0cydv,Caterpillar 392 | /m/01n5jq,Poster 393 | /m/09rvcxw,Rocket 394 | /m/013y1f,Organ 395 | /m/06ncr,Saxophone 396 | /m/015qff,Traffic light 397 | /m/024g6,Cocktail 398 | /m/05gqfk,Plastic bag 399 | /m/0dv77,Squash 400 | /m/052sf,Mushroom 401 | /m/0cdn1,Hamburger 402 | /m/03jbxj,Light switch 403 | /m/0cyfs,Parachute 404 | /m/0kmg4,Teddy bear 405 | /m/02cvgx,Winter melon 406 | /m/09kx5,Deer 407 | /m/057cc,Musical keyboard 408 | /m/02pkr5,Plumbing fixture 409 | /m/057p5t,Scoreboard 410 | /m/03g8mr,Baseball bat 411 | /m/0frqm,Envelope 412 | /m/03m3vtv,Adhesive tape 413 | /m/0584n8,Briefcase 414 | /m/014y4n,Paddle 415 | /m/01g3x7,Bow and arrow 416 | /m/07cx4,Telephone 417 | /m/07bgp,Sheep 418 | /m/032b3c,Jacket 419 | /m/01bl7v,Boy 420 | /m/0663v,Pizza 421 | /m/0cn6p,Otter 422 | /m/02rdsp,Office supplies 423 | /m/02crq1,Couch 424 | /m/01xqw,Cello 425 | /m/0cnyhnx,Bull 426 | /m/01x_v,Camel 427 | /m/018xm,Ball 428 | 
/m/09ddx,Duck 429 | /m/084zz,Whale 430 | /m/01n4qj,Shirt 431 | /m/07cmd,Tank 432 | /m/04_sv,Motorcycle 433 | /m/0mkg,Accordion 434 | /m/09d5_,Owl 435 | /m/0c568,Porcupine 436 | /m/02wbtzl,Sun hat 437 | /m/05bm6,Nail 438 | /m/01lsmm,Scissors 439 | /m/0dftk,Swan 440 | /m/0dtln,Lamp 441 | /m/0nl46,Crown 442 | /m/05r5c,Piano 443 | /m/06msq,Sculpture 444 | /m/0cd4d,Cheetah 445 | /m/05kms,Oboe 446 | /m/02jnhm,Tin can 447 | /m/0fldg,Mango 448 | /m/073bxn,Tripod 449 | /m/029bxz,Oven 450 | /m/020lf,Mouse 451 | /m/01btn,Barge 452 | /m/02vqfm,Coffee 453 | /m/06__v,Snowboard 454 | /m/043nyj,Common fig 455 | /m/0grw1,Salad 456 | /m/03hl4l9,Marine invertebrates 457 | /m/0hnnb,Umbrella 458 | /m/04c0y,Kangaroo 459 | /m/0dzf4,Human arm 460 | /m/07v9_z,Measuring cup 461 | /m/0f9_l,Snail 462 | /m/0703r8,Loveseat 463 | /m/01xyhv,Suit 464 | /m/01fh4r,Teapot 465 | /m/04dr76w,Bottle 466 | /m/0pcr,Alpaca 467 | /m/03s_tn,Kettle 468 | /m/07mhn,Trousers 469 | /m/01hrv5,Popcorn 470 | /m/019h78,Centipede 471 | /m/09kmb,Spider 472 | /m/0h23m,Sparrow 473 | /m/050gv4,Plate 474 | /m/01fb_0,Bagel 475 | /m/02w3_ws,Personal care 476 | /m/014j1m,Apple 477 | /m/01gmv2,Brassiere 478 | /m/04y4h8h,Bathroom cabinet 479 | /m/026qbn5,studio couch 480 | /m/01m2v,Computer keyboard 481 | /m/05_5p_0,Table tennis racket 482 | /m/07030,Sushi 483 | /m/01s105,Cabinetry 484 | /m/033rq4,Street light 485 | /m/0162_1,Towel 486 | /m/02z51p,Nightstand 487 | /m/06mf6,Rabbit 488 | /m/02hj4,Dolphin 489 | /m/0bt9lr,Dog 490 | /m/08hvt4,Jug 491 | /m/084rd,Wok 492 | /m/01pns0,Fire hydrant 493 | /m/014sv8,Human eye 494 | /m/079cl,Skyscraper 495 | /m/01940j,Backpack 496 | /m/05vtc,Potato 497 | /m/02w3r3,Paper towel 498 | /m/054xkw,Lifejacket 499 | /m/01bqk0,Bicycle wheel 500 | /m/09g1w,Toilet 501 | -------------------------------------------------------------------------------- /images/test/00000b4dcff7f799.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mukeshmithrakumar/RetinaNet/1459aee0c07693ce813a77f97c3e3889cb3c4826/images/test/00000b4dcff7f799.jpg -------------------------------------------------------------------------------- /images/test/0000d67245642c5f.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mukeshmithrakumar/RetinaNet/1459aee0c07693ce813a77f97c3e3889cb3c4826/images/test/0000d67245642c5f.jpg -------------------------------------------------------------------------------- /images/train/0000b86e2fd18333.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mukeshmithrakumar/RetinaNet/1459aee0c07693ce813a77f97c3e3889cb3c4826/images/train/0000b86e2fd18333.jpg -------------------------------------------------------------------------------- /images/train/0000b9115cdf1e54.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mukeshmithrakumar/RetinaNet/1459aee0c07693ce813a77f97c3e3889cb3c4826/images/train/0000b9115cdf1e54.jpg -------------------------------------------------------------------------------- /keras_retinanet/.gitignore: -------------------------------------------------------------------------------- 1 | # Byte-compiled / optimized / DLL files 2 | __pycache__/ 3 | *.py[cod] 4 | *$py.class 5 | 6 | # C extensions 7 | *.so 8 | 9 | # Distribution / packaging 10 | .Python 11 | build/ 12 | develop-eggs/ 13 | dist/ 14 | downloads/ 15 | eggs/ 16 | .eggs/ 17 | lib/ 18 
| lib64/ 19 | parts/ 20 | sdist/ 21 | var/ 22 | wheels/ 23 | *.egg-info/ 24 | .installed.cfg 25 | *.egg 26 | MANIFEST 27 | 28 | # PyInstaller 29 | # Usually these files are written by a python script from a template 30 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 31 | *.manifest 32 | *.spec 33 | 34 | # Installer logs 35 | pip-log.txt 36 | pip-delete-this-directory.txt 37 | 38 | # Unit test / coverage reports 39 | htmlcov/ 40 | .tox/ 41 | .coverage 42 | .coverage.* 43 | .cache 44 | nosetests.xml 45 | coverage.xml 46 | *.cover 47 | .hypothesis/ 48 | .pytest_cache/ 49 | 50 | # Translations 51 | *.mo 52 | *.pot 53 | 54 | # Django stuff: 55 | *.log 56 | local_settings.py 57 | db.sqlite3 58 | 59 | # Flask stuff: 60 | instance/ 61 | .webassets-cache 62 | 63 | # Scrapy stuff: 64 | .scrapy 65 | 66 | # Sphinx documentation 67 | docs/_build/ 68 | 69 | # PyBuilder 70 | target/ 71 | 72 | # Jupyter Notebook 73 | .ipynb_checkpoints 74 | 75 | # pyenv 76 | .python-version 77 | 78 | # celery beat schedule file 79 | celerybeat-schedule 80 | 81 | # SageMath parsed files 82 | *.sage.py 83 | 84 | # Environments 85 | .env 86 | .venv 87 | env/ 88 | venv/ 89 | ENV/ 90 | env.bak/ 91 | venv.bak/ 92 | 93 | # Spyder project settings 94 | .spyderproject 95 | .spyproject 96 | 97 | # Rope project settings 98 | .ropeproject 99 | 100 | # mkdocs documentation 101 | /site 102 | 103 | # mypy 104 | .mypy_cache/ 105 | -------------------------------------------------------------------------------- /keras_retinanet/callbacks/callbacks.py: -------------------------------------------------------------------------------- 1 | from __future__ import print_function 2 | 3 | from ..utils.anchors import compute_overlap 4 | import keras 5 | import os 6 | import cv2 7 | import numpy as np 8 | import warnings 9 | 10 | 11 | class RedirectModel(keras.callbacks.Callback): 12 | """ 13 | Callback which wraps another callback, but executed on a different model. 14 | 15 | ```python 16 | model = keras.models.load_model('model.h5') 17 | model_checkpoint = ModelCheckpoint(filepath='snapshot.h5') 18 | parallel_model = multi_gpu_model(model, gpus=2) 19 | parallel_model.fit(X_train, Y_train, callbacks=[RedirectModel(model_checkpoint, model)]) 20 | ``` 21 | 22 | Args 23 | callback : callback to wrap. 24 | model : model to use when executing callbacks. 25 | """ 26 | 27 | def __init__(self, 28 | callback, 29 | model): 30 | super(RedirectModel, self).__init__() 31 | 32 | self.callback = callback 33 | self.redirect_model = model 34 | 35 | def on_epoch_begin(self, epoch, logs=None): 36 | self.callback.on_epoch_begin(epoch, logs=logs) 37 | 38 | def on_epoch_end(self, epoch, logs=None): 39 | self.callback.on_epoch_end(epoch, logs=logs) 40 | 41 | def on_batch_begin(self, batch, logs=None): 42 | self.callback.on_batch_begin(batch, logs=logs) 43 | 44 | def on_batch_end(self, batch, logs=None): 45 | self.callback.on_batch_end(batch, logs=logs) 46 | 47 | def on_train_begin(self, logs=None): 48 | # overwrite the model with our custom model 49 | self.callback.set_model(self.redirect_model) 50 | 51 | self.callback.on_train_begin(logs=logs) 52 | 53 | def on_train_end(self, logs=None): 54 | self.callback.on_train_end(logs=logs) 55 | 56 | 57 | def label_color(label): 58 | """ Return a color from a set of predefined colors. Contains 80 colors in total. 59 | Args 60 | label: The label to get the color for. 61 | Returns 62 | A list of three values representing a RGB color. 
63 | If no color is defined for a certain label, the color green is returned and a warning is printed. 64 | """ 65 | if label < len(colors): 66 | return colors[label] 67 | else: 68 | warnings.warn('Label {} has no color, returning default.'.format(label)) 69 | return (0, 255, 0) 70 | 71 | 72 | """ 73 | Generated using: 74 | ``` 75 | colors = [list((matplotlib.colors.hsv_to_rgb([x, 1.0, 1.0]) * 255).astype(int)) for x in np.arange(0, 1, 1.0 / 80)] 76 | shuffle(colors) 77 | pprint(colors) 78 | ``` 79 | """ 80 | colors = [ 81 | [31, 0, 255], 82 | [0, 159, 255], 83 | [255, 95, 0], 84 | [255, 19, 0], 85 | [255, 0, 0], 86 | [255, 38, 0], 87 | [0, 255, 25], 88 | [255, 0, 133], 89 | [255, 172, 0], 90 | [108, 0, 255], 91 | [0, 82, 255], 92 | [0, 255, 6], 93 | [255, 0, 152], 94 | [223, 0, 255], 95 | [12, 0, 255], 96 | [0, 255, 178], 97 | [108, 255, 0], 98 | [184, 0, 255], 99 | [255, 0, 76], 100 | [146, 255, 0], 101 | [51, 0, 255], 102 | [0, 197, 255], 103 | [255, 248, 0], 104 | [255, 0, 19], 105 | [255, 0, 38], 106 | [89, 255, 0], 107 | [127, 255, 0], 108 | [255, 153, 0], 109 | [0, 255, 255], 110 | [0, 255, 216], 111 | [0, 255, 121], 112 | [255, 0, 248], 113 | [70, 0, 255], 114 | [0, 255, 159], 115 | [0, 216, 255], 116 | [0, 6, 255], 117 | [0, 63, 255], 118 | [31, 255, 0], 119 | [255, 57, 0], 120 | [255, 0, 210], 121 | [0, 255, 102], 122 | [242, 255, 0], 123 | [255, 191, 0], 124 | [0, 255, 63], 125 | [255, 0, 95], 126 | [146, 0, 255], 127 | [184, 255, 0], 128 | [255, 114, 0], 129 | [0, 255, 235], 130 | [255, 229, 0], 131 | [0, 178, 255], 132 | [255, 0, 114], 133 | [255, 0, 57], 134 | [0, 140, 255], 135 | [0, 121, 255], 136 | [12, 255, 0], 137 | [255, 210, 0], 138 | [0, 255, 44], 139 | [165, 255, 0], 140 | [0, 25, 255], 141 | [0, 255, 140], 142 | [0, 101, 255], 143 | [0, 255, 82], 144 | [223, 255, 0], 145 | [242, 0, 255], 146 | [89, 0, 255], 147 | [165, 0, 255], 148 | [70, 255, 0], 149 | [255, 0, 172], 150 | [255, 76, 0], 151 | [203, 255, 0], 152 | [204, 0, 255], 153 | [255, 0, 229], 154 | [255, 133, 0], 155 | [127, 0, 255], 156 | [0, 235, 255], 157 | [0, 255, 197], 158 | [255, 0, 191], 159 | [0, 44, 255], 160 | [50, 255, 0] 161 | ] 162 | 163 | 164 | def draw_box(image, box, color, thickness=2): 165 | """ Draws a box on an image with a given color. 166 | 167 | # Arguments 168 | image : The image to draw on. 169 | box : A list of 4 elements (x1, y1, x2, y2). 170 | color : The color of the box. 171 | thickness : The thickness of the lines to draw a box with. 172 | """ 173 | b = np.array(box).astype(int) 174 | cv2.rectangle(image, (b[0], b[1]), (b[2], b[3]), color, thickness, cv2.LINE_AA) 175 | 176 | 177 | def draw_caption(image, box, caption): 178 | """ Draws a caption above the box in an image. 179 | 180 | # Arguments 181 | image : The image to draw on. 182 | box : A list of 4 elements (x1, y1, x2, y2). 183 | caption : String containing the text to draw. 184 | """ 185 | b = np.array(box).astype(int) 186 | cv2.putText(image, caption, (b[0], b[1] - 10), cv2.FONT_HERSHEY_PLAIN, 1, (0, 0, 0), 2) 187 | cv2.putText(image, caption, (b[0], b[1] - 10), cv2.FONT_HERSHEY_PLAIN, 1, (255, 255, 255), 1) 188 | 189 | 190 | def draw_boxes(image, boxes, color, thickness=2): 191 | """ Draws boxes on an image with a given color. 192 | 193 | # Arguments 194 | image : The image to draw on. 195 | boxes : A [N, 4] matrix (x1, y1, x2, y2). 196 | color : The color of the boxes. 197 | thickness : The thickness of the lines to draw boxes with. 
198 | """ 199 | for b in boxes: 200 | draw_box(image, b, color, thickness=thickness) 201 | 202 | 203 | def draw_detections(image, boxes, scores, labels, color=None, label_to_name=None, score_threshold=0.5): 204 | """ Draws detections in an image. 205 | 206 | # Arguments 207 | image : The image to draw on. 208 | boxes : A [N, 4] matrix (x1, y1, x2, y2). 209 | scores : A list of N classification scores. 210 | labels : A list of N labels. 211 | color : The color of the boxes. 212 | By default the color from keras_retinanet.utils.colors.label_color will be used. 213 | label_to_name : (optional) Functor for mapping a label to a name. 214 | score_threshold : Threshold used for determining what detections to draw. 215 | """ 216 | selection = np.where(scores > score_threshold)[0] 217 | 218 | for i in selection: 219 | c = color if color is not None else label_color(labels[i]) 220 | draw_box(image, boxes[i, :], color=c) 221 | 222 | # draw labels 223 | caption = (label_to_name(labels[i]) if label_to_name else labels[i]) + ': {0:.2f}'.format(scores[i]) 224 | draw_caption(image, boxes[i, :], caption) 225 | 226 | 227 | def draw_annotations(image, annotations, color=(0, 255, 0), label_to_name=None): 228 | """ Draws annotations in an image. 229 | 230 | # Arguments 231 | image : The image to draw on. 232 | annotations : A [N, 5] matrix (x1, y1, x2, y2, label). 233 | color : The color of the boxes. 234 | By default the color from keras_retinanet.utils.colors.label_color will be used. 235 | label_to_name : (optional) Functor for mapping a label to a name. 236 | """ 237 | for a in annotations: 238 | label = a[4] 239 | c = color if color is not None else label_color(label) 240 | caption = '{}'.format(label_to_name(label) if label_to_name else label) 241 | draw_caption(image, a, caption) 242 | 243 | draw_box(image, a, color=c) 244 | 245 | 246 | def _compute_ap(recall, precision): 247 | """ Compute the average precision, given the recall and precision curves. 248 | 249 | Code originally from https://github.com/rbgirshick/py-faster-rcnn. 250 | 251 | # Arguments 252 | recall: The recall curve (list). 253 | precision: The precision curve (list). 254 | # Returns 255 | The average precision as computed in py-faster-rcnn. 256 | """ 257 | # correct AP calculation 258 | # first append sentinel values at the end 259 | mrec = np.concatenate(([0.], recall, [1.])) 260 | mpre = np.concatenate(([0.], precision, [0.])) 261 | 262 | # compute the precision envelope 263 | for i in range(mpre.size - 1, 0, -1): 264 | mpre[i - 1] = np.maximum(mpre[i - 1], mpre[i]) 265 | 266 | # to calculate area under PR curve, look for points 267 | # where X axis (recall) changes value 268 | i = np.where(mrec[1:] != mrec[:-1])[0] 269 | 270 | # and sum (\Delta recall) * prec 271 | ap = np.sum((mrec[i + 1] - mrec[i]) * mpre[i + 1]) 272 | return ap 273 | 274 | 275 | def _get_detections(generator, model, score_threshold=0.05, max_detections=100, save_path=None): 276 | """ Get the detections from the model using the generator. 277 | 278 | The result is a list of lists such that the size is: 279 | all_detections[num_images][num_classes] = detections[num_detections, 4 + num_classes] 280 | 281 | # Arguments 282 | generator : The generator used to run images through the model. 283 | model : The model to run on the images. 284 | score_threshold : The score confidence threshold to use. 285 | max_detections : The maximum number of detections to use per image. 286 | save_path : The path to save the images with visualized detections to. 
287 | # Returns 288 | A list of lists containing the detections for each image in the generator. 289 | """ 290 | all_detections = [[None for i in range(generator.num_classes())] for j in range(generator.size())] 291 | 292 | while True: 293 | for i in range(generator.size()): 294 | try: 295 | raw_image = generator.load_image(i) 296 | image = generator.preprocess_image(raw_image.copy()) 297 | except: 298 | break 299 | image, scale = generator.resize_image(image) 300 | 301 | # run network 302 | boxes, scores, labels = model.predict_on_batch(np.expand_dims(image, axis=0))[:3] 303 | 304 | # correct boxes for image scale 305 | boxes /= scale 306 | 307 | # select indices which have a score above the threshold 308 | indices = np.where(scores[0, :] > score_threshold)[0] 309 | 310 | # select those scores 311 | scores = scores[0][indices] 312 | 313 | # find the order with which to sort the scores 314 | scores_sort = np.argsort(-scores)[:max_detections] 315 | 316 | # select detections 317 | image_boxes = boxes[0, indices[scores_sort], :] 318 | image_scores = scores[scores_sort] 319 | image_labels = labels[0, indices[scores_sort]] 320 | image_detections = np.concatenate( 321 | [image_boxes, np.expand_dims(image_scores, axis=1), np.expand_dims(image_labels, axis=1)], axis=1) 322 | 323 | if save_path is not None: 324 | draw_annotations(raw_image, generator.load_annotations(i), label_to_name=generator.label_to_name) 325 | draw_detections(raw_image, image_boxes, image_scores, image_labels, label_to_name=generator.label_to_name) 326 | 327 | cv2.imwrite(os.path.join(save_path, '{}.png'.format(i)), raw_image) 328 | 329 | # copy detections to all_detections 330 | for label in range(generator.num_classes()): 331 | all_detections[i][label] = image_detections[image_detections[:, -1] == label, :-1] 332 | 333 | print('{}/{}'.format(i + 1, generator.size()), end='\r') 334 | 335 | return all_detections 336 | 337 | 338 | def _get_annotations(generator): 339 | """ Get the ground truth annotations from the generator. 340 | 341 | The result is a list of lists such that the size is: 342 | all_detections[num_images][num_classes] = annotations[num_detections, 5] 343 | 344 | # Arguments 345 | generator : The generator used to retrieve ground truth annotations. 346 | # Returns 347 | A list of lists containing the annotations for each image in the generator. 348 | """ 349 | all_annotations = [[None for i in range(generator.num_classes())] for j in range(generator.size())] 350 | 351 | for i in range(generator.size()): 352 | # load the annotations 353 | annotations = generator.load_annotations(i) 354 | 355 | # copy detections to all_annotations 356 | for label in range(generator.num_classes()): 357 | all_annotations[i][label] = annotations[annotations[:, 4] == label, :4].copy() 358 | 359 | print('{}/{}'.format(i + 1, generator.size()), end='\r') 360 | 361 | return all_annotations 362 | 363 | 364 | def evaluate( 365 | generator, 366 | model, 367 | iou_threshold=0.5, 368 | score_threshold=0.05, 369 | max_detections=100, 370 | save_path=None 371 | ): 372 | """ Evaluate a given dataset using a given model. 373 | 374 | # Arguments 375 | generator : The generator that represents the dataset to evaluate. 376 | model : The model to evaluate. 377 | iou_threshold : The threshold used to consider when a detection is positive or negative. 378 | score_threshold : The score confidence threshold to use for detections. 379 | max_detections : The maximum number of detections to use per image. 
380 | save_path : The path to save images with visualized detections to. 381 | # Returns 382 | A dict mapping class names to mAP scores. 383 | """ 384 | # gather all detections and annotations 385 | all_detections = _get_detections(generator, model, score_threshold=score_threshold, max_detections=max_detections, 386 | save_path=save_path) 387 | all_annotations = _get_annotations(generator) 388 | average_precisions = {} 389 | 390 | # process detections and annotations 391 | for label in range(generator.num_classes()): 392 | false_positives = np.zeros((0,)) 393 | true_positives = np.zeros((0,)) 394 | scores = np.zeros((0,)) 395 | num_annotations = 0.0 396 | 397 | for i in range(generator.size()): 398 | detections = all_detections[i][label] 399 | annotations = all_annotations[i][label] 400 | num_annotations += annotations.shape[0] 401 | detected_annotations = [] 402 | 403 | for d in detections: 404 | scores = np.append(scores, d[4]) 405 | 406 | if annotations.shape[0] == 0: 407 | false_positives = np.append(false_positives, 1) 408 | true_positives = np.append(true_positives, 0) 409 | continue 410 | 411 | overlaps = compute_overlap(np.expand_dims(d, axis=0), annotations) 412 | assigned_annotation = np.argmax(overlaps, axis=1) 413 | max_overlap = overlaps[0, assigned_annotation] 414 | 415 | if max_overlap >= iou_threshold and assigned_annotation not in detected_annotations: 416 | false_positives = np.append(false_positives, 0) 417 | true_positives = np.append(true_positives, 1) 418 | detected_annotations.append(assigned_annotation) 419 | else: 420 | false_positives = np.append(false_positives, 1) 421 | true_positives = np.append(true_positives, 0) 422 | 423 | # no annotations -> AP for this class is 0 (is this correct?) 424 | if num_annotations == 0: 425 | average_precisions[label] = 0, 0 426 | continue 427 | 428 | # sort by score 429 | indices = np.argsort(-scores) 430 | false_positives = false_positives[indices] 431 | true_positives = true_positives[indices] 432 | 433 | # compute false positives and true positives 434 | false_positives = np.cumsum(false_positives) 435 | true_positives = np.cumsum(true_positives) 436 | 437 | # compute recall and precision 438 | recall = true_positives / num_annotations 439 | precision = true_positives / np.maximum(true_positives + false_positives, np.finfo(np.float64).eps) 440 | 441 | # compute average precision 442 | average_precision = _compute_ap(recall, precision) 443 | average_precisions[label] = average_precision, num_annotations 444 | 445 | return average_precisions 446 | 447 | 448 | class Evaluate(keras.callbacks.Callback): 449 | """ Evaluation callback for arbitrary datasets. 450 | """ 451 | 452 | def __init__(self, generator, iou_threshold=0.5, score_threshold=0.05, max_detections=100, save_path=None, 453 | tensorboard=None, verbose=1): 454 | """ Evaluate a given dataset using a given model at the end of every epoch during training. 455 | 456 | # Arguments 457 | generator : The generator that represents the dataset to evaluate. 458 | iou_threshold : The threshold used to consider when a detection is positive or negative. 459 | score_threshold : The score confidence threshold to use for detections. 460 | max_detections : The maximum number of detections to use per image. 461 | save_path : The path to save images with visualized detections to. 462 | tensorboard : Instance of keras.callbacks.TensorBoard used to log the mAP value. 463 | verbose : Set the verbosity level, by default this is set to 1. 
464 | """ 465 | self.generator = generator 466 | self.iou_threshold = iou_threshold 467 | self.score_threshold = score_threshold 468 | self.max_detections = max_detections 469 | self.save_path = save_path 470 | self.tensorboard = tensorboard 471 | self.verbose = verbose 472 | 473 | super(Evaluate, self).__init__() 474 | 475 | def on_epoch_end(self, epoch, logs=None): 476 | logs = logs or {} 477 | 478 | # run evaluation 479 | average_precisions = evaluate( 480 | self.generator, 481 | self.model, 482 | iou_threshold=self.iou_threshold, 483 | score_threshold=self.score_threshold, 484 | max_detections=self.max_detections, 485 | save_path=self.save_path 486 | ) 487 | 488 | # compute per class average precision 489 | present_classes = 0 490 | precision = 0 491 | for label, (average_precision, num_annotations) in average_precisions.items(): 492 | if self.verbose == 1: 493 | print('{:.0f} instances of class'.format(num_annotations), 494 | self.generator.label_to_name(label), 'with average precision: {:.4f}'.format(average_precision)) 495 | if num_annotations > 0: 496 | present_classes += 1 497 | precision += average_precision 498 | self.mean_ap = precision / present_classes 499 | 500 | if self.tensorboard is not None and self.tensorboard.writer is not None: 501 | import tensorflow as tf 502 | summary = tf.Summary() 503 | summary_value = summary.value.add() 504 | summary_value.simple_value = self.mean_ap 505 | summary_value.tag = "mAP" 506 | self.tensorboard.writer.add_summary(summary, epoch) 507 | 508 | logs['mAP'] = self.mean_ap 509 | 510 | if self.verbose == 1: 511 | print('mAP: {:.4f}'.format(self.mean_ap)) 512 | -------------------------------------------------------------------------------- /keras_retinanet/models/classifier.py: -------------------------------------------------------------------------------- 1 | import pandas as pd 2 | import os 3 | from sklearn.multioutput import MultiOutputClassifier 4 | from sklearn.model_selection import train_test_split 5 | from sklearn.metrics import classification_report 6 | from sklearn.ensemble import RandomForestClassifier 7 | 8 | 9 | def vr_bb_classifier(main_dir): 10 | 11 | path = os.path.join(main_dir, 'challenge2018') 12 | train_file = "relationship_triplets_annotations.csv" 13 | 14 | train = pd.read_csv(path + train_file) 15 | 16 | train['box1length'] = train['XMax1'] - train['XMin1'] 17 | train['box2length'] = train['XMax2'] - train['XMin2'] 18 | train['box1height'] = train['YMax1'] - train['YMin1'] 19 | train['box2height'] = train['YMax2'] - train['YMin2'] 20 | 21 | train['box1area'] = train['box1length'] * train['box1height'] 22 | train['box2area'] = train['box2length'] * train['box2height'] 23 | 24 | train["xA"] = train[["XMin1", "XMin2"]].max(axis=1) 25 | train["yA"] = train[["YMin1", "YMin2"]].max(axis=1) 26 | train["xB"] = train[["XMax1", "XMax2"]].min(axis=1) 27 | train["yB"] = train[["YMax1", "YMax2"]].min(axis=1) 28 | 29 | train["intersectionarea"] = (train["xB"] - train["xA"]) * (train["yB"] - train["yA"]) 30 | train["unionarea"] = train["box1area"] + train["box2area"] - train["intersectionarea"] 31 | train["iou"] = (train["intersectionarea"] / train["unionarea"]) 32 | 33 | drop_columns = ["ImageID", "box1length", "box2length", "box1height", 34 | "box2height", "intersectionarea", "unionarea", "xA", "yA", 35 | "xB", "yB", "box1area", "box2area"] 36 | train = train.drop(columns=drop_columns) 37 | 38 | train = train[['LabelName1', 'LabelName2', 'XMin1', 'XMax1', 'YMin1', 'YMax1', 'XMin2', 39 | 'XMax2', 'YMin2', 'YMax2', 'iou', 
'RelationshipLabel']] 40 | 41 | train = pd.get_dummies(train, columns=["RelationshipLabel"]) 42 | 43 | COLUMN_NAMES = {"RelationshipLabel_at": "at", 44 | "RelationshipLabel_hits": "hits", 45 | "RelationshipLabel_holds": "holds", 46 | "RelationshipLabel_inside_of": "inside_of", 47 | "RelationshipLabel_interacts_with": "interacts_with", 48 | "RelationshipLabel_is": "is", 49 | "RelationshipLabel_on": "on", 50 | "RelationshipLabel_plays": "plays", 51 | "RelationshipLabel_under": "under", 52 | "RelationshipLabel_wears": "wears", 53 | } 54 | 55 | train = train.rename(columns=COLUMN_NAMES) 56 | 57 | X = train[['XMin1', 'XMax1', 'YMin1', 'YMax1', 'XMin2', 58 | 'XMax2', 'YMin2', 'YMax2', 'iou']] 59 | 60 | y = train[['at', 'hits', 'holds', 'inside_of', 'interacts_with', 61 | 'is', 'on', 'plays', 'under', 'wears']] 62 | 63 | print("Training VR Classifier") 64 | X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.2, random_state=25) 65 | 66 | forest = RandomForestClassifier(n_estimators=500, 67 | verbose=1) 68 | LogReg = MultiOutputClassifier(forest).fit(X_train, y_train) 69 | 70 | # y_pred = LogReg.predict(X_test) 71 | # print(classification_report(y_test, y_pred)) 72 | print("VR Classifier Training Complete") 73 | 74 | return LogReg 75 | 76 | -------------------------------------------------------------------------------- /keras_retinanet/models/model_backbone.py: -------------------------------------------------------------------------------- 1 | import keras.models 2 | 3 | 4 | class Backbone(object): 5 | """ This class stores additional information on backbones. 6 | """ 7 | 8 | def __init__(self, backbone): 9 | # a dictionary mapping custom layer names to the correct classes 10 | from ..utils import layers 11 | from ..utils import losses 12 | from ..utils import initializers 13 | self.custom_objects = { 14 | 'UpsampleLike': layers.UpsampleLike, 15 | 'PriorProbability': initializers.PriorProbability, 16 | 'RegressBoxes': layers.RegressBoxes, 17 | 'FilterDetections': layers.FilterDetections, 18 | 'Anchors': layers.Anchors, 19 | 'ClipBoxes': layers.ClipBoxes, 20 | '_smooth_l1': losses.smooth_l1(), 21 | '_focal': losses.focal(), 22 | } 23 | 24 | self.backbone = backbone 25 | self.validate() 26 | 27 | def retinanet(self, *args, **kwargs): 28 | """ Returns a retinanet model using the correct backbone. 29 | """ 30 | raise NotImplementedError('retinanet method not implemented.') 31 | 32 | def download_imagenet(self): 33 | """ Downloads ImageNet weights and returns path to weights file. 34 | """ 35 | raise NotImplementedError('download_imagenet method not implemented.') 36 | 37 | def validate(self): 38 | """ Checks whether the backbone string is correct. 39 | """ 40 | raise NotImplementedError('validate method not implemented.') 41 | 42 | def preprocess_image(self, inputs): 43 | """ Takes as input an image and prepares it for being passed through the network. 44 | Having this function in Backbone allows other backbones to define a specific preprocessing step. 45 | """ 46 | raise NotImplementedError('preprocess_image method not implemented.') 47 | 48 | 49 | def backbone(backbone_name): 50 | """ Returns a backbone object for the given backbone. 
51 | """ 52 | if 'resnet' in backbone_name: 53 | from ..models.resnet import ResNetBackbone as b 54 | else: 55 | raise NotImplementedError('Backbone class for \'{}\' not implemented.'.format(backbone)) 56 | 57 | return b(backbone_name) 58 | 59 | 60 | def load_model(filepath, backbone_name='resnet50', convert=False, nms=True, class_specific_filter=True): 61 | """ Loads a retinanet model using the correct custom objects. 62 | # Arguments 63 | filepath: one of the following: 64 | - string, path to the saved model, or 65 | - h5py.File object from which to load the model 66 | backbone_name : Backbone with which the model was trained. 67 | convert : Boolean, whether to convert the model to an inference model. 68 | nms : Boolean, whether to add NMS filtering to the converted model. 69 | Only valid if convert=True. 70 | class_specific_filter : Whether to use class specific filtering or filter for the best scoring class only. 71 | # Returns 72 | A keras.models.Model object. 73 | # Raises 74 | ImportError: if h5py is not available. 75 | ValueError: In case of an invalid savefile. 76 | """ 77 | 78 | model = keras.models.load_model(filepath, custom_objects=backbone(backbone_name).custom_objects) 79 | if convert: 80 | from ..models.retinanet import retinanet_bbox 81 | print("Starting to convert model...") 82 | model = retinanet_bbox(model=model, nms=nms, class_specific_filter=class_specific_filter) 83 | 84 | return model 85 | -------------------------------------------------------------------------------- /keras_retinanet/models/resnet.py: -------------------------------------------------------------------------------- 1 | import keras 2 | from keras.utils import get_file 3 | import keras_resnet 4 | import keras_resnet.models 5 | from ..models import retinanet 6 | 7 | from ..models.model_backbone import Backbone 8 | from ..preprocessing.image import preprocess_image 9 | 10 | 11 | class ResNetBackbone(Backbone): 12 | """ Describes backbone information and provides utility functions. 13 | """ 14 | 15 | def __init__(self, backbone): 16 | super(ResNetBackbone, self).__init__(backbone) 17 | self.custom_objects.update(keras_resnet.custom_objects) 18 | 19 | def retinanet(self, *args, **kwargs): 20 | """ Returns a retinanet model using the correct backbone. 21 | """ 22 | return resnet_retinanet(*args, backbone=self.backbone, **kwargs) 23 | 24 | def download_imagenet(self): 25 | """ Downloads ImageNet weights and returns path to weights file. 26 | """ 27 | resnet_filename = 'ResNet-{}-model.keras.h5' 28 | resnet_resource = 'https://github.com/fizyr/keras-models/releases/download/v0.0.1/{}'.format(resnet_filename) 29 | depth = int(self.backbone.replace('resnet', '')) 30 | 31 | filename = resnet_filename.format(depth) 32 | resource = resnet_resource.format(depth) 33 | if depth == 50: 34 | checksum = '3e9f4e4f77bbe2c9bec13b53ee1c2319' 35 | elif depth == 101: 36 | checksum = '05dc86924389e5b401a9ea0348a3213c' 37 | elif depth == 152: 38 | checksum = '6ee11ef2b135592f8031058820bb9e71' 39 | 40 | return get_file( 41 | filename, 42 | resource, 43 | cache_subdir='models', 44 | md5_hash=checksum 45 | ) 46 | 47 | def validate(self): 48 | """ Checks whether the backbone string is correct. 
49 | """ 50 | allowed_backbones = ['resnet50', 'resnet101', 'resnet152'] 51 | backbone = self.backbone.split('_')[0] 52 | 53 | if backbone not in allowed_backbones: 54 | raise ValueError('Backbone (\'{}\') not in allowed backbones ({}).'.format(backbone, allowed_backbones)) 55 | 56 | def preprocess_image(self, inputs): 57 | """ Takes as input an image and prepares it for being passed through the network. 58 | """ 59 | return preprocess_image(inputs) 60 | 61 | 62 | def resnet_retinanet(num_classes, backbone='resnet50', inputs=None, modifier=None, **kwargs): 63 | """ Constructs a retinanet model using a resnet backbone. 64 | 65 | Args 66 | num_classes: Number of classes to predict. 67 | backbone: Which backbone to use (one of ('resnet50', 'resnet101', 'resnet152')). 68 | inputs: The inputs to the network (defaults to a Tensor of shape (None, None, 3)). 69 | modifier: A function handler which can modify the backbone before using it in retinanet (this can be used to 70 | freeze backbone layers for example). 71 | 72 | Returns 73 | RetinaNet model with a ResNet backbone. 74 | """ 75 | # choose default input 76 | if inputs is None: 77 | inputs = keras.layers.Input(shape=(None, None, 3)) 78 | 79 | # create the resnet backbone 80 | if backbone == 'resnet50': 81 | resnet = keras_resnet.models.ResNet50(inputs, include_top=False, freeze_bn=True) 82 | elif backbone == 'resnet101': 83 | resnet = keras_resnet.models.ResNet101(inputs, include_top=False, freeze_bn=True) 84 | elif backbone == 'resnet152': 85 | resnet = keras_resnet.models.ResNet152(inputs, include_top=False, freeze_bn=True) 86 | else: 87 | raise ValueError('Backbone (\'{}\') is invalid.'.format(backbone)) 88 | 89 | # invoke modifier if given 90 | if modifier: 91 | resnet = modifier(resnet) 92 | 93 | # create the full model 94 | return retinanet.retinanet(inputs=inputs, num_classes=num_classes, backbone_layers=resnet.outputs[1:], **kwargs) 95 | 96 | 97 | def resnet50_retinanet(num_classes, inputs=None, **kwargs): 98 | return resnet_retinanet(num_classes=num_classes, backbone='resnet50', inputs=inputs, **kwargs) 99 | 100 | 101 | def resnet101_retinanet(num_classes, inputs=None, **kwargs): 102 | return resnet_retinanet(num_classes=num_classes, backbone='resnet101', inputs=inputs, **kwargs) 103 | 104 | 105 | def resnet152_retinanet(num_classes, inputs=None, **kwargs): 106 | return resnet_retinanet(num_classes=num_classes, backbone='resnet152', inputs=inputs, **kwargs) 107 | -------------------------------------------------------------------------------- /keras_retinanet/models/retinanet.py: -------------------------------------------------------------------------------- 1 | import keras 2 | from ..utils import initializers 3 | from ..utils import layers 4 | import numpy as np 5 | 6 | 7 | def default_classification_model( 8 | num_classes, 9 | num_anchors, 10 | pyramid_feature_size=256, 11 | prior_probability=0.01, 12 | classification_feature_size=256, 13 | name='classification_submodel' 14 | ): 15 | """ Creates the default regression submodel. 16 | 17 | Args 18 | num_classes : Number of classes to predict a score for at each feature level. 19 | num_anchors : Number of anchors to predict classification scores for at each feature level. 20 | pyramid_feature_size : The number of filters to expect from the feature pyramid levels. 21 | classification_feature_size : The number of filters to use in the layers in the classification submodel. 22 | name : The name of the submodel. 
23 | 24 | Returns 25 | A keras.models.Model that predicts classes for each anchor. 26 | """ 27 | options = { 28 | 'kernel_size': 3, 29 | 'strides': 1, 30 | 'padding': 'same', 31 | } 32 | 33 | inputs = keras.layers.Input(shape=(None, None, pyramid_feature_size)) 34 | outputs = inputs 35 | for i in range(4): 36 | outputs = keras.layers.Conv2D( 37 | filters=classification_feature_size, 38 | activation='relu', 39 | name='pyramid_classification_{}'.format(i), 40 | kernel_initializer=keras.initializers.normal(mean=0.0, stddev=0.01, seed=None), 41 | bias_initializer='zeros', 42 | **options 43 | )(outputs) 44 | 45 | outputs = keras.layers.Conv2D( 46 | filters=num_classes * num_anchors, 47 | kernel_initializer=keras.initializers.zeros(), 48 | bias_initializer=initializers.PriorProbability(probability=prior_probability), 49 | name='pyramid_classification', 50 | **options 51 | )(outputs) 52 | 53 | # reshape output and apply sigmoid 54 | outputs = keras.layers.Reshape((-1, num_classes), name='pyramid_classification_reshape')(outputs) 55 | outputs = keras.layers.Activation('sigmoid', name='pyramid_classification_sigmoid')(outputs) 56 | 57 | return keras.models.Model(inputs=inputs, outputs=outputs, name=name) 58 | 59 | 60 | def default_regression_model(num_anchors, pyramid_feature_size=256, regression_feature_size=256, 61 | name='regression_submodel'): 62 | """ Creates the default regression submodel. 63 | 64 | Args 65 | num_anchors : Number of anchors to regress for each feature level. 66 | pyramid_feature_size : The number of filters to expect from the feature pyramid levels. 67 | regression_feature_size : The number of filters to use in the layers in the regression submodel. 68 | name : The name of the submodel. 69 | 70 | Returns 71 | A keras.models.Model that predicts regression values for each anchor. 72 | """ 73 | # All new conv layers except the final one in the 74 | # RetinaNet (classification) subnets are initialized 75 | # with bias b = 0 and a Gaussian weight fill with stddev = 0.01. 76 | options = { 77 | 'kernel_size': 3, 78 | 'strides': 1, 79 | 'padding': 'same', 80 | 'kernel_initializer': keras.initializers.normal(mean=0.0, stddev=0.01, seed=None), 81 | 'bias_initializer': 'zeros' 82 | } 83 | 84 | inputs = keras.layers.Input(shape=(None, None, pyramid_feature_size)) 85 | outputs = inputs 86 | for i in range(4): 87 | outputs = keras.layers.Conv2D( 88 | filters=regression_feature_size, 89 | activation='relu', 90 | name='pyramid_regression_{}'.format(i), 91 | **options 92 | )(outputs) 93 | 94 | outputs = keras.layers.Conv2D(num_anchors * 4, name='pyramid_regression', **options)(outputs) 95 | outputs = keras.layers.Reshape((-1, 4), name='pyramid_regression_reshape')(outputs) 96 | 97 | return keras.models.Model(inputs=inputs, outputs=outputs, name=name) 98 | 99 | 100 | def __create_pyramid_features(C3, C4, C5, feature_size=256): 101 | """ Creates the FPN layers on top of the backbone features. 102 | 103 | Args 104 | C3 : Feature stage C3 from the backbone. 105 | C4 : Feature stage C4 from the backbone. 106 | C5 : Feature stage C5 from the backbone. 107 | feature_size : The feature size to use for the resulting feature levels. 108 | 109 | Returns 110 | A list of feature levels [P3, P4, P5, P6, P7]. 
111 | """ 112 | # upsample C5 to get P5 from the FPN paper 113 | P5 = keras.layers.Conv2D(feature_size, kernel_size=1, strides=1, padding='same', name='C5_reduced')(C5) 114 | P5_upsampled = layers.UpsampleLike(name='P5_upsampled')([P5, C4]) 115 | P5 = keras.layers.Conv2D(feature_size, kernel_size=3, strides=1, padding='same', name='P5')(P5) 116 | 117 | # add P5 elementwise to C4 118 | P4 = keras.layers.Conv2D(feature_size, kernel_size=1, strides=1, padding='same', name='C4_reduced')(C4) 119 | P4 = keras.layers.Add(name='P4_merged')([P5_upsampled, P4]) 120 | P4_upsampled = layers.UpsampleLike(name='P4_upsampled')([P4, C3]) 121 | P4 = keras.layers.Conv2D(feature_size, kernel_size=3, strides=1, padding='same', name='P4')(P4) 122 | 123 | # add P4 elementwise to C3 124 | P3 = keras.layers.Conv2D(feature_size, kernel_size=1, strides=1, padding='same', name='C3_reduced')(C3) 125 | P3 = keras.layers.Add(name='P3_merged')([P4_upsampled, P3]) 126 | P3 = keras.layers.Conv2D(feature_size, kernel_size=3, strides=1, padding='same', name='P3')(P3) 127 | 128 | # "P6 is obtained via a 3x3 stride-2 conv on C5" 129 | P6 = keras.layers.Conv2D(feature_size, kernel_size=3, strides=2, padding='same', name='P6')(C5) 130 | 131 | # "P7 is computed by applying ReLU followed by a 3x3 stride-2 conv on P6" 132 | P7 = keras.layers.Activation('relu', name='C6_relu')(P6) 133 | P7 = keras.layers.Conv2D(feature_size, kernel_size=3, strides=2, padding='same', name='P7')(P7) 134 | 135 | return [P3, P4, P5, P6, P7] 136 | 137 | 138 | class AnchorParameters: 139 | """ The parameteres that define how anchors are generated. 140 | 141 | Args 142 | sizes : List of sizes to use. Each size corresponds to one feature level. 143 | strides : List of strides to use. Each stride correspond to one feature level. 144 | ratios : List of ratios to use per location in a feature map. 145 | scales : List of scales to use per location in a feature map. 146 | """ 147 | 148 | def __init__(self, sizes, strides, ratios, scales): 149 | self.sizes = sizes 150 | self.strides = strides 151 | self.ratios = ratios 152 | self.scales = scales 153 | 154 | def num_anchors(self): 155 | return len(self.ratios) * len(self.scales) 156 | 157 | 158 | """ 159 | The default anchor parameters. 160 | """ 161 | AnchorParameters.default = AnchorParameters( 162 | sizes=[32, 64, 128, 256, 512], 163 | strides=[8, 16, 32, 64, 128], 164 | ratios=np.array([0.5, 1, 2], keras.backend.floatx()), 165 | scales=np.array([2 ** 0, 2 ** (1.0 / 3.0), 2 ** (2.0 / 3.0)], keras.backend.floatx()), 166 | ) 167 | 168 | 169 | def default_submodels(num_classes, num_anchors): 170 | """ Create a list of default submodels used for object detection. 171 | 172 | The default submodels contains a regression submodel and a classification submodel. 173 | 174 | Args 175 | num_classes : Number of classes to use. 176 | num_anchors : Number of base anchors. 177 | 178 | Returns 179 | A list of tuple, where the first element is the name of the submodel and the second element is the 180 | submodel itself. 181 | """ 182 | return [ 183 | ('regression', default_regression_model(num_anchors)), 184 | ('classification', default_classification_model(num_classes, num_anchors)) 185 | ] 186 | 187 | 188 | def __build_model_pyramid(name, model, features): 189 | """ Applies a single submodel to each FPN level. 190 | 191 | Args 192 | name : Name of the submodel. 193 | model : The submodel to evaluate. 194 | features : The FPN features. 
195 | 196 | Returns 197 | A tensor containing the response from the submodel on the FPN features. 198 | """ 199 | return keras.layers.Concatenate(axis=1, name=name)([model(f) for f in features]) 200 | 201 | 202 | def __build_pyramid(models, features): 203 | """ Applies all submodels to each FPN level. 204 | 205 | Args 206 | models : List of sumodels to run on each pyramid level (by default only regression, classifcation). 207 | features : The FPN features. 208 | 209 | Returns 210 | A list of tensors, one for each submodel. 211 | """ 212 | return [__build_model_pyramid(n, m, features) for n, m in models] 213 | 214 | 215 | def __build_anchors(anchor_parameters, features): 216 | """ Builds anchors for the shape of the features from FPN. 217 | 218 | Args 219 | anchor_parameters : Parameteres that determine how anchors are generated. 220 | features : The FPN features. 221 | 222 | Returns 223 | A tensor containing the anchors for the FPN features. 224 | 225 | The shape is: 226 | ``` 227 | (batch_size, num_anchors, 4) 228 | ``` 229 | """ 230 | anchors = [ 231 | layers.Anchors( 232 | size=anchor_parameters.sizes[i], 233 | stride=anchor_parameters.strides[i], 234 | ratios=anchor_parameters.ratios, 235 | scales=anchor_parameters.scales, 236 | name='anchors_{}'.format(i) 237 | )(f) for i, f in enumerate(features) 238 | ] 239 | 240 | return keras.layers.Concatenate(axis=1, name='anchors')(anchors) 241 | 242 | 243 | def retinanet( 244 | inputs, 245 | backbone_layers, 246 | num_classes, 247 | num_anchors=9, 248 | create_pyramid_features=__create_pyramid_features, 249 | submodels=None, 250 | name='retinanet' 251 | ): 252 | """ Construct a RetinaNet model on top of a backbone. 253 | 254 | This model is the minimum model necessary for training (with the unfortunate exception of anchors as output). 255 | 256 | Args 257 | inputs : keras.layers.Input (or list of) for the input to the model. 258 | num_classes : Number of classes to classify. 259 | num_anchors : Number of base anchors. 260 | create_pyramid_features : Functor for creating pyramid features given the features C3, C4, C5 from the backbone. 261 | submodels : Submodels to run on each feature map (default is regression and classification 262 | submodels). 263 | name : Name of the model. 264 | 265 | Returns 266 | A keras.models.Model which takes an image as input and outputs generated anchors and the result from each 267 | submodel on every pyramid level. 268 | 269 | The order of the outputs is as defined in submodels: 270 | ``` 271 | [ 272 | regression, classification, other[0], other[1], ... 273 | ] 274 | ``` 275 | """ 276 | if submodels is None: 277 | submodels = default_submodels(num_classes, num_anchors) 278 | 279 | C3, C4, C5 = backbone_layers 280 | 281 | # compute pyramid features as per https://arxiv.org/abs/1708.02002 282 | features = create_pyramid_features(C3, C4, C5) 283 | 284 | # for all pyramid levels, run available submodels 285 | pyramids = __build_pyramid(submodels, features) 286 | 287 | return keras.models.Model(inputs=inputs, outputs=pyramids, name=name) 288 | 289 | 290 | def retinanet_bbox( 291 | model=None, 292 | anchor_parameters=AnchorParameters.default, 293 | nms=True, 294 | class_specific_filter=True, 295 | name='retinanet-bbox', 296 | **kwargs 297 | ): 298 | """ Construct a RetinaNet model on top of a backbone and adds convenience functions to output boxes directly. 299 | 300 | This model uses the minimum retinanet model and appends a few layers to compute boxes within the graph. 
301 | These layers include applying the regression values to the anchors and performing NMS. 302 | 303 | Args 304 | model : RetinaNet model to append bbox layers to. If None, it will create a RetinaNet model 305 | using **kwargs. 306 | anchor_parameters : Struct containing configuration for anchor generation (sizes, strides, ratios, scales). 307 | nms : Whether to use non-maximum suppression for the filtering step. 308 | class_specific_filter : Whether to use class specific filtering or filter for the best scoring class only. 309 | name : Name of the model. 310 | *kwargs : Additional kwargs to pass to the minimal retinanet model. 311 | 312 | Returns 313 | A keras.models.Model which takes an image as input and outputs the detections on the image. 314 | 315 | The order is defined as follows: 316 | ``` 317 | [ 318 | boxes, scores, labels, other[0], other[1], ... 319 | ] 320 | ``` 321 | """ 322 | if model is None: 323 | model = retinanet(num_anchors=anchor_parameters.num_anchors(), **kwargs) 324 | 325 | # compute the anchors 326 | features = [model.get_layer(p_name).output for p_name in ['P3', 'P4', 'P5', 'P6', 'P7']] 327 | anchors = __build_anchors(anchor_parameters, features) 328 | 329 | # we expect the anchors, regression and classification values as first output 330 | regression = model.outputs[0] 331 | classification = model.outputs[1] 332 | 333 | # "other" can be any additional output from custom submodels, by default this will be [] 334 | other = model.outputs[2:] 335 | 336 | # apply predicted regression to anchors 337 | boxes = layers.RegressBoxes(name='boxes')([anchors, regression]) 338 | boxes = layers.ClipBoxes(name='clipped_boxes')([model.inputs[0], boxes]) 339 | 340 | # filter detections (apply NMS / score threshold / select top-k) 341 | detections = layers.FilterDetections( 342 | nms=nms, 343 | class_specific_filter=class_specific_filter, 344 | name='filtered_detections' 345 | )([boxes, classification] + other) 346 | 347 | outputs = detections 348 | 349 | # construct the model 350 | return keras.models.Model(inputs=model.inputs, outputs=outputs, name=name) 351 | -------------------------------------------------------------------------------- /keras_retinanet/preprocessing/generator.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import random 3 | import threading 4 | import warnings 5 | import keras 6 | 7 | from ..utils.anchors import anchor_targets_bbox 8 | from ..utils.anchors import anchors_for_shape 9 | from ..utils.anchors import guess_shapes 10 | from ..preprocessing.image import TransformParameters 11 | from ..preprocessing.image import adjust_transform_for_image 12 | from ..preprocessing.image import apply_transform 13 | from ..preprocessing.image import preprocess_image 14 | from ..preprocessing.image import resize_image 15 | from ..preprocessing.image import transform_aabb 16 | 17 | 18 | class Generator(object): 19 | """ Abstract generator class. 20 | """ 21 | 22 | def __init__( 23 | self, 24 | transform_generator=None, 25 | batch_size=1, 26 | group_method='ratio', # one of 'none', 'random', 'ratio' 27 | shuffle_groups=True, 28 | image_min_side=800, 29 | image_max_side=1333, 30 | transform_parameters=None, 31 | compute_anchor_targets=anchor_targets_bbox, 32 | compute_shapes=guess_shapes, 33 | preprocess_image=preprocess_image, 34 | ): 35 | """ Initialize Generator object. 36 | 37 | Args 38 | transform_generator : A generator used to randomly transform images and annotations. 
39 | batch_size : The size of the batches to generate. 40 | group_method : Determines how images are grouped together (defaults to 'ratio', one of 41 | ('none', 'random', 'ratio')). 42 | shuffle_groups : If True, shuffles the groups each epoch. 43 | image_min_side : After resizing the minimum side of an image is equal to image_min_side. 44 | image_max_side : If after resizing the maximum side is larger than image_max_side, scales down 45 | further so that the max side is equal to image_max_side. 46 | transform_parameters : The transform parameters used for data augmentation. 47 | compute_anchor_targets : Function handler for computing the targets of anchors for an image and its 48 | annotations. 49 | compute_shapes : Function handler for computing the shapes of the pyramid for a given input. 50 | preprocess_image : Function handler for preprocessing an image (scaling / normalizing) for passing 51 | through a network. 52 | """ 53 | self.transform_generator = transform_generator 54 | self.batch_size = int(batch_size) 55 | self.group_method = group_method 56 | self.shuffle_groups = shuffle_groups 57 | self.image_min_side = image_min_side 58 | self.image_max_side = image_max_side 59 | self.transform_parameters = transform_parameters or TransformParameters() 60 | self.compute_anchor_targets = compute_anchor_targets 61 | self.compute_shapes = compute_shapes 62 | self.preprocess_image = preprocess_image 63 | 64 | self.group_index = 0 65 | self.lock = threading.Lock() 66 | 67 | self.group_images() 68 | 69 | def size(self): 70 | """ Size of the dataset. 71 | """ 72 | raise NotImplementedError('size method not implemented') 73 | 74 | def num_classes(self): 75 | """ Number of classes in the dataset. 76 | """ 77 | raise NotImplementedError('num_classes method not implemented') 78 | 79 | def name_to_label(self, name): 80 | """ Map name to label. 81 | """ 82 | raise NotImplementedError('name_to_label method not implemented') 83 | 84 | def label_to_name(self, label): 85 | """ Map label to name. 86 | """ 87 | raise NotImplementedError('label_to_name method not implemented') 88 | 89 | def image_aspect_ratio(self, image_index): 90 | """ Compute the aspect ratio for an image with image_index. 91 | """ 92 | raise NotImplementedError('image_aspect_ratio method not implemented') 93 | 94 | def load_image(self, image_index): 95 | """ Load an image at the image_index. 96 | """ 97 | raise NotImplementedError('load_image method not implemented') 98 | 99 | def load_annotations(self, image_index): 100 | """ Load annotations for an image_index. 101 | """ 102 | raise NotImplementedError('load_annotations method not implemented') 103 | 104 | def load_annotations_group(self, group): 105 | """ Load annotations for all images in group. 106 | """ 107 | return [self.load_annotations(image_index) for image_index in group] 108 | 109 | def filter_annotations(self, image_group, annotations_group, group): 110 | """ Filter annotations by removing those that are outside of the image bounds or whose width/height < 0. 
111 | """ 112 | # test all annotations 113 | for index, (image, annotations) in enumerate(zip(image_group, annotations_group)): 114 | assert (isinstance(annotations, 115 | np.ndarray)), \ 116 | '\'load_annotations\' should return a list of numpy arrays, received: {}'.format(type(annotations)) 117 | 118 | # test x2 < x1 | y2 < y1 | x1 < 0 | y1 < 0 | x2 <= 0 | y2 <= 0 | x2 >= image.shape[1] | y2 >= image.shape[0] 119 | invalid_indices = np.where( 120 | (annotations[:, 2] <= annotations[:, 0]) | 121 | (annotations[:, 3] <= annotations[:, 1]) | 122 | (annotations[:, 0] < 0) | 123 | (annotations[:, 1] < 0) | 124 | (annotations[:, 2] > image.shape[1]) | 125 | (annotations[:, 3] > image.shape[0]) 126 | )[0] 127 | 128 | # delete invalid indices 129 | if len(invalid_indices): 130 | warnings.warn('Image with id {} (shape {}) contains the following invalid boxes: {}.'.format( 131 | group[index], 132 | image.shape, 133 | [annotations[invalid_index, :] for invalid_index in invalid_indices] 134 | )) 135 | annotations_group[index] = np.delete(annotations, invalid_indices, axis=0) 136 | 137 | return image_group, annotations_group 138 | 139 | def load_image_group(self, group): 140 | """ Load images for all images in a group. 141 | """ 142 | return [self.load_image(image_index) for image_index in group] 143 | 144 | def random_transform_group_entry(self, image, annotations): 145 | """ Randomly transforms image and annotation. 146 | """ 147 | # randomly transform both image and annotations 148 | if self.transform_generator: 149 | transform = adjust_transform_for_image(next(self.transform_generator), image, 150 | self.transform_parameters.relative_translation) 151 | image = apply_transform(transform, image, self.transform_parameters) 152 | 153 | # Transform the bounding boxes in the annotations. 154 | annotations = annotations.copy() 155 | for index in range(annotations.shape[0]): 156 | annotations[index, :4] = transform_aabb(transform, annotations[index, :4]) 157 | 158 | return image, annotations 159 | 160 | def resize_image(self, image): 161 | """ Resize an image using image_min_side and image_max_side. 162 | """ 163 | return resize_image(image, min_side=self.image_min_side, max_side=self.image_max_side) 164 | 165 | def preprocess_group_entry(self, image, annotations): 166 | """ Preprocess image and its annotations. 167 | """ 168 | # preprocess the image 169 | image = self.preprocess_image(image) 170 | 171 | # # randomly transform image and annotations 172 | # image, annotations = self.random_transform_group_entry(image, annotations) 173 | 174 | # resize image 175 | image, image_scale = self.resize_image(image) 176 | 177 | # apply resizing to annotations too 178 | annotations[:, :4] *= image_scale 179 | 180 | return image, annotations 181 | 182 | def preprocess_group(self, image_group, annotations_group): 183 | """ Preprocess each image and its annotations in its group. 184 | """ 185 | for index, (image, annotations) in enumerate(zip(image_group, annotations_group)): 186 | # preprocess a single group entry 187 | image, annotations = self.preprocess_group_entry(image, annotations) 188 | 189 | # copy processed data back to group 190 | image_group[index] = image 191 | annotations_group[index] = annotations 192 | 193 | return image_group, annotations_group 194 | 195 | def group_images(self): 196 | """ Order the images according to self.order and makes groups of self.batch_size. 
197 | """ 198 | # determine the order of the images 199 | order = list(range(self.size())) 200 | if self.group_method == 'random': 201 | random.shuffle(order) 202 | elif self.group_method == 'ratio': 203 | order.sort(key=lambda x: self.image_aspect_ratio(x)) 204 | 205 | # divide into groups, one group = one batch 206 | self.groups = [[order[x % len(order)] for x in range(i, i + self.batch_size)] for i in 207 | range(0, len(order), self.batch_size)] 208 | 209 | def compute_inputs(self, image_group): 210 | """ Compute inputs for the network using an image_group. 211 | """ 212 | # get the max image shape 213 | max_shape = tuple(max(image.shape[x] for image in image_group) for x in range(3)) 214 | 215 | # construct an image batch object 216 | image_batch = np.zeros((self.batch_size,) + max_shape, dtype=keras.backend.floatx()) 217 | 218 | # copy all images to the upper left part of the image batch object 219 | for image_index, image in enumerate(image_group): 220 | image_batch[image_index, :image.shape[0], :image.shape[1], :image.shape[2]] = image 221 | 222 | return image_batch 223 | 224 | def generate_anchors(self, image_shape): 225 | return anchors_for_shape(image_shape, shapes_callback=self.compute_shapes) 226 | 227 | def compute_targets(self, image_group, annotations_group): 228 | """ Compute target outputs for the network using images and their annotations. 229 | """ 230 | # get the max image shape 231 | max_shape = tuple(max(image.shape[x] for image in image_group) for x in range(3)) 232 | anchors = self.generate_anchors(max_shape) 233 | 234 | labels_batch, regression_batch, _ = self.compute_anchor_targets( 235 | anchors, 236 | image_group, 237 | annotations_group, 238 | self.num_classes() 239 | ) 240 | 241 | return [regression_batch, labels_batch] 242 | 243 | def compute_input_output(self, group): 244 | """ Compute inputs and target outputs for the network. 245 | """ 246 | # load images and annotations 247 | image_group = self.load_image_group(group) 248 | annotations_group = self.load_annotations_group(group) 249 | 250 | # check validity of annotations 251 | image_group, annotations_group = self.filter_annotations(image_group, annotations_group, group) 252 | 253 | # perform preprocessing steps 254 | image_group, annotations_group = self.preprocess_group(image_group, annotations_group) 255 | 256 | # compute network inputs 257 | inputs = self.compute_inputs(image_group) 258 | 259 | # compute network targets 260 | targets = self.compute_targets(image_group, annotations_group) 261 | 262 | return inputs, targets 263 | 264 | def __next__(self): 265 | return self.next() 266 | 267 | def next(self): 268 | # advance the group index 269 | while True: 270 | with self.lock: 271 | if self.group_index == 0 and self.shuffle_groups: 272 | # shuffle groups at start of epoch 273 | random.shuffle(self.groups) 274 | group = self.groups[self.group_index] 275 | self.group_index = (self.group_index + 1) % len(self.groups) 276 | try: 277 | computed = self.compute_input_output(group) 278 | except: 279 | break 280 | return computed 281 | -------------------------------------------------------------------------------- /keras_retinanet/preprocessing/image.py: -------------------------------------------------------------------------------- 1 | from __future__ import division 2 | 3 | import keras 4 | import numpy as np 5 | import cv2 6 | from PIL import Image 7 | 8 | DEFAULT_PRNG = np.random 9 | 10 | 11 | def colvec(*args): 12 | """ Create a numpy array representing a column vector. 
""" 13 | return np.array([args]).T 14 | 15 | 16 | def transform_aabb(transform, aabb): 17 | """ Apply a transformation to an axis aligned bounding box. 18 | 19 | The result is a new AABB in the same coordinate system as the original AABB. 20 | The new AABB contains all corner points of the original AABB after applying the given transformation. 21 | 22 | Args 23 | transform: The transformation to apply. 24 | x1: The minimum x value of the AABB. 25 | y1: The minimum y value of the AABB. 26 | x2: The maximum x value of the AABB. 27 | y2: The maximum y value of the AABB. 28 | Returns 29 | The new AABB as tuple (x1, y1, x2, y2) 30 | """ 31 | x1, y1, x2, y2 = aabb 32 | # Transform all 4 corners of the AABB. 33 | points = transform.dot([ 34 | [x1, x2, x1, x2], 35 | [y1, y2, y2, y1], 36 | [1, 1, 1, 1], 37 | ]) 38 | 39 | # Extract the min and max corners again. 40 | min_corner = points.min(axis=1) 41 | max_corner = points.max(axis=1) 42 | 43 | return [min_corner[0], min_corner[1], max_corner[0], max_corner[1]] 44 | 45 | 46 | def _random_vector(min, max, prng=DEFAULT_PRNG): 47 | """ Construct a random vector between min and max. 48 | Args 49 | min: the minimum value for each component 50 | max: the maximum value for each component 51 | """ 52 | min = np.array(min) 53 | max = np.array(max) 54 | assert min.shape == max.shape 55 | assert len(min.shape) == 1 56 | return prng.uniform(min, max) 57 | 58 | 59 | def rotation(angle): 60 | """ Construct a homogeneous 2D rotation matrix. 61 | Args 62 | angle: the angle in radians 63 | Returns 64 | the rotation matrix as 3 by 3 numpy array 65 | """ 66 | return np.array([ 67 | [np.cos(angle), -np.sin(angle), 0], 68 | [np.sin(angle), np.cos(angle), 0], 69 | [0, 0, 1] 70 | ]) 71 | 72 | 73 | def random_rotation(min, max, prng=DEFAULT_PRNG): 74 | """ Construct a random rotation between -max and max. 75 | Args 76 | min: a scalar for the minimum absolute angle in radians 77 | max: a scalar for the maximum absolute angle in radians 78 | prng: the pseudo-random number generator to use. 79 | Returns 80 | a homogeneous 3 by 3 rotation matrix 81 | """ 82 | return rotation(prng.uniform(min, max)) 83 | 84 | 85 | def translation(translation): 86 | """ Construct a homogeneous 2D translation matrix. 87 | # Arguments 88 | translation: the translation 2D vector 89 | # Returns 90 | the translation matrix as 3 by 3 numpy array 91 | """ 92 | return np.array([ 93 | [1, 0, translation[0]], 94 | [0, 1, translation[1]], 95 | [0, 0, 1] 96 | ]) 97 | 98 | 99 | def random_translation(min, max, prng=DEFAULT_PRNG): 100 | """ Construct a random 2D translation between min and max. 101 | Args 102 | min: a 2D vector with the minimum translation for each dimension 103 | max: a 2D vector with the maximum translation for each dimension 104 | prng: the pseudo-random number generator to use. 105 | Returns 106 | a homogeneous 3 by 3 translation matrix 107 | """ 108 | return translation(_random_vector(min, max, prng)) 109 | 110 | 111 | def shear(angle): 112 | """ Construct a homogeneous 2D shear matrix. 113 | Args 114 | angle: the shear angle in radians 115 | Returns 116 | the shear matrix as 3 by 3 numpy array 117 | """ 118 | return np.array([ 119 | [1, -np.sin(angle), 0], 120 | [0, np.cos(angle), 0], 121 | [0, 0, 1] 122 | ]) 123 | 124 | 125 | def random_shear(min, max, prng=DEFAULT_PRNG): 126 | """ Construct a random 2D shear matrix with shear angle between -max and max. 127 | Args 128 | min: the minimum shear angle in radians. 129 | max: the maximum shear angle in radians. 
130 | prng: the pseudo-random number generator to use. 131 | Returns 132 | a homogeneous 3 by 3 shear matrix 133 | """ 134 | return shear(prng.uniform(min, max)) 135 | 136 | 137 | def scaling(factor): 138 | """ Construct a homogeneous 2D scaling matrix. 139 | Args 140 | factor: a 2D vector for X and Y scaling 141 | Returns 142 | the zoom matrix as 3 by 3 numpy array 143 | """ 144 | return np.array([ 145 | [factor[0], 0, 0], 146 | [0, factor[1], 0], 147 | [0, 0, 1] 148 | ]) 149 | 150 | 151 | def random_scaling(min, max, prng=DEFAULT_PRNG): 152 | """ Construct a random 2D scale matrix between -max and max. 153 | Args 154 | min: a 2D vector containing the minimum scaling factor for X and Y. 155 | min: a 2D vector containing The maximum scaling factor for X and Y. 156 | prng: the pseudo-random number generator to use. 157 | Returns 158 | a homogeneous 3 by 3 scaling matrix 159 | """ 160 | return scaling(_random_vector(min, max, prng)) 161 | 162 | 163 | def random_flip(flip_x_chance, flip_y_chance, prng=DEFAULT_PRNG): 164 | """ Construct a transformation randomly containing X/Y flips (or not). 165 | Args 166 | flip_x_chance: The chance that the result will contain a flip along the X axis. 167 | flip_y_chance: The chance that the result will contain a flip along the Y axis. 168 | prng: The pseudo-random number generator to use. 169 | Returns 170 | a homogeneous 3 by 3 transformation matrix 171 | """ 172 | flip_x = prng.uniform(0, 1) < flip_x_chance 173 | flip_y = prng.uniform(0, 1) < flip_y_chance 174 | # 1 - 2 * bool gives 1 for False and -1 for True. 175 | return scaling((1 - 2 * flip_x, 1 - 2 * flip_y)) 176 | 177 | 178 | def change_transform_origin(transform, center): 179 | """ Create a new transform representing the same transformation, 180 | only with the origin of the linear part changed. 181 | Args 182 | transform: the transformation matrix 183 | center: the new origin of the transformation 184 | Returns 185 | translate(center) * transform * translate(-center) 186 | """ 187 | center = np.array(center) 188 | return np.linalg.multi_dot([translation(center), transform, translation(-center)]) 189 | 190 | 191 | def random_transform( 192 | min_rotation=0, 193 | max_rotation=0, 194 | min_translation=(0, 0), 195 | max_translation=(0, 0), 196 | min_shear=0, 197 | max_shear=0, 198 | min_scaling=(1, 1), 199 | max_scaling=(1, 1), 200 | flip_x_chance=0, 201 | flip_y_chance=0, 202 | prng=DEFAULT_PRNG 203 | ): 204 | """ Create a random transformation. 205 | 206 | The transformation consists of the following operations in this order (from left to right): 207 | * rotation 208 | * translation 209 | * shear 210 | * scaling 211 | * flip x (if applied) 212 | * flip y (if applied) 213 | 214 | Note that by default, the data generators in `keras_retinanet.preprocessing.generators` interpret the translation 215 | as factor of the image size. So an X translation of 0.1 would translate the image by 10% of it's width. 216 | Set `relative_translation` to `False` in the `TransformParameters` of a data generator to have it interpret 217 | the translation directly as pixel distances instead. 218 | 219 | Args 220 | min_rotation: The minimum rotation in radians for the transform as scalar. 221 | max_rotation: The maximum rotation in radians for the transform as scalar. 222 | min_translation: The minimum translation for the transform as 2D column vector. 223 | max_translation: The maximum translation for the transform as 2D column vector. 224 | min_shear: The minimum shear angle for the transform in radians. 
225 | max_shear: The maximum shear angle for the transform in radians. 226 | min_scaling: The minimum scaling for the transform as 2D column vector. 227 | max_scaling: The maximum scaling for the transform as 2D column vector. 228 | flip_x_chance: The chance (0 to 1) that a transform will contain a flip along X direction. 229 | flip_y_chance: The chance (0 to 1) that a transform will contain a flip along Y direction. 230 | prng: The pseudo-random number generator to use. 231 | """ 232 | return np.linalg.multi_dot([ 233 | random_rotation(min_rotation, max_rotation, prng), 234 | random_translation(min_translation, max_translation, prng), 235 | random_shear(min_shear, max_shear, prng), 236 | random_scaling(min_scaling, max_scaling, prng), 237 | random_flip(flip_x_chance, flip_y_chance, prng) 238 | ]) 239 | 240 | 241 | def random_transform_generator(prng=None, **kwargs): 242 | """ Create a random transform generator. 243 | 244 | Uses a dedicated, newly created, properly seeded PRNG by default instead of the global DEFAULT_PRNG. 245 | 246 | The transformation consists of the following operations in this order (from left to right): 247 | * rotation 248 | * translation 249 | * shear 250 | * scaling 251 | * flip x (if applied) 252 | * flip y (if applied) 253 | 254 | Note that by default, the data generators in `keras_retinanet.preprocessing.generators` interpret the translation 255 | as factor of the image size. So an X translation of 0.1 would translate the image by 10% of it's width. 256 | Set `relative_translation` to `False` in the `TransformParameters` of a data generator to have it interpret 257 | the translation directly as pixel distances instead. 258 | 259 | Args 260 | min_rotation: The minimum rotation in radians for the transform as scalar. 261 | max_rotation: The maximum rotation in radians for the transform as scalar. 262 | min_translation: The minimum translation for the transform as 2D column vector. 263 | max_translation: The maximum translation for the transform as 2D column vector. 264 | min_shear: The minimum shear angle for the transform in radians. 265 | max_shear: The maximum shear angle for the transform in radians. 266 | min_scaling: The minimum scaling for the transform as 2D column vector. 267 | max_scaling: The maximum scaling for the transform as 2D column vector. 268 | flip_x_chance: The chance (0 to 1) that a transform will contain a flip along X direction. 269 | flip_y_chance: The chance (0 to 1) that a transform will contain a flip along Y direction. 270 | prng: The pseudo-random number generator to use. 271 | """ 272 | 273 | if prng is None: 274 | # RandomState automatically seeds using the best available method. 275 | prng = np.random.RandomState() 276 | 277 | while True: 278 | yield random_transform(prng=prng, **kwargs) 279 | 280 | 281 | def read_image_bgr(path): 282 | """ Read an image in BGR format. 283 | 284 | Args 285 | path: Path to the image. 286 | """ 287 | image = np.asarray(Image.open(path).convert('RGB')) 288 | return image[:, :, ::-1].copy() 289 | 290 | 291 | def preprocess_image(x): 292 | """ Preprocess an image by subtracting the ImageNet mean. 293 | 294 | Args 295 | x: np.array of shape (None, None, 3) or (3, None, None). 296 | mode: One of "caffe" or "tf". 297 | - caffe: will zero-center each color channel with 298 | respect to the ImageNet dataset, without scaling. 299 | - tf: will scale pixels between -1 and 1, sample-wise. 300 | 301 | Returns 302 | The input with the ImageNet mean subtracted. 
303 | """ 304 | # mostly identical to: 305 | # "https://github.com/keras-team/keras-applications/blob/master/keras_applications/imagenet_utils.py" 306 | # except for converting RGB -> BGR since we assume BGR already 307 | x = x.astype(keras.backend.floatx()) 308 | 309 | if keras.backend.image_data_format() == 'channels_first': 310 | if x.ndim == 3: 311 | x[0, :, :] -= 103.939 312 | x[1, :, :] -= 116.779 313 | x[2, :, :] -= 123.68 314 | else: 315 | x[:, 0, :, :] -= 103.939 316 | x[:, 1, :, :] -= 116.779 317 | x[:, 2, :, :] -= 123.68 318 | else: 319 | x[..., 0] -= 103.939 320 | x[..., 1] -= 116.779 321 | x[..., 2] -= 123.68 322 | 323 | return x 324 | 325 | 326 | def adjust_transform_for_image(transform, image, relative_translation): 327 | """ Adjust a transformation for a specific image. 328 | 329 | The translation of the matrix will be scaled with the size of the image. 330 | The linear part of the transformation will adjusted so that the origin of the transformation will be 331 | at the center of the image. 332 | """ 333 | height, width, channels = image.shape 334 | 335 | result = transform 336 | 337 | # Scale the translation with the image size if specified. 338 | if relative_translation: 339 | result[0:2, 2] *= [width, height] 340 | 341 | # Move the origin of transformation. 342 | result = change_transform_origin(transform, (0.5 * width, 0.5 * height)) 343 | 344 | return result 345 | 346 | 347 | class TransformParameters: 348 | """ Struct holding parameters determining how to apply a transformation to an image. 349 | 350 | Args 351 | fill_mode: One of: 'constant', 'nearest', 'reflect', 'wrap' 352 | interpolation: One of: 'nearest', 'linear', 'cubic', 'area', 'lanczos4' 353 | cval: Fill value to use with fill_mode='constant' 354 | data_format: Same as for keras.preprocessing.image.apply_transform 355 | relative_translation: If true (the default), interpret translation as a factor of the image size. 356 | If false, interpret it as absolute pixels. 
357 | """ 358 | 359 | def __init__( 360 | self, 361 | fill_mode='nearest', 362 | interpolation='linear', 363 | cval=0, 364 | data_format=None, 365 | relative_translation=True, 366 | ): 367 | self.fill_mode = fill_mode 368 | self.cval = cval 369 | self.interpolation = interpolation 370 | self.relative_translation = relative_translation 371 | 372 | if data_format is None: 373 | data_format = keras.backend.image_data_format() 374 | self.data_format = data_format 375 | 376 | if data_format == 'channels_first': 377 | self.channel_axis = 0 378 | elif data_format == 'channels_last': 379 | self.channel_axis = 2 380 | else: 381 | raise ValueError( 382 | "invalid data_format, expected 'channels_first' or 'channels_last', got '{}'".format(data_format)) 383 | 384 | def cvBorderMode(self): 385 | if self.fill_mode == 'constant': 386 | return cv2.BORDER_CONSTANT 387 | if self.fill_mode == 'nearest': 388 | return cv2.BORDER_REPLICATE 389 | if self.fill_mode == 'reflect': 390 | return cv2.BORDER_REFLECT_101 391 | if self.fill_mode == 'wrap': 392 | return cv2.BORDER_WRAP 393 | 394 | def cvInterpolation(self): 395 | if self.interpolation == 'nearest': 396 | return cv2.INTER_NEAREST 397 | if self.interpolation == 'linear': 398 | return cv2.INTER_LINEAR 399 | if self.interpolation == 'cubic': 400 | return cv2.INTER_CUBIC 401 | if self.interpolation == 'area': 402 | return cv2.INTER_AREA 403 | if self.interpolation == 'lanczos4': 404 | return cv2.INTER_LANCZOS4 405 | 406 | 407 | def apply_transform(matrix, image, params): 408 | """ 409 | Apply a transformation to an image. 410 | 411 | The origin of transformation is at the top left corner of the image. 412 | 413 | The matrix is interpreted such that a point (x, y) on the original image is moved to 414 | transform * (x, y) in the generated image. 415 | Mathematically speaking, that means that the matrix is a transformation from the transformed image space to 416 | the original image space. 417 | 418 | Args 419 | matrix: A homogeneous 3 by 3 matrix holding representing the transformation to apply. 420 | image: The image to transform. 421 | params: The transform parameters (see TransformParameters) 422 | """ 423 | if params.channel_axis != 2: 424 | image = np.moveaxis(image, params.channel_axis, 2) 425 | 426 | output = cv2.warpAffine( 427 | image, 428 | matrix[:2, :], 429 | dsize=(image.shape[1], image.shape[0]), 430 | flags=params.cvInterpolation(), 431 | borderMode=params.cvBorderMode(), 432 | borderValue=params.cval, 433 | ) 434 | 435 | if params.channel_axis != 2: 436 | output = np.moveaxis(output, 2, params.channel_axis) 437 | return output 438 | 439 | 440 | def resize_image(img, min_side=800, max_side=1333): 441 | """ Resize an image such that the size is constrained to min_side and max_side. 442 | 443 | Args 444 | min_side: The image's min side will be equal to min_side after resizing. 445 | max_side: If after resizing the image's max side is above max_side, resize until the max side is 446 | equal to max_side. 447 | 448 | Returns 449 | A resized image. 
450 | """ 451 | (rows, cols, _) = img.shape 452 | 453 | smallest_side = min(rows, cols) 454 | 455 | # rescale the image so the smallest side is min_side 456 | scale = min_side / smallest_side 457 | 458 | # check if the largest side is now greater than max_side, which can happen 459 | # when images have a large aspect ratio 460 | largest_side = max(rows, cols) 461 | if largest_side * scale > max_side: 462 | scale = max_side / largest_side 463 | 464 | # resize the image with the computed scale 465 | img = cv2.resize(img, None, fx=scale, fy=scale) 466 | 467 | return img, scale 468 | -------------------------------------------------------------------------------- /keras_retinanet/preprocessing/open_images.py: -------------------------------------------------------------------------------- 1 | import csv 2 | import json 3 | import os 4 | import warnings 5 | 6 | import numpy as np 7 | from PIL import Image 8 | 9 | from ..preprocessing.generator import Generator 10 | from ..preprocessing.image import read_image_bgr 11 | 12 | 13 | def load_hierarchy(metadata_dir, version='challenge2018'): 14 | hierarchy = None 15 | if version == 'challenge2018': 16 | hierarchy = 'bbox_labels_500_hierarchy.json' 17 | 18 | hierarchy_json = os.path.join(metadata_dir, hierarchy) 19 | with open(hierarchy_json) as f: 20 | hierarchy_data = json.loads(f.read()) 21 | 22 | return hierarchy_data 23 | 24 | 25 | def load_hierarchy_children(hierarchy): 26 | res = [hierarchy['LabelName']] 27 | 28 | if 'Subcategory' in hierarchy: 29 | for subcategory in hierarchy['Subcategory']: 30 | children = load_hierarchy_children(subcategory) 31 | 32 | for c in children: 33 | res.append(c) 34 | 35 | return res 36 | 37 | 38 | def find_hierarchy_parent(hierarchy, parent_cls): 39 | if hierarchy['LabelName'] == parent_cls: 40 | return hierarchy 41 | elif 'Subcategory' in hierarchy: 42 | for child in hierarchy['Subcategory']: 43 | res = find_hierarchy_parent(child, parent_cls) 44 | if res is not None: 45 | return res 46 | 47 | return None 48 | 49 | 50 | def get_labels(metadata_dir, version='challenge2018'): 51 | if version == 'challenge2018': 52 | csv_file = 'challenge-2018-class-descriptions-500.csv' 53 | 54 | boxable_classes_descriptions = os.path.join(metadata_dir, csv_file) 55 | id_to_labels = {} 56 | cls_index = {} 57 | 58 | i = 0 59 | with open(boxable_classes_descriptions) as f: 60 | for row in csv.reader(f): 61 | # make sure the csv row is not empty (usually the last one) 62 | if len(row): 63 | label = row[0] 64 | description = row[1].replace("\"", "").replace("'", "").replace('`', '') 65 | 66 | id_to_labels[i] = description 67 | cls_index[label] = i 68 | 69 | i += 1 70 | else: 71 | trainable_classes_path = os.path.join(metadata_dir, 'classes-bbox-trainable.txt') 72 | description_path = os.path.join(metadata_dir, 'class-descriptions.csv') 73 | 74 | description_table = {} 75 | with open(description_path) as f: 76 | for row in csv.reader(f): 77 | # make sure the csv row is not empty (usually the last one) 78 | if len(row): 79 | description_table[row[0]] = row[1].replace("\"", "").replace("'", "").replace('`', '') 80 | 81 | with open(trainable_classes_path, 'rb') as f: 82 | trainable_classes = f.read().split('\n') 83 | 84 | id_to_labels = dict([(i, description_table[c]) for i, c in enumerate(trainable_classes)]) 85 | cls_index = dict([(c, i) for i, c in enumerate(trainable_classes)]) 86 | 87 | return id_to_labels, cls_index 88 | 89 | 90 | def generate_images_annotations_json(main_dir, metadata_dir, subset, cls_index, 
version='challenge2018'): 91 | validation_image_ids = {} 92 | 93 | validation_image_ids_path = os.path.join(metadata_dir, 'challenge-2018-image-ids-valset-od.csv') 94 | 95 | with open(validation_image_ids_path, 'r') as csv_file: 96 | reader = csv.DictReader(csv_file, fieldnames=['ImageID']) 97 | next(reader) 98 | for line, row in enumerate(reader): 99 | image_id = row['ImageID'] 100 | validation_image_ids[image_id] = True 101 | 102 | annotations_path = os.path.join(metadata_dir, 'challenge-2018-train-annotations-bbox.csv') 103 | 104 | fieldnames = ['ImageID', 'Source', 'LabelName', 'Confidence', 105 | 'XMin', 'XMax', 'YMin', 'YMax', 106 | 'IsOccluded', 'IsTruncated', 'IsGroupOf', 'IsDepiction', 'IsInside'] 107 | 108 | id_annotations = dict() 109 | with open(annotations_path, 'r') as csv_file: 110 | reader = csv.DictReader(csv_file, fieldnames=fieldnames) 111 | next(reader) 112 | 113 | images_sizes = {} 114 | for line, row in enumerate(reader): 115 | frame = row['ImageID'] 116 | 117 | if version == 'challenge2018': 118 | if subset == 'train': 119 | if frame in validation_image_ids: 120 | continue 121 | elif subset == 'validation': 122 | if frame not in validation_image_ids: 123 | continue 124 | else: 125 | raise NotImplementedError('This generator handles only the train and validation subsets') 126 | 127 | class_name = row['LabelName'] 128 | 129 | if class_name not in cls_index: 130 | continue 131 | 132 | cls_id = cls_index[class_name] 133 | 134 | if version == 'challenge2018': 135 | img_path = os.path.join(main_dir, 'images', 'train', frame + '.jpg') 136 | else: 137 | img_path = os.path.join(main_dir, 'images', subset, frame + '.jpg') 138 | 139 | if frame in images_sizes: 140 | width, height = images_sizes[frame] 141 | else: 142 | try: 143 | with Image.open(img_path) as img: 144 | width, height = img.width, img.height 145 | images_sizes[frame] = (width, height) 146 | except Exception as ex: 147 | if version == 'challenge2018': 148 | raise ex 149 | continue 150 | 151 | x1 = float(row['XMin']) 152 | x2 = float(row['XMax']) 153 | y1 = float(row['YMin']) 154 | y2 = float(row['YMax']) 155 | 156 | x1_int = int(round(x1 * width)) 157 | x2_int = int(round(x2 * width)) 158 | y1_int = int(round(y1 * height)) 159 | y2_int = int(round(y2 * height)) 160 | 161 | # Check that the bounding box is valid. 
162 |             if x2 <= x1:
163 |                 raise ValueError('line {}: x2 ({}) must be higher than x1 ({})'.format(line, x2, x1))
164 |             if y2 <= y1:
165 |                 raise ValueError('line {}: y2 ({}) must be higher than y1 ({})'.format(line, y2, y1))
166 | 
167 |             if y2_int == y1_int:
168 |                 warnings.warn('filtering line {}: rounding y2 ({}) and y1 ({}) makes them equal'.format(line, y2, y1))
169 |                 continue
170 | 
171 |             if x2_int == x1_int:
172 |                 warnings.warn('filtering line {}: rounding x2 ({}) and x1 ({}) makes them equal'.format(line, x2, x1))
173 |                 continue
174 | 
175 |             img_id = row['ImageID']
176 |             annotation = {'cls_id': cls_id, 'x1': x1, 'x2': x2, 'y1': y1, 'y2': y2}
177 | 
178 |             if img_id in id_annotations:
179 |                 annotations = id_annotations[img_id]
180 |                 annotations['boxes'].append(annotation)
181 |             else:
182 |                 id_annotations[img_id] = {'w': width, 'h': height, 'boxes': [annotation]}
183 | 
184 |     return id_annotations
185 | 
186 | 
187 | class OpenImagesGenerator(Generator):
188 |     def __init__(
189 |             self, main_dir, subset, version='challenge2018',
190 |             labels_filter=None, annotation_cache_dir='.',
191 |             parent_label=None,
192 |             **kwargs
193 |     ):
194 |         if version == 'challenge2018':
195 |             metadata = 'challenge2018'
196 |         else:
197 |             raise NotImplementedError('There is currently no implementation for versions older than v3')
198 | 
199 |         if version == 'challenge2018':
200 |             self.base_dir = os.path.join(main_dir, 'images', 'train')
201 | 
202 |         metadata_dir = os.path.join(main_dir, metadata)
203 |         annotation_cache_json = os.path.join(annotation_cache_dir, subset + '.json')
204 | 
205 |         self.hierarchy = load_hierarchy(metadata_dir, version=version)
206 |         id_to_labels, cls_index = get_labels(metadata_dir, version=version)
207 | 
208 |         if os.path.exists(annotation_cache_json):
209 |             print("Loading {} annotations...".format(subset))
210 |             with open(annotation_cache_json, 'r') as f:
211 |                 self.annotations = json.loads(f.read())
212 |         else:
213 |             print("Starting to generate image annotations...")
214 |             self.annotations = generate_images_annotations_json(main_dir, metadata_dir, subset, cls_index,
215 |                                                                 version=version)
216 |             print("Dumping the annotations...")
217 |             json.dump(self.annotations, open(annotation_cache_json, "w"))
218 | 
219 |         if labels_filter is not None or parent_label is not None:
220 |             self.id_to_labels, self.annotations = self.__filter_data(id_to_labels, cls_index, labels_filter,
221 |                                                                      parent_label)
222 |         else:
223 |             self.id_to_labels = id_to_labels
224 | 
225 |         self.id_to_image_id = dict([(i, k) for i, k in enumerate(self.annotations)])
226 | 
227 |         super(OpenImagesGenerator, self).__init__(**kwargs)
228 | 
229 |     def __filter_data(self, id_to_labels, cls_index, labels_filter=None, parent_label=None):
230 |         """
231 |         If you want to work with a subset of the labels just set a list with trainable labels
232 |         :param labels_filter: Ex: labels_filter = ['Helmet', 'Hat', 'Analog television']
233 |         :param parent_label: If parent_label is set this will bring you the parent label
234 |         but also its children in the semantic hierarchy as defined in OID, ex: Animal
235 |         hierarchical tree
236 |         :return:
237 |         """
238 | 
239 |         children_id_to_labels = {}
240 | 
241 |         if parent_label is None:
242 |             # there are no sublabels to include other than the requested labels themselves
243 | 
244 |             for label in labels_filter:
245 |                 for i, lb in id_to_labels.items():
246 |                     if lb == label:
247 |                         children_id_to_labels[i] = label
248 |                         break
249 |         else:
250 |             parent_cls = None
251 |             for i, lb in iter(id_to_labels.items()):
252 |                 if lb == parent_label:
253 |                     parent_id
= i 254 | for c, index in iter(cls_index.items()): 255 | if index == parent_id: 256 | parent_cls = c 257 | break 258 | 259 | if parent_cls is None: 260 | raise Exception('Couldnt find label {}'.format(parent_label)) 261 | 262 | parent_tree = find_hierarchy_parent(self.hierarchy, parent_cls) 263 | 264 | if parent_tree is None: 265 | raise Exception('Couldnt find parent {} in the semantic hierarchical tree'.format(parent_label)) 266 | 267 | children = load_hierarchy_children(parent_tree) 268 | 269 | for cls in children: 270 | index = cls_index[cls] 271 | label = id_to_labels[index] 272 | children_id_to_labels[index] = label 273 | 274 | id_map = dict([(ind, i) for i, ind in enumerate(iter(children_id_to_labels.keys()))]) 275 | 276 | filtered_annotations = {} 277 | for k in self.annotations: 278 | img_ann = self.annotations[k] 279 | 280 | filtered_boxes = [] 281 | for ann in img_ann['boxes']: 282 | cls_id = ann['cls_id'] 283 | if cls_id in children_id_to_labels: 284 | ann['cls_id'] = id_map[cls_id] 285 | filtered_boxes.append(ann) 286 | 287 | if len(filtered_boxes) > 0: 288 | filtered_annotations[k] = {'w': img_ann['w'], 'h': img_ann['h'], 'boxes': filtered_boxes} 289 | 290 | children_id_to_labels = dict([(id_map[i], l) for (i, l) in iter(children_id_to_labels.items())]) 291 | 292 | return children_id_to_labels, filtered_annotations 293 | 294 | def size(self): 295 | return len(self.annotations) 296 | 297 | def num_classes(self): 298 | return len(self.id_to_labels) 299 | 300 | def name_to_label(self, name): 301 | raise NotImplementedError() 302 | 303 | def label_to_name(self, label): 304 | return self.id_to_labels[label] 305 | 306 | def image_aspect_ratio(self, image_index): 307 | img_annotations = self.annotations[self.id_to_image_id[image_index]] 308 | height, width = img_annotations['h'], img_annotations['w'] 309 | return float(width) / float(height) 310 | 311 | def image_path(self, image_index): 312 | path = os.path.join(self.base_dir, self.id_to_image_id[image_index] + '.jpg') 313 | return path 314 | 315 | def load_image(self, image_index): 316 | return read_image_bgr(self.image_path(image_index)) 317 | 318 | def load_annotations(self, image_index): 319 | image_annotations = self.annotations[self.id_to_image_id[image_index]] 320 | 321 | labels = image_annotations['boxes'] 322 | height, width = image_annotations['h'], image_annotations['w'] 323 | 324 | boxes = np.zeros((len(labels), 5)) 325 | for idx, ann in enumerate(labels): 326 | cls_id = ann['cls_id'] 327 | x1 = ann['x1'] * width 328 | x2 = ann['x2'] * width 329 | y1 = ann['y1'] * height 330 | y2 = ann['y2'] * height 331 | 332 | boxes[idx, 0] = x1 333 | boxes[idx, 1] = y1 334 | boxes[idx, 2] = x2 335 | boxes[idx, 3] = y2 336 | boxes[idx, 4] = cls_id 337 | 338 | return boxes 339 | -------------------------------------------------------------------------------- /keras_retinanet/setup.py: -------------------------------------------------------------------------------- 1 | import setuptools 2 | from setuptools import find_packages 3 | 4 | with open("README.md", "r") as fh: 5 | long_description = fh.read() 6 | 7 | REQUIRED_PACKAGES = ['keras', 'tensorflow', 'keras-resnet', 'six', 'tensorflow', 'pandas', 'sklearn'] 8 | 9 | setuptools.setup( 10 | name="keras_retinanet", 11 | version="1.0.0", 12 | author="Mukesh Mithrakumar", 13 | author_email="mukesh.mithrakumar@jacks.sdstate.edu", 14 | description="Keras implementation of RetinaNet for object detection and visual relationship identification", 15 | long_description=long_description, 16 | 
long_description_content_type="text/markdown", 17 | url="https://github.com/mukeshmithrakumar/RetinaNet", 18 | classifiers=( 19 | "Development Status :: 1.0.0.dev1 - Development release", 20 | "Intended Audience :: Developers", 21 | "Programming Language :: Python :: 3.6", 22 | "License :: OSI Approved :: MIT License", 23 | "Operating System :: OS Independent", 24 | ), 25 | keywords="sample setuptools development", 26 | packages=find_packages(exclude=['contrib', 'docs', 'tests*']), 27 | install_requires=REQUIRED_PACKAGES, 28 | entry_points={ 29 | 'console_scripts': [ 30 | 'retinanet_task = keras_retinanet.trainer.task:main', 31 | 'retinanet_train = keras_retinanet.trainer.train:main', 32 | 'retinanet_evaluate = keras_retinanet.trainer.evaluate:main', 33 | ] 34 | }, 35 | python_requires='>=3', 36 | ) 37 | -------------------------------------------------------------------------------- /keras_retinanet/trainer/convert_model.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import os 3 | import sys 4 | 5 | 6 | if __name__ == '__main__' and __package__ is None: 7 | sys.path.insert(0, os.path.join(os.path.dirname(__file__), '..', '..')) 8 | __package__ = "keras_retinanet.trainer" 9 | 10 | from ..models import model_backbone 11 | 12 | 13 | def parse_args(args): 14 | parser = argparse.ArgumentParser(description='Script for converting a training model to an inference model.') 15 | 16 | parser.add_argument( 17 | 'main_dir', 18 | help='Path to dataset directory.' 19 | ) 20 | parser.add_argument( 21 | 'model_in', 22 | help='The model to convert.' 23 | ) 24 | parser.add_argument( 25 | '--backbone', 26 | help='The backbone of the model to convert.', 27 | default='resnet50' 28 | ) 29 | parser.add_argument( 30 | '--no-nms', 31 | help='Disables non maximum suppression.', 32 | dest='nms', 33 | action='store_false' 34 | ) 35 | parser.add_argument( 36 | '--no-class-specific-filter', 37 | help='Disables class specific filtering.', 38 | dest='class_specific_filter', 39 | action='store_false' 40 | ) 41 | 42 | return parser.parse_args(args) 43 | 44 | 45 | def main(args=None): 46 | # parse arguments 47 | if args is None: 48 | args = sys.argv[1:] 49 | args = parse_args(args) 50 | 51 | # load and convert model 52 | model = model_backbone.load_model(args.model_in, 53 | convert=True, 54 | backbone_name=args.backbone, 55 | nms=args.nms, 56 | class_specific_filter=args.class_specific_filter) 57 | 58 | # save model 59 | model_out_path = os.path.join(args.main_dir, 'keras_retinanet', 'trainer', 'snapshots') 60 | model_out = os.path.join(model_out_path, '{}_inference.h5'.format(args.model_in)) 61 | model.save(model_out) 62 | 63 | 64 | if __name__ == '__main__': 65 | main() 66 | -------------------------------------------------------------------------------- /keras_retinanet/trainer/evaluate.py: -------------------------------------------------------------------------------- 1 | import keras 2 | import argparse 3 | import tensorflow as tf 4 | import numpy as np 5 | import sys 6 | import os 7 | import csv 8 | import pandas as pd 9 | 10 | if __name__ == '__main__' and __package__ is None: 11 | sys.path.insert(0, os.path.join(os.path.dirname(__file__), '..', '..')) 12 | __package__ = "keras_retinanet.trainer" 13 | 14 | # Change these to absolute imports if you copy this script outside the keras_retinanet package. 
15 | from ..models import model_backbone 16 | from ..preprocessing.image import read_image_bgr, preprocess_image, resize_image 17 | from ..models import classifier 18 | 19 | 20 | def parse_args(args): 21 | parser = argparse.ArgumentParser(description='Script for converting a training model to an inference model.') 22 | 23 | parser.add_argument( 24 | 'main_dir', 25 | help='Path to dataset directory.' 26 | ) 27 | parser.add_argument( 28 | 'model_in', 29 | help="The converted model to evaluate." 30 | "If training model hasn't been converted for inference, run convert_model first." 31 | ) 32 | parser.add_argument( 33 | '--train_type', 34 | help="Type of predictions you want to make" 35 | "If you want to train for Visual Reltionship, then type -'vr'." 36 | "If you want to train for Object Detection, then type -'od'." 37 | "If you want to train for both, then type -'both'.", 38 | default='both' 39 | ) 40 | parser.add_argument( 41 | '--backbone', 42 | help='The backbone of the model to convert.', 43 | default='resnet50' 44 | ) 45 | 46 | return parser.parse_args(args) 47 | 48 | 49 | def get_session(): 50 | """ Construct a modified tf session. 51 | """ 52 | config = tf.ConfigProto() 53 | config.gpu_options.allow_growth = True 54 | return tf.Session(config=config) 55 | 56 | 57 | def makedirs(path): 58 | # Intended behavior: try to create the directory, 59 | # pass if the directory exists already, fails otherwise. 60 | # Meant for Python 2.7/3.n compatibility. 61 | try: 62 | os.makedirs(path) 63 | except OSError: 64 | if not os.path.isdir(path): 65 | raise 66 | 67 | 68 | def get_midlabels(main_dir): 69 | meta_dir = os.path.join(main_dir, 'challenge2018') 70 | csv_file = 'challenge-2018-class-descriptions-500.csv' 71 | boxable_classes_descriptions = os.path.join(meta_dir, csv_file) 72 | 73 | id_to_midlabels = {} 74 | i = 0 75 | with open(boxable_classes_descriptions, 'r') as descriptions_file: 76 | for row in csv.reader(descriptions_file): 77 | if len(row): 78 | label = row[0] 79 | id_to_midlabels[i] = label 80 | i += 1 81 | 82 | return id_to_midlabels 83 | 84 | 85 | def get_annotations(base_dir, model): 86 | id_annotations = dict() 87 | count = 0 88 | for img in os.listdir(base_dir): 89 | try: 90 | img_path = os.path.join(base_dir, img) 91 | raw_image = read_image_bgr(img_path) 92 | image = preprocess_image(raw_image.copy()) 93 | image, scale = resize_image(image, min_side=600, max_side=600) 94 | height, width, _ = image.shape 95 | 96 | img_id = img.strip('.jpg') 97 | 98 | # run network 99 | boxes, scores, labels = model.predict_on_batch(np.expand_dims(image, axis=0)) 100 | 101 | # boxes in (x1, y1, x2, y2) format 102 | new_boxes2 = [] 103 | for box in boxes[0]: 104 | x1_int = round((box[0] / width), 3) 105 | y1_int = round((box[1] / height), 3) 106 | x2_int = round((box[2] / width), 3) 107 | y2_int = round((box[3] / height), 3) 108 | new_boxes2.extend([x1_int, y1_int, x2_int, y2_int]) 109 | 110 | new_list = [new_boxes2[i:i + 4] for i in range(0, len(new_boxes2), 4)] 111 | 112 | annotation = {'cls_label': labels, 'box_values': new_list, 'scores': scores} 113 | 114 | if img_id in id_annotations: 115 | annotations = id_annotations[img_id] 116 | annotations['boxes'].append(annotation) 117 | else: 118 | id_annotations[img_id] = {'boxes': [annotation]} 119 | 120 | count += 1 121 | print("{0}/99999".format(count)) 122 | 123 | except: 124 | print("Did not evaluate {}".format(img)) 125 | continue 126 | 127 | return id_annotations 128 | 129 | 130 | def od(id_annotations, main_dir): 131 | 132 | id_to_midlabels 
= get_midlabels(main_dir) 133 | 134 | try: 135 | predict = pd.DataFrame.from_dict(id_annotations) 136 | except: 137 | print("from dict did not work") 138 | 139 | try: 140 | predict = pd.DataFrame.from_records(id_annotations) 141 | except: 142 | print("from records did not work") 143 | 144 | sub = [] 145 | for k in predict: 146 | # convert class labels to MID format by iterating through class labels 147 | new_clslst = list(map(id_to_midlabels.get, predict[k]['boxes'][0]['cls_label'][0])) 148 | 149 | # copy the scores to the mid labels and create bounding box values 150 | new_boxlist = [] 151 | for i, mids in enumerate(new_clslst): 152 | if mids is None: 153 | break 154 | else: 155 | scores = predict[k]['boxes'][0]['scores'][0][i] 156 | _scorelst = str(mids) + ' ' + str(scores) 157 | boxval = str(predict[k]['boxes'][0]['box_values'][i]).strip("[]") 158 | _boxlist = _scorelst + ' ' + boxval 159 | new_boxlist.append(_boxlist) 160 | i += 1 161 | 162 | new_boxlist = ''.join(str(new_boxlist)).replace(",", '').replace("'", '').replace("[", '').replace("]", '') 163 | 164 | sub.append(new_boxlist) 165 | 166 | mk_path = os.path.join(main_dir, 'ODSubmissions') 167 | makedirs(mk_path) 168 | path = os.path.join(main_dir, 'ODSubmissions') 169 | 170 | print("OD predictions complete") 171 | with open(path + "od.csv", "w") as csv_file: 172 | writer = csv.writer(csv_file, delimiter=' ') 173 | for line in sub: 174 | writer.writerow([line]) 175 | 176 | header = ["PredictionString"] 177 | od_file = pd.read_csv(path + "od.csv", names=header) 178 | 179 | ImageId = [] 180 | for k in predict: 181 | ImageId.append(k) 182 | 183 | se = pd.Series(ImageId) 184 | od_file['ImageId'] = se.values 185 | 186 | od_file = od_file[["ImageId", "PredictionString"]] 187 | od_file.to_csv(path + "submission-od.csv", index=False) 188 | 189 | print("Writing OD Submission file") 190 | 191 | if os.path.isfile(path + 'od.csv'): 192 | os.unlink(path + 'od.csv') 193 | 194 | 195 | def relationship_list(new_boxlist, new_scorelist, midlist,LogReg): 196 | XMin = [] 197 | YMin = [] 198 | XMax = [] 199 | YMax = [] 200 | 201 | for idx, i in enumerate(new_boxlist): 202 | XMin.append(new_boxlist[idx][0]) 203 | YMin.append(new_boxlist[idx][1]) 204 | XMax.append(new_boxlist[idx][2]) 205 | YMax.append(new_boxlist[idx][3]) 206 | 207 | if len(midlist) % 2 == 0: 208 | XMin1 = XMin[:int(len(new_boxlist) / 2)] 209 | YMin1 = YMin[:int(len(new_boxlist) / 2)] 210 | XMax1 = XMax[:int(len(new_boxlist) / 2)] 211 | YMax1 = YMax[:int(len(new_boxlist) / 2)] 212 | 213 | new_scorelist1 = new_scorelist[:int(len(new_scorelist) / 2)] 214 | midlist1 = midlist[:int(len(midlist) / 2)] 215 | 216 | XMin2 = XMin[int(len(new_boxlist) / 2):] 217 | YMin2 = YMin[int(len(new_boxlist) / 2):] 218 | XMax2 = XMax[int(len(new_boxlist) / 2):] 219 | YMax2 = YMax[int(len(new_boxlist) / 2):] 220 | 221 | new_scorelist2 = new_scorelist[int(len(new_scorelist) / 2):] 222 | midlist2 = midlist[int(len(midlist) / 2):] 223 | 224 | else: 225 | XMin1 = XMin[:int(len(new_boxlist) / 2)] 226 | YMin1 = YMin[:int(len(new_boxlist) / 2)] 227 | XMax1 = XMax[:int(len(new_boxlist) / 2)] 228 | YMax1 = YMax[:int(len(new_boxlist) / 2)] 229 | 230 | new_scorelist1 = new_scorelist[:int(len(new_scorelist) / 2)] 231 | midlist1 = midlist[:int(len(midlist) / 2)] 232 | 233 | XMin2 = XMin[int(len(new_boxlist) / 2) + 1:] 234 | YMin2 = YMin[int(len(new_boxlist) / 2) + 1:] 235 | XMax2 = XMax[int(len(new_boxlist) / 2) + 1:] 236 | YMax2 = YMax[int(len(new_boxlist) / 2) + 1:] 237 | 238 | new_scorelist2 = 
new_scorelist[int(len(new_scorelist) / 2) + 1:] 239 | midlist2 = midlist[int(len(midlist) / 2) + 1:] 240 | 241 | vr = pd.DataFrame() 242 | 243 | XMin1_se = pd.Series(XMin1) 244 | YMin1_se = pd.Series(YMin1) 245 | XMax1_se = pd.Series(XMax1) 246 | YMax1_se = pd.Series(YMax1) 247 | 248 | new_scorelist1_se = pd.Series(new_scorelist1) 249 | midlist1_se = pd.Series(midlist1) 250 | 251 | vr['LabelName1'] = midlist1_se.values 252 | vr['scores1'] = new_scorelist1_se.values 253 | vr['XMin1'] = XMin1_se.values 254 | vr['YMin1'] = YMin1_se.values 255 | vr['XMax1'] = XMax1_se.values 256 | vr['YMax1'] = YMax1_se.values 257 | 258 | vr['box_1_length'] = vr['XMax1'] - vr['XMin1'] 259 | vr['box_1_height'] = vr['YMax1'] - vr['YMin1'] 260 | vr['box_1_area'] = vr['box_1_length'] * vr['box_1_height'] 261 | 262 | XMin2_se = pd.Series(XMin2) 263 | YMin2_se = pd.Series(YMin2) 264 | XMax2_se = pd.Series(XMax2) 265 | YMax2_se = pd.Series(YMax2) 266 | 267 | new_scorelist2_se = pd.Series(new_scorelist2) 268 | midlist2_se = pd.Series(midlist2) 269 | 270 | vr['LabelName2'] = midlist2_se.values 271 | vr['scores2'] = new_scorelist2_se.values 272 | vr['XMin2'] = XMin2_se.values 273 | vr['YMin2'] = YMin2_se.values 274 | vr['XMax2'] = XMax2_se.values 275 | vr['YMax2'] = YMax2_se.values 276 | 277 | vr['box_2_length'] = vr['XMax2'] - vr['XMin2'] 278 | vr['box_2_height'] = vr['YMax2'] - vr['YMin2'] 279 | vr['box_2_area'] = vr['box_2_length'] * vr['box_2_height'] 280 | 281 | vr['confidence'] = (vr['scores1'] + vr['scores2']) / 2.0 282 | 283 | vr["xA"] = vr[["XMin1", "XMin2"]].max(axis=1) 284 | vr["yA"] = vr[["YMin1", "YMin2"]].max(axis=1) 285 | vr["xB"] = vr[["XMax1", "XMax2"]].min(axis=1) 286 | vr["yB"] = vr[["YMax1", "YMax2"]].min(axis=1) 287 | 288 | vr["intersectionarea"] = (vr["xB"] - vr["xA"]) * (vr["yB"] - vr["yA"]) 289 | vr["unionarea"] = vr["box_1_area"] + vr["box_2_area"] - vr["intersectionarea"] 290 | vr["iou"] = (vr["intersectionarea"] / vr["unionarea"]) 291 | 292 | drop_columns = ["intersectionarea", "unionarea", "xA", "yA", "xB", "yB", "box_1_area", 293 | "box_2_area", "scores1", "scores2", "box_1_length", "box_1_height", 294 | "box_2_length", "box_2_height"] 295 | 296 | vr = vr.drop(columns=drop_columns) 297 | 298 | # replace columns with inf values with nan so I could drop those values 299 | vr = vr.replace([np.inf, -np.inf], np.nan) 300 | vr = vr.dropna() 301 | 302 | # drop the ious if its less than zero, it means its without any relationships cause of no intersection 303 | vr_iou_negative = vr[vr['iou'] < 0] 304 | vr = vr.drop(vr_iou_negative.index, axis=0) 305 | 306 | vr = vr[['confidence', 'LabelName1', 'XMin1', 'YMin1', 'XMax1', 307 | 'YMax1', 'LabelName2', 'XMin2', 'YMin2', 'XMax2', 'YMax2', 'iou']] 308 | 309 | vr_test = vr[['XMin1', 'YMin1', 'XMax1', 'YMax1', 'XMin2', 'YMin2', 'XMax2', 310 | 'YMax2', 'iou']] 311 | 312 | try: 313 | vr_pred = LogReg.predict(vr_test) 314 | 315 | relations_file = {'0': 'at', 316 | "1": 'hits', 317 | "2": 'holds', 318 | "3": 'inside_of', 319 | "4": 'interacts_with', 320 | "5": 'is', 321 | "6": 'on', 322 | "7": 'plays', 323 | "8": 'under', 324 | "9": 'wears' 325 | } 326 | 327 | def get_vr(row): 328 | for c in vr_pred_df.columns: 329 | if row[c] == 1: 330 | return c 331 | 332 | vr_pred_df1 = pd.DataFrame(vr_pred, columns=relations_file) 333 | vr_pred_df = vr_pred_df1.rename(columns=relations_file) 334 | vr_pred_df = vr_pred_df.apply(get_vr, axis=1) 335 | vr['Relationship'] = vr_pred_df.values 336 | vr = vr.dropna() 337 | vr = vr.drop(columns='iou') 338 | 339 | vrlst = 
vr.values.tolist() 340 | new_vrlst = ''.join(str(vrlst)).replace(",", '').replace("'", '').replace("[", '').replace("]", '') 341 | 342 | except: 343 | print("EMPTY EVALUATION") 344 | new_vrlst = '' 345 | 346 | return new_vrlst 347 | 348 | 349 | def vr(id_annotations, logreg, main_dir): 350 | 351 | id_to_midlabels = get_midlabels(main_dir) 352 | 353 | try: 354 | predict = pd.DataFrame.from_dict(id_annotations) 355 | except: 356 | print("from dict did not work") 357 | 358 | try: 359 | predict = pd.DataFrame.from_records(id_annotations) 360 | except: 361 | print("from records did not work") 362 | 363 | sub = [] 364 | for k in predict: 365 | counter = 0 366 | 367 | # convert class labels to MID format by iterating through class labels 368 | clslst = list(map(id_to_midlabels.get, predict[k]['boxes'][0]['cls_label'][0])) 369 | 370 | new_boxlist = [] 371 | new_scorelist = [] 372 | midlist = [] 373 | empty_imgs = [] 374 | for i, mids in enumerate(clslst): 375 | if mids is None: 376 | break 377 | else: 378 | scores = predict[k]['boxes'][0]['scores'][0][i] 379 | val = predict[k]['boxes'][0]['box_values'][i] 380 | new_scorelist.append(scores) 381 | midlist.append(mids) 382 | new_boxlist.append(val) 383 | i += 1 384 | counter += 1 385 | 386 | if len(midlist) == 0: 387 | empty_imgs.append(str(counter) + ':' + str(k)) 388 | new_vrlst = '' 389 | 390 | else: 391 | new_vrlst = relationship_list(new_boxlist, new_scorelist, midlist, logreg) 392 | 393 | sub.append(new_vrlst) 394 | print("{0}/99999".format(len(sub))) 395 | 396 | mk_path = os.path.join(main_dir, 'VRSubmissions') 397 | makedirs(mk_path) 398 | path = os.path.join(main_dir, 'VRSubmissions') 399 | 400 | with open(path + "vr.csv", "w") as csv_file: 401 | writer = csv.writer(csv_file, delimiter=' ') 402 | for line in sub: 403 | writer.writerow([line]) 404 | 405 | header = ["PredictionString"] 406 | vr_file = pd.read_csv(path + "vr.csv", names=header) 407 | 408 | ImageId = [] 409 | for k in predict: 410 | ImageId.append(k) 411 | 412 | se = pd.Series(ImageId) 413 | vr_file['ImageId'] = se.values 414 | 415 | vr_file = vr_file[["ImageId", "PredictionString"]] 416 | vr_file.to_csv(path + "submission-vr.csv", index=False) 417 | 418 | print("Writing VR Submission file") 419 | 420 | if os.path.isfile(path + 'vr.csv'): 421 | os.unlink(path + 'vr.csv') 422 | 423 | 424 | def main(args=None): 425 | 426 | if args is None: 427 | args = sys.argv[1:] 428 | args = parse_args(args) 429 | 430 | keras.backend.tensorflow_backend.set_session(get_session()) 431 | 432 | keras_retinanet = os.path.join(args.main_dir, 'keras_retinanet', 'trainer', 'snapshots') 433 | path_to_model = os.path.join(keras_retinanet, '{}.h5'.format(args.model_in)) 434 | base_dir = os.path.join(args.main_dir, 'images', 'test') 435 | 436 | # load the evaluation model 437 | print('Loading model {}, this may take a second...'.format(args.model_in)) 438 | model = model_backbone.load_model(path_to_model, backbone_name='resnet50') 439 | 440 | print("Starting Evaluation...") 441 | 442 | if args.train_type == 'both': 443 | id_annotations = get_annotations(base_dir, model) 444 | print("Evaluation Completed") 445 | 446 | print("Starting Object Detection Prediction") 447 | od(id_annotations, args.main_dir) 448 | 449 | print("Starting Visual Relationship Bounding Box Classifier Training") 450 | logreg = classifier.vr_bb_classifier(args.main_dir) 451 | 452 | print("Starting Visual Relationship Bounding Box Prediction") 453 | vr(id_annotations, logreg, args.main_dir) 454 | print("Prediction Completed") 455 | 456 | 
elif args.train_type == 'od': 457 | id_annotations = get_annotations(base_dir, model) 458 | print("Evaluation Completed") 459 | 460 | print("Starting Object Detection Prediction") 461 | od(id_annotations, args.main_dir) 462 | print("Prediction Completed") 463 | 464 | elif args.train_type == 'vr': 465 | id_annotations = get_annotations(base_dir, model) 466 | print("Evaluation Completed") 467 | 468 | print("Starting Visual Relationship Bounding Box Classifier Training") 469 | logreg = classifier.vr_bb_classifier(args.main_dir) 470 | 471 | print("Starting Visual Relationship Bounding Box Prediction") 472 | vr(id_annotations, logreg, args.main_dir) 473 | print("Prediction Completed") 474 | 475 | else: 476 | raise ValueError('Invalid train type received: {}'.format(args.train_type)) 477 | 478 | 479 | if __name__ == '__main__': 480 | main() 481 | -------------------------------------------------------------------------------- /keras_retinanet/trainer/model.py: -------------------------------------------------------------------------------- 1 | import os 2 | import keras 3 | import keras.preprocessing.image 4 | from keras.utils import multi_gpu_model 5 | import tensorflow as tf 6 | 7 | # Change these to absolute imports if you copy this script outside the keras_retinanet package. 8 | from ..utils import losses 9 | from ..models import model_backbone 10 | from ..models.retinanet import retinanet_bbox 11 | from ..utils.anchors import make_shapes_callback 12 | from ..callbacks.callbacks import RedirectModel 13 | from ..callbacks.callbacks import Evaluate 14 | from ..preprocessing.open_images import OpenImagesGenerator 15 | from ..preprocessing.image import random_transform_generator 16 | from ..utils import freeze as freeze_model 17 | 18 | 19 | def makedirs(path): 20 | # Intended behavior: try to create the directory, 21 | # pass if the directory exists already, fails otherwise. 22 | # Meant for Python 2.7/3.n compatibility. 23 | try: 24 | os.makedirs(path) 25 | except OSError: 26 | if not os.path.isdir(path): 27 | raise 28 | 29 | 30 | def get_session(): 31 | """ Construct a modified tf session. 32 | """ 33 | config = tf.ConfigProto() 34 | config.gpu_options.allow_growth = True 35 | # config.gpu_options.per_process_gpu_memory_fraction = 0.7 36 | return tf.Session(config=config) 37 | 38 | 39 | def model_with_weights(model, weights, skip_mismatch): 40 | """ Load weights for model. 41 | 42 | Args 43 | model : The model to load weights for. 44 | weights : The weights to load. 45 | skip_mismatch : If True, skips layers whose shape of weights doesn't match with the model. 46 | """ 47 | if weights is not None: 48 | model.load_weights(weights, by_name=True, skip_mismatch=skip_mismatch) 49 | return model 50 | 51 | 52 | def create_models(backbone_retinanet, num_classes, weights, multi_gpu=1, freeze_backbone=False): 53 | """ Creates three models (model, training_model, prediction_model). 54 | 55 | Args 56 | backbone_retinanet : A function to call to create a retinanet model with a given backbone. 57 | num_classes : The number of classes to train. 58 | weights : The weights to load into the model. 59 | multi_gpu : The number of GPUs to use for training. 60 | freeze_backbone : If True, disables learning for the backbone. 61 | 62 | Returns 63 | model : The base model. This is also the model that is saved in snapshots. 64 | training_model : The training model. If multi_gpu=0, this is identical to model. 
65 | prediction_model : The model wrapped with utility functions to perform object detection 66 | (applies regression values and performs NMS). 67 | """ 68 | modifier = freeze_model if freeze_backbone else None 69 | 70 | # Keras recommends initialising a multi-gpu model on the CPU to ease weight sharing, and to prevent OOM errors. 71 | # optionally wrap in a parallel model 72 | if multi_gpu > 1: 73 | with tf.device('/cpu:0'): 74 | model = model_with_weights(backbone_retinanet(num_classes, modifier=modifier), weights=weights, 75 | skip_mismatch=True) 76 | training_model = multi_gpu_model(model, gpus=multi_gpu) 77 | else: 78 | model = model_with_weights(backbone_retinanet(num_classes, modifier=modifier), weights=weights, 79 | skip_mismatch=True) 80 | training_model = model 81 | 82 | # make prediction model 83 | prediction_model = retinanet_bbox(model=model) 84 | 85 | # compile model 86 | training_model.compile( 87 | loss={ 88 | 'regression': losses.smooth_l1(), 89 | 'classification': losses.focal() 90 | }, 91 | optimizer=keras.optimizers.adam(lr=1e-7, clipnorm=0.001) 92 | ) 93 | 94 | return model, training_model, prediction_model 95 | 96 | 97 | def create_callbacks(model, training_model, prediction_model, validation_generator, args): 98 | """ Creates the callbacks to use during training. 99 | 100 | Args 101 | model: The base model. 102 | training_model: The model that is used for training. 103 | prediction_model: The model that should be used for validation. 104 | validation_generator: The generator for creating validation data. 105 | args: parseargs args object. 106 | 107 | Returns: 108 | A list of callbacks used for training. 109 | """ 110 | callbacks = [] 111 | 112 | tensorboard_callback = None 113 | 114 | if args.tensorboard_dir: 115 | tensorboard_callback = keras.callbacks.TensorBoard( 116 | log_dir=args.tensorboard_dir, 117 | histogram_freq=0, 118 | batch_size=args.batch_size, 119 | write_graph=True, 120 | write_grads=False, 121 | write_images=False, 122 | embeddings_freq=0, 123 | embeddings_layer_names=None, 124 | embeddings_metadata=None 125 | ) 126 | callbacks.append(tensorboard_callback) 127 | 128 | if args.evaluation and validation_generator: 129 | evaluation = Evaluate(validation_generator, tensorboard=tensorboard_callback) 130 | evaluation = RedirectModel(evaluation, prediction_model) 131 | callbacks.append(evaluation) 132 | 133 | # save the model 134 | if args.snapshots: 135 | # ensure directory created first; otherwise h5py will error after epoch. 136 | makedirs(args.snapshot_path) 137 | checkpoint = keras.callbacks.ModelCheckpoint( 138 | os.path.join(args.snapshot_path, 139 | '{backbone}_{dataset_type}_{{epoch:02d}}.h5'.format(backbone=args.backbone, 140 | dataset_type=args.dataset_type)), 141 | verbose=1, 142 | # save_best_only=True, 143 | monitor="mAP", 144 | # mode='max' 145 | ) 146 | checkpoint = RedirectModel(checkpoint, model) 147 | callbacks.append(checkpoint) 148 | 149 | callbacks.append(keras.callbacks.ReduceLROnPlateau( 150 | monitor='loss', 151 | factor=0.1, 152 | patience=2, 153 | verbose=1, 154 | mode='auto', 155 | min_delta=0.0001, 156 | cooldown=0, 157 | min_lr=0 158 | )) 159 | 160 | return callbacks 161 | 162 | 163 | def create_generators(args, preprocess_image): 164 | """ Create generators for training and validation. 165 | 166 | Args 167 | args : parseargs object containing configuration for generators. 168 | preprocess_image : Function that preprocesses an image for the network. 
169 | """ 170 | common_args = { 171 | 'batch_size': args.batch_size, 172 | 'image_min_side': args.image_min_side, 173 | 'image_max_side': args.image_max_side, 174 | 'preprocess_image': preprocess_image, 175 | } 176 | 177 | # create random transform generator for augmenting training data 178 | if args.random_transform: 179 | transform_generator = random_transform_generator( 180 | min_rotation=-0.1, 181 | max_rotation=0.1, 182 | min_translation=(-0.1, -0.1), 183 | max_translation=(0.1, 0.1), 184 | min_shear=-0.1, 185 | max_shear=0.1, 186 | min_scaling=(0.9, 0.9), 187 | max_scaling=(1.1, 1.1), 188 | flip_x_chance=0.5, 189 | flip_y_chance=0.5, 190 | ) 191 | else: 192 | transform_generator = random_transform_generator(flip_x_chance=0.5) 193 | 194 | if args.dataset_type == 'oid': 195 | train_generator = OpenImagesGenerator( 196 | args.main_dir, 197 | subset='train', 198 | version=args.version, 199 | labels_filter=args.labels_filter, 200 | annotation_cache_dir=args.annotation_cache_dir, 201 | parent_label=args.parent_label, 202 | transform_generator=transform_generator, 203 | **common_args 204 | ) 205 | 206 | validation_generator = OpenImagesGenerator( 207 | args.main_dir, 208 | subset='validation', 209 | version=args.version, 210 | labels_filter=args.labels_filter, 211 | annotation_cache_dir=args.annotation_cache_dir, 212 | parent_label=args.parent_label, 213 | **common_args 214 | ) 215 | 216 | else: 217 | raise ValueError('Invalid data type received: {}'.format(args.dataset_type)) 218 | 219 | return train_generator, validation_generator 220 | 221 | 222 | def train(args): 223 | # create object that stores backbone information 224 | backbone = model_backbone.backbone(args.backbone) 225 | 226 | # optionally choose specific GPU 227 | if args.gpu: 228 | os.environ['CUDA_VISIBLE_DEVICES'] = args.gpu 229 | keras.backend.tensorflow_backend.set_session(get_session()) 230 | 231 | # create the generators 232 | print("Going to get the training and validation generators...") 233 | train_generator, validation_generator = create_generators(args, backbone.preprocess_image) 234 | 235 | # create the model 236 | if args.snapshot is not None: 237 | print('Loading model: {} \nThis may take a second...'.format(args.snapshot)) 238 | model = model_backbone.load_model(args.snapshot, backbone_name=args.backbone) 239 | training_model = model 240 | prediction_model = retinanet_bbox(model=model) 241 | else: 242 | weights = args.weights 243 | # default to imagenet if nothing else is specified 244 | if weights is None and args.imagenet_weights: 245 | weights = backbone.download_imagenet() 246 | 247 | print('Creating model, this may take a second...') 248 | model, training_model, prediction_model = create_models( 249 | backbone_retinanet=backbone.retinanet, 250 | num_classes=train_generator.num_classes(), 251 | weights=weights, 252 | multi_gpu=args.multi_gpu, 253 | freeze_backbone=args.freeze_backbone 254 | ) 255 | 256 | # print model summary 257 | print(model.summary()) 258 | 259 | # create the callbacks 260 | callbacks = create_callbacks( 261 | model, 262 | training_model, 263 | prediction_model, 264 | validation_generator, 265 | args, 266 | ) 267 | 268 | # start training 269 | print("Started training...") 270 | training_model.fit_generator( 271 | generator=train_generator, 272 | steps_per_epoch=args.steps, 273 | epochs=args.epochs, 274 | verbose=1, 275 | callbacks=callbacks, 276 | ) 277 | 278 | print("Training Complete") 279 | -------------------------------------------------------------------------------- 
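The command-line entry point in trainer/task.py (the next file) only parses arguments and forwards them to model.train(). For readers who prefer to drive training from Python directly, the sketch below builds an equivalent argument namespace by hand; every field mirrors an option defined in the parser that follows, the values shown are the parser defaults, and the dataset path is a placeholder. This is an illustrative sketch, not a file in the repository.

# Illustrative sketch only: the same configuration that
# `python keras_retinanet/trainer/task.py oid <main_dir>` would produce,
# built by hand and passed straight to trainer/model.train().
from argparse import Namespace
from keras_retinanet.trainer import model

args = Namespace(
    dataset_type='oid',
    main_dir='/path/to/openimages',   # placeholder: directory holding images/ and challenge2018/
    version='challenge2018',
    labels_filter=None,
    annotation_cache_dir='.',
    parent_label=None,
    snapshot=None,
    imagenet_weights=True,            # download ImageNet weights for the backbone
    weights=None,
    backbone='resnet50',
    batch_size=1,
    gpu=None,
    multi_gpu=1,
    epochs=50,
    steps=100000,
    snapshot_path='./snapshots',
    tensorboard_dir='./logs',
    snapshots=True,
    evaluation=True,
    freeze_backbone=False,
    random_transform=False,
    image_min_side=600,
    image_max_side=600,
)

model.train(args)

Running trainer/task.py with the corresponding flags is the supported path; the namespace above simply makes the available knobs explicit.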
/keras_retinanet/trainer/task.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import sys 3 | import os 4 | 5 | 6 | if __name__ == '__main__' and __package__ is None: 7 | sys.path.insert(0, os.path.join(os.path.dirname(__file__), '..', '..')) 8 | __package__ = "keras_retinanet.trainer" 9 | 10 | from ..trainer import model # Your model.py file. 11 | 12 | """ Parse the arguments. 13 | """ 14 | parser = argparse.ArgumentParser(description='Simple training script for training a RetinaNet network.') 15 | subparsers = parser.add_subparsers(help='Arguments for specific dataset types.', dest='dataset_type') 16 | subparsers.required = True 17 | 18 | def csv_list(string): 19 | return string.split(',') 20 | 21 | oid_parser = subparsers.add_parser('oid') 22 | oid_parser.add_argument( 23 | 'main_dir', 24 | help='Path to dataset directory.' 25 | ) 26 | oid_parser.add_argument( 27 | '--version', 28 | help='The current dataset version is v4.', 29 | default='challenge2018' 30 | ) 31 | oid_parser.add_argument( 32 | '--labels-filter', 33 | help='A list of labels to filter.', 34 | type=csv_list, 35 | default=None 36 | ) 37 | oid_parser.add_argument( 38 | '--annotation-cache-dir', 39 | help='Path to store annotation cache.', 40 | default='.' 41 | ) 42 | oid_parser.add_argument( 43 | '--parent-label', 44 | help='Use the hierarchy children of this label.', 45 | default=None 46 | ) 47 | 48 | group = parser.add_mutually_exclusive_group() 49 | group.add_argument( 50 | '--snapshot', 51 | help='Resume training from a snapshot.' 52 | ) 53 | group.add_argument( 54 | '--imagenet-weights', 55 | help='Initialize the model with pretrained imagenet weights. This is the default behaviour.', 56 | action='store_const', 57 | const=True, 58 | default=True 59 | ) 60 | group.add_argument( 61 | '--weights', 62 | help='Initialize the model with weights from a file.' 63 | ) 64 | group.add_argument( 65 | '--no-weights', 66 | help='Don\'t initialize the model with any weights.', 67 | dest='imagenet_weights', 68 | action='store_const', 69 | const=False 70 | ) 71 | 72 | parser.add_argument( 73 | '--backbone', 74 | help='Backbone model used by retinanet.', 75 | default='resnet50', 76 | type=str 77 | ) 78 | parser.add_argument( 79 | '--batch-size', 80 | help='Size of the batches.', 81 | default=1, 82 | type=int 83 | ) 84 | parser.add_argument( 85 | '--gpu', 86 | help='Id of the GPU to use (as reported by nvidia-smi).' 
87 | ) 88 | parser.add_argument( 89 | '--multi-gpu', 90 | help='Number of GPUs to use for parallel processing.', 91 | type=int, 92 | default=1) 93 | parser.add_argument( 94 | '--multi-gpu-force', 95 | help='Extra flag needed to enable (experimental) multi-gpu support.', 96 | action='store_true' 97 | ) 98 | parser.add_argument( 99 | '--epochs', 100 | help='Number of epochs to train.', 101 | type=int, 102 | default=50 103 | ) 104 | parser.add_argument( 105 | '--steps', 106 | help='Number of steps per epoch.', 107 | type=int, 108 | default=100000 109 | ) 110 | parser.add_argument( 111 | '--snapshot-path', 112 | help="Path to store snapshots of models during training (defaults to \'./snapshots\')", 113 | default='./snapshots' 114 | ) 115 | parser.add_argument( 116 | '--tensorboard-dir', 117 | help='Log directory for Tensorboard output', 118 | default='./logs' 119 | ) 120 | parser.add_argument( 121 | '--no-snapshots', 122 | help='Disable saving snapshots.', 123 | dest='snapshots', 124 | action='store_false' 125 | ) 126 | parser.add_argument( 127 | '--no-evaluation', 128 | help='Disable per epoch evaluation.', 129 | dest='evaluation', 130 | action='store_false' 131 | ) 132 | parser.add_argument( 133 | '--freeze-backbone', 134 | help='Freeze training of backbone layers.', 135 | action='store_true' 136 | ) 137 | parser.add_argument( 138 | '--random-transform', 139 | help='Randomly transform image and annotations.', 140 | action='store_true' 141 | ) 142 | parser.add_argument( 143 | '--image-min-side', 144 | help='Rescale the image so the smallest side is min_side.', 145 | type=int, 146 | default=600 147 | ) 148 | parser.add_argument( 149 | '--image-max-side', 150 | help='Rescale the image if the largest side is larger than max_side.', 151 | type=int, 152 | default=600 153 | ) 154 | 155 | args = parser.parse_args() 156 | 157 | # Run the training job 158 | model.train(args) 159 | -------------------------------------------------------------------------------- /keras_retinanet/utils/anchors.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import keras 3 | 4 | 5 | def compute_overlap(boxes, query_boxes): 6 | """ 7 | Args 8 | a: (N, 4) ndarray of float 9 | b: (K, 4) ndarray of float 10 | Returns 11 | overlaps: (N, K) ndarray of overlap between boxes and query_boxes 12 | """ 13 | 14 | n_ = boxes.shape[0] 15 | k_ = query_boxes.shape[0] 16 | overlaps = np.zeros((n_, k_), dtype=np.float) 17 | for k in range(k_): 18 | query_box_area = (query_boxes[k, 2] - query_boxes[k, 0] + 1) * (query_boxes[k, 3] - query_boxes[k, 1] + 1) 19 | for n in range(n_): 20 | iw = min(boxes[n, 2], query_boxes[k, 2]) - max(boxes[n, 0], query_boxes[k, 0]) + 1 21 | if iw > 0: 22 | ih = min(boxes[n, 3], query_boxes[k, 3]) - max(boxes[n, 1], query_boxes[k, 1]) + 1 23 | if ih > 0: 24 | box_area = (boxes[n, 2] - boxes[n, 0] + 1) * (boxes[n, 3] - boxes[n, 1] + 1) 25 | all_area = float(box_area + query_box_area - iw * ih) 26 | overlaps[n, k] = iw * ih / all_area 27 | return overlaps 28 | 29 | 30 | def anchor_targets_bbox( 31 | anchors, 32 | image_group, 33 | annotations_group, 34 | num_classes, 35 | negative_overlap=0.3, 36 | positive_overlap=0.7 37 | ): 38 | """ Generate anchor targets for bbox detection. 39 | 40 | Args 41 | anchors: np.array of annotations of shape (N, 4) for (x1, y1, x2, y2). 42 | image_group: List of BGR images. 43 | annotations_group: List of annotations (np.array of shape (N, 5) for (x1, y1, x2, y2, label)). 44 | num_classes: Number of classes to predict. 
45 | mask_shape: If the image is padded with zeros, mask_shape can be used to mark the relevant part of the image. 46 | negative_overlap: IoU overlap for negative anchors (all anchors with overlap < negative_overlap are negative). 47 | positive_overlap: IoU overlap or positive anchors (all anchors with overlap > positive_overlap are positive). 48 | 49 | Returns 50 | labels_batch: batch that contains labels & anchor states (np.array of shape (batch_size, N, num_classes + 1), 51 | where N is the number of anchors for an image and the last column defines the anchor state 52 | (-1 for ignore, 0 for bg, 1 for fg). 53 | regression_batch: batch that contains bounding-box regression targets for an image & anchor states 54 | (np.array of shape (batch_size, N, 4 + 1), 55 | where N is the number of anchors for an image, the first 4 columns define regression targets for 56 | (x1, y1, x2, y2) and the 57 | last column defines anchor states (-1 for ignore, 0 for bg, 1 for fg). 58 | boxes_batch: box regression targets (np.array of shape (batch_size, N, num_classes + 1), where N is the number 59 | of anchors for an image) 60 | """ 61 | 62 | assert (len(image_group) == len(annotations_group)), "The length of the images and annotations need to be equal." 63 | assert (len(annotations_group) > 0), "No data received to compute anchor targets for." 64 | 65 | batch_size = len(image_group) 66 | 67 | regression_batch = np.zeros((batch_size, anchors.shape[0], 4 + 1), dtype=keras.backend.floatx()) 68 | labels_batch = np.zeros((batch_size, anchors.shape[0], num_classes + 1), dtype=keras.backend.floatx()) 69 | boxes_batch = np.zeros((batch_size, anchors.shape[0], annotations_group[0].shape[1]), dtype=keras.backend.floatx()) 70 | 71 | # compute labels and regression targets 72 | for index, (image, annotations) in enumerate(zip(image_group, annotations_group)): 73 | if annotations.shape[0]: 74 | # obtain indices of gt annotations with the greatest overlap 75 | positive_indices, ignore_indices, argmax_overlaps_inds = compute_gt_annotations(anchors, 76 | annotations, 77 | negative_overlap, 78 | positive_overlap) 79 | 80 | labels_batch[index, ignore_indices, -1] = -1 81 | labels_batch[index, positive_indices, -1] = 1 82 | 83 | regression_batch[index, ignore_indices, -1] = -1 84 | regression_batch[index, positive_indices, -1] = 1 85 | 86 | # compute box regression targets 87 | annotations = annotations[argmax_overlaps_inds] 88 | boxes_batch[index, ...] = annotations 89 | 90 | # compute target class labels 91 | labels_batch[index, positive_indices, annotations[positive_indices, 4].astype(int)] = 1 92 | 93 | regression_batch[index, :, :-1] = bbox_transform(anchors, annotations) 94 | 95 | # ignore annotations outside of image 96 | if image.shape: 97 | anchors_centers = np.vstack([(anchors[:, 0] + anchors[:, 2]) / 2, (anchors[:, 1] + anchors[:, 3]) / 2]).T 98 | indices = np.logical_or(anchors_centers[:, 0] >= image.shape[1], anchors_centers[:, 1] >= image.shape[0]) 99 | 100 | labels_batch[index, indices, -1] = - 1 101 | regression_batch[index, indices, -1] = -1 102 | 103 | return labels_batch, regression_batch, boxes_batch 104 | 105 | 106 | def compute_gt_annotations( 107 | anchors, 108 | annotations, 109 | negative_overlap=0.3, 110 | positive_overlap=0.7 111 | ): 112 | """ Obtain indices of gt annotations with the greatest overlap. 113 | 114 | Args 115 | anchors: np.array of annotations of shape (N, 4) for (x1, y1, x2, y2). 116 | annotations: np.array of shape (N, 5) for (x1, y1, x2, y2, label). 
117 | negative_overlap: IoU overlap for negative anchors (all anchors with overlap < negative_overlap are negative). 118 | positive_overlap: IoU overlap or positive anchors (all anchors with overlap > positive_overlap are positive). 119 | 120 | Returns 121 | positive_indices: indices of positive anchors 122 | ignore_indices: indices of ignored anchors 123 | argmax_overlaps_inds: ordered overlaps indices 124 | """ 125 | 126 | overlaps = compute_overlap(anchors.astype(np.float64), annotations.astype(np.float64)) 127 | argmax_overlaps_inds = np.argmax(overlaps, axis=1) 128 | max_overlaps = overlaps[np.arange(overlaps.shape[0]), argmax_overlaps_inds] 129 | 130 | # assign "dont care" labels 131 | positive_indices = max_overlaps >= positive_overlap 132 | ignore_indices = (max_overlaps > negative_overlap) & ~positive_indices 133 | 134 | return positive_indices, ignore_indices, argmax_overlaps_inds 135 | 136 | 137 | def layer_shapes(image_shape, model): 138 | """Compute layer shapes given input image shape and the model. 139 | 140 | Args 141 | image_shape: The shape of the image. 142 | model: The model to use for computing how the image shape is transformed in the pyramid. 143 | 144 | Returns 145 | A dictionary mapping layer names to image shapes. 146 | """ 147 | shape = { 148 | model.layers[0].name: (None,) + image_shape, 149 | } 150 | 151 | for layer in model.layers[1:]: 152 | nodes = layer._inbound_nodes 153 | for node in nodes: 154 | inputs = [shape[lr.name] for lr in node.inbound_layers] 155 | if not inputs: 156 | continue 157 | shape[layer.name] = layer.compute_output_shape(inputs[0] if len(inputs) == 1 else inputs) 158 | 159 | return shape 160 | 161 | 162 | def make_shapes_callback(model): 163 | """ Make a function for getting the shape of the pyramid levels. 164 | """ 165 | 166 | def get_shapes(image_shape, pyramid_levels): 167 | shape = layer_shapes(image_shape, model) 168 | image_shapes = [shape["P{}".format(level)][1:3] for level in pyramid_levels] 169 | return image_shapes 170 | 171 | return get_shapes 172 | 173 | 174 | def guess_shapes(image_shape, pyramid_levels): 175 | """Guess shapes based on pyramid levels. 176 | 177 | Args 178 | image_shape: The shape of the image. 179 | pyramid_levels: A list of what pyramid levels are used. 180 | 181 | Returns 182 | A list of image shapes at each pyramid level. 183 | """ 184 | image_shape = np.array(image_shape[:2]) 185 | image_shapes = [(image_shape + 2 ** x - 1) // (2 ** x) for x in pyramid_levels] 186 | return image_shapes 187 | 188 | 189 | def anchors_for_shape( 190 | image_shape, 191 | pyramid_levels=None, 192 | ratios=None, 193 | scales=None, 194 | strides=None, 195 | sizes=None, 196 | shapes_callback=None, 197 | ): 198 | """ Generators anchors for a given shape. 199 | 200 | Args 201 | image_shape: The shape of the image. 202 | pyramid_levels: List of ints representing which pyramids to use (defaults to [3, 4, 5, 6, 7]). 203 | ratios: List of ratios with which anchors are generated (defaults to [0.5, 1, 2]). 204 | scales: List of scales with which anchors are generated (defaults to [2^0, 2^(1/3), 2^(2/3)]). 205 | strides: Stride per pyramid level, defines how the pyramids are constructed. 206 | sizes: Sizes of the anchors per pyramid level. 207 | shapes_callback: Function to call for getting the shape of the image at different pyramid levels. 208 | 209 | Returns 210 | np.array of shape (N, 4) containing the (x1, y1, x2, y2) coordinates for the anchors. 
211 | """ 212 | if pyramid_levels is None: 213 | pyramid_levels = [3, 4, 5, 6, 7] 214 | if strides is None: 215 | strides = [2 ** x for x in pyramid_levels] 216 | if sizes is None: 217 | sizes = [2 ** (x + 2) for x in pyramid_levels] 218 | if ratios is None: 219 | ratios = np.array([0.5, 1, 2]) 220 | if scales is None: 221 | scales = np.array([2 ** 0, 2 ** (1.0 / 3.0), 2 ** (2.0 / 3.0)]) 222 | 223 | if shapes_callback is None: 224 | shapes_callback = guess_shapes 225 | image_shapes = shapes_callback(image_shape, pyramid_levels) 226 | 227 | # compute anchors over all pyramid levels 228 | all_anchors = np.zeros((0, 4)) 229 | for idx, p in enumerate(pyramid_levels): 230 | anchors = generate_anchors(base_size=sizes[idx], ratios=ratios, scales=scales) 231 | shifted_anchors = shift(image_shapes[idx], strides[idx], anchors) 232 | all_anchors = np.append(all_anchors, shifted_anchors, axis=0) 233 | 234 | return all_anchors 235 | 236 | 237 | def shift(shape, stride, anchors): 238 | """ Produce shifted anchors based on shape of the map and stride size. 239 | 240 | Args 241 | shape : Shape to shift the anchors over. 242 | stride : Stride to shift the anchors with over the shape. 243 | anchors: The anchors to apply at each location. 244 | """ 245 | shift_x = (np.arange(0, shape[1]) + 0.5) * stride 246 | shift_y = (np.arange(0, shape[0]) + 0.5) * stride 247 | 248 | shift_x, shift_y = np.meshgrid(shift_x, shift_y) 249 | 250 | shifts = np.vstack(( 251 | shift_x.ravel(), shift_y.ravel(), 252 | shift_x.ravel(), shift_y.ravel() 253 | )).transpose() 254 | 255 | # add A anchors (1, A, 4) to 256 | # cell K shifts (K, 1, 4) to get 257 | # shift anchors (K, A, 4) 258 | # reshape to (K*A, 4) shifted anchors 259 | A = anchors.shape[0] 260 | K = shifts.shape[0] 261 | all_anchors = (anchors.reshape((1, A, 4)) + shifts.reshape((1, K, 4)).transpose((1, 0, 2))) 262 | all_anchors = all_anchors.reshape((K * A, 4)) 263 | 264 | return all_anchors 265 | 266 | 267 | def generate_anchors(base_size=16, ratios=None, scales=None): 268 | """ 269 | Generate anchor (reference) windows by enumerating aspect ratios X 270 | scales w.r.t. a reference window. 271 | """ 272 | 273 | if ratios is None: 274 | ratios = np.array([0.5, 1, 2]) 275 | 276 | if scales is None: 277 | scales = np.array([2 ** 0, 2 ** (1.0 / 3.0), 2 ** (2.0 / 3.0)]) 278 | 279 | num_anchors = len(ratios) * len(scales) 280 | 281 | # initialize output anchors 282 | anchors = np.zeros((num_anchors, 4)) 283 | 284 | # scale base_size 285 | anchors[:, 2:] = base_size * np.tile(scales, (2, len(ratios))).T 286 | 287 | # compute areas of anchors 288 | areas = anchors[:, 2] * anchors[:, 3] 289 | 290 | # correct for ratios 291 | anchors[:, 2] = np.sqrt(areas / np.repeat(ratios, len(scales))) 292 | anchors[:, 3] = anchors[:, 2] * np.repeat(ratios, len(scales)) 293 | 294 | # transform from (x_ctr, y_ctr, w, h) -> (x1, y1, x2, y2) 295 | anchors[:, 0::2] -= np.tile(anchors[:, 2] * 0.5, (2, 1)).T 296 | anchors[:, 1::2] -= np.tile(anchors[:, 3] * 0.5, (2, 1)).T 297 | 298 | return anchors 299 | 300 | 301 | def bbox_transform(anchors, gt_boxes, mean=None, std=None): 302 | """Compute bounding-box regression targets for an image.""" 303 | 304 | if mean is None: 305 | mean = np.array([0, 0, 0, 0]) 306 | if std is None: 307 | std = np.array([0.2, 0.2, 0.2, 0.2]) 308 | 309 | if isinstance(mean, (list, tuple)): 310 | mean = np.array(mean) 311 | elif not isinstance(mean, np.ndarray): 312 | raise ValueError('Expected mean to be a np.ndarray, list or tuple. 
Received: {}'.format(type(mean))) 313 | 314 | if isinstance(std, (list, tuple)): 315 | std = np.array(std) 316 | elif not isinstance(std, np.ndarray): 317 | raise ValueError('Expected std to be a np.ndarray, list or tuple. Received: {}'.format(type(std))) 318 | 319 | anchor_widths = anchors[:, 2] - anchors[:, 0] 320 | anchor_heights = anchors[:, 3] - anchors[:, 1] 321 | 322 | targets_dx1 = (gt_boxes[:, 0] - anchors[:, 0]) / anchor_widths 323 | targets_dy1 = (gt_boxes[:, 1] - anchors[:, 1]) / anchor_heights 324 | targets_dx2 = (gt_boxes[:, 2] - anchors[:, 2]) / anchor_widths 325 | targets_dy2 = (gt_boxes[:, 3] - anchors[:, 3]) / anchor_heights 326 | 327 | targets = np.stack((targets_dx1, targets_dy1, targets_dx2, targets_dy2)) 328 | targets = targets.T 329 | 330 | targets = (targets - mean) / std 331 | 332 | return targets 333 | -------------------------------------------------------------------------------- /keras_retinanet/utils/clean.py: -------------------------------------------------------------------------------- 1 | # make a list with the downloaded images 2 | import pandas as pd 3 | import os 4 | 5 | path_to_datafiles = "/home/mukeshmithrakumar/googleai/gcmount/challenge2018/" 6 | 7 | # code to load the train annotation box 8 | print("Reading challenge-2018-train-annotations-bbox.csv ....") 9 | challenge_2018_train_annotations_bbox = pd.read_csv(path_to_datafiles + "challenge-2018-train-annotations-bbox2.csv") 10 | challenge_2018_train_annotations_bbox = pd.DataFrame(challenge_2018_train_annotations_bbox) 11 | print("challenge_2018_train_annotations_bbox shape:", challenge_2018_train_annotations_bbox.shape) 12 | 13 | # code to load the validation annotation box 14 | print("Reading challenge-2018-image-ids-valset-od.csv ....") 15 | challenge_2018_image_ids_valset_od = pd.read_csv(path_to_datafiles + "challenge-2018-image-ids-valset-od2.csv") 16 | challenge_2018_image_ids_valset_od = pd.DataFrame(challenge_2018_image_ids_valset_od) 17 | print("challenge_2018_image_ids_valset_od shape:", challenge_2018_image_ids_valset_od.shape) 18 | 19 | # goes to the directory of the train/val images and creates a list of the downloaded images 20 | directory = "/home/mukeshmithrakumar/googleai/gcmount/images/train/train" 21 | downloaded_list = [] 22 | print("Parsing downloaded files ....") 23 | for filename in os.listdir(directory): 24 | downloaded_list.append(filename) 25 | print("downloaded files: ", len(downloaded_list)) 26 | 27 | # strips the imgs of the .jpg tag, see if you can add it to the for loop above 28 | downloaded_list = [imgs.strip('.jpg') for imgs in downloaded_list] 29 | 30 | # create a new df with descriptions from annotations for the downloaded images 31 | print("Creating new dataframes ....") 32 | train_annotations_bbox_downloaded_df_train = challenge_2018_train_annotations_bbox[ 33 | challenge_2018_train_annotations_bbox['ImageID'].isin(downloaded_list)] 34 | 35 | val_annotations_bbox_downloaded_df_train = challenge_2018_image_ids_valset_od[ 36 | challenge_2018_image_ids_valset_od['ImageID'].isin(downloaded_list)] 37 | 38 | print("challenge-2018-train-annotations-bbox shape:", train_annotations_bbox_downloaded_df_train.shape) 39 | print("challenge-2018-image-ids-valset-od shape:", val_annotations_bbox_downloaded_df_train.shape) 40 | 41 | # exported the data to csv 42 | print("Exporting the csv files ....") 43 | train_annotations_bbox_downloaded_df_train.to_csv(path_to_datafiles 44 | + 'challenge-2018-train-annotations-bbox.csv', index=False) 45 | 
val_annotations_bbox_downloaded_df_train.to_csv(path_to_datafiles 46 | + 'challenge-2018-image-ids-valset-od.csv', index=False) 47 | -------------------------------------------------------------------------------- /keras_retinanet/utils/freeze.py: -------------------------------------------------------------------------------- 1 | def freeze(model): 2 | """ Set all layers in a model to non-trainable. 3 | 4 | The weights for these layers will not be updated during training. 5 | 6 | This function modifies the given model in-place, 7 | but it also returns the modified model to allow easy chaining with other functions. 8 | """ 9 | for layer in model.layers: 10 | layer.trainable = False 11 | return model 12 | -------------------------------------------------------------------------------- /keras_retinanet/utils/initializers.py: -------------------------------------------------------------------------------- 1 | import keras 2 | import numpy as np 3 | import math 4 | 5 | 6 | class PriorProbability(keras.initializers.Initializer): 7 | """ Apply a prior probability to the weights. 8 | """ 9 | 10 | def __init__(self, probability=0.01): 11 | self.probability = probability 12 | 13 | def get_config(self): 14 | return { 15 | 'probability': self.probability 16 | } 17 | 18 | def __call__(self, shape, dtype=None): 19 | # set bias to -log((1 - p)/p) for foreground 20 | result = np.ones(shape, dtype=dtype) * -math.log((1 - self.probability) / self.probability) 21 | 22 | return result 23 | -------------------------------------------------------------------------------- /keras_retinanet/utils/layers.py: -------------------------------------------------------------------------------- 1 | import keras 2 | from ..utils import anchors as utils_anchors 3 | import numpy as np 4 | import tensorflow as tf 5 | 6 | 7 | def filter_detections( 8 | boxes, 9 | classification, 10 | other=[], 11 | class_specific_filter=True, 12 | nms=True, 13 | score_threshold=0.05, 14 | max_detections=300, 15 | nms_threshold=0.5 16 | ): 17 | """ Filter detections using the boxes and classification values. 18 | Args 19 | boxes : Tensor of shape (num_boxes, 4) containing the boxes in (x1, y1, x2, y2) format. 20 | classification : Tensor of shape (num_boxes, num_classes) containing the classification scores. 21 | other : List of tensors of shape (num_boxes, ...) to filter along with the boxes and 22 | classification scores. 23 | class_specific_filter : Whether to perform filtering per class, or take the best scoring class and filter those. 24 | nms : Flag to enable/disable non maximum suppression. 25 | score_threshold : Threshold used to prefilter the boxes with. 26 | max_detections : Maximum number of detections to keep. 27 | nms_threshold : Threshold for the IoU value to determine when a box should be suppressed. 28 | Returns 29 | A list of [boxes, scores, labels, other[0], other[1], ...]. 30 | boxes is shaped (max_detections, 4) and contains the (x1, y1, x2, y2) of the non-suppressed boxes. 31 | scores is shaped (max_detections,) and contains the scores of the predicted class. 32 | labels is shaped (max_detections,) and contains the predicted label. 33 | other[i] is shaped (max_detections, ...) and contains the filtered other[i] data. 34 | In case there are less than max_detections detections, the tensors are padded with -1's. 
35 | """ 36 | 37 | def _filter_detections(scores, labels): 38 | # threshold based on score 39 | indices = tf.where(keras.backend.greater(scores, score_threshold)) 40 | 41 | if nms: 42 | filtered_boxes = tf.gather_nd(boxes, indices) 43 | filtered_scores = keras.backend.gather(scores, indices)[:, 0] 44 | 45 | # perform NMS 46 | nms_indices = tf.image.non_max_suppression(filtered_boxes, filtered_scores, max_output_size=max_detections, 47 | iou_threshold=nms_threshold) 48 | 49 | # filter indices based on NMS 50 | indices = keras.backend.gather(indices, nms_indices) 51 | 52 | # add indices to list of all indices 53 | labels = tf.gather_nd(labels, indices) 54 | indices = keras.backend.stack([indices[:, 0], labels], axis=1) 55 | 56 | return indices 57 | 58 | if class_specific_filter: 59 | all_indices = [] 60 | # perform per class filtering 61 | for c in range(int(classification.shape[1])): 62 | scores = classification[:, c] 63 | labels = c * keras.backend.ones((keras.backend.shape(scores)[0],), dtype='int64') 64 | all_indices.append(_filter_detections(scores, labels)) 65 | 66 | # concatenate indices to single tensor 67 | indices = keras.backend.concatenate(all_indices, axis=0) 68 | else: 69 | scores = keras.backend.max(classification, axis=1) 70 | labels = keras.backend.argmax(classification, axis=1) 71 | indices = _filter_detections(scores, labels) 72 | 73 | # select top k 74 | scores = tf.gather_nd(classification, indices) 75 | labels = indices[:, 1] 76 | scores, top_indices = tf.nn.top_k(scores, k=keras.backend.minimum(max_detections, keras.backend.shape(scores)[0])) 77 | 78 | # filter input using the final set of indices 79 | indices = keras.backend.gather(indices[:, 0], top_indices) 80 | boxes = keras.backend.gather(boxes, indices) 81 | labels = keras.backend.gather(labels, top_indices) 82 | other_ = [keras.backend.gather(o, indices) for o in other] 83 | 84 | # zero pad the outputs 85 | pad_size = keras.backend.maximum(0, max_detections - keras.backend.shape(scores)[0]) 86 | boxes = tf.pad(boxes, [[0, pad_size], [0, 0]], constant_values=-1) 87 | scores = tf.pad(scores, [[0, pad_size]], constant_values=-1) 88 | labels = tf.pad(labels, [[0, pad_size]], constant_values=-1) 89 | labels = keras.backend.cast(labels, 'int32') 90 | other_ = [tf.pad(o, [[0, pad_size]] + [[0, 0] for _ in range(1, len(o.shape))], constant_values=-1) for o in 91 | other_] 92 | 93 | # set shapes, since we know what they are 94 | boxes.set_shape([max_detections, 4]) 95 | scores.set_shape([max_detections]) 96 | labels.set_shape([max_detections]) 97 | for o, s in zip(other_, [list(keras.backend.int_shape(o)) for o in other]): 98 | o.set_shape([max_detections] + s[1:]) 99 | 100 | return [boxes, scores, labels] + other_ 101 | 102 | 103 | class FilterDetections(keras.layers.Layer): 104 | """ Keras layer for filtering detections using score threshold and NMS. 105 | """ 106 | 107 | def __init__( 108 | self, 109 | nms=True, 110 | class_specific_filter=True, 111 | nms_threshold=0.5, 112 | score_threshold=0.05, 113 | max_detections=300, 114 | parallel_iterations=32, 115 | **kwargs 116 | ): 117 | """ Filters detections using score threshold, NMS and selecting the top-k detections. 118 | Args 119 | nms : Flag to enable/disable NMS. 120 | class_specific_filter : Whether to perform filtering per class, or take the best scoring class and filter 121 | those. 122 | nms_threshold : Threshold for the IoU value to determine when a box should be suppressed. 123 | score_threshold : Threshold used to prefilter the boxes with. 
124 | max_detections : Maximum number of detections to keep. 125 | parallel_iterations : Number of batch items to process in parallel. 126 | """ 127 | self.nms = nms 128 | self.class_specific_filter = class_specific_filter 129 | self.nms_threshold = nms_threshold 130 | self.score_threshold = score_threshold 131 | self.max_detections = max_detections 132 | self.parallel_iterations = parallel_iterations 133 | super(FilterDetections, self).__init__(**kwargs) 134 | 135 | def call(self, inputs, **kwargs): 136 | """ Constructs the NMS graph. 137 | Args 138 | inputs : List of [boxes, classification, other[0], other[1], ...] tensors. 139 | """ 140 | boxes = inputs[0] 141 | classification = inputs[1] 142 | other = inputs[2:] 143 | 144 | # wrap nms with our parameters 145 | def _filter_detections(args): 146 | boxes = args[0] 147 | classification = args[1] 148 | other = args[2] 149 | 150 | return filter_detections( 151 | boxes, 152 | classification, 153 | other, 154 | nms=self.nms, 155 | class_specific_filter=self.class_specific_filter, 156 | score_threshold=self.score_threshold, 157 | max_detections=self.max_detections, 158 | nms_threshold=self.nms_threshold, 159 | ) 160 | 161 | # call filter_detections on each batch 162 | outputs = tf.map_fn( 163 | _filter_detections, 164 | elems=[boxes, classification, other], 165 | dtype=[keras.backend.floatx(), keras.backend.floatx(), 'int32'] + [o.dtype for o in other], 166 | parallel_iterations=self.parallel_iterations 167 | ) 168 | 169 | return outputs 170 | 171 | def compute_output_shape(self, input_shape): 172 | """ Computes the output shapes given the input shapes. 173 | Args 174 | input_shape : List of input shapes [boxes, classification, other[0], other[1], ...]. 175 | Returns 176 | List of tuples representing the output shapes: 177 | [filtered_boxes.shape, filtered_scores.shape, filtered_labels.shape, filtered_other[0].shape, 178 | filtered_other[1].shape, ...] 179 | """ 180 | return [(input_shape[0][0], self.max_detections, 4), 181 | (input_shape[1][0], self.max_detections), 182 | (input_shape[1][0], self.max_detections), 183 | ] +[tuple([input_shape[i][0], self.max_detections] + 184 | list(input_shape[i][2:])) for i in range(2, len(input_shape))] 185 | 186 | def compute_mask(self, inputs, mask=None): 187 | """ This is required in Keras when there is more than 1 output. 188 | """ 189 | return (len(inputs) + 1) * [None] 190 | 191 | def get_config(self): 192 | """ Gets the configuration of this layer. 193 | Returns 194 | Dictionary containing the parameters of this layer. 195 | """ 196 | config = super(FilterDetections, self).get_config() 197 | config.update({ 198 | 'nms': self.nms, 199 | 'class_specific_filter': self.class_specific_filter, 200 | 'nms_threshold': self.nms_threshold, 201 | 'score_threshold': self.score_threshold, 202 | 'max_detections': self.max_detections, 203 | 'parallel_iterations': self.parallel_iterations, 204 | }) 205 | 206 | return config 207 | 208 | 209 | def shift(shape, stride, anchors): 210 | """ Produce shifted anchors based on shape of the map and stride size. 211 | Args 212 | shape : Shape to shift the anchors over. 213 | stride : Stride to shift the anchors with over the shape. 214 | anchors: The anchors to apply at each location. 
215 | """ 216 | shift_x = (keras.backend.arange(0, shape[1], dtype=keras.backend.floatx()) 217 | + keras.backend.constant(0.5, dtype=keras.backend.floatx())) * stride 218 | shift_y = (keras.backend.arange(0, shape[0], dtype=keras.backend.floatx()) 219 | + keras.backend.constant(0.5, dtype=keras.backend.floatx())) * stride 220 | 221 | shift_x, shift_y = tf.meshgrid(shift_x, shift_y) 222 | shift_x = keras.backend.reshape(shift_x, [-1]) 223 | shift_y = keras.backend.reshape(shift_y, [-1]) 224 | 225 | shifts = keras.backend.stack([ 226 | shift_x, 227 | shift_y, 228 | shift_x, 229 | shift_y 230 | ], axis=0) 231 | 232 | shifts = keras.backend.transpose(shifts) 233 | number_of_anchors = keras.backend.shape(anchors)[0] 234 | 235 | k = keras.backend.shape(shifts)[0] # number of base points = feat_h * feat_w 236 | 237 | shifted_anchors = keras.backend.reshape(anchors, [1, number_of_anchors, 4]) + keras.backend.cast( 238 | keras.backend.reshape(shifts, [k, 1, 4]), keras.backend.floatx()) 239 | shifted_anchors = keras.backend.reshape(shifted_anchors, [k * number_of_anchors, 4]) 240 | 241 | return shifted_anchors 242 | 243 | 244 | def resize_images(images, size, method='bilinear', align_corners=False): 245 | """ See https://www.tensorflow.org/versions/master/api_docs/python/tf/image/resize_images . 246 | Args 247 | method: The method used for interpolation. One of ('bilinear', 'nearest', 'bicubic', 'area'). 248 | """ 249 | methods = { 250 | 'bilinear': tf.image.ResizeMethod.BILINEAR, 251 | 'nearest': tf.image.ResizeMethod.NEAREST_NEIGHBOR, 252 | 'bicubic': tf.image.ResizeMethod.BICUBIC, 253 | 'area': tf.image.ResizeMethod.AREA, 254 | } 255 | return tf.image.resize_images(images, size, methods[method], align_corners) 256 | 257 | 258 | def bbox_transform_inv(boxes, deltas, mean=None, std=None): 259 | """ Applies deltas (usually regression results) to boxes (usually anchors). 260 | Before applying the deltas to the boxes, the normalization that was previously applied (in the generator) has to 261 | be removed. 262 | The mean and std are the mean and std as applied in the generator. They are unnormalized in this function and then 263 | applied to the boxes. 264 | Args 265 | boxes : np.array of shape (B, N, 4), where B is the batch size, N the number of boxes and 4 values for 266 | (x1, y1, x2, y2). 267 | deltas: np.array of same shape as boxes. These deltas (d_x1, d_y1, d_x2, d_y2) are a factor of the width/height. 268 | mean : The mean value used when computing deltas (defaults to [0, 0, 0, 0]). 269 | std : The standard deviation used when computing deltas (defaults to [0.2, 0.2, 0.2, 0.2]). 270 | Returns 271 | A np.array of the same shape as boxes, but with deltas applied to each box. 272 | The mean and std are used during training to normalize the regression values (networks love normalization). 
273 | """ 274 | if mean is None: 275 | mean = [0, 0, 0, 0] 276 | if std is None: 277 | std = [0.2, 0.2, 0.2, 0.2] 278 | 279 | width = boxes[:, :, 2] - boxes[:, :, 0] 280 | height = boxes[:, :, 3] - boxes[:, :, 1] 281 | 282 | x1 = boxes[:, :, 0] + (deltas[:, :, 0] * std[0] + mean[0]) * width 283 | y1 = boxes[:, :, 1] + (deltas[:, :, 1] * std[1] + mean[1]) * height 284 | x2 = boxes[:, :, 2] + (deltas[:, :, 2] * std[2] + mean[2]) * width 285 | y2 = boxes[:, :, 3] + (deltas[:, :, 3] * std[3] + mean[3]) * height 286 | 287 | pred_boxes = keras.backend.stack([x1, y1, x2, y2], axis=2) 288 | 289 | return pred_boxes 290 | 291 | 292 | class Anchors(keras.layers.Layer): 293 | """ Keras layer for generating achors for a given shape. 294 | """ 295 | 296 | def __init__(self, size, stride, ratios=None, scales=None, *args, **kwargs): 297 | """ Initializer for an Anchors layer. 298 | 299 | Args 300 | size: The base size of the anchors to generate. 301 | stride: The stride of the anchors to generate. 302 | ratios: The ratios of the anchors to generate (defaults to [0.5, 1, 2]). 303 | scales: The scales of the anchors to generate (defaults to [2^0, 2^(1/3), 2^(2/3)]). 304 | """ 305 | self.size = size 306 | self.stride = stride 307 | self.ratios = ratios 308 | self.scales = scales 309 | 310 | if ratios is None: 311 | self.ratios = np.array([0.5, 1, 2], keras.backend.floatx()), 312 | elif isinstance(ratios, list): 313 | self.ratios = np.array(ratios) 314 | if scales is None: 315 | self.scales = np.array([2 ** 0, 2 ** (1.0 / 3.0), 2 ** (2.0 / 3.0)], keras.backend.floatx()), 316 | elif isinstance(scales, list): 317 | self.scales = np.array(scales) 318 | 319 | self.num_anchors = len(ratios) * len(scales) 320 | self.anchors = keras.backend.variable(utils_anchors.generate_anchors( 321 | base_size=size, 322 | ratios=ratios, 323 | scales=scales, 324 | )) 325 | 326 | super(Anchors, self).__init__(*args, **kwargs) 327 | 328 | def call(self, inputs, **kwargs): 329 | features = inputs 330 | features_shape = keras.backend.shape(features)[:3] 331 | 332 | # generate proposals from bbox deltas and shifted anchors 333 | anchors = shift(features_shape[1:3], self.stride, self.anchors) 334 | anchors = keras.backend.tile(keras.backend.expand_dims(anchors, axis=0), (features_shape[0], 1, 1)) 335 | 336 | return anchors 337 | 338 | def compute_output_shape(self, input_shape): 339 | if None not in input_shape[1:]: 340 | total = np.prod(input_shape[1:3]) * self.num_anchors 341 | return (input_shape[0], total, 4) 342 | else: 343 | return (input_shape[0], None, 4) 344 | 345 | def get_config(self): 346 | config = super(Anchors, self).get_config() 347 | config.update({ 348 | 'size': self.size, 349 | 'stride': self.stride, 350 | 'ratios': self.ratios.tolist(), 351 | 'scales': self.scales.tolist(), 352 | }) 353 | 354 | return config 355 | 356 | 357 | class UpsampleLike(keras.layers.Layer): 358 | """ Keras layer for upsampling a Tensor to be the same shape as another Tensor. 359 | """ 360 | 361 | def call(self, inputs, **kwargs): 362 | source, target = inputs 363 | target_shape = keras.backend.shape(target) 364 | return resize_images(source, (target_shape[1], target_shape[2]), method='nearest') 365 | 366 | def compute_output_shape(self, input_shape): 367 | return (input_shape[0][0],) + input_shape[1][1:3] + (input_shape[0][-1],) 368 | 369 | 370 | class RegressBoxes(keras.layers.Layer): 371 | """ Keras layer for applying regression values to boxes. 
372 | """ 373 | 374 | def __init__(self, mean=None, std=None, *args, **kwargs): 375 | """ Initializer for the RegressBoxes layer. 376 | 377 | Args 378 | mean: The mean value of the regression values which was used for normalization. 379 | std: The standard value of the regression values which was used for normalization. 380 | """ 381 | if mean is None: 382 | mean = np.array([0, 0, 0, 0]) 383 | if std is None: 384 | std = np.array([0.2, 0.2, 0.2, 0.2]) 385 | 386 | if isinstance(mean, (list, tuple)): 387 | mean = np.array(mean) 388 | elif not isinstance(mean, np.ndarray): 389 | raise ValueError('Expected mean to be a np.ndarray, list or tuple. Received: {}'.format(type(mean))) 390 | 391 | if isinstance(std, (list, tuple)): 392 | std = np.array(std) 393 | elif not isinstance(std, np.ndarray): 394 | raise ValueError('Expected std to be a np.ndarray, list or tuple. Received: {}'.format(type(std))) 395 | 396 | self.mean = mean 397 | self.std = std 398 | super(RegressBoxes, self).__init__(*args, **kwargs) 399 | 400 | def call(self, inputs, **kwargs): 401 | anchors, regression = inputs 402 | return bbox_transform_inv(anchors, regression, mean=self.mean, std=self.std) 403 | 404 | def compute_output_shape(self, input_shape): 405 | return input_shape[0] 406 | 407 | def get_config(self): 408 | config = super(RegressBoxes, self).get_config() 409 | config.update({ 410 | 'mean': self.mean.tolist(), 411 | 'std': self.std.tolist(), 412 | }) 413 | 414 | return config 415 | 416 | 417 | class ClipBoxes(keras.layers.Layer): 418 | """ Keras layer to clip box values to lie inside a given shape. 419 | """ 420 | 421 | def call(self, inputs, **kwargs): 422 | image, boxes = inputs 423 | shape = keras.backend.cast(keras.backend.shape(image), keras.backend.floatx()) 424 | 425 | x1 = tf.clip_by_value(boxes[:, :, 0], 0, shape[2]) 426 | y1 = tf.clip_by_value(boxes[:, :, 1], 0, shape[1]) 427 | x2 = tf.clip_by_value(boxes[:, :, 2], 0, shape[2]) 428 | y2 = tf.clip_by_value(boxes[:, :, 3], 0, shape[1]) 429 | 430 | return keras.backend.stack([x1, y1, x2, y2], axis=2) 431 | 432 | def compute_output_shape(self, input_shape): 433 | return input_shape[1] 434 | -------------------------------------------------------------------------------- /keras_retinanet/utils/losses.py: -------------------------------------------------------------------------------- 1 | import keras 2 | import tensorflow as tf 3 | 4 | 5 | def focal(alpha=0.25, gamma=2.0): 6 | """ Create a functor for computing the classification focal loss. 7 | Args 8 | alpha: Scale the focal weight with alpha. 9 | gamma: Take the power of the focal weight with gamma. 10 | Returns 11 | A functor that computes the focal loss using the alpha and gamma. 12 | """ 13 | 14 | def _focal(y_true, y_pred): 15 | """ Compute the focal loss given the target tensor and the predicted tensor. 16 | As defined in https://arxiv.org/abs/1708.02002 17 | Args 18 | y_true: Tensor of target data from the generator with shape (B, N, num_classes). 19 | y_pred: Tensor of predicted data from the network with shape (B, N, num_classes). 20 | Returns 21 | The focal loss of y_pred w.r.t. y_true. 
22 | """ 23 | labels = y_true[:, :, :-1] 24 | anchor_state = y_true[:, :, -1] # -1 for ignore, 0 for background, 1 for object 25 | classification = y_pred 26 | 27 | # filter out "ignore" anchors 28 | indices = tf.where(keras.backend.not_equal(anchor_state, -1)) 29 | labels = tf.gather_nd(labels, indices) 30 | classification = tf.gather_nd(classification, indices) 31 | 32 | # compute the focal loss 33 | alpha_factor = keras.backend.ones_like(labels) * alpha 34 | alpha_factor = tf.where(keras.backend.equal(labels, 1), alpha_factor, 1 - alpha_factor) 35 | focal_weight = tf.where(keras.backend.equal(labels, 1), 1 - classification, classification) 36 | focal_weight = alpha_factor * focal_weight ** gamma 37 | 38 | cls_loss = focal_weight * keras.backend.binary_crossentropy(labels, classification) 39 | 40 | # compute the normalizer: the number of positive anchors 41 | normalizer = tf.where(keras.backend.equal(anchor_state, 1)) 42 | normalizer = keras.backend.cast(keras.backend.shape(normalizer)[0], keras.backend.floatx()) 43 | normalizer = keras.backend.maximum(1.0, normalizer) 44 | 45 | return keras.backend.sum(cls_loss) / normalizer 46 | 47 | return _focal 48 | 49 | 50 | def smooth_l1(sigma=3.0): 51 | """ Create a smooth L1 regression loss functor. 52 | Args 53 | sigma: This argument defines the point where the loss changes from L2 to L1. 54 | Returns 55 | A functor for computing the smooth L1 loss given target data and predicted data. 56 | """ 57 | sigma_squared = sigma ** 2 58 | 59 | def _smooth_l1(y_true, y_pred): 60 | """ Compute the smooth L1 loss of y_pred w.r.t. y_true. 61 | Args 62 | y_true: Tensor from the generator of shape (B, N, 5). The last value for each box is the state of the 63 | anchor (ignore, negative, positive). 64 | y_pred: Tensor from the network of shape (B, N, 4). 65 | Returns 66 | The smooth L1 loss of y_pred w.r.t. y_true. 
67 | """ 68 | # separate target and state 69 | regression = y_pred 70 | regression_target = y_true[:, :, :4] 71 | anchor_state = y_true[:, :, 4] 72 | 73 | # filter out "ignore" anchors 74 | indices = tf.where(keras.backend.equal(anchor_state, 1)) 75 | regression = tf.gather_nd(regression, indices) 76 | regression_target = tf.gather_nd(regression_target, indices) 77 | 78 | # compute smooth L1 loss 79 | # f(x) = 0.5 * (sigma * x)^2 if |x| < 1 / sigma / sigma 80 | # |x| - 0.5 / sigma / sigma otherwise 81 | regression_diff = regression - regression_target 82 | regression_diff = keras.backend.abs(regression_diff) 83 | regression_loss = tf.where( 84 | keras.backend.less(regression_diff, 1.0 / sigma_squared), 85 | 0.5 * sigma_squared * keras.backend.pow(regression_diff, 2), 86 | regression_diff - 0.5 / sigma_squared 87 | ) 88 | 89 | # compute the normalizer: the number of positive anchors 90 | normalizer = keras.backend.maximum(1, keras.backend.shape(indices)[0]) 91 | normalizer = keras.backend.cast(normalizer, dtype=keras.backend.floatx()) 92 | return keras.backend.sum(regression_loss) / normalizer 93 | 94 | return _smooth_l1 95 | -------------------------------------------------------------------------------- /logo/keras-logo-2018-large-1200.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mukeshmithrakumar/RetinaNet/1459aee0c07693ce813a77f97c3e3889cb3c4826/logo/keras-logo-2018-large-1200.png -------------------------------------------------------------------------------- /logo/share2.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mukeshmithrakumar/RetinaNet/1459aee0c07693ce813a77f97c3e3889cb3c4826/logo/share2.jpg --------------------------------------------------------------------------------