├── LICENSE
├── README.md
├── cont_num.txt
├── create_pascal_tf_record.py
├── data
│   └── container_label_map.pbtxt
├── detection_var_image.py
├── generate_voc_datasets.py
├── image
│   ├── image1.jpg
│   ├── image2.jpg
│   ├── image3.jpg
│   ├── image4.jpg
│   └── image5.jpg
└── utils
    ├── __init__.py
    └── visualization_utils.py

/LICENSE:
--------------------------------------------------------------------------------
1 | GNU GENERAL PUBLIC LICENSE 2 | Version 3, 29 June 2007 3 | 4 | Copyright (C) 2007 Free Software Foundation, Inc. 5 | Everyone is permitted to copy and distribute verbatim copies 6 | of this license document, but changing it is not allowed. 7 | 8 | Preamble 9 | 10 | The GNU General Public License is a free, copyleft license for 11 | software and other kinds of works. 12 | 13 | The licenses for most software and other practical works are designed 14 | to take away your freedom to share and change the works. By contrast, 15 | the GNU General Public License is intended to guarantee your freedom to 16 | share and change all versions of a program--to make sure it remains free 17 | software for all its users. We, the Free Software Foundation, use the 18 | GNU General Public License for most of our software; it applies also to 19 | any other work released this way by its authors. You can apply it to 20 | your programs, too. 21 | 22 | When we speak of free software, we are referring to freedom, not 23 | price. Our General Public Licenses are designed to make sure that you 24 | have the freedom to distribute copies of free software (and charge for 25 | them if you wish), that you receive source code or can get it if you 26 | want it, that you can change the software or use pieces of it in new 27 | free programs, and that you know you can do these things. 28 | 29 | To protect your rights, we need to prevent others from denying you 30 | these rights or asking you to surrender the rights.
Therefore, you have 31 | certain responsibilities if you distribute copies of the software, or if 32 | you modify it: responsibilities to respect the freedom of others. 33 | 34 | For example, if you distribute copies of such a program, whether 35 | gratis or for a fee, you must pass on to the recipients the same 36 | freedoms that you received. You must make sure that they, too, receive 37 | or can get the source code. And you must show them these terms so they 38 | know their rights. 39 | 40 | Developers that use the GNU GPL protect your rights with two steps: 41 | (1) assert copyright on the software, and (2) offer you this License 42 | giving you legal permission to copy, distribute and/or modify it. 43 | 44 | For the developers' and authors' protection, the GPL clearly explains 45 | that there is no warranty for this free software. For both users' and 46 | authors' sake, the GPL requires that modified versions be marked as 47 | changed, so that their problems will not be attributed erroneously to 48 | authors of previous versions. 49 | 50 | Some devices are designed to deny users access to install or run 51 | modified versions of the software inside them, although the manufacturer 52 | can do so. This is fundamentally incompatible with the aim of 53 | protecting users' freedom to change the software. The systematic 54 | pattern of such abuse occurs in the area of products for individuals to 55 | use, which is precisely where it is most unacceptable. Therefore, we 56 | have designed this version of the GPL to prohibit the practice for those 57 | products. If such problems arise substantially in other domains, we 58 | stand ready to extend this provision to those domains in future versions 59 | of the GPL, as needed to protect the freedom of users. 60 | 61 | Finally, every program is threatened constantly by software patents. 
62 | States should not allow patents to restrict development and use of 63 | software on general-purpose computers, but in those that do, we wish to 64 | avoid the special danger that patents applied to a free program could 65 | make it effectively proprietary. To prevent this, the GPL assures that 66 | patents cannot be used to render the program non-free. 67 | 68 | The precise terms and conditions for copying, distribution and 69 | modification follow. 70 | 71 | TERMS AND CONDITIONS 72 | 73 | 0. Definitions. 74 | 75 | "This License" refers to version 3 of the GNU General Public License. 76 | 77 | "Copyright" also means copyright-like laws that apply to other kinds of 78 | works, such as semiconductor masks. 79 | 80 | "The Program" refers to any copyrightable work licensed under this 81 | License. Each licensee is addressed as "you". "Licensees" and 82 | "recipients" may be individuals or organizations. 83 | 84 | To "modify" a work means to copy from or adapt all or part of the work 85 | in a fashion requiring copyright permission, other than the making of an 86 | exact copy. The resulting work is called a "modified version" of the 87 | earlier work or a work "based on" the earlier work. 88 | 89 | A "covered work" means either the unmodified Program or a work based 90 | on the Program. 91 | 92 | To "propagate" a work means to do anything with it that, without 93 | permission, would make you directly or secondarily liable for 94 | infringement under applicable copyright law, except executing it on a 95 | computer or modifying a private copy. Propagation includes copying, 96 | distribution (with or without modification), making available to the 97 | public, and in some countries other activities as well. 98 | 99 | To "convey" a work means any kind of propagation that enables other 100 | parties to make or receive copies. Mere interaction with a user through 101 | a computer network, with no transfer of a copy, is not conveying. 
102 | 103 | An interactive user interface displays "Appropriate Legal Notices" 104 | to the extent that it includes a convenient and prominently visible 105 | feature that (1) displays an appropriate copyright notice, and (2) 106 | tells the user that there is no warranty for the work (except to the 107 | extent that warranties are provided), that licensees may convey the 108 | work under this License, and how to view a copy of this License. If 109 | the interface presents a list of user commands or options, such as a 110 | menu, a prominent item in the list meets this criterion. 111 | 112 | 1. Source Code. 113 | 114 | The "source code" for a work means the preferred form of the work 115 | for making modifications to it. "Object code" means any non-source 116 | form of a work. 117 | 118 | A "Standard Interface" means an interface that either is an official 119 | standard defined by a recognized standards body, or, in the case of 120 | interfaces specified for a particular programming language, one that 121 | is widely used among developers working in that language. 122 | 123 | The "System Libraries" of an executable work include anything, other 124 | than the work as a whole, that (a) is included in the normal form of 125 | packaging a Major Component, but which is not part of that Major 126 | Component, and (b) serves only to enable use of the work with that 127 | Major Component, or to implement a Standard Interface for which an 128 | implementation is available to the public in source code form. A 129 | "Major Component", in this context, means a major essential component 130 | (kernel, window system, and so on) of the specific operating system 131 | (if any) on which the executable work runs, or a compiler used to 132 | produce the work, or an object code interpreter used to run it. 
133 | 134 | The "Corresponding Source" for a work in object code form means all 135 | the source code needed to generate, install, and (for an executable 136 | work) run the object code and to modify the work, including scripts to 137 | control those activities. However, it does not include the work's 138 | System Libraries, or general-purpose tools or generally available free 139 | programs which are used unmodified in performing those activities but 140 | which are not part of the work. For example, Corresponding Source 141 | includes interface definition files associated with source files for 142 | the work, and the source code for shared libraries and dynamically 143 | linked subprograms that the work is specifically designed to require, 144 | such as by intimate data communication or control flow between those 145 | subprograms and other parts of the work. 146 | 147 | The Corresponding Source need not include anything that users 148 | can regenerate automatically from other parts of the Corresponding 149 | Source. 150 | 151 | The Corresponding Source for a work in source code form is that 152 | same work. 153 | 154 | 2. Basic Permissions. 155 | 156 | All rights granted under this License are granted for the term of 157 | copyright on the Program, and are irrevocable provided the stated 158 | conditions are met. This License explicitly affirms your unlimited 159 | permission to run the unmodified Program. The output from running a 160 | covered work is covered by this License only if the output, given its 161 | content, constitutes a covered work. This License acknowledges your 162 | rights of fair use or other equivalent, as provided by copyright law. 163 | 164 | You may make, run and propagate covered works that you do not 165 | convey, without conditions so long as your license otherwise remains 166 | in force. 
You may convey covered works to others for the sole purpose 167 | of having them make modifications exclusively for you, or provide you 168 | with facilities for running those works, provided that you comply with 169 | the terms of this License in conveying all material for which you do 170 | not control copyright. Those thus making or running the covered works 171 | for you must do so exclusively on your behalf, under your direction 172 | and control, on terms that prohibit them from making any copies of 173 | your copyrighted material outside their relationship with you. 174 | 175 | Conveying under any other circumstances is permitted solely under 176 | the conditions stated below. Sublicensing is not allowed; section 10 177 | makes it unnecessary. 178 | 179 | 3. Protecting Users' Legal Rights From Anti-Circumvention Law. 180 | 181 | No covered work shall be deemed part of an effective technological 182 | measure under any applicable law fulfilling obligations under article 183 | 11 of the WIPO copyright treaty adopted on 20 December 1996, or 184 | similar laws prohibiting or restricting circumvention of such 185 | measures. 186 | 187 | When you convey a covered work, you waive any legal power to forbid 188 | circumvention of technological measures to the extent such circumvention 189 | is effected by exercising rights under this License with respect to 190 | the covered work, and you disclaim any intention to limit operation or 191 | modification of the work as a means of enforcing, against the work's 192 | users, your or third parties' legal rights to forbid circumvention of 193 | technological measures. 194 | 195 | 4. Conveying Verbatim Copies. 
196 | 197 | You may convey verbatim copies of the Program's source code as you 198 | receive it, in any medium, provided that you conspicuously and 199 | appropriately publish on each copy an appropriate copyright notice; 200 | keep intact all notices stating that this License and any 201 | non-permissive terms added in accord with section 7 apply to the code; 202 | keep intact all notices of the absence of any warranty; and give all 203 | recipients a copy of this License along with the Program. 204 | 205 | You may charge any price or no price for each copy that you convey, 206 | and you may offer support or warranty protection for a fee. 207 | 208 | 5. Conveying Modified Source Versions. 209 | 210 | You may convey a work based on the Program, or the modifications to 211 | produce it from the Program, in the form of source code under the 212 | terms of section 4, provided that you also meet all of these conditions: 213 | 214 | a) The work must carry prominent notices stating that you modified 215 | it, and giving a relevant date. 216 | 217 | b) The work must carry prominent notices stating that it is 218 | released under this License and any conditions added under section 219 | 7. This requirement modifies the requirement in section 4 to 220 | "keep intact all notices". 221 | 222 | c) You must license the entire work, as a whole, under this 223 | License to anyone who comes into possession of a copy. This 224 | License will therefore apply, along with any applicable section 7 225 | additional terms, to the whole of the work, and all its parts, 226 | regardless of how they are packaged. This License gives no 227 | permission to license the work in any other way, but it does not 228 | invalidate such permission if you have separately received it. 
229 | 230 | d) If the work has interactive user interfaces, each must display 231 | Appropriate Legal Notices; however, if the Program has interactive 232 | interfaces that do not display Appropriate Legal Notices, your 233 | work need not make them do so. 234 | 235 | A compilation of a covered work with other separate and independent 236 | works, which are not by their nature extensions of the covered work, 237 | and which are not combined with it such as to form a larger program, 238 | in or on a volume of a storage or distribution medium, is called an 239 | "aggregate" if the compilation and its resulting copyright are not 240 | used to limit the access or legal rights of the compilation's users 241 | beyond what the individual works permit. Inclusion of a covered work 242 | in an aggregate does not cause this License to apply to the other 243 | parts of the aggregate. 244 | 245 | 6. Conveying Non-Source Forms. 246 | 247 | You may convey a covered work in object code form under the terms 248 | of sections 4 and 5, provided that you also convey the 249 | machine-readable Corresponding Source under the terms of this License, 250 | in one of these ways: 251 | 252 | a) Convey the object code in, or embodied in, a physical product 253 | (including a physical distribution medium), accompanied by the 254 | Corresponding Source fixed on a durable physical medium 255 | customarily used for software interchange. 
256 | 257 | b) Convey the object code in, or embodied in, a physical product 258 | (including a physical distribution medium), accompanied by a 259 | written offer, valid for at least three years and valid for as 260 | long as you offer spare parts or customer support for that product 261 | model, to give anyone who possesses the object code either (1) a 262 | copy of the Corresponding Source for all the software in the 263 | product that is covered by this License, on a durable physical 264 | medium customarily used for software interchange, for a price no 265 | more than your reasonable cost of physically performing this 266 | conveying of source, or (2) access to copy the 267 | Corresponding Source from a network server at no charge. 268 | 269 | c) Convey individual copies of the object code with a copy of the 270 | written offer to provide the Corresponding Source. This 271 | alternative is allowed only occasionally and noncommercially, and 272 | only if you received the object code with such an offer, in accord 273 | with subsection 6b. 274 | 275 | d) Convey the object code by offering access from a designated 276 | place (gratis or for a charge), and offer equivalent access to the 277 | Corresponding Source in the same way through the same place at no 278 | further charge. You need not require recipients to copy the 279 | Corresponding Source along with the object code. If the place to 280 | copy the object code is a network server, the Corresponding Source 281 | may be on a different server (operated by you or a third party) 282 | that supports equivalent copying facilities, provided you maintain 283 | clear directions next to the object code saying where to find the 284 | Corresponding Source. Regardless of what server hosts the 285 | Corresponding Source, you remain obligated to ensure that it is 286 | available for as long as needed to satisfy these requirements. 
287 | 288 | e) Convey the object code using peer-to-peer transmission, provided 289 | you inform other peers where the object code and Corresponding 290 | Source of the work are being offered to the general public at no 291 | charge under subsection 6d. 292 | 293 | A separable portion of the object code, whose source code is excluded 294 | from the Corresponding Source as a System Library, need not be 295 | included in conveying the object code work. 296 | 297 | A "User Product" is either (1) a "consumer product", which means any 298 | tangible personal property which is normally used for personal, family, 299 | or household purposes, or (2) anything designed or sold for incorporation 300 | into a dwelling. In determining whether a product is a consumer product, 301 | doubtful cases shall be resolved in favor of coverage. For a particular 302 | product received by a particular user, "normally used" refers to a 303 | typical or common use of that class of product, regardless of the status 304 | of the particular user or of the way in which the particular user 305 | actually uses, or expects or is expected to use, the product. A product 306 | is a consumer product regardless of whether the product has substantial 307 | commercial, industrial or non-consumer uses, unless such uses represent 308 | the only significant mode of use of the product. 309 | 310 | "Installation Information" for a User Product means any methods, 311 | procedures, authorization keys, or other information required to install 312 | and execute modified versions of a covered work in that User Product from 313 | a modified version of its Corresponding Source. The information must 314 | suffice to ensure that the continued functioning of the modified object 315 | code is in no case prevented or interfered with solely because 316 | modification has been made. 
317 | 318 | If you convey an object code work under this section in, or with, or 319 | specifically for use in, a User Product, and the conveying occurs as 320 | part of a transaction in which the right of possession and use of the 321 | User Product is transferred to the recipient in perpetuity or for a 322 | fixed term (regardless of how the transaction is characterized), the 323 | Corresponding Source conveyed under this section must be accompanied 324 | by the Installation Information. But this requirement does not apply 325 | if neither you nor any third party retains the ability to install 326 | modified object code on the User Product (for example, the work has 327 | been installed in ROM). 328 | 329 | The requirement to provide Installation Information does not include a 330 | requirement to continue to provide support service, warranty, or updates 331 | for a work that has been modified or installed by the recipient, or for 332 | the User Product in which it has been modified or installed. Access to a 333 | network may be denied when the modification itself materially and 334 | adversely affects the operation of the network or violates the rules and 335 | protocols for communication across the network. 336 | 337 | Corresponding Source conveyed, and Installation Information provided, 338 | in accord with this section must be in a format that is publicly 339 | documented (and with an implementation available to the public in 340 | source code form), and must require no special password or key for 341 | unpacking, reading or copying. 342 | 343 | 7. Additional Terms. 344 | 345 | "Additional permissions" are terms that supplement the terms of this 346 | License by making exceptions from one or more of its conditions. 347 | Additional permissions that are applicable to the entire Program shall 348 | be treated as though they were included in this License, to the extent 349 | that they are valid under applicable law. 
If additional permissions 350 | apply only to part of the Program, that part may be used separately 351 | under those permissions, but the entire Program remains governed by 352 | this License without regard to the additional permissions. 353 | 354 | When you convey a copy of a covered work, you may at your option 355 | remove any additional permissions from that copy, or from any part of 356 | it. (Additional permissions may be written to require their own 357 | removal in certain cases when you modify the work.) You may place 358 | additional permissions on material, added by you to a covered work, 359 | for which you have or can give appropriate copyright permission. 360 | 361 | Notwithstanding any other provision of this License, for material you 362 | add to a covered work, you may (if authorized by the copyright holders of 363 | that material) supplement the terms of this License with terms: 364 | 365 | a) Disclaiming warranty or limiting liability differently from the 366 | terms of sections 15 and 16 of this License; or 367 | 368 | b) Requiring preservation of specified reasonable legal notices or 369 | author attributions in that material or in the Appropriate Legal 370 | Notices displayed by works containing it; or 371 | 372 | c) Prohibiting misrepresentation of the origin of that material, or 373 | requiring that modified versions of such material be marked in 374 | reasonable ways as different from the original version; or 375 | 376 | d) Limiting the use for publicity purposes of names of licensors or 377 | authors of the material; or 378 | 379 | e) Declining to grant rights under trademark law for use of some 380 | trade names, trademarks, or service marks; or 381 | 382 | f) Requiring indemnification of licensors and authors of that 383 | material by anyone who conveys the material (or modified versions of 384 | it) with contractual assumptions of liability to the recipient, for 385 | any liability that these contractual assumptions directly impose on 
386 | those licensors and authors. 387 | 388 | All other non-permissive additional terms are considered "further 389 | restrictions" within the meaning of section 10. If the Program as you 390 | received it, or any part of it, contains a notice stating that it is 391 | governed by this License along with a term that is a further 392 | restriction, you may remove that term. If a license document contains 393 | a further restriction but permits relicensing or conveying under this 394 | License, you may add to a covered work material governed by the terms 395 | of that license document, provided that the further restriction does 396 | not survive such relicensing or conveying. 397 | 398 | If you add terms to a covered work in accord with this section, you 399 | must place, in the relevant source files, a statement of the 400 | additional terms that apply to those files, or a notice indicating 401 | where to find the applicable terms. 402 | 403 | Additional terms, permissive or non-permissive, may be stated in the 404 | form of a separately written license, or stated as exceptions; 405 | the above requirements apply either way. 406 | 407 | 8. Termination. 408 | 409 | You may not propagate or modify a covered work except as expressly 410 | provided under this License. Any attempt otherwise to propagate or 411 | modify it is void, and will automatically terminate your rights under 412 | this License (including any patent licenses granted under the third 413 | paragraph of section 11). 414 | 415 | However, if you cease all violation of this License, then your 416 | license from a particular copyright holder is reinstated (a) 417 | provisionally, unless and until the copyright holder explicitly and 418 | finally terminates your license, and (b) permanently, if the copyright 419 | holder fails to notify you of the violation by some reasonable means 420 | prior to 60 days after the cessation. 
421 | 422 | Moreover, your license from a particular copyright holder is 423 | reinstated permanently if the copyright holder notifies you of the 424 | violation by some reasonable means, this is the first time you have 425 | received notice of violation of this License (for any work) from that 426 | copyright holder, and you cure the violation prior to 30 days after 427 | your receipt of the notice. 428 | 429 | Termination of your rights under this section does not terminate the 430 | licenses of parties who have received copies or rights from you under 431 | this License. If your rights have been terminated and not permanently 432 | reinstated, you do not qualify to receive new licenses for the same 433 | material under section 10. 434 | 435 | 9. Acceptance Not Required for Having Copies. 436 | 437 | You are not required to accept this License in order to receive or 438 | run a copy of the Program. Ancillary propagation of a covered work 439 | occurring solely as a consequence of using peer-to-peer transmission 440 | to receive a copy likewise does not require acceptance. However, 441 | nothing other than this License grants you permission to propagate or 442 | modify any covered work. These actions infringe copyright if you do 443 | not accept this License. Therefore, by modifying or propagating a 444 | covered work, you indicate your acceptance of this License to do so. 445 | 446 | 10. Automatic Licensing of Downstream Recipients. 447 | 448 | Each time you convey a covered work, the recipient automatically 449 | receives a license from the original licensors, to run, modify and 450 | propagate that work, subject to this License. You are not responsible 451 | for enforcing compliance by third parties with this License. 452 | 453 | An "entity transaction" is a transaction transferring control of an 454 | organization, or substantially all assets of one, or subdividing an 455 | organization, or merging organizations. 
If propagation of a covered 456 | work results from an entity transaction, each party to that 457 | transaction who receives a copy of the work also receives whatever 458 | licenses to the work the party's predecessor in interest had or could 459 | give under the previous paragraph, plus a right to possession of the 460 | Corresponding Source of the work from the predecessor in interest, if 461 | the predecessor has it or can get it with reasonable efforts. 462 | 463 | You may not impose any further restrictions on the exercise of the 464 | rights granted or affirmed under this License. For example, you may 465 | not impose a license fee, royalty, or other charge for exercise of 466 | rights granted under this License, and you may not initiate litigation 467 | (including a cross-claim or counterclaim in a lawsuit) alleging that 468 | any patent claim is infringed by making, using, selling, offering for 469 | sale, or importing the Program or any portion of it. 470 | 471 | 11. Patents. 472 | 473 | A "contributor" is a copyright holder who authorizes use under this 474 | License of the Program or a work on which the Program is based. The 475 | work thus licensed is called the contributor's "contributor version". 476 | 477 | A contributor's "essential patent claims" are all patent claims 478 | owned or controlled by the contributor, whether already acquired or 479 | hereafter acquired, that would be infringed by some manner, permitted 480 | by this License, of making, using, or selling its contributor version, 481 | but do not include claims that would be infringed only as a 482 | consequence of further modification of the contributor version. For 483 | purposes of this definition, "control" includes the right to grant 484 | patent sublicenses in a manner consistent with the requirements of 485 | this License. 
486 | 487 | Each contributor grants you a non-exclusive, worldwide, royalty-free 488 | patent license under the contributor's essential patent claims, to 489 | make, use, sell, offer for sale, import and otherwise run, modify and 490 | propagate the contents of its contributor version. 491 | 492 | In the following three paragraphs, a "patent license" is any express 493 | agreement or commitment, however denominated, not to enforce a patent 494 | (such as an express permission to practice a patent or covenant not to 495 | sue for patent infringement). To "grant" such a patent license to a 496 | party means to make such an agreement or commitment not to enforce a 497 | patent against the party. 498 | 499 | If you convey a covered work, knowingly relying on a patent license, 500 | and the Corresponding Source of the work is not available for anyone 501 | to copy, free of charge and under the terms of this License, through a 502 | publicly available network server or other readily accessible means, 503 | then you must either (1) cause the Corresponding Source to be so 504 | available, or (2) arrange to deprive yourself of the benefit of the 505 | patent license for this particular work, or (3) arrange, in a manner 506 | consistent with the requirements of this License, to extend the patent 507 | license to downstream recipients. "Knowingly relying" means you have 508 | actual knowledge that, but for the patent license, your conveying the 509 | covered work in a country, or your recipient's use of the covered work 510 | in a country, would infringe one or more identifiable patents in that 511 | country that you have reason to believe are valid. 
512 | 513 | If, pursuant to or in connection with a single transaction or 514 | arrangement, you convey, or propagate by procuring conveyance of, a 515 | covered work, and grant a patent license to some of the parties 516 | receiving the covered work authorizing them to use, propagate, modify 517 | or convey a specific copy of the covered work, then the patent license 518 | you grant is automatically extended to all recipients of the covered 519 | work and works based on it. 520 | 521 | A patent license is "discriminatory" if it does not include within 522 | the scope of its coverage, prohibits the exercise of, or is 523 | conditioned on the non-exercise of one or more of the rights that are 524 | specifically granted under this License. You may not convey a covered 525 | work if you are a party to an arrangement with a third party that is 526 | in the business of distributing software, under which you make payment 527 | to the third party based on the extent of your activity of conveying 528 | the work, and under which the third party grants, to any of the 529 | parties who would receive the covered work from you, a discriminatory 530 | patent license (a) in connection with copies of the covered work 531 | conveyed by you (or copies made from those copies), or (b) primarily 532 | for and in connection with specific products or compilations that 533 | contain the covered work, unless you entered into that arrangement, 534 | or that patent license was granted, prior to 28 March 2007. 535 | 536 | Nothing in this License shall be construed as excluding or limiting 537 | any implied license or other defenses to infringement that may 538 | otherwise be available to you under applicable patent law. 539 | 540 | 12. No Surrender of Others' Freedom. 541 | 542 | If conditions are imposed on you (whether by court order, agreement or 543 | otherwise) that contradict the conditions of this License, they do not 544 | excuse you from the conditions of this License. 
If you cannot convey a 545 | covered work so as to satisfy simultaneously your obligations under this 546 | License and any other pertinent obligations, then as a consequence you may 547 | not convey it at all. For example, if you agree to terms that obligate you 548 | to collect a royalty for further conveying from those to whom you convey 549 | the Program, the only way you could satisfy both those terms and this 550 | License would be to refrain entirely from conveying the Program. 551 | 552 | 13. Use with the GNU Affero General Public License. 553 | 554 | Notwithstanding any other provision of this License, you have 555 | permission to link or combine any covered work with a work licensed 556 | under version 3 of the GNU Affero General Public License into a single 557 | combined work, and to convey the resulting work. The terms of this 558 | License will continue to apply to the part which is the covered work, 559 | but the special requirements of the GNU Affero General Public License, 560 | section 13, concerning interaction through a network will apply to the 561 | combination as such. 562 | 563 | 14. Revised Versions of this License. 564 | 565 | The Free Software Foundation may publish revised and/or new versions of 566 | the GNU General Public License from time to time. Such new versions will 567 | be similar in spirit to the present version, but may differ in detail to 568 | address new problems or concerns. 569 | 570 | Each version is given a distinguishing version number. If the 571 | Program specifies that a certain numbered version of the GNU General 572 | Public License "or any later version" applies to it, you have the 573 | option of following the terms and conditions either of that numbered 574 | version or of any later version published by the Free Software 575 | Foundation. If the Program does not specify a version number of the 576 | GNU General Public License, you may choose any version ever published 577 | by the Free Software Foundation. 
578 | 579 | If the Program specifies that a proxy can decide which future 580 | versions of the GNU General Public License can be used, that proxy's 581 | public statement of acceptance of a version permanently authorizes you 582 | to choose that version for the Program. 583 | 584 | Later license versions may give you additional or different 585 | permissions. However, no additional obligations are imposed on any 586 | author or copyright holder as a result of your choosing to follow a 587 | later version. 588 | 589 | 15. Disclaimer of Warranty. 590 | 591 | THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY 592 | APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT 593 | HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY 594 | OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, 595 | THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR 596 | PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM 597 | IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF 598 | ALL NECESSARY SERVICING, REPAIR OR CORRECTION. 599 | 600 | 16. Limitation of Liability. 601 | 602 | IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING 603 | WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS 604 | THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY 605 | GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE 606 | USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF 607 | DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD 608 | PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), 609 | EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF 610 | SUCH DAMAGES. 611 | 612 | 17. Interpretation of Sections 15 and 16. 
613 | 614 | If the disclaimer of warranty and limitation of liability provided 615 | above cannot be given local legal effect according to their terms, 616 | reviewing courts shall apply local law that most closely approximates 617 | an absolute waiver of all civil liability in connection with the 618 | Program, unless a warranty or assumption of liability accompanies a 619 | copy of the Program in return for a fee. 620 | 621 | END OF TERMS AND CONDITIONS 622 | 623 | How to Apply These Terms to Your New Programs 624 | 625 | If you develop a new program, and you want it to be of the greatest 626 | possible use to the public, the best way to achieve this is to make it 627 | free software which everyone can redistribute and change under these terms. 628 | 629 | To do so, attach the following notices to the program. It is safest 630 | to attach them to the start of each source file to most effectively 631 | state the exclusion of warranty; and each file should have at least 632 | the "copyright" line and a pointer to where the full notice is found. 633 | 634 | 635 | Copyright (C) 636 | 637 | This program is free software: you can redistribute it and/or modify 638 | it under the terms of the GNU General Public License as published by 639 | the Free Software Foundation, either version 3 of the License, or 640 | (at your option) any later version. 641 | 642 | This program is distributed in the hope that it will be useful, 643 | but WITHOUT ANY WARRANTY; without even the implied warranty of 644 | MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 645 | GNU General Public License for more details. 646 | 647 | You should have received a copy of the GNU General Public License 648 | along with this program. If not, see . 649 | 650 | Also add information on how to contact you by electronic and paper mail. 
651 | 652 | If the program does terminal interaction, make it output a short 653 | notice like this when it starts in an interactive mode: 654 | 655 | Copyright (C) 656 | This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'. 657 | This is free software, and you are welcome to redistribute it 658 | under certain conditions; type `show c' for details. 659 | 660 | The hypothetical commands `show w' and `show c' should show the appropriate 661 | parts of the General Public License. Of course, your program's commands 662 | might be different; for a GUI interface, you would use an "about box". 663 | 664 | You should also get your employer (if you work as a programmer) or school, 665 | if any, to sign a "copyright disclaimer" for the program, if necessary. 666 | For more information on this, and how to apply and follow the GNU GPL, see 667 | . 668 | 669 | The GNU General Public License does not permit incorporating your program 670 | into proprietary programs. If your program is a subroutine library, you 671 | may consider it more useful to permit linking proprietary applications with 672 | the library. If this is what you want to do, use the GNU Lesser General 673 | Public License instead of this License. But first, please read 674 | . 675 | -------------------------------------------------------------------------------- /README.md: --------------------------------------------------------------------------------
# Container detection and container number OCR using Tensorflow Object Detection API and Tesseract

Container detection and container number OCR was a specific project requirement. Using the [Tensorflow Object Detection API](https://github.com/tensorflow/models/tree/master/research/object_detection) and [Tesseract](https://github.com/tesseract-ocr/tesseract) is one of the quickest and simplest ways to verify its feasibility.
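
As a rough illustration of how the two stages connect, the sketch below mirrors the cropping arithmetic used by `img_ocr` in `detection_var_image.py`: a detection box in normalized `(ymin, xmin, ymax, xmax)` form is scaled to pixels and padded by a small margin before the crop is handed to Tesseract. The helper name `box_to_pixel_crop` is hypothetical and not part of this repo; the 5 px default margin is the value used in the script.

```python
def box_to_pixel_crop(box, img_width, img_height, offset=5):
    """Scale a normalized (ymin, xmin, ymax, xmax) box to padded pixel bounds.

    Returns (left, top, right, bottom) in pixels. Hypothetical helper that
    follows the same padding rule as img_ocr: each side is padded by
    `offset` only if the padded value stays inside the image.
    """
    ymin, xmin, ymax, xmax = box
    left = int(xmin * img_width)
    right = int(xmax * img_width)
    top = int(ymin * img_height)
    bottom = int(ymax * img_height)

    # Pad each side independently, skipping any side that would leave the image.
    if left - offset >= 0:
        left -= offset
    if right + offset <= img_width:
        right += offset
    if top - offset >= 0:
        top -= offset
    if bottom + offset <= img_height:
        bottom += offset
    return left, top, right, bottom


if __name__ == "__main__":
    # A box covering the right half of a 400x200 image, vertically centred.
    print(box_to_pixel_crop((0.25, 0.5, 0.75, 1.0), 400, 200))
```

With a NumPy image array the crop would then be `image[top:bottom, left:right]`, rows first, which is the axis order the script's own comment warns about.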

> A little over two years ago, while I was at my "ex-company", there was a concrete project requirement: detect and count containers, record each container's number via OCR, and then exchange that data with other business systems to meet specific needs. The Tensorflow Object Detection API had just been released, so I dropped the YOLO and SSD options and decided to build a demo with TF as a proof of concept (POC). For the thinking behind the requirements and the pipeline design, see this article: [Container detection and container number OCR](https://lonelygo.github.io/2019-01-20-container-detection/).

## Usage

### Installing the Tensorflow Object Detection API

For installation, follow the official [instructions](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/installation.md).

### Environment and dependencies

My environment: macOS 10.14.2, Python 3.6.8, TF 1.12.
Besides the dependencies required by the Tensorflow Object Detection API, the following are also needed:

- tesseract
- pytesseract

For installation and usage details, please search online.
In `visualization_utils.py`:

``` python
import matplotlib; matplotlib.use('Agg')
```

Agg does not work in my environment and I did not bother to troubleshoot it, so I changed this line.

### Preparing the dataset

Follow the PascalVOC dataset format and annotate with [LabelImg](https://github.com/tzutalin/labelImg).
Once annotation is done, use `generate_voc_datasets.py` to split the data into train, val and test sets in whatever proportions you like.
After the split, use `create_pascal_tf_record.py` to convert each set into a TFRecord file for TF (this file is provided officially, under `/object_detection/dataset_tools/`).
For more on data preparation, see these [instructions](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/using_your_own_dataset.md).

### Training

To train locally, follow the [official guide: running locally](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/running_locally.md) and use `model_main.py` from the official code base (training and evaluation used to be separate scripts; the current version needs only this one file).
To train with TPUs on Google Cloud ML Engine, follow the [official guide: Google Cloud ML Engine](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/running_on_cloud.md). Pricing is described [here](https://cloud.google.com/ml-engine/docs/tensorflow/pricing?hl=zh-CN); the "competitive" (presumably preemptible) mode is much cheaper.

### Validation

For a quick check, use `object_detection_tutorial.ipynb` from the official code. `detection_var_image.py` in this repo is largely based on that notebook.
The following settings need to be changed to match your own setup:

``` python

MODEL_NAME = 'ssd_mobilenet_v1_coco_2017_11_17'
PATH_TO_FROZEN_GRAPH = MODEL_NAME + '/frozen_inference_graph.pb'
# List of the strings that is used to add correct label for each box.
PATH_TO_LABELS = os.path.join('data', 'container_label_map.pbtxt')

TEST_IMAGE_PATHS = [os.path.join(PATH_TO_TEST_IMAGES_DIR, 'image{}.jpg'.format(i)) for i in range(1, 4)]

lang = 'cont41'

```

In `lang = 'cont41'`, `cont41` is the name of the language file used by Tesseract. If you have not trained your own language file yet, delete everything except `eng` from `lang_use = 'eng+'+lang+'+letsgodigital+snum+eng_f'` and run recognition with the language file that ships with the default Tesseract installation.
The returned `image_label` is a nested list that looks like this:

``` python

[{'image1': [{'lable': 'container_number', 'actual': '100%', 'cont_num': 'TCLU § 148575 3\n45G1', 'image_corp_name': 'image1_1_container_number'}]}, {'image2': [{'lable': 'container_number', 'actual': '99%', 'cont_num': 'TRNU816699 4 |\n45G1', 'image_corp_name': 'image2_1_container_number'}, {'lable': 'container_number', 'actual': '99%', 'cont_num': 'TCNU89092898\n4561', 'image_corp_name': 'image2_2_container_number'}, {'lable': 'container_number', 'actual': '99%', 'cont_num': 'MSKUY 86801264\n4561', 'image_corp_name': 'image2_3_container_number'}]}]

```

Each element of the list is a dictionary in which:

- the `key` is the name of the input image;
- the `value` is a list whose elements are dictionaries with four keys: the label (`lable`), the confidence (`actual`), the OCR result (`cont_num`) and the name of the cropped container-number image written to disk (`image_corp_name`). The number of elements is the number of container numbers found in the image.

The main reason for this format is that, if a web front end is added later with Flask, a simple Flask server can return the detection results as a single JSON string. For a demo there is no need to go to the trouble of deploying a separate TensorFlow Serving back end.

For each input image, besides the JSON output above, the script also writes:

- the image with bounding boxes and labels drawn on it;
- a cropped image of each container-number region (one per detection), plus the OpenCV-preprocessed version of each crop as it is passed to Tesseract. Comparing these images with the OCR results suggests how to adjust the preprocessing approach and parameters.

#### Demo

The `image` folder contains 5 test images. The results are in `cont_num.txt`; some of them are shown below:

| Image name | OCR result | Actual |
|:------:|:------:|:----:|
| image1_1_container_number_100% | TCLU § 148575 3 45G1 | TCLU 148575 3 45G1 |
| image2_1_container_number_99% | TRNU816699 4 \| 45G1 | TRLU 818699 0 45G1 |
| image2_2_container_number_99% | TCNU89092898 4561 | TCNU 869248 8 45G1 |
| image2_3_container_number_99% | MSKUY 86801264 4561 | MSKU 868012 6 4561 |
| image3_1_container_number_99% | x L BOUL 871489 7 \| 221 | BMOU 871489 7 22R1 |
| image3_2_container_number_99% | FCIU [599867 (0 22G1 | FCIU 599887 0 22G1 |

As you can see, the overall OCR accuracy is not high. It matches the estimate of no more than 80% accuracy that I gave in [Container detection and container number OCR](https://lonelygo.github.io/2019-01-20-container-detection/) (this may look like hindsight now, but it was my genuine prediction when I decided to run the validation). The accuracy can be improved, and the following directions are worth trying:

- Most of the images used to train Tesseract were close to `image1.jpg` in quality, so the training set should be adjusted to better match image quality in the field;
- In the field, keep image quality as high as possible, and collect images through on-site use;
- Once enough images have been collected, switch the OCR engine to a deep-learning-based one;
- Improve the image preprocessing before OCR. In fact, after I switched to other preprocessing strategies, the results were better than those shown above.

The output images for `image1.jpg` are as follows:
![Bounding box](https://ws1.sinaimg.cn/large/55fc1144gy1fzkay5dqltj20qo0zk42p.jpg)

![original](https://ws1.sinaimg.cn/large/55fc1144gy1fzkaz64bcxj209t03ydfr.jpg)

![gray](https://ws1.sinaimg.cn/large/55fc1144gy1fzkazp3xcij209t03ywel.jpg)

#### To Do

- [ ] Add a demo version that runs detection on a video stream
- [ ] Add a simple Flask web page for uploading images and displaying results

#### References

[Tensorflow Object Detection API](https://github.com/tensorflow/models/tree/master/research/object_detection)

[Tensorflow detection model zoo](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md)
-------------------------------------------------------------------------------- /cont_num.txt: -------------------------------------------------------------------------------- 1 | image1_1_container_number_100% 2 | TCLU § 148575 3 3 | 45G1 4 | image2_1_container_number_99% 5 | TRNU816699 4 | 6 | 45G1 7 | image2_2_container_number_99% 8 | TCNU89092898 9 | 4561 10 | image2_3_container_number_99% 11 | MSKUY 86801264 12 | 4561 13 | image3_1_container_number_99% 14 | x L 15 | BOUL 871489 7 | 16 | 221 17 | image3_2_container_number_99% 18 | FCIU [599867 (0 19 | 22G1 20 | image4_1_container_number_e_99% 21 | WH LU
555149 22 | CSU 23 | image5_1_container_number_99% 24 | 5421 357770 4 25 | 2261 26 | image5_2_container_number_99% 27 | 1 28 | BSU247709 29 | | 2221 30 | image5_3_container_number_99% 31 | TRHU | 395563 32 | 2261 33 | image5_4_container_number_99% 34 | TRUU20275643 35 | 221 36 | -------------------------------------------------------------------------------- /create_pascal_tf_record.py: -------------------------------------------------------------------------------- 1 | # Copyright 2017 The TensorFlow Authors. All Rights Reserved. 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 5 | # You may obtain a copy of the License at 6 | # 7 | # http://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # Unless required by applicable law or agreed to in writing, software 10 | # distributed under the License is distributed on an "AS IS" BASIS, 11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | # See the License for the specific language governing permissions and 13 | # limitations under the License. 14 | # ============================================================================== 15 | 16 | r"""Convert raw PASCAL dataset to TFRecord for object_detection. 
17 | 18 | Example usage: 19 | python object_detection/dataset_tools/create_pascal_tf_record.py \ 20 | --data_dir=/home/user/VOCdevkit \ 21 | --year=VOC2012 \ 22 | --output_path=/home/user/pascal.record 23 | """ 24 | from __future__ import absolute_import 25 | from __future__ import division 26 | from __future__ import print_function 27 | 28 | import hashlib 29 | import io 30 | import logging 31 | import os 32 | 33 | from lxml import etree 34 | import PIL.Image 35 | import tensorflow as tf 36 | 37 | from object_detection.utils import dataset_util 38 | from object_detection.utils import label_map_util 39 | 40 | 41 | flags = tf.app.flags 42 | flags.DEFINE_string('data_dir', '', 'Root directory to raw PASCAL VOC dataset.') 43 | flags.DEFINE_string('set', 'val', 'Convert training set, validation set or ' 44 | 'merged set.') 45 | flags.DEFINE_string('annotations_dir', 'Annotations', 46 | '(Relative) path to annotations directory.') 47 | flags.DEFINE_string('year', 'cont_train', 'Desired challenge year.') 48 | flags.DEFINE_string('output_path', '', 'Path to output TFRecord') 49 | flags.DEFINE_string('label_map_path', '', 50 | 'Path to label map proto') 51 | flags.DEFINE_boolean('ignore_difficult_instances', False, 'Whether to ignore ' 52 | 'difficult instances') 53 | FLAGS = flags.FLAGS 54 | 55 | SETS = ['train', 'val', 'trainval', 'test'] 56 | YEARS = ['cont_train', 'VOC2012', 'merged'] 57 | 58 | 59 | def dict_to_tf_example(data, 60 | dataset_directory, 61 | label_map_dict, 62 | ignore_difficult_instances=False, 63 | image_subdirectory='JPEGImages'): 64 | """Convert XML derived dict to tf.Example proto. 65 | 66 | Notice that this function normalizes the bounding box coordinates provided 67 | by the raw data. 
68 | 69 | Args: 70 | data: dict holding PASCAL XML fields for a single image (obtained by 71 | running dataset_util.recursive_parse_xml_to_dict) 72 | dataset_directory: Path to root directory holding PASCAL dataset 73 | label_map_dict: A map from string label names to integers ids. 74 | ignore_difficult_instances: Whether to skip difficult instances in the 75 | dataset (default: False). 76 | image_subdirectory: String specifying subdirectory within the 77 | PASCAL dataset directory holding the actual image data. 78 | 79 | Returns: 80 | example: The converted tf.Example. 81 | 82 | Raises: 83 | ValueError: if the image pointed to by data['filename'] is not a valid JPEG 84 | """ 85 | img_path = os.path.join('cont_train', image_subdirectory, data['filename']) # I do'n know why data['folder'] give wrong path. 86 | full_path = os.path.join(dataset_directory, img_path) 87 | with tf.gfile.GFile(full_path, 'rb') as fid: 88 | encoded_jpg = fid.read() 89 | encoded_jpg_io = io.BytesIO(encoded_jpg) 90 | image = PIL.Image.open(encoded_jpg_io) 91 | if image.format != 'JPEG': 92 | raise ValueError('Image format not JPEG') 93 | key = hashlib.sha256(encoded_jpg).hexdigest() 94 | 95 | width = int(data['size']['width']) 96 | height = int(data['size']['height']) 97 | 98 | xmin = [] 99 | ymin = [] 100 | xmax = [] 101 | ymax = [] 102 | classes = [] 103 | classes_text = [] 104 | truncated = [] 105 | poses = [] 106 | difficult_obj = [] 107 | if 'object' in data: 108 | for obj in data['object']: 109 | difficult = bool(int(obj['difficult'])) 110 | if ignore_difficult_instances and difficult: 111 | continue 112 | 113 | difficult_obj.append(int(difficult)) 114 | 115 | xmin.append(float(obj['bndbox']['xmin']) / width) 116 | ymin.append(float(obj['bndbox']['ymin']) / height) 117 | xmax.append(float(obj['bndbox']['xmax']) / width) 118 | ymax.append(float(obj['bndbox']['ymax']) / height) 119 | classes_text.append(obj['name'].encode('utf8')) 120 | classes.append(label_map_dict[obj['name']]) 121 | 
truncated.append(int(obj['truncated'])) 122 | poses.append(obj['pose'].encode('utf8')) 123 | 124 | example = tf.train.Example(features=tf.train.Features(feature={ 125 | 'image/height': dataset_util.int64_feature(height), 126 | 'image/width': dataset_util.int64_feature(width), 127 | 'image/filename': dataset_util.bytes_feature( 128 | data['filename'].encode('utf8')), 129 | 'image/source_id': dataset_util.bytes_feature( 130 | data['filename'].encode('utf8')), 131 | 'image/key/sha256': dataset_util.bytes_feature(key.encode('utf8')), 132 | 'image/encoded': dataset_util.bytes_feature(encoded_jpg), 133 | 'image/format': dataset_util.bytes_feature('jpeg'.encode('utf8')), 134 | 'image/object/bbox/xmin': dataset_util.float_list_feature(xmin), 135 | 'image/object/bbox/xmax': dataset_util.float_list_feature(xmax), 136 | 'image/object/bbox/ymin': dataset_util.float_list_feature(ymin), 137 | 'image/object/bbox/ymax': dataset_util.float_list_feature(ymax), 138 | 'image/object/class/text': dataset_util.bytes_list_feature(classes_text), 139 | 'image/object/class/label': dataset_util.int64_list_feature(classes), 140 | 'image/object/difficult': dataset_util.int64_list_feature(difficult_obj), 141 | 'image/object/truncated': dataset_util.int64_list_feature(truncated), 142 | 'image/object/view': dataset_util.bytes_list_feature(poses), 143 | })) 144 | return example 145 | 146 | 147 | def main(_): 148 | if FLAGS.set not in SETS: 149 | raise ValueError('set must be in : {}'.format(SETS)) 150 | if FLAGS.year not in YEARS: 151 | raise ValueError('year must be in : {}'.format(YEARS)) 152 | 153 | data_dir = FLAGS.data_dir 154 | years = ['cont_train', 'VOC2012'] 155 | if FLAGS.year != 'merged': 156 | years = [FLAGS.year] 157 | 158 | writer = tf.python_io.TFRecordWriter(FLAGS.output_path) 159 | 160 | label_map_dict = label_map_util.get_label_map_dict(FLAGS.label_map_path) 161 | 162 | for year in years: 163 | logging.info('Reading from PASCAL %s dataset.', year) 164 | examples_path = 
os.path.join(data_dir, year, 'ImageSets', 'Main', FLAGS.set + '.txt') 165 | annotations_dir = os.path.join(data_dir, year, FLAGS.annotations_dir) 166 | examples_list = dataset_util.read_examples_list(examples_path) 167 | for idx, example in enumerate(examples_list): 168 | if idx % 100 == 0: 169 | logging.info('On image %d of %d', idx, len(examples_list)) 170 | path = os.path.join(annotations_dir, example + '.xml') 171 | with tf.gfile.GFile(path, 'r') as fid: 172 | xml_str = fid.read() 173 | xml = etree.fromstring(xml_str) 174 | data = dataset_util.recursive_parse_xml_to_dict(xml)['annotation'] 175 | 176 | tf_example = dict_to_tf_example(data, FLAGS.data_dir, label_map_dict, 177 | FLAGS.ignore_difficult_instances) 178 | 179 | writer.write(tf_example.SerializeToString()) 180 | 181 | writer.close() 182 | 183 | 184 | if __name__ == '__main__': 185 | tf.app.run() 186 | -------------------------------------------------------------------------------- /data/container_label_map.pbtxt: -------------------------------------------------------------------------------- 1 | item { 2 | id: 1 3 | name: 'container_number' 4 | } 5 | 6 | item { 7 | id: 2 8 | name: 'container_number_v' 9 | } 10 | 11 | item { 12 | id: 6 13 | name: 'container_number_e' 14 | } 15 | 16 | item { 17 | id: 3 18 | name: 'container_door' 19 | } 20 | item { 21 | id: 4 22 | name: 'container_end_door' 23 | } 24 | 25 | item { 26 | id: 5 27 | name: 'container' 28 | } -------------------------------------------------------------------------------- /detection_var_image.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | # -*- coding: utf-8 -*- 3 | __author__ = 'Kevin Di' 4 | 5 | import numpy as np 6 | import os 7 | from skimage import io, data 8 | import six.moves.urllib as urllib 9 | import sys 10 | import tarfile 11 | import tensorflow as tf 12 | 13 | from collections import defaultdict 14 | import collections 15 | from io import StringIO 16 | import 
matplotlib as mpl 17 | 18 | from matplotlib import pyplot as plt 19 | from PIL import Image 20 | import pytesseract 21 | import cv2 22 | import re 23 | 24 | 25 | # This is needed since the notebook is stored in the object_detection folder. 26 | sys.path.append("..") 27 | from object_detection.utils import ops as utils_ops 28 | 29 | 30 | from object_detection.utils import label_map_util 31 | 32 | from object_detection.utils import visualization_utils as vis_util 33 | 34 | 35 | MODEL_NAME = 'ssd_mobilenet_v1_coco_2017_11_17' 36 | PATH_TO_FROZEN_GRAPH = MODEL_NAME + '/frozen_inference_graph.pb' 37 | # List of the strings that is used to add correct label for each box. 38 | PATH_TO_LABELS = os.path.join('data', 'container_label_map.pbtxt') 39 | 40 | detection_graph = tf.Graph() 41 | with detection_graph.as_default(): 42 | od_graph_def = tf.GraphDef() 43 | with tf.gfile.GFile(PATH_TO_FROZEN_GRAPH, 'rb') as fid: 44 | serialized_graph = fid.read() 45 | od_graph_def.ParseFromString(serialized_graph) 46 | tf.import_graph_def(od_graph_def, name='') 47 | 48 | category_index = label_map_util.create_category_index_from_labelmap(PATH_TO_LABELS, use_display_name=True) 49 | 50 | def load_image_into_numpy_array(image): 51 | (im_width, im_height) = image.size 52 | return np.array(image.getdata()).reshape( 53 | (im_height, im_width, 3)).astype(np.uint8) 54 | 55 | # If you want to test the code with your images, just add path to the images to the TEST_IMAGE_PATHS. 
56 | PATH_TO_TEST_IMAGES_DIR = 'test_images' 57 | 58 | TEST_IMAGE_PATHS = [os.path.join(PATH_TO_TEST_IMAGES_DIR, 'image{}.jpg'.format(i)) for i in range(1, 4)] 59 | 60 | 61 | # Size, in inches, of the output images,use to plt.figure(figsize=IMAGE_SIZE) 62 | # IMAGE_SIZE = (12, 8) 63 | 64 | def run_inference_for_single_image(image, graph): 65 | with graph.as_default(): 66 | with tf.Session(config = tf.ConfigProto( 67 | device_count = {"CPU":16}, 68 | inter_op_parallelism_threads = 5, 69 | intra_op_parallelism_threads = 2, 70 | )) as sess: 71 | # Get handles to input and output tensors 72 | ops = tf.get_default_graph().get_operations() 73 | all_tensor_names = {output.name for op in ops for output in op.outputs} 74 | tensor_dict = {} 75 | for key in [ 76 | 'num_detections', 'detection_boxes', 'detection_scores', 77 | 'detection_classes', 'detection_masks' 78 | ]: 79 | tensor_name = key + ':0' 80 | if tensor_name in all_tensor_names: 81 | tensor_dict[key] = tf.get_default_graph().get_tensor_by_name( 82 | tensor_name) 83 | if 'detection_masks' in tensor_dict: 84 | # The following processing is only for single image 85 | detection_boxes = tf.squeeze(tensor_dict['detection_boxes'], [0]) 86 | detection_masks = tf.squeeze(tensor_dict['detection_masks'], [0]) 87 | # Reframe is required to translate mask from box coordinates to image coordinates and fit the image size. 
88 | real_num_detection = tf.cast(tensor_dict['num_detections'][0], tf.int32) 89 | detection_boxes = tf.slice(detection_boxes, [0, 0], [real_num_detection, -1]) 90 | detection_masks = tf.slice(detection_masks, [0, 0, 0], [real_num_detection, -1, -1]) 91 | detection_masks_reframed = utils_ops.reframe_box_masks_to_image_masks( 92 | detection_masks, detection_boxes, image.shape[0], image.shape[1]) 93 | detection_masks_reframed = tf.cast( 94 | tf.greater(detection_masks_reframed, 0.5), tf.uint8) 95 | # Follow the convention by adding back the batch dimension 96 | tensor_dict['detection_masks'] = tf.expand_dims( 97 | detection_masks_reframed, 0) 98 | image_tensor = tf.get_default_graph().get_tensor_by_name('image_tensor:0') 99 | 100 | # Run inference 101 | output_dict = sess.run(tensor_dict, 102 | feed_dict={image_tensor: np.expand_dims(image, 0)}) 103 | 104 | # all outputs are float32 numpy arrays, so convert types as appropriate 105 | output_dict['num_detections'] = int(output_dict['num_detections'][0]) 106 | output_dict['detection_classes'] = output_dict[ 107 | 'detection_classes'][0].astype(np.uint8) 108 | output_dict['detection_boxes'] = output_dict['detection_boxes'][0] 109 | output_dict['detection_scores'] = output_dict['detection_scores'][0] 110 | if 'detection_masks' in output_dict: 111 | output_dict['detection_masks'] = output_dict['detection_masks'][0] 112 | return output_dict 113 | 114 | def image_preprocessing(img): 115 | # image_gray = img 116 | image_gray = cv2.cvtColor(np.asarray(img), cv2.COLOR_BGR2GRAY) 117 | # image_gray = cv2.medianBlur(image_gray, 3) 118 | # image_gray = cv2.threshold(image_gray, 127, 255, cv2.THRESH_BINARY_INV)[1] 119 | # adaptiveThreshold not good ,just try it. 
    # image_gray = cv2.adaptiveThreshold(image_gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 11, 2)

    return image_gray
# box_to_color_map: {(ymin, xmin, ymax, xmax): 'color'}
# box_to_display_str_map: {(ymin, xmin, ymax, xmax): ['label: xx%']}
def img_ocr(image_name, output_path, image_org, box_to_color_map, box_to_display_str_map, lang = 'cont41'):
    cont_num_find = 0
    img_label = []
    # Convert coordinates to raw pixels.
    for box, color in box_to_color_map.items():
        ymin, xmin, ymax, xmax = box
        # Load the original image; the image returned by visualize_boxes_and_labels_on_image_array already has bounding boxes drawn on it.
        image_corp_org = Image.fromarray(np.uint8(image_org))
        img_width, img_height = image_corp_org.size
        new_xmin = int(xmin * img_width)
        new_xmax = int(xmax * img_width)
        new_ymin = int(ymin * img_height)
        new_ymax = int(ymax * img_height)
        # Add a safety margin (px) around the crop where it stays inside the image.
        offset = 5
        if new_xmin - offset >= 0:
            new_xmin = new_xmin - offset
        if new_xmax + offset <= img_width:
            new_xmax = new_xmax + offset
        if new_ymin - offset >= 0:
            new_ymin = new_ymin - offset
        if new_ymax + offset <= img_height:
            new_ymax = new_ymax + offset
        # Get the label of each bounding box by splitting 'xxx: 90%' into ['xxx', '90%'].
        img_label_name = box_to_display_str_map[box][0].split(': ')
        # Crop the image. Note that PIL and NumPy use reversed coordinate order!
        image_corp_org = load_image_into_numpy_array(image_org)[new_ymin:new_ymax,new_xmin:new_xmax]
        image_corp_org = Image.fromarray(np.uint8(image_corp_org))
        # Tesseract OCR
        lang_use = 'eng+'+lang+'+letsgodigital+snum+eng_f'
        if re.match('container_number+', img_label_name[0]):
            cont_num_find += 1
            image_corp_gray = image_preprocessing(image_corp_org)
            if re.match('container_number_v+', img_label_name[0]):
                cont_num = pytesseract.image_to_string(image_corp_gray, lang=lang_use, config='--psm 6')
            elif re.match('container_number_e+', img_label_name[0]):
                cont_num = pytesseract.image_to_string(image_corp_gray, lang=lang_use, config='--psm 6')
            else:
                cont_num = pytesseract.image_to_string(image_corp_gray, lang=lang_use, config='--psm 4')
            # Save the cropped image to output_path, with the label in the file name.
            # image_corp_name is built as: '<input image name>_<cont_num_find>_<img_label_name>'
            image_corp_name = image_name[:-4]+ '_'+ str(cont_num_find)+ '_'+ img_label_name[0]
            # img_label entry: {lable, actual, cont_num, image_corp_name}
            img_label.append({'lable':img_label_name[0], 'actual':img_label_name[1], 'cont_num':cont_num, 'image_corp_name':image_corp_name})
            image_corp_org.save(os.path.join(output_path) + '/' + image_corp_name + '_org_'+ image_name[-4:])
            cv2.imwrite(os.path.join(output_path) + '/' + image_corp_name + '_gray_'+ image_name[-4:], image_corp_gray)
            # Append the result to cont_num.txt; the context manager closes the file.
            with open(os.path.join(PATH_TO_TEST_IMAGES_DIR, 'cont_num.txt'), 'a') as file:
                file.write(img_label[cont_num_find - 1]['image_corp_name']+ '_' + img_label[cont_num_find - 1]['actual'] + '\n' + img_label[cont_num_find - 1]['cont_num']+ '\n')
    return img_label # image_corp_org, image_corp_gray

def detection():
    image_label =[]
    for image_path in TEST_IMAGE_PATHS:
        image_org = Image.open(image_path, 'r')
        # the array based representation of the image will be used
later in order to prepare the 181 | # result image with boxes and labels on it. 182 | image_np = load_image_into_numpy_array(image_org) 183 | # Expand dimensions since the model expects images to have shape: [1, None, None, 3] 184 | # image_np_expanded = np.expand_dims(image_np, axis=0) 185 | image_name = os.path.basename(os.path.join(image_path)) 186 | # Actual detection. 187 | output_dict = run_inference_for_single_image(image_np, detection_graph) 188 | 189 | output_path = os.path.join(PATH_TO_TEST_IMAGES_DIR) 190 | 191 | # Visualization of the results of a detection. 192 | image, box_to_color_map, box_to_display_str_map = vis_util.visualize_boxes_and_labels_on_image_array( 193 | image_np, 194 | output_dict['detection_boxes'], 195 | output_dict['detection_classes'], 196 | output_dict['detection_scores'], 197 | category_index, 198 | instance_masks=output_dict.get('detection_masks'), 199 | use_normalized_coordinates=True, 200 | max_boxes_to_draw=200, 201 | min_score_thresh=.75, 202 | line_thickness=2) 203 | 204 | # Crop bounding box to splt images. 205 | lang = 'cont41' 206 | img_label = img_ocr(image_name, output_path, image_org, box_to_color_map, box_to_display_str_map, lang) 207 | # save visualize_boxes_and_labels_on_image_array output image. 
        image_name = os.path.basename(os.path.join(image_path))
        output_image_name = image_name[:-4] + '_out' + image_name[-4:]
        image_out = Image.fromarray(image_np)
        image_out.save(os.path.join(PATH_TO_TEST_IMAGES_DIR) + '/'+ output_image_name)
        image_label.append({str(image_name[:-4]): img_label})
    return image_label


if __name__ == "__main__":
    print(detection())

-------------------------------------------------------------------------------- /generate_voc_datasets.py: --------------------------------------------------------------------------------
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

__author__ = 'Kevin Di'

import os
import random

# Paths of the VOC-like dataset.

xml_file = r'path to your VOC like data_set: /Annotations'
img_file = r'path to your VOC like data_set:/JPEGImages'
save_path = r'path to your VOC like data_set: /ImageSets/Main'


# Determine the train, val, test split ratio.
# The first step is to split train_val from test, and then to split train and val from train_val.

train_val_percent = 0.8
train_percent = 0.8
total_dataset_num = os.listdir(xml_file)
total_img_num = os.listdir(img_file)
num = len(total_dataset_num)
img = len(total_img_num)
# Named `indices` rather than `list` to avoid shadowing the built-in.
indices = range(num)
t_v = int(num * train_val_percent)
t = int(t_v * train_percent)
train_val = random.sample(indices, t_v)
train = random.sample(train_val, t)

print('Total number of xml files is:', num)
print('Total number of images is:', img)
print('training set size:', t)
print('validation set size:', t_v - t)
print('test set size:', num - t_v)

# The context manager closes all three files even if a write fails.
with open(os.path.join(save_path, 'train.txt'), 'w') as file_train, \
        open(os.path.join(save_path, 'val.txt'), 'w') as file_val, \
        open(os.path.join(save_path, 'test.txt'), 'w') as file_test:
    for i in indices:
        # Strip the extension instead of slicing [:5], so file names of any length work.
        xml_name = os.path.splitext(total_dataset_num[i])[0] + '\n'

        if i in train_val:
            if i in train:
                file_train.write(xml_name)
            else:
                file_val.write(xml_name)
        else:
            file_test.write(xml_name)

-------------------------------------------------------------------------------- /image/image1.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lonelygo/container_detection/21f2e682af4210eb5b36126216f2276c522f6513/image/image1.jpg -------------------------------------------------------------------------------- /image/image2.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lonelygo/container_detection/21f2e682af4210eb5b36126216f2276c522f6513/image/image2.jpg -------------------------------------------------------------------------------- /image/image3.jpg: --------------------------------------------------------------------------------
https://raw.githubusercontent.com/lonelygo/container_detection/21f2e682af4210eb5b36126216f2276c522f6513/image/image3.jpg -------------------------------------------------------------------------------- /image/image4.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lonelygo/container_detection/21f2e682af4210eb5b36126216f2276c522f6513/image/image4.jpg -------------------------------------------------------------------------------- /image/image5.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lonelygo/container_detection/21f2e682af4210eb5b36126216f2276c522f6513/image/image5.jpg -------------------------------------------------------------------------------- /utils/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lonelygo/container_detection/21f2e682af4210eb5b36126216f2276c522f6513/utils/__init__.py -------------------------------------------------------------------------------- /utils/visualization_utils.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # Copyright 2017 The TensorFlow Authors. All Rights Reserved. 3 | # 4 | # Licensed under the Apache License, Version 2.0 (the "License"); 5 | # you may not use this file except in compliance with the License. 6 | # You may obtain a copy of the License at 7 | # 8 | # http://www.apache.org/licenses/LICENSE-2.0 9 | # 10 | # Unless required by applicable law or agreed to in writing, software 11 | # distributed under the License is distributed on an "AS IS" BASIS, 12 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | # See the License for the specific language governing permissions and 14 | # limitations under the License. 
15 | # ============================================================================== 16 | 17 | """A set of functions that are used for visualization. 18 | 19 | These functions often receive an image, perform some visualization on the image. 20 | The functions do not return a value, instead they modify the image itself. 21 | 22 | """ 23 | import abc 24 | import collections 25 | import functools 26 | # Set headless-friendly backend. 27 | # Use Agg can not show image 28 | # import matplotlib; matplotlib.use('Agg') 29 | import matplotlib 30 | from matplotlib import pyplot as plt 31 | import os 32 | import numpy as np 33 | import PIL.Image as Image 34 | import PIL.ImageColor as ImageColor 35 | import PIL.ImageDraw as ImageDraw 36 | import PIL.ImageFont as ImageFont 37 | import six 38 | import tensorflow as tf 39 | 40 | from object_detection.core import standard_fields as fields 41 | from object_detection.utils import shape_utils 42 | 43 | _TITLE_LEFT_MARGIN = 10 44 | _TITLE_TOP_MARGIN = 10 45 | STANDARD_COLORS = [ 46 | 'AliceBlue', 'Chartreuse', 'Aqua', 'Aquamarine', 'Azure', 'Beige', 'Bisque', 47 | 'BlanchedAlmond', 'BlueViolet', 'BurlyWood', 'CadetBlue', 'AntiqueWhite', 48 | 'Chocolate', 'Coral', 'CornflowerBlue', 'Cornsilk', 'Crimson', 'Cyan', 49 | 'DarkCyan', 'DarkGoldenRod', 'DarkGrey', 'DarkKhaki', 'DarkOrange', 50 | 'DarkOrchid', 'DarkSalmon', 'DarkSeaGreen', 'DarkTurquoise', 'DarkViolet', 51 | 'DeepPink', 'DeepSkyBlue', 'DodgerBlue', 'FireBrick', 'FloralWhite', 52 | 'ForestGreen', 'Fuchsia', 'Gainsboro', 'GhostWhite', 'Gold', 'GoldenRod', 53 | 'Salmon', 'Tan', 'HoneyDew', 'HotPink', 'IndianRed', 'Ivory', 'Khaki', 54 | 'Lavender', 'LavenderBlush', 'LawnGreen', 'LemonChiffon', 'LightBlue', 55 | 'LightCoral', 'LightCyan', 'LightGoldenRodYellow', 'LightGray', 'LightGrey', 56 | 'LightGreen', 'LightPink', 'LightSalmon', 'LightSeaGreen', 'LightSkyBlue', 57 | 'LightSlateGray', 'LightSlateGrey', 'LightSteelBlue', 'LightYellow', 'Lime', 58 | 'LimeGreen', 'Linen', 
'Magenta', 'MediumAquaMarine', 'MediumOrchid', 59 | 'MediumPurple', 'MediumSeaGreen', 'MediumSlateBlue', 'MediumSpringGreen', 60 | 'MediumTurquoise', 'MediumVioletRed', 'MintCream', 'MistyRose', 'Moccasin', 61 | 'NavajoWhite', 'OldLace', 'Olive', 'OliveDrab', 'Orange', 'OrangeRed', 62 | 'Orchid', 'PaleGoldenRod', 'PaleGreen', 'PaleTurquoise', 'PaleVioletRed', 63 | 'PapayaWhip', 'PeachPuff', 'Peru', 'Pink', 'Plum', 'PowderBlue', 'Purple', 64 | 'Red', 'RosyBrown', 'RoyalBlue', 'SaddleBrown', 'Green', 'SandyBrown', 65 | 'SeaGreen', 'SeaShell', 'Sienna', 'Silver', 'SkyBlue', 'SlateBlue', 66 | 'SlateGray', 'SlateGrey', 'Snow', 'SpringGreen', 'SteelBlue', 'GreenYellow', 67 | 'Teal', 'Thistle', 'Tomato', 'Turquoise', 'Violet', 'Wheat', 'White', 68 | 'WhiteSmoke', 'Yellow', 'YellowGreen' 69 | ] 70 | 71 | 72 | def save_image_array_as_png(image, output_path): 73 | """Saves an image (represented as a numpy array) to PNG. 74 | 75 | Args: 76 | image: a numpy array with shape [height, width, 3]. 77 | output_path: path to which image should be written. 78 | """ 79 | image_pil = Image.fromarray(np.uint8(image)).convert('RGB') 80 | with tf.gfile.Open(output_path, 'w') as fid: 81 | image_pil.save(fid, 'PNG') 82 | 83 | 84 | def encode_image_array_as_png_str(image): 85 | """Encodes a numpy array into a PNG string. 86 | 87 | Args: 88 | image: a numpy array with shape [height, width, 3]. 89 | 90 | Returns: 91 | PNG encoded image string. 92 | """ 93 | image_pil = Image.fromarray(np.uint8(image)) 94 | output = six.BytesIO() 95 | image_pil.save(output, format='PNG') 96 | png_string = output.getvalue() 97 | output.close() 98 | return png_string 99 | 100 | 101 | def draw_bounding_box_on_image_array(image, 102 | ymin, 103 | xmin, 104 | ymax, 105 | xmax, 106 | color='red', 107 | thickness=4, 108 | display_str_list=(), 109 | use_normalized_coordinates=True): 110 | """Adds a bounding box to an image (numpy array). 
111 | 112 | Bounding box coordinates can be specified in either absolute (pixel) or 113 | normalized coordinates by setting the use_normalized_coordinates argument. 114 | 115 | Args: 116 | image: a numpy array with shape [height, width, 3]. 117 | ymin: ymin of bounding box. 118 | xmin: xmin of bounding box. 119 | ymax: ymax of bounding box. 120 | xmax: xmax of bounding box. 121 | color: color to draw bounding box. Default is red. 122 | thickness: line thickness. Default value is 4. 123 | display_str_list: list of strings to display in box 124 | (each to be shown on its own line). 125 | use_normalized_coordinates: If True (default), treat coordinates 126 | ymin, xmin, ymax, xmax as relative to the image. Otherwise treat 127 | coordinates as absolute. 128 | """ 129 | image_pil = Image.fromarray(np.uint8(image)).convert('RGB') 130 | draw_bounding_box_on_image(image_pil, ymin, xmin, ymax, xmax, color, 131 | thickness, display_str_list, 132 | use_normalized_coordinates) 133 | np.copyto(image, np.array(image_pil)) 134 | 135 | 136 | def draw_bounding_box_on_image(image, 137 | ymin, 138 | xmin, 139 | ymax, 140 | xmax, 141 | color='red', 142 | thickness=4, 143 | display_str_list=(), 144 | use_normalized_coordinates=True): 145 | """Adds a bounding box to an image. 146 | 147 | Bounding box coordinates can be specified in either absolute (pixel) or 148 | normalized coordinates by setting the use_normalized_coordinates argument. 149 | 150 | Each string in display_str_list is displayed on a separate line above the 151 | bounding box in black text on a rectangle filled with the input 'color'. 152 | If the top of the bounding box extends to the edge of the image, the strings 153 | are displayed below the bounding box. 154 | 155 | Args: 156 | image: a PIL.Image object. 157 | ymin: ymin of bounding box. 158 | xmin: xmin of bounding box. 159 | ymax: ymax of bounding box. 160 | xmax: xmax of bounding box. 161 | color: color to draw bounding box. Default is red. 
162 | thickness: line thickness. Default value is 4. 163 | display_str_list: list of strings to display in box 164 | (each to be shown on its own line). 165 | use_normalized_coordinates: If True (default), treat coordinates 166 | ymin, xmin, ymax, xmax as relative to the image. Otherwise treat 167 | coordinates as absolute. 168 | """ 169 | draw = ImageDraw.Draw(image) 170 | im_width, im_height = image.size 171 | if use_normalized_coordinates: 172 | (left, right, top, bottom) = (xmin * im_width, xmax * im_width, 173 | ymin * im_height, ymax * im_height) 174 | else: 175 | (left, right, top, bottom) = (xmin, xmax, ymin, ymax) 176 | draw.line([(left, top), (left, bottom), (right, bottom), 177 | (right, top), (left, top)], width=thickness, fill=color) 178 | try: 179 | font = ImageFont.truetype('arial.ttf', 24) 180 | except IOError: 181 | font = ImageFont.load_default() 182 | 183 | # If the total height of the display strings added to the top of the bounding 184 | # box exceeds the top of the image, stack the strings below the bounding box 185 | # instead of above. 186 | display_str_heights = [font.getsize(ds)[1] for ds in display_str_list] 187 | # Each display_str has a top and bottom margin of 0.05x. 188 | total_display_str_height = (1 + 2 * 0.05) * sum(display_str_heights) 189 | 190 | if top > total_display_str_height: 191 | text_bottom = top 192 | else: 193 | text_bottom = bottom + total_display_str_height 194 | # Reverse list and print from bottom to top. 
195 | for display_str in display_str_list[::-1]: 196 | text_width, text_height = font.getsize(display_str) 197 | margin = np.ceil(0.05 * text_height) 198 | draw.rectangle( 199 | [(left, text_bottom - text_height - 2 * margin), (left + text_width, 200 | text_bottom)], 201 | fill=color) 202 | draw.text( 203 | (left + margin, text_bottom - text_height - margin), 204 | display_str, 205 | fill='black', 206 | font=font) 207 | text_bottom -= text_height - 2 * margin 208 | 209 | 210 | def draw_bounding_boxes_on_image_array(image, 211 | boxes, 212 | color='red', 213 | thickness=4, 214 | display_str_list_list=()): 215 | """Draws bounding boxes on image (numpy array). 216 | 217 | Args: 218 | image: a numpy array object. 219 | boxes: a 2 dimensional numpy array of [N, 4]: (ymin, xmin, ymax, xmax). 220 | The coordinates are in normalized format between [0, 1]. 221 | color: color to draw bounding box. Default is red. 222 | thickness: line thickness. Default value is 4. 223 | display_str_list_list: list of list of strings. 224 | a list of strings for each bounding box. 225 | The reason to pass a list of strings for a 226 | bounding box is that it might contain 227 | multiple labels. 228 | 229 | Raises: 230 | ValueError: if boxes is not a [N, 4] array 231 | """ 232 | image_pil = Image.fromarray(image) 233 | draw_bounding_boxes_on_image(image_pil, boxes, color, thickness, 234 | display_str_list_list) 235 | np.copyto(image, np.array(image_pil)) 236 | 237 | 238 | def draw_bounding_boxes_on_image(image, 239 | boxes, 240 | color='red', 241 | thickness=4, 242 | display_str_list_list=()): 243 | """Draws bounding boxes on image. 244 | 245 | Args: 246 | image: a PIL.Image object. 247 | boxes: a 2 dimensional numpy array of [N, 4]: (ymin, xmin, ymax, xmax). 248 | The coordinates are in normalized format between [0, 1]. 249 | color: color to draw bounding box. Default is red. 250 | thickness: line thickness. Default value is 4. 251 | display_str_list_list: list of list of strings. 
252 | a list of strings for each bounding box. 253 | The reason to pass a list of strings for a 254 | bounding box is that it might contain 255 | multiple labels. 256 | 257 | Raises: 258 | ValueError: if boxes is not a [N, 4] array 259 | """ 260 | boxes_shape = boxes.shape 261 | if not boxes_shape: 262 | return 263 | if len(boxes_shape) != 2 or boxes_shape[1] != 4: 264 | raise ValueError('Input must be of size [N, 4]') 265 | for i in range(boxes_shape[0]): 266 | display_str_list = () 267 | if display_str_list_list: 268 | display_str_list = display_str_list_list[i] 269 | draw_bounding_box_on_image(image, boxes[i, 0], boxes[i, 1], boxes[i, 2], 270 | boxes[i, 3], color, thickness, display_str_list) 271 | 272 | 273 | def _visualize_boxes(image, boxes, classes, scores, category_index, **kwargs): 274 | return visualize_boxes_and_labels_on_image_array( 275 | image, boxes, classes, scores, category_index=category_index, **kwargs) 276 | 277 | 278 | def _visualize_boxes_and_masks(image, boxes, classes, scores, masks, 279 | category_index, **kwargs): 280 | return visualize_boxes_and_labels_on_image_array( 281 | image, 282 | boxes, 283 | classes, 284 | scores, 285 | category_index=category_index, 286 | instance_masks=masks, 287 | **kwargs) 288 | 289 | 290 | def _visualize_boxes_and_keypoints(image, boxes, classes, scores, keypoints, 291 | category_index, **kwargs): 292 | return visualize_boxes_and_labels_on_image_array( 293 | image, 294 | boxes, 295 | classes, 296 | scores, 297 | category_index=category_index, 298 | keypoints=keypoints, 299 | **kwargs) 300 | 301 | 302 | def _visualize_boxes_and_masks_and_keypoints( 303 | image, boxes, classes, scores, masks, keypoints, category_index, **kwargs): 304 | return visualize_boxes_and_labels_on_image_array( 305 | image, 306 | boxes, 307 | classes, 308 | scores, 309 | category_index=category_index, 310 | instance_masks=masks, 311 | keypoints=keypoints, 312 | **kwargs) 313 | 314 | 315 | def _resize_original_image(image, image_shape): 
316 | image = tf.expand_dims(image, 0) 317 | image = tf.image.resize_images( 318 | image, 319 | image_shape, 320 | method=tf.image.ResizeMethod.NEAREST_NEIGHBOR, 321 | align_corners=True) 322 | return tf.cast(tf.squeeze(image, 0), tf.uint8) 323 | 324 | 325 | def draw_bounding_boxes_on_image_tensors(images, 326 | boxes, 327 | classes, 328 | scores, 329 | category_index, 330 | original_image_spatial_shape=None, 331 | true_image_shape=None, 332 | instance_masks=None, 333 | keypoints=None, 334 | max_boxes_to_draw=100, 335 | min_score_thresh=0.85, 336 | use_normalized_coordinates=True): 337 | """Draws bounding boxes, masks, and keypoints on batch of image tensors. 338 | 339 | Args: 340 | images: A 4D uint8 image tensor of shape [N, H, W, C]. If C > 3, additional 341 | channels will be ignored. If C = 1, then we convert the images to RGB 342 | images. 343 | boxes: [N, max_detections, 4] float32 tensor of detection boxes. 344 | classes: [N, max_detections] int tensor of detection classes. Note that 345 | classes are 1-indexed. 346 | scores: [N, max_detections] float32 tensor of detection scores. 347 | category_index: a dict that maps integer ids to category dicts. e.g. 348 | {1: {1: 'dog'}, 2: {2: 'cat'}, ...} 349 | original_image_spatial_shape: [N, 2] tensor containing the spatial size of 350 | the original image. 351 | true_image_shape: [N, 3] tensor containing the spatial size of unpadded 352 | original_image. 353 | instance_masks: A 4D uint8 tensor of shape [N, max_detection, H, W] with 354 | instance masks. 355 | keypoints: A 4D float32 tensor of shape [N, max_detection, num_keypoints, 2] 356 | with keypoints. 357 | max_boxes_to_draw: Maximum number of boxes to draw on an image. Default 100. 358 | min_score_thresh: Minimum score threshold for visualization. Default 0.85. 359 | use_normalized_coordinates: Whether to assume boxes and keypoints are in 360 | normalized coordinates (as opposed to absolute coordinates). 361 | Default is True.
362 | 363 | Returns: 364 | 4D image tensor of type uint8, with boxes drawn on top. 365 | """ 366 | # Additional channels are being ignored. 367 | if images.shape[3] > 3: 368 | images = images[:, :, :, 0:3] 369 | elif images.shape[3] == 1: 370 | images = tf.image.grayscale_to_rgb(images) 371 | visualization_keyword_args = { 372 | 'use_normalized_coordinates': use_normalized_coordinates, 373 | 'max_boxes_to_draw': max_boxes_to_draw, 374 | 'min_score_thresh': min_score_thresh, 375 | 'agnostic_mode': False, 376 | 'line_thickness': 4 377 | } 378 | if true_image_shape is None: 379 | true_shapes = tf.constant(-1, shape=[images.shape.as_list()[0], 3]) 380 | else: 381 | true_shapes = true_image_shape 382 | if original_image_spatial_shape is None: 383 | original_shapes = tf.constant(-1, shape=[images.shape.as_list()[0], 2]) 384 | else: 385 | original_shapes = original_image_spatial_shape 386 | 387 | if instance_masks is not None and keypoints is None: 388 | visualize_boxes_fn = functools.partial( 389 | _visualize_boxes_and_masks, 390 | category_index=category_index, 391 | **visualization_keyword_args) 392 | elems = [ 393 | true_shapes, original_shapes, images, boxes, classes, scores, 394 | instance_masks 395 | ] 396 | elif instance_masks is None and keypoints is not None: 397 | visualize_boxes_fn = functools.partial( 398 | _visualize_boxes_and_keypoints, 399 | category_index=category_index, 400 | **visualization_keyword_args) 401 | elems = [ 402 | true_shapes, original_shapes, images, boxes, classes, scores, keypoints 403 | ] 404 | elif instance_masks is not None and keypoints is not None: 405 | visualize_boxes_fn = functools.partial( 406 | _visualize_boxes_and_masks_and_keypoints, 407 | category_index=category_index, 408 | **visualization_keyword_args) 409 | elems = [ 410 | true_shapes, original_shapes, images, boxes, classes, scores, 411 | instance_masks, keypoints 412 | ] 413 | else: 414 | visualize_boxes_fn = functools.partial( 415 | _visualize_boxes, 416 | 
category_index=category_index, 417 | **visualization_keyword_args) 418 | elems = [ 419 | true_shapes, original_shapes, images, boxes, classes, scores 420 | ] 421 | 422 | def draw_boxes(image_and_detections): 423 | """Draws boxes on image.""" 424 | true_shape = image_and_detections[0] 425 | original_shape = image_and_detections[1] 426 | if true_image_shape is not None: 427 | image = shape_utils.pad_or_clip_nd(image_and_detections[2], 428 | [true_shape[0], true_shape[1], 3]) 429 | if original_image_spatial_shape is not None: 430 | image_and_detections[2] = _resize_original_image(image, original_shape) 431 | 432 | image_with_boxes = tf.py_func(visualize_boxes_fn, image_and_detections[2:], 433 | tf.uint8) 434 | return image_with_boxes 435 | 436 | images = tf.map_fn(draw_boxes, elems, dtype=tf.uint8, back_prop=False) 437 | return images 438 | 439 | 440 | def draw_side_by_side_evaluation_image(eval_dict, 441 | category_index, 442 | max_boxes_to_draw=20, 443 | min_score_thresh=0.2, 444 | use_normalized_coordinates=True): 445 | """Creates a side-by-side image with detections and groundtruth. 446 | 447 | Bounding boxes (and instance masks, if available) are visualized on both 448 | subimages. 449 | 450 | Args: 451 | eval_dict: The evaluation dictionary returned by 452 | eval_util.result_dict_for_batched_example() or 453 | eval_util.result_dict_for_single_example(). 454 | category_index: A category index (dictionary) produced from a labelmap. 455 | max_boxes_to_draw: The maximum number of boxes to draw for detections. 456 | min_score_thresh: The minimum score threshold for showing detections. 457 | use_normalized_coordinates: Whether to assume boxes and keypoints are in 458 | normalized coordinates (as opposed to absolute coordinates). 459 | Default is True. 460 | 461 | Returns: 462 | A list of [1, H, 2 * W, C] uint8 tensors. The subimage on the left 463 | corresponds to detections, while the subimage on the right corresponds to 464 | groundtruth.
465 | """ 466 | detection_fields = fields.DetectionResultFields() 467 | input_data_fields = fields.InputDataFields() 468 | 469 | images_with_detections_list = [] 470 | 471 | # Add the batch dimension if the eval_dict is for single example. 472 | if len(eval_dict[detection_fields.detection_classes].shape) == 1: 473 | for key in eval_dict: 474 | if key != input_data_fields.original_image: 475 | eval_dict[key] = tf.expand_dims(eval_dict[key], 0) 476 | 477 | for indx in range(eval_dict[input_data_fields.original_image].shape[0]): 478 | instance_masks = None 479 | if detection_fields.detection_masks in eval_dict: 480 | instance_masks = tf.cast( 481 | tf.expand_dims( 482 | eval_dict[detection_fields.detection_masks][indx], axis=0), 483 | tf.uint8) 484 | keypoints = None 485 | if detection_fields.detection_keypoints in eval_dict: 486 | keypoints = tf.expand_dims( 487 | eval_dict[detection_fields.detection_keypoints][indx], axis=0) 488 | groundtruth_instance_masks = None 489 | if input_data_fields.groundtruth_instance_masks in eval_dict: 490 | groundtruth_instance_masks = tf.cast( 491 | tf.expand_dims( 492 | eval_dict[input_data_fields.groundtruth_instance_masks][indx], 493 | axis=0), tf.uint8) 494 | 495 | images_with_detections = draw_bounding_boxes_on_image_tensors( 496 | tf.expand_dims( 497 | eval_dict[input_data_fields.original_image][indx], axis=0), 498 | tf.expand_dims( 499 | eval_dict[detection_fields.detection_boxes][indx], axis=0), 500 | tf.expand_dims( 501 | eval_dict[detection_fields.detection_classes][indx], axis=0), 502 | tf.expand_dims( 503 | eval_dict[detection_fields.detection_scores][indx], axis=0), 504 | category_index, 505 | original_image_spatial_shape=tf.expand_dims( 506 | eval_dict[input_data_fields.original_image_spatial_shape][indx], 507 | axis=0), 508 | true_image_shape=tf.expand_dims( 509 | eval_dict[input_data_fields.true_image_shape][indx], axis=0), 510 | instance_masks=instance_masks, 511 | keypoints=keypoints, 512 | 
max_boxes_to_draw=max_boxes_to_draw, 513 | min_score_thresh=min_score_thresh, 514 | use_normalized_coordinates=use_normalized_coordinates) 515 | images_with_groundtruth = draw_bounding_boxes_on_image_tensors( 516 | tf.expand_dims( 517 | eval_dict[input_data_fields.original_image][indx], axis=0), 518 | tf.expand_dims( 519 | eval_dict[input_data_fields.groundtruth_boxes][indx], axis=0), 520 | tf.expand_dims( 521 | eval_dict[input_data_fields.groundtruth_classes][indx], axis=0), 522 | tf.expand_dims( 523 | tf.ones_like( 524 | eval_dict[input_data_fields.groundtruth_classes][indx], 525 | dtype=tf.float32), 526 | axis=0), 527 | category_index, 528 | original_image_spatial_shape=tf.expand_dims( 529 | eval_dict[input_data_fields.original_image_spatial_shape][indx], 530 | axis=0), 531 | true_image_shape=tf.expand_dims( 532 | eval_dict[input_data_fields.true_image_shape][indx], axis=0), 533 | instance_masks=groundtruth_instance_masks, 534 | keypoints=None, 535 | max_boxes_to_draw=None, 536 | min_score_thresh=0.0, 537 | use_normalized_coordinates=use_normalized_coordinates) 538 | images_with_detections_list.append( 539 | tf.concat([images_with_detections, images_with_groundtruth], axis=2)) 540 | return images_with_detections_list 541 | 542 | 543 | def draw_keypoints_on_image_array(image, 544 | keypoints, 545 | color='red', 546 | radius=2, 547 | use_normalized_coordinates=True): 548 | """Draws keypoints on an image (numpy array). 549 | 550 | Args: 551 | image: a numpy array with shape [height, width, 3]. 552 | keypoints: a numpy array with shape [num_keypoints, 2]. 553 | color: color to draw the keypoints with. Default is red. 554 | radius: keypoint radius. Default value is 2. 555 | use_normalized_coordinates: if True (default), treat keypoint values as 556 | relative to the image. Otherwise treat them as absolute. 
557 | """ 558 | image_pil = Image.fromarray(np.uint8(image)).convert('RGB') 559 | draw_keypoints_on_image(image_pil, keypoints, color, radius, 560 | use_normalized_coordinates) 561 | np.copyto(image, np.array(image_pil)) 562 | 563 | 564 | def draw_keypoints_on_image(image, 565 | keypoints, 566 | color='red', 567 | radius=2, 568 | use_normalized_coordinates=True): 569 | """Draws keypoints on an image. 570 | 571 | Args: 572 | image: a PIL.Image object. 573 | keypoints: a numpy array with shape [num_keypoints, 2]. 574 | color: color to draw the keypoints with. Default is red. 575 | radius: keypoint radius. Default value is 2. 576 | use_normalized_coordinates: if True (default), treat keypoint values as 577 | relative to the image. Otherwise treat them as absolute. 578 | """ 579 | draw = ImageDraw.Draw(image) 580 | im_width, im_height = image.size 581 | keypoints_x = [k[1] for k in keypoints] 582 | keypoints_y = [k[0] for k in keypoints] 583 | if use_normalized_coordinates: 584 | keypoints_x = tuple([im_width * x for x in keypoints_x]) 585 | keypoints_y = tuple([im_height * y for y in keypoints_y]) 586 | for keypoint_x, keypoint_y in zip(keypoints_x, keypoints_y): 587 | draw.ellipse([(keypoint_x - radius, keypoint_y - radius), 588 | (keypoint_x + radius, keypoint_y + radius)], 589 | outline=color, fill=color) 590 | 591 | 592 | def draw_mask_on_image_array(image, mask, color='red', alpha=0.4): 593 | """Draws mask on an image. 594 | 595 | Args: 596 | image: uint8 numpy array with shape (img_height, img_width, 3) 597 | mask: a uint8 numpy array of shape (img_height, img_width) with 598 | values of either 0 or 1. 599 | color: color to draw the mask with. Default is red. 600 | alpha: transparency value between 0 and 1. (default: 0.4) 601 | 602 | Raises: 603 | ValueError: On incorrect data type for image or masks.
604 | """ 605 | if image.dtype != np.uint8: 606 | raise ValueError('`image` not of type np.uint8') 607 | if mask.dtype != np.uint8: 608 | raise ValueError('`mask` not of type np.uint8') 609 | if np.any(np.logical_and(mask != 1, mask != 0)): 610 | raise ValueError('`mask` elements should be in [0, 1]') 611 | if image.shape[:2] != mask.shape: 612 | raise ValueError('The image has spatial dimensions %s but the mask has ' 613 | 'dimensions %s' % (image.shape[:2], mask.shape)) 614 | rgb = ImageColor.getrgb(color) 615 | pil_image = Image.fromarray(image) 616 | 617 | solid_color = np.expand_dims( 618 | np.ones_like(mask), axis=2) * np.reshape(list(rgb), [1, 1, 3]) 619 | pil_solid_color = Image.fromarray(np.uint8(solid_color)).convert('RGBA') 620 | pil_mask = Image.fromarray(np.uint8(255.0*alpha*mask)).convert('L') 621 | pil_image = Image.composite(pil_solid_color, pil_image, pil_mask) 622 | np.copyto(image, np.array(pil_image.convert('RGB'))) 623 | 624 | 625 | def visualize_boxes_and_labels_on_image_array( 626 | image, 627 | # image_path, 628 | # output_path, 629 | boxes, 630 | classes, 631 | scores, 632 | category_index, 633 | instance_masks=None, 634 | instance_boundaries=None, 635 | keypoints=None, 636 | use_normalized_coordinates=False, 637 | max_boxes_to_draw=20, 638 | min_score_thresh=.5, 639 | agnostic_mode=False, 640 | line_thickness=4, 641 | groundtruth_box_visualization_color='black', 642 | skip_scores=False, 643 | skip_labels=False): 644 | """Overlay labeled boxes on an image with formatted scores and label names. 645 | 646 | This function groups boxes that correspond to the same location 647 | and creates a display string for each detection and overlays these 648 | on the image. Note that this function modifies the image in place, and returns 649 | that same image. 650 | 651 | Args: 652 | image: uint8 numpy array with shape (img_height, img_width, 3) 653 | boxes: a numpy array of shape [N, 4] 654 | classes: a numpy array of shape [N]. 
Note that class indices are 1-based, 655 | and match the keys in the label map. 656 | scores: a numpy array of shape [N] or None. If scores=None, then 657 | this function assumes that the boxes to be plotted are groundtruth 658 | boxes and plots all boxes as black with no classes or scores. 659 | category_index: a dict containing category dictionaries (each holding 660 | category index `id` and category name `name`) keyed by category indices. 661 | instance_masks: a numpy array of shape [N, image_height, image_width] with 662 | values ranging between 0 and 1, can be None. 663 | instance_boundaries: a numpy array of shape [N, image_height, image_width] 664 | with values ranging between 0 and 1, can be None. 665 | keypoints: a numpy array of shape [N, num_keypoints, 2], can 666 | be None 667 | use_normalized_coordinates: whether boxes is to be interpreted as 668 | normalized coordinates or not. 669 | max_boxes_to_draw: maximum number of boxes to visualize. If None, draw 670 | all boxes. 671 | min_score_thresh: minimum score threshold for a box to be visualized 672 | agnostic_mode: boolean (default: False) controlling whether to evaluate in 673 | class-agnostic mode or not. This mode will display scores but ignore 674 | classes. 675 | line_thickness: integer (default: 4) controlling line width of the boxes. 676 | groundtruth_box_visualization_color: box color for visualizing groundtruth 677 | boxes 678 | skip_scores: whether to skip score when drawing a single detection 679 | skip_labels: whether to skip label when drawing a single detection 680 | 681 | Returns: 682 | A tuple (image, box_to_color_map, box_to_display_str_map): the uint8 numpy array with shape (img_height, img_width, 3) with overlaid boxes, the dict mapping each box tuple to its color, and the dict mapping each box tuple to its list of display strings. 683 | """ 684 | # Create a display string (and color) for every box location, group any boxes 685 | # that correspond to the same location.
686 | box_to_display_str_map = collections.defaultdict(list) 687 | box_to_color_map = collections.defaultdict(str) 688 | box_to_instance_masks_map = {} 689 | box_to_instance_boundaries_map = {} 690 | box_to_keypoints_map = collections.defaultdict(list) 691 | if not max_boxes_to_draw: 692 | max_boxes_to_draw = boxes.shape[0] 693 | for i in range(min(max_boxes_to_draw, boxes.shape[0])): 694 | if scores is None or scores[i] > min_score_thresh: 695 | box = tuple(boxes[i].tolist()) 696 | if instance_masks is not None: 697 | box_to_instance_masks_map[box] = instance_masks[i] 698 | if instance_boundaries is not None: 699 | box_to_instance_boundaries_map[box] = instance_boundaries[i] 700 | if keypoints is not None: 701 | box_to_keypoints_map[box].extend(keypoints[i]) 702 | if scores is None: 703 | box_to_color_map[box] = groundtruth_box_visualization_color 704 | else: 705 | display_str = '' 706 | if not skip_labels: 707 | if not agnostic_mode: 708 | if classes[i] in category_index.keys(): 709 | class_name = category_index[classes[i]]['name'] 710 | else: 711 | class_name = 'N/A' 712 | display_str = str(class_name) 713 | if not skip_scores: 714 | if not display_str: 715 | display_str = '{}%'.format(int(100*scores[i])) 716 | else: 717 | display_str = '{}: {}%'.format(display_str, int(100*scores[i])) 718 | box_to_display_str_map[box].append(display_str) 719 | if agnostic_mode: 720 | box_to_color_map[box] = 'DarkOrange' 721 | else: 722 | box_to_color_map[box] = STANDARD_COLORS[ 723 | classes[i] % len(STANDARD_COLORS)] 724 | 725 | 726 | # # Crop bounding boxes to split images; moved out of this file for OCR. 727 | 728 | # # Convert coordinates to raw pixels.
729 | # t = 0 730 | # for box, color in box_to_color_map.items(): 731 | # ymin, xmin, ymax, xmax = box 732 | 733 | # img = Image.fromarray(np.uint8(image)) 734 | # im_width, im_height = img.size 735 | # new_xmin = int(xmin * im_width) 736 | # new_xmax = int(xmax * im_width) 737 | # new_ymin = int(ymin * im_height) 738 | # new_ymax = int(ymax * im_height) 739 | 740 | # img_n = box_to_display_str_map[box][0] 741 | # img_name = img_n.replace(': ','-') 742 | 743 | # # Crop the image. Note that the PIL and NumPy coordinate orders are reversed! 744 | # image_corp = image[new_ymin:new_ymax,new_xmin:new_xmax] 745 | 746 | # image_corp = Image.fromarray(np.uint8(image_corp)) 747 | 748 | # # Save the cropped image to output_path, and include the output label in the file name. 749 | # if img_name.find('container_number') >= 0: 750 | # t += 1 751 | # image_corp.save(os.path.join(output_path) + '/' +img_name +'_' + (str(t)+'_') + os.path.basename(image_path)) 752 | 753 | 754 | # Draw all boxes onto image. 755 | for box, color in box_to_color_map.items(): 756 | ymin, xmin, ymax, xmax = box 757 | if instance_masks is not None: 758 | draw_mask_on_image_array( 759 | image, 760 | box_to_instance_masks_map[box], 761 | color=color 762 | ) 763 | if instance_boundaries is not None: 764 | draw_mask_on_image_array( 765 | image, 766 | box_to_instance_boundaries_map[box], 767 | color='red', 768 | alpha=1.0 769 | ) 770 | 771 | draw_bounding_box_on_image_array( 772 | image, 773 | ymin, 774 | xmin, 775 | ymax, 776 | xmax, 777 | color=color, 778 | thickness=line_thickness, 779 | display_str_list=box_to_display_str_map[box], 780 | use_normalized_coordinates=use_normalized_coordinates) 781 | if keypoints is not None: 782 | print(box_to_keypoints_map[box]) 783 | draw_keypoints_on_image_array( 784 | image, 785 | box_to_keypoints_map[box], 786 | color=color, 787 | radius=line_thickness / 2, 788 | use_normalized_coordinates=use_normalized_coordinates) 789 | 790 | return image, box_to_color_map, box_to_display_str_map 791 | 792 | 793 | 
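The commented-out block in the function above converts normalized box coordinates to pixels before cropping, and it warns that PIL and NumPy order the axes differently. A minimal sketch of that conversion, assuming boxes arrive as (ymin, xmin, ymax, xmax) in [0, 1]; `crop_normalized_box` is a hypothetical helper, not part of this module.

```python
import numpy as np


def crop_normalized_box(image, box):
    """Crop one (ymin, xmin, ymax, xmax) box, given in [0, 1], out of an image.

    Hypothetical helper for illustration only.
    """
    # NumPy arrays are indexed (row, column) = (y, x); PIL sizes are (width, height).
    im_height, im_width = image.shape[:2]
    ymin, xmin, ymax, xmax = box
    top, bottom = int(ymin * im_height), int(ymax * im_height)
    left, right = int(xmin * im_width), int(xmax * im_width)
    return image[top:bottom, left:right]


image = np.zeros((100, 200, 3), dtype=np.uint8)   # height 100, width 200
crop = crop_normalized_box(image, (0.25, 0.25, 0.5, 0.75))
print(crop.shape)  # (25, 100, 3)
```

Reading the height and width from `image.shape` avoids the round trip through a PIL image that the commented-out code makes only to get the size.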
def add_cdf_image_summary(values, name):
  """Adds a tf.summary.image for a CDF plot of the values.

  Normalizes `values` such that they sum to 1, plots the cumulative distribution
  function and creates a tf image summary.

  Args:
    values: a 1-D float32 tensor containing the values.
    name: name for the image summary.
  """
  def cdf_plot(values):
    """Numpy function to plot CDF."""
    normalized_values = values / np.sum(values)
    sorted_values = np.sort(normalized_values)
    cumulative_values = np.cumsum(sorted_values)
    fraction_of_examples = (np.arange(cumulative_values.size, dtype=np.float32)
                            / cumulative_values.size)
    fig = plt.figure(frameon=False)
    # Pass the subplot spec as an integer; recent matplotlib releases reject
    # the string form '111'.
    ax = fig.add_subplot(111)
    ax.plot(fraction_of_examples, cumulative_values)
    ax.set_ylabel('cumulative normalized values')
    ax.set_xlabel('fraction of examples')
    fig.canvas.draw()
    width, height = fig.get_size_inches() * fig.get_dpi()
    # np.frombuffer replaces the deprecated binary mode of np.fromstring.
    image = np.frombuffer(fig.canvas.tostring_rgb(), dtype='uint8').reshape(
        1, int(height), int(width), 3)
    return image
  cdf_plot = tf.py_func(cdf_plot, [values], tf.uint8)
  tf.summary.image(name, cdf_plot)


def add_hist_image_summary(values, bins, name):
  """Adds a tf.summary.image for a histogram plot of the values.

  Plots the histogram of values and creates a tf image summary.

  Args:
    values: a 1-D float32 tensor containing the values.
    bins: bin edges which will be directly passed to np.histogram.
    name: name for the image summary.
  """

  def hist_plot(values, bins):
    """Numpy function to plot hist."""
    fig = plt.figure(frameon=False)
    # Pass the subplot spec as an integer; recent matplotlib releases reject
    # the string form '111'.
    ax = fig.add_subplot(111)
    y, x = np.histogram(values, bins=bins)
    ax.plot(x[:-1], y)
    ax.set_ylabel('count')
    ax.set_xlabel('value')
    fig.canvas.draw()
    width, height = fig.get_size_inches() * fig.get_dpi()
    # np.frombuffer replaces the deprecated binary mode of np.fromstring.
    image = np.frombuffer(
        fig.canvas.tostring_rgb(), dtype='uint8').reshape(
            1, int(height), int(width), 3)
    return image
  hist_plot = tf.py_func(hist_plot, [values, bins], tf.uint8)
  tf.summary.image(name, hist_plot)


class EvalMetricOpsVisualization(object):
  """Abstract base class responsible for visualizations during evaluation.

  Currently, summary images are not run during evaluation. One way to produce
  evaluation images in Tensorboard is to provide tf.summary.image strings as
  `value_ops` in tf.estimator.EstimatorSpec's `eval_metric_ops`. This class is
  responsible for accruing images (with overlaid detections and groundtruth)
  and returning a dictionary that can be passed to `eval_metric_ops`.
  """
  __metaclass__ = abc.ABCMeta

  def __init__(self,
               category_index,
               max_examples_to_draw=5,
               max_boxes_to_draw=20,
               min_score_thresh=0.2,
               use_normalized_coordinates=True,
               summary_name_prefix='evaluation_image'):
    """Creates an EvalMetricOpsVisualization.

    Args:
      category_index: A category index (dictionary) produced from a labelmap.
      max_examples_to_draw: The maximum number of example summaries to produce.
      max_boxes_to_draw: The maximum number of boxes to draw for detections.
      min_score_thresh: The minimum score threshold for showing detections.
      use_normalized_coordinates: Whether to assume boxes and keypoints are in
        normalized coordinates (as opposed to absolute coordinates).
        Default is True.
      summary_name_prefix: A string prefix for each image summary.
    """

    self._category_index = category_index
    self._max_examples_to_draw = max_examples_to_draw
    self._max_boxes_to_draw = max_boxes_to_draw
    self._min_score_thresh = min_score_thresh
    self._use_normalized_coordinates = use_normalized_coordinates
    self._summary_name_prefix = summary_name_prefix
    self._images = []

  def clear(self):
    self._images = []

  def add_images(self, images):
    """Store a list of images, each with shape [1, H, W, C]."""
    if len(self._images) >= self._max_examples_to_draw:
      return

    # Store images and clip list if necessary.
    self._images.extend(images)
    if len(self._images) > self._max_examples_to_draw:
      self._images[self._max_examples_to_draw:] = []

  def get_estimator_eval_metric_ops(self, eval_dict):
    """Returns metric ops for use in tf.estimator.EstimatorSpec.

    Args:
      eval_dict: A dictionary that holds an image, groundtruth, and detections
        for a batched example. Note that only the first example is used for
        visualization. See eval_util.result_dict_for_batched_example() for a
        convenient method for constructing such a dictionary. The dictionary
        contains
        fields.InputDataFields.original_image: [batch_size, H, W, 3] image.
        fields.InputDataFields.original_image_spatial_shape: [batch_size, 2]
          tensor containing the size of the original image.
        fields.InputDataFields.true_image_shape: [batch_size, 3]
          tensor containing the spatial size of the unpadded original image.
        fields.InputDataFields.groundtruth_boxes - [batch_size, num_boxes, 4]
          float32 tensor with groundtruth boxes in range [0.0, 1.0].
        fields.InputDataFields.groundtruth_classes - [batch_size, num_boxes]
          int64 tensor with 1-indexed groundtruth classes.
        fields.InputDataFields.groundtruth_instance_masks - (optional)
          [batch_size, num_boxes, H, W] int64 tensor with instance masks.
        fields.DetectionResultFields.detection_boxes - [batch_size,
          max_num_boxes, 4] float32 tensor with detection boxes in range [0.0,
          1.0].
        fields.DetectionResultFields.detection_classes - [batch_size,
          max_num_boxes] int64 tensor with 1-indexed detection classes.
        fields.DetectionResultFields.detection_scores - [batch_size,
          max_num_boxes] float32 tensor with detection scores.
        fields.DetectionResultFields.detection_masks - (optional) [batch_size,
          max_num_boxes, H, W] float32 tensor of binarized masks.
        fields.DetectionResultFields.detection_keypoints - (optional)
          [batch_size, max_num_boxes, num_keypoints, 2] float32 tensor with
          keypoints.

    Returns:
      A dictionary of image summary names to tuple of (value_op, update_op). The
      `update_op` is the same for all items in the dictionary, and is
      responsible for saving a single side-by-side image with detections and
      groundtruth. Each `value_op` holds the tf.summary.image string for a given
      image.
    """
    if self._max_examples_to_draw == 0:
      return {}
    images = self.images_from_evaluation_dict(eval_dict)

    def get_images():
      """Returns a list of images, padded to self._max_examples_to_draw."""
      images = self._images
      # Pad with scalar placeholders; they are filtered out below because
      # their rank is not 4.
      while len(images) < self._max_examples_to_draw:
        images.append(np.array(0, dtype=np.uint8))
      self.clear()
      return images

    def image_summary_or_default_string(summary_name, image):
      """Returns image summaries for non-padded elements."""
      return tf.cond(
          tf.equal(tf.size(tf.shape(image)), 4),
          lambda: tf.summary.image(summary_name, image),
          lambda: tf.constant(''))

    update_op = tf.py_func(self.add_images, [[images[0]]], [])
    image_tensors = tf.py_func(
        get_images, [], [tf.uint8] * self._max_examples_to_draw)
    eval_metric_ops = {}
    for i, image in enumerate(image_tensors):
      summary_name = self._summary_name_prefix + '/' + str(i)
      value_op = image_summary_or_default_string(summary_name, image)
      eval_metric_ops[summary_name] = (value_op, update_op)
    return eval_metric_ops

  @abc.abstractmethod
  def images_from_evaluation_dict(self, eval_dict):
    """Converts evaluation dictionary into a list of image tensors.

    To be overridden by implementations.

    Args:
      eval_dict: A dictionary with all the necessary information for producing
        visualizations.

    Returns:
      A list of [1, H, W, C] uint8 tensors.
    """
    raise NotImplementedError


class VisualizeSingleFrameDetections(EvalMetricOpsVisualization):
  """Class responsible for single-frame object detection visualizations."""

  def __init__(self,
               category_index,
               max_examples_to_draw=5,
               max_boxes_to_draw=20,
               min_score_thresh=0.2,
               use_normalized_coordinates=True,
               summary_name_prefix='Detections_Left_Groundtruth_Right'):
    super(VisualizeSingleFrameDetections, self).__init__(
        category_index=category_index,
        max_examples_to_draw=max_examples_to_draw,
        max_boxes_to_draw=max_boxes_to_draw,
        min_score_thresh=min_score_thresh,
        use_normalized_coordinates=use_normalized_coordinates,
        summary_name_prefix=summary_name_prefix)

  def images_from_evaluation_dict(self, eval_dict):
    return draw_side_by_side_evaluation_image(
        eval_dict, self._category_index, self._max_boxes_to_draw,
        self._min_score_thresh, self._use_normalized_coordinates)

--------------------------------------------------------------------------------