├── LICENSE
├── README.md
├── cont_num.txt
├── create_pascal_tf_record.py
├── data
│   └── container_label_map.pbtxt
├── detection_var_image.py
├── generate_voc_datasets.py
├── image
│   ├── image1.jpg
│   ├── image2.jpg
│   ├── image3.jpg
│   ├── image4.jpg
│   └── image5.jpg
└── utils
    ├── __init__.py
    └── visualization_utils.py


/LICENSE:
--------------------------------------------------------------------------------
  1 |                     GNU GENERAL PUBLIC LICENSE
  2 |                        Version 3, 29 June 2007
  3 | 
  4 |  Copyright (C) 2007 Free Software Foundation, Inc. <https://fsf.org/>
  5 |  Everyone is permitted to copy and distribute verbatim copies
  6 |  of this license document, but changing it is not allowed.
  7 | 
  8 |                             Preamble
  9 | 
 10 |   The GNU General Public License is a free, copyleft license for
 11 | software and other kinds of works.
 12 | 
 13 |   The licenses for most software and other practical works are designed
 14 | to take away your freedom to share and change the works.  By contrast,
 15 | the GNU General Public License is intended to guarantee your freedom to
 16 | share and change all versions of a program--to make sure it remains free
 17 | software for all its users.  We, the Free Software Foundation, use the
 18 | GNU General Public License for most of our software; it applies also to
 19 | any other work released this way by its authors.  You can apply it to
 20 | your programs, too.
 21 | 
 22 |   When we speak of free software, we are referring to freedom, not
 23 | price.  Our General Public Licenses are designed to make sure that you
 24 | have the freedom to distribute copies of free software (and charge for
 25 | them if you wish), that you receive source code or can get it if you
 26 | want it, that you can change the software or use pieces of it in new
 27 | free programs, and that you know you can do these things.
 28 | 
 29 |   To protect your rights, we need to prevent others from denying you
 30 | these rights or asking you to surrender the rights.  Therefore, you have
 31 | certain responsibilities if you distribute copies of the software, or if
 32 | you modify it: responsibilities to respect the freedom of others.
 33 | 
 34 |   For example, if you distribute copies of such a program, whether
 35 | gratis or for a fee, you must pass on to the recipients the same
 36 | freedoms that you received.  You must make sure that they, too, receive
 37 | or can get the source code.  And you must show them these terms so they
 38 | know their rights.
 39 | 
 40 |   Developers that use the GNU GPL protect your rights with two steps:
 41 | (1) assert copyright on the software, and (2) offer you this License
 42 | giving you legal permission to copy, distribute and/or modify it.
 43 | 
 44 |   For the developers' and authors' protection, the GPL clearly explains
 45 | that there is no warranty for this free software.  For both users' and
 46 | authors' sake, the GPL requires that modified versions be marked as
 47 | changed, so that their problems will not be attributed erroneously to
 48 | authors of previous versions.
 49 | 
 50 |   Some devices are designed to deny users access to install or run
 51 | modified versions of the software inside them, although the manufacturer
 52 | can do so.  This is fundamentally incompatible with the aim of
 53 | protecting users' freedom to change the software.  The systematic
 54 | pattern of such abuse occurs in the area of products for individuals to
 55 | use, which is precisely where it is most unacceptable.  Therefore, we
 56 | have designed this version of the GPL to prohibit the practice for those
 57 | products.  If such problems arise substantially in other domains, we
 58 | stand ready to extend this provision to those domains in future versions
 59 | of the GPL, as needed to protect the freedom of users.
 60 | 
 61 |   Finally, every program is threatened constantly by software patents.
 62 | States should not allow patents to restrict development and use of
 63 | software on general-purpose computers, but in those that do, we wish to
 64 | avoid the special danger that patents applied to a free program could
 65 | make it effectively proprietary.  To prevent this, the GPL assures that
 66 | patents cannot be used to render the program non-free.
 67 | 
 68 |   The precise terms and conditions for copying, distribution and
 69 | modification follow.
 70 | 
 71 |                        TERMS AND CONDITIONS
 72 | 
 73 |   0. Definitions.
 74 | 
 75 |   "This License" refers to version 3 of the GNU General Public License.
 76 | 
 77 |   "Copyright" also means copyright-like laws that apply to other kinds of
 78 | works, such as semiconductor masks.
 79 | 
 80 |   "The Program" refers to any copyrightable work licensed under this
 81 | License.  Each licensee is addressed as "you".  "Licensees" and
 82 | "recipients" may be individuals or organizations.
 83 | 
 84 |   To "modify" a work means to copy from or adapt all or part of the work
 85 | in a fashion requiring copyright permission, other than the making of an
 86 | exact copy.  The resulting work is called a "modified version" of the
 87 | earlier work or a work "based on" the earlier work.
 88 | 
 89 |   A "covered work" means either the unmodified Program or a work based
 90 | on the Program.
 91 | 
 92 |   To "propagate" a work means to do anything with it that, without
 93 | permission, would make you directly or secondarily liable for
 94 | infringement under applicable copyright law, except executing it on a
 95 | computer or modifying a private copy.  Propagation includes copying,
 96 | distribution (with or without modification), making available to the
 97 | public, and in some countries other activities as well.
 98 | 
 99 |   To "convey" a work means any kind of propagation that enables other
100 | parties to make or receive copies.  Mere interaction with a user through
101 | a computer network, with no transfer of a copy, is not conveying.
102 | 
103 |   An interactive user interface displays "Appropriate Legal Notices"
104 | to the extent that it includes a convenient and prominently visible
105 | feature that (1) displays an appropriate copyright notice, and (2)
106 | tells the user that there is no warranty for the work (except to the
107 | extent that warranties are provided), that licensees may convey the
108 | work under this License, and how to view a copy of this License.  If
109 | the interface presents a list of user commands or options, such as a
110 | menu, a prominent item in the list meets this criterion.
111 | 
112 |   1. Source Code.
113 | 
114 |   The "source code" for a work means the preferred form of the work
115 | for making modifications to it.  "Object code" means any non-source
116 | form of a work.
117 | 
118 |   A "Standard Interface" means an interface that either is an official
119 | standard defined by a recognized standards body, or, in the case of
120 | interfaces specified for a particular programming language, one that
121 | is widely used among developers working in that language.
122 | 
123 |   The "System Libraries" of an executable work include anything, other
124 | than the work as a whole, that (a) is included in the normal form of
125 | packaging a Major Component, but which is not part of that Major
126 | Component, and (b) serves only to enable use of the work with that
127 | Major Component, or to implement a Standard Interface for which an
128 | implementation is available to the public in source code form.  A
129 | "Major Component", in this context, means a major essential component
130 | (kernel, window system, and so on) of the specific operating system
131 | (if any) on which the executable work runs, or a compiler used to
132 | produce the work, or an object code interpreter used to run it.
133 | 
134 |   The "Corresponding Source" for a work in object code form means all
135 | the source code needed to generate, install, and (for an executable
136 | work) run the object code and to modify the work, including scripts to
137 | control those activities.  However, it does not include the work's
138 | System Libraries, or general-purpose tools or generally available free
139 | programs which are used unmodified in performing those activities but
140 | which are not part of the work.  For example, Corresponding Source
141 | includes interface definition files associated with source files for
142 | the work, and the source code for shared libraries and dynamically
143 | linked subprograms that the work is specifically designed to require,
144 | such as by intimate data communication or control flow between those
145 | subprograms and other parts of the work.
146 | 
147 |   The Corresponding Source need not include anything that users
148 | can regenerate automatically from other parts of the Corresponding
149 | Source.
150 | 
151 |   The Corresponding Source for a work in source code form is that
152 | same work.
153 | 
154 |   2. Basic Permissions.
155 | 
156 |   All rights granted under this License are granted for the term of
157 | copyright on the Program, and are irrevocable provided the stated
158 | conditions are met.  This License explicitly affirms your unlimited
159 | permission to run the unmodified Program.  The output from running a
160 | covered work is covered by this License only if the output, given its
161 | content, constitutes a covered work.  This License acknowledges your
162 | rights of fair use or other equivalent, as provided by copyright law.
163 | 
164 |   You may make, run and propagate covered works that you do not
165 | convey, without conditions so long as your license otherwise remains
166 | in force.  You may convey covered works to others for the sole purpose
167 | of having them make modifications exclusively for you, or provide you
168 | with facilities for running those works, provided that you comply with
169 | the terms of this License in conveying all material for which you do
170 | not control copyright.  Those thus making or running the covered works
171 | for you must do so exclusively on your behalf, under your direction
172 | and control, on terms that prohibit them from making any copies of
173 | your copyrighted material outside their relationship with you.
174 | 
175 |   Conveying under any other circumstances is permitted solely under
176 | the conditions stated below.  Sublicensing is not allowed; section 10
177 | makes it unnecessary.
178 | 
179 |   3. Protecting Users' Legal Rights From Anti-Circumvention Law.
180 | 
181 |   No covered work shall be deemed part of an effective technological
182 | measure under any applicable law fulfilling obligations under article
183 | 11 of the WIPO copyright treaty adopted on 20 December 1996, or
184 | similar laws prohibiting or restricting circumvention of such
185 | measures.
186 | 
187 |   When you convey a covered work, you waive any legal power to forbid
188 | circumvention of technological measures to the extent such circumvention
189 | is effected by exercising rights under this License with respect to
190 | the covered work, and you disclaim any intention to limit operation or
191 | modification of the work as a means of enforcing, against the work's
192 | users, your or third parties' legal rights to forbid circumvention of
193 | technological measures.
194 | 
195 |   4. Conveying Verbatim Copies.
196 | 
197 |   You may convey verbatim copies of the Program's source code as you
198 | receive it, in any medium, provided that you conspicuously and
199 | appropriately publish on each copy an appropriate copyright notice;
200 | keep intact all notices stating that this License and any
201 | non-permissive terms added in accord with section 7 apply to the code;
202 | keep intact all notices of the absence of any warranty; and give all
203 | recipients a copy of this License along with the Program.
204 | 
205 |   You may charge any price or no price for each copy that you convey,
206 | and you may offer support or warranty protection for a fee.
207 | 
208 |   5. Conveying Modified Source Versions.
209 | 
210 |   You may convey a work based on the Program, or the modifications to
211 | produce it from the Program, in the form of source code under the
212 | terms of section 4, provided that you also meet all of these conditions:
213 | 
214 |     a) The work must carry prominent notices stating that you modified
215 |     it, and giving a relevant date.
216 | 
217 |     b) The work must carry prominent notices stating that it is
218 |     released under this License and any conditions added under section
219 |     7.  This requirement modifies the requirement in section 4 to
220 |     "keep intact all notices".
221 | 
222 |     c) You must license the entire work, as a whole, under this
223 |     License to anyone who comes into possession of a copy.  This
224 |     License will therefore apply, along with any applicable section 7
225 |     additional terms, to the whole of the work, and all its parts,
226 |     regardless of how they are packaged.  This License gives no
227 |     permission to license the work in any other way, but it does not
228 |     invalidate such permission if you have separately received it.
229 | 
230 |     d) If the work has interactive user interfaces, each must display
231 |     Appropriate Legal Notices; however, if the Program has interactive
232 |     interfaces that do not display Appropriate Legal Notices, your
233 |     work need not make them do so.
234 | 
235 |   A compilation of a covered work with other separate and independent
236 | works, which are not by their nature extensions of the covered work,
237 | and which are not combined with it such as to form a larger program,
238 | in or on a volume of a storage or distribution medium, is called an
239 | "aggregate" if the compilation and its resulting copyright are not
240 | used to limit the access or legal rights of the compilation's users
241 | beyond what the individual works permit.  Inclusion of a covered work
242 | in an aggregate does not cause this License to apply to the other
243 | parts of the aggregate.
244 | 
245 |   6. Conveying Non-Source Forms.
246 | 
247 |   You may convey a covered work in object code form under the terms
248 | of sections 4 and 5, provided that you also convey the
249 | machine-readable Corresponding Source under the terms of this License,
250 | in one of these ways:
251 | 
252 |     a) Convey the object code in, or embodied in, a physical product
253 |     (including a physical distribution medium), accompanied by the
254 |     Corresponding Source fixed on a durable physical medium
255 |     customarily used for software interchange.
256 | 
257 |     b) Convey the object code in, or embodied in, a physical product
258 |     (including a physical distribution medium), accompanied by a
259 |     written offer, valid for at least three years and valid for as
260 |     long as you offer spare parts or customer support for that product
261 |     model, to give anyone who possesses the object code either (1) a
262 |     copy of the Corresponding Source for all the software in the
263 |     product that is covered by this License, on a durable physical
264 |     medium customarily used for software interchange, for a price no
265 |     more than your reasonable cost of physically performing this
266 |     conveying of source, or (2) access to copy the
267 |     Corresponding Source from a network server at no charge.
268 | 
269 |     c) Convey individual copies of the object code with a copy of the
270 |     written offer to provide the Corresponding Source.  This
271 |     alternative is allowed only occasionally and noncommercially, and
272 |     only if you received the object code with such an offer, in accord
273 |     with subsection 6b.
274 | 
275 |     d) Convey the object code by offering access from a designated
276 |     place (gratis or for a charge), and offer equivalent access to the
277 |     Corresponding Source in the same way through the same place at no
278 |     further charge.  You need not require recipients to copy the
279 |     Corresponding Source along with the object code.  If the place to
280 |     copy the object code is a network server, the Corresponding Source
281 |     may be on a different server (operated by you or a third party)
282 |     that supports equivalent copying facilities, provided you maintain
283 |     clear directions next to the object code saying where to find the
284 |     Corresponding Source.  Regardless of what server hosts the
285 |     Corresponding Source, you remain obligated to ensure that it is
286 |     available for as long as needed to satisfy these requirements.
287 | 
288 |     e) Convey the object code using peer-to-peer transmission, provided
289 |     you inform other peers where the object code and Corresponding
290 |     Source of the work are being offered to the general public at no
291 |     charge under subsection 6d.
292 | 
293 |   A separable portion of the object code, whose source code is excluded
294 | from the Corresponding Source as a System Library, need not be
295 | included in conveying the object code work.
296 | 
297 |   A "User Product" is either (1) a "consumer product", which means any
298 | tangible personal property which is normally used for personal, family,
299 | or household purposes, or (2) anything designed or sold for incorporation
300 | into a dwelling.  In determining whether a product is a consumer product,
301 | doubtful cases shall be resolved in favor of coverage.  For a particular
302 | product received by a particular user, "normally used" refers to a
303 | typical or common use of that class of product, regardless of the status
304 | of the particular user or of the way in which the particular user
305 | actually uses, or expects or is expected to use, the product.  A product
306 | is a consumer product regardless of whether the product has substantial
307 | commercial, industrial or non-consumer uses, unless such uses represent
308 | the only significant mode of use of the product.
309 | 
310 |   "Installation Information" for a User Product means any methods,
311 | procedures, authorization keys, or other information required to install
312 | and execute modified versions of a covered work in that User Product from
313 | a modified version of its Corresponding Source.  The information must
314 | suffice to ensure that the continued functioning of the modified object
315 | code is in no case prevented or interfered with solely because
316 | modification has been made.
317 | 
318 |   If you convey an object code work under this section in, or with, or
319 | specifically for use in, a User Product, and the conveying occurs as
320 | part of a transaction in which the right of possession and use of the
321 | User Product is transferred to the recipient in perpetuity or for a
322 | fixed term (regardless of how the transaction is characterized), the
323 | Corresponding Source conveyed under this section must be accompanied
324 | by the Installation Information.  But this requirement does not apply
325 | if neither you nor any third party retains the ability to install
326 | modified object code on the User Product (for example, the work has
327 | been installed in ROM).
328 | 
329 |   The requirement to provide Installation Information does not include a
330 | requirement to continue to provide support service, warranty, or updates
331 | for a work that has been modified or installed by the recipient, or for
332 | the User Product in which it has been modified or installed.  Access to a
333 | network may be denied when the modification itself materially and
334 | adversely affects the operation of the network or violates the rules and
335 | protocols for communication across the network.
336 | 
337 |   Corresponding Source conveyed, and Installation Information provided,
338 | in accord with this section must be in a format that is publicly
339 | documented (and with an implementation available to the public in
340 | source code form), and must require no special password or key for
341 | unpacking, reading or copying.
342 | 
343 |   7. Additional Terms.
344 | 
345 |   "Additional permissions" are terms that supplement the terms of this
346 | License by making exceptions from one or more of its conditions.
347 | Additional permissions that are applicable to the entire Program shall
348 | be treated as though they were included in this License, to the extent
349 | that they are valid under applicable law.  If additional permissions
350 | apply only to part of the Program, that part may be used separately
351 | under those permissions, but the entire Program remains governed by
352 | this License without regard to the additional permissions.
353 | 
354 |   When you convey a copy of a covered work, you may at your option
355 | remove any additional permissions from that copy, or from any part of
356 | it.  (Additional permissions may be written to require their own
357 | removal in certain cases when you modify the work.)  You may place
358 | additional permissions on material, added by you to a covered work,
359 | for which you have or can give appropriate copyright permission.
360 | 
361 |   Notwithstanding any other provision of this License, for material you
362 | add to a covered work, you may (if authorized by the copyright holders of
363 | that material) supplement the terms of this License with terms:
364 | 
365 |     a) Disclaiming warranty or limiting liability differently from the
366 |     terms of sections 15 and 16 of this License; or
367 | 
368 |     b) Requiring preservation of specified reasonable legal notices or
369 |     author attributions in that material or in the Appropriate Legal
370 |     Notices displayed by works containing it; or
371 | 
372 |     c) Prohibiting misrepresentation of the origin of that material, or
373 |     requiring that modified versions of such material be marked in
374 |     reasonable ways as different from the original version; or
375 | 
376 |     d) Limiting the use for publicity purposes of names of licensors or
377 |     authors of the material; or
378 | 
379 |     e) Declining to grant rights under trademark law for use of some
380 |     trade names, trademarks, or service marks; or
381 | 
382 |     f) Requiring indemnification of licensors and authors of that
383 |     material by anyone who conveys the material (or modified versions of
384 |     it) with contractual assumptions of liability to the recipient, for
385 |     any liability that these contractual assumptions directly impose on
386 |     those licensors and authors.
387 | 
388 |   All other non-permissive additional terms are considered "further
389 | restrictions" within the meaning of section 10.  If the Program as you
390 | received it, or any part of it, contains a notice stating that it is
391 | governed by this License along with a term that is a further
392 | restriction, you may remove that term.  If a license document contains
393 | a further restriction but permits relicensing or conveying under this
394 | License, you may add to a covered work material governed by the terms
395 | of that license document, provided that the further restriction does
396 | not survive such relicensing or conveying.
397 | 
398 |   If you add terms to a covered work in accord with this section, you
399 | must place, in the relevant source files, a statement of the
400 | additional terms that apply to those files, or a notice indicating
401 | where to find the applicable terms.
402 | 
403 |   Additional terms, permissive or non-permissive, may be stated in the
404 | form of a separately written license, or stated as exceptions;
405 | the above requirements apply either way.
406 | 
407 |   8. Termination.
408 | 
409 |   You may not propagate or modify a covered work except as expressly
410 | provided under this License.  Any attempt otherwise to propagate or
411 | modify it is void, and will automatically terminate your rights under
412 | this License (including any patent licenses granted under the third
413 | paragraph of section 11).
414 | 
415 |   However, if you cease all violation of this License, then your
416 | license from a particular copyright holder is reinstated (a)
417 | provisionally, unless and until the copyright holder explicitly and
418 | finally terminates your license, and (b) permanently, if the copyright
419 | holder fails to notify you of the violation by some reasonable means
420 | prior to 60 days after the cessation.
421 | 
422 |   Moreover, your license from a particular copyright holder is
423 | reinstated permanently if the copyright holder notifies you of the
424 | violation by some reasonable means, this is the first time you have
425 | received notice of violation of this License (for any work) from that
426 | copyright holder, and you cure the violation prior to 30 days after
427 | your receipt of the notice.
428 | 
429 |   Termination of your rights under this section does not terminate the
430 | licenses of parties who have received copies or rights from you under
431 | this License.  If your rights have been terminated and not permanently
432 | reinstated, you do not qualify to receive new licenses for the same
433 | material under section 10.
434 | 
435 |   9. Acceptance Not Required for Having Copies.
436 | 
437 |   You are not required to accept this License in order to receive or
438 | run a copy of the Program.  Ancillary propagation of a covered work
439 | occurring solely as a consequence of using peer-to-peer transmission
440 | to receive a copy likewise does not require acceptance.  However,
441 | nothing other than this License grants you permission to propagate or
442 | modify any covered work.  These actions infringe copyright if you do
443 | not accept this License.  Therefore, by modifying or propagating a
444 | covered work, you indicate your acceptance of this License to do so.
445 | 
446 |   10. Automatic Licensing of Downstream Recipients.
447 | 
448 |   Each time you convey a covered work, the recipient automatically
449 | receives a license from the original licensors, to run, modify and
450 | propagate that work, subject to this License.  You are not responsible
451 | for enforcing compliance by third parties with this License.
452 | 
453 |   An "entity transaction" is a transaction transferring control of an
454 | organization, or substantially all assets of one, or subdividing an
455 | organization, or merging organizations.  If propagation of a covered
456 | work results from an entity transaction, each party to that
457 | transaction who receives a copy of the work also receives whatever
458 | licenses to the work the party's predecessor in interest had or could
459 | give under the previous paragraph, plus a right to possession of the
460 | Corresponding Source of the work from the predecessor in interest, if
461 | the predecessor has it or can get it with reasonable efforts.
462 | 
463 |   You may not impose any further restrictions on the exercise of the
464 | rights granted or affirmed under this License.  For example, you may
465 | not impose a license fee, royalty, or other charge for exercise of
466 | rights granted under this License, and you may not initiate litigation
467 | (including a cross-claim or counterclaim in a lawsuit) alleging that
468 | any patent claim is infringed by making, using, selling, offering for
469 | sale, or importing the Program or any portion of it.
470 | 
471 |   11. Patents.
472 | 
473 |   A "contributor" is a copyright holder who authorizes use under this
474 | License of the Program or a work on which the Program is based.  The
475 | work thus licensed is called the contributor's "contributor version".
476 | 
477 |   A contributor's "essential patent claims" are all patent claims
478 | owned or controlled by the contributor, whether already acquired or
479 | hereafter acquired, that would be infringed by some manner, permitted
480 | by this License, of making, using, or selling its contributor version,
481 | but do not include claims that would be infringed only as a
482 | consequence of further modification of the contributor version.  For
483 | purposes of this definition, "control" includes the right to grant
484 | patent sublicenses in a manner consistent with the requirements of
485 | this License.
486 | 
487 |   Each contributor grants you a non-exclusive, worldwide, royalty-free
488 | patent license under the contributor's essential patent claims, to
489 | make, use, sell, offer for sale, import and otherwise run, modify and
490 | propagate the contents of its contributor version.
491 | 
492 |   In the following three paragraphs, a "patent license" is any express
493 | agreement or commitment, however denominated, not to enforce a patent
494 | (such as an express permission to practice a patent or covenant not to
495 | sue for patent infringement).  To "grant" such a patent license to a
496 | party means to make such an agreement or commitment not to enforce a
497 | patent against the party.
498 | 
499 |   If you convey a covered work, knowingly relying on a patent license,
500 | and the Corresponding Source of the work is not available for anyone
501 | to copy, free of charge and under the terms of this License, through a
502 | publicly available network server or other readily accessible means,
503 | then you must either (1) cause the Corresponding Source to be so
504 | available, or (2) arrange to deprive yourself of the benefit of the
505 | patent license for this particular work, or (3) arrange, in a manner
506 | consistent with the requirements of this License, to extend the patent
507 | license to downstream recipients.  "Knowingly relying" means you have
508 | actual knowledge that, but for the patent license, your conveying the
509 | covered work in a country, or your recipient's use of the covered work
510 | in a country, would infringe one or more identifiable patents in that
511 | country that you have reason to believe are valid.
512 | 
513 |   If, pursuant to or in connection with a single transaction or
514 | arrangement, you convey, or propagate by procuring conveyance of, a
515 | covered work, and grant a patent license to some of the parties
516 | receiving the covered work authorizing them to use, propagate, modify
517 | or convey a specific copy of the covered work, then the patent license
518 | you grant is automatically extended to all recipients of the covered
519 | work and works based on it.
520 | 
521 |   A patent license is "discriminatory" if it does not include within
522 | the scope of its coverage, prohibits the exercise of, or is
523 | conditioned on the non-exercise of one or more of the rights that are
524 | specifically granted under this License.  You may not convey a covered
525 | work if you are a party to an arrangement with a third party that is
526 | in the business of distributing software, under which you make payment
527 | to the third party based on the extent of your activity of conveying
528 | the work, and under which the third party grants, to any of the
529 | parties who would receive the covered work from you, a discriminatory
530 | patent license (a) in connection with copies of the covered work
531 | conveyed by you (or copies made from those copies), or (b) primarily
532 | for and in connection with specific products or compilations that
533 | contain the covered work, unless you entered into that arrangement,
534 | or that patent license was granted, prior to 28 March 2007.
535 | 
536 |   Nothing in this License shall be construed as excluding or limiting
537 | any implied license or other defenses to infringement that may
538 | otherwise be available to you under applicable patent law.
539 | 
540 |   12. No Surrender of Others' Freedom.
541 | 
542 |   If conditions are imposed on you (whether by court order, agreement or
543 | otherwise) that contradict the conditions of this License, they do not
544 | excuse you from the conditions of this License.  If you cannot convey a
545 | covered work so as to satisfy simultaneously your obligations under this
546 | License and any other pertinent obligations, then as a consequence you may
547 | not convey it at all.  For example, if you agree to terms that obligate you
548 | to collect a royalty for further conveying from those to whom you convey
549 | the Program, the only way you could satisfy both those terms and this
550 | License would be to refrain entirely from conveying the Program.
551 | 
552 |   13. Use with the GNU Affero General Public License.
553 | 
554 |   Notwithstanding any other provision of this License, you have
555 | permission to link or combine any covered work with a work licensed
556 | under version 3 of the GNU Affero General Public License into a single
557 | combined work, and to convey the resulting work.  The terms of this
558 | License will continue to apply to the part which is the covered work,
559 | but the special requirements of the GNU Affero General Public License,
560 | section 13, concerning interaction through a network will apply to the
561 | combination as such.
562 | 
563 |   14. Revised Versions of this License.
564 | 
565 |   The Free Software Foundation may publish revised and/or new versions of
566 | the GNU General Public License from time to time.  Such new versions will
567 | be similar in spirit to the present version, but may differ in detail to
568 | address new problems or concerns.
569 | 
570 |   Each version is given a distinguishing version number.  If the
571 | Program specifies that a certain numbered version of the GNU General
572 | Public License "or any later version" applies to it, you have the
573 | option of following the terms and conditions either of that numbered
574 | version or of any later version published by the Free Software
575 | Foundation.  If the Program does not specify a version number of the
576 | GNU General Public License, you may choose any version ever published
577 | by the Free Software Foundation.
578 | 
579 |   If the Program specifies that a proxy can decide which future
580 | versions of the GNU General Public License can be used, that proxy's
581 | public statement of acceptance of a version permanently authorizes you
582 | to choose that version for the Program.
583 | 
584 |   Later license versions may give you additional or different
585 | permissions.  However, no additional obligations are imposed on any
586 | author or copyright holder as a result of your choosing to follow a
587 | later version.
588 | 
589 |   15. Disclaimer of Warranty.
590 | 
591 |   THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY
592 | APPLICABLE LAW.  EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT
593 | HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY
594 | OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO,
595 | THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
596 | PURPOSE.  THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM
597 | IS WITH YOU.  SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF
598 | ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
599 | 
600 |   16. Limitation of Liability.
601 | 
602 |   IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
603 | WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS
604 | THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY
605 | GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE
606 | USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF
607 | DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD
608 | PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS),
609 | EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF
610 | SUCH DAMAGES.
611 | 
612 |   17. Interpretation of Sections 15 and 16.
613 | 
614 |   If the disclaimer of warranty and limitation of liability provided
615 | above cannot be given local legal effect according to their terms,
616 | reviewing courts shall apply local law that most closely approximates
617 | an absolute waiver of all civil liability in connection with the
618 | Program, unless a warranty or assumption of liability accompanies a
619 | copy of the Program in return for a fee.
620 | 
621 |                      END OF TERMS AND CONDITIONS
622 | 
623 |             How to Apply These Terms to Your New Programs
624 | 
625 |   If you develop a new program, and you want it to be of the greatest
626 | possible use to the public, the best way to achieve this is to make it
627 | free software which everyone can redistribute and change under these terms.
628 | 
629 |   To do so, attach the following notices to the program.  It is safest
630 | to attach them to the start of each source file to most effectively
631 | state the exclusion of warranty; and each file should have at least
632 | the "copyright" line and a pointer to where the full notice is found.
633 | 
634 |     <one line to give the program's name and a brief idea of what it does.>
635 |     Copyright (C) <year>  <name of author>
636 | 
637 |     This program is free software: you can redistribute it and/or modify
638 |     it under the terms of the GNU General Public License as published by
639 |     the Free Software Foundation, either version 3 of the License, or
640 |     (at your option) any later version.
641 | 
642 |     This program is distributed in the hope that it will be useful,
643 |     but WITHOUT ANY WARRANTY; without even the implied warranty of
644 |     MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
645 |     GNU General Public License for more details.
646 | 
647 |     You should have received a copy of the GNU General Public License
648 |     along with this program.  If not, see <https://www.gnu.org/licenses/>.
649 | 
650 | Also add information on how to contact you by electronic and paper mail.
651 | 
652 |   If the program does terminal interaction, make it output a short
653 | notice like this when it starts in an interactive mode:
654 | 
655 |     <program>  Copyright (C) <year>  <name of author>
656 |     This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
657 |     This is free software, and you are welcome to redistribute it
658 |     under certain conditions; type `show c' for details.
659 | 
660 | The hypothetical commands `show w' and `show c' should show the appropriate
661 | parts of the General Public License.  Of course, your program's commands
662 | might be different; for a GUI interface, you would use an "about box".
663 | 
664 |   You should also get your employer (if you work as a programmer) or school,
665 | if any, to sign a "copyright disclaimer" for the program, if necessary.
666 | For more information on this, and how to apply and follow the GNU GPL, see
667 | <https://www.gnu.org/licenses/>.
668 | 
669 |   The GNU General Public License does not permit incorporating your program
670 | into proprietary programs.  If your program is a subroutine library, you
671 | may consider it more useful to permit linking proprietary applications with
672 | the library.  If this is what you want to do, use the GNU Lesser General
673 | Public License instead of this License.  But first, please read
674 | <https://www.gnu.org/licenses/why-not-lgpl.html>.
675 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
  1 | # Container detection and container number OCR using Tensorflow Object Detection API and Tesseract
  2 | 
  3 | Container detection and container number OCR was a specific project requirement. Using the [Tensorflow Object Detection API](https://github.com/tensorflow/models/tree/master/research/object_detection) and [Tesseract](https://github.com/tesseract-ocr/tesseract) is one of the quickest and simplest ways to verify its feasibility.
  4 | 
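  5 | >More than two years ago, when I was at my "ex-company", there was a clear project requirement: detect and count containers, read each container number via OCR, and then exchange that data with other business systems to meet specific needs. The Tensorflow Object Detection API had just been released, so I dropped the YOLO and SSD options and decided to build a demo with TF as a proof of concept (POC). For the reasoning behind the requirements and the pipeline design, see this article: [Container detection and container number OCR](https://lonelygo.github.io/2019-01-20-container-detection/).  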
  6 | 
  7 | ## Usage
  8 | 
  9 | ### Installing the Tensorflow Object Detection API
 10 | 
 11 | For installation details, see the official [instructions](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/installation.md).  
 12 | 
 13 | ### Environment and dependencies
 14 | 
 15 | My environment: macOS 10.14.2, Python 3.6.8, TF 1.12.  
 16 | Besides the dependencies required to install the Tensorflow Object Detection API, the following are also needed:  
 17 | tesseract  
 18 | pytesseract  
 19 | Search online for how to install them and what they are for.  
 20 | In `visualization_utils.py`:
 21 | 
 22 | ``` python
 23 | import matplotlib; matplotlib.use('Agg')
 24 | ```
 25 | 
 26 | The Agg backend did not work in my environment and I did not want to spend time on it, so I changed this line.
 27 | 
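 28 | ### Dataset preparation
 29 | 
 30 | Annotate your images with [LabelImg](https://github.com/tzutalin/labelImg), following the Pascal VOC dataset format.  
 31 | Once annotation is done, use `generate_voc_datasets.py` to split the data into three sets as you see fit: train, val, and test.
 32 | After splitting, use `create_pascal_tf_record.py` to convert each set into a TFRecord file for TF to consume (this script is provided by the official repository, under `/object_detection/dataset_tools/`).  
 33 | For more on data preparation, see the [instructions](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/using_your_own_dataset.md) here.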
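
The split step can be sketched as follows. This is a minimal illustration and not the contents of `generate_voc_datasets.py`: the function names `split_dataset` and `write_image_sets`, the 80/10/10 default ratios, and the fixed seed are all assumptions made for the example. The one detail taken from this repo is that `create_pascal_tf_record.py` reads its example list from `ImageSets/Main/<set>.txt`.

```python
import os
import random


def split_dataset(stems, val_ratio=0.1, test_ratio=0.1, seed=0):
    """Shuffle annotation file stems and split them into train/val/test.

    Hypothetical helper: the ratios and the seed are illustrative defaults.
    """
    if val_ratio < 0 or test_ratio < 0 or val_ratio + test_ratio >= 1:
        raise ValueError('val_ratio + test_ratio must be in [0, 1)')
    shuffled = sorted(stems)               # sort first so the result is reproducible
    random.Random(seed).shuffle(shuffled)  # seeded shuffle, independent of global state
    n_val = int(len(shuffled) * val_ratio)
    n_test = int(len(shuffled) * test_ratio)
    return {
        'val': shuffled[:n_val],
        'test': shuffled[n_val:n_val + n_test],
        'train': shuffled[n_val + n_test:],
    }


def write_image_sets(splits, main_dir):
    """Write one <set>.txt per split, one stem (file name without extension) per line."""
    os.makedirs(main_dir, exist_ok=True)
    for set_name, items in splits.items():
        with open(os.path.join(main_dir, set_name + '.txt'), 'w') as f:
            f.writelines(item + '\n' for item in items)


splits = split_dataset(['img{:03d}'.format(i) for i in range(100)])
print({name: len(items) for name, items in splits.items()})
```

Calling `write_image_sets(splits, os.path.join(data_dir, 'cont_train', 'ImageSets', 'Main'))` would then produce the list files that the `--set` flag of `create_pascal_tf_record.py` selects, assuming your dataset lives under a `cont_train` folder as the script's defaults suggest.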
 34 | 
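 35 | ### Training
 36 | 
 37 | Following the [official instructions: local](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/running_locally.md), train locally with `model_main.py` from the official code base (earlier releases shipped separate scripts for train and val; the current version needs only this one file).
 38 | Following the [official instructions: Google Cloud ML Engine](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/running_on_cloud.md), train on Google Cloud ML Engine with TPUs. Pricing is described [here](https://cloud.google.com/ml-engine/docs/tensorflow/pricing?hl=zh-CN); choosing the "preemptible" mode is much cheaper.
 39 | 
 40 | ### Validation
 41 | 
 42 | You can use `object_detection_tutorial.ipynb` from the official code for a quick check. `detection_var_image.py` in this repo is largely based on that notebook.
 43 | Adjust the following settings to match your own setup: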
 44 | 
 45 | ``` python
 46 | 
 47 | MODEL_NAME = 'ssd_mobilenet_v1_coco_2017_11_17'
 48 | PATH_TO_FROZEN_GRAPH = MODEL_NAME + '/frozen_inference_graph.pb'
 49 | # List of the strings that is used to add correct label for each box.
 50 | PATH_TO_LABELS = os.path.join('data', 'container_label_map.pbtxt')
 51 | 
 52 | TEST_IMAGE_PATHS = [os.path.join(PATH_TO_TEST_IMAGES_DIR, 'image{}.jpg'.format(i)) for i in range(1, 4)]
 53 | 
 54 | lang = 'cont41'
 55 | 
 56 | ```
 57 | 
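 58 | In `lang = 'cont41'`, `cont41` is the name of the lang file used by Tesseract. If you have not trained your own lang file yet, remove everything except `eng` from `lang_use = 'eng+'+lang+'+letsgodigital+snum+eng_f'` and run recognition with the lang file that ships with the default Tesseract installation.  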
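
Instead of editing the string by hand, the lang string could be assembled from whichever language files are actually installed. The sketch below is only an illustration: `build_lang_string` is a hypothetical helper that does not exist in this repo, and the `available` set is supplied by the caller.

```python
def build_lang_string(custom_lang, available,
                      extras=('letsgodigital', 'snum', 'eng_f')):
    """Join the languages that are installed into a Tesseract lang string.

    Hypothetical helper: keeps the same order as lang_use in the script
    ('eng', the custom lang, then the extras) and drops anything missing.
    """
    wanted = ['eng', custom_lang] + list(extras)
    usable = [name for name in wanted if name in available]
    if not usable:
        raise ValueError('none of the requested languages are installed')
    return '+'.join(usable)


# Only the stock English data is installed: falls back to plain 'eng'.
print(build_lang_string('cont41', available={'eng', 'osd'}))
# A custom lang file and one extra are installed as well.
print(build_lang_string('cont41', available={'eng', 'cont41', 'snum'}))
```

The set of installed languages can be read from the output of `tesseract --list-langs`.
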
 59 | The returned `image_label` is a nested list that looks like this:  
 60 | 
 61 | ``` python
 62 | 
 63 | [{'image1': [{'lable': 'container_number', 'actual': '100%', 'cont_num': 'TCLU § 148575 3\n45G1', 'image_corp_name': 'image1_1_container_number'}]}, {'image2': [{'lable': 'container_number', 'actual': '99%', 'cont_num': 'TRNU816699 4 |\n45G1', 'image_corp_name': 'image2_1_container_number'}, {'lable': 'container_number', 'actual': '99%', 'cont_num': 'TCNU89092898\n4561', 'image_corp_name': 'image2_2_container_number'}, {'lable': 'container_number', 'actual': '99%', 'cont_num': 'MSKUY 86801264\n4561', 'image_corp_name': 'image2_3_container_number'}]}]
 64 | 
 65 | ```
 66 | 
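 67 | Each element of the list is a dictionary in which:  
 68 | the `key` is the name of the input image;  
 69 | the `value` is a list whose elements are dictionaries with 4 keys: the label, the confidence, the OCR result, and the name of the cropped container-number image that was written out. The length of that list is the number of container numbers found in the image.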
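
A short sketch of walking this structure, using a shortened copy of the sample above. The helper name `iter_detections` is made up for this example; the misspelled key `'lable'` is kept because that is the key the script emits.

```python
import json

# Shortened copy of the sample output shown above.
image_label = [
    {'image1': [
        {'lable': 'container_number', 'actual': '100%',
         'cont_num': 'TCLU § 148575 3\n45G1',
         'image_corp_name': 'image1_1_container_number'},
    ]},
    {'image2': [
        {'lable': 'container_number', 'actual': '99%',
         'cont_num': 'TRNU816699 4 |\n45G1',
         'image_corp_name': 'image2_1_container_number'},
        {'lable': 'container_number', 'actual': '99%',
         'cont_num': 'TCNU89092898\n4561',
         'image_corp_name': 'image2_2_container_number'},
    ]},
]


def iter_detections(image_label):
    """Yield (image_name, detection_dict) pairs from the nested list."""
    for entry in image_label:
        for image_name, detections in entry.items():
            for detection in detections:
                yield image_name, detection


rows = []
for image_name, det in iter_detections(image_label):
    # Collapse the line break between the number and the size/type code.
    text = ' '.join(det['cont_num'].split())
    rows.append({'image': image_name, 'confidence': det['actual'], 'text': text})

# The structure contains only lists, dicts and strings, so it serializes directly.
print(json.dumps(rows, indent=2))
```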
 70 | 
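 71 | The main reason for this structure: if a Web front end is added later with flask, flask can also act as a simple server that returns all detection results as one JSON string, so for a demo there is no need to set up a separate TensorFlow Serving back end.
 72 | 
 73 | For each input image, in addition to the JSON output above, the script also writes:  
 74 | the image with bounding boxes and labels drawn on it;  
 75 | one cropped image per container-number region found, plus the image as it looks after OpenCV preprocessing, just before it is passed to Tesseract. Comparing these images with the OCR results gives hints for tuning the preprocessing approach and parameters.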
 76 | 
 77 | #### Demo
 78 | 
 79 | The `image` folder contains 5 test images. The results are in `cont_num.txt`; an excerpt:
 80 | 
 81 | | Image name | OCR result | Actual |
 82 | |:------:|:------:|:----:|
 83 | | image1_1_container_number_100% | TCLU § 148575 3 45G1 | TCLU 148575 3 45G1 |
 84 | | image2_1_container_number_99% | TRNU816699 4 \| 45G1 | TRLU 818699 0 45G1 |
 85 | | image2_2_container_number_99% | TCNU89092898 4561 | TCNU 869248 8 45G1 |
 86 | | image2_3_container_number_99% | MSKUY 86801264 4561 | MSKU 868012 6 4561 |
 87 | | image3_1_container_number_99% | x L BOUL 871489 7 \| 221 | BMOU 871489 7 22R1 |
 88 | | image3_2_container_number_99% | FCIU [599867 (0 22G1 | FCIU 599887 0 22G1 |
 89 | 
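 90 | As the table shows, overall OCR accuracy is not high. It matches the estimate I made in [Container detection and container number OCR](https://lonelygo.github.io/2019-01-20-container-detection/) that accuracy would not exceed 80% (in hindsight this may sound like being wise after the event, but it really was my prediction when I decided to run the validation). There is room for improvement, though; work in the following areas is worth trying: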
 91 | 
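 92 | - Most of the images used to train Tesseract were of a quality similar to `image1.jpg`, so the training set needs to be adjusted to better match the image quality seen in real deployments;
 93 | - In deployment, keep image quality as high as possible, and collect images from on-site use;
 94 | - Once enough images have been collected, switch the OCR engine to a deep-learning-based one;
 95 | - Improve the image preprocessing applied before OCR. In fact, after I switched to other preprocessing strategies, the results were better than those shown above.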
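
To put rough numbers on the table above, each OCR string can be compared with the ground truth after stripping everything except letters and digits. The sketch below uses `difflib.SequenceMatcher` from the Python standard library. The helper names `normalize` and `similarity` are made up for this example, and a similarity ratio is only one possible metric; it is not the measure behind the 80% estimate.

```python
import difflib
import re


def normalize(text):
    """Upper-case the text and keep only A-Z and 0-9."""
    return re.sub(r'[^A-Z0-9]', '', text.upper())


def similarity(ocr, actual):
    """Return a 0..1 similarity ratio between the normalized strings."""
    matcher = difflib.SequenceMatcher(None, normalize(ocr), normalize(actual))
    return matcher.ratio()


# (image name, OCR result, actual) rows copied from the table above.
rows = [
    ('image1_1', 'TCLU § 148575 3 45G1', 'TCLU 148575 3 45G1'),
    ('image2_1', 'TRNU816699 4 | 45G1', 'TRLU 818699 0 45G1'),
    ('image3_2', 'FCIU [599867 (0 22G1', 'FCIU 599887 0 22G1'),
]

for name, ocr, actual in rows:
    exact = normalize(ocr) == normalize(actual)
    print('{}: exact={} ratio={:.2f}'.format(name, exact, similarity(ocr, actual)))
```

Running the same comparison before and after a change to the preprocessing gives a quick way to tell whether the change helped.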
 96 | 
 97 | The output images for `image1.jpg` are:
 98 | ![Bounding box](https://ws1.sinaimg.cn/large/55fc1144gy1fzkay5dqltj20qo0zk42p.jpg)
 99 | 
100 | ![original](https://ws1.sinaimg.cn/large/55fc1144gy1fzkaz64bcxj209t03ydfr.jpg)
101 | 
102 | ![gray](https://ws1.sinaimg.cn/large/55fc1144gy1fzkazp3xcij209t03ywel.jpg)
103 | 
104 | #### To Do
105 | 
106 | - [ ] Add a demo version that runs detection on a video stream
107 | - [ ] Add a simple flask Web page for uploading images and displaying results
108 | 
109 | #### References
110 | 
111 | [Tensorflow Object Detection API](https://github.com/tensorflow/models/tree/master/research/object_detection)
112 | 
113 | [Tensorflow detection model zoo](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md)
114 | 


--------------------------------------------------------------------------------
/cont_num.txt:
--------------------------------------------------------------------------------
 1 | image1_1_container_number_100%
 2 | TCLU § 148575 3
 3 | 45G1
 4 | image2_1_container_number_99%
 5 | TRNU816699 4 |
 6 | 45G1
 7 | image2_2_container_number_99%
 8 | TCNU89092898
 9 | 4561
10 | image2_3_container_number_99%
11 | MSKUY 86801264
12 | 4561
13 | image3_1_container_number_99%
14 | x L
15 | BOUL 871489 7 |
16 | 221
17 | image3_2_container_number_99%
18 | FCIU [599867 (0
19 | 22G1
20 | image4_1_container_number_e_99%
21 | WH LU 555149
22 | CSU
23 | image5_1_container_number_99%
24 | 5421 357770 4
25 | 2261
26 | image5_2_container_number_99%
27 | 1
28 | BSU247709
29 | | 2221
30 | image5_3_container_number_99%
31 | TRHU | 395563
32 | 2261
33 | image5_4_container_number_99%
34 | TRUU20275643
35 | 221
36 | 


--------------------------------------------------------------------------------
/create_pascal_tf_record.py:
--------------------------------------------------------------------------------
  1 | # Copyright 2017 The TensorFlow Authors. All Rights Reserved.
  2 | #
  3 | # Licensed under the Apache License, Version 2.0 (the "License");
  4 | # you may not use this file except in compliance with the License.
  5 | # You may obtain a copy of the License at
  6 | #
  7 | #     http://www.apache.org/licenses/LICENSE-2.0
  8 | #
  9 | # Unless required by applicable law or agreed to in writing, software
 10 | # distributed under the License is distributed on an "AS IS" BASIS,
 11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 12 | # See the License for the specific language governing permissions and
 13 | # limitations under the License.
 14 | # ==============================================================================
 15 | 
 16 | r"""Convert raw PASCAL dataset to TFRecord for object_detection.
 17 | 
 18 | Example usage:
 19 |     python object_detection/dataset_tools/create_pascal_tf_record.py \
 20 |         --data_dir=/home/user/VOCdevkit \
 21 |         --year=VOC2012 \
 22 |         --output_path=/home/user/pascal.record
 23 | """
 24 | from __future__ import absolute_import
 25 | from __future__ import division
 26 | from __future__ import print_function
 27 | 
 28 | import hashlib
 29 | import io
 30 | import logging
 31 | import os
 32 | 
 33 | from lxml import etree
 34 | import PIL.Image
 35 | import tensorflow as tf
 36 | 
 37 | from object_detection.utils import dataset_util
 38 | from object_detection.utils import label_map_util
 39 | 
 40 | 
 41 | flags = tf.app.flags
 42 | flags.DEFINE_string('data_dir', '', 'Root directory to raw PASCAL VOC dataset.')
 43 | flags.DEFINE_string('set', 'val', 'Convert training set, validation set or '
 44 |                     'merged set.')
 45 | flags.DEFINE_string('annotations_dir', 'Annotations',
 46 |                     '(Relative) path to annotations directory.')
 47 | flags.DEFINE_string('year', 'cont_train', 'Desired challenge year.')
 48 | flags.DEFINE_string('output_path', '', 'Path to output TFRecord')
 49 | flags.DEFINE_string('label_map_path', '',
 50 |                     'Path to label map proto')
 51 | flags.DEFINE_boolean('ignore_difficult_instances', False, 'Whether to ignore '
 52 |                      'difficult instances')
 53 | FLAGS = flags.FLAGS
 54 | 
 55 | SETS = ['train', 'val', 'trainval', 'test']
 56 | YEARS = ['cont_train', 'VOC2012', 'merged']
 57 | 
 58 | 
 59 | def dict_to_tf_example(data,
 60 |                        dataset_directory,
 61 |                        label_map_dict,
 62 |                        ignore_difficult_instances=False,
 63 |                        image_subdirectory='JPEGImages'):  
 64 |   """Convert XML derived dict to tf.Example proto.
 65 | 
 66 |   Notice that this function normalizes the bounding box coordinates provided
 67 |   by the raw data.
 68 | 
 69 |   Args:
 70 |     data: dict holding PASCAL XML fields for a single image (obtained by
 71 |       running dataset_util.recursive_parse_xml_to_dict)
 72 |     dataset_directory: Path to root directory holding PASCAL dataset
 73 |     label_map_dict: A map from string label names to integers ids.
 74 |     ignore_difficult_instances: Whether to skip difficult instances in the
 75 |       dataset  (default: False).
 76 |     image_subdirectory: String specifying subdirectory within the
 77 |       PASCAL dataset directory holding the actual image data.
 78 | 
 79 |   Returns:
 80 |     example: The converted tf.Example.
 81 | 
 82 |   Raises:
 83 |     ValueError: if the image pointed to by data['filename'] is not a valid JPEG
 84 |   """
 85 |   img_path = os.path.join('cont_train', image_subdirectory, data['filename']) # I don't know why data['folder'] gives the wrong path, so 'cont_train' is hard-coded.
 86 |   full_path = os.path.join(dataset_directory, img_path)
 87 |   with tf.gfile.GFile(full_path, 'rb') as fid:
 88 |     encoded_jpg = fid.read()
 89 |   encoded_jpg_io = io.BytesIO(encoded_jpg)
 90 |   image = PIL.Image.open(encoded_jpg_io)
 91 |   if image.format != 'JPEG':
 92 |     raise ValueError('Image format not JPEG')
 93 |   key = hashlib.sha256(encoded_jpg).hexdigest()
 94 | 
 95 |   width = int(data['size']['width'])
 96 |   height = int(data['size']['height'])
 97 | 
 98 |   xmin = []
 99 |   ymin = []
100 |   xmax = []
101 |   ymax = []
102 |   classes = []
103 |   classes_text = []
104 |   truncated = []
105 |   poses = []
106 |   difficult_obj = []
107 |   if 'object' in data:
108 |     for obj in data['object']:
109 |       difficult = bool(int(obj['difficult']))
110 |       if ignore_difficult_instances and difficult:
111 |         continue
112 | 
113 |       difficult_obj.append(int(difficult))
114 | 
115 |       xmin.append(float(obj['bndbox']['xmin']) / width)
116 |       ymin.append(float(obj['bndbox']['ymin']) / height)
117 |       xmax.append(float(obj['bndbox']['xmax']) / width)
118 |       ymax.append(float(obj['bndbox']['ymax']) / height)
119 |       classes_text.append(obj['name'].encode('utf8'))
120 |       classes.append(label_map_dict[obj['name']])
121 |       truncated.append(int(obj['truncated']))
122 |       poses.append(obj['pose'].encode('utf8'))
123 | 
124 |   example = tf.train.Example(features=tf.train.Features(feature={
125 |       'image/height': dataset_util.int64_feature(height),
126 |       'image/width': dataset_util.int64_feature(width),
127 |       'image/filename': dataset_util.bytes_feature(
128 |           data['filename'].encode('utf8')),
129 |       'image/source_id': dataset_util.bytes_feature(
130 |           data['filename'].encode('utf8')),
131 |       'image/key/sha256': dataset_util.bytes_feature(key.encode('utf8')),
132 |       'image/encoded': dataset_util.bytes_feature(encoded_jpg),
133 |       'image/format': dataset_util.bytes_feature('jpeg'.encode('utf8')),
134 |       'image/object/bbox/xmin': dataset_util.float_list_feature(xmin),
135 |       'image/object/bbox/xmax': dataset_util.float_list_feature(xmax),
136 |       'image/object/bbox/ymin': dataset_util.float_list_feature(ymin),
137 |       'image/object/bbox/ymax': dataset_util.float_list_feature(ymax),
138 |       'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
139 |       'image/object/class/label': dataset_util.int64_list_feature(classes),
140 |       'image/object/difficult': dataset_util.int64_list_feature(difficult_obj),
141 |       'image/object/truncated': dataset_util.int64_list_feature(truncated),
142 |       'image/object/view': dataset_util.bytes_list_feature(poses),
143 |   }))
144 |   return example
145 | 
146 | 
147 | def main(_):
148 |   if FLAGS.set not in SETS:
149 |     raise ValueError('set must be in : {}'.format(SETS))
150 |   if FLAGS.year not in YEARS:
151 |     raise ValueError('year must be in : {}'.format(YEARS))
152 | 
153 |   data_dir = FLAGS.data_dir
154 |   years = ['cont_train', 'VOC2012']
155 |   if FLAGS.year != 'merged':
156 |     years = [FLAGS.year]
157 | 
158 |   writer = tf.python_io.TFRecordWriter(FLAGS.output_path)
159 | 
160 |   label_map_dict = label_map_util.get_label_map_dict(FLAGS.label_map_path)
161 | 
162 |   for year in years:
163 |     logging.info('Reading from PASCAL %s dataset.', year)
164 |     examples_path = os.path.join(data_dir, year, 'ImageSets', 'Main',  FLAGS.set + '.txt')
165 |     annotations_dir = os.path.join(data_dir, year, FLAGS.annotations_dir)
166 |     examples_list = dataset_util.read_examples_list(examples_path)
167 |     for idx, example in enumerate(examples_list):
168 |       if idx % 100 == 0:
169 |         logging.info('On image %d of %d', idx, len(examples_list))
170 |       path = os.path.join(annotations_dir, example + '.xml')
171 |       with tf.gfile.GFile(path, 'r') as fid:
172 |         xml_str = fid.read()
173 |       xml = etree.fromstring(xml_str)
174 |       data = dataset_util.recursive_parse_xml_to_dict(xml)['annotation']
175 | 
176 |       tf_example = dict_to_tf_example(data, FLAGS.data_dir, label_map_dict,
177 |                                       FLAGS.ignore_difficult_instances)
178 | 
179 |       writer.write(tf_example.SerializeToString())
180 | 
181 |   writer.close()
182 | 
183 | 
184 | if __name__ == '__main__':
185 |   tf.app.run()
186 | 


--------------------------------------------------------------------------------
/data/container_label_map.pbtxt:
--------------------------------------------------------------------------------
 1 | item {
 2 |   id: 1
 3 |   name: 'container_number'
 4 | }
 5 | 
 6 | item {
 7 |   id: 2
 8 |   name: 'container_number_v'
 9 | }
10 | 
11 | item {
12 |   id: 6
13 |   name: 'container_number_e'
14 | }
15 | 
16 | item {
17 |   id: 3
18 |   name: 'container_door'
19 | }
20 | item {
21 |   id: 4
22 |   name: 'container_end_door'
23 | }
24 | 
25 | item {
26 |   id: 5
27 |   name: 'container'
28 | }


--------------------------------------------------------------------------------
/detection_var_image.py:
--------------------------------------------------------------------------------
  1 | #!/usr/bin/env python3
  2 | # -*- coding: utf-8 -*-
  3 | __author__ = 'Kevin Di'
  4 | 
  5 | import numpy as np
  6 | import os
  7 | from skimage import io, data
  8 | import six.moves.urllib as urllib
  9 | import sys
 10 | import tarfile
 11 | import tensorflow as tf
 12 | 
 13 | from collections import defaultdict
 14 | import collections
 15 | from io import StringIO
 16 | import matplotlib as mpl
 17 | 
 18 | from matplotlib import pyplot as plt
 19 | from PIL import Image
 20 | import pytesseract
 21 | import cv2
 22 | import re
 23 | 
 24 | 
 25 | # This is needed since the notebook is stored in the object_detection folder.
 26 | sys.path.append("..")
 27 | from object_detection.utils import ops as utils_ops
 28 | 
 29 | 
 30 | from object_detection.utils import label_map_util
 31 | 
 32 | from object_detection.utils import visualization_utils as vis_util
 33 | 
 34 | 
 35 | MODEL_NAME = 'ssd_mobilenet_v1_coco_2017_11_17'
 36 | PATH_TO_FROZEN_GRAPH = MODEL_NAME + '/frozen_inference_graph.pb'
 37 | # List of the strings that is used to add correct label for each box.
 38 | PATH_TO_LABELS = os.path.join('data', 'container_label_map.pbtxt')
 39 | 
 40 | detection_graph = tf.Graph()
 41 | with detection_graph.as_default():
 42 |   od_graph_def = tf.GraphDef()
 43 |   with tf.gfile.GFile(PATH_TO_FROZEN_GRAPH, 'rb') as fid:
 44 |     serialized_graph = fid.read()
 45 |     od_graph_def.ParseFromString(serialized_graph)
 46 |     tf.import_graph_def(od_graph_def, name='')
 47 | 
 48 | category_index = label_map_util.create_category_index_from_labelmap(PATH_TO_LABELS, use_display_name=True)
 49 | 
 50 | def load_image_into_numpy_array(image):
 51 |   (im_width, im_height) = image.size
 52 |   return np.array(image.getdata()).reshape(
 53 |       (im_height, im_width, 3)).astype(np.uint8)
 54 |     
 55 | # If you want to test the code with your images, just add path to the images to the TEST_IMAGE_PATHS.
 56 | PATH_TO_TEST_IMAGES_DIR = 'test_images'
 57 | 
 58 | TEST_IMAGE_PATHS = [os.path.join(PATH_TO_TEST_IMAGES_DIR, 'image{}.jpg'.format(i)) for i in range(1, 4)]
 59 | 
 60 | 
 61 | # Size, in inches, of the output images,use to plt.figure(figsize=IMAGE_SIZE)
 62 | # IMAGE_SIZE = (12, 8)
 63 | 
 64 | def run_inference_for_single_image(image, graph):
 65 |   with graph.as_default():
 66 |     with tf.Session(config = tf.ConfigProto(
 67 |                     device_count = {"CPU":16},
 68 |                     inter_op_parallelism_threads = 5,
 69 |                     intra_op_parallelism_threads = 2,
 70 |                     )) as sess:
 71 |       # Get handles to input and output tensors
 72 |       ops = tf.get_default_graph().get_operations()
 73 |       all_tensor_names = {output.name for op in ops for output in op.outputs}
 74 |       tensor_dict = {}
 75 |       for key in [
 76 |           'num_detections', 'detection_boxes', 'detection_scores',
 77 |           'detection_classes', 'detection_masks'
 78 |       ]:
 79 |         tensor_name = key + ':0'
 80 |         if tensor_name in all_tensor_names:
 81 |           tensor_dict[key] = tf.get_default_graph().get_tensor_by_name(
 82 |               tensor_name)
 83 |       if 'detection_masks' in tensor_dict:
 84 |         # The following processing is only for single image
 85 |         detection_boxes = tf.squeeze(tensor_dict['detection_boxes'], [0])
 86 |         detection_masks = tf.squeeze(tensor_dict['detection_masks'], [0])
 87 |         # Reframe is required to translate mask from box coordinates to image coordinates and fit the image size.
 88 |         real_num_detection = tf.cast(tensor_dict['num_detections'][0], tf.int32)
 89 |         detection_boxes = tf.slice(detection_boxes, [0, 0], [real_num_detection, -1])
 90 |         detection_masks = tf.slice(detection_masks, [0, 0, 0], [real_num_detection, -1, -1])
 91 |         detection_masks_reframed = utils_ops.reframe_box_masks_to_image_masks(
 92 |             detection_masks, detection_boxes, image.shape[0], image.shape[1])
 93 |         detection_masks_reframed = tf.cast(
 94 |             tf.greater(detection_masks_reframed, 0.5), tf.uint8)
 95 |         # Follow the convention by adding back the batch dimension
 96 |         tensor_dict['detection_masks'] = tf.expand_dims(
 97 |             detection_masks_reframed, 0)
 98 |       image_tensor = tf.get_default_graph().get_tensor_by_name('image_tensor:0')
 99 | 
100 |       # Run inference
101 |       output_dict = sess.run(tensor_dict,
102 |                              feed_dict={image_tensor: np.expand_dims(image, 0)})
103 | 
104 |       # all outputs are float32 numpy arrays, so convert types as appropriate
105 |       output_dict['num_detections'] = int(output_dict['num_detections'][0])
106 |       output_dict['detection_classes'] = output_dict[
107 |           'detection_classes'][0].astype(np.uint8)
108 |       output_dict['detection_boxes'] = output_dict['detection_boxes'][0]
109 |       output_dict['detection_scores'] = output_dict['detection_scores'][0]
110 |       if 'detection_masks' in output_dict:
111 |         output_dict['detection_masks'] = output_dict['detection_masks'][0]
112 |   return output_dict
113 | 
114 | def image_preprocessing(img):
115 |   # image_gray = img
116 |   image_gray = cv2.cvtColor(np.asarray(img), cv2.COLOR_RGB2GRAY)  # PIL images are RGB, not BGR
117 |   #  image_gray = cv2.medianBlur(image_gray, 3)
118 |   #  image_gray = cv2.threshold(image_gray, 127, 255, cv2.THRESH_BINARY_INV)[1]
119 |   #  adaptiveThreshold did not give good results; kept here for experimentation.
120 |   #  image_gray = cv2.adaptiveThreshold(image_gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 11, 2)
121 |   
122 |   return image_gray
123 | # box_to_color_map: {(ymin, xmin, ymax, xmax): 'color'}, coordinates normalized to [0, 1]
124 | # box_to_display_str_map: {(ymin, xmin, ymax, xmax): ['label: xx%']}
125 | def img_ocr(image_name, output_path, image_org, box_to_color_map, box_to_display_str_map, lang = 'cont41'):
126 |   cont_num_find = 0
127 |   img_label = []
128 |   # Convert coordinates to raw pixels.
129 |   for box, color in box_to_color_map.items():
130 |     ymin, xmin, ymax, xmax = box  
131 |     # Load the original image; the one returned by visualize_boxes_and_labels_on_image_array already has bounding boxes drawn on it.
132 |     image_corp_org = Image.fromarray(np.uint8(image_org))
133 |     img_width, img_height = image_corp_org.size
134 |     new_xmin = int(xmin * img_width)
135 |     new_xmax = int(xmax * img_width)
136 |     new_ymin = int(ymin * img_height)
137 |     new_ymax = int(ymax * img_height)   
138 |     # Pad the crop by a safety margin (px) where it stays inside the image.
139 |     offset = 5
140 |     if new_xmin - offset >= 0:
141 |       new_xmin = new_xmin - offset
142 |     if new_xmax + offset <= img_width:
143 |       new_xmax = new_xmax + offset
144 |     if new_ymin - offset >= 0:
145 |       new_ymin = new_ymin - offset
146 |     if new_ymax + offset <= img_height:
147 |       new_ymax = new_ymax + offset
148 |     # Split the display string of this bounding box, 'xxx: 90%', into ['xxx', '90%'].
149 |     img_label_name = box_to_display_str_map[box][0].split(': ')
150 |     # Crop the image. Note that PIL and NumPy use reversed coordinate order: NumPy indexes [row (y), column (x)].
151 |     image_corp_org = load_image_into_numpy_array(image_org)[new_ymin:new_ymax,new_xmin:new_xmax]       
152 |     image_corp_org = Image.fromarray(np.uint8(image_corp_org))   
153 |     # Tesseract OCR
154 |     lang_use = 'eng+'+lang+'+letsgodigital+snum+eng_f'
155 |     if re.match('container_number+', img_label_name[0]):
156 |       cont_num_find += 1
157 |       image_corp_gray = image_preprocessing(image_corp_org)
158 |       if re.match('container_number_v+', img_label_name[0]):
159 |         cont_num = pytesseract.image_to_string(image_corp_gray, lang=lang_use, config='--psm 6')
160 |       elif re.match('container_number_e+', img_label_name[0]):
161 |         cont_num = pytesseract.image_to_string(image_corp_gray, lang=lang_use, config='--psm 6')
162 |       else :
163 |         cont_num = pytesseract.image_to_string(image_corp_gray, lang=lang_use, config='--psm 4')
164 |       # Save corp image to outo_path ,and join lable in name.
165 |       # image_corp_name make up like this :'image_name(input)'_'cont_num_find'_'img_label_name'
166 |       image_corp_name = image_name[:-4]+ '_'+ str(cont_num_find)+ '_'+ img_label_name[0]
167 |       # Each img_label entry is a dict with the keys 'lable', 'actual', 'cont_num' and 'image_corp_name'.
168 |       img_label.append({'lable':img_label_name[0], 'actual':img_label_name[1], 'cont_num':cont_num, 'image_corp_name':image_corp_name})
169 |       image_corp_org.save(os.path.join(output_path) + '/' + image_corp_name + '_org_'+ image_name[-4:])
170 |       cv2.imwrite(os.path.join(output_path) + '/' + image_corp_name + '_gray_'+ image_name[-4:], image_corp_gray)
171 |       with open(os.path.join(PATH_TO_TEST_IMAGES_DIR, 'cont_num.txt'), 'a') as file:
172 |         entry = img_label[cont_num_find - 1]
173 |         file.write(entry['image_corp_name'] + '_' + entry['actual'] + '\n' + entry['cont_num'] + '\n')
174 |   return img_label # image_corp_org, image_corp_gray
175 | 
176 | def detection():
177 |   image_label =[]
178 |   for image_path in TEST_IMAGE_PATHS:
179 |     image_org = Image.open(image_path, 'r')
180 |     # the array based representation of the image will be used later in order to prepare the
181 |     # result image with boxes and labels on it.
182 |     image_np = load_image_into_numpy_array(image_org)
183 |     # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
184 |     # image_np_expanded = np.expand_dims(image_np, axis=0)
185 |     image_name = os.path.basename(os.path.join(image_path))
186 |     # Actual detection.
187 |     output_dict = run_inference_for_single_image(image_np, detection_graph)
188 |   
189 |     output_path = os.path.join(PATH_TO_TEST_IMAGES_DIR)
190 |   
191 |     # Visualization of the results of a detection.
192 |     image, box_to_color_map, box_to_display_str_map = vis_util.visualize_boxes_and_labels_on_image_array(
193 |         image_np,
194 |         output_dict['detection_boxes'],
195 |         output_dict['detection_classes'],
196 |         output_dict['detection_scores'],
197 |         category_index,
198 |         instance_masks=output_dict.get('detection_masks'),
199 |         use_normalized_coordinates=True,
200 |         max_boxes_to_draw=200,
201 |         min_score_thresh=.75,
202 |         line_thickness=2)
203 |   
204 |     # Crop each bounding box into a separate image and run OCR on it.
205 |     lang = 'cont41'
206 |     img_label = img_ocr(image_name, output_path, image_org, box_to_color_map, box_to_display_str_map, lang)
207 |     # Save the output image produced by visualize_boxes_and_labels_on_image_array.
208 |     image_name = os.path.basename(os.path.join(image_path))
209 |     output_image_name = image_name[:-4] + '_out' + image_name[-4:]
210 |     image_out = Image.fromarray(image_np)
211 |     image_out.save(os.path.join(PATH_TO_TEST_IMAGES_DIR) + '/'+ output_image_name)
212 |     image_label.append({str(image_name[:-4]): img_label})
213 |   return image_label
214 |   
215 | 
216 | if __name__ == "__main__":
217 |     print(detection())
218 | 
219 | 
220 | 
221 |   
222 | 
223 | 
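
The crop padding in img_ocr above skips the margin on any side where the full 5 px offset does not fit. A minimal sketch of an alternative, assuming it is acceptable to pad by as much as the image allows: clamp each side to the image bounds instead. The helper name pad_box is hypothetical and is not part of this repository.

```python
def pad_box(xmin, ymin, xmax, ymax, img_width, img_height, offset=5):
    """Convert a normalized box to pixel coordinates and pad it by `offset`.

    Hypothetical helper: each side is clamped to the image bounds, so a box
    near the edge still receives as much padding as is available.
    Returns (left, top, right, bottom) in pixels.
    """
    left = max(0, int(xmin * img_width) - offset)
    top = max(0, int(ymin * img_height) - offset)
    right = min(img_width, int(xmax * img_width) + offset)
    bottom = min(img_height, int(ymax * img_height) + offset)
    return left, top, right, bottom


if __name__ == "__main__":
    # A box touching the left and right edges of a 100 x 200 image.
    print(pad_box(0.0, 0.25, 1.0, 0.75, 100, 200))
```

With a NumPy image array, the result would be used as array[top:bottom, left:right], since rows (y) come first.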


--------------------------------------------------------------------------------
/generate_voc_datasets.py:
--------------------------------------------------------------------------------
 1 | #!/usr/bin/env python3
 2 | # -*- coding: utf-8 -*-
 3 | 
 4 | __author__ = 'Kevin Di'
 5 | 
 6 | import os
 7 | import random 
 8 |  
 9 | # VOC like data_set file path.
10 | 
11 | xml_file = r'path to your VOC like data_set: /Annotations'
12 | img_file = r'path to your VOC like data_set:/JPEGImages'
13 | save_path = r'path to your VOC like data_set: /ImageSets/Main'
14 | 
15 | 
16 | # Determine the train, val, test split ratio.
17 | # The first step splits the data into train_val and test; train_val is then split into train and val.
18 | 
19 | train_val_percent = 0.8
20 | train_percent = 0.8
21 | total_dataset_num = os.listdir(xml_file)
22 | total_img_num = os.listdir(img_file)
23 | num = len(total_dataset_num)
24 | img = len(total_img_num)
25 | indices = range(num)  # avoid shadowing the built-in `list`
26 | t_v = int(num * train_val_percent)
27 | t = int(t_v * train_percent)
28 | train_val = random.sample(indices, t_v)
29 | train = random.sample(train_val, t)
30 | 
31 | print('Total number of xml files is:', num)
32 | print('Total number of images is:', img)
33 | print('training set size:', t)
34 | print('validation set size:', t_v - t)
35 | print('test set size:', num - t_v)
36 | 
37 | file_train = open(os.path.join(save_path,'train.txt'), 'w')
38 | file_val = open(os.path.join(save_path,'val.txt'), 'w')
39 | file_test = open(os.path.join(save_path,'test.txt'), 'w')
40 | 
41 | 
42 | for i in indices:
43 |     xml_name = total_dataset_num[i][:5]+'\n'
44 | 
45 |     if i in train_val:  
46 |         if i in train:  
47 |             file_train.write(xml_name)  
48 |         else:  
49 |             file_val.write(xml_name)  
50 |     else:  
51 |         file_test.write(xml_name)  
52 |   
53 | file_train.close()  
54 | file_val.close()  
55 | file_test.close()
56 | 
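
The split above can also be sketched as a reusable function. This is a minimal illustration under stated assumptions, not the script's own method: it uses a fixed seed so the split is reproducible, and it derives each sample name with os.path.splitext rather than taking the first five characters of the file name. The function name split_dataset and the seed parameter are hypothetical.

```python
import os
import random


def split_dataset(file_names, train_val_percent=0.8, train_percent=0.8, seed=0):
    """Split annotation file names into train, val and test name lists.

    Hypothetical helper: names are sorted and then shuffled with a seeded
    generator, so the same input always produces the same split.
    """
    stems = sorted(os.path.splitext(name)[0] for name in file_names)
    rng = random.Random(seed)
    rng.shuffle(stems)
    n_train_val = int(len(stems) * train_val_percent)
    n_train = int(n_train_val * train_percent)
    train = stems[:n_train]
    val = stems[n_train:n_train_val]
    test = stems[n_train_val:]
    return train, val, test


if __name__ == "__main__":
    names = ['%05d.xml' % i for i in range(100)]
    train, val, test = split_dataset(names)
    print(len(train), len(val), len(test))
```

Each returned list could then be written to train.txt, val.txt and test.txt, one name per line, as the script does.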


--------------------------------------------------------------------------------
/image/image1.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lonelygo/container_detection/21f2e682af4210eb5b36126216f2276c522f6513/image/image1.jpg


--------------------------------------------------------------------------------
/image/image2.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lonelygo/container_detection/21f2e682af4210eb5b36126216f2276c522f6513/image/image2.jpg


--------------------------------------------------------------------------------
/image/image3.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lonelygo/container_detection/21f2e682af4210eb5b36126216f2276c522f6513/image/image3.jpg


--------------------------------------------------------------------------------
/image/image4.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lonelygo/container_detection/21f2e682af4210eb5b36126216f2276c522f6513/image/image4.jpg


--------------------------------------------------------------------------------
/image/image5.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lonelygo/container_detection/21f2e682af4210eb5b36126216f2276c522f6513/image/image5.jpg


--------------------------------------------------------------------------------
/utils/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lonelygo/container_detection/21f2e682af4210eb5b36126216f2276c522f6513/utils/__init__.py


--------------------------------------------------------------------------------
/utils/visualization_utils.py:
--------------------------------------------------------------------------------
   1 | # -*- coding: utf-8 -*-
   2 | # Copyright 2017 The TensorFlow Authors. All Rights Reserved.
   3 | #
   4 | # Licensed under the Apache License, Version 2.0 (the "License");
   5 | # you may not use this file except in compliance with the License.
   6 | # You may obtain a copy of the License at
   7 | #
   8 | #     http://www.apache.org/licenses/LICENSE-2.0
   9 | #
  10 | # Unless required by applicable law or agreed to in writing, software
  11 | # distributed under the License is distributed on an "AS IS" BASIS,
  12 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  13 | # See the License for the specific language governing permissions and
  14 | # limitations under the License.
  15 | # ==============================================================================
  16 | 
  17 | """A set of functions that are used for visualization.
  18 | 
  19 | These functions often receive an image, perform some visualization on the image.
  20 | The functions do not return a value, instead they modify the image itself.
  21 | 
  22 | """
  23 | import abc
  24 | import collections
  25 | import functools
  26 | # Set headless-friendly backend.
  27 | # The Agg backend cannot display images, so the line below is left commented out.
  28 | # import matplotlib; matplotlib.use('Agg')  
  29 | import matplotlib
  30 | from matplotlib import pyplot as plt  
  31 | import os
  32 | import numpy as np
  33 | import PIL.Image as Image
  34 | import PIL.ImageColor as ImageColor
  35 | import PIL.ImageDraw as ImageDraw
  36 | import PIL.ImageFont as ImageFont
  37 | import six
  38 | import tensorflow as tf
  39 | 
  40 | from object_detection.core import standard_fields as fields
  41 | from object_detection.utils import shape_utils
  42 | 
  43 | _TITLE_LEFT_MARGIN = 10
  44 | _TITLE_TOP_MARGIN = 10
  45 | STANDARD_COLORS = [
  46 |     'AliceBlue', 'Chartreuse', 'Aqua', 'Aquamarine', 'Azure', 'Beige', 'Bisque',
  47 |     'BlanchedAlmond', 'BlueViolet', 'BurlyWood', 'CadetBlue', 'AntiqueWhite',
  48 |     'Chocolate', 'Coral', 'CornflowerBlue', 'Cornsilk', 'Crimson', 'Cyan',
  49 |     'DarkCyan', 'DarkGoldenRod', 'DarkGrey', 'DarkKhaki', 'DarkOrange',
  50 |     'DarkOrchid', 'DarkSalmon', 'DarkSeaGreen', 'DarkTurquoise', 'DarkViolet',
  51 |     'DeepPink', 'DeepSkyBlue', 'DodgerBlue', 'FireBrick', 'FloralWhite',
  52 |     'ForestGreen', 'Fuchsia', 'Gainsboro', 'GhostWhite', 'Gold', 'GoldenRod',
  53 |     'Salmon', 'Tan', 'HoneyDew', 'HotPink', 'IndianRed', 'Ivory', 'Khaki',
  54 |     'Lavender', 'LavenderBlush', 'LawnGreen', 'LemonChiffon', 'LightBlue',
  55 |     'LightCoral', 'LightCyan', 'LightGoldenRodYellow', 'LightGray', 'LightGrey',
  56 |     'LightGreen', 'LightPink', 'LightSalmon', 'LightSeaGreen', 'LightSkyBlue',
  57 |     'LightSlateGray', 'LightSlateGrey', 'LightSteelBlue', 'LightYellow', 'Lime',
  58 |     'LimeGreen', 'Linen', 'Magenta', 'MediumAquaMarine', 'MediumOrchid',
  59 |     'MediumPurple', 'MediumSeaGreen', 'MediumSlateBlue', 'MediumSpringGreen',
  60 |     'MediumTurquoise', 'MediumVioletRed', 'MintCream', 'MistyRose', 'Moccasin',
  61 |     'NavajoWhite', 'OldLace', 'Olive', 'OliveDrab', 'Orange', 'OrangeRed',
  62 |     'Orchid', 'PaleGoldenRod', 'PaleGreen', 'PaleTurquoise', 'PaleVioletRed',
  63 |     'PapayaWhip', 'PeachPuff', 'Peru', 'Pink', 'Plum', 'PowderBlue', 'Purple',
  64 |     'Red', 'RosyBrown', 'RoyalBlue', 'SaddleBrown', 'Green', 'SandyBrown',
  65 |     'SeaGreen', 'SeaShell', 'Sienna', 'Silver', 'SkyBlue', 'SlateBlue',
  66 |     'SlateGray', 'SlateGrey', 'Snow', 'SpringGreen', 'SteelBlue', 'GreenYellow',
  67 |     'Teal', 'Thistle', 'Tomato', 'Turquoise', 'Violet', 'Wheat', 'White',
  68 |     'WhiteSmoke', 'Yellow', 'YellowGreen'
  69 | ]
  70 | 
  71 | 
  72 | def save_image_array_as_png(image, output_path):
  73 |   """Saves an image (represented as a numpy array) to PNG.
  74 | 
  75 |   Args:
  76 |     image: a numpy array with shape [height, width, 3].
  77 |     output_path: path to which image should be written.
  78 |   """
  79 |   image_pil = Image.fromarray(np.uint8(image)).convert('RGB')
  80 |   with tf.gfile.Open(output_path, 'w') as fid:
  81 |     image_pil.save(fid, 'PNG')
  82 | 
  83 | 
  84 | def encode_image_array_as_png_str(image):
  85 |   """Encodes a numpy array into a PNG string.
  86 | 
  87 |   Args:
  88 |     image: a numpy array with shape [height, width, 3].
  89 | 
  90 |   Returns:
  91 |     PNG encoded image string.
  92 |   """
  93 |   image_pil = Image.fromarray(np.uint8(image))
  94 |   output = six.BytesIO()
  95 |   image_pil.save(output, format='PNG')
  96 |   png_string = output.getvalue()
  97 |   output.close()
  98 |   return png_string
  99 | 
 100 | 
 101 | def draw_bounding_box_on_image_array(image,
 102 |                                      ymin,
 103 |                                      xmin,
 104 |                                      ymax,
 105 |                                      xmax,
 106 |                                      color='red',
 107 |                                      thickness=4,
 108 |                                      display_str_list=(),
 109 |                                      use_normalized_coordinates=True):
 110 |   """Adds a bounding box to an image (numpy array).
 111 | 
 112 |   Bounding box coordinates can be specified in either absolute (pixel) or
 113 |   normalized coordinates by setting the use_normalized_coordinates argument.
 114 | 
 115 |   Args:
 116 |     image: a numpy array with shape [height, width, 3].
 117 |     ymin: ymin of bounding box.
 118 |     xmin: xmin of bounding box.
 119 |     ymax: ymax of bounding box.
 120 |     xmax: xmax of bounding box.
 121 |     color: color to draw bounding box. Default is red.
 122 |     thickness: line thickness. Default value is 4.
 123 |     display_str_list: list of strings to display in box
 124 |                       (each to be shown on its own line).
 125 |     use_normalized_coordinates: If True (default), treat coordinates
 126 |       ymin, xmin, ymax, xmax as relative to the image.  Otherwise treat
 127 |       coordinates as absolute.
 128 |   """
 129 |   image_pil = Image.fromarray(np.uint8(image)).convert('RGB')
 130 |   draw_bounding_box_on_image(image_pil, ymin, xmin, ymax, xmax, color,
 131 |                              thickness, display_str_list,
 132 |                              use_normalized_coordinates)
 133 |   np.copyto(image, np.array(image_pil))
 134 | 
 135 | 
 136 | def draw_bounding_box_on_image(image,
 137 |                                ymin,
 138 |                                xmin,
 139 |                                ymax,
 140 |                                xmax,
 141 |                                color='red',
 142 |                                thickness=4,
 143 |                                display_str_list=(),
 144 |                                use_normalized_coordinates=True):
 145 |   """Adds a bounding box to an image.
 146 | 
 147 |   Bounding box coordinates can be specified in either absolute (pixel) or
 148 |   normalized coordinates by setting the use_normalized_coordinates argument.
 149 | 
 150 |   Each string in display_str_list is displayed on a separate line above the
 151 |   bounding box in black text on a rectangle filled with the input 'color'.
 152 |   If the top of the bounding box extends to the edge of the image, the strings
 153 |   are displayed below the bounding box.
 154 | 
 155 |   Args:
 156 |     image: a PIL.Image object.
 157 |     ymin: ymin of bounding box.
 158 |     xmin: xmin of bounding box.
 159 |     ymax: ymax of bounding box.
 160 |     xmax: xmax of bounding box.
 161 |     color: color to draw bounding box. Default is red.
 162 |     thickness: line thickness. Default value is 4.
 163 |     display_str_list: list of strings to display in box
 164 |                       (each to be shown on its own line).
 165 |     use_normalized_coordinates: If True (default), treat coordinates
 166 |       ymin, xmin, ymax, xmax as relative to the image.  Otherwise treat
 167 |       coordinates as absolute.
 168 |   """
 169 |   draw = ImageDraw.Draw(image)
 170 |   im_width, im_height = image.size
 171 |   if use_normalized_coordinates:
 172 |     (left, right, top, bottom) = (xmin * im_width, xmax * im_width,
 173 |                                   ymin * im_height, ymax * im_height)
 174 |   else:
 175 |     (left, right, top, bottom) = (xmin, xmax, ymin, ymax)
 176 |   draw.line([(left, top), (left, bottom), (right, bottom),
 177 |              (right, top), (left, top)], width=thickness, fill=color)
 178 |   try:
 179 |     font = ImageFont.truetype('arial.ttf', 24)
 180 |   except IOError:
 181 |     font = ImageFont.load_default()
 182 | 
 183 |   # If the total height of the display strings added to the top of the bounding
 184 |   # box exceeds the top of the image, stack the strings below the bounding box
 185 |   # instead of above.
 186 |   display_str_heights = [font.getsize(ds)[1] for ds in display_str_list]
 187 |   # Each display_str has a top and bottom margin of 0.05x.
 188 |   total_display_str_height = (1 + 2 * 0.05) * sum(display_str_heights)
 189 | 
 190 |   if top > total_display_str_height:
 191 |     text_bottom = top
 192 |   else:
 193 |     text_bottom = bottom + total_display_str_height
 194 |   # Reverse list and print from bottom to top.
 195 |   for display_str in display_str_list[::-1]:
 196 |     text_width, text_height = font.getsize(display_str)
 197 |     margin = np.ceil(0.05 * text_height)
 198 |     draw.rectangle(
 199 |         [(left, text_bottom - text_height - 2 * margin), (left + text_width,
 200 |                                                           text_bottom)],
 201 |         fill=color)
 202 |     draw.text(
 203 |         (left + margin, text_bottom - text_height - margin),
 204 |         display_str,
 205 |         fill='black',
 206 |         font=font)
 207 |     text_bottom -= text_height - 2 * margin
 208 | 
 209 | 
 210 | def draw_bounding_boxes_on_image_array(image,
 211 |                                        boxes,
 212 |                                        color='red',
 213 |                                        thickness=4,
 214 |                                        display_str_list_list=()):
 215 |   """Draws bounding boxes on image (numpy array).
 216 | 
 217 |   Args:
 218 |     image: a numpy array object.
 219 |     boxes: a 2 dimensional numpy array of [N, 4]: (ymin, xmin, ymax, xmax).
 220 |            The coordinates are in normalized format between [0, 1].
 221 |     color: color to draw bounding box. Default is red.
 222 |     thickness: line thickness. Default value is 4.
 223 |     display_str_list_list: list of list of strings.
 224 |                            a list of strings for each bounding box.
 225 |                            The reason to pass a list of strings for a
 226 |                            bounding box is that it might contain
 227 |                            multiple labels.
 228 | 
 229 |   Raises:
 230 |     ValueError: if boxes is not a [N, 4] array
 231 |   """
 232 |   image_pil = Image.fromarray(image)
 233 |   draw_bounding_boxes_on_image(image_pil, boxes, color, thickness,
 234 |                                display_str_list_list)
 235 |   np.copyto(image, np.array(image_pil))
 236 | 
 237 | 
 238 | def draw_bounding_boxes_on_image(image,
 239 |                                  boxes,
 240 |                                  color='red',
 241 |                                  thickness=4,
 242 |                                  display_str_list_list=()):
 243 |   """Draws bounding boxes on image.
 244 | 
 245 |   Args:
 246 |     image: a PIL.Image object.
 247 |     boxes: a 2 dimensional numpy array of [N, 4]: (ymin, xmin, ymax, xmax).
 248 |            The coordinates are in normalized format between [0, 1].
 249 |     color: color to draw bounding box. Default is red.
 250 |     thickness: line thickness. Default value is 4.
 251 |     display_str_list_list: list of list of strings.
 252 |                            a list of strings for each bounding box.
 253 |                            The reason to pass a list of strings for a
 254 |                            bounding box is that it might contain
 255 |                            multiple labels.
 256 | 
 257 |   Raises:
 258 |     ValueError: if boxes is not a [N, 4] array
 259 |   """
 260 |   boxes_shape = boxes.shape
 261 |   if not boxes_shape:
 262 |     return
 263 |   if len(boxes_shape) != 2 or boxes_shape[1] != 4:
 264 |     raise ValueError('Input must be of size [N, 4]')
 265 |   for i in range(boxes_shape[0]):
 266 |     display_str_list = ()
 267 |     if display_str_list_list:
 268 |       display_str_list = display_str_list_list[i]
 269 |     draw_bounding_box_on_image(image, boxes[i, 0], boxes[i, 1], boxes[i, 2],
 270 |                                boxes[i, 3], color, thickness, display_str_list)
 271 | 
 272 | 
 273 | def _visualize_boxes(image, boxes, classes, scores, category_index, **kwargs):
 274 |   return visualize_boxes_and_labels_on_image_array(
 275 |       image, boxes, classes, scores, category_index=category_index, **kwargs)
 276 | 
 277 | 
 278 | def _visualize_boxes_and_masks(image, boxes, classes, scores, masks,
 279 |                                category_index, **kwargs):
 280 |   return visualize_boxes_and_labels_on_image_array(
 281 |       image,
 282 |       boxes,
 283 |       classes,
 284 |       scores,
 285 |       category_index=category_index,
 286 |       instance_masks=masks,
 287 |       **kwargs)
 288 | 
 289 | 
 290 | def _visualize_boxes_and_keypoints(image, boxes, classes, scores, keypoints,
 291 |                                    category_index, **kwargs):
 292 |   return visualize_boxes_and_labels_on_image_array(
 293 |       image,
 294 |       boxes,
 295 |       classes,
 296 |       scores,
 297 |       category_index=category_index,
 298 |       keypoints=keypoints,
 299 |       **kwargs)
 300 | 
 301 | 
 302 | def _visualize_boxes_and_masks_and_keypoints(
 303 |     image, boxes, classes, scores, masks, keypoints, category_index, **kwargs):
 304 |   return visualize_boxes_and_labels_on_image_array(
 305 |       image,
 306 |       boxes,
 307 |       classes,
 308 |       scores,
 309 |       category_index=category_index,
 310 |       instance_masks=masks,
 311 |       keypoints=keypoints,
 312 |       **kwargs)
 313 | 
 314 | 
 315 | def _resize_original_image(image, image_shape):
 316 |   image = tf.expand_dims(image, 0)
 317 |   image = tf.image.resize_images(
 318 |       image,
 319 |       image_shape,
 320 |       method=tf.image.ResizeMethod.NEAREST_NEIGHBOR,
 321 |       align_corners=True)
 322 |   return tf.cast(tf.squeeze(image, 0), tf.uint8)
 323 | 
 324 | 
 325 | def draw_bounding_boxes_on_image_tensors(images,
 326 |                                          boxes,
 327 |                                          classes,
 328 |                                          scores,
 329 |                                          category_index,
 330 |                                          original_image_spatial_shape=None,
 331 |                                          true_image_shape=None,
 332 |                                          instance_masks=None,
 333 |                                          keypoints=None,
 334 |                                          max_boxes_to_draw=100,
 335 |                                          min_score_thresh=0.85,
 336 |                                          use_normalized_coordinates=True):
 337 |   """Draws bounding boxes, masks, and keypoints on batch of image tensors.
 338 | 
 339 |   Args:
 340 |     images: A 4D uint8 image tensor of shape [N, H, W, C]. If C > 3, additional
 341 |       channels will be ignored. If C = 1, then we convert the images to RGB
 342 |       images.
 343 |     boxes: [N, max_detections, 4] float32 tensor of detection boxes.
 344 |     classes: [N, max_detections] int tensor of detection classes. Note that
 345 |       classes are 1-indexed.
 346 |     scores: [N, max_detections] float32 tensor of detection scores.
 347 |     category_index: a dict that maps integer ids to category dicts. e.g.
 348 |       {1: {1: 'dog'}, 2: {2: 'cat'}, ...}
 349 |     original_image_spatial_shape: [N, 2] tensor containing the spatial size of
 350 |       the original image.
 351 |     true_image_shape: [N, 3] tensor containing the spatial size of unpadded
 352 |       original_image.
 353 |     instance_masks: A 4D uint8 tensor of shape [N, max_detection, H, W] with
 354 |       instance masks.
 355 |     keypoints: A 4D float32 tensor of shape [N, max_detection, num_keypoints, 2]
 356 |       with keypoints.
 357 |     max_boxes_to_draw: Maximum number of boxes to draw on an image. Default 100.
 358 |     min_score_thresh: Minimum score threshold for visualization. Default 0.85.
 359 |     use_normalized_coordinates: Whether to assume boxes and keypoints are in
 360 |       normalized coordinates (as opposed to absolute coordinates).
 361 |       Default is True.
 362 | 
 363 |   Returns:
 364 |     4D image tensor of type uint8, with boxes drawn on top.
 365 |   """
 366 |   # Additional channels are being ignored.
 367 |   if images.shape[3] > 3:
 368 |     images = images[:, :, :, 0:3]
 369 |   elif images.shape[3] == 1:
 370 |     images = tf.image.grayscale_to_rgb(images)
 371 |   visualization_keyword_args = {
 372 |       'use_normalized_coordinates': use_normalized_coordinates,
 373 |       'max_boxes_to_draw': max_boxes_to_draw,
 374 |       'min_score_thresh': min_score_thresh,
 375 |       'agnostic_mode': False,
 376 |       'line_thickness': 4
 377 |   }
 378 |   if true_image_shape is None:
 379 |     true_shapes = tf.constant(-1, shape=[images.shape.as_list()[0], 3])
 380 |   else:
 381 |     true_shapes = true_image_shape
 382 |   if original_image_spatial_shape is None:
 383 |     original_shapes = tf.constant(-1, shape=[images.shape.as_list()[0], 2])
 384 |   else:
 385 |     original_shapes = original_image_spatial_shape
 386 | 
 387 |   if instance_masks is not None and keypoints is None:
 388 |     visualize_boxes_fn = functools.partial(
 389 |         _visualize_boxes_and_masks,
 390 |         category_index=category_index,
 391 |         **visualization_keyword_args)
 392 |     elems = [
 393 |         true_shapes, original_shapes, images, boxes, classes, scores,
 394 |         instance_masks
 395 |     ]
 396 |   elif instance_masks is None and keypoints is not None:
 397 |     visualize_boxes_fn = functools.partial(
 398 |         _visualize_boxes_and_keypoints,
 399 |         category_index=category_index,
 400 |         **visualization_keyword_args)
 401 |     elems = [
 402 |         true_shapes, original_shapes, images, boxes, classes, scores, keypoints
 403 |     ]
 404 |   elif instance_masks is not None and keypoints is not None:
 405 |     visualize_boxes_fn = functools.partial(
 406 |         _visualize_boxes_and_masks_and_keypoints,
 407 |         category_index=category_index,
 408 |         **visualization_keyword_args)
 409 |     elems = [
 410 |         true_shapes, original_shapes, images, boxes, classes, scores,
 411 |         instance_masks, keypoints
 412 |     ]
 413 |   else:
 414 |     visualize_boxes_fn = functools.partial(
 415 |         _visualize_boxes,
 416 |         category_index=category_index,
 417 |         **visualization_keyword_args)
 418 |     elems = [
 419 |         true_shapes, original_shapes, images, boxes, classes, scores
 420 |     ]
 421 | 
 422 |   def draw_boxes(image_and_detections):
 423 |     """Draws boxes on image."""
 424 |     true_shape = image_and_detections[0]
 425 |     original_shape = image_and_detections[1]
 426 |     if true_image_shape is not None:
 427 |       image = shape_utils.pad_or_clip_nd(image_and_detections[2],
 428 |                                          [true_shape[0], true_shape[1], 3])
 429 |     if original_image_spatial_shape is not None:
 430 |       image_and_detections[2] = _resize_original_image(image, original_shape)
 431 | 
 432 |     image_with_boxes = tf.py_func(visualize_boxes_fn, image_and_detections[2:],
 433 |                                   tf.uint8)
 434 |     return image_with_boxes
 435 | 
 436 |   images = tf.map_fn(draw_boxes, elems, dtype=tf.uint8, back_prop=False)
 437 |   return images
 438 | 
 439 | 
 440 | def draw_side_by_side_evaluation_image(eval_dict,
 441 |                                        category_index,
 442 |                                        max_boxes_to_draw=20,
 443 |                                        min_score_thresh=0.2,
 444 |                                        use_normalized_coordinates=True):
 445 |   """Creates a side-by-side image with detections and groundtruth.
 446 | 
 447 |   Bounding boxes (and instance masks, if available) are visualized on both
 448 |   subimages.
 449 | 
 450 |   Args:
 451 |     eval_dict: The evaluation dictionary returned by
 452 |       eval_util.result_dict_for_batched_example() or
 453 |       eval_util.result_dict_for_single_example().
 454 |     category_index: A category index (dictionary) produced from a labelmap.
 455 |     max_boxes_to_draw: The maximum number of boxes to draw for detections.
 456 |     min_score_thresh: The minimum score threshold for showing detections.
 457 |     use_normalized_coordinates: Whether to assume boxes and keypoints are in
 458 |       normalized coordinates (as opposed to absolute coordinates).
 459 |       Default is True.
 460 | 
 461 |   Returns:
 462 |     A list of [1, H, 2 * W, C] uint8 tensor. The subimage on the left
 463 |       corresponds to detections, while the subimage on the right corresponds to
 464 |       groundtruth.
 465 |   """
 466 |   detection_fields = fields.DetectionResultFields()
 467 |   input_data_fields = fields.InputDataFields()
 468 | 
 469 |   images_with_detections_list = []
 470 | 
 471 |   # Add the batch dimension if the eval_dict is for single example.
 472 |   if len(eval_dict[detection_fields.detection_classes].shape) == 1:
 473 |     for key in eval_dict:
 474 |       if key != input_data_fields.original_image:
 475 |         eval_dict[key] = tf.expand_dims(eval_dict[key], 0)
 476 | 
 477 |   for indx in range(eval_dict[input_data_fields.original_image].shape[0]):
 478 |     instance_masks = None
 479 |     if detection_fields.detection_masks in eval_dict:
 480 |       instance_masks = tf.cast(
 481 |           tf.expand_dims(
 482 |               eval_dict[detection_fields.detection_masks][indx], axis=0),
 483 |           tf.uint8)
 484 |     keypoints = None
 485 |     if detection_fields.detection_keypoints in eval_dict:
 486 |       keypoints = tf.expand_dims(
 487 |           eval_dict[detection_fields.detection_keypoints][indx], axis=0)
 488 |     groundtruth_instance_masks = None
 489 |     if input_data_fields.groundtruth_instance_masks in eval_dict:
 490 |       groundtruth_instance_masks = tf.cast(
 491 |           tf.expand_dims(
 492 |               eval_dict[input_data_fields.groundtruth_instance_masks][indx],
 493 |               axis=0), tf.uint8)
 494 | 
 495 |     images_with_detections = draw_bounding_boxes_on_image_tensors(
 496 |         tf.expand_dims(
 497 |             eval_dict[input_data_fields.original_image][indx], axis=0),
 498 |         tf.expand_dims(
 499 |             eval_dict[detection_fields.detection_boxes][indx], axis=0),
 500 |         tf.expand_dims(
 501 |             eval_dict[detection_fields.detection_classes][indx], axis=0),
 502 |         tf.expand_dims(
 503 |             eval_dict[detection_fields.detection_scores][indx], axis=0),
 504 |         category_index,
 505 |         original_image_spatial_shape=tf.expand_dims(
 506 |             eval_dict[input_data_fields.original_image_spatial_shape][indx],
 507 |             axis=0),
 508 |         true_image_shape=tf.expand_dims(
 509 |             eval_dict[input_data_fields.true_image_shape][indx], axis=0),
 510 |         instance_masks=instance_masks,
 511 |         keypoints=keypoints,
 512 |         max_boxes_to_draw=max_boxes_to_draw,
 513 |         min_score_thresh=min_score_thresh,
 514 |         use_normalized_coordinates=use_normalized_coordinates)
 515 |     images_with_groundtruth = draw_bounding_boxes_on_image_tensors(
 516 |         tf.expand_dims(
 517 |             eval_dict[input_data_fields.original_image][indx], axis=0),
 518 |         tf.expand_dims(
 519 |             eval_dict[input_data_fields.groundtruth_boxes][indx], axis=0),
 520 |         tf.expand_dims(
 521 |             eval_dict[input_data_fields.groundtruth_classes][indx], axis=0),
 522 |         tf.expand_dims(
 523 |             tf.ones_like(
 524 |                 eval_dict[input_data_fields.groundtruth_classes][indx],
 525 |                 dtype=tf.float32),
 526 |             axis=0),
 527 |         category_index,
 528 |         original_image_spatial_shape=tf.expand_dims(
 529 |             eval_dict[input_data_fields.original_image_spatial_shape][indx],
 530 |             axis=0),
 531 |         true_image_shape=tf.expand_dims(
 532 |             eval_dict[input_data_fields.true_image_shape][indx], axis=0),
 533 |         instance_masks=groundtruth_instance_masks,
 534 |         keypoints=None,
 535 |         max_boxes_to_draw=None,
 536 |         min_score_thresh=0.0,
 537 |         use_normalized_coordinates=use_normalized_coordinates)
 538 |     images_with_detections_list.append(
 539 |         tf.concat([images_with_detections, images_with_groundtruth], axis=2))
 540 |   return images_with_detections_list
 541 | 
 542 | 
 543 | def draw_keypoints_on_image_array(image,
 544 |                                   keypoints,
 545 |                                   color='red',
 546 |                                   radius=2,
 547 |                                   use_normalized_coordinates=True):
 548 |   """Draws keypoints on an image (numpy array).
 549 | 
 550 |   Args:
 551 |     image: a numpy array with shape [height, width, 3].
 552 |     keypoints: a numpy array with shape [num_keypoints, 2].
 553 |     color: color to draw the keypoints with. Default is red.
 554 |     radius: keypoint radius. Default value is 2.
 555 |     use_normalized_coordinates: if True (default), treat keypoint values as
 556 |       relative to the image.  Otherwise treat them as absolute.
 557 |   """
 558 |   image_pil = Image.fromarray(np.uint8(image)).convert('RGB')
 559 |   draw_keypoints_on_image(image_pil, keypoints, color, radius,
 560 |                           use_normalized_coordinates)
 561 |   np.copyto(image, np.array(image_pil))
 562 | 
 563 | 
 564 | def draw_keypoints_on_image(image,
 565 |                             keypoints,
 566 |                             color='red',
 567 |                             radius=2,
 568 |                             use_normalized_coordinates=True):
 569 |   """Draws keypoints on an image.
 570 | 
 571 |   Args:
 572 |     image: a PIL.Image object.
 573 |     keypoints: a numpy array with shape [num_keypoints, 2].
 574 |     color: color to draw the keypoints with. Default is red.
 575 |     radius: keypoint radius. Default value is 2.
 576 |     use_normalized_coordinates: if True (default), treat keypoint values as
 577 |       relative to the image.  Otherwise treat them as absolute.
 578 |   """
 579 |   draw = ImageDraw.Draw(image)
 580 |   im_width, im_height = image.size
 581 |   keypoints_x = [k[1] for k in keypoints]
 582 |   keypoints_y = [k[0] for k in keypoints]
 583 |   if use_normalized_coordinates:
 584 |     keypoints_x = tuple([im_width * x for x in keypoints_x])
 585 |     keypoints_y = tuple([im_height * y for y in keypoints_y])
 586 |   for keypoint_x, keypoint_y in zip(keypoints_x, keypoints_y):
 587 |     draw.ellipse([(keypoint_x - radius, keypoint_y - radius),
 588 |                   (keypoint_x + radius, keypoint_y + radius)],
 589 |                  outline=color, fill=color)
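
The coordinate handling in `draw_keypoints_on_image` can be sketched without PIL. This is a minimal illustration, not part of the module: `keypoints_to_pixels` and `ellipse_bounds` are hypothetical helper names, and keypoints are assumed to be stored as (y, x) pairs, as the indexing above suggests.

```python
def keypoints_to_pixels(keypoints, im_width, im_height, normalized=True):
  """Converts (y, x) keypoints into (x, y) pixel positions."""
  pixels = []
  for y, x in keypoints:
    if normalized:
      # Normalized values are fractions of the image width and height.
      x, y = x * im_width, y * im_height
    pixels.append((x, y))
  return pixels


def ellipse_bounds(x, y, radius):
  """Returns the bounding box that draw.ellipse receives for one keypoint."""
  return [(x - radius, y - radius), (x + radius, y + radius)]


points = keypoints_to_pixels([(0.5, 0.25)], im_width=200, im_height=100)
bounds = ellipse_bounds(points[0][0], points[0][1], radius=2)
```

With absolute coordinates (`normalized=False`) the values pass through unchanged apart from the (y, x) to (x, y) swap.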
 590 | 
 591 | 
 592 | def draw_mask_on_image_array(image, mask, color='red', alpha=0.4):
 593 |   """Draws mask on an image.
 594 | 
 595 |   Args:
 596 |     image: uint8 numpy array with shape (img_height, img_width, 3)
 597 |     mask: a uint8 numpy array of shape (img_height, img_width) with
 598 |       values of either 0 or 1.
 599 |     color: color to draw the mask with. Default is red.
 600 |     alpha: transparency value between 0 and 1. (default: 0.4)
 601 | 
 602 |   Raises:
 603 |     ValueError: On incorrect data type for image or masks.
 604 |   """
 605 |   if image.dtype != np.uint8:
 606 |     raise ValueError('`image` not of type np.uint8')
 607 |   if mask.dtype != np.uint8:
 608 |     raise ValueError('`mask` not of type np.uint8')
 609 |   if np.any(np.logical_and(mask != 1, mask != 0)):
 610 |     raise ValueError('`mask` elements should be in [0, 1]')
 611 |   if image.shape[:2] != mask.shape:
 612 |     raise ValueError('The image has spatial dimensions %s but the mask has '
 613 |                      'dimensions %s' % (image.shape[:2], mask.shape))
 614 |   rgb = ImageColor.getrgb(color)
 615 |   pil_image = Image.fromarray(image)
 616 | 
 617 |   solid_color = np.expand_dims(
 618 |       np.ones_like(mask), axis=2) * np.reshape(list(rgb), [1, 1, 3])
 619 |   pil_solid_color = Image.fromarray(np.uint8(solid_color)).convert('RGBA')
 620 |   pil_mask = Image.fromarray(np.uint8(255.0*alpha*mask)).convert('L')
 621 |   pil_image = Image.composite(pil_solid_color, pil_image, pil_mask)
 622 |   np.copyto(image, np.array(pil_image.convert('RGB')))
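
The blending performed by `draw_mask_on_image_array` can be approximated per pixel as follows. This is a sketch under assumptions: `blend_pixel` is a hypothetical helper, and PIL's internal rounding inside `Image.composite` may differ by one intensity level from the plain `round` used here.

```python
def blend_pixel(pixel, rgb, mask_value, alpha=0.4):
  """Blends one RGB pixel with a solid color where the mask is set.

  mask_value is 0 or 1. The 8-bit mask weight mirrors
  np.uint8(255.0 * alpha * mask) from the function above.
  """
  weight = int(255.0 * alpha * mask_value) / 255.0
  return tuple(
      int(round(weight * c + (1.0 - weight) * p)) for c, p in zip(rgb, pixel))


inside = blend_pixel((0, 0, 0), (255, 0, 0), mask_value=1)
outside = blend_pixel((10, 20, 30), (255, 0, 0), mask_value=0)
```

Pixels outside the mask keep their original value, while pixels inside it move toward the solid color in proportion to `alpha`.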
 623 | 
 624 | 
 625 | def visualize_boxes_and_labels_on_image_array(
 626 |     image,
 627 | #    image_path,
 628 | #    output_path,
 629 |     boxes,
 630 |     classes,
 631 |     scores,
 632 |     category_index,
 633 |     instance_masks=None,
 634 |     instance_boundaries=None,
 635 |     keypoints=None,
 636 |     use_normalized_coordinates=False,
 637 |     max_boxes_to_draw=20,
 638 |     min_score_thresh=.5,
 639 |     agnostic_mode=False,
 640 |     line_thickness=4,
 641 |     groundtruth_box_visualization_color='black',
 642 |     skip_scores=False,
 643 |     skip_labels=False):
 644 |   """Overlay labeled boxes on an image with formatted scores and label names.
 645 | 
 646 |   This function groups boxes that correspond to the same location
 647 |   and creates a display string for each detection and overlays these
 648 |   on the image. Note that this function modifies the image in place and
 649 |   returns it together with the per-box color and display-string maps.
 650 | 
 651 |   Args:
 652 |     image: uint8 numpy array with shape (img_height, img_width, 3)
 653 |     boxes: a numpy array of shape [N, 4]
 654 |     classes: a numpy array of shape [N]. Note that class indices are 1-based,
 655 |       and match the keys in the label map.
 656 |     scores: a numpy array of shape [N] or None.  If scores=None, then
 657 |       this function assumes that the boxes to be plotted are groundtruth
 658 |       boxes and plot all boxes as black with no classes or scores.
 659 |     category_index: a dict containing category dictionaries (each holding
 660 |       category index `id` and category name `name`) keyed by category indices.
 661 |     instance_masks: a numpy array of shape [N, image_height, image_width] with
 662 |       values ranging between 0 and 1, can be None.
 663 |     instance_boundaries: a numpy array of shape [N, image_height, image_width]
 664 |       with values ranging between 0 and 1, can be None.
 665 |     keypoints: a numpy array of shape [N, num_keypoints, 2], can
 666 |       be None
 667 |     use_normalized_coordinates: whether boxes is to be interpreted as
 668 |       normalized coordinates or not.
 669 |     max_boxes_to_draw: maximum number of boxes to visualize.  If None, draw
 670 |       all boxes.
 671 |     min_score_thresh: minimum score threshold for a box to be visualized
 672 |     agnostic_mode: boolean (default: False) controlling whether to evaluate in
 673 |       class-agnostic mode or not.  This mode will display scores but ignore
 674 |       classes.
 675 |     line_thickness: integer (default: 4) controlling line width of the boxes.
 676 |     groundtruth_box_visualization_color: box color for visualizing groundtruth
 677 |       boxes
 678 |     skip_scores: whether to skip score when drawing a single detection
 679 |     skip_labels: whether to skip label when drawing a single detection
 680 | 
 681 |   Returns:
 682 |     A tuple (image, box_to_color_map, box_to_display_str_map); image is the uint8 numpy array with shape (img_height, img_width, 3) with overlaid boxes.
 683 |   """
 684 |   # Create a display string (and color) for every box location, group any boxes
 685 |   # that correspond to the same location.
 686 |   box_to_display_str_map = collections.defaultdict(list)
 687 |   box_to_color_map = collections.defaultdict(str)
 688 |   box_to_instance_masks_map = {}
 689 |   box_to_instance_boundaries_map = {}
 690 |   box_to_keypoints_map = collections.defaultdict(list)
 691 |   if not max_boxes_to_draw:
 692 |     max_boxes_to_draw = boxes.shape[0]
 693 |   for i in range(min(max_boxes_to_draw, boxes.shape[0])):
 694 |     if scores is None or scores[i] > min_score_thresh:
 695 |       box = tuple(boxes[i].tolist())
 696 |       if instance_masks is not None:
 697 |         box_to_instance_masks_map[box] = instance_masks[i]
 698 |       if instance_boundaries is not None:
 699 |         box_to_instance_boundaries_map[box] = instance_boundaries[i]
 700 |       if keypoints is not None:
 701 |         box_to_keypoints_map[box].extend(keypoints[i])
 702 |       if scores is None:
 703 |         box_to_color_map[box] = groundtruth_box_visualization_color
 704 |       else:
 705 |         display_str = ''
 706 |         if not skip_labels:
 707 |           if not agnostic_mode:
 708 |             if classes[i] in category_index.keys():
 709 |               class_name = category_index[classes[i]]['name']
 710 |             else:
 711 |               class_name = 'N/A'
 712 |             display_str = str(class_name)
 713 |         if not skip_scores:
 714 |           if not display_str:
 715 |             display_str = '{}%'.format(int(100*scores[i]))
 716 |           else:
 717 |             display_str = '{}: {}%'.format(display_str, int(100*scores[i]))
 718 |         box_to_display_str_map[box].append(display_str)
 719 |         if agnostic_mode:
 720 |           box_to_color_map[box] = 'DarkOrange'
 721 |         else:
 722 |           box_to_color_map[box] = STANDARD_COLORS[
 723 |               classes[i] % len(STANDARD_COLORS)]
 724 | 
 725 | 
 726 |   # # Crop bounding boxes to split images; moved out of this file for OCR.
 727 |  
 728 |   # # Convert coordinates to raw pixels.
 729 |   # t = 0
 730 |   # for box, color in box_to_color_map.items():
 731 |   #   ymin, xmin, ymax, xmax = box  
 732 |   
 733 |   #   img = Image.fromarray(np.uint8(image))
 734 |   #   im_width, im_height = img.size
 735 |   #   new_xmin = int(xmin * im_width)
 736 |   #   new_xmax = int(xmax * im_width)
 737 |   #   new_ymin = int(ymin * im_height)
 738 |   #   new_ymax = int(ymax * im_height)
 739 | 
 740 |   #   img_n = box_to_display_str_map[box][0]
 741 |   #   img_name = img_n.replace(': ','-')
 742 |     
 743 |   #   # Crop the image. Note that PIL (x, y) and NumPy (row, column) coordinate orders are reversed.
 744 |   #   image_corp = image[new_ymin:new_ymax,new_xmin:new_xmax] 
 745 |          
 746 |   #   image_corp = Image.fromarray(np.uint8(image_corp))
 747 |     
 748 |   #   # Save the cropped image to output_path, with the output label in the file name.
 749 |   #   if img_name.find('container_number') >= 0:
 750 |   #     t += 1
 751 |   #     image_corp.save(os.path.join(output_path) + '/' +img_name +'_' + (str(t)+'_') + os.path.basename(image_path))
 752 |  
 753 | 
 754 |   # Draw all boxes onto image.
 755 |   for box, color in box_to_color_map.items():
 756 |     ymin, xmin, ymax, xmax = box
 757 |     if instance_masks is not None:
 758 |       draw_mask_on_image_array(
 759 |           image,
 760 |           box_to_instance_masks_map[box],
 761 |           color=color
 762 |       )
 763 |     if instance_boundaries is not None:
 764 |       draw_mask_on_image_array(
 765 |           image,
 766 |           box_to_instance_boundaries_map[box],
 767 |           color='red',
 768 |           alpha=1.0
 769 |       )
 770 | 
 771 |     draw_bounding_box_on_image_array(
 772 |         image,
 773 |         ymin,
 774 |         xmin,
 775 |         ymax,
 776 |         xmax,
 777 |         color=color,
 778 |         thickness=line_thickness,
 779 |         display_str_list=box_to_display_str_map[box],
 780 |         use_normalized_coordinates=use_normalized_coordinates)
 781 |     if keypoints is not None:
 783 |       draw_keypoints_on_image_array(
 784 |           image,
 785 |           box_to_keypoints_map[box],
 786 |           color=color,
 787 |           radius=line_thickness / 2,
 788 |           use_normalized_coordinates=use_normalized_coordinates)
 789 | 
 790 |   return image, box_to_color_map, box_to_display_str_map
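
The label-building branch of `visualize_boxes_and_labels_on_image_array` can be isolated as a small pure function. This is a sketch for illustration only: `make_display_str` is a hypothetical name, and the one-entry `category_index` below is an assumed example rather than the repository's actual label map.

```python
def make_display_str(class_id, score, category_index,
                     agnostic_mode=False, skip_labels=False,
                     skip_scores=False):
  """Builds the text drawn next to a detection box."""
  display_str = ''
  if not skip_labels and not agnostic_mode:
    if class_id in category_index:
      display_str = str(category_index[class_id]['name'])
    else:
      display_str = 'N/A'
  if not skip_scores:
    percent = '{}%'.format(int(100 * score))
    # Prefix the score with the label only when a label is present.
    display_str = '{}: {}'.format(display_str, percent) if display_str else percent
  return display_str


# Hypothetical label map entry used only for this example.
category_index = {1: {'id': 1, 'name': 'container_number'}}
label = make_display_str(1, 0.75, category_index)
```

In class-agnostic mode the label is dropped and only the percentage remains; an unknown class id falls back to `N/A`.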
 791 | 
 792 | 
 793 | def add_cdf_image_summary(values, name):
 794 |   """Adds a tf.summary.image for a CDF plot of the values.
 795 | 
 796 |   Normalizes `values` such that they sum to 1, plots the cumulative distribution
 797 |   function and creates a tf image summary.
 798 | 
 799 |   Args:
 800 |     values: a 1-D float32 tensor containing the values.
 801 |     name: name for the image summary.
 802 |   """
 803 |   def cdf_plot(values):
 804 |     """Numpy function to plot CDF."""
 805 |     normalized_values = values / np.sum(values)
 806 |     sorted_values = np.sort(normalized_values)
 807 |     cumulative_values = np.cumsum(sorted_values)
 808 |     fraction_of_examples = (np.arange(cumulative_values.size, dtype=np.float32)
 809 |                             / cumulative_values.size)
 810 |     fig = plt.figure(frameon=False)
 811 |     ax = fig.add_subplot(111)
 812 |     ax.plot(fraction_of_examples, cumulative_values)
 813 |     ax.set_ylabel('cumulative normalized values')
 814 |     ax.set_xlabel('fraction of examples')
 815 |     fig.canvas.draw()
 816 |     width, height = fig.get_size_inches() * fig.get_dpi()
 817 |     image = np.frombuffer(fig.canvas.tostring_rgb(), dtype='uint8').reshape(
 818 |         1, int(height), int(width), 3)
 819 |     return image
 820 |   cdf_plot = tf.py_func(cdf_plot, [values], tf.uint8)
 821 |   tf.summary.image(name, cdf_plot)
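
The numeric part of `cdf_plot` (normalize, sort, accumulate) can be reproduced without NumPy or matplotlib. This is a minimal sketch; `cdf_points` is a hypothetical name, and it returns the x and y series that the function above passes to `ax.plot`.

```python
from itertools import accumulate


def cdf_points(values):
  """Returns (fraction_of_examples, cumulative_values) for a CDF plot."""
  total = sum(values)
  # Normalize so the values sum to 1, then sort in ascending order.
  normalized = sorted(v / total for v in values)
  cumulative = list(accumulate(normalized))
  count = len(cumulative)
  fractions = [i / count for i in range(count)]
  return fractions, cumulative


fractions, cumulative = cdf_points([2.0, 1.0, 1.0])
```

Because the values are normalized first, the last cumulative value is always 1 (up to floating-point error), whatever the scale of the input.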
 822 | 
 823 | 
 824 | def add_hist_image_summary(values, bins, name):
 825 |   """Adds a tf.summary.image for a histogram plot of the values.
 826 | 
 827 |   Plots the histogram of values and creates a tf image summary.
 828 | 
 829 |   Args:
 830 |     values: a 1-D float32 tensor containing the values.
 831 |     bins: bin edges which will be directly passed to np.histogram.
 832 |     name: name for the image summary.
 833 |   """
 834 | 
 835 |   def hist_plot(values, bins):
 836 |     """Numpy function to plot hist."""
 837 |     fig = plt.figure(frameon=False)
 838 |     ax = fig.add_subplot(111)
 839 |     y, x = np.histogram(values, bins=bins)
 840 |     ax.plot(x[:-1], y)
 841 |     ax.set_ylabel('count')
 842 |     ax.set_xlabel('value')
 843 |     fig.canvas.draw()
 844 |     width, height = fig.get_size_inches() * fig.get_dpi()
 845 |     image = np.frombuffer(
 846 |         fig.canvas.tostring_rgb(), dtype='uint8').reshape(
 847 |             1, int(height), int(width), 3)
 848 |     return image
 849 |   hist_plot = tf.py_func(hist_plot, [values, bins], tf.uint8)
 850 |   tf.summary.image(name, hist_plot)
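
`hist_plot` plots the counts from `np.histogram` against the left edge of each bin (`x[:-1]`). The binning rule can be sketched in plain Python as below. `histogram_counts` is a hypothetical helper that assumes `edges` is a sorted list of bin edges, and it follows the NumPy convention that every bin is half-open except the last, which includes its right edge.

```python
from bisect import bisect_right


def histogram_counts(values, edges):
  """Counts values per bin, given sorted bin edges."""
  counts = [0] * (len(edges) - 1)
  for v in values:
    if v < edges[0] or v > edges[-1]:
      continue  # Values outside the overall range are ignored.
    # Clamp so that v == edges[-1] lands in the last bin.
    index = min(bisect_right(edges, v) - 1, len(counts) - 1)
    counts[index] += 1
  return counts


edges = [0.0, 0.5, 1.0]
counts = histogram_counts([0.1, 0.5, 0.9, 1.0], edges)
left_edges = edges[:-1]  # The x values used for plotting.
```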
 851 | 
 852 | 
 853 | class EvalMetricOpsVisualization(object):
 854 |   """Abstract base class responsible for visualizations during evaluation.
 855 | 
 856 |   Currently, summary images are not run during evaluation. One way to produce
 857 |   evaluation images in Tensorboard is to provide tf.summary.image strings as
 858 |   `value_ops` in tf.estimator.EstimatorSpec's `eval_metric_ops`. This class is
 859 |   responsible for accruing images (with overlaid detections and groundtruth)
 860 |   and returning a dictionary that can be passed to `eval_metric_ops`.
 861 |   """
 862 |   __metaclass__ = abc.ABCMeta
 863 | 
 864 |   def __init__(self,
 865 |                category_index,
 866 |                max_examples_to_draw=5,
 867 |                max_boxes_to_draw=20,
 868 |                min_score_thresh=0.2,
 869 |                use_normalized_coordinates=True,
 870 |                summary_name_prefix='evaluation_image'):
 871 |     """Creates an EvalMetricOpsVisualization.
 872 | 
 873 |     Args:
 874 |       category_index: A category index (dictionary) produced from a labelmap.
 875 |       max_examples_to_draw: The maximum number of example summaries to produce.
 876 |       max_boxes_to_draw: The maximum number of boxes to draw for detections.
 877 |       min_score_thresh: The minimum score threshold for showing detections.
 878 |       use_normalized_coordinates: Whether to assume boxes and keypoints are in
 879 |         normalized coordinates (as opposed to absolute coordinates).
 880 |         Default is True.
 881 |       summary_name_prefix: A string prefix for each image summary.
 882 |     """
 883 | 
 884 |     self._category_index = category_index
 885 |     self._max_examples_to_draw = max_examples_to_draw
 886 |     self._max_boxes_to_draw = max_boxes_to_draw
 887 |     self._min_score_thresh = min_score_thresh
 888 |     self._use_normalized_coordinates = use_normalized_coordinates
 889 |     self._summary_name_prefix = summary_name_prefix
 890 |     self._images = []
 891 | 
 892 |   def clear(self):
 893 |     self._images = []
 894 | 
 895 |   def add_images(self, images):
 896 |     """Store a list of images, each with shape [1, H, W, C]."""
 897 |     if len(self._images) >= self._max_examples_to_draw:
 898 |       return
 899 | 
 900 |     # Store images and clip list if necessary.
 901 |     self._images.extend(images)
 902 |     if len(self._images) > self._max_examples_to_draw:
 903 |       self._images[self._max_examples_to_draw:] = []
 904 | 
 905 |   def get_estimator_eval_metric_ops(self, eval_dict):
 906 |     """Returns metric ops for use in tf.estimator.EstimatorSpec.
 907 | 
 908 |     Args:
 909 |       eval_dict: A dictionary that holds an image, groundtruth, and detections
 910 |         for a batched example. Note that we use only the first example for
 911 |         visualization. See eval_util.result_dict_for_batched_example() for a
 912 |         convenient method for constructing such a dictionary. The dictionary
 913 |         contains
 914 |         fields.InputDataFields.original_image - [batch_size, H, W, 3] image.
 915 |         fields.InputDataFields.original_image_spatial_shape - [batch_size, 2]
 916 |           tensor containing the size of the original image.
 917 |         fields.InputDataFields.true_image_shape - [batch_size, 3]
 918 |           tensor containing the spatial size of the unpadded original image.
 919 |         fields.InputDataFields.groundtruth_boxes - [batch_size, num_boxes, 4]
 920 |           float32 tensor with groundtruth boxes in range [0.0, 1.0].
 921 |         fields.InputDataFields.groundtruth_classes - [batch_size, num_boxes]
 922 |           int64 tensor with 1-indexed groundtruth classes.
 923 |         fields.InputDataFields.groundtruth_instance_masks - (optional)
 924 |           [batch_size, num_boxes, H, W] int64 tensor with instance masks.
 925 |         fields.DetectionResultFields.detection_boxes - [batch_size,
 926 |           max_num_boxes, 4] float32 tensor with detection boxes in range [0.0,
 927 |           1.0].
 928 |         fields.DetectionResultFields.detection_classes - [batch_size,
 929 |           max_num_boxes] int64 tensor with 1-indexed detection classes.
 930 |         fields.DetectionResultFields.detection_scores - [batch_size,
 931 |           max_num_boxes] float32 tensor with detection scores.
 932 |         fields.DetectionResultFields.detection_masks - (optional) [batch_size,
 933 |           max_num_boxes, H, W] float32 tensor of binarized masks.
 934 |         fields.DetectionResultFields.detection_keypoints - (optional)
 935 |           [batch_size, max_num_boxes, num_keypoints, 2] float32 tensor with
 936 |           keypoints.
 937 | 
 938 |     Returns:
 939 |       A dictionary of image summary names to tuple of (value_op, update_op). The
 940 |       `update_op` is the same for all items in the dictionary, and is
 941 |       responsible for saving a single side-by-side image with detections and
 942 |       groundtruth. Each `value_op` holds the tf.summary.image string for a given
 943 |       image.
 944 |     """
 945 |     if self._max_examples_to_draw == 0:
 946 |       return {}
 947 |     images = self.images_from_evaluation_dict(eval_dict)
 948 | 
 949 |     def get_images():
 950 |       """Returns a list of images, padded to self._max_examples_to_draw."""
 951 |       images = self._images
 952 |       while len(images) < self._max_examples_to_draw:
 953 |         images.append(np.array(0, dtype=np.uint8))
 954 |       self.clear()
 955 |       return images
 956 | 
 957 |     def image_summary_or_default_string(summary_name, image):
 958 |       """Returns image summaries for non-padded elements."""
 959 |       return tf.cond(
 960 |           tf.equal(tf.size(tf.shape(image)), 4),
 961 |           lambda: tf.summary.image(summary_name, image),
 962 |           lambda: tf.constant(''))
 963 | 
 964 |     update_op = tf.py_func(self.add_images, [[images[0]]], [])
 965 |     image_tensors = tf.py_func(
 966 |         get_images, [], [tf.uint8] * self._max_examples_to_draw)
 967 |     eval_metric_ops = {}
 968 |     for i, image in enumerate(image_tensors):
 969 |       summary_name = self._summary_name_prefix + '/' + str(i)
 970 |       value_op = image_summary_or_default_string(summary_name, image)
 971 |       eval_metric_ops[summary_name] = (value_op, update_op)
 972 |     return eval_metric_ops
 973 | 
 974 |   @abc.abstractmethod
 975 |   def images_from_evaluation_dict(self, eval_dict):
 976 |     """Converts evaluation dictionary into a list of image tensors.
 977 | 
 978 |     To be overridden by implementations.
 979 | 
 980 |     Args:
 981 |       eval_dict: A dictionary with all the necessary information for producing
 982 |         visualizations.
 983 | 
 984 |     Returns:
 985 |       A list of [1, H, W, C] uint8 tensors.
 986 |     """
 987 |     raise NotImplementedError
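
The accumulate, pad, and clear cycle that `add_images` and `get_images` implement can be sketched without TensorFlow. `ImageBuffer` is a hypothetical stand-in for the buffering part of `EvalMetricOpsVisualization`. It pads with a caller-supplied placeholder, whereas the class above pads with a rank-0 `np.uint8` array so that padded slots can be told apart from real rank-4 image batches.

```python
class ImageBuffer(object):
  """Keeps at most max_examples_to_draw images between evaluations."""

  def __init__(self, max_examples_to_draw=5):
    self._max = max_examples_to_draw
    self._images = []

  def add_images(self, images):
    """Stores images, ignoring anything beyond the configured maximum."""
    if len(self._images) >= self._max:
      return
    self._images.extend(images)
    del self._images[self._max:]

  def get_images(self, padding=None):
    """Returns exactly max_examples_to_draw items and resets the buffer."""
    missing = self._max - len(self._images)
    images = self._images + [padding] * missing
    self._images = []
    return images


buf = ImageBuffer(max_examples_to_draw=3)
buf.add_images(['a', 'b'])
buf.add_images(['c', 'd'])  # 'd' is clipped.
first = buf.get_images()
second = buf.get_images(padding='pad')
```

Returning a fixed-length list matters in the original because `tf.py_func` is declared with exactly `max_examples_to_draw` outputs.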
 988 | 
 989 | 
 990 | class VisualizeSingleFrameDetections(EvalMetricOpsVisualization):
 991 |   """Class responsible for single-frame object detection visualizations."""
 992 | 
 993 |   def __init__(self,
 994 |                category_index,
 995 |                max_examples_to_draw=5,
 996 |                max_boxes_to_draw=20,
 997 |                min_score_thresh=0.2,
 998 |                use_normalized_coordinates=True,
 999 |                summary_name_prefix='Detections_Left_Groundtruth_Right'):
1000 |     super(VisualizeSingleFrameDetections, self).__init__(
1001 |         category_index=category_index,
1002 |         max_examples_to_draw=max_examples_to_draw,
1003 |         max_boxes_to_draw=max_boxes_to_draw,
1004 |         min_score_thresh=min_score_thresh,
1005 |         use_normalized_coordinates=use_normalized_coordinates,
1006 |         summary_name_prefix=summary_name_prefix)
1007 | 
1008 |   def images_from_evaluation_dict(self, eval_dict):
1009 |     return draw_side_by_side_evaluation_image(
1010 |         eval_dict, self._category_index, self._max_boxes_to_draw,
1011 |         self._min_score_thresh, self._use_normalized_coordinates)
1012 | 


--------------------------------------------------------------------------------