├── LICENSE
├── README.md
├── cont_num.txt
├── create_pascal_tf_record.py
├── data
│   └── container_label_map.pbtxt
├── detection_var_image.py
├── generate_voc_datasets.py
├── image
│   ├── image1.jpg
│   ├── image2.jpg
│   ├── image3.jpg
│   ├── image4.jpg
│   └── image5.jpg
└── utils
    ├── __init__.py
    └── visualization_utils.py
/LICENSE:
--------------------------------------------------------------------------------
1 | GNU GENERAL PUBLIC LICENSE
2 | Version 3, 29 June 2007
3 |
4 | Copyright (C) 2007 Free Software Foundation, Inc.
5 | Everyone is permitted to copy and distribute verbatim copies
6 | of this license document, but changing it is not allowed.
7 |
8 | Preamble
9 |
10 | The GNU General Public License is a free, copyleft license for
11 | software and other kinds of works.
12 |
13 | The licenses for most software and other practical works are designed
14 | to take away your freedom to share and change the works. By contrast,
15 | the GNU General Public License is intended to guarantee your freedom to
16 | share and change all versions of a program--to make sure it remains free
17 | software for all its users. We, the Free Software Foundation, use the
18 | GNU General Public License for most of our software; it applies also to
19 | any other work released this way by its authors. You can apply it to
20 | your programs, too.
21 |
22 | When we speak of free software, we are referring to freedom, not
23 | price. Our General Public Licenses are designed to make sure that you
24 | have the freedom to distribute copies of free software (and charge for
25 | them if you wish), that you receive source code or can get it if you
26 | want it, that you can change the software or use pieces of it in new
27 | free programs, and that you know you can do these things.
28 |
29 | To protect your rights, we need to prevent others from denying you
30 | these rights or asking you to surrender the rights. Therefore, you have
31 | certain responsibilities if you distribute copies of the software, or if
32 | you modify it: responsibilities to respect the freedom of others.
33 |
34 | For example, if you distribute copies of such a program, whether
35 | gratis or for a fee, you must pass on to the recipients the same
36 | freedoms that you received. You must make sure that they, too, receive
37 | or can get the source code. And you must show them these terms so they
38 | know their rights.
39 |
40 | Developers that use the GNU GPL protect your rights with two steps:
41 | (1) assert copyright on the software, and (2) offer you this License
42 | giving you legal permission to copy, distribute and/or modify it.
43 |
44 | For the developers' and authors' protection, the GPL clearly explains
45 | that there is no warranty for this free software. For both users' and
46 | authors' sake, the GPL requires that modified versions be marked as
47 | changed, so that their problems will not be attributed erroneously to
48 | authors of previous versions.
49 |
50 | Some devices are designed to deny users access to install or run
51 | modified versions of the software inside them, although the manufacturer
52 | can do so. This is fundamentally incompatible with the aim of
53 | protecting users' freedom to change the software. The systematic
54 | pattern of such abuse occurs in the area of products for individuals to
55 | use, which is precisely where it is most unacceptable. Therefore, we
56 | have designed this version of the GPL to prohibit the practice for those
57 | products. If such problems arise substantially in other domains, we
58 | stand ready to extend this provision to those domains in future versions
59 | of the GPL, as needed to protect the freedom of users.
60 |
61 | Finally, every program is threatened constantly by software patents.
62 | States should not allow patents to restrict development and use of
63 | software on general-purpose computers, but in those that do, we wish to
64 | avoid the special danger that patents applied to a free program could
65 | make it effectively proprietary. To prevent this, the GPL assures that
66 | patents cannot be used to render the program non-free.
67 |
68 | The precise terms and conditions for copying, distribution and
69 | modification follow.
70 |
71 | TERMS AND CONDITIONS
72 |
73 | 0. Definitions.
74 |
75 | "This License" refers to version 3 of the GNU General Public License.
76 |
77 | "Copyright" also means copyright-like laws that apply to other kinds of
78 | works, such as semiconductor masks.
79 |
80 | "The Program" refers to any copyrightable work licensed under this
81 | License. Each licensee is addressed as "you". "Licensees" and
82 | "recipients" may be individuals or organizations.
83 |
84 | To "modify" a work means to copy from or adapt all or part of the work
85 | in a fashion requiring copyright permission, other than the making of an
86 | exact copy. The resulting work is called a "modified version" of the
87 | earlier work or a work "based on" the earlier work.
88 |
89 | A "covered work" means either the unmodified Program or a work based
90 | on the Program.
91 |
92 | To "propagate" a work means to do anything with it that, without
93 | permission, would make you directly or secondarily liable for
94 | infringement under applicable copyright law, except executing it on a
95 | computer or modifying a private copy. Propagation includes copying,
96 | distribution (with or without modification), making available to the
97 | public, and in some countries other activities as well.
98 |
99 | To "convey" a work means any kind of propagation that enables other
100 | parties to make or receive copies. Mere interaction with a user through
101 | a computer network, with no transfer of a copy, is not conveying.
102 |
103 | An interactive user interface displays "Appropriate Legal Notices"
104 | to the extent that it includes a convenient and prominently visible
105 | feature that (1) displays an appropriate copyright notice, and (2)
106 | tells the user that there is no warranty for the work (except to the
107 | extent that warranties are provided), that licensees may convey the
108 | work under this License, and how to view a copy of this License. If
109 | the interface presents a list of user commands or options, such as a
110 | menu, a prominent item in the list meets this criterion.
111 |
112 | 1. Source Code.
113 |
114 | The "source code" for a work means the preferred form of the work
115 | for making modifications to it. "Object code" means any non-source
116 | form of a work.
117 |
118 | A "Standard Interface" means an interface that either is an official
119 | standard defined by a recognized standards body, or, in the case of
120 | interfaces specified for a particular programming language, one that
121 | is widely used among developers working in that language.
122 |
123 | The "System Libraries" of an executable work include anything, other
124 | than the work as a whole, that (a) is included in the normal form of
125 | packaging a Major Component, but which is not part of that Major
126 | Component, and (b) serves only to enable use of the work with that
127 | Major Component, or to implement a Standard Interface for which an
128 | implementation is available to the public in source code form. A
129 | "Major Component", in this context, means a major essential component
130 | (kernel, window system, and so on) of the specific operating system
131 | (if any) on which the executable work runs, or a compiler used to
132 | produce the work, or an object code interpreter used to run it.
133 |
134 | The "Corresponding Source" for a work in object code form means all
135 | the source code needed to generate, install, and (for an executable
136 | work) run the object code and to modify the work, including scripts to
137 | control those activities. However, it does not include the work's
138 | System Libraries, or general-purpose tools or generally available free
139 | programs which are used unmodified in performing those activities but
140 | which are not part of the work. For example, Corresponding Source
141 | includes interface definition files associated with source files for
142 | the work, and the source code for shared libraries and dynamically
143 | linked subprograms that the work is specifically designed to require,
144 | such as by intimate data communication or control flow between those
145 | subprograms and other parts of the work.
146 |
147 | The Corresponding Source need not include anything that users
148 | can regenerate automatically from other parts of the Corresponding
149 | Source.
150 |
151 | The Corresponding Source for a work in source code form is that
152 | same work.
153 |
154 | 2. Basic Permissions.
155 |
156 | All rights granted under this License are granted for the term of
157 | copyright on the Program, and are irrevocable provided the stated
158 | conditions are met. This License explicitly affirms your unlimited
159 | permission to run the unmodified Program. The output from running a
160 | covered work is covered by this License only if the output, given its
161 | content, constitutes a covered work. This License acknowledges your
162 | rights of fair use or other equivalent, as provided by copyright law.
163 |
164 | You may make, run and propagate covered works that you do not
165 | convey, without conditions so long as your license otherwise remains
166 | in force. You may convey covered works to others for the sole purpose
167 | of having them make modifications exclusively for you, or provide you
168 | with facilities for running those works, provided that you comply with
169 | the terms of this License in conveying all material for which you do
170 | not control copyright. Those thus making or running the covered works
171 | for you must do so exclusively on your behalf, under your direction
172 | and control, on terms that prohibit them from making any copies of
173 | your copyrighted material outside their relationship with you.
174 |
175 | Conveying under any other circumstances is permitted solely under
176 | the conditions stated below. Sublicensing is not allowed; section 10
177 | makes it unnecessary.
178 |
179 | 3. Protecting Users' Legal Rights From Anti-Circumvention Law.
180 |
181 | No covered work shall be deemed part of an effective technological
182 | measure under any applicable law fulfilling obligations under article
183 | 11 of the WIPO copyright treaty adopted on 20 December 1996, or
184 | similar laws prohibiting or restricting circumvention of such
185 | measures.
186 |
187 | When you convey a covered work, you waive any legal power to forbid
188 | circumvention of technological measures to the extent such circumvention
189 | is effected by exercising rights under this License with respect to
190 | the covered work, and you disclaim any intention to limit operation or
191 | modification of the work as a means of enforcing, against the work's
192 | users, your or third parties' legal rights to forbid circumvention of
193 | technological measures.
194 |
195 | 4. Conveying Verbatim Copies.
196 |
197 | You may convey verbatim copies of the Program's source code as you
198 | receive it, in any medium, provided that you conspicuously and
199 | appropriately publish on each copy an appropriate copyright notice;
200 | keep intact all notices stating that this License and any
201 | non-permissive terms added in accord with section 7 apply to the code;
202 | keep intact all notices of the absence of any warranty; and give all
203 | recipients a copy of this License along with the Program.
204 |
205 | You may charge any price or no price for each copy that you convey,
206 | and you may offer support or warranty protection for a fee.
207 |
208 | 5. Conveying Modified Source Versions.
209 |
210 | You may convey a work based on the Program, or the modifications to
211 | produce it from the Program, in the form of source code under the
212 | terms of section 4, provided that you also meet all of these conditions:
213 |
214 | a) The work must carry prominent notices stating that you modified
215 | it, and giving a relevant date.
216 |
217 | b) The work must carry prominent notices stating that it is
218 | released under this License and any conditions added under section
219 | 7. This requirement modifies the requirement in section 4 to
220 | "keep intact all notices".
221 |
222 | c) You must license the entire work, as a whole, under this
223 | License to anyone who comes into possession of a copy. This
224 | License will therefore apply, along with any applicable section 7
225 | additional terms, to the whole of the work, and all its parts,
226 | regardless of how they are packaged. This License gives no
227 | permission to license the work in any other way, but it does not
228 | invalidate such permission if you have separately received it.
229 |
230 | d) If the work has interactive user interfaces, each must display
231 | Appropriate Legal Notices; however, if the Program has interactive
232 | interfaces that do not display Appropriate Legal Notices, your
233 | work need not make them do so.
234 |
235 | A compilation of a covered work with other separate and independent
236 | works, which are not by their nature extensions of the covered work,
237 | and which are not combined with it such as to form a larger program,
238 | in or on a volume of a storage or distribution medium, is called an
239 | "aggregate" if the compilation and its resulting copyright are not
240 | used to limit the access or legal rights of the compilation's users
241 | beyond what the individual works permit. Inclusion of a covered work
242 | in an aggregate does not cause this License to apply to the other
243 | parts of the aggregate.
244 |
245 | 6. Conveying Non-Source Forms.
246 |
247 | You may convey a covered work in object code form under the terms
248 | of sections 4 and 5, provided that you also convey the
249 | machine-readable Corresponding Source under the terms of this License,
250 | in one of these ways:
251 |
252 | a) Convey the object code in, or embodied in, a physical product
253 | (including a physical distribution medium), accompanied by the
254 | Corresponding Source fixed on a durable physical medium
255 | customarily used for software interchange.
256 |
257 | b) Convey the object code in, or embodied in, a physical product
258 | (including a physical distribution medium), accompanied by a
259 | written offer, valid for at least three years and valid for as
260 | long as you offer spare parts or customer support for that product
261 | model, to give anyone who possesses the object code either (1) a
262 | copy of the Corresponding Source for all the software in the
263 | product that is covered by this License, on a durable physical
264 | medium customarily used for software interchange, for a price no
265 | more than your reasonable cost of physically performing this
266 | conveying of source, or (2) access to copy the
267 | Corresponding Source from a network server at no charge.
268 |
269 | c) Convey individual copies of the object code with a copy of the
270 | written offer to provide the Corresponding Source. This
271 | alternative is allowed only occasionally and noncommercially, and
272 | only if you received the object code with such an offer, in accord
273 | with subsection 6b.
274 |
275 | d) Convey the object code by offering access from a designated
276 | place (gratis or for a charge), and offer equivalent access to the
277 | Corresponding Source in the same way through the same place at no
278 | further charge. You need not require recipients to copy the
279 | Corresponding Source along with the object code. If the place to
280 | copy the object code is a network server, the Corresponding Source
281 | may be on a different server (operated by you or a third party)
282 | that supports equivalent copying facilities, provided you maintain
283 | clear directions next to the object code saying where to find the
284 | Corresponding Source. Regardless of what server hosts the
285 | Corresponding Source, you remain obligated to ensure that it is
286 | available for as long as needed to satisfy these requirements.
287 |
288 | e) Convey the object code using peer-to-peer transmission, provided
289 | you inform other peers where the object code and Corresponding
290 | Source of the work are being offered to the general public at no
291 | charge under subsection 6d.
292 |
293 | A separable portion of the object code, whose source code is excluded
294 | from the Corresponding Source as a System Library, need not be
295 | included in conveying the object code work.
296 |
297 | A "User Product" is either (1) a "consumer product", which means any
298 | tangible personal property which is normally used for personal, family,
299 | or household purposes, or (2) anything designed or sold for incorporation
300 | into a dwelling. In determining whether a product is a consumer product,
301 | doubtful cases shall be resolved in favor of coverage. For a particular
302 | product received by a particular user, "normally used" refers to a
303 | typical or common use of that class of product, regardless of the status
304 | of the particular user or of the way in which the particular user
305 | actually uses, or expects or is expected to use, the product. A product
306 | is a consumer product regardless of whether the product has substantial
307 | commercial, industrial or non-consumer uses, unless such uses represent
308 | the only significant mode of use of the product.
309 |
310 | "Installation Information" for a User Product means any methods,
311 | procedures, authorization keys, or other information required to install
312 | and execute modified versions of a covered work in that User Product from
313 | a modified version of its Corresponding Source. The information must
314 | suffice to ensure that the continued functioning of the modified object
315 | code is in no case prevented or interfered with solely because
316 | modification has been made.
317 |
318 | If you convey an object code work under this section in, or with, or
319 | specifically for use in, a User Product, and the conveying occurs as
320 | part of a transaction in which the right of possession and use of the
321 | User Product is transferred to the recipient in perpetuity or for a
322 | fixed term (regardless of how the transaction is characterized), the
323 | Corresponding Source conveyed under this section must be accompanied
324 | by the Installation Information. But this requirement does not apply
325 | if neither you nor any third party retains the ability to install
326 | modified object code on the User Product (for example, the work has
327 | been installed in ROM).
328 |
329 | The requirement to provide Installation Information does not include a
330 | requirement to continue to provide support service, warranty, or updates
331 | for a work that has been modified or installed by the recipient, or for
332 | the User Product in which it has been modified or installed. Access to a
333 | network may be denied when the modification itself materially and
334 | adversely affects the operation of the network or violates the rules and
335 | protocols for communication across the network.
336 |
337 | Corresponding Source conveyed, and Installation Information provided,
338 | in accord with this section must be in a format that is publicly
339 | documented (and with an implementation available to the public in
340 | source code form), and must require no special password or key for
341 | unpacking, reading or copying.
342 |
343 | 7. Additional Terms.
344 |
345 | "Additional permissions" are terms that supplement the terms of this
346 | License by making exceptions from one or more of its conditions.
347 | Additional permissions that are applicable to the entire Program shall
348 | be treated as though they were included in this License, to the extent
349 | that they are valid under applicable law. If additional permissions
350 | apply only to part of the Program, that part may be used separately
351 | under those permissions, but the entire Program remains governed by
352 | this License without regard to the additional permissions.
353 |
354 | When you convey a copy of a covered work, you may at your option
355 | remove any additional permissions from that copy, or from any part of
356 | it. (Additional permissions may be written to require their own
357 | removal in certain cases when you modify the work.) You may place
358 | additional permissions on material, added by you to a covered work,
359 | for which you have or can give appropriate copyright permission.
360 |
361 | Notwithstanding any other provision of this License, for material you
362 | add to a covered work, you may (if authorized by the copyright holders of
363 | that material) supplement the terms of this License with terms:
364 |
365 | a) Disclaiming warranty or limiting liability differently from the
366 | terms of sections 15 and 16 of this License; or
367 |
368 | b) Requiring preservation of specified reasonable legal notices or
369 | author attributions in that material or in the Appropriate Legal
370 | Notices displayed by works containing it; or
371 |
372 | c) Prohibiting misrepresentation of the origin of that material, or
373 | requiring that modified versions of such material be marked in
374 | reasonable ways as different from the original version; or
375 |
376 | d) Limiting the use for publicity purposes of names of licensors or
377 | authors of the material; or
378 |
379 | e) Declining to grant rights under trademark law for use of some
380 | trade names, trademarks, or service marks; or
381 |
382 | f) Requiring indemnification of licensors and authors of that
383 | material by anyone who conveys the material (or modified versions of
384 | it) with contractual assumptions of liability to the recipient, for
385 | any liability that these contractual assumptions directly impose on
386 | those licensors and authors.
387 |
388 | All other non-permissive additional terms are considered "further
389 | restrictions" within the meaning of section 10. If the Program as you
390 | received it, or any part of it, contains a notice stating that it is
391 | governed by this License along with a term that is a further
392 | restriction, you may remove that term. If a license document contains
393 | a further restriction but permits relicensing or conveying under this
394 | License, you may add to a covered work material governed by the terms
395 | of that license document, provided that the further restriction does
396 | not survive such relicensing or conveying.
397 |
398 | If you add terms to a covered work in accord with this section, you
399 | must place, in the relevant source files, a statement of the
400 | additional terms that apply to those files, or a notice indicating
401 | where to find the applicable terms.
402 |
403 | Additional terms, permissive or non-permissive, may be stated in the
404 | form of a separately written license, or stated as exceptions;
405 | the above requirements apply either way.
406 |
407 | 8. Termination.
408 |
409 | You may not propagate or modify a covered work except as expressly
410 | provided under this License. Any attempt otherwise to propagate or
411 | modify it is void, and will automatically terminate your rights under
412 | this License (including any patent licenses granted under the third
413 | paragraph of section 11).
414 |
415 | However, if you cease all violation of this License, then your
416 | license from a particular copyright holder is reinstated (a)
417 | provisionally, unless and until the copyright holder explicitly and
418 | finally terminates your license, and (b) permanently, if the copyright
419 | holder fails to notify you of the violation by some reasonable means
420 | prior to 60 days after the cessation.
421 |
422 | Moreover, your license from a particular copyright holder is
423 | reinstated permanently if the copyright holder notifies you of the
424 | violation by some reasonable means, this is the first time you have
425 | received notice of violation of this License (for any work) from that
426 | copyright holder, and you cure the violation prior to 30 days after
427 | your receipt of the notice.
428 |
429 | Termination of your rights under this section does not terminate the
430 | licenses of parties who have received copies or rights from you under
431 | this License. If your rights have been terminated and not permanently
432 | reinstated, you do not qualify to receive new licenses for the same
433 | material under section 10.
434 |
435 | 9. Acceptance Not Required for Having Copies.
436 |
437 | You are not required to accept this License in order to receive or
438 | run a copy of the Program. Ancillary propagation of a covered work
439 | occurring solely as a consequence of using peer-to-peer transmission
440 | to receive a copy likewise does not require acceptance. However,
441 | nothing other than this License grants you permission to propagate or
442 | modify any covered work. These actions infringe copyright if you do
443 | not accept this License. Therefore, by modifying or propagating a
444 | covered work, you indicate your acceptance of this License to do so.
445 |
446 | 10. Automatic Licensing of Downstream Recipients.
447 |
448 | Each time you convey a covered work, the recipient automatically
449 | receives a license from the original licensors, to run, modify and
450 | propagate that work, subject to this License. You are not responsible
451 | for enforcing compliance by third parties with this License.
452 |
453 | An "entity transaction" is a transaction transferring control of an
454 | organization, or substantially all assets of one, or subdividing an
455 | organization, or merging organizations. If propagation of a covered
456 | work results from an entity transaction, each party to that
457 | transaction who receives a copy of the work also receives whatever
458 | licenses to the work the party's predecessor in interest had or could
459 | give under the previous paragraph, plus a right to possession of the
460 | Corresponding Source of the work from the predecessor in interest, if
461 | the predecessor has it or can get it with reasonable efforts.
462 |
463 | You may not impose any further restrictions on the exercise of the
464 | rights granted or affirmed under this License. For example, you may
465 | not impose a license fee, royalty, or other charge for exercise of
466 | rights granted under this License, and you may not initiate litigation
467 | (including a cross-claim or counterclaim in a lawsuit) alleging that
468 | any patent claim is infringed by making, using, selling, offering for
469 | sale, or importing the Program or any portion of it.
470 |
471 | 11. Patents.
472 |
473 | A "contributor" is a copyright holder who authorizes use under this
474 | License of the Program or a work on which the Program is based. The
475 | work thus licensed is called the contributor's "contributor version".
476 |
477 | A contributor's "essential patent claims" are all patent claims
478 | owned or controlled by the contributor, whether already acquired or
479 | hereafter acquired, that would be infringed by some manner, permitted
480 | by this License, of making, using, or selling its contributor version,
481 | but do not include claims that would be infringed only as a
482 | consequence of further modification of the contributor version. For
483 | purposes of this definition, "control" includes the right to grant
484 | patent sublicenses in a manner consistent with the requirements of
485 | this License.
486 |
487 | Each contributor grants you a non-exclusive, worldwide, royalty-free
488 | patent license under the contributor's essential patent claims, to
489 | make, use, sell, offer for sale, import and otherwise run, modify and
490 | propagate the contents of its contributor version.
491 |
492 | In the following three paragraphs, a "patent license" is any express
493 | agreement or commitment, however denominated, not to enforce a patent
494 | (such as an express permission to practice a patent or covenant not to
495 | sue for patent infringement). To "grant" such a patent license to a
496 | party means to make such an agreement or commitment not to enforce a
497 | patent against the party.
498 |
499 | If you convey a covered work, knowingly relying on a patent license,
500 | and the Corresponding Source of the work is not available for anyone
501 | to copy, free of charge and under the terms of this License, through a
502 | publicly available network server or other readily accessible means,
503 | then you must either (1) cause the Corresponding Source to be so
504 | available, or (2) arrange to deprive yourself of the benefit of the
505 | patent license for this particular work, or (3) arrange, in a manner
506 | consistent with the requirements of this License, to extend the patent
507 | license to downstream recipients. "Knowingly relying" means you have
508 | actual knowledge that, but for the patent license, your conveying the
509 | covered work in a country, or your recipient's use of the covered work
510 | in a country, would infringe one or more identifiable patents in that
511 | country that you have reason to believe are valid.
512 |
513 | If, pursuant to or in connection with a single transaction or
514 | arrangement, you convey, or propagate by procuring conveyance of, a
515 | covered work, and grant a patent license to some of the parties
516 | receiving the covered work authorizing them to use, propagate, modify
517 | or convey a specific copy of the covered work, then the patent license
518 | you grant is automatically extended to all recipients of the covered
519 | work and works based on it.
520 |
521 | A patent license is "discriminatory" if it does not include within
522 | the scope of its coverage, prohibits the exercise of, or is
523 | conditioned on the non-exercise of one or more of the rights that are
524 | specifically granted under this License. You may not convey a covered
525 | work if you are a party to an arrangement with a third party that is
526 | in the business of distributing software, under which you make payment
527 | to the third party based on the extent of your activity of conveying
528 | the work, and under which the third party grants, to any of the
529 | parties who would receive the covered work from you, a discriminatory
530 | patent license (a) in connection with copies of the covered work
531 | conveyed by you (or copies made from those copies), or (b) primarily
532 | for and in connection with specific products or compilations that
533 | contain the covered work, unless you entered into that arrangement,
534 | or that patent license was granted, prior to 28 March 2007.
535 |
536 | Nothing in this License shall be construed as excluding or limiting
537 | any implied license or other defenses to infringement that may
538 | otherwise be available to you under applicable patent law.
539 |
540 | 12. No Surrender of Others' Freedom.
541 |
542 | If conditions are imposed on you (whether by court order, agreement or
543 | otherwise) that contradict the conditions of this License, they do not
544 | excuse you from the conditions of this License. If you cannot convey a
545 | covered work so as to satisfy simultaneously your obligations under this
546 | License and any other pertinent obligations, then as a consequence you may
547 | not convey it at all. For example, if you agree to terms that obligate you
548 | to collect a royalty for further conveying from those to whom you convey
549 | the Program, the only way you could satisfy both those terms and this
550 | License would be to refrain entirely from conveying the Program.
551 |
552 | 13. Use with the GNU Affero General Public License.
553 |
554 | Notwithstanding any other provision of this License, you have
555 | permission to link or combine any covered work with a work licensed
556 | under version 3 of the GNU Affero General Public License into a single
557 | combined work, and to convey the resulting work. The terms of this
558 | License will continue to apply to the part which is the covered work,
559 | but the special requirements of the GNU Affero General Public License,
560 | section 13, concerning interaction through a network will apply to the
561 | combination as such.
562 |
563 | 14. Revised Versions of this License.
564 |
565 | The Free Software Foundation may publish revised and/or new versions of
566 | the GNU General Public License from time to time. Such new versions will
567 | be similar in spirit to the present version, but may differ in detail to
568 | address new problems or concerns.
569 |
570 | Each version is given a distinguishing version number. If the
571 | Program specifies that a certain numbered version of the GNU General
572 | Public License "or any later version" applies to it, you have the
573 | option of following the terms and conditions either of that numbered
574 | version or of any later version published by the Free Software
575 | Foundation. If the Program does not specify a version number of the
576 | GNU General Public License, you may choose any version ever published
577 | by the Free Software Foundation.
578 |
579 | If the Program specifies that a proxy can decide which future
580 | versions of the GNU General Public License can be used, that proxy's
581 | public statement of acceptance of a version permanently authorizes you
582 | to choose that version for the Program.
583 |
584 | Later license versions may give you additional or different
585 | permissions. However, no additional obligations are imposed on any
586 | author or copyright holder as a result of your choosing to follow a
587 | later version.
588 |
589 | 15. Disclaimer of Warranty.
590 |
591 | THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY
592 | APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT
593 | HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY
594 | OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO,
595 | THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
596 | PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM
597 | IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF
598 | ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
599 |
600 | 16. Limitation of Liability.
601 |
602 | IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
603 | WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS
604 | THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY
605 | GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE
606 | USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF
607 | DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD
608 | PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS),
609 | EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF
610 | SUCH DAMAGES.
611 |
612 | 17. Interpretation of Sections 15 and 16.
613 |
614 | If the disclaimer of warranty and limitation of liability provided
615 | above cannot be given local legal effect according to their terms,
616 | reviewing courts shall apply local law that most closely approximates
617 | an absolute waiver of all civil liability in connection with the
618 | Program, unless a warranty or assumption of liability accompanies a
619 | copy of the Program in return for a fee.
620 |
621 | END OF TERMS AND CONDITIONS
622 |
623 | How to Apply These Terms to Your New Programs
624 |
625 | If you develop a new program, and you want it to be of the greatest
626 | possible use to the public, the best way to achieve this is to make it
627 | free software which everyone can redistribute and change under these terms.
628 |
629 | To do so, attach the following notices to the program. It is safest
630 | to attach them to the start of each source file to most effectively
631 | state the exclusion of warranty; and each file should have at least
632 | the "copyright" line and a pointer to where the full notice is found.
633 |
634 |
635 | Copyright (C) <year> <name of author>
636 |
637 | This program is free software: you can redistribute it and/or modify
638 | it under the terms of the GNU General Public License as published by
639 | the Free Software Foundation, either version 3 of the License, or
640 | (at your option) any later version.
641 |
642 | This program is distributed in the hope that it will be useful,
643 | but WITHOUT ANY WARRANTY; without even the implied warranty of
644 | MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
645 | GNU General Public License for more details.
646 |
647 | You should have received a copy of the GNU General Public License
648 | along with this program. If not, see .
649 |
650 | Also add information on how to contact you by electronic and paper mail.
651 |
652 | If the program does terminal interaction, make it output a short
653 | notice like this when it starts in an interactive mode:
654 |
655 | <program> Copyright (C) <year> <name of author>
656 | This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
657 | This is free software, and you are welcome to redistribute it
658 | under certain conditions; type `show c' for details.
659 |
660 | The hypothetical commands `show w' and `show c' should show the appropriate
661 | parts of the General Public License. Of course, your program's commands
662 | might be different; for a GUI interface, you would use an "about box".
663 |
664 | You should also get your employer (if you work as a programmer) or school,
665 | if any, to sign a "copyright disclaimer" for the program, if necessary.
666 | For more information on this, and how to apply and follow the GNU GPL, see
667 | .
668 |
669 | The GNU General Public License does not permit incorporating your program
670 | into proprietary programs. If your program is a subroutine library, you
671 | may consider it more useful to permit linking proprietary applications with
672 | the library. If this is what you want to do, use the GNU Lesser General
673 | Public License instead of this License. But first, please read
674 | .
675 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Container detection and container number OCR using Tensorflow Object Detection API and Tesseract
2 |
3 | Container detection and container number OCR was a specific project requirement. Using the [Tensorflow Object Detection API](https://github.com/tensorflow/models/tree/master/research/object_detection) and [Tesseract](https://github.com/tesseract-ocr/tesseract) is one of the quickest and simplest ways to verify its feasibility.
4 |
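5 | >More than two years ago, at my former company, there was a concrete project requirement: detect and count containers, record each container's number through OCR, and exchange that data with other business systems to meet specific needs. The Tensorflow Object Detection API had just been released, so I dropped the YOLO and SSD options and decided to build a demo with TF as a proof of concept (POC). For the reasoning behind the requirement and the pipeline design, see this article: [Container detection and container number OCR](https://lonelygo.github.io/2019-01-20-container-detection/).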
6 |
7 | ## Usage
8 |
9 | ### Installing the Tensorflow Object Detection API
10 |
11 | For installation details, see the official [instructions](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/installation.md).
12 |
13 | ### Environment and dependencies
14 |
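15 | My environment: macOS 10.14.2, Python 3.6.8, TF 1.12.
16 | In addition to the dependencies required by the Tensorflow Object Detection API, the following are needed:
17 | tesseract
18 | pytesseract
19 | For installation and usage details, please search online.
20 | In `visualization_utils.py`: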
21 |
22 | ``` python
23 | import matplotlib; matplotlib.use('Agg')
24 | ```
25 |
26 | The Agg backend did not work in my environment and I did not bother to debug it, so I changed this line.
27 |
28 | ### Preparing the dataset
29 |
30 | Annotate the images with [LabelImg](https://github.com/tzutalin/labelImg), following the Pascal VOC dataset format.
31 | After annotation, use `generate_voc_datasets.py` to split the data into train, val, and test sets in whatever proportions you like.
32 | Once the data is split, use `create_pascal_tf_record.py` to convert each set into a TFRecord file that TF can read (the original script is provided officially, in `/object_detection/dataset_tools/`).
33 | For more on data preparation, see these [instructions](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/using_your_own_dataset.md).
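The two-stage split can be sketched as follows. This is a minimal illustration, not the repository script itself: `split_dataset`, its `seed` parameter, and the sample names are hypothetical, and it shuffles a copy of the list instead of sampling indices as `generate_voc_datasets.py` does. The 0.8 ratios are the defaults used in that script.

```python
import random


def split_dataset(names, train_val_ratio=0.8, train_ratio=0.8, seed=None):
    """Split sample names into train, val and test lists.

    First carve train+val off the full set, then carve train off train+val.
    """
    rng = random.Random(seed)
    shuffled = list(names)          # copy, so the caller's list is untouched
    rng.shuffle(shuffled)
    n_train_val = int(len(shuffled) * train_val_ratio)
    n_train = int(n_train_val * train_ratio)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train_val]
    test = shuffled[n_train_val:]
    return train, val, test


# Hypothetical annotation base names, e.g. 00000.xml ... 00099.xml.
names = ["{:05d}".format(i) for i in range(100)]
train, val, test = split_dataset(names, seed=0)
print(len(train), len(val), len(test))
```

Each list can then be written one name per line to `train.txt`, `val.txt`, and `test.txt` under `ImageSets/Main`, which is where `create_pascal_tf_record.py` looks for them.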
34 |
35 | ### Training
36 |
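37 | Follow the [official guide for running locally](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/running_locally.md) and train on your own machine with `model_main.py` from the official code base (training and evaluation used to be two separate scripts; the current version needs only this one file).
38 | Follow the [official guide for Google Cloud ML Engine](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/running_on_cloud.md) to train with TPUs on Google Cloud ML Engine. Pricing is described [here](https://cloud.google.com/ml-engine/docs/tensorflow/pricing?hl=zh-CN); choosing the "competitive" (preemptible) mode is much cheaper.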
39 |
40 | ### Validation
41 |
42 | For a quick check, use `object_detection_tutorial.ipynb` from the official code. `detection_var_image.py` in this repo is largely based on that notebook.
43 | Adjust the following settings to match your own setup:
44 |
45 | ``` python
46 |
47 | MODEL_NAME = 'ssd_mobilenet_v1_coco_2017_11_17'
48 | PATH_TO_FROZEN_GRAPH = MODEL_NAME + '/frozen_inference_graph.pb'
49 | # List of the strings that is used to add correct label for each box.
50 | PATH_TO_LABELS = os.path.join('data', 'container_label_map.pbtxt')
51 |
52 | TEST_IMAGE_PATHS = [os.path.join(PATH_TO_TEST_IMAGES_DIR, 'image{}.jpg'.format(i)) for i in range(1, 4)]
53 |
54 | lang = 'cont41'
55 |
56 | ```
57 |
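58 | In `lang = 'cont41'`, `cont41` is the name of the language file used by Tesseract. If you have not yet trained your own language file, remove everything except `eng` from `lang_use = 'eng+'+lang+'+letsgodigital+snum+eng_f'` so that recognition uses the language file that ships with the default Tesseract installation.
59 | The returned `image_label` is a nested list that looks like this: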
60 |
61 | ``` python
62 |
63 | [{'image1': [{'lable': 'container_number', 'actual': '100%', 'cont_num': 'TCLU § 148575 3\n45G1', 'image_corp_name': 'image1_1_container_number'}]}, {'image2': [{'lable': 'container_number', 'actual': '99%', 'cont_num': 'TRNU816699 4 |\n45G1', 'image_corp_name': 'image2_1_container_number'}, {'lable': 'container_number', 'actual': '99%', 'cont_num': 'TCNU89092898\n4561', 'image_corp_name': 'image2_2_container_number'}, {'lable': 'container_number', 'actual': '99%', 'cont_num': 'MSKUY 86801264\n4561', 'image_corp_name': 'image2_3_container_number'}]}]
64 |
65 | ```
66 |
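67 | Each list element is a dictionary in which:
68 | the `key` is the name of the input image;
69 | the `value` is a list whose elements are dictionaries with four keys: the label (`lable`), the confidence (`actual`), the OCR result (`cont_num`), and the name of the cropped container-number image that was written out (`image_corp_name`). The length of the list is the number of container numbers found in the image.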
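A short sketch of walking this structure may help. It uses a shortened copy of the sample output above; the `flatten` helper is hypothetical and not part of the repository.

```python
# Shortened copy of the sample `image_label` output shown above.
image_label = [
    {'image1': [
        {'lable': 'container_number', 'actual': '100%',
         'cont_num': 'TCLU § 148575 3\n45G1',
         'image_corp_name': 'image1_1_container_number'},
    ]},
    {'image2': [
        {'lable': 'container_number', 'actual': '99%',
         'cont_num': 'TRNU816699 4 |\n45G1',
         'image_corp_name': 'image2_1_container_number'},
        {'lable': 'container_number', 'actual': '99%',
         'cont_num': 'TCNU89092898\n4561',
         'image_corp_name': 'image2_2_container_number'},
    ]},
]


def flatten(image_label):
    """Turn the nested list into one (image, label, confidence, text) row per detection."""
    rows = []
    for entry in image_label:                    # one dict per input image
        for image_name, detections in entry.items():
            for det in detections:               # one dict per container number found
                text = det['cont_num'].replace('\n', ' ')
                rows.append((image_name, det['lable'], det['actual'], text))
    return rows


rows = flatten(image_label)
for row in rows:
    print(row)
```

Note that the key names `lable` and `image_corp_name` are spelled exactly as the script emits them.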
70 |
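71 | This structure was chosen so that, if a web front end is added later, a simple Flask server can return all detection results as a single JSON string. For a demo there is no need to deploy a separate TensorFlow Serving back end.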
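As a rough illustration of that idea, the nested list serializes to JSON with the standard library alone. The snippet below is a sketch under that assumption and does not include any web server code. `ensure_ascii=False` is an optional choice that keeps stray OCR characters such as `§` readable in the payload.

```python
import json

# One entry copied from the sample output above.
image_label = [
    {'image1': [
        {'lable': 'container_number', 'actual': '100%',
         'cont_num': 'TCLU § 148575 3\n45G1',
         'image_corp_name': 'image1_1_container_number'},
    ]},
]

# Serialize once; a web handler could send this string as the response body.
payload = json.dumps(image_label, ensure_ascii=False)

# A client gets the same structure back after parsing.
restored = json.loads(payload)
print(restored[0]['image1'][0]['cont_num'])
```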
72 |
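73 | For each input image, besides the JSON output above, the script also writes:
74 | the image with bounding boxes and labels drawn on it;
75 | a cropped image for every detected container number, plus the OpenCV-preprocessed version of each crop that is passed to Tesseract. Comparing these images with the OCR results suggests how to adjust the preprocessing approach and its parameters.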
76 |
77 | #### Demo
78 |
79 | The `image` folder contains 5 test images. The results are in `cont_num.txt`; an excerpt follows:
80 |
81 | | Image name | OCR result | Actual |
82 | |:------:|:------:|:----:|
83 | | image1_1_container_number_100% | TCLU § 148575 3 45G1 | TCLU 148575 3 45G1 |
84 | |image2_1_container_number_99% | TRNU816699 4 \| 45G1 | TRLU 818699 0 45G1 |
85 | | image2_2_container_number_99% | TCNU89092898 4561 | TCNU 869248 8 45G1 |
86 | | image2_3_container_number_99% | MSKUY 86801264 4561 | MSKU 868012 6 4561 |
87 | | image3_1_container_number_99% | x L BOUL 871489 7 \| 221 | BMOU 871489 7 22R1 |
88 | | image3_2_container_number_99% | FCIU [599867 (0 22G1 | FCIU 599887 0 22G1 |
89 |
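90 | As the table shows, overall OCR accuracy is not high. This matches the estimate I made in [Container detection and container number OCR](https://lonelygo.github.io/2019-01-20-container-detection/) that accuracy would not exceed 80% (easy to say in hindsight, but it was my genuine prediction when I decided to run this validation). Accuracy can still be improved, and the following directions are worth trying: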
91 |
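92 | - Most of the images used to train Tesseract are close to `image1.jpg` in quality, so the training set should be adjusted to better match the image quality seen in real deployments;
93 | - In deployment, keep image quality as high as possible and collect images from on-site use;
94 | - Once enough images have been collected, switch the OCR engine to a deep-learning-based one;
95 | - Improve the image preprocessing applied before OCR. In fact, after switching to other preprocessing strategies I got better results than those shown above.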
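To track whether such changes help, the excerpt table can be scored automatically. The sketch below is one hypothetical way to do it and is not part of the repository: it strips everything except letters and digits from both strings and counts exact matches.

```python
import re

# (OCR result, actual) pairs copied from the excerpt table above.
PAIRS = [
    ("TCLU § 148575 3 45G1", "TCLU 148575 3 45G1"),
    ("TRNU816699 4 | 45G1", "TRLU 818699 0 45G1"),
    ("TCNU89092898 4561", "TCNU 869248 8 45G1"),
    ("MSKUY 86801264 4561", "MSKU 868012 6 4561"),
    ("x L BOUL 871489 7 | 221", "BMOU 871489 7 22R1"),
    ("FCIU [599867 (0 22G1", "FCIU 599887 0 22G1"),
]


def normalize(text):
    """Upper-case the text and drop everything except A-Z and 0-9."""
    return re.sub(r"[^A-Z0-9]", "", text.upper())


def exact_matches(pairs):
    """Count pairs whose normalized OCR text equals the normalized ground truth."""
    return sum(normalize(ocr) == normalize(actual) for ocr, actual in pairs)


matched = exact_matches(PAIRS)
print("{} of {} container numbers read exactly".format(matched, len(PAIRS)))
```

Exact matching is deliberately strict. A per-character measure would give partial credit, but for container numbers a single wrong character makes the reading unusable.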
96 |
97 | The output images for `image1.jpg` are as follows:
98 | 
99 |
100 | 
101 |
102 | 
103 |
104 | #### To Do
105 |
106 | - [ ] Add a demo version that runs detection on a video stream
107 | - [ ] Add a simple Flask web page for uploading images and displaying the results
108 |
109 | #### References
110 |
111 | [Tensorflow Object Detection API](https://github.com/tensorflow/models/tree/master/research/object_detection)
112 |
113 | [Tensorflow detection model zoo](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md)
114 |
--------------------------------------------------------------------------------
/cont_num.txt:
--------------------------------------------------------------------------------
1 | image1_1_container_number_100%
2 | TCLU § 148575 3
3 | 45G1
4 | image2_1_container_number_99%
5 | TRNU816699 4 |
6 | 45G1
7 | image2_2_container_number_99%
8 | TCNU89092898
9 | 4561
10 | image2_3_container_number_99%
11 | MSKUY 86801264
12 | 4561
13 | image3_1_container_number_99%
14 | x L
15 | BOUL 871489 7 |
16 | 221
17 | image3_2_container_number_99%
18 | FCIU [599867 (0
19 | 22G1
20 | image4_1_container_number_e_99%
21 | WH LU 555149
22 | CSU
23 | image5_1_container_number_99%
24 | 5421 357770 4
25 | 2261
26 | image5_2_container_number_99%
27 | 1
28 | BSU247709
29 | | 2221
30 | image5_3_container_number_99%
31 | TRHU | 395563
32 | 2261
33 | image5_4_container_number_99%
34 | TRUU20275643
35 | 221
36 |
--------------------------------------------------------------------------------
/create_pascal_tf_record.py:
--------------------------------------------------------------------------------
1 | # Copyright 2017 The TensorFlow Authors. All Rights Reserved.
2 | #
3 | # Licensed under the Apache License, Version 2.0 (the "License");
4 | # you may not use this file except in compliance with the License.
5 | # You may obtain a copy of the License at
6 | #
7 | # http://www.apache.org/licenses/LICENSE-2.0
8 | #
9 | # Unless required by applicable law or agreed to in writing, software
10 | # distributed under the License is distributed on an "AS IS" BASIS,
11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12 | # See the License for the specific language governing permissions and
13 | # limitations under the License.
14 | # ==============================================================================
15 |
16 | r"""Convert raw PASCAL dataset to TFRecord for object_detection.
17 |
18 | Example usage:
19 | python object_detection/dataset_tools/create_pascal_tf_record.py \
20 | --data_dir=/home/user/VOCdevkit \
21 | --year=VOC2012 \
22 | --output_path=/home/user/pascal.record
23 | """
24 | from __future__ import absolute_import
25 | from __future__ import division
26 | from __future__ import print_function
27 |
28 | import hashlib
29 | import io
30 | import logging
31 | import os
32 |
33 | from lxml import etree
34 | import PIL.Image
35 | import tensorflow as tf
36 |
37 | from object_detection.utils import dataset_util
38 | from object_detection.utils import label_map_util
39 |
40 |
41 | flags = tf.app.flags
42 | flags.DEFINE_string('data_dir', '', 'Root directory to raw PASCAL VOC dataset.')
43 | flags.DEFINE_string('set', 'val', 'Convert training set, validation set or '
44 | 'merged set.')
45 | flags.DEFINE_string('annotations_dir', 'Annotations',
46 | '(Relative) path to annotations directory.')
47 | flags.DEFINE_string('year', 'cont_train', 'Desired challenge year.')
48 | flags.DEFINE_string('output_path', '', 'Path to output TFRecord')
49 | flags.DEFINE_string('label_map_path', '',
50 | 'Path to label map proto')
51 | flags.DEFINE_boolean('ignore_difficult_instances', False, 'Whether to ignore '
52 | 'difficult instances')
53 | FLAGS = flags.FLAGS
54 |
55 | SETS = ['train', 'val', 'trainval', 'test']
56 | YEARS = ['cont_train', 'VOC2012', 'merged']
57 |
58 |
59 | def dict_to_tf_example(data,
60 |                        dataset_directory,
61 |                        label_map_dict,
62 |                        ignore_difficult_instances=False,
63 |                        image_subdirectory='JPEGImages'):
64 |   """Convert XML derived dict to tf.Example proto.
65 |
66 |   Notice that this function normalizes the bounding box coordinates provided
67 |   by the raw data.
68 |
69 |   Args:
70 |     data: dict holding PASCAL XML fields for a single image (obtained by
71 |       running dataset_util.recursive_parse_xml_to_dict)
72 |     dataset_directory: Path to root directory holding PASCAL dataset
73 |     label_map_dict: A map from string label names to integers ids.
74 |     ignore_difficult_instances: Whether to skip difficult instances in the
75 |       dataset (default: False).
76 |     image_subdirectory: String specifying subdirectory within the
77 |       PASCAL dataset directory holding the actual image data.
78 |
79 |   Returns:
80 |     example: The converted tf.Example.
81 |
82 |   Raises:
83 |     ValueError: if the image pointed to by data['filename'] is not a valid JPEG
84 |   """
85 |   img_path = os.path.join('cont_train', image_subdirectory, data['filename'])  # data['folder'] gave the wrong path here (reason unknown), so the folder is hard-coded.
86 |   full_path = os.path.join(dataset_directory, img_path)
87 |   with tf.gfile.GFile(full_path, 'rb') as fid:
88 |     encoded_jpg = fid.read()
89 |   encoded_jpg_io = io.BytesIO(encoded_jpg)
90 |   image = PIL.Image.open(encoded_jpg_io)
91 |   if image.format != 'JPEG':
92 |     raise ValueError('Image format not JPEG')
93 |   key = hashlib.sha256(encoded_jpg).hexdigest()
94 |
95 |   width = int(data['size']['width'])
96 |   height = int(data['size']['height'])
97 |
98 |   xmin = []
99 |   ymin = []
100 |   xmax = []
101 |   ymax = []
102 |   classes = []
103 |   classes_text = []
104 |   truncated = []
105 |   poses = []
106 |   difficult_obj = []
107 |   if 'object' in data:
108 |     for obj in data['object']:
109 |       difficult = bool(int(obj['difficult']))
110 |       if ignore_difficult_instances and difficult:
111 |         continue
112 |
113 |       difficult_obj.append(int(difficult))
114 |
115 |       xmin.append(float(obj['bndbox']['xmin']) / width)
116 |       ymin.append(float(obj['bndbox']['ymin']) / height)
117 |       xmax.append(float(obj['bndbox']['xmax']) / width)
118 |       ymax.append(float(obj['bndbox']['ymax']) / height)
119 |       classes_text.append(obj['name'].encode('utf8'))
120 |       classes.append(label_map_dict[obj['name']])
121 |       truncated.append(int(obj['truncated']))
122 |       poses.append(obj['pose'].encode('utf8'))
123 |
124 |   example = tf.train.Example(features=tf.train.Features(feature={
125 |       'image/height': dataset_util.int64_feature(height),
126 |       'image/width': dataset_util.int64_feature(width),
127 |       'image/filename': dataset_util.bytes_feature(
128 |           data['filename'].encode('utf8')),
129 |       'image/source_id': dataset_util.bytes_feature(
130 |           data['filename'].encode('utf8')),
131 |       'image/key/sha256': dataset_util.bytes_feature(key.encode('utf8')),
132 |       'image/encoded': dataset_util.bytes_feature(encoded_jpg),
133 |       'image/format': dataset_util.bytes_feature('jpeg'.encode('utf8')),
134 |       'image/object/bbox/xmin': dataset_util.float_list_feature(xmin),
135 |       'image/object/bbox/xmax': dataset_util.float_list_feature(xmax),
136 |       'image/object/bbox/ymin': dataset_util.float_list_feature(ymin),
137 |       'image/object/bbox/ymax': dataset_util.float_list_feature(ymax),
138 |       'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
139 |       'image/object/class/label': dataset_util.int64_list_feature(classes),
140 |       'image/object/difficult': dataset_util.int64_list_feature(difficult_obj),
141 |       'image/object/truncated': dataset_util.int64_list_feature(truncated),
142 |       'image/object/view': dataset_util.bytes_list_feature(poses),
143 |   }))
144 |   return example
145 |
146 |
147 | def main(_):
148 |   if FLAGS.set not in SETS:
149 |     raise ValueError('set must be in : {}'.format(SETS))
150 |   if FLAGS.year not in YEARS:
151 |     raise ValueError('year must be in : {}'.format(YEARS))
152 |
153 |   data_dir = FLAGS.data_dir
154 |   years = ['cont_train', 'VOC2012']
155 |   if FLAGS.year != 'merged':
156 |     years = [FLAGS.year]
157 |
158 |   writer = tf.python_io.TFRecordWriter(FLAGS.output_path)
159 |
160 |   label_map_dict = label_map_util.get_label_map_dict(FLAGS.label_map_path)
161 |
162 |   for year in years:
163 |     logging.info('Reading from PASCAL %s dataset.', year)
164 |     examples_path = os.path.join(data_dir, year, 'ImageSets', 'Main', FLAGS.set + '.txt')
165 |     annotations_dir = os.path.join(data_dir, year, FLAGS.annotations_dir)
166 |     examples_list = dataset_util.read_examples_list(examples_path)
167 |     for idx, example in enumerate(examples_list):
168 |       if idx % 100 == 0:
169 |         logging.info('On image %d of %d', idx, len(examples_list))
170 |       path = os.path.join(annotations_dir, example + '.xml')
171 |       with tf.gfile.GFile(path, 'r') as fid:
172 |         xml_str = fid.read()
173 |       xml = etree.fromstring(xml_str)
174 |       data = dataset_util.recursive_parse_xml_to_dict(xml)['annotation']
175 |
176 |       tf_example = dict_to_tf_example(data, FLAGS.data_dir, label_map_dict,
177 |                                       FLAGS.ignore_difficult_instances)
178 |
179 |       writer.write(tf_example.SerializeToString())
180 |
181 |   writer.close()
182 |
183 |
184 | if __name__ == '__main__':
185 |   tf.app.run()
186 |
--------------------------------------------------------------------------------
/data/container_label_map.pbtxt:
--------------------------------------------------------------------------------
1 | item {
2 | id: 1
3 | name: 'container_number'
4 | }
5 |
6 | item {
7 | id: 2
8 | name: 'container_number_v'
9 | }
10 |
11 | item {
12 | id: 6
13 | name: 'container_number_e'
14 | }
15 |
16 | item {
17 | id: 3
18 | name: 'container_door'
19 | }
20 | item {
21 | id: 4
22 | name: 'container_end_door'
23 | }
24 |
25 | item {
26 | id: 5
27 | name: 'container'
28 | }
--------------------------------------------------------------------------------
/detection_var_image.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python3
2 | # -*- coding: utf-8 -*-
3 | __author__ = 'Kevin Di'
4 |
5 | import numpy as np
6 | import os
7 | from skimage import io, data
8 | import six.moves.urllib as urllib
9 | import sys
10 | import tarfile
11 | import tensorflow as tf
12 |
13 | from collections import defaultdict
14 | import collections
15 | from io import StringIO
16 | import matplotlib as mpl
17 |
18 | from matplotlib import pyplot as plt
19 | from PIL import Image
20 | import pytesseract
21 | import cv2
22 | import re
23 |
24 |
25 | # This is needed since the notebook is stored in the object_detection folder.
26 | sys.path.append("..")
27 | from object_detection.utils import ops as utils_ops
28 |
29 |
30 | from object_detection.utils import label_map_util
31 |
32 | from object_detection.utils import visualization_utils as vis_util
33 |
34 |
35 | MODEL_NAME = 'ssd_mobilenet_v1_coco_2017_11_17'
36 | PATH_TO_FROZEN_GRAPH = MODEL_NAME + '/frozen_inference_graph.pb'
37 | # List of the strings that is used to add correct label for each box.
38 | PATH_TO_LABELS = os.path.join('data', 'container_label_map.pbtxt')
39 |
40 | detection_graph = tf.Graph()
41 | with detection_graph.as_default():
42 |     od_graph_def = tf.GraphDef()
43 |     with tf.gfile.GFile(PATH_TO_FROZEN_GRAPH, 'rb') as fid:
44 |         serialized_graph = fid.read()
45 |         od_graph_def.ParseFromString(serialized_graph)
46 |         tf.import_graph_def(od_graph_def, name='')
47 |
48 | category_index = label_map_util.create_category_index_from_labelmap(PATH_TO_LABELS, use_display_name=True)
49 |
50 | def load_image_into_numpy_array(image):
51 |     (im_width, im_height) = image.size
52 |     return np.array(image.getdata()).reshape(
53 |         (im_height, im_width, 3)).astype(np.uint8)
54 |
55 | # If you want to test the code with your images, just add path to the images to the TEST_IMAGE_PATHS.
56 | PATH_TO_TEST_IMAGES_DIR = 'test_images'
57 |
58 | TEST_IMAGE_PATHS = [os.path.join(PATH_TO_TEST_IMAGES_DIR, 'image{}.jpg'.format(i)) for i in range(1, 4)]
59 |
60 |
61 | # Size, in inches, of the output images,use to plt.figure(figsize=IMAGE_SIZE)
62 | # IMAGE_SIZE = (12, 8)
63 |
64 | def run_inference_for_single_image(image, graph):
65 |     with graph.as_default():
66 |         with tf.Session(config = tf.ConfigProto(
67 |             device_count = {"CPU":16},
68 |             inter_op_parallelism_threads = 5,
69 |             intra_op_parallelism_threads = 2,
70 |         )) as sess:
71 |             # Get handles to input and output tensors
72 |             ops = tf.get_default_graph().get_operations()
73 |             all_tensor_names = {output.name for op in ops for output in op.outputs}
74 |             tensor_dict = {}
75 |             for key in [
76 |                 'num_detections', 'detection_boxes', 'detection_scores',
77 |                 'detection_classes', 'detection_masks'
78 |             ]:
79 |                 tensor_name = key + ':0'
80 |                 if tensor_name in all_tensor_names:
81 |                     tensor_dict[key] = tf.get_default_graph().get_tensor_by_name(
82 |                         tensor_name)
83 |             if 'detection_masks' in tensor_dict:
84 |                 # The following processing is only for single image
85 |                 detection_boxes = tf.squeeze(tensor_dict['detection_boxes'], [0])
86 |                 detection_masks = tf.squeeze(tensor_dict['detection_masks'], [0])
87 |                 # Reframe is required to translate mask from box coordinates to image coordinates and fit the image size.
88 |                 real_num_detection = tf.cast(tensor_dict['num_detections'][0], tf.int32)
89 |                 detection_boxes = tf.slice(detection_boxes, [0, 0], [real_num_detection, -1])
90 |                 detection_masks = tf.slice(detection_masks, [0, 0, 0], [real_num_detection, -1, -1])
91 |                 detection_masks_reframed = utils_ops.reframe_box_masks_to_image_masks(
92 |                     detection_masks, detection_boxes, image.shape[0], image.shape[1])
93 |                 detection_masks_reframed = tf.cast(
94 |                     tf.greater(detection_masks_reframed, 0.5), tf.uint8)
95 |                 # Follow the convention by adding back the batch dimension
96 |                 tensor_dict['detection_masks'] = tf.expand_dims(
97 |                     detection_masks_reframed, 0)
98 |             image_tensor = tf.get_default_graph().get_tensor_by_name('image_tensor:0')
99 |
100 |             # Run inference
101 |             output_dict = sess.run(tensor_dict,
102 |                                    feed_dict={image_tensor: np.expand_dims(image, 0)})
103 |
104 |             # all outputs are float32 numpy arrays, so convert types as appropriate
105 |             output_dict['num_detections'] = int(output_dict['num_detections'][0])
106 |             output_dict['detection_classes'] = output_dict[
107 |                 'detection_classes'][0].astype(np.uint8)
108 |             output_dict['detection_boxes'] = output_dict['detection_boxes'][0]
109 |             output_dict['detection_scores'] = output_dict['detection_scores'][0]
110 |             if 'detection_masks' in output_dict:
111 |                 output_dict['detection_masks'] = output_dict['detection_masks'][0]
112 |     return output_dict
113 |
114 | def image_preprocessing(img):
115 |     # image_gray = img
116 |     image_gray = cv2.cvtColor(np.asarray(img), cv2.COLOR_RGB2GRAY)
117 |     # image_gray = cv2.medianBlur(image_gray, 3)
118 |     # image_gray = cv2.threshold(image_gray, 127, 255, cv2.THRESH_BINARY_INV)[1]
119 |     # adaptiveThreshold did not give good results; kept here for experimentation.
120 |     # image_gray = cv2.adaptiveThreshold(image_gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 11, 2)
121 |
122 |     return image_gray
123 | # box_to_color_map: {(ymin, xmin, ymax, xmax): 'color'}
124 | # box_to_display_str_map: {(ymin, xmin, ymax, xmax): ['label: xx%']}
125 | def img_ocr(image_name, output_path, image_org, box_to_color_map, box_to_display_str_map, lang = 'cont41'):
126 |     cont_num_find = 0
127 |     img_label = []
128 |     # Convert coordinates to raw pixels.
129 |     for box, color in box_to_color_map.items():
130 |         ymin, xmin, ymax, xmax = box
131 |         # Load the original image; the image returned by visualize_boxes_and_labels_on_image_array already has bounding boxes drawn on it.
132 |         image_corp_org = Image.fromarray(np.uint8(image_org))
133 |         img_width, img_height = image_corp_org.size
134 |         new_xmin = int(xmin * img_width)
135 |         new_xmax = int(xmax * img_width)
136 |         new_ymin = int(ymin * img_height)
137 |         new_ymax = int(ymax * img_height)
138 |         # Add a safety margin (px) around the crop.
139 |         offset = 5
140 |         if new_xmin - offset >= 0:
141 |             new_xmin = new_xmin - offset
142 |         if new_xmax + offset <= img_width:
143 |             new_xmax = new_xmax + offset
144 |         if new_ymin - offset >= 0:
145 |             new_ymin = new_ymin - offset
146 |         if new_ymax + offset <= img_height:
147 |             new_ymax = new_ymax + offset
148 |         # Split the display string 'xxx: 90%' of this bounding box into ['xxx', '90%'].
149 |         img_label_name = box_to_display_str_map[box][0].split(': ')
150 |         # Crop the image. Note that PIL and NumPy index coordinates in opposite order!
151 |         image_corp_org = load_image_into_numpy_array(image_org)[new_ymin:new_ymax,new_xmin:new_xmax]
152 |         image_corp_org = Image.fromarray(np.uint8(image_corp_org))
153 |         # Tesseract OCR
154 |         lang_use = 'eng+'+lang+'+letsgodigital+snum+eng_f'
155 |         if re.match('container_number+', img_label_name[0]):
156 |             cont_num_find += 1
157 |             image_corp_gray = image_preprocessing(image_corp_org)
158 |             if re.match('container_number_v+', img_label_name[0]):
159 |                 cont_num = pytesseract.image_to_string(image_corp_gray, lang=lang_use, config='--psm 6')
160 |             elif re.match('container_number_e+', img_label_name[0]):
161 |                 cont_num = pytesseract.image_to_string(image_corp_gray, lang=lang_use, config='--psm 6')
162 |             else :
163 |                 cont_num = pytesseract.image_to_string(image_corp_gray, lang=lang_use, config='--psm 4')
164 |             # Save the cropped image to output_path, with the label in the file name.
165 |             # image_corp_name is built as: <input image name>_<cont_num_find>_<img_label_name>
166 |             image_corp_name = image_name[:-4]+ '_'+ str(cont_num_find)+ '_'+ img_label_name[0]
167 |             # img_label: [{lable, actual, cont_num, image_corp_name}]
168 |             img_label.append({'lable':img_label_name[0], 'actual':img_label_name[1], 'cont_num':cont_num, 'image_corp_name':image_corp_name})
169 |             image_corp_org.save(os.path.join(output_path) + '/' + image_corp_name + '_org_'+ image_name[-4:])
170 |             cv2.imwrite(os.path.join(output_path) + '/' + image_corp_name + '_gray_'+ image_name[-4:], image_corp_gray)
171 |             file = open(os.path.join(PATH_TO_TEST_IMAGES_DIR, 'cont_num.txt'), 'a')
172 |             file.write(img_label[cont_num_find - 1]['image_corp_name']+ '_' + img_label[cont_num_find - 1]['actual'] + '\n' + img_label[cont_num_find - 1]['cont_num']+ '\n')
173 |             file.close()
174 |     return img_label # image_corp_org, image_corp_gray
175 |
176 | def detection():
177 |     image_label =[]
178 |     for image_path in TEST_IMAGE_PATHS:
179 |         image_org = Image.open(image_path, 'r')
180 |         # the array based representation of the image will be used later in order to prepare the
181 |         # result image with boxes and labels on it.
182 |         image_np = load_image_into_numpy_array(image_org)
183 |         # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
184 |         # image_np_expanded = np.expand_dims(image_np, axis=0)
185 |         image_name = os.path.basename(os.path.join(image_path))
186 |         # Actual detection.
187 |         output_dict = run_inference_for_single_image(image_np, detection_graph)
188 |
189 |         output_path = os.path.join(PATH_TO_TEST_IMAGES_DIR)
190 |
191 |         # Visualization of the results of a detection.
192 |         image, box_to_color_map, box_to_display_str_map = vis_util.visualize_boxes_and_labels_on_image_array(
193 |             image_np,
194 |             output_dict['detection_boxes'],
195 |             output_dict['detection_classes'],
196 |             output_dict['detection_scores'],
197 |             category_index,
198 |             instance_masks=output_dict.get('detection_masks'),
199 |             use_normalized_coordinates=True,
200 |             max_boxes_to_draw=200,
201 |             min_score_thresh=.75,
202 |             line_thickness=2)
203 |
204 |         # Crop each bounding box into its own image and run OCR on it.
205 |         lang = 'cont41'
206 |         img_label = img_ocr(image_name, output_path, image_org, box_to_color_map, box_to_display_str_map, lang)
207 |         # Save the image on which visualize_boxes_and_labels_on_image_array drew the boxes.
208 |         image_name = os.path.basename(os.path.join(image_path))
209 |         output_image_name = image_name[:-4] + '_out' + image_name[-4:]
210 |         image_out = Image.fromarray(image_np)
211 |         image_out.save(os.path.join(PATH_TO_TEST_IMAGES_DIR) + '/'+ output_image_name)
212 |         image_label.append({str(image_name[:-4]): img_label})
213 |     return image_label
214 |
215 |
216 | if __name__ == "__main__":
217 |     print(detection())
218 |
219 |
220 |
221 |
222 |
223 |
--------------------------------------------------------------------------------
/generate_voc_datasets.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python3
2 | # -*- coding: utf-8 -*-
3 |
4 | __author__ = 'Kevin Di'
5 |
6 | import os
7 | import random
8 |
9 | # VOC like data_set file path.
10 |
11 | xml_file = r'path to your VOC like data_set: /Annotations'
12 | img_file = r'path to your VOC like data_set:/JPEGImages'
13 | save_path = r'path to your VOC like data_set: /ImageSets/Main'
14 |
15 |
16 | # Determine the train, val, test split ratio.
17 | # First split the data into train_val and test, then split train_val into train and val.
18 |
19 | train_val_percent = 0.8
20 | train_percent = 0.8
21 | total_dataset_num = os.listdir(xml_file)
22 | total_img_num = os.listdir(img_file)
23 | num = len(total_dataset_num)
24 | img = len(total_img_num)
25 | list = range(num)
26 | t_v = int(num * train_val_percent)
27 | t = int(t_v * train_percent)
28 | train_val= random.sample(list,t_v)
29 | train = random.sample(train_val,t)
30 |
31 | print('Total number of xml files is:', num)
32 | print('Total number of images is:', img)
33 | print('training set size:', t)
34 | print('validation set size:', t_v - t)
35 | print('test set size:', num - t_v)
36 |
37 | file_train = open(os.path.join(save_path,'train.txt'), 'w')
38 | file_val = open(os.path.join(save_path,'val.txt'), 'w')
39 | file_test = open(os.path.join(save_path,'test.txt'), 'w')
40 |
41 |
42 | for i in list:
43 | xml_name = total_dataset_num[i][:5]+'\n'
44 |
45 | if i in train_val:
46 | if i in train:
47 | file_train.write(xml_name)
48 | else:
49 | file_val.write(xml_name)
50 | else:
51 | file_test.write(xml_name)
52 |
53 | file_train.close()
54 | file_val.close()
55 | file_test.close()
56 |
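The script above derives each entry by slicing the first five characters of the file name, which only works when every annotation stem is exactly five characters long. Below is a hedged sketch of the same two-stage split that strips the extension instead and accepts a seed for reproducibility. `split_names` and its `seed` parameter are assumptions made for illustration, not part of the repository.

```python
import os
import random


def split_names(file_names, train_val_percent=0.8, train_percent=0.8, seed=None):
    """Split annotation file names into train, val, and test lists of stems.

    Hypothetical helper: mirrors the ratios used in the script, but removes
    the extension with os.path.splitext rather than a fixed-width slice.
    """
    stems = sorted(os.path.splitext(name)[0] for name in file_names)
    rng = random.Random(seed)  # a seeded generator keeps the split repeatable
    rng.shuffle(stems)

    n_train_val = int(len(stems) * train_val_percent)
    n_train = int(n_train_val * train_percent)

    train = stems[:n_train]
    val = stems[n_train:n_train_val]
    test = stems[n_train_val:]
    return train, val, test


# Example with 100 made-up annotation files.
names = ['%05d.xml' % i for i in range(100)]
train, val, test = split_names(names, seed=0)
print(len(train), len(val), len(test))
```

Shuffling once and slicing keeps the three subsets disjoint by construction, so no membership tests are needed when writing `train.txt`, `val.txt`, and `test.txt`.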
--------------------------------------------------------------------------------
/image/image1.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lonelygo/container_detection/21f2e682af4210eb5b36126216f2276c522f6513/image/image1.jpg
--------------------------------------------------------------------------------
/image/image2.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lonelygo/container_detection/21f2e682af4210eb5b36126216f2276c522f6513/image/image2.jpg
--------------------------------------------------------------------------------
/image/image3.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lonelygo/container_detection/21f2e682af4210eb5b36126216f2276c522f6513/image/image3.jpg
--------------------------------------------------------------------------------
/image/image4.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lonelygo/container_detection/21f2e682af4210eb5b36126216f2276c522f6513/image/image4.jpg
--------------------------------------------------------------------------------
/image/image5.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lonelygo/container_detection/21f2e682af4210eb5b36126216f2276c522f6513/image/image5.jpg
--------------------------------------------------------------------------------
/utils/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lonelygo/container_detection/21f2e682af4210eb5b36126216f2276c522f6513/utils/__init__.py
--------------------------------------------------------------------------------
/utils/visualization_utils.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 | # Copyright 2017 The TensorFlow Authors. All Rights Reserved.
3 | #
4 | # Licensed under the Apache License, Version 2.0 (the "License");
5 | # you may not use this file except in compliance with the License.
6 | # You may obtain a copy of the License at
7 | #
8 | # http://www.apache.org/licenses/LICENSE-2.0
9 | #
10 | # Unless required by applicable law or agreed to in writing, software
11 | # distributed under the License is distributed on an "AS IS" BASIS,
12 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
13 | # See the License for the specific language governing permissions and
14 | # limitations under the License.
15 | # ==============================================================================
16 |
17 | """A set of functions that are used for visualization.
18 |
19 | These functions often receive an image, perform some visualization on the image.
20 | The functions do not return a value, instead they modify the image itself.
21 |
22 | """
23 | import abc
24 | import collections
25 | import functools
26 | # Set headless-friendly backend.
27 | # The Agg backend cannot display images, so it is left disabled here.
28 | # import matplotlib; matplotlib.use('Agg')
29 | import matplotlib
30 | from matplotlib import pyplot as plt
31 | import os
32 | import numpy as np
33 | import PIL.Image as Image
34 | import PIL.ImageColor as ImageColor
35 | import PIL.ImageDraw as ImageDraw
36 | import PIL.ImageFont as ImageFont
37 | import six
38 | import tensorflow as tf
39 |
40 | from object_detection.core import standard_fields as fields
41 | from object_detection.utils import shape_utils
42 |
43 | _TITLE_LEFT_MARGIN = 10
44 | _TITLE_TOP_MARGIN = 10
45 | STANDARD_COLORS = [
46 | 'AliceBlue', 'Chartreuse', 'Aqua', 'Aquamarine', 'Azure', 'Beige', 'Bisque',
47 | 'BlanchedAlmond', 'BlueViolet', 'BurlyWood', 'CadetBlue', 'AntiqueWhite',
48 | 'Chocolate', 'Coral', 'CornflowerBlue', 'Cornsilk', 'Crimson', 'Cyan',
49 | 'DarkCyan', 'DarkGoldenRod', 'DarkGrey', 'DarkKhaki', 'DarkOrange',
50 | 'DarkOrchid', 'DarkSalmon', 'DarkSeaGreen', 'DarkTurquoise', 'DarkViolet',
51 | 'DeepPink', 'DeepSkyBlue', 'DodgerBlue', 'FireBrick', 'FloralWhite',
52 | 'ForestGreen', 'Fuchsia', 'Gainsboro', 'GhostWhite', 'Gold', 'GoldenRod',
53 | 'Salmon', 'Tan', 'HoneyDew', 'HotPink', 'IndianRed', 'Ivory', 'Khaki',
54 | 'Lavender', 'LavenderBlush', 'LawnGreen', 'LemonChiffon', 'LightBlue',
55 | 'LightCoral', 'LightCyan', 'LightGoldenRodYellow', 'LightGray', 'LightGrey',
56 | 'LightGreen', 'LightPink', 'LightSalmon', 'LightSeaGreen', 'LightSkyBlue',
57 | 'LightSlateGray', 'LightSlateGrey', 'LightSteelBlue', 'LightYellow', 'Lime',
58 | 'LimeGreen', 'Linen', 'Magenta', 'MediumAquaMarine', 'MediumOrchid',
59 | 'MediumPurple', 'MediumSeaGreen', 'MediumSlateBlue', 'MediumSpringGreen',
60 | 'MediumTurquoise', 'MediumVioletRed', 'MintCream', 'MistyRose', 'Moccasin',
61 | 'NavajoWhite', 'OldLace', 'Olive', 'OliveDrab', 'Orange', 'OrangeRed',
62 | 'Orchid', 'PaleGoldenRod', 'PaleGreen', 'PaleTurquoise', 'PaleVioletRed',
63 | 'PapayaWhip', 'PeachPuff', 'Peru', 'Pink', 'Plum', 'PowderBlue', 'Purple',
64 | 'Red', 'RosyBrown', 'RoyalBlue', 'SaddleBrown', 'Green', 'SandyBrown',
65 | 'SeaGreen', 'SeaShell', 'Sienna', 'Silver', 'SkyBlue', 'SlateBlue',
66 | 'SlateGray', 'SlateGrey', 'Snow', 'SpringGreen', 'SteelBlue', 'GreenYellow',
67 | 'Teal', 'Thistle', 'Tomato', 'Turquoise', 'Violet', 'Wheat', 'White',
68 | 'WhiteSmoke', 'Yellow', 'YellowGreen'
69 | ]
70 |
71 |
72 | def save_image_array_as_png(image, output_path):
73 | """Saves an image (represented as a numpy array) to PNG.
74 |
75 | Args:
76 | image: a numpy array with shape [height, width, 3].
77 | output_path: path to which image should be written.
78 | """
79 | image_pil = Image.fromarray(np.uint8(image)).convert('RGB')
80 | with tf.gfile.Open(output_path, 'w') as fid:
81 | image_pil.save(fid, 'PNG')
82 |
83 |
84 | def encode_image_array_as_png_str(image):
85 | """Encodes a numpy array into a PNG string.
86 |
87 | Args:
88 | image: a numpy array with shape [height, width, 3].
89 |
90 | Returns:
91 | PNG encoded image string.
92 | """
93 | image_pil = Image.fromarray(np.uint8(image))
94 | output = six.BytesIO()
95 | image_pil.save(output, format='PNG')
96 | png_string = output.getvalue()
97 | output.close()
98 | return png_string
99 |
100 |
101 | def draw_bounding_box_on_image_array(image,
102 | ymin,
103 | xmin,
104 | ymax,
105 | xmax,
106 | color='red',
107 | thickness=4,
108 | display_str_list=(),
109 | use_normalized_coordinates=True):
110 | """Adds a bounding box to an image (numpy array).
111 |
112 | Bounding box coordinates can be specified in either absolute (pixel) or
113 | normalized coordinates by setting the use_normalized_coordinates argument.
114 |
115 | Args:
116 | image: a numpy array with shape [height, width, 3].
117 | ymin: ymin of bounding box.
118 | xmin: xmin of bounding box.
119 | ymax: ymax of bounding box.
120 | xmax: xmax of bounding box.
121 | color: color to draw bounding box. Default is red.
122 | thickness: line thickness. Default value is 4.
123 | display_str_list: list of strings to display in box
124 | (each to be shown on its own line).
125 | use_normalized_coordinates: If True (default), treat coordinates
126 | ymin, xmin, ymax, xmax as relative to the image. Otherwise treat
127 | coordinates as absolute.
128 | """
129 | image_pil = Image.fromarray(np.uint8(image)).convert('RGB')
130 | draw_bounding_box_on_image(image_pil, ymin, xmin, ymax, xmax, color,
131 | thickness, display_str_list,
132 | use_normalized_coordinates)
133 | np.copyto(image, np.array(image_pil))
134 |
135 |
136 | def draw_bounding_box_on_image(image,
137 | ymin,
138 | xmin,
139 | ymax,
140 | xmax,
141 | color='red',
142 | thickness=4,
143 | display_str_list=(),
144 | use_normalized_coordinates=True):
145 | """Adds a bounding box to an image.
146 |
147 | Bounding box coordinates can be specified in either absolute (pixel) or
148 | normalized coordinates by setting the use_normalized_coordinates argument.
149 |
150 | Each string in display_str_list is displayed on a separate line above the
151 | bounding box in black text on a rectangle filled with the input 'color'.
152 | If the top of the bounding box extends to the edge of the image, the strings
153 | are displayed below the bounding box.
154 |
155 | Args:
156 | image: a PIL.Image object.
157 | ymin: ymin of bounding box.
158 | xmin: xmin of bounding box.
159 | ymax: ymax of bounding box.
160 | xmax: xmax of bounding box.
161 | color: color to draw bounding box. Default is red.
162 | thickness: line thickness. Default value is 4.
163 | display_str_list: list of strings to display in box
164 | (each to be shown on its own line).
165 | use_normalized_coordinates: If True (default), treat coordinates
166 | ymin, xmin, ymax, xmax as relative to the image. Otherwise treat
167 | coordinates as absolute.
168 | """
169 | draw = ImageDraw.Draw(image)
170 | im_width, im_height = image.size
171 | if use_normalized_coordinates:
172 | (left, right, top, bottom) = (xmin * im_width, xmax * im_width,
173 | ymin * im_height, ymax * im_height)
174 | else:
175 | (left, right, top, bottom) = (xmin, xmax, ymin, ymax)
176 | draw.line([(left, top), (left, bottom), (right, bottom),
177 | (right, top), (left, top)], width=thickness, fill=color)
178 | try:
179 | font = ImageFont.truetype('arial.ttf', 24)
180 | except IOError:
181 | font = ImageFont.load_default()
182 |
183 | # If the total height of the display strings added to the top of the bounding
184 | # box exceeds the top of the image, stack the strings below the bounding box
185 | # instead of above.
186 | display_str_heights = [font.getsize(ds)[1] for ds in display_str_list]
187 | # Each display_str has a top and bottom margin of 0.05x.
188 | total_display_str_height = (1 + 2 * 0.05) * sum(display_str_heights)
189 |
190 | if top > total_display_str_height:
191 | text_bottom = top
192 | else:
193 | text_bottom = bottom + total_display_str_height
194 | # Reverse list and print from bottom to top.
195 | for display_str in display_str_list[::-1]:
196 | text_width, text_height = font.getsize(display_str)
197 | margin = np.ceil(0.05 * text_height)
198 | draw.rectangle(
199 | [(left, text_bottom - text_height - 2 * margin), (left + text_width,
200 | text_bottom)],
201 | fill=color)
202 | draw.text(
203 | (left + margin, text_bottom - text_height - margin),
204 | display_str,
205 | fill='black',
206 | font=font)
207 | text_bottom -= text_height + 2 * margin
208 |
209 |
210 | def draw_bounding_boxes_on_image_array(image,
211 | boxes,
212 | color='red',
213 | thickness=4,
214 | display_str_list_list=()):
215 | """Draws bounding boxes on image (numpy array).
216 |
217 | Args:
218 | image: a numpy array object.
219 | boxes: a 2 dimensional numpy array of [N, 4]: (ymin, xmin, ymax, xmax).
220 | The coordinates are in normalized format between [0, 1].
221 | color: color to draw bounding box. Default is red.
222 | thickness: line thickness. Default value is 4.
223 | display_str_list_list: list of list of strings.
224 | a list of strings for each bounding box.
225 | The reason to pass a list of strings for a
226 | bounding box is that it might contain
227 | multiple labels.
228 |
229 | Raises:
230 | ValueError: if boxes is not a [N, 4] array
231 | """
232 | image_pil = Image.fromarray(image)
233 | draw_bounding_boxes_on_image(image_pil, boxes, color, thickness,
234 | display_str_list_list)
235 | np.copyto(image, np.array(image_pil))
236 |
237 |
238 | def draw_bounding_boxes_on_image(image,
239 | boxes,
240 | color='red',
241 | thickness=4,
242 | display_str_list_list=()):
243 | """Draws bounding boxes on image.
244 |
245 | Args:
246 | image: a PIL.Image object.
247 | boxes: a 2 dimensional numpy array of [N, 4]: (ymin, xmin, ymax, xmax).
248 | The coordinates are in normalized format between [0, 1].
249 | color: color to draw bounding box. Default is red.
250 | thickness: line thickness. Default value is 4.
251 | display_str_list_list: list of list of strings.
252 | a list of strings for each bounding box.
253 | The reason to pass a list of strings for a
254 | bounding box is that it might contain
255 | multiple labels.
256 |
257 | Raises:
258 | ValueError: if boxes is not a [N, 4] array
259 | """
260 | boxes_shape = boxes.shape
261 | if not boxes_shape:
262 | return
263 | if len(boxes_shape) != 2 or boxes_shape[1] != 4:
264 | raise ValueError('Input must be of size [N, 4]')
265 | for i in range(boxes_shape[0]):
266 | display_str_list = ()
267 | if display_str_list_list:
268 | display_str_list = display_str_list_list[i]
269 | draw_bounding_box_on_image(image, boxes[i, 0], boxes[i, 1], boxes[i, 2],
270 | boxes[i, 3], color, thickness, display_str_list)
271 |
272 |
273 | def _visualize_boxes(image, boxes, classes, scores, category_index, **kwargs):
274 | return visualize_boxes_and_labels_on_image_array(
275 | image, boxes, classes, scores, category_index=category_index, **kwargs)
276 |
277 |
278 | def _visualize_boxes_and_masks(image, boxes, classes, scores, masks,
279 | category_index, **kwargs):
280 | return visualize_boxes_and_labels_on_image_array(
281 | image,
282 | boxes,
283 | classes,
284 | scores,
285 | category_index=category_index,
286 | instance_masks=masks,
287 | **kwargs)
288 |
289 |
290 | def _visualize_boxes_and_keypoints(image, boxes, classes, scores, keypoints,
291 | category_index, **kwargs):
292 | return visualize_boxes_and_labels_on_image_array(
293 | image,
294 | boxes,
295 | classes,
296 | scores,
297 | category_index=category_index,
298 | keypoints=keypoints,
299 | **kwargs)
300 |
301 |
302 | def _visualize_boxes_and_masks_and_keypoints(
303 | image, boxes, classes, scores, masks, keypoints, category_index, **kwargs):
304 | return visualize_boxes_and_labels_on_image_array(
305 | image,
306 | boxes,
307 | classes,
308 | scores,
309 | category_index=category_index,
310 | instance_masks=masks,
311 | keypoints=keypoints,
312 | **kwargs)
313 |
314 |
315 | def _resize_original_image(image, image_shape):
316 | image = tf.expand_dims(image, 0)
317 | image = tf.image.resize_images(
318 | image,
319 | image_shape,
320 | method=tf.image.ResizeMethod.NEAREST_NEIGHBOR,
321 | align_corners=True)
322 | return tf.cast(tf.squeeze(image, 0), tf.uint8)
323 |
324 |
325 | def draw_bounding_boxes_on_image_tensors(images,
326 | boxes,
327 | classes,
328 | scores,
329 | category_index,
330 | original_image_spatial_shape=None,
331 | true_image_shape=None,
332 | instance_masks=None,
333 | keypoints=None,
334 | max_boxes_to_draw=100,
335 | min_score_thresh=0.85,
336 | use_normalized_coordinates=True):
337 | """Draws bounding boxes, masks, and keypoints on batch of image tensors.
338 |
339 | Args:
340 | images: A 4D uint8 image tensor of shape [N, H, W, C]. If C > 3, additional
341 | channels will be ignored. If C = 1, then we convert the images to RGB
342 | images.
343 | boxes: [N, max_detections, 4] float32 tensor of detection boxes.
344 | classes: [N, max_detections] int tensor of detection classes. Note that
345 | classes are 1-indexed.
346 | scores: [N, max_detections] float32 tensor of detection scores.
347 | category_index: a dict that maps integer ids to category dicts. e.g.
348 | {1: {1: 'dog'}, 2: {2: 'cat'}, ...}
349 | original_image_spatial_shape: [N, 2] tensor containing the spatial size of
350 | the original image.
351 | true_image_shape: [N, 3] tensor containing the spatial size of unpadded
352 | original_image.
353 | instance_masks: A 4D uint8 tensor of shape [N, max_detection, H, W] with
354 | instance masks.
355 | keypoints: A 4D float32 tensor of shape [N, max_detection, num_keypoints, 2]
356 | with keypoints.
357 | max_boxes_to_draw: Maximum number of boxes to draw on an image. Default 100.
358 | min_score_thresh: Minimum score threshold for visualization. Default 0.85.
359 | use_normalized_coordinates: Whether to assume boxes and keypoints are in
360 | normalized coordinates (as opposed to absolute coordinates).
361 | Default is True.
362 |
363 | Returns:
364 | 4D image tensor of type uint8, with boxes drawn on top.
365 | """
366 | # Additional channels are being ignored.
367 | if images.shape[3] > 3:
368 | images = images[:, :, :, 0:3]
369 | elif images.shape[3] == 1:
370 | images = tf.image.grayscale_to_rgb(images)
371 | visualization_keyword_args = {
372 | 'use_normalized_coordinates': use_normalized_coordinates,
373 | 'max_boxes_to_draw': max_boxes_to_draw,
374 | 'min_score_thresh': min_score_thresh,
375 | 'agnostic_mode': False,
376 | 'line_thickness': 4
377 | }
378 | if true_image_shape is None:
379 | true_shapes = tf.constant(-1, shape=[images.shape.as_list()[0], 3])
380 | else:
381 | true_shapes = true_image_shape
382 | if original_image_spatial_shape is None:
383 | original_shapes = tf.constant(-1, shape=[images.shape.as_list()[0], 2])
384 | else:
385 | original_shapes = original_image_spatial_shape
386 |
387 | if instance_masks is not None and keypoints is None:
388 | visualize_boxes_fn = functools.partial(
389 | _visualize_boxes_and_masks,
390 | category_index=category_index,
391 | **visualization_keyword_args)
392 | elems = [
393 | true_shapes, original_shapes, images, boxes, classes, scores,
394 | instance_masks
395 | ]
396 | elif instance_masks is None and keypoints is not None:
397 | visualize_boxes_fn = functools.partial(
398 | _visualize_boxes_and_keypoints,
399 | category_index=category_index,
400 | **visualization_keyword_args)
401 | elems = [
402 | true_shapes, original_shapes, images, boxes, classes, scores, keypoints
403 | ]
404 | elif instance_masks is not None and keypoints is not None:
405 | visualize_boxes_fn = functools.partial(
406 | _visualize_boxes_and_masks_and_keypoints,
407 | category_index=category_index,
408 | **visualization_keyword_args)
409 | elems = [
410 | true_shapes, original_shapes, images, boxes, classes, scores,
411 | instance_masks, keypoints
412 | ]
413 | else:
414 | visualize_boxes_fn = functools.partial(
415 | _visualize_boxes,
416 | category_index=category_index,
417 | **visualization_keyword_args)
418 | elems = [
419 | true_shapes, original_shapes, images, boxes, classes, scores
420 | ]
421 |
422 | def draw_boxes(image_and_detections):
423 | """Draws boxes on image."""
424 | true_shape = image_and_detections[0]
425 | original_shape = image_and_detections[1]
426 | if true_image_shape is not None:
427 | image = shape_utils.pad_or_clip_nd(image_and_detections[2],
428 | [true_shape[0], true_shape[1], 3])
429 | if original_image_spatial_shape is not None:
430 | image_and_detections[2] = _resize_original_image(image, original_shape)
431 |
432 | image_with_boxes = tf.py_func(visualize_boxes_fn, image_and_detections[2:],
433 | tf.uint8)
434 | return image_with_boxes
435 |
436 | images = tf.map_fn(draw_boxes, elems, dtype=tf.uint8, back_prop=False)
437 | return images
438 |
439 |
440 | def draw_side_by_side_evaluation_image(eval_dict,
441 | category_index,
442 | max_boxes_to_draw=20,
443 | min_score_thresh=0.2,
444 | use_normalized_coordinates=True):
445 | """Creates a side-by-side image with detections and groundtruth.
446 |
447 | Bounding boxes (and instance masks, if available) are visualized on both
448 | subimages.
449 |
450 | Args:
451 | eval_dict: The evaluation dictionary returned by
452 | eval_util.result_dict_for_batched_example() or
453 | eval_util.result_dict_for_single_example().
454 | category_index: A category index (dictionary) produced from a labelmap.
455 | max_boxes_to_draw: The maximum number of boxes to draw for detections.
456 | min_score_thresh: The minimum score threshold for showing detections.
457 | use_normalized_coordinates: Whether to assume boxes and keypoints are in
458 | normalized coordinates (as opposed to absolute coordinates).
459 | Default is True.
460 |
461 | Returns:
462 | A list of [1, H, 2 * W, C] uint8 tensors. The subimage on the left
463 | corresponds to detections, while the subimage on the right corresponds to
464 | groundtruth.
465 | """
466 | detection_fields = fields.DetectionResultFields()
467 | input_data_fields = fields.InputDataFields()
468 |
469 | images_with_detections_list = []
470 |
471 | # Add the batch dimension if the eval_dict is for single example.
472 | if len(eval_dict[detection_fields.detection_classes].shape) == 1:
473 | for key in eval_dict:
474 | if key != input_data_fields.original_image:
475 | eval_dict[key] = tf.expand_dims(eval_dict[key], 0)
476 |
477 | for indx in range(eval_dict[input_data_fields.original_image].shape[0]):
478 | instance_masks = None
479 | if detection_fields.detection_masks in eval_dict:
480 | instance_masks = tf.cast(
481 | tf.expand_dims(
482 | eval_dict[detection_fields.detection_masks][indx], axis=0),
483 | tf.uint8)
484 | keypoints = None
485 | if detection_fields.detection_keypoints in eval_dict:
486 | keypoints = tf.expand_dims(
487 | eval_dict[detection_fields.detection_keypoints][indx], axis=0)
488 | groundtruth_instance_masks = None
489 | if input_data_fields.groundtruth_instance_masks in eval_dict:
490 | groundtruth_instance_masks = tf.cast(
491 | tf.expand_dims(
492 | eval_dict[input_data_fields.groundtruth_instance_masks][indx],
493 | axis=0), tf.uint8)
494 |
495 | images_with_detections = draw_bounding_boxes_on_image_tensors(
496 | tf.expand_dims(
497 | eval_dict[input_data_fields.original_image][indx], axis=0),
498 | tf.expand_dims(
499 | eval_dict[detection_fields.detection_boxes][indx], axis=0),
500 | tf.expand_dims(
501 | eval_dict[detection_fields.detection_classes][indx], axis=0),
502 | tf.expand_dims(
503 | eval_dict[detection_fields.detection_scores][indx], axis=0),
504 | category_index,
505 | original_image_spatial_shape=tf.expand_dims(
506 | eval_dict[input_data_fields.original_image_spatial_shape][indx],
507 | axis=0),
508 | true_image_shape=tf.expand_dims(
509 | eval_dict[input_data_fields.true_image_shape][indx], axis=0),
510 | instance_masks=instance_masks,
511 | keypoints=keypoints,
512 | max_boxes_to_draw=max_boxes_to_draw,
513 | min_score_thresh=min_score_thresh,
514 | use_normalized_coordinates=use_normalized_coordinates)
515 | images_with_groundtruth = draw_bounding_boxes_on_image_tensors(
516 | tf.expand_dims(
517 | eval_dict[input_data_fields.original_image][indx], axis=0),
518 | tf.expand_dims(
519 | eval_dict[input_data_fields.groundtruth_boxes][indx], axis=0),
520 | tf.expand_dims(
521 | eval_dict[input_data_fields.groundtruth_classes][indx], axis=0),
522 | tf.expand_dims(
523 | tf.ones_like(
524 | eval_dict[input_data_fields.groundtruth_classes][indx],
525 | dtype=tf.float32),
526 | axis=0),
527 | category_index,
528 | original_image_spatial_shape=tf.expand_dims(
529 | eval_dict[input_data_fields.original_image_spatial_shape][indx],
530 | axis=0),
531 | true_image_shape=tf.expand_dims(
532 | eval_dict[input_data_fields.true_image_shape][indx], axis=0),
533 | instance_masks=groundtruth_instance_masks,
534 | keypoints=None,
535 | max_boxes_to_draw=None,
536 | min_score_thresh=0.0,
537 | use_normalized_coordinates=use_normalized_coordinates)
538 | images_with_detections_list.append(
539 | tf.concat([images_with_detections, images_with_groundtruth], axis=2))
540 | return images_with_detections_list
541 |
542 |
543 | def draw_keypoints_on_image_array(image,
544 | keypoints,
545 | color='red',
546 | radius=2,
547 | use_normalized_coordinates=True):
548 | """Draws keypoints on an image (numpy array).
549 |
550 | Args:
551 | image: a numpy array with shape [height, width, 3].
552 | keypoints: a numpy array with shape [num_keypoints, 2].
553 | color: color to draw the keypoints with. Default is red.
554 | radius: keypoint radius. Default value is 2.
555 | use_normalized_coordinates: if True (default), treat keypoint values as
556 | relative to the image. Otherwise treat them as absolute.
557 | """
558 | image_pil = Image.fromarray(np.uint8(image)).convert('RGB')
559 | draw_keypoints_on_image(image_pil, keypoints, color, radius,
560 | use_normalized_coordinates)
561 | np.copyto(image, np.array(image_pil))
562 |
563 |
564 | def draw_keypoints_on_image(image,
565 | keypoints,
566 | color='red',
567 | radius=2,
568 | use_normalized_coordinates=True):
569 | """Draws keypoints on an image.
570 |
571 | Args:
572 | image: a PIL.Image object.
573 | keypoints: a numpy array with shape [num_keypoints, 2].
574 | color: color to draw the keypoints with. Default is red.
575 | radius: keypoint radius. Default value is 2.
576 | use_normalized_coordinates: if True (default), treat keypoint values as
577 | relative to the image. Otherwise treat them as absolute.
578 | """
579 | draw = ImageDraw.Draw(image)
580 | im_width, im_height = image.size
581 | keypoints_x = [k[1] for k in keypoints]
582 | keypoints_y = [k[0] for k in keypoints]
583 | if use_normalized_coordinates:
584 | keypoints_x = tuple([im_width * x for x in keypoints_x])
585 | keypoints_y = tuple([im_height * y for y in keypoints_y])
586 | for keypoint_x, keypoint_y in zip(keypoints_x, keypoints_y):
587 | draw.ellipse([(keypoint_x - radius, keypoint_y - radius),
588 | (keypoint_x + radius, keypoint_y + radius)],
589 | outline=color, fill=color)
590 |
591 |
592 | def draw_mask_on_image_array(image, mask, color='red', alpha=0.4):
593 | """Draws mask on an image.
594 |
595 | Args:
596 | image: uint8 numpy array with shape (img_height, img_width, 3)
597 | mask: a uint8 numpy array of shape (img_height, img_width) with
598 | values of either 0 or 1.
599 | color: color to draw the mask with. Default is red.
600 | alpha: transparency value between 0 and 1. (default: 0.4)
601 |
602 | Raises:
603 | ValueError: On incorrect data type for image or masks.
604 | """
605 | if image.dtype != np.uint8:
606 | raise ValueError('`image` not of type np.uint8')
607 | if mask.dtype != np.uint8:
608 | raise ValueError('`mask` not of type np.uint8')
609 | if np.any(np.logical_and(mask != 1, mask != 0)):
610 | raise ValueError('`mask` elements should be in [0, 1]')
611 | if image.shape[:2] != mask.shape:
612 | raise ValueError('The image has spatial dimensions %s but the mask has '
613 | 'dimensions %s' % (image.shape[:2], mask.shape))
614 | rgb = ImageColor.getrgb(color)
615 | pil_image = Image.fromarray(image)
616 |
617 | solid_color = np.expand_dims(
618 | np.ones_like(mask), axis=2) * np.reshape(list(rgb), [1, 1, 3])
619 | pil_solid_color = Image.fromarray(np.uint8(solid_color)).convert('RGBA')
620 | pil_mask = Image.fromarray(np.uint8(255.0*alpha*mask)).convert('L')
621 | pil_image = Image.composite(pil_solid_color, pil_image, pil_mask)
622 | np.copyto(image, np.array(pil_image.convert('RGB')))
623 |
624 |
625 | def visualize_boxes_and_labels_on_image_array(
626 | image,
627 | # image_path,
628 | # output_path,
629 | boxes,
630 | classes,
631 | scores,
632 | category_index,
633 | instance_masks=None,
634 | instance_boundaries=None,
635 | keypoints=None,
636 | use_normalized_coordinates=False,
637 | max_boxes_to_draw=20,
638 | min_score_thresh=.5,
639 | agnostic_mode=False,
640 | line_thickness=4,
641 | groundtruth_box_visualization_color='black',
642 | skip_scores=False,
643 | skip_labels=False):
644 | """Overlay labeled boxes on an image with formatted scores and label names.
645 |
646 | This function groups boxes that correspond to the same location
647 | and creates a display string for each detection and overlays these
648 | on the image. Note that this function modifies the image in place, and returns
649 | that same image.
650 |
651 | Args:
652 | image: uint8 numpy array with shape (img_height, img_width, 3)
653 | boxes: a numpy array of shape [N, 4]
654 | classes: a numpy array of shape [N]. Note that class indices are 1-based,
655 | and match the keys in the label map.
656 | scores: a numpy array of shape [N] or None. If scores=None, then
657 | this function assumes that the boxes to be plotted are groundtruth
658 | boxes and plot all boxes as black with no classes or scores.
659 | category_index: a dict containing category dictionaries (each holding
660 | category index `id` and category name `name`) keyed by category indices.
661 | instance_masks: a numpy array of shape [N, image_height, image_width] with
662 | values ranging between 0 and 1, can be None.
663 | instance_boundaries: a numpy array of shape [N, image_height, image_width]
664 | with values ranging between 0 and 1, can be None.
665 | keypoints: a numpy array of shape [N, num_keypoints, 2], can
666 | be None
667 | use_normalized_coordinates: whether boxes is to be interpreted as
668 | normalized coordinates or not.
669 | max_boxes_to_draw: maximum number of boxes to visualize. If None, draw
670 | all boxes.
671 | min_score_thresh: minimum score threshold for a box to be visualized
672 | agnostic_mode: boolean (default: False) controlling whether to evaluate in
673 | class-agnostic mode or not. This mode will display scores but ignore
674 | classes.
675 | line_thickness: integer (default: 4) controlling line width of the boxes.
676 | groundtruth_box_visualization_color: box color for visualizing groundtruth
677 | boxes
678 | skip_scores: whether to skip score when drawing a single detection
679 | skip_labels: whether to skip label when drawing a single detection
680 |
681 | Returns:
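682 | A tuple (image, box_to_color_map, box_to_display_str_map), as unpacked by detection() in detection_var_image.py. image is the uint8 numpy array with shape (img_height, img_width, 3) with overlaid boxes.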
683 | """
684 | # Create a display string (and color) for every box location, group any boxes
685 | # that correspond to the same location.
686 | box_to_display_str_map = collections.defaultdict(list)
687 | box_to_color_map = collections.defaultdict(str)
688 | box_to_instance_masks_map = {}
689 | box_to_instance_boundaries_map = {}
690 | box_to_keypoints_map = collections.defaultdict(list)
691 | if not max_boxes_to_draw:
692 | max_boxes_to_draw = boxes.shape[0]
693 | for i in range(min(max_boxes_to_draw, boxes.shape[0])):
694 | if scores is None or scores[i] > min_score_thresh:
695 | box = tuple(boxes[i].tolist())
696 | if instance_masks is not None:
697 | box_to_instance_masks_map[box] = instance_masks[i]
698 | if instance_boundaries is not None:
699 | box_to_instance_boundaries_map[box] = instance_boundaries[i]
700 | if keypoints is not None:
701 | box_to_keypoints_map[box].extend(keypoints[i])
702 | if scores is None:
703 | box_to_color_map[box] = groundtruth_box_visualization_color
704 | else:
705 | display_str = ''
706 | if not skip_labels:
707 | if not agnostic_mode:
708 | if classes[i] in category_index.keys():
709 | class_name = category_index[classes[i]]['name']
710 | else:
711 | class_name = 'N/A'
712 | display_str = str(class_name)
713 | if not skip_scores:
714 | if not display_str:
715 | display_str = '{}%'.format(int(100*scores[i]))
716 | else:
717 | display_str = '{}: {}%'.format(display_str, int(100*scores[i]))
718 | box_to_display_str_map[box].append(display_str)
719 | if agnostic_mode:
720 | box_to_color_map[box] = 'DarkOrange'
721 | else:
722 | box_to_color_map[box] = STANDARD_COLORS[
723 | classes[i] % len(STANDARD_COLORS)]
724 |
725 |
726 | # # Crop bounding boxes to split images; moved out of this file for OCR.
727 |
728 | # # Convert coordinates to raw pixels.
729 | # t = 0
730 | # for box, color in box_to_color_map.items():
731 | # ymin, xmin, ymax, xmax = box
732 |
733 | # img = Image.fromarray(np.uint8(image))
734 | # im_width, im_height = img.size
735 | # new_xmin = int(xmin * im_width)
736 | # new_xmax = int(xmax * im_width)
737 | # new_ymin = int(ymin * im_height)
738 | # new_ymax = int(ymax * im_height)
739 |
740 | # img_n = box_to_display_str_map[box][0]
741 | # img_name = img_n.replace(': ','-')
742 |
743 | # # Crop image. Note that the PIL and NumPy coordinate orders are reversed.
744 | # image_corp = image[new_ymin:new_ymax,new_xmin:new_xmax]
745 |
746 | # image_corp = Image.fromarray(np.uint8(image_corp))
747 |
748 | # # Save the cropped image to output_path and include the output label in the name.
749 | # if img_name.find('container_number') >= 0:
750 | # t += 1
751 | # image_corp.save(os.path.join(output_path) + '/' +img_name +'_' + (str(t)+'_') + os.path.basename(image_path))
752 |
753 |
754 | # Draw all boxes onto image.
755 | for box, color in box_to_color_map.items():
756 | ymin, xmin, ymax, xmax = box
757 | if instance_masks is not None:
758 | draw_mask_on_image_array(
759 | image,
760 | box_to_instance_masks_map[box],
761 | color=color
762 | )
763 | if instance_boundaries is not None:
764 | draw_mask_on_image_array(
765 | image,
766 | box_to_instance_boundaries_map[box],
767 | color='red',
768 | alpha=1.0
769 | )
770 |
771 | draw_bounding_box_on_image_array(
772 | image,
773 | ymin,
774 | xmin,
775 | ymax,
776 | xmax,
777 | color=color,
778 | thickness=line_thickness,
779 | display_str_list=box_to_display_str_map[box],
780 | use_normalized_coordinates=use_normalized_coordinates)
781 | if keypoints is not None:
782 | print(box_to_keypoints_map[box])
783 | draw_keypoints_on_image_array(
784 | image,
785 | box_to_keypoints_map[box],
786 | color=color,
787 | radius=line_thickness / 2,
788 | use_normalized_coordinates=use_normalized_coordinates)
789 |
790 | return image, box_to_color_map, box_to_display_str_map
791 |
792 |
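The function above returns the box maps alongside the image so that cropping can happen outside this file. The following is a minimal sketch of how a caller might turn the returned `box_to_display_str_map` into pixel crop regions, assuming normalized `(ymin, xmin, ymax, xmax)` boxes. The helper name `pixel_crop_regions` and its `label_substring` parameter are hypothetical and not part of this repository.

```python
def pixel_crop_regions(box_to_display_str_map, im_width, im_height,
                       label_substring='container_number'):
    """Return (name, (left, upper, right, lower)) for boxes whose label matches.

    Hypothetical helper: boxes are assumed to be normalized
    (ymin, xmin, ymax, xmax) tuples, i.e. use_normalized_coordinates=True.
    """
    regions = []
    for box, display_strs in box_to_display_str_map.items():
        if not display_strs:
            continue  # boxes without a display string, e.g. groundtruth boxes
        label = display_strs[0]
        if label_substring not in label:
            continue
        ymin, xmin, ymax, xmax = box
        regions.append((
            label.replace(': ', '-'),  # filename-friendly label
            (int(xmin * im_width), int(ymin * im_height),
             int(xmax * im_width), int(ymax * im_height)),
        ))
    return regions


# Illustrative data only; real maps come from the visualization function.
example_map = {
    (0.25, 0.125, 0.75, 0.5): ['container_number: 87%'],
    (0.0, 0.0, 1.0, 1.0): ['container: 99%'],
    (0.5, 0.5, 1.0, 1.0): [],
}
regions = pixel_crop_regions(example_map, im_width=800, im_height=400)
```

The region tuples use the (left, upper, right, lower) order that `PIL.Image.crop` expects. A NumPy slice of the same region would instead be `image[upper:lower, left:right]`.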
793 | def add_cdf_image_summary(values, name):
794 | """Adds a tf.summary.image for a CDF plot of the values.
795 |
796 | Normalizes `values` such that they sum to 1, plots the cumulative distribution
797 | function and creates a tf image summary.
798 |
799 | Args:
800 | values: a 1-D float32 tensor containing the values.
801 | name: name for the image summary.
802 | """
803 | def cdf_plot(values):
804 | """Numpy function to plot CDF."""
805 | normalized_values = values / np.sum(values)
806 | sorted_values = np.sort(normalized_values)
807 | cumulative_values = np.cumsum(sorted_values)
808 | fraction_of_examples = (np.arange(cumulative_values.size, dtype=np.float32)
809 | / cumulative_values.size)
810 | fig = plt.figure(frameon=False)
811 | ax = fig.add_subplot(111)
812 | ax.plot(fraction_of_examples, cumulative_values)
813 | ax.set_ylabel('cumulative normalized values')
814 | ax.set_xlabel('fraction of examples')
815 | fig.canvas.draw()
816 | width, height = fig.get_size_inches() * fig.get_dpi()
817 | image = np.frombuffer(fig.canvas.tostring_rgb(), dtype='uint8').reshape(
818 | 1, int(height), int(width), 3)
819 | return image
820 | cdf_plot = tf.py_func(cdf_plot, [values], tf.uint8)
821 | tf.summary.image(name, cdf_plot)
822 |
823 |
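The numeric part of `cdf_plot` (normalize, sort, accumulate) can be checked without TensorFlow or Matplotlib. Below is a minimal sketch using only the standard library in place of NumPy. The function name `cdf_points` is hypothetical, and the zero-sum guard is an added assumption that the original code does not include.

```python
from itertools import accumulate


def cdf_points(values):
    """Return (fraction_of_examples, cumulative_values) as two lists.

    Mirrors the steps in cdf_plot: divide by the sum, sort ascending,
    then take the running total.
    """
    total = sum(values)
    if total == 0:
        # Assumption: reject input that cannot be normalized.
        raise ValueError('values must have a non-zero sum')
    normalized = sorted(v / total for v in values)
    cumulative = list(accumulate(normalized))
    n = len(cumulative)
    fractions = [i / n for i in range(n)]  # same as np.arange(n) / n
    return fractions, cumulative


fractions, cumulative = cdf_points([4.0, 1.0, 3.0])
```

Because the values are sorted before accumulation, the curve is convex and always ends at 1.0 for non-negative input.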
824 | def add_hist_image_summary(values, bins, name):
825 | """Adds a tf.summary.image for a histogram plot of the values.
826 |
827 | Plots the histogram of values and creates a tf image summary.
828 |
829 | Args:
830 | values: a 1-D float32 tensor containing the values.
831 | bins: bin edges which will be directly passed to np.histogram.
832 | name: name for the image summary.
833 | """
834 |
835 | def hist_plot(values, bins):
836 | """Numpy function to plot hist."""
837 | fig = plt.figure(frameon=False)
838 | ax = fig.add_subplot(111)
839 | y, x = np.histogram(values, bins=bins)
840 | ax.plot(x[:-1], y)
841 | ax.set_ylabel('count')
842 | ax.set_xlabel('value')
843 | fig.canvas.draw()
844 | width, height = fig.get_size_inches() * fig.get_dpi()
845 | image = np.frombuffer(
846 | fig.canvas.tostring_rgb(), dtype='uint8').reshape(
847 | 1, int(height), int(width), 3)
848 | return image
849 | hist_plot = tf.py_func(hist_plot, [values, bins], tf.uint8)
850 | tf.summary.image(name, hist_plot)
851 |
852 |
853 | class EvalMetricOpsVisualization(object):
854 | """Abstract base class responsible for visualizations during evaluation.
855 |
856 | Currently, summary images are not run during evaluation. One way to produce
857 | evaluation images in Tensorboard is to provide tf.summary.image strings as
858 | `value_ops` in tf.estimator.EstimatorSpec's `eval_metric_ops`. This class is
859 | responsible for accruing images (with overlaid detections and groundtruth)
860 | and returning a dictionary that can be passed to `eval_metric_ops`.
861 | """
862 | __metaclass__ = abc.ABCMeta
863 |
864 | def __init__(self,
865 | category_index,
866 | max_examples_to_draw=5,
867 | max_boxes_to_draw=20,
868 | min_score_thresh=0.2,
869 | use_normalized_coordinates=True,
870 | summary_name_prefix='evaluation_image'):
871 | """Creates an EvalMetricOpsVisualization.
872 |
873 | Args:
874 | category_index: A category index (dictionary) produced from a labelmap.
875 | max_examples_to_draw: The maximum number of example summaries to produce.
876 | max_boxes_to_draw: The maximum number of boxes to draw for detections.
877 | min_score_thresh: The minimum score threshold for showing detections.
878 | use_normalized_coordinates: Whether to assume boxes and keypoints are in
879 | normalized coordinates (as opposed to absolute coordinates).
880 | Default is True.
881 | summary_name_prefix: A string prefix for each image summary.
882 | """
883 |
884 | self._category_index = category_index
885 | self._max_examples_to_draw = max_examples_to_draw
886 | self._max_boxes_to_draw = max_boxes_to_draw
887 | self._min_score_thresh = min_score_thresh
888 | self._use_normalized_coordinates = use_normalized_coordinates
889 | self._summary_name_prefix = summary_name_prefix
890 | self._images = []
891 |
892 | def clear(self):
893 | self._images = []
894 |
895 | def add_images(self, images):
896 | """Store a list of images, each with shape [1, H, W, C]."""
897 | if len(self._images) >= self._max_examples_to_draw:
898 | return
899 |
900 | # Store images and clip list if necessary.
901 | self._images.extend(images)
902 | if len(self._images) > self._max_examples_to_draw:
903 | self._images[self._max_examples_to_draw:] = []
904 |
905 | def get_estimator_eval_metric_ops(self, eval_dict):
906 | """Returns metric ops for use in tf.estimator.EstimatorSpec.
907 |
908 | Args:
909 | eval_dict: A dictionary that holds an image, groundtruth, and detections
910 | for a batched example. Note that we use only the first example for
911 | visualization. See eval_util.result_dict_for_batched_example() for a
912 | convenient method for constructing such a dictionary. The dictionary
913 | contains
914 | fields.InputDataFields.original_image: [batch_size, H, W, 3] image.
915 | fields.InputDataFields.original_image_spatial_shape: [batch_size, 2]
916 | tensor containing the size of the original image.
917 | fields.InputDataFields.true_image_shape: [batch_size, 3]
918 | tensor containing the spatial size of the unpadded original image.
919 | fields.InputDataFields.groundtruth_boxes - [batch_size, num_boxes, 4]
920 | float32 tensor with groundtruth boxes in range [0.0, 1.0].
921 | fields.InputDataFields.groundtruth_classes - [batch_size, num_boxes]
922 | int64 tensor with 1-indexed groundtruth classes.
923 | fields.InputDataFields.groundtruth_instance_masks - (optional)
924 | [batch_size, num_boxes, H, W] int64 tensor with instance masks.
925 | fields.DetectionResultFields.detection_boxes - [batch_size,
926 | max_num_boxes, 4] float32 tensor with detection boxes in range [0.0,
927 | 1.0].
928 | fields.DetectionResultFields.detection_classes - [batch_size,
929 | max_num_boxes] int64 tensor with 1-indexed detection classes.
930 | fields.DetectionResultFields.detection_scores - [batch_size,
931 | max_num_boxes] float32 tensor with detection scores.
932 | fields.DetectionResultFields.detection_masks - (optional) [batch_size,
933 | max_num_boxes, H, W] float32 tensor of binarized masks.
934 | fields.DetectionResultFields.detection_keypoints - (optional)
935 | [batch_size, max_num_boxes, num_keypoints, 2] float32 tensor with
936 | keypoints.
937 |
938 | Returns:
939 | A dictionary of image summary names to tuple of (value_op, update_op). The
940 | `update_op` is the same for all items in the dictionary, and is
941 | responsible for saving a single side-by-side image with detections and
942 | groundtruth. Each `value_op` holds the tf.summary.image string for a given
943 | image.
944 | """
945 | if self._max_examples_to_draw == 0:
946 | return {}
947 | images = self.images_from_evaluation_dict(eval_dict)
948 |
949 | def get_images():
950 | """Returns a list of images, padded to self._max_examples_to_draw."""
951 | images = self._images
952 | while len(images) < self._max_examples_to_draw:
953 | images.append(np.array(0, dtype=np.uint8))
954 | self.clear()
955 | return images
956 |
957 | def image_summary_or_default_string(summary_name, image):
958 | """Returns image summaries for non-padded elements."""
959 | return tf.cond(
960 | tf.equal(tf.size(tf.shape(image)), 4),
961 | lambda: tf.summary.image(summary_name, image),
962 | lambda: tf.constant(''))
963 |
964 | update_op = tf.py_func(self.add_images, [[images[0]]], [])
965 | image_tensors = tf.py_func(
966 | get_images, [], [tf.uint8] * self._max_examples_to_draw)
967 | eval_metric_ops = {}
968 | for i, image in enumerate(image_tensors):
969 | summary_name = self._summary_name_prefix + '/' + str(i)
970 | value_op = image_summary_or_default_string(summary_name, image)
971 | eval_metric_ops[summary_name] = (value_op, update_op)
972 | return eval_metric_ops
973 |
974 | @abc.abstractmethod
975 | def images_from_evaluation_dict(self, eval_dict):
976 | """Converts evaluation dictionary into a list of image tensors.
977 |
978 | To be overridden by implementations.
979 |
980 | Args:
981 | eval_dict: A dictionary with all the necessary information for producing
982 | visualizations.
983 |
984 | Returns:
985 | A list of [1, H, W, C] uint8 tensors.
986 | """
987 | raise NotImplementedError
988 |
989 |
990 | class VisualizeSingleFrameDetections(EvalMetricOpsVisualization):
991 | """Class responsible for single-frame object detection visualizations."""
992 |
993 | def __init__(self,
994 | category_index,
995 | max_examples_to_draw=5,
996 | max_boxes_to_draw=20,
997 | min_score_thresh=0.2,
998 | use_normalized_coordinates=True,
999 | summary_name_prefix='Detections_Left_Groundtruth_Right'):
1000 | super(VisualizeSingleFrameDetections, self).__init__(
1001 | category_index=category_index,
1002 | max_examples_to_draw=max_examples_to_draw,
1003 | max_boxes_to_draw=max_boxes_to_draw,
1004 | min_score_thresh=min_score_thresh,
1005 | use_normalized_coordinates=use_normalized_coordinates,
1006 | summary_name_prefix=summary_name_prefix)
1007 |
1008 | def images_from_evaluation_dict(self, eval_dict):
1009 | return draw_side_by_side_evaluation_image(
1010 | eval_dict, self._category_index, self._max_boxes_to_draw,
1011 | self._min_score_thresh, self._use_normalized_coordinates)
1012 |
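The accrue, clip, pad and clear behaviour of `add_images` and `get_images` in `EvalMetricOpsVisualization` can be exercised without TensorFlow. The class below is a minimal sketch of that buffering logic only. The name `BoundedImageBuffer` and the `placeholder` argument are hypothetical, and `None` stands in for the scalar `np.array(0, dtype=np.uint8)` that the original uses as padding.

```python
class BoundedImageBuffer(object):
    """Keeps at most `max_examples_to_draw` items between reads."""

    def __init__(self, max_examples_to_draw=5, placeholder=None):
        self._max_examples_to_draw = max_examples_to_draw
        self._placeholder = placeholder  # stand-in for the padding image
        self._images = []

    def add_images(self, images):
        """Store images, dropping anything beyond the configured maximum."""
        if len(self._images) >= self._max_examples_to_draw:
            return
        self._images.extend(images)
        del self._images[self._max_examples_to_draw:]

    def get_images(self):
        """Return exactly `max_examples_to_draw` items and reset the buffer."""
        images = self._images
        while len(images) < self._max_examples_to_draw:
            images.append(self._placeholder)
        self._images = []  # next evaluation run starts empty
        return images


buf = BoundedImageBuffer(max_examples_to_draw=3)
buf.add_images(['a', 'b'])
buf.add_images(['c', 'd'])   # 'd' is clipped
buf.add_images(['e'])        # ignored, buffer already full
first_read = buf.get_images()
second_read = buf.get_images()
```

Padding to a fixed length matters in the original because `tf.py_func` is declared with a fixed number of outputs. The padded entries are then recognized by their rank and mapped to an empty summary string.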
--------------------------------------------------------------------------------