├── LICENSE
├── README.md
├── convert.py
├── deep_sort
│   ├── __init__.py
│   ├── detection.py
│   ├── iou_matching.py
│   ├── kalman_filter.py
│   ├── linear_assignment.py
│   ├── nn_matching.py
│   ├── preprocessing.py
│   ├── track.py
│   └── tracker.py
├── detection.txt
├── main.py
├── model_data
│   ├── coco_classes.txt
│   ├── market1501.pb
│   ├── mars-small128.pb
│   ├── mars.pb
│   ├── obj.txt
│   ├── voc_classes.txt
│   ├── yolo3_object.names
│   ├── yolo_anchors.txt
│   └── yolov3.cfg
├── output
│   ├── result.png
│   ├── st1_vedio_person_output.avi
│   └── st1_vedio_person_output.gif
├── requirements.txt
├── tools
│   ├── freeze_model.py
│   └── generate_detections.py
├── vedio
│   └── test1_vedio.avi
├── yolo.py
└── yolo3
    ├── model.py
    └── utils.py
/LICENSE: -------------------------------------------------------------------------------- 1 | GNU GENERAL PUBLIC LICENSE 2 | Version 3, 29 June 2007 3 | 4 | Copyright (C) 2007 Free Software Foundation, Inc. <https://fsf.org/> 5 | Everyone is permitted to copy and distribute verbatim copies 6 | of this license document, but changing it is not allowed. 7 | 8 | Preamble 9 | 10 | The GNU General Public License is a free, copyleft license for 11 | software and other kinds of works. 12 | 13 | The licenses for most software and other practical works are designed 14 | to take away your freedom to share and change the works. By contrast, 15 | the GNU General Public License is intended to guarantee your freedom to 16 | share and change all versions of a program--to make sure it remains free 17 | software for all its users. We, the Free Software Foundation, use the 18 | GNU General Public License for most of our software; it applies also to 19 | any other work released this way by its authors. You can apply it to 20 | your programs, too. 21 | 22 | When we speak of free software, we are referring to freedom, not 23 | price. Our General Public Licenses are designed to make sure that you 24 | have the freedom to distribute copies of free software (and charge for 25 | them if you wish), that you receive source code or can get it if you 26 | want it, that you can change the software or use pieces of it in new 27 | free programs, and that you know you can do these things. 28 | 29 | To protect your rights, we need to prevent others from denying you 30 | these rights or asking you to surrender the rights. Therefore, you have 31 | certain responsibilities if you distribute copies of the software, or if 32 | you modify it: responsibilities to respect the freedom of others. 33 | 34 | For example, if you distribute copies of such a program, whether 35 | gratis or for a fee, you must pass on to the recipients the same 36 | freedoms that you received. You must make sure that they, too, receive 37 | or can get the source code. And you must show them these terms so they 38 | know their rights. 39 | 40 | Developers that use the GNU GPL protect your rights with two steps: 41 | (1) assert copyright on the software, and (2) offer you this License 42 | giving you legal permission to copy, distribute and/or modify it. 43 | 44 | For the developers' and authors' protection, the GPL clearly explains 45 | that there is no warranty for this free software. For both users' and 46 | authors' sake, the GPL requires that modified versions be marked as 47 | changed, so that their problems will not be attributed erroneously to 48 | authors of previous versions. 49 | 50 | Some devices are designed to deny users access to install or run 51 | modified versions of the software inside them, although the manufacturer 52 | can do so.
This is fundamentally incompatible with the aim of 53 | protecting users' freedom to change the software. The systematic 54 | pattern of such abuse occurs in the area of products for individuals to 55 | use, which is precisely where it is most unacceptable. Therefore, we 56 | have designed this version of the GPL to prohibit the practice for those 57 | products. If such problems arise substantially in other domains, we 58 | stand ready to extend this provision to those domains in future versions 59 | of the GPL, as needed to protect the freedom of users. 60 | 61 | Finally, every program is threatened constantly by software patents. 62 | States should not allow patents to restrict development and use of 63 | software on general-purpose computers, but in those that do, we wish to 64 | avoid the special danger that patents applied to a free program could 65 | make it effectively proprietary. To prevent this, the GPL assures that 66 | patents cannot be used to render the program non-free. 67 | 68 | The precise terms and conditions for copying, distribution and 69 | modification follow. 70 | 71 | TERMS AND CONDITIONS 72 | 73 | 0. Definitions. 74 | 75 | "This License" refers to version 3 of the GNU General Public License. 76 | 77 | "Copyright" also means copyright-like laws that apply to other kinds of 78 | works, such as semiconductor masks. 79 | 80 | "The Program" refers to any copyrightable work licensed under this 81 | License. Each licensee is addressed as "you". "Licensees" and 82 | "recipients" may be individuals or organizations. 83 | 84 | To "modify" a work means to copy from or adapt all or part of the work 85 | in a fashion requiring copyright permission, other than the making of an 86 | exact copy. The resulting work is called a "modified version" of the 87 | earlier work or a work "based on" the earlier work. 88 | 89 | A "covered work" means either the unmodified Program or a work based 90 | on the Program. 91 | 92 | To "propagate" a work means to do anything with it that, without 93 | permission, would make you directly or secondarily liable for 94 | infringement under applicable copyright law, except executing it on a 95 | computer or modifying a private copy. Propagation includes copying, 96 | distribution (with or without modification), making available to the 97 | public, and in some countries other activities as well. 98 | 99 | To "convey" a work means any kind of propagation that enables other 100 | parties to make or receive copies. Mere interaction with a user through 101 | a computer network, with no transfer of a copy, is not conveying. 102 | 103 | An interactive user interface displays "Appropriate Legal Notices" 104 | to the extent that it includes a convenient and prominently visible 105 | feature that (1) displays an appropriate copyright notice, and (2) 106 | tells the user that there is no warranty for the work (except to the 107 | extent that warranties are provided), that licensees may convey the 108 | work under this License, and how to view a copy of this License. If 109 | the interface presents a list of user commands or options, such as a 110 | menu, a prominent item in the list meets this criterion. 111 | 112 | 1. Source Code. 113 | 114 | The "source code" for a work means the preferred form of the work 115 | for making modifications to it. "Object code" means any non-source 116 | form of a work. 
117 | 118 | A "Standard Interface" means an interface that either is an official 119 | standard defined by a recognized standards body, or, in the case of 120 | interfaces specified for a particular programming language, one that 121 | is widely used among developers working in that language. 122 | 123 | The "System Libraries" of an executable work include anything, other 124 | than the work as a whole, that (a) is included in the normal form of 125 | packaging a Major Component, but which is not part of that Major 126 | Component, and (b) serves only to enable use of the work with that 127 | Major Component, or to implement a Standard Interface for which an 128 | implementation is available to the public in source code form. A 129 | "Major Component", in this context, means a major essential component 130 | (kernel, window system, and so on) of the specific operating system 131 | (if any) on which the executable work runs, or a compiler used to 132 | produce the work, or an object code interpreter used to run it. 133 | 134 | The "Corresponding Source" for a work in object code form means all 135 | the source code needed to generate, install, and (for an executable 136 | work) run the object code and to modify the work, including scripts to 137 | control those activities. However, it does not include the work's 138 | System Libraries, or general-purpose tools or generally available free 139 | programs which are used unmodified in performing those activities but 140 | which are not part of the work. For example, Corresponding Source 141 | includes interface definition files associated with source files for 142 | the work, and the source code for shared libraries and dynamically 143 | linked subprograms that the work is specifically designed to require, 144 | such as by intimate data communication or control flow between those 145 | subprograms and other parts of the work. 146 | 147 | The Corresponding Source need not include anything that users 148 | can regenerate automatically from other parts of the Corresponding 149 | Source. 150 | 151 | The Corresponding Source for a work in source code form is that 152 | same work. 153 | 154 | 2. Basic Permissions. 155 | 156 | All rights granted under this License are granted for the term of 157 | copyright on the Program, and are irrevocable provided the stated 158 | conditions are met. This License explicitly affirms your unlimited 159 | permission to run the unmodified Program. The output from running a 160 | covered work is covered by this License only if the output, given its 161 | content, constitutes a covered work. This License acknowledges your 162 | rights of fair use or other equivalent, as provided by copyright law. 163 | 164 | You may make, run and propagate covered works that you do not 165 | convey, without conditions so long as your license otherwise remains 166 | in force. You may convey covered works to others for the sole purpose 167 | of having them make modifications exclusively for you, or provide you 168 | with facilities for running those works, provided that you comply with 169 | the terms of this License in conveying all material for which you do 170 | not control copyright. Those thus making or running the covered works 171 | for you must do so exclusively on your behalf, under your direction 172 | and control, on terms that prohibit them from making any copies of 173 | your copyrighted material outside their relationship with you. 
174 | 175 | Conveying under any other circumstances is permitted solely under 176 | the conditions stated below. Sublicensing is not allowed; section 10 177 | makes it unnecessary. 178 | 179 | 3. Protecting Users' Legal Rights From Anti-Circumvention Law. 180 | 181 | No covered work shall be deemed part of an effective technological 182 | measure under any applicable law fulfilling obligations under article 183 | 11 of the WIPO copyright treaty adopted on 20 December 1996, or 184 | similar laws prohibiting or restricting circumvention of such 185 | measures. 186 | 187 | When you convey a covered work, you waive any legal power to forbid 188 | circumvention of technological measures to the extent such circumvention 189 | is effected by exercising rights under this License with respect to 190 | the covered work, and you disclaim any intention to limit operation or 191 | modification of the work as a means of enforcing, against the work's 192 | users, your or third parties' legal rights to forbid circumvention of 193 | technological measures. 194 | 195 | 4. Conveying Verbatim Copies. 196 | 197 | You may convey verbatim copies of the Program's source code as you 198 | receive it, in any medium, provided that you conspicuously and 199 | appropriately publish on each copy an appropriate copyright notice; 200 | keep intact all notices stating that this License and any 201 | non-permissive terms added in accord with section 7 apply to the code; 202 | keep intact all notices of the absence of any warranty; and give all 203 | recipients a copy of this License along with the Program. 204 | 205 | You may charge any price or no price for each copy that you convey, 206 | and you may offer support or warranty protection for a fee. 207 | 208 | 5. Conveying Modified Source Versions. 209 | 210 | You may convey a work based on the Program, or the modifications to 211 | produce it from the Program, in the form of source code under the 212 | terms of section 4, provided that you also meet all of these conditions: 213 | 214 | a) The work must carry prominent notices stating that you modified 215 | it, and giving a relevant date. 216 | 217 | b) The work must carry prominent notices stating that it is 218 | released under this License and any conditions added under section 219 | 7. This requirement modifies the requirement in section 4 to 220 | "keep intact all notices". 221 | 222 | c) You must license the entire work, as a whole, under this 223 | License to anyone who comes into possession of a copy. This 224 | License will therefore apply, along with any applicable section 7 225 | additional terms, to the whole of the work, and all its parts, 226 | regardless of how they are packaged. This License gives no 227 | permission to license the work in any other way, but it does not 228 | invalidate such permission if you have separately received it. 229 | 230 | d) If the work has interactive user interfaces, each must display 231 | Appropriate Legal Notices; however, if the Program has interactive 232 | interfaces that do not display Appropriate Legal Notices, your 233 | work need not make them do so. 
234 | 235 | A compilation of a covered work with other separate and independent 236 | works, which are not by their nature extensions of the covered work, 237 | and which are not combined with it such as to form a larger program, 238 | in or on a volume of a storage or distribution medium, is called an 239 | "aggregate" if the compilation and its resulting copyright are not 240 | used to limit the access or legal rights of the compilation's users 241 | beyond what the individual works permit. Inclusion of a covered work 242 | in an aggregate does not cause this License to apply to the other 243 | parts of the aggregate. 244 | 245 | 6. Conveying Non-Source Forms. 246 | 247 | You may convey a covered work in object code form under the terms 248 | of sections 4 and 5, provided that you also convey the 249 | machine-readable Corresponding Source under the terms of this License, 250 | in one of these ways: 251 | 252 | a) Convey the object code in, or embodied in, a physical product 253 | (including a physical distribution medium), accompanied by the 254 | Corresponding Source fixed on a durable physical medium 255 | customarily used for software interchange. 256 | 257 | b) Convey the object code in, or embodied in, a physical product 258 | (including a physical distribution medium), accompanied by a 259 | written offer, valid for at least three years and valid for as 260 | long as you offer spare parts or customer support for that product 261 | model, to give anyone who possesses the object code either (1) a 262 | copy of the Corresponding Source for all the software in the 263 | product that is covered by this License, on a durable physical 264 | medium customarily used for software interchange, for a price no 265 | more than your reasonable cost of physically performing this 266 | conveying of source, or (2) access to copy the 267 | Corresponding Source from a network server at no charge. 268 | 269 | c) Convey individual copies of the object code with a copy of the 270 | written offer to provide the Corresponding Source. This 271 | alternative is allowed only occasionally and noncommercially, and 272 | only if you received the object code with such an offer, in accord 273 | with subsection 6b. 274 | 275 | d) Convey the object code by offering access from a designated 276 | place (gratis or for a charge), and offer equivalent access to the 277 | Corresponding Source in the same way through the same place at no 278 | further charge. You need not require recipients to copy the 279 | Corresponding Source along with the object code. If the place to 280 | copy the object code is a network server, the Corresponding Source 281 | may be on a different server (operated by you or a third party) 282 | that supports equivalent copying facilities, provided you maintain 283 | clear directions next to the object code saying where to find the 284 | Corresponding Source. Regardless of what server hosts the 285 | Corresponding Source, you remain obligated to ensure that it is 286 | available for as long as needed to satisfy these requirements. 287 | 288 | e) Convey the object code using peer-to-peer transmission, provided 289 | you inform other peers where the object code and Corresponding 290 | Source of the work are being offered to the general public at no 291 | charge under subsection 6d. 292 | 293 | A separable portion of the object code, whose source code is excluded 294 | from the Corresponding Source as a System Library, need not be 295 | included in conveying the object code work. 
296 | 297 | A "User Product" is either (1) a "consumer product", which means any 298 | tangible personal property which is normally used for personal, family, 299 | or household purposes, or (2) anything designed or sold for incorporation 300 | into a dwelling. In determining whether a product is a consumer product, 301 | doubtful cases shall be resolved in favor of coverage. For a particular 302 | product received by a particular user, "normally used" refers to a 303 | typical or common use of that class of product, regardless of the status 304 | of the particular user or of the way in which the particular user 305 | actually uses, or expects or is expected to use, the product. A product 306 | is a consumer product regardless of whether the product has substantial 307 | commercial, industrial or non-consumer uses, unless such uses represent 308 | the only significant mode of use of the product. 309 | 310 | "Installation Information" for a User Product means any methods, 311 | procedures, authorization keys, or other information required to install 312 | and execute modified versions of a covered work in that User Product from 313 | a modified version of its Corresponding Source. The information must 314 | suffice to ensure that the continued functioning of the modified object 315 | code is in no case prevented or interfered with solely because 316 | modification has been made. 317 | 318 | If you convey an object code work under this section in, or with, or 319 | specifically for use in, a User Product, and the conveying occurs as 320 | part of a transaction in which the right of possession and use of the 321 | User Product is transferred to the recipient in perpetuity or for a 322 | fixed term (regardless of how the transaction is characterized), the 323 | Corresponding Source conveyed under this section must be accompanied 324 | by the Installation Information. But this requirement does not apply 325 | if neither you nor any third party retains the ability to install 326 | modified object code on the User Product (for example, the work has 327 | been installed in ROM). 328 | 329 | The requirement to provide Installation Information does not include a 330 | requirement to continue to provide support service, warranty, or updates 331 | for a work that has been modified or installed by the recipient, or for 332 | the User Product in which it has been modified or installed. Access to a 333 | network may be denied when the modification itself materially and 334 | adversely affects the operation of the network or violates the rules and 335 | protocols for communication across the network. 336 | 337 | Corresponding Source conveyed, and Installation Information provided, 338 | in accord with this section must be in a format that is publicly 339 | documented (and with an implementation available to the public in 340 | source code form), and must require no special password or key for 341 | unpacking, reading or copying. 342 | 343 | 7. Additional Terms. 344 | 345 | "Additional permissions" are terms that supplement the terms of this 346 | License by making exceptions from one or more of its conditions. 347 | Additional permissions that are applicable to the entire Program shall 348 | be treated as though they were included in this License, to the extent 349 | that they are valid under applicable law. 
If additional permissions 350 | apply only to part of the Program, that part may be used separately 351 | under those permissions, but the entire Program remains governed by 352 | this License without regard to the additional permissions. 353 | 354 | When you convey a copy of a covered work, you may at your option 355 | remove any additional permissions from that copy, or from any part of 356 | it. (Additional permissions may be written to require their own 357 | removal in certain cases when you modify the work.) You may place 358 | additional permissions on material, added by you to a covered work, 359 | for which you have or can give appropriate copyright permission. 360 | 361 | Notwithstanding any other provision of this License, for material you 362 | add to a covered work, you may (if authorized by the copyright holders of 363 | that material) supplement the terms of this License with terms: 364 | 365 | a) Disclaiming warranty or limiting liability differently from the 366 | terms of sections 15 and 16 of this License; or 367 | 368 | b) Requiring preservation of specified reasonable legal notices or 369 | author attributions in that material or in the Appropriate Legal 370 | Notices displayed by works containing it; or 371 | 372 | c) Prohibiting misrepresentation of the origin of that material, or 373 | requiring that modified versions of such material be marked in 374 | reasonable ways as different from the original version; or 375 | 376 | d) Limiting the use for publicity purposes of names of licensors or 377 | authors of the material; or 378 | 379 | e) Declining to grant rights under trademark law for use of some 380 | trade names, trademarks, or service marks; or 381 | 382 | f) Requiring indemnification of licensors and authors of that 383 | material by anyone who conveys the material (or modified versions of 384 | it) with contractual assumptions of liability to the recipient, for 385 | any liability that these contractual assumptions directly impose on 386 | those licensors and authors. 387 | 388 | All other non-permissive additional terms are considered "further 389 | restrictions" within the meaning of section 10. If the Program as you 390 | received it, or any part of it, contains a notice stating that it is 391 | governed by this License along with a term that is a further 392 | restriction, you may remove that term. If a license document contains 393 | a further restriction but permits relicensing or conveying under this 394 | License, you may add to a covered work material governed by the terms 395 | of that license document, provided that the further restriction does 396 | not survive such relicensing or conveying. 397 | 398 | If you add terms to a covered work in accord with this section, you 399 | must place, in the relevant source files, a statement of the 400 | additional terms that apply to those files, or a notice indicating 401 | where to find the applicable terms. 402 | 403 | Additional terms, permissive or non-permissive, may be stated in the 404 | form of a separately written license, or stated as exceptions; 405 | the above requirements apply either way. 406 | 407 | 8. Termination. 408 | 409 | You may not propagate or modify a covered work except as expressly 410 | provided under this License. Any attempt otherwise to propagate or 411 | modify it is void, and will automatically terminate your rights under 412 | this License (including any patent licenses granted under the third 413 | paragraph of section 11). 
414 | 415 | However, if you cease all violation of this License, then your 416 | license from a particular copyright holder is reinstated (a) 417 | provisionally, unless and until the copyright holder explicitly and 418 | finally terminates your license, and (b) permanently, if the copyright 419 | holder fails to notify you of the violation by some reasonable means 420 | prior to 60 days after the cessation. 421 | 422 | Moreover, your license from a particular copyright holder is 423 | reinstated permanently if the copyright holder notifies you of the 424 | violation by some reasonable means, this is the first time you have 425 | received notice of violation of this License (for any work) from that 426 | copyright holder, and you cure the violation prior to 30 days after 427 | your receipt of the notice. 428 | 429 | Termination of your rights under this section does not terminate the 430 | licenses of parties who have received copies or rights from you under 431 | this License. If your rights have been terminated and not permanently 432 | reinstated, you do not qualify to receive new licenses for the same 433 | material under section 10. 434 | 435 | 9. Acceptance Not Required for Having Copies. 436 | 437 | You are not required to accept this License in order to receive or 438 | run a copy of the Program. Ancillary propagation of a covered work 439 | occurring solely as a consequence of using peer-to-peer transmission 440 | to receive a copy likewise does not require acceptance. However, 441 | nothing other than this License grants you permission to propagate or 442 | modify any covered work. These actions infringe copyright if you do 443 | not accept this License. Therefore, by modifying or propagating a 444 | covered work, you indicate your acceptance of this License to do so. 445 | 446 | 10. Automatic Licensing of Downstream Recipients. 447 | 448 | Each time you convey a covered work, the recipient automatically 449 | receives a license from the original licensors, to run, modify and 450 | propagate that work, subject to this License. You are not responsible 451 | for enforcing compliance by third parties with this License. 452 | 453 | An "entity transaction" is a transaction transferring control of an 454 | organization, or substantially all assets of one, or subdividing an 455 | organization, or merging organizations. If propagation of a covered 456 | work results from an entity transaction, each party to that 457 | transaction who receives a copy of the work also receives whatever 458 | licenses to the work the party's predecessor in interest had or could 459 | give under the previous paragraph, plus a right to possession of the 460 | Corresponding Source of the work from the predecessor in interest, if 461 | the predecessor has it or can get it with reasonable efforts. 462 | 463 | You may not impose any further restrictions on the exercise of the 464 | rights granted or affirmed under this License. For example, you may 465 | not impose a license fee, royalty, or other charge for exercise of 466 | rights granted under this License, and you may not initiate litigation 467 | (including a cross-claim or counterclaim in a lawsuit) alleging that 468 | any patent claim is infringed by making, using, selling, offering for 469 | sale, or importing the Program or any portion of it. 470 | 471 | 11. Patents. 472 | 473 | A "contributor" is a copyright holder who authorizes use under this 474 | License of the Program or a work on which the Program is based. 
The 475 | work thus licensed is called the contributor's "contributor version". 476 | 477 | A contributor's "essential patent claims" are all patent claims 478 | owned or controlled by the contributor, whether already acquired or 479 | hereafter acquired, that would be infringed by some manner, permitted 480 | by this License, of making, using, or selling its contributor version, 481 | but do not include claims that would be infringed only as a 482 | consequence of further modification of the contributor version. For 483 | purposes of this definition, "control" includes the right to grant 484 | patent sublicenses in a manner consistent with the requirements of 485 | this License. 486 | 487 | Each contributor grants you a non-exclusive, worldwide, royalty-free 488 | patent license under the contributor's essential patent claims, to 489 | make, use, sell, offer for sale, import and otherwise run, modify and 490 | propagate the contents of its contributor version. 491 | 492 | In the following three paragraphs, a "patent license" is any express 493 | agreement or commitment, however denominated, not to enforce a patent 494 | (such as an express permission to practice a patent or covenant not to 495 | sue for patent infringement). To "grant" such a patent license to a 496 | party means to make such an agreement or commitment not to enforce a 497 | patent against the party. 498 | 499 | If you convey a covered work, knowingly relying on a patent license, 500 | and the Corresponding Source of the work is not available for anyone 501 | to copy, free of charge and under the terms of this License, through a 502 | publicly available network server or other readily accessible means, 503 | then you must either (1) cause the Corresponding Source to be so 504 | available, or (2) arrange to deprive yourself of the benefit of the 505 | patent license for this particular work, or (3) arrange, in a manner 506 | consistent with the requirements of this License, to extend the patent 507 | license to downstream recipients. "Knowingly relying" means you have 508 | actual knowledge that, but for the patent license, your conveying the 509 | covered work in a country, or your recipient's use of the covered work 510 | in a country, would infringe one or more identifiable patents in that 511 | country that you have reason to believe are valid. 512 | 513 | If, pursuant to or in connection with a single transaction or 514 | arrangement, you convey, or propagate by procuring conveyance of, a 515 | covered work, and grant a patent license to some of the parties 516 | receiving the covered work authorizing them to use, propagate, modify 517 | or convey a specific copy of the covered work, then the patent license 518 | you grant is automatically extended to all recipients of the covered 519 | work and works based on it. 520 | 521 | A patent license is "discriminatory" if it does not include within 522 | the scope of its coverage, prohibits the exercise of, or is 523 | conditioned on the non-exercise of one or more of the rights that are 524 | specifically granted under this License. 
You may not convey a covered 525 | work if you are a party to an arrangement with a third party that is 526 | in the business of distributing software, under which you make payment 527 | to the third party based on the extent of your activity of conveying 528 | the work, and under which the third party grants, to any of the 529 | parties who would receive the covered work from you, a discriminatory 530 | patent license (a) in connection with copies of the covered work 531 | conveyed by you (or copies made from those copies), or (b) primarily 532 | for and in connection with specific products or compilations that 533 | contain the covered work, unless you entered into that arrangement, 534 | or that patent license was granted, prior to 28 March 2007. 535 | 536 | Nothing in this License shall be construed as excluding or limiting 537 | any implied license or other defenses to infringement that may 538 | otherwise be available to you under applicable patent law. 539 | 540 | 12. No Surrender of Others' Freedom. 541 | 542 | If conditions are imposed on you (whether by court order, agreement or 543 | otherwise) that contradict the conditions of this License, they do not 544 | excuse you from the conditions of this License. If you cannot convey a 545 | covered work so as to satisfy simultaneously your obligations under this 546 | License and any other pertinent obligations, then as a consequence you may 547 | not convey it at all. For example, if you agree to terms that obligate you 548 | to collect a royalty for further conveying from those to whom you convey 549 | the Program, the only way you could satisfy both those terms and this 550 | License would be to refrain entirely from conveying the Program. 551 | 552 | 13. Use with the GNU Affero General Public License. 553 | 554 | Notwithstanding any other provision of this License, you have 555 | permission to link or combine any covered work with a work licensed 556 | under version 3 of the GNU Affero General Public License into a single 557 | combined work, and to convey the resulting work. The terms of this 558 | License will continue to apply to the part which is the covered work, 559 | but the special requirements of the GNU Affero General Public License, 560 | section 13, concerning interaction through a network will apply to the 561 | combination as such. 562 | 563 | 14. Revised Versions of this License. 564 | 565 | The Free Software Foundation may publish revised and/or new versions of 566 | the GNU General Public License from time to time. Such new versions will 567 | be similar in spirit to the present version, but may differ in detail to 568 | address new problems or concerns. 569 | 570 | Each version is given a distinguishing version number. If the 571 | Program specifies that a certain numbered version of the GNU General 572 | Public License "or any later version" applies to it, you have the 573 | option of following the terms and conditions either of that numbered 574 | version or of any later version published by the Free Software 575 | Foundation. If the Program does not specify a version number of the 576 | GNU General Public License, you may choose any version ever published 577 | by the Free Software Foundation. 578 | 579 | If the Program specifies that a proxy can decide which future 580 | versions of the GNU General Public License can be used, that proxy's 581 | public statement of acceptance of a version permanently authorizes you 582 | to choose that version for the Program. 
583 | 584 | Later license versions may give you additional or different 585 | permissions. However, no additional obligations are imposed on any 586 | author or copyright holder as a result of your choosing to follow a 587 | later version. 588 | 589 | 15. Disclaimer of Warranty. 590 | 591 | THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY 592 | APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT 593 | HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY 594 | OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, 595 | THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR 596 | PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM 597 | IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF 598 | ALL NECESSARY SERVICING, REPAIR OR CORRECTION. 599 | 600 | 16. Limitation of Liability. 601 | 602 | IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING 603 | WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS 604 | THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY 605 | GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE 606 | USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF 607 | DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD 608 | PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), 609 | EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF 610 | SUCH DAMAGES. 611 | 612 | 17. Interpretation of Sections 15 and 16. 613 | 614 | If the disclaimer of warranty and limitation of liability provided 615 | above cannot be given local legal effect according to their terms, 616 | reviewing courts shall apply local law that most closely approximates 617 | an absolute waiver of all civil liability in connection with the 618 | Program, unless a warranty or assumption of liability accompanies a 619 | copy of the Program in return for a fee. 620 | 621 | END OF TERMS AND CONDITIONS 622 | 623 | How to Apply These Terms to Your New Programs 624 | 625 | If you develop a new program, and you want it to be of the greatest 626 | possible use to the public, the best way to achieve this is to make it 627 | free software which everyone can redistribute and change under these terms. 628 | 629 | To do so, attach the following notices to the program. It is safest 630 | to attach them to the start of each source file to most effectively 631 | state the exclusion of warranty; and each file should have at least 632 | the "copyright" line and a pointer to where the full notice is found. 633 | 634 | {one line to give the program's name and a brief idea of what it does.} 635 | Copyright (C) {year} {name of author} 636 | 637 | This program is free software: you can redistribute it and/or modify 638 | it under the terms of the GNU General Public License as published by 639 | the Free Software Foundation, either version 3 of the License, or 640 | (at your option) any later version. 641 | 642 | This program is distributed in the hope that it will be useful, 643 | but WITHOUT ANY WARRANTY; without even the implied warranty of 644 | MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 645 | GNU General Public License for more details. 646 | 647 | You should have received a copy of the GNU General Public License 648 | along with this program. If not, see <https://www.gnu.org/licenses/>.
649 | 650 | Also add information on how to contact you by electronic and paper mail. 651 | 652 | If the program does terminal interaction, make it output a short 653 | notice like this when it starts in an interactive mode: 654 | 655 | {project} Copyright (C) {year} {fullname} 656 | This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'. 657 | This is free software, and you are welcome to redistribute it 658 | under certain conditions; type `show c' for details. 659 | 660 | The hypothetical commands `show w' and `show c' should show the appropriate 661 | parts of the General Public License. Of course, your program's commands 662 | might be different; for a GUI interface, you would use an "about box". 663 | 664 | You should also get your employer (if you work as a programmer) or school, 665 | if any, to sign a "copyright disclaimer" for the program, if necessary. 666 | For more information on this, and how to apply and follow the GNU GPL, see 667 | <https://www.gnu.org/licenses/>. 668 | 669 | The GNU General Public License does not permit incorporating your program 670 | into proprietary programs. If your program is a subroutine library, you 671 | may consider it more useful to permit linking proprietary applications with 672 | the library. If this is what you want to do, use the GNU Lesser General 673 | Public License instead of this License. But first, please read 674 | <https://www.gnu.org/licenses/why-not-lgpl.html>. 675 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # YOLOv3 + Deep_SORT 2 | Multi-class, multi-object detection and tracking (with per-class counting) using YOLOv3 + Deep_SORT 3 | 4 | 5 | 6 | ## Requirement 7 | * OpenCV 8 | * keras 9 | * NumPy 10 | * scikit-learn 11 | * Pillow 12 | * tensorflow-gpu 1.10.0 13 | *** 14 | 15 | It uses: 16 | 17 | * __Detection__: [YOLOv3](https://github.com/qqwweee/keras-yolo3) to detect objects in each video frame (you can also train the YOLOv3 model on your own data). 18 | 19 | * __Tracking__: [Deep_SORT](https://github.com/nwojke/deep_sort) to track those objects across frames. 20 | 21 | *This repository contains code for Simple Online and Realtime Tracking with a Deep Association Metric (Deep SORT). We extend the original SORT algorithm to integrate appearance information based on a deep appearance descriptor. See the [arXiv preprint](https://arxiv.org/abs/1703.07402) for more information.* 22 | 23 | ## Quick Start 24 | 25 | __0. Requirements__ 26 | 27 | pip install -r requirements.txt 28 | 29 | __1. Download the code to your computer.__ 30 | 31 | git clone https://github.com/xiaoxiong74/Object-Detection-and-Tracking.git 32 | 33 | __2. Download [[yolov3.weights]](https://pjreddie.com/media/files/yolov3.weights)__ and place it in `model_data/`. 34 | 35 | *Alternatively, you can download my trained [[yolo-spp.h5]](https://pan.baidu.com/s/1DoiifwXrss1QgSQBp2vv8w&shfl=shareset) weights (extraction code: `t13k`) for detecting person/car/bicycle, etc.* 36 | 37 | __3. Convert the Darknet YOLO model to a Keras model:__ 38 | ``` 39 | $ python convert.py model_data/yolov3.cfg model_data/yolov3.weights model_data/yolo.h5 40 | ``` 41 | __4. Run YOLO + Deep SORT:__ 42 | 43 | ``` 44 | $ python main.py -c [CLASS NAME] -i [INPUT VIDEO PATH] 45 | 46 | $ python main.py -c person -i ./vedio/test1_vedio.avi 47 | ``` 48 | 49 | __5. 
To track your own object classes, edit [yolo.py] `__Line 129__`:__ 50 | 51 | ``` 52 | if predicted_class != 'person' and predicted_class != 'bicycle': 53 | print(predicted_class) 54 | continue 55 | ``` 56 | and adapt [main.py] `__Line 108__` and `__Line 123__` to the same classes: 57 | ``` 58 | # __Line 108__: save each class's track_ids separately 59 | if class_name == ['person']: 60 | counter1.append(int(track.track_id)) 61 | if class_name == ['bicycle']: 62 | counter2.append(int(track.track_id)) 63 | 64 | # __Line 123__: count each class in the current frame separately 65 | if class_name == ['person']: 66 | i1 = i1 + 1 67 | else: 68 | i2 = i2 + 1 69 | 70 | ``` 71 | and update the corresponding display text in [main.py] `__Line 146__` and `__Line 175__` 72 | 73 | 74 | ## Train on Market1501 & MARS 75 | *People Re-identification model* 76 | 77 | Use [cosine_metric_learning](https://github.com/nwojke/cosine_metric_learning) to train a metric feature representation for use with the deep_sort tracker. 78 | 79 | ## Citation 80 | 81 | ### YOLOv3 : 82 | 83 | @article{yolov3, 84 | title={YOLOv3: An Incremental Improvement}, 85 | author={Redmon, Joseph and Farhadi, Ali}, 86 | journal = {arXiv}, 87 | year={2018} 88 | } 89 | 90 | ### Deep_SORT : 91 | 92 | @inproceedings{Wojke2017simple, 93 | title={Simple Online and Realtime Tracking with a Deep Association Metric}, 94 | author={Wojke, Nicolai and Bewley, Alex and Paulus, Dietrich}, 95 | booktitle={2017 IEEE International Conference on Image Processing (ICIP)}, 96 | year={2017}, 97 | pages={3645--3649}, 98 | organization={IEEE}, 99 | doi={10.1109/ICIP.2017.8296962} 100 | } 101 | 102 | @inproceedings{Wojke2018deep, 103 | title={Deep Cosine Metric Learning for Person Re-identification}, 104 | author={Wojke, Nicolai and Bewley, Alex}, 105 | booktitle={2018 IEEE Winter Conference on Applications of Computer Vision (WACV)}, 106 | year={2018}, 107 | pages={748--756}, 108 | organization={IEEE}, 109 | doi={10.1109/WACV.2018.00087} 110 | } 111 | 112 | ## Reference 113 | #### Github: deep_sort @ [Nicolai Wojke (nwojke)](https://github.com/nwojke/deep_sort) 114 | #### Github: deep_sort_yolov3 @ [Qidian213](https://github.com/Qidian213/deep_sort_yolov3) 115 | 116 | 117 | 118 | -------------------------------------------------------------------------------- /convert.py: -------------------------------------------------------------------------------- 1 | #! /usr/bin/env python 2 | """ 3 | Reads Darknet config and weights and creates a Keras model with TF backend. 
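Typical usage (mirrors Quick Start step 3 of the README):
    python convert.py model_data/yolov3.cfg model_data/yolov3.weights model_data/yolo.h5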
4 | 5 | """ 6 | 7 | import argparse 8 | import configparser 9 | import io 10 | import os 11 | from collections import defaultdict 12 | 13 | import numpy as np 14 | from keras import backend as K 15 | from keras.layers import (Conv2D, Input, ZeroPadding2D, Add, 16 | UpSampling2D, MaxPooling2D, Concatenate) 17 | from keras.layers.advanced_activations import LeakyReLU 18 | from keras.layers.normalization import BatchNormalization 19 | from keras.models import Model 20 | from keras.regularizers import l2 21 | from keras.utils.vis_utils import plot_model as plot 22 | 23 | 24 | parser = argparse.ArgumentParser(description='Darknet To Keras Converter.') 25 | parser.add_argument('config_path', help='Path to Darknet cfg file.') 26 | parser.add_argument('weights_path', help='Path to Darknet weights file.') 27 | parser.add_argument('output_path', help='Path to output Keras model file.') 28 | parser.add_argument( 29 | '-p', 30 | '--plot_model', 31 | help='Plot generated Keras model and save as image.', 32 | action='store_true') 33 | parser.add_argument( 34 | '-w', 35 | '--weights_only', 36 | help='Save as Keras weights file instead of model file.', 37 | action='store_true') 38 | 39 | def unique_config_sections(config_file): 40 | """Convert all config sections to have unique names. 41 | 42 | Adds unique suffixes to config sections for compatibility with configparser. 43 | """ 44 | section_counters = defaultdict(int) 45 | output_stream = io.StringIO() 46 | with open(config_file) as fin: 47 | for line in fin: 48 | if line.startswith('['): 49 | section = line.strip().strip('[]') 50 | _section = section + '_' + str(section_counters[section]) 51 | section_counters[section] += 1 52 | line = line.replace(section, _section) 53 | output_stream.write(line) 54 | output_stream.seek(0) 55 | return output_stream 56 | 57 | # %% 58 | def _main(args): 59 | config_path = os.path.expanduser(args.config_path) 60 | weights_path = os.path.expanduser(args.weights_path) 61 | assert config_path.endswith('.cfg'), '{} is not a .cfg file'.format( 62 | config_path) 63 | assert weights_path.endswith( 64 | '.weights'), '{} is not a .weights file'.format(weights_path) 65 | 66 | output_path = os.path.expanduser(args.output_path) 67 | assert output_path.endswith( 68 | '.h5'), 'output path {} is not a .h5 file'.format(output_path) 69 | output_root = os.path.splitext(output_path)[0] 70 | 71 | # Load weights and config. 
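# A Darknet .weights file begins with a short header: three int32 values
# (major, minor, revision) followed by an images-seen counter that is
# int64 for format versions >= 0.2 and int32 before that (hence the
# version check below); the raw float32 layer weights follow the header.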
72 | print('Loading weights.') 73 | weights_file = open(weights_path, 'rb') 74 | major, minor, revision = np.ndarray( 75 | shape=(3, ), dtype='int32', buffer=weights_file.read(12)) 76 | if (major*10+minor)>=2 and major<1000 and minor<1000: 77 | seen = np.ndarray(shape=(1,), dtype='int64', buffer=weights_file.read(8)) 78 | else: 79 | seen = np.ndarray(shape=(1,), dtype='int32', buffer=weights_file.read(4)) 80 | print('Weights Header: ', major, minor, revision, seen) 81 | 82 | print('Parsing Darknet config.') 83 | unique_config_file = unique_config_sections(config_path) 84 | cfg_parser = configparser.ConfigParser() 85 | cfg_parser.read_file(unique_config_file) 86 | 87 | print('Creating Keras model.') 88 | input_layer = Input(shape=(None, None, 3)) 89 | prev_layer = input_layer 90 | all_layers = [] 91 | 92 | weight_decay = float(cfg_parser['net_0']['decay'] 93 | ) if 'net_0' in cfg_parser.sections() else 5e-4 94 | count = 0 95 | out_index = [] 96 | for section in cfg_parser.sections(): 97 | print('Parsing section {}'.format(section)) 98 | if section.startswith('convolutional'): 99 | filters = int(cfg_parser[section]['filters']) 100 | size = int(cfg_parser[section]['size']) 101 | stride = int(cfg_parser[section]['stride']) 102 | pad = int(cfg_parser[section]['pad']) 103 | activation = cfg_parser[section]['activation'] 104 | batch_normalize = 'batch_normalize' in cfg_parser[section] 105 | 106 | padding = 'same' if pad == 1 and stride == 1 else 'valid' 107 | 108 | # Setting weights. 109 | # Darknet serializes convolutional weights as: 110 | # [bias/beta, [gamma, mean, variance], conv_weights] 111 | prev_layer_shape = K.int_shape(prev_layer) 112 | 113 | weights_shape = (size, size, prev_layer_shape[-1], filters) 114 | darknet_w_shape = (filters, weights_shape[2], size, size) 115 | weights_size = np.product(weights_shape) 116 | 117 | print('conv2d', 'bn' 118 | if batch_normalize else ' ', activation, weights_shape) 119 | 120 | conv_bias = np.ndarray( 121 | shape=(filters, ), 122 | dtype='float32', 123 | buffer=weights_file.read(filters * 4)) 124 | count += filters 125 | 126 | if batch_normalize: 127 | bn_weights = np.ndarray( 128 | shape=(3, filters), 129 | dtype='float32', 130 | buffer=weights_file.read(filters * 12)) 131 | count += 3 * filters 132 | 133 | bn_weight_list = [ 134 | bn_weights[0], # scale gamma 135 | conv_bias, # shift beta 136 | bn_weights[1], # running mean 137 | bn_weights[2] # running var 138 | ] 139 | 140 | conv_weights = np.ndarray( 141 | shape=darknet_w_shape, 142 | dtype='float32', 143 | buffer=weights_file.read(weights_size * 4)) 144 | count += weights_size 145 | 146 | # DarkNet conv_weights are serialized Caffe-style: 147 | # (out_dim, in_dim, height, width) 148 | # We would like to set these to Tensorflow order: 149 | # (height, width, in_dim, out_dim) 150 | conv_weights = np.transpose(conv_weights, [2, 3, 1, 0]) 151 | conv_weights = [conv_weights] if batch_normalize else [ 152 | conv_weights, conv_bias 153 | ] 154 | 155 | # Handle activation. 156 | act_fn = None 157 | if activation == 'leaky': 158 | pass # Add advanced activation later. 
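# LeakyReLU is a standalone Keras layer rather than a string activation,
# so it cannot be passed to Conv2D via `activation=`; the layer is
# appended after the optional BatchNormalization further below.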
159 | elif activation != 'linear': 160 | raise ValueError( 161 | 'Unknown activation function `{}` in section {}'.format( 162 | activation, section)) 163 | 164 | # Create Conv2D layer 165 | if stride>1: 166 | # Darknet uses left and top padding instead of 'same' mode 167 | prev_layer = ZeroPadding2D(((1,0),(1,0)))(prev_layer) 168 | conv_layer = (Conv2D( 169 | filters, (size, size), 170 | strides=(stride, stride), 171 | kernel_regularizer=l2(weight_decay), 172 | use_bias=not batch_normalize, 173 | weights=conv_weights, 174 | activation=act_fn, 175 | padding=padding))(prev_layer) 176 | 177 | if batch_normalize: 178 | conv_layer = (BatchNormalization( 179 | weights=bn_weight_list))(conv_layer) 180 | prev_layer = conv_layer 181 | 182 | if activation == 'linear': 183 | all_layers.append(prev_layer) 184 | elif activation == 'leaky': 185 | act_layer = LeakyReLU(alpha=0.1)(prev_layer) 186 | prev_layer = act_layer 187 | all_layers.append(act_layer) 188 | 189 | elif section.startswith('route'): 190 | ids = [int(i) for i in cfg_parser[section]['layers'].split(',')] 191 | layers = [all_layers[i] for i in ids] 192 | if len(layers) > 1: 193 | print('Concatenating route layers:', layers) 194 | concatenate_layer = Concatenate()(layers) 195 | all_layers.append(concatenate_layer) 196 | prev_layer = concatenate_layer 197 | else: 198 | skip_layer = layers[0] # only one layer to route 199 | all_layers.append(skip_layer) 200 | prev_layer = skip_layer 201 | 202 | elif section.startswith('maxpool'): 203 | size = int(cfg_parser[section]['size']) 204 | stride = int(cfg_parser[section]['stride']) 205 | all_layers.append( 206 | MaxPooling2D( 207 | pool_size=(size, size), 208 | strides=(stride, stride), 209 | padding='same')(prev_layer)) 210 | prev_layer = all_layers[-1] 211 | 212 | elif section.startswith('shortcut'): 213 | index = int(cfg_parser[section]['from']) 214 | activation = cfg_parser[section]['activation'] 215 | assert activation == 'linear', 'Only linear activation supported.' 216 | all_layers.append(Add()([all_layers[index], prev_layer])) 217 | prev_layer = all_layers[-1] 218 | 219 | elif section.startswith('upsample'): 220 | stride = int(cfg_parser[section]['stride']) 221 | assert stride == 2, 'Only stride=2 supported.' 222 | all_layers.append(UpSampling2D(stride)(prev_layer)) 223 | prev_layer = all_layers[-1] 224 | 225 | elif section.startswith('yolo'): 226 | out_index.append(len(all_layers)-1) 227 | all_layers.append(None) 228 | prev_layer = all_layers[-1] 229 | 230 | elif section.startswith('net'): 231 | pass 232 | 233 | else: 234 | raise ValueError( 235 | 'Unsupported section header type: {}'.format(section)) 236 | 237 | # Create and save model. 238 | if len(out_index)==0: out_index.append(len(all_layers)-1) 239 | model = Model(inputs=input_layer, outputs=[all_layers[i] for i in out_index]) 240 | print(model.summary()) 241 | if args.weights_only: 242 | model.save_weights('{}'.format(output_path)) 243 | print('Saved Keras weights to {}'.format(output_path)) 244 | else: 245 | model.save('{}'.format(output_path)) 246 | print('Saved Keras model to {}'.format(output_path)) 247 | 248 | # Check to see if all weights have been read. 
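# Every serialized weight is a 4-byte float32, so the leftover byte count
# divided by 4 is the number of unread weight values; a non-zero remainder
# usually indicates that the .cfg does not match the .weights file.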
249 | remaining_weights = len(weights_file.read()) / 4 250 | weights_file.close() 251 | print('Read {} of {} from Darknet weights.'.format(count, count + 252 | remaining_weights)) 253 | if remaining_weights > 0: 254 | print('Warning: {} unused weights'.format(remaining_weights)) 255 | 256 | if args.plot_model: 257 | plot(model, to_file='{}.png'.format(output_root), show_shapes=True) 258 | print('Saved model plot to {}.png'.format(output_root)) 259 | 260 | 261 | if __name__ == '__main__': 262 | _main(parser.parse_args()) 263 | -------------------------------------------------------------------------------- /deep_sort/__init__.py: -------------------------------------------------------------------------------- 1 | # vim: expandtab:ts=4:sw=4 2 | -------------------------------------------------------------------------------- /deep_sort/detection.py: -------------------------------------------------------------------------------- 1 | # vim: expandtab:ts=4:sw=4 2 | import numpy as np 3 | 4 | 5 | class Detection(object): 6 | """ 7 | This class represents a bounding box detection in a single image. 8 | 9 | Parameters 10 | ---------- 11 | tlwh : array_like 12 | Bounding box in format `(x, y, w, h)`. 13 | confidence : float 14 | Detector confidence score. 15 | feature : array_like 16 | A feature vector that describes the object contained in this image. 17 | 18 | Attributes 19 | ---------- 20 | tlwh : ndarray 21 | Bounding box in format `(top left x, top left y, width, height)`. 22 | confidence : float 23 | Detector confidence score. 24 | feature : ndarray | NoneType 25 | A feature vector that describes the object contained in this image. 26 | 27 | """ 28 | 29 | def __init__(self, tlwh, confidence, feature): 30 | self.tlwh = np.asarray(tlwh, dtype=float) 31 | self.confidence = float(confidence) 32 | self.feature = np.asarray(feature, dtype=np.float32) 33 | 34 | def to_tlbr(self): 35 | """Convert bounding box to format `(min x, min y, max x, max y)`, i.e., 36 | `(top left, bottom right)`. 37 | """ 38 | ret = self.tlwh.copy() 39 | ret[2:] += ret[:2] 40 | return ret 41 | 42 | def to_xyah(self): 43 | """Convert bounding box to format `(center x, center y, aspect ratio, 44 | height)`, where the aspect ratio is `width / height`. 45 | """ 46 | ret = self.tlwh.copy() 47 | ret[:2] += ret[2:] / 2 48 | ret[2] /= ret[3] 49 | return ret 50 | -------------------------------------------------------------------------------- /deep_sort/iou_matching.py: -------------------------------------------------------------------------------- 1 | # vim: expandtab:ts=4:sw=4 2 | from __future__ import absolute_import 3 | import numpy as np 4 | from . import linear_assignment 5 | 6 | 7 | def iou(bbox, candidates): 8 | """Compute intersection over union. 9 | 10 | Parameters 11 | ---------- 12 | bbox : ndarray 13 | A bounding box in format `(top left x, top left y, width, height)`. 14 | candidates : ndarray 15 | A matrix of candidate bounding boxes (one per row) in the same format 16 | as `bbox`. 17 | 18 | Returns 19 | ------- 20 | ndarray 21 | The intersection over union in [0, 1] between the `bbox` and each 22 | candidate. A higher score means a larger fraction of the `bbox` is 23 | occluded by the candidate. 
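Concretely, each entry is computed as
    iou = area_intersection / (area_bbox + area_candidate - area_intersection),
using the areas of the axis-aligned boxes and of their intersection.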
24 | 25 | """ 26 | bbox_tl, bbox_br = bbox[:2], bbox[:2] + bbox[2:] 27 | candidates_tl = candidates[:, :2] 28 | candidates_br = candidates[:, :2] + candidates[:, 2:] 29 | 30 | tl = np.c_[np.maximum(bbox_tl[0], candidates_tl[:, 0])[:, np.newaxis], 31 | np.maximum(bbox_tl[1], candidates_tl[:, 1])[:, np.newaxis]] 32 | br = np.c_[np.minimum(bbox_br[0], candidates_br[:, 0])[:, np.newaxis], 33 | np.minimum(bbox_br[1], candidates_br[:, 1])[:, np.newaxis]] 34 | wh = np.maximum(0., br - tl) 35 | 36 | area_intersection = wh.prod(axis=1) 37 | area_bbox = bbox[2:].prod() 38 | area_candidates = candidates[:, 2:].prod(axis=1) 39 | return area_intersection / (area_bbox + area_candidates - area_intersection) 40 | 41 | 42 | def iou_cost(tracks, detections, track_indices=None, 43 | detection_indices=None): 44 | """An intersection over union distance metric. 45 | 46 | Parameters 47 | ---------- 48 | tracks : List[deep_sort.track.Track] 49 | A list of tracks. 50 | detections : List[deep_sort.detection.Detection] 51 | A list of detections. 52 | track_indices : Optional[List[int]] 53 | A list of indices to tracks that should be matched. Defaults to 54 | all `tracks`. 55 | detection_indices : Optional[List[int]] 56 | A list of indices to detections that should be matched. Defaults 57 | to all `detections`. 58 | 59 | Returns 60 | ------- 61 | ndarray 62 | Returns a cost matrix of shape 63 | len(track_indices), len(detection_indices) where entry (i, j) is 64 | `1 - iou(tracks[track_indices[i]], detections[detection_indices[j]])`. 65 | 66 | """ 67 | if track_indices is None: 68 | track_indices = np.arange(len(tracks)) 69 | if detection_indices is None: 70 | detection_indices = np.arange(len(detections)) 71 | 72 | cost_matrix = np.zeros((len(track_indices), len(detection_indices))) 73 | for row, track_idx in enumerate(track_indices): 74 | if tracks[track_idx].time_since_update > 1: 75 | cost_matrix[row, :] = linear_assignment.INFTY_COST 76 | continue 77 | 78 | bbox = tracks[track_idx].to_tlwh() 79 | candidates = np.asarray([detections[i].tlwh for i in detection_indices]) 80 | cost_matrix[row, :] = 1. - iou(bbox, candidates) 81 | return cost_matrix 82 | -------------------------------------------------------------------------------- /deep_sort/kalman_filter.py: -------------------------------------------------------------------------------- 1 | # vim: expandtab:ts=4:sw=4 2 | import numpy as np 3 | import scipy.linalg 4 | 5 | 6 | """ 7 | Table for the 0.95 quantile of the chi-square distribution with N degrees of 8 | freedom (contains values for N=1, ..., 9). Taken from MATLAB/Octave's chi2inv 9 | function and used as Mahalanobis gating threshold. 10 | """ 11 | chi2inv95 = { 12 | 1: 3.8415, 13 | 2: 5.9915, 14 | 3: 7.8147, 15 | 4: 9.4877, 16 | 5: 11.070, 17 | 6: 12.592, 18 | 7: 14.067, 19 | 8: 15.507, 20 | 9: 16.919} 21 | 22 | 23 | class KalmanFilter(object): 24 | """ 25 | A simple Kalman filter for tracking bounding boxes in image space. 26 | 27 | The 8-dimensional state space 28 | 29 | x, y, a, h, vx, vy, va, vh 30 | 31 | contains the bounding box center position (x, y), aspect ratio a, height h, 32 | and their respective velocities. 33 | 34 | Object motion follows a constant velocity model. The bounding box location 35 | (x, y, a, h) is taken as direct observation of the state space (linear 36 | observation model). 37 | 38 | """ 39 | 40 | def __init__(self): 41 | ndim, dt = 4, 1. 42 | 43 | # Create Kalman filter model matrices. 
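# `_motion_mat` is the 8x8 constant-velocity transition matrix F: each
# position component (x, y, a, h) is advanced by dt times its velocity.
# `_update_mat` is the 4x8 observation matrix H that projects the full
# state down to the measured (x, y, a, h) components.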
44 | self._motion_mat = np.eye(2 * ndim, 2 * ndim) 45 | for i in range(ndim): 46 | self._motion_mat[i, ndim + i] = dt 47 | self._update_mat = np.eye(ndim, 2 * ndim) 48 | 49 | # Motion and observation uncertainty are chosen relative to the current 50 | # state estimate. These weights control the amount of uncertainty in 51 | # the model. This is a bit hacky. 52 | self._std_weight_position = 1. / 20 53 | self._std_weight_velocity = 1. / 160 54 | 55 | def initiate(self, measurement): 56 | """Create track from unassociated measurement. 57 | 58 | Parameters 59 | ---------- 60 | measurement : ndarray 61 | Bounding box coordinates (x, y, a, h) with center position (x, y), 62 | aspect ratio a, and height h. 63 | 64 | Returns 65 | ------- 66 | (ndarray, ndarray) 67 | Returns the mean vector (8 dimensional) and covariance matrix (8x8 68 | dimensional) of the new track. Unobserved velocities are initialized 69 | to 0 mean. 70 | 71 | """ 72 | mean_pos = measurement 73 | mean_vel = np.zeros_like(mean_pos) 74 | mean = np.r_[mean_pos, mean_vel] 75 | 76 | std = [ 77 | 2 * self._std_weight_position * measurement[3], 78 | 2 * self._std_weight_position * measurement[3], 79 | 1e-2, 80 | 2 * self._std_weight_position * measurement[3], 81 | 10 * self._std_weight_velocity * measurement[3], 82 | 10 * self._std_weight_velocity * measurement[3], 83 | 1e-5, 84 | 10 * self._std_weight_velocity * measurement[3]] 85 | covariance = np.diag(np.square(std)) 86 | return mean, covariance 87 | 88 | def predict(self, mean, covariance): 89 | """Run Kalman filter prediction step. 90 | 91 | Parameters 92 | ---------- 93 | mean : ndarray 94 | The 8 dimensional mean vector of the object state at the previous 95 | time step. 96 | covariance : ndarray 97 | The 8x8 dimensional covariance matrix of the object state at the 98 | previous time step. 99 | 100 | Returns 101 | ------- 102 | (ndarray, ndarray) 103 | Returns the mean vector and covariance matrix of the predicted 104 | state. Unobserved velocities are initialized to 0 mean. 105 | 106 | """ 107 | std_pos = [ 108 | self._std_weight_position * mean[3], 109 | self._std_weight_position * mean[3], 110 | 1e-2, 111 | self._std_weight_position * mean[3]] 112 | std_vel = [ 113 | self._std_weight_velocity * mean[3], 114 | self._std_weight_velocity * mean[3], 115 | 1e-5, 116 | self._std_weight_velocity * mean[3]] 117 | motion_cov = np.diag(np.square(np.r_[std_pos, std_vel])) 118 | 119 | mean = np.dot(self._motion_mat, mean) 120 | covariance = np.linalg.multi_dot(( 121 | self._motion_mat, covariance, self._motion_mat.T)) + motion_cov 122 | 123 | return mean, covariance 124 | 125 | def project(self, mean, covariance): 126 | """Project state distribution to measurement space. 127 | 128 | Parameters 129 | ---------- 130 | mean : ndarray 131 | The state's mean vector (8 dimensional array). 132 | covariance : ndarray 133 | The state's covariance matrix (8x8 dimensional). 134 | 135 | Returns 136 | ------- 137 | (ndarray, ndarray) 138 | Returns the projected mean and covariance matrix of the given state 139 | estimate. 
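Examples
--------
A shape-only sketch (assumes `kf`, `mean` and `covariance` come from a
prior `initiate`/`predict` call):

>>> projected_mean, projected_cov = kf.project(mean, covariance)
>>> projected_mean.shape, projected_cov.shape
((4,), (4, 4))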
140 | 141 | """ 142 | std = [ 143 | self._std_weight_position * mean[3], 144 | self._std_weight_position * mean[3], 145 | 1e-1, 146 | self._std_weight_position * mean[3]] 147 | innovation_cov = np.diag(np.square(std)) 148 | 149 | mean = np.dot(self._update_mat, mean) 150 | covariance = np.linalg.multi_dot(( 151 | self._update_mat, covariance, self._update_mat.T)) 152 | return mean, covariance + innovation_cov 153 | 154 | def update(self, mean, covariance, measurement): 155 | """Run Kalman filter correction step. 156 | 157 | Parameters 158 | ---------- 159 | mean : ndarray 160 | The predicted state's mean vector (8 dimensional). 161 | covariance : ndarray 162 | The state's covariance matrix (8x8 dimensional). 163 | measurement : ndarray 164 | The 4 dimensional measurement vector (x, y, a, h), where (x, y) 165 | is the center position, a the aspect ratio, and h the height of the 166 | bounding box. 167 | 168 | Returns 169 | ------- 170 | (ndarray, ndarray) 171 | Returns the measurement-corrected state distribution. 172 | 173 | """ 174 | projected_mean, projected_cov = self.project(mean, covariance) 175 | 176 | chol_factor, lower = scipy.linalg.cho_factor( 177 | projected_cov, lower=True, check_finite=False) 178 | kalman_gain = scipy.linalg.cho_solve( 179 | (chol_factor, lower), np.dot(covariance, self._update_mat.T).T, 180 | check_finite=False).T 181 | innovation = measurement - projected_mean 182 | 183 | new_mean = mean + np.dot(innovation, kalman_gain.T) 184 | new_covariance = covariance - np.linalg.multi_dot(( 185 | kalman_gain, projected_cov, kalman_gain.T)) 186 | return new_mean, new_covariance 187 | 188 | def gating_distance(self, mean, covariance, measurements, 189 | only_position=False): 190 | """Compute gating distance between state distribution and measurements. 191 | 192 | A suitable distance threshold can be obtained from `chi2inv95`. If 193 | `only_position` is False, the chi-square distribution has 4 degrees of 194 | freedom, otherwise 2. 195 | 196 | Parameters 197 | ---------- 198 | mean : ndarray 199 | Mean vector over the state distribution (8 dimensional). 200 | covariance : ndarray 201 | Covariance of the state distribution (8x8 dimensional). 202 | measurements : ndarray 203 | An Nx4 dimensional matrix of N measurements, each in 204 | format (x, y, a, h) where (x, y) is the bounding box center 205 | position, a the aspect ratio, and h the height. 206 | only_position : Optional[bool] 207 | If True, distance computation is done with respect to the bounding 208 | box center position only. 209 | 210 | Returns 211 | ------- 212 | ndarray 213 | Returns an array of length N, where the i-th element contains the 214 | squared Mahalanobis distance between (mean, covariance) and 215 | `measurements[i]`. 
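Examples
--------
A gating sketch (assumes `kf`, `mean`, `covariance` and an Nx4
`measurements` array already exist):

>>> d = kf.gating_distance(mean, covariance, measurements)
>>> feasible = d <= chi2inv95[4]  # 0.95 quantile, 4 degrees of freedom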
216 | 217 | """ 218 | mean, covariance = self.project(mean, covariance) 219 | if only_position: 220 | mean, covariance = mean[:2], covariance[:2, :2] 221 | measurements = measurements[:, :2] 222 | 223 | cholesky_factor = np.linalg.cholesky(covariance) 224 | d = measurements - mean 225 | z = scipy.linalg.solve_triangular( 226 | cholesky_factor, d.T, lower=True, check_finite=False, 227 | overwrite_b=True) 228 | squared_maha = np.sum(z * z, axis=0) 229 | return squared_maha 230 | -------------------------------------------------------------------------------- /deep_sort/linear_assignment.py: -------------------------------------------------------------------------------- 1 | # vim: expandtab:ts=4:sw=4 2 | from __future__ import absolute_import 3 | import numpy as np 4 | from sklearn.utils.linear_assignment_ import linear_assignment 5 | from . import kalman_filter 6 | 7 | 8 | INFTY_COST = 1e+5 9 | 10 | 11 | def min_cost_matching( 12 | distance_metric, max_distance, tracks, detections, track_indices=None, 13 | detection_indices=None): 14 | """Solve linear assignment problem. 15 | 16 | Parameters 17 | ---------- 18 | distance_metric : Callable[List[Track], List[Detection], List[int], List[int]] -> ndarray 19 | The distance metric is given a list of tracks and detections as well as 20 | a list of N track indices and M detection indices. The metric should 21 | return the NxM dimensional cost matrix, where element (i, j) is the 22 | association cost between the i-th track in the given track indices and 23 | the j-th detection in the given detection_indices. 24 | max_distance : float 25 | Gating threshold. Associations with cost larger than this value are 26 | disregarded. 27 | tracks : List[track.Track] 28 | A list of predicted tracks at the current time step. 29 | detections : List[detection.Detection] 30 | A list of detections at the current time step. 31 | track_indices : List[int] 32 | List of track indices that maps rows in `cost_matrix` to tracks in 33 | `tracks` (see description above). 34 | detection_indices : List[int] 35 | List of detection indices that maps columns in `cost_matrix` to 36 | detections in `detections` (see description above). 37 | 38 | Returns 39 | ------- 40 | (List[(int, int)], List[int], List[int]) 41 | Returns a tuple with the following three entries: 42 | * A list of matched track and detection indices. 43 | * A list of unmatched track indices. 44 | * A list of unmatched detection indices. 45 | 46 | """ 47 | if track_indices is None: 48 | track_indices = np.arange(len(tracks)) 49 | if detection_indices is None: 50 | detection_indices = np.arange(len(detections)) 51 | 52 | if len(detection_indices) == 0 or len(track_indices) == 0: 53 | return [], track_indices, detection_indices # Nothing to match.
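    # NOTE: sklearn.utils.linear_assignment_ was deprecated in scikit-learn
    # 0.21 and removed in 0.23. On newer installs, a drop-in replacement can
    # be adapted from SciPy, e.g.:
    #
    #     from scipy.optimize import linear_sum_assignment
    #     def linear_assignment(cost_matrix):
    #         rows, cols = linear_sum_assignment(cost_matrix)
    #         return np.column_stack([rows, cols])
    #
    # Below, costs above the gate are clamped to just over `max_distance` so
    # the solver still returns a complete assignment; the gated pairs are
    # filtered out again after the solve.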
54 | 55 | cost_matrix = distance_metric( 56 | tracks, detections, track_indices, detection_indices) 57 | cost_matrix[cost_matrix > max_distance] = max_distance + 1e-5 58 | indices = linear_assignment(cost_matrix) 59 | 60 | matches, unmatched_tracks, unmatched_detections = [], [], [] 61 | for col, detection_idx in enumerate(detection_indices): 62 | if col not in indices[:, 1]: 63 | unmatched_detections.append(detection_idx) 64 | for row, track_idx in enumerate(track_indices): 65 | if row not in indices[:, 0]: 66 | unmatched_tracks.append(track_idx) 67 | for row, col in indices: 68 | track_idx = track_indices[row] 69 | detection_idx = detection_indices[col] 70 | if cost_matrix[row, col] > max_distance: 71 | unmatched_tracks.append(track_idx) 72 | unmatched_detections.append(detection_idx) 73 | else: 74 | matches.append((track_idx, detection_idx)) 75 | return matches, unmatched_tracks, unmatched_detections 76 | 77 | 78 | def matching_cascade( 79 | distance_metric, max_distance, cascade_depth, tracks, detections, 80 | track_indices=None, detection_indices=None): 81 | """Run matching cascade. 82 | 83 | Parameters 84 | ---------- 85 | distance_metric : Callable[List[Track], List[Detection], List[int], List[int]] -> ndarray 86 | The distance metric is given a list of tracks and detections as well as 87 | a list of N track indices and M detection indices. The metric should 88 | return the NxM dimensional cost matrix, where element (i, j) is the 89 | association cost between the i-th track in the given track indices and 90 | the j-th detection in the given detection indices. 91 | max_distance : float 92 | Gating threshold. Associations with cost larger than this value are 93 | disregarded. 94 | cascade_depth : int 95 | The cascade depth, should be set to the maximum track age. 96 | tracks : List[track.Track] 97 | A list of predicted tracks at the current time step. 98 | detections : List[detection.Detection] 99 | A list of detections at the current time step. 100 | track_indices : Optional[List[int]] 101 | List of track indices that maps rows in `cost_matrix` to tracks in 102 | `tracks` (see description above). Defaults to all tracks. 103 | detection_indices : Optional[List[int]] 104 | List of detection indices that maps columns in `cost_matrix` to 105 | detections in `detections` (see description above). Defaults to all 106 | detections. 107 | 108 | Returns 109 | ------- 110 | (List[(int, int)], List[int], List[int]) 111 | Returns a tuple with the following three entries: 112 | * A list of matched track and detection indices. 113 | * A list of unmatched track indices. 114 | * A list of unmatched detection indices.
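Examples
--------
A sketch of the intended call (hypothetical `gated_metric`, `tracks` and
`detections`; the cascade depth should equal the maximum track age):

>>> matches, unmatched_tracks, unmatched_detections = matching_cascade(
...     gated_metric, max_distance=0.5, cascade_depth=30,
...     tracks=tracks, detections=detections)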
115 | 116 | """ 117 | if track_indices is None: 118 | track_indices = list(range(len(tracks))) 119 | if detection_indices is None: 120 | detection_indices = list(range(len(detections))) 121 | 122 | unmatched_detections = detection_indices 123 | matches = [] 124 | for level in range(cascade_depth): 125 | if len(unmatched_detections) == 0: # No detections left 126 | break 127 | 128 | track_indices_l = [ 129 | k for k in track_indices 130 | if tracks[k].time_since_update == 1 + level 131 | ] 132 | if len(track_indices_l) == 0: # Nothing to match at this level 133 | continue 134 | 135 | matches_l, _, unmatched_detections = \ 136 | min_cost_matching( 137 | distance_metric, max_distance, tracks, detections, 138 | track_indices_l, unmatched_detections) 139 | matches += matches_l 140 | unmatched_tracks = list(set(track_indices) - set(k for k, _ in matches)) 141 | return matches, unmatched_tracks, unmatched_detections 142 | 143 | 144 | def gate_cost_matrix( 145 | kf, cost_matrix, tracks, detections, track_indices, detection_indices, 146 | gated_cost=INFTY_COST, only_position=False): 147 | """Invalidate infeasible entries in cost matrix based on the state 148 | distributions obtained by Kalman filtering. 149 | 150 | Parameters 151 | ---------- 152 | kf : The Kalman filter. 153 | cost_matrix : ndarray 154 | The NxM dimensional cost matrix, where N is the number of track indices 155 | and M is the number of detection indices, such that entry (i, j) is the 156 | association cost between `tracks[track_indices[i]]` and 157 | `detections[detection_indices[j]]`. 158 | tracks : List[track.Track] 159 | A list of predicted tracks at the current time step. 160 | detections : List[detection.Detection] 161 | A list of detections at the current time step. 162 | track_indices : List[int] 163 | List of track indices that maps rows in `cost_matrix` to tracks in 164 | `tracks` (see description above). 165 | detection_indices : List[int] 166 | List of detection indices that maps columns in `cost_matrix` to 167 | detections in `detections` (see description above). 168 | gated_cost : Optional[float] 169 | Entries in the cost matrix corresponding to infeasible associations are 170 | set to this value. Defaults to a very large value. 171 | only_position : Optional[bool] 172 | If True, only the x, y position of the state distribution is considered 173 | during gating. Defaults to False. 174 | 175 | Returns 176 | ------- 177 | ndarray 178 | Returns the modified cost matrix. 179 | 180 | """ 181 | gating_dim = 2 if only_position else 4 182 | gating_threshold = kalman_filter.chi2inv95[gating_dim] 183 | measurements = np.asarray( 184 | [detections[i].to_xyah() for i in detection_indices]) 185 | for row, track_idx in enumerate(track_indices): 186 | track = tracks[track_idx] 187 | gating_distance = kf.gating_distance( 188 | track.mean, track.covariance, measurements, only_position) 189 | cost_matrix[row, gating_distance > gating_threshold] = gated_cost 190 | return cost_matrix 191 | -------------------------------------------------------------------------------- /deep_sort/nn_matching.py: -------------------------------------------------------------------------------- 1 | # vim: expandtab:ts=4:sw=4 2 | import numpy as np 3 | 4 | 5 | def _pdist(a, b): 6 | """Compute pair-wise squared distance between points in `a` and `b`. 7 | 8 | Parameters 9 | ---------- 10 | a : array_like 11 | An NxM matrix of N samples of dimensionality M. 12 | b : array_like 13 | An LxM matrix of L samples of dimensionality M.
14 | 15 | Returns 16 | ------- 17 | ndarray 18 | Returns a matrix of size len(a), len(b) such that element (i, j) 19 | contains the squared distance between `a[i]` and `b[j]`. 20 | 21 | """ 22 | a, b = np.asarray(a), np.asarray(b) 23 | if len(a) == 0 or len(b) == 0: 24 | return np.zeros((len(a), len(b))) 25 | a2, b2 = np.square(a).sum(axis=1), np.square(b).sum(axis=1) 26 | r2 = -2. * np.dot(a, b.T) + a2[:, None] + b2[None, :] 27 | r2 = np.clip(r2, 0., float(np.inf)) 28 | return r2 29 | 30 | 31 | def _cosine_distance(a, b, data_is_normalized=False): 32 | """Compute pair-wise cosine distance between points in `a` and `b`. 33 | 34 | Parameters 35 | ---------- 36 | a : array_like 37 | An NxM matrix of N samples of dimensionality M. 38 | b : array_like 39 | An LxM matrix of L samples of dimensionality M. 40 | data_is_normalized : Optional[bool] 41 | If True, assumes rows in a and b are unit length vectors. 42 | Otherwise, a and b are explicitly normalized to length 1. 43 | 44 | Returns 45 | ------- 46 | ndarray 47 | Returns a matrix of size len(a), len(b) such that element (i, j) 48 | contains the cosine distance between `a[i]` and `b[j]`. 49 | 50 | """ 51 | if not data_is_normalized: 52 | a = np.asarray(a) / np.linalg.norm(a, axis=1, keepdims=True) 53 | b = np.asarray(b) / np.linalg.norm(b, axis=1, keepdims=True) 54 | return 1. - np.dot(a, b.T) 55 | 56 | 57 | def _nn_euclidean_distance(x, y): 58 | """ Helper function for nearest neighbor distance metric (Euclidean). 59 | 60 | Parameters 61 | ---------- 62 | x : ndarray 63 | A matrix of N row-vectors (sample points). 64 | y : ndarray 65 | A matrix of M row-vectors (query points). 66 | 67 | Returns 68 | ------- 69 | ndarray 70 | A vector of length M that contains for each entry in `y` the 71 | smallest Euclidean distance to a sample in `x`. 72 | 73 | """ 74 | distances = _pdist(x, y) 75 | return np.maximum(0.0, distances.min(axis=0)) 76 | 77 | 78 | def _nn_cosine_distance(x, y): 79 | """ Helper function for nearest neighbor distance metric (cosine). 80 | 81 | Parameters 82 | ---------- 83 | x : ndarray 84 | A matrix of N row-vectors (sample points). 85 | y : ndarray 86 | A matrix of M row-vectors (query points). 87 | 88 | Returns 89 | ------- 90 | ndarray 91 | A vector of length M that contains for each entry in `y` the 92 | smallest cosine distance to a sample in `x`. 93 | 94 | """ 95 | distances = _cosine_distance(x, y) 96 | return distances.min(axis=0) 97 | 98 | 99 | class NearestNeighborDistanceMetric(object): 100 | """ 101 | A nearest neighbor distance metric that, for each target, returns 102 | the closest distance to any sample that has been observed so far. 103 | 104 | Parameters 105 | ---------- 106 | metric : str 107 | Either "euclidean" or "cosine". 108 | matching_threshold : float 109 | The matching threshold. Samples with larger distance are considered an 110 | invalid match. 111 | budget : Optional[int] 112 | If not None, fix samples per class to at most this number. Removes 113 | the oldest samples when the budget is reached. 114 | 115 | Attributes 116 | ---------- 117 | samples : Dict[int -> List[ndarray]] 118 | A dictionary that maps from target identities to the list of samples 119 | that have been observed so far.
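Examples
--------
A minimal usage sketch (hypothetical feature arrays `features`,
`query_features` and integer `targets`):

>>> metric = NearestNeighborDistanceMetric("cosine", matching_threshold=0.5)
>>> metric.partial_fit(features, targets, active_targets=[1, 2])
>>> cost_matrix = metric.distance(query_features, targets=[1, 2])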
120 | 121 | """ 122 | 123 | def __init__(self, metric, matching_threshold, budget=None): 124 | 125 | 126 | if metric == "euclidean": 127 | self._metric = _nn_euclidean_distance 128 | elif metric == "cosine": 129 | self._metric = _nn_cosine_distance 130 | else: 131 | raise ValueError( 132 | "Invalid metric; must be either 'euclidean' or 'cosine'") 133 | self.matching_threshold = matching_threshold 134 | self.budget = budget 135 | self.samples = {} 136 | 137 | def partial_fit(self, features, targets, active_targets): 138 | """Update the distance metric with new data. 139 | 140 | Parameters 141 | ---------- 142 | features : ndarray 143 | An NxM matrix of N features of dimensionality M. 144 | targets : ndarray 145 | An integer array of associated target identities. 146 | active_targets : List[int] 147 | A list of targets that are currently present in the scene. 148 | 149 | """ 150 | for feature, target in zip(features, targets): 151 | self.samples.setdefault(target, []).append(feature) 152 | if self.budget is not None: 153 | self.samples[target] = self.samples[target][-self.budget:] 154 | self.samples = {k: self.samples[k] for k in active_targets} 155 | 156 | def distance(self, features, targets): 157 | """Compute distance between features and targets. 158 | 159 | Parameters 160 | ---------- 161 | features : ndarray 162 | An NxM matrix of N features of dimensionality M. 163 | targets : List[int] 164 | A list of targets to match the given `features` against. 165 | 166 | Returns 167 | ------- 168 | ndarray 169 | Returns a cost matrix of shape len(targets), len(features), where 170 | element (i, j) contains the closest squared distance between 171 | `targets[i]` and `features[j]`. 172 | 173 | """ 174 | cost_matrix = np.zeros((len(targets), len(features))) 175 | for i, target in enumerate(targets): 176 | cost_matrix[i, :] = self._metric(self.samples[target], features) 177 | return cost_matrix 178 | -------------------------------------------------------------------------------- /deep_sort/preprocessing.py: -------------------------------------------------------------------------------- 1 | # vim: expandtab:ts=4:sw=4 2 | import numpy as np 3 | import cv2 4 | 5 | 6 | def non_max_suppression(boxes, max_bbox_overlap, scores=None): 7 | """Suppress overlapping detections. 8 | 9 | Original code from [1]_ has been adapted to include confidence score. 10 | 11 | .. [1] http://www.pyimagesearch.com/2015/02/16/ 12 | faster-non-maximum-suppression-python/ 13 | 14 | Examples 15 | -------- 16 | 17 | >>> boxes = [d.roi for d in detections] 18 | >>> scores = [d.confidence for d in detections] 19 | >>> indices = non_max_suppression(boxes, max_bbox_overlap, scores) 20 | >>> detections = [detections[i] for i in indices] 21 | 22 | Parameters 23 | ---------- 24 | boxes : ndarray 25 | Array of ROIs (x, y, width, height). 26 | max_bbox_overlap : float 27 | ROIs that overlap more than this value are suppressed. 28 | scores : Optional[array_like] 29 | Detector confidence score. 30 | 31 | Returns 32 | ------- 33 | List[int] 34 | Returns indices of detections that have survived non-maxima suppression.
35 | 36 | """ 37 | if len(boxes) == 0: 38 | return [] 39 | 40 | boxes = boxes.astype(float)  # np.float was removed in NumPy 1.24 41 | pick = [] 42 | 43 | x1 = boxes[:, 0] 44 | y1 = boxes[:, 1] 45 | x2 = boxes[:, 2] + boxes[:, 0] 46 | y2 = boxes[:, 3] + boxes[:, 1] 47 | 48 | area = (x2 - x1 + 1) * (y2 - y1 + 1) 49 | if scores is not None: 50 | idxs = np.argsort(scores) 51 | else: 52 | idxs = np.argsort(y2) 53 | 54 | while len(idxs) > 0: 55 | last = len(idxs) - 1 56 | i = idxs[last] 57 | pick.append(i) 58 | 59 | xx1 = np.maximum(x1[i], x1[idxs[:last]]) 60 | yy1 = np.maximum(y1[i], y1[idxs[:last]]) 61 | xx2 = np.minimum(x2[i], x2[idxs[:last]]) 62 | yy2 = np.minimum(y2[i], y2[idxs[:last]]) 63 | 64 | w = np.maximum(0, xx2 - xx1 + 1) 65 | h = np.maximum(0, yy2 - yy1 + 1) 66 | 67 | overlap = (w * h) / area[idxs[:last]] 68 | 69 | idxs = np.delete( 70 | idxs, np.concatenate( 71 | ([last], np.where(overlap > max_bbox_overlap)[0]))) 72 | 73 | return pick 74 | -------------------------------------------------------------------------------- /deep_sort/track.py: -------------------------------------------------------------------------------- 1 | # vim: expandtab:ts=4:sw=4 2 | 3 | 4 | class TrackState: 5 | """ 6 | Enumeration type for the single target track state. Newly created tracks are 7 | classified as `tentative` until enough evidence has been collected. Then, 8 | the track state is changed to `confirmed`. Tracks that are no longer alive 9 | are classified as `deleted` to mark them for removal from the set of active 10 | tracks. 11 | 12 | """ 13 | 14 | Tentative = 1 15 | Confirmed = 2 16 | Deleted = 3 17 | 18 | 19 | class Track: 20 | """ 21 | A single target track with state space `(x, y, a, h)` and associated 22 | velocities, where `(x, y)` is the center of the bounding box, `a` is the 23 | aspect ratio and `h` is the height. 24 | 25 | Parameters 26 | ---------- 27 | mean : ndarray 28 | Mean vector of the initial state distribution. 29 | covariance : ndarray 30 | Covariance matrix of the initial state distribution. 31 | track_id : int 32 | A unique track identifier. 33 | n_init : int 34 | Number of consecutive detections before the track is confirmed. The 35 | track state is set to `Deleted` if a miss occurs within the first 36 | `n_init` frames. 37 | max_age : int 38 | The maximum number of consecutive misses before the track state is 39 | set to `Deleted`. 40 | feature : Optional[ndarray] 41 | Feature vector of the detection this track originates from. If not None, 42 | this feature is added to the `features` cache. 43 | 44 | Attributes 45 | ---------- 46 | mean : ndarray 47 | Mean vector of the initial state distribution. 48 | covariance : ndarray 49 | Covariance matrix of the initial state distribution. 50 | track_id : int 51 | A unique track identifier. 52 | hits : int 53 | Total number of measurement updates. 54 | age : int 55 | Total number of frames since first occurrence. 56 | time_since_update : int 57 | Total number of frames since last measurement update. 58 | state : TrackState 59 | The current track state. 60 | features : List[ndarray] 61 | A cache of features. On each measurement update, the associated feature 62 | vector is added to this list.
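Examples
--------
Lifecycle sketch (hypothetical `kf` and associated `detection`):

>>> track.predict(kf)            # propagate the state; ages the track
>>> track.update(kf, detection)  # Kalman correction; may confirm the track
>>> if track.is_confirmed():
...     box = track.to_tlbr()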
63 | 64 | """ 65 | 66 | def __init__(self, mean, covariance, track_id, n_init, max_age, 67 | feature=None): 68 | self.mean = mean 69 | self.covariance = covariance 70 | self.track_id = track_id 71 | self.hits = 1 72 | self.age = 1 73 | self.time_since_update = 0 74 | 75 | self.state = TrackState.Tentative 76 | self.features = [] 77 | if feature is not None: 78 | self.features.append(feature) 79 | 80 | self._n_init = n_init 81 | self._max_age = max_age 82 | 83 | def to_tlwh(self): 84 | """Get current position in bounding box format `(top left x, top left y, 85 | width, height)`. 86 | 87 | Returns 88 | ------- 89 | ndarray 90 | The bounding box. 91 | 92 | """ 93 | ret = self.mean[:4].copy() 94 | ret[2] *= ret[3] 95 | ret[:2] -= ret[2:] / 2 96 | return ret 97 | 98 | def to_tlbr(self): 99 | """Get current position in bounding box format `(min x, min y, max x, 100 | max y)`. 101 | 102 | Returns 103 | ------- 104 | ndarray 105 | The bounding box. 106 | 107 | """ 108 | ret = self.to_tlwh() 109 | ret[2:] = ret[:2] + ret[2:] 110 | return ret 111 | 112 | def predict(self, kf): 113 | """Propagate the state distribution to the current time step using a 114 | Kalman filter prediction step. 115 | 116 | Parameters 117 | ---------- 118 | kf : kalman_filter.KalmanFilter 119 | The Kalman filter. 120 | 121 | """ 122 | self.mean, self.covariance = kf.predict(self.mean, self.covariance) 123 | self.age += 1 124 | self.time_since_update += 1 125 | 126 | def update(self, kf, detection): 127 | """Perform Kalman filter measurement update step and update the feature 128 | cache. 129 | 130 | Parameters 131 | ---------- 132 | kf : kalman_filter.KalmanFilter 133 | The Kalman filter. 134 | detection : Detection 135 | The associated detection. 136 | 137 | """ 138 | self.mean, self.covariance = kf.update( 139 | self.mean, self.covariance, detection.to_xyah()) 140 | self.features.append(detection.feature) 141 | 142 | self.hits += 1 143 | self.time_since_update = 0 144 | if self.state == TrackState.Tentative and self.hits >= self._n_init: 145 | self.state = TrackState.Confirmed 146 | 147 | def mark_missed(self): 148 | """Mark this track as missed (no association at the current time step). 149 | """ 150 | if self.state == TrackState.Tentative: 151 | self.state = TrackState.Deleted 152 | elif self.time_since_update > self._max_age: 153 | self.state = TrackState.Deleted 154 | 155 | def is_tentative(self): 156 | """Returns True if this track is tentative (unconfirmed). 157 | """ 158 | return self.state == TrackState.Tentative 159 | 160 | def is_confirmed(self): 161 | """Returns True if this track is confirmed.""" 162 | return self.state == TrackState.Confirmed 163 | 164 | def is_deleted(self): 165 | """Returns True if this track is dead and should be deleted.""" 166 | return self.state == TrackState.Deleted 167 | -------------------------------------------------------------------------------- /deep_sort/tracker.py: -------------------------------------------------------------------------------- 1 | # vim: expandtab:ts=4:sw=4 2 | from __future__ import absolute_import 3 | import numpy as np 4 | from . import kalman_filter 5 | from . import linear_assignment 6 | from . import iou_matching 7 | from .track import Track 8 | 9 | 10 | class Tracker: 11 | """ 12 | This is the multi-target tracker. 13 | 14 | Parameters 15 | ---------- 16 | metric : nn_matching.NearestNeighborDistanceMetric 17 | A distance metric for measurement-to-track association. 18 | max_age : int 19 | Maximum number of consecutive misses before a track is deleted.
20 | n_init : int 21 | Number of consecutive detections before the track is confirmed. The 22 | track state is set to `Deleted` if a miss occurs within the first 23 | `n_init` frames. 24 | 25 | Attributes 26 | ---------- 27 | metric : nn_matching.NearestNeighborDistanceMetric 28 | The distance metric used for measurement to track association. 29 | max_age : int 30 | Maximum number of consecutive misses before a track is deleted. 31 | n_init : int 32 | Number of frames that a track remains in initialization phase. 33 | kf : kalman_filter.KalmanFilter 34 | A Kalman filter to filter target trajectories in image space. 35 | tracks : List[Track] 36 | The list of active tracks at the current time step. 37 | 38 | """ 39 | 40 | def __init__(self, metric, max_iou_distance=0.7, max_age=30, n_init=3): 41 | self.metric = metric 42 | self.max_iou_distance = max_iou_distance 43 | self.max_age = max_age 44 | self.n_init = n_init 45 | 46 | self.kf = kalman_filter.KalmanFilter() 47 | self.tracks = [] 48 | self._next_id = 1 49 | 50 | def predict(self): 51 | """Propagate track state distributions one time step forward. 52 | 53 | This function should be called once every time step, before `update`. 54 | """ 55 | for track in self.tracks: 56 | track.predict(self.kf) 57 | 58 | def update(self, detections): 59 | """Perform measurement update and track management. 60 | 61 | Parameters 62 | ---------- 63 | detections : List[deep_sort.detection.Detection] 64 | A list of detections at the current time step. 65 | 66 | """ 67 | # Run matching cascade. 68 | matches, unmatched_tracks, unmatched_detections = \ 69 | self._match(detections) 70 | 71 | # Update track set. 72 | for track_idx, detection_idx in matches: 73 | self.tracks[track_idx].update( 74 | self.kf, detections[detection_idx]) 75 | for track_idx in unmatched_tracks: 76 | self.tracks[track_idx].mark_missed() 77 | for detection_idx in unmatched_detections: 78 | self._initiate_track(detections[detection_idx]) 79 | self.tracks = [t for t in self.tracks if not t.is_deleted()] 80 | 81 | # Update distance metric. 82 | active_targets = [t.track_id for t in self.tracks if t.is_confirmed()] 83 | features, targets = [], [] 84 | for track in self.tracks: 85 | if not track.is_confirmed(): 86 | continue 87 | features += track.features 88 | targets += [track.track_id for _ in track.features] 89 | track.features = [] 90 | self.metric.partial_fit( 91 | np.asarray(features), np.asarray(targets), active_targets) 92 | 93 | def _match(self, detections): 94 | 95 | def gated_metric(tracks, dets, track_indices, detection_indices): 96 | features = np.array([dets[i].feature for i in detection_indices]) 97 | targets = np.array([tracks[i].track_id for i in track_indices]) 98 | cost_matrix = self.metric.distance(features, targets) 99 | cost_matrix = linear_assignment.gate_cost_matrix( 100 | self.kf, cost_matrix, tracks, dets, track_indices, 101 | detection_indices) 102 | 103 | return cost_matrix 104 | 105 | # Split track set into confirmed and unconfirmed tracks. 106 | confirmed_tracks = [ 107 | i for i, t in enumerate(self.tracks) if t.is_confirmed()] 108 | unconfirmed_tracks = [ 109 | i for i, t in enumerate(self.tracks) if not t.is_confirmed()] 110 | 111 | # Associate confirmed tracks using appearance features.
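        # The cascade gives priority to tracks with a smaller
        # time_since_update, so recently seen targets are matched before
        # long-occluded ones compete for the same detections.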
112 | matches_a, unmatched_tracks_a, unmatched_detections = \ 113 | linear_assignment.matching_cascade( 114 | gated_metric, self.metric.matching_threshold, self.max_age, 115 | self.tracks, detections, confirmed_tracks) 116 | 117 | # Associate remaining tracks together with unconfirmed tracks using IOU. 118 | iou_track_candidates = unconfirmed_tracks + [ 119 | k for k in unmatched_tracks_a if 120 | self.tracks[k].time_since_update == 1] 121 | unmatched_tracks_a = [ 122 | k for k in unmatched_tracks_a if 123 | self.tracks[k].time_since_update != 1] 124 | matches_b, unmatched_tracks_b, unmatched_detections = \ 125 | linear_assignment.min_cost_matching( 126 | iou_matching.iou_cost, self.max_iou_distance, self.tracks, 127 | detections, iou_track_candidates, unmatched_detections) 128 | 129 | matches = matches_a + matches_b 130 | unmatched_tracks = list(set(unmatched_tracks_a + unmatched_tracks_b)) 131 | return matches, unmatched_tracks, unmatched_detections 132 | 133 | def _initiate_track(self, detection): 134 | mean, covariance = self.kf.initiate(detection.to_xyah()) 135 | self.tracks.append(Track( 136 | mean, covariance, self._next_id, self.n_init, self.max_age, 137 | detection.feature)) 138 | self._next_id += 1 139 | -------------------------------------------------------------------------------- /detection.txt: -------------------------------------------------------------------------------- 1 | 0 591 214 177 276 0 59 365 393 2 | 1 591 215 177 276 0 59 365 394 3 | 2 591 214 177 276 0 59 365 391 4 | 3 591 215 178 275 0 59 365 390 5 | 4 590 215 178 275 0 58 365 391 6 | 5 591 218 175 271 0 60 365 389 7 | 6 591 216 176 272 0 61 366 388 8 | 7 592 215 176 273 0 60 367 388 9 | 8 592 215 176 273 0 59 367 388 10 | 9 591 215 177 272 0 60 367 387 11 | 10 592 214 178 274 0 59 367 388 12 | 11 591 215 177 273 0 60 367 387 13 | 12 592 214 176 273 0 60 367 388 14 | 13 592 214 177 275 0 59 366 391 15 | 14 592 214 178 275 0 59 366 391 16 | 15 591 214 178 274 0 58 367 390 17 | 16 591 214 178 274 0 59 366 388 18 | 17 591 214 177 274 0 59 365 390 19 | 18 591 213 176 277 0 58 365 392 20 | 19 590 212 176 278 0 59 364 391 21 | 20 590 214 177 277 0 60 363 388 22 | 21 590 213 176 277 0 60 363 389 23 | 22 590 215 176 276 0 61 364 389 24 | 23 589 215 178 277 0 60 364 393 25 | 24 590 216 178 276 0 63 365 394 26 | 25 588 215 178 278 0 63 366 395 27 | 26 588 213 177 280 0 65 366 393 28 | 27 587 213 177 280 0 66 367 391 29 | 28 585 213 177 281 0 66 367 388 30 | 29 584 214 176 280 0 64 365 390 31 | 30 584 215 176 279 0 64 364 388 32 | 31 584 212 175 281 0 63 364 392 33 | 32 583 211 175 281 0 64 365 391 34 | 33 583 211 175 283 0 62 364 394 35 | 34 583 211 175 281 0 60 363 398 36 | 35 582 210 175 282 0 59 362 398 37 | 36 583 211 174 282 0 59 363 398 38 | 37 583 210 174 283 0 58 362 400 39 | 38 582 209 174 285 0 59 362 399 40 | 39 583 210 174 283 0 59 361 400 41 | 40 582 209 174 285 0 59 362 403 42 | 41 582 209 175 285 0 59 363 403 43 | 42 583 212 174 282 0 59 363 402 44 | 43 583 212 174 282 0 61 363 400 45 | 44 583 211 175 282 0 61 364 396 46 | 45 583 211 175 282 0 62 365 395 47 | 46 583 212 177 281 0 61 365 397 48 | 47 583 212 176 281 0 61 365 397 49 | 48 583 212 176 281 0 62 365 398 50 | 49 583 212 176 281 0 63 365 399 51 | 50 584 213 176 280 0 62 364 402 52 | 51 584 213 175 280 0 61 362 403 53 | 52 584 213 175 279 0 61 362 405 54 | 53 584 214 175 279 0 61 360 405 55 | 54 584 213 176 280 0 60 359 406 56 | 55 584 212 175 280 0 62 357 400 57 | 56 584 211 176 281 0 62 353 399 58 | 57 584 212 175 280 0 62 353 397 59 | 58 584 211 
176 281 0 61 344 399 60 | 59 584 211 175 281 0 59 336 405 61 | 60 584 210 176 284 0 59 330 403 62 | 61 584 211 176 282 0 63 318 398 63 | 62 583 211 177 283 64 | 63 582 211 177 283 65 | 64 583 212 177 282 66 | 65 582 211 178 283 67 | 66 582 211 177 283 0 67 267 428 68 | 67 582 210 177 284 0 70 259 427 69 | 68 582 210 178 285 70 | 69 582 210 178 285 71 | 70 582 211 178 284 72 | 71 582 211 178 283 73 | 72 582 211 178 284 74 | 73 582 210 179 285 75 | 74 582 211 180 284 76 | 75 581 210 181 285 77 | 76 582 211 181 284 78 | 77 582 211 181 284 79 | 78 582 210 181 284 80 | 79 583 211 183 283 81 | 80 586 212 182 282 82 | 81 586 211 181 283 83 | 82 587 210 180 282 84 | 83 589 211 181 283 85 | 84 589 211 181 282 86 | 85 590 211 181 282 87 | 86 590 212 181 280 88 | 87 590 210 183 284 89 | 88 590 210 182 284 90 | 89 588 212 184 282 91 | 90 590 210 184 285 92 | 91 591 210 183 284 93 | 92 591 209 183 285 94 | 93 592 206 183 289 95 | 94 591 207 182 288 96 | 95 591 209 182 285 97 | 96 590 210 182 285 98 | 97 590 210 182 285 99 | 98 590 208 183 288 100 | 99 590 207 182 288 101 | 100 590 207 182 288 102 | 101 589 209 182 286 103 | 102 588 208 183 287 104 | 103 588 208 183 285 105 | 104 589 208 182 285 106 | 105 589 197 190 287 107 | 106 590 199 191 284 108 | 107 592 200 191 283 109 | 108 596 197 190 288 110 | 109 599 193 191 294 111 | 110 600 194 190 293 112 | 111 600 192 191 296 113 | 112 599 192 192 296 114 | 113 600 205 192 291 115 | 114 600 192 191 296 116 | 115 600 193 190 293 117 | 116 599 191 192 296 118 | 117 599 191 191 296 119 | 118 600 192 189 295 120 | 119 598 194 187 289 121 | 120 595 194 189 289 122 | 121 594 193 187 289 123 | 122 592 194 188 288 124 | 123 591 195 189 288 125 | 124 591 194 188 288 126 | 125 590 195 187 286 127 | 126 589 195 188 285 128 | 127 589 194 187 286 129 | 128 590 192 187 289 130 | 129 587 194 190 289 131 | 130 586 194 190 291 132 | 131 584 195 192 291 133 | 132 583 195 192 291 134 | 133 582 194 191 291 135 | 134 582 203 186 297 136 | 135 581 205 188 294 137 | 136 581 207 187 291 138 | 137 580 207 187 293 139 | 138 580 208 186 292 140 | 139 580 208 187 293 141 | 140 579 207 188 294 142 | 141 579 206 186 295 143 | 142 579 207 187 296 144 | 143 579 207 186 297 145 | 144 578 208 188 295 146 | 145 579 210 186 295 147 | 146 580 210 185 294 148 | 147 579 211 186 296 149 | 148 580 212 186 296 150 | 149 579 214 188 297 151 | 150 581 216 187 294 152 | 151 581 216 188 293 153 | 152 582 216 188 295 154 | 153 584 218 188 294 155 | 154 585 222 186 290 156 | 155 587 226 186 289 157 | 156 587 226 187 289 158 | 157 589 229 187 285 159 | 158 590 230 187 285 160 | 159 592 231 185 285 161 | 160 593 233 187 283 162 | 161 594 234 187 283 163 | 162 597 238 185 276 164 | 163 597 242 190 291 165 | 164 599 243 188 290 166 | 165 602 235 185 283 167 | 166 602 247 187 284 168 | 167 604 251 186 281 169 | 168 611 250 185 280 170 | 169 611 252 188 279 171 | 170 612 253 188 280 172 | 171 613 255 192 278 173 | 172 615 256 192 276 174 | 173 617 256 190 277 175 | 174 619 258 189 277 176 | 175 623 261 187 275 177 | 176 626 263 185 274 178 | 177 629 262 183 271 179 | 178 632 263 181 271 180 | 179 633 262 181 272 181 | 180 632 262 183 273 182 | 181 632 263 183 272 183 | 182 633 264 181 273 184 | 183 633 261 185 277 185 | 184 633 261 186 280 186 | 185 636 264 184 277 187 | 186 636 267 186 277 188 | 187 636 268 186 277 189 | 188 636 270 186 275 190 | 189 637 272 184 273 191 | 190 638 271 185 276 192 | 191 639 272 185 275 193 | 192 640 272 183 276 194 | 193 640 272 185 277 195 | 194 640 273 188 277 196 | 195 639 
272 192 278 197 | 196 646 284 189 258 198 | 197 648 283 188 259 199 | 198 650 288 186 254 200 | 199 650 296 187 254 201 | 200 654 296 185 255 202 | 201 651 315 199 218 203 | 202 652 319 199 212 204 | 203 653 317 201 214 205 | 204 654 317 202 216 206 | 205 656 320 204 212 207 | 206 660 321 201 209 208 | 207 661 326 197 205 209 | 208 662 328 196 203 210 | 209 663 329 197 203 211 | 210 662 327 196 204 212 | 211 660 326 199 204 213 | 212 661 325 199 206 214 | 213 662 326 194 205 215 | 214 662 327 195 205 216 | 215 662 327 195 205 217 | 216 663 330 195 202 218 | 217 663 332 195 200 219 | 218 662 331 195 201 220 | 219 662 329 194 203 221 | 220 661 328 196 205 222 | 221 660 327 197 206 223 | 222 659 327 200 205 224 | 223 657 327 202 206 225 | 224 656 327 203 205 226 | 225 656 326 204 207 227 | 226 656 326 203 206 228 | 227 657 327 199 206 229 | 228 655 325 201 206 230 | 229 655 326 201 206 231 | 230 655 323 206 209 232 | 231 656 323 206 212 233 | 232 660 321 220 218 234 | 233 663 319 235 221 235 | 234 666 312 244 230 236 | 235 667 309 252 233 237 | 236 666 309 264 231 238 | 237 665 310 276 229 239 | 238 660 307 291 233 240 | 239 662 313 291 224 241 | 240 676 314 284 220 242 | 241 679 320 282 215 243 | 242 684 326 273 207 244 | 243 695 328 255 204 245 | 244 706 325 245 205 246 | 245 715 321 236 209 247 | 246 725 319 222 208 248 | 247 734 318 212 208 249 | 248 740 308 209 216 250 | 249 747 301 203 222 251 | 250 755 303 196 209 252 | 251 763 301 188 204 253 | 252 775 293 177 215 254 | 253 783 290 174 214 255 | 254 794 286 168 212 256 | 255 800 280 162 219 257 | 256 816 272 145 203 258 | 257 824 267 136 210 259 | 258 830 266 128 210 260 | 259 261 | 260 262 | 261 263 | 262 264 | 263 265 | 264 266 | 265 267 | 266 268 | 267 269 | 268 270 | 269 271 | 270 272 | 271 273 | 272 274 | 273 275 | 274 276 | 275 277 | 276 278 | 277 279 | 278 280 | 279 281 | 280 282 | 281 283 | 282 284 | 283 285 | 284 286 | 285 287 | 286 288 | 287 289 | 288 290 | 289 291 | 290 292 | 291 293 | 292 294 | 293 295 | 294 296 | 295 297 | 296 298 | 297 299 | 298 300 | 299 301 | 300 302 | 301 303 | 302 304 | 303 305 | 304 306 | 305 307 | 306 308 | 307 309 | 308 310 | 309 311 | 310 312 | 311 313 | 312 314 | 313 315 | 314 316 | 315 317 | 316 318 | 317 319 | 318 320 | 319 321 | 320 322 | 321 323 | 322 324 | 323 325 | 324 326 | 325 327 | 326 328 | 327 329 | 328 330 | 329 331 | 330 332 | 331 333 | 332 830 266 130 199 334 | 333 823 268 138 196 335 | 334 807 267 150 201 336 | 335 802 268 155 201 337 | 336 796 270 162 201 338 | 337 790 271 165 201 339 | 338 783 271 168 202 340 | 339 770 273 183 205 341 | 340 761 276 193 197 342 | 341 748 281 208 188 343 | 342 734 281 222 185 344 | 343 723 284 233 181 345 | 344 718 295 230 185 346 | 345 710 295 237 185 347 | 346 703 296 245 187 348 | 347 692 298 256 183 349 | 348 683 297 270 186 350 | 349 676 299 280 182 351 | 350 650 292 311 195 352 | 351 645 291 320 198 353 | 352 640 291 327 197 354 | 353 637 291 331 197 355 | 354 632 290 332 197 356 | 355 630 291 329 193 357 | 356 622 295 329 186 358 | 357 621 294 326 187 359 | 358 618 296 324 183 360 | 359 614 298 325 180 361 | 360 611 296 327 185 362 | 361 605 296 331 183 363 | 362 598 296 335 184 364 | 363 590 297 343 181 365 | 364 578 296 352 182 366 | 365 572 298 356 178 367 | 366 567 298 363 178 368 | 367 541 298 412 159 369 | 368 555 292 354 169 370 | 369 551 292 360 168 371 | 370 549 293 357 165 372 | 371 543 292 358 164 373 | 372 543 290 346 165 374 | 373 538 287 352 167 375 | 374 535 286 353 168 376 | 375 531 285 351 170 377 | 376 529 284 350 172 378 
| 377 528 285 347 170 379 | 378 528 281 347 175 380 | 379 527 279 346 174 381 | 380 526 277 341 174 382 | 381 524 275 345 177 383 | 382 524 273 342 179 384 | 383 522 271 340 181 385 | 384 523 269 336 185 386 | 385 522 270 339 183 387 | 386 520 270 340 184 388 | 387 522 269 335 186 389 | 388 522 268 335 188 390 | 389 524 269 336 189 391 | 390 523 269 338 188 392 | 391 524 266 337 191 393 | 392 526 266 338 190 394 | 393 530 264 335 192 395 | 394 531 265 332 191 396 | 395 530 265 338 192 397 | 396 533 264 328 192 398 | 397 533 263 326 195 399 | 398 534 263 315 195 400 | 399 535 261 313 196 401 | 400 533 260 315 196 402 | 401 534 259 315 197 403 | 402 534 258 315 198 404 | 403 529 259 317 196 405 | 404 528 258 315 195 406 | 405 528 258 314 194 407 | 406 525 257 316 194 408 | 407 524 257 316 193 409 | 408 522 258 318 193 410 | 409 521 258 315 193 411 | 410 520 257 314 193 412 | 411 518 258 316 192 413 | 412 516 258 317 192 414 | 413 515 256 317 194 415 | 414 515 256 317 194 416 | 415 515 255 318 195 417 | 416 516 256 317 193 418 | 417 516 255 318 194 419 | 418 516 257 317 192 420 | 419 517 256 316 194 421 | 420 518 256 316 194 422 | 421 520 257 315 194 423 | 422 520 258 316 193 424 | 423 521 256 316 196 425 | 424 521 256 316 196 426 | 425 521 256 318 195 427 | 426 520 255 320 196 428 | 427 522 256 318 195 429 | 428 523 256 317 196 430 | 429 523 256 319 196 431 | 430 522 256 318 194 432 | 431 522 257 319 193 433 | 432 522 255 320 196 434 | 433 523 256 319 196 435 | 434 525 255 317 197 436 | 435 526 255 316 197 437 | 436 525 255 318 197 438 | 437 525 256 318 197 439 | 438 525 255 321 198 440 | 439 523 256 322 196 441 | 440 526 255 319 197 442 | 441 524 256 321 196 443 | 442 524 256 322 196 444 | 443 525 256 321 196 445 | 444 525 255 321 197 446 | 445 526 255 319 198 447 | 446 525 255 320 198 448 | 447 526 255 321 199 449 | 448 526 256 319 196 450 | 449 525 255 318 197 451 | 450 527 254 316 200 452 | 451 526 254 315 200 453 | 452 525 254 317 201 454 | 453 524 254 319 201 455 | 454 524 254 321 200 456 | 455 523 254 322 200 457 | 456 524 254 323 200 458 | 457 524 255 321 199 459 | 458 525 255 320 199 460 | 459 526 255 320 200 461 | 460 526 254 319 201 462 | 461 528 254 319 202 463 | 462 532 254 316 204 464 | 463 530 254 318 203 465 | 464 528 255 319 201 466 | 465 529 256 317 200 467 | 466 528 256 317 201 468 | 467 528 254 320 202 469 | 468 528 254 322 202 470 | 469 526 255 322 201 471 | 470 525 255 321 201 472 | 471 524 256 321 200 473 | 472 521 258 321 199 474 | 473 519 258 319 200 475 | 474 518 258 322 199 476 | 475 518 258 323 201 477 | 476 520 257 322 200 478 | 477 521 256 321 201 479 | 478 522 253 324 205 480 | 479 523 251 327 206 481 | 480 526 250 326 208 482 | 481 527 250 325 209 483 | 482 529 251 324 210 484 | 483 531 248 323 214 485 | 484 533 249 323 214 486 | 485 533 250 323 214 487 | 486 535 248 326 213 488 | 487 536 245 328 215 489 | 488 536 244 328 216 490 | 489 535 245 328 214 491 | 490 533 245 331 214 492 | 491 534 246 327 213 493 | 492 534 246 327 213 494 | 493 533 244 329 215 495 | 494 533 245 325 214 496 | 495 533 242 328 217 497 | 496 531 244 326 213 498 | 497 533 244 323 213 499 | 498 534 243 326 213 500 | 499 535 243 327 213 501 | 500 537 240 325 217 502 | 501 536 242 325 213 503 | 502 536 242 324 213 504 | 503 537 241 322 214 505 | 504 536 240 322 215 506 | 505 536 239 321 217 507 | 506 536 239 320 215 508 | 507 536 239 320 215 509 | 508 536 237 319 217 510 | 509 537 239 318 215 511 | 510 535 237 319 218 512 | 511 535 239 317 214 513 | 512 536 237 318 215 0 119 172 411 514 | 
513 536 238 320 214 515 | 514 535 237 318 216 516 | 515 536 236 317 215 517 | 516 535 236 317 215 518 | 517 535 234 318 218 0 118 238 409 519 | 518 535 234 315 216 0 114 256 412 520 | 519 534 235 318 217 0 111 266 415 521 | 520 535 237 316 216 2 108 271 421 522 | 521 534 238 316 214 2 100 279 437 523 | 522 536 235 316 218 524 | 523 537 236 314 217 525 | 524 535 234 317 217 1 85 315 426 526 | 525 536 232 316 220 0 82 336 429 527 | 526 535 230 316 220 0 84 346 427 528 | 527 536 232 316 218 0 84 353 426 529 | 528 539 232 311 220 0 83 359 429 530 | 529 540 234 313 217 0 86 365 424 531 | 530 540 233 312 217 0 85 372 425 532 | 531 540 232 313 217 0 86 384 423 533 | 532 542 231 314 217 0 83 389 429 534 | 533 542 230 315 219 3 82 401 430 535 | 534 543 229 315 218 6 87 404 418 536 | 535 543 229 316 220 537 | 536 543 229 317 220 538 | 537 544 230 316 218 539 | 538 545 229 316 218 16 84 432 418 540 | 539 547 228 316 218 541 | 540 545 228 319 218 542 | 541 544 227 323 218 543 | 542 546 228 320 218 544 | 543 545 231 320 216 545 | 544 545 227 320 220 546 | 545 546 227 317 220 547 | 546 545 228 317 218 548 | 547 544 229 319 217 22 77 430 431 549 | 548 544 229 318 217 23 78 428 430 550 | 549 543 227 319 221 551 | 550 543 230 317 217 21 79 430 423 552 | 551 541 232 316 215 553 | 552 540 232 318 216 554 | 553 541 231 315 216 17 82 436 421 555 | 554 540 232 313 214 19 82 433 421 556 | 555 539 232 313 215 16 82 439 420 557 | 556 538 231 314 218 14 83 442 418 558 | 557 538 231 313 218 13 83 443 419 559 | 558 538 231 313 217 14 84 440 418 560 | 559 536 232 315 218 13 85 440 414 561 | 560 536 232 314 217 14 85 437 416 562 | 561 533 233 318 217 15 90 432 409 563 | 562 532 233 319 216 12 92 436 409 564 | 563 530 232 320 219 4 85 421 419 565 | 564 532 232 317 218 5 88 412 413 566 | 565 531 232 319 219 5 91 405 410 567 | 566 530 232 317 219 4 92 395 409 568 | 567 530 231 316 220 0 96 386 407 569 | 568 529 231 316 221 0 95 377 410 570 | 569 530 230 315 220 0 95 366 409 571 | 570 529 231 316 220 0 96 360 406 572 | 571 529 230 316 221 0 119 356 393 573 | 572 528 230 314 221 0 119 348 394 574 | 573 527 229 317 221 0 116 331 403 575 | 574 527 230 317 222 3 113 314 409 576 | 575 524 231 321 221 4 113 305 411 577 | 576 525 230 320 222 10 115 285 408 578 | 577 524 231 321 222 3 112 277 414 579 | 578 524 233 320 220 3 112 269 420 580 | 579 524 234 321 217 0 118 263 407 581 | 580 525 235 319 216 0 121 251 404 582 | 581 524 237 321 214 0 120 235 408 583 | 582 524 236 322 215 0 124 228 406 584 | 583 524 234 322 217 585 | 584 524 235 325 217 586 | 585 525 235 323 217 587 | 586 525 234 321 217 588 | 587 525 234 321 218 589 | 588 523 235 322 217 590 | 589 524 236 323 216 591 | 590 525 235 322 216 592 | 591 526 236 322 216 593 | 592 526 233 322 219 594 | 593 526 232 324 221 595 | 594 528 232 324 222 596 | 595 529 233 324 221 597 | 596 527 232 324 221 598 | 597 528 233 323 221 599 | 598 530 232 321 221 600 | 599 530 232 321 222 601 | 600 529 232 322 220 602 | 601 531 232 322 220 603 | 602 532 231 322 222 604 | 603 532 230 325 225 605 | 604 533 232 324 223 606 | 605 535 229 324 228 607 | 606 534 229 325 227 608 | 607 533 231 326 224 609 | 608 534 231 323 225 610 | 609 533 233 323 223 611 | 610 532 231 324 224 612 | 611 530 231 324 224 613 | 612 529 232 323 223 614 | 613 528 232 323 222 615 | 614 527 232 323 223 616 | 615 526 231 325 223 617 | 616 527 230 326 226 618 | 617 528 230 326 222 619 | 618 528 232 324 223 620 | 619 530 231 325 223 621 | 620 531 230 322 224 622 | 621 531 230 324 223 623 | 622 534 229 323 225 624 | 623 533 229 
324 225 625 | 624 534 231 323 223 626 | 625 534 229 323 224 627 | 626 535 231 325 224 628 | 627 536 232 322 224 629 | 628 537 232 323 223 630 | 629 538 231 323 222 631 | 630 538 230 327 224 632 | 631 539 231 326 224 633 | 632 538 232 328 222 634 | 633 539 231 325 225 635 | 634 540 231 327 225 636 | 635 540 229 329 227 637 | 636 541 228 328 227 638 | 637 541 226 327 228 639 | 638 542 227 327 226 640 | 639 542 227 327 227 641 | 640 542 227 326 226 642 | 641 542 227 328 226 643 | 642 542 228 328 226 644 | 643 543 227 329 227 645 | 644 544 229 330 225 646 | 645 544 227 332 226 647 | 646 544 228 335 224 648 | 647 545 227 334 225 649 | 648 546 229 335 224 650 | 649 547 229 332 224 651 | 650 548 228 334 225 652 | 651 550 228 334 225 653 | 652 551 227 336 226 654 | 653 552 227 337 225 655 | 654 552 226 340 225 656 | 655 552 225 342 228 657 | 656 554 225 342 227 658 | 657 553 226 343 227 659 | 658 553 225 346 228 0 63 223 441 660 | 659 553 224 348 229 0 64 229 435 661 | 660 554 224 347 228 0 61 234 434 662 | 661 555 226 347 226 0 56 245 437 663 | 662 555 226 348 226 0 53 252 444 664 | 663 554 225 349 226 0 50 264 442 665 | 664 555 226 349 225 0 48 271 445 666 | 665 556 225 347 226 0 45 276 449 667 | 666 554 225 349 226 3 43 277 446 668 | 667 555 225 349 225 5 39 293 440 669 | 668 557 226 345 225 2 41 305 439 670 | 669 558 225 344 225 0 41 328 437 671 | 670 557 224 346 226 0 45 342 434 672 | 671 557 224 345 227 0 48 353 430 673 | 672 555 224 353 225 0 47 361 430 674 | 673 555 222 351 227 0 48 368 431 675 | 674 556 221 350 228 0 51 377 424 676 | 675 555 221 353 229 0 48 381 424 677 | 676 553 217 357 230 0 46 382 423 678 | 677 552 216 357 231 0 46 381 420 679 | 678 552 214 358 234 0 43 378 425 680 | 679 551 214 357 233 0 46 378 421 681 | 680 551 213 354 233 0 45 373 419 682 | 681 551 212 356 235 0 44 377 416 683 | 682 549 212 358 234 0 42 376 420 684 | 683 549 210 358 236 0 39 378 421 685 | 684 549 210 359 236 0 37 377 424 686 | 685 550 212 357 233 0 37 377 424 687 | 686 551 210 357 234 0 34 378 428 688 | 687 551 209 357 235 0 31 377 432 689 | 688 550 209 360 234 0 32 378 430 690 | 689 559 194 371 247 0 32 378 427 691 | 690 558 195 373 246 0 32 381 425 692 | 691 558 195 373 244 0 30 382 430 693 | 692 557 197 374 240 0 30 380 429 694 | 693 558 194 371 244 0 30 380 429 695 | 694 551 201 362 236 0 14 379 421 696 | 695 548 200 362 235 0 13 379 426 697 | 696 545 200 360 232 0 8 372 433 698 | 697 536 187 385 228 0 13 371 421 699 | 698 537 187 377 226 0 15 367 417 700 | 699 531 189 384 223 0 16 363 416 701 | 700 536 196 341 238 0 15 360 419 702 | 701 532 194 334 242 0 16 356 419 703 | 702 520 196 338 238 0 15 351 421 704 | 703 515 196 341 237 0 11 342 430 705 | 704 509 187 373 226 0 10 340 433 706 | 705 509 184 371 231 0 7 339 437 707 | 706 509 183 365 231 0 6 336 437 708 | 707 503 183 370 231 0 5 331 441 709 | 708 492 185 379 230 1 6 318 438 710 | 709 486 187 381 226 711 | 710 483 186 389 226 712 | 711 481 182 402 227 713 | 712 478 181 404 227 714 | 713 483 180 397 226 715 | 714 485 178 391 229 716 | 715 486 176 385 232 717 | 716 486 172 384 235 718 | 717 488 171 379 234 3 0 273 459 719 | 718 489 167 377 236 1 0 274 461 720 | 719 477 164 359 254 721 | 720 474 163 365 257 722 | 721 466 161 380 263 723 | 722 462 159 389 269 0 0 269 524 724 | 723 459 102 469 340 0 0 260 538 725 | 724 459 92 478 362 0 0 247 548 726 | 725 463 86 465 381 727 | 726 473 78 451 399 0 0 240 554 728 | 727 487 85 449 393 729 | 728 505 82 445 400 730 | 729 512 74 439 414 731 | 730 522 76 421 420 732 | 731 537 84 423 407 733 | 732 552 81 
402 412 734 | 733 578 85 380 422 735 | 734 634 112 319 420 736 | 735 679 128 277 395 737 | 736 703 125 253 403 738 | 737 732 54 219 463 739 | 738 743 54 210 460 740 | 739 750 46 207 474 741 | 740 759 33 197 495 742 | 741 764 14 195 515 743 | 742 770 11 192 524 744 | 743 745 | 744 746 | 745 747 | 746 748 | 747 749 | 748 750 | 749 751 | 750 752 | 751 753 | 752 754 | 753 755 | 754 756 | 755 757 | 756 758 | 757 759 | 758 760 | 759 761 | 760 762 | 761 763 | 762 764 | 763 765 | 764 766 | 765 767 | -------------------------------------------------------------------------------- /main.py: -------------------------------------------------------------------------------- 1 | #! /usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | 4 | from __future__ import division, print_function, absolute_import 5 | import os 6 | import datetime 7 | from timeit import time 8 | import warnings 9 | import cv2 10 | import numpy as np 11 | import argparse 12 | from PIL import Image 13 | from yolo import YOLO 14 | from deep_sort import preprocessing 15 | from deep_sort import nn_matching 16 | from deep_sort.detection import Detection 17 | from deep_sort.tracker import Tracker 18 | from tools import generate_detections as gdet 19 | from deep_sort.detection import Detection as ddet 20 | from collections import deque 21 | from keras import backend 22 | 23 | backend.clear_session() 24 | ap = argparse.ArgumentParser() 25 | ap.add_argument("-i", "--input",help="path to input video", default = "./test_video/det_t1_video_00315_test.avi") 26 | ap.add_argument("-c", "--class",help="name of class", default = "person") 27 | args = vars(ap.parse_args()) 28 | 29 | pts = [deque(maxlen=30) for _ in range(9999)] 30 | warnings.filterwarnings('ignore') 31 | 32 | # initialize a list of colors to represent each possible class label 33 | np.random.seed(100) 34 | COLORS = np.random.randint(0, 255, size=(200, 3), 35 | dtype="uint8") 36 | 37 | def main(yolo): 38 | 39 | start = time.time() 40 | #Definition of the parameters 41 | max_cosine_distance = 0.5 #0.9 余弦距离的控制阈值 42 | nn_budget = None 43 | nms_max_overlap = 0.3 #非极大抑制的阈值 44 | 45 | counter1 = [] 46 | counter2 = [] 47 | #deep_sort 48 | model_filename = 'model_data/market1501.pb' 49 | encoder = gdet.create_box_encoder(model_filename, batch_size=1) 50 | 51 | metric = nn_matching.NearestNeighborDistanceMetric("cosine", max_cosine_distance, nn_budget) 52 | tracker = Tracker(metric) 53 | 54 | writeVideo_flag = True 55 | #video_path = "../../yolo_dataset/t1_video/test_video/det_t1_video_00025_test.avi" 56 | video_capture = cv2.VideoCapture(args["input"]) 57 | 58 | if writeVideo_flag: 59 | # Define the codec and create VideoWriter object 60 | w = int(video_capture.get(3)) 61 | h = int(video_capture.get(4)) 62 | fourcc = cv2.VideoWriter_fourcc(*'MJPG') 63 | out = cv2.VideoWriter('./output/'+args["input"][-13:-4]+ "_" + args["class"] + '_output.avi', fourcc, 15, (w, h)) 64 | list_file = open('detection.txt', 'w') 65 | frame_index = -1 66 | 67 | fps = 0.0 68 | 69 | while True: 70 | 71 | ret, frame = video_capture.read() # frame shape 640*480*3 72 | if ret != True: 73 | break 74 | t1 = time.time() 75 | 76 | # image = Image.fromarray(frame) 77 | image = Image.fromarray(frame[...,::-1]) #bgr to rgb 78 | boxs,class_names = yolo.detect_image(image) 79 | features = encoder(frame,boxs) 80 | # score to 1.0 here). 81 | detections = [Detection(bbox, 1.0, feature) for bbox, feature in zip(boxs, features)] 82 | # Run non-maxima suppression. 
83 | boxes = np.array([d.tlwh for d in detections]) 84 | scores = np.array([d.confidence for d in detections]) 85 | indices = preprocessing.non_max_suppression(boxes, nms_max_overlap, scores) 86 | detections = [detections[i] for i in indices] 87 | 88 | # Call the tracker 89 | tracker.predict() 90 | tracker.update(detections) 91 | 92 | i = int(0) 93 | i1 = int(0) 94 | i2 = int(0) 95 | indexIDs = [] 96 | c = [] 97 | boxes = [] 98 | for det in detections: 99 | bbox = det.to_tlbr() 100 | cv2.rectangle(frame,(int(bbox[0]), int(bbox[1])), (int(bbox[2]), int(bbox[3])),(255,255,255), 2) 101 | 102 | for track, class_name in zip(tracker.tracks, class_names):  # NOTE: positional pairing; tracks and class names are not guaranteed to align 103 | if not track.is_confirmed() or track.time_since_update > 1: 104 | continue 105 | # boxes.append([track[0], track[1], track[2], track[3]]) 106 | indexIDs.append(int(track.track_id)) 107 | print("real class:" + class_name[0]) 108 | # Save the track_ids of each class separately 109 | if class_name == ['person']: 110 | counter1.append(int(track.track_id)) 111 | if class_name == ['bicycle']: 112 | counter2.append(int(track.track_id)) 113 | color = [int(c) for c in COLORS[indexIDs[i] % len(COLORS)]] 114 | bbox = track.to_tlbr() 115 | cv2.rectangle(frame, (int(bbox[0]), int(bbox[1])), (int(bbox[2]), int(bbox[3])),(color), 3) 116 | cv2.putText(frame,str(track.track_id),(int(bbox[0]), int(bbox[1] -50)),0, 5e-3 * 150, (color),2) 117 | # if len(class_names) > 0: 118 | # class_name = class_names[0] 119 | # cv2.putText(frame, str(class_names[0]),(int(bbox[0]), int(bbox[1] -20)),0, 5e-3 * 150, (color),2) 120 | # Show the class name 121 | cv2.putText(frame, str(class_name), (int(bbox[0]), int(bbox[1] - 20)), 0, 5e-3 * 150, (color), 2) 122 | # Count each class in the current frame separately 123 | if class_name == ['person']: 124 | i1 = i1 +1 125 | else: 126 | i2 = i2 +1 127 | #bbox_center_point(x,y) 128 | center = (int(((bbox[0])+(bbox[2]))/2),int(((bbox[1])+(bbox[3]))/2)) 129 | #track_id[center] 130 | pts[track.track_id].append(center) 131 | thickness = 5 132 | #center point 133 | cv2.circle(frame, (center), 1, color, thickness) 134 | 135 | # draw the motion path 136 | for j in range(1, len(pts[track.track_id])): 137 | if pts[track.track_id][j - 1] is None or pts[track.track_id][j] is None: 138 | continue 139 | thickness = int(np.sqrt(64 / float(j + 1)) * 2) 140 | cv2.line(frame,(pts[track.track_id][j-1]), (pts[track.track_id][j]),(color),thickness) 141 | #cv2.putText(frame, str(class_names[j]),(int(bbox[0]), int(bbox[1] -20)),0, 5e-3 * 150, (255,255,255),2) 142 | 143 | # Total count of distinct track IDs per class 144 | count1 = len(set(counter1)) 145 | count2 = len(set(counter2)) 146 | cv2.putText(frame, "Total person Counter: "+str(count1),(int(20), int(120)),0, 5e-3 * 100, (0,255,0),2) 147 | cv2.putText(frame, "Current person Counter: "+str(i1),(int(20), int(100)),0, 5e-3 * 100, (0,255,0),2) 148 | cv2.putText(frame, "Total bicycle Counter: "+str(count2),(int(20), int(80)),0, 5e-3 * 100, (0,255,0),2) 149 | cv2.putText(frame, "Current bicycle Counter: "+str(i2),(int(20), int(60)),0, 5e-3 * 100, (0,255,0),2) 150 | cv2.putText(frame, "FPS: %f"%(fps),(int(20), int(40)),0, 5e-3 * 100, (0,255,0),3) 151 | # cv2.namedWindow("YOLO3_Deep_SORT", 0); 152 | # cv2.resizeWindow('YOLO3_Deep_SORT', 1024, 768); 153 | # cv2.imshow('YOLO3_Deep_SORT', frame) 154 | 155 | if writeVideo_flag: 156 | #save a frame 157 | out.write(frame) 158 | frame_index = frame_index + 1 159 | list_file.write(str(frame_index)+' ') 160 | if len(boxs) != 0: 161 | for i in range(0,len(boxs)): 162 | list_file.write(str(boxs[i][0]) + ' '+str(boxs[i][1]) + ' '+str(boxs[i][2]) + ' '+str(boxs[i][3]) + ' ')
155 | if writeVideo_flag:
156 |     # save a frame
157 |     out.write(frame)
158 |     frame_index = frame_index + 1
159 |     list_file.write(str(frame_index) + ' ')
160 |     if len(boxs) != 0:
161 |         for i in range(0, len(boxs)):
162 |             list_file.write(str(boxs[i][0]) + ' ' + str(boxs[i][1]) + ' ' + str(boxs[i][2]) + ' ' + str(boxs[i][3]) + ' ')
163 |     list_file.write('\n')
164 | fps = (fps + (1. / (time.time() - t1))) / 2
165 | # print(set(counter))
166 | 
167 | # Press Q to stop!
168 | if cv2.waitKey(1) & 0xFF == ord('q'):
169 |     break
170 | print(" ")
171 | print("[Finish]")
172 | end = time.time()
173 | 
174 | if len(counter1) + len(counter2) > 0:
175 |     print(args["input"][-13:-4] + ": " + str(count1) + " " + 'person Found')
176 |     print(args["input"][-13:-4] + ": " + str(count2) + " " + 'bicycle Found')
177 | 
178 | else:
179 |     print("[None found]")
180 | 
181 | video_capture.release()
182 | 
183 | if writeVideo_flag:
184 |     out.release()
185 |     list_file.close()
186 | cv2.destroyAllWindows()
187 | 
188 | if __name__ == '__main__':
189 |     main(YOLO())
190 | 
-------------------------------------------------------------------------------- /model_data/coco_classes.txt: --------------------------------------------------------------------------------
1 | person
2 | bicycle
3 | car
4 | motorbike
5 | aeroplane
6 | bus
7 | train
8 | truck
9 | boat
10 | traffic light
11 | fire hydrant
12 | stop sign
13 | parking meter
14 | bench
15 | bird
16 | cat
17 | dog
18 | horse
19 | sheep
20 | cow
21 | elephant
22 | bear
23 | zebra
24 | giraffe
25 | backpack
26 | umbrella
27 | handbag
28 | tie
29 | suitcase
30 | frisbee
31 | skis
32 | snowboard
33 | sports ball
34 | kite
35 | baseball bat
36 | baseball glove
37 | skateboard
38 | surfboard
39 | tennis racket
40 | bottle
41 | wine glass
42 | cup
43 | fork
44 | knife
45 | spoon
46 | bowl
47 | banana
48 | apple
49 | sandwich
50 | orange
51 | broccoli
52 | carrot
53 | hot dog
54 | pizza
55 | donut
56 | cake
57 | chair
58 | sofa
59 | pottedplant
60 | bed
61 | diningtable
62 | toilet
63 | tvmonitor
64 | laptop
65 | mouse
66 | remote
67 | keyboard
68 | cell phone
69 | microwave
70 | oven
71 | toaster
72 | sink
73 | refrigerator
74 | book
75 | clock
76 | vase
77 | scissors
78 | teddy bear
79 | hair drier
80 | toothbrush
81 | 
-------------------------------------------------------------------------------- /model_data/market1501.pb: --------------------------------------------------------------------------------
https://raw.githubusercontent.com/xiaoxiong74/Object-Detection-and-Tracking/d2a11affb54a1d3f2cb76f74b3eab7c370e42f09/model_data/market1501.pb
-------------------------------------------------------------------------------- /model_data/mars-small128.pb: --------------------------------------------------------------------------------
https://raw.githubusercontent.com/xiaoxiong74/Object-Detection-and-Tracking/d2a11affb54a1d3f2cb76f74b3eab7c370e42f09/model_data/mars-small128.pb
-------------------------------------------------------------------------------- /model_data/mars.pb: --------------------------------------------------------------------------------
https://raw.githubusercontent.com/xiaoxiong74/Object-Detection-and-Tracking/d2a11affb54a1d3f2cb76f74b3eab7c370e42f09/model_data/mars.pb
-------------------------------------------------------------------------------- /model_data/obj.txt: --------------------------------------------------------------------------------
1 | person
2 | fire_extinguisher
3 | fireplug
4 | car
5 | bicycle
6 | motorcycle
7 | 
-------------------------------------------------------------------------------- /model_data/voc_classes.txt: --------------------------------------------------------------------------------
1 | aeroplane
2 | bicycle
3 | bird
4 | boat
5 | bottle
6 | bus
7 | car
8 | cat
9 | chair
10 | cow
11 | diningtable
12 | dog
13 | horse
14 | motorbike
15 | person
16 | pottedplant
17 | sheep 18 | sofa 19 | train 20 | tvmonitor 21 | -------------------------------------------------------------------------------- /model_data/yolo3_object.names: -------------------------------------------------------------------------------- 1 | person 2 | fire_extinguisher 3 | fireplug 4 | car 5 | bicycle 6 | motorcycle 7 | -------------------------------------------------------------------------------- /model_data/yolo_anchors.txt: -------------------------------------------------------------------------------- 1 | 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326 2 | -------------------------------------------------------------------------------- /model_data/yolov3.cfg: -------------------------------------------------------------------------------- 1 | [net] 2 | # Testing 3 | # batch=1 4 | # subdivisions=1 5 | # Training 6 | batch=64 7 | subdivisions=16 8 | width=608 9 | height=608 10 | channels=3 11 | momentum=0.9 12 | decay=0.0005 13 | angle=0 14 | saturation = 1.5 15 | exposure = 1.5 16 | hue=.1 17 | 18 | learning_rate=0.001 19 | burn_in=1000 20 | max_batches = 500200 21 | policy=steps 22 | steps=400000,450000 23 | scales=.1,.1 24 | 25 | [convolutional] 26 | batch_normalize=1 27 | filters=32 28 | size=3 29 | stride=1 30 | pad=1 31 | activation=leaky 32 | 33 | # Downsample 34 | 35 | [convolutional] 36 | batch_normalize=1 37 | filters=64 38 | size=3 39 | stride=2 40 | pad=1 41 | activation=leaky 42 | 43 | [convolutional] 44 | batch_normalize=1 45 | filters=32 46 | size=1 47 | stride=1 48 | pad=1 49 | activation=leaky 50 | 51 | [convolutional] 52 | batch_normalize=1 53 | filters=64 54 | size=3 55 | stride=1 56 | pad=1 57 | activation=leaky 58 | 59 | [shortcut] 60 | from=-3 61 | activation=linear 62 | 63 | # Downsample 64 | 65 | [convolutional] 66 | batch_normalize=1 67 | filters=128 68 | size=3 69 | stride=2 70 | pad=1 71 | activation=leaky 72 | 73 | [convolutional] 74 | batch_normalize=1 75 | filters=64 76 | size=1 77 | stride=1 78 | pad=1 79 | activation=leaky 80 | 81 | [convolutional] 82 | batch_normalize=1 83 | filters=128 84 | size=3 85 | stride=1 86 | pad=1 87 | activation=leaky 88 | 89 | [shortcut] 90 | from=-3 91 | activation=linear 92 | 93 | [convolutional] 94 | batch_normalize=1 95 | filters=64 96 | size=1 97 | stride=1 98 | pad=1 99 | activation=leaky 100 | 101 | [convolutional] 102 | batch_normalize=1 103 | filters=128 104 | size=3 105 | stride=1 106 | pad=1 107 | activation=leaky 108 | 109 | [shortcut] 110 | from=-3 111 | activation=linear 112 | 113 | # Downsample 114 | 115 | [convolutional] 116 | batch_normalize=1 117 | filters=256 118 | size=3 119 | stride=2 120 | pad=1 121 | activation=leaky 122 | 123 | [convolutional] 124 | batch_normalize=1 125 | filters=128 126 | size=1 127 | stride=1 128 | pad=1 129 | activation=leaky 130 | 131 | [convolutional] 132 | batch_normalize=1 133 | filters=256 134 | size=3 135 | stride=1 136 | pad=1 137 | activation=leaky 138 | 139 | [shortcut] 140 | from=-3 141 | activation=linear 142 | 143 | [convolutional] 144 | batch_normalize=1 145 | filters=128 146 | size=1 147 | stride=1 148 | pad=1 149 | activation=leaky 150 | 151 | [convolutional] 152 | batch_normalize=1 153 | filters=256 154 | size=3 155 | stride=1 156 | pad=1 157 | activation=leaky 158 | 159 | [shortcut] 160 | from=-3 161 | activation=linear 162 | 163 | [convolutional] 164 | batch_normalize=1 165 | filters=128 166 | size=1 167 | stride=1 168 | pad=1 169 | activation=leaky 170 | 171 | [convolutional] 172 | batch_normalize=1 173 | filters=256 174 | size=3 175 
| stride=1 176 | pad=1 177 | activation=leaky 178 | 179 | [shortcut] 180 | from=-3 181 | activation=linear 182 | 183 | [convolutional] 184 | batch_normalize=1 185 | filters=128 186 | size=1 187 | stride=1 188 | pad=1 189 | activation=leaky 190 | 191 | [convolutional] 192 | batch_normalize=1 193 | filters=256 194 | size=3 195 | stride=1 196 | pad=1 197 | activation=leaky 198 | 199 | [shortcut] 200 | from=-3 201 | activation=linear 202 | 203 | 204 | [convolutional] 205 | batch_normalize=1 206 | filters=128 207 | size=1 208 | stride=1 209 | pad=1 210 | activation=leaky 211 | 212 | [convolutional] 213 | batch_normalize=1 214 | filters=256 215 | size=3 216 | stride=1 217 | pad=1 218 | activation=leaky 219 | 220 | [shortcut] 221 | from=-3 222 | activation=linear 223 | 224 | [convolutional] 225 | batch_normalize=1 226 | filters=128 227 | size=1 228 | stride=1 229 | pad=1 230 | activation=leaky 231 | 232 | [convolutional] 233 | batch_normalize=1 234 | filters=256 235 | size=3 236 | stride=1 237 | pad=1 238 | activation=leaky 239 | 240 | [shortcut] 241 | from=-3 242 | activation=linear 243 | 244 | [convolutional] 245 | batch_normalize=1 246 | filters=128 247 | size=1 248 | stride=1 249 | pad=1 250 | activation=leaky 251 | 252 | [convolutional] 253 | batch_normalize=1 254 | filters=256 255 | size=3 256 | stride=1 257 | pad=1 258 | activation=leaky 259 | 260 | [shortcut] 261 | from=-3 262 | activation=linear 263 | 264 | [convolutional] 265 | batch_normalize=1 266 | filters=128 267 | size=1 268 | stride=1 269 | pad=1 270 | activation=leaky 271 | 272 | [convolutional] 273 | batch_normalize=1 274 | filters=256 275 | size=3 276 | stride=1 277 | pad=1 278 | activation=leaky 279 | 280 | [shortcut] 281 | from=-3 282 | activation=linear 283 | 284 | # Downsample 285 | 286 | [convolutional] 287 | batch_normalize=1 288 | filters=512 289 | size=3 290 | stride=2 291 | pad=1 292 | activation=leaky 293 | 294 | [convolutional] 295 | batch_normalize=1 296 | filters=256 297 | size=1 298 | stride=1 299 | pad=1 300 | activation=leaky 301 | 302 | [convolutional] 303 | batch_normalize=1 304 | filters=512 305 | size=3 306 | stride=1 307 | pad=1 308 | activation=leaky 309 | 310 | [shortcut] 311 | from=-3 312 | activation=linear 313 | 314 | 315 | [convolutional] 316 | batch_normalize=1 317 | filters=256 318 | size=1 319 | stride=1 320 | pad=1 321 | activation=leaky 322 | 323 | [convolutional] 324 | batch_normalize=1 325 | filters=512 326 | size=3 327 | stride=1 328 | pad=1 329 | activation=leaky 330 | 331 | [shortcut] 332 | from=-3 333 | activation=linear 334 | 335 | 336 | [convolutional] 337 | batch_normalize=1 338 | filters=256 339 | size=1 340 | stride=1 341 | pad=1 342 | activation=leaky 343 | 344 | [convolutional] 345 | batch_normalize=1 346 | filters=512 347 | size=3 348 | stride=1 349 | pad=1 350 | activation=leaky 351 | 352 | [shortcut] 353 | from=-3 354 | activation=linear 355 | 356 | 357 | [convolutional] 358 | batch_normalize=1 359 | filters=256 360 | size=1 361 | stride=1 362 | pad=1 363 | activation=leaky 364 | 365 | [convolutional] 366 | batch_normalize=1 367 | filters=512 368 | size=3 369 | stride=1 370 | pad=1 371 | activation=leaky 372 | 373 | [shortcut] 374 | from=-3 375 | activation=linear 376 | 377 | [convolutional] 378 | batch_normalize=1 379 | filters=256 380 | size=1 381 | stride=1 382 | pad=1 383 | activation=leaky 384 | 385 | [convolutional] 386 | batch_normalize=1 387 | filters=512 388 | size=3 389 | stride=1 390 | pad=1 391 | activation=leaky 392 | 393 | [shortcut] 394 | from=-3 395 | 
activation=linear 396 | 397 | 398 | [convolutional] 399 | batch_normalize=1 400 | filters=256 401 | size=1 402 | stride=1 403 | pad=1 404 | activation=leaky 405 | 406 | [convolutional] 407 | batch_normalize=1 408 | filters=512 409 | size=3 410 | stride=1 411 | pad=1 412 | activation=leaky 413 | 414 | [shortcut] 415 | from=-3 416 | activation=linear 417 | 418 | 419 | [convolutional] 420 | batch_normalize=1 421 | filters=256 422 | size=1 423 | stride=1 424 | pad=1 425 | activation=leaky 426 | 427 | [convolutional] 428 | batch_normalize=1 429 | filters=512 430 | size=3 431 | stride=1 432 | pad=1 433 | activation=leaky 434 | 435 | [shortcut] 436 | from=-3 437 | activation=linear 438 | 439 | [convolutional] 440 | batch_normalize=1 441 | filters=256 442 | size=1 443 | stride=1 444 | pad=1 445 | activation=leaky 446 | 447 | [convolutional] 448 | batch_normalize=1 449 | filters=512 450 | size=3 451 | stride=1 452 | pad=1 453 | activation=leaky 454 | 455 | [shortcut] 456 | from=-3 457 | activation=linear 458 | 459 | # Downsample 460 | 461 | [convolutional] 462 | batch_normalize=1 463 | filters=1024 464 | size=3 465 | stride=2 466 | pad=1 467 | activation=leaky 468 | 469 | [convolutional] 470 | batch_normalize=1 471 | filters=512 472 | size=1 473 | stride=1 474 | pad=1 475 | activation=leaky 476 | 477 | [convolutional] 478 | batch_normalize=1 479 | filters=1024 480 | size=3 481 | stride=1 482 | pad=1 483 | activation=leaky 484 | 485 | [shortcut] 486 | from=-3 487 | activation=linear 488 | 489 | [convolutional] 490 | batch_normalize=1 491 | filters=512 492 | size=1 493 | stride=1 494 | pad=1 495 | activation=leaky 496 | 497 | [convolutional] 498 | batch_normalize=1 499 | filters=1024 500 | size=3 501 | stride=1 502 | pad=1 503 | activation=leaky 504 | 505 | [shortcut] 506 | from=-3 507 | activation=linear 508 | 509 | [convolutional] 510 | batch_normalize=1 511 | filters=512 512 | size=1 513 | stride=1 514 | pad=1 515 | activation=leaky 516 | 517 | [convolutional] 518 | batch_normalize=1 519 | filters=1024 520 | size=3 521 | stride=1 522 | pad=1 523 | activation=leaky 524 | 525 | [shortcut] 526 | from=-3 527 | activation=linear 528 | 529 | [convolutional] 530 | batch_normalize=1 531 | filters=512 532 | size=1 533 | stride=1 534 | pad=1 535 | activation=leaky 536 | 537 | [convolutional] 538 | batch_normalize=1 539 | filters=1024 540 | size=3 541 | stride=1 542 | pad=1 543 | activation=leaky 544 | 545 | [shortcut] 546 | from=-3 547 | activation=linear 548 | 549 | ###################### 550 | 551 | [convolutional] 552 | batch_normalize=1 553 | filters=512 554 | size=1 555 | stride=1 556 | pad=1 557 | activation=leaky 558 | 559 | [convolutional] 560 | batch_normalize=1 561 | size=3 562 | stride=1 563 | pad=1 564 | filters=1024 565 | activation=leaky 566 | 567 | [convolutional] 568 | batch_normalize=1 569 | filters=512 570 | size=1 571 | stride=1 572 | pad=1 573 | activation=leaky 574 | 575 | [convolutional] 576 | batch_normalize=1 577 | size=3 578 | stride=1 579 | pad=1 580 | filters=1024 581 | activation=leaky 582 | 583 | [convolutional] 584 | batch_normalize=1 585 | filters=512 586 | size=1 587 | stride=1 588 | pad=1 589 | activation=leaky 590 | 591 | [convolutional] 592 | batch_normalize=1 593 | size=3 594 | stride=1 595 | pad=1 596 | filters=1024 597 | activation=leaky 598 | 599 | [convolutional] 600 | size=1 601 | stride=1 602 | pad=1 603 | filters=255 604 | activation=linear 605 | 606 | 607 | [yolo] 608 | mask = 6,7,8 609 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 
373,326 610 | classes=80 611 | num=9 612 | jitter=.3 613 | ignore_thresh = .7 614 | truth_thresh = 1 615 | random=1 616 | 617 | 618 | [route] 619 | layers = -4 620 | 621 | [convolutional] 622 | batch_normalize=1 623 | filters=256 624 | size=1 625 | stride=1 626 | pad=1 627 | activation=leaky 628 | 629 | [upsample] 630 | stride=2 631 | 632 | [route] 633 | layers = -1, 61 634 | 635 | 636 | 637 | [convolutional] 638 | batch_normalize=1 639 | filters=256 640 | size=1 641 | stride=1 642 | pad=1 643 | activation=leaky 644 | 645 | [convolutional] 646 | batch_normalize=1 647 | size=3 648 | stride=1 649 | pad=1 650 | filters=512 651 | activation=leaky 652 | 653 | [convolutional] 654 | batch_normalize=1 655 | filters=256 656 | size=1 657 | stride=1 658 | pad=1 659 | activation=leaky 660 | 661 | [convolutional] 662 | batch_normalize=1 663 | size=3 664 | stride=1 665 | pad=1 666 | filters=512 667 | activation=leaky 668 | 669 | [convolutional] 670 | batch_normalize=1 671 | filters=256 672 | size=1 673 | stride=1 674 | pad=1 675 | activation=leaky 676 | 677 | [convolutional] 678 | batch_normalize=1 679 | size=3 680 | stride=1 681 | pad=1 682 | filters=512 683 | activation=leaky 684 | 685 | [convolutional] 686 | size=1 687 | stride=1 688 | pad=1 689 | filters=255 690 | activation=linear 691 | 692 | 693 | [yolo] 694 | mask = 3,4,5 695 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326 696 | classes=80 697 | num=9 698 | jitter=.3 699 | ignore_thresh = .7 700 | truth_thresh = 1 701 | random=1 702 | 703 | 704 | 705 | [route] 706 | layers = -4 707 | 708 | [convolutional] 709 | batch_normalize=1 710 | filters=128 711 | size=1 712 | stride=1 713 | pad=1 714 | activation=leaky 715 | 716 | [upsample] 717 | stride=2 718 | 719 | [route] 720 | layers = -1, 36 721 | 722 | 723 | 724 | [convolutional] 725 | batch_normalize=1 726 | filters=128 727 | size=1 728 | stride=1 729 | pad=1 730 | activation=leaky 731 | 732 | [convolutional] 733 | batch_normalize=1 734 | size=3 735 | stride=1 736 | pad=1 737 | filters=256 738 | activation=leaky 739 | 740 | [convolutional] 741 | batch_normalize=1 742 | filters=128 743 | size=1 744 | stride=1 745 | pad=1 746 | activation=leaky 747 | 748 | [convolutional] 749 | batch_normalize=1 750 | size=3 751 | stride=1 752 | pad=1 753 | filters=256 754 | activation=leaky 755 | 756 | [convolutional] 757 | batch_normalize=1 758 | filters=128 759 | size=1 760 | stride=1 761 | pad=1 762 | activation=leaky 763 | 764 | [convolutional] 765 | batch_normalize=1 766 | size=3 767 | stride=1 768 | pad=1 769 | filters=256 770 | activation=leaky 771 | 772 | [convolutional] 773 | size=1 774 | stride=1 775 | pad=1 776 | filters=255 777 | activation=linear 778 | 779 | 780 | [yolo] 781 | mask = 0,1,2 782 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326 783 | classes=80 784 | num=9 785 | jitter=.3 786 | ignore_thresh = .7 787 | truth_thresh = 1 788 | random=1 789 | 790 | -------------------------------------------------------------------------------- /output/result.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/xiaoxiong74/Object-Detection-and-Tracking/d2a11affb54a1d3f2cb76f74b3eab7c370e42f09/output/result.png -------------------------------------------------------------------------------- /output/st1_vedio_person_output.avi: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/xiaoxiong74/Object-Detection-and-Tracking/d2a11affb54a1d3f2cb76f74b3eab7c370e42f09/output/st1_vedio_person_output.avi
-------------------------------------------------------------------------------- /output/st1_vedio_person_output.gif: --------------------------------------------------------------------------------
https://raw.githubusercontent.com/xiaoxiong74/Object-Detection-and-Tracking/d2a11affb54a1d3f2cb76f74b3eab7c370e42f09/output/st1_vedio_person_output.gif
-------------------------------------------------------------------------------- /requirements.txt: --------------------------------------------------------------------------------
1 | Keras==2.2.4
2 | tensorflow-gpu==1.10.0
3 | opencv-python==3.4.4.19
4 | scikit-learn==0.21.2
5 | scipy==1.1.0
6 | Pillow
7 | torch==0.3.1
8 | torchvision==0.2.0
9 | 
-------------------------------------------------------------------------------- /tools/freeze_model.py: --------------------------------------------------------------------------------
1 | # vim: expandtab:ts=4:sw=4
2 | import argparse
3 | import tensorflow as tf
4 | import tensorflow.contrib.slim as slim
5 | 
6 | 
7 | def _batch_norm_fn(x, scope=None):
8 |     if scope is None:
9 |         scope = tf.get_variable_scope().name + "/bn"
10 |     return slim.batch_norm(x, scope=scope)
11 | 
12 | 
13 | def create_link(
14 |         incoming, network_builder, scope, nonlinearity=tf.nn.elu,
15 |         weights_initializer=tf.truncated_normal_initializer(stddev=1e-3),
16 |         regularizer=None, is_first=False, summarize_activations=True):
17 |     if is_first:
18 |         network = incoming
19 |     else:
20 |         network = _batch_norm_fn(incoming, scope=scope + "/bn")
21 |         network = nonlinearity(network)
22 |         if summarize_activations:
23 |             tf.summary.histogram(scope + "/activations", network)
24 | 
25 |     pre_block_network = network
26 |     post_block_network = network_builder(pre_block_network, scope)
27 | 
28 |     incoming_dim = pre_block_network.get_shape().as_list()[-1]
29 |     outgoing_dim = post_block_network.get_shape().as_list()[-1]
30 |     if incoming_dim != outgoing_dim:
31 |         assert outgoing_dim == 2 * incoming_dim, \
32 |             "%d != %d" % (outgoing_dim, 2 * incoming_dim)
33 |         projection = slim.conv2d(
34 |             incoming, outgoing_dim, 1, 2, padding="SAME", activation_fn=None,
35 |             scope=scope + "/projection", weights_initializer=weights_initializer,
36 |             biases_initializer=None, weights_regularizer=regularizer)
37 |         network = projection + post_block_network
38 |     else:
39 |         network = incoming + post_block_network
40 |     return network
41 | 
42 | 
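It may help to see, in plain numpy, the shape bookkeeping that the projection shortcut in create_link performs when a block doubles the channel count. This is only an illustrative sketch with invented sizes; the real projection is a learned stride-2 1x1 convolution, stood in for here by subsampling plus channel duplication:

import numpy as np

x = np.random.rand(1, 32, 16, 64)          # NHWC input with 64 channels
block_out = np.random.rand(1, 16, 8, 128)  # inner block halved H, W and doubled C
shortcut = np.concatenate([x[:, ::2, ::2, :]] * 2, axis=-1)  # align shapes for the add
assert shortcut.shape == block_out.shape
residual = shortcut + block_out            # the residual sum create_link returns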
43 | def create_inner_block(
44 |         incoming, scope, nonlinearity=tf.nn.elu,
45 |         weights_initializer=tf.truncated_normal_initializer(1e-3),
46 |         bias_initializer=tf.zeros_initializer(), regularizer=None,
47 |         increase_dim=False, summarize_activations=True):
48 |     n = incoming.get_shape().as_list()[-1]
49 |     stride = 1
50 |     if increase_dim:
51 |         n *= 2
52 |         stride = 2
53 | 
54 |     incoming = slim.conv2d(
55 |         incoming, n, [3, 3], stride, activation_fn=nonlinearity, padding="SAME",
56 |         normalizer_fn=_batch_norm_fn, weights_initializer=weights_initializer,
57 |         biases_initializer=bias_initializer, weights_regularizer=regularizer,
58 |         scope=scope + "/1")
59 |     if summarize_activations:
60 |         tf.summary.histogram(incoming.name + "/activations", incoming)
61 | 
62 |     incoming = slim.dropout(incoming, keep_prob=0.6)
63 | 
64 |     incoming = slim.conv2d(
65 |         incoming, n, [3, 3], 1, activation_fn=None, padding="SAME",
66 |         normalizer_fn=None, weights_initializer=weights_initializer,
67 |         biases_initializer=bias_initializer, weights_regularizer=regularizer,
68 |         scope=scope + "/2")
69 |     return incoming
70 | 
71 | 
72 | def residual_block(incoming, scope, nonlinearity=tf.nn.elu,
73 |                    weights_initializer=tf.truncated_normal_initializer(1e-3),
74 |                    bias_initializer=tf.zeros_initializer(), regularizer=None,
75 |                    increase_dim=False, is_first=False,
76 |                    summarize_activations=True):
77 | 
78 |     def network_builder(x, s):
79 |         return create_inner_block(
80 |             x, s, nonlinearity, weights_initializer, bias_initializer,
81 |             regularizer, increase_dim, summarize_activations)
82 | 
83 |     return create_link(
84 |         incoming, network_builder, scope, nonlinearity, weights_initializer,
85 |         regularizer, is_first, summarize_activations)
86 | 
87 | 
88 | def _create_network(incoming, reuse=None, weight_decay=1e-8):
89 |     nonlinearity = tf.nn.elu
90 |     conv_weight_init = tf.truncated_normal_initializer(stddev=1e-3)
91 |     conv_bias_init = tf.zeros_initializer()
92 |     conv_regularizer = slim.l2_regularizer(weight_decay)
93 |     fc_weight_init = tf.truncated_normal_initializer(stddev=1e-3)
94 |     fc_bias_init = tf.zeros_initializer()
95 |     fc_regularizer = slim.l2_regularizer(weight_decay)
96 | 
97 |     def batch_norm_fn(x):
98 |         return slim.batch_norm(x, scope=tf.get_variable_scope().name + "/bn")
99 | 
100 |     network = incoming
101 |     network = slim.conv2d(
102 |         network, 32, [3, 3], stride=1, activation_fn=nonlinearity,
103 |         padding="SAME", normalizer_fn=batch_norm_fn, scope="conv1_1",
104 |         weights_initializer=conv_weight_init, biases_initializer=conv_bias_init,
105 |         weights_regularizer=conv_regularizer)
106 |     network = slim.conv2d(
107 |         network, 32, [3, 3], stride=1, activation_fn=nonlinearity,
108 |         padding="SAME", normalizer_fn=batch_norm_fn, scope="conv1_2",
109 |         weights_initializer=conv_weight_init, biases_initializer=conv_bias_init,
110 |         weights_regularizer=conv_regularizer)
111 | 
112 |     # NOTE(nwojke): This is missing a padding="SAME" to match the CNN
113 |     # architecture in Table 1 of the paper. 
Information on how this affects 114 | # performance on MOT 16 training sequences can be found in 115 | # issue 10 https://github.com/nwojke/deep_sort/issues/10 116 | network = slim.max_pool2d(network, [3, 3], [2, 2], scope="pool1") 117 | 118 | network = residual_block( 119 | network, "conv2_1", nonlinearity, conv_weight_init, conv_bias_init, 120 | conv_regularizer, increase_dim=False, is_first=True) 121 | network = residual_block( 122 | network, "conv2_3", nonlinearity, conv_weight_init, conv_bias_init, 123 | conv_regularizer, increase_dim=False) 124 | 125 | network = residual_block( 126 | network, "conv3_1", nonlinearity, conv_weight_init, conv_bias_init, 127 | conv_regularizer, increase_dim=True) 128 | network = residual_block( 129 | network, "conv3_3", nonlinearity, conv_weight_init, conv_bias_init, 130 | conv_regularizer, increase_dim=False) 131 | 132 | network = residual_block( 133 | network, "conv4_1", nonlinearity, conv_weight_init, conv_bias_init, 134 | conv_regularizer, increase_dim=True) 135 | network = residual_block( 136 | network, "conv4_3", nonlinearity, conv_weight_init, conv_bias_init, 137 | conv_regularizer, increase_dim=False) 138 | 139 | feature_dim = network.get_shape().as_list()[-1] 140 | network = slim.flatten(network) 141 | 142 | network = slim.dropout(network, keep_prob=0.6) 143 | network = slim.fully_connected( 144 | network, feature_dim, activation_fn=nonlinearity, 145 | normalizer_fn=batch_norm_fn, weights_regularizer=fc_regularizer, 146 | scope="fc1", weights_initializer=fc_weight_init, 147 | biases_initializer=fc_bias_init) 148 | 149 | features = network 150 | 151 | # Features in rows, normalize axis 1. 152 | features = slim.batch_norm(features, scope="ball", reuse=reuse) 153 | feature_norm = tf.sqrt( 154 | tf.constant(1e-8, tf.float32) + 155 | tf.reduce_sum(tf.square(features), [1], keepdims=True)) 156 | features = features / feature_norm 157 | return features, None 158 | 159 | 160 | def _network_factory(weight_decay=1e-8): 161 | 162 | def factory_fn(image, reuse): 163 | with slim.arg_scope([slim.batch_norm, slim.dropout], 164 | is_training=False): 165 | with slim.arg_scope([slim.conv2d, slim.fully_connected, 166 | slim.batch_norm, slim.layer_norm], 167 | reuse=reuse): 168 | features, logits = _create_network( 169 | image, reuse=reuse, weight_decay=weight_decay) 170 | return features, logits 171 | 172 | return factory_fn 173 | 174 | 175 | def _preprocess(image): 176 | image = image[:, :, ::-1] # BGR to RGB 177 | return image 178 | 179 | 180 | def parse_args(): 181 | """Parse command line arguments. 
182 | """ 183 | parser = argparse.ArgumentParser(description="Freeze old model") 184 | parser.add_argument( 185 | "--checkpoint_in", 186 | default="resources/networks/mars-small128.ckpt-68577", 187 | help="Path to checkpoint file") 188 | parser.add_argument( 189 | "--graphdef_out", 190 | default="resources/networks/mars-small128.pb") 191 | return parser.parse_args() 192 | 193 | 194 | def main(): 195 | args = parse_args() 196 | 197 | with tf.Session(graph=tf.Graph()) as session: 198 | input_var = tf.placeholder( 199 | tf.uint8, (None, 128, 64, 3), name="images") 200 | image_var = tf.map_fn( 201 | lambda x: _preprocess(x), tf.cast(input_var, tf.float32), 202 | back_prop=False) 203 | 204 | factory_fn = _network_factory() 205 | features, _ = factory_fn(image_var, reuse=None) 206 | features = tf.identity(features, name="features") 207 | 208 | saver = tf.train.Saver(slim.get_variables_to_restore()) 209 | saver.restore(session, args.checkpoint_in) 210 | 211 | output_graph_def = tf.graph_util.convert_variables_to_constants( 212 | session, tf.get_default_graph().as_graph_def(), 213 | [features.name.split(":")[0]]) 214 | with tf.gfile.GFile(args.graphdef_out, "wb") as file_handle: 215 | file_handle.write(output_graph_def.SerializeToString()) 216 | 217 | 218 | if __name__ == "__main__": 219 | main() 220 | -------------------------------------------------------------------------------- /tools/generate_detections.py: -------------------------------------------------------------------------------- 1 | # vim: expandtab:ts=4:sw=4 2 | import os 3 | import errno 4 | import argparse 5 | import numpy as np 6 | import cv2 7 | import tensorflow as tf 8 | 9 | 10 | def _run_in_batches(f, data_dict, out, batch_size): 11 | data_len = len(out) 12 | num_batches = int(data_len / batch_size) 13 | 14 | s, e = 0, 0 15 | for i in range(num_batches): 16 | s, e = i * batch_size, (i + 1) * batch_size 17 | batch_data_dict = {k: v[s:e] for k, v in data_dict.items()} 18 | out[s:e] = f(batch_data_dict) 19 | if e < len(out): 20 | batch_data_dict = {k: v[e:] for k, v in data_dict.items()} 21 | out[e:] = f(batch_data_dict) 22 | 23 | 24 | def extract_image_patch(image, bbox, patch_shape): 25 | """Extract image patch from bounding box. 26 | 27 | Parameters 28 | ---------- 29 | image : ndarray 30 | The full image. 31 | bbox : array_like 32 | The bounding box in format (x, y, width, height). 33 | patch_shape : Optional[array_like] 34 | This parameter can be used to enforce a desired patch shape 35 | (height, width). First, the `bbox` is adapted to the aspect ratio 36 | of the patch shape, then it is clipped at the image boundaries. 37 | If None, the shape is computed from :arg:`bbox`. 38 | 39 | Returns 40 | ------- 41 | ndarray | NoneType 42 | An image patch showing the :arg:`bbox`, optionally reshaped to 43 | :arg:`patch_shape`. 44 | Returns None if the bounding box is empty or fully outside of the image 45 | boundaries. 
46 | 47 | """ 48 | bbox = np.array(bbox) 49 | if patch_shape is not None: 50 | # correct aspect ratio to patch shape 51 | target_aspect = float(patch_shape[1]) / patch_shape[0] 52 | new_width = target_aspect * bbox[3] 53 | bbox[0] -= (new_width - bbox[2]) / 2 54 | bbox[2] = new_width 55 | 56 | # convert to top left, bottom right 57 | bbox[2:] += bbox[:2] 58 | bbox = bbox.astype(np.int) 59 | 60 | # clip at image boundaries 61 | bbox[:2] = np.maximum(0, bbox[:2]) 62 | bbox[2:] = np.minimum(np.asarray(image.shape[:2][::-1]) - 1, bbox[2:]) 63 | if np.any(bbox[:2] >= bbox[2:]): 64 | return None 65 | sx, sy, ex, ey = bbox 66 | image = image[sy:ey, sx:ex] 67 | image = cv2.resize(image, tuple(patch_shape[::-1])) 68 | return image 69 | 70 | 71 | class ImageEncoder(object): 72 | 73 | def __init__(self, checkpoint_filename, input_name="images", 74 | output_name="features"): 75 | self.session = tf.Session() 76 | with tf.gfile.GFile(checkpoint_filename, "rb") as file_handle: 77 | graph_def = tf.GraphDef() 78 | graph_def.ParseFromString(file_handle.read()) 79 | tf.import_graph_def(graph_def, name="net") 80 | self.input_var = tf.get_default_graph().get_tensor_by_name( 81 | "net/%s:0" % input_name) 82 | self.output_var = tf.get_default_graph().get_tensor_by_name( 83 | "net/%s:0" % output_name) 84 | 85 | assert len(self.output_var.get_shape()) == 2 86 | assert len(self.input_var.get_shape()) == 4 87 | self.feature_dim = self.output_var.get_shape().as_list()[-1] 88 | self.image_shape = self.input_var.get_shape().as_list()[1:] 89 | 90 | def __call__(self, data_x, batch_size=32): 91 | out = np.zeros((len(data_x), self.feature_dim), np.float32) 92 | _run_in_batches( 93 | lambda x: self.session.run(self.output_var, feed_dict=x), 94 | {self.input_var: data_x}, out, batch_size) 95 | return out 96 | 97 | 98 | def create_box_encoder(model_filename, input_name="images", 99 | output_name="features", batch_size=32): 100 | image_encoder = ImageEncoder(model_filename, input_name, output_name) 101 | image_shape = image_encoder.image_shape 102 | 103 | def encoder(image, boxes): 104 | image_patches = [] 105 | for box in boxes: 106 | patch = extract_image_patch(image, box, image_shape[:2]) 107 | if patch is None: 108 | print("WARNING: Failed to extract image patch: %s." % str(box)) 109 | patch = np.random.uniform( 110 | 0., 255., image_shape).astype(np.uint8) 111 | image_patches.append(patch) 112 | image_patches = np.asarray(image_patches) 113 | return image_encoder(image_patches, batch_size) 114 | 115 | return encoder 116 | 117 | 118 | def generate_detections(encoder, mot_dir, output_dir, detection_dir=None): 119 | """Generate detections with features. 120 | 121 | Parameters 122 | ---------- 123 | encoder : Callable[image, ndarray] -> ndarray 124 | The encoder function takes as input a BGR color image and a matrix of 125 | bounding boxes in format `(x, y, w, h)` and returns a matrix of 126 | corresponding feature vectors. 127 | mot_dir : str 128 | Path to the MOTChallenge directory (can be either train or test). 129 | output_dir 130 | Path to the output directory. Will be created if it does not exist. 131 | detection_dir 132 | Path to custom detections. The directory structure should be the default 133 | MOTChallenge structure: `[sequence]/det/det.txt`. If None, uses the 134 | standard MOTChallenge detections. 
135 | 
136 |     """
137 |     if detection_dir is None:
138 |         detection_dir = mot_dir
139 |     try:
140 |         os.makedirs(output_dir)
141 |     except OSError as exception:
142 |         if exception.errno == errno.EEXIST and os.path.isdir(output_dir):
143 |             pass
144 |         else:
145 |             raise ValueError(
146 |                 "Failed to create output directory '%s'" % output_dir)
147 | 
148 |     for sequence in os.listdir(mot_dir):
149 |         print("Processing %s" % sequence)
150 |         sequence_dir = os.path.join(mot_dir, sequence)
151 | 
152 |         image_dir = os.path.join(sequence_dir, "img1")
153 |         image_filenames = {
154 |             int(os.path.splitext(f)[0]): os.path.join(image_dir, f)
155 |             for f in os.listdir(image_dir)}
156 | 
157 |         detection_file = os.path.join(
158 |             detection_dir, sequence, "det/det.txt")
159 |         detections_in = np.loadtxt(detection_file, delimiter=',')
160 |         detections_out = []
161 | 
162 |         frame_indices = detections_in[:, 0].astype(np.int)
163 |         min_frame_idx = frame_indices.astype(np.int).min()
164 |         max_frame_idx = frame_indices.astype(np.int).max()
165 |         for frame_idx in range(min_frame_idx, max_frame_idx + 1):
166 |             print("Frame %05d/%05d" % (frame_idx, max_frame_idx))
167 |             mask = frame_indices == frame_idx
168 |             rows = detections_in[mask]
169 | 
170 |             if frame_idx not in image_filenames:
171 |                 print("WARNING: could not find image for frame %d" % frame_idx)
172 |                 continue
173 |             bgr_image = cv2.imread(
174 |                 image_filenames[frame_idx], cv2.IMREAD_COLOR)
175 |             features = encoder(bgr_image, rows[:, 2:6].copy())
176 |             detections_out += [np.r_[(row, feature)] for row, feature
177 |                                in zip(rows, features)]
178 | 
179 |         output_filename = os.path.join(output_dir, "%s.npy" % sequence)
180 |         np.save(
181 |             output_filename, np.asarray(detections_out), allow_pickle=False)
182 | 
183 | 
184 | def parse_args():
185 |     """Parse command line arguments.
186 |     """
187 |     parser = argparse.ArgumentParser(description="Re-ID feature extractor")
188 |     parser.add_argument(
189 |         "--model",
190 |         default="resources/networks/mars-small128.pb",
191 |         help="Path to frozen inference graph protobuf.")
192 |     parser.add_argument(
193 |         "--mot_dir", help="Path to MOTChallenge directory (train or test)",
194 |         required=True)
195 |     parser.add_argument(
196 |         "--detection_dir", help="Path to custom detections. Defaults to "
197 |         "standard MOT detections. Directory structure should be the default "
198 |         "MOTChallenge structure: [sequence]/det/det.txt", default=None)
199 |     parser.add_argument(
200 |         "--output_dir", help="Output directory. Will be created if it does not"
201 |         " exist.", default="detections")
202 |     return parser.parse_args()
203 | 
204 | 
205 | def main():
206 |     args = parse_args()
207 |     encoder = create_box_encoder(args.model, batch_size=32)
208 |     generate_detections(encoder, args.mot_dir, args.output_dir,
209 |                         args.detection_dir)
210 | 
211 | 
212 | if __name__ == "__main__":
213 |     main()
214 | 
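With the definitions above, the encoder can also be used outside the MOTChallenge layout to embed detections from a single frame. A minimal sketch; the frame path and boxes are invented, while the .pb path is the re-ID model shipped in this repository's model_data:

import cv2
import numpy as np

encoder = create_box_encoder("model_data/mars-small128.pb", batch_size=32)
image = cv2.imread("frame.jpg", cv2.IMREAD_COLOR)          # hypothetical frame
boxes = np.array([[10, 20, 64, 128], [200, 40, 64, 128]])  # one (x, y, w, h) row per detection
features = encoder(image, boxes)                           # one feature row per box
print(features.shape)                                      # e.g. (2, 128) for mars-small128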
5 | """ 6 | 7 | import colorsys 8 | import os 9 | import random 10 | from timeit import time 11 | from timeit import default_timer as timer ### to calculate FPS 12 | import cv2 13 | import numpy as np 14 | from keras import backend as K 15 | from keras.models import load_model 16 | from PIL import Image, ImageFont, ImageDraw 17 | 18 | from yolo3.model import yolo_eval 19 | from yolo3.utils import letterbox_image 20 | import argparse 21 | ap = argparse.ArgumentParser() 22 | ap.add_argument("-i", "--input",help="path to input video", default = "./test_video/det_t1_video_00315_test.avi") 23 | ap.add_argument("-c", "--class",help="name of class", default = "person") 24 | args = vars(ap.parse_args()) 25 | 26 | class YOLO(object): 27 | def __init__(self): 28 | self.model_path = './model_data/yolo.h5' 29 | self.anchors_path = 'model_data/yolo_anchors.txt' 30 | self.classes_path = 'model_data/coco_classes.txt' 31 | #具体参数可实验后进行调整 32 | if args["class"] == 'person': 33 | self.score = 0.6 #0.8 34 | self.iou = 0.6 35 | self.model_image_size = (416,416) 36 | if args["class"] == 'car': 37 | self.score = 0.6 38 | self.iou = 0.6 39 | self.model_image_size = (416, 416) 40 | if args["class"] == 'bicycle' or args["class"] == 'motorcycle': 41 | self.score = 0.6 42 | self.iou = 0.6 43 | self.model_image_size = (416, 416) 44 | if args["class"] == 'fire_extinguisher' or args["class"] == 'fireplug': 45 | self.score = 0.4#0.4 46 | self.iou = 0.6 47 | self.model_image_size = (416, 416) 48 | if args["class"] == 'cup' or args["class"] == 'mouse': 49 | self.score = 0.6 50 | self.iou = 0.6 51 | 52 | self.class_names = self._get_class() 53 | self.anchors = self._get_anchors() 54 | self.sess = K.get_session() 55 | #self.model_image_size = (416, 416) # fixed size or (None, None) small targets:(320,320) mid targets:(960,960) 56 | self.is_fixed_size = self.model_image_size != (None, None) 57 | self.boxes, self.scores, self.classes = self.generate() 58 | 59 | def _get_class(self): 60 | classes_path = os.path.expanduser(self.classes_path) 61 | with open(classes_path) as f: 62 | class_names = f.readlines() 63 | class_names = [c.strip() for c in class_names] 64 | #print(class_names) 65 | return class_names 66 | 67 | def _get_anchors(self): 68 | anchors_path = os.path.expanduser(self.anchors_path) 69 | with open(anchors_path) as f: 70 | anchors = f.readline() 71 | anchors = [float(x) for x in anchors.split(',')] 72 | anchors = np.array(anchors).reshape(-1, 2) 73 | return anchors 74 | 75 | def generate(self): 76 | model_path = os.path.expanduser(self.model_path) 77 | assert model_path.endswith('.h5'), 'Keras model must be a .h5 file.' 78 | 79 | self.yolo_model = load_model(model_path, compile=False) 80 | print('{} model, anchors, and classes loaded.'.format(model_path)) 81 | 82 | # Generate colors for drawing bounding boxes. 83 | hsv_tuples = [(x / len(self.class_names), 1., 1.) 84 | for x in range(len(self.class_names))] 85 | self.colors = list(map(lambda x: colorsys.hsv_to_rgb(*x), hsv_tuples)) 86 | self.colors = list( 87 | map(lambda x: (int(x[0] * 255), int(x[1] * 255), int(x[2] * 255)), 88 | self.colors)) 89 | random.seed(10101) # Fixed seed for consistent colors across runs. 90 | random.shuffle(self.colors) # Shuffle colors to decorrelate adjacent classes. 91 | random.seed(None) # Reset seed to default. 92 | 93 | # Generate output tensor targets for filtered bounding boxes. 
94 | self.input_image_shape = K.placeholder(shape=(2, )) 95 | boxes, scores, classes = yolo_eval(self.yolo_model.output, self.anchors, 96 | len(self.class_names), self.input_image_shape, 97 | score_threshold=self.score, iou_threshold=self.iou) 98 | return boxes, scores, classes 99 | 100 | def detect_image(self, image): 101 | if self.is_fixed_size: 102 | assert self.model_image_size[0]%32 == 0, 'Multiples of 32 required' 103 | assert self.model_image_size[1]%32 == 0, 'Multiples of 32 required' 104 | boxed_image = letterbox_image(image, tuple(reversed(self.model_image_size))) 105 | else: 106 | new_image_size = (image.width - (image.width % 32), 107 | image.height - (image.height % 32)) 108 | boxed_image = letterbox_image(image, new_image_size) 109 | image_data = np.array(boxed_image, dtype='float32') 110 | 111 | #print(image_data.shape) 112 | image_data /= 255. 113 | image_data = np.expand_dims(image_data, 0) # Add batch dimension. 114 | 115 | out_boxes, out_scores, out_classes = self.sess.run( 116 | [self.boxes, self.scores, self.classes], 117 | feed_dict={ 118 | self.yolo_model.input: image_data, 119 | self.input_image_shape: [image.size[1], image.size[0]], 120 | K.learning_phase(): 0 121 | }) 122 | return_boxs = [] 123 | return_class_name = [] 124 | person_counter = 0 125 | for i, c in reversed(list(enumerate(out_classes))): 126 | predicted_class = self.class_names[c] 127 | #print(self.class_names[c]) 128 | 129 | if predicted_class != 'person' and predicted_class != 'bicycle': 130 | print(predicted_class) 131 | continue 132 | 133 | # if predicted_class != args["class"]:#and predicted_class != 'car': 134 | # #print(predicted_class) 135 | # continue 136 | 137 | person_counter += 1 138 | #if predicted_class != 'car': 139 | #continue 140 | #label = predicted_class 141 | box = out_boxes[i] 142 | #score = out_scores[i] 143 | x = int(box[1]) 144 | y = int(box[0]) 145 | w = int(box[3]-box[1]) 146 | h = int(box[2]-box[0]) 147 | if x < 0 : 148 | w = w + x 149 | x = 0 150 | if y < 0 : 151 | h = h + y 152 | y = 0 153 | return_boxs.append([x,y,w,h]) 154 | #print(return_boxs) 155 | return_class_name.append([predicted_class]) 156 | #cv2.putText(image, str(self.class_names[c]),(int(box[0]), int(box[1] -50)),0, 5e-3 * 150, (0,255,0),2) 157 | #print("Found person: ",person_counter) 158 | return return_boxs,return_class_name 159 | 160 | def close_session(self): 161 | self.sess.close() 162 | -------------------------------------------------------------------------------- /yolo3/model.py: -------------------------------------------------------------------------------- 1 | """YOLO_v3 Model Defined in Keras.""" 2 | 3 | from functools import wraps 4 | 5 | import numpy as np 6 | import tensorflow as tf 7 | from keras import backend as K 8 | from keras.layers import Conv2D, Add, ZeroPadding2D, UpSampling2D, Concatenate 9 | from keras.layers.advanced_activations import LeakyReLU 10 | from keras.layers.normalization import BatchNormalization 11 | from keras.models import Model 12 | from keras.regularizers import l2 13 | 14 | from yolo3.utils import compose 15 | 16 | 17 | @wraps(Conv2D) 18 | def DarknetConv2D(*args, **kwargs): 19 | """Wrapper to set Darknet parameters for Convolution2D.""" 20 | darknet_conv_kwargs = {'kernel_regularizer': l2(5e-4)} 21 | darknet_conv_kwargs['padding'] = 'valid' if kwargs.get('strides')==(2,2) else 'same' 22 | darknet_conv_kwargs.update(kwargs) 23 | return Conv2D(*args, **darknet_conv_kwargs) 24 | 25 | def DarknetConv2D_BN_Leaky(*args, **kwargs): 26 | """Darknet Convolution2D followed 
by BatchNormalization and LeakyReLU."""
27 |     no_bias_kwargs = {'use_bias': False}
28 |     no_bias_kwargs.update(kwargs)
29 |     return compose(
30 |         DarknetConv2D(*args, **no_bias_kwargs),
31 |         BatchNormalization(),
32 |         LeakyReLU(alpha=0.1))
33 | 
34 | def resblock_body(x, num_filters, num_blocks):
35 |     '''A series of resblocks starting with a downsampling Convolution2D'''
36 |     # Darknet uses left and top padding instead of 'same' mode
37 |     x = ZeroPadding2D(((1,0),(1,0)))(x)
38 |     x = DarknetConv2D_BN_Leaky(num_filters, (3,3), strides=(2,2))(x)
39 |     for i in range(num_blocks):
40 |         y = compose(
41 |             DarknetConv2D_BN_Leaky(num_filters//2, (1,1)),
42 |             DarknetConv2D_BN_Leaky(num_filters, (3,3)))(x)
43 |         x = Add()([x,y])
44 |     return x
45 | 
46 | def darknet_body(x):
47 |     '''Darknet body having 52 Convolution2D layers'''
48 |     x = DarknetConv2D_BN_Leaky(32, (3,3))(x)
49 |     x = resblock_body(x, 64, 1)
50 |     x = resblock_body(x, 128, 2)
51 |     x = resblock_body(x, 256, 8)
52 |     x = resblock_body(x, 512, 8)
53 |     x = resblock_body(x, 1024, 4)
54 |     return x
55 | 
56 | def make_last_layers(x, num_filters, out_filters):
57 |     '''6 Conv2D_BN_Leaky layers followed by a Conv2D_linear layer'''
58 |     x = compose(
59 |         DarknetConv2D_BN_Leaky(num_filters, (1,1)),
60 |         DarknetConv2D_BN_Leaky(num_filters*2, (3,3)),
61 |         DarknetConv2D_BN_Leaky(num_filters, (1,1)),
62 |         DarknetConv2D_BN_Leaky(num_filters*2, (3,3)),
63 |         DarknetConv2D_BN_Leaky(num_filters, (1,1)))(x)
64 |     y = compose(
65 |         DarknetConv2D_BN_Leaky(num_filters*2, (3,3)),
66 |         DarknetConv2D(out_filters, (1,1)))(x)
67 |     return x, y
68 | 
69 | 
70 | def yolo_body(inputs, num_anchors, num_classes):
71 |     """Create YOLO_V3 model CNN body in Keras."""
72 |     darknet = Model(inputs, darknet_body(inputs))
73 |     x, y1 = make_last_layers(darknet.output, 512, num_anchors*(num_classes+5))
74 | 
75 |     x = compose(
76 |         DarknetConv2D_BN_Leaky(256, (1,1)),
77 |         UpSampling2D(2))(x)
78 |     x = Concatenate()([x, darknet.layers[152].output])
79 |     x, y2 = make_last_layers(x, 256, num_anchors*(num_classes+5))
80 | 
81 |     x = compose(
82 |         DarknetConv2D_BN_Leaky(128, (1,1)),
83 |         UpSampling2D(2))(x)
84 |     x = Concatenate()([x, darknet.layers[92].output])
85 |     x, y3 = make_last_layers(x, 128, num_anchors*(num_classes+5))
86 | 
87 |     return Model(inputs, [y1, y2, y3])
88 | 
89 | 
90 | def yolo_head(feats, anchors, num_classes, input_shape):
91 |     """Convert final layer features to bounding box parameters."""
92 |     num_anchors = len(anchors)
93 |     # Reshape to batch, height, width, num_anchors, box_params.
94 |     anchors_tensor = K.reshape(K.constant(anchors), [1, 1, 1, num_anchors, 2])
95 | 
96 |     grid_shape = K.shape(feats)[1:3]  # height, width
97 |     grid_y = K.tile(K.reshape(K.arange(0, stop=grid_shape[0]), [-1, 1, 1, 1]),
98 |                     [1, grid_shape[1], 1, 1])
99 |     grid_x = K.tile(K.reshape(K.arange(0, stop=grid_shape[1]), [1, -1, 1, 1]),
100 |                     [grid_shape[0], 1, 1, 1])
101 |     grid = K.concatenate([grid_x, grid_y])
102 |     grid = K.cast(grid, K.dtype(feats))
103 | 
104 |     feats = K.reshape(
105 |         feats, [-1, grid_shape[0], grid_shape[1], num_anchors, num_classes + 5])
106 | 
107 |     box_xy = K.sigmoid(feats[..., :2])
108 |     box_wh = K.exp(feats[..., 2:4])
109 |     box_confidence = K.sigmoid(feats[..., 4:5])
110 |     box_class_probs = K.sigmoid(feats[..., 5:])
111 | 
112 |     # Adjust predictions to each spatial grid point and anchor size.
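# For each grid cell (cx, cy) and anchor (aw, ah) the decoded box is:
#   x = (sigmoid(tx) + cx) / grid_w,  y = (sigmoid(ty) + cy) / grid_h
#   w = aw * exp(tw) / input_w,       h = ah * exp(th) / input_h
# e.g. on a 13x13 grid, sigmoid(tx) = 0.5 in column cx = 3 gives
# x = (0.5 + 3) / 13 ~= 0.27, normalized to the model input size.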
113 | box_xy = (box_xy + grid) / K.cast(grid_shape[::-1], K.dtype(feats)) 114 | box_wh = box_wh * anchors_tensor / K.cast(input_shape[::-1], K.dtype(feats)) 115 | 116 | return box_xy, box_wh, box_confidence, box_class_probs 117 | 118 | 119 | def yolo_correct_boxes(box_xy, box_wh, input_shape, image_shape): 120 | '''Get corrected boxes''' 121 | box_yx = box_xy[..., ::-1] 122 | box_hw = box_wh[..., ::-1] 123 | input_shape = K.cast(input_shape, K.dtype(box_yx)) 124 | image_shape = K.cast(image_shape, K.dtype(box_yx)) 125 | new_shape = K.round(image_shape * K.min(input_shape/image_shape)) 126 | offset = (input_shape-new_shape)/2./input_shape 127 | scale = input_shape/new_shape 128 | box_yx = (box_yx - offset) * scale 129 | box_hw *= scale 130 | 131 | box_mins = box_yx - (box_hw / 2.) 132 | box_maxes = box_yx + (box_hw / 2.) 133 | boxes = K.concatenate([ 134 | box_mins[..., 0:1], # y_min 135 | box_mins[..., 1:2], # x_min 136 | box_maxes[..., 0:1], # y_max 137 | box_maxes[..., 1:2] # x_max 138 | ]) 139 | 140 | # Scale boxes back to original image shape. 141 | boxes *= K.concatenate([image_shape, image_shape]) 142 | return boxes 143 | 144 | 145 | def yolo_boxes_and_scores(feats, anchors, num_classes, input_shape, image_shape): 146 | '''Process Conv layer output''' 147 | box_xy, box_wh, box_confidence, box_class_probs = yolo_head(feats, 148 | anchors, num_classes, input_shape) 149 | boxes = yolo_correct_boxes(box_xy, box_wh, input_shape, image_shape) 150 | boxes = K.reshape(boxes, [-1, 4]) 151 | box_scores = box_confidence * box_class_probs 152 | box_scores = K.reshape(box_scores, [-1, num_classes]) 153 | return boxes, box_scores 154 | 155 | 156 | def yolo_eval(yolo_outputs, 157 | anchors, 158 | num_classes, 159 | image_shape, 160 | max_boxes=20, 161 | score_threshold=.6, 162 | iou_threshold=.5): 163 | """Evaluate YOLO model on given input and return filtered boxes.""" 164 | anchor_mask = [[6,7,8], [3,4,5], [0,1,2]] 165 | input_shape = K.shape(yolo_outputs[0])[1:3] * 32 166 | boxes = [] 167 | box_scores = [] 168 | for l in range(3): 169 | _boxes, _box_scores = yolo_boxes_and_scores(yolo_outputs[l], 170 | anchors[anchor_mask[l]], num_classes, input_shape, image_shape) 171 | boxes.append(_boxes) 172 | box_scores.append(_box_scores) 173 | boxes = K.concatenate(boxes, axis=0) 174 | box_scores = K.concatenate(box_scores, axis=0) 175 | 176 | mask = box_scores >= score_threshold 177 | max_boxes_tensor = K.constant(max_boxes, dtype='int32') 178 | boxes_ = [] 179 | scores_ = [] 180 | classes_ = [] 181 | for c in range(num_classes): 182 | # TODO: use keras backend instead of tf. 
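# Per-class NMS: for each class c, keep the boxes whose class score passed
# the threshold (mask), suppress overlaps among them by IoU, then tag the
# survivors with class id c; the per-class results are concatenated below.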
183 |         class_boxes = tf.boolean_mask(boxes, mask[:, c])
184 |         class_box_scores = tf.boolean_mask(box_scores[:, c], mask[:, c])
185 |         nms_index = tf.image.non_max_suppression(
186 |             class_boxes, class_box_scores, max_boxes_tensor, iou_threshold=iou_threshold)
187 |         class_boxes = K.gather(class_boxes, nms_index)
188 |         class_box_scores = K.gather(class_box_scores, nms_index)
189 |         classes = K.ones_like(class_box_scores, 'int32') * c
190 |         boxes_.append(class_boxes)
191 |         scores_.append(class_box_scores)
192 |         classes_.append(classes)
193 |     boxes_ = K.concatenate(boxes_, axis=0)
194 |     scores_ = K.concatenate(scores_, axis=0)
195 |     classes_ = K.concatenate(classes_, axis=0)
196 | 
197 |     return boxes_, scores_, classes_
198 | 
199 | 
200 | def preprocess_true_boxes(true_boxes, input_shape, anchors, num_classes):
201 |     '''Preprocess true boxes to training input format
202 | 
203 |     Parameters
204 |     ----------
205 |     true_boxes: array, shape=(m, T, 5)
206 |         Absolute x_min, y_min, x_max, y_max, class_code relative to input_shape.
207 |     input_shape: array-like, hw, multiples of 32
208 |     anchors: array, shape=(N, 2), wh
209 |     num_classes: integer
210 | 
211 |     Returns
212 |     -------
213 |     y_true: list of array, shape like yolo_outputs, xywh are relative values
214 | 
215 |     '''
216 |     anchor_mask = [[6,7,8], [3,4,5], [0,1,2]]
217 | 
218 |     true_boxes = np.array(true_boxes, dtype='float32')
219 |     input_shape = np.array(input_shape, dtype='int32')
220 |     boxes_xy = (true_boxes[..., 0:2] + true_boxes[..., 2:4]) // 2
221 |     boxes_wh = true_boxes[..., 2:4] - true_boxes[..., 0:2]
222 |     true_boxes[..., 0:2] = boxes_xy/input_shape[::-1]
223 |     true_boxes[..., 2:4] = boxes_wh/input_shape[::-1]
224 | 
225 |     m = true_boxes.shape[0]
226 |     grid_shapes = [input_shape//{0:32, 1:16, 2:8}[l] for l in range(3)]
227 |     y_true = [np.zeros((m, grid_shapes[l][0], grid_shapes[l][1], len(anchor_mask[l]), 5+num_classes),
228 |         dtype='float32') for l in range(3)]
229 | 
230 |     # Expand dim to apply broadcasting.
231 |     anchors = np.expand_dims(anchors, 0)
232 |     anchor_maxes = anchors / 2.
233 |     anchor_mins = -anchor_maxes
234 |     valid_mask = boxes_wh[..., 0] > 0
235 | 
236 |     for b in range(m):
237 |         # Discard zero rows.
238 |         wh = boxes_wh[b, valid_mask[b]]
239 |         # Expand dim to apply broadcasting.
240 |         wh = np.expand_dims(wh, -2)
241 |         box_maxes = wh / 2.
242 |         box_mins = -box_maxes
243 | 
244 |         intersect_mins = np.maximum(box_mins, anchor_mins)
245 |         intersect_maxes = np.minimum(box_maxes, anchor_maxes)
246 |         intersect_wh = np.maximum(intersect_maxes - intersect_mins, 0.)
247 |         intersect_area = intersect_wh[..., 0] * intersect_wh[..., 1]
248 |         box_area = wh[..., 0] * wh[..., 1]
249 |         anchor_area = anchors[..., 0] * anchors[..., 1]
250 |         iou = intersect_area / (box_area + anchor_area - intersect_area)
251 | 
252 |         # Find best anchor for each true box
253 |         best_anchor = np.argmax(iou, axis=-1)
254 | 
255 |         for t, n in enumerate(best_anchor):
256 |             for l in range(3):
257 |                 if n in anchor_mask[l]:
258 |                     i = np.floor(true_boxes[b,t,0]*grid_shapes[l][1]).astype('int32')
259 |                     j = np.floor(true_boxes[b,t,1]*grid_shapes[l][0]).astype('int32')
260 |                     n = anchor_mask[l].index(n)
261 |                     c = true_boxes[b,t, 4].astype('int32')
262 |                     y_true[l][b, j, i, n, 0:4] = true_boxes[b,t, 0:4]
263 |                     y_true[l][b, j, i, n, 4] = 1
264 |                     y_true[l][b, j, i, n, 5+c] = 1
265 |                     break
266 | 
267 |     return y_true
268 | 
269 | def box_iou(b1, b2):
270 |     '''Return iou tensor
271 | 
272 |     Parameters
273 |     ----------
274 |     b1: tensor, shape=(i1,...,iN, 4), xywh
275 |     b2: tensor, shape=(j, 4), xywh
276 | 
277 |     Returns
278 |     -------
279 |     iou: tensor, shape=(i1,...,iN, j)
280 | 
281 |     '''
282 | 
283 |     # Expand dim to apply broadcasting.
284 |     b1 = K.expand_dims(b1, -2)
285 |     b1_xy = b1[..., :2]
286 |     b1_wh = b1[..., 2:4]
287 |     b1_wh_half = b1_wh/2.
288 |     b1_mins = b1_xy - b1_wh_half
289 |     b1_maxes = b1_xy + b1_wh_half
290 | 
291 |     # Expand dim to apply broadcasting.
292 |     b2 = K.expand_dims(b2, 0)
293 |     b2_xy = b2[..., :2]
294 |     b2_wh = b2[..., 2:4]
295 |     b2_wh_half = b2_wh/2.
296 |     b2_mins = b2_xy - b2_wh_half
297 |     b2_maxes = b2_xy + b2_wh_half
298 | 
299 |     intersect_mins = K.maximum(b1_mins, b2_mins)
300 |     intersect_maxes = K.minimum(b1_maxes, b2_maxes)
301 |     intersect_wh = K.maximum(intersect_maxes - intersect_mins, 0.)
302 |     intersect_area = intersect_wh[..., 0] * intersect_wh[..., 1]
303 |     b1_area = b1_wh[..., 0] * b1_wh[..., 1]
304 |     b2_area = b2_wh[..., 0] * b2_wh[..., 1]
305 |     iou = intersect_area / (b1_area + b2_area - intersect_area)
306 | 
307 |     return iou
308 | 
309 | 
310 | 
311 | def yolo_loss(args, anchors, num_classes, ignore_thresh=.5):
312 |     '''Return yolo_loss tensor
313 | 
314 |     Parameters
315 |     ----------
316 |     yolo_outputs: list of tensor, the output of yolo_body
317 |     y_true: list of array, the output of preprocess_true_boxes
318 |     anchors: array, shape=(T, 2), wh
319 |     num_classes: integer
320 |     ignore_thresh: float, predictions whose best IoU with any true box exceeds this are ignored in the no-object confidence loss
321 | 
322 |     Returns
323 |     -------
324 |     loss: tensor, shape=(1,)
325 | 
326 |     '''
327 |     yolo_outputs = args[:3]
328 |     y_true = args[3:]
329 |     anchor_mask = [[6,7,8], [3,4,5], [0,1,2]]
330 |     input_shape = K.cast(K.shape(yolo_outputs[0])[1:3] * 32, K.dtype(y_true[0]))
331 |     grid_shapes = [K.cast(K.shape(yolo_outputs[l])[1:3], K.dtype(y_true[0])) for l in range(3)]
332 |     loss = 0
333 |     m = K.shape(yolo_outputs[0])[0]
334 | 
335 |     for l in range(3):
336 |         object_mask = y_true[l][..., 4:5]
337 |         true_class_probs = y_true[l][..., 5:]
338 | 
339 |         pred_xy, pred_wh, pred_confidence, pred_class_probs = yolo_head(yolo_outputs[l],
340 |             anchors[anchor_mask[l]], num_classes, input_shape)
341 |         pred_box = K.concatenate([pred_xy, pred_wh])
342 | 
343 |         # Darknet box loss.
344 |         xy_delta = (y_true[l][..., :2]-pred_xy)*grid_shapes[l][::-1]
345 |         wh_delta = K.log(y_true[l][..., 2:4]) - K.log(pred_wh)
346 |         # Avoid log(0)=-inf.
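# K.switch keeps wh_delta only where object_mask is 1; background cells have
# zero width/height in y_true, whose log would be -inf, so they are zeroed.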
347 | wh_delta = K.switch(object_mask, wh_delta, K.zeros_like(wh_delta)) 348 | box_delta = K.concatenate([xy_delta, wh_delta], axis=-1) 349 | box_delta_scale = 2 - y_true[l][...,2:3]*y_true[l][...,3:4] 350 | 351 | # Find ignore mask, iterate over each of batch. 352 | ignore_mask = tf.TensorArray(K.dtype(y_true[0]), size=1, dynamic_size=True) 353 | object_mask_bool = K.cast(object_mask, 'bool') 354 | def loop_body(b, ignore_mask): 355 | true_box = tf.boolean_mask(y_true[l][b,...,0:4], object_mask_bool[b,...,0]) 356 | iou = box_iou(pred_box[b], true_box) 357 | best_iou = K.max(iou, axis=-1) 358 | ignore_mask = ignore_mask.write(b, K.cast(best_iou