├── LICENSE
├── README.md
├── doc
│   └── Install_openface_on_ubuntu_1604.md
├── docker
│   ├── auto-build.sh
│   ├── cltorch
│   │   ├── Dockerfile
│   │   ├── README.md
│   │   ├── amd
│   │   │   ├── Dockerfile
│   │   │   ├── README.md
│   │   │   └── amdgpu-pro-16.40-348864.tar.gz
│   │   └── nvidia
│   │       ├── Dockerfile
│   │       └── README.md
│   ├── cuda-torch
│   │   ├── Dockerfile
│   │   └── README.md
│   ├── ffmpeg
│   │   ├── Dockerfile
│   │   ├── README.md
│   │   └── ubuntu16.04
│   │       ├── cuda8.0
│   │       │   └── Dockerfile
│   │       └── cuda9.1
│   │           └── Dockerfile
│   ├── ffserver
│   │   ├── Dockerfile
│   │   └── ffserver.conf
│   ├── openface
│   │   ├── Dockerfile
│   │   └── README.md
│   ├── pull.sh
│   ├── pycuda
│   │   ├── Dockerfile
│   │   ├── README.md
│   │   └── python-pycuda
│   │       ├── aescuda.py
│   │       ├── cuda_enc.py
│   │       └── pycuda-2016.1.2.tar.gz
│   ├── pyopencl
│   │   ├── Dockerfile
│   │   ├── README.md
│   │   ├── amd
│   │   │   ├── Dockerfile
│   │   │   ├── REDME.md
│   │   │   └── amdgpu-pro-16.40-348864.tar.gz
│   │   └── nvidia
│   │       ├── Dockerfile
│   │       └── README.md
│   ├── pytorch
│   │   ├── Dockerfile
│   │   ├── README.md
│   │   └── ubuntu16.04
│   │       └── cuda9.1
│   │           └── Dockerfile
│   └── torch_opencv_dlib
│       ├── Dockerfile
│       └── README.md
└── scripts
    └── gpgpu_monitor
        ├── README.md
        └── gpu-mon.sh

/LICENSE:
--------------------------------------------------------------------------------
1 | GNU GENERAL PUBLIC LICENSE
2 | Version 3, 29 June 2007
3 |
4 | Copyright (C) 2007 Free Software Foundation, Inc.
5 | Everyone is permitted to copy and distribute verbatim copies
6 | of this license document, but changing it is not allowed.
7 |
8 | Preamble
9 |
10 | The GNU General Public License is a free, copyleft license for
11 | software and other kinds of works.
12 |
13 | The licenses for most software and other practical works are designed
14 | to take away your freedom to share and change the works. By contrast,
15 | the GNU General Public License is intended to guarantee your freedom to
16 | share and change all versions of a program--to make sure it remains free
17 | software for all its users. We, the Free Software Foundation, use the
18 | GNU General Public License for most of our software; it applies also to
19 | any other work released this way by its authors. You can apply it to
20 | your programs, too.
21 |
22 | When we speak of free software, we are referring to freedom, not
23 | price. Our General Public Licenses are designed to make sure that you
24 | have the freedom to distribute copies of free software (and charge for
25 | them if you wish), that you receive source code or can get it if you
26 | want it, that you can change the software or use pieces of it in new
27 | free programs, and that you know you can do these things.
28 |
29 | To protect your rights, we need to prevent others from denying you
30 | these rights or asking you to surrender the rights. Therefore, you have
31 | certain responsibilities if you distribute copies of the software, or if
32 | you modify it: responsibilities to respect the freedom of others.
33 |
34 | For example, if you distribute copies of such a program, whether
35 | gratis or for a fee, you must pass on to the recipients the same
36 | freedoms that you received. You must make sure that they, too, receive
37 | or can get the source code. And you must show them these terms so they
38 | know their rights.
39 |
40 | Developers that use the GNU GPL protect your rights with two steps:
41 | (1) assert copyright on the software, and (2) offer you this License
42 | giving you legal permission to copy, distribute and/or modify it.
43 |
44 | For the developers' and authors' protection, the GPL clearly explains
45 | that there is no warranty for this free software. For both users' and
46 | authors' sake, the GPL requires that modified versions be marked as
47 | changed, so that their problems will not be attributed erroneously to
48 | authors of previous versions.
49 |
50 | Some devices are designed to deny users access to install or run
51 | modified versions of the software inside them, although the manufacturer
52 | can do so. This is fundamentally incompatible with the aim of
53 | protecting users' freedom to change the software. The systematic
54 | pattern of such abuse occurs in the area of products for individuals to
55 | use, which is precisely where it is most unacceptable. Therefore, we
56 | have designed this version of the GPL to prohibit the practice for those
57 | products. If such problems arise substantially in other domains, we
58 | stand ready to extend this provision to those domains in future versions
59 | of the GPL, as needed to protect the freedom of users.
60 |
61 | Finally, every program is threatened constantly by software patents.
62 | States should not allow patents to restrict development and use of
63 | software on general-purpose computers, but in those that do, we wish to
64 | avoid the special danger that patents applied to a free program could
65 | make it effectively proprietary. To prevent this, the GPL assures that
66 | patents cannot be used to render the program non-free.
67 |
68 | The precise terms and conditions for copying, distribution and
69 | modification follow.
70 |
71 | TERMS AND CONDITIONS
72 |
73 | 0. Definitions.
74 |
75 | "This License" refers to version 3 of the GNU General Public License.
76 |
77 | "Copyright" also means copyright-like laws that apply to other kinds of
78 | works, such as semiconductor masks.
79 |
80 | "The Program" refers to any copyrightable work licensed under this
81 | License. Each licensee is addressed as "you". "Licensees" and
82 | "recipients" may be individuals or organizations.
83 |
84 | To "modify" a work means to copy from or adapt all or part of the work
85 | in a fashion requiring copyright permission, other than the making of an
86 | exact copy. The resulting work is called a "modified version" of the
87 | earlier work or a work "based on" the earlier work.
88 |
89 | A "covered work" means either the unmodified Program or a work based
90 | on the Program.
91 |
92 | To "propagate" a work means to do anything with it that, without
93 | permission, would make you directly or secondarily liable for
94 | infringement under applicable copyright law, except executing it on a
95 | computer or modifying a private copy. Propagation includes copying,
96 | distribution (with or without modification), making available to the
97 | public, and in some countries other activities as well.
98 |
99 | To "convey" a work means any kind of propagation that enables other
100 | parties to make or receive copies. Mere interaction with a user through
101 | a computer network, with no transfer of a copy, is not conveying.
102 |
103 | An interactive user interface displays "Appropriate Legal Notices"
104 | to the extent that it includes a convenient and prominently visible
105 | feature that (1) displays an appropriate copyright notice, and (2)
106 | tells the user that there is no warranty for the work (except to the
107 | extent that warranties are provided), that licensees may convey the
108 | work under this License, and how to view a copy of this License. If
109 | the interface presents a list of user commands or options, such as a
110 | menu, a prominent item in the list meets this criterion.
111 |
112 | 1. Source Code.
113 |
114 | The "source code" for a work means the preferred form of the work
115 | for making modifications to it. "Object code" means any non-source
116 | form of a work.
117 |
118 | A "Standard Interface" means an interface that either is an official
119 | standard defined by a recognized standards body, or, in the case of
120 | interfaces specified for a particular programming language, one that
121 | is widely used among developers working in that language.
122 |
123 | The "System Libraries" of an executable work include anything, other
124 | than the work as a whole, that (a) is included in the normal form of
125 | packaging a Major Component, but which is not part of that Major
126 | Component, and (b) serves only to enable use of the work with that
127 | Major Component, or to implement a Standard Interface for which an
128 | implementation is available to the public in source code form. A
129 | "Major Component", in this context, means a major essential component
130 | (kernel, window system, and so on) of the specific operating system
131 | (if any) on which the executable work runs, or a compiler used to
132 | produce the work, or an object code interpreter used to run it.
133 |
134 | The "Corresponding Source" for a work in object code form means all
135 | the source code needed to generate, install, and (for an executable
136 | work) run the object code and to modify the work, including scripts to
137 | control those activities. However, it does not include the work's
138 | System Libraries, or general-purpose tools or generally available free
139 | programs which are used unmodified in performing those activities but
140 | which are not part of the work. For example, Corresponding Source
141 | includes interface definition files associated with source files for
142 | the work, and the source code for shared libraries and dynamically
143 | linked subprograms that the work is specifically designed to require,
144 | such as by intimate data communication or control flow between those
145 | subprograms and other parts of the work.
146 |
147 | The Corresponding Source need not include anything that users
148 | can regenerate automatically from other parts of the Corresponding
149 | Source.
150 |
151 | The Corresponding Source for a work in source code form is that
152 | same work.
153 |
154 | 2. Basic Permissions.
155 |
156 | All rights granted under this License are granted for the term of
157 | copyright on the Program, and are irrevocable provided the stated
158 | conditions are met. This License explicitly affirms your unlimited
159 | permission to run the unmodified Program. The output from running a
160 | covered work is covered by this License only if the output, given its
161 | content, constitutes a covered work. This License acknowledges your
162 | rights of fair use or other equivalent, as provided by copyright law.
163 |
164 | You may make, run and propagate covered works that you do not
165 | convey, without conditions so long as your license otherwise remains
166 | in force. You may convey covered works to others for the sole purpose
167 | of having them make modifications exclusively for you, or provide you
168 | with facilities for running those works, provided that you comply with
169 | the terms of this License in conveying all material for which you do
170 | not control copyright. Those thus making or running the covered works
171 | for you must do so exclusively on your behalf, under your direction
172 | and control, on terms that prohibit them from making any copies of
173 | your copyrighted material outside their relationship with you.
174 |
175 | Conveying under any other circumstances is permitted solely under
176 | the conditions stated below. Sublicensing is not allowed; section 10
177 | makes it unnecessary.
178 |
179 | 3. Protecting Users' Legal Rights From Anti-Circumvention Law.
180 |
181 | No covered work shall be deemed part of an effective technological
182 | measure under any applicable law fulfilling obligations under article
183 | 11 of the WIPO copyright treaty adopted on 20 December 1996, or
184 | similar laws prohibiting or restricting circumvention of such
185 | measures.
186 |
187 | When you convey a covered work, you waive any legal power to forbid
188 | circumvention of technological measures to the extent such circumvention
189 | is effected by exercising rights under this License with respect to
190 | the covered work, and you disclaim any intention to limit operation or
191 | modification of the work as a means of enforcing, against the work's
192 | users, your or third parties' legal rights to forbid circumvention of
193 | technological measures.
194 |
195 | 4. Conveying Verbatim Copies.
196 |
197 | You may convey verbatim copies of the Program's source code as you
198 | receive it, in any medium, provided that you conspicuously and
199 | appropriately publish on each copy an appropriate copyright notice;
200 | keep intact all notices stating that this License and any
201 | non-permissive terms added in accord with section 7 apply to the code;
202 | keep intact all notices of the absence of any warranty; and give all
203 | recipients a copy of this License along with the Program.
204 |
205 | You may charge any price or no price for each copy that you convey,
206 | and you may offer support or warranty protection for a fee.
207 |
208 | 5. Conveying Modified Source Versions.
209 |
210 | You may convey a work based on the Program, or the modifications to
211 | produce it from the Program, in the form of source code under the
212 | terms of section 4, provided that you also meet all of these conditions:
213 |
214 | a) The work must carry prominent notices stating that you modified
215 | it, and giving a relevant date.
216 |
217 | b) The work must carry prominent notices stating that it is
218 | released under this License and any conditions added under section
219 | 7. This requirement modifies the requirement in section 4 to
220 | "keep intact all notices".
221 |
222 | c) You must license the entire work, as a whole, under this
223 | License to anyone who comes into possession of a copy. This
224 | License will therefore apply, along with any applicable section 7
225 | additional terms, to the whole of the work, and all its parts,
226 | regardless of how they are packaged. This License gives no
227 | permission to license the work in any other way, but it does not
228 | invalidate such permission if you have separately received it.
229 |
230 | d) If the work has interactive user interfaces, each must display
231 | Appropriate Legal Notices; however, if the Program has interactive
232 | interfaces that do not display Appropriate Legal Notices, your
233 | work need not make them do so.
234 |
235 | A compilation of a covered work with other separate and independent
236 | works, which are not by their nature extensions of the covered work,
237 | and which are not combined with it such as to form a larger program,
238 | in or on a volume of a storage or distribution medium, is called an
239 | "aggregate" if the compilation and its resulting copyright are not
240 | used to limit the access or legal rights of the compilation's users
241 | beyond what the individual works permit. Inclusion of a covered work
242 | in an aggregate does not cause this License to apply to the other
243 | parts of the aggregate.
244 |
245 | 6. Conveying Non-Source Forms.
246 |
247 | You may convey a covered work in object code form under the terms
248 | of sections 4 and 5, provided that you also convey the
249 | machine-readable Corresponding Source under the terms of this License,
250 | in one of these ways:
251 |
252 | a) Convey the object code in, or embodied in, a physical product
253 | (including a physical distribution medium), accompanied by the
254 | Corresponding Source fixed on a durable physical medium
255 | customarily used for software interchange.
256 |
257 | b) Convey the object code in, or embodied in, a physical product
258 | (including a physical distribution medium), accompanied by a
259 | written offer, valid for at least three years and valid for as
260 | long as you offer spare parts or customer support for that product
261 | model, to give anyone who possesses the object code either (1) a
262 | copy of the Corresponding Source for all the software in the
263 | product that is covered by this License, on a durable physical
264 | medium customarily used for software interchange, for a price no
265 | more than your reasonable cost of physically performing this
266 | conveying of source, or (2) access to copy the
267 | Corresponding Source from a network server at no charge.
268 |
269 | c) Convey individual copies of the object code with a copy of the
270 | written offer to provide the Corresponding Source. This
271 | alternative is allowed only occasionally and noncommercially, and
272 | only if you received the object code with such an offer, in accord
273 | with subsection 6b.
274 |
275 | d) Convey the object code by offering access from a designated
276 | place (gratis or for a charge), and offer equivalent access to the
277 | Corresponding Source in the same way through the same place at no
278 | further charge. You need not require recipients to copy the
279 | Corresponding Source along with the object code. If the place to
280 | copy the object code is a network server, the Corresponding Source
281 | may be on a different server (operated by you or a third party)
282 | that supports equivalent copying facilities, provided you maintain
283 | clear directions next to the object code saying where to find the
284 | Corresponding Source. Regardless of what server hosts the
285 | Corresponding Source, you remain obligated to ensure that it is
286 | available for as long as needed to satisfy these requirements.
287 |
288 | e) Convey the object code using peer-to-peer transmission, provided
289 | you inform other peers where the object code and Corresponding
290 | Source of the work are being offered to the general public at no
291 | charge under subsection 6d.
292 |
293 | A separable portion of the object code, whose source code is excluded
294 | from the Corresponding Source as a System Library, need not be
295 | included in conveying the object code work.
296 |
297 | A "User Product" is either (1) a "consumer product", which means any
298 | tangible personal property which is normally used for personal, family,
299 | or household purposes, or (2) anything designed or sold for incorporation
300 | into a dwelling. In determining whether a product is a consumer product,
301 | doubtful cases shall be resolved in favor of coverage. For a particular
302 | product received by a particular user, "normally used" refers to a
303 | typical or common use of that class of product, regardless of the status
304 | of the particular user or of the way in which the particular user
305 | actually uses, or expects or is expected to use, the product. A product
306 | is a consumer product regardless of whether the product has substantial
307 | commercial, industrial or non-consumer uses, unless such uses represent
308 | the only significant mode of use of the product.
309 |
310 | "Installation Information" for a User Product means any methods,
311 | procedures, authorization keys, or other information required to install
312 | and execute modified versions of a covered work in that User Product from
313 | a modified version of its Corresponding Source. The information must
314 | suffice to ensure that the continued functioning of the modified object
315 | code is in no case prevented or interfered with solely because
316 | modification has been made.
317 |
318 | If you convey an object code work under this section in, or with, or
319 | specifically for use in, a User Product, and the conveying occurs as
320 | part of a transaction in which the right of possession and use of the
321 | User Product is transferred to the recipient in perpetuity or for a
322 | fixed term (regardless of how the transaction is characterized), the
323 | Corresponding Source conveyed under this section must be accompanied
324 | by the Installation Information. But this requirement does not apply
325 | if neither you nor any third party retains the ability to install
326 | modified object code on the User Product (for example, the work has
327 | been installed in ROM).
328 |
329 | The requirement to provide Installation Information does not include a
330 | requirement to continue to provide support service, warranty, or updates
331 | for a work that has been modified or installed by the recipient, or for
332 | the User Product in which it has been modified or installed. Access to a
333 | network may be denied when the modification itself materially and
334 | adversely affects the operation of the network or violates the rules and
335 | protocols for communication across the network.
336 |
337 | Corresponding Source conveyed, and Installation Information provided,
338 | in accord with this section must be in a format that is publicly
339 | documented (and with an implementation available to the public in
340 | source code form), and must require no special password or key for
341 | unpacking, reading or copying.
342 |
343 | 7. Additional Terms.
344 |
345 | "Additional permissions" are terms that supplement the terms of this
346 | License by making exceptions from one or more of its conditions.
347 | Additional permissions that are applicable to the entire Program shall
348 | be treated as though they were included in this License, to the extent
349 | that they are valid under applicable law. If additional permissions
350 | apply only to part of the Program, that part may be used separately
351 | under those permissions, but the entire Program remains governed by
352 | this License without regard to the additional permissions.
353 |
354 | When you convey a copy of a covered work, you may at your option
355 | remove any additional permissions from that copy, or from any part of
356 | it. (Additional permissions may be written to require their own
357 | removal in certain cases when you modify the work.) You may place
358 | additional permissions on material, added by you to a covered work,
359 | for which you have or can give appropriate copyright permission.
360 |
361 | Notwithstanding any other provision of this License, for material you
362 | add to a covered work, you may (if authorized by the copyright holders of
363 | that material) supplement the terms of this License with terms:
364 |
365 | a) Disclaiming warranty or limiting liability differently from the
366 | terms of sections 15 and 16 of this License; or
367 |
368 | b) Requiring preservation of specified reasonable legal notices or
369 | author attributions in that material or in the Appropriate Legal
370 | Notices displayed by works containing it; or
371 |
372 | c) Prohibiting misrepresentation of the origin of that material, or
373 | requiring that modified versions of such material be marked in
374 | reasonable ways as different from the original version; or
375 |
376 | d) Limiting the use for publicity purposes of names of licensors or
377 | authors of the material; or
378 |
379 | e) Declining to grant rights under trademark law for use of some
380 | trade names, trademarks, or service marks; or
381 |
382 | f) Requiring indemnification of licensors and authors of that
383 | material by anyone who conveys the material (or modified versions of
384 | it) with contractual assumptions of liability to the recipient, for
385 | any liability that these contractual assumptions directly impose on
386 | those licensors and authors.
387 |
388 | All other non-permissive additional terms are considered "further
389 | restrictions" within the meaning of section 10. If the Program as you
390 | received it, or any part of it, contains a notice stating that it is
391 | governed by this License along with a term that is a further
392 | restriction, you may remove that term. If a license document contains
393 | a further restriction but permits relicensing or conveying under this
394 | License, you may add to a covered work material governed by the terms
395 | of that license document, provided that the further restriction does
396 | not survive such relicensing or conveying.
397 |
398 | If you add terms to a covered work in accord with this section, you
399 | must place, in the relevant source files, a statement of the
400 | additional terms that apply to those files, or a notice indicating
401 | where to find the applicable terms.
402 |
403 | Additional terms, permissive or non-permissive, may be stated in the
404 | form of a separately written license, or stated as exceptions;
405 | the above requirements apply either way.
406 |
407 | 8. Termination.
408 |
409 | You may not propagate or modify a covered work except as expressly
410 | provided under this License. Any attempt otherwise to propagate or
411 | modify it is void, and will automatically terminate your rights under
412 | this License (including any patent licenses granted under the third
413 | paragraph of section 11).
414 |
415 | However, if you cease all violation of this License, then your
416 | license from a particular copyright holder is reinstated (a)
417 | provisionally, unless and until the copyright holder explicitly and
418 | finally terminates your license, and (b) permanently, if the copyright
419 | holder fails to notify you of the violation by some reasonable means
420 | prior to 60 days after the cessation.
421 |
422 | Moreover, your license from a particular copyright holder is
423 | reinstated permanently if the copyright holder notifies you of the
424 | violation by some reasonable means, this is the first time you have
425 | received notice of violation of this License (for any work) from that
426 | copyright holder, and you cure the violation prior to 30 days after
427 | your receipt of the notice.
428 |
429 | Termination of your rights under this section does not terminate the
430 | licenses of parties who have received copies or rights from you under
431 | this License. If your rights have been terminated and not permanently
432 | reinstated, you do not qualify to receive new licenses for the same
433 | material under section 10.
434 |
435 | 9. Acceptance Not Required for Having Copies.
436 |
437 | You are not required to accept this License in order to receive or
438 | run a copy of the Program. Ancillary propagation of a covered work
439 | occurring solely as a consequence of using peer-to-peer transmission
440 | to receive a copy likewise does not require acceptance. However,
441 | nothing other than this License grants you permission to propagate or
442 | modify any covered work. These actions infringe copyright if you do
443 | not accept this License. Therefore, by modifying or propagating a
444 | covered work, you indicate your acceptance of this License to do so.
445 |
446 | 10. Automatic Licensing of Downstream Recipients.
447 |
448 | Each time you convey a covered work, the recipient automatically
449 | receives a license from the original licensors, to run, modify and
450 | propagate that work, subject to this License. You are not responsible
451 | for enforcing compliance by third parties with this License.
452 |
453 | An "entity transaction" is a transaction transferring control of an
454 | organization, or substantially all assets of one, or subdividing an
455 | organization, or merging organizations. If propagation of a covered
456 | work results from an entity transaction, each party to that
457 | transaction who receives a copy of the work also receives whatever
458 | licenses to the work the party's predecessor in interest had or could
459 | give under the previous paragraph, plus a right to possession of the
460 | Corresponding Source of the work from the predecessor in interest, if
461 | the predecessor has it or can get it with reasonable efforts.
462 |
463 | You may not impose any further restrictions on the exercise of the
464 | rights granted or affirmed under this License. For example, you may
465 | not impose a license fee, royalty, or other charge for exercise of
466 | rights granted under this License, and you may not initiate litigation
467 | (including a cross-claim or counterclaim in a lawsuit) alleging that
468 | any patent claim is infringed by making, using, selling, offering for
469 | sale, or importing the Program or any portion of it.
470 |
471 | 11. Patents.
472 |
473 | A "contributor" is a copyright holder who authorizes use under this
474 | License of the Program or a work on which the Program is based. The
475 | work thus licensed is called the contributor's "contributor version".
476 |
477 | A contributor's "essential patent claims" are all patent claims
478 | owned or controlled by the contributor, whether already acquired or
479 | hereafter acquired, that would be infringed by some manner, permitted
480 | by this License, of making, using, or selling its contributor version,
481 | but do not include claims that would be infringed only as a
482 | consequence of further modification of the contributor version. For
483 | purposes of this definition, "control" includes the right to grant
484 | patent sublicenses in a manner consistent with the requirements of
485 | this License.
486 |
487 | Each contributor grants you a non-exclusive, worldwide, royalty-free
488 | patent license under the contributor's essential patent claims, to
489 | make, use, sell, offer for sale, import and otherwise run, modify and
490 | propagate the contents of its contributor version.
491 |
492 | In the following three paragraphs, a "patent license" is any express
493 | agreement or commitment, however denominated, not to enforce a patent
494 | (such as an express permission to practice a patent or covenant not to
495 | sue for patent infringement). To "grant" such a patent license to a
496 | party means to make such an agreement or commitment not to enforce a
497 | patent against the party.
498 |
499 | If you convey a covered work, knowingly relying on a patent license,
500 | and the Corresponding Source of the work is not available for anyone
501 | to copy, free of charge and under the terms of this License, through a
502 | publicly available network server or other readily accessible means,
503 | then you must either (1) cause the Corresponding Source to be so
504 | available, or (2) arrange to deprive yourself of the benefit of the
505 | patent license for this particular work, or (3) arrange, in a manner
506 | consistent with the requirements of this License, to extend the patent
507 | license to downstream recipients. "Knowingly relying" means you have
508 | actual knowledge that, but for the patent license, your conveying the
509 | covered work in a country, or your recipient's use of the covered work
510 | in a country, would infringe one or more identifiable patents in that
511 | country that you have reason to believe are valid.
512 |
513 | If, pursuant to or in connection with a single transaction or
514 | arrangement, you convey, or propagate by procuring conveyance of, a
515 | covered work, and grant a patent license to some of the parties
516 | receiving the covered work authorizing them to use, propagate, modify
517 | or convey a specific copy of the covered work, then the patent license
518 | you grant is automatically extended to all recipients of the covered
519 | work and works based on it.
520 |
521 | A patent license is "discriminatory" if it does not include within
522 | the scope of its coverage, prohibits the exercise of, or is
523 | conditioned on the non-exercise of one or more of the rights that are
524 | specifically granted under this License. You may not convey a covered
525 | work if you are a party to an arrangement with a third party that is
526 | in the business of distributing software, under which you make payment
527 | to the third party based on the extent of your activity of conveying
528 | the work, and under which the third party grants, to any of the
529 | parties who would receive the covered work from you, a discriminatory
530 | patent license (a) in connection with copies of the covered work
531 | conveyed by you (or copies made from those copies), or (b) primarily
532 | for and in connection with specific products or compilations that
533 | contain the covered work, unless you entered into that arrangement,
534 | or that patent license was granted, prior to 28 March 2007.
535 |
536 | Nothing in this License shall be construed as excluding or limiting
537 | any implied license or other defenses to infringement that may
538 | otherwise be available to you under applicable patent law.
539 |
540 | 12. No Surrender of Others' Freedom.
541 |
542 | If conditions are imposed on you (whether by court order, agreement or
543 | otherwise) that contradict the conditions of this License, they do not
544 | excuse you from the conditions of this License. If you cannot convey a
545 | covered work so as to satisfy simultaneously your obligations under this
546 | License and any other pertinent obligations, then as a consequence you may
547 | not convey it at all. For example, if you agree to terms that obligate you
548 | to collect a royalty for further conveying from those to whom you convey
549 | the Program, the only way you could satisfy both those terms and this
550 | License would be to refrain entirely from conveying the Program.
551 |
552 | 13. Use with the GNU Affero General Public License.
553 |
554 | Notwithstanding any other provision of this License, you have
555 | permission to link or combine any covered work with a work licensed
556 | under version 3 of the GNU Affero General Public License into a single
557 | combined work, and to convey the resulting work. The terms of this
558 | License will continue to apply to the part which is the covered work,
559 | but the special requirements of the GNU Affero General Public License,
560 | section 13, concerning interaction through a network will apply to the
561 | combination as such.
562 |
563 | 14. Revised Versions of this License.
564 |
565 | The Free Software Foundation may publish revised and/or new versions of
566 | the GNU General Public License from time to time. Such new versions will
567 | be similar in spirit to the present version, but may differ in detail to
568 | address new problems or concerns.
569 |
570 | Each version is given a distinguishing version number. If the
571 | Program specifies that a certain numbered version of the GNU General
572 | Public License "or any later version" applies to it, you have the
573 | option of following the terms and conditions either of that numbered
574 | version or of any later version published by the Free Software
575 | Foundation. If the Program does not specify a version number of the
576 | GNU General Public License, you may choose any version ever published
577 | by the Free Software Foundation.
578 |
579 | If the Program specifies that a proxy can decide which future
580 | versions of the GNU General Public License can be used, that proxy's
581 | public statement of acceptance of a version permanently authorizes you
582 | to choose that version for the Program.
583 |
584 | Later license versions may give you additional or different
585 | permissions. However, no additional obligations are imposed on any
586 | author or copyright holder as a result of your choosing to follow a
587 | later version.
588 |
589 | 15. Disclaimer of Warranty.
590 |
591 | THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY
592 | APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT
593 | HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY
594 | OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO,
595 | THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
596 | PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM
597 | IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF
598 | ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
599 |
600 | 16. Limitation of Liability.
601 |
602 | IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
603 | WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS
604 | THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY
605 | GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE
606 | USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF
607 | DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD
608 | PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS),
609 | EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF
610 | SUCH DAMAGES.
611 |
612 | 17. Interpretation of Sections 15 and 16.
613 | 614 | If the disclaimer of warranty and limitation of liability provided 615 | above cannot be given local legal effect according to their terms, 616 | reviewing courts shall apply local law that most closely approximates 617 | an absolute waiver of all civil liability in connection with the 618 | Program, unless a warranty or assumption of liability accompanies a 619 | copy of the Program in return for a fee. 620 | 621 | END OF TERMS AND CONDITIONS 622 | 623 | How to Apply These Terms to Your New Programs 624 | 625 | If you develop a new program, and you want it to be of the greatest 626 | possible use to the public, the best way to achieve this is to make it 627 | free software which everyone can redistribute and change under these terms. 628 | 629 | To do so, attach the following notices to the program. It is safest 630 | to attach them to the start of each source file to most effectively 631 | state the exclusion of warranty; and each file should have at least 632 | the "copyright" line and a pointer to where the full notice is found. 633 | 634 | {one line to give the program's name and a brief idea of what it does.} 635 | Copyright (C) {year} {name of author} 636 | 637 | This program is free software: you can redistribute it and/or modify 638 | it under the terms of the GNU General Public License as published by 639 | the Free Software Foundation, either version 3 of the License, or 640 | (at your option) any later version. 641 | 642 | This program is distributed in the hope that it will be useful, 643 | but WITHOUT ANY WARRANTY; without even the implied warranty of 644 | MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 645 | GNU General Public License for more details. 646 | 647 | You should have received a copy of the GNU General Public License 648 | along with this program. If not, see . 649 | 650 | Also add information on how to contact you by electronic and paper mail. 
651 | 652 | If the program does terminal interaction, make it output a short 653 | notice like this when it starts in an interactive mode: 654 | 655 | {project} Copyright (C) {year} {fullname} 656 | This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'. 657 | This is free software, and you are welcome to redistribute it 658 | under certain conditions; type `show c' for details. 659 | 660 | The hypothetical commands `show w' and `show c' should show the appropriate 661 | parts of the General Public License. Of course, your program's commands 662 | might be different; for a GUI interface, you would use an "about box". 663 | 664 | You should also get your employer (if you work as a programmer) or school, 665 | if any, to sign a "copyright disclaimer" for the program, if necessary. 666 | For more information on this, and how to apply and follow the GNU GPL, see 667 | . 668 | 669 | The GNU General Public License does not permit incorporating your program 670 | into proprietary programs. If your program is a subroutine library, you 671 | may consider it more useful to permit linking proprietary applications with 672 | the library. If this is what you want to do, use the GNU Lesser General 673 | Public License instead of this License. But first, please read 674 | . 675 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # GPGPU Application Notes and Demos for x86/ARM + OpenCL/CUDA System. 
2 | 3 | By Xiaohai Li (haixiaolee@gmail.com) 4 | 5 | ## Articles 6 | 7 | * [Install Openface on Ubuntu 16.04](./doc/Install_openface_on_ubuntu_1604.md) 8 | 9 | * [CPU & GPU Visualized Utility Tool (TBD)](./scripts/gpgpu_monitor/README.md) -------------------------------------------------------------------------------- /doc/Install_openface_on_ubuntu_1604.md: -------------------------------------------------------------------------------- 1 | # Install Openface on Ubuntu 16.04 2 | 3 | Xiaohai Li (haixiaolee@gmail.com) 4 | 5 | ## Introduction 6 | My hardware platform is an Intel Xeon E5-2618L 10-core processor and an NVidia GTX1060 6GB graphics card. The OS distro is Ubuntu Mate 16.04.1 LTS x64. 7 | 8 | The GTX1060 driver, Ubuntu system, GCC 5.4, torch and CUDA SDK have compatibility problems that block me from using GPU acceleration for Openface. 9 | 10 | To use the GTX1060 with CUDA, you need to install the CUDA 8.0 SDK. Visit the [NVidia web site][cuda_info] for more information. 11 | 12 | 13 | ## Preparation 14 | Install dependencies: 15 | ``` sh 16 | sudo apt-get update 17 | sudo apt-get upgrade 18 | sudo apt-get install build-essential cmake curl gfortran git libatlas-dev libavcodec-dev libavformat-dev libboost-all-dev libgtk2.0-dev libjpeg-dev liblapack-dev libswscale-dev pkg-config python-dev python-pip wget -y 19 | ``` 20 | 21 | Install the appropriate NVidia drivers and tools for the GTX1060: 22 | ```sh 23 | sudo apt-get purge nvidia* 24 | sudo apt-get install nvidia-367 nvidia-367-dev nvidia-opencl-icd-367 nvidia-settings 25 | 26 | ``` 27 | 28 | Install Python packages: 29 | ```sh 30 | pip install numpy scipy pandas scikit-learn scikit-image 31 | ``` 32 | 33 | ## Install OpenCV 34 | 35 | Install dependencies; some of them may already be installed from the previous step: 36 | ``` sh 37 | # Required: 38 | sudo apt-get install cmake git libgtk2.0-dev pkg-config libavcodec-dev libavformat-dev libswscale-dev 39 | # Optional 40 | sudo apt-get install python-dev python-numpy libtbb2
libtbb-dev libjpeg-dev libpng-dev libtiff-dev libjasper-dev libdc1394-22-dev 41 | ``` 42 | 43 | Fetch OpenCV and switch to version 2.4.13: 44 | ``` sh 45 | git clone https://github.com/opencv/opencv.git 46 | cd opencv 47 | git checkout 2.4.13 48 | ``` 49 | 50 | Use cmake to configure OpenCV with CUDA/OpenCL and OpenMP (multi-core CPU) support. 51 | ``` sh 52 | mkdir build 53 | cd build 54 | cmake -D WITH_CUDA=1 -D WITH_OPENMP=1 -D WITH_OPENCL=1 -D CMAKE_BUILD_TYPE=RELEASE -D CMAKE_INSTALL_PREFIX=/usr/local .. 55 | ``` 56 | 57 | Compile and install OpenCV; set -jN according to your host PC's multi-thread capability: 58 | ``` sh 59 | make -j20 60 | sudo make install 61 | ``` 62 | 63 | For more information refer to the [OpenCV installation guide][opencv_ins]. 64 | 65 | ## Install dlib 66 | 67 | Fetch dlib and switch to version 19.0: 68 | ``` sh 69 | git clone https://github.com/davisking/dlib.git 70 | cd dlib 71 | git checkout v19.0 72 | ``` 73 | 74 | Configure, compile and install dlib: 75 | ``` sh 76 | cd python_examples 77 | cmake ../tools/python 78 | cmake --build . --config Release -- -j20 79 | sudo cp dlib.so /usr/local/lib/python2.7/dist-packages 80 | ``` 81 | 82 | Unfortunately, dlib doesn't support GPU acceleration at all. 83 | 84 | ## Install an OpenCL/CUDA enabled Torch distro 85 | 86 | Fetch cltorch and install dependencies: 87 | ``` sh 88 | git clone --recursive https://github.com/hughperkins/distro -b distro-cl ~/torch-cl 89 | cd ~/torch-cl 90 | bash install-deps 91 | ``` 92 | 93 | Because there is a compatibility issue between GCC 4.9, boost and dlib, we should change the default compiler to GCC 5.4 before installing cltorch. Edit ./install.sh and replace gcc-4.9 & g++-4.9 with gcc & g++. 94 | ``` sh 95 | if [[ $(gcc -dumpversion | cut -d . -f 1) == 5 ]]; then { 96 | # export CC=gcc-4.9 97 | # export CXX=g++-4.9 98 | export CC=gcc 99 | export CXX=g++ 100 | } fi 101 | ... 102 | if [[ $(gcc -dumpversion | cut -d . 
-f 1) == 5 ]]; then { 103 | # echo export CC=gcc-4.9>>$PREFIX/bin/torch-activate 104 | # echo export CXX=g++-4.9>>$PREFIX/bin/torch-activate 105 | echo export CC=gcc>>$PREFIX/bin/torch-activate 106 | echo export CXX=g++>>$PREFIX/bin/torch-activate 107 | } fi 108 | ``` 109 | 110 | Then install cltorch: 111 | ``` sh 112 | ./install.sh 113 | ``` 114 | 115 | You should also link the executables that Openface needs: 116 | ``` sh 117 | sudo ln -s ~/torch-cl/install/bin/* /usr/local/bin 118 | ``` 119 | 120 | To verify the OpenCL features, use the commands below: 121 | ``` sh 122 | source ~/torch-cl/install/bin/torch-activate 123 | luajit -l torch -e 'torch.test()' 124 | luajit -l nn -e 'nn.test()' 125 | luajit -l cltorch -e 'cltorch.test()' 126 | luajit -l clnn -e 'clnn.test()' 127 | ``` 128 | 129 | cutorch & cunn are also available: 130 | ``` sh 131 | luajit -l cutorch -e 'cutorch.test()' 132 | luajit -l cunn -e 'nn.testcuda()' 133 | ``` 134 | 135 | To update cltorch, use: 136 | ``` sh 137 | cd ~/torch-cl 138 | git pull 139 | git submodule update --init --recursive 140 | ./install.sh 141 | ``` 142 | 143 | ## Install Openface 144 | 145 | Before installing Openface, check that OpenCV and dlib are correctly configured by importing them in a Python shell: 146 | ``` python 147 | import cv2, dlib 148 | ``` 149 | 150 | Install the dependent lua packages: 151 | ``` sh 152 | luarocks install dpnn 153 | luarocks install optim 154 | luarocks install csvigo 155 | luarocks install torchx 156 | luarocks install optnet 157 | ``` 158 | 159 | Fetch Openface: 160 | ``` sh 161 | git clone https://github.com/cmusatyalab/openface.git 162 | cd openface && git submodule init 163 | git submodule update 164 | ``` 165 | 166 | Download the Openface and dlib trained models, then install openface and everything you need to run the demos: 167 | ``` sh 168 | ./models/get-models.sh 169 | pip install -r requirements.txt 170 | sudo python setup.py install 171 | pip install -r demos/web/requirements.txt 172 | pip install -r 
training/requirements.txt 173 | ./data/download-lfw-subset.sh 174 | ``` 175 | 176 | Some demos to test: 177 | ``` sh 178 | # Face comparison demo: 179 | ./demos/compare.py images/examples/{lennon*,clapton*} 180 | # Image classifier demo (with CUDA): 181 | ./demos/classifier.py --cuda infer models/openface/celeb-classifier.nn4.small2.v1.pkl ./images/examples/carell.jpg 182 | ``` 183 | 184 | Real-time face recognition web demo: 185 | ``` sh 186 | ./demos/web/start-servers.sh 187 | ``` 188 | 189 | Edit this line in start-servers.sh to enable CUDA: 190 | ``` sh 191 | #./demos/web/websocket-server.py --port $WEBSOCKET_PORT 2>&1 | tee $WEBSOCKET_LOG & 192 | ./demos/web/websocket-server.py --cuda --port $WEBSOCKET_PORT 2>&1 | tee $WEBSOCKET_LOG & 193 | ``` 194 | 195 | Connect a USB camera to the computer, open Chrome, and use the localhost address to access the demo server: 196 | ``` sh 197 | http://localhost:8000 198 | ``` 199 | 200 | Enjoy your Openface time! 201 | 202 | [opencv_ins]: http://docs.opencv.org/2.4/doc/tutorials/introduction/linux_install/linux_install.html 203 | [cuda_info]: http://developer.nvidia.com/cuda-toolkit 204 | 205 | -------------------------------------------------------------------------------- /docker/auto-build.sh: -------------------------------------------------------------------------------- 1 | docker build -t nightseas/cuda-torch -t nightseas/cuda-torch:cuda8.0-ubuntu16.04 cuda-torch/. 2 | docker push nightseas/cuda-torch:cuda8.0-ubuntu16.04 3 | docker push nightseas/cuda-torch 4 | 5 | docker build -t nightseas/torch-opencv-dlib -t nightseas/torch-opencv-dlib:cv2.4.13-dlib19.0-cuda8.0-ubuntu16.04 torch_opencv_dlib/. 6 | docker push nightseas/torch-opencv-dlib:cv2.4.13-dlib19.0-cuda8.0-ubuntu16.04 7 | docker push nightseas/torch-opencv-dlib 8 | 9 | docker build -t nightseas/openface openface/.
10 | docker push nightseas/openface 11 | -------------------------------------------------------------------------------- /docker/cltorch/Dockerfile: -------------------------------------------------------------------------------- 1 | # Start with Ubuntu base image 2 | FROM ubuntu:16.04 3 | 4 | MAINTAINER Xiaohai Li 5 | 6 | # Install basic deps 7 | RUN apt-get update && apt-get install -y nano sudo wget build-essential cmake curl gfortran git \ 8 | libatlas-dev libavcodec-dev libavformat-dev libboost-all-dev libgtk2.0-dev libjpeg-dev \ 9 | liblapack-dev libswscale-dev pkg-config python-dev python-pip software-properties-common \ 10 | graphicsmagick libgraphicsmagick1-dev python-numpy zip \ 11 | && apt-get clean && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/* 12 | 13 | # Fetch and install cltorch 14 | RUN git clone --recursive https://github.com/hughperkins/distro -b distro-cl /root/torch-cl && cd /root/torch-cl && bash install-deps 15 | 16 | # Do not use gcc 4.9! There are compatibility issues between gcc 4.9, boost, and dlib 17 | RUN cd /root/torch-cl && sed -i -- 's/gcc-4.9/gcc/g' install.sh && sed -i -- 's/g++-4.9/g++/g' install.sh && ./install.sh 18 | 19 | RUN ln -s /root/torch-cl/install/bin/* /usr/local/bin 20 | 21 | WORKDIR /root 22 | -------------------------------------------------------------------------------- /docker/cltorch/README.md: -------------------------------------------------------------------------------- 1 | ## Tags 2 | 3 | #### latest 4 | This is only a base image without any drivers. 5 | 6 | #### nvidia-367 7 | Nvidia 367.57 driver integrated, which supports NVidia Pascal GPUs such as GTX1080/1070/1060. 8 | 9 | #### amdgpu-pro-16.40 10 | AMD GPU Pro 16.40-348864 driver integrated, which supports AMD Polaris GPUs such as RX480/470. 11 | 12 | More information: 13 | 14 | TBD. 15 | 16 | ## Requirement 17 | 18 | - You should run exactly the same driver version in the container as on the host.
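The requirement above can be checked before starting the container; a minimal sketch, where the `same_driver_version` helper and the sample version strings are assumptions for illustration (on an NVIDIA host the real strings usually come from /proc/driver/nvidia/version or `nvidia-smi --query-gpu=driver_version --format=csv,noheader`):

```shell
# Hypothetical check: succeed only when host and container report the
# same driver version string.
same_driver_version() {
  [ "$1" = "$2" ]
}

host_ver="367.57"        # assumed: queried on the host
container_ver="367.57"   # assumed: queried inside the container
if same_driver_version "$host_ver" "$container_ver"; then
  echo "driver versions match"
else
  echo "driver mismatch: host=$host_ver container=$container_ver"
fi
```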
19 | 20 | ## Test 21 | 22 | ```sh 23 | # Nvidia (change /dev/nvidia0 to your GPU) 24 | docker run --device /dev/nvidiactl --device /dev/nvidia0 --device /dev/nvidia-uvm -it nightseas/opencl-torch:nvidia-367 25 | 26 | # AMD 27 | docker run --device /dev/dri nightseas/opencl-torch:amdgpu-pro-16.40 28 | ``` 29 | 30 | 31 | ## Known Issues 32 | 33 | - cutorch and cunn are not supported, even for the Nvidia tag. 34 | -------------------------------------------------------------------------------- /docker/cltorch/amd/Dockerfile: -------------------------------------------------------------------------------- 1 | FROM nightseas/opencl-torch 2 | 3 | MAINTAINER Xiaohai Li 4 | 5 | ENV AMD_DRI_VER=16.40-348864 6 | 7 | RUN apt-get update && apt-get install -y dkms libglib2.0-0 libgstreamer-plugins-base1.0-0 libepoxy0 \ 8 | libgstreamer1.0-0 libomxil-bellagio0 libcunit1 libx11-xcb1 libxcb-dri2-0 \ 9 | libxcb-glx0 libxdamage1 libxfixes3 libxxf86vm1 libxcb-dri3-0 libxcb-present0 \ 10 | libxcb-sync1 libxshmfence1 libelf1 libvdpau1 11 | 12 | COPY amdgpu-pro-$AMD_DRI_VER.tar.gz /root 13 | 14 | RUN cd /root && tar xzf amdgpu-pro-$AMD_DRI_VER.tar.gz && dpkg -i /root/amdgpu-pro-$AMD_DRI_VER/*.deb && rm -rf /root/amdgpu* 15 | 16 | RUN apt-get install -y clinfo 17 | 18 | CMD sh -c '/root/torch-cl/test.sh' 19 | 20 | -------------------------------------------------------------------------------- /docker/cltorch/amd/README.md: -------------------------------------------------------------------------------- 1 | ## Test 2 | 3 | ```sh 4 | docker run --device /dev/dri nightseas/opencl-torch:amdgpu-pro-16.40 5 | ``` 6 | -------------------------------------------------------------------------------- /docker/cltorch/amd/amdgpu-pro-16.40-348864.tar.gz: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/nightseas/gpgpu_applications/498de3041712f98f4c11c6587c6b3cb248e99228/docker/cltorch/amd/amdgpu-pro-16.40-348864.tar.gz
-------------------------------------------------------------------------------- /docker/cltorch/nvidia/Dockerfile: -------------------------------------------------------------------------------- 1 | FROM nightseas/opencl-torch 2 | 3 | MAINTAINER Xiaohai Li 4 | 5 | ENV NV_DRI_VER=367 6 | ENV DEBIAN_FRONTEND=noninteractive 7 | 8 | RUN apt-get update && apt-get install -y nvidia-$NV_DRI_VER nvidia-$NV_DRI_VER-dev nvidia-opencl-icd-$NV_DRI_VER clinfo 9 | 10 | CMD sh -c '/root/torch-cl/test.sh' 11 | 12 | -------------------------------------------------------------------------------- /docker/cltorch/nvidia/README.md: -------------------------------------------------------------------------------- 1 | ## Test 2 | 3 | ```sh 4 | docker run --device /dev/nvidiactl --device /dev/nvidia0 --device /dev/nvidia-uvm -it nightseas/opencl-torch:nvidia-367 5 | ``` 6 | -------------------------------------------------------------------------------- /docker/cuda-torch/Dockerfile: -------------------------------------------------------------------------------- 1 | # Start with NVidia cuDNN base image 2 | FROM nvidia/cuda:8.0-cudnn5-devel-ubuntu16.04 3 | 4 | MAINTAINER Xiaohai Li 5 | 6 | # Install basic deps 7 | RUN apt-get update && apt-get install -y nano sudo wget build-essential cmake curl gfortran git \ 8 | libatlas-dev libavcodec-dev libavformat-dev libboost-all-dev libgtk2.0-dev libjpeg-dev \ 9 | liblapack-dev libswscale-dev pkg-config python-dev python-pip software-properties-common \ 10 | graphicsmagick libgraphicsmagick1-dev python-numpy zip \ 11 | && apt-get clean && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/* 12 | 13 | RUN git clone https://github.com/torch/distro.git /root/torch --recursive && cd /root/torch && \ 14 | bash install-deps 15 | 16 | RUN cd /root/torch && ./install.sh 17 | 18 | RUN ln -s /root/torch/install/bin/* /usr/local/bin 19 | 20 | RUN luarocks install cutorch && luarocks install cunn && luarocks install cudnn 21 | 22 | WORKDIR /root 23 | 
-------------------------------------------------------------------------------- /docker/cuda-torch/README.md: -------------------------------------------------------------------------------- 1 | ## Tags 2 | 3 | #### cuda8.0-ubuntu16.04 (=latest) 4 | For now it's only Ubuntu 16.04 + CUDA 8.0, which supports NVidia Pascal GPUs such as GTX1080/1070/1060. 5 | 6 | More information: 7 | 8 | - [CUDA 8.0](http://www.nvidia.com/object/cuda_home_new.html) 9 | - [cuDNN v5](https://developer.nvidia.com/cuDNN) 10 | 11 | ## Requirement 12 | 13 | - [NVIDIA Docker](https://github.com/NVIDIA/nvidia-docker) - see [requirements](https://github.com/NVIDIA/nvidia-docker/wiki/CUDA#requirements) for more details. 14 | 15 | 16 | 17 | ## Test 18 | 19 | ```sh 20 | nvidia-docker run -it nightseas/cuda-torch bash 21 | luajit -l torch -e 'torch.test()' 22 | luajit -l nn -e 'nn.test()' 23 | 24 | luajit -l cutorch -e 'cutorch.test()' 25 | luajit -l cunn -e 'nn.testcuda()' 26 | ``` 27 | 28 | 29 | ## Known Issues 30 | 31 | - Some random failures in cutorch unit test.
Refer to the Torch7 Google group for more information: 32 | 33 | https://groups.google.com/forum/m/#!msg/torch7/pgfMUUy9wWo/Mk8iGHTSAgAJ 34 | 35 | ``` 36 | Completed 105699 asserts in 169 tests with 2 failures and 0 errors 37 | -------------------------------------------------------------------------------- 38 | bernoulli 39 | mean is not equal to p 40 | ALMOST_EQ failed: 0.875 ~= 0.7456697744783 with tolerance=0.1 41 | /root/torch/install/share/lua/5.1/cutorch/test.lua:2661: in function 'v' 42 | /root/torch/install/share/lua/5.1/cutorch/test.lua:3965: in function 43 | -------------------------------------------------------------------------------- 44 | multinomial_without_replacement 45 | sampled an index twice 46 | BOOL violation condition=false 47 | /root/torch/install/share/lua/5.1/cutorch/test.lua:2861: in function 'v' 48 | /root/torch/install/share/lua/5.1/cutorch/test.lua:3965: in function 49 | -------------------------------------------------------------------------------- 50 | luajit: /root/torch/install/share/lua/5.1/torch/Tester.lua:361: An error was found while running tests! 51 | stack traceback: 52 | [C]: in function 'assert' 53 | /root/torch/install/share/lua/5.1/torch/Tester.lua:361: in function 'run' 54 | /root/torch/install/share/lua/5.1/cutorch/test.lua:3984: in function 'test' 55 | (command line):1: in main chunk 56 | [C]: at 0x00405d50 57 | 58 | ``` 59 | 60 | 61 | 62 | - The cunn unit test reports errors in 2 cases, which remain to be analyzed: 63 | 64 | (It looks like a bug: I have a GTX1060 6GB card and running the test only took 4GB of memory, but the test still failed with 'out of memory'.) 65 | 66 | ``` 67 | 160/169 VolumetricDilatedMaxPooling_backward_batch ...................... [ERROR] 68 | 161/169 SpatialReplicationPadding_forward ............................... [ERROR] 69 | ...
70 | Completed 1902 asserts in 169 tests with 0 failures and 2 errors 71 | -------------------------------------------------------------------------------- 72 | VolumetricDilatedMaxPooling_backward_batch 73 | Function call failed 74 | /root/torch/install/share/lua/5.1/cunn/test.lua:70: cuda runtime error (2) : out of memory at /tmp/luarocks_cutorch-scm-1-4846/cutorch/lib/THC/generic/THCStorage.cu:65 75 | stack traceback: 76 | [C]: in function 'resize' 77 | /root/torch/install/share/lua/5.1/cunn/test.lua:70: in function 'makeNonContiguous' 78 | /root/torch/install/share/lua/5.1/cunn/test.lua:4552: in function 'v' 79 | /root/torch/install/share/lua/5.1/cunn/test.lua:5671: in function 80 | [C]: in function 'xpcall' 81 | /root/torch/install/share/lua/5.1/torch/Tester.lua:477: in function '_pcall' 82 | /root/torch/install/share/lua/5.1/torch/Tester.lua:436: in function '_run' 83 | /root/torch/install/share/lua/5.1/torch/Tester.lua:355: in function 'run' 84 | /root/torch/install/share/lua/5.1/cunn/test.lua:5692: in function 'testcuda' 85 | (command line):1: in main chunk 86 | [C]: at 0x00405d50 87 | 88 | -------------------------------------------------------------------------------- 89 | SpatialReplicationPadding_forward 90 | Function call failed 91 | /root/torch/install/share/lua/5.1/cunn/test.lua:78: cuda runtime error (2) : out of memory at /tmp/luarocks_cutorch-scm-1-4846/cutorch/lib/THC/THCTensorCopy.cu:204 92 | stack traceback: 93 | [C]: in function 'copy' 94 | /root/torch/install/share/lua/5.1/cunn/test.lua:78: in function 'makeNonContiguous' 95 | /root/torch/install/share/lua/5.1/cunn/test.lua:5315: in function 'v' 96 | /root/torch/install/share/lua/5.1/cunn/test.lua:5671: in function 97 | [C]: in function 'xpcall' 98 | /root/torch/install/share/lua/5.1/torch/Tester.lua:477: in function '_pcall' 99 | /root/torch/install/share/lua/5.1/torch/Tester.lua:436: in function '_run' 100 | /root/torch/install/share/lua/5.1/torch/Tester.lua:355: in function 'run' 101 | 
/root/torch/install/share/lua/5.1/cunn/test.lua:5692: in function 'testcuda' 102 | (command line):1: in main chunk 103 | [C]: at 0x00405d50 104 | 105 | -------------------------------------------------------------------------------- 106 | luajit: /root/torch/install/share/lua/5.1/torch/Tester.lua:363: An error was found while running tests! 107 | stack traceback: 108 | [C]: in function 'assert' 109 | /root/torch/install/share/lua/5.1/torch/Tester.lua:363: in function 'run' 110 | /root/torch/install/share/lua/5.1/cunn/test.lua:5692: in function 'testcuda' 111 | (command line):1: in main chunk 112 | [C]: at 0x00405d50 113 | ``` 114 | -------------------------------------------------------------------------------- /docker/ffmpeg/Dockerfile: -------------------------------------------------------------------------------- 1 | # Start with NVidia CUDA base image 2 | FROM nvidia/cuda:9.1-devel 3 | 4 | MAINTAINER Xiaohai Li 5 | 6 | # Install dependent packages 7 | RUN apt-get -y update && apt-get install -y wget nano git build-essential yasm pkg-config 8 | 9 | RUN git clone https://github.com/FFmpeg/nv-codec-headers /root/nv-codec-headers && \ 10 | cd /root/nv-codec-headers &&\ 11 | make -j8 && \ 12 | make install -j8 && \ 13 | cd /root && rm -rf nv-codec-headers 14 | 15 | # Compile and install ffmpeg from source 16 | RUN git clone https://github.com/FFmpeg/FFmpeg /root/ffmpeg && \ 17 | cd /root/ffmpeg && ./configure \ 18 | --enable-nonfree --disable-shared \ 19 | --enable-nvenc --enable-cuda \ 20 | --enable-cuvid --enable-libnpp \ 21 | --extra-cflags=-I/usr/local/cuda/include \ 22 | --extra-cflags=-I/usr/local/include \ 23 | --extra-ldflags=-L/usr/local/cuda/lib64 && \ 24 | make -j8 && \ 25 | make install -j8 && \ 26 | cd /root && rm -rf ffmpeg 27 | 28 | ENV NVIDIA_DRIVER_CAPABILITIES video,compute,utility 29 | 30 | WORKDIR /root 31 | -------------------------------------------------------------------------------- /docker/ffmpeg/README.md: 
-------------------------------------------------------------------------------- 1 | ## Tags 2 | 3 | #### cuda9.1-ubuntu16.04 (=latest) 4 | 5 | Ubuntu 16.04 + CUDA 9.1, which has been verified on NVidia P100. 6 | 7 | #### cuda8.0-ubuntu16.04 8 | 9 | Ubuntu 16.04 + CUDA 8.0, which supports NVidia Pascal GPUs such as GTX1080/1070/1060. 10 | 11 | More information: 12 | 13 | - [CUDA](http://www.nvidia.com/object/cuda_home_new.html) 14 | - [FFmpeg with CUDA](https://developer.nvidia.com/ffmpeg) 15 | 16 | ## Requirement 17 | 18 | - [NVIDIA Docker](https://github.com/NVIDIA/nvidia-docker) - see [requirements](https://github.com/NVIDIA/nvidia-docker/wiki/CUDA#requirements) for more details. 19 | - [NVIDIA Driver Version Requirements from NV Codec Headers](https://github.com/FFmpeg/nv-codec-headers). 20 | 21 | ## Test 22 | 23 | ```sh 24 | docker run -it --runtime=nvidia --volume path_to_your_data:/root/data nightseas/ffmpeg bash 25 | ``` 26 | 27 | ### Decode a single H.264 to YUV 28 | 29 | To decode a single H.264 encoded elementary bitstream file into YUV, use the following command: 30 | 31 | ```sh 32 | ffmpeg -vsync 0 -c:v h264_cuvid -i <input.mp4> -f rawvideo <output.yuv> 33 | ``` 34 | 35 | ### Encode a single YUV file to a bitstream 36 | 37 | To encode a single YUV file into an H.264/HEVC bitstream, use the following command: 38 | 39 | ```sh 40 | # H264 41 | ffmpeg -f rawvideo -s:v 1920x1080 -r 30 -pix_fmt yuv420p -i <input.yuv> -c:v h264_nvenc -preset slow -cq 10 -bf 2 -g 150 <output.mp4> 42 | # H265/HEVC (No B-frames) 43 | ffmpeg -f rawvideo -s:v 1920x1080 -r 30 -pix_fmt yuv420p -i <input.yuv> -vcodec hevc_nvenc -preset slow -cq 10 -g 150 <output.mp4> 44 | ``` 45 | 46 | ### Transcode a single video file to N streams 47 | 48 | Note: On GTX10xx GPUs, only TWO concurrent encode sessions are available, even though encoder utilization is not 100%. Actually, in my case, transcoding two 1080p H264 videos used only 30% of the encoder resources on a GTX1080. It seems to be a software limitation set by NVIDIA, and there is no such limitation on the P100.
49 | 50 | To do a 1:N transcode, use the following command: 51 | 52 | ```sh 53 | ffmpeg -hwaccel cuvid -c:v h264_cuvid -i <input.mp4> -vf scale_npp=1280:720 -vcodec h264_nvenc <output0.mp4> -vf scale_npp=640:480 -vcodec h264_nvenc <output1.mp4> 54 | ``` 55 | 56 | ### Perf Test 57 | 58 | Platform: Dual Intel E5-2699v4 + NVIDIA P100 59 | 60 | NVIDIA Driver: 390.48 61 | 62 | FFmpeg 4.0 + CUDA9.1 63 | 64 | 65 | ```sh 66 | ffmpeg -hwaccel cuvid -c:v h264_cuvid -i big_buck_bunny_1080p_H264_AAC_25fps_7200K.MP4 -vf scale_npp=1280:720 -vcodec h264_nvenc output0.mp4 -vf scale_npp=640:480 -vcodec h264_nvenc output1.mp4 67 | 68 | ... 69 | 70 | frame= 1130 fps=258 q=23.0 Lq=19.0 size= 12277kB time=00:00:45.16 bitrate=2227.1kbits/s dup=10 drop=0 speed=10.3x 71 | video:22926kB audio:1412kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown 72 | [aac @ 0x2204080] Qavg: 828.371 73 | [aac @ 0x22aa4c0] Qavg: 828.371 74 | ``` 75 | 76 | 77 | ## Known Issue 78 | 79 | N/A 80 | -------------------------------------------------------------------------------- /docker/ffmpeg/ubuntu16.04/cuda8.0/Dockerfile: -------------------------------------------------------------------------------- 1 | # Start with NVidia CUDA base image 2 | FROM nvidia/cuda:8.0-devel 3 | 4 | MAINTAINER Xiaohai Li 5 | 6 | # Install dependent packages 7 | RUN apt-get -y update && apt-get install -y wget nano git build-essential yasm pkg-config 8 | 9 | RUN git clone https://github.com/FFmpeg/nv-codec-headers /root/nv-codec-headers && \ 10 | cd /root/nv-codec-headers &&\ 11 | make -j8 && \ 12 | make install -j8 && \ 13 | cd /root && rm -rf nv-codec-headers 14 | 15 | # Compile and install ffmpeg from source 16 | RUN git clone https://github.com/FFmpeg/FFmpeg /root/ffmpeg && \ 17 | cd /root/ffmpeg && ./configure \ 18 | --enable-nonfree --disable-shared \ 19 | --enable-nvenc --enable-cuda \ 20 | --enable-cuvid --enable-libnpp \ 21 | --extra-cflags=-I/usr/local/cuda/include \ 22 | --extra-cflags=-I/usr/local/include \ 23 | 
--extra-ldflags=-L/usr/local/cuda/lib64 && \ 24 | make -j8 && \ 25 | make install -j8 && \ 26 | cd /root && rm -rf ffmpeg 27 | 28 | 29 | ENV NVIDIA_DRIVER_CAPABILITIES video,compute,utility 30 | 31 | WORKDIR /root 32 | -------------------------------------------------------------------------------- /docker/ffmpeg/ubuntu16.04/cuda9.1/Dockerfile: -------------------------------------------------------------------------------- 1 | # Start with NVidia CUDA base image 2 | FROM nvidia/cuda:9.1-devel 3 | 4 | MAINTAINER Xiaohai Li 5 | 6 | # Install dependent packages 7 | RUN apt-get -y update && apt-get install -y wget nano git build-essential yasm pkg-config 8 | 9 | RUN git clone https://github.com/FFmpeg/nv-codec-headers /root/nv-codec-headers && \ 10 | cd /root/nv-codec-headers &&\ 11 | make -j8 && \ 12 | make install -j8 && \ 13 | cd /root && rm -rf nv-codec-headers 14 | 15 | # Compile and install ffmpeg from source 16 | RUN git clone https://github.com/FFmpeg/FFmpeg /root/ffmpeg && \ 17 | cd /root/ffmpeg && ./configure \ 18 | --enable-nonfree --disable-shared \ 19 | --enable-nvenc --enable-cuda \ 20 | --enable-cuvid --enable-libnpp \ 21 | --extra-cflags=-I/usr/local/cuda/include \ 22 | --extra-cflags=-I/usr/local/include \ 23 | --extra-ldflags=-L/usr/local/cuda/lib64 && \ 24 | make -j8 && \ 25 | make install -j8 && \ 26 | cd /root && rm -rf ffmpeg 27 | 28 | ENV NVIDIA_DRIVER_CAPABILITIES video,compute,utility 29 | 30 | WORKDIR /root 31 | -------------------------------------------------------------------------------- /docker/ffserver/Dockerfile: -------------------------------------------------------------------------------- 1 | # Start with ffmpeg image 2 | FROM nightseas/ffmpeg:cuda8.0-ubuntu16.04 3 | 4 | MAINTAINER Xiaohai Li 5 | 6 | COPY ffserver.conf /etc/ffserver.conf 7 | 8 | # HTTP: 8090, RTSP: 5454 9 | EXPOSE 8090 5454 10 | 11 | CMD bash -c "ffserver" 12 | --------------------------------------------------------------------------------
/docker/ffserver/ffserver.conf: -------------------------------------------------------------------------------- 1 | # Port on which the server is listening. You must select a different 2 | # port from your standard HTTP web server if it is running on the same 3 | # computer. 4 | HTTPPort 8090 5 | RTSPPort 5454 6 | 7 | # Address on which the server is bound. Only useful if you have 8 | # several network interfaces. 9 | HTTPBindAddress 0.0.0.0 10 | RTSPBindAddress 0.0.0.0 11 | 12 | # Number of simultaneous HTTP connections that can be handled. It has 13 | # to be defined *before* the MaxClients parameter, since it defines the 14 | # MaxClients maximum limit. 15 | MaxHTTPConnections 100 16 | 17 | # Number of simultaneous requests that can be handled. Since FFServer 18 | # is very fast, it is more likely that you will want to leave this high 19 | # and use MaxBandwidth, below. 20 | MaxClients 50 21 | 22 | # This is the maximum amount of kbit/sec that you are prepared to 23 | # consume when streaming to clients. 24 | MaxBandwidth 100000 25 | 26 | # Access log file (uses standard Apache log file format) 27 | # '-' is the standard output. 28 | CustomLog - 29 | 30 | ################################################################## 31 | 32 | <Feed feed1.ffm> 33 | 34 | FileMaxSize 50M 35 | 36 | </Feed> 37 | 38 | 39 | <Feed feed2.ffm> 40 | 41 | FileMaxSize 50M 42 | 43 | </Feed> 44 | 45 | ################################################################## 46 | # Format of the stream : you can choose among: 47 | # mpeg : MPEG-1 multiplexed video and audio 48 | # mpegvideo : only MPEG-1 video 49 | # mp2 : MPEG-2 audio (use AudioCodec to select layer 2 and 3 codec) 50 | # ogg : Ogg format (Vorbis audio codec) 51 | # rm : RealNetworks-compatible stream. Multiplexed audio and video. 52 | # ra : RealNetworks-compatible stream. Audio only. 53 | # mpjpeg : Multipart JPEG (works with Netscape without any plugin) 54 | # jpeg : Generate a single JPEG image. 55 | # mjpeg : Generate a M-JPEG stream.
56 | # asf : ASF compatible streaming (Windows Media Player format). 57 | # swf : Macromedia Flash compatible stream 58 | # avi : AVI format (MPEG-4 video, MPEG audio sound) 59 | 60 | # Size of the video frame: WxH (default: 160x128) 61 | # The following abbreviations are defined: sqcif, qcif, cif, 4cif, qqvga, 62 | # qvga, vga, svga, xga, uxga, qxga, sxga, qsxga, hsxga, wvga, wxga, wsxga, 63 | # wuxga, woxga, wqsxga, wquxga, whsxga, whuxga, cga, ega, hd480, hd720, 64 | # hd1080 65 | 66 | ################################################################## 67 | # Flash 68 | 69 | #<Stream live.swf> 70 | #Feed feed1.ffm 71 | #Format swf 72 | #VideoFrameRate 2 73 | #VideoIntraOnly 74 | #NoAudio 75 | #</Stream> 76 | 77 | ################################################################## 78 | # RTSP examples 79 | # 80 | # You can access this stream with the RTSP URL: 81 | # rtsp://localhost:5454/live.flv 82 | # 83 | # A non-standard RTSP redirector is also created. Its URL is: 84 | # http://localhost:8090/live.rtsp 85 | 86 | 87 | # Transcode an incoming live feed to another live feed, 88 | # using libx264 and video presets 89 | 90 | <Stream live.flv> 91 | Format flv 92 | Feed feed1.ffm 93 | VideoCodec h264_nvenc 94 | VideoFrameRate 25 95 | VideoBitRate 8000 96 | VideoSize hd1080 97 | 98 | AudioCodec aac 99 | AudioBitRate 32 100 | AudioChannels 2 101 | AudioSampleRate 22050 102 | </Stream> 103 | 104 | <Stream live2.flv> 105 | Format flv 106 | Feed feed2.ffm 107 | VideoCodec h264_nvenc 108 | VideoFrameRate 25 109 | VideoBitRate 4000 110 | VideoSize hd720 111 | 112 | AudioCodec aac 113 | AudioBitRate 32 114 | AudioChannels 2 115 | AudioSampleRate 22050 116 | </Stream> 117 | 118 | #<Stream live3.flv> 119 | #Format flv 120 | #Feed feed2.ffm 121 | #VideoCodec h264_nvenc 122 | #VideoFrameRate 25 123 | #VideoBitRate 2000 124 | #VideoSize hd480 125 | 126 | #AudioCodec aac 127 | #AudioBitRate 32 128 | #AudioChannels 2 129 | #AudioSampleRate 22050 130 | #</Stream> 131 | 132 | ###################################### 133 | # ffmpeg cmd: 134 | ###################################### 135 | #
ffmpeg -re -stream_loop -1 -c:v h264_cuvid -vsync 0 -i big_buck_bunny_1080p_H264_AAC_25fps_7200K.MP4 http://localhost:8090/feed1.ffm http://localhost:8090/feed2.ffm 136 | -------------------------------------------------------------------------------- /docker/openface/Dockerfile: -------------------------------------------------------------------------------- 1 | # Start with cutorch + opencv + dlib image 2 | FROM nightseas/torch-opencv-dlib:cv2.4.13-dlib19.0-cuda8.0-ubuntu16.04 3 | 4 | MAINTAINER Xiaohai Li 5 | 6 | # Install python deps 7 | RUN pip install numpy scipy pandas scikit-learn scikit-image 8 | 9 | # Install torch deps 10 | RUN luarocks install dpnn && \ 11 | luarocks install image && \ 12 | luarocks install optim && \ 13 | luarocks install csvigo && \ 14 | luarocks install torchx && \ 15 | luarocks install optnet && \ 16 | luarocks install graphicsmagick && \ 17 | luarocks install tds 18 | 19 | # Fetch & install openface 20 | RUN git clone https://github.com/cmusatyalab/openface.git /root/openface && \ 21 | cd /root/openface && git submodule init && git submodule update 22 | 23 | RUN cd /root/openface && \ 24 | ./models/get-models.sh && \ 25 | pip install -r requirements.txt && \ 26 | python setup.py install && \ 27 | pip install -r demos/web/requirements.txt && \ 28 | pip install -r training/requirements.txt 29 | 30 | RUN cd /root/openface && \ 31 | ./data/download-lfw-subset.sh 32 | 33 | # Expose the ports that are used by web demo 34 | EXPOSE 8000 9000 35 | 36 | CMD /bin/bash -l -c '/root/openface/demos/web/start-servers.sh' 37 | -------------------------------------------------------------------------------- /docker/openface/README.md: -------------------------------------------------------------------------------- 1 | ## Tags 2 | 3 | For now it's only Ubuntu 16.04 + CUDA 8.0, which supports NVidia Pascal GPU such as GTX1080/1070/1060. 
4 | 5 | More information: 6 | 7 | - [CUDA 8.0](http://www.nvidia.com/object/cuda_home_new.html) 8 | - [cuDNN v5](https://developer.nvidia.com/cuDNN) 9 | - [CMU openface](http://cmusatyalab.github.io/openface/) 10 | 11 | ## Requirement 12 | 13 | - [NVIDIA Docker](https://github.com/NVIDIA/nvidia-docker) - see [requirements](https://github.com/NVIDIA/nvidia-docker/wiki/CUDA#requirements) for more details. 14 | 15 | 16 | 17 | ## Test 18 | 19 | Face comparison demo: 20 | 21 | ```sh 22 | /root/openface/demos/compare.py images/examples/{lennon*,clapton*} 23 | ``` 24 | 25 | Image classifier demo: 26 | 27 | ```sh 28 | /root/openface/demos/classifier.py infer models/openface/celeb-classifier.nn4.small2.v1.pkl ./images/examples/carell.jpg 29 | ``` 30 | 31 | Real-time web-based face recognition: 32 | 33 | ```sh 34 | /root/openface/demos/web/start-servers.sh 35 | ``` 36 | 37 | ## Known Issues 38 | 39 | Some functions may not work correctly; the DNN training test currently fails: 40 | 41 | ``` 42 | root:~/openface# ./run-tests.sh 43 | tests.openface_api_tests.test_pipeline ... ok 44 | tests.openface_batch_represent_tests.test_batch_represent ... ok 45 | tests.openface_demo_tests.test_compare_demo ... ok 46 | tests.openface_demo_tests.test_classification_demo_pretrained ... ok 47 | tests.openface_demo_tests.test_classification_demo_pretrained_multi ... ok 48 | tests.openface_demo_tests.test_classification_demo_training ... ok 49 | tests.openface_neural_net_training_tests.test_dnn_training ... 
FAIL 50 | 51 | ====================================================================== 52 | FAIL: tests.openface_neural_net_training_tests.test_dnn_training 53 | ---------------------------------------------------------------------- 54 | Traceback (most recent call last): 55 | File "/usr/lib/python2.7/dist-packages/nose/case.py", line 197, in runTest 56 | self.test(*self.arg) 57 | File "/root/openface/tests/openface_neural_net_training_tests.py", line 82, in test_dnn_training 58 | assert np.mean(trainLoss) < 0.3 59 | AssertionError: 60 | ... 61 | ``` 62 | -------------------------------------------------------------------------------- /docker/pull.sh: -------------------------------------------------------------------------------- 1 | docker pull nightseas/openface 2 | docker pull nightseas/ffmpeg 3 | docker pull nightseas/cuda-torch 4 | docker pull nightseas/pycuda 5 | -------------------------------------------------------------------------------- /docker/pycuda/Dockerfile: -------------------------------------------------------------------------------- 1 | # Start with NVidia CUDA base image 2 | FROM nvidia/cuda:8.0-devel-ubuntu16.04 3 | 4 | MAINTAINER Xiaohai Li 5 | 6 | # Install dependent packages 7 | RUN apt-get -y update && apt-get install -y wget nano python-pip libboost-all-dev python-numpy build-essential python-dev python-setuptools libboost-python-dev libboost-thread-dev 8 | 9 | # Compile and install pyCUDA from source 10 | COPY python-pycuda/pycuda-2016.1.2.tar.gz /root/pycuda-2016.1.2.tar.gz 11 | RUN tar xzf /root/pycuda-2016.1.2.tar.gz -C /root && cd /root/pycuda-2016.1.2 && ./configure.py --cuda-root=/usr/local/cuda --cudadrv-lib-dir=/usr/lib/x86_64-linux-gnu --boost-inc-dir=/usr/include --boost-lib-dir=/usr/lib --boost-python-libname=boost_python --boost-thread-libname=boost_thread --no-use-shipped-boost && make -j8 && python setup.py install && pip install . 
&& rm /root/pycuda* -rf 12 | 13 | CMD nvidia-smi -q 14 | -------------------------------------------------------------------------------- /docker/pycuda/README.md: -------------------------------------------------------------------------------- 1 | ## Tags 2 | 3 | For now it's only Ubuntu 16.04 + CUDA 8.0, which supports NVidia Pascal GPU such as GTX1080/1070/1060. 4 | 5 | More information: 6 | 7 | - [CUDA 8.0](http://www.nvidia.com/object/cuda_home_new.html) 8 | - [pyCUDA](https://documen.tician.de/pycuda/) 9 | 10 | ## Requirement 11 | 12 | - [NVIDIA Docker](https://github.com/NVIDIA/nvidia-docker) - see [requirements](https://github.com/NVIDIA/nvidia-docker/wiki/CUDA#requirements) for more details. 13 | 14 | ## Test 15 | 16 | ```sh 17 | nvidia-docker run -it nightseas/pycuda bash 18 | ``` 19 | 20 | - [AES Encryption with CUDA](https://github.com/nightseas/gpgpu_applications/tree/master/docker/pycuda/python-pycuda) 21 | 22 | ## Known Issues 23 | 24 | #### TBD 25 | -------------------------------------------------------------------------------- /docker/pycuda/python-pycuda/aescuda.py: -------------------------------------------------------------------------------- 1 | import os, sys, unittest, binascii 2 | import pycuda.driver as cuda 3 | import pycuda.autoinit 4 | from pycuda.compiler import SourceModule 5 | import numpy 6 | import array, time, math, numpy 7 | 8 | 9 | class AES: 10 | 11 | sbox = (99,124,119,123,242,107,111,197,48,1,103,43,254,215,171, 12 | 118,202,130,201,125,250,89,71,240,173,212,162,175,156,164,114,192,183,253, 13 | 147,38,54,63,247,204,52,165,229,241,113,216,49,21,4,199,35,195,24,150,5,154, 14 | 7,18,128,226,235,39,178,117,9,131,44,26,27,110,90,160,82,59,214,179,41,227, 15 | 47,132,83,209,0,237,32,252,177,91,106,203,190,57,74,76,88,207,208,239,170, 16 | 251,67,77,51,133,69,249,2,127,80,60,159,168,81,163,64,143,146,157,56,245, 17 | 188,182,218,33,16,255,243,210,205,12,19,236,95,151,68,23,196,167,126,61, 18 | 
100,93,25,115,96,129,79,220,34,42,144,136,70,238,184,20,222,94,11,219,224, 19 | 50,58,10,73,6,36,92,194,211,172,98,145,149,228,121,231,200,55,109,141,213, 20 | 78,169,108,86,244,234,101,122,174,8,186,120,37,46,28,166,180,198,232,221, 21 | 116,31,75,189,139,138,112,62,181,102,72,3,246,14,97,53,87,185,134,193,29, 22 | 158,225,248,152,17,105,217,142,148,155,30,135,233,206,85,40,223,140,161, 23 | 137,13,191,230,66,104,65,153,45,15,176,84,187,22) 24 | 25 | cuda_inited = False 26 | 27 | 28 | threadMax = 1024 29 | blockMax = 1024 30 | cuda_buf_size = 16 * blockMax * threadMax 31 | 32 | def __init__(self, key, threadMax = 1024, blockMax = 1024): 33 | self.key = key 34 | self.threadMax = threadMax 35 | self.blockMax = blockMax 36 | 37 | self.expandKey() 38 | self.gen_tbox() 39 | 40 | def gen_tbox(self): 41 | self.Te = [numpy.zeros(256, numpy.uint32) for i in xrange(4)] 42 | d = bytearray(256) 43 | 44 | for i in xrange(128): 45 | d[i] = i << 1; 46 | d[128 + i] = (i << 1) ^ 0x1b; 47 | for i in xrange(256): 48 | self.Te[0][i] = self.tuple2word((d[self.sbox[i]], self.sbox[i], self.sbox[i], d[self.sbox[i]] ^ self.sbox[i])) 49 | self.Te[1][i] = self.tuple2word((d[self.sbox[i]] ^ self.sbox[i], d[self.sbox[i]], self.sbox[i], self.sbox[i])) 50 | self.Te[2][i] = self.tuple2word((self.sbox[i], d[self.sbox[i]] ^ self.sbox[i], d[self.sbox[i]], self.sbox[i])) 51 | self.Te[3][i] = self.tuple2word((self.sbox[i], self.sbox[i], d[self.sbox[i]] ^ self.sbox[i], d[self.sbox[i]])) 52 | 53 | def tuple2word(self, x): 54 | return (x[0] << 24) | (x[1] << 16) | (x[2] << 8) | x[3] 55 | 56 | def byte2word(self, bArr): 57 | return (bArr[0] << 24) | (bArr[1] << 16) | (bArr[2] << 8) | bArr[3] 58 | 59 | def word2byte(self, w): 60 | b = bytearray(4) 61 | b[0] = w >> 24 62 | b[1] = (w >> 16) & 0xff 63 | b[2] = (w >> 8) & 0xff 64 | b[3] = w & 0xff 65 | return b 66 | 67 | def expandKey(self): 68 | if not self.key or len(self.key) not in (16, 16): 69 | raise Exception("invalid key") 70 | ks = 
bytearray((len(self.key) / 4 + 7) * 16) 71 | ks[0 : 16] = self.key 72 | 73 | self.keySchedule = numpy.zeros(4 * (len(self.key) / 4 + 7), numpy.uint32) 74 | 75 | rcon = 1 76 | 77 | for i in xrange(len(self.key), len(ks), 4): 78 | temp = ks[i - 4 : i] 79 | if i % len(self.key) == 0: 80 | temp = (self.sbox[temp[1]] ^ rcon, self.sbox[temp[2]], self.sbox[temp[3]], self.sbox[temp[0]]) 81 | rcon = rcon << 1 82 | if rcon >= 256: 83 | rcon ^= 0x11b; 84 | for j in xrange(0, 4): 85 | ks[i + j] = ks[i + j - len(self.key)] ^ temp[j] 86 | for i in xrange(len(ks) / 16): 87 | self.keySchedule[i * 4 : (i + 1) * 4] = (self.byte2word(ks[i * 16 : i * 16 + 4]), self.byte2word(ks[i * 16 + 4 : i * 16 + 8]), self.byte2word(ks[i * 16 + 8 : i * 16 + 12]), self.byte2word(ks[i * 16 + 12 : i * 16 + 16])) 88 | 89 | def printKeySchedule(self): 90 | for i in xrange(0, len(self.keySchedule), 4): # keySchedule is a flat uint32 array; walk it in groups of four words 91 | line = "(%08x, %08x, %08x, %08x)" % tuple(self.keySchedule[i : i + 4]) 92 | print line 93 | 94 | def printDebugInfo(self): 95 | print self.Te 96 | 97 | def __addRoundKey(self, dst, src): 98 | for i in xrange(len(dst)): 99 | dst[i] ^= src[i] 100 | 101 | def __block_encrypt(self, pt): 102 | s = [0] * 4 103 | rk = self.keySchedule[0 : 4] 104 | for i in xrange(4): 105 | s[i] = self.byte2word(pt[i * 4 : (i + 1) * 4]) 106 | # add round key 107 | s[i] ^= rk[i] 108 | 109 | t = [0] * 4 110 | 111 | for i in xrange(1, 10): 112 | rk = self.keySchedule[i * 4 : (i + 1) * 4] 113 | t[0] = self.Te[0][s[0] >> 24] ^ self.Te[1][(s[1] >> 16) & 0xff] ^ self.Te[2][(s[2] >> 8 ) & 0xff] ^ self.Te[3][(s[3]) & 0xff] ^ rk[0] 114 | t[1] = self.Te[0][s[1] >> 24] ^ self.Te[1][(s[2] >> 16) & 0xff] ^ self.Te[2][(s[3] >> 8 ) & 0xff] ^ self.Te[3][(s[0]) & 0xff] ^ rk[1] 115 | t[2] = self.Te[0][s[2] >> 24] ^ self.Te[1][(s[3] >> 16) & 0xff] ^ self.Te[2][(s[0] >> 8 ) & 0xff] ^ self.Te[3][(s[1]) & 0xff] ^ rk[2] 116 | t[3] = self.Te[0][s[3] >> 24] ^ self.Te[1][(s[0] >> 16) & 0xff] ^ 
self.Te[2][(s[1] >> 8 ) & 0xff] ^ self.Te[3][(s[2]) & 0xff] ^ rk[3] 117 | 118 | for j in xrange(4): 119 | s[j] = t[j] 120 | 121 | rk = self.keySchedule[40 : 44] 122 | s[0] = (self.Te[2][(t[0] >> 24)] & 0xff000000) ^ (self.Te[3][(t[1] >> 16) & 0xff] & 0x00ff0000) ^ (self.Te[0][(t[2] >> 8) & 0xff] & 0x0000ff00) ^ (self.Te[1][t[3] & 0xff] & 0x000000ff) ^ rk[0] 123 | s[1] = (self.Te[2][(t[1] >> 24)] & 0xff000000) ^ (self.Te[3][(t[2] >> 16) & 0xff] & 0x00ff0000) ^ (self.Te[0][(t[3] >> 8) & 0xff] & 0x0000ff00) ^ (self.Te[1][t[0] & 0xff] & 0x000000ff) ^ rk[1] 124 | s[2] = (self.Te[2][(t[2] >> 24)] & 0xff000000) ^ (self.Te[3][(t[3] >> 16) & 0xff] & 0x00ff0000) ^ (self.Te[0][(t[0] >> 8) & 0xff] & 0x0000ff00) ^ (self.Te[1][t[1] & 0xff] & 0x000000ff) ^ rk[2] 125 | s[3] = (self.Te[2][(t[3] >> 24)] & 0xff000000) ^ (self.Te[3][(t[0] >> 16) & 0xff] & 0x00ff0000) ^ (self.Te[0][(t[1] >> 8) & 0xff] & 0x0000ff00) ^ (self.Te[1][t[2] & 0xff] & 0x000000ff) ^ rk[3] 126 | 127 | ct = bytearray(len(pt)) 128 | for i in range(4): 129 | ct[i * 4 : (i + 1) * 4] = self.word2byte(s[i]) 130 | return ct 131 | 132 | def basic_encrypt(self, pt): 133 | if len(pt) % 16 != 0: 134 | raise Exception("invalid block size: " + str(len(pt))) 135 | 136 | if not isinstance(pt, bytearray): 137 | if isinstance(pt, str): 138 | pt = bytearray(pt) 139 | 140 | ct = bytearray(len(pt)) 141 | 142 | for i in xrange(0, len(pt), 16): 143 | ct[i : i + 16] = self.__block_encrypt(pt[i : i + 16]) 144 | 145 | return ct 146 | 147 | def init_cuda(self): 148 | if self.cuda_inited: 149 | return 150 | cuda_kernel = """ 151 | #include 152 | 153 | __device__ __constant__ unsigned int keySchedule[44]; 154 | __device__ __constant__ unsigned int Te0[256]; 155 | __device__ __constant__ unsigned int Te1[256]; 156 | __device__ __constant__ unsigned int Te2[256]; 157 | __device__ __constant__ unsigned int Te3[256]; 158 | __device__ __constant__ unsigned int length; 159 | __device__ __constant__ unsigned int threadMax; 160 | 161 | 162 | 
__global__ void printKeySchedule(){ 163 | for(int i = 0; i < 11; i++){ 164 | for(int j = 0; j < 4; j++){ 165 | printf("%08x", keySchedule[i * 4 + j]); 166 | } 167 | printf("\\n"); 168 | } 169 | } 170 | 171 | __device__ unsigned int bytestoword(unsigned char* b){ 172 | return (b[0] << 24) | (b[1] << 16) | (b[2] << 8) | b[3]; 173 | } 174 | 175 | __device__ void wordtobytes(unsigned char* b, unsigned int w){ 176 | b[0] = (w >> 24); 177 | b[1] = (w >> 16) & 0xff; 178 | b[2] = (w >> 8) & 0xff; 179 | b[3] = w & 0xff; 180 | } 181 | 182 | __device__ void addRoundKey(unsigned int *s, unsigned int *k){ 183 | s[0] ^= k[0]; 184 | s[1] ^= k[1]; 185 | s[2] ^= k[2]; 186 | s[3] ^= k[3]; 187 | } 188 | 189 | __global__ void encrypt(unsigned char* in){ 190 | int p = blockIdx.x * 1024 + threadIdx.x; 191 | if(p * 16 >= length) 192 | return; 193 | unsigned char* block = in + p * 16; 194 | unsigned int s[4], t[4]; 195 | unsigned int *rk; 196 | s[0] = bytestoword(block); 197 | s[1] = bytestoword(block + 4); 198 | s[2] = bytestoword(block + 8); 199 | s[3] = bytestoword(block + 12); 200 | 201 | addRoundKey(s, keySchedule); 202 | 203 | for(int i = 1; i < 10; i++){ 204 | rk = keySchedule + i * 4; 205 | t[0] = Te0[s[0] >> 24] ^ Te1[(s[1] >> 16) & 0xff] ^ Te2[(s[2] >> 8 ) & 0xff] ^ Te3[(s[3]) & 0xff] ^ rk[0]; 206 | t[1] = Te0[s[1] >> 24] ^ Te1[(s[2] >> 16) & 0xff] ^ Te2[(s[3] >> 8 ) & 0xff] ^ Te3[(s[0]) & 0xff] ^ rk[1]; 207 | t[2] = Te0[s[2] >> 24] ^ Te1[(s[3] >> 16) & 0xff] ^ Te2[(s[0] >> 8 ) & 0xff] ^ Te3[(s[1]) & 0xff] ^ rk[2]; 208 | t[3] = Te0[s[3] >> 24] ^ Te1[(s[0] >> 16) & 0xff] ^ Te2[(s[1] >> 8 ) & 0xff] ^ Te3[(s[2]) & 0xff] ^ rk[3]; 209 | 210 | for(int j = 0; j < 4; j++) 211 | s[j] = t[j]; 212 | } 213 | 214 | rk = keySchedule + 4 * 10; 215 | s[0] = (Te2[(t[0] >> 24)] & 0xff000000) ^ (Te3[(t[1] >> 16) & 0xff] & 0x00ff0000) ^ (Te0[(t[2] >> 8) & 0xff] & 0x0000ff00) ^ (Te1[t[3] & 0xff] & 0x000000ff) ^ rk[0]; 216 | s[1] = (Te2[(t[1] >> 24)] & 0xff000000) ^ (Te3[(t[2] >> 16) & 0xff] & 
0x00ff0000) ^ (Te0[(t[3] >> 8) & 0xff] & 0x0000ff00) ^ (Te1[t[0] & 0xff] & 0x000000ff) ^ rk[1]; 217 | s[2] = (Te2[(t[2] >> 24)] & 0xff000000) ^ (Te3[(t[3] >> 16) & 0xff] & 0x00ff0000) ^ (Te0[(t[0] >> 8) & 0xff] & 0x0000ff00) ^ (Te1[t[1] & 0xff] & 0x000000ff) ^ rk[2]; 218 | s[3] = (Te2[(t[3] >> 24)] & 0xff000000) ^ (Te3[(t[0] >> 16) & 0xff] & 0x00ff0000) ^ (Te0[(t[1] >> 8) & 0xff] & 0x0000ff00) ^ (Te1[t[2] & 0xff] & 0x000000ff) ^ rk[3]; 219 | 220 | wordtobytes(block, s[0]); 221 | wordtobytes(block + 4, s[1]); 222 | wordtobytes(block + 8, s[2]); 223 | wordtobytes(block + 12, s[3]); 224 | } 225 | 226 | """ 227 | 228 | mod = SourceModule(cuda_kernel) 229 | dKeySchedule = mod.get_global("keySchedule")[0] 230 | cuda.memcpy_htod(dKeySchedule, self.keySchedule) 231 | dThreadMax = mod.get_global("threadMax")[0] 232 | cuda.memcpy_htod(dThreadMax, numpy.array([self.threadMax], numpy.uint32)) 233 | self.dLength = mod.get_global('length')[0] 234 | 235 | dTe0 = mod.get_global("Te0")[0] 236 | cuda.memcpy_htod(dTe0, self.Te[0]) 237 | dTe1 = mod.get_global("Te1")[0] 238 | cuda.memcpy_htod(dTe1, self.Te[1]) 239 | dTe2 = mod.get_global("Te2")[0] 240 | cuda.memcpy_htod(dTe2, self.Te[2]) 241 | dTe3 = mod.get_global("Te3")[0] 242 | cuda.memcpy_htod(dTe3, self.Te[3]) 243 | 244 | self.mod = mod 245 | 246 | self.cuda_buf = cuda.mem_alloc(self.cuda_buf_size) 247 | 248 | self.batchMax = self.threadMax * self.blockMax * 16 249 | 250 | self.cuda_inited = True 251 | 252 | def cuda_encrypt(self, pt): 253 | if len(pt) <= 0 or len(pt) % 16 != 0: 254 | raise Exception("invalid block size: " + str(len(pt))) 255 | 256 | if isinstance(pt, str): 257 | pass 258 | elif isinstance(pt, bytearray): 259 | pass 260 | else: 261 | raise Exception("invalid input type: " + type(pt)) 262 | self.init_cuda() 263 | 264 | # printKS = self.mod.get_function("printKeySchedule") 265 | # printKS(block = (1, 1, 1)) 266 | 267 | 268 | 269 | enc = self.mod.get_function("encrypt"); 270 | ct = numpy.empty(len(pt), dtype = 
numpy.ubyte) 271 | 272 | start = 0 273 | remain = len(pt) 274 | 275 | while remain > 0: 276 | threadNum = self.blockMax 277 | blockNum = 1 278 | if remain >= self.batchMax: 279 | dispose = self.batchMax 280 | blockNum = self.blockMax 281 | threadNum = self.threadMax 282 | elif remain > self.blockMax * 16: 283 | blockNum = int(math.ceil(float(remain / 16) / self.threadMax)) 284 | threadNum = self.threadMax 285 | dispose = remain 286 | else: 287 | threadNum = remain / 16 288 | dispose = remain 289 | remain -= dispose 290 | 291 | cuda.memcpy_htod(self.cuda_buf, pt[start : start + dispose]) 292 | cuda.memcpy_htod(self.dLength, numpy.array([dispose], numpy.uint32)) 293 | 294 | enc(self.cuda_buf, block = (threadNum, 1, 1), grid = (blockNum, 1)) 295 | 296 | cuda.memcpy_dtoh(ct[start : start + dispose], self.cuda_buf) 297 | 298 | start += dispose 299 | 300 | 301 | return ct 302 | 303 | 304 | 305 | class TestCUDAAES(unittest.TestCase): 306 | 307 | def setUp(self): 308 | pass 309 | 310 | def tearDown(self): 311 | pass 312 | 313 | def _test_debuginfo(self): 314 | aes = AES('passwordpassword') # key must be 16 bytes 315 | aes.printKeySchedule() 316 | 317 | def test_basic_encrypt(self): 318 | key = bytearray(16) 319 | for i in xrange(16): 320 | key[i] = i 321 | rslt = (210,83,99,252,114,19,55,100,138,104,243,74,190,243,180,5) 322 | 323 | aes = AES(key) 324 | # aes.printKeySchedule() 325 | # print binascii.hexlify(key) 326 | 327 | pt = 'abcdefghijklmnop' 328 | # print binascii.hexlify(pt) 329 | ct = aes.basic_encrypt(pt) 330 | # print binascii.hexlify(ct) 331 | 332 | for i in xrange(len(rslt)): 333 | self.assertEqual(rslt[i], ct[i]) 334 | 335 | def test_cuda_encrypt(self): 336 | key = bytearray(16) 337 | for i in xrange(16): 338 | key[i] = i 339 | rslt = (210,83,99,252,114,19,55,100,138,104,243,74,190,243,180,5) 340 | aes = AES(key) 341 | pt = 'abcdefghijklmnop' 342 | # print binascii.hexlify(pt) 343 | ct = aes.cuda_encrypt(pt) 344 | # print binascii.hexlify(ct) 345 | 346 | for i in xrange(len(rslt)): 
self.assertEqual(rslt[i], ct[i]) 348 | 349 | def test_compare(self): 350 | key = bytearray(16) 351 | for i in xrange(16): 352 | key[i] = i 353 | aes = AES(key, 4) 354 | 355 | bs = 1024 356 | pt = numpy.random.bytes(bs) 357 | 358 | ct1 = aes.basic_encrypt(pt) 359 | ct2 = aes.cuda_encrypt(pt) 360 | 361 | for i in xrange(len(pt)): 362 | self.assertEqual(ct1[i], ct2[i]) 363 | 364 | def test_benchmark(self): 365 | key = numpy.random.bytes(16) 366 | aes = AES(key) 367 | 368 | for i in xrange(17): 369 | bs = 16 * pow(2, i) 370 | print "\nBlock size: %d bytes" % bs 371 | # pt = 'abcdefghijklmnop' * (bs / 16) 372 | pt = numpy.random.bytes(bs) 373 | 374 | s = time.clock() 375 | aes.basic_encrypt(pt) 376 | e = time.clock() 377 | print "CPU: %fs, speed: %.2fMiB/s" % ((e - s), (bs / (e - s) / 1024 / 1024)) 378 | 379 | s = time.clock() 380 | aes.cuda_encrypt(pt) 381 | e = time.clock() 382 | print "CUDA: %fs, speed: %.2fMiB/s" % ((e - s), (bs / (e - s) / 1024 / 1024)) 383 | 384 | 385 | if __name__ == "__main__": 386 | unittest.main() 387 | -------------------------------------------------------------------------------- /docker/pycuda/python-pycuda/cuda_enc.py: -------------------------------------------------------------------------------- 1 | import aescuda, time, numpy 2 | 3 | 4 | bs = 16 * 1024 * 1024 5 | print "\n\nblock size: %d bytes" % bs 6 | 7 | aes = aescuda.AES("1234567890123456") 8 | for i in range(0, 64): 9 | pt = numpy.random.bytes(bs) 10 | s = time.clock() 11 | aes.cuda_encrypt(pt) 12 | e = time.clock() 13 | print "[CUDA ENC] time: %fs, speed: %.2fMiB/s" % ((e - s), (bs / (e - s) / 1024 / 1024)) 14 | 15 | -------------------------------------------------------------------------------- /docker/pycuda/python-pycuda/pycuda-2016.1.2.tar.gz: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/nightseas/gpgpu_applications/498de3041712f98f4c11c6587c6b3cb248e99228/docker/pycuda/python-pycuda/pycuda-2016.1.2.tar.gz -------------------------------------------------------------------------------- /docker/pyopencl/Dockerfile: -------------------------------------------------------------------------------- 1 | FROM ubuntu:16.04 2 | 3 | RUN apt-get update && apt-get install -y nano wget git sudo 4 | 5 | RUN apt-get install -y python-pyopencl python-pyopencl-doc clinfo 6 | 7 | WORKDIR /root 8 | 9 | -------------------------------------------------------------------------------- /docker/pyopencl/README.md: -------------------------------------------------------------------------------- 1 | ## Tags 2 | 3 | #### latest 4 | This is only a base image without any drivers. 5 | 6 | #### nvidia-367 7 | Nvidia 367.57 driver integrated, which supports NVidia Pascal GPU such as GTX1080/1070/1060. 8 | 9 | #### amdgpu-pro-16.40 10 | AMD GPU Pro 16.40-348864 driver integrated, which supports AMD Polaris GPU such as RX480/470. 11 | 12 | More information: 13 | 14 | TBD. 15 | 16 | ## Requirement 17 | 18 | - The driver version inside the container must exactly match the driver version running on the host. 
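Since the container must ship exactly the driver version the host runs, it helps to check the host's version before choosing an image tag. A minimal sketch (the `NVRM version` line below is an illustrative sample; on a real host you would read `/proc/driver/nvidia/version` instead):

```shell
# Print the NVIDIA kernel-module version from a /proc/driver/nvidia/version
# style line, e.g. "... Kernel Module  367.57  ..." -> "367.57".
nvidia_driver_version() {
    sed -n 's/.*Kernel Module  *\([0-9][0-9.]*\).*/\1/p'
}

# On a real host: nvidia_driver_version < /proc/driver/nvidia/version
echo 'NVRM version: NVIDIA UNIX x86_64 Kernel Module  367.57  Mon Oct  3 20:37:01 PDT 2016' \
    | nvidia_driver_version   # -> 367.57
```

The matching image tag (here `nightseas/pyopencl:nvidia-367`) can then be picked from the printed major version.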
19 | 20 | ## Test 21 | 22 | ```sh 23 | # Nvidia (change /dev/nvidia0 to your GPU) 24 | docker run --device /dev/nvidiactl --device /dev/nvidia0 --device /dev/nvidia-uvm nightseas/pyopencl:nvidia-367 25 | 26 | # AMD 27 | docker run --device /dev/dri nightseas/pyopencl:amdgpu-pro-16.40 28 | ``` 29 | 30 | -------------------------------------------------------------------------------- /docker/pyopencl/amd/Dockerfile: -------------------------------------------------------------------------------- 1 | FROM nightseas/pyopencl 2 | 3 | ENV AMD_DRI_VER=16.40-348864 4 | 5 | RUN apt-get install -y dkms libglib2.0-0 libgstreamer-plugins-base1.0-0 libepoxy0 \ 6 | libgstreamer1.0-0 libomxil-bellagio0 libcunit1 libx11-xcb1 libxcb-dri2-0 \ 7 | libxcb-glx0 libxdamage1 libxfixes3 libxxf86vm1 libxcb-dri3-0 libxcb-present0 \ 8 | libxcb-sync1 libxshmfence1 libelf1 libvdpau1 9 | 10 | COPY amdgpu-pro-$AMD_DRI_VER.tar.gz /root 11 | 12 | RUN cd /root && tar xzf amdgpu-pro-$AMD_DRI_VER.tar.gz && dpkg -i /root/amdgpu-pro-$AMD_DRI_VER/*.deb && rm -rf /root/amdgpu* 13 | 14 | CMD sh -c clinfo 15 | 16 | -------------------------------------------------------------------------------- /docker/pyopencl/amd/REDME.md: -------------------------------------------------------------------------------- 1 | ## Test 2 | 3 | ```sh 4 | docker run --device /dev/dri:/dev/dri nightseas/pyopencl:amdgpu-pro-16.40 5 | ``` 6 | 7 | Run the pyopencl example: 8 | 9 | ```sh 10 | docker run --device /dev/dri nightseas/pyopencl:amdgpu-pro-16.40 sh -c 'python /usr/share/doc/python-pyopencl-doc/examples/benchmark.py' 11 | ``` 12 | -------------------------------------------------------------------------------- /docker/pyopencl/amd/amdgpu-pro-16.40-348864.tar.gz: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/nightseas/gpgpu_applications/498de3041712f98f4c11c6587c6b3cb248e99228/docker/pyopencl/amd/amdgpu-pro-16.40-348864.tar.gz 
-------------------------------------------------------------------------------- /docker/pyopencl/nvidia/Dockerfile: -------------------------------------------------------------------------------- 1 | FROM nightseas/pyopencl 2 | 3 | ENV NV_DRI_VER=367 4 | ENV DEBIAN_FRONTEND=noninteractive 5 | 6 | RUN apt-get install -y nvidia-$NV_DRI_VER nvidia-$NV_DRI_VER-dev nvidia-opencl-icd-$NV_DRI_VER clinfo 7 | 8 | CMD sh -c clinfo 9 | 10 | -------------------------------------------------------------------------------- /docker/pyopencl/nvidia/README.md: -------------------------------------------------------------------------------- 1 | ## Test 2 | 3 | ```sh 4 | docker run --device /dev/nvidiactl --device /dev/nvidia0 --device /dev/nvidia-uvm nightseas/pyopencl:nvidia-367 5 | ``` 6 | 7 | Run the pyopencl example: 8 | 9 | ```sh 10 | docker run --device /dev/nvidiactl --device /dev/nvidia0 --device /dev/nvidia-uvm nightseas/pyopencl:nvidia-367 sh -c 'python /usr/share/doc/python-pyopencl-doc/examples/benchmark.py' 11 | ``` 12 | -------------------------------------------------------------------------------- /docker/pytorch/Dockerfile: -------------------------------------------------------------------------------- 1 | FROM nvidia/cuda:9.1-cudnn7-devel 2 | 3 | RUN apt-get update && apt-get install -y --no-install-recommends \ 4 | python3.5 \ 5 | python3.5-dev \ 6 | python3-pip \ 7 | build-essential \ 8 | cmake \ 9 | git \ 10 | curl \ 11 | vim \ 12 | ca-certificates \ 13 | libjpeg-dev \ 14 | libpng-dev &&\ 15 | rm -rf /var/lib/apt/lists/* 16 | 17 | RUN pip3 install numpy && \ 18 | pip3 install wheel && \ 19 | pip3 install setuptools && \ 20 | pip3 install ninja && \ 21 | pip3 install http://download.pytorch.org/whl/cu91/torch-0.4.0-cp35-cp35m-linux_x86_64.whl && \ 22 | pip3 install torchvision 23 | 24 | # Workaround for pip installation and pytorch test bugs. 
25 | RUN ln -s /usr/bin/python3 /usr/bin/python 26 | 27 | WORKDIR /root 28 | -------------------------------------------------------------------------------- /docker/pytorch/README.md: -------------------------------------------------------------------------------- 1 | ## Tags 2 | 3 | #### cuda9.1-ubuntu16.04 (=latest) 4 | 5 | Ubuntu 16.04 + CUDA 9.1, which supports NVIDIA Volta GPUs such as V100 (not tested), and older ones. 6 | 7 | More information: 8 | 9 | - [CUDA](http://www.nvidia.com/object/cuda_home_new.html) 10 | - [Pytorch](https://pytorch.org) 11 | 12 | ## Requirement 13 | 14 | - [NVIDIA Docker](https://github.com/NVIDIA/nvidia-docker) - see [requirements](https://github.com/NVIDIA/nvidia-docker/wiki/CUDA#requirements) for more details. 15 | 16 | ## Test 17 | 18 | ```sh 19 | docker run -it --runtime=nvidia nightseas/pytorch bash 20 | ``` 21 | 22 | ### Pytorch Basic MNIST Example 23 | 24 | ```sh 25 | git clone https://github.com/pytorch/examples 26 | cd examples/mnist 27 | pip3 install -r requirements.txt 28 | 29 | python3 main.py 30 | ``` 31 | 32 | ### Pytorch Auto Test Results 33 | 34 | Platform: Dual Intel E5-2699v4 + NVIDIA P100 35 | 36 | NVIDIA Driver: 390.48 37 | 38 | Pytorch: v0.4.0 39 | 40 | ```sh 41 | git clone --recursive https://github.com/pytorch/pytorch 42 | cd pytorch 43 | git checkout v0.4.0 44 | pip3 install -r requirements.txt 45 | 46 | cd test 47 | python3 run_test.py 48 | 49 | -------------------------------------- 50 | 51 | Running test_autograd ... 52 | Ran 784 tests in 110.118s 53 | OK 54 | 55 | Running test_cpp_extensions ... 56 | Ran 6 tests in 5.280s 57 | OK 58 | 59 | Running test_cuda ... 60 | Ran 396 tests in 83.172s 61 | OK (skipped=12) 62 | 63 | Running test_dataloader ... 64 | Ran 42 tests in 6.842s 65 | OK (skipped=1) 66 | 67 | Running test_distributed ... 
68 | Ran 41 tests in 42.162s 69 | OK (skipped=22) 70 | Ran 41 tests in 45.037s 71 | OK (skipped=22) 72 | Ran 41 tests in 5.408s 73 | OK (skipped=9) 74 | Ran 41 tests in 11.949s 75 | OK (skipped=9) 76 | Ran 41 tests in 120.216s 77 | OK (skipped=31) 78 | Ran 41 tests in 122.232s 79 | OK (skipped=31) 80 | 81 | Running test_distributions ... 82 | Ran 149 tests in 12.358s 83 | OK (skipped=44) 84 | 85 | Running test_indexing ... 86 | Ran 40 tests in 0.051s 87 | OK 88 | 89 | Running test_jit ... 90 | Ran 142 tests in 28.509s 91 | OK (skipped=4, expected failures=3) 92 | 93 | Running test_legacy_nn ... 94 | Ran 426 tests in 140.719s 95 | OK (skipped=4) 96 | 97 | Running test_multiprocessing ... 98 | Ran 21 tests in 84.834s 99 | OK (skipped=1) 100 | 101 | Running test_nccl ... 102 | Ran 6 tests in 17.971s 103 | OK 104 | 105 | Running test_nn ... 106 | Ran 1104 tests in 1380.985s 107 | OK (skipped=14) 108 | 109 | Running test_optim ... 110 | Ran 30 tests in 65.653s 111 | OK 112 | 113 | Running test_sparse ... 114 | Ran 465 tests in 31.899s 115 | OK (skipped=36) 116 | 117 | Running test_torch ... 118 | Ran 309 tests in 31.833s 119 | OK (skipped=12) 120 | 121 | Running test_utils ... 
122 | Ran 153 tests in 31.343s 123 | OK (skipped=3) 124 | 125 | ``` 126 | 127 | ## Known Issue 128 | 129 | N/A 130 | 131 | -------------------------------------------------------------------------------- /docker/pytorch/ubuntu16.04/cuda9.1/Dockerfile: -------------------------------------------------------------------------------- 1 | FROM nvidia/cuda:9.1-cudnn7-devel 2 | 3 | RUN apt-get update && apt-get install -y --no-install-recommends \ 4 | python3.5 \ 5 | python3.5-dev \ 6 | python3-pip \ 7 | build-essential \ 8 | cmake \ 9 | git \ 10 | curl \ 11 | vim \ 12 | ca-certificates \ 13 | libjpeg-dev \ 14 | libpng-dev && \ 15 | rm -rf /var/lib/apt/lists/* 16 | 17 | RUN pip3 install numpy && \ 18 | pip3 install wheel && \ 19 | pip3 install setuptools && \ 20 | pip3 install ninja && \ 21 | pip3 install http://download.pytorch.org/whl/cu91/torch-0.4.0-cp35-cp35m-linux_x86_64.whl && \ 22 | pip3 install torchvision 23 | 24 | # Workaround for pip installation and PyTorch test bugs. 25 | RUN ln -s /usr/bin/python3 /usr/bin/python 26 | 27 | WORKDIR /root 28 | -------------------------------------------------------------------------------- /docker/torch_opencv_dlib/Dockerfile: -------------------------------------------------------------------------------- 1 | # Start with cutorch base image 2 | FROM nightseas/cuda-torch:cuda8.0-ubuntu16.04 3 | 4 | MAINTAINER Xiaohai Li 5 | 6 | # Install basic deps (no sudo needed: RUN already executes as root) 7 | RUN apt-get update && apt-get install -y cmake git libgtk2.0-dev pkg-config libavcodec-dev libavformat-dev libswscale-dev \ 8 | python-dev python-numpy libtbb2 libtbb-dev libjpeg-dev libpng-dev libtiff-dev libjasper-dev libdc1394-22-dev \ 9 | && apt-get clean && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/* 10 | 11 | # Fetch and install OpenCV 12 | RUN git clone https://github.com/opencv/opencv.git /root/opencv && cd /root/opencv && git checkout 2.4.13 13 | RUN mkdir /root/opencv/build && cd /root/opencv/build && \ 14 | cmake -D WITH_CUDA=1 -D ENABLE_FAST_MATH=1 -D
CUDA_FAST_MATH=1 -D WITH_CUBLAS=1 \ 15 | -D WITH_OPENMP=1 -D WITH_OPENCL=1 -D CMAKE_BUILD_TYPE=RELEASE -D CMAKE_INSTALL_PREFIX=/usr/local .. && \ 16 | make -j8 && make install -j8 && \ 17 | cd /root && rm -rf opencv 18 | 19 | # Fetch and install dlib 20 | RUN git clone https://github.com/davisking/dlib.git /root/dlib && cd /root/dlib && git checkout v19.0 21 | RUN cd /root/dlib/python_examples && cmake ../tools/python && cmake --build . --config Release -- -j8 && \ 22 | cp dlib.so /usr/local/lib/python2.7/dist-packages && \ 23 | cd /root && rm -rf dlib 24 | -------------------------------------------------------------------------------- /docker/torch_opencv_dlib/README.md: -------------------------------------------------------------------------------- 1 | ## Tags 2 | 3 | #### cv2.4.13-dlib19.0-cuda8.0-ubuntu16.04 (=latest) 4 | For now this is Ubuntu 16.04 + CUDA 8.0 only, which supports NVIDIA Pascal GPUs such as the GTX 1080/1070/1060. 5 | 6 | More information: 7 | 8 | - [CUDA 8.0](http://www.nvidia.com/object/cuda_home_new.html) 9 | - [cuDNN v5](https://developer.nvidia.com/cuDNN) 10 | - [OpenCV](http://opencv.org/platforms/cuda.html) 11 | - [dlib](http://dlib.net/) 12 | 13 | ## Requirement 14 | 15 | - [NVIDIA Docker](https://github.com/NVIDIA/nvidia-docker) - see [requirements](https://github.com/NVIDIA/nvidia-docker/wiki/CUDA#requirements) for more details. 16 | 17 | 18 | 19 | ## Test 20 | 21 | #### TBD 22 | 23 | 24 | ## Known Issues 25 | 26 | #### TBD 27 | -------------------------------------------------------------------------------- /scripts/gpgpu_monitor/README.md: -------------------------------------------------------------------------------- 1 | # CPU & GPU Utilization Visualization Tool 2 | 3 | TBD. 4 | 5 | Required tools: 6 | 7 | - GUI: PyQt4, Python, and Qt. 8 | - CPU: Intel ptumon. 9 | - MEM: free. 10 | - GPU: nvidia-settings and nvidia-smi.
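The `gpu-mon.sh` script in this directory extracts the numeric fields from the comma-separated `GPUUtilization` string that `nvidia-settings` returns. A minimal sketch of that parsing step, run against a mocked sample line so it needs no GPU (the values in `sample` are made up for illustration):

```shell
# Mocked output of `nvidia-settings -t -q [gpu:0]/GPUUtilization`;
# the numbers are made-up sample values, not real readings.
sample='graphics=12, memory=34, video=0, PCIe=5'

# Split the comma-separated fields onto separate lines.
stats=$(echo "$sample" | tr ',' '\n')

# Pick a field by name and strip everything but the digits.
gpuusage=$(echo "$stats" | grep graphics | sed 's/[^0-9]//g')
bandwidth=$(echo "$stats" | grep PCIe | sed 's/[^0-9]//g')

echo "GPU usage: ${gpuusage}%, PCIe bandwidth usage: ${bandwidth}%"
```

On a real system the same pipeline runs once per GPU, with `$sample` replaced by the live `nvidia-settings` query.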
-------------------------------------------------------------------------------- /scripts/gpu-mon.sh: -------------------------------------------------------------------------------- 1 | gpulist=`nvidia-settings -t -q gpus` 2 | gpulist=`echo "$gpulist" | sed -e 's/^ *//'` # no leading spaces 3 | gpulist=`echo "$gpulist" | grep -e '^\['` 4 | 5 | echo "$gpulist" | while read LINE; do # quote $gpulist so each GPU stays on its own line 6 | gpuid=`echo "$LINE" | cut -d \ -f 2 | grep -E -o '\[.*\]'` 7 | gpuname=`echo "$LINE" | cut -d \ -f 3-` 8 | 9 | gpuutilstats=`nvidia-settings -t -q "$gpuid"/GPUUtilization | tr ',' '\n'` 10 | gputemp=`nvidia-settings -t -q "$gpuid"/GPUCoreTemp` 11 | gputotalmem=`nvidia-settings -t -q "$gpuid"/TotalDedicatedGPUMemory` 12 | gpuusedmem=`nvidia-settings -t -q "$gpuid"/UsedDedicatedGPUMemory` 13 | 14 | gpuusage=`echo "$gpuutilstats"|grep graphics|sed 's/[^0-9]//g'` 15 | memoryusage=`echo "$gpuutilstats"|grep memory|sed 's/[^0-9]//g'` 16 | bandwidthusage=`echo "$gpuutilstats"|grep PCIe|sed 's/[^0-9]//g'` 17 | 18 | echo "$gpuid $gpuname" 19 | echo " GPU usage : $gpuusage%" 20 | echo " Current temperature : $gputemp°C" 21 | echo " Memory usage : $gpuusedmem MB/$gputotalmem MB" 22 | echo " Memory bandwidth usage : $memoryusage%" 23 | echo " PCIe bandwidth usage : $bandwidthusage%" 24 | done 25 | --------------------------------------------------------------------------------